cache

package
v1.4.6 Latest
Warning

This package is not in the latest version of its module.

Published: Apr 14, 2026 License: Apache-2.0 Imports: 17 Imported by: 0

README

Governed Semantic Cache

This package implements Talon's semantic cache for LLM completions. It reduces cost and latency by serving cached responses for similar prompts instead of calling the LLM. The cache is GDPR Article 17 compliant, PII-safe, tenant-isolated, and auditable.

Cache vs memory (clarification)

Aspect     | Semantic cache (this package)                     | Agent memory (internal/memory)
-----------+---------------------------------------------------+---------------------------------------------------
Layer      | Gateway / proxy (LLM request path)                | Agent (per-agent learning)
Purpose    | Cost and latency: reuse similar prompts           | Safety and compliance: what the agent may remember
Duration   | Minutes to days (TTL, eviction)                   | Weeks to indefinitely
Governance | Cache TTL, data tier, PII scrubbing, GDPR erasure | Categories, PII policy, constitutional AI
Config     | talon.config.yaml under cache                     | agent.talon.yaml under memory
When used  | Before every LLM call                             | When building context for later runs

The semantic cache sits at the proxy/gateway layer. Memory governance sits at the agent layer. Both may use similar techniques (e.g. embeddings, similarity) for different goals.

Embedding strategy (Option C — BM25, v0.2.0 default)

We use BM25-style term-vector similarity in pure Go (embedder.go):

  • No external model or CGO — single binary, no extra dependencies.
  • Deterministic — same text always yields the same blob for lookup.
  • Good for exact and near-exact match — repeated or slightly reworded prompts hit the cache.
  • Does not match paraphrases (e.g. "What is GDPR?" vs "Explain GDPR to me"); that is an acceptable MVP tradeoff. Most cache hits in practice come from repeated or near-identical queries.
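The tradeoff described above can be illustrated with a minimal sketch of term-vector cosine similarity. This is not the code in embedder.go: the helper names termVector and cosine are hypothetical, the weights are raw term frequencies rather than BM25 weights, and the vectors are plain maps rather than the package's serialized blob.

```go
package main

import (
	"fmt"
	"math"
	"strings"
	"unicode"
)

// termVector maps each lowercased token to its frequency.
// Hypothetical helper; the real embedder serializes its vector to a blob.
func termVector(text string, minTermLen int) map[string]float64 {
	vec := make(map[string]float64)
	// Split on any rune that is neither a letter nor a digit.
	tokens := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	for _, tok := range tokens {
		if len(tok) >= minTermLen { // mirrors the idea of MinTermLen
			vec[tok]++
		}
	}
	return vec
}

// cosine returns the cosine similarity of two term vectors.
// With non-negative weights the result lies in [0, 1] up to rounding.
func cosine(a, b map[string]float64) float64 {
	var dot, normA, normB float64
	for term, w := range a {
		dot += w * b[term] // missing terms contribute 0
		normA += w * w
	}
	for _, w := range b {
		normB += w * w
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

func main() {
	query := termVector("What is GDPR?", 2)
	repeat := termVector("what is gdpr", 2)
	paraphrase := termVector("Explain GDPR to me", 2)

	fmt.Printf("repeated prompt: %.2f\n", cosine(query, repeat))
	fmt.Printf("paraphrase:      %.2f\n", cosine(query, paraphrase))
}
```

The repeated prompt shares every term with the query and scores close to 1, while the paraphrase shares only "gdpr" and scores far lower, which is why a purely lexical embedder misses it.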

Alternatives deferred to later:

  • Option A (v0.3): Local embedding model (e.g. ONNX MiniLM) for true semantic matching.
  • Option B (not recommended): LLM provider embedding API — adds latency and cost to every lookup.

What is stored

  • Stored: Prompt embedding (serialized term vector, no raw prompt text), PII-scrubbed response text, metadata (tenant_id, model, TTL, hit_count), HMAC signature.
  • Not stored: Raw prompt text, raw response text, user identifiers (except optional user_id for GDPR user-level erasure).

Components

  • store.go — SQLite schema, CRUD, lookup with similarity function, eviction, HMAC, GDPR erasure.
  • embedder.go — BM25 tokenization and cosine similarity of term vectors.
  • pii_scrubber.go — Wraps classifier.Redact for response text before storage.
  • policy.go — OPA cache eligibility (data tier, PII, request type, cache_enabled); see rego/cache.rego.

Documentation

Overview

Package cache embedder provides BM25-style text similarity for the semantic cache (Option C). No raw prompt text is stored — only a serialized term vector for similarity lookup.

Package cache: cache entry key derivation (deterministic, not password hashing).

Package cache PII scrubber wraps the classifier so LLM responses are PII-scrubbed before being stored in the semantic cache.

Package cache policy evaluates OPA cache eligibility (lookup and store).

Package cache provides a governed semantic cache for LLM completions.

The cache is gateway-level cost optimization: it stores prompt embeddings (not raw prompts) and PII-scrubbed responses, with strict tenant isolation, configurable TTL, and GDPR Article 17 erasure. See internal/cache/README.md for cache-vs-memory clarification.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func DeriveEntryKey

func DeriveEntryKey(tenantID, model, prompt string) string

DeriveEntryKey returns a deterministic cache entry key from tenant, model, and prompt. Uses SHA-2 (SHA-256) for cache key derivation only: same inputs produce the same key for lookup/insert. No password or secret is hashed; inputs are tenant id, model name, and prompt text. SHA-2 (or SHA-3) is appropriate for non-password uses like cache keys.

Precondition: callers must pass only non-secret values: tenant identifier (e.g. from caller config, used as cache scope), model name, and prompt text. Do not pass API keys, passwords, or other secrets. The tenant ID is typically from config lookup (by API key), not the API key itself; it is an identifier like "acme-corp", not sensitive data.

func TenantIDForCacheKey

func TenantIDForCacheKey(tenantID string) string

TenantIDForCacheKey documents that the given string is a tenant identifier for cache scoping (e.g. from caller config), not an API key or secret. Use at call sites to make the non-secret use explicit for static analysis.
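As an illustration of deterministic key derivation in the spirit of DeriveEntryKey, the sketch below hashes the three inputs with SHA-256. The function name deriveKeySketch, the zero-byte separator, and the hex encoding are assumptions made for this example; the actual input layout and output encoding used by DeriveEntryKey are not documented here.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// deriveKeySketch is a hypothetical stand-in for DeriveEntryKey.
// It hashes tenant, model, and prompt with SHA-256 and returns hex.
func deriveKeySketch(tenantID, model, prompt string) string {
	h := sha256.New()
	for _, part := range []string{tenantID, model, prompt} {
		h.Write([]byte(part))
		// Assumed separator so that ("ab", "c") and ("a", "bc") hash differently.
		h.Write([]byte{0})
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	// "model-a" is a placeholder model name for illustration.
	k1 := deriveKeySketch("acme-corp", "model-a", "What is GDPR?")
	k2 := deriveKeySketch("acme-corp", "model-a", "What is GDPR?")
	k3 := deriveKeySketch("other-tenant", "model-a", "What is GDPR?")

	fmt.Println("same inputs, same key:", k1 == k2)
	fmt.Println("different tenant, different key:", k1 != k3)
}
```

Including the tenant identifier in the hashed input means two tenants sending the same prompt to the same model never share a key.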

Types

type BM25

type BM25 struct {
	// MinTermLen ignores tokens shorter than this (default 2).
	MinTermLen int
}

BM25 is a pure-Go BM25-style embedder: tokenize text and produce a term vector blob. No external model or CGO; deterministic and suitable for exact and near-exact match caching.

func NewBM25

func NewBM25() *BM25

NewBM25 returns a new BM25 embedder with default settings.

func (*BM25) Embed

func (b *BM25) Embed(text string) ([]byte, error)

Embed tokenizes text and returns a serialized term vector (blob) for storage in the cache. The blob does not contain raw text; it is used only for similarity comparison via Similarity.

func (*BM25) Similarity

func (b *BM25) Similarity(queryBlob, candidateBlob []byte) (float64, error)

Similarity computes cosine similarity between two term-vector blobs (from Embed). Returns a value in [0, 1]; 1 means identical vectors. Safe to use as cache.SimilarityFunc.

func (*BM25) SimilarityFunc

func (b *BM25) SimilarityFunc() SimilarityFunc

SimilarityFunc returns a cache.SimilarityFunc that uses this BM25 embedder.

type Entry

type Entry struct {
	ID            string
	TenantID      string
	UserID        string // Optional; for user-level GDPR erasure
	CacheKey      string
	EmbeddingData []byte
	ResponseText  string
	Model         string
	DataTier      string
	PIIScrubbed   bool
	HitCount      int64
	CreatedAt     time.Time
	ExpiresAt     time.Time
	LastAccessed  *time.Time
	HMACSignature string
}

Entry is a single semantic cache record.

type Evaluator

type Evaluator struct {
	// contains filtered or unexported fields
}

Evaluator evaluates cache eligibility policy.

func NewEvaluator

func NewEvaluator(ctx context.Context) (*Evaluator, error)

NewEvaluator compiles the embedded cache.rego and returns an evaluator.

func (*Evaluator) Evaluate

func (e *Evaluator) Evaluate(ctx context.Context, input *PolicyInput) (*PolicyResult, error)

Evaluate returns whether cache lookup and store are allowed for the input.

type LookupResult

type LookupResult struct {
	Entry      *Entry
	Similarity float64
}

LookupResult is the return type of Store.Lookup. It includes the matching entry and the actual similarity score (in [0, 1]) so callers can record accurate audit data instead of the configured threshold.

type PIIScrubber

type PIIScrubber struct {
	// contains filtered or unexported fields
}

PIIScrubber wraps the PII classifier's Redact to produce cache-safe response text. Responses are scrubbed (PII replaced with placeholders like [EMAIL]) before storage.

func NewPIIScrubber

func NewPIIScrubber(scanner *classifier.Scanner) *PIIScrubber

NewPIIScrubber returns a scrubber that uses the given classifier scanner.

func (*PIIScrubber) Scrub

func (p *PIIScrubber) Scrub(ctx context.Context, text string) string

Scrub returns text with PII replaced by type-based placeholders (e.g. [EMAIL], [IBAN]). Use this for LLM response text before storing in the cache.
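To make the placeholder idea concrete, here is a minimal sketch that replaces email addresses with [EMAIL]. It is not how PIIScrubber works internally: the real scrubber delegates to the classifier's Redact, whereas scrubEmails and its regular expression are hypothetical and cover only one simplified PII type.

```go
package main

import (
	"fmt"
	"regexp"
)

// emailRe is a deliberately simple email pattern, assumed for this sketch.
// A production classifier would cover far more cases and PII types.
var emailRe = regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`)

// scrubEmails replaces every email-like substring with the [EMAIL] placeholder.
func scrubEmails(text string) string {
	return emailRe.ReplaceAllString(text, "[EMAIL]")
}

func main() {
	response := "Contact jane.doe@example.com for details."
	fmt.Println(scrubEmails(response))
}
```

Because the placeholder is type-based, a cached response stays readable for later requests without retaining the original value.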

type PolicyInput

type PolicyInput struct {
	TenantID     string `json:"tenant_id"`
	DataTier     string `json:"data_tier"`    // public | internal | confidential | restricted
	PIIDetected  bool   `json:"pii_detected"` // from classifier pre-scan
	PIISeverity  string `json:"pii_severity"` // none | low | high
	Model        string `json:"model"`
	RequestType  string `json:"request_type"`  // completion | embedding | tool_call
	CacheEnabled bool   `json:"cache_enabled"` // from tenant/config
}

PolicyInput is the input to the cache eligibility Rego policy.

type PolicyResult

type PolicyResult struct {
	AllowLookup bool `json:"allow_lookup"`
	AllowStore  bool `json:"allow_store"`
}

PolicyResult is the result of cache policy evaluation.
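The JSON tags on PolicyInput determine the shape of the input document the Rego policy sees. The sketch below copies the struct locally (so it runs without importing the package) and marshals one example value; the helper name encodePolicyInput and the field values are illustrative assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PolicyInput is a local copy of the documented struct, repeated here
// only so the example is self-contained.
type PolicyInput struct {
	TenantID     string `json:"tenant_id"`
	DataTier     string `json:"data_tier"`
	PIIDetected  bool   `json:"pii_detected"`
	PIISeverity  string `json:"pii_severity"`
	Model        string `json:"model"`
	RequestType  string `json:"request_type"`
	CacheEnabled bool   `json:"cache_enabled"`
}

// encodePolicyInput returns the JSON document for the given input.
func encodePolicyInput(in PolicyInput) (string, error) {
	b, err := json.Marshal(in)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	doc, err := encodePolicyInput(PolicyInput{
		TenantID:     "acme-corp",
		DataTier:     "internal",
		PIIDetected:  false,
		PIISeverity:  "none",
		Model:        "model-a", // placeholder model name
		RequestType:  "completion",
		CacheEnabled: true,
	})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(doc)
}
```

Which combinations of these fields allow lookup or store is decided by rego/cache.rego and is not restated here.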

type SimilarityFunc

type SimilarityFunc func(queryBlob, candidateBlob []byte) (float64, error)

SimilarityFunc compares query embedding blob to a candidate's embedding blob and returns a similarity score in [0, 1]. Used by Lookup to find the best match.

type Store

type Store struct {
	// contains filtered or unexported fields
}

Store persists semantic cache entries in SQLite with HMAC integrity.

func NewStore

func NewStore(dbPath string, signingKey string) (*Store, error)

NewStore opens or creates the cache SQLite DB and applies the schema.
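The HMAC integrity check that the signing key enables can be sketched as follows. This is an assumption-laden illustration: the hash (SHA-256 here), the hex encoding, and which entry fields make up the signed payload are not specified above, and the helpers sign and verify are hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign returns a hex-encoded HMAC over payload using key.
// SHA-256 is an assumption; the package only documents "HMAC".
func sign(key, payload string) string {
	mac := hmac.New(sha256.New, []byte(key))
	mac.Write([]byte(payload))
	return hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the HMAC and compares it in constant time.
func verify(key, payload, signature string) bool {
	want, err := hex.DecodeString(signature)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, []byte(key))
	mac.Write([]byte(payload))
	return hmac.Equal(mac.Sum(nil), want)
}

func main() {
	key := "example-signing-key" // placeholder, never hard-code real keys
	// Hypothetical payload: some concatenation of entry fields.
	payload := "acme-corp|model-a|scrubbed response text"

	sig := sign(key, payload)
	fmt.Println("untampered entry verifies:", verify(key, payload, sig))
	fmt.Println("tampered entry verifies:  ", verify(key, payload+"!", sig))
}
```

Comparing with hmac.Equal rather than == avoids leaking timing information about how many leading bytes matched.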

func (*Store) Close

func (s *Store) Close() error

Close closes the database connection.

func (*Store) CountByTenant

func (s *Store) CountByTenant(ctx context.Context, tenantID string) (int, error)

CountByTenant returns the number of cache entries for the tenant (for max_entries_per_tenant enforcement).

func (*Store) DeleteExpired

func (s *Store) DeleteExpired(ctx context.Context) (int64, error)

DeleteExpired removes entries where expires_at < now. Returns the number of rows deleted.

func (*Store) EraseTenant

func (s *Store) EraseTenant(ctx context.Context, tenantID string) (int64, error)

EraseTenant deletes all cache entries for the tenant (GDPR Article 17). Returns count deleted.

func (*Store) EraseTenantUser

func (s *Store) EraseTenantUser(ctx context.Context, tenantID, userID string) (int64, error)

EraseTenantUser deletes all cache entries for the tenant and user (GDPR Article 17). Returns count deleted. Only entries with the given user_id are removed; entries with NULL user_id are not deleted by this call.
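The NULL user_id rule is easy to misread, so here is a small in-memory sketch of the documented semantics. It does not touch SQLite: the row type and the eraseTenantUser function are hypothetical, and a nil pointer stands in for a NULL user_id column.

```go
package main

import "fmt"

// row models only the columns relevant to erasure.
type row struct {
	TenantID string
	UserID   *string // nil models a NULL user_id
}

// eraseTenantUser removes rows matching both tenant and user, and
// returns the remaining rows plus the number deleted.
func eraseTenantUser(rows []row, tenantID, userID string) ([]row, int64) {
	kept := make([]row, 0, len(rows))
	var deleted int64
	for _, r := range rows {
		if r.TenantID == tenantID && r.UserID != nil && *r.UserID == userID {
			deleted++
			continue
		}
		kept = append(kept, r) // includes rows with NULL user_id
	}
	return kept, deleted
}

func main() {
	alice, bob := "alice", "bob"
	rows := []row{
		{TenantID: "acme-corp", UserID: &alice},
		{TenantID: "acme-corp", UserID: &bob},
		{TenantID: "acme-corp", UserID: nil},
		{TenantID: "other-tenant", UserID: &alice},
	}
	kept, deleted := eraseTenantUser(rows, "acme-corp", "alice")
	fmt.Println("deleted:", deleted, "remaining:", len(kept))
}
```

Entries stored without a user_id survive a user-level erasure, so a caller that needs them gone has to use EraseTenant instead.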

func (*Store) GetByID

func (s *Store) GetByID(ctx context.Context, id string) (*Entry, error)

GetByID returns the cache entry by ID, or nil if not found.

func (*Store) IncrementHitCount

func (s *Store) IncrementHitCount(ctx context.Context, id string) error

IncrementHitCount increments hit_count and sets last_accessed for the entry.

func (*Store) Insert

func (s *Store) Insert(ctx context.Context, e *Entry) error

Insert stores a new cache entry and signs it. ID is set if empty.

func (*Store) ListTenants

func (s *Store) ListTenants(ctx context.Context) ([]string, error)

ListTenants returns distinct tenant IDs that have cache entries (for CLI/stats).

func (*Store) Lookup

func (s *Store) Lookup(ctx context.Context, tenantID string, queryEmbedding []byte, threshold float64, maxCandidates int, sim SimilarityFunc) (*LookupResult, error)

Lookup finds the best-matching cache entry for the tenant and query embedding using the provided similarity function. Returns nil if no candidate exceeds the threshold. maxCandidates limits how many entries are loaded for comparison (e.g. 1000). The returned LookupResult includes the actual similarity score so callers can record it in evidence (audit trail) instead of the configured threshold.
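The selection step described above can be sketched without a database. In the example below, candidate, match, bestMatch, and toySim are hypothetical names; the real Lookup loads candidates from SQLite for the tenant. Whether a score exactly equal to the threshold counts as a hit is an assumption here (it does).

```go
package main

import (
	"errors"
	"fmt"
)

// SimilarityFunc matches the documented signature.
type SimilarityFunc func(queryBlob, candidateBlob []byte) (float64, error)

type candidate struct {
	ID        string
	Embedding []byte
}

type match struct {
	ID         string
	Similarity float64
}

// bestMatch scans at most maxCandidates entries and returns the highest
// scoring one at or above threshold, or nil if none qualifies.
func bestMatch(query []byte, cands []candidate, threshold float64, maxCandidates int, sim SimilarityFunc) (*match, error) {
	if sim == nil {
		return nil, errors.New("similarity function is nil")
	}
	if maxCandidates > 0 && len(cands) > maxCandidates {
		cands = cands[:maxCandidates]
	}
	var best *match
	for _, c := range cands {
		score, err := sim(query, c.Embedding)
		if err != nil {
			return nil, err
		}
		if score < threshold {
			continue
		}
		if best == nil || score > best.Similarity {
			best = &match{ID: c.ID, Similarity: score}
		}
	}
	return best, nil
}

// toySim is a stand-in similarity: the fraction of equal bytes at equal
// positions. It is not the BM25 similarity.
func toySim(a, b []byte) (float64, error) {
	if len(a) == 0 || len(a) != len(b) {
		return 0, nil
	}
	same := 0
	for i := range a {
		if a[i] == b[i] {
			same++
		}
	}
	return float64(same) / float64(len(a)), nil
}

func main() {
	cands := []candidate{
		{ID: "e1", Embedding: []byte("abxx")},
		{ID: "e2", Embedding: []byte("abcx")},
	}
	m, err := bestMatch([]byte("abcd"), cands, 0.7, 1000, toySim)
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	if m == nil {
		fmt.Println("cache miss")
		return
	}
	// Record the actual score, not the configured threshold.
	fmt.Printf("hit %s with similarity %.2f\n", m.ID, m.Similarity)
}
```

Returning the measured score alongside the entry is what lets the caller write the real similarity into the audit trail.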
