semanticcache

package module
v1.5.22 Latest
Published: Jun 21, 2026 License: Apache-2.0 Imports: 18 Imported by: 3

Documentation

Overview

Package semanticcache provides a semantic caching plugin for Bifrost. The plugin caches responses using both direct hash matching (xxhash) and semantic similarity search (embeddings). It supports configurable caching behavior through the VectorStore abstraction, including TTL management and streaming response handling.

Constants

const (
	PluginName                          string        = "semantic_cache"
	DefaultVectorStoreNamespace         string        = "BifrostSemanticCachePlugin"
	CacheConnectionTimeout              time.Duration = 5 * time.Second
	CreateNamespaceTimeout              time.Duration = 30 * time.Second
	CacheSetTimeout                     time.Duration = 30 * time.Second
	DefaultCacheTTL                     time.Duration = 5 * time.Minute
	DefaultCacheThreshold               float64       = 0.8
	DefaultConversationHistoryThreshold int           = 3
)

Plugin constants

const (
	CacheKey          schemas.BifrostContextKey = "semantic_cache-key"        // String. Required (or DefaultCacheKey) — bucket entries under a tenant/feature scope.
	CacheTTLKey       schemas.BifrostContextKey = "semantic_cache-ttl"        // time.Duration. Per-request override of Config.TTL.
	CacheThresholdKey schemas.BifrostContextKey = "semantic_cache-threshold"  // float64. Per-request override of the semantic similarity threshold.
	CacheTypeKey      schemas.BifrostContextKey = "semantic_cache-cache_type" // CacheType. Narrow lookup to a single path (direct or semantic).
	CacheNoStoreKey   schemas.BifrostContextKey = "semantic_cache-no_store"   // bool. Skip writing the response to cache (still served from cache on hit).
)

Per-request context keys. Callers set these on BifrostContext before the request enters Bifrost; the plugin reads them in Pre/PostLLMHook. CacheKey (or Config.DefaultCacheKey) is the only one required for caching to engage.

const SharedTestNamespace = "BifrostSemanticCachePluginTest"

SharedTestNamespace is the single Weaviate class all parallel tests share. Mirrors production: many concurrent requests hit one namespace, isolated by per-test cache_keys (see keyForTest). Distinct from the plugin's production default so test runs can't collide with a real cache.

Variables

var SelectFields = []string{"response", "stream_chunks", "expires_at", "cache_key", "provider", "model"}

SelectFields enumerates the properties projected back from the vector store on a cache hit. params_hash and from_bifrost_semantic_cache_plugin are filter-only (used in WHERE-style queries to narrow matches) and intentionally omitted from this projection — keep them defined in VectorStoreProperties below so the store creates the columns/indexes, but don't fetch them.

var VectorStoreProperties = map[string]vectorstore.VectorStoreProperties{
	"response": {
		DataType:    vectorstore.VectorStorePropertyTypeString,
		Description: "The response from the provider",
	},
	"stream_chunks": {
		DataType:    vectorstore.VectorStorePropertyTypeStringArray,
		Description: "The stream chunks from the provider",
	},
	"expires_at": {
		DataType:    vectorstore.VectorStorePropertyTypeInteger,
		Description: "The expiration time of the cache entry",
	},
	"cache_key": {
		DataType:    vectorstore.VectorStorePropertyTypeString,
		Description: "The cache key from the request",
	},
	"provider": {
		DataType:    vectorstore.VectorStorePropertyTypeString,
		Description: "The provider used for the request",
	},
	"model": {
		DataType:    vectorstore.VectorStorePropertyTypeString,
		Description: "The model used for the request",
	},
	"params_hash": {
		DataType:    vectorstore.VectorStorePropertyTypeString,
		Description: "The hash of the parameters used for the request",
	},
	"from_bifrost_semantic_cache_plugin": {
		DataType:    vectorstore.VectorStorePropertyTypeBoolean,
		Description: "Whether the cache entry was created by the BifrostSemanticCachePlugin",
	},
}

Functions

func AddUserMessage added in v1.2.6

func AddUserMessage(messages []schemas.ChatMessage, userMessage string) []schemas.ChatMessage

AddUserMessage adds a user message to existing conversation

func AssertCacheHit

func AssertCacheHit(t *testing.T, response *schemas.BifrostResponse, expectedCacheType string)

AssertCacheHit verifies that a response was served from cache

func AssertNoCacheHit

func AssertNoCacheHit(t *testing.T, response *schemas.BifrostResponse)

AssertNoCacheHit verifies that a response was NOT served from cache

func BuildConversationHistory added in v1.2.6

func BuildConversationHistory(systemPrompt string, userAssistantPairs ...[]string) []schemas.ChatMessage

BuildConversationHistory creates a conversation history from pairs of user/assistant messages

func CreateBasicChatRequest

func CreateBasicChatRequest(content string, temperature float64, maxTokens int) *schemas.BifrostChatRequest

CreateBasicChatRequest creates a basic chat completion request for testing

func CreateBasicResponsesRequest added in v1.3.0

func CreateBasicResponsesRequest(content string, temperature float64, maxTokens int) *schemas.BifrostResponsesRequest

CreateBasicResponsesRequest creates a basic Responses API request for testing

func CreateContextWithCacheKey

func CreateContextWithCacheKey(t testing.TB, suffix string) *schemas.BifrostContext

CreateContextWithCacheKey creates a context with a per-test cache key. suffix may be "" for tests using only one cache key.

func CreateContextWithCacheKeyAndNoStore added in v1.2.6

func CreateContextWithCacheKeyAndNoStore(t testing.TB, suffix string, noStore bool) *schemas.BifrostContext

CreateContextWithCacheKeyAndNoStore creates a context with cache key and no-store flag

func CreateContextWithCacheKeyAndTTL added in v1.2.6

func CreateContextWithCacheKeyAndTTL(t testing.TB, suffix string, ttl time.Duration) *schemas.BifrostContext

CreateContextWithCacheKeyAndTTL creates a context with cache key and custom TTL

func CreateContextWithCacheKeyAndThreshold added in v1.2.6

func CreateContextWithCacheKeyAndThreshold(t testing.TB, suffix string, threshold float64) *schemas.BifrostContext

CreateContextWithCacheKeyAndThreshold creates a context with cache key and custom threshold

func CreateContextWithCacheKeyAndType added in v1.2.6

func CreateContextWithCacheKeyAndType(t testing.TB, suffix string, cacheType CacheType) *schemas.BifrostContext

CreateContextWithCacheKeyAndType creates a context with cache key and cache type

func CreateConversationRequest added in v1.2.6

func CreateConversationRequest(messages []schemas.ChatMessage, temperature float64, maxTokens int) *schemas.BifrostChatRequest

CreateConversationRequest creates a chat request with conversation history

func CreateEmbeddingRequest added in v1.2.6

func CreateEmbeddingRequest(texts []string) *schemas.BifrostEmbeddingRequest

CreateEmbeddingRequest creates an embedding request for testing

func CreateImageGenerationRequest added in v1.4.9

func CreateImageGenerationRequest(prompt string, size string, quality string) *schemas.BifrostImageGenerationRequest

CreateImageGenerationRequest creates an image generation request for testing

func CreateResponsesRequestWithInstructions added in v1.3.0

func CreateResponsesRequestWithInstructions(content string, instructions string, temperature float64, maxTokens int) *schemas.BifrostResponsesRequest

CreateResponsesRequestWithInstructions creates a Responses API request with system instructions

func CreateResponsesRequestWithTools added in v1.3.0

func CreateResponsesRequestWithTools(content string, temperature float64, maxTokens int, tools []schemas.ResponsesTool) *schemas.BifrostResponsesRequest

CreateResponsesRequestWithTools creates a Responses API request with tools for testing

func CreateSpeechRequest

func CreateSpeechRequest(input string, voice string) *schemas.BifrostSpeechRequest

CreateSpeechRequest creates a speech synthesis request for testing

func CreateStreamingChatRequest

func CreateStreamingChatRequest(content string, temperature float64, maxTokens int) *schemas.BifrostChatRequest

CreateStreamingChatRequest creates a streaming chat completion request for testing

func CreateStreamingResponsesRequest added in v1.3.0

func CreateStreamingResponsesRequest(content string, temperature float64, maxTokens int) *schemas.BifrostResponsesRequest

CreateStreamingResponsesRequest creates a streaming Responses API request for testing

func Init

func Init(ctx context.Context, config *Config, logger schemas.Logger, store vectorstore.VectorStore) (schemas.LLMPlugin, error)

Init validates the configuration, creates the namespace in the underlying VectorStore, starts the background reaper goroutines, and returns a plugin ready to be wired into the Bifrost plugin pipeline.

Note: Init mutates *config in place to fill in defaults — TTL, Threshold, CacheBy* — so the caller sees the resolved values after this returns.

func WaitForCache

func WaitForCache(plugin schemas.LLMPlugin)

WaitForCache waits for async cache operations to complete.

WaitForPendingOperations drains writersWg, which tracks every PostLLMHook goroutine and the asynchronous delete of expired entries, so stored entries are guaranteed durable when it returns. WaitForCache then sleeps briefly as a buffer for index visibility on eventually consistent vector stores (single-node Weaviate is usually immediate, but cloud or multi-shard setups may need a moment before the entry becomes queryable).

Override via SEMCACHE_TEST_INDEX_DELAY_MS for slower stores / CI.

Types

type BaseAccount

type BaseAccount struct{}

BaseAccount implements the schemas.Account interface for testing purposes.

func (*BaseAccount) GetConfigForProvider

func (baseAccount *BaseAccount) GetConfigForProvider(providerKey schemas.ModelProvider) (*schemas.ProviderConfig, error)

func (*BaseAccount) GetConfiguredProviders

func (baseAccount *BaseAccount) GetConfiguredProviders() ([]schemas.ModelProvider, error)

func (*BaseAccount) GetKeysForProvider

func (baseAccount *BaseAccount) GetKeysForProvider(ctx context.Context, providerKey schemas.ModelProvider) ([]schemas.Key, error)

type CacheType

type CacheType string
const (
	CacheTypeDirect   CacheType = "direct"
	CacheTypeSemantic CacheType = "semantic"
)

type Config

type Config struct {
	// Embedding Model settings - REQUIRED for semantic caching
	Provider       schemas.ModelProvider `json:"provider"`
	EmbeddingModel string                `json:"embedding_model,omitempty"` // Model to use for generating embeddings (optional)

	// Plugin behavior settings
	TTL                  time.Duration `json:"ttl,omitempty"`                    // Time-to-live for cached responses (default: 5min)
	Threshold            float64       `json:"threshold,omitempty"`              // Cosine similarity threshold for semantic matching (0 = unset → default 0.8)
	VectorStoreNamespace string        `json:"vector_store_namespace,omitempty"` // Namespace for vector store (optional)
	Dimension            int           `json:"dimension"`                        // Dimension for vector store (must be > 0 when Provider is set; use 1 for direct-only mode)

	// Advanced caching behavior
	DefaultCacheKey              string `json:"default_cache_key,omitempty"`              // Default cache key used when no per-request key is provided (optional, caching is disabled when empty and no per-request key is set)
	ConversationHistoryThreshold int    `json:"conversation_history_threshold,omitempty"` // Skip caching for requests with more than this number of messages in the conversation history (default: 3)
	CacheByModel                 *bool  `json:"cache_by_model,omitempty"`                 // Include model in cache key (default: true)
	CacheByProvider              *bool  `json:"cache_by_provider,omitempty"`              // Include provider in cache key (default: true)
	ExcludeSystemPrompt          *bool  `json:"exclude_system_prompt,omitempty"`          // Exclude system prompt from cache key (default: false)
}

Config contains configuration for the semantic cache plugin. The VectorStore abstraction handles the underlying storage implementation and its defaults. Only specify values you want to override from the semantic cache defaults.

Modes:

  • Semantic mode: set Provider + EmbeddingModel + Dimension > 0. Both direct hash matching and embedding-based similarity search are enabled.
  • Direct-only mode: set Provider="" and Dimension=1. The plugin disables semantic search entirely; cache lookups go through the deterministic direct hash path. Dimension=1 keeps stores that require a vector happy.
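The two modes can be summarized as a small decision function. This is a sketch under assumptions: resolveMode and the mode type are hypothetical names, and the only rule enforced is the one stated in the Dimension field comment (Dimension must be > 0 when Provider is set). How the real plugin reacts to other Dimension values in direct-only mode is not documented here, so the sketch leaves that unchecked.

```go
package main

import (
	"errors"
	"fmt"
)

type mode string

const (
	modeSemantic   mode = "semantic"
	modeDirectOnly mode = "direct-only"
)

// resolveMode picks the caching mode from the two fields that decide it.
// An empty provider disables semantic search; a set provider requires a
// positive embedding dimension.
func resolveMode(provider string, dimension int) (mode, error) {
	if provider == "" {
		return modeDirectOnly, nil
	}
	if dimension <= 0 {
		return "", errors.New("dimension must be > 0 when provider is set")
	}
	return modeSemantic, nil
}

func main() {
	m, err := resolveMode("", 1)
	fmt.Println(m, err)
}
```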

func (*Config) UnmarshalJSON

func (c *Config) UnmarshalJSON(data []byte) error

UnmarshalJSON implements custom JSON unmarshaling for Config so TTL accepts either a duration string ("1m", "1h") or a JSON number (seconds). All other fields decode through the default path via a type alias, so adding a new field on Config does not require touching this method.

type EmbeddingRequestExecutor added in v1.5.6

EmbeddingRequestExecutor invokes the embedding endpoint on the bifrost client. The plugin calls it on cache misses to compute the request embedding for semantic similarity search and storage. It mirrors the signature of bifrost.Client.EmbeddingRequest.

type Plugin

type Plugin struct {
	// contains filtered or unexported fields
}

Plugin implements schemas.LLMPlugin for semantic caching. It serves cached responses via two complementary lookup paths: a direct O(1) hash match on (provider, model, cache_key, request_hash, params_hash) for exact replays, and an embedding-based similarity search for semantically related content. Streaming responses are accumulated chunk-by-chunk and stored as a single entry on the final chunk; TTL bookkeeping is per-entry via expires_at.
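To illustrate the direct path, the sketch below derives a deterministic ID from the five-part tuple named above. It is not the plugin's implementation: SHA-256 from the standard library is swapped in for xxhash to keep the example dependency-free, and directCacheID and its length-prefixed encoding are assumptions. The length prefix prevents two different tuples from concatenating to the same byte stream.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// directCacheID hashes the tuple into a stable hex ID. Identical inputs
// always produce the same ID, which is what makes exact replays a single
// keyed lookup.
func directCacheID(provider, model, cacheKey, requestHash, paramsHash string) string {
	h := sha256.New()
	for _, part := range []string{provider, model, cacheKey, requestHash, paramsHash} {
		// Length-prefix each part so ("ab", "c") and ("a", "bc") differ.
		fmt.Fprintf(h, "%d:%s|", len(part), part)
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	fmt.Println(directCacheID("openai", "gpt-4o", "tenant-42", "req-hash", "params-hash"))
}
```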

func (*Plugin) Cleanup

func (plugin *Plugin) Cleanup() error

Cleanup signals the background loops to stop and waits for in-flight cache writes to drain before returning. When CleanUpOnShutdown is true, it then deletes every entry tagged from_bifrost_semantic_cache_plugin and drops the namespace — useful for ephemeral test environments. The default is to leave entries in place so they can serve subsequent process restarts.

func (*Plugin) ClearCacheForCacheID added in v1.5.11

func (plugin *Plugin) ClearCacheForCacheID(cacheID string) error

ClearCacheForCacheID deletes a single cache entry by its storage ID. The caller obtains the ID from BifrostResponse.ExtraFields.CacheDebug.CacheID, which is stamped on both cache hits and cache misses — so the same handle works whether the request wrote the entry or read it.

func (*Plugin) ClearCacheForKey

func (plugin *Plugin) ClearCacheForKey(cacheKey string) error

ClearCacheForKey deletes every entry written under the given cache_key. Use this to invalidate a tenant or feature scope in bulk. Per-entry deletion is available via ClearCacheForCacheID.

func (*Plugin) GetName

func (plugin *Plugin) GetName() string

GetName returns the canonical name used for plugin identification and logging.

func (*Plugin) HTTPTransportPostHook added in v1.4.9

func (plugin *Plugin) HTTPTransportPostHook(ctx *schemas.BifrostContext, req *schemas.HTTPRequest, resp *schemas.HTTPResponse) error

HTTPTransportPostHook is not used by the semantic cache plugin.

func (*Plugin) HTTPTransportPreHook added in v1.4.9

func (plugin *Plugin) HTTPTransportPreHook(ctx *schemas.BifrostContext, req *schemas.HTTPRequest) (*schemas.HTTPResponse, error)

HTTPTransportPreHook is not used by the semantic cache plugin.

func (*Plugin) HTTPTransportStreamChunkHook added in v1.4.15

func (plugin *Plugin) HTTPTransportStreamChunkHook(ctx *schemas.BifrostContext, req *schemas.HTTPRequest, chunk *schemas.BifrostStreamChunk) (*schemas.BifrostStreamChunk, error)

HTTPTransportStreamChunkHook passes streaming chunks through unchanged.

func (*Plugin) PostLLMHook added in v1.4.16

PostLLMHook caches the upstream response keyed by the storageID resolved in PreLLMHook (deterministic directCacheID for direct hits, request UUID otherwise). The store write runs in a goroutine tracked by writersWg with its own background context + CacheSetTimeout, so client cancellation after the response is delivered doesn't drop the cache write. Returns the response unmodified — caching never alters the request flow.

func (*Plugin) PreLLMHook added in v1.4.16

PreLLMHook performs the cache lookup before the request reaches the provider. It runs the direct hash path first (cheapest), falls back to semantic similarity search when configured, and short-circuits the pipeline with a cached response on hit. On miss, it leaves per-request state on the plugin keyed by request ID for PostLLMHook to consume when the upstream response arrives.

func (*Plugin) PreRequestHook added in v1.5.19

func (plugin *Plugin) PreRequestHook(_ *schemas.BifrostContext, _ *schemas.BifrostRequest) error

PreRequestHook implements schemas.LLMPlugin (no-op — required for plugin indexing).

func (*Plugin) SetEmbeddingRequestExecutor added in v1.5.6

func (plugin *Plugin) SetEmbeddingRequestExecutor(executor EmbeddingRequestExecutor)

SetEmbeddingRequestExecutor wires up the function the plugin uses to call out to the embedding provider. Must be set before the plugin starts serving traffic; semantic search is silently skipped while it's nil.

func (*Plugin) WaitForPendingOperations added in v1.4.19

func (plugin *Plugin) WaitForPendingOperations()

WaitForPendingOperations blocks until all pending cache operations (goroutines) complete. This is useful in tests to ensure cache entries are stored before checking for cache hits. It does NOT wait on background loops — those only exit on Cleanup.

type RetryConfig added in v1.3.0

type RetryConfig struct {
	MaxRetries int
	BaseDelay  time.Duration
}

RetryConfig defines retry configuration for API requests

func DefaultRetryConfig added in v1.3.0

func DefaultRetryConfig() RetryConfig

DefaultRetryConfig returns the default retry configuration

type StreamAccumulator

type StreamAccumulator struct {

	// RequestID is the BifrostContext request ID this accumulator is keyed by.
	RequestID string
	// StorageID is the cache entry ID the accumulated stream will be written under.
	StorageID string
	// Chunks holds every chunk seen so far, in arrival order.
	Chunks []*StreamChunk
	// LastSeenAt records the arrival time of the most recent chunk. The reaper
	// uses this so a long-running stream isn't evicted mid-flight; first-chunk
	// time alone would falsely flag still-active streams as abandoned.
	LastSeenAt time.Time
	// IsComplete is set when the final chunk has been observed; further final
	// chunks are no-ops to keep flush idempotent.
	IsComplete bool
	// Embedding is the request embedding to attach to the cache entry, or nil
	// for direct-only writes.
	Embedding []float32
	// Metadata is the unified metadata captured at first-chunk time and reused
	// at flush. expires_at is locked in here, so TTL is fixed at first chunk.
	Metadata map[string]any
	// TTL is retained for symmetry with Metadata; the effective expiry is the
	// expires_at value already baked into Metadata.
	TTL time.Duration
	// contains filtered or unexported fields
}

StreamAccumulator collects the chunks of a single streaming response so they can be flushed as one cache entry on the final chunk.

type StreamChunk

type StreamChunk struct {
	// Timestamp records when this chunk arrived at PostLLMHook. Used by the
	// reaper to drop accumulators stuck without a final chunk.
	Timestamp time.Time
	// Response is the chunk payload as delivered by the provider.
	Response *schemas.BifrostResponse
}

StreamChunk is one chunk from a streaming response, retained until the stream completes so it can be persisted as part of the cache entry.

type TestSetup

type TestSetup struct {
	Logger schemas.Logger
	Store  vectorstore.VectorStore
	Plugin schemas.LLMPlugin
	Client *bifrost.Bifrost
	Config *Config
}

TestSetup contains common test setup components

func CreateTestSetupWithConversationThreshold added in v1.2.6

func CreateTestSetupWithConversationThreshold(t *testing.T, threshold int) *TestSetup

CreateTestSetupWithConversationThreshold creates a test setup with custom conversation history threshold

func CreateTestSetupWithExcludeSystemPrompt added in v1.2.6

func CreateTestSetupWithExcludeSystemPrompt(t *testing.T, excludeSystem bool) *TestSetup

CreateTestSetupWithExcludeSystemPrompt creates a test setup with ExcludeSystemPrompt setting

func CreateTestSetupWithThresholdAndExcludeSystem added in v1.2.6

func CreateTestSetupWithThresholdAndExcludeSystem(t *testing.T, threshold int, excludeSystem bool) *TestSetup

CreateTestSetupWithThresholdAndExcludeSystem creates a test setup with both conversation threshold and exclude system prompt settings

func NewTestSetup

func NewTestSetup(t *testing.T) *TestSetup

NewTestSetup creates a new test setup with default configuration

func NewTestSetupWithConfig

func NewTestSetupWithConfig(t *testing.T, config *Config) *TestSetup

NewTestSetupWithConfig creates a new test setup with custom configuration

func NewTestSetupWithVectorStore added in v1.4.15

func NewTestSetupWithVectorStore(t *testing.T, config *Config, storeType vectorstore.VectorStoreType) *TestSetup

NewTestSetupWithVectorStore creates a new test setup with custom configuration and vector store type

func (*TestSetup) Cleanup

func (ts *TestSetup) Cleanup()

Cleanup cleans up test resources
