Documentation ¶
Overview ¶
Package embedding provides text embedding generation for memory vector search.
Constants ¶
const (
	// KindNoop identifies the placeholder provider returned when no
	// embedder is configured. Callers MUST treat KindNoop as
	// "embedding unavailable" and refuse to persist vectors from it.
	KindNoop = "noop"
	// KindOllama identifies the Ollama-backed provider.
	KindOllama = "ollama"
)
Kind values for Provider.Kind. Used by callers (platform wiring, toolkit write paths) to distinguish a real, network-backed embedder from the placeholder noop. A noop returns zero vectors with no error, which is indistinguishable from a "real" embedding at the Embed/EmbedBatch contract level; without an explicit kind, downstream consumers cannot tell whether the vectors they hold are meaningful.
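The point about the noop being indistinguishable at the Embed contract level can be illustrated with a small sketch. Everything here is a local stand-in so the example compiles on its own: fakeNoop and shouldPersist are hypothetical names, and the real noop provider's internals are not shown in this documentation.

```go
package main

import (
	"context"
	"fmt"
)

// Local copies of the documented constants and interface, so the sketch
// compiles without importing the real package.
const (
	KindNoop   = "noop"
	KindOllama = "ollama"
)

type Provider interface {
	Embed(ctx context.Context, text string) ([]float32, error)
	EmbedBatch(ctx context.Context, texts []string) ([][]float32, error)
	Dimension() int
	Kind() string
}

// fakeNoop is a hypothetical placeholder: zero vectors and a nil error.
type fakeNoop struct{ dim int }

func (n fakeNoop) Embed(context.Context, string) ([]float32, error) {
	return make([]float32, n.dim), nil
}

func (n fakeNoop) EmbedBatch(_ context.Context, texts []string) ([][]float32, error) {
	out := make([][]float32, len(texts))
	for i := range texts {
		out[i] = make([]float32, n.dim)
	}
	return out, nil
}

func (n fakeNoop) Dimension() int { return n.dim }
func (n fakeNoop) Kind() string   { return KindNoop }

// shouldPersist is a hypothetical write-path guard: it refuses vectors
// from a nil provider or from the noop placeholder.
func shouldPersist(p Provider) bool {
	return p != nil && p.Kind() != KindNoop
}

func main() {
	p := fakeNoop{dim: 4}
	vec, err := p.Embed(context.Background(), "hello")
	fmt.Println(vec, err)         // the call "succeeds" with an all-zero vector
	fmt.Println(shouldPersist(p)) // only Kind reveals that the vector is meaningless
}
```

The error return alone gives the caller nothing to act on, which is why the guard keys off Kind rather than the result of Embed.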
const DefaultDimension = 768
DefaultDimension is the default embedding dimensionality (nomic-embed-text).
const DefaultMaxInputBytes = 6000
DefaultMaxInputBytes bounds the byte length of each text the provider sends to Ollama. The platform must cap input itself rather than rely on Ollama's truncate flag, which is unreliable: against Ollama 0.18.0 + nomic-embed-text at a 2048-token context, real content that exceeds the context returns HTTP 400 "the input length exceeds the context length" EVEN with truncate:true, because Ollama's Go-layer truncation and the runner's tokenizer disagree on the token count for some content. Plain prose embeds at ~3.4 chars/token, so the ~2048-token boundary sits near 7000 bytes; 6000 leaves margin for tokenizer drift and denser content (code, JSON specs). Operators running a larger-context model can raise this via config. The cap only trims the text that is embedded; the full content is still stored. See #623.
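The arithmetic behind the cap is 2048 tokens × 3.4 chars/token ≈ 6963 bytes for plain ASCII prose, which is where the "near 7000" figure comes from. The following is a minimal sketch of a byte cap, not the provider's actual code: capBytes is a hypothetical helper, and backing up to a UTF-8 rune boundary is an assumption about how one might avoid sending a split character.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Local copy of the documented default.
const DefaultMaxInputBytes = 6000

// capBytes (hypothetical) trims s to at most max bytes without splitting
// a UTF-8 sequence. Zero or negative max selects DefaultMaxInputBytes,
// mirroring the documented OllamaConfig.MaxInputBytes rule.
func capBytes(s string, max int) string {
	if max <= 0 {
		max = DefaultMaxInputBytes
	}
	if len(s) <= max {
		return s
	}
	cut := max
	// Back up while the byte at the cut point is a continuation byte.
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut]
}

func main() {
	// "é" occupies bytes 1 and 2, so a 2-byte cap must back up to 1 byte.
	fmt.Println(capBytes("héllo", 2)) // prints "h"
}
```

Only the copy handed to the embedder would be trimmed this way; per the text above, the stored content stays complete.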
const DefaultTimeout = 30
DefaultTimeout is the default HTTP timeout in seconds for embedding API calls. Tuned for the singular /api/embeddings path (one text per call), where a CPU-only Ollama with nomic-embed-text typically returns in 1-3 seconds; 30s is a generous ceiling for transient slowness on the request path. Synchronous request-path callers (memory_recall, memory_manage, knowledge capture_insight, apigateway query-vector) share this default so a wedged Ollama fails the tool call at 30s instead of holding an MCP request handler open for minutes.
The batched /api/embed path used by the api-gateway embed-jobs worker needs a much higher ceiling (CPU-only Ollama on a 32-text batch can take 60+ seconds). The worker constructs its own Provider with a longer timeout from apigateway.embed_jobs.embed_timeout — see pkg/platform/apigateway_embed_jobs.go. The default here intentionally does NOT cover the batch case so request-path consumers are not caught up in the worker's longer budget (#445).
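One detail worth making concrete: DefaultTimeout is a bare number of seconds, while OllamaConfig.Timeout is a time.Duration, so the constant has to be scaled by time.Second before use. The sketch below shows that conversion and the two separate budgets. timeoutOrDefault is a hypothetical helper, the fallback-on-zero rule is an assumption (the documentation does not say how a zero Timeout is treated), and the 5-minute worker value is illustrative only.

```go
package main

import (
	"fmt"
	"time"
)

// Local copy of the documented default, expressed in seconds.
const DefaultTimeout = 30

// timeoutOrDefault (hypothetical) returns the configured timeout, or the
// package default converted to a Duration when none is configured.
func timeoutOrDefault(configured time.Duration) time.Duration {
	if configured <= 0 {
		return time.Duration(DefaultTimeout) * time.Second
	}
	return configured
}

func main() {
	// Request-path callers take the short default.
	requestPath := timeoutOrDefault(0)
	// The batch worker supplies its own, longer budget; 5 minutes stands
	// in for whatever apigateway.embed_jobs.embed_timeout is set to.
	worker := timeoutOrDefault(5 * time.Minute)
	fmt.Println(requestPath, worker) // prints "30s 5m0s"
}
```

Passing DefaultTimeout directly as a Duration would mean 30 nanoseconds, which is the mistake the explicit conversion guards against.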
Variables ¶
This section is empty.
Functions ¶
func EmbedForSearch ¶ added in v1.81.0
EmbedForSearch returns a query embedding for relevance ranking, or nil to signal that the caller should fall back to lexical-only ranking. It returns nil when no real provider is configured, when the embed call errors, or when the result is a zero vector (the noop placeholder's output). This is the one hybrid-vs-lexical decision shared by every request-path search surface (recall_insight, the portal knowledge/asset search), so they cannot drift.
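The three nil cases described above can be sketched as follows. This is not the package's implementation, and the real EmbedForSearch signature is not shown in this documentation: queryVector, rankingMode, embedder, stub, and errDown are all hypothetical names used so the example is self-contained.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// embedder is the slice of the Provider interface this sketch needs.
type embedder interface {
	Embed(ctx context.Context, text string) ([]float32, error)
	Kind() string
}

// stub is a hypothetical provider with canned results.
type stub struct {
	kind string
	vec  []float32
	err  error
}

func (s stub) Embed(context.Context, string) ([]float32, error) { return s.vec, s.err }
func (s stub) Kind() string                                      { return s.kind }

var errDown = errors.New("ollama unreachable")

// queryVector returns a usable query embedding, or nil when the caller
// should fall back to lexical-only ranking.
func queryVector(ctx context.Context, p embedder, q string) []float32 {
	if p == nil || p.Kind() == "noop" {
		return nil // no real provider configured
	}
	v, err := p.Embed(ctx, q)
	if err != nil {
		return nil // embed call failed
	}
	for _, x := range v {
		if x != 0 {
			return v
		}
	}
	return nil // all-zero (or empty) vector
}

// rankingMode shows the single decision point a search surface would use.
func rankingMode(p embedder, q string) string {
	if queryVector(context.Background(), p, q) == nil {
		return "lexical"
	}
	return "hybrid"
}

func main() {
	fmt.Println(rankingMode(stub{kind: "ollama", vec: []float32{0.1, 0.2}}, "q")) // hybrid
	fmt.Println(rankingMode(stub{kind: "ollama", err: errDown}, "q"))             // lexical
	fmt.Println(rankingMode(stub{kind: "noop", vec: []float32{0, 0}}, "q"))       // lexical
}
```

Routing every search surface through one function like this is what keeps the fallback rule from drifting between callers.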
func IsConfigured ¶ added in v1.64.0
IsConfigured reports whether p is a real, configured embedding provider whose vectors are safe to persist. Returns false for nil and for the noop placeholder. Used by the platform wiring layer and toolkit write paths as a single-line guard.
func IsZeroVector ¶ added in v1.80.0
IsZeroVector reports whether every component of v is zero, the signature of the noop provider's output (an unconfigured embedder). Cosine similarity against a zero vector is meaningless, so request-path callers (memory_recall, the portal knowledge/memory search) use this to degrade to lexical ranking. Shared so every surface makes the same hybrid-vs-lexical decision and they cannot drift.
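To see why cosine similarity against a zero vector is meaningless: the formula divides the dot product by the product of the two norms, and a zero vector has norm 0, so the result is 0/0. The sketch below is a local illustration, not the package's code; isZeroVector and cosine are hypothetical lowercase stand-ins, and treating an empty slice as zero is an assumption about the edge case.

```go
package main

import (
	"fmt"
	"math"
)

// isZeroVector reports whether every component of v is zero.
// An empty or nil slice is vacuously zero in this sketch.
func isZeroVector(v []float32) bool {
	for _, x := range v {
		if x != 0 {
			return false
		}
	}
	return true
}

// cosine computes dot(a, b) / (|a| * |b|) for equal-length vectors.
func cosine(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	zero := make([]float32, 3)
	doc := []float32{0.2, 0.5, 0.1}

	fmt.Println(isZeroVector(zero), isZeroVector(doc)) // true false
	fmt.Println(math.IsNaN(cosine(zero, doc)))         // true: 0/0 carries no ranking signal
}
```

Checking the query vector once up front is cheaper and clearer than detecting NaN scores after the similarity pass.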
func ModelName ¶ added in v1.74.0
ModelName returns p's underlying embedding model identifier when the concrete provider exposes one, else "". The memory write path stamps this on each row (embedding_model) and the indexjobs memory Sink diffs stored rows against the current provider's model to find model-swap gaps, so both sides must read the model the same way. Mirrors the unexported indexjobs.providerModel.
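The phrase "when the concrete provider exposes one" suggests an optional-capability check. The sketch below shows one common Go way to do that, a type assertion against a small interface, together with the model-swap comparison described above. It is an assumption about the mechanism, not the package's code: modelNamer, its ModelName method, modelName, and needsReembed are all hypothetical names.

```go
package main

import "fmt"

// modelNamer is a hypothetical optional interface. How the real package
// discovers a provider's model is not shown in this documentation.
type modelNamer interface {
	ModelName() string
}

// withModel exposes a model identifier; withoutModel does not.
type withModel struct{ model string }

func (w withModel) ModelName() string { return w.model }

type withoutModel struct{}

// modelName returns the provider's model when it implements modelNamer,
// else "".
func modelName(p any) string {
	if m, ok := p.(modelNamer); ok {
		return m.ModelName()
	}
	return ""
}

// needsReembed flags a stored row whose stamped model differs from the
// current provider's model. An unknown current model never flags a row.
func needsReembed(storedModel, currentModel string) bool {
	return currentModel != "" && storedModel != currentModel
}

func main() {
	current := modelName(withModel{model: "nomic-embed-text"})
	fmt.Println(current)                                // nomic-embed-text
	fmt.Printf("%q\n", modelName(withoutModel{}))       // ""
	fmt.Println(needsReembed("older-model", current))   // true
	fmt.Println(needsReembed("nomic-embed-text", current)) // false
}
```

If the write path and the gap finder derived the model string differently, every row could look like a mismatch, which is why the text insists both sides read it the same way.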
Types ¶
type OllamaConfig ¶
type OllamaConfig struct {
URL string
Model string
Timeout time.Duration
// MaxInputBytes caps the byte length of each text sent to Ollama.
// Zero or negative selects DefaultMaxInputBytes. See that constant
// for why the platform bounds input itself rather than trusting
// Ollama's truncate flag.
MaxInputBytes int
}
OllamaConfig configures the Ollama embedding provider.
type Provider ¶
type Provider interface {
// Embed generates an embedding vector for a single text input.
Embed(ctx context.Context, text string) ([]float32, error)
// EmbedBatch generates embedding vectors for multiple text inputs.
EmbedBatch(ctx context.Context, texts []string) ([][]float32, error)
// Dimension returns the dimensionality of the generated embeddings.
Dimension() int
// Kind returns a short identifier for the provider implementation
// (KindOllama, KindNoop, ...). Callers use this to refuse to
// persist vectors from the noop placeholder without depending on
// concrete type assertions.
Kind() string
}
Provider generates vector embeddings from text.
func NewNoopProvider ¶
NewNoopProvider creates a no-op embedding provider.
func NewOllamaProvider ¶
func NewOllamaProvider(cfg OllamaConfig) Provider
NewOllamaProvider creates an embedding provider that calls Ollama.