Documentation ¶
Overview ¶
Package embed defines the embedder interfaces, a deterministic test fixture, and an embedder for OpenAI-compatible HTTP embeddings APIs.
Index ¶
- type ContextualEmbedder
- type Embedder
- type Fixture
- type OpenAI
- func (e *OpenAI) Config() OpenAIConfig
- func (e *OpenAI) Dimension(ctx context.Context) (int, error)
- func (e *OpenAI) EmbedDocuments(ctx context.Context, texts []string) ([][]float64, error)
- func (e *OpenAI) EmbedQueries(ctx context.Context, texts []string) ([][]float64, error)
- func (e *OpenAI) Fingerprint() string
- func (e *OpenAI) Unconfigured() bool
- type OpenAIConfig
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type ContextualEmbedder ¶
type ContextualEmbedder interface {
	Embedder
	EmbedDocumentChunks(ctx context.Context, fullDoc string, chunks []string) ([][]float64, error)
}
ContextualEmbedder extends Embedder with an optional chunk-aware entry point that sees the full source document alongside the chunk texts, enabling late-chunking or other context-aware strategies. Embedding Embedder in the interface makes the relationship explicit in the type system: every ContextualEmbedder is also an Embedder, so when the index type-asserts an Embedder value to ContextualEmbedder at runtime, it upgrades to a superset of the same implementation rather than switching to a separate, disjoint one.
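The runtime upgrade described above can be sketched as follows. This is a minimal illustration, not the index's actual code: the two interfaces are copied from this page so the example compiles on its own, while `plain`, `contextual`, and `embedChunks` are hypothetical names invented for the sketch.

```go
package main

import (
	"context"
	"fmt"
)

// Embedder and ContextualEmbedder mirror the interfaces documented on this page.
type Embedder interface {
	Fingerprint() string
	Dimension(ctx context.Context) (int, error)
	EmbedDocuments(ctx context.Context, texts []string) ([][]float64, error)
	EmbedQueries(ctx context.Context, texts []string) ([][]float64, error)
}

type ContextualEmbedder interface {
	Embedder
	EmbedDocumentChunks(ctx context.Context, fullDoc string, chunks []string) ([][]float64, error)
}

// plain is a hypothetical Embedder with no chunk-aware entry point.
type plain struct{}

func (plain) Fingerprint() string                    { return "plain" }
func (plain) Dimension(context.Context) (int, error) { return 1, nil }
func (plain) EmbedDocuments(_ context.Context, texts []string) ([][]float64, error) {
	out := make([][]float64, len(texts))
	for i, t := range texts {
		out[i] = []float64{float64(len(t))} // toy one-dimensional "embedding"
	}
	return out, nil
}
func (p plain) EmbedQueries(ctx context.Context, texts []string) ([][]float64, error) {
	return p.EmbedDocuments(ctx, texts)
}

// contextual is a hypothetical ContextualEmbedder: it reuses plain and adds
// the chunk-aware method.
type contextual struct{ plain }

func (contextual) EmbedDocumentChunks(_ context.Context, fullDoc string, chunks []string) ([][]float64, error) {
	out := make([][]float64, len(chunks))
	for i, c := range chunks {
		// Toy context-aware value: chunk length relative to the whole document.
		out[i] = []float64{float64(len(c)) / float64(len(fullDoc))}
	}
	return out, nil
}

// embedChunks (hypothetical) prefers the chunk-aware path when the Embedder
// also satisfies ContextualEmbedder and otherwise falls back to EmbedDocuments.
// The bool reports which path was taken.
func embedChunks(ctx context.Context, e Embedder, fullDoc string, chunks []string) ([][]float64, bool, error) {
	if ce, ok := e.(ContextualEmbedder); ok {
		vecs, err := ce.EmbedDocumentChunks(ctx, fullDoc, chunks)
		return vecs, true, err
	}
	vecs, err := e.EmbedDocuments(ctx, chunks)
	return vecs, false, err
}

func main() {
	ctx := context.Background()
	for _, e := range []Embedder{plain{}, contextual{}} {
		vecs, usedContext, err := embedChunks(ctx, e, "hello world", []string{"hello", "world"})
		fmt.Println(len(vecs), usedContext, err)
	}
}
```

Because the assertion goes from the narrower interface to the wider one, a caller written against Embedder keeps working unchanged when handed an implementation that lacks EmbedDocumentChunks.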
type Embedder ¶
type Embedder interface {
	Fingerprint() string
	Dimension(ctx context.Context) (int, error)
	EmbedDocuments(ctx context.Context, texts []string) ([][]float64, error)
	EmbedQueries(ctx context.Context, texts []string) ([][]float64, error)
}
Embedder generates embeddings for stored records and live queries.
type Fixture ¶
type Fixture struct {
	// contains filtered or unexported fields
}
Fixture is a deterministic embedder for tests and offline use.
Safe for concurrent use by multiple goroutines once constructed: Fixture is immutable post-NewFixture and every embed path operates on local state only.
func NewFixture ¶
NewFixture returns a deterministic hash-based embedder.
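For readers unfamiliar with the idea, a deterministic hash-based embedder can be sketched roughly as below. This is not Fixture's implementation, and NewFixture's signature is not shown on this page; `hashEmbed`, the choice of FNV-1a, and the unit-length normalization are all assumptions made for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
	"math"
)

// hashEmbed is a hypothetical hash-based embedder: each component is derived
// from a hash of (component index, text), so equal inputs always produce
// equal vectors and no network or model is involved.
func hashEmbed(text string, dim int) []float64 {
	vec := make([]float64, dim)
	var sumSq float64
	for i := range vec {
		h := fnv.New64a()
		var idx [8]byte
		binary.LittleEndian.PutUint64(idx[:], uint64(i))
		h.Write(idx[:])
		h.Write([]byte(text))
		// Keep the top 53 bits and map them into [-1, 1).
		v := float64(h.Sum64()>>11)/float64(uint64(1)<<52) - 1
		vec[i] = v
		sumSq += v * v
	}
	// Normalize to unit length (an assumption of this sketch) so that
	// cosine similarity and dot product agree.
	if sumSq > 0 {
		norm := math.Sqrt(sumSq)
		for i := range vec {
			vec[i] /= norm
		}
	}
	return vec
}

func main() {
	a := hashEmbed("hello", 4)
	b := hashEmbed("hello", 4)
	same := true
	for i := range a {
		same = same && a[i] == b[i]
	}
	fmt.Println(len(a), same)
}
```

Vectors produced this way carry no semantic meaning; they are useful only for exercising storage, indexing, and retrieval plumbing with reproducible inputs.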
func (*Fixture) EmbedDocuments ¶
EmbedDocuments returns deterministic embeddings for indexed document texts.
func (*Fixture) EmbedQueries ¶
EmbedQueries returns deterministic embeddings for query texts.
func (*Fixture) Fingerprint ¶
Fingerprint returns a stable identifier for the fixture configuration.
type OpenAI ¶
type OpenAI struct {
	// contains filtered or unexported fields
}
OpenAI implements Embedder against an OpenAI-compatible HTTP embeddings API.
Safe for concurrent use by multiple goroutines once constructed: the cached dimension is guarded by a sync.Mutex, and http.Client is goroutine-safe by contract. Callers may share a single *OpenAI across a long-lived service without additional synchronization.
func NewOpenAI ¶
func NewOpenAI(cfg OpenAIConfig) *OpenAI
NewOpenAI returns an OpenAI-compatible embedder bound to cfg.
func (*OpenAI) Config ¶
func (e *OpenAI) Config() OpenAIConfig
Config returns the normalized embedder config.
func (*OpenAI) Dimension ¶
Dimension probes the upstream on first use and then reuses the cached value.
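A probe-once, mutex-guarded cache of this kind might look like the sketch below. The `dimCache` type, its `probe` field, and the value 1536 are hypothetical; whether the real embedder retries after a failed probe is not stated on this page, so the retry behavior here is an assumption.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// dimCache is a hypothetical stand-in for the cached-dimension logic.
type dimCache struct {
	mu    sync.Mutex
	dim   int                                    // 0 means "not probed yet"
	probe func(ctx context.Context) (int, error) // stands in for the upstream call
}

// Dimension probes on first use and serves the cached value afterwards.
// Holding the mutex across the probe means concurrent first callers wait
// instead of issuing duplicate probes.
func (c *dimCache) Dimension(ctx context.Context) (int, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.dim > 0 {
		return c.dim, nil
	}
	d, err := c.probe(ctx)
	if err != nil {
		return 0, err // assumption: failures are not cached, so the next call retries
	}
	if d <= 0 {
		return 0, errors.New("upstream reported a non-positive dimension")
	}
	c.dim = d
	return d, nil
}

// demo calls Dimension from several goroutines and reports the cached value
// together with the number of probes that actually ran.
func demo() (dim int, probes int) {
	c := &dimCache{}
	c.probe = func(context.Context) (int, error) {
		probes++ // safe: only ever runs while c.mu is held
		return 1536, nil
	}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, _ = c.Dimension(context.Background())
		}()
	}
	wg.Wait()
	dim, _ = c.Dimension(context.Background())
	return dim, probes
}

func main() {
	fmt.Println(demo()) // 1536 1
}
```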
func (*OpenAI) EmbedDocuments ¶
EmbedDocuments generates embeddings for indexed document texts.
func (*OpenAI) EmbedQueries ¶
EmbedQueries generates embeddings for query texts.
func (*OpenAI) Fingerprint ¶
Fingerprint returns a deterministic identifier for the endpoint, model, and input-preparation strategy.
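One way to derive such an identifier is to hash the three inputs together, as in the sketch below. The `fingerprint` function, the SHA-256 choice, the 16-character truncation, and the strategy label "raw" are assumptions for illustration; the format the package actually emits is not documented here.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// fingerprint is a hypothetical helper that derives a stable identifier from
// the endpoint, the model, and a label for the input-preparation strategy.
func fingerprint(baseURL, model, strategy string) string {
	h := sha256.New()
	for _, part := range []string{baseURL, model, strategy} {
		// Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
		fmt.Fprintf(h, "%d:%s", len(part), part)
	}
	return hex.EncodeToString(h.Sum(nil))[:16]
}

func main() {
	fmt.Println(fingerprint("https://example.invalid/v1", "model-a", "raw"))
	fmt.Println(fingerprint("https://example.invalid/v1", "model-b", "raw"))
}
```

Changing any of the three inputs changes the identifier, which lets a store detect that previously indexed vectors came from a different embedding setup.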
func (*OpenAI) Unconfigured ¶
Unconfigured reports whether the embedder is missing a base URL or model.
type OpenAIConfig ¶
type OpenAIConfig struct {
	BaseURL string
	Model   string
	// Timeout bounds a single embeddings sub-request. For inputs that
	// fit in a single batch (len(texts) <= MaxBatchSize) this is also
	// the total wall-clock budget. For multi-batch calls the total
	// budget is Timeout * ceil(len(texts)/MaxBatchSize): each
	// sub-request gets its own Timeout-sized window, and a slow early
	// batch cannot starve later batches. A zero or negative value
	// selects defaultOpenAITimeout (15s) at NewOpenAI time. Callers
	// that want a tighter global cap should pass a ctx with their own
	// deadline; the embedder honors whichever deadline trips first.
	Timeout  time.Duration
	APIToken string
	// MaxBatchSize caps how many inputs are sent in a single embeddings
	// request. EmbedDocuments and EmbedQueries chunk their input into
	// sub-requests of at most this size and concatenate the results in
	// order. Zero or negative values select defaultOpenAIMaxBatchSize
	// (512), which is conservative for self-hosted gateways; real OpenAI
	// accepts up to 2048 per request, and operators who know their
	// upstream can raise this explicitly.
	MaxBatchSize int
}
OpenAIConfig configures an OpenAI-compatible embedder.
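The batching and per-batch timeout rules in the field comments can be sketched as below. This is an approximation under stated assumptions, not the package's code: `embedBatched`, `sendFunc`, `batchCount`, and `demoBatchSizes` are hypothetical names, and the defaults 512 and 15s are taken from the comments above.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// sendFunc stands in for one HTTP embeddings sub-request (hypothetical).
type sendFunc func(ctx context.Context, batch []string) ([][]float64, error)

// batchCount returns ceil(n / maxBatch) for positive maxBatch.
func batchCount(n, maxBatch int) int { return (n + maxBatch - 1) / maxBatch }

// embedBatched splits texts into sub-requests of at most maxBatch inputs,
// gives each its own timeout window, and concatenates the results in order.
func embedBatched(ctx context.Context, texts []string, maxBatch int, timeout time.Duration, send sendFunc) ([][]float64, error) {
	if maxBatch <= 0 {
		maxBatch = 512 // documented default
	}
	if timeout <= 0 {
		timeout = 15 * time.Second // documented default
	}
	out := make([][]float64, 0, len(texts))
	for start := 0; start < len(texts); start += maxBatch {
		end := start + maxBatch
		if end > len(texts) {
			end = len(texts)
		}
		// A fresh window per batch, derived from ctx: an earlier caller
		// deadline on ctx still wins.
		bctx, cancel := context.WithTimeout(ctx, timeout)
		vecs, err := send(bctx, texts[start:end])
		cancel()
		if err != nil {
			return nil, fmt.Errorf("batch starting at input %d: %w", start, err)
		}
		if len(vecs) != end-start {
			return nil, fmt.Errorf("batch starting at input %d: got %d vectors for %d inputs", start, len(vecs), end-start)
		}
		out = append(out, vecs...)
	}
	return out, nil
}

// demoBatchSizes records the size of every sub-request for n inputs.
func demoBatchSizes(n, maxBatch int) []int {
	var sizes []int
	send := func(_ context.Context, batch []string) ([][]float64, error) {
		sizes = append(sizes, len(batch))
		return make([][]float64, len(batch)), nil
	}
	if _, err := embedBatched(context.Background(), make([]string, n), maxBatch, time.Second, send); err != nil {
		return nil
	}
	return sizes
}

func main() {
	fmt.Println(demoBatchSizes(5, 2)) // [2 2 1]
	// Worst-case budget for 1025 inputs with the documented defaults.
	fmt.Println(time.Duration(batchCount(1025, 512)) * 15 * time.Second) // 45s
}
```

A caller who cannot tolerate the multi-batch worst case would wrap the call in context.WithTimeout with a global cap, which is the approach the Timeout comment recommends.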
APIToken is redacted from every log/display representation: String, GoString, and slog.LogValuer all render the token as "[REDACTED]". Direct field access still yields the raw value so the embedder itself can sign requests; redaction is defense-in-depth against accidental disclosure through fmt.Printf, slog.Info, or any other log line.
JSON and text marshaling deliberately stay canonical: OpenAIConfig implements neither json.Marshaler nor encoding.TextMarshaler. It is a public configuration type, so overriding either would silently break any caller that persists or round-trips the config: the credential would vanish on encode, and a TextMarshaler implementation would also redirect json.Marshal output through its redacted form. Callers that need a redacted view should marshal cfg.String() or build a dedicated view type.
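The TextMarshaler hazard mentioned above is easy to reproduce with the standard library alone. In the sketch below both struct types are hypothetical stand-ins for a config; only the encoding/json behavior is real.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// plainConfig is a hypothetical config with no custom marshalers, so it
// round-trips through JSON intact.
type plainConfig struct {
	Model    string
	APIToken string
}

// redactingConfig is a hypothetical variant that implements
// encoding.TextMarshaler in an attempt to hide the token.
type redactingConfig struct {
	Model    string
	APIToken string
}

func (redactingConfig) MarshalText() ([]byte, error) { return []byte("[REDACTED]"), nil }

// encode returns the JSON encoding of v as a string.
func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(encode(plainConfig{Model: "m", APIToken: "sk-secret"}))
	// {"Model":"m","APIToken":"sk-secret"}
	fmt.Println(encode(redactingConfig{Model: "m", APIToken: "sk-secret"}))
	// "[REDACTED]"
}
```

The second encoding collapses the whole struct into one JSON string, so the model is lost along with the token and the value can no longer be decoded back into a config.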
func (OpenAIConfig) Enabled ¶
func (c OpenAIConfig) Enabled() bool
Enabled reports whether the config is usable for requests.
func (OpenAIConfig) GoString ¶
func (c OpenAIConfig) GoString() string
GoString returns a redacted Go-syntax rendering of the config for %#v. Without this, %#v falls back to reflection and surfaces the raw APIToken field value. Timeout is formatted as time.Duration(ns) so the result stays a valid Go composite literal a reader could paste back into source (Duration's default %s form "2s" is human-readable but not Go-parseable).
func (OpenAIConfig) LogValue ¶
func (c OpenAIConfig) LogValue() slog.Value
LogValue implements slog.LogValuer so slog handlers, including the built-in JSONHandler, render the config with APIToken redacted. slog consults LogValuer before falling back to json.Marshal, so this covers the structured-logging path without hijacking json.Marshal itself (which stays canonical for callers that round-trip the config through JSON).
func (OpenAIConfig) String ¶
func (c OpenAIConfig) String() string
String returns a redacted, human-readable rendering of the config. fmt verbs %v and %s route through this method.
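Taken together, the three redaction hooks can be sketched on a stand-in type as below. The `config` struct, its rendering formats, and the `renderings` and `safe` helpers are hypothetical; OpenAIConfig's exact output strings are not shown on this page.

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
	"strings"
	"time"
)

const redacted = "[REDACTED]"

// config is a hypothetical stand-in for OpenAIConfig.
type config struct {
	BaseURL  string
	Model    string
	Timeout  time.Duration
	APIToken string
}

// String covers %v and %s.
func (c config) String() string {
	return fmt.Sprintf("config{BaseURL:%s Model:%s Timeout:%s APIToken:%s}",
		c.BaseURL, c.Model, c.Timeout, redacted)
}

// GoString covers %#v; Timeout is written as time.Duration(ns) so the
// rendering stays Go-parseable.
func (c config) GoString() string {
	return fmt.Sprintf("config{BaseURL:%q, Model:%q, Timeout:time.Duration(%d), APIToken:%q}",
		c.BaseURL, c.Model, int64(c.Timeout), redacted)
}

// LogValue covers slog handlers.
func (c config) LogValue() slog.Value {
	return slog.GroupValue(
		slog.String("base_url", c.BaseURL),
		slog.String("model", c.Model),
		slog.Duration("timeout", c.Timeout),
		slog.String("api_token", redacted),
	)
}

// renderings collects every display path the three methods are meant to cover.
func renderings(c config) []string {
	var buf bytes.Buffer
	slog.New(slog.NewJSONHandler(&buf, nil)).Info("config loaded", "config", c)
	return []string{
		fmt.Sprintf("%v", c),
		fmt.Sprintf("%s", c),
		fmt.Sprintf("%#v", c),
		buf.String(),
	}
}

// safe reports whether every rendering hides the token and shows the
// placeholder. It expects a non-empty APIToken.
func safe(c config) bool {
	for _, s := range renderings(c) {
		if strings.Contains(s, c.APIToken) || !strings.Contains(s, redacted) {
			return false
		}
	}
	return true
}

func main() {
	c := config{BaseURL: "https://example.invalid/v1", Model: "m", Timeout: 2 * time.Second, APIToken: "sk-secret"}
	fmt.Println(c)
	fmt.Println(safe(c), c.APIToken == "sk-secret") // true true
}
```

Direct field access still returns the raw token, matching the documented split between display paths and the value the embedder needs for signing requests.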