Documentation
Overview ¶
Package embed defines embedder interfaces and a deterministic test fixture.
Index ¶
- type ContextualEmbedder
- type Embedder
- type Fixture
- type OpenAI
- func (e *OpenAI) Config() OpenAIConfig
- func (e *OpenAI) Dimension(ctx context.Context) (int, error)
- func (e *OpenAI) EmbedDocuments(ctx context.Context, texts []string) ([][]float64, error)
- func (e *OpenAI) EmbedQueries(ctx context.Context, texts []string) ([][]float64, error)
- func (e *OpenAI) Fingerprint() string
- func (e *OpenAI) Unconfigured() bool
- type OpenAIConfig
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type ContextualEmbedder ¶ added in v0.4.0
type ContextualEmbedder interface {
EmbedDocumentChunks(ctx context.Context, fullDoc string, chunks []string) ([][]float64, error)
}
ContextualEmbedder optionally embeds chunk texts with access to the full source document, enabling late-chunking or other context-aware strategies without changing the base Embedder contract.
type Embedder ¶
type Embedder interface {
Fingerprint() string
Dimension(ctx context.Context) (int, error)
EmbedDocuments(ctx context.Context, texts []string) ([][]float64, error)
EmbedQueries(ctx context.Context, texts []string) ([][]float64, error)
}
Embedder generates embeddings for stored records and live queries.
type Fixture ¶
type Fixture struct {
// contains filtered or unexported fields
}
Fixture is a deterministic embedder for tests and offline use.
func NewFixture ¶
NewFixture returns a deterministic hash-based embedder.
func (*Fixture) EmbedDocuments ¶
func (f *Fixture) EmbedDocuments(ctx context.Context, texts []string) ([][]float64, error)
EmbedDocuments returns deterministic embeddings for indexed document texts.
func (*Fixture) EmbedQueries ¶
func (f *Fixture) EmbedQueries(ctx context.Context, texts []string) ([][]float64, error)
EmbedQueries returns deterministic embeddings for query texts.
func (*Fixture) Fingerprint ¶
func (f *Fixture) Fingerprint() string
Fingerprint returns a stable identifier for the fixture configuration.
type OpenAI ¶ added in v0.3.0
type OpenAI struct {
// contains filtered or unexported fields
}
OpenAI implements Embedder against an OpenAI-compatible HTTP embeddings API.
func NewOpenAI ¶ added in v0.3.0
func NewOpenAI(cfg OpenAIConfig) *OpenAI
NewOpenAI returns an OpenAI-compatible embedder bound to cfg.
func (*OpenAI) Config ¶ added in v0.3.0
func (e *OpenAI) Config() OpenAIConfig
Config returns the normalized embedder config.
func (*OpenAI) Dimension ¶ added in v0.3.0
func (e *OpenAI) Dimension(ctx context.Context) (int, error)
Dimension probes the upstream on first use and then reuses the cached value.
func (*OpenAI) EmbedDocuments ¶ added in v0.3.0
func (e *OpenAI) EmbedDocuments(ctx context.Context, texts []string) ([][]float64, error)
EmbedDocuments generates embeddings for indexed document texts.
func (*OpenAI) EmbedQueries ¶ added in v0.3.0
func (e *OpenAI) EmbedQueries(ctx context.Context, texts []string) ([][]float64, error)
EmbedQueries generates embeddings for query texts.
func (*OpenAI) Fingerprint ¶ added in v0.3.0
func (e *OpenAI) Fingerprint() string
Fingerprint returns a deterministic identifier for the endpoint, model, and input-preparation strategy.
func (*OpenAI) Unconfigured ¶ added in v0.3.0
func (e *OpenAI) Unconfigured() bool
Unconfigured reports whether the embedder is missing a base URL or model.
type OpenAIConfig ¶ added in v0.3.0
type OpenAIConfig struct {
BaseURL string
Model string
Timeout time.Duration
APIToken string
// MaxBatchSize caps how many inputs are sent in a single embeddings
// request. EmbedDocuments and EmbedQueries chunk their input into
// sub-requests of at most this size and concatenate the results in
// order. Zero or negative values select defaultOpenAIMaxBatchSize
// (512), which is conservative for self-hosted gateways; real OpenAI
// accepts up to 2048 per request, and operators who know their
// upstream can raise this explicitly.
MaxBatchSize int
}
OpenAIConfig configures an OpenAI-compatible embedder.
APIToken is redacted from every log/display representation — String, GoString, and slog.LogValuer all render the token as "[REDACTED]". Direct field access still yields the raw value so the embedder itself can sign requests; redaction is defense-in-depth against accidental fmt.Printf / slog.Info / log line disclosure.
json.Marshal and encoding.TextMarshaler deliberately stay canonical: OpenAIConfig is a public configuration type, so overriding either would silently break any caller that persists or round-trips the config. The credential would vanish on encode, and because json.Marshal prefers a MarshalText implementation when one exists, a redacting TextMarshaler would also redirect json.Marshal output through the redacted form. Callers that need a redacted view should marshal cfg.String() or build a dedicated view type.
func (OpenAIConfig) Enabled ¶ added in v0.3.0
func (c OpenAIConfig) Enabled() bool
Enabled reports whether the config is usable for requests.
func (OpenAIConfig) GoString ¶ added in v1.0.0
func (c OpenAIConfig) GoString() string
GoString returns a redacted Go-syntax rendering of the config for %#v. Without this, %#v falls back to reflection and surfaces the raw APIToken field value. Timeout is formatted as time.Duration(ns) so the result stays a valid Go composite literal a reader could paste back into source (Duration's default %s form "2s" is human-readable but not Go-parseable).
func (OpenAIConfig) LogValue ¶ added in v1.0.0
func (c OpenAIConfig) LogValue() slog.Value
LogValue implements slog.LogValuer so slog handlers — including the default JSONHandler — render the config with APIToken redacted. slog consults LogValuer before falling back to json.Marshal, so this covers the structured-logging path without hijacking json.Marshal itself (which stays canonical for callers that round-trip the config through JSON).
func (OpenAIConfig) String ¶ added in v1.0.0
func (c OpenAIConfig) String() string
String returns a redacted, human-readable rendering of the config. fmt verbs %v and %s route through this method.