embed

package
v1.0.0 Latest
Warning

This package is not in the latest version of its module.

Published: Apr 18, 2026 License: MIT Imports: 14 Imported by: 0

Documentation

Overview

Package embed defines embedder interfaces and a deterministic test fixture.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type ContextualEmbedder added in v0.4.0

type ContextualEmbedder interface {
	EmbedDocumentChunks(ctx context.Context, fullDoc string, chunks []string) ([][]float64, error)
}

ContextualEmbedder is an optional extension to Embedder: an implementation that also satisfies it can embed chunk texts with access to the full source document, enabling late chunking or other context-aware strategies without changing the base Embedder contract.

type Embedder

type Embedder interface {
	Fingerprint() string
	Dimension(ctx context.Context) (int, error)
	EmbedDocuments(ctx context.Context, texts []string) ([][]float64, error)
	EmbedQueries(ctx context.Context, texts []string) ([][]float64, error)
}

Embedder generates embeddings for stored records and live queries.

type Fixture

type Fixture struct {
	// contains filtered or unexported fields
}

Fixture is a deterministic embedder for tests and offline use.

func NewFixture

func NewFixture(model string, dimensions int) (*Fixture, error)

NewFixture returns a deterministic hash-based embedder.

func (*Fixture) Dimension

func (f *Fixture) Dimension(ctx context.Context) (int, error)

Dimension reports the embedding dimensionality.

func (*Fixture) EmbedDocuments

func (f *Fixture) EmbedDocuments(ctx context.Context, texts []string) ([][]float64, error)

EmbedDocuments returns deterministic embeddings for indexed document texts.

func (*Fixture) EmbedQueries

func (f *Fixture) EmbedQueries(ctx context.Context, texts []string) ([][]float64, error)

EmbedQueries returns deterministic embeddings for query texts.

func (*Fixture) Fingerprint

func (f *Fixture) Fingerprint() string

Fingerprint returns a stable identifier for the fixture configuration.

type OpenAI added in v0.3.0

type OpenAI struct {
	// contains filtered or unexported fields
}

OpenAI implements Embedder against an OpenAI-compatible HTTP embeddings API.

func NewOpenAI added in v0.3.0

func NewOpenAI(cfg OpenAIConfig) *OpenAI

NewOpenAI returns an OpenAI-compatible embedder bound to cfg.

func (*OpenAI) Config added in v0.3.0

func (e *OpenAI) Config() OpenAIConfig

Config returns the normalized embedder config.

func (*OpenAI) Dimension added in v0.3.0

func (e *OpenAI) Dimension(ctx context.Context) (int, error)

Dimension probes the upstream on first use and then reuses the cached value.

func (*OpenAI) EmbedDocuments added in v0.3.0

func (e *OpenAI) EmbedDocuments(ctx context.Context, texts []string) ([][]float64, error)

EmbedDocuments generates embeddings for indexed document texts.

func (*OpenAI) EmbedQueries added in v0.3.0

func (e *OpenAI) EmbedQueries(ctx context.Context, texts []string) ([][]float64, error)

EmbedQueries generates embeddings for query texts.

func (*OpenAI) Fingerprint added in v0.3.0

func (e *OpenAI) Fingerprint() string

Fingerprint returns a deterministic identifier for the endpoint, model, and input-preparation strategy.
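A stored fingerprint lets an index notice that its vectors came from a different endpoint, model, or input-preparation strategy. The derivation OpenAI.Fingerprint actually uses is not documented; the hypothetical fingerprint function below simply hashes those three inputs with SHA-256, and the "openai:" prefix and the strategy values are illustrative only.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// fingerprint (hypothetical) hashes every input that changes the produced
// vectors. NUL separators keep field boundaries unambiguous.
func fingerprint(baseURL, model, strategy string) string {
	sum := sha256.Sum256([]byte(baseURL + "\x00" + model + "\x00" + strategy))
	return "openai:" + hex.EncodeToString(sum[:8])
}

// needsReindex reports whether vectors stored under one fingerprint are
// incomparable with those the current embedder would produce.
func needsReindex(stored, current string) bool {
	return stored != current
}

func main() {
	stored := fingerprint("https://api.example.com/v1", "model-a", "raw")
	same := fingerprint("https://api.example.com/v1", "model-a", "raw")
	swapped := fingerprint("https://api.example.com/v1", "model-b", "raw")
	fmt.Println(needsReindex(stored, same), needsReindex(stored, swapped)) // false true
}
```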

func (*OpenAI) Unconfigured added in v0.3.0

func (e *OpenAI) Unconfigured() bool

Unconfigured reports whether the embedder is missing a base URL or model.

type OpenAIConfig added in v0.3.0

type OpenAIConfig struct {
	BaseURL  string
	Model    string
	Timeout  time.Duration
	APIToken string

	// MaxBatchSize caps how many inputs are sent in a single embeddings
	// request. EmbedDocuments and EmbedQueries chunk their input into
	// sub-requests of at most this size and concatenate the results in
	// order. Zero or negative values select defaultOpenAIMaxBatchSize
	// (512), which is conservative for self-hosted gateways; real OpenAI
	// accepts up to 2048 per request, and operators who know their
	// upstream can raise this explicitly.
	MaxBatchSize int
}

OpenAIConfig configures an OpenAI-compatible embedder.

APIToken is redacted from every log and display representation: String, GoString, and LogValue (slog.LogValuer) all render the token as "[REDACTED]". Direct field access still yields the raw value so the embedder itself can sign requests; redaction is defense in depth against accidental disclosure through fmt.Printf, slog.Info, or other log output.

JSON and text marshaling deliberately stay canonical: OpenAIConfig overrides neither MarshalJSON nor MarshalText. Because OpenAIConfig is a public configuration type, overriding either would silently break any caller that persists or round-trips the config. The credential would vanish on encode, and a MarshalText method would additionally redirect json.Marshal output through its redacted form. Callers that need a redacted view should marshal cfg.String() or build a dedicated view type.

func (OpenAIConfig) Enabled added in v0.3.0

func (c OpenAIConfig) Enabled() bool

Enabled reports whether the config is usable for requests.

func (OpenAIConfig) GoString added in v1.0.0

func (c OpenAIConfig) GoString() string

GoString returns a redacted Go-syntax rendering of the config for the %#v verb. Without it, %#v falls back to reflection and prints the raw APIToken value. Timeout is written as time.Duration(<nanoseconds>) so the result stays a valid Go composite literal that a reader could paste back into source; Duration's String form ("2s") is human-readable but is not valid Go syntax.

func (OpenAIConfig) LogValue added in v1.0.0

func (c OpenAIConfig) LogValue() slog.Value

LogValue implements slog.LogValuer so slog handlers, including the standard library's JSONHandler, render the config with APIToken redacted. Handlers resolve a LogValuer before falling back to json.Marshal, so this covers the structured-logging path without hijacking json.Marshal itself, which stays canonical for callers that round-trip the config through JSON.

func (OpenAIConfig) String added in v1.0.0

func (c OpenAIConfig) String() string

String returns a redacted, human-readable rendering of the config. fmt verbs %v and %s route through this method.
