embed

package
v0.8.0
Published: Mar 29, 2026 License: MIT Imports: 22 Imported by: 0

Documentation

Overview

Package embed provides a fail-silent client for generating vector embeddings from an Ollama-compatible or OpenAI-compatible HTTP endpoint. A nil *Client is safe — all methods return (nil, nil) so callers can use the zero value without checking for configuration.

Supported endpoint formats (auto-detected from URL):

  • OpenAI (/v1/embeddings) — batch-capable, request {"model":...,"input":...}
  • Ollama (/api/embeddings) — serial-only, request {"model":...,"prompt":...}

The Embedder interface abstracts over the supported backends (builtin ONNX, Ollama, OpenAI), each converting text into float32 vectors for similarity search.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type BuiltinEmbedder

type BuiltinEmbedder struct {

	// P8-11: optional callback for model download lifecycle events.
	// eventType is "download_start" or "download_complete".
	OnModelEvent func(eventType string)
	// contains filtered or unexported fields
}

BuiltinEmbedder uses the pure-Go hugot library to run nomic-embed-text-v1.5 inference locally without any external dependencies. The quantized ONNX model (~137MB) is auto-downloaded from HuggingFace on first use and cached in the models directory. Output is Matryoshka-truncated to 384 dims.

Concurrent: a pool of pipeline instances allows bounded parallel inference. The pool size defaults to 3 — up to 3 Embed calls run simultaneously; additional callers block (respecting context) until a slot is returned.

Init retry: if the model download fails (e.g., no internet), subsequent Embed() calls retry initialization, so the embedder is never permanently broken.

func NewBuiltinEmbedder

func NewBuiltinEmbedder(modelsDir string) *BuiltinEmbedder

NewBuiltinEmbedder creates a BuiltinEmbedder that stores its model in modelsDir (typically ~/.synapses/models). The nomic-embed-text-v1.5 model is lazily downloaded on the first Embed() call. Uses a pool of 3 pipeline instances for concurrent inference. Output is 384-dim (Matryoshka truncated).

func NewBuiltinEmbedderWithPoolSize

func NewBuiltinEmbedderWithPoolSize(modelsDir string, poolSize int) *BuiltinEmbedder

NewBuiltinEmbedderWithPoolSize creates a BuiltinEmbedder with a custom pool size. poolSize is clamped to [1, 8].

func (*BuiltinEmbedder) Close

func (b *BuiltinEmbedder) Close() error

Close releases all hugot session resources in the pool. Blocks until all in-flight Embed calls complete and return their slots, then destroys all pipeline instances. Safe to call concurrently; second call is a no-op.

func (*BuiltinEmbedder) Embed

func (b *BuiltinEmbedder) Embed(ctx context.Context, text string) ([]float32, error)

Embed generates a 384-dimensional embedding for text using the builtin nomic-embed-text-v1.5 model (768 dims → Matryoshka truncation to 384). Concurrent: up to poolSize calls run in parallel; additional callers block until a pipeline slot is available (respecting context cancellation and shutdown).

If model initialization fails (e.g., no internet for download), subsequent calls will retry — the embedder is never permanently broken.

func (*BuiltinEmbedder) EmbedBatch

func (b *BuiltinEmbedder) EmbedBatch(ctx context.Context, texts []string) ([][]float32, error)

EmbedBatch embeds multiple texts in a single ONNX forward pass, which is significantly faster than calling Embed() N times. Returns one vector per input text in the same order. Uses one pool slot for the entire batch.

func (*BuiltinEmbedder) Model

func (b *BuiltinEmbedder) Model() string

Model returns the builtin model identifier.

func (*BuiltinEmbedder) StatusDetail

func (b *BuiltinEmbedder) StatusDetail() string

StatusDetail returns a human-readable string describing the current initialization state. Thread-safe. Four possible values:

  • "ready" — pipeline pool initialized, embeddings working
  • "model cached" — model on disk but pool not yet started; will initialize automatically on the first recall() call (e.g. after daemon restart)
  • "model not yet downloaded" — Embed() has never been called and no model found on disk; model will be downloaded on first recall()
  • "unavailable" — init was attempted but failed (download error, pipeline error, or air-gapped environment); Embed() will retry automatically

func (*BuiltinEmbedder) WarmUp

func (b *BuiltinEmbedder) WarmUp(ctx context.Context) error

WarmUp pre-initializes the embedder by downloading the model if not already cached and setting up the inference pipeline pool. Call at daemon startup in a background goroutine so the first Embed() call doesn't block on download. Safe to call concurrently — uses singleflight internally.

type Client

type Client struct {
	// contains filtered or unexported fields
}

Client calls an embedding endpoint to convert text into float32 vectors. Supports two formats, auto-detected from the endpoint URL:

  • OpenAI (/v1/embeddings) — OpenAI-compatible, supports batch
  • Ollama (/api/embeddings) — Ollama native, serial-only

A nil Client is safe to use — Embed returns (nil, nil).

func NewClient

func NewClient(endpoint, model string, opts ...Option) *Client

NewClient creates a Client for the given endpoint. Returns nil if endpoint is empty (embedding disabled). model defaults to "nomic-embed-text" when empty, which produces 768-dimensional vectors and is available in Ollama without any extra setup.

func (*Client) Embed

func (c *Client) Embed(ctx context.Context, text string) ([]float32, error)

Embed returns a vector embedding for text. The format is auto-detected from the endpoint URL:

  • "/v1/embeddings" → OpenAI format
  • otherwise → Ollama format

Returns (nil, nil) if the client is nil.

func (*Client) EmbedBatch

func (c *Client) EmbedBatch(ctx context.Context, texts []string) ([][]float32, error)

EmbedBatch returns vector embeddings for a batch of texts in one HTTP round-trip using the OpenAI batch format ({"model":...,"input":[...]}). Ollama does not support batching, so the client falls back to serial Embed() calls. Returns (nil, nil) if the client is nil or texts is empty.

func (*Client) Model

func (c *Client) Model() string

Model returns the configured embedding model name.

func (*Client) WarmUp

func (c *Client) WarmUp(_ context.Context) error

WarmUp is a no-op for the HTTP client embedder (model is managed by the remote server).

type Embedder

type Embedder interface {
	// Embed returns a vector embedding for text.
	// Returns (nil, nil) only when the embedder is intentionally disabled.
	Embed(ctx context.Context, text string) ([]float32, error)

	// WarmUp pre-initializes the embedder (e.g. downloads the model file).
	// Call at daemon startup in a background goroutine so the first Embed()
	// call doesn't block on model download. Implementations that need no
	// warmup should return nil immediately.
	WarmUp(ctx context.Context) error

	// Model returns the model name used for embedding generation.
	// Used as the model key in UpsertMemoryEmbedding for cache invalidation.
	Model() string

	// Close releases any resources held by the embedder (e.g. ONNX sessions).
	// Implementations where Close is a no-op should return nil.
	Close() error
}

Embedder generates vector embeddings from text. Implementations must be safe for concurrent use. A nil Embedder is NOT safe — callers must check.

type HTTPDoer

type HTTPDoer interface {
	Do(*http.Request) (*http.Response, error)
}

HTTPDoer is the interface for making HTTP requests; *http.Client satisfies it. It is exposed so callers can inject custom transports (retry, tracing, rate-limiting) or test doubles without needing a real network. This is the same pattern used by AWS SDK v2, google-cloud-go, and stripe-go.

type OllamaEmbedder

type OllamaEmbedder struct {
	// contains filtered or unexported fields
}

OllamaEmbedder wraps an embed.Client to satisfy the Embedder interface. Use this when embeddings mode is "ollama" — delegates to a local Ollama instance via the existing HTTP client.

func NewOllamaEmbedder

func NewOllamaEmbedder(endpoint, model string, opts ...Option) *OllamaEmbedder

NewOllamaEmbedder creates an Embedder that delegates to a local Ollama instance. endpoint is the Ollama API URL (e.g. "http://localhost:11434/api/embeddings"). model defaults to "nomic-embed-text" when empty. Returns nil if endpoint is empty.

func (*OllamaEmbedder) Close

func (o *OllamaEmbedder) Close() error

Close is a no-op for the Ollama embedder (HTTP client has no resources to release).

func (*OllamaEmbedder) Embed

func (o *OllamaEmbedder) Embed(ctx context.Context, text string) ([]float32, error)

Embed generates an embedding via the Ollama HTTP API.

func (*OllamaEmbedder) Model

func (o *OllamaEmbedder) Model() string

Model returns the configured embedding model name.

func (*OllamaEmbedder) WarmUp

func (o *OllamaEmbedder) WarmUp(ctx context.Context) error

WarmUp validates that the Ollama server has the embedding model available by performing a single test embed. Returns an error if the model is not pulled or the server is unreachable — callers can fall back to builtin ONNX.

type Option

type Option func(*Client)

Option is a functional option for Client construction.

func WithHTTPDoer

func WithHTTPDoer(d HTTPDoer) Option

WithHTTPDoer replaces the default *http.Client with a custom HTTPDoer. Use this to inject retry wrappers, custom transports, or test doubles.

// Production: add tracing
embed.NewClient(url, model, embed.WithHTTPDoer(tracedClient))

// Tests: inject a mock without needing a real HTTP server
embed.NewClient(url, model, embed.WithHTTPDoer(myMock))
