openaicompat

package
v1.0.0
Published: May 14, 2026 License: MPL-2.0 Imports: 16 Imported by: 0

Documentation

Overview

Package openaicompat provides an OpenAI-compatible embedding API client that implements [embedder.Embedder].

It works with any provider that follows the OpenAI /v1/embeddings API shape: OpenAI, Voyage AI, Azure OpenAI, Ollama, LM Studio, and others. The client handles request batching, proactive rate limiting via a token bucket, and exponential backoff with jitter on 429 and 5xx responses. Every blocking point respects context cancellation.

Constants

This section is empty.

Variables

var ErrAPIError = errors.New("embedding API error")

ErrAPIError is returned when the API responds with a non-retryable 4xx status (any 4xx other than 429). Callers can use errors.Is to distinguish API errors from network or context errors.

var ErrRateLimited = errors.New("embedding API: rate limited, retries exhausted")

ErrRateLimited is returned when the API returns 429 and all retries are exhausted.

Functions

func CallAPI

func CallAPI(ctx context.Context, client *http.Client, baseURL, apiKey string, req EmbeddingRequest) ([][]float32, error)

CallAPI sends a single embedding request and returns vectors sorted by index. The API does not guarantee response ordering, so results are sorted by the index field to match the original input order.

func CallWithRetry

func CallWithRetry(
	ctx context.Context,
	client *http.Client,
	sem *semaphore.Weighted,
	baseURL, apiKey string,
	req EmbeddingRequest,
	maxRetries int,
	baseDelay time.Duration,
) ([][]float32, error)

CallWithRetry calls CallAPI with exponential backoff + jitter on retryable errors. It retries on HTTP 429, 5xx, known-transient transport errors (EOF, connection reset, TLS bad_record_mac, etc.), and http.Client.Timeout expiry (which surfaces as context.DeadlineExceeded but is distinguishable because the caller's ctx is still live). It does not retry on 4xx (non-429), fatal transport errors (DNS, certificate), or cancellation/expiry of the caller's context.

sem gates each HTTP attempt: acquired before CallAPI, released immediately after (before any backoff sleep) so slots are available to sibling goroutines during retry waits. Pass nil to disable gating.

Types

type APIError

type APIError struct {
	StatusCode int
	Body       string
}

APIError describes a non-2xx response: StatusCode is the HTTP status code and Body is the response body as a string.

func (*APIError) Error

func (e *APIError) Error() string

type Client

type Client struct {
	// contains filtered or unexported fields
}

Client is an OpenAI-compatible embedding API client that implements embedder.Embedder.

func New

func New(opts Options) (*Client, error)

New constructs a Client from the given options, validating required fields. Zero-value numeric fields use built-in defaults (64 texts per batch, 5 retries, 500ms base delay, 8 concurrent sub-batches).

func (*Client) Embed

func (c *Client) Embed(ctx context.Context, text string) ([]float32, error)

Embed embeds a single text string and returns its float32 vector. It respects ctx cancellation both while waiting on the rate limiter and during the HTTP request.

func (*Client) EmbedBatch

func (c *Client) EmbedBatch(ctx context.Context, texts []string) ([][]float32, error)

EmbedBatch is the all-or-nothing wrapper around EmbedBatchPartial. If any sub-batch fails, it returns the failures combined with errors.Join, preserving backward-compatible semantics for callers that don't need per-text results.

func (*Client) EmbedBatchPartial

func (c *Client) EmbedBatchPartial(ctx context.Context, texts []string) ([][]float32, []error)

EmbedBatchPartial embeds texts in sub-batches with per-text failure isolation. Invariants: len(vectors)==len(errs)==len(texts); vectors[i] is nil iff errs[i] != nil.

func (*Client) HTTPClient

func (c *Client) HTTPClient() *http.Client

HTTPClient returns the underlying *http.Client. Exposed for tests and for callers that need to tune transport behavior post-construction.

type EmbeddingData

type EmbeddingData struct {
	Embedding []float32 `json:"embedding"`
	Index     int       `json:"index"`
}

EmbeddingData is one embedding vector in the API response.

type EmbeddingRequest

type EmbeddingRequest struct {
	Model      string   `json:"model"`
	Input      []string `json:"input"`
	Dimensions int      `json:"dimensions,omitempty"`
	InputType  string   `json:"input_type,omitempty"`
}

EmbeddingRequest is the JSON body sent to the embedding API.
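
To show what the omitempty tags do on the wire, the sketch below marshals a request with the struct copied from above. Zero-valued Dimensions and InputType are dropped from the JSON; the model names are only example values.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// EmbeddingRequest mirrors the package type of the same name.
type EmbeddingRequest struct {
	Model      string   `json:"model"`
	Input      []string `json:"input"`
	Dimensions int      `json:"dimensions,omitempty"`
	InputType  string   `json:"input_type,omitempty"`
}

// encode returns the JSON body for req as a string.
func encode(req EmbeddingRequest) (string, error) {
	b, err := json.Marshal(req)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Dimensions and InputType are unset, so they are omitted.
	minimal, _ := encode(EmbeddingRequest{
		Model: "text-embedding-3-small",
		Input: []string{"hello", "world"},
	})
	fmt.Println(minimal)

	// A provider-specific input type is included when non-empty.
	typed, _ := encode(EmbeddingRequest{
		Model:     "example-model",
		Input:     []string{"hi"},
		InputType: "document",
	})
	fmt.Println(typed)
}
```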

type EmbeddingResponse

type EmbeddingResponse struct {
	Data  []EmbeddingData `json:"data"`
	Usage EmbeddingUsage  `json:"usage"`
}

EmbeddingResponse is the JSON body returned by the embedding API.

type EmbeddingUsage

type EmbeddingUsage struct {
	PromptTokens int `json:"prompt_tokens,omitempty"`
	TotalTokens  int `json:"total_tokens"`
}

EmbeddingUsage reports token consumption.
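
A short sketch of decoding a response body into these types. The three structs are copied from above; the JSON payload is made up for the example, and decode is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// The three types below mirror the package types of the same names.
type EmbeddingData struct {
	Embedding []float32 `json:"embedding"`
	Index     int       `json:"index"`
}

type EmbeddingUsage struct {
	PromptTokens int `json:"prompt_tokens,omitempty"`
	TotalTokens  int `json:"total_tokens"`
}

type EmbeddingResponse struct {
	Data  []EmbeddingData `json:"data"`
	Usage EmbeddingUsage  `json:"usage"`
}

// decode parses a raw response body into an EmbeddingResponse.
func decode(body []byte) (EmbeddingResponse, error) {
	var resp EmbeddingResponse
	err := json.Unmarshal(body, &resp)
	return resp, err
}

func main() {
	// Made-up payload; unknown fields such as "object" are ignored.
	body := []byte(`{
		"object": "list",
		"data": [
			{"embedding": [0.5, 0.25], "index": 1},
			{"embedding": [0.75, 1.0], "index": 0}
		],
		"usage": {"prompt_tokens": 4, "total_tokens": 4}
	}`)
	resp, err := decode(body)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(len(resp.Data), resp.Data[0].Index, resp.Usage.TotalTokens)
}
```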

type Options

type Options struct {
	BaseURL        string        // base URL of the embedding API, e.g. "https://api.openai.com/v1"
	APIKey         string        // API key sent in the Authorization header
	Model          string        // embedding model name, e.g. "text-embedding-3-small"
	Dimensions     int           // embedding vector dimensions; 0 uses the model default
	InputType      string        // optional provider-specific input type, e.g. "document" or "query" for Voyage AI
	MaxBatchSize   int           // maximum texts per API call; 0 defaults to 64
	MaxRetries     int           // maximum retry attempts on 429 and 5xx errors; 0 defaults to 5
	RateLimit      float64       // sustained requests per second; 0 disables rate limiting
	RateBurst      int           // maximum burst above the sustained rate
	RetryBaseDelay time.Duration // initial backoff delay before the first retry; doubles each attempt; 0 defaults to 500ms
	HTTPClient     *http.Client  // optional custom HTTP client; nil uses a pooled client
	// HTTPTimeout is applied only when HTTPClient is nil. Default 30s
	// (preserved for direct API users); main.go overrides this to a higher
	// value derived from config.EmbeddingHTTPTimeout.
	HTTPTimeout time.Duration
	Concurrency int // concurrent sub-batch HTTP requests; 0 defaults to 8
}

Options configures an OpenAI-compatible embedding client.
