Documentation ¶
Overview ¶
Package openaicompat provides an OpenAI-compatible embedding API client that implements [embedder.Embedder].
It works with any provider that follows the OpenAI /v1/embeddings API shape: OpenAI, Voyage AI, Azure OpenAI, Ollama, LM Studio, and others. The client handles request batching, proactive rate limiting via a token bucket, and exponential backoff with jitter on 429 and 5xx responses. Every blocking point respects context cancellation.
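To make the token-bucket idea concrete, here is a minimal sketch of the accounting behind a sustained rate plus a burst allowance. It is not the package's limiter: the `bucket` type, `newBucket`, and `allowAt` are hypothetical names, and the sketch rejects a request when no token is available, whereas a client that honors context cancellation would more plausibly wait.

```go
package main

import (
	"fmt"
	"time"
)

// bucket is a minimal token bucket (hypothetical, for illustration only).
// It stores up to burst tokens and refills at rate tokens per second.
type bucket struct {
	rate   float64       // tokens added per second
	burst  float64       // maximum number of stored tokens
	tokens float64       // tokens currently available
	last   time.Duration // time of the last refill, measured from an arbitrary start
}

// newBucket returns a bucket that starts full.
func newBucket(rate float64, burst int) *bucket {
	return &bucket{rate: rate, burst: float64(burst), tokens: float64(burst)}
}

// allowAt refills the bucket for the time elapsed since the last call and
// reports whether a request at time now may proceed, consuming one token if so.
func (b *bucket) allowAt(now time.Duration) bool {
	b.tokens += (now - b.last).Seconds() * b.rate
	if b.tokens > b.burst {
		b.tokens = b.burst
	}
	b.last = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

func main() {
	b := newBucket(2, 2) // 2 requests per second, burst of 2
	fmt.Println(b.allowAt(0), b.allowAt(0), b.allowAt(0))
	fmt.Println(b.allowAt(500 * time.Millisecond))
}
```

With a rate of 2 and a burst of 2, the first two calls at time zero consume the stored tokens, the third is refused, and one more token becomes available after half a second.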
Index ¶
- Variables
- func CallAPI(ctx context.Context, client *http.Client, baseURL, apiKey string, ...) ([][]float32, error)
- func CallWithRetry(ctx context.Context, client *http.Client, sem *semaphore.Weighted, ...) ([][]float32, error)
- type APIError
- type Client
- type EmbeddingData
- type EmbeddingRequest
- type EmbeddingResponse
- type EmbeddingUsage
- type Options
Constants ¶
This section is empty.
Variables ¶
var ErrAPIError = errors.New("embedding API error")
ErrAPIError is returned when the API returns a non-transient 4xx error. Callers can use errors.Is to distinguish API errors from network or context errors.
var ErrRateLimited = errors.New("embedding API: rate limited, retries exhausted")
ErrRateLimited is returned when the API returns 429 and all retries are exhausted.
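A caller might branch on these sentinels roughly as follows. The two variables below are local stand-ins declared so the sketch compiles on its own; real code would refer to openaicompat.ErrAPIError and openaicompat.ErrRateLimited. The `classify` helper is hypothetical, and the assumption that the package wraps its sentinels so that errors.Is matches follows from the description above but is not shown in this page.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Local stand-ins for openaicompat.ErrAPIError and openaicompat.ErrRateLimited,
// declared here only so the sketch compiles without the package.
var (
	errAPIError    = errors.New("embedding API error")
	errRateLimited = errors.New("embedding API: rate limited, retries exhausted")
)

// classify maps an error to a coarse category a caller might act on.
func classify(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, errRateLimited):
		return "rate-limited"
	case errors.Is(err, errAPIError):
		return "api-error"
	case errors.Is(err, context.Canceled), errors.Is(err, context.DeadlineExceeded):
		return "context"
	default:
		return "other"
	}
}

func main() {
	// Assumption: sentinels are wrapped with %w, so errors.Is sees through the wrapper.
	wrapped := fmt.Errorf("status 400: %w", errAPIError)
	fmt.Println(classify(wrapped))
	fmt.Println(classify(context.Canceled))
}
```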
Functions ¶
func CallAPI ¶
func CallAPI(ctx context.Context, client *http.Client, baseURL, apiKey string, req EmbeddingRequest) ([][]float32, error)
CallAPI sends a single embedding request and returns vectors sorted by index. The API does not guarantee response ordering, so results are sorted by the index field to match the original input order.
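The reordering step can be sketched as below. The `item` type and its field names are assumptions modeled on the common OpenAI response shape; the fields of the package's EmbeddingData are not shown on this page and may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// item mirrors one embedding entry in an OpenAI-style response.
// Field names are an assumption, not the package's EmbeddingData.
type item struct {
	Index     int
	Embedding []float32
}

// sortedVectors orders items by Index and returns only the vectors,
// so that result i corresponds to input i.
func sortedVectors(items []item) [][]float32 {
	sorted := make([]item, len(items))
	copy(sorted, items) // avoid mutating the caller's slice
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Index < sorted[j].Index })

	out := make([][]float32, len(sorted))
	for i, it := range sorted {
		out[i] = it.Embedding
	}
	return out
}

func main() {
	// The response arrives out of order; sorting restores input order.
	resp := []item{
		{Index: 1, Embedding: []float32{0.2}},
		{Index: 0, Embedding: []float32{0.1}},
	}
	fmt.Println(sortedVectors(resp))
}
```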
func CallWithRetry ¶
func CallWithRetry(
	ctx context.Context,
	client *http.Client,
	sem *semaphore.Weighted,
	baseURL, apiKey string,
	req EmbeddingRequest,
	maxRetries int,
	baseDelay time.Duration,
) ([][]float32, error)
CallWithRetry calls CallAPI, applying exponential backoff with jitter on retryable errors. It retries on HTTP 429, HTTP 5xx, known-transient transport errors (EOF, connection reset, TLS bad_record_mac, etc.), and http.Client.Timeout expiry (which surfaces as context.DeadlineExceeded but is distinguishable because the caller's ctx is still live). It does not retry on non-429 4xx responses, fatal transport errors (DNS, certificate), or cancellation or expiry of the caller's context.
sem gates each HTTP attempt: acquired before CallAPI, released immediately after (before any backoff sleep) so slots are available to sibling goroutines during retry waits. Pass nil to disable gating.
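The acquire, attempt, release, then sleep ordering described above can be sketched as follows. This is not the package's implementation: a buffered channel stands in for *semaphore.Weighted so the example needs only the standard library, `errTransient` stands in for a 429 or 5xx response, and the "full jitter" strategy (a uniform random sleep below the exponential ceiling) is an assumption, since the page does not say which jitter scheme is used.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// errTransient marks a hypothetical retryable failure (standing in for 429/5xx).
var errTransient = errors.New("transient")

// backoffCeiling returns baseDelay * 2^attempt, the upper bound before jitter.
func backoffCeiling(baseDelay time.Duration, attempt int) time.Duration {
	return baseDelay << uint(attempt)
}

// retry runs call up to maxRetries+1 times. slots acts as a semaphore: a slot
// is held only while an attempt is in flight and is released before any
// backoff sleep. A nil slots channel disables gating.
func retry(ctx context.Context, slots chan struct{}, maxRetries int, baseDelay time.Duration, call func() error) error {
	var err error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		if slots != nil {
			select {
			case slots <- struct{}{}: // acquire a slot
			case <-ctx.Done():
				return ctx.Err()
			}
		}
		err = call()
		if slots != nil {
			<-slots // release before sleeping so siblings can proceed
		}
		if err == nil || !errors.Is(err, errTransient) {
			return err // success or non-retryable failure
		}
		if attempt == maxRetries {
			break // retries exhausted
		}
		// Full jitter (assumed): sleep a random duration in [0, ceiling).
		var sleep time.Duration
		if c := backoffCeiling(baseDelay, attempt); c > 0 {
			sleep = time.Duration(rand.Int63n(int64(c)))
		}
		select {
		case <-time.After(sleep):
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return err
}

// demo fails twice with a transient error, then succeeds on the third attempt.
func demo() (int, error) {
	calls := 0
	err := retry(context.Background(), make(chan struct{}, 1), 3, time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errTransient
		}
		return nil
	})
	return calls, err
}

func main() {
	fmt.Println(demo())
}
```

Releasing the slot before the sleep matters when many goroutines share a small semaphore: a goroutine that is merely waiting out its backoff does not hold up others that are ready to send.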
Types ¶
type Client ¶
type Client struct {
// contains filtered or unexported fields
}
Client is an OpenAI-compatible embedding API client that implements embedder.Embedder.
func New ¶
New constructs a Client from the given options, validating required fields. Zero-value numeric fields use built-in defaults (64 texts per batch, 5 retries, 500ms base delay, 8 concurrent sub-batches).
func (*Client) Embed ¶
Embed embeds a single text string and returns its float32 vector. It respects ctx cancellation at the rate-limiter wait and HTTP request.
func (*Client) EmbedBatch ¶
EmbedBatch is the all-or-nothing wrapper around EmbedBatchPartial. If any sub-batch fails, it returns the failures combined with errors.Join. This preserves backward-compatible semantics for callers that don't need per-text results.
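One way such an all-or-nothing wrapper could collapse per-text results is sketched below. The `allOrNothing` helper and `errBoom` are hypothetical; the page does not show how EmbedBatch is written, only that it uses errors.Join (available since Go 1.20).

```go
package main

import (
	"errors"
	"fmt"
)

// errBoom is a hypothetical per-text failure used in the demonstration.
var errBoom = errors.New("boom")

// allOrNothing collapses per-text results into a single outcome: if any entry
// in errs is non-nil, the vectors are discarded and the errors are joined.
// errors.Join ignores nil entries and returns nil when every entry is nil.
func allOrNothing(vectors [][]float32, errs []error) ([][]float32, error) {
	if err := errors.Join(errs...); err != nil {
		return nil, err
	}
	return vectors, nil
}

func main() {
	_, err := allOrNothing([][]float32{{1}, nil}, []error{nil, errBoom})
	// The joined error still matches the underlying cause via errors.Is.
	fmt.Println(errors.Is(err, errBoom))
}
```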
func (*Client) EmbedBatchPartial ¶
EmbedBatchPartial embeds texts in sub-batches with per-text failure isolation. Invariants: len(vectors)==len(errs)==len(texts); vectors[i] is nil iff errs[i] != nil.
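The stated invariants are easy to check mechanically, and the sub-batching itself amounts to slicing the input. Both are sketched below; `chunk`, `checkInvariants`, and `errFailed` are hypothetical helpers, and the fallback size of 64 is taken from the documented default for MaxBatchSize.

```go
package main

import (
	"errors"
	"fmt"
)

// errFailed is a hypothetical per-text failure used in the demonstration.
var errFailed = errors.New("embedding failed")

// chunk splits texts into consecutive sub-batches of at most size elements.
// A non-positive size falls back to 64, the documented default batch size.
func chunk(texts []string, size int) [][]string {
	if size <= 0 {
		size = 64
	}
	var out [][]string
	for start := 0; start < len(texts); start += size {
		end := start + size
		if end > len(texts) {
			end = len(texts)
		}
		out = append(out, texts[start:end])
	}
	return out
}

// checkInvariants reports whether a partial result satisfies the documented
// contract: equal lengths, and vectors[i] is nil exactly when errs[i] is set.
func checkInvariants(texts []string, vectors [][]float32, errs []error) bool {
	if len(vectors) != len(texts) || len(errs) != len(texts) {
		return false
	}
	for i := range texts {
		if (vectors[i] == nil) != (errs[i] != nil) {
			return false
		}
	}
	return true
}

func main() {
	texts := []string{"a", "b", "c", "d", "e"}
	fmt.Println(len(chunk(texts, 2))) // sub-batches of sizes 2, 2, 1

	vectors := [][]float32{{0.1}, nil}
	errs := []error{nil, errFailed}
	fmt.Println(checkInvariants(texts[:2], vectors, errs))
}
```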
func (*Client) HTTPClient ¶
HTTPClient returns the underlying *http.Client. Exposed for tests and for callers that need to tune transport behavior post-construction.
type EmbeddingData ¶
EmbeddingData is one embedding vector in the API response.
type EmbeddingRequest ¶
type EmbeddingRequest struct {
Model string `json:"model"`
Input []string `json:"input"`
Dimensions int `json:"dimensions,omitempty"`
InputType string `json:"input_type,omitempty"`
}
EmbeddingRequest is the JSON body sent to the embedding API.
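To see what the omitempty tags mean for the wire format, the sketch below marshals the struct as declared above. The `encode` helper is hypothetical; the struct definition is copied from this page so the example runs without the package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// EmbeddingRequest is copied from the declaration above.
type EmbeddingRequest struct {
	Model      string   `json:"model"`
	Input      []string `json:"input"`
	Dimensions int      `json:"dimensions,omitempty"`
	InputType  string   `json:"input_type,omitempty"`
}

// encode returns the JSON body for req as a string.
func encode(req EmbeddingRequest) (string, error) {
	b, err := json.Marshal(req)
	return string(b), err
}

func main() {
	// Zero-valued Dimensions and InputType are omitted from the body.
	body, err := encode(EmbeddingRequest{
		Model: "text-embedding-3-small",
		Input: []string{"hello"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(body) // {"model":"text-embedding-3-small","input":["hello"]}
}
```

Because Dimensions and InputType are dropped when unset, providers that do not recognize those fields never see them.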
type EmbeddingResponse ¶
type EmbeddingResponse struct {
Data []EmbeddingData `json:"data"`
Usage EmbeddingUsage `json:"usage"`
}
EmbeddingResponse is the JSON body returned by the embedding API.
type EmbeddingUsage ¶
type EmbeddingUsage struct {
PromptTokens int `json:"prompt_tokens,omitempty"`
TotalTokens int `json:"total_tokens"`
}
EmbeddingUsage reports token consumption.
type Options ¶
type Options struct {
BaseURL string // base URL of the embedding API, e.g. "https://api.openai.com/v1"
APIKey string // API key sent in the Authorization header
Model string // embedding model name, e.g. "text-embedding-3-small"
Dimensions int // embedding vector dimensions; 0 uses the model default
InputType string // optional provider-specific input type, e.g. "document" or "query" for Voyage AI
MaxBatchSize int // maximum texts per API call; 0 defaults to 64
MaxRetries int // maximum retry attempts on 429 and 5xx errors; 0 defaults to 5
RateLimit float64 // sustained requests per second; 0 disables rate limiting
RateBurst int // maximum burst above the sustained rate
RetryBaseDelay time.Duration // initial backoff delay before the first retry; doubles each attempt; 0 defaults to 500ms
HTTPClient *http.Client // optional custom HTTP client; nil uses a pooled client
// HTTPTimeout is applied only when HTTPClient is nil. Default 30s
// (preserved for direct API users); main.go overrides this to a higher
// value derived from config.EmbeddingHTTPTimeout.
HTTPTimeout time.Duration
Concurrency int // concurrent sub-batch HTTP requests; 0 defaults to 8
}
Options configures an OpenAI-compatible embedding client.
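The zero-value defaulting described for New and in the field comments can be pictured as below. This is only a sketch: the `options` type is a trimmed local copy of the numeric fields, `withDefaults` is a hypothetical helper, and the real constructor may apply defaults differently (for instance, HTTPTimeout is documented as applying only when HTTPClient is nil).

```go
package main

import (
	"fmt"
	"time"
)

// options is a trimmed local copy of the numeric fields of Options.
type options struct {
	MaxBatchSize   int
	MaxRetries     int
	RetryBaseDelay time.Duration
	HTTPTimeout    time.Duration
	Concurrency    int
}

// withDefaults returns a copy of o in which zero-value fields are replaced
// by the defaults stated in the documentation.
func withDefaults(o options) options {
	if o.MaxBatchSize == 0 {
		o.MaxBatchSize = 64
	}
	if o.MaxRetries == 0 {
		o.MaxRetries = 5
	}
	if o.RetryBaseDelay == 0 {
		o.RetryBaseDelay = 500 * time.Millisecond
	}
	if o.HTTPTimeout == 0 {
		o.HTTPTimeout = 30 * time.Second // documented default when HTTPClient is nil
	}
	if o.Concurrency == 0 {
		o.Concurrency = 8
	}
	return o
}

func main() {
	// Only MaxRetries is set; every other field picks up its default.
	fmt.Printf("%+v\n", withDefaults(options{MaxRetries: 2}))
}
```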