openaicompat

package

Version: v1.0.0 (latest) | Published: May 14, 2026 | License: MPL-2.0 | Imports: 16 | Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/ieshan/codamigo

Documentation ¶

Overview ¶

Package openaicompat provides an OpenAI-compatible embedding API client that implements [embedder.Embedder].

It works with any provider that follows the OpenAI /v1/embeddings API shape: OpenAI, Voyage AI, Azure OpenAI, Ollama, LM Studio, and others. The client handles request batching, proactive rate limiting via a token bucket, and exponential backoff with jitter on 429 and 5xx responses. Every blocking point respects context cancellation.

Index ¶

Variables
func CallAPI(ctx context.Context, client *http.Client, baseURL, apiKey string, ...) ([][]float32, error)
func CallWithRetry(ctx context.Context, client *http.Client, sem *semaphore.Weighted, ...) ([][]float32, error)
type APIError
- func (e *APIError) Error() string
type Client
- func New(opts Options) (*Client, error)
type EmbeddingData
type EmbeddingRequest
type EmbeddingResponse
type EmbeddingUsage
type Options

Constants ¶

This section is empty.

Variables ¶

var ErrAPIError = errors.New("embedding API error")

ErrAPIError is returned when the API returns a non-transient 4xx error. Callers can use errors.Is to distinguish API errors from network or context errors.

var ErrRateLimited = errors.New("embedding API: rate limited, retries exhausted")

ErrRateLimited is returned when the API returns 429 and all retries are exhausted.
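A minimal sketch of how a caller might branch on these sentinels with errors.Is. The two variables below are local stand-ins that mirror the documented declarations so the example compiles on its own; real code would reference the package's exported values. The classify helper and its return strings are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Local stand-ins for the package's exported sentinels (assumption: real
// callers would use the package-qualified names instead).
var (
	ErrAPIError    = errors.New("embedding API error")
	ErrRateLimited = errors.New("embedding API: rate limited, retries exhausted")
)

// classify is a hypothetical helper that maps an error onto a coarse action.
func classify(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, ErrRateLimited):
		return "back off and retry later"
	case errors.Is(err, ErrAPIError):
		return "fix the request"
	case errors.Is(err, context.Canceled), errors.Is(err, context.DeadlineExceeded):
		return "caller gave up"
	default:
		return "network or unknown"
	}
}

func main() {
	// errors.Is sees through wrapping, which a plain == comparison would not.
	wrapped := fmt.Errorf("embed batch 3: %w", ErrRateLimited)
	fmt.Println(classify(wrapped))          // back off and retry later
	fmt.Println(classify(context.Canceled)) // caller gave up
}
```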

Functions ¶

func CallAPI ¶

func CallAPI(ctx context.Context, client *http.Client, baseURL, apiKey string, req EmbeddingRequest) ([][]float32, error)

CallAPI sends a single embedding request and returns vectors sorted by index. The API does not guarantee response ordering, so results are sorted by the index field to match the original input order.
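The reordering step described above can be sketched as follows. This is not the package's code: the types are trimmed local copies of the documented response structs, and vectorsInInputOrder is a hypothetical name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Trimmed local copies of the documented response types.
type EmbeddingData struct {
	Embedding []float32 `json:"embedding"`
	Index     int       `json:"index"`
}

type EmbeddingResponse struct {
	Data []EmbeddingData `json:"data"`
}

// vectorsInInputOrder (hypothetical) decodes a response body and returns the
// vectors sorted by their index field, so position i matches input i.
func vectorsInInputOrder(body []byte) ([][]float32, error) {
	var resp EmbeddingResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode embedding response: %w", err)
	}
	sort.Slice(resp.Data, func(i, j int) bool {
		return resp.Data[i].Index < resp.Data[j].Index
	})
	out := make([][]float32, len(resp.Data))
	for i, d := range resp.Data {
		out[i] = d.Embedding
	}
	return out, nil
}

func main() {
	// The provider returned index 1 before index 0.
	body := []byte(`{"data":[{"index":1,"embedding":[0.5]},{"index":0,"embedding":[0.25]}]}`)
	vecs, err := vectorsInInputOrder(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(vecs) // [[0.25] [0.5]]
}
```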

func CallWithRetry ¶

func CallWithRetry(
	ctx context.Context,
	client *http.Client,
	sem *semaphore.Weighted,
	baseURL, apiKey string,
	req EmbeddingRequest,
	maxRetries int,
	baseDelay time.Duration,
) ([][]float32, error)

CallWithRetry calls CallAPI with exponential backoff + jitter on retryable errors. It retries on HTTP 429, 5xx, known-transient transport errors (EOF, connection reset, TLS bad_record_mac, etc.), and http.Client.Timeout expiry (which surfaces as context.DeadlineExceeded but is distinguishable because the caller's ctx is still live). It does not retry on 4xx (non-429), fatal transport errors (DNS, certificate), or cancellation/expiry of the caller's context.
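As a rough illustration of exponential backoff with jitter, the sketch below doubles a base delay per attempt and adds a random extra. The 50% jitter fraction, the maxDelay cap, and both function names are assumptions of this sketch; the documentation does not state how the package computes its jitter or whether it caps the delay.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// backoffDelay (hypothetical) returns the wait before retry number attempt
// (0-based): baseDelay doubled attempt times, capped at maxDelay, plus up to
// 50% random jitter.
func backoffDelay(attempt int, baseDelay, maxDelay time.Duration) time.Duration {
	if baseDelay <= 0 || maxDelay <= 0 {
		return 0
	}
	d := baseDelay
	for i := 0; i < attempt && d < maxDelay; i++ {
		d *= 2
	}
	if d > maxDelay {
		d = maxDelay
	}
	jitter := time.Duration(rand.Int63n(int64(d)/2 + 1))
	return d + jitter
}

// retryableStatus mirrors the documented rule for HTTP status codes only:
// 429 and 5xx are retried, other 4xx are not.
func retryableStatus(code int) bool {
	return code == 429 || (code >= 500 && code <= 599)
}

func main() {
	for attempt := 0; attempt < 4; attempt++ {
		fmt.Println(attempt, backoffDelay(attempt, 500*time.Millisecond, 30*time.Second))
	}
	fmt.Println(retryableStatus(429), retryableStatus(503), retryableStatus(404)) // true true false
}
```

Jitter spreads retries from concurrent callers so they do not all hit the provider at the same instant after a shared failure.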

sem gates each HTTP attempt: acquired before CallAPI, released immediately after (before any backoff sleep) so slots are available to sibling goroutines during retry waits. Pass nil to disable gating.
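The acquire, call, release, then sleep ordering can be sketched with a buffered channel standing in for *semaphore.Weighted, which keeps the example within the standard library. The names callWithGate, errTransient, and demo are hypothetical, and the fixed delay is a simplification of the real backoff.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

var errTransient = errors.New("transient failure")

// callWithGate (hypothetical) holds a gate slot only for the duration of each
// attempt. The slot is released before the retry sleep, so sibling goroutines
// can use it while this caller waits. A nil gate disables gating.
func callWithGate(ctx context.Context, gate chan struct{}, maxRetries int,
	delay time.Duration, attempt func() error) (int, error) {
	for try := 0; ; try++ {
		if gate != nil {
			select {
			case gate <- struct{}{}: // acquire a slot
			case <-ctx.Done():
				return try, ctx.Err()
			}
		}
		err := attempt()
		if gate != nil {
			<-gate // release before any sleep
		}
		if err == nil || !errors.Is(err, errTransient) || try == maxRetries {
			return try + 1, err
		}
		select {
		case <-time.After(delay):
		case <-ctx.Done():
			return try + 1, ctx.Err()
		}
	}
}

// demo runs an operation that fails twice, then succeeds.
func demo() (attempts int, gateFree bool, err error) {
	gate := make(chan struct{}, 1)
	calls := 0
	flaky := func() error {
		calls++
		if calls < 3 {
			return errTransient
		}
		return nil
	}
	attempts, err = callWithGate(context.Background(), gate, 5, time.Millisecond, flaky)
	return attempts, len(gate) == 0, err
}

func main() {
	fmt.Println(demo()) // 3 true <nil>
}
```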

Types ¶

type APIError ¶

type APIError struct {
	StatusCode int
	Body       string
}

APIError carries the HTTP status code and the raw response body of a non-2xx response.

func (*APIError) Error ¶

func (e *APIError) Error() string
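A hedged sketch of inspecting the status code with errors.As. It assumes the package returns a *APIError somewhere in the error chain, which the documentation does not confirm. The type below is a local copy of the documented struct, and both the Error format and the describe helper are invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Local copy of the documented struct.
type APIError struct {
	StatusCode int
	Body       string
}

// The message format here is an assumption, not the package's actual output.
func (e *APIError) Error() string {
	return fmt.Sprintf("embedding API: status %d: %s", e.StatusCode, e.Body)
}

// describe (hypothetical) extracts the status and body when the chain
// contains a *APIError.
func describe(err error) string {
	var apiErr *APIError
	if errors.As(err, &apiErr) {
		return fmt.Sprintf("status=%d body=%q", apiErr.StatusCode, apiErr.Body)
	}
	return "not an API error"
}

func main() {
	err := fmt.Errorf("embed: %w", &APIError{StatusCode: 401, Body: "invalid key"})
	fmt.Println(describe(err))                // status=401 body="invalid key"
	fmt.Println(describe(errors.New("boom"))) // not an API error
}
```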

type Client ¶

type Client struct {
	// contains filtered or unexported fields
}

Client is an OpenAI-compatible embedding API client that implements embedder.Embedder.

func New ¶

func New(opts Options) (*Client, error)

New constructs a Client from the given options, validating required fields. Zero-value numeric fields use built-in defaults (64 texts per batch, 5 retries, 500ms base delay, 8 concurrent sub-batches).
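The zero-value defaulting described above might look roughly like this. The options struct is a trimmed stand-in, withDefaults is a hypothetical name, and treating BaseURL and Model as the required fields is an assumption, since the documentation does not list which fields New validates.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// options is a trimmed stand-in for the documented Options struct.
type options struct {
	BaseURL        string
	Model          string
	MaxBatchSize   int
	MaxRetries     int
	RetryBaseDelay time.Duration
	Concurrency    int
}

// withDefaults (hypothetical) validates required fields and replaces
// zero-value numeric fields with the documented defaults.
func withDefaults(o options) (options, error) {
	// Assumption: BaseURL and Model are the required fields.
	if o.BaseURL == "" {
		return o, errors.New("BaseURL is required")
	}
	if o.Model == "" {
		return o, errors.New("Model is required")
	}
	if o.MaxBatchSize == 0 {
		o.MaxBatchSize = 64
	}
	if o.MaxRetries == 0 {
		o.MaxRetries = 5
	}
	if o.RetryBaseDelay == 0 {
		o.RetryBaseDelay = 500 * time.Millisecond
	}
	if o.Concurrency == 0 {
		o.Concurrency = 8
	}
	return o, nil
}

func main() {
	o, err := withDefaults(options{
		BaseURL: "https://api.openai.com/v1",
		Model:   "text-embedding-3-small",
	})
	fmt.Println(o.MaxBatchSize, o.MaxRetries, o.RetryBaseDelay, o.Concurrency, err) // 64 5 500ms 8 <nil>
}
```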

func (*Client) Embed ¶

func (c *Client) Embed(ctx context.Context, text string) ([]float32, error)

Embed embeds a single text string and returns its float32 vector. It respects ctx cancellation during both the rate-limiter wait and the HTTP request.

func (*Client) EmbedBatch ¶

func (c *Client) EmbedBatch(ctx context.Context, texts []string) ([][]float32, error)

EmbedBatch is the all-or-nothing wrapper around EmbedBatchPartial. If any sub-batch fails, it returns the failures combined with errors.Join, preserving backward-compatible semantics for callers that don't need per-text results.
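A small sketch of collapsing per-text results into an all-or-nothing return with errors.Join. The allOrNothing name is hypothetical, and discarding the vectors on failure is an assumption about what "all-or-nothing" means here.

```go
package main

import (
	"errors"
	"fmt"
)

var errBoom = errors.New("sub-batch failed")

// allOrNothing (hypothetical) turns per-text errors into a single error.
// errors.Join skips nil entries and returns nil when every entry is nil.
func allOrNothing(vectors [][]float32, errs []error) ([][]float32, error) {
	if err := errors.Join(errs...); err != nil {
		return nil, err // assumption: no partial vectors are returned
	}
	return vectors, nil
}

func main() {
	ok, err := allOrNothing([][]float32{{1}, {2}}, []error{nil, nil})
	fmt.Println(len(ok), err) // 2 <nil>

	_, err = allOrNothing([][]float32{{1}, nil}, []error{nil, errBoom})
	fmt.Println(errors.Is(err, errBoom)) // true
}
```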

func (*Client) EmbedBatchPartial ¶

func (c *Client) EmbedBatchPartial(ctx context.Context, texts []string) ([][]float32, []error)

EmbedBatchPartial embeds texts in sub-batches with per-text failure isolation. Invariants: len(vectors)==len(errs)==len(texts); vectors[i] is nil iff errs[i] != nil.
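The invariants can be illustrated with a sequential sketch. It is not the package's implementation: the real client runs sub-batches concurrently, while this version loops over them in order and assigns a sub-batch failure to every text in that sub-batch. embedPartial and fakeEmbed are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// embedPartial (hypothetical) splits texts into sub-batches and keeps
// len(vectors) == len(errs) == len(texts), with vectors[i] == nil exactly
// when errs[i] != nil.
func embedPartial(texts []string, batchSize int,
	embed func([]string) ([][]float32, error)) ([][]float32, []error) {
	if batchSize <= 0 {
		batchSize = 64 // documented default batch size
	}
	vectors := make([][]float32, len(texts))
	errs := make([]error, len(texts))
	for start := 0; start < len(texts); start += batchSize {
		end := start + batchSize
		if end > len(texts) {
			end = len(texts)
		}
		got, err := embed(texts[start:end])
		if err == nil && len(got) != end-start {
			err = fmt.Errorf("got %d vectors for %d texts", len(got), end-start)
		}
		for i := start; i < end; i++ {
			switch {
			case err != nil:
				errs[i] = err
			case got[i-start] == nil:
				errs[i] = errors.New("provider returned no vector")
			default:
				vectors[i] = got[i-start]
			}
		}
	}
	return vectors, errs
}

// fakeEmbed stands in for one HTTP call; it rejects any batch containing "bad".
func fakeEmbed(batch []string) ([][]float32, error) {
	out := make([][]float32, len(batch))
	for i, t := range batch {
		if t == "bad" {
			return nil, errors.New("provider rejected batch")
		}
		out[i] = []float32{float32(len(t))}
	}
	return out, nil
}

func main() {
	vectors, errs := embedPartial([]string{"a", "b", "bad", "d", "e"}, 2, fakeEmbed)
	for i := range vectors {
		fmt.Println(i, vectors[i] != nil, errs[i] != nil)
	}
}
```

With a batch size of 2, only the sub-batch containing "bad" fails, so texts 2 and 3 carry an error while 0, 1, and 4 carry vectors.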

func (*Client) HTTPClient ¶

func (c *Client) HTTPClient() *http.Client

HTTPClient returns the underlying *http.Client. Exposed for tests and for callers that need to tune transport behavior post-construction.

type EmbeddingData ¶

type EmbeddingData struct {
	Embedding []float32 `json:"embedding"`
	Index     int       `json:"index"`
}

EmbeddingData is one embedding vector in the API response.

type EmbeddingRequest ¶

type EmbeddingRequest struct {
	Model      string   `json:"model"`
	Input      []string `json:"input"`
	Dimensions int      `json:"dimensions,omitempty"`
	InputType  string   `json:"input_type,omitempty"`
}

EmbeddingRequest is the JSON body sent to the embedding API.
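To show what the omitempty tags do on the wire, the sketch below marshals a local copy of the documented struct. The encode helper is hypothetical; the field names and tags are taken from the declaration above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local copy of the documented request struct.
type EmbeddingRequest struct {
	Model      string   `json:"model"`
	Input      []string `json:"input"`
	Dimensions int      `json:"dimensions,omitempty"`
	InputType  string   `json:"input_type,omitempty"`
}

// encode (hypothetical) returns the JSON body for a request.
func encode(req EmbeddingRequest) string {
	b, err := json.Marshal(req)
	if err != nil {
		panic(err) // cannot happen for this struct; kept for completeness
	}
	return string(b)
}

func main() {
	// Zero Dimensions and empty InputType are dropped from the body.
	fmt.Println(encode(EmbeddingRequest{
		Model: "text-embedding-3-small",
		Input: []string{"hello"},
	}))
	// {"model":"text-embedding-3-small","input":["hello"]}

	fmt.Println(encode(EmbeddingRequest{
		Model:      "text-embedding-3-small",
		Input:      []string{"hello"},
		Dimensions: 512,
		InputType:  "query",
	}))
	// {"model":"text-embedding-3-small","input":["hello"],"dimensions":512,"input_type":"query"}
}
```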

type EmbeddingResponse ¶

type EmbeddingResponse struct {
	Data  []EmbeddingData `json:"data"`
	Usage EmbeddingUsage  `json:"usage"`
}

EmbeddingResponse is the JSON body returned by the embedding API.

type EmbeddingUsage ¶

type EmbeddingUsage struct {
	PromptTokens int `json:"prompt_tokens,omitempty"`
	TotalTokens  int `json:"total_tokens"`
}

EmbeddingUsage reports token consumption.

type Options ¶

type Options struct {
	BaseURL        string        // base URL of the embedding API, e.g. "https://api.openai.com/v1"
	APIKey         string        // API key sent in the Authorization header
	Model          string        // embedding model name, e.g. "text-embedding-3-small"
	Dimensions     int           // embedding vector dimensions; 0 uses the model default
	InputType      string        // optional provider-specific input type, e.g. "document" or "query" for Voyage AI
	MaxBatchSize   int           // maximum texts per API call; 0 defaults to 64
	MaxRetries     int           // maximum retry attempts on 429 and 5xx errors; 0 defaults to 5
	RateLimit      float64       // sustained requests per second; 0 disables rate limiting
	RateBurst      int           // maximum burst above the sustained rate
	RetryBaseDelay time.Duration // initial backoff delay before the first retry; doubles each attempt; 0 defaults to 500ms
	HTTPClient     *http.Client  // optional custom HTTP client; nil uses a pooled client
	// HTTPTimeout is applied only when HTTPClient is nil. Default 30s
	// (preserved for direct API users); main.go overrides this to a higher
	// value derived from config.EmbeddingHTTPTimeout.
	HTTPTimeout time.Duration
	Concurrency int // concurrent sub-batch HTTP requests; 0 defaults to 8
}

Options configures an OpenAI-compatible embedding client.
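To make the interaction between RateLimit and RateBurst concrete, here is a minimal token-bucket sketch. It is only an illustration of the general technique: the documentation does not show the package's limiter, and the real client waits for a token rather than reporting false. Time is passed in as seconds to keep the example deterministic, and tokenBucket, newTokenBucket, and allow are hypothetical names.

```go
package main

import "fmt"

// tokenBucket refills at rate tokens per second up to burst tokens.
type tokenBucket struct {
	rate   float64 // sustained requests per second; <= 0 disables limiting
	burst  float64 // bucket capacity
	tokens float64 // tokens currently available
	last   float64 // time of the previous call, in seconds
}

// newTokenBucket starts with a full bucket so an initial burst is allowed.
func newTokenBucket(rate float64, burst int) *tokenBucket {
	return &tokenBucket{rate: rate, burst: float64(burst), tokens: float64(burst)}
}

// allow reports whether a request at time now (seconds) may proceed,
// consuming one token if so.
func (b *tokenBucket) allow(now float64) bool {
	if b.rate <= 0 {
		return true // mirrors "0 disables rate limiting"
	}
	b.tokens += (now - b.last) * b.rate
	if b.tokens > b.burst {
		b.tokens = b.burst
	}
	b.last = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

func main() {
	b := newTokenBucket(2, 3) // 2 requests/second, burst of 3
	fmt.Println(b.allow(0), b.allow(0), b.allow(0), b.allow(0)) // true true true false
	fmt.Println(b.allow(0.5), b.allow(0.5))                     // true false
}
```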
