rerank

package
v0.4.5 Latest
Warning

This package is not in the latest version of its module.

Published: Jun 17, 2026 License: AGPL-3.0 Imports: 14 Imported by: 0

Documentation

Overview

Package rerank holds the optional read-side rerank stage of recall. After hybrid retrieval and composite ranking, a reranker reads the query and the candidates together, which embedding similarity cannot do, and reorders the candidates by how well each answers the query. Two implementations share the Reranker contract: a cross-encoder ranking model behind a /rerank endpoint (CrossEncoder) and a chat LLM prompted over a numbered list (NewLLM).

Types

type Candidate added in v0.0.6

type Candidate struct {
	ID      string
	Content string
}

Candidate is one memory offered to a reranker.

type Completer added in v0.0.6

type Completer interface {
	Complete(ctx context.Context, system, user string) (string, error)
}

Completer is the single-turn chat completion an LLM reranker needs. The chat clients in internal/llm satisfy it; declaring it here keeps this package free of a dependency on the chat backends.

type Config

type Config struct {
	// BaseURL is the API root (e.g. http://host:8002/v1); "/rerank" is appended.
	BaseURL string
	Model   string
	APIKey  string
	// MaxDocChars truncates each document before sending. 0 disables.
	MaxDocChars int
	// MaxBatchChars caps the total characters across query and documents in
	// a single /rerank request. Set just below the model's effective context
	// in characters (≈ n_ctx × chars-per-token × (1 − template reserve);
	// ~4000 for a 1024-token model). 0 disables proactive batching.
	MaxBatchChars int
	HTTPClient    *http.Client
}

Config configures the cross-encoder reranker client.
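The MaxBatchChars comment gives a rule of thumb: n_ctx × chars-per-token × (1 − template reserve). A minimal sketch of that arithmetic follows. The helper name estimateMaxBatchChars and the sample values (4 characters per token, a 5% template reserve) are assumptions for illustration, not values taken from this package; measure them for the model actually served.

```go
package main

import "fmt"

// estimateMaxBatchChars is a hypothetical helper (not part of this package).
// It applies the rule of thumb from the MaxBatchChars comment:
// n_ctx * chars-per-token * (1 - template reserve).
func estimateMaxBatchChars(nCtx int, charsPerToken, templateReserve float64) int {
	return int(float64(nCtx) * charsPerToken * (1 - templateReserve))
}

func main() {
	// Assumed values: a 1024-token context, about 4 characters per token,
	// and 5% of the context reserved for the model's prompt template.
	fmt.Println(estimateMaxBatchChars(1024, 4.0, 0.05))
}
```

With these assumed inputs the estimate lands a little under 3,900 characters, in the same range as the ~4000 figure quoted for a 1024-token model.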

type CrossEncoder

type CrossEncoder struct {
	// contains filtered or unexported fields
}

CrossEncoder reranks candidates with a dedicated ranking model served over the Cohere-style /rerank API (Infinity, vLLM, TEI, llama-server --rerank).

func New

func New(cfg Config) (*CrossEncoder, error)

func (*CrossEncoder) Rerank

func (c *CrossEncoder) Rerank(ctx context.Context, query string, candidates []Candidate) ([]string, error)

Rerank scores every candidate against the query and returns candidate IDs most-relevant-first. Candidates the server omits are dropped. When the payload exceeds MaxBatchChars, candidates are split across multiple requests and merged by score.
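The split-and-merge behaviour can be pictured with the sketch below. It is one plausible reading of the description, not the package's implementation: splitBatches, mergeByScore, and the scored type are hypothetical, the packing is a simple greedy pass that counts runes, and a single candidate larger than the budget is sent alone rather than truncated.

```go
package main

import (
	"fmt"
	"sort"
	"unicode/utf8"
)

// Candidate mirrors the documented type so the sketch is self-contained.
type Candidate struct {
	ID      string
	Content string
}

// scored is a hypothetical pairing of a candidate ID with its server score.
type scored struct {
	ID    string
	Score float64
}

// splitBatches greedily packs candidates so that the query plus the documents
// in each batch stay within maxBatchChars characters. maxBatchChars <= 0
// disables splitting and yields a single batch.
func splitBatches(query string, cands []Candidate, maxBatchChars int) [][]Candidate {
	if len(cands) == 0 {
		return nil
	}
	if maxBatchChars <= 0 {
		return [][]Candidate{cands}
	}
	queryLen := utf8.RuneCountInString(query)
	var batches [][]Candidate
	var cur []Candidate
	used := queryLen
	for _, c := range cands {
		n := utf8.RuneCountInString(c.Content)
		// Start a new batch when adding this document would exceed the cap.
		if len(cur) > 0 && used+n > maxBatchChars {
			batches = append(batches, cur)
			cur, used = nil, queryLen
		}
		cur = append(cur, c)
		used += n
	}
	return append(batches, cur)
}

// mergeByScore flattens per-batch results and orders them highest score first.
func mergeByScore(parts ...[]scored) []string {
	var all []scored
	for _, p := range parts {
		all = append(all, p...)
	}
	sort.SliceStable(all, func(i, j int) bool { return all[i].Score > all[j].Score })
	ids := make([]string, len(all))
	for i, s := range all {
		ids[i] = s.ID
	}
	return ids
}

func main() {
	cands := []Candidate{{ID: "a", Content: "aaaa"}, {ID: "b", Content: "bbbb"}, {ID: "c", Content: "cccc"}}
	for i, b := range splitBatches("qq", cands, 10) {
		fmt.Println("batch", i, "size", len(b))
	}
	fmt.Println(mergeByScore(
		[]scored{{ID: "a", Score: 0.1}, {ID: "b", Score: 0.8}},
		[]scored{{ID: "c", Score: 0.5}},
	))
}
```

Merging by raw score across requests only makes sense when the model scores each query-document pair independently, which is the usual behaviour of cross-encoders.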

type Limited added in v0.4.0

type Limited struct {
	// contains filtered or unexported fields
}

Limited caps the number of in-flight Rerank calls. It is built with NewLimited: when max <= 0, NewLimited returns inner unchanged. onInFlight, if non-nil, receives the absolute in-flight count on every acquire and release.

func (*Limited) Max added in v0.4.0

func (l *Limited) Max() int

func (*Limited) Rerank added in v0.4.0

func (l *Limited) Rerank(ctx context.Context, query string, candidates []Candidate) ([]string, error)

type Reranker added in v0.0.6

type Reranker interface {
	// Rerank returns candidate IDs ordered most-relevant-first. Backends may
	// return fewer IDs than the input slice; candidates omitted from the output
	// are treated as irrelevant and DROPPED from the result set. An empty slice
	// means no candidate is relevant.
	Rerank(ctx context.Context, query string, candidates []Candidate) ([]string, error)
}

Reranker reorders retrieved candidates by how well each answers a query, and MAY drop irrelevant candidates entirely; reranking is no longer reorder-only.
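Because a Reranker returns IDs rather than candidates, and may return fewer than it was given, a caller has to map the IDs back onto its own slice. A minimal sketch of that step follows; applyOrder is a hypothetical helper, not part of this package, and skipping unknown or duplicate IDs is a defensive assumption rather than documented behaviour.

```go
package main

import "fmt"

// Candidate mirrors the documented type so the sketch is self-contained.
type Candidate struct {
	ID      string
	Content string
}

// applyOrder (hypothetical) returns the candidates named in ids, in that
// order. Candidates missing from ids are dropped, matching the contract that
// omitted candidates are treated as irrelevant.
func applyOrder(cands []Candidate, ids []string) []Candidate {
	byID := make(map[string]Candidate, len(cands))
	for _, c := range cands {
		byID[c.ID] = c
	}
	out := make([]Candidate, 0, len(ids))
	for _, id := range ids {
		c, ok := byID[id]
		if !ok {
			continue // unknown or already-used ID
		}
		out = append(out, c)
		delete(byID, id) // guard against duplicate IDs in the ranking
	}
	return out
}

func main() {
	cands := []Candidate{
		{ID: "a", Content: "first"},
		{ID: "b", Content: "second"},
		{ID: "c", Content: "third"},
	}
	// The reranker kept two candidates and dropped "b".
	for _, c := range applyOrder(cands, []string{"c", "a"}) {
		fmt.Println(c.ID, c.Content)
	}
}
```

An empty ids slice yields an empty result, which the contract defines as "no candidate is relevant" rather than as an error.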

func NewLLM added in v0.0.6

func NewLLM(c Completer) Reranker

NewLLM builds a reranker over a chat backend. The per-candidate content cap keeps the prompt small: a deep candidate pool of long memories can otherwise exceed the context/activation budget of a RAM-limited local server.
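The sketch below illustrates the general shape of an LLM reranker: a numbered candidate list with each entry capped in length, and a tolerant parse of the model's reply back into IDs. The prompt wording, the cap value, and the helpers truncateRunes, buildPrompt, and parseRanking are all assumptions for illustration; the prompt and parsing that NewLLM actually uses are not shown in this documentation.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// Candidate mirrors the documented type so the sketch is self-contained.
type Candidate struct {
	ID      string
	Content string
}

// truncateRunes caps s at max characters; max <= 0 leaves s unchanged.
func truncateRunes(s string, max int) string {
	r := []rune(s)
	if max <= 0 || len(r) <= max {
		return s
	}
	return string(r[:max])
}

// buildPrompt (hypothetical) renders the query and a 1-based numbered list of
// candidates, truncating each candidate to maxChars characters.
func buildPrompt(query string, cands []Candidate, maxChars int) string {
	var b strings.Builder
	fmt.Fprintf(&b, "Query: %s\n\nCandidates:\n", query)
	for i, c := range cands {
		fmt.Fprintf(&b, "%d. %s\n", i+1, truncateRunes(c.Content, maxChars))
	}
	return b.String()
}

var numberRE = regexp.MustCompile(`\d+`)

// parseRanking (hypothetical) reads the numbers in the reply in order and
// maps them to candidate IDs, skipping out-of-range and repeated numbers.
// Candidates the model never mentions are dropped.
func parseRanking(reply string, cands []Candidate) []string {
	seen := make(map[int]bool)
	var ids []string
	for _, m := range numberRE.FindAllString(reply, -1) {
		n, err := strconv.Atoi(m)
		if err != nil || n < 1 || n > len(cands) || seen[n] {
			continue
		}
		seen[n] = true
		ids = append(ids, cands[n-1].ID)
	}
	return ids
}

func main() {
	cands := []Candidate{
		{ID: "a", Content: "hello world"},
		{ID: "b", Content: "goodbye"},
		{ID: "c", Content: "unrelated"},
	}
	fmt.Print(buildPrompt("greeting", cands, 5))
	fmt.Println(parseRanking("2, 1", cands))
}
```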

func NewLimited added in v0.4.0

func NewLimited(inner Reranker, max int, onInFlight func(n int64)) Reranker
