Documentation ¶
Overview ¶
Package sparse provides utilities for generating sparse vectors for hybrid search.
Sparse vectors add exact term matching on top of semantic similarity, which improves retrieval accuracy for queries that depend on specific terms.
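To make the exact-matching property concrete, here is a minimal sketch of how two sparse vectors might be scored against each other. The SparseVec type and the Dot function are hypothetical stand-ins written for this example; they are not the package's schema.SparseVector, whose fields are not shown on this page. The sketch also assumes that each index appears at most once per vector.

```go
package main

import "fmt"

// SparseVec is a hypothetical stand-in for a sparse vector: parallel slices
// of token IDs and their weights. It is NOT the package's schema.SparseVector.
type SparseVec struct {
	Indices []uint32
	Values  []float32
}

// Dot returns the dot product of two sparse vectors. Only indices present in
// both vectors contribute, so the score reflects exact term overlap.
// Assumes each index occurs at most once per vector.
func Dot(a, b SparseVec) float32 {
	weights := make(map[uint32]float32, len(a.Indices))
	for i, idx := range a.Indices {
		weights[idx] = a.Values[i]
	}
	var sum float32
	for i, idx := range b.Indices {
		if w, ok := weights[idx]; ok {
			sum += w * b.Values[i]
		}
	}
	return sum
}

func main() {
	// Token IDs 7, 42 and 99 are made up for illustration.
	query := SparseVec{Indices: []uint32{7, 42}, Values: []float32{0.6, 0.8}}
	doc := SparseVec{Indices: []uint32{42, 99}, Values: []float32{0.5, 0.5}}

	// Only token 42 is shared, so it alone determines the score.
	fmt.Println(Dot(query, doc))
}
```

In a hybrid setup, a score like this would typically be combined with a dense similarity score; how the two are weighted is outside what this page describes.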
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func EnsureModelDownloaded ¶
EnsureModelDownloaded downloads the model files into the local cache if they are missing. Sparse vector generation needs only the tokenizer files.
func GenerateSparseVector ¶
GenerateSparseVector builds a normalized sparse vector from text using the registered provider. If no provider is registered, it uses the default BoWProvider for backward compatibility.
func RegisterProvider ¶
func RegisterProvider(p Provider)
RegisterProvider registers a sparse vector provider, replacing the default.
Types ¶
type BoWProvider ¶
type BoWProvider struct {
// contains filtered or unexported fields
}
BoWProvider implements the Provider interface using a Bag-of-Words approach with a pretrained tokenizer.
func NewBoWProvider ¶
func NewBoWProvider() *BoWProvider
NewBoWProvider creates a new Bag-of-Words sparse provider.
func (*BoWProvider) GenerateSparseVector ¶
func (p *BoWProvider) GenerateSparseVector(ctx context.Context, text string) (*schema.SparseVector, error)
GenerateSparseVector builds a normalized BoW sparse vector from text. Special tokens (PAD, CLS, SEP) are filtered out to reduce noise.
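The Bag-of-Words steps described above (tokenize, drop special tokens, count, normalize) can be sketched as follows. Several things here are assumptions made for illustration: the toy whitespace tokenizer and its made-up vocabulary replace the pretrained tokenizer, the special-token IDs are invented, and L2 normalization is assumed because this page does not say which norm the provider uses.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strings"
)

// vocab is a toy vocabulary with made-up IDs. The real provider uses a
// pretrained tokenizer instead.
var vocab = map[string]uint32{
	"[PAD]": 0, "[CLS]": 1, "[SEP]": 2,
	"hybrid": 10, "search": 11, "sparse": 12,
}

// special lists the token IDs that carry no content and are skipped.
var special = map[uint32]bool{0: true, 1: true, 2: true}

// tokenize lowercases the text, splits on whitespace, drops unknown words,
// and wraps the result in [CLS] ... [SEP] as many pretrained tokenizers do.
func tokenize(text string) []uint32 {
	ids := []uint32{vocab["[CLS]"]}
	for _, w := range strings.Fields(strings.ToLower(text)) {
		if id, ok := vocab[w]; ok {
			ids = append(ids, id)
		}
	}
	return append(ids, vocab["[SEP]"])
}

// bowVector counts non-special token IDs and scales the counts to unit
// L2 norm (an assumption). Indices are returned in ascending order.
func bowVector(text string) (indices []uint32, values []float32) {
	counts := map[uint32]float64{}
	for _, id := range tokenize(text) {
		if special[id] {
			continue
		}
		counts[id]++
	}

	var sumSq float64
	for _, c := range counts {
		sumSq += c * c
	}
	if sumSq == 0 {
		return nil, nil // no content tokens
	}
	norm := math.Sqrt(sumSq)

	for id := range counts {
		indices = append(indices, id)
	}
	sort.Slice(indices, func(i, j int) bool { return indices[i] < indices[j] })
	for _, id := range indices {
		values = append(values, float32(counts[id]/norm))
	}
	return indices, values
}

func main() {
	idx, vals := bowVector("sparse search hybrid search")
	fmt.Println(idx, vals)
}
```

In this sketch "search" occurs twice, so its weight is twice that of the other two terms, and the [CLS] and [SEP] IDs never appear in the output.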