package sparse
v0.22.0 Latest
Warning

This package is not in the latest version of its module.

Published: Feb 16, 2026 License: MIT Imports: 19 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func EnsureModelDownloaded

func EnsureModelDownloaded() (string, error)

EnsureModelDownloaded downloads the model artifacts directly. Only the tokenizer files are needed for sparse vector generation.

func GenerateSparseVector

func GenerateSparseVector(ctx context.Context, text string) (*schema.SparseVector, error)

GenerateSparseVector converts text into a normalized SparseVector using a bag-of-tokens representation. The resulting vector is L2-normalized to unit length for consistent similarity scoring. Special tokens (PAD, CLS, SEP) are filtered out to reduce noise and index size.

Returns an error if:

  • Text cannot be tokenized
  • No valid tokens remain after filtering
  • Normalization fails because the vector has zero norm
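A minimal sketch of the bag-of-tokens step described above, assuming the text has already been tokenized into integer IDs. The `SparseVector` struct here is a hypothetical stand-in for `schema.SparseVector`, whose fields are not documented on this page, and `bagOfTokens` is an illustrative name rather than part of this package. Each non-special token ID becomes one dimension whose weight is its count divided by the L2 norm of all counts.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"sort"
)

// SparseVector is a hypothetical stand-in for schema.SparseVector.
type SparseVector struct {
	Indices []uint32
	Values  []float32
}

// bagOfTokens counts token IDs, drops special tokens, and L2-normalizes
// the counts so the resulting vector has unit length.
func bagOfTokens(ids []uint32, special map[uint32]bool) (*SparseVector, error) {
	counts := make(map[uint32]float64)
	for _, id := range ids {
		if special[id] {
			continue // filter PAD/CLS/SEP-style tokens
		}
		counts[id]++
	}
	if len(counts) == 0 {
		return nil, errors.New("no valid tokens remain after filtering")
	}

	var sumSq float64
	for _, c := range counts {
		sumSq += c * c
	}
	norm := math.Sqrt(sumSq)
	if norm == 0 { // defensive; mirrors the documented zero-norm error
		return nil, errors.New("cannot normalize: zero norm")
	}

	// Emit indices in ascending order so the output is deterministic.
	indices := make([]uint32, 0, len(counts))
	for id := range counts {
		indices = append(indices, id)
	}
	sort.Slice(indices, func(i, j int) bool { return indices[i] < indices[j] })

	values := make([]float32, len(indices))
	for i, id := range indices {
		values[i] = float32(counts[id] / norm)
	}
	return &SparseVector{Indices: indices, Values: values}, nil
}

func main() {
	// Assumed special IDs: in the bert-base-uncased vocabulary,
	// [PAD]=0, [CLS]=101, [SEP]=102. This package's tokenizer may differ.
	special := map[uint32]bool{0: true, 101: true, 102: true}
	v, err := bagOfTokens([]uint32{101, 7592, 2088, 7592, 102, 0}, special)
	if err != nil {
		panic(err)
	}
	fmt.Println(v.Indices, v.Values)
}
```

With the sample IDs, the counts are 1 for 2088 and 2 for 7592, so the norm is √5 and the weights are 1/√5 and 2/√5 (about 0.447 and 0.894). Input made only of special tokens yields the "no valid tokens" error.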

func GetTokenizer added in v0.21.0

func GetTokenizer() (*tokenizer.Tokenizer, error)

GetTokenizer returns a singleton instance of the tokenizer.
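The page does not say how the singleton is built. A common Go idiom is `sync.Once`, sketched below under that assumption. `Tokenizer` and `loadTokenizer` are hypothetical placeholders for `*tokenizer.Tokenizer` and whatever loading this package actually performs.

```go
package main

import (
	"fmt"
	"sync"
)

// Tokenizer is a hypothetical placeholder for *tokenizer.Tokenizer.
type Tokenizer struct{ name string }

var (
	tokOnce sync.Once
	tok     *Tokenizer
	tokErr  error
	loads   int // counts loader invocations, for illustration only
)

// loadTokenizer is a hypothetical loader; a real one would read the
// tokenizer files from disk.
func loadTokenizer() (*Tokenizer, error) {
	loads++
	return &Tokenizer{name: "stub"}, nil
}

// getTokenizer runs the loader at most once, even under concurrent calls,
// and hands every caller the same pointer and the same error.
func getTokenizer() (*Tokenizer, error) {
	tokOnce.Do(func() {
		tok, tokErr = loadTokenizer()
	})
	return tok, tokErr
}

func main() {
	a, _ := getTokenizer()
	b, _ := getTokenizer()
	fmt.Println(a == b, loads) // prints "true 1"
}
```

One trade-off of this pattern: because the error is cached alongside the value, a failed first load is never retried for the life of the process.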

Types

This section is empty.
