Documentation
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func EnsureModelDownloaded ¶
EnsureModelDownloaded downloads the model artifacts directly. Only the tokenizer files are needed for sparse vector generation.
func GenerateSparseVector ¶
GenerateSparseVector converts text into a SparseVector using a bag-of-tokens representation. The resulting vector is L2-normalized to unit length for consistent similarity scoring. Special tokens (PAD, CLS, SEP) are filtered out to reduce noise and index size.
Returns an error if:
- Text cannot be tokenized
- No valid tokens remain after filtering
- Normalization fails (due to zero norm)
func GetTokenizer ¶ added in v0.21.0
GetTokenizer returns a singleton instance of the tokenizer.
Types ¶
This section is empty.