Documentation ¶
Index ¶
- Variables
- func ChunkBySentencesExported(text string, maxWords int) []string
- func CleanText(text string) string
- func CosineSimilarity(a, b []float32) float64
- func DotProduct(a, b []float32) float64
- func FindLibrary(name, goos string) (string, error)
- func IsLibraryAvailable() bool
- func LibDirs(goos string) []string
- func LibraryNotFoundError(err error) bool
- func Normalize(v []float32)
- func ResolveLibraryError(err error) string
- type Vectorizer
Constants ¶
This section is empty.
Variables ¶
var ErrLibraryNotFound = errors.New("llama.cpp shared library not found")
ErrLibraryNotFound is returned when the shared library cannot be located.
Functions ¶
func ChunkBySentencesExported ¶
func ChunkBySentencesExported(text string, maxWords int) []string
ChunkBySentencesExported is the exported wrapper for chunkBySentences, used in integration and e2e tests.
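The underlying chunker is unexported, so its exact behavior is not documented here. A rough sketch of sentence-boundary chunking under a word budget — an illustration only, which may differ from the real implementation:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// sentenceEnd splits on terminal punctuation followed by whitespace.
var sentenceEnd = regexp.MustCompile(`([.!?])\s+`)

// chunkBySentences greedily packs whole sentences into chunks of at
// most maxWords words; a single over-long sentence becomes its own chunk.
func chunkBySentences(text string, maxWords int) []string {
	sentences := sentenceEnd.Split(text, -1)
	var chunks, cur []string
	words := 0
	for _, s := range sentences {
		s = strings.TrimSpace(s)
		if s == "" {
			continue
		}
		n := len(strings.Fields(s))
		if words > 0 && words+n > maxWords {
			chunks = append(chunks, strings.Join(cur, " "))
			cur, words = nil, 0
		}
		cur = append(cur, s)
		words += n
	}
	if len(cur) > 0 {
		chunks = append(chunks, strings.Join(cur, " "))
	}
	return chunks
}

func main() {
	text := "One two three. Four five. Six seven eight nine."
	fmt.Printf("%q\n", chunkBySentences(text, 5))
}
```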
func CleanText ¶
func CleanText(text string) string
CleanText runs the full cleaning pipeline on raw input text:
- Strip HTML / XML tags, inserting spaces between adjacent text nodes
- Remove emoji and non-printable / control characters
- Collapse whitespace and trim
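Two of these steps — tag stripping with a space inserted so adjacent text nodes do not fuse, then whitespace collapsing — can be sketched with regular expressions. This is an illustrative stand-in, not the package's implementation (emoji and control-character removal are omitted):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	tagRe   = regexp.MustCompile(`<[^>]*>`)
	spaceRe = regexp.MustCompile(`\s+`)
)

// cleanText strips tags (replacing each with a space so "</p><p>"
// does not glue words together), collapses runs of whitespace, and trims.
func cleanText(text string) string {
	text = tagRe.ReplaceAllString(text, " ")
	text = spaceRe.ReplaceAllString(text, " ")
	return strings.TrimSpace(text)
}

func main() {
	fmt.Println(cleanText("<p>Hello</p><p>world</p>")) // Hello world
}
```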
func CosineSimilarity ¶
func CosineSimilarity(a, b []float32) float64
CosineSimilarity computes the cosine similarity between two embedding vectors.
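Cosine similarity is dot(a, b) / (|a| · |b|); for unit-normalized vectors it reduces to the plain dot product, which is why DotProduct below expects normalized inputs. A minimal sketch of the standard formula (not necessarily the package's exact code):

```go
package main

import (
	"fmt"
	"math"
)

// cosine computes dot(a, b) / (|a| |b|), accumulating in float64
// to reduce rounding error on long float32 vectors.
func cosine(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	fmt.Println(cosine([]float32{1, 0}, []float32{1, 0})) // identical   -> 1
	fmt.Println(cosine([]float32{1, 0}, []float32{0, 1})) // orthogonal -> 0
}
```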
func DotProduct ¶
func DotProduct(a, b []float32) float64
DotProduct computes the dot product between two normalized embedding vectors.
func FindLibrary ¶
func FindLibrary(name, goos string) (string, error)
FindLibrary is exported for testing.
func IsLibraryAvailable ¶
func IsLibraryAvailable() bool
IsLibraryAvailable checks if the llama.cpp library can be found without loading it.
func LibDirs ¶
func LibDirs(goos string) []string
LibDirs returns the list of directories searched for the shared library. Exported for testing.
func LibraryNotFoundError ¶
func LibraryNotFoundError(err error) bool
LibraryNotFoundError reports whether err is, or wraps, the ErrLibraryNotFound sentinel.
func Normalize ¶
func Normalize(v []float32)
func ResolveLibraryError ¶
func ResolveLibraryError(err error) string
ResolveLibraryError returns a user-friendly error message when the shared library cannot be found.
Types ¶
type Vectorizer ¶
type Vectorizer struct {
// contains filtered or unexported fields
}
Vectorizer represents a loaded GGUF embedding model.
func NewVectorizer ¶
func NewVectorizer(modelPath string, gpuLayers int, embedContextSize uint32) (*Vectorizer, error)
NewVectorizer loads a GGUF model file and returns a ready-to-use vectorizer. gpuLayers controls how many model layers are offloaded to GPU (0 = CPU only). embedContextSize sets the llama.cpp context window; 0 defaults to 512.
func (*Vectorizer) Close ¶
func (v *Vectorizer) Close() error
Close releases all resources held by the vectorizer.
func (*Vectorizer) EmbedDim ¶
func (v *Vectorizer) EmbedDim() int
EmbedDim returns the dimensionality of the model's embedding vectors.
func (*Vectorizer) EmbedText ¶
func (v *Vectorizer) EmbedText(text string) ([]float32, error)
EmbedText cleans the input text and converts it into a float32 embedding vector. If the text exceeds the model's token budget it is split into sentence-boundary-aware chunks and the embeddings are averaged.