Documentation ¶
Overview ¶
Package code provides code-aware sparse vector generation for source code.
The CodeSparseProvider splits identifiers (camelCase, snake_case, acronyms) before hashing into sparse vectors, improving recall for code search.
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type CodeSparseProvider ¶
type CodeSparseProvider struct {
	// contains filtered or unexported fields
}
func NewCodeSparseProvider ¶
func NewCodeSparseProvider() *CodeSparseProvider
func (*CodeSparseProvider) GenerateSparseVector ¶
func (p *CodeSparseProvider) GenerateSparseVector(ctx context.Context, text string) (*schema.SparseVector, error)
type Provider ¶
type Provider interface {
	GenerateSparseVector(ctx context.Context, text string) (*schema.SparseVector, error)
}
func NewProvider ¶
func NewProvider() Provider
type Tokenizer ¶
type Tokenizer struct{}
Tokenizer is a code-aware sparse vector provider. It splits camelCase and snake_case identifiers into constituent terms, filters language keywords, and produces normalized sparse vectors via FNV hashing. Register it with sparse.RegisterProvider to replace the default BGE BoW provider for source code inputs.
func NewTokenizer ¶
func NewTokenizer() *Tokenizer