Documentation ¶
Index ¶
- func AliasesFor(in, v []byte) bool
- func AssertTerms(t *testing.T, fn tokenizer.Tokenizer, in string, want []Term)
- func BruteForceLevenshteinMatches(k, m int, keyword string, terms ...string) []string
- func CollectLevenshteinMatches(k, m int, keyword string, terms ...string) []string
- func CompileEnglishQuery(t *testing.T, q string) *query.SimpleQuery
- func CompileQueryWith(t *testing.T, q string, def tokenizer.Tokenizer, ...) *query.SimpleQuery
- func CompileSpanishQuery(t *testing.T, q string) *query.SimpleQuery
- func EnglishMatchedSet(t *testing.T, q string, s *storage.Storage) []string
- func IndexOfDocument(s *storage.Storage, id string) (uint64, bool)
- func LevenshteinDistance(a, b []byte) int
- func MakeDoc[T ~string | ~[]byte](id T, fields ...*storage.FieldDefinition) *storage.Document
- func MakeField(hash uint64, length uint64, tokens ...*storage.TokenDefinition) *storage.FieldDefinition
- func MakeToken[T ~string | []byte](value T, freq uint64) *storage.TokenDefinition
- func MakeTokenTree(terms ...string) storage.Tokens
- func MatchedSetWith(t *testing.T, q string, s *storage.Storage, def tokenizer.Tokenizer, ...) []string
- func ResolveDocumentIndexes(s *storage.Storage, idxs []uint64) []string
- func RoundTrip(tb testing.TB, s *storage.Storage) *storage.Storage
- func RunFieldScore(s *storage.Storage, fieldHash uint64, candidates []uint64) (idxs []uint64, ctx *query.QueryContext)
- func RunQuery(q *query.SimpleQuery, s *storage.Storage) (idxs []uint64, ctx *query.QueryContext)
- func SortableDate(t *testing.T, s string) []byte
- func SortableFloat64(v float64) []byte
- func SortableInt64(v int64) []byte
- func SpanishMatchedSet(t *testing.T, q string, s *storage.Storage) []string
- func TempDirectory(tb testing.TB, pattern string) (name string)
- func TempFilename(tb testing.TB, pattern string) (name string)
- type Term
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func AliasesFor ¶
AliasesFor reports whether v points inside the backing array of in. Borrowed tokens must alias the input, owned tokens must not.
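One way such an aliasing check could work is a pointer-range comparison. The sketch below is an assumption, not the package's implementation: `aliases` is a hypothetical stand-in that tests whether the first byte of `v` lies within the memory reachable from `in`.

```go
package main

import (
	"fmt"
	"unsafe"
)

// aliases reports whether v starts inside the memory reachable from in.
// Hypothetical stand-in for AliasesFor; the real check may differ.
func aliases(in, v []byte) bool {
	if cap(in) == 0 || cap(v) == 0 {
		return false // no backing memory to compare
	}
	in = in[:cap(in)] // widen to everything reachable from in
	v = v[:1]         // make the first element addressable even if len(v) == 0
	start := uintptr(unsafe.Pointer(&in[0]))
	p := uintptr(unsafe.Pointer(&v[0]))
	return p >= start && p < start+uintptr(len(in))
}

func main() {
	in := []byte("running shoes")
	borrowed := in[8:]                      // subslice: shares memory with in
	owned := append([]byte(nil), in[:3]...) // copy: separate allocation
	fmt.Println(aliases(in, borrowed), aliases(in, owned))
}
```

A borrowed token is a subslice of the input, so it passes the check; an owned token is a copy in its own allocation, so it fails.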
func BruteForceLevenshteinMatches ¶
BruteForceLevenshteinMatches is the oracle counterpart of CollectLevenshteinMatches: it scans every term, keeps those within edit distance k of keyword, sorts them byte-ascending (the automaton's yield order) and caps the result at m (m <= 0 means unlimited).
func CollectLevenshteinMatches ¶
CollectLevenshteinMatches builds a token tree from terms, runs the automaton for keyword/k/m and returns the matched values as strings, in the order the automaton yielded them. Values are copied out since Matches aliases tree keys. It returns nil when levenshtein.New rejects the parameters.
func CompileEnglishQuery ¶
func CompileEnglishQuery(t *testing.T, q string) *query.SimpleQuery
func CompileQueryWith ¶
func CompileSpanishQuery ¶
func CompileSpanishQuery(t *testing.T, q string) *query.SimpleQuery
func EnglishMatchedSet ¶
func IndexOfDocument ¶
IndexOfDocument returns the internal index assigned to an external doc id after the alphabetical sort performed by SortAndBuildFrom.
func LevenshteinDistance ¶
LevenshteinDistance is a reference byte-level edit distance (insert, delete, substitute; all cost 1) used to verify the automaton against brute force.
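A minimal sketch of such a reference distance, with a brute-force filter in the style described for BruteForceLevenshteinMatches. The names `editDistance` and `bruteForceMatches` are hypothetical, and the package's own code may differ (for instance, this sketch does not collapse duplicate terms).

```go
package main

import (
	"fmt"
	"sort"
)

// editDistance is a byte-level Levenshtein distance: insert, delete and
// substitute each cost 1. It keeps only two rows of the DP table.
func editDistance(a, b []byte) int {
	prev := make([]int, len(b)+1)
	curr := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j // turning "" into b[:j] takes j inserts
	}
	for i := 1; i <= len(a); i++ {
		curr[0] = i // turning a[:i] into "" takes i deletes
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			best := prev[j] + 1 // delete a[i-1]
			if ins := curr[j-1] + 1; ins < best {
				best = ins // insert b[j-1]
			}
			if sub := prev[j-1] + cost; sub < best {
				best = sub // substitute or match
			}
			curr[j] = best
		}
		prev, curr = curr, prev
	}
	return prev[len(b)]
}

// bruteForceMatches scans every term, keeps those within distance k of
// keyword, sorts them byte-ascending and caps the result at m
// (m <= 0 means unlimited).
func bruteForceMatches(k, m int, keyword string, terms ...string) []string {
	var out []string
	for _, t := range terms {
		if editDistance([]byte(keyword), []byte(t)) <= k {
			out = append(out, t)
		}
	}
	sort.Strings(out) // Go compares strings byte by byte
	if m > 0 && len(out) > m {
		out = out[:m]
	}
	return out
}

func main() {
	fmt.Println(editDistance([]byte("kitten"), []byte("sitting")))
	fmt.Println(bruteForceMatches(1, 0, "cat", "cart", "dog", "bat", "cat", "at"))
}
```

Because the oracle is quadratic per term and has no shared state, it is slow but easy to trust, which is what makes it useful as a cross-check.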
func MakeDoc ¶
MakeDoc creates a Document with the given external ID and field definitions. The ID must be unique across the index; documents are sorted alphabetically by ID during SortAndBuildFrom / BuildFrom.
func MakeField ¶
func MakeField(hash uint64, length uint64, tokens ...*storage.TokenDefinition) *storage.FieldDefinition
MakeField creates a FieldDefinition with the given xxh3 field hash, total token length for this document, and the list of token definitions.
func MakeToken ¶
func MakeToken[T ~string | []byte](value T, freq uint64) *storage.TokenDefinition
MakeToken creates a TokenDefinition with the given normalized value and frequency. The caller is responsible for normalization before passing the value.
func MakeTokenTree ¶
MakeTokenTree builds a byte-sorted token BTree like the one Storage produces, using the same TokenLessFunc and NoLocks options as production. Duplicate terms collapse into a single key.
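The key order and duplicate handling described here can be illustrated without a BTree. The sketch below uses a sorted slice as a stand-in; `sortedUniqueTerms` is a hypothetical name, and it makes no claim about how the real tree or TokenLessFunc is implemented.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedUniqueTerms returns terms in byte-ascending order with duplicates
// collapsed, mimicking the key order a byte-sorted tree would expose.
func sortedUniqueTerms(terms ...string) []string {
	sorted := append([]string(nil), terms...) // leave the caller's slice untouched
	sort.Strings(sorted)                      // string comparison in Go is bytewise
	var out []string
	for _, t := range sorted {
		if len(out) == 0 || out[len(out)-1] != t {
			out = append(out, t) // keep only the first of each run of equal terms
		}
	}
	return out
}

func main() {
	fmt.Println(sortedUniqueTerms("pear", "apple", "pear", "Zebra"))
}
```

Note that byte order is not dictionary order: an uppercase "Zebra" sorts before a lowercase "apple" because 'Z' (0x5A) is smaller than 'a' (0x61).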
func MatchedSetWith ¶
func ResolveDocumentIndexes ¶
ResolveDocumentIndexes maps a ranked slice of internal indices back to external ids.
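The relationship between external ids and internal indices can be sketched as follows, assuming the internal index is simply the position of the id in alphabetical order and that the boolean result of a lookup means "found". Both are assumptions; `sortIDs`, `indexOf` and `resolve` are hypothetical helpers that do not touch storage.Storage.

```go
package main

import (
	"fmt"
	"sort"
)

// sortIDs returns a sorted copy of ids; position in the result plays the
// role of the internal index.
func sortIDs(ids []string) []string {
	sorted := append([]string(nil), ids...)
	sort.Strings(sorted)
	return sorted
}

// indexOf returns the position of id in sorted and whether it was found.
func indexOf(sorted []string, id string) (uint64, bool) {
	i := sort.SearchStrings(sorted, id)
	if i < len(sorted) && sorted[i] == id {
		return uint64(i), true
	}
	return 0, false
}

// resolve maps internal indices back to external ids, preserving rank order.
func resolve(sorted []string, idxs []uint64) []string {
	out := make([]string, 0, len(idxs))
	for _, i := range idxs {
		out = append(out, sorted[i])
	}
	return out
}

func main() {
	sorted := sortIDs([]string{"doc-c", "doc-a", "doc-b"})
	idx, ok := indexOf(sorted, "doc-b")
	fmt.Println(idx, ok)
	fmt.Println(resolve(sorted, []uint64{2, 0}))
}
```

The practical consequence for tests is that insertion order does not determine the internal index, so assertions should go through a lookup rather than hard-coding positions.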
func RoundTrip ¶
RoundTrip saves the storage to a buffer and loads it back into a fresh Storage, returning the loaded storage. The function has no error result, so save or load failures are reported through tb rather than returned to the caller.
func RunFieldScore ¶
func RunFieldScore(s *storage.Storage, fieldHash uint64, candidates []uint64) (idxs []uint64, ctx *query.QueryContext)
RunFieldScore builds a candidate bitmap, runs FieldScore against fieldHash, then resolves to a ranked slice (best first). Passing candidates == nil means "every document in the corpus"; a non-nil (even empty) slice restricts the candidate set to exactly those internal indices.
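The nil-versus-empty convention is easy to get wrong in Go, where both slices have length zero. The sketch below illustrates it with a map standing in for the bitmap; `candidateSet` is a hypothetical helper, not part of this package.

```go
package main

import "fmt"

// candidateSet builds the set of candidate indices for a corpus of total
// documents. A nil slice selects every index in [0, total); a non-nil slice,
// even an empty one, selects exactly the listed indices.
func candidateSet(total uint64, candidates []uint64) map[uint64]struct{} {
	set := make(map[uint64]struct{})
	if candidates == nil {
		for i := uint64(0); i < total; i++ {
			set[i] = struct{}{}
		}
		return set
	}
	for _, i := range candidates {
		set[i] = struct{}{}
	}
	return set
}

func main() {
	fmt.Println(len(candidateSet(4, nil)))              // whole corpus
	fmt.Println(len(candidateSet(4, []uint64{})))       // nothing
	fmt.Println(len(candidateSet(4, []uint64{1, 3})))   // exactly two
}
```

A test that wants "no candidates" therefore has to pass `[]uint64{}` explicitly; a `var candidates []uint64` that was never assigned is nil and would select the whole corpus.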
func RunQuery ¶
func RunQuery(q *query.SimpleQuery, s *storage.Storage) (idxs []uint64, ctx *query.QueryContext)
RunQuery filters then scores a query against s, returning the ranked doc indices (best first) alongside the populated context so assertions can read raw scores and the resolved bitmap.
func SortableFloat64 ¶
SortableFloat64 is the float counterpart of SortableInt64.
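A common way to make float64 values sort correctly as bytes is to transform the IEEE 754 bit pattern and write it big-endian. The sketch below shows that technique; whether production uses this exact layout is an assumption, and `sortableFloat` and `floatBytesLess` are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"math"
)

// sortableFloat encodes v so that bytewise order matches numeric order
// (NaN handling is left out of this sketch).
func sortableFloat(v float64) []byte {
	bits := math.Float64bits(v)
	if bits&(1<<63) != 0 {
		bits = ^bits // negative: flip all bits so larger magnitudes sort first
	} else {
		bits |= 1 << 63 // non-negative: set the sign bit to sort after negatives
	}
	buf := make([]byte, 8)
	binary.BigEndian.PutUint64(buf, bits)
	return buf
}

// floatBytesLess reports whether the encoding of a sorts before that of b.
func floatBytesLess(a, b float64) bool {
	return bytes.Compare(sortableFloat(a), sortableFloat(b)) < 0
}

func main() {
	fmt.Println(floatBytesLess(-2.5, -1), floatBytesLess(-1, 0), floatBytesLess(0, 1.5))
}
```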
func SortableInt64 ¶
SortableInt64 encodes v with the same sortable byte layout production uses for integer fields, so token byte order matches numeric order.
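One widely used layout with this property is big-endian two's complement with the sign bit flipped. The sketch below demonstrates it; that production uses this exact layout is an assumption, and `sortableInt` and `intBytesLess` are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// sortableInt encodes v as 8 big-endian bytes with the sign bit flipped, so
// negative values sort before non-negative ones under bytewise comparison.
func sortableInt(v int64) []byte {
	buf := make([]byte, 8)
	binary.BigEndian.PutUint64(buf, uint64(v)^(1<<63))
	return buf
}

// intBytesLess reports whether the encoding of a sorts before that of b.
func intBytesLess(a, b int64) bool {
	return bytes.Compare(sortableInt(a), sortableInt(b)) < 0
}

func main() {
	fmt.Println(intBytesLess(-100, -5), intBytesLess(-5, 3), intBytesLess(3, 40))
}
```

Without the sign flip, the two's complement pattern of -1 (all ones) would sort after every positive value, which is why a plain big-endian dump is not enough.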
func SpanishMatchedSet ¶
Types ¶
type Term ¶
func CollectTerms ¶
CollectTerms drains the sequence and asserts the ownership invariant on every token: IsStem is true exactly when Value is an owned allocation that does not alias the input. The Token pointer is reused, so values are copied out here.