xsearch

package module
v0.2.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Apr 17, 2026 License: MIT Imports: 12 Imported by: 0

README

xsearch

Fuzzy search library for Go. Per-field BM25 and Jaccard trigram scoring with optional bloom filter pre-rejection. Index any type. Handle typos, prefix matching, and group fallback — all in-process.

go get github.com/mbow/xsearch

Usage

Implement Searchable on your type:

type Drink struct {
    ID          string
    Name        string
    Category    string
    Ingredients []string
    Tags        map[string][]string
}

func (d Drink) SearchID() string { return d.ID }

func (d Drink) SearchFields() []xsearch.Field {
    fields := []xsearch.Field{
        {Name: "name", Values: []string{d.Name}, Weight: 1.0},
        {Name: "category", Values: []string{d.Category}, Weight: 0.5},
    }
    if len(d.Ingredients) > 0 {
        fields = append(fields, xsearch.Field{
            Name: "ingredients", Values: d.Ingredients, Weight: 0.4,
        })
    }
    for key, vals := range d.Tags {
        fields = append(fields, xsearch.Field{
            Name: key, Values: vals, Weight: 0.3,
        })
    }
    return fields
}

Build and search:

engine, err := xsearch.New(drinks,
    xsearch.WithBloom(100),
    xsearch.WithFallbackField("category"),
    xsearch.WithLimit(20),
)

results := engine.Search("smoky scotch")
for _, r := range results {
    item, _ := engine.Get(r.ID)
    fmt.Printf("%s (score: %.3f)\n", item.Fields[0].Values[0], r.Score)
}

Configuration

Option Default Purpose
WithBloom(bitsPerItem) disabled Bloom filter pre-rejection
WithBM25(k1, b) 1.2, 0.75 BM25 parameters
WithAlpha(alpha) 0.6 Relevance vs scorer blend weight
WithLimit(n) 10 Max results per search [2, 100]
WithFallbackField(name) none Field for group fallback
WithPrefixCache(prefixes) disabled Precompute results for given queries
WithUnicodeFold() disabled Accent-insensitive index/query normalization

External Scoring

Pass a Scorer per search to blend relevance with popularity or business logic:

results := engine.Search("lager", xsearch.WithScoring(scorer))

The library normalizes scorer output per search (max-normalized to [0, 1]) and blends with relevance using alpha. Negative values, NaN, and Inf are clamped to zero.

Prefix Cache

Precompute results for high-traffic queries (typically 1-2 character prefixes):

// Extract prefixes from your data.
var prefixes []string
seen := make(map[string]struct{})
for _, item := range items {
    name := strings.ToLower(item.Name)
    if len(name) >= 1 {
        if _, ok := seen[name[:1]]; !ok { seen[name[:1]] = struct{}{}; prefixes = append(prefixes, name[:1]) }
    }
    if len(name) >= 2 {
        if _, ok := seen[name[:2]]; !ok { seen[name[:2]] = struct{}{}; prefixes = append(prefixes, name[:2]) }
    }
}

engine, err := xsearch.New(items,
    xsearch.WithPrefixCache(prefixes),
)

// Cached queries return in ~30ns with zero heap allocations.
// When WithScoring is used, the cache is bypassed for fresh scored results.
results := engine.Search("b") // cache hit

Snapshots

Build indices once, serialize to CBOR, reload fast:

data, _ := engine.Snapshot()
engine2, _ := xsearch.NewFromSnapshot(data, items)

Snapshots are self-contained CBOR blobs with a version header (XSRC). Build-time index options are stored in the snapshot, including WithUnicodeFold(). Load-time options WithLimit, WithPrefixCache, and WithScopes may still be supplied on restore.

Snapshot version 2 breaks compatibility with older snapshot blobs. Rebuild any existing snapshots before calling NewFromSnapshot.

Bloom Filter

Use standalone or integrated via WithBloom():

bf := xsearch.NewBloom(10000, 100) // 10K items, 100 bits per item
bf.Add("trigram")
bf.MayContain("trigram") // true (no false negatives)
bf.MayContain("absent")  // false (rare false positives)

Scoring Model

Each field is scored independently. Final relevance is the weighted sum:

BM25 path (primary): relevance = sum(bm25(field_i, query) * field_i.Weight)

Jaccard fallback (typos): relevance = sum(jaccard(field_i_trigrams, query_trigrams) * field_i.Weight)

Prefix boosting applies to the highest-weighted field only.

Performance

These numbers are machine-dependent. Refresh this section with make bench-readme after make bench-record, or use make bench-publish.

Generated from profiles/benchmarks/bench-latest.txt via make bench-readme. Metric cells are medians across the recorded samples.

  • Commit: 09ff3d0
  • Branch: main (dirty)
  • Date: 2026-04-09T19:16:04+01:00
  • Filter: .
  • Count: 5
  • Time: 1s
  • CPU: 1
  • Package: .
Build Engine
Benchmark ns/op B/op allocs/op
default/docs_128 4287987 3655642 25003
default/docs_2048 70130381 57548382 278383
default/docs_8192 287796248 231610736 1054363
bloom/docs_128 5159472 4041096 26861
bloom/docs_2048 81130343 63766896 308083
bloom/docs_8192 324255754 256528092 1173150
fallback/docs_128 4718848 3665760 25107
fallback/docs_2048 72041760 57589217 278519
fallback/docs_8192 294235486 231820540 1054523
full/docs_128 5211626 4051216 26965
full/docs_2048 81450853 63807739 308219
full/docs_8192 333087481 256737898 1173310
Snapshot Encode
Benchmark ns/op MB/s B/op allocs/op
default/docs_128 907339 149.67 281539 26
default/docs_2048 9727357 251.48 4939687 33
default/docs_8192 38651944 256.26 19867137 35
full/docs_128 905951 151.86 281587 27
full/docs_2048 9967979 246.80 4972509 34
full/docs_8192 37905211 262.01 19920016 37
Snapshot Load
Benchmark ns/op MB/s B/op allocs/op
default/docs_128 2878427 47.18 1315624 13860
default/docs_2048 40106643 60.99 18099395 132874
default/docs_8192 144709414 68.45 71957434 506549
full/docs_128 2927417 47.00 1327416 13969
full/docs_2048 40246276 61.13 18164939 133015
full/docs_8192 145007250 68.49 72265662 506714
Component
Benchmark ns/op B/op allocs/op
bloom/add 19.16 0 0
bloom/may_contain_hit 14.84 0 0
bloom/may_contain_miss 5.614 0 0
extract_trigrams/allocating_medium 138.1 320 1
extract_trigrams/buffered_medium 13.31 0 0
extract_trigrams/buffered_long 37.06 0 0
tokenize/ascii 263.9 208 3
tokenize/mixed_case_unicode 457.1 248 8
bm25/docs_128 23847 2144 8
ngram/docs_128 9090 5088 7
bm25/docs_2048 368064 31320 12
ngram/docs_2048 19008 20016 10
fallback/exact/docs_128 11.87 0 0
fallback/best/docs_128 114.5 0 0
fallback/exact/docs_2048 11.90 0 0
fallback/best/docs_2048 115.6 0 0
Benchmark ns/op B/op allocs/op
exact/default/docs_128 37980 11952 71
exact/default/docs_2048 426382 87808 77
exact/default/docs_8192 1708830 329576 79
exact/bloom/docs_128 38355 11952 71
exact/bloom/docs_2048 418705 87808 77
exact/bloom/docs_8192 1752038 329576 79
prefix/default/docs_128 7228 7936 70
prefix/default/docs_2048 28572 25728 74
prefix/default/docs_8192 98797 88448 76
cached_prefix/default/docs_128 33.13 16 1
cached_prefix/default/docs_2048 33.68 16 1
cached_prefix/default/docs_8192 33.62 16 1
typo/default/docs_128 10373 5248 10
typo/default/docs_2048 21345 20256 13
typo/default/docs_8192 44944 9520 11
typo/bloom/docs_128 10251 5248 10
typo/bloom/docs_2048 21405 20256 13
typo/bloom/docs_8192 44771 9520 11
miss/default/docs_128 2149 16 1
miss/default/docs_2048 2164 16 1
miss/default/docs_8192 2127 16 1
miss_bloom/bloom/docs_128 362.1 16 1
miss_bloom/bloom/docs_2048 362.4 16 1
miss_bloom/bloom/docs_8192 361.9 16 1
fallback_exact/fallback/docs_128 5100 6680 46
fallback_exact/fallback/docs_2048 19665 27032 46
fallback_exact/fallback/docs_8192 70147 103616 48
fallback_typo/fallback/docs_128 6178 3456 12
fallback_typo/fallback/docs_2048 37913 24224 18
fallback_typo/fallback/docs_8192 37814 23712 18
scorer/default/docs_128 9288 8112 62
scorer/default/docs_2048 55055 30192 66
scorer/default/docs_8192 218168 106480 68
limit_10/default/docs_128 8373 7496 59
limit_10/default/docs_2048 47946 25288 63
limit_10/default/docs_8192 184764 88008 65
limit_100/wide/docs_128 11056 10808 89
limit_100/wide/docs_2048 101712 74328 513
limit_100/wide/docs_8192 307146 137048 515
multiword/default/docs_128 12828 7848 58
multiword/default/docs_2048 75095 25288 57
multiword/default/docs_8192 318194 90536 65
Benchmark ns/op B/op allocs/op
exact/default/docs_8192 1772159 329576 79
typo/default/docs_8192 46229 9520 11
miss/default/docs_8192 2126 16 1
miss_bloom/bloom/docs_8192 364.4 16 1

License

MIT

Changelog

v0.2.0
  • Snapshot format bumped to version 2; rebuild any existing snapshot blobs before calling NewFromSnapshot.
  • WithUnicodeFold() is now serialized into snapshots and restored on load, so folded engines round-trip without behavior drift.
  • NewFromSnapshot now rejects WithUnicodeFold() as a load-time override because it is part of the persisted snapshot config.
v0.1.0
  • Fold(s string) string — exported Unicode NFKD normalization with combining-mark stripping and ligature expansion (œ→oe, æ→ae, ß→ss). Idempotent.
  • WithUnicodeFold() Option — index- and query-time accent-insensitive search using Fold. Integrated with WithPrefixCache so prewarm keys hit after folding+normalization.
  • Engine.SearchWithFallback(primary, cascade, opts ...SearchOption) ([]Result, int) — runs primary; if empty, walks cascade in order and returns the first non-empty level. Level -1 = primary matched; 0..len(cascade)-1 = cascade index; len(cascade) = no match.
  • WithFilter(pred func(id string) bool) SearchOption — per-search predicate applied after scope filtering and before scoring/limit. Prefix-cache fast path is bypassed when a filter is set. Nil filter is zero-overhead.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Fold added in v0.1.0

func Fold(s string) string

Fold returns the Unicode-folded form of s: NFKD-normalized with combining marks stripped. Idempotent. Used by WithUnicodeFold during tokenization and exported so callers (e.g. brand-alias maps) can mirror the exact same normalization at lookup time.

Types

type Bloom

type Bloom struct {
	// contains filtered or unexported fields
}

Bloom is a Bloom filter backed by a []uint64 bit array.

func NewBloom

func NewBloom(n int, bitsPerItem int) *Bloom

NewBloom creates a bloom filter sized for n items.

func (*Bloom) Add

func (b *Bloom) Add(item string)

Add inserts item into the filter.

func (*Bloom) MayContain

func (b *Bloom) MayContain(item string) bool

MayContain reports whether item might be in the set.

type Engine

type Engine struct {
	// contains filtered or unexported fields
}

Engine is the core search engine.

func New

func New[T Searchable](items []T, opts ...Option) (*Engine, error)

New creates a search engine from a slice of Searchable items.

func NewFromSnapshot

func NewFromSnapshot[T Searchable](data []byte, items []T, opts ...Option) (*Engine, error)

NewFromSnapshot loads a pre-built engine from a self-contained CBOR snapshot.

func (*Engine) Get

func (e *Engine) Get(id string) (Item, bool)

Get returns the indexed item by ID, including original field values.

func (*Engine) IDs

func (e *Engine) IDs() []string

IDs returns the ordered list of indexed IDs.

func (*Engine) Search

func (e *Engine) Search(query string, opts ...SearchOption) []Result

Search returns results matching the query, ordered by combined score.

func (*Engine) SearchWithFallback added in v0.1.0

func (e *Engine) SearchWithFallback(primary string, cascade []string, opts ...SearchOption) ([]Result, int)

SearchWithFallback runs primary as a normal Search. If primary returns no results, walks cascade in order, returning the first non-empty level.

Returned int identifies which level matched:

-1               primary matched, cascade not walked
0..len(cascade)  cascade index that matched
len(cascade)     nothing matched at any level

All SearchOptions (filter, scope, scoring) apply to every level.

func (*Engine) Snapshot

func (e *Engine) Snapshot() ([]byte, error)

Snapshot serializes the engine's indices to a self-contained CBOR blob.

type Field

type Field struct {
	Name   string
	Values []string
	Weight float64
}

Field represents a named, weighted searchable field with one or more values.

type Highlight

type Highlight struct {
	Start      int // byte offset (inclusive)
	End        int // byte offset (exclusive)
	ValueIndex int // index into the matched Field.Values slice
}

Highlight marks a matched byte range within a field value.

type Item

type Item struct {
	ID     string
	Fields []Field
}

Item is the stored representation of an indexed document. It is returned by Engine.Get for rendering, lookup, or consumer-side hydration.

type MatchType

type MatchType int

MatchType indicates how a result was found.

const (
	MatchDirect   MatchType = iota // Found via direct n-gram or BM25 match
	MatchFallback                  // Found via fallback group
)

MatchType values.

type Option

type Option struct {
	// contains filtered or unexported fields
}

Option configures the engine.

func WithAlpha

func WithAlpha(alpha float64) Option

WithAlpha sets the blend between relevance and external score.

func WithBM25

func WithBM25(k1, b float64) Option

WithBM25 configures BM25 parameters.

func WithBloom

func WithBloom(bitsPerItem int) Option

WithBloom enables bloom filter pre-rejection.

func WithFallbackField

func WithFallbackField(fieldName string) Option

WithFallbackField sets the field used for fallback grouping.

func WithLimit

func WithLimit(n int) Option

WithLimit sets the max result count.

func WithPrefixCache added in v0.1.0

func WithPrefixCache(prefixes []string) Option

WithPrefixCache precomputes search results for the given prefixes at construction time.

func WithScopes added in v0.1.0

func WithScopes(scopes map[string][]string) Option

WithScopes registers named document scopes that can later be used with WithScope at search time.

func WithUnicodeFold added in v0.1.0

func WithUnicodeFold() Option

WithUnicodeFold enables NFKD normalization with combining-mark stripping on every Field.Values value at index build time and on every query token at search time. Default off. Adds ~5 ns/token amortized when on.

type Result

type Result struct {
	ID         string
	Score      float64
	MatchType  MatchType
	Highlights map[string][]Highlight // field name -> highlight spans
}

Result is a single search result with metadata.

type Scorer

type Scorer interface {
	Score(docIndex int) float64
}

Scorer provides an external score for a searchable item lock-free.

type SearchOption

type SearchOption func(*searchConfig)

SearchOption configures options for a single search execution.

func WithFilter added in v0.1.0

func WithFilter(pred func(id string) bool) SearchOption

WithFilter installs a per-search predicate evaluated on candidate hits after scope filtering, before scoring and limit. Drops items where pred(id) == false.

Predicate sees the doc ID so callers can compose with their own ID→domain-object lookup. Pass nil to disable; cost is zero when not set.

func WithScope added in v0.1.0

func WithScope(scope string) SearchOption

WithScope restricts a search to a named scope registered at engine construction time via WithScopes.

func WithScoring

func WithScoring(s Scorer) SearchOption

WithScoring applies a request-scoped scorer.

type Searchable

type Searchable interface {
	SearchID() string
	SearchFields() []Field
}

Searchable is implemented by any type that can be indexed and searched.

Directories

Path Synopsis
cmd
benchreadme command

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL