search

package
v0.24.0
Published: Jun 24, 2026 | License: MIT | Imports: 9 | Imported by: 0

Documentation

Overview

Package search provides a Scout-style full-text search abstraction for the lagodev framework. It defines a small Engine interface that higher layers program against, plus two implementations:

  • Memory: a dependency-free in-memory inverted index (the default). It tokenizes documents, folds case, drops stop-words, supports term and prefix matching, TF-style ranking, equality filters, and pagination.
  • Postgres: a full-text engine that compiles tsvector/tsquery SQL through the framework's database.Connection (Grammar + Executor). It is wired so queries are parameterized; the SQL generation is unit-tested independently of a live server.

A Searchable model hook lets ORM rows opt into indexing without coupling the search package to the orm package.

Basic usage:

eng := search.NewMemory()
_ = eng.Index(ctx, "posts",
    search.Doc("1", map[string]any{"title": "Hello World", "body": "first post"}),
    search.Doc("2", map[string]any{"title": "Goodbye World", "body": "last post"}),
)
res, _ := eng.Search(ctx, "posts", "hello", search.Options{})
for _, hit := range res.Hits {
    fmt.Println(hit.ID, hit.Score)
}

Constants

const DefaultPerPage = 15

DefaultPerPage is used when Options.PerPage is unset.

Variables

This section is empty.

Functions

func IsStopWord

func IsStopWord(tok string) bool

IsStopWord reports whether tok is in the built-in stop-word set. The input is folded to lower-case before the lookup.

func Tokenize

func Tokenize(text string) []string

Tokenize splits text into normalized, folded, stop-word-filtered tokens. It is the shared primitive used to index documents and parse queries so both sides use the same vocabulary. Splitting happens on any non-letter, non-digit rune.

func WrapDB

func WrapDB(db *sql.DB) querier

WrapDB adapts a standard *sql.DB to the querier expected by NewPostgres.

Types

type Document

type Document struct {
	// ID uniquely identifies the document within its index.
	ID string
	// Fields holds the searchable/filterable attributes. Keys are field names.
	Fields map[string]any
}

Document is a single indexable record: a stable ID plus a set of searchable fields. Field values may be strings, numbers, or bools; non-string values are stringified for the full-text index and kept verbatim for equality filters.

func Doc

func Doc(id string, fields map[string]any) Document

Doc is a convenience constructor for a Document.

type Engine

type Engine interface {
	// Index inserts or replaces documents in the named index. Re-indexing a
	// document with an existing ID replaces it.
	Index(ctx context.Context, index string, docs ...Document) error
	// Delete removes documents by ID from the named index. Unknown IDs are
	// ignored.
	Delete(ctx context.Context, index string, ids ...string) error
	// Search runs query against the named index and returns a ranked,
	// paginated page of hits.
	Search(ctx context.Context, index, query string, opts Options) (Results, error)
}

Engine is the search backend contract. Implementations must be safe for concurrent use.

type Hit

type Hit struct {
	ID     string
	Score  float64
	Fields map[string]any
}

Hit is a single search result: the matched document ID, its relevance score (higher is more relevant), and the original fields (when the engine retains them).

type Memory

type Memory struct {
	// contains filtered or unexported fields
}

Memory is a dependency-free, in-memory search Engine. It keeps every indexed document in a per-index map and answers queries by tokenizing both the stored documents and the query, then scoring by term-frequency overlap. It is the default engine returned by NewMemory and is safe for concurrent use.

Ranking is intentionally simple: each query token contributes the number of times it appears across a document's fields (its term frequency). When Options.Prefix is set, a query token also matches any document token that has it as a prefix. Documents with a higher accumulated score rank first; ties break on document ID for a stable ordering.

Example

ExampleMemory indexes a handful of documents into the in-memory engine and runs a query, printing the ranked result IDs. Ranking is by term-frequency overlap; ties break on ID, so the order is deterministic.

package main

import (
	"context"
	"fmt"

	"github.com/devituz/lagodev/search"
)

func main() {
	ctx := context.Background()
	eng := search.NewMemory()

	_ = eng.Index(ctx, "posts",
		search.Doc("1", map[string]any{"title": "Hello World", "body": "first post"}),
		search.Doc("2", map[string]any{"title": "Goodbye World", "body": "last post"}),
		search.Doc("3", map[string]any{"title": "World World World", "body": "all about the world"}),
	)

	res, err := eng.Search(ctx, "posts", "world", search.Options{})
	if err != nil {
		fmt.Println("search:", err)
		return
	}

	fmt.Println("total:", res.Total)
	for _, hit := range res.Hits {
		fmt.Printf("%s score=%.0f\n", hit.ID, hit.Score)
	}

}

Output:
total: 3
3 score=4
1 score=1
2 score=1

func NewMemory

func NewMemory() *Memory

NewMemory constructs an empty in-memory search Engine.

func (*Memory) Delete

func (m *Memory) Delete(_ context.Context, index string, ids ...string) error

Delete removes documents by ID from the named index. Unknown IDs and unknown indexes are silently ignored.

func (*Memory) Index

func (m *Memory) Index(_ context.Context, index string, docs ...Document) error

Index inserts or replaces docs in the named index. A document whose ID already exists is replaced wholesale (its previous tokens and fields are discarded).

func (*Memory) Search

func (m *Memory) Search(_ context.Context, index, query string, opts Options) (Results, error)

Search tokenizes query, scores every document in the named index, applies equality Filters, sorts by descending score, and returns the requested page. Total reflects the number of matches before pagination. A blank query (no tokens) yields no hits.

type Options

type Options struct {
	// Page is 1-indexed; values < 1 are treated as 1.
	Page int
	// PerPage is the page size; values < 1 fall back to DefaultPerPage.
	PerPage int
	// Filters constrains results to documents whose field equals the given
	// value (string-compared). Multiple filters are AND-ed.
	Filters map[string]any
	// Prefix enables prefix matching in addition to exact term matching, so a
	// query token "hel" matches "hello". Affects the Memory engine; the
	// Postgres engine appends ":*" to query lexemes.
	Prefix bool
}

Options controls a Search call: pagination, equality filters, and prefix matching. The zero value is valid (first page, default size, term matching).

type PgConfig

type PgConfig struct {
	// IDColumn is the primary-key column scanned into Hit.ID. Default: "id".
	IDColumn string
	// VectorColumn is the tsvector column matched against the tsquery and fed
	// to ts_rank for scoring. Default: "search".
	VectorColumn string
	// FieldsColumn, when non-empty, is selected and scanned into Hit.Fields via
	// a sql.Scanner-compatible destination supplied by the caller's driver.
	// Default: "" (Hit.Fields left nil).
	FieldsColumn string
	// Config is the Postgres text-search configuration passed to the tsquery/
	// tsvector functions (e.g. "english", "simple"). Default: "english".
	Config string
	// WebSearch selects websearch_to_tsquery instead of plainto_tsquery. It is
	// ignored when Prefix matching is requested, since websearch syntax does
	// not compose with the ":*" prefix operator.
	WebSearch bool
}

PgConfig customizes the SQL the Postgres engine emits. The zero value is valid and targets a conventional layout: one physical table per index, with a text "id" column and a generated or otherwise maintained tsvector "search" column. Because FieldsColumn defaults to empty, Hit.Fields stays nil unless FieldsColumn names a column (for example a JSON/JSONB "fields" column) to scan back into it.

type Postgres

type Postgres struct {
	// contains filtered or unexported fields
}

Postgres is a full-text search Engine backed by database/sql and Postgres' tsvector/tsquery machinery. It compiles parameterized SELECTs that rank rows with ts_rank and paginate with LIMIT/OFFSET. The engine never interpolates user input into SQL text: index names and column identifiers come from configuration, while the query and filter values are always bound parameters.

func NewPostgres

func NewPostgres(db querier, cfg PgConfig) *Postgres

NewPostgres constructs a Postgres engine over the given querier (typically a *sql.DB wrapped via WrapDB) using cfg, applying defaults for unset fields.

func (*Postgres) Delete

func (p *Postgres) Delete(ctx context.Context, index string, ids ...string) error

Delete removes rows by ID from the index table. It emits a single parameterized DELETE ... WHERE id IN ($1, $2, ...).

func (*Postgres) Index

func (p *Postgres) Index(_ context.Context, _ string, _ ...Document) error

Index is a no-op for the Postgres engine: documents live in the underlying table and are populated by the application's normal writes (the tsvector column is expected to be generated or trigger-maintained). It satisfies the Engine interface so a Postgres-backed app can share search-agnostic code.

func (*Postgres) Search

func (p *Postgres) Search(ctx context.Context, index, query string, opts Options) (Results, error)

Search compiles and runs a tsquery search against the index table, ordering by ts_rank descending and paginating with LIMIT/OFFSET. A blank query (no lexemes after tokenizing) short-circuits to an empty result without a round trip. Equality Filters become additional WHERE predicates with bound values.

type Results

type Results struct {
	Hits  []Hit
	Total int
}

Results is a page of search hits plus the total number of matches across all pages (before pagination).
