index

package
v0.1.11
Published: Jun 7, 2026 License: MIT Imports: 21 Imported by: 0

Documentation

Overview

Package index provides the SQLite-backed derived index for VaultMind.

Constants

const (
	CallerAgent         = "agent"
	CallerAgentNeighbor = "agent-neighbor"
	CallerHook          = "hook"
)

Caller* constants name the provenance of an access event. The string values land in the note_accesses.caller column and let `vaultmind self` filter "what I engaged with" from "what the harness pre-loaded."

CallerAgent is the default for explicit agent reads (note get, the target of an Ask). CallerAgentNeighbor is set when an Ask's context-pack pulls a neighbor in alongside the target — still real engagement, but lower-intent than a direct read. CallerHook is set when a Claude Code hook (SessionStart persona load, UserPromptSubmit pointer fanout, etc.) fires the access; these accesses populate the activation log but `self` filters them out by default so its hot list reflects deliberate engagement rather than ambient harness traffic.

Set via the VAULTMIND_CALLER env var or passed explicitly to RecordNoteAccessAs. RecordNoteAccess (no caller arg) reads the env and falls back to CallerAgent.
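
The fallback described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's code: `callerOrDefault` is a hypothetical helper, and whether the real implementation trims or validates the environment value is not stated in this documentation.

```go
package main

import (
	"fmt"
	"os"
)

// Values copied from the package's Caller* constants.
const (
	CallerAgent         = "agent"
	CallerAgentNeighbor = "agent-neighbor"
	CallerHook          = "hook"
)

// callerOrDefault is a hypothetical helper. It keeps a non-empty value read
// from VAULTMIND_CALLER and falls back to CallerAgent otherwise.
func callerOrDefault(envValue string) string {
	if envValue != "" {
		return envValue
	}
	return CallerAgent
}

func main() {
	// Runtime-determined caller, as a shell hook would set it.
	fmt.Println(callerOrDefault(os.Getenv("VAULTMIND_CALLER")))
	// Structurally known caller, passed explicitly.
	fmt.Println(callerOrDefault(CallerAgentNeighbor))
}
```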

Variables

This section is empty.

Functions

func ComputeAliasMentions

func ComputeAliasMentions(db *DB, minAliasLen int) (int, error)

ComputeAliasMentions scans every note body for occurrences of aliases and domain-note titles, then writes alias_mention edges into the links table. It returns the number of new edges inserted. Aliases shorter than minAliasLen characters are skipped. Calling this function clears any previous alias_mention edges before computing fresh ones.

func ComputeTagOverlap

func ComputeTagOverlap(db *DB, threshold float64) (int, error)

ComputeTagOverlap scans the tags table for notes sharing common tags and writes tag_overlap edges into the links table weighted by TF-IDF-style tag specificity. Only pairs whose combined score meets threshold are inserted. Calling this function clears any previous tag_overlap edges before computing fresh ones.
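
One way to read "TF-IDF-style tag specificity" is sketched below. The formula is an assumption, since the documentation does not give one: each tag is weighted by ln(totalNotes / notesWithTag), and a pair's score is the sum of the weights of its shared tags. The names `tagOverlap` and `pairScore` are hypothetical, and the sketch assumes each note lists a tag at most once.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// pairScore is a hypothetical result row: two note IDs and their overlap score.
type pairScore struct {
	A, B  string
	Score float64
}

// tagOverlap scores every pair of notes by the summed IDF of their shared
// tags and keeps the pairs whose score is at least threshold.
// ASSUMPTION: idf = ln(totalNotes / notesWithTag); the real formula may differ.
func tagOverlap(tagsByNote map[string][]string, threshold float64) []pairScore {
	byTag := map[string][]string{} // tag -> IDs of notes carrying it
	for id, tags := range tagsByNote {
		for _, t := range tags {
			byTag[t] = append(byTag[t], id)
		}
	}
	n := float64(len(tagsByNote))
	scores := map[[2]string]float64{}
	for _, ids := range byTag {
		idf := math.Log(n / float64(len(ids))) // rare tags weigh more
		sort.Strings(ids)                      // canonical (A < B) pair keys
		for i := 0; i < len(ids); i++ {
			for j := i + 1; j < len(ids); j++ {
				scores[[2]string{ids[i], ids[j]}] += idf
			}
		}
	}
	var out []pairScore
	for k, s := range scores {
		if s >= threshold {
			out = append(out, pairScore{A: k[0], B: k[1], Score: s})
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].A != out[j].A {
			return out[i].A < out[j].A
		}
		return out[i].B < out[j].B
	})
	return out
}

func main() {
	notes := map[string][]string{
		"a": {"go", "sqlite"},
		"b": {"go", "sqlite"},
		"c": {"go"},
		"d": {"go"},
	}
	for _, p := range tagOverlap(notes, 0.5) {
		fmt.Printf("%s <-> %s  %.3f\n", p.A, p.B, p.Score)
	}
}
```

In this example the tag carried by every note contributes a weight of zero, so only the pair sharing the rarer tag clears the threshold.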

func CountFTS

func CountFTS(d *DB, query string, filters ...SearchFilters) (int, error)

CountFTS returns the total number of documents matching the query and filters, independent of any limit/offset. Used for pagination totals.
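
As an illustration of how a pagination total is typically consumed, the sketch below turns a total (as CountFTS would return it) and a page size into a page count and a row offset. `pageCount` and `pageOffset` are hypothetical helpers, not part of this package.

```go
package main

import "fmt"

// pageCount returns how many pages are needed to show total hits at limit
// hits per page. It returns 0 when either value is not positive.
func pageCount(total, limit int) int {
	if limit <= 0 || total <= 0 {
		return 0
	}
	return (total + limit - 1) / limit // ceiling division
}

// pageOffset converts a 1-based page number into a row offset.
// Page numbers below 1 are treated as page 1.
func pageOffset(page, limit int) int {
	if page < 1 {
		page = 1
	}
	return (page - 1) * limit
}

func main() {
	total, limit := 45, 20 // total would come from CountFTS
	fmt.Println("pages:", pageCount(total, limit))
	fmt.Println("offset for page 3:", pageOffset(3, limit))
}
```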

func DecodeColBERTEmbedding

func DecodeColBERTEmbedding(data []byte, _ int) ([][]float32, error)

DecodeColBERTEmbedding deserializes a ColBERT BLOB back to a per-token matrix. The dims are read from the 4-byte header; the dims parameter is ignored (kept for API compat).

func DecodeEmbedding

func DecodeEmbedding(data []byte) ([]float32, error)

DecodeEmbedding deserializes raw little-endian bytes back to a float32 slice.

func DecodeSparseEmbedding

func DecodeSparseEmbedding(data []byte) (map[int32]float32, error)

DecodeSparseEmbedding deserializes packed (int32, float32) pairs back to a sparse map. Returns an empty map (not nil) when data is empty.

func DeleteNoteByPath

func DeleteNoteByPath(d *DB, path string) error

DeleteNoteByPath removes a note and all its dependent rows from every table within a single transaction. It is used by the incremental indexer to clean up notes whose source files no longer exist on disk.

func DetectEmbeddingDims

func DetectEmbeddingDims(d *DB) (int, error)

DetectEmbeddingDims returns the dimensionality of stored embeddings, or 0 if none exist. Uses a single-row query — does not load all embeddings.

In mixed-state vaults (some notes embedded with MiniLM, others with BGE-M3 after a model upgrade) this returns whichever row SQLite happens to scan first, so the result is not authoritative for "what model is this vault using." Use DetectEmbeddingDimsCounts when you need the full picture.

func EncodeColBERTEmbedding

func EncodeColBERTEmbedding(colbert [][]float32) []byte

EncodeColBERTEmbedding serializes a per-token embedding matrix with a 4-byte dims header. Format: [uint32 dims][float32 data...] where data is tokens*dims floats.
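
The layout above can be sketched as a round trip. This is an illustration under assumptions rather than the package's implementation: the byte order is assumed to be little-endian (the documentation states that only for dense embeddings), all rows are assumed to have the same length, and the handling of an empty matrix is a guess. `encodeMatrix` and `decodeMatrix` are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"math"
)

// encodeMatrix writes [uint32 dims][float32 data...].
// ASSUMPTIONS: little-endian byte order; every row has len(m[0]) entries;
// an empty matrix encodes to nil.
func encodeMatrix(m [][]float32) []byte {
	if len(m) == 0 {
		return nil
	}
	dims := len(m[0])
	buf := make([]byte, 4, 4+4*dims*len(m))
	binary.LittleEndian.PutUint32(buf, uint32(dims))
	for _, row := range m {
		for _, v := range row {
			buf = binary.LittleEndian.AppendUint32(buf, math.Float32bits(v))
		}
	}
	return buf
}

// decodeMatrix reads dims from the 4-byte header and rebuilds the rows.
func decodeMatrix(data []byte) ([][]float32, error) {
	if len(data) < 4 {
		return nil, errors.New("missing dims header")
	}
	dims := int(binary.LittleEndian.Uint32(data))
	body := data[4:]
	if dims == 0 || len(body)%(4*dims) != 0 {
		return nil, errors.New("payload is not a whole number of rows")
	}
	rows := make([][]float32, len(body)/(4*dims))
	for r := range rows {
		rows[r] = make([]float32, dims)
		for c := range rows[r] {
			off := 4 * (r*dims + c)
			rows[r][c] = math.Float32frombits(binary.LittleEndian.Uint32(body[off:]))
		}
	}
	return rows, nil
}

func main() {
	blob := encodeMatrix([][]float32{{1, 2}, {3, 4}, {5, 6}})
	rows, err := decodeMatrix(blob)
	fmt.Println(len(blob), rows, err)
}
```

Because the header carries dims, the decoder needs no external dimensionality argument, which is consistent with DecodeColBERTEmbedding ignoring its second parameter.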

func EncodeEmbedding

func EncodeEmbedding(vec []float32) []byte

EncodeEmbedding serializes a float32 slice to raw little-endian bytes for BLOB storage.
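
The dense format described for EncodeEmbedding and DecodeEmbedding (raw little-endian float32 values, 4 bytes each) can be sketched as below. `encodeDense` and `decodeDense` are hypothetical stand-ins, and the error returned for a truncated buffer is an assumption about reasonable behaviour, not documented behaviour.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"math"
)

// encodeDense writes each float32 as 4 little-endian bytes.
func encodeDense(vec []float32) []byte {
	buf := make([]byte, 4*len(vec))
	for i, v := range vec {
		binary.LittleEndian.PutUint32(buf[4*i:], math.Float32bits(v))
	}
	return buf
}

// decodeDense reverses encodeDense.
// ASSUMPTION: a length that is not a multiple of 4 is reported as an error.
func decodeDense(data []byte) ([]float32, error) {
	if len(data)%4 != 0 {
		return nil, errors.New("length is not a multiple of 4")
	}
	vec := make([]float32, len(data)/4)
	for i := range vec {
		vec[i] = math.Float32frombits(binary.LittleEndian.Uint32(data[4*i:]))
	}
	return vec, nil
}

func main() {
	blob := encodeDense([]float32{1.5, -2, 0})
	vec, err := decodeDense(blob)
	fmt.Println(len(blob), vec, err)
}
```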

func EncodeSparseEmbedding

func EncodeSparseEmbedding(sparse map[int32]float32) []byte

EncodeSparseEmbedding serializes a sparse vector as packed (int32 token_id, float32 weight) pairs.
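
A sketch of the packed-pair layout follows, together with the "empty map, not nil" behaviour documented for DecodeSparseEmbedding. Several details are assumptions: little-endian byte order, 8 bytes per pair, and sorting token IDs before writing (Go map iteration order is random, so sorting is added here only to make the output deterministic). `encodeSparse` and `decodeSparse` are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"math"
	"sort"
)

// encodeSparse writes (int32 token_id, float32 weight) pairs, 8 bytes each.
// ASSUMPTIONS: little-endian byte order; pairs sorted by token ID.
func encodeSparse(sparse map[int32]float32) []byte {
	ids := make([]int32, 0, len(sparse))
	for id := range sparse {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	buf := make([]byte, 0, 8*len(ids))
	for _, id := range ids {
		buf = binary.LittleEndian.AppendUint32(buf, uint32(id))
		buf = binary.LittleEndian.AppendUint32(buf, math.Float32bits(sparse[id]))
	}
	return buf
}

// decodeSparse reverses encodeSparse. Empty input yields an empty, non-nil map.
func decodeSparse(data []byte) (map[int32]float32, error) {
	if len(data)%8 != 0 {
		return nil, errors.New("length is not a multiple of 8")
	}
	out := make(map[int32]float32, len(data)/8)
	for off := 0; off < len(data); off += 8 {
		id := int32(binary.LittleEndian.Uint32(data[off:]))
		out[id] = math.Float32frombits(binary.LittleEndian.Uint32(data[off+4:]))
	}
	return out, nil
}

func main() {
	blob := encodeSparse(map[int32]float32{7: 0.5, 2: 1.25})
	back, err := decodeSparse(blob)
	fmt.Println(len(blob), back, err)
}
```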

func HasColBERTEmbeddings

func HasColBERTEmbeddings(d *DB) (bool, error)

HasColBERTEmbeddings returns true if any note has a stored ColBERT embedding.

func HasEmbeddings

func HasEmbeddings(d *DB) (bool, error)

HasEmbeddings returns true if any note in the index has a stored embedding.

func HasSparseEmbeddings

func HasSparseEmbeddings(d *DB) (bool, error)

HasSparseEmbeddings returns true if any note has a stored sparse embedding.

func LoadEmbedding

func LoadEmbedding(d *DB, noteID string) ([]float32, error)

LoadEmbedding reads the embedding for a single note. Returns nil, nil if no embedding stored.

func RecordNoteAccess

func RecordNoteAccess(d *DB, noteID string) error

RecordNoteAccess records a note access with the default caller (read from VAULTMIND_CALLER env var, falling back to CallerAgent). Backwards compatible with pre-2026-05-01 callers: the call signature is unchanged, so existing call sites don't have to be rewritten.

Use RecordNoteAccessAs when the caller is known structurally (e.g. query.Ask passes CallerAgent for the target and CallerAgentNeighbor for context-pack neighbors). Use RecordNoteAccess when the caller is determined by the runtime context (e.g. a shell hook setting VAULTMIND_CALLER).

func RecordNoteAccessAs

func RecordNoteAccessAs(d *DB, noteID, caller string) error

RecordNoteAccessAs records a note access with an explicit caller label. Two side effects:

  1. Inserts into note_accesses with (note_id, caller, accessed_at) — the per-event log that `self` and future ACT-R retrieval scoring read from.
  2. Updates the scalar (notes.access_count, notes.last_accessed_at) — kept for backward compatibility and fast lookup on hot paths.

Best-effort: if recording fails for a note, the caller is responsible for logging the miss at debug level. A user query must never fail because of optional bookkeeping.
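
The two side effects and the best-effort calling convention can be modelled in memory as below. This is a sketch only: `memIndex`, `recordAs`, and `recordBestEffort` are hypothetical, the real function writes to SQLite, and failing on an unknown note ID is an assumption made here to have an error to demonstrate.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"time"
)

type accessEvent struct {
	NoteID, Caller string
	AccessedAt     time.Time
}

type noteCounters struct {
	AccessCount    int
	LastAccessedAt time.Time
}

// memIndex is a hypothetical in-memory stand-in for the SQLite tables.
type memIndex struct {
	events []accessEvent            // models note_accesses
	notes  map[string]*noteCounters // models the scalar columns on notes
}

// recordAs performs both side effects: append an event, bump the counters.
// ASSUMPTION: an unknown note ID is reported as an error.
func (m *memIndex) recordAs(noteID, caller string, now time.Time) error {
	c, ok := m.notes[noteID]
	if !ok {
		return errors.New("unknown note: " + noteID)
	}
	m.events = append(m.events, accessEvent{noteID, caller, now})
	c.AccessCount++
	c.LastAccessedAt = now
	return nil
}

// recordBestEffort logs a tracking miss and carries on, so the surrounding
// query never fails because of bookkeeping.
func recordBestEffort(m *memIndex, noteID, caller string, debugf func(string, ...any)) {
	if err := m.recordAs(noteID, caller, time.Now().UTC()); err != nil {
		debugf("access tracking skipped: %v", err)
	}
}

func main() {
	m := &memIndex{notes: map[string]*noteCounters{"n1": {}}}
	recordBestEffort(m, "n1", "agent", log.Printf)
	recordBestEffort(m, "missing", "hook", log.Printf) // logged, not fatal
	fmt.Println(len(m.events), m.notes["n1"].AccessCount)
}
```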

func ResolveLinks

func ResolveLinks(db *DB) (int, error)

ResolveLinks updates unresolved links by matching dst_raw against note IDs, titles, and aliases. Sets dst_note_id and resolved=TRUE for matches.

func StoreColBERTEmbedding

func StoreColBERTEmbedding(d *DB, noteID string, colbert [][]float32) error

StoreColBERTEmbedding writes a ColBERT embedding BLOB for a note.

func StoreEmbedding

func StoreEmbedding(d *DB, noteID string, vec []float32) error

StoreEmbedding writes an embedding BLOB for a note that already exists in the index.

func StoreNote

func StoreNote(d *DB, rec NoteRecord) error

StoreNote stores a note within its own transaction: it deletes all existing rows for the note, then inserts fresh rows into every table (delete-before-reinsert).

func StoreNoteInTx

func StoreNoteInTx(tx *sql.Tx, rec NoteRecord) error

StoreNoteInTx stores a note within an existing transaction. Used by Rebuild for batch transactions.

func StoreSparseEmbedding

func StoreSparseEmbedding(d *DB, noteID string, sparse map[int32]float32) error

StoreSparseEmbedding writes a sparse embedding BLOB for a note.

func StripForAliasMatch

func StripForAliasMatch(body string) string

StripForAliasMatch removes markup that should be excluded from alias detection: code fences, inline code, wikilinks (keeping aliased display text), and HTML comments.
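
A regular-expression sketch of the four removals is shown below. It is an approximation under assumptions, not the package's implementation: the patterns, the order in which they run, and replacing matches with the empty string (rather than, say, a space) are all guesses. `stripMarkup` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// fence is three backticks, built at run time to keep this listing readable.
var fence = strings.Repeat("`", 3)

var (
	fenceRe   = regexp.MustCompile("(?s)" + fence + ".*?" + fence)
	inlineRe  = regexp.MustCompile("`[^`\n]*`")
	commentRe = regexp.MustCompile(`(?s)<!--.*?-->`)
	// Group 1 captures the display text of [[target|display]], if present.
	wikilinkRe = regexp.MustCompile(`\[\[[^\]|]*(?:\|([^\]]*))?\]\]`)
)

// stripMarkup removes code fences, inline code, and HTML comments, and
// reduces wikilinks to their aliased display text (or nothing when the
// link has no display text).
func stripMarkup(body string) string {
	body = fenceRe.ReplaceAllString(body, "")
	body = inlineRe.ReplaceAllString(body, "")
	body = commentRe.ReplaceAllString(body, "")
	return wikilinkRe.ReplaceAllString(body, "$1")
}

func main() {
	in := "See [[note-a|Alpha]], not [[note-b]] or `Beta`. <!-- Gamma -->"
	fmt.Printf("%q\n", stripMarkup(in))
}
```

Fences are removed before inline code so that backticks belonging to a fence are not first consumed as inline spans.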

Types

type BlockRecord

type BlockRecord struct {
	BlockID   string
	Heading   string
	StartLine int
	EndLine   int
}

BlockRecord represents a block ID anchor for storage.

type BlockRow

type BlockRow struct {
	BlockID string `json:"block_id"`
	Heading string `json:"heading,omitempty"`
	Line    int    `json:"line"`
}

BlockRow represents a block ID in query results.

type DB

type DB struct {
	// contains filtered or unexported fields
}

DB wraps *sql.DB with schema initialization and VaultMind-specific helpers.

func Open

func Open(dbPath string) (*DB, error)

Open opens (or creates) a SQLite database at dbPath, creates the parent directory if needed, applies the full VaultMind schema, and configures pragmas (WAL mode, foreign key enforcement).

func (*DB) AllNoteTitles

func (d *DB) AllNoteTitles() ([]NoteTitle, error)

AllNoteTitles returns every note's ID and title from the index. Titles that are empty in the notes table (shouldn't happen post-index but guard anyway) are returned as-is — callers filter if they care.

func (*DB) Begin

func (d *DB) Begin() (*sql.Tx, error)

Begin starts a transaction.

func (*DB) Close

func (d *DB) Close() error

Close closes the underlying database connection.

func (*DB) Exec

func (d *DB) Exec(query string, args ...interface{}) (sql.Result, error)

Exec executes a query that doesn't return rows.

func (*DB) NoteHashes

func (d *DB) NoteHashes() (map[string]NoteHashInfo, error)

NoteHashes returns a map of note path → NoteHashInfo for all notes in the database. Used by the incremental indexer to detect changed and deleted notes.

func (*DB) Query

func (d *DB) Query(query string, args ...interface{}) (*sql.Rows, error)

Query executes a query that returns rows.

func (*DB) QueryFullNote

func (d *DB) QueryFullNote(id string) (*FullNote, error)

QueryFullNote returns complete note data including body, headings, blocks, aliases, tags. Uses GROUP_CONCAT subqueries to fold aliases and tags into the main note query, reducing the number of DB round-trips from 6 to 4.

func (*DB) QueryNoteByID

func (d *DB) QueryNoteByID(id string) (*NoteRow, error)

QueryNoteByID returns the note with the given ID, or nil if not found.

func (*DB) QueryNoteByPath

func (d *DB) QueryNoteByPath(path string) (*NoteRow, error)

QueryNoteByPath returns the note at the given vault-relative path, or nil.

func (*DB) QueryNotesByAlias

func (d *DB) QueryNotesByAlias(alias string, normalized bool) ([]NoteRow, error)

QueryNotesByAlias returns notes whose aliases match the given string. If normalized is true, compares against alias_normalized (lowercase, whitespace-collapsed).

func (*DB) QueryNotesByNormalized

func (d *DB) QueryNotesByNormalized(normalized string) ([]NoteRow, error)

QueryNotesByNormalized searches for notes whose title or alias, when hyphens and underscores are replaced with spaces and lowercased, matches the given normalized input.
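
The two normalizations mentioned for alias and title lookup can be sketched as below. Both helpers are hypothetical and follow only the wording in this documentation: `normalizeKey` mirrors "hyphens and underscores are replaced with spaces and lowercased", and `normalizeAlias` mirrors the alias_normalized description "lowercase, whitespace-collapsed". Whether the real code applies further steps (trimming, Unicode folding) is not stated.

```go
package main

import (
	"fmt"
	"strings"
)

// separators maps hyphens and underscores to spaces.
var separators = strings.NewReplacer("-", " ", "_", " ")

// normalizeKey replaces hyphens and underscores with spaces, then lowercases.
func normalizeKey(s string) string {
	return strings.ToLower(separators.Replace(s))
}

// normalizeAlias lowercases and collapses runs of whitespace to one space.
func normalizeAlias(s string) string {
	return strings.ToLower(strings.Join(strings.Fields(s), " "))
}

func main() {
	fmt.Printf("%q\n", normalizeKey("My-Note_Title"))
	fmt.Printf("%q\n", normalizeAlias("  Spaced \t Out  Alias "))
}
```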

func (*DB) QueryNotesByTitle

func (d *DB) QueryNotesByTitle(title string, caseInsensitive bool) ([]NoteRow, error)

QueryNotesByTitle returns notes matching the given title. If caseInsensitive is true, uses LOWER() comparison.

func (*DB) QueryRow

func (d *DB) QueryRow(query string, args ...interface{}) *sql.Row

QueryRow executes a query that returns at most one row.

func (*DB) UpdateMTime

func (d *DB) UpdateMTime(path string, mtime int64) error

UpdateMTime updates the mtime column for the note at the given path. Used when a file's content hash is unchanged but its mtime has changed.

type EmbedResult

type EmbedResult struct {
	Embedded    int    `json:"embedded"`
	Skipped     int    `json:"skipped"`
	Errors      int    `json:"errors"`
	EmptyOutput int    `json:"empty_output,omitempty"`
	Model       string `json:"model,omitempty"`
}

EmbedResult holds the outcome of an embedding pass.

EmptyOutput counts notes whose embedder returned without error but with empty Sparse and/or ColBERT outputs — the heads produced no usable tokens. These notes are NOT counted as Embedded (their sparse_embedding / colbert_embedding columns would be NULL); they remain pending for the next run. See vaultmind#22 for the silent-failure pattern this surfaces.

type EmbeddingDimsCount

type EmbeddingDimsCount struct {
	Dims  int // 384 = MiniLM, 1024 = BGE-M3, 0 = no dense
	Count int
}

EmbeddingDimsCount is one (dimensions, count) pair from a vault.

func DetectEmbeddingDimsCounts

func DetectEmbeddingDimsCounts(d *DB) ([]EmbeddingDimsCount, error)

DetectEmbeddingDimsCounts returns the count of notes per dense-embedding dimensionality. A consistent vault has exactly one entry; a mixed-state vault (mid-upgrade from MiniLM to BGE-M3, or partial-rebuild) returns multiple. Used by `doctor` to surface mixed state explicitly instead of claiming a single model name. See vaultmind#22 dig.
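
A caller such as a health check might interpret the returned pairs roughly as follows. The struct mirrors the documented one, and the dimension-to-model mapping comes from its field comment. `modelForDims` and `isMixed` are hypothetical helpers, and ignoring rows with Dims == 0 when deciding whether a vault is mixed is an assumption.

```go
package main

import "fmt"

// EmbeddingDimsCount mirrors the documented struct.
type EmbeddingDimsCount struct {
	Dims  int
	Count int
}

// modelForDims maps the dimensionalities named in the struct comment.
func modelForDims(dims int) string {
	switch dims {
	case 384:
		return "MiniLM"
	case 1024:
		return "BGE-M3"
	case 0:
		return "no dense"
	default:
		return "unknown"
	}
}

// isMixed reports whether more than one non-zero dimensionality is present.
// ASSUMPTION: rows with Dims == 0 (no dense embedding) do not count, and
// each dimensionality appears in at most one entry.
func isMixed(counts []EmbeddingDimsCount) bool {
	seen := 0
	for _, c := range counts {
		if c.Dims > 0 && c.Count > 0 {
			seen++
		}
	}
	return seen > 1
}

func main() {
	counts := []EmbeddingDimsCount{{Dims: 384, Count: 120}, {Dims: 1024, Count: 35}}
	for _, c := range counts {
		fmt.Printf("%d notes at %d dims (%s)\n", c.Count, c.Dims, modelForDims(c.Dims))
	}
	fmt.Println("mixed:", isMixed(counts))
}
```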

type FTSResult

type FTSResult struct {
	ID       string  `json:"id"`
	Type     string  `json:"type"`
	Title    string  `json:"title"`
	Path     string  `json:"path"`
	Snippet  string  `json:"snippet"`
	Score    float64 `json:"score"`
	IsDomain bool    `json:"is_domain_note"`
}

FTSResult represents a single full-text search hit per SRS-09.

func SearchFTS

func SearchFTS(d *DB, query string, limit, offset int, filters ...SearchFilters) ([]FTSResult, error)

SearchFTS performs a full-text search against the fts_notes table. Returns results ordered by relevance (rank), limited and offset as specified.

type FullNote

type FullNote struct {
	ID          string                 `json:"id"`
	Type        string                 `json:"type"`
	Path        string                 `json:"path"`
	Title       string                 `json:"title"`
	Frontmatter map[string]interface{} `json:"frontmatter"`
	Body        string                 `json:"body,omitempty"`
	Headings    []HeadingRow           `json:"headings,omitempty"`
	Blocks      []BlockRow             `json:"blocks,omitempty"`
	IsDomain    bool                   `json:"is_domain_note"`
	Aliases     []string               `json:"-"`
	Tags        []string               `json:"-"`
}

FullNote contains all data for a single note.

type HeadingRecord

type HeadingRecord struct {
	Slug  string
	Level int
	Title string
}

HeadingRecord represents a heading for storage.

type HeadingRow

type HeadingRow struct {
	Level int    `json:"level"`
	Title string `json:"title"`
	Slug  string `json:"slug"`
}

HeadingRow represents a heading in query results.

type IndexAndEmbedResult

type IndexAndEmbedResult struct {
	Index *IndexResult `json:"index"`
	Embed *EmbedResult `json:"embed,omitempty"`
}

IndexAndEmbedResult combines index and optional embed results for command output.

type IndexError

type IndexError struct {
	Path  string `json:"path"`
	Kind  string `json:"kind"` // "read" | "parse" | "store" | "delete"
	Error string `json:"error"`
}

IndexError names a specific per-file failure during Rebuild or Incremental. The counter in IndexResult.Errors tells you *how many* files failed; ErrorDetails tells you WHICH files and WHY — without it, a partial-index failure is an unactionable number (manifesto #3).

type IndexResult

type IndexResult struct {
	DBPath            string             `json:"db_path"`
	Indexed           int                `json:"indexed"`
	DomainNotes       int                `json:"domain_notes"`
	UnstructuredNotes int                `json:"unstructured_notes"`
	Errors            int                `json:"errors"`
	Skipped           int                `json:"skipped"`
	DuplicateIDs      int                `json:"duplicate_ids"`
	Added             int                `json:"added"`
	Updated           int                `json:"updated"`
	Deleted           int                `json:"deleted"`
	FullRebuild       bool               `json:"full_rebuild"`
	DurationMs        int64              `json:"duration_ms"`
	CompletedAt       string             `json:"completed_at"`
	ErrorDetails      []IndexError       `json:"error_details,omitempty"`
	PostIndexWarnings []PostIndexWarning `json:"post_index_warnings,omitempty"`
}

IndexResult holds the outcome of an index run, either a full rebuild or an incremental pass (see FullRebuild).

type Indexer

type Indexer struct {
	// contains filtered or unexported fields
}

Indexer orchestrates vault scanning, parsing, and SQLite storage.

func NewIndexer

func NewIndexer(vaultRoot, dbPath string, cfg *vault.Config) *Indexer

NewIndexer creates an Indexer for the given vault.

func (*Indexer) EmbedNotes

func (idx *Indexer) EmbedNotes(ctx context.Context, dbPath string, embedder embedding.Embedder) (*EmbedResult, error)

EmbedNotes computes and stores embeddings for all notes that don't have one yet. It opens its own DB connection (like Rebuild/Incremental) so it can be called after the indexer has closed its connection.

func (*Indexer) Incremental

func (idx *Indexer) Incremental() (*IndexResult, error)

Incremental scans the vault and only indexes files that are new or changed (detected via content hash). Deleted files are removed from the index.
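
The hash-based change detection can be sketched as a comparison between the map returned by NoteHashes and a fresh scan of the vault. `planIncremental` and `plan` are hypothetical; the Touched bucket models the UpdateMTime case (hash unchanged, mtime changed). How the real indexer orders or batches this work is not described here.

```go
package main

import (
	"fmt"
	"sort"
)

// NoteHashInfo mirrors the documented struct.
type NoteHashInfo struct {
	Hash  string
	MTime int64
}

// plan is a hypothetical summary of the work an incremental pass would do.
type plan struct {
	Added, Updated, Touched, Deleted []string
}

// planIncremental compares indexed state with what is on disk, keyed by path.
func planIncremental(indexed, onDisk map[string]NoteHashInfo) plan {
	var p plan
	for path, disk := range onDisk {
		old, ok := indexed[path]
		switch {
		case !ok:
			p.Added = append(p.Added, path)
		case old.Hash != disk.Hash:
			p.Updated = append(p.Updated, path)
		case old.MTime != disk.MTime:
			p.Touched = append(p.Touched, path) // same content: refresh mtime only
		}
	}
	for path := range indexed {
		if _, ok := onDisk[path]; !ok {
			p.Deleted = append(p.Deleted, path)
		}
	}
	// Sort each bucket in place so the result is deterministic.
	for _, s := range [][]string{p.Added, p.Updated, p.Touched, p.Deleted} {
		sort.Strings(s)
	}
	return p
}

func main() {
	indexed := map[string]NoteHashInfo{
		"a.md": {"h1", 10}, "b.md": {"h2", 10}, "c.md": {"h3", 10}, "d.md": {"h4", 10},
	}
	onDisk := map[string]NoteHashInfo{
		"a.md": {"h1", 10}, "b.md": {"h2x", 20}, "c.md": {"h3", 30}, "e.md": {"h5", 40},
	}
	fmt.Printf("%+v\n", planIncremental(indexed, onDisk))
}
```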

func (*Indexer) IndexFile

func (idx *Indexer) IndexFile(relPath string) error

IndexFile re-indexes a single file by its vault-relative path.

func (*Indexer) Rebuild

func (idx *Indexer) Rebuild() (*IndexResult, error)

Rebuild performs a full rebuild: scan all .md files, parse, and store.

func (*Indexer) RunEmbed

func (idx *Indexer) RunEmbed(ctx context.Context, dbPath, model string) (*EmbedResult, error)

RunEmbed runs an embedding pass against the index DB. The embedder is constructed lazily — only when there's pending work to do.

Why lazy: the BGE-M3 model is ~2.2GB on disk and CGO+ORT session creation pegs a CPU core for ~1s every time it's invoked. Running `vaultmind index --embed --model bge-m3` against a fully-embedded vault (which happens whenever the user re-runs after editing zero notes — hooks, scripts, retries, doctor checks) used to pay that load cost unconditionally. Heat without work. Counting pending notes first lets us skip the model load entirely when there's nothing to do.
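
The lazy-construction pattern described above can be sketched like this: count pending work first, and call the expensive loader only when that count is non-zero. Everything here is hypothetical (`embedder`, `fakeEmbedder`, `runEmbed`, and the loader callback); the real RunEmbed takes a model name and a DB path instead.

```go
package main

import "fmt"

// embedder is a hypothetical stand-in for an expensive model session.
type embedder interface {
	Embed(text string) []float32
}

// fakeEmbedder returns a one-element vector so the sketch runs without a model.
type fakeEmbedder struct{}

func (fakeEmbedder) Embed(text string) []float32 {
	return []float32{float32(len(text))}
}

// runEmbed calls load only when there is pending work, and returns how many
// pending bodies produced a non-empty embedding.
func runEmbed(pending []string, load func() (embedder, error)) (int, error) {
	if len(pending) == 0 {
		return 0, nil // nothing to do: skip the model load entirely
	}
	e, err := load()
	if err != nil {
		return 0, err
	}
	embedded := 0
	for _, body := range pending {
		if len(e.Embed(body)) > 0 {
			embedded++
		}
	}
	return embedded, nil
}

func main() {
	loads := 0
	load := func() (embedder, error) {
		loads++ // stands in for the costly session creation
		return fakeEmbedder{}, nil
	}
	n, _ := runEmbed(nil, load)
	fmt.Println("embedded:", n, "model loads:", loads)
	n, _ = runEmbed([]string{"first note", "second note"}, load)
	fmt.Println("embedded:", n, "model loads:", loads)
}
```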

type LinkRecord

type LinkRecord struct {
	DstNoteID  string
	DstRaw     string
	EdgeType   string
	TargetKind string
	Heading    string
	BlockID    string
	Resolved   bool
	Confidence string
	Origin     string
	Weight     float64
}

LinkRecord represents a single outbound edge for storage.

type NoteAccessStats

type NoteAccessStats struct {
	NoteID         string
	AccessCount    int
	LastAccessedAt string // RFC3339Nano UTC, empty when never accessed
	Title          string
	NoteType       string
}

NoteAccessStats reports the access counters for a single note. Useful for doctor / debugging / verifying that RecordNoteAccess is firing on the paths it's supposed to. Title and NoteType are populated by ListAccessedNotes so the self-rendering layer can produce human-readable output without a separate join. LookupNoteAccess leaves them empty (single-id callers don't need them).

func ListAccessedNotes

func ListAccessedNotes(d *DB) ([]NoteAccessStats, error)

ListAccessedNotes returns access stats across all notes with at least one recorded access, sorted newest-first by last access timestamp. Backs `vaultmind self` and any caller that wants "everything that's been touched." For the agent-only filtered view (excluding hook accesses), use ListAccessedNotesByCaller.

Pre-2026-05-01 this read from the scalar columns. Post-migration-007 it reads from the events table so callers see consistent data with the caller-filtered variant.

func ListAccessedNotesByCaller

func ListAccessedNotesByCaller(d *DB, caller string) ([]NoteAccessStats, error)

ListAccessedNotesByCaller returns access stats restricted to events fired by the given caller. Used by `vaultmind self` to filter out hook fan-outs from the proprioceptive view: the SessionStart hook and per-turn pointer-recall fire RecordNoteAccess across many notes before the agent does any deliberate work, and showing them in the "hot" list pollutes the engagement signal `self` is supposed to surface. Pass an empty string to include all callers (matches ListAccessedNotes behaviour).

The "exclude" semantic — "show all callers EXCEPT X" — is provided by ListAccessedNotesExcludingCaller, which is the shape `self` actually wants ("agent + agent-neighbor, not hook").

func ListAccessedNotesExcludingCaller

func ListAccessedNotesExcludingCaller(d *DB, excludedCaller string) ([]NoteAccessStats, error)

ListAccessedNotesExcludingCaller returns access stats from all callers EXCEPT the one named. Single-caller exclusion form; see ListAccessedNotesExcludingCallers for the multi-caller form.

func ListAccessedNotesExcludingCallers

func ListAccessedNotesExcludingCallers(d *DB, excludedCallers []string) ([]NoteAccessStats, error)

ListAccessedNotesExcludingCallers returns access stats from all callers EXCEPT the ones named. Used by `vaultmind self` to filter out *both* hook fan-outs (CallerHook) and Ask context-pack neighbors (CallerAgentNeighbor) so the proprioceptive view reflects only deliberate-target accesses (CallerAgent — Ask top-hit + note get).

Round-1 review caught hook pollution; round-2 review caught the next-louder source: a single Ask fires N+1 access events (target + N neighbors), and an off-target nonsense query's neighbor fan-out dominates the hot list. Both sources of pollution are filtered at the same caller-dimension layer the schema already provides.

Empty list returns the unfiltered view (matches ListAccessedNotes).
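
The exclusion semantics can be modelled over an in-memory event list as below. This is a sketch under assumptions: the real function queries SQLite, timestamps are simplified to integers, and breaking ties by note ID is a choice made here for determinism. `event`, `stats`, and `hotList` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// event models one row of the per-access log.
type event struct {
	NoteID, Caller string
	AccessedAt     int64 // simplified timestamp; larger is newer
}

// stats is a trimmed-down stand-in for NoteAccessStats.
type stats struct {
	NoteID       string
	AccessCount  int
	LastAccessed int64
}

// hotList aggregates events per note, skipping excluded callers, and sorts
// newest-first. An empty excluded list yields the unfiltered view.
func hotList(events []event, excluded []string) []stats {
	skip := map[string]bool{}
	for _, c := range excluded {
		skip[c] = true
	}
	byNote := map[string]*stats{}
	for _, e := range events {
		if skip[e.Caller] {
			continue
		}
		s, ok := byNote[e.NoteID]
		if !ok {
			s = &stats{NoteID: e.NoteID}
			byNote[e.NoteID] = s
		}
		s.AccessCount++
		if e.AccessedAt > s.LastAccessed {
			s.LastAccessed = e.AccessedAt
		}
	}
	out := make([]stats, 0, len(byNote))
	for _, s := range byNote {
		out = append(out, *s)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].LastAccessed != out[j].LastAccessed {
			return out[i].LastAccessed > out[j].LastAccessed
		}
		return out[i].NoteID < out[j].NoteID
	})
	return out
}

func main() {
	events := []event{
		{"a", "agent", 1}, {"b", "agent-neighbor", 2}, {"c", "hook", 3}, {"a", "agent", 4},
	}
	fmt.Println(hotList(events, []string{"hook", "agent-neighbor"}))
	fmt.Println(hotList(events, nil))
}
```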

func LookupNoteAccess

func LookupNoteAccess(d *DB, noteID string) (NoteAccessStats, error)

LookupNoteAccess returns the access stats for a single note, or (zero-stats, nil) when the note doesn't exist (deliberately mirrors QueryFullNote's "not found" semantics — caller checks whether NoteID came back populated). Reads from the scalar columns; for caller-aware lookups use the event-table queries directly.

type NoteColBERTEmbedding

type NoteColBERTEmbedding struct {
	NoteID   string
	ColBERT  [][]float32
	Type     string
	Title    string
	Path     string
	BodyText string
	IsDomain bool
}

NoteColBERTEmbedding pairs a note ID with its ColBERT matrix and metadata.

func LoadAllColBERTEmbeddings

func LoadAllColBERTEmbeddings(d *DB, dims int) ([]NoteColBERTEmbedding, error)

LoadAllColBERTEmbeddings returns all notes that have stored ColBERT embeddings. dims is the embedding dimensionality for decoding.

type NoteEmbedding

type NoteEmbedding struct {
	NoteID    string
	Embedding []float32
	Type      string
	Title     string
	Path      string
	BodyText  string
	IsDomain  bool
}

NoteEmbedding pairs a note ID with its embedding vector and metadata.

func LoadAllEmbeddings

func LoadAllEmbeddings(d *DB) ([]NoteEmbedding, error)

LoadAllEmbeddings returns all notes that have stored embeddings, including metadata. This is a single query that avoids N+1 lookups when scoring and filtering results.

type NoteHashInfo

type NoteHashInfo struct {
	Hash  string
	MTime int64
}

NoteHashInfo holds the content hash and modification time for a note.

type NoteRecord

type NoteRecord struct {
	ID       string
	Path     string
	Title    string
	Type     string
	Status   string
	Created  string
	Updated  string
	BodyText string
	Hash     string
	MTime    int64
	IsDomain bool
	Aliases  []string
	Tags     []string
	ExtraKV  map[string]interface{}
	Links    []LinkRecord
	Headings []HeadingRecord
	Blocks   []BlockRecord
}

NoteRecord is the storage-ready representation of a parsed note. The indexer builds this from parser.ParsedNote + file metadata.

type NoteRow

type NoteRow struct {
	ID       string
	Type     string
	Title    string
	Path     string
	Status   string
	IsDomain bool
}

NoteRow is a lightweight note record returned by query methods.

type NoteSparseEmbedding

type NoteSparseEmbedding struct {
	NoteID   string
	Sparse   map[int32]float32
	Type     string
	Title    string
	Path     string
	BodyText string
	IsDomain bool
}

NoteSparseEmbedding pairs a note ID with its sparse vector and metadata.

func LoadAllSparseEmbeddings

func LoadAllSparseEmbeddings(d *DB) ([]NoteSparseEmbedding, error)

LoadAllSparseEmbeddings returns all notes that have stored sparse embeddings.

type NoteTitle

type NoteTitle struct {
	ID    string
	Title string
}

NoteTitle pairs a note's ID with its display title. Used by callers that need to list notes by title without loading full frontmatter/body (e.g. the ask command's fuzzy-title fallback on zero hits).

type PostIndexWarning

type PostIndexWarning struct {
	Step  string `json:"step"`
	Error string `json:"error"`
}

PostIndexWarning reports failure of a post-store pass (link resolution, alias detection, tag overlap). These run after the note-store transaction commits; their failure leaves a partially-connected graph the operator can't distinguish from a successful run without this surface.

Conventional Step values: "orphan_sweep", "link_resolution", "alias_mention", "tag_overlap".

type SearchFilters

type SearchFilters struct {
	Type string // Filter by note type (empty = no filter)
	Tag  string // Filter by tag (empty = no filter)
}

SearchFilters holds optional filters for FTS search.
