context

package
v0.6.2
Warning

This package is not in the latest version of its module.

Published: May 22, 2026 License: MIT Imports: 12 Imported by: 0

Documentation

Overview

Package context implements graph-aware context packing for AI agent consumption.

Package context also provides equivalence class retrieval, which bridges the vocabulary gap between natural-language task descriptions and code symbol names.

An equivalence class maps a concept (like "TRANSITIVE_IMPACT") to multiple phrases that developers use to describe it ("blast radius", "impact analysis", "downstream callers") and the specific symbols/tools those phrases should resolve to ("TransitiveCallers", "BlastRadius", "blast_radius").
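The following is a minimal sketch of resolving a task description through equivalence classes. It assumes plain case-insensitive substring matching of phrases. The `equivClass` type and `matchTargets` function are hypothetical stand-ins, not this package's API, and the real retrieval may tokenize, weight, or rank matches differently.

```go
package main

import (
	"sort"
	"strings"
)

// equivClass is a hypothetical stand-in holding a subset of the fields
// documented on EquivalenceClass.
type equivClass struct {
	Concept string
	Phrases []string
	Targets []string
}

// matchTargets returns the sorted, de-duplicated targets of every class with
// at least one phrase that occurs (case-insensitively) in the task text.
func matchTargets(task string, classes []equivClass) []string {
	lower := strings.ToLower(task)
	seen := map[string]bool{}
	var targets []string
	for _, c := range classes {
		for _, p := range c.Phrases {
			if !strings.Contains(lower, strings.ToLower(p)) {
				continue
			}
			for _, t := range c.Targets {
				if !seen[t] {
					seen[t] = true
					targets = append(targets, t)
				}
			}
			break // one matching phrase is enough for this class
		}
	}
	sort.Strings(targets)
	return targets
}
```

Applied to the `TRANSITIVE_IMPACT` class described above, the task "Compute the blast radius of this change" resolves to `BlastRadius`, `TransitiveCallers`, and `blast_radius`.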

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func ComputeHITS

func ComputeHITS(ctx stdctx.Context, store types.GraphStore, nodes []types.Hash, maxIter int) (map[types.Hash]HITSScores, error)

ComputeHITS runs the HITS (Hyperlink-Induced Topic Search) algorithm on a subgraph defined by the given node hashes. It computes authority scores (nodes that are heavily pointed to) and hub scores (nodes that point to many authorities).

In the context of code graphs:

  • Authority = heavily called functions, core types, key interfaces
  • Hub = orchestrators, entry points, functions that wire things together

Parameters:

  • store: graph store for edge lookups
  • nodes: the subgraph to analyze (typically the top 200 RWR results)
  • maxIter: number of iterations (5-10 is typically enough to converge)

Returns a map from node hash to HITS scores.
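The sketch below shows the textbook HITS iteration on a small in-memory graph. It assumes string node IDs and an adjacency map in place of types.Hash and GraphStore. `computeHITS`, `hitsScores`, and the per-round L2 normalization are illustrative choices; the package's own normalization and edge selection are not documented here.

```go
package main

import "math"

// hitsScores mirrors the documented HITSScores type.
type hitsScores struct {
	Authority float64
	Hub       float64
}

// computeHITS runs maxIter rounds of HITS over the subgraph induced by nodes.
// out maps each node to the nodes it points to (e.g. callees); edges that
// leave the subgraph are ignored.
func computeHITS(nodes []string, out map[string][]string, maxIter int) map[string]hitsScores {
	inSet := make(map[string]bool, len(nodes))
	for _, n := range nodes {
		inSet[n] = true
	}
	auth := make(map[string]float64, len(nodes))
	hub := make(map[string]float64, len(nodes))
	for _, n := range nodes {
		auth[n], hub[n] = 1, 1
	}
	for i := 0; i < maxIter; i++ {
		// Authority: sum of hub scores of the nodes pointing at a node.
		newAuth := make(map[string]float64, len(nodes))
		for _, src := range nodes {
			for _, dst := range out[src] {
				if inSet[dst] {
					newAuth[dst] += hub[src]
				}
			}
		}
		normalize(newAuth)
		// Hub: sum of (updated) authority scores of the nodes it points at.
		newHub := make(map[string]float64, len(nodes))
		for _, src := range nodes {
			for _, dst := range out[src] {
				if inSet[dst] {
					newHub[src] += newAuth[dst]
				}
			}
		}
		normalize(newHub)
		auth, hub = newAuth, newHub
	}
	res := make(map[string]hitsScores, len(nodes))
	for _, n := range nodes {
		res[n] = hitsScores{Authority: auth[n], Hub: hub[n]}
	}
	return res
}

// normalize scales v to unit L2 norm; an all-zero vector is left unchanged.
func normalize(v map[string]float64) {
	var sum float64
	for _, x := range v {
		sum += x * x
	}
	if sum == 0 {
		return
	}
	norm := math.Sqrt(sum)
	for k := range v {
		v[k] /= norm
	}
}
```

In a call graph, a function called from many orchestrators accumulates authority, and an orchestrator that calls many such functions accumulates hub score.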

func EstimateNodeTokens

func EstimateNodeTokens(n types.Node) int

EstimateNodeTokens estimates the token cost of including a node's full representation in context output. It takes no format argument; use EstimateNodeTokensForFormat for format-aware scaling.

func EstimateNodeTokensForFormat added in v0.2.0

func EstimateNodeTokensForFormat(n types.Node, format string) int

EstimateNodeTokensForFormat estimates token cost with format-aware scaling. GCF uses local IDs and positional encoding, producing ~84% fewer tokens than JSON for the same symbol data.

func EstimateTokens

func EstimateTokens(text string) int

EstimateTokens returns an approximate token count for a given text string. Uses the heuristic that code averages ~4 characters per token.

func FormatContextBlock

func FormatContextBlock(block *ContextBlock, format string) (string, error)

FormatContextBlock renders a ContextBlock into the requested format. Supported formats: "xml" (default), "markdown", "json". Returns an error for unknown formats.

func NormalizeKeywords added in v0.2.0

func NormalizeKeywords(taskDescription string) string

NormalizeKeywords extracts and normalizes keywords from a task description for storage and matching. It reuses the package's keyword extraction logic but returns the keywords as a single space-joined string suitable for SQL LIKE matching.
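The sketch below illustrates this kind of normalization. The stopword list, the one-letter cutoff, and sorting the output are all assumptions, since the package's extraction rules are not documented here. `normalizeKeywords` and `stopwords` are hypothetical.

```go
package main

import (
	"sort"
	"strings"
	"unicode"
)

// stopwords is a small, illustrative list, not the package's real one.
var stopwords = map[string]bool{
	"the": true, "a": true, "an": true, "of": true, "to": true,
	"in": true, "for": true, "and": true, "how": true, "does": true,
}

// normalizeKeywords lowercases the description, splits it on any
// non-alphanumeric rune, drops stopwords and one-letter tokens,
// de-duplicates, sorts, and joins the result with single spaces.
func normalizeKeywords(task string) string {
	fields := strings.FieldsFunc(strings.ToLower(task), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	seen := map[string]bool{}
	var kws []string
	for _, f := range fields {
		if len(f) < 2 || stopwords[f] || seen[f] {
			continue
		}
		seen[f] = true
		kws = append(kws, f)
	}
	sort.Strings(kws)
	return strings.Join(kws, " ")
}
```

A stored string such as "analysis blast radius work" can then be matched per keyword with a pattern like `keywords LIKE '%radius%'`.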

func RandomWalkWithRestart

func RandomWalkWithRestart(ctx stdctx.Context, store types.GraphStore, seeds []types.Hash, alpha float64, maxIter int) (map[types.Hash]float64, error)

RandomWalkWithRestart computes relevance scores for all nodes reachable from the seed set by simulating random walks that restart at seed nodes with probability alpha. The stationary distribution assigns higher scores to nodes that are structurally close to the seeds and highly connected.

Parameters:

  • store: graph store for edge lookups
  • seeds: initial nodes to start walks from (weighted uniformly)
  • alpha: restart probability (0.2 means a 20% chance of returning to a seed at each step)
  • maxIter: maximum number of iterations (20 is typically enough to converge)

Returns a map from node hash to relevance score (0.0 to 1.0, normalized).
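The following is a minimal power-iteration sketch of Random Walk with Restart. It uses string node IDs and an adjacency map in place of types.Hash and GraphStore. Sending walkers at dead-end nodes back to the seeds and keeping scores as a probability distribution (summing to 1) are assumptions; the package's exact normalization is not specified beyond "0.0 to 1.0". `randomWalkWithRestart` here is hypothetical.

```go
package main

// randomWalkWithRestart estimates RWR scores by power iteration over an
// adjacency map. Each step, a walker follows a uniformly random out-edge with
// probability 1-alpha or jumps back to a uniformly chosen seed with
// probability alpha. Walkers stuck at a node with no out-edges also jump
// back, so the scores always sum to 1.
func randomWalkWithRestart(out map[string][]string, seeds []string, alpha float64, maxIter int) map[string]float64 {
	scores := make(map[string]float64)
	if len(seeds) == 0 {
		return scores
	}
	perSeed := 1.0 / float64(len(seeds))
	for _, s := range seeds {
		scores[s] += perSeed // start from the uniform seed distribution
	}
	for i := 0; i < maxIter; i++ {
		next := make(map[string]float64)
		restartMass := alpha // alpha of the total mass (1.0) restarts every step
		for node, p := range scores {
			nbrs := out[node]
			if len(nbrs) == 0 {
				restartMass += (1 - alpha) * p // dead end: this mass restarts too
				continue
			}
			share := (1 - alpha) * p / float64(len(nbrs))
			for _, n := range nbrs {
				next[n] += share
			}
		}
		for _, s := range seeds {
			next[s] += restartMass * perSeed
		}
		scores = next
	}
	return scores
}
```

Nodes that cannot be reached from any seed keep a score of zero, which is why RWR naturally limits retrieval to the seeds' structural neighborhood.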

Types

type AutoConceptGenerator added in v0.2.0

type AutoConceptGenerator struct {
	// contains filtered or unexported fields
}

AutoConceptGenerator scans all symbols in the graph and generates equivalence classes from naming patterns, providing repo-specific vocabulary without hand curation.

Strategies:

  • Handler pattern: "handleBlastRadius" -> concept "blast radius" -> target symbol
  • Tool pattern: "blastRadiusTool" -> concept "blast radius tool" -> target symbol
  • Package grouping: all symbols in "search/" -> concept phrases from package name
  • Composite names: "IncrementalReindex" -> "incremental reindex" as searchable phrase
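The handler, tool, and composite-name strategies all depend on splitting identifiers into words. The following sketch covers camelCase and PascalCase names only. `splitIdentifier` and `conceptPhrase` are hypothetical helpers, and the package's handling of underscores, digits, and other prefixes is not documented here.

```go
package main

import (
	"strings"
	"unicode"
)

// splitIdentifier breaks a camelCase or PascalCase identifier into lowercase
// words, e.g. "IncrementalReindex" -> ["incremental", "reindex"]. Runs of
// capitals stay together: "HTTPServer" -> ["http", "server"].
func splitIdentifier(name string) []string {
	runes := []rune(name)
	var words []string
	start := 0
	for i := 1; i < len(runes); i++ {
		prev, cur := runes[i-1], runes[i]
		var next rune
		if i+1 < len(runes) {
			next = runes[i+1]
		}
		lowerToUpper := unicode.IsLower(prev) && unicode.IsUpper(cur)
		acronymEnd := unicode.IsUpper(prev) && unicode.IsUpper(cur) && unicode.IsLower(next)
		if lowerToUpper || acronymEnd {
			words = append(words, strings.ToLower(string(runes[start:i])))
			start = i
		}
	}
	if len(runes) > 0 {
		words = append(words, strings.ToLower(string(runes[start:])))
	}
	return words
}

// conceptPhrase drops a leading "handle" word (handler pattern) and joins the
// remaining words: "handleBlastRadius" -> "blast radius",
// "blastRadiusTool" -> "blast radius tool".
func conceptPhrase(name string) string {
	words := splitIdentifier(name)
	if len(words) > 1 && words[0] == "handle" {
		words = words[1:]
	}
	return strings.Join(words, " ")
}
```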

func NewAutoConceptGenerator added in v0.2.0

func NewAutoConceptGenerator(store types.GraphStore) *AutoConceptGenerator

NewAutoConceptGenerator creates a generator backed by the given store.

func (*AutoConceptGenerator) Generate added in v0.2.0

Generate scans the graph and produces equivalence classes from symbol naming patterns.

type BM25Searcher added in v0.2.0

type BM25Searcher interface {
	SearchBM25Nodes(ctx stdctx.Context, query string, limit int) ([]types.Node, error)
}

BM25Searcher is implemented by stores that support full-text BM25 search. Returns nodes ordered by BM25 relevance (best matches first).
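BM25Searcher leaves the scoring to the store. To show what "BM25 relevance" means, the sketch below ranks plain strings with the standard Okapi BM25 formula, using the common defaults k1 = 1.2 and b = 0.75 and the non-negative IDF variant. `bm25Rank` is hypothetical. A real store would more likely rely on its database's full-text index than score documents in Go.

```go
package main

import (
	"math"
	"sort"
	"strings"
)

// bm25Rank scores docs against query with Okapi BM25 and returns the indices
// of documents matching at least one query term, best first, truncated to
// limit. Tokenization is a plain lowercase whitespace split.
func bm25Rank(docs []string, query string, limit int) []int {
	const k1, b = 1.2, 0.75 // common default parameters
	if len(docs) == 0 {
		return nil
	}
	tokens := make([][]string, len(docs))
	df := map[string]int{} // number of documents containing each term
	totalLen := 0
	for i, d := range docs {
		tokens[i] = strings.Fields(strings.ToLower(d))
		totalLen += len(tokens[i])
		seen := map[string]bool{}
		for _, t := range tokens[i] {
			if !seen[t] {
				seen[t] = true
				df[t]++
			}
		}
	}
	n := float64(len(docs))
	avgLen := float64(totalLen) / n
	scores := make([]float64, len(docs))
	for _, q := range strings.Fields(strings.ToLower(query)) {
		nq := float64(df[q])
		idf := math.Log((n-nq+0.5)/(nq+0.5) + 1)
		for i, toks := range tokens {
			tf := 0
			for _, t := range toks {
				if t == q {
					tf++
				}
			}
			if tf == 0 {
				continue
			}
			f := float64(tf)
			lenNorm := 1 - b + b*float64(len(toks))/avgLen // penalize long documents
			scores[i] += idf * f * (k1 + 1) / (f + k1*lenNorm)
		}
	}
	var order []int
	for i, s := range scores {
		if s > 0 {
			order = append(order, i)
		}
	}
	sort.SliceStable(order, func(x, y int) bool { return scores[order[x]] > scores[order[y]] })
	if limit >= 0 && limit < len(order) {
		order = order[:limit]
	}
	return order
}
```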

type ContextBlock

type ContextBlock struct {
	Symbols     []RankedSymbol
	Edges       []ContextEdge
	Format      string
	TokensUsed  int
	TokenBudget int
	// PackRoot is the content-addressed identity of this context pack.
	// Computed from hash(task_normalized, snapshot_root, selected_node_hashes).
	// Two identical queries against the same graph state produce the same PackRoot,
	// enabling deduplication, citation, and cross-session replay.
	PackRoot types.Hash
}

ContextBlock is the result of a context query: a ranked list of symbols that fit within a token budget, plus the edges between them.
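The following is a minimal sketch of how a PackRoot could be computed from its three documented inputs. The choice of SHA-256, hex-encoded string hashes, length-prefixed fields, and sorting the node hashes before hashing are all assumptions; the document specifies only the inputs. `packRoot` is hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"sort"
)

// packRoot derives a content-addressed ID from the normalized task, the graph
// snapshot root, and the selected node hashes. The node hashes are sorted so
// the ID does not depend on ranking order, and every field is length-prefixed
// so that different field splits cannot produce the same byte stream.
func packRoot(taskNormalized, snapshotRoot string, nodeHashes []string) string {
	sorted := append([]string(nil), nodeHashes...)
	sort.Strings(sorted)
	h := sha256.New()
	writeField := func(s string) {
		var n [8]byte
		binary.BigEndian.PutUint64(n[:], uint64(len(s)))
		h.Write(n[:])
		h.Write([]byte(s))
	}
	writeField(taskNormalized)
	writeField(snapshotRoot)
	for _, nh := range sorted {
		writeField(nh)
	}
	return hex.EncodeToString(h.Sum(nil))
}
```

Because the digest covers the snapshot root, the same task re-run after the graph changes yields a different PackRoot, while an unchanged graph reproduces the original one.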

type ContextEdge

type ContextEdge struct {
	Source   string // qualified name of source
	Target   string // qualified name of target
	EdgeType string
}

ContextEdge is an edge between two symbols in the context block.

type ContextEngine

type ContextEngine struct {
	// contains filtered or unexported fields
}

ContextEngine queries the knowing knowledge graph to produce task-specific, token-budgeted context blocks ranked by graph relationships and runtime traffic.

func NewContextEngine

func NewContextEngine(store types.GraphStore) *ContextEngine

NewContextEngine creates a ContextEngine backed by the given GraphStore. If the store implements FeedbackProvider, feedback-aware reranking is enabled.

func (*ContextEngine) ExplainSymbol added in v0.2.0

func (e *ContextEngine) ExplainSymbol(ctx stdctx.Context, task string, symbolQuery string) (*ExplainResult, error)

ExplainSymbol runs the full retrieval pipeline for a task and returns a detailed scoring breakdown for a specific symbol. If the symbol is not in the results, it still returns whatever information is available (e.g., "not found in seed set, not reached by RWR").

func (*ContextEngine) ForFiles

func (e *ContextEngine) ForFiles(ctx stdctx.Context, opts FileOptions) (*ContextBlock, error)

ForFiles produces blast-radius context weighted by runtime observations for a set of changed files.

func (*ContextEngine) ForPR

func (e *ContextEngine) ForPR(ctx stdctx.Context, opts PROptions) (*ContextBlock, error)

ForPR produces relationship-aware context for a pull request. It identifies all symbols in the changed files, runs RWR from them to find the broader impact neighborhood, and includes blast radius (callers of changed symbols) as distance-1 context. This is the highest-value context call: one invocation at PR-open time surfaces the full structural impact.

func (*ContextEngine) ForTask

func (e *ContextEngine) ForTask(ctx stdctx.Context, opts TaskOptions) (*ContextBlock, error)

ForTask produces ranked context for a task description by finding relevant symbols in the knowledge graph, scoring them, and packing them within the token budget.

func (*ContextEngine) SetCache added in v0.3.0

func (e *ContextEngine) SetCache(c *cache.SubgraphCache)

SetCache attaches a SubgraphCache for result memoization. When set, ForTask checks the cache before running retrieval and stores the result after a cache miss. Cache keys are derived from the normalized task description so that identical queries skip the full retrieval pipeline. Pass nil to disable caching.

func (*ContextEngine) SetSession added in v0.2.0

func (e *ContextEngine) SetSession(st *SessionTracker)

SetSession attaches a session tracker to the engine. When set, symbols returned by previous queries in this session receive a boost on subsequent queries. Pass nil to disable session-aware boosting.

func (*ContextEngine) SetTaskMemory added in v0.2.0

func (e *ContextEngine) SetTaskMemory(tm *TaskMemory)

SetTaskMemory attaches a task memory for passive retrieval learning. When set, past task-symbol associations boost future queries with similar keywords.

func (*ContextEngine) SetVector added in v0.2.0

func (e *ContextEngine) SetVector(vs VectorSearcher)

SetVector attaches a vector search backend to the engine.

type EquivalenceClass added in v0.2.0

type EquivalenceClass struct {
	Concept    string   // canonical concept ID (e.g., "TRANSITIVE_IMPACT")
	Phrases    []string // natural-language phrases that refer to this concept
	Targets    []string // symbol/tool identifiers to boost when phrases match
	TargetType string   // "symbol", "mcp_tool", "edge_type", "workflow", "file"
	Weight     float64  // source strength (seed: 1.0, graph: 0.7, feedback: 0.5)
	Source     string   // "seed", "graph", "feedback", "generated"
}

EquivalenceClass maps a concept to its natural-language phrases and code targets.

type ExplainResult added in v0.2.0

type ExplainResult struct {
	Symbol        types.Node
	Rank          int     // 1-indexed position in the ranked results
	TotalScore    float64 // final score after all components
	TotalSymbols  int     // total symbols considered
	Components    ScoreComponents
	HITSAuthority float64  // raw HITS authority score (0 if HITS not run)
	HITSHub       float64  // raw HITS hub score
	HITSAdjust    float64  // net HITS adjustment applied to total
	RWRScore      float64  // raw Random Walk with Restart score
	IsSeed        bool     // was this a direct keyword match (distance=0)?
	SeedChannel   string   // which channel found this symbol ("tiered", "bm25", "equiv", "rwr")
	SeedTier      string   // for tiered matches: "exact", "prefix", "substring", "path"
	EquivMatches  []string // equivalence classes that matched (concept names)
	Keywords      []string // extracted keywords from the task description
	MaxCallers    int      // max caller count in the candidate set (normalization denominator)
	CallerProxy   int      // RWR-derived caller proxy for this symbol
}

ExplainResult is the full scoring breakdown for a symbol in the context of a task query. Every field that contributed to the final score is exposed.

type FeedbackProvider

type FeedbackProvider interface {
	FeedbackBoosts(ctx stdctx.Context, hashes []types.Hash, neighborhoodRoots map[types.Hash]types.Hash) (map[types.Hash]float64, error)
}

FeedbackProvider is implemented by stores that support feedback queries.

type FileOptions

type FileOptions struct {
	Files       []string // relative file paths
	RepoURL     string   // repo URL for resolving file hashes
	TokenBudget int      // default 50000
	Format      string   // "xml", "markdown", "json"
}

FileOptions configures a file-based context query.

type HITSScores

type HITSScores struct {
	Authority float64
	Hub       float64
}

HITSScores holds the authority and hub scores for a node.

type PROptions

type PROptions struct {
	Files       []string // changed file paths (relative to repo root)
	RepoURL     string   // repo URL for resolving file hashes
	TokenBudget int      // default 8000 (larger than per-edit, used once per PR)
	Format      string   // "xml", "markdown", "json", "gcf"
}

PROptions configures a PR context query.

type PackDiff added in v0.4.0

type PackDiff struct {
	// OldPackRoot is the PackRoot of the first pack.
	OldPackRoot types.Hash
	// NewPackRoot is the PackRoot of the second pack.
	NewPackRoot types.Hash

	// AddedSymbols are in new but not in old.
	AddedSymbols []string
	// RemovedSymbols are in old but not in new.
	RemovedSymbols []string
	// CommonSymbols are in both.
	CommonSymbols []string

	// Identical is true if the packs have the same symbols (PackRoots may
	// still differ if token budgets differ).
	Identical bool
}

PackDiff describes the difference between two context packs.

func CompareContextPacks added in v0.4.0

func CompareContextPacks(old, new *ContextBlock) PackDiff

CompareContextPacks compares the symbols of two context blocks, reporting those added, those removed, and those common to both. This answers "what changed in the context this agent would see?"
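A minimal sketch of the set comparison behind PackDiff. It works on plain symbol-name slices and assumes results are sorted for stable output. `packDiff` and `comparePacks` are hypothetical stand-ins for the documented types.

```go
package main

import "sort"

// packDiff mirrors the symbol-level fields of the documented PackDiff.
type packDiff struct {
	Added, Removed, Common []string
	Identical              bool
}

// comparePacks diffs two symbol-name lists: names only in newSyms are added,
// names only in oldSyms are removed, and names in both are common.
func comparePacks(oldSyms, newSyms []string) packDiff {
	inOld := make(map[string]bool, len(oldSyms))
	for _, s := range oldSyms {
		inOld[s] = true
	}
	inNew := make(map[string]bool, len(newSyms))
	for _, s := range newSyms {
		inNew[s] = true
	}
	var d packDiff
	for s := range inNew {
		if inOld[s] {
			d.Common = append(d.Common, s)
		} else {
			d.Added = append(d.Added, s)
		}
	}
	for s := range inOld {
		if !inNew[s] {
			d.Removed = append(d.Removed, s)
		}
	}
	sort.Strings(d.Added)
	sort.Strings(d.Removed)
	sort.Strings(d.Common)
	d.Identical = len(d.Added) == 0 && len(d.Removed) == 0
	return d
}
```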

type RankedSymbol

type RankedSymbol struct {
	Node       types.Node
	Score      float64
	Components ScoreComponents
	Provenance string
	Distance   int
}

RankedSymbol is a graph node paired with its computed relevance score and score breakdown.

func RankSymbols

func RankSymbols(symbols []ScoringInput, hitsScores ...map[types.Hash]HITSScores) []RankedSymbol

RankSymbols scores each symbol by a weighted formula incorporating blast radius, confidence, recency, and graph distance, then returns them sorted by score in descending order. Blast radius is normalized against the largest caller count in the input set, so the full 0.0-1.0 range is used regardless of codebase size.

If HITS scores are provided (non-nil map), authority scores are factored into the ranking, promoting structurally important nodes (heavily called) over leaf functions.
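The sketch below shows the shape of such a weighted ranking with max-normalized blast radius and an optional HITS authority term. The weights and the 1/(1+hops) distance decay are hypothetical. Recency, feedback, session, and test-file adjustments are omitted, so this is not the package's actual formula.

```go
package main

import "sort"

// scoringInput is a trimmed-down, hypothetical stand-in for ScoringInput.
type scoringInput struct {
	Name        string
	CallerCount int     // transitive callers (blast radius)
	Confidence  float64 // 0.0-1.0
	Distance    int     // hops from the task target
	Authority   float64 // HITS authority; 0 when HITS was not run
}

type rankedSymbol struct {
	Name  string
	Score float64
}

// Hypothetical weights chosen for illustration only.
const (
	wBlast     = 0.4
	wConf      = 0.3
	wDist      = 0.3
	wAuthority = 0.2
)

// rankSymbols scores each input with a weighted sum and sorts descending.
// Blast radius is divided by the largest CallerCount in the input set, so the
// most-called symbol always contributes the full blast-radius weight.
func rankSymbols(in []scoringInput) []rankedSymbol {
	maxCallers := 0
	for _, s := range in {
		if s.CallerCount > maxCallers {
			maxCallers = s.CallerCount
		}
	}
	out := make([]rankedSymbol, 0, len(in))
	for _, s := range in {
		blast := 0.0
		if maxCallers > 0 {
			blast = float64(s.CallerCount) / float64(maxCallers)
		}
		dist := 1 / (1 + float64(s.Distance)) // 1.0 for direct hits, decaying per hop
		score := wBlast*blast + wConf*s.Confidence + wDist*dist + wAuthority*s.Authority
		out = append(out, rankedSymbol{Name: s.Name, Score: score})
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	return out
}
```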

type ScoreComponents

type ScoreComponents struct {
	BlastRadius float64
	Confidence  float64
	Recency     float64
	Distance    float64
	Feedback    float64
	Session     float64
}

ScoreComponents breaks down a symbol's score into its weighted components.

type ScoringInput

type ScoringInput struct {
	Node               types.Node
	CallerCount        int     // number of transitive callers (blast radius)
	Confidence         float64 // provenance tier confidence (0.0-1.0)
	LastObserved       int64   // unix timestamp of last runtime observation (0 = static only)
	DistanceFromTarget int     // hops from the task target symbol
	FeedbackBoost      float64 // 0.0 = no feedback, >0 = positive signal (0.0-1.0)
	SessionBoost       float64 // 0.0 = not seen this session, >0 = recently accessed (0.0-2.0)
	IsTestFile         bool    // true if the symbol is from a test file (deprioritized unless task is about testing)
}

ScoringInput provides the raw data needed to compute a symbol's relevance score.

type SessionTracker added in v0.2.0

type SessionTracker struct {
	// contains filtered or unexported fields
}

SessionTracker records which symbols were returned by the context engine during the current session. Subsequent queries boost these symbols and their graph neighbors, implementing the "session-aware retrieval" pattern where repeated interactions surface increasingly relevant context.

Design informed by competitive analysis:

  • Exponential decay (3-minute half-life for AI sessions, not days)
  • Capped boost multiplier (max 2.0x, prevents runaway dominance)
  • Tracks both returned symbols and queried files
  • Thread-safe for concurrent MCP tool calls

func NewSessionTracker added in v0.2.0

func NewSessionTracker() *SessionTracker

NewSessionTracker creates a tracker for the current session.

func (*SessionTracker) Count added in v0.2.0

func (st *SessionTracker) Count() int

Count returns the number of unique symbols tracked this session.

func (*SessionTracker) Record added in v0.2.0

func (st *SessionTracker) Record(hash types.Hash)

Record marks a symbol as accessed at the current time. Call this for every symbol returned in a context result.

func (*SessionTracker) RecordBatch added in v0.2.0

func (st *SessionTracker) RecordBatch(hashes []types.Hash)

RecordBatch marks multiple symbols as accessed.

func (*SessionTracker) Reset added in v0.2.0

func (st *SessionTracker) Reset()

Reset clears all session history.

func (*SessionTracker) SessionBoosts added in v0.2.0

func (st *SessionTracker) SessionBoosts(hashes []types.Hash) map[types.Hash]float64

SessionBoosts returns a boost multiplier for each requested hash based on how recently and frequently it was accessed this session. Values range from 0.0 (never accessed) to maxBoost (frequently/recently accessed). The boost decays exponentially from each access timestamp.

type TaskMemory added in v0.2.0

type TaskMemory struct {
	// contains filtered or unexported fields
}

TaskMemory persists which symbols were useful for which tasks, enabling the retrieval pipeline to learn from past agent interactions. Over time, the system develops per-repo vocabulary: "when a developer asks about X, these symbols tend to be what they actually need."

The memory is passive: it records which symbols were returned by context_for_task and later accessed in the session (via SessionTracker). No explicit user action is required.

func NewTaskMemory added in v0.2.0

func NewTaskMemory(db *sql.DB) *TaskMemory

NewTaskMemory creates a task memory backed by the given database. The database must have the task_memory table (migration 008).

func (*TaskMemory) Count added in v0.2.0

func (tm *TaskMemory) Count(ctx context.Context) int

Count returns the number of stored task-symbol associations.

func (*TaskMemory) Recall added in v0.2.0

func (tm *TaskMemory) Recall(ctx context.Context, queryKeywords []string) (map[types.Hash]float64, error)

Recall finds symbols that were useful for tasks with similar keywords. Uses keyword overlap: the more query keywords match stored task keywords, the stronger the signal. Returns a map of symbol hash to boost score.
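The sketch below shows one way keyword-overlap recall could work. It uses an in-memory slice instead of the task_memory table. Scaling each stored score by the fraction of query keywords matched, and summing across entries for the same symbol, are assumptions. `memoryEntry` and `recall` are hypothetical.

```go
package main

// memoryEntry is a hypothetical in-memory stand-in for a task_memory row.
type memoryEntry struct {
	Keywords []string
	Symbol   string
	Score    float64
}

// recall boosts each stored symbol by its recorded score scaled by the
// fraction of query keywords that also appear in the stored task keywords.
// Entries with no overlap contribute nothing; entries for the same symbol
// accumulate.
func recall(entries []memoryEntry, query []string) map[string]float64 {
	boosts := map[string]float64{}
	if len(query) == 0 {
		return boosts
	}
	for _, e := range entries {
		stored := make(map[string]bool, len(e.Keywords))
		for _, k := range e.Keywords {
			stored[k] = true
		}
		overlap := 0
		for _, q := range query {
			if stored[q] {
				overlap++
			}
		}
		if overlap > 0 {
			boosts[e.Symbol] += e.Score * float64(overlap) / float64(len(query))
		}
	}
	return boosts
}
```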

func (*TaskMemory) Record added in v0.2.0

func (tm *TaskMemory) Record(ctx context.Context, keywords string, symbolHash types.Hash, score float64) error

Record stores a (keywords, symbol) association from a completed task. Call this when a symbol returned by context_for_task was later accessed by the agent (positive signal) or when explicit feedback is given.

func (*TaskMemory) RecordBatch added in v0.2.0

func (tm *TaskMemory) RecordBatch(ctx context.Context, keywords string, symbolHashes []types.Hash, score float64) error

RecordBatch stores multiple associations at once.

type TaskOptions

type TaskOptions struct {
	TaskDescription string
	TokenBudget     int    // default 50000
	Format          string // "xml", "markdown", "json"
	DBPath          string // path to knowing.db (for CLI usage)
}

TaskOptions configures a task-based context query.

type VectorSearcher added in v0.2.0

type VectorSearcher interface {
	// EmbedAndSearch embeds the query text and returns the k nearest symbol node hashes.
	EmbedAndSearch(ctx stdctx.Context, query string, k int) ([]types.Hash, error)
}

VectorSearcher provides semantic nearest-neighbor search over symbol embeddings.
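VectorSearcher hides both the embedding model and the index. Assuming the query has already been embedded, the search half can be sketched as brute-force cosine similarity. `nearest` and `cosine` are hypothetical helpers, and a production backend would more likely use an approximate nearest-neighbor index.

```go
package main

import (
	"math"
	"sort"
)

// nearest returns the IDs of the k embeddings most cosine-similar to query,
// best first. Brute force: O(n*d) for n embeddings of dimension d.
func nearest(query []float64, embeddings map[string][]float64, k int) []string {
	type hit struct {
		id  string
		sim float64
	}
	var hits []hit
	for id, v := range embeddings {
		if s, ok := cosine(query, v); ok {
			hits = append(hits, hit{id, s})
		}
	}
	sort.Slice(hits, func(i, j int) bool {
		if hits[i].sim != hits[j].sim {
			return hits[i].sim > hits[j].sim
		}
		return hits[i].id < hits[j].id // deterministic tie-break
	})
	if k >= 0 && k < len(hits) {
		hits = hits[:k]
	}
	ids := make([]string, len(hits))
	for i, h := range hits {
		ids[i] = h.id
	}
	return ids
}

// cosine returns the cosine similarity of a and b; ok is false when the
// lengths differ or either vector is all zeros.
func cosine(a, b []float64) (float64, bool) {
	if len(a) != len(b) {
		return 0, false
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0, false
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb)), true
}
```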
