Documentation ¶
Overview ¶
Package resolver — nl_community.go implements topic clustering for knowledge nodes using label propagation.
After NL entity extraction and optional embedding-based relationship discovery, this pass groups related knowledge nodes into communities. Each node gets a "community" metadata key containing its community label.
Label propagation is a simple, fast algorithm, implemented here in pure Go with no external dependencies. A run costs O(iterations × edges) time, and it stops early once no label changes in a round.
Pipeline position: runs AFTER ResolveNLEntities and DiscoverEmbedRelations.
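The propagation loop described above can be sketched as follows. This is a minimal illustration, not the package's code: propagateLabels, the adjacency-map input, and the tie-breaking rule (smallest label wins) are assumptions made for the example, and the adjacency map is assumed to be symmetric.

```go
package main

import (
	"fmt"
	"sort"
)

// propagateLabels is a hypothetical helper. adj maps each node ID to its
// neighbours; every neighbour is assumed to also be a key of adj.
// It returns one community label per node.
func propagateLabels(adj map[string][]string, maxIterations int) map[string]string {
	// Visit nodes in sorted order so the result is deterministic.
	ids := make([]string, 0, len(adj))
	for id := range adj {
		ids = append(ids, id)
	}
	sort.Strings(ids)

	// Every node starts in its own community.
	labels := make(map[string]string, len(ids))
	for _, id := range ids {
		labels[id] = id
	}

	for i := 0; i < maxIterations; i++ {
		changed := false
		for _, id := range ids {
			// Count how often each label occurs among the neighbours.
			counts := map[string]int{}
			for _, nb := range adj[id] {
				counts[labels[nb]]++
			}
			// Adopt the most frequent label; ties go to the smallest label.
			best, bestCount := labels[id], 0
			for label, c := range counts {
				if c > bestCount || (c == bestCount && label < best) {
					best, bestCount = label, c
				}
			}
			if best != labels[id] {
				labels[id] = best
				changed = true
			}
		}
		if !changed {
			break // converged early: no label changed this round
		}
	}
	return labels
}

// countCommunities returns the number of distinct labels.
func countCommunities(labels map[string]string) int {
	distinct := map[string]bool{}
	for _, l := range labels {
		distinct[l] = true
	}
	return len(distinct)
}

func main() {
	// Two disconnected triangles should end up as two communities.
	adj := map[string][]string{
		"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"},
		"x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"},
	}
	labels := propagateLabels(adj, 10)
	fmt.Println(countCommunities(labels), labels["a"] == labels["c"])
}
```

Each round touches every edge once, which is where the O(iterations × edges) cost comes from.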
Package resolver — nl_embed_doccode.go bridges documentation and code using embedding similarity.
Problem: Name-matching (docedges.go) only links docs to code when the exact entity name appears in text. A "Deployment Guide" section that discusses running servers relates to Flask.run() but never mentions it by name.
Solution: For each doc Section node, embed its title+body, search HNSW for similar code entities (functions, structs, etc.), and create EXPLAINS edges when similarity exceeds a threshold. This catches implicit doc↔code links.
Cascade strategy (most specific → broadest):
1. Function/class level: exact name matches (handled by docedges.go)
2. Function/class level: embedding similarity (this file, high threshold)
3. File level: file path references in text (handled by docedges.go linkSectionsToFiles)
4. File level: embedding similarity against file nodes (this file, medium threshold)
5. Module/package level: embedding similarity (this file, lower threshold)
Only levels 2, 4, 5 are implemented here. Levels 1, 3 are in docedges.go. The cascade logic: if a section already has specific edges (function-level), skip broader fallbacks to avoid noise. If no function-level links exist, try file-level, then module-level.
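The cascade logic can be sketched roughly as below. The types, the pickCascade helper, and the three threshold values are hypothetical; the source only says the thresholds go from high to medium to lower.

```go
package main

import "fmt"

// level orders the embedding-based cascade from most specific to broadest.
type level int

const (
	levelFunction level = iota
	levelFile
	levelModule
)

// candidate is a hypothetical search hit for one doc section.
type candidate struct {
	nodeID string
	level  level
	score  float64
}

// pickCascade returns the hits at the most specific level that has at least
// one candidate above its threshold. If the section already has specific
// (function-level) edges, no fallback is attempted at all.
func pickCascade(cands []candidate, thresholds map[level]float64, hasSpecificEdges bool) []candidate {
	if hasSpecificEdges {
		return nil
	}
	for _, lv := range []level{levelFunction, levelFile, levelModule} {
		var hits []candidate
		for _, c := range cands {
			if c.level == lv && c.score > thresholds[lv] {
				hits = append(hits, c)
			}
		}
		if len(hits) > 0 {
			return hits // stop at the first level that produced links
		}
	}
	return nil
}

func main() {
	// Illustrative thresholds only; the real values are not given in the docs.
	thresholds := map[level]float64{levelFunction: 0.75, levelFile: 0.65, levelModule: 0.55}
	cands := []candidate{
		{"fn:run", levelFunction, 0.5},
		{"file:app.py", levelFile, 0.7},
		{"mod:flask", levelModule, 0.9},
	}
	for _, h := range pickCascade(cands, thresholds, false) {
		fmt.Println(h.nodeID)
	}
}
```

In this example the function-level hit is too weak, so the file-level hit is used and the module-level hit is ignored even though its score is higher.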
Pipeline position: runs AFTER ResolveDocEdges (name-match) and AFTER the node embedding pass has populated HNSW vectors.
Package resolver — nl_embed_relations.go discovers relationships between knowledge nodes using embedding similarity.
After NL entity extraction creates knowledge nodes (NodeConcept, NodeEntity, NodeArtifact, NodeDecision), this pass runs over them and wires RELATES_TO edges between nodes that are semantically similar — even if no keyword-based relationship signal was found in the text.
Pipeline position: runs AFTER ResolveNLEntities and AFTER the node embedding pass has populated vectors for knowledge nodes.
Package resolver performs a post-parse cross-file CALLS edge resolution pass.
The Go parser (and other language parsers) collect raw call sites during AST traversal but cannot resolve cross-file targets at that time because not all nodes exist yet. This package drains those call sites after all files are parsed and links them to their target nodes via CALLS edges.
Index ¶
- func DetectCommunities(g *graph.Graph, maxIterations int) int
- func DiscoverDocCodeRelations(g *graph.Graph, er EmbedResolver, threshold float64) int
- func DiscoverEmbedRelations(g *graph.Graph, er EmbedResolver, threshold float64) int
- func NormalizeKnowledgeName(name string) string
- func ResolveCallEdges(g *graph.Graph) int
- func ResolveDocEdges(g *graph.Graph) int
- func ResolveDocEdgesForFile(g *graph.Graph, filePath string) int
- func ResolveGoTypesCallEdges(g *graph.Graph, root string) (int, error)
- func ResolveHeritageEdges(g *graph.Graph) int
- func ResolveImplementsEdges(g *graph.Graph) int
- func ResolveNLEntities(g *graph.Graph, er EmbedResolver) []parser.EntityCandidate
- func ResolveNLEntitiesForFile(g *graph.Graph, filePath string, er EmbedResolver) []parser.EntityCandidate
- func ResolveNLEntitiesForFiles(g *graph.Graph, filePaths []string, er EmbedResolver) map[string][]parser.EntityCandidate
- func ResolvePathAliases(g *graph.Graph) int
- func ResolveTSTypesCallEdges(g *graph.Graph, root string) (int, error)
- func ResolveTerraformRefs(g *graph.Graph) int
- type EmbedMatch
- type EmbedResolver
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func DetectCommunities ¶
func DetectCommunities(g *graph.Graph, maxIterations int) int
DetectCommunities runs label propagation over knowledge nodes and writes the community label into each node's Metadata["community"]. Returns the number of distinct communities found.
maxIterations caps the propagation rounds. 10 is usually enough; the algorithm converges early when no labels change.
func DiscoverDocCodeRelations ¶
func DiscoverDocCodeRelations(g *graph.Graph, er EmbedResolver, threshold float64) int
DiscoverDocCodeRelations finds code entities that are semantically similar to doc sections and creates EXPLAINS/DOCUMENTED_BY edges.
For each Section node that lacks function/class-level doc edges, it:
- Embeds the section title + body preview
- Searches HNSW for similar code entities
- Creates edges at the most specific level available (function > file > module)
Returns the number of EXPLAINS edges created. er must be non-nil; callers should guard before calling.
func DiscoverEmbedRelations ¶
func DiscoverEmbedRelations(g *graph.Graph, er EmbedResolver, threshold float64) int
DiscoverEmbedRelations finds semantically similar pairs of knowledge nodes and wires RELATES_TO edges between them. Returns the number of edges created.
For each knowledge node, it embeds the node's name+context, searches HNSW for similar knowledge nodes, and creates edges for pairs above threshold. Skips self-loops and duplicate edges (AddEdge is idempotent).
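A rough sketch of the pairing step follows. It swaps the HNSW search for a brute-force comparison to stay self-contained, and it treats an edge as an unordered pair; both are assumptions for illustration, as are the cosine and relatedPairs helpers.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// cosine returns the cosine similarity of two equal-length vectors,
// or 0 when the lengths differ or either vector is all zeros.
func cosine(a, b []float32) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// relatedPairs is a hypothetical stand-in for the discovery pass. It compares
// every node with every other node and returns each unordered pair whose
// similarity exceeds threshold, exactly once.
func relatedPairs(vecs map[string][]float32, threshold float64) [][2]string {
	ids := make([]string, 0, len(vecs))
	for id := range vecs {
		ids = append(ids, id)
	}
	sort.Strings(ids)

	seen := map[[2]string]bool{} // plays the role of an idempotent AddEdge
	var pairs [][2]string
	for _, a := range ids {
		for _, b := range ids {
			if a == b {
				continue // skip self-loops
			}
			if cosine(vecs[a], vecs[b]) <= threshold {
				continue
			}
			key := [2]string{a, b}
			if b < a {
				key = [2]string{b, a}
			}
			if seen[key] {
				continue // the reverse direction was already recorded
			}
			seen[key] = true
			pairs = append(pairs, key)
		}
	}
	return pairs
}

func main() {
	vecs := map[string][]float32{
		"a": {1, 0},
		"b": {0.9, 0.1},
		"c": {0, 1},
	}
	fmt.Println(relatedPairs(vecs, 0.8))
}
```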
er must be non-nil; callers should guard before calling.
func NormalizeKnowledgeName ¶
func NormalizeKnowledgeName(name string) string
NormalizeKnowledgeName returns the canonical form of a knowledge entity name: lowercase and trimmed. This is the key component used in knowledge NodeIDs. Exported so callers (e.g. the watcher) can reconstruct NodeIDs for Tier 2 without duplicating the normalisation logic. Returns "" if the result is shorter than 3 characters.
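The behaviour described above might look like the sketch below. normalizeName is a hypothetical re-implementation, and two details are assumptions: that "trimmed" means surrounding whitespace, and that length is counted in runes rather than bytes.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// normalizeName lowercases and trims name, returning "" when the result is
// shorter than 3 characters (counted in runes, which is an assumption).
func normalizeName(name string) string {
	n := strings.ToLower(strings.TrimSpace(name))
	if utf8.RuneCountInString(n) < 3 {
		return ""
	}
	return n
}

func main() {
	fmt.Printf("%q\n", normalizeName("  Deployment Guide "))
	fmt.Printf("%q\n", normalizeName(" Go "))
}
```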
func ResolveCallEdges ¶
func ResolveCallEdges(g *graph.Graph) int
ResolveCallEdges drains all pending call sites from the graph and creates CALLS edges for any targets that can be resolved. Returns the number of edges created.
Must be called after all files have been parsed (i.e., after WalkDir or ParseFile returns) so that all target nodes already exist in the graph.
RTA multi-target: when instantiation data is available (Java/TypeScript), an untyped method call may resolve to MULTIPLE targets — all instantiated classes that define the method. An edge is emitted to each, matching true RTA semantics (Bacon & Sweeney, OOPSLA 1996).
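The multi-target rule can be illustrated with a small sketch. The resolveRTATargets helper and its two input maps are hypothetical; they stand in for whatever instantiation data the graph actually stores.

```go
package main

import (
	"fmt"
	"sort"
)

// resolveRTATargets returns one target per instantiated class that defines
// method. definers maps a method name to the classes defining it, and
// instantiated is the set of classes the program is known to instantiate.
func resolveRTATargets(method string, definers map[string][]string, instantiated map[string]bool) []string {
	var targets []string
	for _, cls := range definers[method] {
		if instantiated[cls] {
			targets = append(targets, cls+"."+method)
		}
	}
	sort.Strings(targets) // stable order for printing and testing
	return targets
}

func main() {
	definers := map[string][]string{"run": {"Worker", "Server", "Unused"}}
	instantiated := map[string]bool{"Worker": true, "Server": true}
	// An untyped call to run() yields an edge to each instantiated definer.
	fmt.Println(resolveRTATargets("run", definers, instantiated))
}
```

Classes that define the method but are never instantiated (Unused here) receive no edge.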
func ResolveDocEdges ¶
func ResolveDocEdges(g *graph.Graph) int
ResolveDocEdges scans ALL Section nodes and markdown file nodes in the graph and creates EXPLAINS (doc→code) and DOCUMENTED_BY (code→doc) edges for identifiers found in section body text, section titles, and frontmatter titles.
Must be called after all files are parsed so code entity nodes exist. Returns the number of EXPLAINS edges created.
Use ResolveDocEdgesForFile for incremental updates when only a single markdown file changed (avoids rescanning the entire graph).
func ResolveDocEdgesForFile ¶
func ResolveDocEdgesForFile(g *graph.Graph, filePath string) int
ResolveDocEdgesForFile resolves doc edges only for Section nodes and the file node that belong to filePath. All other sections' edges are left intact.
Use this in the watcher when a single markdown file is reparsed: code entities are unchanged so only the new file's sections need linking. Returns the number of EXPLAINS edges created.
func ResolveGoTypesCallEdges ¶
func ResolveGoTypesCallEdges(g *graph.Graph, root string) (int, error)
ResolveGoTypesCallEdges performs a type-checked CALLS resolution pass for Go files using golang.org/x/tools/go/packages. It supplements the tree-sitter resolver with cross-package, interface-dispatch, and closure-aware edges that structural analysis cannot see.
Returns the number of new CALLS edges added. Package-level type errors are logged to stderr but do not abort the run — partial results are returned. Returns an error only if packages.Load itself fails (e.g. no go.mod found).
func ResolveHeritageEdges ¶
func ResolveHeritageEdges(g *graph.Graph) int
ResolveHeritageEdges creates IMPLEMENTS edges from explicit heritage clauses (implements/extends) extracted during parsing of nominally-typed languages (TypeScript, Java, C#, Kotlin). These edges are based on explicit source declarations and are always correct — no structural heuristic needed.
RTA filtering is intentionally NOT applied here. Heritage clauses are nominal type declarations: if a class says "implements Runnable", that relationship holds by declaration regardless of whether the class is instantiated. Filtering by instantiation would break abstract base class chains. For example, if AbstractBase implements Service and ConcreteImpl extends AbstractBase, filtering out AbstractBase drops the Service edge and breaks transitive hierarchy traversal. RTA filtering is valuable in the Go structural heuristic (ResolveImplementsEdges) because that heuristic may over-match; nominal declarations cannot over-match.
Returns the number of new IMPLEMENTS edges added.
func ResolveImplementsEdges ¶
func ResolveImplementsEdges(g *graph.Graph) int
ResolveImplementsEdges detects which structs satisfy which interfaces using a same-package structural heuristic: if a struct defines all methods listed in an interface's "methods" metadata, an IMPLEMENTS edge is added from the struct node to the interface node.
Structs with "heritage_implements" or "heritage_extends" metadata are SKIPPED — they use nominal typing (TypeScript, Java, C#, Kotlin) and their IMPLEMENTS edges are resolved by ResolveHeritageEdges instead. Structural matching produces false positives for nominal type systems.
This is an approximation. It only matches same-package pairs — cross-package interface satisfaction requires full type inference (go/types) which is not available here. It covers the dominant Go pattern where service types and their interfaces live in the same package.
Returns the number of new IMPLEMENTS edges added.
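The same-package structural check can be sketched as follows. The typeInfo type and satisfies helper are hypothetical, and skipping interfaces with no methods is an assumption added here so that the empty interface does not match every struct.

```go
package main

import "fmt"

// typeInfo is a hypothetical, simplified view of a struct or interface node.
type typeInfo struct {
	pkg     string
	name    string
	methods []string
	nominal bool // true when heritage metadata is present on the node
}

// satisfies reports whether struct s defines every method that iface lists,
// restricted to types declared in the same package.
func satisfies(s, iface typeInfo) bool {
	if s.nominal || s.pkg != iface.pkg || len(iface.methods) == 0 {
		return false
	}
	have := make(map[string]bool, len(s.methods))
	for _, m := range s.methods {
		have[m] = true
	}
	for _, m := range iface.methods {
		if !have[m] {
			return false
		}
	}
	return true
}

func main() {
	service := typeInfo{pkg: "svc", name: "Service", methods: []string{"Start", "Stop"}}
	full := typeInfo{pkg: "svc", name: "impl", methods: []string{"Start", "Stop", "Health"}}
	partial := typeInfo{pkg: "svc", name: "half", methods: []string{"Start"}}
	fmt.Println(satisfies(full, service), satisfies(partial, service))
}
```

Because only method names are compared, a struct whose method has the right name but a different signature would still match, which is one reason the result is an approximation.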
func ResolveNLEntities ¶
func ResolveNLEntities(g *graph.Graph, er EmbedResolver) []parser.EntityCandidate
ResolveNLEntities runs the Tier 0+1 NL-to-graph extraction pipeline for all markdown Section nodes in the graph.
Tier 0: ExtractEntityCandidates scans section bodies for backtick spans, CamelCase tokens, quoted terms, and capitalized phrases.
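A Tier 0 style scan could be approximated with regular expressions, as in this sketch. The patterns and the extractCandidates helper are assumptions, not the package's ExtractEntityCandidates; capitalized phrases are left out to keep the example short.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Text between a pair of backticks.
	backtickRE = regexp.MustCompile("`([^`]+)`")
	// Two or more capitalised humps, e.g. DeploymentGuide.
	camelRE = regexp.MustCompile(`\b[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]+)+\b`)
	// Text between a pair of double quotes.
	quotedRE = regexp.MustCompile(`"([^"]+)"`)
)

// extractCandidates returns de-duplicated candidates in the order
// backtick spans, CamelCase tokens, quoted terms.
func extractCandidates(body string) []string {
	seen := map[string]bool{}
	var out []string
	add := func(s string) {
		if s != "" && !seen[s] {
			seen[s] = true
			out = append(out, s)
		}
	}
	for _, m := range backtickRE.FindAllStringSubmatch(body, -1) {
		add(m[1])
	}
	for _, m := range camelRE.FindAllString(body, -1) {
		add(m)
	}
	for _, m := range quotedRE.FindAllStringSubmatch(body, -1) {
		add(m[1])
	}
	return out
}

func main() {
	body := "Call `Flask.run()` after the DeploymentGuide mentions \"blue green\"."
	for _, c := range extractCandidates(body) {
		fmt.Println(c)
	}
}
```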
Tier 1: Each candidate is matched against existing code nodes by name.
- Match found → skip (docedges.go already created EXPLAINS/DOCUMENTED_BY).
- No match → create a NodeConcept knowledge node + RELATES_TO edge from the section to the new knowledge node.
When er is non-nil, Tier 1 also performs embedding-based HNSW similarity search. Candidates with cosine > 0.6 are wired directly to an existing graph node via EXPLAINS (Section→CodeEntity); candidates in the 0.4–0.6 band are created as knowledge nodes and flagged with embed_hint metadata for Tier 2.
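The two similarity bands can be expressed as a small decision function. This is a sketch: classifyCandidate and its string results are hypothetical, treating exactly 0.4 as inside the hint band is an assumption, and scores below the band are assumed to fall back to a plain knowledge node.

```go
package main

import "fmt"

// classifyCandidate maps a cosine similarity score to the action taken for
// an entity candidate:
//   "link"  - wire to the existing node via EXPLAINS (score > 0.6)
//   "hint"  - create a knowledge node flagged with embed_hint (0.4 to 0.6)
//   "plain" - create a knowledge node without a hint (assumed fallback)
func classifyCandidate(score float64) string {
	switch {
	case score > 0.6:
		return "link"
	case score >= 0.4:
		return "hint"
	default:
		return "plain"
	}
}

func main() {
	for _, s := range []float64{0.75, 0.5, 0.2} {
		fmt.Println(s, classifyCandidate(s))
	}
}
```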
Returns the unresolved candidates across all sections, suitable for Tier 2 LLM classification via brain.Client.ScheduleNLClassification.
Must be called after MarkdownParser.Parse (Section nodes must exist) and after ResolveDocEdges (so code-entity links don't get duplicated).
func ResolveNLEntitiesForFile ¶
func ResolveNLEntitiesForFile(g *graph.Graph, filePath string, er EmbedResolver) []parser.EntityCandidate
ResolveNLEntitiesForFile runs the Tier 0+1 NL-to-graph pipeline scoped to Section nodes belonging to filePath only. Use this in the watcher when a single markdown file changes — avoids rescanning all sections.
Returns unresolved candidates from this file for Tier 2 classification.
func ResolveNLEntitiesForFiles ¶
func ResolveNLEntitiesForFiles(g *graph.Graph, filePaths []string, er EmbedResolver) map[string][]parser.EntityCandidate
ResolveNLEntitiesForFiles runs the Tier 0+1 pipeline for a set of markdown files in a single pass — buildCodeNames is called only once regardless of how many files are in the batch. Use this in the watcher for multi-file batches (initial index, branch switch) to avoid O(N×|graph|) redundancy.
Returns a map from filePath → unresolved candidates for Tier 2 scheduling. Files with no unresolved candidates are omitted from the result.
func ResolvePathAliases ¶
func ResolvePathAliases(g *graph.Graph) int
ResolvePathAliases rewrites import package nodes in the graph that match tsconfig/jsconfig path aliases. This enables the resolver to match aliased imports (e.g., @/components/Foo) to their actual module locations.
Must be called after all files are parsed and before ResolveCallEdges. Returns the number of import nodes rewritten.
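Alias rewriting of this kind can be sketched as below. resolveAlias is a hypothetical helper that handles only exact patterns and patterns with a single trailing "*", and it picks the longest matching prefix; how the package itself reads tsconfig/jsconfig files is not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveAlias rewrites importPath using tsconfig-style path aliases.
// aliases maps a pattern such as "@/*" to a target such as "src/*".
// It returns the rewritten path and true, or importPath and false.
func resolveAlias(importPath string, aliases map[string]string) (string, bool) {
	// Exact (wildcard-free) aliases win outright.
	if !strings.Contains(importPath, "*") {
		if target, ok := aliases[importPath]; ok {
			return target, true
		}
	}
	bestLen := -1
	resolved := ""
	for pattern, target := range aliases {
		if !strings.HasSuffix(pattern, "*") {
			continue
		}
		prefix := strings.TrimSuffix(pattern, "*")
		// Prefer the pattern with the longest matching prefix.
		if strings.HasPrefix(importPath, prefix) && len(prefix) > bestLen {
			rest := strings.TrimPrefix(importPath, prefix)
			bestLen = len(prefix)
			resolved = strings.Replace(target, "*", rest, 1)
		}
	}
	if bestLen < 0 {
		return importPath, false
	}
	return resolved, true
}

func main() {
	aliases := map[string]string{
		"@/*":  "src/*",
		"@lib": "packages/lib/index",
	}
	fmt.Println(resolveAlias("@/components/Foo", aliases))
	fmt.Println(resolveAlias("react", aliases))
}
```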
func ResolveTSTypesCallEdges ¶
func ResolveTSTypesCallEdges(g *graph.Graph, root string) (int, error)
ResolveTSTypesCallEdges performs a type-checked CALLS resolution pass for TypeScript (.ts / .tsx) files by spawning a Node.js subprocess that runs the embedded tsresolver.js script against the project at root.
The resolver uses the TypeScript compiler API (typescript npm package) to resolve cross-file call targets that tree-sitter cannot see. It supplements — never replaces — the tree-sitter CALLS edges already in g.
Returns the number of new CALLS edges added. Requires:
- Node.js available on PATH
- "typescript" package in <root>/node_modules or installed globally
On any failure the error is returned and the caller should log it and continue (the graph is still usable with tree-sitter-only edges).
func ResolveTerraformRefs ¶
func ResolveTerraformRefs(g *graph.Graph) int
ResolveTerraformRefs drains all pending TerraformRefs from the graph and creates DEPENDS_ON edges between resource nodes. Returns the number of edges created.
This enables cross-file Terraform dependency resolution: a resource defined in vpc.tf can have a DEPENDS_ON edge to a resource defined in compute.tf.
Must be called after all .tf files have been parsed (i.e., after WalkDir).
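The drain-then-link pattern explains the ordering requirement: a reference can only become an edge once its target node exists. The sketch below uses a hypothetical ref type and resolveRefs helper rather than the package's graph API.

```go
package main

import "fmt"

// ref is a hypothetical pending reference from one resource to another.
type ref struct {
	from string
	to   string
}

// resolveRefs turns pending references into edges when both endpoints are
// known nodes. References to unknown resources are returned as unresolved.
func resolveRefs(pending []ref, nodes map[string]bool) (edges, unresolved []ref) {
	for _, r := range pending {
		if nodes[r.from] && nodes[r.to] {
			edges = append(edges, r)
		} else {
			unresolved = append(unresolved, r)
		}
	}
	return edges, unresolved
}

func main() {
	// Nodes collected from every parsed .tf file, regardless of which file
	// declared them.
	nodes := map[string]bool{
		"aws_vpc.main":     true, // declared in vpc.tf
		"aws_instance.web": true, // declared in compute.tf
	}
	pending := []ref{
		{from: "aws_instance.web", to: "aws_vpc.main"},
		{from: "aws_instance.web", to: "aws_subnet.missing"},
	}
	edges, unresolved := resolveRefs(pending, nodes)
	fmt.Println(len(edges), len(unresolved))
}
```

Running the same resolution before vpc.tf was parsed would leave the first reference unresolved as well.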
Types ¶
type EmbedMatch ¶
EmbedMatch is a single result from EmbedResolver.SearchByVector.
type EmbedResolver ¶
type EmbedResolver interface {
// EmbedText returns a vector embedding for the given text.
// Returns (nil, nil) if embedding is intentionally disabled.
// Returns (nil, err) on transient failure — callers fall back to name-match.
EmbedText(ctx context.Context, text string) ([]float32, error)
// SearchByVector finds the top-k graph nodes most similar to queryVec.
// Returns node IDs with cosine similarity scores, descending order.
SearchByVector(queryVec []float32, k int) []EmbedMatch
}
EmbedResolver provides optional embedding-based entity resolution. When non-nil, Tier 1 uses vector similarity in addition to name-matching. Implementations must be safe for concurrent use.