resolver

package
v0.8.0 Latest
Warning

This package is not in the latest version of its module.

Published: Mar 29, 2026 | License: MIT | Imports: 22 | Imported by: 0

Documentation

Overview

Package resolver — nl_community.go implements topic clustering for knowledge nodes using label propagation.

After NL entity extraction and optional embedding-based relationship discovery, this pass groups related knowledge nodes into communities. Each node gets a "community" metadata key containing its community label.

Label propagation is a simple, fast algorithm, implemented here in pure Go with no external dependencies. A run takes O(iterations × edges) time and stops early once no labels change.

Pipeline position: runs AFTER ResolveNLEntities and DiscoverEmbedRelations.
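The label propagation idea described above can be sketched in a few lines. This is a minimal illustration, not the package's implementation: the adjacency-map input, the function names propagateLabels and countCommunities, and the smallest-label tie-break are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// propagateLabels gives each node the most common label among its neighbours,
// repeating until no label changes or maxIterations rounds have run.
// adj is an undirected adjacency list (hypothetical input shape).
// Ties are broken by the smallest label so the result is deterministic.
func propagateLabels(adj map[string][]string, maxIterations int) map[string]string {
	labels := make(map[string]string, len(adj))
	nodes := make([]string, 0, len(adj))
	for n := range adj {
		labels[n] = n // every node starts in its own community
		nodes = append(nodes, n)
	}
	sort.Strings(nodes) // fixed visiting order

	for i := 0; i < maxIterations; i++ {
		changed := false
		for _, n := range nodes {
			counts := make(map[string]int)
			for _, nb := range adj[n] {
				counts[labels[nb]]++
			}
			best, bestCount := labels[n], 0
			for l, c := range counts {
				if c > bestCount || (c == bestCount && l < best) {
					best, bestCount = l, c
				}
			}
			if best != labels[n] {
				labels[n] = best
				changed = true
			}
		}
		if !changed {
			break // converged early
		}
	}
	return labels
}

// countCommunities returns the number of distinct labels.
func countCommunities(labels map[string]string) int {
	seen := make(map[string]struct{})
	for _, l := range labels {
		seen[l] = struct{}{}
	}
	return len(seen)
}

func main() {
	// Two disconnected triangles.
	adj := map[string][]string{
		"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"},
		"x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"},
	}
	labels := propagateLabels(adj, 10)
	fmt.Println(countCommunities(labels)) // 2
}
```

Each round touches every edge once, which is where the O(iterations × edges) cost comes from.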

Package resolver — nl_embed_doccode.go bridges documentation and code using embedding similarity.

Problem: Name-matching (docedges.go) only links docs to code when the exact entity name appears in text. A "Deployment Guide" section that discusses running servers relates to Flask.run() but never mentions it by name.

Solution: For each doc Section node, embed its title+body, search HNSW for similar code entities (functions, structs, etc.), and create EXPLAINS edges when similarity exceeds a threshold. This catches implicit doc↔code links.

Cascade strategy (most specific → broadest):

  1. Function/class level — exact name matches (handled by docedges.go)
  2. Function/class level — embedding similarity (this file, high threshold)
  3. File level — file path references in text (handled by docedges.go linkSectionsToFiles)
  4. File level — embedding similarity against file nodes (this file, medium threshold)
  5. Module/package level — embedding similarity (this file, lower threshold)

Only levels 2, 4, 5 are implemented here. Levels 1, 3 are in docedges.go. The cascade logic: if a section already has specific edges (function-level), skip broader fallbacks to avoid noise. If no function-level links exist, try file-level, then module-level.
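The embedding part of the cascade (levels 2, 4, 5) might look roughly like the sketch below. The type names, the pickEdges function, and the three threshold values are hypothetical; the documentation only says the thresholds go from high to lower as the level gets broader.

```go
package main

import "fmt"

type level int

const (
	levelFunction level = iota // most specific
	levelFile
	levelModule // broadest
)

// candidate is a hypothetical search hit: a code node plus its similarity.
type candidate struct {
	NodeID string
	Level  level
	Score  float64
}

// Assumed thresholds: high for functions, medium for files, lower for modules.
var thresholds = map[level]float64{
	levelFunction: 0.75,
	levelFile:     0.65,
	levelModule:   0.55,
}

// pickEdges returns the candidates at the most specific level where at least
// one candidate exceeds that level's threshold. Sections that already have
// function-level edges from name matching are skipped entirely.
func pickEdges(hasFunctionEdge bool, cands []candidate) []candidate {
	if hasFunctionEdge {
		return nil
	}
	for _, lv := range []level{levelFunction, levelFile, levelModule} {
		var kept []candidate
		for _, c := range cands {
			if c.Level == lv && c.Score > thresholds[lv] {
				kept = append(kept, c)
			}
		}
		if len(kept) > 0 {
			return kept // stop at the first level that yields links
		}
	}
	return nil
}

func main() {
	cands := []candidate{
		{"func:Run", levelFunction, 0.70}, // below the function threshold
		{"file:server.go", levelFile, 0.70},
		{"pkg:server", levelModule, 0.90},
	}
	for _, c := range pickEdges(false, cands) {
		fmt.Println(c.NodeID)
	}
}
```

Stopping at the first level that produces a link is what keeps broad module-level edges from adding noise to sections that already have a specific target.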

Pipeline position: runs AFTER ResolveDocEdges (name-match) and AFTER the node embedding pass has populated HNSW vectors.

Package resolver — nl_embed_relations.go discovers relationships between knowledge nodes using embedding similarity.

After NL entity extraction creates knowledge nodes (NodeConcept, NodeEntity, NodeArtifact, NodeDecision), this pass runs over them and wires RELATES_TO edges between nodes that are semantically similar — even if no keyword-based relationship signal was found in the text.

Pipeline position: runs AFTER ResolveNLEntities and AFTER the node embedding pass has populated vectors for knowledge nodes.

Package resolver performs a post-parse cross-file CALLS edge resolution pass.

The Go parser (and other language parsers) collect raw call sites during AST traversal but cannot resolve cross-file targets at that time because not all nodes exist yet. This package drains those call sites after all files are parsed and links them to their target nodes via CALLS edges.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func DetectCommunities

func DetectCommunities(g *graph.Graph, maxIterations int) int

DetectCommunities runs label propagation over knowledge nodes and writes the community label into each node's Metadata["community"]. Returns the number of distinct communities found.

maxIterations caps the propagation rounds. 10 is usually enough; the algorithm converges early when no labels change.

func DiscoverDocCodeRelations

func DiscoverDocCodeRelations(g *graph.Graph, er EmbedResolver, threshold float64) int

DiscoverDocCodeRelations finds code entities that are semantically similar to doc sections and creates EXPLAINS/DOCUMENTED_BY edges.

For each Section node that lacks function/class-level doc edges, it:

  1. Embeds the section title + body preview
  2. Searches HNSW for similar code entities
  3. Creates edges at the most specific level available (function > file > module)

Returns the number of EXPLAINS edges created. er must be non-nil; callers should guard before calling.

func DiscoverEmbedRelations

func DiscoverEmbedRelations(g *graph.Graph, er EmbedResolver, threshold float64) int

DiscoverEmbedRelations finds semantically similar pairs of knowledge nodes and wires RELATES_TO edges between them. Returns the number of edges created.

For each knowledge node, it embeds the node's name+context, searches HNSW for similar knowledge nodes, and creates edges for pairs above threshold. Skips self-loops and duplicate edges (AddEdge is idempotent).

er must be non-nil; callers should guard before calling.
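As a rough illustration of the pairing step, the sketch below compares every pair of vectors with cosine similarity and keeps pairs above the threshold. It is an assumption-laden stand-in: it uses a brute-force scan instead of HNSW, treats RELATES_TO as undirected for deduplication, and the names cosine, edge, and relatedPairs are invented for the example.

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of a and b, or 0 for mismatched
// lengths or zero-length vectors.
func cosine(a, b []float32) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		na += x * x
		nb += y * y
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / math.Sqrt(na*nb)
}

// edge is a hypothetical undirected RELATES_TO pair stored in sorted order.
type edge struct{ From, To string }

// relatedPairs returns one edge per pair of ids whose vectors are more
// similar than threshold, skipping self-loops and duplicates.
func relatedPairs(ids []string, vecs map[string][]float32, threshold float64) []edge {
	seen := make(map[edge]bool)
	var edges []edge
	for _, a := range ids {
		for _, b := range ids {
			if a == b {
				continue // no self-loops
			}
			if cosine(vecs[a], vecs[b]) <= threshold {
				continue
			}
			e := edge{a, b}
			if b < a {
				e = edge{b, a} // canonical order so a-b and b-a collapse
			}
			if seen[e] {
				continue
			}
			seen[e] = true
			edges = append(edges, e)
		}
	}
	return edges
}

func main() {
	vecs := map[string][]float32{
		"concept:cache":  {1, 0},
		"concept:lru":    {1, 0.1},
		"concept:router": {0, 1},
	}
	ids := []string{"concept:cache", "concept:lru", "concept:router"}
	fmt.Println(relatedPairs(ids, vecs, 0.9))
}
```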

func NormalizeKnowledgeName

func NormalizeKnowledgeName(name string) string

NormalizeKnowledgeName returns the canonical form of a knowledge entity name: lowercased and trimmed. The canonical name is the key component of knowledge NodeIDs. The function is exported so that callers (e.g. the watcher) can reconstruct NodeIDs for Tier 2 without duplicating the normalisation logic. It returns "" if the result is shorter than 3 characters.
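The described behaviour can be approximated as follows. This is a sketch under assumptions: the name normalizeName is a stand-in, "trimmed" is taken to mean surrounding whitespace, and "characters" is taken to mean runes rather than bytes; the real function may differ on either point.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// normalizeName lowercases and trims name, returning "" when fewer than
// three characters remain (counted as runes, which is an assumption).
func normalizeName(name string) string {
	n := strings.ToLower(strings.TrimSpace(name))
	if utf8.RuneCountInString(n) < 3 {
		return ""
	}
	return n
}

func main() {
	fmt.Printf("%q\n", normalizeName("  HNSW Index ")) // "hnsw index"
	fmt.Printf("%q\n", normalizeName(" Go "))          // ""
}
```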

func ResolveCallEdges

func ResolveCallEdges(g *graph.Graph) int

ResolveCallEdges drains all pending call sites from the graph and creates CALLS edges for any targets that can be resolved. Returns the number of edges created.

Must be called after all files have been parsed (i.e., after WalkDir or ParseFile returns) so that all target nodes already exist in the graph.

RTA multi-target: when instantiation data is available (Java/TypeScript), an untyped method call may resolve to MULTIPLE targets — all instantiated classes that define the method. An edge is emitted to each, matching true RTA semantics (Bacon & Sweeney, OOPSLA 1996).
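To make the multi-target rule concrete, here is a small sketch of the selection step only. The input maps and the rtaTargets name are hypothetical; the real resolver works on graph nodes and call sites rather than plain strings.

```go
package main

import (
	"fmt"
	"sort"
)

// rtaTargets returns one target per instantiated class that defines method.
// defines maps class name to its method names; instantiated records which
// classes are constructed somewhere in the program.
func rtaTargets(method string, defines map[string][]string, instantiated map[string]bool) []string {
	var targets []string
	for class, methods := range defines {
		if !instantiated[class] {
			continue // never constructed, so it cannot be the receiver
		}
		for _, m := range methods {
			if m == method {
				targets = append(targets, class+"."+method)
				break
			}
		}
	}
	sort.Strings(targets) // map order is random; sort for stable output
	return targets
}

func main() {
	defines := map[string][]string{
		"Worker": {"run", "stop"},
		"Job":    {"run"},
		"Idle":   {"run"}, // defines run but is never instantiated
	}
	instantiated := map[string]bool{"Worker": true, "Job": true}
	fmt.Println(rtaTargets("run", defines, instantiated)) // [Job.run Worker.run]
}
```

An untyped call to run() would therefore get two CALLS edges in this example, one per instantiated class that defines the method.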

func ResolveDocEdges

func ResolveDocEdges(g *graph.Graph) int

ResolveDocEdges scans ALL Section nodes and markdown file nodes in the graph and creates EXPLAINS (doc→code) and DOCUMENTED_BY (code→doc) edges for identifiers found in section body text, section titles, and frontmatter titles.

Must be called after all files are parsed so code entity nodes exist. Returns the number of EXPLAINS edges created.

Use ResolveDocEdgesForFile for incremental updates when only a single markdown file changed (avoids rescanning the entire graph).

func ResolveDocEdgesForFile

func ResolveDocEdgesForFile(g *graph.Graph, filePath string) int

ResolveDocEdgesForFile resolves doc edges only for Section nodes and the file node that belong to filePath. All other sections' edges are left intact.

Use this in the watcher when a single markdown file is reparsed: code entities are unchanged so only the new file's sections need linking. Returns the number of EXPLAINS edges created.

func ResolveGoTypesCallEdges

func ResolveGoTypesCallEdges(g *graph.Graph, root string) (int, error)

ResolveGoTypesCallEdges performs a type-checked CALLS resolution pass for Go files using golang.org/x/tools/go/packages. It supplements the tree-sitter resolver with cross-package, interface-dispatch, and closure-aware edges that structural analysis cannot see.

Returns the number of new CALLS edges added. Package-level type errors are logged to stderr but do not abort the run — partial results are returned. Returns an error only if packages.Load itself fails (e.g. no go.mod found).

func ResolveHeritageEdges

func ResolveHeritageEdges(g *graph.Graph) int

ResolveHeritageEdges creates IMPLEMENTS edges from explicit heritage clauses (implements/extends) extracted during parsing of nominally-typed languages (TypeScript, Java, C#, Kotlin). These edges are based on explicit source declarations and are always correct — no structural heuristic needed.

RTA filtering is intentionally NOT applied here. Heritage clauses are nominal type declarations: if a class says "implements Runnable", that relationship holds regardless of whether the class is ever instantiated. Filtering by instantiation would break abstract base class chains. For example, if AbstractBase implements Service and ConcreteImpl extends AbstractBase, filtering out AbstractBase drops its Service edge and breaks transitive hierarchy traversal. RTA filtering is valuable in the Go structural heuristic (ResolveImplementsEdges) because that heuristic may over-match; nominal declarations cannot over-match.

Returns the number of new IMPLEMENTS edges added.

func ResolveImplementsEdges

func ResolveImplementsEdges(g *graph.Graph) int

ResolveImplementsEdges detects which structs satisfy which interfaces using a same-package structural heuristic: if a struct defines all methods listed in an interface's "methods" metadata, an IMPLEMENTS edge is added from the struct node to the interface node.

Structs with "heritage_implements" or "heritage_extends" metadata are SKIPPED — they use nominal typing (TypeScript, Java, C#, Kotlin) and their IMPLEMENTS edges are resolved by ResolveHeritageEdges instead. Structural matching produces false positives for nominal type systems.

This is an approximation. It only matches same-package pairs — cross-package interface satisfaction requires full type inference (go/types) which is not available here. It covers the dominant Go pattern where service types and their interfaces live in the same package.
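The same-package heuristic amounts to a method-name subset check. The sketch below shows that check on plain structs; typeInfo and implementsEdges are invented names, and skipping interfaces with no methods is an added assumption (otherwise every struct would match them).

```go
package main

import "fmt"

// typeInfo is a hypothetical flattened view of a struct or interface node.
type typeInfo struct {
	Name    string
	Pkg     string
	Methods []string
}

// implementsEdges returns {struct, interface} pairs where both live in the
// same package and the struct defines every method the interface lists.
// Only method names are compared, not signatures.
func implementsEdges(structs, ifaces []typeInfo) [][2]string {
	var edges [][2]string
	for _, s := range structs {
		have := make(map[string]bool, len(s.Methods))
		for _, m := range s.Methods {
			have[m] = true
		}
		for _, i := range ifaces {
			if i.Pkg != s.Pkg || len(i.Methods) == 0 {
				continue // cross-package or empty interface: skipped
			}
			ok := true
			for _, m := range i.Methods {
				if !have[m] {
					ok = false
					break
				}
			}
			if ok {
				edges = append(edges, [2]string{s.Name, i.Name})
			}
		}
	}
	return edges
}

func main() {
	structs := []typeInfo{
		{"FileStore", "store", []string{"Get", "Put", "Close"}},
		{"MemCache", "cache", []string{"Get"}},
	}
	ifaces := []typeInfo{
		{"Store", "store", []string{"Get", "Put"}},
		{"Getter", "cache", []string{"Get"}},
		{"Closer", "io", []string{"Close"}}, // other package: never matched
	}
	fmt.Println(implementsEdges(structs, ifaces))
}
```

FileStore defines Close, yet no edge to Closer is produced because the two live in different packages, which is the limitation the paragraph above describes.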

Returns the number of new IMPLEMENTS edges added.

func ResolveNLEntities

func ResolveNLEntities(g *graph.Graph, er EmbedResolver) []parser.EntityCandidate

ResolveNLEntities runs the Tier 0+1 NL-to-graph extraction pipeline for all markdown Section nodes in the graph.

Tier 0: ExtractEntityCandidates scans section bodies for backtick spans, CamelCase tokens, quoted terms, and capitalized phrases.

Tier 1: Each candidate is matched against existing code nodes by name.

  • Match found → skip (docedges.go already created EXPLAINS/DOCUMENTED_BY).
  • No match → create a NodeConcept knowledge node + RELATES_TO edge from the section to the new knowledge node.

When er is non-nil, Tier 1 also performs embedding-based HNSW similarity search. Candidates with cosine > 0.6 are wired directly to an existing graph node via EXPLAINS (Section→CodeEntity); candidates in the 0.4–0.6 band are created as knowledge nodes and flagged with embed_hint metadata for Tier 2.
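The score bands can be read as a three-way switch. In the sketch below the 0.6 and 0.4 cut-offs come from the text above, while the treatment of the exact boundaries and of scores under 0.4 (a plain knowledge node with no hint) is an assumption, as are the names outcome and classify.

```go
package main

import "fmt"

// outcome names what happens to a candidate after the embedding search.
type outcome string

const (
	linkExplains outcome = "EXPLAINS edge to existing code node"
	hintedNode   outcome = "knowledge node with embed_hint"
	plainNode    outcome = "knowledge node without hint" // assumed fallback
)

// classify maps the best cosine score for a candidate to an outcome.
func classify(score float64) outcome {
	switch {
	case score > 0.6:
		return linkExplains
	case score >= 0.4:
		return hintedNode
	default:
		return plainNode
	}
}

func main() {
	for _, s := range []float64{0.82, 0.5, 0.1} {
		fmt.Printf("%.2f: %s\n", s, classify(s))
	}
}
```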

Returns the unresolved candidates across all sections, suitable for Tier 2 LLM classification via brain.Client.ScheduleNLClassification.

Must be called after MarkdownParser.Parse (Section nodes must exist) and after ResolveDocEdges (so code-entity links don't get duplicated).

func ResolveNLEntitiesForFile

func ResolveNLEntitiesForFile(g *graph.Graph, filePath string, er EmbedResolver) []parser.EntityCandidate

ResolveNLEntitiesForFile runs the Tier 0+1 NL-to-graph pipeline scoped to Section nodes belonging to filePath only. Use this in the watcher when a single markdown file changes — avoids rescanning all sections.

Returns unresolved candidates from this file for Tier 2 classification.

func ResolveNLEntitiesForFiles

func ResolveNLEntitiesForFiles(g *graph.Graph, filePaths []string, er EmbedResolver) map[string][]parser.EntityCandidate

ResolveNLEntitiesForFiles runs the Tier 0+1 pipeline for a set of markdown files in a single pass — buildCodeNames is called only once regardless of how many files are in the batch. Use this in the watcher for multi-file batches (initial index, branch switch) to avoid O(N×|graph|) redundancy.

Returns a map from filePath → unresolved candidates for Tier 2 scheduling. Files with no unresolved candidates are omitted from the result.

func ResolvePathAliases

func ResolvePathAliases(g *graph.Graph) int

ResolvePathAliases rewrites import package nodes in the graph that match tsconfig/jsconfig path aliases. This enables the resolver to match aliased imports (e.g., @/components/Foo) to their actual module locations.

Must be called after all files are parsed and before ResolveCallEdges. Returns the number of import nodes rewritten.
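For readers unfamiliar with tsconfig path aliases, the rewrite is essentially a prefix substitution. The sketch below is a simplified illustration: resolveAlias is a hypothetical helper, each pattern maps to a single target (tsconfig allows a list), and only a trailing "*" wildcard is handled.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveAlias rewrites importPath using tsconfig-style "paths" entries.
// An exact pattern wins outright; otherwise the wildcard pattern with the
// longest matching prefix is used.
func resolveAlias(importPath string, aliases map[string]string) (string, bool) {
	bestLen, best := -1, ""
	for pattern, target := range aliases {
		if !strings.HasSuffix(pattern, "*") {
			if pattern == importPath {
				return target, true
			}
			continue
		}
		prefix := strings.TrimSuffix(pattern, "*")
		if !strings.HasPrefix(importPath, prefix) || len(prefix) <= bestLen {
			continue
		}
		rest := importPath[len(prefix):]
		if strings.HasSuffix(target, "*") {
			best = strings.TrimSuffix(target, "*") + rest
		} else {
			best = target // target without a wildcard is used as-is
		}
		bestLen = len(prefix)
	}
	return best, bestLen >= 0
}

func main() {
	aliases := map[string]string{
		"@/*":            "src/*",
		"@/components/*": "src/ui/*",
	}
	fmt.Println(resolveAlias("@/components/Foo", aliases)) // src/ui/Foo true
	fmt.Println(resolveAlias("@/lib/date", aliases))       // src/lib/date true
}
```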

func ResolveTSTypesCallEdges

func ResolveTSTypesCallEdges(g *graph.Graph, root string) (int, error)

ResolveTSTypesCallEdges performs a type-checked CALLS resolution pass for TypeScript (.ts / .tsx) files by spawning a Node.js subprocess that runs the embedded tsresolver.js script against the project at root.

The resolver uses the TypeScript compiler API (typescript npm package) to resolve cross-file call targets that tree-sitter cannot see. It supplements — never replaces — the tree-sitter CALLS edges already in g.

Returns the number of new CALLS edges added. Requires:

  • Node.js available on PATH
  • "typescript" package in <root>/node_modules or installed globally

On any failure the error is returned and the caller should log it and continue (the graph is still usable with tree-sitter-only edges).

func ResolveTerraformRefs

func ResolveTerraformRefs(g *graph.Graph) int

ResolveTerraformRefs drains all pending TerraformRefs from the graph and creates DEPENDS_ON edges between resource nodes. Returns the number of edges created.

This enables cross-file Terraform dependency resolution: a resource defined in vpc.tf can have a DEPENDS_ON edge to a resource defined in compute.tf.

Must be called after all .tf files have been parsed (i.e., after WalkDir).
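The reason this pass has to wait for all files is easiest to see in a sketch: a reference can only be turned into an edge once the node it names is known. Everything in the example below (pendingRef, the address-to-node map, the node ID format) is hypothetical and stands in for the graph's real structures.

```go
package main

import "fmt"

// pendingRef is a hypothetical record left behind by the parser: the node
// that contains the reference and the Terraform address it mentions.
type pendingRef struct {
	FromID  string
	Address string // e.g. "aws_vpc.main"
}

// edge is a simplified DEPENDS_ON edge between two resource nodes.
type edge struct{ From, To, Kind string }

// resolveRefs turns every pending reference whose address is known into a
// DEPENDS_ON edge. Unknown addresses and self-references are dropped.
func resolveRefs(byAddress map[string]string, pending []pendingRef) []edge {
	var edges []edge
	for _, p := range pending {
		to, ok := byAddress[p.Address]
		if !ok || to == p.FromID {
			continue
		}
		edges = append(edges, edge{From: p.FromID, To: to, Kind: "DEPENDS_ON"})
	}
	return edges
}

func main() {
	// Address index built after both vpc.tf and compute.tf were parsed.
	byAddress := map[string]string{
		"aws_vpc.main":     "vpc.tf#aws_vpc.main",
		"aws_instance.web": "compute.tf#aws_instance.web",
	}
	pending := []pendingRef{
		{FromID: "compute.tf#aws_instance.web", Address: "aws_vpc.main"},
		{FromID: "compute.tf#aws_instance.web", Address: "aws_subnet.missing"},
	}
	fmt.Println(len(resolveRefs(byAddress, pending))) // 1
}
```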

Types

type EmbedMatch

type EmbedMatch struct {
	NodeID string
	Score  float64 // cosine similarity [0, 1]
}

EmbedMatch is a single result from EmbedResolver.SearchByVector.

type EmbedResolver

type EmbedResolver interface {
	// EmbedText returns a vector embedding for the given text.
	// Returns (nil, nil) if embedding is intentionally disabled.
	// Returns (nil, err) on transient failure — callers fall back to name-match.
	EmbedText(ctx context.Context, text string) ([]float32, error)

	// SearchByVector finds the top-k graph nodes most similar to queryVec.
	// Returns node IDs with cosine similarity scores, descending order.
	SearchByVector(queryVec []float32, k int) []EmbedMatch
}

EmbedResolver provides optional embedding-based entity resolution. When non-nil, Tier 1 uses vector similarity in addition to name-matching. Implementations must be safe for concurrent use.
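For tests or experiments, the interface can be satisfied by a small in-memory type. The sketch below copies the two type declarations from this page so that it compiles on its own; memResolver, its add method, the letter-frequency "embedding", and the brute-force search are all illustrative assumptions and bear no relation to the real embedding model or HNSW index.

```go
package main

import (
	"context"
	"fmt"
	"math"
	"sort"
	"sync"
)

// EmbedMatch and EmbedResolver are copied from the declarations above.
type EmbedMatch struct {
	NodeID string
	Score  float64
}

type EmbedResolver interface {
	EmbedText(ctx context.Context, text string) ([]float32, error)
	SearchByVector(queryVec []float32, k int) []EmbedMatch
}

// memResolver is a hypothetical in-memory implementation for tests.
type memResolver struct {
	mu   sync.RWMutex // guards vecs so concurrent use is safe
	vecs map[string][]float32
}

func (r *memResolver) add(id string, v []float32) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.vecs == nil {
		r.vecs = make(map[string][]float32)
	}
	r.vecs[id] = v
}

// EmbedText is a toy embedding: a 26-dimensional letter-frequency vector.
func (r *memResolver) EmbedText(ctx context.Context, text string) ([]float32, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	v := make([]float32, 26)
	for _, c := range text {
		if c >= 'A' && c <= 'Z' {
			c += 'a' - 'A'
		}
		if c >= 'a' && c <= 'z' {
			v[c-'a']++
		}
	}
	return v, nil
}

func cosine(a, b []float32) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / math.Sqrt(na*nb)
}

// SearchByVector scans every stored vector (brute force, not HNSW) and
// returns the top k matches in descending score order.
func (r *memResolver) SearchByVector(queryVec []float32, k int) []EmbedMatch {
	r.mu.RLock()
	defer r.mu.RUnlock()
	out := make([]EmbedMatch, 0, len(r.vecs))
	for id, v := range r.vecs {
		out = append(out, EmbedMatch{NodeID: id, Score: cosine(queryVec, v)})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].NodeID < out[j].NodeID
	})
	if k < 0 {
		k = 0
	}
	if k < len(out) {
		out = out[:k]
	}
	return out
}

func main() {
	r := &memResolver{}
	var er EmbedResolver = r // compile-time check of the interface
	v, _ := er.EmbedText(context.Background(), "start server")
	r.add("func:StartServer", v)
	q, _ := er.EmbedText(context.Background(), "starting the server")
	fmt.Println(er.SearchByVector(q, 1))
}
```

The read-write mutex is one simple way to meet the "safe for concurrent use" requirement; a production implementation would more likely rely on the index's own synchronisation.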
