types

package
v0.14.0
Published: Jun 3, 2026 License: MIT Imports: 5 Imported by: 0

Documentation

Overview

Package types defines result types for graph queries and traversals.

Package types defines the core domain model for the knowing knowledge graph.

All entities (nodes, edges, files, repos, snapshots) are content-addressed using SHA-256 hashes, enabling deterministic identity, deduplication, and Merkle-based snapshot diffing. The hash functions in this package define the canonical identity computations used throughout the system.

Key types:

  • Node: a symbol (function, type, method, etc.) in the knowledge graph
  • Edge: a relationship (calls, imports, implements, references) between nodes
  • File and Repo: tracked source artifacts
  • Snapshot: a point-in-time Merkle root over all edges in a repo
  • EdgeEvent: an append-only event for event-sourced diff tracking

The GraphStore interface (interfaces.go) and Extractor interface define the contracts that concrete implementations must satisfy.

Constants

const (
	KindFunction  = "function"
	KindMethod    = "method"
	KindType      = "type"
	KindInterface = "interface"
	KindConst     = "const"
	KindVar       = "var"
	KindService   = "service"
	KindRoute     = "route_handler"
	KindExternal  = "external"
	KindFile      = "file"
	KindPackage   = "package"
	KindField     = "field"
	KindEnvVar    = "env_var"
	KindProcess   = "process"
)

Node kind constants. Use these instead of raw string literals.

const (
	ProvenanceASTInferred     = "ast_inferred"
	ProvenanceASTResolved     = "ast_resolved"
	ProvenanceLSPResolved     = "lsp_resolved"
	ProvenanceSCIPResolved    = "scip_resolved"
	ProvenanceRuntimeObserved = "runtime_observed"
)

Provenance tier constants.

const (
	ConfidenceASTInferred  = 0.7
	ConfidenceASTResolved  = 0.85
	ConfidenceLSPResolved  = 0.9
	ConfidenceSCIPResolved = 1.0
)

Confidence constants by provenance tier.
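As an illustration of how the two constant blocks relate, here is a minimal sketch that maps a provenance tier to its documented confidence. The constant values are copied from above; the helper defaultConfidence is hypothetical and not part of this package. Because no confidence constant is documented for "runtime_observed", the sketch reports it as unknown rather than guessing a value.

```go
package main

import "fmt"

// Values copied from the constant blocks documented above.
const (
	ProvenanceASTInferred  = "ast_inferred"
	ProvenanceASTResolved  = "ast_resolved"
	ProvenanceLSPResolved  = "lsp_resolved"
	ProvenanceSCIPResolved = "scip_resolved"

	ConfidenceASTInferred  = 0.7
	ConfidenceASTResolved  = 0.85
	ConfidenceLSPResolved  = 0.9
	ConfidenceSCIPResolved = 1.0
)

// defaultConfidence is a hypothetical helper (an assumption of this sketch).
// It returns false for tiers without a documented confidence constant.
func defaultConfidence(provenance string) (float64, bool) {
	switch provenance {
	case ProvenanceASTInferred:
		return ConfidenceASTInferred, true
	case ProvenanceASTResolved:
		return ConfidenceASTResolved, true
	case ProvenanceLSPResolved:
		return ConfidenceLSPResolved, true
	case ProvenanceSCIPResolved:
		return ConfidenceSCIPResolved, true
	}
	return 0, false
}

func main() {
	for _, p := range []string{"ast_inferred", "scip_resolved", "runtime_observed"} {
		c, ok := defaultConfidence(p)
		fmt.Printf("%s: confidence=%.2f known=%t\n", p, c, ok)
	}
}
```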

Variables

This section is empty.

Functions

func VerifyEdgeHash added in v0.3.0

func VerifyEdgeHash(e Edge) error

VerifyEdgeHash recomputes an edge's hash from its stored fields and compares it to the stored EdgeHash. Returns nil if the hash matches, or a descriptive error on mismatch.

func VerifyNodeHash added in v0.3.0

func VerifyNodeHash(n Node, repoURL, packagePath string) error

VerifyNodeHash recomputes a node's hash from its stored fields and compares it to the stored NodeHash. Returns nil if the hash matches, or a descriptive error on mismatch.

The repoURL and packagePath must be passed explicitly because they are not stored on the Node struct directly. The symbolName is extracted from n.QualifiedName by taking everything after the last "." that follows "://".
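The extraction rule can be read in more than one way, so the following is only a sketch of one plausible reading: locate the last "://", then take everything after the last "." in the remainder. The helper symbolName and the example qualified names are hypothetical; the package's actual implementation may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// symbolName is a hypothetical helper showing one reading of the rule:
// skip past the last "://", then return the text after the last ".".
// If no "." follows, the whole remainder is returned.
func symbolName(qualifiedName string) string {
	rest := qualifiedName
	if i := strings.LastIndex(qualifiedName, "://"); i >= 0 {
		rest = qualifiedName[i+len("://"):]
	}
	if j := strings.LastIndex(rest, "."); j >= 0 {
		return rest[j+1:]
	}
	return rest
}

func main() {
	// Dots inside the repo URL are ignored because they precede "://".
	fmt.Println(symbolName("github.com/org/repo://internal/store.SQLiteStore.PutNode")) // PutNode
	fmt.Println(symbolName("github.com/org/repo://main"))                               // main
}
```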

Types

type BlastRadiusResult

type BlastRadiusResult struct {
	Target     Node                              // the node whose blast radius was computed
	ByRepo     map[string][]CallerWithProvenance // repo URL -> callers in that repo
	TotalCount int                               // total number of callers across all repos
	Truncated  bool                              // true if traversal hit the max depth limit
}

BlastRadiusResult groups all transitive callers of a target node by the repository they belong to. This powers the blast_radius MCP tool, showing how a change to one symbol ripples across the codebase.
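A minimal sketch of the grouping step, assuming the repo can be read off the "{repoURL}://..." prefix of a qualified name and that the repo URL itself contains no "://". The caller type, repoOf, and groupByRepo are simplified, hypothetical stand-ins; the real store may resolve the repo differently (for example via File and Repo records).

```go
package main

import (
	"fmt"
	"strings"
)

// caller is a simplified stand-in for CallerWithProvenance.
type caller struct {
	QualifiedName string
	Depth         int
}

// repoOf assumes the "{repoURL}://..." qualified-name layout.
func repoOf(qualifiedName string) string {
	repo, _, _ := strings.Cut(qualifiedName, "://")
	return repo
}

// groupByRepo buckets callers by repo URL and returns the total count,
// mirroring the ByRepo and TotalCount fields.
func groupByRepo(callers []caller) (map[string][]caller, int) {
	byRepo := make(map[string][]caller)
	for _, c := range callers {
		repo := repoOf(c.QualifiedName)
		byRepo[repo] = append(byRepo[repo], c)
	}
	return byRepo, len(callers)
}

func main() {
	byRepo, total := groupByRepo([]caller{
		{"github.com/org/api://handlers.Create", 1},
		{"github.com/org/api://main.main", 2},
		{"github.com/org/worker://jobs.Run", 1},
	})
	fmt.Println("repos:", len(byRepo), "total callers:", total)
}
```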

type CalleeResult

type CalleeResult struct {
	Node  Node
	Depth int // 1 = direct callee, 2 = callee of a callee, etc.
}

CalleeResult is a single node in a transitive callees traversal, paired with its depth (hop count) from the query source.

type CallerResult

type CallerResult struct {
	Node  Node
	Depth int // 1 = direct caller, 2 = caller of a caller, etc.
}

CallerResult is a single node in a transitive callers traversal, paired with its depth (hop count) from the query target.
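To make the Depth field concrete, here is an in-memory sketch of a depth-limited callers traversal using breadth-first search. It is not the store's implementation (GraphStore documents its traversals as recursive CTEs in SQLite); the callerResult type, the string node IDs, and the callersOf reverse-adjacency map are assumptions for illustration.

```go
package main

import "fmt"

// callerResult is a simplified stand-in for CallerResult, using a string ID.
type callerResult struct {
	Node  string
	Depth int // 1 = direct caller, 2 = caller of a caller, etc.
}

// transitiveCallers walks the reverse call graph breadth-first, so each
// node is reported once, at its smallest hop count from the target.
func transitiveCallers(callersOf map[string][]string, target string, maxDepth int) []callerResult {
	seen := map[string]bool{target: true}
	frontier := []string{target}
	var out []callerResult
	for depth := 1; depth <= maxDepth && len(frontier) > 0; depth++ {
		var next []string
		for _, n := range frontier {
			for _, c := range callersOf[n] {
				if seen[c] {
					continue // also guards against cycles
				}
				seen[c] = true
				out = append(out, callerResult{Node: c, Depth: depth})
				next = append(next, c)
			}
		}
		frontier = next
	}
	return out
}

func main() {
	// main calls handler, handler calls store.Put.
	callersOf := map[string][]string{
		"store.Put": {"handler"},
		"handler":   {"main"},
	}
	for _, r := range transitiveCallers(callersOf, "store.Put", 2) {
		fmt.Printf("%s at depth %d\n", r.Node, r.Depth)
	}
}
```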

type CallerWithProvenance

type CallerWithProvenance struct {
	Caller     Node
	Depth      int
	Confidence float64          // minimum confidence along the call path (0.0 to 1.0)
	Provenance []EdgeProvenance // ordered provenance chain from caller to target
}

CallerWithProvenance pairs a caller node with the edge provenance chain from that caller back to the target. Confidence is the minimum confidence along the call path.
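The "minimum confidence along the call path" rule can be sketched as follows. The step type is a cut-down stand-in for EdgeProvenance, and pathConfidence is a hypothetical helper; returning 0 for an empty chain is an assumption of this sketch.

```go
package main

import "fmt"

// step is a simplified stand-in for EdgeProvenance.
type step struct {
	Source     string
	Confidence float64
}

// pathConfidence returns the lowest confidence in the chain: a call path
// is only as trustworthy as its weakest edge.
func pathConfidence(chain []step) float64 {
	if len(chain) == 0 {
		return 0 // assumption: no evidence means no confidence
	}
	lowest := chain[0].Confidence
	for _, s := range chain[1:] {
		if s.Confidence < lowest {
			lowest = s.Confidence
		}
	}
	return lowest
}

func main() {
	chain := []step{
		{"lsp_resolved", 0.9},
		{"ast_inferred", 0.7},
		{"scip_resolved", 1.0},
	}
	fmt.Println(pathConfidence(chain)) // 0.7
}
```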

type DiffResult

type DiffResult struct {
	OldSnapshot  Hash
	NewSnapshot  Hash
	EdgesAdded   []Edge // edges present in NewSnapshot but not OldSnapshot
	EdgesRemoved []Edge // edges present in OldSnapshot but not NewSnapshot
	NodesAdded   []Node // nodes present in NewSnapshot but not OldSnapshot
	NodesRemoved []Node // nodes present in OldSnapshot but not NewSnapshot
}

DiffResult contains the structural diff between two snapshots. Used by the snapshot_diff and semantic_diff MCP tools to report what changed between two points in time.
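Conceptually, EdgesAdded and EdgesRemoved are two set differences over the edge hashes of each snapshot. The sketch below shows that idea with string IDs in place of Hash values; diffEdges and setOf are hypothetical helpers, and the real SnapshotDiff may derive the same result from edge events instead.

```go
package main

import (
	"fmt"
	"sort"
)

// setOf builds a set from a list of edge IDs.
func setOf(ids ...string) map[string]struct{} {
	s := make(map[string]struct{}, len(ids))
	for _, id := range ids {
		s[id] = struct{}{}
	}
	return s
}

// diffEdges returns IDs present only in newSet (added) and only in
// oldSet (removed), sorted for deterministic output.
func diffEdges(oldSet, newSet map[string]struct{}) (added, removed []string) {
	for id := range newSet {
		if _, ok := oldSet[id]; !ok {
			added = append(added, id)
		}
	}
	for id := range oldSet {
		if _, ok := newSet[id]; !ok {
			removed = append(removed, id)
		}
	}
	sort.Strings(added)
	sort.Strings(removed)
	return added, removed
}

func main() {
	added, removed := diffEdges(setOf("e1", "e2", "e3"), setOf("e2", "e3", "e4"))
	fmt.Println("added:", added, "removed:", removed)
}
```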

type Edge

type Edge struct {
	EdgeHash   Hash    // content-addressed identity: sha256("edge" || NUL || sourceHash || NUL || targetHash || NUL || edgeType || NUL || provenance); see ComputeEdgeHash
	SourceHash Hash    // hash of the source node (caller, importer, implementor)
	TargetHash Hash    // hash of the target node (callee, imported package, interface)
	EdgeType   string  // relationship kind: "calls", "imports", "implements", "references"
	Confidence float64 // quality score from 0.0 to 1.0; ast_inferred=0.7, ast_resolved=0.85, lsp_resolved=0.9, scip_resolved=1.0
	Provenance string  // how the edge was derived: "ast_resolved", "ast_inferred", "lsp_resolved", etc.
	// CallSite fields store the source location of the call expression (not the
	// target declaration). These positions are used by LSP enrichment: the enricher
	// sends GetDefinition at (CallSiteFile, CallSiteLine, CallSiteCol) to confirm
	// or correct the target. Zero values mean no call-site info is available.
	CallSiteLine int    // 1-indexed line of the call expression in the source file
	CallSiteCol  int    // 0-indexed column of the call expression
	CallSiteFile string // relative file path (within the repo) containing the call expression
	// Runtime observation fields. Zero values for static edges.
	ObservationCount int   // total observations in current window (0 for static edges)
	LastObserved     int64 // unix timestamp of last observation (0 for static edges)
}

Edge represents a directed relationship between two nodes in the knowledge graph. Edge types include "calls", "imports", "implements", and "references". Each edge carries a confidence score and provenance tag indicating how it was derived (ast_resolved, ast_inferred, lsp_resolved, etc.).

type EdgeEvent

type EdgeEvent struct {
	EventID      int64  // auto-increment primary key
	EdgeHash     Hash   // hash of the edge that was added or removed
	EventType    string // "added" or "removed"
	SnapshotHash Hash   // the snapshot during which this event occurred
	SourceCommit string // git commit that triggered the event
	IndexerVer   string // version of the indexer that produced this event (e.g., "v1")
	Timestamp    int64  // unix timestamp of the event
	// Full edge data stored so removed-edge diffs work without joining
	// back to the edges table (removed edges are deleted from edges).
	SourceHash Hash    // source node hash (zero value for pre-migration events)
	TargetHash Hash    // target node hash
	EdgeType   string  // calls, imports, etc.
	Confidence float64 // confidence score
	Provenance string  // provenance tier
}

EdgeEvent represents an append-only edge mutation event for event sourcing. Each time an edge is added or removed during an index run, an EdgeEvent is recorded. These events power the SnapshotDiff query by tracking which edges changed between snapshots.
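One way an append-only event log can yield a diff is to fold the events between two snapshots into net changes. The sketch below assumes that an edge added and later removed within the window cancels out; that netting rule, the edgeEvent type, and netChanges are assumptions for illustration, not the documented behaviour of SnapshotDiff.

```go
package main

import (
	"fmt"
	"sort"
)

// edgeEvent is a cut-down stand-in for EdgeEvent with a string edge ID.
type edgeEvent struct {
	EdgeHash  string
	EventType string // "added" or "removed"
}

// netChanges folds an event stream into the edges that end up added or
// removed. Matching add/remove pairs for the same edge cancel out.
func netChanges(events []edgeEvent) (added, removed []string) {
	net := make(map[string]int)
	for _, ev := range events {
		switch ev.EventType {
		case "added":
			net[ev.EdgeHash]++
		case "removed":
			net[ev.EdgeHash]--
		}
	}
	for id, n := range net {
		if n > 0 {
			added = append(added, id)
		} else if n < 0 {
			removed = append(removed, id)
		}
	}
	sort.Strings(added)
	sort.Strings(removed)
	return added, removed
}

func main() {
	added, removed := netChanges([]edgeEvent{
		{"e1", "added"},
		{"e2", "removed"},
		{"e3", "added"},
		{"e3", "removed"}, // cancels the previous event
	})
	fmt.Println("added:", added, "removed:", removed)
}
```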

type EdgeProvenance

type EdgeProvenance struct {
	Source         string  // derivation method: "ast_resolved", "ast_inferred", "lsp_resolved", etc.
	Confidence     float64 // confidence score of this provenance step (0.0 to 1.0)
	IndexerVersion string  // version of the indexer that produced this edge
	SourceCommit   string  // git commit hash at the time of extraction
	SourceFileHash Hash    // hash of the source file from which the edge was extracted
	Timestamp      int64   // unix timestamp of extraction
}

EdgeProvenance captures the full derivation history of an edge. Used in BlastRadiusResult to show the provenance chain from a caller back to the target, so consumers can assess trustworthiness.

type ExtractOptions

type ExtractOptions struct {
	RepoURL    string // the repo URL (or local path) as registered in the store
	RepoHash   Hash   // sha256(RepoURL)
	CommitHash string // git commit hash being indexed
	FilePath   string // file path relative to the repository root
	FileHash   Hash   // content-addressed file hash: sha256(repoHash || path || contentHash)
	Content    []byte // raw file contents
	ModuleRoot string // absolute path to the module/repo root on disk (for go.mod resolution)
	// ModuleToRepoURL maps Go module paths to stored repo URLs. This is
	// populated by the indexer from the repos table so extractors can
	// resolve cross-repo call targets to the correct stored repo URL
	// rather than using heuristic inference from the import path.
	// Example: "github.com/org/repo" -> "/Users/user/code/repo"
	ModuleToRepoURL map[string]string

	// ParsedTree is an optional pre-parsed tree-sitter root node (*sitter.Node).
	// When set, extractors should use this instead of parsing the file again.
	// The indexer sets this when multiple extractors share the same language,
	// eliminating redundant parsing. Extractors that receive a non-nil ParsedTree
	// MUST NOT close the tree (the indexer owns it).
	ParsedTree ParsedTree
}

ExtractOptions contains all inputs needed for a single file extraction run. The indexer populates these fields and passes them to the selected Extractor.

type ExtractResult

type ExtractResult struct {
	Nodes []Node
	Edges []Edge
	// ParsedTree is set by tree-sitter extractors after parsing. The indexer
	// reads this and passes it to subsequent extractors handling the same file,
	// eliminating redundant parsing. Only the FIRST extractor to parse sets this;
	// subsequent extractors receive it via opts.ParsedTree and leave this nil.
	ParsedTree ParsedTree `json:"-"`
}

ExtractResult contains the nodes and edges produced by an extractor.

type Extractor

type Extractor interface {
	// Name returns a human-readable identifier for this extractor (e.g., "go", "go-treesitter").
	Name() string
	// CanHandle returns true if this extractor can process the file at the given path.
	// The path is relative to the repository root.
	CanHandle(path string) bool
	// Extract parses the file described by opts and returns extracted nodes and edges.
	// Returns an empty result (not an error) if no symbols are found.
	Extract(ctx context.Context, opts ExtractOptions) (*ExtractResult, error)
}

Extractor produces nodes and edges from source files. The indexer maintains a registry of extractors and dispatches each file to the first extractor whose CanHandle returns true. Implementations include GoExtractor (full type resolution), GoTreeSitterExtractor (fast AST-only), and TreeSitterExtractor (Python via tree-sitter).
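The "first extractor whose CanHandle returns true" rule can be sketched with a small registry. The interface below mirrors the documented method set, but ExtractOptions and ExtractResult are reduced to stubs, and suffixExtractor, demoRegistry, and pick are hypothetical names used only for this illustration.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Stub types standing in for the package's ExtractOptions and ExtractResult.
type ExtractOptions struct{ FilePath string }
type ExtractResult struct{ Nodes []string }

// Extractor mirrors the documented interface.
type Extractor interface {
	Name() string
	CanHandle(path string) bool
	Extract(ctx context.Context, opts ExtractOptions) (*ExtractResult, error)
}

// suffixExtractor is a toy extractor that matches on file extension.
type suffixExtractor struct{ name, suffix string }

func (s suffixExtractor) Name() string               { return s.name }
func (s suffixExtractor) CanHandle(path string) bool { return strings.HasSuffix(path, s.suffix) }
func (s suffixExtractor) Extract(ctx context.Context, opts ExtractOptions) (*ExtractResult, error) {
	return &ExtractResult{}, nil // empty result, not an error, when nothing is found
}

// demoRegistry lists extractors in priority order.
func demoRegistry() []Extractor {
	return []Extractor{
		suffixExtractor{"go", ".go"},
		suffixExtractor{"go-treesitter", ".go"},
		suffixExtractor{"python", ".py"},
	}
}

// pick returns the first extractor that can handle path, or nil.
func pick(registry []Extractor, path string) Extractor {
	for _, e := range registry {
		if e.CanHandle(path) {
			return e
		}
	}
	return nil
}

func main() {
	for _, path := range []string{"cmd/main.go", "tools/gen.py", "README.md"} {
		e := pick(demoRegistry(), path)
		if e == nil {
			fmt.Println(path, "-> no extractor")
			continue
		}
		res, err := e.Extract(context.Background(), ExtractOptions{FilePath: path})
		fmt.Println(path, "->", e.Name(), len(res.Nodes), err)
	}
}
```

Registry order matters in this sketch: both "go" and "go-treesitter" accept .go files, and the earlier entry wins.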

type File

type File struct {
	FileHash    Hash   // sha256(repoHash || relativePath || contentHash)
	RepoHash    Hash   // hash of the containing Repo
	Path        string // path relative to the repository root
	ContentHash Hash   // sha256 of the raw file contents; used for skip-if-unchanged checks
}

File represents a tracked source file within a repository. The FileHash incorporates the repo, path, and content, so a file's identity changes whenever its content changes (enabling content-based change detection).

type GraphStore

type GraphStore interface {
	// Write operations (upsert semantics via INSERT OR REPLACE).
	PutNode(ctx context.Context, n Node) error
	PutEdge(ctx context.Context, e Edge) error
	PutFile(ctx context.Context, f File) error
	PutRepo(ctx context.Context, r Repo) error
	RecordEdgeEvent(ctx context.Context, ev EdgeEvent) error
	CreateSnapshot(ctx context.Context, s Snapshot) error

	// Point lookups by hash. Return nil when not found (no error).
	GetNode(ctx context.Context, hash Hash) (*Node, error)
	GetEdge(ctx context.Context, hash Hash) (*Edge, error)
	GetSnapshot(ctx context.Context, hash Hash) (*Snapshot, error)
	GetRepo(ctx context.Context, hash Hash) (*Repo, error)

	// Query operations.
	NodesByName(ctx context.Context, qualifiedPrefix string) ([]Node, error)
	EdgesFrom(ctx context.Context, sourceHash Hash, edgeType string) ([]Edge, error)
	EdgesTo(ctx context.Context, targetHash Hash, edgeType string) ([]Edge, error)
	DanglingEdges(ctx context.Context) ([]Edge, error)
	AllRepos(ctx context.Context) ([]Repo, error)
	NodesByQualifiedName(ctx context.Context, qualifiedName string) ([]Node, error)
	NodesByFileHash(ctx context.Context, fileHash Hash) ([]Node, error)

	// Delete operations for incremental re-indexing and garbage collection.
	// GC operations: delete nodes/edges not in the keep set.
	DeleteNodesNotIn(ctx context.Context, keep map[Hash]struct{}) (int64, error)
	DeleteEdgesNotIn(ctx context.Context, keep map[Hash]struct{}) (int64, error)
	DeleteEdge(ctx context.Context, hash Hash) error
	DeleteNodesByFile(ctx context.Context, fileHash Hash) (int, error)
	DeleteEdgesBySourceFile(ctx context.Context, fileHash Hash) ([]Edge, error)
	EdgesBySourceFile(ctx context.Context, fileHash Hash) ([]Edge, error)
	DeleteSnapshot(ctx context.Context, hash Hash) error

	// Graph traversals (implemented as recursive CTEs in SQLite).
	TransitiveCallers(ctx context.Context, target Hash, maxDepth int, snapshot Hash) ([]CallerResult, error)
	TransitiveCallees(ctx context.Context, source Hash, maxDepth int, snapshot Hash) ([]CalleeResult, error)
	BlastRadius(ctx context.Context, target Hash, snapshot Hash) (*BlastRadiusResult, error)

	// Snapshot operations.
	SnapshotDiff(ctx context.Context, oldRoot, newRoot Hash) (*DiffResult, error)
	StaleEdges(ctx context.Context, snapshot Hash) ([]Edge, error)
	LatestSnapshot(ctx context.Context, repoHash Hash) (*Snapshot, error)

	// File queries.
	FilesByRepo(ctx context.Context, repoHash Hash) ([]File, error)
	FileByPath(ctx context.Context, repoHash Hash, path string) (*File, error)
	NodesByFilePath(ctx context.Context, repoHash Hash, path string) ([]Node, error)
	StaleNodesByFiles(ctx context.Context, repoHash Hash, paths []string) ([]Node, error)

	// Notes: metadata that never affects Merkle computation (git notes pattern).
	// PutNote upserts a note (object_hash + key is the composite key).
	PutNote(ctx context.Context, n Note) error
	// GetNote retrieves a single note by object hash and key. Returns nil if not found.
	GetNote(ctx context.Context, objectHash Hash, key string) (*Note, error)
	// GetNotes retrieves all notes attached to an object.
	GetNotes(ctx context.Context, objectHash Hash) ([]Note, error)
	// GetNotesByKey retrieves all notes with the given key across all objects.
	GetNotesByKey(ctx context.Context, key string) ([]Note, error)
	// DeleteNote removes a single note by object hash and key.
	DeleteNote(ctx context.Context, objectHash Hash, key string) error
	// DeleteNotesByObject removes all notes attached to an object.
	DeleteNotesByObject(ctx context.Context, objectHash Hash) error

	// Close releases the underlying database connection.
	Close() error
}

GraphStore defines the operations the graph engine requires from its backing store. SQLite implements this today; an adjacency-list or external graph backend can implement it tomorrow without changing callers.

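The interface is organized into the following groups:

  • Write operations: PutNode, PutEdge, PutFile, PutRepo, RecordEdgeEvent, CreateSnapshot
  • Point lookups: GetNode, GetEdge, GetSnapshot, GetRepo
  • Query operations: NodesByName, EdgesFrom, EdgesTo, DanglingEdges, etc.
  • Delete and GC operations: DeleteNodesNotIn, DeleteEdgesNotIn, DeleteEdge, DeleteNodesByFile, etc.
  • Graph traversals: TransitiveCallers, TransitiveCallees, BlastRadius
  • Snapshot operations: SnapshotDiff, StaleEdges, LatestSnapshot
  • File queries: FilesByRepo, FileByPath, NodesByFilePath, StaleNodesByFiles
  • Notes: PutNote, GetNote, GetNotes, GetNotesByKey, DeleteNote, DeleteNotesByObject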

All methods except Close accept a context for cancellation and timeout propagation. Methods that return a pointer return nil (not an error) when the entity is not found.
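Because a missing entity is reported as a nil pointer with a nil error, callers need to check both return values before dereferencing. The sketch below shows that convention with a tiny in-memory store; memStore, newMemStore, and describe are hypothetical and implement only a sliver of GraphStore.

```go
package main

import (
	"context"
	"fmt"
)

type Hash [32]byte

// Node is reduced to the fields this sketch needs.
type Node struct {
	NodeHash      Hash
	QualifiedName string
}

// memStore is a toy in-memory store keyed by node hash.
type memStore struct{ nodes map[Hash]Node }

func newMemStore(seed ...Node) *memStore {
	s := &memStore{nodes: make(map[Hash]Node)}
	for _, n := range seed {
		s.nodes[n.NodeHash] = n
	}
	return s
}

// PutNode has upsert semantics: a second write with the same hash replaces the first.
func (s *memStore) PutNode(ctx context.Context, n Node) error {
	s.nodes[n.NodeHash] = n
	return nil
}

// GetNode returns (nil, nil) when the hash is unknown; errors are
// reserved for real failures such as a cancelled context.
func (s *memStore) GetNode(ctx context.Context, h Hash) (*Node, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	n, ok := s.nodes[h]
	if !ok {
		return nil, nil
	}
	return &n, nil
}

// describe shows the caller-side pattern: check err, then check nil.
func describe(s *memStore, h Hash) string {
	n, err := s.GetNode(context.Background(), h)
	if err != nil {
		return "error: " + err.Error()
	}
	if n == nil {
		return "not found"
	}
	return n.QualifiedName
}

func main() {
	s := newMemStore()
	_ = s.PutNode(context.Background(), Node{NodeHash: Hash{1}, QualifiedName: "repo://pkg.Func"})
	fmt.Println(describe(s, Hash{1}))
	fmt.Println(describe(s, Hash{2}))
}
```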

type Hash

type Hash [32]byte

Hash is a content-addressed identifier (SHA-256 digest, 32 bytes). Used as the primary key for all graph entities: nodes, edges, files, repos, and snapshots. Two entities with identical content always produce the same Hash.

var EmptyHash Hash

EmptyHash is the zero-value hash.

func ComputeEdgeHash

func ComputeEdgeHash(sourceHash, targetHash Hash, edgeType, provenanceJSON string) Hash

ComputeEdgeHash computes the content-addressed hash for an edge. The hash formula is: SHA-256("edge" + NUL + sourceHash + NUL + targetHash + NUL + edgeType + NUL + provenance). The "edge\0" domain prefix distinguishes edge hashes from node, snapshot, and Merkle interior node hashes, preventing cross-domain hash collisions. Because provenance is included, upgrading an edge from "ast_inferred" to "lsp_resolved" produces a new hash (the old edge must be deleted first).

WARNING: This formula changed to include the "edge\0" prefix. Existing databases must be re-indexed.
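A minimal sketch of the formula, showing why a provenance upgrade changes an edge's identity. It assumes the source and target hashes are fed in as raw 32-byte values; the package may encode them differently, so hashes produced here are not guaranteed to match ComputeEdgeHash. The function name edgeHash is hypothetical.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

type Hash [32]byte

// edgeHash follows the documented layout:
// "edge" NUL sourceHash NUL targetHash NUL edgeType NUL provenance.
// Assumption: hashes are written as raw bytes, not hex strings.
func edgeHash(source, target Hash, edgeType, provenance string) Hash {
	h := sha256.New()
	h.Write([]byte("edge"))
	h.Write([]byte{0})
	h.Write(source[:])
	h.Write([]byte{0})
	h.Write(target[:])
	h.Write([]byte{0})
	h.Write([]byte(edgeType))
	h.Write([]byte{0})
	h.Write([]byte(provenance))
	var out Hash
	copy(out[:], h.Sum(nil))
	return out
}

func main() {
	src := Hash(sha256.Sum256([]byte("caller")))
	dst := Hash(sha256.Sum256([]byte("callee")))

	inferred := edgeHash(src, dst, "calls", "ast_inferred")
	resolved := edgeHash(src, dst, "calls", "lsp_resolved")

	fmt.Printf("ast_inferred: %x\n", inferred[:8])
	fmt.Printf("lsp_resolved: %x\n", resolved[:8])
	fmt.Println("same identity:", inferred == resolved)
}
```

Since the two hashes differ, a store keyed by EdgeHash sees the upgraded edge as a new row, which is why the old edge has to be deleted explicitly.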

func ComputeMerkleNodeHash added in v0.3.0

func ComputeMerkleNodeHash(left, right Hash) Hash

ComputeMerkleNodeHash computes a Merkle interior node hash with a "merkle" domain prefix. This distinguishes interior tree nodes from leaf hashes and snapshot root hashes.

func ComputeNodeHash

func ComputeNodeHash(repoURL, packagePath string, _ Hash, symbolName, symbolKind string) Hash

ComputeNodeHash computes the content-addressed hash for a node. The contentHash parameter is accepted for API compatibility but is not included in the hash computation. Node identity depends on (repo, package, name, kind) only.

The hash formula is: SHA-256("node" + NUL + repoURL + NUL + packagePath + NUL + symbolName + NUL + symbolKind). The "node\0" domain prefix distinguishes node hashes from edge, snapshot, and Merkle interior node hashes, preventing cross-domain hash collisions. NUL bytes are used as field separators to prevent ambiguous concatenation (e.g., "ab" + "c" vs "a" + "bc", which both concatenate to "abc").

WARNING: This formula changed to include the "node\0" prefix. Existing databases must be re-indexed.
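The effect of the NUL separators can be seen by comparing a separator-free hash with the documented layout. This is a sketch only: nodeHash follows the formula as written above, naiveHash is a deliberately flawed counterexample, and both names are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"strings"
)

// nodeHash joins the domain prefix and fields with NUL bytes, following
// the documented formula.
func nodeHash(repoURL, packagePath, symbolName, symbolKind string) [32]byte {
	data := strings.Join([]string{"node", repoURL, packagePath, symbolName, symbolKind}, "\x00")
	return sha256.Sum256([]byte(data))
}

// naiveHash concatenates fields with no separator. It is shown only to
// demonstrate the ambiguity that the NUL bytes avoid.
func naiveHash(fields ...string) [32]byte {
	return sha256.Sum256([]byte(strings.Join(fields, "")))
}

func main() {
	// Without separators, ("ab", "c") and ("a", "bc") both hash "abc".
	fmt.Println("naive collides:", naiveHash("ab", "c") == naiveHash("a", "bc"))

	// With NUL separators the field boundaries are part of the input.
	x := nodeHash("repo", "ab", "c", "function")
	y := nodeHash("repo", "a", "bc", "function")
	fmt.Println("separated collides:", x == y)
}
```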

func ComputeSnapshotHash added in v0.3.0

func ComputeSnapshotHash(merkleRoot Hash) Hash

ComputeSnapshotHash wraps a Merkle root hash with a "snapshot" domain prefix, distinguishing snapshot identity from raw Merkle interior nodes.

func NewHash

func NewHash(data []byte) Hash

NewHash computes a SHA-256 hash from the given data.

func ParseHash added in v0.4.1

func ParseHash(s string) (Hash, error)

ParseHash decodes a hex string into a Hash.

func (Hash) IsZero

func (h Hash) IsZero() bool

IsZero returns true if the hash is the zero value.

func (Hash) MarshalJSON added in v0.4.1

func (h Hash) MarshalJSON() ([]byte, error)

MarshalJSON encodes the hash as a hex string for JSON output.

func (Hash) String

func (h Hash) String() string

String returns the hex-encoded hash.

func (*Hash) UnmarshalJSON added in v0.4.1

func (h *Hash) UnmarshalJSON(data []byte) error

UnmarshalJSON decodes a hex string into the hash.
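A sketch of how the hex and JSON methods could fit together, assuming lowercase hex and a strict 32-byte length check in ParseHash. Both details, and the roundTrip helper, are assumptions; the package's own validation and error messages may differ.

```go
package main

import (
	"encoding/hex"
	"encoding/json"
	"fmt"
)

type Hash [32]byte

// String returns the hex-encoded hash (64 characters).
func (h Hash) String() string { return hex.EncodeToString(h[:]) }

// ParseHash decodes a hex string; this sketch rejects any length other
// than 32 bytes.
func ParseHash(s string) (Hash, error) {
	var h Hash
	b, err := hex.DecodeString(s)
	if err != nil {
		return h, err
	}
	if len(b) != len(h) {
		return h, fmt.Errorf("hash must be %d bytes, got %d", len(h), len(b))
	}
	copy(h[:], b)
	return h, nil
}

// MarshalJSON encodes the hash as a JSON string of hex digits.
func (h Hash) MarshalJSON() ([]byte, error) { return json.Marshal(h.String()) }

// UnmarshalJSON needs a pointer receiver so it can overwrite the value.
func (h *Hash) UnmarshalJSON(data []byte) error {
	var s string
	if err := json.Unmarshal(data, &s); err != nil {
		return err
	}
	parsed, err := ParseHash(s)
	if err != nil {
		return err
	}
	*h = parsed
	return nil
}

// roundTrip is a hypothetical helper: marshal to JSON and decode again.
func roundTrip(h Hash) (Hash, error) {
	var back Hash
	data, err := json.Marshal(h)
	if err != nil {
		return back, err
	}
	err = json.Unmarshal(data, &back)
	return back, err
}

func main() {
	h := Hash{0xde, 0xad, 0xbe, 0xef}
	back, err := roundTrip(h)
	fmt.Println(h.String()[:8], back == h, err)
}
```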

type Node

type Node struct {
	// NodeHash is the content-addressed identity:
	// sha256("node" || NUL || repoURL || NUL || packagePath || NUL || symbolName || NUL || symbolKind).
	// contentHash is not part of the computation; node identity
	// depends only on (repo, package, name, kind). See ComputeNodeHash.
	NodeHash      Hash
	FileHash      Hash    // reference to the containing File record
	QualifiedName string  // fully qualified name: "{repoURL}://{pkgPath}.{TypeName}.{SymbolName}"
	Kind          string  // one of the Kind* constants, e.g. function, type, method, interface, const, var
	Line          int     // 1-indexed source line number of the declaration
	Signature     string  // human-readable type signature for display (e.g., "func (SQLiteStore) PutNode()")
	Doc           string  // doc comment preceding the declaration (first 200 chars, language-agnostic)
	LastAuthor    string  // git blame: author of the last commit that touched this symbol's line
	LastCommitAt  int64   // git blame: unix timestamp of the last commit that touched this symbol's line
	CoveragePct   float64 // test coverage percentage for this symbol's lines (0.0-100.0, -1 = not measured)
}

Node represents a symbol in the knowledge graph. A node is a function, type, method, interface, const, or var declaration extracted from source code. Nodes are identified by a content-addressed hash computed from (repo, package, name, kind), so two nodes in different files with the same qualified identity will share a hash.

type Note added in v0.3.0

type Note struct {
	ObjectHash Hash   // the content-addressed object this note is attached to
	Key        string // the metadata key (e.g., "community_id", "context_pack")
	Value      string // the metadata value (opaque to the store; callers may use JSON)
	UpdatedAt  int64  // unix timestamp of last write
}

Note attaches arbitrary key/value metadata to any content-addressed object (node, edge, snapshot, community, pack root) without affecting Merkle computation. Modeled after git notes: a parallel metadata layer that never changes the identity of the object it annotates.

Use cases: community assignments, context pack persistence, quality scores, feedback annotations, agent session state.
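The notes layer behaves like a map keyed by (object hash, key), with writes replacing any earlier value. A minimal in-memory sketch of that composite-key upsert follows; noteKey, noteStore, and the method bodies are hypothetical and omit the context parameters of the real GraphStore methods.

```go
package main

import "fmt"

type Hash [32]byte

// Note mirrors the documented struct.
type Note struct {
	ObjectHash Hash
	Key        string
	Value      string
	UpdatedAt  int64
}

// noteKey is the composite key: one note per (object, key) pair.
type noteKey struct {
	Object Hash
	Key    string
}

type noteStore struct{ notes map[noteKey]Note }

func newNoteStore() *noteStore { return &noteStore{notes: make(map[noteKey]Note)} }

// PutNote upserts: writing the same (object, key) again replaces the value.
func (s *noteStore) PutNote(n Note) { s.notes[noteKey{n.ObjectHash, n.Key}] = n }

// GetNote returns nil when no note exists for the pair.
func (s *noteStore) GetNote(object Hash, key string) *Note {
	n, ok := s.notes[noteKey{object, key}]
	if !ok {
		return nil
	}
	return &n
}

func main() {
	s := newNoteStore()
	obj := Hash{42}
	s.PutNote(Note{ObjectHash: obj, Key: "community_id", Value: "3", UpdatedAt: 100})
	s.PutNote(Note{ObjectHash: obj, Key: "community_id", Value: "7", UpdatedAt: 200})

	// The object's hash is untouched; only the side table changed.
	fmt.Println(len(s.notes), s.GetNote(obj, "community_id").Value)
}
```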

type ParsedTree added in v0.6.1

type ParsedTree = any

ParsedTree is an opaque handle to a pre-parsed tree-sitter tree. Passed through ExtractOptions.ParsedTree when the indexer has already parsed the file for another extractor sharing the same language. The value is *sitter.Node (the root node) but typed as any to avoid importing tree-sitter in the types package.

type Repo

type Repo struct {
	RepoHash    Hash   // sha256(repoURL); canonical identity for the repo
	RepoURL     string // the URL or path that was passed to IndexRepo
	LastCommit  string // git commit hash from the most recent index run
	LastIndexed int64  // unix timestamp of the most recent index run
}

Repo represents a tracked repository. The RepoURL can be either a remote URL (e.g., "github.com/org/repo") or a local filesystem path, depending on how the repo was registered.

type Snapshot

type Snapshot struct {
	SnapshotHash Hash   // Merkle root: merkle_root(sorted(all_edge_hashes))
	ParentHash   Hash   // hash of the previous snapshot in the chain; zero for the first
	RepoHash     Hash   // hash of the repository this snapshot belongs to
	CommitHash   string // git commit hash at the time of snapshotting
	Timestamp    int64  // unix timestamp when the snapshot was created
	NodeCount    int    // total number of nodes in the graph at snapshot time
	EdgeCount    int    // total number of edges in the graph at snapshot time
	Generation   int    // chain depth: parent.Generation + 1; enables O(1) ancestry checks
}

Snapshot represents a point-in-time graph state for a single repository. The SnapshotHash is the Merkle root computed over all sorted edge hashes, providing a tamper-evident fingerprint of the entire graph at a given commit. Snapshots form a singly-linked chain via ParentHash, enabling efficient diffing and garbage collection.
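To illustrate why sorting the edge hashes makes the root a stable fingerprint, here is a sketch of a Merkle root over sorted leaves. Several details are assumptions not confirmed by this documentation: the exact byte layout after the "merkle" prefix, promoting an unpaired node to the next level, and returning the zero hash for an empty edge set. The names merkleNode, merkleRoot, and leaf are hypothetical, so roots computed here should not be expected to match the package's.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"sort"
)

type Hash [32]byte

// leaf derives a stand-in edge hash from a label, for demonstration only.
func leaf(label string) Hash { return Hash(sha256.Sum256([]byte(label))) }

// merkleNode hashes two children under a "merkle" domain prefix.
// Assumption: the prefix is followed by NUL and the raw child bytes.
func merkleNode(left, right Hash) Hash {
	h := sha256.New()
	h.Write([]byte("merkle\x00"))
	h.Write(left[:])
	h.Write(right[:])
	var out Hash
	copy(out[:], h.Sum(nil))
	return out
}

// merkleRoot sorts a copy of the leaves, then reduces them pairwise.
// Assumption: an unpaired node is promoted unchanged to the next level.
func merkleRoot(leaves []Hash) Hash {
	if len(leaves) == 0 {
		return Hash{}
	}
	level := append([]Hash(nil), leaves...)
	sort.Slice(level, func(i, j int) bool {
		return bytes.Compare(level[i][:], level[j][:]) < 0
	})
	for len(level) > 1 {
		var next []Hash
		for i := 0; i < len(level); i += 2 {
			if i+1 == len(level) {
				next = append(next, level[i])
				break
			}
			next = append(next, merkleNode(level[i], level[i+1]))
		}
		level = next
	}
	return level[0]
}

func main() {
	a, b, c := leaf("e1"), leaf("e2"), leaf("e3")
	r1 := merkleRoot([]Hash{a, b, c})
	r2 := merkleRoot([]Hash{c, a, b})
	r3 := merkleRoot([]Hash{a, b})
	fmt.Println("order independent:", r1 == r2)
	fmt.Println("changes when an edge is removed:", r1 != r3)
}
```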
