Documentation
¶
Overview ¶
Package snapshot manages Merkle-based graph snapshots for the knowing knowledge graph.
Each snapshot represents a point-in-time fingerprint of all edges in a repository's graph. The fingerprint is a Merkle root computed by sorting all edge hashes lexicographically and building a binary hash tree. Two snapshots with different roots are guaranteed to have different edge sets, enabling efficient change detection.
Snapshots form a singly-linked chain (each snapshot points to its parent), supporting garbage collection of old snapshots while preserving chain integrity for the most recent N snapshots.
Index ¶
- Constants
- func ContextPackRoot(taskNormalized string, snapshotRoot types.Hash, selectedNodes []types.Hash) types.Hash
- func DiffMerkle(oldTree, newTree *MerkleTree) (added, removed []types.Hash)
- func ExtractPackagePath(qualifiedName string) (string, error)
- func SortHashes(hashes []types.Hash)
- func VerifyAbsenceProof(proof *AbsenceProof) bool
- func VerifyProof(proof *MerkleProof) bool
- type AbsenceProof
- type ChangeClass
- type ChangeClassification
- type DiffOptions
- type EdgeInput
- type GCStats
- type HierarchicalDiff
- type HierarchicalTree
- type MerkleProof
- type MerkleTree
- type ProofStep
- type SnapshotManager
- func (sm *SnapshotManager) CollectEdgeInputs(ctx context.Context, repoHash types.Hash) ([]EdgeInput, int, error)
- func (sm *SnapshotManager) ComputeSnapshot(ctx context.Context, repoHash types.Hash, commitHash string) (*types.Snapshot, error)
- func (sm *SnapshotManager) Diff(ctx context.Context, oldHash, newHash types.Hash) (*types.DiffResult, error)
- func (sm *SnapshotManager) GarbageCollect(ctx context.Context, repoHash types.Hash, keepCount int) (removed int, err error)
- func (sm *SnapshotManager) GarbageCollectFull(ctx context.Context, repoHash types.Hash, keepCount int) (GCStats, error)
- func (sm *SnapshotManager) LastHierarchicalTree() *HierarchicalTree
- func (sm *SnapshotManager) Verify(ctx context.Context, repoHash types.Hash, rosterDBPaths []string, ...) ([]VerifyError, error)
- type VerifyError
Constants ¶
const ( ClassCrossRepo = "cross_repo" ClassStdlib = "stdlib" ClassTrulyDangling = "truly_dangling" )
DanglingClassification describes why an edge target is missing from the local database. Only "truly_dangling" indicates potential corruption.
Variables ¶
This section is empty.
Functions ¶
func ContextPackRoot ¶ added in v0.3.0
func ContextPackRoot(taskNormalized string, snapshotRoot types.Hash, selectedNodes []types.Hash) types.Hash
ContextPackRoot computes a content-addressed key for a context pack: the combination of a task identifier, the snapshot state, and the selected symbols. Two identical queries against the same graph state produce the same root, enabling deduplication and caching.
func DiffMerkle ¶
func DiffMerkle(oldTree, newTree *MerkleTree) (added, removed []types.Hash)
DiffMerkle returns the leaf hashes that differ between two Merkle trees. Added are hashes present in newTree but not oldTree. Removed are hashes present in oldTree but not newTree.
func ExtractPackagePath ¶ added in v0.4.0
ExtractPackagePath extracts the package path from a qualified name. Format: "repoURL://pkgPath.SymbolName" or "repoURL://pkgPath.Type.Method" The separator "://" marks the boundary between the repo URL and the package path.
For Go packages, the package path ends at the first dot-separated component that starts with an uppercase letter (indicating a type or symbol name, not a package path segment). If no uppercase component is found, falls back to splitting at the last dot.
Examples:
"repo://pkg/sub.FuncName" -> "pkg/sub" "repo://pkg/sub.TypeName.Method" -> "pkg/sub" "repo://pkg/sub.lowercase" -> "pkg/sub"
This is the canonical package path extractor. All code that needs to derive a package path from a qualified name should use this function to avoid divergent implementations.
func SortHashes ¶
SortHashes sorts a slice of hashes lexicographically using bytes.Compare.
func VerifyAbsenceProof ¶ added in v0.4.1
func VerifyAbsenceProof(proof *AbsenceProof) bool
VerifyAbsenceProof checks that an absence proof is valid: 1. Both neighbor proofs verify against the same root. 2. left < missing < right (sorted order). 3. If both neighbors exist, they prove the gap contains no room for the missing hash.
func VerifyProof ¶ added in v0.4.0
func VerifyProof(proof *MerkleProof) bool
VerifyProof checks that a Merkle proof is valid: recomputing the root from the edge hash and proof steps produces the claimed repo root.
Types ¶
type AbsenceProof ¶ added in v0.4.1
type AbsenceProof struct {
// MissingHash is the edge hash being proved absent.
MissingHash types.Hash `json:"missing_hash"`
// LeftNeighbor is the largest leaf smaller than MissingHash (nil if MissingHash
// would be the first leaf).
LeftNeighbor *types.Hash `json:"left_neighbor,omitempty"`
// RightNeighbor is the smallest leaf larger than MissingHash (nil if MissingHash
// would be the last leaf).
RightNeighbor *types.Hash `json:"right_neighbor,omitempty"`
// LeftProof proves LeftNeighbor is in the tree (nil if no left neighbor).
LeftProof *MerkleProof `json:"left_proof,omitempty"`
// RightProof proves RightNeighbor is in the tree (nil if no right neighbor).
RightProof *MerkleProof `json:"right_proof,omitempty"`
// RepoRoot is the root the absence is proved against.
RepoRoot types.Hash `json:"repo_root"`
}
AbsenceProof proves that a specific edge hash does NOT exist in a tree. It works by proving the two adjacent leaves that bracket the missing hash: left < missing < right. Since leaves are sorted by bytes.Compare, adjacency proves there is no room for the missing hash.
func GenerateAbsenceProof ¶ added in v0.4.1
func GenerateAbsenceProof(tree *HierarchicalTree, edgeHash types.Hash, packagePath, edgeType string, edges []EdgeInput) (*AbsenceProof, error)
GenerateAbsenceProof creates a proof that edgeHash does NOT exist in the tree under the given package and edge type. Returns an error if the edge IS found (you can't prove absence of something that exists).
type ChangeClass ¶ added in v0.4.0
type ChangeClass string
ChangeClass categorizes what kind of change occurred between two snapshots.
const ( // Behavioral means call edges changed: functions call different things. // Agents should re-query for affected packages. Behavioral ChangeClass = "behavioral" // Structural means import/implements/extends edges changed: dependency // structure shifted but call behavior may be the same. Structural ChangeClass = "structural" // RuntimeDrift means runtime-observed edges changed: production traffic // patterns diverged from static analysis. RuntimeDrift ChangeClass = "runtime_drift" // MetadataOnly means only non-behavioral edges changed (references, // handles_route, configures, etc.). Safe to skip re-query in most cases. MetadataOnly ChangeClass = "metadata_only" // NoChange means the snapshots are identical. NoChange ChangeClass = "no_change" )
type ChangeClassification ¶ added in v0.4.0
type ChangeClassification struct {
// Class is the highest-severity change class detected.
Class ChangeClass
// BehavioralCount is the number of changed edge types classified as behavioral.
BehavioralCount int
// StructuralCount is the number of changed edge types classified as structural.
StructuralCount int
// RuntimeDriftCount is the number of changed edge types classified as runtime drift.
RuntimeDriftCount int
// MetadataCount is the number of changed edge types not in any other category.
MetadataCount int
// ChangedPackages from the underlying diff.
ChangedPackages []string
// ChangedEdgeTypes from the underlying diff (raw "package:edgeType" keys).
ChangedEdgeTypes []string
}
ChangeClassification is the result of ClassifyChanges.
func ClassifyChanges ¶ added in v0.4.0
func ClassifyChanges(diff *HierarchicalDiff) ChangeClassification
ClassifyChanges categorizes the changes between two hierarchical trees based on which edge-type roots changed. Returns the highest-severity classification: Behavioral > RuntimeDrift > Structural > MetadataOnly > NoChange.
type DiffOptions ¶ added in v0.3.0
type DiffOptions struct {
// PackageFilter restricts the diff to the listed package paths. When
// empty, all packages are compared (default behaviour).
PackageFilter []string
// MaxChanges caps the total number of changed/added/removed packages
// reported. Once the cap is reached the diff is marked Truncated and no
// further packages are added. 0 means no cap.
MaxChanges int
}
DiffOptions controls the behaviour of DiffHierarchicalTreesWithOptions.
type EdgeInput ¶ added in v0.3.0
type EdgeInput struct {
EdgeHash types.Hash
PackagePath string // extracted from source node's qualified name
EdgeType string // calls, imports, implements, references, etc.
}
EdgeInput is the input for building a hierarchical tree: an edge hash with its source package and edge type metadata.
type GCStats ¶ added in v0.3.0
type GCStats struct {
// SnapshotsRemoved is the number of old snapshots pruned from the chain.
SnapshotsRemoved int
// NodesRemoved is the number of orphaned nodes deleted from the graph.
NodesRemoved int64
// EdgesRemoved is the number of orphaned edges deleted from the graph.
EdgesRemoved int64
// Duration is the wall-clock time taken by the full GC run.
Duration time.Duration
}
GCStats records the outcome of a full garbage collection run.
type HierarchicalDiff ¶ added in v0.3.0
type HierarchicalDiff struct {
// ChangedPackages lists packages whose root changed (content differs).
ChangedPackages []string
// AddedPackages lists packages present in new but not old.
AddedPackages []string
// RemovedPackages lists packages present in old but not new.
RemovedPackages []string
// ChangedEdgeTypes lists "package:edgeType" keys whose root changed.
ChangedEdgeTypes []string
// RootChanged is true if the overall repo root differs.
RootChanged bool
// Truncated is true when the diff was cut short by a MaxChanges cap.
Truncated bool
}
DiffHierarchical compares two hierarchical trees and returns which packages and edge types changed. This is O(packages) instead of O(edges).
func DiffHierarchicalTrees ¶ added in v0.3.0
func DiffHierarchicalTrees(oldTree, newTree *HierarchicalTree) *HierarchicalDiff
DiffHierarchicalTrees compares two hierarchical trees at each level. It is a convenience wrapper around DiffHierarchicalTreesWithOptions with nil options (no filter, no cap).
func DiffHierarchicalTreesWithOptions ¶ added in v0.3.0
func DiffHierarchicalTreesWithOptions(oldTree, newTree *HierarchicalTree, opts *DiffOptions) *HierarchicalDiff
DiffHierarchicalTreesWithOptions compares two hierarchical trees with optional package filtering and a maximum-changes cap.
When opts.PackageFilter is non-empty, only the listed packages are examined; the resulting diff reflects only those packages. When opts.MaxChanges is positive, the diff stops accumulating entries once that many total changed/added/removed packages have been recorded and sets HierarchicalDiff.Truncated = true.
type HierarchicalTree ¶ added in v0.3.0
type HierarchicalTree struct {
Root types.Hash
// PackageRoots maps package path to its Merkle root.
// A package root is hash(sorted edge-type roots for that package).
PackageRoots map[string]types.Hash
// EdgeTypeRoots maps "package:edgeType" to its Merkle root.
// An edge-type root is hash(sorted edge hashes of that type in that package).
EdgeTypeRoots map[string]types.Hash
// PackageEdgeCounts tracks edge count per package for quick stats.
PackageEdgeCounts map[string]int
// TotalEdges is the total number of edges across all packages.
TotalEdges int
// contains filtered or unexported fields
}
HierarchicalTree represents a Merkle tree structured by semantic boundaries: repo root -> package roots -> edge-type roots -> leaf edges.
This structure enables:
- O(1) "did package X change?" checks via PackageRoots comparison
- O(1) "did call edges change?" checks via EdgeTypeRoots comparison
- Subgraph caching: cache results against package or edge-type roots
- Incremental recompute: only rebuild derived data for changed subtrees
- Lazy materialization: load only the subtrees a retrieval walk visits
The Root field is backward-compatible with the flat MerkleTree: it's computed from sorted package roots, which are computed from sorted edge-type roots per package, which are computed from sorted edge hashes. The root is deterministic and will match the flat tree when the same edges are present.
func BuildHierarchicalTree ¶ added in v0.3.0
func BuildHierarchicalTree(edges []EdgeInput) *HierarchicalTree
BuildHierarchicalTree constructs a hierarchical Merkle tree from edge inputs.
Structure:
repo root = merkle(sorted package roots)
package root = merkle(sorted edge-type roots for this package)
edge-type root = merkle(sorted edge hashes of this type in this package)
leaf = edge hash
func (*HierarchicalTree) EdgeTypeRoot ¶ added in v0.3.0
func (ht *HierarchicalTree) EdgeTypeRoot(edgeType string) types.Hash
EdgeTypeRoot returns the Merkle root for a specific edge type across all packages. Useful for checking "did any call edges change?" without scanning the full tree.
func (*HierarchicalTree) SubgraphRoot ¶ added in v0.3.0
func (ht *HierarchicalTree) SubgraphRoot(packages []string) types.Hash
SubgraphRoot computes a cache key for a subgraph defined by a set of packages. This is useful for caching results of operations that only depend on certain packages (e.g., blast_radius for a symbol in package X).
type MerkleProof ¶ added in v0.4.0
type MerkleProof struct {
// The edge being proved.
EdgeHash types.Hash `json:"edge_hash"`
// Hierarchical context.
PackagePath string `json:"package_path"`
EdgeType string `json:"edge_type"`
// Level 1: edge hash -> edge-type root.
// Binary proof within the sorted edge hashes of this (package, edgeType).
EdgeToEdgeTypeRoot []ProofStep `json:"edge_to_edge_type_root"`
EdgeTypeRoot types.Hash `json:"edge_type_root"`
// Level 2: edge-type root -> package root.
// Binary proof within the sorted edge-type roots of this package.
EdgeTypeToPackageRoot []ProofStep `json:"edge_type_to_package_root"`
PackageRoot types.Hash `json:"package_root"`
// Level 3: package root -> repo root.
// Binary proof within the sorted package roots.
PackageToRepoRoot []ProofStep `json:"package_to_repo_root"`
RepoRoot types.Hash `json:"repo_root"`
}
MerkleProof is a proof that a specific edge exists in a hierarchical snapshot. The proof path goes: edge -> edge-type root -> package root -> repo root. Each level includes the binary Merkle tree proof within that level, plus the hierarchical context (package name, edge type).
func GenerateProof ¶ added in v0.4.0
func GenerateProof(tree *HierarchicalTree, edgeHash types.Hash, packagePath, edgeType string, edges []EdgeInput) (*MerkleProof, error)
GenerateProof creates a Merkle proof that edgeHash exists in the tree under the given package and edge type. Returns an error if the edge is not found.
type MerkleTree ¶
type MerkleTree struct {
Root types.Hash // the Merkle root (top of the tree)
Leaves []types.Hash // sorted leaf hashes (the input edge hashes)
}
MerkleTree represents a binary Merkle tree built from sorted hashes. The tree is constructed bottom-up: leaves are sorted edge hashes, and each internal node is SHA-256(left_child || right_child). The root hash uniquely identifies the set of leaves. Comparing two roots in O(1) determines whether the edge sets are identical.
func BuildMerkleTree ¶
func BuildMerkleTree(hashes []types.Hash) *MerkleTree
BuildMerkleTree constructs a binary Merkle tree from a slice of edge hashes. Hashes are sorted lexicographically using bytes.Compare before tree construction. Returns a MerkleTree with the root hash and sorted leaves.
type ProofStep ¶ added in v0.4.0
type ProofStep struct {
Sibling types.Hash `json:"sibling"`
IsLeft bool `json:"is_left"` // true if sibling is on the left (target is right)
}
ProofStep represents one step in a Merkle proof path. At each level of the binary tree, the verifier needs the sibling hash and whether it's on the left or right.
type SnapshotManager ¶
type SnapshotManager struct {
// contains filtered or unexported fields
}
SnapshotManager manages Merkle root computation, snapshot chain maintenance, diff operations, and garbage collection of old snapshots.
func NewSnapshotManager ¶
func NewSnapshotManager(store types.GraphStore) *SnapshotManager
NewSnapshotManager creates a new SnapshotManager backed by the given GraphStore.
func (*SnapshotManager) CollectEdgeInputs ¶ added in v0.4.0
func (sm *SnapshotManager) CollectEdgeInputs(ctx context.Context, repoHash types.Hash) ([]EdgeInput, int, error)
CollectEdgeInputs gathers all edges with package and type metadata for a repo. This is the canonical source of EdgeInput data: both hierarchical tree construction and Merkle proof generation must use this method to ensure consistent package paths and edge hashes.
func (*SnapshotManager) ComputeSnapshot ¶
func (sm *SnapshotManager) ComputeSnapshot(ctx context.Context, repoHash types.Hash, commitHash string) (*types.Snapshot, error)
ComputeSnapshot computes a new snapshot for a repository by collecting all edge hashes, building a Merkle tree, and storing the resulting snapshot. The snapshot is chained to the latest existing snapshot for the repo. Also builds a HierarchicalTree for efficient per-package diff and caching.
func (*SnapshotManager) Diff ¶
func (sm *SnapshotManager) Diff(ctx context.Context, oldHash, newHash types.Hash) (*types.DiffResult, error)
Diff returns the structural difference between two snapshots. It delegates to the GraphStore's SnapshotDiff implementation.
func (*SnapshotManager) GarbageCollect ¶
func (sm *SnapshotManager) GarbageCollect(ctx context.Context, repoHash types.Hash, keepCount int) (removed int, err error)
GarbageCollect removes old snapshots for a repo, keeping the most recent keepCount snapshots. It preserves chain integrity by walking the chain from the latest snapshot backwards. Returns the number of removed snapshots.
func (*SnapshotManager) GarbageCollectFull ¶ added in v0.3.0
func (sm *SnapshotManager) GarbageCollectFull(ctx context.Context, repoHash types.Hash, keepCount int) (GCStats, error)
GarbageCollectFull extends the basic snapshot chain GC with a reachability sweep that prunes orphaned nodes and edges not referenced by any surviving snapshot. It returns a GCStats struct describing what was removed.
func (*SnapshotManager) LastHierarchicalTree ¶ added in v0.3.0
func (sm *SnapshotManager) LastHierarchicalTree() *HierarchicalTree
LastHierarchicalTree returns the most recently computed hierarchical tree, or nil if no snapshot has been computed in this session.
func (*SnapshotManager) Verify ¶ added in v0.3.0
func (sm *SnapshotManager) Verify(ctx context.Context, repoHash types.Hash, rosterDBPaths []string, localDBPath string) ([]VerifyError, error)
Verify performs integrity verification on a repo's graph. Checks: edge referential integrity, hash recomputation, snapshot chain continuity.
When rosterDBPaths is non-empty, dangling edges are classified as cross_repo, stdlib, or truly_dangling. When empty, all dangling edges are truly_dangling. localDBPath identifies this store's database so it is skipped in roster lookups.
type VerifyError ¶ added in v0.3.0
type VerifyError struct {
Level string // "ERROR" or "WARN"
Kind string // "dangling_edge", "hash_mismatch", "broken_chain", "missing_file"
Hash types.Hash // hash of the entity with the issue
Message string // human-readable description
Classification string // for dangling_edge: "cross_repo", "stdlib", or "truly_dangling"
}
VerifyError represents a single integrity check finding.