snapshot

package
v0.4.2 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: May 20, 2026 License: MIT Imports: 10 Imported by: 0

Documentation

Overview

Package snapshot manages Merkle-based graph snapshots for the knowing knowledge graph.

Each snapshot represents a point-in-time fingerprint of all edges in a repository's graph. The fingerprint is a Merkle root computed by sorting all edge hashes lexicographically and building a binary hash tree. Two snapshots with different roots are guaranteed to have different edge sets, enabling efficient change detection.

Snapshots form a singly-linked chain (each snapshot points to its parent), supporting garbage collection of old snapshots while preserving chain integrity for the most recent N snapshots.

Index

Constants

View Source
const (
	ClassCrossRepo     = "cross_repo"
	ClassStdlib        = "stdlib"
	ClassTrulyDangling = "truly_dangling"
)

DanglingClassification describes why an edge target is missing from the local database. Only "truly_dangling" indicates potential corruption.

Variables

This section is empty.

Functions

func ContextPackRoot added in v0.3.0

func ContextPackRoot(taskNormalized string, snapshotRoot types.Hash, selectedNodes []types.Hash) types.Hash

ContextPackRoot computes a content-addressed key for a context pack: the combination of a task identifier, the snapshot state, and the selected symbols. Two identical queries against the same graph state produce the same root, enabling deduplication and caching.

func DiffMerkle

func DiffMerkle(oldTree, newTree *MerkleTree) (added, removed []types.Hash)

DiffMerkle returns the leaf hashes that differ between two Merkle trees. Added are hashes present in newTree but not oldTree. Removed are hashes present in oldTree but not newTree.

func ExtractPackagePath added in v0.4.0

func ExtractPackagePath(qualifiedName string) (string, error)

ExtractPackagePath extracts the package path from a qualified name. Format: "repoURL://pkgPath.SymbolName" or "repoURL://pkgPath.Type.Method" The separator "://" marks the boundary between the repo URL and the package path.

For Go packages, the package path ends at the first dot-separated component that starts with an uppercase letter (indicating a type or symbol name, not a package path segment). If no uppercase component is found, falls back to splitting at the last dot.

Examples:

"repo://pkg/sub.FuncName"          -> "pkg/sub"
"repo://pkg/sub.TypeName.Method"   -> "pkg/sub"
"repo://pkg/sub.lowercase"         -> "pkg/sub"

This is the canonical package path extractor. All code that needs to derive a package path from a qualified name should use this function to avoid divergent implementations.

func SortHashes

func SortHashes(hashes []types.Hash)

SortHashes sorts a slice of hashes lexicographically using bytes.Compare.

func VerifyAbsenceProof added in v0.4.1

func VerifyAbsenceProof(proof *AbsenceProof) bool

VerifyAbsenceProof checks that an absence proof is valid: 1. Both neighbor proofs verify against the same root. 2. left < missing < right (sorted order). 3. If both neighbors exist, they prove the gap contains no room for the missing hash.

func VerifyProof added in v0.4.0

func VerifyProof(proof *MerkleProof) bool

VerifyProof checks that a Merkle proof is valid: recomputing the root from the edge hash and proof steps produces the claimed repo root.

Types

type AbsenceProof added in v0.4.1

type AbsenceProof struct {
	// MissingHash is the edge hash being proved absent.
	MissingHash types.Hash `json:"missing_hash"`

	// LeftNeighbor is the largest leaf smaller than MissingHash (nil if MissingHash
	// would be the first leaf).
	LeftNeighbor *types.Hash `json:"left_neighbor,omitempty"`
	// RightNeighbor is the smallest leaf larger than MissingHash (nil if MissingHash
	// would be the last leaf).
	RightNeighbor *types.Hash `json:"right_neighbor,omitempty"`

	// LeftProof proves LeftNeighbor is in the tree (nil if no left neighbor).
	LeftProof *MerkleProof `json:"left_proof,omitempty"`
	// RightProof proves RightNeighbor is in the tree (nil if no right neighbor).
	RightProof *MerkleProof `json:"right_proof,omitempty"`

	// RepoRoot is the root the absence is proved against.
	RepoRoot types.Hash `json:"repo_root"`
}

AbsenceProof proves that a specific edge hash does NOT exist in a tree. It works by proving the two adjacent leaves that bracket the missing hash: left < missing < right. Since leaves are sorted by bytes.Compare, adjacency proves there is no room for the missing hash.

func GenerateAbsenceProof added in v0.4.1

func GenerateAbsenceProof(tree *HierarchicalTree, edgeHash types.Hash, packagePath, edgeType string, edges []EdgeInput) (*AbsenceProof, error)

GenerateAbsenceProof creates a proof that edgeHash does NOT exist in the tree under the given package and edge type. Returns an error if the edge IS found (you can't prove absence of something that exists).

type ChangeClass added in v0.4.0

type ChangeClass string

ChangeClass categorizes what kind of change occurred between two snapshots.

const (
	// Behavioral means call edges changed: functions call different things.
	// Agents should re-query for affected packages.
	Behavioral ChangeClass = "behavioral"

	// Structural means import/implements/extends edges changed: dependency
	// structure shifted but call behavior may be the same.
	Structural ChangeClass = "structural"

	// RuntimeDrift means runtime-observed edges changed: production traffic
	// patterns diverged from static analysis.
	RuntimeDrift ChangeClass = "runtime_drift"

	// MetadataOnly means only non-behavioral edges changed (references,
	// handles_route, configures, etc.). Safe to skip re-query in most cases.
	MetadataOnly ChangeClass = "metadata_only"

	// NoChange means the snapshots are identical.
	NoChange ChangeClass = "no_change"
)

type ChangeClassification added in v0.4.0

type ChangeClassification struct {
	// Class is the highest-severity change class detected.
	Class ChangeClass

	// BehavioralCount is the number of changed edge types classified as behavioral.
	BehavioralCount int
	// StructuralCount is the number of changed edge types classified as structural.
	StructuralCount int
	// RuntimeDriftCount is the number of changed edge types classified as runtime drift.
	RuntimeDriftCount int
	// MetadataCount is the number of changed edge types not in any other category.
	MetadataCount int

	// ChangedPackages from the underlying diff.
	ChangedPackages []string
	// ChangedEdgeTypes from the underlying diff (raw "package:edgeType" keys).
	ChangedEdgeTypes []string
}

ChangeClassification is the result of ClassifyChanges.

func ClassifyChanges added in v0.4.0

func ClassifyChanges(diff *HierarchicalDiff) ChangeClassification

ClassifyChanges categorizes the changes between two hierarchical trees based on which edge-type roots changed. Returns the highest-severity classification: Behavioral > RuntimeDrift > Structural > MetadataOnly > NoChange.

type DiffOptions added in v0.3.0

type DiffOptions struct {
	// PackageFilter restricts the diff to the listed package paths. When
	// empty, all packages are compared (default behaviour).
	PackageFilter []string

	// MaxChanges caps the total number of changed/added/removed packages
	// reported. Once the cap is reached the diff is marked Truncated and no
	// further packages are added. 0 means no cap.
	MaxChanges int
}

DiffOptions controls the behaviour of DiffHierarchicalTreesWithOptions.

type EdgeInput added in v0.3.0

type EdgeInput struct {
	EdgeHash    types.Hash
	PackagePath string // extracted from source node's qualified name
	EdgeType    string // calls, imports, implements, references, etc.
}

EdgeInput is the input for building a hierarchical tree: an edge hash with its source package and edge type metadata.

type GCStats added in v0.3.0

type GCStats struct {
	// SnapshotsRemoved is the number of old snapshots pruned from the chain.
	SnapshotsRemoved int

	// NodesRemoved is the number of orphaned nodes deleted from the graph.
	NodesRemoved int64

	// EdgesRemoved is the number of orphaned edges deleted from the graph.
	EdgesRemoved int64

	// Duration is the wall-clock time taken by the full GC run.
	Duration time.Duration
}

GCStats records the outcome of a full garbage collection run.

type HierarchicalDiff added in v0.3.0

type HierarchicalDiff struct {
	// ChangedPackages lists packages whose root changed (content differs).
	ChangedPackages []string

	// AddedPackages lists packages present in new but not old.
	AddedPackages []string

	// RemovedPackages lists packages present in old but not new.
	RemovedPackages []string

	// ChangedEdgeTypes lists "package:edgeType" keys whose root changed.
	ChangedEdgeTypes []string

	// RootChanged is true if the overall repo root differs.
	RootChanged bool

	// Truncated is true when the diff was cut short by a MaxChanges cap.
	Truncated bool
}

DiffHierarchical compares two hierarchical trees and returns which packages and edge types changed. This is O(packages) instead of O(edges).

func DiffHierarchicalTrees added in v0.3.0

func DiffHierarchicalTrees(oldTree, newTree *HierarchicalTree) *HierarchicalDiff

DiffHierarchicalTrees compares two hierarchical trees at each level. It is a convenience wrapper around DiffHierarchicalTreesWithOptions with nil options (no filter, no cap).

func DiffHierarchicalTreesWithOptions added in v0.3.0

func DiffHierarchicalTreesWithOptions(oldTree, newTree *HierarchicalTree, opts *DiffOptions) *HierarchicalDiff

DiffHierarchicalTreesWithOptions compares two hierarchical trees with optional package filtering and a maximum-changes cap.

When opts.PackageFilter is non-empty, only the listed packages are examined; the resulting diff reflects only those packages. When opts.MaxChanges is positive, the diff stops accumulating entries once that many total changed/added/removed packages have been recorded and sets HierarchicalDiff.Truncated = true.

type HierarchicalTree added in v0.3.0

type HierarchicalTree struct {
	Root types.Hash

	// PackageRoots maps package path to its Merkle root.
	// A package root is hash(sorted edge-type roots for that package).
	PackageRoots map[string]types.Hash

	// EdgeTypeRoots maps "package:edgeType" to its Merkle root.
	// An edge-type root is hash(sorted edge hashes of that type in that package).
	EdgeTypeRoots map[string]types.Hash

	// PackageEdgeCounts tracks edge count per package for quick stats.
	PackageEdgeCounts map[string]int

	// TotalEdges is the total number of edges across all packages.
	TotalEdges int
	// contains filtered or unexported fields
}

HierarchicalTree represents a Merkle tree structured by semantic boundaries: repo root -> package roots -> edge-type roots -> leaf edges.

This structure enables:

  • O(1) "did package X change?" checks via PackageRoots comparison
  • O(1) "did call edges change?" checks via EdgeTypeRoots comparison
  • Subgraph caching: cache results against package or edge-type roots
  • Incremental recompute: only rebuild derived data for changed subtrees
  • Lazy materialization: load only the subtrees a retrieval walk visits

The Root field is backward-compatible with the flat MerkleTree: it's computed from sorted package roots, which are computed from sorted edge-type roots per package, which are computed from sorted edge hashes. The root is deterministic and will match the flat tree when the same edges are present.

func BuildHierarchicalTree added in v0.3.0

func BuildHierarchicalTree(edges []EdgeInput) *HierarchicalTree

BuildHierarchicalTree constructs a hierarchical Merkle tree from edge inputs.

Structure:

repo root = merkle(sorted package roots)
  package root = merkle(sorted edge-type roots for this package)
    edge-type root = merkle(sorted edge hashes of this type in this package)
      leaf = edge hash

func (*HierarchicalTree) EdgeTypeRoot added in v0.3.0

func (ht *HierarchicalTree) EdgeTypeRoot(edgeType string) types.Hash

EdgeTypeRoot returns the Merkle root for a specific edge type across all packages. Useful for checking "did any call edges change?" without scanning the full tree.

func (*HierarchicalTree) SubgraphRoot added in v0.3.0

func (ht *HierarchicalTree) SubgraphRoot(packages []string) types.Hash

SubgraphRoot computes a cache key for a subgraph defined by a set of packages. This is useful for caching results of operations that only depend on certain packages (e.g., blast_radius for a symbol in package X).

type MerkleProof added in v0.4.0

type MerkleProof struct {
	// The edge being proved.
	EdgeHash types.Hash `json:"edge_hash"`

	// Hierarchical context.
	PackagePath string `json:"package_path"`
	EdgeType    string `json:"edge_type"`

	// Level 1: edge hash -> edge-type root.
	// Binary proof within the sorted edge hashes of this (package, edgeType).
	EdgeToEdgeTypeRoot []ProofStep `json:"edge_to_edge_type_root"`
	EdgeTypeRoot       types.Hash  `json:"edge_type_root"`

	// Level 2: edge-type root -> package root.
	// Binary proof within the sorted edge-type roots of this package.
	EdgeTypeToPackageRoot []ProofStep `json:"edge_type_to_package_root"`
	PackageRoot           types.Hash  `json:"package_root"`

	// Level 3: package root -> repo root.
	// Binary proof within the sorted package roots.
	PackageToRepoRoot []ProofStep `json:"package_to_repo_root"`
	RepoRoot          types.Hash  `json:"repo_root"`
}

MerkleProof is a proof that a specific edge exists in a hierarchical snapshot. The proof path goes: edge -> edge-type root -> package root -> repo root. Each level includes the binary Merkle tree proof within that level, plus the hierarchical context (package name, edge type).

func GenerateProof added in v0.4.0

func GenerateProof(tree *HierarchicalTree, edgeHash types.Hash, packagePath, edgeType string, edges []EdgeInput) (*MerkleProof, error)

GenerateProof creates a Merkle proof that edgeHash exists in the tree under the given package and edge type. Returns an error if the edge is not found.

type MerkleTree

type MerkleTree struct {
	Root   types.Hash   // the Merkle root (top of the tree)
	Leaves []types.Hash // sorted leaf hashes (the input edge hashes)
}

MerkleTree represents a binary Merkle tree built from sorted hashes. The tree is constructed bottom-up: leaves are sorted edge hashes, and each internal node is SHA-256(left_child || right_child). The root hash uniquely identifies the set of leaves. Comparing two roots in O(1) determines whether the edge sets are identical.

func BuildMerkleTree

func BuildMerkleTree(hashes []types.Hash) *MerkleTree

BuildMerkleTree constructs a binary Merkle tree from a slice of edge hashes. Hashes are sorted lexicographically using bytes.Compare before tree construction. Returns a MerkleTree with the root hash and sorted leaves.

type ProofStep added in v0.4.0

type ProofStep struct {
	Sibling types.Hash `json:"sibling"`
	IsLeft  bool       `json:"is_left"` // true if sibling is on the left (target is right)
}

ProofStep represents one step in a Merkle proof path. At each level of the binary tree, the verifier needs the sibling hash and whether it's on the left or right.

type SnapshotManager

type SnapshotManager struct {
	// contains filtered or unexported fields
}

SnapshotManager manages Merkle root computation, snapshot chain maintenance, diff operations, and garbage collection of old snapshots.

func NewSnapshotManager

func NewSnapshotManager(store types.GraphStore) *SnapshotManager

NewSnapshotManager creates a new SnapshotManager backed by the given GraphStore.

func (*SnapshotManager) CollectEdgeInputs added in v0.4.0

func (sm *SnapshotManager) CollectEdgeInputs(ctx context.Context, repoHash types.Hash) ([]EdgeInput, int, error)

CollectEdgeInputs gathers all edges with package and type metadata for a repo. This is the canonical source of EdgeInput data: both hierarchical tree construction and Merkle proof generation must use this method to ensure consistent package paths and edge hashes.

func (*SnapshotManager) ComputeSnapshot

func (sm *SnapshotManager) ComputeSnapshot(ctx context.Context, repoHash types.Hash, commitHash string) (*types.Snapshot, error)

ComputeSnapshot computes a new snapshot for a repository by collecting all edge hashes, building a Merkle tree, and storing the resulting snapshot. The snapshot is chained to the latest existing snapshot for the repo. Also builds a HierarchicalTree for efficient per-package diff and caching.

func (*SnapshotManager) Diff

func (sm *SnapshotManager) Diff(ctx context.Context, oldHash, newHash types.Hash) (*types.DiffResult, error)

Diff returns the structural difference between two snapshots. It delegates to the GraphStore's SnapshotDiff implementation.

func (*SnapshotManager) GarbageCollect

func (sm *SnapshotManager) GarbageCollect(ctx context.Context, repoHash types.Hash, keepCount int) (removed int, err error)

GarbageCollect removes old snapshots for a repo, keeping the most recent keepCount snapshots. It preserves chain integrity by walking the chain from the latest snapshot backwards. Returns the number of removed snapshots.

func (*SnapshotManager) GarbageCollectFull added in v0.3.0

func (sm *SnapshotManager) GarbageCollectFull(ctx context.Context, repoHash types.Hash, keepCount int) (GCStats, error)

GarbageCollectFull extends the basic snapshot chain GC with a reachability sweep that prunes orphaned nodes and edges not referenced by any surviving snapshot. It returns a GCStats struct describing what was removed.

func (*SnapshotManager) LastHierarchicalTree added in v0.3.0

func (sm *SnapshotManager) LastHierarchicalTree() *HierarchicalTree

LastHierarchicalTree returns the most recently computed hierarchical tree, or nil if no snapshot has been computed in this session.

func (*SnapshotManager) Verify added in v0.3.0

func (sm *SnapshotManager) Verify(ctx context.Context, repoHash types.Hash, rosterDBPaths []string, localDBPath string) ([]VerifyError, error)

Verify performs integrity verification on a repo's graph. Checks: edge referential integrity, hash recomputation, snapshot chain continuity.

When rosterDBPaths is non-empty, dangling edges are classified as cross_repo, stdlib, or truly_dangling. When empty, all dangling edges are truly_dangling. localDBPath identifies this store's database so it is skipped in roster lookups.

type VerifyError added in v0.3.0

type VerifyError struct {
	Level          string     // "ERROR" or "WARN"
	Kind           string     // "dangling_edge", "hash_mismatch", "broken_chain", "missing_file"
	Hash           types.Hash // hash of the entity with the issue
	Message        string     // human-readable description
	Classification string     // for dangling_edge: "cross_repo", "stdlib", or "truly_dangling"
}

VerifyError represents a single integrity check finding.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL