Documentation ¶
Overview ¶
Package buildpipe — cache.go implements the A3 file-level incremental cache (spec v0.2 § 4 Phase 1). Cache key composition, manifest diffing (hit/miss/removed classification) and parser-version derivation live here.
Phase 1 scope: per-file SHA256 + cache key, skip parse on hit, full Pass 2 re-run, full PageRank/Leiden recompute when ANY file is dirty. Phase 2 (reverse-reference invalidation, partial Pass 2) is C1's job.
Package buildpipe — incremental.go drives the A3 file-level cache build path (spec v0.2 § 4 Phase 1). Two entry points:
- runCold: full rebuild (legacy V0 path, used on --no-cache or unusable cache).
- runIncremental: parse only dirty files, reload cached node sets from DB, then rerun Pass 2 / cluster / score across the merged graph.
Phase 1 simplifications (per spec, deferred to C1+):
- PageRank/Leiden recompute whenever ANY file is dirty (no <1% change-ratio shortcut).
- Cross-language Sol↔TS link rebuilt whenever any TS or Sol file is dirty.
- Reverse-reference index for partial Pass 2 invalidation: NOT implemented (Phase 2, C1's job). Pass 2 Resolve always sees the full per-language node set.
Package buildpipe — language_runners.go contains the per-language Pass 1 + Pass 2 driver functions (runGoPipeline, runTSPipeline, runSolPipeline) and their immediate helpers (stampFilePath, convertABI). Extracted from pipeline.go in G4 to keep the orchestrator file under the soft 400-line cap. Pure file move — no behavior change.
Package buildpipe — lock_propagation.go implements D1 Stage B (W-A, Within-language semantics Phase 5): cross-function lock propagation for the Go concurrency pass. Extends the existing intra-function accessed_under_lock detector (internal/parse/golang/concurrency_underlock.go) by walking the call graph from lock-holding functions into their callees and emitting accessed_under_lock(field, mutex) edges for fields touched inside reachable callee bodies.
Spec reference: docs/design/go-cross-function-lock-propagation.md (decisions resolved 2026-05-11, §5.0 — Stage B DFS depth=5, INFERRED confidence for all cross-function emits, calls+invokes traversal, goroutine bodies forced to INFERRED, opt-in flag, dedup with confidence priority).
Opt-in only: gated by Options.LockPropagation (CLI: --lock-propagation, default false). When the flag is off, the pass is a structural no-op — the existing intra-function B1 Phase 4 emit is unchanged.
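The depth-bounded walk described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the callGraph type, the function names, and the string-keyed graph are all hypothetical, and only the depth limit of 5 comes from the spec reference. The sketch tracks the shallowest depth at which each callee was reached, so a function first found at the depth limit is still expanded if a shorter path to it turns up later.

```go
package main

import (
	"fmt"
	"sort"
)

// lockPropagationMaxDepth mirrors the documented Stage B DFS depth.
const lockPropagationMaxDepth = 5

// callGraph maps a caller to its callees. Hypothetical stand-in for the
// real calls+invokes edge set.
type callGraph map[string][]string

// reachableCallees returns every function reachable from root in at most
// maxDepth hops, sorted by name. Cycles are safe because a callee is only
// revisited when it is reached at a strictly shallower depth.
func reachableCallees(g callGraph, root string, maxDepth int) []string {
	best := map[string]int{root: 0}
	var walk func(fn string, depth int)
	walk = func(fn string, depth int) {
		if depth >= maxDepth {
			return
		}
		for _, callee := range g[fn] {
			if d, ok := best[callee]; ok && d <= depth+1 {
				continue
			}
			best[callee] = depth + 1
			walk(callee, depth+1)
		}
	}
	walk(root, 0)

	out := make([]string, 0, len(best))
	for fn := range best {
		if fn != root {
			out = append(out, fn)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// A straight chain f0 -> f1 -> ... -> f7.
	g := callGraph{}
	for i := 0; i < 7; i++ {
		g[fmt.Sprintf("f%d", i)] = []string{fmt.Sprintf("f%d", i+1)}
	}
	// Only the first five hops are followed.
	fmt.Println(reachableCallees(g, "f0", lockPropagationMaxDepth)) // prints [f1 f2 f3 f4 f5]
}
```

In the real pass, each reachable callee body would then be scanned for field accesses and the resulting edges emitted at INFERRED confidence; that part is omitted here.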
Package buildpipe orchestrates the full Pass 1..4 build (spec §4.7): detect → parse → resolve → graph build/validate → cluster → score → persist. Three routing paths: cold rebuild, short-circuit (all-cached), and incremental (partial-hit — reuse cached files, re-parse dirty). See Run for routing logic.
Package buildpipe — staleness.go computes the manifest's staleness fingerprint. Prefers a path-aware git lookup (so unrelated commits don't flip the banner — see internal/server/staleness.go for the symmetrical serve-side comparison); falls back to mtime sum of up to 5 detected files when the source root isn't a git checkout. Extracted from pipeline.go in G4. Pure file move — no behavior change.
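As a rough illustration of the non-git fallback, the sketch below sums the modification times of at most five files. The helper names, the use of Unix seconds, and the error handling are assumptions; the source only states that the fallback is an mtime sum over up to 5 detected files.

```go
package main

import (
	"fmt"
	"os"
)

// maxFingerprintFiles mirrors the documented "up to 5 detected files".
const maxFingerprintFiles = 5

// sumMTimes adds up at most limit modification times.
func sumMTimes(mtimes []int64, limit int) int64 {
	var sum int64
	for i, m := range mtimes {
		if i >= limit {
			break
		}
		sum += m
	}
	return sum
}

// mtimeFingerprint stats the first few paths and sums their mtimes.
// Unix seconds are an assumption; the real unit is not documented.
func mtimeFingerprint(paths []string) (int64, error) {
	mtimes := make([]int64, 0, maxFingerprintFiles)
	for _, p := range paths {
		if len(mtimes) == maxFingerprintFiles {
			break
		}
		info, err := os.Stat(p)
		if err != nil {
			return 0, err
		}
		mtimes = append(mtimes, info.ModTime().Unix())
	}
	return sumMTimes(mtimes, maxFingerprintFiles), nil
}

func main() {
	fp, err := mtimeFingerprint(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(fp)
}
```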
Package buildpipe — temporal.go wires CKS G6 Temporal edges (E4) into the cold-rebuild path. Conceptually:
- Run a single `git log --raw` over the repo root containing srcRoot.
- Translate repo-rooted paths into srcRoot-relative slash paths so they align with the rel paths the parsers stamped on Node.FilePath.
- For every distinct commit, append a NodeCommit (one per SHA).
- For every (file, commit) the log surfaces, emit `changed_in` from EVERY symbol in that file → that commit (file-level heuristic; line-level blame is deferred, see the EdgeChangedIn doc comment).
- For every file, emit ONE `blame` edge from its File node → its most recent commit (V0 simplification of `file:line → commit`).
Skips silently (no error) when srcRoot isn't inside a git checkout, so non-git source trees still build cleanly without temporal edges.
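The path translation step in the list above might look like the following sketch. The function name and the idea of passing srcRoot as a repo-relative slash prefix are assumptions made for illustration; files outside srcRoot are reported as not translatable so the caller can drop them.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// toSrcRel converts a repo-rooted slash path (as printed by git log) into
// a srcRoot-relative slash path. srcPrefix is srcRoot expressed relative
// to the repository root; "." means srcRoot is the repository root.
func toSrcRel(repoPath, srcPrefix string) (string, bool) {
	repoPath = path.Clean(repoPath)
	srcPrefix = path.Clean(srcPrefix)
	if srcPrefix == "." {
		return repoPath, true
	}
	if !strings.HasPrefix(repoPath, srcPrefix+"/") {
		return "", false // file lives outside srcRoot
	}
	return strings.TrimPrefix(repoPath, srcPrefix+"/"), true
}

func main() {
	rel, ok := toSrcRel("services/api/handler/user.go", "services/api")
	fmt.Println(rel, ok) // prints handler/user.go true

	_, ok = toSrcRel("docs/readme.md", "services/api")
	fmt.Println(ok) // prints false
}
```

Matching on the prefix plus a trailing slash avoids treating a sibling directory such as services/api-v2 as part of services/api.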
Package buildpipe — temporal_hunks.go wires the CKS G6 Hunk-graph H1 stage (schema 1.8) on top of emitTemporalEdges. Per design docs/design/hunk-graph.md (decisions finalised 2026-05-09):
- Encoding (§11.1): gzip stdlib, ~70% size reduction on diff text.
- Dedup (§11.2): none in H1; keep chronology of rebased hunks.
- Reach (§11.3): H1 only collects HEAD-reachable hunks (Confidence='EXTRACTED'). A future PR adds unreachable collection (Confidence='AMBIGUOUS') via reflog/fsck — H3's EvidencePack assembler MUST filter to EXTRACTED so the LLM never sees force-pushed-away code paths.
- Lang (§11.4): hunk inherits its target file's extension when in {go, ts, sol}; everything else becomes 'git'.
- Cap (§11.6): 64KB patch cap. Larger patches are stored as first 32KB + truncation marker + last 32KB. Compression is applied AFTER truncation.
- Manifest (§11.8): Hunk node IDs are NOT recorded in the per-file manifest entries (they live outside file-level cache invalidation; emitTemporalEdges runs them wholesale on every build). isMetaNodeType is the single source of truth that buildFileEntries + computeColdFileEntries + extractBlobs share.
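The Cap and Encoding rules above can be sketched together: truncate first, then gzip. The marker text, the function names, and the choice to leave a patch of exactly 64KB untouched are assumptions; the 64KB cap, the 32KB head and tail, and the truncate-then-compress order come from the list.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
)

const (
	patchCap  = 64 * 1024 // documented cap
	keepBytes = 32 * 1024 // head and tail kept when over the cap
)

// truncationMarker is a hypothetical marker string.
const truncationMarker = "\n[patch truncated]\n"

// truncatePatch keeps the first and last 32KB of an oversized patch.
func truncatePatch(p []byte) []byte {
	if len(p) <= patchCap {
		return p
	}
	out := make([]byte, 0, 2*keepBytes+len(truncationMarker))
	out = append(out, p[:keepBytes]...)
	out = append(out, truncationMarker...)
	out = append(out, p[len(p)-keepBytes:]...)
	return out
}

// compressPatch applies the cap and then gzips the result.
func compressPatch(p []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(truncatePatch(p)); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	big := bytes.Repeat([]byte("+ added line\n"), 10000) // 130000 bytes
	blob, err := compressPatch(big)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(truncatePatch(big)), len(blob) < len(big))
}
```

Truncating before compressing keeps the stored size bounded by the uncompressed cap, regardless of how well a given diff compresses.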
Index ¶
- Constants
- func ComputeCacheKey(content []byte, ckgVersion, parserVersion string) string
- func ManifestUsable(old *persist.Manifest, ckgVersion string) bool
- func Run(opt Options) (persist.Manifest, error)
- func SHA256Hex(content []byte) string
- type CacheDecisions
- type DiscoveredFile
- type FileDecision
- type Options
- type TopicTreeForPersist
Constants ¶
const SchemaVersion = "1.10"
SchemaVersion is the cache-key contributor for the extraction schema. Bump history:
- "1.1" to "1.2", by A3: FK ON DELETE CASCADE on edges/blobs/pkg_tree/topic_tree.
- "1.2" to "1.3", by E3: new node kinds (Endpoint, MessageType) and edge kinds (listens_on, handles_message, rpc_calls) materially change the extraction surface. Pre-1.3 DBs are missing those rows, so incremental invalidation must force a cold rebuild on first run with this binary.
- "1.3" to "1.4", by E4 (CKS G6 Temporal): NodeCommit + changed_in/blame edges are emitted by the new post-Build temporal pass; pre-1.4 DBs are missing those rows.
- "1.4" to "1.5", by G6 v3 (pending_refs persistence): Pass 1's unresolved cross-file references are now persisted per-file so the partial-cache rebuild path can reconstruct Pass 2's input without re-parsing cached files. Pre-1.5 DBs are missing the table, so the first 1.5 build is forced cold by ManifestUsable's version check.
- "1.5" to "1.6", by P2 (CKS G3 control-flow context propagation): timeout_path / cancellation_path self-loop edges are emitted from Go context.With* call sites; pre-1.6 DBs are missing those rows, so the first 1.6 build must run cold.
- "1.6" to "1.7", by Track C (detector gap fill): the edges row gains an optional `dispatch_kind` TEXT column populated for `invokes` edges (P1b), plus three new emit sites: `uses_type` (P0), `instantiates` (P1c), and the lock-edge fix inside goroutine bodies (P1a). Pre-1.7 DBs are missing the column AND the new edges; opening such a DB triggers an idempotent ALTER ADD COLUMN via Migrate(), and ManifestUsable's version check forces a cold rebuild on first 1.7 run so the new edges land in their natural emission order.
- "1.7" to "1.8", by Hunk-graph H1 (CKS G6 Temporal extension): new node type NodeHunk + new edges has_hunk / adjacent + gzip-compressed unified-diff blobs persisted under the existing blobs.node_id PK. No schema DDL change (the new rows reuse existing tables); pre-1.8 DBs are missing the rows + the new node/edge type literals, so ManifestUsable's version check forces a cold rebuild on first 1.8 run.
- "1.8" to "1.9", by W1 of schema-1.9-spec (cross-language interop expansion): TypeScript HTTP server endpoint detection (Express / Fastify / Hono / Next.js App Router). Reuses the existing NodeEndpoint type + `listens_on` edge, with no new enum literals and no new columns. The bump is purely a cache-key contributor so pre-1.9 DBs don't carry forward a missing-Endpoint TS graph view on first 1.9 build. Per §6.1 of the design spec, future W2/W3/W4 stages (HTTP client matching, gRPC, message queue) will stay on 1.9 and be append-only.
- "1.9" to "1.10", by within-language semantics Phase 4 (2026-05-11): slot reservation for W-B (`NodeAwaitPoint` + `EdgeAwaits`, TS async/await suspension flow) and W-C (`EdgeOverrides`, Solidity virtual/override semantics). Detectors land in Phase 5; this commit is slot-only, so pre-1.10 DBs are byte-identical in their existing rows, but the cache key flip forces a cold rebuild on first 1.10 run for symmetry with prior bumps. No new DDL (the new enum literals fit existing nodes.type / edges.type TEXT columns); see docs/DISPATCH-WITHIN-LANG-SEMANTICS.md §2 Phase 4 and docs/design/{ts-async-await-and-interface,solidity-inheritance-and-interface-dispatch}.md.
Kept here (not in pkg/types) because only the cache key needs it; pkg/types schema version bumps already trigger rebuilds via this constant.
Functions ¶
func ComputeCacheKey ¶
func ComputeCacheKey(content []byte, ckgVersion, parserVersion string) string
ComputeCacheKey returns the SHA256 of:
file_content + "|ckg:" + ckgVersion + "|parser:" + parserVersion + "|schema:" + SchemaVersion
Any change in the four contributors invalidates the cache for that file and forces a reparse on next build (spec v0.2 § 4 design).
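A minimal sketch of the key composition described above, assuming the digest is returned as lowercase hex (the existence of SHA256Hex suggests this, but the encoding is not stated). The lowercase function name marks it as an illustration rather than the exported function.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// schemaVersion mirrors the documented SchemaVersion constant.
const schemaVersion = "1.10"

// computeCacheKey hashes the file content followed by the three tagged
// version strings. Hex encoding is an assumption.
func computeCacheKey(content []byte, ckgVersion, parserVersion string) string {
	h := sha256.New()
	h.Write(content)
	h.Write([]byte("|ckg:" + ckgVersion))
	h.Write([]byte("|parser:" + parserVersion))
	h.Write([]byte("|schema:" + schemaVersion))
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	src := []byte("package x\n")
	a := computeCacheKey(src, "0.2.0", "go-1")
	b := computeCacheKey(src, "0.2.0", "go-2") // parser version bumped
	fmt.Println(len(a), a != b)                // prints 64 true
}
```

Because the version strings are part of the hashed input, bumping any one of them changes every file's key at once, which is what forces the reparse.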
func ManifestUsable ¶
func ManifestUsable(old *persist.Manifest, ckgVersion string) bool
ManifestUsable reports whether old can be used as the cache base for a build under (ckgVersion, SchemaVersion). Returns false when the global invariants drifted — caller must discard the cache and full-rebuild (silent reuse with stale schema would corrupt the DB).
nil manifest → false. Empty schema/ckg version → false (defensive).
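The checks above could be expressed roughly as follows. The manifest struct and its field names are hypothetical stand-ins, since the fields of persist.Manifest are not shown here; the sketch also assumes the "empty version" rule applies to the values stored in the old manifest.

```go
package main

import "fmt"

// schemaVersion mirrors the documented SchemaVersion constant.
const schemaVersion = "1.10"

// manifest is a hypothetical stand-in for persist.Manifest.
type manifest struct {
	CKGVersion    string
	SchemaVersion string
}

// manifestUsable reports whether old may serve as the cache base.
func manifestUsable(old *manifest, ckgVersion string) bool {
	if old == nil {
		return false
	}
	if old.CKGVersion == "" || old.SchemaVersion == "" {
		return false // defensive: never trust an unversioned manifest
	}
	return old.CKGVersion == ckgVersion && old.SchemaVersion == schemaVersion
}

func main() {
	fmt.Println(manifestUsable(nil, "0.2.0"))                         // prints false
	fmt.Println(manifestUsable(&manifest{"0.2.0", "1.9"}, "0.2.0"))   // prints false
	fmt.Println(manifestUsable(&manifest{"0.2.0", "1.10"}, "0.2.0"))  // prints true
}
```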
func Run ¶
func Run(opt Options) (persist.Manifest, error)
Run executes the full pipeline. Side effects: writes OutDir/graph.db and OutDir/manifest.json. Returns the persisted Manifest summary so the caller can print stats without re-reading SQLite.
Cache routing (A3 Phase 1):
- --no-cache OR no prior manifest OR schema/version mismatch → cold rebuild
- all-cached AND no removals → short-circuit (timestamp refresh only)
- mixed dirty/cached → incremental (parse only dirty, reuse cached node sets)
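The three routing rules above reduce to a small decision function. The sketch below is illustrative only: the route type and parameter names are hypothetical, the RebuildMetrics option is ignored, and the handling of cases the list does not spell out (for example, every file dirty, or removals with no dirty files) is an assumption that sends them down the incremental path.

```go
package main

import "fmt"

// route names the three documented build paths.
type route string

const (
	routeCold         route = "cold"
	routeShortCircuit route = "short-circuit"
	routeIncremental  route = "incremental"
)

// chooseRoute applies the routing rules in order. usable is the result of
// the manifest usability check (false when there is no prior manifest).
func chooseRoute(noCache, usable bool, misses, removed int) route {
	if noCache || !usable {
		return routeCold
	}
	if misses == 0 && removed == 0 {
		return routeShortCircuit
	}
	return routeIncremental
}

func main() {
	fmt.Println(chooseRoute(true, true, 0, 0))  // prints cold
	fmt.Println(chooseRoute(false, true, 0, 0)) // prints short-circuit
	fmt.Println(chooseRoute(false, true, 3, 0)) // prints incremental
}
```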
Types ¶
type CacheDecisions ¶
type CacheDecisions struct {
Decisions []FileDecision
Hits int
Misses int
Removed int
}
CacheDecisions is the sorted, fully-classified result of one diff pass. Sorted by Path for deterministic logging.
func DiffManifest ¶
func DiffManifest(srcRoot string, discovered []DiscoveredFile, old *persist.Manifest, ckgVersion string) (CacheDecisions, error)
DiffManifest classifies every discovered file against the OLD manifest and emits a CacheDecisions in deterministic Path order. Files in old but not in the discovery are emitted as classRemoved.
Fast/slow path (spec § 4 build flow): mtime-equal entries skip the SHA256 recomputation and reuse the old hash; mtime-mismatched entries fall through to a full hash. Either way, the cache decision is byte-equal whether mtime changed or not — mtime is purely a perf hint.
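The fast/slow path can be illustrated with a small helper that decides whether a file needs rehashing. The oldEntry type and its field names are hypothetical; only the rule (equal mtime reuses the stored hash, anything else triggers a full SHA256) is taken from the description above.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// oldEntry is a hypothetical stand-in for the previous manifest entry.
type oldEntry struct {
	MTime  int64
	SHA256 string
}

// fileHash returns the content hash and whether it had to be recomputed.
// read is only invoked on the slow path, so an mtime match skips file I/O.
func fileHash(mtime int64, old *oldEntry, read func() []byte) (string, bool) {
	if old != nil && old.MTime == mtime {
		return old.SHA256, false // fast path: trust the stored hash
	}
	sum := sha256.Sum256(read())
	return hex.EncodeToString(sum[:]), true // slow path: full hash
}

func main() {
	content := []byte("package x\n")
	read := func() []byte { return content }

	h, rehashed := fileHash(100, nil, read)
	fmt.Println(rehashed) // prints true

	// Same mtime on the next build: the stored hash is reused.
	_, rehashed = fileHash(100, &oldEntry{MTime: 100, SHA256: h}, read)
	fmt.Println(rehashed) // prints false

	// A touched but unchanged file is rehashed and yields the same hash,
	// so the cache decision itself does not change.
	h2, rehashed := fileHash(200, &oldEntry{MTime: 100, SHA256: h}, read)
	fmt.Println(rehashed, h2 == h) // prints true true
}
```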
func (CacheDecisions) CachedPaths ¶
func (cd CacheDecisions) CachedPaths() []string
CachedPaths returns the srcRoot-relative paths of files that were cache hits, in sorted order. Caller uses these to load nodes/edges from the DB.
func (CacheDecisions) DirtyPaths ¶
func (cd CacheDecisions) DirtyPaths() []string
DirtyPaths returns the srcRoot-relative paths of files needing reparse, in the discovery order they were emitted (deterministic).
func (CacheDecisions) FormatLogLine ¶
func (cd CacheDecisions) FormatLogLine() string
FormatLogLine returns a single human-readable summary line. Stable phrasing so operator runbooks can grep for it.
func (CacheDecisions) IsAllCached ¶
func (cd CacheDecisions) IsAllCached() bool
IsAllCached returns true when every decision is classCached and there are no removals. Used by the build pipeline to short-circuit Pass 2 / metrics when nothing actually changed.
func (CacheDecisions) RemovedPaths ¶
func (cd CacheDecisions) RemovedPaths() []string
RemovedPaths returns the srcRoot-relative paths that were in the old manifest but are not in the current discovery. Caller deletes their data.
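To show how the accessors above relate to one another, here is a sketch over simplified local copies of the types. The lowercase types, the pathsOf helper, and the numeric values of the class constants are hypothetical; only the class names classDirty, classCached and classRemoved appear in the documentation.

```go
package main

import (
	"fmt"
	"sort"
)

// fileClass values are illustrative; the real constants are unexported.
type fileClass int

const (
	classDirty fileClass = iota
	classCached
	classRemoved
)

type fileDecision struct {
	Path  string
	Class fileClass
}

type cacheDecisions struct {
	Decisions []fileDecision
}

// pathsOf collects the paths with the given class, sorted.
func (cd cacheDecisions) pathsOf(c fileClass) []string {
	var out []string
	for _, d := range cd.Decisions {
		if d.Class == c {
			out = append(out, d.Path)
		}
	}
	sort.Strings(out)
	return out
}

// isAllCached is true when no decision is dirty or removed. Treating an
// empty decision list as all-cached is an assumption of this sketch.
func (cd cacheDecisions) isAllCached() bool {
	for _, d := range cd.Decisions {
		if d.Class != classCached {
			return false
		}
	}
	return true
}

func main() {
	cd := cacheDecisions{Decisions: []fileDecision{
		{"cmd/main.go", classCached},
		{"internal/a.go", classDirty},
		{"internal/old.go", classRemoved},
	}}
	fmt.Println(cd.pathsOf(classDirty))   // prints [internal/a.go]
	fmt.Println(cd.pathsOf(classRemoved)) // prints [internal/old.go]
	fmt.Println(cd.isAllCached())         // prints false
}
```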
type DiscoveredFile ¶
DiscoveredFile describes one file produced by detect.* — used as input to the diff. Path is srcRoot-relative slash form.
type FileDecision ¶
type FileDecision struct {
Path string
Language string
Class fileClass
// Populated for classDirty/classCached:
SHA256 string
CacheKey string
MTime int64
ParserVersion string
// Populated for classCached only — the matching entry from the OLD
// manifest, so the caller can pull NodeIDs out for reload.
Cached *persist.FileEntry
}
FileDecision is the cache decision for one file in the current discovery. For classRemoved the Path comes from the OLD manifest (file is gone) and Language/SHA256/CacheKey/MTime are zero.
type Options ¶
type Options struct {
SrcRoot string
OutDir string
Languages []string // {"auto"} | subset of {"go","ts","sol"}
Logger *slog.Logger
CKGVersion string
// NoCache forces a full rebuild — bypasses the A3 incremental cache and
// wipes graph.db at start. Use when the cache is suspect, or for clean
// benchmark runs.
NoCache bool
// RebuildMetrics forces PageRank/Leiden recompute even when the cache
// would otherwise reuse them. Phase 1 ALWAYS recomputes when any file
// is dirty (see Run below) — this flag is the explicit operator escape
// for the "all-cached but I want fresh metrics" case.
RebuildMetrics bool
// DBDSN is an optional PostgreSQL DSN (e.g. "postgres://user:pass@host/db").
// When set, the build persists to PostgreSQL instead of a local SQLite file.
// OutDir is still used for manifest.json; --no-cache and incremental work the
// same way (NodesByFilePath reads from PG with ORDER BY start_line).
DBDSN string
// StrictValidate, when true, fails the build on the first dangling edge or
// schema violation (legacy v0.x behaviour). Default false: dangling edges
// are dropped with a warning, schema violations still abort. Lenient mode
// is required for dogfooding self-analysis, where parser bugs would
// otherwise prevent graph.db from being written and block measurement.
StrictValidate bool
// FilesFromPath is the optional path to a JSON include/exclude filter
// (see internal/filterlist). When set, only files matching the filter
// reach the parsers. Empty means "use heuristic discovery as before".
FilesFromPath string
// LockPropagation enables D1 Stage B cross-function lock propagation
// (W-A, Within-language semantics Phase 5). When true, the cold build
// path walks the Go call graph from every lock-holding function up to
// lockPropagationMaxDepth=5 hops and emits accessed_under_lock(field,
// mutex) edges for fields touched in reachable callee bodies. Default
// false (opt-in per W-A §5.0 Q5) so existing builds are byte-identical.
// Incremental cache path skips propagation regardless of this flag —
// run with --no-cache when the flag is on to measure full effect.
// Spec: docs/design/go-cross-function-lock-propagation.md.
LockPropagation bool
}
Options controls one ckg build invocation.
type TopicTreeForPersist ¶
type TopicTreeForPersist = persist.TopicTreeInput
TopicTreeForPersist re-exposes persist.TopicTreeInput under a buildpipe-local alias so persistIncrementalArtifacts can take it as a typed param without leaking the persist package detail to every caller.