Documentation ¶
Overview ¶
Package types holds the cross-package data contracts: Chunk, Hit, Filter, and the Embedder and VectorStore interfaces. Keeping these here (rather than inside internal/) lets future CKS code import them without pulling in indexer/store implementations.
Index ¶
- func ChunkID(file string, startLine, endLine int, contentSHA256 string) string
- func ContentSHA256(text string) string
- func IsTestPath(path, lang string) bool
- type Branch
- type Chunk
- type ChunkKind
- type Citation
- type Embedder
- type EmbeddingIdentity
- type EnforcePoint
- type Filter
- type FlowSpineMeta
- type FlowStepMeta
- type Hit
- type HitScore
- type InvariantRef
- type InvariantTier
- type ModificationGuidance
- type PRRef
- type Stats
- type SymbolKind
- type VectorStore
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func ChunkID ¶
ChunkID computes the deterministic chunk identifier:
sha256(file + "\n" + start_line + ":" + end_line + "\n" + content_sha256)
content_sha256 is the SHA-256 of the chunk Text (raw bytes — no whitespace normalization). A rename of the file changes the ID; this is intentional — rename tracking is the caller's responsibility.
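The formula above can be sketched as follows. This is a minimal illustration, not the package's actual implementation: the lowercase helper names are hypothetical, and hex encoding of the digests and decimal formatting of the line numbers are assumptions, since the documentation does not state them.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
)

// contentSHA256 hashes the raw chunk text with no whitespace normalization.
// Hex encoding of the digest is an assumption of this sketch.
func contentSHA256(text string) string {
	sum := sha256.Sum256([]byte(text))
	return hex.EncodeToString(sum[:])
}

// chunkID follows the documented layout:
// sha256(file + "\n" + start_line + ":" + end_line + "\n" + content_sha256).
// Line numbers are rendered in decimal (assumption).
func chunkID(file string, startLine, endLine int, contentHash string) string {
	payload := file + "\n" + strconv.Itoa(startLine) + ":" + strconv.Itoa(endLine) + "\n" + contentHash
	sum := sha256.Sum256([]byte(payload))
	return hex.EncodeToString(sum[:])
}

func main() {
	h := contentSHA256("func Add(a, b int) int { return a + b }\n")
	a := chunkID("pkg/math/add.go", 10, 12, h)
	b := chunkID("pkg/arith/add.go", 10, 12, h) // same content, renamed file
	fmt.Println(a == b) // false: the path is part of the ID
	fmt.Println(len(a)) // 64 hex characters
}
```

Because the file path is part of the hashed payload, identical text at identical line numbers still gets a new ID after a rename, which matches the documented behavior.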
func ContentSHA256 ¶
ContentSHA256 returns the canonical hash used in chunk_id and stored alongside each chunk for stale-detection. Single-source-of-truth helper — every caller (chunker, store loader, eval harness) goes through this so hashing stays consistent.
func IsTestPath ¶
IsTestPath classifies a source-relative path as a test file based on the conventional patterns of its language. It is intentionally a pure function so the chunker can call it without depending on language parsers, and so callers can re-classify when reindexing without a schema migration.
Conventions covered:
Go "*_test.go"
TypeScript "*.test.ts(x)", "*.spec.ts(x)"
JavaScript "*.test.js(x)", "*.spec.js(x)" (for future JS parser)
Solidity "*.t.sol" (Foundry), any segment named "test" or "tests"
path is forward-slash, repo-relative. lang is the language tag the discover/parse layer assigned ("go", "typescript", "solidity", or "").
Why per-language convention: testing frameworks pick filename rules that drift between ecosystems. JUnit (Java) would use `Test*.java`; pytest (Python) uses `test_*.py`. To add a language, add one branch here. Keep the function short and explicit (P5: readable beats clever) so the contributor adding the next language can see at a glance what to extend.
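A pure classifier along these lines might look like the sketch below. It is a hypothetical re-implementation for illustration only: the names isTestPath and hasAnySuffix are not from the package, and it assumes the "test"/"tests" segment rule applies to Solidity only, which is one reading of the conventions listed above.

```go
package main

import (
	"fmt"
	"strings"
)

// hasAnySuffix reports whether s ends with any of the given suffixes.
func hasAnySuffix(s string, suffixes ...string) bool {
	for _, suf := range suffixes {
		if strings.HasSuffix(s, suf) {
			return true
		}
	}
	return false
}

// isTestPath is a hypothetical sketch: one explicit branch per language tag.
// path is forward-slash and repo-relative.
func isTestPath(path, lang string) bool {
	base := path[strings.LastIndex(path, "/")+1:]
	switch lang {
	case "go":
		return strings.HasSuffix(base, "_test.go")
	case "typescript":
		return hasAnySuffix(base, ".test.ts", ".test.tsx", ".spec.ts", ".spec.tsx")
	case "javascript":
		return hasAnySuffix(base, ".test.js", ".test.jsx", ".spec.js", ".spec.jsx")
	case "solidity":
		if strings.HasSuffix(base, ".t.sol") {
			return true
		}
		// Any path segment named "test" or "tests" (assumed Solidity-only here).
		for _, seg := range strings.Split(path, "/") {
			if seg == "test" || seg == "tests" {
				return true
			}
		}
	}
	return false
}

func main() {
	fmt.Println(isTestPath("internal/store/store_test.go", "go"))   // true
	fmt.Println(isTestPath("web/src/app.spec.tsx", "typescript"))   // true
	fmt.Println(isTestPath("contracts/test/Vault.sol", "solidity")) // true
	fmt.Println(isTestPath("contracts/Vault.sol", "solidity"))      // false
}
```

Since the function depends only on its two string arguments, it can be re-run over stored paths during a reindex without touching any parser.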
Types ¶
type Branch ¶
Branch is one conditional edge inside a flow step: under condition When, control goes to Then at code location At. Mapping a symptom (When) to its cause site (Then@At) is the core of flow-based root-cause analysis.
type Chunk ¶
type Chunk struct {
ID string `json:"id"` // see ChunkID
File string `json:"file"`
StartLine int `json:"start_line"`
EndLine int `json:"end_line"`
Language string `json:"language"` // "go" | "typescript" | "solidity" | "markdown"
IsTest bool `json:"is_test,omitempty"` // _test.go, *.test.ts, *.spec.ts, *.t.sol, test/... — populated by IsTestPath
SymbolName string `json:"symbol_name,omitempty"`
SymbolKind SymbolKind `json:"symbol_kind,omitempty"`
ChunkKind ChunkKind `json:"chunk_kind"`
CommitHash string `json:"commit_hash"`
ContentSHA256 string `json:"content_sha256"`
CKGNodeID string `json:"ckg_node_id,omitempty"` // 1:1 alignment when CKG path is provided
CanonicalID string `json:"canonical_id,omitempty"` // ckg's import-path-qualified symbol id (ADR-0001), copied verbatim from the aligned ckg node; the stable key cks uses to FindByCanonicalID against ckg
RecentPRs []PRRef `json:"recent_prs,omitempty"` // PRs that touched this chunk's file
Category string `json:"category,omitempty"` // policy category: consensus|state|crypto|p2p|... (empty = unclassified)
Guidance *ModificationGuidance `json:"guidance,omitempty"` // attached by policy loader; nil for unclassified
Invariants []InvariantRef `json:"invariants,omitempty"` // back-pointers to ChunkInvariant chunks extracted from this source
ConventionStats map[string]any `json:"convention_stats,omitempty"` // populated on ChunkConvention chunks; empty for source chunks
FlowStep *FlowStepMeta `json:"flow_step,omitempty"` // populated on ChunkFlowStep chunks (flow_meta column)
FlowSpine *FlowSpineMeta `json:"flow_spine,omitempty"` // populated on ChunkFlowSpine chunks (flow_meta column)
Provenance string `json:"provenance,omitempty"` // invariant origin: "auto" (extracted) | "curated" (corpus); empty for non-invariant chunks
EnforcedAt []EnforcePoint `json:"enforced_at,omitempty"` // populated on curated ChunkInvariant chunks (enforced_at column)
Text string `json:"text"` // chunk source (for re-embedding / display)
}
Chunk is the unit CKV embeds and stores. It is the indexable record produced by parse → chunk; the embedder turns Text into a vector and the store persists everything except Text-derived caches.
type ChunkKind ¶
type ChunkKind string
ChunkKind classifies the chunking strategy that produced the chunk. Distinct from SymbolKind because a long function may produce several "function_split" chunks, all of SymbolKind=Function.
const (
ChunkSymbol ChunkKind = "symbol" // whole function/method/type
ChunkFunctionSplit ChunkKind = "function_split" // sub-chunk of a long function
ChunkFileHeader ChunkKind = "file_header" // import block / top-of-file

// ChunkDoc covers markdown heading sections (DocSection/ADRSection).
// Kept distinct from ChunkSymbol so callers can filter the corpus by
// "code vs documentation" without inspecting SymbolKind. The chunker
// promotes spans whose SymbolKind is DocSection or ADRSection.
ChunkDoc ChunkKind = "doc"

// PR corpus kinds. Additive: existing schema_version 1.0
// indexes continue working; these appear only in indexes built with
// --include-pr-history.
ChunkPRBackground ChunkKind = "pr_background"
ChunkPRSolution ChunkKind = "pr_solution"
ChunkCommitMessage ChunkKind = "commit_message"

// ChunkInvariant carries an invariant statement found inside or
// adjacent to a source chunk. Each invariant chunk is paired (via
// the source chunk's Invariants []InvariantRef list) with the code
// it constrains. The agent can query invariants for a file to
// learn what changes must NOT break.
ChunkInvariant ChunkKind = "invariant"

// ChunkConvention is a per-package summary of AST-derived patterns
// (error handling style, logging library, naming, concurrency).
// The agent queries these to learn what idioms the package follows
// before proposing edits, preventing convention drift.
ChunkConvention ChunkKind = "convention"

// Flow-corpus kinds. A curated flow corpus (corpus.jsonl, loaded via
// --flow-corpus) describes "symptom → cause" causal paths through the code so
// an agent can trace a symptom to its cause. Additive: present only in
// indexes built with --flow-corpus.
ChunkFlowStep ChunkKind = "flow_step" // one step in a flow (symbol + branches)
ChunkFlowSpine ChunkKind = "flow_spine" // a flow's entry/summary backbone
)
type Citation ¶
type Citation struct {
File string `json:"file"`
StartLine int `json:"start_line"`
EndLine int `json:"end_line"`
CommitHash string `json:"commit_hash"`
}
Citation is the {file, start_line, end_line, commit_hash} tuple CKV attaches to every chunk and every search hit. CKG uses the same shape, so hybrid responses can be merged without translation.
type Embedder ¶
type Embedder interface {
Identity() EmbeddingIdentity
Name() string
Dimension() int
MaxInputTokens() int
Embed(ctx context.Context, batch []string) ([][]float32, error)
}
Embedder turns text into a fixed-dimension vector. Implementations:
- internal/embed/mock — deterministic hash-based, for tests
- internal/embed/bgeonnx — ONNX-backed local embedder; supports a model registry (see model_config.go), currently bge-large-en-v1.5 by default.
- pkg/embed/ollama — Ollama HTTP API backend.
Embedder interface contract:
- Identity reports the embedding space (provider/model/dim/pooling/normalization). Every backend implements it from its own model definition, so a new model or provider conforms to the same contract and gets index-compatibility enforcement (query.Open) for free. Name() and Dimension() are kept for convenience and MUST agree with Identity().Model and Identity().Dim.
- Name returns a stable identifier persisted in the manifest (e.g. "bge-large-en-v1.5"). Mismatch on rebuild → IndexUnavailable.
- Dimension is the vector length. Used to size the sqlite-vec column.
- MaxInputTokens is the model's context limit; the chunker truncates overlong text up front (signature stays at the head).
- Embed is batched. Implementations choose internal batching (CPU≈32, GPU≈256) but the caller MAY pass arbitrary-size slices.
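To make the contract concrete, here is a minimal sketch of a deterministic hash-based embedder in the spirit of the mock backend. It is not the code in internal/embed/mock: hashEmbedder, embedTexts, the "hash-mock" model name, the FNV-based component scheme, and the 512-token limit are all assumptions chosen for illustration. The interface and EmbeddingIdentity are copied from this page so the example is self-contained.

```go
package main

import (
	"context"
	"fmt"
	"hash/fnv"
	"math"
)

type EmbeddingIdentity struct {
	Provider, Model    string
	Dim                int
	Pooling, Normalize string
}

type Embedder interface {
	Identity() EmbeddingIdentity
	Name() string
	Dimension() int
	MaxInputTokens() int
	Embed(ctx context.Context, batch []string) ([][]float32, error)
}

// hashEmbedder is a hypothetical test double: same text in, same vector out.
type hashEmbedder struct{ dim int }

func (e hashEmbedder) Identity() EmbeddingIdentity {
	return EmbeddingIdentity{Provider: "mock", Model: "hash-mock", Dim: e.dim, Normalize: "l2"}
}

// Name and Dimension are derived from Identity so they cannot disagree with it.
func (e hashEmbedder) Name() string        { return e.Identity().Model }
func (e hashEmbedder) Dimension() int      { return e.Identity().Dim }
func (e hashEmbedder) MaxInputTokens() int { return 512 } // arbitrary for the sketch

func (e hashEmbedder) Embed(ctx context.Context, batch []string) ([][]float32, error) {
	out := make([][]float32, len(batch))
	for i, text := range batch {
		if err := ctx.Err(); err != nil {
			return nil, err
		}
		vec := make([]float32, e.dim)
		var sumSquares float64
		for j := range vec {
			h := fnv.New32a()
			fmt.Fprintf(h, "%d:%s", j, text) // one hash per component
			v := float64(h.Sum32())/float64(math.MaxUint32)*2 - 1
			vec[j] = float32(v)
			sumSquares += v * v
		}
		if norm := math.Sqrt(sumSquares); norm > 0 {
			for j := range vec {
				vec[j] = float32(float64(vec[j]) / norm) // L2-normalize
			}
		}
		out[i] = vec
	}
	return out, nil
}

// embedTexts is a small convenience wrapper used by the example.
func embedTexts(e Embedder, texts ...string) ([][]float32, error) {
	return e.Embed(context.Background(), texts)
}

func main() {
	e := hashEmbedder{dim: 8}
	vecs, err := embedTexts(e, "alpha", "beta")
	fmt.Println(len(vecs), len(vecs[0]), err) // 2 8 <nil>
	fmt.Println(e.Name() == e.Identity().Model)
}
```

Deriving Name and Dimension from Identity is one simple way to satisfy the "MUST agree" requirement by construction.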
type EmbeddingIdentity ¶
type EmbeddingIdentity struct {
Provider string // backend that produced the vectors, e.g. "ollama", "bgeonnx", "mock"
Model string // model name, e.g. "bge-m3"
Dim int // vector dimension
Pooling string // "cls" | "mean" | "last_token"; "" when the backend does not expose it
Normalize string // "l2" | "none"; "" when unknown
}
EmbeddingIdentity describes the vector space an Embedder produces. It is model-agnostic: each embedder fills it from its own configuration (e.g. a model registry), so adding or swapping an embedding model needs no change here — the identity flows from the model definition.
func (EmbeddingIdentity) Checksum ¶
func (id EmbeddingIdentity) Checksum() string
Checksum is a stable identity string for the embedding space. Two embedders that produce comparable vectors yield the same Checksum; any difference (provider, model, dim, pooling, normalization) yields a different one. It is recorded in the manifest at build time and compared on Open so a silently-incompatible index/embedder pair (e.g. Ollama bge-m3 vs ONNX bge-m3) is rejected with a reindex hint instead of returning meaningless similarity scores.
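One way such a checksum could be derived is sketched below. The canonical string layout, the SHA-256 choice, and the checkCompatible helper are assumptions for illustration; the page does not specify how the real Checksum is computed or how Open reports a mismatch.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

type EmbeddingIdentity struct {
	Provider  string
	Model     string
	Dim       int
	Pooling   string
	Normalize string
}

// Checksum hashes every identity field, so a change in any one of them
// yields a different value. The "|" layout is an assumption of this sketch
// and presumes field values do not themselves contain "|".
func (id EmbeddingIdentity) Checksum() string {
	canonical := fmt.Sprintf("%s|%s|%d|%s|%s", id.Provider, id.Model, id.Dim, id.Pooling, id.Normalize)
	sum := sha256.Sum256([]byte(canonical))
	return hex.EncodeToString(sum[:])
}

// checkCompatible is a hypothetical stand-in for the comparison done on Open:
// the checksum stored in the manifest must equal the live embedder's.
func checkCompatible(manifestChecksum string, live EmbeddingIdentity) error {
	if got := live.Checksum(); got != manifestChecksum {
		return fmt.Errorf("embedding space mismatch (index %.8s, embedder %.8s): reindex required", manifestChecksum, got)
	}
	return nil
}

func main() {
	built := EmbeddingIdentity{Provider: "ollama", Model: "bge-m3", Dim: 1024}
	live := EmbeddingIdentity{Provider: "bgeonnx", Model: "bge-m3", Dim: 1024}
	fmt.Println(checkCompatible(built.Checksum(), built) == nil) // true
	fmt.Println(checkCompatible(built.Checksum(), live) == nil)  // false: provider differs
}
```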
type EnforcePoint ¶
type EnforcePoint struct {
Flow string `json:"flow"`
Step string `json:"step"`
Loc string `json:"loc"`
}
EnforcePoint records where a curated invariant is enforced: a step in a flow at a code location. Serialized into the enforced_at column.
type Filter ¶
type Filter struct {
Language string `json:"language,omitempty"`
PathGlob string `json:"path,omitempty"`
SymbolKinds []SymbolKind `json:"symbol_kinds,omitempty"`
CommitHash string `json:"commit_hash,omitempty"`
}
Filter narrows a vector search by metadata. All fields are optional; an empty field is treated as "any". Filters are AND-combined.
Filter fields:
- Language: "go" | "typescript" | "solidity" | "markdown"
- PathGlob: filepath.Match-style glob (single-star; doublestar planned)
- SymbolKinds: e.g. {Function, Method}
- CommitHash: pin to a specific historical commit's chunks
func (Filter) IsZero ¶
IsZero reports whether the filter would match every chunk. Used by store implementations to skip the post-filter step entirely on the hot path.
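The AND-combined, empty-means-any semantics and the IsZero fast path can be sketched as below. The chunkMeta type and the matches function are hypothetical, and the sketch uses path.Match from the standard library (same pattern syntax as filepath.Match, always with "/" separators) because the documented paths are forward-slash; the real store may do this differently.

```go
package main

import (
	"fmt"
	"path"
)

type SymbolKind string

type Filter struct {
	Language    string
	PathGlob    string
	SymbolKinds []SymbolKind
	CommitHash  string
}

// IsZero reports whether every field is empty, i.e. the filter matches all chunks.
func (f Filter) IsZero() bool {
	return f.Language == "" && f.PathGlob == "" && len(f.SymbolKinds) == 0 && f.CommitHash == ""
}

// chunkMeta is a hypothetical trimmed-down Chunk holding only filterable fields.
type chunkMeta struct {
	File, Language, CommitHash string
	SymbolKind                 SymbolKind
}

// matches applies each non-empty field as an AND condition.
func matches(f Filter, c chunkMeta) bool {
	if f.IsZero() {
		return true // hot path: nothing to check
	}
	if f.Language != "" && f.Language != c.Language {
		return false
	}
	if f.CommitHash != "" && f.CommitHash != c.CommitHash {
		return false
	}
	if f.PathGlob != "" {
		ok, err := path.Match(f.PathGlob, c.File)
		if err != nil || !ok {
			return false // a malformed pattern matches nothing in this sketch
		}
	}
	if len(f.SymbolKinds) > 0 {
		found := false
		for _, k := range f.SymbolKinds {
			if k == c.SymbolKind {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}

func main() {
	f := Filter{Language: "go", PathGlob: "internal/store/*.go", SymbolKinds: []SymbolKind{"Function", "Method"}}
	fmt.Println(matches(f, chunkMeta{File: "internal/store/memory.go", Language: "go", SymbolKind: "Function"})) // true
	fmt.Println(matches(f, chunkMeta{File: "internal/store/sub/x.go", Language: "go", SymbolKind: "Function"}))  // false
	fmt.Println((Filter{}).IsZero())                                                                            // true
}
```

The second call returns false because a single star does not cross a "/" boundary, which is the limitation the planned doublestar support would lift.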
type FlowSpineMeta ¶
type FlowSpineMeta struct {
FlowID string `json:"flow_id"`
EntryPoint string `json:"entry_point,omitempty"`
Trigger string `json:"trigger,omitempty"`
RootSymbol string `json:"root_symbol,omitempty"`
Links []string `json:"links,omitempty"`
CalledBy []string `json:"called_by,omitempty"`
}
FlowSpineMeta is the structured metadata for a ChunkFlowSpine chunk: a flow's entry point, what triggers it, and how it links to other flows. Serialized into the flow_meta column (populated in Phase B).
type FlowStepMeta ¶
type FlowStepMeta struct {
FlowID string `json:"flow_id"`
StepID string `json:"step_id"`
Symbol string `json:"symbol,omitempty"`
Kind string `json:"kind,omitempty"`
Calls []string `json:"calls,omitempty"`
Reads string `json:"reads,omitempty"`
Writes string `json:"writes,omitempty"`
Emits string `json:"emits,omitempty"`
Branches []Branch `json:"branches,omitempty"`
Invariants []string `json:"invariants,omitempty"`
}
FlowStepMeta is the structured metadata for a ChunkFlowStep chunk: the symbol the step runs at, the symbols it calls, what it reads/writes/emits, its conditional branches, and the invariant ids it must uphold. Serialized into the flow_meta column (populated in Phase B).
type Hit ¶
type Hit struct {
Chunk Chunk `json:"chunk"`
Score HitScore `json:"score"`
// StaleCitation is set by the citation-enforcement step when the
// chunk's recorded commit_hash differs from the source tree's
// current git HEAD. The hit is still returned — the file usually
// still has useful content at a different commit — but downstream
// consumers can warn the user or downgrade the snippet shape.
StaleCitation bool `json:"stale_citation,omitempty"`
}
Hit is a single search result. Score values are normalized so callers can compare across backends; raw distance is preserved for RRF input.
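The stale-citation check described in the field comment could be sketched like this. The markStale helper and the trimmed-down types are hypothetical, and how the current HEAD hash is obtained is left outside the sketch.

```go
package main

import "fmt"

// Trimmed-down stand-ins for Chunk and Hit; only the fields used here.
type Chunk struct {
	File       string
	CommitHash string
}

type Hit struct {
	Chunk         Chunk
	StaleCitation bool
}

// markStale flags hits whose recorded commit differs from head and returns
// how many were flagged. Hits are kept either way, as documented.
func markStale(hits []Hit, head string) int {
	stale := 0
	for i := range hits {
		hits[i].StaleCitation = hits[i].Chunk.CommitHash != head
		if hits[i].StaleCitation {
			stale++
		}
	}
	return stale
}

func main() {
	hits := []Hit{
		{Chunk: Chunk{File: "a.go", CommitHash: "abc123"}},
		{Chunk: Chunk{File: "b.go", CommitHash: "def456"}},
	}
	fmt.Println(markStale(hits, "abc123"), len(hits)) // 1 2
}
```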
type HitScore ¶
type HitScore struct {
Normalized float64 `json:"normalized"` // 1 - distance/2, in [0,1]
VectorDistance float64 `json:"vector_distance"` // raw cosine distance, in [0,2]
VectorRank int `json:"vector_rank"` // 1-based within this query's vector hits
BM25Score float64 `json:"bm25_score,omitempty"` // candidate-set BM25, 0 when rerank disabled or no token match
HybridRank int `json:"hybrid_rank,omitempty"` // 1-based position after RRF fusion; 0 when rerank disabled
}
HitScore exposes both the normalized score (higher = better, range [0,1]) and the raw cosine distance (lower = better, range [0,2]). The RRF fuser upstream consumes VectorRank; lower-layer query callers display Normalized.
BM25Score and HybridRank are omitempty fields for the optional BM25 rerank pass. They stay zero (and absent from JSON) when Options.EnableBM25Rerank is off, preserving the schema for callers that haven't opted in.
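The score arithmetic can be illustrated with a short sketch. The normalized function follows the formula in the field comment (1 - distance/2). The rrfScore function shows standard reciprocal rank fusion; the constant k = 60 and the convention that rank 0 means "absent from that list" are assumptions, since the page does not describe the fuser's parameters.

```go
package main

import "fmt"

// normalized maps a cosine distance in [0,2] to a score in [0,1].
func normalized(distance float64) float64 {
	return 1 - distance/2
}

// rrfScore sums 1/(k+rank) over the rankings an item appears in.
// Ranks are 1-based; a rank of 0 is treated as "not ranked" (assumption).
func rrfScore(k float64, ranks ...int) float64 {
	var score float64
	for _, r := range ranks {
		if r > 0 {
			score += 1 / (k + float64(r))
		}
	}
	return score
}

func main() {
	fmt.Println(normalized(0), normalized(0.5), normalized(2)) // 1 0.75 0

	// An item ranked 1st by vector search and 3rd by BM25 fuses higher
	// than one ranked 2nd and 3rd.
	fmt.Println(rrfScore(60, 1, 3) > rrfScore(60, 2, 3)) // true
}
```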
type InvariantRef ¶
type InvariantRef struct {
ChunkID string `json:"chunk_id"` // ID of the ChunkInvariant chunk
Tier InvariantTier `json:"tier"` // 1, 2, or 3
Marker string `json:"marker,omitempty"` // e.g. "CRITICAL", "panic"
}
InvariantRef is a back-pointer attached to a source Chunk pointing at the ChunkInvariant(s) extracted from inside or near it. Kept small so adding it to every chunk does not balloon storage.
type InvariantTier ¶
type InvariantTier int
InvariantTier classifies how an invariant was detected.
Tier 1: existing marker (// CRITICAL, // IMPORTANT, // WARNING, // Deprecated:)
Tier 2: new convention marker (// INVARIANT:, // CONSENSUS:, // SECURITY:)
Tier 3: heuristic (panic(...) / fmt.Errorf(...) with policy keywords)
Lower tiers carry higher confidence; the agent can filter by tier when noise tolerance is low (e.g. only tier 1+2 during a release).
const (
InvariantTierExistingMarker InvariantTier = 1
InvariantTierNewMarker InvariantTier = 2
InvariantTierHeuristic InvariantTier = 3
)
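Filtering by tier, as suggested above for low noise tolerance, might look like the following sketch. The filterByMaxTier helper and the sample chunk IDs are hypothetical; the types and constants are copied from this page.

```go
package main

import "fmt"

type InvariantTier int

const (
	InvariantTierExistingMarker InvariantTier = 1
	InvariantTierNewMarker      InvariantTier = 2
	InvariantTierHeuristic      InvariantTier = 3
)

type InvariantRef struct {
	ChunkID string
	Tier    InvariantTier
	Marker  string
}

// filterByMaxTier keeps refs whose tier is at most maxTier.
// Lower tier numbers mean higher confidence.
func filterByMaxTier(refs []InvariantRef, maxTier InvariantTier) []InvariantRef {
	var kept []InvariantRef
	for _, r := range refs {
		if r.Tier <= maxTier {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	refs := []InvariantRef{
		{ChunkID: "inv-1", Tier: InvariantTierExistingMarker, Marker: "CRITICAL"},
		{ChunkID: "inv-2", Tier: InvariantTierNewMarker, Marker: "INVARIANT"},
		{ChunkID: "inv-3", Tier: InvariantTierHeuristic, Marker: "panic"},
	}
	// Release mode: keep only tiers 1 and 2.
	fmt.Println(len(filterByMaxTier(refs, InvariantTierNewMarker))) // 2
}
```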
type ModificationGuidance ¶
type ModificationGuidance struct {
AlsoReview []string `json:"also_review,omitempty"` // other categories/files to inspect together
RequiredTests []string `json:"required_tests,omitempty"` // test suites the change should exercise
WatchOut []string `json:"watch_out,omitempty"` // pitfalls / hard-fork / byzantine risks
}
ModificationGuidance is project-policy advice attached to a chunk by the policy loader. It surfaces "if you touch this code, here is what else to consider" hints derived from the chunk's path category (e.g. consensus, state, p2p). All fields may be empty.
Guidance is informative, not enforcement. A nil pointer means the chunk's path did not match any policy rule.
type PRRef ¶
type PRRef struct {
Number int `json:"number"`
Title string `json:"title"`
MergedAtUTC string `json:"merged_at_utc,omitempty"`
}
PRRef records a PR that touched a chunk's file or symbol. Stored as JSON in the recent_prs column; the temporal slicing key (MergedAtUTC) lets query-time filtering exclude PRs merged after a cutoff.
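Query-time temporal slicing could be sketched as below. The page does not state the timestamp format, so this assumes MergedAtUTC holds an RFC 3339 string; the prsAsOf helper, the sample PR numbers and titles, and the choice to drop entries with a missing or unparseable timestamp are all illustrative assumptions.

```go
package main

import (
	"fmt"
	"time"
)

type PRRef struct {
	Number      int    `json:"number"`
	Title       string `json:"title"`
	MergedAtUTC string `json:"merged_at_utc,omitempty"`
}

// prsAsOf keeps PRs merged at or before the cutoff. Entries without a
// parseable timestamp are dropped, since they cannot be shown to predate
// the cutoff (a design choice of this sketch).
func prsAsOf(prs []PRRef, cutoffRFC3339 string) ([]PRRef, error) {
	cutoff, err := time.Parse(time.RFC3339, cutoffRFC3339)
	if err != nil {
		return nil, fmt.Errorf("parse cutoff: %w", err)
	}
	var kept []PRRef
	for _, pr := range prs {
		mergedAt, err := time.Parse(time.RFC3339, pr.MergedAtUTC)
		if err != nil || mergedAt.After(cutoff) {
			continue
		}
		kept = append(kept, pr)
	}
	return kept, nil
}

func main() {
	prs := []PRRef{
		{Number: 101, Title: "tighten validation", MergedAtUTC: "2024-01-10T09:00:00Z"},
		{Number: 140, Title: "refactor store", MergedAtUTC: "2024-06-02T15:30:00Z"},
		{Number: 150, Title: "no timestamp"},
	}
	kept, err := prsAsOf(prs, "2024-03-01T00:00:00Z")
	fmt.Println(len(kept), err) // 1 <nil>
}
```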
type Stats ¶
type Stats struct {
ChunkCount int `json:"chunk_count"`
EmbeddingModel string `json:"embedding_model"`
EmbeddingDim int `json:"embedding_dim"`
IndexedHead string `json:"indexed_head"`
BuiltAt string `json:"built_at"`
SchemaVersion string `json:"schema_version"`
}
Stats reports index health. Returned by VectorStore.Stats and surfaced via the MCP `cks.ops.health` tool.
type SymbolKind ¶
type SymbolKind string
SymbolKind enumerates the AST node kinds CKV chunks against. Stored as a plain string for forward-compatibility with new languages.
const (
KindFunction SymbolKind = "Function"
KindMethod SymbolKind = "Method"
KindType SymbolKind = "Type"
KindStruct SymbolKind = "Struct"
KindInterface SymbolKind = "Interface"
KindContract SymbolKind = "Contract" // Solidity
KindEvent SymbolKind = "Event" // Solidity (TBD)
KindModifier SymbolKind = "Modifier" // Solidity (TBD)
KindFileHeader SymbolKind = "FileHeader"

// Markdown indexing kinds.
// Each heading-bounded section in a *.md / *.markdown file becomes one
// SymbolSpan; the chunker emits a chunk per span so "why did we decide X" style
// queries can hit a specific decision section.
KindDocSection SymbolKind = "DocSection" // markdown heading section
KindADRSection SymbolKind = "ADRSection" // ADR-* / docs/adr/* markdown sections
)
type VectorStore ¶
type VectorStore interface {
// Upsert inserts or replaces chunks keyed by Chunk.ID. The vector is
// derived from chunk.Text via the configured Embedder before calling.
// Note the (chunk, embedding) pairing is positional and equal-length.
Upsert(ctx context.Context, chunks []Chunk, embeddings [][]float32) error
// DeleteByFile removes every chunk whose File equals path. Used by
// the incremental indexer and by the file-rename safety path.
DeleteByFile(ctx context.Context, path string) error
// Search returns the top-k nearest chunks under cosine distance,
// post-filtered by `filter`. k is the desired result count; the
// implementation may over-fetch (e.g. 3*k) for re-rank head-room.
Search(ctx context.Context, query []float32, k int, filter Filter) ([]Hit, error)
// Stats reports indexed counts and the embedding model identity
// stored at build time. Cheap (single SQL roundtrip).
Stats(ctx context.Context) (Stats, error)
// Close releases the backing handle. Idempotent.
Close() error
}
VectorStore is the persistence + ANN search surface. Implementations:
- internal/store/sqlitevec — SQLite + vec0 virtual table
- internal/store/memory — in-RAM map (tests + dev loop)
All methods are safe to call from a single goroutine; concurrent callers must serialize themselves (the indexer pipeline is sequential per file).
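For orientation, a heavily reduced in-RAM store in the spirit of the memory backend is sketched below. It is not the code in internal/store/memory: memoryStore, demoStore, and nearestIDs are hypothetical, the Chunk and Hit types are trimmed down, and the sketch omits the Filter argument, Stats, and Close. Like the documented contract, it does no locking and expects a single calling goroutine.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"math"
	"sort"
)

// Trimmed-down stand-ins for the package types.
type Chunk struct{ ID, File, Text string }

type Hit struct {
	Chunk    Chunk
	Distance float64 // raw cosine distance, in [0,2]
}

type memoryStore struct {
	chunks  map[string]Chunk
	vectors map[string][]float32
}

func newMemoryStore() *memoryStore {
	return &memoryStore{chunks: map[string]Chunk{}, vectors: map[string][]float32{}}
}

// Upsert pairs chunks[i] with embeddings[i]; existing IDs are replaced.
func (s *memoryStore) Upsert(ctx context.Context, chunks []Chunk, embeddings [][]float32) error {
	if len(chunks) != len(embeddings) {
		return errors.New("chunks and embeddings must be equal-length")
	}
	for i, c := range chunks {
		s.chunks[c.ID] = c
		s.vectors[c.ID] = embeddings[i]
	}
	return nil
}

// DeleteByFile removes every chunk whose File equals path.
func (s *memoryStore) DeleteByFile(ctx context.Context, path string) error {
	for id, c := range s.chunks {
		if c.File == path {
			delete(s.chunks, id)
			delete(s.vectors, id)
		}
	}
	return nil
}

func cosineDistance(a, b []float32) float64 {
	if len(a) != len(b) {
		return 2 // treat a dimension mismatch as maximally distant
	}
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 1
	}
	return 1 - dot/(math.Sqrt(na)*math.Sqrt(nb))
}

// Search is a brute-force scan; a real backend would use an ANN index.
func (s *memoryStore) Search(ctx context.Context, query []float32, k int) ([]Hit, error) {
	hits := make([]Hit, 0, len(s.chunks))
	for id, c := range s.chunks {
		hits = append(hits, Hit{Chunk: c, Distance: cosineDistance(query, s.vectors[id])})
	}
	sort.Slice(hits, func(i, j int) bool { return hits[i].Distance < hits[j].Distance })
	if len(hits) > k {
		hits = hits[:k]
	}
	return hits, nil
}

// demoStore builds a tiny store with three 2-dimensional vectors.
func demoStore() *memoryStore {
	s := newMemoryStore()
	_ = s.Upsert(context.Background(),
		[]Chunk{{ID: "a", File: "x.go"}, {ID: "b", File: "y.go"}, {ID: "c", File: "x.go"}},
		[][]float32{{1, 0}, {0, 1}, {0.9, 0.1}})
	return s
}

// nearestIDs returns the IDs of the k nearest chunks, closest first.
func nearestIDs(s *memoryStore, query []float32, k int) []string {
	hits, _ := s.Search(context.Background(), query, k)
	ids := make([]string, len(hits))
	for i, h := range hits {
		ids[i] = h.Chunk.ID
	}
	return ids
}

func main() {
	s := demoStore()
	fmt.Println(nearestIDs(s, []float32{1, 0}, 2)) // [a c]
	_ = s.DeleteByFile(context.Background(), "x.go")
	fmt.Println(nearestIDs(s, []float32{1, 0}, 2)) // [b]
}
```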