Overview ¶
Package cacheutil is the shared infrastructure layer for every on-disk cache in krit. It centralizes the disk-cache architecture so individual subsystems (parse trees, XML, resource indexes, FIR findings, library-model profiles, findings bundles, cross-findings, android findings, the analysis cache) reuse one coherent set of primitives instead of growing bespoke equivalents.
The disk-cache umbrella ¶
Every persistent on-disk cache in the project should:
- Implement Backend (Registered + StatsProvider) so it appears in AllRegistered/AllStats and can be cleared by the CLI's --clear-caches path. Caches that intentionally have no per-entry stats may implement Registered alone, but Backend is the default contract.
- Use SizeCapLRU for size-bounded entries.
- Use AsyncWriter for off-hot-path persistence.
- Use VersionedDir to invalidate on schema/grammar changes.
- Use ShardedEntryPath for the on-disk layout ({root}/{hash[:2]}/{hash[2:]}{ext}).
- Use EncodeZstdGob/DecodeZstdGob for the wire format unless a domain-specific format is required.
New persistent caches MUST register with this package from init() so the global stats, clear, and budget reporting pick them up automatically. Caches that memoize per-run derived facts (per-file imports, per-node summaries, etc.) belong in internal/filefacts/ instead; that package is the in-memory, run-scoped cache layer.
The split:
- cacheutil → primitives + global registry for persistent on-disk caches.
- filefacts → run-scoped per-file/per-node memoization.
- rule-local sync.Maps → not allowed (enforced by ruleslinter).
Index ¶
- Constants
- func ClearAll(ctx ClearContext) error
- func DecodeZstdGob(r io.Reader, v any) error
- func EncodeZstdGob(v any) ([]byte, error)
- func IsZstdFrame(b []byte) bool
- func Register(c Registered)
- func SetLogger(l logger.Logger)
- func ShardedEntryPath(root, hash, ext string) string
- type AsyncWriter
- type AsyncWriterStats
- type Backend
- type BudgetReport
- type BudgetRow
- type CacheStats
- type ClearContext
- type LRUStats
- type NamedCacheStats
- type Registered
- type SchemaToken
- type SizeCapLRU
- func (l *SizeCapLRU) Evict() (int, error)
- func (l *SizeCapLRU) Flush() error
- func (l *SizeCapLRU) Forget(hash string)
- func (l *SizeCapLRU) MaybeEvict() (int, error)
- func (l *SizeCapLRU) Open() error
- func (l *SizeCapLRU) Record(hash string, size int64)
- func (l *SizeCapLRU) SetDeferEvictions(on bool)
- func (l *SizeCapLRU) Stats() LRUStats
- func (l *SizeCapLRU) Touch(hash string)
- type StatsProvider
- type VersionedDir
Constants ¶
const DefaultParseCacheCapBytes int64 = 200 * 1024 * 1024
DefaultParseCacheCapBytes is the default cap for the parse cache. The value was picked from measured usage across large Android and Kotlin repos: 200 MiB covers typical Android/KMP repos with headroom while keeping very large repositories bounded.
Variables ¶
This section is empty.
Functions ¶
func ClearAll ¶
func ClearAll(ctx ClearContext) error
ClearAll invokes Clear on every registered cache. It never short-circuits: errors from individual caches are accumulated and returned together via errors.Join.
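The accumulate-then-join behavior can be illustrated with a minimal, self-contained sketch. The names clearAll, fakeCache, and errDenied are hypothetical and exist only for this illustration; the real ClearAll reads the package registry rather than taking a slice.

```go
package main

import (
	"errors"
	"fmt"
)

// ClearContext and Registered mirror the shapes documented on this page.
type ClearContext struct{ RepoDir string }

type Registered interface {
	Name() string
	Clear(ctx ClearContext) error
}

// errDenied is a hypothetical failure used by the demo.
var errDenied = errors.New("permission denied")

// fakeCache is a hypothetical cache that records whether Clear was called.
type fakeCache struct {
	name    string
	err     error
	cleared bool
}

func (c *fakeCache) Name() string { return c.name }

func (c *fakeCache) Clear(ClearContext) error {
	c.cleared = true
	return c.err
}

// clearAll calls Clear on every cache and joins the errors instead of
// returning at the first failure.
func clearAll(caches []Registered, ctx ClearContext) error {
	var errs []error
	for _, c := range caches {
		if err := c.Clear(ctx); err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", c.Name(), err))
		}
	}
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	a := &fakeCache{name: "parse", err: errDenied}
	b := &fakeCache{name: "xml"}
	err := clearAll([]Registered{a, b}, ClearContext{RepoDir: "."})
	// The failure in "parse" does not stop "xml" from being cleared.
	fmt.Println(err, b.cleared)
}
```

Because the joined error wraps each failure, callers can still use errors.Is or errors.As against individual causes.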
func DecodeZstdGob ¶
func DecodeZstdGob(r io.Reader, v any) error
DecodeZstdGob decodes a zstd(gob(v)) cache payload from r into v.
func EncodeZstdGob ¶
func EncodeZstdGob(v any) ([]byte, error)
EncodeZstdGob returns zstd(gob(v)) using the shared cache compression level. It is intended for whole-entry cache payloads whose final size is needed before writing LRU metadata.
func IsZstdFrame ¶
func IsZstdFrame(b []byte) bool
IsZstdFrame reports whether b starts with the RFC 8878 zstd frame magic. Tests use this as a regression guard against silently writing raw gob.
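A check of this kind only needs the first four bytes of the payload. The sketch below uses the hypothetical names isZstdFrame and zstdMagic and is not the package's implementation; the magic number itself (0xFD2FB528, stored little-endian) comes from RFC 8878.

```go
package main

import (
	"bytes"
	"fmt"
)

// zstdMagic is the zstd frame magic number 0xFD2FB528 in the little-endian
// byte order used on disk (RFC 8878).
var zstdMagic = []byte{0x28, 0xB5, 0x2F, 0xFD}

// isZstdFrame reports whether b begins with the zstd frame magic.
func isZstdFrame(b []byte) bool {
	return bytes.HasPrefix(b, zstdMagic)
}

func main() {
	fmt.Println(isZstdFrame([]byte{0x28, 0xB5, 0x2F, 0xFD, 0x00})) // true
	fmt.Println(isZstdFrame([]byte("raw gob bytes")))              // false
	fmt.Println(isZstdFrame(nil))                                  // false
}
```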
func Register ¶
func Register(c Registered)
Register adds c to the global registry. Registration is keyed by Name: if a cache with the same name is already registered, it is replaced and a warning is logged.
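The replace-by-name behavior might look roughly like the sketch below. The registry and stubCache types are hypothetical, the standard log package stands in for the package-level logger, and keeping the replacement in the original slot is an assumption; the page does not say where a replaced cache lands in registration order.

```go
package main

import (
	"fmt"
	"log"
	"sync"
)

type ClearContext struct{ RepoDir string }

type Registered interface {
	Name() string
	Clear(ctx ClearContext) error
}

// registry is a hypothetical stand-in for the package-level registry.
type registry struct {
	mu     sync.Mutex
	caches []Registered
}

// register adds c, replacing any cache that already uses the same name.
func (r *registry) register(c Registered) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for i, existing := range r.caches {
		if existing.Name() == c.Name() {
			log.Printf("cache %q registered twice; replacing", c.Name())
			r.caches[i] = c // assumption: the replacement keeps the original slot
			return
		}
	}
	r.caches = append(r.caches, c)
}

// names returns the registered names in registry order.
func (r *registry) names() []string {
	r.mu.Lock()
	defer r.mu.Unlock()
	out := make([]string, 0, len(r.caches))
	for _, c := range r.caches {
		out = append(out, c.Name())
	}
	return out
}

// stubCache is a hypothetical cache; gen distinguishes instances sharing a name.
type stubCache struct {
	name string
	gen  int
}

func (s stubCache) Name() string             { return s.name }
func (s stubCache) Clear(ClearContext) error { return nil }

func main() {
	var r registry
	r.register(stubCache{name: "parse", gen: 1})
	r.register(stubCache{name: "xml", gen: 1})
	r.register(stubCache{name: "parse", gen: 2}) // logs a warning, replaces gen 1
	fmt.Println(r.names())
}
```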
func SetLogger ¶
func SetLogger(l logger.Logger)
SetLogger replaces the package-level logger. Intended for tests; production callers should not need to touch this.
func ShardedEntryPath ¶
func ShardedEntryPath(root, hash, ext string) string
ShardedEntryPath returns "{root}/{hash[:2]}/{hash[2:]}{ext}". ext must include the leading dot (e.g. ".json", ".gob"). Hashes shorter than 3 characters fall back to "{root}/_/{hash}{ext}".
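The layout rule is small enough to restate as code. This is a sketch of the documented behavior under the hypothetical name shardedEntryPath, not the package source; the printed paths assume a Unix-like path separator.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// shardedEntryPath places an entry under a two-character shard directory.
// Hashes too short to split go into the "_" shard instead.
func shardedEntryPath(root, hash, ext string) string {
	if len(hash) < 3 {
		return filepath.Join(root, "_", hash+ext)
	}
	return filepath.Join(root, hash[:2], hash[2:]+ext)
}

func main() {
	fmt.Println(shardedEntryPath("cache", "a1b2c3", ".gob")) // cache/a1/b2c3.gob
	fmt.Println(shardedEntryPath("cache", "a1", ".gob"))     // cache/_/a1.gob
}
```

Sharding on the first two hash characters keeps any single directory from accumulating every entry, which matters on filesystems where very large directories are slow to list.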
Types ¶
type AsyncWriter ¶
type AsyncWriter struct {
// contains filtered or unexported fields
}
AsyncWriter runs bounded background cache-write jobs. Submit is deliberately non-blocking: callers can fall back to synchronous writes when the queue is full or closing instead of silently dropping entries.
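The non-blocking submit with a synchronous fallback can be sketched with a buffered channel and a select/default. Everything here (asyncWriter, newAsyncWriter, submit, close) is a hypothetical stand-in; the page does not show Submit's real signature, and this sketch omits error accumulation and assumes no submit races with close.

```go
package main

import (
	"fmt"
	"sync"
)

// asyncWriter is a hypothetical stand-in for the package's AsyncWriter.
type asyncWriter struct {
	jobs chan func()
	wg   sync.WaitGroup
}

// newAsyncWriter starts the workers; values below one are clamped to one.
func newAsyncWriter(workers, queueSize int) *asyncWriter {
	if workers < 1 {
		workers = 1
	}
	if queueSize < 1 {
		queueSize = 1
	}
	w := &asyncWriter{jobs: make(chan func(), queueSize)}
	for i := 0; i < workers; i++ {
		w.wg.Add(1)
		go func() {
			defer w.wg.Done()
			for job := range w.jobs {
				job()
			}
		}()
	}
	return w
}

// submit reports whether the job was accepted. It never blocks: when the
// queue is full it returns false so the caller can write synchronously.
func (w *asyncWriter) submit(job func()) bool {
	select {
	case w.jobs <- job:
		return true
	default:
		return false
	}
}

// close stops accepting jobs, drains the queue, and waits for the workers.
// Assumption: no submit call races with close in this sketch.
func (w *asyncWriter) close() {
	close(w.jobs)
	w.wg.Wait()
}

func main() {
	w := newAsyncWriter(2, 4)
	var mu sync.Mutex
	written := 0
	write := func() { mu.Lock(); written++; mu.Unlock() }
	for i := 0; i < 100; i++ {
		if !w.submit(write) {
			write() // queue full: fall back to a synchronous write
		}
	}
	w.close()
	fmt.Println("entries written:", written) // always 100; none are dropped
}
```

The split between background and synchronous writes varies from run to run, but the total does not, which is the property the fallback is meant to preserve.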
func NewAsyncWriter ¶
func NewAsyncWriter(workers, queueSize int) *AsyncWriter
NewAsyncWriter starts workers background goroutines and buffers up to queueSize accepted jobs. Values below one are clamped to one.
func (*AsyncWriter) Close ¶
func (w *AsyncWriter) Close() error
Close prevents new submissions, drains accepted jobs, and waits for worker goroutines to exit.
func (*AsyncWriter) Flush ¶
func (w *AsyncWriter) Flush() error
Flush waits for all accepted jobs to finish and returns accumulated write errors observed so far, if any.
func (*AsyncWriter) Stats ¶
func (w *AsyncWriter) Stats() AsyncWriterStats
type AsyncWriterStats ¶
type Backend ¶
type Backend interface {
Registered
StatsProvider
}
Backend is the canonical interface every persistent on-disk cache in krit should satisfy. It is the union of Registered (Name + Clear) and StatsProvider (Stats). New caches should implement Backend so the CLI's --verbose stats, --clear-caches path, and global Budget report all pick them up uniformly. Caches that genuinely have no per-entry stats may implement Registered alone.
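A minimal sketch of a type satisfying Backend follows. The demoCache type and its Put/Get methods are hypothetical, the cache is in-memory purely to keep the example runnable, and CacheStats is trimmed to four of the fields documented on this page. Hit and miss counters use atomics so recording them does not need the data lock.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type ClearContext struct{ RepoDir string }

// CacheStats is trimmed to four fields for this sketch.
type CacheStats struct {
	Entries int
	Bytes   int64
	Hits    int64
	Misses  int64
}

type Registered interface {
	Name() string
	Clear(ctx ClearContext) error
}

type StatsProvider interface {
	Stats() CacheStats
}

type Backend interface {
	Registered
	StatsProvider
}

// demoCache is a hypothetical cache used only for illustration.
type demoCache struct {
	mu     sync.RWMutex
	data   map[string][]byte
	bytes  atomic.Int64
	hits   atomic.Int64
	misses atomic.Int64
}

// Compile-time check that demoCache satisfies Backend.
var _ Backend = (*demoCache)(nil)

func newDemoCache() *demoCache { return &demoCache{data: map[string][]byte{}} }

func (c *demoCache) Name() string { return "demo" }

func (c *demoCache) Clear(ClearContext) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data = map[string][]byte{}
	c.bytes.Store(0)
	return nil
}

func (c *demoCache) Put(key string, val []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.bytes.Add(int64(len(val)) - int64(len(c.data[key])))
	c.data[key] = val
}

func (c *demoCache) Get(key string) ([]byte, bool) {
	c.mu.RLock()
	val, ok := c.data[key]
	c.mu.RUnlock()
	if ok {
		c.hits.Add(1)
	} else {
		c.misses.Add(1)
	}
	return val, ok
}

func (c *demoCache) Stats() CacheStats {
	c.mu.RLock()
	n := len(c.data)
	c.mu.RUnlock()
	return CacheStats{Entries: n, Bytes: c.bytes.Load(), Hits: c.hits.Load(), Misses: c.misses.Load()}
}

func main() {
	c := newDemoCache()
	c.Put("a", []byte("xyz"))
	c.Get("a")
	c.Get("missing")
	fmt.Printf("%+v\n", c.Stats())
}
```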
type BudgetReport ¶
type BudgetReport struct {
CapBytes int64 `json:"capBytes"`
UsedBytes int64 `json:"usedBytes"`
PerCache []BudgetRow `json:"perCache"`
}
BudgetReport is the aggregate view of how registered caches share the global cap. Rows are sorted by Bytes descending so the biggest consumer appears first. CapBytes is the conceptual global cap; each row's PctOfCap is rounded to two decimal places.
func Budget ¶
func Budget(capBytes int64) BudgetReport
Budget returns a BudgetReport built from the current stats registry. capBytes is the conceptual global cap (e.g. DefaultParseCacheCapBytes) and is used only to compute PctOfCap; pass a value <= 0 to emit zeroed percentages. Budget is called from --perf paths, and both cold and warm runs produce a report: on a cold run UsedBytes is 0 and every PerCache row has Bytes=0.
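The report construction described above can be sketched as follows. The budget function takes a plain map instead of reading the registry, and two details are assumptions not stated on this page: PctOfCap is expressed on a 0 to 100 scale, and ties in Bytes are broken by name.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

type BudgetRow struct {
	Name     string
	Bytes    int64
	PctOfCap float64
}

type BudgetReport struct {
	CapBytes  int64
	UsedBytes int64
	PerCache  []BudgetRow
}

// budget builds a report from per-cache byte counts (hypothetical input shape).
func budget(capBytes int64, bytesByCache map[string]int64) BudgetReport {
	report := BudgetReport{CapBytes: capBytes}
	for name, b := range bytesByCache {
		row := BudgetRow{Name: name, Bytes: b}
		if capBytes > 0 {
			// Assumption: percentage on a 0-100 scale, rounded to two decimals.
			pct := float64(b) / float64(capBytes) * 100
			row.PctOfCap = math.Round(pct*100) / 100
		}
		report.UsedBytes += b
		report.PerCache = append(report.PerCache, row)
	}
	// Biggest consumer first; the name tie-break is an assumption.
	sort.Slice(report.PerCache, func(i, j int) bool {
		a, b := report.PerCache[i], report.PerCache[j]
		if a.Bytes != b.Bytes {
			return a.Bytes > b.Bytes
		}
		return a.Name < b.Name
	})
	return report
}

func main() {
	r := budget(1000, map[string]int64{"xml": 250, "parse": 500})
	fmt.Println("used:", r.UsedBytes)
	for _, row := range r.PerCache {
		fmt.Printf("%s %d %.2f\n", row.Name, row.Bytes, row.PctOfCap)
	}
}
```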
type BudgetRow ¶
type BudgetRow struct {
Name string `json:"name"`
Bytes int64 `json:"bytes"`
PctOfCap float64 `json:"pctOfCap"`
}
BudgetRow is a single cache's contribution to the global cap.
type CacheStats ¶
type CacheStats struct {
Entries int `json:"entries"`
Bytes int64 `json:"bytes"`
Hits int64 `json:"hits"`
Misses int64 `json:"misses"`
Evictions int64 `json:"evictions"`
LastWriteUnix int64 `json:"lastWriteUnix,omitempty"`
AsyncQueued int64 `json:"asyncQueued,omitempty"`
AsyncCompleted int64 `json:"asyncCompleted,omitempty"`
AsyncFailed int64 `json:"asyncFailed,omitempty"`
AsyncBytes int64 `json:"asyncBytes,omitempty"`
}
CacheStats is a point-in-time snapshot of one on-disk cache. Counters are maintained on the hot path via atomic int64 so Stats() returns in O(1) without a per-lookup lock. Entries and Bytes reflect the running in-memory view maintained by each subsystem; a full disk walk is reserved for Probe() (not yet uniformly implemented) triggered by --verbose.
type ClearContext ¶
type ClearContext struct {
RepoDir string
}
ClearContext is passed to every Registered.Clear() call. Subsystems that need runtime-resolved paths (e.g. the repo root) read them from here rather than maintaining their own globals.
type NamedCacheStats ¶
type NamedCacheStats struct {
Name string `json:"name"`
Stats CacheStats `json:"stats"`
}
NamedCacheStats pairs a cache name with its current stats.
func AllStats ¶
func AllStats() []NamedCacheStats
AllStats returns a snapshot of stats for every registered cache that also implements StatsProvider. Order matches registration order.
type Registered ¶
type Registered interface {
Name() string
Clear(ctx ClearContext) error
}
Registered is anything that can be enumerated and cleared wholesale.
func AllRegistered ¶
func AllRegistered() []Registered
AllRegistered returns a snapshot of every currently-registered cache. Used for --verbose output and tests.
type SchemaToken ¶
type SchemaToken struct {
Name string // filename under Root, e.g. "version" or "grammar-version"
Value string
}
SchemaToken is one named-version dimension.
type SizeCapLRU ¶
type SizeCapLRU struct {
EntriesRoot string // absolute path to the entries subtree
IndexPath string // absolute path to the sidecar index file
LockPath string // absolute path to the eviction lock file
Ext string // entry file extension, including leading dot
CapBytes int64 // size cap in bytes; <=0 disables eviction
LowWaterFrac float64 // evict to this fraction of CapBytes (default 0.80)
Remove func(hash string) error
RemoveBatch func(hashes []string) error
TrustIndex bool // skip per-entry filesystem validation for packed stores
// contains filtered or unexported fields
}
SizeCapLRU is a reusable LRU cap for a sharded cache dir.
func (*SizeCapLRU) Evict ¶
func (l *SizeCapLRU) Evict() (int, error)
Evict runs eviction unconditionally (subject only to the cap and lock gating). Use this at teardown after deferring evictions during the hot write path so the cap is applied once instead of per-batch.
func (*SizeCapLRU) Flush ¶
func (l *SizeCapLRU) Flush() error
Flush persists the index to the sidecar file if it has changed. Idempotent; safe to call on shutdown.
func (*SizeCapLRU) Forget ¶
func (l *SizeCapLRU) Forget(hash string)
Forget removes hash from the index. Callers pair this with an os.Remove of the underlying file (e.g. on decode error).
func (*SizeCapLRU) MaybeEvict ¶
func (l *SizeCapLRU) MaybeEvict() (int, error)
MaybeEvict runs eviction when the cap is exceeded. It returns the number of entries removed and any error from persisting the sidecar. When another process holds the lock, it skips eviction and returns (0, nil); last-write-wins on the race is acceptable because the cap is a soft target. MaybeEvict is a no-op while SetDeferEvictions(true) is active; callers should call Evict() at teardown.
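The selection step of a size-capped LRU eviction can be sketched on its own, leaving out the lock file, the sidecar index, and the actual file removal. The entry type and evictToLowWater function are hypothetical; the 0.80 default mirrors the LowWaterFrac field comment, and treating out-of-range fractions as that default is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical index record for one cached file.
type entry struct {
	hash       string
	size       int64
	lastAccess int64 // unix nanoseconds
}

// evictToLowWater returns the hashes to remove, least recently used first,
// so the remaining total is at or below lowWaterFrac*capBytes. It returns
// nil when the cap is disabled (capBytes <= 0) or not exceeded.
func evictToLowWater(entries []entry, capBytes int64, lowWaterFrac float64) []string {
	var total int64
	for _, e := range entries {
		total += e.size
	}
	if capBytes <= 0 || total <= capBytes {
		return nil
	}
	if lowWaterFrac <= 0 || lowWaterFrac > 1 {
		lowWaterFrac = 0.80 // assumption: fall back to the documented default
	}
	target := int64(float64(capBytes) * lowWaterFrac)

	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]entry(nil), entries...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].lastAccess < sorted[j].lastAccess
	})

	var victims []string
	for _, e := range sorted {
		if total <= target {
			break
		}
		victims = append(victims, e.hash)
		total -= e.size
	}
	return victims
}

func main() {
	entries := []entry{
		{hash: "c", size: 50, lastAccess: 3},
		{hash: "a", size: 50, lastAccess: 1},
		{hash: "b", size: 50, lastAccess: 2},
	}
	// 150 bytes against a 100-byte cap: evict down to the 80-byte low-water mark.
	fmt.Println(evictToLowWater(entries, 100, 0.80)) // [a b]
}
```

Evicting down to a low-water mark rather than exactly to the cap leaves headroom, so the next few writes do not immediately trigger another eviction pass.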
func (*SizeCapLRU) Open ¶
func (l *SizeCapLRU) Open() error
Open loads the sidecar index. A missing or corrupt sidecar triggers a rebuild by walking EntriesRoot and using file mtime as the initial access time. Safe to call multiple times; subsequent calls are no-ops.
func (*SizeCapLRU) Record ¶
func (l *SizeCapLRU) Record(hash string, size int64)
Record registers a Save of hash at size bytes. Updates the running total and access time. Replaces any prior entry (sizes may shift across grammar-version bumps).
func (*SizeCapLRU) SetDeferEvictions ¶
func (l *SizeCapLRU) SetDeferEvictions(on bool)
SetDeferEvictions toggles whether MaybeEvict performs work. When on, hot-path callers (write batches) skip eviction so they don't pay the full sort+delete cost on every batch. Pair with an Evict() call at teardown to apply the cap once.
func (*SizeCapLRU) Stats ¶
func (l *SizeCapLRU) Stats() LRUStats
Stats returns a snapshot for --perf / metrics.
func (*SizeCapLRU) Touch ¶
func (l *SizeCapLRU) Touch(hash string)
Touch records a cache hit for hash at the current time. It is a no-op for unknown hashes; callers should only Touch hashes known to be on disk.
type StatsProvider ¶
type StatsProvider interface {
Stats() CacheStats
}
StatsProvider is an optional extension to Registered. A Registered cache that also implements StatsProvider shows up in AllStats().
type VersionedDir ¶
type VersionedDir struct {
Root string // absolute path to the cache root
EntriesDir string // subdir under Root whose contents are nuked on mismatch (default: "entries")
ExtraDirs []string // additional subdirs under Root whose contents are nuked on mismatch
ExtraFiles []string // additional files under Root whose contents are nuked on mismatch
Tokens []SchemaToken // written to {Root}/{Name} sidecar files
}
VersionedDir is a cache directory whose contents are invalidated when any of the declared schema tokens change.
func (VersionedDir) Clear ¶
func (v VersionedDir) Clear() error
Clear removes the entire cache root. Safe to call when the dir is missing.
func (VersionedDir) Open ¶
func (v VersionedDir) Open() (entriesDir string, err error)
Open ensures the directory tree exists, checks every token against its sidecar, and removes and recreates EntriesDir if any mismatch is found. On a first run (fresh repo), missing sidecars are written without nuking anything. Returns the absolute path to EntriesDir.
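The token-check flow can be sketched with plain files. The openVersioned and runDemo functions are hypothetical, the entries subdirectory is hardcoded to "entries", ExtraDirs and ExtraFiles are ignored, and the extra wiped return value exists only so the demo can show what happened.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

type SchemaToken struct{ Name, Value string }

// openVersioned compares each token with its sidecar file under root and
// wipes the entries directory when any stored value differs.
func openVersioned(root string, tokens []SchemaToken) (entriesDir string, wiped bool, err error) {
	entriesDir = filepath.Join(root, "entries")
	if err = os.MkdirAll(entriesDir, 0o755); err != nil {
		return "", false, err
	}
	for _, tok := range tokens {
		got, readErr := os.ReadFile(filepath.Join(root, tok.Name))
		switch {
		case errors.Is(readErr, os.ErrNotExist):
			// First run: the sidecar is written below and nothing is wiped.
		case readErr != nil:
			return "", false, readErr
		case string(got) != tok.Value:
			wiped = true
		}
	}
	if wiped {
		if err = os.RemoveAll(entriesDir); err != nil {
			return "", false, err
		}
		if err = os.MkdirAll(entriesDir, 0o755); err != nil {
			return "", false, err
		}
	}
	for _, tok := range tokens {
		if err = os.WriteFile(filepath.Join(root, tok.Name), []byte(tok.Value), 0o644); err != nil {
			return "", false, err
		}
	}
	return entriesDir, wiped, nil
}

// runDemo opens a temp root at version "1", stores one entry, reopens at
// version "2", and reports whether the entry survived.
func runDemo() (wipedFirst, wipedSecond, survived bool, err error) {
	root, err := os.MkdirTemp("", "versioned-dir-sketch")
	if err != nil {
		return
	}
	defer os.RemoveAll(root)

	dir, wipedFirst, err := openVersioned(root, []SchemaToken{{Name: "version", Value: "1"}})
	if err != nil {
		return
	}
	entry := filepath.Join(dir, "entry.gob")
	if err = os.WriteFile(entry, []byte("payload"), 0o644); err != nil {
		return
	}
	_, wipedSecond, err = openVersioned(root, []SchemaToken{{Name: "version", Value: "2"}})
	if err != nil {
		return
	}
	_, statErr := os.Stat(entry)
	survived = statErr == nil
	return
}

func main() {
	first, second, survived, err := runDemo()
	fmt.Println(first, second, survived, err)
}
```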