cacheutil

package
v0.1.0 Latest
Published: May 8, 2026 License: MIT Imports: 16 Imported by: 0

Documentation

Overview

Package cacheutil is the shared infrastructure layer for every on-disk cache in krit. It centralizes the disk-cache architecture so individual subsystems (parse trees, XML, resource indexes, FIR findings, library-model profiles, findings bundles, cross-findings, android findings, the analysis cache) reuse one coherent set of primitives instead of growing bespoke equivalents.

The disk-cache umbrella

Every persistent on-disk cache in the project should:

  • Implement Backend (Registered + StatsProvider) so it appears in AllRegistered/AllStats and can be cleared by the CLI's --clear-caches path. Caches that intentionally have no per-entry stats may implement Registered alone, but Backend is the default contract.
  • Use SizeCapLRU for size-bounded entries.
  • Use AsyncWriter for off-hot-path persistence.
  • Use VersionedDir to invalidate on schema/grammar changes.
  • Use ShardedEntryPath for the on-disk layout ({root}/{hash[:2]}/{hash[2:]}{ext}).
  • Use EncodeZstdGob/DecodeZstdGob for the wire format unless a domain-specific format is required.

New persistent caches MUST register here at init() so the global stats/clear/budget reporting picks them up automatically. Caches that memoize per-run derived facts (per-file imports, per-node summaries, etc.) belong in internal/filefacts/ instead — that is the in-memory run-scoped cache layer.

The split:

  • cacheutil → primitives + global registry for persistent on-disk caches.
  • filefacts → run-scoped per-file/per-node memoization.
  • rule-local sync.Maps → not allowed (enforced by ruleslinter).

Constants

const DefaultParseCacheCapBytes int64 = 200 * 1024 * 1024

DefaultParseCacheCapBytes is the default size cap for the parse cache: 200 MiB (200 * 1024 * 1024 bytes). The value was picked from measured usage across large Android and Kotlin repos; it covers typical Android/KMP repos with headroom while keeping very large repositories bounded.

Variables

This section is empty.

Functions

func ClearAll

func ClearAll(ctx ClearContext) error

ClearAll invokes Clear on every registered cache, passing ctx to each. It never short-circuits: errors from individual caches are accumulated and returned together via errors.Join.
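
The accumulate-and-join behaviour can be sketched as follows. This is a minimal illustration, not the package's implementation: clearAll, fakeCache, errDemo and the "clear <name>:" error prefix are hypothetical, and the ClearContext and Registered declarations are local copies of the shapes documented on this page.

```go
package main

import (
	"errors"
	"fmt"
)

// ClearContext and Registered are local copies of the documented shapes.
type ClearContext struct{ RepoDir string }

type Registered interface {
	Name() string
	Clear(ctx ClearContext) error
}

// errDemo is a hypothetical failure used by the demo caches below.
var errDemo = errors.New("disk busy")

// clearAll calls Clear on every cache, even after a failure, and joins
// whatever errors came back. errors.Join returns nil when there are none.
func clearAll(ctx ClearContext, caches []Registered) error {
	var errs []error
	for _, c := range caches {
		if err := c.Clear(ctx); err != nil {
			// The name prefix is an assumption made for readability.
			errs = append(errs, fmt.Errorf("clear %s: %w", c.Name(), err))
		}
	}
	return errors.Join(errs...)
}

// fakeCache is a hypothetical cache that counts Clear calls.
type fakeCache struct {
	name  string
	err   error
	calls *int
}

func (f fakeCache) Name() string { return f.name }

func (f fakeCache) Clear(ClearContext) error {
	*f.calls++
	return f.err
}

func main() {
	calls := 0
	err := clearAll(ClearContext{RepoDir: "."}, []Registered{
		fakeCache{"parse", errDemo, &calls},
		fakeCache{"xml", nil, &calls},
		fakeCache{"fir", errDemo, &calls},
	})
	fmt.Println("caches visited:", calls)
	fmt.Println("joined error wraps errDemo:", errors.Is(err, errDemo))
}
```

Because the joined error wraps each individual error, callers can still use errors.Is or errors.As against it.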

func DecodeZstdGob

func DecodeZstdGob(r io.Reader, v any) error

DecodeZstdGob decodes a zstd(gob(v)) cache payload from r.

func EncodeZstdGob

func EncodeZstdGob(v any) ([]byte, error)

EncodeZstdGob returns zstd(gob(v)) using the shared cache compression level. It is intended for whole-entry cache payloads whose final size is needed before writing LRU metadata.

func IsZstdFrame

func IsZstdFrame(b []byte) bool

IsZstdFrame reports whether b starts with the RFC 8878 zstd frame magic. Tests use this as a regression guard against silently writing raw gob.
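
A check of this kind can be sketched with the standard library alone. RFC 8878 defines the frame magic number as 0xFD2FB528, stored little-endian, so a frame begins with the bytes 28 B5 2F FD. The lowercase isZstdFrame below is a hypothetical stand-in and may differ from the package's own code.

```go
package main

import (
	"bytes"
	"fmt"
)

// zstdMagic is the RFC 8878 frame magic number 0xFD2FB528 in
// little-endian byte order, as it appears at the start of a frame.
var zstdMagic = []byte{0x28, 0xB5, 0x2F, 0xFD}

// isZstdFrame reports whether b begins with the zstd frame magic.
func isZstdFrame(b []byte) bool {
	return bytes.HasPrefix(b, zstdMagic)
}

func main() {
	fmt.Println(isZstdFrame([]byte{0x28, 0xB5, 0x2F, 0xFD, 0x00})) // true
	fmt.Println(isZstdFrame([]byte("raw gob bytes")))              // false
	fmt.Println(isZstdFrame(nil))                                  // false
}
```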

func Register

func Register(c Registered)

Register adds c to the global registry. Idempotent-by-Name: if a cache with the same name is already registered, it is replaced and a warning is logged.
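
The replace-by-name rule can be illustrated with a small registry. This is a sketch under assumptions: the registry type, its methods and nopCache are hypothetical, and keeping the replaced entry at its original position is a choice made here, not something the documentation states.

```go
package main

import (
	"fmt"
	"log"
	"sync"
)

// ClearContext and Registered are local copies of the documented shapes.
type ClearContext struct{ RepoDir string }

type Registered interface {
	Name() string
	Clear(ctx ClearContext) error
}

// registry is a hypothetical stand-in for the package's global registry.
type registry struct {
	mu     sync.Mutex
	caches []Registered
}

// register adds c, or replaces an existing cache with the same name.
// It reports whether a replacement happened.
func (r *registry) register(c Registered) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	for i, existing := range r.caches {
		if existing.Name() == c.Name() {
			log.Printf("cache %q registered twice; replacing", c.Name())
			r.caches[i] = c // assumption: the slot keeps its position
			return true
		}
	}
	r.caches = append(r.caches, c)
	return false
}

// names returns the registered names in registration order.
func (r *registry) names() []string {
	r.mu.Lock()
	defer r.mu.Unlock()
	out := make([]string, 0, len(r.caches))
	for _, c := range r.caches {
		out = append(out, c.Name())
	}
	return out
}

// nopCache is a hypothetical cache with nothing to clear.
type nopCache struct{ name string }

func (n nopCache) Name() string             { return n.name }
func (n nopCache) Clear(ClearContext) error { return nil }

func main() {
	var r registry
	r.register(nopCache{"parse"})
	r.register(nopCache{"xml"})
	replaced := r.register(nopCache{"parse"})
	fmt.Println(replaced, r.names()) // true [parse xml]
}
```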

func SetLogger

func SetLogger(l logger.Logger)

SetLogger replaces the package-level Logger. Intended for tests; production callers should not need to touch this.

func ShardedEntryPath

func ShardedEntryPath(root, hash, ext string) string

ShardedEntryPath returns "{root}/{hash[:2]}/{hash[2:]}{ext}". ext must include the leading dot (".json", ".gob"). Hashes shorter than 3 chars fall back to "{root}/_/{hash}{ext}".
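
The layout rule, including the short-hash fallback, can be sketched as below. The lowercase shardedEntryPath and the sep constant are hypothetical; the real function may build the path differently (this sketch uses filepath.Join, so separators follow the host OS).

```go
package main

import (
	"fmt"
	"path/filepath"
)

// sep is the host path separator, kept as a string for building
// expected paths in examples.
const sep = string(filepath.Separator)

// shardedEntryPath places an entry under a two-character shard
// directory. Hashes shorter than three characters cannot be split
// into a shard plus a non-empty file name, so they go under "_".
func shardedEntryPath(root, hash, ext string) string {
	if len(hash) < 3 {
		return filepath.Join(root, "_", hash+ext)
	}
	return filepath.Join(root, hash[:2], hash[2:]+ext)
}

func main() {
	fmt.Println(shardedEntryPath("cache", "abcdef0123", ".gob"))
	fmt.Println(shardedEntryPath("cache", "ab", ".json"))
}
```

On a Unix-like system this prints cache/ab/cdef0123.gob and cache/_/ab.json.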

Types

type AsyncWriter

type AsyncWriter struct {
	// contains filtered or unexported fields
}

AsyncWriter runs bounded background cache-write jobs. Submit is deliberately non-blocking: callers can fall back to synchronous writes when the queue is full or closing instead of silently dropping entries.

func NewAsyncWriter

func NewAsyncWriter(workers, queueSize int) *AsyncWriter

NewAsyncWriter starts workers background goroutines and buffers up to queueSize accepted jobs. Values below one are clamped to one.

func (*AsyncWriter) Close

func (w *AsyncWriter) Close() error

Close prevents new submissions, drains accepted jobs, and waits for worker goroutines to exit.

func (*AsyncWriter) Flush

func (w *AsyncWriter) Flush() error

Flush waits for all accepted jobs to finish and returns accumulated write errors observed so far, if any.

func (*AsyncWriter) Stats

func (w *AsyncWriter) Stats() AsyncWriterStats

func (*AsyncWriter) Submit

func (w *AsyncWriter) Submit(job func() (int64, error)) bool

Submit accepts a write job if the writer is open and its queue has capacity. It returns false without blocking when the caller should perform the write synchronously.
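
The non-blocking contract and the synchronous fallback it enables can be sketched with a buffered channel and a select with a default case. Everything here (writer, newWriter, submit, close, save) is a hypothetical miniature written for illustration; the real AsyncWriter also tracks stats and errors, which this sketch ignores.

```go
package main

import (
	"fmt"
	"sync"
)

type job func() (int64, error)

// writer is a hypothetical miniature of AsyncWriter.
type writer struct {
	mu     sync.Mutex
	closed bool
	jobs   chan job
	wg     sync.WaitGroup
}

func newWriter(workers, queueSize int) *writer {
	if workers < 1 {
		workers = 1
	}
	if queueSize < 1 {
		queueSize = 1
	}
	w := &writer{jobs: make(chan job, queueSize)}
	for i := 0; i < workers; i++ {
		w.wg.Add(1)
		go func() {
			defer w.wg.Done()
			for j := range w.jobs {
				j() // results are ignored in this sketch
			}
		}()
	}
	return w
}

// submit never blocks: it returns false when the writer is closed or
// the queue is full, leaving the write to the caller.
func (w *writer) submit(j job) bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.closed {
		return false
	}
	select {
	case w.jobs <- j:
		return true
	default:
		return false
	}
}

// close stops new submissions, drains accepted jobs, and waits for workers.
func (w *writer) close() {
	w.mu.Lock()
	if !w.closed {
		w.closed = true
		close(w.jobs)
	}
	w.mu.Unlock()
	w.wg.Wait()
}

// save shows the caller-side pattern: try async, else write inline.
func save(w *writer, j job) (async bool) {
	if w.submit(j) {
		return true
	}
	j() // queue full or closing: write synchronously instead of dropping
	return false
}

func main() {
	w := newWriter(1, 1)
	started, release := make(chan struct{}), make(chan struct{})
	save(w, func() (int64, error) { close(started); <-release; return 0, nil })
	<-started // the single worker is now busy and the queue is empty
	noop := func() (int64, error) { return 0, nil }
	fmt.Println(save(w, noop)) // true: fits in the queue
	fmt.Println(save(w, noop)) // false: queue full, ran synchronously
	close(release)
	w.close()
}
```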

type AsyncWriterStats

type AsyncWriterStats struct {
	Queued    int64 `json:"queued"`
	Completed int64 `json:"completed"`
	Failed    int64 `json:"failed"`
	Bytes     int64 `json:"bytes"`
}

type Backend

type Backend interface {
	Registered
	StatsProvider
}

Backend is the canonical interface every persistent on-disk cache in krit should satisfy. It is the union of Registered (Name + Clear) and StatsProvider (Stats). New caches should implement Backend so the CLI's --verbose stats, --clear-caches path, and global Budget report all pick them up uniformly. Caches that genuinely have no per-entry stats may implement Registered alone.
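
A minimal type satisfying Backend might look like the sketch below. The interface and struct declarations are local copies of the shapes on this page (CacheStats is trimmed to two fields), and demoCache with its ".demo-cache" directory is entirely hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// Local copies of the documented shapes; CacheStats is trimmed.
type ClearContext struct{ RepoDir string }

type CacheStats struct {
	Entries int
	Bytes   int64
}

type Registered interface {
	Name() string
	Clear(ctx ClearContext) error
}

type StatsProvider interface {
	Stats() CacheStats
}

type Backend interface {
	Registered
	StatsProvider
}

// demoCache is a hypothetical cache that would keep its entries under
// {RepoDir}/.demo-cache.
type demoCache struct {
	entries int
	bytes   int64
}

func (c *demoCache) Name() string { return "demo" }

// Clear resets the in-memory view and removes the on-disk directory.
// os.RemoveAll returns nil when the directory does not exist.
func (c *demoCache) Clear(ctx ClearContext) error {
	c.entries, c.bytes = 0, 0
	return os.RemoveAll(filepath.Join(ctx.RepoDir, ".demo-cache"))
}

func (c *demoCache) Stats() CacheStats {
	return CacheStats{Entries: c.entries, Bytes: c.bytes}
}

// Compile-time check that demoCache satisfies the full contract.
var _ Backend = (*demoCache)(nil)

func main() {
	var b Backend = &demoCache{entries: 3, bytes: 4096}
	fmt.Println(b.Name(), b.Stats().Entries)
	err := b.Clear(ClearContext{RepoDir: os.TempDir()})
	fmt.Println(err, b.Stats().Entries)
}
```

The blank-identifier assertion is a common way to make a missing method a compile error rather than a runtime surprise.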

type BudgetReport

type BudgetReport struct {
	CapBytes  int64       `json:"capBytes"`
	UsedBytes int64       `json:"usedBytes"`
	PerCache  []BudgetRow `json:"perCache"`
}

BudgetReport is the aggregate slice-of-cap view across registered caches. Rows are sorted by Bytes descending so the biggest consumer is visible first. CapBytes is the conceptual global cap; PctOfCap is rounded to two decimal places.

func Budget

func Budget(capBytes int64) BudgetReport

Budget returns a BudgetReport built from the current stats registry. capBytes is the conceptual global cap (e.g. DefaultParseCacheCapBytes) and is used only to compute PctOfCap; pass a value <= 0 to emit zeroed percentages. Budget is called from --perf paths. Cold and warm runs both produce a report: on a cold run UsedBytes is 0 and every PerCache row has Bytes=0.
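
The report arithmetic can be sketched as follows. This is an illustration only: the lowercase budget function takes its usage figures as a map rather than reading a registry, and breaking ties by name is an assumption added to keep the output deterministic.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Local copies of the documented report shapes.
type BudgetRow struct {
	Name     string
	Bytes    int64
	PctOfCap float64
}

type BudgetReport struct {
	CapBytes  int64
	UsedBytes int64
	PerCache  []BudgetRow
}

// budget builds a report from per-cache byte counts. Percentages are
// rounded to two decimal places and left at zero when capBytes <= 0.
func budget(capBytes int64, usage map[string]int64) BudgetReport {
	rep := BudgetReport{CapBytes: capBytes}
	for name, b := range usage {
		row := BudgetRow{Name: name, Bytes: b}
		if capBytes > 0 {
			row.PctOfCap = math.Round(float64(b)/float64(capBytes)*10000) / 100
		}
		rep.UsedBytes += b
		rep.PerCache = append(rep.PerCache, row)
	}
	// Biggest consumer first; ties broken by name (an assumption).
	sort.Slice(rep.PerCache, func(i, j int) bool {
		a, b := rep.PerCache[i], rep.PerCache[j]
		if a.Bytes != b.Bytes {
			return a.Bytes > b.Bytes
		}
		return a.Name < b.Name
	})
	return rep
}

func main() {
	const capBytes = 200 * 1024 * 1024
	rep := budget(capBytes, map[string]int64{
		"xml":   1 << 20,  // 1 MiB
		"parse": 50 << 20, // 50 MiB
	})
	for _, row := range rep.PerCache {
		fmt.Printf("%-6s %9d bytes %6.2f%%\n", row.Name, row.Bytes, row.PctOfCap)
	}
}
```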

type BudgetRow

type BudgetRow struct {
	Name     string  `json:"name"`
	Bytes    int64   `json:"bytes"`
	PctOfCap float64 `json:"pctOfCap"`
}

BudgetRow is a single cache's contribution to the global cap.

type CacheStats

type CacheStats struct {
	Entries        int   `json:"entries"`
	Bytes          int64 `json:"bytes"`
	Hits           int64 `json:"hits"`
	Misses         int64 `json:"misses"`
	Evictions      int64 `json:"evictions"`
	LastWriteUnix  int64 `json:"lastWriteUnix,omitempty"`
	AsyncQueued    int64 `json:"asyncQueued,omitempty"`
	AsyncCompleted int64 `json:"asyncCompleted,omitempty"`
	AsyncFailed    int64 `json:"asyncFailed,omitempty"`
	AsyncBytes     int64 `json:"asyncBytes,omitempty"`
}

CacheStats is a point-in-time snapshot of one on-disk cache. Counters are maintained on the hot path via atomic int64 so Stats() returns in O(1) without a per-lookup lock. Entries and Bytes reflect the running in-memory view maintained by each subsystem; a full disk walk is reserved for Probe() (not yet uniformly implemented) triggered by --verbose.
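
Lock-free hot-path counters of the kind described above can be sketched with sync/atomic. The counters type, its methods and hammer are hypothetical, and CacheStats is trimmed to two fields. Note that each field is loaded separately, so a snapshot taken during concurrent updates is not a single atomic view across fields.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// CacheStats is a trimmed local copy of the documented snapshot type.
type CacheStats struct {
	Hits   int64
	Misses int64
}

// counters is a hypothetical set of hot-path counters.
type counters struct {
	hits   atomic.Int64
	misses atomic.Int64
}

// lookup records the outcome of one cache lookup without taking a lock.
func (c *counters) lookup(found bool) {
	if found {
		c.hits.Add(1)
		return
	}
	c.misses.Add(1)
}

// stats returns a snapshot in O(1).
func (c *counters) stats() CacheStats {
	return CacheStats{Hits: c.hits.Load(), Misses: c.misses.Load()}
}

// hammer records lookups from several goroutines at once.
func hammer(c *counters, goroutines, perGoroutine int) {
	var wg sync.WaitGroup
	for g := 0; g < goroutines; g++ {
		wg.Add(1)
		go func(g int) {
			defer wg.Done()
			for i := 0; i < perGoroutine; i++ {
				c.lookup(g%2 == 0) // even goroutines hit, odd ones miss
			}
		}(g)
	}
	wg.Wait()
}

func main() {
	var c counters
	hammer(&c, 8, 1000)
	fmt.Printf("%+v\n", c.stats()) // {Hits:4000 Misses:4000}
}
```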

type ClearContext

type ClearContext struct {
	RepoDir string
}

ClearContext is passed to every Registered.Clear() call. Subsystems that need runtime-resolved paths (e.g. the repo root) read them from here rather than maintaining their own globals.

type LRUStats

type LRUStats struct {
	Entries int
	Bytes   int64
	Cap     int64
}

LRUStats is a point-in-time snapshot of a SizeCapLRU (entry count, total bytes, and the configured cap), intended for metrics.

type NamedCacheStats

type NamedCacheStats struct {
	Name  string     `json:"name"`
	Stats CacheStats `json:"stats"`
}

NamedCacheStats pairs a cache name with its current stats.

func AllStats

func AllStats() []NamedCacheStats

AllStats returns a snapshot of stats for every registered cache that also implements StatsProvider. Order matches registration order.

type Registered

type Registered interface {
	Name() string
	Clear(ctx ClearContext) error
}

Registered is anything that can be enumerated and cleared wholesale.

func AllRegistered

func AllRegistered() []Registered

AllRegistered returns a snapshot of every currently-registered cache. Used for --verbose output and tests.

type SchemaToken

type SchemaToken struct {
	Name  string // filename under Root, e.g. "version" or "grammar-version"
	Value string
}

SchemaToken is one named-version dimension.

type SizeCapLRU

type SizeCapLRU struct {
	EntriesRoot  string  // absolute path to the entries subtree
	IndexPath    string  // absolute path to the sidecar index file
	LockPath     string  // absolute path to the eviction lock file
	Ext          string  // entry file extension, including leading dot
	CapBytes     int64   // size cap in bytes; <=0 disables eviction
	LowWaterFrac float64 // evict to this fraction of CapBytes (default 0.80)
	Remove       func(hash string) error
	RemoveBatch  func(hashes []string) error
	TrustIndex   bool // skip per-entry filesystem validation for packed stores
	// contains filtered or unexported fields
}

SizeCapLRU is a reusable LRU cap for a sharded cache dir.

func (*SizeCapLRU) Evict

func (l *SizeCapLRU) Evict() (int, error)

Evict runs eviction unconditionally (subject only to the cap and lock gating). Use this at teardown after deferring evictions during the hot write path so the cap is applied once instead of per-batch.

func (*SizeCapLRU) Flush

func (l *SizeCapLRU) Flush() error

Flush persists the index to the sidecar file if it has changed. Idempotent; safe to call on shutdown.

func (*SizeCapLRU) Forget

func (l *SizeCapLRU) Forget(hash string)

Forget removes hash from the index. Callers pair this with an os.Remove of the underlying file (e.g. on decode error).

func (*SizeCapLRU) MaybeEvict

func (l *SizeCapLRU) MaybeEvict() (int, error)

MaybeEvict runs eviction when the cap is exceeded. Returns the number of entries removed and any error persisting the sidecar. When another process holds the lock, skips with (0, nil) — last-write-wins on the race is acceptable; the cap is a soft target. A no-op while SetDeferEvictions(true) is active; callers should Evict() at teardown.
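
The selection step behind a size-capped LRU can be sketched as below: once the total exceeds the cap, the least recently accessed entries are chosen until the total falls to the low-water mark (CapBytes * LowWaterFrac). The entry type and evictToLowWater are hypothetical, and the sketch covers only victim selection, not locking, file removal, or sidecar persistence.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical index record: size plus last access time.
type entry struct {
	hash  string
	size  int64
	atime int64 // unix seconds of last access
}

// evictToLowWater returns the hashes to remove, oldest access first,
// so that the remaining total is at or below capBytes*lowWaterFrac.
// It returns nil when the cap is disabled or not exceeded.
func evictToLowWater(entries []entry, capBytes int64, lowWaterFrac float64) []string {
	var total int64
	for _, e := range entries {
		total += e.size
	}
	if capBytes <= 0 || total <= capBytes {
		return nil
	}
	target := int64(float64(capBytes) * lowWaterFrac)

	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]entry(nil), entries...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].atime < sorted[j].atime })

	var victims []string
	for _, e := range sorted {
		if total <= target {
			break
		}
		victims = append(victims, e.hash)
		total -= e.size
	}
	return victims
}

func main() {
	entries := []entry{
		{hash: "aa11", size: 50, atime: 1},
		{hash: "bb22", size: 40, atime: 3},
		{hash: "cc33", size: 30, atime: 2},
	}
	// Total is 120 against a cap of 100, so eviction runs down to 80.
	fmt.Println(evictToLowWater(entries, 100, 0.80)) // [aa11]
}
```

Evicting below the cap rather than exactly to it leaves headroom, so the next few writes do not immediately trigger another pass.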

func (*SizeCapLRU) Open

func (l *SizeCapLRU) Open() error

Open loads the sidecar index. A missing or corrupt sidecar triggers a rebuild by walking EntriesRoot and using file mtime as the initial access time. Safe to call multiple times; subsequent calls are no-ops.

func (*SizeCapLRU) Record

func (l *SizeCapLRU) Record(hash string, size int64)

Record registers a Save of hash at size bytes. Updates the running total and access time. Replaces any prior entry (sizes may shift across grammar-version bumps).

func (*SizeCapLRU) SetDeferEvictions

func (l *SizeCapLRU) SetDeferEvictions(on bool)

SetDeferEvictions toggles whether MaybeEvict performs work. When on, hot-path callers (write batches) skip eviction so they don't pay the full sort+delete cost on every batch. Pair with an Evict() call at teardown to apply the cap once.
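
The effect of deferring can be seen by counting eviction passes in a toy model. cappedIndex and writeBatches below are hypothetical, the eviction itself is reduced to resetting a byte total, and the type is not safe for concurrent use; the sketch only shows why one pass at teardown is cheaper than one pass per batch.

```go
package main

import "fmt"

// cappedIndex is a hypothetical stand-in for SizeCapLRU's bookkeeping.
type cappedIndex struct {
	capBytes int64
	total    int64
	deferred bool
	passes   int // number of full eviction passes executed
}

func (c *cappedIndex) record(size int64)         { c.total += size }
func (c *cappedIndex) setDeferEvictions(on bool) { c.deferred = on }

// maybeEvict is the hot-path hook: a no-op while evictions are deferred.
func (c *cappedIndex) maybeEvict() {
	if c.deferred {
		return
	}
	c.evict()
}

// evict applies the cap regardless of the deferral flag.
func (c *cappedIndex) evict() {
	if c.capBytes <= 0 || c.total <= c.capBytes {
		return
	}
	c.passes++
	c.total = c.capBytes // stand-in for the real sort+delete work
}

// writeBatches records one 60-byte entry per batch, calls maybeEvict
// after each, applies the cap once at teardown, and returns the number
// of eviction passes that ran.
func writeBatches(c *cappedIndex, batches int, deferEvictions bool) int {
	c.setDeferEvictions(deferEvictions)
	for i := 0; i < batches; i++ {
		c.record(60)
		c.maybeEvict()
	}
	c.evict() // teardown
	return c.passes
}

func main() {
	fmt.Println(writeBatches(&cappedIndex{capBytes: 100}, 5, false)) // 4
	fmt.Println(writeBatches(&cappedIndex{capBytes: 100}, 5, true))  // 1
}
```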

func (*SizeCapLRU) Stats

func (l *SizeCapLRU) Stats() LRUStats

Stats returns a snapshot for --perf / metrics.

func (*SizeCapLRU) Touch

func (l *SizeCapLRU) Touch(hash string)

Touch records a cache hit for hash, stamping it with the current time. It is a no-op for unknown hashes; callers should only Touch hashes known to be on disk.

type StatsProvider

type StatsProvider interface {
	Stats() CacheStats
}

StatsProvider is an optional extension to Registered. A Registered cache that also implements StatsProvider shows up in AllStats().

type VersionedDir

type VersionedDir struct {
	Root       string        // absolute path to the cache root
	EntriesDir string        // subdir under Root whose contents are nuked on mismatch (default: "entries")
	ExtraDirs  []string      // additional subdirs under Root whose contents are nuked on mismatch
	ExtraFiles []string      // additional files under Root whose contents are nuked on mismatch
	Tokens     []SchemaToken // written to {Root}/{Name} sidecar files
}

VersionedDir is a cache directory whose contents are invalidated when any of the declared schema tokens change.

func (VersionedDir) Clear

func (v VersionedDir) Clear() error

Clear removes the entire cache root. Safe to call when the dir is missing.

func (VersionedDir) Open

func (v VersionedDir) Open() (entriesDir string, err error)

Open ensures the directory tree exists, checks every token against its sidecar, and removes-and-recreates EntriesDir if any mismatch is found. Missing sidecars on first run are written without nuking (fresh repo). Returns the absolute path to EntriesDir.
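
The token check can be sketched with plain files. This is a simplified illustration under assumptions: token, openVersioned and demo are hypothetical names, the entries subdirectory is fixed to "entries", and ExtraDirs and ExtraFiles are not modelled.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// token is a hypothetical stand-in for SchemaToken.
type token struct{ Name, Value string }

// openVersioned compares each token with its sidecar file under root,
// wipes and recreates the entries dir on any mismatch, then rewrites
// the sidecars. It reports whether a wipe happened.
func openVersioned(root string, tokens []token) (string, bool, error) {
	entries := filepath.Join(root, "entries")
	if err := os.MkdirAll(entries, 0o755); err != nil {
		return "", false, err
	}
	mismatch := false
	for _, t := range tokens {
		got, err := os.ReadFile(filepath.Join(root, t.Name))
		switch {
		case errors.Is(err, os.ErrNotExist):
			// First run: nothing to compare against, so nothing is wiped.
		case err != nil:
			return "", false, err
		case string(got) != t.Value:
			mismatch = true
		}
	}
	if mismatch {
		if err := os.RemoveAll(entries); err != nil {
			return "", false, err
		}
		if err := os.MkdirAll(entries, 0o755); err != nil {
			return "", false, err
		}
	}
	for _, t := range tokens {
		if err := os.WriteFile(filepath.Join(root, t.Name), []byte(t.Value), 0o644); err != nil {
			return "", false, err
		}
	}
	return entries, mismatch, nil
}

// demo opens one temp root three times (first run, same token, bumped
// token) and reports, per open, whether an entry left by the previous
// open survived.
func demo() ([]bool, error) {
	root, err := os.MkdirTemp("", "versioned-*")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)
	var survived []bool
	for _, v := range []string{"1", "1", "2"} {
		entries, _, err := openVersioned(root, []token{{Name: "version", Value: v}})
		if err != nil {
			return nil, err
		}
		probe := filepath.Join(entries, "e.gob")
		_, statErr := os.Stat(probe)
		survived = append(survived, statErr == nil)
		if err := os.WriteFile(probe, []byte("x"), 0o644); err != nil {
			return nil, err
		}
	}
	return survived, nil
}

func main() {
	survived, err := demo()
	fmt.Println(survived, err) // [false true false] <nil>
}
```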
