Documentation
Overview
Package budget provides memory budget calculation and auto-tuning for codefang history analysis.
Constants
const (
    // BaseOverhead is the fixed Go runtime + libgit2 overhead.
    // Includes shared mmap of pack files (~200 MB for large repos).
    BaseOverhead = 250 * units.MiB

    // RepoHandleSize is the Go-visible memory per worker for libgit2 repository handle.
    RepoHandleSize = 10 * units.MiB

    // WorkerNativeOverhead is the per-worker C/mmap overhead from libgit2.
    // Each worker opens the repo and mmaps pack index files; the OS faults in
    // pack data pages during object lookups. Empirically ~50-100 MB per worker
    // on large repos due to shared pack page cache pressure.
    WorkerNativeOverhead = 50 * units.MiB

    // AvgDiffSize is the average size of a cached diff entry.
    AvgDiffSize = 2 * units.KiB

    // AvgCommitDataSize is the average size of in-flight commit data.
    AvgCommitDataSize = 64 * units.KiB

    // MaxBlobCacheSize caps the blob cache to avoid dominating the budget.
    // Beyond 256 MB the hit rate improvement is marginal for most repositories.
    MaxBlobCacheSize = 256 * units.MiB

    // MaxDiffCacheEntries caps the diff cache. Beyond 20K entries the benefit
    // is marginal and memory cost grows linearly.
    MaxDiffCacheEntries = 20000

    // DefaultMwindowMappedLimit is libgit2's default mmap limit (8 GiB on 64-bit).
    // This allows pack file windows to consume enormous RSS on large repos.
    DefaultMwindowMappedLimit = 8 * units.GiB

    // DefaultLibgit2CacheSize is libgit2's default object cache (256 MiB).
    DefaultLibgit2CacheSize = 256 * units.MiB

    // NativeMemoryPercent is the fraction of the budget reserved for libgit2
    // native memory (mwindow + object cache + decompression buffers).
    // The rest is available to Go heap, caches, and buffers.
    NativeMemoryPercent = 25

    // MwindowCacheRatio controls how the native allocation is split:
    // 30% for mwindow (mmap'd pack data), 70% for object cache.
    // Lowered from 80 to reduce RSS from pack file mmap windows.
    // The larger object cache compensates by keeping decompressed objects
    // longer, reducing re-decompression overhead.
    MwindowCacheRatio = 30
)
Component memory sizes (empirically measured).
const (
    // CacheAllocationPercent is the percentage of available budget for caches.
    CacheAllocationPercent = 60

    // WorkerAllocationPercent is the percentage of available budget for workers.
    WorkerAllocationPercent = 30

    // BufferAllocationPercent is the percentage of available budget for buffers.
    BufferAllocationPercent = 10

    // SlackPercent is reserved for runtime overhead.
    SlackPercent = 5

    // BlobCacheRatio is the portion of cache allocation for blob cache.
    BlobCacheRatio = 80

    // DiffCacheRatio is the portion of cache allocation for diff cache.
    DiffCacheRatio = 20
)
Allocation proportions for budget distribution.
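The following is a minimal sketch of how these proportions could divide a budget. The package does not document the solver's order of operations, so everything here is an assumption: subtracting BaseOverhead first, reserving SlackPercent next, and then splitting the remainder. The allocation type and allocate function are hypothetical names, and the cache caps (MaxBlobCacheSize, MaxDiffCacheEntries) are deliberately left out.

```go
package main

import "fmt"

const (
	MiB = int64(1) << 20
	GiB = int64(1) << 30

	// Local mirrors of the documented constants.
	baseOverhead            = 250 * MiB
	cacheAllocationPercent  = 60
	workerAllocationPercent = 30
	bufferAllocationPercent = 10
	slackPercent            = 5
	blobCacheRatio          = 80
)

// allocation is a hypothetical result type, not part of the package API.
type allocation struct {
	Cache, Workers, Buffers int64
	BlobCache, DiffCache    int64
}

// allocate splits a budget under the assumed order:
// base overhead, then slack, then the 60/30/10 split.
func allocate(budget int64) allocation {
	available := budget - baseOverhead
	if available <= 0 {
		return allocation{}
	}
	usable := available * (100 - slackPercent) / 100
	cache := usable * cacheAllocationPercent / 100
	blob := cache * blobCacheRatio / 100
	return allocation{
		Cache:     cache,
		Workers:   usable * workerAllocationPercent / 100,
		Buffers:   usable * bufferAllocationPercent / 100,
		BlobCache: blob,
		DiffCache: cache - blob, // remainder, i.e. the DiffCacheRatio share
	}
}

func main() {
	a := allocate(2 * GiB)
	fmt.Printf("cache=%d MiB workers=%d MiB buffers=%d MiB\n",
		a.Cache/MiB, a.Workers/MiB, a.Buffers/MiB)
	fmt.Printf("blob=%d MiB diff=%d MiB\n", a.BlobCache/MiB, a.DiffCache/MiB)
}
```

Computing the diff cache share as the remainder keeps the two cache shares summing exactly to the cache allocation despite integer division.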
const (
    // MinimumBudget is the smallest budget the solver will accept.
    // Must exceed BaseOverhead (250 MiB) plus room for at least 1 worker.
    MinimumBudget = 512 * units.MiB

    // DefaultArenaSize is the default blob arena size.
    // 8 MiB reduces fallback to per-blob C malloc (which accumulates in
    // glibc arenas as retained native RSS) by fitting ~97% of blob batches.
    DefaultArenaSize = 8 * units.MiB

    // MaxArenaSize is the maximum arena size allowed.
    MaxArenaSize = 16 * units.MiB

    // DefaultCommitBatchSize is used for all budget-derived configs.
    DefaultCommitBatchSize = 100

    // MinWorkers is the minimum number of workers.
    MinWorkers = 1

    // MinBufferSize is the minimum buffer size.
    MinBufferSize = 2

    // MinDiffCacheSize is the minimum diff cache entries.
    MinDiffCacheSize = 100

    // MinBlobCacheSize is the minimum blob cache size.
    MinBlobCacheSize = 1 * units.MiB

    // OptimalWorkerRatio is the percentage of CPU cores to use for workers.
    // Testing shows ~60% provides optimal performance due to contention overhead.
    OptimalWorkerRatio = 60

    // UASTPipelineWorkerRatio is the percentage of CPU cores for UAST pipeline workers.
    UASTPipelineWorkerRatio = 40

    // LeafWorkerDivisor controls leaf worker count: NumCPU / divisor.
    LeafWorkerDivisor = 3

    // MinLeafWorkers is the minimum number of leaf workers.
    MinLeafWorkers = 4
)
Solver constraints.
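As an illustration of the CPU-based constraints, the sketch below derives worker counts from a core count. The function names are hypothetical, and truncating integer division plus the floor applied to UAST workers are assumptions; the package may round or clamp differently.

```go
package main

import (
	"fmt"
	"runtime"
)

// Local mirrors of the documented constants.
const (
	minWorkers              = 1
	optimalWorkerRatio      = 60
	uastPipelineWorkerRatio = 40
	leafWorkerDivisor       = 3
	minLeafWorkers          = 4
)

// atLeast returns n, raised to floor when n is smaller.
func atLeast(n, floor int) int {
	if n < floor {
		return floor
	}
	return n
}

// workersFor applies OptimalWorkerRatio (assumed: truncating division).
func workersFor(cpus int) int {
	return atLeast(cpus*optimalWorkerRatio/100, minWorkers)
}

// uastWorkersFor applies UASTPipelineWorkerRatio (assumed floor of 1).
func uastWorkersFor(cpus int) int {
	return atLeast(cpus*uastPipelineWorkerRatio/100, minWorkers)
}

// leafWorkersFor applies NumCPU / LeafWorkerDivisor with the MinLeafWorkers floor.
func leafWorkersFor(cpus int) int {
	return atLeast(cpus/leafWorkerDivisor, minLeafWorkers)
}

func main() {
	n := runtime.NumCPU()
	fmt.Printf("cpus=%d workers=%d uast=%d leaf=%d\n",
		n, workersFor(n), uastWorkersFor(n), leafWorkersFor(n))
}
```

Under these assumptions a 24-core machine gets 14 workers, 9 UAST pipeline workers, and 8 leaf workers, while a single-core machine falls back to the floors of 1, 1, and 4.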
const (
    // StaticBaseOverhead is the fixed Go runtime + loaded analyzers overhead.
    // Lower than history's BaseOverhead because no libgit2 repo is opened.
    StaticBaseOverhead = 150 * units.MiB

    // StaticWorkerFootprint is the per-worker memory for parser + tree-sitter
    // native tree + Go-side node.Node tree + file content buffer.
    StaticWorkerFootprint = 50 * units.MiB

    // StaticAvgItemBytes is the average gob-encoded size of a report item
    // (map[string]any with ~8 keys). Used to estimate spill threshold.
    StaticAvgItemBytes = 512

    // StaticAnalyzerCount is the number of static analyzers that use
    // SpillableDataCollector (complexity, halstead, comments, cohesion, clones, imports).
    StaticAnalyzerCount = 6

    // MinStaticBudget is the smallest budget that produces a non-zero config.
    // Must cover base overhead plus at least one worker.
    MinStaticBudget = StaticBaseOverhead + StaticWorkerFootprint + 10*units.MiB

    // MaxStaticWorkers caps workers even with large budgets.
    MaxStaticWorkers = 16

    // MinStaticSpillThreshold is the floor for spill threshold.
    MinStaticSpillThreshold = 1000

    // MaxStaticSpillThreshold is the ceiling for spill threshold.
    MaxStaticSpillThreshold = 100000
)
Static analysis cost model constants (empirically measured).
const DefaultMallocArenaMax = 2
DefaultMallocArenaMax limits the number of glibc malloc arenas to prevent RSS bloat. On 64-bit systems glibc defaults to 8 × cores arenas, so a 24-core machine can retain freed memory across up to 192 arenas, inflating RSS by 3-4x. A value of 2 reduces peak RSS by ~60% compared with the default, with minimal throughput impact when combined with malloc_trim(0) between chunks to reclaim freed arena memory.
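The arithmetic above, and one common way to apply such a limit, can be sketched as follows. glibc reads the MALLOC_ARENA_MAX environment variable when a process starts, so the sketch builds an environment for a child process. The helper names are hypothetical, and the package documentation does not say how codefang itself applies the value.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Local mirror of DefaultMallocArenaMax.
const defaultMallocArenaMax = 2

// glibcDefaultArenas returns glibc's default arena ceiling on 64-bit systems.
func glibcDefaultArenas(cores int) int {
	return 8 * cores
}

// withArenaMax returns a copy of env with MALLOC_ARENA_MAX appended.
// Hypothetical helper: when os/exec receives duplicate keys, the last one wins.
func withArenaMax(env []string, n int) []string {
	out := append([]string{}, env...)
	return append(out, "MALLOC_ARENA_MAX="+strconv.Itoa(n))
}

func main() {
	fmt.Println(glibcDefaultArenas(24)) // 192

	// The resulting slice could be assigned to exec.Cmd.Env for a child process.
	env := withArenaMax(os.Environ(), defaultMallocArenaMax)
	fmt.Println(env[len(env)-1]) // MALLOC_ARENA_MAX=2
}
```

Setting the variable with os.Setenv inside an already running process would not affect that process, because glibc has already initialized its allocator by then.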
Variables
var (
    // ErrBudgetTooSmall indicates the budget is below the minimum required.
    ErrBudgetTooSmall = errors.New("memory budget is too small")
)
Solver errors.
Functions
func SolveForBudget
func SolveForBudget(budget int64) (framework.CoordinatorConfig, error)
SolveForBudget calculates optimal CoordinatorConfig for the given memory budget. The solver distributes available memory across workers, caches, and buffers while ensuring the total estimated usage stays within budget.
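Callers will typically want to distinguish a too-small budget from other failures. The sketch below shows the usual errors.Is pattern with local stand-ins, since the package import path is not given here. checkBudget is a hypothetical substitute for SolveForBudget, and both the comparison against MinimumBudget and the error wrapping are assumptions about its behavior.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	MiB           = int64(1) << 20
	minimumBudget = 512 * MiB // mirrors MinimumBudget
)

// errBudgetTooSmall is a local stand-in for budget.ErrBudgetTooSmall.
var errBudgetTooSmall = errors.New("memory budget is too small")

// checkBudget is a hypothetical stand-in for the solver's validation step.
func checkBudget(budget int64) error {
	if budget < minimumBudget {
		return fmt.Errorf("budget %d MiB: %w", budget/MiB, errBudgetTooSmall)
	}
	return nil
}

// isTooSmall reports whether err is, or wraps, the too-small sentinel.
func isTooSmall(err error) bool {
	return errors.Is(err, errBudgetTooSmall)
}

func main() {
	for _, b := range []int64{256 * MiB, 1024 * MiB} {
		if err := checkBudget(b); isTooSmall(err) {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Printf("accepted: %d MiB\n", b/MiB)
	}
}
```

Using errors.Is rather than == keeps the check correct even if the solver wraps the sentinel with extra context.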
Types
type NativeLimits
NativeLimits holds libgit2 global memory limits derived from the budget.
func NativeLimitsForBudget
func NativeLimitsForBudget(budget int64) NativeLimits
NativeLimitsForBudget computes libgit2 memory limits proportional to the memory budget. It returns zero values when no budget is set, which signals that the defaults should be used.
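A minimal sketch of the proportional split, based only on the NativeMemoryPercent and MwindowCacheRatio comments: 25% of the budget goes to native memory, of which 30% is for mwindow and the rest for the object cache. The fields of NativeLimits are not shown in this documentation, so nativeSplit and its field names are hypothetical, and any clamping against libgit2's defaults is omitted.

```go
package main

import "fmt"

const (
	MiB = int64(1) << 20
	GiB = int64(1) << 30

	nativeMemoryPercent = 25 // mirrors NativeMemoryPercent
	mwindowCacheRatio   = 30 // mirrors MwindowCacheRatio
)

// nativeSplit is a hypothetical stand-in for NativeLimits.
type nativeSplit struct {
	MwindowBytes int64 // limit for mmap'd pack windows
	CacheBytes   int64 // limit for the object cache
}

// splitNative divides the native share of the budget between
// mwindow and object cache; a non-positive budget yields zero values.
func splitNative(budget int64) nativeSplit {
	if budget <= 0 {
		return nativeSplit{}
	}
	native := budget * nativeMemoryPercent / 100
	mwindow := native * mwindowCacheRatio / 100
	return nativeSplit{MwindowBytes: mwindow, CacheBytes: native - mwindow}
}

func main() {
	s := splitNative(4 * GiB)
	fmt.Printf("mwindow=%d MiB cache=%d MiB\n", s.MwindowBytes/MiB, s.CacheBytes/MiB)
}
```

With a 4 GiB budget this reserves 1 GiB for native memory, roughly 307 MiB of it for mwindow and the remainder for the object cache.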
type StaticBudgetConfig
StaticBudgetConfig holds budget-derived parameters for the static analysis phase. Zero values mean "use defaults": no override is applied.
func SolveStaticBudget
func SolveStaticBudget(budgetBytes int64) StaticBudgetConfig
SolveStaticBudget derives static analysis parameters from a memory budget. Returns zero-value config when budget is zero, negative, or below minimum.
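To make the static cost model concrete, here is a sketch of a worker-count derivation that is consistent with the documented constants. The formula is an assumption: subtract StaticBaseOverhead, divide by StaticWorkerFootprint, and cap at MaxStaticWorkers. staticWorkers is a hypothetical name, and the spill threshold calculation is not attempted because its formula is not documented.

```go
package main

import "fmt"

const (
	MiB = int64(1) << 20

	// Local mirrors of the documented constants.
	staticBaseOverhead    = 150 * MiB
	staticWorkerFootprint = 50 * MiB
	minStaticBudget       = staticBaseOverhead + staticWorkerFootprint + 10*MiB // 210 MiB
	maxStaticWorkers      = 16
)

// staticWorkers returns 0 ("use defaults") for budgets that are zero,
// negative, or below the minimum; otherwise an assumed worker count.
func staticWorkers(budget int64) int {
	if budget < minStaticBudget {
		return 0
	}
	n := int((budget - staticBaseOverhead) / staticWorkerFootprint)
	if n > maxStaticWorkers {
		n = maxStaticWorkers
	}
	return n
}

func main() {
	for _, b := range []int64{200 * MiB, 210 * MiB, 500 * MiB, 1024 * MiB} {
		fmt.Printf("budget=%d MiB workers=%d\n", b/MiB, staticWorkers(b))
	}
}
```

Under this assumed formula, 200 MiB falls below the 210 MiB minimum and yields 0, 500 MiB yields 7 workers, and 1 GiB hits the cap of 16.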