budget

package
v1.0.0-rc.1
Published: May 13, 2026 License: Apache-2.0 Imports: 4 Imported by: 0

Documentation

Overview

Package budget provides memory budget calculation and auto-tuning for codefang history analysis.

Constants

const (
	// BaseOverhead is the fixed Go runtime + libgit2 overhead.
	// Includes shared mmap of pack files (~200 MB for large repos).
	BaseOverhead = 250 * units.MiB

	// RepoHandleSize is the Go-visible memory per worker for the libgit2 repository handle.
	RepoHandleSize = 10 * units.MiB

	// WorkerNativeOverhead is the per-worker C/mmap overhead from libgit2.
	// Each worker opens the repo and mmaps pack index files; the OS faults in
	// pack data pages during object lookups. Empirically ~50-100 MB per worker
	// on large repos due to shared pack page cache pressure.
	WorkerNativeOverhead = 50 * units.MiB

	// AvgDiffSize is the average size of a cached diff entry.
	AvgDiffSize = 2 * units.KiB

	// AvgCommitDataSize is the average size of in-flight commit data.
	AvgCommitDataSize = 64 * units.KiB

	// MaxBlobCacheSize caps the blob cache to avoid dominating the budget.
	// Beyond 256 MB the hit rate improvement is marginal for most repositories.
	MaxBlobCacheSize = 256 * units.MiB

	// MaxDiffCacheEntries caps the diff cache. Beyond 20K entries the benefit
	// is marginal and memory cost grows linearly.
	MaxDiffCacheEntries = 20000

	// DefaultMwindowMappedLimit is libgit2's default mmap limit (8 GiB on 64-bit).
	// This allows pack file windows to consume enormous RSS on large repos.
	DefaultMwindowMappedLimit = 8 * units.GiB

	// DefaultLibgit2CacheSize is libgit2's default object cache (256 MiB).
	DefaultLibgit2CacheSize = 256 * units.MiB

	// NativeMemoryPercent is the fraction of the budget reserved for libgit2
	// native memory (mwindow + object cache + decompression buffers).
	// The rest is available to Go heap, caches, and buffers.
	NativeMemoryPercent = 25

	// MwindowCacheRatio controls how the native allocation is split:
	// 30% for mwindow (mmap'd pack data), 70% for object cache.
	// The mwindow share was lowered from 80 to reduce RSS from pack file mmap windows.
	// The larger object cache compensates by keeping decompressed objects
	// longer, reducing re-decompression overhead.
	MwindowCacheRatio = 30
)

Component memory sizes (empirically measured).
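
To make the per-worker constants concrete, the sketch below estimates the fixed plus per-worker footprint for a given worker count. The constant values are copied from the block above; the helper name `estimateHistoryFootprint` and the simple additive model are assumptions for illustration, not the package's actual solver.

```go
package main

import "fmt"

const (
	KiB int64 = 1 << 10
	MiB int64 = 1 << 20

	// Values copied from the constants documented above.
	BaseOverhead         = 250 * MiB
	RepoHandleSize       = 10 * MiB
	WorkerNativeOverhead = 50 * MiB
	AvgDiffSize          = 2 * KiB
	MaxDiffCacheEntries  = 20000
)

// estimateHistoryFootprint is a hypothetical helper: it adds the fixed
// overhead to the per-worker Go-visible and native costs.
func estimateHistoryFootprint(workers int) int64 {
	perWorker := RepoHandleSize + WorkerNativeOverhead
	return BaseOverhead + int64(workers)*perWorker
}

func main() {
	// 250 MiB base + 4 * (10 + 50) MiB per worker.
	fmt.Printf("4 workers: %d MiB\n", estimateHistoryFootprint(4)/MiB)
	// Size of a diff cache filled to its cap at the average entry size.
	fmt.Printf("full diff cache: %d KiB\n", MaxDiffCacheEntries*AvgDiffSize/KiB)
}
```

Under this model each extra worker costs 60 MiB, and a diff cache at its 20,000-entry cap holds about 39 MiB of average-sized entries.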

const (
	// CacheAllocationPercent is the percentage of available budget for caches.
	CacheAllocationPercent = 60

	// WorkerAllocationPercent is the percentage of available budget for workers.
	WorkerAllocationPercent = 30

	// BufferAllocationPercent is the percentage of available budget for buffers.
	BufferAllocationPercent = 10

	// SlackPercent is reserved for runtime overhead.
	SlackPercent = 5

	// BlobCacheRatio is the portion of cache allocation for blob cache.
	BlobCacheRatio = 80

	// DiffCacheRatio is the portion of cache allocation for diff cache.
	DiffCacheRatio = 20
)

Allocation proportions for budget distribution.
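
As a rough illustration of these proportions, the sketch below splits an available budget 60/30/10 and then divides the cache share 80/20 between the blob and diff caches. The `split` type and `splitAvailable` function are hypothetical. How `SlackPercent` and the cache caps are applied is not stated here, so they are left out.

```go
package main

import "fmt"

const (
	MiB int64 = 1 << 20

	// Values copied from the constants documented above.
	CacheAllocationPercent  = 60
	WorkerAllocationPercent = 30
	BufferAllocationPercent = 10
	BlobCacheRatio          = 80
	DiffCacheRatio          = 20
)

// split is a hypothetical breakdown of an available budget, in bytes.
type split struct {
	BlobCache, DiffCache, Workers, Buffers int64
}

// splitAvailable applies the percentages with integer arithmetic.
func splitAvailable(available int64) split {
	caches := available * CacheAllocationPercent / 100
	return split{
		BlobCache: caches * BlobCacheRatio / 100,
		DiffCache: caches * DiffCacheRatio / 100,
		Workers:   available * WorkerAllocationPercent / 100,
		Buffers:   available * BufferAllocationPercent / 100,
	}
}

func main() {
	s := splitAvailable(1000 * MiB)
	fmt.Printf("blob=%d diff=%d workers=%d buffers=%d (MiB)\n",
		s.BlobCache/MiB, s.DiffCache/MiB, s.Workers/MiB, s.Buffers/MiB)
}
```

For 1000 MiB of available budget this gives 480 MiB for the blob cache, 120 MiB for the diff cache, 300 MiB for workers, and 100 MiB for buffers.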

const (
	// MinimumBudget is the smallest budget the solver will accept.
	// Must exceed BaseOverhead (250 MiB) plus room for at least 1 worker.
	MinimumBudget = 512 * units.MiB

	// DefaultArenaSize is the default blob arena size.
	// 8 MiB reduces fallback to per-blob C malloc (which accumulates in
	// glibc arenas as retained native RSS) by fitting ~97% of blob batches.
	DefaultArenaSize = 8 * units.MiB

	// MaxArenaSize is the maximum arena size allowed.
	MaxArenaSize = 16 * units.MiB

	// DefaultCommitBatchSize is used for all budget-derived configs.
	DefaultCommitBatchSize = 100

	// MinWorkers is the minimum number of workers.
	MinWorkers = 1

	// MinBufferSize is the minimum buffer size.
	MinBufferSize = 2

	// MinDiffCacheSize is the minimum diff cache entries.
	MinDiffCacheSize = 100

	// MinBlobCacheSize is the minimum blob cache size.
	MinBlobCacheSize = 1 * units.MiB

	// OptimalWorkerRatio is the percentage of CPU cores to use for workers.
	// Testing shows ~60% provides optimal performance due to contention overhead.
	OptimalWorkerRatio = 60

	// UASTPipelineWorkerRatio is the percentage of CPU cores for UAST pipeline workers.
	UASTPipelineWorkerRatio = 40

	// LeafWorkerDivisor controls leaf worker count: NumCPU / divisor.
	LeafWorkerDivisor = 3

	// MinLeafWorkers is the minimum number of leaf workers.
	MinLeafWorkers = 4
)

Solver constraints.
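
The CPU-based ratios can be read as follows. This is a minimal sketch assuming integer truncation and a simple floor at the documented minimums; the function names are hypothetical, and the real solver may round differently or also bound workers by memory.

```go
package main

import (
	"fmt"
	"runtime"
)

const (
	// Values copied from the constants documented above.
	MinWorkers         = 1
	OptimalWorkerRatio = 60
	LeafWorkerDivisor  = 3
	MinLeafWorkers     = 4
)

// workersForCPU takes OptimalWorkerRatio percent of the cores,
// never returning fewer than MinWorkers.
func workersForCPU(numCPU int) int {
	n := numCPU * OptimalWorkerRatio / 100
	if n < MinWorkers {
		n = MinWorkers
	}
	return n
}

// leafWorkersForCPU divides the cores by LeafWorkerDivisor,
// never returning fewer than MinLeafWorkers.
func leafWorkersForCPU(numCPU int) int {
	n := numCPU / LeafWorkerDivisor
	if n < MinLeafWorkers {
		n = MinLeafWorkers
	}
	return n
}

func main() {
	cpus := runtime.NumCPU()
	fmt.Printf("cpus=%d workers=%d leaf=%d\n",
		cpus, workersForCPU(cpus), leafWorkersForCPU(cpus))
}
```

On a 24-core machine this sketch yields 14 workers and 8 leaf workers; on a 2-core machine the floors apply, giving 1 worker and 4 leaf workers.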

const (
	// StaticBaseOverhead is the fixed Go runtime + loaded analyzers overhead.
	// Lower than history's BaseOverhead because no libgit2 repo is opened.
	StaticBaseOverhead = 150 * units.MiB

	// StaticWorkerFootprint is the per-worker memory for parser + tree-sitter
	// native tree + Go-side node.Node tree + file content buffer.
	StaticWorkerFootprint = 50 * units.MiB

	// StaticAvgItemBytes is the average gob-encoded size of a report item
	// (map[string]any with ~8 keys). Used to estimate spill threshold.
	StaticAvgItemBytes = 512

	// StaticAnalyzerCount is the number of static analyzers that use
	// SpillableDataCollector (complexity, halstead, comments, cohesion, clones, imports).
	StaticAnalyzerCount = 6

	// MinStaticBudget is the smallest budget that produces a non-zero config.
	// Must cover base overhead plus at least one worker.
	MinStaticBudget = StaticBaseOverhead + StaticWorkerFootprint + 10*units.MiB

	// MaxStaticWorkers caps workers even with large budgets.
	MaxStaticWorkers = 16

	// MinStaticSpillThreshold is the floor for spill threshold.
	MinStaticSpillThreshold = 1000

	// MaxStaticSpillThreshold is the ceiling for spill threshold.
	MaxStaticSpillThreshold = 100000
)

Static analysis cost model constants (empirically measured).

const DefaultMallocArenaMax = 2

DefaultMallocArenaMax limits glibc malloc arenas to prevent RSS bloat. On 64-bit systems glibc defaults to 8 arenas per core, so a 24-core machine can retain freed memory across up to 192 arenas, inflating RSS by 3-4x. A value of 2 reduces peak RSS by ~60% compared with the default. The throughput impact is minimal when combined with malloc_trim(0) between chunks to reclaim freed arena memory.
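
How codefang applies this value is not stated here (it could, for example, call glibc's `mallopt` through cgo). One common way to apply such a limit is glibc's `MALLOC_ARENA_MAX` environment variable, which is read at process startup. The sketch below prepares such an environment for a child process; the helper `withArenaMax` is hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// Value copied from the constant documented above.
const DefaultMallocArenaMax = 2

// withArenaMax returns a copy of env with MALLOC_ARENA_MAX set to n,
// dropping any existing entry for that variable.
func withArenaMax(env []string, n int) []string {
	const key = "MALLOC_ARENA_MAX="
	out := make([]string, 0, len(env)+1)
	for _, kv := range env {
		if !strings.HasPrefix(kv, key) {
			out = append(out, kv)
		}
	}
	return append(out, key+strconv.Itoa(n))
}

func main() {
	// The result could be assigned to exec.Cmd.Env before starting a child.
	env := withArenaMax(os.Environ(), DefaultMallocArenaMax)
	fmt.Println(env[len(env)-1])
}
```

Setting the variable inside an already running process has no effect on that process's allocator, which is why the sketch targets a child's environment.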

Variables

var (
	// ErrBudgetTooSmall indicates the budget is below the minimum required.
	ErrBudgetTooSmall = errors.New("memory budget is too small")
)

Solver errors.

Functions

func SolveForBudget

func SolveForBudget(budget int64) (framework.CoordinatorConfig, error)

SolveForBudget calculates an optimal CoordinatorConfig for the given memory budget. The solver distributes available memory across workers, caches, and buffers while ensuring the total estimated usage stays within budget.

Types

type NativeLimits

type NativeLimits struct {
	MwindowMappedLimit int64
	CacheMaxSize       int64
	MallocArenaMax     int
}

NativeLimits holds libgit2 global memory limits derived from the budget.

func NativeLimitsForBudget

func NativeLimitsForBudget(budget int64) NativeLimits

NativeLimitsForBudget computes libgit2 memory limits proportional to the memory budget. It returns zero values when no budget is set, in which case the defaults apply.
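
One plausible reading of the proportional split, based on `NativeMemoryPercent` and `MwindowCacheRatio`, is sketched below. The function `nativeLimitsSketch` is hypothetical. The actual implementation may clamp the results (for example against libgit2's defaults) or handle `MallocArenaMax` differently.

```go
package main

import "fmt"

const (
	MiB int64 = 1 << 20

	// Values copied from the constants documented above.
	NativeMemoryPercent   = 25
	MwindowCacheRatio     = 30
	DefaultMallocArenaMax = 2
)

// NativeLimits mirrors the struct documented above.
type NativeLimits struct {
	MwindowMappedLimit int64
	CacheMaxSize       int64
	MallocArenaMax     int
}

// nativeLimitsSketch reserves NativeMemoryPercent of the budget for
// native memory, then gives MwindowCacheRatio percent of that share to
// mwindow and the remainder to the object cache.
func nativeLimitsSketch(budget int64) NativeLimits {
	if budget <= 0 {
		return NativeLimits{} // no budget set: zero values
	}
	native := budget * NativeMemoryPercent / 100
	mwindow := native * MwindowCacheRatio / 100
	return NativeLimits{
		MwindowMappedLimit: mwindow,
		CacheMaxSize:       native - mwindow,
		MallocArenaMax:     DefaultMallocArenaMax, // assumed
	}
}

func main() {
	l := nativeLimitsSketch(4000 * MiB)
	fmt.Printf("mwindow=%d MiB cache=%d MiB arenas=%d\n",
		l.MwindowMappedLimit/MiB, l.CacheMaxSize/MiB, l.MallocArenaMax)
}
```

For a 4000 MiB budget this reserves 1000 MiB for native memory: 300 MiB for mwindow and 700 MiB for the object cache.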

type StaticBudgetConfig

type StaticBudgetConfig struct {
	MaxWorkers     int
	SpillThreshold int
}

StaticBudgetConfig holds budget-derived parameters for the static analysis phase. Zero values mean "use defaults" — no override applied.

func SolveStaticBudget

func SolveStaticBudget(budgetBytes int64) StaticBudgetConfig

SolveStaticBudget derives static analysis parameters from a memory budget. Returns zero-value config when budget is zero, negative, or below minimum.
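
The static cost model constants suggest a derivation along the following lines. This is a sketch under stated assumptions, not the package's implementation: `solveStaticSketch` is a hypothetical name, and the way leftover memory is converted into a spill threshold (dividing by `StaticAvgItemBytes` times `StaticAnalyzerCount`) is a guess based on the constant descriptions.

```go
package main

import "fmt"

const (
	MiB int64 = 1 << 20

	// Values copied from the constants documented above.
	StaticBaseOverhead      = 150 * MiB
	StaticWorkerFootprint   = 50 * MiB
	StaticAvgItemBytes      = 512
	StaticAnalyzerCount     = 6
	MinStaticBudget         = StaticBaseOverhead + StaticWorkerFootprint + 10*MiB
	MaxStaticWorkers        = 16
	MinStaticSpillThreshold = 1000
	MaxStaticSpillThreshold = 100000
)

// StaticBudgetConfig mirrors the struct documented above.
type StaticBudgetConfig struct {
	MaxWorkers     int
	SpillThreshold int
}

func solveStaticSketch(budget int64) StaticBudgetConfig {
	if budget < MinStaticBudget {
		return StaticBudgetConfig{} // zero, negative, or below minimum
	}
	avail := budget - StaticBaseOverhead
	workers := int(avail / StaticWorkerFootprint)
	if workers > MaxStaticWorkers {
		workers = MaxStaticWorkers
	}
	// Assumed: memory left after workers is shared by the collectors.
	left := avail - int64(workers)*StaticWorkerFootprint
	threshold := int(left / (StaticAvgItemBytes * StaticAnalyzerCount))
	if threshold < MinStaticSpillThreshold {
		threshold = MinStaticSpillThreshold
	}
	if threshold > MaxStaticSpillThreshold {
		threshold = MaxStaticSpillThreshold
	}
	return StaticBudgetConfig{MaxWorkers: workers, SpillThreshold: threshold}
}

func main() {
	for _, mib := range []int64{200, 250, 2048} {
		fmt.Printf("%d MiB -> %+v\n", mib, solveStaticSketch(mib*MiB))
	}
}
```

Under these assumptions a 200 MiB budget falls below `MinStaticBudget` (210 MiB) and yields the zero-value config, while a 2048 MiB budget hits both the 16-worker cap and the spill threshold ceiling.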
