# unified_bench

Side-by-side benchmarks for HashDB, BTreeOnHashDB, TreeDB (cached), Badger, and LevelDB.
## Run

- Build: `make unified-bench` (writes `bin/unified-bench`)
- Run: `./bin/unified-bench`
- Or: `go run ./cmd/unified_bench`
## Guardrail Check (Read Snapshot + Append-Only)

Targeted regression guardrail for append-only writes plus read-heavy snapshot acquisition:

```sh
./scripts/check_read_snapshot_guardrail.sh
```

The script validates that `TestRunBenchmark_ReadSnapshotAppendOnlyGuardrail` actually ran (to avoid `go test` false-greens when `-run` matches nothing). It retries once for diagnostics and still fails the job if only the retry passes, so flaky regressions are surfaced instead of silently passing.

Direct invocation:

```sh
cd /path/to/gomap/cmd/unified_bench
GOWORK=off GOMEMLIMIT=4GiB GOMAXPROCS=2 go test -json -p 1 . \
  -run '^TestRunBenchmark_ReadSnapshotAppendOnlyGuardrail$' -count=1
```
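The "did it actually run" check can be sketched by scanning the `go test -json` event stream for a `pass` action on the exact test name. This is a minimal Go sketch of the idea; the real script's parsing may differ:

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// testEvent mirrors the fields of a `go test -json` event we care about.
type testEvent struct {
	Action string `json:"Action"`
	Test   string `json:"Test"`
}

// sawPass reports whether the named test emitted a "pass" action,
// guarding against -run patterns that match nothing (which would
// still exit 0 from `go test`).
func sawPass(output, name string) bool {
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		var ev testEvent
		if err := json.Unmarshal(sc.Bytes(), &ev); err != nil {
			continue // non-JSON lines (e.g. build output) are ignored
		}
		if ev.Test == name && ev.Action == "pass" {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical captured stream; a real run emits many more events.
	stream := `{"Action":"run","Test":"TestRunBenchmark_ReadSnapshotAppendOnlyGuardrail"}
{"Action":"pass","Test":"TestRunBenchmark_ReadSnapshotAppendOnlyGuardrail"}`
	fmt.Println(sawPass(stream, "TestRunBenchmark_ReadSnapshotAppendOnlyGuardrail"))
}
```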
## Reproducibility

- Randomized tests use a per-test PRNG derived from `-seed`, so every DB sees the same random key/query sequence.
- The chosen seed is printed to stderr at startup.
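A minimal sketch of per-test seed derivation, assuming a scheme that mixes the base `-seed` with the test name so each test gets a distinct but replayable stream (the benchmark's exact mixing function may differ):

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

// testRNG returns a PRNG whose seed mixes the global -seed with the
// test name, so every DB running the same test replays the identical
// key/query sequence. The FNV mixing is illustrative.
func testRNG(seed int64, test string) *rand.Rand {
	h := fnv.New64a()
	h.Write([]byte(test))
	return rand.New(rand.NewSource(seed ^ int64(h.Sum64())))
}

func main() {
	a := testRNG(1, "random_read")
	b := testRNG(1, "random_read")
	// Same seed + same test name => identical sequences for every DB.
	fmt.Println(a.Int63() == b.Int63())
}
```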
## Tests

- `write_seq` — Sequential Write
- `write_rand` — Random Write
- `batch_write` — Batch Write
- `batch_random` — Batch Random
- `batch_delete` — Batch Delete
- `delete_rand` — Random Delete
- `random_read` — Random Read
- `random_read_parallel` — Random Read (Parallel aggregate throughput)
- `random_read_parallel_acquire_snapshot` — Random Read (Parallel, Snapshot Per Key)
- `random_read_batch` — Random Read (Batch)
- `full_scan` — Full Scan (iterate the full keyspace)
- `prefix_scan` — Prefix Scan (range scans over `[start, end)`)
- Aliases: `scan` → `full_scan`, `range_scan` → `prefix_scan`, `read_rand` → `random_read`, `read_rand_parallel` → `random_read_parallel`, `read_rand_batch`/`read_random_batch` → `random_read_batch`

`random_read_batch` always exercises value-read paths:

- Uses `GetMany` when available.
- Falls back to per-key `Get` otherwise.
- Any `GetMany`/`Get` error fails the test.
- Missing keys are not treated as benchmark-fatal by default (adapter/API contract). Use `-read-require-hit` to fail fast on misses and validate value lengths.
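The GetMany-with-fallback behavior can be sketched as an optional-interface check. The `reader`/`batchReader` interface names and `readBatch` helper below are illustrative, not the adapters' actual types:

```go
package main

import "fmt"

// reader is the minimal point-read surface every adapter provides.
type reader interface {
	Get(key []byte) ([]byte, error)
}

// batchReader is the optional batched surface some adapters implement.
type batchReader interface {
	GetMany(keys [][]byte) ([][]byte, error)
}

// readBatch prefers GetMany when the adapter supports it and falls
// back to per-key Get otherwise; any error fails the benchmark.
func readBatch(db reader, keys [][]byte) ([][]byte, error) {
	if br, ok := db.(batchReader); ok {
		return br.GetMany(keys)
	}
	vals := make([][]byte, len(keys))
	for i, k := range keys {
		v, err := db.Get(k)
		if err != nil {
			return nil, err
		}
		vals[i] = v
	}
	return vals, nil
}

// mapDB is a toy adapter with only Get, exercising the fallback path.
type mapDB map[string][]byte

func (m mapDB) Get(k []byte) ([]byte, error) { return m[string(k)], nil }

func main() {
	db := mapDB{"a": []byte("1")}
	vals, err := readBatch(db, [][]byte{[]byte("a")})
	fmt.Println(string(vals[0]), err == nil)
}
```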
## Common flags

- `-profile` benchmark profile preset (see `cmd/unified_bench/profiles.go`):
  - `balanced` (default)
  - `durable` (strict durability)
  - `fast` (max throughput; TreeDB WAL off + throughput-biased vlog auto policy; unsafe)
  - `wal_on_fast` (TreeDB WAL on + relaxed durability + throughput-biased vlog auto policy; unsafe)
- `-dbs` (`all` or CSV): `hashdb`, `btree`, `treedb`, `badger`, `leveldb`
- `-test` (`all` or CSV): see list above
- `-keys` number of keys (default 100000)
- `-keycounts` comma-separated key counts to sweep over (overrides `-keys`)
- `-keyscale` generate keycounts by scale: `log10` or `doubling` (uses `-keys-min`/`-keys-max`)
- `-valsize` value size in bytes (default 128)
- `-val-pattern` value pattern for write tests (including `dataset_write_*`) (`zero|repeat|repeat_tail64|ultra_compressible_repeat|highly_compressible_notail|half_repeat_half_random|medium_compressible_sparse|celestia_height_prefix_fill|random`)
  - Note: dataset-write generation now uses the same normalized behavior as other write tests (legacy pattern names are accepted as aliases, but generation is unified under `makeValuePool`).
- `-val-pool-size` number of distinct values to cycle through for `-val-pattern` (`0` = auto)
- `-batchsize` batch size (default 8000)
- `-read-workers` number of goroutines for `random_read_parallel` and `random_read_parallel_acquire_snapshot` (default `GOMAXPROCS`)
- `-read-require-hit` fail read benchmarks (`random_read*`, `random_read_batch`) on misses and validate value length matches `-valsize`
- `-range-queries` number of prefix/range queries (default 200)
- `-range-span` number of keys per range (default 100)
- `-leveldb-block-compression` LevelDB: block compression mode (`default|on|off|both`)
- `-leveldb-block-size` LevelDB: table block size in bytes (default 4096)
- `-treedb-chunk-size` TreeDB: pager chunk size in bytes (default `256KiB`)
- `-treedb-flush-threshold` TreeDB (cached) flush threshold in bytes (default 64MB)
- `-treedb-max-queued-memtables` TreeDB (cached) max queued immutable memtables before applying backpressure flush (`0` = default, `<0` = disable)
- `-treedb-slowdown-backlog-seconds` TreeDB (cached) start backpressure when queued backlog exceeds this many seconds of flush work
- `-treedb-stop-backlog-seconds` TreeDB (cached) block writers when queued backlog exceeds this many seconds of flush work
- `-treedb-max-backlog-bytes` TreeDB (cached) absolute cap on queued backlog bytes
- `-treedb-writer-flush-max-memtables` TreeDB (cached) max memtables a writer will help flush per op
- `-treedb-writer-flush-max-ms` TreeDB (cached) max time (ms) a writer will help flush per op
- `-treedb-iter-debug` print prefix scan iterator timing + debug stats
- `-treedb-iter-debug-limit` max per-query debug lines to print (default 20)
- `-treedb-maintenance-ops-per-coalesce` TreeDB: ops-per-coalesce maintenance budget (`0` = default, `<0` = disable budget)
- `-treedb-bg-vacuum-interval` TreeDB: background index vacuum interval (`0` = disabled)
- `-treedb-bg-vacuum-span-ppm` TreeDB: background index vacuum span ratio threshold (ppm), `0` = default
- `-treedb-allow-unsafe` TreeDB: allow unsafe durability/integrity options (required for unsafe toggles)
- `-treedb-vlog-dict` TreeDB: value-log dict compression mode (`default|on|off|both`)
- `-treedb-vlog-auto-policy` TreeDB: value-log auto policy (`balanced|throughput|size`)
- `-treedb-vlog-rewrite-min-segment-age-ms` TreeDB: minimum source segment age for online generational rewrite (`0` = default)
- `-treedb-vlog-dict-frame-encode-level` TreeDB: dict frame zstd encoder level (`engine|fastest|default|better|best|all|<int>`)
- `-treedb-vlog-dict-frame-entropy` TreeDB: dict frame entropy mode (`engine|on|off|both`)
- `-seed` PRNG seed for randomized tests (default 1; `0` = time-based)
- `-keep` keep temp DB directories after run
- `-settle-before-scans` close+reopen DBs before `full_scan`/`prefix_scan` to measure scan performance on a "settled" (fully flushed) state
- `-progress` live table updates to stderr (default true)
- `-format` output format: `table` or `markdown`
- `-cpuprofile` write per-test CPU profiles to `<prefix>_<test>_<db>.pprof`
- `-cpuprofile-tests` restrict CPU profiling to a CSV list of tests (e.g. `random_read,batch_random`)
- `-allocsprofile` write per-test allocation delta profiles to `<prefix>_<test>_<db>.pprof` (analyzable with `-sample_index=alloc_space|alloc_objects`)
- `-allocsprofile-tests` restrict allocation profiling to a CSV list of tests
- `-allocsprofilerate` allocation sampling rate in bytes for `runtime.MemProfileRate` (default 524288)
- `-checkpoint-cpuprofile` write per-checkpoint CPU profiles to `<prefix>_checkpoint_<test>_<db>.pprof`
- `-checkpoint-cpuprofile-tests` restrict checkpoint CPU profiling to a CSV list of tests
- `-profile-dir` write all profile outputs into one directory (auto-sets defaults for `-cpuprofile`, `-allocsprofile`, `-checkpoint-cpuprofile`, `-blockprofile`, `-mutexprofile`, `-trace`; explicit flags still win). Also emits `benchprof_results.json` and `benchprof_results.md`, then automatically runs `benchprof` in-process.
- `-treedb-cache-stats-before-reads` print select `treedb.cache.*` stats before read/scan tests (treedb only)
- `-blockprofile`, `-mutexprofile` write global profiling artifacts to files and also emit per-test contention delta profiles in the same directory (`block_<test>_<db>.pprof`, `mutex_<test>_<db>.pprof`) when the computed delta is non-empty
- `-trace` write runtime execution trace to file
- `-max-wall` abort the run if wall time exceeds this duration (guardrail; `0` = disabled)
- `-max-rss-mb` abort the run if RSS exceeds this many MiB (guardrail; `0` = disabled; Linux-only)
- `-checkpoint-between-tests` force a best-effort durability checkpoint between tests (DBs that support `Checkpoint()`), and also once after the final test so end-of-run disk usage reflects a settled state
- `-vacuum-between-tests` vacuum supported DBs between tests (implies `-checkpoint-between-tests`; TreeDB uses `VacuumIndexOnline`)
- `-treedb-vlog-rewrite-after-run` run a full TreeDB value-log rewrite after the run and report before/after disk usage + the data directory path
- `-checkpoint-every-ops` force a best-effort durability checkpoint every N ops during write-heavy tests (DBs that support `Checkpoint()`)
- `-checkpoint-every-bytes` force a best-effort durability checkpoint every N approx bytes during write-heavy tests (DBs that support `Checkpoint()`)
- `-suite` named suite:
  - `readme` — generates the README graphs + sweep tables
  - `churn` — churn + settled scans (treedb, leveldb)
  - `churnvacuum` — churn + settled scans, then index compaction and scan again
  - `flushdrain` — write burst → checkpoint boundary → read; prints checkpoint timing (TreeDB-focused). Use `-flushdrain-checkpoint-max=<duration>` to fail the suite if the checkpoint before `random_read` exceeds your latency target.
  - `flushthrash` — forces a small TreeDB flush threshold; catches flush thrash / runaway backlog regressions
  - `bigkeys_guard` — small TreeDB flush threshold + large keycount, with wall/RSS caps for CI guardrails
  - `longmix` — long-ish mixed workload + settle boundary with fragmentation reports
  - `sload_readheavy` — settled point reads with value-log pointers + forkchoice-style batch commits
  - `maintenance_budget` — sweep TreeDB maintenance K values; reports checkpoint time vs index size, recommends K
- `-outdir` output directory for suite artifacts (plots/images; used by `-suite readme`)
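The value-pool idea behind `-val-pattern`/`-val-pool-size` can be sketched as building a fixed number of distinct values per pattern and cycling writes through them. The patterns shown are a simplified subset, and `makePool` is a hypothetical stand-in for `makeValuePool`:

```go
package main

import (
	"fmt"
	"math/rand"
)

// makePool builds n distinct values of the given size for a pattern.
// Only three illustrative patterns are shown here.
func makePool(pattern string, size, n int, seed int64) [][]byte {
	rng := rand.New(rand.NewSource(seed))
	pool := make([][]byte, n)
	for i := range pool {
		v := make([]byte, size)
		switch pattern {
		case "zero":
			// all zero bytes: maximally compressible
		case "repeat":
			for j := range v {
				v[j] = byte('a' + i%26) // one repeated byte per pool entry
			}
		default: // "random": incompressible values
			rng.Read(v)
		}
		pool[i] = v
	}
	return pool
}

func main() {
	pool := makePool("repeat", 4, 2, 1)
	// Writes cycle through the pool: the value for op k is pool[k%len(pool)].
	fmt.Printf("%s %s %s\n", pool[0], pool[1], pool[2%len(pool)])
}
```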
## Standard Profile Workflow (benchprof)

Use `-profile-dir` so all profiles and ops outputs are captured in one place:

```sh
OUT=$(mktemp -d /tmp/gomap_profiles_XXXXXX)
./bin/unified-bench \
  -dbs treedb \
  -keys 800000 \
  -profile fast \
  -checkpoint-between-tests \
  -test random_write,random_delete,random_read,full_scan,prefix_scan \
  -profile-dir "$OUT" \
  -progress=false
./bin/benchprof -profiles-dir "$OUT"
```

This writes:

- `benchprof_results.json` / `benchprof_results.md`
- `cpu_<test>_<db>.pprof`
- `allocs_<test>_<db>.pprof`
- `block_<test>_<db>.pprof` (when non-empty delta)
- `mutex_<test>_<db>.pprof` (when non-empty delta)
- `checkpoint_cpu_checkpoint_<test>_<db>.pprof`
- `block.pprof`, `mutex.pprof`, `trace.out`
- `insights.md`, `insights.json`, `insights.html` (from `benchprof`)
## Notes

TreeDB is a cached engine (memtable + background flush). If you run long write-heavy phases and then measure `random_read`/scans immediately, the results can be dominated by background flush work ("flush debt").

Recommended:

- For settled read/scan performance: use `-checkpoint-between-tests` or `-settle-before-scans`.
- For mixed workload under flush debt: keep defaults and optionally enable `-treedb-cache-stats-before-reads` to see queue/backlog stats.
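Between-tests checkpointing only applies to DBs that expose a `Checkpoint()` method; the others are skipped. A minimal sketch of that capability check (the `checkpointer` interface name and `settle` helper are illustrative, not the benchmark's actual code):

```go
package main

import "fmt"

// checkpointer is the optional durability hook; only some adapters
// implement the Checkpoint() mentioned above.
type checkpointer interface {
	Checkpoint() error
}

// settle performs a best-effort checkpoint between tests so later
// reads/scans measure a settled state instead of flush debt. It
// reports whether the DB supported checkpointing at all.
func settle(db any) (bool, error) {
	if c, ok := db.(checkpointer); ok {
		return true, c.Checkpoint()
	}
	return false, nil // DBs without Checkpoint() are skipped
}

// fakeDB counts checkpoints so we can observe the call.
type fakeDB struct{ checkpoints int }

func (f *fakeDB) Checkpoint() error { f.checkpoints++; return nil }

func main() {
	db := &fakeDB{}
	did, err := settle(db)
	fmt.Println(did, err == nil, db.checkpoints)
}
```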
## Repro: mixed vs settled reads (TreeDB)

Mixed (reads under flush debt; intentionally stressful):

```sh
go run ./cmd/unified_bench -dbs treedb -profile fast -keys 900000 -valsize 128 -batchsize 1000 \
  -test sequential_write,random_write,dataset_write_random,dataset_write_sorted,batch_write,batch_random,batch_delete,batch_small_seq,random_delete,random_read \
  -treedb-cache-stats-before-reads -progress=false
```

Settled (reads after a durability boundary):

```sh
go run ./cmd/unified_bench -dbs treedb -profile fast -keys 900000 -valsize 128 -batchsize 1000 \
  -test sequential_write,random_write,dataset_write_random,dataset_write_sorted,batch_write,batch_random,batch_delete,batch_small_seq,random_delete,random_read \
  -checkpoint-between-tests -progress=false
```
## Repro: compression matrix (TreeDB dict + LevelDB block compression)

Run TreeDB twice (dict on/off) and LevelDB twice (block compression on/off) in one invocation:

```sh
./bin/unified-bench -test batch_write,random_write,batch_delete -dbs treedb,leveldb -profile fast -keys 4000000 -format markdown \
  -treedb-force-value-pointers \
  -treedb-vlog-dict both \
  -leveldb-block-compression both
```

To sweep dict-frame encoder knobs (zstd level × entropy coding), use:

```sh
./bin/unified-bench -test batch_write -dbs treedb -profile fast -keys 1000000 -format markdown \
  -treedb-force-value-pointers \
  -treedb-vlog-dict on \
  -treedb-vlog-dict-frame-encode-level all \
  -treedb-vlog-dict-frame-entropy both
```
## Repro: random read parallel sweep

Run `random_read_parallel` with separate worker counts:

```sh
./bin/unified-bench -dbs treedb,leveldb -profile fast -keys 500000 -test random_read_parallel -read-workers 1 -progress=false
./bin/unified-bench -dbs treedb,leveldb -profile fast -keys 500000 -test random_read_parallel -read-workers 2 -progress=false
./bin/unified-bench -dbs treedb,leveldb -profile fast -keys 500000 -test random_read_parallel -read-workers 4 -progress=false
./bin/unified-bench -dbs treedb,leveldb -profile fast -keys 500000 -test random_read_parallel -read-workers 8 -progress=false
```

`-test all` now includes `random_read_parallel` and `random_read_parallel_acquire_snapshot` in the output table:

```sh
./bin/unified-bench -dbs treedb,leveldb -profile fast -keys 500000 -test all -read-workers 4 -format markdown -progress=false
```
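How worker counts translate into aggregate throughput can be sketched with a simple goroutine fan-out. This is an illustrative model of `random_read_parallel`, not the benchmark's actual loop (timing and the DB call are omitted):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// parallelReads splits `total` point reads across `workers` goroutines
// and returns the aggregate completed count; the benchmark reports the
// combined throughput of all workers. Note: any total%workers remainder
// is dropped in this sketch for simplicity.
func parallelReads(workers, total int, get func(i int)) int64 {
	var done int64
	var wg sync.WaitGroup
	per := total / workers
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := 0; i < per; i++ {
				get(w*per + i) // each worker reads its own slice of keys
				atomic.AddInt64(&done, 1)
			}
		}(w)
	}
	wg.Wait()
	return done
}

func main() {
	n := parallelReads(4, 1000, func(int) {}) // stand-in for a DB Get
	fmt.Println(n)
}
```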
## Source Files
- adapter_badger.go
- adapter_bbolt.go
- adapter_buntdb.go
- adapter_hashdb.go
- adapter_leveldb.go
- adapter_lmdb_flags.go
- adapter_nutsdb.go
- adapter_pebble.go
- adapter_pogreb.go
- adapter_treedb.go
- adapter_treedb_backend.go
- db_registry.go
- db_variants.go
- hostinfo.go
- hostinfo_linux.go
- main.go
- plots.go
- profiles.go
- rss_linux.go
- suite_flushdrain.go
- suite_lanes_probe.go
- suite_maintenance_budget.go
- suite_parallel_unlock.go
- suite_vlog_dict.go
- value_pattern.go