# Advanced Cache (advCache)
High‑load in‑memory HTTP cache & reverse proxy for Go.
Designed for low latency and sustained high throughput — hundreds of thousands of RPS on commodity hosts.
Built around sharded storage, LRU with TinyLFU admission (Doorkeeper), background refresh, a resilient upstream cluster, and a lightweight worker orchestrator. The hot path is engineered to avoid allocations and global locks.
## Highlights
- Sharded storage (power‑of‑two shards) with per‑shard LRU and a global shard balancer for proportional eviction.
- Admission = W‑TinyLFU + Doorkeeper (Count‑Min Sketch + gated Bloom‑like filter).
- Background refresh with TTL, β‑staggering, scan‑rate and upstream rate limiting.
- Reverse proxy mode with an upstream cluster: per‑backend rate limiting, health probing, slow‑start, throttling/quarantine, and a fan‑in slot‑selection pattern for balanced dispatch.
- Worker orchestration for eviction/refresh/GC: on/off/start/reload/scale via a lightweight governor.
- Careful memory discipline: pooled buffers, zero‑copy headers, predictable storage budget.
- Dump/restore per shard with CRC32 and version rotation (optional GZIP).
- fasthttp HTTP layer, focused REST surface, Prometheus/VictoriaMetrics metrics.
- Kubernetes‑friendly: liveness probe, graceful shutdown, configurable GOMAXPROCS (auto when 0).
See also: METRICS.md and ROADMAP.md.
## Repository map
Quick orientation to major components.
```text
cmd/
  main.go                   # entrypoint, flags, wiring, logger, probes
internal/cache/api/         # HTTP controllers: main cache route, on/off, clear, metrics
pkg/
  config/                   # YAML config loader & derived fields
  http/server/              # fasthttp server, middlewares (Server header, JSON, etc.)
  orchestrator/             # worker governor & transport
  pools/                    # buffer and slice pools
  prometheus/metrics/       # Prometheus/VictoriaMetrics exposition
  storage/                  # sharded map, per‑shard LRU, LFU, dumper, refresher, evictor
    lru/  lfu/  map/
  upstream/                 # backend & cluster: rate‑limit, health, proxy logic
  k8s/                      # probes
  utils/, common/, types/   # helpers
```
## How requests are canonicalized (cache key)
To keep keys consistent and deterministic, requests are normalized before lookup/insert:
### Whitelist filtering
Only items listed in the config participate in the key:
- Query: `rules.*.cache_key.query` — exact parameter names (supports names like `project[id]`).
- Headers: `rules.*.cache_key.headers` — exact header names to include (e.g. `Accept-Encoding`).

All other query params and headers are ignored for the key.
### Deterministic ordering
Selected query params and headers are sorted lexicographically (by name, then value) before key construction, so semantically identical requests map to the same key.
Source: `pkg/model/query.go`, `pkg/model/header.go`, `pkg/common/sort/key_value.go`.
### Compression variants
If you whitelist `Accept-Encoding`, its normalized value becomes part of the key, isolating gzip/brotli/plain variants.
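As an illustration, a minimal sketch of the canonicalization idea (the real construction lives in `pkg/model` and `pkg/common/sort`; the names and key layout below are hypothetical):

```go
// Hypothetical sketch of cache-key canonicalization: keep only whitelisted
// query params and headers, sort them by name then value, and join into a key.
package main

import (
	"fmt"
	"sort"
	"strings"
)

type kv struct{ name, value string }

// buildKey is illustrative only; the actual key format is defined in pkg/model.
func buildKey(path string, whitelisted []kv) string {
	sort.Slice(whitelisted, func(i, j int) bool {
		if whitelisted[i].name != whitelisted[j].name {
			return whitelisted[i].name < whitelisted[j].name
		}
		return whitelisted[i].value < whitelisted[j].value
	})
	var b strings.Builder
	b.WriteString(path)
	for _, p := range whitelisted {
		b.WriteByte('&')
		b.WriteString(p.name)
		b.WriteByte('=')
		b.WriteString(p.value)
	}
	return b.String()
}

func main() {
	// Two semantically identical requests map to the same key regardless of order.
	fmt.Println(buildKey("/api/v1/stats", []kv{{"timezone", "UTC"}, {"language", "en"}}))
	fmt.Println(buildKey("/api/v1/stats", []kv{{"language", "en"}, {"timezone", "UTC"}}))
}
```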
### Response headers (cache value)
- Whitelist filtering (no sorting): only response headers listed in `rules.*.cache_value.headers` are stored and forwarded as‑is; no reordering is performed.
- `Server: <service-name>` — always set by middleware; if an upstream server name was present, it is preserved as `X-Origin-Server` and replaced with the local `Server` value.
  Source: `pkg/http/server/middleware/server_name.go`.
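A minimal sketch of that middleware behaviour as a fasthttp handler wrapper (illustrative, not the actual file contents):

```go
package middleware

import "github.com/valyala/fasthttp"

// serverName is an illustrative sketch: it preserves an upstream Server header
// as X-Origin-Server and always sets the local service name as Server.
func serverName(name string, next fasthttp.RequestHandler) fasthttp.RequestHandler {
	return func(ctx *fasthttp.RequestCtx) {
		next(ctx) // let the handler/proxy produce the response first

		if origin := ctx.Response.Header.Peek(fasthttp.HeaderServer); len(origin) > 0 {
			ctx.Response.Header.SetBytesV("X-Origin-Server", origin)
		}
		ctx.Response.Header.Set(fasthttp.HeaderServer, name)
	}
}
```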
Note: X-Refreshed-At is planned to indicate background refresh timing. (See ROADMAP.md.)
## Configuration
Two example profiles are included:
- `advcache.cfg.yaml` — deployment profile
- `advcache.cfg.local.yaml` — local/stress profile
Selected top‑level keys (under `cache:`):
- `env` — log/metrics label (`dev`, `prod`, etc.).
- `runtime.gomaxprocs` — `0` = auto (via automaxprocs); set an explicit N to cap CPUs.
- `api.{name,port}` — service name and HTTP port.
- `upstream.policy` — `"await"` (back‑pressure) or `"deny"` (fail‑fast).
- `upstream.cluster.backends[]` — per‑backend: `rate`, `timeout`, `max_timeout`, `use_max_timeout_header`, `healthcheck`.
- `data.dump` — snapshots: `{enabled, dir, name, crc32_control_sum, max_versions, gzip}`.
- `storage.size` — memory budget (bytes).
- `admission` — TinyLFU: `table_len_per_shard` (power‑of‑two), `estimated_length`, `door_bits_per_counter` (8–16 typical), `sample_multiplier` (traffic‑proportional aging).
- `eviction` — pressure policy: `soft_limit` (background eviction + enforced admission), `hard_limit` (minimal hot‑path eviction + runtime memory limit); `replicas`, `scan_rate`.
- `refresh` — `{enabled, ttl, beta, rate, replicas, scan_rate, coefficient}`; `beta` staggers refreshes ahead of the TTL (see the sketch after this list).
- `forceGC` — periodic `FreeOSMemory`.
- `metrics.enabled` — Prometheus/VictoriaMetrics.
- `k8s.probe.timeout` — probe timeout.
- `rules` — per‑path overrides + cache key/value canonicalization.
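The `beta` knob follows the common probabilistic early‑expiration pattern: each read may trigger a background refresh slightly before the TTL, with a randomized offset, so replicas do not refresh the same key at the same instant. A hedged sketch of that idea (names and the exact formula are illustrative, not the project's code):

```go
// Sketch of β-staggered ("probabilistic early expiration") refresh.
package main

import (
	"fmt"
	"math"
	"math/rand"
	"time"
)

// delta approximates how long a refresh takes; beta (refresh.beta) controls how
// aggressively entries are refreshed ahead of their expiry.
func shouldRefresh(now, expiry time.Time, delta time.Duration, beta float64) bool {
	jitter := time.Duration(-float64(delta) * beta * math.Log(rand.Float64()))
	return now.Add(jitter).After(expiry)
}

func main() {
	expiry := time.Now().Add(200 * time.Millisecond)
	for i := 0; i < 5; i++ {
		// The closer we get to expiry, the more likely a refresh is triggered.
		fmt.Println(shouldRefresh(time.Now(), expiry, 100*time.Millisecond, 0.5))
		time.Sleep(50 * time.Millisecond)
	}
}
```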
### Example (deployment excerpt)

```yaml
cache:
  env: "dev"
  enabled: true
  runtime:
    gomaxprocs: 0
  api:
    name: "starTeam.advCache"
    port: "8020"
  upstream:
    policy: "await"
    cluster:
      backends:
        - id: "prod-node-1"
          enabled: true
          host: "localhost:8081"
          scheme: "http"
          rate: 100000
          timeout: "10s"
          max_timeout: "1m"
          use_max_timeout_header: ""
          healthcheck: "/healthcheck"
        - id: "low-resources-prod-node-2"
          enabled: true
          host: "localhost:8082"
          scheme: "http"
          rate: 3000
          timeout: "10s"
          max_timeout: "1m"
          use_max_timeout_header: ""
          healthcheck: "/healthcheck"
        - id: "legacy-prod-node-3"
          enabled: true
          host: "localhost:8083"
          scheme: "http"
          rate: 500
          timeout: "1m"
          max_timeout: "10m"
          use_max_timeout_header: ""
          healthcheck: "/legacy/health/is-ok"
  data:
    dump:
      enabled: false
      dir: "public/dump"
      name: "cache.dump"
      crc32_control_sum: true
      max_versions: 3
      gzip: false
  storage:
    size: 53687091200 # 50 GiB
  admission:
    table_len_per_shard: 32768
    estimated_length: 10000000
    door_bits_per_counter: 12
    sample_multiplier: 12
  eviction:
    enabled: true
    replicas: 4
    scan_rate: 8
    soft_limit: 0.8
    hard_limit: 0.9
  refresh:
    enabled: true
    ttl: "3h"
    beta: 0.5
    rate: 1250
    replicas: 4
    scan_rate: 32
    coefficient: 0.5
  forceGC:
    enabled: true
    interval: "10s"
  metrics:
    enabled: true
  k8s:
    probe:
      timeout: "5s"
  rules:
    /api/v2/cloud/data:
      cache_key:
        query: ["project[id]", domain, language, choice, timezone]
        headers: [Accept-Encoding]
      cache_value:
        headers: [Content-Type, Content-Length, Content-Encoding, Connection, Strict-Transport-Security, Vary, Cache-Control]
    /api/v1/stats:
      enabled: true
      ttl: "36h"
      beta: 0.4
      coefficient: 0.7
      cache_key:
        query: [language, timezone]
        headers: [Accept-Encoding]
      cache_value:
        headers: [Content-Type, Content-Length, Content-Encoding, Connection, Strict-Transport-Security, Vary, Cache-Control]
```
### Example (local stress excerpt)

```yaml
cache:
  env: "dev"
  enabled: true
  runtime:
    gomaxprocs: 12
  api:
    name: "starTeam.adv:8020"
    port: "8081"
  upstream:
    policy: "deny"
    cluster:
      backends:
        - id: "adv"
          enabled: true
          host: "localhost:8020"
          scheme: "http"
          rate: 250000
          timeout: "5s"
          max_timeout: "3m"
          use_max_timeout_header: "X-Google-Bot"
          healthcheck: "/k8s/probe"
  storage:
    size: 10737418240 # 10 GiB
  admission:
    table_len_per_shard: 32768
    estimated_length: 10000000
    door_bits_per_counter: 12
    sample_multiplier: 10
  eviction:
    enabled: true
    replicas: 4
    scan_rate: 8
    soft_limit: 0.9
    hard_limit: 0.99
  forceGC:
    enabled: true
    interval: "10s"
  metrics:
    enabled: true
```
## Eviction & pressure policy
- **Background eviction at SOFT‑LIMIT.** When `heap_usage >= storage.size × soft_limit`, the evictor runs in the background and does not touch the hot path. It removes items using a larger LRU sample (preferentially keeping newer entries). Increase `replicas` and `scan_rate` to shave memory continuously.
- **Admission at SOFT‑LIMIT.** TinyLFU admission is enforced on the hot path during pressure to avoid polluting the cache with low‑value inserts while the evictor catches up.
- **Minimal hot‑path eviction at HARD‑LIMIT.** When `heap_usage >= storage.size × hard_limit`, a single‑item eviction per request is applied to reduce contention with the background worker, and the runtime memory limit is set in parallel. This preserves throughput and avoids latency cliffs.
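A condensed sketch of how the two thresholds translate into actions (hypothetical names; the real logic lives in the storage and evictor packages):

```go
// Illustrative pressure check against the configured soft/hard limits.
package pressure

// Policy mirrors the relevant config knobs.
type Policy struct {
	StorageSize int64   // storage.size (bytes)
	SoftLimit   float64 // eviction.soft_limit
	HardLimit   float64 // eviction.hard_limit
}

type Action int

const (
	None            Action = iota
	BackgroundEvict        // soft limit: background evictor + enforced admission
	HotPathEvict           // hard limit: evict a single item on the request path
)

// Decide maps current heap usage to the pressure action described above.
func (p Policy) Decide(heapUsage int64) Action {
	usage := float64(heapUsage)
	switch {
	case usage >= float64(p.StorageSize)*p.HardLimit:
		return HotPathEvict
	case usage >= float64(p.StorageSize)*p.SoftLimit:
		return BackgroundEvict
	default:
		return None
	}
}
```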
## TinyLFU + Doorkeeper (admission)
- Count‑Min Sketch (depth = 4) with compact counters, sharded to minimize contention.
- Sample‑based aging: the sketch ages after `estimated_length × sample_multiplier` observations (traffic‑proportional).
- Doorkeeper (Bloom‑like bitset) gates first‑seen keys; it is reset with each aging to avoid false‑positive growth.

Recommended starting points: `table_len_per_shard: 8192–32768` · `door_bits_per_counter: 12` · `sample_multiplier: 8–12`.
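A simplified sketch of the admission decision (Doorkeeper first, then a frequency comparison against the eviction victim); counter widths, sharding, and aging are omitted, and the interfaces below are stand‑ins rather than the project's types:

```go
// Simplified W-TinyLFU-style admission: a Bloom-like doorkeeper gates
// first-seen keys, then a Count-Min Sketch estimate for the candidate is
// compared with the estimate for the LRU victim.
package admission

type sketch interface {
	Increment(hash uint64)
	Estimate(hash uint64) uint32
}

type doorkeeper interface {
	// AllowOnce returns false the first time a hash is seen and true afterwards.
	AllowOnce(hash uint64) bool
}

// Admit reports whether the candidate should replace the victim.
func Admit(cms sketch, door doorkeeper, candidate, victim uint64) bool {
	// First sighting: record it in the doorkeeper only, do not admit yet.
	if !door.AllowOnce(candidate) {
		return false
	}
	cms.Increment(candidate)
	// Admit only if the candidate is estimated to be more popular than the victim.
	return cms.Estimate(candidate) > cms.Estimate(victim)
}
```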
## Sizing evidence (current tests)
With randomized object sizes between 1 KiB and 16 KiB (mocks), the cache fills to ~10 GiB of logical data with ~500 MiB of overhead. Resident usage stabilizes around ~10.5 GiB for a 10 GiB dataset under these conditions.
## Build & run
Requirements: Go 1.24+

```bash
# Build
go build -o advCache ./cmd/main.go

# Run (uses the default config path if present)
./advCache

# Run with an explicit config path
./advCache -cfg ./advcache.cfg.yaml

# Docker (example multi-stage)
docker build -t advcache .
docker run --rm -p 8020:8020 -v "$PWD/public/dump:/app/public/dump" advcache -cfg /app/advcache.cfg.yaml
```

The built‑in defaults try `advcache.cfg.yaml` and then `advcache.cfg.local.yaml` if `-cfg` is not provided.
## HTTP endpoints
- `GET /{any}` — main cached endpoint (cache key = path + selected query + selected request headers).
- `GET /cache/on` — enable caching.
- `GET /cache/off` — disable caching.
- `GET /cache/clear` — two‑step clear: the first call returns a token with a 5‑minute TTL; a second call with `?token=` performs the clear.
- `GET /metrics` — Prometheus/VictoriaMetrics exposition.
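For example, the two‑step clear driven from a Go client might look like this (the token response format is an assumption here; check the controller in `internal/cache/api` for the exact shape):

```go
// Hypothetical client for the two-step clear: the first GET returns a token
// (format assumed), the second GET presents it within its 5-minute TTL.
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strings"
)

func main() {
	base := "http://localhost:8020"

	// Step 1: request a clear token.
	resp, err := http.Get(base + "/cache/clear")
	if err != nil {
		panic(err)
	}
	body, _ := io.ReadAll(resp.Body)
	resp.Body.Close()
	token := strings.TrimSpace(string(body)) // assumption: token is returned in the body

	// Step 2: confirm the clear with the token.
	resp, err = http.Get(base + "/cache/clear?token=" + url.QueryEscape(token))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("clear status:", resp.Status)
}
```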
## Observability
- Hits, misses, proxied/fallback counts, errors, panics.
- Cache length and memory gauges.
- Upstream health: healthy/sick/dead.
- Eviction/admission activity.
- Refresh scan/attempt metrics.
Enable periodic stats to stdout with logs.stats: true in config.
## Tuning guide (ops)
- Upstream policy: `deny` for fail‑fast load tests; `await` in production for back‑pressure.
- Eviction thresholds: start with `soft_limit: 0.8–0.9`, `hard_limit: 0.9–0.99`, and `forceGC.enabled: true`. If hot‑path eviction triggers often, increase evictor `replicas` or `scan_rate`.
- Admission: watch Doorkeeper density and the reset interval; if density exceeds ~0.5, increase `door_bits_per_counter` or reduce `sample_multiplier`.
- CPU: leave `gomaxprocs: 0` in production; pin CPUs via container limits/quotas if needed.
- Headers: whitelist only what must participate in the key; `Accept-Encoding` is a good default when you store compressed variants.
## Testing
- Unit tests around the storage hot path, TinyLFU, and the shard balancer.
- Dump/load tests with CRC and rotation.
- Upstream fault injection: timeouts, spikes, error bursts.
- Benchmarks with `-benchmem`; race‑enabled tests for concurrency‑sensitive code.
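A typical hot‑path benchmark shape for a `_test.go` file, run with `go test -bench . -benchmem` (the tiny store below is a stand‑in, not the project's storage API):

```go
package storage

import (
	"strconv"
	"sync"
	"testing"
)

// store is a minimal stand-in so the benchmark is self-contained.
type store struct {
	mu sync.RWMutex
	m  map[string][]byte
}

func (s *store) Set(k string, v []byte) { s.mu.Lock(); s.m[k] = v; s.mu.Unlock() }
func (s *store) Get(k string) ([]byte, bool) {
	s.mu.RLock()
	v, ok := s.m[k]
	s.mu.RUnlock()
	return v, ok
}

func BenchmarkGetHotPath(b *testing.B) {
	s := &store{m: make(map[string][]byte)}
	for i := 0; i < 1024; i++ {
		s.Set("key-"+strconv.Itoa(i), []byte("payload"))
	}
	b.ReportAllocs() // same effect as -benchmem for this benchmark
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		i := 0
		for pb.Next() {
			s.Get("key-" + strconv.Itoa(i&1023))
			i++
		}
	})
}
```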
## License
MIT — see LICENSE.
## Maintainer
Borislav Glazunov — glazunov2142@gmail.com · Telegram @glbrslv