archive

package module
v1.0.3
Published: Jul 7, 2025 License: MIT Imports: 10 Imported by: 0

README

go-cache-archive

A persistent, fixed-size ring-buffer cache that lives on disk but behaves like a simple in-memory byte-slice store. It is optimised for very large datasets (several gigabytes and beyond) that do not fit comfortably in RAM, and for high-frequency append/read workloads where a single producer continually writes new records while many consumers read them, possibly concurrently.

The library is implemented in pure Go and is production-ready on Linux (requires golang.org/x/sys/unix).


Key Features

  • Memory-mapping (mmap): optional direct access to the file’s pages without extra syscalls; the kernel loads pages on demand and evicts them under memory pressure. Great for read-heavy scenarios.
  • Sharding: splits a huge cache into multiple smaller files (cache.dat, cache.dat.1, …) so each file stays below the filesystem’s sweet-spot size (default 256 MB × 4). This keeps mmap page tables small and reduces fsync latency.
  • Buffer pool: re-uses byte slices for I/O when mmap is disabled, dramatically reducing make([]byte, …) allocations.
  • Prefetch: when a record is read, the next N records can be fetched asynchronously (either by touching mmap pages or by issuing background reads). Useful for sequential consumers.
  • Auto-sequenced writes: WriteHead picks the next ID and wraps at a configurable min/max, removing boilerplate from the producer.
  • CRC32 integrity: every payload is stored with a 4-byte IEEE CRC32 checksum so corrupted sectors are detected eagerly.
  • Goroutine-safe: internally sharded sync.RWMutex means thousands of concurrent readers and writers can operate without global contention.

Why use a disk-backed ring buffer?

Typical RAM usage

go-cache-archive stores exactly recordSize + 4 bytes per entry on disk. With mmap enabled the kernel will only keep hot pages resident. Cold pages are purged without affecting the process’s RSS.

Cache size   Records (example)   RSS when cold           RSS after reading entire cache
1 GB         33 M × 32 B         ~0 MB (metadata only)   ≤ 1 GB (pages stay resident until memory pressure)
8 GB         262 M × 32 B        ~0 MB                   ≤ 8 GB

In contrast, an in-memory [][]byte cache always keeps the full N × recordSize in RAM and adds extra pointer and slice-header overhead (~24 B per entry).

Theoretical Throughput
  • Read: with mmap*, memcpy speed (> 8 GB/s on modern NVMe) because the kernel prefetches clusters and the call avoids syscalls; with direct I/O, one pread syscall per record, limited by disk throughput plus syscall overhead.
  • Write: with mmap*, memcpy into the mapped page followed by an optional msync; with direct I/O, pwrite followed by an optional fsync.

* Numbers assume SSD/NVMe and that the page is already loaded in the VFS cache.

In benchmarks with 4 kB records on an NVMe drive:

  • Writes: 400–500 MB/s sustained (msync batched every 64 records).
  • Reads (cold): 2–3 GB/s sequential thanks to the kernel readahead.

Producer / Consumer Model

flowchart LR
    subgraph Producer/Cache
        P[Producer]
        C[RingBufferCache]
        P -- "Write(ID++, payload)" --> C
    end

    subgraph Consumers
        direction TB
        Consumer1[Consumer 1]
        ConsumerN[... Many Consumers ...]
    end

    C -- "Read(lastID-N..∞)" --> ConsumerN
    Consumer1 -- "Read(ID)" --> C
    C --> Disk[(Cold Storage)]

  1. Producer appends one record every second (or faster).
  2. Each consumer polls the last known ID or uses BulkRead to fetch the newest batch (see the sketch after this list).
  3. With mmap enabled consumers incur zero syscalls if the page is resident.
  4. Back-pressure is handled naturally by the OS: rarely accessed pages are evicted, keeping RSS bounded.
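
The loop below is a minimal sketch of this model under a few assumptions: it reuses the cache value created in the Quick Start below, assumes imports of log and time, and omits ID wrap-around handling for brevity. Only the documented WriteHead, Head, RecordSize, and BulkRead calls are used.

// Producer goroutine: appends one record per tick using auto-sequenced writes.
go func() {
	for range time.Tick(time.Second) {
		payload := make([]byte, cache.RecordSize()) // fill with real data in practice
		if _, err := cache.WriteHead(payload, true); err != nil {
			log.Printf("write failed: %v", err)
		}
	}
}()

// Consumer: polls the head and fetches everything written since the last poll.
lastSeen := cache.Head()
for range time.Tick(time.Second) {
	head := cache.Head()
	if head == lastSeen {
		continue // nothing new yet
	}
	batch, err := cache.BulkRead(lastSeen+1, int(head-lastSeen))
	if err != nil {
		log.Printf("read failed: %v", err)
		continue
	}
	for _, rec := range batch {
		_ = rec // process the record
	}
	lastSeen = head
}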

Quick Start

import "github.com/luhtfiimanal/go-cache-archive"

// Create a cache allowing IDs 100..5000 (wraps automatically)
opts := archive.DefaultOptions()
opts.MinIDAlloc = 100
opts.MaxIDAlloc = 5000

cache, err := archive.NewRingBufferCacheWithOptions("/var/lib/myapp/cache.dat", opts)
if err != nil { panic(err) }

defer cache.Close()

// producer (IDs must fall within MinIDAlloc..MaxIDAlloc)
payload := make([]byte, 128)
cache.Write(100, payload, true) // flush immediately

// consumer
p, _ := cache.Read(100)

See archive_test.go for more examples.

Auto-sequenced writes (producer convenience)

Instead of picking the next ID yourself, call WriteHead and let the cache advance the head atomically:

id, err := cache.WriteHead(payload, true) // flush to disk

// and you can get the current head with
head := cache.Head()

// and tail with
tail := cache.Tail()

Head() and Tail() give you visibility into the current range. The head and tail values are persisted in a side-car .meta file, so the cache resumes correctly after a restart.
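
Because the metadata survives restarts, a producer can resume without any bookkeeping of its own. The snippet below is a hedged sketch of that flow; it assumes the same path, opts, and payload used in the earlier snippets, plus an import of fmt.

// Reopen the cache after a restart; head and tail are restored from the .meta file.
cache, err := archive.NewRingBufferCacheWithOptions("/var/lib/myapp/cache.dat", opts)
if err != nil {
	panic(err)
}
defer cache.Close()

fmt.Printf("resumed with head=%d tail=%d\n", cache.Head(), cache.Tail())

// WriteHead simply continues from the persisted head, wrapping within
// [MinIDAlloc, MaxIDAlloc].
id, err := cache.WriteHead(payload, true)
if err != nil {
	panic(err)
}
fmt.Printf("wrote id=%d\n", id)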


Project File Layout

File             Responsibility
doc.go           Package-level documentation visible at pkg.go.dev.
options.go       CacheOptions struct and DefaultOptions().
shard.go         Internal shard struct (file descriptor, mmap slice, offsets).
cache.go         RingBufferCache definition and constructors.
shard_lookup.go  Helper to map a global ID ➜ shard + relative ID.
buffer.go        Buffer-pool helpers and lock-sharding util.
io.go            Write, Read, BulkWrite, BulkRead, CRC logic, prefetch.
stats.go         Lightweight stats collection (Hits, Misses, ratios).
flush_close.go   Flush and Close implementations (msync/fsync).
archive_test.go  Unit tests covering correctness and concurrency.

Roadmap / Ideas

  • Eviction or TTL-based trimming for infinite streams.
  • Compression codecs per record.
  • Windows / macOS support.

Contributions welcome — open an issue or PR!

Documentation

Overview

Package archive provides a persistent ring-buffer cache that stores fixed-size records on disk with optional memory-mapping, sharding, buffer pooling, and automatic read-ahead (prefetch).

The library is organised into several files for clarity:

options.go      – configuration struct & defaults
shard.go        – shard representation
cache.go        – constructors & core fields
shard_lookup.go – helper to locate a shard for an ID
buffer.go       – pooled buffer & lock helpers
io.go           – read/write logic & CRC integrity
stats.go        – lightweight stats accessors
flush_close.go  – flush & close helpers

See the README for usage examples.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type CacheOptions

type CacheOptions struct {
	// Ring ID allocation range
	MinIDAlloc     int64 // first ID to be used (default 1)
	MaxIDAlloc     int64 // maximum ID (0 = same as size)
	UseMmap        bool  // use memory-mapping for better performance
	ShardCount     int   // number of shards (0 = single file)
	RecordSize     int   // payload size of each record in bytes, must be > 0
	BufferPoolSize int   // buffer pool size (0 = disable)
	PrefetchSize   int   // prefetch N records ahead (0 = disable)
}

CacheOptions provides the configuration options for RingBufferCache.

  • UseMmap: enable memory-mapping for faster data access
  • ShardCount: number of shards used to split a large file (0 = single file)
  • BufferPoolSize: buffer pool size used to reduce allocations (0 = disabled)
  • PrefetchSize: number of records prefetched on each read (0 = disabled)

All fields are optional; a value of 0 means use the default. See DefaultOptions() for the default values.
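
A minimal configuration sketch (the field values here are illustrative, not recommendations):

opts := archive.DefaultOptions()
opts.RecordSize = 256      // payload size in bytes; must be > 0
opts.UseMmap = true        // map shards into memory for syscall-free reads
opts.ShardCount = 4        // split the cache across 4 files
opts.BufferPoolSize = 1024 // reuse I/O buffers when mmap is disabled
opts.PrefetchSize = 8      // read ahead 8 records on every Read
opts.MinIDAlloc = 1
opts.MaxIDAlloc = 1_000_000

cache, err := archive.NewRingBufferCacheWithOptions("/var/lib/myapp/cache.dat", opts)
if err != nil {
	// handle error
}
defer cache.Close()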

func DefaultOptions

func DefaultOptions() CacheOptions

DefaultOptions returns the default configuration used by NewRingBufferCache.

type RingBufferCache

type RingBufferCache struct {
	// contains filtered or unexported fields
}

RingBufferCache provides a file-backed ring-buffer implementation with support for sharding, memory-mapping, buffer pooling, and prefetching.

All operations are goroutine-safe.

func NewRingBufferCache

func NewRingBufferCache(path string, opts CacheOptions) (*RingBufferCache, error)

NewRingBufferCache creates a cache with the default options (see DefaultOptions).

func NewRingBufferCacheWithOptions

func NewRingBufferCacheWithOptions(basePath string, opts CacheOptions) (*RingBufferCache, error)

NewRingBufferCacheWithOptions creates a new RingBufferCache with the specified options. Parameters:

  • basePath: The base file path for the cache shards.
  • opts: Configuration options for the cache.

Returns:

  • A pointer to the created RingBufferCache.
  • An error if initialization fails, including directory creation, file opening, or memory mapping.

func (*RingBufferCache) BulkRead

func (c *RingBufferCache) BulkRead(startID int64, count int) ([][]byte, error)

BulkRead reads several consecutive records.

func (*RingBufferCache) BulkWrite

func (c *RingBufferCache) BulkWrite(startID int64, payloads [][]byte, flush bool) error

BulkWrite writes several consecutive payloads.
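
A short sketch of the bulk calls, assuming a cache created as in the README and payloads sized to RecordSize:

// Write three consecutive records starting at ID 10, flushing once at the end.
payloads := [][]byte{
	make([]byte, cache.RecordSize()),
	make([]byte, cache.RecordSize()),
	make([]byte, cache.RecordSize()),
}
if err := cache.BulkWrite(10, payloads, true); err != nil {
	// handle error
}

// Read the same three records back in one call.
records, err := cache.BulkRead(10, 3)
if err != nil {
	// handle error
}
_ = records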

func (*RingBufferCache) Close

func (c *RingBufferCache) Close() error

Close releases all resources (files and mmap regions) owned by the cache.

func (*RingBufferCache) Delete added in v1.0.2

func (c *RingBufferCache) Delete(id int64) error

Delete removes the record with the given ID by writing a zero-valued payload of record-size bytes together with a matching CRC, so the slot reads back as a valid empty record. It always flushes (fsync/msync), so the change is persisted to disk immediately.
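
For illustration (assuming a cache created as in the README):

// Blank out record 42; the slot is rewritten with zeroes plus a matching CRC
// and the change is flushed immediately.
if err := cache.Delete(42); err != nil {
	// handle error
}

// Per the description above, a later Read(42) should return a zero-filled
// payload rather than an error.
p, _ := cache.Read(42)
_ = p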

func (*RingBufferCache) Flush

func (c *RingBufferCache) Flush() error

Flush forces all buffered data to disk.

func (*RingBufferCache) GetStats

func (c *RingBufferCache) GetStats() Stats

GetStats takes a snapshot of the statistics without heavy locking.

func (*RingBufferCache) Head

func (c *RingBufferCache) Head() int64

Head returns the current head (last written ID).

func (*RingBufferCache) Read

func (c *RingBufferCache) Read(id int64) ([]byte, error)

Read retrieves the payload for the given ID.

func (*RingBufferCache) RecordSize

func (c *RingBufferCache) RecordSize() int

RecordSize returns the size of each record (payload).

func (*RingBufferCache) ResetStats

func (c *RingBufferCache) ResetStats()

ResetStats resets the hit/miss counters.

func (*RingBufferCache) ShardCount

func (c *RingBufferCache) ShardCount() int

ShardCount returns the number of shards on disk.

func (*RingBufferCache) Size

func (c *RingBufferCache) Size() int64

Size returns the total number of ID slots.

func (*RingBufferCache) Tail

func (c *RingBufferCache) Tail() int64

Tail returns the current tail (currently static unless eviction is implemented).

func (*RingBufferCache) Write

func (c *RingBufferCache) Write(id int64, payload []byte, flush bool) error

func (*RingBufferCache) WriteHead

func (c *RingBufferCache) WriteHead(payload []byte, flush bool) (int64, error)

WriteHead writes payload to the next ID (head+1, wrapping) and returns the new ID.

type Stats

type Stats struct {
	Hits     uint64
	Misses   uint64
	HitRatio float64
}

Stats stores the cache hit/miss statistics. HitRatio is expressed as a percentage (0-100).
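
A brief usage sketch (assuming a cache that has already served some reads, and an import of fmt):

stats := cache.GetStats()
fmt.Printf("hits=%d misses=%d hit-ratio=%.1f%%\n", stats.Hits, stats.Misses, stats.HitRatio)

// Reset the counters, e.g. at the start of a new measurement window.
cache.ResetStats()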
