Documentation
¶
Overview ¶
Package csrfile defines the on-disk binary format used by GoGraph's Tier 2 (out-of-core, mmap-backed) CSR storage.
The format is stable and versioned; the full specification lives at docs/csrfile-v1.md alongside this repository. Tier 2 readers mmap the file and reinterpret the aligned sections as typed []uint64 slices without parsing.
Example ¶
Example writes a CSR snapshot to an on-disk Tier 2 file and reads it back through the mmap-backed Reader, inspecting the header and the stored edge weights without re-parsing the body.
package main
import (
"fmt"
"os"
"path/filepath"
"github.com/FlavioCFOliveira/GoGraph/graph/adjlist"
"github.com/FlavioCFOliveira/GoGraph/graph/csr"
"github.com/FlavioCFOliveira/GoGraph/store/csrfile"
)
func main() {
dir, err := os.MkdirTemp("", "csrfile-example")
if err != nil {
panic(err)
}
defer func() { _ = os.RemoveAll(dir) }()
// Build a small weighted directed graph and freeze it into an
// immutable CSR snapshot.
a := adjlist.New[string, int64](adjlist.Config{Directed: true})
for _, e := range []struct {
src, dst string
w int64
}{
{"a", "b", 10},
{"a", "c", 20},
{"b", "c", 30},
} {
if err := a.AddEdge(e.src, e.dst, e.w); err != nil {
panic(err)
}
}
c := csr.BuildFromAdjList(a)
// Write the CSR to disk; publication is atomic (write .tmp, fsync,
// rename).
path := filepath.Join(dir, "graph.csr")
if _, err := csrfile.WriteToFile(path, c); err != nil {
panic(err)
}
// Open mmaps the file read-only and verifies the header and tail
// CRC. The returned slices alias the mapped region and stay valid
// until Close.
r, err := csrfile.Open(path)
if err != nil {
panic(err)
}
defer func() { _ = r.Close() }()
// NEdges is the live edge count. NVertices is the length of the
// dense vertex-offset (row-pointer) array, which is sized by the
// largest interned NodeID rather than by the number of distinct
// keys, so it is not asserted here; csr.CSR.Order reports the live
// vertex count when that is what you need.
h := r.Header()
fmt.Printf("edges=%d\n", h.NEdges)
weights, ok := r.WeightsUint64()
fmt.Printf("weights present=%t count=%d\n", ok, len(weights))
// The mmapped edge slice has one entry per stored edge.
fmt.Printf("edge slice len=%d\n", len(r.Edges()))
}
Output: edges=3 weights present=true count=3 edge slice len=3
Index ¶
- Constants
- Variables
- func AlignUp(n, alignment uint64) uint64
- func BuildFixture(spec FixtureSpec) (*csr.CSR[struct{}], error)
- func EncodeHeader(h Header) []byte
- func Reinterpret[T any](data []byte, n int) []T
- type AccessPattern
- type FixtureSpec
- type Header
- type Reader
- func (r *Reader) Close() error
- func (r *Reader) Edges() []graph.NodeID
- func (r *Reader) Header() Header
- func (r *Reader) Read(fn func(vertices []uint64, edges []graph.NodeID, weights []byte) error) error
- func (r *Reader) SetHint(pattern AccessPattern) error
- func (r *Reader) Vertices() []uint64
- func (r *Reader) WeightsFloat64() ([]float64, bool)
- func (r *Reader) WeightsRaw() []byte
- func (r *Reader) WeightsUint64() ([]uint64, bool)
- type WeightKind
Examples ¶
Constants ¶
const Alignment = 64
Alignment is the section alignment in bytes; every typed section starts at an offset that is a multiple of this value.
const CurrentVersion uint16 = 1
CurrentVersion is the format version emitted by this build.
const HeaderSize = 64
HeaderSize is the fixed byte length of the file header.
Variables ¶
var ( // ErrBadMagic indicates the first four bytes are not Magic. ErrBadMagic = errors.New("csrfile: bad magic") // ErrUnsupportedVersion indicates a newer-than-build version. ErrUnsupportedVersion = errors.New("csrfile: unsupported version") // ErrUnsupportedByteOrder indicates a non-little-endian file. ErrUnsupportedByteOrder = errors.New("csrfile: unsupported byte order") // ErrUnknownWeightKind indicates an out-of-range WeightKind. ErrUnknownWeightKind = errors.New("csrfile: unknown weight kind") // ErrFileCorrupted indicates the tail CRC32C check failed. ErrFileCorrupted = errors.New("csrfile: file corrupted") // ErrHeaderTooShort indicates a short read while parsing the header. ErrHeaderTooShort = errors.New("csrfile: header too short") // ErrReaderClosed indicates an operation was attempted on a Reader // whose mmap has already been released by [Reader.Close]. ErrReaderClosed = errors.New("csrfile: reader closed") // ErrHeaderInconsistent indicates the decoded header's counts and // offsets do not match the single canonical on-disk layout for // those counts (a malformed, hostile, or overflowing header). It // wraps [ErrFileCorrupted]: errors.Is(err, ErrFileCorrupted) is // true for any structural rejection, so callers that already test // for ErrFileCorrupted keep working. ErrHeaderInconsistent = fmt.Errorf("%w: header layout inconsistent", ErrFileCorrupted) )
Errors surfaced by the format helpers.
var Magic = [4]byte{'G', 'G', 'C', 'S'}
Magic is the 4-byte file identifier: ASCII "GGCS".
Functions ¶
func BuildFixture ¶
func BuildFixture(spec FixtureSpec) (*csr.CSR[struct{}], error)
BuildFixture deterministically builds a CSR snapshot meeting spec. The graph is directed and uses uint32 vertex identifiers; weights are absent (struct{}). Suitable for Tier 2 benchmarks and the crash-recovery harness, where reproducibility matters more than realism.
BuildFixture returns any error surfaced by the underlying adjlist.AdjList; with the default uncapped configuration the only failure mode is adjlist.ErrShardFull, which cannot be reached because no adjlist.Config.MaxShardCapacity is set.
func EncodeHeader ¶
EncodeHeader writes h into a fresh HeaderSize-byte slice.
func Reinterpret ¶
Reinterpret returns a typed slice of length n that aliases the memory backing data. T must be a fixed-size primitive (or a named alias of one) — int8/int16/int32/int64/uint*/float32/float64 or a type whose layout is identical to one of those. The function panics when data is too short to hold n elements of T or when its alignment is incompatible with T.
Precondition on n: the byte requirement size(T)*n must be representable; n is treated as untrusted. When size(T)*n overflows (or exceeds what a Go slice length can address), the requirement can never be satisfied by any real buffer, so Reinterpret takes the same "data too short" panic path rather than computing a wrapped product and slicing out of bounds. Callers that derive n from a wire- or file-encoded value therefore get a deterministic, guarded failure instead of an out-of-bounds read.
Because the returned slice aliases data's memory, callers must preserve the data buffer's lifetime for the duration they hold the returned slice, and must not mutate it through the returned slice in ways that would surprise other readers. Typical use: re-typing the body of a memory-mapped region as a []uint64 view.
This helper is unsafe in the Go-language sense (it uses unsafe.Pointer and unsafe.Slice); the project policy for unsafe usage is documented in CONTRIBUTING.md.
Types ¶
type AccessPattern ¶
type AccessPattern uint8
AccessPattern is the advisory hint given to the OS about the expected memory-access pattern of a mapped section.
const ( AccessDefault AccessPattern = iota AccessSequential AccessRandom AccessWillNeed AccessDontNeed )
Supported access patterns.
type FixtureSpec ¶
type FixtureSpec struct {
// Vertices is the number of pre-interned vertex IDs.
Vertices uint64
// Edges is the number of edges to add (uniformly random
// (src, dst) over [0, Vertices)).
Edges uint64
// Seed is the PCG seed (any uint64).
Seed uint64
// Multigraph allows parallel edges; without it duplicates are
// silently collapsed.
Multigraph bool
}
FixtureSpec parameterises BuildFixture. The same seed produces the same graph every time, making the harness deterministic.
type Header ¶
type Header struct {
Version uint16
Alignment uint8
NVertices uint64
NEdges uint64
Weight WeightKind
VerticesOffset uint64
EdgesOffset uint64
WeightsOffset uint64
TailCRCOffset uint64
}
Header is the in-memory representation of the 64-byte file preamble.
func DecodeHeader ¶
DecodeHeader parses the first HeaderSize bytes into a Header.
func Layout ¶
func Layout(nVertices, nEdges uint64, weight WeightKind) (header Header, totalBytes uint64)
Layout computes the on-disk byte offsets and total size for a header populated with NVertices, NEdges, and Weight. It returns the header with offsets filled and the total file size including the trailing CRC32C uint32.
All arithmetic is overflow-safe: when NVertices, NEdges, or the derived section sizes would overflow a uint64 (as a hostile or corrupted header can request), Layout returns the zero Header and a zero totalBytes. Callers — both the writer and [Header.validate] — must treat a zero totalBytes as "this header is not representable on disk" and reject it rather than proceeding with bogus offsets.
func WriteToFile ¶
WriteToFile serialises c into the path atomically and durably: data lands in path + ".tmp" first, the temp file's contents are fsync'd, the file is renamed onto path, and finally the PARENT directory is fsync'd so the rename's directory entry survives a crash. Concurrent readers see either the previous file or the new file, never a partial write.
On return with a nil error the published file is crash-durable: it survives process crash, host crash, and kill -9. This guarantee matters because WriteToFile is the bulk loader's sole durability mechanism — the bulk path bypasses the WAL, so there is no replay and no later checkpoint of this artefact to recover a lost rename. Without the parent-directory fsync, a crash within the kernel's writeback window after a successful return could lose the rename's directory entry and with it the entire bulk load. The parent fsync is a no-op on platforms without a directory-fsync primitive (Windows); see [parentDirFsync].
W must be one of the supported weight kinds (int32/uint32/float32 for 4-byte; int/uint/int64/uint64/float64/uintptr for 8-byte) or struct{} for unweighted graphs. Unsupported types produce ErrUnknownWeightKind.
type Reader ¶
type Reader struct {
// contains filtered or unexported fields
}
Reader is a read-only, mmap-backed view of a csrfile.
All slices returned by Reader.Vertices / Reader.Edges / Reader.WeightsRaw alias the underlying mmap region — they must not be mutated and remain valid only for as long as the mapping is live. Reader.Close unmaps the region, after which any retained slice is a dangling view into freed memory; reading it is a use-after-unmap fault, not a recoverable Go panic.
Concurrency. A Reader is safe for concurrent use, but a long-lived reader that binds the slices once and iterates them MUST do so inside Reader.Read: Read holds an internal read lock for the whole duration of the callback, and Close blocks until every in-flight Read has returned before it unmaps. Calling the bare accessors (Reader.Vertices etc.) and then iterating outside Read races a concurrent Close and may fault the process; the accessors are retained only for short, non-concurrent inspection.
func Open ¶
Open mmaps path read-only, verifies the header and the tail CRC, and returns a Reader pointing into the mapped region.
func (*Reader) Close ¶
Close releases the mmap and underlying file. Any slice returned by the Reader becomes invalid.
Close acquires the Reader's write lock, so it blocks until every in-flight Reader.Read has returned before unmapping — no reader can be iterating the mapping at the moment it is released. Close is idempotent: the second and later calls observe the already-released mapping and return nil without unmapping twice. Close is safe to call concurrently from multiple goroutines.
func (*Reader) Edges ¶
Edges returns the edges slice. The returned slice aliases the mmap region and is valid only until Reader.Close; long-lived or concurrent iteration MUST use Reader.Read.
func (*Reader) Read ¶ added in v0.2.0
Read invokes fn with the mmap-aliased vertices, edges and raw weights slices while holding an internal read lock for the whole duration of the call. The lock keeps the mapping live across the entire callback, so fn may safely bind the slices once and iterate them: a concurrent Reader.Close blocks until fn returns before it unmaps, closing the use-after-unmap window that bare accessors plus an external iteration would leave open.
The slices passed to fn alias the read-only mmap region; fn must not mutate them and must not retain them beyond its own return — they are invalid once Read returns. weights is nil when the file is unweighted.
Read returns ErrReaderClosed if the Reader has already been closed; otherwise it returns whatever fn returns. The read lock is released even if fn panics. Read is safe for concurrent use; any number of Read calls run in parallel.
func (*Reader) SetHint ¶
func (r *Reader) SetHint(pattern AccessPattern) error
SetHint applies an OS-level advisory hint to the mapped region describing the expected access pattern. On Linux it issues madvise; on other platforms the call returns nil and is a no-op at the OS level (Go's mmap-go does not expose madvise on every platform).
SetHint holds the read lock across the OS call, so it cannot race a concurrent Reader.Close (which would otherwise unmap the region the madvise syscall is about to touch). It returns ErrReaderClosed if the Reader is already closed. Safe for concurrent use.
func (*Reader) Vertices ¶
Vertices returns the offsets slice. Each entry is the start index in Reader.Edges of that vertex's out-neighbours.
The returned slice aliases the mmap region and is valid only until Reader.Close. A consumer that iterates it concurrently with a possible Close MUST use Reader.Read instead; see the Reader type documentation.
func (*Reader) WeightsFloat64 ¶
WeightsFloat64 returns the weights section as a []float64 when possible.
func (*Reader) WeightsRaw ¶
WeightsRaw returns the raw bytes of the weights section. Use Reader.WeightsUint64 / Reader.WeightsFloat64 for typed views. Returns nil when the file is unweighted. The returned slice aliases the mmap region and is valid only until Reader.Close; long-lived or concurrent iteration MUST use Reader.Read.
func (*Reader) WeightsUint64 ¶
WeightsUint64 returns the weights section as a []uint64 when possible. Returns nil, false when the weight kind is not 8-byte integer.
type WeightKind ¶
type WeightKind uint8
WeightKind tags the on-disk type of the weights section.
const ( WeightAbsent WeightKind = 0 WeightUint32 WeightKind = 1 WeightUint64 WeightKind = 2 WeightFloat32 WeightKind = 3 WeightFloat64 WeightKind = 4 )
Supported weight kinds.
func (WeightKind) Size ¶
func (k WeightKind) Size() int
Size returns the byte size of one weight value for k.