Documentation
¶
Overview ¶
Package csrfile defines the on-disk binary format used by GoGraph's Tier 2 (out-of-core, mmap-backed) CSR storage.
The format is stable and versioned; the full specification lives at docs/csrfile-v1.md alongside this repository. Tier 2 readers mmap the file and reinterpret the aligned sections as typed []uint64 slices without parsing.
Example ¶
Example writes a CSR snapshot to an on-disk Tier 2 file and reads it back through the mmap-backed Reader, inspecting the header and the stored edge weights without re-parsing the body.
package main
import (
"fmt"
"os"
"path/filepath"
"github.com/FlavioCFOliveira/GoGraph/graph/adjlist"
"github.com/FlavioCFOliveira/GoGraph/graph/csr"
"github.com/FlavioCFOliveira/GoGraph/store/csrfile"
)
func main() {
dir, err := os.MkdirTemp("", "csrfile-example")
if err != nil {
panic(err)
}
defer func() { _ = os.RemoveAll(dir) }()
// Build a small weighted directed graph and freeze it into an
// immutable CSR snapshot.
a := adjlist.New[string, int64](adjlist.Config{Directed: true})
for _, e := range []struct {
src, dst string
w int64
}{
{"a", "b", 10},
{"a", "c", 20},
{"b", "c", 30},
} {
if err := a.AddEdge(e.src, e.dst, e.w); err != nil {
panic(err)
}
}
c := csr.BuildFromAdjList(a)
// Write the CSR to disk; publication is atomic (write .tmp, fsync,
// rename).
path := filepath.Join(dir, "graph.csr")
if _, err := csrfile.WriteToFile(path, c); err != nil {
panic(err)
}
// Open mmaps the file read-only and verifies the header and tail
// CRC. The returned slices alias the mapped region and stay valid
// until Close.
r, err := csrfile.Open(path)
if err != nil {
panic(err)
}
defer func() { _ = r.Close() }()
// NEdges is the live edge count. NVertices is the length of the
// dense vertex-offset (row-pointer) array, which is sized by the
// largest interned NodeID rather than by the number of distinct
// keys, so it is not asserted here; csr.CSR.Order reports the live
// vertex count when that is what you need.
h := r.Header()
fmt.Printf("edges=%d\n", h.NEdges)
weights, ok := r.WeightsUint64()
fmt.Printf("weights present=%t count=%d\n", ok, len(weights))
// The mmapped edge slice has one entry per stored edge.
fmt.Printf("edge slice len=%d\n", len(r.Edges()))
}
Output: edges=3 weights present=true count=3 edge slice len=3
Index ¶
- Constants
- Variables
- func AlignUp(n, alignment uint64) uint64
- func BuildFixture(spec FixtureSpec) (*csr.CSR[struct{}], error)
- func EncodeHeader(h Header) []byte
- func Reinterpret[T any](data []byte, n int) []T
- type AccessPattern
- type FixtureSpec
- type Header
- type Reader
- func (r *Reader) Close() error
- func (r *Reader) Edges() []graph.NodeID
- func (r *Reader) Header() Header
- func (r *Reader) SetHint(pattern AccessPattern) error
- func (r *Reader) Vertices() []uint64
- func (r *Reader) WeightsFloat64() ([]float64, bool)
- func (r *Reader) WeightsRaw() []byte
- func (r *Reader) WeightsUint64() ([]uint64, bool)
- type WeightKind
Examples ¶
Constants ¶
const Alignment = 64
Alignment is the section alignment in bytes; every typed section starts at an offset that is a multiple of this value.
const CurrentVersion uint16 = 1
CurrentVersion is the format version emitted by this build.
const HeaderSize = 64
HeaderSize is the fixed byte length of the file header.
Variables ¶
var ( // ErrBadMagic indicates the first four bytes are not Magic. ErrBadMagic = errors.New("csrfile: bad magic") // ErrUnsupportedVersion indicates a newer-than-build version. ErrUnsupportedVersion = errors.New("csrfile: unsupported version") // ErrUnsupportedByteOrder indicates a non-little-endian file. ErrUnsupportedByteOrder = errors.New("csrfile: unsupported byte order") // ErrUnknownWeightKind indicates an out-of-range WeightKind. ErrUnknownWeightKind = errors.New("csrfile: unknown weight kind") // ErrFileCorrupted indicates the tail CRC32C check failed. ErrFileCorrupted = errors.New("csrfile: file corrupted") // ErrHeaderTooShort indicates a short read while parsing the header. ErrHeaderTooShort = errors.New("csrfile: header too short") // ErrHeaderInconsistent indicates the decoded header's counts and // offsets do not match the single canonical on-disk layout for // those counts (a malformed, hostile, or overflowing header). It // wraps [ErrFileCorrupted]: errors.Is(err, ErrFileCorrupted) is // true for any structural rejection, so callers that already test // for ErrFileCorrupted keep working. ErrHeaderInconsistent = fmt.Errorf("%w: header layout inconsistent", ErrFileCorrupted) )
Errors surfaced by the format helpers.
var Magic = [4]byte{'G', 'G', 'C', 'S'}
Magic is the 4-byte file identifier: ASCII "GGCS".
Functions ¶
func BuildFixture ¶
func BuildFixture(spec FixtureSpec) (*csr.CSR[struct{}], error)
BuildFixture deterministically builds a CSR snapshot meeting spec. The graph is directed and uses uint32 vertex identifiers; weights are absent (struct{}). Suitable for Tier 2 benchmarks and the crash-recovery harness, where reproducibility matters more than realism.
BuildFixture returns any error surfaced by the underlying adjlist.AdjList; with the default uncapped configuration the only failure mode is adjlist.ErrShardFull, which cannot be reached because no adjlist.Config.MaxShardCapacity is set.
func EncodeHeader ¶
EncodeHeader writes h into a fresh HeaderSize-byte slice.
func Reinterpret ¶
Reinterpret returns a typed slice of length n that aliases the memory backing data. T must be a fixed-size primitive (or a named alias of one) — int8/int16/int32/int64/uint*/float32/float64 or a type whose layout is identical to one of those. The function panics when data is too short to hold n elements of T or when its alignment is incompatible with T.
Precondition on n: the byte requirement size(T)*n must be representable; n is treated as untrusted. When size(T)*n overflows (or exceeds what a Go slice length can address), the requirement can never be satisfied by any real buffer, so Reinterpret takes the same "data too short" panic path rather than computing a wrapped product and slicing out of bounds. Callers that derive n from a wire- or file-encoded value therefore get a deterministic, guarded failure instead of an out-of-bounds read.
Because the returned slice aliases data's memory, callers must preserve the data buffer's lifetime for the duration they hold the returned slice, and must not mutate it through the returned slice in ways that would surprise other readers. Typical use: re-typing the body of a memory-mapped region as a []uint64 view.
This helper is unsafe in the Go-language sense (it uses unsafe.Pointer and unsafe.Slice); the project policy for unsafe usage is documented in CONTRIBUTING.md.
Types ¶
type AccessPattern ¶
type AccessPattern uint8
AccessPattern is the advisory hint given to the OS about the expected memory-access pattern of a mapped section.
const ( AccessDefault AccessPattern = iota AccessSequential AccessRandom AccessWillNeed AccessDontNeed )
Supported access patterns.
type FixtureSpec ¶
type FixtureSpec struct {
// Vertices is the number of pre-interned vertex IDs.
Vertices uint64
// Edges is the number of edges to add (uniformly random
// (src, dst) over [0, Vertices)).
Edges uint64
// Seed is the PCG seed (any uint64).
Seed uint64
// Multigraph allows parallel edges; without it duplicates are
// silently collapsed.
Multigraph bool
}
FixtureSpec parameterises BuildFixture. The same seed produces the same graph every time, making the harness deterministic.
type Header ¶
type Header struct {
Version uint16
Alignment uint8
NVertices uint64
NEdges uint64
Weight WeightKind
VerticesOffset uint64
EdgesOffset uint64
WeightsOffset uint64
TailCRCOffset uint64
}
Header is the in-memory representation of the 64-byte file preamble.
func DecodeHeader ¶
DecodeHeader parses the first HeaderSize bytes into a Header.
func Layout ¶
func Layout(nVertices, nEdges uint64, weight WeightKind) (header Header, totalBytes uint64)
Layout computes the on-disk byte offsets and total size for a header populated with NVertices, NEdges, and Weight. It returns the header with offsets filled and the total file size including the trailing CRC32C uint32.
All arithmetic is overflow-safe: when NVertices, NEdges, or the derived section sizes would overflow a uint64 (as a hostile or corrupted header can request), Layout returns the zero Header and a zero totalBytes. Callers — both the writer and [Header.validate] — must treat a zero totalBytes as "this header is not representable on disk" and reject it rather than proceeding with bogus offsets.
func WriteToFile ¶
WriteToFile serialises c into the path atomically: data lands in path + ".tmp" first, the file is fsync'd, and only then renamed onto path. Concurrent readers see either the previous file or the new file, never a partial write.
W must be one of the supported weight kinds (int32/uint32/float32 for 4-byte; int/uint/int64/uint64/float64/uintptr for 8-byte) or struct{} for unweighted graphs. Unsupported types produce ErrUnknownWeightKind.
type Reader ¶
type Reader struct {
// contains filtered or unexported fields
}
Reader is a read-only, mmap-backed view of a csrfile.
All slices returned by Reader.Vertices / Reader.Edges / [Reader.Weights] alias the underlying mmap region — they must not be mutated and remain valid only while the Reader is open. Close invalidates them.
func Open ¶
Open mmaps path read-only, verifies the header and the tail CRC, and returns a Reader pointing into the mapped region.
func (*Reader) Close ¶
Close releases the mmap and underlying file. Any slice returned by the Reader becomes invalid.
func (*Reader) SetHint ¶
func (r *Reader) SetHint(pattern AccessPattern) error
SetHint applies an OS-level advisory hint to the mapped region describing the expected access pattern. On Linux it issues madvise; on other platforms the call returns nil and is a no-op at the OS level (Go's mmap-go does not expose madvise on every platform).
func (*Reader) Vertices ¶
Vertices returns the offsets slice. Each entry is the start index in Reader.Edges of that vertex's out-neighbours.
func (*Reader) WeightsFloat64 ¶
WeightsFloat64 returns the weights section as a []float64 when possible.
func (*Reader) WeightsRaw ¶
WeightsRaw returns the raw bytes of the weights section. Use Reader.WeightsUint64 / Reader.WeightsFloat64 for typed views. Returns nil when the file is unweighted.
func (*Reader) WeightsUint64 ¶
WeightsUint64 returns the weights section as a []uint64 when possible. Returns nil, false when the weight kind is not 8-byte integer.
type WeightKind ¶
type WeightKind uint8
WeightKind tags the on-disk type of the weights section.
const ( WeightAbsent WeightKind = 0 WeightUint32 WeightKind = 1 WeightUint64 WeightKind = 2 WeightFloat32 WeightKind = 3 WeightFloat64 WeightKind = 4 )
Supported weight kinds.
func (WeightKind) Size ¶
func (k WeightKind) Size() int
Size returns the byte size of one weight value for k.