csrfile

package
v0.2.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 5, 2026 License: MIT Imports: 20 Imported by: 0

Documentation

Overview

Package csrfile defines the on-disk binary format used by GoGraph's Tier 2 (out-of-core, mmap-backed) CSR storage.

The format is stable and versioned; the full specification lives at docs/csrfile-v1.md alongside this repository. Tier 2 readers mmap the file and reinterpret the aligned sections as typed []uint64 slices without parsing.

Example

Example writes a CSR snapshot to an on-disk Tier 2 file and reads it back through the mmap-backed Reader, inspecting the header and the stored edge weights without re-parsing the body.

package main

import (
	"fmt"
	"os"
	"path/filepath"

	"github.com/FlavioCFOliveira/GoGraph/graph/adjlist"
	"github.com/FlavioCFOliveira/GoGraph/graph/csr"
	"github.com/FlavioCFOliveira/GoGraph/store/csrfile"
)

func main() {
	dir, err := os.MkdirTemp("", "csrfile-example")
	if err != nil {
		panic(err)
	}
	defer func() { _ = os.RemoveAll(dir) }()

	// Build a small weighted directed graph and freeze it into an
	// immutable CSR snapshot.
	a := adjlist.New[string, int64](adjlist.Config{Directed: true})
	for _, e := range []struct {
		src, dst string
		w        int64
	}{
		{"a", "b", 10},
		{"a", "c", 20},
		{"b", "c", 30},
	} {
		if err := a.AddEdge(e.src, e.dst, e.w); err != nil {
			panic(err)
		}
	}
	c := csr.BuildFromAdjList(a)

	// Write the CSR to disk; publication is atomic (write .tmp, fsync,
	// rename).
	path := filepath.Join(dir, "graph.csr")
	if _, err := csrfile.WriteToFile(path, c); err != nil {
		panic(err)
	}

	// Open mmaps the file read-only and verifies the header and tail
	// CRC. The returned slices alias the mapped region and stay valid
	// until Close.
	r, err := csrfile.Open(path)
	if err != nil {
		panic(err)
	}
	defer func() { _ = r.Close() }()

	// NEdges is the live edge count. NVertices is the length of the
	// dense vertex-offset (row-pointer) array, which is sized by the
	// largest interned NodeID rather than by the number of distinct
	// keys, so it is not asserted here; csr.CSR.Order reports the live
	// vertex count when that is what you need.
	h := r.Header()
	fmt.Printf("edges=%d\n", h.NEdges)

	weights, ok := r.WeightsUint64()
	fmt.Printf("weights present=%t count=%d\n", ok, len(weights))

	// The mmapped edge slice has one entry per stored edge.
	fmt.Printf("edge slice len=%d\n", len(r.Edges()))

}
Output:
edges=3
weights present=true count=3
edge slice len=3

Index

Examples

Constants

View Source
const Alignment = 64

Alignment is the section alignment in bytes; every typed section starts at an offset that is a multiple of this value.

View Source
const CurrentVersion uint16 = 1

CurrentVersion is the format version emitted by this build.

View Source
const HeaderSize = 64

HeaderSize is the fixed byte length of the file header.

Variables

View Source
var (
	// ErrBadMagic indicates the first four bytes are not Magic.
	ErrBadMagic = errors.New("csrfile: bad magic")
	// ErrUnsupportedVersion indicates a newer-than-build version.
	ErrUnsupportedVersion = errors.New("csrfile: unsupported version")
	// ErrUnsupportedByteOrder indicates a non-little-endian file.
	ErrUnsupportedByteOrder = errors.New("csrfile: unsupported byte order")
	// ErrUnknownWeightKind indicates an out-of-range WeightKind.
	ErrUnknownWeightKind = errors.New("csrfile: unknown weight kind")
	// ErrFileCorrupted indicates the tail CRC32C check failed.
	ErrFileCorrupted = errors.New("csrfile: file corrupted")
	// ErrHeaderTooShort indicates a short read while parsing the header.
	ErrHeaderTooShort = errors.New("csrfile: header too short")
	// ErrReaderClosed indicates an operation was attempted on a Reader
	// whose mmap has already been released by [Reader.Close].
	ErrReaderClosed = errors.New("csrfile: reader closed")
	// ErrHeaderInconsistent indicates the decoded header's counts and
	// offsets do not match the single canonical on-disk layout for
	// those counts (a malformed, hostile, or overflowing header). It
	// wraps [ErrFileCorrupted]: errors.Is(err, ErrFileCorrupted) is
	// true for any structural rejection, so callers that already test
	// for ErrFileCorrupted keep working.
	ErrHeaderInconsistent = fmt.Errorf("%w: header layout inconsistent", ErrFileCorrupted)
)

Errors surfaced by the format helpers.

View Source
var Magic = [4]byte{'G', 'G', 'C', 'S'}

Magic is the 4-byte file identifier: ASCII "GGCS".

Functions

func AlignUp

func AlignUp(n, alignment uint64) uint64

AlignUp rounds n up to the next multiple of alignment.

func BuildFixture

func BuildFixture(spec FixtureSpec) (*csr.CSR[struct{}], error)

BuildFixture deterministically builds a CSR snapshot meeting spec. The graph is directed and uses uint32 vertex identifiers; weights are absent (struct{}). Suitable for Tier 2 benchmarks and the crash-recovery harness, where reproducibility matters more than realism.

BuildFixture returns any error surfaced by the underlying adjlist.AdjList; with the default uncapped configuration the only failure mode is adjlist.ErrShardFull, which cannot be reached because no adjlist.Config.MaxShardCapacity is set.

func EncodeHeader

func EncodeHeader(h Header) []byte

EncodeHeader writes h into a fresh HeaderSize-byte slice.

func Reinterpret

func Reinterpret[T any](data []byte, n int) []T

Reinterpret returns a typed slice of length n that aliases the memory backing data. T must be a fixed-size primitive (or a named alias of one) — int8/int16/int32/int64/uint*/float32/float64 or a type whose layout is identical to one of those. The function panics when data is too short to hold n elements of T or when its alignment is incompatible with T.

Precondition on n: the byte requirement size(T)*n must be representable; n is treated as untrusted. When size(T)*n overflows (or exceeds what a Go slice length can address), the requirement can never be satisfied by any real buffer, so Reinterpret takes the same "data too short" panic path rather than computing a wrapped product and slicing out of bounds. Callers that derive n from a wire- or file-encoded value therefore get a deterministic, guarded failure instead of an out-of-bounds read.

Because the returned slice aliases data's memory, callers must preserve the data buffer's lifetime for the duration they hold the returned slice, and must not mutate it through the returned slice in ways that would surprise other readers. Typical use: re-typing the body of a memory-mapped region as a []uint64 view.

This helper is unsafe in the Go-language sense (it uses unsafe.Pointer and unsafe.Slice); the project policy for unsafe usage is documented in CONTRIBUTING.md.

Types

type AccessPattern

type AccessPattern uint8

AccessPattern is the advisory hint given to the OS about the expected memory-access pattern of a mapped section.

const (
	AccessDefault AccessPattern = iota
	AccessSequential
	AccessRandom
	AccessWillNeed
	AccessDontNeed
)

Supported access patterns.

type FixtureSpec

type FixtureSpec struct {
	// Vertices is the number of pre-interned vertex IDs.
	Vertices uint64
	// Edges is the number of edges to add (uniformly random
	// (src, dst) over [0, Vertices)).
	Edges uint64
	// Seed is the PCG seed (any uint64).
	Seed uint64
	// Multigraph allows parallel edges; without it duplicates are
	// silently collapsed.
	Multigraph bool
}

FixtureSpec parameterises BuildFixture. The same seed produces the same graph every time, making the harness deterministic.

type Header struct {
	Version        uint16
	Alignment      uint8
	NVertices      uint64
	NEdges         uint64
	Weight         WeightKind
	VerticesOffset uint64
	EdgesOffset    uint64
	WeightsOffset  uint64
	TailCRCOffset  uint64
}

Header is the in-memory representation of the 64-byte file preamble.

func DecodeHeader

func DecodeHeader(buf []byte) (Header, error)

DecodeHeader parses the first HeaderSize bytes into a Header.

func Layout

func Layout(nVertices, nEdges uint64, weight WeightKind) (header Header, totalBytes uint64)

Layout computes the on-disk byte offsets and total size for a header populated with NVertices, NEdges, and Weight. It returns the header with offsets filled and the total file size including the trailing CRC32C uint32.

All arithmetic is overflow-safe: when NVertices, NEdges, or the derived section sizes would overflow a uint64 (as a hostile or corrupted header can request), Layout returns the zero Header and a zero totalBytes. Callers — both the writer and [Header.validate] — must treat a zero totalBytes as "this header is not representable on disk" and reject it rather than proceeding with bogus offsets.

func WriteToFile

func WriteToFile[W any](path string, c *csr.CSR[W]) (Header, error)

WriteToFile serialises c into the path atomically and durably: data lands in path + ".tmp" first, the temp file's contents are fsync'd, the file is renamed onto path, and finally the PARENT directory is fsync'd so the rename's directory entry survives a crash. Concurrent readers see either the previous file or the new file, never a partial write.

On return with a nil error the published file is crash-durable: it survives process crash, host crash, and kill -9. This guarantee matters because WriteToFile is the bulk loader's sole durability mechanism — the bulk path bypasses the WAL, so there is no replay and no later checkpoint of this artefact to recover a lost rename. Without the parent-directory fsync, a crash within the kernel's writeback window after a successful return could lose the rename's directory entry and with it the entire bulk load. The parent fsync is a no-op on platforms without a directory-fsync primitive (Windows); see [parentDirFsync].

W must be one of the supported weight kinds (int32/uint32/float32 for 4-byte; int/uint/int64/uint64/float64/uintptr for 8-byte) or struct{} for unweighted graphs. Unsupported types produce ErrUnknownWeightKind.

type Reader

type Reader struct {
	// contains filtered or unexported fields
}

Reader is a read-only, mmap-backed view of a csrfile.

All slices returned by Reader.Vertices / Reader.Edges / Reader.WeightsRaw alias the underlying mmap region — they must not be mutated and remain valid only for as long as the mapping is live. Reader.Close unmaps the region, after which any retained slice is a dangling view into freed memory; reading it is a use-after-unmap fault, not a recoverable Go panic.

Concurrency. A Reader is safe for concurrent use, but a long-lived reader that binds the slices once and iterates them MUST do so inside Reader.Read: Read holds an internal read lock for the whole duration of the callback, and Close blocks until every in-flight Read has returned before it unmaps. Calling the bare accessors (Reader.Vertices etc.) and then iterating outside Read races a concurrent Close and may fault the process; the accessors are retained only for short, non-concurrent inspection.

func Open

func Open(path string) (*Reader, error)

Open mmaps path read-only, verifies the header and the tail CRC, and returns a Reader pointing into the mapped region.

func (*Reader) Close

func (r *Reader) Close() error

Close releases the mmap and underlying file. Any slice returned by the Reader becomes invalid.

Close acquires the Reader's write lock, so it blocks until every in-flight Reader.Read has returned before unmapping — no reader can be iterating the mapping at the moment it is released. Close is idempotent: the second and later calls observe the already-released mapping and return nil without unmapping twice. Close is safe to call concurrently from multiple goroutines.

func (*Reader) Edges

func (r *Reader) Edges() []graph.NodeID

Edges returns the edges slice. The returned slice aliases the mmap region and is valid only until Reader.Close; long-lived or concurrent iteration MUST use Reader.Read.

func (*Reader) Header

func (r *Reader) Header() Header

Header returns the parsed file header.

func (*Reader) Read added in v0.2.0

func (r *Reader) Read(fn func(vertices []uint64, edges []graph.NodeID, weights []byte) error) error

Read invokes fn with the mmap-aliased vertices, edges and raw weights slices while holding an internal read lock for the whole duration of the call. The lock keeps the mapping live across the entire callback, so fn may safely bind the slices once and iterate them: a concurrent Reader.Close blocks until fn returns before it unmaps, closing the use-after-unmap window that bare accessors plus an external iteration would leave open.

The slices passed to fn alias the read-only mmap region; fn must not mutate them and must not retain them beyond its own return — they are invalid once Read returns. weights is nil when the file is unweighted.

Read returns ErrReaderClosed if the Reader has already been closed; otherwise it returns whatever fn returns. The read lock is released even if fn panics. Read is safe for concurrent use; any number of Read calls run in parallel.

func (*Reader) SetHint

func (r *Reader) SetHint(pattern AccessPattern) error

SetHint applies an OS-level advisory hint to the mapped region describing the expected access pattern. On Linux it issues madvise; on other platforms the call returns nil and is a no-op at the OS level (Go's mmap-go does not expose madvise on every platform).

SetHint holds the read lock across the OS call, so it cannot race a concurrent Reader.Close (which would otherwise unmap the region the madvise syscall is about to touch). It returns ErrReaderClosed if the Reader is already closed. Safe for concurrent use.

func (*Reader) Vertices

func (r *Reader) Vertices() []uint64

Vertices returns the offsets slice. Each entry is the start index in Reader.Edges of that vertex's out-neighbours.

The returned slice aliases the mmap region and is valid only until Reader.Close. A consumer that iterates it concurrently with a possible Close MUST use Reader.Read instead; see the Reader type documentation.

func (*Reader) WeightsFloat64

func (r *Reader) WeightsFloat64() ([]float64, bool)

WeightsFloat64 returns the weights section as a []float64 when possible.

func (*Reader) WeightsRaw

func (r *Reader) WeightsRaw() []byte

WeightsRaw returns the raw bytes of the weights section. Use Reader.WeightsUint64 / Reader.WeightsFloat64 for typed views. Returns nil when the file is unweighted. The returned slice aliases the mmap region and is valid only until Reader.Close; long-lived or concurrent iteration MUST use Reader.Read.

func (*Reader) WeightsUint64

func (r *Reader) WeightsUint64() ([]uint64, bool)

WeightsUint64 returns the weights section as a []uint64 when possible. Returns nil, false when the weight kind is not 8-byte integer.

type WeightKind

type WeightKind uint8

WeightKind tags the on-disk type of the weights section.

const (
	WeightAbsent  WeightKind = 0
	WeightUint32  WeightKind = 1
	WeightUint64  WeightKind = 2
	WeightFloat32 WeightKind = 3
	WeightFloat64 WeightKind = 4
)

Supported weight kinds.

func (WeightKind) Size

func (k WeightKind) Size() int

Size returns the byte size of one weight value for k.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL