csrfile

package
v0.1.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 2, 2026 License: MIT Imports: 17 Imported by: 0

Documentation

Overview

Package csrfile defines the on-disk binary format used by GoGraph's Tier 2 (out-of-core, mmap-backed) CSR storage.

The format is stable and versioned; the full specification lives at docs/csrfile-v1.md alongside this repository. Tier 2 readers mmap the file and reinterpret the aligned sections as typed []uint64 slices without parsing.

Example

Example writes a CSR snapshot to an on-disk Tier 2 file and reads it back through the mmap-backed Reader, inspecting the header and the stored edge weights without re-parsing the body.

package main

import (
	"fmt"
	"os"
	"path/filepath"

	"github.com/FlavioCFOliveira/GoGraph/graph/adjlist"
	"github.com/FlavioCFOliveira/GoGraph/graph/csr"
	"github.com/FlavioCFOliveira/GoGraph/store/csrfile"
)

func main() {
	dir, err := os.MkdirTemp("", "csrfile-example")
	if err != nil {
		panic(err)
	}
	defer func() { _ = os.RemoveAll(dir) }()

	// Build a small weighted directed graph and freeze it into an
	// immutable CSR snapshot.
	a := adjlist.New[string, int64](adjlist.Config{Directed: true})
	for _, e := range []struct {
		src, dst string
		w        int64
	}{
		{"a", "b", 10},
		{"a", "c", 20},
		{"b", "c", 30},
	} {
		if err := a.AddEdge(e.src, e.dst, e.w); err != nil {
			panic(err)
		}
	}
	c := csr.BuildFromAdjList(a)

	// Write the CSR to disk; publication is atomic (write .tmp, fsync,
	// rename).
	path := filepath.Join(dir, "graph.csr")
	if _, err := csrfile.WriteToFile(path, c); err != nil {
		panic(err)
	}

	// Open mmaps the file read-only and verifies the header and tail
	// CRC. The returned slices alias the mapped region and stay valid
	// until Close.
	r, err := csrfile.Open(path)
	if err != nil {
		panic(err)
	}
	defer func() { _ = r.Close() }()

	// NEdges is the live edge count. NVertices is the length of the
	// dense vertex-offset (row-pointer) array, which is sized by the
	// largest interned NodeID rather than by the number of distinct
	// keys, so it is not asserted here; csr.CSR.Order reports the live
	// vertex count when that is what you need.
	h := r.Header()
	fmt.Printf("edges=%d\n", h.NEdges)

	weights, ok := r.WeightsUint64()
	fmt.Printf("weights present=%t count=%d\n", ok, len(weights))

	// The mmapped edge slice has one entry per stored edge.
	fmt.Printf("edge slice len=%d\n", len(r.Edges()))

}
Output:
edges=3
weights present=true count=3
edge slice len=3

Index

Examples

Constants

View Source
const Alignment = 64

Alignment is the section alignment in bytes; every typed section starts at an offset that is a multiple of this value.

View Source
const CurrentVersion uint16 = 1

CurrentVersion is the format version emitted by this build.

View Source
const HeaderSize = 64

HeaderSize is the fixed byte length of the file header.

Variables

View Source
var (
	// ErrBadMagic indicates the first four bytes are not Magic.
	ErrBadMagic = errors.New("csrfile: bad magic")
	// ErrUnsupportedVersion indicates a newer-than-build version.
	ErrUnsupportedVersion = errors.New("csrfile: unsupported version")
	// ErrUnsupportedByteOrder indicates a non-little-endian file.
	ErrUnsupportedByteOrder = errors.New("csrfile: unsupported byte order")
	// ErrUnknownWeightKind indicates an out-of-range WeightKind.
	ErrUnknownWeightKind = errors.New("csrfile: unknown weight kind")
	// ErrFileCorrupted indicates the tail CRC32C check failed.
	ErrFileCorrupted = errors.New("csrfile: file corrupted")
	// ErrHeaderTooShort indicates a short read while parsing the header.
	ErrHeaderTooShort = errors.New("csrfile: header too short")
	// ErrHeaderInconsistent indicates the decoded header's counts and
	// offsets do not match the single canonical on-disk layout for
	// those counts (a malformed, hostile, or overflowing header). It
	// wraps [ErrFileCorrupted]: errors.Is(err, ErrFileCorrupted) is
	// true for any structural rejection, so callers that already test
	// for ErrFileCorrupted keep working.
	ErrHeaderInconsistent = fmt.Errorf("%w: header layout inconsistent", ErrFileCorrupted)
)

Errors surfaced by the format helpers.

View Source
var Magic = [4]byte{'G', 'G', 'C', 'S'}

Magic is the 4-byte file identifier: ASCII "GGCS".

Functions

func AlignUp

func AlignUp(n, alignment uint64) uint64

AlignUp rounds n up to the next multiple of alignment.

func BuildFixture

func BuildFixture(spec FixtureSpec) (*csr.CSR[struct{}], error)

BuildFixture deterministically builds a CSR snapshot meeting spec. The graph is directed and uses uint32 vertex identifiers; weights are absent (struct{}). Suitable for Tier 2 benchmarks and the crash-recovery harness, where reproducibility matters more than realism.

BuildFixture returns any error surfaced by the underlying adjlist.AdjList; with the default uncapped configuration the only failure mode is adjlist.ErrShardFull, which cannot be reached because no adjlist.Config.MaxShardCapacity is set.

func EncodeHeader

func EncodeHeader(h Header) []byte

EncodeHeader writes h into a fresh HeaderSize-byte slice.

func Reinterpret

func Reinterpret[T any](data []byte, n int) []T

Reinterpret returns a typed slice of length n that aliases the memory backing data. T must be a fixed-size primitive (or a named alias of one) — int8/int16/int32/int64/uint*/float32/float64 or a type whose layout is identical to one of those. The function panics when data is too short to hold n elements of T or when its alignment is incompatible with T.

Precondition on n: the byte requirement size(T)*n must be representable; n is treated as untrusted. When size(T)*n overflows (or exceeds what a Go slice length can address), the requirement can never be satisfied by any real buffer, so Reinterpret takes the same "data too short" panic path rather than computing a wrapped product and slicing out of bounds. Callers that derive n from a wire- or file-encoded value therefore get a deterministic, guarded failure instead of an out-of-bounds read.

Because the returned slice aliases data's memory, callers must preserve the data buffer's lifetime for the duration they hold the returned slice, and must not mutate it through the returned slice in ways that would surprise other readers. Typical use: re-typing the body of a memory-mapped region as a []uint64 view.

This helper is unsafe in the Go-language sense (it uses unsafe.Pointer and unsafe.Slice); the project policy for unsafe usage is documented in CONTRIBUTING.md.

Types

type AccessPattern

type AccessPattern uint8

AccessPattern is the advisory hint given to the OS about the expected memory-access pattern of a mapped section.

const (
	AccessDefault AccessPattern = iota
	AccessSequential
	AccessRandom
	AccessWillNeed
	AccessDontNeed
)

Supported access patterns.

type FixtureSpec

type FixtureSpec struct {
	// Vertices is the number of pre-interned vertex IDs.
	Vertices uint64
	// Edges is the number of edges to add (uniformly random
	// (src, dst) over [0, Vertices)).
	Edges uint64
	// Seed is the PCG seed (any uint64).
	Seed uint64
	// Multigraph allows parallel edges; without it duplicates are
	// silently collapsed.
	Multigraph bool
}

FixtureSpec parameterises BuildFixture. The same seed produces the same graph every time, making the harness deterministic.

type Header struct {
	Version        uint16
	Alignment      uint8
	NVertices      uint64
	NEdges         uint64
	Weight         WeightKind
	VerticesOffset uint64
	EdgesOffset    uint64
	WeightsOffset  uint64
	TailCRCOffset  uint64
}

Header is the in-memory representation of the 64-byte file preamble.

func DecodeHeader

func DecodeHeader(buf []byte) (Header, error)

DecodeHeader parses the first HeaderSize bytes into a Header.

func Layout

func Layout(nVertices, nEdges uint64, weight WeightKind) (header Header, totalBytes uint64)

Layout computes the on-disk byte offsets and total size for a header populated with NVertices, NEdges, and Weight. It returns the header with offsets filled and the total file size including the trailing CRC32C uint32.

All arithmetic is overflow-safe: when NVertices, NEdges, or the derived section sizes would overflow a uint64 (as a hostile or corrupted header can request), Layout returns the zero Header and a zero totalBytes. Callers — both the writer and [Header.validate] — must treat a zero totalBytes as "this header is not representable on disk" and reject it rather than proceeding with bogus offsets.

func WriteToFile

func WriteToFile[W any](path string, c *csr.CSR[W]) (Header, error)

WriteToFile serialises c into the path atomically: data lands in path + ".tmp" first, the file is fsync'd, and only then renamed onto path. Concurrent readers see either the previous file or the new file, never a partial write.

W must be one of the supported weight kinds (int32/uint32/float32 for 4-byte; int/uint/int64/uint64/float64/uintptr for 8-byte) or struct{} for unweighted graphs. Unsupported types produce ErrUnknownWeightKind.

type Reader

type Reader struct {
	// contains filtered or unexported fields
}

Reader is a read-only, mmap-backed view of a csrfile.

All slices returned by Reader.Vertices / Reader.Edges / [Reader.Weights] alias the underlying mmap region — they must not be mutated and remain valid only while the Reader is open. Close invalidates them.

func Open

func Open(path string) (*Reader, error)

Open mmaps path read-only, verifies the header and the tail CRC, and returns a Reader pointing into the mapped region.

func (*Reader) Close

func (r *Reader) Close() error

Close releases the mmap and underlying file. Any slice returned by the Reader becomes invalid.

func (*Reader) Edges

func (r *Reader) Edges() []graph.NodeID

Edges returns the edges slice.

func (*Reader) Header

func (r *Reader) Header() Header

Header returns the parsed file header.

func (*Reader) SetHint

func (r *Reader) SetHint(pattern AccessPattern) error

SetHint applies an OS-level advisory hint to the mapped region describing the expected access pattern. On Linux it issues madvise; on other platforms the call returns nil and is a no-op at the OS level (Go's mmap-go does not expose madvise on every platform).

func (*Reader) Vertices

func (r *Reader) Vertices() []uint64

Vertices returns the offsets slice. Each entry is the start index in Reader.Edges of that vertex's out-neighbours.

func (*Reader) WeightsFloat64

func (r *Reader) WeightsFloat64() ([]float64, bool)

WeightsFloat64 returns the weights section as a []float64 when possible.

func (*Reader) WeightsRaw

func (r *Reader) WeightsRaw() []byte

WeightsRaw returns the raw bytes of the weights section. Use Reader.WeightsUint64 / Reader.WeightsFloat64 for typed views. Returns nil when the file is unweighted.

func (*Reader) WeightsUint64

func (r *Reader) WeightsUint64() ([]uint64, bool)

WeightsUint64 returns the weights section as a []uint64 when possible. Returns nil, false when the weight kind is not 8-byte integer.

type WeightKind

type WeightKind uint8

WeightKind tags the on-disk type of the weights section.

const (
	WeightAbsent  WeightKind = 0
	WeightUint32  WeightKind = 1
	WeightUint64  WeightKind = 2
	WeightFloat32 WeightKind = 3
	WeightFloat64 WeightKind = 4
)

Supported weight kinds.

func (WeightKind) Size

func (k WeightKind) Size() int

Size returns the byte size of one weight value for k.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL