snap

package
v0.0.17 Latest
Warning

This package is not in the latest version of its module.

Published: Dec 19, 2025 License: Apache-2.0 Imports: 14 Imported by: 0

README

Event-Based Snapshot Design

Overview

This package provides event-based snapshot storage for logd. Snapshots store events directly from the stream package, with a size-bound index for efficient path lookups.

Design Goals

  1. Store events directly: Snapshots contain a stream of stream.Event values, not IR nodes
  2. Size-bound index: The index fits in memory even for very large snapshots
  3. Efficient path access: Look up paths without reading the entire snapshot
  4. Streaming reads: Support reading paths without full document reconstruction

Architecture

Snapshot Format

[8 bytes: event stream size (uint64, big-endian)]
[4 bytes: index size (uint32, big-endian)]
[event stream bytes]
[index bytes]

  • Event stream size: 8-byte uint64 indicating size of event stream in bytes
  • Index size: 4-byte uint32 indicating size of index in bytes
  • Event stream: Sequence of stream.Event values encoded in Tony format
  • Index: Size-bound map of paths to byte offsets

The sizes are written first (as placeholders, then updated), followed by the event stream and index.
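
The 12-byte header described above can be sketched with the standard encoding/binary package. This is a minimal illustration, not the package's own code: the helper names encodeHeader and decodeHeader are hypothetical, and only the layout (8-byte big-endian event stream size, then 4-byte big-endian index size) comes from the format description.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// headerSize is the fixed header length: 8 bytes of event stream size
// followed by 4 bytes of index size, both big-endian.
const headerSize = 12

// encodeHeader packs the two section sizes into the 12-byte header.
// (Hypothetical helper; not part of the snap API.)
func encodeHeader(eventSize uint64, indexSize uint32) [headerSize]byte {
	var h [headerSize]byte
	binary.BigEndian.PutUint64(h[0:8], eventSize)
	binary.BigEndian.PutUint32(h[8:12], indexSize)
	return h
}

// decodeHeader is the inverse of encodeHeader.
func decodeHeader(h [headerSize]byte) (eventSize uint64, indexSize uint32) {
	return binary.BigEndian.Uint64(h[0:8]), binary.BigEndian.Uint32(h[8:12])
}

func main() {
	h := encodeHeader(300, 40)
	ev, idx := decodeHeader(h)
	fmt.Printf("% x -> events=%d index=%d\n", h[:], ev, idx)
}
```

Because both sizes live at fixed offsets, a reader can locate the index at offset 12 + event stream size without scanning the events.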

Index Structure

The index is a list of kpaths in order of the stream events, each associated with an offset in the event data:

  • IndexEntry: Contains a kinded path (a Path, serialized as text) and a byte offset (int64)
  • Entries: Ordered list of IndexEntry values, sorted by offset
  • Ancestor lookup: If exact path not indexed, find nearest ancestor
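
One way to picture the ancestor lookup is to strip trailing path segments until an indexed path is found. The sketch below is an assumption-heavy illustration: it uses plain strings instead of the package's Path type, treats "." and "[" as the only segment separators (real kinded paths may have more syntax), and stores entries in a map rather than the ordered list the index actually uses. The names parent and nearestIndexed are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// parent strips the last segment of a simplified kinded path, treating "."
// and "[" as segment starts. The empty string stands for the document root.
func parent(p string) string {
	i := strings.LastIndexAny(p, ".[")
	if i < 0 {
		return ""
	}
	return p[:i]
}

// nearestIndexed returns p itself if it is indexed, otherwise its closest
// indexed ancestor. The final result is false if nothing on the way up to
// the root is indexed.
func nearestIndexed(offsets map[string]int64, p string) (string, int64, bool) {
	for {
		if off, ok := offsets[p]; ok {
			return p, off, true
		}
		if p == "" {
			return "", 0, false
		}
		p = parent(p)
	}
}

func main() {
	offsets := map[string]int64{"": 0, "users": 120, "users.123": 480}
	path, off, ok := nearestIndexed(offsets, "users.123.name")
	fmt.Println(path, off, ok)
}
```

Decoding then starts at the ancestor's offset and skips forward through events until the requested path is reached.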

Building Snapshots

  1. Write placeholder sizes at beginning (12 bytes: 8 for event stream size, 4 for index size)
  2. Process IR node or event stream
  3. Convert to events (if starting from IR node)
  4. Write events to snapshot file
  5. Build size-bound index while processing events
  6. After all events written:
    • Write index bytes
    • Seek back to beginning and update event stream size and index size
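
The placeholder-then-patch sequence in the steps above can be sketched against any io.WriteSeeker. This is an illustration under stated assumptions, not the Builder's implementation: the events and index are treated as already-encoded opaque byte slices, and writeSnapshot and buildToTemp are hypothetical helper names.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"os"
)

// writeSnapshot lays out [header][events][index] on w. Encoding the events
// and the index is out of scope; both arrive as opaque bytes.
func writeSnapshot(w io.WriteSeeker, events, index []byte) error {
	var header [12]byte
	// Step 1: reserve the header with zeroed placeholder sizes.
	if _, err := w.Write(header[:]); err != nil {
		return err
	}
	// Steps 2-5, collapsed: write the (pre-encoded) event stream.
	if _, err := w.Write(events); err != nil {
		return err
	}
	// Step 6a: the index follows the event stream.
	if _, err := w.Write(index); err != nil {
		return err
	}
	// Step 6b: seek back and patch the real sizes into the header.
	binary.BigEndian.PutUint64(header[0:8], uint64(len(events)))
	binary.BigEndian.PutUint32(header[8:12], uint32(len(index)))
	if _, err := w.Seek(0, io.SeekStart); err != nil {
		return err
	}
	_, err := w.Write(header[:])
	return err
}

// buildToTemp writes a snapshot to a temporary file and returns its bytes.
func buildToTemp(events, index []byte) ([]byte, error) {
	f, err := os.CreateTemp("", "snap-sketch-*")
	if err != nil {
		return nil, err
	}
	defer os.Remove(f.Name())
	defer f.Close()
	if err := writeSnapshot(f, events, index); err != nil {
		return nil, err
	}
	return os.ReadFile(f.Name())
}

func main() {
	data, err := buildToTemp([]byte("EVENTS"), []byte("IDX"))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d bytes, header % x\n", len(data), data[:12])
}
```

Writing placeholders first means the event stream can be produced in a single forward pass without knowing its final size in advance; the cost is that the destination must support seeking.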

Reading Snapshots

  1. Read sizes from beginning of file:
    • Read event stream size (8 bytes, uint64)
    • Read index size (4 bytes, uint32)
  2. Read event stream (starting at offset 12, for event stream size bytes)
  3. Read index (starting after event stream, for index size bytes)
  4. Parse index structure
  5. For path lookup:
    • Find path or nearest ancestor in index
    • Seek to offset in event stream
    • Decode events from that offset
    • Reconstruct path value from events
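
Steps 1 through 3 above amount to reading the header and carving the file into two bounded regions. The sketch below shows one way to do that with io.SectionReader from the standard library; openSections and readSections are hypothetical names, and index parsing and event decoding (steps 4 and 5) are left out.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

const headerSize = 12

// openSections reads the header and returns readers bounded to the event
// stream and to the index, so neither can read into the other.
func openSections(r io.ReaderAt) (*io.SectionReader, *io.SectionReader, error) {
	var header [headerSize]byte
	if _, err := io.ReadFull(io.NewSectionReader(r, 0, headerSize), header[:]); err != nil {
		return nil, nil, fmt.Errorf("reading header: %w", err)
	}
	eventSize := int64(binary.BigEndian.Uint64(header[0:8]))
	indexSize := int64(binary.BigEndian.Uint32(header[8:12]))
	events := io.NewSectionReader(r, headerSize, eventSize)
	index := io.NewSectionReader(r, headerSize+eventSize, indexSize)
	return events, index, nil
}

// readSections loads both sections of an in-memory snapshot as strings.
func readSections(data []byte) (string, string, error) {
	ev, idx, err := openSections(bytes.NewReader(data))
	if err != nil {
		return "", "", err
	}
	evBytes, err := io.ReadAll(ev)
	if err != nil {
		return "", "", err
	}
	idxBytes, err := io.ReadAll(idx)
	if err != nil {
		return "", "", err
	}
	return string(evBytes), string(idxBytes), nil
}

func main() {
	// Header: event size 6, index size 3; then the two payloads.
	data := []byte{0, 0, 0, 0, 0, 0, 0, 6, 0, 0, 0, 3}
	data = append(data, "EVENTSIDX"...)
	ev, idx, err := readSections(data)
	fmt.Println(ev, idx, err)
}
```

In practice only the index would be read eagerly; the event section reader supports Seek, which matches the "seek to offset, then decode" step of a path lookup.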

Implementation Status

  • Basic index structure (Index, IndexEntry)
  • Event-based snapshot writer
  • Index builder (records paths and offsets as events are processed)
  • Basic snapshot reader with path lookup

Migration from IR-Node-Based Snapshots

The previous IR-node-based implementation (with !snap-loc, !snap-range, !snap-chunks) has been archived in archive/. The new design:

  • Simpler: No chunking logic, just events + index
  • More efficient: Direct event storage, no IR node conversion overhead
  • Better scaling: Size-bound index works for arbitrarily large snapshots

Documentation

Overview

Package snap provides event-based snapshot storage.

Snapshots store stream.Event sequences with a size-bound index mapping paths to byte offsets. This enables efficient path lookups without loading entire documents into memory.

Format: [header: 12 bytes][events][index]

Constants

const (
	DefaultChunkSize = 4096
	HeaderSize       = 12
)

Variables

This section is empty.

Functions

func EncodeRandomDocument

func EncodeRandomDocument(doc *ir.Node, enc *stream.Encoder) error

EncodeRandomDocument encodes a random document to events using the stream encoder.

func GetChunkSize

func GetChunkSize() int

GetChunkSize returns the chunk size for indexing (bytes). Defaults to 4096. Override with SNAP_MAX_CHUNK_SIZE env var.
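
An environment override of this kind is commonly implemented along the following lines. This is a sketch of the pattern only: the source does not say how GetChunkSize treats unparsable or non-positive values, so the fall-back-to-default behaviour here is an assumption, and chunkSizeFrom is a hypothetical helper that takes the lookup function as a parameter to keep it easy to exercise.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

const defaultChunkSize = 4096

// chunkSizeFrom returns the value of SNAP_MAX_CHUNK_SIZE when it parses as a
// positive integer, and the default otherwise. Falling back on invalid input
// is an assumption, not documented behaviour.
func chunkSizeFrom(lookup func(string) (string, bool)) int {
	if v, ok := lookup("SNAP_MAX_CHUNK_SIZE"); ok {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			return n
		}
	}
	return defaultChunkSize
}

func main() {
	// os.LookupEnv distinguishes "unset" from "set to empty".
	fmt.Println(chunkSizeFrom(os.LookupEnv))
}
```

A smaller chunk size produces more index entries (faster lookups, larger index); a larger one does the opposite.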

func RandomDocument

func RandomDocument(config RandomDocConfig) (*ir.Node, []string, error)

RandomDocument generates a random document with mixed structure. It returns the document as an ir.Node, along with all paths that exist in it.

Types

type Builder

type Builder struct {
	// contains filtered or unexported fields
}

Builder writes snapshot files by consuming stream events. Automatically creates index entries at chunk boundaries.

func NewBuilder

func NewBuilder(w W, index *Index, patches []*ir.Node) (*Builder, error)

NewBuilder creates a snapshot builder writing to w. Populates the provided index as events are written.

func (*Builder) Close

func (b *Builder) Close() error

func (*Builder) WriteEvent

func (b *Builder) WriteEvent(ev *stream.Event) error

WriteEvent writes an event to the snapshot. Creates index entries when chunk size threshold is reached.
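
The chunk-threshold behaviour can be pictured as a running byte counter that emits an index entry whenever enough bytes have accumulated since the previous entry. The sketch below is an assumption about the general technique, not the Builder's actual logic: the entry and chunkIndexer types are hypothetical, paths are plain strings, and the exact rule for when an entry is emitted may differ in the real package.

```go
package main

import "fmt"

// entry pairs a path with the byte offset of the event that starts there.
type entry struct {
	path   string
	offset int64
}

// chunkIndexer records an index entry for the first event and then for the
// first event that begins at least chunkSize bytes after the previous entry.
type chunkIndexer struct {
	chunkSize int64
	offset    int64 // bytes written so far
	lastEntry int64 // offset of the most recent index entry
	entries   []entry
}

// record notes that an event of n encoded bytes at the given path is about
// to be written at the current offset.
func (c *chunkIndexer) record(path string, n int) {
	if len(c.entries) == 0 || c.offset-c.lastEntry >= c.chunkSize {
		c.entries = append(c.entries, entry{path: path, offset: c.offset})
		c.lastEntry = c.offset
	}
	c.offset += int64(n)
}

func main() {
	c := &chunkIndexer{chunkSize: 10}
	for _, p := range []string{"a", "b", "c", "d"} {
		c.record(p, 4) // pretend every event encodes to 4 bytes
	}
	fmt.Println(c.entries)
}
```

Under this scheme the number of entries grows with the event stream size divided by the chunk size, not with the number of paths.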

type Index

type Index struct {
	Entries []IndexEntry // Ordered by Offset
}

Index maps kinded paths to event stream offsets. Entries are in document order (sorted for objects, sequential for arrays).

func OpenIndex

func OpenIndex(r io.Reader, size int) (*Index, error)

OpenIndex reads an index of size bytes from r.

func (*Index) EstimatedSize

func (idx *Index) EstimatedSize() int64

EstimatedSize returns an estimate of the index size in bytes.

func (*Index) FromTony

func (s *Index) FromTony(data []byte, opts ...gomap.UnmapOption) error

FromTony parses Tony format bytes and populates Index.

func (*Index) FromTonyIR

func (s *Index) FromTonyIR(node *ir.Node, opts ...gomap.UnmapOption) error

FromTonyIR populates Index from a Tony IR node.

func (*Index) Lookup

func (idx *Index) Lookup(kp string) (index int, err error)

Lookup finds the index entry at or before path kp in document order. Returns the largest index i where Entries[i].Path <= kp. Returns 0 if kp comes before all indexed paths.
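
The "largest i where Entries[i].Path <= kp" contract is a floor search over a sorted list, which can be sketched with sort.Search. The sketch simplifies in two labelled ways: it compares paths as plain strings, whereas the real index orders kinded paths in document order, and it omits the error return. The function name floorIndex is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// floorIndex returns the largest i such that paths[i] <= kp, or 0 if kp
// sorts before every path. paths must be sorted ascending.
func floorIndex(paths []string, kp string) int {
	// sort.Search finds the first index whose path is strictly greater
	// than kp; the entry just before it is the floor.
	i := sort.Search(len(paths), func(i int) bool { return paths[i] > kp })
	if i == 0 {
		return 0
	}
	return i - 1
}

func main() {
	paths := []string{"a", "a.b", "c", "m.x"}
	for _, kp := range []string{"a.b", "b", "z", "A"} {
		fmt.Printf("%q -> %d\n", kp, floorIndex(paths, kp))
	}
}
```

Note that a result of 0 is ambiguous under this contract: it is returned both when kp matches the first entry and when kp precedes every entry, so the caller still has to compare against Entries[0].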

func (*Index) ToTony

func (s *Index) ToTony(opts ...gomap.MapOption) ([]byte, error)

ToTony converts Index to Tony format bytes.

func (*Index) ToTonyIR

func (s *Index) ToTonyIR(opts ...gomap.MapOption) (*ir.Node, error)

ToTonyIR converts Index to a Tony IR node.

type IndexEntry

type IndexEntry struct {
	Path   *Path // Kinded path (e.g., "a.b[0]", "users.123.name")
	Offset int64 // Byte offset in event stream
	Size   int64 `tony:"omit"`
}

IndexEntry maps a kinded path to its byte offset in the event stream.

tony:schemagen=index-entry

func (*IndexEntry) FromTony

func (s *IndexEntry) FromTony(data []byte, opts ...gomap.UnmapOption) error

FromTony parses Tony format bytes and populates IndexEntry.

func (*IndexEntry) FromTonyIR

func (s *IndexEntry) FromTonyIR(node *ir.Node, opts ...gomap.UnmapOption) error

FromTonyIR populates IndexEntry from a Tony IR node.

func (*IndexEntry) ToTony

func (s *IndexEntry) ToTony(opts ...gomap.MapOption) ([]byte, error)

ToTony converts IndexEntry to Tony format bytes.

func (*IndexEntry) ToTonyIR

func (s *IndexEntry) ToTonyIR(opts ...gomap.MapOption) (*ir.Node, error)

ToTonyIR converts IndexEntry to a Tony IR node.

type Path

type Path struct {
	kpath.KPath
}

func (*Path) MarshalText

func (p *Path) MarshalText() ([]byte, error)

func (*Path) String

func (p *Path) String() string

func (*Path) UnmarshalText

func (p *Path) UnmarshalText(d []byte) error

type PathFinder

type PathFinder struct {
	R io.ReadSeekCloser
	// contains filtered or unexported fields
}

PathFinder seeks to an indexed offset and extracts events for a target path.

Uses stream.KPathState to initialize state for the indexed path. For leaf array elements, KPathState positions one element before, so processing the first event at the offset advances to the correct position.

func NewPathFinder

func NewPathFinder(r io.ReadSeekCloser, index *Index, off int64, idxPath, desPath *kpath.KPath, eventSize int64) (*PathFinder, error)

NewPathFinder creates a PathFinder starting at offset off (indexed at idxPath) to find desPath.

Initializes state using stream.KPathState(idxPath), which positions correctly for reading events starting at off. For field and sparse array entries, advances state past the key by processing a dummy null event. index is the snapshot index, used to determine chunk boundaries for buffering. eventSize is the total size of the event stream, used to prevent reading past into the index section.

func (*PathFinder) FindEvents

func (pf *PathFinder) FindEvents() ([]stream.Event, error)

FindEvents extracts events for the desired path from the snapshot. Buffers chunks for efficient I/O, reading additional chunks as needed.
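
The role of eventSize when buffering chunks can be illustrated by clamping each read to the end of the event stream. This is a sketch of the idea rather than PathFinder's code: readChunk and chunkString are hypothetical helpers, offsets are taken relative to the start of the event stream, and the 12-byte header offset is added when reading.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

const headerSize = 12

// readChunk reads up to chunkSize bytes of the event stream starting at the
// event-relative offset off, clamped so the read never extends past
// eventSize into the index section.
func readChunk(r io.ReaderAt, off, chunkSize, eventSize int64) ([]byte, error) {
	if off < 0 || off >= eventSize {
		return nil, io.EOF
	}
	n := chunkSize
	if remaining := eventSize - off; remaining < n {
		n = remaining
	}
	buf := make([]byte, n)
	if _, err := io.ReadFull(io.NewSectionReader(r, headerSize+off, n), buf); err != nil {
		return nil, err
	}
	return buf, nil
}

// chunkString wraps readChunk for an in-memory snapshot.
func chunkString(data []byte, off, chunkSize, eventSize int64) (string, error) {
	b, err := readChunk(bytes.NewReader(data), off, chunkSize, eventSize)
	return string(b), err
}

func main() {
	// 12 header bytes (zeroed for brevity), 6 event bytes, 3 index bytes.
	data := append(make([]byte, headerSize), "EVENTSIDX"...)
	for off := int64(0); ; off += 4 {
		s, err := chunkString(data, off, 4, 6)
		if err != nil {
			break
		}
		fmt.Printf("offset %d: %q\n", off, s)
	}
}
```

Without the clamp, the final chunk would pull index bytes into the event decoder.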

type R

type R interface {
	io.ReadSeekCloser
}

type RandomDocConfig

type RandomDocConfig struct {
	// MinSize and MaxSize control the approximate size range in bytes
	MinSize int
	MaxSize int

	// MaxDepth controls maximum nesting depth
	MaxDepth int

	// ObjectFieldProbability is probability (0.0-1.0) that a container will be an object vs array
	ObjectFieldProbability float64

	// ContainerProbability is probability (0.0-1.0) that a value will be a container vs primitive
	ContainerProbability float64

	// StringLengthRange controls string value lengths
	StringLengthMin int
	StringLengthMax int

	// Seed for random number generator (0 means use current time)
	Seed int64
}

RandomDocConfig configures random document generation.

func DefaultRandomDocConfig

func DefaultRandomDocConfig() RandomDocConfig

DefaultRandomDocConfig returns a reasonable default configuration.

type Snapshot

type Snapshot struct {
	R         io.ReadSeekCloser
	Index     *Index
	EventSize uint64 // Size of event stream in bytes

}

Snapshot is an opened snapshot file providing random access to paths.

func Open

func Open(rc R) (*Snapshot, error)

Open reads a snapshot from rc. The index is loaded into memory; events are read on demand.

func (*Snapshot) Close

func (s *Snapshot) Close() error

func (*Snapshot) ReadPath

func (s *Snapshot) ReadPath(p string) (*ir.Node, error)

ReadPath reads the IR node at path p. Returns nil if the path is not found.

type W

type W interface {
	io.WriteCloser
	io.Seeker
}
