snap

package
v0.0.13
Published: Dec 17, 2025 License: Apache-2.0 Imports: 15 Imported by: 0

README

Event-Based Snapshot Design

Overview

This package provides event-based snapshot storage for logd. Snapshots store events directly from the stream package, with a size-bound index for efficient path lookups.

Design Goals

  1. Store events directly: Snapshots contain a stream of stream.Event values, not IR nodes
  2. Size-bound index: The index fits in memory even for very large snapshots
  3. Efficient path access: Look up paths without reading the entire snapshot
  4. Streaming reads: Support reading paths without full document reconstruction

Architecture

Snapshot Format
[8 bytes: event stream size (uint64, big-endian)]
[4 bytes: index size (uint32, big-endian)]
[event stream bytes]
[index bytes]
  • Event stream size: 8-byte uint64 indicating size of event stream in bytes
  • Index size: 4-byte uint32 indicating size of index in bytes
  • Event stream: Sequence of stream.Event values encoded in Tony format
  • Index: Size-bound, ordered list of paths with their byte offsets in the event stream

The sizes are written first (as placeholders, then updated), followed by the event stream and index.
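
As an illustration of the 12-byte header described above, the following sketch encodes and decodes the two size fields with encoding/binary. The helper names encodeHeader and decodeHeader are hypothetical and are not part of this package; only the field widths and byte order come from the format description.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// headerSize matches the documented 12-byte header (8 + 4 bytes).
const headerSize = 12

// encodeHeader (hypothetical helper) packs both section sizes, big-endian.
func encodeHeader(eventSize uint64, indexSize uint32) [headerSize]byte {
	var h [headerSize]byte
	binary.BigEndian.PutUint64(h[0:8], eventSize)
	binary.BigEndian.PutUint32(h[8:12], indexSize)
	return h
}

// decodeHeader (hypothetical helper) is the inverse of encodeHeader.
func decodeHeader(h [headerSize]byte) (eventSize uint64, indexSize uint32) {
	return binary.BigEndian.Uint64(h[0:8]), binary.BigEndian.Uint32(h[8:12])
}

func main() {
	h := encodeHeader(1024, 96)
	eventSize, indexSize := decodeHeader(h)
	fmt.Printf("% x\n", h[:])
	fmt.Println(eventSize, indexSize)
}
```

Because the index size is a uint32, an index serialized this way cannot exceed 4 GiB, which is consistent with the size-bound design goal.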

Index Structure

The index is a list of kpaths in order of the stream events, each associated with an offset in the event data:

  • IndexEntry: Contains a kinded path (Path, serialized as text) and a byte offset (int64)
  • Entries: Ordered list of IndexEntry values, sorted by offset
  • Ancestor lookup: If exact path not indexed, find nearest ancestor

Building Snapshots

  1. Write placeholder sizes at beginning (12 bytes: 8 for event stream size, 4 for index size)
  2. Process IR node or event stream
  3. Convert to events (if starting from IR node)
  4. Write events to snapshot file
  5. Build size-bound index while processing events
  6. After all events written:
    • Write index bytes
    • Seek back to beginning and update event stream size and index size

Reading Snapshots

  1. Read sizes from beginning of file:
    • Read event stream size (8 bytes, uint64)
    • Read index size (4 bytes, uint32)
  2. Read event stream (starting at offset 12, for event stream size bytes)
  3. Read index (starting after event stream, for index size bytes)
  4. Parse index structure
  5. For path lookup:
    • Find path or nearest ancestor in index
    • Seek to offset in event stream
    • Decode events from that offset
    • Reconstruct path value from events

Implementation Status

  • Basic index structure (Index, IndexEntry)
  • Event-based snapshot writer
  • Index builder (records paths and offsets as events are processed)
  • Basic snapshot reader with path lookup

Migration from IR-Node-Based Snapshots

The previous IR-node-based implementation (with !snap-loc, !snap-range, !snap-chunks) has been archived in archive/. The new design:

  • Simpler: No chunking logic, just events + index
  • More efficient: Direct event storage, no IR node conversion overhead
  • Better scaling: Size-bound index works for arbitrarily large snapshots

Documentation

Overview

Package snap provides event-based snapshot storage for logd.

This package is being redesigned to store events directly from the stream package, with a size-bound index into those events. This allows efficient storage and retrieval of large snapshots without loading entire documents into memory.

Design Principles:

  • Store events directly (from stream.Event) rather than IR nodes
  • Maintain a size-bound index for efficient path lookups
  • Support streaming reads without full document reconstruction

The previous IR-node-based implementation has been archived in internal/snap/archive/.

Index

Constants

const (
	DefaultChunkSize = 4096
	HeaderSize       = 12
)

Variables

This section is empty.

Functions

func EncodeRandomDocument

func EncodeRandomDocument(doc *ir.Node, enc *stream.Encoder) error

EncodeRandomDocument encodes a random document to events using the stream encoder.

func GetChunkSize

func GetChunkSize() int

GetChunkSize returns the maximum chunk size for indexing. Defaults to DefaultChunkSize (4096) if the SNAP_MAX_CHUNK_SIZE environment variable is not set. This allows tests to use smaller chunk sizes to exercise chunk boundary conditions.

func RandomDocument

func RandomDocument(config RandomDocConfig) (*ir.Node, []string, error)

RandomDocument generates a random document with mixed structure. It returns the document as an ir.Node, along with all paths that exist in it.

Types

type Builder

type Builder struct {
	// contains filtered or unexported fields
}

func NewBuilder

func NewBuilder(w W, index *Index, patches []*ir.Node) (*Builder, error)

func (*Builder) Close

func (b *Builder) Close() error

func (*Builder) WriteEvent

func (b *Builder) WriteEvent(ev *stream.Event) error

WriteEvent processes a single event, writing it to the snapshot and updating state/index.

type Index

type Index struct {
	// Entries is a list of indexed paths in order of appearance in the event stream.
	// Entries are ordered by their Offset values.
	Entries []IndexEntry
}

Index is an index into event-based snapshots. It contains a list of kpaths in order of the stream events, each associated with an offset in the event data.

func OpenIndex

func OpenIndex(r io.Reader, size int) (*Index, error)

OpenIndex reads an index of size bytes from r.

func (*Index) EstimatedSize

func (idx *Index) EstimatedSize() int64

EstimatedSize returns an estimate of the index size in bytes.

func (*Index) FromTony

func (s *Index) FromTony(data []byte, opts ...gomap.UnmapOption) error

FromTony parses Tony format bytes and populates Index.

func (*Index) FromTonyIR

func (s *Index) FromTonyIR(node *ir.Node, opts ...gomap.UnmapOption) error

FromTonyIR populates Index from a Tony IR node.

func (*Index) Lookup

func (idx *Index) Lookup(kp string) (index int, err error)

Lookup finds the entry for the given path, or the entry just before it in sorted (document) order. Since document order for objects is sorted order, entries are sorted by path. Returns the entry just before where the path would be inserted, which should be an ancestor or a sibling that comes before the requested path.

If the target path comes before all indexed entries, returns the first entry (index 0). The caller should check if the returned entry is actually before or at the target path.
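
The "entry at or just before" behavior can be sketched with a binary search. The code below is an illustration only: entry is a simplified stand-in for IndexEntry with a plain string path, and byte-wise string comparison is assumed as the ordering, which may differ from how the package actually compares kinded paths.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// entry is a simplified, hypothetical stand-in for IndexEntry.
type entry struct {
	Path   string
	Offset int64
}

// lookup returns the index of the last entry whose path is <= kp,
// or 0 when kp sorts before every entry. Entries must be sorted by Path.
func lookup(entries []entry, kp string) (int, error) {
	if len(entries) == 0 {
		return 0, errors.New("empty index")
	}
	// i is the first position whose path sorts strictly after kp.
	i := sort.Search(len(entries), func(j int) bool { return entries[j].Path > kp })
	if i == 0 {
		return 0, nil
	}
	return i - 1, nil
}

func main() {
	entries := []entry{{"a", 0}, {"a.b", 10}, {"a.c", 40}, {"d", 90}}
	for _, kp := range []string{"a.b", "a.b.x", "0", "z"} {
		i, err := lookup(entries, kp)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %s @ %d\n", kp, entries[i].Path, entries[i].Offset)
	}
}
```

In this sketch, looking up "0" returns index 0 even though "a" sorts after it, which is the case the caller is expected to check for.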

func (*Index) ToTony

func (s *Index) ToTony(opts ...gomap.MapOption) ([]byte, error)

ToTony converts Index to Tony format bytes.

func (*Index) ToTonyIR

func (s *Index) ToTonyIR(opts ...gomap.MapOption) (*ir.Node, error)

ToTonyIR converts Index to a Tony IR node.

type IndexEntry

type IndexEntry struct {
	Path   *Path // Kinded path (e.g., "a.b[0]", "users.123.name")
	Offset int64 // Byte offset in the event stream where this path appears
	Size   int64 `tony:"omit"`
}

IndexEntry represents a single entry in the snapshot index. Each entry maps a kinded path to its byte offset in the event stream.

tony:schemagen=index-entry

func (*IndexEntry) FromTony

func (s *IndexEntry) FromTony(data []byte, opts ...gomap.UnmapOption) error

FromTony parses Tony format bytes and populates IndexEntry.

func (*IndexEntry) FromTonyIR

func (s *IndexEntry) FromTonyIR(node *ir.Node, opts ...gomap.UnmapOption) error

FromTonyIR populates IndexEntry from a Tony IR node.

func (*IndexEntry) ToTony

func (s *IndexEntry) ToTony(opts ...gomap.MapOption) ([]byte, error)

ToTony converts IndexEntry to Tony format bytes.

func (*IndexEntry) ToTonyIR

func (s *IndexEntry) ToTonyIR(opts ...gomap.MapOption) (*ir.Node, error)

ToTonyIR converts IndexEntry to a Tony IR node.

type Path

type Path struct {
	kpath.KPath
}

func (*Path) MarshalText

func (p *Path) MarshalText() ([]byte, error)

func (*Path) String

func (p *Path) String() string

func (*Path) UnmarshalText

func (p *Path) UnmarshalText(d []byte) error

type PathFinder

type PathFinder struct {
	R io.ReadSeekCloser
	// contains filtered or unexported fields
}

func NewPathFinder

func NewPathFinder(r io.ReadSeekCloser, off int64, idxPath, desPath *kpath.KPath) (*PathFinder, error)

func (*PathFinder) FindEvents

func (pf *PathFinder) FindEvents() ([]stream.Event, error)

FindEvents reads events from the snapshot and returns only those events that correspond to the desired path.

type R

type R interface {
	io.ReadSeekCloser
}

type RandomDocConfig

type RandomDocConfig struct {
	// MinSize and MaxSize control the approximate size range in bytes
	MinSize int
	MaxSize int

	// MaxDepth controls maximum nesting depth
	MaxDepth int

	// ObjectFieldProbability is probability (0.0-1.0) that a container will be an object vs array
	ObjectFieldProbability float64

	// ContainerProbability is probability (0.0-1.0) that a value will be a container vs primitive
	ContainerProbability float64

	// StringLengthRange controls string value lengths
	StringLengthMin int
	StringLengthMax int

	// Seed for random number generator (0 means use current time)
	Seed int64
}

RandomDocConfig configures random document generation.

func DefaultRandomDocConfig

func DefaultRandomDocConfig() RandomDocConfig

DefaultRandomDocConfig returns a reasonable default configuration.

type Snapshot

type Snapshot struct {
	R         io.ReadSeekCloser
	Index     *Index
	EventSize uint64 // Size of event stream in bytes

}

func Open

func Open(rc R) (*Snapshot, error)

func (*Snapshot) Close

func (s *Snapshot) Close() error

func (*Snapshot) ReadPath

func (s *Snapshot) ReadPath(p string) (*ir.Node, error)

ReadPath reads a specific path from the snapshot. Returns nil if the path is not found.

type W

type W interface {
	io.WriteCloser
	io.Seeker
}
