dataobj

package
v3.6.7
Published: Feb 23, 2026 License: AGPL-3.0 Imports: 20 Imported by: 0

Documentation

Overview

Package dataobj holds utilities for working with data objects.

Data objects are a container format for storing data intended to be retrieved from object storage. Each data object is composed of one or more "sections," each of which contains a specific type of data, such as logs stored in a columnar format.

Sections are further split into two "regions": the section data and the section metadata. Section metadata is intended to be a lightweight payload per section (usually protobuf) which aids in reading smaller portions of the data region.

Each section has a type indicating what kind of data was encoded in that section. A data object may have multiple sections of the same type.

The dataobj package provides a low-level Builder interface for composing sections into a dataobj, and a SectionReader interface for reading encoded sections.

Section implementations are stored in their own packages and provide higher-level utilities for writing and reading those sections. See github.com/grafana/loki/v3/pkg/dataobj/sections/logs for an example.

Creating a new section implementation

To create a new section implementation:

  1. Create a new package for your section.

  2. Create a new SectionType for your section. Pick a combination of namespace and kind that avoids collisions with other sections that may be written to a file.

  3. Create a "Builder" type which implements SectionBuilder. Your builder type should have methods for buffering data to be appended, and should encode the buffered data on a call to Flush.

  4. Create higher-level reading APIs on top of SectionReader, decoding the data encoded from your builder.

Then, callers can create an instance of your Builder and store it in a data object using Builder.Append, and read it back using your higher-level APIs.
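The steps above can be sketched as follows. This is a minimal, self-contained illustration: the `SectionType`, `SectionWriter`, and `SectionBuilder` definitions below are simplified local stand-ins for the real dataobj interfaces (the real `WriteSection` also takes a `*WriteSectionOptions` argument), and `textBuilder` is a hypothetical section that just buffers lines of text.

```go
package main

import (
	"bytes"
	"fmt"
)

// Simplified local stand-ins for the dataobj interfaces; the real
// definitions live in github.com/grafana/loki/v3/pkg/dataobj.
type SectionType struct {
	Namespace string
	Kind      string
	Version   uint32
}

type SectionWriter interface {
	WriteSection(data, metadata []byte) (int64, error)
}

type SectionBuilder interface {
	Type() SectionType
	Flush(w SectionWriter) (int64, error)
	Reset()
}

// textBuilder is a hypothetical section builder that buffers lines of text.
type textBuilder struct {
	buf bytes.Buffer
}

func (b *textBuilder) Type() SectionType {
	// A namespace/kind pair chosen to avoid collisions with other sections.
	return SectionType{Namespace: "example.com/demo", Kind: "text"}
}

// Append buffers one line; encoding happens later, on Flush.
func (b *textBuilder) Append(line string) {
	b.buf.WriteString(line)
	b.buf.WriteByte('\n')
}

func (b *textBuilder) Flush(w SectionWriter) (int64, error) {
	// The metadata here is just the data length; real sections typically
	// encode a protobuf payload describing the data region.
	metadata := []byte(fmt.Sprintf("len=%d", b.buf.Len()))
	n, err := w.WriteSection(b.buf.Bytes(), metadata)
	b.Reset() // Flush leaves the builder ready for reuse.
	return n, err
}

func (b *textBuilder) Reset() { b.buf.Reset() }

// memWriter collects the two section regions in memory.
type memWriter struct{ data, metadata []byte }

func (w *memWriter) WriteSection(data, metadata []byte) (int64, error) {
	w.data = append(w.data, data...)
	w.metadata = append(w.metadata, metadata...)
	return int64(len(data) + len(metadata)), nil
}

func main() {
	var b textBuilder
	b.Append("hello")
	b.Append("world")

	var w memWriter
	n, err := b.Flush(&w)
	fmt.Println(n, err)            // 18 <nil>
	fmt.Printf("%q\n", w.data)     // "hello\nworld\n"
	fmt.Printf("%q\n", w.metadata) // "len=12"
}
```

In the real package, the caller would hand the builder to Builder.Append rather than flushing it directly; Append invokes Flush internally and buffers the resulting regions into the data object.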

While not required, it is typical for section packages to additionally implement:

  • A package-level CheckSection function to check if a Section was built with your package.

  • A package-specific Section type that wraps Section to use in your reading APIs.

  • A function which returns the estimated size of something to be appended to your builder, so callers can flush the section once it gets big enough.

  • A method on your builder to report the estimated size of the section, both before and after compression.

Encoding and decoding

There are no requirements on how to encode or decode a section, but there are recommendations:

  • Use protobuf for representing the metadata of the section.

  • Write section data so that it can be read in smaller units. For example, columnar data is split into pages, each of which can be read independently.

[SectionReader]s cannot access data outside of their section. Calling DataRange with an offset of 0 reads data from the beginning of that section's data region, not from the start of the dataobj.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Builder

type Builder struct {
	// contains filtered or unexported fields
}

A Builder builds data objects from a set of incoming log data. Log data is appended to a builder by calling Builder.Append. Buffered log data is flushed manually by calling Builder.Flush.

Methods on Builder are not goroutine-safe; callers are responsible for synchronizing calls.

func NewBuilder

func NewBuilder(scratchStore scratch.Store) *Builder

NewBuilder creates a Builder that accumulates data from a set of in-progress sections. A Builder can be flushed into a data object by calling Builder.Flush.

Completed sections are written to scratchStore, reducing the peak memory usage of a builder to the peak memory usage of its in-progress sections.

func (*Builder) Append

func (b *Builder) Append(sec SectionBuilder) error

Append flushes a SectionBuilder, buffering its data and metadata into b. Append does not enforce ordering; sections may be flushed to the dataobj in any order.

Append returns an error if the section failed to flush.

After successfully calling Append, sec is reset and can be reused.

func (*Builder) Bytes added in v3.6.0

func (b *Builder) Bytes() int

Bytes returns the current number of bytes buffered in b for all appended sections.

func (*Builder) Flush

func (b *Builder) Flush() (*Object, io.Closer, error)

Flush constructs a new Object from the accumulated sections. Allocated resources for the Object must be released by calling Close on the returned io.Closer. After closing, the returned Object must no longer be read.

Flush returns an error if the object could not be constructed. Builder.Reset is called after a successful flush to discard any pending data, allowing new data to be appended.

func (*Builder) Reset

func (b *Builder) Reset()

Reset discards pending data and resets the builder to an empty state.

type Metrics added in v3.6.0

type Metrics struct {
	// contains filtered or unexported fields
}

Metrics instruments encoded data objects.

func NewMetrics added in v3.6.0

func NewMetrics() *Metrics

NewMetrics creates a new set of metrics for encoding.

func (*Metrics) Observe added in v3.6.0

func (m *Metrics) Observe(obj *Object) error

Observe updates metrics with statistics about the given Object.

func (*Metrics) Register added in v3.6.0

func (m *Metrics) Register(reg prometheus.Registerer) error

Register registers metrics to report to reg.

func (*Metrics) Unregister added in v3.6.0

func (m *Metrics) Unregister(reg prometheus.Registerer)

Unregister unregisters metrics from the provided Registerer.

type Object added in v3.5.0

type Object struct {
	// contains filtered or unexported fields
}

An Object is a representation of a data object.

func FromBucket added in v3.5.0

func FromBucket(ctx context.Context, bucket objstore.BucketReader, path string) (*Object, error)

FromBucket opens an Object from the given storage bucket and path. FromBucket returns an error if the metadata of the Object cannot be read or if the provided ctx times out.

func FromReaderAt added in v3.5.0

func FromReaderAt(r io.ReaderAt, size int64) (*Object, error)

FromReaderAt opens an Object from the given ReaderAt. The size argument specifies the size of the data object in bytes. FromReaderAt returns an error if the metadata of the Object cannot be read.

func (*Object) Reader added in v3.6.0

func (o *Object) Reader(ctx context.Context) (io.ReadCloser, error)

Reader returns a reader for the entire raw data object.

func (*Object) Sections added in v3.6.0

func (o *Object) Sections() Sections

Sections returns the list of sections available in the Object. The slice of returned sections must not be mutated.

func (*Object) Size added in v3.6.0

func (o *Object) Size() int64

Size returns the size of the data object in bytes.

func (*Object) Tenants added in v3.6.0

func (o *Object) Tenants() []string

Tenants returns the list of tenants that have sections in the Object. The returned slice must not be mutated.

type Section added in v3.6.0

type Section struct {
	Type   SectionType   // The type denoting the kind of data held in a section.
	Reader SectionReader // The low-level reader for a Section.

	// Tenant specifies the tenant that owns this section. Tenant is required
	// for sections which wholly contain tenant-specific data.
	Tenant string
}

A Section is a subset of an Object that holds a specific type of data. Use section packages for higher-level abstractions around sections.

type SectionBuilder added in v3.6.0

type SectionBuilder interface {
	// Type returns the SectionType representing the section being built.
	// Implementations are responsible for guaranteeing that no two
	// SectionBuilders return the same SectionType for different encodings.
	//
	// The returned Type is encoded directly into data objects. Implementations
	// that change SectionType values should be careful to continue supporting
	// old values for backwards compatibility.
	Type() SectionType

	// Flush encodes and flushes the section to w. Encodings that rely on byte
	// offsets should be relative to the first byte of the section's data.
	//
	// Flush returns the number of bytes written to w, and any error encountered
	// while encoding or flushing.
	//
	// After Flush is called, the SectionBuilder is reset to a fresh state and
	// can be reused.
	Flush(w SectionWriter) (n int64, err error)

	// Reset resets the SectionBuilder to a fresh state.
	Reset()
}

A SectionBuilder accumulates data for a single in-progress section.

Each section package provides an implementation of SectionBuilder that includes utilities to buffer data into that section. Callers should use Bytes or EstimatedSize to determine when enough data has been accumulated into a section.

type SectionReader added in v3.6.0

type SectionReader interface {
	// ExtensionData returns optional encoded information about the section
	// stored at the file level, provided through the [SectionWriter]. Sections
	// can use this for retrieving critical information that must be known
	// without needing to read the metadata first.
	//
	// ExtensionData will be nil if no extension data is available.
	ExtensionData() []byte

	// DataRange opens a reader of length bytes from the data region of a
	// section. The offset argument determines where in the data region reading
	// should start.
	//
	// DataRange returns an error if the read fails or if offset+length goes
	// beyond the readable data region. The returned reader is only valid as long
	// as the provided ctx is not canceled.
	DataRange(ctx context.Context, offset, length int64) (io.ReadCloser, error)

	// MetadataRange opens a reader of length bytes from the metadata region of
	// a section. The offset argument determines where in the metadata region
	// reading should start.
	//
	// MetadataRange returns an error if the read fails or if offset+length goes
	// beyond the readable metadata region. The returned reader is only valid as long
	// as the provided ctx is not canceled.
	MetadataRange(ctx context.Context, offset, length int64) (io.ReadCloser, error)

	// DataSize returns the total size of the data region of a section. DataSize
	// returns 0 for sections with no data region.
	DataSize() int64

	// MetadataSize returns the total size of the metadata region of a section.
	// MetadataSize returns 0 for sections with no metadata region.
	MetadataSize() int64
}

SectionReader is a low-level interface to read data ranges and metadata from a section.

Section packages provide higher-level abstractions around Section using this interface.

type SectionType added in v3.6.0

type SectionType struct {
	Namespace string // A namespace for the section (e.g., "github.com/grafana/loki").
	Kind      string // The kind of section, scoped to the namespace (e.g., "logs").

	// Version is an optional section-specified value denoting an encoding
	// version of the section.
	Version uint32
}

SectionType uniquely identifies a Section type.

func (SectionType) Equals added in v3.6.0

func (ty SectionType) Equals(o SectionType) bool

Equals returns true if o has the same namespace and kind as ty. The Version field is not checked.
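A small sketch makes this concrete; the `SectionType` struct and `Equals` method below are local mirrors of the documented behavior, not the package's actual code.

```go
package main

import "fmt"

// Local mirror of dataobj.SectionType. As documented, Equals compares
// only Namespace and Kind; Version is deliberately ignored.
type SectionType struct {
	Namespace string
	Kind      string
	Version   uint32
}

func (ty SectionType) Equals(o SectionType) bool {
	return ty.Namespace == o.Namespace && ty.Kind == o.Kind
}

func main() {
	v1 := SectionType{Namespace: "github.com/grafana/loki", Kind: "logs", Version: 1}
	v2 := SectionType{Namespace: "github.com/grafana/loki", Kind: "logs", Version: 2}
	other := SectionType{Namespace: "github.com/grafana/loki", Kind: "streams"}

	fmt.Println(v1.Equals(v2))    // true: Version is not checked
	fmt.Println(v1.Equals(other)) // false: different Kind
}
```

Ignoring Version in Equals lets readers match sections of a kind regardless of which encoding version produced them.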

func (SectionType) String added in v3.6.0

func (ty SectionType) String() string

type SectionWriter added in v3.6.0

type SectionWriter interface {
	// WriteSection writes a section to the underlying data stream, partitioned
	// by section data and section metadata. It returns the sum of bytes written
	// from both input slices (0 <= n <= len(data)+len(metadata)) and any error
	// encountered that caused the write to stop early.
	//
	// The opts argument provides additional information about the section being
	// written. If opts is nil, the section is written without any additional
	// context.
	//
	// Implementations of WriteSection:
	//
	//   - Must return an error if the write stops early.
	//   - Must not modify the slices passed to it, even temporarily.
	//   - Must not retain references to slices after WriteSection returns.
	//
	// The physical layout of data and metadata is not defined: they may be
	// written non-contiguously, interleaved, or in any order.
	WriteSection(opts *WriteSectionOptions, data, metadata []byte) (n int64, err error)
}

SectionWriter writes data object sections to an underlying stream, such as a data object.

type Sections added in v3.6.0

type Sections []*Section

A Sections is a slice of Section.

func (Sections) Count added in v3.6.0

func (s Sections) Count(predicate func(*Section) bool) int

Count returns the number of sections that pass some predicate.

func (Sections) Filter added in v3.6.0

func (s Sections) Filter(predicate func(*Section) bool) iter.Seq2[int, *Section]

Filter returns an iterator over sections that pass some predicate. The yielded index is the position of the matching section within s.
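As an illustration, here is a toy local mirror of Sections with a Count implementation matching the documented behavior; the `Section` fields shown are a subset of the real type.

```go
package main

import "fmt"

// Minimal local mirrors of dataobj.SectionType, Section, and Sections.
type SectionType struct{ Namespace, Kind string }

type Section struct {
	Type   SectionType
	Tenant string
}

type Sections []*Section

// Count returns the number of sections that pass the predicate,
// mirroring the documented Sections.Count.
func (s Sections) Count(predicate func(*Section) bool) int {
	n := 0
	for _, sec := range s {
		if predicate(sec) {
			n++
		}
	}
	return n
}

func main() {
	logs := SectionType{"github.com/grafana/loki", "logs"}
	streams := SectionType{"github.com/grafana/loki", "streams"}
	secs := Sections{
		{Type: logs, Tenant: "tenant-a"},
		{Type: streams, Tenant: "tenant-a"},
		{Type: logs, Tenant: "tenant-b"},
	}

	nLogs := secs.Count(func(sec *Section) bool { return sec.Type.Kind == "logs" })
	fmt.Println(nLogs) // 2
}
```

Predicates like the one above are also what Filter takes, so the same closure can drive both counting and iteration over matching sections.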

type WriteSectionOptions added in v3.6.0

type WriteSectionOptions struct {
	// Tenant that owns the written data and metadata. Tenant must be set for
	// sections that are wholly owned by a single tenant.
	Tenant string

	// ExtensionData is an optional field for section information to store at
	// the file level. To minimize the cost of opening data objects, sections
	// should only use this field for information that's required to start
	// reading section metadata and to keep the payload as small as possible.
	//
	// ExtensionData does not impact the return value of n in
	// [SectionWriter.WriteSection].
	//
	// Implementations of [SectionWriter] must not retain references to this
	// slice after WriteSection returns.
	ExtensionData []byte
}

WriteSectionOptions provides additional options when writing sections.

Directories

Path Synopsis
logsobj
Package logsobj provides tooling for creating logs-oriented data objects.
indexobj
Package indexobj provides tooling for creating index-oriented data objects.
internal
arrowconv
Package arrowconv provides helper utilities for converting between Arrow and dataset values.
dataset
Package dataset contains utilities for working with datasets.
result
Package result provides utilities for dealing with iterators that can fail during iteration.
streamio
Package streamio defines interfaces shared by other packages for streaming binary data.
util/bitmask
Package bitmask provides an API for creating and manipulating bitmasks of arbitrary length.
util/bufpool
Package bufpool offers a pool of *bytes.Buffer objects that are placed into exponentially sized buckets.
util/protocodec
Package protocodec provides utilities for encoding and decoding protobuf messages into files.
util/sliceclear
Package sliceclear provides a way to clear and truncate the length of a slice.
util/symbolizer
Package symbolizer provides a string interning mechanism to reduce memory usage by reusing identical strings.
sections
internal/columnar
Package columnar provides a base implementation for sections which store columnar data using github.com/grafana/loki/v3/pkg/dataobj/internal/dataset.
logs
Package logs defines types used for the data object logs section.
pointers
Package pointers defines types used for the data object pointers section.
streams
Package streams defines types used for the data object streams section.
TODO(grobinson): Find a way to move this file into the dataobj package.
