blob

package

v2.1.0 | Published: Aug 27, 2025 | License: BSD-3-Clause | Imports: 19 | Imported by: 0

Repository

github.com/cockroachdb/pebble

Documentation ¶

Overview ¶

Package blob implements mechanics for encoding and decoding values into blob files.

Blob file format ¶

A blob file consists of a sequence of blob-value blocks containing values, followed by an index block describing the location of the blob-value blocks. At the tail of the file is a fixed-size footer encoding the exact offset and length of the index block.

Semantically, a blob file implements an array of blob values. SSTables that reference separated blob values encode a (blockID, blockValueID) tuple to identify the value within the blob file. The blockID identifies the blob-value block that contains the value. The blockValueID identifies the value within the block. A reader retrieving a particular value uses the index block to identify the offset and length of the blob-value block containing the referenced value. It loads the identified blob-value block and then uses the block's internal structure to retrieve the value based on the blockValueID.

A blob file may be rewritten (without rewriting referencing sstables) to remove unused values. Extant handles within sstables must continue to work. See the Sparseness section below for more details.

## Index Block

The index block is used to determine which blob-value block contains a particular value and the block's physical offset and length within the file. The index block uses a columnar encoding (see pkg colblk) to encode two columns:

**Virtual Blocks**: an array of uints that is only non-empty for blob files that have been rewritten. The length of the array is identified by the first 4 bytes of the index block as a custom block header. Within the array each 64-bit uint value's least significant 32 bits encode the index of the physical block containing the original block's data. This index can be used to look up the byte offset and length of the physical block within the index block's offsets column. The most significant 32 bits of each uint value encode a BlockValueID offset that remaps the original BlockValueID to the corresponding BlockValueID in the new physical block. A reader adds this BlockValueID offset to a handle's BlockValueID to get the index of the value within the physical block.

TODO(jackson,radu): Consider interleaving the encoding of the uints so that in the common case of <64K blocks and <64K values per block, the uint column can be encoded in 32 bits. See related issue https://github.com/cockroachdb/pebble/issues/4426.
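The bit layout of a virtual-block entry described above can be sketched in a few lines. This is a minimal illustration, not the package's implementation: the names `virtualBlock`, `decodeVirtualBlock`, `encodeVirtualBlock`, and `remap` are hypothetical, and the column is modeled as a plain `[]uint64` rather than a colblk-encoded column.

```go
package main

import "fmt"

// virtualBlock is a hypothetical decoded view of one entry in the
// virtual-blocks column.
type virtualBlock struct {
	physicalIndex uint32 // least significant 32 bits of the entry
	valueIDOffset uint32 // most significant 32 bits of the entry
}

// decodeVirtualBlock splits a 64-bit column entry into its two halves.
func decodeVirtualBlock(v uint64) virtualBlock {
	return virtualBlock{
		physicalIndex: uint32(v),
		valueIDOffset: uint32(v >> 32),
	}
}

// encodeVirtualBlock is the inverse; it is used here only to build sample data.
func encodeVirtualBlock(physicalIndex, valueIDOffset uint32) uint64 {
	return uint64(valueIDOffset)<<32 | uint64(physicalIndex)
}

// remap translates a handle's (blockID, valueID) into a physical block index
// and the index of the value within that physical block. An empty column means
// the file was never rewritten, so the IDs are used as-is.
func remap(virtualBlocks []uint64, blockID, valueID uint32) (physIdx, physValueID uint32) {
	if len(virtualBlocks) == 0 {
		return blockID, valueID
	}
	vb := decodeVirtualBlock(virtualBlocks[blockID])
	return vb.physicalIndex, valueID + vb.valueIDOffset
}

func main() {
	col := []uint64{
		encodeVirtualBlock(0, 0),
		encodeVirtualBlock(0, 0),
		encodeVirtualBlock(0, 32),
		encodeVirtualBlock(1, 0),
	}
	p, v := remap(col, 2, 5)
	fmt.Println(p, v) // prints "0 37"
}
```

Original block 2 maps to physical block 0 with a value-ID offset of 32, so value 5 of the original block is value 37 of the physical block.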

**Offsets**: an array of uints encoding the offset in the blob file at which each block begins. There are <num-physical-blocks>+1 offsets. The last offset points to the first byte after the last block. The length of each block is inferred through the difference between consecutive offsets.
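As a rough sketch of how a block's extent falls out of the offsets column, the helper below (a hypothetical `blockSpan`, operating on a plain `[]uint64` stand-in for the column) derives the offset and length of physical block i from two consecutive entries.

```go
package main

import "fmt"

// blockSpan returns the byte offset and length of physical block i, given an
// offsets column holding <num-physical-blocks>+1 entries. It reports ok=false
// if i does not name a physical block.
func blockSpan(offsets []uint64, i int) (offset, length uint64, ok bool) {
	if i < 0 || i+1 >= len(offsets) {
		return 0, 0, false
	}
	return offsets[i], offsets[i+1] - offsets[i], true
}

func main() {
	// Five offsets describe four physical blocks.
	offsets := []uint64{0, 32952, 65904, 92522, 125474}
	off, n, _ := blockSpan(offsets, 2)
	fmt.Println(off, n) // prints "65904 26618"
}
```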

## Blob Value Blocks

A blob value block is a columnar block encoding blob values. It encodes a single column: a RawBytes of values. The colblk.RawBytes encoding allows constant-time access to the i'th value within the block.

## Sparseness

A rewrite of a blob file elides values that are no longer referenced, conserving disk space. Within a value block, an absent value is represented as an empty byte slice within the RawBytes column. This requires the overhead of 1 additional offset within the RawBytes encoding (typically 2-4 bytes).
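To make the cost of an absent value concrete, here is a simplified stand-in for a RawBytes column. It is not the colblk.RawBytes wire encoding (which packs offsets more compactly); the `rawBytes` type and its helpers are hypothetical. It shows only the idea that value i lives between two consecutive offsets, so lookups take constant time and an absent value contributes one offset entry and no data bytes.

```go
package main

import "fmt"

// rawBytes is a simplified stand-in for a RawBytes column: value i occupies
// data[offsets[i]:offsets[i+1]].
type rawBytes struct {
	offsets []uint32
	data    []byte
}

// build packs values back to back. A nil or empty value adds one offset entry
// and no data bytes.
func build(values [][]byte) rawBytes {
	rb := rawBytes{offsets: make([]uint32, 1, len(values)+1)}
	for _, v := range values {
		rb.data = append(rb.data, v...)
		rb.offsets = append(rb.offsets, uint32(len(rb.data)))
	}
	return rb
}

// at returns the i'th value with two offset lookups and one slice operation.
func (rb rawBytes) at(i int) []byte {
	return rb.data[rb.offsets[i]:rb.offsets[i+1]]
}

func main() {
	// The middle value was elided by a rewrite and is stored as empty.
	rb := build([][]byte{[]byte("apple"), nil, []byte("cherry")})
	fmt.Printf("%q %q %q\n", rb.at(0), rb.at(1), rb.at(2))
	fmt.Println(len(rb.data), len(rb.offsets)) // prints "11 4"
}
```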

If a wide swath of values is no longer referenced, entire blocks may be elided. When this occurs, the index block's virtual blocks column will map multiple original blockIDs to the same physical block.

We expect significant locality to gaps in referenced values. Compactions will remove swaths of references all at once, typically all the values of keys that fall within a narrow keyspan. This locality allows us to represent most sparseness using the gaps between blocks, without suffering the 2-4 bytes of overhead for absent values internally within a block.

Note: If we find that this locality does not hold for some reason, we can extend the blob-value block format to encode a NullBitmap. This would allow us to represent missing values using 2 bits per missing value.

## Diagram

+------------------------------------------------------------------------------+
|                               BLOB FILE FORMAT                               |
+------------------------------------------------------------------------------+
|                                Value Block #0                                |
|   +----------------------------------------------------------------------+   |
|   | RawBytes[...]                                                        |   |
|   +----------------------------------------------------------------------+   |
|                                Value Block #1                                |
|   +----------------------------------------------------------------------+   |
|   | RawBytes[...]                                                        |   |
|   +----------------------------------------------------------------------+   |
|                                     ...                                      |
|                                Value Block #N                                |
|   +----------------------------------------------------------------------+   |
|   | RawBytes[...]                                                        |   |
|   +----------------------------------------------------------------------+   |
|                                                                              |
+------------------------------- Index Block ----------------------------------+
|                           Custom Header (4 bytes)                            |
|                            Num virtual blocks: M                             |
|   +---------Virtual blocks (M)--------+    +--------Offsets(N+1)---------+   |
|   | idx    block index  valueIDoffset |    | idx        offset           |   |
|   | 0      0            0             |    | 0          0                |   |
|   | 1      0            0             |    | 1          32952            |   |
|   | 2      0            32            |    | 2          65904            |   |
|   | 3      1            0             |    | 3          92522            |   |
|   | 4      2            0             |    | 4          125474           |   |
|   | 5      3            0             |    +-----------------------------+   |
|   +-----------------------------------+                                      |
+----------------------------- Footer (30 bytes) ------------------------------+
| CRC Checksum (4 bytes)                                                       |
| Index Block Offset (8 bytes)                                                 |
| Index Block Length (8 bytes)                                                 |
| Checksum Type (1 byte)                                                       |
| Format (1 byte)                                                              |
| Magic String (8 bytes)                                                       |
+------------------------------------------------------------------------------+
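The footer fields listed in the diagram add up to 30 bytes (4 + 8 + 8 + 1 + 1 + 8). The sketch below slices a footer into those fields in the order shown. Several things here are assumptions rather than documented facts: the `footer` type and `parseFooter` function are hypothetical, multi-byte integers are assumed to be little-endian, and the checksum and magic string are extracted without being validated, since the documentation does not state their expected values or what the checksum covers.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const footerLen = 4 + 8 + 8 + 1 + 1 + 8 // 30 bytes, per the diagram

// footer mirrors the fields listed in the diagram. The names are illustrative.
type footer struct {
	checksum     uint32
	indexOffset  uint64
	indexLength  uint64
	checksumType uint8
	format       uint8
	magic        [8]byte
}

// parseFooter decodes the trailing footerLen bytes of a file image.
// ASSUMPTION: multi-byte integers are little-endian. The checksum and magic
// string are returned but not validated.
func parseFooter(file []byte) (footer, error) {
	if len(file) < footerLen {
		return footer{}, errors.New("file too small to contain a footer")
	}
	b := file[len(file)-footerLen:]
	var f footer
	f.checksum = binary.LittleEndian.Uint32(b[0:4])
	f.indexOffset = binary.LittleEndian.Uint64(b[4:12])
	f.indexLength = binary.LittleEndian.Uint64(b[12:20])
	f.checksumType = b[20]
	f.format = b[21]
	copy(f.magic[:], b[22:30])
	return f, nil
}

func main() {
	// Build a synthetic footer; the values are made up for illustration.
	b := make([]byte, footerLen)
	binary.LittleEndian.PutUint64(b[4:12], 125474)
	binary.LittleEndian.PutUint64(b[12:20], 96)
	b[21] = 1
	f, err := parseFooter(b)
	fmt.Println(f.indexOffset, f.indexLength, f.format, err)
}
```

In real code, NewFileReader handles footer parsing; a reader would not normally decode these bytes by hand.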

Index ¶

Constants
type BlockID
type BlockValueID
type FileFormat
- func (f FileFormat) String() string
type FileMapping
type FileReader
- func NewFileReader(ctx context.Context, r objstorage.Readable, ro FileReaderOptions) (*FileReader, error)
- func (r *FileReader) Close() error
- func (r *FileReader) IndexHandle() block.Handle
- func (r *FileReader) InitReadHandle(rh *objstorageprovider.PreallocatedReadHandle) objstorage.ReadHandle
- func (r *FileReader) Layout() (string, error)
- func (r *FileReader) ReadIndexBlock(ctx context.Context, env block.ReadEnv, rh objstorage.ReadHandle) (block.BufferHandle, error)
- func (r *FileReader) ReadValueBlock(ctx context.Context, env block.ReadEnv, rh objstorage.ReadHandle, ...) (block.BufferHandle, error)
type FileReaderOptions
type FileRewriter
- func NewFileRewriter(fileID base.BlobFileID, inputFileNum base.DiskFileNum, rp ReaderProvider, ...) *FileRewriter
- func (rw *FileRewriter) Close() (FileWriterStats, error)
- func (rw *FileRewriter) CopyBlock(ctx context.Context, blockID BlockID, totalValueSize int, valueIDs []int) error
type FileWriter
- func NewFileWriter(fn base.DiskFileNum, w objstorage.Writable, opts FileWriterOptions) *FileWriter
- func (w *FileWriter) AddValue(v []byte) Handle
- func (w *FileWriter) Close() (FileWriterStats, error)
- func (w *FileWriter) EstimatedSize() uint64
- func (w *FileWriter) FlushForTesting()
type FileWriterOptions
type FileWriterStats
- func (s FileWriterStats) String() string
type Handle
- func (h Handle) SafeFormat(w redact.SafePrinter, _ rune)
- func (h Handle) String() string
type HandleSuffix
- func DecodeHandleSuffix(src []byte) HandleSuffix
- func (h HandleSuffix) Encode(b []byte) int
type InlineHandle
- func (h InlineHandle) Encode(b []byte) int
- func (h InlineHandle) SafeFormat(w redact.SafePrinter, _ rune)
- func (h InlineHandle) String() string
type InlineHandlePreface
- func DecodeInlineHandlePreface(src []byte) (InlineHandlePreface, []byte)
type ReaderProvider
type ReferenceID
type ValueFetcher
- func (r *ValueFetcher) Close() error
- func (r *ValueFetcher) Fetch(ctx context.Context, blobFileID base.BlobFileID, blockID BlockID, ...) (val []byte, callerOwned bool, err error)
- func (r *ValueFetcher) FetchHandle(ctx context.Context, handle []byte, blobFileID base.BlobFileID, valLen uint32, ...) (val []byte, callerOwned bool, err error)
- func (r *ValueFetcher) Init(fm FileMapping, rp ReaderProvider, env block.ReadEnv)
type ValueReader

Constants ¶

const MaxInlineHandleLength = 4 * binary.MaxVarintLen32

MaxInlineHandleLength is the maximum length of an inline blob handle.

Handle fields are varint encoded, so each of the four fields occupies at most 5 bytes (binary.MaxVarintLen32).
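The bound can be checked with the standard library alone. The sketch below assumes the fields are encoded as unsigned varints (the text says only "varint") and uses a hypothetical helper, `worstCaseInlineHandleLen`, that encodes four maximal uint32 values back to back.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// worstCaseInlineHandleLen encodes four maximal uint32 values as uvarints and
// returns the total number of bytes written.
func worstCaseInlineHandleLen() int {
	var buf []byte
	for i := 0; i < 4; i++ {
		buf = binary.AppendUvarint(buf, math.MaxUint32)
	}
	return len(buf)
}

func main() {
	fmt.Println(worstCaseInlineHandleLen(), 4*binary.MaxVarintLen32) // prints "20 20"
}
```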

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type BlockID ¶

type BlockID uint32

BlockID identifies a block within a blob file. If a blob file has not been rewritten, the block ID is simply an index of the block within the file. If the blob file has been rewritten to reclaim disk space, the rewritten blob file will contain fewer blocks than the original. The rewritten blob file's index block contains a column mapping the original block ID to the index of the block in the new blob file containing the original block's data.

type BlockValueID ¶

type BlockValueID uint32

BlockValueID identifies a value within a block of a blob file. The BlockValueID is local to the block. The BlockValueID is an index 0..n-1 into the array of values in the original blob-value block.

type FileFormat ¶

type FileFormat uint8

FileFormat identifies the format of a blob file.

const (
	// FileFormatV1 is the first version of the blob file format.
	FileFormatV1 FileFormat = 1
)

func (FileFormat) String ¶

func (f FileFormat) String() string

String implements the fmt.Stringer interface.

type FileMapping ¶

type FileMapping interface {
	// Lookup returns the disk file number for the given blob file ID. It
	// returns false for the second return value if the blob file ID is not
	// present in the mapping.
	Lookup(base.BlobFileID) (base.DiskFileNum, bool)
}

A FileMapping defines the mapping between blob file IDs and disk file numbers. It's implemented by *manifest.BlobFileSet.

type FileReader ¶

type FileReader struct {
	// contains filtered or unexported fields
}

FileReader reads a blob file. If you update this struct, make sure you also update the magic number in StringForTests() in metrics.go.

func NewFileReader ¶

func NewFileReader(
	ctx context.Context, r objstorage.Readable, ro FileReaderOptions,
) (*FileReader, error)

NewFileReader opens a blob file for reading.

In error cases, the objstorage.Readable is still open. The caller remains responsible for closing it if necessary.

func (*FileReader) Close ¶

func (r *FileReader) Close() error

Close implements io.Closer, closing the underlying Readable.

func (*FileReader) IndexHandle ¶

func (r *FileReader) IndexHandle() block.Handle

IndexHandle returns the block handle for the file's index block.

func (*FileReader) InitReadHandle ¶

func (r *FileReader) InitReadHandle(
	rh *objstorageprovider.PreallocatedReadHandle,
) objstorage.ReadHandle

InitReadHandle initializes a read handle for the file reader, using the provided preallocated read handle.

func (*FileReader) Layout ¶

func (r *FileReader) Layout() (string, error)

Layout returns the layout (block organization) as a string for a blob file.

func (*FileReader) ReadIndexBlock ¶

func (r *FileReader) ReadIndexBlock(
	ctx context.Context, env block.ReadEnv, rh objstorage.ReadHandle,
) (block.BufferHandle, error)

ReadIndexBlock reads the index block from the file.

func (*FileReader) ReadValueBlock ¶

func (r *FileReader) ReadValueBlock(
	ctx context.Context, env block.ReadEnv, rh objstorage.ReadHandle, h block.Handle,
) (block.BufferHandle, error)

ReadValueBlock reads a value block from the file.

type FileReaderOptions ¶

type FileReaderOptions struct {
	block.ReaderOptions
}

FileReaderOptions configures a reader of a blob file.

type FileRewriter ¶

type FileRewriter struct {
	// contains filtered or unexported fields
}

A FileRewriter copies values from an input blob file, outputting a new blob file containing a subset of the original blob file's values. The original Handles used to access values in the original blob file will continue to work with the new blob file, as long as the value was copied during rewrite.

func NewFileRewriter ¶

func NewFileRewriter(
	fileID base.BlobFileID,
	inputFileNum base.DiskFileNum,
	rp ReaderProvider,
	readEnv block.ReadEnv,
	outputFileNum base.DiskFileNum,
	w objstorage.Writable,
	opts FileWriterOptions,
) *FileRewriter

NewFileRewriter creates a new FileRewriter that will copy values from the input blob file to the output blob file.

func (*FileRewriter) Close ¶

func (rw *FileRewriter) Close() (FileWriterStats, error)

Close finishes writing the output blob file and releases resources.

func (*FileRewriter) CopyBlock ¶

func (rw *FileRewriter) CopyBlock(
	ctx context.Context, blockID BlockID, totalValueSize int, valueIDs []int,
) error

CopyBlock copies the values for the given blockID to the output blob file. CopyBlock must be called with ascending blockIDs. The totalValueSize must be the size of all the values indicated by valueIDs.
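One way a caller might prepare the arguments for a series of CopyBlock calls is sketched below. The `liveValue` and `copyCall` types and the `planCopies` helper are hypothetical and use plain integer types instead of the package's BlockID. The sketch assumes the caller already knows which values are still referenced and how long each one is, and that no value appears twice. Sorting value IDs within a block is a choice of this sketch, not a documented requirement.

```go
package main

import (
	"fmt"
	"sort"
)

// liveValue is a hypothetical record of one still-referenced value.
type liveValue struct {
	blockID  uint32
	valueID  int
	valueLen int
}

// copyCall holds the arguments for one CopyBlock invocation.
type copyCall struct {
	blockID        uint32
	totalValueSize int
	valueIDs       []int
}

// planCopies groups live values by block, sums their sizes, and orders the
// resulting calls by ascending blockID.
func planCopies(live []liveValue) []copyCall {
	sorted := append([]liveValue(nil), live...)
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].blockID != sorted[j].blockID {
			return sorted[i].blockID < sorted[j].blockID
		}
		return sorted[i].valueID < sorted[j].valueID
	})
	var calls []copyCall
	for _, lv := range sorted {
		if n := len(calls); n == 0 || calls[n-1].blockID != lv.blockID {
			calls = append(calls, copyCall{blockID: lv.blockID})
		}
		c := &calls[len(calls)-1]
		c.totalValueSize += lv.valueLen
		c.valueIDs = append(c.valueIDs, lv.valueID)
	}
	return calls
}

func main() {
	calls := planCopies([]liveValue{{3, 1, 100}, {1, 4, 10}, {1, 2, 20}})
	for _, c := range calls {
		fmt.Printf("block=%d size=%d valueIDs=%v\n", c.blockID, c.totalValueSize, c.valueIDs)
	}
}
```

Each planned call would then be passed to CopyBlock in order, followed by a single Close on the rewriter.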

type FileWriter ¶

type FileWriter struct {
	// contains filtered or unexported fields
}

A FileWriter writes a blob file.

func NewFileWriter ¶

func NewFileWriter(fn base.DiskFileNum, w objstorage.Writable, opts FileWriterOptions) *FileWriter

NewFileWriter creates a new FileWriter.

func (*FileWriter) AddValue ¶

func (w *FileWriter) AddValue(v []byte) Handle

AddValue adds the provided value to the blob file, returning a Handle identifying the location of the value.

func (*FileWriter) Close ¶

func (w *FileWriter) Close() (FileWriterStats, error)

Close finishes writing the blob file.

func (*FileWriter) EstimatedSize ¶

func (w *FileWriter) EstimatedSize() uint64

EstimatedSize returns an estimate of the disk space consumed by the blob file if it were closed now.

func (*FileWriter) FlushForTesting ¶

func (w *FileWriter) FlushForTesting()

FlushForTesting flushes the current block to the write queue. Writers should generally not call FlushForTesting, and instead let the heuristics configured through FileWriterOptions handle flushing.

It's exposed so that tests can force flushes to construct blob files with arbitrary structures.

type FileWriterOptions ¶

type FileWriterOptions struct {
	Compression   *block.CompressionProfile
	ChecksumType  block.ChecksumType
	FlushGovernor block.FlushGovernor
	// Only CPUMeasurer.MeasureCPUBlobFileSecondary is used.
	CpuMeasurer base.CPUMeasurer
}

FileWriterOptions are used to configure the FileWriter.

type FileWriterStats ¶

type FileWriterStats struct {
	BlockCount             uint32
	ValueCount             uint32
	UncompressedValueBytes uint64
	FileLen                uint64
}

FileWriterStats aggregates statistics about a blob file written by a FileWriter.

func (FileWriterStats) String ¶

func (s FileWriterStats) String() string

String implements the fmt.Stringer interface.

type Handle ¶

type Handle struct {
	BlobFileID base.BlobFileID
	ValueLen   uint32
	// BlockID identifies the block within the blob file containing the value.
	BlockID BlockID
	// ValueID identifies the value within the block identified by BlockID.
	ValueID BlockValueID
}

Handle describes the location of a value stored within a blob file.

func (Handle) SafeFormat ¶

func (h Handle) SafeFormat(w redact.SafePrinter, _ rune)

SafeFormat implements redact.SafeFormatter.

func (Handle) String ¶

func (h Handle) String() string

String implements the fmt.Stringer interface.

type HandleSuffix ¶

type HandleSuffix struct {
	BlockID BlockID
	ValueID BlockValueID
}

HandleSuffix is the suffix of an inline handle. It's decoded only when the value is being fetched from the blob file.

func DecodeHandleSuffix ¶

func DecodeHandleSuffix(src []byte) HandleSuffix

DecodeHandleSuffix decodes the HandleSuffix from the provided buffer.

func (HandleSuffix) Encode ¶

func (h HandleSuffix) Encode(b []byte) int

Encode encodes the handle suffix into the provided buffer, returning the number of bytes encoded.

type InlineHandle ¶

type InlineHandle struct {
	InlineHandlePreface
	HandleSuffix
}

InlineHandle describes a handle as it is encoded within a sstable block. The inline handle does not encode the blob file number outright. Instead it encodes an index into the containing sstable's BlobReferences.

The inline handle is composed of two parts: a preface (InlineHandlePreface) and a suffix (HandleSuffix). The preface is eagerly decoded from the encoded handle when returning an InternalValue to higher layers. The remaining bits (the suffix) are decoded only when the value is being fetched from the blob file.
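The two-stage decoding can be illustrated with a small stand-alone sketch. It is not the package's encoder: the `preface` and `suffix` types and the three functions are hypothetical, the fields are assumed to be unsigned varints, and the order of the two suffix fields (block ID, then value ID) is an assumption. Input is assumed to be well formed, so decoding errors are not handled.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

type preface struct{ referenceID, valueLen uint32 }
type suffix struct{ blockID, valueID uint32 }

// encode appends four uvarints: the preface fields first, then the suffix.
func encode(p preface, s suffix) []byte {
	var b []byte
	for _, v := range []uint32{p.referenceID, p.valueLen, s.blockID, s.valueID} {
		b = binary.AppendUvarint(b, uint64(v))
	}
	return b
}

// decodePreface eagerly decodes the first two fields and returns the
// still-encoded remainder, which can be kept around until a fetch is needed.
func decodePreface(src []byte) (preface, []byte) {
	ref, n := binary.Uvarint(src)
	src = src[n:]
	vlen, n := binary.Uvarint(src)
	return preface{uint32(ref), uint32(vlen)}, src[n:]
}

// decodeSuffix decodes the remaining two fields on demand.
func decodeSuffix(src []byte) suffix {
	blk, n := binary.Uvarint(src)
	val, _ := binary.Uvarint(src[n:])
	return suffix{uint32(blk), uint32(val)}
}

func main() {
	b := encode(preface{2, 1024}, suffix{7, 300})
	p, rest := decodePreface(b)
	fmt.Println(p, len(rest)) // the suffix bytes are untouched at this point
	fmt.Println(decodeSuffix(rest))
}
```

Deferring the suffix means a caller that only needs the value length pays for decoding two fields rather than four.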

func (InlineHandle) Encode ¶

func (h InlineHandle) Encode(b []byte) int

Encode encodes the inline handle into the provided buffer, returning the number of bytes encoded.

func (InlineHandle) SafeFormat ¶

func (h InlineHandle) SafeFormat(w redact.SafePrinter, _ rune)

SafeFormat implements redact.SafeFormatter.

func (InlineHandle) String ¶

func (h InlineHandle) String() string

String implements the fmt.Stringer interface.

type InlineHandlePreface ¶

type InlineHandlePreface struct {
	ReferenceID ReferenceID
	ValueLen    uint32
}

InlineHandlePreface is the prefix of an inline handle. It's eagerly decoded when returning an InternalValue to higher layers.

func DecodeInlineHandlePreface ¶

func DecodeInlineHandlePreface(src []byte) (InlineHandlePreface, []byte)

DecodeInlineHandlePreface decodes the blob reference index and value length from the beginning of a variable-width encoded InlineHandle.

type ReaderProvider ¶

type ReaderProvider interface {
	// GetValueReader returns a ValueReader for the given file number.
	GetValueReader(ctx context.Context, fileNum base.DiskFileNum) (r ValueReader, closeFunc func(), err error)
}

A ReaderProvider is an interface that can be used to retrieve a ValueReader for a given file number.

type ReferenceID ¶

type ReferenceID uint32

ReferenceID identifies a particular blob reference within a table. It's implemented as an index into the slice of the BlobReferences recorded in the manifest.

type ValueFetcher ¶

type ValueFetcher struct {
	// contains filtered or unexported fields
}

A ValueFetcher retrieves values stored out-of-band in separate blob files. The ValueFetcher caches accessed file readers to avoid redundant file cache and block cache lookups when performing consecutive value retrievals.

A single ValueFetcher can be used to fetch values from multiple files, and it will internally cache readers for each file.

When finished with a ValueFetcher, one must call Close to release all cached readers and block buffers.

func (*ValueFetcher) Close ¶

func (r *ValueFetcher) Close() error

Close closes the ValueFetcher and releases all cached readers. Once Close is called, the ValueFetcher is no longer usable.

func (*ValueFetcher) Fetch ¶

func (r *ValueFetcher) Fetch(
	ctx context.Context, blobFileID base.BlobFileID, blockID BlockID, valueID BlockValueID,
) (val []byte, callerOwned bool, err error)

Fetch is like FetchHandle, but it constructs the handle from the provided blockID and valueID and does not validate the value length. Fetch must not be called after Close.

func (*ValueFetcher) FetchHandle ¶

func (r *ValueFetcher) FetchHandle(
	ctx context.Context, handle []byte, blobFileID base.BlobFileID, valLen uint32, buf []byte,
) (val []byte, callerOwned bool, err error)

FetchHandle returns the value, given the handle. FetchHandle must not be called after Close.

func (*ValueFetcher) Init ¶

func (r *ValueFetcher) Init(fm FileMapping, rp ReaderProvider, env block.ReadEnv)

Init initializes the ValueFetcher.

type ValueReader ¶

type ValueReader interface {
	// IndexHandle returns the handle for the file's index block.
	IndexHandle() block.Handle

	// InitReadHandle initializes a ReadHandle for the file, using the provided
	// preallocated read handle to avoid an allocation.
	InitReadHandle(rh *objstorageprovider.PreallocatedReadHandle) objstorage.ReadHandle

	// ReadValueBlock retrieves a value block described by the provided block
	// handle from the block cache, or reads it from the blob file if it's not
	// already cached.
	ReadValueBlock(context.Context, block.ReadEnv, objstorage.ReadHandle,
		block.Handle) (block.BufferHandle, error)

	// ReadIndexBlock retrieves the index block from the block cache, or reads
	// it from the blob file if it's not already cached.
	ReadIndexBlock(context.Context, block.ReadEnv, objstorage.ReadHandle) (block.BufferHandle, error)
}

A ValueReader is an interface defined over a file that can be used to read value blocks.
