chunk

package
v0.38.0 Latest
Warning

This package is not in the latest version of its module.

Published: Apr 9, 2026 | License: Apache-2.0, MIT | Imports: 11 | Imported by: 60

Documentation

Overview

Package chunk implements streaming block splitters.

Splitters read data from a reader and produce byte slices (chunks). The size and contents of these slices depend on the splitting method.

Built-in methods include fixed-size, Rabin fingerprint, and Buzhash content-defined chunking. Additional methods can be registered with Register and instantiated through FromString.

Constants

const (
	// BlockSizeLimit is the maximum block size defined by the bitswap spec.
	// https://specs.ipfs.tech/bitswap-protocol/#block-sizes
	BlockSizeLimit int = 2 * 1024 * 1024 // 2MiB

	// ChunkOverheadBudget is reserved for protobuf/UnixFS framing overhead
	// when chunks are wrapped in non-raw leaves (--raw-leaves=false).
	ChunkOverheadBudget int = 256

	// ChunkSizeLimit is the maximum chunk size accepted by the chunker.
	// It is set below BlockSizeLimit to leave room for framing overhead
	// so that serialized blocks stay within the 2MiB wire limit.
	//
	// In practice this limit only matters for custom chunker sizes.
	// The CID-deterministic profiles defined in IPIP-499 use max 1MiB
	// chunks, well within this limit.
	ChunkSizeLimit int = BlockSizeLimit - ChunkOverheadBudget
)
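The arithmetic behind ChunkSizeLimit can be checked with a short standalone program. This is a minimal sketch that copies the three constants locally instead of importing the package; fitsChunkLimit is a hypothetical helper introduced only for illustration and is not part of the package API.

```go
package main

import "fmt"

// Values copied from the constant block above.
const (
	BlockSizeLimit      int = 2 * 1024 * 1024
	ChunkOverheadBudget int = 256
	ChunkSizeLimit      int = BlockSizeLimit - ChunkOverheadBudget
)

// fitsChunkLimit is a hypothetical helper, not part of the package API.
// It reports whether a requested chunk size is positive and within the limit.
func fitsChunkLimit(size int) bool {
	return size > 0 && size <= ChunkSizeLimit
}

func main() {
	fmt.Println(ChunkSizeLimit)                 // 2097152 - 256 = 2096896
	fmt.Println(fitsChunkLimit(1024 * 1024))    // true: a 1MiB chunk fits
	fmt.Println(fitsChunkLimit(BlockSizeLimit)) // false: no room left for framing
}
```

A chunk of exactly BlockSizeLimit bytes is rejected because wrapping it in a non-raw leaf would push the serialized block past 2MiB.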

Variables

var (
	ErrRabinMin = errors.New("rabin min must be greater than 16")
	ErrSize     = errors.New("chunker size must be greater than 0")
	ErrSizeMax  = fmt.Errorf("chunker parameters may not exceed the maximum chunk size of %d", ChunkSizeLimit)
)
var DefaultBlockSize int64 = 1024 * 256

DefaultBlockSize is the chunk size that splitters produce (or aim to produce). It can be modified to change the default for all subsequent chunker operations. For CID-deterministic imports, prefer the UnixFSProfile presets from ipld/unixfs/io/profile.go, which set this and other related globals.

var IpfsRabinPoly = chunker.Pol(17437180132763653)

IpfsRabinPoly is the irreducible polynomial of degree 53 used by the Rabin chunker.

Functions

func Chan

func Chan(s Splitter) (<-chan []byte, <-chan error)

Chan returns a channel that receives each chunk produced by a splitter, along with a second channel that receives errors.
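The channel pattern described for Chan can be sketched without importing the package. Everything below is an assumption made for illustration: the Splitter interface is redeclared locally, lineSplitter is a hypothetical splitter that cuts at newlines, and chanSketch is one plausible way to drain a splitter into channels. In this sketch io.EOF is delivered on the error channel like any other error; whether the real Chan does the same should be checked against its source.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// Splitter mirrors the interface documented on this page.
type Splitter interface {
	Reader() io.Reader
	NextBytes() ([]byte, error)
}

// lineSplitter is a hypothetical splitter that emits one chunk per line.
type lineSplitter struct {
	src io.Reader
	br  *bufio.Reader
}

func newLineSplitter(r io.Reader) *lineSplitter {
	return &lineSplitter{src: r, br: bufio.NewReader(r)}
}

func (s *lineSplitter) Reader() io.Reader { return s.src }

func (s *lineSplitter) NextBytes() ([]byte, error) {
	b, err := s.br.ReadBytes('\n')
	if len(b) > 0 {
		return b, nil // a final line without a newline is still a chunk
	}
	return nil, err
}

// chanSketch drains s in a goroutine, sending chunks on one channel and the
// first error (including io.EOF) on the other, then closes both.
func chanSketch(s Splitter) (<-chan []byte, <-chan error) {
	out := make(chan []byte)
	errs := make(chan error, 1)
	go func() {
		defer close(out)
		defer close(errs)
		for {
			b, err := s.NextBytes()
			if err != nil {
				errs <- err
				return
			}
			out <- b
		}
	}()
	return out, errs
}

// collectLines consumes both channels and treats io.EOF as a normal end.
func collectLines(text string) ([]string, error) {
	chunks, errs := chanSketch(newLineSplitter(strings.NewReader(text)))
	var got []string
	for c := range chunks {
		got = append(got, string(c))
	}
	if err := <-errs; err != nil && err != io.EOF {
		return got, err
	}
	return got, nil
}

func main() {
	lines, err := collectLines("a\nbb\nccc")
	fmt.Printf("%q %v\n", lines, err) // ["a\n" "bb\n" "ccc"] <nil>
}
```

The error channel is buffered with capacity one so the producing goroutine can exit even if the consumer reads the error only after the chunk channel is closed.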

func Register added in v0.38.0

func Register(name string, fn SplitterFunc)

Register makes a custom chunker available to FromString under the given name. The name is matched against the portion of the chunker string before the first dash. For example, passing "mychunker-128" to FromString selects the chunker registered as "mychunker", and the SplitterFunc receives the full string "mychunker-128" so it can parse its own parameters.

Register is typically called from an init function:

func init() {
    chunk.Register("mychunker", func(r io.Reader, s string) (chunk.Splitter, error) {
        // parse parameters from s, return a Splitter
    })
}

Register panics if name is empty or contains a dash, if fn is nil, or if a chunker with the same name is already registered. This follows the convention established by database/sql.Register.

Register is safe for concurrent use.
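The rules above (lookup by the text before the first dash, panics on invalid registration, safety under concurrent use) can be illustrated with a small self-contained registry. This is a sketch under stated assumptions, not the package's implementation: register, fromString, factory, and panics are hypothetical names, and the factory returns a description string instead of a real Splitter to keep the example short.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// factory stands in for SplitterFunc in this sketch.
type factory func(spec string) (string, error)

var (
	mu       sync.RWMutex
	registry = make(map[string]factory)
)

// register applies the documented rules: it panics on an empty name, a name
// containing a dash, a nil factory, or a duplicate name.
func register(name string, fn factory) {
	if name == "" || strings.Contains(name, "-") {
		panic(fmt.Sprintf("register: invalid chunker name %q", name))
	}
	if fn == nil {
		panic("register: nil factory for " + name)
	}
	mu.Lock()
	defer mu.Unlock()
	if _, dup := registry[name]; dup {
		panic("register: called twice for " + name)
	}
	registry[name] = fn
}

// fromString selects the factory by the text before the first dash and
// passes it the full specification string.
func fromString(spec string) (string, error) {
	name, _, _ := strings.Cut(spec, "-")
	mu.RLock()
	fn, ok := registry[name]
	mu.RUnlock()
	if !ok {
		return "", fmt.Errorf("unrecognized chunker %q", name)
	}
	return fn(spec)
}

// panics reports whether f panics; used only to demonstrate the rules.
func panics(f func()) (did bool) {
	defer func() { did = recover() != nil }()
	f()
	return false
}

func main() {
	register("mychunker", func(spec string) (string, error) {
		_, params, _ := strings.Cut(spec, "-")
		return "mychunker(" + params + ")", nil
	})
	out, err := fromString("mychunker-128")
	fmt.Println(out, err)                                       // mychunker(128) <nil>
	fmt.Println(panics(func() { register("my-chunker", nil) })) // true
}
```

Forbidding dashes in names keeps the lookup unambiguous, since the dash is the only separator between the name and its parameters.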

Types

type Buzhash

type Buzhash struct {
	// contains filtered or unexported fields
}

func NewBuzhash

func NewBuzhash(r io.Reader) *Buzhash

func (*Buzhash) NextBytes

func (b *Buzhash) NextBytes() ([]byte, error)

func (*Buzhash) Reader

func (b *Buzhash) Reader() io.Reader

type Rabin

type Rabin struct {
	// contains filtered or unexported fields
}

Rabin implements the Splitter interface and splits content with Rabin fingerprints.

func NewRabin

func NewRabin(r io.Reader, avgBlkSize uint64) *Rabin

NewRabin creates a new Rabin splitter with the given average block size.

func NewRabinMinMax

func NewRabinMinMax(r io.Reader, min, avg, max uint64) *Rabin

NewRabinMinMax returns a new Rabin splitter which uses the given min, average and max block sizes.
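How min, average, and max bounds interact in content-defined chunking can be sketched with a toy rolling hash. The code below is explicitly not a Rabin fingerprint and not this package's algorithm: it uses a simple gear-style shift-and-add hash, chosen only because it fits in a few lines. cutPoints, pseudoRandom, and the bits parameter are hypothetical names. A boundary is declared once at least min bytes have accumulated and the top bits of the hash are zero, and a boundary is forced at max bytes.

```go
package main

import "fmt"

// cutPoints returns the end offset of every chunk in data.
// The sketch assumes 0 < min <= max and 0 < bits < 64. Past the minimum,
// a cut is expected about once every 2^bits bytes.
func cutPoints(data []byte, min, max int, bits uint) []int {
	var cuts []int
	var h uint64
	start := 0
	for i, b := range data {
		// Toy gear-style hash: older bytes shift out after 64 steps.
		h = (h << 1) + uint64(b)*0x9E3779B97F4A7C15
		n := i + 1 - start // chunk length if we cut after byte i
		if n < min {
			continue
		}
		if n >= max || h>>(64-bits) == 0 {
			cuts = append(cuts, i+1)
			start = i + 1
			h = 0
		}
	}
	if start < len(data) {
		cuts = append(cuts, len(data)) // trailing chunk may be shorter than min
	}
	return cuts
}

// pseudoRandom returns n deterministic bytes from a small LCG so the demo
// is repeatable.
func pseudoRandom(n int) []byte {
	out := make([]byte, n)
	x := uint32(1)
	for i := range out {
		x = x*1664525 + 1013904223
		out[i] = byte(x >> 24)
	}
	return out
}

func main() {
	data := pseudoRandom(1 << 16)
	cuts := cutPoints(data, 64, 1024, 8)
	fmt.Println("chunks:", len(cuts))
}
```

Because boundaries depend on content rather than offsets, inserting bytes near the start of the input tends to shift only nearby boundaries, which is the property that makes this family of chunkers useful for deduplication.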

func (*Rabin) NextBytes

func (r *Rabin) NextBytes() ([]byte, error)

NextBytes reads the next chunk from the reader and returns it as a byte slice.

func (*Rabin) Reader

func (r *Rabin) Reader() io.Reader

Reader returns the io.Reader associated with this Splitter.

type Splitter

type Splitter interface {
	Reader() io.Reader
	NextBytes() ([]byte, error)
}

A Splitter reads bytes from a Reader and creates "chunks" (byte slices) that can be used to build DAG nodes.
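A minimal implementation and the usual drain loop might look like the following. This is a sketch, not the package's code: the interface is redeclared locally so the example compiles on its own, fixedSplitter and splitString are hypothetical names, and the sketch assumes that io.EOF from NextBytes marks the end of input.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// Splitter mirrors the interface documented above.
type Splitter interface {
	Reader() io.Reader
	NextBytes() ([]byte, error)
}

// fixedSplitter is a hypothetical splitter that emits size-byte chunks.
type fixedSplitter struct {
	r    io.Reader
	size int
}

func (s *fixedSplitter) Reader() io.Reader { return s.r }

func (s *fixedSplitter) NextBytes() ([]byte, error) {
	buf := make([]byte, s.size)
	n, err := io.ReadFull(s.r, buf)
	if err == io.ErrUnexpectedEOF {
		return buf[:n], nil // short final chunk
	}
	if err != nil {
		return nil, err // io.EOF when nothing is left
	}
	return buf, nil
}

// splitString drains a fixedSplitter over text and returns the chunks.
func splitString(text string, size int) ([]string, error) {
	if size <= 0 {
		return nil, fmt.Errorf("size must be greater than 0, got %d", size)
	}
	var s Splitter = &fixedSplitter{r: strings.NewReader(text), size: size}
	var out []string
	for {
		chunk, err := s.NextBytes()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		out = append(out, string(chunk))
	}
}

func main() {
	chunks, err := splitString("abcdefghij", 4)
	fmt.Println(chunks, err) // [abcd efgh ij] <nil>
}
```

Each call allocates a fresh buffer, so a caller may retain a returned chunk without it being overwritten by the next call.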

func DefaultSplitter

func DefaultSplitter(r io.Reader) Splitter

DefaultSplitter returns a SizeSplitter with the DefaultBlockSize.

func FromString

func FromString(r io.Reader, chunker string) (Splitter, error)

FromString returns a Splitter for the given chunker specification string.

Built-in chunkers:

  • "" or "default" -- fixed-size chunks using DefaultBlockSize
  • "size-{size}" -- fixed-size chunks of the given byte size
  • "rabin" -- Rabin fingerprint chunking with DefaultBlockSize average
  • "rabin-{avg}" -- Rabin fingerprint chunking with the given average size
  • "rabin-{min}-{avg}-{max}" -- Rabin with explicit bounds
  • "buzhash" -- Buzhash content-defined chunking

Custom chunkers registered via Register are also available. The name is extracted as everything before the first dash.
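The built-in specification strings listed above decompose into a name and zero or more dash-separated numeric parameters. The parser below is a hypothetical sketch of that decomposition only: parseSpec, spec, and arities are illustrative names, the parameter counts are taken from the list above, and the real FromString may validate differently (for example against ErrSize, ErrRabinMin, or ErrSizeMax).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// spec is a hypothetical parsed form of a built-in chunker string.
type spec struct {
	Name   string
	Params []int64
}

// arities lists how many numeric parameters each built-in form accepts,
// as read from the list of built-in chunkers above.
var arities = map[string][]int{
	"default": {0},
	"buzhash": {0},
	"size":    {1},
	"rabin":   {0, 1, 3},
}

// parseSpec splits s at dashes and converts the parameters to integers.
// Custom chunkers are out of scope: they parse their own parameters.
func parseSpec(s string) (spec, error) {
	if s == "" {
		s = "default"
	}
	name, rest, hasParams := strings.Cut(s, "-")
	allowed, builtin := arities[name]
	if !builtin {
		return spec{}, fmt.Errorf("%q is not a built-in chunker", name)
	}
	var params []int64
	if hasParams {
		for _, p := range strings.Split(rest, "-") {
			v, err := strconv.ParseInt(p, 10, 64)
			if err != nil {
				return spec{}, fmt.Errorf("invalid parameter %q in %q", p, s)
			}
			params = append(params, v)
		}
	}
	for _, a := range allowed {
		if a == len(params) {
			return spec{Name: name, Params: params}, nil
		}
	}
	return spec{}, fmt.Errorf("%q: unexpected number of parameters (%d)", s, len(params))
}

func main() {
	sp, err := parseSpec("rabin-16-256-1024")
	fmt.Println(sp, err) // {rabin [16 256 1024]} <nil>
}
```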

func NewSizeSplitter

func NewSizeSplitter(r io.Reader, size int64) Splitter

NewSizeSplitter returns a new size-based Splitter with the given block size.

type SplitterFunc added in v0.38.0

type SplitterFunc func(r io.Reader, chunker string) (Splitter, error)

SplitterFunc creates a Splitter from a reader and a specification string such as "mychunker-param1-param2". It is used to register custom chunkers via Register so they become available globally through FromString. The function is responsible for parsing and validating any parameters encoded in the string.

type SplitterGen

type SplitterGen func(r io.Reader) Splitter

SplitterGen creates a Splitter from a reader. It is used at runtime by callers that already know which chunking strategy and parameters they want (e.g. "fixed-size at 256 KiB"). See SizeSplitterGen for a convenient way to build one.

func SizeSplitterGen

func SizeSplitterGen(size int64) SplitterGen

SizeSplitterGen returns a SplitterGen that creates a fixed-size Splitter with the given block size.
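The generator pattern captures the chunk size once in a closure and then applies it to any number of readers. The sketch below redeclares the types locally and is not the package's implementation; sized, sizeGen, and countChunks are hypothetical names, and the splitter assumes a positive size.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// Local copies of the documented types so the sketch compiles on its own.
type Splitter interface {
	Reader() io.Reader
	NextBytes() ([]byte, error)
}

type SplitterGen func(r io.Reader) Splitter

// sized is a hypothetical fixed-size splitter built on io.LimitReader.
type sized struct {
	r    io.Reader
	size int64
}

func (s *sized) Reader() io.Reader { return s.r }

func (s *sized) NextBytes() ([]byte, error) {
	b, err := io.ReadAll(io.LimitReader(s.r, s.size))
	if err != nil {
		return nil, err
	}
	if len(b) == 0 {
		return nil, io.EOF
	}
	return b, nil
}

// sizeGen captures size in a closure and returns a generator that can be
// reused for many readers.
func sizeGen(size int64) SplitterGen {
	return func(r io.Reader) Splitter { return &sized{r: r, size: size} }
}

// countChunks runs gen over text and counts chunks until the first error.
func countChunks(gen SplitterGen, text string) int {
	s := gen(strings.NewReader(text))
	n := 0
	for {
		if _, err := s.NextBytes(); err != nil {
			return n
		}
		n++
	}
}

func main() {
	gen := sizeGen(4)
	fmt.Println(countChunks(gen, "abcdefghij")) // 3
	fmt.Println(countChunks(gen, "abcd"))       // 1
}
```

Passing a generator instead of a size lets code that builds DAGs stay unaware of which chunking strategy the caller selected.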

Directories

Path Synopsis
This file generates bytehash LUT
