chunk

package

v0.38.0 Published: Apr 9, 2026 License: Apache-2.0, MIT Imports: 11 Imported by: 60

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/ipfs/boxo

Documentation ¶

Overview ¶

Package chunk implements streaming block splitters.

Splitters read data from a reader and produce byte slices (chunks). The size and contents of these slices depend on the splitting method.

Built-in methods include fixed-size, Rabin fingerprint, and Buzhash content-defined chunking. Additional methods can be registered with Register and instantiated through FromString.

Index ¶

Constants
Variables
func Chan(s Splitter) (<-chan []byte, <-chan error)
func Register(name string, fn SplitterFunc)
type Buzhash
- func NewBuzhash(r io.Reader) *Buzhash
- func (b *Buzhash) NextBytes() ([]byte, error)
- func (b *Buzhash) Reader() io.Reader
type Rabin
- func NewRabin(r io.Reader, avgBlkSize uint64) *Rabin
- func NewRabinMinMax(r io.Reader, min, avg, max uint64) *Rabin
- func (r *Rabin) NextBytes() ([]byte, error)
- func (r *Rabin) Reader() io.Reader
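type Splitter
- func DefaultSplitter(r io.Reader) Splitter
- func FromString(r io.Reader, chunker string) (Splitter, error)
- func NewSizeSplitter(r io.Reader, size int64) Splitter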
type SplitterFunc
type SplitterGen
- func SizeSplitterGen(size int64) SplitterGen

Constants ¶

const (
	// BlockSizeLimit is the maximum block size defined by the bitswap spec.
	// https://specs.ipfs.tech/bitswap-protocol/#block-sizes
	BlockSizeLimit int = 2 * 1024 * 1024 // 2MiB

	// ChunkOverheadBudget is reserved for protobuf/UnixFS framing overhead
	// when chunks are wrapped in non-raw leaves (--raw-leaves=false).
	ChunkOverheadBudget int = 256

	// ChunkSizeLimit is the maximum chunk size accepted by the chunker.
	// It is set below BlockSizeLimit to leave room for framing overhead
	// so that serialized blocks stay within the 2MiB wire limit.
	//
	// In practice this limit only matters for custom chunker sizes.
	// The CID-deterministic profiles defined in IPIP-499 use max 1MiB
	// chunks, well within this limit.
	ChunkSizeLimit int = BlockSizeLimit - ChunkOverheadBudget
)

Variables ¶

var (
	ErrRabinMin = errors.New("rabin min must be greater than 16")
	ErrSize     = errors.New("chunker size must be greater than 0")
	ErrSizeMax  = fmt.Errorf("chunker parameters may not exceed the maximum chunk size of %d", ChunkSizeLimit)
)
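The constants and errors above combine into a range check for a requested chunk size. The sketch below uses a hypothetical helper, checkChunkSize (not part of the package), and assumes a size is valid when 0 < size <= ChunkSizeLimit, which is what ErrSize and ErrSizeMax describe. It also spells out the arithmetic: 2 * 1024 * 1024 - 256 = 2096896 bytes.

```go
package main

import (
	"errors"
	"fmt"
)

// Copies of the documented constants (untyped here, unlike the package's int).
const (
	BlockSizeLimit      = 2 * 1024 * 1024                      // 2097152 bytes
	ChunkOverheadBudget = 256                                  // framing headroom
	ChunkSizeLimit      = BlockSizeLimit - ChunkOverheadBudget // 2096896 bytes
)

var (
	ErrSize    = errors.New("chunker size must be greater than 0")
	ErrSizeMax = fmt.Errorf("chunker parameters may not exceed the maximum chunk size of %d", ChunkSizeLimit)
)

// checkChunkSize is hypothetical: it applies the two bounds implied by
// ErrSize and ErrSizeMax to a requested chunk size.
func checkChunkSize(size int64) error {
	if size <= 0 {
		return ErrSize
	}
	if size > ChunkSizeLimit {
		return ErrSizeMax
	}
	return nil
}

func main() {
	for _, s := range []int64{0, 256 * 1024, ChunkSizeLimit, BlockSizeLimit} {
		fmt.Println(s, checkChunkSize(s))
	}
}
```

A full 2MiB request is rejected even though it equals BlockSizeLimit, because the 256 bytes of framing must still fit inside the block.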

var DefaultBlockSize int64 = 1024 * 256

DefaultBlockSize is the chunk size that splitters produce (or aim to produce); its initial value is 262144 bytes (256 KiB). It can be modified to change the default for all subsequent chunker operations. For CID-deterministic imports, prefer the UnixFSProfile presets in ipld/unixfs/io/profile.go, which set this and other related globals.

var IpfsRabinPoly = chunker.Pol(17437180132763653)

IpfsRabinPoly is the irreducible polynomial of degree 53 used for Rabin fingerprinting.
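Because the polynomial is stored as a bit mask (bit i holds the coefficient of x^i), its degree is the index of the highest set bit. The sketch below checks the documented value with math/bits. It does not test irreducibility, which would need a proper GF(2) factorization test; it only checks the constant term, a necessary but not sufficient condition.

```go
package main

import (
	"fmt"
	"math/bits"
)

// ipfsRabinPoly copies the documented value of IpfsRabinPoly.
const ipfsRabinPoly uint64 = 17437180132763653

// degree returns the degree of a GF(2) polynomial encoded as a bit mask.
func degree(p uint64) int {
	return bits.Len64(p) - 1
}

func main() {
	fmt.Println("degree:", degree(ipfsRabinPoly)) // degree: 53
	// A constant term of 1 means x does not divide the polynomial.
	fmt.Println("constant term:", ipfsRabinPoly&1) // constant term: 1
}
```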

Functions ¶

func Chan ¶

func Chan(s Splitter) (<-chan []byte, <-chan error)

Chan returns a channel that receives each chunk produced by the splitter, along with a second channel that receives errors.

func Register ¶ added in v0.38.0

func Register(name string, fn SplitterFunc)

Register makes a custom chunker available to FromString under the given name. The name is matched against the portion of the chunker string before the first dash. For example, passing "mychunker-128" to FromString selects the chunker registered as "mychunker", and the SplitterFunc receives the full string "mychunker-128" so it can parse its own parameters.

Register is typically called from an init function:

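func init() {
    chunk.Register("mychunker", func(r io.Reader, s string) (chunk.Splitter, error) {
        // Parse and validate the parameters in s (e.g. "mychunker-128"), then build
        // the Splitter. A fixed-size splitter is used here only as a placeholder.
        size, err := strconv.ParseInt(strings.TrimPrefix(s, "mychunker-"), 10, 64)
        if err != nil || size <= 0 {
            return nil, chunk.ErrSize
        }
        return chunk.NewSizeSplitter(r, size), nil
    })
}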

Register panics if name is empty, contains a dash, fn is nil, or a chunker with the same name is already registered. This follows the convention established by database/sql.Register.

Register is safe for concurrent use.
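The rules above can be modeled with a mutex-guarded map. In this sketch, register, lookup, and splitterFunc are hypothetical stand-ins for Register, FromString, and SplitterFunc. They return any instead of a Splitter to stay self-contained, and built-in chunker names are not modeled.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
	"sync"
)

// splitterFunc stands in for SplitterFunc (hypothetical; returns any).
type splitterFunc func(r io.Reader, spec string) (any, error)

var (
	mu       sync.RWMutex
	registry = map[string]splitterFunc{}
)

// register panics on an empty name, a name containing a dash,
// a nil function, or a duplicate name, as documented for Register.
func register(name string, fn splitterFunc) {
	if name == "" || strings.Contains(name, "-") {
		panic(fmt.Sprintf("register: invalid chunker name %q", name))
	}
	if fn == nil {
		panic("register: nil SplitterFunc for " + name)
	}
	mu.Lock()
	defer mu.Unlock()
	if _, dup := registry[name]; dup {
		panic("register: duplicate chunker " + name)
	}
	registry[name] = fn
}

// lookup selects a function by the text before the first dash and
// passes it the full spec string.
func lookup(r io.Reader, spec string) (any, error) {
	name, _, _ := strings.Cut(spec, "-")
	mu.RLock()
	fn, ok := registry[name]
	mu.RUnlock()
	if !ok {
		return nil, errors.New("unknown chunker: " + name)
	}
	return fn(r, spec)
}

// echo is a trivial splitterFunc that returns the spec it received.
func echo(_ io.Reader, spec string) (any, error) { return spec, nil }

// panics reports whether f panics.
func panics(f func()) (p bool) {
	defer func() { p = recover() != nil }()
	f()
	return false
}

func main() {
	register("mychunker", echo)
	v, err := lookup(strings.NewReader("data"), "mychunker-128")
	fmt.Println(v, err) // mychunker-128 <nil>
	fmt.Println(panics(func() { register("my-chunker", echo) })) // true
}
```

A read lock suffices in lookup because lookups only read the map; registration takes the write lock.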

Types ¶

type Buzhash ¶

type Buzhash struct {
	// contains filtered or unexported fields
}

func NewBuzhash ¶

func NewBuzhash(r io.Reader) *Buzhash

func (*Buzhash) NextBytes ¶

func (b *Buzhash) NextBytes() ([]byte, error)

func (*Buzhash) Reader ¶

func (b *Buzhash) Reader() io.Reader
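Buzhash chunking is content-defined: a rolling hash over a small window of recent bytes decides where chunks end, so an insertion early in a file only moves nearby boundaries. The sketch below assumes a 32-byte window, a seeded pseudo-random byte table, and a cut rule of "low hash bits all zero, within min/max bounds". These are illustrative choices; boxo's table (produced by the gen directory) and parameters may differ, so boundaries will not match its output.

```go
package main

import (
	"fmt"
	"math/bits"
	"math/rand"
)

const window = 32 // bytes covered by the rolling hash (assumed)

// table maps each byte value to a pseudo-random 32-bit word (arbitrary seed).
var table = func() (t [256]uint32) {
	rng := rand.New(rand.NewSource(42))
	for i := range t {
		t[i] = rng.Uint32()
	}
	return
}()

// cut returns the length of the next chunk: the first position at or after
// minSize where the masked hash is zero, capped at maxSize.
func cut(data []byte, minSize, maxSize int, mask uint32) int {
	if len(data) <= minSize {
		return len(data)
	}
	limit := maxSize
	if len(data) < limit {
		limit = len(data)
	}
	// Hash the window of bytes [minSize-window, minSize).
	var h uint32
	for _, b := range data[minSize-window : minSize] {
		h = bits.RotateLeft32(h, 1) ^ table[b]
	}
	for i := minSize; i < limit; i++ {
		if h&mask == 0 {
			return i
		}
		// Roll: the oldest byte has now been rotated window times; drop it, add data[i].
		h = bits.RotateLeft32(h, 1) ^ bits.RotateLeft32(table[data[i-window]], window) ^ table[data[i]]
	}
	return limit
}

// split cuts all of data into consecutive chunks.
func split(data []byte, minSize, maxSize int, mask uint32) [][]byte {
	if minSize < window || maxSize < minSize {
		panic("need window <= minSize <= maxSize")
	}
	var chunks [][]byte
	for len(data) > 0 {
		n := cut(data, minSize, maxSize, mask)
		chunks = append(chunks, data[:n])
		data = data[n:]
	}
	return chunks
}

// sample returns n pseudo-random bytes from a seeded generator.
func sample(n int, seed int64) []byte {
	b := make([]byte, n)
	rand.New(rand.NewSource(seed)).Read(b)
	return b
}

func main() {
	const minSize, maxSize, mask = 64, 1024, 1<<8 - 1
	data := sample(1<<16, 7)
	before := split(data, minSize, maxSize, mask)
	// Prepend one byte; content-defined boundaries typically line up again soon.
	after := split(append([]byte{0xff}, data...), minSize, maxSize, mask)
	seen := make(map[string]bool)
	for _, c := range before {
		seen[string(c)] = true
	}
	shared := 0
	for _, c := range after {
		if seen[string(c)] {
			shared++
		}
	}
	fmt.Printf("%d chunks before, %d after, %d shared\n", len(before), len(after), shared)
}
```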

type Rabin ¶

type Rabin struct {
	// contains filtered or unexported fields
}

Rabin implements the Splitter interface and splits content with Rabin fingerprints.

func NewRabin ¶

func NewRabin(r io.Reader, avgBlkSize uint64) *Rabin

NewRabin creates a new Rabin splitter with the given average block size.

func NewRabinMinMax ¶

func NewRabinMinMax(r io.Reader, min, avg, max uint64) *Rabin

NewRabinMinMax returns a new Rabin splitter that uses the given minimum, average, and maximum block sizes.

func (*Rabin) NextBytes ¶

func (r *Rabin) NextBytes() ([]byte, error)

NextBytes reads from the underlying reader and returns the next chunk.

func (*Rabin) Reader ¶

func (r *Rabin) Reader() io.Reader

Reader returns the io.Reader associated with this Splitter.
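Rabin chunking treats the bytes in a sliding window as a polynomial over GF(2) and reduces it modulo IpfsRabinPoly; a boundary goes where the remainder matches a pattern, subject to the minimum and maximum sizes. The sketch below uses the documented polynomial but assumes everything else: a 16-byte window, a cut when the low log2(avg) bits are zero (avg must be a power of two), and a fingerprint reset at each chunk start. It is not boxo's implementation and will not reproduce its boundaries.

```go
package main

import (
	"fmt"
	"math/rand"
)

const (
	pol    uint64 = 17437180132763653 // IpfsRabinPoly (degree 53)
	polDeg        = 53
	window        = 16 // window size assumed for this sketch
)

// appendByte returns (fp * x^8 + b) mod pol over GF(2), one bit at a time.
func appendByte(fp uint64, b byte) uint64 {
	for i := 7; i >= 0; i-- {
		fp = fp<<1 | uint64(b>>i&1)
		if fp>>polDeg&1 == 1 {
			fp ^= pol // XOR (GF(2) subtraction) clears bit 53
		}
	}
	return fp
}

// fingerprint computes the Rabin fingerprint of b directly.
func fingerprint(b []byte) uint64 {
	var fp uint64
	for _, c := range b {
		fp = appendByte(fp, c)
	}
	return fp
}

// outTable[b] is byte b's contribution at the oldest window position:
// the fingerprint of b followed by window-1 zero bytes.
var outTable = func() (t [256]uint64) {
	zeros := make([]byte, window-1)
	for b := range t {
		t[b] = fingerprint(append([]byte{byte(b)}, zeros...))
	}
	return
}()

// split returns chunks of minSize..maxSize bytes (the last may be shorter).
func split(data []byte, minSize, avg, maxSize int) [][]byte {
	mask := uint64(avg - 1)
	var chunks [][]byte
	for len(data) > 0 {
		n := len(data)
		if n > maxSize {
			n = maxSize
		}
		var fp uint64
		for i := 0; i < n; i++ {
			if i >= window {
				fp ^= outTable[data[i-window]] // slide: remove the oldest byte
			}
			fp = appendByte(fp, data[i])
			if i+1 >= minSize && fp&mask == 0 {
				n = i + 1
				break
			}
		}
		chunks = append(chunks, data[:n])
		data = data[n:]
	}
	return chunks
}

// sample returns n pseudo-random bytes from a seeded generator.
func sample(n int, seed int64) []byte {
	b := make([]byte, n)
	rand.New(rand.NewSource(seed)).Read(b)
	return b
}

func main() {
	chunks := split(sample(1<<15, 3), 256, 1024, 4096)
	fmt.Println(len(chunks), "chunks; first is", len(chunks[0]), "bytes")
}
```

The rolling update works because the fingerprint is linear over GF(2): XOR-ing out the oldest byte's precomputed term and appending the new byte gives the same result as recomputing the window from scratch.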

type Splitter ¶

type Splitter interface {
	Reader() io.Reader
	NextBytes() ([]byte, error)
}

A Splitter reads bytes from a Reader and creates "chunks" (byte slices) that can be used to build DAG nodes.

func DefaultSplitter ¶

func DefaultSplitter(r io.Reader) Splitter

DefaultSplitter returns a size-based Splitter (as created by NewSizeSplitter) that uses DefaultBlockSize.

func FromString ¶

func FromString(r io.Reader, chunker string) (Splitter, error)

FromString returns a Splitter for the given chunker specification string.

Built-in chunkers:

"" or "default" -- fixed-size chunks using DefaultBlockSize
"size-{size}" -- fixed-size chunks of the given byte size
"rabin" -- Rabin fingerprint chunking with DefaultBlockSize average
"rabin-{avg}" -- Rabin fingerprint chunking with the given average size
"rabin-{min}-{avg}-{max}" -- Rabin with explicit bounds
"buzhash" -- Buzhash content-defined chunking

Custom chunkers registered via Register are also available. The name is extracted as everything before the first dash.
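The built-in grammar above can be parsed by splitting on dashes and dispatching on the name and the number of parameters. parseSpec and spec are hypothetical. The sketch only recognizes the forms listed here, leaves Min and Max at zero when the string does not set them, and omits the range checks implied by ErrSize, ErrSizeMax, and ErrRabinMin.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// spec is a hypothetical parsed form of a chunker string.
type spec struct {
	Kind          string // "size", "rabin" or "buzhash"
	Min, Avg, Max uint64 // bytes; zero means "not set by the string"
}

const defaultBlockSize = 256 * 1024 // DefaultBlockSize's initial value

func parseSpec(s string) (spec, error) {
	parts := strings.Split(s, "-")
	nums := make([]uint64, 0, len(parts)-1)
	for _, p := range parts[1:] {
		n, err := strconv.ParseUint(p, 10, 64)
		if err != nil {
			return spec{}, fmt.Errorf("bad number %q in %q: %w", p, s, err)
		}
		nums = append(nums, n)
	}
	switch name := parts[0]; {
	case (name == "" || name == "default") && len(nums) == 0:
		return spec{Kind: "size", Avg: defaultBlockSize}, nil
	case name == "size" && len(nums) == 1:
		return spec{Kind: "size", Avg: nums[0]}, nil
	case name == "rabin" && len(nums) == 0:
		return spec{Kind: "rabin", Avg: defaultBlockSize}, nil
	case name == "rabin" && len(nums) == 1:
		return spec{Kind: "rabin", Avg: nums[0]}, nil
	case name == "rabin" && len(nums) == 3:
		return spec{Kind: "rabin", Min: nums[0], Avg: nums[1], Max: nums[2]}, nil
	case name == "buzhash" && len(nums) == 0:
		return spec{Kind: "buzhash"}, nil
	}
	return spec{}, errors.New("unrecognized chunker string " + strconv.Quote(s))
}

func main() {
	for _, s := range []string{"", "size-1024", "rabin", "rabin-128-256-512", "buzhash", "rabin-1-2"} {
		sp, err := parseSpec(s)
		fmt.Printf("%q => %+v %v\n", s, sp, err)
	}
}
```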

func NewSizeSplitter ¶

func NewSizeSplitter(r io.Reader, size int64) Splitter

NewSizeSplitter returns a new size-based Splitter with the given block size.

type SplitterFunc ¶ added in v0.38.0

type SplitterFunc func(r io.Reader, chunker string) (Splitter, error)

SplitterFunc creates a Splitter from a reader and a specification string such as "mychunker-param1-param2". It is used to register custom chunkers via Register so they become available globally through FromString. The function is responsible for parsing and validating any parameters encoded in the string.

type SplitterGen ¶

type SplitterGen func(r io.Reader) Splitter

SplitterGen creates a Splitter from a reader. It is used at runtime by callers that already know which chunking strategy and parameters they want (e.g. "fixed-size at 256 KiB"). See SizeSplitterGen for a convenient way to build one.

func SizeSplitterGen ¶

func SizeSplitterGen(size int64) SplitterGen

SizeSplitterGen returns a SplitterGen that creates a fixed-size Splitter with the given block size.

Directories ¶

Path	Synopsis
gen	This file generates bytehash LUT
