Documentation ¶
Overview ¶
Package chunk implements streaming block splitters.
Splitters read data from a reader and produce byte slices (chunks). The size and contents of these slices depend on the splitting method.
Built-in methods include fixed-size, Rabin fingerprint, and Buzhash content-defined chunking. Additional methods can be registered with Register and instantiated through FromString.
Index ¶
Constants ¶
const (
	// BlockSizeLimit is the maximum block size defined by the bitswap spec.
	// https://specs.ipfs.tech/bitswap-protocol/#block-sizes
	BlockSizeLimit int = 2 * 1024 * 1024 // 2MiB

	// ChunkOverheadBudget is reserved for protobuf/UnixFS framing overhead
	// when chunks are wrapped in non-raw leaves (--raw-leaves=false).
	ChunkOverheadBudget int = 256

	// ChunkSizeLimit is the maximum chunk size accepted by the chunker.
	// It is set below BlockSizeLimit to leave room for framing overhead
	// so that serialized blocks stay within the 2MiB wire limit.
	//
	// In practice this limit only matters for custom chunker sizes.
	// The CID-deterministic profiles defined in IPIP-499 use max 1MiB
	// chunks, well within this limit.
	ChunkSizeLimit int = BlockSizeLimit - ChunkOverheadBudget
)
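The relationship between the three constants is plain arithmetic, which the standalone sketch below works through. The values are copied from the declaration above. `fitsChunkLimit` is a hypothetical helper written for this example, not a function of the package.

```go
package main

import "fmt"

// Values copied from the constant block above.
const (
	blockSizeLimit      = 2 * 1024 * 1024 // 2 MiB = 2097152 bytes
	chunkOverheadBudget = 256
	chunkSizeLimit      = blockSizeLimit - chunkOverheadBudget
)

// fitsChunkLimit is a hypothetical helper: it reports whether a requested
// chunk size is positive and still leaves room for the framing overhead.
func fitsChunkLimit(size int) bool {
	return size > 0 && size <= chunkSizeLimit
}

func main() {
	fmt.Println(chunkSizeLimit)                 // 2096896
	fmt.Println(fitsChunkLimit(1024 * 1024))    // true: a 1 MiB chunk fits
	fmt.Println(fitsChunkLimit(blockSizeLimit)) // false: no room for framing
}
```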
Variables ¶
var (
	ErrRabinMin = errors.New("rabin min must be greater than 16")
	ErrSize     = errors.New("chunker size must be greater than 0")
	ErrSizeMax  = fmt.Errorf("chunker parameters may not exceed the maximum chunk size of %d", ChunkSizeLimit)
)
var DefaultBlockSize int64 = 1024 * 256
DefaultBlockSize is the chunk size that splitters produce (or aim to produce). It can be modified to change the default for all subsequent chunker operations. For CID-deterministic imports, prefer the UnixFSProfile presets from ipld/unixfs/io/profile.go, which set this and other related globals.
var IpfsRabinPoly = chunker.Pol(17437180132763653)
IpfsRabinPoly is the irreducible polynomial of degree 53 used for Rabin fingerprinting.
Functions ¶
func Chan ¶
Chan returns a channel that receives each of the chunks produced by a splitter, along with another one for errors.
func Register ¶ added in v0.38.0
func Register(name string, fn SplitterFunc)
Register makes a custom chunker available to FromString under the given name. The name is matched against the portion of the chunker string before the first dash. For example, passing "mychunker-128" to FromString selects the chunker registered as "mychunker", and the SplitterFunc receives the full string "mychunker-128" so it can parse its own parameters.
Register is typically called from an init function:
func init() {
	chunk.Register("mychunker", func(r io.Reader, s string) (chunk.Splitter, error) {
		// parse parameters from s, return a Splitter
	})
}
Register panics if name is empty or contains a dash, if fn is nil, or if a chunker with the same name is already registered. This follows the convention established by database/sql.Register.
Register is safe for concurrent use.
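The documented rules (lookup by the text before the first dash, panics on invalid registrations, safety under concurrent use) could be implemented along the following lines. This is a standalone sketch and not the package's code: `factory`, `register`, `lookup`, and `panics` are hypothetical names, and `factory` drops the reader argument to keep the example short.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// factory is a hypothetical stand-in for SplitterFunc. It only echoes the
// specification string so the example stays self-contained.
type factory func(spec string) string

var (
	mu       sync.RWMutex
	registry = map[string]factory{}
)

// register mirrors the documented rules: it panics on an empty name, a name
// containing a dash, a nil function, or a duplicate registration.
func register(name string, fn factory) {
	switch {
	case name == "":
		panic("chunker: empty name")
	case strings.Contains(name, "-"):
		panic("chunker: name contains a dash: " + name)
	case fn == nil:
		panic("chunker: nil factory for " + name)
	}
	mu.Lock()
	defer mu.Unlock() // also runs if the duplicate check panics
	if _, dup := registry[name]; dup {
		panic("chunker: duplicate registration of " + name)
	}
	registry[name] = fn
}

// lookup selects a factory by the text before the first dash and passes it
// the full specification string.
func lookup(spec string) (string, bool) {
	name, _, _ := strings.Cut(spec, "-")
	mu.RLock()
	fn, ok := registry[name]
	mu.RUnlock()
	if !ok {
		return "", false
	}
	return fn(spec), true
}

// panics reports whether calling f panics.
func panics(f func()) (p bool) {
	defer func() { p = recover() != nil }()
	f()
	return false
}

func main() {
	register("mychunker", func(spec string) string { return "got " + spec })
	fmt.Println(lookup("mychunker-128"))                        // got mychunker-128 true
	fmt.Println(panics(func() { register("my-chunker", nil) })) // true
}
```

Rejecting dashes in names keeps the lookup unambiguous, because the first dash always separates the name from the parameters.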
Types ¶
type Rabin ¶
type Rabin struct {
// contains filtered or unexported fields
}
Rabin implements the Splitter interface and splits content with Rabin fingerprints.
func NewRabinMinMax ¶
NewRabinMinMax returns a new Rabin splitter which uses the given min, average and max block sizes.
type Splitter ¶
A Splitter reads bytes from a Reader and creates "chunks" (byte slices) that can be used to build DAG nodes.
func DefaultSplitter ¶
DefaultSplitter returns a SizeSplitter with the DefaultBlockSize.
func FromString ¶
FromString returns a Splitter for the given chunker specification string.
Built-in chunkers:
- "" or "default" -- fixed-size chunks using DefaultBlockSize
- "size-{size}" -- fixed-size chunks of the given byte size
- "rabin" -- Rabin fingerprint chunking with DefaultBlockSize average
- "rabin-{avg}" -- Rabin fingerprint chunking with the given average size
- "rabin-{min}-{avg}-{max}" -- Rabin with explicit bounds
- "buzhash" -- Buzhash content-defined chunking
Custom chunkers registered via Register are also available. The name is extracted as everything before the first dash.
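The specification strings listed above share one shape: a name, then dash-separated parameters. The standalone sketch below shows one way to take such a string apart. `parseSpec` is a hypothetical helper, not part of the package, and it assumes every parameter is an integer, as in the built-in forms; a custom chunker is free to use other parameter formats.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSpec is a hypothetical helper. It splits a chunker specification
// into its name (the text before the first dash) and its integer
// parameters. The empty string is treated as "default".
func parseSpec(spec string) (name string, params []int, err error) {
	if spec == "" {
		return "default", nil, nil
	}
	name, rest, found := strings.Cut(spec, "-")
	if !found {
		return name, nil, nil // bare name such as "rabin" or "buzhash"
	}
	for _, field := range strings.Split(rest, "-") {
		n, convErr := strconv.Atoi(field)
		if convErr != nil {
			return name, nil, fmt.Errorf("parameter %q: %w", field, convErr)
		}
		params = append(params, n)
	}
	return name, params, nil
}

func main() {
	fmt.Println(parseSpec("rabin-16-256-1024")) // rabin [16 256 1024] <nil>
	fmt.Println(parseSpec("size-262144"))       // size [262144] <nil>
	fmt.Println(parseSpec(""))                  // default [] <nil>
}
```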
type SplitterFunc ¶ added in v0.38.0
SplitterFunc creates a Splitter from a reader and a specification string such as "mychunker-param1-param2". It is used to register custom chunkers via Register so they become available globally through FromString. The function is responsible for parsing and validating any parameters encoded in the string.
type SplitterGen ¶
SplitterGen creates a Splitter from a reader. It is used at runtime by callers that already know which chunking strategy and parameters they want (e.g. "fixed-size at 256 KiB"). See SizeSplitterGen for a convenient way to build one.
func SizeSplitterGen ¶
func SizeSplitterGen(size int64) SplitterGen
SizeSplitterGen returns a SplitterGen that creates a fixed-size Splitter with the given block size.
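The generator idea is a closure: the parameters are fixed once, and the resulting function can then be applied to any number of readers. The standalone sketch below illustrates this with hypothetical stand-ins (`nextFunc` for a Splitter, `gen` for SplitterGen, `sizeGen` for SizeSplitterGen); it is not the package's implementation.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// nextFunc is a hypothetical stand-in for a Splitter: each call returns the
// next chunk, and io.EOF once the input is exhausted.
type nextFunc func() ([]byte, error)

// gen is a hypothetical stand-in for SplitterGen: it binds a reader to an
// already configured chunking strategy.
type gen func(r io.Reader) nextFunc

// sizeGen captures size once and returns a gen that builds fixed-size
// splitters. size must be positive.
func sizeGen(size int) gen {
	return func(r io.Reader) nextFunc {
		return func() ([]byte, error) {
			buf := make([]byte, size)
			n, err := io.ReadFull(r, buf)
			if err == io.ErrUnexpectedEOF {
				return buf[:n], nil // short final chunk
			}
			if err != nil {
				return nil, err
			}
			return buf, nil
		}
	}
}

// chunkSizes applies g to one input and returns the length of each chunk.
// It stops at the first error, including io.EOF.
func chunkSizes(g gen, input string) []int {
	next := g(strings.NewReader(input))
	var sizes []int
	for {
		b, err := next()
		if err != nil {
			return sizes
		}
		sizes = append(sizes, len(b))
	}
}

func main() {
	g := sizeGen(4) // parameters are decided here, once

	fmt.Println(chunkSizes(g, "0123456789")) // [4 4 2]
	fmt.Println(chunkSizes(g, "abc"))        // [3]
}
```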