Documentation
¶
Overview ¶
Package fastcdc is a Go implementation of the FastCDC content defined chunking algorithm. See https://www.usenix.org/system/files/conference/atc16/atc16-paper-xia.pdf for details.
Example (Basic) ¶
data := make([]byte, 10*1024*1024)
rand.Seed(4542)
rand.Read(data)
rd := bytes.NewReader(data)
chunker, err := fastcdc.NewChunker(rd, fastcdc.Options{
AverageSize: 1024 * 1024, // target 1 MiB average chunk size
})
if err != nil {
log.Fatal(err)
}
fmt.Printf("%-32s %s\n", "CHECKSUM", "CHUNK SIZE")
for {
chunk, err := chunker.Next()
if err == io.EOF {
break
}
if err != nil {
log.Fatal(err)
}
fmt.Printf("%x %d\n", md5.Sum(chunk.Data), chunk.Length)
}
Output: CHECKSUM CHUNK SIZE d5bb40f862d68f4c3a2682e6d433f0d7 1788060 113a0aa2023d7dce6a2aac1f807b5bd2 1117240 5b9147b10d4fe6f96282da481ce848ca 1180487 dcc4644befb599fa644635b0c5a1ea2c 1655501 224db3de422ad0dd2c840e3e24e0cb03 363172 e071658eccda587789f1dabb6f773851 1227750 215868103f0b4ea7f715e179e5b9a6c7 1451026 21e65e40970ec22f5b13ddf60493b746 1150129 b8209a1dbef955ef64636af796450252 552395
Index ¶
Examples ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Chunk ¶
type Chunk struct {
// Offset is the number of bytes from the start of the reader to the beginning of
// the chunk.
Offset int
// Length is the length of the chunk in bytes. Same as len(Data).
Length int
// Data is the chunk data.
Data []byte
// Fingerprint is the value of the rolling hash algorithm for the chunk data.
Fingerprint uint64
}
Chunk stores a content-defined chunk returned by a Chunker.
type Chunker ¶
type Chunker struct {
// contains filtered or unexported fields
}
Chunker implements the FastCDC content defined chunking algorithm. See https://www.usenix.org/system/files/conference/atc16/atc16-paper-xia.pdf.
func NewChunker ¶
NewChunker returns a Chunker with the given Options.
type Options ¶
type Options struct {
// NormalSize is the target chunk size. Typically a power of 2. It must be in the
// range 64B to 1GiB.
AverageSize int
// (Optional) MinSize is the minimum allowed chunk size. By default, it's set to
// AverageSize / 4.
MinSize int
// (Optional) MaxSize is the maximum allowed chunk size. By default, it's set to
// AverageSize * 4.
MaxSize int
// (Optional) Sets the chunk normalization level. It may be set to 1, 2 or 3,
// unless DisableNormalization is set, in which case it's ignored. By default,
// it's set to 2.
Normalization int
// (Optional) DisableNormalization turns normalization off. By default, it's set to
// false.
DisableNormalization bool
// (Optional) Seed alters the lookup table of the rolling hash algorithm to mitigate
// chunk-size based fingerprinting attacks. It may be set to a random uint64.
Seed uint64
// (Optional) BufSize is the size of the internal buffer used while chunking. It has
// no effect on the chuking output, but performance is improved with larger buffers.
// It must be at least MaxSize. Recommended values are 1 to 3 times MaxSize. By
// default it is set to MaxSize * 2.
BufSize int
}
Options configures the options for the Chunker.
Click to show internal directories.
Click to hide internal directories.
