bbloom

package module
v0.1.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Apr 8, 2026 License: MIT Imports: 7 Imported by: 30

README

bbloom

GoDoc codecov

A fast bloom filter with a real bitset, JSON serialization, and thread-safe variants.

Why this fork

Forked from AndreasBriese/bbloom in 2019 after the upstream became unmaintained. The fork fixes safety and correctness issues, and adds features needed for production use:

  • Caller-provided SipHash keys (NewWithKeys) to prevent hash-flooding with untrusted input
  • Fixed double-hash step to always be odd, avoiding degenerate probe sequences
  • SipHash keys preserved across JSON serialization round-trips
  • Proper error handling in deserialization

The library may contain IPFS-specific optimizations but works as a general-purpose bloom filter.

Usage

// create a bloom filter for 65536 items and 0.1% false-positive rate
bf, _ := bbloom.New(float64(1<<16), float64(0.001))

// or specify size and hash locations explicitly
// bf, _ = bbloom.New(650000.0, 7.0)

// add an item
bf.Add([]byte("butter"))

// check membership
bf.Has([]byte("butter"))    // true
bf.Has([]byte("Butter"))    // false

// add only if not already present
bf.AddIfNotHas([]byte("butter"))  // false (already in set)
bf.AddIfNotHas([]byte("buTTer"))  // true  (new entry)

// thread-safe variants: AddTS, HasTS, AddIfNotHasTS
bf.AddTS([]byte("peanutbutter"))
bf.HasTS([]byte("peanutbutter"))  // true

// JSON serialization
data := bf.JSONMarshal()
restored, _ := bbloom.JSONUnmarshal(data)
restored.Has([]byte("butter"))    // true

Used in IPFS

Kubo and Boxo use this library where CID deduplication or tracking is needed but the number of CIDs is too large to keep in memory as a map. Two main use cases:

  • Blockstore bloom cache: answers Has() checks without hitting the datastore, filtering out the majority of negative lookups.
  • DAG walker dedup: tracks visited CIDs during DAG traversal in the provider/reprovide system, keeping memory usage proportional to the bloom filter size rather than the number of blocks walked.

Benchmarks

See BENCHMARKS.md for comparison against other bloom filter libraries.

License

MIT (bbloom) and CC0 (inlined SipHash). See LICENSE.

Documentation

Overview

Package bbloom implements a fast bloom filter with a real bitset.

A Bloom filter is a space-efficient probabilistic data structure used to test set membership. It may return false positives but never false negatives: Bloom.Has returning true means the entry was probably added, while false means the entry was definitely not added.

This implementation uses an inlined SipHash-2-4 for hashing, rounds the bitset up to the next power of two for fast masking, and provides both non-thread-safe and mutex-protected (TS-suffixed) variants of all operations.

By default (New) the filter uses publicly known SipHash keys. When the filter will hold data controlled by untrusted parties, use NewWithKeys with random secret keys to prevent hash-flooding attacks.

Filters can be serialized to JSON with Bloom.JSONMarshal and restored with JSONUnmarshal.

Index

Examples

Constants

This section is empty.

Variables

View Source
var (
	// ErrUsage is returned by [New] when it is not called with exactly two arguments.
	ErrUsage = errors.New("usage: New(float64(number_of_entries), float64(number_of_hashlocations)) i.e. New(float64(1000), float64(3)) or New(float64(number_of_entries), float64(ratio_of_false_positives)) i.e. New(float64(1000), float64(0.03))")

	// ErrInvalidParms is returned by [New] when a parameter is negative.
	ErrInvalidParms = errors.New("one of the parameters was outside of allowed range")
)

Functions

This section is empty.

Types

type Bloom

type Bloom struct {
	// Mtx is exposed so callers can hold the lock across multiple
	// operations (e.g. a Has followed by Add) without racing.
	Mtx sync.RWMutex
	// contains filtered or unexported fields
}

Bloom is a bloom filter backed by a power-of-two sized bitset. The Mtx field is exposed so callers can hold the lock across multiple operations when needed.

func JSONUnmarshal

func JSONUnmarshal(dbData []byte) (*Bloom, error)

JSONUnmarshal restores a bloom filter from a JSON byte slice produced by Bloom.JSONMarshal or Bloom.JSONMarshalTS.

Example
// Create and populate a filter.
bf, err := New(float64(10000), 0.001)
if err != nil {
	panic(err)
}
bf.Add([]byte("persisted"))

// Serialize to JSON.
data := bf.JSONMarshal()

// Restore from JSON.
bf2, err := JSONUnmarshal(data)
if err != nil {
	panic(err)
}

fmt.Println(bf2.Has([]byte("persisted")))
fmt.Println(bf2.Has([]byte("missing")))
Output:
true
false

func New

func New(params ...float64) (bloomfilter *Bloom, err error)

New creates a bloom filter with default SipHash keys. It accepts exactly two float64 arguments:

  • If the second parameter is < 1 it is treated as a false-positive rate, and the filter is sized automatically. Example: New(float64(1<<16), 0.01) -- 65536 items, 1% FP rate.

  • If the second parameter is >= 1 it is treated as the number of hash locations, and the first parameter is the bitset size. Example: New(650000.0, 7.0) -- 650000-bit filter, 7 hash locations.

The default SipHash keys are publicly known constants. If the filter will hold data controlled by untrusted parties, use NewWithKeys instead to prevent hash-flooding attacks.

Example
// Create a filter sized for 10000 entries with a 0.1% false positive rate.
bf, err := New(float64(10000), 0.001)
if err != nil {
	panic(err)
}

bf.Add([]byte("hello"))
bf.Add([]byte("world"))

fmt.Println(bf.Has([]byte("hello")))
fmt.Println(bf.Has([]byte("world")))
fmt.Println(bf.Has([]byte("missing")))
Output:
true
true
false

func NewWithBoolset

func NewWithBoolset(bs []byte, locs uint64) (bloomfilter *Bloom)

NewWithBoolset creates a bloom filter from a pre-existing bitset, typically obtained from a previous Bloom.JSONMarshal export or an external source. bs is the serialized bitset (big-endian uint64 words) and locs is the number of hash locations per entry. The filter uses default SipHash keys; use NewWithBoolsetAndKeys to restore a filter that was created with custom keys.

func NewWithBoolsetAndKeys

func NewWithBoolsetAndKeys(bs []byte, locs, k0, k1 uint64) (bloomfilter *Bloom)

NewWithBoolsetAndKeys creates a bloom filter from a pre-existing bitset with caller-provided SipHash keys. This is the constructor to use when restoring a filter that was originally created with NewWithKeys. See NewWithKeys for why custom keys matter and NewWithBoolset for how the bitset parameter is interpreted.

func NewWithKeys

func NewWithKeys(k0, k1 uint64, params ...float64) (*Bloom, error)

NewWithKeys creates a bloom filter with caller-provided SipHash keys.

The default keys used by New are publicly known constants baked into the source code. An attacker who knows the keys can craft inputs that all hash to the same bit positions, filling the filter faster than normal and raising the false-positive rate. This is a concern when the filter holds data chosen by untrusted parties (e.g. content-addressed blocks fetched from the network).

Providing random, secret keys (e.g. generated once per node from crypto/rand) restores SipHash's anti-collision guarantees and makes such attacks infeasible.

The params are interpreted the same way as in New. Custom keys are preserved across Bloom.JSONMarshal / JSONUnmarshal round-trips. Note: custom keys are included in plaintext in the Bloom.JSONMarshal output, so treat serialized filters accordingly.

Example
// Use random secret keys to prevent hash-flooding attacks when the
// filter holds data from untrusted sources.
var k0, k1 uint64 = 0x0123456789abcdef, 0xfedcba9876543210

bf, err := NewWithKeys(k0, k1, float64(10000), 0.001)
if err != nil {
	panic(err)
}

bf.Add([]byte("hello"))
fmt.Println(bf.Has([]byte("hello")))
fmt.Println(bf.Has([]byte("missing")))
Output:
true
false

func (*Bloom) Add

func (bl *Bloom) Add(entry []byte)

Add inserts entry into the bloom filter. Not safe for concurrent use; see Bloom.AddTS for a mutex-protected variant.

func (*Bloom) AddIfNotHas

func (bl *Bloom) AddIfNotHas(entry []byte) (added bool)

AddIfNotHas adds entry only if it is not already present in the filter. It returns true if the entry was added, false if it was already present. Not safe for concurrent use; see Bloom.AddIfNotHasTS.

Example
bf, err := New(float64(512), float64(1))
if err != nil {
	panic(err)
}

fmt.Printf("%v %v %v %v\n", bf.sizeExp, bf.size, bf.setLocs, bf.shift)

bf.Add([]byte("Manfred"))
fmt.Println("bf.Add([]byte(\"Manfred\"))")
fmt.Printf("bf.Has([]byte(\"Manfred\")) -> %v - should be true\n", bf.Has([]byte("Manfred")))
fmt.Printf("bf.Add([]byte(\"manfred\")) -> %v - should be false\n", bf.Has([]byte("manfred")))
fmt.Printf("bf.AddIfNotHas([]byte(\"Manfred\")) -> %v - should be false\n", bf.AddIfNotHas([]byte("Manfred")))
fmt.Printf("bf.AddIfNotHas([]byte(\"manfred\")) -> %v - should be true\n", bf.AddIfNotHas([]byte("manfred")))

bf.AddTS([]byte("Hans"))
fmt.Println("bf.AddTS([]byte(\"Hans\")")
fmt.Printf("bf.HasTS([]byte(\"Hans\")) -> %v - should be true\n", bf.HasTS([]byte("Hans")))
fmt.Printf("bf.AddTS([]byte(\"hans\")) -> %v - should be false\n", bf.HasTS([]byte("hans")))
fmt.Printf("bf.AddIfNotHasTS([]byte(\"Hans\")) -> %v - should be false\n", bf.AddIfNotHasTS([]byte("Hans")))
fmt.Printf("bf.AddIfNotHasTS([]byte(\"hans\")) -> %v - should be true\n", bf.AddIfNotHasTS([]byte("hans")))
Output:
9 511 1 55
bf.Add([]byte("Manfred"))
bf.Has([]byte("Manfred")) -> true - should be true
bf.Add([]byte("manfred")) -> false - should be false
bf.AddIfNotHas([]byte("Manfred")) -> false - should be false
bf.AddIfNotHas([]byte("manfred")) -> true - should be true
bf.AddTS([]byte("Hans")
bf.HasTS([]byte("Hans")) -> true - should be true
bf.AddTS([]byte("hans")) -> false - should be false
bf.AddIfNotHasTS([]byte("Hans")) -> false - should be false
bf.AddIfNotHasTS([]byte("hans")) -> true - should be true

func (*Bloom) AddIfNotHasTS

func (bl *Bloom) AddIfNotHasTS(entry []byte) (added bool)

AddIfNotHasTS is the thread-safe version of Bloom.AddIfNotHas. It acquires a write lock for the duration of the operation.

func (*Bloom) AddTS

func (bl *Bloom) AddTS(entry []byte)

AddTS is the thread-safe version of Bloom.Add. It acquires a write lock for the duration of the operation.

func (*Bloom) Clear

func (bl *Bloom) Clear()

Clear resets the bloom filter, zeroing the bitset and the element counter. Not safe for concurrent use; see Bloom.ClearTS.

func (*Bloom) ClearTS

func (bl *Bloom) ClearTS()

ClearTS is the thread-safe version of Bloom.Clear.

func (*Bloom) ElementsAdded

func (bl *Bloom) ElementsAdded() uint64

ElementsAdded returns the number of elements added to the bloom filter.

func (*Bloom) FillRatio

func (bl *Bloom) FillRatio() float64

FillRatio returns the fraction of bits set in the filter (0.0 to 1.0). Not safe for concurrent use; see Bloom.FillRatioTS.

func (*Bloom) FillRatioTS

func (bl *Bloom) FillRatioTS() float64

FillRatioTS is the thread-safe version of Bloom.FillRatio.

func (*Bloom) Has

func (bl *Bloom) Has(entry []byte) bool

Has reports whether entry is in the filter. A true result may be a false positive; a false result is always definitive. Not safe for concurrent use; see Bloom.HasTS.

Example
bf, err := New(float64(10000), 0.001)
if err != nil {
	panic(err)
}

bf.Add([]byte("apple"))

// Has returns true for entries that were added.
fmt.Println(bf.Has([]byte("apple")))

// Has returns false for entries that were never added (no false negatives).
fmt.Println(bf.Has([]byte("banana")))
Output:
true
false

func (*Bloom) HasTS

func (bl *Bloom) HasTS(entry []byte) bool

HasTS is the thread-safe version of Bloom.Has. It acquires a read lock for the duration of the operation.

func (*Bloom) JSONMarshal

func (bl *Bloom) JSONMarshal() []byte

JSONMarshal serializes the bloom filter to a JSON byte slice. The result can be restored with JSONUnmarshal. Not safe for concurrent use; see Bloom.JSONMarshalTS.

func (*Bloom) JSONMarshalTS

func (bl *Bloom) JSONMarshalTS() []byte

JSONMarshalTS is the thread-safe version of Bloom.JSONMarshal.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL