Documentation
¶
Overview ¶
Package bbloom implements a fast bloom filter with a real bitset.
A Bloom filter is a space-efficient probabilistic data structure used to test set membership. It may return false positives but never false negatives: Bloom.Has returning true means the entry was probably added, while false means the entry was definitely not added.
This implementation uses an inlined SipHash-2-4 for hashing, rounds the bitset up to the next power of two for fast masking, and provides both non-thread-safe and mutex-protected (TS-suffixed) variants of all operations.
By default (New) the filter uses publicly known SipHash keys. When the filter will hold data controlled by untrusted parties, use NewWithKeys with random secret keys to prevent hash-flooding attacks.
Filters can be serialized to JSON with Bloom.JSONMarshal and restored with JSONUnmarshal.
Index ¶
- Variables
- type Bloom
- func JSONUnmarshal(dbData []byte) (*Bloom, error)
- func New(params ...float64) (bloomfilter *Bloom, err error)
- func NewWithBoolset(bs []byte, locs uint64) (bloomfilter *Bloom)
- func NewWithBoolsetAndKeys(bs []byte, locs, k0, k1 uint64) (bloomfilter *Bloom)
- func NewWithKeys(k0, k1 uint64, params ...float64) (*Bloom, error)
- func (bl *Bloom) Add(entry []byte)
- func (bl *Bloom) AddIfNotHas(entry []byte) (added bool)
- func (bl *Bloom) AddIfNotHasTS(entry []byte) (added bool)
- func (bl *Bloom) AddTS(entry []byte)
- func (bl *Bloom) Clear()
- func (bl *Bloom) ClearTS()
- func (bl *Bloom) ElementsAdded() uint64
- func (bl *Bloom) FillRatio() float64
- func (bl *Bloom) FillRatioTS() float64
- func (bl *Bloom) Has(entry []byte) bool
- func (bl *Bloom) HasTS(entry []byte) bool
- func (bl *Bloom) JSONMarshal() []byte
- func (bl *Bloom) JSONMarshalTS() []byte
Examples ¶
Constants ¶
This section is empty.
Variables ¶
var ( // ErrUsage is returned by [New] when it is not called with exactly two arguments. ErrUsage = errors.New("usage: New(float64(number_of_entries), float64(number_of_hashlocations)) i.e. New(float64(1000), float64(3)) or New(float64(number_of_entries), float64(ratio_of_false_positives)) i.e. New(float64(1000), float64(0.03))") // ErrInvalidParms is returned by [New] when a parameter is negative. ErrInvalidParms = errors.New("one of the parameters was outside of allowed range") )
Functions ¶
This section is empty.
Types ¶
type Bloom ¶
type Bloom struct {
// Mtx is exposed so callers can hold the lock across multiple
// operations (e.g. a Has followed by Add) without racing.
Mtx sync.RWMutex
// contains filtered or unexported fields
}
Bloom is a bloom filter backed by a power-of-two sized bitset. The Mtx field is exposed so callers can hold the lock across multiple operations when needed.
func JSONUnmarshal ¶
JSONUnmarshal restores a bloom filter from a JSON byte slice produced by Bloom.JSONMarshal or Bloom.JSONMarshalTS.
Example ¶
// Create and populate a filter.
bf, err := New(float64(10000), 0.001)
if err != nil {
panic(err)
}
bf.Add([]byte("persisted"))
// Serialize to JSON.
data := bf.JSONMarshal()
// Restore from JSON.
bf2, err := JSONUnmarshal(data)
if err != nil {
panic(err)
}
fmt.Println(bf2.Has([]byte("persisted")))
fmt.Println(bf2.Has([]byte("missing")))
Output: true false
func New ¶
New creates a bloom filter with default SipHash keys. It accepts exactly two float64 arguments:
If the second parameter is < 1 it is treated as a false-positive rate, and the filter is sized automatically. Example: New(float64(1<<16), 0.01) -- 65536 items, 1% FP rate.
If the second parameter is >= 1 it is treated as the number of hash locations, and the first parameter is the bitset size. Example: New(650000.0, 7.0) -- 650000-bit filter, 7 hash locations.
The default SipHash keys are publicly known constants. If the filter will hold data controlled by untrusted parties, use NewWithKeys instead to prevent hash-flooding attacks.
Example ¶
// Create a filter sized for 10000 entries with a 0.1% false positive rate.
bf, err := New(float64(10000), 0.001)
if err != nil {
panic(err)
}
bf.Add([]byte("hello"))
bf.Add([]byte("world"))
fmt.Println(bf.Has([]byte("hello")))
fmt.Println(bf.Has([]byte("world")))
fmt.Println(bf.Has([]byte("missing")))
Output: true true false
func NewWithBoolset ¶
NewWithBoolset creates a bloom filter from a pre-existing bitset, typically obtained from a previous Bloom.JSONMarshal export or an external source. bs is the serialized bitset (big-endian uint64 words) and locs is the number of hash locations per entry. The filter uses default SipHash keys; use NewWithBoolsetAndKeys to restore a filter that was created with custom keys.
func NewWithBoolsetAndKeys ¶
NewWithBoolsetAndKeys creates a bloom filter from a pre-existing bitset with caller-provided SipHash keys. This is the constructor to use when restoring a filter that was originally created with NewWithKeys. See NewWithKeys for why custom keys matter and NewWithBoolset for how the bitset parameter is interpreted.
func NewWithKeys ¶
NewWithKeys creates a bloom filter with caller-provided SipHash keys.
The default keys used by New are publicly known constants baked into the source code. An attacker who knows the keys can craft inputs that all hash to the same bit positions, filling the filter faster than normal and raising the false-positive rate. This is a concern when the filter holds data chosen by untrusted parties (e.g. content-addressed blocks fetched from the network).
Providing random, secret keys (e.g. generated once per node from crypto/rand) restores SipHash's anti-collision guarantees and makes such attacks infeasible.
The params are interpreted the same way as in New. Custom keys are preserved across Bloom.JSONMarshal / JSONUnmarshal round-trips. Note: custom keys are included in plaintext in the Bloom.JSONMarshal output, so treat serialized filters accordingly.
Example ¶
// Use random secret keys to prevent hash-flooding attacks when the
// filter holds data from untrusted sources.
var k0, k1 uint64 = 0x0123456789abcdef, 0xfedcba9876543210
bf, err := NewWithKeys(k0, k1, float64(10000), 0.001)
if err != nil {
panic(err)
}
bf.Add([]byte("hello"))
fmt.Println(bf.Has([]byte("hello")))
fmt.Println(bf.Has([]byte("missing")))
Output: true false
func (*Bloom) Add ¶
Add inserts entry into the bloom filter. Not safe for concurrent use; see Bloom.AddTS for a mutex-protected variant.
func (*Bloom) AddIfNotHas ¶
AddIfNotHas adds entry only if it is not already present in the filter. It returns true if the entry was added, false if it was already present. Not safe for concurrent use; see Bloom.AddIfNotHasTS.
Example ¶
bf, err := New(float64(512), float64(1))
if err != nil {
panic(err)
}
fmt.Printf("%v %v %v %v\n", bf.sizeExp, bf.size, bf.setLocs, bf.shift)
bf.Add([]byte("Manfred"))
fmt.Println("bf.Add([]byte(\"Manfred\"))")
fmt.Printf("bf.Has([]byte(\"Manfred\")) -> %v - should be true\n", bf.Has([]byte("Manfred")))
fmt.Printf("bf.Add([]byte(\"manfred\")) -> %v - should be false\n", bf.Has([]byte("manfred")))
fmt.Printf("bf.AddIfNotHas([]byte(\"Manfred\")) -> %v - should be false\n", bf.AddIfNotHas([]byte("Manfred")))
fmt.Printf("bf.AddIfNotHas([]byte(\"manfred\")) -> %v - should be true\n", bf.AddIfNotHas([]byte("manfred")))
bf.AddTS([]byte("Hans"))
fmt.Println("bf.AddTS([]byte(\"Hans\")")
fmt.Printf("bf.HasTS([]byte(\"Hans\")) -> %v - should be true\n", bf.HasTS([]byte("Hans")))
fmt.Printf("bf.AddTS([]byte(\"hans\")) -> %v - should be false\n", bf.HasTS([]byte("hans")))
fmt.Printf("bf.AddIfNotHasTS([]byte(\"Hans\")) -> %v - should be false\n", bf.AddIfNotHasTS([]byte("Hans")))
fmt.Printf("bf.AddIfNotHasTS([]byte(\"hans\")) -> %v - should be true\n", bf.AddIfNotHasTS([]byte("hans")))
Output: 9 511 1 55 bf.Add([]byte("Manfred")) bf.Has([]byte("Manfred")) -> true - should be true bf.Add([]byte("manfred")) -> false - should be false bf.AddIfNotHas([]byte("Manfred")) -> false - should be false bf.AddIfNotHas([]byte("manfred")) -> true - should be true bf.AddTS([]byte("Hans") bf.HasTS([]byte("Hans")) -> true - should be true bf.AddTS([]byte("hans")) -> false - should be false bf.AddIfNotHasTS([]byte("Hans")) -> false - should be false bf.AddIfNotHasTS([]byte("hans")) -> true - should be true
func (*Bloom) AddIfNotHasTS ¶
AddIfNotHasTS is the thread-safe version of Bloom.AddIfNotHas. It acquires a write lock for the duration of the operation.
func (*Bloom) AddTS ¶
AddTS is the thread-safe version of Bloom.Add. It acquires a write lock for the duration of the operation.
func (*Bloom) Clear ¶
func (bl *Bloom) Clear()
Clear resets the bloom filter, zeroing the bitset and the element counter. Not safe for concurrent use; see Bloom.ClearTS.
func (*Bloom) ClearTS ¶
func (bl *Bloom) ClearTS()
ClearTS is the thread-safe version of Bloom.Clear.
func (*Bloom) ElementsAdded ¶
ElementsAdded returns the number of elements added to the bloom filter.
func (*Bloom) FillRatio ¶
FillRatio returns the fraction of bits set in the filter (0.0 to 1.0). Not safe for concurrent use; see Bloom.FillRatioTS.
func (*Bloom) FillRatioTS ¶
FillRatioTS is the thread-safe version of Bloom.FillRatio.
func (*Bloom) Has ¶
Has reports whether entry is in the filter. A true result may be a false positive; a false result is always definitive. Not safe for concurrent use; see Bloom.HasTS.
Example ¶
bf, err := New(float64(10000), 0.001)
if err != nil {
panic(err)
}
bf.Add([]byte("apple"))
// Has returns true for entries that were added.
fmt.Println(bf.Has([]byte("apple")))
// Has returns false for entries that were never added (no false negatives).
fmt.Println(bf.Has([]byte("banana")))
Output: true false
func (*Bloom) HasTS ¶
HasTS is the thread-safe version of Bloom.Has. It acquires a read lock for the duration of the operation.
func (*Bloom) JSONMarshal ¶
JSONMarshal serializes the bloom filter to a JSON byte slice. The result can be restored with JSONUnmarshal. Not safe for concurrent use; see Bloom.JSONMarshalTS.
func (*Bloom) JSONMarshalTS ¶
JSONMarshalTS is the thread-safe version of Bloom.JSONMarshal.