xorfilter

package module
v0.1.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jan 8, 2022 License: Apache-2.0 Imports: 3 Imported by: 12

README

xorfilter: Go library implementing xor and binary fuse filters

GoDoc Build Status

Bloom filters are used to quickly check whether an element is part of a set. Xor and binary fuse filters are a faster and more concise alternative to Bloom filters. They are also smaller than cuckoo filters.

We are assuming that your set is made of 64-bit integers. If you have strings or other data structures, you need to hash them first to a 64-bit integer. It is not important to have a good hash function, but collision should be unlikely (~1/2^64).

The current implementation has a false positive rate of about 0.3% and a memory usage of less than 9 bits per entry for sizeable sets.

You construct the filter as follows starting from a slice of 64-bit integers:

filter,_ := xorfilter.PopulateBinaryFuse8(keys) // keys is of type []uint64

It returns an object of type BinaryFuse8. The 64-bit integers would typically be hash values of your objects.

You can then query it as follows:

filter.Contains(v) // v is of type uint64

It will always return true if v was part of the initial construction (Populate) and almost always return false otherwise.

An xor filter is immutable, it is concurrent. The expectation is that you build it once and use it many times.

Though the filter itself does not use much memory, the construction of the filter needs many bytes of memory per set entry.

For persistence, you only need to serialize the following data structure:

type BinaryFuse8 struct {
	Seed               uint64
	SegmentLength      uint32
	SegmentLengthMask  uint32
	SegmentCount       uint32
	SegmentCountLength uint32

	Fingerprints []uint8
}

Duplicate keys

When constructing the filter, you should ensure that there are not too many duplicate keys. If you are hashing objects with a good hash function, you should have no concern, because there should be very few collisions. However, you can construct cases where there are many duplicates. If you think that this might happen, then you should check the error condition.

filter,err := xorfilter.PopulateBinaryFuse8(keys) // keys is of type []uint64
if err != nil {
 // you have too many duplicate keys, de-duplicate them?
}

Effectively, an error is returned when the filter could not be build after MaxIterations iterations (default to 100).

Implementations of xor filters in other programming languages

Documentation

Index

Constants

View Source
const ARITY = 3
View Source
const SEGMENT_COUNT = 100
View Source
const SLOTS = SEGMENT_COUNT + ARITY - 1

Variables

View Source
var MaxIterations = 1024

The maximum number of iterations allowed before the populate function returns an error

Functions

This section is empty.

Types

type BinaryFuse8

type BinaryFuse8 struct {
	Seed               uint64
	SegmentLength      uint32
	SegmentLengthMask  uint32
	SegmentCount       uint32
	SegmentCountLength uint32

	Fingerprints []uint8
}

func PopulateBinaryFuse8

func PopulateBinaryFuse8(keys []uint64) (*BinaryFuse8, error)

PopulateBinaryFuse8 fills a BinaryFuse8 filter with provided keys. The function may return an error after too many iterations: it is unlikely.

func (*BinaryFuse8) Contains

func (filter *BinaryFuse8) Contains(key uint64) bool

Contains returns `true` if key is part of the set with a false positive probability of <0.4%.

type Fuse8

type Fuse8 struct {
	Seed          uint64
	SegmentLength uint32
	Fingerprints  []uint8
}

The Fuse8 xor filter uses 8-bit fingerprints. It offers the same <0.4% false-positive probability as the xor filter, but uses less space (~9.1 bits/entry vs ~9.9 bits/entry).

The Fuse8 xor filter uses the fuse data structure, which requires a large number of keys to be operational. Experimentally, this number is somewhere >1e5. For smaller key sets, prefer thhe Xor8 filter.

For more information on the fuse graph data structure, see https://arxiv.org/abs/1907.04749. This implementation is referenced from the C implementation at https://github.com/FastFilter/xor_singleheader/pull/11.

func PopulateFuse8

func PopulateFuse8(keys []uint64) (*Fuse8, error)

Populate fills a Fuse8 filter with provided keys. The caller is responsible for ensuring there are no duplicate keys provided. The function may return an error after too many iterations: it is almost surely an indication that you have duplicate keys.

func (*Fuse8) Contains

func (filter *Fuse8) Contains(key uint64) bool

Contains returns `true` if key is part of the set with a false positive probability of <0.4%.

type Xor8

type Xor8 struct {
	Seed         uint64
	BlockLength  uint32
	Fingerprints []uint8
}

Xor8 offers a 0.3% false-positive probability

func Populate

func Populate(keys []uint64) (*Xor8, error)

Populate fills the filter with provided keys. The caller is responsible to ensure that there are no duplicate keys. The function may return an error after too many iterations: it is almost surely an indication that you have duplicate keys.

func (*Xor8) Contains

func (filter *Xor8) Contains(key uint64) bool

Contains tell you whether the key is likely part of the set

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL