sbmap

package module
v0.0.0-...-f4e4c37 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Apr 26, 2025 License: MIT Imports: 5 Imported by: 0

README

sbmap

import "github.com/barbell-math/smoothbrain-hashmap"

A very simple library that implements a generic, linear probing map.

Example (Custom Eq And Hash Funcs)

seed := maphash.MakeSeed()

h := NewCustom[string, string](
	4,
	func(l, r string) bool { return strings.ToLower(l) == strings.ToLower(r) },
	func(v string) uint64 { return maphash.String(seed, strings.ToLower(v)) },
)
h.Put("one", "one")
h.Put("two", "two")
h.Put("three", "three")
h.Put("ThReE", "four")

if val, ok := h.Get("three"); ok {
	fmt.Println(val)
}

h.Remove("tWO")
fmt.Println(h.Len())

fmt.Println("Keys:")
keys := slices.Collect(h.Keys())
slices.Sort(keys)
fmt.Println(keys)

//Output:
// four
// 2
// Keys:
// [one three]
Output
four
2
Keys:
[one three]

Example (Simple)

h := New[int, string]()
h.Put(1, "one")
h.Put(2, "two")
h.Put(3, "three")

if val, ok := h.Get(1); ok {
	fmt.Println(val)
}
if _, ok := h.Get(4); !ok {
	fmt.Println("4 was not in the map!")
}

h.Remove(1)
fmt.Println(h.Len())

fmt.Println("Keys:")
keys := slices.Collect(h.Keys())
slices.Sort(keys)
fmt.Println(keys)

//Output:
// one
// 4 was not in the map!
// 2
// Keys:
// [2 3]
Output
one
4 was not in the map!
2
Keys:
[2 3]

Index

func ComparableEqual

func ComparableEqual[T comparable](l T, r T) bool

An equality function that can be passed to NewCustom when using a comparable type. If the key type is comparable then you can simply use New instead of NewCustom and this function will be Used by default.

func ComparableHash

func ComparableHash[T comparable]() func(v T) uint64

A hash function that can be passed to NewCustom when using a comparable type. If the key type is comparable then you can simply use New instead of NewCustom and this function will be Used by default.

type Map

type Map[K any, V any] struct {
    // contains filtered or unexported fields
}

func New
func New[K comparable, V comparable]() Map[K, V]

Creates a Map where K is the key type and V is the value type. ComparableEqual and ComparableHash functions will be Used by the returned Map. For creating a Map with non-comparable types or custom hash and equality functions refer to NewCustom.

func NewCap
func NewCap[K comparable, V comparable](_cap int) Map[K, V]

Creates a Map where K is the key type and V is the value type with a capacity of `_cap`. ComparableEqual and ComparableHash functions will be Used by the returned Map. For creating a Map with non-comparable types or custom hash and equality functions refer to NewCustom.

func NewCustom
func NewCustom[K any, V any](_cap int, eq func(l K, r K) bool, hash func(v K) uint64) Map[K, V]

Creates a Map where K is the key type and V is the value type with a capacity of `_cap`. The supplied `eq` and `hash` functions will be Used by the Map. If two values are equal the `hash` function hash function should return the same hash for both values.

func (*Map[K, V]) Clear
func (m *Map[K, V]) Clear()

Removes all values from the underlying hash but keeps the maps underlying capacity.

func (*Map[K, V]) Copy
func (m *Map[K, V]) Copy() *Map[K, V]

Creates a copy of the supplied hash map. All values will be copied using memcpy, meaning a shallow copy will be made of the values.

func (*Map[K, V]) Get
func (m *Map[K, V]) Get(k K) (V, bool)

Gets the value that is related to the supplied key. If the key is found the boolean return value will be true and the value will be returned. If the key is not found the boolean return value will be false and a zero-initialized value of type V will be returned.

func (*Map[K, V]) Keys
func (m *Map[K, V]) Keys() iter.Seq[K]

Iterates over all of the keys in the map. Uses the stdlib `iter` package so this function can be Used in a standard `for` loop.

func (*Map[K, V]) Len
func (m *Map[K, V]) Len() int

Returns the number of elements in the hash map. This is different than the maps capacity.

func (*Map[K, V]) PntrVals
func (m *Map[K, V]) PntrVals() iter.Seq[*V]

Iterates over all of the values in the map. Uses the stdlib `iter` package so this function can be Used in a standard `for` loop. The value may be mutated and the results will be seen by the hash map.

func (*Map[K, V]) Put
func (m *Map[K, V]) Put(k K, v V)

Places the supplied key, value pair in the hash map. If the key was already present in the map the old value will be overwritten. The map will rehash as necessary.

func (*Map[K, V]) Remove
func (m *Map[K, V]) Remove(k K)

Removes the supplied key and associated value from the hash map if it is present. If the key is not present in the map then no action will be taken.

func (*Map[K, V]) Vals
func (m *Map[K, V]) Vals() iter.Seq[V]

Iterates over all of the values in the map. Uses the stdlib `iter` package so this function can be Used in a standard `for` loop.

func (*Map[K, V]) Zero
func (m *Map[K, V]) Zero()

Removes all values from the underlying hash and resets the maps capacity to the default initial capacity.

Generated by gomarkdoc

Use in Your Project

Simply include the package and you will be able to use the default map with no simd intrinsics. If you wish to use simd intrinsics then you must compile your project with one of following tags: sbmap_simd128, sbmap_simd256, or sbmap_simd512. In testing sbmap_simd128 has shown the greatest performance.

Note: Please keep in mind that the hash and equality functions you provide to the map can have a significant impact on performance. For this map to perform well the functions you provide must be correct, and efficient.

When Should You Use This?

There are a couple scenarios where this map will make sense to use:

  1. You want to support maps with key types outside of the comparable type set
  2. You want to use a map that performs less allocations than the builtin map
  3. Your hardware has SIMD support and you can take advantage of it

Helpful Developer Cmds

To build the build system:

go build -o ./bs/bs ./bs

The build system can then be used as usual:

./bs/bs --help
./bs/bs buildbs # Builds the build system!

To run unit tests:

go test -v ./...                        # or...
go test -tags=sbmap_simd128 -v ./...    # or...
go test -tags=sbmap_simd256 -v ./...    # or...
go test -tags=sbmap_simd512 -v ./...

To run the delve debugger:

dlv test github.com/barbell-math/smoothbrain-hashmap                                        # or...
dlv test --build-flags="-tags=sbmap_simd128" github.com/barbell-math/smoothbrain-hashmap    # or...
dlv test --build-flags="-tags=sbmap_simd256" github.com/barbell-math/smoothbrain-hashmap    # or...
dlv test --build-flags="-tags=sbmap_simd512" github.com/barbell-math/smoothbrain-hashmap

To run gdb on a selected packages tests:

go test -gcflags "-N" -ldflags="-compressdwarf=false" -c ./...                      # or...
./bs/bin/bs unitTestExe default

go test -tags=sbmap_simd128 -gcflags "-N" -ldflags="-compressdwarf=false" -c ./...  # or...
./bs/bin/bs unitTestExe 128

go test -tags=sbmap_simd256 -gcflags "-N" -ldflags="-compressdwarf=false" -c ./...  # or...
./bs/bin/bs unitTestExe 256

go test -tags=sbmap_simd512 -gcflags "-N" -ldflags="-compressdwarf=false" -c ./...  # or...
./bs/bin/bs unitTestExe 512

gdb slotProbes.test             # or...
gdb smoothbrain-hashmap.test

The delve debugger is better suited for dealing with go constructs. It handles accessing variables, printing values, and breakpoints a bit more smoothly than gdb. However, delve cannot see the xmm/ymm/zmm registers. This is a major problem considering that this hash map uses SIMD. gdb on the other hand does not handle accessing/printing go variables/constructs as well and setting breakpoints is not as easy but, it can see the xmm/ymm/zmm registers. Pick your poison.

By default running the LargishDataset unit test will run profiling and will produce a file: ./bs/testProf.prof. This can then be viewed using golangs builtin pprof tool:

go tool pprof ./bs/tmp/testProf.prof

Developer Resources

Writing assembly is not a trivial task. These are some helpful resources:

  1. amd64 instruction list: Lists all of the available instructions in the amd64 instruction set. However, not all of the instructions listed in this resource are available for use with the go assembler. To check what instructions are available it is often helpful too search the go repo for them.
  2. available instructions search: This link provides a way to search the go repo for assembly instructions.
  3. godbolt: Sometimes it is difficult to know where to start. In this case it can sometimes help to look at what the compiler would do. It is often helpful to write a small code sample in C and see what the compiler produces on high optimization levels.
  4. Algorithmica: This is a resource to better understand SIMD instructions and how to use them. If you have not dealt with SIMD instructions before it is recommended to read this first.

Why?

Golang has a builtin map. Why make a new one? Because I wanted to support arbitrary types as keys. This is just a small issue that I have run into before when trying to write generic code. The builtin map being restricted to comparable types restricts it from being used as a set in many scenarios. So, this map implementation allows the user to supply their own custom equality and hash functions. It is assumed that the functions the user supplies are valid for use with a map.

Implementation

The map in this package is implemented as a swiss table with one contiguous backing array for storage. This is similar to the implementation that the builtin map uses, except the builtin map does not use one contiguous array for backing storage, it uses the more widely accepted bucket approach. On the go dev blog it is claimed that the builtin map was implemented this way to make it so adding a value would not create a sudden increase in processing time if the map needed to resize the backing array and rehash all of the elements.

Go is frequently used for latency-sensitive servers, so we don’t want operations on built-in types that can have arbitrarily large impact on tail latency. Instead, Go maps grow incrementally, so that each insertion has an upper bound on the amount of growth work it must do. This bounds the latency impact of a single map insertion. Source: https://go.dev/blog/swisstable

However, this bucket approach comes with a significant increase in the number of allocations made by the builtin map. (Nothing is free after all.) Plot 1 shows the number of allocations made by the builtin map compared to the number of allocations made by the map in this package.

plot1 Plot 1: A plot showing the total number of allocations to add N elements to the map.

So, the question can be asked: is there is a world where the reduced number of allocations performed by the map in this package allows it to have greater performance than the builtin map? After writing some benchmarks, the data can be looked at. Plot 2 shown below shows the time taken to put N elements into a map.

plot2 Plot 2: A plot showing the time taken to add N elements.

The map in this package has very clear jumps in processing time which are associated with when the map needs to resize and rehash. These jumps can be seen occurring at higher number of elements for higher load factor percentages, which makes sense because more elements can be added before triggering a resize and rehash given a higher load factor percentage. The builtin map does not show the same jumps, but to my surprise it does show more controlled increases in processing time at the same points where the map in this package has jumps in processing time. Seeing this large, but smooth, increase in processing time tells me that the builtin map still has similar limitations to resizing and rehashing even if those are not the exact actions that are occurring.

Plot 2 also shows the map in this package operating at differing load factors. The load factors in the range of 60%-75% perform the best in relation to the builtin map. The default load factor for the map in this package is 75%, the highest percentage that can be used before performance significantly decreases.

So, the original question of if reducing the number of allocations can result in an increase in performance can now be answered: yes, it can. The simd128 version of the map defined in this package shows a slight performance increase over the builtin map. Up until now only the simd128 version of the map defined in this package has been shown in graphs. This was done to reduce clutter and because the simd128 version of the map actually performs the best, as shown in plot 3. The reason there is a jagged section is because data points within that region were collected every few hundred elements. After that region sampling dropped to follow powers of 10 to help reduce the time taken to gather the benchmark results.

plot3 Plot 3: A plot showing the time taken to add N elements normalized against the time taken to add N elements to the builtin map across various simd tags.

Plot 3 shows that

  1. simd128 performs the best.
  2. simd256 has a slight edge, especially towards the end of the graph, to put it in second place.
  3. the default implementation with no SIMD is about as good as the simd512 version, if not better.

The pattern of worse performance as the SIMD register size increases is due to the nature of how a swiss table works. The larger the SIMD register the larger the potential number of collisions at each stage of comparison. Yes, SIMD will allow more values to be compared at once but having more values also results in more collisions. These collisions have to be iterated over sequentially which results in slower performance.

Documentation

Overview

A very simple library that implements a generic, linear probing map.

Example (CustomEqAndHashFuncs)
seed := maphash.MakeSeed()

h := NewCustom[string, string](
	4,
	func(l, r string) bool { return strings.ToLower(l) == strings.ToLower(r) },
	func(v string) uint64 { return maphash.String(seed, strings.ToLower(v)) },
)
h.Put("one", "one")
h.Put("two", "two")
h.Put("three", "three")
h.Put("ThReE", "four")

if val, ok := h.Get("three"); ok {
	fmt.Println(val)
}

h.Remove("tWO")
fmt.Println(h.Len())

fmt.Println("Keys:")
keys := slices.Collect(h.Keys())
slices.Sort(keys)
fmt.Println(keys)
Output:

four
2
Keys:
[one three]
Example (Simple)
h := New[int, string]()
h.Put(1, "one")
h.Put(2, "two")
h.Put(3, "three")

if val, ok := h.Get(1); ok {
	fmt.Println(val)
}
if _, ok := h.Get(4); !ok {
	fmt.Println("4 was not in the map!")
}

h.Remove(1)
fmt.Println(h.Len())

fmt.Println("Keys:")
keys := slices.Collect(h.Keys())
slices.Sort(keys)
fmt.Println(keys)
Output:

one
4 was not in the map!
2
Keys:
[2 3]

Index

Examples

Constants

This section is empty.

Variables

This section is empty.

Functions

func ComparableEqual

func ComparableEqual[T comparable](l T, r T) bool

An equality function that can be passed to NewCustom when using a comparable type. If the key type is comparable then you can simply use New instead of NewCustom and this function will be Used by default.

func ComparableHash

func ComparableHash[T comparable]() func(v T) uint64

A hash function that can be passed to NewCustom when using a comparable type. If the key type is comparable then you can simply use New instead of NewCustom and this function will be Used by default.

Types

type Map

type Map[K any, V any] struct {
	// contains filtered or unexported fields
}

func New

func New[K comparable, V comparable]() Map[K, V]

Creates a Map where K is the key type and V is the value type. ComparableEqual and ComparableHash functions will be Used by the returned Map. For creating a Map with non-comparable types or custom hash and equality functions refer to NewCustom.

func NewCap

func NewCap[K comparable, V comparable](_cap int) Map[K, V]

Creates a Map where K is the key type and V is the value type with a capacity of `_cap`. ComparableEqual and ComparableHash functions will be Used by the returned Map. For creating a Map with non-comparable types or custom hash and equality functions refer to NewCustom.

func NewCustom

func NewCustom[K any, V any](
	_cap int,
	eq func(l K, r K) bool,
	hash func(v K) uint64,
) Map[K, V]

Creates a Map where K is the key type and V is the value type with a capacity of `_cap`. The supplied `eq` and `hash` functions will be Used by the Map. If two values are equal the `hash` function hash function should return the same hash for both values.

func (*Map[K, V]) Clear

func (m *Map[K, V]) Clear()

Removes all values from the underlying hash but keeps the maps underlying capacity.

func (*Map[K, V]) Copy

func (m *Map[K, V]) Copy() *Map[K, V]

Creates a copy of the supplied hash map. All values will be copied using memcpy, meaning a shallow copy will be made of the values.

func (*Map[K, V]) Get

func (m *Map[K, V]) Get(k K) (V, bool)

Gets the value that is related to the supplied key. If the key is found the boolean return value will be true and the value will be returned. If the key is not found the boolean return value will be false and a zero-initialized value of type V will be returned.

func (*Map[K, V]) Keys

func (m *Map[K, V]) Keys() iter.Seq[K]

Iterates over all of the keys in the map. Uses the stdlib `iter` package so this function can be Used in a standard `for` loop.

func (*Map[K, V]) Len

func (m *Map[K, V]) Len() int

Returns the number of elements in the hash map. This is different than the maps capacity.

func (*Map[K, V]) PntrVals

func (m *Map[K, V]) PntrVals() iter.Seq[*V]

Iterates over all of the values in the map. Uses the stdlib `iter` package so this function can be Used in a standard `for` loop. The value may be mutated and the results will be seen by the hash map.

func (*Map[K, V]) Put

func (m *Map[K, V]) Put(k K, v V)

Places the supplied key, value pair in the hash map. If the key was already present in the map the old value will be overwritten. The map will rehash as necessary.

func (*Map[K, V]) Remove

func (m *Map[K, V]) Remove(k K)

Removes the supplied key and associated value from the hash map if it is present. If the key is not present in the map then no action will be taken.

func (*Map[K, V]) Vals

func (m *Map[K, V]) Vals() iter.Seq[V]

Iterates over all of the values in the map. Uses the stdlib `iter` package so this function can be Used in a standard `for` loop.

func (*Map[K, V]) Zero

func (m *Map[K, V]) Zero()

Removes all values from the underlying hash and resets the maps capacity to the default initial capacity.

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL