stree

package module
v0.1.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jan 24, 2026 License: BSD-2-Clause Imports: 6 Imported by: 0

README

S-Tree

A static B-tree implementation in Go based on the algorithm described at Algorithmica - S-Tree with SIMD acceleration (SSE4.2 and AVX2).

This is early work and may change without warning!

STree-Go is designed for high-performance, cache-efficient lookups. It uses Eytzinger (B-tree) numeration with 16-element blocks that align with typical 64-byte CPU cache lines, maximizing cache utilization during tree traversal. On amd64 platforms, the library automatically detects and uses SSE4.2 or AVX2 SIMD instructions at runtime for accelerated search operations, while providing a pure Go fallback that works on all platforms without SIMD support.

The data structure is designed to work directly with memory-mapped byte slices, making it ideal for persistent, disk-backed indices. Search operations allocate no memory, ensuring predictable performance without GC pressure. Once built, the tree structure is immutable, making it safe for concurrent read access.

The data structure is documented in stree.ksy as a Kaitai Struct format.

Usage

package main

import (
	"fmt"
	"os"

	"github.com/Akron/stree-go"
)

func main() {
	// 1. Build the S-Tree
	
	// Input: unsorted uint32 values with duplicates.
	values := []uint32{42, 17, 100, 5, 73, 88, 42, 17}
	
	tree, err := stree.Build(values)
	if err != nil {
		panic(err)
	}
	
	// 2. Serialize to disk
	file, err := os.Create("index.stree")
	if err != nil {
		panic(err)
	}
	
	_, err = tree.WriteTo(file)
	if err != nil {
		file.Close()
		panic(err)
	}
	file.Close()
	
	// 3. Load the S-Tree
	data, err := os.ReadFile("index.stree")
	if err != nil {
		panic(err)
	}
	
	reader, err := stree.NewReaderWithValidation(data)
	if err != nil {
		panic(err)
	}

    // 4. Search for keys
	// returns the position in the tree, or -1 if not found.
	// The position can be used to correlate
	// with external data.
	keysToFind := []uint32{42, 100, 999}
	for _, key := range keysToFind {
		pos := reader.Search(key)
		if pos >= 0 {
			fmt.Printf("Found %d at position %d\n", key, pos)
		} else {
			fmt.Printf("Key %d not found\n", key)
		}
	}
	}

Development

Assembly is generated using avo. Regeneration is done using

$ go generate ./internal/asm

Performance

Typical benchmark results on Intel Core Ultra 5 125H:

Operation Size Generic SSE4.2 AVX2
Search 1,000 25 ns 11 ns 8 ns
Search 10,000 16 ns 13 ns 11 ns
Search 100,000 59 ns 20 ns 13 ns

Limitations

  • Values must be (uint31): All keys must be in the range [0, 2^31 - 1]. This restriction is enforced at build time and search time. Build() and BuildFromKeyed() will return ErrValueTooLarge if any value exceeds this limit. Search() returns -1 immediately for keys >= 0x80000000. This constraint ensures consistent behavior between pure Go and SIMD implementations, as SIMD comparison uses signed arithmetic.
  • In-place modification: Build() and BuildFromKeyed() sort the input slice in place.

Disclaimer

This library was developed with AI assistance (Claude Opus 4.5).

Copyright (c) 2025-2026 Nils Diewald

Documentation

Overview

Package stree provides an S-Tree (Static B-Tree) implementation optimized for cache efficiency and SIMD operations.

The S-Tree uses Eytzinger layout with a branching factor of 17 (16 keys per node), enabling efficient cache-line-aligned traversal and SIMD-accelerated search.

Reference: https://en.algorithmica.org/hpc/data-structures/s-tree/

Example

Example demonstrates basic S-Tree usage: building a tree from values and searching.

// Build a tree from uint32 values.
// Note: The input slice will be sorted and deduplicated in-place.
values := []uint32{42, 17, 100, 5, 73}
tree, err := Build(values)
if err != nil {
	panic(err)
}

// Create a reader from the tree's serialized data.
reader, err := NewReader(tree.Data())
if err != nil {
	panic(err)
}

// Search for a value. Returns the position if found, -1 if not.
pos := reader.Search(42)
fmt.Printf("Search(42): found=%v\n", pos >= 0)

pos = reader.Search(99)
fmt.Printf("Search(99): found=%v\n", pos >= 0)

// Use Contains for simple membership testing.
fmt.Printf("Contains(100): %v\n", reader.Contains(100))

fmt.Printf("Count: %d\n", reader.Count())
Output:

Search(42): found=true
Search(99): found=false
Contains(100): true
Count: 5

Index

Examples

Constants

View Source
const (

	// Magic bytes identifying an S-Tree file.
	Magic = "STRE"

	// Version is the current format version.
	Version uint16 = 0x0001

	// MaxValue is the maximum allowed value for keys (2^31 - 1).
	// Values must be < 0x80000000 because SIMD comparison uses signed arithmetic.
	// This ensures consistent behavior between pure Go and SIMD implementations.
	MaxValue = uint32(0x7FFFFFFF)
)

Variables

View Source
var (
	SearchAVX2    = searchAVX2
	SearchSSE     = searchSSE
	SearchGeneric = searchGeneric
)

Exported for benchmarking purposes

View Source
var (
	ErrEmptyInput     = errors.New("stree: empty input")
	ErrInvalidMagic   = errors.New("stree: invalid magic bytes")
	ErrInvalidVersion = errors.New("stree: unsupported version")
	ErrInvalidData    = errors.New("stree: invalid data")
	ErrDataTooShort   = errors.New("stree: data too short")
	ErrInvalidBlockSz = errors.New("stree: invalid block size")
	ErrValueTooLarge  = errors.New("stree: value exceeds maximum (must be < 0x80000000)")
)

Errors returned by S-Tree operations.

Functions

func DataSize

func DataSize(count int) int

DataSize returns the total size in bytes needed to store count elements (header + data blocks).

func HasAVX2

func HasAVX2() bool

HasAVX2 returns true if AVX2 is available.

func HasSSE42

func HasSSE42() bool

HasSSE42 returns true if SSE4.2 is available.

func SIMDAvailable

func SIMDAvailable() bool

SIMDAvailable returns true if SIMD-accelerated search is available.

Types

type Keyed

type Keyed interface {
	// Key returns the uint32 key used for building and searching the S-Tree.
	Key() uint32
	// Index returns the stored index position (set by SetIndex during building).
	Index() uint32
	// SetIndex is called during building with the position of this key in the S-Tree.
	// This allows correlating the key with additional data stored elsewhere.
	SetIndex(idx uint32)
}

Keyed is the interface for types that can be indexed in an S-Tree. Implementations must provide a key for sorting/searching and accept an index position after the tree is built.

type Reader

type Reader struct {
	// contains filtered or unexported fields
}

Reader provides read-only access to an S-Tree. It can work with any byte slice, including memory-mapped files.

func NewReader

func NewReader(data []byte) (*Reader, error)

NewReader creates a Reader from a serialized byte slice. The slice must contain valid S-Tree data (header + blocks). The slice is NOT copied; the Reader references the original data.

func NewReaderWithValidation

func NewReaderWithValidation(data []byte) (*Reader, error)

NewReaderWithValidation creates a Reader from a byte slice and validates data integrity. This performs CRC-32 validation during construction, which adds O(n) time cost. Use NewReader for faster loading when integrity is already established.

Example

ExampleNewReaderWithValidation demonstrates creating a reader with CRC-32 validation.

tree, _ := Build([]uint32{10, 20, 30})

// NewReaderWithValidation checks the CRC-32 checksum during construction.
// Use this when loading from untrusted sources (network, disk).
reader, err := NewReaderWithValidation(tree.Data())
if err != nil {
	fmt.Println("Data integrity check failed!")
	return
}

fmt.Printf("Valid data, count: %d\n", reader.Count())
Output:

Valid data, count: 3

func (*Reader) All

func (r *Reader) All() func(yield func(uint32) bool)

All returns an iterator over all non-sentinel values in the tree. Values are returned in tree traversal order (not necessarily sorted). For sorted iteration, use Sorted().

func (*Reader) Contains

func (r *Reader) Contains(key uint32) bool

Contains returns true if the S-Tree contains the given key.

func (*Reader) Count

func (r *Reader) Count() int

Count returns the number of elements in the S-Tree.

func (*Reader) Data

func (r *Reader) Data() []byte

Data returns the underlying byte slice.

func (*Reader) NumBlocks

func (r *Reader) NumBlocks() int

NumBlocks returns the number of blocks in the S-Tree.

func (*Reader) Search

func (r *Reader) Search(key uint32) int

Search searches for a key in the S-Tree using tree traversal. Returns the position in the data array where the key is found, or -1 if not found. Returns -1 immediately if key >= 0x80000000 (not a valid uint31). This uses the optimized pure-Go implementation if no SIMD-optimized version can be used for the current architecture.

func (*Reader) Sorted

func (r *Reader) Sorted() func(yield func(value uint32, index int) bool)

Sorted returns an iterator over all (value, index) pairs in sorted (ascending) order. The index is the position in the S-Tree data structure. This performs an in-order traversal of the Eytzinger tree structure. This is useful for merging trees while preserving index information.

Example

ExampleReader_Sorted demonstrates iterating through all values in sorted order. This performs an in-order traversal of the Eytzinger tree structure, yielding both the value and its position in the tree.

values := []uint32{50, 10, 40, 20, 30}
tree, _ := Build(values)
reader, _ := NewReader(tree.Data())

// Sorted() returns an iterator that yields (value, index) pairs
// in ascending order by value.
fmt.Println("Values in sorted order:")
reader.Sorted()(func(value uint32, index int) bool {
	fmt.Printf("  value=%d, index=%d\n", value, index)
	return true // return false to stop iteration early
})
Output:

Values in sorted order:
  value=10, index=0
  value=20, index=1
  value=30, index=2
  value=40, index=3
  value=50, index=4

func (*Reader) ValidateCRC32

func (r *Reader) ValidateCRC32() bool

ValidateCRC32 checks the data integrity by validating the stored CRC-32 checksum. Returns true if the checksum is valid, false if data is corrupted. This operation is O(n) where n is the total data size.

type STree

type STree struct {
	// contains filtered or unexported fields
}

STree represents an S-Tree (Static B-Tree) in memory. It can be created from a slice of uint32 values and written to disk.

func Build

func Build(values []uint32) (*STree, error)

Build creates a new S-Tree from the given slice of uint32 values. The input slice does not need to be sorted; duplicates will be removed. Returns ErrEmptyInput if the input is empty. Returns ErrValueTooLarge if any value is >= 0x80000000 (not a valid uint31).

WARNING: The input slice will be sorted in-place. If you need to preserve the original order, make a copy before calling Build.

Example

ExampleBuild demonstrates building an S-Tree from uint32 values.

// Input does not need to be sorted; duplicates are removed automatically.
values := []uint32{30, 10, 20, 10, 40, 20}
tree, err := Build(values)
if err != nil {
	panic(err)
}

// Count reflects unique values only.
fmt.Printf("Unique count: %d\n", tree.Count())

// The tree data can be written to disk or memory-mapped later.
fmt.Printf("Data size: %d bytes\n", len(tree.Data()))
Output:

Unique count: 4
Data size: 80 bytes

func BuildFromKeyed

func BuildFromKeyed[T Keyed](items []T) (*STree, error)

BuildFromKeyed creates a new S-Tree from a slice of Keyed items. The input does not need to be sorted; duplicates will be removed. During building, each unique item's SetIndex method is called with its position in the tree. This is the most efficient way to build a tree when you need index correlation. Returns ErrEmptyInput if the input is empty. Returns ErrValueTooLarge if any key is >= 0x80000000 (not a valid uint31).

WARNING: The input slice will be reordered in-place (sorted by key). If you need to preserve the original order, make a copy before calling BuildFromKeyed.

This is useful when you need to correlate keys with additional data:

type Entry struct {
    key   uint32
    index uint32
    data  []byte
}
func (e *Entry) Key() uint32            { return e.key }
func (e *Entry) Index() uint32          { return e.index }
func (e *Entry) SetIndex(idx uint32)    { e.index = idx }

entries := []*Entry{{key: 10}, {key: 5}, {key: 20}}
tree, _ := stree.BuildFromKeyed(entries)
// Now entries[i].Index() contains its position in the tree
Example

ExampleBuildFromKeyed demonstrates building an S-Tree while tracking the position of each item in the tree structure.

// Create documents with IDs (keys) and associated data.
docs := []*Document{
	{ID: 300, Title: "Third"},
	{ID: 100, Title: "First"},
	{ID: 200, Title: "Second"},
}

// Build the tree. Each document's Position field will be set to its
// index in the tree, enabling O(1) lookup of associated data.
tree, err := BuildFromKeyed(docs)
if err != nil {
	panic(err)
}

// After building, we can look up documents by their tree position.
reader, err := NewReader(tree.Data())
if err != nil {
	panic(err)
}

// Search returns the same position that was stored in the Document.
pos := reader.Search(200)
fmt.Printf("Position of ID 200: %d\n", pos)

// Find the document with that position.
for _, doc := range docs {
	if int(doc.Position) == pos {
		fmt.Printf("Found: %s\n", doc.Title)
	}
}
Output:

Position of ID 200: 1
Found: Second

func (*STree) Count

func (st *STree) Count() int

Count returns the number of unique elements in the S-Tree.

func (*STree) Data

func (st *STree) Data() []byte

Data returns the underlying byte slice containing the serialized S-Tree. This can be used directly with mmap or written to a file.

func (*STree) NumBlocks

func (st *STree) NumBlocks() int

NumBlocks returns the number of blocks in the S-Tree.

func (*STree) WriteTo

func (st *STree) WriteTo(w io.Writer) (int64, error)

WriteTo writes the S-Tree to an io.Writer. Implements io.WriterTo interface.

Example

ExampleSTree_WriteTo demonstrates serializing an S-Tree to a writer.

values := []uint32{1, 2, 3, 4, 5}
tree, _ := Build(values)

// Write to any io.Writer (file, buffer, network, etc.)
var buf bytes.Buffer
n, err := tree.WriteTo(&buf)
if err != nil {
	panic(err)
}

fmt.Printf("Wrote %d bytes\n", n)

// The serialized data can be read back with NewReader.
reader, _ := NewReader(buf.Bytes())
fmt.Printf("Reader count: %d\n", reader.Count())
Output:

Wrote 80 bytes
Reader count: 5

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL