Documentation
¶
Overview ¶
Package stree provides an S-Tree (Static B-Tree) implementation optimized for cache efficiency and SIMD operations.
The S-Tree uses Eytzinger layout with a branching factor of 17 (16 keys per node), enabling efficient cache-line-aligned traversal and SIMD-accelerated search.
Reference: https://en.algorithmica.org/hpc/data-structures/s-tree/
Example ¶
Example demonstrates basic S-Tree usage: building a tree from values and searching.
// Build a tree from uint32 values.
// Note: The input slice will be sorted and deduplicated in-place.
values := []uint32{42, 17, 100, 5, 73}
tree, err := Build(values)
if err != nil {
panic(err)
}
// Create a reader from the tree's serialized data.
reader, err := NewReader(tree.Data())
if err != nil {
panic(err)
}
// Search for a value. Returns the position if found, -1 if not.
pos := reader.Search(42)
fmt.Printf("Search(42): found=%v\n", pos >= 0)
pos = reader.Search(99)
fmt.Printf("Search(99): found=%v\n", pos >= 0)
// Use Contains for simple membership testing.
fmt.Printf("Contains(100): %v\n", reader.Contains(100))
fmt.Printf("Count: %d\n", reader.Count())
Output: Search(42): found=true Search(99): found=false Contains(100): true Count: 5
Index ¶
- Constants
- Variables
- func DataSize(count int) int
- func HasAVX2() bool
- func HasSSE42() bool
- func SIMDAvailable() bool
- type Keyed
- type Reader
- func (r *Reader) All() func(yield func(uint32) bool)
- func (r *Reader) Contains(key uint32) bool
- func (r *Reader) Count() int
- func (r *Reader) Data() []byte
- func (r *Reader) NumBlocks() int
- func (r *Reader) Search(key uint32) int
- func (r *Reader) Sorted() func(yield func(value uint32, index int) bool)
- func (r *Reader) ValidateCRC32() bool
- type STree
Examples ¶
Constants ¶
const ( // Magic bytes identifying an S-Tree file. Magic = "STRE" // Version is the current format version. Version uint16 = 0x0001 // MaxValue is the maximum allowed value for keys (2^31 - 1). // Values must be < 0x80000000 because SIMD comparison uses signed arithmetic. // This ensures consistent behavior between pure Go and SIMD implementations. MaxValue = uint32(0x7FFFFFFF) )
Variables ¶
var ( SearchAVX2 = searchAVX2 SearchSSE = searchSSE SearchGeneric = searchGeneric )
Exported for benchmarking purposes
var ( ErrEmptyInput = errors.New("stree: empty input") ErrInvalidMagic = errors.New("stree: invalid magic bytes") ErrInvalidVersion = errors.New("stree: unsupported version") ErrInvalidData = errors.New("stree: invalid data") ErrDataTooShort = errors.New("stree: data too short") ErrInvalidBlockSz = errors.New("stree: invalid block size") ErrValueTooLarge = errors.New("stree: value exceeds maximum (must be < 0x80000000)") )
Errors returned by S-Tree operations.
Functions ¶
func DataSize ¶
DataSize returns the total size in bytes needed to store count elements (header + data blocks).
func SIMDAvailable ¶
func SIMDAvailable() bool
SIMDAvailable returns true if SIMD-accelerated search is available.
Types ¶
type Keyed ¶
type Keyed interface {
// Key returns the uint32 key used for building and searching the S-Tree.
Key() uint32
// Index returns the stored index position (set by SetIndex during building).
Index() uint32
// SetIndex is called during building with the position of this key in the S-Tree.
// This allows correlating the key with additional data stored elsewhere.
SetIndex(idx uint32)
}
Keyed is the interface for types that can be indexed in an S-Tree. Implementations must provide a key for sorting/searching and accept an index position after the tree is built.
type Reader ¶
type Reader struct {
// contains filtered or unexported fields
}
Reader provides read-only access to an S-Tree. It can work with any byte slice, including memory-mapped files.
func NewReader ¶
NewReader creates a Reader from a serialized byte slice. The slice must contain valid S-Tree data (header + blocks). The slice is NOT copied; the Reader references the original data.
func NewReaderWithValidation ¶
NewReaderWithValidation creates a Reader from a byte slice and validates data integrity. This performs CRC-32 validation during construction, which adds O(n) time cost. Use NewReader for faster loading when integrity is already established.
Example ¶
ExampleNewReaderWithValidation demonstrates creating a reader with CRC-32 validation.
tree, _ := Build([]uint32{10, 20, 30})
// NewReaderWithValidation checks the CRC-32 checksum during construction.
// Use this when loading from untrusted sources (network, disk).
reader, err := NewReaderWithValidation(tree.Data())
if err != nil {
fmt.Println("Data integrity check failed!")
return
}
fmt.Printf("Valid data, count: %d\n", reader.Count())
Output: Valid data, count: 3
func (*Reader) All ¶
All returns an iterator over all non-sentinel values in the tree. Values are returned in tree traversal order (not necessarily sorted). For sorted iteration, use Sorted().
func (*Reader) Search ¶
Search searches for a key in the S-Tree using tree traversal. Returns the position in the data array where the key is found, or -1 if not found. Returns -1 immediately if key >= 0x80000000 (not a valid uint31). This uses the optimized pure-Go implementation if no SIMD-optimized version can be used for the current architecture.
func (*Reader) Sorted ¶
Sorted returns an iterator over all (value, index) pairs in sorted (ascending) order. The index is the position in the S-Tree data structure. This performs an in-order traversal of the Eytzinger tree structure. This is useful for merging trees while preserving index information.
Example ¶
ExampleReader_Sorted demonstrates iterating through all values in sorted order. This performs an in-order traversal of the Eytzinger tree structure, yielding both the value and its position in the tree.
values := []uint32{50, 10, 40, 20, 30}
tree, _ := Build(values)
reader, _ := NewReader(tree.Data())
// Sorted() returns an iterator that yields (value, index) pairs
// in ascending order by value.
fmt.Println("Values in sorted order:")
reader.Sorted()(func(value uint32, index int) bool {
fmt.Printf(" value=%d, index=%d\n", value, index)
return true // return false to stop iteration early
})
Output: Values in sorted order: value=10, index=0 value=20, index=1 value=30, index=2 value=40, index=3 value=50, index=4
func (*Reader) ValidateCRC32 ¶
ValidateCRC32 checks the data integrity by validating the stored CRC-32 checksum. Returns true if the checksum is valid, false if data is corrupted. This operation is O(n) where n is the total data size.
type STree ¶
type STree struct {
// contains filtered or unexported fields
}
STree represents an S-Tree (Static B-Tree) in memory. It can be created from a slice of uint32 values and written to disk.
func Build ¶
Build creates a new S-Tree from the given slice of uint32 values. The input slice does not need to be sorted; duplicates will be removed. Returns ErrEmptyInput if the input is empty. Returns ErrValueTooLarge if any value is >= 0x80000000 (not a valid uint31).
WARNING: The input slice will be sorted in-place. If you need to preserve the original order, make a copy before calling Build.
Example ¶
ExampleBuild demonstrates building an S-Tree from uint32 values.
// Input does not need to be sorted; duplicates are removed automatically.
values := []uint32{30, 10, 20, 10, 40, 20}
tree, err := Build(values)
if err != nil {
panic(err)
}
// Count reflects unique values only.
fmt.Printf("Unique count: %d\n", tree.Count())
// The tree data can be written to disk or memory-mapped later.
fmt.Printf("Data size: %d bytes\n", len(tree.Data()))
Output: Unique count: 4 Data size: 80 bytes
func BuildFromKeyed ¶
BuildFromKeyed creates a new S-Tree from a slice of Keyed items. The input does not need to be sorted; duplicates will be removed. During building, each unique item's SetIndex method is called with its position in the tree. This is the most efficient way to build a tree when you need index correlation. Returns ErrEmptyInput if the input is empty. Returns ErrValueTooLarge if any key is >= 0x80000000 (not a valid uint31).
WARNING: The input slice will be reordered in-place (sorted by key). If you need to preserve the original order, make a copy before calling BuildFromKeyed.
This is useful when you need to correlate keys with additional data:
type Entry struct {
key uint32
index uint32
data []byte
}
func (e *Entry) Key() uint32 { return e.key }
func (e *Entry) Index() uint32 { return e.index }
func (e *Entry) SetIndex(idx uint32) { e.index = idx }
entries := []*Entry{{key: 10}, {key: 5}, {key: 20}}
tree, _ := stree.BuildFromKeyed(entries)
// Now entries[i].Index() contains its position in the tree
Example ¶
ExampleBuildFromKeyed demonstrates building an S-Tree while tracking the position of each item in the tree structure.
// Create documents with IDs (keys) and associated data.
docs := []*Document{
{ID: 300, Title: "Third"},
{ID: 100, Title: "First"},
{ID: 200, Title: "Second"},
}
// Build the tree. Each document's Position field will be set to its
// index in the tree, enabling O(1) lookup of associated data.
tree, err := BuildFromKeyed(docs)
if err != nil {
panic(err)
}
// After building, we can look up documents by their tree position.
reader, err := NewReader(tree.Data())
if err != nil {
panic(err)
}
// Search returns the same position that was stored in the Document.
pos := reader.Search(200)
fmt.Printf("Position of ID 200: %d\n", pos)
// Find the document with that position.
for _, doc := range docs {
if int(doc.Position) == pos {
fmt.Printf("Found: %s\n", doc.Title)
}
}
Output: Position of ID 200: 1 Found: Second
func (*STree) Data ¶
Data returns the underlying byte slice containing the serialized S-Tree. This can be used directly with mmap or written to a file.
func (*STree) WriteTo ¶
WriteTo writes the S-Tree to an io.Writer. Implements io.WriterTo interface.
Example ¶
ExampleSTree_WriteTo demonstrates serializing an S-Tree to a writer.
values := []uint32{1, 2, 3, 4, 5}
tree, _ := Build(values)
// Write to any io.Writer (file, buffer, network, etc.)
var buf bytes.Buffer
n, err := tree.WriteTo(&buf)
if err != nil {
panic(err)
}
fmt.Printf("Wrote %d bytes\n", n)
// The serialized data can be read back with NewReader.
reader, _ := NewReader(buf.Bytes())
fmt.Printf("Reader count: %d\n", reader.Count())
Output: Wrote 80 bytes Reader count: 5