util

package

v1.0.11 Latest Latest Go to latest Published: Apr 24, 2025 License: MIT Imports: 9 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/ValentinKolb/dKV

Links

Open Source Insights

Documentation ¶

Overview ¶

Package util provides utility components for database implementations that satisfy the db.KVDB interface.

The package contains:

statistics: Utility tools for analyzing database characteristics and a SizeHistogram for tracking data size distribution
functions: Hash functions and other utility functions
mapheap: A priority queue implementation for garbage collection that also supports key-based access
lockfreempsc: A lockmgr-free Multi-Producer Single-Consumer (MPSC) queue implementation build for high throughput and low latency

This package is particularly useful for:

Database developers implementing the KVDB interface
Implementation of garbage collection or other priority queue systems
Monitoring systems that need to track database size and distribution metrics

Each component is designed to work with any implementation of the db.KVDB interface, allowing for consistent validation and measurement across different storage backends.

Package util provides a lockmgr-free Multi-Producer Single-Consumer (MPSC) queue implementation.

Features and Guarantees:

Lock-Free: atomic operations for high throughput and low latency even under high contention
Unbounded Size: the queue can grow to any size as needed, limited only by available memory
Small Footprint: minimal memory overhead per item (two pointers per item)
Thread-Safe writes: Allows any number of goroutines to safely Push() concurrently
Single Consumer: Designed for a single goroutine to consume values (via the Recv() channel).
No Strict FIFO Guarantee: Under concurrent Push() operations, the exact ordering of items is determined by which producer completes its operation first, not by which producer started first.

Package util ¶

This file provides a specialized priority queue for garbage collection purposes.

This implementation combines a binary heap with a hash map to provide both efficient priority-based operations and key-based access. It's particularly useful for garbage collection scenarios where items need to be prioritized by age or other metrics, while still allowing direct access to specific items.

Key advantages of this implementation:

1. Time Complexity:

O(log n) for priority operations (Push, Pop, Update)
O(1) for key-based lookups and existence checks
O(log n) for key-based removal

2. Garbage Collection Benefits:

Efficiently identifies the oldest/lowest-priority items for collection
Supports direct removal when items are manually freed
Allows checking if specific items are scheduled for collection
Can update priorities when items are accessed (for LRU-like behaviors)

3. Memory Efficiency:

Avoids memory leaks by properly cleaning up references
Minimizes overhead with a compact representation
Scales efficiently for large numbers of objects

4. Concurrency Considerations:

Note: This implementation is not thread-safe by default
For concurrent use, external synchronization should be applied

Use cases include memory management systems, caches with expiration, connection pools, and any scenario requiring both prioritization and direct access to elements.

Example usage:

// Create a new queue
gcQueue := NewMapHeap()
heap.Init(gcQueue)

// Add items with object IDs and timestamps
gcQueue.AddItem(1001, timestamp1)
gcQueue.AddItem(1002, timestamp2)

// Get the oldest item
oldest, exists := gcQueue.Peek()

// Remove a specific item (e.g., when manually freed)
gcQueue.RemoveByKey(1001)

// Process items in priority order
for gcQueue.Len() > 0 {
    item := heap.Pop(gcQueue).(*item)
    // Process item for garbage collection
}

Package util provides testing, benchmarking, and utility tools for KVDB implementations. This file implements a specialized size histogram for efficient tracking and analysis of data size distributions. The histogram uses exponential bucket sizing to cover a wide range of values (bytes to gigabytes) with minimal memory overhead.

Key features include:

Efficient memory usage through bucketing
Thread-safe sample addition and querying
Statistical estimators (median, percentiles)
Distribution analysis capabilities

This utility is particularly useful for database implementations that need to report on data characteristics without performing expensive full scans.

Index ¶

func GenerateSeed() uint64
type DistributionStats
- func NewDistributionStats(shardSizes []float64) DistributionStats
type LockFreeMPSC
- func NewLockFreeMPSC[T interface{}]() *LockFreeMPSC[T]
type MapHeap
- func NewMapHeap() *MapHeap
type SizeHistogram
- func NewSizeHistogram() *SizeHistogram
type Stats
- func NewStats(values []float64) Stats
type UintKey
- func HashString(s string, seed uint64) UintKey

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func GenerateSeed ¶

func GenerateSeed() uint64

GenerateSeed creates a more robust random seed for internal hash distribution

Types ¶

type DistributionStats ¶

type DistributionStats struct {
	Stats
	DistributionQuality float64 `json:"distribution_quality"`
}

func NewDistributionStats ¶

func NewDistributionStats(shardSizes []float64) DistributionStats

NewDistributionStats computes quality metrics for value distribution

type LockFreeMPSC ¶

type LockFreeMPSC[T interface{}] struct {
	// contains filtered or unexported fields
}

LockFreeMPSC is a lockmgr-free multi-producer single-consumer queue Implementation uses a linked list of nodes with atomic operations for concurrent push operations without locks

func NewLockFreeMPSC ¶

func NewLockFreeMPSC[T interface{}]() *LockFreeMPSC[T]

NewLockFreeMPSC creates a new lockmgr-free multi-producer single-consumer queue

func (*LockFreeMPSC[T]) Close ¶

func (q *LockFreeMPSC[T]) Close()

Close closes the queue, preventing further writes. Any items already in the queue will still be delivered to the consumer.

func (*LockFreeMPSC[T]) IsClosed ¶

func (q *LockFreeMPSC[T]) IsClosed() bool

IsClosed returns true if the queue is closed.

func (*LockFreeMPSC[T]) Len ¶

func (q *LockFreeMPSC[T]) Len() int

Len returns an approximate count of the number of items in the queue. This is O(n) and should only be used for debugging.

func (*LockFreeMPSC[T]) Push ¶

func (q *LockFreeMPSC[T]) Push(value *T) bool

Push adds an item to the queue. Returns true if the item was added, or false if the queue is closed.

Thread-safety: This method is thread-safe and can be called concurrently.

func (*LockFreeMPSC[T]) Recv ¶

func (q *LockFreeMPSC[T]) Recv() <-chan *T

Recv returns a receive-only channel for consuming from the queue. This allows the queue to be used with the '<-' operator in select statements.

type MapHeap ¶

type MapHeap struct {
	// contains filtered or unexported fields
}

MapHeap implements a priority queue for garbage collection with both heap operations and key-based access

func NewMapHeap ¶

func NewMapHeap() *MapHeap

NewMapHeap creates a new garbage collection queue

func (*MapHeap) AddItem ¶

func (gcq *MapHeap) AddItem(key, priority uint64)

AddItem adds a new item to the queue or updates existing one

func (*MapHeap) Contains ¶

func (gcq *MapHeap) Contains(key uint64) bool

Contains checks if a key exists in the queue

func (*MapHeap) GetByKey ¶

func (gcq *MapHeap) GetByKey(key uint64) (*item, bool)

GetByKey retrieves an item by its key without removing it

func (*MapHeap) Len ¶

func (gcq *MapHeap) Len() int

Len returns the number of items in the queue (part of heap.Interface)

func (*MapHeap) Less ¶

func (gcq *MapHeap) Less(i, j int) bool

Less compares items by value (part of heap.Interface) For GC, typically we want oldest items first (min-heap by timestamp)

func (*MapHeap) Peek ¶

func (gcq *MapHeap) Peek() (*item, bool)

Peek returns the minimum value item without removing it

func (*MapHeap) Pop ¶

func (gcq *MapHeap) Pop() interface{}

Pop removes and returns the minimum item (part of heap.Interface)

func (*MapHeap) Push ¶

func (gcq *MapHeap) Push(x interface{})

Push adds an item to the heap (part of heap.Interface)

func (*MapHeap) RemoveByKey ¶

func (gcq *MapHeap) RemoveByKey(key uint64) (uint64, bool)

RemoveByKey removes an item by its key

func (*MapHeap) Swap ¶

func (gcq *MapHeap) Swap(i, j int)

Swap exchanges items at positions i and j (part of heap.Interface)

type SizeHistogram ¶

type SizeHistogram struct {
	// contains filtered or unexported fields
}

SizeHistogram tracks the distribution of data sizes It organizes sizes into buckets for efficient memory usage while still providing accurate size estimations. Supports tracking values from bytes to multiple gigabytes.

func NewSizeHistogram ¶

func NewSizeHistogram() *SizeHistogram

NewSizeHistogram creates a new size histogram with default bucket boundaries The boundaries are calibrated to handle sizes from bytes to gigabytes