Documentation
¶
Overview ¶
Package util provides utility components for database implementations that satisfy the db.KVDB interface.
The package contains:
- statistics: Utility tools for analyzing database characteristics and a SizeHistogram for tracking data size distribution
- functions: Hash functions and other utility functions
- mapheap: A priority queue implementation for garbage collection that also supports key-based access
- lockfreempsc: A lockmgr-free Multi-Producer Single-Consumer (MPSC) queue implementation build for high throughput and low latency
This package is particularly useful for:
- Database developers implementing the KVDB interface
- Implementation of garbage collection or other priority queue systems
- Monitoring systems that need to track database size and distribution metrics
Each component is designed to work with any implementation of the db.KVDB interface, allowing for consistent validation and measurement across different storage backends.
Package util provides a lockmgr-free Multi-Producer Single-Consumer (MPSC) queue implementation.
Features and Guarantees:
- Lock-Free: atomic operations for high throughput and low latency even under high contention
- Unbounded Size: the queue can grow to any size as needed, limited only by available memory
- Small Footprint: minimal memory overhead per item (two pointers per item)
- Thread-Safe writes: Allows any number of goroutines to safely Push() concurrently
- Single Consumer: Designed for a single goroutine to consume values (via the Recv() channel).
- No Strict FIFO Guarantee: Under concurrent Push() operations, the exact ordering of items is determined by which producer completes its operation first, not by which producer started first.
Package util ¶
This file provides a specialized priority queue for garbage collection purposes.
This implementation combines a binary heap with a hash map to provide both efficient priority-based operations and key-based access. It's particularly useful for garbage collection scenarios where items need to be prioritized by age or other metrics, while still allowing direct access to specific items.
Key advantages of this implementation:
1. Time Complexity:
- O(log n) for priority operations (Push, Pop, Update)
- O(1) for key-based lookups and existence checks
- O(log n) for key-based removal
2. Garbage Collection Benefits:
- Efficiently identifies the oldest/lowest-priority items for collection
- Supports direct removal when items are manually freed
- Allows checking if specific items are scheduled for collection
- Can update priorities when items are accessed (for LRU-like behaviors)
3. Memory Efficiency:
- Avoids memory leaks by properly cleaning up references
- Minimizes overhead with a compact representation
- Scales efficiently for large numbers of objects
4. Concurrency Considerations:
- Note: This implementation is not thread-safe by default
- For concurrent use, external synchronization should be applied
Use cases include memory management systems, caches with expiration, connection pools, and any scenario requiring both prioritization and direct access to elements.
Example usage:
// Create a new queue
gcQueue := NewMapHeap()
heap.Init(gcQueue)
// Add items with object IDs and timestamps
gcQueue.AddItem(1001, timestamp1)
gcQueue.AddItem(1002, timestamp2)
// Get the oldest item
oldest, exists := gcQueue.Peek()
// Remove a specific item (e.g., when manually freed)
gcQueue.RemoveByKey(1001)
// Process items in priority order
for gcQueue.Len() > 0 {
item := heap.Pop(gcQueue).(*item)
// Process item for garbage collection
}
Package util provides testing, benchmarking, and utility tools for KVDB implementations. This file implements a specialized size histogram for efficient tracking and analysis of data size distributions. The histogram uses exponential bucket sizing to cover a wide range of values (bytes to gigabytes) with minimal memory overhead.
Key features include:
- Efficient memory usage through bucketing
- Thread-safe sample addition and querying
- Statistical estimators (median, percentiles)
- Distribution analysis capabilities
This utility is particularly useful for database implementations that need to report on data characteristics without performing expensive full scans.
Index ¶
- func GenerateSeed() uint64
- type DistributionStats
- type LockFreeMPSC
- type MapHeap
- func (gcq *MapHeap) AddItem(key, priority uint64)
- func (gcq *MapHeap) Contains(key uint64) bool
- func (gcq *MapHeap) GetByKey(key uint64) (*item, bool)
- func (gcq *MapHeap) Len() int
- func (gcq *MapHeap) Less(i, j int) bool
- func (gcq *MapHeap) Peek() (*item, bool)
- func (gcq *MapHeap) Pop() interface{}
- func (gcq *MapHeap) Push(x interface{})
- func (gcq *MapHeap) RemoveByKey(key uint64) (uint64, bool)
- func (gcq *MapHeap) Swap(i, j int)
- type SizeHistogram
- func (h *SizeHistogram) AddSample(size int)
- func (h *SizeHistogram) AverageSize() int
- func (h *SizeHistogram) GetCount() int64
- func (h *SizeHistogram) GetPercentileEstimate(percentile int) int
- func (h *SizeHistogram) MedianEstimate() int
- func (h *SizeHistogram) Reset()
- func (h *SizeHistogram) SizeDistribution() ([]int, []float64)
- type Stats
- type UintKey
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func GenerateSeed ¶
func GenerateSeed() uint64
GenerateSeed creates a more robust random seed for internal hash distribution
Types ¶
type DistributionStats ¶
func NewDistributionStats ¶
func NewDistributionStats(shardSizes []float64) DistributionStats
NewDistributionStats computes quality metrics for value distribution
type LockFreeMPSC ¶
type LockFreeMPSC[T interface{}] struct {
// contains filtered or unexported fields
}
LockFreeMPSC is a lockmgr-free multi-producer single-consumer queue Implementation uses a linked list of nodes with atomic operations for concurrent push operations without locks
func NewLockFreeMPSC ¶
func NewLockFreeMPSC[T interface{}]() *LockFreeMPSC[T]
NewLockFreeMPSC creates a new lockmgr-free multi-producer single-consumer queue
func (*LockFreeMPSC[T]) Close ¶
func (q *LockFreeMPSC[T]) Close()
Close closes the queue, preventing further writes. Any items already in the queue will still be delivered to the consumer.
func (*LockFreeMPSC[T]) IsClosed ¶
func (q *LockFreeMPSC[T]) IsClosed() bool
IsClosed returns true if the queue is closed.
func (*LockFreeMPSC[T]) Len ¶
func (q *LockFreeMPSC[T]) Len() int
Len returns an approximate count of the number of items in the queue. This is O(n) and should only be used for debugging.
func (*LockFreeMPSC[T]) Push ¶
func (q *LockFreeMPSC[T]) Push(value *T) bool
Push adds an item to the queue. Returns true if the item was added, or false if the queue is closed.
Thread-safety: This method is thread-safe and can be called concurrently.
func (*LockFreeMPSC[T]) Recv ¶
func (q *LockFreeMPSC[T]) Recv() <-chan *T
Recv returns a receive-only channel for consuming from the queue. This allows the queue to be used with the '<-' operator in select statements.
type MapHeap ¶
type MapHeap struct {
// contains filtered or unexported fields
}
MapHeap implements a priority queue for garbage collection with both heap operations and key-based access
func (*MapHeap) Less ¶
Less compares items by value (part of heap.Interface) For GC, typically we want oldest items first (min-heap by timestamp)
func (*MapHeap) Pop ¶
func (gcq *MapHeap) Pop() interface{}
Pop removes and returns the minimum item (part of heap.Interface)
func (*MapHeap) Push ¶
func (gcq *MapHeap) Push(x interface{})
Push adds an item to the heap (part of heap.Interface)
func (*MapHeap) RemoveByKey ¶
RemoveByKey removes an item by its key
type SizeHistogram ¶
type SizeHistogram struct {
// contains filtered or unexported fields
}
SizeHistogram tracks the distribution of data sizes It organizes sizes into buckets for efficient memory usage while still providing accurate size estimations. Supports tracking values from bytes to multiple gigabytes.
func NewSizeHistogram ¶
func NewSizeHistogram() *SizeHistogram
NewSizeHistogram creates a new size histogram with default bucket boundaries The boundaries are calibrated to handle sizes from bytes to gigabytes
func (*SizeHistogram) AddSample ¶
func (h *SizeHistogram) AddSample(size int)
AddSample adds a size sample to the histogram
Thread-safe: This method is safe for concurrent use
func (*SizeHistogram) AverageSize ¶
func (h *SizeHistogram) AverageSize() int
AverageSize returns the average size across all samples
Thread-safe: This method is safe for concurrent use
func (*SizeHistogram) GetCount ¶
func (h *SizeHistogram) GetCount() int64
GetCount returns the total number of samples
Thread-safe: This method is safe for concurrent use
func (*SizeHistogram) GetPercentileEstimate ¶
func (h *SizeHistogram) GetPercentileEstimate(percentile int) int
GetPercentileEstimate returns an estimate for the given percentile (0-100)
Thread-safe: This method is safe for concurrent use
func (*SizeHistogram) MedianEstimate ¶
func (h *SizeHistogram) MedianEstimate() int
MedianEstimate estimates the median size based on the histogram
Thread-safe: This method is safe for concurrent use
func (*SizeHistogram) Reset ¶
func (h *SizeHistogram) Reset()
Reset clears all histogram data
Thread-safe: This method is safe for concurrent use
func (*SizeHistogram) SizeDistribution ¶
func (h *SizeHistogram) SizeDistribution() ([]int, []float64)
SizeDistribution returns the distribution of samples across buckets Returns two slices: bucket boundaries and the percentage in each bucket
Thread-safe: This method is safe for concurrent use
type Stats ¶
type UintKey ¶
type UintKey uint64
UintKey is an efficient key type based on uint64 for internal hash representation
func HashString ¶
HashString generates a hash value for a string with a seed This function uses the FNV-1a hash algorithm, which is fast and has good distribution