util

package
v1.0.11 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Apr 24, 2025 License: MIT Imports: 9 Imported by: 0

Documentation

Overview

Package util provides utility components for database implementations that satisfy the db.KVDB interface.

The package contains:

  • statistics: Utility tools for analyzing database characteristics and a SizeHistogram for tracking data size distribution
  • functions: Hash functions and other utility functions
  • mapheap: A priority queue implementation for garbage collection that also supports key-based access
  • lockfreempsc: A lockmgr-free Multi-Producer Single-Consumer (MPSC) queue implementation build for high throughput and low latency

This package is particularly useful for:

  • Database developers implementing the KVDB interface
  • Implementation of garbage collection or other priority queue systems
  • Monitoring systems that need to track database size and distribution metrics

Each component is designed to work with any implementation of the db.KVDB interface, allowing for consistent validation and measurement across different storage backends.

Package util provides a lockmgr-free Multi-Producer Single-Consumer (MPSC) queue implementation.

Features and Guarantees:

  • Lock-Free: atomic operations for high throughput and low latency even under high contention
  • Unbounded Size: the queue can grow to any size as needed, limited only by available memory
  • Small Footprint: minimal memory overhead per item (two pointers per item)
  • Thread-Safe writes: Allows any number of goroutines to safely Push() concurrently
  • Single Consumer: Designed for a single goroutine to consume values (via the Recv() channel).
  • No Strict FIFO Guarantee: Under concurrent Push() operations, the exact ordering of items is determined by which producer completes its operation first, not by which producer started first.

Package util

This file provides a specialized priority queue for garbage collection purposes.

This implementation combines a binary heap with a hash map to provide both efficient priority-based operations and key-based access. It's particularly useful for garbage collection scenarios where items need to be prioritized by age or other metrics, while still allowing direct access to specific items.

Key advantages of this implementation:

1. Time Complexity:

  • O(log n) for priority operations (Push, Pop, Update)
  • O(1) for key-based lookups and existence checks
  • O(log n) for key-based removal

2. Garbage Collection Benefits:

  • Efficiently identifies the oldest/lowest-priority items for collection
  • Supports direct removal when items are manually freed
  • Allows checking if specific items are scheduled for collection
  • Can update priorities when items are accessed (for LRU-like behaviors)

3. Memory Efficiency:

  • Avoids memory leaks by properly cleaning up references
  • Minimizes overhead with a compact representation
  • Scales efficiently for large numbers of objects

4. Concurrency Considerations:

  • Note: This implementation is not thread-safe by default
  • For concurrent use, external synchronization should be applied

Use cases include memory management systems, caches with expiration, connection pools, and any scenario requiring both prioritization and direct access to elements.

Example usage:

// Create a new queue
gcQueue := NewMapHeap()
heap.Init(gcQueue)

// Add items with object IDs and timestamps
gcQueue.AddItem(1001, timestamp1)
gcQueue.AddItem(1002, timestamp2)

// Get the oldest item
oldest, exists := gcQueue.Peek()

// Remove a specific item (e.g., when manually freed)
gcQueue.RemoveByKey(1001)

// Process items in priority order
for gcQueue.Len() > 0 {
    item := heap.Pop(gcQueue).(*item)
    // Process item for garbage collection
}

Package util provides testing, benchmarking, and utility tools for KVDB implementations. This file implements a specialized size histogram for efficient tracking and analysis of data size distributions. The histogram uses exponential bucket sizing to cover a wide range of values (bytes to gigabytes) with minimal memory overhead.

Key features include:

  • Efficient memory usage through bucketing
  • Thread-safe sample addition and querying
  • Statistical estimators (median, percentiles)
  • Distribution analysis capabilities

This utility is particularly useful for database implementations that need to report on data characteristics without performing expensive full scans.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func GenerateSeed

func GenerateSeed() uint64

GenerateSeed creates a more robust random seed for internal hash distribution

Types

type DistributionStats

type DistributionStats struct {
	Stats
	DistributionQuality float64 `json:"distribution_quality"`
}

func NewDistributionStats

func NewDistributionStats(shardSizes []float64) DistributionStats

NewDistributionStats computes quality metrics for value distribution

type LockFreeMPSC

type LockFreeMPSC[T interface{}] struct {
	// contains filtered or unexported fields
}

LockFreeMPSC is a lockmgr-free multi-producer single-consumer queue Implementation uses a linked list of nodes with atomic operations for concurrent push operations without locks

func NewLockFreeMPSC

func NewLockFreeMPSC[T interface{}]() *LockFreeMPSC[T]

NewLockFreeMPSC creates a new lockmgr-free multi-producer single-consumer queue

func (*LockFreeMPSC[T]) Close

func (q *LockFreeMPSC[T]) Close()

Close closes the queue, preventing further writes. Any items already in the queue will still be delivered to the consumer.

func (*LockFreeMPSC[T]) IsClosed

func (q *LockFreeMPSC[T]) IsClosed() bool

IsClosed returns true if the queue is closed.

func (*LockFreeMPSC[T]) Len

func (q *LockFreeMPSC[T]) Len() int

Len returns an approximate count of the number of items in the queue. This is O(n) and should only be used for debugging.

func (*LockFreeMPSC[T]) Push

func (q *LockFreeMPSC[T]) Push(value *T) bool

Push adds an item to the queue. Returns true if the item was added, or false if the queue is closed.

Thread-safety: This method is thread-safe and can be called concurrently.

func (*LockFreeMPSC[T]) Recv

func (q *LockFreeMPSC[T]) Recv() <-chan *T

Recv returns a receive-only channel for consuming from the queue. This allows the queue to be used with the '<-' operator in select statements.

type MapHeap

type MapHeap struct {
	// contains filtered or unexported fields
}

MapHeap implements a priority queue for garbage collection with both heap operations and key-based access

func NewMapHeap

func NewMapHeap() *MapHeap

NewMapHeap creates a new garbage collection queue

func (*MapHeap) AddItem

func (gcq *MapHeap) AddItem(key, priority uint64)

AddItem adds a new item to the queue or updates existing one

func (*MapHeap) Contains

func (gcq *MapHeap) Contains(key uint64) bool

Contains checks if a key exists in the queue

func (*MapHeap) GetByKey

func (gcq *MapHeap) GetByKey(key uint64) (*item, bool)

GetByKey retrieves an item by its key without removing it

func (*MapHeap) Len

func (gcq *MapHeap) Len() int

Len returns the number of items in the queue (part of heap.Interface)

func (*MapHeap) Less

func (gcq *MapHeap) Less(i, j int) bool

Less compares items by value (part of heap.Interface) For GC, typically we want oldest items first (min-heap by timestamp)

func (*MapHeap) Peek

func (gcq *MapHeap) Peek() (*item, bool)

Peek returns the minimum value item without removing it

func (*MapHeap) Pop

func (gcq *MapHeap) Pop() interface{}

Pop removes and returns the minimum item (part of heap.Interface)

func (*MapHeap) Push

func (gcq *MapHeap) Push(x interface{})

Push adds an item to the heap (part of heap.Interface)

func (*MapHeap) RemoveByKey

func (gcq *MapHeap) RemoveByKey(key uint64) (uint64, bool)

RemoveByKey removes an item by its key

func (*MapHeap) Swap

func (gcq *MapHeap) Swap(i, j int)

Swap exchanges items at positions i and j (part of heap.Interface)

type SizeHistogram

type SizeHistogram struct {
	// contains filtered or unexported fields
}

SizeHistogram tracks the distribution of data sizes It organizes sizes into buckets for efficient memory usage while still providing accurate size estimations. Supports tracking values from bytes to multiple gigabytes.

func NewSizeHistogram

func NewSizeHistogram() *SizeHistogram

NewSizeHistogram creates a new size histogram with default bucket boundaries The boundaries are calibrated to handle sizes from bytes to gigabytes

func (*SizeHistogram) AddSample

func (h *SizeHistogram) AddSample(size int)

AddSample adds a size sample to the histogram

Thread-safe: This method is safe for concurrent use

func (*SizeHistogram) AverageSize

func (h *SizeHistogram) AverageSize() int

AverageSize returns the average size across all samples

Thread-safe: This method is safe for concurrent use

func (*SizeHistogram) GetCount

func (h *SizeHistogram) GetCount() int64

GetCount returns the total number of samples

Thread-safe: This method is safe for concurrent use

func (*SizeHistogram) GetPercentileEstimate

func (h *SizeHistogram) GetPercentileEstimate(percentile int) int

GetPercentileEstimate returns an estimate for the given percentile (0-100)

Thread-safe: This method is safe for concurrent use

func (*SizeHistogram) MedianEstimate

func (h *SizeHistogram) MedianEstimate() int

MedianEstimate estimates the median size based on the histogram

Thread-safe: This method is safe for concurrent use

func (*SizeHistogram) Reset

func (h *SizeHistogram) Reset()

Reset clears all histogram data

Thread-safe: This method is safe for concurrent use

func (*SizeHistogram) SizeDistribution

func (h *SizeHistogram) SizeDistribution() ([]int, []float64)

SizeDistribution returns the distribution of samples across buckets Returns two slices: bucket boundaries and the percentage in each bucket

Thread-safe: This method is safe for concurrent use

type Stats

type Stats struct {
	StdDeviation float64 `json:"std_deviation"`
	Min          float64 `json:"min"`
	Max          float64 `json:"max"`
	Mean         float64 `json:"mean"`
	MinMaxRatio  float64 `json:"min_max_ratio"`
}

func NewStats

func NewStats(values []float64) Stats

NewStats computes the standard deviation, minimum, and maximum values from an array of float64 values.

type UintKey

type UintKey uint64

UintKey is an efficient key type based on uint64 for internal hash representation

func HashString

func HashString(s string, seed uint64) UintKey

HashString generates a hash value for a string with a seed This function uses the FNV-1a hash algorithm, which is fast and has good distribution

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL