pruning

package
v0.0.0-...-706b979
Warning

This package is not in the latest version of its module.

Published: May 4, 2026 License: AGPL-3.0 Imports: 13 Imported by: 0

Documentation

Constants

const GlobCacheTTL = 30 * time.Second

GlobCacheTTL is the default TTL for glob result caching (30 seconds)

const PartitionCacheTTL = 60 * time.Second

PartitionCacheTTL is the default TTL for partition path caching (60 seconds)

Variables

This section is empty.

Functions

This section is empty.

Types

type PartitionPruner

type PartitionPruner struct {
	// contains filtered or unexported fields
}

PartitionPruner provides file-level partition pruning to skip reading Parquet files that don't match the query's WHERE clause predicates.

Key optimization: instead of reading every file matched by a /**/*.parquet glob, the pruner extracts the time range from the WHERE clause and builds a targeted file list.

Example:

Query: SELECT * FROM cpu WHERE time >= '2024-03-15' AND time < '2024-03-16'
Before: Read all 8,760 hour partitions (1 year)
After: Read only 24 hour partitions (1 day)
Result: 365x fewer files, 10-100x faster
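
The partition counts in the example above follow from simple hour arithmetic. The sketch below reproduces them; hourPartitions and mustDay are hypothetical helpers written for this illustration, not part of the package, and the year is assumed to be a non-leap year.

```go
package main

import (
	"fmt"
	"time"
)

// hourPartitions counts the hourly partitions touched by the half-open
// interval [start, end). Hypothetical helper, not part of the package API.
func hourPartitions(start, end time.Time) int {
	if !end.After(start) {
		return 0
	}
	first := start.UTC().Truncate(time.Hour)
	// Round the end up to the next hour boundary so that a partially
	// covered hour still counts as one partition.
	last := end.UTC().Add(time.Hour - time.Nanosecond).Truncate(time.Hour)
	return int(last.Sub(first) / time.Hour)
}

// mustDay parses a YYYY-MM-DD date as midnight UTC and panics on bad input.
func mustDay(s string) time.Time {
	t, err := time.Parse("2006-01-02", s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	year := hourPartitions(mustDay("2023-01-01"), mustDay("2024-01-01"))
	day := hourPartitions(mustDay("2024-03-15"), mustDay("2024-03-16"))
	fmt.Println(year, day, year/day) // 8760 24 365
}
```

The 365x figure is the ratio of the two counts. The 10-100x speedup quoted above is the documentation's own claim and is not derived here.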

func NewPartitionPruner

func NewPartitionPruner(logger zerolog.Logger) *PartitionPruner

NewPartitionPruner creates a new partition pruner

func (*PartitionPruner) CleanupGlobCache

func (p *PartitionPruner) CleanupGlobCache() int

CleanupGlobCache removes expired entries from the glob cache

func (*PartitionPruner) CleanupPartitionCache

func (p *PartitionPruner) CleanupPartitionCache() int

CleanupPartitionCache removes expired entries from the partition cache
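
The pruner's caches are unexported, so the following is only a sketch of how TTL-based cleanup could work. It assumes that the int returned by the cleanup methods is the number of entries removed, which the documentation does not state; ttlCache, entry, and simulate are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// globCacheTTL matches the documented GlobCacheTTL value.
const globCacheTTL = 30 * time.Second

// entry is one cached glob result with its expiry time.
type entry struct {
	paths     []string
	expiresAt time.Time
}

// ttlCache is a hypothetical stand-in for the pruner's unexported caches.
type ttlCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]entry
}

func newTTLCache(ttl time.Duration) *ttlCache {
	return &ttlCache{ttl: ttl, entries: make(map[string]entry)}
}

// put stores paths under key, expiring ttl after now.
func (c *ttlCache) put(key string, paths []string, now time.Time) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[key] = entry{paths: paths, expiresAt: now.Add(c.ttl)}
}

// cleanup deletes expired entries and reports how many were removed.
func (c *ttlCache) cleanup(now time.Time) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	removed := 0
	for k, e := range c.entries {
		if !now.Before(e.expiresAt) {
			delete(c.entries, k)
			removed++
		}
	}
	return removed
}

// simulate inserts two entries 20 seconds apart and runs cleanup 35 seconds
// after the first insert, so only the first entry has expired.
func simulate() (removed, remaining int) {
	c := newTTLCache(globCacheTTL)
	t0 := time.Unix(0, 0)
	c.put("cpu/**/*.parquet", []string{"a.parquet"}, t0)
	c.put("mem/**/*.parquet", []string{"b.parquet"}, t0.Add(20*time.Second))
	removed = c.cleanup(t0.Add(35 * time.Second))
	return removed, len(c.entries)
}

func main() {
	removed, remaining := simulate()
	fmt.Println(removed, remaining) // 1 1
}
```

Passing the clock in as a parameter keeps the sketch deterministic; a real cache would more likely call time.Now internally.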

func (*PartitionPruner) ExtractTimeRange

func (p *PartitionPruner) ExtractTimeRange(sqlStr string) *TimeRange

ExtractTimeRange extracts time range from WHERE clause

Supports patterns like:

  - time >= '2024-03-15' AND time < '2024-03-16'
  - time BETWEEN '2024-03-15' AND '2024-03-16'
  - time > '2024-03-15 10:00:00'
  - time >= '2024-03-15T10:00:00Z'
  - time > NOW() - INTERVAL '20 days'
  - time >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
  - time < NOW() + INTERVAL '1 week'
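
As a rough illustration of how such extraction might be done, the sketch below handles only the simplest case: one lower bound and one upper bound against absolute literals. It is not the package's implementation. The regular expressions, extractTimeRange, and parseLiteral are assumptions made for this example, and BETWEEN, one-sided bounds, and NOW()/INTERVAL expressions are deliberately left out.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// TimeRange mirrors the exported struct documented in this package.
type TimeRange struct {
	Start time.Time
	End   time.Time
}

// Hypothetical patterns: a lower bound (time > or >=) and an upper bound
// (time < or <=) compared against a single-quoted literal.
var (
	lowerRe = regexp.MustCompile(`(?i)\btime\s*>=?\s*'([^']+)'`)
	upperRe = regexp.MustCompile(`(?i)\btime\s*<=?\s*'([^']+)'`)
)

// layouts covers the absolute literal formats listed above.
var layouts = []string{time.RFC3339, "2006-01-02 15:04:05", "2006-01-02"}

// parseLiteral tries each layout in turn and returns the first match in UTC.
func parseLiteral(s string) (time.Time, bool) {
	for _, layout := range layouts {
		if t, err := time.Parse(layout, s); err == nil {
			return t.UTC(), true
		}
	}
	return time.Time{}, false
}

// extractTimeRange returns nil unless both bounds are present and parsable.
func extractTimeRange(sql string) *TimeRange {
	lo := lowerRe.FindStringSubmatch(sql)
	hi := upperRe.FindStringSubmatch(sql)
	if lo == nil || hi == nil {
		return nil
	}
	start, okStart := parseLiteral(lo[1])
	end, okEnd := parseLiteral(hi[1])
	if !okStart || !okEnd {
		return nil
	}
	return &TimeRange{Start: start, End: end}
}

func main() {
	tr := extractTimeRange("SELECT * FROM cpu WHERE time >= '2024-03-15' AND time < '2024-03-16'")
	if tr == nil {
		fmt.Println("no time range found")
		return
	}
	fmt.Println(tr.Start.Format(time.RFC3339), tr.End.Format(time.RFC3339))
}
```

Returning nil when no range can be extracted matches the pointer return type of ExtractTimeRange, though whether the real method uses nil this way is an assumption.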

func (*PartitionPruner) GeneratePartitionPaths

func (p *PartitionPruner) GeneratePartitionPaths(basePath, database, measurement string, timeRange *TimeRange) []string

GeneratePartitionPaths generates list of partition paths for the given time range

Arc's partition structure:

	{database}/{measurement}/{year}/{month}/{day}/{hour}/file.parquet

Example:

	default/cpu/2024/03/15/14/cpu_20240315_140000_1000.parquet
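
A minimal sketch of walking a time range hour by hour and formatting each step into this layout is shown below. hourlyGlobs and mustParse are hypothetical helpers, and ending each path with a *.parquet glob is an assumption; the real method may return directories or concrete file names instead.

```go
package main

import (
	"fmt"
	"time"
)

// hourlyGlobs returns one glob per hour partition overlapping [start, end).
// Hypothetical helper that follows the documented directory layout.
func hourlyGlobs(basePath, database, measurement string, start, end time.Time) []string {
	var paths []string
	for t := start.UTC().Truncate(time.Hour); t.Before(end); t = t.Add(time.Hour) {
		paths = append(paths, fmt.Sprintf("%s/%s/%s/%04d/%02d/%02d/%02d/*.parquet",
			basePath, database, measurement,
			t.Year(), int(t.Month()), t.Day(), t.Hour()))
	}
	return paths
}

// mustParse parses an RFC 3339 timestamp and panics on bad input.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	start := mustParse("2024-03-15T14:00:00Z")
	end := mustParse("2024-03-15T16:00:00Z")
	for _, p := range hourlyGlobs("/data", "default", "cpu", start, end) {
		fmt.Println(p)
	}
}
```

Zero-padded month, day, and hour components keep the generated paths consistent with the example path above.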

func (*PartitionPruner) GetAllCacheStats

func (p *PartitionPruner) GetAllCacheStats() map[string]interface{}

GetAllCacheStats returns combined statistics for all caches

func (*PartitionPruner) GetGlobCacheStats

func (p *PartitionPruner) GetGlobCacheStats() map[string]interface{}

GetGlobCacheStats returns glob cache statistics

func (*PartitionPruner) GetPartitionCacheStats

func (p *PartitionPruner) GetPartitionCacheStats() map[string]interface{}

GetPartitionCacheStats returns partition cache statistics

func (*PartitionPruner) GetStats

func (p *PartitionPruner) GetStats() StatsSnapshot

GetStats returns a snapshot of pruning statistics (thread-safe)

func (*PartitionPruner) InvalidateAllCaches

func (p *PartitionPruner) InvalidateAllCaches()

InvalidateAllCaches clears both glob and partition caches. Call this after compaction or when new partitions are created.

func (*PartitionPruner) InvalidateGlobCache

func (p *PartitionPruner) InvalidateGlobCache()

InvalidateGlobCache clears the glob cache. Call this after compaction or when new partitions are created.

func (*PartitionPruner) InvalidatePartitionCache

func (p *PartitionPruner) InvalidatePartitionCache()

InvalidatePartitionCache clears the partition path cache. Call this after compaction or when new partitions are created.

func (*PartitionPruner) OptimizeTablePath

func (p *PartitionPruner) OptimizeTablePath(originalPath, sql string) (interface{}, bool)

OptimizeTablePath optimizes a table path based on WHERE clause predicates

Returns the optimized path (string or []string) and whether optimization was applied
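
Because the first result is an interface{} that may hold either a string or a []string, callers need a type switch before using it. The sketch below shows one way to normalize both shapes into a slice; toPathList is a hypothetical helper, not part of the package.

```go
package main

import "fmt"

// toPathList normalizes the two documented result shapes into a slice.
// Any other dynamic type is reported as an error.
func toPathList(optimized interface{}) ([]string, error) {
	switch v := optimized.(type) {
	case string:
		return []string{v}, nil
	case []string:
		return v, nil
	default:
		return nil, fmt.Errorf("unexpected path type %T", optimized)
	}
}

func main() {
	single, _ := toPathList("default/cpu/**/*.parquet")
	many, _ := toPathList([]string{
		"default/cpu/2024/03/15/14/*.parquet",
		"default/cpu/2024/03/15/15/*.parquet",
	})
	fmt.Println(len(single), len(many)) // 1 2
}
```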

func (*PartitionPruner) ResetStats

func (p *PartitionPruner) ResetStats()

ResetStats resets statistics counters (thread-safe)

func (*PartitionPruner) SetStorageBackend

func (p *PartitionPruner) SetStorageBackend(backend storage.Backend)

SetStorageBackend sets the storage backend for remote path validation (S3/Azure). When set, the pruner will filter out non-existent partition paths before returning them.
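
The filtering step described above might look like the sketch below. The storage.Backend interface is not shown in this documentation, so existsChecker and its Exists method are hypothetical, as are mapBackend and filterExisting.

```go
package main

import "fmt"

// existsChecker is a hypothetical slice of a storage backend; the real
// storage.Backend interface is not shown in this documentation.
type existsChecker interface {
	Exists(path string) bool
}

// mapBackend is an in-memory fake used only for this sketch.
type mapBackend map[string]bool

func (m mapBackend) Exists(path string) bool { return m[path] }

// filterExisting keeps the candidates that the backend reports as present.
// With a nil backend it returns the candidates unchanged.
func filterExisting(b existsChecker, candidates []string) []string {
	if b == nil {
		return candidates
	}
	kept := make([]string, 0, len(candidates))
	for _, p := range candidates {
		if b.Exists(p) {
			kept = append(kept, p)
		}
	}
	return kept
}

func main() {
	backend := mapBackend{"default/cpu/2024/03/15/14": true}
	candidates := []string{
		"default/cpu/2024/03/15/13",
		"default/cpu/2024/03/15/14",
	}
	fmt.Println(filterExisting(backend, candidates))
}
```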

type PrunerStats

type PrunerStats struct {
	QueriesOptimized atomic.Int64
}

PrunerStats tracks partition pruning statistics using atomic counters for thread-safety

type StatsSnapshot

type StatsSnapshot struct {
	QueriesOptimized int64
}

StatsSnapshot holds a point-in-time snapshot of pruner statistics
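
The relationship between the two stats types can be sketched as follows: the live counters are atomic, and a snapshot copies their current values into plain fields. The type definitions repeat the ones documented above; snapshot and countConcurrently are hypothetical helpers, and atomic.Int64 requires Go 1.19 or later.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// PrunerStats and StatsSnapshot repeat the documented type definitions.
type PrunerStats struct {
	QueriesOptimized atomic.Int64
}

type StatsSnapshot struct {
	QueriesOptimized int64
}

// snapshot copies the current counter value into a plain struct that is
// safe to pass around and compare.
func snapshot(s *PrunerStats) StatsSnapshot {
	return StatsSnapshot{QueriesOptimized: s.QueriesOptimized.Load()}
}

// countConcurrently increments the counter from n goroutines and returns
// a snapshot once all of them have finished.
func countConcurrently(n int) StatsSnapshot {
	var stats PrunerStats
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			stats.QueriesOptimized.Add(1)
		}()
	}
	wg.Wait()
	return snapshot(&stats)
}

func main() {
	fmt.Println(countConcurrently(100).QueriesOptimized) // 100
}
```

A struct containing atomic.Int64 should not be copied after first use, which is one reason to expose a separate snapshot type with a plain int64 field.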

type TimeRange

type TimeRange struct {
	Start time.Time
	End   time.Time
}

TimeRange represents a time range filter
