Documentation ¶
Index ¶
- Constants
- type PartitionPruner
- func NewPartitionPruner(logger zerolog.Logger) *PartitionPruner
- func (p *PartitionPruner) CleanupGlobCache() int
- func (p *PartitionPruner) CleanupPartitionCache() int
- func (p *PartitionPruner) ExtractTimeRange(sqlStr string) *TimeRange
- func (p *PartitionPruner) GeneratePartitionPaths(basePath, database, measurement string, timeRange *TimeRange) []string
- func (p *PartitionPruner) GetAllCacheStats() map[string]interface{}
- func (p *PartitionPruner) GetGlobCacheStats() map[string]interface{}
- func (p *PartitionPruner) GetPartitionCacheStats() map[string]interface{}
- func (p *PartitionPruner) GetStats() StatsSnapshot
- func (p *PartitionPruner) InvalidateAllCaches()
- func (p *PartitionPruner) InvalidateGlobCache()
- func (p *PartitionPruner) InvalidatePartitionCache()
- func (p *PartitionPruner) OptimizeTablePath(originalPath, sql string) (interface{}, bool)
- func (p *PartitionPruner) ResetStats()
- func (p *PartitionPruner) SetStorageBackend(backend storage.Backend)
- type PrunerStats
- type StatsSnapshot
- type TimeRange
Constants ¶
const GlobCacheTTL = 30 * time.Second
GlobCacheTTL is the default time-to-live for cached glob results (30 seconds).
const PartitionCacheTTL = 60 * time.Second
PartitionCacheTTL is the default time-to-live for cached partition paths (60 seconds).
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type PartitionPruner ¶
type PartitionPruner struct {
// contains filtered or unexported fields
}
PartitionPruner provides file-level partition pruning to skip reading Parquet files that don't match the query's WHERE clause predicates.
Key optimization: instead of reading every file matched by a /**/*.parquet glob, the pruner extracts the time range from the WHERE clause and builds a targeted file list.
Example:
Query:  SELECT * FROM cpu WHERE time >= '2024-03-15' AND time < '2024-03-16'
Before: Read all 8,760 hour partitions (1 year)
After:  Read only 24 hour partitions (1 day)
Result: 365x fewer files, 10-100x faster
func NewPartitionPruner ¶
func NewPartitionPruner(logger zerolog.Logger) *PartitionPruner
NewPartitionPruner creates a new PartitionPruner that uses the given logger.
func (*PartitionPruner) CleanupGlobCache ¶
func (p *PartitionPruner) CleanupGlobCache() int
CleanupGlobCache removes expired entries from the glob cache
func (*PartitionPruner) CleanupPartitionCache ¶
func (p *PartitionPruner) CleanupPartitionCache() int
CleanupPartitionCache removes expired entries from the partition cache
func (*PartitionPruner) ExtractTimeRange ¶
func (p *PartitionPruner) ExtractTimeRange(sqlStr string) *TimeRange
ExtractTimeRange extracts the time range from the WHERE clause of sqlStr.
Supports patterns like:
- time >= '2024-03-15' AND time < '2024-03-16'
- time BETWEEN '2024-03-15' AND '2024-03-16'
- time > '2024-03-15 10:00:00'
- time >= '2024-03-15T10:00:00Z'
- time > NOW() - INTERVAL '20 days'
- time >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
- time < NOW() + INTERVAL '1 week'
func (*PartitionPruner) GeneratePartitionPaths ¶
func (p *PartitionPruner) GeneratePartitionPaths(basePath, database, measurement string, timeRange *TimeRange) []string
GeneratePartitionPaths generates the list of partition paths that cover the given time range.
Arc's partition structure:
{database}/{measurement}/{year}/{month}/{day}/{hour}/file.parquet
Example:
default/cpu/2024/03/15/14/cpu_20240315_140000_1000.parquet
func (*PartitionPruner) GetAllCacheStats ¶
func (p *PartitionPruner) GetAllCacheStats() map[string]interface{}
GetAllCacheStats returns combined statistics for all caches
func (*PartitionPruner) GetGlobCacheStats ¶
func (p *PartitionPruner) GetGlobCacheStats() map[string]interface{}
GetGlobCacheStats returns glob cache statistics
func (*PartitionPruner) GetPartitionCacheStats ¶
func (p *PartitionPruner) GetPartitionCacheStats() map[string]interface{}
GetPartitionCacheStats returns partition cache statistics
func (*PartitionPruner) GetStats ¶
func (p *PartitionPruner) GetStats() StatsSnapshot
GetStats returns a snapshot of pruning statistics (thread-safe)
func (*PartitionPruner) InvalidateAllCaches ¶
func (p *PartitionPruner) InvalidateAllCaches()
InvalidateAllCaches clears both the glob and partition caches. Call this after compaction or when new partitions are created.
func (*PartitionPruner) InvalidateGlobCache ¶
func (p *PartitionPruner) InvalidateGlobCache()
InvalidateGlobCache clears the glob cache. Call this after compaction or when new partitions are created.
func (*PartitionPruner) InvalidatePartitionCache ¶
func (p *PartitionPruner) InvalidatePartitionCache()
InvalidatePartitionCache clears the partition path cache. Call this after compaction or when new partitions are created.
func (*PartitionPruner) OptimizeTablePath ¶
func (p *PartitionPruner) OptimizeTablePath(originalPath, sql string) (interface{}, bool)
OptimizeTablePath optimizes a table path based on the WHERE clause predicates in sql.
It returns the optimized path (a string or a []string) and a bool reporting whether optimization was applied.
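Because the first return value is an interface{} holding either a string or a []string, callers need a type switch before using it. The sketch below shows one way to normalize the result to a slice. pathsFromResult is a hypothetical helper, and falling back to the original path when optimization was not applied (or the type is unexpected) is an assumed policy.

```go
package main

import "fmt"

// pathsFromResult normalizes an OptimizeTablePath-style result to []string.
// Hypothetical helper: falls back to originalPath when no optimization was
// applied or the dynamic type is not one of the two documented ones.
func pathsFromResult(result interface{}, optimized bool, originalPath string) []string {
	if !optimized {
		return []string{originalPath}
	}
	switch v := result.(type) {
	case string:
		return []string{v}
	case []string:
		return v
	default:
		return []string{originalPath}
	}
}

func main() {
	fmt.Println(pathsFromResult([]string{"a/*.parquet", "b/*.parquet"}, true, "all/**/*.parquet"))
	fmt.Println(pathsFromResult("a/*.parquet", true, "all/**/*.parquet"))
	fmt.Println(pathsFromResult(nil, false, "all/**/*.parquet"))
}
```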
func (*PartitionPruner) ResetStats ¶
func (p *PartitionPruner) ResetStats()
ResetStats resets statistics counters (thread-safe)
func (*PartitionPruner) SetStorageBackend ¶
func (p *PartitionPruner) SetStorageBackend(backend storage.Backend)
SetStorageBackend sets the storage backend for remote path validation (S3/Azure). When set, the pruner will filter out non-existent partition paths before returning them.
type PrunerStats ¶
PrunerStats tracks partition pruning statistics using atomic counters for thread safety.
type StatsSnapshot ¶
type StatsSnapshot struct {
QueriesOptimized int64
}
StatsSnapshot holds a point-in-time snapshot of pruner statistics.
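The split between PrunerStats (atomic counters) and StatsSnapshot (plain values) is a common pattern: writers update counters without locks, and readers copy them into an ordinary struct. The following is a minimal sketch of that pattern, assuming a single counter. The counters and snapshot types and their methods are hypothetical; only the QueriesOptimized field name comes from this page.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// counters is a hypothetical stand-in for PrunerStats.
type counters struct {
	queriesOptimized int64 // accessed only through sync/atomic
}

// snapshot mirrors the shape of StatsSnapshot shown above.
type snapshot struct {
	QueriesOptimized int64
}

// RecordOptimized increments the counter atomically.
func (c *counters) RecordOptimized() {
	atomic.AddInt64(&c.queriesOptimized, 1)
}

// Snapshot copies the current counter value into a plain struct.
func (c *counters) Snapshot() snapshot {
	return snapshot{QueriesOptimized: atomic.LoadInt64(&c.queriesOptimized)}
}

// Reset sets the counter back to zero.
func (c *counters) Reset() {
	atomic.StoreInt64(&c.queriesOptimized, 0)
}

// recordConcurrently calls RecordOptimized from n goroutines and waits.
func recordConcurrently(c *counters, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.RecordOptimized()
		}()
	}
	wg.Wait()
}

func main() {
	var c counters
	recordConcurrently(&c, 100)
	fmt.Println(c.Snapshot().QueriesOptimized)
	c.Reset()
	fmt.Println(c.Snapshot().QueriesOptimized)
}
```

Returning the snapshot by value means callers can read or log it freely without touching the live counters.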