queryworker

package
v1.11.0
Published: Feb 17, 2026 License: AGPL-3.0 Imports: 27 Imported by: 0

Documentation

Index

Constants

const (
	// DefaultCleanupInterval is how often the cleanup goroutine runs.
	DefaultCleanupInterval = 5 * time.Minute

	// DefaultDeletionDelay is how long to wait before actually deleting a file
	// after it's been marked for deletion. This allows concurrent queries to
	// finish using the file before it's removed.
	DefaultDeletionDelay = 1 * time.Minute

	// DiskUsageHighWatermark is the disk utilization level that triggers eviction.
	// When disk usage exceeds this fraction, the cache will start evicting files.
	DiskUsageHighWatermark = 0.80

	// DiskUsageLowWatermark is the target disk utilization after eviction.
	// Eviction continues until disk usage drops below this level.
	DiskUsageLowWatermark = 0.70
)
const (
	ChannelBufferSize = 4096
)

Variables

This section is empty.

Functions

func EvaluatePushDown added in v1.3.0

func EvaluatePushDown[T promql.Timestamped](
	ctx context.Context,
	w *CacheManager,
	request queryapi.PushDownRequest,
	userSQL string,
	s3GlobSize int,
	mapper RowMapper[T],
) (<-chan T, error)

func EvaluatePushDownWithAggSplit added in v1.8.0

func EvaluatePushDownWithAggSplit[T promql.Timestamped](
	ctx context.Context,
	w *CacheManager,
	request queryapi.PushDownRequest,
	tblSQL string,
	aggSQL string,
	groupBy []string,
	matcherFields []string,
	s3GlobSize int,
	mapper RowMapper[T],
) (<-chan T, error)

EvaluatePushDownWithAggSplit evaluates a pushdown query, routing between agg_ and tbl_ files based on per-segment eligibility. For segments where an agg_ file exists and supports the query's GROUP BY and matchers, aggSQL is run against the agg_ file; otherwise tblSQL is run against the tbl_ file.
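The eligibility test described above can be sketched as a column-coverage check. This is a hypothetical stand-in (the `segment` type and `useAgg` helper are illustrations, not package API), assuming an agg_ file can serve a query only when it contains every GROUP BY and matcher column:

```go
package main

import "fmt"

// segment is a hypothetical view of one segment's metadata; the real
// eligibility check lives inside EvaluatePushDownWithAggSplit.
type segment struct {
	hasAgg  bool
	aggCols map[string]bool // columns present in the agg_ file
}

// useAgg reports whether the agg_ file can answer a query with the given
// GROUP BY and matcher fields: it must exist and contain every column.
func useAgg(s segment, groupBy, matcherFields []string) bool {
	if !s.hasAgg {
		return false
	}
	for _, c := range append(append([]string{}, groupBy...), matcherFields...) {
		if !s.aggCols[c] {
			return false
		}
	}
	return true
}

func main() {
	s := segment{hasAgg: true, aggCols: map[string]bool{"service": true, "region": true}}
	fmt.Println(useAgg(s, []string{"service"}, []string{"region"})) // agg_ can serve this
	fmt.Println(useAgg(s, []string{"pod"}, nil))                    // missing column → fall back to tbl_
}
```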

Types

type CacheKey added in v1.7.1

type CacheKey struct {
	Region   string // cloud region (e.g., "us-east-1"), can be empty
	Bucket   string // bucket name
	ObjectID string // object key/path within the bucket
}

CacheKey uniquely identifies a cached file by its cloud storage location.

type CacheManager added in v1.3.0

type CacheManager struct {
	// contains filtered or unexported fields
}

CacheManager coordinates downloads and queries over local parquet files.

func NewCacheManager added in v1.3.0

func NewCacheManager(dl DownloadBatchFunc, dataset string, storageProfileProvider storageprofile.StorageProfileProvider, pool *duckdbx.DB, parquetCache *ParquetFileCache) *CacheManager

func (*CacheManager) Close added in v1.3.0

func (w *CacheManager) Close()

type DiskUsageFunc added in v1.8.0

type DiskUsageFunc func(path string) (usedBytes, totalBytes uint64, err error)

DiskUsageFunc is a function that returns disk usage statistics.

type DownloadBatchFunc added in v1.3.0

type DownloadBatchFunc func(ctx context.Context, storageProfile storageprofile.StorageProfile, keys []string) error

DownloadBatchFunc downloads all of the given object keys to their corresponding local paths.

type ParquetFileCache added in v1.7.0

type ParquetFileCache struct {
	// contains filtered or unexported fields
}

ParquetFileCache manages downloaded parquet files with disk pressure-based cleanup. It tracks file sizes and access times for metrics and LRU eviction purposes. The cache owns its storage directory and provides a domain-aware API for callers to work with (bucket, region, objectId) rather than file paths.

func NewParquetFileCache added in v1.7.0

func NewParquetFileCache(cleanupInterval time.Duration) (*ParquetFileCache, error)

NewParquetFileCache creates a new parquet file cache with disk pressure-based cleanup. The cache directory is created under os.TempDir(), which respects the TMPDIR environment variable (typically set by helpers.SetupTempDir()).

func NewParquetFileCacheWithBaseDir added in v1.7.1

func NewParquetFileCacheWithBaseDir(baseDir string, cleanupInterval time.Duration) (*ParquetFileCache, error)

NewParquetFileCacheWithBaseDir creates a new parquet file cache with a custom base directory. This is useful for testing where each test needs an isolated cache directory.

func (*ParquetFileCache) Close added in v1.7.0

func (pfc *ParquetFileCache) Close()

Close stops the cleanup goroutine and removes all cached files.

func (*ParquetFileCache) FileCount added in v1.7.0

func (pfc *ParquetFileCache) FileCount() int64

FileCount returns the current number of tracked files.

func (*ParquetFileCache) GetOrPrepare added in v1.7.1

func (pfc *ParquetFileCache) GetOrPrepare(region, bucket, objectID string) (localPath string, exists bool, err error)

GetOrPrepare checks if a file is cached and returns its local path. If cached and valid, returns (localPath, true, nil). If not cached, prepares the directory structure and returns (localPath, false, nil). The caller should download to the returned path and then call TrackFile.
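The check-download-track contract above can be illustrated with a minimal stand-in. This `miniCache` is hypothetical (the real ParquetFileCache also tracks sizes, access times, and deletion state); it only sketches the caller protocol of GetOrPrepare followed by TrackFile:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// miniCache is a hypothetical stand-in for ParquetFileCache, sketching only
// the GetOrPrepare → download → TrackFile caller protocol.
type miniCache struct {
	baseDir string
	tracked map[string]bool // keyed by local path
}

func newMiniCache(baseDir string) *miniCache {
	return &miniCache{baseDir: baseDir, tracked: map[string]bool{}}
}

// GetOrPrepare mirrors the documented return shape: (localPath, exists, err).
func (c *miniCache) GetOrPrepare(region, bucket, objectID string) (string, bool, error) {
	p := filepath.Join(c.baseDir, region, bucket, objectID)
	return p, c.tracked[p], nil
}

// TrackFile marks the path as downloaded so later lookups hit the cache.
func (c *miniCache) TrackFile(region, bucket, objectID string) error {
	p := filepath.Join(c.baseDir, region, bucket, objectID)
	c.tracked[p] = true
	return nil
}

func main() {
	c := newMiniCache("/tmp/cache")
	path, exists, _ := c.GetOrPrepare("us-east-1", "b", "db/obj.parquet")
	if !exists {
		// The caller downloads to path here, then records the file.
		_ = c.TrackFile("us-east-1", "b", "db/obj.parquet")
	}
	_, exists, _ = c.GetOrPrepare("us-east-1", "b", "db/obj.parquet")
	fmt.Println(path != "", exists) // second lookup is a cache hit
}
```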

func (*ParquetFileCache) MarkForDeletion added in v1.7.0

func (pfc *ParquetFileCache) MarkForDeletion(region, bucket, objectID string)

MarkForDeletion marks a file for delayed deletion. The file will be deleted after DefaultDeletionDelay has passed. Until then, the file remains on disk but GetOrPrepare will return exists=false, causing re-downloads if needed.

func (*ParquetFileCache) MarkForDeletionWithCompanion added in v1.8.0

func (pfc *ParquetFileCache) MarkForDeletionWithCompanion(region, bucket, objectID string)

MarkForDeletionWithCompanion marks a tbl_ file and its corresponding agg_ file for delayed deletion. This ensures tbl_ and agg_ files are treated as a unit.

func (*ParquetFileCache) RegisterMetrics added in v1.7.0

func (pfc *ParquetFileCache) RegisterMetrics() error

RegisterMetrics registers OTEL metrics for the parquet file cache.

func (*ParquetFileCache) ScanExistingFiles added in v1.7.0

func (pfc *ParquetFileCache) ScanExistingFiles() error

ScanExistingFiles scans the cache directory and tracks any existing files. This is useful on startup to recover state from previous runs. Files with parseable paths (containing the "db/" marker) are fully tracked with CacheKey reconstruction; other files are tracked for cleanup only.

func (*ParquetFileCache) TotalBytes added in v1.7.0

func (pfc *ParquetFileCache) TotalBytes() int64

TotalBytes returns the total size of tracked files.

func (*ParquetFileCache) TrackFile added in v1.7.0

func (pfc *ParquetFileCache) TrackFile(region, bucket, objectID string) error

TrackFile marks a file as successfully downloaded and starts tracking it. Should be called after a file is downloaded to the path returned by GetOrPrepare.

type RowMapper added in v1.3.0

type RowMapper[T promql.Timestamped] func(queryapi.PushDownRequest, []string, *sql.Rows) (T, error)

RowMapper turns the current row into a T.
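A mapper of this shape typically switches on the column names to populate T. The real signature takes a queryapi.PushDownRequest and *sql.Rows; this self-contained sketch substitutes a local `Timestamped` interface and a plain value slice so the mapping shape is visible (all names here are illustrative, not package API):

```go
package main

import "fmt"

// Timestamped is a local stand-in for promql.Timestamped.
type Timestamped interface{ TimestampMS() int64 }

// point is a hypothetical row type produced by the mapper.
type point struct {
	ts    int64
	value float64
}

func (p point) TimestampMS() int64 { return p.ts }

// rowMapper mirrors the RowMapper shape: given the column names and one
// row's values, produce a T. (The real type scans from *sql.Rows.)
type rowMapper[T Timestamped] func(cols []string, vals []any) (T, error)

func main() {
	var m rowMapper[point] = func(cols []string, vals []any) (point, error) {
		var p point
		for i, c := range cols {
			switch c {
			case "ts":
				p.ts = vals[i].(int64)
			case "value":
				p.value = vals[i].(float64)
			}
		}
		return p, nil
	}
	p, _ := m([]string{"ts", "value"}, []any{int64(1700000000000), 4.2})
	fmt.Println(p.TimestampMS(), p.value)
}
```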

type WorkerService added in v1.3.0

type WorkerService struct {
	MetricsCM            *CacheManager
	LogsCM               *CacheManager
	TracesCM             *CacheManager
	StorageProfilePoller storageprofile.StorageProfileProvider
	MetricsGlobSize      int
	LogsGlobSize         int
	TracesGlobSize       int
	// contains filtered or unexported fields
}

WorkerService wires HTTP → CacheManager → SSE.

func NewWorkerService added in v1.3.0

func NewWorkerService(
	metricsGlobSize int,
	logsGlobSize int,
	tracesGlobSize int,
	maxParallelDownloads int,
	sp storageprofile.StorageProfileProvider,
	cloudManagers cloudstorage.ClientProvider,
	duckdbSettings duckdbx.DuckDBSettings,
) (*WorkerService, error)

func (*WorkerService) Close added in v1.4.4

func (ws *WorkerService) Close()

func (*WorkerService) Run added in v1.3.0

func (ws *WorkerService) Run(doneCtx context.Context) error

func (*WorkerService) ServeHTTP added in v1.3.0

func (ws *WorkerService) ServeHTTP(w http.ResponseWriter, r *http.Request)

ServeHTTP serves SSE with merged, sorted points from the cache and S3.
