Documentation
¶
Overview ¶
Package read provides the ReadBuilder, TableScan, Plan, and TableRead pipeline.
Index ¶
- type DataSplit
- type Plan
- type ReadBuilder
- func (rb *ReadBuilder) NewPredicateBuilder() *predicate.Builder
- func (rb *ReadBuilder) NewRead() *TableRead
- func (rb *ReadBuilder) NewScan() *TableScan
- func (rb *ReadBuilder) WithFilter(p *predicate.Predicate) *ReadBuilder
- func (rb *ReadBuilder) WithLimit(n int64) *ReadBuilder
- func (rb *ReadBuilder) WithProjection(cols []string) *ReadBuilder
- type StartingFrom
- type StreamBatch
- type StreamReadBuilder
- func (sb *StreamReadBuilder) NewStream() *TableStream
- func (sb *StreamReadBuilder) WithFilter(p *predicate.Predicate) *StreamReadBuilder
- func (sb *StreamReadBuilder) WithPollInterval(d time.Duration) *StreamReadBuilder
- func (sb *StreamReadBuilder) WithProjection(cols []string) *StreamReadBuilder
- func (sb *StreamReadBuilder) WithStartingFrom(s StartingFrom) *StreamReadBuilder
- type TableRead
- type TableScan
- type TableStream
- type TableStreamReader
- func (r *TableStreamReader) Err() error
- func (r *TableStreamReader) Next() bool
- func (r *TableStreamReader) Record() arrow.RecordBatch
- func (r *TableStreamReader) RecordBatch() arrow.RecordBatch
- func (r *TableStreamReader) Release()
- func (r *TableStreamReader) Retain()
- func (r *TableStreamReader) Schema() *arrow.Schema
- func (r *TableStreamReader) SnapshotID() int64
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type DataSplit ¶
type DataSplit struct {
Partition *manifest.ManifestEntry // representative entry for partition info
Bucket int
Files []manifest.DataFileMeta
NeedsMerge bool // true for primary-key tables that require sort-merge deduplication
DeletionFiles map[string]manifest.DeletionVectorMeta
}
DataSplit represents a unit of work: a set of data files from one bucket/partition that should be read together.
DeletionFiles maps data file name → DeletionVectorMeta for any files in this split that have associated deletion vectors. Nil means no DVs are present. DV resolution is performed during planning (batch reads only); stream delta reads always produce splits with DeletionFiles == nil.
type ReadBuilder ¶
type ReadBuilder struct {
// contains filtered or unexported fields
}
ReadBuilder is the entry point for building a table scan + read pipeline.
func NewReadBuilder ¶
func NewReadBuilder(tbl *table.FileStoreTable) *ReadBuilder
NewReadBuilder creates a ReadBuilder for the given table.
By default all columns are returned (no projection) and no filter is applied. Call ReadBuilder.WithProjection, ReadBuilder.WithFilter, and ReadBuilder.WithLimit to refine the read before calling ReadBuilder.NewScan and ReadBuilder.NewRead.
A ReadBuilder is not safe for concurrent use. Create one per goroutine or protect it with a mutex.
func (*ReadBuilder) NewPredicateBuilder ¶
func (rb *ReadBuilder) NewPredicateBuilder() *predicate.Builder
NewPredicateBuilder returns a PredicateBuilder scoped to the (possibly projected) schema.
func (*ReadBuilder) NewRead ¶
func (rb *ReadBuilder) NewRead() *TableRead
NewRead returns a TableRead configured with this builder's settings.
func (*ReadBuilder) NewScan ¶
func (rb *ReadBuilder) NewScan() *TableScan
NewScan returns a TableScan configured with this builder's settings.
func (*ReadBuilder) WithFilter ¶
func (rb *ReadBuilder) WithFilter(p *predicate.Predicate) *ReadBuilder
WithFilter attaches a filter predicate applied during both stats-based file pruning (planning) and row-level evaluation (reading). Passing nil clears any previously set filter. Use ReadBuilder.NewPredicateBuilder to construct predicates scoped to the effective read schema.
func (*ReadBuilder) WithLimit ¶
func (rb *ReadBuilder) WithLimit(n int64) *ReadBuilder
WithLimit sets a row limit hint (used for early exit, not guaranteed ordering).
func (*ReadBuilder) WithProjection ¶
func (rb *ReadBuilder) WithProjection(cols []string) *ReadBuilder
WithProjection limits the columns returned to the named subset. Column names must match those in the table schema exactly; unknown names are silently ignored. Passing nil or an empty slice returns all columns.
type StartingFrom ¶
type StartingFrom int
StartingFrom controls where a stream read begins.
const ( // StartingFromLatest skips all existing data and only emits rows from // snapshots that appear after the stream is started. StartingFromLatest StartingFrom = iota // StartingFromEarliest replays all existing snapshots before emitting new ones. StartingFromEarliest )
type StreamBatch ¶
StreamBatch is the result of a single TableStream.Next() call.
type StreamReadBuilder ¶
type StreamReadBuilder struct {
// contains filtered or unexported fields
}
StreamReadBuilder is the entry point for building a continuous stream read.
func NewStreamReadBuilder ¶
func NewStreamReadBuilder(tbl *table.FileStoreTable) *StreamReadBuilder
NewStreamReadBuilder creates a StreamReadBuilder for the given table.
By default reads start from the latest snapshot (StartingFromLatest) with no filter and no projection. Use StreamReadBuilder.WithStartingFrom, StreamReadBuilder.WithFilter, and StreamReadBuilder.WithProjection to configure before calling StreamReadBuilder.NewStream.
func (*StreamReadBuilder) NewStream ¶
func (sb *StreamReadBuilder) NewStream() *TableStream
NewStream returns a TableStream that iterates over new snapshots. Each call to TableStream.Next blocks until a new APPEND snapshot arrives.
func (*StreamReadBuilder) WithFilter ¶
func (sb *StreamReadBuilder) WithFilter(p *predicate.Predicate) *StreamReadBuilder
WithFilter attaches a filter predicate applied at the row level to each snapshot batch. Stats-based pruning is also applied during planning. Passing nil clears any previously set filter.
func (*StreamReadBuilder) WithPollInterval ¶
func (sb *StreamReadBuilder) WithPollInterval(d time.Duration) *StreamReadBuilder
WithPollInterval sets how often to check for new snapshots (default: 2s).
func (*StreamReadBuilder) WithProjection ¶
func (sb *StreamReadBuilder) WithProjection(cols []string) *StreamReadBuilder
WithProjection limits the columns returned to the named subset. Unknown names are silently ignored. Passing nil or empty returns all columns.
func (*StreamReadBuilder) WithStartingFrom ¶
func (sb *StreamReadBuilder) WithStartingFrom(s StartingFrom) *StreamReadBuilder
WithStartingFrom sets where the stream begins (default: StartingFromLatest).
type TableRead ¶
type TableRead struct {
// contains filtered or unexported fields
}
TableRead executes the read of splits and produces Arrow data.
func (*TableRead) ToArrow ¶
ToArrow reads all splits and materialises the result as a single arrow.Table held entirely in memory.
For large tables or streaming workloads prefer TableRead.ToArrowReader, which streams record batches one at a time without accumulating all data.
The caller must call Release() on the returned table when done to free memory.
func (*TableRead) ToArrowReader ¶
func (tr *TableRead) ToArrowReader(ctx context.Context, splits []DataSplit) (array.RecordReader, error)
ToArrowReader returns an array.RecordReader that streams record batches from all splits. The caller is responsible for calling Release() on each record and Release() on the reader.
type TableScan ¶
type TableScan struct {
// contains filtered or unexported fields
}
TableScan resolves a snapshot into a Plan of DataSplits.
type TableStream ¶
type TableStream struct {
// contains filtered or unexported fields
}
TableStream iterates over snapshots as they arrive, emitting one batch of splits per new snapshot. Call Next in a loop; each successful call provides splits that can be read with NewRead().ToArrowReader().
func (*TableStream) LastSnapshotID ¶
func (ts *TableStream) LastSnapshotID() int64
LastSnapshotID returns the ID of the last snapshot consumed, or -1 if none yet.
func (*TableStream) Next ¶
func (ts *TableStream) Next(ctx context.Context) (*StreamBatch, error)
Next blocks until a new snapshot is available, then returns its splits.
Only APPEND commits produce splits; COMPACT/OVERWRITE/ANALYZE snapshots are silently skipped (lastID is advanced but no batch is returned to the caller). Returns a non-nil error (including context.Canceled / context.DeadlineExceeded) when the context is done.
type TableStreamReader ¶
type TableStreamReader struct {
// contains filtered or unexported fields
}
TableStreamReader is a convenience wrapper that implements array.RecordReader across an unbounded stream of snapshots. It blocks inside Next() waiting for new data, and respects context cancellation.
func NewStreamReader ¶
func NewStreamReader(ctx context.Context, sb *StreamReadBuilder) *TableStreamReader
NewStreamReader creates a TableStreamReader from a StreamReadBuilder. The returned reader implements array.RecordReader and can be used wherever a standard Arrow record reader is expected. It blocks inside TableStreamReader.Next until a new snapshot batch arrives or ctx is cancelled.
func (*TableStreamReader) Err ¶
func (r *TableStreamReader) Err() error
Err returns the first error encountered, or nil if the reader stopped because the context was cancelled. Always check Err after Next returns false.
func (*TableStreamReader) Next ¶
func (r *TableStreamReader) Next() bool
Next advances to the next record batch, blocking until new data arrives. Returns false when the context is cancelled or an error occurs; check TableStreamReader.Err to distinguish the two cases.
func (*TableStreamReader) Record ¶
func (r *TableStreamReader) Record() arrow.RecordBatch
Record is an alias for RecordBatch, present for interface compatibility.
func (*TableStreamReader) RecordBatch ¶
func (r *TableStreamReader) RecordBatch() arrow.RecordBatch
RecordBatch returns the current record batch. Valid only after Next returns true.
func (*TableStreamReader) Release ¶
func (r *TableStreamReader) Release()
Release releases the current record batch and any open inner reader. Safe to call multiple times. Call when you are done with the reader.
func (*TableStreamReader) Retain ¶
func (r *TableStreamReader) Retain()
Retain is a no-op; TableStreamReader does not use reference counting.
func (*TableStreamReader) Schema ¶
func (r *TableStreamReader) Schema() *arrow.Schema
Schema returns the Arrow schema of the record batches produced by this reader.
func (*TableStreamReader) SnapshotID ¶
func (r *TableStreamReader) SnapshotID() int64
SnapshotID returns the snapshot ID of the current record batch. Returns -1 before the first successful Next call.