package read

v0.1.1 | Published: May 26, 2026 | License: Apache-2.0 | Imports: 24 | Imported by: 0

Repository

github.com/larssk/paimon-go

Documentation ¶

Overview ¶

Package read provides the ReadBuilder, TableScan, Plan, and TableRead pipeline.

Index ¶

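type DataSplit
type Plan
type ReadBuilder
- func NewReadBuilder(tbl *table.FileStoreTable) *ReadBuilder
- func (rb *ReadBuilder) NewPredicateBuilder() *predicate.Builder
- func (rb *ReadBuilder) NewRead() *TableRead
- func (rb *ReadBuilder) NewScan() *TableScan
- func (rb *ReadBuilder) WithFilter(p *predicate.Predicate) *ReadBuilder
- func (rb *ReadBuilder) WithLimit(n int64) *ReadBuilder
- func (rb *ReadBuilder) WithProjection(cols []string) *ReadBuilder
type StartingFrom
type StreamBatch
type StreamReadBuilder
- func NewStreamReadBuilder(tbl *table.FileStoreTable) *StreamReadBuilder
- func (sb *StreamReadBuilder) NewStream() *TableStream
- func (sb *StreamReadBuilder) WithFilter(p *predicate.Predicate) *StreamReadBuilder
- func (sb *StreamReadBuilder) WithPollInterval(d time.Duration) *StreamReadBuilder
- func (sb *StreamReadBuilder) WithProjection(cols []string) *StreamReadBuilder
- func (sb *StreamReadBuilder) WithStartingFrom(s StartingFrom) *StreamReadBuilder
type TableRead
- func (tr *TableRead) ToArrow(ctx context.Context, splits []DataSplit) (arrow.Table, error)
- func (tr *TableRead) ToArrowReader(ctx context.Context, splits []DataSplit) (array.RecordReader, error)
type TableScan
- func (ts *TableScan) Plan(ctx context.Context) (*Plan, error)
type TableStream
- func (ts *TableStream) LastSnapshotID() int64
- func (ts *TableStream) Next(ctx context.Context) (*StreamBatch, error)
type TableStreamReader
- func NewStreamReader(ctx context.Context, sb *StreamReadBuilder) *TableStreamReader
- func (r *TableStreamReader) Err() error
- func (r *TableStreamReader) Next() bool
- func (r *TableStreamReader) Record() arrow.RecordBatch
- func (r *TableStreamReader) RecordBatch() arrow.RecordBatch
- func (r *TableStreamReader) Release()
- func (r *TableStreamReader) Retain()
- func (r *TableStreamReader) Schema() *arrow.Schema
- func (r *TableStreamReader) SnapshotID() int64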

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type DataSplit ¶

type DataSplit struct {
	Partition     *manifest.ManifestEntry // representative entry for partition info
	Bucket        int
	Files         []manifest.DataFileMeta
	NeedsMerge    bool // true for primary-key tables that require sort-merge deduplication
	DeletionFiles map[string]manifest.DeletionVectorMeta
}

DataSplit represents a unit of work: a set of data files from one bucket/partition that should be read together.

DeletionFiles maps data file name → DeletionVectorMeta for any files in this split that have associated deletion vectors. Nil means no DVs are present. DV resolution is performed during planning (batch reads only); stream delta reads always produce splits with DeletionFiles == nil.
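The lookup described above can be sketched as follows. This is only an illustration: dataFileMeta, deletionVectorMeta, their fields (FileName, Cardinality), and the helper filesWithDVs are hypothetical stand-ins, since the real manifest types are not listed on this page. The sketch relies on the documented convention that a nil DeletionFiles map means no deletion vectors are present.

```go
package main

import "fmt"

// Hypothetical stand-ins for manifest.DataFileMeta and
// manifest.DeletionVectorMeta; their real fields are not listed on this page.
type dataFileMeta struct{ FileName string }
type deletionVectorMeta struct{ Cardinality int64 }

// dataSplit mirrors only the two DataSplit fields this sketch needs.
type dataSplit struct {
	Files         []dataFileMeta
	DeletionFiles map[string]deletionVectorMeta
}

// filesWithDVs returns the names of the files in s that have a deletion
// vector. Indexing a nil map is safe in Go, so splits without deletion
// vectors need no special case.
func filesWithDVs(s dataSplit) []string {
	var names []string
	for _, f := range s.Files {
		if _, ok := s.DeletionFiles[f.FileName]; ok {
			names = append(names, f.FileName)
		}
	}
	return names
}

func main() {
	batch := dataSplit{
		Files:         []dataFileMeta{{FileName: "data-0.parquet"}, {FileName: "data-1.parquet"}},
		DeletionFiles: map[string]deletionVectorMeta{"data-1.parquet": {Cardinality: 3}},
	}
	// A stream delta split: DeletionFiles is left nil.
	stream := dataSplit{Files: []dataFileMeta{{FileName: "data-2.parquet"}}}

	fmt.Println(filesWithDVs(batch))
	fmt.Println(len(filesWithDVs(stream)))
}
```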

type Plan ¶

type Plan struct {
	Splits     []DataSplit
	SnapshotID int64
}

Plan is the output of TableScan.Plan(): a list of splits ready to be read.
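Because each split is a self-contained unit of work, one way to use Plan.Splits is to divide them among workers. The sketch below shows a round-robin assignment; dataSplit and assign are hypothetical names, and whether a single TableRead may be shared between goroutines is not stated on this page, so treat the parallel use as an assumption to verify.

```go
package main

import "fmt"

// dataSplit is a hypothetical stand-in for read.DataSplit; only Bucket is kept.
type dataSplit struct{ Bucket int }

// assign distributes splits across n workers round-robin.
// A worker count below 1 is treated as 1.
func assign(splits []dataSplit, n int) [][]dataSplit {
	if n < 1 {
		n = 1
	}
	out := make([][]dataSplit, n)
	for i, s := range splits {
		out[i%n] = append(out[i%n], s)
	}
	return out
}

func main() {
	splits := []dataSplit{{Bucket: 0}, {Bucket: 1}, {Bucket: 2}, {Bucket: 3}, {Bucket: 4}}
	for w, group := range assign(splits, 2) {
		fmt.Printf("worker %d: %d splits\n", w, len(group))
	}
}
```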

type ReadBuilder ¶

type ReadBuilder struct {
	// contains filtered or unexported fields
}

ReadBuilder is the entry point for building a table scan + read pipeline.

func NewReadBuilder ¶

func NewReadBuilder(tbl *table.FileStoreTable) *ReadBuilder

NewReadBuilder creates a ReadBuilder for the given table.

By default all columns are returned (no projection) and no filter is applied. Call ReadBuilder.WithProjection, ReadBuilder.WithFilter, and ReadBuilder.WithLimit to refine the read before calling ReadBuilder.NewScan and ReadBuilder.NewRead.

A ReadBuilder is not safe for concurrent use. Create one per goroutine or protect it with a mutex.

func (*ReadBuilder) NewPredicateBuilder ¶

func (rb *ReadBuilder) NewPredicateBuilder() *predicate.Builder

NewPredicateBuilder returns a predicate.Builder scoped to the (possibly projected) schema.

func (*ReadBuilder) NewRead ¶

func (rb *ReadBuilder) NewRead() *TableRead

NewRead returns a TableRead configured with this builder's settings.

func (*ReadBuilder) NewScan ¶

func (rb *ReadBuilder) NewScan() *TableScan

NewScan returns a TableScan configured with this builder's settings.

func (*ReadBuilder) WithFilter ¶

func (rb *ReadBuilder) WithFilter(p *predicate.Predicate) *ReadBuilder

WithFilter attaches a filter predicate applied during both stats-based file pruning (planning) and row-level evaluation (reading). Passing nil clears any previously set filter. Use ReadBuilder.NewPredicateBuilder to construct predicates scoped to the effective read schema.
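The two stages mentioned above can be illustrated with a simplified model. This is not the library's implementation: dataFile, its Min/Max statistics on a single int64 column, and readGreaterThan are all hypothetical. The point of the sketch is that statistics can only rule files out, so rows in the surviving files still have to be checked one by one.

```go
package main

import "fmt"

// dataFile is a hypothetical stand-in for a data file with min/max statistics
// on a single int64 column; the library's real metadata types differ.
type dataFile struct {
	Name     string
	Min, Max int64
	Values   []int64
}

// readGreaterThan evaluates the predicate "value > threshold" in two stages
// and returns the files that were opened plus the matching rows.
func readGreaterThan(files []dataFile, threshold int64) (opened []string, rows []int64) {
	for _, f := range files {
		// Stage 1 (planning): skip files whose maximum cannot satisfy the predicate.
		if f.Max <= threshold {
			continue
		}
		opened = append(opened, f.Name)
		// Stage 2 (reading): statistics only bound the values, so every row
		// in a surviving file is still evaluated.
		for _, v := range f.Values {
			if v > threshold {
				rows = append(rows, v)
			}
		}
	}
	return opened, rows
}

func main() {
	files := []dataFile{
		{Name: "a", Min: 1, Max: 5, Values: []int64{1, 3, 5}},
		{Name: "b", Min: 10, Max: 20, Values: []int64{10, 15, 20}},
		{Name: "c", Min: 4, Max: 12, Values: []int64{4, 8, 12}},
	}
	opened, rows := readGreaterThan(files, 7)
	fmt.Println(opened, rows)
}
```

In this model file "a" is never opened, while file "c" is opened even though only some of its rows match.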

func (*ReadBuilder) WithLimit ¶

func (rb *ReadBuilder) WithLimit(n int64) *ReadBuilder

WithLimit sets a row limit hint. The hint is used for early exit and implies no guarantee about row ordering.
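Since the limit is described as a hint, a caller that needs a hard cap may prefer to enforce it while consuming batches. The sketch below assumes the hint could be exceeded, which this page does not confirm; takeAtMost is a hypothetical helper and each []int64 stands in for one record batch.

```go
package main

import "fmt"

// takeAtMost concatenates batches but never returns more than limit values.
// Each []int64 is a hypothetical stand-in for one record batch.
func takeAtMost(batches [][]int64, limit int) []int64 {
	if limit <= 0 {
		return nil
	}
	out := make([]int64, 0, limit)
	for _, b := range batches {
		remaining := limit - len(out)
		if remaining == 0 {
			break // cap reached: stop consuming further batches
		}
		if len(b) > remaining {
			b = b[:remaining] // truncate the batch that crosses the cap
		}
		out = append(out, b...)
	}
	return out
}

func main() {
	batches := [][]int64{{1, 2, 3}, {4, 5, 6}, {7}}
	fmt.Println(takeAtMost(batches, 4))
}
```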

func (*ReadBuilder) WithProjection ¶

func (rb *ReadBuilder) WithProjection(cols []string) *ReadBuilder

WithProjection limits the columns returned to the named subset. Column names must match those in the table schema exactly; unknown names are silently ignored. Passing nil or an empty slice returns all columns.
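Because unknown names are silently ignored, a misspelled column produces no error. A caller can guard against this by checking the requested names first. The helper below is a minimal sketch: unknownColumns is hypothetical, and how the schema's column names are obtained from the table is left out because it is not documented here.

```go
package main

import "fmt"

// unknownColumns returns the requested names that do not appear in schema.
// The comparison is exact (case-sensitive), matching the documented rule.
func unknownColumns(schema, requested []string) []string {
	known := make(map[string]struct{}, len(schema))
	for _, c := range schema {
		known[c] = struct{}{}
	}
	var missing []string
	for _, c := range requested {
		if _, ok := known[c]; !ok {
			missing = append(missing, c)
		}
	}
	return missing
}

func main() {
	schema := []string{"id", "name", "ts"}
	requested := []string{"id", "Name", "region"}
	if missing := unknownColumns(schema, requested); len(missing) > 0 {
		fmt.Println("would be silently ignored:", missing)
	}
}
```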

type StartingFrom ¶

type StartingFrom int

StartingFrom controls where a stream read begins.

const (
	// StartingFromLatest skips all existing data and only emits rows from
	// snapshots that appear after the stream is started.
	StartingFromLatest StartingFrom = iota

	// StartingFromEarliest replays all existing snapshots before emitting new ones.
	StartingFromEarliest
)
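When the starting position comes from configuration, it has to be mapped onto these constants. The sketch below redeclares the type so that it compiles on its own; parseStartingFrom and the accepted strings are hypothetical and not part of the package.

```go
package main

import (
	"fmt"
	"strings"
)

// StartingFrom and its constants are redeclared here only so the sketch
// compiles on its own; real code would use the package's definitions.
type StartingFrom int

const (
	StartingFromLatest StartingFrom = iota
	StartingFromEarliest
)

// parseStartingFrom is a hypothetical helper. An empty string maps to the
// documented default, StartingFromLatest.
func parseStartingFrom(s string) (StartingFrom, error) {
	switch strings.ToLower(strings.TrimSpace(s)) {
	case "", "latest":
		return StartingFromLatest, nil
	case "earliest":
		return StartingFromEarliest, nil
	default:
		return StartingFromLatest, fmt.Errorf("unknown starting position %q", s)
	}
}

func main() {
	for _, in := range []string{"latest", "Earliest", "yesterday"} {
		pos, err := parseStartingFrom(in)
		fmt.Println(in, pos, err)
	}
}
```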

type StreamBatch ¶

type StreamBatch struct {
	Splits     []DataSplit
	SnapshotID int64
}

StreamBatch is the result of a single TableStream.Next() call.

type StreamReadBuilder ¶

type StreamReadBuilder struct {
	// contains filtered or unexported fields
}

StreamReadBuilder is the entry point for building a continuous stream read.

func NewStreamReadBuilder ¶

func NewStreamReadBuilder(tbl *table.FileStoreTable) *StreamReadBuilder

NewStreamReadBuilder creates a StreamReadBuilder for the given table.

By default reads start from the latest snapshot (StartingFromLatest) with no filter and no projection. Use StreamReadBuilder.WithStartingFrom, StreamReadBuilder.WithFilter, and StreamReadBuilder.WithProjection to configure before calling StreamReadBuilder.NewStream.

func (*StreamReadBuilder) NewStream ¶

func (sb *StreamReadBuilder) NewStream() *TableStream

NewStream returns a TableStream that iterates over new snapshots. Each call to TableStream.Next blocks until a new APPEND snapshot arrives.

func (*StreamReadBuilder) WithFilter ¶

func (sb *StreamReadBuilder) WithFilter(p *predicate.Predicate) *StreamReadBuilder

WithFilter attaches a filter predicate applied at the row level to each snapshot batch. Stats-based pruning is also applied during planning. Passing nil clears any previously set filter.

func (*StreamReadBuilder) WithPollInterval ¶

func (sb *StreamReadBuilder) WithPollInterval(d time.Duration) *StreamReadBuilder

WithPollInterval sets how often to check for new snapshots (default: 2s).

func (*StreamReadBuilder) WithProjection ¶

func (sb *StreamReadBuilder) WithProjection(cols []string) *StreamReadBuilder

WithProjection limits the columns returned to the named subset. Unknown names are silently ignored. Passing nil or an empty slice returns all columns.

func (*StreamReadBuilder) WithStartingFrom ¶

func (sb *StreamReadBuilder) WithStartingFrom(s StartingFrom) *StreamReadBuilder

WithStartingFrom sets where the stream begins (default: StartingFromLatest).

type TableRead ¶

type TableRead struct {
	// contains filtered or unexported fields
}

TableRead executes the read of splits and produces Arrow data.

func (*TableRead) ToArrow ¶

func (tr *TableRead) ToArrow(ctx context.Context, splits []DataSplit) (arrow.Table, error)

ToArrow reads all splits and materialises the result as a single arrow.Table held entirely in memory.

For large tables or streaming workloads prefer TableRead.ToArrowReader, which streams record batches one at a time without accumulating all data.

The caller must call Release() on the returned table when done to free memory.

func (*TableRead) ToArrowReader ¶

func (tr *TableRead) ToArrowReader(ctx context.Context, splits []DataSplit) (array.RecordReader, error)

ToArrowReader returns an array.RecordReader that streams record batches from all splits. The caller is responsible for calling Release() on each record and Release() on the reader.
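The release contract above can be sketched as a consumption loop. The record and recordReader interfaces below are hypothetical, cut-down stand-ins for the Arrow types so that the example runs without the Arrow module; countRows and the fakes are illustrative names only.

```go
package main

import "fmt"

// record and recordReader list only the methods this sketch uses; they are
// hypothetical stand-ins for the Arrow record batch and array.RecordReader.
type record interface {
	NumRows() int64
	Release()
}

type recordReader interface {
	Next() bool
	Record() record
	Err() error
	Release()
}

// countRows drains rdr, releasing each record after use and the reader at the end.
func countRows(rdr recordReader) (int64, error) {
	defer rdr.Release()
	var total int64
	for rdr.Next() {
		rec := rdr.Record()
		total += rec.NumRows()
		rec.Release() // per the contract above, the caller releases each record
	}
	return total, rdr.Err()
}

// fakeRecord and fakeReader are in-memory fakes that count Release calls.
type fakeRecord struct {
	rows     int64
	releases *int
}

func (r fakeRecord) NumRows() int64 { return r.rows }
func (r fakeRecord) Release()       { *r.releases++ }

type fakeReader struct {
	sizes          []int64
	pos            int
	recordReleases int
	released       bool
}

func (f *fakeReader) Next() bool {
	if f.pos >= len(f.sizes) {
		return false
	}
	f.pos++
	return true
}

func (f *fakeReader) Record() record {
	return fakeRecord{rows: f.sizes[f.pos-1], releases: &f.recordReleases}
}
func (f *fakeReader) Err() error { return nil }
func (f *fakeReader) Release()   { f.released = true }

func main() {
	rdr := &fakeReader{sizes: []int64{3, 4}}
	n, err := countRows(rdr)
	fmt.Println(n, err, rdr.recordReleases, rdr.released)
}
```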

type TableScan ¶

type TableScan struct {
	// contains filtered or unexported fields
}

TableScan resolves a snapshot into a Plan of DataSplits.

func (*TableScan) Plan ¶

func (ts *TableScan) Plan(ctx context.Context) (*Plan, error)

Plan resolves the latest snapshot, reads all manifest files, prunes by stats, and returns a Plan of DataSplits ready for reading.

type TableStream ¶

type TableStream struct {
	// contains filtered or unexported fields
}

TableStream iterates over snapshots as they arrive, emitting one batch of splits per new snapshot. Call Next in a loop; each successful call provides splits that can be read with NewRead().ToArrowReader().

func (*TableStream) LastSnapshotID ¶

func (ts *TableStream) LastSnapshotID() int64

LastSnapshotID returns the ID of the last snapshot consumed, or -1 if none yet.

func (*TableStream) Next ¶

func (ts *TableStream) Next(ctx context.Context) (*StreamBatch, error)

Next blocks until a new snapshot is available, then returns its splits.

Only APPEND commits produce splits; COMPACT, OVERWRITE, and ANALYZE snapshots are silently skipped: the stream's internal position (lastID) advances, but no batch is returned to the caller. Next returns a non-nil error (including context.Canceled and context.DeadlineExceeded) when the context is done.
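The behaviour described above can be modelled with a small polling loop. This is a sketch of the documented semantics, not the library's code: snapshot, stream, lookup, and runDemo are hypothetical, and the model assumes snapshot IDs are consecutive integers.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// snapshot is a hypothetical stand-in for a table snapshot.
type snapshot struct {
	ID   int64
	Kind string // "APPEND", "COMPACT", "OVERWRITE", ...
}

// stream models the documented behaviour of Next. It assumes snapshot IDs
// are consecutive.
type stream struct {
	lookup func(id int64) (snapshot, bool) // ok is false until the snapshot exists
	nextID int64
	lastID int64 // -1 until a snapshot has been consumed
	poll   time.Duration
}

func (s *stream) next(ctx context.Context) (snapshot, error) {
	for {
		if err := ctx.Err(); err != nil {
			return snapshot{}, err
		}
		snap, ok := s.lookup(s.nextID)
		if !ok {
			select { // wait one poll interval, or stop early if ctx ends
			case <-ctx.Done():
				return snapshot{}, ctx.Err()
			case <-time.After(s.poll):
			}
			continue
		}
		s.lastID, s.nextID = snap.ID, snap.ID+1
		if snap.Kind == "APPEND" {
			return snap, nil
		}
		// Non-APPEND commit: the position advances but nothing is returned.
	}
}

// runDemo returns the first emitted snapshot ID, the final lastID, and whether
// the second call ended with context.DeadlineExceeded.
func runDemo() (first, last int64, timedOut bool) {
	log := map[int64]snapshot{1: {1, "COMPACT"}, 2: {2, "APPEND"}, 3: {3, "OVERWRITE"}}
	s := &stream{
		lookup: func(id int64) (snapshot, bool) { snap, ok := log[id]; return snap, ok },
		nextID: 1, lastID: -1, poll: time.Millisecond,
	}
	snap, _ := s.next(context.Background()) // skips snapshot 1, returns snapshot 2

	ctx, cancel := context.WithTimeout(context.Background(), 20*time.Millisecond)
	defer cancel()
	_, err := s.next(ctx) // skips snapshot 3, then waits until the deadline
	return snap.ID, s.lastID, err == context.DeadlineExceeded
}

func main() {
	fmt.Println(runDemo())
}
```

In this model the second call consumes the OVERWRITE snapshot without returning it, so the position has moved to 3 even though the call itself ends with a deadline error.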

type TableStreamReader ¶

type TableStreamReader struct {
	// contains filtered or unexported fields
}

TableStreamReader is a convenience wrapper that implements array.RecordReader across an unbounded stream of snapshots. It blocks inside Next() waiting for new data, and respects context cancellation.

func NewStreamReader ¶

func NewStreamReader(ctx context.Context, sb *StreamReadBuilder) *TableStreamReader

NewStreamReader creates a TableStreamReader from a StreamReadBuilder. The returned reader implements array.RecordReader and can be used wherever a standard Arrow record reader is expected. It blocks inside TableStreamReader.Next until a new snapshot batch arrives or ctx is cancelled.

func (*TableStreamReader) Err ¶

func (r *TableStreamReader) Err() error

Err returns the first error encountered, or nil if the reader stopped because the context was cancelled. Always check Err after Next returns false.

func (*TableStreamReader) Next ¶

func (r *TableStreamReader) Next() bool

Next advances to the next record batch, blocking until new data arrives. Returns false when the context is cancelled or an error occurs; check TableStreamReader.Err to distinguish the two cases.
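The check described above can be sketched as a drain loop. The streamReader interface lists only the TableStreamReader methods used here, and drain, fakeReader, and errManifest are hypothetical names introduced so the example runs on its own.

```go
package main

import (
	"errors"
	"fmt"
)

// streamReader lists the TableStreamReader methods this sketch uses.
type streamReader interface {
	Next() bool
	SnapshotID() int64
	Err() error
	Release()
}

// drain consumes r until Next returns false, then reports why it stopped:
// nil when the context was cancelled, non-nil when a failure occurred.
func drain(r streamReader, handle func(snapshotID int64)) error {
	defer r.Release()
	for r.Next() {
		handle(r.SnapshotID())
	}
	return r.Err()
}

// errManifest is a made-up failure used by the demo below.
var errManifest = errors.New("manifest read failed")

// fakeReader is a hypothetical in-memory reader used to exercise drain.
type fakeReader struct {
	ids  []int64
	pos  int
	fail error // value reported by Err
}

func (f *fakeReader) Next() bool {
	if f.pos >= len(f.ids) {
		return false
	}
	f.pos++
	return true
}

func (f *fakeReader) SnapshotID() int64 {
	if f.pos == 0 {
		return -1 // mirrors the documented value before the first Next
	}
	return f.ids[f.pos-1]
}

func (f *fakeReader) Err() error { return f.fail }
func (f *fakeReader) Release()   {}

func main() {
	var seen []int64
	collect := func(id int64) { seen = append(seen, id) }

	err := drain(&fakeReader{ids: []int64{7, 8}}, collect)
	fmt.Println(seen, err) // clean stop: Err is nil

	err = drain(&fakeReader{ids: []int64{9}, fail: errManifest}, collect)
	fmt.Println(seen, err) // failure: Err is non-nil
}
```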

func (*TableStreamReader) Record ¶

func (r *TableStreamReader) Record() arrow.RecordBatch

Record is an alias for RecordBatch, present for interface compatibility.

func (*TableStreamReader) RecordBatch ¶

func (r *TableStreamReader) RecordBatch() arrow.RecordBatch

RecordBatch returns the current record batch. Valid only after Next returns true.

func (*TableStreamReader) Release ¶

func (r *TableStreamReader) Release()

Release releases the current record batch and any open inner reader. It is safe to call multiple times. Call it when you are done with the reader.

func (*TableStreamReader) Retain ¶

func (r *TableStreamReader) Retain()

Retain is a no-op; TableStreamReader does not use reference counting.

func (*TableStreamReader) Schema ¶

func (r *TableStreamReader) Schema() *arrow.Schema

Schema returns the Arrow schema of the record batches produced by this reader.

func (*TableStreamReader) SnapshotID ¶

func (r *TableStreamReader) SnapshotID() int64

SnapshotID returns the snapshot ID of the current record batch. Returns -1 before the first successful Next call.
