Documentation
¶
Overview ¶
Package predicate provides filter predicates for Paimon table scans.
Predicates are used at two levels:
- Stats-based pruning: applied against SimpleStats (min/max/null-count per file) to skip entire data files during planning.
- Row-level filtering: applied in the read path (pushed down to the Parquet reader).
Index ¶
- func EvalRow(p *Predicate, rec arrow.RecordBatch, row int) bool
- func TestByStats(p *Predicate, stats manifest.SimpleStats, rowCount int64) bool
- type Builder
- func (b *Builder) Equal(name string, value interface{}) (*Predicate, error)
- func (b *Builder) GreaterOrEqual(name string, value interface{}) (*Predicate, error)
- func (b *Builder) GreaterThan(name string, value interface{}) (*Predicate, error)
- func (b *Builder) In(name string, values ...interface{}) (*Predicate, error)
- func (b *Builder) IsNotNull(name string) (*Predicate, error)
- func (b *Builder) IsNull(name string) (*Predicate, error)
- func (b *Builder) LessOrEqual(name string, value interface{}) (*Predicate, error)
- func (b *Builder) LessThan(name string, value interface{}) (*Predicate, error)
- func (b *Builder) NotEqual(name string, value interface{}) (*Predicate, error)
- type Op
- type Predicate
Examples ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func EvalRow ¶
func EvalRow(p *Predicate, rec arrow.RecordBatch, row int) bool
EvalRow evaluates a predicate against a single row of an Arrow RecordBatch. p.FieldIdx must be bound to the column indices of rec (i.e. the read/projected schema).
func TestByStats ¶
func TestByStats(p *Predicate, stats manifest.SimpleStats, rowCount int64) bool
TestByStats returns false if the predicate is guaranteed NOT to match any row in a file with the given stats. Returns true if the file may contain matching rows.
Types ¶
type Builder ¶
type Builder struct {
// contains filtered or unexported fields
}
Builder constructs leaf predicates against a given schema.
Obtain a Builder from [read.ReadBuilder.NewPredicateBuilder] (preferred) or directly via NewBuilder. The Builder is bound to a specific set of fields; if the field set changes (e.g. after projection) call WithProjection to rebind FieldIdx values.
The value argument to comparison methods must match the Go type that corresponds to the Paimon field type:
Paimon type Go value type INT / INTEGER int32 BIGINT int64 FLOAT float32 DOUBLE float64 BOOLEAN bool STRING / VARCHAR string DATE int32 (days since 1970-01-01) TIMESTAMP int64 (epoch milliseconds)
Passing the wrong type does not panic at construction time but will produce incorrect results during stats-based pruning and row-level evaluation.
Example ¶
ExampleBuilder demonstrates the typical pattern: create a Builder from the table schema, build individual leaf predicates, then combine them.
In practice obtain the Builder from [read.ReadBuilder.NewPredicateBuilder] rather than constructing it directly — that automatically uses the effective (possibly projected) read schema.
package main
import (
"fmt"
"github.com/larssk/paimon-go/predicate"
"github.com/larssk/paimon-go/schema"
)
// fields is a small schema used across examples.
var fields = []schema.DataField{
{ID: 0, Name: "user_id", Type: schema.DataType{Type: "BIGINT"}},
{ID: 1, Name: "status", Type: schema.DataType{Type: "STRING"}},
{ID: 2, Name: "amount", Type: schema.DataType{Type: "DOUBLE"}},
{ID: 3, Name: "region", Type: schema.DataType{Type: "STRING"}},
{ID: 4, Name: "score", Type: schema.DataType{Type: "INT"}},
}
func main() {
b := predicate.NewBuilder(fields)
// amount > 100.0
gt, err := b.GreaterThan("amount", float64(100))
if err != nil {
panic(err)
}
// status == "PAID"
eq, err := b.Equal("status", "PAID")
if err != nil {
panic(err)
}
// Combine: amount > 100.0 AND status == "PAID"
filter := predicate.And(gt, eq)
fmt.Println(filter.Op == predicate.OpAnd)
}
Output: true
Example (In) ¶
ExampleBuilder_in shows how to match a column against a fixed set of values.
package main
import (
"fmt"
"github.com/larssk/paimon-go/predicate"
"github.com/larssk/paimon-go/schema"
)
// fields is a small schema used across examples.
var fields = []schema.DataField{
{ID: 0, Name: "user_id", Type: schema.DataType{Type: "BIGINT"}},
{ID: 1, Name: "status", Type: schema.DataType{Type: "STRING"}},
{ID: 2, Name: "amount", Type: schema.DataType{Type: "DOUBLE"}},
{ID: 3, Name: "region", Type: schema.DataType{Type: "STRING"}},
{ID: 4, Name: "score", Type: schema.DataType{Type: "INT"}},
}
func main() {
b := predicate.NewBuilder(fields)
// status IN ("PAID", "PENDING")
p, err := b.In("status", "PAID", "PENDING")
if err != nil {
panic(err)
}
fmt.Println(p.Op == predicate.OpIn)
}
Output: true
Example (IsNull) ¶
ExampleBuilder_isNull shows null and not-null checks.
package main
import (
"fmt"
"github.com/larssk/paimon-go/predicate"
"github.com/larssk/paimon-go/schema"
)
// fields is a small schema used across examples.
var fields = []schema.DataField{
{ID: 0, Name: "user_id", Type: schema.DataType{Type: "BIGINT"}},
{ID: 1, Name: "status", Type: schema.DataType{Type: "STRING"}},
{ID: 2, Name: "amount", Type: schema.DataType{Type: "DOUBLE"}},
{ID: 3, Name: "region", Type: schema.DataType{Type: "STRING"}},
{ID: 4, Name: "score", Type: schema.DataType{Type: "INT"}},
}
func main() {
b := predicate.NewBuilder(fields)
// rows where user_id IS NOT NULL
notNull, err := b.IsNotNull("user_id")
if err != nil {
panic(err)
}
// rows where region IS NULL
isNull, err := b.IsNull("region")
if err != nil {
panic(err)
}
// either condition
filter := predicate.Or(notNull, isNull)
fmt.Println(filter.Op == predicate.OpOr)
}
Output: true
Example (Range) ¶
ExampleBuilder_range shows how to express a range filter (low <= col <= high) by combining two leaf predicates with And.
package main
import (
"fmt"
"github.com/larssk/paimon-go/predicate"
"github.com/larssk/paimon-go/schema"
)
// fields is a small schema used across examples.
var fields = []schema.DataField{
{ID: 0, Name: "user_id", Type: schema.DataType{Type: "BIGINT"}},
{ID: 1, Name: "status", Type: schema.DataType{Type: "STRING"}},
{ID: 2, Name: "amount", Type: schema.DataType{Type: "DOUBLE"}},
{ID: 3, Name: "region", Type: schema.DataType{Type: "STRING"}},
{ID: 4, Name: "score", Type: schema.DataType{Type: "INT"}},
}
func main() {
b := predicate.NewBuilder(fields)
low, err := b.GreaterOrEqual("score", int32(10))
if err != nil {
panic(err)
}
high, err := b.LessOrEqual("score", int32(100))
if err != nil {
panic(err)
}
// 10 <= score <= 100
filter := predicate.And(low, high)
fmt.Println(filter.Op == predicate.OpAnd)
}
Output: true
Example (UnknownField) ¶
ExampleBuilder_unknownField shows the error returned when a field name is not present in the schema.
package main
import (
"fmt"
"github.com/larssk/paimon-go/predicate"
"github.com/larssk/paimon-go/schema"
)
// fields is a small schema used across examples.
var fields = []schema.DataField{
{ID: 0, Name: "user_id", Type: schema.DataType{Type: "BIGINT"}},
{ID: 1, Name: "status", Type: schema.DataType{Type: "STRING"}},
{ID: 2, Name: "amount", Type: schema.DataType{Type: "DOUBLE"}},
{ID: 3, Name: "region", Type: schema.DataType{Type: "STRING"}},
{ID: 4, Name: "score", Type: schema.DataType{Type: "INT"}},
}
func main() {
b := predicate.NewBuilder(fields)
_, err := b.Equal("nonexistent", "value")
fmt.Println(err)
}
Output: predicate: unknown field "nonexistent"
func NewBuilder ¶
NewBuilder creates a Builder bound to the given field slice. Prefer [read.ReadBuilder.NewPredicateBuilder] which binds automatically to the effective (possibly projected) read schema.
func (*Builder) Equal ¶
Equal builds a predicate that matches rows where name == value. Returns an error if name is not present in the builder's field set.
func (*Builder) GreaterOrEqual ¶
GreaterOrEqual builds a predicate that matches rows where name >= value. Returns an error if name is not present in the builder's field set.
func (*Builder) GreaterThan ¶
GreaterThan builds a predicate that matches rows where name > value. Returns an error if name is not present in the builder's field set.
func (*Builder) In ¶
In builds a predicate that matches rows where name is equal to any of values. Returns an error if name is not present in the builder's field set.
func (*Builder) IsNotNull ¶
IsNotNull builds a predicate that matches rows where name IS NOT NULL. Returns an error if name is not present in the builder's field set.
func (*Builder) IsNull ¶
IsNull builds a predicate that matches rows where name IS NULL. Returns an error if name is not present in the builder's field set.
func (*Builder) LessOrEqual ¶
LessOrEqual builds a predicate that matches rows where name <= value. Returns an error if name is not present in the builder's field set.
type Op ¶
type Op int
Op is a predicate operator.
const ( OpEqual Op = iota // field == value OpNotEqual // field != value OpLessThan // field < value OpLessOrEqual // field <= value OpGreaterThan // field > value OpGreaterOrEqual // field >= value OpIsNull // field IS NULL OpIsNotNull // field IS NOT NULL OpIn // field IN (values...) OpAnd // logical AND of Children OpOr // logical OR of Children OpNot // logical NOT of Children[0] )
type Predicate ¶
type Predicate struct {
Op Op
FieldIdx int
FieldName string
// TypeTag as returned by schema.TypeTag (used for BinaryRow decoding).
TypeTag string
Literals []interface{}
// For AND / OR / NOT: child predicates.
Children []*Predicate
}
Predicate is a filter expression over a table's fields.
A Predicate is either a leaf (Op is one of the comparison operators, FieldIdx and Literals are set) or a compound node (Op is OpAnd / OpOr / OpNot, Children are set). The tree is built by Builder methods and the And, Or, Not combinators; callers do not normally construct Predicate values directly.
FieldIdx is the zero-based index of the field in the read/projected schema and must be rebound with WithProjection whenever the column set changes.
func And ¶
And combines two or more predicates with logical AND. Returns a predicate that is true only when all children are true. Passing a single predicate is valid and returns an OpAnd wrapper around it.
func Not ¶
Not negates a predicate. Note: NOT is not applied during stats-based file pruning (conservative: the file is always kept). It is applied at row-level evaluation by EvalRow.
Example ¶
ExampleNot shows how to negate a predicate. Note: Not is applied at row-level evaluation but is skipped during stats-based file pruning (the file is kept conservatively).
package main
import (
"fmt"
"github.com/larssk/paimon-go/predicate"
"github.com/larssk/paimon-go/schema"
)
// fields is a small schema used across examples.
var fields = []schema.DataField{
{ID: 0, Name: "user_id", Type: schema.DataType{Type: "BIGINT"}},
{ID: 1, Name: "status", Type: schema.DataType{Type: "STRING"}},
{ID: 2, Name: "amount", Type: schema.DataType{Type: "DOUBLE"}},
{ID: 3, Name: "region", Type: schema.DataType{Type: "STRING"}},
{ID: 4, Name: "score", Type: schema.DataType{Type: "INT"}},
}
func main() {
b := predicate.NewBuilder(fields)
eu, err := b.Equal("region", "EU")
if err != nil {
panic(err)
}
// region != "EU" (via NOT)
notEU := predicate.Not(eu)
fmt.Println(notEU.Op == predicate.OpNot)
}
Output: true