predicate

package
v0.1.1
Published: May 26, 2026 License: Apache-2.0 Imports: 6 Imported by: 0

Documentation

Overview

Package predicate provides filter predicates for Paimon table scans.

Predicates are used at two levels:

  1. Stats-based pruning: applied against SimpleStats (min/max/null-count per file) to skip entire data files during planning.
  2. Row-level filtering: applied in the read path (pushed down to the Parquet reader).
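
The two levels can be illustrated with a minimal, self-contained sketch. The types fileStats and dataFile and the functions mayContainGreaterThan and scan are hypothetical stand-ins, not part of this package; the real code works on manifest.SimpleStats and Arrow record batches, and its exact logic may differ.

```go
package main

import "fmt"

// fileStats is a hypothetical stand-in for per-file column statistics.
type fileStats struct {
	Min, Max int64
}

// dataFile is a hypothetical data file: its stats plus the rows it holds.
type dataFile struct {
	Name  string
	Stats fileStats
	Rows  []int64
}

// mayContainGreaterThan is level 1: it reports whether any row in the file
// could satisfy "col > v", judging only by the file's max value.
func mayContainGreaterThan(s fileStats, v int64) bool {
	return s.Max > v
}

// scan applies level 1 (skip whole files) and then level 2 (filter rows).
func scan(files []dataFile, v int64) (kept []int64, skipped []string) {
	for _, f := range files {
		if !mayContainGreaterThan(f.Stats, v) {
			skipped = append(skipped, f.Name)
			continue
		}
		for _, row := range f.Rows {
			if row > v { // level 2: exact row-level check
				kept = append(kept, row)
			}
		}
	}
	return kept, skipped
}

func main() {
	files := []dataFile{
		{Name: "a.parquet", Stats: fileStats{Min: 1, Max: 50}, Rows: []int64{1, 20, 50}},
		{Name: "b.parquet", Stats: fileStats{Min: 40, Max: 300}, Rows: []int64{40, 120, 300}},
	}
	kept, skipped := scan(files, 100)
	fmt.Println(kept, skipped) // [120 300] [a.parquet]
}
```

The first file is never opened because its max (50) rules out any match; the second file passes the stats check and is then filtered row by row.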

Functions

func EvalRow

func EvalRow(p *Predicate, rec arrow.RecordBatch, row int) bool

EvalRow evaluates a predicate against a single row of an Arrow RecordBatch. p.FieldIdx must be bound to the column indices of rec (i.e. the read/projected schema).

func TestByStats

func TestByStats(p *Predicate, stats manifest.SimpleStats, rowCount int64) bool

TestByStats returns false only if the predicate is guaranteed not to match any row in a file with the given stats. It returns true if the file may contain matching rows.
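
The "false means definitely no match, true means maybe" contract can be sketched per operator as follows. The colStats type, the op constants, and mayMatch are hypothetical stand-ins written for illustration; they are not the package's implementation, and the assumption that comparisons never match NULL values is part of the sketch.

```go
package main

import "fmt"

// colStats is a hypothetical stand-in for one column's entry in the file stats.
type colStats struct {
	Min, Max  int64
	NullCount int64
}

type op int

const (
	opEqual op = iota
	opLessThan
	opGreaterThan
	opIsNull
	opIsNotNull
)

// mayMatch returns false only when no row in the file can satisfy the
// predicate; any uncertainty resolves to true so the file is kept.
func mayMatch(o op, s colStats, rowCount, v int64) bool {
	allNull := s.NullCount == rowCount
	switch o {
	case opIsNull:
		return s.NullCount > 0
	case opIsNotNull:
		return !allNull
	}
	if allNull {
		return false // assumption: comparisons never match a NULL value
	}
	switch o {
	case opEqual:
		return s.Min <= v && v <= s.Max
	case opLessThan:
		return s.Min < v
	case opGreaterThan:
		return s.Max > v
	}
	return true // unknown operator: keep the file
}

func main() {
	s := colStats{Min: 10, Max: 20, NullCount: 0}
	fmt.Println(mayMatch(opEqual, s, 100, 25))       // false
	fmt.Println(mayMatch(opGreaterThan, s, 100, 15)) // true
	fmt.Println(mayMatch(opIsNull, s, 100, 0))       // false
}
```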

Types

type Builder

type Builder struct {
	// contains filtered or unexported fields
}

Builder constructs leaf predicates against a given schema.

Obtain a Builder from [read.ReadBuilder.NewPredicateBuilder] (preferred) or directly via NewBuilder. The Builder is bound to a specific set of fields; if the field set changes (e.g. after projection) call WithProjection to rebind FieldIdx values.

The value argument to comparison methods must match the Go type that corresponds to the Paimon field type:

Paimon type          Go value type
INT / INTEGER        int32
BIGINT               int64
FLOAT                float32
DOUBLE               float64
BOOLEAN              bool
STRING / VARCHAR     string
DATE                 int32  (days since 1970-01-01)
TIMESTAMP            int64  (epoch milliseconds)

Passing the wrong type does not panic at construction time but will produce incorrect results during stats-based pruning and row-level evaluation.
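
Because a mismatch is silent, callers may want to validate literals themselves before building a predicate. The following sketch shows one way to do that; checkLiteral is a hypothetical helper that is not part of this package, and it matches only the bare type names from the table above (parameterized names such as a VARCHAR with a length are not handled).

```go
package main

import "fmt"

// checkLiteral is a hypothetical helper (not part of the package) that
// reports whether value has the Go type listed for the Paimon type name.
func checkLiteral(paimonType string, value interface{}) error {
	ok := false
	switch paimonType {
	case "INT", "INTEGER", "DATE":
		_, ok = value.(int32)
	case "BIGINT", "TIMESTAMP":
		_, ok = value.(int64)
	case "FLOAT":
		_, ok = value.(float32)
	case "DOUBLE":
		_, ok = value.(float64)
	case "BOOLEAN":
		_, ok = value.(bool)
	case "STRING", "VARCHAR":
		_, ok = value.(string)
	default:
		return fmt.Errorf("unsupported type %s", paimonType)
	}
	if !ok {
		return fmt.Errorf("%s literal has Go type %T", paimonType, value)
	}
	return nil
}

func main() {
	// An untyped constant stored in interface{} becomes int, not int32.
	fmt.Println(checkLiteral("INT", 100))        // INT literal has Go type int
	fmt.Println(checkLiteral("INT", int32(100))) // <nil>
}
```

The most common slip is the first call: a bare numeric constant such as 100 is stored as int, so an explicit conversion like int32(100) or float64(100) is needed.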

Example

ExampleBuilder demonstrates the typical pattern: create a Builder from the table schema, build individual leaf predicates, then combine them.

In practice, obtain the Builder from [read.ReadBuilder.NewPredicateBuilder] rather than constructing it directly; doing so automatically uses the effective (possibly projected) read schema.

package main

import (
	"fmt"

	"github.com/larssk/paimon-go/predicate"
	"github.com/larssk/paimon-go/schema"
)

// fields is a small schema used across examples.
var fields = []schema.DataField{
	{ID: 0, Name: "user_id", Type: schema.DataType{Type: "BIGINT"}},
	{ID: 1, Name: "status", Type: schema.DataType{Type: "STRING"}},
	{ID: 2, Name: "amount", Type: schema.DataType{Type: "DOUBLE"}},
	{ID: 3, Name: "region", Type: schema.DataType{Type: "STRING"}},
	{ID: 4, Name: "score", Type: schema.DataType{Type: "INT"}},
}

func main() {
	b := predicate.NewBuilder(fields)

	// amount > 100.0
	gt, err := b.GreaterThan("amount", float64(100))
	if err != nil {
		panic(err)
	}

	// status == "PAID"
	eq, err := b.Equal("status", "PAID")
	if err != nil {
		panic(err)
	}

	// Combine: amount > 100.0 AND status == "PAID"
	filter := predicate.And(gt, eq)

	fmt.Println(filter.Op == predicate.OpAnd)
}
Output:
true
Example (In)

ExampleBuilder_in shows how to match a column against a fixed set of values.

package main

import (
	"fmt"

	"github.com/larssk/paimon-go/predicate"
	"github.com/larssk/paimon-go/schema"
)

// fields is a small schema used across examples.
var fields = []schema.DataField{
	{ID: 0, Name: "user_id", Type: schema.DataType{Type: "BIGINT"}},
	{ID: 1, Name: "status", Type: schema.DataType{Type: "STRING"}},
	{ID: 2, Name: "amount", Type: schema.DataType{Type: "DOUBLE"}},
	{ID: 3, Name: "region", Type: schema.DataType{Type: "STRING"}},
	{ID: 4, Name: "score", Type: schema.DataType{Type: "INT"}},
}

func main() {
	b := predicate.NewBuilder(fields)

	// status IN ("PAID", "PENDING")
	p, err := b.In("status", "PAID", "PENDING")
	if err != nil {
		panic(err)
	}

	fmt.Println(p.Op == predicate.OpIn)
}
Output:
true
Example (IsNull)

ExampleBuilder_isNull shows null and not-null checks.

package main

import (
	"fmt"

	"github.com/larssk/paimon-go/predicate"
	"github.com/larssk/paimon-go/schema"
)

// fields is a small schema used across examples.
var fields = []schema.DataField{
	{ID: 0, Name: "user_id", Type: schema.DataType{Type: "BIGINT"}},
	{ID: 1, Name: "status", Type: schema.DataType{Type: "STRING"}},
	{ID: 2, Name: "amount", Type: schema.DataType{Type: "DOUBLE"}},
	{ID: 3, Name: "region", Type: schema.DataType{Type: "STRING"}},
	{ID: 4, Name: "score", Type: schema.DataType{Type: "INT"}},
}

func main() {
	b := predicate.NewBuilder(fields)

	// rows where user_id IS NOT NULL
	notNull, err := b.IsNotNull("user_id")
	if err != nil {
		panic(err)
	}

	// rows where region IS NULL
	isNull, err := b.IsNull("region")
	if err != nil {
		panic(err)
	}

	// either condition
	filter := predicate.Or(notNull, isNull)

	fmt.Println(filter.Op == predicate.OpOr)
}
Output:
true
Example (Range)

ExampleBuilder_range shows how to express a range filter (low <= col <= high) by combining two leaf predicates with And.

package main

import (
	"fmt"

	"github.com/larssk/paimon-go/predicate"
	"github.com/larssk/paimon-go/schema"
)

// fields is a small schema used across examples.
var fields = []schema.DataField{
	{ID: 0, Name: "user_id", Type: schema.DataType{Type: "BIGINT"}},
	{ID: 1, Name: "status", Type: schema.DataType{Type: "STRING"}},
	{ID: 2, Name: "amount", Type: schema.DataType{Type: "DOUBLE"}},
	{ID: 3, Name: "region", Type: schema.DataType{Type: "STRING"}},
	{ID: 4, Name: "score", Type: schema.DataType{Type: "INT"}},
}

func main() {
	b := predicate.NewBuilder(fields)

	low, err := b.GreaterOrEqual("score", int32(10))
	if err != nil {
		panic(err)
	}
	high, err := b.LessOrEqual("score", int32(100))
	if err != nil {
		panic(err)
	}

	// 10 <= score <= 100
	filter := predicate.And(low, high)

	fmt.Println(filter.Op == predicate.OpAnd)
}
Output:
true
Example (UnknownField)

ExampleBuilder_unknownField shows the error returned when a field name is not present in the schema.

package main

import (
	"fmt"

	"github.com/larssk/paimon-go/predicate"
	"github.com/larssk/paimon-go/schema"
)

// fields is a small schema used across examples.
var fields = []schema.DataField{
	{ID: 0, Name: "user_id", Type: schema.DataType{Type: "BIGINT"}},
	{ID: 1, Name: "status", Type: schema.DataType{Type: "STRING"}},
	{ID: 2, Name: "amount", Type: schema.DataType{Type: "DOUBLE"}},
	{ID: 3, Name: "region", Type: schema.DataType{Type: "STRING"}},
	{ID: 4, Name: "score", Type: schema.DataType{Type: "INT"}},
}

func main() {
	b := predicate.NewBuilder(fields)

	_, err := b.Equal("nonexistent", "value")
	fmt.Println(err)
}
Output:
predicate: unknown field "nonexistent"

func NewBuilder

func NewBuilder(fields []schema.DataField) *Builder

NewBuilder creates a Builder bound to the given field slice. Prefer [read.ReadBuilder.NewPredicateBuilder] which binds automatically to the effective (possibly projected) read schema.

func (*Builder) Equal

func (b *Builder) Equal(name string, value interface{}) (*Predicate, error)

Equal builds a predicate that matches rows where name == value. Returns an error if name is not present in the builder's field set.

func (*Builder) GreaterOrEqual

func (b *Builder) GreaterOrEqual(name string, value interface{}) (*Predicate, error)

GreaterOrEqual builds a predicate that matches rows where name >= value. Returns an error if name is not present in the builder's field set.

func (*Builder) GreaterThan

func (b *Builder) GreaterThan(name string, value interface{}) (*Predicate, error)

GreaterThan builds a predicate that matches rows where name > value. Returns an error if name is not present in the builder's field set.

func (*Builder) In

func (b *Builder) In(name string, values ...interface{}) (*Predicate, error)

In builds a predicate that matches rows where name is equal to any of values. Returns an error if name is not present in the builder's field set.

func (*Builder) IsNotNull

func (b *Builder) IsNotNull(name string) (*Predicate, error)

IsNotNull builds a predicate that matches rows where name IS NOT NULL. Returns an error if name is not present in the builder's field set.

func (*Builder) IsNull

func (b *Builder) IsNull(name string) (*Predicate, error)

IsNull builds a predicate that matches rows where name IS NULL. Returns an error if name is not present in the builder's field set.

func (*Builder) LessOrEqual

func (b *Builder) LessOrEqual(name string, value interface{}) (*Predicate, error)

LessOrEqual builds a predicate that matches rows where name <= value. Returns an error if name is not present in the builder's field set.

func (*Builder) LessThan

func (b *Builder) LessThan(name string, value interface{}) (*Predicate, error)

LessThan builds a predicate that matches rows where name < value. Returns an error if name is not present in the builder's field set.

func (*Builder) NotEqual

func (b *Builder) NotEqual(name string, value interface{}) (*Predicate, error)

NotEqual builds a predicate that matches rows where name != value. Returns an error if name is not present in the builder's field set.

type Op

type Op int

Op is a predicate operator.

const (
	OpEqual          Op = iota // field == value
	OpNotEqual                 // field != value
	OpLessThan                 // field < value
	OpLessOrEqual              // field <= value
	OpGreaterThan              // field > value
	OpGreaterOrEqual           // field >= value
	OpIsNull                   // field IS NULL
	OpIsNotNull                // field IS NOT NULL
	OpIn                       // field IN (values...)
	OpAnd                      // logical AND of Children
	OpOr                       // logical OR of Children
	OpNot                      // logical NOT of Children[0]
)

type Predicate

type Predicate struct {
	Op        Op
	FieldIdx  int
	FieldName string
	// TypeTag as returned by schema.TypeTag (used for BinaryRow decoding).
	TypeTag  string
	Literals []interface{}
	// For AND / OR / NOT: child predicates.
	Children []*Predicate
}

Predicate is a filter expression over a table's fields.

A Predicate is either a leaf (Op is one of the comparison operators, FieldIdx and Literals are set) or a compound node (Op is OpAnd / OpOr / OpNot, Children are set). The tree is built by Builder methods and the And, Or, Not combinators; callers do not normally construct Predicate values directly.

FieldIdx is the zero-based index of the field in the read/projected schema and must be rebound with WithProjection whenever the column set changes.
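
The leaf/compound structure lends itself to a recursive walk. The sketch below uses a simplified stand-in type, node, with a single int64 literal per leaf; node, its op constants, and eval are hypothetical and only illustrate how FieldIdx and Children could be used, not how EvalRow is actually written.

```go
package main

import "fmt"

type op int

const (
	opGreaterThan op = iota
	opLessThan
	opAnd
	opOr
	opNot
)

// node is a simplified stand-in for Predicate: leaves carry a column index
// and one literal, compound nodes carry children.
type node struct {
	Op       op
	FieldIdx int
	Literal  int64
	Children []*node
}

// eval walks the tree for one row, where row[i] is the value of column i.
func eval(n *node, row []int64) bool {
	switch n.Op {
	case opGreaterThan:
		return row[n.FieldIdx] > n.Literal
	case opLessThan:
		return row[n.FieldIdx] < n.Literal
	case opAnd:
		for _, c := range n.Children {
			if !eval(c, row) {
				return false
			}
		}
		return true
	case opOr:
		for _, c := range n.Children {
			if eval(c, row) {
				return true
			}
		}
		return false
	case opNot:
		return !eval(n.Children[0], row)
	}
	return false
}

func main() {
	// (col0 > 10 AND col1 < 5) OR NOT (col0 > 10)
	gt := &node{Op: opGreaterThan, FieldIdx: 0, Literal: 10}
	lt := &node{Op: opLessThan, FieldIdx: 1, Literal: 5}
	tree := &node{Op: opOr, Children: []*node{
		{Op: opAnd, Children: []*node{gt, lt}},
		{Op: opNot, Children: []*node{gt}},
	}}
	fmt.Println(eval(tree, []int64{20, 3})) // true
	fmt.Println(eval(tree, []int64{20, 9})) // false
	fmt.Println(eval(tree, []int64{5, 9}))  // true
}
```

Because leaves index into the row by position, the sketch also shows why a stale FieldIdx after projection would silently read the wrong column.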

func And

func And(preds ...*Predicate) *Predicate

And combines one or more predicates with logical AND. It returns a predicate that is true only when all children are true. Passing a single predicate is valid and returns an OpAnd wrapper around it.

func Not

func Not(p *Predicate) *Predicate

Not negates a predicate. Note: NOT is not applied during stats-based file pruning (conservative: the file is always kept). It is applied at row-level evaluation by EvalRow.

Example

ExampleNot shows how to negate a predicate. Note: Not is applied at row-level evaluation but is skipped during stats-based file pruning (the file is kept conservatively).

package main

import (
	"fmt"

	"github.com/larssk/paimon-go/predicate"
	"github.com/larssk/paimon-go/schema"
)

// fields is a small schema used across examples.
var fields = []schema.DataField{
	{ID: 0, Name: "user_id", Type: schema.DataType{Type: "BIGINT"}},
	{ID: 1, Name: "status", Type: schema.DataType{Type: "STRING"}},
	{ID: 2, Name: "amount", Type: schema.DataType{Type: "DOUBLE"}},
	{ID: 3, Name: "region", Type: schema.DataType{Type: "STRING"}},
	{ID: 4, Name: "score", Type: schema.DataType{Type: "INT"}},
}

func main() {
	b := predicate.NewBuilder(fields)

	eu, err := b.Equal("region", "EU")
	if err != nil {
		panic(err)
	}

	// NOT (region == "EU")
	notEU := predicate.Not(eu)

	fmt.Println(notEU.Op == predicate.OpNot)
}
Output:
true
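
Why NOT is kept conservative during pruning can be seen in a small sketch. A stats test answers "may some row match?", and negating that answer is not the same as asking "may some row fail to match?". The stats type and the three functions below are hypothetical stand-ins, not the package's code.

```go
package main

import "fmt"

// stats is a hypothetical stand-in for a column's min/max in one file.
type stats struct{ Min, Max int64 }

// mayEqual reports whether some row in the file could equal v.
func mayEqual(s stats, v int64) bool { return s.Min <= v && v <= s.Max }

// mayNotEqualNaive wrongly negates the "may match" answer of the child.
func mayNotEqualNaive(s stats, v int64) bool { return !mayEqual(s, v) }

// mayNotEqualConservative always keeps the file and leaves the decision
// to row-level evaluation.
func mayNotEqualConservative(stats, int64) bool { return true }

func main() {
	s := stats{Min: 1, Max: 10} // the file may hold any values in 1..10

	// Naive negation skips the file although values other than 5 may exist.
	fmt.Println(mayNotEqualNaive(s, 5)) // false

	// The conservative answer keeps the file; each row is checked later.
	fmt.Println(mayNotEqualConservative(s, 5)) // true
}
```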

func Or

func Or(preds ...*Predicate) *Predicate

Or combines two or more predicates with logical OR. Returns a predicate that is true when at least one child is true.

func WithProjection

func WithProjection(p *Predicate, projectedFields []schema.DataField) *Predicate

WithProjection rebinds the FieldIdx of each leaf in p to that field's position in projectedFields. A leaf whose field is not present in the projected set is removed (the predicate becomes nil for that leaf).
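
The rebinding step can be sketched for a single leaf as follows. The leaf type and the rebind function are hypothetical stand-ins, and matching fields by name is an assumption of this sketch; the package may match on a different key.

```go
package main

import "fmt"

// leaf is a simplified stand-in for a leaf Predicate.
type leaf struct {
	FieldName string
	FieldIdx  int
}

// rebind returns a copy of l whose FieldIdx points into projected, or nil
// when the field is not part of the projection.
func rebind(l *leaf, projected []string) *leaf {
	for i, name := range projected {
		if name == l.FieldName {
			return &leaf{FieldName: l.FieldName, FieldIdx: i}
		}
	}
	return nil
}

func main() {
	// Full schema: user_id(0), status(1), amount(2).
	// The projection keeps amount and user_id, in that order.
	projected := []string{"amount", "user_id"}
	amount := &leaf{FieldName: "amount", FieldIdx: 2}
	status := &leaf{FieldName: "status", FieldIdx: 1}

	fmt.Println(rebind(amount, projected).FieldIdx) // 0
	fmt.Println(rebind(status, projected) == nil)   // true
}
```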
