dataset

package
v0.0.0-...-370038a
Published: Apr 24, 2026 License: MIT Imports: 5 Imported by: 0

Documentation

Overview

Package dataset provides zero-copy, lazy-evaluating columnar data abstractions for the Grammar of Graphics pipeline.

Engine-First Architecture

Every data operation is delegated to an Engine backend. The dataset package defines only interfaces and contracts — no concrete column types, no fallbacks. Engines (Arrow, memory, SQL) implement sub-interfaces (Aggregator, Windower, Joiner, etc.) for the operations they support.
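
For example, a caller that needs aggregation first asserts the engine to the Aggregator sub-interface. A minimal sketch, assuming a table tbl that carries an engine and has a numeric column named "price" (both illustrative):

eng := dataset.GetEngine(tbl)
agg, ok := eng.(dataset.Aggregator)
if !ok {
    // this engine does not support aggregation kernels
}
col, err := tbl.Column("price")
if err != nil {
    // handle missing column
}
total, err := agg.Sum(col) // single-element column, same type as "price"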

Type System

The type system is aligned with Apache Arrow:

  • Field maps to arrow.Field (name, type, nullable, metadata)
  • Schema maps to arrow.Schema (ordered collection of fields)
  • AnyColumn is the type-erased column interface (engine-native storage)
  • Column is the generic typed access layer
  • GetColumn bridges untyped to typed via a single type assertion

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Close

func Close(ds Table) error

Close releases resources if the dataset implements Closer. Safe to call on any Table; returns nil for tables that hold no resources.

func Names

func Names(ds Table) []string

Names returns the column names from a dataset's schema.

Types

type AggFunc

type AggFunc int

AggFunc identifies an aggregation function.

const (
	AggSum AggFunc = iota
	AggMean
	AggMin
	AggMax
	AggCount
	AggMedian
	AggVariance
)

type AggSpec

type AggSpec struct {
	OutputName string  // name of the result column
	InputName  string  // name of the source column
	Fn         AggFunc // which aggregation to apply
}

AggSpec describes a single aggregation to apply in Summarize.

func Count

func Count(out, in string) AggSpec

func Max

func Max(out, in string) AggSpec

func Mean

func Mean(out, in string) AggSpec

func Median

func Median(out, in string) AggSpec

func Min

func Min(out, in string) AggSpec

func Sum

func Sum(out, in string) AggSpec

Sum and the other aggregation helpers (Count, Max, Mean, Median, Min, Variance) construct AggSpecs from an output column name and an input column name.

func Variance

func Variance(out, in string) AggSpec

type Aggregator

type Aggregator interface {
	Sum(col AnyColumn) (AnyColumn, error)
	Mean(col AnyColumn) (AnyColumn, error)
	MinMax(col AnyColumn) (min AnyColumn, max AnyColumn, err error)
	Count(col AnyColumn) (AnyColumn, error)
	Median(col AnyColumn) (AnyColumn, error)
	Variance(col AnyColumn) (AnyColumn, error)
}

Aggregator provides vectorized aggregation kernels. Each method returns a single-element AnyColumn whose type follows Arrow compute kernel rules:

  • Sum: numeric → same type (int64→int64, float64→float64)
  • Mean: numeric → float64 (always widens)
  • MinMax: any ordered type → (min, max) of same type
  • Count: any → int64
  • Median: numeric → float64
  • Variance: numeric → float64

For Arrow: delegates to arrow/math SIMD operations. For SQL: generates SELECT SUM/AVG/MIN/MAX/COUNT queries.

type AndPred

type AndPred struct{ Preds []Masker }

AndPred combines masks with AND.

func And

func And(preds ...Masker) AndPred

func (AndPred) Expr

func (p AndPred) Expr() string

func (AndPred) Mask

func (p AndPred) Mask(ds Table) ([]bool, error)

type AnyColumn

type AnyColumn interface {
	Name() string
	Len() int64
	DType() DType
}

AnyColumn is the type-erased column interface. This is what a Table stores, engines operate on, and maps hold. Every engine-native column type implements this.

type BetweenPred

type BetweenPred struct {
	Col    string
	Lo, Hi any
}

func Between

func Between(col string, lo, hi any) BetweenPred

func (BetweenPred) Expr

func (p BetweenPred) Expr() string

func (BetweenPred) Mask

func (p BetweenPred) Mask(ds Table) ([]bool, error)

type BoolAppender

type BoolAppender interface {
	Append(v bool)
	AppendNull()
	AppendValues(vs []bool)
	Reserve(n int)
}

BoolAppender streams bool values into a column.

type BoolMask

type BoolMask []bool

BoolMask is a pre-computed boolean mask that implements Masker. Useful when the filter has already been computed externally (e.g. faceting).

func (BoolMask) Expr

func (m BoolMask) Expr() string

func (BoolMask) Mask

func (m BoolMask) Mask(_ Table) ([]bool, error)

type Builder

type Builder interface {
	Float64(col string) Float64Appender
	Int64(col string) Int64Appender
	String(col string) StringAppender
	Bool(col string) BoolAppender

	Build() (Table, error)
}

Builder provides streaming, typed, zero-boxing construction. Each column has its own typed appender — no any boxing, no allocations per row.
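
A minimal sketch of streaming construction, assuming a Builder obtained from an engine's BuilderFactory (the factory variable and column names are illustrative):

b := factory.NewBuilder(dataset.NewSchema(
    dataset.FloatCol("x"),
    dataset.NullableStringCol("label"),
))
xs := b.Float64("x")
labels := b.String("label")
xs.Reserve(3)
labels.Reserve(3)
xs.AppendValues([]float64{1.5, 2.5, 3.5})
labels.Append("a")
labels.Append("b")
labels.AppendNull()
tbl, err := b.Build()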

type BuilderFactory

type BuilderFactory interface {
	NewBuilder(schema *Schema) Builder
}

BuilderFactory creates schema-aware builders for streaming construction.

type CSVConfig

type CSVConfig struct {
	HasHeader  bool
	Comma      rune
	Comment    rune
	NullValues []string
	// ChunkSize is the number of rows per batch. 0 means engine default.
	// Arrow default: 65536, Memory default: unlimited.
	ChunkSize int
}

CSVConfig holds engine-agnostic CSV configuration. The dataset/csv facade constructs this from functional options and passes it to the engine's CSVReader/CSVWriter implementation.

type CSVReader

type CSVReader interface {
	ReadCSV(ctx context.Context, r io.Reader, cfg CSVConfig) (Table, error)
}

CSVReader reads CSV data into an engine-native Table. Memory engine: uses go-simdcsv + schema inference. Arrow engine: uses arrow/csv.NewInferringReader for zero-copy ingest.
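
A minimal ingest sketch, assuming a CSVReader implementation and an open io.Reader (both illustrative); in practice the dataset/csv facade builds the CSVConfig from functional options:

cfg := dataset.CSVConfig{
    HasHeader:  true,
    Comma:      ',',
    NullValues: []string{"", "NA"},
}
tbl, err := reader.ReadCSV(ctx, file, cfg)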

type CSVWriter

type CSVWriter interface {
	WriteCSV(ctx context.Context, w io.Writer, ds Table, cfg CSVConfig) error
}

CSVWriter writes a Table to CSV. Memory engine: uses go-simdcsv Writer. Arrow engine: uses go-simdcsv Writer (generic — CSV output is string-based).

type Caster

type Caster interface {
	Cast(col AnyColumn, target DType) (AnyColumn, error)
}

Caster provides engine-controlled type casting. Casting is an engine operation — the engine knows its native column types and how to convert between them.

type Closer

type Closer interface {
	Close() error
}

Closer is optionally implemented by datasets that hold resources requiring explicit cleanup (e.g., Arrow tables, database connections).

type Column

type Column[T any] interface {
	AnyColumn
	Values() []T
	IsNull() []bool
}

Column is the typed access layer. Engine-specific column types implement both AnyColumn and Column[T] for their native type.

Values returns the underlying typed slice — zero-copy for both Arrow (returns the Arrow buffer) and memory (returns the Go slice).

IsNull returns the null bitmap. nil means no nulls (common case, zero alloc).

func GetColumn

func GetColumn[T any](ds Table, name string) (Column[T], error)

GetColumn retrieves a typed column from a dataset. This is the only place a type assertion occurs — call sites get compile-time type safety from this point forward.
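
A minimal sketch of typed access, assuming the table has a float64 column named "x" (illustrative):

col, err := dataset.GetColumn[float64](tbl, "x")
if err != nil {
    // column is missing or not float64-typed
}
vals := col.Values()  // zero-copy []float64
nulls := col.IsNull() // nil when the column has no nulls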

type ColumnFactory

type ColumnFactory interface {
	NewFloat64Column(name string, data []float64) AnyColumn
	NewInt64Column(name string, data []int64) AnyColumn
	NewStringColumn(name string, data []string) AnyColumn
	NewBoolColumn(name string, data []bool) AnyColumn
	NewTimestampColumn(name string, data []int64) AnyColumn

	// FromColumns assembles columns into a Dataset with the given schema.
	// All columns must have the same length.
	FromColumns(schema *Schema, cols ...AnyColumn) (Table, error)
}

ColumnFactory wraps existing typed slices into engine-native columns. Memory engine: wraps the slice (zero-copy). Arrow engine: builds an Arrow array (one allocation).

type CompPred

type CompPred struct {
	Col string
	Op  Op
	Val any
}

CompPred compares a column against a scalar value. Implements both Masker (local eval) and Expr() (SQL pushdown).

func Eq

func Eq(col string, val any) CompPred

func Ge

func Ge(col string, val any) CompPred

func Gt

func Gt(col string, val any) CompPred

func Le

func Le(col string, val any) CompPred

func Lt

func Lt(col string, val any) CompPred

func Ne

func Ne(col string, val any) CompPred

func (CompPred) Expr

func (p CompPred) Expr() string

func (CompPred) Mask

func (p CompPred) Mask(ds Table) ([]bool, error)

type Composer

type Composer interface {
	Stack(datasets ...Table) (Table, error)
	Combine(datasets ...Table) (Table, error)
}

Composer provides row/column binding operations. For Arrow: zero-copy concatenation of Arrow arrays. For SQL: UNION ALL / lateral join.

type DType

type DType int

DType represents the logical data type of a column. This is the type ID — analogous to arrow.Type.

const (
	// DTypeFloat64 is a 64-bit floating point column.
	DTypeFloat64 DType = iota
	// DTypeInt64 is a 64-bit integer column.
	DTypeInt64
	// DTypeString is a string/categorical column.
	DTypeString
	// DTypeBool is a boolean column.
	DTypeBool
	// DTypeTimestamp is a timestamp column stored as int64 nanoseconds
	// since the Unix epoch (1970-01-01T00:00:00Z). This representation
	// is zero-copy compatible with Arrow's TIMESTAMP(ns) type.
	DTypeTimestamp
	// DTypeUnknown is an unrecognized type.
	DTypeUnknown
)

func (DType) String

func (d DType) String() string

String returns the human-readable name of the DType.

type Dataset

type Dataset struct {
	// contains filtered or unexported fields
}

Dataset is the fluent API for data manipulation. All verbs return a new Dataset (immutable chain). Every operation delegates to the dataset's engine via sub-interfaces — the Dataset never touches raw data directly.

Usage:

result, err := dataset.From(ds).
    Select("x", "y").
    Filter(dataset.Gt("x", 0)).
    Arrange("x").
    Collect()

func From

func From(ds Table) Dataset

From wraps a Table in a Dataset for fluent verb chaining.

func NewDataset

func NewDataset(eng Engine, cols ...AnyColumn) (Dataset, error)

NewDataset creates a Dataset from an engine and columns. The schema is inferred from the columns' names and types.

func ReplaceColumn

func ReplaceColumn(ds Dataset, name string, values []float64) (Dataset, error)

ReplaceColumn replaces a named column in a Dataset with new float64 values. All other columns are preserved. Used for discrete-to-numeric remapping.

func (Dataset) AntiJoin

func (f Dataset) AntiJoin(other Table, spec JoinSpec) Dataset

func (Dataset) Arrange

func (f Dataset) Arrange(cols ...string) Dataset

Arrange sorts the dataset by the named columns (ascending). The engine's Selector.SortIndices computes the permutation; Selector.Select applies it.

func (Dataset) Collect

func (f Dataset) Collect() (Table, error)

Collect materializes the pipeline and returns the resulting Table along with the first error encountered in the chain.

func (Dataset) Column

func (f Dataset) Column(name string) (AnyColumn, error)

Convenience forwarding methods — allow Dataset to be used where Table is expected.

func (Dataset) Combine

func (f Dataset) Combine(others ...Table) Dataset

func (Dataset) Distinct

func (f Dataset) Distinct(cols ...string) Dataset

Distinct removes duplicate rows based on the specified columns. If no columns are specified, all columns are used.

func (Dataset) DropNA

func (f Dataset) DropNA(cols ...string) Dataset

func (Dataset) Err

func (f Dataset) Err() error

Err returns the first error encountered in the chain, or nil.

func (Dataset) Fill

func (f Dataset) Fill(col string, dir FillDirection) Dataset

func (Dataset) Filter

func (f Dataset) Filter(mask Masker) Dataset

Filter keeps rows where the Masker evaluates to true.
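
Predicates compose. A sketch assuming columns "age" and "name" (illustrative):

adults := dataset.From(tbl).Filter(dataset.And(
    dataset.Ge("age", 18),
    dataset.Not(dataset.IsNull("name")),
))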

func (Dataset) FullJoin

func (f Dataset) FullJoin(other Table, spec JoinSpec) Dataset

func (Dataset) GroupBy

func (f Dataset) GroupBy(cols ...string) GroupedFrame

GroupBy specifies columns to group by. Returns a GroupedFrame for Summarize.

func (Dataset) Head

func (f Dataset) Head(n int) Dataset

Head returns the first n rows.

func (Dataset) InnerJoin

func (f Dataset) InnerJoin(other Table, spec JoinSpec) Dataset

func (Dataset) LeftJoin

func (f Dataset) LeftJoin(other Table, spec JoinSpec) Dataset

func (Dataset) Mutate

func (f Dataset) Mutate(name string, fn MutateFunc) Dataset

Mutate appends or replaces a column using a MutateFunc.

func (Dataset) NumCols

func (f Dataset) NumCols() int64

func (Dataset) NumRows

func (f Dataset) NumRows() int64

func (Dataset) PivotLonger

func (f Dataset) PivotLonger(spec PivotLongerSpec) Dataset

func (Dataset) PivotWider

func (f Dataset) PivotWider(spec PivotWiderSpec) Dataset

func (Dataset) Rename

func (f Dataset) Rename(oldName, newName string) Dataset

Rename renames a column.

func (Dataset) RightJoin

func (f Dataset) RightJoin(other Table, spec JoinSpec) Dataset

func (Dataset) Schema

func (f Dataset) Schema() *Schema

func (Dataset) Select

func (f Dataset) Select(cols ...string) Dataset

Select keeps only the named columns, in the order specified.

func (Dataset) SemiJoin

func (f Dataset) SemiJoin(other Table, spec JoinSpec) Dataset

func (Dataset) Separate

func (f Dataset) Separate(col string, into []string, sep string) Dataset

func (Dataset) Slice

func (f Dataset) Slice(start, end int) Dataset

Slice returns rows in the range [start, end). The engine's Selector.Slice handles this — for Arrow, zero-copy via array.NewSlice.

func (Dataset) Stack

func (f Dataset) Stack(others ...Table) Dataset

func (Dataset) Table

func (f Dataset) Table() Table

Table returns the underlying Table, or nil if an error occurred.

func (Dataset) Tail

func (f Dataset) Tail(n int) Dataset

Tail returns the last n rows.

type Engine

type Engine interface {
	// Name returns a human-readable identifier (e.g., "arrow", "memory", "sql").
	Name() string
}

Engine is the marker interface that all compute backends implement.

func GetEngine

func GetEngine(ds Table) Engine

GetEngine extracts the engine from a dataset. Returns nil if the dataset does not carry an engine.

type ErrColumnNotFound

type ErrColumnNotFound struct {
	Name string
}

ErrColumnNotFound indicates a requested column does not exist.

func (*ErrColumnNotFound) Error

func (e *ErrColumnNotFound) Error() string

type Field

type Field struct {
	Name     string
	Dtype    DType
	Nullable bool
	Metadata map[string]string
}

Field describes a single column in a dataset — its name, logical type, nullability, and optional metadata. This maps directly to arrow.Field.

Metadata carries type-specific parameters that DType alone cannot express:

  • Timestamp timezone: {"tz": "America/Sao_Paulo"}
  • Display format: {"format": "2006-01-02"}
  • Units: {"unit": "ns"}
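
For example, a timestamp field carrying timezone and unit metadata might be declared as follows (values illustrative):

ts := dataset.TimestampCol("created_at").
    WithMetadata(map[string]string{"tz": "America/Sao_Paulo", "unit": "ns"})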

func BoolCol

func BoolCol(name string) Field

func FloatCol

func FloatCol(name string) Field

func IntCol

func IntCol(name string) Field

func NullableFloatCol

func NullableFloatCol(name string) Field

NullableFloatCol creates a nullable float64 field.

func NullableIntCol

func NullableIntCol(name string) Field

NullableIntCol creates a nullable int64 field.

func NullableStringCol

func NullableStringCol(name string) Field

NullableStringCol creates a nullable string field.

func StringCol

func StringCol(name string) Field

func TimestampCol

func TimestampCol(name string) Field

func (Field) WithMetadata

func (f Field) WithMetadata(md map[string]string) Field

WithMetadata returns a copy of the field with the given metadata.

func (Field) WithNullable

func (f Field) WithNullable() Field

WithNullable returns a copy of the field with Nullable set.

type FillDirection

type FillDirection int

FillDirection specifies the direction for filling missing values.

const (
	// FillDown fills missing values with the previous non-null value (carry forward).
	FillDown FillDirection = iota
	// FillUp fills missing values with the next non-null value (carry backward).
	FillUp
)

type Filler

type Filler interface {
	Fill(col AnyColumn, dir FillDirection) (AnyColumn, error)
	DropNA(ds Table, cols ...string) (Table, error)
	ReplaceNA(col AnyColumn, defaultVal float64) (AnyColumn, error)
}

Filler provides missing-value handling operations. For Arrow: streaming fill with zero allocation. For SQL: generates COALESCE / window-based fill.

type Filterer

type Filterer interface {
	Filter(ds Table, mask Masker) (Table, error)
}

Filterer provides mask-based row filtering. For Arrow: boolean mask filtering with zero-copy. For SQL: generates WHERE clauses.

type Float64Appender

type Float64Appender interface {
	Append(v float64)
	AppendNull()
	AppendValues(vs []float64)
	Reserve(n int)
}

Float64Appender streams float64 values into a column.

type GroupedFrame

type GroupedFrame struct {
	// contains filtered or unexported fields
}

GroupedFrame holds a Dataset with group-by columns set.

func (GroupedFrame) Summarize

func (gf GroupedFrame) Summarize(specs ...AggSpec) Dataset

Summarize applies aggregations per group using the engine's Aggregator. All computation is delegated to the engine — the Dataset only orchestrates grouping.
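
A minimal grouped-aggregation sketch, assuming columns "species" and "body_mass" (illustrative):

result, err := dataset.From(tbl).
    GroupBy("species").
    Summarize(
        dataset.Mean("mean_mass", "body_mass"),
        dataset.Count("n", "body_mass"),
    ).
    Collect()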

type HasEngine

type HasEngine interface {
	Table
	Engine() Engine
}

HasEngine is implemented by datasets that carry an engine reference. This enables engine propagation through transformations — stat packages and ggplot internals can produce new datasets using the same engine without importing engine-specific packages.

type InPred

type InPred struct {
	Col  string
	Vals []any
}

func In

func In(col string, vals ...any) InPred

func (InPred) Expr

func (p InPred) Expr() string

func (InPred) Mask

func (p InPred) Mask(ds Table) ([]bool, error)

type Int64Appender

type Int64Appender interface {
	Append(v int64)
	AppendNull()
	AppendValues(vs []int64)
	Reserve(n int)
}

Int64Appender streams int64 values into a column.

type IsNotNullPred

type IsNotNullPred struct{ Col string }

func IsNotNull

func IsNotNull(col string) IsNotNullPred

func (IsNotNullPred) Expr

func (p IsNotNullPred) Expr() string

func (IsNotNullPred) Mask

func (p IsNotNullPred) Mask(ds Table) ([]bool, error)

type IsNullPred

type IsNullPred struct{ Col string }

func IsNull

func IsNull(col string) IsNullPred

func (IsNullPred) Expr

func (p IsNullPred) Expr() string

func (IsNullPred) Mask

func (p IsNullPred) Mask(ds Table) ([]bool, error)

type JoinSpec

type JoinSpec struct {
	Type      JoinType
	LeftCols  []string
	RightCols []string
}

JoinSpec describes how to match rows between two datasets.

func On

func On(cols ...string) JoinSpec

On creates a JoinSpec matching on columns with the same name in both datasets.
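
A minimal join sketch, assuming two tables that share a "customer_id" column (illustrative):

joined, err := dataset.From(orders).
    LeftJoin(customers, dataset.On("customer_id")).
    Collect()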

type JoinType

type JoinType int

JoinType identifies the kind of join to perform.

const (
	// JoinLeft keeps all rows from the left dataset; unmatched right rows are null-filled.
	JoinLeft JoinType = iota
	// JoinRight keeps all rows from the right dataset; unmatched left rows are null-filled.
	JoinRight
	// JoinInner keeps only rows with matches in both datasets.
	JoinInner
	// JoinFull keeps all rows from both datasets; unmatched sides are null-filled.
	JoinFull
	// JoinSemi keeps rows from the left that have at least one match in the right.
	// No columns from the right are included.
	JoinSemi
	// JoinAnti keeps rows from the left that have NO match in the right.
	// No columns from the right are included.
	JoinAnti
)

type Joiner

type Joiner interface {
	Join(left, right Table, spec JoinSpec) (Table, error)
}

Joiner provides join operations across datasets. For Arrow: hash-join with lazy indexed column views. For SQL: generates JOIN ... ON ... clauses.

type Masker

type Masker interface {
	// Mask computes a boolean mask of length int(ds.NumRows()). True entries are kept.
	Mask(ds Table) ([]bool, error)
}

Masker describes a row-level filter condition that can be lazily evaluated against a dataset to produce a boolean mask.
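
Any type with a Mask method satisfies the interface. A hypothetical Masker that keeps every other row regardless of content:

type everyOtherRow struct{}

func (everyOtherRow) Mask(ds dataset.Table) ([]bool, error) {
    mask := make([]bool, ds.NumRows())
    for i := range mask {
        mask[i] = i%2 == 0
    }
    return mask, nil
}

Such a value can be passed directly to Filter.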

type MathKernel

type MathKernel interface {
	// Binary arithmetic (column × column, same length required)
	AddCols(a, b AnyColumn) (AnyColumn, error)
	SubCols(a, b AnyColumn) (AnyColumn, error)
	MulCols(a, b AnyColumn) (AnyColumn, error)
	DivCols(a, b AnyColumn) (AnyColumn, error)

	// Scalar arithmetic (column × scalar)
	AddScalar(col AnyColumn, val float64) (AnyColumn, error)
	MulScalar(col AnyColumn, val float64) (AnyColumn, error)

	// Unary numeric
	Abs(col AnyColumn) (AnyColumn, error)
	Neg(col AnyColumn) (AnyColumn, error)
	Sign(col AnyColumn) (AnyColumn, error)
	Sqrt(col AnyColumn) (AnyColumn, error)
	Pow(col AnyColumn, exp float64) (AnyColumn, error)

	// Transcendental — logarithmic
	Exp(col AnyColumn) (AnyColumn, error)
	Ln(col AnyColumn) (AnyColumn, error)
	Log2(col AnyColumn) (AnyColumn, error)
	Log10(col AnyColumn) (AnyColumn, error)

	// Transcendental — trigonometric
	Sin(col AnyColumn) (AnyColumn, error)
	Cos(col AnyColumn) (AnyColumn, error)
	Tan(col AnyColumn) (AnyColumn, error)
	Asin(col AnyColumn) (AnyColumn, error)
	Acos(col AnyColumn) (AnyColumn, error)
	Atan(col AnyColumn) (AnyColumn, error)
	Atan2(y, x AnyColumn) (AnyColumn, error)

	// Transcendental — hyperbolic / special
	Tanh(col AnyColumn) (AnyColumn, error)
	Sigmoid(col AnyColumn) (AnyColumn, error)
	Erf(col AnyColumn) (AnyColumn, error)

	// Rounding
	Round(col AnyColumn) (AnyColumn, error)
	Floor(col AnyColumn) (AnyColumn, error)
	Ceil(col AnyColumn) (AnyColumn, error)

	// Bitwise (int64 columns only)
	BitAnd(a, b AnyColumn) (AnyColumn, error)
	BitOr(a, b AnyColumn) (AnyColumn, error)
	BitXor(a, b AnyColumn) (AnyColumn, error)
	BitNot(col AnyColumn) (AnyColumn, error)
	BitShiftLeft(col AnyColumn, n int) (AnyColumn, error)
	BitShiftRight(col AnyColumn, n int) (AnyColumn, error)
}

MathKernel provides element-wise mathematical transforms on numeric columns.

Arrow engine: uses the Arrow compute Datum API when available, with highway SIMD filling the gaps. Memory engine: uses highway SIMD on raw slices, falling back to the math stdlib. SQL engine: generates math functions (EXP, LOG, SIN, etc.).

All methods require float64 columns unless noted (bitwise requires int64).

type MutateFunc

type MutateFunc interface {
	// Apply produces a new column from the dataset.
	Apply(ds Table) (AnyColumn, error)
}

MutateFunc describes a column transformation for Mutate.
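
A hypothetical MutateFunc that derives a doubled copy of column "x", assuming the table's engine implements ColumnFactory (names illustrative):

type doubleX struct{}

func (doubleX) Apply(ds dataset.Table) (dataset.AnyColumn, error) {
    x, err := dataset.GetColumn[float64](ds, "x")
    if err != nil {
        return nil, err
    }
    src := x.Values()
    out := make([]float64, len(src))
    for i, v := range src {
        out[i] = 2 * v
    }
    cf, ok := dataset.GetEngine(ds).(dataset.ColumnFactory)
    if !ok {
        return nil, errors.New("engine does not implement ColumnFactory")
    }
    return cf.NewFloat64Column("x2", out), nil
}

It would then be applied with dataset.From(tbl).Mutate("x2", doubleX{}).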

type NotPred

type NotPred struct{ Pred Masker }

NotPred inverts a mask.

func Not

func Not(pred Masker) NotPred

func (NotPred) Expr

func (p NotPred) Expr() string

func (NotPred) Mask

func (p NotPred) Mask(ds Table) ([]bool, error)

type Op

type Op int

Op identifies a comparison operator.

const (
	OpGt        Op = iota // col > val
	OpLt                  // col < val
	OpGe                  // col >= val
	OpLe                  // col <= val
	OpEq                  // col == val
	OpNe                  // col != val
	OpBetween             // lo <= col <= hi
	OpIn                  // col IN (vals...)
	OpIsNull              // col IS NULL
	OpIsNotNull           // col IS NOT NULL
)

type OrPred

type OrPred struct{ Preds []Masker }

OrPred combines masks with OR.

func Or

func Or(preds ...Masker) OrPred

func (OrPred) Expr

func (p OrPred) Expr() string

func (OrPred) Mask

func (p OrPred) Mask(ds Table) ([]bool, error)

type ParquetConfig

type ParquetConfig struct {
	// Compression codec: "snappy", "gzip", "zstd", "lz4", "none".
	Compression string
}

ParquetConfig holds engine-agnostic Parquet configuration.

type ParquetReader

type ParquetReader interface {
	ReadParquet(ctx context.Context, r io.ReaderAt, size int64, cfg ParquetConfig) (Table, error)
}

ParquetReader reads Parquet data into an engine-native Table. Memory engine: uses parquet-go for struct-based row reading. Arrow engine: uses pqarrow.ReadTable for zero-copy columnar ingest.

type ParquetWriter

type ParquetWriter interface {
	WriteParquet(ctx context.Context, w io.Writer, ds Table, cfg ParquetConfig) error
}

ParquetWriter writes a Table to Parquet format. Memory engine: uses parquet-go GenericWriter. Arrow engine: uses pqarrow.WriteTable.

type PivotLongerSpec

type PivotLongerSpec struct {
	// Cols are the column names to pivot from wide to long format.
	// These columns are "gathered" into a single name+value pair.
	Cols []string
	// NamesTo is the output column name that will hold the original column names.
	NamesTo string
	// ValuesTo is the output column name that will hold the values.
	ValuesTo string
}

PivotLongerSpec configures a PivotLonger operation.
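
A minimal sketch, assuming wide quarterly columns "q1" through "q4" (illustrative):

long := dataset.From(wide).PivotLonger(dataset.PivotLongerSpec{
    Cols:     []string{"q1", "q2", "q3", "q4"},
    NamesTo:  "quarter",
    ValuesTo: "revenue",
})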

type PivotWiderSpec

type PivotWiderSpec struct {
	// NamesFrom is the column whose unique values become new column names.
	NamesFrom string
	// ValuesFrom is the column whose values fill the new columns.
	ValuesFrom string
}

PivotWiderSpec configures a PivotWider operation.

type Reshaper

type Reshaper interface {
	PivotLonger(ds Table, spec PivotLongerSpec) (Table, error)
	PivotWider(ds Table, spec PivotWiderSpec) (Table, error)
	Separate(ds Table, col string, into []string, sep string) (Table, error)
	Concatenate(ds Table, col string, from []string, sep string) (Table, error)
	Complete(ds Table, cols ...string) (Table, error)
}

Reshaper provides reshape/pivot operations. For Arrow: lazy column views (repeatedView, interleavedView). For SQL: generates CASE WHEN / UNPIVOT / CROSSTAB.

type Schema

type Schema struct {
	// contains filtered or unexported fields
}

Schema describes the complete structure of a dataset — an ordered collection of Fields with a name-to-index lookup. This maps directly to arrow.Schema.

func NewSchema

func NewSchema(fields ...Field) *Schema

NewSchema creates a Schema from an ordered list of fields. Panics if any two fields share the same name.
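
For example (field names illustrative):

schema := dataset.NewSchema(
    dataset.StringCol("species"),
    dataset.FloatCol("body_mass"),
    dataset.NullableIntCol("year"),
)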

func (*Schema) Field

func (s *Schema) Field(i int) Field

Field returns the field at index i.

func (*Schema) FieldIndex

func (s *Schema) FieldIndex(name string) int

FieldIndex returns the index of the named field, or -1.

func (*Schema) Fields

func (s *Schema) Fields() []Field

Fields returns a copy of the schema's fields.

func (*Schema) HasField

func (s *Schema) HasField(name string) bool

HasField returns true if the schema contains a field with the given name.

func (*Schema) NumFields

func (s *Schema) NumFields() int

NumFields returns the number of fields.

type Selector

type Selector interface {
	// Select reorders/selects rows by index (scatter-gather).
	// This is the Arrow "Take" kernel.
	Select(col AnyColumn, indices []int) (AnyColumn, error)

	// Slice returns rows [start, end) from a column.
	// For Arrow: zero-copy via array.NewSlice.
	Slice(col AnyColumn, start, end int) (AnyColumn, error)

	// SortIndices returns the permutation that sorts the column ascending.
	// Returns an int slice, not a column — it's metadata for Take().
	SortIndices(col AnyColumn) ([]int, error)

	// FilterIndices returns the row indices where mask[i] == true.
	// Returns an int slice for use with Take().
	FilterIndices(mask []bool) []int
}

Selector provides engine-native column/row manipulation primitives. These are the building blocks for Dataset verbs (Select, Arrange, Head, etc.).

For Arrow: zero-copy slicing, compute Take kernel, sort-indices kernel. For Memory: direct slice operations. For SQL: generates ORDER BY, LIMIT/OFFSET, WHERE rowid IN (...).

type StringAppender

type StringAppender interface {
	Append(v string)
	AppendNull()
	AppendValues(vs []string)
	Reserve(n int)
}

StringAppender streams string values into a column.

type Table

type Table interface {
	// Schema returns the dataset's schema.
	Schema() *Schema

	// Column retrieves a named column. Returns [ErrColumnNotFound] if absent.
	// The returned [AnyColumn] can be type-asserted to [Column[T]] for typed
	// access, or use [GetColumn] for a safe generic retrieval.
	Column(name string) (AnyColumn, error)

	// NumRows returns the logical number of rows.
	NumRows() int64

	// NumCols returns the number of columns.
	NumCols() int64
}

Table represents an immutable, columnar data source.

Implementations include in-memory frames, Arrow tables, and SQL-backed remote tables. All ETL operations are available via [Dataset].

type Windower

type Windower interface {
	Lag(col AnyColumn, n int) (AnyColumn, error)
	Lead(col AnyColumn, n int) (AnyColumn, error)
	CumSum(col AnyColumn) (AnyColumn, error)
	CumMax(col AnyColumn) (AnyColumn, error)
	CumMin(col AnyColumn) (AnyColumn, error)
	Rank(col AnyColumn) (AnyColumn, error)
	DenseRank(col AnyColumn) (AnyColumn, error)
	PercentRank(col AnyColumn) (AnyColumn, error)
	RowNumber(n int) (AnyColumn, error)
}

Windower provides window function kernels. For Arrow: streaming accumulators over Arrow arrays. For SQL: generates OVER() / WINDOW clauses.
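
A minimal lag sketch, assuming the table's engine implements Windower and has a column named "price" (illustrative):

w, ok := dataset.GetEngine(tbl).(dataset.Windower)
if !ok {
    // engine has no window support
}
price, err := tbl.Column("price")
if err != nil {
    // handle missing column
}
prev, err := w.Lag(price, 1) // each value shifted by one row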

Directories

Package arrow provides an Apache Arrow-backed compute engine for the dataset package.
Package bigquery implements a BigQuery SQL pushdown engine for the dataset library.
Package compute provides portable SIMD primitives for the dataset engines.
Package csv provides CSV reading and writing for the dataset package.
Package math provides SIMD-accelerated mathematical transforms for the dataset engines.
Package memory provides a lightweight Go-slice-backed compute engine for the dataset package.
Package parquet provides Parquet reading and writing for the dataset package.
Package sort provides SIMD-accelerated sorting for the dataset engines.
