dataset

package

Version: v0.0.7 | Published: May 22, 2026 | License: MIT | Imports: 14 | Imported by: 0

Repository

github.com/TuSKan/ggplot

Documentation ¶

Overview ¶

Package dataset provides columnar data abstractions for the Grammar of Graphics pipeline. Frame verbs execute eagerly via the dataset's engine (memory and arrow backends materialize on each verb); the BigQuery engine is the only backend with internal lazy SQL accumulation. Arrow IPC and Parquet ingest paths support zero-copy reads.

Engine-First Architecture ¶

Every data operation is delegated to an Engine backend. The dataset package defines only interfaces and contracts — no concrete column types, no fallbacks. Engines (Arrow, memory, SQL) implement sub-interfaces (Aggregator, Windower, Joiner, etc.) for the operations they support.

Type System ¶

The type system is aligned with Apache Arrow:

Field maps to arrow.Field (name, type, nullable, metadata)
Schema maps to arrow.Schema (ordered collection of fields)
AnyColumn is the type-erased column interface (engine-native storage)
Column is the generic typed access layer
GetColumn bridges untyped to typed via a single type assertion

Index ¶

Variables
func Abs(s []float64) []float64
func Clamp[T cmp.Ordered](lo, hi T) func([]T) []T
func Clean(s []float64) []float64
func Close(ds Table) error
func Names(ds Table) []string
func RegisterCSVReader(engineName string, r CSVReader)
func RegisterCSVWriter(engineName string, w CSVWriter)
func RegisterParquetReader(engineName string, r ParquetReader)
func RegisterParquetWriter(engineName string, w ParquetWriter)
func ScalarFloat64(col AnyColumn) (float64, bool)
func Sorted[T cmp.Ordered](s []T) []T
type AggFunc
type AggSpec
- func Count(out, in string) AggSpec
- func First(out, in string) AggSpec
- func Last(out, in string) AggSpec
- func Max(out, in string) AggSpec
- func Mean(out, in string) AggSpec
- func Median(out, in string) AggSpec
- func Min(out, in string) AggSpec
- func Mode(out, in string) AggSpec
- func Percentile(out, in string, p float64) AggSpec
- func StdDev(out, in string) AggSpec
- func Sum(out, in string) AggSpec
- func Variance(out, in string) AggSpec
type Aggregator
type AndPred
- func And(preds ...Masker) AndPred
- func (p AndPred) Expr() string
- func (p AndPred) Mask(ds Table) ([]bool, error)
type AnyColumn
- func ConstInt64Column(eng Engine, name string, val int64, n int) AnyColumn
- func ConstStringColumn(eng Engine, name string, val string, n int) AnyColumn
- func Int64ColumnFromStrings(eng Engine, name string, values []string) (AnyColumn, []string)
type BetweenPred
- func Between(col string, lo, hi any) BetweenPred
- func (p BetweenPred) Expr() string
- func (p BetweenPred) Mask(ds Table) ([]bool, error)
type BoolAppender
type BoolMask
- func (m BoolMask) Expr() string
- func (m BoolMask) Mask(_ Table) ([]bool, error)
type Builder
type BuilderFactory
type CSVConfig
type CSVReader
- func GetCSVReader(engineName string) (CSVReader, bool)
type CSVWriter
- func GetCSVWriter(engineName string) (CSVWriter, bool)
type Caster
type Closer
type Column
- func GetColumn[T any](ds Table, name string) (Column[T], error)
type ColumnFactory
type ColumnNotFoundError
- func (e *ColumnNotFoundError) Error() string
type CompPred
- func Eq(col string, val any) CompPred
- func Ge(col string, val any) CompPred
- func Gt(col string, val any) CompPred
- func Le(col string, val any) CompPred
- func Lt(col string, val any) CompPred
- func Ne(col string, val any) CompPred
- func (p CompPred) Expr() string
- func (p CompPred) Mask(ds Table) ([]bool, error)
type Composer
type DType
- func (d DType) String() string
type Dataset
- func From(ds Table) Dataset
- func NewDataset(eng Engine, cols ...AnyColumn) (Dataset, error)
- func ReplaceColumn(ds Dataset, name string, values []float64) Dataset
- func (f Dataset) AntiJoin(other Table, spec JoinSpec) Dataset
- func (f Dataset) Arrange(cols ...string) Dataset
- func (d Dataset) Bools(name string) ([]bool, error)
- func (f Dataset) Collect(ctx context.Context) (Dataset, error)
- func (f Dataset) Collected() bool
- func (f Dataset) Column(name string) (AnyColumn, error)
- func (f Dataset) Combine(others ...Table) Dataset
- func (f Dataset) Distinct(cols ...string) Dataset
- func (f Dataset) DropNA(cols ...string) Dataset
- func (f Dataset) Err() error
- func (f Dataset) Fill(col string, dir FillDirection) Dataset
- func (f Dataset) Filter(mask Masker) Dataset
- func (d Dataset) Float64(name string, opts ...Float64Opt) ([]float64, error)
- func (f Dataset) FullJoin(other Table, spec JoinSpec) Dataset
- func (f Dataset) GroupBy(cols ...string) GroupedFrame
- func (f Dataset) Head(n int) Dataset
- func (f Dataset) InnerJoin(other Table, spec JoinSpec) Dataset
- func (d Dataset) Int64(name string, opts ...Int64Opt) ([]int64, error)
- func (f Dataset) LeftJoin(other Table, spec JoinSpec) Dataset
- func (f Dataset) Mutate(name string, fn MutateFunc) Dataset
- func (f Dataset) NumCols() int64
- func (f Dataset) NumRows() int64
- func (f Dataset) PivotLonger(spec PivotLongerSpec) Dataset
- func (f Dataset) PivotWider(spec PivotWiderSpec) Dataset
- func (f Dataset) Rename(oldName, newName string) Dataset
- func (f Dataset) RightJoin(other Table, spec JoinSpec) Dataset
- func (f Dataset) Schema() *Schema
- func (f Dataset) Select(cols ...string) Dataset
- func (f Dataset) SelectRows(indices []int) (Dataset, error)
- func (f Dataset) SemiJoin(other Table, spec JoinSpec) Dataset
- func (f Dataset) Separate(col string, into []string, sep string) Dataset
- func (f Dataset) Slice(start, end int) Dataset
- func (f Dataset) Stack(others ...Table) Dataset
- func (d Dataset) Strings(name string, opts ...StringOpt) ([]string, error)
- func (f Dataset) Table() Table
- func (f Dataset) Tail(n int) Dataset
- func (f Dataset) WithColumn(col AnyColumn) Dataset
type Engine
- func GetEngine(ds Table) Engine
type Field
- func BoolCol(name string) Field
- func DateCol(name string) Field
- func FloatCol(name string) Field
- func IntCol(name string) Field
- func NullableFloatCol(name string) Field
- func NullableIntCol(name string) Field
- func NullableStringCol(name string) Field
- func StringCol(name string) Field
- func TimeCol(name string) Field
- func TimestampCol(name string) Field
- func (f Field) WithMetadata(md map[string]string) Field
- func (f Field) WithNullable() Field
type FillDirection
type Filler
type Filterer
type Float64Appender
type Float64Opt
- func FillNaN(fill float64) Float64Opt
type GroupedFrame
- func (gf GroupedFrame) Summarize(specs ...AggSpec) Dataset
type HasEngine
type InPred
- func In(col string, vals ...any) InPred
- func (p InPred) Expr() string
- func (p InPred) Mask(ds Table) ([]bool, error)
type Int64Appender
type Int64Opt
type IsNotNullPred
- func IsNotNull(col string) IsNotNullPred
- func (p IsNotNullPred) Expr() string
- func (p IsNotNullPred) Mask(ds Table) ([]bool, error)
type IsNullPred
- func IsNull(col string) IsNullPred
- func (p IsNullPred) Expr() string
- func (p IsNullPred) Mask(ds Table) ([]bool, error)
type JoinSpec
- func On(cols ...string) JoinSpec
type JoinType
type Joiner
type Masker
type MathKernel
type MutateFunc
type NotPred
- func Not(pred Masker) NotPred
- func (p NotPred) Expr() string
- func (p NotPred) Mask(ds Table) ([]bool, error)
type Op
type Optimizer
type OrPred
- func Or(preds ...Masker) OrPred
- func (p OrPred) Expr() string
- func (p OrPred) Mask(ds Table) ([]bool, error)
type ParquetConfig
type ParquetReader
- func GetParquetReader(engineName string) (ParquetReader, bool)
type ParquetWriter
- func GetParquetWriter(engineName string) (ParquetWriter, bool)
type PivotLongerSpec
type PivotWiderSpec
type Reshaper
type Schema
- func NewSchema(fields ...Field) *Schema
- func (s *Schema) Field(i int) Field
- func (s *Schema) FieldIndex(name string) int
- func (s *Schema) Fields() []Field
- func (s *Schema) HasField(name string) bool
- func (s *Schema) NumFields() int
type Selector
type StatKernel
type StringAppender
type StringOpt
type Table
type Windower

Constants ¶

This section is empty.

Variables ¶

var (
	// ErrUncollected is returned when an operation requires a collected Dataset.
	ErrUncollected = errors.New("dataset: operation on uncollected Dataset — call Collect(ctx) first")

	// ErrUnsupportedEngine is returned when an engine lacks a required capability.
	ErrUnsupportedEngine = errors.New("dataset: unsupported engine capability")

	// ErrNoEngine is returned when a Dataset has no engine.
	ErrNoEngine = errors.New("dataset: Dataset requires an engine")

	// ErrInvalidSlice is returned when a slice range is invalid.
	ErrInvalidSlice = errors.New("dataset: invalid slice range")

	// ErrNoAggResults is returned when there are no aggregation results.
	ErrNoAggResults = errors.New("dataset: no aggregation results to merge")

	// ErrUnsupportedAggFunc is returned for unknown aggregation functions.
	ErrUnsupportedAggFunc = errors.New("dataset: unknown AggFunc")

	// ErrUnsupportedDType is returned for unsupported data types.
	ErrUnsupportedDType = errors.New("dataset: unsupported DType")

	// ErrTypeMismatch is returned when column types don't match.
	ErrTypeMismatch = errors.New("dataset: column type mismatch")

	// ErrColumnNotNumeric is returned when a numeric column is required.
	ErrColumnNotNumeric = errors.New("dataset: column is not numeric")

	// ErrUnsupportedPredicate is returned for unsupported filter predicates.
	ErrUnsupportedPredicate = errors.New("dataset: unsupported predicate operator")
)

Sentinel errors for the dataset package.

Functions ¶

func Abs ¶

func Abs(s []float64) []float64

Abs applies math.Abs to all elements in place.

func Clamp ¶

func Clamp[T cmp.Ordered](lo, hi T) func([]T) []T

Clamp returns an option that clamps slice elements to the range [lo, hi]. Example: Clamp[int64](-5, 5), Clamp(0.0, 1.0)
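The option shape described above can be sketched without the package. The lowercase clamp below is a stand-in written from the documented signature only; mutating the slice in place is an assumption, not something the documentation states.

```go
package main

import (
	"cmp"
	"fmt"
)

// clamp is a hypothetical stand-in for Clamp. It returns a function that
// limits every element of a slice to the closed range [lo, hi].
// Assumption: the slice is modified in place and then returned.
func clamp[T cmp.Ordered](lo, hi T) func([]T) []T {
	return func(s []T) []T {
		for i, v := range s {
			s[i] = min(max(v, lo), hi)
		}
		return s
	}
}

func main() {
	// Explicit instantiation, as in Clamp[int64](-5, 5).
	fmt.Println(clamp[int64](-5, 5)([]int64{-9, 0, 7})) // [-5 0 5]

	// T is inferred as float64 from the untyped constants.
	fmt.Println(clamp(0.0, 1.0)([]float64{-0.2, 0.4, 1.7})) // [0 0.4 1]
}
```

The sketch needs Go 1.21 or later for the cmp package and the built-in min and max.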

func Clean ¶

func Clean(s []float64) []float64

Clean drops NaN and ±Inf from a float64 slice.
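A minimal sketch of the filtering that Clean describes, using only the standard library. The lowercase clean is a stand-in; whether the real function reuses the input's backing array is not documented, so this version allocates a new slice.

```go
package main

import (
	"fmt"
	"math"
)

// clean is a hypothetical stand-in for Clean: it keeps only finite values.
// Assumption: a new slice is returned and the input is left untouched.
func clean(s []float64) []float64 {
	out := make([]float64, 0, len(s))
	for _, v := range s {
		if math.IsNaN(v) || math.IsInf(v, 0) {
			continue // drop NaN, +Inf and -Inf
		}
		out = append(out, v)
	}
	return out
}

func main() {
	in := []float64{1.5, math.NaN(), math.Inf(1), -2, math.Inf(-1)}
	fmt.Println(clean(in)) // [1.5 -2]
}
```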

func Close ¶

func Close(ds Table) error

Close releases resources if the Table implements Closer. Safe to call on any Table: it returns nil for tables that hold no resources.

func Names ¶

func Names(ds Table) []string

Names returns the column names from a dataset's schema.

func RegisterCSVReader ¶ added in v0.0.6

func RegisterCSVReader(engineName string, r CSVReader)

RegisterCSVReader registers a CSVReader implementation for an engine.

func RegisterCSVWriter ¶ added in v0.0.6

func RegisterCSVWriter(engineName string, w CSVWriter)

RegisterCSVWriter registers a CSVWriter implementation for an engine.

func RegisterParquetReader ¶ added in v0.0.6

func RegisterParquetReader(engineName string, r ParquetReader)

RegisterParquetReader registers a ParquetReader implementation for an engine.

func RegisterParquetWriter ¶ added in v0.0.6

func RegisterParquetWriter(engineName string, w ParquetWriter)

RegisterParquetWriter registers a ParquetWriter implementation for an engine.
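The four Register functions and their Get counterparts suggest a name-keyed registry per format. The sketch below shows one way such a registry could look. The registry type, its methods, and the use of a mutex are all assumptions made for illustration; the package's internals are not documented here.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical generic store keyed by engine name.
type registry[T any] struct {
	mu    sync.RWMutex
	items map[string]T
}

// register stores v under engineName, replacing any earlier entry.
func (r *registry[T]) register(engineName string, v T) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.items == nil {
		r.items = make(map[string]T)
	}
	r.items[engineName] = v
}

// get mirrors the (value, bool) shape of GetCSVReader and friends.
func (r *registry[T]) get(engineName string) (T, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	v, ok := r.items[engineName]
	return v, ok
}

func main() {
	var readers registry[string] // string stands in for a reader implementation
	readers.register("memory", "memory CSV reader")

	v, ok := readers.get("memory")
	fmt.Println(v, ok) // memory CSV reader true

	_, ok = readers.get("arrow")
	fmt.Println(ok) // false
}
```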

func ScalarFloat64 ¶ added in v0.0.5

func ScalarFloat64(col AnyColumn) (float64, bool)

ScalarFloat64 extracts a single float64 from a 1-element aggregate column (e.g. the result of Aggregator.Sum). Returns 0, false if the column is empty, not float64, or has zero value.

func Sorted ¶

func Sorted[T cmp.Ordered](s []T) []T

Sorted sorts the slice in place. Generic over any ordered type.

Types ¶

type AggFunc ¶

type AggFunc int

AggFunc identifies an aggregation function.

const (
	AggSum AggFunc = iota
	AggMean
	AggMin
	AggMax
	AggCount
	AggMedian
	AggVariance
	AggStdDev     // population standard deviation = sqrt(variance)
	AggFirst      // first element
	AggLast       // last element
	AggMode       // most frequent value
	AggPercentile // quantile; requires AggSpec.P
)

AggSum is the sum aggregation.

type AggSpec ¶

type AggSpec struct {
	OutputName string  // name of the result column
	InputName  string  // name of the source column
	Fn         AggFunc // which aggregation to apply
	P          float64 // percentile ∈ [0,1]; only used when Fn == AggPercentile
}

AggSpec describes a single aggregation to apply in Summarize.

func Count ¶

func Count(out, in string) AggSpec

Count builds a count aggregation spec.

func First ¶ added in v0.0.5

func First(out, in string) AggSpec

First builds a first-element aggregation spec.

func Last ¶ added in v0.0.5

func Last(out, in string) AggSpec

Last builds a last-element aggregation spec.

func Max ¶

func Max(out, in string) AggSpec

Max builds a max aggregation spec.

func Mean ¶

func Mean(out, in string) AggSpec

Mean builds a mean aggregation spec.

func Median ¶

func Median(out, in string) AggSpec

Median builds a median aggregation spec.

func Min ¶

func Min(out, in string) AggSpec

Min builds a min aggregation spec.

func Mode ¶ added in v0.0.5

func Mode(out, in string) AggSpec

Mode builds a mode (most-frequent-value) aggregation spec.

func Percentile ¶ added in v0.0.5

func Percentile(out, in string, p float64) AggSpec

Percentile builds a percentile aggregation spec. p ∈ [0,1].

func StdDev ¶ added in v0.0.5

func StdDev(out, in string) AggSpec

StdDev builds a standard deviation aggregation spec.

func Sum ¶

func Sum(out, in string) AggSpec

Sum builds a sum aggregation spec.

func Variance ¶

func Variance(out, in string) AggSpec

Variance builds a variance aggregation spec.

type Aggregator ¶

type Aggregator interface {
	Sum(col AnyColumn) (AnyColumn, error)
	Mean(col AnyColumn) (AnyColumn, error)
	MinMax(col AnyColumn) (mnCol AnyColumn, mxCol AnyColumn, err error)
	Count(col AnyColumn) (AnyColumn, error)
	Median(col AnyColumn) (AnyColumn, error)
	Variance(col AnyColumn) (AnyColumn, error)
	StdDev(col AnyColumn) (AnyColumn, error)                // sqrt(variance)
	First(col AnyColumn) (AnyColumn, error)                 // first element
	Last(col AnyColumn) (AnyColumn, error)                  // last element
	Mode(col AnyColumn) (AnyColumn, error)                  // most frequent value
	Percentile(col AnyColumn, p float64) (AnyColumn, error) // quantile ∈ [0,1]
}

Aggregator provides vectorized aggregation kernels. Every method returns a single-element AnyColumn. Result types follow Arrow compute kernel rules:

Sum: numeric → same type (int64→int64, float64→float64)
Mean: numeric → float64 (always widens)
MinMax: any ordered type → (min, max) of same type
Count: any → int64
Median: numeric → float64
Variance: numeric → float64

For Arrow: delegates to arrow/math SIMD operations. For SQL: generates SELECT SUM/AVG/MIN/MAX/COUNT queries.
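The type rules listed above can be illustrated with plain generics. This is a sketch over Go slices, not the engine kernels: the helper names are hypothetical, and treating variance as the population variance is inferred from the AggStdDev comment rather than stated outright.

```go
package main

import (
	"fmt"
	"math"
)

// number covers the two numeric storage types named in the rules.
type number interface{ ~int64 | ~float64 }

// sum keeps the input type: int64 in, int64 out.
func sum[T number](xs []T) T {
	var total T
	for _, x := range xs {
		total += x
	}
	return total
}

// mean always widens to float64. An empty input yields NaN.
func mean[T number](xs []T) float64 {
	return float64(sum(xs)) / float64(len(xs))
}

// variance is the population variance (divide by n), returned as float64.
func variance[T number](xs []T) float64 {
	m := mean(xs)
	var ss float64
	for _, x := range xs {
		d := float64(x) - m
		ss += d * d
	}
	return ss / float64(len(xs))
}

func main() {
	xs := []int64{2, 4, 4, 4, 5, 5, 7, 9}
	fmt.Printf("%T %v\n", sum(xs), sum(xs))   // int64 40
	fmt.Printf("%T %v\n", mean(xs), mean(xs)) // float64 5
	fmt.Println(math.Sqrt(variance(xs)))      // 2
}
```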

type AndPred ¶

type AndPred struct{ Preds []Masker }

AndPred combines masks with AND.

func And ¶

func And(preds ...Masker) AndPred

And combines multiple predicates with logical AND.

func (AndPred) Expr ¶

func (p AndPred) Expr() string

Expr returns the SQL representation of the AND combination.

func (AndPred) Mask ¶

func (p AndPred) Mask(ds Table) ([]bool, error)

Mask evaluates all sub-predicates and combines them with AND.

type AnyColumn ¶

type AnyColumn interface {
	Name() string
	Len() int64
	DType() DType
}

AnyColumn is the type-erased column interface. This is what Dataset stores, engines operate on, and maps hold. Every engine-native column type implements this.

func ConstInt64Column ¶ added in v0.0.4

func ConstInt64Column(eng Engine, name string, val int64, n int) AnyColumn

ConstInt64Column creates a constant int64 column with the given name and value, repeated n times. Useful for injecting system columns like PANEL.

func ConstStringColumn ¶ added in v0.0.4

func ConstStringColumn(eng Engine, name string, val string, n int) AnyColumn

ConstStringColumn creates a constant string column with the given name and value, repeated n times.

func Int64ColumnFromStrings ¶ added in v0.0.4

func Int64ColumnFromStrings(eng Engine, name string, values []string) (AnyColumn, []string)

Int64ColumnFromStrings creates an int64 column by mapping distinct string values to 0-based indices, preserving first-occurrence order. Returns the column and the ordered list of distinct values.
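The mapping itself is easy to show on plain slices. The encode function below is a hypothetical helper that follows the documented rule (0-based codes, first-occurrence order); it returns a []int64 where the real function returns an engine-native column.

```go
package main

import "fmt"

// encode maps each distinct string to a 0-based code in first-occurrence
// order and returns the codes together with the ordered distinct values.
func encode(values []string) (codes []int64, levels []string) {
	index := make(map[string]int64, len(values))
	codes = make([]int64, len(values))
	for i, v := range values {
		code, seen := index[v]
		if !seen {
			code = int64(len(levels))
			index[v] = code
			levels = append(levels, v)
		}
		codes[i] = code
	}
	return codes, levels
}

func main() {
	codes, levels := encode([]string{"b", "a", "b", "c", "a"})
	fmt.Println(codes)  // [0 1 0 2 1]
	fmt.Println(levels) // [b a c]
}
```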

type BetweenPred ¶

type BetweenPred struct {
	Col    string
	Lo, Hi any
}

BetweenPred selects rows where a column value is between Lo and Hi.

func Between ¶

func Between(col string, lo, hi any) BetweenPred

Between builds a BETWEEN predicate for the given column and bounds.

func (BetweenPred) Expr ¶

func (p BetweenPred) Expr() string

Expr returns the SQL representation of this BETWEEN predicate.

func (BetweenPred) Mask ¶

func (p BetweenPred) Mask(ds Table) ([]bool, error)

Mask evaluates the BETWEEN predicate against each row.

type BoolAppender ¶

type BoolAppender interface {
	Append(v bool)
	AppendNull()
	AppendValues(vs []bool)
	Reserve(n int)
}

BoolAppender streams bool values into a column.

type BoolMask ¶

type BoolMask []bool

BoolMask is a pre-computed boolean mask that implements Masker. Useful when the filter has already been computed externally (e.g. faceting).

func (BoolMask) Expr ¶

func (m BoolMask) Expr() string

Expr returns a constant "TRUE" placeholder for SQL contexts.

func (BoolMask) Mask ¶

func (m BoolMask) Mask(_ Table) ([]bool, error)

Mask returns the pre-computed boolean slice unchanged.

type Builder ¶

type Builder interface {
	Float64(col string) Float64Appender
	Int64(col string) Int64Appender
	String(col string) StringAppender
	Bool(col string) BoolAppender

	Build() (Table, error)
}

Builder provides streaming, typed, zero-boxing construction. Each column has its own typed appender — no any boxing, no allocations per row.

type BuilderFactory ¶

type BuilderFactory interface {
	NewBuilder(schema *Schema) Builder
}

BuilderFactory creates schema-aware builders for streaming construction.

type CSVConfig ¶

type CSVConfig struct {
	HasHeader  bool
	Comma      rune
	Comment    rune
	NullValues []string
	// ChunkSize is the number of rows per batch. 0 means engine default.
	// Arrow default: 65536, Memory default: unlimited.
	ChunkSize int
}

CSVConfig holds engine-agnostic CSV configuration. The dataset/csv facade constructs this from functional options and passes it to the engine's CSVReader/CSVWriter implementation.

type CSVReader ¶

type CSVReader interface {
	ReadCSV(ctx context.Context, eng Engine, r io.Reader, cfg CSVConfig) (Table, error)
}

CSVReader reads CSV data into an engine-native Dataset. Memory engine: uses go-simdcsv + schema inference. Arrow engine: uses arrow/csv.NewInferringReader for zero-copy ingest.

func GetCSVReader ¶ added in v0.0.6

func GetCSVReader(engineName string) (CSVReader, bool)

GetCSVReader retrieves a registered CSVReader for an engine.

type CSVWriter ¶

type CSVWriter interface {
	WriteCSV(ctx context.Context, eng Engine, w io.Writer, ds Table, cfg CSVConfig) error
}

CSVWriter writes a Dataset to CSV. Memory engine: uses go-simdcsv Writer. Arrow engine: uses go-simdcsv Writer (generic — CSV output is string-based).

func GetCSVWriter ¶ added in v0.0.6

func GetCSVWriter(engineName string) (CSVWriter, bool)

GetCSVWriter retrieves a registered CSVWriter for an engine.

type Caster ¶

type Caster interface {
	Cast(col AnyColumn, target DType) (AnyColumn, error)
}

Caster provides engine-controlled type casting. Casting is an engine operation — the engine knows its native column types and how to convert between them.

type Closer ¶

type Closer interface {
	Close() error
}

Closer is optionally implemented by datasets that hold resources requiring explicit cleanup (e.g., Arrow tables, database connections).
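An optional interface like this is usually probed with a type assertion, which is presumably how the package-level Close helper decides what to do. The sketch below shows the pattern with hypothetical local types; it does not reproduce the package's code.

```go
package main

import "fmt"

// closer mirrors the documented Closer interface.
type closer interface{ Close() error }

// arrowLike is a hypothetical table that holds a releasable resource.
type arrowLike struct{ released bool }

func (a *arrowLike) Close() error {
	a.released = true
	return nil
}

// plain is a hypothetical table with nothing to release.
type plain struct{}

// closeIfNeeded calls Close only when the value implements closer.
func closeIfNeeded(v any) error {
	if c, ok := v.(closer); ok {
		return c.Close()
	}
	return nil
}

func main() {
	a := &arrowLike{}
	err := closeIfNeeded(a)
	fmt.Println(err, a.released) // <nil> true

	fmt.Println(closeIfNeeded(plain{})) // <nil>
}
```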

type Column ¶

type Column[T any] interface {
	AnyColumn
	Values() []T
	IsNull() []bool
}

Column is the typed access layer. Engine-specific column types implement both AnyColumn and Column[T] for their native type.

Values returns the underlying typed slice — zero-copy for both Arrow (returns the Arrow buffer) and memory (returns the Go slice).

IsNull returns the null mask as a []bool with one entry per row. nil means no nulls (common case, zero alloc).

func GetColumn ¶

func GetColumn[T any](ds Table, name string) (Column[T], error)

GetColumn retrieves a typed column from a dataset. This is the only place a type assertion occurs — call sites get compile-time type safety from this point forward.
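The bridge from a type-erased column to a typed one can be sketched with a generic interface and a single assertion. Every name below (anyColumn, column, sliceColumn, getColumn) is a local stand-in modelled on the documented interfaces, and the lookup uses a plain map where the real function takes a Table.

```go
package main

import "fmt"

// anyColumn is a reduced stand-in for AnyColumn.
type anyColumn interface {
	Name() string
	Len() int64
}

// column is a reduced stand-in for Column[T].
type column[T any] interface {
	anyColumn
	Values() []T
}

// sliceColumn is a hypothetical slice-backed column.
type sliceColumn[T any] struct {
	name string
	data []T
}

func (c sliceColumn[T]) Name() string { return c.name }
func (c sliceColumn[T]) Len() int64   { return int64(len(c.data)) }
func (c sliceColumn[T]) Values() []T  { return c.data }

// getColumn performs the one type assertion. After it succeeds, the
// caller works with []T and needs no further assertions.
func getColumn[T any](cols map[string]anyColumn, name string) (column[T], error) {
	raw, ok := cols[name]
	if !ok {
		return nil, fmt.Errorf("column %q not found", name)
	}
	typed, ok := raw.(column[T])
	if !ok {
		return nil, fmt.Errorf("column %q does not hold the requested type", name)
	}
	return typed, nil
}

func main() {
	cols := map[string]anyColumn{
		"x": sliceColumn[float64]{name: "x", data: []float64{1, 2, 3}},
	}

	x, err := getColumn[float64](cols, "x")
	fmt.Println(x.Values(), err) // [1 2 3] <nil>

	_, err = getColumn[string](cols, "x")
	fmt.Println(err != nil) // true
}
```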

type ColumnFactory ¶

type ColumnFactory interface {
	NewFloat64Column(name string, data []float64) AnyColumn
	NewInt64Column(name string, data []int64) AnyColumn
	NewStringColumn(name string, data []string) AnyColumn
	NewBoolColumn(name string, data []bool) AnyColumn
	NewTimestampColumn(name string, data []int64) AnyColumn

	// FromColumns assembles columns into a Dataset with the given schema.
	// All columns must have the same length.
	FromColumns(schema *Schema, cols ...AnyColumn) (Table, error)
}

ColumnFactory wraps existing typed slices into engine-native columns. Memory engine: wraps the slice (zero-copy). Arrow engine: builds an Arrow array (one allocation).

type ColumnNotFoundError ¶ added in v0.0.2

type ColumnNotFoundError struct {
	Name string
}

ColumnNotFoundError indicates a requested column does not exist.

func (*ColumnNotFoundError) Error ¶ added in v0.0.2

func (e *ColumnNotFoundError) Error() string

type CompPred ¶

type CompPred struct {
	Col string
	Op  Op
	Val any
}

CompPred compares a column against a scalar value. Implements both Masker (local eval) and Expr() (SQL pushdown).
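The dual role (local mask and SQL text from the same value) can be sketched with a reduced predicate. gtPred is hypothetical: it fixes the operator to ">" and works on float64 data in a map, and the SQL text it produces is only an illustration, since the real Expr format (quoting, escaping) is not documented here.

```go
package main

import "fmt"

// gtPred is a hypothetical, float64-only analogue of CompPred with Op fixed to ">".
type gtPred struct {
	Col string
	Val float64
}

// Mask evaluates the predicate locally and returns one bool per row.
func (p gtPred) Mask(cols map[string][]float64) ([]bool, error) {
	data, ok := cols[p.Col]
	if !ok {
		return nil, fmt.Errorf("column %q not found", p.Col)
	}
	mask := make([]bool, len(data))
	for i, v := range data {
		mask[i] = v > p.Val
	}
	return mask, nil
}

// Expr renders the same predicate as a SQL fragment for pushdown.
func (p gtPred) Expr() string {
	return fmt.Sprintf("%s > %v", p.Col, p.Val)
}

func main() {
	p := gtPred{Col: "x", Val: 0}

	mask, err := p.Mask(map[string][]float64{"x": {-1, 2, 0, 3}})
	fmt.Println(mask, err) // [false true false true] <nil>
	fmt.Println(p.Expr())  // x > 0
}
```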

func Eq ¶

func Eq(col string, val any) CompPred

Eq builds a col == val predicate.

func Ge ¶

func Ge(col string, val any) CompPred

Ge builds a col >= val predicate.

func Gt ¶

func Gt(col string, val any) CompPred

Gt builds a col > val predicate.

func Le ¶

func Le(col string, val any) CompPred

Le builds a col <= val predicate.

func Lt ¶

func Lt(col string, val any) CompPred

Lt builds a col < val predicate.

func Ne ¶

func Ne(col string, val any) CompPred

Ne builds a col != val predicate.

func (CompPred) Expr ¶

func (p CompPred) Expr() string

Expr returns the SQL representation of this comparison.

func (CompPred) Mask ¶

func (p CompPred) Mask(ds Table) ([]bool, error)

Mask evaluates the comparison predicate against each row.

type Composer ¶

type Composer interface {
	Stack(datasets ...Table) (Table, error)
	Combine(datasets ...Table) (Table, error)
}

Composer provides row/column binding operations. For Arrow: zero-copy concatenation of Arrow arrays. For SQL: UNION ALL / lateral join.

type DType ¶

type DType int

DType represents the logical data type of a column. This is the type ID — analogous to arrow.Type.

const (
	// DTypeFloat64 is a 64-bit floating point column.
	DTypeFloat64 DType = iota
	// DTypeInt64 is a 64-bit integer column.
	DTypeInt64
	// DTypeString is a string/categorical column.
	DTypeString
	// DTypeBool is a boolean column.
	DTypeBool
	// DTypeTimestamp is a timestamp column stored as int64 nanoseconds
	// since the Unix epoch (1970-01-01T00:00:00Z). This representation
	// is zero-copy compatible with Arrow's TIMESTAMP(ns) type.
	DTypeTimestamp
	// DTypeDate is a date-only column stored as int64 days since the
	// Unix epoch (1970-01-01). Compatible with Arrow's DATE32 type.
	DTypeDate
	// DTypeTime is a time-of-day column stored as int64 nanoseconds
	// since midnight (00:00:00.000000000). Compatible with Arrow's TIME64(ns).
	DTypeTime
	// DTypeUnknown is an unrecognized type.
	DTypeUnknown
)
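The three temporal encodings described in the comments above can be derived from a time.Time with the standard library. The helper names are hypothetical, and measuring time of day from midnight UTC is an assumption; the documentation does not say which zone applies.

```go
package main

import (
	"fmt"
	"time"
)

// sample is an arbitrary instant used to illustrate the encodings.
var sample = time.Date(2024, time.March, 1, 12, 30, 0, 0, time.UTC)

// timestampNanos returns nanoseconds since the Unix epoch (DTypeTimestamp layout).
func timestampNanos(t time.Time) int64 { return t.UnixNano() }

// dateDays returns whole days since the Unix epoch (DTypeDate layout).
// The integer division is only correct for instants at or after 1970-01-01.
func dateDays(t time.Time) int64 { return t.Unix() / 86400 }

// timeOfDayNanos returns nanoseconds since midnight UTC (DTypeTime layout).
func timeOfDayNanos(t time.Time) int64 {
	u := t.UTC()
	midnight := time.Date(u.Year(), u.Month(), u.Day(), 0, 0, 0, 0, time.UTC)
	return u.Sub(midnight).Nanoseconds()
}

func main() {
	fmt.Println(timestampNanos(sample)) // 1709296200000000000
	fmt.Println(dateDays(sample))       // 19783
	fmt.Println(timeOfDayNanos(sample)) // 45000000000000
}
```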

func (DType) String ¶

func (d DType) String() string

String returns the human-readable name of the DType.

type Dataset ¶

type Dataset struct {
	// contains filtered or unexported fields
}

Dataset is the fluent API for data manipulation. All verbs return a new Dataset that records the operation lazily; no computation happens until Dataset.Collect is called. The chain forms a linked list of op nodes rooted at a materialised Table.

Usage:

result, err := dataset.From(ds).
    Select("x", "y").
    Filter(dataset.Gt("x", 0)).
    Arrange("x").
    Collect(ctx)
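The linked-list-of-ops model described for Dataset can be sketched on a plain []float64. The frame type and its methods are hypothetical and far simpler than the real Dataset; the point is only that verbs record work and collect runs it.

```go
package main

import "fmt"

// frame is a hypothetical lazy node: either a root holding data, or an
// op that points at its parent.
type frame struct {
	root   []float64
	parent *frame
	apply  func([]float64) []float64
}

func from(data []float64) frame { return frame{root: data} }

// filter records a row filter without running it.
func (f frame) filter(keep func(float64) bool) frame {
	return frame{parent: &f, apply: func(in []float64) []float64 {
		var out []float64
		for _, v := range in {
			if keep(v) {
				out = append(out, v)
			}
		}
		return out
	}}
}

// head records a "first n rows" op without running it.
func (f frame) head(n int) frame {
	return frame{parent: &f, apply: func(in []float64) []float64 {
		return in[:max(0, min(n, len(in)))]
	}}
}

// collect walks back to the root and applies the recorded ops in order.
func (f frame) collect() []float64 {
	if f.parent == nil {
		return f.root
	}
	return f.apply(f.parent.collect())
}

func main() {
	calls := 0
	chain := from([]float64{-1, 4, 2, -3, 8}).
		filter(func(v float64) bool { calls++; return v > 0 }).
		head(2)

	fmt.Println(calls)           // 0: nothing has run yet
	fmt.Println(chain.collect()) // [4 2]
	fmt.Println(calls)           // 5: the filter ran during collect
}
```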

func From ¶

func From(ds Table) Dataset

From wraps a Table in a Dataset for fluent verb chaining.

func NewDataset ¶

func NewDataset(eng Engine, cols ...AnyColumn) (Dataset, error)

NewDataset creates a Dataset from an engine and columns. The schema is inferred from the columns' names and types.

func ReplaceColumn ¶

func ReplaceColumn(ds Dataset, name string, values []float64) Dataset

ReplaceColumn returns a lazy Dataset that replaces a named column with new float64 values when collected.

func (Dataset) AntiJoin ¶

func (f Dataset) AntiJoin(other Table, spec JoinSpec) Dataset

AntiJoin keeps rows from the left that have no match in other.

func (Dataset) Arrange ¶

func (f Dataset) Arrange(cols ...string) Dataset

Arrange sorts the dataset by the named columns (ascending).

func (Dataset) Bools ¶

func (d Dataset) Bools(name string) ([]bool, error)

Bools returns the bool values of the named column.

func (Dataset) Collect ¶

func (f Dataset) Collect(ctx context.Context) (Dataset, error)

Collect materialises the lazy operation chain, returning a new Dataset with the result Table populated. If already materialised, returns self.

This is the single materialisation boundary — all data access must go through a collected Dataset.

func (Dataset) Collected ¶

func (f Dataset) Collected() bool

Collected reports whether the Dataset has been materialised.

func (Dataset) Column ¶

func (f Dataset) Column(name string) (AnyColumn, error)

Column retrieves a named column. Requires a collected Dataset.

func (Dataset) Combine ¶

func (f Dataset) Combine(others ...Table) Dataset

Combine horizontally concatenates this dataset with others (column-bind).

func (Dataset) Distinct ¶

func (f Dataset) Distinct(cols ...string) Dataset

Distinct removes duplicate rows based on the specified columns.

func (Dataset) DropNA ¶

func (f Dataset) DropNA(cols ...string) Dataset

DropNA removes rows with missing values in the specified columns.

func (Dataset) Err ¶

func (f Dataset) Err() error

Err returns the first error encountered in the chain, or nil.

func (Dataset) Fill ¶

func (f Dataset) Fill(col string, dir FillDirection) Dataset

Fill forward- or backward-fills missing values in the named column.

func (Dataset) Filter ¶

func (f Dataset) Filter(mask Masker) Dataset

Filter keeps rows where the Masker evaluates to true.

func (Dataset) Float64 ¶

func (d Dataset) Float64(name string, opts ...Float64Opt) ([]float64, error)

Float64 returns the float64 values of the named column, optionally transformed by a chain of Float64Opts. With no opts, the returned slice aliases the underlying column data (zero-copy). Any opt forces a copy before the chain runs, so callers may freely mutate the result.

If the column is int64-backed (DTypeInt64, DTypeTimestamp, DTypeDate, DTypeTime), the values are converted to float64 automatically. This enables all draw functions to work with temporal data without changes.
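The aliasing contract (no opts returns a view, any opt forces a copy) matters when callers mutate the result. The sketch below reproduces that contract on a plain slice. float64s, opt, and negate are hypothetical stand-ins, not the package's code.

```go
package main

import (
	"fmt"
	"slices"
)

// opt is a stand-in for Float64Opt: a transform over a slice.
type opt func([]float64) []float64

// float64s mimics the documented contract: alias with no opts, copy otherwise.
func float64s(col []float64, opts ...opt) []float64 {
	if len(opts) == 0 {
		return col // zero-copy: shares storage with the column
	}
	out := slices.Clone(col) // copy once, before the chain runs
	for _, o := range opts {
		out = o(out)
	}
	return out
}

// negate is an example opt that works in place on the copy it receives.
func negate(s []float64) []float64 {
	for i := range s {
		s[i] = -s[i]
	}
	return s
}

func main() {
	col := []float64{1, 2, 3}

	copied := float64s(col, negate)
	copied[0] = 99
	fmt.Println(col) // [1 2 3]: the column is untouched

	view := float64s(col)
	view[0] = 42
	fmt.Println(col) // [42 2 3]: the view aliases the column
}
```

Callers who need to modify a slice obtained without opts should clone it first.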

func (Dataset) FullJoin ¶

func (f Dataset) FullJoin(other Table, spec JoinSpec) Dataset

FullJoin performs a full outer join against other on the given key spec.

func (Dataset) GroupBy ¶

func (f Dataset) GroupBy(cols ...string) GroupedFrame

GroupBy specifies columns to group by. Returns a GroupedFrame for Summarize.

func (Dataset) Head ¶

func (f Dataset) Head(n int) Dataset

Head returns the first n rows.

func (Dataset) InnerJoin ¶

func (f Dataset) InnerJoin(other Table, spec JoinSpec) Dataset

InnerJoin performs an inner join against other on the given key spec.

func (Dataset) Int64 ¶

func (d Dataset) Int64(name string, opts ...Int64Opt) ([]int64, error)

Int64 returns the int64 values of the named column, optionally transformed.

func (Dataset) LeftJoin ¶

func (f Dataset) LeftJoin(other Table, spec JoinSpec) Dataset

LeftJoin performs a left join against other on the given key spec.

func (Dataset) Mutate ¶

func (f Dataset) Mutate(name string, fn MutateFunc) Dataset

Mutate appends or replaces a column using a MutateFunc.

func (Dataset) NumCols ¶

func (f Dataset) NumCols() int64

NumCols returns the number of columns, or 0 if uncollected.

func (Dataset) NumRows ¶

func (f Dataset) NumRows() int64

NumRows returns the number of rows, or 0 if uncollected.

func (Dataset) PivotLonger ¶

func (f Dataset) PivotLonger(spec PivotLongerSpec) Dataset

PivotLonger reshapes wide data to long format.

func (Dataset) PivotWider ¶

func (f Dataset) PivotWider(spec PivotWiderSpec) Dataset

PivotWider reshapes long data to wide format.

func (Dataset) Rename ¶

func (f Dataset) Rename(oldName, newName string) Dataset

Rename renames a column.

func (Dataset) RightJoin ¶

func (f Dataset) RightJoin(other Table, spec JoinSpec) Dataset

RightJoin performs a right join against other on the given key spec.

func (Dataset) Schema ¶

func (f Dataset) Schema() *Schema

Schema returns the schema, or nil if uncollected.

func (Dataset) Select ¶

func (f Dataset) Select(cols ...string) Dataset

Select keeps only the named columns, in the order specified.

func (Dataset) SelectRows ¶

func (f Dataset) SelectRows(indices []int) (Dataset, error)

SelectRows returns a new materialised Dataset containing only the rows at the given indices. This is more efficient than Filter when you already have indices (avoids O(n) bool-mask allocation).

The Dataset must already be collected.

func (Dataset) SemiJoin ¶

func (f Dataset) SemiJoin(other Table, spec JoinSpec) Dataset

SemiJoin keeps rows from the left that have a match in other.

func (Dataset) Separate ¶

func (f Dataset) Separate(col string, into []string, sep string) Dataset

Separate splits a string column into multiple columns by a separator.

func (Dataset) Slice ¶

func (f Dataset) Slice(start, end int) Dataset

Slice returns rows in the range [start, end).

func (Dataset) Stack ¶

func (f Dataset) Stack(others ...Table) Dataset

Stack vertically concatenates this dataset with others (row-bind).

func (Dataset) Strings ¶

func (d Dataset) Strings(name string, opts ...StringOpt) ([]string, error)

Strings returns the string values of the named column, optionally transformed.

func (Dataset) Table ¶

func (f Dataset) Table() Table

Table returns the underlying Table, or nil if the Dataset is uncollected. Callers must check for nil or call Collect(ctx) before accessing the Table.

func (Dataset) Tail ¶

func (f Dataset) Tail(n int) Dataset

Tail returns the last n rows.

func (Dataset) WithColumn ¶ added in v0.0.4

func (f Dataset) WithColumn(col AnyColumn) Dataset

WithColumn appends or replaces a pre-built column in the dataset. This is the simplest way to inject a column that was constructed externally (e.g., via ColumnFactory.NewInt64Column).

type Engine ¶

type Engine interface {
	// Name returns a human-readable identifier (e.g., "arrow", "memory", "sql").
	Name() string
	// Context returns the engine's lifecycle context.
	Context() context.Context
}

Engine is the marker interface that all compute backends implement. Every engine carries a context.Context that governs its lifecycle. Long-running operations should check Context().Err() for cancellation.

func GetEngine ¶

func GetEngine(ds Table) Engine

GetEngine extracts the engine from a dataset. Returns nil if the dataset does not carry an engine.

type Field ¶

type Field struct {
	Name     string
	Dtype    DType
	Nullable bool
	Metadata map[string]string
}

Field describes a single column in a dataset — its name, logical type, nullability, and optional metadata. This maps directly to arrow.Field.

Metadata carries type-specific parameters that DType alone cannot express:

Timestamp timezone: {"tz": "America/Sao_Paulo"}
Display format: {"format": "2006-01-02"}
Units: {"unit": "ns"}

func BoolCol ¶

func BoolCol(name string) Field

BoolCol creates a bool field descriptor.

func DateCol ¶ added in v0.0.6

func DateCol(name string) Field

DateCol creates a date-only field descriptor (days since epoch).

func FloatCol ¶

func FloatCol(name string) Field

FloatCol creates a float64 field descriptor.

func IntCol ¶

func IntCol(name string) Field

IntCol creates an int64 field descriptor.

func NullableFloatCol ¶

func NullableFloatCol(name string) Field

NullableFloatCol creates a nullable float64 field.

func NullableIntCol ¶

func NullableIntCol(name string) Field

NullableIntCol creates a nullable int64 field.

func NullableStringCol ¶

func NullableStringCol(name string) Field

NullableStringCol creates a nullable string field.

func StringCol ¶

func StringCol(name string) Field

StringCol creates a string field descriptor.

func TimeCol ¶ added in v0.0.6

func TimeCol(name string) Field

TimeCol creates a time-of-day field descriptor (ns since midnight).

func TimestampCol ¶

func TimestampCol(name string) Field

TimestampCol creates a timestamp field descriptor.

func (Field) WithMetadata ¶

func (f Field) WithMetadata(md map[string]string) Field

WithMetadata returns a copy of the field with the given metadata.

func (Field) WithNullable ¶

func (f Field) WithNullable() Field

WithNullable returns a copy of the field with Nullable set.

type FillDirection ¶

type FillDirection int

FillDirection specifies the direction for filling missing values.

const (
	// FillDown fills missing values with the previous non-null value (carry forward).
	FillDown FillDirection = iota
	// FillUp fills missing values with the next non-null value (carry backward).
	FillUp
)

type Filler ¶

type Filler interface {
	Fill(col AnyColumn, dir FillDirection) (AnyColumn, error)
	DropNA(ds Table, cols ...string) (Table, error)
	ReplaceNA(col AnyColumn, defaultVal float64) (AnyColumn, error)
}

Filler provides missing-value handling operations. For Arrow: streaming fill with zero allocation. For SQL: generates COALESCE / window-based fill.
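The two fill directions can be sketched over a plain slice with a validity mask. This is an illustration under stated assumptions, not the engine implementation: the fill function and the (values, valid) representation are hypothetical, and nulls with no donor value are left null.

```go
package main

import "fmt"

// FillDirection mirrors the constants documented above.
type FillDirection int

const (
	FillDown FillDirection = iota
	FillUp
)

// fill returns filled copies of vals and valid; valid[i] == false marks a
// null. Both slices must have the same length. Leading nulls (FillDown) and
// trailing nulls (FillUp) have no donor value and stay null.
func fill(vals []float64, valid []bool, dir FillDirection) ([]float64, []bool) {
	out := append([]float64(nil), vals...)
	ok := append([]bool(nil), valid...)
	n := len(out)
	if dir == FillDown {
		// Carry the previous non-null value forward.
		for i := 1; i < n; i++ {
			if !ok[i] && ok[i-1] {
				out[i], ok[i] = out[i-1], true
			}
		}
		return out, ok
	}
	// Carry the next non-null value backward.
	for i := n - 2; i >= 0; i-- {
		if !ok[i] && ok[i+1] {
			out[i], ok[i] = out[i+1], true
		}
	}
	return out, ok
}

func main() {
	vals := []float64{1, 0, 0, 4, 0}
	valid := []bool{true, false, false, true, false}
	fmt.Println(fill(vals, valid, FillDown))
	fmt.Println(fill(vals, valid, FillUp))
}
```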

type Filterer ¶

type Filterer interface {
	Filter(ds Table, mask Masker) (Table, error)
}

Filterer provides mask-based row filtering. For Arrow: boolean mask filtering with zero-copy. For SQL: generates WHERE clauses.

type Float64Appender ¶

type Float64Appender interface {
	Append(v float64)
	AppendNull()
	AppendValues(vs []float64)
	Reserve(n int)
}

Float64Appender streams float64 values into a column.

type Float64Opt ¶

type Float64Opt = func([]float64) []float64

Float64Opt transforms a float64 slice (e.g. Clean, Clamp, Sorted).

func FillNaN ¶

func FillNaN(fill float64) Float64Opt

FillNaN replaces all NaNs with the provided value.
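Because Float64Opt is a plain function type, options compose naturally. The sketch below re-implements a FillNaN-style option locally and chains it with another transform; fillNaN, absAll, apply, and demo are hypothetical names, and the left-to-right application order is an assumption rather than documented behavior.

```go
package main

import (
	"fmt"
	"math"
)

// Float64Opt matches the alias documented above.
type Float64Opt = func([]float64) []float64

// fillNaN is a local sketch of an option in the style of FillNaN: it returns
// a new slice in which every NaN is replaced by fill.
func fillNaN(fill float64) Float64Opt {
	return func(s []float64) []float64 {
		out := make([]float64, len(s))
		for i, v := range s {
			if math.IsNaN(v) {
				v = fill
			}
			out[i] = v
		}
		return out
	}
}

// absAll returns the element-wise absolute value. It already has the
// Float64Opt shape, so it can be passed directly.
func absAll(s []float64) []float64 {
	out := make([]float64, len(s))
	for i, v := range s {
		out[i] = math.Abs(v)
	}
	return out
}

// apply runs the options left to right (an assumed ordering).
func apply(s []float64, opts ...Float64Opt) []float64 {
	for _, opt := range opts {
		s = opt(s)
	}
	return s
}

// demo cleans a small slice that contains one NaN.
func demo() []float64 {
	return apply([]float64{-1.5, math.NaN(), 2}, fillNaN(0), absAll)
}

func main() {
	fmt.Println(demo())
}
```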

type GroupedFrame ¶

type GroupedFrame struct {
	// contains filtered or unexported fields
}

GroupedFrame holds a Dataset with group-by columns set.

func (GroupedFrame) Summarize ¶

func (gf GroupedFrame) Summarize(specs ...AggSpec) Dataset

Summarize applies aggregations per group, producing a lazy Dataset.

type HasEngine ¶

type HasEngine interface {
	Table
	Engine() Engine
}

HasEngine is implemented by datasets that carry an engine reference. This enables engine propagation through transformations — stat packages and ggplot internals can produce new datasets using the same engine without importing engine-specific packages.

type InPred ¶

type InPred struct {
	Col  string
	Vals []any
}

InPred selects rows where a column value is in a set of values.

func In ¶

func In(col string, vals ...any) InPred

In builds an IN predicate for the given column and value set.

func (InPred) Expr ¶

func (p InPred) Expr() string

Expr returns the SQL representation of this IN predicate.

func (InPred) Mask ¶

func (p InPred) Mask(ds Table) ([]bool, error)

Mask evaluates the IN predicate against each row.
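A membership mask of this kind might be computed as follows. This is a sketch over a typed slice, not the package's implementation, which works on a Table and []any values; inMask is a hypothetical helper.

```go
package main

import "fmt"

// inMask reports, for each row of col, whether the value is a member of vals.
// It builds a set once so each row costs a single map lookup.
func inMask[T comparable](col []T, vals ...T) []bool {
	set := make(map[T]struct{}, len(vals))
	for _, v := range vals {
		set[v] = struct{}{}
	}
	mask := make([]bool, len(col))
	for i, v := range col {
		_, mask[i] = set[v]
	}
	return mask
}

func main() {
	species := []string{"setosa", "virginica", "versicolor", "setosa"}
	fmt.Println(inMask(species, "setosa", "versicolor"))
}
```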

type Int64Appender ¶

type Int64Appender interface {
	Append(v int64)
	AppendNull()
	AppendValues(vs []int64)
	Reserve(n int)
}

Int64Appender streams int64 values into a column.

type Int64Opt ¶

type Int64Opt = func([]int64) []int64

Int64Opt transforms an int64 slice.

type IsNotNullPred ¶

type IsNotNullPred struct{ Col string }

IsNotNullPred selects rows where a column value is not null.

func IsNotNull ¶

func IsNotNull(col string) IsNotNullPred

IsNotNull builds a not-null-check predicate.

func (IsNotNullPred) Expr ¶

func (p IsNotNullPred) Expr() string

Expr returns the SQL representation of an IS NOT NULL check.

func (IsNotNullPred) Mask ¶

func (p IsNotNullPred) Mask(ds Table) ([]bool, error)

Mask evaluates the IS NOT NULL predicate against each row.

type IsNullPred ¶

type IsNullPred struct{ Col string }

IsNullPred selects rows where a column value is null.

func IsNull ¶

func IsNull(col string) IsNullPred

IsNull builds a null-check predicate.

func (IsNullPred) Expr ¶

func (p IsNullPred) Expr() string

Expr returns the SQL representation of an IS NULL check.

func (IsNullPred) Mask ¶

func (p IsNullPred) Mask(ds Table) ([]bool, error)

Mask evaluates the IS NULL predicate against each row.

type JoinSpec ¶

type JoinSpec struct {
	Type      JoinType
	LeftCols  []string
	RightCols []string
}

JoinSpec describes how to match rows between two datasets.

func On ¶

func On(cols ...string) JoinSpec

On creates a JoinSpec matching on columns with the same name in both datasets.

type JoinType ¶

type JoinType int

JoinType identifies the kind of join to perform.

const (
	// JoinLeft keeps all rows from the left dataset; unmatched right rows are null-filled.
	JoinLeft JoinType = iota
	// JoinRight keeps all rows from the right dataset; unmatched left rows are null-filled.
	JoinRight
	// JoinInner keeps only rows with matches in both datasets.
	JoinInner
	// JoinFull keeps all rows from both datasets; unmatched sides are null-filled.
	JoinFull
	// JoinSemi keeps rows from the left that have at least one match in the right.
	// No columns from the right are included.
	JoinSemi
	// JoinAnti keeps rows from the left that have NO match in the right.
	// No columns from the right are included.
	JoinAnti
)

type Joiner ¶

type Joiner interface {
	Join(left, right Table, spec JoinSpec) (Table, error)
}

Joiner provides join operations across datasets. For Arrow: hash-join with lazy indexed column views. For SQL: generates JOIN ... ON ... clauses.
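To make the join types concrete, here is a small hash-join sketch over single string key columns. It returns matched row indices instead of a Table and covers only JoinInner, JoinLeft, JoinSemi, and JoinAnti; pair and joinIndices are hypothetical names, and nothing here is taken from the engines' actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// JoinType mirrors the constants documented above.
type JoinType int

const (
	JoinLeft JoinType = iota
	JoinRight
	JoinInner
	JoinFull
	JoinSemi
	JoinAnti
)

// pair holds row indices; Right == -1 means no right row contributes
// (null-filled for JoinLeft, never included for JoinSemi and JoinAnti).
type pair struct{ Left, Right int }

// joinIndices indexes the right keys in a hash map, then probes it with
// each left key in order.
func joinIndices(left, right []string, jt JoinType) ([]pair, error) {
	switch jt {
	case JoinInner, JoinLeft, JoinSemi, JoinAnti:
	default:
		return nil, errors.New("join type not covered by this sketch")
	}
	index := make(map[string][]int, len(right))
	for j, k := range right {
		index[k] = append(index[k], j)
	}
	var out []pair
	for i, k := range left {
		matches := index[k]
		switch {
		case jt == JoinSemi && len(matches) > 0,
			jt == JoinAnti && len(matches) == 0,
			jt == JoinLeft && len(matches) == 0:
			out = append(out, pair{i, -1})
		case jt == JoinInner || jt == JoinLeft:
			for _, j := range matches {
				out = append(out, pair{i, j})
			}
		}
	}
	return out, nil
}

func main() {
	left := []string{"a", "b", "c"}
	right := []string{"b", "b", "d"}
	for _, jt := range []JoinType{JoinInner, JoinLeft, JoinSemi, JoinAnti} {
		pairs, err := joinIndices(left, right, jt)
		fmt.Println(jt, pairs, err)
	}
}
```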

type Masker ¶

type Masker interface {
	// Mask computes a boolean mask of length int(ds.NumRows()). True entries are kept.
	Mask(ds Table) ([]bool, error)
}

Masker describes a row-level filter condition that can be lazily evaluated against a dataset to produce a boolean mask.
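The Masker contract, together with the mask-based filtering that Filterer describes, might look like this in miniature. The table type is a hypothetical stand-in for Table (column name to float64 values, all columns the same length), and gt and filter are illustrative names only.

```go
package main

import (
	"errors"
	"fmt"
)

// table is a hypothetical stand-in for Table: column name -> values.
type table map[string][]float64

// masker mirrors the Masker contract against the stand-in table.
type masker interface {
	Mask(ds table) ([]bool, error)
}

// gt keeps rows where the named column is greater than Val.
type gt struct {
	Col string
	Val float64
}

func (p gt) Mask(ds table) ([]bool, error) {
	col, ok := ds[p.Col]
	if !ok {
		return nil, errors.New("column not found: " + p.Col)
	}
	mask := make([]bool, len(col))
	for i, v := range col {
		mask[i] = v > p.Val
	}
	return mask, nil
}

// filter evaluates the mask once and applies it to every column;
// true entries are kept.
func filter(ds table, m masker) (table, error) {
	mask, err := m.Mask(ds)
	if err != nil {
		return nil, err
	}
	out := make(table, len(ds))
	for name, col := range ds {
		kept := make([]float64, 0, len(col))
		for i, v := range col {
			if mask[i] {
				kept = append(kept, v)
			}
		}
		out[name] = kept
	}
	return out, nil
}

func main() {
	ds := table{"x": {1, 5, 3}, "y": {10, 20, 30}}
	out, err := filter(ds, gt{Col: "x", Val: 2})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out["x"], out["y"])
}
```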

type MathKernel ¶

type MathKernel interface {
	// Binary arithmetic (column × column, same length required)
	AddCols(a, b AnyColumn) (AnyColumn, error)
	SubCols(a, b AnyColumn) (AnyColumn, error)
	MulCols(a, b AnyColumn) (AnyColumn, error)
	DivCols(a, b AnyColumn) (AnyColumn, error)

	// Scalar arithmetic (column × scalar)
	AddScalar(col AnyColumn, val float64) (AnyColumn, error)
	MulScalar(col AnyColumn, val float64) (AnyColumn, error)

	// Unary numeric
	Abs(col AnyColumn) (AnyColumn, error)
	Neg(col AnyColumn) (AnyColumn, error)
	Sign(col AnyColumn) (AnyColumn, error)
	Sqrt(col AnyColumn) (AnyColumn, error)
	Pow(col AnyColumn, exp float64) (AnyColumn, error)

	// Transcendental — logarithmic
	Exp(col AnyColumn) (AnyColumn, error)
	Ln(col AnyColumn) (AnyColumn, error)
	Log2(col AnyColumn) (AnyColumn, error)
	Log10(col AnyColumn) (AnyColumn, error)

	// Transcendental — trigonometric
	Sin(col AnyColumn) (AnyColumn, error)
	Cos(col AnyColumn) (AnyColumn, error)
	Tan(col AnyColumn) (AnyColumn, error)
	Asin(col AnyColumn) (AnyColumn, error)
	Acos(col AnyColumn) (AnyColumn, error)
	Atan(col AnyColumn) (AnyColumn, error)
	Atan2(y, x AnyColumn) (AnyColumn, error)

	// Transcendental — hyperbolic / special
	Tanh(col AnyColumn) (AnyColumn, error)
	Sigmoid(col AnyColumn) (AnyColumn, error)
	Erf(col AnyColumn) (AnyColumn, error)

	// Rounding
	Round(col AnyColumn) (AnyColumn, error)
	Floor(col AnyColumn) (AnyColumn, error)
	Ceil(col AnyColumn) (AnyColumn, error)

	// Bitwise (int64 columns only)
	BitAnd(a, b AnyColumn) (AnyColumn, error)
	BitOr(a, b AnyColumn) (AnyColumn, error)
	BitXor(a, b AnyColumn) (AnyColumn, error)
	BitNot(col AnyColumn) (AnyColumn, error)
	BitShiftLeft(col AnyColumn, n int) (AnyColumn, error)
	BitShiftRight(col AnyColumn, n int) (AnyColumn, error)
}

MathKernel provides element-wise mathematical transforms on numeric columns.

Arrow engine: uses Arrow compute Datum API when available, highway SIMD for gaps.
Memory engine: uses highway SIMD on raw slices, falls back to math stdlib.
SQL engine: generates MATH functions (EXP, LOG, SIN, etc.).

All methods require float64 columns unless noted (bitwise requires int64).
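A scalar fallback for two of these kernels could look like the sketch below, which uses only the math stdlib on raw float64 slices. addCols and sigmoid are hypothetical free functions, and the logistic definition 1 / (1 + e^-x) for Sigmoid is assumed rather than confirmed by the package documentation.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// addCols adds two columns element-wise. Like the binary kernels above, it
// requires both inputs to have the same length.
func addCols(a, b []float64) ([]float64, error) {
	if len(a) != len(b) {
		return nil, errors.New("addCols: column lengths differ")
	}
	out := make([]float64, len(a))
	for i := range a {
		out[i] = a[i] + b[i]
	}
	return out, nil
}

// sigmoid applies 1 / (1 + e^-x) to every element.
func sigmoid(col []float64) []float64 {
	out := make([]float64, len(col))
	for i, x := range col {
		out[i] = 1 / (1 + math.Exp(-x))
	}
	return out
}

func main() {
	sum, err := addCols([]float64{1, 2}, []float64{3, 4})
	fmt.Println(sum, err)
	fmt.Println(sigmoid([]float64{-2, 0, 2}))

	_, err = addCols([]float64{1}, []float64{1, 2})
	fmt.Println(err)
}
```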

type MutateFunc ¶

type MutateFunc interface {
	// Apply produces a new column from the dataset.
	Apply(ds Table) (AnyColumn, error)
}

MutateFunc describes a column transformation for Mutate.

type NotPred ¶

type NotPred struct{ Pred Masker }

NotPred inverts a mask.

func Not ¶

func Not(pred Masker) NotPred

Not returns a predicate that inverts the given mask.

func (NotPred) Expr ¶

func (p NotPred) Expr() string

Expr returns the SQL NOT(...) expression.

func (NotPred) Mask ¶

func (p NotPred) Mask(ds Table) ([]bool, error)

Mask evaluates the NOT predicate against the dataset rows.

type Op ¶

type Op int

Op identifies a comparison operator.

const (
	OpGt        Op = iota // col > val
	OpLt                  // col < val
	OpGe                  // col >= val
	OpLe                  // col <= val
	OpEq                  // col == val
	OpNe                  // col != val
	OpBetween             // lo <= col <= hi
	OpIn                  // col IN (vals...)
	OpIsNull              // col IS NULL
	OpIsNotNull           // col IS NOT NULL
)

These constants enumerate the supported comparison operators. OpGt, the zero value, identifies the greater-than operator.

type Optimizer ¶

type Optimizer interface {
	Optimize(ops []op) []op
}

Optimizer is optionally implemented by engines that can fuse or reorder operations for efficiency. BigQuery uses this to fuse verb chains into a single SQL query.

type OrPred ¶

type OrPred struct{ Preds []Masker }

OrPred combines masks with OR.

func Or ¶

func Or(preds ...Masker) OrPred

Or combines multiple predicates with logical OR.

func (OrPred) Expr ¶

func (p OrPred) Expr() string

Expr returns the SQL representation of the OR combination.

func (OrPred) Mask ¶

func (p OrPred) Mask(ds Table) ([]bool, error)

Mask evaluates the OR predicate against the dataset rows.
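At the mask level, the OR and NOT combinators reduce to element-wise boolean operations. The following sketch works on raw []bool masks rather than Masker values; orMasks and notMask are hypothetical helpers, and returning an error on mismatched lengths is a choice made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// orMasks combines masks element-wise with logical OR.
// All masks must have the same length.
func orMasks(masks ...[]bool) ([]bool, error) {
	if len(masks) == 0 {
		return nil, nil
	}
	out := make([]bool, len(masks[0]))
	for _, m := range masks {
		if len(m) != len(out) {
			return nil, errors.New("mask length mismatch")
		}
		for i, keep := range m {
			out[i] = out[i] || keep
		}
	}
	return out, nil
}

// notMask returns the element-wise negation of mask.
func notMask(mask []bool) []bool {
	out := make([]bool, len(mask))
	for i, keep := range mask {
		out[i] = !keep
	}
	return out
}

func main() {
	a := []bool{true, false, false}
	b := []bool{false, false, true}
	either, err := orMasks(a, b)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(either, notMask(either))
}
```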

type ParquetConfig ¶

type ParquetConfig struct {
	// Compression codec: "snappy", "gzip", "zstd", "lz4", "none".
	Compression string
}

ParquetConfig holds engine-agnostic Parquet configuration.

type ParquetReader ¶

type ParquetReader interface {
	ReadParquet(ctx context.Context, eng Engine, r io.ReaderAt, size int64, cfg ParquetConfig) (Table, error)
}

ParquetReader reads Parquet data into an engine-native Table.

Memory engine: uses parquet-go for struct-based row reading.
Arrow engine: uses pqarrow.ReadTable for zero-copy columnar ingest.

func GetParquetReader ¶ added in v0.0.6

func GetParquetReader(engineName string) (ParquetReader, bool)

GetParquetReader retrieves a registered ParquetReader for an engine.

type ParquetWriter ¶

type ParquetWriter interface {
	WriteParquet(ctx context.Context, eng Engine, w io.Writer, ds Table, cfg ParquetConfig) error
}

ParquetWriter writes a Table to Parquet format.

Memory engine: uses parquet-go GenericWriter.
Arrow engine: uses pqarrow.WriteTable.

func GetParquetWriter ¶ added in v0.0.6

func GetParquetWriter(engineName string) (ParquetWriter, bool)

GetParquetWriter retrieves a registered ParquetWriter for an engine.

type PivotLongerSpec ¶

type PivotLongerSpec struct {
	// Cols are the column names to pivot from wide to long format.
	// These columns are "gathered" into a single name+value pair.
	Cols []string
	// NamesTo is the output column name that will hold the original column names.
	NamesTo string
	// ValuesTo is the output column name that will hold the values.
	ValuesTo string
}

PivotLongerSpec configures a PivotLonger operation.
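The wide-to-long gather that this spec configures can be sketched as below. The wide table is a hypothetical map of column name to values, pivotLonger is an illustrative helper, and the row-major output order (all gathered columns for row 0, then row 1, and so on) is an assumption rather than documented behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// PivotLongerSpec mirrors the struct documented above.
type PivotLongerSpec struct {
	Cols     []string
	NamesTo  string
	ValuesTo string
}

// pivotLonger gathers spec.Cols into name/value pairs. It returns, for each
// output row, the source row index, the original column name, and the value.
func pivotLonger(wide map[string][]float64, nRows int, spec PivotLongerSpec) (src []int, names []string, values []float64, err error) {
	for _, c := range spec.Cols {
		col, ok := wide[c]
		if !ok {
			return nil, nil, nil, errors.New("column not found: " + c)
		}
		if len(col) != nRows {
			return nil, nil, nil, errors.New("unexpected length for column: " + c)
		}
	}
	for r := 0; r < nRows; r++ {
		for _, c := range spec.Cols {
			src = append(src, r)
			names = append(names, c)
			values = append(values, wide[c][r])
		}
	}
	return src, names, values, nil
}

func main() {
	wide := map[string][]float64{"q1": {1, 2}, "q2": {3, 4}}
	spec := PivotLongerSpec{Cols: []string{"q1", "q2"}, NamesTo: "quarter", ValuesTo: "sales"}
	src, names, values, err := pivotLonger(wide, 2, spec)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("row", spec.NamesTo, spec.ValuesTo)
	for i := range src {
		fmt.Println(src[i], names[i], values[i])
	}
}
```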

type PivotWiderSpec ¶

type PivotWiderSpec struct {
	// NamesFrom is the column whose unique values become new column names.
	NamesFrom string
	// ValuesFrom is the column whose values fill the new columns.
	ValuesFrom string
}

PivotWiderSpec configures a PivotWider operation.

type Reshaper ¶

type Reshaper interface {
	PivotLonger(ds Table, spec PivotLongerSpec) (Table, error)
	PivotWider(ds Table, spec PivotWiderSpec) (Table, error)
	Separate(ds Table, col string, into []string, sep string) (Table, error)
	Concatenate(ds Table, col string, from []string, sep string) (Table, error)
	Complete(ds Table, cols ...string) (Table, error)
}

Reshaper provides reshape/pivot operations. For Arrow: lazy column views (repeatedView, interleavedView). For SQL: generates CASE WHEN / UNPIVOT / CROSSTAB.

type Schema ¶

type Schema struct {
	// contains filtered or unexported fields
}

Schema describes the complete structure of a dataset — an ordered collection of Fields with a name-to-index lookup. This maps directly to arrow.Schema.

func NewSchema ¶

func NewSchema(fields ...Field) *Schema

NewSchema creates a Schema from an ordered list of fields. Panics if any two fields share the same name.

func (*Schema) Field ¶

func (s *Schema) Field(i int) Field

Field returns the field at index i.

func (*Schema) FieldIndex ¶

func (s *Schema) FieldIndex(name string) int

FieldIndex returns the index of the named field, or -1.

func (*Schema) Fields ¶

func (s *Schema) Fields() []Field

Fields returns a copy of the schema's fields.

func (*Schema) HasField ¶

func (s *Schema) HasField(name string) bool

HasField returns true if the schema contains a field with the given name.

func (*Schema) NumFields ¶

func (s *Schema) NumFields() int

NumFields returns the number of fields.
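The Schema contract described above (ordered fields, a name-to-index lookup, a panic on duplicate names, and -1 for unknown names) can be sketched in a few lines. This is a local re-implementation for illustration, with Field trimmed to its name; the unexported layout and the panics helper are hypothetical.

```go
package main

import "fmt"

// Field is trimmed to the name for this sketch.
type Field struct{ Name string }

// Schema keeps fields in order plus a name-to-index lookup.
type Schema struct {
	fields []Field
	index  map[string]int
}

// NewSchema panics on duplicate field names, matching the documented contract.
func NewSchema(fields ...Field) *Schema {
	s := &Schema{
		fields: append([]Field(nil), fields...),
		index:  make(map[string]int, len(fields)),
	}
	for i, f := range fields {
		if _, dup := s.index[f.Name]; dup {
			panic("duplicate field name: " + f.Name)
		}
		s.index[f.Name] = i
	}
	return s
}

// FieldIndex returns the index of the named field, or -1.
func (s *Schema) FieldIndex(name string) int {
	if i, ok := s.index[name]; ok {
		return i
	}
	return -1
}

// Fields returns a copy so callers cannot mutate the schema.
func (s *Schema) Fields() []Field {
	return append([]Field(nil), s.fields...)
}

// panics reports whether fn panicked.
func panics(fn func()) (did bool) {
	defer func() { did = recover() != nil }()
	fn()
	return false
}

func main() {
	s := NewSchema(Field{Name: "x"}, Field{Name: "y"})
	fmt.Println(s.FieldIndex("y"), s.FieldIndex("z"))
	fmt.Println(panics(func() { NewSchema(Field{Name: "x"}, Field{Name: "x"}) }))
}
```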

type Selector ¶

type Selector interface {
	// Select reorders/selects rows by index (scatter-gather).
	// This is the Arrow "Take" kernel.
	Select(col AnyColumn, indices []int) (AnyColumn, error)

	// Slice returns rows [start, end) from a column.
	// For Arrow: zero-copy via array.NewSlice.
	Slice(col AnyColumn, start, end int) (AnyColumn, error)

	// SortIndices returns the permutation that sorts the column ascending.
	// Returns an int slice, not a column; pass it to Select (the Take kernel).
	SortIndices(col AnyColumn) ([]int, error)

	// FilterIndices returns the row indices where mask[i] == true.
	// Returns an int slice for use with Select (the Take kernel).
	FilterIndices(mask []bool) []int
}

Selector provides engine-native column/row manipulation primitives. These are the building blocks for Frame verbs (Select, Arrange, Head, etc.).

For Arrow: zero-copy slicing, compute Take kernel, sort-indices kernel. For Memory: direct slice operations. For SQL: generates ORDER BY, LIMIT/OFFSET, WHERE rowid IN (...).
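In the spirit of the Memory engine's "direct slice operations", the index-based primitives might be sketched as follows. These are hypothetical free functions over []float64 rather than methods on an engine, and the use of a stable sort is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// filterIndices returns the row indices where mask[i] is true.
func filterIndices(mask []bool) []int {
	idx := make([]int, 0, len(mask))
	for i, keep := range mask {
		if keep {
			idx = append(idx, i)
		}
	}
	return idx
}

// sortIndices returns the permutation that sorts col ascending.
// The sort is stable, so equal values keep their original order.
func sortIndices(col []float64) []int {
	idx := make([]int, len(col))
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool { return col[idx[a]] < col[idx[b]] })
	return idx
}

// take gathers col[indices[i]] into a new slice (the "Take" operation).
func take(col []float64, indices []int) ([]float64, error) {
	out := make([]float64, len(indices))
	for i, j := range indices {
		if j < 0 || j >= len(col) {
			return nil, errors.New("take: index out of range")
		}
		out[i] = col[j]
	}
	return out, nil
}

func main() {
	col := []float64{3, 1, 2}
	order := sortIndices(col)
	sorted, err := take(col, order)
	fmt.Println(order, sorted, err)

	kept, err := take(col, filterIndices([]bool{true, false, true}))
	fmt.Println(kept, err)
}
```

Keeping the permutation separate from the gather lets one sort result reorder every column of a table, which is why the sort and filter helpers return index slices rather than columns.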

type StatKernel ¶ added in v0.0.5

type StatKernel interface {
	// Histogram bins a numeric column into equal-width bins.
	// Returns a Table with columns: "x" (bin centers) and "count" (frequencies).
	// nBins <= 0 means auto-select using Sturges' rule.
	Histogram(col AnyColumn, nBins int) (Table, error)

	// KDE computes kernel density estimation over a numeric column.
	// Returns a Table with columns: "x" (grid points) and "density".
	// bandwidth <= 0 means Silverman auto-select. points is the output grid size.
	KDE(ctx context.Context, col AnyColumn, bandwidth float64, points int) (Table, error)

	// LinearFit computes OLS linear regression y = a + b*x.
	// Returns a Table with columns: "x" (grid) and "y" (fitted values).
	// nOut is the number of output grid points.
	LinearFit(xCol, yCol AnyColumn, nOut int) (Table, error)

	// LoessFit computes locally weighted regression (LOESS).
	// Returns a Table with columns: "x" (grid) and "y" (fitted values).
	// nOut is the number of output grid points.
	LoessFit(ctx context.Context, xCol, yCol AnyColumn, nOut int) (Table, error)

	// LinearFitSE computes OLS regression with 95% confidence bands.
	// Returns a Table with columns: "x", "y" (fitted), "ymin", "ymax".
	// nOut is the number of output grid points.
	LinearFitSE(xCol, yCol AnyColumn, nOut int) (Table, error)

	// LoessFitSE computes LOESS with approximate 95% confidence bands.
	// Returns a Table with columns: "x", "y" (fitted), "ymin", "ymax".
	// nOut is the number of output grid points.
	LoessFitSE(ctx context.Context, xCol, yCol AnyColumn, nOut int) (Table, error)

	// Boxplot computes the five-number summary for a numeric column,
	// optionally grouped by a categorical column.
	// Returns a Table with columns: "x", "lower", "q1", "middle", "q3",
	// "upper", "notch_lower", "notch_upper".
	// groupCol may be nil for a single-group boxplot.
	// whisker is "tukey" (1.5*IQR) or "range" (min-max).
	Boxplot(yCol, groupCol AnyColumn, whisker string, notch bool) (Table, error)
}

StatKernel provides statistical compute kernels that produce new Tables. These are higher-level operations that consume one or more columns and produce a complete result table.

For Memory/Arrow: implemented via go-highway SIMD + stdlib math. For SQL: could generate UDFs or client-side fallback.
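As one concrete instance, the Histogram contract (equal-width bins, bin centers, Sturges' rule when nBins <= 0) can be sketched with the stdlib alone. The helper returns slices instead of a Table, assumes finite inputs, and collapses to a single bin when all values are equal; sturges and histogram are hypothetical names, and those edge-case choices are assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// sturges returns ceil(log2(n)) + 1, the bin count from Sturges' rule.
func sturges(n int) int {
	return int(math.Ceil(math.Log2(float64(n)))) + 1
}

// histogram bins vals into equal-width bins and returns bin centers and
// counts. nBins <= 0 selects the bin count with Sturges' rule.
func histogram(vals []float64, nBins int) (centers []float64, counts []int) {
	if len(vals) == 0 {
		return nil, nil
	}
	if nBins <= 0 {
		nBins = sturges(len(vals))
	}
	lo, hi := vals[0], vals[0]
	for _, v := range vals {
		lo = math.Min(lo, v)
		hi = math.Max(hi, v)
	}
	width := (hi - lo) / float64(nBins)
	if width == 0 {
		// All values are equal: report a single bin centered on that value.
		return []float64{lo}, []int{len(vals)}
	}
	centers = make([]float64, nBins)
	counts = make([]int, nBins)
	for i := range centers {
		centers[i] = lo + (float64(i)+0.5)*width
	}
	for _, v := range vals {
		b := int((v - lo) / width)
		if b >= nBins {
			b = nBins - 1 // the maximum sits on the right edge of the last bin
		}
		counts[b]++
	}
	return centers, counts
}

func main() {
	vals := []float64{0, 1, 2, 3, 4, 5, 6, 7}
	centers, counts := histogram(vals, 0) // Sturges' rule picks the bin count
	fmt.Println(centers, counts)
}
```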

type StringAppender ¶

type StringAppender interface {
	Append(v string)
	AppendNull()
	AppendValues(vs []string)
	Reserve(n int)
}

StringAppender streams string values into a column.

type StringOpt ¶

type StringOpt = func([]string) []string

StringOpt transforms a string slice.

type Table ¶

type Table interface {
	// Schema returns the dataset's schema.
	Schema() *Schema

	// Column retrieves a named column. Returns [ColumnNotFoundError] if absent.
	// The returned [AnyColumn] can be type-asserted to [Column[T]] for typed
	// access, or use [GetColumn] for a safe generic retrieval.
	Column(name string) (AnyColumn, error)

	// NumRows returns the logical number of rows.
	NumRows() int64

	// NumCols returns the number of columns.
	NumCols() int64
}

Table represents an immutable, columnar data source.

Implementations include in-memory tables, Arrow tables, and BigQuery-backed remote tables. ETL verbs are exposed by wrapping a Table in a Dataset (the fluent API defined in frame.go) via From.

type Windower ¶

type Windower interface {
	Lag(col AnyColumn, n int) (AnyColumn, error)
	Lead(col AnyColumn, n int) (AnyColumn, error)
	CumSum(col AnyColumn) (AnyColumn, error)
	CumMax(col AnyColumn) (AnyColumn, error)
	CumMin(col AnyColumn) (AnyColumn, error)
	Rank(col AnyColumn) (AnyColumn, error)
	DenseRank(col AnyColumn) (AnyColumn, error)
	PercentRank(col AnyColumn) (AnyColumn, error)
	RowNumber(n int) (AnyColumn, error)
}

Windower provides window function kernels. For Arrow: streaming accumulators over Arrow arrays. For SQL: generates OVER() / WINDOW clauses.
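Three of these kernels are sketched below as single-pass accumulators over a plain slice. This is illustrative only: lag, cumSum, and denseRank are hypothetical helpers, nulls are modeled with a separate validity mask, negative lag offsets are treated as 0, and ranks are assumed to be 1-based and ascending.

```go
package main

import (
	"fmt"
	"sort"
)

// lag shifts col down by n rows. The first n outputs have no source row,
// so they are marked null via valid[i] == false.
func lag(col []float64, n int) (vals []float64, valid []bool) {
	vals = make([]float64, len(col))
	valid = make([]bool, len(col))
	if n < 0 {
		n = 0 // negative offsets are treated as 0 in this sketch
	}
	for i := n; i < len(col); i++ {
		vals[i], valid[i] = col[i-n], true
	}
	return vals, valid
}

// cumSum returns the running total of col.
func cumSum(col []float64) []float64 {
	out := make([]float64, len(col))
	var running float64
	for i, v := range col {
		running += v
		out[i] = running
	}
	return out
}

// denseRank assigns 1-based ranks in ascending order; ties share a rank
// and no rank numbers are skipped.
func denseRank(col []float64) []int {
	sorted := append([]float64(nil), col...)
	sort.Float64s(sorted)
	rankOf := make(map[float64]int, len(sorted))
	for _, v := range sorted {
		if _, seen := rankOf[v]; !seen {
			rankOf[v] = len(rankOf) + 1
		}
	}
	out := make([]int, len(col))
	for i, v := range col {
		out[i] = rankOf[v]
	}
	return out
}

func main() {
	fmt.Println(lag([]float64{1, 2, 3}, 1))
	fmt.Println(cumSum([]float64{1, 2, 3}))
	fmt.Println(denseRank([]float64{10, 30, 10, 20}))
}
```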


Directories ¶

Path	Synopsis
arrow	Package arrow provides an Apache Arrow-backed compute engine for the dataset package.
arrow/csv	Package csv provides the Arrow CSV engine driver.
arrow/parquet	Package parquet provides the Arrow Parquet engine driver.
bigquery	Package bigquery implements a BigQuery SQL pushdown engine for the dataset library.
compute	Package compute provides portable SIMD primitives for the dataset engines.
csv	Package csv provides CSV reading and writing for the dataset package.
math	Package math provides SIMD-accelerated mathematical transforms for the dataset engines.
memory	Package memory provides a lightweight Go-slice-backed compute engine for the dataset package.
memory/csv	Package csv provides the Memory CSV engine driver.
memory/parquet	Package parquet provides the Memory Parquet engine driver.
parquet	Package parquet provides Parquet reading and writing for the dataset package.
sort	Package sort provides SIMD-accelerated sorting for the dataset engines.
