Documentation ¶
Overview ¶
Package arrow provides an Apache Arrow-backed compute engine for the dataset package. It implements dataset.ColumnFactory, dataset.BuilderFactory, dataset.Aggregator, and dataset.Caster using Arrow arrays and arrow/math SIMD kernels.
Usage:
	eng := arrow.NewEngine(ctx, memory.DefaultAllocator)
	f := eng.(dataset.ColumnFactory)
	ds, _ := f.FromColumns(
		dataset.NewSchema(dataset.FloatCol("x"), dataset.StringCol("label")),
		f.NewFloat64Column("x", []float64{1, 2, 3}),
		f.NewStringColumn("label", []string{"a", "b", "c"}),
	)
Index ¶
- Variables
- type Engine
- func (e *Engine) Abs(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Acos(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) AddCols(a, b dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) AddScalar(col dataset.AnyColumn, val float64) (dataset.AnyColumn, error)
- func (e *Engine) Alloc() memory.Allocator
- func (e *Engine) Asin(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Atan(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Atan2(y, x dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) BitAnd(a, b dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) BitNot(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) BitOr(a, b dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) BitShiftLeft(col dataset.AnyColumn, n int) (dataset.AnyColumn, error)
- func (e *Engine) BitShiftRight(col dataset.AnyColumn, n int) (dataset.AnyColumn, error)
- func (e *Engine) BitXor(a, b dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Boxplot(yCol, groupCol dataset.AnyColumn, whisker string, notch bool) (dataset.Table, error)
- func (e *Engine) Cast(col dataset.AnyColumn, target dataset.DType) (dataset.AnyColumn, error)
- func (e *Engine) Ceil(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Combine(datasets ...dataset.Table) (dataset.Table, error)
- func (e *Engine) Complete(ds dataset.Table, cols ...string) (dataset.Table, error)
- func (e *Engine) Concatenate(ds dataset.Table, col string, from []string, sep string) (dataset.Table, error)
- func (e *Engine) Context() context.Context
- func (e *Engine) Cos(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Count(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) CumMax(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) CumMin(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) CumSum(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) DenseRank(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) DivCols(a, b dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) DropNA(ds dataset.Table, cols ...string) (dataset.Table, error)
- func (e *Engine) Erf(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Exp(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Fill(col dataset.AnyColumn, dir dataset.FillDirection) (dataset.AnyColumn, error)
- func (e *Engine) Filter(ds dataset.Table, mask dataset.Masker) (dataset.Table, error)
- func (e *Engine) FilterIndices(mask []bool) []int
- func (e *Engine) First(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Floor(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) FromColumns(schema *dataset.Schema, cols ...dataset.AnyColumn) (dataset.Table, error)
- func (e *Engine) Histogram(col dataset.AnyColumn, nBins int) (dataset.Table, error)
- func (e *Engine) Join(left, right dataset.Table, spec dataset.JoinSpec) (dataset.Table, error)
- func (e *Engine) KDE(ctx context.Context, col dataset.AnyColumn, bandwidth float64, points int) (dataset.Table, error)
- func (e *Engine) Lag(col dataset.AnyColumn, offset int) (dataset.AnyColumn, error)
- func (e *Engine) Last(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Lead(col dataset.AnyColumn, offset int) (dataset.AnyColumn, error)
- func (e *Engine) LinearFit(xCol, yCol dataset.AnyColumn, nOut int) (dataset.Table, error)
- func (e *Engine) LinearFitSE(xCol, yCol dataset.AnyColumn, nOut int) (dataset.Table, error)
- func (e *Engine) Ln(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) LoessFit(ctx context.Context, xCol, yCol dataset.AnyColumn, nOut int) (dataset.Table, error)
- func (e *Engine) LoessFitSE(ctx context.Context, xCol, yCol dataset.AnyColumn, nOut int) (dataset.Table, error)
- func (e *Engine) Log2(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Log10(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Mean(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Median(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) MinMax(col dataset.AnyColumn) (dataset.AnyColumn, dataset.AnyColumn, error)
- func (e *Engine) Mode(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) MulCols(a, b dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) MulScalar(col dataset.AnyColumn, val float64) (dataset.AnyColumn, error)
- func (e *Engine) Name() string
- func (e *Engine) Neg(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) NewBoolColumn(name string, data []bool) dataset.AnyColumn
- func (e *Engine) NewBuilder(schema *dataset.Schema) dataset.Builder
- func (e *Engine) NewDateColumn(name string, data []int64) dataset.AnyColumn
- func (e *Engine) NewFloat64Column(name string, data []float64) dataset.AnyColumn
- func (e *Engine) NewInt64Column(name string, data []int64) dataset.AnyColumn
- func (e *Engine) NewStringColumn(name string, data []string) dataset.AnyColumn
- func (e *Engine) NewTimeColumn(name string, data []int64) dataset.AnyColumn
- func (e *Engine) NewTimestampColumn(name string, data []int64) dataset.AnyColumn
- func (e *Engine) PercentRank(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Percentile(col dataset.AnyColumn, p float64) (dataset.AnyColumn, error)
- func (e *Engine) PivotLonger(ds dataset.Table, spec dataset.PivotLongerSpec) (dataset.Table, error)
- func (e *Engine) PivotWider(ds dataset.Table, spec dataset.PivotWiderSpec) (dataset.Table, error)
- func (e *Engine) Pow(col dataset.AnyColumn, exp float64) (dataset.AnyColumn, error)
- func (e *Engine) Rank(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) ReplaceNA(col dataset.AnyColumn, defaultVal float64) (dataset.AnyColumn, error)
- func (e *Engine) Round(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) RowNumber(n int) (dataset.AnyColumn, error)
- func (e *Engine) Select(col dataset.AnyColumn, indices []int) (dataset.AnyColumn, error)
- func (e *Engine) Separate(ds dataset.Table, col string, into []string, sep string) (dataset.Table, error)
- func (e *Engine) Sigmoid(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Sign(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Sin(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Slice(col dataset.AnyColumn, start, end int) (dataset.AnyColumn, error)
- func (e *Engine) SortIndices(col dataset.AnyColumn) ([]int, error)
- func (e *Engine) Sqrt(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Stack(datasets ...dataset.Table) (dataset.Table, error)
- func (e *Engine) StdDev(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) SubCols(a, b dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Sum(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Tan(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Tanh(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Variance(col dataset.AnyColumn) (dataset.AnyColumn, error)
Constants ¶
This section is empty.
Variables ¶
var (
	// ErrUnsupportedType is returned for unsupported column types.
	ErrUnsupportedType = errors.New("arrow: unsupported column type")
	// ErrLengthMismatch is returned when column lengths don't match.
	ErrLengthMismatch = errors.New("arrow: column length mismatch")
	// ErrEmptyColumn is returned when an operation requires non-empty data.
	ErrEmptyColumn = errors.New("arrow: empty column")
	// ErrRequiresFloat64 is returned when a float64 column is required.
	ErrRequiresFloat64 = errors.New("arrow: operation requires float64 column")
	// ErrRequiresInt64 is returned when an int64 column is required.
	ErrRequiresInt64 = errors.New("arrow: operation requires int64 column")
	// ErrRequiresNumeric is returned when a numeric column is required.
	ErrRequiresNumeric = errors.New("arrow: operation requires numeric column")
	// ErrJoinKeyMismatch is returned when join key types don't match.
	ErrJoinKeyMismatch = errors.New("arrow: join key type mismatch")
	// ErrTakeTypeMismatch is returned when a Take/Slice result has unexpected type.
	ErrTakeTypeMismatch = errors.New("arrow: unexpected result type from Take/Slice")
	// ErrComputeTypeMismatch is returned when a compute kernel result has unexpected type.
	ErrComputeTypeMismatch = errors.New("arrow: unexpected result type from compute kernel")
	// ErrOutOfRange is returned when a parameter value is out of the expected range.
	ErrOutOfRange = errors.New("arrow: value out of range")
)
Sentinel errors for the arrow engine package.
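Callers would typically match these sentinels with errors.Is rather than comparing strings. The sketch below is illustrative only: errLengthMismatch is a local stand-in for ErrLengthMismatch so the program compiles on its own, addCols is a hypothetical helper, and whether the engine wraps its sentinels with extra context is an assumption (errors.Is works in either case).

```go
package main

import (
	"errors"
	"fmt"
)

// errLengthMismatch is a local stand-in for arrow.ErrLengthMismatch; real
// code would reference the exported variable directly.
var errLengthMismatch = errors.New("arrow: column length mismatch")

// addCols is a hypothetical helper that mimics an engine call failing with
// a wrapped sentinel error when the inputs differ in length.
func addCols(a, b []float64) ([]float64, error) {
	if len(a) != len(b) {
		return nil, fmt.Errorf("AddCols(%d, %d): %w", len(a), len(b), errLengthMismatch)
	}
	out := make([]float64, len(a))
	for i := range a {
		out[i] = a[i] + b[i]
	}
	return out, nil
}

func main() {
	_, err := addCols([]float64{1, 2}, []float64{1})
	// errors.Is sees through the %w wrapping and matches the sentinel.
	if errors.Is(err, errLengthMismatch) {
		fmt.Println("length mismatch:", err)
	}
}
```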
Functions ¶
This section is empty.
Types ¶
type Engine ¶
type Engine struct {
// contains filtered or unexported fields
}
Engine is the Arrow compute backend.
func NewEngine ¶
NewEngine creates an Arrow engine with the given lifecycle context and memory allocator.
func (*Engine) AddCols ¶
AddCols returns the element-wise sum of two float64 columns (Arrow native).
func (*Engine) BitShiftLeft ¶
BitShiftLeft shifts each int64 element left by n bits.
func (*Engine) BitShiftRight ¶
BitShiftRight shifts each int64 element right by n bits.
func (*Engine) Boxplot ¶ added in v0.0.5
func (e *Engine) Boxplot(yCol, groupCol dataset.AnyColumn, whisker string, notch bool) (dataset.Table, error)
Boxplot computes the five-number summary for a numeric column.
func (*Engine) Complete ¶
Complete generates all combinations of the specified columns' unique values.
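As a rough illustration of "all combinations of unique values", the following sketch builds the cross product for two string columns held as plain slices. The names complete and unique are hypothetical, and the output order (first appearance, left column varying slowest) is an assumption; the engine's actual row order is not documented here.

```go
package main

import "fmt"

// unique returns the distinct values of vals in order of first appearance.
func unique(vals []string) []string {
	seen := make(map[string]struct{}, len(vals))
	var out []string
	for _, v := range vals {
		if _, ok := seen[v]; !ok {
			seen[v] = struct{}{}
			out = append(out, v)
		}
	}
	return out
}

// complete returns every pairing of the unique values of a and b.
func complete(a, b []string) [][2]string {
	ua, ub := unique(a), unique(b)
	out := make([][2]string, 0, len(ua)*len(ub))
	for _, x := range ua {
		for _, y := range ub {
			out = append(out, [2]string{x, y})
		}
	}
	return out
}

func main() {
	fmt.Println(complete([]string{"a", "b", "a"}, []string{"x", "y"}))
}
```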
func (*Engine) Concatenate ¶
func (e *Engine) Concatenate(ds dataset.Table, col string, from []string, sep string) (dataset.Table, error)
Concatenate joins multiple string columns into one with a separator.
func (*Engine) FilterIndices ¶
FilterIndices returns the indices where mask is true.
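The behavior described above can be sketched in a few lines. This is a standalone approximation written against a plain []bool, not the engine's implementation; filterIndices is a hypothetical name.

```go
package main

import "fmt"

// filterIndices returns the positions at which mask is true, in ascending order.
func filterIndices(mask []bool) []int {
	idx := make([]int, 0, len(mask))
	for i, keep := range mask {
		if keep {
			idx = append(idx, i)
		}
	}
	return idx
}

func main() {
	fmt.Println(filterIndices([]bool{true, false, true, true})) // [0 2 3]
}
```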
func (*Engine) First ¶ added in v0.0.5
First returns the first element of a column as a single-row column.
func (*Engine) FromColumns ¶
func (e *Engine) FromColumns(schema *dataset.Schema, cols ...dataset.AnyColumn) (dataset.Table, error)
FromColumns builds a Table from pre-built Arrow columns.
func (*Engine) Join ¶
Join implements the Joiner interface with a hash-join algorithm. It supports Inner, Left, Right, Full, Semi, and Anti joins.
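For readers unfamiliar with hash joins, here is a minimal sketch of the build-and-probe idea for the inner-join case only. It works on plain int64 key slices and returns matching row-index pairs; pair and innerHashJoin are hypothetical names, and nothing here reflects how the engine stores keys or handles nulls.

```go
package main

import "fmt"

// pair holds matching row indices from the left and right inputs.
type pair struct{ left, right int }

// innerHashJoin builds a hash table over the right keys, then probes it
// with each left key and emits one pair per match.
func innerHashJoin(leftKeys, rightKeys []int64) []pair {
	// Build phase: key -> all right-side row indices with that key.
	build := make(map[int64][]int, len(rightKeys))
	for j, k := range rightKeys {
		build[k] = append(build[k], j)
	}
	// Probe phase: look up every left key.
	var out []pair
	for i, k := range leftKeys {
		for _, j := range build[k] {
			out = append(out, pair{i, j})
		}
	}
	return out
}

func main() {
	fmt.Println(innerHashJoin([]int64{1, 2, 2, 3}, []int64{2, 3, 4})) // [{1 0} {2 0} {3 1}]
}
```

The other join kinds differ mainly in what happens on a probe miss or hit: a left join would also emit unmatched left rows, a semi join emits each matching left row once, and an anti join emits only the left rows with no match.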
func (*Engine) KDE ¶ added in v0.0.5
func (e *Engine) KDE(ctx context.Context, col dataset.AnyColumn, bandwidth float64, points int) (dataset.Table, error)
KDE computes kernel density estimation over a numeric column.
func (*Engine) Last ¶ added in v0.0.5
Last returns the last element of a column as a single-row column.
func (*Engine) LinearFitSE ¶ added in v0.0.5
LinearFitSE computes OLS regression with 95% confidence bands.
func (*Engine) LoessFit ¶ added in v0.0.5
func (e *Engine) LoessFit(ctx context.Context, xCol, yCol dataset.AnyColumn, nOut int) (dataset.Table, error)
LoessFit computes locally weighted regression (LOESS).
func (*Engine) LoessFitSE ¶ added in v0.0.5
func (e *Engine) LoessFitSE(ctx context.Context, xCol, yCol dataset.AnyColumn, nOut int) (dataset.Table, error)
LoessFitSE computes LOESS with approximate 95% confidence bands.
func (*Engine) Mode ¶ added in v0.0.5
Mode returns the most frequent value as a single-row column. For ties, the first sorted value wins (deterministic). Float64 and int64 use a sort-based scan (O(n log n), no map overhead). Strings iterate over the Arrow array directly (no materialization to []string).
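The sort-based scan and its tie-breaking rule can be illustrated as follows. This is a sketch over a plain []int64 (modeInt64 is a hypothetical name), assuming "first sorted value wins" means the smallest value among ties; the boolean return for empty input is a choice made for this example, not the engine's error handling.

```go
package main

import (
	"fmt"
	"sort"
)

// modeInt64 returns the most frequent value in data. Ties go to the smallest
// value: runs are scanned in sorted order and only a strictly longer run
// replaces the current best. The second result is false for empty input.
func modeInt64(data []int64) (int64, bool) {
	if len(data) == 0 {
		return 0, false
	}
	s := append([]int64(nil), data...) // copy so the caller's slice is not reordered
	sort.Slice(s, func(i, j int) bool { return s[i] < s[j] })

	best, bestN, run := s[0], 1, 1
	for i := 1; i < len(s); i++ {
		if s[i] == s[i-1] {
			run++
		} else {
			run = 1
		}
		if run > bestN {
			best, bestN = s[i], run
		}
	}
	return best, true
}

func main() {
	fmt.Println(modeInt64([]int64{3, 1, 3, 1, 2})) // 1 true
}
```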
func (*Engine) NewBoolColumn ¶
NewBoolColumn creates a bool column backed by an Arrow array.
func (*Engine) NewBuilder ¶
NewBuilder creates a row-at-a-time Table builder.
func (*Engine) NewDateColumn ¶ added in v0.0.6
NewDateColumn creates a date column (days since epoch) stored as Arrow int64.
func (*Engine) NewFloat64Column ¶
NewFloat64Column creates a float64 column backed by an Arrow array.
func (*Engine) NewInt64Column ¶
NewInt64Column creates an int64 column backed by an Arrow array.
func (*Engine) NewStringColumn ¶
NewStringColumn creates a string column backed by an Arrow array.
func (*Engine) NewTimeColumn ¶ added in v0.0.6
NewTimeColumn creates a time-of-day column (nanoseconds since midnight) stored as Arrow int64.
func (*Engine) NewTimestampColumn ¶
NewTimestampColumn creates a timestamp column stored as Arrow int64.
func (*Engine) PercentRank ¶
PercentRank returns the percent rank ((rank-1) / (n-1)).
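A worked sketch of the formula follows. It assumes ties share the minimum rank (as in SQL's PERCENT_RANK) and that a column with fewer than two rows yields 0; neither detail is stated above, so treat both as assumptions. percentRank is a hypothetical name.

```go
package main

import (
	"fmt"
	"sort"
)

// percentRank returns (rank-1)/(n-1) for each element, where rank is the
// 1-based minimum rank among ties. Inputs with fewer than two rows yield 0.
func percentRank(data []float64) []float64 {
	n := len(data)
	out := make([]float64, n)
	if n < 2 {
		return out
	}
	// Sort row indices by value so results can be written back in input order.
	idx := make([]int, n)
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool { return data[idx[a]] < data[idx[b]] })

	rank := 1
	for pos, i := range idx {
		if pos > 0 && data[i] != data[idx[pos-1]] {
			rank = pos + 1 // a new value starts at its sorted position
		}
		out[i] = float64(rank-1) / float64(n-1)
	}
	return out
}

func main() {
	// Ranks are 1, 2, 2, 4, so the percent ranks are 0, 1/3, 1/3, 1.
	fmt.Println(percentRank([]float64{10, 20, 20, 40}))
}
```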
func (*Engine) Percentile ¶ added in v0.0.5
Percentile returns the p-th quantile as a single-row float64 column, where p ∈ [0,1]. It uses sort-based R-7 linear interpolation. Float64: zero-copy slice → copy → sort → interpolate. Int64: zero-copy → convert to float64 → sort → interpolate.
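R-7 interpolation places the quantile at fractional position h = (n-1)·p in the sorted data and interpolates linearly between the two neighbouring values. The sketch below shows that arithmetic on a plain []float64; percentileR7 and its error values are hypothetical and do not mirror the engine's sentinels.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"sort"
)

// percentileR7 returns the p-th quantile of data (p in [0,1]) using R-7
// linear interpolation on a sorted copy of the input.
func percentileR7(data []float64, p float64) (float64, error) {
	if len(data) == 0 {
		return 0, errors.New("empty input")
	}
	if math.IsNaN(p) || p < 0 || p > 1 {
		return 0, errors.New("p out of range [0,1]")
	}
	s := append([]float64(nil), data...) // copy, then sort
	sort.Float64s(s)

	h := float64(len(s)-1) * p // fractional position in the sorted data
	lo := int(math.Floor(h))
	if lo+1 >= len(s) {
		return s[lo], nil // p == 1 or a single element
	}
	return s[lo] + (h-float64(lo))*(s[lo+1]-s[lo]), nil
}

func main() {
	v, _ := percentileR7([]float64{4, 1, 3, 2}, 0.5)
	fmt.Println(v) // 2.5
}
```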
func (*Engine) PivotLonger ¶
PivotLonger reshapes a wide dataset to long format.
func (*Engine) PivotWider ¶
PivotWider reshapes a long dataset to wide format.
func (*Engine) Separate ¶
func (e *Engine) Separate(ds dataset.Table, col string, into []string, sep string) (dataset.Table, error)
Separate splits a string column by a delimiter into multiple columns.
func (*Engine) SortIndices ¶
SortIndices uses Arrow compute's SortIndicesArray kernel. Arrow's implementation handles null placement and type dispatch natively.
func (*Engine) StdDev ¶ added in v0.0.5
StdDev returns the sample standard deviation as a single-row float64 column.
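"Sample" standard deviation conventionally means dividing the sum of squared deviations by n-1 rather than n. A two-pass sketch of that definition is shown below; sampleStdDev is a hypothetical name, and returning false for fewer than two values is a choice made for this example, since the engine's behavior in that case is not documented here.

```go
package main

import (
	"fmt"
	"math"
)

// sampleStdDev returns sqrt(sum((x-mean)^2) / (n-1)). The second result is
// false when fewer than two values are supplied, where the estimate is undefined.
func sampleStdDev(data []float64) (float64, bool) {
	n := len(data)
	if n < 2 {
		return 0, false
	}
	mean := 0.0
	for _, v := range data {
		mean += v
	}
	mean /= float64(n)

	ss := 0.0 // sum of squared deviations from the mean
	for _, v := range data {
		d := v - mean
		ss += d * d
	}
	return math.Sqrt(ss / float64(n-1)), true
}

func main() {
	sd, _ := sampleStdDev([]float64{1, 2, 3, 4, 5})
	fmt.Printf("%.4f\n", sd) // 1.5811
}
```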