Documentation ¶
Overview ¶
Package memory provides a lightweight Go-slice-backed compute engine for the dataset package. It implements dataset.ColumnFactory, dataset.BuilderFactory, dataset.Aggregator, and dataset.Caster.
Usage:
eng := memory.NewEngine(context.Background())
f := eng.(dataset.ColumnFactory)
ds, _ := f.FromColumns(
	dataset.NewSchema(dataset.FloatCol("x"), dataset.StringCol("label")),
	f.NewFloat64Column("x", []float64{1, 2, 3}),
	f.NewStringColumn("label", []string{"a", "b", "c"}),
)
Index ¶
- Variables
- type Engine
- func (e *Engine) Abs(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Acos(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) AddCols(a, b dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) AddScalar(col dataset.AnyColumn, val float64) (dataset.AnyColumn, error)
- func (e *Engine) Asin(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Atan(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Atan2(y, x dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) BitAnd(a, b dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) BitNot(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) BitOr(a, b dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) BitShiftLeft(col dataset.AnyColumn, n int) (dataset.AnyColumn, error)
- func (e *Engine) BitShiftRight(col dataset.AnyColumn, n int) (dataset.AnyColumn, error)
- func (e *Engine) BitXor(a, b dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Boxplot(yCol, groupCol dataset.AnyColumn, whisker string, notch bool) (dataset.Table, error)
- func (e *Engine) Cast(col dataset.AnyColumn, target dataset.DType) (dataset.AnyColumn, error)
- func (e *Engine) Ceil(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Combine(datasets ...dataset.Table) (dataset.Table, error)
- func (e *Engine) Complete(ds dataset.Table, cols ...string) (dataset.Table, error)
- func (e *Engine) Concatenate(ds dataset.Table, col string, from []string, sep string) (dataset.Table, error)
- func (e *Engine) Context() context.Context
- func (e *Engine) Cos(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Count(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) CumMax(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) CumMin(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) CumSum(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) DenseRank(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) DivCols(a, b dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) DropNA(ds dataset.Table, cols ...string) (dataset.Table, error)
- func (e *Engine) Erf(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Exp(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Fill(col dataset.AnyColumn, dir dataset.FillDirection) (dataset.AnyColumn, error)
- func (e *Engine) Filter(ds dataset.Table, mask dataset.Masker) (dataset.Table, error)
- func (e *Engine) FilterIndices(mask []bool) []int
- func (e *Engine) First(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Floor(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) FromColumns(schema *dataset.Schema, cols ...dataset.AnyColumn) (dataset.Table, error)
- func (e *Engine) Histogram(col dataset.AnyColumn, nBins int) (dataset.Table, error)
- func (e *Engine) Join(left, right dataset.Table, spec dataset.JoinSpec) (dataset.Table, error)
- func (e *Engine) KDE(ctx context.Context, col dataset.AnyColumn, bandwidth float64, points int) (dataset.Table, error)
- func (e *Engine) Lag(col dataset.AnyColumn, n int) (dataset.AnyColumn, error)
- func (e *Engine) Last(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Lead(col dataset.AnyColumn, n int) (dataset.AnyColumn, error)
- func (e *Engine) LinearFit(xCol, yCol dataset.AnyColumn, nOut int) (dataset.Table, error)
- func (e *Engine) LinearFitSE(xCol, yCol dataset.AnyColumn, nOut int) (dataset.Table, error)
- func (e *Engine) Ln(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) LoessFit(ctx context.Context, xCol, yCol dataset.AnyColumn, nOut int) (dataset.Table, error)
- func (e *Engine) LoessFitSE(ctx context.Context, xCol, yCol dataset.AnyColumn, nOut int) (dataset.Table, error)
- func (e *Engine) Log2(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Log10(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Mean(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Median(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) MinMax(col dataset.AnyColumn) (dataset.AnyColumn, dataset.AnyColumn, error)
- func (e *Engine) Mode(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) MulCols(a, b dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) MulScalar(col dataset.AnyColumn, val float64) (dataset.AnyColumn, error)
- func (e *Engine) Name() string
- func (e *Engine) Neg(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) NewBoolColumn(name string, data []bool) dataset.AnyColumn
- func (e *Engine) NewBuilder(schema *dataset.Schema) dataset.Builder
- func (e *Engine) NewDateColumn(name string, days []int64) dataset.AnyColumn
- func (e *Engine) NewDateFromString(name string, values []string) (dataset.AnyColumn, error)
- func (e *Engine) NewDateFromTime(name string, times []time.Time) dataset.AnyColumn
- func (e *Engine) NewFloat64Column(name string, data []float64) dataset.AnyColumn
- func (e *Engine) NewInt64Column(name string, data []int64) dataset.AnyColumn
- func (e *Engine) NewStringColumn(name string, data []string) dataset.AnyColumn
- func (e *Engine) NewTimeColumn(name string, nanos []int64) dataset.AnyColumn
- func (e *Engine) NewTimestampColumn(name string, data []int64) dataset.AnyColumn
- func (e *Engine) NewTimestampFromString(name string, values []string) (dataset.AnyColumn, error)
- func (e *Engine) NewTimestampFromTime(name string, times []time.Time) dataset.AnyColumn
- func (e *Engine) PercentRank(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Percentile(col dataset.AnyColumn, p float64) (dataset.AnyColumn, error)
- func (e *Engine) PivotLonger(ds dataset.Table, spec dataset.PivotLongerSpec) (dataset.Table, error)
- func (e *Engine) PivotWider(ds dataset.Table, spec dataset.PivotWiderSpec) (dataset.Table, error)
- func (e *Engine) Pow(col dataset.AnyColumn, exp float64) (dataset.AnyColumn, error)
- func (e *Engine) Rank(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) ReplaceNA(col dataset.AnyColumn, defaultVal float64) (dataset.AnyColumn, error)
- func (e *Engine) Round(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) RowNumber(n int) (dataset.AnyColumn, error)
- func (e *Engine) Select(col dataset.AnyColumn, indices []int) (dataset.AnyColumn, error)
- func (e *Engine) Separate(ds dataset.Table, col string, into []string, sep string) (dataset.Table, error)
- func (e *Engine) Sigmoid(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Sign(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Sin(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Slice(col dataset.AnyColumn, start, end int) (dataset.AnyColumn, error)
- func (e *Engine) SortIndices(col dataset.AnyColumn) ([]int, error)
- func (e *Engine) Sqrt(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Stack(datasets ...dataset.Table) (dataset.Table, error)
- func (e *Engine) StdDev(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) SubCols(a, b dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Sum(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Tan(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Tanh(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Variance(col dataset.AnyColumn) (dataset.AnyColumn, error)
Constants ¶
This section is empty.
Variables ¶
var (
	// ErrUnsupportedType is returned for unsupported column types.
	ErrUnsupportedType = errors.New("memory: unsupported column type")
	// ErrLengthMismatch is returned when column lengths don't match.
	ErrLengthMismatch = errors.New("memory: column length mismatch")
	// ErrEmptyColumn is returned when an operation requires non-empty data.
	ErrEmptyColumn = errors.New("memory: empty column")
	// ErrRequiresFloat64 is returned when a float64 column is required.
	ErrRequiresFloat64 = errors.New("memory: operation requires float64 column")
	// ErrRequiresInt64 is returned when an int64 column is required.
	ErrRequiresInt64 = errors.New("memory: operation requires int64 column")
	// ErrRequiresNumeric is returned when a numeric column is required.
	ErrRequiresNumeric = errors.New("memory: operation requires numeric column")
	// ErrJoinKeyMismatch is returned when join key types don't match.
	ErrJoinKeyMismatch = errors.New("memory: join key type mismatch")
	// ErrTakeTypeMismatch is returned when a Take/Select result has unexpected type.
	ErrTakeTypeMismatch = errors.New("memory: unexpected result type from Take/Select")
	// ErrOutOfRange is returned when a parameter value is out of the expected range.
	ErrOutOfRange = errors.New("memory: value out of range")
)
Sentinel errors for the memory engine package.
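Sentinel errors like these are usually matched with errors.Is rather than by comparing message strings. The sketch below is self-contained: errLengthMismatch is a local stand-in for ErrLengthMismatch, and addCols is a hypothetical helper, not the engine's AddCols. Whether the engine wraps its sentinels with extra context is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// errLengthMismatch is a local stand-in for memory.ErrLengthMismatch so the
// sketch compiles without the package.
var errLengthMismatch = errors.New("memory: column length mismatch")

// addCols is a hypothetical helper: it adds two slices element-wise and
// wraps the sentinel with context when the lengths differ.
func addCols(a, b []float64) ([]float64, error) {
	if len(a) != len(b) {
		return nil, fmt.Errorf("add %d vs %d rows: %w", len(a), len(b), errLengthMismatch)
	}
	out := make([]float64, len(a))
	for i := range a {
		out[i] = a[i] + b[i]
	}
	return out, nil
}

func main() {
	_, err := addCols([]float64{1, 2}, []float64{1})
	// errors.Is matches the sentinel even through the %w wrapping.
	fmt.Println(errors.Is(err, errLengthMismatch)) // true
}
```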
Functions ¶
This section is empty.
Types ¶
type Engine ¶
type Engine struct {
// contains filtered or unexported fields
}
Engine is the Go-slice compute backend.
func (*Engine) BitShiftLeft ¶
BitShiftLeft shifts each int64 element left by n bits.
func (*Engine) BitShiftRight ¶
BitShiftRight shifts each int64 element right by n bits.
func (*Engine) Boxplot ¶ added in v0.0.5
func (e *Engine) Boxplot(yCol, groupCol dataset.AnyColumn, whisker string, notch bool) (dataset.Table, error)
Boxplot computes the five-number summary for a numeric column.
func (*Engine) Complete ¶
Complete generates all combinations of the specified columns' unique values, filling missing rows with null values.
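To illustrate what "all combinations" means, here is a minimal sketch over two plain string slices. The helper names (unique, complete) are hypothetical, and the output ordering (first-seen order, first column varying slowest) is an assumption rather than documented behavior.

```go
package main

import "fmt"

// unique returns the distinct values of vals in first-seen order.
func unique(vals []string) []string {
	seen := make(map[string]bool, len(vals))
	var out []string
	for _, v := range vals {
		if !seen[v] {
			seen[v] = true
			out = append(out, v)
		}
	}
	return out
}

// complete returns the Cartesian product of the unique values of a and b.
func complete(a, b []string) [][2]string {
	ua, ub := unique(a), unique(b)
	out := make([][2]string, 0, len(ua)*len(ub))
	for _, x := range ua {
		for _, y := range ub {
			out = append(out, [2]string{x, y})
		}
	}
	return out
}

func main() {
	// Observed pairs: (x,1), (x,2), (y,1). The pair (y,2) is missing.
	fmt.Println(complete([]string{"x", "x", "y"}, []string{"1", "2", "1"}))
	// [[x 1] [x 2] [y 1] [y 2]]
}
```

In a full table, the row added for (y, 2) would carry null values in every column that is not part of the combination.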
func (*Engine) Concatenate ¶
func (e *Engine) Concatenate(ds dataset.Table, col string, from []string, sep string) (dataset.Table, error)
Concatenate joins multiple string columns into one with a separator.
func (*Engine) FilterIndices ¶
FilterIndices returns the indices where mask is true.
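The mask-to-indices conversion can be sketched on a plain slice as follows. filterIndices is a standalone illustration, not the method itself.

```go
package main

import "fmt"

// filterIndices returns the positions at which mask is true, in order.
func filterIndices(mask []bool) []int {
	out := make([]int, 0, len(mask))
	for i, keep := range mask {
		if keep {
			out = append(out, i)
		}
	}
	return out
}

func main() {
	fmt.Println(filterIndices([]bool{true, false, true, true})) // [0 2 3]
}
```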
func (*Engine) First ¶ added in v0.0.5
First returns the first element of a column as a single-row column.
func (*Engine) FromColumns ¶
func (e *Engine) FromColumns(schema *dataset.Schema, cols ...dataset.AnyColumn) (dataset.Table, error)
FromColumns constructs a Table from a schema and pre-built columns.
func (*Engine) Join ¶
Join implements the Joiner interface with a hash-join algorithm. It supports Inner, Left, Right, Full, Semi, and Anti joins.
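A hash join works in two phases: build a hash table over one side's keys, then probe it with the other side's keys. The sketch below shows only the inner-join case on string keys. The pair type and innerHashJoin are hypothetical names, and the choice of the right side as the build side is an assumption.

```go
package main

import "fmt"

// pair holds one matched (left row, right row) index pair.
type pair struct{ left, right int }

// innerHashJoin builds a hash table on rightKeys, then probes it with each
// left key and emits every matching row pair.
func innerHashJoin(leftKeys, rightKeys []string) []pair {
	// Build phase: key -> all right row indices with that key.
	index := make(map[string][]int, len(rightKeys))
	for j, k := range rightKeys {
		index[k] = append(index[k], j)
	}
	// Probe phase: look up each left key.
	var out []pair
	for i, k := range leftKeys {
		for _, j := range index[k] {
			out = append(out, pair{i, j})
		}
	}
	return out
}

func main() {
	fmt.Println(innerHashJoin([]string{"a", "b", "c"}, []string{"b", "a", "b"}))
	// [{0 1} {1 0} {1 2}]
}
```

The other join kinds reuse the same build and probe steps and differ in what they emit for matched and unmatched rows.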
func (*Engine) KDE ¶ added in v0.0.5
func (e *Engine) KDE(ctx context.Context, col dataset.AnyColumn, bandwidth float64, points int) (dataset.Table, error)
KDE computes kernel density estimation over a numeric column.
func (*Engine) Last ¶ added in v0.0.5
Last returns the last element of a column as a single-row column.
func (*Engine) LinearFitSE ¶ added in v0.0.5
LinearFitSE computes OLS regression with 95% confidence bands.
func (*Engine) LoessFit ¶ added in v0.0.5
func (e *Engine) LoessFit(ctx context.Context, xCol, yCol dataset.AnyColumn, nOut int) (dataset.Table, error)
LoessFit computes locally weighted regression (LOESS).
func (*Engine) LoessFitSE ¶ added in v0.0.5
func (e *Engine) LoessFitSE(ctx context.Context, xCol, yCol dataset.AnyColumn, nOut int) (dataset.Table, error)
LoessFitSE computes LOESS with approximate 95% confidence bands. It uses the local residual variance to estimate the standard error (SE) at each grid point.
func (*Engine) Mode ¶ added in v0.0.5
Mode returns the most frequent value as a single-row column. For ties, the first value encountered wins.
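One way to get this tie-breaking is to count occurrences first and then scan the values in their original order. The sketch below uses int64 values and reads "first value encountered" as the tied value that appears earliest in the column, which is an interpretation rather than a confirmed detail.

```go
package main

import "fmt"

// mode returns the most frequent value; among ties, the value that appears
// earliest in vals wins. The bool is false for empty input.
func mode(vals []int64) (int64, bool) {
	if len(vals) == 0 {
		return 0, false
	}
	counts := make(map[int64]int, len(vals))
	for _, v := range vals {
		counts[v]++
	}
	best := vals[0]
	for _, v := range vals {
		// Strict comparison keeps the earlier value on ties.
		if counts[v] > counts[best] {
			best = v
		}
	}
	return best, true
}

func main() {
	fmt.Println(mode([]int64{3, 1, 1, 3, 2})) // 3 true
}
```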
func (*Engine) NewBoolColumn ¶
NewBoolColumn creates a bool column from the given slice.
func (*Engine) NewBuilder ¶
NewBuilder creates a typed row-appender for the given schema.
func (*Engine) NewDateColumn ¶ added in v0.0.6
NewDateColumn creates a date-only column from int64 days since the Unix epoch (1970-01-01). Compatible with Arrow DATE32.
func (*Engine) NewDateFromString ¶ added in v0.0.6
NewDateFromString creates a date column by parsing date strings using rickb777/date. Supports ISO 8601 ("2006-01-02", "20060102"). Returns an error if any value fails to parse.
func (*Engine) NewDateFromTime ¶ added in v0.0.6
NewDateFromTime creates a date column from Go time.Time values. Each value is truncated to midnight UTC and stored as days since epoch.
func (*Engine) NewFloat64Column ¶
NewFloat64Column creates a float64 column from the given slice.
func (*Engine) NewInt64Column ¶
NewInt64Column creates an int64 column from the given slice.
func (*Engine) NewStringColumn ¶
NewStringColumn creates a string column from the given slice.
func (*Engine) NewTimeColumn ¶ added in v0.0.6
NewTimeColumn creates a time-of-day column from int64 nanoseconds since midnight (00:00:00.000000000). Compatible with Arrow TIME64(ns).
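To show what the int64 values look like, the sketch below encodes a wall-clock time of day as nanoseconds since midnight. nanosSinceMidnight is a hypothetical helper for preparing input data, not part of the package.

```go
package main

import (
	"fmt"
	"time"
)

// nanosSinceMidnight encodes a time of day as nanoseconds since 00:00:00.
func nanosSinceMidnight(hour, minute, second, nano int) int64 {
	d := time.Duration(hour)*time.Hour +
		time.Duration(minute)*time.Minute +
		time.Duration(second)*time.Second +
		time.Duration(nano)
	return int64(d)
}

func main() {
	fmt.Println(nanosSinceMidnight(13, 45, 30, 0)) // 49530000000000
}
```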
func (*Engine) NewTimestampColumn ¶
NewTimestampColumn creates a timestamp column (int64-backed) from the given slice.
func (*Engine) NewTimestampFromString ¶ added in v0.0.6
NewTimestampFromString creates a timestamp column by parsing date/time strings. Tries RFC3339 first, then ISO 8601 date (via rickb777/date), then common layouts. Returns an error if any value fails to parse.
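The "try one format, then fall back to the next" strategy can be sketched with time.Parse alone. This is an approximation under stated assumptions: the layouts after RFC3339 are illustrative guesses, the date step uses a standard-library layout in place of rickb777/date, and parseTimestamp is a hypothetical name.

```go
package main

import (
	"fmt"
	"time"
)

// layouts are tried in order. Only RFC3339-first comes from the description
// above; the remaining entries are assumptions for illustration.
var layouts = []string{time.RFC3339, "2006-01-02", "2006-01-02 15:04:05"}

// parseTimestamp returns nanoseconds since the Unix epoch for the first
// layout that parses s, or an error if none does.
func parseTimestamp(s string) (int64, error) {
	for _, layout := range layouts {
		if t, err := time.Parse(layout, s); err == nil {
			return t.UnixNano(), nil
		}
	}
	return 0, fmt.Errorf("cannot parse %q as a timestamp", s)
}

func main() {
	for _, s := range []string{"1970-01-01T00:00:01Z", "1970-01-02", "not a date"} {
		ns, err := parseTimestamp(s)
		fmt.Println(ns, err)
	}
}
```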
func (*Engine) NewTimestampFromTime ¶ added in v0.0.6
NewTimestampFromTime creates a timestamp column from Go time.Time values. Each value is converted to UnixNano (int64 nanoseconds since epoch).
func (*Engine) PercentRank ¶
PercentRank returns (rank - 1) / (n - 1) as float64, where n is the number of elements. Returns 0 for a single-element column.
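A minimal sketch of the formula on a plain float64 slice follows. It assumes the rank in the formula is the competition rank (one plus the number of strictly smaller values), and it uses a simple quadratic loop for clarity rather than speed.

```go
package main

import "fmt"

// percentRank returns (rank-1)/(n-1) for each element, where rank is one
// plus the number of strictly smaller values. Inputs of length 0 or 1 yield
// all zeros.
func percentRank(vals []float64) []float64 {
	n := len(vals)
	out := make([]float64, n)
	if n <= 1 {
		return out
	}
	for i, v := range vals {
		rank := 1
		for _, w := range vals {
			if w < v {
				rank++
			}
		}
		out[i] = float64(rank-1) / float64(n-1)
	}
	return out
}

func main() {
	// Ranks are [1 2 2 4], so the results are 0, 1/3, 1/3, and 1.
	fmt.Println(percentRank([]float64{10, 20, 20, 30}))
}
```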
func (*Engine) Percentile ¶ added in v0.0.5
Percentile returns the p-th quantile as a single-row float64 column. p ∈ [0,1]. Uses sort-based linear interpolation (R-7 method).
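The R-7 method places the quantile at fractional position h = (n - 1) * p in the sorted data and interpolates linearly between the two neighbouring values. A minimal sketch, with percentileR7 as a hypothetical standalone function and generic error values in place of the package's sentinels:

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentileR7 returns the p-th quantile of vals using linear interpolation
// between the order statistics around position (n-1)*p.
func percentileR7(vals []float64, p float64) (float64, error) {
	if len(vals) == 0 {
		return 0, fmt.Errorf("empty input")
	}
	if p < 0 || p > 1 {
		return 0, fmt.Errorf("p=%v is outside [0,1]", p)
	}
	sorted := append([]float64(nil), vals...) // copy so the input is not reordered
	sort.Float64s(sorted)
	h := float64(len(sorted)-1) * p
	lo := int(math.Floor(h))
	hi := int(math.Ceil(h))
	return sorted[lo] + (h-float64(lo))*(sorted[hi]-sorted[lo]), nil
}

func main() {
	q, _ := percentileR7([]float64{4, 1, 3, 2}, 0.5)
	fmt.Println(q) // 2.5
}
```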
func (*Engine) PivotLonger ¶
PivotLonger reshapes a wide dataset to long format. Columns listed in spec.Cols are "gathered" into two new columns: spec.NamesTo (holds original column names) and spec.ValuesTo (holds values). All other columns are repeated for each gathered column.
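The gathering step can be illustrated on a toy column store. Everything here is hypothetical scaffolding: the table type, the all-string values, the explicit nRows argument, and the row-by-row output order are assumptions made to keep the sketch short.

```go
package main

import "fmt"

// table is a toy column store: column name -> values, all the same length.
type table map[string][]string

// pivotLonger gathers cols into a (namesTo, valuesTo) column pair and
// repeats each idCols value once per gathered column.
func pivotLonger(t table, idCols, cols []string, namesTo, valuesTo string, nRows int) table {
	out := table{}
	for row := 0; row < nRows; row++ {
		for _, c := range cols {
			for _, id := range idCols {
				out[id] = append(out[id], t[id][row])
			}
			out[namesTo] = append(out[namesTo], c)      // original column name
			out[valuesTo] = append(out[valuesTo], t[c][row]) // its value in this row
		}
	}
	return out
}

func main() {
	wide := table{
		"id": {"a", "b"},
		"x":  {"1", "2"},
		"y":  {"3", "4"},
	}
	long := pivotLonger(wide, []string{"id"}, []string{"x", "y"}, "name", "value", 2)
	fmt.Println(long["id"], long["name"], long["value"])
	// [a a b b] [x y x y] [1 3 2 4]
}
```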
func (*Engine) PivotWider ¶
PivotWider reshapes a long dataset to wide format. spec.NamesFrom identifies the column whose unique values become new column names. spec.ValuesFrom identifies the column whose values fill the new columns. All other columns are the "id" columns that define unique rows.
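The spreading step can be sketched for a single id column. The function below is hypothetical and makes several assumptions: all three input slices have equal length, values are strings, missing cells stay empty, and a duplicate (id, name) pair keeps the last value seen.

```go
package main

import "fmt"

// pivotWider turns (id, name, value) triples into one row per unique id and
// one column per unique name, both in first-seen order.
func pivotWider(ids, names, values []string) (rowIDs, colNames []string, wide map[string][]string) {
	rowOf := map[string]int{}
	seenCol := map[string]bool{}
	for i := range ids {
		if _, ok := rowOf[ids[i]]; !ok {
			rowOf[ids[i]] = len(rowIDs)
			rowIDs = append(rowIDs, ids[i])
		}
		if !seenCol[names[i]] {
			seenCol[names[i]] = true
			colNames = append(colNames, names[i])
		}
	}
	wide = map[string][]string{}
	for _, c := range colNames {
		wide[c] = make([]string, len(rowIDs))
	}
	for i := range ids {
		wide[names[i]][rowOf[ids[i]]] = values[i]
	}
	return rowIDs, colNames, wide
}

func main() {
	rows, cols, wide := pivotWider(
		[]string{"a", "a", "b", "b"},
		[]string{"x", "y", "x", "y"},
		[]string{"1", "3", "2", "4"},
	)
	fmt.Println(rows, cols, wide["x"], wide["y"])
	// [a b] [x y] [1 2] [3 4]
}
```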
func (*Engine) Rank ¶
Rank returns the competition rank (1-indexed). Tied values share the same rank, and the ranks that follow skip ahead by the number of ties. E.g. [10,20,20,30] → [1,2,2,4].
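Competition ranking can be sketched by sorting the indices and reusing the previous rank whenever two neighbouring values are equal. competitionRank is a standalone illustration on a float64 slice, not the engine's code.

```go
package main

import (
	"fmt"
	"sort"
)

// competitionRank returns 1-indexed ranks in which tied values share a rank
// and the next distinct value takes its sorted position plus one.
func competitionRank(vals []float64) []int {
	idx := make([]int, len(vals))
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool { return vals[idx[a]] < vals[idx[b]] })

	out := make([]int, len(vals))
	for pos, i := range idx {
		if pos > 0 && vals[i] == vals[idx[pos-1]] {
			out[i] = out[idx[pos-1]] // tie: reuse the previous rank
		} else {
			out[i] = pos + 1
		}
	}
	return out
}

func main() {
	fmt.Println(competitionRank([]float64{10, 20, 20, 30})) // [1 2 2 4]
}
```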
func (*Engine) ReplaceNA ¶
ReplaceNA replaces null (NaN) values in a float64 column with defaultVal.
func (*Engine) Separate ¶
func (e *Engine) Separate(ds dataset.Table, col string, into []string, sep string) (dataset.Table, error)
Separate splits a string column by a delimiter into multiple columns.
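The split can be sketched with strings.Split on a plain slice. How the engine handles rows with too few or too many pieces is not stated, so this sketch assumes missing pieces become empty strings and extra pieces are dropped. The separate function is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// separate splits each value by sep and distributes the pieces across the
// named output columns. Missing pieces stay ""; extra pieces are dropped.
func separate(vals []string, into []string, sep string) map[string][]string {
	out := make(map[string][]string, len(into))
	for _, name := range into {
		out[name] = make([]string, len(vals))
	}
	for row, v := range vals {
		parts := strings.Split(v, sep)
		for k, name := range into {
			if k < len(parts) {
				out[name][row] = parts[k]
			}
		}
	}
	return out
}

func main() {
	out := separate([]string{"2024-01", "2023-12", "2022"}, []string{"year", "month"}, "-")
	fmt.Printf("%q %q\n", out["year"], out["month"])
	// ["2024" "2023" "2022"] ["01" "12" ""]
}
```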
func (*Engine) SortIndices ¶
SortIndices returns the permutation that sorts the column ascending.
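The returned permutation is what is often called an argsort: element k of the result is the index of the k-th smallest value. A minimal sketch on a float64 slice follows. The use of a stable sort, so that ties keep their original order, is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// sortIndices returns the indices that would sort vals in ascending order.
func sortIndices(vals []float64) []int {
	idx := make([]int, len(vals))
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool { return vals[idx[a]] < vals[idx[b]] })
	return idx
}

func main() {
	fmt.Println(sortIndices([]float64{30, 10, 20})) // [1 2 0]
}
```

Passing the result to Select reorders a column, and applying the same indices to every column sorts a whole table by one key.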
func (*Engine) StdDev ¶ added in v0.0.5
StdDev returns the sample standard deviation as a single-row float64 column.
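"Sample" here means the sum of squared deviations is divided by n - 1 rather than n. A minimal sketch on a plain slice follows. sampleStdDev is a hypothetical name, and returning ok=false for fewer than two values is an assumption about edge-case handling.

```go
package main

import (
	"fmt"
	"math"
)

// sampleStdDev returns the square root of the sum of squared deviations
// from the mean divided by n-1. It reports false when n < 2.
func sampleStdDev(vals []float64) (float64, bool) {
	n := len(vals)
	if n < 2 {
		return 0, false
	}
	mean := 0.0
	for _, v := range vals {
		mean += v
	}
	mean /= float64(n)

	ss := 0.0
	for _, v := range vals {
		d := v - mean
		ss += d * d
	}
	return math.Sqrt(ss / float64(n-1)), true
}

func main() {
	// mean = 3, squared deviations sum to 8, 8/(3-1) = 4, sqrt(4) = 2
	fmt.Println(sampleStdDev([]float64{1, 3, 5})) // 2 true
}
```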