Documentation ¶
Overview ¶
Package memory provides a lightweight Go-slice-backed compute engine for the dataset package. It implements dataset.ColumnFactory, dataset.BuilderFactory, dataset.Aggregator, and dataset.Caster.
Usage:
	eng := memory.NewEngine(context.Background())
	f := eng.(dataset.ColumnFactory)
	ds, _ := f.FromColumns(
		dataset.NewSchema(dataset.FloatCol("x"), dataset.StringCol("label")),
		f.NewFloat64Column("x", []float64{1, 2, 3}),
		f.NewStringColumn("label", []string{"a", "b", "c"}),
	)
Index ¶
- Variables
- type Engine
- func (e *Engine) Abs(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Acos(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) AddCols(a, b dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) AddScalar(col dataset.AnyColumn, val float64) (dataset.AnyColumn, error)
- func (e *Engine) Asin(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Atan(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Atan2(y, x dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) BitAnd(a, b dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) BitNot(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) BitOr(a, b dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) BitShiftLeft(col dataset.AnyColumn, n int) (dataset.AnyColumn, error)
- func (e *Engine) BitShiftRight(col dataset.AnyColumn, n int) (dataset.AnyColumn, error)
- func (e *Engine) BitXor(a, b dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Boxplot(yCol, groupCol dataset.AnyColumn, whisker string, notch bool) (dataset.Table, error)
- func (e *Engine) Cast(col dataset.AnyColumn, target dataset.DType) (dataset.AnyColumn, error)
- func (e *Engine) Ceil(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Combine(datasets ...dataset.Table) (dataset.Table, error)
- func (e *Engine) Complete(ds dataset.Table, cols ...string) (dataset.Table, error)
- func (e *Engine) Concatenate(ds dataset.Table, col string, from []string, sep string) (dataset.Table, error)
- func (e *Engine) Context() context.Context
- func (e *Engine) Cos(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Count(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) CumMax(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) CumMin(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) CumSum(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) DenseRank(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) DivCols(a, b dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) DropNA(ds dataset.Table, cols ...string) (dataset.Table, error)
- func (e *Engine) Erf(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Exp(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Fill(col dataset.AnyColumn, dir dataset.FillDirection) (dataset.AnyColumn, error)
- func (e *Engine) Filter(ds dataset.Table, mask dataset.Masker) (dataset.Table, error)
- func (e *Engine) FilterIndices(mask []bool) []int
- func (e *Engine) First(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Floor(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) FromColumns(schema *dataset.Schema, cols ...dataset.AnyColumn) (dataset.Table, error)
- func (e *Engine) Histogram(col dataset.AnyColumn, nBins int) (dataset.Table, error)
- func (e *Engine) Join(left, right dataset.Table, spec dataset.JoinSpec) (dataset.Table, error)
- func (e *Engine) KDE(ctx context.Context, col dataset.AnyColumn, bandwidth float64, points int) (dataset.Table, error)
- func (e *Engine) Lag(col dataset.AnyColumn, n int) (dataset.AnyColumn, error)
- func (e *Engine) Last(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Lead(col dataset.AnyColumn, n int) (dataset.AnyColumn, error)
- func (e *Engine) LinearFit(xCol, yCol dataset.AnyColumn, nOut int) (dataset.Table, error)
- func (e *Engine) LinearFitSE(xCol, yCol dataset.AnyColumn, nOut int) (dataset.Table, error)
- func (e *Engine) Ln(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) LoessFit(ctx context.Context, xCol, yCol dataset.AnyColumn, nOut int) (dataset.Table, error)
- func (e *Engine) LoessFitSE(ctx context.Context, xCol, yCol dataset.AnyColumn, nOut int) (dataset.Table, error)
- func (e *Engine) Log2(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Log10(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Mean(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Median(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) MinMax(col dataset.AnyColumn) (dataset.AnyColumn, dataset.AnyColumn, error)
- func (e *Engine) Mode(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) MulCols(a, b dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) MulScalar(col dataset.AnyColumn, val float64) (dataset.AnyColumn, error)
- func (e *Engine) Name() string
- func (e *Engine) Neg(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) NewBoolColumn(name string, data []bool) dataset.AnyColumn
- func (e *Engine) NewBuilder(schema *dataset.Schema) dataset.Builder
- func (e *Engine) NewFloat64Column(name string, data []float64) dataset.AnyColumn
- func (e *Engine) NewInt64Column(name string, data []int64) dataset.AnyColumn
- func (e *Engine) NewStringColumn(name string, data []string) dataset.AnyColumn
- func (e *Engine) NewTimestampColumn(name string, data []int64) dataset.AnyColumn
- func (e *Engine) PercentRank(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Percentile(col dataset.AnyColumn, p float64) (dataset.AnyColumn, error)
- func (e *Engine) PivotLonger(ds dataset.Table, spec dataset.PivotLongerSpec) (dataset.Table, error)
- func (e *Engine) PivotWider(ds dataset.Table, spec dataset.PivotWiderSpec) (dataset.Table, error)
- func (e *Engine) Pow(col dataset.AnyColumn, exp float64) (dataset.AnyColumn, error)
- func (e *Engine) Rank(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) ReadCSV(_ context.Context, r io.Reader, cfg dataset.CSVConfig) (dataset.Table, error)
- func (e *Engine) ReadParquet(_ context.Context, r io.ReaderAt, size int64, _ dataset.ParquetConfig) (dataset.Table, error)
- func (e *Engine) ReplaceNA(col dataset.AnyColumn, defaultVal float64) (dataset.AnyColumn, error)
- func (e *Engine) Round(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) RowNumber(n int) (dataset.AnyColumn, error)
- func (e *Engine) Select(col dataset.AnyColumn, indices []int) (dataset.AnyColumn, error)
- func (e *Engine) Separate(ds dataset.Table, col string, into []string, sep string) (dataset.Table, error)
- func (e *Engine) Sigmoid(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Sign(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Sin(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Slice(col dataset.AnyColumn, start, end int) (dataset.AnyColumn, error)
- func (e *Engine) SortIndices(col dataset.AnyColumn) ([]int, error)
- func (e *Engine) Sqrt(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Stack(datasets ...dataset.Table) (dataset.Table, error)
- func (e *Engine) StdDev(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) SubCols(a, b dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Sum(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Tan(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Tanh(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) Variance(col dataset.AnyColumn) (dataset.AnyColumn, error)
- func (e *Engine) WriteCSV(_ context.Context, w io.Writer, ds dataset.Table, cfg dataset.CSVConfig) error
- func (e *Engine) WriteParquet(_ context.Context, w io.Writer, ds dataset.Table, _ dataset.ParquetConfig) error
Constants ¶
This section is empty.
Variables ¶
var (
	// ErrUnsupportedType is returned for unsupported column types.
	ErrUnsupportedType = errors.New("memory: unsupported column type")
	// ErrLengthMismatch is returned when column lengths don't match.
	ErrLengthMismatch = errors.New("memory: column length mismatch")
	// ErrEmptyColumn is returned when an operation requires non-empty data.
	ErrEmptyColumn = errors.New("memory: empty column")
	// ErrRequiresFloat64 is returned when a float64 column is required.
	ErrRequiresFloat64 = errors.New("memory: operation requires float64 column")
	// ErrRequiresInt64 is returned when an int64 column is required.
	ErrRequiresInt64 = errors.New("memory: operation requires int64 column")
	// ErrRequiresNumeric is returned when a numeric column is required.
	ErrRequiresNumeric = errors.New("memory: operation requires numeric column")
	// ErrJoinKeyMismatch is returned when join key types don't match.
	ErrJoinKeyMismatch = errors.New("memory: join key type mismatch")
	// ErrTakeTypeMismatch is returned when a Take/Select result has unexpected type.
	ErrTakeTypeMismatch = errors.New("memory: unexpected result type from Take/Select")
	// ErrOutOfRange is returned when a parameter value is out of the expected range.
	ErrOutOfRange = errors.New("memory: value out of range")
)
Sentinel errors for the memory engine package.
Functions ¶
This section is empty.
Types ¶
type Engine ¶
type Engine struct {
	// contains filtered or unexported fields
}
Engine is the Go-slice compute backend.
func (*Engine) BitShiftLeft ¶
BitShiftLeft shifts each int64 element left by n bits.
func (*Engine) BitShiftRight ¶
BitShiftRight shifts each int64 element right by n bits.
func (*Engine) Boxplot ¶ added in v0.0.5
func (e *Engine) Boxplot(yCol, groupCol dataset.AnyColumn, whisker string, notch bool) (dataset.Table, error)
Boxplot computes the five-number summary for a numeric column.
func (*Engine) Complete ¶
Complete generates all combinations of the specified columns' unique values, filling missing rows with null values.
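The combination step can be illustrated on plain slices. This is a minimal sketch for two string columns, not the engine's implementation: unique and complete are hypothetical helpers, and the first-appearance ordering of the output is an assumption.

```go
package main

import "fmt"

// unique returns the distinct values of vals in first-appearance order.
func unique(vals []string) []string {
	seen := make(map[string]bool, len(vals))
	var out []string
	for _, v := range vals {
		if !seen[v] {
			seen[v] = true
			out = append(out, v)
		}
	}
	return out
}

// complete returns the Cartesian product of the unique values of a and b.
func complete(a, b []string) [][2]string {
	ua, ub := unique(a), unique(b)
	out := make([][2]string, 0, len(ua)*len(ub))
	for _, x := range ua {
		for _, y := range ub {
			out = append(out, [2]string{x, y})
		}
	}
	return out
}

func main() {
	group := []string{"x", "x", "y"}
	level := []string{"1", "2", "1"}
	fmt.Println(complete(group, level)) // [[x 1] [x 2] [y 1] [y 2]]
}
```

The pair (y, 2) does not occur in the input rows; in a completed table that row would carry null values in every other column.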
func (*Engine) Concatenate ¶
func (e *Engine) Concatenate(ds dataset.Table, col string, from []string, sep string) (dataset.Table, error)
Concatenate joins multiple string columns into one with a separator.
func (*Engine) FilterIndices ¶
FilterIndices returns the indices where mask is true.
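A standalone sketch of the same idea on a plain slice; filterIndices is a hypothetical free function written for illustration, not the method itself.

```go
package main

import "fmt"

// filterIndices returns the positions at which mask is true, in ascending order.
func filterIndices(mask []bool) []int {
	out := make([]int, 0, len(mask))
	for i, keep := range mask {
		if keep {
			out = append(out, i)
		}
	}
	return out
}

func main() {
	fmt.Println(filterIndices([]bool{true, false, true, true})) // [0 2 3]
}
```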
func (*Engine) First ¶ added in v0.0.5
First returns the first element of a column as a single-row column.
func (*Engine) FromColumns ¶
func (e *Engine) FromColumns(schema *dataset.Schema, cols ...dataset.AnyColumn) (dataset.Table, error)
FromColumns constructs a Table from a schema and pre-built columns.
func (*Engine) Join ¶
Join implements the Joiner interface with a hash-join algorithm. It supports Inner, Left, Right, Full, Semi, and Anti joins.
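A hash join builds a lookup table on one input and probes it with the other. The sketch below shows only the inner-join case on string keys and returns matching row positions; innerHashJoin and pair are hypothetical names, and which side the engine builds its table on is an assumption.

```go
package main

import "fmt"

// pair holds matching row positions from the left and right inputs.
type pair struct{ left, right int }

// innerHashJoin indexes the right keys in a map, then probes the map with
// each left key and emits one pair per match.
func innerHashJoin(leftKeys, rightKeys []string) []pair {
	index := make(map[string][]int, len(rightKeys))
	for j, k := range rightKeys {
		index[k] = append(index[k], j)
	}
	var out []pair
	for i, k := range leftKeys {
		for _, j := range index[k] {
			out = append(out, pair{left: i, right: j})
		}
	}
	return out
}

func main() {
	left := []string{"a", "b", "c"}
	right := []string{"b", "c", "c", "d"}
	fmt.Println(innerHashJoin(left, right)) // [{1 0} {2 1} {2 2}]
}
```

The other join kinds differ mainly in what is emitted for unmatched keys: a left join would also keep row 0 ("a") with nulls on the right, and an anti join would keep only that row.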
func (*Engine) KDE ¶ added in v0.0.5
func (e *Engine) KDE(ctx context.Context, col dataset.AnyColumn, bandwidth float64, points int) (dataset.Table, error)
KDE computes kernel density estimation over a numeric column.
func (*Engine) Last ¶ added in v0.0.5
Last returns the last element of a column as a single-row column.
func (*Engine) LinearFitSE ¶ added in v0.0.5
LinearFitSE computes OLS regression with 95% confidence bands.
func (*Engine) LoessFit ¶ added in v0.0.5
func (e *Engine) LoessFit(ctx context.Context, xCol, yCol dataset.AnyColumn, nOut int) (dataset.Table, error)
LoessFit computes locally weighted regression (LOESS).
func (*Engine) LoessFitSE ¶ added in v0.0.5
func (e *Engine) LoessFitSE(ctx context.Context, xCol, yCol dataset.AnyColumn, nOut int) (dataset.Table, error)
LoessFitSE computes LOESS with approximate 95% confidence bands. It uses the local residual variance to estimate the standard error (SE) at each grid point.
func (*Engine) Mode ¶ added in v0.0.5
Mode returns the most frequent value as a single-row column. For ties, the first value encountered wins.
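One way to get this tie-breaking behaviour is to count first and then scan the values in input order. This is a sketch on a string slice under the assumption that "first encountered" means first in input order; mode here is a hypothetical free function.

```go
package main

import "fmt"

// mode returns the most frequent value; ties go to the value that appears
// first in the input. ok is false for empty input.
func mode(vals []string) (value string, ok bool) {
	if len(vals) == 0 {
		return "", false
	}
	counts := make(map[string]int, len(vals))
	for _, v := range vals {
		counts[v]++
	}
	best, bestCount := vals[0], 0
	// Scan in input order; the strict > keeps the earlier value on ties.
	for _, v := range vals {
		if counts[v] > bestCount {
			best, bestCount = v, counts[v]
		}
	}
	return best, true
}

func main() {
	v, _ := mode([]string{"b", "a", "b", "a", "c"})
	fmt.Println(v) // b
}
```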
func (*Engine) NewBoolColumn ¶
NewBoolColumn creates a bool column from the given slice.
func (*Engine) NewBuilder ¶
NewBuilder creates a typed row-appender for the given schema.
func (*Engine) NewFloat64Column ¶
NewFloat64Column creates a float64 column from the given slice.
func (*Engine) NewInt64Column ¶
NewInt64Column creates an int64 column from the given slice.
func (*Engine) NewStringColumn ¶
NewStringColumn creates a string column from the given slice.
func (*Engine) NewTimestampColumn ¶
NewTimestampColumn creates a timestamp column (int64-backed) from the given slice.
func (*Engine) PercentRank ¶
PercentRank returns (rank - 1) / (n - 1) as float64, where n is the number of elements. It returns 0 for a single-element column.
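The formula can be worked through on a plain slice. The sketch assumes rank is the 1-indexed competition rank (so rank - 1 is the number of strictly smaller values) and ignores NaN handling; percentRank is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
)

// percentRank returns (rank-1)/(n-1) for each value, where rank-1 is the
// number of values strictly smaller than it. A single element gets 0.
func percentRank(vals []float64) []float64 {
	n := len(vals)
	out := make([]float64, n)
	if n <= 1 {
		return out
	}
	sorted := append([]float64(nil), vals...)
	sort.Float64s(sorted)
	for i, v := range vals {
		// SearchFloat64s gives the first position holding a value >= v,
		// which equals the count of strictly smaller values.
		below := sort.SearchFloat64s(sorted, v)
		out[i] = float64(below) / float64(n-1)
	}
	return out
}

func main() {
	// Smallest value maps to 0, largest to 1, and the tied 20s share 1/3.
	fmt.Println(percentRank([]float64{10, 20, 20, 30}))
}
```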
func (*Engine) Percentile ¶ added in v0.0.5
Percentile returns the p-th quantile as a single-row float64 column, where p ∈ [0,1]. It uses sort-based linear interpolation (the R-7 method).
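In the R-7 method the quantile sits at fractional position h = p·(n-1) in the sorted data and is interpolated linearly between the two neighbouring values. A minimal sketch on a plain slice follows; percentileR7 is a hypothetical helper and its error values are placeholders, not the package's sentinels.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"sort"
)

// percentileR7 returns the p-th quantile of vals using R-7 interpolation.
func percentileR7(vals []float64, p float64) (float64, error) {
	if len(vals) == 0 {
		return 0, errors.New("empty input")
	}
	if math.IsNaN(p) || p < 0 || p > 1 {
		return 0, errors.New("p must be in [0,1]")
	}
	sorted := append([]float64(nil), vals...)
	sort.Float64s(sorted)

	h := p * float64(len(sorted)-1) // fractional position in sorted data
	lo := int(math.Floor(h))
	hi := int(math.Ceil(h))
	frac := h - float64(lo)
	return sorted[lo] + frac*(sorted[hi]-sorted[lo]), nil
}

func main() {
	q, _ := percentileR7([]float64{4, 1, 3, 2}, 0.5)
	fmt.Println(q) // 2.5
}
```

With p = 0 and p = 1 the position lands exactly on the first and last sorted values, so the result is the minimum and maximum respectively.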
func (*Engine) PivotLonger ¶
PivotLonger reshapes a wide dataset to long format. Columns listed in spec.Cols are "gathered" into two new columns: spec.NamesTo (holds original column names) and spec.ValuesTo (holds values). All other columns are repeated for each gathered column.
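The gathering step can be sketched with one id column and a map of float64 columns. This is an illustration only: longRow and pivotLonger are hypothetical, the real method works on dataset.Table with a PivotLongerSpec, and the row order of the engine's output is an assumption.

```go
package main

import "fmt"

// longRow is one record of the long output: the repeated id, the name of
// the gathered column (NamesTo), and its value (ValuesTo).
type longRow struct {
	id    string
	name  string
	value float64
}

// pivotLonger gathers the columns listed in cols into name/value pairs,
// repeating the id once per gathered column.
func pivotLonger(ids []string, wide map[string][]float64, cols []string) []longRow {
	out := make([]longRow, 0, len(ids)*len(cols))
	for i, id := range ids {
		for _, c := range cols {
			out = append(out, longRow{id: id, name: c, value: wide[c][i]})
		}
	}
	return out
}

func main() {
	ids := []string{"a", "b"}
	wide := map[string][]float64{"x": {1, 2}, "y": {3, 4}}
	fmt.Println(pivotLonger(ids, wide, []string{"x", "y"}))
	// [{a x 1} {a y 3} {b x 2} {b y 4}]
}
```

Two input rows and two gathered columns yield four output rows.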
func (*Engine) PivotWider ¶
PivotWider reshapes a long dataset to wide format. spec.NamesFrom identifies the column whose unique values become new column names. spec.ValuesFrom identifies the column whose values fill the new columns. All other columns are the "id" columns that define unique rows.
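The spreading step can be sketched on name/value records with a single id column. Everything here is hypothetical illustration rather than the engine's code: longRow and pivotWider are made-up names, and representing a missing cell as NaN is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// longRow is one record of the long input.
type longRow struct {
	id    string
	name  string
	value float64
}

// pivotWider creates one column per distinct name. ids and names keep
// first-appearance order; cells with no source row are NaN.
func pivotWider(rows []longRow) (ids, names []string, wide map[string][]float64) {
	idPos := map[string]int{}
	seenName := map[string]bool{}
	for _, r := range rows {
		if _, ok := idPos[r.id]; !ok {
			idPos[r.id] = len(ids)
			ids = append(ids, r.id)
		}
		if !seenName[r.name] {
			seenName[r.name] = true
			names = append(names, r.name)
		}
	}
	wide = make(map[string][]float64, len(names))
	for _, n := range names {
		col := make([]float64, len(ids))
		for i := range col {
			col[i] = math.NaN()
		}
		wide[n] = col
	}
	for _, r := range rows {
		wide[r.name][idPos[r.id]] = r.value
	}
	return ids, names, wide
}

func main() {
	rows := []longRow{{"a", "x", 1}, {"a", "y", 3}, {"b", "x", 2}}
	ids, names, wide := pivotWider(rows)
	fmt.Println(ids, names, wide["x"], wide["y"])
}
```

Row "b" has no "y" record, so its cell in the y column stays NaN.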
func (*Engine) Rank ¶
Rank returns the competition rank (1-indexed). Tied values share the same rank, and the ranks that follow are skipped accordingly. E.g. [10,20,20,30] → [1,2,2,4].
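Competition rank is one plus the number of strictly smaller values. A standalone sketch on a float64 slice follows; competitionRank is a hypothetical helper, and NaN handling is left out.

```go
package main

import (
	"fmt"
	"sort"
)

// competitionRank returns the 1-indexed rank of each value: one plus the
// count of values strictly smaller than it.
func competitionRank(vals []float64) []int {
	sorted := append([]float64(nil), vals...)
	sort.Float64s(sorted)
	out := make([]int, len(vals))
	for i, v := range vals {
		// The first sorted position holding a value >= v equals the
		// number of strictly smaller values.
		out[i] = sort.SearchFloat64s(sorted, v) + 1
	}
	return out
}

func main() {
	fmt.Println(competitionRank([]float64{10, 20, 20, 30})) // [1 2 2 4]
}
```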
func (*Engine) ReadCSV ¶
func (e *Engine) ReadCSV(_ context.Context, r io.Reader, cfg dataset.CSVConfig) (dataset.Table, error)
ReadCSV reads CSV data using go-simdcsv with schema inference.
func (*Engine) ReadParquet ¶
func (e *Engine) ReadParquet(_ context.Context, r io.ReaderAt, size int64, _ dataset.ParquetConfig) (dataset.Table, error)
ReadParquet reads Parquet data using parquet-go (row-based reader).
func (*Engine) ReplaceNA ¶
ReplaceNA replaces null (NaN) values in a float64 column with defaultVal.
func (*Engine) Separate ¶
func (e *Engine) Separate(ds dataset.Table, col string, into []string, sep string) (dataset.Table, error)
Separate splits a string column by a delimiter into multiple columns.
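A sketch of the split on a string slice using strings.SplitN. How the engine treats values with too many or too few pieces is not stated here; this sketch assumes extra separators stay in the last piece and missing pieces become empty strings. separate is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// separate splits each value on sep into exactly n output columns.
// Extra separators stay in the last piece; missing pieces become "".
func separate(vals []string, sep string, n int) [][]string {
	cols := make([][]string, n)
	for c := range cols {
		cols[c] = make([]string, len(vals))
	}
	for i, v := range vals {
		// SplitN returns at most n substrings.
		for c, part := range strings.SplitN(v, sep, n) {
			cols[c][i] = part
		}
	}
	return cols
}

func main() {
	cols := separate([]string{"a-1", "b-2-x", "c"}, "-", 2)
	fmt.Printf("%q\n", cols[0]) // ["a" "b" "c"]
	fmt.Printf("%q\n", cols[1]) // ["1" "2-x" ""]
}
```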
func (*Engine) SortIndices ¶
SortIndices returns the permutation that sorts the column ascending.
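The permutation can be built by sorting a slice of positions instead of the values themselves. A sketch on a float64 slice follows; sortIndices is a hypothetical free function, and keeping ties in their original order (a stable sort) is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// sortIndices returns the positions of vals in ascending value order.
func sortIndices(vals []float64) []int {
	idx := make([]int, len(vals))
	for i := range idx {
		idx[i] = i
	}
	// A stable sort keeps equal values in their original relative order.
	sort.SliceStable(idx, func(a, b int) bool {
		return vals[idx[a]] < vals[idx[b]]
	})
	return idx
}

func main() {
	fmt.Println(sortIndices([]float64{30, 10, 20, 10})) // [1 3 2 0]
}
```

Reading the column at these positions in turn yields the sorted values without moving the underlying data.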
func (*Engine) StdDev ¶ added in v0.0.5
StdDev returns the sample standard deviation as a single-row float64 column.
func (*Engine) Variance ¶
Variance returns the sample variance of a float64 column as a single-row column.