Documentation
Overview ¶
Package stat provides composable statistical transforms for the Grammar of Graphics. Each transform implements the Transform interface, and transforms can be chained in any order as a pipeline:
[]stat.Transform{stat.BinX(stat.WithBins(40)), stat.NormalizeY()}
Transforms delegate heavy computation to the engine, mainly through the dataset.StatKernel interface, so the transforms themselves avoid materializing data.
Transform factories:
- BinX — histogram binning (via StatKernel.Histogram)
- Count — frequency counting (via GroupBy + AggCount)
- DensityX — kernel density estimation (via StatKernel.KDE)
- SmoothXY — linear / LOESS regression (via StatKernel.LinearFit / LoessFit)
- SummaryXY — grouped mean (via GroupBy + AggMean)
- BoxplotY — five-number summary (via StatKernel.Boxplot)
- IdentityTransform — pass-through
- NormalizeY / NormalizeX — rescale channel to sum to a total
- FilterX / FilterY — row-level predicate filter
- SortBy — sort rows by column
- ReverseRows — reverse row order
- TopN — keep top/bottom N rows by column
- SelectRow — keep a single row by mode (first, last, min, max)
- StackY / StackX — cumulative stacking within a group
- GroupX / GroupY — group-by with reducer (sum, mean, median, min, max, count, variance, deviation, first, last, mode)
Index ¶
- Variables
- func RunPipeline(ctx context.Context, pipeline []Transform, data dataset.Dataset, ...) (dataset.Dataset, map[string]string, error)
- type BinOption
- type BoxplotOption
- type ChannelHint
- type DensityOption
- type GroupOption
- type NormalizeOption
- type SelectMode
- type SmoothOption
- type SortOption
- type Transform
- func BinX(opts ...BinOption) Transform
- func BinY(opts ...BinOption) Transform
- func BoxplotY(opts ...BoxplotOption) Transform
- func Count() Transform
- func DensityX(opts ...DensityOption) Transform
- func DensityY(opts ...DensityOption) Transform
- func FilterX(masker dataset.Masker) Transform
- func FilterY(masker dataset.Masker) Transform
- func Group(xReducer, yReducer string, opts ...GroupOption) Transform
- func GroupX(reducer string) Transform
- func GroupY(reducer string) Transform
- func IdentityTransform() Transform
- func NormalizeX(opts ...NormalizeOption) Transform
- func NormalizeY(opts ...NormalizeOption) Transform
- func ReverseRows() Transform
- func SelectRow(mode SelectMode, column string) Transform
- func SmoothXY(opts ...SmoothOption) Transform
- func SortBy(column string, opts ...SortOption) Transform
- func StackX() Transform
- func StackY() Transform
- func SummaryXY() Transform
- func TopN(n int, column string, opts ...SortOption) Transform
- type TransformInput
- type TransformResult
Constants ¶
This section is empty.
Variables ¶
var (
	// ErrInsufficientData is returned when there's not enough data.
	ErrInsufficientData = errors.New("stat: insufficient data")
	// ErrUnsupportedType is returned for unsupported column types.
	ErrUnsupportedType = errors.New("stat: unsupported column type")
	// ErrMissingColumn is returned when a required column is missing.
	ErrMissingColumn = errors.New("stat: missing required column")
	// ErrInvalidParam is returned for invalid parameter values.
	ErrInvalidParam = errors.New("stat: invalid parameter")
)
Sentinel errors for the stat package.
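Callers would normally match these sentinels with errors.Is, which keeps working when an error has been wrapped with %w. The sketch below is self-contained: it redeclares one sentinel locally so it compiles without the package, and requireColumn and isMissingColumn are hypothetical helpers, not part of stat.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrMissingColumn mirrors the stat sentinel of the same name. It is
// redeclared here only so the sketch compiles without the package.
var ErrMissingColumn = errors.New("stat: missing required column")

// requireColumn is a hypothetical helper: it returns nil when name is present
// and otherwise wraps the sentinel with the offending column name.
func requireColumn(columns []string, name string) error {
	for _, c := range columns {
		if c == name {
			return nil
		}
	}
	return fmt.Errorf("column %q: %w", name, ErrMissingColumn)
}

// isMissingColumn matches the sentinel even through wrapping.
func isMissingColumn(err error) bool {
	return errors.Is(err, ErrMissingColumn)
}

func main() {
	err := requireColumn([]string{"x"}, "y")
	if isMissingColumn(err) {
		fmt.Println("handle missing column:", err)
	}
}
```

Comparing with == instead of errors.Is would miss the wrapped error, which is why the sentinel style pairs naturally with %w.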
Functions ¶
func RunPipeline ¶ added in v0.0.5
func RunPipeline(ctx context.Context, pipeline []Transform, data dataset.Dataset, mapping map[string]string) (dataset.Dataset, map[string]string, error)
RunPipeline executes an ordered chain of transforms. If the pipeline is nil or empty, data passes through unchanged (identity).
Between stages, if a transform produces a lazy (uncollected) Dataset, RunPipeline materializes it before passing it to the next transform. This ensures each transform receives a Dataset with a valid Table(), so engine interfaces (Selector, Aggregator, etc.) work correctly.
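The control flow described above can be sketched without the real package. Everything in this example is a simplified stand-in: Table replaces dataset.Dataset, Stage replaces Transform, and scaleY is a hypothetical stage. Lazy datasets and the materialization step are not modeled.

```go
package main

import "fmt"

// Table is a hypothetical stand-in for dataset.Dataset: column name to values.
type Table map[string][]float64

// Stage is a simplified stand-in for stat.Transform.
type Stage interface {
	Name() string
	Apply(data Table, mapping map[string]string) (Table, map[string]string, error)
}

// runPipeline applies the stages in order. A nil or empty pipeline returns
// the input unchanged, matching the identity behavior described above.
func runPipeline(pipeline []Stage, data Table, mapping map[string]string) (Table, map[string]string, error) {
	for _, s := range pipeline {
		var err error
		data, mapping, err = s.Apply(data, mapping)
		if err != nil {
			return nil, nil, fmt.Errorf("stage %s: %w", s.Name(), err)
		}
	}
	return data, mapping, nil
}

// scaleY multiplies the column mapped to "y" by a constant factor.
type scaleY struct{ factor float64 }

func (s scaleY) Name() string { return "scale_y" }

// Apply builds a new Table and leaves the input untouched.
func (s scaleY) Apply(data Table, mapping map[string]string) (Table, map[string]string, error) {
	col, ok := mapping["y"]
	if !ok {
		return nil, nil, fmt.Errorf("no y channel in mapping")
	}
	src, ok := data[col]
	if !ok {
		return nil, nil, fmt.Errorf("missing column %q", col)
	}
	out := make(Table, len(data))
	for name, values := range data {
		out[name] = values // untouched columns are shared, not copied
	}
	scaled := make([]float64, len(src))
	for i, v := range src {
		scaled[i] = v * s.factor
	}
	out[col] = scaled
	return out, mapping, nil
}

func main() {
	in := Table{"v": {1, 2}}
	out, _, err := runPipeline([]Stage{scaleY{2}, scaleY{3}}, in, map[string]string{"y": "v"})
	fmt.Println(out["v"], in["v"], err)
}
```

Because every stage has the same input and output shape, the runner needs no knowledge of what any individual stage does.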
Types ¶
type BinOption ¶ added in v0.0.5
type BinOption func(*binConfig)
BinOption configures the BinX / BinY transform.
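BinOption and the other option types on this page follow Go's functional-options pattern. A minimal sketch of how such options are typically applied is shown here. The fields of binConfig and the helpers withBins, withMethod, and newBinConfig are hypothetical, since the real config struct is unexported and undocumented; only the "sturges" default comes from this page.

```go
package main

import "fmt"

// binConfig is a hypothetical config struct; the real fields are not documented.
type binConfig struct {
	bins   int    // 0 means "choose automatically"
	method string // automatic binning strategy
}

// BinOption has the same shape as the documented type.
type BinOption func(*binConfig)

// withBins is a hypothetical option that fixes the number of bins.
func withBins(n int) BinOption {
	return func(c *binConfig) { c.bins = n }
}

// withMethod is a hypothetical option that picks the automatic strategy.
func withMethod(m string) BinOption {
	return func(c *binConfig) { c.method = m }
}

// newBinConfig starts from defaults and applies options in order,
// so a later option overrides an earlier one.
func newBinConfig(opts ...BinOption) binConfig {
	cfg := binConfig{method: "sturges"}
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	fmt.Printf("%+v\n", newBinConfig())
	fmt.Printf("%+v\n", newBinConfig(withBins(40), withMethod("fd")))
}
```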
func WithBinMethod ¶ added in v0.0.5
WithBinMethod selects the automatic binning strategy. Supported: "sturges" (default), "scott", "fd" (Freedman-Diaconis), "sqrt".
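For orientation, the four named strategies correspond to well-known textbook rules. The sketch below uses those textbook formulas with a linear-interpolation quantile. The engine's StatKernel.Histogram may differ in details such as quantile convention or rounding, and binCount is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// quantile interpolates linearly between order statistics of sorted data.
func quantile(sorted []float64, p float64) float64 {
	pos := p * float64(len(sorted)-1)
	lo, hi := int(math.Floor(pos)), int(math.Ceil(pos))
	frac := pos - float64(lo)
	return sorted[lo]*(1-frac) + sorted[hi]*frac
}

// sampleStdDev returns the sample standard deviation (n-1 denominator).
func sampleStdDev(xs []float64) float64 {
	mean := 0.0
	for _, x := range xs {
		mean += x
	}
	mean /= float64(len(xs))
	ss := 0.0
	for _, x := range xs {
		ss += (x - mean) * (x - mean)
	}
	return math.Sqrt(ss / float64(len(xs)-1))
}

// binCount returns the number of bins suggested by the named rule.
func binCount(xs []float64, method string) (int, error) {
	n := len(xs)
	if n < 2 {
		return 0, fmt.Errorf("need at least 2 values, got %d", n)
	}
	sorted := append([]float64(nil), xs...)
	sort.Float64s(sorted)
	span := sorted[n-1] - sorted[0]
	var width float64
	switch method {
	case "sturges": // k = ceil(log2 n) + 1
		return int(math.Ceil(math.Log2(float64(n)))) + 1, nil
	case "sqrt": // k = ceil(sqrt n)
		return int(math.Ceil(math.Sqrt(float64(n)))), nil
	case "scott": // h = 3.49 * sd * n^(-1/3)
		width = 3.49 * sampleStdDev(xs) / math.Cbrt(float64(n))
	case "fd": // h = 2 * IQR * n^(-1/3)
		iqr := quantile(sorted, 0.75) - quantile(sorted, 0.25)
		width = 2 * iqr / math.Cbrt(float64(n))
	default:
		return 0, fmt.Errorf("unknown bin method %q", method)
	}
	if span <= 0 || width <= 0 {
		return 1, nil // degenerate data: a single bin
	}
	return int(math.Ceil(span / width)), nil
}

func main() {
	xs := make([]float64, 16)
	for i := range xs {
		xs[i] = float64(i + 1)
	}
	for _, m := range []string{"sturges", "sqrt", "scott", "fd"} {
		k, err := binCount(xs, m)
		fmt.Println(m, k, err)
	}
}
```

Sturges and sqrt depend only on the sample size, while Scott and Freedman-Diaconis derive a bin width from the spread of the data and so adapt to its scale.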
func WithCumulative ¶ added in v0.0.5
WithCumulative enables cumulative histogram mode. +1 = forward cumulative (left to right), -1 = reverse (right to left). 0 = off (default, standard histogram).
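The three modes can be illustrated on a plain slice of per-bin counts. This sketch assumes that "reverse" means accumulating from the last bin toward the first, so the first bin ends up holding the total. The helper name cumulative is hypothetical.

```go
package main

import "fmt"

// cumulative returns a copy of counts accumulated according to dir:
// dir > 0 accumulates left to right, dir < 0 right to left, 0 leaves it as is.
func cumulative(counts []float64, dir int) []float64 {
	out := make([]float64, len(counts))
	copy(out, counts)
	switch {
	case dir > 0:
		for i := 1; i < len(out); i++ {
			out[i] += out[i-1]
		}
	case dir < 0:
		for i := len(out) - 2; i >= 0; i-- {
			out[i] += out[i+1]
		}
	}
	return out
}

func main() {
	counts := []float64{1, 2, 3}
	fmt.Println(cumulative(counts, +1))
	fmt.Println(cumulative(counts, -1))
	fmt.Println(cumulative(counts, 0))
}
```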
type BoxplotOption ¶ added in v0.0.5
type BoxplotOption func(*boxplotConfig)
BoxplotOption configures the BoxplotY transform.
func WithNotch ¶ added in v0.0.5
func WithNotch(enabled bool) BoxplotOption
WithNotch enables notched boxplots (95% CI around median).
func WithWhisker ¶ added in v0.0.5
func WithWhisker(rule string) BoxplotOption
WithWhisker sets the whisker rule: "tukey" (1.5×IQR, default) or "range" (min-max).
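To make the two rules concrete, the sketch below computes whisker ends for one group of values. It assumes the common convention that Tukey whiskers extend to the most extreme data points still inside the 1.5×IQR fences, and it uses a linear-interpolation quantile. The engine's StatKernel.Boxplot may use different conventions, and whiskers is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// quantile interpolates linearly between order statistics of sorted data.
func quantile(sorted []float64, p float64) float64 {
	pos := p * float64(len(sorted)-1)
	lo, hi := int(math.Floor(pos)), int(math.Ceil(pos))
	frac := pos - float64(lo)
	return sorted[lo]*(1-frac) + sorted[hi]*frac
}

// whiskers returns the lower and upper whisker ends for the given rule.
func whiskers(xs []float64, rule string) (lower, upper float64, err error) {
	if len(xs) == 0 {
		return 0, 0, fmt.Errorf("no data")
	}
	sorted := append([]float64(nil), xs...)
	sort.Float64s(sorted)
	lower, upper = sorted[0], sorted[len(sorted)-1]
	switch rule {
	case "range":
		return lower, upper, nil // whiskers span min to max
	case "tukey":
		q1, q3 := quantile(sorted, 0.25), quantile(sorted, 0.75)
		iqr := q3 - q1
		loFence, hiFence := q1-1.5*iqr, q3+1.5*iqr
		for _, v := range sorted { // smallest value inside the lower fence
			if v >= loFence {
				lower = v
				break
			}
		}
		for i := len(sorted) - 1; i >= 0; i-- { // largest value inside the upper fence
			if sorted[i] <= hiFence {
				upper = sorted[i]
				break
			}
		}
		return lower, upper, nil
	}
	return 0, 0, fmt.Errorf("unknown whisker rule %q", rule)
}

func main() {
	xs := []float64{1, 2, 3, 4, 5, 6, 7, 8, 9, 100}
	lo, hi, _ := whiskers(xs, "tukey")
	fmt.Println("tukey:", lo, hi)
	lo, hi, _ = whiskers(xs, "range")
	fmt.Println("range:", lo, hi)
}
```

With the Tukey rule the value 100 falls outside the upper fence and would be drawn as an outlier, whereas the range rule stretches the whisker to include it.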
type ChannelHint ¶ added in v0.0.5
type ChannelHint string
ChannelHint declares the semantic meaning of a channel's output. Open string type: known hints get special formatting; unknown hints get default formatting. Third-party transforms can declare arbitrary hints.
const (
	HintNone        ChannelHint = ""
	HintCount       ChannelHint = "count"
	HintProportion  ChannelHint = "proportion"
	HintProbability ChannelHint = "probability" // axis clamps to [0,1]
	HintInterval    ChannelHint = "interval"
	HintCumulative  ChannelHint = "cumulative"
	HintDeviation   ChannelHint = "deviation"
)
Standard channel hints.
type DensityOption ¶ added in v0.0.5
type DensityOption func(*densityConfig)
DensityOption configures the DensityX / DensityY transform.
func WithBandwidth ¶ added in v0.0.5
func WithBandwidth(bw float64) DensityOption
WithBandwidth sets an explicit KDE bandwidth. 0 = Silverman auto-select.
func WithDensityPoints ¶ added in v0.0.5
func WithDensityPoints(n int) DensityOption
WithDensityPoints sets the number of output grid points.
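The two density options map onto the two free parameters of a kernel density estimate: the bandwidth and the number of grid points. The sketch below assumes a Gaussian kernel, the textbook Silverman rule 0.9 · min(sd, IQR/1.34) · n^(-1/5), and a grid padded by three bandwidths on each side. None of these details are confirmed for StatKernel.KDE, and silverman and kde are hypothetical helpers.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// silverman returns the rule-of-thumb bandwidth, or 0 if it cannot be computed.
func silverman(xs []float64) float64 {
	n := len(xs)
	if n < 2 {
		return 0
	}
	mean := 0.0
	for _, x := range xs {
		mean += x
	}
	mean /= float64(n)
	ss := 0.0
	for _, x := range xs {
		ss += (x - mean) * (x - mean)
	}
	spread := math.Sqrt(ss / float64(n-1)) // sample standard deviation

	sorted := append([]float64(nil), xs...)
	sort.Float64s(sorted)
	q := func(p float64) float64 { // linear-interpolation quantile
		pos := p * float64(n-1)
		lo, hi := int(math.Floor(pos)), int(math.Ceil(pos))
		frac := pos - float64(lo)
		return sorted[lo]*(1-frac) + sorted[hi]*frac
	}
	if s := (q(0.75) - q(0.25)) / 1.34; s > 0 && s < spread {
		spread = s
	}
	return 0.9 * spread * math.Pow(float64(n), -0.2)
}

// kde evaluates a Gaussian KDE on an evenly spaced grid.
// bw == 0 selects the Silverman bandwidth, mirroring WithBandwidth(0).
func kde(xs []float64, bw float64, points int) (grid, density []float64, err error) {
	if len(xs) == 0 || points < 2 {
		return nil, nil, fmt.Errorf("need data and at least 2 grid points")
	}
	if bw == 0 {
		bw = silverman(xs)
	}
	if bw <= 0 {
		return nil, nil, fmt.Errorf("could not determine a positive bandwidth")
	}
	lo, hi := xs[0], xs[0]
	for _, x := range xs {
		lo, hi = math.Min(lo, x), math.Max(hi, x)
	}
	lo, hi = lo-3*bw, hi+3*bw // pad so the tails are visible (an assumption)
	step := (hi - lo) / float64(points-1)
	norm := 1 / (float64(len(xs)) * bw * math.Sqrt(2*math.Pi))
	grid = make([]float64, points)
	density = make([]float64, points)
	for i := range grid {
		g := lo + float64(i)*step
		sum := 0.0
		for _, x := range xs {
			u := (g - x) / bw
			sum += math.Exp(-0.5 * u * u)
		}
		grid[i], density[i] = g, sum*norm
	}
	return grid, density, nil
}

func main() {
	grid, density, err := kde([]float64{1, 2, 2, 3, 7}, 0, 5)
	fmt.Println(grid, density, err)
}
```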
type GroupOption ¶ added in v0.0.5
type GroupOption func(*groupDualConfig)
GroupOption configures the Group transform.
func WithGroupBy ¶ added in v0.0.5
func WithGroupBy(col string) GroupOption
WithGroupBy sets an explicit group-by column for Group. When empty, the group-by column is auto-detected from the "color" or "group" aesthetic.
type NormalizeOption ¶ added in v0.0.5
type NormalizeOption func(*normalizeConfig)
NormalizeOption configures NormalizeY / NormalizeX.
func WithTotal ¶ added in v0.0.5
func WithTotal(t float64) NormalizeOption
WithTotal sets the target sum for normalization (default 1.0). Use 100 for percentage output.
type SelectMode ¶ added in v0.0.5
type SelectMode string
SelectMode identifies which row to keep from a sorted column.
const (
	SelectFirst SelectMode = "first" // keep first row (smallest value)
	SelectLast  SelectMode = "last"  // keep last row (largest value)
	SelectMin   SelectMode = "min"   // keep row with minimum value
	SelectMax   SelectMode = "max"   // keep row with maximum value
)
Standard select modes.
type SmoothOption ¶ added in v0.0.5
type SmoothOption func(*smoothConfig)
SmoothOption configures the SmoothXY transform.
func WithMethod ¶ added in v0.0.5
func WithMethod(m string) SmoothOption
WithMethod sets the smoothing method: "linear" or "loess" (default).
func WithNOut ¶ added in v0.0.5
func WithNOut(n int) SmoothOption
WithNOut sets the output grid size.
func WithSE ¶ added in v0.0.5
func WithSE(se bool) SmoothOption
WithSE enables 95% confidence band output (ymin, ymax columns).
func WithSmoothPoints ¶ added in v0.0.5
func WithSmoothPoints(n int) SmoothOption
WithSmoothPoints is an alias for WithNOut for backward compatibility.
type SortOption ¶ added in v0.0.5
type SortOption func(*sortConfig)
SortOption configures SortBy and TopN.
func Ascending ¶ added in v0.0.5
func Ascending() SortOption
Ascending makes SortBy sort in ascending order (the default).
type Transform ¶ added in v0.0.5
type Transform interface {
	// Name returns a stable identifier for debugging, golden tests,
	// and pipeline introspection.
	Name() string

	// Apply runs the transform. Implementations MUST NOT mutate in.
	Apply(ctx context.Context, in TransformInput) (TransformResult, error)

	// OutputMapping describes how aesthetic channels are rewritten.
	// nil means the transform preserves the mapping (identity, filter, sort).
	// A non-nil map rewrites channels: {"y": "count"} means the y channel
	// now points at the "count" column the transform produced.
	OutputMapping() map[string]string

	// OutputSchema names the columns this transform produces.
	OutputSchema() []string

	// OutputHints declares semantic hints for output channels.
	// Axis/legend formatters use these: HintProportion → "%" tick formatting,
	// HintCount → integer ticks, HintInterval → bin-edge rendering.
	// nil for transforms that don't change channel semantics.
	OutputHints() map[string]ChannelHint
}
Transform is the composable data-transform contract. Transforms chain because their input and output shapes are the same: data + mapping in, data + mapping out. The bin transform, the smooth transform, the normalize transform, the identity transform — all the same shape.
Apply MUST NOT mutate in; it returns a new TransformResult.
func BinX ¶ added in v0.0.5
func BinX(opts ...BinOption) Transform
BinX returns a Transform that bins the x channel into evenly-spaced bins, producing x (centers) and count (per-bin frequency). Uses engine-native StatKernel.Histogram — zero materialization in stat/.
func BinY ¶ added in v0.0.5
func BinY(opts ...BinOption) Transform
BinY returns a Transform that bins the y channel into evenly-spaced bins, producing y (centers) and count (per-bin frequency). This is the symmetric counterpart of BinX for horizontal histograms.
func BoxplotY ¶ added in v0.0.5
func BoxplotY(opts ...BoxplotOption) Transform
BoxplotY returns a Transform that computes the five-number summary for each unique X group. Produces x, lower, q1, middle, q3, upper, notch_lower, notch_upper columns. Uses engine-native StatKernel.Boxplot — zero materialization in stat/.
func Count ¶
func Count() Transform
Count returns a Transform that counts occurrences of each unique x value. Produces x (unique values, sorted) and count columns. Uses engine-native GroupBy + Summarize(AggCount) — zero materialization.
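The output shape of Count (sorted unique values plus a parallel count column) can be illustrated with plain slices. This is only a sketch of the result, not of the engine path through GroupBy and Summarize. The helper countValues is hypothetical and uses string keys for simplicity.

```go
package main

import (
	"fmt"
	"sort"
)

// countValues returns the sorted unique values of xs and, in the same order,
// how often each one occurs.
func countValues(xs []string) (values []string, counts []int) {
	seen := map[string]int{}
	for _, x := range xs {
		seen[x]++
	}
	values = make([]string, 0, len(seen))
	for v := range seen {
		values = append(values, v)
	}
	sort.Strings(values)
	counts = make([]int, len(values))
	for i, v := range values {
		counts[i] = seen[v]
	}
	return values, counts
}

func main() {
	values, counts := countValues([]string{"b", "a", "b", "c", "b"})
	fmt.Println(values, counts)
}
```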
func DensityX ¶ added in v0.0.5
func DensityX(opts ...DensityOption) Transform
DensityX returns a Transform that computes a kernel density estimation on the x channel. Produces x (grid) and density columns. Uses engine-native StatKernel.KDE — zero materialization in stat/.
func DensityY ¶ added in v0.0.5
func DensityY(opts ...DensityOption) Transform
DensityY returns a Transform that computes a kernel density estimation on the y channel. Produces y (grid) and density columns. This is the symmetric counterpart of DensityX for horizontal density plots.
func FilterX ¶ added in v0.0.5
func FilterX(masker dataset.Masker) Transform
FilterX returns a Transform that keeps only rows where the x channel satisfies the given masker.
func FilterY ¶ added in v0.0.5
func FilterY(masker dataset.Masker) Transform
FilterY returns a Transform that keeps only rows where the y channel satisfies the given masker. The masker is evaluated by the engine (supports BigQuery SQL pushdown, Arrow compute kernels, etc.).
Use dataset predicates to build maskers:
stat.FilterY(dataset.Gt("y", 25.0))
func Group ¶ added in v0.0.5
func Group(xReducer, yReducer string, opts ...GroupOption) Transform
Group returns a Transform that groups data and applies one reducer to the x channel and another to the y channel at the same time. The group-by column is auto-detected from the "color" or "group" aesthetic in the mapping; use WithGroupBy to override.
Example:
stat.Group("mean", "sum") // mean(x), sum(y), group by color
stat.Group("p50", "max", stat.WithGroupBy("id")) // p50(x), max(y), group by id
func GroupX ¶ added in v0.0.5
func GroupX(reducer string) Transform
GroupX returns a Transform that groups data by the x channel and applies a named reducer to the y channel within each group. Produces sorted unique x values and the reduced y values.
For engine-supported aggregations (sum, mean, min, max, count, median, variance), delegates to the engine's Aggregator interface via Dataset.GroupBy().Summarize(). For custom reducers, falls back to scalar computation.
GroupX("mean") is equivalent to the existing SummaryXY transform.
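A scalar version of the group-and-reduce step might look like the sketch below. It covers only a handful of the named reducers, uses string group keys, and does not model the engine's Aggregator path. The helpers reduce and groupX are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// reduce applies a named reducer to one group's non-empty y values.
func reduce(name string, ys []float64) (float64, error) {
	switch name {
	case "count":
		return float64(len(ys)), nil
	case "sum", "mean":
		s := 0.0
		for _, y := range ys {
			s += y
		}
		if name == "mean" {
			s /= float64(len(ys))
		}
		return s, nil
	case "min", "max":
		best := ys[0]
		for _, y := range ys[1:] {
			if (name == "min" && y < best) || (name == "max" && y > best) {
				best = y
			}
		}
		return best, nil
	case "median":
		sorted := append([]float64(nil), ys...)
		sort.Float64s(sorted)
		mid := len(sorted) / 2
		if len(sorted)%2 == 1 {
			return sorted[mid], nil
		}
		return (sorted[mid-1] + sorted[mid]) / 2, nil
	}
	return 0, fmt.Errorf("unknown reducer %q", name)
}

// groupX groups ys by xs and returns sorted unique x values with reduced y.
func groupX(xs []string, ys []float64, reducer string) ([]string, []float64, error) {
	if len(xs) != len(ys) {
		return nil, nil, fmt.Errorf("x and y lengths differ: %d vs %d", len(xs), len(ys))
	}
	groups := map[string][]float64{}
	for i, x := range xs {
		groups[x] = append(groups[x], ys[i])
	}
	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	out := make([]float64, len(keys))
	for i, k := range keys {
		v, err := reduce(reducer, groups[k])
		if err != nil {
			return nil, nil, err
		}
		out[i] = v
	}
	return keys, out, nil
}

func main() {
	keys, vals, err := groupX([]string{"a", "b", "a", "b"}, []float64{1, 10, 3, 30}, "mean")
	fmt.Println(keys, vals, err)
}
```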
func IdentityTransform ¶ added in v0.0.5
func IdentityTransform() Transform
IdentityTransform returns a Transform that passes data through unchanged.
func NormalizeX ¶ added in v0.0.5
func NormalizeX(opts ...NormalizeOption) Transform
NormalizeX rescales the x channel so values sum to the given total.
func NormalizeY ¶ added in v0.0.5
func NormalizeY(opts ...NormalizeOption) Transform
NormalizeY rescales the y channel so values sum to the given total (default 1.0). Applied as a pipeline stage after transforms like BinX or Count to convert frequencies into proportions.
Uses the engine's Aggregator.Sum for the column sum and MathKernel.MulScalar for element-wise scaling — no manual float64 loops. Returns a lazy Dataset.
Output hint: y → HintProportion.
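Numerically, the transform amounts to one sum followed by one scalar multiplication. The sketch below shows that arithmetic on a plain slice. The helper normalize is hypothetical, and returning an error for a zero sum is an assumption about edge-case behavior, not documented here.

```go
package main

import "fmt"

// normalize rescales ys so the values sum to total.
// It returns a new slice and leaves ys untouched.
func normalize(ys []float64, total float64) ([]float64, error) {
	sum := 0.0
	for _, y := range ys {
		sum += y
	}
	if sum == 0 {
		// Assumption: a zero sum cannot be rescaled, so report it.
		return nil, fmt.Errorf("cannot normalize: values sum to zero")
	}
	scale := total / sum
	out := make([]float64, len(ys))
	for i, y := range ys {
		out[i] = y * scale
	}
	return out, nil
}

func main() {
	counts := []float64{1, 1, 2}
	proportions, _ := normalize(counts, 1)
	percentages, _ := normalize(counts, 100)
	fmt.Println(proportions, percentages)
}
```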
func ReverseRows ¶ added in v0.0.5
func ReverseRows() Transform
ReverseRows returns a Transform that reverses the row order. Uses Dataset.SelectRows for engine-native scatter-gather.
func SelectRow ¶ added in v0.0.5
func SelectRow(mode SelectMode, column string) Transform
SelectRow returns a Transform that selects a single row from the dataset based on the given mode and column. The mode determines which row is kept:
- SelectFirst: first row in natural order
- SelectLast: last row in natural order
- SelectMin: row with the smallest value in column
- SelectMax: row with the largest value in column
Uses engine-native Arrange + Head/Tail — stays lazy.
func SmoothXY ¶ added in v0.0.5
func SmoothXY(opts ...SmoothOption) Transform
SmoothXY returns a Transform that fits a smooth curve through (x, y) data. Produces x (grid) and y (fitted) columns. Uses engine-native StatKernel.LinearFit or LoessFit — zero materialization.
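For the "linear" method, the output can be pictured as an ordinary least-squares line evaluated on an evenly spaced x grid. The sketch below shows that idea only; LOESS and the confidence band are not covered, and linearFit and smoothLinear are hypothetical helpers rather than the engine's StatKernel.LinearFit.

```go
package main

import "fmt"

// linearFit returns the ordinary least-squares slope and intercept.
func linearFit(xs, ys []float64) (slope, intercept float64, err error) {
	if len(xs) != len(ys) || len(xs) < 2 {
		return 0, 0, fmt.Errorf("need at least 2 paired points")
	}
	n := float64(len(xs))
	var mx, my float64
	for i := range xs {
		mx += xs[i]
		my += ys[i]
	}
	mx, my = mx/n, my/n
	var sxx, sxy float64
	for i := range xs {
		sxx += (xs[i] - mx) * (xs[i] - mx)
		sxy += (xs[i] - mx) * (ys[i] - my)
	}
	if sxx == 0 {
		return 0, 0, fmt.Errorf("all x values are identical")
	}
	slope = sxy / sxx
	return slope, my - slope*mx, nil
}

// smoothLinear evaluates the fitted line on nOut evenly spaced x values
// between the smallest and largest observed x.
func smoothLinear(xs, ys []float64, nOut int) (gx, gy []float64, err error) {
	if nOut < 2 {
		return nil, nil, fmt.Errorf("need at least 2 output points")
	}
	slope, intercept, err := linearFit(xs, ys)
	if err != nil {
		return nil, nil, err
	}
	lo, hi := xs[0], xs[0]
	for _, x := range xs {
		if x < lo {
			lo = x
		}
		if x > hi {
			hi = x
		}
	}
	step := (hi - lo) / float64(nOut-1)
	gx = make([]float64, nOut)
	gy = make([]float64, nOut)
	for i := range gx {
		gx[i] = lo + float64(i)*step
		gy[i] = intercept + slope*gx[i]
	}
	return gx, gy, nil
}

func main() {
	gx, gy, err := smoothLinear([]float64{0, 1, 2}, []float64{1, 3, 5}, 3)
	fmt.Println(gx, gy, err)
}
```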
func SortBy ¶ added in v0.0.5
func SortBy(column string, opts ...SortOption) Transform
SortBy returns a Transform that sorts all rows by the named column. Default order is ascending; use Desc() for descending. Uses the engine's Selector.SortIndices + Dataset.SelectRows.
func StackX ¶ added in v0.0.5
func StackX() Transform
StackX returns a Transform that cumulatively stacks x values. Produces an "xmin" column containing the base of each stacked segment: xmin = CumSum(x) - x, and replaces x with CumSum(x).
Uses the engine's Windower.CumSum and MathKernel.SubCols — no manual float64 loops. Returns a lazy Dataset.
This is the horizontal counterpart of StackY.
func StackY ¶ added in v0.0.5
func StackY() Transform
StackY returns a Transform that cumulatively stacks y values. Produces a "ymin" column containing the base of each stacked segment: ymin = CumSum(y) - y, and replaces y with CumSum(y).
Uses the engine's Windower.CumSum and MathKernel.SubCols — no manual float64 loops. Returns a lazy Dataset.
Unlike geom.Stack() (a position adjustment that offsets bar positions across color groups), StackY is a data transform that accumulates y values within a single group's pipeline.
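The stacking formula ymin = CumSum(y) - y can be checked on a small slice. The sketch below uses an explicit loop purely for illustration, whereas the transform itself is described as delegating to Windower.CumSum and MathKernel.SubCols. The helper stack is hypothetical.

```go
package main

import "fmt"

// stack returns the base (ymin) and top (y) of each stacked segment:
// top[i] is the running sum through row i, and base[i] = top[i] - ys[i].
func stack(ys []float64) (base, top []float64) {
	base = make([]float64, len(ys))
	top = make([]float64, len(ys))
	running := 0.0
	for i, y := range ys {
		running += y
		top[i] = running
		base[i] = running - y
	}
	return base, top
}

func main() {
	base, top := stack([]float64{1, 2, 3})
	fmt.Println(base, top)
}
```

Each segment starts where the previous one ended, so base[i] equals top[i-1] for every row after the first.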
type TransformInput ¶ added in v0.0.5
TransformInput carries data and aesthetic mapping into a transform.