stat

package
v0.0.8 Latest
Warning

This package is not in the latest version of its module.

Published: May 24, 2026 License: MIT Imports: 6 Imported by: 0

Documentation

Overview

Package stat provides composable statistical transforms for the Grammar of Graphics. Each transform implements the Transform interface and can be chained arbitrarily in a pipeline:

stat.NormalizeY(stat.BinX(stat.WithBins(40)))

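Transforms delegate heavy computation to engine interfaces: dataset.StatKernel for the statistical kernels, and GroupBy/Aggregator, Windower, MathKernel, and Selector where noted in the entries below. The package therefore avoids materializing data itself. Two documented exceptions: RunPipeline collects lazy Datasets between stages, and GroupX falls back to scalar computation for custom reducers.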

Transform factories:

  • BinX — histogram binning (via StatKernel.Histogram)
  • Count — frequency counting (via GroupBy + AggCount)
  • DensityX — kernel density estimation (via StatKernel.KDE)
  • SmoothXY — linear / LOESS regression (via StatKernel.LinearFit / LoessFit)
  • SummaryXY — grouped mean (via GroupBy + AggMean)
  • BoxplotY — five-number summary (via StatKernel.Boxplot)
  • IdentityTransform — pass-through
  • NormalizeY / NormalizeX — rescale channel to sum to a total
  • FilterX / FilterY — row-level predicate filter
  • SortBy — sort rows by column
  • ReverseRows — reverse row order
  • TopN — keep top/bottom N rows by column
  • SelectRow — keep a single row by mode (first, last, min, max)
  • StackY / StackX — cumulative stacking within a group
  • GroupX / GroupY — group-by with reducer (sum, mean, median, min, max, count, variance, deviation, first, last, mode)


Constants

This section is empty.

Variables

var (
	// ErrInsufficientData is returned when there's not enough data.
	ErrInsufficientData = errors.New("stat: insufficient data")

	// ErrUnsupportedType is returned for unsupported column types.
	ErrUnsupportedType = errors.New("stat: unsupported column type")

	// ErrMissingColumn is returned when a required column is missing.
	ErrMissingColumn = errors.New("stat: missing required column")

	// ErrInvalidParam is returned for invalid parameter values.
	ErrInvalidParam = errors.New("stat: invalid parameter")
)

Sentinel errors for the stat package.

Functions

func RunPipeline added in v0.0.5

func RunPipeline(ctx context.Context, pipeline []Transform, data dataset.Dataset, mapping map[string]string) (dataset.Dataset, map[string]string, error)

RunPipeline executes an ordered chain of transforms. If the pipeline is nil or empty, data passes through unchanged (identity).

Between stages, if a transform produces a lazy (uncollected) Dataset, RunPipeline materializes it before passing it to the next transform. This ensures each transform receives a Dataset with a valid Table(), so engine interfaces (Selector, Aggregator, etc.) work correctly.
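
The chaining behaviour described above can be sketched without the real package. In this minimal, self-contained sketch, `table`, `stage`, `runStages`, and `scaleY` are hypothetical stand-ins (not part of stat); the context argument and the lazy-Dataset materialization step are deliberately left out.

```go
package main

import (
	"errors"
	"fmt"
)

// table is a hypothetical stand-in for dataset.Dataset: column name -> values.
type table map[string][]float64

// stage is a simplified stand-in for a transform: data + mapping in,
// data + mapping out.
type stage func(data table, mapping map[string]string) (table, map[string]string, error)

var errMissingColumn = errors.New("missing required column")

// runStages applies the stages in order and stops at the first error.
// A nil or empty slice returns the inputs unchanged (identity).
func runStages(stages []stage, data table, mapping map[string]string) (table, map[string]string, error) {
	for i, s := range stages {
		var err error
		data, mapping, err = s(data, mapping)
		if err != nil {
			return nil, nil, fmt.Errorf("stage %d: %w", i, err)
		}
	}
	return data, mapping, nil
}

// scaleY returns a stage that multiplies the column mapped to "y" by f.
// It builds a new table instead of mutating its input.
func scaleY(f float64) stage {
	return func(data table, mapping map[string]string) (table, map[string]string, error) {
		col, ok := data[mapping["y"]]
		if !ok {
			return nil, nil, errMissingColumn
		}
		out := table{}
		for name, vals := range data {
			out[name] = vals // untouched columns are shared, not copied
		}
		scaled := make([]float64, len(col))
		for i, v := range col {
			scaled[i] = v * f
		}
		out[mapping["y"]] = scaled
		return out, mapping, nil
	}
}

func main() {
	data := table{"price": {1, 2}}
	mapping := map[string]string{"y": "price"}
	out, _, err := runStages([]stage{scaleY(2), scaleY(2)}, data, mapping)
	fmt.Println(out["price"], err)
}
```

Because every stage has the same input and output shape, stages can be reordered or repeated freely, which is the property the real pipeline relies on.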

Types

type BinOption added in v0.0.5

type BinOption func(*binConfig)

BinOption configures the BinX / BinY transform.
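
BinOption follows Go's functional-options pattern: each option is a function that edits an unexported config struct. The sketch below illustrates the pattern with hypothetical names (`binSettings`, `binOpt`, `withBins`, `withMethod`, `newBinSettings`); the fields and defaults of the real binConfig are not documented here, so treat them as assumptions.

```go
package main

import "fmt"

// binSettings is a hypothetical stand-in for the package's unexported binConfig.
type binSettings struct {
	bins   int    // 0 means "choose automatically"
	method string // automatic strategy used when bins == 0
}

// binOpt mirrors the shape of BinOption: a function that edits the config.
type binOpt func(*binSettings)

func withBins(n int) binOpt      { return func(c *binSettings) { c.bins = n } }
func withMethod(m string) binOpt { return func(c *binSettings) { c.method = m } }

// newBinSettings starts from defaults and applies options in order,
// so a later option overrides an earlier one.
func newBinSettings(opts ...binOpt) binSettings {
	c := binSettings{method: "sturges"}
	for _, o := range opts {
		o(&c)
	}
	return c
}

func main() {
	fmt.Printf("%+v\n", newBinSettings())
	fmt.Printf("%+v\n", newBinSettings(withBins(40), withMethod("fd")))
}
```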

func WithBinMethod added in v0.0.5

func WithBinMethod(m string) BinOption

WithBinMethod selects the automatic binning strategy. Supported: "sturges" (default), "scott", "fd" (Freedman-Diaconis), "sqrt".
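
For orientation, the textbook forms of these four rules are sketched below. This is not the engine's implementation, which may differ in rounding or edge-case handling; all function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// sturgesBins returns ceil(log2(n)) + 1.
func sturgesBins(n int) int {
	if n < 1 {
		return 1
	}
	return int(math.Ceil(math.Log2(float64(n)))) + 1
}

// sqrtBins returns ceil(sqrt(n)).
func sqrtBins(n int) int {
	if n < 1 {
		return 1
	}
	return int(math.Ceil(math.Sqrt(float64(n))))
}

// scottWidth returns Scott's bin width: 3.49 * sd * n^(-1/3).
func scottWidth(sd float64, n int) float64 {
	return 3.49 * sd * math.Pow(float64(n), -1.0/3.0)
}

// fdWidth returns the Freedman-Diaconis bin width: 2 * IQR * n^(-1/3).
func fdWidth(iqr float64, n int) float64 {
	return 2 * iqr * math.Pow(float64(n), -1.0/3.0)
}

// widthToBins converts a bin width into a bin count over [lo, hi].
func widthToBins(lo, hi, width float64) int {
	if width <= 0 || hi <= lo {
		return 1
	}
	return int(math.Ceil((hi - lo) / width))
}

func main() {
	n := 1000
	fmt.Println("sturges:", sturgesBins(n))
	fmt.Println("sqrt:   ", sqrtBins(n))
	fmt.Println("scott:  ", widthToBins(0, 10, scottWidth(2.5, n)))
	fmt.Println("fd:     ", widthToBins(0, 10, fdWidth(3.0, n)))
}
```

Sturges and sqrt depend only on the row count, while Scott and Freedman-Diaconis also depend on the spread of the data, so they adapt better to skewed or heavy-tailed columns.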

func WithBins added in v0.0.5

func WithBins(n int) BinOption

WithBins sets an explicit bin count.

func WithCumulative added in v0.0.5

func WithCumulative(dir int) BinOption

WithCumulative enables cumulative histogram mode. +1 = forward cumulative (left to right), -1 = reverse (right to left). 0 = off (default, standard histogram).
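
One plausible reading of the three modes, sketched on a plain slice of per-bin counts. The function name `cumulate` is hypothetical, and the interpretation of the reverse direction (each bin holds the sum of itself and everything to its right) is an assumption.

```go
package main

import "fmt"

// cumulate returns a copy of counts accumulated according to dir:
// dir > 0 sums left to right, dir < 0 sums right to left, 0 leaves counts as-is.
func cumulate(counts []float64, dir int) []float64 {
	out := make([]float64, len(counts))
	copy(out, counts)
	switch {
	case dir > 0:
		for i := 1; i < len(out); i++ {
			out[i] += out[i-1]
		}
	case dir < 0:
		for i := len(out) - 2; i >= 0; i-- {
			out[i] += out[i+1]
		}
	}
	return out
}

func main() {
	counts := []float64{1, 2, 3}
	fmt.Println(cumulate(counts, +1))
	fmt.Println(cumulate(counts, -1))
	fmt.Println(cumulate(counts, 0))
}
```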

type BoxplotOption added in v0.0.5

type BoxplotOption func(*boxplotConfig)

BoxplotOption configures the BoxplotY transform.

func WithNotch added in v0.0.5

func WithNotch(enabled bool) BoxplotOption

WithNotch enables notched boxplots (95% CI around median).

func WithWhisker added in v0.0.5

func WithWhisker(rule string) BoxplotOption

WithWhisker sets the whisker rule: "tukey" (1.5×IQR, default) or "range" (min-max).

type ChannelHint added in v0.0.5

type ChannelHint string

ChannelHint declares the semantic meaning of a channel's output. Open string type: known hints get special formatting; unknown hints get default formatting. Third-party transforms can declare arbitrary hints.

const (
	HintNone        ChannelHint = ""
	HintCount       ChannelHint = "count"
	HintProportion  ChannelHint = "proportion"
	HintProbability ChannelHint = "probability" // axis clamps to [0,1]
	HintInterval    ChannelHint = "interval"
	HintCumulative  ChannelHint = "cumulative"
	HintDeviation   ChannelHint = "deviation"
)

Standard channel hints.
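
To illustrate how an open string type lets a formatter special-case known hints and fall back for unknown ones, here is a small sketch. The `hint` type and `formatTick` function are hypothetical, and the specific formatting choices (multiplying proportions by 100, rounding counts) are assumptions rather than the package's actual axis behaviour.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// hint is a local stand-in for ChannelHint.
type hint string

const (
	hintCount      hint = "count"
	hintProportion hint = "proportion"
)

// formatTick formats an axis tick according to the channel hint.
// Unknown hints, including third-party ones, get the default formatting.
func formatTick(h hint, v float64) string {
	switch h {
	case hintProportion:
		return strconv.FormatFloat(v*100, 'f', -1, 64) + "%"
	case hintCount:
		return strconv.Itoa(int(math.Round(v)))
	default:
		return strconv.FormatFloat(v, 'g', -1, 64)
	}
}

func main() {
	fmt.Println(formatTick(hintProportion, 0.25))
	fmt.Println(formatTick(hintCount, 3))
	fmt.Println(formatTick("my-custom-hint", 1.5))
}
```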

type DensityOption added in v0.0.5

type DensityOption func(*densityConfig)

DensityOption configures the DensityX / DensityY transform.

func WithBandwidth added in v0.0.5

func WithBandwidth(bw float64) DensityOption

WithBandwidth sets an explicit KDE bandwidth. 0 = Silverman auto-select.
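
Silverman's rule of thumb is commonly written as h = 0.9 · min(σ, IQR/1.34) · n^(-1/5). The sketch below implements that textbook form; whether the engine uses exactly this variant, or the same quantile definition, is an assumption, and `silvermanBandwidth` and `quantile` are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// quantile returns the p-quantile of an ascending slice using linear interpolation.
func quantile(sorted []float64, p float64) float64 {
	pos := p * float64(len(sorted)-1)
	lo := int(math.Floor(pos))
	hi := int(math.Ceil(pos))
	return sorted[lo] + (pos-float64(lo))*(sorted[hi]-sorted[lo])
}

// silvermanBandwidth returns 0.9 * min(sd, IQR/1.34) * n^(-1/5).
// It returns 0 when fewer than two values are supplied.
func silvermanBandwidth(xs []float64) float64 {
	n := len(xs)
	if n < 2 {
		return 0
	}
	sorted := append([]float64(nil), xs...)
	sort.Float64s(sorted)

	mean := 0.0
	for _, x := range sorted {
		mean += x
	}
	mean /= float64(n)

	ss := 0.0
	for _, x := range sorted {
		ss += (x - mean) * (x - mean)
	}
	sd := math.Sqrt(ss / float64(n-1)) // sample standard deviation

	spread := sd
	if iqr := quantile(sorted, 0.75) - quantile(sorted, 0.25); iqr > 0 {
		spread = math.Min(sd, iqr/1.34)
	}
	return 0.9 * spread * math.Pow(float64(n), -0.2)
}

func main() {
	fmt.Println(silvermanBandwidth([]float64{1, 2, 3, 4, 5, 6, 7, 8}))
}
```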

func WithDensityPoints added in v0.0.5

func WithDensityPoints(n int) DensityOption

WithDensityPoints sets the number of output grid points.

type GroupOption added in v0.0.5

type GroupOption func(*groupDualConfig)

GroupOption configures the Group transform.

func WithGroupBy added in v0.0.5

func WithGroupBy(col string) GroupOption

WithGroupBy sets an explicit group-by column for Group. When empty, the group-by column is auto-detected from the "color" or "group" aesthetic.

type NormalizeOption added in v0.0.5

type NormalizeOption func(*normalizeConfig)

NormalizeOption configures NormalizeY / NormalizeX.

func WithTotal added in v0.0.5

func WithTotal(t float64) NormalizeOption

WithTotal sets the target sum for normalization (default 1.0). Use 100 for percentage output.

type SelectMode added in v0.0.5

type SelectMode string

SelectMode identifies which row to keep from a sorted column.

const (
	SelectFirst SelectMode = "first" // keep first row (smallest value)
	SelectLast  SelectMode = "last"  // keep last row (largest value)
	SelectMin   SelectMode = "min"   // keep row with minimum value
	SelectMax   SelectMode = "max"   // keep row with maximum value
)

Standard select modes.

type SmoothOption added in v0.0.5

type SmoothOption func(*smoothConfig)

SmoothOption configures the SmoothXY transform.

func WithMethod added in v0.0.5

func WithMethod(m string) SmoothOption

WithMethod sets the smoothing method: "linear" or "loess" (default).

func WithNOut added in v0.0.5

func WithNOut(n int) SmoothOption

WithNOut sets the output grid size.

func WithSE added in v0.0.5

func WithSE(se bool) SmoothOption

WithSE enables 95% confidence band output (ymin, ymax columns).

func WithSmoothPoints added in v0.0.5

func WithSmoothPoints(n int) SmoothOption

WithSmoothPoints is an alias for WithNOut for backward compatibility.

type SortOption added in v0.0.5

type SortOption func(*sortConfig)

SortOption configures SortBy and TopN.

func Ascending added in v0.0.5

func Ascending() SortOption

Ascending selects ascending order. It is the default for SortBy; pass it to TopN to keep the smallest values instead of the largest.

func Desc added in v0.0.5

func Desc() SortOption

Desc selects descending order. It is the default for TopN; pass it to SortBy to sort largest first.

type Transform added in v0.0.5

type Transform interface {
	// Name returns a stable identifier for debugging, golden tests,
	// and pipeline introspection.
	Name() string

	// Apply runs the transform. Implementations MUST NOT mutate in.
	Apply(ctx context.Context, in TransformInput) (TransformResult, error)

	// OutputMapping describes how aesthetic channels are rewritten.
	// nil means the transform preserves the mapping (identity, filter, sort).
	// A non-nil map rewrites channels: {"y": "count"} means the y channel
	// now points at the "count" column the transform produced.
	OutputMapping() map[string]string

	// OutputSchema names the columns this transform produces.
	OutputSchema() []string

	// OutputHints declares semantic hints for output channels.
	// Axis/legend formatters use these: HintProportion → "%" tick formatting,
	// HintCount → integer ticks, HintInterval → bin-edge rendering.
	// nil for transforms that don't change channel semantics.
	OutputHints() map[string]ChannelHint
}

Transform is the composable data-transform contract. Transforms chain because their input and output shapes are the same: data + mapping in, data + mapping out. The bin, smooth, normalize, and identity transforms all share this shape.

Apply MUST NOT mutate in; it returns a new TransformResult.
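
One way to honour the no-mutation rule when rewriting channels is to copy the incoming mapping before applying the rewrite. The helper below is a hypothetical sketch; in particular, the assumption that channels not named in the rewrite are carried over unchanged is not confirmed by the package documentation.

```go
package main

import "fmt"

// rewriteMapping returns a new mapping: a copy of in with the entries of
// rewrite applied on top. Neither argument is modified. A nil rewrite
// yields a plain copy, matching the "preserves the mapping" case.
func rewriteMapping(in, rewrite map[string]string) map[string]string {
	out := make(map[string]string, len(in)+len(rewrite))
	for k, v := range in {
		out[k] = v
	}
	for k, v := range rewrite {
		out[k] = v
	}
	return out
}

func main() {
	in := map[string]string{"x": "price", "y": "price"}
	out := rewriteMapping(in, map[string]string{"y": "count"})
	fmt.Println(in["y"], out["y"])
}
```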

func BinX added in v0.0.5

func BinX(opts ...BinOption) Transform

BinX returns a Transform that bins the x channel into evenly-spaced bins, producing x (centers) and count (per-bin frequency). Uses engine-native StatKernel.Histogram — zero materialization in stat/.
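
The output shape (bin centers plus per-bin counts) can be illustrated with a plain-slice sketch. This is not the StatKernel.Histogram implementation; `binEven` is a hypothetical name, and placing the maximum value in the last bin is an assumed edge convention.

```go
package main

import "fmt"

// binEven splits [min(xs), max(xs)] into `bins` equal-width bins and returns
// each bin's center and the number of values that fall into it.
// The maximum value is placed in the last bin.
func binEven(xs []float64, bins int) (centers, counts []float64) {
	if len(xs) == 0 || bins < 1 {
		return nil, nil
	}
	lo, hi := xs[0], xs[0]
	for _, x := range xs {
		if x < lo {
			lo = x
		}
		if x > hi {
			hi = x
		}
	}
	width := (hi - lo) / float64(bins)
	centers = make([]float64, bins)
	counts = make([]float64, bins)
	for i := range centers {
		centers[i] = lo + (float64(i)+0.5)*width
	}
	for _, x := range xs {
		i := 0
		if width > 0 {
			i = int((x - lo) / width)
		}
		if i >= bins {
			i = bins - 1 // x == hi lands on the right edge
		}
		counts[i]++
	}
	return centers, counts
}

func main() {
	centers, counts := binEven([]float64{0, 0.5, 1, 1.5, 2}, 2)
	fmt.Println(centers, counts)
}
```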

func BinY added in v0.0.5

func BinY(opts ...BinOption) Transform

BinY returns a Transform that bins the y channel into evenly-spaced bins, producing y (centers) and count (per-bin frequency). This is the symmetric counterpart of BinX for horizontal histograms.

func BoxplotY added in v0.0.5

func BoxplotY(opts ...BoxplotOption) Transform

BoxplotY returns a Transform that computes the five-number summary for each unique X group. Produces x, lower, q1, middle, q3, upper, notch_lower, notch_upper columns. Uses engine-native StatKernel.Boxplot — zero materialization in stat/.
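
The columns lower, q1, middle, q3, and upper can be illustrated for a single group as follows. This sketch assumes linearly interpolated quantiles and the usual Tukey convention that whiskers stop at the most extreme data points within 1.5×IQR of the box; the engine's quantile method may differ. `boxStats`, `quantile`, and `boxplot` are hypothetical names, and notch columns are omitted.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

type boxStats struct {
	lower, q1, middle, q3, upper float64
}

// quantile returns the p-quantile of an ascending slice using linear interpolation.
func quantile(sorted []float64, p float64) float64 {
	pos := p * float64(len(sorted)-1)
	lo := int(math.Floor(pos))
	hi := int(math.Ceil(pos))
	return sorted[lo] + (pos-float64(lo))*(sorted[hi]-sorted[lo])
}

// boxplot computes the five-number summary of xs. With rule "range" the
// whiskers span min..max; any other rule is treated as "tukey".
func boxplot(xs []float64, rule string) (boxStats, bool) {
	if len(xs) == 0 {
		return boxStats{}, false
	}
	sorted := append([]float64(nil), xs...)
	sort.Float64s(sorted)
	b := boxStats{
		lower:  sorted[0],
		q1:     quantile(sorted, 0.25),
		middle: quantile(sorted, 0.5),
		q3:     quantile(sorted, 0.75),
		upper:  sorted[len(sorted)-1],
	}
	if rule != "range" {
		fence := 1.5 * (b.q3 - b.q1)
		for _, x := range sorted { // smallest value inside the lower fence
			if x >= b.q1-fence {
				b.lower = x
				break
			}
		}
		for i := len(sorted) - 1; i >= 0; i-- { // largest value inside the upper fence
			if sorted[i] <= b.q3+fence {
				b.upper = sorted[i]
				break
			}
		}
	}
	return b, true
}

func main() {
	data := []float64{1, 2, 3, 4, 5, 6, 7, 8, 9, 100}
	tukey, _ := boxplot(data, "tukey")
	full, _ := boxplot(data, "range")
	fmt.Printf("%+v\n%+v\n", tukey, full)
}
```

With the Tukey rule the outlier 100 is excluded from the upper whisker, whereas the range rule extends the whisker to it.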

func Count

func Count() Transform

Count returns a Transform that counts occurrences of each unique x value. Produces x (unique values, sorted) and count columns. Uses engine-native GroupBy + Summarize(AggCount) — zero materialization.
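
The output shape of Count (sorted unique values with their frequencies) is sketched below on a plain string slice. The real transform runs through the engine's GroupBy; `countValues` is a hypothetical name.

```go
package main

import (
	"fmt"
	"sort"
)

// countValues returns the distinct values of xs in sorted order together
// with how many times each one occurs.
func countValues(xs []string) (values []string, counts []int) {
	freq := map[string]int{}
	for _, x := range xs {
		freq[x]++
	}
	values = make([]string, 0, len(freq))
	for v := range freq {
		values = append(values, v)
	}
	sort.Strings(values)
	counts = make([]int, len(values))
	for i, v := range values {
		counts[i] = freq[v]
	}
	return values, counts
}

func main() {
	values, counts := countValues([]string{"b", "a", "b", "c", "b"})
	fmt.Println(values, counts)
}
```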

func DensityX added in v0.0.5

func DensityX(opts ...DensityOption) Transform

DensityX returns a Transform that computes a kernel density estimation on the x channel. Produces x (grid) and density columns. Uses engine-native StatKernel.KDE — zero materialization in stat/.
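
A Gaussian kernel density estimate evaluates f(x) = 1/(n·h) · Σ φ((x − xᵢ)/h) on a grid. The sketch below assumes a Gaussian kernel and a grid that extends 3 bandwidths past the data on each side; neither choice is confirmed for StatKernel.KDE, and `gaussianKDE` is a hypothetical name.

```go
package main

import (
	"fmt"
	"math"
)

// gaussianKDE evaluates a Gaussian kernel density estimate of xs with
// bandwidth h on `points` evenly spaced grid values.
func gaussianKDE(xs []float64, h float64, points int) (grid, density []float64) {
	if len(xs) == 0 || h <= 0 || points < 2 {
		return nil, nil
	}
	lo, hi := xs[0], xs[0]
	for _, x := range xs {
		lo = math.Min(lo, x)
		hi = math.Max(hi, x)
	}
	lo, hi = lo-3*h, hi+3*h // assumed padding so the tails are visible
	step := (hi - lo) / float64(points-1)
	norm := 1 / (float64(len(xs)) * h * math.Sqrt(2*math.Pi))

	grid = make([]float64, points)
	density = make([]float64, points)
	for i := range grid {
		g := lo + float64(i)*step
		sum := 0.0
		for _, x := range xs {
			u := (g - x) / h
			sum += math.Exp(-0.5 * u * u)
		}
		grid[i] = g
		density[i] = norm * sum
	}
	return grid, density
}

func main() {
	grid, density := gaussianKDE([]float64{1, 2, 2, 3, 7}, 0.8, 5)
	fmt.Println(grid)
	fmt.Println(density)
}
```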

func DensityY added in v0.0.5

func DensityY(opts ...DensityOption) Transform

DensityY returns a Transform that computes a kernel density estimation on the y channel. Produces y (grid) and density columns. This is the symmetric counterpart of DensityX for horizontal density plots.

func FilterX added in v0.0.5

func FilterX(masker dataset.Masker) Transform

FilterX returns a Transform that keeps only rows where the x channel satisfies the given masker.

func FilterY added in v0.0.5

func FilterY(masker dataset.Masker) Transform

FilterY returns a Transform that keeps only rows where the y channel satisfies the given masker. The masker is evaluated by the engine (supports BigQuery SQL pushdown, Arrow compute kernels, etc.).

Use dataset predicates to build maskers:

stat.FilterY(dataset.Gt("y", 25.0))
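
Conceptually, a masker produces one boolean per row and the filter keeps the rows where it is true, across all columns. The sketch below shows that idea on plain slices; `predicate`, `gt`, and `filterRows` are hypothetical stand-ins, not the dataset.Masker API, and all columns are assumed to have equal length.

```go
package main

import "fmt"

// predicate is a stand-in for a row-level test on one column's values.
type predicate func(v float64) bool

// gt returns a predicate that is true for values above threshold.
func gt(threshold float64) predicate {
	return func(v float64) bool { return v > threshold }
}

// filterRows evaluates keep on the named column to build a row mask, then
// applies that mask to every column. The input columns are not modified.
func filterRows(cols map[string][]float64, column string, keep predicate) map[string][]float64 {
	mask := make([]bool, len(cols[column]))
	for i, v := range cols[column] {
		mask[i] = keep(v)
	}
	out := make(map[string][]float64, len(cols))
	for name, vals := range cols {
		kept := make([]float64, 0, len(vals))
		for i, ok := range mask {
			if ok {
				kept = append(kept, vals[i])
			}
		}
		out[name] = kept
	}
	return out
}

func main() {
	cols := map[string][]float64{"x": {1, 2, 3}, "y": {10, 30, 50}}
	fmt.Println(filterRows(cols, "y", gt(25)))
}
```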

func Group added in v0.0.5

func Group(xReducer, yReducer string, opts ...GroupOption) Transform

Group returns a Transform that groups data and applies different reducers to both axes simultaneously. The group-by column is auto-detected from the "color" or "group" aesthetic in the mapping; use WithGroupBy to override.

Example:

stat.Group("mean", "sum")                        // mean(x), sum(y), group by color
stat.Group("p50", "max", stat.WithGroupBy("id")) // p50(x), max(y), group by id

func GroupX added in v0.0.5

func GroupX(reducer string) Transform

GroupX returns a Transform that groups data by the x channel and applies a named reducer to the y channel within each group. Produces sorted unique x values and the reduced y values.

For engine-supported aggregations (sum, mean, min, max, count, median, variance), delegates to the engine's Aggregator interface via Dataset.GroupBy().Summarize(). For custom reducers, falls back to scalar computation.

GroupX("mean") is equivalent to the existing SummaryXY transform.
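
The group-then-reduce behaviour can be sketched with a lookup table of named reducers. This illustrates only the scalar fallback path, with three of the documented reducer names; `groupReduce` and the `reducers` table are hypothetical, and rejecting unknown names with an error is an assumed behaviour.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

func sum(v []float64) float64 {
	s := 0.0
	for _, x := range v {
		s += x
	}
	return s
}

func mean(v []float64) float64 { return sum(v) / float64(len(v)) }

func maxOf(v []float64) float64 {
	m := v[0]
	for _, x := range v[1:] {
		if x > m {
			m = x
		}
	}
	return m
}

// reducers maps a reducer name to its implementation.
var reducers = map[string]func([]float64) float64{"sum": sum, "mean": mean, "max": maxOf}

// groupReduce groups y by the matching x value and reduces each group.
// It returns the sorted unique x values and one reduced y per group.
func groupReduce(x []string, y []float64, reducer string) ([]string, []float64, error) {
	reduce, ok := reducers[reducer]
	if !ok {
		return nil, nil, fmt.Errorf("unknown reducer %q", reducer)
	}
	if len(x) != len(y) {
		return nil, nil, errors.New("x and y must have the same length")
	}
	groups := map[string][]float64{}
	for i, key := range x {
		groups[key] = append(groups[key], y[i])
	}
	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	out := make([]float64, len(keys))
	for i, k := range keys {
		out[i] = reduce(groups[k])
	}
	return keys, out, nil
}

func main() {
	keys, vals, err := groupReduce([]string{"a", "b", "a", "b"}, []float64{1, 10, 3, 20}, "mean")
	fmt.Println(keys, vals, err)
}
```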

func GroupY added in v0.0.5

func GroupY(reducer string) Transform

GroupY returns a Transform that groups by y and reduces x.

func IdentityTransform added in v0.0.5

func IdentityTransform() Transform

IdentityTransform returns a Transform that passes data through unchanged.

func NormalizeX added in v0.0.5

func NormalizeX(opts ...NormalizeOption) Transform

NormalizeX rescales the x channel so values sum to the given total.

func NormalizeY added in v0.0.5

func NormalizeY(opts ...NormalizeOption) Transform

NormalizeY rescales the y channel so values sum to the given total (default 1.0). Applied as a pipeline stage after transforms like BinX or Count to convert frequencies into proportions.

Uses the engine's Aggregator.Sum for the column sum and MathKernel.MulScalar for element-wise scaling — no manual float64 loops. Returns a lazy Dataset.

Output hint: y → HintProportion.
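
The arithmetic behind normalization is a single scale factor, total / sum. The sketch below shows it on a plain slice; the real transform uses the engine's Aggregator and MathKernel instead of a loop. `normalize` is a hypothetical name, and returning an error for a zero sum is an assumption, since that case is not documented.

```go
package main

import (
	"errors"
	"fmt"
)

// normalize rescales ys so the values sum to total.
func normalize(ys []float64, total float64) ([]float64, error) {
	sum := 0.0
	for _, y := range ys {
		sum += y
	}
	if sum == 0 {
		return nil, errors.New("cannot normalize: column sums to zero")
	}
	scale := total / sum
	out := make([]float64, len(ys))
	for i, y := range ys {
		out[i] = y * scale
	}
	return out, nil
}

func main() {
	counts := []float64{1, 1, 2}
	proportions, _ := normalize(counts, 1)
	percent, _ := normalize(counts, 100)
	fmt.Println(proportions, percent)
}
```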

func ReverseRows added in v0.0.5

func ReverseRows() Transform

ReverseRows returns a Transform that reverses the row order. Uses Dataset.SelectRows for engine-native scatter-gather.

func SelectRow added in v0.0.5

func SelectRow(mode SelectMode, column string) Transform

SelectRow returns a Transform that selects a single row from the dataset based on the given mode and column. The mode determines which row is kept:


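  • SelectFirst: keep the first row (smallest value)
  • SelectLast: keep the last row (largest value)
  • SelectMin: keep the row with the minimum value
  • SelectMax: keep the row with the maximum value

Uses engine-native Arrange + Head/Tail and stays lazy.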

func SmoothXY added in v0.0.5

func SmoothXY(opts ...SmoothOption) Transform

SmoothXY returns a Transform that fits a smooth curve through (x, y) data. Produces x (grid) and y (fitted) columns. Uses engine-native StatKernel.LinearFit or LoessFit — zero materialization.
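
For the "linear" method, the fitted curve is an ordinary least-squares line evaluated on an output grid. The sketch below shows that calculation on plain slices; it is not the StatKernel.LinearFit implementation, the function names are hypothetical, and LOESS and confidence bands are omitted.

```go
package main

import "fmt"

// linearFit returns the ordinary least-squares slope and intercept for y ~ x.
// ok is false when there are fewer than two points or x has no variation.
func linearFit(xs, ys []float64) (slope, intercept float64, ok bool) {
	n := len(xs)
	if n < 2 || n != len(ys) {
		return 0, 0, false
	}
	var meanX, meanY float64
	for i := range xs {
		meanX += xs[i]
		meanY += ys[i]
	}
	meanX /= float64(n)
	meanY /= float64(n)

	var sxx, sxy float64
	for i := range xs {
		dx := xs[i] - meanX
		sxx += dx * dx
		sxy += dx * (ys[i] - meanY)
	}
	if sxx == 0 {
		return 0, 0, false
	}
	slope = sxy / sxx
	return slope, meanY - slope*meanX, true
}

// predictGrid evaluates the fitted line on nOut evenly spaced x values in [lo, hi].
func predictGrid(lo, hi, slope, intercept float64, nOut int) (gx, gy []float64) {
	if nOut < 2 {
		return nil, nil
	}
	step := (hi - lo) / float64(nOut-1)
	for i := 0; i < nOut; i++ {
		x := lo + float64(i)*step
		gx = append(gx, x)
		gy = append(gy, intercept+slope*x)
	}
	return gx, gy
}

func main() {
	slope, intercept, _ := linearFit([]float64{0, 1, 2, 3}, []float64{1, 3, 5, 7})
	gx, gy := predictGrid(0, 3, slope, intercept, 4)
	fmt.Println(slope, intercept, gx, gy)
}
```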

func SortBy added in v0.0.5

func SortBy(column string, opts ...SortOption) Transform

SortBy returns a Transform that sorts all rows by the named column. Default order is ascending; use Desc() for descending. Uses the engine's Selector.SortIndices + Dataset.SelectRows.
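
The two-step approach named above (compute sorted indices, then gather rows by index) can be sketched on plain slices. `sortIndices` and `selectRows` here are hypothetical local functions that only borrow the names of the engine operations; their real signatures are not shown in this documentation.

```go
package main

import (
	"fmt"
	"sort"
)

// sortIndices returns the row order that sorts col, without moving any data.
// The sort is stable, so ties keep their original relative order.
func sortIndices(col []float64, descending bool) []int {
	idx := make([]int, len(col))
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool {
		if descending {
			return col[idx[a]] > col[idx[b]]
		}
		return col[idx[a]] < col[idx[b]]
	})
	return idx
}

// selectRows gathers the values of col in the order given by idx.
func selectRows(col []string, idx []int) []string {
	out := make([]string, len(idx))
	for i, j := range idx {
		out[i] = col[j]
	}
	return out
}

func main() {
	score := []float64{3, 1, 2}
	name := []string{"c", "a", "b"}
	fmt.Println(selectRows(name, sortIndices(score, false)))
	fmt.Println(selectRows(name, sortIndices(score, true)))
}
```

Separating the index computation from the gather means the same index slice can reorder every column of a table consistently.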

func StackX added in v0.0.5

func StackX() Transform

StackX returns a Transform that cumulatively stacks x values. Produces an "xmin" column containing the base of each stacked segment: xmin = CumSum(x) - x, and replaces x with CumSum(x).

Uses the engine's Windower.CumSum and MathKernel.SubCols — no manual float64 loops. Returns a lazy Dataset.

This is the horizontal counterpart of StackY.

func StackY added in v0.0.5

func StackY() Transform

StackY returns a Transform that cumulatively stacks y values. Produces a "ymin" column containing the base of each stacked segment: ymin = CumSum(y) - y, and replaces y with CumSum(y).

Uses the engine's Windower.CumSum and MathKernel.SubCols — no manual float64 loops. Returns a lazy Dataset.

Unlike geom.Stack() (a position adjustment that offsets bar positions across color groups), StackY is a data transform that accumulates y values within a single group's pipeline.
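
The two formulas above (new y = CumSum(y), ymin = CumSum(y) - y) work out as follows on a plain slice. The real transform uses the engine's Windower and MathKernel; `stack` is a hypothetical name.

```go
package main

import "fmt"

// stack returns the running total of ys (the top of each segment) and
// the base of each segment, computed as top - y.
func stack(ys []float64) (top, base []float64) {
	top = make([]float64, len(ys))
	base = make([]float64, len(ys))
	running := 0.0
	for i, y := range ys {
		running += y
		top[i] = running
		base[i] = running - y
	}
	return top, base
}

func main() {
	top, base := stack([]float64{1, 2, 3})
	fmt.Println(top, base)
}
```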

func SummaryXY added in v0.0.5

func SummaryXY() Transform

SummaryXY returns a Transform that computes mean(y) for each distinct x value. Produces x (sorted unique) and y (mean) columns. Uses engine-native GroupBy + Summarize(AggMean) — zero materialization.

func TopN added in v0.0.5

func TopN(n int, column string, opts ...SortOption) Transform

TopN returns a Transform that keeps the top N rows by the named column. Default order is descending (largest first); use Ascending() for smallest first. Uses Dataset.Arrange + Dataset.Head/Tail — stays lazy.
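
The sort-then-truncate idea behind TopN is sketched below on a single column of values. The real transform reorders whole rows lazily via Arrange and Head/Tail; `topN` is a hypothetical name, and clamping n to the available length is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// topN returns the n largest values (or the n smallest when ascending is
// true) in sorted order. The input slice is not modified.
func topN(values []float64, n int, ascending bool) []float64 {
	sorted := append([]float64(nil), values...)
	sort.Slice(sorted, func(i, j int) bool {
		if ascending {
			return sorted[i] < sorted[j]
		}
		return sorted[i] > sorted[j]
	})
	if n < 0 {
		n = 0
	}
	if n > len(sorted) {
		n = len(sorted)
	}
	return sorted[:n]
}

func main() {
	values := []float64{3, 1, 4, 1, 5}
	fmt.Println(topN(values, 2, false))
	fmt.Println(topN(values, 2, true))
}
```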

type TransformInput added in v0.0.5

type TransformInput struct {
	Data    dataset.Dataset
	Mapping map[string]string
}

TransformInput carries data and aesthetic mapping into a transform.

type TransformResult added in v0.0.5

type TransformResult struct {
	Data    dataset.Dataset
	Mapping map[string]string
}

TransformResult carries transformed data and rewritten mapping out.
