dataframe

package
v1.0.0 Latest
Warning

This package is not in the latest version of its module.

Published: Nov 4, 2025 | License: MIT | Imports: 8 | Imported by: 0

Documentation

Overview

Package dataframe provides a two-dimensional labeled data structure together with operations for selection, filtering, aggregation, joining, reshaping, and windowing.

DataFrame is the primary data structure in GopherData, analogous to pandas.DataFrame in Python or data.frame in R. It consists of:

  • Multiple Series (columns) with potentially different types
  • A shared Index for row labels
  • Copy-on-write semantics for predictable behavior

Key features:

  • Type-safe generic Series under the hood
  • Efficient null handling via bit-packed masks
  • Thread-safe concurrent reads
  • Rich selection and filtering operations
  • Automatic parallelism for expensive operations
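The bit-packed null mask mentioned above can be pictured with a small standard-library sketch. This is not the package's internal representation: the NullMask type and its methods are hypothetical names, chosen only to show how one bit per row keeps null tracking compact.

```go
package main

import (
	"fmt"
	"math/bits"
)

// NullMask is a hypothetical bit-packed mask: bit i is set when row i is null.
type NullMask struct {
	words []uint64
}

// NewNullMask allocates a mask for n rows, all initially non-null.
func NewNullMask(n int) *NullMask {
	return &NullMask{words: make([]uint64, (n+63)/64)}
}

// SetNull marks row i as null.
func (m *NullMask) SetNull(i int) {
	m.words[i/64] |= uint64(1) << (uint(i) % 64)
}

// IsNull reports whether row i is null.
func (m *NullMask) IsNull(i int) bool {
	return m.words[i/64]&(uint64(1)<<(uint(i)%64)) != 0
}

// NullCount counts null rows one 64-bit word at a time.
func (m *NullMask) NullCount() int {
	total := 0
	for _, w := range m.words {
		total += bits.OnesCount64(w)
	}
	return total
}

func main() {
	m := NewNullMask(100)
	m.SetNull(3)
	m.SetNull(70)
	fmt.Println(m.IsNull(3), m.IsNull(4), m.NullCount()) // true false 2
}
```

With this layout, 100 rows need only two 64-bit words, and counting nulls is a popcount per word rather than a scan over every value.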

Example:

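df, err := dataframe.New(map[string]any{
    "name": []string{"Alice", "Bob", "Charlie"},
    "age":  []int64{25, 30, 35},
})
if err != nil {
    log.Fatal(err)
}

subset := df.Select("name", "age")
filtered := subset.Filter(func(r *dataframe.Row) bool {
    age, ok := r.Get("age")
    if !ok {
        return false
    }
    // Guard the type assertion so an unexpected type does not panic.
    n, isInt := age.(int64)
    return isInt && n > 25
})
fmt.Println(filtered) // printed via DataFrame.String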


Constants

const (
	AggSum    = "sum"    // Sum of values
	AggMean   = "mean"   // Arithmetic mean
	AggMedian = "median" // 50th percentile
	AggStd    = "std"    // Sample standard deviation
	AggVar    = "var"    // Sample variance
	AggMin    = "min"    // Minimum value
	AggMax    = "max"    // Maximum value
	AggCount  = "count"  // Count non-null values
	AggSize   = "size"   // Count all values (including nulls)
	AggFirst  = "first"  // First value in group
	AggLast   = "last"   // Last value in group
)

Aggregation function names, as used in the ops maps passed to GroupBy.Agg and GroupBy.AggMultiple.
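The difference between "count" and "size", and the n-1 denominator behind the sample "var" and "std", can be sketched with the standard library alone. This is an illustration, not the package's code: nil pointers stand in for nulls, and ptr, countAndSize, and sampleVar are hypothetical helper names.

```go
package main

import (
	"fmt"
	"math"
)

// ptr is a small helper so that nil can stand in for a null value.
func ptr(x float64) *float64 { return &x }

// countAndSize returns the number of non-null values and the total number of values.
func countAndSize(values []*float64) (count, size int) {
	for _, v := range values {
		if v != nil {
			count++
		}
	}
	return count, len(values)
}

// sampleVar returns the sample variance (n-1 denominator) of the non-null values.
// It returns NaN when fewer than two non-null values are present.
func sampleVar(values []*float64) float64 {
	sum, n := 0.0, 0
	for _, v := range values {
		if v != nil {
			sum += *v
			n++
		}
	}
	if n < 2 {
		return math.NaN()
	}
	mean := sum / float64(n)
	ss := 0.0
	for _, v := range values {
		if v != nil {
			d := *v - mean
			ss += d * d
		}
	}
	return ss / float64(n-1)
}

func main() {
	col := []*float64{ptr(1), ptr(2), nil, ptr(3), ptr(4)}
	count, size := countAndSize(col)
	v := sampleVar(col)
	fmt.Println(count, size) // 4 5
	// Sample variance is 5/3 here; the standard deviation is its square root.
	fmt.Println(v, math.Sqrt(v))
}
```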

const (
	JoinInner = "inner" // Intersection
	JoinLeft  = "left"  // All from left, matching from right
	JoinRight = "right" // All from right, matching from left
	JoinOuter = "outer" // Union (full outer join)
	JoinCross = "cross" // Cartesian product
)

Join types, as passed in the joinType argument of Join and Merge.
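To make the "inner" versus "left" distinction concrete, here is a minimal hash-join sketch over plain slices. It is not how the package implements joins: the row and joined types and the hashJoin function are hypothetical, and only two of the five join types are covered.

```go
package main

import "fmt"

type row struct {
	key string
	val int
}

// joined holds one output row; right is nil when a left row has no match.
type joined struct {
	key   string
	left  int
	right *int
}

// hashJoin supports how = "inner" or "left" on a single key column.
func hashJoin(left, right []row, how string) []joined {
	// Build a lookup from key to all matching right-hand values.
	index := make(map[string][]int)
	for _, r := range right {
		index[r.key] = append(index[r.key], r.val)
	}
	var out []joined
	for _, l := range left {
		matches := index[l.key]
		if len(matches) == 0 && how == "left" {
			// A left join keeps unmatched left rows with a null right side.
			out = append(out, joined{key: l.key, left: l.val})
		}
		for i := range matches {
			out = append(out, joined{key: l.key, left: l.val, right: &matches[i]})
		}
	}
	return out
}

func main() {
	left := []row{{"a", 1}, {"b", 2}, {"c", 3}}
	right := []row{{"a", 10}, {"c", 30}, {"d", 40}}
	fmt.Println(len(hashJoin(left, right, "inner"))) // 2
	fmt.Println(len(hashJoin(left, right, "left")))  // 3
}
```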

Variables

This section is empty.

Functions

This section is empty.

Types

type AggOption

type AggOption func(*AggOptions)

AggOption is a functional option for aggregations.

func SkipNA

func SkipNA(skip bool) AggOption

SkipNA sets whether to skip null values (default: true).
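AggOption follows Go's functional options pattern. The sketch below shows that pattern in isolation, under stated assumptions: the lowercase aggOptions, aggOption, skipNA, and sumValues names are hypothetical stand-ins, and NaN is used to represent a null value.

```go
package main

import (
	"fmt"
	"math"
)

// aggOptions holds the settings; the skipNA field is an assumption.
type aggOptions struct {
	skipNA bool
}

// aggOption mutates an aggOptions value.
type aggOption func(*aggOptions)

// skipNA returns an option that sets whether nulls are skipped.
func skipNA(skip bool) aggOption {
	return func(o *aggOptions) { o.skipNA = skip }
}

// sumValues sums values, treating NaN as null. Nulls are skipped by default.
func sumValues(values []float64, opts ...aggOption) float64 {
	o := aggOptions{skipNA: true} // default mirrors the documented default
	for _, opt := range opts {
		opt(&o)
	}
	total := 0.0
	for _, v := range values {
		if math.IsNaN(v) {
			if o.skipNA {
				continue
			}
			return math.NaN() // a null poisons the result when not skipped
		}
		total += v
	}
	return total
}

func main() {
	vals := []float64{1, math.NaN(), 2}
	fmt.Println(sumValues(vals))                // 3
	fmt.Println(sumValues(vals, skipNA(false))) // NaN
}
```

The benefit of the pattern is that callers who accept the defaults pass nothing, and new options can be added later without changing the function signature.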

type AggOptions

type AggOptions struct {
	// contains filtered or unexported fields
}

AggOptions configures aggregation behavior.

type DataFrame

type DataFrame struct {
	// contains filtered or unexported fields
}

DataFrame is a two-dimensional, size-mutable, tabular data structure. Operations return new DataFrames (copy-on-write semantics) unless noted.

func FromRecords

func FromRecords(records []map[string]any) (*DataFrame, error)

FromRecords creates a DataFrame from a slice of maps (records). Each map represents a row with column names as keys.

func New

func New(data map[string]any) (*DataFrame, error)

New creates a new DataFrame from a map of column names to data slices. All slices must have the same length. Automatically infers data types and creates a default RangeIndex.

func (*DataFrame) Apply

func (df *DataFrame) Apply(fn func(*Row) any, resultCol string) *DataFrame

Apply applies a function to each row and adds the result as a new column. The function receives a Row and returns a single value.

func (*DataFrame) ApplyColumn

func (df *DataFrame) ApplyColumn(col string, fn func(any) any) *DataFrame

ApplyColumn applies a function to each value in a column.

func (*DataFrame) ApplyElement

func (df *DataFrame) ApplyElement(cols []string, fn func(map[string]any) map[string]any) *DataFrame

ApplyElement applies a function to selected columns element-wise. The function receives a map of column values for the current row.

func (*DataFrame) Argsort

func (df *DataFrame) Argsort(col string, order core.Order, opts ...SortOption) []int

Argsort returns the indices that would sort the DataFrame.
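As an illustration of what an argsort produces, the following standard-library sketch sorts a slice of positions rather than the values themselves. The argsort function here is a hypothetical helper over a plain []float64, not the method above, and it ignores nulls and sort options.

```go
package main

import (
	"fmt"
	"sort"
)

// argsort returns the positions that would put values in sorted order.
func argsort(values []float64, ascending bool) []int {
	idx := make([]int, len(values))
	for i := range idx {
		idx[i] = i
	}
	// Sort the positions, comparing the values they point at.
	sort.SliceStable(idx, func(a, b int) bool {
		if ascending {
			return values[idx[a]] < values[idx[b]]
		}
		return values[idx[a]] > values[idx[b]]
	})
	return idx
}

func main() {
	values := []float64{30, 10, 20}
	fmt.Println(argsort(values, true))  // [1 2 0]
	fmt.Println(argsort(values, false)) // [0 2 1]
}
```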

func (*DataFrame) Column

func (df *DataFrame) Column(name string) (*series.Series[any], error)

Column returns the Series for the given column name. Returns an error if the column doesn't exist.

func (*DataFrame) Columns

func (df *DataFrame) Columns() []string

Columns returns a copy of the column names.

func (*DataFrame) Copy

func (df *DataFrame) Copy() *DataFrame

Copy returns a deep copy of the DataFrame.

func (*DataFrame) Count

func (df *DataFrame) Count(cols ...string) (map[string]int, error)

Count returns the count of non-null values for each column.

func (*DataFrame) Describe

func (df *DataFrame) Describe() (*DataFrame, error)

Describe generates descriptive statistics for numeric columns. Returns a DataFrame with statistics: count, mean, std, min, 25%, 50%, 75%, max.

func (*DataFrame) Drop

func (df *DataFrame) Drop(cols ...string) *DataFrame

Drop returns a new DataFrame with the specified columns removed.

func (*DataFrame) DropNA

func (df *DataFrame) DropNA(opts ...DropNAOption) *DataFrame

DropNA returns a new DataFrame with rows containing null values removed.

func (*DataFrame) EWM

func (df *DataFrame) EWM(alpha float64) *Window

EWM creates an exponentially weighted moving window.
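For intuition about the alpha parameter, here is a sketch of one common definition of an exponentially weighted mean, the recursive form y[0] = x[0], y[t] = alpha*x[t] + (1-alpha)*y[t-1]. Whether this package uses the recursive form or a bias-adjusted weighting is not stated in this documentation, so treat ewmMean as a hypothetical illustration only.

```go
package main

import "fmt"

// ewmMean computes the recursive exponentially weighted mean.
// alpha is the smoothing factor; larger values weight recent observations more.
func ewmMean(values []float64, alpha float64) []float64 {
	out := make([]float64, len(values))
	for i, x := range values {
		if i == 0 {
			out[i] = x // the first output is the first observation
			continue
		}
		out[i] = alpha*x + (1-alpha)*out[i-1]
	}
	return out
}

func main() {
	fmt.Println(ewmMean([]float64{1, 2, 3}, 0.5)) // [1 1.5 2.25]
}
```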

func (*DataFrame) Empty

func (df *DataFrame) Empty() bool

Empty returns true if the DataFrame has no rows.

func (*DataFrame) Expanding

func (df *DataFrame) Expanding(minPeriods int) *Window

Expanding creates an expanding window.

func (*DataFrame) FillNA

func (df *DataFrame) FillNA(value any) *DataFrame

FillNA returns a new DataFrame with null values replaced by the given value.

func (*DataFrame) FillNAColumn

func (df *DataFrame) FillNAColumn(col string, value any) *DataFrame

FillNAColumn returns a new DataFrame with nulls in a specific column replaced.

func (*DataFrame) FillNADict

func (df *DataFrame) FillNADict(values map[string]any) *DataFrame

FillNADict returns a new DataFrame with nulls replaced using a column-specific map.

func (*DataFrame) Filter

func (df *DataFrame) Filter(fn func(*Row) bool) *DataFrame

Filter returns a new DataFrame containing only rows for which the predicate returns true. This creates a copy of the data for filtered rows.

func (*DataFrame) GroupBy

func (df *DataFrame) GroupBy(cols ...string) (*GroupBy, error)

GroupBy creates a GroupBy object for aggregation operations.

func (*DataFrame) HasColumn

func (df *DataFrame) HasColumn(name string) bool

HasColumn returns true if the DataFrame has a column with the given name.

func (*DataFrame) Head

func (df *DataFrame) Head(n int) string

Head returns a string representation of the first n rows.

func (*DataFrame) Iloc

func (df *DataFrame) Iloc(positions ...int) *DataFrame

Iloc returns rows by integer position.

func (*DataFrame) Index

func (df *DataFrame) Index() core.Index

Index returns the index of the DataFrame.

func (*DataFrame) Interpolate

func (df *DataFrame) Interpolate(method string, opts ...InterpolateOption) *DataFrame

Interpolate fills null values using interpolation. method can be "linear", "ffill", or "bfill".
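The "linear" method can be pictured as drawing a straight line between the nearest non-null neighbours on each side of a gap. The sketch below assumes NaN marks a null and leaves leading and trailing nulls untouched; interpolateLinear is a hypothetical helper, and the package's handling of edge cases may differ.

```go
package main

import (
	"fmt"
	"math"
)

// interpolateLinear fills interior NaN runs by linear interpolation.
// Leading and trailing NaNs are left as they are.
func interpolateLinear(values []float64) []float64 {
	out := append([]float64(nil), values...) // work on a copy
	prev := -1                               // position of the last non-null value
	for i, x := range out {
		if math.IsNaN(x) {
			continue
		}
		if prev >= 0 && i-prev > 1 {
			step := (x - out[prev]) / float64(i-prev)
			for j := prev + 1; j < i; j++ {
				out[j] = out[prev] + step*float64(j-prev)
			}
		}
		prev = i
	}
	return out
}

func main() {
	nan := math.NaN()
	fmt.Println(interpolateLinear([]float64{1, nan, nan, 4, nan})) // [1 2 3 4 NaN]
}
```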

func (*DataFrame) IsNA

func (df *DataFrame) IsNA() (*DataFrame, error)

IsNA returns a DataFrame of boolean values indicating null positions.

func (*DataFrame) Join

func (df *DataFrame) Join(other *DataFrame, joinType, onCol string, opts ...JoinOption) (*DataFrame, error)

Join performs a join operation on a single column.

func (*DataFrame) Loc

func (df *DataFrame) Loc(labels ...any) (*DataFrame, error)

Loc returns rows by label-based indexing.

func (*DataFrame) Map

func (df *DataFrame) Map(fn func(any) any) *DataFrame

Map applies a function to each value in the DataFrame. Returns a new DataFrame with all values transformed.

func (*DataFrame) Max

func (df *DataFrame) Max(cols ...string) (map[string]any, error)

Max returns the maximum value for each column.

func (*DataFrame) Mean

func (df *DataFrame) Mean(cols ...string) (map[string]float64, error)

Mean calculates the arithmetic mean of numeric columns.

func (*DataFrame) Median

func (df *DataFrame) Median(cols ...string) (map[string]float64, error)

Median calculates the median of numeric columns.

func (*DataFrame) Melt

func (df *DataFrame) Melt(idVars, valueVars []string, varName, valueName string) (*DataFrame, error)

Melt transforms wide format to long format.

  • idVars: columns to use as identifier variables
  • valueVars: columns to unpivot (if empty, all non-id columns are used)
  • varName: name for the variable column
  • valueName: name for the value column
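The wide-to-long reshaping can be sketched on plain Go data. This is a simplified illustration with a single identifier column and float values; longRow and melt are hypothetical names, and the row ordering (all rows for the first value column, then the next) is an assumption rather than documented behavior.

```go
package main

import "fmt"

// longRow is one row of the long-format result.
type longRow struct {
	id       string
	variable string
	value    float64
}

// melt unpivots the named value columns into (id, variable, value) rows.
func melt(ids []string, cols map[string][]float64, valueVars []string) []longRow {
	var out []longRow
	for _, name := range valueVars {
		for i, id := range ids {
			out = append(out, longRow{id: id, variable: name, value: cols[name][i]})
		}
	}
	return out
}

func main() {
	ids := []string{"a", "b"}
	cols := map[string][]float64{"x": {1, 2}, "y": {3, 4}}
	for _, r := range melt(ids, cols, []string{"x", "y"}) {
		fmt.Println(r.id, r.variable, r.value)
	}
}
```

Two wide rows with two value columns become four long rows, one per (id, variable) pair.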

func (*DataFrame) Merge

func (df *DataFrame) Merge(other *DataFrame, joinType string, leftOn, rightOn []string, opts ...JoinOption) (*DataFrame, error)

Merge performs a join operation on multiple columns. leftOn and rightOn specify the join keys for left and right DataFrames.

func (*DataFrame) Min

func (df *DataFrame) Min(cols ...string) (map[string]any, error)

Min returns the minimum value for each column.

func (*DataFrame) Ncols

func (df *DataFrame) Ncols() int

Ncols returns the number of columns in the DataFrame.

func (*DataFrame) NotNA

func (df *DataFrame) NotNA() (*DataFrame, error)

NotNA returns a DataFrame of boolean values indicating non-null positions.

func (*DataFrame) Nrows

func (df *DataFrame) Nrows() int

Nrows returns the number of rows in the DataFrame.

func (*DataFrame) Pivot

func (df *DataFrame) Pivot(index, columns, values string) (*DataFrame, error)

Pivot transforms long format to wide format.

  • index: column to use as row index
  • columns: column to use for new column names
  • values: column to use for cell values

func (*DataFrame) Rename

func (df *DataFrame) Rename(mapping map[string]string) *DataFrame

Rename renames columns in the DataFrame.

func (*DataFrame) Rolling

func (df *DataFrame) Rolling(size int, opts ...WindowOption) *Window

Rolling creates a rolling window.

func (*DataFrame) Select

func (df *DataFrame) Select(cols ...string) *DataFrame

Select returns a new DataFrame containing only the specified columns. This is a view operation (zero-copy) - the underlying data is shared.

func (*DataFrame) SetIndex

func (df *DataFrame) SetIndex(idx core.Index) error

SetIndex sets a new index for the DataFrame.

func (*DataFrame) Shape

func (df *DataFrame) Shape() (int, int)

Shape returns (nrows, ncols).

func (*DataFrame) SliceRows

func (df *DataFrame) SliceRows(start, end int) *DataFrame

SliceRows returns a new DataFrame with rows from start (inclusive) to end (exclusive).

func (*DataFrame) Sort

func (df *DataFrame) Sort(col string, order core.Order, opts ...SortOption) *DataFrame

Sort sorts the DataFrame by a single column. Returns a new DataFrame with rows reordered.

func (*DataFrame) SortIndex

func (df *DataFrame) SortIndex(order core.Order, opts ...SortOption) *DataFrame

SortIndex sorts the DataFrame by its index.

func (*DataFrame) SortMulti

func (df *DataFrame) SortMulti(cols []string, orders []core.Order, opts ...SortOption) *DataFrame

SortMulti sorts the DataFrame by multiple columns. Columns are sorted in order of priority (first column is primary sort key).

func (*DataFrame) Stack

func (df *DataFrame) Stack() (*DataFrame, error)

Stack pivots columns into rows. Rather than building a multi-level index, this implementation returns a long-form DataFrame.

func (*DataFrame) Std

func (df *DataFrame) Std(cols ...string) (map[string]float64, error)

Std calculates the sample standard deviation of numeric columns.

func (*DataFrame) String

func (df *DataFrame) String() string

String returns a string representation of the DataFrame.

func (*DataFrame) Sum

func (df *DataFrame) Sum(cols ...string) (map[string]float64, error)

Sum calculates the sum of numeric columns. Returns a map of column names to their sums.

func (*DataFrame) Tail

func (df *DataFrame) Tail(n int) string

Tail returns a string representation of the last n rows.

func (*DataFrame) Transpose

func (df *DataFrame) Transpose() (*DataFrame, error)

Transpose swaps rows and columns.

func (*DataFrame) Unstack

func (df *DataFrame) Unstack(rowCol, colCol, valueCol string) (*DataFrame, error)

Unstack pivots rows into columns (inverse of Stack).

func (*DataFrame) Var

func (df *DataFrame) Var(cols ...string) (map[string]float64, error)

Var calculates the sample variance of numeric columns.

func (*DataFrame) WithColumn

func (df *DataFrame) WithColumn(name string, s *series.Series[any]) *DataFrame

WithColumn adds or replaces a column with the given Series.

type DatetimeIndex

type DatetimeIndex struct {
	// contains filtered or unexported fields
}

DatetimeIndex is a time-based index for time series data.

func NewDatetimeIndex

func NewDatetimeIndex(times []time.Time, tz *time.Location) *DatetimeIndex

NewDatetimeIndex creates a new DatetimeIndex.

func (*DatetimeIndex) Copy

func (di *DatetimeIndex) Copy() core.Index

Copy returns a copy of the index.

func (*DatetimeIndex) Get

func (di *DatetimeIndex) Get(pos int) any

Get returns the label at the given position.

func (*DatetimeIndex) Len

func (di *DatetimeIndex) Len() int

Len returns the number of elements in the index.

func (*DatetimeIndex) Loc

func (di *DatetimeIndex) Loc(labels ...any) ([]int, error)

Loc returns the integer positions for the given labels.

func (*DatetimeIndex) Slice

func (di *DatetimeIndex) Slice(start, end int) core.Index

Slice returns a subset of the index.

type DropNAOption

type DropNAOption func(*DropNAOptions)

DropNAOption is a functional option for DropNA.

func HowAll

func HowAll() DropNAOption

HowAll drops rows only if all values are null.

func HowAny

func HowAny() DropNAOption

HowAny drops rows with any null value (default behavior).

func Subset

func Subset(cols []string) DropNAOption

Subset specifies columns to consider for null checking.

func Thresh

func Thresh(n int) DropNAOption

Thresh sets the minimum number of non-null values required to keep a row.
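The HowAny, HowAll, and Thresh rules each reduce to a test on the number of non-null cells in a row. The sketch below shows that test on a []any row where nil marks a null. keepRow is a hypothetical helper, and letting a positive threshold take precedence over the how setting is an assumption, not documented behavior.

```go
package main

import "fmt"

// keepRow reports whether a row survives null-dropping.
// how is "any" or "all"; a positive thresh overrides how (assumption).
func keepRow(cells []any, how string, thresh int) bool {
	nonNull := 0
	for _, c := range cells {
		if c != nil {
			nonNull++
		}
	}
	switch {
	case thresh > 0:
		return nonNull >= thresh // keep rows with enough non-null values
	case how == "all":
		return nonNull > 0 // drop only when every value is null
	default:
		return nonNull == len(cells) // "any": drop on the first null
	}
}

func main() {
	row := []any{1, nil, nil}
	fmt.Println(keepRow(row, "any", 0)) // false
	fmt.Println(keepRow(row, "all", 0)) // true
	fmt.Println(keepRow(row, "any", 2)) // false
}
```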

type DropNAOptions

type DropNAOptions struct {
	// contains filtered or unexported fields
}

DropNAOptions configures DropNA behavior.

type GroupBy

type GroupBy struct {
	// contains filtered or unexported fields
}

GroupBy represents a grouped DataFrame for aggregation operations.

func (*GroupBy) Agg

func (gb *GroupBy) Agg(ops map[string]string) (*DataFrame, error)

Agg performs a single aggregation per column. ops maps column names to aggregation function names.

Example: {"sales": "sum", "qty": "mean"}
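Conceptually, a grouped "sum" splits rows by key, then reduces each group. The sketch below does this on two parallel slices with the standard library. groupSum is a hypothetical helper, and returning groups in first-seen order is an assumption about ordering, not a documented guarantee of this package.

```go
package main

import "fmt"

// groupSum sums values per key and returns groups in first-seen order.
func groupSum(keys []string, values []float64) ([]string, []float64) {
	pos := make(map[string]int) // key -> position in the output slices
	var groups []string
	var sums []float64
	for i, k := range keys {
		p, seen := pos[k]
		if !seen {
			p = len(groups)
			pos[k] = p
			groups = append(groups, k)
			sums = append(sums, 0)
		}
		sums[p] += values[i]
	}
	return groups, sums
}

func main() {
	region := []string{"east", "west", "east"}
	sales := []float64{10, 5, 7}
	groups, sums := groupSum(region, sales)
	fmt.Println(groups, sums) // [east west] [17 5]
}
```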

func (*GroupBy) AggMultiple

func (gb *GroupBy) AggMultiple(ops map[string][]string) (*DataFrame, error)

AggMultiple performs multiple aggregations per column. ops maps column names to slices of aggregation function names.

Example: {"sales": ["sum", "mean", "std"], "qty": ["min", "max"]}

Result columns: [group_keys..., sales_sum, sales_mean, sales_std, qty_min, qty_max]

func (*GroupBy) Apply

func (gb *GroupBy) Apply(fn func(*DataFrame) any) (*DataFrame, error)

Apply applies a custom function to each group and returns a DataFrame.

func (*GroupBy) Count

func (gb *GroupBy) Count() (*DataFrame, error)

Count returns the count of non-null values in each group.

func (*GroupBy) Size

func (gb *GroupBy) Size() (*DataFrame, error)

Size returns the size of each group (including nulls).

type InterpolateOption

type InterpolateOption func(*InterpolateOptions)

InterpolateOption is a functional option for Interpolate.

func Limit

func Limit(n int) InterpolateOption

Limit sets the maximum number of consecutive nulls to fill.

type InterpolateOptions

type InterpolateOptions struct {
	// contains filtered or unexported fields
}

InterpolateOptions configures interpolation behavior.

type JoinOption

type JoinOption func(*JoinOptions)

JoinOption is a functional option for joins.

func WithIndicator

func WithIndicator(colName string) JoinOption

WithIndicator adds a column indicating the source of each row.

func WithSuffixes

func WithSuffixes(left, right string) JoinOption

WithSuffixes sets custom suffixes for overlapping columns.

type JoinOptions

type JoinOptions struct {
	// contains filtered or unexported fields
}

JoinOptions configures join behavior.

type RangeIndex

type RangeIndex struct {
	// contains filtered or unexported fields
}

RangeIndex is an integer-based index with start, stop, and step. It represents the labels start, start+step, start+2*step, and so on, up to but not including stop.
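The length of such a range and the label at a given position follow from simple arithmetic. The sketch below shows one way to compute them. rangeLen and rangeLabel are hypothetical helpers, and support for negative steps is an assumption that this documentation does not confirm.

```go
package main

import "fmt"

// rangeLen returns how many labels lie in [start, stop) when stepping by step.
func rangeLen(start, stop, step int) int {
	if step > 0 && start < stop {
		return (stop - start + step - 1) / step // ceiling division
	}
	if step < 0 && start > stop {
		return (start - stop - step - 1) / (-step)
	}
	return 0 // empty range, or step == 0
}

// rangeLabel returns the label at position pos.
func rangeLabel(start, step, pos int) int {
	return start + pos*step
}

func main() {
	n := rangeLen(0, 10, 3)
	fmt.Println(n)                      // 4
	fmt.Println(rangeLabel(0, 3, n-1))  // 9
	fmt.Println(rangeLen(10, 0, -3))    // 4
}
```

With a step of 3, the last label is 9 rather than stop-1, which is why the length needs a ceiling division.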

func NewRangeIndex

func NewRangeIndex(start, stop, step int) *RangeIndex

NewRangeIndex creates a new RangeIndex.

func (*RangeIndex) Copy

func (ri *RangeIndex) Copy() core.Index

Copy returns a copy of the index.

func (*RangeIndex) Get

func (ri *RangeIndex) Get(pos int) any

Get returns the label at the given position.

func (*RangeIndex) Len

func (ri *RangeIndex) Len() int

Len returns the number of elements in the index.

func (*RangeIndex) Loc

func (ri *RangeIndex) Loc(labels ...any) ([]int, error)

Loc returns the integer positions for the given labels.

func (*RangeIndex) Slice

func (ri *RangeIndex) Slice(start, end int) core.Index

Slice returns a subset of the index.

type Row

type Row struct {
	// contains filtered or unexported fields
}

Row represents a single row in a DataFrame, as passed to the callbacks of Filter and Apply.

func (*Row) Get

func (r *Row) Get(col string) (any, bool)

Get returns the value in the specified column for this row.

type SortOption

type SortOption func(*SortOptions)

SortOption is a functional option for sorting.

func NullsFirst

func NullsFirst() SortOption

NullsFirst places null values at the beginning.

func NullsLast

func NullsLast() SortOption

NullsLast places null values at the end (default).

func Stable

func Stable() SortOption

Stable uses a stable sort algorithm, which preserves the original relative order of rows with equal keys.

type SortOptions

type SortOptions struct {
	// contains filtered or unexported fields
}

SortOptions configures sort behavior.

type StringIndex

type StringIndex struct {
	// contains filtered or unexported fields
}

StringIndex is a string-based index with a lookup map for fast label-based access.
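The lookup-map idea can be sketched as a label slice paired with a map from label to position. The lowercase stringIndex type and its loc method are hypothetical, and the sketch assumes labels are unique (with duplicates, the last occurrence would win).

```go
package main

import "fmt"

// stringIndex pairs ordered labels with a map for constant-time lookup.
type stringIndex struct {
	labels []string
	pos    map[string]int
}

// newStringIndex builds the lookup map once, up front.
func newStringIndex(labels []string) *stringIndex {
	pos := make(map[string]int, len(labels))
	for i, l := range labels {
		pos[l] = i
	}
	return &stringIndex{labels: labels, pos: pos}
}

// loc returns the integer positions of the given labels.
func (s *stringIndex) loc(labels ...string) ([]int, error) {
	out := make([]int, 0, len(labels))
	for _, l := range labels {
		p, ok := s.pos[l]
		if !ok {
			return nil, fmt.Errorf("label %q not found", l)
		}
		out = append(out, p)
	}
	return out, nil
}

func main() {
	idx := newStringIndex([]string{"a", "b", "c"})
	fmt.Println(idx.loc("c", "a")) // [2 0] <nil>
}
```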

func NewStringIndex

func NewStringIndex(labels []string) *StringIndex

NewStringIndex creates a new StringIndex.

func (*StringIndex) Copy

func (si *StringIndex) Copy() core.Index

Copy returns a copy of the index.

func (*StringIndex) Get

func (si *StringIndex) Get(pos int) any

Get returns the label at the given position.

func (*StringIndex) Len

func (si *StringIndex) Len() int

Len returns the number of elements in the index.

func (*StringIndex) Loc

func (si *StringIndex) Loc(labels ...any) ([]int, error)

Loc returns the integer positions for the given labels.

func (*StringIndex) Slice

func (si *StringIndex) Slice(start, end int) core.Index

Slice returns a subset of the index.

type Window

type Window struct {
	// contains filtered or unexported fields
}

Window represents a rolling, expanding, or exponentially weighted window over a DataFrame, as returned by Rolling, Expanding, and EWM.

func (*Window) Max

func (w *Window) Max(col string) (*series.Series[any], error)

Max calculates the rolling maximum for a column.

func (*Window) Mean

func (w *Window) Mean(col string) (*series.Series[float64], error)

Mean calculates the rolling mean for a column.

func (*Window) Min

func (w *Window) Min(col string) (*series.Series[any], error)

Min calculates the rolling minimum for a column.

func (*Window) Std

func (w *Window) Std(col string) (*series.Series[float64], error)

Std calculates the rolling standard deviation for a column.

func (*Window) Sum

func (w *Window) Sum(col string) (*series.Series[float64], error)

Sum calculates the rolling sum for a column.

type WindowOption

type WindowOption func(*WindowOptions)

WindowOption is a functional option for windows.

func Center

func Center() WindowOption

Center centers the window around the current value.

func MinPeriods

func MinPeriods(n int) WindowOption

MinPeriods sets the minimum number of observations required in the window.
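To show how a window size and a minimum-observation count interact, here is a sketch of a trailing rolling mean over a plain slice. rollingMean is a hypothetical helper; it assumes a trailing (not centered) window and emits NaN where too few observations are available, which may not match this package's output exactly.

```go
package main

import (
	"fmt"
	"math"
)

// rollingMean averages the last size values ending at each position.
// Positions with fewer than minPeriods observations produce NaN.
func rollingMean(values []float64, size, minPeriods int) []float64 {
	out := make([]float64, len(values))
	for i := range values {
		lo := i - size + 1
		if lo < 0 {
			lo = 0 // the window is truncated at the start of the data
		}
		n := i - lo + 1
		if n < minPeriods {
			out[i] = math.NaN()
			continue
		}
		sum := 0.0
		for _, x := range values[lo : i+1] {
			sum += x
		}
		out[i] = sum / float64(n)
	}
	return out
}

func main() {
	data := []float64{1, 2, 3, 4}
	fmt.Println(rollingMean(data, 2, 2)) // [NaN 1.5 2.5 3.5]
	fmt.Println(rollingMean(data, 2, 1)) // [1 1.5 2.5 3.5]
}
```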

type WindowOptions

type WindowOptions struct {
	// contains filtered or unexported fields
}

WindowOptions configures window behavior.
