transform

package
v1.3.1 Latest
Warning

This package is not in the latest version of its module.

Published: Apr 24, 2026 License: MIT Imports: 5 Imported by: 0

Documentation

Overview

Package transform implements data transformations for tabular CSV data.

All functions operate on [][]string data matrices with associated metadata (headers, column types). This design keeps the package free of any dependency on application-level types (such as FileData) so it can be shared across application entry points without circular imports.

Supported transformations:

  • Mathematical: log, sqrt, square (element-wise, numeric columns)
  • Scaling: z-score standardization, min-max scaling (numeric columns)
  • Discretization: equal-width binning (numeric → categorical)
  • Encoding: one-hot encoding (categorical → multiple numeric columns)
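As an illustration of the element-wise case, the following is a minimal sketch of a natural-log transform over one column of a [][]string matrix. It is not the package's implementation: the helper name logColumn and the policy of skipping cells that are not positive numbers are assumptions, and the sketch edits the matrix in place, whereas Apply is documented to work on a copy.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// logColumn is a hypothetical helper. It replaces each cell in column col
// with its natural logarithm and returns how many cells were skipped.
// Cells that are missing, non-numeric, or not positive are left unchanged
// (an assumed policy; the package may report these differently).
func logColumn(data [][]string, col int) (skipped int) {
	for _, row := range data {
		if col < 0 || col >= len(row) {
			skipped++
			continue
		}
		v, err := strconv.ParseFloat(row[col], 64)
		if err != nil || v <= 0 {
			skipped++
			continue
		}
		row[col] = strconv.FormatFloat(math.Log(v), 'g', -1, 64)
	}
	return skipped
}

func main() {
	data := [][]string{{"a", "1"}, {"b", "-2"}, {"c", "x"}}
	skipped := logColumn(data, 1)
	fmt.Println(data, skipped) // [[a 0] [b -2] [c x]] 2
}
```

Sqrt and Square would follow the same shape with a different function and domain check.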

The primary entry points are:

  • Apply — execute a transformation and return the modified data
  • GetTransformableColumns — list columns eligible for a given transformation

Constants

This section is empty.

Variables

This section is empty.

Functions

func GetTransformableColumns

func GetTransformableColumns(in Input, transformType Type) []string

GetTransformableColumns returns the column names from in that are eligible for the given transformation type.

Mathematical, scaling, and binning transforms (Log, Sqrt, Square, Standardize, MinMax, Bin) require numeric columns. Columns with the "#target" suffix are excluded. OneHot requires categorical columns.
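The eligibility rule can be sketched as a filter over the headers and the column-type map. This is a hedged illustration, not the package's code: eligibleColumns is a hypothetical name, and because the documentation does not state whether the "#target" exclusion also applies to OneHot, the sketch makes it a parameter.

```go
package main

import (
	"fmt"
	"strings"
)

// eligibleColumns is a hypothetical helper. It returns the headers whose
// detected type equals want, preserving header order. When skipTargets is
// true, names ending in "#target" are left out.
func eligibleColumns(headers []string, types map[string]string, want string, skipTargets bool) []string {
	var out []string
	for _, h := range headers {
		if types[h] != want {
			continue
		}
		if skipTargets && strings.HasSuffix(h, "#target") {
			continue
		}
		out = append(out, h)
	}
	return out
}

func main() {
	headers := []string{"age", "city", "price#target"}
	types := map[string]string{
		"age":          "numeric",
		"city":         "categorical",
		"price#target": "numeric",
	}
	fmt.Println(eligibleColumns(headers, types, "numeric", true))      // [age]
	fmt.Println(eligibleColumns(headers, types, "categorical", false)) // [city]
}
```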

Types

type Input

type Input struct {
	// Data is the row-major data matrix (each row is a slice of string values).
	Data [][]string
	// Headers is the ordered list of column names.
	Headers []string
	// ColumnTypes maps each column name to its detected type ("numeric" or "categorical").
	ColumnTypes map[string]string
	// CategoricalColumns maps each categorical column name to its value slice.
	// Used by binning (to register newly discretized columns) and one-hot encoding
	// (to remove the source column after expansion).
	CategoricalColumns map[string][]string
	// Rows is the number of data rows.
	Rows int
	// Columns is the number of columns (= len(Headers)).
	Columns int
}

Input carries the tabular data and metadata that transform functions operate on. It mirrors the relevant fields of the application FileData type so that callers can pass raw slices without introducing a package dependency.

type Options

type Options struct {
	// Type is the transformation to apply.
	Type Type
	// Columns lists the column names to transform.
	Columns []string
	// BinCount is the number of bins for Bin transformations (default: 5).
	BinCount int
	// MinValue is the lower bound of the target range for MinMax scaling (default: 0).
	MinValue float64
	// MaxValue is the upper bound of the target range for MinMax scaling (default: 1).
	MaxValue float64
}

Options configures a transformation.
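To make the BinCount, MinValue, and MaxValue fields concrete, here is a minimal sketch of the arithmetic behind min-max scaling and equal-width binning. The function names, the handling of constant columns, and the treatment of a non-positive bin count as the documented default of 5 are assumptions; the package may resolve these cases differently.

```go
package main

import "fmt"

// minMaxScale is a hypothetical helper. It maps each value linearly from
// the observed range of xs onto [lo, hi]. A constant column has no spread,
// so every value maps to lo (an assumed convention).
func minMaxScale(xs []float64, lo, hi float64) []float64 {
	out := make([]float64, len(xs))
	if len(xs) == 0 {
		return out
	}
	lowest, highest := xs[0], xs[0]
	for _, x := range xs[1:] {
		if x < lowest {
			lowest = x
		}
		if x > highest {
			highest = x
		}
	}
	for i, x := range xs {
		if highest == lowest {
			out[i] = lo
			continue
		}
		out[i] = lo + (x-lowest)/(highest-lowest)*(hi-lo)
	}
	return out
}

// binIndex is a hypothetical helper. It returns the zero-based equal-width
// bin of x within [lowest, highest]. A non-positive bins value falls back
// to 5, and the maximum value is placed in the last bin.
func binIndex(x, lowest, highest float64, bins int) int {
	if bins <= 0 {
		bins = 5
	}
	if highest <= lowest {
		return 0
	}
	i := int((x - lowest) / (highest - lowest) * float64(bins))
	if i >= bins {
		i = bins - 1
	}
	if i < 0 {
		i = 0
	}
	return i
}

func main() {
	fmt.Println(minMaxScale([]float64{0, 5, 10}, 0, 1)) // [0 0.5 1]
	fmt.Println(binIndex(10, 0, 10, 5))                 // 4
}
```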

type Result

type Result struct {
	// TransformedColumns lists the column names that were successfully transformed.
	TransformedColumns []string
	// NewColumns lists column names added during the transformation (e.g. one-hot columns).
	NewColumns []string
	// Messages contains informational and warning messages produced during the transformation.
	Messages []string
	// Headers is the updated ordered list of column names after the transformation.
	Headers []string
	// Data is the updated row-major data matrix after the transformation.
	Data [][]string
	// ColumnTypes is the updated column-type map after the transformation.
	ColumnTypes map[string]string
	// CategoricalColumns is the updated categorical-column value map after the transformation.
	CategoricalColumns map[string][]string
	// Columns is the updated number of columns.
	Columns int
}

Result carries the output of Apply. The original Input is never modified; all changes are reflected here.
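The relationship between Headers, Data, and NewColumns is easiest to see for one-hot encoding. The sketch below is an assumption-laden illustration rather than the package's behavior: the helper name oneHot, the "<column>_<value>" naming scheme, the sorted value order, and appending new columns at the end are all hypothetical, and rows are assumed to be rectangular.

```go
package main

import (
	"fmt"
	"sort"
)

// oneHot is a hypothetical helper. It removes column col and appends one
// "0"/"1" column per distinct value. It returns the new headers, the new
// data, and the names of the added columns. Inputs are not modified.
func oneHot(headers []string, data [][]string, col int) ([]string, [][]string, []string) {
	// Collect distinct values, then sort for a deterministic column order.
	seen := map[string]bool{}
	var values []string
	for _, row := range data {
		if !seen[row[col]] {
			seen[row[col]] = true
			values = append(values, row[col])
		}
	}
	sort.Strings(values)

	added := make([]string, len(values))
	for i, v := range values {
		added[i] = headers[col] + "_" + v
	}

	// Build fresh slices so the caller's headers and rows stay untouched.
	newHeaders := append([]string{}, headers[:col]...)
	newHeaders = append(newHeaders, headers[col+1:]...)
	newHeaders = append(newHeaders, added...)

	newData := make([][]string, len(data))
	for r, row := range data {
		out := append([]string{}, row[:col]...)
		out = append(out, row[col+1:]...)
		for _, v := range values {
			if row[col] == v {
				out = append(out, "1")
			} else {
				out = append(out, "0")
			}
		}
		newData[r] = out
	}
	return newHeaders, newData, added
}

func main() {
	headers := []string{"id", "color"}
	data := [][]string{{"1", "red"}, {"2", "blue"}, {"3", "red"}}
	h, d, added := oneHot(headers, data, 1)
	fmt.Println(h)     // [id color_blue color_red]
	fmt.Println(d)     // [[1 0 1] [2 1 0] [3 0 1]]
	fmt.Println(added) // [color_blue color_red]
}
```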

func Apply

func Apply(in Input, opts Options) (*Result, error)

Apply executes the transformation described by opts against a deep copy of the input data and returns the modified data as a Result. The original Input is never modified.

Unsupported transformation types return an error.
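The deep-copy guarantee matters because copying only the outer slice of a [][]string would still share the row slices with the caller. A minimal sketch of a row-by-row copy follows; deepCopy is a hypothetical name, not a function exported by this package.

```go
package main

import "fmt"

// deepCopy is a hypothetical helper. It returns a matrix whose rows are
// independent of the rows in data, so edits to the copy do not leak back.
func deepCopy(data [][]string) [][]string {
	out := make([][]string, len(data))
	for i, row := range data {
		out[i] = append([]string(nil), row...)
	}
	return out
}

func main() {
	orig := [][]string{{"1", "2"}, {"3", "4"}}
	cp := deepCopy(orig)
	cp[0][0] = "changed"
	fmt.Println(orig[0][0], cp[0][0]) // 1 changed
}
```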

type Type

type Type string

Type identifies a supported transformation.

const (
	// Log applies the natural logarithm to each value. Values must be positive.
	Log Type = "log"
	// Sqrt applies the square root to each value. Values must be non-negative.
	Sqrt Type = "sqrt"
	// Square squares each value.
	Square Type = "square"
	// Standardize applies z-score standardization (mean=0, std=1).
	Standardize Type = "standardize"
	// MinMax applies min-max scaling to a configurable target range (default [0, 1]).
	MinMax Type = "minmax"
	// Bin discretizes a numeric column into equal-width bins.
	Bin Type = "bin"
	// OneHot encodes a categorical column into one binary column per unique value.
	OneHot Type = "onehot"
)
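For reference, the z-score behind Standardize can be sketched as follows. This is an illustration under stated assumptions: zScores is a hypothetical name, the sketch uses the population standard deviation (dividing by n), and a zero-variance column maps to all zeros. The documentation does not say which variant the package uses.

```go
package main

import (
	"fmt"
	"math"
)

// zScores is a hypothetical helper. It returns (x - mean) / std for each
// value, using the population standard deviation (an assumption). A column
// with zero variance yields all zeros to avoid dividing by zero.
func zScores(xs []float64) []float64 {
	out := make([]float64, len(xs))
	if len(xs) == 0 {
		return out
	}
	n := float64(len(xs))

	var sum float64
	for _, x := range xs {
		sum += x
	}
	mean := sum / n

	var squares float64
	for _, x := range xs {
		squares += (x - mean) * (x - mean)
	}
	std := math.Sqrt(squares / n)

	for i, x := range xs {
		if std == 0 {
			out[i] = 0
			continue
		}
		out[i] = (x - mean) / std
	}
	return out
}

func main() {
	// Mean is 5 and population standard deviation is 2 for this input.
	fmt.Println(zScores([]float64{2, 4, 4, 4, 5, 5, 7, 9}))
	// [-1.5 -0.5 -0.5 -0.5 0 0 1 2]
}
```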
