Documentation ¶
Overview ¶
Package transform implements data transformations for tabular CSV data.
All functions operate on [][]string data matrices with associated metadata (headers, column types). This design keeps the package free of any dependency on application-level types (such as FileData) so it can be shared across application entry points without circular imports.
Supported transformations:
- Mathematical: log, sqrt, square (element-wise, numeric columns)
- Scaling: z-score standardization, min-max scaling (numeric columns)
- Discretization: equal-width binning (numeric → categorical)
- Encoding: one-hot encoding (categorical → multiple numeric columns)
The primary entry points are:
- Apply — execute a transformation and return the modified data
- GetTransformableColumns — list columns eligible for a given transformation
Functions ¶
func GetTransformableColumns ¶
GetTransformableColumns returns the names of the columns in the input (the in parameter) that are eligible for the given transformation type.
Mathematical, scaling, and discretization transforms (Log, Sqrt, Square, Standardize, MinMax, Bin) require numeric columns. Columns whose names end with the "#target" suffix are excluded. OneHot requires categorical columns.
Types ¶
type Input ¶
type Input struct {
	// Data is the row-major data matrix (each row is a slice of string values).
	Data [][]string
	// Headers is the ordered list of column names.
	Headers []string
	// ColumnTypes maps each column name to its detected type ("numeric" or "categorical").
	ColumnTypes map[string]string
	// CategoricalColumns maps each categorical column name to its value slice.
	// Used by binning (to register newly discretized columns) and one-hot encoding
	// (to remove the source column after expansion).
	CategoricalColumns map[string][]string
	// Rows is the number of data rows.
	Rows int
	// Columns is the number of columns (= len(Headers)).
	Columns int
}
Input carries the tabular data and metadata that transform functions operate on. It mirrors the relevant fields of the application FileData type so that callers can pass raw slices without introducing a package dependency.
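To show how these fields relate to one another, here is a sketch that fills a local mirror of the struct from CSV text. The tableInput type and the buildInput helper are hypothetical. The type-detection rule (a column is numeric when every cell parses as a float) and the choice to store one categorical value per row are assumptions, since this page does not specify either.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// tableInput is a local, hypothetical mirror of the Input fields, declared
// here so the sketch compiles without importing the package.
type tableInput struct {
	Data               [][]string
	Headers            []string
	ColumnTypes        map[string]string
	CategoricalColumns map[string][]string
	Rows               int
	Columns            int
}

// buildInput parses CSV text whose first record is the header row.
// Assumption: a column is "numeric" only if every cell parses as a float.
func buildInput(csvText string) (tableInput, error) {
	records, err := csv.NewReader(strings.NewReader(csvText)).ReadAll()
	if err != nil {
		return tableInput{}, err
	}
	if len(records) == 0 {
		return tableInput{}, fmt.Errorf("empty CSV")
	}
	in := tableInput{
		Headers:            records[0],
		Data:               records[1:],
		ColumnTypes:        map[string]string{},
		CategoricalColumns: map[string][]string{},
		Rows:               len(records) - 1,
		Columns:            len(records[0]),
	}
	for c, name := range in.Headers {
		numeric := true
		values := make([]string, 0, in.Rows)
		for _, row := range in.Data {
			values = append(values, row[c])
			if _, err := strconv.ParseFloat(row[c], 64); err != nil {
				numeric = false
			}
		}
		if numeric {
			in.ColumnTypes[name] = "numeric"
		} else {
			in.ColumnTypes[name] = "categorical"
			in.CategoricalColumns[name] = values // assumed: one value per row
		}
	}
	return in, nil
}

func main() {
	in, err := buildInput("age,city\n30,Oslo\n41,Lima\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(in.Rows, in.Columns) // prints 2 2
	fmt.Println(in.ColumnTypes)      // prints map[age:numeric city:categorical]
}
```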
type Options ¶
type Options struct {
	// Type is the transformation to apply.
	Type Type
	// Columns lists the column names to transform.
	Columns []string
	// BinCount is the number of bins for Bin transformations (default: 5).
	BinCount int
	// MinValue is the lower bound of the target range for MinMax scaling (default: 0).
	MinValue float64
	// MaxValue is the upper bound of the target range for MinMax scaling (default: 1).
	MaxValue float64
}
Options configures a transformation.
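The documented defaults (BinCount 5, target range [0, 1]) and the role of MinValue and MaxValue can be sketched as follows. The scaleOptions type, withDefaults, and minMaxScale are hypothetical names. Treating a zero-valued range as "unset" is an assumption about how defaults might be detected, and mapping a constant column to the lower bound is likewise an illustrative choice.

```go
package main

import "fmt"

// scaleOptions is a hypothetical subset of the Options fields.
type scaleOptions struct {
	BinCount int
	MinValue float64
	MaxValue float64
}

// withDefaults fills in the documented defaults.
// Assumption: MinValue == MaxValue == 0 means the range was left unset.
func withDefaults(o scaleOptions) scaleOptions {
	if o.BinCount <= 0 {
		o.BinCount = 5
	}
	if o.MinValue == 0 && o.MaxValue == 0 {
		o.MaxValue = 1
	}
	return o
}

// minMaxScale maps vals linearly from [min(vals), max(vals)] onto [lo, hi].
// A constant column maps to lo (an illustrative choice).
func minMaxScale(vals []float64, lo, hi float64) []float64 {
	out := make([]float64, len(vals))
	if len(vals) == 0 {
		return out
	}
	mn, mx := vals[0], vals[0]
	for _, v := range vals[1:] {
		if v < mn {
			mn = v
		}
		if v > mx {
			mx = v
		}
	}
	for i, v := range vals {
		if mx == mn {
			out[i] = lo
			continue
		}
		out[i] = lo + (v-mn)/(mx-mn)*(hi-lo)
	}
	return out
}

func main() {
	o := withDefaults(scaleOptions{})
	fmt.Println(o.BinCount)                                               // prints 5
	fmt.Println(minMaxScale([]float64{10, 20, 30}, o.MinValue, o.MaxValue)) // prints [0 0.5 1]
}
```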
type Result ¶
type Result struct {
	// TransformedColumns lists the column names that were successfully transformed.
	TransformedColumns []string
	// NewColumns lists column names added during the transformation (e.g. one-hot columns).
	NewColumns []string
	// Messages contains informational and warning messages produced during the transformation.
	Messages []string
	// Headers is the updated ordered list of column names after the transformation.
	Headers []string
	// Data is the updated row-major data matrix after the transformation.
	Data [][]string
	// ColumnTypes is the updated column-type map after the transformation.
	ColumnTypes map[string]string
	// CategoricalColumns is the updated categorical-column value map after the transformation.
	CategoricalColumns map[string][]string
	// Columns is the updated number of columns.
	Columns int
}
Result carries the output of Apply. The original Input is never modified; all changes are reflected here.
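One common way to honor a "never modify the input" guarantee with [][]string data is to deep-copy the matrix before writing to it, because copying only the outer slice would still share the row slices. The sketch below illustrates that pattern with hypothetical helpers (cloneMatrix, squareColumn). How the package achieves the guarantee internally is not stated here, and leaving unparseable cells unchanged is an assumption of this example.

```go
package main

import (
	"fmt"
	"strconv"
)

// cloneMatrix returns a deep copy so that writes to the result do not
// affect the caller's rows.
func cloneMatrix(data [][]string) [][]string {
	out := make([][]string, len(data))
	for i, row := range data {
		out[i] = append([]string(nil), row...)
	}
	return out
}

// squareColumn is a hypothetical transform: it squares column col in a copy
// of data. Cells that do not parse as floats are left unchanged (assumption).
func squareColumn(data [][]string, col int) [][]string {
	out := cloneMatrix(data)
	for _, row := range out {
		if col < 0 || col >= len(row) {
			continue
		}
		v, err := strconv.ParseFloat(row[col], 64)
		if err != nil {
			continue
		}
		row[col] = strconv.FormatFloat(v*v, 'g', -1, 64)
	}
	return out
}

func main() {
	orig := [][]string{{"3", "a"}, {"4", "b"}}
	out := squareColumn(orig, 0)
	fmt.Println(orig) // prints [[3 a] [4 b]]
	fmt.Println(out)  // prints [[9 a] [16 b]]
}
```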
type Type ¶
type Type string
Type identifies a supported transformation.
const (
	// Log applies the natural logarithm to each value. Values must be positive.
	Log Type = "log"
	// Sqrt applies the square root to each value. Values must be non-negative.
	Sqrt Type = "sqrt"
	// Square squares each value.
	Square Type = "square"
	// Standardize applies z-score standardization (mean=0, std=1).
	Standardize Type = "standardize"
	// MinMax applies min-max scaling to a configurable target range (default [0, 1]).
	MinMax Type = "minmax"
	// Bin discretizes a numeric column into equal-width bins.
	Bin Type = "bin"
	// OneHot encodes a categorical column into one binary column per unique value.
	OneHot Type = "onehot"
)
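For the OneHot case, "one binary column per unique value" can be sketched as below. The oneHot helper is hypothetical. The "name_value" column naming, the sorted column order, and the "0"/"1" string cells are assumptions for illustration, since the package's actual naming scheme is not documented on this page.

```go
package main

import (
	"fmt"
	"sort"
)

// oneHot is a hypothetical helper that expands one categorical column into
// one indicator column per unique value.
// Assumptions: new columns are named "<name>_<value>", ordered by sorted
// value, and cells hold "1" or "0".
func oneHot(name string, values []string) ([]string, [][]string) {
	seen := map[string]bool{}
	var cats []string
	for _, v := range values {
		if !seen[v] {
			seen[v] = true
			cats = append(cats, v)
		}
	}
	sort.Strings(cats)

	headers := make([]string, len(cats))
	for i, c := range cats {
		headers[i] = name + "_" + c
	}

	rows := make([][]string, len(values))
	for r, v := range values {
		row := make([]string, len(cats))
		for i, c := range cats {
			if v == c {
				row[i] = "1"
			} else {
				row[i] = "0"
			}
		}
		rows[r] = row
	}
	return headers, rows
}

func main() {
	headers, rows := oneHot("city", []string{"Oslo", "Lima", "Oslo"})
	fmt.Println(headers) // prints [city_Lima city_Oslo]
	fmt.Println(rows)    // prints [[0 1] [1 0] [0 1]]
}
```

Sorting the unique values makes the resulting column order deterministic regardless of row order.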