preprocessing

package
v0.2.0
Published: Aug 6, 2025 License: MIT Imports: 6 Imported by: 0

Documentation

Overview

Package preprocessing provides data preprocessing utilities for machine learning.

This package implements scikit-learn compatible preprocessing components including:

  • StandardScaler: Standardizes features by removing the mean and scaling to unit variance
  • MinMaxScaler: Transforms features by scaling each feature to a given range
  • OneHotEncoder: Encodes categorical features as one-hot numeric arrays

All preprocessing components follow the scikit-learn API pattern with Fit, Transform, and FitTransform methods. They integrate seamlessly with the BaseEstimator pattern for consistent state management and serialization support.

Example usage:

scaler := preprocessing.NewStandardScaler(true, true)
err := scaler.Fit(trainingData)
if err != nil {
	log.Fatal(err)
}
scaledData, err := scaler.Transform(testData)

The package is designed for production machine learning pipelines with emphasis on memory efficiency, thread safety, and compatibility with popular ML libraries.

Index

Examples

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type MinMaxScaler

type MinMaxScaler struct {
	model.BaseEstimator

	// Min is the minimum value of each feature
	Min []float64

	// Max is the maximum value of each feature
	Max []float64

	// Scale is the scale of each feature (max - min)
	Scale []float64

	// DataMin is the per-feature minimum of the training data
	DataMin []float64

	// DataMax is the per-feature maximum of the training data
	DataMax []float64

	// NFeatures is the number of features
	NFeatures int

	// FeatureRange is the target range [min, max] after scaling
	FeatureRange [2]float64
}

MinMaxScaler is a scikit-learn-compatible min-max scaler. It scales data to a specified range (default [0, 1]).

Example

ExampleMinMaxScaler demonstrates basic MinMaxScaler usage

package main

import (
	"fmt"
	"log"

	"github.com/YuminosukeSato/scigo/preprocessing"
	"gonum.org/v1/gonum/mat"
)

func main() {
	// Create sample data
	data := []float64{
		1.0, 10.0,
		2.0, 20.0,
		3.0, 30.0,
		4.0, 40.0,
	}
	X := mat.NewDense(4, 2, data)

	// Create MinMaxScaler for [0, 1] range
	scaler := preprocessing.NewMinMaxScaler([2]float64{0.0, 1.0})

	// Fit and transform
	scaled, err := scaler.FitTransform(X)
	if err != nil {
		log.Fatal(err)
	}

	// Print first and last values (should be 0.0 and 1.0)
	fmt.Printf("First row: [%.1f, %.1f]\n", scaled.At(0, 0), scaled.At(0, 1))
	fmt.Printf("Last row: [%.1f, %.1f]\n", scaled.At(3, 0), scaled.At(3, 1))

}
Output:

First row: [0.0, 0.0]
Last row: [1.0, 1.0]
Example (CustomRange)

ExampleMinMaxScaler_customRange demonstrates custom range scaling

package main

import (
	"fmt"
	"log"

	"github.com/YuminosukeSato/scigo/preprocessing"
	"gonum.org/v1/gonum/mat"
)

func main() {
	// Create sample data
	data := []float64{
		0.0,
		5.0,
		10.0,
	}
	X := mat.NewDense(3, 1, data)

	// Create MinMaxScaler for [-1, 1] range
	scaler := preprocessing.NewMinMaxScaler([2]float64{-1.0, 1.0})
	scaled, err := scaler.FitTransform(X)
	if err != nil {
		log.Fatal(err)
	}

	// Print scaled values
	for i := 0; i < 3; i++ {
		fmt.Printf("%.1f -> %.1f\n", X.At(i, 0), scaled.At(i, 0))
	}

}
Output:

0.0 -> -1.0
5.0 -> 0.0
10.0 -> 1.0

func NewMinMaxScaler

func NewMinMaxScaler(featureRange [2]float64) *MinMaxScaler

NewMinMaxScaler creates a new MinMaxScaler for feature scaling.

MinMaxScaler transforms features by scaling each feature to a given range. The transformation is given by: X_scaled = (X - X.min) / (X.max - X.min) * (max - min) + min

Parameters:

  • featureRange: Target range for scaling [min, max] (typically [0, 1] or [-1, 1])

Returns:

  • *MinMaxScaler: A new MinMaxScaler instance ready for fitting

Example:

// Scale to [0, 1] range
scaler := preprocessing.NewMinMaxScaler([2]float64{0.0, 1.0})
err := scaler.Fit(trainingData)
scaledData, err := scaler.Transform(testData)

// Scale to [-1, 1] range
negScaler := preprocessing.NewMinMaxScaler([2]float64{-1.0, 1.0})

func NewMinMaxScalerDefault

func NewMinMaxScalerDefault() *MinMaxScaler

NewMinMaxScalerDefault creates a MinMaxScaler with default settings (the [0, 1] range).

func (*MinMaxScaler) Fit

func (m *MinMaxScaler) Fit(X mat.Matrix) (err error)

Fit computes the minimum and maximum values for each feature from training data.

This method calculates the feature-wise minimum and maximum values that will be used for scaling transformations. The scaler must be fitted before calling Transform or InverseTransform.

Parameters:

  • X: Training data matrix of shape (n_samples, n_features)

Returns:

  • error: nil if successful, otherwise an error describing the failure

Errors:

  • ErrEmptyData: if X is empty

Example:

scaler := preprocessing.NewMinMaxScaler([2]float64{0.0, 1.0})
err := scaler.Fit(trainingData)
if err != nil {
    log.Fatal(err)
}

func (*MinMaxScaler) FitTransform

func (m *MinMaxScaler) FitTransform(X mat.Matrix) (_ mat.Matrix, err error)

FitTransform fits the scaler and transforms the training data in one step.

This convenience method combines Fit and Transform operations, computing min/max statistics from the input data and immediately applying the scaling. Equivalent to calling Fit(X) followed by Transform(X).

Parameters:

  • X: Training data matrix of shape (n_samples, n_features)

Returns:

  • mat.Matrix: Scaled training data matrix in the target range
  • error: nil if successful, otherwise an error from either fitting or transformation

Example:

scaler := preprocessing.NewMinMaxScaler([2]float64{0.0, 1.0})
scaledTraining, err := scaler.FitTransform(trainingData)
if err != nil {
    log.Fatal(err)
}
// Now scaler is fitted and can transform new data
scaledTest, err := scaler.Transform(testData)

func (*MinMaxScaler) GetParams

func (m *MinMaxScaler) GetParams() map[string]interface{}

GetParams returns the scaler's parameters.

func (*MinMaxScaler) InverseTransform

func (m *MinMaxScaler) InverseTransform(X mat.Matrix) (_ mat.Matrix, err error)

InverseTransform reverses the min-max scaling transformation.

This method transforms scaled data back to the original range using the fitted min/max statistics. Useful for interpreting results or recovering original data values.

Parameters:

  • X: Scaled data matrix of shape (n_samples, n_features)

Returns:

  • mat.Matrix: Data matrix in original scale and range
  • error: nil if successful, otherwise an error describing the failure

Errors:

  • ErrNotFitted: if the scaler hasn't been fitted yet
  • ErrDimensionMismatch: if X doesn't match the number of features from training

Example:

originalData, err := scaler.InverseTransform(scaledData)
if err != nil {
    log.Fatal(err)
}

func (*MinMaxScaler) String

func (m *MinMaxScaler) String() string

String returns a string representation of the scaler.

func (*MinMaxScaler) Transform

func (m *MinMaxScaler) Transform(X mat.Matrix) (_ mat.Matrix, err error)

Transform scales input data to the fitted feature range.

This method transforms data using the minimum and maximum values computed during the Fit phase. Each feature is independently scaled to the target range.

Parameters:

  • X: Input data matrix of shape (n_samples, n_features)

Returns:

  • mat.Matrix: Scaled data matrix with values in the target range
  • error: nil if successful, otherwise an error describing the failure

Errors:

  • ErrNotFitted: if the scaler hasn't been fitted yet
  • ErrDimensionMismatch: if X doesn't match the number of features from training

Example:

scaledData, err := scaler.Transform(testData)
if err != nil {
    log.Fatal(err)
}
// scaledData values are now in the range specified during NewMinMaxScaler

type OneHotEncoder

type OneHotEncoder struct {
	model.BaseEstimator

	// Categories holds the sorted list of categories for each feature
	Categories [][]string

	// CategoryToIdx maps each feature's categories to their indices
	CategoryToIdx []map[string]int

	// NFeatures is the number of input features
	NFeatures int

	// NOutputs is the number of output features (total number of categories)
	NOutputs int
}

OneHotEncoder is a scikit-learn-compatible one-hot encoder. It converts categorical string data into binary 0/1 vectors.

Example

ExampleOneHotEncoder demonstrates OneHotEncoder usage

package main

import (
	"fmt"
	"log"

	"github.com/YuminosukeSato/scigo/preprocessing"
)

func main() {
	// Create sample categorical data
	data := [][]string{
		{"red"},
		{"green"},
		{"blue"},
		{"red"},
	}

	// Create and fit encoder
	encoder := preprocessing.NewOneHotEncoder()
	err := encoder.Fit(data)
	if err != nil {
		log.Fatal(err)
	}

	// Transform the data
	encoded, err := encoder.Transform(data)
	if err != nil {
		log.Fatal(err)
	}

	// Print feature names
	features := encoder.GetFeatureNamesOut(nil)
	fmt.Printf("Features: %v\n", features)

	// Print encoded shape
	r, c := encoded.Dims()
	fmt.Printf("Encoded shape: (%d, %d)\n", r, c)

}
Output:

Features: [x0_blue x0_green x0_red]
Encoded shape: (4, 3)

func NewOneHotEncoder

func NewOneHotEncoder() *OneHotEncoder

NewOneHotEncoder creates a new OneHotEncoder.

Returns:

  • *OneHotEncoder: a new OneHotEncoder instance

Example:

encoder := preprocessing.NewOneHotEncoder()
err := encoder.Fit(data)
encoded, err := encoder.Transform(data)

func (*OneHotEncoder) Fit

func (e *OneHotEncoder) Fit(data [][]string) (err error)

Fit learns the category information from the training data.

Parameters:

  • data: training data (an n_samples × n_features slice of strings)

Returns:

  • error: non-nil if an error occurred

func (*OneHotEncoder) FitTransform

func (e *OneHotEncoder) FitTransform(data [][]string) (_ mat.Matrix, err error)

FitTransform learns from the training data and transforms that same data.

Parameters:

  • data: data to fit on and transform

Returns:

  • mat.Matrix: the one-hot encoded data
  • error: non-nil if an error occurred

func (*OneHotEncoder) GetFeatureNamesOut

func (e *OneHotEncoder) GetFeatureNamesOut(inputFeatures []string) []string

GetFeatureNamesOut returns the names of the output features after transformation.

Parameters:

  • inputFeatures: names of the input features (if nil, "x0", "x1", ... are used)

Returns:

  • []string: slice of output feature names

Example:

  • given input feature names ["animal", "size"]
  • output: ["animal_cat", "animal_dog", "size_large", "size_small"]

func (*OneHotEncoder) Transform

func (e *OneHotEncoder) Transform(data [][]string) (_ mat.Matrix, err error)

Transform one-hot encodes data using the fitted category information.

Parameters:

  • data: data to transform

Returns:

  • mat.Matrix: the one-hot encoded data
  • error: non-nil if an error occurred

type StandardScaler

type StandardScaler struct {
	model.BaseEstimator

	// Mean is the mean of each feature
	Mean []float64

	// Scale is the standard deviation of each feature
	Scale []float64

	// NFeatures is the number of features
	NFeatures int

	// WithMean controls whether the mean is subtracted (default: true)
	WithMean bool

	// WithStd controls whether values are divided by the standard deviation (default: true)
	WithStd bool
}

StandardScaler is a scikit-learn-compatible standardization scaler. It transforms data to zero mean and unit standard deviation.

Example

ExampleStandardScaler demonstrates basic usage of StandardScaler

package main

import (
	"fmt"
	"log"

	"github.com/YuminosukeSato/scigo/preprocessing"
	"gonum.org/v1/gonum/mat"
)

func main() {
	// Create sample training data
	data := []float64{
		1.0, 2.0,
		3.0, 4.0,
		5.0, 6.0,
		7.0, 8.0,
	}
	X := mat.NewDense(4, 2, data)

	// Create and fit scaler
	scaler := preprocessing.NewStandardScaler(true, true)
	err := scaler.Fit(X)
	if err != nil {
		log.Fatal(err)
	}

	// Transform the data
	scaled, err := scaler.Transform(X)
	if err != nil {
		log.Fatal(err)
	}

	// Print first row of scaled data
	fmt.Printf("Scaled first row: [%.2f, %.2f]\n", scaled.At(0, 0), scaled.At(0, 1))

}
Output:

Scaled first row: [-1.34, -1.34]
Example (FitTransform)

ExampleStandardScaler_fitTransform demonstrates FitTransform usage

package main

import (
	"fmt"
	"log"

	"github.com/YuminosukeSato/scigo/preprocessing"
	"gonum.org/v1/gonum/mat"
)

func main() {
	// Create sample data
	data := []float64{
		10.0, 100.0,
		20.0, 200.0,
		30.0, 300.0,
	}
	X := mat.NewDense(3, 2, data)

	// Create scaler and fit+transform in one step
	scaler := preprocessing.NewStandardScaler(true, true)
	scaled, err := scaler.FitTransform(X)
	if err != nil {
		log.Fatal(err)
	}

	// Check that scaler is now fitted
	if scaler.IsFitted() {
		fmt.Println("Scaler is fitted")
	}

	// Print dimensions
	r, c := scaled.Dims()
	fmt.Printf("Scaled data shape: (%d, %d)\n", r, c)

}
Output:

Scaler is fitted
Scaled data shape: (3, 2)
Example (InverseTransform)

ExampleStandardScaler_inverseTransform demonstrates inverse transformation

package main

import (
	"fmt"
	"log"

	"github.com/YuminosukeSato/scigo/preprocessing"
	"gonum.org/v1/gonum/mat"
)

func main() {
	// Original data
	data := []float64{
		2.0, 4.0,
		6.0, 8.0,
	}
	X := mat.NewDense(2, 2, data)

	// Standardize
	scaler := preprocessing.NewStandardScaler(true, true)
	scaled, err := scaler.FitTransform(X)
	if err != nil {
		log.Fatal(err)
	}

	// Inverse transform back to original scale
	restored, err := scaler.InverseTransform(scaled)
	if err != nil {
		log.Fatal(err)
	}

	// Check if values match original (within floating point precision)
	fmt.Printf("Original: [%.1f, %.1f]\n", X.At(0, 0), X.At(0, 1))
	fmt.Printf("Restored: [%.1f, %.1f]\n", restored.At(0, 0), restored.At(0, 1))

}
Output:

Original: [2.0, 4.0]
Restored: [2.0, 4.0]

func NewStandardScaler

func NewStandardScaler(withMean, withStd bool) *StandardScaler

NewStandardScaler creates a new StandardScaler for feature standardization.

StandardScaler transforms features by removing the mean and scaling to unit variance. This is a common preprocessing step that ensures all features contribute equally to machine learning algorithms and improves numerical stability.

Parameters:

  • withMean: whether to center the data at zero by removing the mean (default: true)
  • withStd: whether to scale the data to unit variance by dividing by standard deviation (default: true)

Returns:

  • *StandardScaler: A new StandardScaler instance ready for fitting

Example:

// Standard z-score normalization (mean=0, std=1)
scaler := preprocessing.NewStandardScaler(true, true)
err := scaler.Fit(X_train)
X_scaled, err := scaler.Transform(X_test)

// Scale only (keep original mean)
scaleOnly := preprocessing.NewStandardScaler(false, true)

func NewStandardScalerDefault

func NewStandardScalerDefault() *StandardScaler

NewStandardScalerDefault creates a StandardScaler with default settings.

func (*StandardScaler) Fit

func (s *StandardScaler) Fit(X mat.Matrix) (err error)

Fit computes the statistics (mean and scale) from the training data.

This method calculates the feature-wise mean and standard deviation from the provided training data, which will be used for future transformations. The scaler must be fitted before calling Transform or InverseTransform.

Parameters:

  • X: Training data matrix of shape (n_samples, n_features)

Returns:

  • error: nil if successful, otherwise an error describing the failure

Errors:

  • ErrEmptyData: if X is empty
  • ErrDimensionMismatch: if X has inconsistent dimensions

Example:

scaler := preprocessing.NewStandardScaler(true, true)
err := scaler.Fit(trainingData)
if err != nil {
    log.Fatal(err)
}

func (*StandardScaler) FitTransform

func (s *StandardScaler) FitTransform(X mat.Matrix) (_ mat.Matrix, err error)

FitTransform fits the scaler and transforms the training data in one step.

This convenience method combines Fit and Transform operations, computing statistics from the input data and immediately applying the transformation. Equivalent to calling Fit(X) followed by Transform(X).

Parameters:

  • X: Training data matrix of shape (n_samples, n_features)

Returns:

  • mat.Matrix: Standardized training data matrix
  • error: nil if successful, otherwise an error from either fitting or transformation

Example:

scaler := preprocessing.NewStandardScaler(true, true)
scaledTraining, err := scaler.FitTransform(trainingData)
if err != nil {
    log.Fatal(err)
}
// Now scaler is fitted and can transform new data
scaledTest, err := scaler.Transform(testData)

func (*StandardScaler) GetParams

func (s *StandardScaler) GetParams() map[string]interface{}

GetParams returns the scaler's parameters.

func (*StandardScaler) InverseTransform

func (s *StandardScaler) InverseTransform(X mat.Matrix) (_ mat.Matrix, err error)

InverseTransform reverses the standardization transformation.

This method transforms standardized data back to the original scale using the fitted statistics. The inverse transformation formula is: X_orig = X_scaled * scale + mean.

Parameters:

  • X: Standardized data matrix of shape (n_samples, n_features)

Returns:

  • mat.Matrix: Data matrix in original scale
  • error: nil if successful, otherwise an error describing the failure

Errors:

  • ErrNotFitted: if the scaler hasn't been fitted yet
  • ErrDimensionMismatch: if X doesn't match the number of features from training

Example:

originalData, err := scaler.InverseTransform(scaledData)
if err != nil {
    log.Fatal(err)
}

func (*StandardScaler) String

func (s *StandardScaler) String() string

String returns a string representation of the scaler.

func (*StandardScaler) Transform

func (s *StandardScaler) Transform(X mat.Matrix) (_ mat.Matrix, err error)

Transform applies standardization to the input data using fitted statistics.

This method standardizes features by removing the mean and scaling to unit variance using the statistics computed during the Fit phase. The transformation formula is: X_scaled = (X - mean) / scale.

Parameters:

  • X: Input data matrix of shape (n_samples, n_features)

Returns:

  • mat.Matrix: Standardized data matrix with same shape as input
  • error: nil if successful, otherwise an error describing the failure

Errors:

  • ErrNotFitted: if the scaler hasn't been fitted yet
  • ErrDimensionMismatch: if X doesn't match the number of features from training

Example:

scaledData, err := scaler.Transform(testData)
if err != nil {
    log.Fatal(err)
}
