Documentation ¶
Overview ¶
Package preprocessing provides data preprocessing utilities for machine learning.
This package implements scikit-learn compatible preprocessing components including:
- StandardScaler: Standardizes features by removing the mean and scaling to unit variance
- MinMaxScaler: Transforms features by scaling each feature to a given range
- OneHotEncoder: Encodes categorical features as one-hot numeric arrays
All preprocessing components follow the scikit-learn API pattern with Fit, Transform, and FitTransform methods. They integrate seamlessly with the BaseEstimator pattern for consistent state management and serialization support.
Example usage:
scaler := preprocessing.NewStandardScaler(true, true)
err := scaler.Fit(trainingData)
if err != nil {
log.Fatal(err)
}
scaledData, err := scaler.Transform(testData)
The package is designed for production machine learning pipelines with emphasis on memory efficiency, thread safety, and compatibility with popular ML libraries.
Index ¶
- type MinMaxScaler
- func (m *MinMaxScaler) Fit(X mat.Matrix) (err error)
- func (m *MinMaxScaler) FitTransform(X mat.Matrix) (_ mat.Matrix, err error)
- func (m *MinMaxScaler) GetParams() map[string]interface{}
- func (m *MinMaxScaler) InverseTransform(X mat.Matrix) (_ mat.Matrix, err error)
- func (m *MinMaxScaler) String() string
- func (m *MinMaxScaler) Transform(X mat.Matrix) (_ mat.Matrix, err error)
- type OneHotEncoder
- type StandardScaler
- func (s *StandardScaler) Fit(X mat.Matrix) (err error)
- func (s *StandardScaler) FitTransform(X mat.Matrix) (_ mat.Matrix, err error)
- func (s *StandardScaler) GetParams() map[string]interface{}
- func (s *StandardScaler) InverseTransform(X mat.Matrix) (_ mat.Matrix, err error)
- func (s *StandardScaler) String() string
- func (s *StandardScaler) Transform(X mat.Matrix) (_ mat.Matrix, err error)
Examples ¶
- MinMaxScaler
- MinMaxScaler (CustomRange)
- OneHotEncoder
- StandardScaler
- StandardScaler (FitTransform)
- StandardScaler (InverseTransform)
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type MinMaxScaler ¶
type MinMaxScaler struct {
model.BaseEstimator
// Min is the per-feature minimum of the target range adjustment
Min []float64
// Max is the per-feature maximum of the target range adjustment
Max []float64
// Scale is the per-feature scaling factor (max - min)
Scale []float64
// DataMin is the per-feature minimum of the training data
DataMin []float64
// DataMax is the per-feature maximum of the training data
DataMax []float64
// NFeatures is the number of features
NFeatures int
// FeatureRange is the target range after scaling, [min, max]
FeatureRange [2]float64
}
MinMaxScaler is a scikit-learn compatible min-max scaler. It scales data to a specified range (default [0, 1]).
Example ¶
ExampleMinMaxScaler demonstrates basic MinMaxScaler usage
package main
import (
"fmt"
"log"
"github.com/YuminosukeSato/scigo/preprocessing"
"gonum.org/v1/gonum/mat"
)
func main() {
// Create sample data
data := []float64{
1.0, 10.0,
2.0, 20.0,
3.0, 30.0,
4.0, 40.0,
}
X := mat.NewDense(4, 2, data)
// Create MinMaxScaler for [0, 1] range
scaler := preprocessing.NewMinMaxScaler([2]float64{0.0, 1.0})
// Fit and transform
scaled, err := scaler.FitTransform(X)
if err != nil {
log.Fatal(err)
}
// Print first and last values (should be 0.0 and 1.0)
fmt.Printf("First row: [%.1f, %.1f]\n", scaled.At(0, 0), scaled.At(0, 1))
fmt.Printf("Last row: [%.1f, %.1f]\n", scaled.At(3, 0), scaled.At(3, 1))
}
Output:

First row: [0.0, 0.0]
Last row: [1.0, 1.0]
Example (CustomRange) ¶
ExampleMinMaxScaler_customRange demonstrates custom range scaling
package main
import (
"fmt"
"log"
"github.com/YuminosukeSato/scigo/preprocessing"
"gonum.org/v1/gonum/mat"
)
func main() {
// Create sample data
data := []float64{
0.0,
5.0,
10.0,
}
X := mat.NewDense(3, 1, data)
// Create MinMaxScaler for [-1, 1] range
scaler := preprocessing.NewMinMaxScaler([2]float64{-1.0, 1.0})
scaled, err := scaler.FitTransform(X)
if err != nil {
log.Fatal(err)
}
// Print scaled values
for i := 0; i < 3; i++ {
fmt.Printf("%.1f -> %.1f\n", X.At(i, 0), scaled.At(i, 0))
}
}
Output:

0.0 -> -1.0
5.0 -> 0.0
10.0 -> 1.0
func NewMinMaxScaler ¶
func NewMinMaxScaler(featureRange [2]float64) *MinMaxScaler
NewMinMaxScaler creates a new MinMaxScaler for feature scaling.
MinMaxScaler transforms features by scaling each feature to a given range. The transformation is given by: X_scaled = (X - X.min) / (X.max - X.min) * (max - min) + min
Parameters:
- featureRange: Target range for scaling [min, max] (typically [0, 1] or [-1, 1])
Returns:
- *MinMaxScaler: A new MinMaxScaler instance ready for fitting
Example:
// Scale to [0, 1] range
scaler := preprocessing.NewMinMaxScaler([2]float64{0.0, 1.0})
err := scaler.Fit(trainingData)
scaledData, err := scaler.Transform(testData)
// Scale to [-1, 1] range
scalerSym := preprocessing.NewMinMaxScaler([2]float64{-1.0, 1.0})
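The scaling formula can be checked with a short standard-library-only sketch, independent of the package (a plain illustration of the math, not the package's implementation):

```go
package main

import "fmt"

// minMaxScale applies X_scaled = (x - min) / (max - min) * (hi - lo) + lo
// to a single feature column, mirroring the formula documented above.
func minMaxScale(xs []float64, lo, hi float64) []float64 {
	min, max := xs[0], xs[0]
	for _, v := range xs {
		if v < min {
			min = v
		}
		if v > max {
			max = v
		}
	}
	out := make([]float64, len(xs))
	for i, v := range xs {
		out[i] = (v-min)/(max-min)*(hi-lo) + lo
	}
	return out
}

func main() {
	// Same data as the CustomRange example: 0, 5, 10 mapped to [-1, 1].
	fmt.Println(minMaxScale([]float64{0, 5, 10}, -1, 1)) // [-1 0 1]
}
```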
func NewMinMaxScalerDefault ¶
func NewMinMaxScalerDefault() *MinMaxScaler
NewMinMaxScalerDefault creates a MinMaxScaler with the default settings (range [0, 1]).
func (*MinMaxScaler) Fit ¶
func (m *MinMaxScaler) Fit(X mat.Matrix) (err error)
Fit computes the minimum and maximum values for each feature from training data.
This method calculates the feature-wise minimum and maximum values that will be used for scaling transformations. The scaler must be fitted before calling Transform or InverseTransform.
Parameters:
- X: Training data matrix of shape (n_samples, n_features)
Returns:
- error: nil if successful, otherwise an error describing the failure
Errors:
- ErrEmptyData: if X is empty
Example:
scaler := preprocessing.NewMinMaxScaler([2]float64{0.0, 1.0})
err := scaler.Fit(trainingData)
if err != nil {
log.Fatal(err)
}
func (*MinMaxScaler) FitTransform ¶
func (m *MinMaxScaler) FitTransform(X mat.Matrix) (_ mat.Matrix, err error)
FitTransform fits the scaler and transforms the training data in one step.
This convenience method combines Fit and Transform operations, computing min/max statistics from the input data and immediately applying the scaling. Equivalent to calling Fit(X) followed by Transform(X).
Parameters:
- X: Training data matrix of shape (n_samples, n_features)
Returns:
- mat.Matrix: Scaled training data matrix in the target range
- error: nil if successful, otherwise an error from either fitting or transformation
Example:
scaler := preprocessing.NewMinMaxScaler([2]float64{0.0, 1.0})
scaledTraining, err := scaler.FitTransform(trainingData)
if err != nil {
log.Fatal(err)
}
// Now scaler is fitted and can transform new data
scaledTest, err := scaler.Transform(testData)
func (*MinMaxScaler) GetParams ¶
func (m *MinMaxScaler) GetParams() map[string]interface{}
GetParams returns the scaler's parameters.
func (*MinMaxScaler) InverseTransform ¶
func (m *MinMaxScaler) InverseTransform(X mat.Matrix) (_ mat.Matrix, err error)
InverseTransform reverses the min-max scaling transformation.
This method transforms scaled data back to the original range using the fitted min/max statistics. Useful for interpreting results or recovering original data values.
Parameters:
- X: Scaled data matrix of shape (n_samples, n_features)
Returns:
- mat.Matrix: Data matrix in original scale and range
- error: nil if successful, otherwise an error describing the failure
Errors:
- ErrNotFitted: if the scaler hasn't been fitted yet
- ErrDimensionMismatch: if X doesn't match the number of features from training
Example:
originalData, err := scaler.InverseTransform(scaledData)
if err != nil {
log.Fatal(err)
}
func (*MinMaxScaler) Transform ¶
func (m *MinMaxScaler) Transform(X mat.Matrix) (_ mat.Matrix, err error)
Transform scales input data to the fitted feature range.
This method transforms data using the minimum and maximum values computed during the Fit phase. Each feature is independently scaled to the target range.
Parameters:
- X: Input data matrix of shape (n_samples, n_features)
Returns:
- mat.Matrix: Scaled data matrix with values in the target range
- error: nil if successful, otherwise an error describing the failure
Errors:
- ErrNotFitted: if the scaler hasn't been fitted yet
- ErrDimensionMismatch: if X doesn't match the number of features from training
Example:
scaledData, err := scaler.Transform(testData)
if err != nil {
log.Fatal(err)
}
// scaledData values are now in the range specified during NewMinMaxScaler
type OneHotEncoder ¶
type OneHotEncoder struct {
model.BaseEstimator
// Categories holds the list of categories for each feature (sorted)
Categories [][]string
// CategoryToIdx maps each feature's categories to their indices
CategoryToIdx []map[string]int
// NFeatures is the number of input features
NFeatures int
// NOutputs is the number of output features (total number of categories)
NOutputs int
}
OneHotEncoder is a scikit-learn compatible one-hot encoder. It converts categorical string data into binary 0/1 vectors.
Example ¶
ExampleOneHotEncoder demonstrates OneHotEncoder usage
package main
import (
"fmt"
"log"
"github.com/YuminosukeSato/scigo/preprocessing"
)
func main() {
// Create sample categorical data
data := [][]string{
{"red"},
{"green"},
{"blue"},
{"red"},
}
// Create and fit encoder
encoder := preprocessing.NewOneHotEncoder()
err := encoder.Fit(data)
if err != nil {
log.Fatal(err)
}
// Transform the data
encoded, err := encoder.Transform(data)
if err != nil {
log.Fatal(err)
}
// Print feature names
features := encoder.GetFeatureNamesOut(nil)
fmt.Printf("Features: %v\n", features)
// Print encoded shape
r, c := encoded.Dims()
fmt.Printf("Encoded shape: (%d, %d)\n", r, c)
}
Output:

Features: [x0_blue x0_green x0_red]
Encoded shape: (4, 3)
func NewOneHotEncoder ¶
func NewOneHotEncoder() *OneHotEncoder
NewOneHotEncoder creates a new OneHotEncoder.
Returns:
- *OneHotEncoder: a new OneHotEncoder instance
Example:
encoder := preprocessing.NewOneHotEncoder()
err := encoder.Fit(data)
encoded, err := encoder.Transform(data)
func (*OneHotEncoder) Fit ¶
func (e *OneHotEncoder) Fit(data [][]string) (err error)
Fit learns the category information from the training data.
Parameters:
- data: training data (an n_samples × n_features slice of strings)
Returns:
- error: non-nil if an error occurred
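Conceptually, fitting amounts to scanning each feature column and recording its sorted unique categories. A minimal standard-library sketch of that idea (not the package's actual implementation):

```go
package main

import (
	"fmt"
	"sort"
)

// fitCategories collects the sorted unique categories of each feature
// column from data laid out as n_samples x n_features.
func fitCategories(data [][]string) [][]string {
	if len(data) == 0 {
		return nil
	}
	nFeatures := len(data[0])
	cats := make([][]string, nFeatures)
	for j := 0; j < nFeatures; j++ {
		seen := map[string]bool{}
		for _, row := range data {
			seen[row[j]] = true
		}
		for c := range seen {
			cats[j] = append(cats[j], c)
		}
		sort.Strings(cats[j])
	}
	return cats
}

func main() {
	// Same data as the OneHotEncoder example above.
	data := [][]string{{"red"}, {"green"}, {"blue"}, {"red"}}
	fmt.Println(fitCategories(data)) // [[blue green red]]
}
```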
func (*OneHotEncoder) FitTransform ¶
func (e *OneHotEncoder) FitTransform(data [][]string) (_ mat.Matrix, err error)
FitTransform fits on the training data and transforms that same data.
Parameters:
- data: the data to fit and transform
Returns:
- mat.Matrix: the one-hot encoded data
- error: non-nil if an error occurred
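The transformation step places a 1 in the column assigned to each sample's category and 0 elsewhere. A standard-library sketch of this mapping for a single feature column (the real method operates on all columns and returns a gonum mat.Matrix):

```go
package main

import "fmt"

// oneHot encodes one categorical column given its sorted category list,
// producing one row of 0/1 values per sample. Unknown categories are not
// handled here; the real encoder would report an error for them.
func oneHot(column []string, categories []string) [][]float64 {
	idx := map[string]int{}
	for i, c := range categories {
		idx[c] = i
	}
	out := make([][]float64, len(column))
	for i, v := range column {
		row := make([]float64, len(categories))
		row[idx[v]] = 1
		out[i] = row
	}
	return out
}

func main() {
	encoded := oneHot([]string{"red", "green", "blue"}, []string{"blue", "green", "red"})
	for _, row := range encoded {
		fmt.Println(row)
	}
	// [0 0 1]
	// [0 1 0]
	// [1 0 0]
}
```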
func (*OneHotEncoder) GetFeatureNamesOut ¶
func (e *OneHotEncoder) GetFeatureNamesOut(inputFeatures []string) []string
GetFeatureNamesOut returns the names of the transformed output features.
Parameters:
- inputFeatures: names of the input features (if nil, "x0", "x1", ... are used)
Returns:
- []string: slice of output feature names
Example:
- For input feature names ["animal", "size"]
- Output: ["animal_cat", "animal_dog", "size_large", "size_small"]
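The naming scheme is "<feature>_<category>" per fitted category, in sorted category order. A standard-library sketch of that scheme (an illustration, not the package's code):

```go
package main

import "fmt"

// featureNamesOut builds output names as "<feature>_<category>" for each
// feature's sorted categories, falling back to "x0", "x1", ... when no
// input names are given.
func featureNamesOut(inputFeatures []string, categories [][]string) []string {
	var names []string
	for j, cats := range categories {
		base := fmt.Sprintf("x%d", j)
		if inputFeatures != nil {
			base = inputFeatures[j]
		}
		for _, c := range cats {
			names = append(names, base+"_"+c)
		}
	}
	return names
}

func main() {
	cats := [][]string{{"cat", "dog"}, {"large", "small"}}
	fmt.Println(featureNamesOut([]string{"animal", "size"}, cats))
	// [animal_cat animal_dog size_large size_small]
}
```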
type StandardScaler ¶
type StandardScaler struct {
model.BaseEstimator
// Mean is the mean of each feature
Mean []float64
// Scale is the standard deviation of each feature
Scale []float64
// NFeatures is the number of features
NFeatures int
// WithMean indicates whether to subtract the mean (default: true)
WithMean bool
// WithStd indicates whether to divide by the standard deviation (default: true)
WithStd bool
}
StandardScaler is a scikit-learn compatible standardization scaler. It transforms data to zero mean and unit standard deviation.
Example ¶
ExampleStandardScaler demonstrates basic usage of StandardScaler
package main
import (
"fmt"
"log"
"github.com/YuminosukeSato/scigo/preprocessing"
"gonum.org/v1/gonum/mat"
)
func main() {
// Create sample training data
data := []float64{
1.0, 2.0,
3.0, 4.0,
5.0, 6.0,
7.0, 8.0,
}
X := mat.NewDense(4, 2, data)
// Create and fit scaler
scaler := preprocessing.NewStandardScaler(true, true)
err := scaler.Fit(X)
if err != nil {
log.Fatal(err)
}
// Transform the data
scaled, err := scaler.Transform(X)
if err != nil {
log.Fatal(err)
}
// Print first row of scaled data
fmt.Printf("Scaled first row: [%.2f, %.2f]\n", scaled.At(0, 0), scaled.At(0, 1))
}
Output: Scaled first row: [-1.34, -1.34]
Example (FitTransform) ¶
ExampleStandardScaler_fitTransform demonstrates FitTransform usage
package main
import (
"fmt"
"log"
"github.com/YuminosukeSato/scigo/preprocessing"
"gonum.org/v1/gonum/mat"
)
func main() {
// Create sample data
data := []float64{
10.0, 100.0,
20.0, 200.0,
30.0, 300.0,
}
X := mat.NewDense(3, 2, data)
// Create scaler and fit+transform in one step
scaler := preprocessing.NewStandardScaler(true, true)
scaled, err := scaler.FitTransform(X)
if err != nil {
log.Fatal(err)
}
// Check that scaler is now fitted
if scaler.IsFitted() {
fmt.Println("Scaler is fitted")
}
// Print dimensions
r, c := scaled.Dims()
fmt.Printf("Scaled data shape: (%d, %d)\n", r, c)
}
Output:

Scaler is fitted
Scaled data shape: (3, 2)
Example (InverseTransform) ¶
ExampleStandardScaler_inverseTransform demonstrates inverse transformation
package main
import (
"fmt"
"log"
"github.com/YuminosukeSato/scigo/preprocessing"
"gonum.org/v1/gonum/mat"
)
func main() {
// Original data
data := []float64{
2.0, 4.0,
6.0, 8.0,
}
X := mat.NewDense(2, 2, data)
// Standardize
scaler := preprocessing.NewStandardScaler(true, true)
scaled, err := scaler.FitTransform(X)
if err != nil {
log.Fatal(err)
}
// Inverse transform back to original scale
restored, err := scaler.InverseTransform(scaled)
if err != nil {
log.Fatal(err)
}
// Check if values match original (within floating point precision)
fmt.Printf("Original: [%.1f, %.1f]\n", X.At(0, 0), X.At(0, 1))
fmt.Printf("Restored: [%.1f, %.1f]\n", restored.At(0, 0), restored.At(0, 1))
}
Output:

Original: [2.0, 4.0]
Restored: [2.0, 4.0]
func NewStandardScaler ¶
func NewStandardScaler(withMean, withStd bool) *StandardScaler
NewStandardScaler creates a new StandardScaler for feature standardization.
StandardScaler transforms features by removing the mean and scaling to unit variance. This is a common preprocessing step that ensures all features contribute equally to machine learning algorithms and improves numerical stability.
Parameters:
- withMean: whether to center the data at zero by removing the mean (default: true)
- withStd: whether to scale the data to unit variance by dividing by standard deviation (default: true)
Returns:
- *StandardScaler: A new StandardScaler instance ready for fitting
Example:
// Standard z-score normalization (mean=0, std=1)
scaler := preprocessing.NewStandardScaler(true, true)
err := scaler.Fit(X_train)
X_scaled, err := scaler.Transform(X_test)

// Scale only (keep the original mean)
scalerNoCenter := preprocessing.NewStandardScaler(false, true)
func NewStandardScalerDefault ¶
func NewStandardScalerDefault() *StandardScaler
NewStandardScalerDefault creates a StandardScaler with the default settings (withMean=true, withStd=true).
func (*StandardScaler) Fit ¶
func (s *StandardScaler) Fit(X mat.Matrix) (err error)
Fit computes the statistics (mean and scale) from the training data.
This method calculates the feature-wise mean and standard deviation from the provided training data, which will be used for future transformations. The scaler must be fitted before calling Transform or InverseTransform.
Parameters:
- X: Training data matrix of shape (n_samples, n_features)
Returns:
- error: nil if successful, otherwise an error describing the failure
Errors:
- ErrEmptyData: if X is empty
- ErrDimensionMismatch: if X has inconsistent dimensions
Example:
scaler := preprocessing.NewStandardScaler(true, true)
err := scaler.Fit(trainingData)
if err != nil {
log.Fatal(err)
}
func (*StandardScaler) FitTransform ¶
func (s *StandardScaler) FitTransform(X mat.Matrix) (_ mat.Matrix, err error)
FitTransform fits the scaler and transforms the training data in one step.
This convenience method combines Fit and Transform operations, computing statistics from the input data and immediately applying the transformation. Equivalent to calling Fit(X) followed by Transform(X).
Parameters:
- X: Training data matrix of shape (n_samples, n_features)
Returns:
- mat.Matrix: Standardized training data matrix
- error: nil if successful, otherwise an error from either fitting or transformation
Example:
scaler := preprocessing.NewStandardScaler(true, true)
scaledTraining, err := scaler.FitTransform(trainingData)
if err != nil {
log.Fatal(err)
}
// Now scaler is fitted and can transform new data
scaledTest, err := scaler.Transform(testData)
func (*StandardScaler) GetParams ¶
func (s *StandardScaler) GetParams() map[string]interface{}
GetParams returns the scaler's parameters.
func (*StandardScaler) InverseTransform ¶
func (s *StandardScaler) InverseTransform(X mat.Matrix) (_ mat.Matrix, err error)
InverseTransform reverses the standardization transformation.
This method transforms standardized data back to the original scale using the fitted statistics. The inverse transformation formula is: X_orig = X_scaled * scale + mean.
Parameters:
- X: Standardized data matrix of shape (n_samples, n_features)
Returns:
- mat.Matrix: Data matrix in original scale
- error: nil if successful, otherwise an error describing the failure
Errors:
- ErrNotFitted: if the scaler hasn't been fitted yet
- ErrDimensionMismatch: if X doesn't match the number of features from training
Example:
originalData, err := scaler.InverseTransform(scaledData)
if err != nil {
log.Fatal(err)
}
func (*StandardScaler) Transform ¶
func (s *StandardScaler) Transform(X mat.Matrix) (_ mat.Matrix, err error)
Transform applies standardization to the input data using fitted statistics.
This method standardizes features by removing the mean and scaling to unit variance using the statistics computed during the Fit phase. The transformation formula is: X_scaled = (X - mean) / scale.
Parameters:
- X: Input data matrix of shape (n_samples, n_features)
Returns:
- mat.Matrix: Standardized data matrix with same shape as input
- error: nil if successful, otherwise an error describing the failure
Errors:
- ErrNotFitted: if the scaler hasn't been fitted yet
- ErrDimensionMismatch: if X doesn't match the number of features from training
Example:
scaledData, err := scaler.Transform(testData)
if err != nil {
log.Fatal(err)
}