mRMR

package module
v1.0.0
Published: Dec 19, 2024 License: MIT Imports: 8 Imported by: 0

README

mRMR

Overview

Maximum relevance minimum redundancy (mRMR) is a filter-based feature selection method that maximizes the relevance of features to the target variable while explicitly penalizing redundancy among the selected features. Compared with commonly used wrapper-based methods such as Recursive Feature Elimination (RFE) and Boruta, it tends to select a more compact yet still informative set of features.

mRMR Variants

The core idea of mRMR is to select a feature subset S from the feature set F, such that:

$$\max_{S \subseteq F} \left[ \frac{1}{|S|} \sum_{f_i \in S} I(f_i; c) \;-\; \frac{1}{|S|^2} \sum_{f_i, f_j \in S} I(f_i; f_j) \right]$$

where $I(\cdot\,;\cdot)$ denotes mutual information and $c$ is the target (class) variable.

Classical mRMR employs mutual information for both relevance and redundancy calculations. However, this approach struggles with continuous data, as it requires estimating the probability density function (PDF), which is computationally expensive. As a workaround, one can

  • Discretize the data: Convert continuous features into discrete bins.
  • Use alternative metrics: the F-statistic for relevance and the Pearson correlation coefficient for redundancy. (Both workarounds are sketched below.)
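As a rough sketch of both workarounds, using helpers documented later on this page (Discretization, FStatistic, PearsonCorrelation) and assuming the package is imported as mRMR; the toy values are purely illustrative:

// Toy data: each row is an instance, each column a feature (as in DatamRMR).
data := [][]float64{
    {0.10, 1.2, 3.4},
    {0.30, 0.9, 2.8},
    {0.25, 1.1, 3.0},
    {0.80, 0.2, 1.5},
}
class := []int{0, 0, 1, 1}

// Workaround 1: discretize continuous features, then use mutual information
// on the binned values (the first return value is assumed here to be the binned data).
binned, _ := mRMR.Discretization(data, 5)
_ = binned

// Workaround 2: alternative metrics on the raw continuous values.
feature0 := []float64{0.10, 0.30, 0.25, 0.80} // column 0 of data
feature1 := []float64{1.2, 0.9, 1.1, 0.2}     // column 1 of data
rel := mRMR.FStatistic(feature0, class)            // relevance: F-statistic against the class
red := mRMR.PearsonCorrelation(feature0, feature1) // redundancy: |Pearson correlation|
_, _ = rel, red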
Normalization-Based Approach

A known drawback of mRMR is the imbalance between the two terms in the subtraction. To address this, Vinh et al. proposed normalizing each term:

$$\max_{S \subseteq F} \left[ \frac{1}{|S|} \sum_{f_i \in S} \frac{I(f_i; c)}{\log |C|} \;-\; \frac{1}{|S|^2} \sum_{f_i, f_j \in S} \frac{I(f_i; f_j)}{\log N} \right]$$

Where:

  • |C|: Number of classes.
  • N: Quantization level.

Quotient-Based Approach

Another variant of mRMR uses the quotient of relevance and redundancy instead of their difference:

$$\max_{S \subseteq F} \; \frac{\dfrac{1}{|S|} \sum_{f_i \in S} I(f_i; c)}{\dfrac{1}{|S|^2} \sum_{f_i, f_j \in S} I(f_i; f_j)}$$
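To make the greedy selection concrete, the following standalone sketch (not the package's internal implementation) scores each remaining candidate under either the difference or the quotient criterion; it assumes relevance scores and a pairwise redundancy function are already available and that "math" is imported:

// nextFeature returns the index of the candidate that maximizes the mRMR
// criterion against the already-selected set.
func nextFeature(candidates, selected []int, relevance []float64,
    redundancy func(i, j int) float64, useQuotient bool) int {

    best, bestScore := -1, math.Inf(-1)
    for _, c := range candidates {
        // Average redundancy of candidate c against the selected features.
        red := 0.0
        for _, s := range selected {
            red += redundancy(c, s)
        }
        if len(selected) > 0 {
            red /= float64(len(selected))
        }

        var score float64
        if useQuotient {
            score = relevance[c] / (red + 1e-12) // quotient ("quo") criterion
        } else {
            score = relevance[c] - red // difference ("diff") criterion
        }
        if score > bestScore {
            best, bestScore = c, score
        }
    }
    return best
}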

Install

go get github.com/PQMark/mRMR

Usage

Read Data

Use the ReadCSV helper function to load and preprocess your data from a CSV file.

data, features, groups := mRMR.ReadCSV(
    "path/to/data.csv",
    []int{1, 4}, // (1-based) Irrelevant columns, e.g. columns 1 and 4
    []int{},     // (1-based) Irrelevant rows, e.g. none
    1,           // (1-based) Index for features info
    2,           // (1-based) Index for group info
    true,        // Each column is a feature, e.g. true
)

mRMR

mRMRData := mRMR.DatamRMR{X: data, Class: groups}
parasmRMR := mRMR.ParasmRMR{
    Data: mRMRData,
}
featureSelectedIndices, relevance, redundancyMap := parasmRMR.MRMR()
featureSelected := mRMR.GetFeatures(features, featureSelectedIndices)

Args for ParasmRMR (an example configuration follows this list):

  • Discretization (bool): Whether to discretize the data before feature selection.
  • BinSize (int): Number of bins used if discretization is enabled.
  • Method (string): Method for relevance/redundancy calculation.
    Options: "mi-mi", "fs-pearson", "nmi-nmi" (Default: "nmi-nmi").
  • Calculation (string): How to combine relevance and redundancy measures.
    Options: "diff", "quo".
  • MaxFeatures (int): Maximum number of features to select.
  • RedundancyMethod (string): Method for handling redundancy.
    Options: "avg", "max".
  • Threshold (float64): Controls the quantization error for normalized MI. (Default: 0.01)
  • Verbose (bool): If true, prints intermediate relevance, redundancy, and combined results.
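
For example, a fully specified configuration might look like the following; the field values are illustrative choices rather than recommendations, and data, groups, and features are reused from the Read Data step above:

paras := mRMR.ParasmRMR{
    Data:             mRMR.DatamRMR{X: data, Class: groups},
    Discretization:   true,         // bin continuous features before computing MI
    BinSize:          10,           // number of bins used for discretization
    Method:           "fs-pearson", // F-statistic relevance, Pearson redundancy
    Calculation:      "diff",       // relevance minus redundancy
    MaxFeatures:      20,           // stop after selecting 20 features
    RedundancyMethod: "avg",        // average redundancy against the selected set
    Threshold:        0.01,         // quantization-error threshold for normalized MI
    Verbose:          true,         // print intermediate scores
}
selectedIdx, relevanceAll, redundancyMap := paras.MRMR()
featureNames := mRMR.GetFeatures(features, selectedIdx)
_, _, _ = relevanceAll, redundancyMap, featureNames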

Example on MNIST

Both methods achieve a weighted F1 score above 95%. Remarkably, mRMR selects far fewer pixel features than Boruta while still maintaining comparable performance.
mRMR (mi-mi): [figure: mRMR feature importance]

Boruta: [figure: Boruta feature importance]

  • Note: The importance scores are derived from a Random Forest trained on a stratified sample of 1,000 MNIST instances (digits 0, 1, and 7) using only the selected pixels, with 20% held out for testing.

References

  1. Maximum Relevance and Minimum Redundancy Feature Selection Methods for a Marketing Machine Learning Platform
    Zhenyu Zhao, Radhika Anand, and Mallory Wang
    arXiv preprint arXiv:1908.05376, 2019.
  2. A Novel Feature Selection Method Based on Normalized Mutual Information
    La The Vinh, Sungyoung Lee, Young-Tack Park, and Brian J. d'Auriol
    Applied Intelligence, 2011.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func CheckIfAllNegative

func CheckIfAllNegative(data []float64) bool

func CheckIfAllSmallerOne

func CheckIfAllSmallerOne(data []float64) bool

func Delete

func Delete[T any](data []T, idx int) []T

func Discretization

func Discretization(data [][]float64, binSize int) ([][]float64, [][]float64)

func FStatistic

func FStatistic(feature []float64, class []int) float64

FStatistic returns the F-statistic of feature with respect to class.

func GetFeatures

func GetFeatures(features []string, indices []int) []string

func MinMaxNormalization

func MinMaxNormalization(data []float64) []float64

func MutualInfo

func MutualInfo[T1, T2 Numeric](data1 []T1, data2 []T2) float64

MutualInfo calculates the mutual information between two data slices.
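
A minimal usage sketch, assuming the package is imported as mRMR (the values are arbitrary):

x := []int{0, 0, 1, 1, 2, 2}
y := []int{1, 1, 0, 0, 1, 1}
fmt.Println(mRMR.MutualInfo(x, y)) // mutual information between the two discrete slices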

func PairwiseOperation

func PairwiseOperation(data1, data2 []float64, operation string) []float64

func PearsonCorrelation

func PearsonCorrelation(data1, data2 []float64) float64

PearsonCorrelation returns the absolute value of the Pearson correlation coefficient between data1 and data2.

func QuantizationError

func QuantizationError(quantizedData, originalData []float64) float64

QuantizationError returns the quantization error between quantizedData and originalData.

func QuantizationLevel

func QuantizationLevel(data [][]float64, threshold float64) int

QuantizationLevel returns the quantization level of data for the given error threshold.

func ReadCSV

func ReadCSV(filepath string, irrelevantCols, irrelevantRows []int, featureIndex, groupIndex int, colFeatures bool) ([][]float64, []string, []int)

ReadCSV reads a CSV file and returns the data, feature names, and class labels.

func RedundancyUpdate

func RedundancyUpdate(data [][]float64, featureToConsider []int, target int, redundancyMap map[[2]int]float64, redundancyFunc func([]float64, []float64) float64) map[[2]int]float64

RedundancyUpdate calculates the redundancy between each unselected feature with last selected feature and updates the redundancy map.
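
A hypothetical call, assuming feature 3 was the most recently selected feature, features 0–2 are still unselected, and data is a [][]float64 feature matrix as elsewhere on this page; PearsonCorrelation (documented above) matches the redundancyFunc signature:

redundancyMap := map[[2]int]float64{}
redundancyMap = mRMR.RedundancyUpdate(data, []int{0, 1, 2}, 3, redundancyMap, mRMR.PearsonCorrelation)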

func Relevance

func Relevance(data [][]float64, class []int, relevanceFunc func([]float64, []int) float64) []float64

Relevance computes the relevance of each feature with respect to the class and returns the scores as a slice.
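
For example, FStatistic matches the relevanceFunc signature, so per-feature relevance could be computed as follows (the data layout is assumed to follow DatamRMR, with rows as instances):

data := [][]float64{
    {0.1, 1.0},
    {0.2, 0.9},
    {0.8, 0.2},
    {0.9, 0.1},
}
class := []int{0, 0, 1, 1}
scores := mRMR.Relevance(data, class, mRMR.FStatistic)
fmt.Println(scores) // one relevance score per feature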

Types

type DatamRMR

type DatamRMR struct {
	X     [][]float64
	Class []int
}

DatamRMR holds the input dataset and its class labels. Each row of X is an instance.

type Numeric

type Numeric interface {
	int | int8 | int16 | int32 | int64 | float32 | float64
}

type ParasmRMR

type ParasmRMR struct {
	Data             DatamRMR
	Discretization   bool
	BinSize          int
	Method           string
	Calculation      string
	MaxFeatures      int
	RedundancyMethod string
	Threshold        float64
	Verbose          bool
	QLevel           int
	RelevanceFunc    func([]float64, []int) float64
	RedundancyFunc   func([]float64, []float64) float64
}

ParasmRMR holds parameters and functions needed to execute the mRMR algorithm.

func (*ParasmRMR) MRMR

func (paras *ParasmRMR) MRMR() ([]int, []float64, map[[2]int]float64)

MRMR executes the mRMR feature selection and returns:
  • selectedFeatures: the indices of the selected features
  • relevanceAll: the relevance scores of all features
  • redundancyMap: a map storing pairwise redundancy values
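
A minimal call, assuming data and groups were loaded as in the README usage above; fields left unset fall back to the package defaults:

paras := &mRMR.ParasmRMR{Data: mRMR.DatamRMR{X: data, Class: groups}}
selected, relevanceAll, redundancyMap := paras.MRMR()
fmt.Println(selected)           // indices of the selected features
fmt.Println(len(relevanceAll))  // one relevance score per original feature
fmt.Println(len(redundancyMap)) // number of pairwise redundancy values stored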
