Documentation
¶
Overview ¶
Package csv provides unified CSV parsing, writing, and validation functionality for the GoPCA monorepo. It consolidates previously scattered CSV operations into a single, well-tested package following the DRY principle.
Index ¶
- func AnalyzeMissingValues(data *Data) map[string]interface{}
- func Save(w io.Writer, data *Data, opts Options) error
- func SaveFile(filename string, data *Data, opts Options) error
- func SaveMatrix(filename string, matrix types.Matrix, headers []string, rowNames []string, ...) error
- func ToNumericMatrix(stringData [][]string, nullValues []string) (types.Matrix, [][]bool, error)
- func ToStringMatrix(matrix types.Matrix, precision int) [][]string
- func ValidateStructure(data *Data) error
- type CSVWritable
- type ColumnStatistics
- type Data
- type DataProvider
- type Options
- type ParseMode
- type Reader
- type ValidationResult
- type Validator
- type Writer
- func (w *Writer) Write(output io.Writer, data *Data) error
- func (w *Writer) WriteFile(filename string, data *Data) error
- func (w *Writer) WriteMatrix(output io.Writer, matrix types.Matrix, headers []string, rowNames []string) error
- func (w *Writer) WriteMatrixFile(filename string, matrix types.Matrix, headers []string, rowNames []string) error
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func AnalyzeMissingValues ¶
AnalyzeMissingValues analyzes missing value patterns
func SaveMatrix ¶
func SaveMatrix(filename string, matrix types.Matrix, headers []string, rowNames []string, opts Options) error
SaveMatrix is a convenience function for writing a matrix to CSV
func ToNumericMatrix ¶
ToNumericMatrix converts string data to numeric matrix with missing value tracking
func ToStringMatrix ¶
ToStringMatrix converts numeric matrix to string representation
func ValidateStructure ¶
ValidateStructure performs basic structural validation
Types ¶
type CSVWritable ¶
type CSVWritable interface {
DataProvider
WriteHeaders(w io.Writer, opts Options) error
WriteRow(w io.Writer, index int, opts Options) error
}
CSVWritable is an interface for data that can be written to CSV
type ColumnStatistics ¶
type ColumnStatistics struct {
Name string
Index int
DataType string // "numeric", "categorical", "mixed"
NonMissing int
Missing int
MissingPercent float64
Mean float64 // For numeric columns
StdDev float64 // For numeric columns
Min float64 // For numeric columns
Max float64 // For numeric columns
UniqueValues int // For categorical columns
HasZeroVariance bool // Warning flag
}
ColumnStatistics contains statistics for a single column
type Data ¶
type Data struct {
// Core numeric data (always present for PCA)
Matrix types.Matrix // Numeric data matrix
Headers []string // Column names
RowNames []string // Row names
MissingMask [][]bool // Track missing values (true = missing)
Rows int // Number of data rows
Columns int // Number of data columns
// Additional data types (optional)
StringData [][]string // Raw string data (for GoCSV)
CategoricalColumns map[string][]string // Categorical columns by name
NumericTargetColumns map[string][]float64 // Numeric target columns
}
Data represents parsed CSV data with support for different data types
type DataProvider ¶
type DataProvider interface {
GetHeaders() []string
GetRowNames() []string
GetDimensions() (rows, cols int)
HasNumericData() bool
HasStringData() bool
}
DataProvider is an interface that different data representations can implement to provide consistent access to CSV data regardless of internal structure
type Options ¶
type Options struct {
// Parsing options
Delimiter rune // Field delimiter: ',', ';', '\t'
DecimalSeparator rune // Decimal separator: '.', ','
HasHeaders bool // First row contains column names
HasRowNames bool // First column contains row names
NullValues []string // Strings to treat as missing values
ParseMode ParseMode // How to parse the data
TargetSuffix string // Suffix to identify target columns (e.g., "#target")
// Reading options (for large files)
SkipRows int // Number of rows to skip at start
MaxRows int // Maximum rows to read (0 for all)
Columns []int // Specific columns to read (empty for all)
StreamingMode bool // Enable streaming for large files
// Writing options
FloatFormat byte // Format for float output: 'g', 'f', 'e'
Precision int // Decimal precision for float output (-1 for auto)
}
Options provides unified configuration for CSV operations
func DefaultOptions ¶
func DefaultOptions() Options
DefaultOptions returns sensible default options for CSV operations
func EuropeanOptions ¶
func EuropeanOptions() Options
EuropeanOptions returns options for European CSV format (semicolon delimiter, comma decimal)
func TabDelimitedOptions ¶
func TabDelimitedOptions() Options
TabDelimitedOptions returns options for tab-delimited files
type Reader ¶
type Reader struct {
// contains filtered or unexported fields
}
Reader provides unified CSV reading functionality
type ValidationResult ¶
type ValidationResult struct {
Valid bool
Errors []string
Warnings []string
ColumnStats []ColumnStatistics
}
ValidationResult contains the results of CSV validation
func ValidateFile ¶
func ValidateFile(filename string, opts Options) (*ValidationResult, error)
ValidateFile validates a CSV file
type Validator ¶
type Validator struct {
// contains filtered or unexported fields
}
Validator provides CSV validation functionality
func NewValidator ¶
NewValidator creates a new CSV validator with the given options
func (*Validator) Validate ¶
func (v *Validator) Validate(data *Data) *ValidationResult
Validate performs comprehensive validation on CSV data
type Writer ¶
type Writer struct {
// contains filtered or unexported fields
}
Writer provides unified CSV writing functionality