Documentation ¶
Overview ¶
Package dataquality provides data quality analysis and missing-value handling for tabular datasets. It does not depend on any UI layer or on the Wails framework.
The package operates on plain [][]string data matrices and column metadata maps to avoid circular dependencies with UI-layer packages. Callers convert their domain types (e.g. FileData) into the AnalysisInput struct before calling package functions.
Key capabilities:
- Missing-value detection, statistics, and fill strategies (mean, median, mode, forward-fill, backward-fill, custom value)
- Per-column statistics (mean, median, standard deviation, percentiles, skewness, kurtosis, categorical frequency)
- Distribution analysis and histogram generation
- Outlier detection via IQR and Z-score methods
- Pairwise Pearson correlation matrix
- Data quality scoring and actionable issue/recommendation generation
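The correlation capability in the list above has no dedicated type on this page. As a rough illustration of what one cell of a pairwise Pearson matrix holds, the sketch below defines a hypothetical pearson helper. It is not the package's implementation, and returning NaN for degenerate input is an assumption made for this example.

```go
package main

import (
	"fmt"
	"math"
)

// pearson is a hypothetical helper that returns the Pearson correlation
// coefficient of x and y. It returns NaN when the slices differ in length,
// hold fewer than two points, or either one has zero variance.
func pearson(x, y []float64) float64 {
	n := len(x)
	if n != len(y) || n < 2 {
		return math.NaN()
	}
	var sumX, sumY float64
	for i := range x {
		sumX += x[i]
		sumY += y[i]
	}
	meanX, meanY := sumX/float64(n), sumY/float64(n)

	// Accumulate the covariance and both variances (unnormalized).
	var cov, varX, varY float64
	for i := range x {
		dx, dy := x[i]-meanX, y[i]-meanY
		cov += dx * dy
		varX += dx * dx
		varY += dy * dy
	}
	if varX == 0 || varY == 0 {
		return math.NaN()
	}
	return cov / math.Sqrt(varX*varY)
}

func main() {
	x := []float64{1, 2, 3, 4}
	y := []float64{2, 4, 6, 8}
	fmt.Printf("%.3f\n", pearson(x, y))
}
```

A full matrix would call such a helper for every pair of numeric columns after parsing the string cells into floats.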
Index ¶
- func Fill(data [][]string, headers []string, columnTypes map[string]string, ...) ([][]string, error)
- type AnalysisInput
- type ColumnAnalysis
- type ColumnMissing
- type ColumnStatistics
- type DataProfile
- type DataQualityReport
  - func AnalyzeDataQuality(in AnalysisInput) (*DataQualityReport, error)
- type DistributionInfo
- type FillRequest
- type HistogramBin
- type MissingValueStats
  - func AnalyzeMissing(data [][]string, headers []string) *MissingValueStats
- type OutlierInfo
- type QualityIssue
- type Recommendation
- type RowMissing
Functions ¶
func Fill ¶
func Fill(data [][]string, headers []string, columnTypes map[string]string, req FillRequest) ([][]string, error)
Fill applies a fill strategy to a deep copy of data and returns the new matrix. The original data is never modified.
Valid values for req.Strategy are "mean", "median", "mode", "forward", "backward", and "custom"; with "custom", req.Value supplies the fill value. If req.Column is empty, all columns are processed; otherwise only the named column is filled.
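To make the copy-then-fill behavior concrete, here is a minimal sketch of a forward fill over one column. The names forwardFill and isMissing are hypothetical, and treating blank cells as missing is an assumption; this page does not say how Fill itself detects missing values.

```go
package main

import (
	"fmt"
	"strings"
)

// isMissing is a hypothetical helper: this sketch treats blank cells as missing.
func isMissing(s string) bool { return strings.TrimSpace(s) == "" }

// forwardFill copies data and replaces each missing cell in column col with
// the nearest non-missing value above it. Missing cells that precede the
// first real value are left unchanged.
func forwardFill(data [][]string, col int) [][]string {
	out := make([][]string, len(data))
	for i, row := range data {
		out[i] = append([]string(nil), row...) // copy each row so the input is untouched
	}
	last, seen := "", false
	for _, row := range out {
		if col < 0 || col >= len(row) {
			continue // skip rows that do not have this column
		}
		if isMissing(row[col]) {
			if seen {
				row[col] = last
			}
			continue
		}
		last, seen = row[col], true
	}
	return out
}

func main() {
	data := [][]string{{"1", "a"}, {"", "b"}, {"3", ""}, {"", "d"}}
	filled := forwardFill(data, 0)
	fmt.Println(filled) // column 0 is filled in the copy
	fmt.Println(data)   // the original still has its blank cells
}
```

A backward fill would walk the rows in reverse with the same logic.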
Types ¶
type AnalysisInput ¶
type AnalysisInput struct {
	Data        [][]string
	Headers     []string
	ColumnTypes map[string]string // "numeric", "categorical", "target"
	RowNames    []string
	Rows        int
	Columns     int
}
AnalysisInput carries all data needed for quality analysis. Callers populate this from their own domain type (e.g. FileData) so that this package remains independent of UI/Wails types.
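The conversion step described above might look like the following sketch. FileData and its fields are hypothetical stand-ins for a caller's domain type, AnalysisInput is redeclared locally only so the example compiles on its own, and setting Rows and Columns from the slice lengths is an assumption about how callers are expected to populate them.

```go
package main

import "fmt"

// AnalysisInput mirrors the struct documented above so that this sketch
// compiles without importing the package.
type AnalysisInput struct {
	Data        [][]string
	Headers     []string
	ColumnTypes map[string]string
	RowNames    []string
	Rows        int
	Columns     int
}

// FileData is a hypothetical caller-side type; its fields are assumptions.
type FileData struct {
	Header []string
	Cells  [][]string
	Kinds  map[string]string
}

// toAnalysisInput maps the caller's type onto the plain input struct.
func toAnalysisInput(fd FileData) AnalysisInput {
	return AnalysisInput{
		Data:        fd.Cells,
		Headers:     fd.Header,
		ColumnTypes: fd.Kinds,
		Rows:        len(fd.Cells),
		Columns:     len(fd.Header),
	}
}

func main() {
	fd := FileData{
		Header: []string{"age", "city"},
		Cells:  [][]string{{"31", "Oslo"}, {"", "Bergen"}},
		Kinds:  map[string]string{"age": "numeric", "city": "categorical"},
	}
	input := toAnalysisInput(fd)
	fmt.Println(input.Rows, input.Columns)
}
```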
type ColumnAnalysis ¶
type ColumnAnalysis struct {
	Name         string           `json:"name"`
	Type         string           `json:"type"` // "numeric", "categorical", "target"
	Stats        ColumnStatistics `json:"stats"`
	Distribution DistributionInfo `json:"distribution"`
	Outliers     []OutlierInfo    `json:"outliers"`
	QualityScore float64          `json:"qualityScore"`
}
ColumnAnalysis contains detailed analysis for a single column.
type ColumnMissing ¶
type ColumnMissing struct {
	Name           string  `json:"name"`
	TotalValues    int     `json:"totalValues"`
	MissingValues  int     `json:"missingValues"`
	MissingPercent float64 `json:"missingPercent"`
	Pattern        string  `json:"pattern"` // "none", "random", "systematic", "top", "bottom"
}
ColumnMissing contains missing-value statistics for one column.
type ColumnStatistics ¶
type ColumnStatistics struct {
	Count          int            `json:"count"`
	Missing        int            `json:"missing"`
	MissingPercent float64        `json:"missingPercent"`
	Unique         int            `json:"unique"`
	Mean           *float64       `json:"mean,omitempty"`
	Median         *float64       `json:"median,omitempty"`
	Mode           *string        `json:"mode,omitempty"`
	StdDev         *float64       `json:"stdDev,omitempty"`
	Min            *float64       `json:"min,omitempty"`
	Max            *float64       `json:"max,omitempty"`
	Q1             *float64       `json:"q1,omitempty"`
	Q3             *float64       `json:"q3,omitempty"`
	IQR            *float64       `json:"iqr,omitempty"`
	Skewness       *float64       `json:"skewness,omitempty"`
	Kurtosis       *float64       `json:"kurtosis,omitempty"`
	Categories     map[string]int `json:"categories,omitempty"`
}
ColumnStatistics contains statistical measures for a column.
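The pointer fields combined with omitempty let the JSON output distinguish a statistic that was not computed (nil, key omitted) from one that is genuinely zero (pointer to 0, key present). The sketch below shows this with a trimmed, hypothetical stand-in for ColumnStatistics; which fields the package actually leaves nil for a given column type is not stated on this page.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// columnStats is a trimmed, hypothetical stand-in for ColumnStatistics.
type columnStats struct {
	Count int      `json:"count"`
	Mean  *float64 `json:"mean,omitempty"`
	Mode  *string  `json:"mode,omitempty"`
}

// encode marshals s and returns the JSON text, or the error message.
func encode(s columnStats) string {
	b, err := json.Marshal(s)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	zero := 0.0
	mode := "red"
	// A pointer to zero is still emitted; only nil pointers are omitted.
	fmt.Println(encode(columnStats{Count: 3, Mean: &zero})) // {"count":3,"mean":0}
	fmt.Println(encode(columnStats{Count: 3, Mode: &mode})) // {"count":3,"mode":"red"}
}
```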
type DataProfile ¶
type DataProfile struct {
	Rows               int     `json:"rows"`
	Columns            int     `json:"columns"`
	NumericColumns     int     `json:"numericColumns"`
	CategoricalColumns int     `json:"categoricalColumns"`
	TargetColumns      int     `json:"targetColumns"`
	MissingPercent     float64 `json:"missingPercent"`
	DuplicateRows      int     `json:"duplicateRows"`
	MemorySize         string  `json:"memorySize"`
}
DataProfile contains overall dataset-level statistics.
type DataQualityReport ¶
type DataQualityReport struct {
	DataProfile     DataProfile      `json:"dataProfile"`
	ColumnAnalysis  []ColumnAnalysis `json:"columnAnalysis"`
	QualityScore    float64          `json:"qualityScore"`
	Issues          []QualityIssue   `json:"issues"`
	Recommendations []Recommendation `json:"recommendations"`
}
DataQualityReport is the top-level result of a full data quality analysis.
func AnalyzeDataQuality ¶
func AnalyzeDataQuality(in AnalysisInput) (*DataQualityReport, error)
AnalyzeDataQuality performs a comprehensive data quality analysis on the given input and returns a DataQualityReport. It returns an error if the input is empty.
type DistributionInfo ¶
type DistributionInfo struct {
	Histogram       []HistogramBin `json:"histogram,omitempty"`
	IsNormal        bool           `json:"isNormal"`
	NormalityPValue float64        `json:"normalityPValue,omitempty"`
	DistType        string         `json:"distType"` // "normal", "right-skewed", "left-skewed", "bimodal", "unknown"
}
DistributionInfo describes the distribution shape of a numeric column.
type FillRequest ¶
type FillRequest struct {
	Strategy string // "mean", "median", "mode", "forward", "backward", "custom"
	Column   string // Column name, or empty string to process all columns
	Value    string // Custom fill value (used when Strategy == "custom")
}
FillRequest describes a missing-value fill operation.
type HistogramBin ¶
type HistogramBin struct {
	Min   float64 `json:"min"`
	Max   float64 `json:"max"`
	Count int     `json:"count"`
}
HistogramBin represents one bin in a histogram.
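One plausible way to produce such bins is equal-width binning, sketched below. The histogram function is hypothetical, HistogramBin is redeclared locally so the example is self-contained, and both the bin-edge convention (half-open bins, with the maximum placed in the last bin) and the caller-chosen bin count are assumptions rather than documented behavior.

```go
package main

import "fmt"

// HistogramBin mirrors the struct documented above.
type HistogramBin struct {
	Min   float64 `json:"min"`
	Max   float64 `json:"max"`
	Count int     `json:"count"`
}

// histogram splits the range of values into `bins` equal-width bins and
// counts how many values fall into each one.
func histogram(values []float64, bins int) []HistogramBin {
	if len(values) == 0 || bins <= 0 {
		return nil
	}
	lo, hi := values[0], values[0]
	for _, v := range values {
		if v < lo {
			lo = v
		}
		if v > hi {
			hi = v
		}
	}
	if lo == hi {
		// All values are identical: a single degenerate bin holds everything.
		return []HistogramBin{{Min: lo, Max: hi, Count: len(values)}}
	}
	width := (hi - lo) / float64(bins)
	out := make([]HistogramBin, bins)
	for i := range out {
		out[i].Min = lo + float64(i)*width
		out[i].Max = lo + float64(i+1)*width
	}
	for _, v := range values {
		i := int((v - lo) / width)
		if i >= bins {
			i = bins - 1 // the maximum value belongs to the last bin
		}
		out[i].Count++
	}
	return out
}

func main() {
	values := []float64{0, 1, 2, 3, 4, 5, 6, 7, 8, 10}
	for _, b := range histogram(values, 2) {
		fmt.Printf("[%g, %g): %d\n", b.Min, b.Max, b.Count)
	}
}
```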
type MissingValueStats ¶
type MissingValueStats struct {
	TotalCells     int                       `json:"totalCells"`
	MissingCells   int                       `json:"missingCells"`
	MissingPercent float64                   `json:"missingPercent"`
	ColumnStats    map[string]*ColumnMissing `json:"columnStats"`
	RowStats       map[int]*RowMissing       `json:"rowStats"`
}
MissingValueStats contains missing-value statistics for an entire dataset.
func AnalyzeMissing ¶
func AnalyzeMissing(data [][]string, headers []string) *MissingValueStats
AnalyzeMissing returns missing-value statistics for the given data matrix and column headers. If data is nil or empty, it returns a zeroed MissingValueStats.
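The totals that MissingValueStats reports can be pictured with the counting sketch below. countMissing is a hypothetical function, not the package's code. Treating whitespace-only cells as missing, and counting cells absent from short rows as missing, are assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// countMissing tallies blank cells overall and per column name.
// Blank (whitespace-only) cells and cells absent from short rows are
// counted as missing; both rules are assumptions of this sketch.
func countMissing(data [][]string, headers []string) (total, missing int, perColumn map[string]int) {
	perColumn = make(map[string]int, len(headers))
	for _, row := range data {
		for c, name := range headers {
			total++
			if c >= len(row) || strings.TrimSpace(row[c]) == "" {
				missing++
				perColumn[name]++
			}
		}
	}
	return total, missing, perColumn
}

func main() {
	data := [][]string{{"1", "x"}, {"", "y"}, {"3"}}
	total, missing, perColumn := countMissing(data, []string{"a", "b"})

	// Guard against division by zero for empty input.
	pct := 0.0
	if total > 0 {
		pct = 100 * float64(missing) / float64(total)
	}
	fmt.Println(total, missing, perColumn)
	fmt.Printf("%.1f%%\n", pct)
}
```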
type OutlierInfo ¶
type OutlierInfo struct {
	RowIndex int     `json:"rowIndex"`
	Value    string  `json:"value"`
	Method   string  `json:"method"` // "iqr" or "zscore"
	Score    float64 `json:"score"`
}
OutlierInfo describes one detected outlier value.
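For the "iqr" method, a common rule flags values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]. The sketch below applies that rule with hypothetical helpers quantile and iqrOutliers. The 1.5 multiplier and the linear-interpolation quantile are conventional choices assumed here; this page does not specify the thresholds or the quantile method the package uses.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// quantile returns the p-quantile (0 <= p <= 1) of a non-empty sorted slice,
// interpolating linearly between the two nearest ranks.
func quantile(sorted []float64, p float64) float64 {
	pos := p * float64(len(sorted)-1)
	lo := int(math.Floor(pos))
	hi := int(math.Ceil(pos))
	return sorted[lo] + (pos-float64(lo))*(sorted[hi]-sorted[lo])
}

// iqrOutliers returns the indices of values lying more than 1.5*IQR below Q1
// or above Q3. Inputs with fewer than four values yield no outliers.
func iqrOutliers(values []float64) []int {
	if len(values) < 4 {
		return nil
	}
	sorted := append([]float64(nil), values...) // sort a copy, keep original order
	sort.Float64s(sorted)

	q1, q3 := quantile(sorted, 0.25), quantile(sorted, 0.75)
	iqr := q3 - q1
	lower, upper := q1-1.5*iqr, q3+1.5*iqr

	var idx []int
	for i, v := range values {
		if v < lower || v > upper {
			idx = append(idx, i)
		}
	}
	return idx
}

func main() {
	values := []float64{10, 12, 11, 13, 12, 100, 11, 12}
	fmt.Println(iqrOutliers(values)) // [5]
}
```

The returned indices correspond to what RowIndex would carry if rows and parsed values are kept in the same order.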
type QualityIssue ¶
type QualityIssue struct {
	Severity    string   `json:"severity"` // "error", "warning", "info"
	Category    string   `json:"category"` // "missing", "outlier", "duplicate", "correlation", "variance", "distribution"
	Description string   `json:"description"`
	Affected    []string `json:"affected"`
	Impact      string   `json:"impact"`
}
QualityIssue describes a detected data quality problem.
type Recommendation ¶
type Recommendation struct {
	Priority    string   `json:"priority"` // "high", "medium", "low"
	Category    string   `json:"category"`
	Action      string   `json:"action"`
	Description string   `json:"description"`
	Columns     []string `json:"columns,omitempty"`
}
Recommendation is an actionable suggestion derived from the quality analysis.