Documentation
Overview
Package importer provides automatic file import with type detection for tinySQL.
This package supports importing structured data from various formats (CSV, TSV, JSON, XML) directly into tinySQL tables. It auto-detects delimiters, headers, encodings, and column types to minimize manual configuration.
Features:
- Auto-detect delimiter: ',', ';', '\t', '|' (configurable)
- Auto-detect header row (configurable override)
- Encoding: UTF-8, UTF-8 BOM, UTF-16LE/BE (BOM-based)
- Transparent GZIP input
- Smart type inference (INT, FLOAT, BOOL, TEXT, TIME, JSON)
- Streaming import with batched INSERTs
- Optional CREATE TABLE and TRUNCATE
Example:
import "github.com/SimonWaldherr/tinySQL/internal/importer"
f, _ := os.Open("data.csv")
defer f.Close()
result, err := importer.ImportCSV(ctx, db, tenant, "mytable", f, nil)
if err != nil {
    log.Fatal(err)
}
fmt.Printf("Imported %d rows with %d columns\n", result.RowsInserted, len(result.ColumnNames))
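As an illustration of the delimiter auto-detection listed above, the following stand-alone sketch scores each candidate by how consistently it splits the sample lines into fields. This is a simplified heuristic for intuition only, not this package's actual implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// detectDelimiter picks the candidate delimiter that appears a nonzero,
// consistent number of times across the most sample lines.
func detectDelimiter(sample string, candidates []rune) rune {
	lines := strings.Split(strings.TrimSpace(sample), "\n")
	best, bestScore := candidates[0], -1
	for _, d := range candidates {
		counts := map[int]int{} // occurrences-per-line -> number of lines
		for _, ln := range lines {
			counts[strings.Count(ln, string(d))]++
		}
		for n, c := range counts {
			if n > 0 && c > bestScore {
				best, bestScore = d, c
			}
		}
	}
	return best
}

func main() {
	sample := "id;name;age\n1;Alice;30\n2;Bob;25\n"
	fmt.Printf("detected delimiter: %q\n", detectDelimiter(sample, []rune{',', ';', '\t', '|'}))
}
```

Real detection also has to respect quoted fields, which this sketch ignores.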
Index
- func OpenFile(ctx context.Context, filePath string, opts *ImportOptions) (*storage.DB, string, error)
- type FuzzyImportOptions
- type ImportOptions
- type ImportResult
- func FuzzyImportCSV(ctx context.Context, db *storage.DB, tenant string, tableName string, ...) (*ImportResult, error)
- func FuzzyImportJSON(ctx context.Context, db *storage.DB, tenant string, tableName string, ...) (*ImportResult, error)
- func ImportCSV(ctx context.Context, db *storage.DB, tenant string, tableName string, ...) (*ImportResult, error)
- func ImportFile(ctx context.Context, db *storage.DB, tenant string, tableName string, ...) (*ImportResult, error)
- func ImportJSON(ctx context.Context, db *storage.DB, tenant string, tableName string, ...) (*ImportResult, error)
- func ImportXML(ctx context.Context, db *storage.DB, tenant string, tableName string, ...) (*ImportResult, error)
- type XMLRecord
Constants
This section is empty.
Variables
This section is empty.
Functions
func OpenFile
func OpenFile(ctx context.Context, filePath string, opts *ImportOptions) (*storage.DB, string, error)
OpenFile opens a data file (CSV, TSV, JSON, XML) and returns a DB with the data loaded. The table name is derived from the filename unless specified in options.
This is a convenience function for quick data exploration:
db, _, _ := importer.OpenFile(context.Background(), "data.csv", nil) // Now you can query it with tinySQL
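The filename-to-table-name derivation can be pictured as stripping the directory and extension. tableNameFromPath below is a hypothetical helper for illustration, not part of the package's API:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// tableNameFromPath derives a table name from a file path, e.g.
// "data/sales.csv" -> "sales". (Hypothetical helper; the package's actual
// derivation may differ, e.g. for ".csv.gz" or unusual characters.)
func tableNameFromPath(path string) string {
	base := filepath.Base(path)
	return strings.TrimSuffix(base, filepath.Ext(base))
}

func main() {
	fmt.Println(tableNameFromPath("data/sales.csv"))
}
```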
Types
type FuzzyImportOptions (added in v0.3.1)
type FuzzyImportOptions struct {
*ImportOptions
// SkipInvalidRows skips rows that don't match the expected column count (default true)
SkipInvalidRows bool
// TrimWhitespace aggressively trims whitespace from all values (default true)
TrimWhitespace bool
// FixQuotes attempts to fix unmatched quotes in CSV data (default true)
FixQuotes bool
// CoerceTypes attempts to coerce invalid values to the expected type (default true)
CoerceTypes bool
// MaxSkippedRows maximum number of rows to skip before giving up (default 100)
MaxSkippedRows int
// AllowMixedTypes allows columns to have mixed types (converts to TEXT) (default true)
AllowMixedTypes bool
// FuzzyJSON attempts to parse malformed JSON (default true)
FuzzyJSON bool
// RemoveInvalidChars removes invalid UTF-8 characters (default true)
RemoveInvalidChars bool
// AutoFixDelimiters tries to detect and fix inconsistent delimiters (default true)
AutoFixDelimiters bool
}
FuzzyImportOptions extends ImportOptions with fuzzy parsing capabilities.
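To illustrate what RemoveInvalidChars and TrimWhitespace imply for a single field, a cleanup pass might look like the following sketch (cleanValue is a hypothetical helper; the package's internal handling may differ):

```go
package main

import (
	"fmt"
	"strings"
)

// cleanValue sketches the normalization the fuzzy options describe:
// drop invalid UTF-8 sequences, then aggressively trim whitespace.
func cleanValue(raw string) string {
	s := strings.ToValidUTF8(raw, "") // RemoveInvalidChars
	return strings.TrimSpace(s)       // TrimWhitespace
}

func main() {
	raw := "  Alice\xff\t" // contains an invalid UTF-8 byte
	fmt.Printf("%q\n", cleanValue(raw))
}
```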
type ImportOptions
type ImportOptions struct {
// BatchSize controls how many rows are buffered before executing INSERT (default 1000).
BatchSize int
// NullLiterals are treated as SQL NULL (case-insensitive, trimmed).
// Defaults: "", "null", "na", "n/a", "none", "#n/a"
NullLiterals []string
// CreateTable creates the table if it doesn't exist using inferred column types (default true).
// Disable this if you want to create the table manually with specific types/constraints.
CreateTable bool
// Truncate clears the table before import (default false).
Truncate bool
// HeaderMode controls header detection:
// "auto" (default) → heuristic decides based on data analysis
// "present" → first row is always treated as header
// "absent" → first row is data, synthetic column names generated (col_1, col_2, ...)
HeaderMode string
// DelimiterCandidates tested during auto-detection. Default: , ; \t |
// Override to force specific delimiter(s).
DelimiterCandidates []rune
// TableName overrides the target table name (useful when importing from stdin/pipes).
TableName string
// SampleBytes caps the amount of data used for detection (default 128KB).
SampleBytes int
// SampleRecords caps the number of records analyzed for detection (default 500).
SampleRecords int
// TypeInference controls automatic type detection (default true).
// When enabled, analyzes sample data to determine INT, FLOAT, BOOL, TIME, or TEXT.
// When disabled, all columns default to TEXT type.
TypeInference bool
// DateTimeFormats lists custom datetime formats to try during type detection.
// Defaults include RFC3339, ISO8601, common US/EU formats.
DateTimeFormats []string
// StrictTypes when true causes import to fail if data doesn't match detected types (default false).
// When false, falls back to TEXT on type conversion errors.
StrictTypes bool
}
ImportOptions configures the importer behavior. All fields are optional.
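For intuition, the TypeInference pass can be imagined as classifying every sampled value and keeping the narrowest type that fits all of them. The sketch below covers only INT, FLOAT, BOOL, and TEXT; the package additionally detects TIME and JSON, and its actual rules may differ:

```go
package main

import (
	"fmt"
	"strconv"
)

// inferType returns the narrowest of "BOOL", "INT", "FLOAT", "TEXT"
// that fits every sample value. A sketch of the idea only.
func inferType(samples []string) string {
	isInt, isFloat, isBool := true, true, true
	for _, s := range samples {
		if _, err := strconv.ParseInt(s, 10, 64); err != nil {
			isInt = false
		}
		if _, err := strconv.ParseFloat(s, 64); err != nil {
			isFloat = false
		}
		if _, err := strconv.ParseBool(s); err != nil {
			isBool = false
		}
	}
	switch {
	case isBool:
		return "BOOL"
	case isInt:
		return "INT"
	case isFloat:
		return "FLOAT"
	default:
		return "TEXT"
	}
}

func main() {
	fmt.Println(inferType([]string{"1", "2", "3"}))
	fmt.Println(inferType([]string{"1.5", "2"}))
}
```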
type ImportResult
type ImportResult struct {
RowsInserted int64 // Total rows successfully inserted
RowsSkipped int64 // Rows skipped due to errors (if StrictTypes=false)
Delimiter rune // Detected or configured delimiter
HadHeader bool // Whether a header row was detected/configured
Encoding string // Detected encoding: "utf-8", "utf-8-bom", "utf-16le", "utf-16be"
ColumnNames []string // Final column names used
ColumnTypes []storage.ColType // Detected column types
Errors []string // Non-fatal errors encountered during import
}
ImportResult contains metadata about the import operation.
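The Encoding field reflects BOM-based detection. A minimal sketch of that sniffing (not the package's code) could look like this:

```go
package main

import (
	"bytes"
	"fmt"
)

// detectEncoding inspects the leading bytes for a byte-order mark and
// returns labels like those used by ImportResult.Encoding. Sketch only.
func detectEncoding(b []byte) string {
	switch {
	case bytes.HasPrefix(b, []byte{0xEF, 0xBB, 0xBF}):
		return "utf-8-bom"
	case bytes.HasPrefix(b, []byte{0xFF, 0xFE}):
		return "utf-16le"
	case bytes.HasPrefix(b, []byte{0xFE, 0xFF}):
		return "utf-16be"
	default:
		return "utf-8" // no BOM: assume UTF-8
	}
}

func main() {
	fmt.Println(detectEncoding([]byte{0xEF, 0xBB, 0xBF, 'a'}))
}
```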
func FuzzyImportCSV (added in v0.3.1)
func FuzzyImportCSV(
	ctx context.Context,
	db *storage.DB,
	tenant string,
	tableName string,
	src io.Reader,
	opts *FuzzyImportOptions,
) (*ImportResult, error)
FuzzyImportCSV is a more forgiving version of ImportCSV that handles malformed data.
func FuzzyImportJSON (added in v0.3.1)
func FuzzyImportJSON(
	ctx context.Context,
	db *storage.DB,
	tenant string,
	tableName string,
	src io.Reader,
	opts *FuzzyImportOptions,
) (*ImportResult, error)
FuzzyImportJSON attempts to parse malformed JSON data.
func ImportCSV
func ImportCSV(
	ctx context.Context,
	db *storage.DB,
	tenant string,
	tableName string,
	src io.Reader,
	opts *ImportOptions,
) (*ImportResult, error)
ImportCSV imports delimited data (CSV/TSV) from a reader into a tinySQL table.
This function auto-detects the file format, creates the table if needed, and streams data insertion with batching for performance. It handles various encodings, compressed input (gzip), and performs smart type inference.
Parameters:
- ctx: Context for cancellation
- db: Target database instance
- tenant: Tenant/schema name (use "default" for single-tenant mode)
- tableName: Target table name (can be overridden via opts.TableName)
- src: Input reader (file, network stream, stdin, etc.)
- opts: Optional configuration (nil uses sensible defaults)
Returns ImportResult with metadata and any error encountered.
ImportCSV is the entrypoint for CSV/TSV imports. It orchestrates detection, decoding, sampling, type inference, and insertion, delegating the detailed work to smaller helpers to keep cyclomatic complexity low.
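The "auto" header heuristic referred to by HeaderMode can work along these lines: treat the first row as a header when it is noticeably less numeric than the rows beneath it. A simplified stand-alone sketch; the package's actual heuristic may weigh more signals:

```go
package main

import (
	"fmt"
	"strconv"
)

// numericCells counts how many cells in a row parse as numbers.
func numericCells(row []string) int {
	n := 0
	for _, c := range row {
		if _, err := strconv.ParseFloat(c, 64); err == nil {
			n++
		}
	}
	return n
}

// looksLikeHeader guesses that a first row with fewer numeric cells than
// the average of the rows beneath it is a header. Sketch of the idea only.
func looksLikeHeader(rows [][]string) bool {
	if len(rows) < 2 {
		return false
	}
	first := numericCells(rows[0])
	rest := 0
	for _, r := range rows[1:] {
		rest += numericCells(r)
	}
	avg := float64(rest) / float64(len(rows)-1)
	return float64(first) < avg
}

func main() {
	rows := [][]string{{"id", "name"}, {"1", "Alice"}, {"2", "Bob"}}
	fmt.Println(looksLikeHeader(rows))
}
```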
func ImportFile
func ImportFile(
	ctx context.Context,
	db *storage.DB,
	tenant string,
	tableName string,
	filePath string,
	opts *ImportOptions,
) (*ImportResult, error)
ImportFile detects the file format and imports it into a tinySQL table. Supports: CSV, TSV, JSON, XML (with automatic detection based on extension/content).
Parameters:
- ctx: Context for cancellation
- db: Target database instance
- tenant: Tenant/schema name (use "default" for single-tenant mode)
- tableName: Target table name (if empty, derived from filename)
- filePath: Path to the file to import
- opts: Optional configuration (nil uses sensible defaults)
Returns ImportResult with metadata and any error encountered.
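The extension-based part of that detection might look like the sketch below (detectFormat is a hypothetical helper; per the description above, the real function also falls back to content-based detection, which this sketch omits):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// detectFormat maps a file path to an import format by extension.
// GZIP is handled transparently, so a trailing ".gz" is ignored.
func detectFormat(path string) string {
	p := strings.TrimSuffix(strings.ToLower(path), ".gz")
	switch filepath.Ext(p) {
	case ".csv":
		return "csv"
	case ".tsv":
		return "tsv"
	case ".json", ".jsonl":
		return "json"
	case ".xml":
		return "xml"
	default:
		return "csv" // fall back to delimited-text detection
	}
}

func main() {
	fmt.Println(detectFormat("report.json.gz"))
}
```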
func ImportJSON
func ImportJSON(
	ctx context.Context,
	db *storage.DB,
	tenant string,
	tableName string,
	src io.Reader,
	opts *ImportOptions,
) (*ImportResult, error)
ImportJSON imports JSON data from a reader into a tinySQL table. Supports:
- Array of objects: [{"id": 1, "name": "Alice"}, ...]
- JSON Lines format: {"id": 1, "name": "Alice"}\n{"id": 2, "name": "Bob"}
- Single object: {"id": 1, "name": "Alice"} (creates single-row table)
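Distinguishing those three shapes can be done by inspecting the first non-whitespace character and counting top-level object lines, as in this rough sketch (jsonShape is hypothetical; the package may instead use a streaming decoder):

```go
package main

import (
	"fmt"
	"strings"
)

// jsonShape classifies input as "array", "lines" (JSON Lines), or
// "object". Crude sketch: a leading '[' means array; multiple lines
// each starting with '{' mean JSON Lines; otherwise a single object.
func jsonShape(s string) string {
	t := strings.TrimSpace(s)
	if strings.HasPrefix(t, "[") {
		return "array"
	}
	objLines := 0
	for _, ln := range strings.Split(t, "\n") {
		if strings.HasPrefix(strings.TrimSpace(ln), "{") {
			objLines++
		}
	}
	if objLines > 1 {
		return "lines"
	}
	return "object"
}

func main() {
	fmt.Println(jsonShape("{\"id\":1}\n{\"id\":2}"))
}
```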
func ImportXML
func ImportXML(
	ctx context.Context,
	db *storage.DB,
	tenant string,
	tableName string,
	src io.Reader,
	opts *ImportOptions,
) (*ImportResult, error)
ImportXML imports XML data from a reader into a tinySQL table. Supports simple row-based XML like:
<root>
  <record id="1" name="Alice" />
  <record id="2" name="Bob" />
</root>
or:
<root>
  <record><id>1</id><name>Alice</name></record>
  <record><id>2</id><name>Bob</name></record>
</root>
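Both shapes flatten to the same row structure: attributes and child-element text become columns. A minimal, self-contained sketch of that flattening with encoding/xml (not the package's implementation, and without its type inference or error handling):

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// flattenRecords parses simple row-based XML and returns one map per
// <record> element, merging attributes and child-element text.
// Sketch only: no namespaces, nesting, or error reporting.
func flattenRecords(src string) []map[string]string {
	dec := xml.NewDecoder(strings.NewReader(src))
	var rows []map[string]string
	var cur map[string]string // current record, nil outside <record>
	var field string          // current child element name
	for {
		tok, err := dec.Token()
		if err != nil {
			break // io.EOF (or a parse error) ends the stream
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if t.Name.Local == "record" {
				cur = map[string]string{}
				for _, a := range t.Attr {
					cur[a.Name.Local] = a.Value
				}
			} else if cur != nil {
				field = t.Name.Local
			}
		case xml.CharData:
			if cur != nil && field != "" {
				cur[field] += string(t)
			}
		case xml.EndElement:
			if t.Name.Local == "record" {
				rows = append(rows, cur)
				cur = nil
			}
			field = ""
		}
	}
	return rows
}

func main() {
	rows := flattenRecords(`<root><record id="1" name="Alice"/><record><id>2</id><name>Bob</name></record></root>`)
	fmt.Println(rows)
}
```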