importer

package v0.5.3

Note: This package is not in the latest version of its module.
Published: Dec 19, 2025 License: AGPL-3.0 Imports: 22 Imported by: 0

Documentation

Overview

Package importer provides automatic file import with type detection for tinySQL.

This package supports importing structured data from various formats (CSV, TSV, JSON, XML) directly into tinySQL tables. It auto-detects delimiters, headers, encodings, and column types to minimize manual configuration.

Features:

  • Auto-detect delimiter: ',', ';', '\t', '|' (configurable)
  • Auto-detect header row (configurable override)
  • Encoding: UTF-8, UTF-8 BOM, UTF-16LE/BE (BOM-based)
  • Transparent GZIP input
  • Smart type inference (INT, FLOAT, BOOL, TEXT, TIME, JSON)
  • Streaming import with batched INSERTs
  • Optional CREATE TABLE and TRUNCATE

Example:

import "github.com/SimonWaldherr/tinySQL/internal/importer"

f, _ := os.Open("data.csv")
defer f.Close()
result, err := importer.ImportCSV(ctx, db, tenant, "mytable", f, nil)
if err != nil {
	log.Fatal(err)
}
fmt.Printf("Imported %d rows with %d columns\n", result.RowsInserted, len(result.ColumnNames))

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func OpenFile

func OpenFile(ctx context.Context, filePath string, opts *ImportOptions) (*storage.DB, string, error)

OpenFile opens a data file (CSV, TSV, JSON, XML) and returns a DB with the data loaded. The table name is derived from the filename unless specified in options.

This is a convenience function for quick data exploration:

db, table, _ := importer.OpenFile(context.Background(), "data.csv", nil)
// Now you can query the returned table name (table) with tinySQL

Types

type FuzzyImportOptions added in v0.3.1

type FuzzyImportOptions struct {
	*ImportOptions

	// SkipInvalidRows skips rows that don't match the expected column count (default true)
	SkipInvalidRows bool

	// TrimWhitespace aggressively trims whitespace from all values (default true)
	TrimWhitespace bool

	// FixQuotes attempts to fix unmatched quotes in CSV data (default true)
	FixQuotes bool

	// CoerceTypes attempts to coerce invalid values to the expected type (default true)
	CoerceTypes bool

	// MaxSkippedRows maximum number of rows to skip before giving up (default 100)
	MaxSkippedRows int

	// AllowMixedTypes allows columns to have mixed types (converts to TEXT) (default true)
	AllowMixedTypes bool

	// FuzzyJSON attempts to parse malformed JSON (default true)
	FuzzyJSON bool

	// RemoveInvalidChars removes invalid UTF-8 characters (default true)
	RemoveInvalidChars bool

	// AutoFixDelimiters tries to detect and fix inconsistent delimiters (default true)
	AutoFixDelimiters bool
}

FuzzyImportOptions extends ImportOptions with fuzzy parsing capabilities.
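
As a minimal sketch, a caller might override just a few of these fields. Note that a hand-built Go struct starts from zero values, so whether unset boolean fields still receive their documented true defaults is an implementation detail; the safest reading is to set explicitly everything you rely on. The values below are illustrative:

opts := &importer.FuzzyImportOptions{
	ImportOptions:   &importer.ImportOptions{BatchSize: 500},
	SkipInvalidRows: true,
	TrimWhitespace:  true,
	CoerceTypes:     true,
	MaxSkippedRows:  500, // tolerate more bad rows than the documented default of 100
}

Passing nil instead presumably selects all of the documented defaults, as with ImportOptions.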

type ImportOptions

type ImportOptions struct {
	// BatchSize controls how many rows are buffered before executing INSERT (default 1000).
	BatchSize int

	// NullLiterals are treated as SQL NULL (case-insensitive, trimmed).
	// Defaults: "", "null", "na", "n/a", "none", "#n/a"
	NullLiterals []string

	// CreateTable creates the table if it doesn't exist using inferred column types (default true).
	// Disable this if you want to create the table manually with specific types/constraints.
	CreateTable bool

	// Truncate clears the table before import (default false).
	Truncate bool

	// HeaderMode controls header detection:
	//   "auto" (default)  → heuristic decides based on data analysis
	//   "present"         → first row is always treated as header
	//   "absent"          → first row is data, synthetic column names generated (col_1, col_2, ...)
	HeaderMode string

	// DelimiterCandidates tested during auto-detection. Default: , ; \t |
	// Override to force specific delimiter(s).
	DelimiterCandidates []rune

	// TableName overrides the target table name (useful when importing from stdin/pipes).
	TableName string

	// SampleBytes caps the amount of data used for detection (default 128KB).
	SampleBytes int

	// SampleRecords caps the number of records analyzed for detection (default 500).
	SampleRecords int

	// TypeInference controls automatic type detection (default true).
	// When enabled, analyzes sample data to determine INT, FLOAT, BOOL, TIME, or TEXT.
	// When disabled, all columns default to TEXT type.
	TypeInference bool

	// DateTimeFormats lists custom datetime formats to try during type detection.
	// Defaults include RFC3339, ISO8601, common US/EU formats.
	DateTimeFormats []string

	// StrictTypes when true causes import to fail if data doesn't match detected types (default false).
	// When false, falls back to TEXT on type conversion errors.
	StrictTypes bool
}

ImportOptions configures the importer behavior. All fields are optional.
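
For example, forcing a semicolon delimiter with no header row and type inference switched off might look like the sketch below (illustrative values; as noted for FuzzyImportOptions above, set boolean fields explicitly rather than relying on zero values):

opts := &importer.ImportOptions{
	CreateTable:         true,        // keep the documented default explicit
	TypeInference:       false,       // import every column as TEXT
	HeaderMode:          "absent",    // first row is data; names become col_1, col_2, ...
	DelimiterCandidates: []rune{';'}, // skip auto-detection, force ';'
}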

type ImportResult

type ImportResult struct {
	RowsInserted int64             // Total rows successfully inserted
	RowsSkipped  int64             // Rows skipped due to errors (if StrictTypes=false)
	Delimiter    rune              // Detected or configured delimiter
	HadHeader    bool              // Whether a header row was detected/configured
	Encoding     string            // Detected encoding: "utf-8", "utf-8-bom", "utf-16le", "utf-16be"
	ColumnNames  []string          // Final column names used
	ColumnTypes  []storage.ColType // Detected column types
	Errors       []string          // Non-fatal errors encountered during import
}

ImportResult describes the outcome of an import operation.
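
A typical post-import check, using only the fields above (a sketch; ctx, db, and f as in the earlier examples):

result, err := importer.ImportCSV(ctx, db, "default", "mytable", f, nil)
if err != nil {
	log.Fatal(err)
}
fmt.Printf("delimiter=%q header=%v encoding=%s\n", result.Delimiter, result.HadHeader, result.Encoding)
fmt.Printf("inserted=%d skipped=%d\n", result.RowsInserted, result.RowsSkipped)
for _, e := range result.Errors {
	log.Printf("import warning: %s", e)
}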

func FuzzyImportCSV added in v0.3.1

func FuzzyImportCSV(
	ctx context.Context,
	db *storage.DB,
	tenant string,
	tableName string,
	src io.Reader,
	opts *FuzzyImportOptions,
) (*ImportResult, error)

FuzzyImportCSV is a more forgiving version of ImportCSV that handles malformed data.
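
A sketch of the forgiving path over deliberately messy in-memory input, with nil options for the documented defaults (how much of this hypothetical input is actually recoverable depends on the parser):

messy := "id;name\n1;Alice\nnot,a,valid,row\n2;Bob\n"
result, err := importer.FuzzyImportCSV(ctx, db, "default", "messy", strings.NewReader(messy), nil)
if err != nil {
	log.Fatal(err)
}
log.Printf("inserted %d rows, skipped %d", result.RowsInserted, result.RowsSkipped)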

func FuzzyImportJSON added in v0.3.1

func FuzzyImportJSON(
	ctx context.Context,
	db *storage.DB,
	tenant string,
	tableName string,
	src io.Reader,
	opts *FuzzyImportOptions,
) (*ImportResult, error)

FuzzyImportJSON attempts to parse and import malformed JSON data.
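
Analogously for JSON, the sketch below feeds input that a strict parser would reject; whether any particular malformation is recoverable is up to the fuzzy parser, so treat this as illustrative:

raw := `{"id": 1, "name": "Alice",}` // trailing comma: invalid strict JSON
result, err := importer.FuzzyImportJSON(ctx, db, "default", "people", strings.NewReader(raw), nil)
if err != nil {
	log.Fatal(err)
}
log.Printf("inserted %d rows", result.RowsInserted)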

func ImportCSV

func ImportCSV(
	ctx context.Context,
	db *storage.DB,
	tenant string,
	tableName string,
	src io.Reader,
	opts *ImportOptions,
) (*ImportResult, error)

ImportCSV imports delimited data (CSV/TSV) from a reader into a tinySQL table.

This function auto-detects the file format, creates the table if needed, and streams data insertion with batching for performance. It handles various encodings, compressed input (gzip), and performs smart type inference.

Parameters:

  • ctx: Context for cancellation
  • db: Target database instance
  • tenant: Tenant/schema name (use "default" for single-tenant mode)
  • tableName: Target table name (can be overridden via opts.TableName)
  • src: Input reader (file, network stream, stdin, etc.)
  • opts: Optional configuration (nil uses sensible defaults)

Returns ImportResult with metadata and any error encountered.

ImportCSV is the entry point for CSV/TSV imports. It orchestrates detection, decoding, sampling, type inference, and insertion while delegating detailed tasks to smaller helpers to keep cyclomatic complexity low.
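
Because src is just an io.Reader, stdin, pipes, and network streams work the same way as files; per the feature list, gzip input is decompressed transparently. A minimal stdin sketch:

result, err := importer.ImportCSV(context.Background(), db, "default", "stdin_data", os.Stdin, nil)
if err != nil {
	log.Fatal(err)
}
fmt.Printf("Imported %d rows\n", result.RowsInserted)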

func ImportFile

func ImportFile(
	ctx context.Context,
	db *storage.DB,
	tenant string,
	tableName string,
	filePath string,
	opts *ImportOptions,
) (*ImportResult, error)

ImportFile detects the file format and imports it into a tinySQL table. Supports: CSV, TSV, JSON, XML (with automatic detection based on extension/content).

Parameters:

  • ctx: Context for cancellation
  • db: Target database instance
  • tenant: Tenant/schema name (use "default" for single-tenant mode)
  • tableName: Target table name (if empty, derived from filename)
  • filePath: Path to the file to import
  • opts: Optional configuration (nil uses sensible defaults)

Returns ImportResult with metadata and any error encountered.
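
A sketch relying on the documented defaults, with the table name derived from the file name (hypothetically "sales" from "sales.json"):

result, err := importer.ImportFile(ctx, db, "default", "", "sales.json", nil)
if err != nil {
	log.Fatal(err)
}
fmt.Printf("Imported %d rows\n", result.RowsInserted)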

func ImportJSON

func ImportJSON(
	ctx context.Context,
	db *storage.DB,
	tenant string,
	tableName string,
	src io.Reader,
	opts *ImportOptions,
) (*ImportResult, error)

ImportJSON imports JSON data from a reader into a tinySQL table. Supports:

  • Array of objects: [{"id": 1, "name": "Alice"}, ...]
  • JSON Lines format: {"id": 1, "name": "Alice"}\n{"id": 2, "name": "Bob"}
  • Single object: {"id": 1, "name": "Alice"} (creates single-row table)
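
A sketch of the JSON Lines case with an in-memory reader:

lines := `{"id": 1, "name": "Alice"}
{"id": 2, "name": "Bob"}`
result, err := importer.ImportJSON(ctx, db, "default", "people", strings.NewReader(lines), nil)
if err != nil {
	log.Fatal(err)
}
fmt.Println(result.RowsInserted) // expect 2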

func ImportXML

func ImportXML(
	ctx context.Context,
	db *storage.DB,
	tenant string,
	tableName string,
	src io.Reader,
	opts *ImportOptions,
) (*ImportResult, error)

ImportXML imports XML data from a reader into a tinySQL table. Supports simple row-based XML like:

<root>
  <record id="1" name="Alice" />
  <record id="2" name="Bob" />
</root>

or:

<root>
  <record><id>1</id><name>Alice</name></record>
  <record><id>2</id><name>Bob</name></record>
</root>
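
Feeding the attribute-based sample through the importer might look like this sketch (defaults via nil options):

xmlData := `<root>
  <record id="1" name="Alice" />
  <record id="2" name="Bob" />
</root>`
result, err := importer.ImportXML(ctx, db, "default", "people", strings.NewReader(xmlData), nil)
if err != nil {
	log.Fatal(err)
}
fmt.Println(result.ColumnNames) // expect columns like [id name]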

type XMLRecord

type XMLRecord struct {
	XMLName xml.Name
	Attrs   []xml.Attr  `xml:",any,attr"`
	Content string      `xml:",chardata"`
	Nodes   []XMLRecord `xml:",any"`
}

XMLRecord represents a generic XML element for table import.
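
Because the tags capture arbitrary attributes, character data, and child elements, any element decodes into this one type; a sketch using encoding/xml directly:

var rec importer.XMLRecord
if err := xml.Unmarshal([]byte(`<record id="1"><name>Alice</name></record>`), &rec); err != nil {
	log.Fatal(err)
}
// rec.Attrs holds id="1"; rec.Nodes has one child whose Content is "Alice".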
