importer

package v0.5.3

Note: This package is not in the latest version of its module.
Published: Dec 19, 2025 License: AGPL-3.0 Imports: 22 Imported by: 0

Documentation

Overview

Package importer provides automatic file import with type detection for tinySQL.

This package supports importing structured data from various formats (CSV, TSV, JSON, XML) directly into tinySQL tables. It auto-detects delimiters, headers, encodings, and column types to minimize manual configuration.

Features:

  • Auto-detect delimiter: ',', ';', '\t', '|' (configurable)
  • Auto-detect header row (configurable override)
  • Encoding: UTF-8, UTF-8 BOM, UTF-16LE/BE (BOM-based)
  • Transparent GZIP input
  • Smart type inference (INT, FLOAT, BOOL, TEXT, TIME, JSON)
  • Streaming import with batched INSERTs
  • Optional CREATE TABLE and TRUNCATE

Example:

import "github.com/SimonWaldherr/tinySQL/internal/importer"

f, _ := os.Open("data.csv")
defer f.Close()
result, err := importer.ImportCSV(ctx, db, tenant, "mytable", f, nil)
if err != nil {
	log.Fatal(err)
}
fmt.Printf("Imported %d rows with %d columns\n", result.RowsInserted, len(result.ColumnNames))

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func OpenFile

func OpenFile(ctx context.Context, filePath string, opts *ImportOptions) (*storage.DB, string, error)

OpenFile opens a data file (CSV, TSV, JSON, XML) and returns a DB with the data loaded. The table name is derived from the filename unless specified in options.

This is a convenience function for quick data exploration:

db, table, _ := importer.OpenFile(context.Background(), "data.csv", nil)
// Now you can query the returned table name (table) with tinySQL

Types

type FuzzyImportOptions added in v0.3.1

type FuzzyImportOptions struct {
	*ImportOptions

	// SkipInvalidRows skips rows that don't match the expected column count (default true)
	SkipInvalidRows bool

	// TrimWhitespace aggressively trims whitespace from all values (default true)
	TrimWhitespace bool

	// FixQuotes attempts to fix unmatched quotes in CSV data (default true)
	FixQuotes bool

	// CoerceTypes attempts to coerce invalid values to the expected type (default true)
	CoerceTypes bool

	// MaxSkippedRows maximum number of rows to skip before giving up (default 100)
	MaxSkippedRows int

	// AllowMixedTypes allows columns to have mixed types (converts to TEXT) (default true)
	AllowMixedTypes bool

	// FuzzyJSON attempts to parse malformed JSON (default true)
	FuzzyJSON bool

	// RemoveInvalidChars removes invalid UTF-8 characters (default true)
	RemoveInvalidChars bool

	// AutoFixDelimiters tries to detect and fix inconsistent delimiters (default true)
	AutoFixDelimiters bool
}

FuzzyImportOptions extends ImportOptions with fuzzy parsing capabilities.
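
As a minimal sketch, a caller might override just a few of these fields. Note that a hand-built Go struct starts from zero values, so whether unset boolean fields still receive their documented true defaults is an implementation detail; the safest reading is to set explicitly everything you rely on. The values below are illustrative:

opts := &importer.FuzzyImportOptions{
	ImportOptions:   &importer.ImportOptions{BatchSize: 500},
	SkipInvalidRows: true,
	TrimWhitespace:  true,
	CoerceTypes:     true,
	MaxSkippedRows:  500, // tolerate more bad rows than the documented default of 100
}

Passing nil instead presumably selects all of the documented defaults, as with ImportOptions.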

type ImportOptions

type ImportOptions struct {
	// BatchSize controls how many rows are buffered before executing INSERT (default 1000).
	BatchSize int

	// NullLiterals are treated as SQL NULL (case-insensitive, trimmed).
	// Defaults: "", "null", "na", "n/a", "none", "#n/a"
	NullLiterals []string

	// CreateTable creates the table if it doesn't exist using inferred column types (default true).
	// Disable this if you want to create the table manually with specific types/constraints.
	CreateTable bool

	// Truncate clears the table before import (default false).
	Truncate bool

	// HeaderMode controls header detection:
	//   "auto" (default)  → heuristic decides based on data analysis
	//   "present"         → first row is always treated as header
	//   "absent"          → first row is data, synthetic column names generated (col_1, col_2, ...)
	HeaderMode string

	// DelimiterCandidates tested during auto-detection. Default: , ; \t |
	// Override to force specific delimiter(s).
	DelimiterCandidates []rune

	// TableName overrides the target table name (useful when importing from stdin/pipes).
	TableName string

	// SampleBytes caps the amount of data used for detection (default 128KB).
	SampleBytes int

	// SampleRecords caps the number of records analyzed for detection (default 500).
	SampleRecords int

	// TypeInference controls automatic type detection (default true).
	// When enabled, analyzes sample data to determine INT, FLOAT, BOOL, TIME, or TEXT.
	// When disabled, all columns default to TEXT type.
	TypeInference bool

	// DateTimeFormats lists custom datetime formats to try during type detection.
	// Defaults include RFC3339, ISO8601, common US/EU formats.
	DateTimeFormats []string

	// StrictTypes when true causes import to fail if data doesn't match detected types (default false).
	// When false, falls back to TEXT on type conversion errors.
	StrictTypes bool
}

ImportOptions configures the importer behavior. All fields are optional.
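
For example, forcing a semicolon delimiter with no header row and type inference switched off might look like the sketch below (illustrative values; as noted for FuzzyImportOptions above, set boolean fields explicitly rather than relying on zero values):

opts := &importer.ImportOptions{
	CreateTable:         true,        // keep the documented default explicit
	TypeInference:       false,       // import every column as TEXT
	HeaderMode:          "absent",    // first row is data; names become col_1, col_2, ...
	DelimiterCandidates: []rune{';'}, // skip auto-detection, force ';'
}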

type ImportResult

type ImportResult struct {
	RowsInserted int64             // Total rows successfully inserted
	RowsSkipped  int64             // Rows skipped due to errors (if StrictTypes=false)
	Delimiter    rune              // Detected or configured delimiter
	HadHeader    bool              // Whether a header row was detected/configured
	Encoding     string            // Detected encoding: "utf-8", "utf-8-bom", "utf-16le", "utf-16be"
	ColumnNames  []string          // Final column names used
	ColumnTypes  []storage.ColType // Detected column types
	Errors       []string          // Non-fatal errors encountered during import
}

ImportResult describes the outcome of an import operation.
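
A typical post-import check, using only the fields above (a sketch; ctx, db, and f as in the earlier examples):

result, err := importer.ImportCSV(ctx, db, "default", "mytable", f, nil)
if err != nil {
	log.Fatal(err)
}
fmt.Printf("delimiter=%q header=%v encoding=%s\n", result.Delimiter, result.HadHeader, result.Encoding)
fmt.Printf("inserted=%d skipped=%d\n", result.RowsInserted, result.RowsSkipped)
for _, e := range result.Errors {
	log.Printf("import warning: %s", e)
}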

func FuzzyImportCSV added in v0.3.1

func FuzzyImportCSV(
	ctx context.Context,
	db *storage.DB,
	tenant string,
	tableName string,
	src io.Reader,
	opts *FuzzyImportOptions,
) (*ImportResult, error)

FuzzyImportCSV is a more forgiving version of ImportCSV that handles malformed data.
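
A sketch of the forgiving path over deliberately messy in-memory input, with nil options for the documented defaults (how much of this hypothetical input is actually recoverable depends on the parser):

messy := "id;name\n1;Alice\nnot,a,valid,row\n2;Bob\n"
result, err := importer.FuzzyImportCSV(ctx, db, "default", "messy", strings.NewReader(messy), nil)
if err != nil {
	log.Fatal(err)
}
log.Printf("inserted %d rows, skipped %d", result.RowsInserted, result.RowsSkipped)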

func FuzzyImportJSON added in v0.3.1

func FuzzyImportJSON(
	ctx context.Context,
	db *storage.DB,
	tenant string,
	tableName string,
	src io.Reader,
	opts *FuzzyImportOptions,
) (*ImportResult, error)

FuzzyImportJSON attempts to parse and import malformed JSON data.
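
Analogously for JSON, the sketch below feeds input that a strict parser would reject; whether any particular malformation is recoverable is up to the fuzzy parser, so treat this as illustrative:

raw := `{"id": 1, "name": "Alice",}` // trailing comma: invalid strict JSON
result, err := importer.FuzzyImportJSON(ctx, db, "default", "people", strings.NewReader(raw), nil)
if err != nil {
	log.Fatal(err)
}
log.Printf("inserted %d rows", result.RowsInserted)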

func ImportCSV

func ImportCSV(
	ctx context.Context,
	db *storage.DB,
	tenant string,
	tableName string,
	src io.Reader,
	opts *ImportOptions,
) (*ImportResult, error)

ImportCSV imports delimited data (CSV/TSV) from a reader into a tinySQL table.

This function auto-detects the file format, creates the table if needed, and streams data insertion with batching for performance. It handles various encodings, compressed input (gzip), and performs smart type inference.

Parameters:

  • ctx: Context for cancellation
  • db: Target database instance
  • tenant: Tenant/schema name (use "default" for single-tenant mode)
  • tableName: Target table name (can be overridden via opts.TableName)
  • src: Input reader (file, network stream, stdin, etc.)
  • opts: Optional configuration (nil uses sensible defaults)

Returns ImportResult with metadata and any error encountered.

ImportCSV is the entry point for CSV/TSV imports. It orchestrates detection, decoding, sampling, type inference, and insertion while delegating detailed tasks to smaller helpers to keep cyclomatic complexity low.
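
Because src is just an io.Reader, stdin, pipes, and network streams work the same way as files; per the feature list, gzip input is decompressed transparently. A minimal stdin sketch:

result, err := importer.ImportCSV(context.Background(), db, "default", "stdin_data", os.Stdin, nil)
if err != nil {
	log.Fatal(err)
}
fmt.Printf("Imported %d rows\n", result.RowsInserted)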

func ImportFile

func ImportFile(
	ctx context.Context,
	db *storage.DB,
	tenant string,
	tableName string,
	filePath string,
	opts *ImportOptions,
) (*ImportResult, error)

ImportFile detects the file format and imports it into a tinySQL table. Supports: CSV, TSV, JSON, XML (with automatic detection based on extension/content).

Parameters:

  • ctx: Context for cancellation
  • db: Target database instance
  • tenant: Tenant/schema name (use "default" for single-tenant mode)
  • tableName: Target table name (if empty, derived from filename)
  • filePath: Path to the file to import
  • opts: Optional configuration (nil uses sensible defaults)

Returns ImportResult with metadata and any error encountered.
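
A sketch relying on the documented defaults, with the table name derived from the file name (hypothetically "sales" from "sales.json"):

result, err := importer.ImportFile(ctx, db, "default", "", "sales.json", nil)
if err != nil {
	log.Fatal(err)
}
fmt.Printf("Imported %d rows\n", result.RowsInserted)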

func ImportJSON

func ImportJSON(
	ctx context.Context,
	db *storage.DB,
	tenant string,
	tableName string,
	src io.Reader,
	opts *ImportOptions,
) (*ImportResult, error)

ImportJSON imports JSON data from a reader into a tinySQL table. Supports:

  • Array of objects: [{"id": 1, "name": "Alice"}, ...]
  • JSON Lines format: {"id": 1, "name": "Alice"}\n{"id": 2, "name": "Bob"}
  • Single object: {"id": 1, "name": "Alice"} (creates single-row table)
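
A sketch of the JSON Lines case with an in-memory reader:

lines := `{"id": 1, "name": "Alice"}
{"id": 2, "name": "Bob"}`
result, err := importer.ImportJSON(ctx, db, "default", "people", strings.NewReader(lines), nil)
if err != nil {
	log.Fatal(err)
}
fmt.Println(result.RowsInserted) // expect 2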

func ImportXML

func ImportXML(
	ctx context.Context,
	db *storage.DB,
	tenant string,
	tableName string,
	src io.Reader,
	opts *ImportOptions,
) (*ImportResult, error)

ImportXML imports XML data from a reader into a tinySQL table. Supports simple row-based XML like:

<root>
  <record id="1" name="Alice" />
  <record id="2" name="Bob" />
</root>

or:

<root>
  <record><id>1</id><name>Alice</name></record>
  <record><id>2</id><name>Bob</name></record>
</root>
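
Feeding the attribute-based sample through the importer might look like this sketch (defaults via nil options):

xmlData := `<root>
  <record id="1" name="Alice" />
  <record id="2" name="Bob" />
</root>`
result, err := importer.ImportXML(ctx, db, "default", "people", strings.NewReader(xmlData), nil)
if err != nil {
	log.Fatal(err)
}
fmt.Println(result.ColumnNames) // expect columns like [id name]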

type XMLRecord

type XMLRecord struct {
	XMLName xml.Name
	Attrs   []xml.Attr  `xml:",any,attr"`
	Content string      `xml:",chardata"`
	Nodes   []XMLRecord `xml:",any"`
}

XMLRecord represents a generic XML element for table import.
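
Because the tags capture arbitrary attributes, character data, and child elements, any element decodes into this one type; a sketch using encoding/xml directly:

var rec importer.XMLRecord
if err := xml.Unmarshal([]byte(`<record id="1"><name>Alice</name></record>`), &rec); err != nil {
	log.Fatal(err)
}
// rec.Attrs holds id="1"; rec.Nodes has one child whose Content is "Alice".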
