filesql

README

filesql


日本語 | Русский | 中文 | 한국어 | Español | Français


filesql is a Go SQL driver that enables you to query CSV, TSV, LTSV, Parquet, and Excel (XLSX) files using SQLite3 SQL syntax. Query your data files directly without any imports or transformations!

🎯 Why filesql?

This library was born from the experience of maintaining two separate CLI tools - sqly and sqluv. Both tools shared a common feature: executing SQL queries against CSV, TSV, and other file formats.

Rather than maintaining duplicate code across both projects, we extracted the core functionality into this reusable SQL driver. Now, any Go developer can leverage this capability in their own applications!

✨ Features

  • 🔍 SQLite3 SQL Interface - Use SQLite3's powerful SQL dialect to query your files
  • 📁 Multiple File Formats - Support for CSV, TSV, LTSV, Parquet, and Excel (XLSX) files
  • 🗜️ Compression Support - Automatically handles .gz, .bz2, .xz, and .zst compressed files
  • 🌊 Stream Processing - Efficiently handles large files through streaming with configurable chunk sizes
  • 📖 Flexible Input Sources - Support for file paths, directories, io.Reader, and embed.FS
  • 🚀 Zero Setup - No database server required, everything runs in-memory
  • 💾 Auto-Save - Automatically persist changes back to files
  • 🌍 Cross-Platform - Works seamlessly on Linux, macOS, and Windows
  • ⚡ SQLite3 Powered - Built on the robust SQLite3 engine for reliable SQL processing

📋 Supported File Formats

Extension | Format | Description
----------|--------|------------
.csv | CSV | Comma-separated values
.tsv | TSV | Tab-separated values
.ltsv | LTSV | Labeled Tab-separated Values
.parquet | Parquet | Apache Parquet columnar format
.xlsx | Excel XLSX | Microsoft Excel workbook format
.csv.gz, .tsv.gz, .ltsv.gz, .parquet.gz, .xlsx.gz | Gzip compressed | Gzip-compressed files
.csv.bz2, .tsv.bz2, .ltsv.bz2, .parquet.bz2, .xlsx.bz2 | Bzip2 compressed | Bzip2-compressed files
.csv.xz, .tsv.xz, .ltsv.xz, .parquet.xz, .xlsx.xz | XZ compressed | XZ-compressed files
.csv.zst, .tsv.zst, .ltsv.zst, .parquet.zst, .xlsx.zst | Zstandard compressed | Zstandard-compressed files

📦 Installation

go get github.com/nao1215/filesql

🔧 Requirements

  • Go Version: 1.24 or later
  • Operating Systems:
    • Linux
    • macOS
    • Windows

🚀 Quick Start

Simple Usage

The recommended way to get started is with OpenContext for proper timeout handling:

package main

import (
    "context"
    "fmt"
    "log"
    "time"
    
    "github.com/nao1215/filesql"
)

func main() {
    // Create context with timeout for large file operations
    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()
    
    // Open a CSV file as a database
    db, err := filesql.OpenContext(ctx, "data.csv")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()
    
    // Query the data (table name = filename without extension)
    rows, err := db.QueryContext(ctx, "SELECT * FROM data WHERE age > 25")
    if err != nil {
        log.Fatal(err)
    }
    defer rows.Close()
    
    // Process results
    for rows.Next() {
        var name string
        var age int
        if err := rows.Scan(&name, &age); err != nil {
            log.Fatal(err)
        }
        fmt.Printf("Name: %s, Age: %d\n", name, age)
    }
}
Multiple Files and Formats
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

// Open multiple files at once (including Parquet)
db, err := filesql.OpenContext(ctx, "users.csv", "orders.tsv", "logs.ltsv.gz", "analytics.parquet")
if err != nil {
    log.Fatal(err)
}
defer db.Close()

// Join data across different file formats
rows, err := db.QueryContext(ctx, `
    SELECT u.name, o.order_date, l.event, a.metrics
    FROM users u
    JOIN orders o ON u.id = o.user_id
    JOIN logs l ON u.id = l.user_id
    JOIN analytics a ON u.id = a.user_id
    WHERE o.order_date > '2024-01-01'
`)
Working with Directories
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

// Load all supported files from a directory (recursive)
db, err := filesql.OpenContext(ctx, "/path/to/data/directory")
if err != nil {
    log.Fatal(err)
}
defer db.Close()

// See what tables are available
rows, err := db.QueryContext(ctx, "SELECT name FROM sqlite_master WHERE type='table'")

🔧 Advanced Usage

Builder Pattern

For advanced scenarios, use the builder pattern:

package main

import (
    "context"
    "embed"
    "log"
    
    "github.com/nao1215/filesql"
)

//go:embed data/*.csv
var embeddedFiles embed.FS

func main() {
    ctx := context.Background()
    
    // Configure data sources with builder
    validatedBuilder, err := filesql.NewBuilder().
        AddPath("local_file.csv").      // Local file
        AddFS(embeddedFiles).           // Embedded files
        SetDefaultChunkSize(5000).      // 5000 rows per chunk
        Build(ctx)
    if err != nil {
        log.Fatal(err)
    }
    
    db, err := validatedBuilder.Open(ctx)
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()
    
    // Query across all data sources
    rows, err := db.Query("SELECT name FROM sqlite_master WHERE type='table'")
    if err != nil {
        log.Fatal(err)
    }
    defer rows.Close()
}
Auto-Save Features
Auto-Save on Database Close
// Auto-save changes when database is closed
validatedBuilder, err := filesql.NewBuilder().
    AddPath("data.csv").
    EnableAutoSave("./backup"). // Save to backup directory
    Build(ctx)
if err != nil {
    log.Fatal(err)
}

db, err := validatedBuilder.Open(ctx)
if err != nil {
    log.Fatal(err)
}
defer db.Close() // Changes are automatically saved here

// Make changes
db.Exec("UPDATE data SET status = 'processed' WHERE id = 1")
db.Exec("INSERT INTO data (name, age) VALUES ('John', 30)")
Auto-Save on Transaction Commit
// Auto-save after each transaction
validatedBuilder, err := filesql.NewBuilder().
    AddPath("data.csv").
    EnableAutoSaveOnCommit(""). // Empty = overwrite original files
    Build(ctx)
if err != nil {
    log.Fatal(err)
}

db, err := validatedBuilder.Open(ctx)
if err != nil {
    log.Fatal(err)
}
defer db.Close()

// Changes are saved after each commit
tx, _ := db.Begin()
tx.Exec("UPDATE data SET status = 'processed' WHERE id = 1")
tx.Commit() // Auto-save happens here
Working with io.Reader and Network Data
import (
    "net/http"
    "github.com/nao1215/filesql"
)

// Load data from HTTP response
resp, err := http.Get("https://example.com/data.csv")
if err != nil {
    log.Fatal(err)
}
defer resp.Body.Close()

validatedBuilder, err := filesql.NewBuilder().
    AddReader(resp.Body, "remote_data", filesql.FileTypeCSV).
    Build(ctx)
if err != nil {
    log.Fatal(err)
}

db, err := validatedBuilder.Open(ctx)
if err != nil {
    log.Fatal(err)
}
defer db.Close()

// Query remote data
rows, err := db.QueryContext(ctx, "SELECT * FROM remote_data LIMIT 10")
Manual Data Export

If you prefer manual control over saving:

ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

db, err := filesql.OpenContext(ctx, "data.csv")
if err != nil {
    log.Fatal(err)
}
defer db.Close()

// Make modifications
db.Exec("UPDATE data SET status = 'processed'")

// Manually export changes
err = filesql.DumpDatabase(db, "./output")
if err != nil {
    log.Fatal(err)
}

// Or with custom format and compression
options := filesql.NewDumpOptions().
    WithFormat(filesql.OutputFormatTSV).
    WithCompression(filesql.CompressionGZ)
err = filesql.DumpDatabase(db, "./output", options)

// Export to Parquet format
parquetOptions := filesql.NewDumpOptions().
    WithFormat(filesql.OutputFormatParquet)
err = filesql.DumpDatabase(db, "./parquet_output", parquetOptions)
// Note: Parquet export uses Parquet's built-in compression; external
// compression options are not supported

πŸ“ Table Naming Rules

filesql automatically derives table names from file paths:

  • users.csv β†’ table users
  • data.tsv.gz β†’ table data
  • /path/to/sales.csv β†’ table sales
  • products.ltsv.bz2 β†’ table products
  • analytics.parquet β†’ table analytics

⚠️ Important Notes

SQL Syntax

Since filesql uses SQLite3 as its underlying engine, all SQL syntax follows SQLite3's SQL dialect. This includes:

  • Functions (e.g., date(), substr(), json_extract())
  • Window functions
  • Common Table Expressions (CTEs)
  • Triggers and views
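
For example, SQLite's JSON1 functions can query JSON stored in a text column. A minimal sketch, assuming a hypothetical events.csv whose payload column contains JSON strings:

rows, err := db.QueryContext(ctx, `
    SELECT
        json_extract(payload, '$.user_id') AS user_id,
        COUNT(*) AS event_count
    FROM events
    GROUP BY user_id
`)
if err != nil {
    log.Fatal(err)
}
defer rows.Close()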
Data Modifications
  • INSERT, UPDATE, and DELETE operations affect the in-memory database
  • Original files remain unchanged by default
  • Use auto-save features or DumpDatabase() to persist changes
  • This makes it safe to experiment with data transformations
Performance Tips
  • Use OpenContext() with timeouts for large files
  • Configure chunk sizes (rows per chunk) with SetDefaultChunkSize() for memory optimization
  • Single SQLite connection works best for most scenarios
  • Use streaming for files larger than available memory
Concurrency Limitations

⚠️ IMPORTANT: This library is NOT thread-safe and has concurrency limitations:

  • Do NOT share database connections across goroutines
  • Do NOT perform concurrent operations on the same database instance
  • Do NOT call db.Close() while queries are active in other goroutines
  • Use separate database instances for concurrent operations if needed
  • Race conditions may cause segmentation faults or data corruption

Recommended pattern for concurrent access:

// ✅ GOOD: Separate database instances per goroutine
func processFileConcurrently(filename string) error {
    db, err := filesql.Open(filename)  // Each goroutine gets its own instance
    if err != nil {
        return err
    }
    defer db.Close()
    
    // Safe to use within this goroutine
    return processData(db)
}

// ❌ BAD: Sharing database instance across goroutines
var sharedDB *sql.DB  // This will cause race conditions
Parquet Support
  • Reading: Full support for Apache Parquet files with complex data types
  • Writing: Export functionality is implemented (external compression not supported, use Parquet's built-in compression)
  • Type Mapping: Parquet types are mapped to SQLite types (see PARQUET_TYPE_MAPPING.md)
  • Compression: Parquet's built-in compression is used instead of external compression
  • Large Data: Parquet files are efficiently processed with Arrow's columnar format
Excel (XLSX) Support
  • 1-Sheet-1-Table Structure: Each sheet in an Excel workbook becomes a separate SQL table
  • Table Naming: SQL table names follow the format {filename}_{sheetname} (e.g., "sales_Q1", "sales_Q2")
  • Header Row Processing: First row of each sheet becomes the column headers for that table
  • Standard SQL Operations: Query each sheet independently or use JOINs to combine data across sheets
  • Memory Requirements: XLSX files are fully loaded into memory due to the ZIP-based format, even during streaming operations; all sheets are processed, and the CSV/TSV streaming parsers do not apply
  • Export Functionality: When exporting to XLSX format, table names become sheet names automatically
  • Compression Support: Full support for compressed XLSX files (.xlsx.gz, .xlsx.bz2, .xlsx.xz, .xlsx.zst)
Excel File Structure Example
Excel File with Multiple Sheets:

┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│ Sheet1      │    │ Sheet2      │    │ Sheet3      │
│ Name   Age  │    │ Product     │    │ Region      │
│ Alice   25  │    │ Laptop      │    │ North       │
│ Bob     30  │    │ Mouse       │    │ South       │
└─────────────┘    └─────────────┘    └─────────────┘

Results in 3 separate SQL tables:

sales_Sheet1:           sales_Sheet2:           sales_Sheet3:
┌──────┬─────┐          ┌─────────┐             ┌────────┐
│ Name │ Age │          │ Product │             │ Region │
├──────┼─────┤          ├─────────┤             ├────────┤
│ Alice│  25 │          │ Laptop  │             │ North  │
│ Bob  │  30 │          │ Mouse   │             │ South  │
└──────┴─────┘          └─────────┘             └────────┘

SQL Examples:
SELECT * FROM sales_Sheet1 WHERE Age > 27;
SELECT s1.Name, s2.Product FROM sales_Sheet1 s1 
  JOIN sales_Sheet2 s2 ON s1.rowid = s2.rowid;
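
Since every sheet is just a table, Excel data can be opened and queried like any other supported file. A minimal Go sketch, assuming a hypothetical sales.xlsx whose Sheet1 has Name and Age columns:

db, err := filesql.Open("sales.xlsx")
if err != nil {
    log.Fatal(err)
}
defer db.Close()

// Sheets are exposed as {filename}_{sheetname} tables
rows, err := db.Query("SELECT Name, Age FROM sales_Sheet1 WHERE Age > 27")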

🎨 Advanced Examples

Complex SQL Queries
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

db, err := filesql.OpenContext(ctx, "employees.csv", "departments.csv")
if err != nil {
    log.Fatal(err)
}
defer db.Close()

// Use advanced SQLite features
query := `
    WITH dept_stats AS (
        SELECT 
            department_id,
            AVG(salary) as avg_salary,
            COUNT(*) as emp_count
        FROM employees
        GROUP BY department_id
    )
    SELECT 
        e.name,
        e.salary,
        d.name as department,
        ds.avg_salary as dept_avg,
        RANK() OVER (PARTITION BY e.department_id ORDER BY e.salary DESC) as salary_rank
    FROM employees e
    JOIN departments d ON e.department_id = d.id
    JOIN dept_stats ds ON e.department_id = ds.department_id
    WHERE e.salary > ds.avg_salary * 0.8
    ORDER BY d.name, salary_rank
`

rows, err := db.QueryContext(ctx, query)
Context and Cancellation
import (
    "context"
    "time"
)

// Set timeout for large file operations
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
defer cancel()

db, err := filesql.OpenContext(ctx, "huge_dataset.csv.gz")
if err != nil {
    log.Fatal(err)
}
defer db.Close()

// Query with context for cancellation support
rows, err := db.QueryContext(ctx, "SELECT * FROM huge_dataset WHERE status = 'active'")

🤝 Contributing

Contributions are welcome! Please see the Contributing Guide for more details.

💖 Support

If you find this project useful, please consider:

  • ⭐ Giving it a star on GitHub - it helps others discover the project
  • 💝 Becoming a sponsor - your support keeps the project alive and motivates continued development

Your support, whether through stars, sponsorships, or contributions, is what drives this project forward. Thank you!

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Documentation

Overview

Package filesql provides a file-based SQL driver implementation that enables querying CSV, TSV, LTSV, Parquet, and Excel (XLSX) files using SQLite3 SQL syntax.

filesql allows you to treat structured text files as SQL databases without any data import or transformation steps. It uses SQLite3 as an in-memory database engine, providing full SQL capabilities including JOINs, aggregations, window functions, and CTEs.

Features

  • Query CSV, TSV, LTSV, Parquet, and Excel (XLSX) files using standard SQL
  • Automatic handling of compressed files (gzip, bzip2, xz, zstandard)
  • Support for multiple input sources (files, directories, io.Reader, embed.FS)
  • Efficient streaming for large files with configurable chunk sizes
  • Cross-platform compatibility (Linux, macOS, Windows)
  • Optional auto-save functionality to persist changes

Basic Usage

The simplest way to use filesql is with the Open or OpenContext functions:

db, err := filesql.Open("data.csv")
if err != nil {
    log.Fatal(err)
}
defer db.Close()

rows, err := db.Query("SELECT * FROM data WHERE age > 25")
if err != nil {
    log.Fatal(err)
}
defer rows.Close()

Advanced Usage

For more complex scenarios, use the Builder pattern:

builder := filesql.NewBuilder().
    AddPath("users.csv").
    AddPath("orders.tsv").
    EnableAutoSave("./output")

validatedBuilder, err := builder.Build(ctx)
if err != nil {
    log.Fatal(err)
}

db, err := validatedBuilder.Open(ctx)
if err != nil {
    log.Fatal(err)
}
defer db.Close()

Table Naming

Table names are automatically derived from file paths:

  • "users.csv" becomes table "users"
  • "data.tsv.gz" becomes table "data"
  • "/path/to/logs.ltsv" becomes table "logs"
  • "sales.xlsx" with multiple sheets becomes tables "sales_Sheet1", "sales_Sheet2", etc.

Data Modifications

INSERT, UPDATE, and DELETE operations affect only the in-memory database. Original files remain unchanged unless auto-save is enabled. To persist changes manually, use the DumpDatabase function.

SQL Syntax

Since filesql uses SQLite3 as its underlying engine, all SQL syntax follows SQLite3's SQL dialect. This includes support for:

  • Common Table Expressions (CTEs)
  • Window functions
  • JSON functions
  • Date and time functions
  • And all other SQLite3 features

Column Name Handling

For backward compatibility, duplicate detection compares column names case-sensitively. Headers that are identical after trimming surrounding whitespace are considered duplicates and result in an error; for example, a header row containing both "id" and " id " is rejected.

For complete SQL syntax documentation, see: https://www.sqlite.org/lang.html


Constants

const (
	// DefaultRowsPerChunk is the default number of rows per chunk
	DefaultRowsPerChunk = 1000
	// DefaultChunkSize is the default chunk size (rows); alias for clarity
	DefaultChunkSize = DefaultRowsPerChunk
	// MinChunkSize is the minimum allowed rows per chunk
	MinChunkSize = 1
	// ValidationPeekSize is the size used for validation peek operations
	ValidationPeekSize = 1
)

Processing constants (rows-based)
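
These constants govern streaming: rows are loaded into SQLite in chunks of DefaultChunkSize unless overridden. A minimal sketch of tuning the chunk size via the builder, assuming a hypothetical large.csv:

validatedBuilder, err := filesql.NewBuilder().
    AddPath("large.csv").
    SetDefaultChunkSize(10 * filesql.DefaultChunkSize). // 10,000 rows per chunk
    Build(ctx)
if err != nil {
    log.Fatal(err)
}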

const (
	// MaxSampleSize limits how many values to sample for type inference
	MaxSampleSize = 1000
	// MinConfidenceThreshold is the minimum percentage of values that must match a type
	MinConfidenceThreshold = 0.8
	// EarlyTerminationThreshold is the percentage of text values that triggers early termination
	EarlyTerminationThreshold = 0.5
	// MinDatetimeLength is the minimum reasonable length for datetime values
	MinDatetimeLength = 4
	// MaxDatetimeLength is the maximum reasonable length for datetime values
	MaxDatetimeLength = 35
	// SamplingStratificationFactor determines when to use stratified vs simple sampling
	SamplingStratificationFactor = 3
	// MinRealThreshold is the minimum percentage of real values needed to classify as REAL
	MinRealThreshold = 0.1
)

Type inference constants
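
These constants tune column type inference: up to MaxSampleSize values are sampled per column, and a candidate type is chosen only when at least MinConfidenceThreshold of the samples parse as that type. As an illustrative sketch only (not the library's actual implementation), the threshold check is roughly:

// Illustrative only: approximates how MinConfidenceThreshold could be applied.
func looksNumeric(samples []string) bool {
	numeric := 0
	for _, s := range samples {
		if _, err := strconv.ParseFloat(s, 64); err == nil {
			numeric++
		}
	}
	return float64(numeric) >= filesql.MinConfidenceThreshold*float64(len(samples))
}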

Variables

var (

	// ErrEmptyData indicates that the data source contains no records
	ErrEmptyData = errors.New("filesql: empty data source")

	// ErrUnsupportedFormat indicates an unsupported file format
	ErrUnsupportedFormat = errors.New("filesql: unsupported file format")

	// ErrInvalidData indicates malformed or invalid data
	ErrInvalidData = errors.New("filesql: invalid data format")

	// ErrNoTables indicates no tables found in database
	ErrNoTables = errors.New("filesql: no tables found in database")

	// ErrFileNotFound indicates file not found
	ErrFileNotFound = errors.New("filesql: file not found")

	// ErrPermissionDenied indicates permission denied
	ErrPermissionDenied = errors.New("filesql: permission denied")

	// ErrMemoryLimit indicates memory limit exceeded
	ErrMemoryLimit = errors.New("filesql: memory limit exceeded")

	// ErrContextCancelled indicates context was cancelled
	ErrContextCancelled = errors.New("filesql: context cancelled")
)

Standard error messages and error creation functions for consistency
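
These sentinels can be matched with errors.Is, assuming the library wraps them into the errors it returns. A minimal sketch:

db, err := filesql.Open("missing.csv")
if err != nil {
	if errors.Is(err, filesql.ErrFileNotFound) {
		log.Println("input file does not exist")
		return
	}
	log.Fatal(err)
}
defer db.Close()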

Functions

func DumpDatabase

func DumpDatabase(db *sql.DB, outputDir string, opts ...DumpOptions) error

DumpDatabase saves all database tables to files in the specified directory.

Basic usage:

err := filesql.DumpDatabase(db, "./output")

This will save all tables as CSV files in the output directory.

Advanced usage with options:

// Default: Export as CSV files
err := DumpDatabase(db, "./output")

// Export as TSV files with gzip compression
options := NewDumpOptions().
	WithFormat(OutputFormatTSV).
	WithCompression(CompressionGZ)
err := DumpDatabase(db, "./output", options)
Example

ExampleDumpDatabase demonstrates exporting modified data

package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"path/filepath"
	"strings"
	"time"

	"github.com/nao1215/filesql"
)

// createTempTestData creates temporary CSV files for the example
func createTempTestData() string {
	tmpDir, err := os.MkdirTemp("", "filesql_example")
	if err != nil {
		log.Fatal(err)
	}

	employeesData := `id,name,department_id,salary,hire_date
1,Alice Johnson,1,95000,2020-01-15
2,Bob Smith,1,85000,2019-03-22
3,Charlie Brown,1,80000,2021-06-10
4,David Wilson,1,75000,2022-02-28
5,Eve Davis,2,70000,2020-09-15
6,Frank Miller,2,65000,2021-11-30
7,Grace Lee,3,60000,2019-12-05
8,Henry Taylor,3,55000,2022-04-18`

	err = os.WriteFile(filepath.Join(tmpDir, "employees.csv"), []byte(employeesData), 0600)
	if err != nil {
		log.Fatal(err)
	}

	departmentsData := `id,name,budget,manager_id
1,Engineering,1000000,1
2,Marketing,800000,5
3,Sales,600000,7
4,HR,400000,9`

	err = os.WriteFile(filepath.Join(tmpDir, "departments.csv"), []byte(departmentsData), 0600)
	if err != nil {
		log.Fatal(err)
	}

	return tmpDir
}

func main() {
	tmpDir := createTempTestData()
	defer os.RemoveAll(tmpDir)

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	db, err := filesql.OpenContext(ctx, filepath.Join(tmpDir, "employees.csv"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Modify data in memory
	_, err = db.ExecContext(ctx, `
		UPDATE employees 
		SET salary = salary * 1.10 
		WHERE department_id = 1
	`)
	if err != nil {
		log.Fatal(err)
	}

	// Add a new employee
	_, err = db.ExecContext(ctx, `
		INSERT INTO employees (id, name, department_id, salary, hire_date) 
		VALUES (99, 'New Employee', 2, 60000, '2024-01-01')
	`)
	if err != nil {
		log.Fatal(err)
	}

	// Create output directory
	outputDir := filepath.Join(tmpDir, "output")
	err = os.MkdirAll(outputDir, 0750)
	if err != nil {
		log.Fatal(err)
	}

	// Export modified data
	err = filesql.DumpDatabase(db, outputDir)
	if err != nil {
		log.Fatal(err)
	}

	// Verify export by reading the exported file
	exportedFile := filepath.Join(outputDir, "employees.csv")
	if _, err := os.Stat(exportedFile); err != nil {
		log.Fatal("Exported file not found:", err)
	}

	// Count records in exported file
	db2, err := filesql.OpenContext(ctx, exportedFile)
	if err != nil {
		log.Fatal(err)
	}
	defer db2.Close()

	var count int
	err = db2.QueryRowContext(ctx, "SELECT COUNT(*) FROM employees").Scan(&count)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("Original file: 8 employees\n")
	fmt.Printf("Modified and exported: %d employees\n", count)

	// Extract just the filename for consistent output (normalize path separators for cross-platform compatibility)
	exportPath := strings.Replace(exportedFile, tmpDir, "/tmp/filesql_example*", 1)
	exportPath = strings.ReplaceAll(exportPath, "\\", "/") // Convert Windows backslashes to forward slashes
	fmt.Printf("Export location: %s\n", exportPath)

}
Output:

Original file: 8 employees
Modified and exported: 9 employees
Export location: /tmp/filesql_example*/output/employees.csv
Example (DataProcessing)
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"

	"github.com/nao1215/filesql"
)

func main() {
	tempDir := filepath.Join(os.TempDir(), "filesql_processing_example")
	if err := os.MkdirAll(tempDir, 0750); err != nil {
		log.Fatal(err)
	}
	defer os.RemoveAll(tempDir)

	// Open CSV file
	db, err := filesql.Open(filepath.Join("testdata", "sample.csv"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Process data with SQL
	_, err = db.Exec(`
		UPDATE sample 
		SET age = age + 1 
		WHERE name LIKE '%John%'
	`)
	if err != nil {
		log.Fatal(err)
	}

	// Add aggregated data
	_, err = db.Exec(`
		INSERT INTO sample (id, name, age, email) 
		SELECT 999, 'Summary: ' || COUNT(*), AVG(age), 'summary@example.com'
		FROM sample 
		WHERE id < 999
	`)
	if err != nil {
		log.Fatal(err)
	}

	// Export processed data in different formats for different use cases

	// 1. TSV for spreadsheet import
	options := filesql.NewDumpOptions().WithFormat(filesql.OutputFormatTSV)
	spreadsheetDir := filepath.Join(tempDir, "for_spreadsheet")
	if err := filesql.DumpDatabase(db, spreadsheetDir, options); err != nil {
		log.Fatal(err)
	}
	fmt.Println("Exported TSV for spreadsheet import")

	// 2. Compressed CSV for archival
	options = filesql.NewDumpOptions().
		WithFormat(filesql.OutputFormatCSV).
		WithCompression(filesql.CompressionGZ)
	archiveDir := filepath.Join(tempDir, "for_archive")
	if err := filesql.DumpDatabase(db, archiveDir, options); err != nil {
		log.Fatal(err)
	}
	fmt.Println("Exported compressed CSV for archival")

	// 3. LTSV for log analysis
	options = filesql.NewDumpOptions().WithFormat(filesql.OutputFormatLTSV)
	logDir := filepath.Join(tempDir, "for_logs")
	if err := filesql.DumpDatabase(db, logDir, options); err != nil {
		log.Fatal(err)
	}
	fmt.Println("Exported LTSV for log analysis")

	// Show what was created
	dirs := []string{"for_spreadsheet", "for_archive", "for_logs"}
	for _, dir := range dirs {
		files, err := filepath.Glob(filepath.Join(tempDir, dir, "*"))
		if err != nil {
			log.Fatal(err)
		}
		for _, file := range files {
			fmt.Printf("%s: %s\n", dir, filepath.Base(file))
		}
	}

}
Output:

Exported TSV for spreadsheet import
Exported compressed CSV for archival
Exported LTSV for log analysis
for_spreadsheet: sample.tsv
for_archive: sample.csv.gz
for_logs: sample.ltsv
Example (MultipleFormats)
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"

	"github.com/nao1215/filesql"
)

func main() {
	tempDir := filepath.Join(os.TempDir(), "filesql_formats_example")
	if err := os.MkdirAll(tempDir, 0750); err != nil {
		log.Fatal(err)
	}
	defer os.RemoveAll(tempDir)

	// Open CSV file and modify data
	db, err := filesql.Open(filepath.Join("testdata", "sample.csv"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Add some data to demonstrate functionality
	_, err = db.Exec("INSERT INTO sample (id, name, age, email) VALUES (4, 'Alice Brown', 28, 'alice@example.com')")
	if err != nil {
		log.Fatal(err)
	}

	// Demonstrate different compression options
	compressionTypes := []struct {
		name        string
		compression filesql.CompressionType
		extension   string
	}{
		{"No compression", filesql.CompressionNone, ""},
		{"Gzip compression", filesql.CompressionGZ, ".gz"},
		{"XZ compression", filesql.CompressionXZ, ".xz"},
		{"Zstd compression", filesql.CompressionZSTD, ".zst"},
	}

	for _, ct := range compressionTypes {
		fmt.Printf("%s:\n", ct.name)

		options := filesql.NewDumpOptions().
			WithFormat(filesql.OutputFormatCSV).
			WithCompression(ct.compression)

		outputDir := filepath.Join(tempDir, "compression_"+ct.compression.String())
		if err := filesql.DumpDatabase(db, outputDir, options); err != nil {
			log.Fatal(err)
		}

		files, err := filepath.Glob(filepath.Join(outputDir, "*"))
		if err != nil {
			log.Fatal(err)
		}
		for _, file := range files {
			fmt.Printf("  %s\n", filepath.Base(file))
		}
	}

}
Output:

No compression:
  sample.csv
Gzip compression:
  sample.csv.gz
XZ compression:
  sample.csv.xz
Zstd compression:
  sample.csv.zst
Example (WithOptions)
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"

	"github.com/nao1215/filesql"
)

func main() {
	// Create a temporary directory for output
	tempDir := filepath.Join(os.TempDir(), "filesql_dump_example")
	if err := os.MkdirAll(tempDir, 0750); err != nil {
		log.Fatal(err)
	}
	defer os.RemoveAll(tempDir)

	// Open CSV file
	db, err := filesql.Open(filepath.Join("testdata", "sample.csv"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Example 1: Default CSV output (no options)
	fmt.Println("Example 1: Default CSV output")
	csvDir := filepath.Join(tempDir, "csv_output")
	if err := filesql.DumpDatabase(db, csvDir); err != nil {
		log.Fatal(err)
	}

	// List output files
	files1, err := filepath.Glob(filepath.Join(csvDir, "*"))
	if err != nil {
		log.Fatal(err)
	}
	for _, file := range files1 {
		fmt.Printf("Created: %s\n", filepath.Base(file))
	}

	// Example 2: TSV output with gzip compression
	fmt.Println("\nExample 2: TSV output with gzip compression")
	tsvDir := filepath.Join(tempDir, "tsv_output")
	options := filesql.NewDumpOptions().
		WithFormat(filesql.OutputFormatTSV).
		WithCompression(filesql.CompressionGZ)
	if err := filesql.DumpDatabase(db, tsvDir, options); err != nil {
		log.Fatal(err)
	}

	files2, err := filepath.Glob(filepath.Join(tsvDir, "*"))
	if err != nil {
		log.Fatal(err)
	}
	for _, file := range files2 {
		fmt.Printf("Created: %s\n", filepath.Base(file))
	}

	// Example 3: LTSV output with zstd compression
	fmt.Println("\nExample 3: LTSV output with zstd compression")
	ltsvDir := filepath.Join(tempDir, "ltsv_output")
	options3 := filesql.NewDumpOptions().
		WithFormat(filesql.OutputFormatLTSV).
		WithCompression(filesql.CompressionZSTD)
	if err := filesql.DumpDatabase(db, ltsvDir, options3); err != nil {
		log.Fatal(err)
	}

	files3, err := filepath.Glob(filepath.Join(ltsvDir, "*"))
	if err != nil {
		log.Fatal(err)
	}
	for _, file := range files3 {
		fmt.Printf("Created: %s\n", filepath.Base(file))
	}

}
Output:

Example 1: Default CSV output
Created: sample.csv

Example 2: TSV output with gzip compression
Created: sample.tsv.gz

Example 3: LTSV output with zstd compression
Created: sample.ltsv.zst

func Open

func Open(paths ...string) (*sql.DB, error)

Open creates an SQL database from CSV, TSV, LTSV, Parquet, or Excel (XLSX) files.

Quick start:

db, err := filesql.Open("data.csv")
if err != nil {
	return err
}
defer db.Close()

rows, err := db.Query("SELECT * FROM data WHERE age > 25")

Parameters:

  • paths: One or more file paths or directories
  • Files: "users.csv", "products.tsv", "logs.ltsv"
  • Compressed: "data.csv.gz", "archive.tsv.bz2"
  • Directories: "/data/" (loads all supported files recursively)

Table names:

  • "users.csv" β†’ table "users"
  • "data.tsv.gz" β†’ table "data"
  • "/path/to/sales.csv" β†’ table "sales"

Note: Original files are never modified. Changes exist only in memory. To save changes, use DumpDatabase() function.

Example with a complex query:

// Open a single CSV file
db, err := filesql.Open("data/users.csv")
if err != nil {
	log.Fatal(err)
}
defer db.Close()

// Complex query with JOINs, aggregation, and window functions
rows, err := db.Query(`
	SELECT
		u.name,
		u.department,
		u.salary,
		AVG(u.salary) OVER (PARTITION BY u.department) as dept_avg_salary,
		RANK() OVER (PARTITION BY u.department ORDER BY u.salary DESC) as salary_rank,
		COUNT(*) OVER (PARTITION BY u.department) as dept_size
	FROM users u
	WHERE u.salary > (
		SELECT AVG(salary) * 0.8
		FROM users
		WHERE department = u.department
	)
	ORDER BY u.department, u.salary DESC
`)
if err != nil {
	log.Fatal(err)
}
defer rows.Close()

// Process results
for rows.Next() {
	var name, dept string
	var salary, deptAvg float64
	var rank, deptSize int
	if err := rows.Scan(&name, &dept, &salary, &deptAvg, &rank, &deptSize); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s (%s): $%.2f (Rank: %d/%d, Dept Avg: $%.2f)\n",
		name, dept, salary, rank, deptSize, deptAvg)
}
Example

ExampleOpen demonstrates how to use filesql.Open() with complex SQL queries. This example shows advanced SQL features including JOINs, window functions, subqueries, and aggregations on CSV data loaded into an in-memory SQLite database.

package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"path/filepath"

	"github.com/nao1215/filesql"
)

func main() {
	// Create temporary test data files
	tmpDir := createTempTestData()
	defer os.RemoveAll(tmpDir)

	// Open the database with multiple files
	db, err := filesql.Open(filepath.Join(tmpDir, "employees.csv"), filepath.Join(tmpDir, "departments.csv"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Complex query demonstrating multiple SQL features:
	// - JOINs between tables
	// - Window functions (RANK, AVG, COUNT)
	// - Subqueries with correlated conditions
	// - CASE statements
	// - Grouping and ordering
	query := `
		SELECT 
			e.name,
			d.name as department_name,
			e.salary,
			d.budget,
			RANK() OVER (PARTITION BY e.department_id ORDER BY e.salary DESC) as salary_rank_in_dept,
			AVG(e.salary) OVER (PARTITION BY e.department_id) as dept_avg_salary,
			COUNT(*) OVER (PARTITION BY e.department_id) as dept_employee_count,
			CASE 
				WHEN e.salary > (SELECT AVG(salary) FROM employees WHERE department_id = e.department_id) * 1.2
				THEN 'High Performer'
				WHEN e.salary < (SELECT AVG(salary) FROM employees WHERE department_id = e.department_id) * 0.8  
				THEN 'Below Average'
				ELSE 'Average'
			END as performance_category,
			ROUND(e.salary / d.budget * 100, 2) as salary_budget_percentage
		FROM employees e
		JOIN departments d ON e.department_id = d.id
		WHERE e.salary > (
			SELECT AVG(salary) * 0.7
			FROM employees e2 
			WHERE e2.department_id = e.department_id
		)
		AND d.budget > 500000
		ORDER BY d.name, e.salary DESC
		LIMIT 10
	`

	rows, err := db.QueryContext(context.Background(), query)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	fmt.Println("Employee Analysis Report:")
	fmt.Println("========================")

	for rows.Next() {
		var name, deptName, perfCategory string
		var salary, budget, deptAvgSalary, salaryBudgetPct float64
		var salaryRank, deptEmpCount int

		err := rows.Scan(&name, &deptName, &salary, &budget, &salaryRank,
			&deptAvgSalary, &deptEmpCount, &perfCategory, &salaryBudgetPct)
		if err != nil {
			log.Fatal(err)
		}

		fmt.Printf("%-15s | %-12s | $%7.0f | Rank: %d/%d | %s\n",
			name, deptName, salary, salaryRank, deptEmpCount, perfCategory)
	}

	if err = rows.Err(); err != nil {
		log.Fatal(err)
	}

}

// createTempTestData creates temporary CSV files for the example
func createTempTestData() string {
	tmpDir, err := os.MkdirTemp("", "filesql_example")
	if err != nil {
		log.Fatal(err)
	}

	employeesData := `id,name,department_id,salary,hire_date
1,Alice Johnson,1,95000,2020-01-15
2,Bob Smith,1,85000,2019-03-22
3,Charlie Brown,1,80000,2021-06-10
4,David Wilson,1,75000,2022-02-28
5,Eve Davis,2,70000,2020-09-15
6,Frank Miller,2,65000,2021-11-30
7,Grace Lee,3,60000,2019-12-05
8,Henry Taylor,3,55000,2022-04-18`

	err = os.WriteFile(filepath.Join(tmpDir, "employees.csv"), []byte(employeesData), 0600)
	if err != nil {
		log.Fatal(err)
	}

	departmentsData := `id,name,budget,manager_id
1,Engineering,1000000,1
2,Marketing,800000,5
3,Sales,600000,7
4,HR,400000,9`

	err = os.WriteFile(filepath.Join(tmpDir, "departments.csv"), []byte(departmentsData), 0600)
	if err != nil {
		log.Fatal(err)
	}

	return tmpDir
}
Output:

Employee Analysis Report:
========================
Alice Johnson   | Engineering  | $  95000 | Rank: 1/4 | Average
Bob Smith       | Engineering  | $  85000 | Rank: 2/4 | Average
Charlie Brown   | Engineering  | $  80000 | Rank: 3/4 | Average
David Wilson    | Engineering  | $  75000 | Rank: 4/4 | Average
Eve Davis       | Marketing    | $  70000 | Rank: 1/2 | Average
Frank Miller    | Marketing    | $  65000 | Rank: 2/2 | Average
Grace Lee       | Sales        | $  60000 | Rank: 1/2 | Average
Henry Taylor    | Sales        | $  55000 | Rank: 2/2 | Average
Example (AdvancedSQL)

ExampleOpen_advancedSQL demonstrates advanced SQL features available in SQLite3

package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"path/filepath"
	"strings"
	"time"

	"github.com/nao1215/filesql"
)

func main() {
	tmpDir := createAdvancedTestData()
	defer os.RemoveAll(tmpDir)

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	db, err := filesql.OpenContext(ctx, tmpDir)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	fmt.Println("=== Advanced SQL Features ===")

	// Window functions with RANK() and ROW_NUMBER()
	fmt.Println("\n1. Window Functions - Employee Rankings by Department:")
	rows, err := db.QueryContext(ctx, `
		SELECT 
			e.name,
			d.name as department,
			e.salary,
			RANK() OVER (PARTITION BY e.department_id ORDER BY e.salary DESC) as salary_rank,
			ROW_NUMBER() OVER (ORDER BY e.salary DESC) as overall_rank
		FROM employees e
		JOIN departments d ON e.department_id = d.id
		ORDER BY e.department_id, salary_rank
	`)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("%-15s %-12s %-8s %-10s %s\n", "Name", "Department", "Salary", "Dept Rank", "Overall Rank")
	fmt.Println(strings.Repeat("-", 65))

	for rows.Next() {
		var name, department string
		var salary float64
		var salaryRank, overallRank int

		if err := rows.Scan(&name, &department, &salary, &salaryRank, &overallRank); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%-15s %-12s $%-7.0f %-10d %d\n", name, department, salary, salaryRank, overallRank)
	}
	_ = rows.Close() // Ignore close error in test cleanup

	// Common Table Expressions (CTE)
	fmt.Println("\n2. Common Table Expressions - Department Analysis:")
	rows, err = db.QueryContext(ctx, `
		WITH dept_stats AS (
			SELECT 
				d.name as department,
				COUNT(e.id) as employee_count,
				AVG(e.salary) as avg_salary,
				MAX(e.salary) as max_salary,
				MIN(e.salary) as min_salary
			FROM departments d
			LEFT JOIN employees e ON d.id = e.department_id
			GROUP BY d.id, d.name
		),
		company_avg AS (
			SELECT AVG(salary) as company_avg_salary
			FROM employees
		)
		SELECT 
			ds.department,
			ds.employee_count,
			ds.avg_salary,
			ca.company_avg_salary,
			ds.avg_salary - ca.company_avg_salary as salary_diff,
			CASE 
				WHEN ds.avg_salary > ca.company_avg_salary THEN 'Above Average'
				WHEN ds.avg_salary < ca.company_avg_salary THEN 'Below Average'
				ELSE 'At Average'
			END as comparison
		FROM dept_stats ds
		CROSS JOIN company_avg ca
		WHERE ds.employee_count > 0
		ORDER BY ds.avg_salary DESC
	`)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("%-12s %-5s %-10s %-12s %-10s %s\n", "Department", "Count", "Avg Salary", "Company Avg", "Difference", "Comparison")
	fmt.Println(strings.Repeat("-", 75))

	for rows.Next() {
		var department, comparison string
		var employeeCount int
		var avgSalary, companyAvg, salaryDiff float64

		if err := rows.Scan(&department, &employeeCount, &avgSalary, &companyAvg, &salaryDiff, &comparison); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%-12s %-5d $%-9.0f $%-11.0f $%-9.0f %s\n",
			department, employeeCount, avgSalary, companyAvg, salaryDiff, comparison)
	}
	_ = rows.Close() // Ignore close error in test cleanup

	// JSON operations (if data contains JSON)
	fmt.Println("\n3. Text Functions - Name Analysis:")
	rows, err = db.QueryContext(ctx, `
		SELECT 
			name,
			LENGTH(name) as name_length,
			UPPER(SUBSTR(name, 1, 1)) || LOWER(SUBSTR(name, 2)) as formatted_name,
			INSTR(name, ' ') as space_position,
			CASE 
				WHEN INSTR(name, ' ') > 0 THEN SUBSTR(name, 1, INSTR(name, ' ') - 1)
				ELSE name
			END as first_name
		FROM employees
		WHERE LENGTH(name) > 8
		ORDER BY name_length DESC
	`)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("%-15s %-6s %-15s %-8s %s\n", "Name", "Length", "Formatted", "Space@", "First Name")
	fmt.Println(strings.Repeat("-", 60))

	for rows.Next() {
		var name, formattedName, firstName string
		var nameLength, spacePos int

		if err := rows.Scan(&name, &nameLength, &formattedName, &spacePos, &firstName); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%-15s %-6d %-15s %-8d %s\n", name, nameLength, formattedName, spacePos, firstName)
	}
	_ = rows.Close() // Ignore close error in test cleanup

}

// createAdvancedTestData creates test data for advanced SQL examples
func createAdvancedTestData() string {
	tmpDir, err := os.MkdirTemp("", "filesql_advanced_example_*")
	if err != nil {
		log.Fatal(err)
	}

	employeesData := `id,name,department_id,salary,hire_date
1,Alice Johnson,1,95000,2023-01-15
2,Bob Smith,2,85000,2023-02-20
3,Charlie Brown,1,80000,2023-03-10
4,David Wilson,1,75000,2023-04-05
5,Eve Davis,2,65000,2023-05-15
6,Frank Miller,3,70000,2023-06-01`

	employeesFile := filepath.Join(tmpDir, "employees.csv")
	err = os.WriteFile(employeesFile, []byte(employeesData), 0600)
	if err != nil {
		log.Fatal(err)
	}

	departmentsData := `id,name,budget
1,Engineering,500000
2,Sales,300000
3,Marketing,200000
4,HR,150000`

	departmentsFile := filepath.Join(tmpDir, "departments.csv")
	err = os.WriteFile(departmentsFile, []byte(departmentsData), 0600)
	if err != nil {
		log.Fatal(err)
	}

	return tmpDir
}
Output:

=== Advanced SQL Features ===

1. Window Functions - Employee Rankings by Department:
Name            Department   Salary   Dept Rank  Overall Rank
-----------------------------------------------------------------
Alice Johnson   Engineering  $95000   1          1
Charlie Brown   Engineering  $80000   2          3
David Wilson    Engineering  $75000   3          4
Bob Smith       Sales        $85000   1          2
Eve Davis       Sales        $65000   2          6
Frank Miller    Marketing    $70000   1          5

2. Common Table Expressions - Department Analysis:
Department   Count Avg Salary Company Avg  Difference Comparison
---------------------------------------------------------------------------
Engineering  3     $83333     $78333       $5000      Above Average
Sales        2     $75000     $78333       $-3333     Below Average
Marketing    1     $70000     $78333       $-8333     Below Average

3. Text Functions - Name Analysis:
Name            Length Formatted       Space@   First Name
------------------------------------------------------------
Alice Johnson   13     Alice johnson   6        Alice
Charlie Brown   13     Charlie brown   8        Charlie
David Wilson    12     David wilson    6        David
Frank Miller    12     Frank miller    6        Frank
Bob Smith       9      Bob smith       4        Bob
Eve Davis       9      Eve davis       4        Eve
Example (CompressionSupport)

ExampleOpen_compressionSupport demonstrates working with compressed files

package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"path/filepath"
	"strings"
	"time"

	"github.com/nao1215/filesql"
)

func main() {
	tmpDir := createCompressedTestData()
	defer os.RemoveAll(tmpDir)

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Open compressed files seamlessly
	db, err := filesql.OpenContext(ctx, tmpDir)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	fmt.Println("=== Compression Support Demo ===")
	fmt.Println("Successfully loaded compressed files:")

	// List all tables from compressed files
	rows, err := db.QueryContext(ctx, `
		SELECT name, sql 
		FROM sqlite_master 
		WHERE type='table' 
		ORDER BY name
	`)
	if err != nil {
		log.Fatal(err)
	}

	for rows.Next() {
		var tableName, createSQL string
		if err := rows.Scan(&tableName, &createSQL); err != nil {
			log.Fatal(err)
		}

		// Count records in each table
		var count int
		countQuery := "SELECT COUNT(*) FROM " + tableName
		if err := db.QueryRowContext(ctx, countQuery).Scan(&count); err != nil {
			log.Fatal(err)
		}

		fmt.Printf("- %s: %d records\n", tableName, count)
	}
	_ = rows.Close() // Ignore close error in test cleanup

	// Demonstrate querying across compressed files
	fmt.Println("\nCross-file analysis from compressed data:")

	analysisRows, err := db.QueryContext(ctx, `
		SELECT 
			'logs' as source_table,
			COUNT(*) as total_records,
			MIN(timestamp) as earliest,
			MAX(timestamp) as latest
		FROM logs
		
		UNION ALL
		
		SELECT 
			'products' as source_table,
			COUNT(*) as total_records,
			'N/A' as earliest,
			'N/A' as latest
		FROM products
		
		ORDER BY source_table
	`)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("%-12s %-8s %-19s %s\n", "Table", "Records", "Earliest", "Latest")
	fmt.Println(strings.Repeat("-", 60))

	for analysisRows.Next() {
		var sourceTable, earliest, latest string
		var totalRecords int

		if err := analysisRows.Scan(&sourceTable, &totalRecords, &earliest, &latest); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%-12s %-8d %-19s %s\n", sourceTable, totalRecords, earliest, latest)
	}
	_ = analysisRows.Close() // Ignore close error in test cleanup

}

// createCompressedTestData creates test data, gzip-compressing the logs file
// to demonstrate transparent decompression
func createCompressedTestData() string {
	tmpDir, err := os.MkdirTemp("", "filesql_compressed_example_*")
	if err != nil {
		log.Fatal(err)
	}

	logsData := `timestamp,level,message,user_id
2024-01-01 10:00:00,INFO,User login,1001
2024-01-01 11:30:00,INFO,Order created,1002
2024-01-01 12:15:00,ERROR,Payment failed,1003
2024-01-01 13:45:00,INFO,User logout,1001
2024-01-01 14:00:00,INFO,System backup completed,0`

	// Write logs as a gzip-compressed CSV (logs.csv.gz)
	logsFile, err := os.Create(filepath.Join(tmpDir, "logs.csv.gz"))
	if err != nil {
		log.Fatal(err)
	}
	gz := gzip.NewWriter(logsFile)
	if _, err := gz.Write([]byte(logsData)); err != nil {
		log.Fatal(err)
	}
	if err := gz.Close(); err != nil {
		log.Fatal(err)
	}
	if err := logsFile.Close(); err != nil {
		log.Fatal(err)
	}

	productsData := `id,name,category,price,in_stock
1,Laptop Pro,Electronics,1299.99,true
2,Office Chair,Furniture,299.99,true
3,Wireless Mouse,Electronics,49.99,false`

	// Products remain a plain CSV for contrast
	productsFile := filepath.Join(tmpDir, "products.csv")
	if err := os.WriteFile(productsFile, []byte(productsData), 0600); err != nil {
		log.Fatal(err)
	}

	return tmpDir
}
Output:

=== Compression Support Demo ===
Successfully loaded compressed files:
- logs: 5 records
- products: 3 records

Cross-file analysis from compressed data:
Table        Records  Earliest            Latest
------------------------------------------------------------
logs         5        2024-01-01 10:00:00 2024-01-01 14:00:00
products     3        N/A                 N/A
Example (Constraints)

ExampleOpen_constraints demonstrates the constraint that modifications don't affect original files

package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"path/filepath"

	"github.com/nao1215/filesql"
)

// createTempTestData creates temporary CSV files for the example
func createTempTestData() string {
	tmpDir, err := os.MkdirTemp("", "filesql_example")
	if err != nil {
		log.Fatal(err)
	}

	employeesData := `id,name,department_id,salary,hire_date
1,Alice Johnson,1,95000,2020-01-15
2,Bob Smith,1,85000,2019-03-22
3,Charlie Brown,1,80000,2021-06-10
4,David Wilson,1,75000,2022-02-28
5,Eve Davis,2,70000,2020-09-15
6,Frank Miller,2,65000,2021-11-30
7,Grace Lee,3,60000,2019-12-05
8,Henry Taylor,3,55000,2022-04-18`

	err = os.WriteFile(filepath.Join(tmpDir, "employees.csv"), []byte(employeesData), 0600)
	if err != nil {
		log.Fatal(err)
	}

	departmentsData := `id,name,budget,manager_id
1,Engineering,1000000,1
2,Marketing,800000,5
3,Sales,600000,7
4,HR,400000,9`

	err = os.WriteFile(filepath.Join(tmpDir, "departments.csv"), []byte(departmentsData), 0600)
	if err != nil {
		log.Fatal(err)
	}

	return tmpDir
}

func main() {
	tmpDir := createTempTestData()
	defer os.RemoveAll(tmpDir)

	db, err := filesql.Open(filepath.Join(tmpDir, "employees.csv"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Show original data count
	var originalCount int
	err = db.QueryRowContext(context.Background(), "SELECT COUNT(*) FROM employees").Scan(&originalCount)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Original employee count: %d\n", originalCount)

	// Insert new data (only affects in-memory database)
	_, err = db.ExecContext(context.Background(), "INSERT INTO employees (id, name, department_id, salary, hire_date) VALUES (99, 'Test User', 1, 50000, '2023-01-01')")
	if err != nil {
		log.Fatal(err)
	}

	// Show in-memory count
	var memoryCount int
	err = db.QueryRowContext(context.Background(), "SELECT COUNT(*) FROM employees").Scan(&memoryCount)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("In-memory count after INSERT: %d\n", memoryCount)

	// Verify original file is unchanged by reopening
	db2, err := filesql.Open(filepath.Join(tmpDir, "employees.csv"))
	if err != nil {
		log.Fatal(err)
	}
	defer db2.Close()

	var fileCount int
	err = db2.QueryRowContext(context.Background(), "SELECT COUNT(*) FROM employees").Scan(&fileCount)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("File-based count (unchanged): %d\n", fileCount)

}
Output:

Original employee count: 8
In-memory count after INSERT: 9
File-based count (unchanged): 8
Example (CustomerInsights)

ExampleOpen_customerInsights demonstrates customer behavior analysis

package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"path/filepath"
	"strings"
	"time"

	"github.com/nao1215/filesql"
)

// createSalesTestData creates test data for sales analysis examples
func createSalesTestData() string {
	tmpDir, err := os.MkdirTemp("", "filesql_sales_example")
	if err != nil {
		log.Fatal(err)
	}

	salesData := `order_id,customer_id,product_name,category,quantity,unit_price,order_date,region
1,101,Laptop Pro,Electronics,2,1299.99,2024-01-15,North
2,102,Wireless Mouse,Electronics,1,29.99,2024-01-16,South
3,103,Office Chair,Furniture,1,299.99,2024-01-17,East
4,101,USB Cable,Electronics,3,12.99,2024-01-18,North
5,104,Standing Desk,Furniture,1,599.99,2024-01-19,West
6,105,Bluetooth Speaker,Electronics,2,79.99,2024-01-20,South
7,106,Coffee Table,Furniture,1,199.99,2024-01-21,East
8,102,Keyboard,Electronics,1,89.99,2024-01-22,South
9,107,Monitor 24inch,Electronics,1,249.99,2024-01-23,North
10,103,Desk Lamp,Furniture,2,39.99,2024-01-24,East`

	err = os.WriteFile(filepath.Join(tmpDir, "sales.csv"), []byte(salesData), 0600)
	if err != nil {
		log.Fatal(err)
	}

	customersData := `customer_id,name,email,city,registration_date
101,John Doe,john@example.com,New York,2023-06-01
102,Jane Smith,jane@example.com,Los Angeles,2023-07-15
103,Bob Johnson,bob@example.com,Chicago,2023-08-20
104,Alice Brown,alice@example.com,Houston,2023-09-10
105,Charlie Wilson,charlie@example.com,Phoenix,2023-10-05
106,Diana Lee,diana@example.com,Philadelphia,2023-11-12
107,Frank Miller,frank@example.com,San Antonio,2023-12-03`

	err = os.WriteFile(filepath.Join(tmpDir, "customers.csv"), []byte(customersData), 0600)
	if err != nil {
		log.Fatal(err)
	}

	return tmpDir
}

func main() {
	tmpDir := createSalesTestData()
	defer os.RemoveAll(tmpDir)

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	db, err := filesql.OpenContext(ctx, tmpDir)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Customer lifetime value and behavior analysis
	query := `
		SELECT 
			c.name,
			c.city,
			COUNT(s.order_id) as total_orders,
			SUM(s.quantity * s.unit_price) as lifetime_value,
			AVG(s.quantity * s.unit_price) as avg_order_value,
			MIN(s.order_date) as first_purchase,
			MAX(s.order_date) as last_purchase,
			julianday(MAX(s.order_date)) - julianday(MIN(s.order_date)) as days_active,
			COUNT(DISTINCT s.category) as categories_purchased
		FROM customers c
		JOIN sales s ON c.customer_id = s.customer_id
		GROUP BY c.customer_id, c.name, c.city
		HAVING total_orders > 1
		ORDER BY lifetime_value DESC
	`

	rows, err := db.QueryContext(ctx, query)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	fmt.Println("Customer Insights (Multi-Purchase Customers):")
	fmt.Println("===========================================")
	fmt.Printf("%-12s %-12s %-7s %-10s %-10s %-12s %-12s %-6s %s\n",
		"Name", "City", "Orders", "LTV", "Avg Order", "First Buy", "Last Buy", "Days", "Categories")
	fmt.Println(strings.Repeat("-", 100))

	for rows.Next() {
		var name, city, firstPurchase, lastPurchase string
		var totalOrders, daysActive, categoriesPurchased int
		var lifetimeValue, avgOrderValue float64

		err := rows.Scan(&name, &city, &totalOrders, &lifetimeValue, &avgOrderValue,
			&firstPurchase, &lastPurchase, &daysActive, &categoriesPurchased)
		if err != nil {
			log.Fatal(err)
		}

		fmt.Printf("%-12s %-12s %-7d $%-9.2f $%-9.2f %-12s %-12s %-6d %d\n",
			name, city, totalOrders, lifetimeValue, avgOrderValue,
			firstPurchase, lastPurchase, daysActive, categoriesPurchased)
	}

}
Output:

Customer Insights (Multi-Purchase Customers):
===========================================
Name         City         Orders  LTV        Avg Order  First Buy    Last Buy     Days   Categories
----------------------------------------------------------------------------------------------------
John Doe     New York     2       $2638.95   $1319.47   2024-01-15   2024-01-18   3      1
Bob Johnson  Chicago      2       $379.97    $189.99    2024-01-17   2024-01-24   7      1
Jane Smith   Los Angeles  2       $119.98    $59.99     2024-01-16   2024-01-22   6      1
Example (ErrorHandling)

ExampleOpen_errorHandling demonstrates proper error handling patterns

package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"path/filepath"
	"strings"
	"time"

	"github.com/nao1215/filesql"
)

// createTempTestData creates temporary CSV files for the example
func createTempTestData() string {
	tmpDir, err := os.MkdirTemp("", "filesql_example")
	if err != nil {
		log.Fatal(err)
	}

	employeesData := `id,name,department_id,salary,hire_date
1,Alice Johnson,1,95000,2020-01-15
2,Bob Smith,1,85000,2019-03-22
3,Charlie Brown,1,80000,2021-06-10
4,David Wilson,1,75000,2022-02-28
5,Eve Davis,2,70000,2020-09-15
6,Frank Miller,2,65000,2021-11-30
7,Grace Lee,3,60000,2019-12-05
8,Henry Taylor,3,55000,2022-04-18`

	err = os.WriteFile(filepath.Join(tmpDir, "employees.csv"), []byte(employeesData), 0600)
	if err != nil {
		log.Fatal(err)
	}

	departmentsData := `id,name,budget,manager_id
1,Engineering,1000000,1
2,Marketing,800000,5
3,Sales,600000,7
4,HR,400000,9`

	err = os.WriteFile(filepath.Join(tmpDir, "departments.csv"), []byte(departmentsData), 0600)
	if err != nil {
		log.Fatal(err)
	}

	return tmpDir
}

func main() {
	// Example 1: Handling non-existent files gracefully
	_, err := filesql.Open("nonexistent.csv")
	if err != nil {
		fmt.Printf("Expected error for non-existent file: %v\n", err)
	}

	// Example 2: Context timeout handling
	ctx, cancel := context.WithTimeout(context.Background(), 1*time.Nanosecond) // Very short timeout
	defer cancel()

	// This will likely timeout
	tmpDir := createTempTestData()
	defer os.RemoveAll(tmpDir)

	time.Sleep(10 * time.Millisecond) // Ensure timeout triggers
	_, err = filesql.OpenContext(ctx, tmpDir)
	if err != nil {
		// Extract the core error message (context deadline exceeded)
		errMsg := err.Error()
		if strings.Contains(errMsg, "context deadline exceeded") {
			fmt.Printf("Expected timeout error: %s\n", "context deadline exceeded")
		} else {
			fmt.Printf("Expected timeout error: %v\n", err)
		}
	}

	// Example 3: Successful operation with proper error checking
	ctx2, cancel2 := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel2()

	db, err := filesql.OpenContext(ctx2, tmpDir)
	if err != nil {
		fmt.Printf("Unexpected error: %v\n", err)
		return
	}
	defer db.Close()

	// Test query with error handling
	rows, err := db.QueryContext(ctx2, "SELECT COUNT(*) FROM employees")
	if err != nil {
		fmt.Printf("Query error: %v\n", err)
		return
	}
	defer rows.Close()

	var count int
	if rows.Next() {
		if err := rows.Scan(&count); err != nil {
			fmt.Printf("Scan error: %v\n", err)
			return
		}
		fmt.Printf("Successfully counted %d employees\n", count)
	}

	if err := rows.Err(); err != nil {
		fmt.Printf("Rows iteration error: %v\n", err)
		return
	}

}
Output:

Expected error for non-existent file: failed to load file: path does not exist: nonexistent.csv
Expected timeout error: context deadline exceeded
Successfully counted 8 employees
Example (FinancialDataAnalysis)

ExampleOpen_financialDataAnalysis demonstrates financial data processing

package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"path/filepath"
	"strings"
	"time"

	"github.com/nao1215/filesql"
)

func main() {
	tmpDir := createFinancialTestData()
	defer os.RemoveAll(tmpDir)

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	db, err := filesql.OpenContext(ctx, tmpDir)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	fmt.Println("=== Financial Data Analysis ===")

	// Monthly revenue trend
	fmt.Println("\n1. Monthly Revenue Trend:")
	rows, err := db.QueryContext(ctx, `
		SELECT 
			strftime('%Y-%m', transaction_date) as month,
			COUNT(*) as transaction_count,
			SUM(amount) as total_revenue,
			AVG(amount) as avg_transaction,
			MAX(amount) as largest_transaction
		FROM transactions 
		WHERE type = 'sale'
		GROUP BY month
		ORDER BY month
	`)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("%-8s %-6s %-10s %-8s %s\n", "Month", "Count", "Revenue", "Average", "Largest")
	fmt.Println(strings.Repeat("-", 50))
	for rows.Next() {
		var month string
		var count int
		var revenue, average, largest float64
		if err := rows.Scan(&month, &count, &revenue, &average, &largest); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%-8s %-6d $%-9.2f $%-7.2f $%.2f\n", month, count, revenue, average, largest)
	}
	_ = rows.Close() // Ignore close error in test cleanup

	// Expense category breakdown
	fmt.Println("\n2. Expense Categories:")
	rows, err = db.QueryContext(ctx, `
		SELECT 
			category,
			COUNT(*) as transaction_count,
			SUM(ABS(amount)) as total_expense,
			ROUND(SUM(ABS(amount)) * 100.0 / (
				SELECT SUM(ABS(amount)) FROM transactions WHERE type = 'expense'
			), 2) as percentage
		FROM transactions 
		WHERE type = 'expense'
		GROUP BY category
		ORDER BY total_expense DESC
	`)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("%-15s %-6s %-12s %-10s\n", "Category", "Count", "Total", "Percentage")
	fmt.Println(strings.Repeat("-", 45))
	for rows.Next() {
		var category string
		var count int
		var expense, percentage float64
		if err := rows.Scan(&category, &count, &expense, &percentage); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%-15s %-6d $%-11.2f %-10.2f%%\n", category, count, expense, percentage)
	}
	_ = rows.Close() // Ignore close error in test cleanup

	// Cash flow summary
	fmt.Println("\n3. Cash Flow Summary:")
	var totalIncome, totalExpenses, netIncome float64
	err = db.QueryRowContext(ctx, `
		SELECT 
			SUM(CASE WHEN type = 'sale' THEN amount ELSE 0 END) as total_income,
			SUM(CASE WHEN type = 'expense' THEN ABS(amount) ELSE 0 END) as total_expenses,
			SUM(CASE WHEN type = 'sale' THEN amount ELSE -ABS(amount) END) as net_income
		FROM transactions
	`).Scan(&totalIncome, &totalExpenses, &netIncome)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("Total Income:  $%.2f\n", totalIncome)
	fmt.Printf("Total Expenses: $%.2f\n", totalExpenses)
	fmt.Printf("Net Income:    $%.2f\n", netIncome)
	fmt.Printf("Profit Margin: %.2f%%\n", (netIncome/totalIncome)*100)

}

// createFinancialTestData creates test data for financial analysis examples
func createFinancialTestData() string {
	tmpDir, err := os.MkdirTemp("", "filesql_financial_example_*")
	if err != nil {
		log.Fatal(err)
	}

	transactionData := `transaction_id,transaction_date,type,category,amount,description
1,2024-01-15,sale,Product Sales,1500.00,Sale of premium product
2,2024-01-20,sale,Product Sales,850.00,Sale of standard product
3,2024-01-25,sale,Service,1200.00,Consulting service
4,2024-01-10,expense,Office Supplies,-150.00,Office equipment purchase
5,2024-01-18,expense,Marketing,-250.00,Social media advertising
6,2024-02-05,sale,Product Sales,1200.00,Sale of premium product
7,2024-02-15,sale,Service,1000.00,Training service
8,2024-02-08,expense,Office Supplies,-200.00,Stationery purchase`

	transactionFile := filepath.Join(tmpDir, "transactions.csv")
	err = os.WriteFile(transactionFile, []byte(transactionData), 0600)
	if err != nil {
		log.Fatal(err)
	}

	return tmpDir
}
Output:

=== Financial Data Analysis ===

1. Monthly Revenue Trend:
Month    Count  Revenue    Average  Largest
--------------------------------------------------
2024-01  3      $3550.00   $1183.33 $1500.00
2024-02  2      $2200.00   $1100.00 $1200.00

2. Expense Categories:
Category        Count  Total        Percentage
---------------------------------------------
Office Supplies 2      $350.00      58.33     %
Marketing       1      $250.00      41.67     %

3. Cash Flow Summary:
Total Income:  $5750.00
Total Expenses: $600.00
Net Income:    $5150.00
Profit Margin: 89.57%
Example (MultipleFiles) ΒΆ

ExampleOpen_multipleFiles demonstrates opening multiple files and directories

package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"path/filepath"

	"github.com/nao1215/filesql"
)

// createTempTestData creates temporary CSV files for the example
func createTempTestData() string {
	tmpDir, err := os.MkdirTemp("", "filesql_example")
	if err != nil {
		log.Fatal(err)
	}

	employeesData := `id,name,department_id,salary,hire_date
1,Alice Johnson,1,95000,2020-01-15
2,Bob Smith,1,85000,2019-03-22
3,Charlie Brown,1,80000,2021-06-10
4,David Wilson,1,75000,2022-02-28
5,Eve Davis,2,70000,2020-09-15
6,Frank Miller,2,65000,2021-11-30
7,Grace Lee,3,60000,2019-12-05
8,Henry Taylor,3,55000,2022-04-18`

	err = os.WriteFile(filepath.Join(tmpDir, "employees.csv"), []byte(employeesData), 0600)
	if err != nil {
		log.Fatal(err)
	}

	departmentsData := `id,name,budget,manager_id
1,Engineering,1000000,1
2,Marketing,800000,5
3,Sales,600000,7
4,HR,400000,9`

	err = os.WriteFile(filepath.Join(tmpDir, "departments.csv"), []byte(departmentsData), 0600)
	if err != nil {
		log.Fatal(err)
	}

	return tmpDir
}

func main() {
	tmpDir := createTempTestData()
	defer os.RemoveAll(tmpDir)

	// Open a directory path (every supported file inside becomes a table)
	db, err := filesql.Open(tmpDir)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Query to show all available tables
	rows, err := db.QueryContext(context.Background(), "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	fmt.Println("Available tables:")
	for rows.Next() {
		var tableName string
		if err := rows.Scan(&tableName); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("- %s\n", tableName)
	}

}
Output:

Available tables:
- departments
- employees
Example (PerformanceOptimization) ΒΆ

ExampleOpen_performanceOptimization demonstrates techniques for handling large datasets efficiently

package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"path/filepath"
	"time"

	"github.com/nao1215/filesql"
)

func main() {
	tmpDir := createLargeTestData()
	defer os.RemoveAll(tmpDir)

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	db, err := filesql.OpenContext(ctx, tmpDir)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Technique 1: Use LIMIT and OFFSET for pagination
	fmt.Println("=== Performance Optimization Techniques ===")
	fmt.Println("\n1. Pagination with LIMIT and OFFSET:")

	pageSize := 3
	offset := 0

	for page := 1; page <= 2; page++ {
		rows, err := db.QueryContext(ctx, `
			SELECT customer_id, name, total_orders 
			FROM customer_summary 
			ORDER BY total_orders DESC 
			LIMIT ? OFFSET ?
		`, pageSize, offset)
		if err != nil {
			log.Fatal(err)
		}

		fmt.Printf("Page %d:\n", page)
		for rows.Next() {
			var customerID int
			var name string
			var totalOrders int
			if err := rows.Scan(&customerID, &name, &totalOrders); err != nil {
				log.Fatal(err)
			}
			fmt.Printf("  - %s (ID: %d, Orders: %d)\n", name, customerID, totalOrders)
		}
		_ = rows.Close() // Ignore close error in test cleanup
		offset += pageSize
	}

	// Technique 2: Filter early with WHERE clauses (CREATE INDEX can speed up repeated filters)
	fmt.Println("\n2. Efficient filtering with indexes:")
	rows, err := db.QueryContext(ctx, `
		SELECT name, email, registration_date 
		FROM customer_summary 
		WHERE total_spent > 1000 
		ORDER BY total_spent DESC
	`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	fmt.Println("High-value customers:")
	for rows.Next() {
		var name, email, regDate string
		if err := rows.Scan(&name, &email, &regDate); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("  - %s (%s) - Registered: %s\n", name, email, regDate)
	}

	// Technique 3: Aggregate queries for summary statistics
	fmt.Println("\n3. Summary statistics:")
	var totalCustomers int
	var avgOrders, totalRevenue, avgSpent float64

	err = db.QueryRowContext(ctx, `
		SELECT 
			COUNT(*) as total_customers,
			AVG(total_orders) as avg_orders,
			SUM(total_spent) as total_revenue,
			AVG(total_spent) as avg_spent
		FROM customer_summary
	`).Scan(&totalCustomers, &avgOrders, &totalRevenue, &avgSpent)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("Total customers: %d\n", totalCustomers)
	fmt.Printf("Average orders per customer: %.1f\n", avgOrders)
	fmt.Printf("Total revenue: $%.2f\n", totalRevenue)
	fmt.Printf("Average customer value: $%.2f\n", avgSpent)

}

// createLargeTestData creates test data for performance optimization examples
func createLargeTestData() string {
	tmpDir, err := os.MkdirTemp("", "filesql_large_example_*")
	if err != nil {
		log.Fatal(err)
	}

	customerData := `customer_id,name,email,registration_date,total_orders,total_spent
1001,Premium Customer A,premium.a@example.com,2023-01-15,15,1500.00
1002,Premium Customer B,premium.b@example.com,2023-02-20,12,1200.00
1003,Premium Customer C,premium.c@example.com,2023-03-10,10,1000.00
1004,Regular Customer D,regular.d@example.com,2023-04-05,8,800.00
1005,Regular Customer E,regular.e@example.com,2023-05-15,6,600.00
1006,Regular Customer F,regular.f@example.com,2023-06-20,5,500.00
1007,Budget Customer G,budget.g@example.com,2023-07-10,3,300.00
1008,Budget Customer H,budget.h@example.com,2023-08-25,2,200.00
1009,New Customer I,new.i@example.com,2023-09-30,1,100.00
1010,New Customer J,new.j@example.com,2023-10-15,1,100.00`

	customerFile := filepath.Join(tmpDir, "customer_summary.csv")
	err = os.WriteFile(customerFile, []byte(customerData), 0600)
	if err != nil {
		log.Fatal(err)
	}

	return tmpDir
}
Output:

=== Performance Optimization Techniques ===

1. Pagination with LIMIT and OFFSET:
Page 1:
  - Premium Customer A (ID: 1001, Orders: 15)
  - Premium Customer B (ID: 1002, Orders: 12)
  - Premium Customer C (ID: 1003, Orders: 10)
Page 2:
  - Regular Customer D (ID: 1004, Orders: 8)
  - Regular Customer E (ID: 1005, Orders: 6)
  - Regular Customer F (ID: 1006, Orders: 5)

2. Efficient filtering with indexes:
High-value customers:
  - Premium Customer A (premium.a@example.com) - Registered: 2023-01-15
  - Premium Customer B (premium.b@example.com) - Registered: 2023-02-20

3. Summary statistics:
Total customers: 10
Average orders per customer: 6.3
Total revenue: $6300.00
Average customer value: $630.00
Example (SalesAnalysis) ΒΆ

ExampleOpen_salesAnalysis demonstrates practical sales data analysis

package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"path/filepath"
	"strings"
	"time"

	"github.com/nao1215/filesql"
)

// createSalesTestData creates test data for sales analysis examples
func createSalesTestData() string {
	tmpDir, err := os.MkdirTemp("", "filesql_sales_example")
	if err != nil {
		log.Fatal(err)
	}

	salesData := `order_id,customer_id,product_name,category,quantity,unit_price,order_date,region
1,101,Laptop Pro,Electronics,2,1299.99,2024-01-15,North
2,102,Wireless Mouse,Electronics,1,29.99,2024-01-16,South
3,103,Office Chair,Furniture,1,299.99,2024-01-17,East
4,101,USB Cable,Electronics,3,12.99,2024-01-18,North
5,104,Standing Desk,Furniture,1,599.99,2024-01-19,West
6,105,Bluetooth Speaker,Electronics,2,79.99,2024-01-20,South
7,106,Coffee Table,Furniture,1,199.99,2024-01-21,East
8,102,Keyboard,Electronics,1,89.99,2024-01-22,South
9,107,Monitor 24inch,Electronics,1,249.99,2024-01-23,North
10,103,Desk Lamp,Furniture,2,39.99,2024-01-24,East`

	err = os.WriteFile(filepath.Join(tmpDir, "sales.csv"), []byte(salesData), 0600)
	if err != nil {
		log.Fatal(err)
	}

	customersData := `customer_id,name,email,city,registration_date
101,John Doe,john@example.com,New York,2023-06-01
102,Jane Smith,jane@example.com,Los Angeles,2023-07-15
103,Bob Johnson,bob@example.com,Chicago,2023-08-20
104,Alice Brown,alice@example.com,Houston,2023-09-10
105,Charlie Wilson,charlie@example.com,Phoenix,2023-10-05
106,Diana Lee,diana@example.com,Philadelphia,2023-11-12
107,Frank Miller,frank@example.com,San Antonio,2023-12-03`

	err = os.WriteFile(filepath.Join(tmpDir, "customers.csv"), []byte(customersData), 0600)
	if err != nil {
		log.Fatal(err)
	}

	return tmpDir
}

func main() {
	tmpDir := createSalesTestData()
	defer os.RemoveAll(tmpDir)

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	db, err := filesql.OpenContext(ctx, tmpDir)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Sales summary by category and region
	query := `
		SELECT 
			category,
			region,
			COUNT(*) as order_count,
			SUM(quantity * unit_price) as total_revenue,
			AVG(quantity * unit_price) as avg_order_value,
			MIN(order_date) as first_order,
			MAX(order_date) as last_order
		FROM sales 
		GROUP BY category, region
		ORDER BY total_revenue DESC
	`

	rows, err := db.QueryContext(ctx, query)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	fmt.Println("Sales Analysis by Category and Region:")
	fmt.Println("=====================================")
	fmt.Printf("%-12s %-8s %-6s %-10s %-12s %-12s %s\n",
		"Category", "Region", "Orders", "Revenue", "Avg Order", "First Order", "Last Order")
	fmt.Println(strings.Repeat("-", 80))

	for rows.Next() {
		var category, region, firstOrder, lastOrder string
		var orderCount int
		var totalRevenue, avgOrderValue float64

		err := rows.Scan(&category, &region, &orderCount, &totalRevenue, &avgOrderValue, &firstOrder, &lastOrder)
		if err != nil {
			log.Fatal(err)
		}

		fmt.Printf("%-12s %-8s %-6d $%-9.2f $%-11.2f %-12s %s\n",
			category, region, orderCount, totalRevenue, avgOrderValue, firstOrder, lastOrder)
	}

}
Output:

Sales Analysis by Category and Region:
=====================================
Category     Region   Orders Revenue    Avg Order    First Order  Last Order
--------------------------------------------------------------------------------
Electronics  North    3      $2888.94   $962.98      2024-01-15   2024-01-23
Furniture    West     1      $599.99    $599.99      2024-01-19   2024-01-19
Furniture    East     3      $579.96    $193.32      2024-01-17   2024-01-24
Electronics  South    3      $279.96    $93.32       2024-01-16   2024-01-22
Example (WebLogAnalysis) ΒΆ

ExampleOpen_webLogAnalysis demonstrates analyzing web server logs

package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"path/filepath"
	"strings"
	"time"

	"github.com/nao1215/filesql"
)

func main() {
	tmpDir := createWebLogTestData()
	defer os.RemoveAll(tmpDir)

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	db, err := filesql.OpenContext(ctx, tmpDir)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	fmt.Println("=== Web Log Analysis ===")

	// Top pages by hits
	fmt.Println("\n1. Top Pages by Hits:")
	rows, err := db.QueryContext(ctx, `
		SELECT 
			path,
			COUNT(*) as hits,
			COUNT(DISTINCT ip_address) as unique_visitors
		FROM access_logs 
		WHERE status_code = 200
		GROUP BY path
		ORDER BY hits DESC
		LIMIT 5
	`)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("%-20s %-6s %s\n", "Path", "Hits", "Unique")
	fmt.Println(strings.Repeat("-", 35))
	for rows.Next() {
		var path string
		var hits, unique int
		if err := rows.Scan(&path, &hits, &unique); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%-20s %-6d %d\n", path, hits, unique)
	}
	_ = rows.Close() // Ignore close error in test cleanup

	// Error analysis
	fmt.Println("\n2. Error Analysis:")
	rows, err = db.QueryContext(ctx, `
		SELECT 
			status_code,
			COUNT(*) as error_count,
			ROUND(COUNT(*) * 100.0 / (SELECT COUNT(*) FROM access_logs), 2) as percentage
		FROM access_logs 
		WHERE status_code >= 400
		GROUP BY status_code
		ORDER BY error_count DESC
	`)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("%-12s %-6s %-10s\n", "Status Code", "Count", "Percentage")
	fmt.Println(strings.Repeat("-", 30))
	for rows.Next() {
		var statusCode, errorCount int
		var percentage float64
		if err := rows.Scan(&statusCode, &errorCount, &percentage); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%-12d %-6d %-10.2f%%\n", statusCode, errorCount, percentage)
	}
	_ = rows.Close() // Ignore close error in test cleanup

	// Hourly traffic pattern
	fmt.Println("\n3. Traffic by Hour:")
	rows, err = db.QueryContext(ctx, `
		SELECT 
			CAST(strftime('%H', timestamp) AS INTEGER) as hour,
			COUNT(*) as requests,
			AVG(response_time) as avg_response_time
		FROM access_logs
		GROUP BY hour
		ORDER BY hour
	`)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("%-5s %-9s %-12s\n", "Hour", "Requests", "Avg Response")
	fmt.Println(strings.Repeat("-", 28))
	for rows.Next() {
		var hour, requests int
		var avgResponseTime float64
		if err := rows.Scan(&hour, &requests, &avgResponseTime); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%-5d %-9d %-12.0fms\n", hour, requests, avgResponseTime)
	}
	_ = rows.Close() // Ignore close error in test cleanup

}

// createWebLogTestData creates test data for web log analysis examples
func createWebLogTestData() string {
	tmpDir, err := os.MkdirTemp("", "filesql_weblog_example_*")
	if err != nil {
		log.Fatal(err)
	}

	accessLogsData := `timestamp,ip_address,method,path,status_code,response_time,user_agent
2024-01-01 09:15:30,192.168.1.100,GET,/,200,150,Mozilla/5.0
2024-01-01 09:30:45,192.168.1.101,GET,/products,200,200,Mozilla/5.0
2024-01-01 10:05:15,192.168.1.100,GET,/,200,120,Mozilla/5.0
2024-01-01 10:20:30,192.168.1.102,GET,/about,200,180,Mozilla/5.0
2024-01-01 10:35:45,192.168.1.101,GET,/products,200,160,Mozilla/5.0
2024-01-01 11:10:15,192.168.1.103,GET,/contact,200,140,Mozilla/5.0
2024-01-01 11:25:30,192.168.1.100,GET,/,200,200,Mozilla/5.0
2024-01-01 11:40:45,192.168.1.104,GET,/missing,404,50,Mozilla/5.0
2024-01-01 14:15:30,192.168.1.105,GET,/notfound,404,100,Mozilla/5.0`

	accessLogsFile := filepath.Join(tmpDir, "access_logs.csv")
	err = os.WriteFile(accessLogsFile, []byte(accessLogsData), 0600)
	if err != nil {
		log.Fatal(err)
	}

	return tmpDir
}
Output:

=== Web Log Analysis ===

1. Top Pages by Hits:
Path                 Hits   Unique
-----------------------------------
/                    3      1
/products            2      1
/contact             1      1
/about               1      1

2. Error Analysis:
Status Code  Count  Percentage
------------------------------
404          2      22.22     %

3. Traffic by Hour:
Hour  Requests  Avg Response
----------------------------
9     2         175         ms
10    3         153         ms
11    3         130         ms
14    1         100         ms

func OpenContext ΒΆ added in v0.0.2

func OpenContext(ctx context.Context, paths ...string) (*sql.DB, error)

OpenContext is like Open but accepts a context for cancellation and timeout control.

Use this when you need to:

  • Set timeouts for loading large files
  • Support cancellation in server applications
  • Integrate with context-aware code

Example with timeout:

// Open a single CSV file with timeout
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
db, err := filesql.OpenContext(ctx, "data/users.csv")
if err != nil {
	log.Fatal(err)
}
defer db.Close()

// Complex query with JOINs, aggregation, and window functions
rows, err := db.QueryContext(ctx, `
	SELECT
		u.name,
		u.department,
		u.salary,
		AVG(u.salary) OVER (PARTITION BY u.department) as dept_avg_salary,
		RANK() OVER (PARTITION BY u.department ORDER BY u.salary DESC) as salary_rank,
		COUNT(*) OVER (PARTITION BY u.department) as dept_size
	FROM users u
	WHERE u.salary > (
		SELECT AVG(salary) * 0.8
		FROM users
		WHERE department = u.department
	)
	ORDER BY u.department, u.salary DESC
`)
if err != nil {
	log.Fatal(err)
}
defer rows.Close()

// Process results
for rows.Next() {
	var name, dept string
	var salary, deptAvg float64
	var rank, deptSize int
	if err := rows.Scan(&name, &dept, &salary, &deptAvg, &rank, &deptSize); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s (%s): $%.2f (Rank: %d/%d, Dept Avg: $%.2f)\n",
		name, dept, salary, rank, deptSize, deptAvg)
}
Example ΒΆ

ExampleOpenContext demonstrates opening files with context support for timeout and cancellation

package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"path/filepath"
	"time"

	"github.com/nao1215/filesql"
)

// createTempTestData creates temporary CSV files for the example
func createTempTestData() string {
	tmpDir, err := os.MkdirTemp("", "filesql_example")
	if err != nil {
		log.Fatal(err)
	}

	employeesData := `id,name,department_id,salary,hire_date
1,Alice Johnson,1,95000,2020-01-15
2,Bob Smith,1,85000,2019-03-22
3,Charlie Brown,1,80000,2021-06-10
4,David Wilson,1,75000,2022-02-28
5,Eve Davis,2,70000,2020-09-15
6,Frank Miller,2,65000,2021-11-30
7,Grace Lee,3,60000,2019-12-05
8,Henry Taylor,3,55000,2022-04-18`

	err = os.WriteFile(filepath.Join(tmpDir, "employees.csv"), []byte(employeesData), 0600)
	if err != nil {
		log.Fatal(err)
	}

	departmentsData := `id,name,budget,manager_id
1,Engineering,1000000,1
2,Marketing,800000,5
3,Sales,600000,7
4,HR,400000,9`

	err = os.WriteFile(filepath.Join(tmpDir, "departments.csv"), []byte(departmentsData), 0600)
	if err != nil {
		log.Fatal(err)
	}

	return tmpDir
}

func main() {
	tmpDir := createTempTestData()
	defer os.RemoveAll(tmpDir)

	// Create a context with timeout
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Open database with context
	db, err := filesql.OpenContext(ctx, filepath.Join(tmpDir, "employees.csv"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Query with context support
	rows, err := db.QueryContext(ctx, `
		SELECT name, salary 
		FROM employees 
		WHERE salary > 70000 
		ORDER BY salary DESC
	`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	fmt.Println("High earners (>$70,000):")
	for rows.Next() {
		var name string
		var salary float64
		if err := rows.Scan(&name, &salary); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("- %s: $%.0f\n", name, salary)
	}

}
Output:

High earners (>$70,000):
- Alice Johnson: $95000
- Bob Smith: $85000
- Charlie Brown: $80000
- David Wilson: $75000

Types ΒΆ

type ChunkSize ΒΆ added in v0.4.2

type ChunkSize int

ChunkSize represents a chunk size with validation

func NewChunkSize ΒΆ added in v0.4.2

func NewChunkSize(size int) ChunkSize

NewChunkSize creates a new ChunkSize with validation

func (ChunkSize) Int ΒΆ added in v0.4.2

func (cs ChunkSize) Int() int

Int returns the int value of ChunkSize

func (ChunkSize) IsValid ΒΆ added in v0.4.2

func (cs ChunkSize) IsValid() bool

IsValid checks if the chunk size is valid

func (ChunkSize) String ΒΆ added in v0.4.2

func (cs ChunkSize) String() string

String returns the string representation of ChunkSize
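
A minimal usage sketch (the exact validation rules are internal; that non-positive sizes are rejected is an assumption here):

cs := NewChunkSize(5000)
if cs.IsValid() {
	fmt.Printf("processing %s (%d rows per chunk)\n", cs, cs.Int())
}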

type CompressionFactory ΒΆ added in v0.4.4

type CompressionFactory struct{}

CompressionFactory provides factory methods for compression handling

func NewCompressionFactory ΒΆ added in v0.4.4

func NewCompressionFactory() *CompressionFactory

NewCompressionFactory creates a new compression factory

func (*CompressionFactory) CreateHandlerForFile ΒΆ added in v0.4.4

func (f *CompressionFactory) CreateHandlerForFile(path string) CompressionHandler

CreateHandlerForFile creates an appropriate compression handler for a given file path

func (*CompressionFactory) CreateReaderForFile ΒΆ added in v0.4.4

func (f *CompressionFactory) CreateReaderForFile(path string) (io.Reader, func() error, error)

CreateReaderForFile opens a file and returns a reader that handles decompression

func (*CompressionFactory) CreateWriterForFile ΒΆ added in v0.4.4

func (f *CompressionFactory) CreateWriterForFile(path string, compressionType CompressionType) (io.Writer, func() error, error)

CreateWriterForFile creates a file and returns a writer that handles compression

func (*CompressionFactory) DetectCompressionType ΒΆ added in v0.4.4

func (f *CompressionFactory) DetectCompressionType(path string) CompressionType

DetectCompressionType detects the compression type from a file path

func (*CompressionFactory) GetBaseFileType ΒΆ added in v0.4.4

func (f *CompressionFactory) GetBaseFileType(path string) FileType

GetBaseFileType determines the base file type after removing compression extensions

func (*CompressionFactory) RemoveCompressionExtension ΒΆ added in v0.4.4

func (f *CompressionFactory) RemoveCompressionExtension(path string) string

RemoveCompressionExtension removes the compression extension from a file path if present
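
A short sketch tying the factory methods together; the input file "sales.csv.gz" is hypothetical:

factory := NewCompressionFactory()
fmt.Println(factory.DetectCompressionType("sales.csv.gz"))      // gz
fmt.Println(factory.RemoveCompressionExtension("sales.csv.gz")) // sales.csv

// CreateReaderForFile opens the file and layers a decompressing reader on top.
reader, closeFn, err := factory.CreateReaderForFile("sales.csv.gz")
if err != nil {
	log.Fatal(err)
}
defer closeFn()
data, err := io.ReadAll(reader) // decompressed CSV bytes
if err != nil {
	log.Fatal(err)
}
fmt.Printf("read %d bytes\n", len(data))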

type CompressionHandler ΒΆ added in v0.4.4

type CompressionHandler interface {
	// CreateReader wraps an io.Reader with a decompression reader if needed
	CreateReader(reader io.Reader) (io.Reader, func() error, error)
	// CreateWriter wraps an io.Writer with a compression writer if needed
	CreateWriter(writer io.Writer) (io.Writer, func() error, error)
	// Extension returns the file extension for this compression type (e.g., ".gz")
	Extension() string
}

CompressionHandler defines the interface for handling file compression/decompression

func NewCompressionHandler ΒΆ added in v0.4.4

func NewCompressionHandler(compressionType CompressionType) CompressionHandler

NewCompressionHandler creates a new compression handler for the given compression type
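
A minimal sketch, assuming an in-memory buffer as the destination:

handler := NewCompressionHandler(CompressionGZ)
fmt.Println(handler.Extension()) // .gz

var buf bytes.Buffer
w, closeFn, err := handler.CreateWriter(&buf)
if err != nil {
	log.Fatal(err)
}
if _, err := w.Write([]byte("id,name\n1,Alice\n")); err != nil {
	log.Fatal(err)
}
_ = closeFn() // flush and finalize the compressed stream into buf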

type CompressionType ΒΆ added in v0.0.4

type CompressionType int

CompressionType represents the compression type

const (
	// CompressionNone represents no compression
	CompressionNone CompressionType = iota
	// CompressionGZ represents gzip compression
	CompressionGZ
	// CompressionBZ2 represents bzip2 compression
	CompressionBZ2
	// CompressionXZ represents xz compression
	CompressionXZ
	// CompressionZSTD represents zstd compression
	CompressionZSTD
)

func (CompressionType) Extension ΒΆ added in v0.2.0

func (c CompressionType) Extension() string

Extension returns the file extension for the compression type

func (CompressionType) String ΒΆ added in v0.2.0

func (c CompressionType) String() string

String returns the string representation of CompressionType

type DBBuilder ΒΆ added in v0.1.0

type DBBuilder struct {
	// contains filtered or unexported fields
}

DBBuilder configures and creates database connections from various data sources.

Basic usage:

builder := filesql.NewBuilder().
	AddPath("data.csv").
	AddPath("users.tsv")

validatedBuilder, err := builder.Build(ctx)
if err != nil {
	return err
}

db, err := validatedBuilder.Open(ctx)
defer db.Close()

Supports:

  • File paths (AddPath)
  • Embedded filesystems (AddFS)
  • io.Reader streams (AddReader)
  • Auto-save functionality (EnableAutoSave)
Example (Chaining) ΒΆ
package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"path/filepath"
	"testing/fstest"

	"github.com/nao1215/filesql"
)

func main() {
	// Create temporary files
	tempDir, _ := os.MkdirTemp("", "filesql-example")
	defer os.RemoveAll(tempDir)

	csvFile := filepath.Join(tempDir, "data1.csv")
	content1 := "id,name\n1,Alice\n2,Bob\n"
	os.WriteFile(csvFile, []byte(content1), 0644)

	tsvFile := filepath.Join(tempDir, "data2.tsv")
	content2 := "id\tproduct\n1\tLaptop\n2\tPhone\n"
	os.WriteFile(tsvFile, []byte(content2), 0644)

	// Create mock filesystem
	mockFS := fstest.MapFS{
		"logs.ltsv": &fstest.MapFile{Data: []byte("time:2024-01-01T00:00:00Z\tlevel:info\n")},
	}

	// Demonstrate method chaining
	ctx := context.Background()
	db, err := filesql.NewBuilder().
		AddPath(csvFile).
		AddPaths(tsvFile).
		AddFS(mockFS).
		Build(ctx)
	if err != nil {
		log.Fatal(err)
	}

	connection, err := db.Open(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer connection.Close()

	// Count tables from different sources
	rows, err := connection.Query("SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	var tableCount int
	for rows.Next() {
		var tableName string
		rows.Scan(&tableName)
		tableCount++
	}

	fmt.Printf("Successfully loaded %d tables from mixed sources\n", tableCount)
}
Output:

Successfully loaded 3 tables from mixed sources
Example (ErrorHandling) ΒΆ
package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"path/filepath"

	"github.com/nao1215/filesql"
)

func main() {
	// Example 1: Build without inputs should fail
	builder := filesql.NewBuilder()
	ctx := context.Background()

	_, err := builder.Build(ctx)
	if err != nil {
		fmt.Printf("Expected error for no inputs: %v\n", err)
	}

	// Example 2: Open without Build should fail
	builder2 := filesql.NewBuilder().AddPath("nonexistent.csv")
	_, err = builder2.Open(ctx)
	if err != nil {
		fmt.Println("Expected error for Open without Build")
	}

	// Example 3: Non-existent file should fail during Build
	builder3 := filesql.NewBuilder().AddPath(filepath.Join("nonexistent", "file.csv"))
	_, err = builder3.Build(ctx)
	if err != nil {
		fmt.Println("Expected error for non-existent file")
	}

	// Example 4: Success case
	tempDir, _ := os.MkdirTemp("", "filesql-example")
	defer os.RemoveAll(tempDir)

	csvFile := filepath.Join(tempDir, "valid.csv")
	os.WriteFile(csvFile, []byte("id,name\n1,test\n"), 0644)

	builder4 := filesql.NewBuilder().AddPath(csvFile)
	validatedBuilder, err := builder4.Build(ctx)
	if err != nil {
		log.Fatal(err)
	}

	db, err := validatedBuilder.Open(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	fmt.Println("Success: Valid file loaded correctly")

}
Output:

Expected error for no inputs: at least one path must be provided
Expected error for Open without Build
Expected error for non-existent file
Success: Valid file loaded correctly

func NewBuilder ΒΆ added in v0.1.0

func NewBuilder() *DBBuilder

NewBuilder creates a new database builder.

Start here when you need:

  • Multiple data sources (files, streams, embedded FS)
  • Auto-save functionality
  • Custom chunk sizes for large files
  • More control than the simple Open() function

Example:

builder := filesql.NewBuilder().
	AddPath("data.csv").
	EnableAutoSave("./backup")
Example ΒΆ

ExampleNewBuilder demonstrates the basic usage of the Builder pattern. This is the recommended approach for most use cases, especially when working with embedded filesystems or when you need more control over the database creation process.

package main

import (
	"fmt"

	"github.com/nao1215/filesql"
)

func main() {
	// Create a new builder - this is the starting point for all Builder pattern usage
	builder := filesql.NewBuilder()

	// The builder supports method chaining for a fluent API
	// You can add individual paths or multiple paths at once
	builder.AddPath("users.csv").AddPaths("orders.tsv", "products.ltsv")

	fmt.Printf("Builder created successfully: %t\n", builder != nil)

	// In real usage, you would continue with:
	// ctx := context.Background()
	// validatedBuilder, err := builder.Build(ctx)
	// if err != nil { return err }
	// db, err := validatedBuilder.Open(ctx)
	// if err != nil { return err }
	// defer db.Close()

}
Output:

Builder created successfully: true

func (*DBBuilder) AddFS ΒΆ added in v0.1.0

func (b *DBBuilder) AddFS(filesystem fs.FS) *DBBuilder

AddFS adds files from an embedded filesystem (go:embed).

Automatically finds all CSV, TSV, and LTSV files in the filesystem.

Example:

//go:embed data/*.csv data/*.tsv
var dataFS embed.FS

builder.AddFS(dataFS)

Returns self for chaining.

Example ΒΆ
package main

import (
	"context"
	"fmt"
	"log"
	"testing/fstest"

	"github.com/nao1215/filesql"
)

func main() {
	// Create mock filesystem with test data
	mockFS := fstest.MapFS{
		"users.csv":    &fstest.MapFile{Data: []byte("id,name,department\n1,Alice,Engineering\n2,Bob,Sales\n")},
		"products.tsv": &fstest.MapFile{Data: []byte("id\tname\tprice\n1\tLaptop\t1000\n2\tPhone\t500\n")},
		"logs.ltsv":    &fstest.MapFile{Data: []byte("time:2024-01-01T00:00:00Z\tlevel:info\tmsg:started\n")},
		"readme.txt":   &fstest.MapFile{Data: []byte("This file will be ignored\n")}, // unsupported format
	}

	// Use builder with filesystem
	builder := filesql.NewBuilder().AddFS(mockFS)

	ctx := context.Background()
	validatedBuilder, err := builder.Build(ctx)
	if err != nil {
		log.Fatal(err)
	}

	db, err := validatedBuilder.Open(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// List all tables that were created from the filesystem
	rows, err := db.Query("SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	var tableCount int
	for rows.Next() {
		var tableName string
		rows.Scan(&tableName)
		tableCount++
	}

	fmt.Printf("Created %d tables from filesystem\n", tableCount)
}
Output:

Created 3 tables from filesystem
Example (EmbedFS) ΒΆ
package main

import (
	"context"
	"embed"
	"fmt"
	"io/fs"
	"log"

	"github.com/nao1215/filesql"
)

//go:embed testdata/embed_test/*.csv testdata/embed_test/*.tsv
var builderExampleFS embed.FS

func main() {
	// Use embedded test filesystem
	subFS, err := fs.Sub(builderExampleFS, "testdata/embed_test")
	if err != nil {
		log.Fatal(err)
	}

	// Use builder with embedded filesystem
	builder := filesql.NewBuilder().AddFS(subFS)

	ctx := context.Background()
	validatedBuilder, err := builder.Build(ctx)
	if err != nil {
		log.Fatal(err)
	}

	db, err := validatedBuilder.Open(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Count the number of tables created from embedded files
	rows, err := db.Query("SELECT COUNT(*) FROM sqlite_master WHERE type='table'")
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	var tableCount int
	if rows.Next() {
		rows.Scan(&tableCount)
	}

	fmt.Printf("Created %d tables from embedded files\n", tableCount)
}
Output:

Created 3 tables from embedded files

func (*DBBuilder) AddPath ΒΆ added in v0.1.0

func (b *DBBuilder) AddPath(path string) *DBBuilder

AddPath adds a file or directory to load.

Examples:

  • Single file: AddPath("users.csv")
  • Compressed: AddPath("data.tsv.gz")
  • Directory: AddPath("/data/") // loads all CSV/TSV/LTSV files

Returns self for chaining.

Example ΒΆ
package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"path/filepath"

	"github.com/nao1215/filesql"
)

func main() {
	// Create temporary CSV file for example
	tempDir, _ := os.MkdirTemp("", "filesql-example")
	defer os.RemoveAll(tempDir)

	csvFile := filepath.Join(tempDir, "users.csv")
	content := "id,name,age\n1,Alice,30\n2,Bob,25\n"
	os.WriteFile(csvFile, []byte(content), 0644)

	// Use builder to add a single file path
	builder := filesql.NewBuilder().AddPath(csvFile)

	// Build and open database
	ctx := context.Background()
	validatedBuilder, err := builder.Build(ctx)
	if err != nil {
		log.Fatal(err)
	}

	db, err := validatedBuilder.Open(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Query the data
	rows, err := db.Query("SELECT COUNT(*) FROM users")
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	var count int
	if rows.Next() {
		rows.Scan(&count)
	}
	fmt.Printf("Number of users: %d\n", count)
}
Output:

Number of users: 2

func (*DBBuilder) AddPaths ΒΆ added in v0.1.0

func (b *DBBuilder) AddPaths(paths ...string) *DBBuilder

AddPaths adds multiple files or directories at once.

Example:

builder.AddPaths("users.csv", "products.tsv", "/data/logs/")

Returns self for chaining.

Example ΒΆ
package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"path/filepath"

	"github.com/nao1215/filesql"
)

func main() {
	// Create temporary files for example
	tempDir, _ := os.MkdirTemp("", "filesql-example")
	defer os.RemoveAll(tempDir)

	// Create users.csv
	usersFile := filepath.Join(tempDir, "users.csv")
	usersContent := "id,name\n1,Alice\n2,Bob\n"
	os.WriteFile(usersFile, []byte(usersContent), 0644)

	// Create products.csv
	productsFile := filepath.Join(tempDir, "products.csv")
	productsContent := "id,product_name\n1,Laptop\n2,Phone\n"
	os.WriteFile(productsFile, []byte(productsContent), 0644)

	// Use builder to add multiple file paths
	builder := filesql.NewBuilder().AddPaths(usersFile, productsFile)

	ctx := context.Background()
	validatedBuilder, err := builder.Build(ctx)
	if err != nil {
		log.Fatal(err)
	}

	db, err := validatedBuilder.Open(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Query data from both tables
	rows, err := db.Query(`
		SELECT u.name, p.product_name 
		FROM users u 
		JOIN products p ON u.id = p.id
	`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var name, product string
		rows.Scan(&name, &product)
		fmt.Printf("%s has %s\n", name, product)
	}
}
Output:

Alice has Laptop
Bob has Phone

func (*DBBuilder) AddReader ΒΆ added in v0.2.0

func (b *DBBuilder) AddReader(reader io.Reader, tableName string, fileType FileType) *DBBuilder

AddReader adds data from an io.Reader (file, network stream, etc.).

Parameters:

  • reader: Any io.Reader (file, bytes.Buffer, http.Response.Body, etc.)
  • tableName: Name for the SQL table (e.g., "users")
  • fileType: Data format (FileTypeCSV, FileTypeTSV, FileTypeLTSV, etc.)

Example:

resp, _ := http.Get("https://example.com/data.csv")
builder.AddReader(resp.Body, "remote_data", FileTypeCSV)

Returns self for chaining.

Example ΒΆ

ExampleDBBuilder_AddReader demonstrates using io.Reader as a data source

package main

import (
	"context"
	"fmt"
	"log"
	"strings"

	"github.com/nao1215/filesql"
)

func main() {
	// CSV data from an io.Reader (could be from network, API response, etc.)
	csvData := `id,name,department,salary
1,Alice,Engineering,95000
2,Bob,Sales,78000
3,Charlie,Engineering,102000
4,Diana,Marketing,85000`

	// Create a reader from the CSV data
	reader := strings.NewReader(csvData)

	// Build database with Reader input
	ctx := context.Background()
	builder := filesql.NewBuilder().
		AddReader(reader, "employees", filesql.FileTypeCSV). // Specify table name and type explicitly
		SetDefaultChunkSize(5000)                            // Set 5000 rows per chunk for large data

	// Build validates the input
	validatedBuilder, err := builder.Build(ctx)
	if err != nil {
		log.Fatal(err)
	}

	// Open creates the database connection
	db, err := validatedBuilder.Open(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Query the data
	rows, err := db.Query(`
		SELECT name, department, salary 
		FROM employees 
		WHERE salary > 80000 
		ORDER BY salary DESC
	`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	// Process results
	fmt.Println("High earners (salary > 80,000):")
	for rows.Next() {
		var name, dept string
		var salary int
		if err := rows.Scan(&name, &dept, &salary); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("- %s (%s): $%d\n", name, dept, salary)
	}
}
Output:

High earners (salary > 80,000):
- Charlie (Engineering): $102000
- Alice (Engineering): $95000
- Diana (Marketing): $85000
Example (Compressed) ΒΆ

ExampleDBBuilder_AddReader_compressed demonstrates using compressed data from io.Reader

package main

import (
	"bytes"
	"context"
	"fmt"
	"log"

	"github.com/nao1215/filesql"
)

func main() {
	// Simulate compressed TSV data (in practice, this would be actual compressed data)
	tsvData := "product_id\tproduct_name\tprice\n1\tLaptop\t999\n2\tMouse\t25\n3\tKeyboard\t75"
	reader := bytes.NewReader([]byte(tsvData))

	ctx := context.Background()
	builder := filesql.NewBuilder().
		// Specify that this is TSV data (not actually compressed in this example)
		AddReader(reader, "products", filesql.FileTypeTSV)

	validatedBuilder, err := builder.Build(ctx)
	if err != nil {
		log.Fatal(err)
	}

	db, err := validatedBuilder.Open(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Query the products
	var count int
	err = db.QueryRow("SELECT COUNT(*) FROM products WHERE price < 100").Scan(&count)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("Products under $100: %d\n", count)
}
Output:

Products under $100: 2
Example (Multiple) ΒΆ

ExampleDBBuilder_AddReader_multiple demonstrates combining multiple readers with files

package main

import (
	"context"
	"fmt"
	"log"
	"strings"

	"github.com/nao1215/filesql"
)

func main() {
	// First reader: Users data
	usersCSV := `user_id,name,email
1,Alice,alice@example.com
2,Bob,bob@example.com`

	// Second reader: Orders data
	ordersCSV := `order_id,user_id,amount
101,1,250
102,2,180
103,1,320`

	ctx := context.Background()
	builder := filesql.NewBuilder().
		AddReader(strings.NewReader(usersCSV), "users", filesql.FileTypeCSV).
		AddReader(strings.NewReader(ordersCSV), "orders", filesql.FileTypeCSV).
		SetDefaultChunkSize(2500) // 2500 rows per chunk

	validatedBuilder, err := builder.Build(ctx)
	if err != nil {
		log.Fatal(err)
	}

	db, err := validatedBuilder.Open(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Join query across both tables
	rows, err := db.Query(`
		SELECT u.name, SUM(o.amount) as total_spent
		FROM users u
		JOIN orders o ON u.user_id = o.user_id
		GROUP BY u.user_id, u.name
		ORDER BY total_spent DESC
	`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	fmt.Println("Customer spending:")
	for rows.Next() {
		var name string
		var total float64
		if err := rows.Scan(&name, &total); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("- %s: $%.0f\n", name, total)
	}
}
Output:

Customer spending:
- Alice: $570
- Bob: $180

func (*DBBuilder) Build ΒΆ added in v0.1.0

func (b *DBBuilder) Build(ctx context.Context) (*DBBuilder, error)

Build validates all configured inputs and prepares the builder for opening a database. This method must be called before Open(). It performs the following operations:

 1. Validates that at least one input source is configured
 2. Checks existence and format of all file paths
 3. Processes embedded filesystems by converting files to streaming readers
 4. Validates that all files have supported extensions

After successful validation, the builder is ready to create database connections with Open(). The context is used for file operations and can be used for cancellation.

Returns the same builder instance for method chaining, or an error if validation fails.

Example ΒΆ
package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"path/filepath"

	"github.com/nao1215/filesql"
)

func main() {
	// Create temporary CSV file
	tempDir, _ := os.MkdirTemp("", "filesql-example")
	defer os.RemoveAll(tempDir)

	csvFile := filepath.Join(tempDir, "data.csv")
	content := "name,value\ntest,123\n"
	os.WriteFile(csvFile, []byte(content), 0644)

	// Build validates inputs and prepares for opening
	builder := filesql.NewBuilder().AddPath(csvFile)

	ctx := context.Background()
	validatedBuilder, err := builder.Build(ctx)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("Builder validated successfully: %t\n", validatedBuilder != nil)
}
Output:

Builder validated successfully: true

func (*DBBuilder) DisableAutoSave ΒΆ added in v0.1.0

func (b *DBBuilder) DisableAutoSave() *DBBuilder

DisableAutoSave disables automatic saving (default behavior). Returns the builder for method chaining.
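
A minimal sketch; since disabled is the default, the call mainly serves to undo an earlier EnableAutoSave in the same chain:

builder := filesql.NewBuilder().
	AddPath("data.csv").
	EnableAutoSave("./backup").
	DisableAutoSave() // changes now stay in-memory only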

func (*DBBuilder) EnableAutoSave ΒΆ added in v0.1.0

func (b *DBBuilder) EnableAutoSave(outputDir string, options ...DumpOptions) *DBBuilder

EnableAutoSave automatically saves changes when the database is closed.

Parameters:

  • outputDir: Where to save files
  • "" (empty): Overwrite original files
  • "./backup": Save to backup directory

Example:

builder.AddPath("data.csv").
	EnableAutoSave("") // Auto-save to original file on db.Close()

Returns self for chaining.

Example ΒΆ

ExampleDBBuilder_EnableAutoSave demonstrates automatic saving on database close. This feature automatically saves modified data when the database connection is closed, providing a convenient way to persist changes without manual intervention.

package main

import (
	"context"
	"fmt"
	"os"
	"path/filepath"
	"time"

	"github.com/nao1215/filesql"
)

func main() {
	// Create temporary directory and test file
	tempDir, _ := os.MkdirTemp("", "filesql-autosave-example")
	defer os.RemoveAll(tempDir)

	// Create sample CSV file
	csvPath := filepath.Join(tempDir, "employees.csv")
	csvContent := "name,department,salary\nAlice,Engineering,80000\nBob,Marketing,65000\n"
	_ = os.WriteFile(csvPath, []byte(csvContent), 0600)

	// Create output directory
	outputDir := filepath.Join(tempDir, "backup")
	_ = os.MkdirAll(outputDir, 0750)

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	// Configure builder with auto-save on close
	builder := filesql.NewBuilder().
		AddPath(csvPath).
		EnableAutoSave(outputDir, filesql.NewDumpOptions()) // Save to backup directory on close

	validatedBuilder, _ := builder.Build(ctx)

	db, _ := validatedBuilder.Open(ctx)

	// Modify data - this will be automatically saved when db.Close() is called
	_, _ = db.ExecContext(ctx, "INSERT INTO employees (name, department, salary) VALUES ('Charlie', 'Sales', 70000)")
	_, _ = db.ExecContext(ctx, "UPDATE employees SET salary = 85000 WHERE name = 'Alice'")

	// Close database - triggers automatic save to backup directory
	_ = db.Close()

	// Verify the backup file was created and contains our changes
	backupFile := filepath.Join(outputDir, "employees.csv")
	if _, err := os.Stat(backupFile); err == nil {
		fmt.Println("Auto-save completed successfully")
	}

}
Output:

Auto-save completed successfully

func (*DBBuilder) EnableAutoSaveOnCommit ΒΆ added in v0.1.0

func (b *DBBuilder) EnableAutoSaveOnCommit(outputDir string, options ...DumpOptions) *DBBuilder

EnableAutoSaveOnCommit automatically saves changes after each transaction commit.

Use this for real-time persistence. Note: May impact performance.

Example:

builder.AddPath("data.csv").
	EnableAutoSaveOnCommit("./output") // Save after each commit

Returns self for chaining.

Example ΒΆ

ExampleDBBuilder_EnableAutoSaveOnCommit demonstrates automatic saving on transaction commit. This provides more frequent saves but may impact performance for workloads with many commits.

package main

import (
	"context"
	"fmt"
	"os"
	"path/filepath"
	"time"

	"github.com/nao1215/filesql"
)

func main() {
	// Create temporary directory and test file
	tempDir, _ := os.MkdirTemp("", "filesql-commit-save-example")
	defer os.RemoveAll(tempDir)

	// Create sample CSV file
	csvPath := filepath.Join(tempDir, "transactions.csv")
	csvContent := "id,amount,status\n1,100.50,pending\n2,250.75,pending\n"
	_ = os.WriteFile(csvPath, []byte(csvContent), 0600)

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	// Configure builder with auto-save on commit
	// Using temp directory to keep the example self-contained
	builder := filesql.NewBuilder().
		AddPath(csvPath).
		EnableAutoSaveOnCommit(tempDir, filesql.NewDumpOptions()) // Save to temp directory on each commit

	validatedBuilder, _ := builder.Build(ctx)

	db, _ := validatedBuilder.Open(ctx)
	defer func() { _ = db.Close() }()

	// Start transaction
	tx, _ := db.BeginTx(ctx, nil)

	// Process transactions within the transaction
	_, _ = tx.ExecContext(ctx, "UPDATE transactions SET status = 'completed' WHERE id = 1")
	_, _ = tx.ExecContext(ctx, "INSERT INTO transactions (id, amount, status) VALUES (3, 175.25, 'completed')")

	// Commit transaction - triggers automatic save
	_ = tx.Commit()

	fmt.Println("Transaction committed with auto-save")

}
Output:

Transaction committed with auto-save

func (*DBBuilder) Open ΒΆ added in v0.1.0

func (b *DBBuilder) Open(ctx context.Context) (*sql.DB, error)

Open creates and returns a database connection using the configured and validated inputs. This method can only be called after Build() has been successfully executed. It creates an in-memory SQLite database and loads all configured files as tables using streaming.

Table names are derived from file names without extensions:

  • "users.csv" becomes table "users"
  • "data.tsv.gz" becomes table "data"

The returned database connection supports the full SQLite3 SQL syntax. Auto-save functionality is supported for both file paths and reader inputs. The caller is responsible for closing the connection when done.

Returns a *sql.DB connection or an error if the database cannot be created.

Example ΒΆ
package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"path/filepath"

	"github.com/nao1215/filesql"
)

func main() {
	// Create temporary CSV file
	tempDir, _ := os.MkdirTemp("", "filesql-example")
	defer os.RemoveAll(tempDir)

	csvFile := filepath.Join(tempDir, "employees.csv")
	content := "id,name,salary\n1,Alice,50000\n2,Bob,60000\n3,Charlie,55000\n"
	os.WriteFile(csvFile, []byte(content), 0644)

	// Complete builder workflow: AddPath -> Build -> Open
	builder := filesql.NewBuilder().AddPath(csvFile)

	ctx := context.Background()
	validatedBuilder, err := builder.Build(ctx)
	if err != nil {
		log.Fatal(err)
	}

	db, err := validatedBuilder.Open(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Perform complex SQL query
	rows, err := db.Query(`
		SELECT name, salary,
		       salary - (SELECT AVG(salary) FROM employees) as salary_diff
		FROM employees 
		WHERE salary > (SELECT AVG(salary) FROM employees)
		ORDER BY salary DESC
	`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var name string
		var salary, diff float64
		rows.Scan(&name, &salary, &diff)
		fmt.Printf("%s: $%.0f (+$%.0f above average)\n", name, salary, diff)
	}
}
Output:

Bob: $60000 (+$5000 above average)

func (*DBBuilder) SetDefaultChunkSize ΒΆ added in v0.2.0

func (b *DBBuilder) SetDefaultChunkSize(size int) *DBBuilder

SetDefaultChunkSize sets chunk size (number of rows) for large file processing.

Default: 1000 rows. Adjust based on available memory and processing needs.

Example:

builder.SetDefaultChunkSize(5000) // 5000 rows per chunk

Returns self for chaining.

type DumpOptions ΒΆ added in v0.0.4

type DumpOptions struct {
	// Format specifies the output file format
	Format OutputFormat
	// Compression specifies the compression type
	Compression CompressionType
}

DumpOptions configures how database tables are exported to files.

Example:

options := NewDumpOptions().
	WithFormat(OutputFormatTSV).
	WithCompression(CompressionGZ)

err := DumpDatabase(db, "./output", options)

func NewDumpOptions ΒΆ added in v0.0.4

func NewDumpOptions() DumpOptions

NewDumpOptions creates default export options (CSV, no compression).

Modify with:

  • WithFormat(): Change file format (CSV, TSV, LTSV)
  • WithCompression(): Add compression (GZ, BZ2, XZ, ZSTD)

func (DumpOptions) FileExtension ΒΆ added in v0.2.0

func (o DumpOptions) FileExtension() string

FileExtension returns the complete file extension including compression

Example ΒΆ
package main

import (
	"fmt"

	"github.com/nao1215/filesql"
)

func main() {
	// Show how file extensions are built
	examples := []struct {
		format      filesql.OutputFormat
		compression filesql.CompressionType
	}{
		{filesql.OutputFormatCSV, filesql.CompressionNone},
		{filesql.OutputFormatTSV, filesql.CompressionGZ},
		{filesql.OutputFormatLTSV, filesql.CompressionBZ2},
		{filesql.OutputFormatCSV, filesql.CompressionXZ},
		{filesql.OutputFormatTSV, filesql.CompressionZSTD},
	}

	for _, ex := range examples {
		options := filesql.DumpOptions{
			Format:      ex.format,
			Compression: ex.compression,
		}
		fmt.Printf("Format: %-4s, Compression: %-4s -> Extension: %s\n",
			ex.format.String(),
			ex.compression.String(),
			options.FileExtension())
	}

}
Output:

Format: csv , Compression: none -> Extension: .csv
Format: tsv , Compression: gz   -> Extension: .tsv.gz
Format: ltsv, Compression: bz2  -> Extension: .ltsv.bz2
Format: csv , Compression: xz   -> Extension: .csv.xz
Format: tsv , Compression: zstd -> Extension: .tsv.zst

func (DumpOptions) WithCompression ΒΆ added in v0.2.0

func (o DumpOptions) WithCompression(compression CompressionType) DumpOptions

WithCompression adds compression to output files.

Options:

  • CompressionNone: No compression (default)
  • CompressionGZ: Gzip compression (.gz)
  • CompressionBZ2: Bzip2 compression (.bz2)
  • CompressionXZ: XZ compression (.xz)
  • CompressionZSTD: Zstandard compression (.zst)

func (DumpOptions) WithFormat ΒΆ added in v0.2.0

func (o DumpOptions) WithFormat(format OutputFormat) DumpOptions

WithFormat sets the output file format.

Options:

  • OutputFormatCSV: Comma-separated values
  • OutputFormatTSV: Tab-separated values
  • OutputFormatLTSV: Labeled tab-separated values
  • OutputFormatParquet: Apache Parquet columnar format
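
A small sketch combining both options (the resulting extension matches the FileExtension example above):

opts := NewDumpOptions().
	WithFormat(OutputFormatTSV).
	WithCompression(CompressionZSTD)
fmt.Println(opts.FileExtension()) // .tsv.zst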

type ErrorContext ΒΆ added in v0.4.2

type ErrorContext struct {
	Operation string
	FilePath  string
	TableName string
	Details   string
}

ErrorContext provides context for where an error occurred

func NewErrorContext ΒΆ added in v0.4.2

func NewErrorContext(operation, filePath string) *ErrorContext

NewErrorContext creates a new error context

func (*ErrorContext) Error ΒΆ added in v0.4.2

func (ec *ErrorContext) Error(baseErr error) error

Error creates a formatted error with context

func (*ErrorContext) WithDetails ΒΆ added in v0.4.2

func (ec *ErrorContext) WithDetails(details string) *ErrorContext

WithDetails adds details to the error context

func (*ErrorContext) WithTable ΒΆ added in v0.4.2

func (ec *ErrorContext) WithTable(tableName string) *ErrorContext

WithTable adds table context to the error
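
A minimal sketch of building a contextual error; the detail text is hypothetical, and the exact message format produced by Error is an implementation detail:

baseErr := errors.New("malformed record")
ec := NewErrorContext("load", "data.csv").
	WithTable("data").
	WithDetails("row 42 has 3 columns, expected 5")
fmt.Println(ec.Error(baseErr)) // wraps baseErr with operation, file, and table context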

type FileType ΒΆ added in v0.2.0

type FileType int

FileType represents supported file types including compression variants

const (
	// FileTypeCSV represents CSV file type
	FileTypeCSV FileType = iota
	// FileTypeTSV represents TSV file type
	FileTypeTSV
	// FileTypeLTSV represents LTSV file type
	FileTypeLTSV
	// FileTypeParquet represents Parquet file type
	FileTypeParquet
	// FileTypeXLSX represents Excel XLSX file type
	FileTypeXLSX
	// FileTypeCSVGZ represents gzip-compressed CSV file type
	FileTypeCSVGZ
	// FileTypeTSVGZ represents gzip-compressed TSV file type
	FileTypeTSVGZ
	// FileTypeLTSVGZ represents gzip-compressed LTSV file type
	FileTypeLTSVGZ
	// FileTypeParquetGZ represents gzip-compressed Parquet file type
	FileTypeParquetGZ
	// FileTypeCSVBZ2 represents bzip2-compressed CSV file type
	FileTypeCSVBZ2
	// FileTypeTSVBZ2 represents bzip2-compressed TSV file type
	FileTypeTSVBZ2
	// FileTypeLTSVBZ2 represents bzip2-compressed LTSV file type
	FileTypeLTSVBZ2
	// FileTypeParquetBZ2 represents bzip2-compressed Parquet file type
	FileTypeParquetBZ2
	// FileTypeCSVXZ represents xz-compressed CSV file type
	FileTypeCSVXZ
	// FileTypeTSVXZ represents xz-compressed TSV file type
	FileTypeTSVXZ
	// FileTypeLTSVXZ represents xz-compressed LTSV file type
	FileTypeLTSVXZ
	// FileTypeParquetXZ represents xz-compressed Parquet file type
	FileTypeParquetXZ
	// FileTypeCSVZSTD represents zstd-compressed CSV file type
	FileTypeCSVZSTD
	// FileTypeTSVZSTD represents zstd-compressed TSV file type
	FileTypeTSVZSTD
	// FileTypeLTSVZSTD represents zstd-compressed LTSV file type
	FileTypeLTSVZSTD
	// FileTypeParquetZSTD represents zstd-compressed Parquet file type
	FileTypeParquetZSTD
	// FileTypeXLSXGZ represents gzip-compressed Excel XLSX file type
	FileTypeXLSXGZ
	// FileTypeXLSXBZ2 represents bzip2-compressed Excel XLSX file type
	FileTypeXLSXBZ2
	// FileTypeXLSXXZ represents xz-compressed Excel XLSX file type
	FileTypeXLSXXZ
	// FileTypeXLSXZSTD represents zstd-compressed Excel XLSX file type
	FileTypeXLSXZSTD
	// FileTypeUnsupported represents unsupported file type
	FileTypeUnsupported
)
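
FileType is used with DBBuilder.AddReader to declare the format of streamed data. A hedged sketch follows; that AddReader transparently decompresses when given a compressed variant such as FileTypeCSVGZ is an assumption based on the constants above:

f, err := os.Open("data.csv.gz")
if err != nil {
	log.Fatal(err)
}
defer f.Close()
builder := filesql.NewBuilder().
	AddReader(f, "data", filesql.FileTypeCSVGZ) // table "data" from gzip-compressed CSV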

type MemoryInfo ΒΆ added in v0.4.4

type MemoryInfo struct {
	CurrentMB int64        // Current memory usage in MB
	LimitMB   int64        // Memory limit in MB
	Usage     float64      // Usage as a fraction of the limit (0.0-1.0)
	Status    MemoryStatus // Current status
}

MemoryInfo contains detailed memory usage information

type MemoryLimit ΒΆ added in v0.4.4

type MemoryLimit struct {
	// contains filtered or unexported fields
}

MemoryLimit provides configurable memory limits with graceful degradation for file processing operations. It monitors heap usage and can trigger memory management actions when thresholds are exceeded.

The system supports three states:

  • OK: Memory usage is within acceptable limits
  • WARNING: Memory usage approaches the limit, suggesting reduced chunk sizes
  • EXCEEDED: Memory usage has exceeded the limit, processing should be halted

Usage example:

limit := NewMemoryLimit(512) // 512MB limit
if limit.CheckMemoryUsage() == MemoryStatusExceeded {
    return limit.CreateMemoryError("processing")
}

Performance Note: CheckMemoryUsage() calls runtime.ReadMemStats which can pause for milliseconds. Use sparingly in hot paths.

Thread Safety: All methods are safe for concurrent use by multiple goroutines.

func NewMemoryLimit ΒΆ added in v0.4.4

func NewMemoryLimit(maxMemoryMB int64) *MemoryLimit

NewMemoryLimit creates a new memory limit configuration

func (*MemoryLimit) CheckMemoryUsage ΒΆ added in v0.4.4

func (ml *MemoryLimit) CheckMemoryUsage() MemoryStatus

CheckMemoryUsage checks current memory usage against limits

func (*MemoryLimit) CreateMemoryError ΒΆ added in v0.4.4

func (ml *MemoryLimit) CreateMemoryError(operation string) error

CreateMemoryError creates a memory limit error with helpful context

func (*MemoryLimit) Disable ΒΆ added in v0.4.4

func (ml *MemoryLimit) Disable()

Disable disables memory limit checking

func (*MemoryLimit) Enable ΒΆ added in v0.4.4

func (ml *MemoryLimit) Enable()

Enable enables memory limit checking

func (*MemoryLimit) GetMemoryInfo ΒΆ added in v0.4.4

func (ml *MemoryLimit) GetMemoryInfo() MemoryInfo

GetMemoryInfo returns current memory usage information

func (*MemoryLimit) IsEnabled ¶ added in v0.4.4

func (ml *MemoryLimit) IsEnabled() bool

IsEnabled returns whether memory limits are enabled

func (*MemoryLimit) SetWarningThreshold ¶ added in v0.4.4

func (ml *MemoryLimit) SetWarningThreshold(threshold float64)

SetWarningThreshold sets the warning threshold as a fraction of the memory limit (0.0-1.0)

func (*MemoryLimit) ShouldReduceChunkSize ¶ added in v0.4.4

func (ml *MemoryLimit) ShouldReduceChunkSize(chunkSize int) (bool, int)

ShouldReduceChunkSize reports whether the chunk size should be reduced for memory management; the second return value is the suggested smaller chunk size
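
A sketch of how these checks could drive an adaptive chunked-processing loop; hasMoreChunks, processChunk, and the starting chunk size are hypothetical stand-ins, not part of this package:

limit := NewMemoryLimit(512)
limit.SetWarningThreshold(0.8) // warn at 80% of the limit
chunkSize := 10000             // hypothetical starting chunk size (rows)
for hasMoreChunks() {          // hypothetical loop condition
	if limit.CheckMemoryUsage() == MemoryStatusExceeded {
		return limit.CreateMemoryError("chunk processing")
	}
	if reduce, newSize := limit.ShouldReduceChunkSize(chunkSize); reduce {
		chunkSize = newSize // back off while memory is under pressure
	}
	processChunk(chunkSize) // hypothetical per-chunk work
}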

type MemoryPool ¶ added in v0.4.4

type MemoryPool struct {
	// contains filtered or unexported fields
}

MemoryPool manages a pool of reusable byte slices, record slices, and string slices to reduce memory allocations during file processing operations.

The pool automatically manages object lifecycles and includes safeguards against memory leaks by limiting the maximum size of objects that can be returned to the pool. Objects that grow beyond maxSize are discarded rather than pooled.

Usage example:

pool := NewMemoryPool(1024 * 1024) // 1MB max buffer size
buffer := pool.GetByteBuffer()
defer pool.PutByteBuffer(buffer)
// Use buffer...

Thread Safety: All methods are safe for concurrent use by multiple goroutines.

func NewMemoryPool ¶ added in v0.4.4

func NewMemoryPool(maxSize int) *MemoryPool

NewMemoryPool creates a new memory pool with configurable max buffer size

func (*MemoryPool) ForceGC ¶ added in v0.4.4

func (mp *MemoryPool) ForceGC()

ForceGC forces garbage collection and clears pools if memory pressure is high

func (*MemoryPool) GetByteBuffer ¶ added in v0.4.4

func (mp *MemoryPool) GetByteBuffer() []byte

GetByteBuffer gets a byte buffer from the pool

func (*MemoryPool) GetRecordSlice ¶ added in v0.4.4

func (mp *MemoryPool) GetRecordSlice() []Record

GetRecordSlice gets a record slice from the pool

func (*MemoryPool) GetStringSlice ¶ added in v0.4.4

func (mp *MemoryPool) GetStringSlice() []string

GetStringSlice gets a string slice from the pool

func (*MemoryPool) PutByteBuffer ¶ added in v0.4.4

func (mp *MemoryPool) PutByteBuffer(buf []byte)

PutByteBuffer returns a byte buffer to the pool if it's not too large

func (*MemoryPool) PutRecordSlice ¶ added in v0.4.4

func (mp *MemoryPool) PutRecordSlice(slice []Record)

PutRecordSlice returns a record slice to the pool if it's not too large

func (*MemoryPool) PutStringSlice ¶ added in v0.4.4

func (mp *MemoryPool) PutStringSlice(slice []string)

PutStringSlice returns a string slice to the pool if it's not too large
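
A sketch of reusing pooled string slices across parse iterations; lines and parseLine are hypothetical stand-ins for the caller's input and tokenizer:

pool := NewMemoryPool(1024 * 1024) // objects that grow past 1MB are discarded, not pooled
for _, line := range lines {       // hypothetical input
	fields := pool.GetStringSlice()
	fields = append(fields, parseLine(line)...) // hypothetical tokenizer
	// ... use fields before returning them ...
	pool.PutStringSlice(fields) // pooled again only if still within the size cap
}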

type MemoryStatus ¶ added in v0.4.4

type MemoryStatus int

MemoryStatus represents the current memory status

const (
	// MemoryStatusOK indicates memory usage is within acceptable limits
	MemoryStatusOK MemoryStatus = iota
	// MemoryStatusWarning indicates memory usage is approaching the limit
	MemoryStatusWarning
	// MemoryStatusExceeded indicates memory usage has exceeded the limit
	MemoryStatusExceeded
)

Memory status constants

func (MemoryStatus) String ¶ added in v0.4.4

func (ms MemoryStatus) String() string

String returns string representation of memory status
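
A sketch of branching on the status returned by CheckMemoryUsage; the responses in each case are suggestions, not package behavior:

switch status := limit.CheckMemoryUsage(); status {
case MemoryStatusOK:
	// continue at the current chunk size
case MemoryStatusWarning:
	// consider shrinking chunks via ShouldReduceChunkSize
case MemoryStatusExceeded:
	log.Println("memory status:", status) // prints via the String method
}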

type OutputFormat ¶ added in v0.0.4

type OutputFormat int

OutputFormat represents the output file format

const (
	// OutputFormatCSV represents CSV output format
	OutputFormatCSV OutputFormat = iota
	// OutputFormatTSV represents TSV output format
	OutputFormatTSV
	// OutputFormatLTSV represents LTSV output format
	OutputFormatLTSV
	// OutputFormatParquet represents Parquet output format
	OutputFormatParquet
	// OutputFormatXLSX represents Excel XLSX output format
	OutputFormatXLSX
)

func (OutputFormat) Extension ¶ added in v0.2.0

func (f OutputFormat) Extension() string

Extension returns the file extension for the format

func (OutputFormat) String ¶ added in v0.2.0

func (f OutputFormat) String() string

String returns the string representation of OutputFormat
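
A sketch that lists each format alongside its extension; the exact strings returned by String and Extension are not documented here, so none are asserted:

for _, f := range []OutputFormat{
	OutputFormatCSV, OutputFormatTSV, OutputFormatLTSV, OutputFormatParquet, OutputFormatXLSX,
} {
	fmt.Printf("%s -> %s\n", f, f.Extension()) // format name and its file extension
}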

type Record ¶ added in v0.4.4

type Record []string

Record represents a single file record as a slice of string fields. This type was changed from the unexported 'record' to the exported 'Record' in v0.5.0 to fix lint issues with exported methods returning unexported types.

Breaking change: code that previously relied on the unexported 'record' type will need to be updated to use 'Record'.
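
Because Record is a named []string, a row can be built with an ordinary composite literal (the field values below are arbitrary):

rec := Record{"alice", "30", "tokyo"} // one row of string fields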

type TableName ¶ added in v0.4.2

type TableName struct {
	// contains filtered or unexported fields
}

TableName represents a table name with validation

func NewTableName ¶ added in v0.4.2

func NewTableName(name string) TableName

NewTableName creates a new TableName with validation

func (TableName) Equal ¶ added in v0.4.2

func (tn TableName) Equal(other TableName) bool

Equal compares two table names

func (TableName) Sanitize ¶ added in v0.4.2

func (tn TableName) Sanitize() TableName

Sanitize returns a sanitized version of the table name

func (TableName) String ¶ added in v0.4.2

func (tn TableName) String() string

String returns the string representation of TableName
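
A sketch exercising the TableName API; the input name is arbitrary and the sanitized output is not asserted, since the sanitization rules are not specified here:

name := NewTableName("sales 2024")
clean := name.Sanitize() // returns a sanitized copy of the table name
fmt.Println(clean.String(), name.Equal(clean))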
