# db-toolkit

Arrow-based data I/O for Go — like io.Reader/io.Writer, but for databases and file formats.
## Overview
db-toolkit provides a unified streaming interface for moving data between databases and file formats using Apache Arrow as the in-memory representation. It follows a pull model: readers produce BatchStreams of Arrow records, writers consume them, and dio.Copy wires them together.
```
    Reader             BatchStream            Writer
┌────────────┐      ┌──────────────┐      ┌────────────┐
│ Source     │      │ Schema()     │      │ Destination│
│ Schema()   │─────▶│ Next(ctx)    │─────▶│            │
│ Read(ctx)  │      │ Close()      │      │ WriteStream│
└────────────┘      └──────────────┘      └────────────┘
```
Features:
- Pull-model streaming — readers produce batches on demand, keeping memory usage bounded
- Filter pushdown — push predicates down to the source (SQL WHERE clauses, Parquet row-group pruning)
- Column projection — read only the columns you need
- Platform registry — database platforms self-register via init(), activated with a blank import
## Package Layout
| Package | Description |
|---------|-------------|
| `dio` | Core interfaces: Reader, Writer, BatchStream, Filter, Copy |
| `dio/csv` | CSV reader and writer |
| `dio/parquet` | Parquet reader and writer with row-group pruning via filter pushdown |
| `dio/gen` | Synthetic data generation for testing |
| `platform` | Platform registry (Register, Get, List) |
| `platform/postgres` | PostgreSQL reader, writer, and filter pushdown |
## Quick Start
```go
package main

import (
	"context"
	"os"

	"github.com/rnestertsov/db-toolkit/dio"
	"github.com/rnestertsov/db-toolkit/dio/csv"
	"github.com/rnestertsov/db-toolkit/platform"
	"github.com/rnestertsov/db-toolkit/platform/postgres"
)

func main() {
	ctx := context.Background()

	pg, err := platform.Get("postgres")
	if err != nil {
		panic(err)
	}

	conn, err := pg.OpenConnection(ctx, postgres.ConnectionConfig{
		Name: "mydb",
		User: "myuser",
		Host: "localhost",
		Port: "5432",
	})
	if err != nil {
		panic(err)
	}
	defer conn.Close(ctx)

	reader, err := conn.Query(ctx, "SELECT id, name, email FROM users")
	if err != nil {
		panic(err)
	}

	writer := csv.NewWriter(os.Stdout)
	if err := dio.Copy(ctx, reader, writer); err != nil {
		panic(err)
	}
}
```
## Key Concepts

### BatchStream
A BatchStream is a pull-based iterator over Arrow record batches. Call Next(ctx) to get the next batch; it returns io.EOF when exhausted. Callers must call Release() on each returned record.
```go
stream, err := reader.Read(ctx)
if err != nil {
	return err
}
defer stream.Close()

for {
	rec, err := stream.Next(ctx)
	if errors.Is(err, io.EOF) {
		break
	}
	if err != nil {
		return err
	}
	// process rec
	rec.Release()
}
```
### Filter Pushdown
Pass filters as ReadOptions to push predicates to the source. SQL databases translate them into WHERE clauses; Parquet readers use them for row-group pruning.
```go
stream, err := reader.Read(ctx,
	dio.WithFilter([]dio.Filter{
		dio.Eq{Column: 0, Value: 42},
		dio.GtEq{Column: 1, Value: "2024-01-01"},
	}),
)
```
Readers optionally implement dio.FilterClassifier to report how precisely they handle each filter (FilterExact, FilterInexact, or FilterUnsupported).
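To make the three precision levels concrete, here is a self-contained sketch of the kind of classification a Parquet-style reader might report. The `Classification` constants and the `Eq`/`Regex` filter types below are illustrative stand-ins, not the actual dio API:

```go
package main

import "fmt"

// Classification mirrors the three precision levels described above.
type Classification int

const (
	FilterExact       Classification = iota // source applies the filter fully
	FilterInexact                           // source prunes, but false positives remain
	FilterUnsupported                       // source cannot apply the filter
)

type Filter interface{ isFilter() }

// Eq is an equality predicate on a column (illustrative).
type Eq struct {
	Column int
	Value  any
}

func (Eq) isFilter() {}

// Regex is a pattern predicate that typically cannot be pushed down (illustrative).
type Regex struct {
	Column  int
	Pattern string
}

func (Regex) isFilter() {}

// classify sketches a Parquet-style report: equality can prune row groups
// via min/max statistics but may keep false positives (inexact), while a
// regex match cannot be pushed down at all.
func classify(f Filter) Classification {
	switch f.(type) {
	case Eq:
		return FilterInexact
	case Regex:
		return FilterUnsupported
	default:
		return FilterUnsupported
	}
}

func main() {
	fmt.Println(classify(Eq{Column: 0, Value: 42}) == FilterInexact)        // true
	fmt.Println(classify(Regex{Column: 1, Pattern: "^a"}) == FilterUnsupported) // true
}
```

An inexact classification tells the caller it must still re-apply the predicate to the returned batches; an exact one lets it skip that second pass.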
### Column Projection
Read only the columns you need:
```go
stream, err := reader.Read(ctx,
	dio.WithProjection([]int{0, 2, 5}),
)
```
### Platform Registry

Platforms register themselves at import time. Activate a platform with a blank import, then retrieve it by name:
```go
import _ "github.com/rnestertsov/db-toolkit/platform/postgres"

pg, err := platform.Get("postgres")
```
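The registry itself follows the same pattern as database/sql driver registration. A minimal self-contained sketch, assuming a simplified `Platform` interface (the real platform package's types may differ):

```go
package main

import (
	"fmt"
	"sync"
)

// Platform is a simplified stand-in for the registry's platform interface.
type Platform interface {
	Name() string
}

var (
	mu        sync.RWMutex
	platforms = map[string]Platform{}
)

// Register is called from a driver package's init(), so a blank
// import is enough to make the platform available by name.
func Register(p Platform) {
	mu.Lock()
	defer mu.Unlock()
	platforms[p.Name()] = p
}

// Get looks up a previously registered platform by name.
func Get(name string) (Platform, error) {
	mu.RLock()
	defer mu.RUnlock()
	p, ok := platforms[name]
	if !ok {
		return nil, fmt.Errorf("platform %q not registered", name)
	}
	return p, nil
}

// fakePostgres plays the role of a driver package; in the real library
// the Register call would live in that package's init().
type fakePostgres struct{}

func (fakePostgres) Name() string { return "postgres" }

func init() { Register(fakePostgres{}) }

func main() {
	p, err := Get("postgres")
	if err != nil {
		panic(err)
	}
	fmt.Println(p.Name()) // postgres
}
```

Keeping the registry behind a mutex makes Register safe even if multiple driver packages initialize concurrently.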
| Platform | Package | Status |
|----------|---------|--------|
| PostgreSQL | `platform/postgres` | Supported |
Adding a new platform? See docs/adding-a-platform.md.
## Development

### Prerequisites
- Go 1.24+
- Docker (for integration tests using testcontainers)
### Build

```sh
make build
```
### Test

```sh
make test
```
Integration tests spin up real database instances via testcontainers, so Docker must be running.
## License
This project is licensed under the MIT License — see the LICENSE file for details.