porter command module v0.10.0
Published: Jun 21, 2025 License: MIT Imports: 8 Imported by: 0


Porter

Zero‑copy analytics, delivered at Mach Arrow.


Status: Experimental
Porter is under active development; it is suitable for development and experimental use, but not yet recommended for production.


Seeking Testers & Contributors
Porter is actively evolving, and I'm looking for developers willing to spend a few hours testing the use cases you have in mind.
If you hit a bug, open an issue or send a PR; feel free to assign it to me.
Your feedback will shape the future of fast, simple, open SQL over Arrow Flight.

Ready to contribute? Check out our Contributing Guidelines for community standards, development setup, and technical details.


What is Porter?

Porter is a high-performance Flight SQL server that makes Arrow-native analytics effortless. It provides a flexible abstraction layer for different database backends, supporting both DuckDB and ClickHouse with a unified Flight SQL interface. It bridges the gap between analytical databases and modern data infrastructure by speaking the Flight SQL protocol natively.

Think of it as analytics with wings – all the analytical power you love, now network-accessible with zero-copy Arrow streaming.

Key Features

Blazing Fast
  • Zero-copy Arrow streaming for maximum throughput
  • Connection pooling and resource optimization
  • In-memory caching with LRU eviction
  • Concurrent query execution with transaction support
Developer Friendly
  • Single binary deployment
  • Configuration via YAML or environment variables
  • Docker support with optimized images
  • Comprehensive CLI with intuitive commands
  • Extensive middleware support for customization
Standards Compliant
  • Apache Arrow Flight SQL protocol
  • JDBC/ODBC compatibility through Flight SQL drivers
  • SQL standard compliance via database backends
  • Cross-platform support (Linux, macOS, Windows)
Built-in Benchmarking
  • TPC-H benchmark suite with DuckDB's native TPC-H module
  • Multiple output formats (table, JSON, Arrow)
  • Configurable scale factors and iterations
  • Query plan analysis and performance metrics
  • Zero setup – no external data files required
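The in-memory cache with LRU eviction listed above is a standard pattern: a doubly linked list tracks recency while a map gives O(1) lookup. A minimal Go sketch of the idea (illustrative only, not Porter's actual cache; the names newLRU, Get, and Put are hypothetical):

```go
package main

import "container/list"

// entry is a cached key/value pair stored in the recency list.
type entry struct {
	key string
	val any
}

// lru is a fixed-capacity LRU cache: the list front is the most
// recently used entry, the back is the next eviction candidate.
type lru struct {
	cap   int
	order *list.List
	items map[string]*list.Element
}

func newLRU(capacity int) *lru {
	return &lru{cap: capacity, order: list.New(), items: map[string]*list.Element{}}
}

// Get returns the cached value and marks the entry as most recent.
func (c *lru) Get(key string) (any, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el) // touching an entry refreshes its recency
	return el.Value.(*entry).val, true
}

// Put inserts or updates a value, evicting the least recently
// used entry when the cache is full.
func (c *lru) Put(key string, val any) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).val = val
		c.order.MoveToFront(el)
		return
	}
	if c.order.Len() >= c.cap {
		back := c.order.Back() // least recently used
		delete(c.items, back.Value.(*entry).key)
		c.order.Remove(back)
	}
	c.items[key] = c.order.PushFront(&entry{key, val})
}
```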

Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Flight SQL    │    │     Porter       │    │    Database     │
│    Clients      │◄──►│   Flight SQL     │◄──►│    Backends     │
│                 │    │     Server       │    │                 │
└─────────────────┘    └──────────────────┘    └─────────────────┘
        │                        │                        │
        │                        │                        │
    ┌───▼─────┐             ┌────▼─────┐             ┌────▼────┐
    │  Arrow  │             │ Metrics  │             │ Schema  │
    │Streaming│             │Prometheus│             │  Cache  │
    └─────────┘             └──────────┘             └─────────┘

Core Components:

  • Flight SQL Handler – Protocol implementation and query routing
  • Database Backends – Flexible abstraction supporting DuckDB and ClickHouse
  • Connection Pool – Efficient database connection management with backend-specific optimizations
  • Cache Layer – Smart caching for schemas and query results
  • Metrics Collection – Comprehensive observability and monitoring
  • Middleware Stack – Authentication, logging, and custom extensions
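A connection pool like the one described above can be pictured as a bounded set of reusable handles. A toy sketch using a buffered channel as the free list (illustrative of the concept only, not Porter's pool; conn, newPool, Get, and Put are hypothetical names):

```go
package main

import "errors"

// conn stands in for a database connection handle.
type conn struct{ id int }

// pool hands out at most cap(ch) connections; Get fails fast
// instead of blocking when the pool is exhausted.
type pool struct{ ch chan *conn }

func newPool(size int) *pool {
	p := &pool{ch: make(chan *conn, size)}
	for i := 0; i < size; i++ {
		p.ch <- &conn{id: i} // pre-create the handles
	}
	return p
}

// Get takes a free connection, or errors if none is available.
func (p *pool) Get() (*conn, error) {
	select {
	case c := <-p.ch:
		return c, nil
	default:
		return nil, errors.New("pool exhausted")
	}
}

// Put returns a connection to the free list for reuse.
func (p *pool) Put(c *conn) { p.ch <- c }
```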

Quick Start

Prerequisites
  • Go 1.24+ (for building from source)
  • DuckDB (bundled with Go bindings)
  • ClickHouse server (optional, for ClickHouse backend)
Installation
Option 1: Build from Source
git clone https://github.com/TFMV/porter.git
cd porter
make build
Option 2: Using Go Install
go install github.com/TFMV/porter/cmd/server@latest
Running Porter
Basic Usage
# Start with in-memory DuckDB database
./porter serve

# Start with persistent DuckDB database
./porter serve --database ./my-data.db --address 0.0.0.0:32010

# Start with ClickHouse backend
./porter serve --database "clickhouse://localhost:9000/default"

# Start with configuration file
./porter serve --config ./config.yaml
Using Docker
# Build the image
make docker-build

# Run the container
make docker-run

# Or run directly
docker run -p 32010:32010 -p 9090:9090 porter:latest
Configuration

Create a config.yaml file:

server:
  address: "0.0.0.0:32010"
  max_connections: 100
  connection_timeout: "30s"
  query_timeout: "5m"
  max_message_size: 16777216  # 16MB

database:
  # DuckDB backend (default)
  dsn: "duckdb://./data.db"
  # ClickHouse backend
  # dsn: "clickhouse://localhost:9000/default?username=default&password="
  max_open_conns: 32
  max_idle_conns: 8
  conn_max_lifetime: "1h"
  health_check_period: "30s"

logging:
  level: "info"
  format: "json"

metrics:
  enabled: true
  address: ":9090"

tls:
  enabled: false
  cert_file: ""
  key_file: ""

auth:
  enabled: false
  type: "oauth2"

Supported Database Backends

Porter provides a unified Flight SQL interface across multiple database backends:

DuckDB Backend
  • Connection String: duckdb://path/to/database.db or duckdb://:memory:
  • Features: Full analytical SQL, columnar storage, embedded deployment
  • Use Cases: Development, analytics workloads, embedded applications
  • Performance: Excellent for OLAP queries, TPC-H benchmarks
ClickHouse Backend
  • Connection String: clickhouse://host:port/database?username=user&password=pass
  • Features: Distributed analytics, real-time ingestion, horizontal scaling
  • Use Cases: Large-scale analytics, time-series data, real-time dashboards
  • Performance: Optimized for high-throughput analytical workloads
Backend Selection

The backend is automatically detected from the DSN connection string:

database:
  # DuckDB (embedded)
  dsn: "duckdb://./analytics.db"
  
  # ClickHouse (distributed)  
  dsn: "clickhouse://localhost:9000/analytics?username=default"
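The scheme-based detection could look roughly like this in Go (a hypothetical helper; Porter's actual routing logic may differ):

```go
package main

import (
	"fmt"
	"strings"
)

// detectBackend picks a backend from the scheme prefix of a DSN,
// e.g. "duckdb://./analytics.db" -> "duckdb".
func detectBackend(dsn string) (string, error) {
	scheme, _, found := strings.Cut(dsn, "://")
	if !found {
		return "", fmt.Errorf("DSN %q has no scheme", dsn)
	}
	switch s := strings.ToLower(scheme); s {
	case "duckdb", "clickhouse":
		return s, nil
	default:
		return "", fmt.Errorf("unsupported backend scheme %q", scheme)
	}
}
```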

Both backends support:

  • Full Flight SQL protocol compliance
  • Arrow-native streaming for zero-copy data transfer
  • Connection pooling and resource management
  • Transaction support and prepared statements
  • Schema discovery and metadata operations

Connecting to Porter

Python (using ADBC)
import adbc_driver_flightsql.dbapi

# Connect to Porter
conn = adbc_driver_flightsql.dbapi.connect(
    "grpc://localhost:32010"
)

# Execute queries
cursor = conn.cursor()
cursor.execute("SELECT * FROM my_table LIMIT 10")
results = cursor.fetchall()
Python (using PyArrow)
from pyarrow import flight

# Connect to Porter Flight SQL server
client = flight.FlightClient("grpc://localhost:32010")

# Execute query and get results as Arrow table
info = client.get_flight_info(
    flight.FlightDescriptor.for_command(b"SELECT * FROM my_table")
)
table = client.do_get(info.endpoints[0].ticket).read_all()
Java (using Arrow Flight SQL JDBC)
String url = "jdbc:arrow-flight-sql://localhost:32010";
Connection conn = DriverManager.getConnection(url);

Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery("SELECT * FROM my_table");

while (rs.next()) {
    System.out.println(rs.getString(1));
}

Benchmarking

Porter includes a comprehensive TPC-H benchmarking suite that uses DuckDB's built-in TPC-H module for reproducible performance testing.

Quick Benchmark
# Run a single query with small dataset
./porter bench --query q1 --scale 0.01

# Run multiple queries with medium dataset  
./porter bench --query q1,q3,q5 --scale 0.1 --iterations 3

# Run all queries with full dataset
./porter bench --all --scale 1 --format json --output results.json
Example Output
TPC-H Benchmark Results
=======================

Configuration:
  Scale Factor: 0.01
  Iterations: 1
  Total Time: 120.5ms

Environment:
  DuckDB Version: v1.3.1
  Platform: darwin/arm64

Results:
Query  Iter Time         Rows     Status
-----  ---- ----         ----     ------
q1     1    2.952ms      4        OK
q6     1    0.562ms      1        OK
q10    1    3.665ms      20       OK
Benchmark Analysis

Use the included Python script to analyze results:

# Analyze single benchmark
python3 examples/benchmark_analysis.py results.json

# Compare two benchmark runs
python3 examples/benchmark_analysis.py --compare baseline.json current.json

For detailed benchmarking documentation, see docs/benchmarking.md.

Microbenchmarks

Porter also ships a small suite of microbenchmarks under bench/ for quick performance checks. First generate a dataset:

go run ./bench/gen_data.go --rows=100000 --cols=5 --format=parquet --out=test.parquet

Run the benchmarks with:

make bench

Results are printed in standard Go benchmark format and can be formatted as CSV or Markdown using the helpers in bench/report.go.

Development

Building
# Standard build
make build

# Optimized build (smaller binary)
make optimize

# Show binary size and dependencies
make size
Testing
# Run all tests
make test

# Run with verbose output
go test -v ./...

# Run specific package tests
go test -v ./pkg/handlers
Development Server
# Run with hot reload (requires air)
air

# Or run directly
make run

Monitoring & Observability

Metrics

Porter exposes Prometheus metrics on :9090 by default:

  • Query Performance: Execution time, throughput, error rates
  • Connection Pool: Active connections, wait times, health status
  • Cache Performance: Hit/miss ratios, eviction rates
  • Resource Usage: Memory allocation, GC statistics
  • Flight SQL Operations: RPC call counts, streaming metrics
Health Checks
# Check server health
curl http://localhost:9090/health

# Prometheus metrics
curl http://localhost:9090/metrics
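A health endpoint like the one probed above is typically a trivial HTTP handler returning a small JSON document. An illustrative stdlib sketch (not Porter's code; healthResponse and healthHandler are hypothetical names):

```go
package main

import (
	"encoding/json"
	"io"
	"net/http"
)

// healthResponse builds the JSON body served at GET /health.
func healthResponse() string {
	b, _ := json.Marshal(map[string]string{"status": "ok"})
	return string(b)
}

// healthHandler is the kind of minimal handler a liveness probe needs;
// in practice it would share a mux with the /metrics endpoint.
func healthHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	io.WriteString(w, healthResponse())
}
```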
Logging

Structured logging with configurable levels:

# Set log level
export PORTER_LOG_LEVEL=debug

# JSON format (default)
export PORTER_LOG_FORMAT=json

# Human-readable format
export PORTER_LOG_FORMAT=console
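The json/console switch above maps naturally onto Go's log/slog handlers. A sketch of how such a constructor might look (illustrative; newLogger is a hypothetical helper, not Porter's logging setup):

```go
package main

import (
	"log/slog"
	"os"
)

// newLogger builds a structured logger from the configured
// format ("json" or "console") and level ("debug", "info", ...).
func newLogger(format, level string) *slog.Logger {
	var lvl slog.Level
	_ = lvl.UnmarshalText([]byte(level)) // falls back to Info on bad input
	opts := &slog.HandlerOptions{Level: lvl}
	if format == "json" {
		return slog.New(slog.NewJSONHandler(os.Stderr, opts))
	}
	return slog.New(slog.NewTextHandler(os.Stderr, opts)) // human-readable
}
```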

Advanced Configuration

Environment Variables

All configuration options can be set via environment variables with the PORTER_ prefix:

export PORTER_ADDRESS="0.0.0.0:32010"
export PORTER_DATABASE=":memory:"
export PORTER_LOG_LEVEL="debug"
export PORTER_METRICS_ENABLED="true"
export PORTER_MAX_CONNECTIONS="100"
Command Line Options
./porter serve --help

# Key options:
--address string              Server listen address (default "0.0.0.0:32010")
--database string             DuckDB database path (default ":memory:")
--config string               Configuration file path
--log-level string            Log level: debug, info, warn, error (default "info")
--max-connections int         Maximum concurrent connections (default 100)
--metrics                     Enable Prometheus metrics (default true)
--metrics-address string      Metrics server address (default ":9090")
--tls                         Enable TLS encryption
--auth                        Enable authentication

Contributing

We welcome contributions! Porter is designed to be extensible and community-driven.

Areas for Contribution
  • Performance optimizations and benchmarking
  • Additional authentication providers (JWT, LDAP, etc.)
  • Enhanced monitoring and alerting capabilities
  • Client libraries and integrations
  • Documentation and examples
  • Testing and quality assurance
Development Setup
# Clone the repository
git clone https://github.com/TFMV/porter.git
cd porter

# Install dependencies
go mod download

# Run tests
make test

# Build and run
make build
./porter serve

Documentation

Porter and the Future of Data Migration

With Porter's current capabilities:

  • Multi-backend support (DuckDB and ClickHouse, with more planned)
  • Unified Flight SQL interface across all backends
  • Arrow-native streaming between different database systems

And as Porter gains:

  • Distributed query execution (via DoExchange)
  • Additional backend support (PostgreSQL, BigQuery, etc.)
  • Enhanced Flight SQL interoperability

…it becomes not just a query engine, but a conduit — a streamlined pathway for moving structured data between systems without serialization overhead.

Why This Matters

Most data migration today is:

  • Batch-oriented
  • File-based (CSV, Parquet)
  • Bloated with ETL pipelines and format juggling

With Porter, you get:

  • Arrow in, Arrow out: A columnar memory format travels intact
  • Query-based filtering: Only move what matters
  • Streaming transport: No intermediate staging or local disk
  • Backend-agnostic execution: Read from DuckDB, write to ClickHouse, or vice versa
Example Use Case
# Query DuckDB and stream results to ClickHouse (conceptual example)
# Currently: Use Porter as a unified Flight SQL gateway
porter serve --database "duckdb://source.db" &
porter serve --database "clickhouse://localhost:9000/target" --address :32011 &

# Future: Direct data migration pipeline
porter migrate --from "duckdb://source.db" --to "clickhouse://localhost:9000/target" \
  --query "SELECT * FROM users WHERE active = true" --target-table users_active

With distributed query + DoExchange, this grows into:

  • Live sync jobs between heterogeneous databases
  • Selective replication across cloud and edge
  • In-flight transformations using SQL only

Known Issues & Limitations

  • Experimental status: APIs may change without notice
  • Limited authentication: Only basic auth and OAuth2 currently supported
  • Single-node only: No distributed query execution yet
  • Memory constraints: Large result sets may cause memory pressure
  • Platform support: Some features may be limited on Windows

License

Released under the MIT License. See LICENSE for details.



Directories

Path Synopsis
entry command
cmd
  benchmark command
  server command: Package main provides the entry point for the DuckDB Flight SQL Server.
  server/config: Package config provides configuration structures for the Flight SQL server.
  server/middleware: Package middleware provides gRPC middleware for the Flight SQL server.
test-flight command
pkg
  benchmark: Package benchmark provides TPC-H benchmarking capabilities for Porter.
  errors: Package errors provides standardized error types for the Flight SQL server.
  handlers: Package handlers contains Flight SQL protocol handlers.
  infrastructure: Package infrastructure provides enterprise-grade JDBC compatibility for Porter.
  infrastructure/converter: Package converter provides type conversion between DuckDB and Apache Arrow.
  infrastructure/metrics: Package metrics provides metrics collection for the Flight SQL server.
  infrastructure/pool: Package pool provides database connection pooling for DuckDB.
  models: Package models provides data structures used throughout the Flight SQL server.
  repositories: Package repositories defines interfaces for data access operations.
  repositories/clickhouse: Package clickhouse provides ClickHouse-specific repository implementations.
  repositories/duckdb: Package duckdb provides DuckDB-specific repository implementations.
  services: Package services contains business logic implementations.
test
