Porter
A streaming-first Arrow Flight SQL server for DuckDB: simple, sharp, and built for motion.
Overview
Porter is a DuckDB-backed Arrow Flight SQL server designed around one idea:
SQL goes in. Arrow streams out. Everything else is detail.
It sits directly on top of Apache Arrow Flight SQL and exposes a clean execution surface for both raw SQL and prepared statements.
No orchestration layer. No distributed query engine. No abstraction sprawl.
Just a tight execution loop between Flight and DuckDB.
Key Characteristics
- Streaming-first execution model (Arrow RecordBatch streams)
- Native DuckDB execution via ADBC
- Full prepared statement lifecycle with parameter binding
- TTL-based handle management with background GC
- Minimal, explicit Flight SQL surface area
Architecture
Porter keeps the control flow linear:
+---------------------+
|    Flight Client    |
+---------------------+
           |
     gRPC / Flight
           |
+---------------------+
|    Porter Server    |
|---------------------|
|  Flight SQL Layer   |
|  Handle Manager     |
|  Prepared Stmts     |
|  Stream Engine      |
+---------------------+
           |
+---------------------+
|       DuckDB        |
|     (via ADBC)      |
+---------------------+
           |
+---------------------+
| Arrow RecordBatches |
+---------------------+
The server is intentionally thin: routing, lifecycle, and streaming glue only.
DuckDB does the heavy lifting.
Getting Started
0. Install DuckDB driver
Before anything else:
./install_duckdb.sh
This sets up the required DuckDB ADBC driver environment.
1. Run the Server
go run ./cmd/server
Defaults:
- Address: localhost:32010
- Database: in-memory DuckDB (:memory:)
2. Run a Client
You have two ways to exercise the system:
Native client
go run ./cmd/client
Example harness
go run ./example
Both will issue queries and stream Arrow record batches back from Flight.
CLI Usage
Porter also exposes a developer-facing CLI under cmd/porter. The built CLI is a small, composable tool for local workflows.
Build and use the Porter CLI
go build -o porter ./cmd/porter
./porter --help
Run the server
The default action is serve, so ./porter behaves the same as ./porter serve.
./porter serve --db :memory: --port 32010
# or simply
./porter --db :memory: --port 32010
Execute a single query
./porter query "SELECT 1 AS value"
Start an interactive REPL
./porter repl
Load Parquet data
./porter load data.parquet
Inspect a table schema
./porter schema table_name
Environment variables
PORTER_DB and PORTER_PORT are supported as alternate configuration sources.
Execution Model
Porter supports two execution paths:
1. One-shot SQL
GetFlightInfoStatement → plan + handle
DoGetStatement → stream results
Handles are ephemeral and auto-expire under the TTL.
2. Prepared Statements
CreatePreparedStatement → persistent handle
DoPutPreparedStatementQuery → bind parameters
DoGetPreparedStatement → execute + stream
ClosePreparedStatement → cleanup
Parameter batches are real Arrow RecordBatches, reference-counted and safely transferred across execution boundaries.
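The four-step lifecycle above can be modelled with a small in-memory registry. This is a standard-library sketch under stated assumptions, not Porter's implementation: `registry`, `preparedStmt`, and the method names are hypothetical, and a plain `[]any` stands in for the bound Arrow parameter batch.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errUnknownHandle = errors.New("unknown or closed statement handle")

// preparedStmt is a hypothetical stand-in for server-side statement state.
type preparedStmt struct {
	query  string
	params []any // stands in for a bound Arrow parameter batch
}

// registry maps opaque handles to prepared statements.
type registry struct {
	mu    sync.Mutex
	next  int
	stmts map[string]*preparedStmt
}

func newRegistry() *registry {
	return &registry{stmts: make(map[string]*preparedStmt)}
}

// Create mirrors CreatePreparedStatement: it stores the query and returns a handle.
func (r *registry) Create(query string) string {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.next++
	h := fmt.Sprintf("stmt-%d", r.next)
	r.stmts[h] = &preparedStmt{query: query}
	return h
}

// Bind mirrors DoPutPreparedStatementQuery: it attaches parameters to a handle.
func (r *registry) Bind(handle string, params ...any) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	s, ok := r.stmts[handle]
	if !ok {
		return errUnknownHandle
	}
	s.params = params
	return nil
}

// Execute mirrors DoGetPreparedStatement: it returns what would be sent to the engine.
func (r *registry) Execute(handle string) (string, []any, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	s, ok := r.stmts[handle]
	if !ok {
		return "", nil, errUnknownHandle
	}
	return s.query, s.params, nil
}

// Close mirrors ClosePreparedStatement: it removes the handle.
func (r *registry) Close(handle string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.stmts[handle]; !ok {
		return errUnknownHandle
	}
	delete(r.stmts, handle)
	return nil
}

func main() {
	r := newRegistry()
	h := r.Create("SELECT * FROM t WHERE id = ?")
	if err := r.Bind(h, 42); err != nil {
		panic(err)
	}
	q, p, _ := r.Execute(h)
	fmt.Println(q, p)
	_ = r.Close(h)
	_, _, err := r.Execute(h)
	fmt.Println("after close:", err)
}
```

In this model a handle stays valid across repeated bind and execute calls until it is closed, which matches the "persistent handle" wording above.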
Design Rules
Porter is built on strict invariants:
- Flight SQL owns protocol routing (via fsql.NewFlightServer)
- Porter only implements execution semantics
- Handles are in-memory and TTL-bound
- GC runs in the background (no inline eviction logic)
- Arrow memory is explicitly retained/released
Nothing implicit. Nothing magical.
Streaming Core
All query results flow through a single pattern:
DuckDB → Arrow RecordReader → Channel → Flight StreamChunks
Records are retained per batch and released after network write completion.
This keeps backpressure and memory usage predictable.
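The retain-then-release pattern can be sketched without Arrow itself. In the example below, `batch` is a hypothetical stand-in for a reference-counted record batch and `stream` is an illustrative helper; the unbuffered channel is an assumption chosen to make backpressure visible, and Porter's actual channel capacity is not stated here.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// batch is a hypothetical stand-in for a reference-counted Arrow record batch.
type batch struct {
	id   int
	refs atomic.Int64
}

// newBatch returns a batch with one reference, owned by the reader.
func newBatch(id int) *batch {
	b := &batch{id: id}
	b.refs.Store(1)
	return b
}

func (b *batch) Retain()  { b.refs.Add(1) }
func (b *batch) Release() { b.refs.Add(-1) }

// stream pushes n batches through a channel to write and returns them
// so callers can inspect the final reference counts.
func stream(n int, write func(*batch)) []*batch {
	all := make([]*batch, 0, n)
	for i := 0; i < n; i++ {
		all = append(all, newBatch(i))
	}

	// Unbuffered: the producer blocks until the writer takes the batch.
	ch := make(chan *batch)
	go func() {
		defer close(ch)
		for _, b := range all {
			b.Retain()  // extra reference handed to the consumer
			ch <- b     // blocks while the writer is busy (backpressure)
			b.Release() // the reader drops its own reference as it advances
		}
	}()

	for b := range ch {
		write(b)    // stands in for the network write
		b.Release() // released only after the write completes
	}
	return all
}

func main() {
	batches := stream(3, func(b *batch) {
		fmt.Println("wrote batch", b.id)
	})
	for _, b := range batches {
		fmt.Println("batch", b.id, "refs:", b.refs.Load())
	}
}
```

Because the producer cannot run ahead of the writer, at most one batch is in flight in this sketch, which is one way to keep memory use bounded.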
Wire Contract
Porter supports both raw and Flight SQL-native flows:
| Operation | Behavior |
| --- | --- |
| SQL Query | Raw SQL → FlightInfo → DoGet stream |
| Prepared Statements | Handle-based execution with binding |
| Schema Introspection | Lightweight probe execution |
Both converge on the same execution engine.
WebSockets (Coming Soon)
A WebSocket transport layer is in progress.
Planned capabilities:
- Bi-directional streaming query sessions
- Low-latency Arrow batch push over WS frames
- Browser-native Flight-like client
- Session-based prepared statement lifecycle
Think of it as Flight SQL without the gRPC boundary.
Roadmap
- Streaming Flight SQL execution
- Prepared statements with parameter binding
- TTL-based handle lifecycle
- Background garbage collection
- WebSocket transport layer
- Session-aware execution context
- Improved schema introspection (reduce probe execution)
- Performance benchmarking suite
Philosophy
Porter is intentionally narrow:
No distributed illusions. No unnecessary abstraction layers. Just a fast path from query to stream.
It is a system designed for hacking, embedding, and evolving.
Contributing
If you've ever looked at a data system and thought:
"Why is this so complicated?"
you already understand what Porter is trying to fix.
Build it smaller. Make it clearer. Keep it moving.