Icebox
What is Icebox?
Icebox is a zero-configuration data lakehouse that gets you from zero to querying Iceberg tables in under five minutes. Perfect for:
- Experimenting with the Apache Iceberg table format
- Learning lakehouse concepts and workflows
- Prototyping data pipelines locally
- Testing Iceberg integrations before production
No servers, no complex setup, no dependencies - just a single binary and your data.
Features
Zero-Setup Experience
- Single binary - No installation complexity
- Embedded catalog - SQLite-based, no external database needed
- REST catalog support - Connect to existing Iceberg REST catalogs
- Embedded MinIO server - S3-compatible storage for testing production workflows
- Local storage - File system integration out of the box
- Auto-configuration - Sensible defaults, minimal configuration required
Data Operations
- Parquet import with automatic schema inference
- Iceberg table creation and management
- Namespace organization and operations
- Pack/Unpack - Portable project archives for sharing and backup
- Arrow integration for efficient data processing
- Transaction support with proper ACID guarantees
SQL Querying
- DuckDB integration for high-performance analytics
- Interactive SQL shell with command history and multi-line support
- Time-travel queries - Query tables at any point in their history
- Multiple output formats - table, CSV, JSON
- Auto-registration of catalog tables for immediate querying
- Query performance metrics and optimization features
Developer-Friendly
- Rich CLI with intuitive commands and helpful output
- Comprehensive table operations - create, list, describe, history
- Namespace management for organized data governance
- Dry-run modes to preview operations
- YAML configuration for reproducible setups
Quick Start
1. Install Icebox
# Build from source (Go 1.21+ required)
git clone https://github.com/TFMV/icebox.git
cd icebox/icebox
go build -o icebox cmd/icebox/main.go
2. Initialize Your Lakehouse
# Create a new lakehouse project
./icebox init my-lakehouse
cd my-lakehouse
# Your project structure is ready
tree .icebox/
# .icebox/
# ├── catalog/
# │   └── catalog.db # SQLite catalog
# ├── data/ # Table storage
# └── minio/ # MinIO data (if enabled)
3. Import Your First Table
# Import a Parquet file (creates namespace and table automatically)
./icebox import sales_data.parquet --table sales
✅ Successfully imported table!
Import Results:
Table: [default sales]
Records: 1,000,000
Size: 45.2 MB
Location: file:///.icebox/data/default/sales
4. Start Querying
# Query your data with SQL
./icebox sql "SELECT COUNT(*) FROM sales"
Registered 1 tables for querying
⏱️ Query [query_1234] executed in 145ms
1 rows returned
┌─────────┐
│ count   │
├─────────┤
│ 1000000 │
└─────────┘
# Use the interactive shell for complex analysis
./icebox shell
Icebox SQL Shell v0.1.0
Interactive SQL querying for Apache Iceberg
Type \help for help, \quit to exit
icebox> SELECT region, SUM(amount) FROM sales GROUP BY region LIMIT 3;
⏱️ Query executed in 89ms
3 rows returned
┌────────┬──────────┐
│ region │ sum      │
├────────┼──────────┤
│ North  │ 2456789  │
│ South  │ 1987432  │
│ East   │ 2123456  │
└────────┴──────────┘
icebox> \quit
That's it! You now have a working Iceberg lakehouse with SQL querying.
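The same query can be issued from a program rather than a terminal, which is handy when prototyping a pipeline. The following is a minimal sketch that shells out to the CLI using only the `sql` subcommand shown above. The helper name `sqlCommand` and the `./icebox` binary path are assumptions made for illustration; Icebox itself does not ship this helper.

```go
package main

import (
	"fmt"
	"os/exec"
)

// sqlCommand builds the same invocation as `./icebox sql "<query>"`.
// The binary path is a parameter because its location is an assumption.
func sqlCommand(binary, query string) *exec.Cmd {
	return exec.Command(binary, "sql", query)
}

func main() {
	cmd := sqlCommand("./icebox", "SELECT COUNT(*) FROM sales")

	// CombinedOutput runs the command and captures stdout and stderr together.
	out, err := cmd.CombinedOutput()
	if err != nil {
		fmt.Println("icebox call failed:", err)
		return
	}
	fmt.Print(string(out))
}
```

Passing the query as a single argument avoids shell quoting problems, since no shell is involved in the call.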
New Features
Embedded MinIO Server
Test S3-compatible storage workflows locally with zero configuration:
# Initialize with embedded MinIO
./icebox init my-project --storage minio
# Or enable in existing project
cat >> .icebox.yml << EOF
storage:
  type: minio
  minio:
    embedded: true
    console: true # Enable web console at http://localhost:9000
EOF
# MinIO starts automatically with Icebox
./icebox sql "SHOW TABLES"
# Starting embedded MinIO server...
# ✅ MinIO server started successfully
Features:
- S3-Compatible API - Test cloud storage workflows locally
- Web Console - Browser-based management interface
- Secure by Default - Configurable authentication and TLS
- Performance Optimized - Modern connection pooling and timeouts
Pack & Unpack
Create portable archives of your lakehouse projects:
# Create project archive
./icebox pack my-analytics-project.tar.gz
# Share and distribute
scp my-analytics-project.tar.gz colleague@server:/home/colleague/
# Restore anywhere
./icebox unpack my-analytics-project.tar.gz
Perfect for:
- Sharing projects with colleagues
- Backup and archival
- Distribution of datasets and schemas
- Testing with consistent environments
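To make the idea of a portable project archive concrete, here is a minimal sketch of packing a directory into a `.tar.gz` stream with Go's standard library. It illustrates the general technique only; it is not Icebox's actual pack implementation, and the names `packDir`, `listArchive`, and `roundTrip` are hypothetical.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"io/fs"
	"os"
	"path/filepath"
)

// packDir writes every regular file under root into a gzip-compressed tar
// stream, storing paths relative to root so the archive unpacks anywhere.
func packDir(root string, w io.Writer) error {
	gz := gzip.NewWriter(w)
	tw := tar.NewWriter(gz)
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil || !d.Type().IsRegular() {
			return err // propagate walk errors; skip directories and symlinks
		}
		rel, err := filepath.Rel(root, path)
		if err != nil {
			return err
		}
		data, err := os.ReadFile(path) // fine for a sketch; stream large files instead
		if err != nil {
			return err
		}
		hdr := &tar.Header{
			Typeflag: tar.TypeReg,
			Name:     filepath.ToSlash(rel),
			Mode:     0o644,
			Size:     int64(len(data)),
		}
		if err := tw.WriteHeader(hdr); err != nil {
			return err
		}
		_, err = tw.Write(data)
		return err
	})
	if err != nil {
		return err
	}
	if err := tw.Close(); err != nil {
		return err
	}
	return gz.Close()
}

// listArchive returns the entry names stored in a .tar.gz stream.
func listArchive(r io.Reader) ([]string, error) {
	gz, err := gzip.NewReader(r)
	if err != nil {
		return nil, err
	}
	defer gz.Close()
	tr := tar.NewReader(gz)
	var names []string
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		names = append(names, hdr.Name)
	}
}

// roundTrip packs a throwaway directory and returns the archived file names.
func roundTrip() ([]string, error) {
	dir, err := os.MkdirTemp("", "pack-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	for _, name := range []string{"catalog/catalog.db", "notes.txt"} {
		path := filepath.Join(dir, name)
		if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
			return nil, err
		}
		if err := os.WriteFile(path, []byte("demo"), 0o644); err != nil {
			return nil, err
		}
	}
	var buf bytes.Buffer
	if err := packDir(dir, &buf); err != nil {
		return nil, err
	}
	return listArchive(&buf)
}

func main() {
	names, err := roundTrip()
	fmt.Println(names, err)
}
```

Storing relative, slash-separated paths is what lets the archive be restored under a different parent directory or on another machine.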
Examples
Quick Data Analysis
# Import and analyze customer data
./icebox import customers.parquet --table customers
./icebox sql "SELECT region, AVG(lifetime_value) FROM customers GROUP BY region"
# Time-travel to see historical data
./icebox time-travel customers --as-of "2024-01-01" \
  --query "SELECT COUNT(*) FROM customers"
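Conceptually, an as-of query resolves the timestamp to the newest table snapshot committed at or before it, then reads the table as of that snapshot. The sketch below shows that selection step in isolation. The `Snapshot` type, the function names, and the sample history are hypothetical, and interpreting a bare date as midnight UTC is an assumption rather than documented Icebox behavior.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Snapshot is a simplified, hypothetical stand-in for one entry in a table's
// snapshot history; real Iceberg metadata carries more fields.
type Snapshot struct {
	ID          int64
	CommittedAt time.Time
}

// SnapshotAsOf returns the newest snapshot committed at or before asOf.
// ok is false when no snapshot existed yet at that time.
func SnapshotAsOf(history []Snapshot, asOf time.Time) (snap Snapshot, ok bool) {
	sorted := append([]Snapshot(nil), history...) // copy so the caller's slice is untouched
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].CommittedAt.Before(sorted[j].CommittedAt)
	})
	// Index of the first snapshot committed strictly after asOf.
	i := sort.Search(len(sorted), func(i int) bool {
		return sorted[i].CommittedAt.After(asOf)
	})
	if i == 0 {
		return Snapshot{}, false
	}
	return sorted[i-1], true
}

// SnapshotIDAsOfDate accepts a YYYY-MM-DD date, interpreted here as midnight UTC.
func SnapshotIDAsOfDate(history []Snapshot, date string) (int64, bool, error) {
	asOf, err := time.Parse("2006-01-02", date)
	if err != nil {
		return 0, false, err
	}
	snap, ok := SnapshotAsOf(history, asOf)
	return snap.ID, ok, nil
}

// demoHistory returns made-up commit times for illustration only.
func demoHistory() []Snapshot {
	day := func(y int, m time.Month, d int) time.Time {
		return time.Date(y, m, d, 0, 0, 0, 0, time.UTC)
	}
	return []Snapshot{
		{ID: 3, CommittedAt: day(2024, time.February, 5)},
		{ID: 1, CommittedAt: day(2023, time.November, 1)},
		{ID: 2, CommittedAt: day(2023, time.December, 20)},
	}
}

func main() {
	id, ok, err := SnapshotIDAsOfDate(demoHistory(), "2024-01-01")
	fmt.Println(id, ok, err) // prints: 2 true <nil>
}
```

A date earlier than the first commit yields `ok == false`, which a caller would typically report as "table did not exist yet" rather than returning an empty result.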
REST Catalog Integration
# Connect to production Iceberg REST catalog
./icebox init prod-analytics --catalog rest --uri https://catalog.company.com
# Import data and query immediately
./icebox import events.parquet --table analytics.user_events
./icebox sql "SELECT event_type, COUNT(*) FROM analytics.user_events GROUP BY event_type"
Project Organization
# Create namespaced tables
./icebox import transactions.parquet --table finance.transactions
./icebox import campaigns.parquet --table marketing.campaigns
./icebox import orders.parquet --table sales.orders
# Query across namespaces
./icebox sql "
SELECT f.account_type, SUM(s.amount)
FROM finance.transactions f
JOIN sales.orders s ON f.transaction_id = s.id
GROUP BY f.account_type
"
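The `--table` values above follow a dotted `namespace.table` convention, and the earlier import output (`Table: [default sales]`) suggests that unqualified names land in a `default` namespace. A minimal sketch of that naming rule follows. `TableIdent` and `ParseTableIdent` are hypothetical names for illustration, not Icebox's API, and the handling of names with more than one dot is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// TableIdent is a hypothetical type for this sketch, not taken from Icebox.
type TableIdent struct {
	Namespace []string
	Name      string
}

// ParseTableIdent splits a dotted name such as "finance.transactions".
// Unqualified names fall back to the "default" namespace, mirroring the
// "[default sales]" line in the import output.
func ParseTableIdent(s string) (TableIdent, error) {
	parts := strings.Split(s, ".")
	for _, p := range parts {
		if p == "" {
			return TableIdent{}, fmt.Errorf("invalid table name %q: empty segment", s)
		}
	}
	if len(parts) == 1 {
		return TableIdent{Namespace: []string{"default"}, Name: parts[0]}, nil
	}
	// Assumption: everything before the last dot is the namespace path.
	last := len(parts) - 1
	return TableIdent{Namespace: parts[:last], Name: parts[last]}, nil
}

func main() {
	for _, name := range []string{"sales", "finance.transactions"} {
		id, err := ParseTableIdent(name)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> namespace=%v table=%s\n", name, id.Namespace, id.Name)
	}
}
```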
For more comprehensive examples and detailed usage, see our Usage Guide.
Storage & Catalog Support
| Storage Type | Description | Use Case |
|---|---|---|
| Local Filesystem | File-based storage | Development, testing |
| Embedded MinIO | S3-compatible local server | Cloud workflow testing |
| External MinIO | Remote MinIO instance | Shared development |

| Catalog Type | Description | Use Case |
|---|---|---|
| SQLite | Embedded local catalog | Single-user development |
| REST | External Iceberg REST catalog | Multi-user, production |
Architecture
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   CLI Layer     │    │  Storage Layer  │    │  Catalog Layer  │
│                 │    │                 │    │                 │
│ • import        │◄──►│ • Local FS      │◄──►│ • SQLite        │
│ • sql/shell     │    │ • MinIO S3      │    │ • REST API      │
│ • table ops     │    │ • Cloud storage │    │ • Authentication│
│ • pack/unpack   │    │ • File:// URIs  │    │ • Multi-user    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         └──────────────────────┼──────────────────────┘
                                ▼
                     ┌─────────────────────┐
                     │   Apache Iceberg    │
                     │                     │
                     │ • Table format      │
                     │ • Time travel       │
                     │ • Transaction log   │
                     │ • DuckDB engine     │
                     └─────────────────────┘
Documentation
Feature Documentation
Roadmap
✅ Current Version (v0.1.0)
- ✅ SQLite & REST catalog support with authentication
- ✅ Embedded MinIO server with S3-compatible API
- ✅ Parquet import with schema inference
- ✅ SQL engine with DuckDB integration
- ✅ Interactive SQL shell with rich features
- ✅ Time-travel queries for historical data analysis
- ✅ Table & namespace management operations
- ✅ Pack/Unpack for portable project archives
Future Releases
- Cloud Storage - Native S3, GCS, Azure integration
- Streaming Ingestion - Real-time data processing
- Web UI - Browser-based data exploration
- Advanced Analytics - Enhanced query capabilities
- SDK Libraries - Programmatic access
Contributing
We welcome contributions! Icebox is designed to be approachable for developers at all levels.
Quick Contribution Guide
- Fork the repository and create a feature branch
- Write tests for your changes
- Update documentation as needed
- Ensure tests pass with go test ./...
- Submit a pull request
Development
# Build from source
git clone https://github.com/TFMV/icebox.git
cd icebox/icebox
go mod tidy
go build -o icebox cmd/icebox/main.go
# Run tests
go test ./...
Areas for Contribution
- Bug fixes and stability improvements
- Documentation and examples
- New features and enhancements
- Test coverage improvements
- CLI/UX enhancements
License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.