icebox

module
v0.1.0
Published: May 24, 2025 License: MIT

README

🧊 Icebox

A single-binary playground for Apache Iceberg
Five minutes to first query

Quick Start • Features • Examples • Usage Guide • Contributing


🎯 What is Icebox?

Icebox is a zero-configuration data lakehouse that gets you from zero to querying Iceberg tables in under five minutes. Perfect for:

  • 🔬 Experimenting with Apache Iceberg table format
  • 📚 Learning lakehouse concepts and workflows
  • 🧪 Prototyping data pipelines locally
  • 🚀 Testing Iceberg integrations before production

No servers, no complex setup, no dependencies - just a single binary and your data.

✨ Features

🚀 Zero-Setup Experience
  • Single binary - No installation complexity
  • Embedded catalog - SQLite-based, no external database needed
  • REST catalog support - Connect to existing Iceberg REST catalogs
  • Embedded MinIO server - S3-compatible storage for testing production workflows
  • Local storage - File system integration out of the box
  • Auto-configuration - Sensible defaults, minimal configuration required
📁 Data Operations
  • Parquet import with automatic schema inference
  • Iceberg table creation and management
  • Namespace organization and operations
  • Pack/Unpack - Portable project archives for sharing and backup
  • Arrow integration for efficient data processing
  • Transaction support with proper ACID guarantees
🔍 SQL Querying
  • DuckDB integration for high-performance analytics
  • Interactive SQL shell with command history and multi-line support
  • Time-travel queries - Query tables at any point in their history
  • Multiple output formats - table, CSV, JSON
  • Auto-registration of catalog tables for immediate querying
  • Query performance metrics and optimization features
🛠️ Developer-Friendly
  • Rich CLI with intuitive commands and helpful output
  • Comprehensive table operations - create, list, describe, history
  • Namespace management for organized data governance
  • Dry-run modes to preview operations
  • YAML configuration for reproducible setups
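
As an illustration of the "multiple output formats" feature listed under SQL Querying, the sketch below renders one small result set as CSV and as JSON using only the Go standard library. The `Result` type and its methods are hypothetical names chosen for this example; they are not Icebox's API, and the exact shape of Icebox's CSV and JSON output is not documented here.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"encoding/json"
	"fmt"
)

// Result is a hypothetical in-memory query result used only for this sketch.
type Result struct {
	Columns []string
	Rows    [][]string
}

// CSV renders a header line followed by one line per row.
func (r Result) CSV() (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	if err := w.Write(r.Columns); err != nil {
		return "", err
	}
	// WriteAll writes the rows and flushes the writer.
	if err := w.WriteAll(r.Rows); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// JSON renders the rows as an array of objects keyed by column name.
func (r Result) JSON() (string, error) {
	records := make([]map[string]string, 0, len(r.Rows))
	for _, row := range r.Rows {
		rec := make(map[string]string, len(r.Columns))
		for i, col := range r.Columns {
			if i < len(row) {
				rec[col] = row[i]
			}
		}
		records = append(records, rec)
	}
	b, err := json.Marshal(records)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	res := Result{
		Columns: []string{"region", "sum"},
		Rows:    [][]string{{"North", "2456789"}, {"South", "1987432"}},
	}
	csvText, err := res.CSV()
	if err != nil {
		panic(err)
	}
	jsonText, err := res.JSON()
	if err != nil {
		panic(err)
	}
	fmt.Print(csvText)
	fmt.Println(jsonText)
}
```

Values are kept as strings to keep the sketch short; a real renderer would likely preserve numeric types in the JSON output.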

🚀 Quick Start

1. Install Icebox
# Build from source (Go 1.21+ required)
git clone https://github.com/TFMV/icebox.git
cd icebox/icebox
go build -o icebox cmd/icebox/main.go
2. Initialize Your Lakehouse
# Create a new lakehouse project
./icebox init my-lakehouse
cd my-lakehouse

# Your project structure is ready
tree .icebox/
# .icebox/
# ├── catalog/
# │   └── catalog.db     # SQLite catalog
# ├── data/              # Table storage
# └── minio/             # MinIO data (if enabled)
3. Import Your First Table
# Import a Parquet file (creates namespace and table automatically)
./icebox import sales_data.parquet --table sales

✅ Successfully imported table!

📊 Import Results:
   Table: [default sales]
   Records: 1,000,000
   Size: 45.2 MB
   Location: file:///.icebox/data/default/sales
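
The import output above reports the table as `[default sales]` even though only `--table sales` was given, while later examples pass qualified names such as `analytics.user_events`. A minimal Go sketch of that naming convention follows. `TableIdent` and `ParseTableIdent` are hypothetical names, and splitting at the last dot is an assumption made for illustration, not a description of Icebox's parser.

```go
package main

import (
	"fmt"
	"strings"
)

// TableIdent is a hypothetical helper type, not part of Icebox's API.
type TableIdent struct {
	Namespace string
	Name      string
}

// ParseTableIdent splits "namespace.table" at the last dot (an assumption)
// and falls back to the "default" namespace when none is given.
func ParseTableIdent(s string) (TableIdent, error) {
	s = strings.TrimSpace(s)
	if s == "" {
		return TableIdent{}, fmt.Errorf("empty table identifier")
	}
	ns, name := "default", s
	if i := strings.LastIndex(s, "."); i >= 0 {
		ns, name = s[:i], s[i+1:]
	}
	if ns == "" || name == "" {
		return TableIdent{}, fmt.Errorf("malformed table identifier %q", s)
	}
	return TableIdent{Namespace: ns, Name: name}, nil
}

func main() {
	for _, arg := range []string{"sales", "analytics.user_events"} {
		id, err := ParseTableIdent(arg)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> namespace=%s table=%s\n", arg, id.Namespace, id.Name)
	}
}
```

Running it prints `sales -> namespace=default table=sales` followed by `analytics.user_events -> namespace=analytics table=user_events`.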
4. Start Querying
# Query your data with SQL
./icebox sql "SELECT COUNT(*) FROM sales"
📋 Registered 1 tables for querying
⏱️  Query [query_1234] executed in 145ms
📊 1 rows returned
┌─────────┐
│ count   │
├─────────┤
│1000000  │
└─────────┘

# Use the interactive shell for complex analysis
./icebox shell

🧊 Icebox SQL Shell v0.1.0
Interactive SQL querying for Apache Iceberg
Type \help for help, \quit to exit

icebox> SELECT region, SUM(amount) FROM sales GROUP BY region LIMIT 3;
⏱️  Query executed in 89ms
📊 3 rows returned
┌────────┬──────────┐
│ region │   sum    │
├────────┼──────────┤
│ North  │ 2456789  │
│ South  │ 1987432  │
│ East   │ 2123456  │
└────────┴──────────┘

icebox> \quit

🎉 That's it! You now have a working Iceberg lakehouse with SQL querying.

🌟 New Features

🗄️ Embedded MinIO Server

Test S3-compatible storage workflows locally with zero configuration:

# Initialize with embedded MinIO
./icebox init my-project --storage minio

# Or enable in existing project
cat >> .icebox.yml << EOF
storage:
  type: minio
  minio:
    embedded: true
    console: true    # Enable web console at http://localhost:9000
EOF

# MinIO starts automatically with Icebox
./icebox sql "SHOW TABLES"
# 🗄️ Starting embedded MinIO server...
# ✅ MinIO server started successfully
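
One caveat with the `cat >>` approach: running it twice appends a second `storage:` section to `.icebox.yml`. The Go sketch below shows one way to make the edit idempotent by appending the fragment only when no top-level `storage:` key exists. `WithMinIO` and `hasTopLevelKey` are hypothetical helpers written for this example, and the line-prefix check is a deliberate simplification rather than a YAML parser.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// minioFragment is the storage section from the snippet above.
const minioFragment = `storage:
  type: minio
  minio:
    embedded: true
    console: true
`

// hasTopLevelKey reports whether the text contains an unindented "key:" line.
func hasTopLevelKey(yamlText, key string) bool {
	for _, line := range strings.Split(yamlText, "\n") {
		if strings.HasPrefix(line, key+":") {
			return true
		}
	}
	return false
}

// WithMinIO returns the config with the fragment appended. The boolean is
// false when a top-level storage section already exists and nothing changed.
func WithMinIO(existing string) (string, bool) {
	if hasTopLevelKey(existing, "storage") {
		return existing, false
	}
	if existing != "" && !strings.HasSuffix(existing, "\n") {
		existing += "\n"
	}
	return existing + minioFragment, true
}

func main() {
	// Work in a throwaway directory so the sketch never touches a real project.
	dir, err := os.MkdirTemp("", "icebox-config")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, ".icebox.yml")

	for run := 1; run <= 2; run++ {
		data, err := os.ReadFile(path)
		if err != nil && !os.IsNotExist(err) {
			panic(err)
		}
		updated, changed := WithMinIO(string(data))
		if changed {
			if err := os.WriteFile(path, []byte(updated), 0o644); err != nil {
				panic(err)
			}
		}
		fmt.Printf("run %d: changed=%v\n", run, changed)
	}
}
```

The first run reports `changed=true` and the second `changed=false`, so repeating the step leaves the file untouched.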

Features:

  • 🚀 S3-Compatible API - Test cloud storage workflows locally
  • 🌐 Web Console - Browser-based management interface
  • 🛡️ Secure by Default - Configurable authentication and TLS
  • 📊 Performance Optimized - Modern connection pooling and timeouts
📦 Pack & Unpack

Create portable archives of your lakehouse projects:

# Create project archive
./icebox pack my-analytics-project.tar.gz

# Share and distribute
scp my-analytics-project.tar.gz colleague@server:/home/colleague/

# Restore anywhere
./icebox unpack my-analytics-project.tar.gz

Perfect for:

  • 📤 Sharing projects with colleagues
  • 💾 Backup and archival
  • 🚀 Distribution of datasets and schemas
  • 🧪 Testing with consistent environments

📋 Examples

Quick Data Analysis
# Import and analyze customer data
./icebox import customers.parquet --table customers
./icebox sql "SELECT region, AVG(lifetime_value) FROM customers GROUP BY region"

# Time-travel to see historical data
./icebox time-travel customers --as-of "2024-01-01" \
  --query "SELECT COUNT(*) FROM customers"
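
Conceptually, an as-of query on an Iceberg table reads the most recent snapshot committed at or before the requested time. The Go sketch below shows that selection rule on a made-up snapshot history. The `Snapshot` type, the IDs and the dates are all illustrative, and treating the bare date `2024-01-01` as midnight UTC is an assumption; how Icebox parses `--as-of` is not specified here.

```go
package main

import (
	"fmt"
	"time"
)

// Snapshot holds the two snapshot fields this sketch needs.
type Snapshot struct {
	ID          int64
	TimestampMs int64 // commit time in milliseconds since the Unix epoch
}

// SnapshotAsOf returns the most recent snapshot committed at or before
// asOfMs. The boolean is false when the table did not exist yet.
func SnapshotAsOf(history []Snapshot, asOfMs int64) (Snapshot, bool) {
	var best Snapshot
	found := false
	for _, s := range history {
		if s.TimestampMs <= asOfMs && (!found || s.TimestampMs > best.TimestampMs) {
			best, found = s, true
		}
	}
	return best, found
}

func main() {
	// ms converts a YYYY-MM-DD date to epoch milliseconds at midnight UTC.
	ms := func(date string) int64 {
		t, err := time.Parse("2006-01-02", date)
		if err != nil {
			panic(err)
		}
		return t.UnixMilli()
	}
	// Made-up history for illustration only.
	history := []Snapshot{
		{ID: 1, TimestampMs: ms("2023-11-15")},
		{ID: 2, TimestampMs: ms("2023-12-20")},
		{ID: 3, TimestampMs: ms("2024-02-03")},
	}
	if s, ok := SnapshotAsOf(history, ms("2024-01-01")); ok {
		fmt.Println("query would read snapshot", s.ID)
	} else {
		fmt.Println("no snapshot existed at that time")
	}
}
```

With this history the program prints `query would read snapshot 2`, since snapshot 3 was committed after the requested date.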
REST Catalog Integration
# Connect to production Iceberg REST catalog
./icebox init prod-analytics --catalog rest --uri https://catalog.company.com

# Import data and query immediately
./icebox import events.parquet --table analytics.user_events
./icebox sql "SELECT event_type, COUNT(*) FROM analytics.user_events GROUP BY event_type"
Project Organization
# Create namespaced tables
./icebox import transactions.parquet --table finance.transactions
./icebox import campaigns.parquet --table marketing.campaigns
./icebox import orders.parquet --table sales.orders

# Query across namespaces
./icebox sql "
SELECT f.account_type, SUM(s.amount) 
FROM finance.transactions f 
JOIN sales.orders s ON f.transaction_id = s.id
GROUP BY f.account_type
"

For more comprehensive examples and detailed usage, see our 📚 Usage Guide.

🌐 Storage & Catalog Support

Storage Type      Description                 Use Case
Local Filesystem  File-based storage          Development, testing
Embedded MinIO    S3-compatible local server  Cloud workflow testing
External MinIO    Remote MinIO instance       Shared development

Catalog Type  Description                    Use Case
SQLite        Embedded local catalog         Single-user development
REST          External Iceberg REST catalog  Multi-user, production

πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   CLI Layer     β”‚    β”‚  Storage Layer  β”‚    β”‚  Catalog Layer  β”‚
β”‚                 β”‚    β”‚                 β”‚    β”‚                 β”‚
β”‚ β€’ import        │◄──►│ β€’ Local FS      │◄──►│ β€’ SQLite        β”‚
β”‚ β€’ sql/shell     β”‚    β”‚ β€’ MinIO S3      β”‚    β”‚ β€’ REST API      β”‚
β”‚ β€’ table ops     β”‚    β”‚ β€’ Cloud storage β”‚    β”‚ β€’ Authenticationβ”‚
β”‚ β€’ pack/unpack   β”‚    β”‚ β€’ File:// URIs  β”‚    β”‚ β€’ Multi-user    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚                       β”‚                       β”‚
         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                 β–Ό
                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                    β”‚   Apache Iceberg    β”‚
                    β”‚                     β”‚
                    β”‚ β€’ Table format      β”‚
                    β”‚ β€’ Time travel       β”‚
                    β”‚ β€’ Transaction log   β”‚
                    β”‚ β€’ DuckDB engine     β”‚
                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

📚 Documentation

Feature Documentation

🗺️ Roadmap

✅ Current Version (v0.1.0)
  • ✅ SQLite & REST catalog support with authentication
  • ✅ Embedded MinIO server with S3-compatible API
  • ✅ Parquet import with schema inference
  • ✅ SQL engine with DuckDB integration
  • ✅ Interactive SQL shell with rich features
  • ✅ Time-travel queries for historical data analysis
  • ✅ Table & namespace management operations
  • ✅ Pack/Unpack for portable project archives
🚀 Future Releases
  • Cloud Storage - Native S3, GCS, Azure integration
  • Streaming Ingestion - Real-time data processing
  • Web UI - Browser-based data exploration
  • Advanced Analytics - Enhanced query capabilities
  • SDK Libraries - Programmatic access

🤝 Contributing

We welcome contributions! Icebox is designed to be approachable for developers at all levels.

Quick Contribution Guide
  1. 🍴 Fork the repository and create a feature branch
  2. 🧪 Write tests for your changes
  3. 📝 Update documentation as needed
  4. ✅ Ensure tests pass with go test ./...
  5. 🔄 Submit a pull request
Development
# Build from source
git clone https://github.com/TFMV/icebox.git
cd icebox/icebox
go mod tidy
go build -o icebox cmd/icebox/main.go

# Run tests
go test ./...
Areas for Contribution
  • 🐛 Bug fixes and stability improvements
  • 📚 Documentation and examples
  • ✨ New features and enhancements
  • 🧪 Test coverage improvements
  • 🎨 CLI/UX enhancements

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.


Made with ❤️ for the data community

⭐ Star this project • 📚 Usage Guide • 🐛 Report Issue

Directories

Path        Synopsis
cmd/icebox  command
engine
fs
pkg
sdk
