orcas

module
v0.0.0-...-e4aee15 Latest
Published: Mar 7, 2026 License: MIT

README

OrcaS: Open Ready-to-Use Content Addressable Storage

🚀 What is OrcaS?

OrcaS (Open Ready-to-Use Content Addressable Storage) is a lightweight, high-performance object storage system built with Content Addressable Storage (CAS) at its core. It provides enterprise-grade features like instant deduplication, multi-versioning, zero-knowledge encryption, and smart compression - all in a single binary that's ready to deploy.

Why OrcaS?

  • 🌐 Open: Open source (MIT license), transparent, community-driven development
  • βœ… Ready-to-Use: Content Addressable Storage ensures data integrity and automatic deduplication, production-ready out of the box
  • 🎯 Content Addressable Storage: Data is stored by content hash, enabling automatic deduplication and integrity verification
  • ⚑ Instant Upload (Deduplication): Upload files in seconds, not minutes - identical files are detected instantly without uploading
  • πŸ”’ Zero-Knowledge Encryption: Your data, your keys - end-to-end encryption with industry-standard algorithms
  • πŸ“¦ Production Ready: S3-compatible API, VFS mount support, and comprehensive documentation
  • πŸš€ High Performance: Optimized for both small and large files with intelligent packaging and chunking

✨ Key Features

⏱ Instant Upload (Object-level Deduplication)

What it does: Upload identical files instantly without transferring data.

How it works:

  • Calculates multiple checksums (XXH3, SHA-256) for each file
  • Before uploading, checks if identical content already exists
  • If found, creates a reference to existing data instead of uploading
  • Result: Upload time drops from minutes to milliseconds for duplicate files
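The lookup-before-upload steps above can be sketched as follows. This is a minimal in-memory illustration, not OrcaS's implementation: the `Store` type and its `Put` method are hypothetical names, and only SHA-256 is used because XXH3 is not in the Go standard library.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Store is a hypothetical in-memory content-addressed store.
type Store struct {
	blobs map[string][]byte // content hash -> data
	refs  map[string]int    // content hash -> reference count
}

func NewStore() *Store {
	return &Store{blobs: map[string][]byte{}, refs: map[string]int{}}
}

// Put stores data under its SHA-256 hash. It returns the hash and whether
// the content already existed, in which case only a reference is added.
func (s *Store) Put(data []byte) (hash string, deduplicated bool) {
	sum := sha256.Sum256(data)
	hash = hex.EncodeToString(sum[:])
	if _, ok := s.blobs[hash]; ok {
		s.refs[hash]++ // identical content: no new copy is written
		return hash, true
	}
	s.blobs[hash] = append([]byte(nil), data...)
	s.refs[hash] = 1
	return hash, false
}

func main() {
	s := NewStore()
	h1, dup1 := s.Put([]byte("hello orcas"))
	h2, dup2 := s.Put([]byte("hello orcas"))
	fmt.Println(h1 == h2, dup1, dup2, s.refs[h1]) // true false true 2
}
```

In a client/server setting the hash would presumably be computed on the client and sent ahead of the data, so that a hit avoids the transfer entirely; the sketch collapses both sides into one process.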

Use cases:

  • Backup systems (same files across multiple backups)
  • Version control systems (unchanged files across versions)
  • Multi-user environments (shared files)
  • CDN edge storage (cached content)

Benefits:

  • 🚀 99%+ faster uploads for duplicate files
  • 💾 Massive storage savings - store 1 copy, reference it N times
  • ⚡ Bandwidth savings - no redundant data transfer
  • 🔍 Automatic integrity verification - content hash ensures data correctness

[Figure: Deduplication Benefits]

📦 Small Object Packaging

What it does: Efficiently stores many small files together.

How it works:

  • Groups small files (< 64KB) into packages
  • Reduces metadata overhead and I/O operations
  • Maintains individual file access while optimizing storage
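One way to picture packaging is a single buffer plus an offset index. The sketch below is an assumption-laden illustration: `Pack`, `entry`, `Add`, and `Get` are hypothetical names, and the real on-disk package format is not described in this README.

```go
package main

import (
	"bytes"
	"fmt"
)

const smallFileLimit = 64 * 1024 // 64 KB threshold taken from the text above

// entry records where one small file lives inside the package.
type entry struct {
	Offset, Size int
}

// Pack is a hypothetical container holding many small files in one buffer.
type Pack struct {
	buf   bytes.Buffer
	index map[string]entry
}

func NewPack() *Pack { return &Pack{index: map[string]entry{}} }

// Add appends a small file to the package. Files at or above the limit
// are rejected so they can be stored on their own.
func (p *Pack) Add(name string, data []byte) bool {
	if len(data) >= smallFileLimit {
		return false
	}
	p.index[name] = entry{Offset: p.buf.Len(), Size: len(data)}
	p.buf.Write(data)
	return true
}

// Get returns one file's bytes without unpacking its neighbours.
func (p *Pack) Get(name string) ([]byte, bool) {
	e, ok := p.index[name]
	if !ok {
		return nil, false
	}
	return p.buf.Bytes()[e.Offset : e.Offset+e.Size], true
}

func main() {
	p := NewPack()
	p.Add("a.txt", []byte("first"))
	p.Add("b.txt", []byte("second"))
	data, _ := p.Get("b.txt")
	fmt.Println(string(data), p.buf.Len()) // second 11
}
```

The saving comes from writing one buffer and one index instead of one file and one metadata record per object.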

Benefits:

  • 📈 10x+ performance improvement for small file operations
  • 💰 Reduced storage costs - less metadata overhead
  • ⚡ Faster operations - batch metadata writes

🔪 Large Object Chunking

What it does: Splits large files into manageable chunks.

How it works:

  • Automatically chunks files larger than the configured threshold (default 10 MB)
  • Each chunk stored independently with its own checksum
  • Enables parallel upload/download and efficient updates
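Fixed-size chunking with a per-chunk checksum might look like the sketch below. `Chunk` and `Split` are hypothetical names, SHA-256 is an assumed checksum, and a 10-byte chunk size stands in for the 10 MB default so the example stays small.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Chunk is one independently stored piece of a large object.
type Chunk struct {
	Number   int
	Data     []byte
	Checksum [32]byte
}

// Split cuts data into chunks of at most chunkSize bytes, each with its
// own checksum. It returns nil for a non-positive chunkSize.
func Split(data []byte, chunkSize int) []Chunk {
	if chunkSize <= 0 {
		return nil
	}
	var chunks []Chunk
	for off, n := 0, 0; off < len(data); off, n = off+chunkSize, n+1 {
		end := off + chunkSize
		if end > len(data) {
			end = len(data)
		}
		part := data[off:end]
		chunks = append(chunks, Chunk{Number: n, Data: part, Checksum: sha256.Sum256(part)})
	}
	return chunks
}

func main() {
	data := make([]byte, 25) // stand-in for a large file
	for _, c := range Split(data, 10) {
		fmt.Printf("chunk %d: %d bytes, sha256 %x...\n", c.Number, len(c.Data), c.Checksum[:4])
	}
}
```

Because every chunk carries its own checksum, a failed or modified chunk can be retried or re-uploaded without touching the others.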

Benefits:

  • 🔄 Parallel processing - upload/download chunks concurrently
  • 🛡️ Resumable transfers - retry failed chunks independently
  • ✏️ Efficient updates - only modified chunks need re-upload
  • 📊 Better resource utilization - process large files efficiently

🗂 Object Multi-versioning

What it does: Automatically maintains file version history.

How it works:

  • Each file modification creates a new version
  • Old versions preserved automatically
  • Configurable retention policies
  • Space-efficient through content deduplication
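The interaction between versioning and deduplication can be modeled in a few lines. This is a hedged sketch: `VersionedStore`, `Write`, and `Read` are hypothetical names, retention policies are not modeled, and the real metadata lives in SQLite rather than in maps.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// VersionedStore is a hypothetical in-memory model: every write appends a
// version, while identical content is stored only once.
type VersionedStore struct {
	blobs    map[string][]byte   // content hash -> data
	versions map[string][]string // object name -> content hash per version
}

func NewVersionedStore() *VersionedStore {
	return &VersionedStore{blobs: map[string][]byte{}, versions: map[string][]string{}}
}

// Write records a new version of name and returns its 1-based number.
func (s *VersionedStore) Write(name string, data []byte) int {
	sum := sha256.Sum256(data)
	hash := hex.EncodeToString(sum[:])
	if _, ok := s.blobs[hash]; !ok {
		s.blobs[hash] = append([]byte(nil), data...)
	}
	s.versions[name] = append(s.versions[name], hash)
	return len(s.versions[name])
}

// Read returns the content of the given version, if it exists.
func (s *VersionedStore) Read(name string, version int) ([]byte, bool) {
	hashes := s.versions[name]
	if version < 1 || version > len(hashes) {
		return nil, false
	}
	return s.blobs[hashes[version-1]], true
}

func main() {
	s := NewVersionedStore()
	s.Write("notes.txt", []byte("draft"))
	s.Write("notes.txt", []byte("final"))
	s.Write("notes.txt", []byte("draft")) // reverting reuses the first blob
	old, _ := s.Read("notes.txt", 1)
	fmt.Println(string(old), len(s.versions["notes.txt"]), len(s.blobs)) // draft 3 2
}
```

Three versions are recorded but only two blobs exist, which is the space saving the last bullet refers to.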

Benefits:

  • 🔙 Point-in-time recovery - restore any previous version
  • 🛡️ Data protection - accidental deletions are recoverable
  • 📚 Audit trail - track all changes over time
  • 💾 Space efficient - unchanged data shared across versions

🔐 Zero-Knowledge Encryption

What it does: End-to-end encryption where only you hold the keys.

How it works:

  • AES-256 encryption (industry standard)
  • Encryption keys never leave your control
  • Optional per-bucket encryption keys
  • Transparent encryption/decryption
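As a rough illustration of client-side AES-256 encryption, the sketch below uses Go's standard library. The README only states "AES-256"; the GCM mode, the nonce-prefixed layout, and the `Encrypt`/`Decrypt` names are assumptions made for this example.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// newGCM builds an AES-256-GCM cipher; key must be exactly 32 bytes.
func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, fmt.Errorf("AES-256 needs a 32-byte key, got %d", len(key))
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// Encrypt returns nonce || ciphertext, sealed with a fresh random nonce.
func Encrypt(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// Decrypt reverses Encrypt and fails if the key or data is wrong.
func Decrypt(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, fmt.Errorf("sealed data too short")
	}
	nonce, ciphertext := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ciphertext, nil)
}

func main() {
	key := make([]byte, 32) // generated and held by the client only
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	sealed, err := Encrypt(key, []byte("private data"))
	if err != nil {
		panic(err)
	}
	plain, err := Decrypt(key, sealed)
	fmt.Println(string(plain), err) // private data <nil>
}
```

The storage side only ever sees `sealed`; without the key it can neither read nor silently alter the content, since GCM authenticates the ciphertext.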

Benefits:

  • 🔒 Maximum security - even storage admins can't read your data
  • ✅ Compliance ready - meets strict security requirements
  • 🛡️ Data privacy - your data, your control
  • 🌍 International standards - AES-256 encryption

🗜 Smart Compression

What it does: Automatically compresses data to save space.

How it works:

  • Configurable compression algorithms (zstd, gzip, etc.)
  • Compression applied before encryption
  • Automatic detection of already-compressed data
  • Per-bucket compression settings
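A simple stand-in for "skip data that does not compress" is to keep the compressed form only when it is actually smaller. The sketch below takes that approach with gzip from the standard library; `Compress` and `Decompress` are hypothetical names, and OrcaS's real detection logic may differ.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// Compress gzips data and keeps the result only if it is smaller than the
// input. The bool reports whether the returned bytes are compressed.
func Compress(data []byte) ([]byte, bool, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, false, err
	}
	if err := zw.Close(); err != nil {
		return nil, false, err
	}
	if buf.Len() >= len(data) {
		return data, false, nil // incompressible or already compressed
	}
	return buf.Bytes(), true, nil
}

// Decompress reverses Compress for data that was stored compressed.
func Decompress(data []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	text := bytes.Repeat([]byte("orcas "), 1000)
	out, kept, err := Compress(text)
	if err != nil {
		panic(err)
	}
	back, err := Decompress(out)
	if err != nil {
		panic(err)
	}
	fmt.Println(kept, len(out) < len(text), bytes.Equal(back, text)) // true true true
}
```

Compression has to run before encryption, as the list above states, because well-encrypted data looks random and no longer compresses.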

Benefits:

  • 💾 Storage savings - typically 30-70% reduction
  • ⚡ Bandwidth savings - less data to transfer
  • 🎯 Smart defaults - works out of the box
  • ⚙️ Configurable - adjust per your needs

🏗️ Architecture & Design

Content Addressable Storage (CAS) Core

OrcaS is built on Content Addressable Storage principles, where data is stored and retrieved by its content hash rather than location.

[Figure: Content Addressable Storage Architecture]

Key Benefits of CAS:

  1. Automatic Deduplication: Identical content stored once, referenced many times
  2. Integrity Verification: Content hash ensures data hasn't been corrupted
  3. Efficient Versioning: New versions only store changed content
  4. Simplified Backup: Same content = same hash = no re-upload needed
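Integrity verification follows directly from addressing data by its hash: re-hash what was read and compare it with the address. A minimal sketch, assuming SHA-256 as the address function and using the hypothetical names `Address`, `Verify`, and `ErrCorrupt`:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// ErrCorrupt is returned when stored bytes no longer match their address.
var ErrCorrupt = errors.New("content does not match its address")

// Address returns the hex SHA-256 of data, used here as its storage key.
func Address(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// Verify checks that data read back from storage still hashes to addr.
func Verify(addr string, data []byte) error {
	if Address(data) != addr {
		return ErrCorrupt
	}
	return nil
}

func main() {
	original := []byte("backup payload")
	addr := Address(original)

	fmt.Println(Verify(addr, original))                 // <nil>
	fmt.Println(Verify(addr, []byte("backup paylaod"))) // content does not match its address
}
```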

System Architecture

[Figure: System Architecture]

Instant Upload Flow

[Figure: Instant Upload Flow]

Data Storage Structure

Storage Layout:
├── Metadata (SQLite)
│   ├── Objects (files, directories)
│   ├── DataInfo (content metadata)
│   ├── Versions (version history)
│   └── References (deduplication)
│
└── Data Blocks (File System)
    └── <bucket_id>/
        └── <hash_prefix>/
            └── <hash>/
                └── <dataID>_<chunk_number>
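To make the data block layout concrete, the sketch below assembles a path of the shape shown above. Several details are assumptions not stated in this README: the `BlockPath` name, the integer types for the bucket and data IDs, and a two-character hash prefix.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// BlockPath builds <root>/<bucket_id>/<hash_prefix>/<hash>/<dataID>_<chunk_number>.
// Assumption: the prefix is the first two characters of the hash.
func BlockPath(root string, bucketID int64, hash string, dataID int64, chunk int) string {
	const prefixLen = 2 // hypothetical prefix length
	prefix := hash
	if len(hash) > prefixLen {
		prefix = hash[:prefixLen]
	}
	return filepath.Join(
		root,
		fmt.Sprint(bucketID),
		prefix,
		hash,
		fmt.Sprintf("%d_%d", dataID, chunk),
	)
}

func main() {
	fmt.Println(BlockPath("/var/orcas/data", 7, "abcdef0123", 42, 0))
}
```

Fanning out by hash prefix keeps any single directory from accumulating too many entries, which is a common reason for this kind of layout.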

📊 Performance Highlights

  • Instant Upload: 99%+ faster for duplicate files (milliseconds vs minutes)
  • Small Files: 10x+ performance improvement with packaging
  • Large Files: Parallel chunk processing for optimal throughput
  • Storage Efficiency: 30-70% space savings with compression + deduplication
  • Concurrent Operations: Optimized for high concurrency

Performance Test Reports:

🔧 Path Management

OrcaS supports flexible path management, allowing you to use different storage paths within the same process. This is useful for multi-tenant scenarios or when managing multiple storage locations.

Creating Handlers with Paths

LocalHandler

NewLocalHandler requires both basePath and dataPath parameters:

import (
    "github.com/orcastor/orcas/core"
)

// Create handler with custom paths
handler := core.NewLocalHandler("/custom/base/path", "/custom/data/path")
defer handler.Close()

// basePath: path for main database and bucket databases
// dataPath: path for data file storage

NoAuthHandler

NewNoAuthHandler requires only the dataPath parameter. basePath is automatically set to an empty string (no main database):

// Create NoAuthHandler (bypasses authentication)
handler := core.NewNoAuthHandler("/custom/data/path")
defer handler.Close()

// Only dataPath is needed, basePath is always empty for NoAuth mode

Creating Admins with Paths

LocalAdmin

NewLocalAdmin requires both basePath and dataPath parameters:

// Create admin with custom paths
admin := core.NewLocalAdmin("/custom/base/path", "/custom/data/path")

// basePath: path for main database and bucket databases
// dataPath: path for data file storage

NoAuthAdmin

NewNoAuthAdmin requires only the dataPath parameter. basePath is automatically set to an empty string (no main database):

// Create NoAuthAdmin (bypasses authentication and permission checks)
admin := core.NewNoAuthAdmin("/custom/data/path")

// Only dataPath is needed, basePath is always empty for NoAuth mode

Path Usage Examples

// The three examples below are alternatives; use one per scope,
// since each declares handler and admin with :=.

// Example: Using current directory for both paths
handler := core.NewLocalHandler(".", ".")
admin := core.NewLocalAdmin(".", ".")

// Example: Separate paths for base and data
handler := core.NewLocalHandler("/var/orcas/base", "/var/orcas/data")
admin := core.NewLocalAdmin("/var/orcas/base", "/var/orcas/data")

// Example: NoAuth mode (no main database, only data path)
handler := core.NewNoAuthHandler("/var/orcas/data")
admin := core.NewNoAuthAdmin("/var/orcas/data")

Benefits

  • 🔄 Multi-tenant Support: Different contexts can use different storage paths
  • 🎯 Flexible Configuration: Specify paths directly when creating handlers/admins
  • ⚙️ NoAuth Mode: Simplified path management for NoAuth handlers/admins (only dataPath needed)
  • 🚀 Process Isolation: Multiple storage locations in the same process

📚 Documentation

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

MIT License - see LICENSE file for details.

⭐ Why Star This Project?

  • 🎯 Production Ready: Battle-tested, actively maintained
  • πŸš€ High Performance: Optimized for real-world workloads
  • πŸ”’ Security First: Zero-knowledge encryption built-in
  • πŸ’Ύ Storage Efficient: Automatic deduplication saves space and costs
  • πŸ› οΈ Easy to Use: S3-compatible API, VFS mount, comprehensive docs
  • 🌟 Innovative: Content Addressable Storage with instant deduplication
  • πŸ“ˆ Actively Developed: Regular updates and improvements
  • 🀝 Open Source: MIT licensed, community-driven

Star us if you find this project useful! ⭐



Directories

Path Synopsis
Package core provides test helper functions for managing test environment
rpc
s3
cmd command
util
Package util provides S3-specific utility functions for S3 API handlers
Package vfs provides ORCAS filesystem implementation, supporting FUSE mounting and random access API
