GhostFS - A File System Simulator
What is GhostFS?
GhostFS is a SQL-backed file system emulator that mimics a cloud storage API such as Dropbox. Instead of dealing with real files and folders, GhostFS creates a virtual file system stored in a DuckDB database that you can traverse, query, and manipulate through a REST API.
Perfect for:
- Testing file migration tools (like ByteWave) without moving real data
- Simulating massive file systems with millions of files and folders
- Prototyping cloud storage integrations with controllable environments
- Load testing file system operations at scale
Why GhostFS?
The Problem
- Testing file migration tools requires terabytes of real data
- Cloud APIs have rate limits and costs during development
- Creating realistic folder structures manually is time-consuming
- Real file systems are slow for large-scale testing
The Solution
- Instant file system generation with configurable depth and complexity
- Minimal storage overhead - millions of "files" in a lightweight database
- Full API control - simulate network issues, auth failures, rate limits
- Realistic testing without the infrastructure costs
Features
Current (v0.1)
- DuckDB Backend - Fast, embedded SQL database
- Intelligent Seeding - Generate realistic folder structures
- Multi-FS Mode - Primary + secondary tables for migration testing
- Probabilistic Subsets - Secondary tables with configurable dst_prob
- REST API - Standard HTTP endpoints for file operations
- Batch Operations - Create/delete multiple items at once
- Table Management - List and manage multiple file systems
- Access Tracking - Automatic tracking of accessed folders via a checked flag
- Write Queues - Non-blocking batch updates for optimal performance
Coming Soon (v0.2+)
- Network Simulation - Configurable latency, jitter, timeouts
- Auth Simulation - Token expiration, permission failures
- Rate Limiting - Simulate API throttling
- Metrics & Analytics - Track usage patterns
- Plugin System - Extend with custom behaviors
Architecture & File System Modes
Single-FS vs Multi-FS Mode
GhostFS operates in one of two modes:
Single-FS Mode (Default)
- Uses only the primary table (nodes)
- Perfect for basic file system testing
- All items exist in one unified file system
Multi-FS Mode (Advanced)
- Uses primary table + secondary tables
- Simulates source → destination migration scenarios
- Secondary tables contain probabilistic subsets of the primary table
- Each item has a dst_prob chance of appearing in secondary tables
How Secondary Tables Work
When generating a file system in Multi-FS mode:
- Primary Table is populated with the complete file system
- Secondary Tables are populated by iterating through primary items
- Each item has a probabilistic chance (based on dst_prob) to be included
- Results in realistic migration scenarios with missing files/folders
Example with dst_prob: 0.7:
Primary Table (Source):          Secondary Table (Destination):
├── folder1/                     ├── folder1/       ✅ (70% chance - included)
│   ├── file1.txt                │   ├── file1.txt  ✅ (70% chance - included)
│   ├── file2.txt                │   └── file3.txt  ✅ (70% chance - included)
│   └── file3.txt                └── folder3/       ✅ (70% chance - included)
├── folder2/                         └── file6.txt  ✅ (70% chance - included)
│   └── file4.txt
├── folder3/                     ❌ folder2/ missing (30% chance - excluded)
│   ├── file5.txt                ❌ file2.txt missing (30% chance - excluded)
│   └── file6.txt                ❌ file4.txt missing (30% chance - excluded)
└── folder4/                     ❌ file5.txt missing (30% chance - excluded)
    └── file7.txt                ❌ folder4/ missing (30% chance - excluded)
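The per-item selection described above can be sketched in Go as follows. This is a minimal illustration and not GhostFS's actual seeding code: the Node type, the sampleSecondary function, and the flat item-by-item pass are assumptions, and how the real seeder treats children of an excluded folder is not specified here.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Node is a hypothetical stand-in for one row of the primary table.
type Node struct {
	ID   string
	Name string
}

// sampleSecondary walks the primary items and keeps each one with
// probability dstProb. draw must return values in [0.0, 1.0).
func sampleSecondary(primary []Node, dstProb float64, draw func() float64) []Node {
	secondary := make([]Node, 0, len(primary))
	for _, n := range primary {
		if draw() < dstProb {
			secondary = append(secondary, n)
		}
	}
	return secondary
}

func main() {
	primary := make([]Node, 0, 10000)
	for i := 0; i < 10000; i++ {
		primary = append(primary, Node{ID: fmt.Sprintf("id-%d", i), Name: fmt.Sprintf("file%d.txt", i)})
	}
	rng := rand.New(rand.NewSource(42)) // fixed seed so runs are repeatable
	secondary := sampleSecondary(primary, 0.7, rng.Float64)
	fmt.Printf("kept %d of %d items (%.1f%%)\n",
		len(secondary), len(primary), 100*float64(len(secondary))/float64(len(primary)))
}
```

Passing the random source as a function keeps the sampling logic testable with a fixed draw; with a real generator the kept fraction only approximates dst_prob.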
System Architecture
┌─────────────────┐    ┌──────────────┐    ┌─────────────────────┐
│   REST API      │────│   GhostFS    │────│      DuckDB         │
│  (Chi Router)   │    │   Server     │    │  ┌─────────────┐    │
└─────────────────┘    └──────────────┘    │  │   Primary   │    │
         │                      │          │  │   Table     │    │
         │             ┌────────▼────────┐ │  │  (nodes)    │    │
         │             │ Table Manager   │ │  └─────────────┘    │
         │             │ (Multi-table)   │ │  ┌─────────────┐    │
         │             └─────────────────┘ │  │ Secondary   │    │
         │                      │          │  │  Table 1    │    │
         │             ┌────────▼────────┐ │  │  (subset)   │    │
         │             │  Write Queue    │─┤  └─────────────┘    │
         │             │  (Batching)     │ │  ┌─────────────┐    │
         │             └─────────────────┘ │  │ Secondary   │    │
         │                                 │  │  Table N    │    │
    ┌────▼─────┐                           │  │  (subset)   │    │
    │  Client  │                           │  └─────────────┘    │
    │   App    │                           └─────────────────────┘
    └──────────┘
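To illustrate the Write Queue box in the diagram, here is a rough sketch of a batching queue in Go. It is not taken from code/db/write_queue.go: the WriteQueue type, its methods, the buffer size, and the idea of queuing folder IDs whose checked flag should be set are all assumptions made for the example.

```go
package main

import "fmt"

// WriteQueue is a hypothetical batching queue: callers enqueue IDs and a
// background goroutine hands them to flush in groups of batchSize.
type WriteQueue struct {
	ch   chan string
	done chan struct{}
}

// NewWriteQueue starts the worker goroutine. flush is called with each full
// batch and once more on Close for any remainder.
func NewWriteQueue(batchSize int, flush func(batch []string)) *WriteQueue {
	q := &WriteQueue{ch: make(chan string, 1024), done: make(chan struct{})}
	go func() {
		defer close(q.done)
		batch := make([]string, 0, batchSize)
		for id := range q.ch {
			batch = append(batch, id)
			if len(batch) == batchSize {
				flush(batch)
				batch = make([]string, 0, batchSize) // fresh slice; flush may keep the old one
			}
		}
		if len(batch) > 0 {
			flush(batch)
		}
	}()
	return q
}

// Enqueue returns immediately unless the channel buffer is full.
func (q *WriteQueue) Enqueue(id string) { q.ch <- id }

// Close stops accepting IDs and waits until the remaining batch is flushed.
func (q *WriteQueue) Close() {
	close(q.ch)
	<-q.done
}

func main() {
	q := NewWriteQueue(3, func(batch []string) {
		// A real flush would issue one UPDATE covering the whole batch.
		fmt.Printf("flushing %d folder IDs: %v\n", len(batch), batch)
	})
	for i := 1; i <= 7; i++ {
		q.Enqueue(fmt.Sprintf("folder-%d", i))
	}
	q.Close()
}
```

Batching this way trades a small delay before writes land for fewer database round trips, which is the usual motivation for a write queue in front of an embedded database.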
Installation & Setup
Prerequisites
Quick Start
# Clone the repository
git clone https://github.com/Voltaic314/GhostFS.git
cd GhostFS
# Install dependencies
go mod download
# Seed the database with sample data
go run main.go
# Start the API server
cd code/api
go run main.go server.go
# Server starts on http://localhost:8086 (configurable via config.json)
Configuration
Create or modify config.json:
Single-FS Mode (Basic)
{
  "database": {
    "path": "GhostFS.db",
    "tables": {
      "primary": {
        "table_name": "nodes",
        "min_child_folders": 2,
        "max_child_folders": 8,
        "min_child_files": 5,
        "max_child_files": 15,
        "min_depth": 3,
        "max_depth": 6
      }
    }
  },
  "network": {
    "address": "localhost",
    "port": 8086
  }
}
Multi-FS Mode (Migration Testing)
{
  "database": {
    "path": "GhostFS.db",
    "tables": {
      "primary": {
        "table_name": "nodes_source",
        "min_child_folders": 3,
        "max_child_folders": 10,
        "min_child_files": 8,
        "max_child_files": 20,
        "min_depth": 4,
        "max_depth": 8
      },
      "secondary": {
        "destination_partial": {
          "table_name": "nodes_dest_partial",
          "dst_prob": 0.7
        },
        "destination_sparse": {
          "table_name": "nodes_dest_sparse",
          "dst_prob": 0.3
        }
      }
    }
  },
  "network": {
    "address": "localhost",
    "port": 8086
  }
}
Configuration Explained:
- dst_prob: 0.7 = each primary item has a 70% chance of appearing in this secondary table (roughly 70% of items overall)
- dst_prob: 0.3 = each primary item has a 30% chance of appearing in this secondary table (roughly 30% of items overall)
- Multiple secondary tables simulate different migration scenarios
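For tooling that needs to read the same config.json, the layout shown above maps onto Go structs roughly as follows. The type names, the parseConfig helper, the dst_prob range check, and the rule that "Multi-FS mode means at least one secondary table" are assumptions inferred from the two example files, not GhostFS's own loader.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PrimaryTable mirrors the "primary" object in config.json.
type PrimaryTable struct {
	TableName       string `json:"table_name"`
	MinChildFolders int    `json:"min_child_folders"`
	MaxChildFolders int    `json:"max_child_folders"`
	MinChildFiles   int    `json:"min_child_files"`
	MaxChildFiles   int    `json:"max_child_files"`
	MinDepth        int    `json:"min_depth"`
	MaxDepth        int    `json:"max_depth"`
}

// SecondaryTable mirrors one entry under "secondary".
type SecondaryTable struct {
	TableName string  `json:"table_name"`
	DstProb   float64 `json:"dst_prob"`
}

// Config mirrors the top-level layout of config.json.
type Config struct {
	Database struct {
		Path   string `json:"path"`
		Tables struct {
			Primary   PrimaryTable              `json:"primary"`
			Secondary map[string]SecondaryTable `json:"secondary"`
		} `json:"tables"`
	} `json:"database"`
	Network struct {
		Address string `json:"address"`
		Port    int    `json:"port"`
	} `json:"network"`
}

// parseConfig decodes the JSON and rejects dst_prob values outside [0, 1].
func parseConfig(raw []byte) (Config, error) {
	var cfg Config
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return Config{}, err
	}
	for name, t := range cfg.Database.Tables.Secondary {
		if t.DstProb < 0 || t.DstProb > 1 {
			return Config{}, fmt.Errorf("secondary table %q: dst_prob %v outside [0, 1]", name, t.DstProb)
		}
	}
	return cfg, nil
}

// IsMultiFS assumes Multi-FS mode is implied by any secondary table.
func (c Config) IsMultiFS() bool { return len(c.Database.Tables.Secondary) > 0 }

func main() {
	raw := []byte(`{"database": {"path": "GhostFS.db", "tables": {
		"primary": {"table_name": "nodes_source", "min_depth": 4, "max_depth": 8},
		"secondary": {"destination_partial": {"table_name": "nodes_dest_partial", "dst_prob": 0.7}}}},
		"network": {"address": "localhost", "port": 8086}}`)
	cfg, err := parseConfig(raw)
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("multi-FS:", cfg.IsMultiFS(), "port:", cfg.Network.Port)
}
```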
API Reference
Base URL: http://localhost:8086
Tables Management
List All File Systems
POST /tables/list
Response:
{
  "success": true,
  "data": {
    "tables": [
      {
        "table_id": "uuid-here",
        "table_name": "nodes",
        "type": "primary"
      }
    ]
  }
}
File System Operations
List Items in Folder
POST /items/list
Content-Type: application/json
{
  "table_id": "uuid-here",
  "folder_id": "root-folder-id",
  "folders_only": false
}
Get Root Folder
GET /items/get_root
Content-Type: application/json
{
  "table_id": "uuid-here"
}
Create Multiple Items
POST /items/new
Content-Type: application/json
{
  "table_id": "uuid-here",
  "parent_id": "parent-folder-id",
  "items": [
    {"name": "New Folder", "type": "folder"},
    {"name": "document.txt", "type": "file", "size": 1024}
  ]
}
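The request body above could be built from typed values in Go as sketched below. NewItem, NewItemsRequest, and buildNewItemsBody are hypothetical names; the check that type is either "folder" or "file" is an assumption based on the two values in the example, and omitting a zero size is a simplification.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// NewItem mirrors one entry of "items" in the /items/new request.
// Size is omitted from the JSON when zero (a simplification for folders).
type NewItem struct {
	Name string `json:"name"`
	Type string `json:"type"`
	Size int64  `json:"size,omitempty"`
}

// NewItemsRequest mirrors the /items/new request body.
type NewItemsRequest struct {
	TableID  string    `json:"table_id"`
	ParentID string    `json:"parent_id"`
	Items    []NewItem `json:"items"`
}

// buildNewItemsBody validates item types and returns the JSON payload.
func buildNewItemsBody(tableID, parentID string, items []NewItem) ([]byte, error) {
	for _, it := range items {
		if it.Type != "folder" && it.Type != "file" {
			return nil, fmt.Errorf("item %q: unknown type %q", it.Name, it.Type)
		}
	}
	return json.Marshal(NewItemsRequest{TableID: tableID, ParentID: parentID, Items: items})
}

func main() {
	body, err := buildNewItemsBody("uuid-here", "parent-folder-id", []NewItem{
		{Name: "New Folder", Type: "folder"},
		{Name: "document.txt", Type: "file", Size: 1024},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(body))
}
```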
Delete Multiple Items
POST /items/delete
Content-Type: application/json
{
  "table_id": "uuid-here",
  "item_ids": ["item-id-1", "item-id-2"]
}
Get Download URLs
POST /items/download
Content-Type: application/json
{
  "table_id": "uuid-here",
  "file_ids": ["file-id-1", "file-id-2"]
}
Usage Examples
Migration Testing Scenarios
Scenario 1: Incomplete Migration Detection
# 1. Generate source file system (primary table)
go run main.go
# 2. List source file system
curl -X POST http://localhost:8086/items/list \
  -H "Content-Type: application/json" \
  -d '{"table_id": "source-table-id", "folder_id": "root"}'
# 3. List destination file system (secondary table with dst_prob: 0.7)
curl -X POST http://localhost:8086/items/list \
  -H "Content-Type: application/json" \
  -d '{"table_id": "dest-table-id", "folder_id": "root"}'
# 4. Compare results - ~30% of files should be missing from destination
# Your migration tool should detect these missing files
Scenario 2: Incremental Sync Validation
// Test your sync tool's ability to detect missing files
sourceItems := ghostfs.ListItems("source-table-id", "root")
destItems := ghostfs.ListItems("dest-partial-table-id", "root")
// Your sync logic should identify missing items
missingItems := findMissingItems(sourceItems, destItems)
// With dst_prob: 0.7, expect ~30% missing items
// Run your incremental sync
syncTool.SyncMissing(missingItems)
// Validate sync completed successfully
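The findMissingItems helper used in Scenario 2 is left to the reader; one possible implementation is sketched here. The Item type and the choice to match items by name and type within a single folder listing are assumptions, since whether IDs are shared between primary and secondary tables is not stated.

```go
package main

import "fmt"

// Item is a hypothetical listing entry with just the fields compared here.
type Item struct {
	ID   string
	Name string
	Type string
}

// itemKey identifies an item within one folder listing by name and type.
type itemKey struct {
	Name string
	Type string
}

// findMissingItems returns the source items that have no counterpart in
// dest, preserving the order of source.
func findMissingItems(source, dest []Item) []Item {
	present := make(map[itemKey]struct{}, len(dest))
	for _, d := range dest {
		present[itemKey{Name: d.Name, Type: d.Type}] = struct{}{}
	}
	var missing []Item
	for _, s := range source {
		if _, ok := present[itemKey{Name: s.Name, Type: s.Type}]; !ok {
			missing = append(missing, s)
		}
	}
	return missing
}

func main() {
	source := []Item{
		{ID: "1", Name: "file1.txt", Type: "file"},
		{ID: "2", Name: "file2.txt", Type: "file"},
		{ID: "3", Name: "folder2", Type: "folder"},
	}
	dest := []Item{{ID: "9", Name: "file1.txt", Type: "file"}}
	for _, m := range findMissingItems(source, dest) {
		fmt.Printf("missing %s: %s\n", m.Type, m.Name)
	}
}
```

The map lookup keeps the comparison linear in the size of the two listings, which matters when a folder holds many children.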
// Connect to GhostFS
client := ghostfs.NewClient("http://localhost:8086")
// List available file systems
tables, err := client.ListTables()
if err != nil {
    log.Fatal(err)
}
tableID := tables[0].TableID
// Get root folder contents
items, err := client.ListItems(tableID, "root")
if err != nil {
    log.Fatal(err)
}
// Simulate migrating files
for _, item := range items {
    if item.Type == "file" {
        // Your migration logic here
        downloadURL, err := client.GetDownloadURL(tableID, item.ID)
        if err != nil {
            log.Printf("skipping %s: %v", item.ID, err)
            continue
        }
        _ = downloadURL // Process file...
    }
}
Testing Rclone Integration
# Use GhostFS as a WebDAV endpoint (coming soon)
rclone sync ghostfs:/ local:backup/ --dry-run
# Or use the REST API directly
curl -X POST http://localhost:8086/items/list \
-H "Content-Type: application/json" \
-d '{"table_id": "your-table-id", "folder_id": "root"}'
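The same /items/list call can be prepared from Go with only the standard library, as in this sketch. The listItemsRequest type and newListItemsRequest function are hypothetical names; the JSON fields follow the List Items in Folder request in the API reference.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// listItemsRequest mirrors the /items/list request body.
type listItemsRequest struct {
	TableID     string `json:"table_id"`
	FolderID    string `json:"folder_id"`
	FoldersOnly bool   `json:"folders_only"`
}

// newListItemsRequest builds the POST request without sending it.
func newListItemsRequest(baseURL, tableID, folderID string, foldersOnly bool) (*http.Request, error) {
	body, err := json.Marshal(listItemsRequest{TableID: tableID, FolderID: folderID, FoldersOnly: foldersOnly})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/items/list", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newListItemsRequest("http://localhost:8086", "your-table-id", "root", false)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
	// With a GhostFS server running, send it with:
	//   resp, err := http.DefaultClient.Do(req)
	// and close resp.Body after reading the response.
}
```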
Development
Project Structure
GhostFS/
├── code/
│   ├── api/                 # REST API server
│   │   ├── routes/
│   │   │   ├── tables/      # Table management endpoints
│   │   │   └── items/       # File/folder CRUD endpoints
│   │   ├── main.go          # API server entry point
│   │   └── server.go        # Server configuration
│   ├── db/                  # Database layer
│   │   ├── tables/          # Table management
│   │   ├── seed/            # Database seeding
│   │   └── write_queue.go   # Batched writes
│   └── types/               # Shared types
│       ├── api/             # API response types
│       └── db/              # Database schema types
├── config.json              # Configuration
└── main.go                  # Seeder entry point
Contributing
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Use Cases
- ByteWave - File migration testing and validation
- Cloud Storage SDKs - Integration testing
- Backup Tools - Restore process validation
- File Sync Apps - Conflict resolution testing
- Performance Testing - Large-scale file operation benchmarks
Roadmap
- v0.2 - Network simulation (latency, failures)
- v0.3 - Authentication simulation
- v0.4 - Rate limiting and quotas
- v0.5 - WebDAV/S3 protocol support
- v1.0 - Plugin system and custom behaviors
License
This project is licensed under the MIT License - see the LICENSE file for details.
Built with ❤️ for the file migration and sync testing community
Star this repo if you find it useful!