# Elastickv

## Overview

Elastickv is an experimental project building a distributed key-value store optimized for cloud environments, similar in spirit to DynamoDB. The project is currently in the planning and development phase, with the goal of incorporating advanced features such as Raft-based data replication, dynamic node scaling, and automatic hot-spot re-allocation. Elastickv aims to be a next-generation cloud data storage solution that combines efficiency with scalability.

**THIS PROJECT IS CURRENTLY UNDER DEVELOPMENT AND IS NOT READY FOR PRODUCTION USE.**
## Implemented Features (Verified)

- Raft-based Data Replication: KV state replication is implemented on Raft, with leader-based commit and follower forwarding paths.
- Shard-aware Data Plane: Static shard ranges across multiple Raft groups with shard routing/coordinator are implemented.
- Durable Route Control Plane (Milestone 1): Durable route catalog, versioned route snapshot apply, watcher-based route refresh, and manual `ListRoutes`/`SplitRange` (same-group split) are implemented.
- Protocol Adapters: gRPC (`RawKV`/`TransactionalKV`), Redis (core commands plus MULTI/EXEC and list operations), and DynamoDB-compatible API implementations are available (runtime exposure depends on the selected server entrypoint/configuration).
- DynamoDB Compatibility Scope: `CreateTable`/`DeleteTable`/`DescribeTable`/`ListTables`/`PutItem`/`GetItem`/`DeleteItem`/`UpdateItem`/`Query`/`Scan`/`BatchWriteItem`/`TransactWriteItems` are implemented.
- Basic Consistency Behaviors: Write-after-read checks, leader redirection/forwarding paths, and OCC conflict detection for transactional writes are covered by tests.
## Planned Features

- Dynamic Node Scaling: Automatic node/range scaling based on load is not yet implemented (current sharding operations are configuration/manual driven).
- Automatic Hot Spot Re-allocation: Automatic hotspot detection/scheduling and cross-group relocation are not yet implemented (Milestone 1 currently provides manual same-group split).
## Development Status

Elastickv is in an experimental, developmental phase, working toward features comparable to industry standards such as DynamoDB, tailored for cloud infrastructure. We welcome contributions, ideas, and feedback as we work through the intricacies of building a scalable, efficient, cloud-optimized distributed key-value store.
## Architecture

Architecture diagrams are available in:

- `docs/architecture_overview.md`

Deployment/runbook documents:

- `docs/docker_multinode_manual_run.md` (manual `docker run`, 4-5 node cluster on multiple VMs, no Docker Compose)
## Metrics and Grafana

Elastickv exposes Prometheus metrics on `--metricsAddress` (default: `localhost:9090` in `main.go`, `127.0.0.1:9090` in `cmd/server/demo.go` single-node mode). The built-in 3-node demo binds metrics on `0.0.0.0:9091`, `0.0.0.0:9092`, and `0.0.0.0:9093`, and uses the bearer token `demo-metrics-token` unless `--metricsToken` is set.

The exported metrics cover:

- DynamoDB-compatible API request rate, success/error split, latency, request/response size, and per-table read/write item counts
- Raft local state, leader identity, current members, commit/applied index, and leader contact lag
Provisioned monitoring assets live under:

- `monitoring/prometheus/prometheus.yml`
- `monitoring/grafana/dashboards/elastickv-cluster-overview.json`
- `monitoring/grafana/provisioning/`
- `monitoring/docker-compose.yml`
If you bind `--metricsAddress` to a non-loopback address, `--metricsToken` is required, and Prometheus must send the same bearer token, for example:

```yaml
scrape_configs:
  - job_name: elastickv
    authorization:
      type: Bearer
      credentials: YOUR_METRICS_TOKEN
```

To scrape a multi-node deployment, bind `--metricsAddress` to each node's private IP and set `--metricsToken`, for example `--metricsAddress "10.0.0.11:9090" --metricsToken "YOUR_METRICS_TOKEN"`.
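Putting the pieces together, a complete scrape job for a three-node deployment might look like the sketch below. Only `10.0.0.11` comes from the example above; the `10.0.0.12`/`10.0.0.13` targets are illustrative placeholders for the other nodes' private IPs.

```yaml
scrape_configs:
  - job_name: elastickv
    authorization:
      type: Bearer
      credentials: YOUR_METRICS_TOKEN
    static_configs:
      - targets:
          - 10.0.0.11:9090
          - 10.0.0.12:9090
          - 10.0.0.13:9090
```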
For the local 3-node demo, start Grafana and Prometheus with:

```shell
cd monitoring
docker compose up -d
```

`monitoring/prometheus/prometheus.yml` assumes the demo token `demo-metrics-token`. If you override `--metricsToken` when running `go run ./cmd/server/demo.go`, update `authorization.credentials` in that file to match.
## Example Usage

This section provides sample commands that demonstrate how to use the project. Make sure you have the necessary dependencies installed before running them.

### Starting the Server

To start the server, use the following command:

```shell
go run cmd/server/demo.go
```
### Migrating Legacy BoltDB Raft Storage

Recent versions store Raft logs and stable state in Pebble (`raft.db`) instead of the legacy BoltDB files (`logs.dat` and `stable.dat`). If startup fails with:

```
legacy boltdb Raft storage "logs.dat" found in ...
```

stop the node and run the offline migrator against the directory shown in the error:

```shell
go run ./cmd/raft-migrate --dir /var/lib/elastickv/n1
mv /var/lib/elastickv/n1/logs.dat /var/lib/elastickv/n1/logs.dat.bak
mv /var/lib/elastickv/n1/stable.dat /var/lib/elastickv/n1/stable.dat.bak
```

For multi-group layouts, pass the exact group directory from the error message (for example `/var/lib/elastickv/n1/group-1`).

After that, start Elastickv normally. The migrator leaves the legacy files in place as a backup, but they must be moved or removed before startup because the server intentionally refuses to run while `logs.dat` or `stable.dat` are still present.
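When several directories need migrating, the manual steps above can be wrapped in a small helper. This is only a sketch: `migrate_group` is a hypothetical name, not part of the project's tooling, and it simply mirrors the migrate-then-rename sequence shown in this section.

```shell
# Hypothetical helper mirroring the manual migration steps above.
# Runs the offline migrator, then renames the legacy BoltDB files so
# the server will start. Does nothing if no legacy files are present.
migrate_group() {
  dir="$1"
  if [ -f "$dir/logs.dat" ]; then
    go run ./cmd/raft-migrate --dir "$dir" &&
      mv "$dir/logs.dat" "$dir/logs.dat.bak" &&
      mv "$dir/stable.dat" "$dir/stable.dat.bak"
  fi
}

# For multi-group layouts, call it once per group directory, e.g.:
# migrate_group /var/lib/elastickv/n1/group-1
```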
To expose metrics on a dedicated port:

```shell
go run . \
  --address "127.0.0.1:50051" \
  --redisAddress "127.0.0.1:6379" \
  --dynamoAddress "127.0.0.1:8000" \
  --metricsAddress "127.0.0.1:9090" \
  --raftId "n1"
```
### Starting the Client

To start the client, use this command:

```shell
go run cmd/client/client.go
```
### Working with Redis

To start the Redis client:

```shell
redis-cli -p 63791
```
### Setting and Getting Key-Value Pairs

To set a key-value pair and retrieve it:

```
set key value
get key
quit
```
### Connecting to a Follower Node

To connect to a follower node and read a replicated key:

```
redis-cli -p 63792
get key
```
### Redirecting Set Operations to Leader Node

A write sent to a follower is forwarded to the leader, and the replicated value can then be read from any node:

```
redis-cli -p 63792
set bbbb 1234
get bbbb
quit
redis-cli -p 63793
get bbbb
quit
redis-cli -p 63791
get bbbb
quit
```
## Manual Route Split API (Milestone 1)

Milestone 1 includes manual control-plane APIs on `proto.Distribution`:

- `ListRoutes`
- `SplitRange` (same-group split only)

Use `grpcurl` against a running node:

```shell
# 1) Read the current durable route catalog
grpcurl -plaintext -d '{}' localhost:50051 proto.Distribution/ListRoutes

# 2) Split route 1 at user key "g" (bytes are base64 in grpcurl JSON: "g" -> "Zw==")
grpcurl -plaintext -d '{
  "expectedCatalogVersion": 1,
  "routeId": 1,
  "splitKey": "Zw=="
}' localhost:50051 proto.Distribution/SplitRange
```
Example `SplitRange` response:

```json
{
  "catalogVersion": "2",
  "left": {
    "routeId": "3",
    "start": "",
    "end": "Zw==",
    "raftGroupId": "1",
    "state": "ROUTE_STATE_ACTIVE",
    "parentRouteId": "1"
  },
  "right": {
    "routeId": "4",
    "start": "Zw==",
    "end": "bQ==",
    "raftGroupId": "1",
    "state": "ROUTE_STATE_ACTIVE",
    "parentRouteId": "1"
  }
}
```
Notes:

- `expectedCatalogVersion` must match the latest `ListRoutes.catalogVersion`.
- `splitKey` must be strictly inside the parent range (not equal to the range start/end).
- Milestone 1 split keeps both children in the same Raft group as the parent.
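Because grpcurl's JSON form encodes `bytes` fields as base64, you can compute the `splitKey` encoding for any user key, and decode the boundaries a response returns, with standard tools:

```shell
# Encode user key "g" for use as splitKey in a SplitRange request body
printf 'g' | base64         # prints: Zw==

# Decode a range boundary returned by ListRoutes/SplitRange
printf 'bQ==' | base64 -d   # prints: m
```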
## Development

### Running Jepsen tests

Jepsen tests live in `jepsen/`. Install Leiningen and run the tests locally:

```shell
curl -L https://raw.githubusercontent.com/technomancy/leiningen/stable/bin/lein > ~/lein
chmod +x ~/lein
(cd jepsen && ~/lein test)
```

These Jepsen tests execute concurrent read and write operations while a nemesis injects random network partitions; Jepsen's linearizability checker then verifies the history.
### Setup pre-commit hooks

```shell
git config --local core.hooksPath .githooks
```