A correctness-first, embeddable distributed key-value database built in Rust.
KayaDB is the database for people who believe crashes should be test cases, not horror stories.
The complete documentation is published on GitHub Pages:
→ https://tuntii.github.io/KayaDB/
It includes:
- Installation (crates.io, release binaries, build from source)
- Getting started, CLI reference, and runbooks
- Architecture, specifications (WAL, LSM, recovery, Raft…)
- Correctness testing (SimDisk, Jepsen, chaos-matrix CI)
- Security, releases, contributing, and roadmap
Sources live in docs/ (Docsify on GitHub Pages; SUMMARY.md kept for GitBook compatibility).
KayaDB is an open-source storage engine and distributed key-value database prototype designed around a simple thesis:
If a storage bug cannot be reproduced, inspected, and turned into an invariant, it is not really fixed.
The project combines an LSM-tree storage engine, a write-ahead log, deterministic disk fault injection (SimDisk), a replayable simulator, a Raft prototype, a TCP server, an async Rust client, and an operator CLI — all inside one intentionally small Rust workspace.
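The thesis above (reproduce, inspect, turn into an invariant) can be illustrated with a seeded reference-model check. The following is a minimal sketch, not KayaDB's simulator: `ToyLsm`, `next`, and `check_seed` are hypothetical names, and the toy store only mimics the memtable-plus-flushed-runs shape of an LSM tree. The point is that a failing seed and step index are enough to replay the exact same workload.

```rust
use std::collections::BTreeMap;

/// Toy LSM (hypothetical): a mutable memtable plus immutable flushed runs,
/// newest run last. A `None` value is a tombstone.
#[derive(Default)]
struct ToyLsm {
    memtable: BTreeMap<u8, Option<u32>>,
    runs: Vec<BTreeMap<u8, Option<u32>>>,
}

impl ToyLsm {
    fn put(&mut self, key: u8, value: u32) {
        self.memtable.insert(key, Some(value));
    }

    fn delete(&mut self, key: u8) {
        self.memtable.insert(key, None);
    }

    /// Freezes the memtable into a new run.
    fn flush(&mut self) {
        if !self.memtable.is_empty() {
            self.runs.push(std::mem::take(&mut self.memtable));
        }
    }

    /// Reads the memtable first, then runs from newest to oldest.
    fn get(&self, key: u8) -> Option<u32> {
        if let Some(entry) = self.memtable.get(&key) {
            return *entry;
        }
        for run in self.runs.iter().rev() {
            if let Some(entry) = run.get(&key) {
                return *entry;
            }
        }
        None
    }
}

/// Small deterministic PRNG (an LCG) so a failing seed can be replayed exactly.
fn next(state: &mut u64) -> u64 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    *state >> 33
}

/// Runs a seeded workload against the toy store and a BTreeMap reference model.
/// Returns Err(step) at the first step where the two disagree.
fn check_seed(seed: u64, steps: usize) -> Result<(), usize> {
    let mut rng = seed;
    let mut store = ToyLsm::default();
    let mut model: BTreeMap<u8, u32> = BTreeMap::new();
    for step in 0..steps {
        let key = (next(&mut rng) % 16) as u8;
        match next(&mut rng) % 4 {
            0 => {
                store.delete(key);
                model.remove(&key);
            }
            1 => store.flush(),
            _ => {
                let value = next(&mut rng) as u32;
                store.put(key, value);
                model.insert(key, value);
            }
        }
        // Invariant: the store and the model agree on the key just touched.
        if store.get(key) != model.get(&key).copied() {
            return Err(step);
        }
    }
    Ok(())
}
```

Calling `check_seed(seed, steps)` over a range of seeds gives a cheap regression net; any `Err(step)` is reproducible by rerunning the same seed.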
Full documentation: docs/README.md · Installation · Getting started
This README is a high-level overview. Deep dives live on the docs site.
# CLI + embedded mode (no server needed)
cargo install kayactl
# Server binary
cargo install kaya-server --bin kayadb-server
Pre-built binaries: GitHub Releases (v0.1.43 and later).
Rust library: kaya-engine = "0.1.43" — see installation guide.
| Area | Status | Notes |
|---|---|---|
| Write-ahead log | ✅ Implemented | CRC32C-protected records, append, recovery, inspection |
| LSM storage | ✅ Implemented | Memtable, SSTable, manifest, flush, L0 compaction |
| Crash recovery | ✅ Implemented | Durable-prefix recovery, tail truncation, idempotence tests |
| Deterministic disk faults | ✅ Implemented | SimDisk, FaultSchedule, replayable operation ordering |
| Simulator | ✅ Implemented | Seeded workloads, trace replay, reference-model checks |
| Fuzz targets | ✅ Implemented | WAL, SSTable, manifest, server command frame decoders |
| Raft state machine | ✅ Prototype | Election, AppendEntries, commit index, simulation coverage |
| TCP cluster mode | ✅ Prototype | Joint-consensus membership, leader-routed client operations |
| Async Rust client | ✅ Implemented | kaya-client with leader redirection support |
| Operator CLI | ✅ Implemented | Local mode, server mode, inspect, stats, add-node/remove-node |
| Production hardening | ✅ M13 | Native TLS (tls feature), operator token, mTLS sidecar runbooks, chaos validation |
| Storage algorithms | 🟡 M14 | Pluggable compaction policy, SSTable bloom filter, WAL group-commit batching |
KayaDB completed M13 productization (2026-06-21) and is now in M14 correctness+algorithm (v0.1.43): deeper LSM policies, module splits, and expanded Jepsen/chaos CI. It remains a correctness-first distributed KV engine with documented day-2 runbooks — not a fully hardened multi-tenant SaaS database. See security and deployment notes and accepted risks (§7) before any production-like deployment.
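To make the "CRC32C-protected records" and "durable-prefix recovery, tail truncation" rows more concrete, here is a minimal sketch. It assumes a hypothetical frame layout of `[len: u32 LE][crc: u32 LE][payload]`; KayaDB's real on-disk format is defined in its WAL specification and may differ. `append_record` and `recover` are illustrative names, not KayaDB APIs.

```rust
use std::convert::TryInto;

/// CRC32C (Castagnoli), bitwise, using the reflected polynomial 0x82F63B78.
fn crc32c(data: &[u8]) -> u32 {
    let mut crc: u32 = !0;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All ones when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

/// Appends one record using the assumed frame layout.
fn append_record(log: &mut Vec<u8>, payload: &[u8]) {
    log.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    log.extend_from_slice(&crc32c(payload).to_le_bytes());
    log.extend_from_slice(payload);
}

/// Scans the log and returns every valid record plus the byte length of the
/// durable prefix. Scanning stops at the first torn or corrupt frame.
fn recover(log: &[u8]) -> (Vec<Vec<u8>>, usize) {
    let mut records = Vec::new();
    let mut offset = 0;
    while log.len() - offset >= 8 {
        let len = u32::from_le_bytes(log[offset..offset + 4].try_into().unwrap()) as usize;
        let crc = u32::from_le_bytes(log[offset + 4..offset + 8].try_into().unwrap());
        let start = offset + 8;
        let end = match start.checked_add(len) {
            Some(end) if end <= log.len() => end,
            _ => break, // payload runs past the end of the log: torn write
        };
        let payload = &log[start..end];
        if crc32c(payload) != crc {
            break; // checksum mismatch: corrupt frame
        }
        records.push(payload.to_vec());
        offset = end;
    }
    (records, offset)
}
```

Truncating the log to the returned prefix length and calling `recover` again yields the same records and the same length, which is the idempotence property the table refers to.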
- Rust 1.85 or newer
- Cargo
- Linux, macOS, or Windows for development
git clone https://github.com/Tuntii/KayaDB.git
cd KayaDB
cargo build --workspace
cargo test --workspace
CI gates on:
cargo fmt --all -- --check
cargo clippy --workspace --all-targets -- -D warnings
cargo test --workspace
kayactl can open an embedded engine directly against a data directory:
cargo run -p kayactl -- --data ./.kayadb-demo put hello world
cargo run -p kayactl -- --data ./.kayadb-demo get hello
cargo run -p kayactl -- --data ./.kayadb-demo scan he
cargo run -p kayactl -- --data ./.kayadb-demo stats
cargo run -p kayactl -- --data ./.kayadb-demo recover --dry-run
In one terminal:
cargo run -p kaya-server --bin kayadb-server -- --data ./.kaya-node1 --raft-addr 127.0.0.1:7481 --client-addr 127.0.0.1:7379
In another terminal:
cargo run -p kayactl -- --server 127.0.0.1:7379 put hello world
cargo run -p kayactl -- --server 127.0.0.1:7379 get hello
cargo run -p kayactl -- --server 127.0.0.1:7379 status
Start one command per terminal:
cargo run -p kaya-server --bin kayadb-server -- --node-id 1 --raft-addr 127.0.0.1:7481 --client-addr 127.0.0.1:7379 --peer 2=127.0.0.1:7482,127.0.0.1:7380 --peer 3=127.0.0.1:7483,127.0.0.1:7381 --data ./.kaya-node1
cargo run -p kaya-server --bin kayadb-server -- --node-id 2 --raft-addr 127.0.0.1:7482 --client-addr 127.0.0.1:7380 --peer 1=127.0.0.1:7481,127.0.0.1:7379 --peer 3=127.0.0.1:7483,127.0.0.1:7381 --data ./.kaya-node2
cargo run -p kaya-server --bin kayadb-server -- --node-id 3 --raft-addr 127.0.0.1:7483 --client-addr 127.0.0.1:7381 --peer 1=127.0.0.1:7481,127.0.0.1:7379 --peer 2=127.0.0.1:7482,127.0.0.1:7380 --data ./.kaya-node3
Then talk to any node. If the contacted node knows the leader, kayactl and kaya-client can follow the redirect:
cargo run -p kayactl -- --server 127.0.0.1:7379 put user:1 ada
cargo run -p kayactl -- --server 127.0.0.1:7380 get user:1
cargo run -p kayactl -- --server 127.0.0.1:7381 status --json
For a longer walkthrough, see docs/getting-started.md.
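The redirect behaviour described for the three-node cluster can be sketched as a small retry loop. This is an in-memory illustration only: `Reply`, `FakeCluster`, `cluster_with_views`, and `request_with_redirects` are hypothetical and do not reflect the kaya-client wire protocol or API. It assumes a non-leader answers with an optional leader hint and that the client gives up after a bounded number of hops.

```rust
use std::collections::HashMap;

/// Hypothetical reply shape: success, or a hint about who the leader is.
#[derive(Debug, Clone, PartialEq)]
enum Reply {
    Ok(String),
    NotLeader { leader: Option<String> },
}

/// Stand-in for a cluster: maps each client address to that node's current
/// view of the leader's client address (None if it does not know).
struct FakeCluster {
    leader_view: HashMap<String, Option<String>>,
}

impl FakeCluster {
    fn send(&self, addr: &str, request: &str) -> Reply {
        match self.leader_view.get(addr) {
            // A node that believes it is the leader applies the request.
            Some(Some(leader)) if leader.as_str() == addr => {
                Reply::Ok(format!("applied {}", request))
            }
            Some(view) => Reply::NotLeader { leader: view.clone() },
            None => Reply::NotLeader { leader: None },
        }
    }
}

/// Builds a FakeCluster from (address, leader view) pairs.
fn cluster_with_views(views: &[(&str, Option<&str>)]) -> FakeCluster {
    let leader_view: HashMap<String, Option<String>> = views
        .iter()
        .map(|&(addr, view)| (addr.to_string(), view.map(|v| v.to_string())))
        .collect();
    FakeCluster { leader_view }
}

/// Sends a request, following leader hints for at most `max_hops` redirects.
fn request_with_redirects(
    cluster: &FakeCluster,
    first: &str,
    request: &str,
    max_hops: usize,
) -> Result<String, String> {
    let mut target = first.to_string();
    for _ in 0..=max_hops {
        match cluster.send(&target, request) {
            Reply::Ok(value) => return Ok(value),
            Reply::NotLeader { leader: Some(next) } => target = next,
            Reply::NotLeader { leader: None } => {
                return Err(format!("{} does not know the leader", target));
            }
        }
    }
    Err("too many redirects".to_string())
}
```

Bounding the hop count matters during elections, when two nodes can briefly point at each other; the sketch returns an error in that case instead of looping.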
Use kaya-engine when you want the storage engine in-process:
use std::sync::Arc;

use kaya_core::{DurabilityMode, EngineConfig};
use kaya_engine::{Engine, ReadOptions, WriteOptions};
use kaya_io::FileDisk;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let data_dir = std::env::temp_dir().join("kayadb_embedded_example");
    let config = EngineConfig {
        data_dir: data_dir.clone(),
        ..EngineConfig::default()
    };
    let disk = Arc::new(FileDisk::new(data_dir));
    let mut engine = Engine::open(config, disk).await?;
    engine
        .put(
            b"hello".to_vec(),
            b"world".to_vec(),
            WriteOptions {
                durability: Some(DurabilityMode::Strict),
                idempotency_key: None,
            },
        )
        .await?;
    let value = engine.get(b"hello", ReadOptions::default()).await?;
    assert_eq!(value.as_deref(), Some(&b"world"[..]));
    Ok(())
}
See crates/kaya-engine/examples/embedded.rs.
Use kaya-client when your application talks to a running server:
use std::net::SocketAddr;

use kaya_client::KayaClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let addr: SocketAddr = "127.0.0.1:7379".parse()?;
    let mut client = KayaClient::connect(addr).await?;
    client.put(b"hello", b"world").await?;
    if let Some(value) = client.get(b"hello").await? {
        println!("{}", String::from_utf8_lossy(&value));
    }
    Ok(())
}
See crates/kaya-client/examples/.
For the full architecture, crate map, data flows, write/read paths, and design decisions, see the Architecture chapter in the documentation.
KayaDB emphasizes design-first + correctness-first development:
- All persistent formats are inspectable via kayactl inspect
- The same engine code runs against real FileDisk and deterministic SimDisk
- Extensive use of seeded simulation, trace replay, crash/recovery idempotence tests, and fuzzing
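The idea of running one code path against both a real disk and a deterministic faulty one can be sketched with a small trait. Everything here is hypothetical and much simpler than KayaDB's FileDisk, SimDisk, or FaultSchedule: `Disk`, `MemDisk`, `FaultyDisk`, and `run_workload` are illustrative names, `MemDisk` stands in for a file-backed disk, and faults are chosen by a seeded xorshift generator so the same seed always fails the same operations.

```rust
/// Error returned when the simulated disk injects a failure.
#[derive(Debug, Clone, PartialEq)]
enum DiskError {
    InjectedFault { op: u64 },
}

/// Minimal disk abstraction (hypothetical; a real one would cover sync, reads, etc.).
trait Disk {
    fn append(&mut self, data: &[u8]) -> Result<(), DiskError>;
}

/// Always succeeds; stands in for a real file-backed disk.
#[derive(Default)]
struct MemDisk {
    bytes: Vec<u8>,
}

impl Disk for MemDisk {
    fn append(&mut self, data: &[u8]) -> Result<(), DiskError> {
        self.bytes.extend_from_slice(data);
        Ok(())
    }
}

/// Wraps a MemDisk and fails appends chosen by a seeded xorshift PRNG.
struct FaultyDisk {
    inner: MemDisk,
    state: u64,
    fail_one_in: u64,
    op: u64,
}

impl FaultyDisk {
    fn new(seed: u64, fail_one_in: u64) -> Self {
        // xorshift needs a non-zero state, so seed 0 is remapped.
        let state = if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed };
        FaultyDisk {
            inner: MemDisk::default(),
            state,
            fail_one_in: fail_one_in.max(1),
            op: 0,
        }
    }

    fn next_random(&mut self) -> u64 {
        self.state ^= self.state << 13;
        self.state ^= self.state >> 7;
        self.state ^= self.state << 17;
        self.state
    }
}

impl Disk for FaultyDisk {
    fn append(&mut self, data: &[u8]) -> Result<(), DiskError> {
        self.op += 1;
        if self.next_random() % self.fail_one_in == 0 {
            return Err(DiskError::InjectedFault { op: self.op });
        }
        self.inner.append(data)
    }
}

/// "Engine" code written once against the trait; returns the outcome of each write.
fn run_workload<D: Disk>(disk: &mut D, writes: u64) -> Vec<Result<(), DiskError>> {
    (0..writes).map(|i| disk.append(&i.to_le_bytes())).collect()
}
```

Because `run_workload` is generic over `Disk`, the trace produced with `FaultyDisk::new(seed, n)` can be compared across runs; two runs with the same seed produce identical traces, which is what makes a fault reproducible.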
Inspect any on-disk artifact:
kayactl inspect wal ./data/wal-000001.wal
kayactl inspect sstable ./data/sst-000001.sst
kayactl --data ./data recover --dry-run --json
Benchmark suite (Criterion):
cargo bench -p kaya-bench
Detailed numbers and methodology live in BENCHMARKS.md.
See the full status and roadmap and the tracked productization north star (M13 exit gates — prototype → deployable product).
M13 productization is complete (durable Raft state, TLS, operator token, day-2 runbooks). M14 (v0.1.43) adds compaction policy selection, SSTable bloom filters, WAL batching, god-file splits, and Jepsen/chaos-matrix CI gates. Remaining deployment hardening (data-at-rest, multi-tenant, audit logging) is documented as accepted risk in security.md §7. Planned next: Jepsen full suite hardening and Linux io_uring backend — see ROADMAP.md.
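As background for the SSTable bloom filter item, here is a minimal, generic Bloom filter sketch. It is not KayaDB's implementation: `BloomFilter` and its methods are hypothetical, the double-hashing scheme and the use of the standard library's `DefaultHasher` are assumptions chosen to keep the example dependency-free, and a production filter would use a packed bit array and a hash that is stable across versions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Fixed-size Bloom filter using double hashing: position_i = h1 + i * h2.
struct BloomFilter {
    bits: Vec<bool>,
    hashes: u64,
}

impl BloomFilter {
    fn new(num_bits: usize, hashes: u64) -> Self {
        BloomFilter {
            bits: vec![false; num_bits.max(1)],
            hashes: hashes.max(1),
        }
    }

    /// Derives two hashes of the key; the second is salted and forced odd.
    fn hash_pair(key: &[u8]) -> (u64, u64) {
        let mut first = DefaultHasher::new();
        key.hash(&mut first);
        let mut second = DefaultHasher::new();
        (0xB10F_u64, key).hash(&mut second);
        (first.finish(), second.finish() | 1)
    }

    /// Bit positions probed for a key.
    fn positions(&self, key: &[u8]) -> Vec<usize> {
        let (h1, h2) = Self::hash_pair(key);
        let len = self.bits.len() as u64;
        (0..self.hashes)
            .map(|i| (h1.wrapping_add(i.wrapping_mul(h2)) % len) as usize)
            .collect()
    }

    fn insert(&mut self, key: &[u8]) {
        for pos in self.positions(key) {
            self.bits[pos] = true;
        }
    }

    /// False means the key was definitely never inserted;
    /// true means it may have been (false positives are possible).
    fn may_contain(&self, key: &[u8]) -> bool {
        self.positions(key).into_iter().all(|pos| self.bits[pos])
    }
}
```

In an LSM read path, a filter like this would typically be consulted before opening an SSTable: a `false` answer lets the read skip that table entirely, while a `true` answer still requires the actual lookup.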
For the complete picture, use the official documentation or docs/README.md.
KayaDB is open source and contributor-friendly, but correctness culture is non-negotiable.
- Please read CONTRIBUTING.md before submitting changes.
- We follow the Contributor Covenant Code of Conduct.
- Security vulnerabilities: see .github/SECURITY.md.
Run the full checks locally before pushing:
cargo fmt --all -- --check
cargo clippy --workspace --all-targets -- -D warnings
cargo test --workspace
Good first areas are listed inside CONTRIBUTING.md.
KayaDB is dual-licensed under MIT OR Apache-2.0.
You may choose either license when using or contributing to the project.
KayaDB: make the storage layer explain itself.
If this project interests you, star it, break it, replay it, and send the invariant back as a PR.