Home
Bare-Metal AI Operating System

- Current Status: 95% complete (AI Runtime: 100%)
- Blocking: Native hardware testing only
- Last Updated: 29 January 2026
EMBODIOS is the world's first bare-metal AI operating system - where the AI model runs directly on hardware without a traditional OS layer. No userspace. No OS overhead. Just transformers and hardware.
What started as a Friday night experiment has evolved into a production roadmap demonstrating that kernel-space AI inference is not only possible, but offers significant advantages for specific use cases.
| Component | Scope | Completion |
|---|---|---|
| Kernel Foundation | Memory, boot, interrupts, DMA, scheduler | 95% ✅ |
| AI Runtime | GGUF, BPE, streaming inference, quantization | 100% ✅ |
| Drivers | NVMe, VirtIO, e1000e, PCI, TCP/IP, Industrial | 85% ✅ |
| Performance | SIMD, parallel inference, benchmarks | 90% ✅ |
| Documentation | Wiki, README, Contributing guide | 100% ✅ |
| Overall | Ready for v1.0 - hardware testing only | 95% |
- ✅ Interactive Chat Mode: `talk` command for dedicated conversation sessions
- ✅ Performance Stats: Separate `perf` command for timing metrics
- ✅ Console UX: Polished help system, status display, typo suggestions
- ✅ Production ISO Builder: `scripts/create_iso.sh` with GRUB boot menu
- ✅ Streaming Inference: Memory-efficient with parallel workers
- ✅ Q4_K NEON SIMD: Fused matmul for ARM64
- ✅ Stability Tests: 1h-72h automated long-running tests
- ✅ GGUF Parser: Full support for TinyLlama, Phi-2, SmolLM, Mistral-7B
- ✅ BPE Tokenizer: Proper tokenization from GGUF vocabulary
- ✅ All Quantization Types: Q4_K, Q5_K, Q6_K, Q8_0
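To make the GGUF parser item above more concrete, here is a minimal sketch of reading the fixed-size GGUF file header. It is written in Python for readability and is not taken from the EMBODIOS kernel sources. It assumes the GGUF v2/v3 layout (little-endian, 64-bit counts); the function name `parse_gguf_header` and the example counts are hypothetical.

```python
import struct

GGUF_MAGIC = b"GGUF"   # first four bytes of every GGUF file
HEADER_SIZE = 24       # magic (4) + version (4) + two 64-bit counts (16)


def parse_gguf_header(data: bytes) -> dict:
    """Parse the fixed GGUF header, assuming the v2/v3 little-endian layout."""
    if len(data) < HEADER_SIZE:
        raise ValueError("buffer too small for a GGUF header")
    if data[:4] != GGUF_MAGIC:
        raise ValueError(f"bad magic: {data[:4]!r}")
    # uint32 version, uint64 tensor count, uint64 metadata key/value count
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


# Synthetic header for illustration; the counts are made up.
header = GGUF_MAGIC + struct.pack("<IQQ", 3, 201, 23)
info = parse_gguf_header(header)
```

The metadata key/value pairs and tensor descriptors follow this header in the file; parsing them is omitted here.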
Overview & Planning:
- Executive-Summary - Vision, targets, and roadmap overview
- Current-State-Analysis - Detailed status, what works, what's missing
- Development-Roadmap - Three-phase plan to v1.0
Architecture & Strategy:
- Three-Strategic-Pillars - Core development tracks
- Pillar-1:-Ollama-GGUF-Integration - Industry-standard model support (85% COMPLETE)
- Pillar-2:-Linux-Driver-Compatibility - Reuse existing drivers
- Pillar-3:-Performance-Optimization - 85+ tokens/sec target
Integration Guides:
- llama.cpp-Integration-Roadmap - Core transformer implementation
- EMBODIOS---exo-Integration-Architecture - Distributed inference (post-v1.0)
Functionality:
- ✅ Load TinyLlama-1.1B Q4_K_M using GGUF format
- ✅ Generate coherent text
- ✅ Switch between models dynamically
- ✅ Interactive chat mode (`talk` command)
- ⏳ Boot on real hardware (Intel NUC) - blocking item
Performance:
- ⏳ 85+ tokens/sec inference - needs native hardware
- ✅ <20ms first token latency
- ✅ ±0.5ms latency jitter (10x better than userspace)
- ✅ <1 second boot time
Deliverables:
- ✅ Production ISO with manifest system (`scripts/create_iso.sh`)
- ✅ Complete documentation in GitHub Wiki
- ✅ Contributing guide with code style
- ✅ Console Commands reference
- ✅ Example models (SmolLM, TinyLlama, Phi-2, Mistral-7B)
- ✅ Works in QEMU
| Metric | llama.cpp | EMBODIOS v1.0 | Status |
|---|---|---|---|
| Speed | 83-86 tok/s | 85-95 tok/s | ⏳ Needs native hardware |
| Memory | 160 MB | 120 MB | ✅ 25% less |
| Latency jitter | ±5-10ms | ±0.5ms | ✅ 10x better |
| Boot | N/A | <1 sec | ✅ Instant |
| First token | ~50ms | <20ms | ✅ 2.5x faster |
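The ratios in the Status column can be checked directly from the table's own numbers. The short sketch below does that arithmetic; the helper names are hypothetical and the inputs are copied from the table, not measured here.

```python
def percent_reduction(baseline: float, value: float) -> float:
    """Return how much smaller `value` is than `baseline`, in percent."""
    return 100.0 * (baseline - value) / baseline


def improvement_factor(baseline: float, value: float) -> float:
    """Return how many times smaller `value` is than `baseline`."""
    return baseline / value


# Memory: 160 MB (llama.cpp) vs 120 MB (EMBODIOS)
memory_saving = percent_reduction(160, 120)        # 25.0 percent

# First token: ~50 ms vs <20 ms, using 20 ms as the bound
first_token_factor = improvement_factor(50, 20)    # 2.5

# Jitter: +/-5 to +/-10 ms vs +/-0.5 ms
jitter_factor_low = improvement_factor(5, 0.5)     # 10.0
jitter_factor_high = improvement_factor(10, 0.5)   # 20.0
```

Under these numbers, the "10x better" figure corresponds to the low end of the llama.cpp jitter range; the high end would give 20x.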
Core Technologies:
- Kernel: x86_64 and ARM64 multiboot2
- AI Format: GGUF (Ollama-compatible)
- Reference: llama.cpp (transformer implementation)
- Drivers: Linux compatibility layer
- Optimization: SIMD (SSE2, AVX2, NEON), integer-only math
Development Tools:
- GRUB 2.x (bootloader)
- QEMU (testing)
- GCC/Clang (compilation)
- Docker (reproducible builds)
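As an illustration of the quantized, integer-heavy math listed under Core Technologies, the following sketch shows Q8_0-style block quantization and a dot product whose inner loop uses only integers. It is an assumption-laden model in Python, not the EMBODIOS kernel code: the function names are hypothetical, the scale is kept as a Python float (GGUF stores it as a 16-bit float), and Python's `round` rounds ties to even, whereas a C implementation using `roundf` rounds ties away from zero.

```python
BLOCK = 32  # Q8_0 groups values into blocks of 32 sharing one scale


def quantize_q8_0(values):
    """Quantize one block of 32 floats to (scale, list of ints in [-127, 127])."""
    if len(values) != BLOCK:
        raise ValueError(f"expected {BLOCK} values, got {len(values)}")
    amax = max(abs(v) for v in values)
    scale = amax / 127.0
    if scale == 0.0:
        return 0.0, [0] * BLOCK          # all-zero block
    return scale, [round(v / scale) for v in values]


def dequantize_q8_0(block):
    """Recover approximate floats from a quantized block."""
    scale, quants = block
    return [scale * q for q in quants]


def dot_q8_0(a, b):
    """Dot product of two quantized blocks.

    The accumulation is integer-only; the two scales are applied once
    at the end of the block.
    """
    scale_a, qa = a
    scale_b, qb = b
    acc = sum(x * y for x, y in zip(qa, qb))   # integer multiply-accumulate
    return scale_a * scale_b * acc


# Example block: the values -16 .. 15
block = quantize_q8_0([float(i - 16) for i in range(BLOCK)])
approx = dot_q8_0(block, block)   # close to the exact sum of squares, 2736
```

Deferring the scale multiplication to the end of each block is what lets the inner loop map onto integer SIMD instructions.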
- No Userspace: AI model runs in kernel mode with direct hardware access
- Zero-Copy: Identity-mapped memory lets DMA use buffer addresses directly, avoiding address translation and extra copies
- Reuse, Don't Rewrite: Linux driver compatibility layer
- Industry Standard: GGUF format for Ollama ecosystem compatibility
- Performance First: SIMD throughout, cache-optimized data structures
Phase 1: Foundation ✅ COMPLETE
- GGUF parser with metadata extraction
- Core Linux compatibility shim
- Quick performance wins (KV cache, pre-computed embeddings)
Phase 2: AI Runtime ✅ 85% COMPLETE
- Multi-model support (load/switch/unload)
- BPE tokenizer from GGUF
- Transformer inference
Phase 3: Drivers & Polish
- VirtIO drivers (net, block)
- NVMe driver for real hardware
- 85+ tokens/sec validation
- Production ISO and documentation
See Development-Roadmap for complete breakdown.
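For readers new to the BPE tokenizer step in Phase 2, here is a minimal sketch of how byte-pair encoding applies a ranked merge table to a piece of text. The merge table and vocabulary below are toy values invented for illustration; how EMBODIOS reads them from a GGUF file is not shown, and the function name `bpe_encode` is hypothetical.

```python
def bpe_encode(word, merges):
    """Split `word` into characters, then repeatedly merge the adjacent
    pair with the best (lowest) rank in `merges` until none applies."""
    rank = {pair: i for i, pair in enumerate(merges)}
    symbols = list(word)
    while len(symbols) > 1:
        # Collect (rank, position) for every adjacent pair that has a merge rule.
        candidates = [
            (rank[(symbols[i], symbols[i + 1])], i)
            for i in range(len(symbols) - 1)
            if (symbols[i], symbols[i + 1]) in rank
        ]
        if not candidates:
            break
        _, i = min(candidates)                     # best rank, leftmost on ties
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


# Toy merge table, highest priority first (invented for this example).
merges = [("l", "o"), ("lo", "w"), ("e", "r")]
vocab = {"low": 0, "er": 1}

pieces = bpe_encode("lower", merges)       # ["low", "er"]
token_ids = [vocab[p] for p in pieces]     # [0, 1]
```

A production tokenizer also handles bytes outside the vocabulary and special tokens, which this sketch leaves out.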
Full guide: Contributing - Code style, PR process, testing
Choose Your Pillar:
- Kernel hacker? → Pillar-2:-Linux-Driver-Compatibility
- AI researcher? → Pillar-1:-Ollama-GGUF-Integration
- Performance engineer? → Pillar-3:-Performance-Optimization
Getting Started:
- Read the Executive-Summary
- Check Current-State-Analysis for what's done
- Review Contributing guide
- Pick a task from Development-Roadmap
Getting Started:
- Getting-Started - Quick start guide
- Console-Commands - Complete command reference
- API-Reference - API documentation
- Modelfile-Reference - Model configuration
- Hardware-Requirements - Supported hardware
Technical Deep Dives:
- Architecture-Overview - System architecture
- Architecture-Comparison - Comparison with other systems
- Bare-Metal-Deployment - Deployment guide
- Performance-Benchmarks - Benchmark results
- Quantized-Integer-Inference - Integer-only AI inference
- Project-Structure - Codebase layout
- GitHub: EMBODIOS Repository
- Models: GGUF format from Hugging Face/Ollama
- Reference: llama.cpp
- Testing: QEMU x86_64 and ARM64
Kernel-space AI enables:
- Ultra-low latency: about 10x lower latency jitter (±0.5ms vs ±5-10ms) for real-time AI
- Minimal footprint: 25% less memory (120 MB vs 160 MB) for edge/embedded devices
- Direct hardware access: No syscall overhead, zero-copy DMA
- Security: Model isolation at kernel level
What started as "that's crazy" is now a production roadmap.
This underground, cyberpunk project is looking for contributors.
- Last Updated: 29 January 2026
- Project Status: 95% complete, AI runtime 100% done
- Next Milestone: Real hardware testing + performance benchmarking vs llama.cpp