Dmitry Dimcha edited this page Jan 29, 2026 · 18 revisions

EMBODIOS Wiki

Bare-Metal AI Operating System

Current Status: 95% complete (AI Runtime: 100%)
Blocking: Native hardware testing only
Last Updated: 29 January 2026


Welcome

EMBODIOS is the world's first bare-metal AI operating system - where the AI model runs directly on hardware without a traditional OS layer. No userspace. No OS overhead. Just transformers and hardware.

What started as a Friday night experiment has evolved into a production roadmap demonstrating that kernel-space AI inference is not only possible, but offers significant advantages for specific use cases.


Project Status

| Component         | Status                                        | Completion |
|-------------------|-----------------------------------------------|------------|
| Kernel Foundation | Memory, boot, interrupts, DMA, scheduler      | 95% ✅     |
| AI Runtime        | GGUF, BPE, streaming inference, quantization  | 100% ✅    |
| Drivers           | NVMe, VirtIO, e1000e, PCI, TCP/IP, Industrial | 85% ✅     |
| Performance       | SIMD, parallel inference, benchmarks          | 90% ✅     |
| Documentation     | Wiki, README, Contributing guide              | 100% ✅    |
| Overall           | Ready for v1.0 - hardware testing only        | 95%        |

Recent Achievements (January 2026)

  • Interactive Chat Mode: talk command for dedicated conversation sessions
  • Performance Stats: Separate perf command for timing metrics
  • Console UX: Polished help system, status display, typo suggestions
  • Production ISO Builder: scripts/create_iso.sh with GRUB boot menu
  • Streaming Inference: Memory-efficient with parallel workers
  • Q4_K NEON SIMD: Fused matmul for ARM64
  • Stability Tests: 1h-72h automated long-running tests
  • GGUF Parser: Full support for TinyLlama, Phi-2, SmolLM, Mistral-7B
  • BPE Tokenizer: Proper tokenization from GGUF vocabulary
  • All Quantization Types: Q4_K, Q5_K, Q6_K, Q8_0
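
Of the quantization types listed above, Q8_0 is the simplest to illustrate. In the ggml/llama.cpp layout, each Q8_0 block holds one 16-bit float scale followed by 32 signed 8-bit values, and a weight is recovered as scale × quant. The following is a host-side Python sketch of that decoding, not EMBODIOS kernel code; the function name `dequantize_q8_0` is hypothetical, and the K-quant types (Q4_K, Q5_K, Q6_K) use a more elaborate super-block layout that is not shown here.

```python
import struct
from typing import List

QK8_0 = 32                 # weights per Q8_0 block (ggml convention)
BLOCK_BYTES = 2 + QK8_0    # one fp16 scale + 32 int8 quants = 34 bytes


def dequantize_q8_0(data: bytes) -> List[float]:
    """Decode a buffer of Q8_0 blocks: weight = scale * quant."""
    if len(data) % BLOCK_BYTES != 0:
        raise ValueError("buffer is not a whole number of Q8_0 blocks")
    weights: List[float] = []
    for offset in range(0, len(data), BLOCK_BYTES):
        # "<e32b": little-endian half-precision scale, then 32 signed bytes
        scale, *quants = struct.unpack_from("<e32b", data, offset)
        weights.extend(scale * q for q in quants)
    return weights


# Build one block by hand (scale 0.5, quants -16..15) and decode it.
block = struct.pack("<e32b", 0.5, *range(-16, 16))
weights = dequantize_q8_0(block)
```

With this hand-built block, the decoded weights run from -8.0 to 7.5 in steps of 0.5.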

Wiki Navigation

Overview & Planning:

Architecture & Strategy:

Integration Guides:


v1.0 Goals

Functionality:

  • ✅ Load TinyLlama-1.1B Q4_K_M using GGUF format
  • ✅ Generate coherent text
  • ✅ Switch between models dynamically
  • ✅ Interactive chat mode (talk command)
  • ⏳ Boot on real hardware (Intel NUC) - blocking item

Performance:

  • ⏳ 85+ tokens/sec inference - needs native hardware
  • ✅ <20ms first token latency
  • ✅ ±0.5ms latency jitter (10x better than userspace)
  • ✅ <1 second boot time
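
The performance goals above can be checked from per-token timestamps. The sketch below is one possible way to do so in host-side Python. The wiki does not say how jitter is measured, so the definition here (largest deviation of any inter-token gap from the mean gap) is an assumption, as are the function name `summarize_latency` and the sample timestamps.

```python
from typing import Dict, List


def summarize_latency(request_ms: float, token_ms: List[float]) -> Dict[str, float]:
    """Derive first-token latency, throughput and jitter from token timestamps (ms)."""
    if len(token_ms) < 2:
        raise ValueError("need at least two token timestamps")
    # Time between consecutive tokens
    gaps = [later - earlier for earlier, later in zip(token_ms, token_ms[1:])]
    mean_gap = sum(gaps) / len(gaps)
    return {
        "first_token_ms": token_ms[0] - request_ms,
        "tokens_per_sec": 1000.0 / mean_gap,
        # Assumed jitter definition: worst-case deviation from the mean gap
        "jitter_ms": max(abs(gap - mean_gap) for gap in gaps),
    }


# Hypothetical timestamps: request at t=0, five tokens emitted afterwards.
stats = summarize_latency(0.0, [18.0, 30.0, 41.5, 53.5, 66.0])
```

For these made-up timestamps the first token arrives after 18 ms and jitter is 0.5 ms, which would meet the latency goals, while the throughput of roughly 83.3 tokens/sec would fall short of the 85+ target.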

Deliverables:

  • ✅ Production ISO with manifest system (scripts/create_iso.sh)
  • ✅ Complete documentation in GitHub Wiki
  • ✅ Contributing guide with code style
  • ✅ Console Commands reference
  • ✅ Example models (SmolLM, TinyLlama, Phi-2, Mistral-7B)
  • ✅ Works in QEMU

Performance Targets

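| Metric      | llama.cpp   | EMBODIOS v1.0 | Status                |
|-------------|-------------|---------------|-----------------------|
| Speed       | 83-86 tok/s | 85-95 tok/s   | ⚠️ Needs benchmarking |
| Memory      | 160 MB      | 120 MB        | 25% less              |
| Latency     | ±5-10ms     | ±0.5ms        | 10x better            |
| Boot        | N/A         | <1 sec        | Instant               |
| First token | ~50ms       | <20ms         | ✅ 2.5x faster        |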
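
The comparison figures in the last column follow from the two value columns. As a quick check, the arithmetic can be reproduced in a few lines of Python; the helper names are hypothetical, and the inputs are simply the targets quoted in the table.

```python
def percent_less(baseline: float, value: float) -> float:
    """How much smaller `value` is than `baseline`, in percent."""
    return 100.0 * (baseline - value) / baseline


def times_better(baseline: float, value: float) -> float:
    """Ratio of baseline to value (lower is better for both metrics used here)."""
    return baseline / value


memory_saving = percent_less(160, 120)      # MB: llama.cpp vs EMBODIOS target
first_token_gain = times_better(50, 20)     # ms: ~50ms vs the <20ms bound
jitter_gain_low = times_better(5, 0.5)      # ms: best-case llama.cpp jitter
jitter_gain_high = times_better(10, 0.5)    # ms: worst-case llama.cpp jitter
```

The memory and first-token figures come out at 25% and 2.5x. The jitter ratio spans 10x to 20x, so "10x better" is the conservative end of the range.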

Technology Stack

Core Technologies:

  • Kernel: x86_64 and ARM64 multiboot2
  • AI Format: GGUF (Ollama-compatible)
  • Reference: llama.cpp (transformer implementation)
  • Drivers: Linux compatibility layer
  • Optimization: SIMD (SSE2 and AVX2 on x86_64, NEON on ARM64), integer-only math
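
"Integer-only math" can be read in several ways. One common approach in environments without floating-point support is fixed-point arithmetic. The sketch below assumes a Q16.16 format (16 integer bits, 16 fractional bits). That format and all function names are assumptions for illustration; the wiki does not state which representation EMBODIOS uses.

```python
from typing import List

FRAC_BITS = 16            # assumed Q16.16 fixed-point format
ONE = 1 << FRAC_BITS      # fixed-point representation of 1.0


def to_fixed(x: float) -> int:
    """Convert a float to Q16.16 (done offline; the hot path stays integer-only)."""
    return int(round(x * ONE))


def fixed_mul(a: int, b: int) -> int:
    """Multiply two Q16.16 values; the product carries 32 fractional bits, so shift back."""
    return (a * b) >> FRAC_BITS


def fixed_dot(xs: List[int], ys: List[int]) -> int:
    """Dot product with a wide accumulator and a single final shift."""
    acc = sum(a * b for a, b in zip(xs, ys))
    return acc >> FRAC_BITS


product = fixed_mul(to_fixed(1.5), to_fixed(-2.0))
dot = fixed_dot([to_fixed(0.5), to_fixed(0.25)], [to_fixed(2.0), to_fixed(4.0)])
```

Python integers never overflow, so this sketch hides one practical concern: in C the accumulator would need to be wide enough (for example 64-bit) to hold the unshifted products.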

Development Tools:

  • GRUB 2.x (bootloader)
  • QEMU (testing)
  • GCC/Clang (compilation)
  • Docker (reproducible builds)

Key Design Principles

  1. No Userspace: AI model runs in kernel mode with direct hardware access
  2. Zero-Copy: Identity-mapped memory (virtual address = physical address) lets drivers hand buffers to DMA without address translation or bounce copies
  3. Reuse, Don't Rewrite: Linux driver compatibility layer
  4. Industry Standard: GGUF format for Ollama ecosystem compatibility
  5. Performance First: SIMD throughout, cache-optimized data structures

Development Phases

Phase 1: Foundation ✅ COMPLETE

  • GGUF parser with metadata extraction
  • Core Linux compatibility shim
  • Quick performance wins (KV cache, pre-computed embeddings)
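
The first step of any GGUF parser is reading the fixed-size file header. According to the public GGUF specification (version 2 and later), a little-endian file begins with the 4-byte magic `GGUF`, a 32-bit version, a 64-bit tensor count, and a 64-bit metadata key/value count. The following host-side Python sketch reads just those fields; `parse_gguf_header` is a hypothetical name, and the sample values are made up rather than taken from a real model file.

```python
import struct
from typing import Dict

GGUF_MAGIC = b"GGUF"
# magic (4 bytes), version (uint32), tensor_count (uint64), metadata_kv_count (uint64)
HEADER = struct.Struct("<4sIQQ")


def parse_gguf_header(buf: bytes) -> Dict[str, int]:
    """Read the fixed GGUF header fields from the start of a file image."""
    if len(buf) < HEADER.size:
        raise ValueError("buffer too short for a GGUF header")
    magic, version, tensor_count, kv_count = HEADER.unpack_from(buf, 0)
    if magic != GGUF_MAGIC:
        raise ValueError("not a GGUF file (bad magic)")
    if version < 2:
        # Version 1 used 32-bit counts; this sketch does not handle it.
        raise ValueError("unsupported GGUF version")
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


# Made-up header values for demonstration only.
sample = HEADER.pack(GGUF_MAGIC, 3, 201, 23)
header = parse_gguf_header(sample)
```

The metadata key/value pairs and tensor descriptors follow the header; decoding them requires handling the typed, length-prefixed values defined in the specification.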

Phase 2: AI Runtime ✅ COMPLETE

  • Multi-model support (load/switch/unload)
  • BPE tokenizer from GGUF
  • Transformer inference
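
To show what BPE tokenization involves, here is a minimal Python sketch of the greedy merge loop. GGUF files can carry the vocabulary and merge list in their metadata (the GGUF specification defines keys such as `tokenizer.ggml.tokens` and `tokenizer.ggml.merges`). The tiny merge table below is invented for the example, and `bpe_merge` is a hypothetical name, not the EMBODIOS tokenizer.

```python
from typing import Dict, List, Tuple

INF = float("inf")


def bpe_merge(word: str, merge_ranks: Dict[Tuple[str, str], int]) -> List[str]:
    """Repeatedly merge the adjacent pair with the lowest rank until none applies."""
    symbols = list(word)
    while len(symbols) > 1:
        # Rank every adjacent pair; unknown pairs get infinite rank.
        ranked = [
            (merge_ranks.get(pair, INF), index)
            for index, pair in enumerate(zip(symbols, symbols[1:]))
        ]
        rank, index = min(ranked)  # lowest rank wins, leftmost on ties
        if rank == INF:
            break  # no learned merge applies any more
        symbols[index:index + 2] = [symbols[index] + symbols[index + 1]]
    return symbols


# Invented merge table: lower rank = learned earlier = applied first.
toy_merges = {("l", "o"): 0, ("lo", "w"): 1, ("e", "r"): 2}
pieces = bpe_merge("lower", toy_merges)
```

With this toy table, "lower" is split into the pieces "low" and "er". A real tokenizer would then map each piece to its integer ID through the vocabulary.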

Phase 3: Drivers & Polish ⚠️ IN PROGRESS

  • VirtIO drivers (net, block)
  • NVMe driver for real hardware
  • 85+ tokens/sec validation
  • Production ISO and documentation

See Development-Roadmap for complete breakdown.


Contributing

Full guide: Contributing - Code style, PR process, testing

Choose Your Pillar:

Getting Started:

  1. Read the Executive-Summary
  2. Check Current-State-Analysis for what's done
  3. Review Contributing guide
  4. Pick a task from Development-Roadmap

Documentation

Getting Started:

Technical Deep Dives:


Quick Links


Why This Matters

Kernel-space AI enables:

  • Ultra-low latency: 10x better consistency for real-time AI
  • Minimal footprint: 25% less memory for edge/embedded devices
  • Direct hardware access: No syscall overhead, zero-copy DMA
  • Security: Model isolation at kernel level

What started as "that's crazy" is now a production roadmap.

This underground, cyberpunk project is looking for contributors.


Last Updated: 29 January 2026
Project Status: 95% complete (AI Runtime: 100%)
Next Milestone: Real hardware testing + Performance benchmarking vs llama.cpp
