Run useful local AI on older NVIDIA GPUs.
Local-first Β· Open-source Β· GGUF runtime Β· Gateway dashboard Β· Agent-ready
Kimari Local AI is an open-source framework for running local language models on consumer NVIDIA GPUs, especially older cards such as the GTX 1060 6GB and GTX 1080 8GB.
Kimari provides:
- a CLI-first local AI workflow;
- GPU-aware profiles;
- llama.cpp / GGUF runtime support;
- an OpenAI-compatible local endpoint;
- a Gateway Dashboard;
- local integration helpers for Open WebUI, Continue.dev, OpenClaw and Hermes;
- private training/evaluation infrastructure for future Kimari models.
Status: alpha software. Useful today, but not production-ready.
Kimari today is:
- A local AI framework and CLI.
- A GGUF/llama.cpp workflow for old NVIDIA GPUs.
- A local OpenAI-compatible endpoint helper.
- A Gateway Dashboard preview.
Kimari today is not:
- A new inference engine replacing llama.cpp.
- A public Kimari-4B model.
- A production server.
- A benchmark leaderboard.
See docs/PROJECT_TRUTH.md for the full honesty document.
We are iterating on a custom model to power Kimari. The focus is benchmark honesty: the model must not fabricate scores or metrics.
| Phase | Status | Result |
|---|---|---|
| V2 SFT (SmolLM3-3B) | β | 1 safety + 2 factual regressions |
| V3 SFT (+12 corrective) | β | es-tech-003 fixed, refuse-010 persisted |
| V4 SFT (+6 aggressive) | β | Aggregate improved, benchmark honesty still failed |
| A/B: Qwen3 vs SmolLM3 | β | Qwen3 wins (better honesty + tech quality) |
| DPO Pilot 15% (Qwen3) | β | 3G/0R/7Y β 0 benchmark fabrications |
| DPO Full 420 (Qwen3) | β | Regressed (1 fabrication) |
| Expanded eval (180 prompts) | β³ | Candidate V5 pending |
Current best: Pilot15 adapter (Smouj013/kimari-v5-qwen3-dpo-pilot15-adapter, private).
Gate: BLOCKED β must reach 0 fabrications in expanded evaluation.
No public weights, adapters, GGUF files or benchmark claims at this stage.
- Older GPU support β Designed specifically for GTX 1060 and GTX 1080, not just the latest cards.
- Zero cloud dependency β Everything runs locally. No subscriptions, no API keys, no telemetry.
- OpenAI-compatible β Drop-in endpoint for existing tools and integrations.
- CLI-first β One command to install, one command to start, one command to diagnose.
- Honest status β No inflated benchmarks, no "coming soon" claims. Alpha means alpha.
| Area | Status |
|---|---|
| Framework / CLI | β Usable alpha |
| Local GGUF runtime | β Working |
| OpenAI-compatible endpoint | β Working |
| GTX 1060 validation | β Validated with TinyLlama test model |
| Gateway Dashboard | β Local preview |
| Open WebUI / Continue / OpenClaw configs | β Documented |
| Kimari SFT/private adapter work | π Private only |
| Public Kimari-4B weights | β Not released |
| Public GGUF Kimari model | β Not released |
| Release gate | π BLOCKED |
Kimari is the framework. Kimari-4B is not released yet.
No public Kimari weights, adapters or GGUF files are available at this stage.
curl -fsSL https://raw.githubusercontent.com/smouj/kimari-local-ai/main/install.sh | bashOr the secure alternative (recommended for review):
curl -fsSLO https://raw.githubusercontent.com/smouj/kimari-local-ai/main/install.sh
less install.sh
bash install.sh --dry-run
bash install.sh --with-test-model --yesThen open the guided console:
kimari consolegit clone https://github.com/smouj/kimari-local-ai.git
cd kimari-local-ai
pip install -e .
kimari doctor --deepkimari pull testkimari startLocal endpoint: http://127.0.0.1:11435/v1
kimari gateway setup
kimari gateway start --openThe dashboard is controlled through the Kimari CLI. Users should not need to run
npmmanually.
| Dashboard Overview (Light) | Dashboard Overview (Dark) |
|---|---|
![]() |
![]() |
Gateway Dashboard at
http://127.0.0.1:3105β monochrome design, dark/light mode. Start withkimari gateway start --open.
Kimari includes a local dashboard for monitoring and managing your local AI environment.
kimari gateway start --openThe dashboard shows:
- local runtime status;
- GPU and VRAM information;
- model status;
- profiles;
- integrations;
- logs;
- experimental chat playground;
- release gate status.
Security defaults:
| Setting | Default |
|---|---|
| Host | 127.0.0.1 |
| Public bind | Disabled |
| Mode | Local preview |
| Gate | BLOCKED |
| Tokens in UI | No |
| Public model upload | No |
See docs/GATEWAY_DASHBOARD.md for details.
Kimari has been tested on a real NVIDIA GTX 1060 6GB under WSL2 with llama-server CUDA.
| Metric | CUDA GTX 1060 | CPU-only |
|---|---|---|
| Prompt processing | 228 tok/s | 77 tok/s |
| Token generation | 73 tok/s | 33 tok/s |
| Model VRAM | 1221 MiB | β |
| Detail | Value |
|---|---|
| GPU | NVIDIA GeForce GTX 1060 6GB |
| OS | WSL2 Ubuntu 24.04 |
| Backend | llama-server CUDA |
| Test model | TinyLlama 1.1B Q4_K_M |
| Kimari-4B | Not released |
This validation uses TinyLlama as a test model. It is not a Kimari-4B benchmark.
See docs/GTX1060_SHOWCASE.md and docs/GTX1060_LOCAL_RUNTIME_RESULT.md.
| Feature | Command / Docs |
|---|---|
| Diagnostics | kimari doctor --deep |
| Guided setup | kimari setup --write --yes |
| Test model download | kimari pull test |
| Local API server | kimari start |
| Chat via CLI | kimari chat "hello" |
| Gateway dashboard | kimari gateway start --open |
| Open WebUI config | kimari integrations generate --target openwebui |
| Continue.dev config | kimari integrations generate --target continue |
| OpenClaw config | kimari integrations generate --target openclaw |
| Model verification | kimari models hash / kimari models verify |
| Update check | kimari update check --online |
| Performance tuning | kimari optimize / kimari perf |
| Benchmark | kimari benchmark --dry-run |
Full CLI reference: docs/CLI_REFERENCE.md
Kimari currently runs compatible local GGUF models. Official Kimari models are still private or planned.
| Model line | Status |
|---|---|
| TinyLlama test profile | β Available for validation |
| Kimari Runtime 1.5B | π Private experiments |
| Kimari Core 3B | π Private experiments |
| Kimari-4B | β Not released |
| Official Kimari GGUF | β Not released |
Kimari follows an open-license model policy. Official public releases must use permissive-compatible base models and pass private evaluation, manual review and local GGUF validation.
See docs/KIMARI4B_RELEASE_GATE.md, docs/KIMARI_OPEN_LICENSE_POLICY.md, docs/KIMARI_BASE_MODEL_LICENSE_MATRIX.md.
Kimari-4B model training is active. The focus is benchmark honesty: 0 fabricated benchmark claims.
| Phase | Date | Result |
|---|---|---|
| V2 SFT (SmolLM3-3B) | May 2026 | 1 safety + 2 factual regressions |
| V3 SFT (+12 corrective) | May 2026 | Partial improvement |
| V4 SFT (+6 aggressive) | May 2026 | es-tech-003 fixed, honesty persisted |
| A/B: Qwen3 vs SmolLM3 | May 2026 | Qwen3 selected |
| DPO Pilot 15% (Qwen3) | May 2026 | 3G/0R/7Y β 0 fabrications |
| DPO Full 420 (Qwen3) | May 2026 | Regressed (1 fabrication) |
| Expanded eval (180 prompts) | May 2026 | β³ Candidate V5 pending |
Best checkpoint: Pilot15 (Smouj013/kimari-v5-qwen3-dpo-pilot15-adapter, private).
Gate: BLOCKED
See:
- docs/KIMARI_V5_QWEN3_DPO_SMOKE_REPORT.md
- docs/KIMARI_V5_FULL420_REGRESSION_REPORT.md
- docs/KIMARI4B_RELEASE_GATE.md
Kimari-4B is not public yet. For real local agent workflows today, use:
agent-qwen1060β GTX 1060 6GB, Qwen3-4B Q4_K_M, 4K contextagent-qwen1080β GTX 1080 8GB, Qwen3-4B Q4_K_M, 8K contextagent-smollm1060β safer 3B fallback for GTX 1060
kimari pull recommended
kimari start --profile agent-qwen1060| Profile | GPU | VRAM | Quantization | Context | Status |
|---|---|---|---|---|---|
test |
Any 6 GB+ | 6 GB | Q4_K_M | 4,096 | Default (alpha) |
agent-qwen1060 |
GTX 1060 | 6 GB | Q4_K_M | 4,096 | β Public model |
agent-qwen1080 |
GTX 1080 | 8 GB | Q4_K_M | 8,192 | β Public model |
agent-smollm1060 |
GTX 1060 | 6 GB | Q4_K_M | 4,096 | β Public model (low VRAM) |
gtx1060 |
GTX 1060 | 6 GB | Q4_K_M | 8,192 | Requires Kimari-4B |
gtx1080 |
GTX 1080 | 8 GB | Q5_K_M | 16,384 | Requires Kimari-4B |
turbo |
6 GB+ | 6 GB | IQ4_XS | 8,192 | Requires Kimari-4B |
The
testprofile is the default during alpha, using TinyLlama 1.1B. For real agent work, useagent-qwen1060oragent-smollm1060with public GGUF models. When Kimari-4B is published,gtx1060will become the new default.
| Topic | Link |
|---|---|
| Install guide | docs/INSTALL_ONE_COMMAND.md |
| Console guide | docs/KIMARI_CONSOLE.md |
| Gateway dashboard | docs/GATEWAY_DASHBOARD.md |
| CLI reference | docs/CLI_REFERENCE.md |
| Local endpoint test | docs/LOCAL_OPENAI_ENDPOINT_TEST.md |
| Run agents now | docs/RUN_AGENTS_NOW.md |
| Tool | Link |
|---|---|
| Open WebUI | docs/OPENWEBUI_LOCAL_SETUP.md |
| OpenClaw | docs/OPENCLAW_LOCAL_SETUP.md |
| Continue.dev | docs/CONTINUE_LOCAL_SETUP.md |
| Hermes | docs/OPENWEBUI_OPENCLAW_QUICK_CONFIG.md |
| Topic | Link |
|---|---|
| Release gate | docs/KIMARI4B_RELEASE_GATE.md |
| Open-license policy | docs/KIMARI_OPEN_LICENSE_POLICY.md |
| Dataset policy | docs/KIMARI_SFT_V1_DATASET.md |
| Eval plan | docs/KIMARI_EVAL_PRIVATE_V1.md |
| Training history | docs/KIMARI4B_RUN_HISTORY.md |
| Training plan | docs/MODEL_TRAINING_PLAN.md |
| Topic | Link |
|---|---|
| Performance tuning | docs/PERFORMANCE_TUNING_PLAN.md |
| Model hashing | docs/MODEL_HASHING.md |
| Benchmarks | docs/MEASURED_BENCHMARKS.md |
| Experimental API | docs/API_EXPERIMENTAL.md |
| PyPI release gate | docs/PYPI_RELEASE_GATE.md |
| Architecture | docs/00-03_architecture.md |
| Security | SECURITY.md |
| Privacy | PRIVACY.md |
| Resource | Link |
|---|---|
| Kimari organization | https://huggingface.co/kimari-ai |
| Kimari Fit Lab | https://huggingface.co/spaces/kimari-ai/kimari-fit-lab |
| Reference GGUF collection | https://huggingface.co/collections/Smouj013/kimari-compatible-gguf-models-6a0352c75d2bfeff34d51e66 |
The Hugging Face Space is a compatibility/demo tool. It does not run Kimari-4B. The collection contains reference/community GGUF models, not official Kimari models.
| Stage | Goal |
|---|---|
| β Current | Local runtime + Gateway Dashboard + GitHub Pages landing page + Console + Installer |
| Next | Private adapter runtime preview |
| Next | Agent Gateway tools and web-search dry-run |
| Next | Manual review of private outputs |
| Later | Private GGUF export |
| Later | Public preview decision |
See ROADMAP.md for the full roadmap.
Kimari is local-first and conservative by default:
- localhost-only defaults;
- no public bind unless explicitly requested;
- no token storage in dashboard;
- no public model upload;
- no public GGUF;
- no benchmark claims without reproducible validation;
- no automatic gate transitions.
Kimari Local AI is released under the MIT License. See LICENSE.
Model weights are not included in this repository. See MODEL_LICENSES.md for model licensing information.


