# gated-deltanet

Here are 2 public repositories matching this topic...

From-scratch C++/CUDA LLM inference engine for the NVIDIA RTX 5090 (sm_120a). The fastest single-user inference on the 5090: faster decode than llama.cpp, at-or-ahead of vLLM on NVFP4, and the only engine running native NVFP4 on consumer Blackwell. 100% written by Claude Code.

  • Updated Jun 24, 2026
  • Cuda

Run modern hybrid/MoE LLMs correctly and fast on cheap old Tesla P100 / GTX 1080 Ti cards. Fork of ik_llama.cpp: clean concurrent (np>1) Gated-DeltaNet hybrid decoding + Pascal sm_60 FP16 build tuning + built-in fan-out decomposer.

  • Updated Jun 7, 2026
  • Shell
