gated-deltanet

Here are 2 public repositories matching this topic...

kekzl / imp

From-scratch C++/CUDA LLM inference engine for the NVIDIA RTX 5090 (sm_120a). The fastest single-user inference on the 5090: faster decode than llama.cpp, at-or-ahead of vLLM on NVFP4, and the only engine running native NVFP4 on consumer Blackwell. 100% written by Claude Code.

Updated Jun 24, 2026
Cuda

poisonxa16 / PXA_llama

Run modern hybrid/MoE LLMs correctly and fast on cheap old Tesla P100 / GTX 1080 Ti cards. Fork of ik_llama.cpp: clean concurrent (np>1) Gated-DeltaNet hybrid decoding + Pascal sm_60 FP16 build tuning + built-in fan-out decomposer.

Topics: pascal, concurrency, cuda, moe, homelab, mixture-of-experts, hybrid-models, tesla-p100, llama-cpp, local-llm, llm-inference, gguf, speculative-decoding, qwen3, gated-deltanet, ik-llama, gtx-1080-ti, sm60

Updated Jun 7, 2026
Shell
