- [2026/02] 🦞 Parallax now supports OpenClaw integration! See Docs
- [2025/10] 🔥 Parallax won #1 Product of the Day on Product Hunt!
- [2025/10] 🔥 Parallax version 0.0.1 has been released!
Parallax is a fully decentralized inference engine developed by Gradient. It lets you build your own AI cluster for model inference across a set of distributed nodes, regardless of their hardware configuration or physical location. Its core features include:
- Hosting local LLMs on personal devices
- Cross-platform support
- Pipeline-parallel model sharding
- Paged KV cache management & continuous batching for Mac
- Dynamic request scheduling and routing for high performance
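Pipeline-parallel sharding gives each node a contiguous slice of the model's layers, so activations flow from one node to the next. The sketch below is a minimal illustration, not Parallax's actual scheduler: it assumes each node is described by a single positive capacity weight (for example, free memory in GB) and splits the layers in proportion to it using largest-remainder rounding. `shard_layers` and the capacity numbers are hypothetical.

```python
from typing import List, Tuple


def shard_layers(num_layers: int, capacities: List[float]) -> List[Tuple[int, int]]:
    """Hypothetical helper: split layers [0, num_layers) into contiguous
    half-open ranges, one per node, sized roughly in proportion to capacity."""
    if num_layers < len(capacities):
        raise ValueError("need at least one layer per node")
    if any(c <= 0 for c in capacities):
        raise ValueError("capacities must be positive")

    total = sum(capacities)
    ideal = [num_layers * c / total for c in capacities]
    counts = [int(x) for x in ideal]  # floor of each node's ideal share

    # Hand the leftover layers to the nodes with the largest fractional remainders.
    leftover = num_layers - sum(counts)
    order = sorted(range(len(ideal)), key=lambda i: ideal[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1

    # Make sure every node owns at least one layer by borrowing from the largest shard.
    for i, c in enumerate(counts):
        if c == 0:
            j = max(range(len(counts)), key=lambda k: counts[k])
            counts[j] -= 1
            counts[i] = 1

    # Turn per-node layer counts into consecutive (start, end) ranges.
    ranges: List[Tuple[int, int]] = []
    start = 0
    for c in counts:
        ranges.append((start, start + c))
        start += c
    return ranges
```

For example, `shard_layers(28, [24, 16, 8])` returns `[(0, 14), (14, 23), (23, 28)]`: the largest node serves layers 0 to 13, the next serves 14 to 22, and the smallest serves 23 to 27. A real scheduler would likely also weigh network latency and compute speed, which this sketch ignores.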
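Paged KV cache management stores each request's keys and values in fixed-size pages drawn from a shared pool, so memory is reserved one page at a time instead of for the maximum sequence length up front. Below is a toy allocator illustrating only that bookkeeping, under the assumption of a fixed page size; `PagedKVCacheAllocator`, `block_size`, and the block-table layout are hypothetical and do not reflect Parallax's MLX implementation or its tensor storage.

```python
from typing import Dict, List


class PagedKVCacheAllocator:
    """Hypothetical toy allocator: each sequence's KV cache lives in fixed-size
    pages taken from a shared pool and tracked by a per-sequence block table."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free_blocks: List[int] = list(range(num_blocks))
        self.block_tables: Dict[str, List[int]] = {}  # seq_id -> page ids in order
        self.lengths: Dict[str, int] = {}  # seq_id -> tokens stored so far

    def append_token(self, seq_id: str) -> None:
        """Reserve room for one more token, taking a new page only when the last one is full."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        if length % self.block_size == 0:  # no pages yet, or the last page is full
            if not self.free_blocks:
                raise MemoryError("KV cache pool exhausted")
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = length + 1

    def free(self, seq_id: str) -> None:
        """Return all pages of a finished sequence to the shared pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

With `block_size=16`, a 20-token sequence occupies two pages, and freeing it returns both pages to the pool for other requests.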
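Continuous batching lets new requests join the running batch as soon as others finish, rather than waiting for the whole batch to drain. The simulation below is a hypothetical sketch of that admission policy only: `Request`, `continuous_batching`, and `max_batch` are illustrative names, and the model's forward pass is reduced to incrementing a per-request token counter.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class Request:
    req_id: str
    max_new_tokens: int
    generated: int = 0


def continuous_batching(requests: List[Request], max_batch: int) -> List[List[str]]:
    """Simulate decode steps and return the request ids running at each step.
    Finished requests leave right away; waiting ones fill the freed slots at the next step."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    waiting: Deque[Request] = deque(requests)
    running: List[Request] = []
    steps: List[List[str]] = []
    while waiting or running:
        # Admit waiting requests into any free slots before this step.
        while waiting and len(running) < max_batch:
            running.append(waiting.popleft())
        steps.append([r.req_id for r in running])
        # One decode step: every running request emits one token (model call elided).
        for r in running:
            r.generated += 1
        # Drop requests that reached their token budget.
        running = [r for r in running if r.generated < r.max_new_tokens]
    return steps
```

With `max_batch=2` and requests needing 1, 3, and 2 tokens, the third request enters at step 2, the moment the first one finishes, instead of waiting for the second to complete.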
The backend architecture:
- P2P communication powered by Lattica
- GPU backend powered by SGLang and vLLM
- Mac backend powered by MLX LM
```sh
git clone https://github.com/GradientHQ/parallax.git
cd parallax
./install.sh
source .venv/bin/activate
parallax serve -m Qwen/Qwen3.5-0.8B
```

We warmly welcome contributions of all kinds! For guidelines on how to get involved, please refer to our Contributing Guide.
Supported models:

| Model | Provider | HuggingFace Collection | Blog | Description |
|---|---|---|---|---|
| DeepSeek | DeepSeek | DeepSeek-V3.2, DeepSeek-R1 | DeepSeek-V3.2 Release | DeepSeek-V3.2 is built for stronger reasoning and agent workflows, adding thinking-aware tool use while keeping both thinking and non-thinking tool-use modes available. |
| MiniMax | MiniMax AI | MiniMax-M3, MiniMax-M2.7 | MiniMax M3: Frontier Coding, 1M Context, Native Multimodality | MiniMax-M3 is a sparse-attention model for coding, agentic work, long-context tasks, and multimodal input; MiniMax-M2.7 remains supported as a sibling model. |
| GLM | Z AI | GLM-5.2, GLM-5.1, GLM-4.7 | GLM-5.2: Built for Long-Horizon Tasks | GLM-5.2 is Z AI's latest flagship model for long-horizon coding agents, large-scale implementation, automated research, and performance optimization. |
| Kimi-K2 | Moonshot AI | Kimi-K2-Thinking, Kimi-K2-Instruct-0905 | Kimi K2 Thinking | Kimi-K2-Thinking is the reasoning-focused Kimi-K2 variant for long-horizon agentic workflows, multi-step reasoning, and tool-heavy tasks. |
| Qwen | Qwen | Qwen3.6-35B-A3B | Qwen3.6-35B-A3B: Agentic Coding Power, Now Open to All | Qwen3.6-35B-A3B is an open-weight MoE model focused on stable, practical coding and agentic workflows; Qwen3-Next and larger Qwen3 MoE variants are also represented. |
| gpt-oss | OpenAI | gpt-oss-120b, gpt-oss-safeguard-120b | Introducing gpt-oss-safeguard | gpt-oss-120b is OpenAI's open-weight GPT model; gpt-oss-safeguard-120b adds policy-driven reasoning classification for flexible moderation and safety workflows. |
| Step | StepFun | Step-3.5-Flash | Step 3.5 Flash | Step-3.5-Flash is a sparse MoE foundation model for efficient reasoning and agentic workflows, activating a small subset of its total parameters per token. |
