GradientHQ/parallax

Trusted by Partners

SGLang, vLLM, Qwen, DeepSeek, Kimi, MiniMax, Z AI

Gradient | Blog | X/Twitter (Gradient) | X/Twitter (Parallax) | Discord | Arxiv

News

  • [2026/02] 🦞 Parallax now supports OpenClaw integration! See Docs
  • [2025/10] 🔥 Parallax won #1 Product of The Day on Product Hunt!
  • [2025/10] 🔥 Parallax version 0.0.1 has been released!

About

Parallax is a fully decentralized inference engine developed by Gradient. It lets you build your own AI cluster for model inference across a set of distributed nodes, regardless of their hardware configuration or physical location. Its core features include:

  • Host local LLMs on personal devices
  • Cross-platform support
  • Pipeline parallel model sharding
  • Paged KV cache management & continuous batching for Mac
  • Dynamic request scheduling and routing for high performance
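
To make the idea of pipeline parallel model sharding concrete, here is a minimal Python sketch. It is not Parallax's actual allocation logic: the `Node` class, its `capacity` field, and the `shard_layers` function are hypothetical names introduced for illustration. It assumes each node gets one contiguous block of transformer layers, sized roughly in proportion to a relative capacity score such as usable memory.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    capacity: float  # hypothetical relative capacity, e.g. usable memory in GB


def shard_layers(num_layers: int, nodes: list[Node]) -> dict[str, range]:
    """Split layers 0..num_layers-1 into contiguous blocks, one per node.

    Block sizes are roughly proportional to each node's capacity. This is an
    illustrative heuristic, not the scheduler Parallax ships with.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    total = sum(node.capacity for node in nodes)
    if total <= 0:
        raise ValueError("total capacity must be positive")

    plan: dict[str, range] = {}
    start = 0
    cumulative = 0.0
    for index, node in enumerate(nodes):
        cumulative += node.capacity
        if index == len(nodes) - 1:
            end = num_layers  # the last node always takes the remainder
        else:
            end = round(num_layers * cumulative / total)
        plan[node.name] = range(start, end)
        start = end
    return plan


if __name__ == "__main__":
    cluster = [Node("desktop", 24), Node("laptop", 16), Node("mini", 8)]
    for name, layers in shard_layers(28, cluster).items():
        print(f"{name}: layers {layers.start}..{layers.stop - 1} ({len(layers)} layers)")
```

In a pipeline like this, a request's activations flow from the node holding the first block to the node holding the last, so only the hidden states at block boundaries cross the network. A real scheduler would also need to account for link latency and nodes joining or leaving, which this sketch ignores.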

The backend architecture:

[Figure: Parallax backend architecture diagram]

User Guide

Quick Install

git clone https://github.com/GradientHQ/parallax.git
cd parallax
./install.sh
source .venv/bin/activate
parallax serve -m Qwen/Qwen3.5-0.8B

Contributing

We warmly welcome contributions of all kinds! For guidelines on how to get involved, please refer to our Contributing Guide.

Supported Models

| Model family | Provider | HuggingFace Collection | Blog | Description |
|---|---|---|---|---|
| DeepSeek | Deepseek | DeepSeek-V3.2, DeepSeek-R1 | DeepSeek-V3.2 Release | DeepSeek-V3.2 is built for stronger reasoning and agent workflows, adding thinking-aware tool use while keeping both thinking and non-thinking tool-use modes available. |
| MiniMax | MiniMax AI | MiniMax-M3, MiniMax-M2.7 | MiniMax M3: Frontier Coding, 1M Context, Native Multimodality | MiniMax-M3 is a sparse-attention model for coding, agentic work, long-context tasks, and multimodal input, with MiniMax-M2.7 kept as a supported sibling model. |
| GLM | Z AI | GLM-5.2, GLM-5.1, GLM-4.7 | GLM-5.2: Built for Long-Horizon Tasks | GLM-5.2 is Z AI's latest flagship model for long-horizon coding agents, large-scale implementation, automated research, and performance optimization. |
| Kimi-K2 | Moonshot AI | Kimi-K2-Thinking, Kimi-K2-Instruct-0905 | Kimi K2 Thinking | Kimi-K2-Thinking is the reasoning-focused Kimi-K2 variant for long-horizon agentic workflows, multi-step reasoning, and tool-heavy tasks. |
| Qwen | Qwen | Qwen3.6-35B-A3B | Qwen3.6-35B-A3B: Agentic Coding Power, Now Open to All | Qwen3.6-35B-A3B is an open-weight MoE model focused on stable, practical coding and agentic workflows, with Qwen3-Next and larger Qwen3 MoE variants also represented. |
| gpt-oss | OpenAI | gpt-oss-120b, gpt-oss-safeguard-120b | Introducing gpt-oss-safeguard | gpt-oss-120b is OpenAI's open-weight GPT model; gpt-oss-safeguard-120b adds policy-driven reasoning classification for flexible moderation and safety workflows. |
| Step | StepFun | Step-3.5-Flash | Step 3.5 Flash | Step-3.5-Flash is a sparse MoE foundation model for efficient reasoning and agentic workflows, activating a small subset of its total parameters per token. |