GitHub - GradientHQ/parallax: Parallax is a distributed model serving framework that lets you build your own AI cluster anywhere

Trusted by Partners

News

[2026/02] 🦞 Parallax now supports OpenClaw integration! See Docs
[2025/10] 🔥 Parallax won #1 Product of The Day on Product Hunt!
[2025/10] 🔥 Parallax version 0.0.1 has been released!

About

Parallax is a fully decentralized inference engine developed by Gradient. It lets you build your own AI cluster for model inference across a set of distributed nodes, regardless of their configuration and physical location. Its core features include:

Host local LLMs on personal devices
Cross-platform support
Pipeline parallel model sharding
Paged KV cache management & continuous batching for Mac
Dynamic request scheduling and routing for high performance
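
To make the pipeline-parallel sharding idea concrete, here is a minimal sketch of how contiguous blocks of transformer layers could be divided among nodes of unequal capacity. It is an illustration only, not Parallax's scheduler: the `Node` class, the `assign_layers` function, and the choice of memory as the weighting are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str          # hypothetical node identifier, assumed unique
    memory_gb: float   # hypothetical capacity figure used as the weight


def assign_layers(nodes: list[Node], num_layers: int) -> dict[str, range]:
    """Split num_layers into contiguous pipeline stages, sized by memory share."""
    if not nodes or num_layers < len(nodes):
        raise ValueError("need at least one layer per node")

    total = sum(n.memory_gb for n in nodes)
    # Start from the floor of each node's proportional share, minimum one layer.
    counts = [max(1, int(num_layers * n.memory_gb / total)) for n in nodes]

    # Hand leftover layers to the largest nodes first.
    by_memory = sorted(range(len(nodes)), key=lambda i: nodes[i].memory_gb, reverse=True)
    i = 0
    while sum(counts) < num_layers:
        counts[by_memory[i % len(by_memory)]] += 1
        i += 1
    # If the one-layer minimum overshot the total, trim the largest stage.
    while sum(counts) > num_layers:
        largest = max(range(len(counts)), key=lambda k: counts[k])
        counts[largest] -= 1

    # Turn per-node counts into contiguous layer ranges, in pipeline order.
    stages, start = {}, 0
    for node, count in zip(nodes, counts):
        stages[node.name] = range(start, start + count)
        start += count
    return stages


if __name__ == "__main__":
    cluster = [Node("mac-mini", 16), Node("gpu-box", 24), Node("laptop", 8)]
    for name, layers in assign_layers(cluster, 24).items():
        print(f"{name}: layers {layers.start}..{layers.stop - 1}")
```

In this sketch each node holds one contiguous stage, and a request's activations would flow from stage to stage in list order. A real scheduler would likely also weigh compute throughput and network latency between nodes.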

The backend architecture:

P2P communication powered by Lattica
GPU backend powered by SGLang and vLLM
Mac backend powered by MLX LM

User Guide

Quick Install

git clone https://github.com/GradientHQ/parallax.git
cd parallax
./install.sh
source .venv/bin/activate
parallax serve -m Qwen/Qwen3.5-0.8B

Contributing

We warmly welcome contributions of all kinds! For guidelines on how to get involved, please refer to our Contributing Guide.

Supported Models

Model	Provider	HuggingFace Collection	Blog	Description
DeepSeek	DeepSeek	DeepSeek-V3.2 DeepSeek-R1	DeepSeek-V3.2 Release	DeepSeek-V3.2 is built for stronger reasoning and agent workflows, adding thinking-aware tool use while keeping both thinking and non-thinking tool-use modes available.
MiniMax	MiniMax AI	MiniMax-M3 MiniMax-M2.7	MiniMax M3: Frontier Coding, 1M Context, Native Multimodality	MiniMax-M3 is a sparse-attention model for coding, agentic work, long-context tasks, and multimodal input, with MiniMax-M2.7 kept as a supported sibling model.
GLM	Z AI	GLM-5.2 GLM-5.1 GLM-4.7	GLM-5.2: Built for Long-Horizon Tasks	GLM-5.2 is Z AI's latest flagship model for long-horizon coding agents, large-scale implementation, automated research, and performance optimization.
Kimi-K2	Moonshot AI	Kimi-K2-Thinking Kimi-K2-Instruct-0905	Kimi K2 Thinking	Kimi-K2-Thinking is the reasoning-focused Kimi-K2 variant for long-horizon agentic workflows, multi-step reasoning, and tool-heavy tasks.
Qwen	Qwen	Qwen3.6-35B-A3B	Qwen3.6-35B-A3B: Agentic Coding Power, Now Open to All	Qwen3.6-35B-A3B is an open-weight MoE model focused on stable, practical coding and agentic workflows, with Qwen3-Next and larger Qwen3 MoE variants also represented.
gpt-oss	OpenAI	gpt-oss-120b gpt-oss-safeguard-120b	Introducing gpt-oss-safeguard	gpt-oss-120b is OpenAI's open-weight GPT model; gpt-oss-safeguard-120b adds policy-driven reasoning classification for flexible moderation and safety workflows.
Step	StepFun	Step-3.5-Flash	Step 3.5 Flash	Step-3.5-Flash is a sparse MoE foundation model for efficient reasoning and agentic workflows, activating a small subset of its total parameters per token.

Repository contents (458 commits):

.github/workflows
docker
docs
scripts
src
tests
.gitattributes
.gitignore
.pre-commit-config.yaml
LICENSE
README.md
install.sh
pyproject.toml

Releases 4
