OpenAI + Anthropic compatible REST API proxy backed by NousResearch Portal OAuth — run your own inference gateway with subscription-based rate limits. Works with agentic coding tools like OpenCode, Claude Code, and OpenAI Codex CLI.
```
App Client --[API Key]--> NousProxy --[OAuth Agent Key]--> NousResearch Inference API
```
- Proxy handles the OAuth device code flow (one-time setup).
- Auto-refreshes the access token and agent key in the background.
- Clients authenticate with a static proxy API key.
- Forwards requests to the NousResearch inference API with product attribution (`tags: ["product=hermes-agent"]`).
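The forwarding step above can be sketched as a pure function: swap the client's proxy key for the OAuth agent key and tag the payload for attribution. This is an illustrative sketch; `prepare_upstream_request` is a hypothetical name, not the proxy's actual internal API.

```python
def prepare_upstream_request(body: dict, agent_key: str) -> tuple[dict, dict]:
    """Build (headers, payload) for the upstream inference call."""
    headers = {
        "Authorization": f"Bearer {agent_key}",  # agent key, never the client's np- key
        "Content-Type": "application/json",
    }
    payload = dict(body)  # shallow copy; don't mutate the client's body
    payload["tags"] = list(body.get("tags", [])) + ["product=hermes-agent"]
    return headers, payload

headers, payload = prepare_upstream_request(
    {"model": "xiaomi/mimo-v2-pro", "messages": []}, "agent-key-123"
)
print(payload["tags"])  # → ['product=hermes-agent']
```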
```bash
cd /opt/nous-proxy

# Build & start
docker compose up -d

# Get your proxy API key
cat data/api_keys.json

# Auth (one-time, auto-polls until you authorize)
curl -X POST http://localhost:8090/auth/device-code \
  -H "Authorization: Bearer np-xxx"
# → Open the URL, enter the code
# → Polling happens automatically in the background

# Check auth status
curl http://localhost:8090/auth/status \
  -H "Authorization: Bearer np-xxx"

# Use it
curl http://localhost:8090/v1/chat/completions \
  -H "Authorization: Bearer np-xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "xiaomi/mimo-v2-pro",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Logs
docker logs -f nous-proxy

# Stop
docker compose down

# Rebuild after code changes
docker compose up -d --build
```

Data (tokens, API keys) persists in `./data/` via a bind mount.
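The one-time auth step boils down to polling the portal until the user authorizes the device code. A hedged sketch of that loop (the real implementation lives in `nous_proxy/auth.py`; names and response fields here are assumptions, and `poll_once` is injected so the loop can run without a live portal):

```python
import time

def wait_for_authorization(poll_once, interval: float = 5.0, max_attempts: int = 60):
    """Poll until the user authorizes the device code, then return tokens."""
    for _ in range(max_attempts):
        result = poll_once()
        if result.get("status") == "pending":
            time.sleep(interval)  # portal says "not yet"; wait and retry
            continue
        if "access_token" in result:
            return result
        raise RuntimeError(f"authorization failed: {result}")
    raise TimeoutError("user never authorized the device code")

# Simulated portal: pending twice, then authorized.
responses = iter([
    {"status": "pending"},
    {"status": "pending"},
    {"access_token": "tok", "refresh_token": "ref"},
])
tokens = wait_for_authorization(lambda: next(responses), interval=0.0)
print(tokens["access_token"])  # → tok
```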
| Method | Path | Description |
|---|---|---|
| GET | `/` | Landing page |
| GET | `/health` | Health check + token status |
| POST | `/auth/device-code` | Start OAuth + auto-poll |
| GET | `/auth/status` | Check auth/polling status |
| POST | `/auth/poll` | Fallback: wait for auth completion |
| POST | `/v1/chat/completions` | OpenAI-compatible chat completions |
| GET | `/v1/models` | List available models |
| POST | `/v1/messages` | Anthropic Messages API (Claude Code) |
| POST | `/v1/messages/count_tokens` | Anthropic token counting stub |
| POST | `/v1/responses` | OpenAI Responses API (Codex CLI) |
| POST | `/admin/generate-key` | Generate a new proxy API key |
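The `/admin/generate-key` and auth endpoints imply key generation and validation logic. A sketch of the likely shape, under the assumption that keys are random `np-` prefixed tokens (the real logic is in `nous_proxy/api_keys.py`; these function names are illustrative):

```python
import secrets

def generate_key() -> str:
    """Mint a new proxy API key with the np- prefix."""
    return "np-" + secrets.token_urlsafe(24)

def is_valid(key: str, known_keys: set) -> bool:
    """Accept only known keys with the expected prefix."""
    return key.startswith("np-") and key in known_keys

key = generate_key()
store = {key}
print(is_valid(key, store))         # → True
print(is_valid("np-wrong", store))  # → False
```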
OpenCode uses the AI SDK with an OpenAI-compatible API. Configure it in `~/.opencode.json`:
```json
{
  "provider": {
    "nous": {
      "options": {
        "baseURL": "http://localhost:8090/v1",
        "apiKey": "np-YOUR_PROXY_KEY"
      }
    }
  }
}
```

Then run `/models` in OpenCode to select a model (e.g. `xiaomi/mimo-v2-pro`).
Claude Code uses the Anthropic Messages API. Configure it in `.claude/settings.json`:
```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:8090",
    "ANTHROPIC_AUTH_TOKEN": "np-***",
    "ANTHROPIC_MODEL": "xiaomi/mimo-v2-pro",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "xiaomi/mimo-v2-pro",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "xiaomi/mimo-v2-pro"
  }
}
```

The proxy translates the Anthropic Messages API to OpenAI chat completions automatically, including tool use and streaming.
Note: Thinking blocks are filtered out to reduce latency — the Xiaomi model generates excessive reasoning (20-30 events) that slows down responses. Tool calls work correctly with proper `input_json_delta` streaming.
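The translation described above can be sketched as a function that flattens Anthropic content blocks into OpenAI chat messages, dropping thinking blocks along the way. This is an assumption about the shape of `nous_proxy/anthropic.py`, not a copy of it:

```python
def anthropic_to_openai(system: str, messages: list) -> list:
    """Convert Anthropic-style messages to OpenAI chat format."""
    out = []
    if system:
        out.append({"role": "system", "content": system})
    for msg in messages:
        content = msg["content"]
        if isinstance(content, list):
            # Block form: keep text blocks, drop thinking blocks.
            content = "".join(
                b["text"] for b in content if b.get("type") == "text"
            )
        out.append({"role": msg["role"], "content": content})
    return out

msgs = anthropic_to_openai(
    "Be terse.",
    [{"role": "user", "content": [
        {"type": "thinking", "thinking": "(dropped)"},
        {"type": "text", "text": "Hello!"},
    ]}],
)
print(msgs)
```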
Codex CLI uses the OpenAI Responses API. Configure it in `~/.codex/config.toml`:
```toml
model = "xiaomi/mimo-v2-pro"
model_provider = "nous"

[model_providers.nous]
name = "NousResearch"
base_url = "http://localhost:8090/v1"
wire_api = "responses"
experimental_bearer_token = "np-***"
```

The proxy translates the Responses API to Chat Completions format automatically.
Note: The `web_search` tool is filtered out (not supported). The `apply_patch` custom tool is converted to a function tool. The Xiaomi model can generate tool calls but may exhibit "planning mode" behavior, generating text instead of executing tools when the system prompt is complex. This is a model limitation, not a proxy issue.
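The tool normalization just described (drop `web_search`, re-emit `custom` tools as `function` tools) might look like the following sketch. Field names are assumptions based on the note above; the authoritative version is `nous_proxy/responses.py`:

```python
def normalize_tools(tools: list) -> list:
    """Filter/convert Responses API tools into Chat Completions function tools."""
    out = []
    for tool in tools:
        if tool.get("type") == "web_search":
            continue  # upstream has no web search; drop it
        if tool.get("type") == "custom":
            # Re-emit custom tools (e.g. apply_patch) as function tools.
            out.append({
                "type": "function",
                "function": {
                    "name": tool["name"],
                    "description": tool.get("description", ""),
                    "parameters": {"type": "object", "properties": {}},
                },
            })
        else:
            out.append(tool)
    return out

tools = normalize_tools([
    {"type": "web_search"},
    {"type": "custom", "name": "apply_patch"},
])
print([t["type"] for t in tools])  # → ['function']
```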
A NousResearch agent key from the free OAuth subscription has model access restrictions:
| Category | Access | Notes |
|---|---|---|
| Standard models | ✅ | 350+ models available |
| `:free` models (26) | ❌ | Blocked: "OpenRouter free models are not supported" |
| `openrouter/*` models (4) | ❌ | Blocked: "This model is not supported on Free Tier" |
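Given these restrictions, a client can pre-filter the `/v1/models` output to the IDs a free-tier agent key can actually call. An illustrative check (the authoritative errors still come from the upstream API):

```python
def usable_on_free_tier(model_id: str) -> bool:
    """Heuristic filter matching the restrictions in the table above."""
    if model_id.endswith(":free"):
        return False  # "OpenRouter free models are not supported"
    if model_id.startswith("openrouter/"):
        return False  # "This model is not supported on Free Tier"
    return True

ids = ["xiaomi/mimo-v2-pro", "meta-llama/llama-3-8b:free", "openrouter/auto"]
print([m for m in ids if usable_on_free_tier(m)])  # → ['xiaomi/mimo-v2-pro']
```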
These models cost $0 on the standard (non-`:free`) path and work with the free subscription:
| Model | Context | Max Output | Tools | Reasoning |
|---|---|---|---|---|
| `xiaomi/mimo-v2-pro` | 1M | 131K | ✅ | ✅ |
| `xiaomi/mimo-v2-omni` | 262K | 65K | ✅ | ✅ |
Other $0 models exist (image gen, video gen, reranking) but are not text chat LLMs:
- `black-forest-labs/flux.2-*` — image generation
- `alibaba/wan-2.6`, `alibaba/wan-2.7` — video generation
- `bytedance/seedance-*` — video generation
- `openai/sora-2-pro` — video generation
- `google/veo-3.1` — video generation
- `cohere/rerank-*` — reranking (not chat)
For access beyond the $0 models, check which models your key can reach via:
```bash
curl http://localhost:8090/v1/models \
  -H "Authorization: Bearer np-xxx" | python3 -m json.tool
```

Edit `/opt/nous-proxy/.env`:
```bash
NOUS_PORTAL_URL=https://portal.nousresearch.com
NOUS_INFERENCE_URL=https://inference-api.nousresearch.com/v1
PROXY_PORT=8090
PROXY_API_KEYS=np-xxx   # Comma-separated, or auto-generated
```

```
/opt/nous-proxy/
├── pyproject.toml          # Dependencies & build config
├── Dockerfile              # Multi-stage Docker build
├── docker-compose.yml      # Container orchestration
├── docker-entrypoint.sh    # Auto-fix data dir ownership
├── .dockerignore           # Docker build exclusions
├── .env                    # Environment config
├── .env.example            # Template
├── data/                   # Persisted tokens & API keys (bind mount)
│   ├── tokens.json
│   └── api_keys.json
└── nous_proxy/
    ├── __init__.py
    ├── config.py           # Settings (pydantic-settings)
    ├── auth.py             # OAuth device code flow
    ├── token_manager.py    # Token lifecycle + auto-refresh
    ├── api_keys.py         # API key validation
    ├── proxy.py            # Request forwarding + attribution
    ├── anthropic.py        # Anthropic Messages API translator (Claude Code)
    ├── responses.py        # OpenAI Responses API translator (Codex CLI)
    └── main.py             # FastAPI app + CLI
```
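The auto-refresh behavior in `token_manager.py` typically hinges on one check: refresh the access token a little before it expires so no request hits the upstream with a stale bearer. A sketch with assumed field names (`expires_at` as a Unix timestamp) and an assumed 5-minute margin:

```python
import time

REFRESH_MARGIN = 300  # seconds before expiry to refresh (assumed value)

def needs_refresh(tokens: dict, now=None) -> bool:
    """True when the access token is inside the refresh window."""
    now = time.time() if now is None else now
    return now >= tokens["expires_at"] - REFRESH_MARGIN

print(needs_refresh({"expires_at": 1000.0}, now=800.0))   # → True
print(needs_refresh({"expires_at": 2000.0}, now=800.0))   # → False
```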
```bash
cd /opt/nous-proxy
uv venv .venv
source .venv/bin/activate
uv pip install -e .

# Run with auto-reload
python -m nous_proxy.main --reload
```

- Thinking blocks filtered: the Xiaomi model generates 20-30 reasoning events per response, causing high latency, so thinking is filtered out at the proxy level.
- Tool call arguments: fixed — `input_json_delta` now properly streams tool arguments.
- `developer` role: merged into the system message (Responses API specific).
- `web_search` tool: filtered out (not a function tool type).
- `apply_patch` tool: converted from `custom` type to `function` type.
- "Planning mode" behavior: the Xiaomi model may generate text instead of tool calls when Codex's complex system prompt (~25K chars) is present. This is a model limitation — the model works fine with simpler prompts (e.g., hermes-agent).
- Streaming: Responses API streaming works with proper SSE events (`response.created`, `response.output_item.added`, `response.function_call_arguments.delta`, `response.completed`).
- Model compatibility: `xiaomi/mimo-v2-pro` supports reasoning and tool calling, but behavior varies with client system prompt complexity.
- Rate limits: the NousResearch free tier has rate limits; upgrade for higher limits.
- Context window: large conversations (>100KB body) may cause timeouts.
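The thinking-block filtering noted above could be implemented as a simple pass over streamed deltas. This sketch uses an assumed `reasoning_content` field name for reasoning deltas; it is not the proxy's actual code:

```python
def filter_thinking(deltas):
    """Drop reasoning deltas from a stream; pass content and tool calls through."""
    for delta in deltas:
        if "reasoning_content" in delta:
            continue  # drop model reasoning to cut latency
        yield delta

stream = [
    {"reasoning_content": "step 1..."},
    {"content": "Hi"},
    {"tool_calls": [{"index": 0}]},
]
print(list(filter_thinking(stream)))  # → [{'content': 'Hi'}, {'tool_calls': [{'index': 0}]}]
```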