Local-first AI model routing for serious agents.
One endpoint. Any provider. The router figures out the rest.
Sage Router is a local-first, self-hosted AI model gateway that intelligently routes requests to the best available model based on intent, latency, and capability — not just price.
Sage Router optimizes for getting the job done:
- Intent-based routing: Code tasks go to coding models, creative tasks to creative models, reasoning tasks to reasoning models
- Automatic fallback: If one provider fails or hits rate limits, it seamlessly tries the next
- Dynamic discovery: New models from Ollama, Anthropic, OpenAI, Google, NVIDIA NIM / NVIDIA Cloud, and OpenClaw are auto-detected — no config updates needed
- Zero API lock-in: Use any subscription or key you already have (Ollama, Claude, OpenAI, Gemini, NVIDIA NIM, GitHub Copilot)
- Debuggable routing: Surface the selected provider/model in headers, /health, or optional debug output
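
The automatic-fallback behavior in the list above can be pictured with a minimal sketch. It is not Sage Router's actual implementation: `ProviderError`, `route_with_fallback`, and the two stand-in provider functions are hypothetical names used only for illustration.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Hypothetical error raised when a provider fails or is rate limited."""


def route_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, call) pair in order and return (provider_name, reply)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            # Remember why this provider was skipped, then try the next one.
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


def flaky(prompt: str) -> str:
    # Stand-in for a provider that is currently rate limited.
    raise ProviderError("rate limited")


def stable(prompt: str) -> str:
    # Stand-in for a healthy provider.
    return f"echo: {prompt}"


print(route_with_fallback("hi", [("primary", flaky), ("backup", stable)]))
```

Returning the provider name alongside the reply mirrors the debuggable-routing idea: the caller can always see which route served the request.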
openclaw skill add sage-router --from clawhub
openclaw skill configure sage-router

git clone https://github.com/earlvanze/sage-router.git
cd sage-router
pip install -r requirements.txt # if any
python3 router.py --port 8790

Install from the Umbrel App Store or add the personal repo:
# In umbrel.yaml → appRepositories
- https://github.com/earlvanze/umbrel-personal-apps

The Umbrel app pins ghcr.io/earlvanze/sage-router-public:v3.28.7 and stores its config under the app data directory. The built-in config dashboard is accessible from the Umbrel app tile.
For a CDN-style endpoint across multiple Sage Router installs, deploy the Tailnet edge proxy:
cd deploy/tailnet-edge
cp .env.example .env
docker compose up -d --build
tailscale serve --bg --https=443 http://127.0.0.1:8790

The edge health-checks each configured Tailnet upstream, routes OpenAI-compatible traffic to the lowest-latency healthy Sage Router node, and keeps provider credentials on the private routers. Publish it privately with Tailscale Serve/Funnel, or front a stable cloud VM edge with Cloudflare for a CDN-style public endpoint. See deploy/tailnet-edge for Google Cloud VM bootstrap and public monetization notes.
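
As a rough illustration of "lowest-latency healthy node" selection, the sketch below picks an upstream from already-collected health-check results. The `Upstream` record and `pick_upstream` function are hypothetical; the real edge's data model and probing logic are not shown in this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Upstream:
    public_id: str     # stable public ID such as "upstream-1"
    healthy: bool      # result of the most recent health check
    latency_ms: float  # measured health-check latency


def pick_upstream(upstreams: list[Upstream]) -> Optional[Upstream]:
    """Return the healthy upstream with the lowest latency, or None if none are healthy."""
    healthy = [u for u in upstreams if u.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda u: u.latency_ms)


nodes = [
    Upstream("upstream-1", healthy=False, latency_ms=12.0),
    Upstream("upstream-2", healthy=True, latency_ms=48.0),
    Upstream("upstream-3", healthy=True, latency_ms=95.0),
]
chosen = pick_upstream(nodes)
print(chosen.public_id if chosen else "no healthy upstream")
```

Note that the fastest node is skipped when it is unhealthy, so health always takes priority over latency.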
The current public deployment is intentionally split:
- https://sagerouter.dev and https://www.sagerouter.dev are static Cloudflare Pages (sage-router-web). They host marketing/docs/account UI only.
- https://app.sagerouter.dev is the hosted account/login surface, served by the same Cloudflare Pages project with Supabase Auth redirects pointed at this host.
- https://app.sagerouter.dev/status is the public reliability page. It reads /edge/health and /pricing from the public API edge to show selected upstream ID, Tailnet/cloud backend class, CDN-style reliability evidence, lowest-latency retry failover policy, control-plane health, auth mode, rate-limit/quota enforcement, pre-auth generated-key attempt throttling, generated-key revocation posture, customer endpoint, plan limits, and secret-free billing readiness without exposing customer data, internal upstream URLs, Tailnet hostnames, Stripe price IDs, provider credentials, or secrets.
- https://sagerouter.dev/billing is the dedicated billing recovery page. It explains Stripe checkout, Stripe billing portal, manual/crypto settlement, activation states, generated sk_sage_* key behavior before and after payment, payment recovery, and safe no-secret support context.
- https://sagerouter.dev/fusion is the premium compound-model page. It documents sage-router/fusion, the sage-router:fusion server tool, tool_choice: "required", Pro/Max gating, and the fusion_plan_required upgrade path.
- https://sagerouter.dev/support is the public support and billing help page. It routes customers to account setup, Stripe billing portal, manual/crypto settlement, quota/API-key troubleshooting, 503 reliability checks, security reporting, and abuse reporting while explicitly telling users not to send prompts, workflow text, provider credentials, OAuth tokens, API keys, private keys, cookies, raw provider responses, or customer data in public support channels.
- https://sagerouter.dev/managed-access is the managed-provider-access private beta and Max implementation intake page.
It captures contact plus allowlisted qualification buckets such as deployment preference, expected monthly routed request volume, provider access posture, target provider family, commercial preference, support need, and target launch window; it explains that managed access still requires provider authorization, provider terms acknowledgment, an authorized provider allowlist, a configured provider cost model, and plan-margin checks before activation; it does not collect prompts, workflow text, provider credentials, OAuth tokens, generated API keys, private keys, cookies, raw provider responses, actual provider costs, or customer data.
- https://api.sagerouter.dev is a Cloudflare-proxied GCP edge VM. The edge health-checks Tailnet Sage Router installs plus the Google-hosted Sage Router API origin, then routes to the lowest-latency healthy backend. Public health and response headers identify the edge layer and selected backend only with stable public IDs such as upstream-1, never configured upstream URLs or Tailnet hostnames.
- Tailnet Edge is the reliability layer for routing to healthy Sage Router installs on a Tailnet. In public mode, set SAGE_ROUTER_EDGE_AUTH_MODE=supabase: /pricing, /plans, /model-catalog, and /features/agent-native are public control-plane metadata; /pricing also exposes a billing object with secret-free Stripe checkout readiness, configured plan names, billing portal readiness, manual settlement status paths, activation statuses, and generated-key limits. The same metadata exposes publicLaunch.managedProviderAccess, which must stay disabled until provider resale terms are acknowledged, a provider-family allowlist is configured, a margin policy, positive unit economics backed by a positive SAGEROUTER_PROVIDER_RESALE_COST_CENTS_PER_1K_REQUESTS, durable quota/rate-limit enforcement, generated-key revocation, operator abuse review, durable operator audit events, and managed-access acceptable-use terms are ready. Public unit-economics rows expose plan revenue and derived maximum safe provider cost per 1,000 requests without exposing actual configured provider costs. providerFamilyReadiness and oneSubscriptionReadiness keep OpenRouter visible as supported BYOK routing while excluding it from the managed subscription resale offer unless separate provider authorization is added later. SAGEROUTER_MANAGED_PROVIDER_RESALE_ENABLED=1 is treated as an operator request only; public metadata keeps enabled: false and reports missing controls until every prerequisite is satisfied.
The marketing site publishes /models, /provider-resale-terms, and /margin-policy as reviewable prerequisites, but those pages do not enable managed resale by themselves; /v1/* and /v1beta/* model APIs accept active generated sk_sage_* customer API keys; anonymous model API failures stay fail-closed but include account, pricing, status, OpenAI base URL, and API-key-prefix guidance for setup debugging; account/billing UI requests preserve Supabase user JWTs and should be pinned to a hosted control-plane origin with SAGE_ROUTER_CONTROL_PLANE_UPSTREAM; operator analytics such as /analytics/funnel requires the private edge admin token, is pinned to the control plane, and can inject SAGE_ROUTER_CONTROL_PLANE_TOKEN separately from the Tailnet backend token. Browser login belongs on app.sagerouter.dev; api.sagerouter.dev should remain API-only. Browser-originating account, billing, and customer-suspension mutations are rejected at the edge unless Origin is a trusted Sage Router app/local/preview origin; CLI and server clients without Origin still pass through normal auth. Generated keys and account/billing JWT routes are rate-limited by SAGE_ROUTER_EDGE_RATE_LIMITS; generated-key-looking model API attempts are also throttled by client IP through SAGE_ROUTER_EDGE_AUTH_ATTEMPT_RATE_LIMIT before Supabase generated-key lookup, so random invalid keys cannot create unbounded service-role reads; generated model API keys can also be counted against durable monthly Supabase quotas with SAGE_ROUTER_EDGE_QUOTA_ENABLED=1 after applying supabase/migrations/20260619021500_sage_router_usage_quotas.sql. Supabase user JWT validation uses SAGE_ROUTER_EDGE_AUTH_CACHE_SECONDS, but generated customer API keys default to SAGE_ROUTER_EDGE_API_KEY_AUTH_CACHE_SECONDS=0 so revocation takes effect on the next request. The private edge admin token is exempt for recovery.
Hosted origins should also set SAGE_ROUTER_CLIENT_AUTH_REQUIRED=1; direct origin requests to /v1/models, setup, admin, discovery, and dashboard config routes must fail closed unless they carry a valid operator token, and generated customer keys are only accepted for model metadata/traffic.
- After the edge validates a generated customer API key, it forwards customer id, user id, plan, and status as trusted internal headers while replacing the customer key with the private backend token. Hosted routers use those headers only after backend-token auth, keeping route telemetry, account analytics, first-request activation, quota support, and operator review attributed to the paying customer without exposing raw generated keys to Tailnet model backends.
- https://sagerouter.dev/quickstart is the hosted API first-request path. It shows OPENAI_BASE_URL=https://api.sagerouter.dev/v1, generated sk_sage_* key setup, the sage-router/frontier profile, premium sage-router/fusion, curl, JavaScript, Python, and Codex examples, plus 401/402/429/503 troubleshooting.
- https://sagerouter.dev/api-troubleshooting is the no-secret diagnostic path for hosted 401/402/429/503 responses. It documents safe probes, WWW-Authenticate, Retry-After, X-RateLimit-*, X-Quota-*, account/pricing/status onboarding links, and the non-secret apiKeyPrefix without asking customers to paste prompts or credentials.
- https://sagerouter.dev/docs/api-reference is the hosted API reference for OpenAI-compatible customers. It documents GET /v1/models, POST /v1/chat/completions, POST /v1/responses, public /model-catalog, generated sk_sage_* keys, quotas, rate limits, and failover signals.
- https://sagerouter.dev/docs/gateway-migration is the Gateway migration guide for OpenAI-compatible customers. It maps OPENAI_BASE_URL=https://gateway.example/api/v1 to OPENAI_BASE_URL=https://api.sagerouter.dev/v1, generated sk_sage_* keys, sage-router/frontier, premium sage-router/fusion, model catalog discovery, and the provider terms boundary.
- https://sagerouter.dev/docs/codex is the dedicated Codex CLI setup path. It shows hosted https://api.sagerouter.dev/v1/, local http://127.0.0.1:8790/v1/, and Tailnet http://<tailnet-host>:8790/v1/ profiles using wire_api = "responses" and sage-router/frontier.
- https://sagerouter.dev/integrations is the public integrations index. It collects hosted, local port 8790, and Tailnet setup paths for OpenAI-compatible clients, Codex, Cursor, Aider, Continue, Claude Code, OpenHands, Anthropic-compatible clients, Ollama, Ollama Cloud, NVIDIA NIM, OpenClaw, Hermes, and Pi agents while preserving the no-secret support boundary.
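
The key handoff described in the list above (the edge validates a generated customer key, then forwards customer attribution while swapping in the private backend token) could look roughly like the sketch below. The header names `X-Sage-Customer-Id`, `X-Sage-User-Id`, `X-Sage-Plan`, and `X-Sage-Status` are hypothetical, and dropping client-supplied `X-Sage-*` headers is an added assumption rather than documented behavior.

```python
def build_backend_headers(
    incoming: dict[str, str],
    customer: dict[str, str],
    backend_token: str,
) -> dict[str, str]:
    """Build headers for the backend request after the customer key was validated."""
    # Drop the customer's Authorization header and any client-supplied headers
    # in the internal namespace, so they cannot spoof attribution (assumption).
    headers = {
        name: value
        for name, value in incoming.items()
        if name.lower() != "authorization" and not name.lower().startswith("x-sage-")
    }
    # Replace the customer key with the private backend token.
    headers["Authorization"] = f"Bearer {backend_token}"
    # Forward attribution as internal headers (hypothetical names).
    headers["X-Sage-Customer-Id"] = customer["customer_id"]
    headers["X-Sage-User-Id"] = customer["user_id"]
    headers["X-Sage-Plan"] = customer["plan"]
    headers["X-Sage-Status"] = customer["status"]
    return headers


forwarded = build_backend_headers(
    {
        "Authorization": "Bearer sk_sage_example",
        "Content-Type": "application/json",
        "X-Sage-Plan": "max",  # spoofed by the client, must not survive
    },
    {"customer_id": "cus_1", "user_id": "usr_1", "plan": "pro", "status": "active"},
    backend_token="backend-secret",
)
print(forwarded["X-Sage-Plan"])
```

The raw generated key never appears in the forwarded headers, which matches the stated goal of keeping customer keys away from Tailnet model backends.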
The hosted account page at https://app.sagerouter.dev/account.html is the customer onboarding surface:
- Create an account or sign in with email, magic link, or an enabled Supabase OAuth provider.
- Choose Lite, Pro, or Max. Stripe checkout posts the selected plan to /billing/stripe/checkout; after checkout links a Stripe customer, the account page opens /billing/stripe/portal for self-service billing, payment-method changes, cancellation, and subscription management. Crypto/manual settlement stays available for accounts that are not ready for Stripe, with default settlement amounts derived from the selected monthly plan unless an agreed override is supplied. The account page can create a manual intent and refresh its bounded public status without echoing customer notes. Operators approve pending manual intents through the private /admin/payment-intents/{intent_id}/approve path; approval activates the selected plan, records a secret-free audit event, rejects replay/stale approvals, and still leaves suspended customers suspended.
- Generate an sk_sage_* API key, copy the raw key while it is shown once, test it against /v1/models, send a first browser-side sage-router/frontier chat completion from the account page, and use the copyable OpenAI SDK, Codex CLI, Anthropic-compatible, or curl quickstart.
The account page consumes the secret-free /pricing.billing readiness metadata
before opening checkout. If Stripe or the selected plan is not configured, the
Stripe button is disabled, the funnel records account_checkout_unavailable,
and the customer is directed to manual settlement or billing help instead of a
known failing checkout path. If checkout, billing-portal, or manual-settlement
requests fail after a buyer clicks, the account page records only coarse
failure states such as stripe_not_configured, unauthorized,
rate_limited, or service_unavailable so operators can fix conversion
friction without collecting raw billing errors or customer secrets.
Plan-specific pricing links such as /account.html?plan=pro preselect that
checkout plan, remember it locally through signup/login, and restore the plan
from Stripe success/cancel return URLs so new customers do not accidentally
land on the default checkout tier.
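
A plan-preselect link such as /account.html?plan=pro can be parsed with an allowlist so that unknown values never reach checkout. This is a minimal sketch; `preselected_plan` is a hypothetical helper, and returning `None` for unknown plans is an assumption (the document does not say which tier is the default).

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

ALLOWED_PLANS = {"lite", "pro", "max"}


def preselected_plan(url: str) -> Optional[str]:
    """Return the allowlisted plan from a ?plan=... query, or None if absent/unknown."""
    query = parse_qs(urlparse(url).query)
    values = query.get("plan", [])
    plan = values[0].strip().lower() if values else ""
    return plan if plan in ALLOWED_PLANS else None


print(preselected_plan("/account.html?plan=pro"))
```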
API keys created before checkout are stored, but the account page marks routing as blocked until the customer is active, trialing, or manually enabled; the edge enforces the same rule before proxying /v1/* traffic. Revoked keys and inactive accounts are rechecked against Supabase by default on every generated-key request. Customers are limited to SAGE_ROUTER_MAX_ACTIVE_API_KEYS_PER_CUSTOMER active generated keys at a time, default 5; revoked keys do not count against the cap. Signed-in customers can revoke their own generated keys from the account page even while verified-email gates block new key creation, Stripe checkout, or manual payment intent creation, so leaked keys can be shut down immediately without opening a support ticket. Successful customer revokes record a bounded api_key.revoke audit event and anonymous account-funnel revoke telemetry without raw generated keys, key hashes, prompts, provider credentials, or raw error payloads.
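
The active-key cap can be sketched as a count over non-revoked keys. The environment variable name and default of 5 come from the text above; the key record shape (a dict with a `revoked` flag) and both function names are hypothetical.

```python
import os


def max_active_keys() -> int:
    """Read the per-customer cap, falling back to the documented default of 5."""
    return int(os.environ.get("SAGE_ROUTER_MAX_ACTIVE_API_KEYS_PER_CUSTOMER", "5"))


def can_create_key(keys: list[dict], limit: int) -> bool:
    """Allow a new key only while active (non-revoked) keys are below the cap."""
    active = sum(1 for key in keys if not key.get("revoked", False))
    return active < limit


# Five active keys plus two revoked ones: the revoked keys do not count,
# but the five active keys already reach a cap of 5.
existing = [{"revoked": False}] * 5 + [{"revoked": True}] * 2
print(can_create_key(existing, limit=5))
```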
Operator abuse controls are fail-closed. GET /admin/customers?q=...&status=...&limit=... and GET /admin/customers/{customer_id} require the private operator token and return bounded customer, usage, activation, review-flag, operator-audit, and public API-key metadata for support and abuse review without raw generated keys, key hashes, provider credentials, prompts, or raw responses. Review flags are server-derived from bounded state such as suspension, routing block, missing first request, key limit, quota pressure, and idle paid usage. POST /admin/customers/{customer_id}/suspend sets the customer status to suspended, revokes all active generated API keys for that customer, records a secret-free operator audit event, and immediately blocks generated-key routing. A suspended status is sticky across Stripe lifecycle webhooks and manual payment approvals, so payment recovery or subscription updates cannot accidentally restore routing for an account held for abuse, chargeback, provider-risk, or security review. After review, POST /admin/customers/{customer_id}/unsuspend defaults the customer to inactive; operators can pass {"status":"active"} only when access should be restored. Previously revoked generated API keys stay revoked, and the unsuspend path also records a bounded operator audit event, so the customer must create a fresh key after reactivation.
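
The sticky-suspension rules above reduce to two small transitions. The sketch below is illustrative only: `apply_billing_update` and `unsuspend` are hypothetical names, and the statuses are plain strings rather than whatever representation the router uses.

```python
from typing import Optional


def apply_billing_update(current_status: str, new_status: str) -> str:
    """Apply a Stripe webhook or manual-approval status, keeping suspension sticky."""
    if current_status == "suspended":
        return "suspended"  # billing events never lift a suspension
    return new_status


def unsuspend(current_status: str, requested: Optional[str] = None) -> str:
    """Operator unsuspend: default to inactive unless active is explicitly requested."""
    if current_status != "suspended":
        return current_status
    return "active" if requested == "active" else "inactive"


print(apply_billing_update("suspended", "active"))
print(unsuspend("suspended"))
print(unsuspend("suspended", "active"))
```

Only the explicit operator path can move an account out of suspension, which is what keeps payment recovery from accidentally restoring routing.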
The account page also shows current-period usage from the same Supabase usage counter that the public edge enforces, including requests used, remaining monthly quota, the active request-per-minute limit, and upgrade recommendations when routing is blocked or usage passes 75%/90% of the current plan quota. The public edge publishes only safe enforcement metadata on /edge/health so launch readiness can verify Supabase auth, rate limits, pre-auth generated-key attempt throttling, durable quotas, immediate generated-key revocation, and non-wildcard browser CORS without exposing secrets. The built-in API key test calls the public edge's /v1/models endpoint with the generated key so a new customer can separate key, billing, quota, and backend availability problems before configuring an agent. The same account page can send a first browser-side sage-router/frontier chat completion with the session-only generated key, so users can prove paid routing works before copying client configuration. The support page gives those same customers a safe escalation path for account, billing, quota, generated-key, 401/402/429/503, reliability, security, and abuse issues without asking them to paste secrets into public channels. The signed-in account page renders a copyable safe support context packet with only plan, routing, quota, generated-key count, verification, first-request, endpoint, and support-path state; it omits prompts, provider credentials, OAuth tokens, generated API keys, private keys, cookies, raw provider responses, and customer data. The account launch checklist mirrors the $10k MRR activation funnel by marking signed-in account, paid routing, generated key, public-edge verification, and server-recorded first routed usage as separate steps. 
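
The 75%/90% usage thresholds mentioned above can be expressed as a small classifier. The labels `ok`, `warning`, and `critical` are hypothetical, and treating the thresholds as inclusive is an assumption.

```python
def usage_notice(used: int, quota: int) -> str:
    """Classify current-period usage against the plan quota."""
    if quota <= 0:
        raise ValueError("quota must be positive")
    ratio = used / quota
    if ratio >= 0.90:
        return "critical"  # at or past 90% of the plan quota
    if ratio >= 0.75:
        return "warning"   # at or past 75% of the plan quota
    return "ok"


# Using the Lite plan's 10,000 included requests as the quota.
print(usage_notice(7_500, 10_000))
```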
The hosted analytics dashboard at https://app.sagerouter.dev/analytics.html uses the signed-in account session and calls /account/analytics, so customers see only their own privacy-safe routing telemetry while /analytics and /analytics/funnel remain operator/global endpoints; it also reads the server-derived /account/usage.activation state for plan, usage, generated-key, first-request, quota, and routing status to show the next conversion action before or after checkout. Operators can view the private global launch funnel and operator customer review at https://app.sagerouter.dev/launch-funnel.html by entering the private admin or analytics token; the browser stores that token only in tab-scoped sessionStorage when explicitly requested. The launch funnel endpoint reports waitlist, managed-access beta interest, target provider family, commercial preference, support need, target launch window, and inbound intent demand, signup, generated-key, first-request, paid-conversion, retained-paid, estimated MRR, target attainment, target-aware bottlenecks, conversion actions with owner/surface/success metric, checkout-readiness friction from aggregate checkout unavailable, checkout failure, billing-portal failure, and manual-settlement failure events, anonymous marketing CTA event/plan/source/channel breakdowns, model catalog family/search-bucket demand, browser-visible auth provider state checks for OAuth onboarding, ranked acquisition actions, per-plan $10k MRR gaps, and prioritized plan revenue actions without returning email addresses, prompts, message bodies, API keys, provider credentials, OAuth tokens, raw campaign URLs, raw model search text, or raw responses. The managed-access intake also emits anonymous page-view, form-start, submit, and received events with only allowlisted qualification buckets, so one-subscription and Max implementation demand are visible before full contact submission while email and company fields remain confined to the waitlist path. 
When there are no qualified source/channel clicks yet, the same endpoint seeds zero-click acquisition actions for Gateway migration, GitHub builder traffic, pricing checkout proof, calculator qualification, model catalog activation, quickstart first-request activation, managed-access beta outreach, and launch-plan outreach so the operator view still points at concrete launch motions. The private launch funnel renders deterministic campaign links for those ranked acquisition actions, using only coarse action buckets to generate public UTM URLs rather than replaying raw visitor URLs, and each link can be copied from the dashboard for founder-sales outreach. The operator customer review calls /admin/customers through the control-plane edge and shows bounded customer, usage, activation, review flags, operator audit events, and public API-key metadata without raw generated keys, API-key hashes, provider credentials, prompts, or raw responses.
The hosted operator dashboard also includes an operational readiness panel that reads public /edge/health and /pricing metadata to show live edge health, API-key enforcement, Stripe checkout/portal readiness, and managed-provider launch gating without using the private token or exposing secrets. It also renders a copyable no-secret operator launch brief that condenses the current $10k MRR snapshot, top conversion move, revenue motions, acquisition links, model catalog demand, managed-access demand, checkout friction, and GitHub OAuth onboarding state for founder-sales and support follow-up without emails, prompts, OAuth tokens, generated API keys, provider credentials, raw campaign URLs, raw model search text, or raw responses.
Programmatic clients should call the API edge directly:
export OPENAI_BASE_URL=https://api.sagerouter.dev/v1
export OPENAI_API_KEY=sk_sage_your_key_here
curl "$OPENAI_BASE_URL/chat/completions" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "sage-router/frontier",
"messages": [{"role": "user", "content": "Say hello from Sage Router"}]
}'

Keep anonymous /v1/* traffic blocked at the edge. New users should reach account, billing, and API key workflows through the hosted control plane, then use generated API keys for model traffic.
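
For clients that prefer Python over curl, the same chat completion request can be built with only the standard library. This is a sketch, not an official client: `build_chat_request` and `send_chat_request` are hypothetical helper names, and the key shown is a placeholder.

```python
import json
import urllib.request


def build_chat_request(
    base_url: str,
    api_key: str,
    prompt: str,
    model: str = "sage-router/frontier",
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-compatible chat completion request."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_chat_request(
    "https://api.sagerouter.dev/v1",
    "sk_sage_your_key_here",  # placeholder, not a real key
    "Say hello from Sage Router",
)
print(request.full_url)
```

Calling `send_chat_request(request)` with a real generated key performs the network call; building and sending are kept separate so the request can be inspected first.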
Dedicated onboarding: https://sagerouter.dev/fusion.
Pro, Max, metered, manual, and operator-enabled customers can use
sage-router/fusion as a premium compound model:
curl "$OPENAI_BASE_URL/chat/completions" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "sage-router/fusion",
"messages": [{"role": "user", "content": "Compare the safest launch options and recommend one."}]
}'

Fusion fans the prompt to a small parallel panel of eligible high-quality
routes, then asks a judge route to synthesize consensus, contradictions, gaps,
and useful details into one OpenAI-compatible response. Lite/free generated
keys receive 402 fusion_plan_required.
Gateway-style clients can also attach Fusion as a premium server tool:
{
"model": "sage-router/frontier",
"messages": [{"role": "user", "content": "Survey the strongest arguments for and against this launch."}],
"tools": [{"type": "sage-router:fusion"}],
"tool_choice": "required"
}

sage-router:fusion server-tool entries are handled by
Sage Router before provider routing, so unknown server-tool markers are not
forwarded to downstream providers. tool_choice: "required" always invokes
Fusion; automatic tool choice invokes Fusion only for prompts that look like
multi-perspective research, comparison, review, risk, or decision work.
Ordinary function-tool workloads should keep using sage-router/agentic or
sage-router/frontier. Fusion route telemetry records only selected
provider/model IDs, elapsed times, status, plan, and auth type; it does not
store prompts, panel answers, final answers, API keys, OAuth tokens, provider
credentials, or raw provider responses.
Fusion requests return the canonical sage-router/fusion response shape.
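
One way to picture the server-tool handling described above is a pre-routing step that separates server-tool markers from ordinary tools. This sketch assumes that server tools are recognizable by a `sage-router:` type prefix, which the document does not state; `split_server_tools` is a hypothetical name, and the prompt heuristic used for automatic tool choice is deliberately left out.

```python
SERVER_TOOL_PREFIX = "sage-router:"  # assumed marker for router-handled tools
FUSION_TOOL = "sage-router:fusion"


def split_server_tools(request: dict) -> tuple[list[dict], bool, bool]:
    """Return (tools safe to forward, Fusion attached?, Fusion must run?)."""
    tools = request.get("tools", [])
    server = [t for t in tools if str(t.get("type", "")).startswith(SERVER_TOOL_PREFIX)]
    # Only non-server tools may reach downstream providers.
    forwardable = [t for t in tools if t not in server]
    has_fusion = any(t.get("type") == FUSION_TOOL for t in server)
    # tool_choice "required" always invokes Fusion when it is attached.
    must_run = has_fusion and request.get("tool_choice") == "required"
    return forwardable, has_fusion, must_run


example = {
    "model": "sage-router/frontier",
    "messages": [{"role": "user", "content": "Survey the strongest arguments."}],
    "tools": [{"type": "sage-router:fusion"}],
    "tool_choice": "required",
}
print(split_server_tools(example))
```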
Hosted plan limits are exposed from /pricing and enforced at the public edge:
| Plan | Price | Included requests | Rate limit |
|---|---|---|---|
| Lite | $6/month | 10,000/month | 60/minute |
| Pro | $30/month | 50,000/month | 180/minute |
| Max | $72/month | 200,000/month | 600/minute |
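
As a quick arithmetic check on the table, the list price per 1,000 included requests follows directly from price and quota. This is plain division on the published numbers, not necessarily the formula the hosted pages use for their estimates; `cost_per_1k` is a hypothetical helper.

```python
# (monthly price in USD, included requests per month, requests per minute)
PLANS = {
    "Lite": (6, 10_000, 60),
    "Pro": (30, 50_000, 180),
    "Max": (72, 200_000, 600),
}


def cost_per_1k(price_usd: float, included_requests: int) -> float:
    """List price per 1,000 included requests, rounded to cents."""
    return round(price_usd * 1000 / included_requests, 2)


for name, (price, included, _rate) in PLANS.items():
    print(f"{name}: ${cost_per_1k(price, included):.2f} per 1,000 requests")
```

On these numbers Lite and Pro both work out to $0.60 per 1,000 requests and Max to $0.36.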
Customer-facing hosted pricing and plan positioning are published at
https://sagerouter.dev/pricing. The buyer-facing launch plan at
https://sagerouter.dev/launch-plan turns the $10k MRR operating plan into
plan-mix, activation-funnel, operator-evidence, and managed-access boundary
copy. The launch math and $10k MRR operating plan live in
docs/saas-launch-10k-mrr.md.
For acquisition and onboarding, https://sagerouter.dev/quickstart gives new
customers a first hosted API request path, https://sagerouter.dev/api-troubleshooting
gives customers a no-secret 401/402/429/503 diagnostic path,
https://sagerouter.dev/docs/api-reference gives OpenAI-compatible customers
the hosted API contract for models, chat completions, Responses API, quotas,
rate limits, and failover signals,
https://sagerouter.dev/docs/gateway-migration gives gateway users a
base-URL, generated-key, route-profile, and provider-boundary migration path,
https://sagerouter.dev/docs/codex gives Codex CLI users hosted, local port 8790, and Tailnet profile examples, while
https://sagerouter.dev/agent-native explains route profiles, Responses API and Codex compatibility, health-aware fallback, BYOK custody, local/Tailnet/hosted deployment choices, and public feature metadata for agent harnesses, and
https://sagerouter.dev/models gives prospects a searchable public model catalog backed by safe /model-catalog metadata with embedded fallback, including sage-router/fusion as a Pro/Max synthesis route, while keeping live /v1/models behind generated sk_sage_* customer keys, and
https://sagerouter.dev/integrations gives tool-specific setup choices for
OpenAI-compatible clients, Codex, Cursor, Aider, Continue, Claude Code,
OpenHands, Anthropic-compatible clients, Ollama, local port 8790, and Tailnet
agents, and
https://sagerouter.dev/model-routing-calculator helps prospects estimate
routing savings, escalation rules, fallback gaps, and review rates for one
workflow before they create a hosted API key. The calculator recommends
Lite/Pro/Max from workflow volume, risk flags, and routing score, then carries
that plan into /account.html?plan=... for preselected checkout after account
creation. It also reads public /pricing billing readiness metadata so it can
use current plan limits and avoid promising a Stripe checkout when the selected
plan is not configured; in that case it records calculator_checkout_unavailable
and sends the prospect to the account/manual billing path.
The public homepage now treats hosted signup as live: the homepage primary CTA
is Create hosted API key, links directly to /account.html?plan=pro, and
keeps pricing, quickstart, status, model gateway comparison, model catalog,
security, analytics, login, and local GitHub install paths available from the
hero. The waitlist remains an updates/support path, not the primary conversion
path. When a prospect requests the future one-subscription managed access path
or Max implementation support, pricing and comparison pages link to
/managed-access; the private-beta intake stores contact and allowlisted
qualification buckets plus coarse inbound intent from known CTA URLs such as
?intent=max-implementation or ?intent=gateway-migration, so beta,
migration, and implementation demand can be measured without enabling public
provider resale. The intake asks which target provider family and commercial
preference a prospect would buy first, plus support need and target launch
window, including Ollama, OpenAI, and Anthropic private-beta interest for
authorization review.
Browser-originating waitlist writes are guarded before Supabase inserts: Sage
Router production hosts, Cloudflare Pages previews, local development, and exact
origins configured with SAGEROUTER_WAITLIST_ALLOWED_ORIGINS are accepted, and
Turnstile can be enabled as an additional bot challenge. Mutating waitlist
requests must carry an explicit trusted Origin; Referer is only stored as
sanitized attribution metadata and is not accepted as an origin fallback.
The hosted web app uses Supabase Auth. Email/password signup and email magic links are the baseline onboarding path; OAuth buttons are additive and appear only when the matching provider is enabled in Supabase. GitHub login requires a GitHub OAuth/GitHub App client, not repository permissions:
- Homepage URL: https://app.sagerouter.dev
- Authorization callback URL: https://awtangrlqqsdpksarhwo.supabase.co/auth/v1/callback
The account, login, and analytics pages read https://awtangrlqqsdpksarhwo.supabase.co/auth/v1/settings with the public anon key and hide disabled OAuth providers. When GitHub is disabled but visible in the Supabase settings payload, the UI says GitHub sign-in is pending owner setup and keeps email magic-link/password signup as the supported path. This keeps onboarding usable through email signup while GitHub or other providers are still being configured.
Email signup and magic-link requests attach bounded Supabase user metadata for launch attribution: selected hosted plan, signup surface, auth method, UTM source/medium/campaign, referrer host, and landing path. OAuth clicks persist the same bounded context in browser storage before the provider redirect. This metadata must not include prompts, workflow text, provider credentials, OAuth tokens, generated API keys, private keys, raw URLs, cookies, raw provider responses, or customer data.
Hosted customer actions require verified email by default when
SAGE_ROUTER_SUPABASE_AUTH_ENABLED=1. The account page still loads for
signed-in users and shows the verification state, but API-key creation, Stripe
checkout, and manual crypto payment intent creation return
email_verification_required until Supabase reports email_confirmed_at,
confirmed_at, or verified email metadata. Set
SAGE_ROUTER_REQUIRE_VERIFIED_EMAIL=0 only for trusted private/self-hosted
deployments.
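
The verified-email gate can be sketched as a check over the Supabase user record. `email_confirmed_at` and `confirmed_at` are named in the text above; the `user_metadata.email_verified` key stands in for "verified email metadata" and is an assumption, as are both function names.

```python
from typing import Optional


def is_email_verified(user: dict) -> bool:
    """True when the user record carries any of the accepted verification signals."""
    if user.get("email_confirmed_at") or user.get("confirmed_at"):
        return True
    metadata = user.get("user_metadata") or {}
    return metadata.get("email_verified") is True  # assumed metadata key


def guard_customer_action(user: dict, require_verified: bool = True) -> Optional[dict]:
    """Return an error payload for gated actions, or None when the action may proceed."""
    if require_verified and not is_email_verified(user):
        return {"error": "email_verification_required"}
    return None


print(guard_customer_action({"email": "new@example.com"}))
```

Passing `require_verified=False` corresponds to setting SAGE_ROUTER_REQUIRE_VERIFIED_EMAIL=0 on a trusted private deployment.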
When verification is required, the signed-in account page exposes a resend
verification control backed by Supabase Auth. It uses the authenticated account
email returned by the server, does not ask the user to retype an email address,
and records only aggregate resend-click/sent funnel events without storing
email addresses.
Manual crypto payment recovery is customer-scoped. A signed-in customer can
reload the account page and recover their latest pending or settled manual
payment intent through /billing/crypto/status without passing an intent id;
the response uses the same public payment shape as explicit status checks and
does not echo customer notes or operator-only billing context.
Browser-originating account and billing mutations are also origin-guarded on
the router before customer, API-key, or billing state is touched. Requests with
no Origin header continue to work for CLI and server clients, while present
origins must be Sage Router production hosts, Cloudflare Pages previews, local
development hosts, or exact origins configured with
SAGE_ROUTER_BROWSER_ALLOWED_ORIGINS.
The account page also renders hosted plan selection before sign-in from public
/pricing metadata. The selected Lite/Pro/Max plan is persisted in browser
storage, shows quota, rate limit, and estimated cost per 1,000 requests, and is
used after login when the customer continues to Stripe checkout.
The public homepage, calculator, pricing, launch plan, quickstart, and
model gateway comparison pages emit anonymous pre-signup page-view, CTA, and
quickstart snippet-copy intent to /api/funnel-event so the private launch
funnel can count demand before users create accounts.
The public model catalog also emits page-view, filter, CTA, and bucketed search
intent so operators can measure model-family demand without storing raw search
text.
The event path stores
event name, selected plan, sanitized source/target URL, and small metadata
buckets only; it must not store workflow text, prompt bodies, emails, API keys,
or provider credentials. Browser-originating writes are also guarded by allowed
Sage Router origins: production hosts, Cloudflare Pages preview hosts ending in
.sage-router-web.pages.dev, local development hosts, and any additional exact
origins configured with SAGEROUTER_FUNNEL_ALLOWED_ORIGINS. Mutating funnel
events must carry an explicit trusted Origin; Referer is stored only as
sanitized attribution metadata and is not accepted as an origin fallback. This
keeps the service-role-backed Supabase insert path from becoming a generic
third-party event sink.
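As an illustration of the "sanitized source/target URL" and referrer handling above, the sketch below keeps only scheme, host, and path, and reduces a Referer to its host. The exact sanitization rules are not specified in this document, so the function names, the event field names, and the choice to drop the port and query string are all assumptions.

```python
from urllib.parse import urlsplit, urlunsplit


def sanitize_url(raw: str) -> str:
    """Drop credentials, port, query string, and fragment from a URL."""
    parts = urlsplit(raw)
    host = parts.hostname or ""
    return urlunsplit((parts.scheme, host, parts.path, "", ""))


def referrer_host(referer: str) -> str:
    """Reduce a Referer header to its host for coarse attribution."""
    return urlsplit(referer).hostname or ""


def build_event(name: str, plan: str, source_url: str, referer: str) -> dict:
    """Assemble a funnel event from allow-listed, sanitized fields only."""
    return {
        "event": name,
        "plan": plan,
        "source_url": sanitize_url(source_url),
        "referrer_host": referrer_host(referer),
    }
```

Building the stored record from an explicit allow-list of fields, rather than filtering a client payload, makes it harder for prompt text or credentials to slip into the event table by accident.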
The account and standalone login pages also emit privacy-safe activation and
checkout intent events for signup/login attempts, OAuth clicks, wallet connect
attempts, browser-visible auth-provider state checks, plan selection, API-key
creation, setup snippet-copy intent, public-edge key verification, first browser test request success,
Stripe checkout clicks and returns, Stripe portal clicks and returns, and
crypto/manual payment intent clicks. Pricing,
calculator, model gateway comparison, model catalog, login, account, and homepage
events include only coarse attribution buckets such as source surface, UTM
source/medium/campaign tokens, referrer host, landing path, model family, and
search bucket; the operator launch funnel aggregates those into source-surface,
channel, model-family, and search-bucket counts without returning raw URLs, raw
model search text, emails, prompts, credentials, generated keys, wallet
addresses, provider credentials, OAuth secrets, completion text, or API keys.
Account setup snippet-copy events store only the snippet identifier, not the
copied snippet body or generated key. The private launch funnel rolls those
snippet IDs into setup-copy activation and setup-copy to first-request rates so
operators can tighten Codex/OpenAI snippets without storing customer secrets.
These events help diagnose customer drop-off after login, including whether
GitHub OAuth is still pending while email onboarding is available, and which
launch channels produce demand; Stripe webhooks and Supabase customer state
remain the source of truth for paid conversion, quota, and routing entitlement.
The hosted /billing recovery page uses the same event path for account,
pricing, support, troubleshooting, quickstart, and status clicks so payment
recovery friction is visible without collecting invoices, support messages,
secrets, prompts, generated keys, or provider credentials.
Bootstrap the GitHub app and wire Supabase without opening the Supabase dashboard:
bash scripts/bootstrap_github_supabase_auth.sh
Check the current GitHub/Supabase auth state without changing anything:
bash scripts/check_github_supabase_auth_status.sh
The status helper probes the Sage Router Supabase project
awtangrlqqsdpksarhwo, verifies the management-side site_url, email signup,
and app/API redirect allow-list when SUPABASE_ACCESS_TOKEN is present, then
checks browser-visible /auth/v1/settings with the project anon key. It prints
only pass/warn/fail status, never OAuth client secrets, anon keys, service-role
keys, or management tokens. A GitHub warning means email onboarding still works
and the owner approval step below is still pending. When GitHub is disabled, the
status helper prints the hosted fallback command with
SAGEROUTER_GITHUB_APP_LOCAL_CAPTURE=0. The bootstrapper writes the app
credentials to /home/digit/.openclaw/sage-router-github-auth.env by default, so
the one-time GitHub client secret is preserved locally before Supabase is patched.
The auth bootstrap, configurator, and read-only status helper silently load
only the needed variables from /home/digit/.openclaw/.env and
/home/digit/.openclaw/sage-router-github-auth.env when those variables are
not already set. Override the first path with SAGEROUTER_SECRET_ENV_FILE and
the GitHub credential path with SAGEROUTER_GITHUB_APP_ENV_OUTPUT.
GitHub requires an owner-approved browser step before it returns app credentials. By default the bootstrap script opens a local browser form, listens on an auto-selected http://127.0.0.1 port, captures GitHub's one-hour manifest code, exchanges it for the app client id/secret, and patches Supabase Auth in the same run.
On WSL/Windows, the bootstrap copies the generated manifest form into the
Windows temp directory, prints both the Windows and WSL paths, and opens the
Windows file:/// URL first. This avoids browser handlers that cannot read
\\wsl.localhost or WSL /tmp paths. If the browser does not appear, open the
printed Windows path manually or use the hosted callback fallback below.
To approve from another Tailnet device, bind the temporary listener on all
interfaces and advertise the machine's Tailnet IP. The script serves the same
one-time GitHub app manifest form at the printed Tailnet URL; open that form
URL, not the callback path directly, because the callback must include GitHub's
temporary ?code=... after approval:
SAGEROUTER_GITHUB_APP_LOCAL_CAPTURE_BIND=0.0.0.0 \
SAGEROUTER_GITHUB_APP_LOCAL_CAPTURE_HOST=100.115.208.70 \
bash scripts/bootstrap_github_supabase_auth.sh
After patching Supabase, the configurator verifies the management API state
(site_url, email auth, GitHub auth, and app/API redirect allow-list entries).
When a public anon/publishable key is available in the environment, it also
checks /auth/v1/settings so the browser-visible OAuth buttons match the
management config before the launch readiness script is rerun.
The verification defaults to the Sage Router project ref
awtangrlqqsdpksarhwo and the anon key published in the hosted app scripts.
If you override Supabase settings from the environment, prefer
SAGE_ROUTER_SUPABASE_URL and SAGE_ROUTER_SUPABASE_ANON_KEY; generic
PUBLIC_*, VITE_*, or SUPABASE_ANON_KEY values are accepted only when the
anon-key JWT belongs to the same project ref, which avoids false results on
machines that also work with other Supabase projects.
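The project-ref check on a generic anon key can be illustrated by decoding the JWT payload and reading its `ref` claim. This is a sketch under assumptions: it relies on Supabase anon keys carrying the project ref in a `ref` claim, it does not verify the signature, and the helper names are hypothetical.

```python
import base64
import json
from typing import Optional


def jwt_project_ref(token: str) -> Optional[str]:
    """Read the `ref` claim from a JWT payload without verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        return None
    payload_b64 = parts[1]
    # JWT segments omit base64 padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    try:
        claims = json.loads(base64.urlsafe_b64decode(padded))
    except ValueError:
        # Covers bad base64, bad UTF-8, and bad JSON.
        return None
    if not isinstance(claims, dict):
        return None
    return claims.get("ref")


def key_matches_project(token: str, expected_ref: str) -> bool:
    """Accept a generic anon key only when it belongs to the expected project."""
    return jwt_project_ref(token) == expected_ref
```

Because the check only compares a public claim, it guards against picking up the wrong project's key from the environment; it is not an authenticity check.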
If local capture is not available, fall back to the hosted callback page. After approving the app, GitHub redirects to /github-app-manifest with a temporary one-hour code; the page is marked noindex,nofollow, explains that the browser only holds the short-lived manifest code, and prints the exact local exchange command. Rerun the same script with the full callback URL or the raw code:
SAGEROUTER_GITHUB_APP_LOCAL_CAPTURE=0 \
bash scripts/bootstrap_github_supabase_auth.sh
bash scripts/bootstrap_github_supabase_auth.sh 'https://app.sagerouter.dev/github-app-manifest?code=...'
# or:
SAGEROUTER_GITHUB_APP_MANIFEST_CODE=... \
bash scripts/bootstrap_github_supabase_auth.sh
If the Supabase Management API token is being refreshed or debugged, preserve the one-time GitHub client secret before the Supabase patch runs:
bash scripts/bootstrap_github_supabase_auth.sh 'https://app.sagerouter.dev/github-app-manifest?code=...'
The callback page prints the exact command, including env loading, credential
preservation to /home/digit/.openclaw/sage-router-github-auth.env, and the
launch readiness rerun. It also shows the raw temporary code as a fallback if
clipboard access is blocked by the browser. If the code expires, rerun
SAGEROUTER_GITHUB_APP_LOCAL_CAPTURE=0 bash scripts/bootstrap_github_supabase_auth.sh
and approve the app again.
If a GitHub OAuth App already exists, pass its credentials directly:
SAGEROUTER_GITHUB_CLIENT_ID=... \
SAGEROUTER_GITHUB_CLIENT_SECRET=... \
bash scripts/configure_supabase_github_auth.sh
Check the current hosted launch gates with:
set -a; source /home/digit/.openclaw/.env; set +a
scripts/check_sagerouter_launch_readiness.sh
The readiness check verifies:
- the public API edge
- visible edge-layer headers with redacted public upstream IDs
- Supabase auth mode
- authenticated rate limits
- pre-auth generated-key attempt throttling
- durable edge quotas
- immediate generated-key revocation
- non-wildcard browser CORS
- lowest-latency retry failover metadata on /edge/health
- redacted public health snapshots without internal upstream URLs
- anonymous auth gating
- the API-only browser/dashboard boundary on api.sagerouter.dev
- browser CORS preflight for the hosted API-key verification, browser first-routed-request, and operator launch-funnel flows
- hosted pricing metadata including secret-free Stripe checkout readiness, configured Lite/Pro/Max checkout plans, billing portal readiness, verified-email billing requirements, generated-key activation metadata, and absence of leaked Stripe price IDs or secret tokens
- the managed provider access guard
- provider-family BYOK boundary
- direct origin auth gating
- Supabase management auth settings
- public browser-visible Supabase auth settings with email and GitHub OAuth enabled
- quota, funnel-event, and operator-audit schema
- hosted login/account/GitHub callback/operator launch funnel pages
- hosted security headers
- the public security/trust/support and terms/privacy/acceptable-use pages
- the provider-resale terms and margin-policy prerequisite pages
- the managed-access private beta intake page
- the API quickstart, the API troubleshooting page, the hosted API reference, the Gateway migration guide, the Codex setup page, the agent-native routing page, the integrations index, the dedicated billing recovery page, the model routing calculator, and the public launch plan
- the operator-only privacy-safe /analytics/funnel endpoint including managed-access beta demand fields, marketing source/channel attribution, and target-aware bottlenecks
- the bounded operator /admin/customers review endpoint with secret-free audit events and without raw keys or hashes
- the non-mutating waitlist health endpoint on SAGEROUTER_APP_BASE_URL (default https://app.sagerouter.dev)
- optional Cloudflare Turnstile waitlist configuration
- the marketing comparison/migration/pricing/billing/model/quickstart/troubleshooting/API-reference/Codex/agent-native/integrations/launch-plan pages on SAGEROUTER_MARKETING_BASE_URL (default https://sagerouter.dev)
By default, publicLaunch.managedProviderAccess.enabled must be false. If SAGEROUTER_MANAGED_PROVIDER_RESALE_ENABLED=1, readiness requires all of the following before treating bundled provider access as launchable:
- SAGEROUTER_PROVIDER_RESALE_TERMS_URL
- SAGEROUTER_PROVIDER_RESALE_TERMS_ACKNOWLEDGED=1
- SAGEROUTER_PROVIDER_RESALE_ALLOWED_PROVIDERS
- SAGEROUTER_PROVIDER_RESALE_MARGIN_POLICY_URL
- a positive SAGEROUTER_PROVIDER_RESALE_COST_CENTS_PER_1K_REQUESTS
- positive unit economics across fixed API plans with SAGEROUTER_PROVIDER_RESALE_MIN_GROSS_MARGIN_PERCENT at 30 or higher
- derived max-safe-cost thresholds in public metadata
- durable quota/rate-limit enforcement
- generated-key revocation
- operator abuse review
- durable operator audit events
- the managed-access acceptable-use boundary
Until those checks pass, public metadata reports requested: true, readinessSatisfied: false, enabled: false, the remaining missingControls, provider-family readiness, and one-subscription readiness that keeps OpenRouter BYOK-supported but outside bundled managed resale.
The direct-origin probe uses SAGEROUTER_ORIGIN_BASE_URL when set; otherwise it auto-discovers the Cloud Run URL from SAGEROUTER_CLOUD_RUN_PROJECT/SAGEROUTER_CLOUD_RUN_REGION/SAGEROUTER_CLOUD_RUN_SERVICE, defaulting to the live hosted service.
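The gross-margin gate and the derived max-safe-cost threshold mentioned for managed provider resale can be worked through with simple arithmetic. The sketch below assumes margin is computed per plan as (price minus provider cost) over price, with provider cost scaled from a cents-per-1,000-requests figure; the plan price and quota in the usage note are invented for illustration, and the function names are hypothetical.

```python
def gross_margin_percent(price_cents: int, cost_cents_per_1k: float, monthly_quota: int) -> float:
    """Gross margin for one plan if the customer uses the full monthly quota."""
    provider_cost = cost_cents_per_1k * monthly_quota / 1000
    return 100 * (price_cents - provider_cost) / price_cents


def max_safe_cost_cents_per_1k(price_cents: int, monthly_quota: int,
                               min_margin_percent: float = 30) -> float:
    """Highest cost per 1,000 requests that still meets the minimum margin.

    Derived from: price * (100 - margin) / 100 = cost_per_1k * quota / 1000.
    """
    return price_cents * (100 - min_margin_percent) * 10 / monthly_quota
```

For a hypothetical plan priced at 2000 cents with a 100,000-request quota, a provider cost of 10 cents per 1,000 requests gives a 50% margin, and the 30% floor corresponds to a maximum safe cost of 14 cents per 1,000 requests.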
Use the public deploy helper to avoid branch/digest drift between the static site and hosted API:
set -a; source /home/digit/.openclaw/.env; set +a
scripts/deploy_sagerouter_public.sh
The helper builds Cloudflare Pages from a clean temporary copy so local
node_modules or Dropbox permissions cannot affect the production build, then
deploys project sage-router-web to production branch main and reruns launch
readiness. To update Cloud Run in the same pass, set an immutable release image
digest:
GHCR_IMAGE_DIGEST=sha256:... scripts/deploy_sagerouter_public.sh
If SAGEROUTER_DEPLOY_CLOUD_RUN=1 is set without a digest, the helper resolves
the latest successful GitHub Actions Release image digest from the run log and
deploys that digest through the Artifact Registry GHCR remote cache.
Monthly API-key quotas require the Supabase usage counter table and RPC. Apply
the idempotent migration through the Supabase Management API before enabling
SAGE_ROUTER_EDGE_QUOTA_ENABLED=1:
set -a; source /home/digit/.openclaw/.env; set +a
scripts/apply_supabase_quota_schema.sh
scripts/check_sagerouter_launch_readiness.sh
When a generated customer key exhausts its monthly quota, the public edge
returns HTTP 402 with X-Quota-Period, X-Quota-Limit, X-Quota-Used,
X-Quota-Remaining, and X-Quota-Reset headers plus a secret-free JSON body
containing the current plan, usage, reset epoch, account upgrade URL, billing
URL, support URL, and status URL. Edge quota infrastructure failures remain
HTTP 503 and point customers to status/support rather than suggesting that a
plan upgrade will fix an operator-side issue.
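A client can use the status code and quota headers above to tell "upgrade or wait" apart from "operator-side outage". The following is a minimal sketch; the return shape and the function name are assumptions, and header values are kept as raw strings because their exact formats are not specified here.

```python
from typing import Mapping

QUOTA_HEADERS = (
    "X-Quota-Period",
    "X-Quota-Limit",
    "X-Quota-Used",
    "X-Quota-Remaining",
    "X-Quota-Reset",
)


def classify_edge_error(status: int, headers: Mapping[str, str]) -> dict:
    """Map an edge error response to a coarse client-side action."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    if status == 402:
        quota = {name: lowered.get(name.lower()) for name in QUOTA_HEADERS}
        return {"kind": "quota_exhausted", "action": "upgrade_or_wait", "quota": quota}
    if status == 503:
        # Infrastructure failure: a plan upgrade would not help.
        return {"kind": "edge_unavailable", "action": "check_status_or_support", "quota": None}
    return {"kind": "other", "action": "inspect_response", "quota": None}
```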
Stripe checkout reuses an existing stripe_customer_id when a customer is already linked, the account page exposes Stripe's customer billing portal after checkout, and Stripe webhook retries are idempotent by event_id. Checkout completion only activates generated-key routing when the signed Checkout Session reports payment_status=paid or payment_status=no_payment_required; unpaid or missing payment status events are recorded but do not grant or change routing entitlement. Signed subscription lifecycle webhooks update customer routing status: active/trialing subscriptions enable generated-key routing, canceled subscriptions disable routing, and failed or uncollectible invoices mark the customer past_due. Later signed invoice.payment_succeeded, invoice.paid, or checkout.session.async_payment_succeeded events restore active generated-key routing after resolving the existing Stripe customer binding and deriving the Sage Router plan from invoice line price IDs or checkout metadata, unless the customer has been operator-suspended. Subscription create/update webhooks derive the effective Sage Router plan from Stripe subscription item price IDs first, then fall back to webhook metadata or the current customer plan, so Stripe portal plan changes do not leave quota and routing state on a stale plan. Webhooks also verify that metadata.customer_id/client_reference_id agrees with any existing stripe_customer_id binding before changing billing state, so stale or misdirected Stripe metadata cannot reassign another customer's quota or routing entitlement. Apply supabase/migrations/20260619034200_stripe_webhook_idempotency.sql anywhere the SaaS tables already exist so duplicate signed webhook deliveries cannot create duplicate payment event rows.
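Two of the webhook rules above, idempotency by event id and activation only on a paid Checkout Session, can be sketched as follows. This is not the project's handler: signature verification is omitted, the in-memory set stands in for the unique constraint added by the migration, and the class and function names are hypothetical. The event shape (`id`, `type`, `data.object`) follows Stripe's documented webhook payload.

```python
PAID_STATUSES = {"paid", "no_payment_required"}


def checkout_grants_routing(session: dict) -> bool:
    """Only a paid (or free) Checkout Session may activate generated-key routing."""
    return session.get("payment_status") in PAID_STATUSES


class WebhookLedger:
    """Tracks processed event ids so retried deliveries are no-ops."""

    def __init__(self) -> None:
        self._seen = set()

    def process(self, event: dict) -> str:
        event_id = event["id"]
        if event_id in self._seen:
            return "duplicate"
        self._seen.add(event_id)
        session = event.get("data", {}).get("object", {})
        if event.get("type") == "checkout.session.completed" and checkout_grants_routing(session):
            return "activated"
        # Unpaid or unrelated events are recorded but change no entitlement.
        return "recorded"
```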
For the existing GCP deployment notes, see deploy/gcp. For the privacy-preserving relay design where customer credentials stay on the user's machine, see docs/cloud-tunnel.
Point any OpenAI-compatible tool at Sage Router:
export OPENAI_BASE_URL=http://localhost:8790/v1
export OPENAI_API_KEY=irrelevant # Sage Router uses your configured provider auth
Or for Gemini CLI:
export GOOGLE_GEMINI_BASE_URL=http://localhost:8790
export GEMINI_API_KEY=routed
Or for Anthropic tools:
export ANTHROPIC_BASE_URL=http://localhost:8790
export ANTHROPIC_API_KEY=irrelevant
Codex CLI can use Sage Router through the OpenAI Responses-compatible endpoint. The public setup guide is published at https://sagerouter.dev/docs/codex. For local port 8790, add the provider and profile to ~/.codex/config.toml:
[model_providers.sage-router]
name = "Sage Router"
base_url = "http://127.0.0.1:8790/v1/"
env_key = "SAGE_ROUTER_API_KEY"
wire_api = "responses"
[profiles.sage-router-frontier]
model_provider = "sage-router"
model = "sage-router/frontier"
Run Codex with:
export SAGE_ROUTER_API_KEY=local-router
codex --profile sage-router-frontier
The sage-router/frontier model name selects the bundled frontier routing profile from router-profiles.json.
- Codex CLI
- Claude Code
- OpenClaw
- Hermes
- Pi agents
- Cursor
- Aider
- Continue
- OpenHands
- Ollama and Ollama Cloud
- NVIDIA NIM / NVIDIA Cloud
- OpenAI-compatible clients
- Anthropic-compatible clients
- Harness fallback
Sage Router connects directly to chatgpt.com/backend-api/codex using the same OpenAI OAuth token stored in your OpenClaw auth-profiles.json. No API key needed when an auth profile is present — it reads the current ChatGPT session JWT from ~/.openclaw/agents/main/agent/auth-profiles.json and refreshes it on each request.
Sage Router intentionally does not implement its own auth.openai.com/codex/device OAuth route. Use the official Codex/OpenClaw sign-in flow, then import the resulting OpenClaw auth profile or pass an access token through environment.
To use:
# Force a request through the Codex backend
curl http://localhost:8790/v1/chat/completions \
-H "Authorization: Bearer local" \
-d '{"model":"openai-codex/gpt-5.5","messages":[{"role":"user","content":"Hello"}]}'
The openai-codex provider is enabled by default. Models: gpt-5.5, gpt-5.4, gpt-5.4-mini, gpt-5.4-pro.
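The same forced-route request can be built from Python with only the standard library. This sketch constructs the request without sending it; the helper name `build_chat_request` is hypothetical, and the explicit Content-Type header is an addition that the curl example does not set.

```python
import json
import urllib.request


def build_chat_request(prompt: str,
                       model: str = "openai-codex/gpt-5.5",
                       base_url: str = "http://localhost:8790/v1",
                       api_key: str = "local") -> urllib.request.Request:
    """Build a POST to /chat/completions that pins a provider/model."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# To send it against a running router:
#   with urllib.request.urlopen(build_chat_request("Hello")) as resp:
#       print(json.load(resp))
```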
Key env vars:
- SAGE_ROUTER_OPENAI_CODEX_AUTH_PROFILE_PATH: path to auth-profiles.json (default: ~/.openclaw/agents/main/agent/auth-profiles.json)
- SAGE_ROUTER_OPENAI_CODEX_AUTH_PROFILE_PATHS: comma-separated auth profile paths, useful for container app-data layouts
- OPENAI_CODEX_API_KEY, OPENAI_CODEX_ACCESS_TOKEN, or CODEX_ACCESS_TOKEN: fallback OAuth access token (used if auth-profiles is unavailable)
Local, Tailnet, Umbrel, and Docker installs ship a built-in web dashboard at
the root URL (/). Open that private install URL in a browser to see:
- Provider health status and latency
- Available models per provider
- Usage analytics
- Provider enable/disable toggles
- API key management
- Learned model modalities with per-model edit/reset controls
For programmatic clients (sending Accept: application/json), the private
install root URL returns the JSON API descriptor instead. The dashboard is also
available at /dashboard on private installs.
Hosted SaaS uses a stricter browser boundary: https://api.sagerouter.dev/
and https://api.sagerouter.dev/dashboard return JSON auth/onboarding guidance
instead of serving browser UI. Customer login, account, usage analytics, and
operator launch-funnel dashboards live on https://app.sagerouter.dev, while
model traffic uses generated sk_sage_* API keys at
https://api.sagerouter.dev/v1.
Enable SAGE_ROUTER_SHOW_MODEL_PREFIX=1 to prefix every chat response with [provider/model] so you can see which model answered:
[openai-codex/gpt-5.5] Here is the response...
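Clients that log or display the answering model may want to strip this prefix again. A minimal sketch, assuming the prefix always has the `[provider/model] ` shape shown above; the function name is hypothetical.

```python
import re

# Provider has no slash; the model part may contain slashes or colons.
PREFIX_RE = re.compile(r"^\[([^/\]\s]+)/([^\]]+)\]\s*")


def split_model_prefix(text: str):
    """Return (provider, model, remaining_text); provider/model are None if absent."""
    match = PREFIX_RE.match(text)
    if not match:
        return None, None, text
    return match.group(1), match.group(2), text[match.end():]
```

Text that merely starts with a bracket, such as a footnote marker, is left untouched because the pattern requires a slash inside the brackets.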
| Endpoint | Format | Used By |
|---|---|---|
| POST /v1/responses | OpenAI Responses | Codex CLI custom providers |
| POST /v1/chat/completions | OpenAI | OpenAI SDK, Aider, Continue, Zed |
| POST /v1/messages | Anthropic | Cursor, Claude Code, Claude Desktop |
| POST /v1beta/models/{model}:generateContent | Gemini | Gemini CLI |
| POST /v1beta/models/{model}:streamGenerateContent | Gemini | Gemini CLI (streaming) |
| GET /v1beta/models | Gemini | Gemini CLI (model discovery) |
| POST /chat/completions | OpenAI | Legacy/short path |
Sage Router analyzes every request for:
- Intent: CODE, CHAT, REASONING, CREATIVE, REFACTOR, DOCUMENTATION
- Complexity: LOW, MEDIUM, HIGH, UNKNOWN
- Requirements: reasoning, json, tools, longContext, streaming
- Thinking level: off, low, medium, high
Then it scores all available models and selects the optimal chain:
Request: "Refactor this Python function"
→ Intent: CODE, Complexity: MEDIUM
→ Route Mode: balanced
→ Selected Chain:
1. ollama/glm-5.1:cloud (best score for CODE + available)
2. openai-codex/gpt-5.5 (fallback)
3. ollama/kimi-k2:cloud (fallback)
4. openai/gpt-4.1 (last resort)
If the first model fails or times out, it automatically tries the next. No manual retry needed.
The core selection path lives in router.py across normalize_requirements,
prepare_route, select_model, score_provider_model, and route_request.
In pseudocode:
function route_request(payload):
    messages = payload.messages
    thinking = normalize_thinking(payload.reasoning / payload.thinking)
    route_mode = payload.routeMode or "balanced"
    requirements = normalize_requirements(payload)
    latest_prompt = last user/developer message text
    intent = classify_intent(latest_prompt) # code, analysis, general, creative, etc.
    complexity = estimate_complexity(latest_prompt)
    estimated_tokens = estimate_prompt_tokens(messages)
    if caller forced a provider/model:
        candidate_chain = validate_forced_route_against_requirements()
    else:
        candidate_chain = []
        rejected = []
        for each provider in configured providers:
            if provider is disabled: continue
            if provider is Ollama: refresh discovered model list
            if provider has no models or endpoint is unreachable: continue
            for each model in provider.models:
                if local-first mode and provider is approved decentralized infrastructure: allow
                if local-first mode and provider endpoint is not LAN/Tailnet/local: reject
                if local-first mode and provider is a known cloud/SSO proxy: reject
                if local-first mode and model is an Ollama Cloud model: reject
                if model is not chat-capable: reject
                if model does not satisfy hard requirements
                   such as JSON, tools, streaming, reasoning, or long context: reject
                score = base score for intent + provider API type
                score += model-name intent hints
                score += / -= context-window fit
                score += / -= provider/model family preferences
                score += / -= route mode preference
                    # fast, realtime, best, local-first, balanced
                score += / -= thinking-level preference
                    # high favors larger reasoning models, low favors lightweight models
                score += / -= current health/cooldown signal
                score += / -= empirical latency/success adjustment
                if tools were supplied but not forced:
                    score += soft bonus for models with tool support
                candidate_chain.append((score, provider, model))
        sort candidate_chain by score descending, then provider/model name for stability
        candidate_chain = top MAX_PROVIDER_ATTEMPTS
    for provider, model in candidate_chain:
        response = call_provider(provider, model, payload)
        if response has visible text or valid tool calls:
            record successful route event
            return response
        record failure and try next candidate
    record failed route event
    return provider_failure_error
Hard filters happen before scoring, so an otherwise high-scoring model is never selected when it cannot satisfy explicit requirements like forced tool calling or JSON output. Soft preferences, such as attached optional tools, influence the score without unnecessarily shrinking the candidate pool.
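The split between hard filters and soft preferences can be shown in a small runnable form. This is a simplified sketch, not the code in router.py: the `Candidate` type, the capability strings, the bonus size, and the attempt limit of 3 are all assumptions made for illustration.

```python
from dataclasses import dataclass

MAX_PROVIDER_ATTEMPTS = 3  # hypothetical value; the real limit lives in router.py
TOOL_BONUS = 5.0           # hypothetical size of the soft tool-support bonus


@dataclass(frozen=True)
class Candidate:
    provider: str
    model: str
    capabilities: frozenset  # e.g. frozenset({"json", "tools", "streaming"})
    base_score: float


def build_chain(candidates, hard_requirements, optional_tools=False):
    """Filter on hard requirements first, then score and rank what is left."""
    scored = []
    for cand in candidates:
        # Hard filter: a model missing any required capability is never scored.
        if not set(hard_requirements).issubset(cand.capabilities):
            continue
        score = cand.base_score
        # Soft preference: optional tools nudge the score but exclude nothing.
        if optional_tools and "tools" in cand.capabilities:
            score += TOOL_BONUS
        scored.append((score, cand))
    # Score descending, then provider/model name so ties are stable.
    scored.sort(key=lambda item: (-item[0], item[1].provider, item[1].model))
    return [cand for _, cand in scored[:MAX_PROVIDER_ATTEMPTS]]


def route(chain, call_provider):
    """Try each candidate in order; return the first non-empty reply."""
    for cand in chain:
        try:
            reply = call_provider(cand)
        except Exception:
            continue  # provider failed: fall through to the next candidate
        if reply:
            return cand, reply
    return None, None
```

With a JSON hard requirement, a high-scoring model that lacks JSON support never enters the chain, while a model without tool support still does when tools are only optional.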
Configure any number of providers in openclaw.json or via environment variables:
{
"providers": {
"ollama": {
"baseUrl": "http://localhost:11434",
"models": ["auto-discover"],
"api": "ollama"
}
}
}
Models are auto-discovered via /api/tags.
{
"providers": {
"anthropic": {
"baseUrl": "https://api.anthropic.com",
"apiKey": "${ANTHROPIC_API_KEY}",
"models": ["claude-opus-4", "claude-sonnet-4", "claude-haiku-4"],
"api": "anthropic-messages"
}
}
}
Pro tip: Route Claude subscription usage through Dario to avoid burning API credits when available.
{
"providers": {
"openai": {
"baseUrl": "https://api.openai.com/v1",
"apiKey": "${OPENAI_API_KEY}",
"models": ["auto-discover"],
"api": "openai-completions"
}
}
}
Models are auto-discovered via /v1/models.
OpenRouter remains a supported OpenAI-compatible provider. Configure it only with an authorized customer-controlled key; Sage Router does not sell bundled OpenRouter access. OpenRouter models stay discoverable and routable through the BYOK-compatible provider path, but OpenRouter does not count as a managed provider resale family for Sage Router subscription packaging unless a separate provider authorization is added later.
{
"providers": {
"openrouter": {
"baseUrl": "https://openrouter.ai/api/v1",
"apiKey": "${OPENROUTER_API_KEY}",
"models": ["auto-discover"],
"api": "openai-completions"
}
}
}
Models are auto-discovered via /v1/models. Set
SAGE_ROUTER_OPENROUTER_FREE_ONLY=1 to keep discovery constrained to free
model IDs when using that account mode.
Darkbloom is OpenAI-compatible at https://api.darkbloom.dev. If DARKBLOOM_API_KEY is present in ~/.openclaw/.env or the skill-local .env, Sage Router loads it automatically through the bundled darkbloom provider profile.
{
"providers": {
"darkbloom": {
"baseUrl": "https://api.darkbloom.dev",
"apiKey": "${DARKBLOOM_API_KEY}",
"models": "auto-discover",
"api": "openai-completions"
}
}
}
Models are auto-discovered via /v1/models. Chat requests route through /v1/chat/completions.
{
"providers": {
"google": {
"baseUrl": "https://generativelanguage.googleapis.com/v1beta",
"apiKey": "${GEMINI_API_KEY}",
"models": ["auto-discover"],
"api": "google-generative-ai"
}
}
}
Models are auto-discovered via the Gemini API.
{
"providers": {
"github-copilot": {
"baseUrl": "https://api.githubcopilot.com",
"apiKey": "${GITHUB_COPILOT_TOKEN}",
"models": ["auto-discover"],
"api": "openai-completions"
}
}
}
Models are auto-discovered via Copilot's /v1/models.
### xAI (Grok)
**API Key mode** (recommended for production):
```json
{
"providers": {
"xai": {
"baseUrl": "https://api.x.ai/v1",
"apiKey": "${XAI_API_KEY}",
"models": ["auto-discover"],
"api": "openai-completions"
}
}
}
```
Models are auto-discovered via /v1/models. Supports tool calling, streaming, and passthrough.
{
"plugins": {
"entries": {
"nvidia": {
"enabled": true,
"config": {
"autoDiscovery": {
"enabled": true,
"base_url": "integrate.api.nvidia.com/v1",
"api_key": "$NVIDIA_API_KEY"
}
}
}
}
}
}
Models are auto-discovered from NVIDIA NIM / NVIDIA Cloud when NVIDIA_API_KEY is present. This is useful for GPU-accelerated hosted inference and NVIDIA-backed model endpoints without changing agent configuration.
{
"providers": {
"openai-codex": {
"baseUrl": "http://127.0.0.1:8790",
"models": ["auto-discover"],
"api": "openclaw-gateway"
}
}
}
Models are auto-discovered via the gateway's /v1/models endpoint.
No config needed — Sage Router reads your ChatGPT OAuth JWT from ~/.openclaw/agents/main/agent/auth-profiles.json automatically. Models: gpt-5.5, gpt-5.4, gpt-5.4-mini, gpt-5.4-pro.
The pre-built image is available at ghcr.io/earlvanze/sage-router-public (public). It bundles Sage Router plus Dario and the config dashboard.
Mount an app-owned config directory and add provider or Codex credentials from the dashboard:
docker run -p 8790:8790 \
-v sage-router-config:/config \
-v sage-router-ollama:/root/.ollama \
ghcr.io/earlvanze/sage-router-public:v3.28.7
Provider credentials and imported Codex auth JSON are written under /config.
Do not mount another app's private state or a host user's home directory for
auth; copy/import compatible config through the Sage Router setup flow instead.
Or build from source:
# Router only, with Dario available for Anthropic-compatible requests
docker compose up -d --build
# Router + llama.cpp GPU classifier sidecar
SAGE_ROUTER_INTENT_CLASSIFIER_ENABLED=1 \
SAGE_ROUTER_MODELS_DIR=/path/to/gguf-models \
docker compose --profile classifier up -d --build
Key production flags:
SAGE_ROUTER_OPENROUTER_FREE_ONLY=1
SAGE_ROUTER_DARIO_AUTOSTART=1
SAGE_ROUTER_INTENT_CLASSIFIER_ENABLED=1
SAGE_ROUTER_INTENT_CLASSIFIER_PROVIDER=llamacpp
SAGE_ROUTER_INTENT_CLASSIFIER_BASE_URL=http://llamacpp-classifier:8080
SAGE_ROUTER_INTENT_CLASSIFIER_MODEL=classifier
SAGE_ROUTER_INTENT_CLASSIFIER_MODEL_PATH=/models/qwen2.5-0.5b-instruct-q4_K_M.gguf
SAGE_ROUTER_INTENT_CLASSIFIER_N_GPU_LAYERS=999
The classifier backend speaks OpenAI-compatible llama.cpp server API (/v1/chat/completions), so it can be run as a sidecar, on Cyber GPU, or replaced by any compatible local inference server.
| Provider | Dynamic Discovery | Force Model | Passthrough | Auth Method |
|---|---|---|---|---|
| Ollama | ✅ /api/tags | ✅ | ✅ | Local socket |
| Google Gemini | ✅ /v1beta/models | ✅ | ✅ | API key |
| Anthropic | ✅ Via Dario | ✅ | ✅ | API key |
| OpenAI | ✅ /v1/models | ✅ | ✅ | API key |
| GitHub Copilot | ✅ /v1/models | ✅ | ✅ | Token |
| NVIDIA NIM / Cloud | ✅ auto-discovery | ✅ | ✅ | API key |
| OpenClaw Gateway | ✅ /v1/models | ✅ | ✅ | Gateway token |
| OpenClaw Codex OAuth | ✅ auto-profile | ✅ | ✅ | ChatGPT JWT (auth-profiles) |
| xAI/Grok (API) | ✅ /v1/models | ✅ | ✅ | API key |
| xAI/Grok (SSO) | ❌ SSO proxy | ❌ | ❌ | Cookie/SSO |
Dynamic Discovery: Models are auto-fetched from provider API
Force Model: Request specific model via "model": "provider/model"
Passthrough: Any model name accepted (even if not in discovered list)
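The "provider/model" convention for forcing a model implies splitting on the first slash only, since some model IDs contain slashes themselves. A small sketch under that assumption; the function name and the idea of checking against a known-provider set are hypothetical.

```python
def parse_forced_model(model_field: str, known_providers):
    """Split "provider/model" on the first slash when the provider is configured.

    Returns (provider, model); provider is None when nothing is forced, in which
    case the whole string is treated as a plain or passthrough model name.
    """
    provider, sep, model = model_field.partition("/")
    if sep and model and provider in known_providers:
        return provider, model
    return None, model_field
```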
Control how Sage Router selects models:
| Mode | Behavior |
|---|---|
| fast | Prefer local models, minimize latency |
| balanced | Balance capability and speed |
| best | Always pick the best model for the task, regardless of latency |
| local-first / local-strict | Local-strict mode. Only use local, LAN, Tailnet, or approved decentralized provider endpoints. Reject centralized Internet APIs such as OpenAI, Anthropic/Dario, Google, NVIDIA Cloud, Copilot, hosted model gateways, etc. Darkbloom is allowed as decentralized infrastructure. Ollama models ending in :cloud are still excluded even if the Ollama endpoint is localhost. |
Set via request: {"route": "fast"} or header: X-Route-Mode: fast
Control reasoning depth per request:
| Level | Description |
|---|---|
| off | No reasoning, maximum speed |
| low | Minimal reasoning |
| medium | Standard reasoning (default) |
| high | Deep reasoning for complex tasks |
Set via request: {"thinking": "high"} or {"reasoning": "high"}
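A client wrapper might validate the route mode and thinking level before attaching them to a request. This is a sketch only: the helper names are hypothetical, the allowed values are copied from the two tables above, and the returned fields are meant to be merged into an ordinary chat completion payload.

```python
ROUTE_MODES = {"fast", "balanced", "best", "local-first", "local-strict"}
THINKING_LEVELS = {"off", "low", "medium", "high"}


def routing_fields(route: str = "balanced", thinking: str = "medium") -> dict:
    """Body fields that select a route mode and thinking level."""
    if route not in ROUTE_MODES:
        raise ValueError(f"unknown route mode: {route!r}")
    if thinking not in THINKING_LEVELS:
        raise ValueError(f"unknown thinking level: {thinking!r}")
    return {"route": route, "thinking": thinking}


def routing_headers(route: str) -> dict:
    """Header alternative for clients that cannot change the request body."""
    if route not in ROUTE_MODES:
        raise ValueError(f"unknown route mode: {route!r}")
    return {"X-Route-Mode": route}
```

For example, `{"messages": [...], **routing_fields("fast", "high")}` yields a body with both fields set.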
To surface routing info back in the response payload, send:
{
"debug": true
}
or:
{
"routeDebug": true
}
Current behavior:
- response headers always include X-Sage-Router-* routing metadata
- /health exposes the last selected provider/model and attempts
- debug mode adds sage_router metadata to the JSON response
- for plain text responses, debug mode also prefixes the visible content with the selected provider/model
curl http://localhost:8790/health
Returns:
- Configured providers
- Available models
- Last route decision
- Reasoning capabilities by provider
- Selected provider/model, attempt history, and rejection reasons for the last request
Every routed response also includes headers like:
- X-Sage-Router-Model
- X-Sage-Router-Provider
- X-Sage-Router-Intent
- X-Sage-Router-Request-Id
Use these when you need to know exactly which model answered.
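Collecting these headers into one dictionary makes them easy to log next to each request. A minimal sketch; the function name and the choice to strip the common prefix are assumptions.

```python
from typing import Mapping

PREFIX = "x-sage-router-"


def routing_metadata(headers: Mapping[str, str]) -> dict:
    """Collect X-Sage-Router-* response headers, keyed without the prefix."""
    return {
        name.lower()[len(PREFIX):]: value
        for name, value in headers.items()
        if name.lower().startswith(PREFIX)  # header names are case-insensitive
    }
```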
Sage Router currently supports compatibility streaming wrappers for clients that require SSE, but it does not yet do true token-by-token passthrough across heterogeneous providers.
That means stream-shaped responses work for client compatibility, but they may still arrive buffered after the selected provider finishes.
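Because the stream may arrive in one buffered burst, a client should not assume anything about timing and only needs to reassemble the chunks. The sketch below assumes the compatibility wrapper emits OpenAI-style chat completion chunks (`data: {...}` lines ending with `data: [DONE]`); that shape is an assumption about the wrapper, and the function name is hypothetical.

```python
import json
from typing import Iterable


def collect_sse_text(lines: Iterable[str]) -> str:
    """Concatenate delta content from OpenAI-style SSE chat chunks."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```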
Sage Router was built because switching API keys between coding agents is tedious, burning Claude API credits on trivial tasks is wasteful, and configuring models in 3 different places is fragile.
With OpenClaw Codex OAuth, you get ChatGPT Pro/Codex access through your existing session token — no API key, no gateway subprocess, no stale tokens. The router reads the JWT directly from your OpenClaw auth profile and sends it to chatgpt.com/backend-api/codex.
# Provider API keys (used for auto-discovery)
ANTHROPIC_API_KEY=sk-...
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=AIza...
NVIDIA_API_KEY=nvapi-...
OLLAMA_HOST=http://localhost:11434
# OpenClaw Codex OAuth (chatgpt.com/backend-api/codex)
SAGE_ROUTER_OPENAI_CODEX_AUTH_PROFILE_PATH=~/.openclaw/agents/main/agent/auth-profiles.json
CODEX_ACCESS_TOKEN= # fallback OAuth access token when no auth profile is mounted
# Router behavior
SAGE_ROUTER_DEFAULT_MODE=balanced
SAGE_ROUTER_TIMEOUT=60
SAGE_ROUTER_SHOW_MODEL_PREFIX=1 # prefix responses with [provider/model]
SAGE_ROUTER_DISABLED_PROVIDERS= # comma-separated: ollama-cloud,anthropic,...
{
  "models": {
    "providers": {
      "ollama": {
        "baseUrl": "http://localhost:11434",
        "models": [{"id": "llama3.2:latest", "name": "Llama 3.2"}],
        "api": "ollama"
      },
      "anthropic": {
        "baseUrl": "https://api.anthropic.com",
        "apiKey": "${ANTHROPIC_API_KEY}",
        "models": [{"id": "claude-sonnet-4", "name": "Claude Sonnet"}],
        "api": "anthropic-messages"
      }
    }
  }
}
---
## Development
```bash
# Run locally
python3 router.py --port 8790
# Run tests
pytest tests/
# Enable debug logging
LOG_LEVEL=DEBUG python3 router.py
```
- Multi-modal support (vision-capable model detection and image payload routing)
- Tool/function calling proxy (OpenAI, Ollama, and Anthropic-compatible tool-call normalization)
- Cloudflare Pages marketing site on https://sagerouter.dev
- Integration guides for major agent harnesses and SDK-compatible clients
- Waitlist capture into AOps Supabase with optional Turnstile abuse protection
- OpenClaw Codex OAuth passthrough (chatgpt.com/backend-api/codex)
- Umbrel App Store packaging (v1.0.4)
- Built-in config dashboard with provider health, toggles, and analytics
- Request/response caching
- Customer-scoped usage analytics dashboard
- Distributed deployment mode
- CDN-hosted option / hosted reliability layer
MIT — Use it, fork it, improve it. PRs welcome.
Built this because I was tired of:
- Switching API keys between coding agents
- Burning Claude API credits on trivial tasks
- Configuring new models in 3 different places
If you're running local AI infrastructure, Sage Router is the single endpoint that makes everything else just work.
🦞
Sage Router supports named routing profiles for reusable policy bundles. Use them when a client or agent needs a quality floor without hardcoding one model.
Request a profile with any of:
{ "model": "sage-router/frontier" }
{ "model": "frontier" }
{ "profile": "frontier" }
{ "routerProfile": "coding-max" }
Profiles live in router-profiles.json and can set:
- route mode: fast, balanced, best, local-first, realtime
- thinking level: low, medium, high
- requirements: quality, reasoning, tools, JSON, vision, documents, long context
- constraints: provider/model allowlists and denylists, minParamsB, frontierLargeOnly, frontierOrReasoningTools, suppressIntermediateToolText
Bundled profiles:
- frontier: public-channel quality profile, high thinking, quality/reasoning required, tiny/free filler models blocked, tool-call narration suppressed
- fusion: premium multi-model panel plus judge synthesis for chat prompts where several authorized routes are worth the extra latency and cost
- frontier-large: strict frontier/large-model-only routing
- fast-local: low-latency local-first routing
- coding-max: high-thinking coding route with weak model exclusions
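The four ways of requesting a profile can be resolved with one small lookup. This is an illustrative sketch: the precedence order (routerProfile, then profile, then model) is an assumption that the document does not state, and the function name is hypothetical.

```python
BUNDLED_PROFILES = frozenset(
    {"frontier", "fusion", "frontier-large", "fast-local", "coding-max"}
)
MODEL_PREFIX = "sage-router/"


def resolve_profile(payload: dict, known=BUNDLED_PROFILES):
    """Return the requested profile name, or None when no profile is requested."""
    # Explicit profile fields are checked first (assumed precedence).
    for key in ("routerProfile", "profile"):
        name = payload.get(key)
        if isinstance(name, str) and name in known:
            return name
    model = payload.get("model")
    if not isinstance(model, str):
        return None
    # Accept both "sage-router/frontier" and the bare "frontier" form.
    if model.startswith(MODEL_PREFIX):
        model = model[len(MODEL_PREFIX):]
    return model if model in known else None
```

A model value such as "openai/gpt-4.1" resolves to no profile and would be handled as an ordinary forced or routed model.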