Route smarter. Pay less.
One OpenAI-compatible endpoint. 345+ models scored on real benchmarks, routed by topic and complexity. Pay per call in USDC on Tempo or Base — no signup, no subscription, no API key. Add council mode when one model is not enough.
Save up to 90% on AI costs.
Most prompts don't need a frontier model. ArcRouter classifies every request by topic and complexity, then routes to the cheapest model that still passes the quality bar. Same OpenAI-compatible endpoint. Pay per call in USDC on Tempo or Base.
Up to
Cost savings on routes that would otherwise hit a frontier model (Sonnet, Opus, GPT-5). The router picks the cheapest model that still passes the benchmark for that topic and complexity tier — no quality drop.
Models scored
Prices refresh daily from OpenRouter. Quality scores blend LiveBench, LiveCodeBench, GPQA Diamond, and HuggingFace evals. No synthetic-only ranks at the top.
Topic categories
Granular routing: code/frontend, math/calculus, science/physics. Plus 4 complexity tiers and agentic detection — better than picking one model for everything.
Compatible with OpenAI SDKs, Claude Code, Cursor, Cline, and all OpenAI-compatible tools.
Drop in. Pick your path.
Same OpenAI-compatible endpoint underneath. Pick the path that matches the surface you already work in — pure HTTP, our typed SDK, or an MCP server for AI coding tools.
OpenAI SDK drop-in
Use the openai package you already have. Change the base URL. Done.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.arcrouter.com/v1",
  apiKey: "mpp-handled-via-mppx",
});

const res = await client.chat.completions.create({
  messages: [{ role: "user", content: "..." }],
  model: "auto",
});

Same shape OpenAI returns, plus a routing object with the model picked, cost, and savings.
TypeScript SDK
@arcrouter/sdk wraps openai. Auto-handles x402 wallet signing + retry.
import { ArcRouter } from "@arcrouter/sdk";

const arc = new ArcRouter({
  wallet: { privateKey: process.env.X402_KEY },
  budget: "auto",
});

const res = await arc.chat("...");
const verified = await arc.council("...");

The same SDK supports streaming, council, workflow budgets, and model aliases.
MCP for Claude Code / Cursor
One command. Adds three tools to any MCP client: chat, models, health.
claude mcp add arcrouter \
  --transport http \
  https://api.arcrouter.com/mcp

# Then ask Claude Code:
#   "route this prompt to the best model"
# It calls arcrouter_chat automatically.
Works in Claude Code, Cursor, Cline, Windsurf, and any HTTP MCP host.
All three paths hit the same OpenAI-compatible endpoint. Switch between them as your stack changes — no lock-in to any single integration.
See full guide
How the router decides.
Classify the prompt
Lexical classifier scores topic (24 categories incl. subcategories like code/frontend, math/calculus, science/physics) and complexity tier (SIMPLE / MEDIUM / COMPLEX / REASONING) in under 1ms. Detects whether the request is agentic — tool calls, multi-step, or chained.
Shortlist candidates from D1
Query the model database for candidates matching topic + complexity + budget. Ranked by a complexity-weighted blend of quality (real benchmarks) and value (cost-adjusted). Output-price caps per tier protect margin.
Semantic rerank
Workers AI embeds the prompt (bge-base-en-v1.5, free) and cosine-ranks the shortlist. Final score blends semantic match (0.55) + value (0.35) + reliability (0.10). Circuit breaker skips models that have failed recently.
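The blend above can be sketched as a small scoring function. This is a minimal illustration, not ArcRouter's actual code: the `Candidate` shape, the idea that each model carries its own embedding, and the assumption that value and reliability are already normalized to 0..1 are all hypothetical. Only the three weights come from the text.

```typescript
// Hypothetical candidate shape; field names are illustrative assumptions.
interface Candidate {
  model: string;
  embedding: number[]; // assumed: a vector describing what the model is good at
  value: number;       // assumed normalized to 0..1
  reliability: number; // assumed normalized to 0..1
}

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Weights as stated in the text: semantic 0.55, value 0.35, reliability 0.10.
function finalScore(promptEmbedding: number[], c: Candidate): number {
  return 0.55 * cosine(promptEmbedding, c.embedding) + 0.35 * c.value + 0.10 * c.reliability;
}

// Returns a new array sorted best-first; the input shortlist is left untouched.
function rerank(promptEmbedding: number[], shortlist: Candidate[]): Candidate[] {
  return shortlist
    .slice()
    .sort((x, y) => finalScore(promptEmbedding, y) - finalScore(promptEmbedding, x));
}
```

Because the semantic weight dominates, a model that matches the prompt closely can outrank a cheaper, more reliable model that matches poorly.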
Direct or OpenRouter
If the chosen model is OpenAI / Anthropic / Google / DeepSeek / xAI — direct provider call (no markup, lower latency). Long tail goes via OpenRouter. Top-3 failover on error. SSE streaming with full routing metadata in response headers and body.
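Top-3 failover can be pictured as a loop over the ranked candidates. The sketch below assumes a hypothetical `call` function that either returns the model's output or throws; how ArcRouter actually dispatches to direct providers or OpenRouter is not shown in the source.

```typescript
// Hypothetical provider call: resolves with the output or rejects on error.
type CallModel = (model: string) => Promise<string>;

interface FailoverResult {
  model: string;  // the candidate that finally answered
  output: string;
}

// Try the best-ranked candidates in order, stopping at the first success.
async function callWithFailover(
  ranked: string[],
  call: CallModel,
  maxAttempts: number = 3, // "top-3" from the text
): Promise<FailoverResult> {
  const errors: string[] = [];
  for (const model of ranked.slice(0, maxAttempts)) {
    try {
      const output = await call(model);
      return { model, output };
    } catch (err) {
      // Record the failure and move on to the next candidate.
      errors.push(`${model}: ${String(err)}`);
    }
  }
  throw new Error(`all candidates failed: ${errors.join("; ")}`);
}
```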
Full pipeline detail — including how complexity weights blend with value scores, embedding model choice, and direct-provider fallbacks.
Read the architecture
Pay-per-call. Wallet-native.
Two production-grade payment rails on every endpoint. No signup, no API key, no Stripe form. Your agent points an OpenAI SDK at our URL and pays in USDC per request. We advertise the price in the 402, verify the signed proof, then return the LLM response.
MPP
Tempo · chainId 4217
Machine Payments Protocol by Tempo + Stripe. Server advertises price in WWW-Authenticate: Payment, client signs with mppx, retries. Sub-cent payments with fee sponsorship support.
- Spec: RFC 7235 (HTTP Authentication)
- Asset: USDC.e
- Finality: ~500ms
Authorization: Payment <credential>
x402
Base · chainId 8453
Coinbase x402 spec. Standard 402 → EIP-712 typed signature → retry. Compatible with @x402/core, viem, MetaMask, Coinbase Wallet, Phantom (EVM mode).
- Spec: HTTP 402 Payment Required
- Asset: USDC
- Finality: ~2s
X-PAYMENT: <credential>
Dual-rail by design. The 402 challenge advertises both — your client picks whichever wallet it holds. Prices stay in parity across rails so there is no arbitrage to manage.
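Both rails follow the same challenge, sign, retry shape. The sketch below shows that loop with the transport and the signer injected as hypothetical callbacks (`Send`, `Sign`); in practice mppx or an x402 client library would do the signing, and the challenge format they parse is not reproduced here. Only the two header forms come from the text above.

```typescript
// Minimal response shape used by this sketch (not a real fetch Response).
interface HttpResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

type Rail = "mpp" | "x402";

// Hypothetical transport: sends the request with the given extra headers.
type Send = (extraHeaders: Record<string, string>) => Promise<HttpResponse>;

// Hypothetical signer: turns the 402 challenge into a credential string.
type Sign = (rail: Rail, challenge: HttpResponse) => Promise<string>;

// Header forms as shown above: Authorization: Payment ... for MPP, X-PAYMENT for x402.
function paymentHeader(rail: Rail, credential: string): Record<string, string> {
  return rail === "mpp"
    ? { Authorization: `Payment ${credential}` }
    : { "X-PAYMENT": credential };
}

// Send once; on 402, sign the challenge and retry once with the payment header.
async function payAndRetry(send: Send, sign: Sign, rail: Rail): Promise<HttpResponse> {
  const first = await send({});
  if (first.status !== 402) return first;
  const credential = await sign(rail, first);
  return send(paymentHeader(rail, credential));
}
```

The retry happens at most once, so a rejected credential surfaces as a second 402 to the caller instead of looping.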
Read the spec
Routing decisions backed by real benchmarks.
Every model in our shortlist has a quality score. Scores come from four independent benchmark sources, refreshed daily by a Cloudflare cron. No synthetic-only ranks at the top, no vendor-aligned bias.
Benchmark scores tracked
Across 345+ models, refreshed daily. Quality + value blended by complexity tier when shortlisting candidates.
LiveBench
General reasoning, math, coding — refreshed weekly to prevent contamination.
LiveCodeBench
Code generation across competitive programming problems, decontaminated.
GPQA Diamond
Graduate-level physics, chemistry, biology questions.
HuggingFace Open LLM
Community-run leaderboard across MMLU, HellaSwag, TruthfulQA, ARC, Winogrande.
Composite scores recalculate after every benchmark refresh. The shortlist is filtered by budget and complexity, then an embedding reranker scores semantic match against your prompt.
See the rankings
Built for the loop.
Routing a single chat message is easy. Routing 50 calls inside one agent run — with budgets, fallbacks, step semantics, and tool guarantees — is what we built for.
X-Agent-Step
Tell the router which step of a workflow this request is. simple-action → SIMPLE tier, code-generation → COMPLEX, reasoning → REASONING, verification → council. Override the prompt classifier when your agent already knows the answer.
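As a rough sketch, an agent could attach the step hint like this. The step names and their targets are taken from the text; the `buildStepRequest` helper is hypothetical, and sending the step value verbatim in an `X-Agent-Step` HTTP header is an assumption based on the name.

```typescript
type AgentStep = "simple-action" | "code-generation" | "reasoning" | "verification";

// What each step maps to, as described above.
const STEP_TARGET: Record<AgentStep, string> = {
  "simple-action": "SIMPLE",
  "code-generation": "COMPLEX",
  reasoning: "REASONING",
  verification: "council",
};

// Hypothetical helper: builds headers and body for a chat completion request.
function buildStepRequest(step: AgentStep, prompt: string) {
  return {
    headers: {
      "Content-Type": "application/json",
      "X-Agent-Step": step, // assumed to be sent as an HTTP request header
    },
    body: JSON.stringify({
      model: "auto",
      messages: [{ role: "user", content: prompt }],
    }),
  };
}
```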
Workflow budgets
Set total_budget_usd for a session. The router auto-downgrades models at 60%, 80%, 95% spent. Returns 402 when exhausted. Per-workflow telemetry at /v1/workflow/{id}/usage — spend, models used, tier distribution.
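One way to picture the downgrade schedule is as a tier ladder. The thresholds (60%, 80%, 95%) and the hard stop come from the text; the rule that each crossed threshold drops exactly one tier is an assumption made for illustration, as are the names below.

```typescript
type Tier = "SIMPLE" | "MEDIUM" | "COMPLEX" | "REASONING";

// Most expensive first; each downgrade moves one step to the right.
const LADDER: Tier[] = ["REASONING", "COMPLEX", "MEDIUM", "SIMPLE"];
const THRESHOLDS = [0.6, 0.8, 0.95]; // fractions of total_budget_usd spent

// Returns the tier to route at, or "EXHAUSTED" when the caller should get a 402.
function effectiveTier(
  requested: Tier,
  spentUsd: number,
  totalBudgetUsd: number,
): Tier | "EXHAUSTED" {
  if (spentUsd >= totalBudgetUsd) return "EXHAUSTED";
  const fraction = spentUsd / totalBudgetUsd;
  // Assumption: one tier down per threshold crossed, clamped at SIMPLE.
  const steps = THRESHOLDS.filter((t) => fraction >= t).length;
  const index = Math.min(LADDER.indexOf(requested) + steps, LADDER.length - 1);
  return LADDER[index];
}
```

For example, under these assumptions a REASONING request at 85% spent would be served at the MEDIUM tier.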
Session pinning
Pass session_id and the router remembers which model worked. Subsequent calls in the same session pin to that model (1h TTL). Stops a multi-step flow from jumping providers mid-conversation.
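Session pinning behaves like a small TTL cache keyed by session_id. The class below is a minimal in-memory sketch of that idea with the one-hour TTL from the text; the class name, its methods, and the injectable clock are hypothetical, and the real router presumably persists pins elsewhere.

```typescript
const ONE_HOUR_MS = 60 * 60 * 1000;

class SessionPins {
  private pins = new Map<string, { model: string; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  // The clock is injectable so expiry can be exercised without waiting.
  constructor(ttlMs: number = ONE_HOUR_MS, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Remember which model worked for this session.
  pin(sessionId: string, model: string): void {
    this.pins.set(sessionId, { model, expiresAt: this.now() + this.ttlMs });
  }

  // Returns the pinned model, or undefined if none exists or the pin expired.
  lookup(sessionId: string): string | undefined {
    const entry = this.pins.get(sessionId);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.pins.delete(sessionId);
      return undefined;
    }
    return entry.model;
  }
}
```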
Tool-call guarantees
Set tool_choice: "required" and the router enforces it. If the first model answers in natural language instead of calling the tool (a no-op), the router fails over to another tool-capable model. It also detects tools[] in the request and prefers tool-capable models automatically.
Used by agents running long, budgeted workflows. The router treats each step as its own routing decision while keeping spend bounded.
Read the spec
Multi-model verification on demand.
One model can hallucinate. Five rarely agree on the same wrong answer. Add mode: "council" to any request when accuracy matters more than latency. Pricing scales 5x to cover the parallel inference — honest, no hidden cost.
3–7 models answer in parallel
Diverse providers picked by topic and complexity. Avoids single-vendor bias and single-model failure modes.
Agreement scored
Embedding cosine similarity on paid tier, Jaccard word-overlap on free. Each answer gets an agreement score against the others.
Chairman synthesizes on disagreement
Confidence below 0.6? A separate Chairman model reads all answers and the critiques and synthesizes the final response.
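The free-tier agreement measure, Jaccard word overlap, is simple enough to sketch. The 0.6 threshold comes from the text; the tokenization rule and the choice to define confidence as the mean of the per-answer agreement scores are assumptions for illustration.

```typescript
// Lowercased alphanumeric word set; the tokenization rule is an assumption.
function wordSet(text: string): Set<string> {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// Jaccard overlap: |A ∩ B| / |A ∪ B|. Two empty answers count as identical.
function jaccard(a: string, b: string): number {
  const setA = wordSet(a);
  const setB = wordSet(b);
  let shared = 0;
  setA.forEach((w) => {
    if (setB.has(w)) shared++;
  });
  const union = setA.size + setB.size - shared;
  return union === 0 ? 1 : shared / union;
}

// Each answer's score is its mean similarity to every other answer.
function agreementScores(answers: string[]): number[] {
  if (answers.length < 2) return answers.map(() => 1);
  return answers.map((answer, i) => {
    let total = 0;
    answers.forEach((other, j) => {
      if (i !== j) total += jaccard(answer, other);
    });
    return total / (answers.length - 1);
  });
}

// Assumption: overall confidence is the mean agreement score.
function needsChairman(scores: number[], threshold: number = 0.6): boolean {
  const mean = scores.reduce((sum, s) => sum + s, 0) / scores.length;
  return mean < threshold;
}
```

On the paid tier the same structure would apply with embedding cosine similarity in place of `jaccard`.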
Recommended for legal, medical, financial, and audit-grade prompts. For everyday traffic, default smart routing already picks the best single model — no need to pay 5x.
One endpoint. Every major provider.
Direct connections to OpenAI, Anthropic, Google, DeepSeek, and xAI when keys are configured. OpenRouter fallback covers the long tail. Switch providers without changing code — the router resolves model aliases to whichever model currently benchmarks best for your topic.
Frames Websets
First production integration of MPP — the Tempo + Stripe payment protocol — against our endpoint. Agents pay per call in USDC.e on Tempo. Proves the dual-rail design end-to-end.
Pilot program
Building an agent that needs MPP or x402? Free credits, direct founder-to-founder integration support, public co-marketing on this site.
Email →
Five public repos
SDK, MCP server, classifier, and the awesome-arcrouter list — all MIT-licensed under the ArcRouterAI org on GitHub. Audit the wire format, run it yourself.
GitHub →
Built for the workloads agents actually run.
Autonomous agents that pay their own way
Drop ArcRouter into LangChain, AutoGen, Claude Agent SDK, or any OpenAI-compatible tool. The agent signs MPP or x402 micropayments per call — no API key, no subscription, no human in the loop.
Most reliable on: tool-calling, multi-step plans, long-horizon tasks.
Budgets that stop a runaway loop
Set a total budget for a workflow. The router tracks spend across every step and auto-downgrades to cheaper models at 60/80/95%. Returns 402 when exhausted — your agent gets a hard stop instead of a surprise bill.
Used by: research agents, code-gen pipelines, content workflows.
Council mode for high-stakes prompts
3–7 models answer in parallel. The router computes agreement and surfaces the consensus. When models disagree, a Chairman model synthesizes. Use for legal, medical, financial, or any prompt where one wrong answer costs more than five right ones.
Same OpenAI API. Add mode: 'council'. Pay 5x tier price.
Try every model without picking one
Routing decisions log to D1. Hit /v1/usage to see which models won for your traffic. Swap providers later without changing code — the router resolves model aliases (claude, gpt, gemini, deepseek) to whichever model currently benchmarks best.
Replace 5 provider SDKs with one base URL.
Route.
Optimize.
Save.
Manual model selection vs. automatic benchmark-based routing. Same quality, lower cost.
$ query: "Explain OAuth 2.0 authorization code flow"
{
  "response": "OAuth 2.0 authorization code flow is...",
  "routing": {
    "topic": "code/security",
    "selected_model": "deepseek/deepseek-chat",
    "quality_score": 82,
    "value_score": 94,
    "data_source": "database"
  },
  "cost": "$0.0008",
  "latency": "1.4s",
  "savings": "89% vs GPT-5"
}

Right model. Per call. Wallet pays.
Playground
Routes to the best model for your prompt — see topic, complexity, model selected, and cost
Simple, transparent
Try it. No signup.
- Smart routing, free models only
- 20 requests / hour per IP
- No API key required
- OpenAI SDK compatible
Agents & builders
- SIMPLE $0.001 / MEDIUM $0.002
- COMPLEX $0.005 / REASONING $0.012
- budget=premium → flat $0.015 (frontier models)
- Council verification = 5x tier price
- MPP (Tempo) + x402 (Base) USDC
- 1,000 req / hour per wallet
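The per-call arithmetic for this tier can be written out as a small lookup. The prices and the 5x council multiplier are the ones listed above; the function name is hypothetical, and applying the council multiplier to the flat premium price is an assumption the list does not confirm.

```typescript
type Tier = "SIMPLE" | "MEDIUM" | "COMPLEX" | "REASONING";

// Per-call prices in USD, as listed above.
const TIER_PRICE_USD: Record<Tier, number> = {
  SIMPLE: 0.001,
  MEDIUM: 0.002,
  COMPLEX: 0.005,
  REASONING: 0.012,
};

const PREMIUM_FLAT_USD = 0.015; // budget=premium
const COUNCIL_MULTIPLIER = 5;   // council = 5x tier price

interface PriceOptions {
  premium?: boolean;
  council?: boolean;
}

function pricePerCall(tier: Tier, options: PriceOptions = {}): number {
  const base = options.premium ? PREMIUM_FLAT_USD : TIER_PRICE_USD[tier];
  // Assumption: the multiplier also applies when the base is the premium flat rate.
  return options.council ? base * COUNCIL_MULTIPLIER : base;
}
```

Under these numbers a council call at the COMPLEX tier comes to about $0.025, and at the REASONING tier about $0.06.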
EU edge. ZDR. BYOK.
- EU edge processing (Cloudflare)
- Zero data retention by default
- Bring-your-own-keys (roadmap)
- Negotiated SLA + direct support
- Volume pricing