Agent-native LLM gateway · MPP + x402

Route

smarter.

Pay

less.

One OpenAI-compatible endpoint. 345+ models scored on real benchmarks, routed by topic and complexity. Pay per call in USDC on Tempo or Base — no signup, no subscription, no API key. Add council mode when one model is not enough.

Try Playground
Why ArcRouter

Save up to 90% on AI costs.

Most prompts don't need a frontier model. ArcRouter classifies every request by topic and complexity, then routes to the cheapest model that still passes the quality bar. Same OpenAI-compatible endpoint. Pay per call in USDC on Tempo or Base.

0%

Up to

Cost savings on routes that would otherwise hit a frontier model (Sonnet, Opus, GPT-5). The router picks the cheapest model that still passes the benchmark for that topic and complexity tier — no quality drop.

0+

Models scored

Prices refresh daily from OpenRouter. Quality scores blend LiveBench, LiveCodeBench, GPQA Diamond, and HuggingFace evals. No synthetic-only ranks at the top.

0

Topic categories

Granular routing: code/frontend, math/calculus, science/physics. Plus 4 complexity tiers and agentic detection — better than picking one model for everything.

# Drop-in OpenAI replacement. Works everywhere.
$https://api.arcrouter.com/v1/chat/completions

Compatible with OpenAI SDKs, Claude Code, Cursor, Cline, and all OpenAI-compatible tools.

Three ways to integrate

Drop in. Pick your path.

Same OpenAI-compatible endpoint underneath. Pick the path that matches the surface you already work in — pure HTTP, our typed SDK, or an MCP server for AI coding tools.

Path A

OpenAI SDK drop-in

Use the openai package you already have. Change the base URL. Done.

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.arcrouter.com/v1",
  apiKey: "mpp-handled-via-mppx",
});

const res = await client.chat.completions.create({
  messages: [{ role: "user", content: "..." }],
  model: "auto",
});

Same shape OpenAI returns, plus a routing object with model picked, cost, savings.

Path B

TypeScript SDK

@arcrouter/sdk wraps openai. Auto-handles x402 wallet signing + retry.

import { ArcRouter } from "@arcrouter/sdk";

const arc = new ArcRouter({
  wallet: { privateKey: process.env.X402_KEY },
  budget: "auto",
});

const res = await arc.chat("...");
const verified = await arc.council("...");

Same SDK supports streaming, council, workflow budgets, model aliases.

Path C

MCP for Claude Code / Cursor

One command. Adds three tools to any MCP client: chat, models, health.

claude mcp add arcrouter \
  --transport http \
  https://api.arcrouter.com/mcp

# Then ask Claude Code:
#   "route this prompt to the best model"
# It calls arcrouter_chat automatically.

Works in Claude Code, Cursor, Cline, Windsurf, and any HTTP MCP host.

All three paths hit the same OpenAI-compatible endpoint. Switch between them as your stack changes — no lock-in to any single integration.

See full guide
How it works

How the router decides.

01

Classify the prompt

Lexical classifier scores topic (24 categories incl. subcategories like code/frontend, math/calculus, science/physics) and complexity tier (SIMPLE / MEDIUM / COMPLEX / REASONING) in under 1ms. Detects whether the request is agentic — tool calls, multi-step, or chained.

02

Shortlist candidates from D1

Query the model database for candidates matching topic + complexity + budget. Ranked by a complexity-weighted blend of quality (real benchmarks) and value (cost-adjusted). Output-price caps per tier protect margin.

03

Semantic rerank

Workers AI embeds the prompt (bge-base-en-v1.5, free) and cosine-ranks the shortlist. Final score blends semantic match (0.55) + value (0.35) + reliability (0.10). Circuit breaker skips models that have failed recently.

04

Direct or OpenRouter

If the chosen model is OpenAI / Anthropic / Google / DeepSeek / xAI — direct provider call (no markup, lower latency). Long tail goes via OpenRouter. Top-3 failover on error. SSE streaming with full routing metadata in response headers and body.

Full pipeline detail — including how complexity weights blend with value scores, embedding model choice, and direct-provider fallbacks.

Read the architecture
On-Chain Payments

Pay-per-call. Wallet-native.

Two production-grade payment rails on every endpoint. No signup, no API key, no Stripe form. Your agent points an OpenAI SDK at our URL and pays in USDC per request. We advertise the price in the 402, verify the signed proof, then return the LLM response.

MPP

Tempo · chainId 4217

Machine Payments Protocol by Tempo + Stripe. Server advertises price in WWW-Authenticate: Payment, client signs with mppx, retries. Sub-cent payments with fee sponsorship support.

Spec
RFC 7235 (HTTP Authorization)
Asset
USDC.e
Finality
~500ms
Authorization: Payment <credential>

x402

Base · chainId 8453

Coinbase x402 spec. Standard 402 → EIP-712 typed signature → retry. Compatible with @x402/core, viem, MetaMask, Coinbase Wallet, Phantom (EVM mode).

Spec
HTTP 402 Payment Required
Asset
USDC
Finality
~2s
X-PAYMENT <credential>

Dual-rail by design. The 402 challenge advertises both — your client picks whichever wallet it holds. Prices stay in parity across rails so there is no arbitrage to manage.

Read the spec
Rankings

Routing decisions backed by real benchmarks.

Every model in our shortlist has a quality score. Scores come from four independent benchmark sources, refreshed daily by a Cloudflare cron. No synthetic-only ranks at the top, no vendor-aligned bias.

1,414

Benchmark scores tracked

Across 345+ models, refreshed daily. Quality + value blended by complexity tier when shortlisting candidates.

Source

LiveBench

General reasoning, math, coding — refreshed weekly to prevent contamination.

Source

LiveCodeBench

Code generation across competitive programming problems, decontaminated.

Source

GPQA Diamond

Graduate-level physics, chemistry, biology questions.

Source

HuggingFace Open LLM

Community-run leaderboard across MMLU, HellaSwag, TruthfulQA, ARC, Winogrande.

Composite scores recalculate after every benchmark refresh. The shortlist is filtered by budget and complexity, then an embedding reranker scores semantic match against your prompt.

See the rankings
Agent Workflows

Built for the loop.

Routing a single chat message is easy. Routing 50 calls inside one agent run — with budgets, fallbacks, step semantics, and tool guarantees — is what we built for.

Step routing

X-Agent-Step

Tell the router which step of a workflow this request is. simple-action → SIMPLE tier, code-generation → COMPLEX, reasoning → REASONING, verification → council. Override the prompt classifier when your agent already knows the answer.

Spend caps

Workflow budgets

Set total_budget_usd for a session. The router auto-downgrades models at 60%, 80%, 95% spent. Returns 402 when exhausted. Per-workflow telemetry at /v1/workflow/{id}/usage — spend, models used, tier distribution.

Stickiness

Session pinning

Pass session_id and the router remembers which model worked. Subsequent calls in the same session pin to that model (1h TTL). Stops a multi-step flow from jumping providers mid-conversation.

Reliability

Tool-call guarantees

Set tool_choice: "required" and the router enforces it. Fails over to another tool-capable model if the first emits no_op natural-language instead of calling the tool. Detects tools[] and prefers tool-capable models automatically.

Used by agents running long, budgeted workflows. The router treats each step as its own routing decision while keeping spend bounded.

Read the spec
Council Verification

Multi-model verification on demand.

One model can hallucinate. Five rarely agree on the same wrong answer. Add mode: "council" to any request when accuracy matters more than latency. Pricing scales 5x to cover the parallel inference — honest, no hidden cost.

01

3–7 models answer in parallel

Diverse providers picked by topic and complexity. Avoids single-vendor bias and single-model failure modes.

02

Agreement scored

Embedding cosine similarity on paid tier, Jaccard word-overlap on free. Each answer gets an agreement score against the others.

03

Chairman synthesizes on disagreement

Confidence below 0.6? A separate Chairman model reads all answers and the critiques and synthesizes the final response.

Recommended for legal, medical, financial, and audit-grade prompts. For everyday traffic, default smart routing already picks the best single model — no need to pay 5x.

Networks · Models · Rails

One endpoint. Every major provider.

Direct connections to OpenAI, Anthropic, Google, DeepSeek, and xAI when keys are configured. OpenRouter fallback covers the long tail. Switch providers without changing code — the router resolves model aliases to whichever model currently benchmarks best for your topic.

OpenAI
Anthropic
Google
DeepSeek
xAI
Meta
Mistral
Qwen
Moonshot
Z.AI
MiniMax
OpenRouter
Running onTempoBaseUSDCCloudflare WorkersMCP
First MPP integration

Frames Websets

First production integration of MPP — the Tempo + Stripe payment protocol — against our endpoint. Agents pay per call in USDC.e on Tempo. Proves the dual-rail design end-to-end.

Want to be next?

Pilot program

Building an agent that needs MPP or x402? Free credits, direct founder-to-founder integration support, public co-marketing on this site.

Email →
Open source

Five public repos

SDK, MCP server, classifier, and the awesome-arcrouter list — all MIT-licensed under the ArcRouterAI org on GitHub. Audit the wire format, run it yourself.

GitHub →
Use Cases

Built for the workloads agents actually run.

Agent automation

Autonomous agents that pay their own way

Drop ArcRouter into LangChain, AutoGen, Claude Agent SDK, or any OpenAI-compatible tool. The agent signs MPP or x402 micropayments per call — no API key, no subscription, no human in the loop.

Most reliable on: tool-calling, multi-step plans, long-horizon tasks.

Multi-step workflows

Budgets that stop a runaway loop

Set a total budget for a workflow. The router tracks spend across every step and auto-downgrades to cheaper models at 60/80/95%. Returns 402 when exhausted — your agent gets a hard stop instead of a surprise bill.

Used by: research agents, code-gen pipelines, content workflows.

Answers that must be right

Council mode for high-stakes prompts

3–7 models answer in parallel. The router computes agreement and surfaces the consensus. When models disagree, a Chairman model synthesizes. Use for legal, medical, financial, or any prompt where one wrong answer costs more than five right ones.

Same OpenAI API. Add mode: 'council'. Pay 5x tier price.

Cost-controlled R&D

Try every model without picking one

Routing decisions log to D1. Hit /v1/usage to see which models won for your traffic. Swap providers later without changing code — the router resolves model aliases (claude, gpt, gemini, deepseek) to whichever model currently benchmarks best.

Replace 5 provider SDKs with one base URL.

Compare

Route.
Optimize.
Save.

Manual model selection vs. automatic benchmark-based routing. Same quality, lower cost.

Topic detectionValue scoreCost savings
›_comparison.json
$ query: "Explain OAuth 2.0 authorization code flow"

{
  "response": "OAuth 2.0 authorization code flow is...",
  "routing": {
    "topic": "code/security",
    "selected_model": "deepseek/deepseek-chat",
    "quality_score": 82,
    "value_score": 94,
    "data_source": "database"
  },
  "cost": "$0.0008",
  "latency": "1.4s",
  "savings": "89% vs GPT-5"
}
Value Score94
Savings89%
Routing DetailsBenchmark verified
Topic: code/security
Quality score: 82/100
Selected: DeepSeek V3.2
Illustrative example • real responses will vary

Right model. Per call. Wallet pays.

Try It Live

Playground

Routes to the best model for your prompt — see topic, complexity, model selected, and cost

Pricing

Simple, transparent

Free

Try it. No signup.

$0/ forever
  • Smart routing, free models only
  • 20 requests / hour per IP
  • No API key required
  • OpenAI SDK compatible
Try in Playground
Pay-per-call

Agents & builders

$0.001/ request and up
  • SIMPLE $0.001 / MEDIUM $0.002
  • COMPLEX $0.005 / REASONING $0.012
  • budget=premium → flat $0.015 (frontier models)
  • Council verification = 5x tier price
  • MPP (Tempo) + x402 (Base) USDC
  • 1,000 req / hour per wallet
Enterprise

EU edge. ZDR. BYOK.

Custom
  • EU edge processing (Cloudflare)
  • Zero data retention by default
  • Bring-your-own-keys (roadmap)
  • Negotiated SLA + direct support
  • Volume pricing
Talk to founder