Model Rankings

Live benchmark scores for language models, aggregated daily from the HuggingFace Open LLM Leaderboard, LiveBench, and LiveCodeBench.

All Models

Quality scores are composites of multiple benchmarks per domain, on a 0-100 scale.

Methodology

Data Sources

HuggingFace Open LLM Leaderboard — General, reasoning, and science benchmarks across 130+ models
LiveBench — Monthly refreshed benchmarks for coding, math, reasoning, writing, and data analysis
LiveCodeBench — Contamination-free code generation benchmark from LeetCode, Codeforces, and AtCoder
OpenRouter — Real-time pricing data for 345+ models

Scoring

Quality score — Weighted composite of benchmark results per domain (0-100). Higher is better.
Value score — Quality adjusted for price. Factors in cost-effectiveness so cheaper models with good scores rank higher.
Domain scores — Per-domain quality scores. A model may excel at code but be average at writing.
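
The two scores above can be sketched in a few lines. This is only an illustration under stated assumptions: the page does not publish the benchmark weights or the exact price adjustment, so the function names, the weights, and the `quality / (1 + price)` formula below are hypothetical stand-ins, not ArcRouter's actual method.

```python
def quality_score(benchmarks: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of benchmark results that are already on a 0-100 scale.

    Benchmarks missing for a model are skipped and the remaining weights
    are renormalized (an assumption; the source does not say how gaps are handled).
    """
    used = {name: w for name, w in weights.items() if name in benchmarks}
    total = sum(used.values())
    if total == 0:
        raise ValueError("no weighted benchmarks available for this model")
    return sum(benchmarks[name] * w for name, w in used.items()) / total


def value_score(quality: float, price_per_mtok: float) -> float:
    """Hypothetical price adjustment: quality / (1 + price).

    A free model keeps its quality score unchanged, and a higher price
    lowers the value score, so cheaper models with good scores rank higher.
    """
    if price_per_mtok < 0:
        raise ValueError("price must be non-negative")
    return quality / (1.0 + price_per_mtok)


if __name__ == "__main__":
    # Made-up benchmark results and weights for one model in the code domain.
    code_results = {"livecodebench": 60.0, "livebench_coding": 80.0}
    code_weights = {"livecodebench": 0.5, "livebench_coding": 0.5}
    q = quality_score(code_results, code_weights)
    print(q, value_score(q, price_per_mtok=1.0))
```

Running the same functions once per domain would give the per-domain scores described in the list.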

How ArcRouter Uses These Scores

When you send a query, the semantic routing engine detects its topic, queries the benchmark database for the best-scoring models in that domain, and applies embedding-based reranking to select a model. A math question is routed to the top-scoring math model and a code question to the top-scoring code model, automatically and at the lowest available cost.
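
The three routing steps can be sketched as follows. Everything here is a toy stand-in: keyword matching replaces semantic topic detection, an in-memory dict replaces the benchmark database, and the model names, scores, and 2-D vectors are invented for illustration. The real engine's internals are not described on this page.

```python
import math

# Hypothetical per-domain quality scores: {domain: {model_name: quality}}.
SCORES = {
    "math": {"model-a": 82.0, "model-b": 79.0},
    "code": {"model-a": 70.0, "model-c": 88.0},
}

# Toy keyword sets standing in for semantic topic detection.
TOPIC_KEYWORDS = {
    "math": {"integral", "prove", "equation"},
    "code": {"python", "function", "bug"},
}


def detect_topic(query: str) -> str:
    """Pick the topic whose keyword set overlaps the query the most."""
    words = set(query.lower().split())
    return max(TOPIC_KEYWORDS, key=lambda t: len(words & TOPIC_KEYWORDS[t]))


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def route(query: str, query_vec: list[float],
          model_vecs: dict[str, list[float]], top_k: int = 2) -> str:
    """Detect the topic, take the top_k models by score, rerank by similarity."""
    domain_scores = SCORES[detect_topic(query)]
    candidates = sorted(domain_scores, key=domain_scores.get, reverse=True)[:top_k]
    return max(candidates, key=lambda m: cosine(query_vec, model_vecs[m]))


if __name__ == "__main__":
    vecs = {"model-a": [1.0, 0.0], "model-b": [0.6, 0.8], "model-c": [0.0, 1.0]}
    print(route("fix this python bug", [0.0, 1.0], vecs))
```

In this sketch the benchmark score only narrows the candidate list, and the embedding similarity makes the final choice. A production router would presumably also weigh price at this stage.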

Data freshness: Scores are recomputed daily at 06:00 UTC by automated scrapers. Pricing is updated from OpenRouter in real time. Benchmark data may lag 1-7 days behind the source leaderboards.
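
To reason about how fresh a score is, a client can compute when the last daily recompute should have run. The helper below is a minimal sketch that assumes the 06:00 UTC schedule stated above. The function name is hypothetical, and it says nothing about whether a given run actually succeeded.

```python
from datetime import datetime, time, timedelta, timezone


def last_recompute(now: datetime) -> datetime:
    """Return the most recent 06:00 UTC at or before `now`.

    `now` must be timezone-aware; it is converted to UTC first.
    """
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    now_utc = now.astimezone(timezone.utc)
    todays_run = datetime.combine(now_utc.date(), time(6, 0), tzinfo=timezone.utc)
    # Before 06:00 UTC, the latest run is yesterday's.
    return todays_run if now_utc >= todays_run else todays_run - timedelta(days=1)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print("scores last recomputed at", last_recompute(now).isoformat())
```

Because benchmark data may lag the source leaderboards by up to 7 days, the time returned here is a bound on when scores were recomputed, not on how recent the underlying benchmark results are.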