Model Rankings
Live benchmark scores for language models, aggregated daily from the HuggingFace Open LLM Leaderboard, LiveBench, and LiveCodeBench.
All Models
Quality scores are composites from multiple benchmarks per domain (0-100 scale).
Methodology
Data Sources
- HuggingFace Open LLM Leaderboard — General, reasoning, and science benchmarks across 130+ models
- LiveBench — Monthly refreshed benchmarks for coding, math, reasoning, writing, and data analysis
- LiveCodeBench — Contamination-free code generation benchmark from LeetCode, Codeforces, and AtCoder
- OpenRouter — Real-time pricing data for 345+ models
Scoring
- Quality score — Weighted composite of benchmark results per domain (0-100). Higher is better.
- Value score — Quality adjusted for price. Factors in cost-effectiveness so cheaper models with good scores rank higher.
- Domain scores — Per-domain quality scores. A model may excel at code but be average at writing.
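The page does not publish the benchmark weights or the exact price adjustment, so the following is only a minimal sketch of how a weighted 0-100 composite and a price-adjusted value score could be computed. The benchmark names, the weights, the USD-per-million-tokens price unit, and the log-scaled price divisor in `value_score` are all hypothetical. Running `quality_score` once per domain would give per-domain scores like those described above.

```python
import math


def quality_score(benchmarks: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of 0-100 benchmark results for one domain.

    Benchmarks missing for a model are skipped and the remaining weights are
    renormalized, so the result stays on the 0-100 scale.
    """
    used = {name: w for name, w in weights.items() if name in benchmarks}
    total = sum(used.values())
    if total == 0:
        raise ValueError("model has none of the weighted benchmarks")
    return sum(benchmarks[name] * w for name, w in used.items()) / total


def value_score(quality: float, price_per_mtok: float) -> float:
    """Hypothetical price adjustment: divide quality by a log-scaled cost factor.

    A free model keeps its full quality score; each 10x increase in price
    adds roughly 1 to the divisor, so cost matters without dominating.
    """
    if price_per_mtok < 0:
        raise ValueError("price must be non-negative")
    return quality / (1.0 + math.log10(1.0 + price_per_mtok))


if __name__ == "__main__":
    # Hypothetical benchmark names and weights for the "math" domain.
    weights = {"livebench_math": 0.6, "open_llm_math": 0.4}
    models = {
        # name: (benchmark results, hypothetical USD per million tokens)
        "model-a": ({"livebench_math": 80.0, "open_llm_math": 70.0}, 15.0),
        "model-b": ({"livebench_math": 75.0, "open_llm_math": 72.0}, 0.5),
    }
    for name, (results, price) in models.items():
        q = quality_score(results, weights)
        print(f"{name}: quality={q:.1f} value={value_score(q, price):.1f}")
```

In this example, model-b scores slightly lower on quality than model-a but ranks well ahead of it on value because it is much cheaper. That is the behavior the value score is meant to capture. The log scale is one design choice among many: a linear price penalty would let cost swamp small quality differences.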
How ArcRouter Uses These Scores
When you send a query, our semantic routing engine detects its topic, queries the benchmark database for the best-scoring models in that domain, and uses embedding-based reranking to pick the final model from that shortlist. As a result, a math question goes to the best math model and a code question goes to the best code model, automatically and at the lowest cost.
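The page does not say which topic classifier or embedding model ArcRouter uses, or how benchmark scores and similarity are combined. The following is therefore only a minimal sketch of the detect, shortlist, rerank flow under several stand-ins. Keyword matching replaces semantic topic detection. A toy bag-of-words vector replaces a real embedding model. An in-memory dict replaces the benchmark database. The model names, profiles, `top_k`, and the blend weight `alpha` are all hypothetical, and cost is left out for brevity.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for the benchmark database: domain -> {model: quality 0-100}.
DOMAIN_SCORES = {
    "math": {"model-a": 82.0, "model-b": 78.0, "model-c": 55.0},
    "code": {"model-a": 70.0, "model-b": 88.0, "model-c": 84.0},
}

# Hypothetical keyword lists standing in for a semantic topic classifier.
DOMAIN_KEYWORDS = {
    "math": {"integral", "equation", "prove", "derivative", "solve"},
    "code": {"python", "function", "bug", "compile", "refactor"},
}

# Hypothetical text profiles that the reranker compares queries against.
MODEL_PROFILES = {
    "model-a": "strong at proofs equation solving and derivative calculus",
    "model-b": "strong at python function refactor and debugging bug reports",
    "model-c": "fast python function completion",
}


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect_domain(query: str) -> str:
    """Pick the domain with the most keyword hits; ties go to the first domain listed."""
    tokens = set(tokenize(query))
    return max(DOMAIN_KEYWORDS, key=lambda d: len(tokens & DOMAIN_KEYWORDS[d]))


def route(query: str, top_k: int = 2, alpha: float = 0.5) -> tuple[str, str]:
    """Return (domain, model): shortlist by domain score, then rerank by similarity."""
    domain = detect_domain(query)
    scores = DOMAIN_SCORES[domain]
    shortlist = sorted(scores, key=scores.get, reverse=True)[:top_k]
    query_vec = embed(query)

    def blended(name: str) -> float:
        # alpha weights the benchmark score (rescaled to 0-1) against similarity.
        similarity = cosine(query_vec, embed(MODEL_PROFILES[name]))
        return alpha * scores[name] / 100 + (1 - alpha) * similarity

    return domain, max(shortlist, key=blended)


if __name__ == "__main__":
    print(route("Solve this equation for x"))
    print(route("Fix the bug in my python function"))
```

With these toy inputs, the first query routes to `("math", "model-a")` and the second to `("code", "model-b")`. Shortlisting by benchmark score before reranking keeps the comparatively expensive similarity step limited to a few strong candidates.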
Data freshness: Scores are recomputed daily at 06:00 UTC via automated scrapers. Pricing is updated from OpenRouter in real time. Benchmark data may lag 1-7 days behind the source leaderboards.