Model Rankings

Live benchmark scores for language models. Data is aggregated daily from the HuggingFace Open LLM Leaderboard, LiveBench, and LiveCodeBench.

All Models

Quality scores are composites of multiple benchmarks per domain, on a 0-100 scale.

Methodology

Data Sources

  • HuggingFace Open LLM Leaderboard — General, reasoning, and science benchmarks across 130+ models
  • LiveBench — Monthly refreshed benchmarks for coding, math, reasoning, writing, and data analysis
  • LiveCodeBench — Contamination-free code generation benchmark from LeetCode, Codeforces, and AtCoder
  • OpenRouter — Real-time pricing data for 345+ models

Scoring

  • Quality score — Weighted composite of benchmark results per domain (0-100). Higher is better.
  • Value score — Quality adjusted for price. Factors in cost-effectiveness so cheaper models with good scores rank higher.
  • Domain scores — Per-domain quality scores. A model may excel at code but be average at writing.
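
As a rough illustration of the scoring ideas above, the sketch below computes a weighted composite for one domain and a price-adjusted value score. The benchmark names, the weights, and the value formula (quality divided by a log-scaled price term) are assumptions made for this example; this page does not state the actual formulas ArcRouter uses.

```python
import math


def quality_score(benchmarks: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of benchmark results (each on a 0-100 scale).

    Only benchmarks that appear in both dicts contribute, so a model
    missing one benchmark is still scored on the rest.
    """
    used = {name: w for name, w in weights.items() if name in benchmarks}
    total_weight = sum(used.values())
    if total_weight == 0:
        raise ValueError("no weighted benchmarks available for this model")
    return sum(benchmarks[name] * w for name, w in used.items()) / total_weight


def value_score(quality: float, price_per_mtok: float) -> float:
    """Hypothetical price adjustment: quality / (1 + ln(1 + price)).

    A free model (price 0) keeps its quality score unchanged, and the
    score shrinks slowly as the price grows.
    """
    if price_per_mtok < 0:
        raise ValueError("price must be non-negative")
    return quality / (1.0 + math.log1p(price_per_mtok))


# Illustrative weights for a "code" domain (assumed, not published values).
code_weights = {"livecodebench": 0.75, "livebench_coding": 0.25}
model_results = {"livecodebench": 80.0, "livebench_coding": 70.0}

code_quality = quality_score(model_results, code_weights)
print(code_quality, value_score(code_quality, price_per_mtok=2.0))
```

Under this assumed formula, two models with the same quality score are ordered by price, which matches the stated intent that cheaper models with good scores rank higher.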

How ArcRouter Uses These Scores

When you send a query, our semantic routing engine detects the topic, queries the benchmark database for the best-scoring models in that domain, and uses embedding-based reranking to select the optimal model. This means your math question goes to the best math model and your code question goes to the best code model, automatically and at the lowest cost.
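
The three routing steps described above (detect the topic, fetch the top-scoring models for that domain, rerank) could be sketched as follows. Everything here is hypothetical: the `Model` fields, the keyword-based `detect_domain` stand-in for semantic topic detection, the toy embeddings, and the price tie-break are assumptions for illustration, not ArcRouter's implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    domain_scores: dict[str, float]  # per-domain quality, 0-100
    price: float                     # assumed unit: USD per million tokens
    embedding: list[float]           # hypothetical capability embedding


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def detect_domain(query: str) -> str:
    """Toy keyword matcher standing in for semantic topic detection."""
    q = query.lower()
    if any(word in q for word in ("function", "bug", "compile", "python")):
        return "code"
    if any(word in q for word in ("integral", "prove", "equation", "solve")):
        return "math"
    return "general"


def route(query: str, query_embedding: list[float],
          models: list[Model], top_k: int = 3) -> Model:
    """Pick a model: top-k by domain score, then rerank by similarity."""
    domain = detect_domain(query)
    candidates = sorted(
        models,
        key=lambda m: m.domain_scores.get(domain, 0.0),
        reverse=True,
    )[:top_k]
    # Rerank by embedding similarity; on a tie, prefer the cheaper model.
    return max(
        candidates,
        key=lambda m: (cosine(query_embedding, m.embedding), -m.price),
    )
```

In a real system the query embedding would come from an embedding model and the candidates from the benchmark database; here both are passed in directly so the sketch stays self-contained.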

Data freshness: Scores are recomputed daily at 06:00 UTC via automated scrapers. Pricing is updated from OpenRouter in real time. Benchmark data may lag 1 to 7 days behind the source leaderboards.