Independent analysis of AI

Understand the AI landscape to choose the best model and provider for your use case

Highlights

Intelligence
Artificial Analysis Intelligence Index; Higher is better
  1. GPT-5.5 (xhigh): 60
  2. Claude Opus 4.7 (max): 57
  3. Gemini 3.1 Pro Preview: 57
  4. GPT-5.4 (xhigh): 57
  5. Kimi K2.6: 54
  6. MiMo-V2.5-Pro: 54
  7. Muse Spark: 52
  8. DeepSeek V4 Pro (Max): 52
  9. Grok 4.20 0309 v2: 49
  10. NVIDIA Nemotron 3 Super: 36
  11. gpt-oss-120B (high): 33
Speed
Output Tokens per Second; Higher is better
  1. gpt-oss-120B (high): 209
  2. NVIDIA Nemotron 3 Super: 154
  3. Gemini 3.1 Pro Preview: 123
  4. Grok 4.20 0309 v2: 115
  5. Kimi K2.6: 112
  6. GPT-5.4 (xhigh): 79
  7. GPT-5.5 (xhigh): 74
  8. MiMo-V2.5-Pro: 60
  9. Claude Opus 4.7 (max): 49
  10. DeepSeek V4 Pro (Max): 36
Price
USD per 1M Tokens; Lower is better
  1. gpt-oss-120B (high): 0.3
  2. NVIDIA Nemotron 3 Super: 0.4
  3. MiMo-V2.5-Pro: 1.5
  4. Kimi K2.6: 1.7
  5. DeepSeek V4 Pro (Max): 2.2
  6. Grok 4.20 0309 v2: 3
  7. Gemini 3.1 Pro Preview: 4.5
  8. GPT-5.4 (xhigh): 5.6
  9. Claude Opus 4.7 (max): 10
  10. GPT-5.5 (xhigh): 11.3
Get personalized recommendations based on your priorities for intelligence, speed, and cost.
Personalized Model Recommendation
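As a rough sketch of how priorities for intelligence, speed, and cost might be turned into a single ranking, the example below min-max normalizes the three Highlights metrics and combines them with user-chosen weights. The weighting scheme, the normalization, and the names `HIGHLIGHTS` and `rank_models` are assumptions made for illustration; this is not Artificial Analysis's recommendation method. The figures are copied from the Highlights lists, and Muse Spark is left out because it has no speed or price entry there.

```python
# Figures copied from the Highlights lists:
# (intelligence index, output tokens per second, USD per 1M tokens).
HIGHLIGHTS = {
    "GPT-5.5 (xhigh)": (60, 74, 11.3),
    "Claude Opus 4.7 (max)": (57, 49, 10.0),
    "Gemini 3.1 Pro Preview": (57, 123, 4.5),
    "GPT-5.4 (xhigh)": (57, 79, 5.6),
    "Kimi K2.6": (54, 112, 1.7),
    "MiMo-V2.5-Pro": (54, 60, 1.5),
    "DeepSeek V4 Pro (Max)": (52, 36, 2.2),
    "Grok 4.20 0309 v2": (49, 115, 3.0),
    "NVIDIA Nemotron 3 Super": (36, 154, 0.4),
    "gpt-oss-120B (high)": (33, 209, 0.3),
}


def _normalize(values, higher_is_better=True):
    """Min-max scale values to [0, 1]; flip the scale when lower is better."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero if all values match
    scaled = [(v - lo) / span for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def rank_models(w_intelligence, w_speed, w_price, data=HIGHLIGHTS):
    """Return (name, score) pairs sorted by a hypothetical weighted score.

    At least one weight must be non-zero.
    """
    names = list(data)
    intel = _normalize([data[n][0] for n in names])
    speed = _normalize([data[n][1] for n in names])
    price = _normalize([data[n][2] for n in names], higher_is_better=False)
    total = w_intelligence + w_speed + w_price
    scores = [
        (w_intelligence * i + w_speed * s + w_price * p) / total
        for i, s, p in zip(intel, speed, price)
    ]
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Example priorities: mostly intelligence, some weight on speed and price.
    for name, score in rank_models(0.6, 0.2, 0.2)[:3]:
        print(f"{name}: {score:.2f}")
```

Min-max scaling is only one possible choice; because it depends on which models are included, adding or removing a model can change the relative scores.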
Compare AI agents across capabilities, pricing, and platform support.
Explore agents for general work, coding, customer support, and more
What is the Artificial Analysis Intelligence Index?
Learn about the Artificial Analysis Intelligence Index and how it is calculated

Changelog

New article published · 24 Apr
DeepSeek is back among the leading open weights models with V4 Pro and V4 Flash
New language model evaluation · 24 Apr
Qwen3.6 27B (Non-reasoning)
New article published · 23 Apr
OpenAI's GPT-5.5 is the new leading AI model
New language model evaluation · 23 Apr
DeepSeek V4 Pro (Reasoning, Max Effort)
New language model evaluation · 23 Apr
DeepSeek V4 Pro (Reasoning, High Effort)
New language model evaluation · 23 Apr
DeepSeek V4 Flash (Reasoning, Max Effort)
New language model evaluation · 23 Apr
DeepSeek V4 Flash (Reasoning, High Effort)
New language model evaluation · 23 Apr
GPT-5.5 (low)
New language model evaluation · 23 Apr
GPT-5.5 (high)
New language model evaluation · 23 Apr
GPT-5.5 (Non-reasoning)
New language model evaluation · 23 Apr
GPT-5.5 (medium)
New language model evaluation · 23 Apr
GPT-5.5 (xhigh)
New language model evaluation · 23 Apr
Ling-2.6-1T
New language model evaluation · 23 Apr
Qwen3.6 27B (Reasoning)
New language model evaluation · 22 Apr
MiMo-V2.5-Pro
New language model evaluation · 21 Apr
Claude Opus 4.7 (Non-reasoning, High Effort)
New language model evaluation · 21 Apr
Qwen3.6 35B A3B (Non-reasoning)
New language model evaluation · 21 Apr
Ling 2.6 Flash
New article published · 20 Apr
Kimi K2.6: The new leading open weights model
New language model evaluation · 20 Apr
Kimi K2.6

Intelligence

Intelligence of leading AI models based on our independent evaluations

Artificial Analysis Intelligence Index

Artificial Analysis Intelligence Index v4.0 incorporates 10 evaluations: GDPval-AA, 𝜏²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, CritPt
  1. GPT-5.5 (xhigh): 60
  2. Claude Opus 4.7 (max): 57
  3. Gemini 3.1 Pro Preview: 57
  4. GPT-5.4 (xhigh): 57
  5. Kimi K2.6: 54
  6. MiMo-V2.5-Pro: 54
  7. Muse Spark: 52
  8. Qwen3.6 Max Preview: 52
  9. Claude Sonnet 4.6 (max): 52
  10. DeepSeek V4 Pro (Max): 52
  11. GLM-5.1: 51
  12. MiniMax-M2.7: 50
  13. Grok 4.20 0309 v2: 49
  14. GPT-5.4 mini (xhigh): 49
  15. DeepSeek V4 Flash (Max): 47
  16. Gemini 3 Flash: 46
  17. Qwen3.5 397B A17B: 45
  18. DeepSeek V3.2: 42
  19. Gemma 4 31B: 39
  20. Claude 4.5 Haiku: 37
  21. NVIDIA Nemotron 3 Super: 36
  22. Nova 2.0 Pro Preview (medium): 36
  23. gpt-oss-120B (high): 33
  24. Mistral Small 4: 28
  25. Solar Pro 3: 26
  26. gpt-oss-20B (high): 24
  27. K2 Think V2: 24
Reasoning models are indicated by a lightbulb icon. Every language model charted in this section is marked as a reasoning model.

Artificial Analysis Intelligence Index v4.0 includes: GDPval-AA, 𝜏²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, CritPt. See Intelligence Index methodology for further details, including a breakdown of each evaluation and how we run them.

Artificial Analysis Intelligence Index by Open Weights / Proprietary

Proprietary
Open Weights
Open Weights (Commercial Use Restricted)
Chart: the same 27 models and index scores as the Artificial Analysis Intelligence Index chart, color-coded by the three license categories above.

Indicates whether the model weights are available. Models are labelled as 'Commercial Use Restricted' if the weights are available but commercial use is limited (typically requires obtaining a paid license).

Intelligence vs. Cost to Run Artificial Analysis Intelligence Index

Artificial Analysis Intelligence Index; Cost to Run Intelligence Index
Most attractive quadrant
Alibaba
Amazon
Anthropic
DeepSeek
Google
Kimi
MiniMax
Mistral
NVIDIA
OpenAI
xAI
Xiaomi
Z AI
Chart axes: Cost to Run Intelligence Index (USD, Log Scale), 32 to 8.19k; Artificial Analysis Intelligence Index, 20 to 65
Models plotted: gpt-oss-20B (high), Mistral Small 4, gpt-oss-120B (high), DeepSeek V3.2, DeepSeek V4 Flash (Max), NVIDIA Nemotron 3 Super, MiniMax-M2.7, Gemini 3 Flash, Qwen3.5 397B A17B, MiMo-V2.5-Pro, Nova 2.0 Pro Preview (medium), Grok 4.20 0309 v2, GLM-5.1, Claude 4.5 Haiku, Qwen3.6 Max Preview, Gemini 3.1 Pro Preview, Kimi K2.6, DeepSeek V4 Pro (Max), GPT-5.4 mini (xhigh), GPT-5.4 (xhigh), GPT-5.5 (xhigh), Claude Sonnet 4.6 (max), Claude Opus 4.7 (max)

The cost to run the evaluations in the Artificial Analysis Intelligence Index, calculated using the model's input and output token pricing and the number of tokens used across evaluations (excluding repeats).
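As a minimal sketch of the arithmetic described above, the cost of a run can be estimated from token counts and per-million-token prices. The function name, the token counts, and the prices below are hypothetical placeholders, not figures published by Artificial Analysis, and the sketch ignores details such as cached-input pricing.

```python
def cost_to_run_usd(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Estimate run cost in USD from token counts and USD-per-1M-token prices."""
    input_cost = (input_tokens / 1_000_000) * input_price_per_m
    output_cost = (output_tokens / 1_000_000) * output_price_per_m
    return input_cost + output_cost


# Hypothetical run: 40M input tokens at $2.00 per 1M, 90M output tokens at $8.00 per 1M.
example_cost = cost_to_run_usd(40_000_000, 90_000_000, 2.00, 8.00)
print(f"${example_cost:,.2f}")  # prints $800.00
```

Because the total scales with the number of tokens a model emits, a model with a low per-token price can still be costly to evaluate if it produces many output tokens.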

Frontier Language Model Intelligence, Over Time

Alibaba
Anthropic
DeepSeek
Google
Kimi
MBZUAI Institute of Foundation Models
Meta
MiniMax
Mistral
OpenAI
Upstage
xAI
Xiaomi
Z AI
Chart axes: Release Date, Nov ’22 to May ’26; Artificial Analysis Intelligence Index, 0 to 70

Image & Video Leaderboards

Top models from our Image Arena and Video Arena leaderboards, with 95% confidence intervals

Text to Image Leaderboard

Elo scores from blind preference votes in our Image Arena.
  1. GPT Image 2 (high): 1332
  2. GPT Image 1.5 (high): 1270
  3. Nano Banana 2 (Gemini 3.1 Flash Image Preview): 1263
  4. Nano Banana Pro (Gemini 3 Pro Image): 1216
  5. FLUX.2 [max]: 1205
  6. Seedream 4.0: 1202
  7. MAI-Image-2: 1198
  8. FLUX.2 [pro]: 1189
  9. grok-imagine-image: 1186
  10. FLUX.2 [flex]: 1180
  11. ImagineArt 2.0: 1178
  12. Imagen 4 Ultra: 1175
  13. FLUX.2 [dev] Turbo: 1169
  14. Seedream 4.5: 1167
  15. Qwen Image Max 2512: 1162
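For intuition about what a rating gap means, the sketch below applies the standard Elo expected-score formula to the top two ratings on this leaderboard. It assumes the conventional base-10, 400-point scale; the page does not state the exact rating model used for the Image Arena, so the result is illustrative only. The function name `elo_expected_score` is a hypothetical helper.

```python
def elo_expected_score(rating_a, rating_b, scale=400.0):
    """Standard Elo expectation that A is preferred over B (assumed 400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings from the leaderboard: GPT Image 2 (high) at 1332, GPT Image 1.5 (high) at 1270.
p = elo_expected_score(1332, 1270)
print(f"Expected preference rate for the higher-rated model: {p:.3f}")
```

Under these assumptions, a 62-point gap corresponds to the higher-rated model being preferred in a little under 60% of head-to-head votes.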

Intelligence Breakdown

Intelligence Evaluations

Intelligence evaluations measured independently by Artificial Analysis; Higher is better
Results claimed by AI Lab (not yet independently verified)
GDPval-AA
  1. GPT-5.5 (xhigh): 64%
  2. Claude Opus 4.7 (max): 63%
  3. Claude Sonnet 4.6 (max): 59%
  4. GPT-5.4 (xhigh): 59%
  5. MiMo-V2.5-Pro: 54%
  6. DeepSeek V4 Pro (Max): 53%
  7. GLM-5.1: 52%
  8. MiniMax-M2.7: 51%
  9. Qwen3.6 Max Preview: 50%
  10. Kimi K2.6: 49%
  11. GPT-5.4 mini (xhigh): 47%
  12. Muse Spark: 46%
  13. DeepSeek V4 Flash (Max): 44%
  14. Gemini 3.1 Pro Preview: 41%
  15. Gemini 3 Flash: 35%
  16. DeepSeek V3.2: 35%
  17. Qwen3.5 397B A17B: 35%
  18. Grok 4.20 0309 v2: 34%
  19. Claude 4.5 Haiku: 34%
  20. Gemma 4 31B: 31%
  21. NVIDIA Nemotron 3 Super: 25%
  22. Nova 2.0 Pro Preview (medium): 24%
  23. gpt-oss-120B (high): 22%
  24. Mistral Small 4: 18%
  25. Solar Pro 3: 9%
  26. gpt-oss-20B (high): 8%
  27. K2 Think V2: 5%
Terminal-Bench Hard
  1. GPT-5.5 (xhigh): 61%
  2. GPT-5.4 (xhigh): 58%
  3. Gemini 3.1 Pro Preview: 54%
  4. Claude Sonnet 4.6 (max): 53%
  5. GPT-5.4 mini (xhigh): 52%
  6. Claude Opus 4.7 (max): 52%
  7. DeepSeek V4 Pro (Max): 46%
  8. Muse Spark: 46%
  9. Kimi K2.6: 44%
  10. Qwen3.6 Max Preview: 44%
  11. MiMo-V2.5-Pro: 43%
  12. GLM-5.1: 43%
  13. Qwen3.5 397B A17B: 41%
  14. MiniMax-M2.7: 39%
  15. Gemini 3 Flash: 39%
  16. Grok 4.20 0309 v2: 38%
  17. Gemma 4 31B: 36%
  18. DeepSeek V4 Flash (Max): 36%
  19. DeepSeek V3.2: 36%
  20. NVIDIA Nemotron 3 Super: 29%
  21. Claude 4.5 Haiku: 27%
  22. Nova 2.0 Pro Preview (medium): 24%
  23. gpt-oss-120B (high): 24%
  24. Mistral Small 4: 17%
  25. gpt-oss-20B (high): 11%
  26. Solar Pro 3: 8%
  27. K2 Think V2: 7%
𝜏²-Bench Telecom
  1. GLM-5.1: 98%
  2. DeepSeek V4 Pro (Max): 96%
  3. Kimi K2.6: 96%
  4. Qwen3.6 Max Preview: 96%
  5. Gemini 3.1 Pro Preview: 96%
  6. Qwen3.5 397B A17B: 96%
  7. DeepSeek V4 Flash (Max): 95%
  8. MiMo-V2.5-Pro: 94%
  9. GPT-5.5 (xhigh): 94%
  10. Grok 4.20 0309 v2: 93%
  11. Nova 2.0 Pro Preview (medium): 93%
  12. Muse Spark: 92%
  13. DeepSeek V3.2: 91%
  14. Claude Opus 4.7 (max): 89%
  15. GPT-5.4 (xhigh): 87%
  16. Solar Pro 3: 86%
  17. MiniMax-M2.7: 85%
  18. GPT-5.4 mini (xhigh): 83%
  19. Gemini 3 Flash: 80%
  20. Claude Sonnet 4.6 (max): 76%
  21. NVIDIA Nemotron 3 Super: 68%
  22. gpt-oss-120B (high): 66%
  23. gpt-oss-20B (high): 60%
  24. Gemma 4 31B: 60%
  25. Claude 4.5 Haiku: 55%
  26. Mistral Small 4: 41%
  27. K2 Think V2: 25%
AA-LCR
  1. GPT-5.5 (xhigh): 74%
  2. GPT-5.4 (xhigh): 74%
  3. MiMo-V2.5-Pro: 73%
  4. Gemini 3.1 Pro Preview: 73%
  5. Claude Sonnet 4.6 (max): 71%
  6. Claude 4.5 Haiku: 70%
  7. Claude Opus 4.7 (max): 70%
  8. Muse Spark: 70%
  9. Kimi K2.6: 70%
  10. Qwen3.6 Max Preview: 70%
  11. GPT-5.4 mini (xhigh): 69%
  12. MiniMax-M2.7: 69%
  13. Gemini 3 Flash: 66%
  14. DeepSeek V4 Pro (Max): 66%
  15. Qwen3.5 397B A17B: 66%
  16. DeepSeek V3.2: 65%
  17. DeepSeek V4 Flash (Max): 63%
  18. GLM-5.1: 62%
  19. Gemma 4 31B: 62%
  20. NVIDIA Nemotron 3 Super: 60%
  21. Grok 4.20 0309 v2: 58%
  22. Nova 2.0 Pro Preview (medium): 54%
  23. K2 Think V2: 53%
  24. gpt-oss-120B (high): 51%
  25. Mistral Small 4: 45%
  26. gpt-oss-20B (high): 31%
  27. Solar Pro 3: 27%
AA-Omniscience Accuracy
  1. GPT-5.5 (xhigh): 57%
  2. Gemini 3.1 Pro Preview: 55%
  3. Gemini 3 Flash: 54%
  4. GPT-5.4 (xhigh): 50%
  5. Claude Opus 4.7 (max): 46%
  6. Muse Spark: 45%
  7. DeepSeek V4 Pro (Max): 43%
  8. Claude Sonnet 4.6 (max): 40%
  9. Qwen3.6 Max Preview: 38%
  10. GPT-5.4 mini (xhigh): 37%
  11. DeepSeek V4 Flash (Max): 37%
  12. DeepSeek V3.2: 33%
  13. Kimi K2.6: 33%
  14. Qwen3.5 397B A17B: 31%
  15. Grok 4.20 0309 v2: 27%
  16. MiniMax-M2.7: 26%
  17. GLM-5.1: 24%
  18. NVIDIA Nemotron 3 Super: 24%
  19. MiMo-V2.5-Pro: 23%
  20. Mistral Small 4: 22%
  21. Nova 2.0 Pro Preview (medium): 22%
  22. gpt-oss-120B (high): 22%
  23. Gemma 4 31B: 20%
  24. Solar Pro 3: 18%
  25. Claude 4.5 Haiku: 17%
  26. K2 Think V2: 16%
  27. gpt-oss-20B (high): 16%
AA-Omniscience Non-Hallucination Rate
  1. Grok 4.20 0309 v2: 83%
  2. MiMo-V2.5-Pro: 75%
  3. Claude 4.5 Haiku: 74%
  4. GLM-5.1: 71%
  5. MiniMax-M2.7: 66%
  6. Claude Opus 4.7 (max): 64%
  7. Kimi K2.6: 61%
  8. Qwen3.6 Max Preview: 56%
  9. Claude Sonnet 4.6 (max): 54%
  10. Gemini 3.1 Pro Preview: 50%
  11. K2 Think V2: 41%
  12. Mistral Small 4: 33%
  13. Muse Spark: 27%
  14. Gemma 4 31B: 18%
  15. DeepSeek V3.2: 18%
  16. GPT-5.5 (xhigh): 14%
  17. NVIDIA Nemotron 3 Super: 13%
  18. Solar Pro 3: 12%
  19. GPT-5.4 (xhigh): 11%
  20. Qwen3.5 397B A17B: 11%
  21. Nova 2.0 Pro Preview (medium): 10%
  22. GPT-5.4 mini (xhigh): 10%
  23. gpt-oss-120B (high): 9%
  24. Gemini 3 Flash: 8%
  25. DeepSeek V4 Pro (Max): 6%
  26. gpt-oss-20B (high): 6%
  27. DeepSeek V4 Flash (Max): 4%
Humanity's Last Exam
  1. Gemini 3.1 Pro Preview: 45%
  2. GPT-5.5 (xhigh): 44%
  3. GPT-5.4 (xhigh): 42%
  4. Muse Spark: 40%
  5. Claude Opus 4.7 (max): 40%
  6. DeepSeek V4 Pro (Max): 36%
  7. Kimi K2.6: 36%
  8. Gemini 3 Flash: 35%
  9. MiMo-V2.5-Pro: 34%
  10. Grok 4.20 0309 v2: 32%
  11. DeepSeek V4 Flash (Max): 32%
  12. Claude Sonnet 4.6 (max): 30%
  13. Qwen3.6 Max Preview: 29%
  14. MiniMax-M2.7: 28%
  15. GLM-5.1: 28%
  16. Qwen3.5 397B A17B: 27%
  17. GPT-5.4 mini (xhigh): 27%
  18. Gemma 4 31B: 23%
  19. DeepSeek V3.2: 22%
  20. NVIDIA Nemotron 3 Super: 19%
  21. gpt-oss-120B (high): 19%
  22. Solar Pro 3: 10%
  23. gpt-oss-20B (high): 10%
  24. Claude 4.5 Haiku: 10%
  25. Mistral Small 4: 10%
  26. K2 Think V2: 10%
  27. Nova 2.0 Pro Preview (medium): 9%
GPQA Diamond
  1. Gemini 3.1 Pro Preview: 94%
  2. GPT-5.5 (xhigh): 94%
  3. GPT-5.4 (xhigh): 92%
  4. Claude Opus 4.7 (max): 91%
  5. Grok 4.20 0309 v2: 91%
  6. Kimi K2.6: 91%
  7. Gemini 3 Flash: 90%
  8. DeepSeek V4 Flash (Max): 89%
  9. Qwen3.5 397B A17B: 89%
  10. DeepSeek V4 Pro (Max): 89%
  11. Qwen3.6 Max Preview: 89%
  12. Muse Spark: 88%
  13. GPT-5.4 mini (xhigh): 88%
  14. Claude Sonnet 4.6 (max): 88%
  15. MiniMax-M2.7: 87%
  16. GLM-5.1: 87%
  17. MiMo-V2.5-Pro: 87%
  18. Gemma 4 31B: 86%
  19. DeepSeek V3.2: 84%
  20. NVIDIA Nemotron 3 Super: 80%
  21. Nova 2.0 Pro Preview (medium): 79%
  22. gpt-oss-120B (high): 78%
  23. Mistral Small 4: 77%
  24. Solar Pro 3: 72%
  25. K2 Think V2: 71%
  26. gpt-oss-20B (high): 69%
  27. Claude 4.5 Haiku: 67%
SciCode
  1. Gemini 3.1 Pro Preview: 59%
  2. GPT-5.4 (xhigh): 57%
  3. GPT-5.5 (xhigh): 56%
  4. Claude Opus 4.7 (max): 55%
  5. Kimi K2.6: 54%
  6. Muse Spark: 52%
  7. Gemini 3 Flash: 51%
  8. MiMo-V2.5-Pro: 50%
  9. DeepSeek V4 Pro (Max): 50%
  10. GPT-5.4 mini (xhigh): 50%
  11. MiniMax-M2.7: 47%
  12. Qwen3.6 Max Preview: 47%
  13. Claude Sonnet 4.6 (max): 47%
  14. Grok 4.20 0309 v2: 46%
  15. DeepSeek V4 Flash (Max): 45%
  16. GLM-5.1: 44%
  17. Gemma 4 31B: 43%
  18. Claude 4.5 Haiku: 43%
  19. Nova 2.0 Pro Preview (medium): 43%
  20. Qwen3.5 397B A17B: 42%
  21. gpt-oss-120B (high): 39%
  22. DeepSeek V3.2: 39%
  23. Mistral Small 4: 38%
  24. NVIDIA Nemotron 3 Super: 36%
  25. gpt-oss-20B (high): 34%
  26. K2 Think V2: 33%
  27. Solar Pro 3: 25%
IFBench
  1. Grok 4.20 0309 v2: 81%
  2. MiMo-V2.5-Pro: 80%
  3. DeepSeek V4 Flash (Max): 79%
  4. Nova 2.0 Pro Preview (medium): 79%
  5. Qwen3.5 397B A17B: 79%
  6. Gemini 3 Flash: 78%
  7. Gemini 3.1 Pro Preview: 77%
  8. Qwen3.6 Max Preview: 77%
  9. DeepSeek V4 Pro (Max): 77%
  10. GLM-5.1: 76%
  11. Kimi K2.6: 76%
  12. GPT-5.5 (xhigh): 76%
  13. Muse Spark: 76%
  14. MiniMax-M2.7: 76%
  15. Gemma 4 31B: 76%
  16. GPT-5.4 (xhigh): 74%
  17. GPT-5.4 mini (xhigh): 73%
  18. NVIDIA Nemotron 3 Super: 72%
  19. Solar Pro 3: 71%
  20. gpt-oss-120B (high): 69%
  21. gpt-oss-20B (high): 65%
  22. K2 Think V2: 63%
  23. DeepSeek V3.2: 61%
  24. Claude Opus 4.7 (max): 59%
  25. Claude Sonnet 4.6 (max): 57%
  26. Claude 4.5 Haiku: 54%
  27. Mistral Small 4: 48%
CritPt
  1. GPT-5.4 Pro (xhigh): 30%
  2. GPT-5.5 (xhigh): 27%
  3. GPT-5.4 (xhigh): 23%
  4. Gemini 3.1 Pro Preview: 18%
  5. DeepSeek V4 Pro (Max): 13%
  6. Claude Opus 4.7 (max): 12%
  7. Muse Spark: 11%
  8. GPT-5.4 mini (xhigh): 10%
  9. Gemini 3 Flash: 9%
  10. Kimi K2.6: 8%
  11. DeepSeek V4 Flash (Max): 7%
  12. Grok 4.20 0309 v2: 7%
  13. GLM-5.1: 5%
  14. MiMo-V2.5-Pro: 4%
  15. Qwen3.6 Max Preview: 4%
  16. Claude Sonnet 4.6 (max): 3%
  17. NVIDIA Nemotron 3 Super: 3%
  18. DeepSeek V3.2: 3%
  19. Qwen3.5 397B A17B: 2%
  20. gpt-oss-20B (high): 1%
  21. Gemma 4 31B: 1%
  22. gpt-oss-120B (high): 1%
  23. MiniMax-M2.7: 1%
  24. Mistral Small 4: 0%
  25. Claude 4.5 Haiku: 0%
  26. Nova 2.0 Pro Preview (medium): 0%
  27. Solar Pro 3: 0%
  28. K2 Think V2: 0%
APEX-Agents-AA
  1. GPT-5.5 (xhigh): 38%
  2. GPT-5.4 (xhigh): 33%
  3. Gemini 3.1 Pro Preview: 32%
  4. GPT-5.4 mini (xhigh): 28%
  5. Claude Sonnet 4.6 (max): 28%
  6. Gemini 3 Flash: 28%
  7. Qwen3.5 397B A17B: 15%
  8. DeepSeek V3.2: 14%
  9. MiniMax-M2.7: 11%
  10. gpt-oss-120B (high): 3%
  11. NVIDIA Nemotron 3 Super: 2%
  12. gpt-oss-20B (high): 1%
MMMU-Pro
  1. Gemini 3.1 Pro Preview: 82%
  2. Muse Spark: 81%
  3. GPT-5.5 (xhigh): 80%
  4. Gemini 3 Flash: 80%
  5. Kimi K2.6: 79%
  6. GPT-5.4 (xhigh): 78%
  7. Qwen3.5 397B A17B: 77%
  8. Grok 4.20 0309 v2: 75%
  9. Gemma 4 31B: 73%
  10. GPT-5.4 mini (xhigh): 73%
  11. Claude Sonnet 4.6 (max): 73%
  12. Nova 2.0 Pro Preview (medium): 65%
  13. Claude 4.5 Haiku: 59%
  14. Mistral Small 4: 57%
Reasoning models are indicated by a lightbulb icon.

While model intelligence generally translates across use cases, specific evaluations may be more relevant for certain use cases.

Artificial Analysis Intelligence Index v4.0 includes: GDPval-AA, 𝜏²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, CritPt. See Intelligence Index methodology for further details, including a breakdown of each evaluation and how we run them.

AA-Omniscience

AA-Omniscience is a knowledge and hallucination benchmark that rewards accuracy, punishes bad guesses and provides a comprehensive view of which models produce factually reliable outputs across different domains

AA-Omniscience Index

AA-Omniscience Index (higher is better) measures knowledge reliability and hallucination. It rewards correct answers, penalizes hallucinations, and has no penalty for refusing to answer. Scores range from -100 to 100, where 0 means as many correct as incorrect answers, and negative scores mean more incorrect than correct.
  1. Gemini 3.1 Pro Preview: 33
  2. Claude Opus 4.7 (max): 26
  3. GPT-5.5 (xhigh): 20
  4. Grok 4.20 0309 v2: 15
  5. Claude Sonnet 4.6 (max): 12
  6. Gemini 3 Flash: 12
  7. Qwen3.6 Max Preview: 10
  8. Kimi K2.6: 6
  9. GPT-5.4 (xhigh): 6
  10. Muse Spark: 4
  11. MiMo-V2.5-Pro: 4
  12. GLM-5.1: 2
  13. MiniMax-M2.7: 1
  14. Claude 4.5 Haiku: -4
  15. DeepSeek V4 Pro (Max): -10
  16. GPT-5.4 mini (xhigh): -19
  17. DeepSeek V3.2: -21
  18. DeepSeek V4 Flash (Max): -23
  19. Qwen3.5 397B A17B: -30
  20. Mistral Small 4: -30
  21. K2 Think V2: -34
  22. NVIDIA Nemotron 3 Super: -42
  23. Gemma 4 31B: -45
  24. Nova 2.0 Pro Preview (medium): -48
  25. gpt-oss-120B (high): -50
  26. Solar Pro 3: -54
  27. gpt-oss-20B (high): -64
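
The scoring rule described for the AA-Omniscience Index can be sketched in a few lines. This is a minimal illustration and not Artificial Analysis's published implementation: the formula (correct minus incorrect, as a percentage of all questions) is an assumption chosen because it matches the stated properties, and the function name `omniscience_index` and its parameters are hypothetical.

```python
def omniscience_index(correct: int, incorrect: int, abstained: int) -> float:
    """Assumed scoring rule: +1 per correct answer, -1 per incorrect answer,
    0 per refusal, scaled to the range -100..100."""
    total = correct + incorrect + abstained
    if total == 0:
        raise ValueError("at least one question is required")
    return 100.0 * (correct - incorrect) / total


# As many correct as incorrect answers gives 0.
balanced = omniscience_index(correct=50, incorrect=50, abstained=0)

# Refusing instead of guessing wrong is not penalized, so the score rises
# when 10 wrong answers become 10 refusals.
with_guesses = omniscience_index(correct=40, incorrect=10, abstained=50)
with_refusals = omniscience_index(correct=40, incorrect=0, abstained=60)
```

Under this assumed rule, a model that answers fewer questions but avoids wrong answers can outscore one that answers more questions with a higher error rate.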

GDPval-AA

GDPval-AA evaluates AI models on real-world, economically valuable tasks across a wide range of occupations

GDPval-AA Leaderboard

Elo scores for agentic performance on real-world work tasks using web and shell access via Stirrup, an open-source harness developed by Artificial Analysis
  1. GPT-5.5 (xhigh): 1782
  2. Claude Opus 4.7 (max): 1753
  3. Claude Sonnet 4.6 (max): 1676
  4. GPT-5.4 (xhigh): 1674
  5. MiMo-V2.5-Pro: 1579
  6. DeepSeek V4 Pro (Max): 1554
  7. GLM-5.1: 1535
  8. MiniMax-M2.7: 1514
  9. Qwen3.6 Max Preview: 1509
  10. Kimi K2.6: 1486
  11. GPT-5.4 mini (xhigh): 1435
  12. Muse Spark: 1426
  13. DeepSeek V4 Flash (Max): 1388
  14. Gemini 3.1 Pro Preview: 1314
  15. Gemini 3 Flash: 1207
  16. DeepSeek V3.2: 1202
  17. Qwen3.5 397B A17B: 1197
  18. Grok 4.20 0309 v2: 1179
  19. Claude 4.5 Haiku: 1176
  20. Gemma 4 31B: 1117
  21. NVIDIA Nemotron 3 Super: 1004
  22. Nova 2.0 Pro Preview (medium): 976
  23. gpt-oss-120B (high): 948
  24. Mistral Small 4: 863
  25. Solar Pro 3: 676
  26. gpt-oss-20B (high): 654
  27. K2 Think V2: 609
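
To give a feel for what a gap between two Elo scores means, the sketch below applies the standard Elo expected-score formula. It assumes the conventional 400-point logistic scale; the page does not say which scale or fitting procedure GDPval-AA uses, so treat the result as an illustration only. The function name `elo_expected_score` is hypothetical.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Standard Elo expected score of A against B (1 = always preferred).

    The 400-point scale is the conventional default and is an assumption here.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings taken from the leaderboard: GPT-5.5 (xhigh) vs. Claude Opus 4.7 (max).
p_top_two = elo_expected_score(1782, 1753)

# Equal ratings always give an even split.
p_equal = elo_expected_score(1500, 1500)
```

Under that assumption, the 29-point gap between the top two models corresponds to an expected preference rate of roughly 0.54, only slightly better than even.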

Artificial Analysis Openness Index

Artificial Analysis Openness Index assesses how 'open' models are on the basis of their availability and transparency across different components.

Artificial Analysis Openness Index: Components

Openness Index underlying score contribution by components, up to a maximum of 18 (higher is more open)
Model Availability
Transparency - Methodology
Transparency - Post-training Data
Transparency - Pre-training Data
Total score by model (sum of the components above):
  1. K2 Think V2: 16.0
  2. NVIDIA Nemotron 3 Super: 15.0
  3. DeepSeek V4 Pro (Max): 9.0
  4. DeepSeek V4 Flash (Max): 9.0
  5. GLM-5.1: 8.0
  6. gpt-oss-20B (high): 7.0
  7. gpt-oss-120B (high): 7.0
  8. Gemma 4 31B: 7.0
  9. Mistral Small 4: 7.0
  10. Qwen3.5 397B A17B: 7.0
  11. Kimi K2.6: 6.0
  12. MiniMax-M2.7: 4.0
  13. Claude 4.5 Haiku: 2.0

Artificial Analysis Openness Index vs. Artificial Analysis Intelligence Index

Scatter plot: Artificial Analysis Openness Index (axis range 0 to 100) against Artificial Analysis Intelligence Index (axis range 20 to 60), colored by model creator, with the most attractive quadrant highlighted
Model creators: Alibaba, Anthropic, DeepSeek, Google, Kimi, MBZUAI Institute of Foundation Models, MiniMax, Mistral, NVIDIA, OpenAI, Z AI
Models plotted: Claude 4.5 Haiku, gpt-oss-20B (high), Mistral Small 4, gpt-oss-120B (high), MiniMax-M2.7, Gemma 4 31B, Qwen3.5 397B A17B, Kimi K2.6, GLM-5.1, DeepSeek V4 Flash (Max), DeepSeek V4 Pro (Max), NVIDIA Nemotron 3 Super, K2 Think V2

Output Tokens

Output tokens of leading AI models based on our independent evaluations

Output Tokens Used to Run Artificial Analysis Intelligence Index

Tokens used to run all evaluations in the Artificial Analysis Intelligence Index
Reasoning Tokens
Answer Tokens
  1. DeepSeek V4 Flash (Max): 240M total, 230M reasoning
  2. GPT-5.4 mini (xhigh): 240M total, 230M reasoning
  3. Claude Sonnet 4.6 (max): 200M total, 190M reasoning
  4. DeepSeek V4 Pro (Max): 190M total, 180M reasoning
  5. Kimi K2.6: 170M total, 160M reasoning
  6. GPT-5.4 (xhigh): 120M total, 110M reasoning
  7. Solar Pro 3: 120M total, 100M reasoning
  8. GLM-5.1: 110M total, 110M reasoning
  9. Claude Opus 4.7 (max): 110M total, 93M reasoning
  10. NVIDIA Nemotron 3 Super: 110M total, 100M reasoning
  11. K2 Think V2: 99M total, 95M reasoning
  12. MiMo-V2.5-Pro: 92M total, 82M reasoning
  13. Claude 4.5 Haiku: 87M total, 79M reasoning
  14. MiniMax-M2.7: 87M total, 79M reasoning
  15. Qwen3.5 397B A17B: 86M total, 80M reasoning
  16. gpt-oss-120B (high): 78M total, 73M reasoning
  17. GPT-5.5 (xhigh): 75M total, 68M reasoning
  18. Qwen3.6 Max Preview: 74M total, 65M reasoning
  19. Gemini 3 Flash: 72M total, 68M reasoning
  20. DeepSeek V3.2: 61M total, 57M reasoning
  21. gpt-oss-20B (high): 61M total, 58M reasoning
  22. Grok 4.20 0309 v2: 61M total, 56M reasoning
  23. Muse Spark: 58M total, 53M reasoning
  24. Gemini 3.1 Pro Preview: 57M total, 53M reasoning
  25. Mistral Small 4: 53M total, 49M reasoning
  26. Gemma 4 31B: 39M total, 32M reasoning
  27. Nova 2.0 Pro Preview (medium): 36M total, 33M reasoning

The number of tokens required to run all evaluations in the Artificial Analysis Intelligence Index (excluding repeats).

Cost Efficiency

Cost of leading AI models based on our independent evaluations

Cost to Run Artificial Analysis Intelligence Index

Cost (USD) to run all evaluations in the Artificial Analysis Intelligence Index
Input Cost
Reasoning Cost
Output Cost
Total cost by model:
  1. Claude Opus 4.7 (max): $4811
  2. Claude Sonnet 4.6 (max): $3959
  3. GPT-5.5 (xhigh): $3357
  4. GPT-5.4 (xhigh): $2851
  5. GPT-5.4 mini (xhigh): $1354
  6. DeepSeek V4 Pro (Max): $1071
  7. Kimi K2.6: $948
  8. Gemini 3.1 Pro Preview: $892
  9. Qwen3.6 Max Preview: $861
  10. Claude 4.5 Haiku: $583
  11. GLM-5.1: $544
  12. Grok 4.20 0309 v2: $514
  13. Nova 2.0 Pro Preview (medium): $467
  14. MiMo-V2.5-Pro: $462
  15. Qwen3.5 397B A17B: $418
  16. Gemini 3 Flash: $278
  17. MiniMax-M2.7: $176
  18. NVIDIA Nemotron 3 Super: $145
  19. DeepSeek V4 Flash (Max): $113
  20. DeepSeek V3.2: $71
  21. gpt-oss-120B (high): $67
  22. Mistral Small 4: $48
  23. gpt-oss-20B (high): $22
Reasoning models are indicated by a lightbulb icon.

The cost to run the evaluations in the Artificial Analysis Intelligence Index, calculated using the model's input and output token pricing and the number of tokens used across evaluations (excluding repeats).
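The calculation described above can be sketched as follows. This is a minimal illustration, not the site's actual implementation: the `EvalUsage` type, the `eval_cost_usd` function, and all token counts are hypothetical, and the sketch assumes reasoning tokens are billed at the output price.

```python
from dataclasses import dataclass


@dataclass
class EvalUsage:
    """Token usage for one evaluation (hypothetical structure)."""
    name: str
    input_tokens: int
    output_tokens: int  # assumption: reasoning tokens are counted here


def eval_cost_usd(usages, input_price_per_m, output_price_per_m):
    """Sum the cost of all evaluations, given prices in USD per 1M tokens."""
    total = 0.0
    for u in usages:
        total += u.input_tokens / 1_000_000 * input_price_per_m
        total += u.output_tokens / 1_000_000 * output_price_per_m
    return total


# Made-up token counts, for illustration only.
usages = [
    EvalUsage("eval_a", input_tokens=2_000_000, output_tokens=1_000_000),
    EvalUsage("eval_b", input_tokens=4_000_000, output_tokens=3_000_000),
]
print(eval_cost_usd(usages, input_price_per_m=5.0, output_price_per_m=25.0))
```

Because output tokens are usually priced higher than input tokens, a model that emits many reasoning tokens can cost more to evaluate than its list prices alone suggest.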

Speed & Latency

Comparison of first-party API performance

Output Speed

Output Tokens per Second; Higher is better
  1. gpt-oss-20B (high): 280
  2. gpt-oss-120B (high): 209
  3. Gemini 3 Flash: 174
  4. GPT-5.4 mini (xhigh): 162
  5. Mistral Small 4: 161
  6. NVIDIA Nemotron 3 Super: 154
  7. Gemini 3.1 Pro Preview: 123
  8. Nova 2.0 Pro Preview (medium): 120
  9. Grok 4.20 0309 v2: 115
  10. Kimi K2.6: 112
  11. Claude 4.5 Haiku: 107
  12. DeepSeek V4 Flash (Max): 84
  13. GPT-5.4 (xhigh): 79
  14. DeepSeek V3.2: 76
  15. GPT-5.5 (xhigh): 74
  16. MiMo-V2.5-Pro: 60
  17. Claude Sonnet 4.6 (max): 55
  18. Qwen3.5 397B A17B: 53
  19. Claude Opus 4.7 (max): 49
  20. GLM-5.1: 49
  21. MiniMax-M2.7: 48
  22. DeepSeek V4 Pro (Max): 36
  23. Gemma 4 31B: 35
  24. Qwen3.6 Max Preview: 34

Output speed: tokens per second received while the model is generating tokens (i.e., after the first chunk has been received from the API, for models that support streaming).
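One way to read this definition is sketched below. The function name and the chunk representation are hypothetical; the sketch assumes each streamed chunk is recorded as an arrival time in seconds plus a token count, and that tokens in the first chunk are excluded because the clock starts when that chunk arrives.

```python
def output_tokens_per_second(chunks):
    """Estimate output speed from streamed chunks.

    chunks: list of (arrival_time_seconds, token_count) tuples in arrival order.
    The clock starts at the first chunk, so time to first token is excluded.
    """
    if len(chunks) < 2:
        raise ValueError("need at least two chunks to measure generation speed")
    first_time = chunks[0][0]
    last_time = chunks[-1][0]
    elapsed = last_time - first_time
    if elapsed <= 0:
        raise ValueError("chunk timestamps must be increasing")
    # Assumption: tokens delivered in the first chunk are not counted.
    tokens_after_first = sum(count for _, count in chunks[1:])
    return tokens_after_first / elapsed


# Made-up timings: first chunk at 0.5 s, then 100 tokens over the next second.
print(output_tokens_per_second([(0.5, 1), (1.0, 50), (1.5, 50)]))
```

Measured this way, output speed is independent of latency to the first token, so a model can have a slow start and still report a high tokens-per-second figure.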

Figures represent performance of the model's first-party API (e.g. OpenAI for o1) or the median across providers where a first-party API is not available (e.g. Meta's Llama models).

Price

Price of leading AI models based on our independent evaluations

Pricing: Input and Output Prices

Price: USD per 1M Tokens
Input price
Output price
  1. gpt-oss-20B (high): input 0.07 / output 0.2
  2. DeepSeek V4 Flash (Max): input 0.14 / output 0.28
  3. DeepSeek V3.2: input 0.28 / output 0.42
  4. gpt-oss-120B (high): input 0.15 / output 0.6
  5. Mistral Small 4: input 0.15 / output 0.6
  6. NVIDIA Nemotron 3 Super: input 0.3 / output 0.75
  7. MiniMax-M2.7: input 0.3 / output 1.2
  8. Gemini 3 Flash: input 0.5 / output 3
  9. MiMo-V2.5-Pro: input 1 / output 3
  10. Qwen3.5 397B A17B: input 0.6 / output 3.6
  11. Kimi K2.6: input 0.95 / output 4
  12. DeepSeek V4 Pro (Max): input 1.74 / output 3.48
  13. GPT-5.4 mini (xhigh): input 0.75 / output 4.5
  14. GLM-5.1: input 1.4 / output 4.4
  15. Claude 4.5 Haiku: input 1 / output 5
  16. Grok 4.20 0309 v2: input 2 / output 6
  17. Qwen3.6 Max Preview: input 1.3 / output 7.8
  18. Nova 2.0 Pro Preview (medium): input 1.25 / output 10
  19. Gemini 3.1 Pro Preview: input 2 / output 12
  20. GPT-5.4 (xhigh): input 2.5 / output 15
  21. Claude Sonnet 4.6 (max): input 3 / output 15
  22. Claude Opus 4.7 (max): input 5 / output 25
  23. GPT-5.5 (xhigh): input 5 / output 30
  24. GPT-5.4 Pro (xhigh): input 30 / output 180

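Input price: price per token included in the request/message sent to the API, represented as USD per million tokens.

Output price: price per token generated by the model (received from the API), represented as USD per million tokens.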

Figures represent performance of the model's first-party API (e.g. OpenAI for o1) or the median across providers where a first-party API is not available (e.g. Meta's Llama models).

API Provider Performance

Output Speed vs. Price: gpt-oss-120B (high)

Output Speed: Output Tokens per Second, Price: USD per 1M Tokens; 10,000 Input Tokens
Scatter plot of Output Speed (Output Tokens per Second, y-axis) against Price (USD per 1M Tokens, x-axis), with the most attractive quadrant highlighted.
Providers plotted: Amazon, Azure, Baseten, Cerebras, Clarifai, Cloudflare, Databricks, DeepInfra, DeepInfra (Turbo), Eigen AI, Fireworks, Google Vertex, Groq, Lightning AI, Nebius Base, Nebius Fast, Novita, Parasail, SambaNova, Scaleway, Snowflake, Together.ai, Weights & Biases

Smaller, emerging providers are offering high output speeds at competitive prices.

Price per token, represented as USD per million Tokens. Price is a blend of Input & Output token prices (3:1 ratio).
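The blended price can be sketched as a weighted average. This sketch assumes the 3:1 ratio means three parts input price to one part output price; the function name is hypothetical, and the example prices are the input and output prices charted on this page for gpt-oss-120B (high).

```python
def blended_price(input_price, output_price, input_weight=3, output_weight=1):
    """Weighted average of input and output prices (USD per 1M tokens).

    Assumption: a 3:1 ratio weights the input price three times as heavily
    as the output price.
    """
    total_weight = input_weight + output_weight
    return (input_price * input_weight + output_price * output_weight) / total_weight


# Input 0.15 and output 0.6 USD per 1M tokens give about 0.26 blended.
print(round(blended_price(0.15, 0.60), 4))
```

A 3:1 weighting suits workloads that read more than they write; for generation-heavy workloads the weights passed to the function could be reversed.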

Output speed: tokens per second received while the model is generating tokens (i.e., after the first chunk has been received from the API, for models that support streaming).

Figures represent median (P50) measurement over the past 72 hours to reflect sustained changes in performance.
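A rolling median of this kind could be computed as in the sketch below. The function name, the sample format, and the example measurements are all hypothetical; only the 72-hour window and the use of the median come from the text above.

```python
from datetime import datetime, timedelta
from statistics import median


def p50_over_window(samples, now, window_hours=72):
    """Median of measurements taken within the last `window_hours`.

    samples: list of (timestamp, value) tuples, e.g. output tokens per second.
    """
    cutoff = now - timedelta(hours=window_hours)
    recent = [value for ts, value in samples if cutoff <= ts <= now]
    if not recent:
        raise ValueError("no measurements inside the window")
    return median(recent)


# Made-up measurements; the first one is older than 72 hours and is ignored.
now = datetime(2025, 1, 10, 12, 0)
samples = [
    (now - timedelta(hours=100), 1000.0),
    (now - timedelta(hours=50), 200.0),
    (now - timedelta(hours=10), 300.0),
    (now - timedelta(hours=1), 250.0),
]
print(p50_over_window(samples, now))
```

Using the median rather than the mean keeps a single unusually fast or slow measurement from moving the reported figure.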

Pricing (Input and Output Prices): gpt-oss-120B (high)

Price: USD per 1M Tokens; Lower is better; 10,000 Input Tokens
Input price
Output price
  1. DeepInfra: input 0.04 / output 0.19
  2. Novita: input 0.05 / output 0.25
  3. Snowflake: input 0.22 / output 0.22
  4. Google Vertex: input 0.09 / output 0.36
  5. Clarifai: input 0.09 / output 0.36
  6. Lightning AI: input 0.1 / output 0.4
  7. Baseten: input 0.1 / output 0.5
  8. Nebius Fast: input 0.1 / output 0.5
  9. Eigen AI: input 0.1 / output 0.5
  10. Databricks: input 0.15 / output 0.6
  11. Azure: input 0.15 / output 0.6
  12. Amazon: input 0.15 / output 0.6
  13. Together.ai: input 0.15 / output 0.6
  14. DeepInfra (Turbo): input 0.15 / output 0.6
  15. Groq: input 0.15 / output 0.6
  16. Weights & Biases: input 0.15 / output 0.6
  17. Fireworks: input 0.15 / output 0.6
  18. Nebius Base: input 0.15 / output 0.6
  19. SambaNova: input 0.22 / output 0.59
  20. Parasail: input 0.1 / output 0.75
  21. Scaleway: input 0.17 / output 0.7
  22. Cerebras: input 0.35 / output 0.75
  23. Cloudflare: input 0.35 / output 0.75

The relative importance of input vs. output token prices varies by use case. E.g. Generation tasks are typically more output token weighted while document processing tasks are more input token weighted.

Input price: price per token included in the request/message sent to the API, represented as USD per million tokens.

Output price: price per token generated by the model (received from the API), represented as USD per million tokens.
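To illustrate why the input/output mix matters, the sketch below compares two hypothetical providers on an input-heavy and an output-heavy workload. The provider names, prices, and token volumes are made up for illustration and are not taken from the chart.

```python
# Hypothetical prices in USD per 1M tokens: (input_price, output_price).
PROVIDERS = {
    "provider_a": (0.10, 0.75),  # cheap input, expensive output
    "provider_b": (0.22, 0.59),  # pricier input, cheaper output
}

# Hypothetical workloads in millions of tokens: (input_m, output_m).
WORKLOADS = {
    "document_processing": (10.0, 0.5),  # input-weighted
    "generation": (1.0, 10.0),           # output-weighted
}


def workload_cost(prices, workload):
    """Total USD cost of a workload at the given input/output prices."""
    input_price, output_price = prices
    input_m, output_m = workload
    return input_m * input_price + output_m * output_price


def cheapest_provider(workload_name):
    """Name of the provider with the lowest total cost for the workload."""
    workload = WORKLOADS[workload_name]
    return min(PROVIDERS, key=lambda name: workload_cost(PROVIDERS[name], workload))


for name in WORKLOADS:
    print(name, "->", cheapest_provider(name))
```

With these made-up numbers the cheaper provider flips between the two workloads, which is why comparing on a single blended price can mislead for a specific use case.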

Output Speed: gpt-oss-120B (high)

Output Speed: Output Tokens per Second; 10,000 Input Tokens
Providers, in the order charted: Cerebras, SambaNova, Eigen AI, Nebius Fast, Clarifai, Groq, Google Vertex, Snowflake, Databricks, Amazon, Lightning AI, Baseten, Azure, Scaleway, Nebius Base, DeepInfra (Turbo), Weights & Biases, Cloudflare, Together.ai, Fireworks, DeepInfra, Novita, Parasail

Output speed: tokens per second received while the model is generating tokens (i.e., after the first chunk has been received from the API, for models that support streaming).

Figures represent performance of the model's first-party API (e.g. OpenAI for o1) or the median across providers where a first-party API is not available (e.g. Meta's Llama models).