Best Open Source Models — 2026 Rankings
Open Source LLM Leaderboard
The definitive ranking of every major open source model — compared across reasoning, coding, math, software engineering, and instruction following benchmarks.
Last updated: 2026-02-18
Tier S: GLM-4.7 (355B), GLM-5 (744B), Kimi K2.5 (1T), MiniMax M2.5 (230B), DeepSeek V3.2 (685B), Step-3.5-Flash (196B)
Tier A: Qwen 3.5 (397B), MiMo-V2-Flash (309B), DeepSeek R1 (671B), Qwen 3 235B (235B)
Tier B: GPT-oss 120B (117B), Mistral Large (675B), Nemotron Ultra 253B (253B), Nemotron Super 49B (49B), Step3 (316B)
Tier C: DeepSeek V3 (671B), Llama 4 Maverick (400B), Gemma 3 27B (27B), Nemotron Nano 30B (30B)
Tier D: (none listed)
Best Open Source Models by Task — Benchmark Rankings
Which open source LLM is best for coding, reasoning, or math? See how every model stacks up across key benchmarks.
[Chart] Best Overall (MMLU): general knowledge across 57 subjects.
[Chart] Best Overall Knowledge (MMLU-Pro): advanced knowledge across subjects.
Open Source Model Benchmark Scores
Complete benchmark results for every open source LLM. Scores sourced from official tech reports.
Model | Organization | Parameters | Context | MMLU | MMLU-Pro | HumanEval | SWE-bench Verified | LiveCodeBench | AIME 2025 | GPQA Diamond | (unlabeled) | Chatbot Arena | IFEval |
DeepSeek R1 | DeepSeek | 671B | 128K | 90.8 | 84.0 | 90.2 | N/A | 65.9 | 87.5 | 71.5 | 97.3 | 1398 | 83.3 |
DeepSeek V3 | DeepSeek | 671B | 128K | 88.5 | 81.2 | N/A | 38.8 | 49.2 | N/A | 68.4 | 94.0 | 1359 | N/A |
DeepSeek V3.2 | DeepSeek | 685B | 130K | 88.5 | 85.0 | N/A | 67.8 | 74.1 | 89.3 | 79.9 | N/A | 1421 | N/A |
Gemma 3 27B | Google | 27B | 128K | N/A | 67.5 | N/A | N/A | 29.7 | N/A | 42.4 | 89.0 | 1365 | N/A |
GLM-4.7 | Zhipu AI | 355B | 200K | 90.1 | 84.3 | 94.2 | 73.8 | 84.9 | 95.7 | 85.7 | N/A | 1445 | 88.0 |
GLM-5 | Zhipu AI | 744B | 200K | 85.0 | 70.4 | 90.0 | 77.8 | 52.0 | 84.0 | 86.0 | 88.0 | 1451 | 88.0 |
GPT-oss 120B | OpenAI | 117B | 128K | 90.0 | 90.0 | N/A | 62.4 | 60.0 | N/A | 80.9 | N/A | 1354 | N/A |
Kimi K2.5 | Moonshot | 1T | 262K | 92.0 | 87.1 | 99.0 | 76.8 | 85.0 | 96.1 | 87.6 | 98.0 | 1447 | 94.0 |
Llama 4 Maverick | Meta | 400B | 1M | 85.5 | 80.5 | 62.0 | N/A | 43.4 | N/A | 69.8 | N/A | 1328 | N/A |
MiMo-V2-Flash | Xiaomi | 309B | 262K | 86.7 | 84.9 | 84.8 | 73.4 | 80.6 | 94.1 | 83.7 | N/A | 1401 | N/A |
MiniMax M2.5 | MiniMax | 230B | 205K | 85.0 | 76.5 | 89.6 | 80.2 | 65.0 | 86.3 | 85.2 | N/A | N/A | 87.5 |
Mistral Large | Mistral | 675B | 256K | 85.5 | N/A | 92.0 | N/A | 82.8 | 88.0 | 43.9 | 93.6 | 1416 | N/A |
Nemotron Nano 30B | Nvidia | 30B | 1M | N/A | 78.1 | N/A | N/A | N/A | N/A | 78.1 | N/A | N/A | N/A |
Nemotron Super 49B | Nvidia | 49B | 128K | N/A | 79.5 | N/A | N/A | 73.6 | 82.7 | 72.0 | 97.4 | N/A | 88.6 |
Nemotron Ultra 253B | Nvidia | 253B | 128K | N/A | N/A | N/A | N/A | 66.3 | 72.5 | 76.0 | 97.0 | 1347 | 89.5 |
Qwen 3 235B | Qwen | 235B | 262K | N/A | 84.4 | N/A | N/A | 74.1 | 92.3 | 81.1 | N/A | 1422 | 87.8 |
Qwen 3.5 | Qwen | 397B | 262K | 88.5 | 87.8 | N/A | 76.4 | 83.6 | N/A | 88.4 | N/A | N/A | 92.6 |
Step-3.5-Flash | Stepfun | 196B | 256K | N/A | N/A | N/A | 74.4 | 86.4 | 97.3 | N/A | N/A | N/A | N/A |
Step3 | Stepfun | 316B | 66K | N/A | N/A | N/A | N/A | 67.1 | 82.9 | 73.0 | N/A | 1360 | N/A |
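Ranking by a single benchmark means deciding what to do with N/A cells. The sketch below is one possible approach, not the site's own sorting logic: it copies a handful of rows from the table above, treats N/A as missing, and drops those models from the ranking instead of scoring them as zero. The names `ROWS`, `parse_score`, and `rank_by` are hypothetical, and the assumption that the two copied columns are SWE-bench Verified and AIME 2025 rests on matching the table values against the head-to-head comparison.

```python
from typing import Optional

# A few rows copied from the table above, kept as the strings the table shows:
# (model, SWE-bench Verified, AIME 2025). Column identity is an assumption.
ROWS = [
    ("DeepSeek R1", "N/A", "87.5"),
    ("DeepSeek V3.2", "67.8", "89.3"),
    ("GLM-4.7", "73.8", "95.7"),
    ("GLM-5", "77.8", "84.0"),
    ("Kimi K2.5", "76.8", "96.1"),
    ("MiniMax M2.5", "80.2", "86.3"),
]


def parse_score(cell: str) -> Optional[float]:
    """Convert a table cell to a float, mapping "N/A" to None."""
    cell = cell.strip()
    return None if cell == "N/A" else float(cell)


def rank_by(rows, column: int):
    """Return (model, score) pairs sorted best-first, skipping missing scores."""
    scored = [(row[0], parse_score(row[column])) for row in rows]
    present = [(model, score) for model, score in scored if score is not None]
    return sorted(present, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Column 1 is the SWE-bench Verified column in ROWS.
    for position, (model, score) in enumerate(rank_by(ROWS, 1), start=1):
        print(f"{position}. {model}: {score}")
```

Dropping missing scores keeps a model with no reported result from appearing to rank last; whether that is the right choice depends on how the ranking will be read.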
Compare Open Source LLMs Head-to-Head
Select two open source models to see how they stack up across all benchmarks.
Benchmark | GLM-4.7 (Model A) | GLM-5 (Model B) |
MMLU | 90.1 | 85.0 |
MMLU-Pro | 84.3 | 70.4 |
HumanEval | 94.2 | 90.0 |
SWE-bench Verified | 73.8 | 77.8 |
LiveCodeBench | 84.9 | 52.0 |
AIME 2025 | 95.7 | 84.0 |
GPQA Diamond | 85.7 | 86.0 |
Chatbot Arena | 1445 | 1451 |
IFEval | 88.0 | 88.0 |
Benchmarks won | 5 | 3 |
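The 5 vs 3 tally can be reproduced from the nine scores listed in the comparison. The sketch below assumes two things the page does not state outright: a higher score is better on every benchmark, and a tie (IFEval, 88.0 each) counts for neither model. The names `GLM_4_7`, `GLM_5`, and `count_wins` are hypothetical.

```python
# Scores copied from the GLM-4.7 vs GLM-5 comparison above.
GLM_4_7 = {
    "MMLU": 90.1, "MMLU-Pro": 84.3, "HumanEval": 94.2,
    "SWE-bench Verified": 73.8, "LiveCodeBench": 84.9, "AIME 2025": 95.7,
    "GPQA Diamond": 85.7, "Chatbot Arena": 1445, "IFEval": 88.0,
}
GLM_5 = {
    "MMLU": 85.0, "MMLU-Pro": 70.4, "HumanEval": 90.0,
    "SWE-bench Verified": 77.8, "LiveCodeBench": 52.0, "AIME 2025": 84.0,
    "GPQA Diamond": 86.0, "Chatbot Arena": 1451, "IFEval": 88.0,
}


def count_wins(a: dict, b: dict) -> tuple[int, int, int]:
    """Return (wins for a, wins for b, ties) over benchmarks both models report.

    Assumes higher is better everywhere and that ties count for neither side.
    """
    wins_a = wins_b = ties = 0
    for benchmark, score_a in a.items():
        if benchmark not in b:
            continue  # skip benchmarks the other model has no score for
        score_b = b[benchmark]
        if score_a > score_b:
            wins_a += 1
        elif score_b > score_a:
            wins_b += 1
        else:
            ties += 1
    return wins_a, wins_b, ties


if __name__ == "__main__":
    print(count_wins(GLM_4_7, GLM_5))
```

Under these assumptions GLM-4.7 takes MMLU, MMLU-Pro, HumanEval, LiveCodeBench, and AIME 2025, while GLM-5 takes SWE-bench Verified, GPQA Diamond, and Chatbot Arena, which matches the tally shown.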
Try These Open Source Models in Onyx
Onyx is the open-source AI platform that lets you connect any of these open source LLMs to your team's docs, apps, and people.