Best Open Source Models — 2026 Rankings

Open Source LLM Leaderboard

The definitive ranking of every major open source model — compared across reasoning, coding, math, software engineering, and instruction following benchmarks.

Roshan Desai · Last updated: 2026-03-12

| Tier | Models (parameters) |
|------|---------------------|
| S | GLM-5 (744B), Kimi K2.5 (1T), MiniMax M2.5 (230B), DeepSeek V3.2 (685B), Step-3.5-Flash (196B), Qwen 3.5 (397B) |
| A | GLM-4.7 (355B), MiMo-V2-Flash (309B), DeepSeek R1 (671B), Qwen 3 235B (235B) |
| B | GPT-oss 120B (117B), Mistral Large (675B), Nemotron Ultra 253B (253B), Nemotron Super 49B (49B), Step3 (316B) |
| C | DeepSeek V3 (671B), Llama 4 Maverick (400B), Gemma 3 27B (27B), Nemotron Nano 30B (30B) |
| D | — |

Best Open Source Models by Task — Benchmark Rankings

Which open source LLM is best for coding, reasoning, or math? See how every model stacks up across key benchmarks.

Best Overall (MMLU): general knowledge across 57 subjects.

Best Overall Knowledge (MMLU-Pro): advanced knowledge across subjects.

Open Source Model Benchmark Scores

Complete benchmark results for every open source LLM, listed alphabetically. Scores are sourced from official tech reports; N/A marks benchmarks a model's report does not include.

| Model | Org | Params | Context | MMLU-Pro | GPQA Diamond | IFEval | Chatbot Arena | SWE-bench Verified | HumanEval | LiveCodeBench | AIME 2025 | MATH-500 | MMLU |
|-------|-----|--------|---------|----------|--------------|--------|---------------|--------------------|-----------|---------------|-----------|----------|------|
| DeepSeek R1 | DeepSeek | 671B | 128K | 84.0 | 71.5 | 83.3 | 1398 | 49.2 | 90.2 | 65.9 | 87.5 | 97.3 | 90.8 |
| DeepSeek V3 | DeepSeek | 671B | 128K | 81.2 | 68.4 | N/A | 1359 | 38.8 | N/A | 49.2 | N/A | 94.0 | 88.5 |
| DeepSeek V3.2 | DeepSeek | 685B | 130K | 85.0 | 79.9 | N/A | 1423 | 67.8 | N/A | 74.1 | 89.3 | N/A | 88.5 |
| Gemma 3 27B | Google | 27B | 128K | 67.5 | 42.4 | N/A | 1366 | N/A | N/A | 29.7 | N/A | 89.0 | N/A |
| GLM-4.7 | Zhipu AI | 355B | 200K | 84.3 | 85.7 | 88.0 | 1441 | 73.8 | 94.2 | 84.9 | 95.7 | N/A | 90.1 |
| GLM-5 | Zhipu AI | 744B | 200K | 70.4 | 86.0 | 88.0 | 1454 | 77.8 | 90.0 | 52.0 | 84.0 | 88.0 | 85.0 |
| GPT-oss 120B | OpenAI | 117B | 128K | 90.0 | 80.9 | N/A | 1355 | 62.4 | 88.3 | 60.0 | 97.9 | N/A | 90.0 |
| Kimi K2.5 | Moonshot | 1T | 262K | 87.1 | 87.6 | 94.0 | 1438 | 76.8 | 99.0 | 85.0 | 96.1 | 98.0 | 92.0 |
| Llama 4 Maverick | Meta | 400B | 1M | 80.5 | 69.8 | N/A | 1328 | N/A | 62.0 | 43.4 | N/A | N/A | 85.5 |
| MiMo-V2-Flash | Xiaomi | 309B | 262K | 84.9 | 83.7 | N/A | 1393 | 73.4 | 84.8 | 80.6 | 94.1 | N/A | 86.7 |
| MiniMax M2.5 | MiniMax | 230B | 205K | 76.5 | 85.2 | 87.5 | 1404 | 80.2 | 89.6 | 65.0 | 86.3 | N/A | 85.0 |
| Mistral Large | Mistral | 675B | 256K | N/A | 43.9 | N/A | 1416 | N/A | 92.0 | 82.8 | 88.0 | 93.6 | 85.5 |
| Nemotron Nano 30B | Nvidia | 30B | 1M | 78.1 | 78.1 | N/A | 1318 | N/A | N/A | N/A | N/A | N/A | N/A |
| Nemotron Super 49B | Nvidia | 49B | 128K | 79.5 | 72.0 | 88.6 | 1342 | N/A | N/A | 73.6 | 82.7 | 97.4 | N/A |
| Nemotron Ultra 253B | Nvidia | 253B | 128K | N/A | 76.0 | 89.5 | 1348 | N/A | N/A | 66.3 | 72.5 | 97.0 | N/A |
| Qwen 3 235B | Qwen | 235B | 262K | 84.4 | 81.1 | 87.8 | 1423 | N/A | N/A | 74.1 | 92.3 | N/A | N/A |
| Qwen 3.5 | Qwen | 397B | 262K | 87.8 | 88.4 | 92.6 | 1450 | 76.4 | N/A | 83.6 | N/A | N/A | 88.5 |
| Step-3.5-Flash | Stepfun | 196B | 262K | 85.8 | N/A | N/A | 1389 | 74.4 | 81.1 | 86.4 | 99.8 | N/A | N/A |
| Step3 | Stepfun | 316B | 66K | N/A | 73.0 | N/A | 1348 | N/A | N/A | 67.1 | 82.9 | N/A | N/A |

Compare Open Source LLMs Head-to-Head

Pick any two open source models to see how they stack up across all benchmarks. As an example, GLM-4.7 vs GLM-5:

| Benchmark | GLM-4.7 | GLM-5 |
|-----------|---------|-------|
| MMLU-Pro | 84.3 | 70.4 |
| GPQA Diamond | 85.7 | 86.0 |
| IFEval | 88.0 | 88.0 |
| Chatbot Arena | 1441 | 1454 |
| SWE-bench Verified | 73.8 | 77.8 |
| HumanEval | 94.2 | 90.0 |
| LiveCodeBench | 84.9 | 52.0 |
| AIME 2025 | 95.7 | 84.0 |
| MMLU | 90.1 | 85.0 |
| **Benchmarks won** | **5** | **3** |
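The "benchmarks won" tally is a count of strict wins, with ties (here, IFEval) counting for neither side. A quick sketch reproducing it, with the score pairs copied from the comparison above:

```python
# Head-to-head scores as (GLM-4.7, GLM-5) pairs. Chatbot Arena is an
# Elo rating rather than a percentage, but higher still wins.
scores = {
    "MMLU-Pro": (84.3, 70.4),
    "GPQA Diamond": (85.7, 86.0),
    "IFEval": (88.0, 88.0),  # tie: awarded to neither model
    "Chatbot Arena": (1441, 1454),
    "SWE-bench Verified": (73.8, 77.8),
    "HumanEval": (94.2, 90.0),
    "LiveCodeBench": (84.9, 52.0),
    "AIME 2025": (95.7, 84.0),
    "MMLU": (90.1, 85.0),
}

# Strict inequality means a tie adds to neither count.
wins_a = sum(a > b for a, b in scores.values())
wins_b = sum(b > a for a, b in scores.values())
print(f"GLM-4.7 wins {wins_a}, GLM-5 wins {wins_b}")  # 5 and 3
```

Note that a raw win count weights every benchmark equally; a 32.9-point gap on LiveCodeBench counts the same as a 0.3-point gap on GPQA Diamond, so it is a coarse summary at best.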

Try These Open Source Models in Onyx

Onyx is the open-source AI platform that lets you connect any of these open source LLMs to your team's docs, apps, and people.