Best Open Source Models — 2026 Rankings

Open Source LLM Leaderboard

The definitive ranking of every major open source model — compared across reasoning, coding, math, software engineering, and instruction following benchmarks.

Roshan Desai · Last updated: 2026-03-12

| Tier | Models (parameters) |
|------|---------------------|
| S | GLM-5 (744B), Kimi K2.5 (1T), MiniMax M2.5 (230B), DeepSeek V3.2 (685B), Step-3.5-Flash (196B), Qwen 3.5 (397B) |
| A | GLM-4.7 (355B), MiMo-V2-Flash (309B), DeepSeek R1 (671B), Qwen 3 235B (235B) |
| B | GPT-oss 120B (117B), Mistral Large (675B), Nemotron Ultra 253B (253B), Nemotron Super 49B (49B), Step3 (316B) |
| C | DeepSeek V3 (671B), Llama 4 Maverick (400B), Gemma 3 27B (27B), Nemotron Nano 30B (30B) |
| D | — |

Best Open Source Models by Task — Benchmark Rankings

Which open source LLM is best for coding, reasoning, or math? See how every model stacks up across key benchmarks.

Best Overall (MMLU): general knowledge across 57 subjects.

Best Overall Knowledge (MMLU-Pro): advanced knowledge across subjects.

Open Source Model Benchmark Scores

Complete benchmark results for every open source LLM, listed alphabetically. Scores are sourced from official tech reports; N/A marks benchmarks a model's report does not include.

| Model | Org | Params | Context | MMLU-Pro | GPQA Diamond | IFEval | Chatbot Arena | SWE-bench Verified | HumanEval | LiveCodeBench | AIME 2025 | MATH-500 | MMLU |
|-------|-----|--------|---------|----------|--------------|--------|---------------|--------------------|-----------|---------------|-----------|----------|------|
| DeepSeek R1 | DeepSeek | 671B | 128K | 84.0 | 71.5 | 83.3 | 1398 | 49.2 | 90.2 | 65.9 | 87.5 | 97.3 | 90.8 |
| DeepSeek V3 | DeepSeek | 671B | 128K | 81.2 | 68.4 | N/A | 1359 | 38.8 | N/A | 49.2 | N/A | 94.0 | 88.5 |
| DeepSeek V3.2 | DeepSeek | 685B | 130K | 85.0 | 79.9 | N/A | 1423 | 67.8 | N/A | 74.1 | 89.3 | N/A | 88.5 |
| Gemma 3 27B | Google | 27B | 128K | 67.5 | 42.4 | N/A | 1366 | N/A | N/A | 29.7 | N/A | 89.0 | N/A |
| GLM-4.7 | Zhipu AI | 355B | 200K | 84.3 | 85.7 | 88.0 | 1441 | 73.8 | 94.2 | 84.9 | 95.7 | N/A | 90.1 |
| GLM-5 | Zhipu AI | 744B | 200K | 70.4 | 86.0 | 88.0 | 1454 | 77.8 | 90.0 | 52.0 | 84.0 | 88.0 | 85.0 |
| GPT-oss 120B | OpenAI | 117B | 128K | 90.0 | 80.9 | N/A | 1355 | 62.4 | 88.3 | 60.0 | 97.9 | N/A | 90.0 |
| Kimi K2.5 | Moonshot | 1T | 262K | 87.1 | 87.6 | 94.0 | 1438 | 76.8 | 99.0 | 85.0 | 96.1 | 98.0 | 92.0 |
| Llama 4 Maverick | Meta | 400B | 1M | 80.5 | 69.8 | N/A | 1328 | N/A | 62.0 | 43.4 | N/A | N/A | 85.5 |
| MiMo-V2-Flash | Xiaomi | 309B | 262K | 84.9 | 83.7 | N/A | 1393 | 73.4 | 84.8 | 80.6 | 94.1 | N/A | 86.7 |
| MiniMax M2.5 | MiniMax | 230B | 205K | 76.5 | 85.2 | 87.5 | 1404 | 80.2 | 89.6 | 65.0 | 86.3 | N/A | 85.0 |
| Mistral Large | Mistral | 675B | 256K | N/A | 43.9 | N/A | 1416 | N/A | 92.0 | 82.8 | 88.0 | 93.6 | 85.5 |
| Nemotron Nano 30B | Nvidia | 30B | 1M | 78.1 | 78.1 | N/A | 1318 | N/A | N/A | N/A | N/A | N/A | N/A |
| Nemotron Super 49B | Nvidia | 49B | 128K | 79.5 | 72.0 | 88.6 | 1342 | N/A | N/A | 73.6 | 82.7 | 97.4 | N/A |
| Nemotron Ultra 253B | Nvidia | 253B | 128K | N/A | 76.0 | 89.5 | 1348 | N/A | N/A | 66.3 | 72.5 | 97.0 | N/A |
| Qwen 3 235B | Qwen | 235B | 262K | 84.4 | 81.1 | 87.8 | 1423 | N/A | N/A | 74.1 | 92.3 | N/A | N/A |
| Qwen 3.5 | Qwen | 397B | 262K | 87.8 | 88.4 | 92.6 | 1450 | 76.4 | N/A | 83.6 | N/A | N/A | 88.5 |
| Step-3.5-Flash | Stepfun | 196B | 262K | 85.8 | N/A | N/A | 1389 | 74.4 | 81.1 | 86.4 | 99.8 | N/A | N/A |
| Step3 | Stepfun | 316B | 66K | N/A | 73.0 | N/A | 1348 | N/A | N/A | 67.1 | 82.9 | N/A | N/A |

Compare Open Source LLMs Head-to-Head

Pick any two open source models to see how they stack up across all benchmarks. As an example, GLM-4.7 vs GLM-5:

| Benchmark | GLM-4.7 | GLM-5 |
|-----------|---------|-------|
| MMLU-Pro | 84.3 | 70.4 |
| GPQA Diamond | 85.7 | 86.0 |
| IFEval | 88.0 | 88.0 |
| Chatbot Arena | 1441 | 1454 |
| SWE-bench Verified | 73.8 | 77.8 |
| HumanEval | 94.2 | 90.0 |
| LiveCodeBench | 84.9 | 52.0 |
| AIME 2025 | 95.7 | 84.0 |
| MMLU | 90.1 | 85.0 |
| **Benchmarks won** | **5** | **3** |
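The "benchmarks won" tally is a count of strict wins, with ties (here, IFEval) counting for neither side. A quick sketch reproducing it, with the score pairs copied from the comparison above:

```python
# Head-to-head scores as (GLM-4.7, GLM-5) pairs. Chatbot Arena is an
# Elo rating rather than a percentage, but higher still wins.
scores = {
    "MMLU-Pro": (84.3, 70.4),
    "GPQA Diamond": (85.7, 86.0),
    "IFEval": (88.0, 88.0),  # tie: awarded to neither model
    "Chatbot Arena": (1441, 1454),
    "SWE-bench Verified": (73.8, 77.8),
    "HumanEval": (94.2, 90.0),
    "LiveCodeBench": (84.9, 52.0),
    "AIME 2025": (95.7, 84.0),
    "MMLU": (90.1, 85.0),
}

# Strict inequality means a tie adds to neither count.
wins_a = sum(a > b for a, b in scores.values())
wins_b = sum(b > a for a, b in scores.values())
print(f"GLM-4.7 wins {wins_a}, GLM-5 wins {wins_b}")  # 5 and 3
```

Note that a raw win count weights every benchmark equally; a 32.9-point gap on LiveCodeBench counts the same as a 0.3-point gap on GPQA Diamond, so it is a coarse summary at best.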

Try These Open Source Models in Onyx

Onyx is the open-source AI platform that lets you connect any of these open source LLMs to your team's docs, apps, and people.