Self-Hosted LLMs — 2026 Rankings

Self-Hosted LLM Leaderboard

The definitive ranking of self-hostable LLMs for enterprise — compared across quality, speed, hardware requirements, and cost. Find the best open-weight model for your infrastructure.

Roshan Desai · Last updated: 2026-03-12

S Tier: Kimi K2.5 (1T), GLM-5 (744B), MiniMax M2.5 (230B), Qwen 3.5 (397B)

A Tier: DeepSeek R1 (671B), GLM-4.7 (355B), Mistral Large 3 (675B), GPT-oss 120B (117B), DeepSeek V3.2 (685B), Step-3.5-Flash (196B), MiMo-V2-Flash (309B), Qwen3.5-9B (9B), Qwen3.5-4B (4B), Qwen3-Coder-Next (80B)

B Tier: Llama 4 Maverick (400B), Nemotron Ultra 253B (253B), Qwen3-235B-A22B (235B), Hunyuan 2.0 (406B), GPT-oss 20B (20B), Llama 4 Scout (109B)

C Tier: Llama 3.3 70B (70B), DS-R1-Distill-Llama-70B (70B), Qwen 2.5-72B (72B), Gemma 3 27B (27B), DS-R1-Distill-Qwen-32B (32B), Command R+ (104B), Devstral-2-123B (123B)

D Tier: Mistral Small 3.1 (24B), Phi-4 (14B), Llama 3.1-8B (8B), Qwen3-30B-A3B (30B), Gemma 3 12B (12B), DS-R1-Distill-Qwen-14B (14B), DS-R1-Distill-Qwen-7B (7B), Phi-4-mini (3.8B)

Best Self-Hosted LLMs by Task — Benchmark Rankings

Which self-hosted model is best for coding, reasoning, or agentic tasks? The rankings compare every open-weight model across four benchmark categories:

- Advanced Knowledge: MMLU-Pro, advanced knowledge questions in a harder 10-option format
- Graduate Reasoning: GPQA Diamond, PhD-level science reasoning
- Instruction Following: IFEval, instruction-following accuracy
- Chatbot Arena: LMArena, crowdsourced Elo from human preference votes

Self-Hosted LLM Benchmark Scores & Hardware Requirements

Complete benchmark results, VRAM requirements, and licensing for every major self-hostable LLM, sorted alphabetically.

| Model | Provider | Params | Context | License | VRAM (INT4) | VRAM (FP16) | MMLU-Pro | GPQA Diamond | IFEval | Arena Elo | SWE-bench Verified | HumanEval | LiveCodeBench | AIME 2025 | MATH-500 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Command R+ | Cohere | 104B | 131K | CC-BY-NC | 55 GB | 208 GB | N/A | N/A | N/A | 1262 | N/A | N/A | N/A | N/A | N/A |
| DeepSeek R1 | DeepSeek | 671B | 128K | MIT | 351 GB | 1340 GB | 84.0 | 71.5 | 83.3 | 1398 | 49.2 | 90.2 | 65.9 | 87.5 | 97.3 |
| DeepSeek V3.2 | DeepSeek | 685B | 130K | N/A | 351 GB | 1367 GB | 85.0 | 79.9 | N/A | 1423 | 67.8 | N/A | 74.1 | 89.3 | N/A |
| Devstral-2-123B | Mistral | 123B | 256K | Modified MIT | 65 GB | 246 GB | N/A | N/A | N/A | N/A | 72.2 | N/A | N/A | N/A | N/A |
| DS-R1-Distill-Llama-70B | DeepSeek | 70B | 128K | MIT | 36 GB | 140 GB | N/A | 65.2 | N/A | N/A | N/A | 86.0 | 57.5 | 70.0 | 94.5 |
| DS-R1-Distill-Qwen-14B | DeepSeek | 14B | 128K | MIT | 8 GB | 28 GB | N/A | 59.1 | N/A | N/A | N/A | N/A | 53.1 | N/A | 93.9 |
| DS-R1-Distill-Qwen-32B | DeepSeek | 32B | 128K | MIT | 17 GB | 64 GB | N/A | 62.1 | N/A | N/A | N/A | 85.4 | 53.1 | 72.0 | 94.3 |
| DS-R1-Distill-Qwen-7B | DeepSeek | 7B | 128K | MIT | 4 GB | 14 GB | N/A | 49.1 | N/A | N/A | N/A | N/A | N/A | N/A | 92.8 |
| Gemma 3 12B | Google | 12B | 128K | Gemma License | 8 GB | 24 GB | 60.0 | 40.9 | N/A | 1342 | N/A | 85.4 | N/A | N/A | N/A |
| Gemma 3 27B | Google | 27B | 128K | Gemma License | 14 GB | 54 GB | 67.5 | 42.4 | N/A | 1366 | N/A | N/A | 29.7 | N/A | 89.0 |
| GLM-4.7 | Zhipu AI | 355B | 200K | MIT | 180 GB | 710 GB | 84.3 | 85.7 | 88.0 | 1441 | 73.8 | 94.2 | 84.9 | 95.7 | N/A |
| GLM-5 | Zhipu AI | 744B | 200K | MIT | 386 GB | 1490 GB | 70.4 | 86.0 | 88.0 | 1454 | 77.8 | 90.0 | 52.0 | 84.0 | 88.0 |
| GPT-oss 120B | OpenAI | 117B | 128K | Apache 2.0 | 62 GB | 234 GB | 90.0 | 80.9 | N/A | 1355 | 62.4 | 88.3 | 60.0 | 97.9 | N/A |
| GPT-oss 20B | OpenAI | 20B | 128K | Apache 2.0 | 11 GB | 40 GB | 85.3 | 71.5 | N/A | 1318 | N/A | N/A | N/A | 98.7 | N/A |
| Hunyuan 2.0 | Tencent | 406B | 256K | Tencent License | 215 GB | 812 GB | N/A | N/A | N/A | N/A | 53.0 | N/A | N/A | N/A | N/A |
| Kimi K2.5 | Moonshot | 1T | 262K | MIT | 542 GB | 2000 GB | 87.1 | 87.6 | 94.0 | 1438 | 76.8 | 99.0 | 85.0 | 96.1 | 98.0 |
| Llama 3.1-8B | Meta | 8B | 131K | Llama License | 5 GB | 16 GB | 48.3 | 32.8 | 80.4 | 1212 | N/A | 72.6 | N/A | N/A | 51.9 |
| Llama 3.3 70B | Meta | 70B | 131K | Llama License | 38 GB | 140 GB | 68.9 | 50.7 | 92.1 | 1319 | N/A | 88.4 | N/A | N/A | 77.0 |
| Llama 4 Maverick | Meta | 400B | 1M | Llama License | 206 GB | 800 GB | 80.5 | 69.8 | N/A | 1328 | N/A | 62.0 | 43.4 | N/A | N/A |
| Llama 4 Scout | Meta | 109B | 10M | Llama License | 58 GB | 218 GB | 74.3 | 58.2 | N/A | 1323 | N/A | N/A | N/A | N/A | N/A |
| MiMo-V2-Flash | Xiaomi | 309B | 262K | MIT | 159 GB | 618 GB | 84.9 | 83.7 | N/A | 1393 | 73.4 | 84.8 | 80.6 | 94.1 | N/A |
| MiniMax M2.5 | MiniMax | 230B | 205K | Apache 2.0 | 117 GB | 460 GB | 76.5 | 85.2 | 87.5 | 1404 | 80.2 | 89.6 | 65.0 | 86.3 | N/A |
| Mistral Large 3 | Mistral | 675B | 256K | Apache 2.0 | 355 GB | 1350 GB | N/A | 43.9 | N/A | 1416 | N/A | 92.0 | 82.8 | 88.0 | 93.6 |
| Mistral Small 3.1 | Mistral | 24B | 131K | Apache 2.0 | 14 GB | 48 GB | 66.8 | 40.7 | 79.8 | 1304 | N/A | 87.2 | N/A | N/A | N/A |
| Nemotron Ultra 253B | Nvidia | 253B | 128K | Open Weight | 135 GB | 506 GB | N/A | 76.0 | 89.5 | 1348 | N/A | N/A | 66.3 | 72.5 | 97.0 |
| Phi-4 | Microsoft | 14B | 16K | MIT | 9 GB | 28 GB | 70.4 | 56.1 | 64.6 | 1256 | N/A | 82.6 | N/A | N/A | 80.4 |
| Phi-4-mini | Microsoft | 3.8B | 131K | MIT | 3 GB | 8 GB | 52.8 | 30.4 | N/A | N/A | N/A | 72.0 | N/A | N/A | N/A |
| Qwen 2.5-72B | Qwen | 72B | 131K | Apache 2.0 | 37 GB | 145 GB | 71.1 | 49.0 | 86.5 | 1303 | N/A | 86.6 | N/A | N/A | 83.1 |
| Qwen 3.5 | Qwen | 397B | 262K | Apache 2.0 | 207 GB | 794 GB | 87.8 | 88.4 | 92.6 | 1450 | 76.4 | N/A | 83.6 | N/A | N/A |
| Qwen3-235B-A22B | Qwen | 235B | 131K | Apache 2.0 | 120 GB | 470 GB | N/A | 71.1 | N/A | 1423 | N/A | N/A | 70.7 | 81.5 | N/A |
| Qwen3-30B-A3B | Qwen | 30B | 131K | Apache 2.0 | 16 GB | 60 GB | 68.7 | 60.0 | N/A | 1384 | N/A | N/A | N/A | 76.7 | 95.2 |
| Qwen3-Coder-Next | Qwen | 80B | 256K | Apache 2.0 | 42 GB | 160 GB | 78.4 | 53.4 | 89.1 | N/A | 70.6 | 94.1 | 74.5 | 89.2 | 83.5 |
| Qwen3.5-4B | Qwen | 4B | 262K | Apache 2.0 | 2 GB | 8 GB | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| Qwen3.5-9B | Qwen | 9B | 262K | Apache 2.0 | 5 GB | 18 GB | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| Step-3.5-Flash | Stepfun | 196B | 262K | Apache 2.0 | 102 GB | 392 GB | 85.8 | N/A | N/A | 1389 | 74.4 | 81.1 | 86.4 | 99.8 | N/A |

VRAM estimates are based on model weight size only: FP16 uses 2 bytes per parameter (e.g. 70B model = 140 GB), INT4 uses 0.5 bytes per parameter (e.g. 70B model = 35 GB). Actual usage is typically 10–20% higher due to KV cache, activations, and framework overhead. Tools like Ollama default to 4-bit quantization, so real-world usage is often closer to the INT4 figure.
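That arithmetic is easy to reproduce yourself. Below is a minimal sketch of the estimate; the function name and the 1.15 overhead multiplier are illustrative assumptions, not part of any particular tool:

```python
def estimate_vram_gb(params_b: float, bytes_per_param: float = 2.0,
                     overhead: float = 1.15) -> float:
    """Rough VRAM estimate: weight size plus runtime overhead.

    bytes_per_param: 2.0 for FP16/BF16, 0.5 for INT4 (4-bit) quantization.
    overhead: assumed multiplier for KV cache, activations, and framework
    overhead (the 10-20% noted above; 1.15 is a midpoint guess).
    """
    return params_b * bytes_per_param * overhead

# Example: a 70B model, matching the figures in the note above
print(f"FP16: ~{estimate_vram_gb(70, 2.0):.0f} GB")  # 140 GB weights -> ~161 GB in practice
print(f"INT4: ~{estimate_vram_gb(70, 0.5):.0f} GB")  # 35 GB weights -> ~40 GB in practice
```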

Compare Self-Hosted LLMs Head-to-Head

Any two models can be compared across all shared benchmarks. As an example, here is DeepSeek R1 against Qwen 3.5:

| Benchmark | DeepSeek R1 | Qwen 3.5 |
| --- | --- | --- |
| MMLU-Pro | 84.0 | 87.8 |
| GPQA Diamond | 71.5 | 88.4 |
| IFEval | 83.3 | 92.6 |
| Chatbot Arena | 1398 | 1450 |
| SWE-bench Verified | 49.2 | 76.4 |
| LiveCodeBench | 65.9 | 83.6 |
| Benchmarks won | 0 | 6 |
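The "benchmarks won" tally is nothing more than counting, per shared benchmark, which model scores higher. A minimal sketch of that logic, using the two score sets above (the dict layout is illustrative):

```python
# Scores from the head-to-head above; higher is better on every benchmark.
deepseek_r1 = {"MMLU-Pro": 84.0, "GPQA Diamond": 71.5, "IFEval": 83.3,
               "Chatbot Arena": 1398, "SWE-bench Verified": 49.2,
               "LiveCodeBench": 65.9}
qwen_3_5 = {"MMLU-Pro": 87.8, "GPQA Diamond": 88.4, "IFEval": 92.6,
            "Chatbot Arena": 1450, "SWE-bench Verified": 76.4,
            "LiveCodeBench": 83.6}

def benchmarks_won(a: dict, b: dict) -> tuple[int, int]:
    """Count benchmarks where each model strictly beats the other.

    Only benchmarks reported for both models are compared;
    ties count for neither side.
    """
    wins_a = wins_b = 0
    for bench in a.keys() & b.keys():
        if a[bench] > b[bench]:
            wins_a += 1
        elif b[bench] > a[bench]:
            wins_b += 1
    return wins_a, wins_b

print(benchmarks_won(deepseek_r1, qwen_3_5))  # -> (0, 6)
```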

Deploy These Models with Onyx

Onyx is the open-source AI platform that lets you self-host any of these LLMs and connect them to your team's docs, apps, and people.