Self-Hosted LLMs — 2026 Rankings
The definitive ranking of self-hostable LLMs for enterprise — compared across quality, speed, hardware requirements, and cost. Find the best open-weight model for your infrastructure.
Roshan Desai · Last updated: 2026-03-12
LLM Leaderboard (All Models)
S Tier: Kimi K2.5 (1T), GLM-5 (744B), MiniMax M2.5 (230B), Qwen 3.5 (397B)
A Tier: DeepSeek R1 (671B), GLM-4.7 (355B), Mistral Large 3 (675B), GPT-oss 120B (117B), DeepSeek V3.2 (685B), Step-3.5-Flash (196B), MiMo-V2-Flash (309B), Qwen3.5-9B (9B), Qwen3.5-4B (4B), Qwen3-Coder-Next (80B)
B Tier: Llama 4 Maverick (400B), Nemotron Ultra 253B (253B), Qwen3-235B-A22B (235B), Hunyuan 2.0 (406B), GPT-oss 20B (20B), Llama 4 Scout (109B)
C Tier: Llama 3.3 70B (70B), DS-R1-Distill-Llama-70B (70B), Qwen 2.5-72B (72B), Gemma 3 27B (27B), DS-R1-Distill-Qwen-32B (32B), Command R+ (104B), Devstral-2-123B (123B)
D Tier: Mistral Small 3.1 (24B), Phi-4 (14B), Llama 3.1-8B (8B), Qwen3-30B-A3B (30B), Gemma 3 12B (12B), DS-R1-Distill-Qwen-14B (14B), DS-R1-Distill-Qwen-7B (7B), Phi-4-mini (3.8B)
Best Self-Hosted LLMs by Task — Benchmark Rankings
Which self-hosted model is best for coding, reasoning, or agentic tasks? The rankings below show how every open-weight model stacks up on each benchmark.
Best at Advanced Knowledge: advanced knowledge with a harder 10-option question format (MMLU-Pro)
Best at Graduate Reasoning: PhD-level science reasoning (GPQA Diamond)
Best at Instruction Following: instruction-following accuracy (IFEval)
Chatbot Arena Rankings: crowdsourced Elo from human preference votes (LMArena)
Self-Hosted LLM Benchmark Scores & Hardware Requirements
Complete benchmark results, VRAM requirements, and licensing for every major self-hostable LLM.
Model (Developer) | Params | Context | License | VRAM (INT4) | VRAM (FP16) | MMLU-Pro | GPQA Diamond | IFEval | Chatbot Arena | SWE-bench Verified | HumanEval | LiveCodeBench | AIME | MATH-500 |
Command R+ Cohere | 104B | 131K | CC-BY-NC | 55 GB | 208 GB | N/A | N/A | N/A | 1262 | N/A | N/A | N/A | N/A | N/A |
DeepSeek R1 DeepSeek | 671B | 128K | MIT | 351 GB | 1340 GB | 84.0 | 71.5 | 83.3 | 1398 | 49.2 | 90.2 | 65.9 | 87.5 | 97.3 |
DeepSeek V3.2 DeepSeek | 685B | 130K | N/A | 351 GB | 1367 GB | 85.0 | 79.9 | N/A | 1423 | 67.8 | N/A | 74.1 | 89.3 | N/A |
Devstral-2-123B Mistral | 123B | 256K | Modified MIT | 65 GB | 246 GB | N/A | N/A | N/A | N/A | 72.2 | N/A | N/A | N/A | N/A |
DS-R1-Distill-Llama-70B DeepSeek | 70B | 128K | MIT | 36 GB | 140 GB | N/A | 65.2 | N/A | N/A | N/A | 86.0 | 57.5 | 70.0 | 94.5 |
DS-R1-Distill-Qwen-14B DeepSeek | 14B | 128K | MIT | 8 GB | 28 GB | N/A | 59.1 | N/A | N/A | N/A | N/A | 53.1 | N/A | 93.9 |
DS-R1-Distill-Qwen-32B DeepSeek | 32B | 128K | MIT | 17 GB | 64 GB | N/A | 62.1 | N/A | N/A | N/A | 85.4 | 53.1 | 72.0 | 94.3 |
DS-R1-Distill-Qwen-7B DeepSeek | 7B | 128K | MIT | 4 GB | 14 GB | N/A | 49.1 | N/A | N/A | N/A | N/A | N/A | N/A | 92.8 |
Gemma 3 12B Google | 12B | 128K | Gemma License | 8 GB | 24 GB | 60.0 | 40.9 | N/A | 1342 | N/A | 85.4 | N/A | N/A | N/A |
Gemma 3 27B Google | 27B | 128K | Gemma License | 14 GB | 54 GB | 67.5 | 42.4 | N/A | 1366 | N/A | N/A | 29.7 | N/A | 89.0 |
GLM-4.7 Zhipu AI | 355B | 200K | MIT | 180 GB | 710 GB | 84.3 | 85.7 | 88.0 | 1441 | 73.8 | 94.2 | 84.9 | 95.7 | N/A |
GLM-5 Zhipu AI | 744B | 200K | MIT | 386 GB | 1490 GB | 70.4 | 86.0 | 88.0 | 1454 | 77.8 | 90.0 | 52.0 | 84.0 | 88.0 |
GPT-oss 120B OpenAI | 117B | 128K | Apache 2.0 | 62 GB | 234 GB | 90.0 | 80.9 | N/A | 1355 | 62.4 | 88.3 | 60.0 | 97.9 | N/A |
GPT-oss 20B OpenAI | 20B | 128K | Apache 2.0 | 11 GB | 40 GB | 85.3 | 71.5 | N/A | 1318 | N/A | N/A | N/A | 98.7 | N/A |
Hunyuan 2.0 Tencent | 406B | 256K | Tencent License | 215 GB | 812 GB | N/A | N/A | N/A | N/A | 53.0 | N/A | N/A | N/A | N/A |
Kimi K2.5 Moonshot | 1T | 262K | MIT | 542 GB | 2000 GB | 87.1 | 87.6 | 94.0 | 1438 | 76.8 | 99.0 | 85.0 | 96.1 | 98.0 |
Llama 3.1-8B Meta | 8B | 131K | Llama License | 5 GB | 16 GB | 48.3 | 32.8 | 80.4 | 1212 | N/A | 72.6 | N/A | N/A | 51.9 |
Llama 3.3 70B Meta | 70B | 131K | Llama License | 38 GB | 140 GB | 68.9 | 50.7 | 92.1 | 1319 | N/A | 88.4 | N/A | N/A | 77.0 |
Llama 4 Maverick Meta | 400B | 1M | Llama License | 206 GB | 800 GB | 80.5 | 69.8 | N/A | 1328 | N/A | 62.0 | 43.4 | N/A | N/A |
Llama 4 Scout Meta | 109B | 10M | Llama License | 58 GB | 218 GB | 74.3 | 58.2 | N/A | 1323 | N/A | N/A | N/A | N/A | N/A |
MiMo-V2-Flash Xiaomi | 309B | 262K | MIT | 159 GB | 618 GB | 84.9 | 83.7 | N/A | 1393 | 73.4 | 84.8 | 80.6 | 94.1 | N/A |
MiniMax M2.5 MiniMax | 230B | 205K | Apache 2.0 | 117 GB | 460 GB | 76.5 | 85.2 | 87.5 | 1404 | 80.2 | 89.6 | 65.0 | 86.3 | N/A |
Mistral Large 3 Mistral | 675B | 256K | Apache 2.0 | 355 GB | 1350 GB | N/A | 43.9 | N/A | 1416 | N/A | 92.0 | 82.8 | 88.0 | 93.6 |
Mistral Small 3.1 Mistral | 24B | 131K | Apache 2.0 | 14 GB | 48 GB | 66.8 | 40.7 | 79.8 | 1304 | N/A | 87.2 | N/A | N/A | N/A |
Nemotron Ultra 253B Nvidia | 253B | 128K | Open Weight | 135 GB | 506 GB | N/A | 76.0 | 89.5 | 1348 | N/A | N/A | 66.3 | 72.5 | 97.0 |
Phi-4 Microsoft | 14B | 16K | MIT | 9 GB | 28 GB | 70.4 | 56.1 | 64.6 | 1256 | N/A | 82.6 | N/A | N/A | 80.4 |
Phi-4-mini Microsoft | 3.8B | 131K | MIT | 3 GB | 8 GB | 52.8 | 30.4 | N/A | N/A | N/A | 72.0 | N/A | N/A | N/A |
Qwen 2.5-72B Qwen | 72B | 131K | Apache 2.0 | 37 GB | 145 GB | 71.1 | 49.0 | 86.5 | 1303 | N/A | 86.6 | N/A | N/A | 83.1 |
Qwen 3.5 Qwen | 397B | 262K | Apache 2.0 | 207 GB | 794 GB | 87.8 | 88.4 | 92.6 | 1450 | 76.4 | N/A | 83.6 | N/A | N/A |
Qwen3-235B-A22B Qwen | 235B | 131K | Apache 2.0 | 120 GB | 470 GB | N/A | 71.1 | N/A | 1423 | N/A | N/A | 70.7 | 81.5 | N/A |
Qwen3-30B-A3B Qwen | 30B | 131K | Apache 2.0 | 16 GB | 60 GB | 68.7 | 60.0 | N/A | 1384 | N/A | N/A | N/A | 76.7 | 95.2 |
Qwen3-Coder-Next Qwen | 80B | 256K | Apache 2.0 | 42 GB | 160 GB | 78.4 | 53.4 | 89.1 | N/A | 70.6 | 94.1 | 74.5 | 89.2 | 83.5 |
Qwen3.5-4B Qwen | 4B | 262K | Apache 2.0 | 2 GB | 8 GB | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
Qwen3.5-9B Qwen | 9B | 262K | Apache 2.0 | 5 GB | 18 GB | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
Step-3.5-Flash Stepfun | 196B | 262K | Apache 2.0 | 102 GB | 392 GB | 85.8 | N/A | N/A | 1389 | 74.4 | 81.1 | 86.4 | 99.8 | N/A |
VRAM estimates are based on model weight size only: FP16 uses 2 bytes per parameter (e.g. 70B model = 140 GB), INT4 uses 0.5 bytes per parameter (e.g. 70B model = 35 GB). Actual usage is typically 10–20% higher due to KV cache, activations, and framework overhead. Tools like Ollama default to 4-bit quantization, so real-world usage is often closer to the INT4 figure.
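The arithmetic above can be sketched in a few lines. This is a minimal illustration; the `estimate_vram_gb` helper and the 15% overhead figure are assumptions for demonstration, not tied to any particular serving stack:

```python
def estimate_vram_gb(params_billion: float, bytes_per_param: float,
                     overhead: float = 0.15) -> float:
    """Estimate serving VRAM in GB: weight size plus ~10-20% for
    KV cache, activations, and framework overhead."""
    weights_gb = params_billion * bytes_per_param  # 1B params at 1 byte/param ~ 1 GB
    return round(weights_gb * (1 + overhead), 1)

# FP16 uses 2 bytes/param, INT4 uses 0.5 bytes/param (as described above):
print(estimate_vram_gb(70, 2.0))  # 70B at FP16: 140 GB of weights, ~161 GB with overhead
print(estimate_vram_gb(70, 0.5))  # 70B at INT4: 35 GB of weights, ~40 GB with overhead
```

Running a model through a tool that defaults to 4-bit quantization means the INT4 line is the one to budget for.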
Compare Self-Hosted LLMs Head-to-Head
As an example, here is DeepSeek R1 against Qwen 3.5 across every benchmark both models report:

Benchmark | DeepSeek R1 | Qwen 3.5
MMLU-Pro | 84.0 | 87.8
GPQA Diamond | 71.5 | 88.4
IFEval | 83.3 | 92.6
Chatbot Arena | 1398 | 1450
SWE-bench Verified | 49.2 | 76.4
LiveCodeBench | 65.9 | 83.6
Benchmarks won | 0 | 6
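A "benchmarks won" tally like the one above is just a pairwise comparison over the benchmarks both models report. A minimal sketch, where the `benchmarks_won` helper is illustrative and higher is treated as better for every metric (including Arena Elo):

```python
# Benchmark scores for two models (higher is better on all of these).
deepseek_r1 = {"MMLU-Pro": 84.0, "GPQA Diamond": 71.5, "IFEval": 83.3,
               "Chatbot Arena": 1398, "SWE-bench Verified": 49.2,
               "LiveCodeBench": 65.9}
qwen_3_5 = {"MMLU-Pro": 87.8, "GPQA Diamond": 88.4, "IFEval": 92.6,
            "Chatbot Arena": 1450, "SWE-bench Verified": 76.4,
            "LiveCodeBench": 83.6}

def benchmarks_won(a: dict, b: dict) -> tuple[int, int]:
    """Count wins for each model over the benchmarks both report."""
    shared = a.keys() & b.keys()
    wins_a = sum(1 for k in shared if a[k] > b[k])
    wins_b = sum(1 for k in shared if b[k] > a[k])
    return wins_a, wins_b

print(benchmarks_won(deepseek_r1, qwen_3_5))  # (0, 6)
```

Because only shared benchmarks are counted, a model with many N/A entries in the table is never penalized for missing scores.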
Deploy These Models with Onyx
Onyx is the open-source AI platform that lets you self-host any of these LLMs and connect them to your team's docs, apps, and people.