Self-Hosted LLM Leaderboard
The definitive ranking of self-hostable LLMs for enterprise — compared across quality, speed, hardware requirements, and cost. Find the best open-weight model for your infrastructure.
Last updated: 2026-02-23
S Tier
- Kimi K2.5 (1T)
- GLM-5 (745B)
- Qwen 3.5 (397B)

A Tier
- DeepSeek R1 (671B)
- Mistral Large 3 (675B)
- GPT-oss 120B (120B)
- DeepSeek V3 (671B)
- Step-3.5 Flash (196B)
- MiMo-V2-Flash (309B)

B Tier
- Llama 4 Maverick (400B)
- Nemotron Ultra 253B (253B)
- Qwen3-235B-A22B (235B)
- Hunyuan 2.0 (406B)
- GPT-oss 20B (20B)
- Llama 4 Scout (109B)

C Tier
- Llama 3.3 70B (70B)
- DS-R1-Distill-Llama-70B (70B)
- Qwen 2.5-72B (72B)
- Gemma 3 27B (27B)
- DS-R1-Distill-Qwen-32B (32B)
- Command R+ (104B)
- Qwen2.5-Coder-32B (32B)

D Tier
- Mistral Small 3.1 (24B)
- Phi-4 (14B)
- Llama 3.1-8B (8B)
- Qwen3-30B-A3B (30B)
- Gemma 3 12B (12B)
- DS-R1-Distill-Qwen-14B (14B)
- DS-R1-Distill-Qwen-7B (7B)
- Phi-4-mini (3.8B)
Best Self-Hosted LLMs by Task — Benchmark Rankings
Which self-hosted model is best for coding, reasoning, or agentic tasks? Here is how every open-weight model stacks up across four benchmark categories:
- Advanced knowledge: harder 10-option multiple-choice questions (MMLU-Pro)
- Graduate reasoning: PhD-level science reasoning (GPQA Diamond)
- Instruction following: instruction-following accuracy (IFEval)
- Chatbot Arena: crowdsourced Elo from human preference votes (LMArena)
Self-Hosted LLM Benchmark Scores & Hardware Requirements
Complete benchmark results, VRAM requirements, and licensing for every major self-hostable LLM, sorted alphabetically by model name.
Model | Vendor | Params | Context | License | VRAM (4-bit) | VRAM (FP16) | MMLU-Pro | GPQA Diamond | IFEval | LMArena Elo | SWE-bench Verified | HumanEval | LiveCodeBench | AIME | MATH-500
Command R+ | Cohere | 104B | 131K | CC-BY-NC | 55 GB | 208 GB | N/A | N/A | N/A | 1195 | N/A | N/A | N/A | N/A | N/A
DeepSeek R1 | DeepSeek | 671B | 128K | MIT | 350 GB | 1340 GB | 84.0 | 71.5 | 83.3 | 1398 | 49.2 | 90.2 | 65.9 | 93.3 | 97.3
DeepSeek V3 | DeepSeek | 671B | 128K | MIT | 350 GB | 1340 GB | 81.2 | 68.4 | N/A | 1359 | 38.8 | N/A | 49.2 | N/A | 94.0
DS-R1-Distill-Llama-70B | DeepSeek | 70B | 128K | MIT | 38 GB | 140 GB | N/A | 65.2 | N/A | N/A | N/A | 86.0 | 57.5 | 70.0 | 94.5
DS-R1-Distill-Qwen-14B | DeepSeek | 14B | 128K | MIT | 9 GB | 28 GB | N/A | 59.1 | N/A | N/A | N/A | N/A | 53.1 | N/A | 93.9
DS-R1-Distill-Qwen-32B | DeepSeek | 32B | 128K | MIT | 19 GB | 64 GB | N/A | 62.1 | N/A | N/A | N/A | 85.4 | 53.1 | 72.0 | 94.3
DS-R1-Distill-Qwen-7B | DeepSeek | 7B | 128K | MIT | 5 GB | 14 GB | N/A | 49.1 | N/A | N/A | N/A | N/A | N/A | N/A | 92.8
Gemma 3 12B | Google | 12B | 128K | Gemma License | 8 GB | 24 GB | 60.0 | 40.9 | N/A | N/A | N/A | 85.4 | N/A | N/A | N/A
Gemma 3 27B | Google | 27B | 128K | Gemma License | 16 GB | 54 GB | 67.5 | 42.4 | N/A | 1365 | N/A | N/A | 29.7 | N/A | 89.0
GLM-5 | Zhipu AI | 745B | 200K | MIT | 390 GB | 1490 GB | 70.4 | 86.0 | 88.0 | 1451 | 77.8 | 90.0 | 52.0 | 84.0 | 88.0
GPT-oss 120B | OpenAI | 120B | 128K | Apache 2.0 | 62 GB | 80 GB | 90.0 | 80.9 | N/A | 1354 | 62.4 | N/A | 60.0 | 97.9 | N/A
GPT-oss 20B | OpenAI | 20B | 128K | Apache 2.0 | 14 GB | 16 GB | 85.3 | 71.5 | N/A | N/A | N/A | N/A | N/A | 98.7 | N/A
Hunyuan 2.0 | Tencent | 406B | 256K | Tencent License | 215 GB | 812 GB | N/A | N/A | N/A | N/A | 53.0 | N/A | N/A | N/A | N/A
Kimi K2.5 | Moonshot | 1T | 262K | MIT | 600 GB | 2000 GB | 87.1 | 87.6 | 94.0 | 1447 | 76.8 | 99.0 | 85.0 | 96.1 | 98.0
Llama 3.1-8B | Meta | 8B | 131K | Llama License | 5 GB | 16 GB | 48.3 | 32.8 | 80.4 | 1186 | N/A | 72.6 | N/A | N/A | 51.9
Llama 3.3 70B | Meta | 70B | 131K | Llama License | 38 GB | 140 GB | 68.9 | 50.7 | 92.1 | 1310 | N/A | 88.4 | N/A | N/A | 77.0
Llama 4 Maverick | Meta | 400B | 1M | Llama License | 210 GB | 800 GB | 80.5 | 69.8 | N/A | 1328 | N/A | 62.0 | 43.4 | N/A | N/A
Llama 4 Scout | Meta | 109B | 10M | Llama License | 58 GB | 218 GB | 74.3 | 58.2 | N/A | N/A | N/A | N/A | N/A | N/A | N/A
MiMo-V2-Flash | Xiaomi | 309B | 128K | MIT | 165 GB | 618 GB | 84.9 | 83.7 | N/A | N/A | 73.4 | N/A | 80.6 | 94.1 | N/A
Mistral Large 3 | Mistral | 675B | 256K | Apache 2.0 | 355 GB | 1350 GB | N/A | 43.9 | N/A | 1416 | N/A | 92.0 | 82.8 | 88.0 | 93.6
Mistral Small 3.1 | Mistral | 24B | 131K | Apache 2.0 | 14 GB | 48 GB | 66.8 | 40.7 | 79.8 | N/A | N/A | 87.2 | N/A | N/A | N/A
Nemotron Ultra 253B | Nvidia | 253B | 128K | Open Weight | 135 GB | 506 GB | N/A | 76.0 | 89.5 | 1347 | N/A | N/A | 66.3 | 72.5 | 97.0
Phi-4 | Microsoft | 14B | 16K | MIT | 9 GB | 28 GB | 70.4 | 56.1 | 64.6 | N/A | N/A | 82.6 | N/A | N/A | 80.4
Phi-4-mini | Microsoft | 3.8B | 131K | MIT | 3 GB | 8 GB | 52.8 | 30.4 | N/A | N/A | N/A | 72.0 | N/A | N/A | N/A
Qwen 2.5-72B | Qwen | 72B | 131K | Apache 2.0 | 39 GB | 145 GB | 71.1 | 49.0 | 86.5 | 1295 | N/A | 86.6 | N/A | N/A | 83.1
Qwen 3.5 | Qwen | 397B | 262K | Apache 2.0 | 210 GB | 794 GB | 87.8 | 88.4 | 92.6 | N/A | 76.4 | N/A | 83.6 | N/A | N/A
Qwen2.5-Coder-32B | Qwen | 32B | 131K | Apache 2.0 | 19 GB | 64 GB | N/A | N/A | N/A | N/A | N/A | 92.7 | 43.2 | N/A | N/A
Qwen3-235B-A22B | Qwen | 235B | 131K | Apache 2.0 | 125 GB | 470 GB | N/A | 71.1 | N/A | N/A | N/A | N/A | 70.7 | 81.5 | N/A
Qwen3-30B-A3B | Qwen | 30B | 131K | Apache 2.0 | 18 GB | 60 GB | 68.7 | 60.0 | N/A | N/A | N/A | N/A | N/A | 76.7 | 95.2
Step-3.5 Flash | StepFun | 196B | 262K | Apache 2.0 | 120 GB | 392 GB | 85.8 | N/A | N/A | N/A | 74.4 | 81.1 | 86.4 | 99.8 | N/A
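The two memory columns in the table track a simple weights-only estimate closely: parameter count times bytes per parameter (2 bytes at FP16, roughly 0.5 bytes at 4-bit quantization), with the listed figures leaving a small margin for KV cache and runtime overhead. A minimal sketch for back-of-envelope sizing; the `overhead` factor is an assumption for illustration, not a figure the leaderboard publishes:

```python
def estimate_vram_gb(params_billion: float, bits_per_param: float,
                     overhead: float = 1.0) -> float:
    """Weights-only VRAM estimate in GB: params * bytes-per-param * overhead.

    `overhead` (e.g. 1.1-1.2 to budget for KV cache and activations)
    is an assumed fudge factor -- tune it for your serving stack.
    """
    return params_billion * (bits_per_param / 8) * overhead

# DeepSeek R1 (671B) at FP16: weights alone come to ~1342 GB,
# in line with the ~1340 GB listed in the table.
print(round(estimate_vram_gb(671, 16)))  # 1342

# The same model at 4-bit: ~336 GB of weights, before runtime overhead.
print(round(estimate_vram_gb(671, 4)))   # 336
```

In practice, budget extra VRAM on top of this for long contexts: KV cache grows with both context length and batch size, which is why the quantized figures in the table sit a few percent above the raw weight size.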
Compare Self-Hosted LLMs Head-to-Head
An example pairing: DeepSeek R1 vs. Qwen 3.5, compared on every benchmark both models report.
Benchmark | DeepSeek R1 | Qwen 3.5
MMLU-Pro | 84.0 | 87.8
GPQA Diamond | 71.5 | 88.4
IFEval | 83.3 | 92.6
SWE-bench Verified | 49.2 | 76.4
LiveCodeBench | 65.9 | 83.6
Benchmarks won | 0 | 5
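The "benchmarks won" tally is just a per-benchmark score comparison restricted to the benchmarks both models report (scores a model lacks are excluded rather than counted as losses). A minimal sketch using the DeepSeek R1 vs. Qwen 3.5 scores above; the dictionary names are illustrative:

```python
def benchmarks_won(a: dict, b: dict) -> tuple:
    """Count benchmarks each model wins, over benchmarks both report."""
    shared = a.keys() & b.keys()  # ignore benchmarks missing from either side
    return (sum(a[k] > b[k] for k in shared),
            sum(b[k] > a[k] for k in shared))

deepseek_r1 = {"MMLU-Pro": 84.0, "GPQA Diamond": 71.5, "IFEval": 83.3,
               "SWE-bench Verified": 49.2, "LiveCodeBench": 65.9}
qwen_3_5 = {"MMLU-Pro": 87.8, "GPQA Diamond": 88.4, "IFEval": 92.6,
            "SWE-bench Verified": 76.4, "LiveCodeBench": 83.6}

print(benchmarks_won(deepseek_r1, qwen_3_5))  # (0, 5)
```

Note the two counts need not sum to the number of shared benchmarks: ties count for neither model.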
Deploy These Models with Onyx
Onyx is the open-source AI platform that lets you self-host any of these LLMs and connect them to your team's docs, apps, and people.