Two years ago, using an open-source LLM for serious work meant accepting a meaningful capability gap versus GPT-4 or Claude. That's no longer true. In 2026, MIT-licensed models like Kimi K2.5 and GLM-5 now approach proprietary frontier models on several coding and reasoning benchmarks. For teams with data privacy requirements, the need to fine-tune on their own data, or the desire to avoid recurring API costs, the open-source tier is now a viable primary choice, not just a fallback.
This guide covers the top 10 models from the Onyx Open LLM Leaderboard, updated as of March 12, 2026: eight open-source or open-weight models, plus two proprietary API models (Step-3.5-Flash and MiniMax M2.5) included as price-performance reference points.
How this guide is sourced: Licensing, benchmark, parameter, and API availability data comes from the Onyx Open LLM Leaderboard. The recommendations in each section are editorial guidance for teams comparing open-source and open-weight options.
TL;DR: Open-source LLMs have closed most of the gap with proprietary models for coding and reasoning tasks. Kimi K2.5 and GLM-5 are the two strongest picks: Kimi K2.5 leads on code generation and math under an MIT license, while GLM-5 is the best open-weight model for autonomously fixing real software bugs. For teams that need Apache 2.0 licensing, Qwen 3.5 leads on reasoning. If you want a cheap hosted API rather than self-hosting, DeepSeek V3.2 at $0.28/M is the best reference point. For teams that want to run a capable model on a single H100, GPT-oss 120B is the practical choice.
An open-source large language model is one whose weights are publicly available for download, so you can run it on your own hardware, fine-tune it on your own data, and deploy it without paying per-token API fees. The license determines what you can actually do with it commercially.
The most permissive licenses are MIT and Apache 2.0, which allow unrestricted commercial use. The Llama License (Meta) and Gemma License (Google) are open for most uses but have specific restrictions.
License types in this guide:
| License | Commercial Use | Fine-Tuning | Redistribution | Restrictions |
|---|---|---|---|---|
| MIT | Yes | Yes | Yes | None |
| Apache 2.0 | Yes | Yes | Yes | Attribution required |
| Llama License | Yes (under 700M users) | Yes | Yes | Requires Meta approval above threshold |
| Gemma License | Yes | Yes | Yes | Prohibits uses that harm Google products |
| Open Weight | Varies | Varies | Varies | Check per model |
| Model | Provider | License | Params (Total/Active) | SWE-bench | GPQA Diamond | AIME 2025 | HumanEval | Arena Elo |
|---|---|---|---|---|---|---|---|---|
| Kimi K2.5 | Moonshot | MIT | 1T / 32B | 76.8% | 87.6% | 96.1% | 99.0% | 1,447 |
| GLM-5 | Zhipu AI | MIT | 744B / 40B | 77.8% | 86.0% | 84.0% | 90.0% | 1,451 |
| GLM-4.7 | Zhipu AI | MIT | 355B / 32B | 73.8% | 85.7% | 95.7% | 94.2% | 1,445 |
| Qwen 3.5 | Qwen | Apache 2.0 | 397B / 17B | 76.4% | 88.4% | N/A | N/A | N/A |
| MiMo-V2-Flash | Xiaomi | MIT | 309B / 15B | 73.4% | 83.7% | 94.1% | 84.8% | 1,401 |
| DeepSeek V3.2 | DeepSeek | Open weight | 685B / 37B | 67.8% | 79.9% | 89.3% | N/A | 1,421 |
| Qwen 3 235B | Qwen | Apache 2.0 | 235B / 22B | N/A | 81.1% | 92.3% | N/A | 1,422 |
| Step-3.5-Flash | Stepfun | Proprietary API | 196B / 11B | 74.4% | N/A | 97.3% | 81.1% | N/A |
| MiniMax M2.5 | MiniMax | Proprietary API | 230B / 10B | 80.2% | 85.2% | 86.3% | 89.6% | N/A |
| GPT-oss 120B | OpenAI | Apache 2.0 | 117B / 5.1B | 62.4% | 80.9% | 97.9% | 88.3% | 1,354 |
Source: Onyx Open LLM Leaderboard, last updated March 12, 2026.
Best at fixing real bugs: GLM-5, Kimi K2.5, Qwen 3.5, GLM-4.7, MiMo-V2-Flash
Best code generation: Kimi K2.5, GLM-4.7, GLM-5, GPT-oss 120B, MiMo-V2-Flash
Best reasoning: Qwen 3.5, Kimi K2.5, GLM-5, GLM-4.7, MiMo-V2-Flash
Best math: GPT-oss 120B, Kimi K2.5, GLM-4.7, MiMo-V2-Flash, Qwen 3 235B
Low-cost API options: DeepSeek V3.2 at $0.28/M, Kimi K2.5 free API, Step-3.5-Flash at $0.10/M
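To make those per-token prices concrete, here is a quick sketch of monthly input-token cost at each listed rate. The prices come from the leaderboard snapshot above; the 500M-tokens-per-month volume is a hypothetical workload chosen for illustration, not a benchmark figure.

```python
# Rough monthly input-token cost at the listed leaderboard prices.
# Prices are $/million input tokens; output-token pricing (usually
# higher) is not included in this sketch.

PRICE_PER_M_INPUT = {
    "DeepSeek V3.2": 0.28,
    "Step-3.5-Flash": 0.10,
    "Kimi K2.5 (free API)": 0.00,
}

def monthly_cost(price_per_m: float, tokens_per_month: int) -> float:
    """Dollar cost for a given monthly input-token volume."""
    return price_per_m * tokens_per_month / 1_000_000

# Hypothetical workload: 500M input tokens per month.
for model, price in PRICE_PER_M_INPUT.items():
    print(f"{model}: ${monthly_cost(price, 500_000_000):,.2f}/month")
```

At that volume the spread is roughly $140/month for DeepSeek V3.2 versus $50/month for Step-3.5-Flash, which is small enough that licensing and data-handling terms, not price, usually decide between them.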
Facts: Kimi K2.5 is listed as MIT-licensed with 1T total / 32B active parameters, 76.8% SWE-bench, 87.6% GPQA Diamond, 96.1% AIME 2025, 99.0% HumanEval, and 1,447 Arena Elo.
Recommendation: Choose Kimi K2.5 when you want the strongest combination of code generation, math performance, and permissive licensing in one model.
Facts: GLM-5 is listed as MIT-licensed with 744B total / 40B active parameters, 77.8% SWE-bench, 86.0% GPQA Diamond, 84.0% AIME 2025, 90.0% HumanEval, and 1,451 Arena Elo.
Recommendation: Pick GLM-5 if your top priority is open-weight software-engineering performance rather than the best HumanEval or math score.
Facts: GLM-4.7 is listed as MIT-licensed with 355B total / 32B active parameters, 73.8% SWE-bench, 85.7% GPQA Diamond, 95.7% AIME 2025, 94.2% HumanEval, and 1,445 Arena Elo.
Recommendation: GLM-4.7 is the better fit than GLM-5 when you still want strong open-weight coding performance but need a lighter deployment footprint: at 355B total parameters it is less than half the size of GLM-5's 744B.
Facts: Qwen 3.5 is listed as Apache 2.0 licensed with 397B total / 17B active parameters, 76.4% SWE-bench, and 88.4% GPQA Diamond. HumanEval, AIME, and Arena Elo are not listed in this snapshot.
Recommendation: Use Qwen 3.5 when you need an Apache-licensed model with especially strong reasoning performance.
Facts: MiMo-V2-Flash is listed as MIT-licensed with 309B total / 15B active parameters, 73.4% SWE-bench, 83.7% GPQA Diamond, 94.1% AIME 2025, 84.8% HumanEval, and 1,401 Arena Elo.
Recommendation: MiMo-V2-Flash is a sensible option when you want solid math and coding performance under MIT without reaching for the largest models in this category.
Facts: DeepSeek V3.2 is listed with public weights, no standard open-source license, 685B total / 37B active parameters, 67.8% SWE-bench, 79.9% GPQA Diamond, 89.3% AIME 2025, 1,421 Arena Elo, and $0.28/M input pricing.
Recommendation: DeepSeek V3.2 is a cost-driven choice for teams comfortable reviewing non-standard licensing terms before production use.
Facts: Qwen 3 235B is listed as Apache 2.0 licensed with 235B total / 22B active parameters, 81.1% GPQA Diamond, 92.3% AIME 2025, and 1,422 Arena Elo. SWE-bench and HumanEval are not listed in this snapshot.
Recommendation: Choose Qwen 3 235B when you want Apache licensing and care more about reasoning benchmarks than coding-specific ones; SWE-bench and HumanEval are not listed for it in this snapshot.
Facts: Step-3.5-Flash is listed as a proprietary API model with 196B total / 11B active parameters, 74.4% SWE-bench, 97.3% AIME 2025, 81.1% HumanEval, and $0.10/M input pricing.
Recommendation: Step-3.5-Flash is useful as a budget benchmark in this comparison, but it is not the right pick if self-hosting or open-weight access is your actual requirement.
Facts: MiniMax M2.5 is listed as a proprietary API model with 230B total / 10B active parameters, 80.2% SWE-bench, 85.2% GPQA Diamond, 86.3% AIME 2025, 89.6% HumanEval, and $0.30/M input pricing.
Recommendation: MiniMax M2.5 is relevant here mainly as a price-performance reference point for teams deciding whether open-weight deployment is worth the tradeoff.
Facts: GPT-oss 120B is listed as Apache 2.0 licensed with 117B total / 5.1B active parameters, 62.4% SWE-bench, 80.9% GPQA Diamond, 97.9% AIME 2025, 88.3% HumanEval, and 1,354 Arena Elo.
Recommendation: GPT-oss 120B is the strongest option in this list when single-node self-hosting and Apache licensing matter more than reaching the top of SWE-bench.
| Use Case | Best Model | License | Key Score |
|---|---|---|---|
| Best coding | Kimi K2.5 | MIT | 99% HumanEval, 76.8% SWE-bench |
| Best reasoning | Qwen 3.5 | Apache 2.0 | 88.4% GPQA Diamond |
| Best math | GPT-oss 120B | Apache 2.0 | 97.9% AIME 2025 |
| Best at fixing real bugs | GLM-5 | MIT | 77.8% SWE-bench |
| Cheapest API with frontier scores | DeepSeek V3.2 | Open weight | $0.28/M input |
| Best single-H100 deployment | GPT-oss 120B | Apache 2.0 | 62.4% SWE-bench, 97.9% AIME |
| Best algorithmic tasks at low cost | Step-3.5-Flash | Proprietary API | $0.10/M, 74.4% SWE-bench |
| Best for fine-tuning (unrestricted) | Kimi K2.5 | MIT | Fully open weights |
Choosing an MIT or Apache 2.0 model is only part of the decision. Teams also need a way to compare hosted and self-hosted backends, connect those models to internal data, and preserve each user's access permissions during retrieval.
Onyx is useful in this context because it gives teams a common application layer on top of open-source models. You can test a hosted API against a self-hosted endpoint, connect the chosen model to sources like Slack, Confluence, Jira, Google Drive, and GitHub, and keep permission-aware retrieval in front of users. That makes it easier to act on the licensing and deployment tradeoffs in this guide instead of evaluating each model in isolation.
What is the best open-source LLM in 2026?
In this leaderboard snapshot, there is no single winner across every benchmark. Kimi K2.5 leads HumanEval (99%) and AIME (96.1%). GLM-5 has the strongest SWE-bench result among open models (77.8%). Qwen 3.5 has the highest GPQA Diamond score among the open-source models listed (88.4%). The best choice depends on whether you care most about coding, reasoning, licensing, or deployment constraints.
What is the difference between open-source and open-weight LLMs?
Open-source LLMs have publicly available weights, architecture, and (ideally) training code under a permissive license like MIT or Apache 2.0. Open-weight models release weights but may have proprietary training code or restrictive license terms. In practice, most "open-source" LLMs are open-weight: you can download and run them, but full source code and training data are rarely published.
Which open-source LLMs can I use commercially?
MIT and Apache 2.0 licensed models allow commercial use without restriction: Kimi K2.5, GLM-4.7, GLM-5, MiMo-V2-Flash, GPT-oss 120B, Qwen 3.5, Qwen 3 235B. DeepSeek V3.2 has a non-standard license that requires review for commercial deployments. Step-3.5-Flash and MiniMax M2.5 are proprietary API models. Always check the specific license terms for your deployment scenario.
Can I run these open-source models locally?
Most models in this list require enterprise hardware (4x H100 80GB or equivalent) for full-precision inference. More accessible options include GPT-oss 120B (1x H100) and the DeepSeek R1 distilled variants (DS-R1-Distill-Qwen-32B, DS-R1-Distill-Llama-70B) that run on a single RTX 4090 or H100. See the Best Self-Hosted LLMs 2026 guide for hardware requirements per model.
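A back-of-envelope way to sanity-check those hardware requirements is weight memory: total parameters times bytes per parameter. The parameter counts come from the comparison table above; the bytes-per-parameter figures are standard for FP16, FP8, and 4-bit quantization. Real deployments also need headroom for KV cache and activations, so treat these as lower bounds.

```python
# Back-of-envelope weight-memory estimate: total parameters x bytes
# per parameter. Excludes KV cache and activation memory, so these
# numbers are lower bounds on actual VRAM requirements.

BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int4": 0.5}

def weight_gb(total_params_b: float, dtype: str) -> float:
    """Approximate weight memory in GB for a model with the given
    total parameter count (in billions) at the given precision."""
    return total_params_b * 1e9 * BYTES_PER_PARAM[dtype] / 1e9

# GPT-oss 120B (117B total parameters per the table above):
for dtype in BYTES_PER_PARAM:
    print(f"117B @ {dtype}: ~{weight_gb(117, dtype):.0f} GB")
```

Under this estimate, GPT-oss 120B's 117B weights need roughly 234 GB at FP16 but only about 59 GB at 4-bit precision, which is why the single-H100 (80 GB) deployment figure assumes low-precision inference.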
What is the best platform for running open-source and proprietary LLMs together?
Most teams end up mixing models: a self-hosted open-weight model for sensitive data, a cheap API for high-volume tasks, and a frontier model for the hardest work. Onyx gives teams a single interface to connect all of these, routing tasks to the right model while keeping answers grounded in company knowledge from Slack, Confluence, Jira, Google Drive, and GitHub. It's MIT-licensed, supports self-hosted and API-based backends, and is free to get started.
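The mix-and-match pattern described here reduces to a small routing rule. The sketch below is purely illustrative and is not Onyx's actual configuration API: the `Task` fields, model identifiers, and routing logic are all hypothetical, chosen to mirror the three tiers named above.

```python
# Illustrative task router for a mixed open/proprietary model fleet.
# Model names echo this guide; the Task type and routing rules are
# hypothetical, not any platform's real API.

from dataclasses import dataclass

@dataclass
class Task:
    sensitive: bool      # touches private or internal data
    high_volume: bool    # cheap bulk work (summaries, tagging)

def route(task: Task) -> str:
    if task.sensitive:
        return "self-hosted/glm-5"    # open weights stay in-house
    if task.high_volume:
        return "api/deepseek-v3.2"    # low-cost hosted API
    return "api/frontier-model"       # hardest general work

print(route(Task(sensitive=True, high_volume=False)))   # self-hosted/glm-5
print(route(Task(sensitive=False, high_volume=True)))   # api/deepseek-v3.2
```

The point of the sketch is that routing decisions hinge on task attributes, not model quality alone, which is why a common application layer in front of several backends is useful.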
Related Insights
Best LLMs for Coding in 2026
Claude Opus 4.6 leads SWE-bench Verified at 80.8%. GPT-5.4 leads Terminal-Bench at 75.1%. Full benchmark breakdown for 10 coding LLMs with cost comparison and open-source picks.
Best LLMs in 2026
Compare leading large language models in Onyx's leaderboard snapshot. Benchmark data for Claude Opus 4.6, GPT-5.4, Gemini 3.1 Pro, Kimi K2.5, GLM-5, and more across SWE-bench, GPQA Diamond, and AIME.