Best LLMs for Coding — 2026 Rankings

Coding LLM Leaderboard

Which AI model writes the best code? We rank every major LLM — open and closed source — across software engineering, code generation, competitive programming, and agentic coding benchmarks.

Roshan Desai · Last updated: 2026-02-24

| Tier | Model | Parameters |
|------|-------|------------|
| S | Claude Opus 4.6 | N/A |
| S | GPT-5.2 | N/A |
| S | Kimi K2.5 | 1T |
| S | MiniMax M2.5 | 230B |
| A | Claude Sonnet 4.6 | N/A |
| A | Gemini 3 Pro | N/A |
| A | Qwen 3.5 | 397B |
| A | Step-3.5-Flash | 196B |
| A | GLM-5 | 744B |
| A | MiMo-V2-Flash | 309B |
| A | Mistral Large | 675B |
| B | DeepSeek V3.2 | 685B |
| B | DeepSeek R1 | 671B |
| C | GPT-oss 120B | 117B |
| C | Qwen2.5-Coder-32B | 32B |
| C | Nemotron Ultra 253B | 253B |
| D | DeepSeek V3 | 671B |
| D | Llama 4 Maverick | 400B |
| D | Grok 3 | N/A |

Cost vs. Coding Performance

Which models give you the best coding performance for the price? The top-left of the chart is the sweet spot: high performance at low cost. Models without pricing data are excluded.
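One illustrative way to quantify "value" (this is a sketch, not the leaderboard's methodology) is benchmark points per blended dollar. The snippet below ranks a few priced models from the table by SWE-bench Verified score divided by a blended price per million tokens; the 3:1 input-to-output token mix is an assumption.

```python
# Scores and prices taken from the table below; the blend ratio is assumed.
MODELS = {
    # model: (SWE-bench Verified, $/M input tokens, $/M output tokens)
    "Claude Opus 4.6": (80.8, 15.00, 75.00),
    "GPT-5.2":         (80.0,  2.00,  8.00),
    "MiniMax M2.5":    (80.2,  0.30,  1.20),
    "DeepSeek V3.2":   (67.8,  0.28,  0.42),
    "Step-3.5-Flash":  (74.4,  0.10,  0.30),
}

def blended_price(inp: float, out: float) -> float:
    """Blended $/M tokens, assuming a 3:1 input:output token mix."""
    return (3 * inp + out) / 4

def value_ranking(models):
    """Sort models by SWE-bench points per blended dollar, best first."""
    scored = {
        name: score / blended_price(inp, out)
        for name, (score, inp, out) in models.items()
    }
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    for name, value in value_ranking(MODELS):
        print(f"{name:16s} {value:8.1f} points per blended $")
```

On these numbers the cheap models dominate: Step-3.5-Flash delivers roughly 180× more score per dollar than Claude Opus 4.6, which is exactly the trade-off the chart's top-left framing highlights.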

Best Coding LLMs by Benchmark

How does each model perform on real-world software engineering, code generation, competitive programming, and terminal-based coding tasks? Hover any bar for details.

Best at Software Engineering

Real-world software engineering tasks (SWE-bench Verified)

Best for Code Generation

Python code generation from docstrings (HumanEval)

Best in Competitive Coding

Competitive programming problems (LiveCodeBench)

Best at Terminal Coding

Agentic terminal coding tasks (Terminal-Bench 2.0)

Coding Benchmark Scores & Pricing

Complete coding benchmark results and pricing for every model. Click any column header to sort.


| Model | Provider | Params | Context | Input ($/M tok) | Output ($/M tok) | MMLU-Pro | GPQA Diamond | SWE-bench Verified | HumanEval | LiveCodeBench | Terminal-Bench 2.0 |
|-------|----------|--------|---------|-----------------|------------------|----------|--------------|--------------------|-----------|---------------|--------------------|
| Claude Opus 4.6 | Anthropic | N/A | 200K | $15.00 | $75.00 | 82.0 | 91.3 | 80.8 | 95.0 | 76.0 | 65.4 |
| Claude Sonnet 4.6 | Anthropic | N/A | 200K | $3.00 | $15.00 | 79.1 | 89.9 | 79.6 | 92.1 | 72.4 | 59.1 |
| DeepSeek R1 | DeepSeek | 671B | 128K | $0.28 | $0.42 | 84.0 | 71.5 | 49.2 | 90.2 | 65.9 | N/A |
| DeepSeek V3 | DeepSeek | 671B | 128K | $0.28 | $1.10 | 81.2 | 68.4 | 38.8 | N/A | 49.2 | N/A |
| DeepSeek V3.2 | DeepSeek | 685B | 130K | $0.28 | $0.42 | 85.0 | 79.9 | 67.8 | N/A | 74.1 | 39.6 |
| Gemini 3 Pro | Google | N/A | 1M | $1.25 | $10.00 | 85.0 | 91.9 | 78.0 | 93.0 | 81.3 | 56.2 |
| GLM-5 | Zhipu AI | 744B | 200K | N/A | N/A | 70.4 | 86.0 | 77.8 | 90.0 | 52.0 | 56.2 |
| GPT-5.2 | OpenAI | N/A | 128K | $2.00 | $8.00 | N/A | 93.2 | 80.0 | 95.0 | 80.0 | 64.7 |
| GPT-oss 120B | OpenAI | 117B | 128K | N/A | N/A | 90.0 | 80.9 | 62.4 | 88.3 | 60.0 | 18.7 |
| Grok 3 | xAI | N/A | 131K | $3.00 | $15.00 | N/A | 84.6 | 49.0 | 94.5 | 79.4 | 52.0 |
| Kimi K2.5 | Moonshot | 1T | 262K | N/A | N/A | 87.1 | 87.6 | 76.8 | 99.0 | 85.0 | 50.8 |
| Llama 4 Maverick | Meta | 400B | 1M | N/A | N/A | 80.5 | 69.8 | N/A | 62.0 | 43.4 | N/A |
| MiMo-V2-Flash | Xiaomi | 309B | 262K | N/A | N/A | 84.9 | 83.7 | 73.4 | 84.8 | 80.6 | 38.5 |
| MiniMax M2.5 | MiniMax | 230B | 205K | $0.30 | $1.20 | 76.5 | 85.2 | 80.2 | 89.6 | 65.0 | 42.2 |
| Mistral Large | Mistral | 675B | 256K | N/A | N/A | N/A | 43.9 | N/A | 92.0 | 82.8 | N/A |
| Nemotron Ultra 253B | Nvidia | 253B | 128K | N/A | N/A | N/A | 76.0 | N/A | N/A | 66.3 | N/A |
| Qwen 3.5 | Qwen | 397B | 262K | N/A | N/A | 87.8 | 88.4 | 76.4 | N/A | 83.6 | 52.5 |
| Qwen2.5-Coder-32B | Qwen | 32B | 131K | N/A | N/A | N/A | N/A | N/A | 92.7 | 43.2 | N/A |
| Step-3.5-Flash | Stepfun | 196B | 256K | $0.10 | $0.30 | N/A | N/A | 74.4 | 81.1 | 86.4 | 51.0 |

Compare Coding LLMs Head-to-Head

Select two models to see how they compare across all coding and reasoning benchmarks.

| Benchmark | Claude Opus 4.6 | GPT-5.2 |
|-----------|-----------------|---------|
| GPQA Diamond | 91.3 | 93.2 |
| SWE-bench Verified | 80.8 | 80.0 |
| HumanEval | 95.0 | 95.0 |
| LiveCodeBench | 76.0 | 80.0 |
| Terminal-Bench 2.0 | 65.4 | 64.7 |
| **Benchmarks won** | **2** | **2** |
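The "benchmarks won" tally follows a simple rule: a model scores a win wherever its number is strictly higher, and ties (HumanEval here) count for neither side. A minimal sketch using the head-to-head scores above:

```python
# Head-to-head scores as shown above (Claude Opus 4.6 vs. GPT-5.2).
OPUS = {"GPQA Diamond": 91.3, "SWE-bench Verified": 80.8,
        "HumanEval": 95.0, "LiveCodeBench": 76.0, "Terminal-Bench 2.0": 65.4}
GPT52 = {"GPQA Diamond": 93.2, "SWE-bench Verified": 80.0,
         "HumanEval": 95.0, "LiveCodeBench": 80.0, "Terminal-Bench 2.0": 64.7}

def benchmarks_won(a: dict, b: dict) -> tuple[int, int]:
    """Return (wins for a, wins for b) over the benchmarks both report.

    Strictly-higher scores count as wins; ties count for neither model.
    """
    wins_a = sum(1 for k in a if k in b and a[k] > b[k])
    wins_b = sum(1 for k in b if k in a and b[k] > a[k])
    return wins_a, wins_b

if __name__ == "__main__":
    print(benchmarks_won(OPUS, GPT52))  # 2 wins each, with HumanEval tied
```

Because N/A rows never appear in both score dictionaries, benchmarks missing for either model simply drop out of the tally.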

Try These Models in Onyx

Onyx is the open-source AI platform that lets you connect any of these LLMs to your team's docs, apps, and people.