Onyx

enterprise-rag-bench

Open · MIT · v1.0

A RAG benchmark for the real world.

The first RAG benchmark built on company-internal data, not Wikipedia. 500K+ docs, 500 questions, 9 enterprise sources. Open-source, MIT-licensed. By Onyx.

How does your RAG stack up?
Overall score per system on the 500 enterprise RAG questions (higher is better):

System                       Score (%)
Onyx + GPT-5.4                  72.4
OpenClaw                        68.2
OpenAI File Search              61.0
RAGFlow                         50.2
Amazon Q (Kendra)               49.0
Azure AI Search                 48.4
Vertex AI Search                41.9
NVIDIA AI Blueprints            37.7
AnythingLLM                     35.6
Weaviate Verba                  34.5
LlamaIndex                      27.2
LangChain                       25.0
Open WebUI                      24.9

Full results: onyx.app/enterpriserag-bench

Talk to the team that built the benchmark

Run this on your data.

Same retrieval, same agentic refinement, same connectors, on your Slack, Drive, Jira, Confluence, and the rest of your stack. Bring a use case. We'll show you what it looks like.

§2 · Inside the dataset

We generated a realistic synthetic company with documents across 9 different sources.

511,962

documents across 9 sources

How it's built
Per-source document counts: 275K, 120K, 5K, 35K, 25K, 15K, 10K, 8K, 6K.
03 · Add noise
Inject realistic distractors. Off-topic threads, half-finished drafts, near-duplicate pages. The clutter retrieval has to ignore.
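A minimal sketch of what this step could look like, assuming a simple list-of-dicts corpus. The function name, noise ratio, and document shape are illustrative assumptions, not the benchmark's actual generation pipeline:

import random

def add_noise(corpus: list[dict], distractors: list[dict],
              noise_ratio: float = 0.3, seed: int = 0) -> list[dict]:
    """Interleave distractor documents (off-topic threads, half-finished
    drafts, near-duplicate pages) with the real corpus, then shuffle so
    retrieval cannot rely on document order."""
    rng = random.Random(seed)
    n_noise = min(int(len(corpus) * noise_ratio), len(distractors))
    noisy = corpus + rng.sample(distractors, n_noise)
    rng.shuffle(noisy)
    return noisy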

§3 · Compare

Head to Head

We evaluated different RAG products and frameworks on the benchmark questions to see how they stack up. See where each one wins across the different question types.

Per-system scores across ten question types: Basic, Semantic, Intra-Doc, Project, Constrained, Conflicting, Completeness, Misc., High Level, and Not Found.
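As a rough sketch of how such a per-type breakdown can be computed from graded results (the JSONL path and the "type" and "score" field names are assumptions about the released format, not a documented schema):

import json
from collections import defaultdict

def per_type_scores(results_path: str) -> dict[str, float]:
    """Mean score per question type from a JSONL of graded answers."""
    by_type: dict[str, list[float]] = defaultdict(list)
    with open(results_path) as f:
        for line in f:
            record = json.loads(line)
            by_type[record["type"]].append(float(record["score"]))
    return {t: 100 * sum(s) / len(s) for t, s in by_type.items()}

for qtype, score in sorted(per_type_scores("graded.jsonl").items()):
    print(f"{qtype:>12}: {score:5.1f}%")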

§4 · Citation

@misc{sun2026enterpriseragbench,
  title        = {EnterpriseRAG-Bench: A RAG Benchmark for Company Internal Knowledge},
  author       = {Sun, Yuhong and Rahmfeld, Joachim and Weaver, Chris and Desai, Roshan
                  and Huang, Wenxi and Chen, Weijia and Butler, Mark H.},
  year         = {2026},
  howpublished = {\url{https://github.com/onyx-dot-app/EnterpriseRAG-Bench}},
  note         = {Draft. Final paper forthcoming.}
}

Think your RAG can beat the field?

Run the 500-question test set against your system. Email your results to joachim@onyx.app. We verify them, then add you to the public leaderboard.

Built by Onyx · Open-source, MIT-licensed

Open source

Ship a script or notebook that runs your system on the released corpus. We re-run it, and you get scored.
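A minimal sketch of what such a script might look like. The file names and the answer() hook are placeholders for your own stack, not a format Onyx has specified:

import json

def answer(question: str) -> str:
    # Replace this stub with a call into your retrieval + generation pipeline.
    return "TODO: answer from your RAG system"

with open("questions.jsonl") as fin, open("answers.jsonl", "w") as fout:
    for line in fin:
        q = json.loads(line)
        record = {"id": q["id"], "answer": answer(q["question"])}
        fout.write(json.dumps(record) + "\n")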

Submit your repo

Closed source

Point us at a sandbox or API endpoint. We hit it with the question set and verify your numbers.
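One possible shape for that endpoint, sketched with FastAPI; the route and payload fields are assumptions, since no wire format is published here:

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Query(BaseModel):
    question: str

@app.post("/answer")
def answer(query: Query) -> dict:
    # Replace the stub with your retrieval + generation pipeline.
    return {"answer": f"stub answer for: {query.question}"}

Start it with uvicorn, and the grader can POST the question set one query at a time.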

Submit your endpoint