enterprise-rag-bench
A RAG benchmark for the real world.
The first RAG benchmark built on company-internal data, not Wikipedia. 500K+ docs, 500 questions, 9 enterprise sources. Open-source, MIT-licensed. By Onyx.
Talk to the team that built the benchmark
Run this on your data.
Same retrieval, same agentic refinement, same connectors, on your Slack, Drive, Jira, Confluence, and the rest of your stack. Bring a use case. We'll show you what it looks like.
§2 · Inside the dataset
We generated a realistic synthetic company with documents across 9 different sources.
511,962
documents across 9 sources
§3 · Compare
Head to Head
We evaluated different RAG products and frameworks on the benchmark questions to see how they stack up. See where each one wins across the different question types.
§4 · Citation
@misc{sun2026enterpriseragbench,
title = {EnterpriseRAG-Bench: A RAG Benchmark for Company Internal Knowledge},
author = {Sun, Yuhong and Rahmfeld, J. and Weaver, Chris and Desai, Roshan
and Huang, Wenxi and Chen, Weijia and Butler, Mark H.},
year = {2026},
howpublished = {\url{https://github.com/onyx-dot-app/EnterpriseRAG-Bench}},
note = {Draft. Final paper forthcoming.}
}
Think your RAG can beat the field?
Run the 500-question test set against your system. Email results to joachim@onyx.app. We verify your results, then post them to the public leaderboard.
Open source
Ship a script or notebook that runs your system on the released corpus. We re-run it and score the output.
Closed source
Point us at a sandbox or API endpoint. We hit it with the question set and verify your numbers.
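For the open-source route, a submission harness can be as simple as a loop that feeds each benchmark question to your system and records the answers. The sketch below is illustrative only: the question schema (`id`/`question` keys) and the `answer_fn` callable are assumptions for the example, not the benchmark's actual interface.

```python
import json

def run_benchmark(questions, answer_fn):
    """Run a question set through a RAG system and collect answers.

    questions: list of dicts with "id" and "question" keys
               (assumed shape, not the benchmark's real schema).
    answer_fn: your system's question -> answer callable.
    """
    results = []
    for q in questions:
        # Each record pairs the question id with your system's answer,
        # so graders can match answers back to questions.
        results.append({"id": q["id"], "answer": answer_fn(q["question"])})
    return results

# Toy stand-in for a real RAG pipeline.
questions = [{"id": 1, "question": "Who owns the Q3 roadmap doc?"}]
results = run_benchmark(questions, lambda q: "stub answer")
print(json.dumps(results))
```

Anything in this shape (a script we can point at the released corpus and re-run end to end) is enough to get scored.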