
By Evan Lohn, Joachim Rahmfeld
At Onyx, we are dedicated to expanding the knowledge and insights users can gain from their enterprise data, thereby enhancing productivity across job functions.
So, what is Onyx? Onyx is an AI Assistant that companies can deploy at any scale—on laptops, on-premises, or in the cloud—to connect documented knowledge from many sources, including Slack, Google Drive, and Confluence. Onyx leverages LLMs to create a subject matter expert for teams, enabling users not only to find relevant documents but also to get answers to questions such as "Is feature X already supported?" or "Where's the pull request for feature Y?"
Last year, we embarked on enhancing our Enterprise Search and Knowledge Retrieval capabilities by setting the following goals:
Questions that fall into these categories are usually of high value to the user; however, a traditional RAG-like system tends to struggle with them.
For example, consider the question: "What are some of the product-related differences between what we positioned at Nike vs Puma that could factor into our differing sales outcomes?" This question involves both multiple entities and ambiguity ("product-related sales outcomes" can mean many things).
Unless the corpus happens to contain documents that address essentially this exact question, a RAG system will struggle to produce a good answer here.
These are the types of questions where our new Agent Search comes in. What is the idea here?
On a high level, the approach is to:
To make this more concrete for the example above, some valid initial sub-questions could be "Which products did we discuss with Puma?", "Which products did we discuss with Nike?", and "Which issues were reported by Puma?" Encapsulating this type of logical process requires organizing and orchestrating many steps, calculations, and LLM calls.
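As a minimal sketch of that first decomposition step: `call_llm` and the prompt below are hypothetical stand-ins (stubbed so the snippet runs), not Onyx's actual implementation.

```python
# Hypothetical sketch of the initial decomposition step. `call_llm` and the
# prompt are illustrative stand-ins, not Onyx's actual code.
DECOMPOSE_PROMPT = (
    "Break the question below into independent sub-questions, one per line, "
    "each answerable by a single document search.\n\nQuestion: {question}"
)

def call_llm(prompt: str) -> str:
    # Stub; a real implementation would call an LLM API here.
    return (
        "Which products did we discuss with Nike?\n"
        "Which products did we discuss with Puma?\n"
        "Which issues were reported by Puma?"
    )

def decompose(question: str) -> list[str]:
    # One LLM call produces the sub-questions; each line becomes one sub-question.
    raw = call_llm(DECOMPOSE_PROMPT.format(question=question))
    return [line.strip() for line in raw.splitlines() if line.strip()]
```

Each resulting sub-question can then be answered by an independent retrieval pass, which is where the orchestration burden comes from.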
The purpose of this blog is to illustrate how we approached this problem on a functional level, discuss our technology selection process, and share in some detail how we leveraged LangGraph as a backbone and which lessons we learned.
We hope this write-up will be useful to readers who are interested in this space and/or who want to build agents using LangGraph and who share some of our requirements.
Roughly, our targeted logical flow looks like this at a high level:

Key aspects and requirements of this flow are:
So indeed, a lot has to happen to achieve our goal of being able to address substantially broader and more ambiguous questions.
While this flow is certainly quite workflow-centric, it represents an initial step towards broader Agent Search flows. We intend to hook various tools into the flow, update the refinement process, etc. We may later also introduce Human-in-the-Loop interactions with users, such as approving answers before refinement or re-running parts of the flow with some manual changes.
Addressing our requirements, also with an eye towards the near/mid-term future, calls for a framework that
And—oh yeah!—the answers also needed to be produced in a timely manner matching user expectations, and do so at scale.
So the key question we had to address was "How do we best implement this?"
The options for us were essentially whether to implement this flow ourselves from the ground up by extending our existing flow, or to leverage an existing agentic framework—and if so, which one.
Given our priorities outlined above, we landed on LangGraph as our main candidate for our implementation framework, with implementation-from-scratch probably a relatively close second.
The initial drivers in favor of LangGraph were:
However, we certainly also had some concerns which favored an implementation from scratch, including:
To decide, as one does for most projects of this type, we started with a prototype evaluation. Specifically, we quickly (~1 week / 1 FTE, including learning time) implemented a stripped-down, stand-alone LangGraph version of our targeted flow, where we tried to test:
The results were encouraging, and we proceeded to implement our actual flow in LangGraph within our application. As expected, in the process, we learned a number of additional lessons. Below we document what we have learned, and the conventions we intend to follow in the future, as our use of Agent flows will expand.
As the project quickly grew more complex, here are some of the observations we made and practices we adhered to.
We name our node files following an <action>_<object>.py naming convention. Adding a digit for the step number can also be helpful, though it requires some extra work when nodes are added or removed.

We use Pydantic across our code base, so it was great to see that LangGraph supports Pydantic models in addition to TypedDicts. Consequently, we use Pydantic models throughout the LangGraph implementation as well. (Unfortunately, Pydantic is not yet supported for graph outputs.)
As we have many nodes with their own actions and outputs (state updates) within a subgraph, we generally look at the (sub)graph states as driven by the node updates. So rather than defining the keys directly within the subgraph state, we define Pydantic state models for the various node updates and then construct the graph state by inheriting from the various node update (and other) models.
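A minimal sketch of this composition pattern, with hypothetical node-update models (the names are illustrative, not our actual ones):

```python
from pydantic import BaseModel

# Hypothetical node-update models: each node declares exactly the keys it writes.
class SearchUpdate(BaseModel):
    retrieved_docs: list[str] = []

class ValidationUpdate(BaseModel):
    verified_docs: list[str] = []

# The subgraph state is composed by inheriting from the node-update models,
# so each node's output schema is declared once and reused in the graph state.
class RetrievalSubgraphState(SearchUpdate, ValidationUpdate):
    question: str = ""
```

A node then returns an instance of its own update model, and the state schema stays in sync with the nodes automatically.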
Benefits:
Challenges:
Not surprisingly, being deliberate about setting default values (or not!) for state keys is very important. If not handled carefully, setting default values in inappropriate situations can lead to unintended behavior that may be more difficult to detect. For instance, consider the following problematic configuration:

Here, Main Graph Node A sets my_key to my_data. This value is later intended to be used by an Inner Subgraph node, but in this example we (purposefully) omitted that key from the Outer Subgraph. As a result, the value of this key in the inner subgraph would be an empty string. A similar situation occurs on the output side: if we updated my_key in Inner Subgraph Node C, this would not update the my_key state in the main graph.
Had we instead been careful and not set a default value for my_key in the inner subgraph, as shown here:

then an error would be raised, as my_key has no input value for the Inner Subgraph. The missing key in the Outer Subgraph could then be added to arrive at the proper configuration:

This is of course not really different from traditional nested functions, but in our experience, in the LangGraph context these issues are a bit harder to identify.
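Stripped of LangGraph specifics, the pitfall can be sketched with plain Pydantic models (illustrative names, assuming Pydantic v2 for `model_dump`):

```python
from pydantic import BaseModel, ValidationError

class OuterState(BaseModel):
    other_key: str = ""      # my_key was (purposefully) not declared here

class InnerStateLenient(BaseModel):
    my_key: str = ""         # silent default: a missing input goes unnoticed

class InnerStateStrict(BaseModel):
    my_key: str              # no default: a missing input fails loudly

# Upstream sets my_key, but the outer state silently drops the unknown field
outer = OuterState(my_key="my_data", other_key="x")
payload = outer.model_dump()           # {'other_key': 'x'}; my_key is gone

inner = InnerStateLenient(**payload)   # runs, but with my_key == ""
caught = False
try:
    InnerStateStrict(**payload)        # raises: my_key is required
except ValidationError:
    caught = True
```

The lenient version propagates the empty value silently; the strict version surfaces the misconfigured boundary at the first invocation.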
Our recommendations (no surprise) are to:
default state values to <type> | None = None, …, with the exception of keys that are lists we expect many nodes to append to.

We have many requirements for processes to be executed in parallel, and there are multiple types of parallelism.
Parallelism of Identical Flows:
Map-Reduce Branches

An example of this type of parallelism in our flow is the validation of retrieved documents, i.e., testing each document in the list of retrieved documents for relevance to the question. Naturally, one wants to run these tests in parallel, and LangGraph's Map-Reduce branches work quite well for us in these situations:

Above, the bold-face state key is the one being updated during the fan-out, and italic keys refer to fan-out node-internal variables.
Parallelism of Distinct Flow Segments:
Extensive Usage of Subgraphs

In the following situation, imagine B1 and B2 each take 5s to execute, whereas C takes 8s. In the scenario on the left, D actually starts 13s after A has completed, because B2 only starts once both B1 and C are completed. In the scenario on the right, on the other hand, D starts 10s after A completed. Wrapping B1 and B2 into a subgraph ensures that, from the parent graph's perspective, there is one node on the left and one node on the right, and B2's execution does not wait for C's completion. (Note: we always use subgraphs as nodes within the parent graph, rather than invoking a subgraph within a node of the parent.)

We do have plenty of repeated flow segments that consist of multiple nodes. One example is the Extended Search Retrieval, where documents for a given (sub-)question are retrieved. At the core, the process consists of a Search, followed by a Relevance Validation of each retrieved doc with respect to the question, and concludes with a reranking of the validated documents. To make this repeated process efficient, we wrap it into a subgraph, which is then used either by the main graph or other subgraphs. One needs to be careful, as always, with the definition and sharing of the keys between the parent graph and the subgraph.
Node Structure
We generally follow a one-action-per-node convention, though one could easily reduce node sprawl by putting more consecutive actions into one node. We use a formatting node at the end, whose role is to convert data into the desired key update.

Streaming
LangGraph versions
Lastly, here is our current graph (x-ray level set to 1 to limit complexity, so a number of the nodes are actually subgraphs which may contain further subgraphs):

It is quite evident that this flow bears a strong resemblance to the logical flow we laid out at the beginning, with a few additions to facilitate the Basic Search flow when Agent Search is not selected.
We see this implementation as a first step, and we plan on expanding the flow to become substantially more agentic in the near future. LangGraph certainly has thus far been a good fit for our needs.
We invite you to check out agent-search on GitHub, book a demo, try out our cloud version for free, and join the #agent-search channels on Slack and Discord to discuss Onyx, search more broadly, and agents!