This article is part of our coverage of the latest in AI research.
One of the most powerful applications of large language models (LLMs) is retrieval augmented generation (RAG), where the model uses contextual information to answer questions beyond its training data. GEAR, a new retrieval technique from researchers at Huawei, promises to improve RAG applications for complex queries that require gathering information from multiple documents.
Classic RAG is designed to match the input prompt against a set of documents. This approach works well for simple queries. However, some queries require multiple retrieval and reasoning steps before the final answer can be formulated. (Example: “Did Michael Jordan and Scottie Pippen ever play against each other in the NBA?” Answering this requires the model to retrieve each player’s career history separately, determine if and when they played on different teams, check whether those periods overlapped, and verify that both were on the court when their teams faced off.)
Classic RAG struggles with such multi-hop question-answering tasks, and there are multiple efforts to address them. One popular approach is to create graph representations from the retrieved documents and use an LLM to reason over the graph (see the toy example below). However, this approach usually results in very long prompts and multiple LLM calls before reaching the answer.
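To make that concrete, here is a toy illustration, entirely my own and not from the paper, of what such a graph representation looks like: facts are stored as (subject, relation, object) triples, and the triples are serialized into the prompt so the model can connect information across documents.

```python
# Toy illustration (mine, not from the paper): facts extracted from
# retrieved passages, stored as (subject, relation, object) triples.
triples = [
    ("Michael Jordan", "played for", "Chicago Bulls"),
    ("Michael Jordan", "played for", "Washington Wizards"),
    ("Scottie Pippen", "played for", "Chicago Bulls"),
    ("Scottie Pippen", "played for", "Portland Trail Blazers"),
]

# Together, the triples form a graph: entities are nodes, relations are
# edges. Graph-based RAG serializes them into the prompt so the LLM can
# reason over connections that span multiple documents.
prompt = "Facts:\n" + "\n".join(f"- {s} {r} {o}." for s, r, o in triples)
prompt += (
    "\n\nQuestion: Did Michael Jordan and Scottie Pippen "
    "ever play against each other in the NBA?"
)
print(prompt)
```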
Graph-enhanced Agent for Retrieval-augmented generation (GEAR), the approach developed by Huawei, improves graph-based RAG through several novel techniques.
The key component is Synchronized Graph Expansion (SyncGE), a graph-based method for retrieving relevant passages for queries. The main goal of SyncGE is to create a graph representation for the given query. SyncGE first uses a simple retriever (e.g., BM25) to obtain a set of relevant passages. It then uses an LLM to summarize the passages into knowledge triples, the standard way to store graph information. Next, it uses a beam search algorithm to select the triples most relevant to the query and build a graph representation. Finally, it attaches this graph data to the retrieved passages to make it easier for the LLM to reason over the documents and answer complex questions. The sketch below shows how these stages might fit together.
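All three components in this sketch, the retriever, the triple extractor, and the beam-search selector, are simplified stand-ins I wrote for illustration; the actual GEAR implementation uses BM25, an LLM, and a graph-aware beam search.

```python
# Minimal sketch of a SyncGE-style single-step retrieval flow. All three
# components are simplified stand-ins, not the paper's implementation.

def retrieve_passages(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Stand-in for a sparse retriever such as BM25: rank passages by
    term overlap with the query and keep the top k."""
    q_terms = set(query.lower().split())
    return sorted(corpus, key=lambda p: -len(q_terms & set(p.lower().split())))[:k]

def extract_triples(passage: str) -> list[tuple[str, str, str]]:
    """Placeholder for the LLM call that distills a passage into
    (subject, relation, object) triples; returns a canned result here."""
    return [("Michael Jordan", "played for", "Washington Wizards")]

def select_triples(query: str, triples: list, beam_width: int = 2) -> list:
    """Toy stand-in for GEAR's beam search: keep the triples whose
    words overlap most with the query."""
    q_terms = set(query.lower().split())
    def score(t):
        return len(q_terms & set(" ".join(t).lower().split()))
    return sorted(triples, key=score, reverse=True)[:beam_width]

query = "Did Michael Jordan and Scottie Pippen ever play against each other?"
corpus = [
    "Michael Jordan returned to the NBA with the Washington Wizards in 2001.",
    "Scottie Pippen played for the Portland Trail Blazers from 1999 to 2003.",
    "The Chicago Bulls won six NBA championships in the 1990s.",
]

passages = retrieve_passages(query, corpus)
triples = [t for p in passages for t in extract_triples(p)]
graph = select_triples(query, triples)

# Attach the selected triples to the passages as extra context for the LLM.
context = "\n".join(passages) + "\nGraph facts: " + "; ".join(
    f"({s}, {r}, {o})" for s, r, o in graph
)
print(context)
```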
When the query is complex and requires multi-step retrieval, GEAR has a “multi-step extension”: an agent that iteratively interacts with the graph retriever to build up the passages and graph representation for the query. First, GEAR decomposes the original query into simpler sub-queries and uses SyncGE to retrieve the relevant passages and their corresponding graph representations. It then uses a “gist memory constructor” to iteratively build a unified graph representation of all the knowledge it has retrieved so far. As it builds up the gist memory, it checks with the LLM to see whether it has enough information to answer the query and terminates the process if it does. The loop looks roughly like the sketch below.
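Again, this is a hedged sketch with stub functions of my own; decompose, sync_ge, and enough_to_answer are illustrative names, not the paper’s API.

```python
# Hedged sketch of GEAR's multi-step extension: decompose the query,
# call the graph retriever (SyncGE) per sub-query, merge results into a
# "gist memory", and stop once the LLM judges the context sufficient.

def decompose(query: str) -> list[str]:
    """Placeholder for the LLM call that splits a complex query into
    simpler sub-queries."""
    return [
        "Which NBA teams did Michael Jordan play for, and when?",
        "Which NBA teams did Scottie Pippen play for, and when?",
    ]

def sync_ge(sub_query: str):
    """Placeholder for the SyncGE retriever sketched above: returns
    passages plus the graph triples selected for this sub-query."""
    return ["<retrieved passage>"], [("<subject>", "<relation>", "<object>")]

def enough_to_answer(query: str, memory: dict) -> bool:
    """Placeholder for the LLM check that decides whether the gist
    memory already holds the information needed to answer."""
    return len(memory["triples"]) >= 2

def gear_multi_step(query: str) -> dict:
    memory = {"passages": [], "triples": []}  # the unified gist memory
    for sub_query in decompose(query):
        passages, triples = sync_ge(sub_query)
        memory["passages"].extend(passages)
        memory["triples"].extend(triples)
        if enough_to_answer(query, memory):
            break  # terminate early once the context is sufficient
    return memory

memory = gear_multi_step(
    "Did Michael Jordan and Scottie Pippen ever play against each other in the NBA?"
)
```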
The combination of the graph retriever and the agent in the GEAR framework is inspired by the communication between the hippocampus and neocortex in the human brain. In fact, the work also takes many cues from HippoRAG, another graph retrieval technique that tries to mirror the workings of the brain.
The researchers evaluated GEAR on three multi-hop QA benchmarks and measured the retrieval and QA performance. They compared GEAR with several popular retrieval techniques, including Interleaving Retrieval with Chain-of-Thought (IRCoT) and HippoRAG.
According to their findings, SyncGE achieves state-of-the-art performance in single-step retrieval, outperforming other single-step retrievers such as BM25, Sentence-BERT (SBERT), a hybrid approach combining BM25 and SBERT, and HippoRAG. GEAR also achieves state-of-the-art performance across all tested datasets in multi-step retrieval, surpassing IRCoT and HippoRAG with IRCoT. Moreover, GEAR requires fewer iterations to build its knowledge graph because SyncGE is able to bridge passages across distant reasoning hops. This makes GEAR faster, more efficient, and less expensive than the alternatives because it consumes fewer tokens.
While there is a lot of discussion about RAG becoming irrelevant as the context windows of LLMs continue to grow, I think techniques like GEAR can turn out to be very useful. With so much data scattered across different documents, these graph-based techniques can help us search and discover insights in ways that would otherwise be impossible, slow, or very expensive.