How LLMs can automatically design agentic systems

Robot design
Image created with DALL-E 3

This article is part of our coverage of the latest in AI research.

Agentic systems are one of the most interesting areas for creating applications with large language models (LLMs). However, most current agentic systems are designed manually.

A new paper by researchers at the University of British Columbia and Vector Institute introduces Automated Design of Agentic Systems (ADAS), a new paradigm that allows LLMs to discover and design new agentic systems.

ADAS takes after other areas of research where machine learning is used to automate the design of new systems, such as AutoML and Neural Architecture Search (NAS). As the researchers point out in their paper, “the history of machine learning reveals a recurring theme: manually created artifacts become replaced by learned, more efficient solutions over time as we get more compute and data.”

The researchers wanted to verify whether the same could be done with agentic systems. To that end, they formulated agent design as a search problem.

ADAS revolves around using an algorithm to search the space of possible agentic systems and find the one that scores best on an evaluation function. It is the same basic principle that you see in other optimization problems: an end goal, a set of tunable parameters, and an algorithm that can search different combinations of those parameters.
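To make those three ingredients concrete, here is a minimal sketch of a conventional search over predefined parameters. It is not from the paper: the parameter names, candidate values, and the `evaluate` function are all hypothetical, and exhaustive grid search is used only because it is the simplest search algorithm to read.

```python
import itertools

def evaluate(params):
    # Hypothetical evaluation function (higher is better).
    # By construction it peaks at temperature=0.5, steps=3.
    return -(params["temperature"] - 0.5) ** 2 - (params["steps"] - 3) ** 2

# Tunable parameters and their candidate values (illustrative only).
search_space = {
    "temperature": [0.0, 0.5, 1.0],
    "steps": [1, 2, 3, 4],
}

def grid_search(space, score_fn):
    """Try every combination of values and return the best-scoring one."""
    names = list(space)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(space[name] for name in names)):
        candidate = dict(zip(names, values))
        score = score_fn(candidate)
        if score > best_score:
            best_params, best_score = candidate, score
    return best_params, best_score

best, score = grid_search(search_space, evaluate)
print(best)
```

The search here only ever visits the twelve combinations listed in `search_space`, which is the kind of fixed, predefined space that classic hyperparameter tuning works with.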

However, what makes ADAS unique is that its search space is code instead of a set of predefined hyperparameters. This means that the end result is a program for an agentic system, which could be a pipeline of prompts and tools. There are two key benefits to this approach. First, the result will be interpretable because you will be able to review the code line by line. And second, the ADAS system will be able to use existing tools such as LangChain and AutoGen, as well as LLM APIs and local models.
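As an illustration of what "a program for an agentic system" can look like, here is a small sketch of an agent written as plain code: two prompts wrapped around one tool call. Everything in it is an assumption made for the example, including the `calculator` tool, the prompt wording, and the `fake_llm` stub that stands in for a real model call. It does not reproduce an agent from the paper.

```python
import operator
from typing import Callable

LLM = Callable[[str], str]  # any function that maps a prompt to a completion

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def calculator(expression: str) -> str:
    """Toy tool: evaluate an expression of the form 'a op b'."""
    a, op, b = expression.split()
    return str(OPS[op](float(a), float(b)))

def agent(question: str, llm: LLM) -> str:
    # Step 1: ask the model to turn the question into a tool input.
    expression = llm(f"Rewrite as 'a op b': {question}")
    # Step 2: call the tool with the model's output.
    value = calculator(expression)
    # Step 3: ask the model to phrase the final answer.
    return llm(f"Question: {question}\nTool result: {value}\nAnswer:")

def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a real LLM call, for demonstration only."""
    if prompt.startswith("Rewrite"):
        return "6 * 7"
    tool_result = prompt.split("Tool result: ")[1].split("\n")[0]
    return "The answer is " + tool_result

print(agent("What is six times seven?", fake_llm))  # The answer is 42.0
```

Because the agent is ordinary code, each step can be read and audited line by line, and the `llm` argument can be replaced with a call to a hosted API or a local model without changing the pipeline.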

Automated Design of Agentic Systems
The framework for Automated Design of Agentic Systems (ADAS) (source: arXiv)

This means there are limitless possibilities to create and explore different agentic systems. However, this strength comes with a tradeoff. The search space is so large and complex that finding optimal solutions can be virtually impossible. Considering the tremendous number of building blocks yet to be discovered in agentic systems, it would take a long time to discover all of them.

To overcome this limitation, the researchers propose Meta Agent Search, an ADAS algorithm designed to find high-performing solutions efficiently. The core idea of Meta Agent Search is to use foundation models to iteratively program and improve new agents.

Meta Agent Search provides the LLM with information about the target task, coding frameworks and building blocks, and additional instructions. The prompt also includes instructions for the agent to write code and prompts to maximize performance on the task. After a new agent is generated, it is evaluated using the validation data from the target domain.

If the agent passes a certain threshold on the evaluation set, it is added to an archive, and the archive is included in the prompt for subsequent iterations. This creates a feedback loop in which the meta agent builds on its previous outputs. The meta agent is encouraged to search for interesting new ideas at each iteration to maximize discovery.
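The loop described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the `propose` callable stands in for the meta agent LLM, generated agents are assumed to be source code defining a `forward(question)` function, accuracy is the assumed metric, and the stub proposer just replays a fixed list of candidates instead of reading the archive.

```python
from typing import Callable

def load_agent(source: str) -> Callable[[str], str]:
    """Compile generated source code that must define `forward(question)`."""
    namespace: dict = {}
    exec(source, namespace)  # acceptable for this toy demo; sandbox untrusted code
    return namespace["forward"]

def accuracy(agent, validation_set) -> float:
    """Fraction of validation questions the agent answers exactly right."""
    correct = sum(agent(question) == answer for question, answer in validation_set)
    return correct / len(validation_set)

def meta_agent_search(propose, validation_set, iterations, threshold):
    archive = []  # (source, score) pairs that are fed back to the proposer
    for _ in range(iterations):
        source = propose(archive)  # the meta agent writes a new agent
        try:
            score = accuracy(load_agent(source), validation_set)
        except Exception:
            continue  # skip candidates that fail to compile or run
        if score >= threshold:
            archive.append((source, score))
    return archive

# Hypothetical candidates that a meta agent might emit, for demonstration only.
CANDIDATES = [
    "def forward(question):\n    return question",          # echoes the input
    "def forward(question):\n    return question.upper()",  # solves the toy task
    "def forward(question:\n    return 1",                  # does not compile
]

def make_stub_proposer(candidates):
    remaining = iter(candidates)
    def propose(archive):
        # A real meta agent would condition on `archive`; this stub ignores it.
        return next(remaining)
    return propose

validation = [("a", "A"), ("b", "B")]
archive = meta_agent_search(make_stub_proposer(CANDIDATES), validation,
                            iterations=3, threshold=0.5)
print(len(archive), archive[0][1])
```

In this toy run only the second candidate clears the threshold, so it is the only one that would be shown to the meta agent in later iterations.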

Meta Agent Search
Meta Agent Search framework (source: arXiv)

The researchers evaluated Meta Agent Search on several tasks, including the Abstract Reasoning Corpus (ARC), which evaluates the general intelligence of an AI system, and four popular benchmarks on reading comprehension, math, science questions, and multi-task problem-solving. They also tested whether the discovered agents transferred well to held-out domains and models.

As they had expected, the agents discovered through their ADAS algorithm outperformed the state-of-the-art manually designed agents. But some of the more interesting findings were about how the algorithm found new agents.

For example, on the ARC challenge, Meta Agent Search continued to innovate as its archive of previous discoveries grew. At one point, an important design pattern emerged: the discovered agents began using multiple chain-of-thought (CoT) steps to generate possible answers, refine them, and finally ensemble the best answers. This became a crucial stepping stone that subsequent designs tended to reuse.
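The generate, refine, and ensemble pattern can be sketched in a few lines. This is only an illustration of the general idea, not the agent that Meta Agent Search discovered: the prompt wording is invented, majority voting is one assumed way to ensemble answers, and `stub_llm` is a deterministic stand-in for a real model.

```python
from collections import Counter
from typing import Callable

def cot_refine_ensemble(question: str, llm: Callable[[str], str],
                        n_candidates: int = 3) -> str:
    # 1. Generate several independent chain-of-thought answers.
    drafts = [llm(f"[attempt {i}] Think step by step: {question}")
              for i in range(n_candidates)]
    # 2. Ask the model to review and correct each draft.
    refined = [llm(f"Review and correct this answer to '{question}': {draft}")
               for draft in drafts]
    # 3. Ensemble the refined answers by majority vote.
    return Counter(refined).most_common(1)[0][0]

def stub_llm(prompt: str) -> str:
    """Deterministic stand-in for a real LLM call, for demonstration only."""
    if prompt.startswith("[attempt "):
        index = int(prompt[len("[attempt "):].split("]")[0])
        return ["4", "5", "4 "][index]  # one wrong draft, one with stray whitespace
    # "Refinement" here just normalizes whitespace in the draft answer.
    return prompt.rsplit(": ", 1)[1].strip()

print(cot_refine_ensemble("What is 2 + 2?", stub_llm))  # 4
```

The vote tolerates one bad draft, which shows how this kind of structure can compensate for occasional mistakes by the underlying model.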

Meta agent search for ARC
Meta Agent Search managed to find a complex agent system for the ARC challenge that outperforms current manually designed systems (source: arXiv)

They also discovered that the performance of the ADAS algorithm is limited by the knowledge of the foundation model it uses. For example, for challenging questions in science and multi-task domains, the knowledge of the model is not sufficient to solve the questions, which limits the improvement through optimizing agentic systems. This problem should be alleviated as frontier models continue to be imbued with new knowledge. 

On the other hand, in domains such as reading comprehension and math, where the models have adequate knowledge to solve the questions, errors are mostly due to hallucinations or calculation mistakes. These are problems that can be mitigated through well-designed agentic systems, like the ones discovered by Meta Agent Search.

Another interesting finding is that the discovered agentic systems were general enough to transfer easily from one model to another. For example, an agentic system designed for GPT-3.5 could be ported to Claude Sonnet or GPT-4 and achieve higher performance. This means you can explore the agentic space with lower-cost models and then refine the discovered designs for a few final iterations on stronger models.

The agentic systems also transferred well, not only to similar domains but also to unrelated tasks, such as going from mathematics to reading comprehension. In most cases, the transferred ADAS-designed system outperformed the state-of-the-art manually designed agent for the new task, although it still underperformed an agent that ADAS designed specifically for that task. This suggests that an agent designed for one task can serve as a starting point for creating a specialized agent for another.
