
This article is part of our coverage of the latest in AI research.
Agent-based models (ABMs) are software that simulate the dynamics of populations. ABMs can help analyze past events, evaluate counterfactuals, and predict future effects of policies, assisting in policy questions by simulating how interventions affect individual behaviors and environmental dynamics.
But ABMs require realistic environments and expressive agents, which come at high computational cost and involve complex calibration processes. To address these challenges, researchers at MIT Media Lab have developed AgentTorch, a framework that can simulate millions of agents with large language models (LLMs).
LLM-powered agents have shown the potential to enable more general, adaptive, human-like behavior, which could benefit ABMs. But so far, LLM-agent simulations have been limited to tabletop games and small-population scenarios.
AgentTorch promises to enable complex dynamics and adaptive individual behavior for million-size populations without the need for specialized hardware. It supports differentiable implementations of continuous and discrete environments. Differentiable ABMs can be calibrated through gradient-assisted techniques and composed into end-to-end pipelines as opposed to systems that must be adjusted manually.
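The gradient-assisted calibration idea can be illustrated with a toy example: a one-parameter growth model is fit to observed case counts by gradient descent on a hand-derived gradient. This is a minimal sketch of the concept only; the model, parameter names, and hand-coded gradient are stand-ins, not AgentTorch's API, which differentiates through full simulations automatically.

```python
# Toy gradient-based calibration of a one-parameter epidemic-growth
# model, illustrating how a differentiable ABM can be fit to data.
# The model and its hand-derived gradient are illustrative stand-ins.

def simulate(beta, i0=10.0, steps=5):
    # Infections grow by a factor (1 + beta) each step.
    return [i0 * (1.0 + beta) ** t for t in range(steps)]

def loss_and_grad(beta, observed, i0=10.0):
    # Mean squared error and its analytic derivative w.r.t. beta.
    preds = simulate(beta, i0, len(observed))
    loss, grad = 0.0, 0.0
    for t, (p, o) in enumerate(zip(preds, observed)):
        loss += (p - o) ** 2
        # d/dbeta [i0 * (1 + beta)^t] = i0 * t * (1 + beta)^(t - 1)
        grad += 2 * (p - o) * i0 * t * (1.0 + beta) ** (t - 1)
    n = len(observed)
    return loss / n, grad / n

# "Observed" data generated with a ground-truth beta of 0.3.
observed = simulate(0.3)

beta, lr = 0.1, 1e-4  # initial guess and learning rate
for _ in range(2000):
    loss, grad = loss_and_grad(beta, observed)
    beta -= lr * grad

print(f"calibrated beta ~ {beta:.3f}")  # prints: calibrated beta ~ 0.300
```

In a differentiable ABM framework the gradient is obtained by automatic differentiation rather than derived by hand, which is what lets calibration be composed into end-to-end pipelines.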
According to a paper released by the MIT researchers, AgentTorch’s design is motivated by active collaborations worldwide. AgentTorch models are helping mitigate a measles outbreak in New Zealand, capture the foraging behavior of migratory birds in Alaska, and analyze the dairy supply chain in the Pacific islands to safeguard against a potential H5N1 outbreak.
To validate the effectiveness of AgentTorch, the researchers used the framework to simulate the dynamics of Covid-19 and its effects on population-wide isolation and employment behaviors in New York City from August 2020 to July 2022.
They used “LLM archetypes” to simulate 8.4 million agents representing the population of New York. Archetypes assume that many agents will have similar behaviors and that the number of distinct behaviors is typically much smaller than the number of agents. Therefore, the LLM only needs to be queried once per archetype rather than once for each individual agent.
For example, each combination of demographic factors (age, sex, geolocation, etc.), disease dynamics (e.g., changes in number of cases), and external factors (e.g., stimulus payments) can be considered as one archetype whose behavior approximates that of the population. This archetype configuration is shown in the prompt template they used for their ABM.
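The archetype idea above can be sketched as follows: agents are bucketed by shared attributes, the LLM is queried once per bucket, and the result is broadcast to every member. The attribute names, the context fields, and the stubbed `query_llm` function are illustrative assumptions, not the researchers' actual prompt template.

```python
# Illustrative sketch of LLM archetypes: group agents by shared
# attributes and query the LLM once per group, not once per agent.
from collections import defaultdict
import random

random.seed(0)

# A synthetic population: each agent is a dict of attributes.
population = [
    {"age_band": random.choice(["18-34", "35-64", "65+"]),
     "borough": random.choice(["Bronx", "Brooklyn", "Queens"])}
    for _ in range(100_000)
]

def query_llm(archetype, context):
    # Stand-in for a real LLM call that would return, e.g., an
    # isolation probability. Fixed value keeps the sketch runnable.
    return 0.5

# 1. Bucket agents by archetype key.
buckets = defaultdict(list)
for agent in population:
    key = (agent["age_band"], agent["borough"])
    buckets[key].append(agent)

# 2. One LLM query per archetype instead of one per agent.
context = {"new_cases": 1200, "stimulus_payment": True}
behavior = {key: query_llm(key, context) for key in buckets}

# 3. Broadcast each archetype's behavior to its member agents.
for key, agents in buckets.items():
    for agent in agents:
        agent["p_isolate"] = behavior[key]

print(f"{len(population)} agents served by {len(buckets)} LLM queries")
# prints: 100000 agents served by 9 LLM queries
```

With three age bands and three boroughs, 100,000 agents are covered by just nine LLM calls, which is what makes the approach scale to million-size populations.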
They tried different combinations of LLM archetypes and compared the simulation results with census information published by the government from December 2020 to March 2021, when the second round of stimulus payments was handed out across the U.S.
“We observe that the correlation between the output predicted by LLM-agents and the observed data increases as we add more contextual information to the prompt, demonstrating that it is possible to tailor the behavior of LLM-agents by passing contextual information into their input prompts,” the researchers write.
Their findings show that LLMs can simulate the collective behavior of three of New York’s five boroughs with high accuracy; those boroughs account for roughly 5 million people with very diverse demographic and economic profiles. Compared to other ABM techniques, LLM archetypes achieve lower error rates when forecasting infections and unemployment rates. More importantly, LLM archetypes have significantly lower computational costs, making it possible to apply them to very large populations.
While the technique is promising, it is not without limitations. Among the key problems the researchers point to are the potentially inconsistent or biased outputs of LLMs.
“Ensuring the robustness and fairness of LLM-driven agents remains an open challenge,” they write.
Another problem is that the scalability of LLM archetypes can come at the cost of lower diversity among individual agents. Finally, their current experiments with LLM-based agents are limited to simple actions.
“Despite these limitations, we believe AgentTorch represents a significant step forward in agent-based modeling, opening up new possibilities for understanding and addressing societal challenges,” they write.