
This article is part of our coverage of the latest in AI research.
Prompt engineering is essential for maximizing the effectiveness of large language models (LLMs). But crafting prompts that reliably get the model to perform a task as expected can be very challenging. A recent research paper proposes a new mental model for prompt engineering called “Method Actors,” which frames LLMs as actors, prompts as scripts, and LLM outputs as performances. This approach has shown promising results, significantly improving LLM performance on complex reasoning tasks.
The Method Actors model draws parallels between LLMs and actors. Both mimic human thought and emotion, and their success is often judged by the authenticity of their performance. Imagining LLMs as actors performing a role can better align a user’s expectations with the model’s capabilities.
This mental model leads to four key principles for prompt engineering and prompt architectures:
Prompt engineering is playwriting and directing. A prompt should provide the LLM with a character, motivation, setting, and stage directions, much like a script for an actor. Beyond simply assigning a role, the prompt should set up a narrative and provide clear instructions on the desired format and steps that the LLM should follow.
Performance requires preparation. Just as an actor prepares for a role, an LLM often requires background preparation for complex tasks. Since LLMs imitate the products of thinking rather than thinking itself, prompts for complex tasks should guide the LLM to produce the outputs of the “behind-the-scenes” thinking required. These intermediate outputs build up to the final performance.
Complex tasks should be decomposed to the point at which imitation and authenticity produce equivalent results. In practice, this means breaking a complex task into subtasks small enough that an LLM imitating the reasoning produces the same outcome as genuinely performing it. This decomposition enhances the accuracy and reliability of LLM responses.
Where imitation fails, compensate with methods that do not rely upon LLMs. When you can’t reach a point where imitation equals genuine execution, you can integrate external tools and methods into the broader LLM system to compensate for potential errors and hallucinations. This could involve using deterministic verifiers, retrieval systems, or other programming methods. A short prompt sketch after this list shows how these principles might look in practice.
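As a rough illustration, here is a minimal sketch of what a “Method Actor”-style prompt might look like. The character, setting, and stage directions below are invented for this example and are not the prompts used in the paper.

```python
# A minimal, hypothetical "Method Actor"-style prompt template: a character and
# setting (playwriting), explicit preparation steps (behind-the-scenes thinking),
# and a decomposed, scripted performance. The wording is invented for illustration.
ACTOR_PROMPT_TEMPLATE = """
You are an expert puzzle editor reviewing a submission minutes before it goes
to print. A mistake will reach millions of readers, so you work carefully.

Scene: your desk, the submitted puzzle laid out in front of you.

Preparation (write this out before answering):
1. List every word in the puzzle and note the different meanings you know for each.
2. Brainstorm at least six candidate groupings, including less obvious ones.

Performance:
3. Choose the four groupings you are most confident in and explain the connection behind each.
4. Output the final answer as four lines, one group of four words per line.

Puzzle words: {words}
"""

# The surrounding program fills in the actual puzzle words:
prompt = ACTOR_PROMPT_TEMPLATE.format(words="sixteen puzzle words, comma-separated")
```

The fourth principle lives outside the prompt: the surrounding program passes the model’s output to a deterministic checker or retrieval step rather than trusting it outright, as in the Actor-2 method described later.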
To evaluate the effectiveness of the Method Actors approach, the researchers used the New York Times Connections puzzle as a benchmark. This word puzzle requires players to sort sixteen words into four groups of four based on shared meanings, making it a suitable testbed for complex reasoning and pattern recognition.
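To make the benchmark concrete, here is a small sketch of how a Connections-style puzzle and a deterministic guess checker might look in code. The words, category labels, and `check_guess` function are invented for illustration; they are not the paper’s benchmark code.

```python
from typing import Dict, List, Set

# Hypothetical example puzzle: sixteen words to be sorted into four groups of
# four. The words and category labels are invented and not taken from the paper.
SOLUTION: Dict[str, Set[str]] = {
    "fish": {"bass", "flounder", "salmon", "trout"},
    "planets": {"mercury", "venus", "mars", "saturn"},
    "coins": {"nickel", "dime", "penny", "quarter"},
    "voice parts": {"alto", "tenor", "soprano", "baritone"},
}


def check_guess(guess: List[str]) -> str:
    """Deterministically score a single four-word guess against the solution."""
    guess_set = {word.lower() for word in guess}
    if len(guess_set) != 4:
        return "invalid: a guess must contain four distinct words"
    for category, members in SOLUTION.items():
        overlap = len(guess_set & members)
        if overlap == 4:
            return f"correct: {category}"
        if overlap == 3:
            return "one away"  # the near-miss hint the real game gives
    return "incorrect"


print(check_guess(["bass", "salmon", "trout", "flounder"]))  # correct: fish
print(check_guess(["bass", "salmon", "trout", "mercury"]))   # one away
```

The “correct,” “one away,” and “incorrect” responses mirror the feedback the actual game gives after each guess, and a deterministic checker like this is the kind of non-LLM method the fourth principle calls for.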
The researchers compared several prompting techniques using GPT-4o:
Vanilla: The model is given the basic instructions about the task and told to solve the problem.
Chain of thought (CoT): In addition to the basic instructions, the model is assigned a role and instructed to solve the problem “step by step,” forcing it to generate a reasoning chain before the answer.
CoT scripted: This prompt adds to classic CoT a set of example problem/solution pairs, each with its reasoning chain, and instructs the model to solve the new problem in the same sequence as the provided examples.
Actor: This method takes full advantage of the “Method Actor” principles. For example, the role definition is much more detailed, including imaginary scenes and settings that highlight the sensitivity of the task and why solving it matters. It also breaks the problem down into multiple stages instead of trying to solve it in one pass: a brainstorming stage, where the model generates potential answers based on different solution templates, and a discernment stage, which judges the results of the brainstormed solutions. The prompt for each stage is scripted to draw out the LLM’s acting abilities.
Actor-2: This method builds on Actor and adds external tools to overcome potential hallucinations the LLM might introduce when generating and evaluating answers. For example, you can use a deterministic verifier to validate the candidate solutions provided by the model and give feedback, or a retrieval system to supply additional facts the model can check its answers against. A simplified sketch of such a pipeline appears below.
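To show how a pipeline like Actor-2 might be wired together, here is a simplified sketch with a brainstorming stage, a discernment stage, and an external verification loop. The prompts, function names, and the `call_llm` helper are hypothetical; they illustrate the structure rather than reproduce the paper’s implementation.

```python
from typing import Callable, List

# Hypothetical LLM call. In practice this would wrap an API client; its
# signature here is an assumption made for illustration.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")


def brainstorm_stage(words: List[str], n_candidates: int = 3) -> List[str]:
    """Actor stage 1: generate several candidate solutions from different angles."""
    prompt = (
        "You are a meticulous puzzle editor. Propose one complete grouping of "
        f"these sixteen words into four groups of four: {', '.join(words)}. "
        "Explain the connection behind each group before stating it."
    )
    return [call_llm(prompt) for _ in range(n_candidates)]


def discernment_stage(candidates: List[str]) -> str:
    """Actor stage 2: a separate pass judges the brainstormed solutions."""
    drafts = "\n\n---\n\n".join(candidates)
    prompt = (
        "You are reviewing draft solutions from junior editors. Compare the "
        "drafts below, point out inconsistencies, and output the single best "
        f"grouping.\n\n{drafts}"
    )
    return call_llm(prompt)


def actor2_solve(words: List[str], verify: Callable[[str], bool], max_rounds: int = 3) -> str:
    """Actor-2: wrap the LLM stages in an external, deterministic verification loop."""
    answer = ""
    for _ in range(max_rounds):
        candidates = brainstorm_stage(words)
        answer = discernment_stage(candidates)
        # The verifier is ordinary code, e.g. it checks that the answer uses
        # exactly the sixteen puzzle words, with no repeats and the right format.
        if verify(answer):
            return answer
    return answer  # fall back to the last attempt if verification keeps failing
```

The design choice worth noting is that the verification step never calls the model: checks like “every puzzle word appears exactly once” are deterministic, so they can catch the kinds of slips an LLM is prone to make.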
The results of the experiments clearly demonstrated the superiority of the Method Actors approach. The vanilla approach solved only 27% of the puzzles, with 12% solved perfectly. Chain-of-thought prompting improved the success rate to 41%, with 20% perfect solutions. However, the Method Actors approach significantly outperformed both, solving 78% of the puzzles with 41% solved perfectly. By incorporating external tools in Actor-2, the performance further increased to 86% of puzzles solved, with 50% perfect solutions. The results were even more impressive with OpenAI’s o1 model, which uses inference-time scaling to reason more about problems and provide more accurate results.
These findings suggest that the Method Actors model can substantially improve LLM performance on complex reasoning tasks. By treating LLMs as actors and prompts as scripts, we can leverage their ability to mimic human-like reasoning processes and achieve more accurate and reliable results. The researchers suggest that future work can evaluate how this mental model affects LLM performance in other domains and how novel mental models can lead to unique and effective prompting methods.