PAS finds the best prompting technique for your LLM


This article is part of our coverage of the latest in AI research.

Prompt engineering is a class of techniques that improve the capabilities of large language models (LLMs). But matching the right prompt engineering technique to a given prompt requires specialized knowledge and can be time-consuming. 

Researchers at Peking University and Baichuan have introduced Prompt Augmentation System (PAS), a technique that improves the performance of LLMs by automatically choosing the best prompting technique for each input. 

PAS is plug-and-play and is compatible with various LLMs, making it much easier for users to take advantage of prompt engineering.

The challenges of prompt engineering

While LLMs can generate impressive results with simple prompts, they can be quite sensitive to the way the prompt is phrased. A slight change in the wording of a prompt can significantly change the output of an LLM, even when the user’s intent remains the same.

Prompt engineering techniques aim to find optimal ways to phrase requests to LLMs to improve their performance on different tasks. There are many different prompt engineering techniques, such as few-shot learning, chain-of-thought prompting, ReAct, and tree of thought. Each technique has its strengths and weaknesses, and choosing the optimal prompting method often depends on the specific task and the LLM.
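To make the differences concrete, here is a toy illustration of how the same question might be phrased under three common techniques. The wording of these templates is hypothetical and is not taken from the PAS paper:

```python
# Illustrative prompt templates for three common prompting techniques.
# The phrasing here is a made-up example, not the paper's wording.

question = "A train travels 60 km in 45 minutes. What is its speed in km/h?"

# Zero-shot: the raw question, with no added scaffolding.
zero_shot = question

# Few-shot: prepend worked examples so the model imitates the pattern.
few_shot = (
    "Q: A car travels 100 km in 2 hours. What is its speed in km/h?\n"
    "A: 50 km/h\n"
    f"Q: {question}\n"
    "A:"
)

# Chain-of-thought: nudge the model to reason step by step.
chain_of_thought = f"{question}\nLet's think step by step."
```

Each template can help on some tasks and hurt on others, which is exactly the selection problem that automated prompt engineering tries to solve.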

Automated prompt engineering (APE) tries to address this challenge by using a model that automatically finds the best prompting technique for each input.

There are various APE models, but they face several challenges: they need large amounts of manually labeled examples, lack flexibility across different tasks and LLMs, and rely on evaluation metrics that don’t account for human feedback.

Prompt Augmentation System

The researchers present Prompt Augmentation System (PAS) as a “plug-and-play” APE system that aims to overcome the challenges of previous approaches. Instead of requiring users to write prompts from scratch, PAS augments their input with the instructions and prompt engineering techniques needed to get the best results. 

Automated prompt engineering with PAS (source: arXiv)

PAS is composed of two main parts: a prompt complementary dataset and an LLM-based prompt augmentation model. The dataset contains pairs of raw prompts and their improved versions, which have been automatically generated and refined through a multi-step process. The prompt augmentation model is then trained on this dataset to learn how to enhance new prompts. 

To create the dataset, the researchers first collected high-quality prompts from two curated sources, the LMSYS-1M dataset and the WildChat dataset. They used embedding models and clustering algorithms to group the examples and remove duplicates and prompts that are too similar. They then used an LLM to select the highest-quality prompts and classify them into 14 categories (e.g., question answering and coding).
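The deduplication step can be sketched in a few lines: embed each prompt, then drop any prompt that is too similar to one already kept. A toy bag-of-words embedding stands in for the real embedding model, and the threshold is an assumption for illustration:

```python
# Sketch of embedding-based deduplication. A bag-of-words Counter stands
# in for a real embedding model; the 0.9 threshold is an assumption.
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy "embedding": word counts (a real system would use a neural model).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def deduplicate(prompts, threshold=0.9):
    kept, vecs = [], []
    for p in prompts:
        v = embed(p)
        # Keep the prompt only if it is not too close to anything kept so far.
        if all(cosine(v, kv) < threshold for kv in vecs):
            kept.append(p)
            vecs.append(v)
    return kept

prompts = [
    "Write a haiku about autumn",
    "Write a haiku about autumn leaves",  # near-duplicate, gets dropped
    "Explain binary search in Python",
]
deduped = deduplicate(prompts)
```

The clustering step in the paper serves the same purpose at scale: grouping similar prompts so that redundant ones can be pruned before quality filtering.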

Pipeline for gathering the complementary prompt dataset to train PAS models (source: arXiv)

Next, the researchers used few-shot learning techniques to automatically generate new prompts for each category, using a small set of examples as guidance. They used another few-shot prompt to evaluate the quality of the generated prompts. The low-quality examples were then sent back to the generation step for improvement.
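The generate-then-critique loop described above can be sketched as follows. Here `generate_complement` and `judge_quality` are hypothetical stand-ins for the few-shot LLM calls; they are stubbed so the control flow is runnable:

```python
# Sketch of the generate / evaluate / regenerate loop used to build the
# dataset. Both functions below are stubs standing in for few-shot LLM
# calls; their names and behavior are illustrative assumptions.

def generate_complement(prompt: str, attempt: int) -> str:
    # Stub: a real system would prompt an LLM with few-shot examples.
    return f"{prompt} (complement v{attempt})"

def judge_quality(complement: str) -> bool:
    # Stub: a real system would use a few-shot critique prompt.
    # Here we pretend the second attempt is the first to pass.
    return "v2" in complement

def build_pair(prompt: str, max_attempts: int = 3):
    for attempt in range(1, max_attempts + 1):
        complement = generate_complement(prompt, attempt)
        if judge_quality(complement):
            return prompt, complement
    return None  # discard prompts that never pass the critique

pair = build_pair("Summarize this article")
```

The key design choice is the feedback loop: low-quality generations are not discarded outright but routed back through generation, so the dataset converges toward high-quality pairs.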

The resulting prompt-complementary dataset, consisting of approximately 9,000 high-quality pairs, can be used to fine-tune a smaller LLM, which serves as the basis of the PAS model. 
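A training record in such a dataset pairs a raw prompt with its complement, typically stored one JSON object per line (JSONL) for fine-tuning pipelines. The field names below are assumptions for illustration, not the paper's schema:

```python
# Hypothetical example of a prompt-complementary training record.
# Field names are illustrative assumptions, not the paper's schema.
import json

record = {
    "raw_prompt": "Write a cover letter for a data analyst role",
    "complementary_prompt": (
        "Keep it under 300 words, highlight relevant skills, "
        "and use a professional but warm tone."
    ),
}

# One JSON object per line is the usual JSONL format for fine-tuning data.
line = json.dumps(record)
```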

When a user provides a raw prompt to PAS, the model automatically generates a complementary prompt that enhances the original input without altering it, guiding the target LLM toward producing more accurate and relevant outputs. One of the main advantages of PAS is that it can be plugged into any LLM application, whether it uses a closed API or an open-source model running on your own servers.
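The plug-and-play pattern amounts to a thin wrapper: the PAS model produces a complementary prompt, which is appended to the user's raw prompt before the combined text reaches the target LLM. Both model calls are stubbed below, and the function names are illustrative, not the paper's API:

```python
# Minimal sketch of the plug-and-play deployment pattern. Both models
# are stubbed; in practice pas_model is the fine-tuned augmentation
# model and target_llm is any closed-API or self-hosted LLM.

def pas_model(raw_prompt: str) -> str:
    # Stub for the fine-tuned prompt-augmentation model.
    return "Answer step by step and state any assumptions."

def target_llm(prompt: str) -> str:
    # Stub for the target LLM (e.g., a hosted API or local model).
    return f"[response to: {prompt}]"

def answer(raw_prompt: str) -> str:
    # The raw prompt is kept intact; the complement is appended to it.
    complement = pas_model(raw_prompt)
    return target_llm(f"{raw_prompt}\n{complement}")

out = answer("How do vaccines work?")
```

Because the augmentation happens entirely on the input side, the target model needs no retraining, which is what makes the approach model-agnostic.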

Process for generating new training examples for PAS models (source: arXiv)

“The primary advantage of such a system is its ability to seamlessly enhance the capabilities of existing LLMs without the need for extensive retraining or modification,” the researchers write.

PAS in action

The researchers evaluated PAS on three benchmarks: Arena-Hard, AlpacaEval 2.0, and AlpacaEval 2.0 (LC), the length-controlled variant of the latter. These benchmarks test the abilities of LLMs on a range of tasks, from language comprehension to complex reasoning problems.

They fine-tuned the 7-billion-parameter versions of Qwen2 and LLaMA-2 on the complementary prompt dataset and used them as PAS models to enhance the prompts provided to larger models such as GPT-4, Qwen2-72B, and Llama-3-70B.

Across all three benchmarks, the models guided by PAS consistently outperformed the baseline models without prompt augmentation. On average, PAS improved the performance scores by 8 points. Notably, when applied to GPT-4-0613, PAS led to an average improvement of 11.46 points.

“Overall, our PAS method not only outperforms the baseline but also consistently surpasses the previous SoTA model BPO, establishing its robustness and effectiveness as a fine-tuning strategy for enhancing prompt-based learning systems,” the researchers write.

The evaluations also showed that PAS achieves its performance with a significantly smaller dataset than other state-of-the-art methods. While previous methods required up to 170,000 examples for training, PAS achieves similar or better results with only 9,000 examples.

Importantly, PAS is both LLM-agnostic and task-agnostic, meaning it can be applied to different models and tasks without requiring task-specific customization. 

“Unlike other methods that require significant human intervention and have limitations in applicability across different LLMs and tasks, PAS provides a highly versatile and efficient solution,” the researchers write.

PAS can be a useful addition to LLM pipelines. As a small model optimized for a single task, prompt augmentation, it can help you get the most out of your larger, more capable models.
