This article is part of our coverage of the latest in AI research.
In a few years, large language models (LLMs) have evolved from processing a few thousand tokens to millions. These ever-longer context windows are unlocking new applications and easier ways to adapt models to custom tasks.
According to a recent study by researchers at Carnegie Mellon University and Tel Aviv University, in-context learning (ICL) with long-context models can achieve performance levels comparable to or even exceeding those of fine-tuned models, especially when dealing with large datasets.
The findings suggest that ICL with long-context LLMs can help product teams create prototypes and full applications without using resource-intensive and time-consuming techniques.
Few- and many-shot in-context learning
You can adapt LLMs to new tasks without retraining or fine-tuning them by using their in-context learning abilities: insert examples of problem-solution pairs into the prompt, and the model picks up the solution pattern and applies it to similar problems.
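To make this concrete, here is a minimal sketch of few-shot ICL for sentiment classification. The demonstration pairs, the model name, and the OpenAI-style chat call are illustrative assumptions, not part of the study.

```python
# Minimal few-shot ICL sketch: demonstration pairs are placed directly in the prompt.
# The examples and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

demonstrations = [
    ("The battery died after two hours.", "negative"),
    ("Setup took thirty seconds and it just works.", "positive"),
    ("Screen is fine, speakers are tinny.", "mixed"),
]

def build_prompt(query: str) -> str:
    # Each problem-solution pair becomes one "Review -> Label" block.
    lines = [f"Review: {text}\nLabel: {label}" for text, label in demonstrations]
    lines.append(f"Review: {query}\nLabel:")
    return "\n\n".join(lines)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name; any chat model works
    messages=[{"role": "user", "content": build_prompt("Great camera, terrible support.")}],
)
print(response.choices[0].message.content)
```

The model never sees gradient updates; everything it learns about the task lives in the prompt.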
The number of ICL examples a model supports depends on the length of its context window. For example, early versions of GPT-3 supported around 2,000 tokens, which left room for only a handful of ICL examples. Even so, early studies showed that you could get these models to perform plenty of new tasks with few-shot ICL.
GPT-4, by contrast, supports up to 128,000 tokens, and Google’s Gemini 1.5 Pro will support 2 million. These models make many-shot ICL possible, with hundreds or even thousands of examples in the prompt.
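As a rough back-of-the-envelope check, you can estimate how many demonstrations fit in a given context window with a tokenizer. The sketch below uses tiktoken's cl100k_base encoding and a made-up per-example format; exact counts depend on your model and data.

```python
# Rough estimate of how many ICL demonstrations fit in a given context window.
# Uses tiktoken's cl100k_base encoding; exact counts vary by model and tokenizer.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def demos_that_fit(examples, context_window: int, reserve: int = 1000) -> int:
    """Count how many formatted examples fit, reserving room for instructions and the query."""
    budget = context_window - reserve
    used, count = 0, 0
    for text, label in examples:
        tokens = len(enc.encode(f"Review: {text}\nLabel: {label}\n\n"))
        if used + tokens > budget:
            break
        used += tokens
        count += 1
    return count

examples = [("some review text " * 20, "positive")] * 100_000  # placeholder data
print(demos_that_fit(examples, 2_000))    # early GPT-3-sized window: a handful of examples
print(demos_that_fit(examples, 128_000))  # long-context window: on the order of a thousand
```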
A recent Google study explores the impressive abilities of many-shot ICL in teaching LLMs new tasks or overriding their learned biases. However, that study focused solely on Gemini Pro, which makes it hard to compare the results against other models.
Many-shot ICL vs retrieval and fine-tuning
In their new study, the researchers at Carnegie Mellon and Tel Aviv Universities experimented with open models. They used different versions of Llama-2 7B with context windows of up to 80,000 tokens and the 32k version of Mistral-7B.
Their experiments covered several classification datasets. The goal was to see how far ICL alone can improve the model’s ability to classify unseen examples. They compared many-shot ICL with retrieval-augmented generation (RAG) and low-rank adaptation (LoRA), an LLM fine-tuning method that reduces memory and compute requirements.
Their findings show that scaling up ICL to many examples yields strong results. As they increased the number of ICL examples from 10 to 1,000 demonstrations, they achieved performance gains of up to 50.8 points.
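In code, that kind of scaling sweep can look something like the sketch below: sample k demonstrations, pack them into one long prompt, and measure accuracy on held-out examples. The dataset, the formatting, and classify_fn are placeholders, not the paper's exact setup.

```python
# Sketch of a many-shot scaling sweep: sample k demonstrations, build one long prompt,
# and measure accuracy on held-out items. Dataset and model call are placeholders.
import random

def format_demos(demos):
    return "\n\n".join(f"Text: {t}\nLabel: {l}" for t, l in demos)

def evaluate(k: int, train, test, classify_fn) -> float:
    demos = random.sample(train, k)
    prefix = format_demos(demos)
    correct = 0
    for text, gold in test:
        prompt = f"{prefix}\n\nText: {text}\nLabel:"
        prediction = classify_fn(prompt)  # any LLM call that returns a label string
        correct += int(prediction.strip() == gold)
    return correct / len(test)

# for k in (10, 100, 1000):
#     print(k, evaluate(k, train_set, test_set, classify_fn))
```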
When you can only include a few ICL examples, retrieving demonstrations relevant to each query (as RAG does) outperforms random sampling. But as you add more examples, the choice of selection strategy matters less. This can help you test proofs of concept without setting up a retrieval pipeline.
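Here is a sketch of the two selection strategies side by side. TF-IDF nearest neighbors stands in for the retriever; that choice, and the toy data, are my assumptions rather than the paper's exact pipeline.

```python
# Two ways to pick ICL demonstrations for a query: retrieval vs. random sampling.
# TF-IDF nearest neighbors stands in for the retriever; it is an illustrative choice.
import random
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

train_texts = ["great phone", "battery is awful", "shipping was slow", "love the screen"]
train_labels = ["positive", "negative", "negative", "positive"]

vectorizer = TfidfVectorizer().fit(train_texts)
index = NearestNeighbors(n_neighbors=2).fit(vectorizer.transform(train_texts))

def retrieved_demos(query: str):
    # Pick the training examples most similar to the query.
    _, idx = index.kneighbors(vectorizer.transform([query]))
    return [(train_texts[i], train_labels[i]) for i in idx[0]]

def random_demos(k: int = 2):
    # Baseline: pick examples uniformly at random.
    return random.sample(list(zip(train_texts, train_labels)), k)

print(retrieved_demos("the screen looks amazing"))
print(random_demos())
```

With only a couple of slots in the prompt, the retrieved examples matter; with a thousand slots, random sampling catches up.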
With a small set of examples, ICL generally outperforms LoRA fine-tuning. When label spaces are larger, fine-tuning performance drops relative to ICL, the researchers found, “likely because these are more open-ended classification problems and require more data to train the classifier.” However, the inference costs of fine-tuned models are considerably smaller than those of many-shot ICL.
Another interesting observation: as the number of examples grows, the ordering of the examples starts to have a dramatic impact on performance. With many ICL examples, grouping them by label degrades the model’s performance compared to random ordering.
“This suggests that contextualization of examples with different labels is important to performance, and that this contextualization only occurs effectively over relatively short distances in the context window,” the researchers write.
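To see what “sorted by label” means in practice, the sketch below builds the same demonstration set in a grouped-by-label order and in a shuffled order; the formatting and examples are made up for illustration.

```python
# The same demonstration set ordered two ways: grouped by label vs. shuffled.
# The study finds the grouped ordering hurts many-shot performance relative to random order.
import random

demos = [
    ("battery died fast", "negative"),
    ("love the screen", "positive"),
    ("support never replied", "negative"),
    ("setup was painless", "positive"),
]

sorted_by_label = sorted(demos, key=lambda pair: pair[1])  # all negatives, then all positives
shuffled = random.sample(demos, len(demos))                # labels interleaved at random

def to_prompt(ordered):
    return "\n\n".join(f"Text: {t}\nLabel: {l}" for t, l in ordered)

print(to_prompt(sorted_by_label))
print("---")
print(to_prompt(shuffled))
```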
What does it mean for LLM applications?
Many-shot ICL has important implications for developing LLM applications. The general trend in LLM advances has been to lower the barrier to creating machine learning applications. For example, building a sentiment analysis model used to require a team of ML experts and weeks of training and testing. Now you can do it with simple prompt engineering on a pre-trained LLM.
Many-shot ICL lowers that barrier further. Previously, if the model couldn’t handle your task out of the box with zero-shot prompting, you usually had to either fine-tune it or set up a RAG pipeline to provide the necessary contextual information.
Now, thanks to many-shot ICL, you can just dump all your documents or demonstrations into the context window and work on your prompt. This saves time when creating prototypes and proofs of concept. It also enables product managers without deep machine learning or coding experience to create and iterate on their own prototypes.
However, once you reach product-market fit and need to optimize your LLM application for scale, you will still need to do everything you can to cut costs and increase speed. Many-shot ICL is expensive if you are paying per token, and it can slow inference and increase memory requirements if you host your own models.
This is where the other techniques come in. A simple RAG pipeline can considerably reduce your token consumption. Fine-tuning enables the model to answer zero-shot, without extra tokens of context. Fine-tuning techniques such as Adapt-LLM let the model choose between RAG and its in-memory knowledge based on its confidence in the topic. Eventually, you can move to more advanced setups, such as custom bi-directional encoders that perform tasks like classification more efficiently, without a memory-intensive LLM.
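If you do get to the point where fine-tuning pays off, a LoRA setup can be quite small. Below is a minimal sketch using Hugging Face’s peft library; the base model, rank, and target modules are illustrative assumptions, not recommendations from the study.

```python
# Minimal LoRA fine-tuning setup with Hugging Face peft.
# Base model name, rank, and target modules are illustrative assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"  # assumed base model; requires access approval
model = AutoModelForCausalLM.from_pretrained(base)

config = LoraConfig(
    r=16,                                  # low-rank dimension
    lora_alpha=32,                         # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections in Llama-style models
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, config)
model.print_trainable_parameters()  # only a small fraction of weights are trainable
# ...train with your usual Trainer and dataset; at inference, no in-context examples are needed.
```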
All of these tools and techniques will serve you in one way or another. But with advances such as many-shot ICL, getting started with LLM applications has never been easier.