Comments on: StreamingLLM gives language models unlimited context
https://bdtechtalks.com/2023/11/27/streamingllm/

By: Ben Dickson, Wed, 29 Nov 2023 21:26:10 +0000
https://bdtechtalks.com/2023/11/27/streamingllm/comment-page-1/#comment-37280
In reply to Andy Tenland.

That is what I meant. It enables you to continue your conversation with the LLM past the context window, though, as you said, it sticks to the length of the context window (e.g., 4k tokens). That's what the article says too, if you read it carefully.

By: Andy Tenland, Wed, 29 Nov 2023 15:12:37 +0000
https://bdtechtalks.com/2023/11/27/streamingllm/comment-page-1/#comment-37279
In reply to Ben Dickson.

Your explanation is inaccurate. It does not change the context window in any way. If the LLM has a 4k context window, it can only respond using the context of the latest 4k tokens. StreamingLLM makes LLMs more efficient by removing the need to reset the cache, and it improves accuracy compared to LLMs that don't reset their cache. It doesn't make it so that an LLM with a 4k context window can accurately respond to a 128k-token prompt. This article is spreading misinformation. Read the FAQ section here: https://github.com/mit-han-lab/streaming-llm
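To make this point concrete, here is a minimal, self-contained Python sketch of the eviction policy the StreamingLLM paper describes (the function and parameter names are hypothetical; this is not the mit-han-lab repo's actual code): keep the first few "attention sink" tokens plus a recent window and drop the middle, so the model never attends to more than its original window.

```python
def evict_kv_cache(cache, num_sinks=4, window=4092):
    """Toy version of the attention-sink eviction policy.

    `cache` stands in for a list of per-token KV entries, oldest first.
    Keep the first `num_sinks` entries (the "attention sinks") plus the
    most recent `window` entries, dropping the middle. The model never
    attends to more than num_sinks + window tokens: the context window
    is not expanded, the cache is just maintained instead of reset.
    """
    if len(cache) <= num_sinks + window:
        return cache
    return cache[:num_sinks] + cache[-window:]

# A 4k-window model fed 10,000 tokens still attends to only 4,096 of them:
kv = list(range(10_000))
kv = evict_kv_cache(kv)
assert len(kv) == 4_096 and kv[:4] == [0, 1, 2, 3]
```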

By: Ben Dickson, Wed, 29 Nov 2023 06:20:51 +0000
https://bdtechtalks.com/2023/11/27/streamingllm/comment-page-1/#comment-37274
In reply to Jonathan Hostetler.

Hi Jonathan. StreamingLLM does not change the architecture of the model to expand the context window. What it does is shift the context window while maintaining accuracy and reusing part of the KV cache. So basically, you can extend the conversation with the LLM to millions of tokens as if its context window were unlimited, but without making any changes to the model or retraining it. I hope this helps.
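To illustrate the "shifting" behavior: below is a toy sketch (assumed names and parameters, not the repo's API) of how the rolling cache stays bounded while a conversation runs to millions of tokens. Per the paper, attention positions are assigned by a token's slot within the cache rather than by its index in the original stream.

```python
from collections import deque

def rolling_cache_demo(stream_len, num_sinks=4, window=4092):
    """Simulate StreamingLLM-style cache maintenance over a long stream.

    Integers stand in for per-token KV entries. The sinks are kept
    forever; the deque automatically evicts the oldest "recent" token
    once the window is full. The cache never exceeds num_sinks + window
    entries, so generation can continue indefinitely, but the model only
    ever "sees" the sinks plus the latest window, not the whole stream.
    """
    sinks = []
    recent = deque(maxlen=window)
    for i in range(stream_len):
        if len(sinks) < num_sinks:
            sinks.append(i)
        else:
            recent.append(i)
    # Attention positions would be slots 0..4095 within this cache,
    # not the original stream indices stored in the entries.
    return len(sinks) + len(recent)

print(rolling_cache_demo(3_000_000))  # prints 4096
```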

By: Jonathan Hostetler, Wed, 29 Nov 2023 03:41:31 +0000
https://bdtechtalks.com/2023/11/27/streamingllm/comment-page-1/#comment-37273

This seems amazing, but I'm a bit confused. From what I understand you to be saying in this article, StreamingLLM could expand the context window of an LLM such as Llama to 4 million tokens, meaning I could hypothetically input 3 million words. However, the GitHub page explicitly says that it does not expand the context window. Am I missing something?
