Adversarial pop-ups trick AI agents into clicking malicious links

Robot browsing the web
Image created with DALL-E 3

This article is part of our coverage of the latest in AI research.

Anthropic recently released a feature that enables Claude, its large language model (LLM), to automatically accomplish tasks by controlling the user’s computer, opening apps, and browsing the web. Visual agents that understand graphical user interfaces and perform actions are becoming frontiers of competition in the AI arms race. Google is reportedly working on its own version of a computer-controlling AI agent.

However, a new study shows that these agents are vulnerable to adversarial attacks that exploit their lack of real-world knowledge. Researchers from Georgia Tech, the University of Hong Kong, and Stanford University have demonstrated that carefully designed pop-up windows can easily trick visual agents.

These agents use vision-language models (VLMs) to interpret graphical user interfaces (GUI) like web pages or screenshots. Given a user request, the agent parses the visual information, locates the relevant elements on the page, and takes actions like clicking buttons or filling forms. This makes it possible to automate complex web tasks.

However, the web is full of deceptive practices designed to trick users into clicking malicious links or downloading malware. These include pop-up ads, fake download buttons, and countdown timers for fake deals. Experienced human users have learned to recognize these deceptive patterns and mostly ignore them. 

The researchers decided to test if visual agents can recognize malicious pop-ups that are easily recognized by humans. They considered a threat model where attackers manipulate the observations of the agent, such as screenshots and accessibility tree data. The goal of the attacker is to trick the agent into clicking the malicious pop-up. 

The researchers explored four different attack elements:

Attention Hook: A few words in the pop-up designed to grab the agent’s attention

Instruction: The desired behavior that the attacker wants the agent to follow

Information Banner: Information that either explains or misleads the agent about the purpose of the pop-up

ALT Descriptor: Textual information embedded in the accessibility tree to describe the content of the pop-up to screen readers.

Malicious popup for VLMs
source (arXiv)

They carefully designed and positioned adversarial pop-ups on web pages and tested their effects on several frontier VLMs, including different variants of GPT-4, Gemini, and Claude.

The results of the experiments show that all tested models were highly susceptible to the adversarial pop-ups, with attack success rates (ASR) exceeding 80% on some tests.  

Even with basic defense techniques, such as explicitly instructing the agent to ignore pop-ups or adding advertisement notices, the agents still clicked on the malicious links. The researchers found that adding an instruction to ignore pop-ups only reduced the attack success rate by a maximum of 25%.

example malicious popup for VLMs
Example malicious popup for VLMs (source: arXiv)

Surprisingly, the researchers found that detailed information about the web page, such as the user’s original query, the position of the pop-up, and the specific architecture of the agent’s framework, were not essential for carrying out successful attacks. A simple pop-up with generic instructions was enough to trick most agents into clicking the malicious link.

“Even though these pop-ups look very suspicious (by design) to human users, agents cannot distinguish the difference between pop-ups and typical digital content, even with extra advertisement notices,” the researchers write. “We believe more detailed and specific defense strategies are necessary to mitigate risks systematically.”

This study highlights the risks of the territory we’re entering as we unleash AI agents in environments that are designed for humans. The different reasoning mode of LLMs makes them susceptible to attacks that would not fool average human users. This further underscores the need to think about new design principles for applications and websites that will be accessed by AI agents.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.