Anthropic has just announced perhaps the creepiest capability yet for its Claude chatbot: computer use. The new feature, currently in beta, enables Claude to monitor your computer’s screen, take control of your mouse and keyboard, fire up any application, and interact with it just as a human user would.
The examples shared by Anthropic show the model carrying out multi-step tasks that require planning and execution across multiple applications. For example, it can search the files on your computer for information to fill out a web form. It can design a web page, download it to your computer, and open it in the browser. Or it can search for information on the web, plan an event, send out emails, and fill in your calendar.
“Claude 3.5 Sonnet is the first frontier AI model to offer computer use in public beta. At this stage, it is still experimental—at times cumbersome and error-prone. We’re releasing computer use early for feedback from developers, and expect the capability to improve rapidly over time,” according to a blog post by Anthropic.
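For developers who want to try it, the workflow is an agent loop: the model requests actions such as screenshots, clicks, and keystrokes, and your own code executes them and reports the results back. Below is a minimal sketch of a single turn, assuming the Anthropic Python SDK and the tool type and beta flag names documented at launch (both may change while the feature is in beta).

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# The model never touches the machine directly: it emits tool_use requests
# (screenshot, click, type, ...) that the calling code decides whether to run.
response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    betas=["computer-use-2024-10-22"],       # beta flag name from the launch docs
    tools=[{
        "type": "computer_20241022",          # virtual screen/mouse/keyboard tool
        "name": "computer",
        "display_width_px": 1024,
        "display_height_px": 768,
    }],
    messages=[{"role": "user",
               "content": "Open the calendar app and list today's events."}],
)

# The reply alternates between text and tool_use blocks. A tool_use input looks
# like {"action": "screenshot"} or {"action": "left_click", "coordinate": [x, y]};
# the harness must execute it and feed a tool_result back on the next turn.
for block in response.content:
    if block.type == "tool_use":
        print("Requested action:", block.input)
    else:
        print(block.text)
```

In practice this runs in a loop, and Anthropic recommends sandboxing the whole thing in a virtual machine or container rather than pointing it at your real desktop.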
Off the top of my head, I can think of many ways this can go wrong. A prompt injection attack could command the model to visit a malicious website and download malware onto your computer, or to read information from your local files and send it to a remote destination. Note, too, that every interaction sends information to Anthropic’s cloud servers, which can include sensitive data. And more simply, the feature can be used as a tool to pose as a legitimate user.
That said, I think this can be an interesting tool for discovering and experimenting with new ways to use and combine applications without having to modify them.
LLMs with computer use can be compared to humanoid robots and self-driving cars: AI systems meant to navigate environments that were designed for humans. And just like those systems, the model will have to deal with a long tail of corner cases that require constant adjustments and retraining.
But unlike the environments of self-driving cars and humanoid robots, computer applications can be easily adapted to fit LLMs. Once a proof of concept is developed with the computer use feature, a more efficient and robust infrastructure can be created by adding the right APIs to the applications. The LLM can then transition from computer use to more mature and reliable API use. I’m still interested to see the unexpected ways developers will put this new feature to use.
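As a concrete illustration of that transition, the calendar task from the earlier sketch could be exposed to the model as a regular tool via Claude’s standard tool use (function calling) rather than the computer-use beta. The `get_calendar_events` function below is a hypothetical application-side API, not something Anthropic provides.

```python
import anthropic

client = anthropic.Anthropic()

# A hypothetical calendar API exposed as a regular tool: the model asks for
# structured data instead of clicking through the calendar app's UI.
calendar_tool = {
    "name": "get_calendar_events",           # hypothetical application API
    "description": "Return the user's calendar events for a given date.",
    "input_schema": {
        "type": "object",
        "properties": {
            "date": {"type": "string", "description": "ISO date, e.g. 2024-10-25"},
        },
        "required": ["date"],
    },
}

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[calendar_tool],
    messages=[{"role": "user", "content": "What's on my calendar today?"}],
)

# Instead of screenshots and clicks, the model emits a single structured
# tool call that the application can service directly.
for block in response.content:
    if block.type == "tool_use":
        print(block.name, block.input)
```

The GUI-driven version is a good way to prototype; the tool-based version is what the same workflow looks like once the application exposes a proper interface.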