screen time

The Rise of the Self-Clicking Computer

Photo: Intelligencer

AI start-up Anthropic, creator of the popular chatbot Claude, announced a new feature this week called “computer use,” which is a set of tools “that can manipulate a computer desktop environment.” In plain English, as the name suggests, it’s an AI that can use your machine for you.

Over the past year, the biggest players in AI have been making versions of the same claim: that “agents” — systems given some degree of autonomy to complete tasks on behalf of their users — are the next step for the industry. Different firms have different definitions for the term and ambitions for the concept, but the basic idea is pretty intuitive. Real, useful AI automation isn’t possible until models can interact with the real world, not just with users in a chat window. Or, as in Claude’s case, with the real world as mediated by a desktop computer.

Anthropic’s software is in limited testing, but early reviews suggest the concept is viable. “It is capable of some planning, it has the ability to use a computer by looking at a screen (through taking a screenshot) and interacting with it (by moving a virtual mouse and typing),” writes management professor and AI influencer Ethan Mollick, who has been testing the tool. Despite “large gaps,” he says, he was “surprised at how capable and flexible this system is already,” and believes that similar approaches are likely to become more common. The demo videos are worth watching if you haven’t interacted with a recent AI model, including one from the company in which the agent goes off-task.

What’s impossible to ignore about features like this in practice is that they demand huge amounts of access in order to function, a tension that’s going to become more evident as tech companies get more ambitious with AI tools in general. When Google, Apple, Microsoft, and OpenAI talk about the future of AI assistants, more useful chatbots, and the rise of agents, they’re also talking about a world in which they have unprecedented access to the digital matter of users’ lives. Claude’s demo here makes that abundantly clear. By ingesting and interacting with everything on users’ screens, Claude’s relationship with its users is instantly more intimate than the ones they have with their other digital services combined. This level of access represents a massive secondary opportunity for AI firms, who would be potentially leapfrogging the biggest tech companies of the last generation in terms of raw access to user data, massively shifting norms around privacy in the process.

The other striking thing about these demos is that, despite being impressive and novel, they’re clearly showing off a transitional technology. In description, an AI agent is an entity with access to the resources it needs to, say, book a plane ticket or put together a document. In practice, at this early stage, it’s a tool that interacts with human interfaces — websites and pieces of software — by effectively impersonating a user, a bit like a regular car piloted by a humanoid robot rather than a vehicle that simply controls itself.

It’s a fascinating proof of concept with a lot of room for improvement, but it also sets up an antagonistic relationship with at least some of the software that it’s “using.” Claude is seen here searching Google, for example, in order to complete other tasks; Google, which makes money from showing ads to actual people, will eventually have something to say about systems like this, which both depend on and undermine it. Google, for its part, has been talking since 2018 about basic agentic systems that can make phone calls on behalf of users, dealing with annoying phone trees or customer-support situations autonomously. The companies that feed people into those phone trees or complicated customer-service interactions, in turn, aren’t likely to stand still if most of those calls end up being handled by bots (indeed, restaurants are already adapting).

What Claude is able to do here is already surprising, but it depends in no small part on fascinating small deceptions made in the name of productivity. If the goal is to let AI interact with the real world, asking for control of users’ computers is an incredibly useful first step and a shortcut to an enormous range of possible tasks, but it’s also a bold and perhaps risky approach by a firm that, unlike some other players in AI, doesn’t already have access to users’ email accounts or social-media profiles. (In terms of straightforward functionality, it’s worth noting that Anthropic’s computer use feature has a lot in common with apps known as auto-clickers, macro tools, and key-pressers, which are used to automate humanlike actions on computers and phones and are widely used for producing spam and committing fraud.)

As Mollick suggests, software like this, which is also in development by OpenAI, among others, represents one way that AI companies are planning on “breaking out of the chatbox,” at least in concept. As people in the industry like to say, it’s “the worst it will ever be” in terms of raw capability, but, perhaps counterintuitively, it’s also as unencumbered as it will ever be, operating in a world that hasn’t had time to adjust to, or thwart, its presence.
