Anthropic’s Updated AI Models Can Control Your Computer for You

If you’ve always wanted to offload some of your tedious computer work to artificial intelligence, that future is now a little closer: the updated Claude 3.5 Sonnet AI model, just released by Anthropic, is capable of taking control of your mouse and keyboard and performing tasks on its own.

Right now it’s only in beta and only available to developers with access to the Claude API, but in the future we may all be able to have AI fill out forms, move files, search the web, and handle the other tasks we’ve previously relied on our own fingers for.

First, the model updates: Anthropic has released an upgraded Claude 3.5 Sonnet, which it says offers “wide-ranging improvements” and especially significant gains in coding capabilities, with a marked increase in performance on standard benchmark tests (including SWE-bench, which is based on GitHub).

Then there’s Claude 3.5 Haiku, a new version of Anthropic’s faster, lighter, less expensive, and less powerful AI model. Again, the company says overall performance has improved and, as with Sonnet, there are clear gains in coding capabilities.

However, the biggest focus will be on the computer use feature included in the Claude 3.5 Sonnet update, which promises to take desktop automation to the next level. For now, though, Anthropic stresses that this is still very much a beta product.

Computer use in Claude 3.5 Sonnet

In Anthropic’s demo video, you can see Claude being tasked with filling out a form. The various pieces of information the form requires have to be pulled from different databases and browser tabs, but all the user has to do is ask for the form to be filled out and indicate where the required information can be found.

As Claude works through a task, it takes screenshots and examines them to see what it’s looking at, much like the image recognition and analysis capabilities that AI is already well known for. It then decides what to do next based on what is displayed on the screen and the instructions it has been given.

In this case, the AI is smart enough to realize that it needs to switch to another browser tab and run a search on the company name to find some of the information it is looking for. Moving the cursor, clicking, and entering text are all handled by Claude. The bot is able to identify the required data and copy it into the correct form fields.

Finally, Claude is smart enough to spot and click the form’s submit button on the screen, which completes the task, all while the user watches. From start to finish, the AI model appears to be able to understand what’s happening on the screen and figure out how to manipulate it to complete tasks.
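The cycle the demo walks through (take a screenshot, decide on the next step, then click or type) can be sketched as a simple loop. What follows is a minimal, hypothetical illustration, not Anthropic’s implementation and not the Claude API: `FakeScreen`, `Action`, `scripted_policy`, and `run_agent` are all invented names, the "screenshot" is a plain dictionary rather than pixels, and a hand-written rule stands in for the model’s decision-making.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Action:
    kind: str          # "click", "type", or "done"
    target: str = ""   # name of the on-screen element to click
    text: str = ""     # text to enter for "type" actions


@dataclass
class FakeScreen:
    """Toy stand-in for a desktop: a form with two fields and a submit button."""
    fields: dict = field(default_factory=lambda: {"company": "", "email": ""})
    focused: Optional[str] = None
    submitted: bool = False

    def screenshot(self) -> dict:
        # A real system would capture pixels; this returns a plain snapshot.
        return {"fields": dict(self.fields), "focused": self.focused,
                "submitted": self.submitted}

    def apply(self, action: Action) -> None:
        if action.kind == "click":
            if action.target == "submit":
                self.submitted = True
            else:
                self.focused = action.target
        elif action.kind == "type" and self.focused is not None:
            self.fields[self.focused] = action.text


def scripted_policy(shot: dict, goal: dict) -> Action:
    """Hypothetical decision step: pick the next action from the snapshot."""
    if shot["submitted"]:
        return Action("done")
    for name, wanted in goal.items():
        if shot["fields"][name] != wanted:
            if shot["focused"] != name:
                return Action("click", target=name)  # focus the field first
            return Action("type", text=wanted)       # then fill it in
    return Action("click", target="submit")          # every field is filled


def run_agent(screen: FakeScreen, goal: dict, policy=scripted_policy,
              max_steps: int = 20) -> list:
    """Observe, decide, act; repeat until the policy says it is done."""
    history = []
    for _ in range(max_steps):
        action = policy(screen.screenshot(), goal)
        if action.kind == "done":
            break
        screen.apply(action)
        history.append(action)
    return history


if __name__ == "__main__":
    screen = FakeScreen()
    steps = run_agent(screen, {"company": "Acme", "email": "info@acme.test"})
    print([(a.kind, a.target or a.text) for a in steps], screen.submitted)
```

The `max_steps` cap is a design choice for the sketch: an agent that misreads the screen could otherwise loop forever, which is one reason to start with low-risk tasks.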

However, Anthropic notes that basic actions like scrolling, dragging, and zooming still “present challenges” for Claude, and beta testers are encouraged to stick to “low-risk” scenarios for now. In the OSWorld test, which measures how well AI can perform computer tasks, Claude 3.5 Sonnet scores 14.9% (humans typically score around 70-75%).

The developers of the new feature haven’t been shy about pointing out some of the mistakes it can make: in one test, Claude stopped a screen recording for no apparent reason. In another case, the bot abruptly wandered off from a coding task to browse online photos of Yellowstone National Park.

Anthropic also notes that every step forward in AI can bring new safety concerns. According to a review conducted by its internal trust and safety teams, computer use in its current form does not pose an increased risk to system security, although this will be continually re-evaluated. In addition, no user-provided data (including screenshots) will be used to train Claude AI models.
