Agents That See and Click
In late 2024, Anthropic launched computer use as a public beta for Claude, letting developers direct the AI to look at a screen, move a cursor, click buttons, and type text. By early 2026, computer-use agents have gone from research demos to shipping products. Anthropic's Claude can navigate desktop applications, OpenAI's Computer Using Agent operates within a secure browser environment, and Google is building similar capabilities into its agent products. The implications for software automation are enormous and immediate.
This is fundamentally different from traditional API integrations. Instead of connecting to a service's API, a computer-use agent interacts with the software's visual interface the same way a human would. It reads text on screen, identifies buttons and form fields, and executes click sequences. This means it can automate any software that has a graphical interface, including legacy enterprise tools that have no API at all.
How Computer Use Actually Works
The technical architecture behind computer-use agents combines vision models with action planning. The agent captures a screenshot of the current screen state, processes it through a multimodal model that understands UI elements, generates an action plan (click here, type this, scroll there), and executes those actions through operating system APIs. The cycle repeats until the task is complete.
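The cycle described above can be sketched in a few dozen lines. This is a minimal illustration, not any vendor's implementation: the screenshot capture, model call, and input execution are stubbed out, and all function names are hypothetical. A real agent would replace the stubs with a multimodal model API and an OS automation library.

```python
# Sketch of the observe -> plan -> act loop described above.
# All names are illustrative; the three stubs stand in for real
# screen capture, a multimodal model call, and OS input APIs.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str           # "click", "type", "scroll", or "done"
    x: int = 0
    y: int = 0
    text: str = ""

def capture_screenshot() -> bytes:
    """Stub: a real agent grabs the current screen state as an image."""
    return b"<png bytes>"

def plan_next_action(screenshot: bytes, goal: str, history: list) -> Action:
    """Stub: a real agent sends the screenshot and goal to a multimodal
    model that understands UI elements, and parses the returned action."""
    if not history:
        return Action("click", x=120, y=340)          # e.g. focus a form field
    if history[-1].kind == "click":
        return Action("type", text="hello@example.com")
    return Action("done")

def execute(action: Action) -> None:
    """Stub: a real agent dispatches clicks and keystrokes to the OS here."""
    pass

def run_agent(goal: str, max_steps: int = 10) -> list:
    """Repeat the cycle until the model signals the task is complete."""
    history: list[Action] = []
    for _ in range(max_steps):
        action = plan_next_action(capture_screenshot(), goal, history)
        if action.kind == "done":
            break
        execute(action)
        history.append(action)
    return history

steps = run_agent("fill in the signup form")
```

The `max_steps` cap matters in practice: because each iteration re-reads the screen rather than following a fixed script, a confused agent can loop indefinitely without a hard stop.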
Anthropic's implementation focuses on raw OS-level control, meaning Claude can interact with any application on the desktop, not just the browser. This gives it an early lead in automating deep-system workflows like file management, desktop application operation, and multi-application tasks. OpenAI's Computer Using Agent takes a different approach, running GPT-4o inside a secure virtual browser to follow high-level instructions for web-based tasks.
The tradeoffs between these approaches are significant. Anthropic's OS-level access is more powerful but carries greater safety risks. OpenAI's sandboxed browser approach is more constrained but easier to secure. Both companies have invested heavily in safety measures to prevent misuse, including limiting access to financial services and restricting certain types of interactions.
Claude Cowork: The Office Agent
In January 2026, Anthropic launched Claude Cowork, an agentic AI assistant that can plan and execute tasks autonomously for business users. Cowork uses computer-use capabilities to perform office automation tasks including contract review, data analysis, marketing campaign setup, sales workflows, HR processing, and general productivity work.
The product represents a bet that computer use will be most valuable not as a developer tool but as a business user tool. Instead of requiring users to write prompts or configure integrations, Cowork watches what you do, understands the workflow, and offers to handle repetitive steps. Early reports suggest it handles straightforward office tasks reliably, though complex multi-application workflows still require human oversight.
Anthropic also launched a Chrome extension that gives Claude context about everything happening in the user's browser, enabling it to assist with web-based tasks in real time. Select users can chat with Claude in a sidecar window that maintains context across browser tabs and sessions.
The Safety Challenge
Computer-use agents introduce safety risks that are qualitatively different from traditional AI systems. An agent that can click buttons and fill forms can also click the wrong buttons and fill the wrong forms. Anthropic has acknowledged these risks publicly, noting that AI agents with browser access are vulnerable to prompt injection attacks where malicious web content could redirect the agent's actions.
Both Anthropic and OpenAI have implemented multiple defensive layers. Claude's browser agent is blocked from accessing websites offering financial services, adult content, and pirated content by default. OpenAI's CUA runs in an isolated virtual environment that prevents it from affecting the user's actual system. These are reasonable first steps, but the attack surface for computer-use agents is fundamentally larger than for text-only AI, and the security tooling is still immature.
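One such defensive layer can be sketched as a category check applied before the agent navigates anywhere. Everything here is hypothetical: the domain-to-category map and category names are illustrative, not any vendor's actual blocklists, and a production system would query a maintained site-classification service rather than a hardcoded dictionary.

```python
# Illustrative guard: refuse navigation to blocked site categories
# before the agent acts. The mappings below are made up for the sketch.
from urllib.parse import urlparse

BLOCKED_CATEGORIES = {"financial-services", "adult-content", "pirated-content"}

# Toy domain -> category map; real systems use a classification service.
DOMAIN_CATEGORIES = {
    "bank.example.com": "financial-services",
    "news.example.com": "news",
}

def is_navigation_allowed(url: str) -> bool:
    """Return False when the URL's host falls in a blocked category."""
    host = urlparse(url).hostname or ""
    category = DOMAIN_CATEGORIES.get(host, "unknown")
    return category not in BLOCKED_CATEGORIES
```

Note that unknown domains pass through in this sketch; a stricter deployment might default-deny instead, trading coverage for more frequent human escalation.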
The enterprise implications are substantial. Companies that deploy computer-use agents need to think about access control at the UI level, not just the API level. An agent with access to an employee's screen has implicit access to every application that employee can see, which may include sensitive data that the agent's task does not require.
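UI-level access control can be as simple as an explicit allowlist of windows the agent may send input to, so that implicit screen access does not become implicit access to every application the employee can see. This is a sketch under assumptions: matching on window-title prefixes is an illustrative shortcut, and a real deployment would key off OS-level window or process identity instead.

```python
# Hypothetical UI-level access control: the agent may only send input
# to windows whose titles match an explicit allowlist. Title-prefix
# matching is an assumption for illustration only.
ALLOWED_WINDOW_PREFIXES = ("Expense Report -", "Vendor Portal -")

def may_send_input(window_title: str) -> bool:
    """Allow input only to windows the agent's task actually requires."""
    return window_title.startswith(ALLOWED_WINDOW_PREFIXES)
```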
Real-World Performance
Benchmarks for computer-use agents are still evolving. The 2025-2026 guide to AI computer-use benchmarks tracks performance across tasks ranging from simple web navigation to complex multi-step workflows. Current success rates vary dramatically by task complexity: simple tasks like filling a form or navigating to a specific page achieve 85-95% success rates, while complex multi-step tasks that require error recovery and adaptation hover around 40-60%.
These numbers will improve rapidly as models get better at visual understanding and action planning. But for production deployments today, the reliability gap means computer-use agents are best suited for supervised automation: handling the repetitive steps of a workflow while a human monitors and intervenes when the agent gets stuck or makes an error.
What This Means for Software Development
The rise of computer-use agents has implications for how software is built. If AI can interact with any graphical interface, the investment case for building comprehensive APIs changes. Legacy enterprise software that would take years to modernize with proper API coverage can be automated immediately through computer-use agents. This does not eliminate the need for APIs, but it provides a faster path to automation for the long tail of enterprise applications.
For developers building new applications, the question becomes: should you optimize your UI for AI agents as well as human users? Some companies are already adding semantic labels and structured metadata to their interfaces specifically to improve AI agent performance. This trend will accelerate as computer-use agents become more prevalent.
Sources and Signals
Product information from Anthropic's official announcements and blog posts. OpenAI CUA details from published documentation and WorkOS comparison analysis. Safety analysis based on Anthropic's published safety guidelines and independent security research. Benchmark data from O-Mega AI's computer-use benchmark tracking.