Skip to content

Agentic Browsers and Computer Use

Using AI 6 min read

In Short

An agentic browser, or computer-use agent, is an AI that looks at a screen and then acts on it, moving the cursor, clicking, typing, and scrolling to do a task for you. In mid-2026 the main consumer versions are OpenAI's ChatGPT Atlas and ChatGPT agent, Perplexity Comet, Anthropic's Claude for Chrome, and Google Chrome's auto browse, with developer computer-use tools underneath. They are useful for booking, forms, research, and shopping, but they are supervised assistants, not reliable autonomous workers. Handing one your logged-in accounts and payment details carries real risk, because a hostile webpage can try to hijack it.

Snapshot caveat: This area moves fast. Reflects June 2026. Re-verify dates and prices on each provider's official page.

01. What It Is

Two ideas sit behind this topic. Computer use is the underlying capability. An agentic browser is the consumer packaging of it.

Computer use means the model gets a picture of the screen and replies with a concrete action, like moving the cursor, clicking, typing text, or scrolling. Anthropic shipped the first frontier-model version as a public beta on October 22, 2024, alongside an updated Claude 3.5 Sonnet. It was a developer tool, not a finished app, run on a computer you supply such as a sandboxed virtual machine.

An agentic browser wraps that capability in a browser, an extension, or an "agent mode." You use one much like the chat apps in how-to-use-an-llm, except it can act on a website, not just answer questions about it. The mental model is a coworker driving a browser while you watch over their shoulder.

02. Why It Matters

Most everyday admin happens in a browser, from booking to filling a form or pulling details out of a PDF. For years an AI could read the web and describe it back to you.
An agentic browser is the first time it can act on the web for you, running the agent loop from agents-and-agentic-workflows. That is why every major AI company shipped a version in close succession, and why HUMAN Security reported AI-agent traffic grew 7,851% year over year in 2025.

03. How It Works

The screenshot-and-action loop

Under the hood this is the agent loop applied to a real screen. The software captures the screen or reads the page. The model looks at that image and decides the single next action. The action runs, the changed screen feeds back in, and the cycle repeats until the task is done or the agent reaches a checkpoint where it must ask you.

Two skills combine.
Seeing the screen is the multimodal vision from multimodal-models, and taking actions is the tool use from tool-use-function-calling. It ships as a developer computer-use API from Anthropic, OpenAI, or Google, or as a consumer agentic browser, the kind below.

The products

Anthropic computer use and Claude for Chrome:
Computer use is Anthropic's developer building block, released in October 2024 and called experimental and error-prone at launch. The consumer version, Claude for Chrome, is an extension that reads a page and acts on it, clicking buttons and filling in forms. It launched as a research preview on August 25, 2025 with 1,000 Max-plan users and reached Pro, Team, and Enterprise subscribers by December 2025.

OpenAI Operator, ChatGPT agent, and Atlas:
Operator, launched January 23, 2025, was the first mainstream consumer browser agent, a preview that drove its own cloud browser to book or order groceries. It is gone as a standalone product, folded into ChatGPT agent on July 17, 2025. On October 21, 2025 OpenAI launched ChatGPT Atlas, an AI-native browser, macOS first. Atlas pairs a ChatGPT sidebar that explains the page with an Agent Mode that takes web actions under supervision and pauses to confirm sensitive ones.

Perplexity Comet:
Comet launched to Perplexity Max subscribers in July 2025, then went free to everyone on macOS and Windows on October 2, 2025. Paid users also get a background assistant that runs several tasks while they are away.

Google Mariner and Chrome:
Google's Project Mariner, revealed December 11, 2024, was the research prototype, a Gemini 2.0 browser agent that read the screen and then planned and acted. Google folded it into Gemini and Chrome. The consumer layer is Chrome's auto browse, which rolled out January 28, 2026 to Google AI Pro and AI Ultra subscribers in the US. It fills in forms and clicks through pages, pausing before actions like buying or posting on social media.

The risks

The headline risk is prompt injection from the page itself. A hidden instruction planted in a webpage or email can hijack the agent into doing something you never asked for. This is not theoretical. Anthropic's own red-teaming found browser use without mitigations had a 23.6% attack success rate across 123 test cases, dropping to 11.2% with mitigations, and on a tougher challenge set those mitigations cut the rate from 35.7% to zero. Even with its strongest model, Anthropic reports about 1% residual against an adaptive attacker and says plainly that no browser agent is immune. Brave's security team found the same in a shipping product, showing in August 2025 that asking Comet to summarize a page could be hijacked by hidden text to steal credentials and one-time passwords, a problem it called systemic across the category.
See prompt-injection-ai-security.

The second risk is the account-and-payment problem. An agent driving your logged-in browser inherits your sessions and saved payment methods, so a confused or hijacked agent acts with your full privileges. This is why vendors layer defenses such as site-level permissions, confirmation prompts before high-risk steps, blocks on categories like financial services, and classifiers that scan untrusted content.

Oversight follows from the first two risks. Every serious product ships confirmation gates because the agents still make mistakes. The reliable pattern in mid-2026 is to watch the run and approve the sensitive steps yourself, giving the agent the least access it needs rather than your whole logged-in life.

04. Key Terms

Term Plain meaning
Computer use The base capability. The model looks at a screenshot and replies with an action like click, type, or scroll. The engine inside an agentic browser.
Agentic browser A browser, extension, or mode where the AI acts on websites for you, not just answers questions about them.
Agent mode The setting that lets the assistant take multi-step actions for you, unlike a sidebar that only chats about the page.
Screenshot-and-action loop The cycle the agent repeats. See the screen, pick the next action, do it, look again.
Prompt injection (indirect) A hidden instruction in a webpage or email that hijacks the agent into something you did not ask for. The headline risk.
Action confirmation A checkpoint where the agent pauses before a sensitive step like buying or sending. Also called human-in-the-loop.
Least privilege Giving the agent the minimum sites and actions it needs, not access to everything you are logged into.

05. Examples

  • Filling a repetitive web form from details you already have, such as fields in a PDF. A good fit, with a human check before you submit.
  • Booking and checkout. The agent reaches a booking or payment page and pauses for you to confirm. Treat that confirmation as a real decision, not a rubber stamp.
  • When NOT to use it. Do not turn an agent loose on a page you distrust while it is logged into your bank or email. An untrusted page plus live credentials is exactly what prompt injection exploits.

06. Common Misconceptions

"Giving it my own logged-in browser is safe because it is my account."
That is where the risk lives. The agent inherits your sessions and payment methods, so a hijacked agent acts with your full privileges. Brave showed one poisoned page could lead Comet to expose credentials and one-time passwords.

"Agent mode means I can start it and walk away."
Current products are built for supervision. They pause to confirm purchases and posts because they still make mistakes, and the vendors' own numbers show prompt injection is not solved. Watch the run and approve the sensitive steps.

"These agents are basically autonomous now."
They are supervised assistants with guardrails. Top web benchmarks sit in the 80s percent and full-computer benchmarks far lower, meaning routine failure on harder multi-step tasks. Claude's OSWorld score rose from under 15% in late 2024 to 72.5% by early 2026, real progress that still leaves a wide gap to reliable.

Verified against primary sources

Every claim traces to a cited source below.

Key terms

Computer use
The model looks at a screenshot and replies with an action like click, type, or scroll.
Agentic browser
A browser, extension, or mode where the AI acts on websites for you, not just answers.
Agent mode
The setting that lets the assistant take multi-step actions, unlike a sidebar that only chats.
Prompt injection (indirect)
A hidden instruction in a webpage or email that hijacks the agent into something you did not ask for.
Least privilege
Giving the agent the minimum sites and actions it needs, not access to everything you are logged into.

Tags

#agentic-browsers #computer-use #prompt-injection #ai-agents #browser-automation

More in Do More With AI