The rise of AI agents that use your computer: what you need to know in 2025

Something remarkable is happening in AI: machines are learning to use computers the same way you do. Not through APIs or code - but by looking at the screen, moving the mouse, clicking buttons, and typing text.

They're called computer-use agents (CUAs), and in December 2025, they're having their moment.

The funding frenzy

In just the first week of December 2025, the CUA space saw explosive activity:

Simular AI raised $21.5 million Series A from Felicis (plus Nvidia's NVentures), bringing total funding to $27M. Built by ex-Google DeepMind researchers, their agent can "literally move the mouse on the screen" - controlling your entire Mac, not just the browser.
OpenAGI Lux emerged from stealth December 1st, built by MIT/CMU/UIUC researchers who open-sourced the training infrastructure (OSGym) that generated their results.
Browser-use already raised $17 million in March 2025 (Felicis led again). The YC W25 company went viral when Chinese startup Manus used it. Now 20+ YC companies rely on it.
Amazon Nova Act went generally available at AWS re:Invent, promising 90%+ task reliability at scale.

The market validates this: AI agents are a $7.6 billion market in 2025, projected to hit $50-100+ billion by 2030-2034. Around 85% of enterprises have already integrated AI agents into at least one workflow.

The benchmark race

How do you measure an AI's ability to use a computer? Benchmarks like Mind2Web (web tasks) and OSWorld (full OS tasks) have become the scorecards everyone watches.

Here's where the major players stand in December 2025:

Agent	Mind2Web	OSWorld	Speed	Provider
OpenAGI Lux	83.6%	-	1 sec/step	OpenAGI Foundation
Simular Agent S	-	69.9%	Varies	Simular AI
Google Gemini CUA	69.0%	-	~2 sec/step	Google
Claude Sonnet 4.5	61.0%	61.4%	~2 sec/step	Anthropic
OpenAI Operator	61.3%	38.1%	~3 sec/step	OpenAI (deprecated Aug 2025)
Amazon Nova Act	-	90%+ reliability	Varies	AWS

Human performance on OSWorld: ~72%. Simular's Agent S is at 69.9% - we're nearly there.

The numbers that should catch your eye:

Lux claims 3x speed (1 sec vs 3 sec per action) and 10x cheaper than Operator
Lux goes beyond browsers - it can control desktop apps (Excel, Slack, Adobe) that most CUAs can't touch
Nova Act focuses on reliability over raw accuracy, hitting 90%+ for production workflows
Claude Opus 4.5 (Anthropic's newest) was described as "best model in the world for computer use" with improved vision

Microsoft selected Simular as one of five companies for its Windows 365 for Agents program. The validation from both Nvidia (investor) and Microsoft (partner) suggests the big players see what's coming.

The security problem no one has solved

Here's the part that should concern you.

As AI agents gained the ability to browse the web for you, security researchers started poking holes. What they found wasn't pretty.

Prompt injection: the attack that won't go away

Imagine visiting a website that contains hidden instructions - invisible to you, but clearly visible to the AI agent reading the page for you. Those instructions could tell the agent to:

Export your emails to an attacker's server
Change your password and send the new one somewhere
Make purchases on your behalf
Click malicious links while logged into sensitive accounts

This is prompt injection, and it's the Achilles' heel of every CUA on the market.

Perplexity's BrowseSafe research (December 2025) built an open-source defense model that detects these attacks with 91% accuracy. Sounds good until you realize that means nearly 10% of attacks still succeed. Their BrowseSafe-Bench dataset includes 14,719 examples across 11 attack types and 9 injection strategies - and sophisticated attacks (multilingual instructions, hypothetical framing) dropped detection accuracy to just 76%.

For comparison:

PromptGuard-2: 35% detection rate
GPT-5: 85% detection rate
BrowseSafe: 91% detection rate
Reality: 100% would be required for safety

OpenAI's CISO acknowledged that "prompt injection remains a frontier, unsolved security problem."

Why this matters for sensitive tasks

These agents often have access to your logged-in sessions, saved passwords, email, and banking. Security researcher Chalhoub put it bluntly: "These are significantly more dangerous than traditional browser vulnerabilities. With an AI system, it's actively reading content and making decisions for you."

Rachel Tobac, CEO of SocialProof Security, warns that "user credentials for AI browsers are likely to become a new target for attackers" and recommends siloing sensitive accounts from early versions of AI browsers.

Three architectures emerging

Watching this space, three distinct approaches are forming:

1. Full desktop control (Simular, Lux)

Companies like Simular and OpenAGI want AI that controls your entire computer - browser, Excel, Slack, design tools, everything. Simular's co-founder Ang Li says they can "literally move the mouse on the screen... repeating whatever human activities in the digital world."

Pros: Maximum capability, can automate complex multi-app workflows Cons: Maximum attack surface, needs access to everything

2. Browser-only agents (Google Mariner, Perplexity Comet)

OpenAI Operator (deprecated in favor of ChatGPT agent), Google Mariner, and Perplexity Comet focus specifically on web browsing. More contained, but still processing everything you see online.

Pros: More contained scope, easier to sandbox Cons: Still vulnerable to prompt injection, can't help with desktop apps

3. Task-specific agents (Nova Act, Dosel)

Instead of "do anything on my computer," these focus on one job and do it well. Amazon Nova Act targets enterprise UI automation (QA testing, data entry, checkout flows) with 90%+ reliability. Less impressive demos, but more reliable for actual production use.

Pros: Narrower attack surface, purpose-built security, higher reliability Cons: Limited to specific use cases

What this means for you

If you're excited about AI agents handling your tedious computer work, you should be. The technology is real and improving fast - 39% of consumers already feel comfortable using AI agents, and 70% would use them to book flights.

But for the sensitive stuff, ask yourself:

What data does this agent have access to?
Where does that data get processed - on your machine or in the cloud?
What happens if the agent gets tricked by a malicious website?
Who sees the credentials the agent handles?

For general browsing, research, and shopping? The risk is probably acceptable.

For anything involving passwords, banking, healthcare, or sensitive accounts? The security story isn't there yet for general-purpose agents. Even the best defense (BrowseSafe) lets 10% of attacks through. That's not good enough when your bank account is on the line.

Where we fit in

Full disclosure: we build Dosel, which uses browser automation to change passwords across websites.

We made deliberate architectural choices based on watching this space:

Local-only processing: Your passwords never leave your Mac, never touch a cloud server
Zero-knowledge design: The AI sees screens but credentials are injected at the right moment - invisible to the model
Task-specific focus: We do one thing (password changes) rather than everything
Anti-exfiltration protections: Can't be prompt-injected into sending credentials elsewhere

We use browser-use under the hood - the same library powering 20+ YC companies - but with a custom fork that adds secure credential injection. The AI navigates websites and finds password change forms, but never actually sees your passwords.

General CUAs are impressive technology. But for credential management specifically, the security model matters more than benchmark scores.

What's next

The CUA space will keep evolving rapidly in 2025-2026:

Microsoft + Simular: Windows 365 for Agents program brings desktop automation to enterprise
Anthropic's Claude for Chrome: Browser extension that autonomously completes tasks across tabs
Google's Chrome agentic features: New security measures including prompt-injection classifiers
BrowseSafe and friends: Security solutions becoming table stakes for any serious CUA
Enterprise adoption: Hertz reports 5x shipping velocity with Nova Act for QA automation

The question isn't whether AI will use computers for us. It's whether we can make it safe enough to trust with what matters.

For low-stakes tasks, the answer is clearly yes.

For sensitive digital labor - passwords, credentials, financial accounts - the jury is still out on general-purpose agents. Purpose-built tools with security-first architecture are the safer bet.

Sources

Download Dosel → — 5 free automated password changes per month, no credit card required.

Questions about computer-use agents and password security? Reach out at hello@dosel.app.