AI Agents in 2026: What They Can Actually Do (And Where They Still Fail)
Every AI company shipped an "agent" in 2025. By 2026, most of them have been quietly shelved. Here is the honest map of what AI agents can genuinely do today — browse, research, draft, code, schedule, automate tickets — and where they still faceplant (tool reliability, recovering from failures, long-running tasks, cost control). Written for people deciding where to actually bet, not people writing pitch decks.
In 2025, every AI company shipped an "agent" of some kind. By mid-2026, most of the demo videos are quietly buried and the product pages have been rewritten. What is left is a much narrower, much more useful picture of what agents can actually do — and a much clearer picture of where they still fail.
This is the version of that map I would want before committing engineering time to an agent project.
What agents are genuinely good at today
- Research and summarisation — "find me everything publicly available about X and summarise it" works well
- Browser automation with narrow scope — booking, scraping, routine form-filling
- First-pass code across small, well-defined problems
- Email and calendar triage with heavy guardrails
- Tier-1 customer support where escalation is easy
- Internal tools where the cost of a mistake is "a human reviews"
The common thread: bounded scope, cheap failure, human in the loop on exit. When those three are true, agents in 2026 are genuinely productive.
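That three-part test is concrete enough to write down. A minimal sketch — `AgentTask` and `should_automate` are hypothetical names for illustration, not a real framework:

```python
from dataclasses import dataclass

@dataclass
class AgentTask:
    description: str
    bounded_scope: bool          # is the task narrowly defined?
    failure_is_cheap: bool       # can a bad run be discarded safely?
    human_reviews_output: bool   # does a person check before the result ships?

def should_automate(task: AgentTask) -> bool:
    # All three conditions must hold; two out of three is how projects die.
    return (task.bounded_scope
            and task.failure_is_cheap
            and task.human_reviews_output)

triage = AgentTask("Draft replies to tier-1 support tickets", True, True, True)
refunds = AgentTask("Autonomously issue customer refunds", False, False, False)
```

The point of writing it as code is that it forces an explicit answer per task, rather than a vibe.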
Where agents still faceplant
- Long-running tasks where state accumulates — they lose the plot around step 15
- Tasks where the right move is "give up and ask a human" — they power through and produce nonsense
- Anything where tool errors cascade — one failed API call often cooks the whole run
- Cost control — agents happily burn $40 of tokens solving a $3 problem
- Consistency across runs — same prompt, same inputs, sometimes different behaviour
The last one is the sleeper. Most engineering teams are used to systems that behave the same given the same inputs. Agents do not. You have to design around that from day one, not bolt it on.
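One way to design around it is to treat run-to-run agreement as something you measure, not assume. A minimal sketch, assuming `agent_fn` stands in for whatever invokes your agent; the normalisation and majority threshold are illustrative choices:

```python
import collections
from typing import Callable

def consistency_check(agent_fn: Callable[[str], str], prompt: str,
                      runs: int = 5, threshold: float = 0.8) -> bool:
    """Run the same prompt several times; pass only if a majority of
    runs agree (after trivial normalisation)."""
    outputs = [agent_fn(prompt).strip().lower() for _ in range(runs)]
    _, count = collections.Counter(outputs).most_common(1)[0]
    return count / runs >= threshold
```

Gating a deploy on checks like this catches the "same prompt, different behaviour" failures before customers do.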
The "evaluation" problem nobody wants to talk about
Shipping an agent without a proper evaluation harness is like shipping a car without a speedometer. You will not know when it regresses. You will not know when the new model helped or hurt. You will only notice when a customer tells you.
Evals are unsexy infrastructure that nobody demos. They are also the single highest-leverage thing you can build if you are serious about agents. If your agent team has a glossier roadmap than their eval suite, bet against the project.
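A bare-bones version of that harness can be very small. This is a sketch, not a framework — `agent_fn` and the exact-match grader are placeholders for your own agent and scoring logic:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    expected: str  # ground-truth answer

def run_evals(agent_fn: Callable[[str], str], cases: list[EvalCase]):
    """Score an agent against a fixed case set; return pass rate + details."""
    results = []
    for case in cases:
        got = agent_fn(case.prompt).strip()
        results.append({"prompt": case.prompt, "expected": case.expected,
                        "got": got, "passed": got == case.expected})
    pass_rate = sum(r["passed"] for r in results) / len(results)
    return pass_rate, results
```

Run it on every model or prompt change and track the pass rate over time; a drop is a regression you caught before a customer did.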
Where to actually bet
- Internal tools first, customer-facing later. Your team tolerates agent quirks; your customers do not
- Workflows where a human already reviews the output. You are automating the first 80%, not the last 20%
- Domains with clear ground truth — answer was right or wrong, no ambiguity
- Problems with bounded cost. Pay per success, not per attempt
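"Pay per success, not per attempt" implies a hard spend cap per run, enforced in code rather than hoped for. A minimal sketch — the per-token rate and the cap are illustrative numbers, not real pricing:

```python
class BudgetExceeded(Exception):
    """Raised when a run spends past its cap; caller should stop and escalate."""

class TokenBudget:
    def __init__(self, max_usd: float, usd_per_1k_tokens: float = 0.01):
        self.max_usd = max_usd
        self.rate = usd_per_1k_tokens
        self.spent = 0.0

    def charge(self, tokens: int) -> None:
        """Call after every model call; raises once the cap is breached."""
        self.spent += tokens / 1000 * self.rate
        if self.spent > self.max_usd:
            raise BudgetExceeded(
                f"spent ${self.spent:.2f} of ${self.max_usd:.2f} cap")
```

Catch `BudgetExceeded` in the agent loop and hand the task to a human — that is the difference between a $3 problem and a $40 token bill.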
The boring truth
Agents in 2026 are great for a specific class of problem, useless for another class, and dangerous for a third. The winning teams are not the ones building the flashiest agent. They are the ones honest about which class they are in.
The demo era is over. The "quietly compounding inside serious companies" era is just starting. That is the better era to be in.
An agent without evals is a demo. A demo is not a product.