
AI Agents in 2026: What They Can Actually Do (And Where They Still Fail)

Every AI company shipped an "agent" in 2025. By 2026, most of them have been quietly shelved. Here is the honest map of what AI agents can genuinely do today — browse, research, draft, code, schedule, automate tickets — and where they still faceplant (tool reliability, recovering from failures, long-running tasks, cost control). Written for people deciding where to actually bet, not people writing pitch decks.

Siddharth Puri · April 8, 2026 · 9 min read
AI & Future of Work


In 2025, every AI company shipped an "agent" of some kind. By mid-2026, most of the demo videos are quietly buried and the product pages have been rewritten. What is left is a much narrower, much more useful picture of what agents can actually do — and a much clearer picture of where they still fail.

This is the version of that map I would want before committing engineering time to an agent project.

What agents are genuinely good at today

  • Research and summarisation — "find me everything publicly available about X and summarise it" works well
  • Browser automation with narrow scope — booking, scraping, routine form-filling
  • First-pass code across small, well-defined problems
  • Email and calendar triage with heavy guardrails
  • Tier-1 customer support where escalation is easy
  • Internal tools where the cost of a mistake is "a human reviews"

The common thread: bounded scope, cheap failure, human in the loop on exit. When those three are true, agents in 2026 are genuinely productive.
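That three-part pattern can be sketched in a few lines. Everything here is illustrative — `run_agent`, `Draft`, and the 0.8 confidence threshold are made-up stand-ins, not a real API — but the shape is the point: the agent drafts, and a human gate sits on the exit path, so a bad draft costs one review and nothing more.

```python
from dataclasses import dataclass

@dataclass
class Draft:
    text: str
    confidence: float

def run_agent(ticket: str) -> Draft:
    # Placeholder for a real model call; returns a draft reply plus a
    # self-reported confidence score.
    return Draft(text=f"Suggested reply to: {ticket}", confidence=0.9)

def handle(ticket: str, approve) -> str:
    draft = run_agent(ticket)
    # Human in the loop on exit: low confidence or a rejected draft
    # escalates. Cheap failure -- the cost is one human review.
    if draft.confidence < 0.8 or not approve(draft.text):
        return "escalated to human"
    return draft.text

print(handle("refund request #1042", approve=lambda text: False))
```

The `approve` callback is where the human sits; swap it for a review queue in a real system and the structure does not change.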

Where agents still faceplant

  • Long-running tasks where state accumulates — they lose the plot around step 15
  • Tasks where the right move is "give up and ask a human" — they power through and produce nonsense
  • Anything where tool errors cascade — one failed API call often cooks the whole run
  • Cost control — agents happily burn $40 of tokens solving a $3 problem
  • Consistency across runs — same prompt, same inputs, sometimes different behaviour

The last one is the sleeper. Most engineering teams are used to systems that behave the same given the same inputs. Agents do not. You have to design around that from day one, not bolt it on.
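One way to design around non-determinism from day one is to stop treating a single run as the answer: sample the agent several times and require agreement before acting. A minimal sketch, with `ask_agent` as a simulated stand-in for a real (non-deterministic) model call:

```python
from collections import Counter

def ask_agent(prompt: str, seed: int) -> str:
    # Simulated nondeterminism: same prompt, occasionally a different answer.
    return "42" if seed % 3 else "41"

def majority_answer(prompt: str, runs: int = 5, quorum: float = 0.6):
    # Vote across independent runs instead of trusting one sample.
    answers = Counter(ask_agent(prompt, seed) for seed in range(runs))
    best, count = answers.most_common(1)[0]
    # No quorum means no answer -- escalate rather than guess.
    return best if count / runs >= quorum else None

print(majority_answer("what is 6 * 7?"))
```

The quorum threshold is the knob: raise it and the agent answers less often but more consistently, which is usually the trade you want.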

The "evaluation" problem nobody wants to talk about

Shipping an agent without a proper evaluation harness is shipping a car without a speedometer. You will not know when it regresses. You will not know when the new model helped or hurt. You will only notice when a customer tells you.

Evals are unsexy infrastructure that nobody demos. They are also the single highest-leverage thing you can build if you are serious about agents. If your agent team has a glossier roadmap than their eval suite, bet against the project.
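A useful eval harness can start embarrassingly small: a fixed set of cases with known answers, run on every model or prompt change. A minimal sketch — the cases and the `agent` stub are placeholders for whatever your system actually does:

```python
EVAL_CASES = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]

def agent(prompt: str) -> str:
    # Stand-in for the real agent under test.
    return {"2 + 2": "4", "capital of France": "Paris"}.get(prompt, "")

def run_evals(agent_fn, cases):
    # Score every case; keep the failures so regressions are named,
    # not just counted.
    results = [(c["input"], agent_fn(c["input"]) == c["expected"]) for c in cases]
    passed = sum(ok for _, ok in results)
    return passed / len(cases), [name for name, ok in results if not ok]

score, failures = run_evals(agent, EVAL_CASES)
print(f"pass rate: {score:.0%}, regressions: {failures}")
```

Wire `run_evals` into CI and you have the speedometer: a model swap that drops the pass rate fails the build instead of surfacing in a customer ticket.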

Where to actually bet

  • Internal tools first, customer-facing later. Your team tolerates agent quirks; your customers do not
  • Workflows where a human already reviews the output. You are automating the first 80%, not the last 20%
  • Domains with clear ground truth — answer was right or wrong, no ambiguity
  • Problems with bounded cost. Pay per success, not per attempt
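"Bounded cost" is also something you can enforce mechanically rather than hope for. A hypothetical sketch (the class, names, and per-call estimate are all assumptions): give each task a hard budget and check it before every model or tool call, so a runaway loop stops at the cap instead of at the invoice.

```python
class BudgetExceeded(Exception):
    pass

class TokenBudget:
    """Illustrative hard cap on per-task spend."""

    def __init__(self, max_usd: float):
        self.max_usd = max_usd
        self.spent = 0.0

    def charge(self, usd: float) -> None:
        # Check before the call, not after the bill arrives.
        if self.spent + usd > self.max_usd:
            raise BudgetExceeded(
                f"${self.max_usd:.2f} cap hit after ${self.spent:.2f}"
            )
        self.spent += usd

budget = TokenBudget(max_usd=3.00)
try:
    for _ in range(10):
        budget.charge(0.50)  # each model call estimated at $0.50
except BudgetExceeded as e:
    print(f"stopping run: {e}")
```

This is the $40-tokens-for-a-$3-problem failure mode from earlier, closed off by construction: the run dies at the budget, not at step forty.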

The boring truth

Agents in 2026 are great for a specific class of problem, useless for another class, and dangerous for a third. The winning teams are not the ones building the flashiest agent. They are the ones honest about which class they are in.

The demo era is over. The "quietly compounding inside serious companies" era is just starting. That is the better era to be in.

An agent without evals is a demo. A demo is not a product.