The Biggest AI Blunders Companies Don’t Talk About

February 12, 2026 · 8 min read · Siddharth Puri

Behind every "AI-powered" press release sits a stack of quiet disasters nobody posts on LinkedIn — the chatbot that recommended a competitor, the email that implied a refund policy that does not exist, the personalisation engine that rediscovered last year's bias. This post walks through the failure modes that keep repeating, the shared root cause (it is almost never the model), and the one question that separates "AI roadmap" from "expensive demo": where is your evaluation plan?

No one tweets about the AI feature that shipped, embarrassed the company for a week, and got quietly rolled back at 2 AM on a Thursday. No one writes a LinkedIn post saying "we burned six months and $300k and then turned it off." So let me, because the pattern keeps repeating, and it is not actually that subtle once you have seen it enough times.

The classic failure modes

  • Chatbots that confidently recommend competitor products because the training data included reviews
  • AI-generated emails that imply a refund policy or warranty the company never promised
  • Personalisation systems that rediscover last year's bias in a new way every quarter
  • Demos that are 80% hand-coded, 20% LLM call, and get called "AI-powered" in the press release
  • Summarisation features that confidently invent quotes nobody said
  • Support bots that escalate nothing, making angry users call three times before reaching a human
  • Hiring screens that quietly filter out good candidates for reasons nobody can explain

The common root cause

Most of these are not AI problems. They are "we shipped without evaluation" problems. Let me say that more precisely, because I have said it in client meetings at least forty times: if you cannot describe how your AI feature is tested, it is not a feature. It is a future incident.

The companies that get AI right have three things most companies skip entirely:

  • A set of real examples that represent what customers will actually ask (not what the demo script asked)
  • A defined bar for "good enough" that everyone agreed to before shipping
  • A continuous eval loop that tells them when quality drops, without relying on customers to complain
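
That third bullet is the one teams nod at and rarely build, so here is a minimal, self-contained sketch of the whole loop: a golden set of real inputs, an agreed bar, and a check that fails loudly. Every name in it (the golden set, the scorer, the 0.85 bar) is an invented stand-in, not something lifted from any particular client's stack:

```python
# Minimal sketch of a pre-ship eval gate. Everything named here
# (the golden set, the scorer, the bar) is a stand-in.

PASS_BAR = 0.85  # the "good enough" bar everyone agreed to before shipping

# In practice this is a versioned file of real, weird customer inputs,
# not three inline examples.
GOLDEN_SET = [
    {"input": "cancel NOW or i dispute the charge", "expected": "escalate"},
    {"input": "u said id get a refund last week??", "expected": "refund_status"},
    {"input": "is the competitor's plan cheaper?", "expected": "pricing"},
]

def call_feature(user_input: str) -> str:
    """Stand-in for the real AI feature (chatbot, summariser, router...)."""
    return "escalate"  # replace with the actual model call

def score(output: str, expected: str) -> float:
    """Stand-in scorer. Exact match here; in practice a rubric, a
    classifier, or an LLM judge checked against human labels."""
    return 1.0 if output == expected else 0.0

def run_eval() -> float:
    """Run the feature over the golden set and return the average score."""
    scores = [score(call_feature(c["input"]), c["expected"]) for c in GOLDEN_SET]
    avg = sum(scores) / len(scores)
    print(f"{len(GOLDEN_SET)} cases, average score {avg:.2f}, bar {PASS_BAR}")
    return avg

if __name__ == "__main__":
    # Gate the release: exit non-zero if quality is below the agreed bar.
    if run_eval() < PASS_BAR:
        raise SystemExit("Below the bar: do not ship.")
```

The point is not these twenty lines. The point is that "good enough" is written down, and the release gates on it.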

Case study: the helpful chatbot that hurt retention

A company I worked with shipped a support chatbot to take pressure off their Tier 1 team. Launch week: queue dropped 40%. Leadership celebrated. Three months later, retention had quietly dropped 6%, and nobody could figure out why.

The chatbot was resolving tickets — technically. But the resolutions were shallow. Users got a polite dismissal from the bot, felt unheard, and churned silently. The proxy metric ("tickets resolved") said everything was great. The metric that mattered ("are customers renewing?") said the opposite. The real bug was that "tickets resolved" was never the real goal.

Case study: the personalisation engine that rediscovered bias

Another client built a fancy personalisation engine. It learned from user behaviour, updated weekly, shipped recommendations. Inside six months it had rediscovered every historical bias in their dataset and was showing women different products than men in ways that made the marketing team wince.

The model was working exactly as designed. The design did not include "do not make things worse than the status quo." That is not an AI mistake. That is a specification mistake.

Case study: the enterprise demo that could not survive a Tuesday

Sales demo looks incredible. Customer signs a $2M contract. In week three of deployment, real customer data is weirder than demo data, queries break, confidence evaporates. The customer renews for a fraction of the original contract. Everyone involved learns an expensive lesson about the gap between demo data and real data.

Demos are selected. Real life is unselected. This gap is where most AI products die.

How to not be the next case study

  • Before you build, write down ten weird real inputs the feature will see. Not five happy-path ones — ten weird ones
  • Define what "good" looks like in measurable terms. If you cannot measure it, you cannot ship it
  • Ship to 1% first. Not because it is hard to deploy to 100%, but because it is hard to roll back from 100%
  • Build a "quality-drop alarm" — an automated check that screams when outputs degrade (see the sketch after this list)
  • Write the rollback plan before the launch plan. If your rollback is "we panic," you are not ready
  • Treat your AI feature like a third-party vendor — expect it to go down, expect it to hallucinate, design accordingly
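
Since the quality-drop alarm is the item teams most often skip, here is what the shape can look like: sample live outputs, score them the same way the pre-ship evals do, and wake someone up when a rolling average dips. Every function name and threshold below is invented for illustration:

```python
from collections import deque

WINDOW = 200        # how many recent production outputs the average covers
ALARM_BELOW = 0.80  # invented threshold; calibrate against your own baseline

recent_scores = deque(maxlen=WINDOW)

def score_live_output(output: str) -> float:
    """Stand-in: ideally the same scorer your pre-ship evals use."""
    return 1.0  # replace with the real check

def alert(message: str) -> None:
    """Stand-in for whatever actually wakes a human up (pager, Slack...)."""
    print(f"ALARM: {message}")

def record(output: str) -> None:
    """Call this on a sample of production outputs as they are generated."""
    recent_scores.append(score_live_output(output))
    if len(recent_scores) == WINDOW:
        avg = sum(recent_scores) / WINDOW
        if avg < ALARM_BELOW:
            alert(f"rolling quality {avg:.2f} fell below {ALARM_BELOW}")
```

Swap the print for whatever actually pages a human; the shape stays the same.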
If your AI roadmap has no evaluation plan, you do not have an AI roadmap. You have a demo.

None of this is glamorous. None of it makes good press-release material. But the companies that do this quietly are the ones whose AI features are still shipping a year later, with customers who trust them. Everybody else is on their fourth rebrand of the same feature.
