Enterprise AI Agents Are Failing at an 80-95% Rate — and the Models Aren't the Problem

The Dirty Secret Nobody in the AI Hype Cycle Wants to Say Out Loud
Most enterprise AI agents fail. Not because the models are dumb. Because the organizations deploying them are unprepared.
MIT's NANDA report put a number on it: 95% of corporate generative AI pilots are failing. McKinsey's State of AI report confirms that very few enterprise agents make it from pilot to production. According to an analysis by blogger Drew Breunig, who synthesized findings from MIT NANDA, McKinsey, Wharton/GBK, and UC Berkeley's MAP study, the pattern holds across these major 2025 research efforts.
Companies are spending real money — executive time, engineering hours, vendor contracts — and getting almost nothing back.
The Real Failure Mode: It's Organizational, Not Technical
Jyoti Shah, Director of Applications Development and GenAI tech leader at ADP, wrote in Forbes that the core mistake is treating an organizational challenge like a technology deployment. The gap, she says, is "orchestration, trust and incentives" — NOT model capability.
She described watching a support automation agent that was given simultaneous access to ticket resolution workflows AND account configuration systems. The agent started making configuration changes based on incomplete ticket context. Teams spent more time diagnosing the side effects than fixing the original customer problems. The agent wasn't malfunctioning. Nobody had defined what it was and wasn't allowed to do.
This pattern is showing up across IT departments nationwide.
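The scoping gap Shah describes can be made concrete with a minimal sketch. It assumes the agent's permissions are declared up front as an explicit allowlist and that every action is checked against it before it runs. All names here (AgentScope, run_action, the action strings) are hypothetical and do not come from ADP or any real product.

```python
from dataclasses import dataclass


class ActionNotPermitted(Exception):
    """Raised when an agent attempts an action outside its declared scope."""


@dataclass(frozen=True)
class AgentScope:
    # Hypothetical scope declaration; the action names are illustrative only.
    name: str
    allowed_actions: frozenset

    def check(self, action: str) -> None:
        if action not in self.allowed_actions:
            raise ActionNotPermitted(f"{self.name} may not perform '{action}'")


def run_action(scope, action, handler, *args):
    """Run handler(*args) only if the scope explicitly allows the action."""
    scope.check(action)
    return handler(*args)


# Stand-in handlers for the two systems in the support-agent anecdote.
def close_ticket(ticket_id):
    return f"ticket {ticket_id} closed"


def update_account_config(account_id, key, value):
    return f"{account_id}.{key}={value}"


# The support agent may work tickets but has no account-configuration rights.
support_scope = AgentScope(
    name="support-agent",
    allowed_actions=frozenset({"read_ticket", "reply_to_ticket", "close_ticket"}),
)

result = run_action(support_scope, "close_ticket", close_ticket, 42)

try:
    run_action(support_scope, "update_account_config",
               update_account_config, "acct-1", "plan", "pro")
    blocked = False
except ActionNotPermitted:
    blocked = True
```

The code is the easy part. Someone in the organization still has to decide what goes in the allowlist, and that is the step Shah says gets skipped.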
RAG Was Supposed to Fix This. It Didn't.
The standard enterprise answer to AI reliability has been Retrieval-Augmented Generation — RAG. Shove relevant documents into the model's context window and let it figure things out. According to Wyatt Mayham of Northwest AI Consulting, quoted by VentureBeat, RAG "breaks immediately" for agents that need to make decisions and take actions.
The problem: a retrieved document doesn't tell an agent whether it still applies, whether it's been superseded, or whether a conflicting rule takes priority. "Miss any of that," Mayham said, "and the agent confidently does the wrong thing."
Rippletide co-founder and CSO Yann Bilien calls the alternative approach a decision context graph: a structured map encoding which rules are applicable, when they apply, and in what sequence. The key property he emphasizes is "non-regressivity," the ability to freeze validated action sequences and build on them without regressing to earlier failures. It's a fundamentally different architecture from RAG, and it's the kind of thing most enterprise pilots never bother to build.
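To illustrate what a retrieved document alone cannot answer, here is a minimal sketch of rule selection that records the three things Mayham lists: whether a rule is in effect, whether it has been superseded, and which rule wins a conflict. This is NOT Rippletide's decision context graph, and it does not model non-regressivity. The Rule fields, the applicable_rule function, and the sample refund rules are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Rule:
    # Hypothetical schema; field names are illustrative only.
    rule_id: str
    topic: str
    text: str
    effective_from: date
    priority: int = 0                    # higher priority wins a conflict
    superseded_by: Optional[str] = None  # id of the rule that replaces this one


def applicable_rule(rules, topic, today):
    """Return the single rule governing `topic` on `today`, or None."""
    by_id = {r.rule_id: r for r in rules}
    candidates = []
    for r in rules:
        # Ignore rules on other topics and rules not yet in effect.
        if r.topic != topic or r.effective_from > today:
            continue
        # Drop a rule only once its replacement is itself in effect.
        replacement = by_id.get(r.superseded_by) if r.superseded_by else None
        if replacement is not None and replacement.effective_from <= today:
            continue
        candidates.append(r)
    if not candidates:
        return None
    # Resolve conflicts by priority, then by the most recent effective date.
    return max(candidates, key=lambda r: (r.priority, r.effective_from))


# Made-up sample rules for the sketch.
rules = [
    Rule("refund-v1", "refund", "Refund within 30 days",
         date(2023, 1, 1), superseded_by="refund-v2"),
    Rule("refund-v2", "refund", "Refund within 14 days", date(2025, 1, 1)),
    Rule("refund-legal", "refund", "Refunds over $500 need manager approval",
         date(2024, 6, 1), priority=10),
]

chosen = applicable_rule(rules, "refund", date(2025, 6, 1))
```

A plain retrieval step could return all three refund rules as equally relevant text. The metadata lets the agent see that refund-v1 is dead and that refund-legal outranks refund-v2.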
The Production Crisis Nobody Wanted to Talk About
On the software operations side, the problem is getting worse fast. AI-powered code generation has exploded — engineering teams are shipping dramatically more code than they were two years ago. The catch: keeping that code running in production is still overwhelmingly manual.
Resolve AI, which raised a $125 million Series A at a $1 billion valuation earlier this year, is betting its entire company on that gap. CEO Spiros Xanthos told VentureBeat that the company's new platform deploys coordinated teams of specialized agents to diagnose production failures in parallel, rather than a single agent working alone. He claims a more than 2x improvement in root cause accuracy, but that figure comes from internal benchmarks, NOT third-party audited results. Still, the shape of the problem is telling: AI is breaking production systems, and we need more AI to fix them.
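The general idea of parallel, specialized investigators can be sketched in a few lines. This is a toy illustration of the fan-out pattern, not Resolve AI's system. The investigator functions, their hardcoded findings, and the confidence scores are invented placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


# Each hypothetical investigator returns (source, confidence, finding).
# The values are hardcoded placeholders, not real diagnostic output.
def check_logs(incident):
    return ("logs", 0.4, "error spike shortly after a deploy")


def check_metrics(incident):
    return ("metrics", 0.7, "latency jump on one service")


def check_deploys(incident):
    return ("deploys", 0.9, "config change rolled out just before the incident")


def diagnose(incident, investigators):
    """Run all investigators concurrently and rank findings by confidence."""
    with ThreadPoolExecutor(max_workers=len(investigators)) as pool:
        findings = list(pool.map(lambda fn: fn(incident), investigators))
    return sorted(findings, key=lambda f: f[1], reverse=True)


ranked = diagnose({"id": "INC-1"}, [check_logs, check_metrics, check_deploys])
top_source = ranked[0][0]
```

The fan-out is trivial. The hard part, and the part a vendor benchmark has to prove, is whether each investigator's confidence score means anything.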
New Platforms Keep Launching. The Fundamentals Don't Change.
Kore.ai launched its Artemis Agent Platform on May 21, 2026, introducing a YAML-based Agent Blueprint Language designed to let enterprises govern and build AI agents using AI itself. Founder and CEO Raj Koneru told VentureBeat the goal is to compress months of engineering work into days.
It's a credible technical approach. Version-controlled YAML artifacts, six built-in orchestration patterns, compiler and runtime included.
But new platforms don't fix organizational dysfunction. You can hand a disorganized construction crew the best power tools on the market. The building still falls down if nobody agrees on the blueprints.
What the Tech Press Is Getting Wrong
Mainstream tech coverage — including most of VentureBeat's own reporting — focuses almost exclusively on new platform launches, funding rounds, and vendor capability claims. The 95% failure rate gets mentioned in passing, if at all.
The real story is in the enterprise behavior data. According to Breunig's synthesis of the Wharton/GBK AI Adoption Report, 82% of enterprise leaders use generative AI weekly — but they're using ChatGPT, Copilot, and Gemini. Third-party, off-the-shelf tools. NOT internally built agents.
Custom-built enterprise agents — the ones companies are spending the most money on — see dramatically lower adoption. Employees trust ChatGPT. They don't trust the thing IT built last quarter.
The trust gap is the actual story, and it's going largely unreported.
What This Means for Regular People
If you work at a company currently "piloting AI agents," there's a better-than-even chance that pilot is quietly failing — and no one in leadership wants to say it out loud. They've got a vendor contract, a board update to deliver, and a press release ready to go.
The technology can work. The companies deploying it, mostly, aren't ready.