Enterprise AI Is Burning Cash and Crashing in Production — Here's How the Smart Companies Are Fixing It

Most companies rushed AI into production without doing the engineering homework, and now they're paying for it — literally. Pinterest slashed AI costs 90% by gutting a frontier model and rebuilding it with proprietary data. Meanwhile, Gartner warns that through 2028, at least half of all generative AI projects will blow their budgets. The lesson: AI cost is an architecture problem, not a spending problem.

The Hype Worked. The Bills Arrived.

Enterprises spent the last two years racing to deploy AI. Now the invoices are landing, the systems are crashing, and the results are underwhelming.

According to Gartner, through 2028, at least 50% of generative AI projects will overrun their budgeted costs due to poor architectural choices and lack of operational discipline. Not bad luck. Bad engineering decisions made in a hurry.

A survey of over 300 CIOs conducted by Gartner in June and July 2024 found that more than 90% said managing cost limits their ability to extract real value from AI. Put differently, roughly nine in ten CIOs surveyed see cost as a constraint on the value they get from AI.

A separate MIT study found that 95% of generative AI pilots produce zero measurable impact on profit and loss. Not because the technology doesn't work — but because companies bolted AI onto broken foundations and called it a strategy.

Pinterest Figured It Out — By Breaking the Model on Purpose

Pinterest CTO Matt Madrigal didn't wait for a consulting firm to hand him a roadmap. He looked at 620 million monthly active users and did the math: calling a frontier model for every single image recommendation wasn't a product strategy, it was a budget catastrophe.

His solution, detailed in a recent VentureBeat Beyond the Pilot podcast, was surgical. His team took Qwen3-VL — a leading open-source vision-language model — and literally ripped out its vision encoder layer. Gone. Replaced with Pinterest's own proprietary multimodal embeddings built from years of accumulated pin and image metadata.

The results were substantial. Costs dropped 90%. Accuracy improved 30%. Inference latency would have been 20 times worse under the old approach, which had to encode every image at runtime, one at a time; that work is now done offline with precomputed embeddings.
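To see why precomputing matters, here is a minimal Python sketch of the two serving patterns. It is an illustration only, not Pinterest's code: `encode_image`, the toy embedding, and the catalog and traffic lists are all hypothetical stand-ins. The point is simply how often the expensive encoder runs in each pattern.

```python
from typing import Dict, List

encode_calls = 0  # counts how often the (hypothetical) expensive encoder runs


def encode_image(image_id: str) -> List[float]:
    """Stand-in for an expensive vision encoder; hypothetical, not Pinterest's."""
    global encode_calls
    encode_calls += 1
    # Toy deterministic "embedding" derived from the first characters of the id.
    return [float(ord(c) % 7) for c in image_id[:4]]


def precompute(image_ids: List[str]) -> Dict[str, List[float]]:
    """Offline batch job: encode each catalog image once and store the result."""
    return {image_id: encode_image(image_id) for image_id in image_ids}


def recommend_runtime(requests: List[str]) -> List[List[float]]:
    """Runtime pattern: the encoder runs on every single request."""
    return [encode_image(image_id) for image_id in requests]


def recommend_precomputed(
    requests: List[str], store: Dict[str, List[float]]
) -> List[List[float]]:
    """Precomputed pattern: serving a request is a dictionary lookup."""
    return [store[image_id] for image_id in requests]


catalog = ["pin_a", "pin_b", "pin_c"]
traffic = ["pin_a", "pin_b", "pin_a", "pin_c", "pin_a", "pin_b"]

encode_calls = 0
runtime_result = recommend_runtime(traffic)
runtime_calls = encode_calls  # one encoder call per request

encode_calls = 0
store = precompute(catalog)
offline_calls = encode_calls  # one encoder call per catalog image
precomputed_result = recommend_precomputed(traffic, store)
serve_time_calls = encode_calls - offline_calls  # encoder calls while serving

print(runtime_calls, offline_calls, serve_time_calls)
```

In this toy setup the runtime pattern pays for the encoder once per request, while the precomputed pattern pays once per catalog image and nothing at serving time. Real savings depend on traffic, catalog size, and how often embeddings need refreshing.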

Madrigal's framework is straightforward and replicable: "If you've got really unique data that you can fine-tune an open-source model with, data quality will outweigh model size."

Pinterest also built a "taste graph" — a dynamic, continuously retrained representation of individual user preferences — to drive its conversational shopping assistant, Navigator 1. This is a specific architectural decision that separates Pinterest from competitors calling expensive APIs they don't control.

The Reliability Problem Nobody Wants to Talk About

Cost is only half the crisis. The other half is that these AI agents keep breaking.

Preeti Somal, Senior VP of Engineering at Temporal, laid it out plainly at an AI Impact Series event in New York. She said Temporal regularly works with customers who are building "version 2.0 of the same agent" — because the first version was deployed so fast that nobody thought about what happens when it crashes.

"Things crash and burn, and then they're back to rebuilding with the reliable foundation," Somal said.

The failure mode is predictable. Enterprise AI workflows are long-running — spanning multiple models, APIs, retrieval systems, and external tools — sometimes executing over hours or days. When one piece fails and there's no state management or recovery mechanism, the entire workflow restarts. Every restart multiplies inference costs. Every restart increases latency. Every restart is a bad customer experience.
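The restart problem can be sketched in a few lines of Python. This is a simplified illustration of checkpointing, not Temporal's API: the step names, the in-memory `checkpoint` dictionary, and the simulated outage are all assumptions made for the example. A production system would persist state durably rather than in a local variable.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[], str]]


def run_workflow(steps: List[Step], checkpoint: Dict[str, str]) -> Dict[str, str]:
    """Run steps in order, skipping any whose result is already checkpointed."""
    for name, fn in steps:
        if name in checkpoint:
            continue  # finished in an earlier attempt; do not pay for it again
        checkpoint[name] = fn()  # recorded only if fn() returns without raising
    return checkpoint


calls = {"retrieve": 0, "summarize": 0, "send": 0}  # executions per step
fail_once = {"send": True}  # makes the last step fail on its first attempt


def retrieve() -> str:
    calls["retrieve"] += 1
    return "docs"


def summarize() -> str:
    calls["summarize"] += 1  # imagine a costly model call here
    return "summary"


def send() -> str:
    calls["send"] += 1
    if fail_once["send"]:
        fail_once["send"] = False
        raise RuntimeError("simulated outage in an external tool")
    return "sent"


steps: List[Step] = [("retrieve", retrieve), ("summarize", summarize), ("send", send)]
checkpoint: Dict[str, str] = {}

try:
    run_workflow(steps, checkpoint)  # first attempt fails at the last step
except RuntimeError:
    pass

result = run_workflow(steps, checkpoint)  # second attempt resumes at "send"
print(calls)
```

With the checkpoint in place, only the failed step runs twice. Without it, every retry would repeat the retrieval and the model call as well, which is the cost multiplication described above.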

Somal compared it to the early days of cloud migration, when companies did "lift and shift" — moving workloads to cloud without redesigning architecture — then discovered they were paying more for cloud than their old data centers and getting less value. Same mistake, new technology.

What the Coverage Is Missing

Most mainstream tech coverage of enterprise AI focuses on which model is smartest, which startup raised the most money, or which CEO said something bold at a conference.

The real story isn't model quality. It's operational discipline. Most AI spend comes from token usage, GPU utilization, and inefficient workflows, not from an inadequate model. For most enterprise use cases, fine-tuning smaller, task-specific models tends to deliver stronger ROI than chasing the biggest frontier model.

The companies winning this aren't the ones with the biggest AI budgets. They're the ones treating cost as an architectural constraint from day one — using model routing, token-efficient prompts, optimized retrieval pipelines, autoscaling, and what Azilen calls "AI FinOps" to align spending with actual business outcomes.
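Model routing, the first item on that list, is easy to picture with a small Python sketch. Everything here is hypothetical: the model names, the per-token prices, the 200-token threshold, and the `needs_reasoning` flag are invented for illustration and do not come from any vendor or from the companies quoted in this story.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Model:
    name: str
    usd_per_1k_tokens: float  # made-up price, for illustration only


SMALL = Model("small-task-model", 0.10)
LARGE = Model("large-frontier-model", 2.00)


def estimate_tokens(prompt: str) -> int:
    """Crude token estimate: count whitespace-separated words."""
    return len(prompt.split())


def route(prompt: str, needs_reasoning: bool) -> Model:
    """Send only long or reasoning-heavy requests to the large model."""
    if needs_reasoning or estimate_tokens(prompt) > 200:
        return LARGE
    return SMALL


def cost(prompt: str, model: Model) -> float:
    """Estimated spend in USD for one request on the given model."""
    return estimate_tokens(prompt) / 1000 * model.usd_per_1k_tokens


requests: List[Tuple[str, bool]] = [
    ("classify this support ticket", False),
    ("extract the invoice total", False),
    ("draft a migration plan for our billing system", True),
]

routed = [(prompt, route(prompt, flag)) for prompt, flag in requests]
routed_cost = sum(cost(prompt, model) for prompt, model in routed)
all_large_cost = sum(cost(prompt, LARGE) for prompt, _ in requests)

print(f"routed: ${routed_cost:.4f}  all-large: ${all_large_cost:.4f}")
```

The routing rule is deliberately naive. In practice the decision might come from a classifier or from quality thresholds measured on real traffic, but the budgeting logic is the same: the expensive model is reserved for requests that need it.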

Only 11% of enterprises have successfully scaled AI across departments, according to Appinventiv. This is a leadership and planning problem, not a technology one.

What This Means for Real Businesses

If your company is still in pilot mode, you can still make the right architectural decisions before the bills get ugly. Don't skip the plumbing.

If you're already in production and the costs are climbing while the results are soft, you're probably building version 2.0 whether you know it or not. The question is whether you admit it now or after another year of sunk costs.

Pinterest's playbook is available to anyone paying attention: own your data, customize open-source models foundationally, precompute what can be precomputed, and stop calling expensive frontier APIs for problems your proprietary data can solve cheaper and better.

The AI bill is real. The solution isn't to spend more. It's to build smarter — the first time.

Sources

VentureBeat, "Pinterest cut AI costs 90% by gutting a frontier model's vision layer"
VentureBeat, "AI agents are entering their rebuild era as enterprises confront the reliability problem"
Appinventiv, "Scaling AI: Cost-Optimization Strategies for Enterprises"
Azilen, "8 AI Cost Optimization Strategies for Enterprise AI Systems"
TrueFoundry, "10 Ways to Reduce Gen AI Costs: Insights from the Gartner ..."