READ. SCROLL. LISTEN.

Original briefings. Zero spin.

Every story is an original briefing written from 60+ sources across the spectrum — sources linked so you can verify it yourself.

← Back to headlines

tech Notable (5/10) June 6, 2026 at 02:02 AM

Model Routing Emerges as Corporate America's Answer to Runaway AI Bills — and It Threatens OpenAI and Anthropic's Revenue Models

Since reports emerged in late May that companies like Uber and Cisco were blowing past their entire annual AI budgets within months, a concrete technical fix has taken shape: model routing. The strategy could slash AI spending by up to 90% on routine tasks — which is great news for CFOs and terrible news for the frontier AI companies whose sky-high valuations depend on everyone paying premium prices for everything.

Since our earlier coverage confirmed that Uber burned its entire 2026 AI coding budget by April and Cisco overran its own targets with a projected $900 million annual tab for 90,000 employees, the corporate AI spending crisis has moved from shock to strategy.

The new strategy is called model routing — a key development in enterprise AI that could reshape how companies manage AI costs.

What Model Routing Actually Is

The concept is straightforward. Instead of firing every query — from 'summarize this email' to 'redesign our entire database architecture' — through GPT-5.1 or Claude Opus 4.5, you match the task to the cheapest model capable of handling it.

Scott Wu, CEO of Cognition (makers of the coding agent Devin), told CNBC that for routine boilerplate work, companies can achieve five to ten times better cost efficiency by routing to cheaper alternatives. Arvind Jain, CEO of enterprise AI company Glean, estimates that roughly 95% of enterprise AI usage today still runs on the most expensive frontier models — even for tasks a $5-per-million-token model could handle just fine.

Wu made the point bluntly: ask any AI model, expensive or cheap, to name the third U.S. president. Every single one says Thomas Jefferson. You're paying GPT-5.1 rates for a Wikipedia lookup.

The Numbers That Should Make Every CFO Angry

Cisco Chief Product Officer Jeetu Patel laid out the math to CNBC: at roughly $200 in token usage per employee per week, that's $10,000 per person per year. For a 90,000-employee company, that's $900 million annually. Patel confirmed Cisco came in well over budget and has had to reallocate resources, with 30,000 engineers now building products largely written with AI assistance.

Priceline's situation is worse. Senior Director of IT Finance Chris Reed told TechCrunch that a routine Cursor contract renewal came back 4-5x more expensive than expected. Reed's description of the dynamic was characteristically blunt: 'It's like the crack-cocaine epidemic. They let you try it to get you hooked on it, and now you're kind of beholden to it.' Priceline has responded by placing token limits on certain employee groups.

One unnamed company reportedly ended up with a $500 million Claude bill after simply forgetting to set usage limits for employees, according to TechCrunch.

What's Actually Driving the Cost Surge

Per-token prices have actually fallen. This isn't a pricing gouge story. The crisis is driven by consumption volume, not unit cost.

New model releases in November — Anthropic's Claude Opus 4.5, OpenAI's GPT-5.1, and Google's Gemini 3 Pro — brought major improvements to agentic tools. Agentic AI systems don't just answer one question; they chain together dozens or hundreds of queries to complete a task autonomously. Each step burns tokens. The better the agents got, the more companies deployed them. The more they deployed them, the more tokens got consumed — without anyone watching the meter.

Alexander Embiricos, OpenAI's head of enterprise, told TechCrunch at an event in New York City this week that the corporate conversation has completely flipped. Six months ago, he said, every client wanted to know what the technology could do. Now they want to know: 'What visibility do you have? What auditability do you have? What token controls do you have?'

A New Standards Body Enters the Picture

The Linux Foundation announced plans this week for the Tokenomics Foundation, a new standards body explicitly designed to bring the same cost discipline to AI tokens that the FinOps movement brought to cloud computing.

J.R. Storment, executive director of the FinOps Foundation (itself a Linux Foundation project), told TechCrunch the inflection point hit hard: 'In April and May, I started hearing from companies: Oh my god, we are 3x over our entire 2026 token budget and it's only April. We started hearing existential crises, and the whole conversation shifted from tokenmaxxing and go fast to we need guardrails, how do we control this?'

This is the cloud cost management story playing out again — except faster and with higher stakes.

Revenue Pressure on OpenAI and Anthropic

Both companies' valuations are built on an assumption: enterprises will keep routing maximum volume through maximum-cost frontier models indefinitely. Model routing challenges that assumption directly.

If 95% of enterprise usage is currently on frontier models but only 5-10% of tasks actually require frontier models, the addressable market for GPT-5.1 and Claude Opus 4.5 at premium prices could shrink significantly as companies optimize spending.

Microsoft already revoked its developers' Claude Code licenses months after rolling them out, according to TechCrunch. That's a significant signal — the largest software company on earth decided the ROI math doesn't work.

What This Means for Regular People

If you work at any company deploying AI tools, your access is about to get tiered — and probably more restricted. The era of 'use the best model for everything' is over at most large enterprises. Expect usage caps, routing policies, and a lot more questions from finance about what exactly your team is doing with AI.

For taxpayers: the federal government is also deploying AI at scale. Equivalent scrutiny is not being applied to federal token consumption. Nobody is asking what federal agencies are paying per query or whether they've set any usage limits whatsoever. Someone should start asking.

Sources used for this briefing

This briefing was written by UBH's AI agent — these are the reporting inputs it draws on, linked so you can verify.

center

forbesOptimizing LLM Costs Through Strategic Routing

center-left

TechCrunchThe token bill comes due: Inside the industry scramble to manage AI’s runaway costs

center-left

CNBCModel routing is a fix for AI overspending. That's a problem for OpenAI and Anthropic

center-left

CNBCBoeing to start 737 Max production on new assembly line July 6, CEO says

center-left

techcrunchEnterprises turn to model routing to curb runaway AI spending

unknown

cioHow to Tame Runaway AI Spending

Search Results

Model Routing Emerges as Corporate America's Answer to Runaway AI Bills — and It Threatens OpenAI and Anthropic's Revenue Models

What Model Routing Actually Is

The Numbers That Should Make Every CFO Angry

What's Actually Driving the Cost Surge

A New Standards Body Enters the Picture

Revenue Pressure on OpenAI and Anthropic

What This Means for Regular People

Sources used for this briefing