60% of enterprises curb AI spending as token costs surge, UBS finds

Roughly 60% of enterprises have imposed controls on AI spending, UBS found, as token consumption from agents and coding tools pushes costs into CFO-level scrutiny and forces a shift toward cheaper models including Chinese open-source alternatives.

"This is a big speed bump, not a small one," Databricks Chief Executive Officer Ali Ghodsi said, describing the recalibration.

The price gap between tiers is stark: Anthropic charges $5 per million output tokens for Haiku 4.5 and $50 per million for its top-tier Fable/Mythos 5, a tenfold spread that makes model routing economically compelling. One company saw a single user rack up $35,000 in monthly AI costs on AWS Bedrock, according to the report. Another cut its internal AI tools to two from five after burning through its token budget.
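To make the tenfold spread concrete, the arithmetic can be sketched in a few lines of Python. The two prices are the ones quoted above. The routed shares in the loop are purely illustrative assumptions, not figures from the UBS report, and the function names are hypothetical.

```python
# Per-million-output-token prices quoted in the article (USD).
CHEAP_PRICE = 5.0     # lower tier
PREMIUM_PRICE = 50.0  # top tier


def blended_price(cheap_share: float,
                  cheap_price: float = CHEAP_PRICE,
                  premium_price: float = PREMIUM_PRICE) -> float:
    """Average price per million output tokens when `cheap_share`
    (a fraction from 0 to 1) of output tokens goes to the cheap tier."""
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    return cheap_share * cheap_price + (1.0 - cheap_share) * premium_price


def savings_vs_premium(cheap_share: float) -> float:
    """Fractional saving compared with sending every token to the top tier."""
    return 1.0 - blended_price(cheap_share) / PREMIUM_PRICE


if __name__ == "__main__":
    # Illustrative shares only; real workloads will differ.
    for share in (0.0, 0.5, 0.8, 1.0):
        print(f"{share:.0%} on cheap tier: "
              f"${blended_price(share):.2f} per million output tokens, "
              f"{savings_vs_premium(share):.0%} saved")
```

Under these assumptions, sending 80% of output tokens to the cheaper tier brings the blended price to roughly $14 per million, about 72% below the all-premium price. The sketch ignores input-token pricing, caching discounts and any quality trade-off.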

The shift threatens revenue growth for premium AI providers such as Anthropic and OpenAI while creating openings for cheaper alternatives. Chinese open-source models — Alibaba's Qwen, DeepSeek, MiniMax and Zhipu's GLM — are entering enterprise procurement lists. A major global bank has deployed Qwen locally to balance its use of Anthropic's Claude, the report said.

Model routing reshapes the cost curve

The most consequential technical response is model routing — assigning simple tasks to cheap models and reserving expensive ones for complex reasoning. Palantir Technologies commercialized this approach about a month ago with AIP Evolve, which in one case cut a client's token costs by 97%. The product achieved 90% adoption within three weeks of launch, the report said.
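In its simplest form, routing of this kind can be sketched as a rule that sends short, routine prompts to a cheap tier and everything else to a premium tier. The Python below is a minimal illustration only: it is not how AIP Evolve or any vendor's router works, and the tier names, keyword list and length threshold are all hypothetical. Production routers typically rely on classifiers or quality feedback rather than keywords.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    usd_per_million_output_tokens: float


# Hypothetical tiers; prices mirror the spread quoted in the article.
CHEAP = Model("cheap-tier", 5.0)
PREMIUM = Model("premium-tier", 50.0)

# Hypothetical words that suggest a task needs multi-step reasoning.
REASONING_HINTS = {"prove", "refactor", "plan", "debug", "derive"}


def route(prompt: str, max_cheap_chars: int = 500) -> Model:
    """Pick a tier with a crude heuristic: long prompts, or prompts that
    contain a reasoning keyword as a whole word, go to the premium tier."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    if words & REASONING_HINTS or len(prompt) > max_cheap_chars:
        return PREMIUM
    return CHEAP


def estimated_cost(prompt: str, expected_output_tokens: int) -> float:
    """Estimated output cost in USD for one request on the routed tier."""
    model = route(prompt)
    return expected_output_tokens / 1_000_000 * model.usd_per_million_output_tokens


if __name__ == "__main__":
    for text in ("Summarize this email in one line.",
                 "Debug the race condition in this scheduler and plan a fix."):
        print(f"{route(text).name}: {text}")
```

Matching whole words rather than substrings avoids sending "improve this sentence" to the premium tier just because it contains "prove". The heuristic remains deliberately naive and is meant only to show where the cost decision is made.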

Microsoft's release of its MAI "Thinking" model, a 35-billion-parameter system, also targets this middle ground — powerful enough for reasoning tasks but cheaper than frontier models. The strategy mirrors a broader industry push toward "good enough" AI at lower price points.

The cost crunch is accelerating adoption of Chinese open-source models. AWS Bedrock now lists MiniMax, Moonshot's Kimi, Qwen, DeepSeek and GLM in its model catalog. Microsoft offers DeepSeek through Azure AI Foundry. While these models are typically free or low-cost, limiting direct revenue for their developers, they create partnership opportunities — BMW and Alibaba recently collaborated around Qwen for automotive applications. Local deployment of open-source models also avoids the regulatory risks of using externally hosted Chinese AI, making them viable for regulated industries such as banking.

Cloud and software providers face uneven pressure

Cloud platforms are relatively insulated from the spending shift. AWS, Azure and Google Cloud operate multi-model marketplaces, so customers switching from premium to cheaper models may reduce API revenue growth but still consume compute. "The more enterprises manage costs, the more likely they are to centralize model selection, deployment and billing on a single cloud platform," the UBS analysts wrote.

Hardware demand also remains intact. Nvidia's GB200 and GB300 chips are just beginning volume shipments, and multimodal workloads — audio, video, physical AI — continue to expand the compute envelope. The question for investors is whether model companies' price compression will eventually cap cloud GPU pricing power.

The largest SaaS platforms face the most complex position. Salesforce, ServiceNow and Workday are pushing to transition from per-seat to consumption-based pricing just as clients become cost-sensitive. That timing mismatch could slow their AI monetization efforts. Yet software companies also have an opening as AI cost optimizers. Palantir's AIP Evolve is the clearest example, but the structural advantage belongs to any platform that can act as a model-agnostic routing layer.

UBS Evidence Lab surveyed about 130 companies and found only 8% have deployed AI agents in production at scale. Another 37% use them in limited production, 29% are piloting and 26% use only Copilot or coding tools without agent deployment. The bulk of token consumption from autonomous agents has yet to begin. Harvey, an AI legal assistant, saw its token consumption grow from 1 trillion in January to between 12 trillion and 13 trillion in May, evidence that optimization and expansion can coexist.

The spending controls differ fundamentally from the post-pandemic cloud budget pullback of 2022 to 2024. That was mature usage being cut. This is cost governance during early-stage technology diffusion. The result is not disappearing AI demand but a reordering of winners: premium model providers face slower revenue growth, cost-optimization platforms benefit, cloud providers collect multi-model workloads, and Chinese open-source models gain a foothold in global enterprise infrastructure.

This article is for informational purposes only and does not constitute investment advice.