Coinbase slashed AI spending by nearly half by routing most tasks through Chinese open-source models, challenging the pricing power of premium US AI providers.

Coinbase Chief Executive Officer Brian Armstrong said the company cut AI spending by nearly 50% by setting Chinese open-source models GLM 5.2 and Kimi 2.7 as default options through an internal LLM gateway, while token usage continued to grow exponentially.
"91% of engineers never hit their usage cap, so we didn't tighten quotas — we switched to cheaper default models," Armstrong said in a post on X on Friday.
The crypto exchange deployed three cost-cutting measures: a smart routing system that preprocesses prompts and assigns tasks to the most cost-effective model based on cache hit rates and pricing; aggressive caching that lifted LibreChat's hit rate to 60% from 5%; and context streamlining that requires engineers to start new sessions when switching tasks. For complex planning and reasoning, engineers can still invoke frontier models, while code reviews use a multi-model parallel strategy where outputs cross-check each other.
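Armstrong did not describe how the parallel review outputs are reconciled. One plausible reading, sketched below purely as an illustration, is to send the same diff to several reviewer models at once and keep only the findings that at least two of them report. The function name cross_checked_findings, the min_agree threshold and the stub reviewers are hypothetical, and matching findings by exact string is a simplification that a real system would replace with something fuzzier.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def cross_checked_findings(diff, reviewers, min_agree=2):
    """Run every reviewer on the same diff in parallel and keep the
    findings reported by at least `min_agree` of them.

    `reviewers` is a list of callables that take the diff text and return
    an iterable of finding strings. In practice each would wrap a call to
    a different model; here they are stand-ins (an assumption).
    """
    with ThreadPoolExecutor(max_workers=max(1, len(reviewers))) as pool:
        # set() removes duplicates so one reviewer cannot vote twice.
        results = list(pool.map(lambda review: set(review(diff)), reviewers))
    counts = Counter(finding for found in results for finding in found)
    return sorted(f for f, n in counts.items() if n >= min_agree)


# Stub reviewers standing in for three different models.
def reviewer_a(diff):
    return ["unchecked error", "unused import"]


def reviewer_b(diff):
    return ["unchecked error"]


def reviewer_c(diff):
    return ["unused import", "naming nit", "unchecked error"]


agreed = cross_checked_findings("example diff", [reviewer_a, reviewer_b, reviewer_c])
```

In this toy run the "naming nit" raised by a single reviewer is dropped, while the two findings that more than one reviewer flagged survive.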
The shift validates the commercial viability of Chinese open-source AI in Western enterprise production environments, directly challenging the pricing power of US providers such as OpenAI and Anthropic. For Coinbase, the cost reduction could improve margins and profitability metrics at a time when the company is expanding AI usage rather than restricting it.
Smart Routing Replaces Manual Model Selection
Armstrong said the company's custom scheduling framework preprocesses every prompt, then automatically routes it to the most suitable model based on cache hit probability and per-token pricing. The goal, he said, is to let AI handle model selection rather than leaving it to engineers. Execution-level tasks, he argued, do not require the most expensive frontier models — only planning and reasoning tasks do.
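Coinbase has not published its routing logic. As a rough sketch of the idea Armstrong describes, the snippet below picks the cheapest eligible model for a prompt by blending cached and uncached per-token prices according to an estimated cache hit probability. The model names, prices, the needs_reasoning flag and the function names are all invented for illustration and do not reflect any provider's actual rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    input_price: float       # USD per million uncached input tokens (made up)
    cached_price: float      # USD per million cached input tokens (made up)
    handles_reasoning: bool  # whether the model is trusted with planning tasks


def expected_cost(model, tokens, hit_probability):
    """Expected input cost in USD, weighting cached and uncached prices."""
    per_million = (hit_probability * model.cached_price
                   + (1.0 - hit_probability) * model.input_price)
    return tokens / 1_000_000 * per_million


def route(models, tokens, hit_probability, needs_reasoning):
    """Return the cheapest model allowed to handle this prompt.

    `hit_probability` maps model name to an estimated cache hit
    probability; how a real gateway would estimate it is not disclosed.
    """
    eligible = [m for m in models if m.handles_reasoning or not needs_reasoning]
    if not eligible:
        raise ValueError("no model can handle this task")
    return min(
        eligible,
        key=lambda m: expected_cost(m, tokens, hit_probability.get(m.name, 0.0)),
    )


MODELS = [
    Model("open-weight-default", input_price=0.60, cached_price=0.10,
          handles_reasoning=False),
    Model("frontier", input_price=3.00, cached_price=0.30,
          handles_reasoning=True),
]
HIT = {"open-weight-default": 0.5, "frontier": 0.5}

routine = route(MODELS, tokens=10_000, hit_probability=HIT, needs_reasoning=False)
planning = route(MODELS, tokens=10_000, hit_probability=HIT, needs_reasoning=True)
```

With these placeholder numbers the routine prompt goes to the cheaper default and the planning prompt goes to the frontier model, mirroring the split Armstrong outlined.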
Caching and Context Discipline Drive the Bulk of Savings
Coinbase now requires all AI requests to be cache-aware, meaning the system checks whether a previous response can be reused before generating a new one. The LibreChat implementation illustrates the impact: cache hit rates jumped to 60% from 5% after the optimization. Armstrong also urged engineers to keep context windows lean — starting fresh sessions, narrowing file scope, and disconnecting unused tools — to reduce wasted token consumption.
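A cache-aware request path of the kind described here can be sketched in a few lines. The example below is an assumption about the general technique, not Coinbase's or LibreChat's implementation: it keys stored responses on a hash of the model name and a whitespace-normalized prompt, and tracks the hit rate. The class name ResponseCache and the fake_generate stand-in are hypothetical.

```python
import hashlib


class ResponseCache:
    """Reuse a previous response when the same prompt is sent again."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model, prompt):
        # Collapse whitespace so trivially different prompts share a key.
        normalized = " ".join(prompt.split())
        return hashlib.sha256(f"{model}\n{normalized}".encode("utf-8")).hexdigest()

    def get_or_generate(self, model, prompt, generate):
        """Return a cached response, or call `generate(prompt)` and store it."""
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        response = generate(prompt)
        self._store[key] = response
        return response

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


calls = []


def fake_generate(prompt):
    # Stand-in for a real model call; records how often it is invoked.
    calls.append(prompt)
    return f"answer to: {prompt.strip()}"


cache = ResponseCache()
first = cache.get_or_generate("default", "Explain this stack trace", fake_generate)
second = cache.get_or_generate("default", "Explain  this stack trace ", fake_generate)
```

Here the second request differs only in spacing, so it is served from the cache and the model is called once. Production systems often rely instead on provider-side prompt prefix caching, which the article does not distinguish from response reuse.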
The company has not disclosed absolute spending figures. But achieving a near-50% reduction while token usage grows at an exponential rate suggests Coinbase has partially decoupled consumption from cost.
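The arithmetic behind that inference is simple. If total spending falls to some fraction of its earlier level while token usage multiplies, the implied cost per token falls faster than the headline figure. The sketch below works one example; the doubling of usage is a hypothetical input, since Coinbase has not quantified its token growth, and 0.5 only approximates the "nearly 50%" cut.

```python
def cost_per_token_change(spend_ratio, usage_ratio):
    """Fractional change in cost per token.

    spend_ratio: new total spend divided by old total spend
    usage_ratio: new token volume divided by old token volume
    """
    if usage_ratio <= 0:
        raise ValueError("usage_ratio must be positive")
    return spend_ratio / usage_ratio - 1.0


# Spend roughly halved (from the article); usage doubled (hypothetical).
change = cost_per_token_change(spend_ratio=0.5, usage_ratio=2.0)
```

Under those assumed inputs the cost per token would be down 75%, and any faster growth in usage would imply a steeper drop.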
What This Means for the AI Market
The adoption of GLM 5.2, developed by Beijing-based Zhipu AI, and Kimi 2.7, from Beijing-based Moonshot AI, as default enterprise models marks a milestone for Chinese open-source AI in Western corporate infrastructure. OpenAI's GPT-4o and Anthropic's Claude 4, which command premium pricing, now face a credible low-cost alternative that enterprises can deploy without sacrificing quality on routine tasks.
For investors, the implication is clear: if other large enterprises follow Coinbase's playbook, the addressable market for premium US AI models could narrow to high-complexity tasks only, compressing revenue growth expectations for providers that rely on blanket enterprise adoption. Coinbase, which trades as COIN on Nasdaq, has not disclosed the exact dollar savings, but the structural cost improvement supports margin expansion as AI usage scales.
This article is for informational purposes only and does not constitute investment advice.