OpenAI's Jalapeño chip challenges Nvidia with 50% lower inference costs

OpenAI unveiled Jalapeño, a custom inference processor built with Broadcom that it says cuts operating costs by roughly 50 percent versus Nvidia's GPUs, threatening the chipmaker's dominance in AI data centers.

"Jalapeño matches the performance of Nvidia's Blackwell chips and Google's TPUs while running inference for roughly half the cost," Broadcom Chief Executive Hock Tan said in an interview.

The ASIC pairs a large compute section with six stacks of high-bandwidth memory so that data reaches large language models faster. Early testing shows "substantially better" performance per watt than current alternatives, OpenAI said, though it has not disclosed the process node, clock speeds, or memory configuration. The chip is expected to be deployed by the end of 2026.

The move marks OpenAI's first step away from relying solely on Nvidia's scarce GPUs. Inference is the recurring bill that grows with success: every ChatGPT query and every Codex agent step adds to it. A chip tuned only for that task can strip out the machinery a general-purpose processor has to carry. For a company serving models at OpenAI's volume, cutting inference costs in half changes the economics of the business.

Broadcom Sits Beneath Every Custom Chip

The partnership reveals a deeper dynamic. OpenAI, Google, and Meta all build their custom AI chips on Broadcom's architecture, turning a noisy contest between frontier models into steady revenue for the company beneath all of them. Broadcom reported $8.4 billion in AI chip revenue in the first quarter of fiscal 2026, up 106 percent from a year earlier. It also holds a $73 billion backlog of committed orders, which management ties to a path toward $100 billion in annual AI chip revenue by 2027.
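For readers who want to check the arithmetic behind those figures, here is a minimal sketch. It uses only the two numbers reported above ($8.4 billion and 106 percent growth). The function name and the "times four" annualization are illustrative assumptions, not Broadcom's own method or guidance.

```python
def implied_prior_revenue(current_billion: float, growth_pct: float) -> float:
    """Back out the year-ago figure from the current figure and its year-over-year growth."""
    return current_billion / (1 + growth_pct / 100)


# Figures as reported in the article.
current_quarter = 8.4   # $ billions, AI chip revenue, Q1 fiscal 2026
yoy_growth = 106.0      # percent, versus the same quarter a year earlier

prior_quarter = implied_prior_revenue(current_quarter, yoy_growth)

# Assumption: a naive run rate that simply multiplies one quarter by four.
naive_run_rate = current_quarter * 4

print(f"Implied year-ago quarter: ${prior_quarter:.2f} billion")
print(f"Naive annualized run rate: ${naive_run_rate:.1f} billion")
```

On these numbers, the year-ago quarter works out to roughly $4.1 billion, and a flat annualization of the latest quarter comes to about $33.6 billion, which gives a sense of how much further growth the $100 billion figure assumes.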

Co-designing a chip means years of shared engineering, intellectual property, and hardware roadmaps that tie the lab and the designer together long after the first part ships. Google began designing its own AI chips roughly a decade ago and only this year reached its seventh generation, Ironwood, its first TPU built specifically for inference. Jalapeño is generation one of what OpenAI calls a "multi-generation compute platform."

What It Means for Nvidia and the Supply Chain

The merchant suppliers do not disappear. First production runs rarely cover a company's full demand, which means OpenAI keeps buying inference chips from outside vendors while Jalapeño ramps. Nvidia still dominates training workloads, where performance-intensive pre-training will likely rely on its hardware for the foreseeable future.

The constraint on all of this is manufacturing. Every one of these chips depends on Taiwan Semiconductor for advanced fabrication and the specialized packaging that bonds compute and memory into a single working part. That packaging capacity is sold out through 2026, and demand across the industry runs far ahead of supply. OpenAI does not get to skip the line; it competes for finite allocation alongside every major technology company.

Nvidia shares, trading at roughly 35 times forward earnings, face a long-term shift in narrative as every major AI lab builds its own silicon, but any meaningful revenue impact is years away. The clearer beneficiary is Broadcom, whose architecture sits inside the custom programs of three of the largest AI companies at once. The names on the chips will keep changing. The company that designs them stays the same.

This article is for informational purposes only and does not constitute investment advice.