Key Takeaways:
- DeepSeek V4's 2-bit quantized build runs within about 90 GB of VRAM
- In Buterin's tests, it reached about 35 tokens per second on Apple hardware versus about 7 on AMD GPUs
- Buterin tied local AI to private RPC reads and ZK-based LLM payments

Vitalik Buterin said the local AI advances represented by DeepSeek V4 can strengthen Ethereum's privacy infrastructure, and he linked the model's 2-bit quantized build to private RPC access and zero-knowledge payments.
"There's actually a lot of intersection between 'CROPS Ethereum access layer' and 'CROPS AI,'" Buterin, co-founder of Ethereum, said in a May 28 post detailing his local LLM testing.
Buterin said the 2-bit quantized DeepSeek V4 runs within about 90 GB of VRAM, reaching roughly 35 tokens per second on Apple hardware but only about 7 tokens per second on AMD GPUs. He framed that gap as the difference between genuine CROPS AI and systems that are merely described as decentralized AI. For reference, DeepSeek V4 Pro uses a mixture-of-experts architecture with 1.6 trillion total parameters, of which 49 billion are active for any given token, and supports a 1-million-token context window.
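The memory and speed figures can be sanity-checked with simple arithmetic. The sketch below is a rough, weight-only estimate under stated assumptions: it treats every weight as stored at exactly 2 bits and ignores the KV cache, activations, and the per-block scale factors that real 2-bit formats add, so actual memory use would be higher. The helper names are hypothetical, and the 1,000-token reply length is an illustrative assumption, not a figure from Buterin's post.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight-only memory in gigabytes (1 GB = 1e9 bytes).

    Assumption: uniform bit width; ignores KV cache, activations,
    and quantization scale/zero-point overhead.
    """
    return num_params * bits_per_weight / 8 / 1e9


def params_that_fit(budget_gb: float, bits_per_weight: float) -> float:
    """Largest parameter count whose raw weights fit in budget_gb."""
    return budget_gb * 1e9 * 8 / bits_per_weight


def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock decode time at a steady throughput."""
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    # 1.6 trillion total parameters (DeepSeek V4 Pro, per the article) at 2 bits
    print(f"{weight_memory_gb(1.6e12, 2):.0f} GB")          # 400 GB
    # How many 2-bit weights fit in roughly 90 GB
    print(f"{params_that_fit(90, 2) / 1e9:.0f}B params")     # 360B params
    # Hypothetical 1,000-token reply at the throughputs Buterin reported
    print(f"Apple: {seconds_to_generate(1000, 35):.1f} s")  # 28.6 s
    print(f"AMD:   {seconds_to_generate(1000, 7):.1f} s")   # 142.9 s
```

At a uniform 2 bits, the weights of a 1.6-trillion-parameter model alone would need about 400 GB, while 90 GB holds roughly 360 billion 2-bit weights. Throughput matters as much as memory for interactive use: at 35 versus 7 tokens per second, the same reply takes about 29 seconds on Apple hardware and about 143 seconds on AMD, a fivefold difference.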
The intersection matters because Ethereum users still leak sensitive metadata, such as their IP addresses and the accounts they look up, when they query wallets and contracts through public RPC endpoints. Private RPC reads, along with ZK-based payment layers for remote LLM calls, could let users and AI agents interact with Ethereum without exposing their identity or usage data to infrastructure providers.
Buterin also highlighted application-specific fine-tuned models as a key piece of the privacy and security roadmap. He cited Mistral's Leanstral, which he said reaches about 38 tokens per second on AMD hardware using under 70 GB of VRAM. "Things like this are a huge boon for writing more secure code," Buterin said, adding that the Ethereum ecosystem "should have models fine-tuned for Ethereum-related use cases" to help developers spot flaws in smart contracts and protocol code before mainnet deployment.
His local AI testing also covered several infrastructure projects. Buterin said his messaging-daemon project now has alpha Telegram support, and he pointed to Lucebox Hub as a promising tool for running dense models such as Qwen 27B more efficiently, delivering roughly twice the token throughput of llama.cpp on his RTX 5090 laptop.
The push for local AI on Ethereum comes as the network faces broader market pressure. ETH traded at $2,063.92 as of the latest data, while CryptoQuant reported rising failed transactions and higher exchange inflows, a combination the analytics firm described as potentially "somewhat bearish for the asset in the near term."
This article is for informational purposes only and does not constitute investment advice.