GateGPT: 56k tokens per second Transformer (KV cache) on FPGA at 80 MHz

2026-06-16


Why This Matters

This project implements a Transformer with KV cache entirely in digital logic, with no GPU or CPU involved, and reports 56,000+ tokens per second at an 80 MHz clock. The design runs microGPT and has so far been prototyped on an FPGA, so it is best read as a demonstration of how far dedicated logic can go for a small model rather than as evidence about deploying large language models. If the approach scales, it could lead to faster, lower-latency, and more cost-effective AI embedded in everyday devices.

Key Takeaways

Achieved 56,000+ tokens/sec at an 80 MHz clock on an FPGA.
Implemented a fully digital, gate-level Transformer with KV cache, designed as a custom chip and prototyped on an FPGA.
Shows potential for high-performance, low-power AI hardware beyond traditional GPU/CPU setups.
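
To put the headline figures in perspective, the two numbers quoted in the post can be turned into a per-token cycle budget. The sketch below is only back-of-the-envelope arithmetic: it assumes tokens are generated one after another at a steady rate, which the post does not state, and the constant and function names are illustrative rather than taken from the project.

```python
# Back-of-the-envelope check using only the two figures quoted in the post.
# Assumption (not stated in the source): tokens are produced sequentially
# at a steady rate, so throughput and clock rate can simply be divided.
CLOCK_HZ = 80_000_000      # 80 MHz clock
TOKENS_PER_SEC = 56_000    # "56,000+" is a lower bound on throughput


def cycles_per_token(clock_hz: float, tokens_per_sec: float) -> float:
    """Average number of clock cycles spent per generated token."""
    return clock_hz / tokens_per_sec


def token_latency_us(tokens_per_sec: float) -> float:
    """Average time per generated token, in microseconds."""
    return 1e6 / tokens_per_sec


budget = cycles_per_token(CLOCK_HZ, TOKENS_PER_SEC)
latency = token_latency_us(TOKENS_PER_SEC)

# Because 56,000 is a lower bound on throughput, both values are upper bounds.
print(f"at most about {budget:.0f} clock cycles per token")
print(f"at most about {latency:.1f} microseconds per token")
```

Under that assumption, each token takes roughly 1,429 clock cycles, or about 17.9 microseconds, at most.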

56,000+ tokens/sec at just 80 MHz. 🤯 I burned a full Transformer with KV cache into a custom chip. Designed gate by gate as a 100% digital integrated circuit. Prototyped on an FPGA. (No GPU. No CPU.) Just pure digital silicon running microGPT, spelling out names on a

GPT 👇
