It’s been a little over eight years since we first started talking about Neural Processing Units (NPUs) inside our smartphones and the early prospects of on-device AI. Big points if you remember that the HUAWEI Mate 10’s Kirin 970 processor was the first, though similar ideas had been floating around, particularly in imaging, before then.
Of course, a lot has changed in the last eight years. Apple has finally embraced AI, albeit with mixed results, and Google has leaned heavily into its Tensor Processing Unit for everything from imaging to on-device language translation. Ask any of the big tech companies, from Arm and Qualcomm to Apple and Samsung, and they’ll all tell you that AI is the future of smartphone hardware and software.
And yet the landscape for mobile AI still feels quite confined. We’re restricted to a small but growing pool of on-device AI features, curated mostly by Google, with very little in the way of a creative developer ecosystem. NPUs are partly to blame: not because they’re ineffective, but because they’ve never been exposed as a real platform. Which raises the question: what exactly is this silicon sitting in our phones really good for?
What is an NPU anyway?
Before we can decisively answer whether phones really “need” an NPU, we should probably acquaint ourselves with what it actually does.
Just like your phone’s general-purpose CPU runs apps, its GPU renders games, and its image signal processor (ISP) crunches image and video data, an NPU is a purpose-built processor, in this case for running AI workloads as quickly and efficiently as possible. Simple enough.
Specifically, an NPU is designed to handle small data types (such as models quantized down to tiny 4-bit and even 2-bit precision), AI-specific memory access patterns, and highly parallel mathematical operations, most notably the fused multiply-accumulate (MAC, also known as fused multiply-add) that underpins matrix math.
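To make that a little more concrete, here’s a minimal sketch of a single multiply-accumulate loop over signed 8-bit activations and packed 4-bit weights, the kind of operation an NPU executes across thousands of lanes in parallel. The function name, packing scheme, and scale factor here are illustrative assumptions, not any vendor’s actual format.

```kotlin
// Illustrative sketch only: one fused multiply-accumulate (MAC) loop over
// signed 8-bit activations and packed signed 4-bit weights. An NPU runs
// thousands of these lanes at once; the packing and scale are assumptions.
fun quantizedDotProduct(activations: ByteArray, packedWeights: ByteArray, scale: Float): Float {
    var acc = 0  // wide 32-bit accumulator, as hardware MAC units use
    for (i in activations.indices) {
        val b = packedWeights[i / 2].toInt()
        // sign-extend the low or high nibble to recover a 4-bit weight (-8..7)
        val w = if (i % 2 == 0) (b shl 28) shr 28 else (b shl 24) shr 28
        acc += activations[i].toInt() * w  // the multiply-accumulate step
    }
    return acc * scale  // dequantize the integer result back to floating point
}

fun main() {
    val acts = byteArrayOf(10, -3, 7, 2)
    val weights = byteArrayOf(0x31, 0x2F)  // unpacks to weights 1, 3, -1, 2
    println(quantizedDotProduct(acts, weights, 0.05f))  // prints -0.1
}
```

The key trick is that all of the heavy lifting happens in cheap integer math, with a single float multiply at the end, which is why low-precision models are so much more power-efficient than full floating-point ones.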
Mobile NPUs have taken hold to run AI workloads that traditional processors struggle with.
Now, as I said back in 2017, you don’t strictly need an NPU to run machine learning workloads; lots of smaller algorithms can run on even a modest CPU, while the data centers powering various Large Language Models run on hardware that’s closer to an NVIDIA graphics card than the NPU in your phone.
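To illustrate that choice from a developer’s seat, here’s a minimal sketch using TensorFlow Lite on Android, where the same model runs on the CPU by default or is handed toward the NPU via the now-aging NNAPI path with a single delegate. The model file and thread count are placeholder assumptions.

```kotlin
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.nnapi.NnApiDelegate
import java.io.File

// Sketch only: the same .tflite model can target the CPU or the NPU.
fun buildInterpreter(modelFile: File, useNpu: Boolean): Interpreter {
    val options = Interpreter.Options()
    if (useNpu) {
        // NNAPI forwards supported ops to the vendor's accelerator driver,
        // falling back to the CPU for anything the hardware can't handle.
        options.addDelegate(NnApiDelegate())
    } else {
        options.setNumThreads(4)  // plain multi-threaded CPU inference
    }
    return Interpreter(modelFile, options)
}
```

In practice, which operations actually land on the NPU depends entirely on the vendor’s driver, which is part of why the developer story around this hardware has stayed so fragmented.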