
Generating Images with a 2025 Android

Why This Matters

This article highlights the challenges and progress in running advanced AI image generation models on Android devices, showcasing the potential for on-device creativity and AI-powered apps. It underscores the ongoing need for more mature hardware acceleration tools and frameworks to match the performance seen on iOS, which is crucial for broader consumer adoption and innovation in mobile AI applications.

29 June 2026

Here is an image we generated entirely on a Samsung Galaxy S25+ – PrismML's Bonsai Image model again, this time with the diffusion transformer running on the phone's Hexagon NPU:

“A bonsai tree in a quiet ceramic studio, soft morning light, shallow depth of field”

The code is at github.com/duration-ai/bonsai-image-android. This is the companion to Generating images with a 2020 iPhone – same model, same 512×512 output, but on Android this time. We closed that post by saying we might look at Android next. We did, and it turned out to be much harder.

For iOS, Apple gives you two mature machine-learning stacks (Core ML and MLX), and PrismML's reference happened to run on the same MLX framework as our Swift port, so we could check our numbers against theirs line by line.

Annoyingly, there are no mature equivalents on Android – NNAPI has been replaced by LiteRT (formerly TensorFlow Lite), but LiteRT is still fairly immature. The trouble is that different Android phone families have different hardware, so for maximum performance you have to choose how and where to run a given model: on the CPU, the GPU, or, on some phones, the NPU. Each family has its own toolchain, none of them as smooth as Core ML.
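As a rough illustration of that choice, here is a minimal sketch of a backend picker that prefers the NPU, then the GPU, then the CPU. Everything in it is hypothetical: the `Backend` enum, the `pick_backend` function and the `known_bad` set are names made up for this example, and none of it comes from LiteRT, a vendor toolchain, or the repository's code.

```python
from enum import Enum


class Backend(Enum):
    """Hypothetical labels for the three places a model could run."""
    NPU = "npu"
    GPU = "gpu"
    CPU = "cpu"


# Assumed preference order: accelerators first, CPU as the last resort.
PREFERENCE = (Backend.NPU, Backend.GPU, Backend.CPU)


def pick_backend(available, known_bad=frozenset()):
    """Return the first preferred backend that is available and not known to fail.

    `available` is the set of backends the device offers (hypothetical input);
    `known_bad` holds backends that have crashed for this model or resolution.
    """
    for backend in PREFERENCE:
        if backend in available and backend not in known_bad:
            return backend
    raise RuntimeError("no usable backend")


# A phone with all three units, where the GPU path is known to crash.
print(pick_backend({Backend.NPU, Backend.GPU, Backend.CPU}, known_bad={Backend.GPU}))

# A phone without an NPU falls back to the GPU.
print(pick_backend({Backend.GPU, Backend.CPU}))
```

In a real app the availability check would come from whatever each vendor's toolchain exposes, which is exactly the part that differs between phone families.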

The baseline for our porting work is the CPU: stable-diffusion.cpp – or rather Juste-Leo2's fork of it, which adds the 1-bit support – runs the Bonsai weights pretty straightforwardly, though very slowly. On the S25+'s Snapdragon 8 Elite the diffusion transformer takes about 2 minutes per step, so a full 512×512 image takes 8–9 minutes. This is fine, but given that we got the iPhone 12 Pro from 2020 running a 512×512 generation in a little over 2 minutes, we wanted to see if we could do any better on a phone that should be five whole years ahead technologically.
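The arithmetic behind those figures can be sketched as follows. The step count is not stated above, so the sketch infers it by assuming the denoising loop accounts for nearly all of the run time; that is an assumption of this example, not a measurement. The iPhone figure is rounded down to 2 minutes, so the slowdown it prints is a slight overestimate.

```python
def implied_steps(total_minutes, minutes_per_step):
    """Rough step count if denoising accounts for nearly all of the run time."""
    return total_minutes / minutes_per_step


def slowdown(slow_minutes, fast_minutes):
    """How many times longer the slow run takes than the fast one."""
    return slow_minutes / fast_minutes


MINUTES_PER_STEP = 2.0    # CPU figure quoted above: about 2 minutes per step
TOTAL_RANGE = (8.0, 9.0)  # quoted total for one 512x512 image, in minutes
IPHONE_MINUTES = 2.0      # "a little over 2 minutes", rounded down here

for total in TOTAL_RANGE:
    steps = implied_steps(total, MINUTES_PER_STEP)
    factor = slowdown(total, IPHONE_MINUTES)
    print(f"{total:.0f} min total: about {steps:.1f} steps, "
          f"{factor:.1f}x the iPhone time")
```

Under that assumption the quoted numbers are consistent with roughly four denoising steps, with any remainder going to the other stages of the pipeline.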

We had some early success with the GPU, managing to generate a crisp apple at 256×256:

256×256 on the GPU, via OpenCL

But efforts to push the GPU to a 512×512 image then stalled badly, with consistent crashes during denoising, which left the NPU as the only untested path. (Someone more knowledgeable could probably unblock this.)
