Speeding up PyTorch inference by 87% on Apple devices with AI-generated Metal kernels

tl;dr: Our lab investigated whether frontier models can write optimized GPU kernels for Apple devices to speed up inference. We found that they can: our AI-generated Metal kernels were 1.87x faster across 215 PyTorch modules, with some workloads running hundreds of times faster than baseline.

Why use AI to generate kernels for Apple devices?

AI models execute on hardware via GPU kernels that define each oper