Apple’s MLX machine learning framework, originally designed for Apple Silicon, is getting a CUDA backend, which is a pretty big deal. Here’s why.
The work is being led by developer @zcbenz on GitHub (via AppleInsider), who started prototyping CUDA support a few months ago. Since then, he has split the project into smaller pieces and gradually merged them into MLX’s main branch.
The backend is still a work in progress, but several core operations, like matrix multiplication, softmax, reduction, sorting, and indexing, are already supported and tested.
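For context, here is roughly what those operations look like in MLX’s Python API. This is a minimal sketch of standard MLX calls; whether a given build dispatches them to Metal or to the new CUDA backend depends on how MLX was compiled.

import mlx.core as mx

# Two small random matrices (MLX arrays are evaluated lazily)
a = mx.random.normal((4, 8))
b = mx.random.normal((8, 4))

c = a @ b                       # matrix multiplication
probs = mx.softmax(c, axis=-1)  # softmax along the last axis
total = mx.sum(probs)           # reduction
ordered = mx.sort(c, axis=0)    # sorting
first_row = c[0]                # indexing

# Force evaluation of the lazy computation graph
mx.eval(c, probs, total, ordered, first_row)
print(total)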
Wait, what is CUDA?
Basically, CUDA (or Compute Unified Device Architecture) is the Metal of NVIDIA hardware: a parallel computing platform the company created specifically for its own GPUs, to get the most out of them for high-performance workloads.
For many, CUDA is the standard way to run machine learning workloads on NVIDIA GPUs, and it’s used throughout the ML ecosystem, from academic research to commercial deployment. Frameworks like PyTorch and TensorFlow, names increasingly familiar even outside of deep ML circles, rely on CUDA to tap into GPU acceleration.
So why is Apple’s MLX now supporting CUDA?
MLX was originally optimized for Apple Silicon and Metal, but a CUDA backend changes that. Now, researchers and engineers can prototype models locally on a Mac using MLX, then deploy the same code to large-scale NVIDIA GPU clusters, which still dominate machine learning training workloads.
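In practice, the point is that the script itself shouldn’t change between machines. As a hedged sketch, assuming a CUDA-enabled build exposes NVIDIA hardware through the same mx.gpu device that Metal uses on a Mac today:

import mlx.core as mx

# MLX uses whichever GPU backend it was built with: Metal on a Mac,
# CUDA on an NVIDIA box (assuming a CUDA-enabled build of MLX).
mx.set_default_device(mx.gpu)
print(mx.default_device())

x = mx.random.normal((1024, 1024))
y = x @ x.T  # runs on whatever GPU backend is available
mx.eval(y)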
That said, there are still limitations, most of which are works in progress: not all MLX operations are implemented yet, and AMD GPU support is further down the road.
Still, bringing MLX and NVIDIA GPUs closer together opens the door to faster testing, experimentation, and research, which is pretty much all an AI developer can hope for.
If you want to try it yourself, the details are available on GitHub.