
Training an LLM in Swift, Part 1: Taking matrix mult from Gflop/s to Tflop/s

Why This Matters

This article highlights the importance of optimizing machine learning workloads directly in Swift on Apple Silicon, showcasing how low-level code can push the hardware's capabilities. It emphasizes the potential for developers to better understand and leverage the full power of Apple Silicon for training large language models, even outside traditional frameworks.

Key Takeaways

In this article, I try to get my own handwritten matrix multiplication code running as fast as possible for training a Large Language Model (LLM) in Swift. The aim is to give some insight into the key steps for optimizing mathematics code in Swift. I also hope that these examples will offer a sense of scale about the capabilities of the different units on Apple Silicon – CPU, SIMD, AMX and GPU.
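For context, the unoptimized starting point for this kind of work is the classic triple-loop matrix multiplication. A minimal Swift sketch (the function and parameter names here are illustrative, not taken from the article's actual code):

```swift
// Naive row-major matrix multiply: c = a × b,
// where a is m×k, b is k×n, and the result is m×n.
// This is the Gflop/s baseline that kernel optimization starts from.
func matmul(_ a: [Float], _ b: [Float], m: Int, n: Int, k: Int) -> [Float] {
    var c = [Float](repeating: 0, count: m * n)
    for i in 0..<m {
        for j in 0..<n {
            var sum: Float = 0
            for p in 0..<k {
                sum += a[i * k + p] * b[p * n + j]
            }
            c[i * n + j] = sum
        }
    }
    return c
}
```

Getting from this loop to Tflop/s is what the optimization work (SIMD, AMX, GPU) is about.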

This will be the first in a series where I look at training neural networks in Swift on Apple Silicon. Future articles will look at the maybe-too-many frameworks Apple offers for machine learning on the Mac. Those established frameworks are what you should really use for matrix multiplication and machine learning (they’ve spent a few more years optimizing matrix kernels than I have).

But until then, I’m having fun writing everything for myself in a “no frameworks, no libraries” plain code approach.

And I’m not just writing matrix multiplication kernels. The sample app will use these kernels as part of a full LLM implementation, and the numbers I’ll quote will be for entire forward and backward training iterations. The reference implementation for this series will be Andrej Karpathy’s llm.c (a plain C implementation of a GPT-2-compatible model). It’s a fairly basic model but it contains all the necessary components and is representative of real-world workloads.

That means it’s time for my favorite game: optimize Swift until it’s faster than C.

Backstory

About two years ago, I dug up my engineering thesis from the early 2000s. It’s an image recognizer written in C++ that uses a neural network for classifying images. I wanted to get my old code running again but I hadn’t worked on ML code in a long time. It got annoying and I gave up.

For all the discussion around LLMs in early 2024, it felt like no one was training neural networks on the Mac. At least, not in languages like Swift. I played with some Python libraries like PyTorch and TensorFlow but Python never does the calculations itself – it operates more like an orchestrator of another computational engine under the hood – and the separation left me feeling like I wasn’t in control.

A month later, Andrej Karpathy released llm.c. This reached me in a way that other machine learning content didn’t because nothing is hidden. It is around 1000 lines of plain C and (although it’s filled with some pretty cryptic variable names) it’s relatively readable.

So naturally, I immediately rewrote it in Swift. And it was a lot of fun to play with.
