
I built a large language model "from scratch"

A developer's journey through building an LLM from scratch, sharing key insights about tokenization, training, and the learning process of mastering AI fundamentals.

Building a large language model from scratch

I’m a machine learning / A.I. hobbyist. The technologies fascinate me, and I can’t seem to learn enough about them. Sebastian Raschka’s book, Build a Large Language Model (From Scratch), caught my eye. I don’t recall how I stumbled on it, but I found it when it was still in early access from Manning Publications. I purchased it and started working through it as the final chapters were being written and released. I just completed the book and all the included work, and I loved every minute of it.

My approach#

A while ago, I read some advice about learning programming from digital books and tutorials. The advice was to never copy and paste code from samples but to hand-type all the code. I took that approach with this book. I typed every single line of code (except for a couple of blocks which were highly repetitive and long). You can see all my work here: https://github.com/controversy187/build-a-large-language-model

I did my best to work in section-sized chunks: I didn’t want to start a section unless I had the time set aside to complete it. Some sections are pretty short; others are fairly involved and time-consuming.

I built this in Jupyter Notebooks on my laptop, which is pretty underpowered for this type of work. The premise of the book is that you can build an LLM on consumer hardware and have it perform decently well. As I write this, I’m fine-tuning my model locally: it’s about 50 steps into a 230-step run, and I just crossed the 20-minute execution-time mark. The earlier code samples ran quickly, but the last few sections used larger models, which slowed things down considerably.
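For a sense of what a run like that looks like, here is a minimal PyTorch sketch of a fine-tuning loop that logs step progress and elapsed time. The tiny model and random batches below are placeholders I made up for illustration; the book’s actual fine-tuning code works on a real GPT-style model and tokenized data.

```python
import time
import torch

# Placeholder model and optimizer; stand-ins for a real GPT-style model.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 2),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
loss_fn = torch.nn.CrossEntropyLoss()

num_steps = 230  # roughly the length of the run described above
start = time.time()

for step in range(1, num_steps + 1):
    # Random batch as a placeholder for real tokenized training data.
    inputs = torch.randn(8, 128)
    targets = torch.randint(0, 2, (8,))

    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()

    if step % 10 == 0:
        minutes = (time.time() - start) / 60
        print(f"step {step}/{num_steps} | loss {loss.item():.3f} | {minutes:.1f} min elapsed")
```

On a toy model like this the loop finishes in seconds; swap in a GPT-2-sized model on a laptop CPU and the same 230 steps stretch into the kind of wall-clock times I’m describing.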

I didn’t do most of the supplemental exercises. I tend to have an “I want to do ALL THE THINGS!” personality, and the drawback is that if I take the time to do all the things, I eventually get distracted and never actually finish what I started. So I sort of rushed through this book. I even took several weeks off around Christmas and New Year’s, but I got back into it and powered through the last few chapters.

So, more or less, I read through the chapters and wrote all the mandatory coding assignments.

What can I tell you about large language models? A lot more than I could before I started this book, but certainly not all the things the author attempted to teach me. I’ll summarize my understanding, but I could be wrong about some of these things, and I most certainly forgot or misunderstood others.
