Microgpt explained interactively

MicroGPT explained interactively

Trying my best to visualize it. I'm a n00b at machine learning though

Andrej Karpathy wrote a 200-line Python script that trains and runs a GPT from scratch, with no libraries or dependencies, just pure Python. The script contains the algorithm that powers LLMs like ChatGPT.

Let's walk through it piece by piece and watch each part work. Andrej did a walkthrough on his blog, but here I take a more visual approach, tailored for beginners.

The dataset

The model trains on 32,000 human names, one per line: emma, olivia, ava, isabella, sophia... Each name is a document. The model's job is to learn the statistical patterns in these names and generate plausible new ones that sound like they could be real.

By the end of training, the model produces names like "kamon", "karai", "anna", and "anton".The model has learned which characters tend to follow which, which sounds are common at the start vs. the end, and how long a typical name runs. From ChatGPT's perspective, your conversation is just a document. When you type a prompt, the model's response is a statistical document completion.

Numbers, not letters

Neural networks work with numbers, not characters. So we need a way to convert text into a sequence of integers and back. The simplest possible tokenizer assigns one integer to each unique character in the dataset. The 26 lowercase letters get ids 0 through 25, and we add one special token called BOS (Beginning of Sequence) with id 26 that marks where a name starts and ends.

Type a name below and watch it get tokenized. Each character maps to its integer id, and BOS tokens wrap both ends:

... continue reading