This is a brief guide to my new art project microgpt, a single file of 200 lines of pure Python with no dependencies that trains and inferences a GPT. This file contains the full algorithmic content of what is needed: dataset of documents, tokenizer, autograd engine, a GPT-2-like neural network architecture, the Adam optimizer, training loop, and inference loop. Everything else is just efficiency. I cannot simplify this any further. This script is the culmination of multiple projects (micrograd, makemore, nanogpt, etc.) and a decade-long obsession to simplify LLMs to their bare essentials, and I think it is beautiful 🥹. It even breaks perfectly across 3 columns:
Where to find it:
This GitHub gist has the full source code: microgpt.py
It’s also available on this web page: https://karpathy.ai/microgpt.html
Also available as a Google Colab notebook
The following is my guide, stepping an interested reader through the code.
Dataset
The fuel of large language models is a stream of text data, optionally separated into a set of documents. In production-grade applications, each document would be an internet web page, but for microgpt we use a simpler example: 32,000 names, one per line:
# Let there be an input dataset `docs`: list[str] of documents (e.g. a dataset of names)
if not os.path.exists('input.txt'):
    import urllib.request
    names_url = 'https://raw.githubusercontent.com/karpathy/makemore/refs/heads/master/names.txt'
    urllib.request.urlretrieve(names_url, 'input.txt')
docs = [l.strip() for l in open('input.txt').read().strip().split('\n') if l.strip()] # list[str] of documents
random.shuffle(docs)
print(f"num docs: {len(docs)}")
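To see what that list comprehension produces, here is a miniature sketch of the same loading logic run on an in-memory string instead of input.txt (the sample names and blank/padded lines are made up for illustration): blank lines are dropped and surrounding whitespace is stripped, so each document ends up as exactly one name.

```python
# Hypothetical miniature of the loading step above: an in-memory string
# stands in for the contents of input.txt.
raw = "emma\nolivia\n\n  ava  \n"  # made-up sample with a blank line and padding
docs = [l.strip() for l in raw.strip().split('\n') if l.strip()]
print(docs)  # → ['emma', 'olivia', 'ava']
```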