MacMind
A complete transformer neural network implemented entirely in HyperTalk, trained on a Macintosh SE/30.
MacMind is a 1,216-parameter single-layer single-head transformer that learns the bit-reversal permutation -- the opening step of the Fast Fourier Transform -- from random examples. Every line of the neural network is written in HyperTalk, a scripting language from 1987 designed for making interactive card stacks, not matrix math. It has token embeddings, positional encoding, self-attention with scaled dot-product scores, cross-entropy loss, full backpropagation, and stochastic gradient descent. No compiled code. No external libraries. No black boxes.
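The self-attention step named above can be sketched in a few lines. This is NumPy rather than HyperTalk, and `attention`, `Wq`, `Wk`, and `Wv` are illustrative names, not the stack's actual handlers; the sketch shows the scaled dot-product scores and row-wise softmax that the stack spells out longhand in HyperTalk.

```python
import numpy as np

def attention(x, Wq, Wk, Wv):
    # Single-head scaled dot-product attention (illustrative sketch).
    # x: (sequence_length, model_dim); Wq/Wk/Wv: (model_dim, head_dim).
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)       # scaled dot-product scores
    # Row-wise softmax turns scores into attention weights.
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V                        # weighted sum of values
```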
Option-click any button and read the actual math.
Why This Exists
The same fundamental process that trained MacMind -- forward pass, loss computation, backward pass, weight update, repeat -- is what trained every large language model that exists today. The difference is scale, not kind. MacMind has 1,216 parameters. GPT-4 has roughly a trillion. The math is identical.
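That loop, stripped to its skeleton, fits in a dozen lines. The sketch below trains a toy linear model rather than MacMind's transformer (a hypothetical example, not the stack's code), but the four phases it labels -- forward pass, loss computation, backward pass, weight update -- are the same ones every model repeats.

```python
import random

random.seed(0)
w, b = 0.0, 0.0                         # parameters to learn
lr = 0.05                               # learning rate

for step in range(2000):
    x = random.uniform(-1, 1)           # one random training example
    target = 3 * x + 1                  # the function being learned
    y = w * x + b                       # forward pass
    loss = (y - target) ** 2            # loss computation
    dw = 2 * (y - target) * x           # backward pass (chain rule)
    db = 2 * (y - target)
    w -= lr * dw                        # weight update
    b -= lr * db
```

After enough repetitions, `w` and `b` settle near 3 and 1: the loop recovers the function from examples alone, which is all "training" ever means.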
We are at a moment where AI affects nearly everyone but almost nobody understands what it actually does. MacMind is a demonstration that the process is knowable -- that backpropagation and attention are not magic, they are math, and that math does not care whether it is running on a TPU cluster or a Motorola 68030 from 1987.
Everything is inspectable. Everything is modifiable. Change the learning rate, swap the training task, resize the model -- all from within HyperCard's script editor. This is the engine with the hood up.
What It Learns
The bit-reversal permutation reorders a sequence by reversing the binary representation of each position index. For an 8-element sequence:
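indices (0, 1, 2, 3, 4, 5, 6, 7) map to (0, 4, 2, 6, 1, 5, 3, 7). Index 6, for instance, is 110 in binary; reversed, that is 011, so the element at position 6 moves to position 3. A sketch in Python (a hypothetical helper name -- the stack itself does this in HyperTalk):

```python
def bit_reverse_permutation(seq):
    # Reorder seq by reversing the binary representation of each index.
    # Assumes len(seq) is a power of two, as in the FFT.
    n = len(seq)
    bits = n.bit_length() - 1           # index width in bits (3 for n = 8)
    out = [None] * n
    for i in range(n):
        j = int(format(i, f"0{bits}b")[::-1], 2)  # reversed-bit index
        out[j] = seq[i]
    return out

bit_reverse_permutation(list(range(8)))  # → [0, 4, 2, 6, 1, 5, 3, 7]
```

Reversing the bits twice restores the original order, so the permutation is its own inverse.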