Language models pack billions of concepts into 12k dimensions
In a recent 3Blue1Brown video series on transformer models, Grant Sanderson posed a fascinating question: How can a relatively modest embedding space of 12,288 dimensions (the embedding width of GPT-3) accommodate millions of distinct real-world concepts? The answer lies at the intersection of high-dimensional geometry and a remarkable mathematical result known as the Johnson-Lindenstrauss lemma. While exploring this question, I discovered something unexpected that led to an interesting collaboration with Grant.
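Before going further, it helps to see the core geometric fact numerically: independently sampled directions in a high-dimensional space are almost perpendicular to one another, which is the phenomenon the rest of this post builds on. The sketch below is my own illustration (not code from the video or from the collaboration); the dimension 12,288 matches GPT-3's embedding width, while the choice of 1,000 vectors and the use of numpy are arbitrary assumptions for the demo.

```python
import numpy as np

# Illustrative sketch: how close to orthogonal are random directions
# in a GPT-3-sized embedding space? (d = 12,288 matches GPT-3; the
# vector count n = 1,000 is an arbitrary choice for this demo.)
rng = np.random.default_rng(0)
d, n = 12_288, 1_000

# Sample Gaussian vectors and normalize each to unit length, giving
# uniformly random directions on the d-dimensional unit sphere.
vectors = rng.standard_normal((n, d))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

# Pairwise cosine similarities; keep only the off-diagonal entries
# (the diagonal is each vector's similarity with itself, i.e. 1).
cosines = vectors @ vectors.T
off_diag = np.abs(cosines[~np.eye(n, dtype=bool)])

print(f"mean |cos|: {off_diag.mean():.4f}")  # roughly 0.007
print(f"max  |cos|: {off_diag.max():.4f}")   # roughly 0.05
```

Even with a thousand random directions crammed into the space, every pair lands within a few degrees of perpendicular (a cosine of 0.05 corresponds to an angle of about 87°), a first hint at how many nearly independent concepts such a space can hold.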