The maths you need to start understanding LLMs

Actually coming up with ideas like GPT-based LLMs, or doing serious AI research, requires serious maths. But the good news is that if you just want to understand how they work, the maths involved is much more modest: if you studied it at high school at any time since the 1960s, you did all of the groundwork then -- vectors, matrices, and so on.

One thing to note -- what I'm covering here is what you need to know to understand inference -- that is, using an existing AI rather than the training process used to create one. Training isn't much beyond high-school maths either, but I'll be writing about that later on.

So, with that caveat, let's dig in!

Vectors and high-dimensional spaces

In the last post I used the word "vector" in the way it's normally used by software engineers -- pretty much as a synonym for "an array of numbers". But a vector of length n is more than that: it's a distance and direction in n-dimensional space, or (equivalently) it can be taken as a point -- you start at the origin, and then follow the vector from there to the point in question.

In 2-d space, the vector (2, −3) means "two units to the right, and three down", or the point you reach if you move that way from the origin. In 3-d, (5, 1, −7) means "five right, one up, and seven away from the viewer" (or, in some conventions, seven toward the viewer), or the point there. With more dimensions it becomes pretty much impossible to visualise, but conceptually it's the same.
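To make that concrete, here's a minimal sketch of the two readings of the same 2-d vector (assuming NumPy as the tool -- any array library would do just as well):

```python
import numpy as np

v = np.array([2, -3])        # "two units to the right, and three down"

# Reading 1: a point -- start at the origin and follow the vector.
origin = np.zeros(2)
point = origin + v           # lands on the point (2, -3)

# Reading 2: a distance and direction -- the length comes from
# Pythagoras, and dividing by it leaves a unit vector pointing
# the same way.
length = np.linalg.norm(v)   # sqrt(2**2 + (-3)**2) ≈ 3.606
direction = v / length

print(point, length, direction)
```

The same code works unchanged for a vector of any length n; it's only the visualisation that gets harder.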

We use vectors to represent meaning in LLMs. For example, the vectors of logits that come out of the LLM (see the last post) represent the likelihood of each possible next token for an input sequence. When we do that, it's often useful to think of the vector as a point in a high-dimensional space in which that meaning is represented.

Vocab space

The logits that come out of the LLM for each position in the input are a set of numbers, one per token in the vocabulary, where the value in each "slot" is the LLM's prediction of how likely the associated token is to be the next one.
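Here's a sketch of what that looks like -- the five-token vocabulary and the logit values are made up for illustration, and a real LLM's vocabulary has tens of thousands of entries:

```python
import numpy as np

# A made-up toy vocabulary and one logit per token; real models
# have tens of thousands of tokens, but the shape is the same.
vocab = ["the", "cat", "sat", "on", "mat"]
logits = np.array([1.2, 3.4, 0.1, -0.7, 2.0])

# The standard way to turn logits into probabilities is a softmax;
# subtracting the max first is the usual numerical-stability trick.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

for token, p in zip(vocab, probs):
    print(f"{token!r}: {p:.3f}")

# The most likely next token is the one in the highest-valued slot.
print("prediction:", vocab[int(np.argmax(logits))])
```

Each slot in that logits vector corresponds to one axis of the space, so its dimensionality is the size of the vocabulary -- five here, tens of thousands in a real model.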
