A large language model’s parameters are often said to be the dials and levers that control how it behaves. Think of a planet-size pinball machine that sends its balls pinging from one end to the other via billions of paddles and bumpers set just so. Tweak those settings and the balls will behave in a different way.
OpenAI’s GPT-3, released in 2020, had 175 billion parameters. Google DeepMind’s latest LLM, Gemini 3, may have at least a trillion—some think it’s probably more like 7 trillion—but the company isn’t saying. (With competition now fierce, AI firms no longer share information about how their models are built.)
But the basics of what parameters are, and how they make LLMs do the remarkable things they do, are the same across different models. Ever wondered what makes an LLM really tick—what’s behind the colorful pinball-machine metaphors? Let’s dive in.
What is a parameter?
Think back to middle-school algebra and an expression like 2a + b. Those letters are parameters: Assign them values and you get a result. In math or coding, parameters are used to set limits or determine output. The parameters inside LLMs work in a similar way, just on a mind-boggling scale.
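To make that concrete, here is a minimal sketch (not from the article) of a tiny parameterized function in Python. The names w and b are made up for illustration; they play the role of the letters above.

```python
# A toy "model" with two parameters, w and b (names are illustrative).
# Change the parameter values and the same input produces a different output,
# which is all a parameter does, whether there are two of them or a trillion.

def tiny_model(x, w, b):
    return w * x + b

print(tiny_model(5, w=2, b=1))   # 11
print(tiny_model(5, w=-3, b=7))  # -8: same input, different parameters, different result
```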
How are they assigned their values?
Short answer: an algorithm. When a model is trained, each parameter is set to a random value. The training process then involves an iterative series of calculations (known as training steps) that update those values. In the early stages of training, a model will make errors. The training algorithm looks at each error and goes back through the model, tweaking the value of each of the model’s many parameters so that next time that error is smaller. This happens over and over again until the model behaves in the way its makers want it to. At that point, training stops and the values of the model’s parameters are fixed.
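As a rough sketch of that loop, and nothing like the scale or machinery of a real LLM, here is the same idea applied to a single parameter. The data, the starting value, and the learning rate are all made up for illustration.

```python
# Toy training loop: fit a one-parameter model y = w * x to a few example pairs.
# This is a deliberately tiny illustration of "start random, measure the error,
# nudge the parameter, repeat" -- real LLMs do the same kind of thing across
# billions of parameters at once.

import random

data = [(1, 3), (2, 6), (3, 9)]   # made-up examples where the "right" w is 3

w = random.uniform(-1, 1)          # 1. start with a random parameter value
learning_rate = 0.01

for step in range(1000):           # 2. repeat many training steps
    for x, target in data:
        prediction = w * x
        error = prediction - target
        # 3. nudge w in the direction that shrinks the squared error
        #    (the gradient of (w*x - target)**2 with respect to w is 2 * error * x)
        w -= learning_rate * 2 * error * x

print(round(w, 3))                 # ends up very close to 3
```

After enough passes over the examples, the error stops shrinking, the loop stops, and the parameter's value is fixed, which is the "training stops" moment described above.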
Sounds straightforward …
In theory! In practice, because LLMs are trained on so much data and contain so many parameters, training them requires a huge number of steps and an eye-watering amount of computation. During training, the 175 billion parameters inside a medium-size LLM like GPT-3 will each get updated tens of thousands of times. In total, that adds up to quadrillions (a number with 15 zeros) of individual calculations. That’s why training an LLM takes so much energy. We’re talking about thousands of specialized high-speed computers running nonstop for months.
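The back-of-the-envelope arithmetic behind "quadrillions" looks roughly like this. The "tens of thousands" figure is the article's; the exact count used here is only illustrative.

```python
# Rough scale check: 175 billion parameters, each updated tens of thousands of times.
parameters = 175_000_000_000
updates_per_parameter = 10_000          # "tens of thousands" -- illustrative lower end

total_updates = parameters * updates_per_parameter
print(f"{total_updates:.2e}")           # 1.75e+15, i.e. roughly 1.75 quadrillion updates
```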
Oof. What are all these parameters for, exactly?