Bayes, bits & brains
This site is about probability and information theory. We'll see how they help us understand machine learning and the world around us.
A few riddles
More about the content, prerequisites, and logistics later. For now, I hope you get a feel for what this is about by checking out the following riddles; maybe some of them will nerd-snipe you! 😉 By the end of this minicourse, you will understand all of them.
🧠 Intelligence test
Test your intelligence with the following widget! You will be given a bunch of text snippets cut from Wikipedia at a random place. Your job: predict the next letter! Try at least five snippets and compare your performance with some neural nets (GPT-2 and Llama 4). Don't feel bad if a machine beats you; they've been studying for this test their entire lives! But why? And why did Claude Shannon, the information theory GOAT, run this experiment back in the 1940s? (If you're curious how the game works, there's a small offline sketch of it after the riddle list.)
📈 Modelling returns
🌐 How much information is on Wikipedia?
🔮 Who's less wrong?
🦶 Average foot
🤓 Explaining XKCD jokes
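If you want to recreate the intelligence-test game offline, here is a minimal Python sketch. It assumes you supply your own plain-text corpus saved as corpus.txt, and the five rounds and 80-character context window are arbitrary placeholder choices, not what the widget above actually uses.

```python
# A rough offline version of the "guess the next letter" game above.
# Assumes you provide your own plain-text corpus (e.g. some Wikipedia
# articles) saved as corpus.txt -- the filename is just a placeholder.
import random

def play(path="corpus.txt", rounds=5, context_len=80):
    text = open(path, encoding="utf-8").read()
    correct = 0
    for _ in range(rounds):
        # Cut the text at a random position and show the preceding context.
        cut = random.randrange(context_len, len(text) - 1)
        prefix, answer = text[cut - context_len:cut], text[cut]
        print("\n..." + prefix)
        guess = input("Next letter? ")[:1]
        correct += (guess == answer)
        print(f"It was: {answer!r}")
    print(f"\nYou got {correct}/{rounds} right.")

if __name__ == "__main__":
    play()
```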
Onboarding
As we go through the minicourse, we'll revisit each puzzle and understand what's going on. More importantly, we will pick up some important pieces of mathematics and build a solid theoretical background for machine learning.
Here are some of the questions we will explore:
What are KL divergence, entropy, and cross-entropy? What's the intuition behind them? (chapters 1-3)
Where do the machine-learning principles of maximum likelihood & maximum entropy come from? (chapters 4-5)
Why do we use logits, softmax, and Gaussians all the time? (chapter 5)
How do we set up loss functions? (chapter 6)
How does compression work, and what intuitions does it give us about LLMs? (chapter 7)
Prerequisites
How to read this
What's next?
This is your last chance. You can go on with your life and believe whatever you want to believe about KL divergence. Or you can go to the first chapter and see how deep the rabbit hole goes.