Writing an LLM from scratch, part 10 – dropout
Published on: 2025-06-08 06:25:54
I'm still chugging through chapter 3 of Sebastian Raschka's "Build a Large Language Model (from Scratch)". Last time I covered causal attention, which was pretty simple when it came down to it. Today it's another quick and easy one -- dropout.
The concept is straightforward: you want knowledge to be spread broadly across your model, not concentrated in a few places. That way all of your parameters are pulling their weight, and you don't have a bunch of them sitting there doing nothing.
So, while you're training (but, importantly, not during inference) you randomly ignore certain parts -- neurons, weights, whatever -- each time around, so that their "knowledge" gets spread over to other bits.
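That "randomly ignore things during training, use everything at inference" idea can be sketched by hand with a simple mask. This is just an illustrative sketch (the drop probability of 0.5 is an assumed value, and the scaling shown is the "inverted dropout" convention that PyTorch uses), not the book's code:

```python
import torch

torch.manual_seed(42)  # for reproducibility
p = 0.5  # drop probability (assumed value for illustration)
activations = torch.ones(8)

# Training: sample a random binary mask and zero out the "dropped" units.
# Scaling the survivors by 1/(1-p) -- "inverted dropout" -- keeps the
# expected activation the same, so nothing needs to change at inference.
mask = (torch.rand_like(activations) > p).float()
train_out = activations * mask / (1 - p)

# Inference: no masking at all -- the full network is used.
inference_out = activations
```

Because the survivors are scaled up during training, each surviving element here comes out as 2.0 rather than 1.0, while the dropped ones are 0.0.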
Simple enough! But the implementation is a little more fun, and there were a couple of oddities that I needed to think through.
Code-wise, it's really easy: PyTorch provides a useful torch.nn.Dropout class that you create with the drop probability as its argument.
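A minimal sketch of torch.nn.Dropout in action (the tensor shape and p=0.5 are just assumed values for illustration):

```python
import torch

torch.manual_seed(0)  # for reproducibility
dropout = torch.nn.Dropout(p=0.5)  # each element is zeroed with probability 0.5
x = torch.ones(2, 4)

# Training mode: random elements are zeroed, survivors scaled by 1/(1-p).
dropout.train()
out_train = dropout(x)  # a mix of 0.0 and 2.0

# Inference mode: dropout becomes a no-op and passes the input through.
dropout.eval()
out_eval = dropout(x)  # identical to x
```

The train()/eval() switch is what makes dropout active during training but not during inference.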