Cross-Entropy and KL Divergence
April 12, 2025 · Tags: Math, Machine Learning
Cross-entropy is widely used in modern ML to compute the loss for classification tasks. This post is a brief overview of the math behind it and a related concept called Kullback-Leibler (KL) divergence.
Information content of a single random event

We'll start with a single event \(E\) that has probability p. The information content (or "degree of surprise") of this event occurring is defined as:

\[I(E) = \log_2 \left (\frac{1}{p} \right )\]

The base 2 is used so that we can count the information in units of bits. Thinking about this definition intuitively, imagine an event with probability p=1; using the formula, the information we gain by observing this event occurring is 0, which makes sense: observing a certain event tells us nothing new. At the other extreme, as the probability p approaches 0, the information we gain grows without bound. An equivalent way to write the formula is:

\[I(E) = -\log_2 p\]

Some numeric examples: suppose we flip a fair coin and it comes out heads. The probability of this event is 0.5, so its information content is \(-\log_2(0.5) = 1\) bit.
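To make the formula concrete, here is a minimal Python sketch (the helper name information_content is mine, not from the article) that evaluates \(I(E) = -\log_2 p\) for a few probabilities:

```python
import math

def information_content(p: float) -> float:
    """Information content, in bits, of an event with probability p (0 < p <= 1)."""
    return -math.log2(p)

# A certain event carries no information.
print(information_content(1.0))   # 0.0

# A fair coin coming up heads: exactly 1 bit.
print(information_content(0.5))   # 1.0

# Rarer events are more surprising, hence more informative.
print(information_content(0.01))  # ~6.64 bits
```

As the last call shows, halving the probability of an event adds one bit to its information content, which is just another way of reading off the base-2 logarithm.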