Skip to content
Tech News
← Back to articles

Jensen–Shannon Divergence

read original get Shannon Divergence Calculator → more articles
Why This Matters

The Jensen–Shannon divergence is a crucial metric in the tech industry for measuring the similarity between probability distributions, with applications in machine learning, data analysis, and information theory. Its symmetry and finite values make it particularly useful for comparing complex data models, enhancing the accuracy of algorithms and data-driven decision-making.

Key Takeaways

Statistical distance measure

In probability theory and statistics, the Jensen–Shannon divergence, named after Johan Jensen and Claude Shannon, is a method of measuring the similarity between two probability distributions. It is also known as information radius (IRad)[1][2] or total divergence to the average.[3] It is based on the Kullback–Leibler divergence, with some notable (and useful) differences, including that it is symmetric and it always has a finite value. The square root of the Jensen–Shannon divergence is a metric often referred to as Jensen–Shannon distance. The similarity between the distributions is greater when the Jensen-Shannon distance is closer to zero.[4][5][6]

Definition [ edit ]

Consider the set M + 1 ( A ) {\displaystyle M_{+}^{1}(A)} of probability distributions where A {\displaystyle A} is a set provided with some σ-algebra of measurable subsets. In particular we can take A {\displaystyle A} to be a finite or countable set with all subsets being measurable.

The Jensen–Shannon divergence (JSD) is a symmetrized and smoothed version of the Kullback–Leibler divergence D ( P ∥ Q ) {\displaystyle D(P\parallel Q)} . It is defined by

J S D ( P ∥ Q ) = 1 2 D ( P ∥ M ) + 1 2 D ( Q ∥ M ) , {\displaystyle {\rm {JSD}}(P\parallel Q)={\frac {1}{2}}D(P\parallel M)+{\frac {1}{2}}D(Q\parallel M),}

where M = 1 2 ( P + Q ) {\displaystyle M={\frac {1}{2}}(P+Q)} is a mixture distribution of P {\displaystyle P} and Q {\displaystyle Q} .

The geometric Jensen–Shannon divergence[7] (or G-Jensen–Shannon divergence) yields a closed-form formula for divergence between two Gaussian distributions by taking the geometric mean.

A more general definition, allowing for the comparison of more than two probability distributions, is:

J S D π 1 , … , π n ( P 1 , P 2 , … , P n ) = ∑ i π i D ( P i ∥ M ) = H ( M ) − ∑ i = 1 n π i H ( P i ) {\displaystyle {\begin{aligned}{\rm {JSD}}_{\pi _{1},\ldots ,\pi _{n}}(P_{1},P_{2},\ldots ,P_{n})&=\sum _{i}\pi _{i}D(P_{i}\parallel M)\\&=H\left(M\right)-\sum _{i=1}^{n}\pi _{i}H(P_{i})\end{aligned}}}

... continue reading