Jensen–Shannon Divergence

Statistical distance measure

In probability theory and statistics, the Jensen–Shannon divergence, named after Johan Jensen and Claude Shannon, is a method of measuring the similarity between two probability distributions. It is also known as information radius (IRad)[1][2] or total divergence to the average.[3] It is based on the Kullback–Leibler divergence, with some notable (and useful) differences: it is symmetric and always has a finite value. The square root of the Jensen–Shannon divergence is a metric, often referred to as the Jensen–Shannon distance. The closer the Jensen–Shannon distance is to zero, the more similar the two distributions are.[4][5][6]

Definition

Consider the set $M_{+}^{1}(A)$ of probability distributions, where $A$ is a set provided with some σ-algebra of measurable subsets. In particular, we can take $A$ to be a finite or countable set with all subsets being measurable.

The Jensen–Shannon divergence (JSD) is a symmetrized and smoothed version of the Kullback–Leibler divergence $D(P \parallel Q)$. It is defined by

$$\mathrm{JSD}(P \parallel Q) = \frac{1}{2} D(P \parallel M) + \frac{1}{2} D(Q \parallel M),$$

where $M = \frac{1}{2}(P + Q)$ is a mixture distribution of $P$ and $Q$.
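For a finite set $A$, the definition above can be computed directly. The following is a minimal sketch, assuming distributions are given as equal-length lists of probabilities and using the natural logarithm (so results are in nats). The function names `kl_divergence` and `jsd` are illustrative and do not come from any particular library.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(P || Q) in nats for discrete distributions.

    Terms with p_i == 0 contribute 0 (convention 0 * log 0 = 0).
    Assumes q_i > 0 wherever p_i > 0; otherwise a ZeroDivisionError is raised.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence of two discrete distributions, in nats."""
    # Mixture M = (P + Q) / 2; M is positive wherever P or Q is positive,
    # so both KL terms below are finite.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Example distributions on a three-element set (made up for illustration).
p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]

print(jsd(p, q))             # divergence
print(jsd(q, p))             # same value: the divergence is symmetric
print(math.sqrt(jsd(p, q)))  # Jensen-Shannon distance
```

Because each distribution is compared against the mixture rather than against the other distribution, the result stays finite even when $P$ and $Q$ have different supports, as in this example.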

The geometric Jensen–Shannon divergence[7] (or G-Jensen–Shannon divergence) takes the geometric mean of the two distributions in place of the arithmetic mixture, which yields a closed-form formula for the divergence between two Gaussian distributions.

A more general definition, allowing for the comparison of more than two probability distributions, is:

$$
\begin{aligned}
\mathrm{JSD}_{\pi_1,\ldots,\pi_n}(P_1, P_2, \ldots, P_n) &= \sum_{i} \pi_i D(P_i \parallel M) \\
&= H(M) - \sum_{i=1}^{n} \pi_i H(P_i)
\end{aligned}
$$
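The entropy form of the general definition can be sketched for discrete distributions as follows. The text above does not define the symbols, so this example assumes the standard reading: $M = \sum_i \pi_i P_i$ is the weighted mixture, $H$ is the Shannon entropy, and the weights $\pi_i$ are non-negative and sum to 1. The names `entropy` and `generalized_jsd` are illustrative, and the natural logarithm is used.

```python
import math

def entropy(p):
    """Shannon entropy H(P) in nats; zero-probability terms contribute 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

def generalized_jsd(dists, weights):
    """Weighted Jensen-Shannon divergence H(M) - sum_i pi_i * H(P_i).

    dists   -- list of discrete distributions (equal-length lists)
    weights -- list of weights pi_i, assumed non-negative and summing to 1
    """
    n = len(dists[0])
    # Weighted mixture M = sum_i pi_i * P_i, computed component by component.
    m = [sum(w * d[k] for w, d in zip(weights, dists)) for k in range(n)]
    return entropy(m) - sum(w * entropy(d) for w, d in zip(weights, dists))

# Three made-up distributions with equal weights.
dists = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]]
weights = [1 / 3, 1 / 3, 1 / 3]
print(generalized_jsd(dists, weights))
```

With two distributions and weights of 1/2 each, this reduces to the two-distribution definition given earlier.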
