
A Visual Exploration of Gaussian Processes (2019)


Even if you have spent some time reading about machine learning, chances are that you have never heard of Gaussian processes. And if you have, revisiting the basics is always a good way to refresh your memory. With this blog post we want to give an introduction to Gaussian processes and make the mathematical intuition behind them more approachable.

Gaussian processes are a powerful tool in the machine learning toolbox. They allow us to make predictions about our data by incorporating prior knowledge. Their most obvious area of application is fitting a function to the data. This is called regression and is used, for example, in robotics or time series forecasting. But Gaussian processes are not limited to regression: they can also be extended to classification and clustering tasks. For a given set of training points, there are potentially infinitely many functions that fit the data. Gaussian processes offer an elegant solution to this problem by assigning a probability to each of these functions. The mean of this probability distribution then represents the most probable characterization of the data. Furthermore, using a probabilistic approach allows us to incorporate the confidence of the prediction into the regression result.
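To make this concrete before diving into the math, here is a minimal sketch of Gaussian process regression, written with scikit-learn on a toy sine dataset of our own choosing (the original article relies on interactive figures instead of code):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Toy training data: a few noisy observations of sin(x).
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 5, size=(8, 1))
y_train = np.sin(X_train).ravel() + 0.05 * rng.standard_normal(8)

# A GP with an RBF kernel places a prior over smooth functions;
# alpha accounts for the observation noise.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=0.05**2)
gp.fit(X_train, y_train)

# The posterior mean is the most probable fit to the data, and the
# standard deviation expresses the confidence of the prediction.
X_test = np.linspace(0, 5, 100).reshape(-1, 1)
mean, std = gp.predict(X_test, return_std=True)
```

Regions far away from the training points end up with a larger standard deviation, which is exactly the confidence information mentioned above.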

We will first explore the mathematical foundation that Gaussian processes are built on; we invite you to follow along using the interactive figures and hands-on examples. They help to explain the impact of individual components, and show the flexibility of Gaussian processes. By the end of this article we hope that you will have a visual intuition for how Gaussian processes work and how you can configure them for different types of data.

Multivariate Gaussian distributions

Before we can explore Gaussian processes, we need to understand the mathematical concepts they are based on. As the name suggests, the Gaussian distribution (often also referred to as the normal distribution) is the basic building block of Gaussian processes. In particular, we are interested in the multivariate case of this distribution, where each random variable is distributed normally and their joint distribution is also Gaussian. The multivariate Gaussian distribution is defined by a mean vector $\mu$ and a covariance matrix $\Sigma$. You can see an interactive example of such distributions in the figure below.

The mean vector $\mu$ describes the expected value of the distribution. Each of its components describes the mean of the corresponding dimension. $\Sigma$ models the variance along each dimension and determines how the different random variables are correlated. The covariance matrix is always symmetric and positive semi-definite. The diagonal of $\Sigma$ consists of the variance $\sigma_i^2$ of the $i$-th random variable, and the off-diagonal elements $\sigma_{ij}$ describe the covariance between the $i$-th and $j$-th random variable.

$$X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix} \sim \mathcal{N}(\mu, \Sigma)$$
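As a small, concrete illustration (the specific numbers here are our own, not from the article), a two-dimensional $\mu$ and $\Sigma$ can be written down in NumPy and sampled from directly:

```python
import numpy as np

# Mean vector and covariance matrix of a two-dimensional Gaussian.
mu = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 2.0]])

# The two defining properties of a covariance matrix can be checked:
assert np.allclose(Sigma, Sigma.T)             # symmetric
assert np.all(np.linalg.eigvalsh(Sigma) >= 0)  # positive semi-definite

# Draw samples; each row is one realization of X = [X_1, X_2].
rng = np.random.default_rng(42)
samples = rng.multivariate_normal(mu, Sigma, size=1000)
```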

We say $X$ follows a normal distribution. The covariance matrix $\Sigma$ describes the shape of the distribution. It is defined in terms of the expected value $E$:

$$\Sigma = \text{Cov}(X_i, X_j) = E \left[ (X_i - \mu_i)(X_j - \mu_j)^T \right]$$
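This expectation can be approximated from samples. The short sketch below (again an illustration of ours) estimates $\Sigma$ empirically and checks the result against NumPy's built-in np.cov:

```python
import numpy as np

mu = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 2.0]])

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mu, Sigma, size=100_000)  # one sample per row

# Empirical version of E[(X - mu)(X - mu)^T], averaged over the samples.
centered = X - X.mean(axis=0)
Sigma_hat = centered.T @ centered / (len(X) - 1)

# np.cov expects one variable per row, hence the transpose.
assert np.allclose(Sigma_hat, np.cov(X.T))
print(Sigma_hat)  # approaches Sigma as the sample size grows
```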

Visually, the distribution is centered around the mean and the covariance matrix defines its shape. The following figure shows the influence of these parameters on a two-dimensional Gaussian distribution. The variances for each random variable are on the diagonal of the covariance matrix, while the other values show the covariance between them.
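As a static stand-in for that interactive figure, the density can be evaluated on a grid with SciPy (a sketch of our own, not part of the original article); varying the entries of $\Sigma$ stretches and tilts the elliptical contours:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.6],
                  [0.6, 1.0]])  # positive covariance tilts the ellipse

# Evaluate the density on a grid; lines of equal density are ellipses
# whose orientation and width are determined by Sigma.
x, y = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))
density = multivariate_normal(mean=mu, cov=Sigma).pdf(np.dstack((x, y)))
```

Setting the off-diagonal entries to zero yields axis-aligned ellipses, and equal variances with zero covariance give circular contours.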
