Week 04
Covariance functions and the squared exponential kernel | Gaussian processes for regression
So far, we have been studying parametric models, focusing on estimating the model parameters w.
What about moving to function space?
We augment the model by introducing the latent function values y in the joint probability: p(y, w) = p(y | w) p(w), with y = Φw for the linear-in-parameters model (Φ is the design matrix of features).
If we integrate out the parameters w, we end up with: p(y) = ∫ p(y | w) p(w) dw.
Then, if we plug our Gaussian prior p(w) = N(w | 0, α⁻¹I) into the distribution of y, we get: p(y) = N(y | 0, K), with K = α⁻¹ΦΦᵀ.
This is our prior for y!
A closer look at the covariance between two latent variables: cov(y_n, y_m) = K_nm = α⁻¹ φ(x_n)ᵀφ(x_m).
It is simply a function of the two corresponding input features, i.e. a kernel k(x_n, x_m).
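To make the weight-space-to-function-space link concrete, here is a minimal NumPy sketch; the Gaussian basis functions, the prior precision α, and the input locations are all assumptions for illustration. It checks that the covariance of y = Φw under w ~ N(0, α⁻¹I) is exactly α⁻¹ΦΦᵀ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setup: Gaussian basis functions and a prior precision alpha.
alpha = 2.0                       # prior precision on w (assumption)
x = np.linspace(-3, 3, 5)         # five input locations (assumption)
centers = np.linspace(-3, 3, 10)  # basis-function centres (assumption)

# Design matrix: Phi[n, j] = phi_j(x_n)
Phi = np.exp(-0.5 * (x[:, None] - centers[None, :]) ** 2)

# Induced prior covariance of y = Phi @ w: K = alpha^{-1} Phi Phi^T,
# so K[n, m] depends only on the inputs x_n and x_m.
K = Phi @ Phi.T / alpha

# Monte Carlo check: the empirical covariance of sampled y's matches K.
W = rng.normal(scale=alpha ** -0.5, size=(len(centers), 200_000))
Y = Phi @ W
print(np.max(np.abs(np.cov(Y) - K)))  # small; shrinks as the sample count grows
```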
Let's use other covariance functions!
→ The squared exponential kernel, k(x, x′) = κ² exp(−‖x − x′‖² / (2ℓ²)), is the most common covariance function used in statistics and machine learning.
Example (with a zero mean): y ~ N(0, K), where K_nm = k(x_n, x_m).
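A minimal sketch of this kernel and of drawing functions from the zero-mean prior; the 1-D input grid and the jitter value are illustrative choices, not from the slides.

```python
import numpy as np

def se_kernel(x1, x2, kappa=1.0, ell=1.0):
    """Squared exponential kernel: k(x, x') = kappa^2 * exp(-(x - x')^2 / (2 ell^2))."""
    sqdist = (x1[:, None] - x2[None, :]) ** 2
    return kappa ** 2 * np.exp(-0.5 * sqdist / ell ** 2)

# Zero-mean GP prior evaluated on a grid of inputs.
x = np.linspace(-5, 5, 100)
K = se_kernel(x, x, kappa=1.0, ell=1.0)

# Draw three functions from the prior y ~ N(0, K) via a Cholesky factor.
rng = np.random.default_rng(0)
jitter = 1e-6 * np.eye(len(x))  # small diagonal term for numerical stability
L = np.linalg.cholesky(K + jitter)
samples = L @ rng.standard_normal((len(x), 3))  # each column is one prior draw
```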
ABOUT THE PARAMETERS
Parameter ℓ is called the lengthscale:
Decreasing ℓ → samples vary more rapidly across the inputs
Increasing ℓ → samples become smoother and more slowly varying
Parameter κ is called the magnitude:
Decreasing κ → samples have a smaller amplitude
Increasing κ → samples have a larger amplitude
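A short usage sketch of these effects, reusing se_kernel, x, jitter and rng from the previous block; the specific (ℓ, κ) values and the roughness proxy are assumptions for illustration.

```python
# Shorter lengthscale -> faster variation; larger magnitude -> larger amplitude.
for ell in (0.3, 1.0, 3.0):
    for kappa in (0.5, 2.0):
        K = se_kernel(x, x, kappa=kappa, ell=ell)
        f = np.linalg.cholesky(K + jitter) @ rng.standard_normal(len(x))
        roughness = np.mean(np.abs(np.diff(f)))  # crude wiggliness proxy
        print(f"ell={ell}, kappa={kappa}: std={f.std():.2f}, roughness={roughness:.2f}")
```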
Recall the regression problem context: given N training inputs x_1, …, x_N and noisy observations t_n = y(x_n) + e_n, we want to predict the function value y* at a new input x*.
[Definition] A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution.
In this case, we write: y(x) ~ GP(m(x), k(x, x′)).
[Definition] A random vector x has a multivariate Gaussian distribution if all linear combinations of x are Gaussian distributed: aᵀx follows a univariate Gaussian distribution for every fixed vector a.
About the Prior
We impose the prior directly on the function values y: p(y) = N(y | 0, K).
Hyperparameters κ and ℓ come from the squared exponential covariance function: K_nm = κ² exp(−‖x_n − x_m‖² / (2ℓ²)).
About the Likelihood
The noise e is independent and identically distributed: e ~ N(0, β⁻¹I), so p(t | y) = N(t | y, β⁻¹I).
The hyperparameter β controls the precision (inverse variance) of the measurements.
About the Predictive Posterior
For a new input x*, the predictive posterior is again Gaussian: p(y* | t) = N(y* | m*, s*²), with
m* = k*ᵀ(K + β⁻¹I)⁻¹ t and s*² = k(x*, x*) − k*ᵀ(K + β⁻¹I)⁻¹ k*,
where k* = [k(x*, x_1), …, k(x*, x_N)]ᵀ.
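A sketch of these predictive equations, reusing the se_kernel helper defined earlier; the Cholesky-based solves are one standard, numerically stable implementation choice, not necessarily the one used in the lecture.

```python
import numpy as np

def gp_predict(x_train, t_train, x_test, kappa, ell, beta):
    """Predictive mean and variance of the latent function under a zero-mean GP
    with squared exponential kernel and i.i.d. Gaussian noise of precision beta."""
    C = se_kernel(x_train, x_train, kappa, ell) + np.eye(len(x_train)) / beta
    k_star = se_kernel(x_train, x_test, kappa, ell)    # N x M cross-covariances
    L = np.linalg.cholesky(C)
    weights = np.linalg.solve(L.T, np.linalg.solve(L, t_train))
    mean = k_star.T @ weights                          # m* = k*^T (K + I/beta)^{-1} t
    v = np.linalg.solve(L, k_star)
    var = kappa ** 2 - np.sum(v ** 2, axis=0)          # k(x*, x*) = kappa^2 for the SE kernel
    return mean, var                                   # add 1/beta to var for noisy targets
```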
Previously, we studied the influence of hyperparameters κ and ℓ...
But what about β?
Influence of the β hyperparameter: a large β (low noise variance) forces the posterior mean to pass close to the observations, while a small β allows it to smooth over them.
Evidence approximation: maximizing the marginal likelihood with respect to the hyperparameters,
log p(t | κ, ℓ, β) = −½ tᵀC⁻¹t − ½ log|C| − (N/2) log 2π, with C = K + β⁻¹I.
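A sketch of evidence maximization under the same assumptions (zero-mean GP, SE kernel, noise precision β), reusing the se_kernel helper from earlier; the log-space parameterisation, the scipy optimiser, and the toy sine data are illustrative choices, not from the slides.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_evidence(log_params, x, t):
    """Negative log marginal likelihood -log p(t | kappa, ell, beta)."""
    kappa, ell, beta = np.exp(log_params)  # optimise in log space so parameters stay positive
    C = se_kernel(x, x, kappa, ell) + np.eye(len(x)) / beta
    L = np.linalg.cholesky(C)
    w = np.linalg.solve(L.T, np.linalg.solve(L, t))
    # -log p(t) = 1/2 t^T C^{-1} t + 1/2 log|C| + (N/2) log(2 pi)
    return 0.5 * t @ w + np.sum(np.log(np.diag(L))) + 0.5 * len(x) * np.log(2 * np.pi)

# Hypothetical data, only to make the sketch runnable.
rng = np.random.default_rng(1)
x = np.linspace(0, 5, 30)
t = np.sin(x) + 0.1 * rng.standard_normal(len(x))

res = minimize(neg_log_evidence, x0=np.zeros(3), args=(x, t))
kappa_hat, ell_hat, beta_hat = np.exp(res.x)
print(kappa_hat, ell_hat, beta_hat)
```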