Week 04
Covariance functions and the squared exponential kernel | Gaussian processes for regression
So far, we have been studying parametric models, focusing on estimating the model parameters w.
What about moving to function space?
We augment the model by introducing the latent function values y in the joint probability: p(y, w) = p(y | w) p(w), with y = Φw for the linear-in-parameters model (Φ is the design matrix of features).
If we integrate out the parameters w, we end up with: p(y) = ∫ p(y | w) p(w) dw.
Then, if we plug our Gaussian prior p(w) = N(w | 0, α⁻¹I) into the distribution of y, we get: p(y) = N(y | 0, K), with K = α⁻¹ΦΦᵀ.
This is our prior for y!
A closer look at the covariance between two latent variables: cov(y_n, y_m) = K_nm = α⁻¹ φ(x_n)ᵀφ(x_m).
It is simply a function of the two corresponding input features, i.e. a kernel k(x_n, x_m).
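To make the weight-space-to-function-space link concrete, here is a minimal NumPy sketch; the Gaussian basis functions, the prior precision α, and the input locations are all assumptions for illustration. It checks that the covariance of y = Φw under w ~ N(0, α⁻¹I) is exactly α⁻¹ΦΦᵀ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setup: Gaussian basis functions and a prior precision alpha.
alpha = 2.0                       # prior precision on w (assumption)
x = np.linspace(-3, 3, 5)         # five input locations (assumption)
centers = np.linspace(-3, 3, 10)  # basis-function centres (assumption)

# Design matrix: Phi[n, j] = phi_j(x_n)
Phi = np.exp(-0.5 * (x[:, None] - centers[None, :]) ** 2)

# Induced prior covariance of y = Phi @ w: K = alpha^{-1} Phi Phi^T,
# so K[n, m] depends only on the inputs x_n and x_m.
K = Phi @ Phi.T / alpha

# Monte Carlo check: the empirical covariance of sampled y's matches K.
W = rng.normal(scale=alpha ** -0.5, size=(len(centers), 200_000))
Y = Phi @ W
print(np.max(np.abs(np.cov(Y) - K)))  # small; shrinks as the sample count grows
```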
Let's use other covariance functions!
→ The squared exponential kernel, k(x, x′) = κ² exp(−‖x − x′‖² / (2ℓ²)), is the most common covariance function used in statistics and machine learning.
Example (with a zero mean): y ~ N(0, K), where K_nm = k(x_n, x_m).
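A minimal sketch of this kernel and of drawing functions from the zero-mean prior; the 1-D input grid and the jitter value are illustrative choices, not from the slides.

```python
import numpy as np

def se_kernel(x1, x2, kappa=1.0, ell=1.0):
    """Squared exponential kernel: k(x, x') = kappa^2 * exp(-(x - x')^2 / (2 ell^2))."""
    sqdist = (x1[:, None] - x2[None, :]) ** 2
    return kappa ** 2 * np.exp(-0.5 * sqdist / ell ** 2)

# Zero-mean GP prior evaluated on a grid of inputs.
x = np.linspace(-5, 5, 100)
K = se_kernel(x, x, kappa=1.0, ell=1.0)

# Draw three functions from the prior y ~ N(0, K) via a Cholesky factor.
rng = np.random.default_rng(0)
jitter = 1e-6 * np.eye(len(x))  # small diagonal term for numerical stability
L = np.linalg.cholesky(K + jitter)
samples = L @ rng.standard_normal((len(x), 3))  # each column is one prior draw
```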
ABOUT THE PARAMETERS
Parameter ℓ is called the lengthscale:
Decreasing ℓ → samples vary more rapidly across the inputs
Increasing ℓ → samples become smoother and more slowly varying
Parameter κ is called the magnitude:
Decreasing κ → samples have a smaller amplitude
Increasing κ → samples have a larger amplitude
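A short usage sketch of these effects, reusing se_kernel, x, jitter and rng from the previous block; the specific (ℓ, κ) values and the roughness proxy are assumptions for illustration.

```python
# Shorter lengthscale -> faster variation; larger magnitude -> larger amplitude.
for ell in (0.3, 1.0, 3.0):
    for kappa in (0.5, 2.0):
        K = se_kernel(x, x, kappa=kappa, ell=ell)
        f = np.linalg.cholesky(K + jitter) @ rng.standard_normal(len(x))
        roughness = np.mean(np.abs(np.diff(f)))  # crude wiggliness proxy
        print(f"ell={ell}, kappa={kappa}: std={f.std():.2f}, roughness={roughness:.2f}")
```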
Recall the regression problem context: given N training inputs x_1, …, x_N and noisy observations t_n = y(x_n) + e_n, we want to predict the function value y* at a new input x*.
[Definition] A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution.
In this case, we write: y(x) ~ GP(m(x), k(x, x′)).
[Definition] A random vector x has a multivariate Gaussian distribution if all linear combinations of x are Gaussian distributed: aᵀx follows a univariate Gaussian distribution for every fixed vector a.
About the Prior
We impose the prior directly on the function values y: p(y) = N(y | 0, K).
Hyperparameters κ and ℓ come from the squared exponential covariance function: K_nm = κ² exp(−‖x_n − x_m‖² / (2ℓ²)).
About the Likelihood
The noise e is independent and identically distributed: e ~ N(0, β⁻¹I), so p(t | y) = N(t | y, β⁻¹I).
The hyperparameter β controls the precision (inverse variance) of the measurements.
About the Predictive Posterior
For a new input x*, the predictive posterior is again Gaussian: p(y* | t) = N(y* | m*, s*²), with
m* = k*ᵀ(K + β⁻¹I)⁻¹ t and s*² = k(x*, x*) − k*ᵀ(K + β⁻¹I)⁻¹ k*,
where k* = [k(x*, x_1), …, k(x*, x_N)]ᵀ.
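A sketch of these predictive equations, reusing the se_kernel helper defined earlier; the Cholesky-based solves are one standard, numerically stable implementation choice, not necessarily the one used in the lecture.

```python
import numpy as np

def gp_predict(x_train, t_train, x_test, kappa, ell, beta):
    """Predictive mean and variance of the latent function under a zero-mean GP
    with squared exponential kernel and i.i.d. Gaussian noise of precision beta."""
    C = se_kernel(x_train, x_train, kappa, ell) + np.eye(len(x_train)) / beta
    k_star = se_kernel(x_train, x_test, kappa, ell)    # N x M cross-covariances
    L = np.linalg.cholesky(C)
    weights = np.linalg.solve(L.T, np.linalg.solve(L, t_train))
    mean = k_star.T @ weights                          # m* = k*^T (K + I/beta)^{-1} t
    v = np.linalg.solve(L, k_star)
    var = kappa ** 2 - np.sum(v ** 2, axis=0)          # k(x*, x*) = kappa^2 for the SE kernel
    return mean, var                                   # add 1/beta to var for noisy targets
```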
Previously, we studied the influence of hyperparameters κ and ℓ...
But what about β?
Influence of the β hyperparameter: a large β (low noise variance) forces the posterior mean to pass close to the observations, while a small β allows it to smooth over them.
Evidence approximation: maximizing the marginal likelihood with respect to the hyperparameters,
log p(t | κ, ℓ, β) = −½ tᵀC⁻¹t − ½ log|C| − (N/2) log 2π, with C = K + β⁻¹I.
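A sketch of evidence maximization under the same assumptions (zero-mean GP, SE kernel, noise precision β), reusing the se_kernel helper from earlier; the log-space parameterisation, the scipy optimiser, and the toy sine data are illustrative choices, not from the slides.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_evidence(log_params, x, t):
    """Negative log marginal likelihood -log p(t | kappa, ell, beta)."""
    kappa, ell, beta = np.exp(log_params)  # optimise in log space so parameters stay positive
    C = se_kernel(x, x, kappa, ell) + np.eye(len(x)) / beta
    L = np.linalg.cholesky(C)
    w = np.linalg.solve(L.T, np.linalg.solve(L, t))
    # -log p(t) = 1/2 t^T C^{-1} t + 1/2 log|C| + (N/2) log(2 pi)
    return 0.5 * t @ w + np.sum(np.log(np.diag(L))) + 0.5 * len(x) * np.log(2 * np.pi)

# Hypothetical data, only to make the sketch runnable.
rng = np.random.default_rng(1)
x = np.linspace(0, 5, 30)
t = np.sin(x) + 0.1 * rng.standard_normal(len(x))

res = minimize(neg_log_evidence, x0=np.zeros(3), args=(x, t))
kappa_hat, ell_hat, beta_hat = np.exp(res.x)
print(kappa_hat, ell_hat, beta_hat)
```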