Week 05
Gaussian processes for classification | Non-gaussian likelihoods
Complex, non-linear decision boundaries → a Gaussian process can learn them
Probabilistic classification: model uncertainty
Some notation and assumptions
Why do we use an inverse link function σ?
"squeeze" the values of 𝑦(𝐱) from the ℝ to [0,1], to interpret it as a probability
Which inverse link function?
Sigmoid (used here): more robust to outliers
Cumulative distribution function (CDF) of the standard normal distribution (probit): better computational properties
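A minimal numerical sketch of the two candidate inverse link functions (the grid of latent values is arbitrary and just for illustration):

```python
import numpy as np
from scipy.stats import norm

def sigmoid(y):
    """Logistic inverse link: squashes the real line into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-y))

def probit(y):
    """Standard normal CDF (probit inverse link)."""
    return norm.cdf(y)

# Both map arbitrary latent values y(x) onto [0, 1], i.e. onto probabilities.
y = np.linspace(-6.0, 6.0, 7)
print(np.round(sigmoid(y), 3))  # heavier tails -> less sensitive to extreme latent values
print(np.round(probit(y), 3))   # saturates faster, but has convenient analytic properties
```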
y follows a Gaussian Process:
About the Prior → Gaussian Process
About the Likelihood
About the Predictive Posterior
We want to make predictions, BUT, as in Bayesian classification, the predictive distribution is analytically intractable → Laplace approximation
1. Express the posterior distribution for new data points p(y*|t,x*)
Laplace approximation on p(y|t):
Then apply the equations for linear Gaussian models, since p(y*|y,x*) is a conditional Gaussian density:
2. Compute the predictive distribution for classification labels
Two methods:
Monte Carlo sampling: generate S samples from the posterior distribution of the new latent value y*, then compute the mean of the squashed samples
Probit approximation: approximate the sigmoid function with the normal CDF Φ so that the expectation can be computed analytically (a code sketch of both methods, including the Laplace step, follows below)
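Below is a minimal sketch of the whole pipeline on 1-D toy data. It assumes a zero-mean GP prior with a squared-exponential kernel and a Bernoulli/sigmoid likelihood; the data, kernel hyperparameters, and helper names (rbf_kernel, laplace_mode, predict) are invented for illustration.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel (an assumed choice for this sketch)."""
    d2 = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

def laplace_mode(K, t, n_iter=50):
    """Newton iterations for the mode of p(y | t) under a Bernoulli/sigmoid likelihood."""
    n = len(t)
    y = np.zeros(n)
    for _ in range(n_iter):
        pi = sigmoid(y)
        W = np.diag(pi * (1.0 - pi))      # negative Hessian of the log-likelihood
        grad = t - pi                      # gradient of the log-likelihood
        # Newton step: y <- K (I + W K)^(-1) (W y + grad)
        y = K @ np.linalg.solve(np.eye(n) + W @ K, W @ y + grad)
    return y

def predict(x, t, x_star, n_samples=2000, seed=0):
    """Predictive class probabilities at x_star via MC sampling and the probit approximation."""
    rng = np.random.default_rng(seed)
    K = rbf_kernel(x, x) + 1e-6 * np.eye(len(x))   # jitter for numerical stability
    y_hat = laplace_mode(K, t)
    pi = sigmoid(y_hat)
    W_inv = np.diag(1.0 / (pi * (1.0 - pi) + 1e-9))

    # Conditional (linear) Gaussian equations for the latent y* at the test points
    k_star = rbf_kernel(x, x_star)                 # shape (n, m)
    m_star = k_star.T @ (t - pi)                   # predictive mean of y*
    v_star = rbf_kernel(x_star, x_star).diagonal() \
             - np.einsum('ij,ij->j', k_star, np.linalg.solve(K + W_inv, k_star))

    # (1) Monte Carlo: sample y*, squash through the sigmoid, average
    samples = rng.normal(m_star[:, None], np.sqrt(v_star)[:, None],
                         size=(len(m_star), n_samples))
    p_mc = sigmoid(samples).mean(axis=1)

    # (2) Probit approximation: E[sigmoid(y*)] ~ sigmoid(kappa * m*), kappa = (1 + pi v*/8)^(-1/2)
    kappa = 1.0 / np.sqrt(1.0 + np.pi * v_star / 8.0)
    p_probit = sigmoid(kappa * m_star)
    return p_mc, p_probit

# Toy 1-D data set: class 0 on the left, class 1 on the right
x = np.array([-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0])
t = np.array([0., 0., 0., 0., 1., 1., 1., 1.])
p_mc, p_probit = predict(x, t, np.array([-3.0, 0.0, 3.0]))
print(p_mc)       # the two approximations should agree closely
print(p_probit)
```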
APPLICATION IN 1D
Observations:
Predictions are less extreme than those of a neural network with MAP estimation
MC and Probit: quite similar approximations
Other existing likelihoods for various applications:
Use regression for everything!
→ GLM = Generalized Linear Models
Example: Bayesian Poisson regression
What do we need?
1. A linear model
2. A link function, relating the output of the linear model to the mean of the targets t(x) we want to predict
3. An appropriate distribution p(t|x) for the targets t(x)
In our example: we use a Poisson likelihood, since the targets are count data (see the sketch below)
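A minimal MAP sketch of such a Poisson GLM, assuming a one-dimensional linear model w0 + w1·x, a log link (so the inverse link is exp), a Gaussian prior on the weights, and invented toy count data:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

# Toy count data (invented): counts whose rate grows with x
rng = np.random.default_rng(1)
x = np.linspace(0.0, 2.0, 50)
t = rng.poisson(np.exp(0.5 + 1.2 * x))            # true rate = exp(0.5 + 1.2 x)

Phi = np.column_stack([np.ones_like(x), x])        # 1. linear model: w0 + w1 * x

def neg_log_posterior(w, alpha=1.0):
    """Negative log posterior: Poisson log-likelihood plus a Gaussian prior on w."""
    eta = Phi @ w                                  # linear predictor
    lam = np.exp(eta)                              # 2. inverse of the log link: rate = exp(eta)
    log_lik = np.sum(t * eta - lam - gammaln(t + 1.0))   # 3. Poisson likelihood for the counts
    log_prior = -0.5 * alpha * w @ w
    return -(log_lik + log_prior)

w_map = minimize(neg_log_posterior, x0=np.zeros(2)).x
print("MAP weights:", w_map)                       # should land near the true (0.5, 1.2)
print("Predicted mean count at x = 1.5:", np.exp(w_map @ np.array([1.0, 1.5])))
```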