Week 05
Gaussian processes for classification | Non-gaussian likelihoods
Complex, non-linear decision boundaries → a Gaussian process can learn them
Probabilistic classification: model uncertainty
Some notation and assumptions
Why do we use an inverse link function σ?
"squeeze" the values of 𝑦(𝐱) from the ℝ to [0,1], to interpret it as a probability
Which inverse link function?
Sigmoid (used here): more robust to outliers
Cumulative distribution function (CDF) of the standard normal distribution (probit): better computational properties
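A minimal numerical sketch of the two candidate inverse link functions (the grid of latent values is arbitrary and just for illustration):

```python
import numpy as np
from scipy.stats import norm

def sigmoid(y):
    """Logistic inverse link: squashes the real line into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-y))

def probit(y):
    """Standard normal CDF (probit inverse link)."""
    return norm.cdf(y)

# Both map arbitrary latent values y(x) onto [0, 1], i.e. onto probabilities.
y = np.linspace(-6.0, 6.0, 7)
print(np.round(sigmoid(y), 3))  # heavier tails -> less sensitive to extreme latent values
print(np.round(probit(y), 3))   # saturates faster, but has convenient analytic properties
```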
y follows a Gaussian Process:
About the Prior → Gaussian Process
About the Likelihood
About the Predictive Posterior
We want to make predictions, BUT, as in Bayesian classification, the predictive distribution is analytically intractable → Laplace approximation
1. Express the posterior distribution for new data points p(y*|t,x*)
Laplace approximation on p(y|t):
Then apply the equations for linear Gaussian models, since p(y*|y,x*) is a conditional Gaussian density:
2. Compute the predictive distribution for classification labels
Two methods:
Monte Carlo sampling: generate S samples from the posterior distribution of the new latent value y*, then compute the mean of the squashed samples
Probit approximation: approximate the sigmoid function with the normal CDF Φ so that the expectation can be computed analytically (a code sketch of both methods, including the Laplace step, follows below)
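Below is a minimal sketch of the whole pipeline on 1-D toy data. It assumes a zero-mean GP prior with a squared-exponential kernel and a Bernoulli/sigmoid likelihood; the data, kernel hyperparameters, and helper names (rbf_kernel, laplace_mode, predict) are invented for illustration.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel (an assumed choice for this sketch)."""
    d2 = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

def laplace_mode(K, t, n_iter=50):
    """Newton iterations for the mode of p(y | t) under a Bernoulli/sigmoid likelihood."""
    n = len(t)
    y = np.zeros(n)
    for _ in range(n_iter):
        pi = sigmoid(y)
        W = np.diag(pi * (1.0 - pi))      # negative Hessian of the log-likelihood
        grad = t - pi                      # gradient of the log-likelihood
        # Newton step: y <- K (I + W K)^(-1) (W y + grad)
        y = K @ np.linalg.solve(np.eye(n) + W @ K, W @ y + grad)
    return y

def predict(x, t, x_star, n_samples=2000, seed=0):
    """Predictive class probabilities at x_star via MC sampling and the probit approximation."""
    rng = np.random.default_rng(seed)
    K = rbf_kernel(x, x) + 1e-6 * np.eye(len(x))   # jitter for numerical stability
    y_hat = laplace_mode(K, t)
    pi = sigmoid(y_hat)
    W_inv = np.diag(1.0 / (pi * (1.0 - pi) + 1e-9))

    # Conditional (linear) Gaussian equations for the latent y* at the test points
    k_star = rbf_kernel(x, x_star)                 # shape (n, m)
    m_star = k_star.T @ (t - pi)                   # predictive mean of y*
    v_star = rbf_kernel(x_star, x_star).diagonal() \
             - np.einsum('ij,ij->j', k_star, np.linalg.solve(K + W_inv, k_star))

    # (1) Monte Carlo: sample y*, squash through the sigmoid, average
    samples = rng.normal(m_star[:, None], np.sqrt(v_star)[:, None],
                         size=(len(m_star), n_samples))
    p_mc = sigmoid(samples).mean(axis=1)

    # (2) Probit approximation: E[sigmoid(y*)] ~ sigmoid(kappa * m*), kappa = (1 + pi v*/8)^(-1/2)
    kappa = 1.0 / np.sqrt(1.0 + np.pi * v_star / 8.0)
    p_probit = sigmoid(kappa * m_star)
    return p_mc, p_probit

# Toy 1-D data set: class 0 on the left, class 1 on the right
x = np.array([-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0])
t = np.array([0., 0., 0., 0., 1., 1., 1., 1.])
p_mc, p_probit = predict(x, t, np.array([-3.0, 0.0, 3.0]))
print(p_mc)       # the two approximations should agree closely
print(p_probit)
```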
APPLICATION IN 1D
Observations:
Predictions are less extreme than those of a neural network with MAP estimation
MC and Probit: quite similar approximations
Other existing likelihoods for various applications:
Use regression for everything!
→ GLM = Generalized Linear Models
Example: Bayesian Poisson regression
What do we need?
1. A linear model
2. A link function, relating the output of the linear model to the mean of the targets t(x) we want to predict
3. An appropriate distribution p(t|x) for the targets t(x)
In our example: we use a Poisson likelihood, since the targets are count data (see the sketch below)
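A minimal MAP sketch of such a Poisson GLM, assuming a one-dimensional linear model w0 + w1·x, a log link (so the inverse link is exp), a Gaussian prior on the weights, and invented toy count data:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

# Toy count data (invented): counts whose rate grows with x
rng = np.random.default_rng(1)
x = np.linspace(0.0, 2.0, 50)
t = rng.poisson(np.exp(0.5 + 1.2 * x))            # true rate = exp(0.5 + 1.2 x)

Phi = np.column_stack([np.ones_like(x), x])        # 1. linear model: w0 + w1 * x

def neg_log_posterior(w, alpha=1.0):
    """Negative log posterior: Poisson log-likelihood plus a Gaussian prior on w."""
    eta = Phi @ w                                  # linear predictor
    lam = np.exp(eta)                              # 2. inverse of the log link: rate = exp(eta)
    log_lik = np.sum(t * eta - lam - gammaln(t + 1.0))   # 3. Poisson likelihood for the counts
    log_prior = -0.5 * alpha * w @ w
    return -(log_lik + log_prior)

w_map = minimize(neg_log_posterior, x0=np.zeros(2)).x
print("MAP weights:", w_map)                       # should land near the true (0.5, 1.2)
print("Predicted mean count at x = 1.5:", np.exp(w_map @ np.array([1.0, 1.5])))
```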