Week 13
Regression modelling with heteroscedastic noise | Deep ensembles | Last-layer Laplace approximations
How to choose the w parameters?
CLASSICAL APPROACH
Maximum A Posteriori
→ pick one optimal set of parameters
BAYESIAN APPROACH
Integrate over w
→ average the predictions over all parameter values, weighted by their posterior probability
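The two approaches can be written side by side. This is a sketch in assumed notation that does not appear on the slides: D is the training set, x a new input and y its target.

```latex
% Classical (MAP): keep a single parameter vector and predict with it
w_{\mathrm{MAP}} = \arg\max_{w} \, p(w \mid D),
\qquad
p(y \mid x, D) \approx p(y \mid x, w_{\mathrm{MAP}})

% Bayesian: average the predictions of all w, weighted by the posterior
p(y \mid x, D) = \int p(y \mid x, w) \, p(w \mid D) \, \mathrm{d}w
```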
The Bayesian approach can also quantify the epistemic uncertainty
Epistemic uncertainty = lack of knowledge, not enough data (reducible)
Aleatoric uncertainty = measurement noise (irreducible)
The classical approach is usually overconfident in regions where data is missing
Example below with classification problem (from Week 5)
Let's see an application of Bayesian Neural Networks!
We study regression models with a Gaussian likelihood as follows:
Heteroscedastic = the measurement noise (variance) varies with the input
(Opposite) Homoscedastic = constant noise, independent of the data
→ Fully connected neural network, with two outputs y1, y2 and two hidden layers
We associate the Gaussian mean to the first output (y1) and the log variance to the second one (y2).
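A minimal NumPy sketch of this setup, under stated assumptions: the layer sizes, the tanh activation and the function names (`init_params`, `forward`, `gaussian_nll`) are illustrative choices, not taken from the slides. The first output is read as the Gaussian mean and the second as the log variance, so the variance exp(y2) is always positive.

```python
import numpy as np

def init_params(rng, sizes=(1, 16, 16, 2)):
    """Random weights for an MLP with two hidden layers and two outputs.
    The sizes are an assumption made for illustration."""
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]

def forward(x, params):
    """Return (mean, log_var): output y1 is the mean, y2 the log variance."""
    h = x
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)          # hidden layers
    W_out, b_out = params[-1]
    out = h @ W_out + b_out             # linear output layer, shape (N, 2)
    return out[:, 0], out[:, 1]

def gaussian_nll(y, mean, log_var):
    """Average negative log-likelihood of y under N(mean, exp(log_var))."""
    var = np.exp(log_var)
    return np.mean(0.5 * (np.log(2.0 * np.pi) + log_var + (y - mean) ** 2 / var))
```

Minimising `gaussian_nll` plus a penalty coming from the prior would give a MAP estimate. Predicting the log variance rather than the variance itself avoids having to constrain the second output.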
Which model parameters do we need to approximate?
About the Prior
About the Likelihood
As mentioned previously:
About the Posterior
We cannot compute it directly
high dimension of w, complexity of posterior geometry, very large datasets
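For reference, Bayes' rule for the posterior, written in assumed notation (D for the data set; the slides may use different symbols). The normalising integral in the denominator runs over every possible w, which is what makes the direct computation intractable.

```latex
p(w \mid D) = \frac{p(D \mid w) \, p(w)}{\int p(D \mid w') \, p(w') \, \mathrm{d}w'}
```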
Let's approximate the posterior distribution!
Train our model S times with different seeds, i.e. different initial parameter values
Obtain the S resulting parameter vectors w, and average their predictions
Note: with S = 1, this reduces to MAP estimation
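One common way to combine the S trained networks, assumed here rather than stated on the slides, is to treat them as a uniform mixture of Gaussians and match its first two moments. The sketch below takes the (mean, log variance) outputs of each member and is only an illustration; `ensemble_predict` is a hypothetical name.

```python
import numpy as np

def ensemble_predict(means, log_vars):
    """Combine S heteroscedastic predictions, each of shape (N,).

    means, log_vars: arrays of shape (S, N), one row per ensemble member.
    Returns the mixture mean, the total variance, and its two parts.
    """
    means = np.asarray(means, dtype=float)
    variances = np.exp(np.asarray(log_vars, dtype=float))
    mean = means.mean(axis=0)
    aleatoric = variances.mean(axis=0)   # average predicted noise
    epistemic = means.var(axis=0)        # disagreement between members
    return mean, aleatoric + epistemic, aleatoric, epistemic
```

With S = 1 the disagreement term is zero, so only the predicted noise remains, which matches the MAP case.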
Recall: the Laplace approximation
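In its standard form, the Laplace approximation replaces the posterior by a Gaussian centred at the MAP estimate, obtained from a second-order Taylor expansion of the log posterior. The symbols q, H and D below are assumed notation.

```latex
p(w \mid D) \approx q(w) = \mathcal{N}\!\left(w \mid w_{\mathrm{MAP}}, \, H^{-1}\right),
\qquad
H = - \nabla_w^2 \log p(w \mid D) \,\Big|_{w = w_{\mathrm{MAP}}}
```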
BUT the size of the Hessian matrix grows quadratically with the number of parameters: too large for a full network!
Idea: apply the Laplace approximation to the last layer of our NN only
Often almost as good as a Laplace approximation of the full network
Much cheaper to compute
Can be easily added after training a model
Use Maximum A Posteriori estimation to fit the parameter vector w of the full network
Apply a Laplace approximation to the last layer only
Estimate the Hessian matrix with respect to the last-layer parameters, W2 and b2
Construct the Gaussian approximation over W2 and b2
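The last two steps can be sketched in NumPy under simplifying assumptions that are not from the slides: only the weights of the mean output are treated, the noise variances are held fixed at their MAP values, and the prior is an isotropic Gaussian with precision `prior_precision`. Under these assumptions the Hessian of the negative log posterior has a closed form. The function names are hypothetical.

```python
import numpy as np

def last_layer_laplace(features, noise_var, prior_precision):
    """Covariance of the Gaussian approximation over last-layer weights.

    features: (N, D) last hidden activations, with a column of ones
              appended so that the bias is part of the weight vector.
    noise_var: (N,) noise variances, held fixed (assumption).
    """
    d = features.shape[1]
    # Hessian of the negative log posterior for a linear-Gaussian layer
    hessian = (features / noise_var[:, None]).T @ features
    hessian += prior_precision * np.eye(d)
    return np.linalg.inv(hessian)

def predictive(phi, w_map, cov):
    """Predictive mean and epistemic variance for new features phi (M, D)."""
    mean = phi @ w_map
    epistemic = np.einsum("nd,de,ne->n", phi, cov, phi)
    return mean, epistemic
```

The epistemic variance returned by `predictive` would be added to the predicted noise variance to obtain the total predictive variance. Because the Hessian here is only D by D, the inversion stays cheap even when the rest of the network is large.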