Week 06
Multi-class classification | Decision theory | Calibration
Classification with K classes
About the Likelihood
Categorical distributions
About the Posterior Predictive
It is another categorical distribution
→ As in the binary classification case, we use the Laplace approximation to compute the posterior, then estimate the posterior predictive probabilities by Monte Carlo sampling or the probit approximation
We predict using the most likely class, i.e. the class k that maximizes p(t* = k | t, x*)
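The Monte Carlo route above can be sketched as follows. This is a minimal illustration, assuming a Laplace-approximated Gaussian posterior over flattened softmax-regression weights; `w_map`, `Sigma`, and `x_star` are hypothetical placeholders, not values from the course.

```python
# Sketch: Monte Carlo estimate of the categorical posterior predictive.
# Assumes a Laplace posterior N(w_map, Sigma) over flattened K x D weights.
import numpy as np

rng = np.random.default_rng(0)

K, D = 3, 4                      # number of classes, input dimension
w_map = rng.normal(size=K * D)   # hypothetical MAP weights (flattened)
Sigma = 0.1 * np.eye(K * D)      # hypothetical Laplace covariance
x_star = rng.normal(size=D)      # new input

def softmax(a):
    a = a - a.max()              # numerical stability
    e = np.exp(a)
    return e / e.sum()

# Draw S weight samples from the Gaussian posterior and average the
# resulting softmax probabilities -> categorical posterior predictive.
S = 2000
samples = rng.multivariate_normal(w_map, Sigma, size=S)
probs = np.mean([softmax(w.reshape(K, D) @ x_star) for w in samples], axis=0)

prediction = int(np.argmax(probs))   # most likely class
```

Averaging the per-sample softmax outputs, rather than the logits, is what makes this an estimate of the posterior predictive distribution.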
Purpose: find the optimal decision, and know when it's better not to choose
Need to measure uncertainty
CONFIDENCE
The probability of the most likely class under the posterior predictive distribution
ENTROPY
A measure of uncertainty: zero for a point mass, maximal for a uniform distribution
REJECT OPTION
A condition on confidence to know when it is better not to choose any option
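The three notions above can be sketched in a few lines. This is an illustrative implementation, not the course's code; the 0.7 confidence threshold is an arbitrary choice.

```python
# Sketch: confidence, entropy, and a reject option for a categorical
# posterior predictive distribution p.
import numpy as np

def entropy(p):
    """Shannon entropy in nats; 0 for a one-hot distribution."""
    p = np.asarray(p, dtype=float)
    # where=p > 0 avoids log(0); those terms contribute 0 to the sum
    return -np.sum(p * np.log(p, where=p > 0, out=np.zeros_like(p)))

def decide(p, threshold=0.7):
    """Return the most likely class, or None (reject) if confidence is low."""
    confidence = np.max(p)           # probability of the most likely class
    return int(np.argmax(p)) if confidence >= threshold else None

decide([0.9, 0.05, 0.05])   # confident -> class 0
decide([0.4, 0.35, 0.25])   # low confidence -> None (reject)
```

High entropy and low confidence both signal an input on which it may be better not to choose any option.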
Let's make it Bayesian by introducing a utility function!
Example with the 0/1-utility function, represented as a matrix indexed by true and predicted classes
How do we use the utility function in practice to predict the targets?
Compute the posterior predictive distribution p(t*|t,x*), which contains all our knowledge about the new input given the observations
Choose the target that maximizes the posterior expected utility:
Utility matrix = identity matrix → all choices have the same importance
Assign a negative utility to predicting green (1) when the true target is red (0)
→ the decision region for predicting green shrinks, so predicting red is favoured
Assign a positive utility to predicting blue (2) when the true target is yellow (3), and vice versa
→ blue and yellow decision regions merge
Set the utility of a correct green prediction to zero
→ predicting green is now worthless
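The decision rule and the effect of the utility matrix can be sketched as below. The probabilities and the penalty value are hypothetical; rows index the true class and columns the predicted class, as in the 0/1-utility matrix example.

```python
# Sketch: decision by maximizing posterior expected utility.
import numpy as np

def decide(p, U):
    """Pick the prediction k maximizing sum_j p(j) * U[j, k]."""
    expected_utility = np.asarray(p) @ np.asarray(U)
    return int(np.argmax(expected_utility))

p = np.array([0.45, 0.55])        # posterior predictive, favours class 1

U_01 = np.eye(2)                  # identity utility -> most likely class
decide(p, U_01)                   # -> 1 (green)

# Penalize predicting green (1) when the truth is red (0): the green
# decision region shrinks and the decision flips to red.
U_pen = np.array([[1.0, -2.0],
                  [0.0,  1.0]])
decide(p, U_pen)                  # -> 0 (red)
```

With the identity utility the rule reduces to picking the most likely class; changing one entry of the matrix is enough to move the decision boundary.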
How confident is our model about its predictions?
Example: a multi-class image classification
4 classes: dogs, berries, birds and flowers
Model calibration: trained model + post-processing operation = improved probability estimates
Interpretation:
Among the samples for which the model predicts "dog" with probability 0.8, I expect about 80% to actually be labelled "dog"
How to compute the calibration curve of a given class?
Compute the predictive probabilities on a validation set
Divide the [0, 1] interval into N segments of equal size
For each segment: compute the fraction of positive samples and the mean predicted probability
Plot f(predicted probability) = fraction of positive samples
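The binning steps above can be sketched as follows; the function is an illustrative implementation (scikit-learn provides a similar `calibration_curve`), applied to hypothetical validation probabilities and binary labels.

```python
# Sketch: calibration curve for one class from validation predictions.
import numpy as np

def calibration_curve(probs, labels, n_bins=10):
    """Return (mean predicted probability, fraction of positives) per bin."""
    probs, labels = np.asarray(probs), np.asarray(labels)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each probability to one of the N equal-size segments of [0, 1]
    bins = np.clip(np.digitize(probs, edges) - 1, 0, n_bins - 1)
    mean_pred, frac_pos = [], []
    for b in range(n_bins):
        mask = bins == b
        if mask.any():                      # skip empty segments
            mean_pred.append(probs[mask].mean())
            frac_pos.append(labels[mask].mean())
    return np.array(mean_pred), np.array(frac_pos)
```

A perfectly calibrated model gives a curve on the diagonal: the fraction of positives in each bin matches the mean predicted probability.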
Idea (Platt scaling): fit a logistic regression model σ(A·z + B) to the binary targets, using the predicted class probabilities z as inputs
Estimate optimal A and B scalar parameters using maximum likelihood
Re-calibrate predictive probabilities
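The three re-calibration steps can be sketched as below. This is a minimal maximum-likelihood fit of the two scalars by plain gradient descent (the course does not specify the optimizer); the over-confident validation scores are synthetic.

```python
# Sketch: Platt scaling — fit sigma(A*z + B) by maximum likelihood.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def fit_platt(z, y, lr=0.1, n_steps=5000):
    """Estimate scalars A, B minimizing the negative log-likelihood."""
    A, B = 1.0, 0.0
    for _ in range(n_steps):
        p = sigmoid(A * z + B)
        grad = p - y                 # gradient of the NLL w.r.t. the logit
        A -= lr * np.mean(grad * z)
        B -= lr * np.mean(grad)
    return A, B

# Hypothetical over-confident scores: the true positive frequency
# grows more slowly than the predicted probability z.
rng = np.random.default_rng(0)
z = rng.uniform(size=1000)
y = (rng.uniform(size=1000) < 0.5 * z + 0.25).astype(float)

A, B = fit_platt(z, y)
calibrated = sigmoid(A * z + B)      # re-calibrated probabilities
```

At the maximum-likelihood solution the mean re-calibrated probability matches the empirical positive rate, which is exactly what the calibration curve checks bin by bin.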