Week 03
Generative and discriminative classification | Logistic regression | Laplace approximations
What is the probability that a new data point belongs to a given class?
Generative classification: model the distribution of each class in the dataset D to return the probability of a given sample
Goal: explain how the data is generated
How: model the joint probability p(x, y) = p(x | y) p(y), then apply Bayes' rule to obtain the class posterior p(y | x)
Pros: can generate new data points, easily handles missing data
Cons: relies on strong distributional assumptions, sensitive to outliers
Discriminative classification: make predictions on unseen data based on the conditional probability p(y | x)
Goal: find a decision boundary to separate classes
How: assume a functional form for the posterior and estimate its parameters from the training data (see the example after this list)
Pros: robust to outliers, often better calibrated, easy to make flexible
Cons: cannot handle missing data
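A common instance of such an assumed posterior form (developed in the logistic regression part below) is the logistic sigmoid of a linear function:

p(y = 1 | x, w) = σ(wᵀx + w0),  with σ(a) = 1 / (1 + exp(−a))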
We assume the class-conditional distributions are multivariate normal:
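For class c with mean μ_c and covariance Σ (taken as shared across classes here, an assumption that later yields a linear decision boundary):

p(x | y = c) = N(x | μ_c, Σ)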
Application of Bayes' rule:
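For a class c, this reads:

p(y = c | x) = p(x | y = c) p(y = c) / p(x)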
About the Prior
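Writing π_c = p(y = c) for the class proportions, with ∑_c π_c = 1; a natural (maximum likelihood) choice is the empirical frequency π̂_c = N_c / N, where N_c counts the samples of class c.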
About the Marginal Density
Mixture distribution with the sum rule:
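p(x) = ∑_c p(x | y = c) p(y = c) = ∑_c π_c N(x | μ_c, Σ)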
About the Posterior
Some calculation details:
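Assuming two classes with the shared-covariance Gaussians above, Bayes' rule yields a logistic sigmoid whose argument is linear in x (the quadratic terms cancel because Σ is shared):

p(y = 1 | x) = σ(a),  a = ln [ p(x | y = 1) π_1 / (p(x | y = 0) π_0) ] = wᵀx + w0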
About the Classification Rule
We maximize the posterior class probability and estimate the optimal parameters w0 and w:
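Under the same two-class, shared-covariance assumption, the standard closed forms are:

w = Σ⁻¹(μ_1 − μ_0)
w0 = −½ μ_1ᵀ Σ⁻¹ μ_1 + ½ μ_0ᵀ Σ⁻¹ μ_0 + ln(π_1 / π_0)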
Assumption: the posterior class probability follows the logistic model p(y = 1 | x, w) = σ(wᵀx)
We use Bayesian inference to estimate the weights w, our model parameters.
About the Prior
We first assume that the weights are independent and identically distributed (i.i.d.).
→ hyperparameter α, the prior precision on the logistic regression weights
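Concretely, i.i.d. zero-mean Gaussian weights with shared precision α correspond to the isotropic prior:

p(w | α) = N(w | 0, α⁻¹ I)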
About the Likelihood
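With binary labels y_n ∈ {0, 1} and the logistic model, the likelihood is a product of Bernoulli terms:

p(D | w) = ∏_n σ(wᵀx_n)^{y_n} (1 − σ(wᵀx_n))^{1 − y_n}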
About the Posterior
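By Bayes' rule, the posterior is proportional to likelihood times prior:

p(w | D) ∝ p(D | w) p(w | α)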
Not a conjugate model: the posterior probability density is analytically intractable
→ Laplace approximation!
Idea: approximate the posterior distribution with a Gaussian centered at the Maximum A Posteriori (MAP) estimate
Method explained:
1. Locate the mode of the posterior distribution, i.e. the MAP estimate w_MAP
2. Evaluate the Hessian A of the negative log posterior at w_MAP
3. Approximate the posterior with the Gaussian q(w) = N(w | w_MAP, A⁻¹) (see the sketch below)
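A minimal NumPy/SciPy sketch of these three steps for Bayesian logistic regression; the synthetic dataset, the default prior precision alpha = 1.0, and the function names are illustrative assumptions, not part of the course material:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_posterior(w, X, y, alpha):
    """Negative log posterior (up to an additive constant) for logistic
    regression with prior p(w | alpha) = N(w | 0, alpha^{-1} I)."""
    logits = X @ w
    # log-likelihood: sum_n [ y_n * a_n - log(1 + exp(a_n)) ], stable form
    log_lik = np.sum(y * logits - np.logaddexp(0.0, logits))
    log_prior = -0.5 * alpha * w @ w
    return -(log_lik + log_prior)

def laplace_approximation(X, y, alpha=1.0):
    """Gaussian approximation q(w) = N(w | w_MAP, A^{-1}) of the posterior."""
    d = X.shape[1]
    # Step 1: locate the mode of the posterior (the MAP estimate)
    w_map = minimize(neg_log_posterior, np.zeros(d), args=(X, y, alpha)).x
    # Step 2: Hessian of the negative log posterior at w_MAP:
    # A = alpha * I + sum_n sigma_n (1 - sigma_n) x_n x_n^T
    s = 1.0 / (1.0 + np.exp(-(X @ w_map)))
    A = alpha * np.eye(d) + X.T @ ((s * (1.0 - s))[:, None] * X)
    # Step 3: return mean and covariance of the Gaussian approximation
    return w_map, np.linalg.inv(A)

# Tiny usage example on hypothetical synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X @ np.array([1.5, -2.0]) > 0).astype(float)
w_map, S = laplace_approximation(X, y)
```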
Advantages
Fast computation
Good approximation results
Limitations
Only for continuous parameters
Gaussian shape only (symmetric, thin tails)
Local (vicinity of MAP estimator)
About the Predictive Posterior
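For a new input x*, the predictive posterior averages the sigmoid over the (approximate) weight posterior:

p(y = 1 | x*, D) = ∫ σ(wᵀx*) p(w | D) dw ≈ ∫ σ(wᵀx*) q(w) dw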
Problem: this integral has no analytical solution...
→ Evaluation strategies: Monte Carlo sampling / numerical integration / probit approximation (sketched below)
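Under q(w) = N(w | w_MAP, A⁻¹), the activation a = wᵀx* is Gaussian with mean μ_a = w_MAPᵀ x* and variance σ_a² = x*ᵀ A⁻¹ x*; the probit approximation then gives:

p(y = 1 | x*, D) ≈ σ( μ_a / √(1 + π σ_a² / 8) )

A minimal sketch of the Monte Carlo and probit strategies, reusing w_map and S = A⁻¹ from the Laplace sketch above (function names and defaults are illustrative):

```python
import numpy as np

def predict_mc(x_star, w_map, S, n_samples=2000, seed=0):
    """Monte Carlo estimate of p(y = 1 | x*, D): average the sigmoid
    over samples w ~ q(w) = N(w_MAP, S)."""
    rng = np.random.default_rng(seed)
    W = rng.multivariate_normal(w_map, S, size=n_samples)
    return np.mean(1.0 / (1.0 + np.exp(-(W @ x_star))))

def predict_probit(x_star, w_map, S):
    """Probit approximation: sigma(mu_a / sqrt(1 + pi * var_a / 8))."""
    mu_a = w_map @ x_star
    var_a = x_star @ S @ x_star
    return 1.0 / (1.0 + np.exp(-mu_a / np.sqrt(1.0 + np.pi * var_a / 8.0)))
```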