Week 03
Generative and discriminative classification | Logistic regression | Laplace approximations
What is the probability that a new data point belongs to a given class?
Generative classification: model the distribution of each class in the dataset D to return the probability of a given sample
Goal: explain how the data is generated
How: model the joint probability p(x, y) = p(x | y) p(y), then apply Bayes' rule to obtain the class posterior p(y | x)
Pros: can generate new data points, easily handles missing data
Cons: relies on strong distributional assumptions, sensitive to outliers
Discriminative classification: make predictions on unseen data based on the conditional probability p(y | x)
Goal: find a decision boundary to separate classes
How: assume a functional form for the posterior and estimate its parameters from the training data (see the example after this list)
Pros: robust to outliers, often better calibrated, easy to make flexible
Cons: cannot handle missing data
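A common instance of such an assumed posterior form (developed in the logistic regression part below) is the logistic sigmoid of a linear function:

p(y = 1 | x, w) = σ(wᵀx + w0),  with σ(a) = 1 / (1 + exp(−a))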
We assume the class-conditional distributions are multivariate normal:
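For class c with mean μ_c and covariance Σ (taken as shared across classes here, an assumption that later yields a linear decision boundary):

p(x | y = c) = N(x | μ_c, Σ)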
Application of Bayes' rule:
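For a class c, this reads:

p(y = c | x) = p(x | y = c) p(y = c) / p(x)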
About the Prior
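Writing π_c = p(y = c) for the class proportions, with ∑_c π_c = 1; a natural (maximum likelihood) choice is the empirical frequency π̂_c = N_c / N, where N_c counts the samples of class c.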
About the Marginal Density
Mixture distribution with the sum rule:
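p(x) = ∑_c p(x | y = c) p(y = c) = ∑_c π_c N(x | μ_c, Σ)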
About the Posterior
Some calculation details:
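Assuming two classes with the shared-covariance Gaussians above, Bayes' rule yields a logistic sigmoid whose argument is linear in x (the quadratic terms cancel because Σ is shared):

p(y = 1 | x) = σ(a),  a = ln [ p(x | y = 1) π_1 / (p(x | y = 0) π_0) ] = wᵀx + w0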
About the Classification Rule
We maximize the posterior class probability and estimate the optimal parameters w0 and w:
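Under the same two-class, shared-covariance assumption, the standard closed forms are:

w = Σ⁻¹(μ_1 − μ_0)
w0 = −½ μ_1ᵀ Σ⁻¹ μ_1 + ½ μ_0ᵀ Σ⁻¹ μ_0 + ln(π_1 / π_0)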
Assumption: the posterior class probability follows the logistic model p(y = 1 | x, w) = σ(wᵀx)
We use Bayesian inference to estimate the weights w, our model parameters.
About the Prior
We first assume that the weights are independent and identically distributed (i.i.d.).
→ hyperparameter α, the prior precision on the logistic regression weights
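Concretely, i.i.d. zero-mean Gaussian weights with shared precision α correspond to the isotropic prior:

p(w | α) = N(w | 0, α⁻¹ I)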
About the Likelihood
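With binary labels y_n ∈ {0, 1} and the logistic model, the likelihood is a product of Bernoulli terms:

p(D | w) = ∏_n σ(wᵀx_n)^{y_n} (1 − σ(wᵀx_n))^{1 − y_n}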
About the Posterior
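By Bayes' rule, the posterior is proportional to likelihood times prior:

p(w | D) ∝ p(D | w) p(w | α)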
Not a conjugate model: the posterior probability density is analytically intractable
→ Laplace approximation!
Idea: approximate the posterior distribution with a Gaussian centered at the Maximum A Posteriori (MAP) estimate
Method explained:
1. Locate the mode of the posterior distribution, i.e. the MAP estimate w_MAP
2. Evaluate the Hessian A of the negative log posterior at w_MAP
3. Approximate the posterior with the Gaussian q(w) = N(w | w_MAP, A⁻¹) (see the sketch below)
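A minimal NumPy/SciPy sketch of these three steps for Bayesian logistic regression; the synthetic dataset, the default prior precision alpha = 1.0, and the function names are illustrative assumptions, not part of the course material:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_posterior(w, X, y, alpha):
    """Negative log posterior (up to an additive constant) for logistic
    regression with prior p(w | alpha) = N(w | 0, alpha^{-1} I)."""
    logits = X @ w
    # log-likelihood: sum_n [ y_n * a_n - log(1 + exp(a_n)) ], stable form
    log_lik = np.sum(y * logits - np.logaddexp(0.0, logits))
    log_prior = -0.5 * alpha * w @ w
    return -(log_lik + log_prior)

def laplace_approximation(X, y, alpha=1.0):
    """Gaussian approximation q(w) = N(w | w_MAP, A^{-1}) of the posterior."""
    d = X.shape[1]
    # Step 1: locate the mode of the posterior (the MAP estimate)
    w_map = minimize(neg_log_posterior, np.zeros(d), args=(X, y, alpha)).x
    # Step 2: Hessian of the negative log posterior at w_MAP:
    # A = alpha * I + sum_n sigma_n (1 - sigma_n) x_n x_n^T
    s = 1.0 / (1.0 + np.exp(-(X @ w_map)))
    A = alpha * np.eye(d) + X.T @ ((s * (1.0 - s))[:, None] * X)
    # Step 3: return mean and covariance of the Gaussian approximation
    return w_map, np.linalg.inv(A)

# Tiny usage example on hypothetical synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X @ np.array([1.5, -2.0]) > 0).astype(float)
w_map, S = laplace_approximation(X, y)
```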
Advantages
Fast computation
Good approximation results
Limitations
Only for continuous parameters
Gaussian shape only (symmetric, thin tails)
Local (vicinity of MAP estimator)
About the Predictive Posterior
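For a new input x*, the predictive posterior averages the sigmoid over the (approximate) weight posterior:

p(y = 1 | x*, D) = ∫ σ(wᵀx*) p(w | D) dw ≈ ∫ σ(wᵀx*) q(w) dw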
Problem: this integral has no analytical solution...
→ Evaluation strategies: Monte Carlo sampling / numerical integration / probit approximation (sketched below)
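Under q(w) = N(w | w_MAP, A⁻¹), the activation a = wᵀx* is Gaussian with mean μ_a = w_MAPᵀ x* and variance σ_a² = x*ᵀ A⁻¹ x*; the probit approximation then gives:

p(y = 1 | x*, D) ≈ σ( μ_a / √(1 + π σ_a² / 8) )

A minimal sketch of the Monte Carlo and probit strategies, reusing w_map and S = A⁻¹ from the Laplace sketch above (function names and defaults are illustrative):

```python
import numpy as np

def predict_mc(x_star, w_map, S, n_samples=2000, seed=0):
    """Monte Carlo estimate of p(y = 1 | x*, D): average the sigmoid
    over samples w ~ q(w) = N(w_MAP, S)."""
    rng = np.random.default_rng(seed)
    W = rng.multivariate_normal(w_map, S, size=n_samples)
    return np.mean(1.0 / (1.0 + np.exp(-(W @ x_star))))

def predict_probit(x_star, w_map, S):
    """Probit approximation: sigma(mu_a / sqrt(1 + pi * var_a / 8))."""
    mu_a = w_map @ x_star
    var_a = x_star @ S @ x_star
    return 1.0 / (1.0 + np.exp(-mu_a / np.sqrt(1.0 + np.pi * var_a / 8.0)))
```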