Week 02
Bayesian linear regression | Model selection using the marginal likelihood
Supervised machine learning technique
Classical approach: compute the Maximum Likelihood Estimate of the unknown regression weights by minimizing the sum-of-squares error
Example with polynomial regression:
M defines the model complexity (here: polynomial order of the model).
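A minimal sketch of this classical fit, assuming a 1-D input array x, targets t, and a polynomial design matrix (function names are illustrative):

```python
import numpy as np

def polynomial_design_matrix(x, M):
    """Design matrix with columns 1, x, x^2, ..., x^M."""
    return np.vander(x, M + 1, increasing=True)

def fit_mle(x, t, M):
    """Maximum-likelihood (least-squares) weights for a polynomial of order M."""
    Phi = polynomial_design_matrix(x, M)
    # Minimizes the sum-of-squares error ||t - Phi w||^2,
    # which is equivalent to maximizing the Gaussian likelihood.
    w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return w_ml
```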
Model Selection: Underfitting vs. Overfitting
→ Regularization of the error with a penalty parameter λ
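The regularized error then takes the usual ridge form (the notation below is an assumption, following standard conventions):

```latex
\tilde{E}(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\left(t_n - \mathbf{w}^{\mathsf{T}}\boldsymbol{\phi}(x_n)\right)^{2} + \frac{\lambda}{2}\,\mathbf{w}^{\mathsf{T}}\mathbf{w}
```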
How to handle overfitting? How to choose the optimal λ?
Bayesian approach: less prone to overfitting
Can adapt model complexity automatically
Assumption: the Gaussian noise is independent and identically distributed (i.i.d.)
About the Prior
About the Likelihood
About the Posterior
Conjugate model: prior and posterior are both multivariate normal distributions
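Assuming a zero-mean isotropic Gaussian prior with precision α and i.i.d. Gaussian noise with precision β (the zero prior mean is an assumption, but a common choice), the closed-form update is:

```latex
p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1}\mathbf{I}), \qquad
p(\mathbf{t} \mid \mathbf{w}) = \prod_{n=1}^{N} \mathcal{N}\left(t_n \mid \mathbf{w}^{\mathsf{T}}\boldsymbol{\phi}(x_n), \beta^{-1}\right)

p(\mathbf{w} \mid \mathbf{t}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N), \qquad
\mathbf{S}_N^{-1} = \alpha\mathbf{I} + \beta\,\boldsymbol{\Phi}^{\mathsf{T}}\boldsymbol{\Phi}, \qquad
\mathbf{m}_N = \beta\,\mathbf{S}_N\boldsymbol{\Phi}^{\mathsf{T}}\mathbf{t}
```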
Example: linear model, where we estimate the intercept and the slope
Let's add one data point at a time:
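A minimal sketch of that sequential update for the intercept-and-slope model, assuming known α and β (names are illustrative):

```python
import numpy as np

def sequential_posterior(x, t, alpha, beta):
    """Update the Gaussian posterior over (intercept, slope) one observation at a time."""
    S_inv = alpha * np.eye(2)      # prior precision: p(w) = N(0, alpha^-1 I)
    m = np.zeros(2)                # prior mean
    for xn, tn in zip(x, t):
        phi = np.array([1.0, xn])  # basis functions: intercept and slope
        S_inv_new = S_inv + beta * np.outer(phi, phi)
        S_new = np.linalg.inv(S_inv_new)
        m = S_new @ (S_inv @ m + beta * tn * phi)
        S_inv = S_inv_new
        yield m, S_new             # posterior mean and covariance after this point
```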
About the Predictive Posterior
We average over all possible parameter values, weighted by the posterior.
Two terms in the predictive posterior variance:
first one = posterior uncertainty projected onto data space (epistemic/reducible)
second one = measurement noise (aleatoric/irreducible)
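In the same notation, the predictive distribution at a new input x_* and its variance (epistemic term first, noise term second) are:

```latex
p(t_* \mid x_*, \mathbf{t}) = \mathcal{N}\left(t_* \mid \mathbf{m}_N^{\mathsf{T}}\boldsymbol{\phi}(x_*),\; \sigma_N^{2}(x_*)\right), \qquad
\sigma_N^{2}(x_*) = \boldsymbol{\phi}(x_*)^{\mathsf{T}}\mathbf{S}_N\,\boldsymbol{\phi}(x_*) + \frac{1}{\beta}
```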
α: prior precision of the weights
If α increases, the prior has more influence on the posterior in the "prior-likelihood compromise", but the predictive distribution does not change much
If α tends to 0, then the posterior mean converges to the maximum-likelihood estimate of the weights:
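In the notation above, with Φ the design matrix:

```latex
\alpha \to 0: \qquad \mathbf{m}_N \to \left(\boldsymbol{\Phi}^{\mathsf{T}}\boldsymbol{\Phi}\right)^{-1}\boldsymbol{\Phi}^{\mathsf{T}}\mathbf{t} = \mathbf{w}_{\mathrm{ML}}
```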
β: precision of the measurement noise
If β increases, the likelihood has more influence on the posterior
If β tends to 0, then the posterior distribution tends to the prior distribution:
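Formally, in the same notation:

```latex
\beta \to 0: \qquad \mathbf{S}_N^{-1} \to \alpha\mathbf{I}, \quad \mathbf{m}_N \to \mathbf{0}, \quad \text{so that } p(\mathbf{w} \mid \mathbf{t}) \to p(\mathbf{w})
```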
QUESTION: HOW TO CHOOSE THESE HYPERPARAMETERS?
A fully Bayesian treatment would place priors over α and β and integrate them out, but this cannot be done analytically...
Solution: maximize the marginal likelihood (evidence) of the model
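For this Gaussian model the log marginal likelihood is available in closed form (here D denotes the number of weight parameters, i.e. M + 1 for a polynomial of order M):

```latex
\ln p(\mathbf{t} \mid \alpha, \beta) = \frac{D}{2}\ln\alpha + \frac{N}{2}\ln\beta - E(\mathbf{m}_N) - \frac{1}{2}\ln\left|\mathbf{S}_N^{-1}\right| - \frac{N}{2}\ln(2\pi),
\qquad
E(\mathbf{m}_N) = \frac{\beta}{2}\left\|\mathbf{t} - \boldsymbol{\Phi}\mathbf{m}_N\right\|^{2} + \frac{\alpha}{2}\,\mathbf{m}_N^{\mathsf{T}}\mathbf{m}_N
```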
With arbitrary hyperparameters (α, β) = (10, 1)
vs.
After marginal likelihood maximization (α, β) = (0.176, 138.63)
But maximizing the marginal likelihood can be done for another purpose...
Maximize the marginal likelihood to determine the model complexity M
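A minimal sketch of this model-selection procedure, evaluating the closed-form evidence for several polynomial orders (function names and the fixed α, β defaults are illustrative; in practice α and β would be re-optimized for each M):

```python
import numpy as np

def log_evidence(Phi, t, alpha, beta):
    """Closed-form log marginal likelihood of the Bayesian linear model."""
    N, D = Phi.shape
    S_inv = alpha * np.eye(D) + beta * Phi.T @ Phi
    m = beta * np.linalg.solve(S_inv, Phi.T @ t)   # posterior mean
    E_m = 0.5 * beta * np.sum((t - Phi @ m) ** 2) + 0.5 * alpha * m @ m
    return (0.5 * D * np.log(alpha) + 0.5 * N * np.log(beta) - E_m
            - 0.5 * np.linalg.slogdet(S_inv)[1] - 0.5 * N * np.log(2 * np.pi))

def select_order(x, t, orders, alpha=1e-2, beta=10.0):
    """Return the polynomial order with the highest evidence, plus all scores."""
    scores = {M: log_evidence(np.vander(x, M + 1, increasing=True), t, alpha, beta)
              for M in orders}
    return max(scores, key=scores.get), scores
```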