Week 01
Bayesian inference, estimators and posterior summaries | Conjugacy | The beta-binomial model
Classical approach: search for the best model parameters by minimizing the sum of squared errors
BUT for a given finite dataset, many different sets of parameters could fit the data
→ different predictions
THUS Bayesian machine learning:
consider ALL potential sets of parameters
compute the probability of those parameters given the data
Based on Bayes' rule: p(w|y) = p(y|w) p(w) / p(y)
Everything is a probability distribution!
Take uncertainty into account + better decision-making
Prior p(w): prior belief about the parameters before seeing the data
Likelihood p(y|w): the probability that our parameters generate the observed data
Marginalization p(y): normalization constant, p(y) = ∫ p(y|w) p(w) dw
Posterior p(w|y): contains all knowledge about the parameters after seeing the data
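A minimal sketch (not from the notes) of this pipeline on a grid of candidate parameters; the counts are invented for illustration:

```python
import numpy as np

w = np.linspace(0.0, 1.0, 1001)         # ALL candidate parameter values
prior = np.ones_like(w)                 # flat prior p(w)
N, N1 = 40, 6                           # invented data: 6 successes in 40 trials
likelihood = w**N1 * (1 - w)**(N - N1)  # p(y|w), up to a constant factor
unnorm = likelihood * prior             # p(y|w) p(w)
posterior = unnorm / unnorm.sum()       # normalizing plays the role of p(y)

print("posterior mean:", (w * posterior).sum())
```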
Posterior, all knowledge? YES!
→ Posterior summaries:
Mean (Bayes estimator)
Credible interval to evaluate uncertainty
Mode (Maximum A Posteriori estimator)
Where does prior knowledge come from?
Previous experiments (earlier iterations), expert knowledge
Likelihood and prior function forms vary with the situation
[Definition] Conjugacy: when the posterior and the prior are in the same probability distribution family.
Let's study an example of a Bayesian conjugate model!
For estimating proportions
Example: an A/B test to estimate click-rates
Is A better than B?
What is the probability that the click-rate is below 10%?
How much am I certain about these conclusions?
We want to estimate the parameter µ, the success probability (e.g., the click-rate).
About the Prior
Beta distribution (PDF) → 2 hyperparameters a and b: Beta(µ | a, b) ∝ µ^(a−1) (1−µ)^(b−1)
About the Likelihood
Binomial distribution (PMF - Probability Mass Function): p(N₁ | µ) = C(N, N₁) µ^N₁ (1−µ)^(N−N₁), for N₁ successes out of N trials
About the Posterior
p(µ | D) = Beta(µ | a + N₁, b + N₀), with N₀ = N − N₁
Prior and posterior are conjugate, since they are both Beta distributions!
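A minimal sketch of this conjugate update (the hyperparameters and counts below are invented for illustration):

```python
# Beta-Binomial conjugacy: Beta(a, b) prior + N1 successes out of N trials
# -> Beta(a + N1, b + N0) posterior, with N0 = N - N1.
from scipy.stats import beta

a, b = 2.0, 8.0                   # informative prior: click-rate believed low
N, N1 = 100, 12                   # invented data: 12 clicks in 100 views
N0 = N - N1

posterior = beta(a + N1, b + N0)  # still a Beta distribution: conjugacy
print("posterior mean:", posterior.mean())
```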
Effect of the prior distribution: informative VS weakly informative
About the Posterior Predictive
Obtained by averaging the likelihood of new data over the uncertainty of the posterior: p(y* | D) = ∫ p(y* | µ) p(µ | D) dµ
For one new trial: p(y* = 1 | D) = (a + N₁) / (a + b + N)
All information about the parameter µ can be found in the posterior distribution.
Posterior mean as Bayes estimator
Credible interval as uncertainty
Interval such that the posterior probability that µ belongs to it is 0.95 (for a 95% interval)
Posterior mode as MAP estimator
(Maximum A Posteriori)
Estimate at the highest posterior density, i.e. the most probable parameter value
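Sketch of the three posterior summaries for the Beta posterior above (same invented counts); the MAP uses the closed-form mode of a Beta distribution:

```python
from scipy.stats import beta

a_post, b_post = 2.0 + 12, 8.0 + 88         # Beta(a + N1, b + N0) from above
posterior = beta(a_post, b_post)

mean = posterior.mean()                      # posterior mean (Bayes estimator)
ci = posterior.ppf([0.025, 0.975])           # 95% equal-tailed credible interval
mode = (a_post - 1) / (a_post + b_post - 2)  # MAP: mode of a Beta (a, b > 1)

print(f"mean={mean:.3f}, 95% CI={ci}, MAP={mode:.3f}")
```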
Maximum Likelihood estimator: obtained by maximizing the log-likelihood (classical approach); here µ_ML = N₁ / N
Finally, what is the probability that the click-rate of B is better than that of A, after seeing the data D?
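A Monte Carlo sketch of this comparison (the counts and the flat Beta(1, 1) priors are assumptions for illustration): sample from both posteriors and count how often B beats A.

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)
post_A = beta(1 + 120, 1 + 880)  # A: 120 clicks / 1000 views, flat prior
post_B = beta(1 + 140, 1 + 860)  # B: 140 clicks / 1000 views, flat prior

mu_A = post_A.rvs(100_000, random_state=rng)
mu_B = post_B.rvs(100_000, random_state=rng)
print("P(mu_B > mu_A | D) ≈", (mu_B > mu_A).mean())
```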