Week 07
Markov Chain Monte Carlo Methods | Metropolis-Hastings algorithm
To make predictions, we compute the predictive posterior distribution based on p(w|t), the posterior distribution:
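A sketch of the standard form, assuming t̂ denotes a new target value (conditioning on inputs is left implicit; this notation is not in the original):

$$p(\hat{t} \mid \mathbf{t}) = \int p(\hat{t} \mid \mathbf{w})\, p(\mathbf{w} \mid \mathbf{t})\, d\mathbf{w}$$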
BUT the integral is analytically intractable → approximate it!
Goal: in general terms, estimate the expectation
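In general notation, with a target distribution p(z) and a function f of interest:

$$\mathbb{E}_{p(z)}[f(z)] = \int f(z)\, p(z)\, dz$$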
Indeed, if we rewrite the predictive posterior distribution:
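we recognize exactly such an expectation, with f(w) = p(t̂ | w) and the posterior p(w | t) playing the role of p(z) (notation as assumed above):

$$p(\hat{t} \mid \mathbf{t}) = \mathbb{E}_{p(\mathbf{w} \mid \mathbf{t})}\!\left[\, p(\hat{t} \mid \mathbf{w}) \,\right]$$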
[Definition] Monte-Carlo estimator, a random variable
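A standard way to write it, given S i.i.d. samples z^(1), …, z^(S) from p(z):

$$\hat{f}_S = \frac{1}{S} \sum_{s=1}^{S} f\!\left(z^{(s)}\right), \qquad z^{(s)} \overset{\text{i.i.d.}}{\sim} p(z)$$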
What is the relation of the MC estimator with our target expectation?
→ the Monte Carlo estimator is unbiased
→ the variance decreases as 1/S
Provided the expectation exists, the MC estimator converges to the desired expectation by the law of large numbers, i.e.
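In formulas (standard results for i.i.d. samples; the last is the law-of-large-numbers statement):

$$\mathbb{E}\big[\hat{f}_S\big] = \mathbb{E}_{p(z)}[f(z)], \qquad \operatorname{Var}\big[\hat{f}_S\big] = \frac{1}{S}\operatorname{Var}_{p(z)}[f(z)], \qquad \hat{f}_S \xrightarrow{\,S \to \infty\,} \mathbb{E}_{p(z)}[f(z)]$$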
HOW TO OBTAIN I.I.D. SAMPLES FROM ANY GIVEN DISTRIBUTION?
Existing methods: rejection sampling, importance sampling, ancestral sampling.
Let's forget about i.i.d. samples: use random walks with specific properties instead.
[Definition] 1st order Markov chain
"If you know about the present, forget about the past"
[Definition] Transition kernel T
A transition kernel is homogeneous if it is the same for all m.
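In the usual notation, the kernel is the conditional distribution between successive states:

$$T_m\!\left(z^{(m)}, z^{(m+1)}\right) \equiv p\!\left(z^{(m+1)} \mid z^{(m)}\right)$$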
[Definition] Invariance with respect to Markov chain
A distribution p(z) is invariant wrt. a Markov chain if no step of the chain changes the distribution.
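For a homogeneous chain with kernel T, this reads (the integral becomes a sum for discrete states):

$$p(z) = \int T(z', z)\, p(z')\, dz'$$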
Idea: create a specific random walk that explores the space of the target distribution, where high-density regions are visited more frequently
Markov Chain Monte Carlo [MCMC] methods
refer to the design of a Markov chain such that
the target distribution is invariant wrt. the chain.
BUT HOW TO DESIGN THIS MARKOV CHAIN, IN PRACTICE?
Goal: sample from the target distribution p(θ)
INITIALIZATION
Define a symmetric proposal distribution q (= transition kernel)
For this example, we will use a Gaussian proposal
with parameter τ, the variance of the normal distribution (see the formula after this block)
Define an initial value for θ
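The Gaussian proposal mentioned above, written out (centering the candidate at the current value θ is an assumption, consistent with the step-size interpretation of τ below):

$$q(\theta^* \mid \theta) = \mathcal{N}(\theta^*;\; \theta,\; \tau)$$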
REPETITIVE PROCESS
For k=1 to K:
1. Generate candidate sample using proposal distribution, based on previous θ
2. Compute acceptance probability
Note: by using a symmetric proposal, we can simplify the ratio of posteriors.
3. Sample from a uniform distribution on the unit interval, then define the new θ
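The three steps in formulas, with θ_k the current state and θ* the candidate; since q is symmetric, it cancels in the acceptance ratio:

$$\theta^* \sim q(\theta^* \mid \theta_k), \qquad \alpha = \min\!\left(1,\; \frac{p(\theta^*)}{p(\theta_k)}\right), \qquad u \sim \mathcal{U}(0,1), \quad \theta_{k+1} = \begin{cases} \theta^* & \text{if } u \le \alpha \\ \theta_k & \text{otherwise} \end{cases}$$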
WHAT IS THE WARM-UP PHASE?
To get rid of the influence of the initialization and focus on the high-density region of θ, we discard the first part of the chain.
Here: we discard the first 50% of iterations
The parameter τ controls the noise we add to the previous sample to generate a new candidate sample → step-size
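Putting the whole procedure together, a minimal runnable sketch in Python (NumPy assumed; the standard-normal target used here is a toy choice for illustration, not from the original):

```python
import numpy as np

def metropolis(log_p, theta0, tau, K, warmup_frac=0.5, seed=0):
    """Random-walk Metropolis with a symmetric Gaussian proposal.

    log_p       -- log of the (possibly unnormalized) target density p(theta)
    theta0      -- initial value of theta
    tau         -- variance of the Gaussian proposal (the step size)
    K           -- number of iterations
    warmup_frac -- fraction of initial samples discarded as warm-up
    """
    rng = np.random.default_rng(seed)
    theta = theta0
    samples = np.empty(K)
    for k in range(K):
        # 1. Candidate: Gaussian noise with variance tau around the current theta
        candidate = theta + rng.normal(0.0, np.sqrt(tau))
        # 2. Acceptance probability, on the log scale for numerical stability;
        #    the symmetric proposal cancels, leaving the ratio of target densities
        log_alpha = min(0.0, log_p(candidate) - log_p(theta))
        # 3. Draw u ~ Uniform(0, 1) and accept the candidate if u <= alpha
        if np.log(rng.uniform()) < log_alpha:
            theta = candidate
        samples[k] = theta
    # Warm-up phase: discard the first portion of the chain (50% by default)
    return samples[int(warmup_frac * K):]

# Toy usage: target is a standard normal, known only up to a constant
draws = metropolis(lambda t: -0.5 * t**2, theta0=5.0, tau=1.0, K=10_000)
print(draws.mean(), draws.std())  # ≈ 0 and ≈ 1
```

Note the step-size trade-off: a small τ gives high acceptance but slow exploration, while a large τ proposes bold moves that are often rejected.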
PROS
Strong mathematical guarantee: the chain converges to the exact target distribution (asymptotically)
Easy implementation
Easy to evaluate different models
CONS
Convergence may be very slow for some difficult distributions
Proposal distribution may require tuning
Slow for large datasets (each iteration evaluates the posterior on the full dataset)