Week 11
Black-box variational inference | Stochastic optimization
Recall - Variational Inference method
Goal: approximate a target distribution (posterior)
Idea: use a collection of "simple" distributions to get as close as possible to our target distribution by minimizing the distance
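In symbols (a standard formulation, using the t/w notation from later slides): pick the member of the variational family Q closest in KL divergence to the posterior, which is equivalent to maximizing the ELBO.

```latex
q^{*} = \arg\min_{q_{\lambda} \in \mathcal{Q}} \mathrm{KL}\big(q_{\lambda}(w)\,\|\,p(w \mid t)\big)
      = \arg\max_{q_{\lambda} \in \mathcal{Q}} \underbrace{\mathbb{E}_{q_{\lambda}}\big[\log p(t, w)\big] + \mathbb{H}\big[q_{\lambda}\big]}_{\mathrm{ELBO}(\lambda)}
```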
About the variational family
Free-form variational inference
Optimal function form given assumptions
BUT some issues:
Require model-specific derivations
Integrals may be intractable
Optimal forms may not be "well-known" distributions
Now, the goal is: maximize the ELBO over a fixed-form variational family (e.g. Gaussian), using only evaluations of the log-joint log p(t, w)
About entropy calculation
The entropy and its gradient have simple closed forms for Gaussian distributions:
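As a sketch of why this term is cheap: for a diagonal Gaussian the entropy depends only on the (log) standard deviations, and its gradient is constant (function names here are illustrative).

```python
import numpy as np

def gaussian_entropy(log_sigma):
    """Entropy of a diagonal Gaussian N(mu, diag(exp(log_sigma)**2)).

    H = d/2 * (1 + log(2*pi)) + sum_i log(sigma_i);
    note it does not depend on the mean mu at all.
    """
    d = log_sigma.shape[0]
    return 0.5 * d * (1.0 + np.log(2.0 * np.pi)) + np.sum(log_sigma)

def gaussian_entropy_grad(log_sigma):
    """Gradient of the entropy w.r.t. log_sigma: exactly 1 per coordinate
    (and 0 w.r.t. mu), so this part of the ELBO gradient is free."""
    return np.ones_like(log_sigma)
```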
Now, let's focus on the remaining term of ELBO
About the expectation of the joint distribution of t and w
Monte Carlo sampling
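The expectation E_q[log p(t, w)] rarely has a closed form, but since we can sample from q we can estimate it by Monte Carlo. A minimal sketch, assuming a hypothetical toy log-joint (standard normal prior on w, Gaussian likelihood for t):

```python
import numpy as np

rng = np.random.default_rng(0)

def log_joint(w, t=1.0):
    # Hypothetical toy log p(t, w): w ~ N(0, 1) prior, t ~ N(w, 1) likelihood
    # (constants dropped, as they do not affect the optimization).
    return -0.5 * w**2 - 0.5 * (t - w)**2

def mc_expectation(mu, sigma, n_samples=10_000):
    """Monte Carlo estimate of E_{q(w)}[log p(t, w)] with q = N(mu, sigma^2)."""
    w = rng.normal(mu, sigma, size=n_samples)
    return np.mean(log_joint(w))
```

For this toy model the expectation is available analytically, which makes it easy to check the estimator.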
What about the gradient?
We use the score function gradient estimator
We cannot use Monte Carlo sampling directly, since the gradient is taken with respect to the variational parameters that define the sampling distribution itself: the gradient cannot simply be moved inside the expectation
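The score function (REINFORCE) identity works around this: grad_lambda E_q[f(w)] = E_q[f(w) * grad_lambda log q(w; lambda)], and the right-hand side can be estimated by sampling. A sketch for a 1-D Gaussian family, differentiating w.r.t. the mean only (function names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def score_grad_mu(f, mu, sigma, n_samples=100_000):
    """Score function estimate of d/dmu E_{N(mu, sigma^2)}[f(w)].

    Uses grad_mu log q(w) = (w - mu) / sigma^2; only evaluations of f
    are needed, so f can be a black box.
    """
    w = rng.normal(mu, sigma, size=n_samples)
    score = (w - mu) / sigma**2
    return np.mean(f(w) * score)
```

With f(w) = w**2 the true gradient is d/dmu (mu^2 + sigma^2) = 2*mu, which the estimator should recover up to sampling noise.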
Application: approximation of the posterior distribution of a linear Gaussian model
Last steps of BBVI algorithm:
"Estimate gradient"
"Update variational parameters using gradient estimate"...
BUT with a constant step-size, stochastic gradient ascent doesn't converge: the gradient noise keeps the iterates oscillating around the optimum
Now, the step-size decreases at each iteration t
Many recent methods for stochastic optimization, with Adam as the most common one
With an unbiased gradient estimator, the Robbins-Monro conditions (step-sizes satisfying sum alpha_t = infinity and sum alpha_t^2 < infinity) guarantee convergence
High gradient variance forces a small step-size, so lower variance means faster optimization
Many variational families can be re-parametrized.
Example of re-parametrization with a Gaussian distribution
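For a Gaussian, the re-parametrization writes w = mu + sigma * eps with eps ~ N(0, 1), so the variational parameters move out of the sampling distribution and the gradient can pass inside the expectation. A sketch of the resulting pathwise estimator for the mean parameter (names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

def reparam_grad_mu(f, df, mu, sigma, n_samples=1000):
    """Re-parametrized (pathwise) estimate of d/dmu E_{N(mu, sigma^2)}[f(w)].

    With w = mu + sigma * eps, eps ~ N(0, 1), the chain rule gives
    d/dmu E[f(w)] = E[f'(mu + sigma * eps)] -- unlike the score function
    estimator, this needs the derivative df of f.
    """
    eps = rng.normal(size=n_samples)
    return np.mean(df(mu + sigma * eps))
```

Note it already gets close to the true gradient with far fewer samples than the score function estimator needed above.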
Visual comparison of convergence for both gradient estimators after re-parametrization
Score function gradient estimator:
more general (only needs evaluations of the log-joint), but has high variance, so optimization is slow
Re-parametrized gradient estimator:
lower variance, but requires a differentiable model and is only applicable to continuous latent variables
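The variance gap in this summary can be checked empirically on a toy problem: both estimators below target d/dmu E_{N(mu,1)}[w^2] = 2*mu, but the per-sample variance of the score function estimates is much larger than that of the pathwise ones (the setup and constants are illustrative).

```python
import numpy as np

rng = np.random.default_rng(4)

mu, sigma, n = 1.5, 1.0, 100_000
eps = rng.normal(size=n)
w = mu + sigma * eps

# Per-sample score function estimates of d/dmu E[w^2] (true value 2*mu = 3).
score_samples = w**2 * (w - mu) / sigma**2
# Per-sample re-parametrized estimates: d/dw w^2 = 2w evaluated at w = mu + sigma*eps.
reparam_samples = 2.0 * w

print("score variance:  ", np.var(score_samples))
print("reparam variance:", np.var(reparam_samples))
```

Both sample means agree with the true gradient; only their spread differs, which is exactly what drives the convergence-speed difference.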