Variational Bayes
using the R package VBmix

Matt Moores and Zoé van Havre

Bayesian Research & Applications Group
Queensland University of Technology, Brisbane, Australia
CRICOS provider no. 00213J

Thursday October 11, 2012

Outline

  1  Variational Bayes
       Introduction
       univariate Gaussian
       mixture of Gaussians

  2  VBmix

Exact Inference

  When the posterior distribution is analytically tractable
      e.g. Normal distribution with natural conjugate priors:

      p(\theta \mid Y) = p(\mu, \sigma^2 \mid Y) = p(\mu \mid \sigma^2, Y) \, p(\sigma^2 \mid Y)    (1)

                       \sim \mathrm{N}\left(m', \frac{\sigma^2}{\nu'}\right) \mathrm{IG}(a', b')    (2)

  where

      \nu' = \nu_0 + n
      m'   = \frac{1}{\nu'}(\nu_0 m_0 + n \bar{y})
      a'   = a_0 + \frac{n}{2}
      b'   = b_0 + \frac{1}{2}\left[ \sum_{i=1}^{n} (y_i - \bar{y})^2 + \frac{\nu_0 n (\bar{y} - m_0)^2}{\nu_0 + n} \right]

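  The updates are easy to verify numerically. A minimal sketch in R (the prior
  hyperparameters and the simulated data are illustrative assumptions, not
  values from the talk):

      # Conjugate Normal-Inverse-Gamma update (equations 1-2)
      set.seed(42)
      y <- rnorm(50, mean = 1, sd = 0.5)    # simulated data
      n <- length(y); ybar <- mean(y)
      nu0 <- 1; m0 <- 0; a0 <- 1; b0 <- 1   # assumed prior hyperparameters

      nu1 <- nu0 + n
      m1  <- (nu0 * m0 + n * ybar) / nu1
      a1  <- a0 + n / 2
      b1  <- b0 + 0.5 * (sum((y - ybar)^2) +
                         nu0 * n * (ybar - m0)^2 / (nu0 + n))
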
Approximate Inference



  Stochastic approximation
      Markov chain Monte Carlo
  Analytic approximation
      expectation propagation
      Laplace approximation
      variational Bayes




Variational Bayes

      VB is derived from the calculus of variations (Euler, Lagrange, et al.)
          integration and differentiation of functionals (functions of functions)
      Kullback-Leibler (KL) divergence
          measures the distance between our approximation q(θ)
          and the true posterior distribution p(θ|Y):

      \mathrm{KL}(q \,\|\, p) = -\int q(\theta) \ln\left[\frac{p(\theta \mid Y)}{q(\theta)}\right] d\theta    (3)

Kullback & Leibler (1951) On Information and Sufficiency

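  Equation (3) can be evaluated by numerical integration when both densities
  are known. A minimal sketch in R (both densities are arbitrary choices for
  illustration only):

      # KL divergence of a Gaussian approximation q from a 'true' posterior p
      p <- function(theta) dnorm(theta, mean = 0, sd = 1)      # true posterior
      q <- function(theta) dnorm(theta, mean = 0.2, sd = 1.3)  # approximation
      integrand <- function(theta) -q(theta) * log(p(theta) / q(theta))
      integrate(integrand, -Inf, Inf)$value  # non-negative; zero iff q = p
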
Mean Field Variational Bayes

    If the posterior distribution is analytically intractable,
    approximate it using a distribution that is tractable
        e.g. using mean field theory:

        q(\theta) = \prod_{m=1}^{M} q_m(\theta_m)    (4)

    then minimise the KL divergence using convex optimisation

Parisi (1988) Statistical Field Theory

VB for the univariate Gaussian distribution

  The exact posterior distribution is analytically tractable
  (see equation 1):

      p(\mu, \sigma^2 \mid Y) = p(\mu \mid \sigma^2, Y) \, p(\sigma^2 \mid Y)

  but for the purpose of illustration:

      q(\mu, \sigma^2) = q_\mu(\mu) \times q_{\sigma^2}(\sigma^2)

      q_\mu(\mu) \sim \mathrm{N}\left( \frac{\nu_0 m_0 + n \bar{y}}{\nu_0 + n}, \frac{E[\sigma^2]}{\nu_0 + n} \right)

      q_{\sigma^2}(\sigma^2) \sim \mathrm{IG}\left( a_0 + \frac{n}{2},\ b_0 + \frac{1}{2} E_\mu\!\left[ \sum_{i=1}^{n} (y_i - \mu)^2 + \nu_0 (\mu - m_0)^2 \right] \right)

  This lends itself to estimation via a variant of the EM algorithm.

R code for VB

    while (LB - oldLB > 0.1) {
      # E-step: posterior expectations under the current q
      Emu  <- m_vb
      Etau <- a_vb / b_vb
      # M-step: update the variational parameters
      m_vb <- mean(y)
      n_vb <- n
      a_vb <- n / 2
      b_vb <- (sum((y - Emu)^2) + 1 / Etau) / 2
      # check convergence of the variational lower bound
      oldLB <- LB
      LB <- calcLowerBound(m_vb, n_vb, a_vb, b_vb)
    }

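  The loop assumes that y, the variational parameters, and the bound have
  already been initialised; calcLowerBound is the author's helper for the
  variational lower bound and is not shown on the slide. A minimal
  initialisation sketch (all starting values are illustrative assumptions):

      set.seed(1)
      y <- rnorm(100, mean = 1, sd = 0.5)   # illustrative data
      n <- length(y)
      m_vb <- mean(y); n_vb <- n            # starting values for q
      a_vb <- n / 2;   b_vb <- var(y) * n / 2
      oldLB <- -Inf;   LB <- -1e6           # force at least one iteration
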
VB in action

  [Figure: contour plots of the variational approximation over (µ, τ), with
  µ and τ each on [0, 2]. Panels: iteration 0; iteration 1, lower bound
  −100.6; iteration 2, lower bound −100.2.]

Gaussian Mixture Model

  Likelihood function:

      p(y \mid \lambda, \mu, \sigma^2) = \prod_{i=1}^{n} \sum_{j=1}^{k} \lambda_j \frac{1}{\sqrt{2\pi\sigma_j^2}} \exp\left( -\frac{(y_i - \mu_j)^2}{2\sigma_j^2} \right)

  where

      \sum_{j=1}^{k} \lambda_j = 1

  Natural conjugate priors:

      p(\lambda) \sim \mathrm{Dirichlet}(\alpha)
      p(\mu_j \mid \sigma_j^2) \sim \mathrm{N}\left( m_j, \frac{\sigma_j^2}{\nu_j} \right)
      p(\sigma_j^2) \sim \mathrm{IG}(a_j, b_j)

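  The likelihood is straightforward to evaluate in R. A minimal sketch (the
  function name and the parameter values are illustrative assumptions):

      # Log-likelihood of a univariate Gaussian mixture
      gmm_loglik <- function(y, lambda, mu, sigma2) {
        dens <- sapply(seq_along(lambda), function(j)
          lambda[j] * dnorm(y, mean = mu[j], sd = sqrt(sigma2[j])))
        sum(log(rowSums(dens)))   # sum over j inside the log, then over i
      }
      gmm_loglik(c(0.1, 1.9, 2.1), lambda = c(0.5, 0.5),
                 mu = c(0, 2), sigma2 = c(1, 1))
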
Exact Inference for GMM

    Complexity of the posterior distribution is O(k^n)
        computationally infeasible for more than a small handful of
        observations and mixture components
        back of the envelope:
            if k = 2 and n = 50, it would take approximately 15 min
            on an nVidia Tesla M2050 (1288 GFLOPs peak throughput)
            if k = 2 and n = 100, it would take 31 billion years
    For EM, Gibbs sampling and variational Bayes, we approximate
    the posterior by introducing a matrix Z of indicator variables,
    such that z_{ij} = 1 if y_i has the label j, and z_{ij} = 0 otherwise.

Robert & Mengersen (2011) Exact Bayesian analysis of mixtures

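  The back-of-the-envelope figures check out, assuming one term of the
  O(k^n) posterior sum is evaluated per floating-point operation at the
  quoted 1288 GFLOPs:

      2^50  / 1.288e12 / 60                 # ~15 minutes for k = 2, n = 50
      2^100 / 1.288e12 / (3600 * 24 * 365)  # ~3.1e10 years for k = 2, n = 100
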
Variational Bayes for GMM

  Mean field approximation:

      q(\theta) = q(Z) \times q(\lambda) \prod_{j=1}^{k} q(\mu_j \mid \sigma_j^2) \, q(\sigma_j^2)

  Variational E-step:

      q(Z) = \prod_{i=1}^{n} \prod_{j=1}^{k} \rho_{ij}^{z_{ij}}

      \rho_{ij} = \frac{\omega_{ij}}{\sum_{x=1}^{k} \omega_{ix}}

      \log \omega_{ij} = E[\log \lambda_j] - \frac{1}{2} E[\log \sigma_j^2] - \frac{1}{2} \log 2\pi - \frac{1}{2} E_{\mu_j, \sigma_j^2}\!\left[ \frac{(y_i - \mu_j)^2}{\sigma_j^2} \right]

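  A minimal sketch of this E-step in R, assuming Dirichlet/Normal-Inverse-Gamma
  variational parameters named alpha_vb, m_vb, nu_vb, a_vb, b_vb (all names
  are illustrative, not part of any package API):

      # Responsibilities rho from the current variational distribution q
      e_step <- function(y, alpha_vb, m_vb, nu_vb, a_vb, b_vb) {
        n <- length(y); k <- length(alpha_vb)
        logw <- matrix(0, n, k)
        for (j in 1:k) {
          Elog_lambda <- digamma(alpha_vb[j]) - digamma(sum(alpha_vb))
          Elog_sigma2 <- log(b_vb[j]) - digamma(a_vb[j])  # E[log sigma^2] under IG
          Equad <- (a_vb[j] / b_vb[j]) * (y - m_vb[j])^2 + 1 / nu_vb[j]
          logw[, j] <- Elog_lambda - 0.5 * Elog_sigma2 -
                       0.5 * log(2 * pi) - 0.5 * Equad
        }
        rho <- exp(logw - apply(logw, 1, max))  # stabilise before normalising
        rho / rowSums(rho)
      }
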
Variational Bayes for GMM, continued

  M-step:

      \hat{n}_j = \sum_{i=1}^{n} \rho_{ij} \qquad \bar{y}_j = \frac{1}{\hat{n}_j} \sum_{i=1}^{n} \rho_{ij} y_i \qquad \hat{s}_j^2 = \frac{1}{\hat{n}_j} \sum_{i=1}^{n} \rho_{ij} (y_i - \bar{y}_j)^2

      q(\sigma_j^2) \sim \mathrm{IG}\left( a_0 + \frac{\hat{n}_j}{2},\ b_0 + \frac{\hat{n}_j \hat{s}_j^2}{2} + \frac{\nu_0 \hat{n}_j (\bar{y}_j - m_0)^2}{2(\nu_0 + \hat{n}_j)} \right)

      q(\mu_j \mid \sigma_j^2) \sim \mathrm{N}\left( \frac{\nu_0 m_0 + \hat{n}_j \bar{y}_j}{\nu_0 + \hat{n}_j}, \frac{\sigma_j^2}{\nu_0 + \hat{n}_j} \right)

      q(\lambda_1, \ldots, \lambda_k) \sim \mathrm{Dirichlet}(\alpha_0 + \hat{n}_1, \ldots, \alpha_0 + \hat{n}_k)

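  The matching M-step is a direct transcription of the update equations.
  A minimal sketch, assuming rho is the n × k responsibility matrix from the
  E-step above and the prior hyperparameters are illustrative defaults:

      # Update the variational parameters from the responsibilities rho
      m_step <- function(y, rho, alpha0 = 1, nu0 = 1, m0 = 0, a0 = 1, b0 = 1) {
        nj   <- colSums(rho)
        ybar <- colSums(rho * y) / nj
        s2   <- colSums(rho * outer(y, ybar, "-")^2) / nj
        list(alpha_vb = alpha0 + nj,
             nu_vb    = nu0 + nj,
             m_vb     = (nu0 * m0 + nj * ybar) / (nu0 + nj),
             a_vb     = a0 + nj / 2,
             b_vb     = b0 + nj * s2 / 2 +
                        nu0 * nj * (ybar - m0)^2 / (2 * (nu0 + nj)))
      }
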
VBmix

    An R package by Pierrick Bruneau
        variational Bayesian inference for mixtures of Gaussians
            see §10.2 of Bishop (2006)
        open source (GPL v3)
        implemented in C using the GNU Scientific Library (GSL)
        Windows binary unavailable on CRAN

Christopher M. Bishop (2006) Pattern Recognition and Machine Learning

VBmix for Fisher’s iris data

    install.packages("VBmix")   # requires GSL, Qt, fftw3
    library(VBmix)

    # 3 component mixture of multivariate Gaussians
    # (VB is initialised with 20 components and prunes those it does not need)
    fit_vb <- varbayes(irisdata, ncomp = 20)
    factor(ZtoLabels(fit_vb$model$resp))

    # ground truth
    irislabels

    # fit GMM using maximum likelihood, for comparison
    fit_em <- classicEM(irisdata, 4)
    fit_em$labels

Summary


 VB is an analytic approximation to the posterior distribution
     suited to standard models with natural conjugate priors
          update equations derived using calculus of variations
          to minimise the KL divergence
     algorithm resembles Expectation-Maximisation (EM)
          can become stuck on suboptimal local maxima
     tends to underestimate the uncertainty in the posterior
 The R package VBmix provides fast, approximate inference
 for mixtures of multivariate Gaussians.




For Further Reading

     Christopher M. Bishop.
     Pattern Recognition and Machine Learning.
     Springer, 2006.

     John Ormerod & Matt Wand.
     Explaining Variational Approximations.
     The American Statistician, 64(2): 140–153, 2010.

     Mike Jordan, Zoubin Ghahramani, Tommi Jaakkola & Lawrence Saul.
     An Introduction to Variational Methods for Graphical Models.
     Machine Learning, 37: 183–233, 1999.

     Pierrick Bruneau, Marc Gelgon & Fabien Picarougne.
     Parsimonious reduction of Gaussian mixture models with a
     variational-Bayes approach.
     Pattern Recognition, 43(3): 850–858, 2010.

     Clare McGrory & Mike Titterington.
     Variational approximations in Bayesian model selection for finite mixture
     distributions.
     Computational Statistics & Data Analysis, 51: 5352–5367, 2007.

     Solomon Kullback & Richard Leibler.
     On Information and Sufficiency.
     The Annals of Mathematical Statistics, 22: 79–86, 1951.

     Giorgio Parisi.
     Statistical Field Theory.
     Addison-Wesley, 1988.

     Christian Robert & Kerrie Mengersen.
     Exact Bayesian analysis of mixtures.
     In Mengersen, Robert & Titterington (eds.),
     Mixtures: Estimation and Applications.
     John Wiley & Sons, 2011.
