Variational Bayes
using the R package VBmix

Matt Moores and Zoé van Havre

Bayesian Research & Applications Group
Queensland University of Technology, Brisbane, Australia
CRICOS provider no. 00213J

Thursday October 11, 2012

Outline

  1  Variational Bayes
       Introduction
       univariate Gaussian
       mixture of Gaussians

  2  VBmix

Exact Inference

  When the posterior distribution is analytically tractable
      e.g. Normal distribution with natural conjugate priors:

      p(\theta \mid Y) = p(\mu, \sigma^2 \mid Y) = p(\mu \mid \sigma^2, Y) \, p(\sigma^2 \mid Y)    (1)

                       \sim \mathrm{N}\left(m', \frac{\sigma^2}{\nu'}\right) \mathrm{IG}(a', b')    (2)

  where

      \nu' = \nu_0 + n
      m'   = \frac{1}{\nu'}(\nu_0 m_0 + n \bar{y})
      a'   = a_0 + \frac{n}{2}
      b'   = b_0 + \frac{1}{2}\left[ \sum_{i=1}^{n} (y_i - \bar{y})^2 + \frac{\nu_0 n (\bar{y} - m_0)^2}{\nu_0 + n} \right]

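  The updates are easy to verify numerically. A minimal sketch in R (the prior
  hyperparameters and the simulated data are illustrative assumptions, not
  values from the talk):

      # Conjugate Normal-Inverse-Gamma update (equations 1-2)
      set.seed(42)
      y <- rnorm(50, mean = 1, sd = 0.5)    # simulated data
      n <- length(y); ybar <- mean(y)
      nu0 <- 1; m0 <- 0; a0 <- 1; b0 <- 1   # assumed prior hyperparameters

      nu1 <- nu0 + n
      m1  <- (nu0 * m0 + n * ybar) / nu1
      a1  <- a0 + n / 2
      b1  <- b0 + 0.5 * (sum((y - ybar)^2) +
                         nu0 * n * (ybar - m0)^2 / (nu0 + n))
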
Approximate Inference



  Stochastic approximation
      Markov chain Monte Carlo
  Analytic approximation
      expectation propagation
      Laplace approximation
      variational Bayes




Variational Bayes

      VB is derived from the calculus of variations (Euler, Lagrange, et al.)
          integration and differentiation of functionals (functions of functions)
      Kullback-Leibler (KL) divergence
          measures the distance between our approximation q(θ)
          and the true posterior distribution p(θ|Y):

      \mathrm{KL}(q \,\|\, p) = -\int q(\theta) \ln\left[\frac{p(\theta \mid Y)}{q(\theta)}\right] d\theta    (3)

Kullback & Leibler (1951) On Information and Sufficiency

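  Equation (3) can be evaluated by numerical integration when both densities
  are known. A minimal sketch in R (both densities are arbitrary choices for
  illustration only):

      # KL divergence of a Gaussian approximation q from a 'true' posterior p
      p <- function(theta) dnorm(theta, mean = 0, sd = 1)      # true posterior
      q <- function(theta) dnorm(theta, mean = 0.2, sd = 1.3)  # approximation
      integrand <- function(theta) -q(theta) * log(p(theta) / q(theta))
      integrate(integrand, -Inf, Inf)$value  # non-negative; zero iff q = p
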
Mean Field Variational Bayes

    If the posterior distribution is analytically intractable,
    approximate it using a distribution that is tractable
        e.g. using mean field theory:

        q(\theta) = \prod_{m=1}^{M} q_m(\theta_m)    (4)

    then minimise the KL divergence using convex optimisation

Parisi (1988) Statistical Field Theory

VB for the univariate Gaussian distribution

  The exact posterior distribution is analytically tractable
  (see equation 1):

      p(\mu, \sigma^2 \mid Y) = p(\mu \mid \sigma^2, Y) \, p(\sigma^2 \mid Y)

  but for the purpose of illustration:

      q(\mu, \sigma^2) = q_\mu(\mu) \times q_{\sigma^2}(\sigma^2)

      q_\mu(\mu) \sim \mathrm{N}\left( \frac{\nu_0 m_0 + n \bar{y}}{\nu_0 + n}, \frac{E[\sigma^2]}{\nu_0 + n} \right)

      q_{\sigma^2}(\sigma^2) \sim \mathrm{IG}\left( a_0 + \frac{n}{2},\ b_0 + \frac{1}{2} E_\mu\!\left[ \sum_{i=1}^{n} (y_i - \mu)^2 + \nu_0 (\mu - m_0)^2 \right] \right)

  This lends itself to estimation via a variant of the EM algorithm.

R code for VB

    while (LB - oldLB > 0.1) {
      # E-step: posterior expectations under the current q
      Emu  <- m_vb
      Etau <- a_vb / b_vb
      # M-step: update the variational parameters
      m_vb <- mean(y)
      n_vb <- n
      a_vb <- n / 2
      b_vb <- (sum((y - Emu)^2) + 1 / Etau) / 2
      # check convergence of the variational lower bound
      oldLB <- LB
      LB <- calcLowerBound(m_vb, n_vb, a_vb, b_vb)
    }

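  The loop assumes that y, the variational parameters, and the bound have
  already been initialised; calcLowerBound is the author's helper for the
  variational lower bound and is not shown on the slide. A minimal
  initialisation sketch (all starting values are illustrative assumptions):

      set.seed(1)
      y <- rnorm(100, mean = 1, sd = 0.5)   # illustrative data
      n <- length(y)
      m_vb <- mean(y); n_vb <- n            # starting values for q
      a_vb <- n / 2;   b_vb <- var(y) * n / 2
      oldLB <- -Inf;   LB <- -1e6           # force at least one iteration
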
VB in action

  [Figure: contour plots of the variational approximation over (µ, τ), with
  µ and τ each on [0, 2]. Panels: iteration 0; iteration 1, lower bound
  −100.6; iteration 2, lower bound −100.2.]

Gaussian Mixture Model

  Likelihood function:

      p(y \mid \lambda, \mu, \sigma^2) = \prod_{i=1}^{n} \sum_{j=1}^{k} \lambda_j \frac{1}{\sqrt{2\pi\sigma_j^2}} \exp\left( -\frac{(y_i - \mu_j)^2}{2\sigma_j^2} \right)

  where

      \sum_{j=1}^{k} \lambda_j = 1

  Natural conjugate priors:

      p(\lambda) \sim \mathrm{Dirichlet}(\alpha)
      p(\mu_j \mid \sigma_j^2) \sim \mathrm{N}\left( m_j, \frac{\sigma_j^2}{\nu_j} \right)
      p(\sigma_j^2) \sim \mathrm{IG}(a_j, b_j)

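  The likelihood is straightforward to evaluate in R. A minimal sketch (the
  function name and the parameter values are illustrative assumptions):

      # Log-likelihood of a univariate Gaussian mixture
      gmm_loglik <- function(y, lambda, mu, sigma2) {
        dens <- sapply(seq_along(lambda), function(j)
          lambda[j] * dnorm(y, mean = mu[j], sd = sqrt(sigma2[j])))
        sum(log(rowSums(dens)))   # sum over j inside the log, then over i
      }
      gmm_loglik(c(0.1, 1.9, 2.1), lambda = c(0.5, 0.5),
                 mu = c(0, 2), sigma2 = c(1, 1))
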
Exact Inference for GMM

    Complexity of the posterior distribution is O(k^n)
        computationally infeasible for more than a small handful of
        observations and mixture components
        back of the envelope:
            if k = 2 and n = 50, it would take approximately 15 min
            on an nVidia Tesla M2050 (1288 GFLOPs peak throughput)
            if k = 2 and n = 100, it would take 31 billion years
    For EM, Gibbs sampling and variational Bayes, we approximate
    the posterior by introducing a matrix Z of indicator variables,
    such that z_{ij} = 1 if y_i has the label j, and z_{ij} = 0 otherwise.

Robert & Mengersen (2011) Exact Bayesian analysis of mixtures

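  The back-of-the-envelope figures check out, assuming one term of the
  O(k^n) posterior sum is evaluated per floating-point operation at the
  quoted 1288 GFLOPs:

      2^50  / 1.288e12 / 60                 # ~15 minutes for k = 2, n = 50
      2^100 / 1.288e12 / (3600 * 24 * 365)  # ~3.1e10 years for k = 2, n = 100
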
Variational Bayes for GMM

  Mean field approximation:

      q(\theta) = q(Z) \times q(\lambda) \prod_{j=1}^{k} q(\mu_j \mid \sigma_j^2) \, q(\sigma_j^2)

  Variational E-step:

      q(Z) = \prod_{i=1}^{n} \prod_{j=1}^{k} \rho_{ij}^{z_{ij}}

      \rho_{ij} = \frac{\omega_{ij}}{\sum_{x=1}^{k} \omega_{ix}}

      \log \omega_{ij} = E[\log \lambda_j] - \frac{1}{2} E[\log \sigma_j^2] - \frac{1}{2} \log 2\pi - \frac{1}{2} E_{\mu_j, \sigma_j^2}\!\left[ \frac{(y_i - \mu_j)^2}{\sigma_j^2} \right]

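  A minimal sketch of this E-step in R, assuming Dirichlet/Normal-Inverse-Gamma
  variational parameters named alpha_vb, m_vb, nu_vb, a_vb, b_vb (all names
  are illustrative, not part of any package API):

      # Responsibilities rho from the current variational distribution q
      e_step <- function(y, alpha_vb, m_vb, nu_vb, a_vb, b_vb) {
        n <- length(y); k <- length(alpha_vb)
        logw <- matrix(0, n, k)
        for (j in 1:k) {
          Elog_lambda <- digamma(alpha_vb[j]) - digamma(sum(alpha_vb))
          Elog_sigma2 <- log(b_vb[j]) - digamma(a_vb[j])  # E[log sigma^2] under IG
          Equad <- (a_vb[j] / b_vb[j]) * (y - m_vb[j])^2 + 1 / nu_vb[j]
          logw[, j] <- Elog_lambda - 0.5 * Elog_sigma2 -
                       0.5 * log(2 * pi) - 0.5 * Equad
        }
        rho <- exp(logw - apply(logw, 1, max))  # stabilise before normalising
        rho / rowSums(rho)
      }
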
Variational Bayes for GMM, continued

  M-step:

      \hat{n}_j = \sum_{i=1}^{n} \rho_{ij} \qquad \bar{y}_j = \frac{1}{\hat{n}_j} \sum_{i=1}^{n} \rho_{ij} y_i \qquad \hat{s}_j^2 = \frac{1}{\hat{n}_j} \sum_{i=1}^{n} \rho_{ij} (y_i - \bar{y}_j)^2

      q(\sigma_j^2) \sim \mathrm{IG}\left( a_0 + \frac{\hat{n}_j}{2},\ b_0 + \frac{\hat{n}_j \hat{s}_j^2}{2} + \frac{\nu_0 \hat{n}_j (\bar{y}_j - m_0)^2}{2(\nu_0 + \hat{n}_j)} \right)

      q(\mu_j \mid \sigma_j^2) \sim \mathrm{N}\left( \frac{\nu_0 m_0 + \hat{n}_j \bar{y}_j}{\nu_0 + \hat{n}_j}, \frac{\sigma_j^2}{\nu_0 + \hat{n}_j} \right)

      q(\lambda_1, \ldots, \lambda_k) \sim \mathrm{Dirichlet}(\alpha_0 + \hat{n}_1, \ldots, \alpha_0 + \hat{n}_k)

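  The matching M-step is a direct transcription of the update equations.
  A minimal sketch, assuming rho is the n × k responsibility matrix from the
  E-step above and the prior hyperparameters are illustrative defaults:

      # Update the variational parameters from the responsibilities rho
      m_step <- function(y, rho, alpha0 = 1, nu0 = 1, m0 = 0, a0 = 1, b0 = 1) {
        nj   <- colSums(rho)
        ybar <- colSums(rho * y) / nj
        s2   <- colSums(rho * outer(y, ybar, "-")^2) / nj
        list(alpha_vb = alpha0 + nj,
             nu_vb    = nu0 + nj,
             m_vb     = (nu0 * m0 + nj * ybar) / (nu0 + nj),
             a_vb     = a0 + nj / 2,
             b_vb     = b0 + nj * s2 / 2 +
                        nu0 * nj * (ybar - m0)^2 / (2 * (nu0 + nj)))
      }
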
VBmix

    An R package by Pierrick Bruneau
        variational Bayesian inference for mixtures of Gaussians
            see §10.2 of Bishop (2006)
        open source (GPL v3)
        implemented in C using the GNU Scientific Library (GSL)
        Windows binary unavailable on CRAN

Christopher M. Bishop (2006) Pattern Recognition and Machine Learning

VBmix for Fisher’s iris data

    install.packages("VBmix")   # requires GSL, Qt, fftw3
    library(VBmix)

    # 3 component mixture of multivariate Gaussians
    # (VB is initialised with 20 components and prunes those it does not need)
    fit_vb <- varbayes(irisdata, ncomp = 20)
    factor(ZtoLabels(fit_vb$model$resp))

    # ground truth
    irislabels

    # fit GMM using maximum likelihood, for comparison
    fit_em <- classicEM(irisdata, 4)
    fit_em$labels

Summary


 VB is an analytic approximation to the posterior distribution
     suited to standard models with natural conjugate priors
          update equations derived using calculus of variations
          to minimise the KL divergence
     algorithm resembles Expectation-Maximisation (EM)
          can become stuck on suboptimal local maxima
     tends to underestimate the uncertainty in the posterior
 The R package VBmix provides fast, approximate inference
 for mixtures of multivariate Gaussians.




For Further Reading

     Christopher M. Bishop.
     Pattern Recognition and Machine Learning.
     Springer, 2006.

     John Ormerod & Matt Wand.
     Explaining Variational Approximations.
     The American Statistician, 64(2): 140–153, 2010.

     Mike Jordan, Zoubin Ghahramani, Tommi Jaakkola & Lawrence Saul.
     An Introduction to Variational Methods for Graphical Models.
     Machine Learning, 37: 183–233, 1999.

     Pierrick Bruneau, Marc Gelgon & Fabien Picarougne.
     Parsimonious reduction of Gaussian mixture models with a
     variational-Bayes approach.
     Pattern Recognition, 43(3): 850–858, 2010.

     Clare McGrory & Mike Titterington.
     Variational approximations in Bayesian model selection for finite mixture
     distributions.
     Computational Statistics & Data Analysis, 51: 5352–5367, 2007.

     Solomon Kullback & Richard Leibler.
     On Information and Sufficiency.
     The Annals of Mathematical Statistics, 22: 79–86, 1951.

     Giorgio Parisi.
     Statistical Field Theory.
     Addison-Wesley, 1988.

     Christian Robert & Kerrie Mengersen.
     Exact Bayesian analysis of mixtures.
     In Mengersen, Robert & Titterington (eds.),
     Mixtures: Estimation and Applications.
     John Wiley & Sons, 2011.
