I read the paper by Canini et al.*1, which describes online (sequential) learning for LDA. Section 3.5.2 of the textbook 「トピックモデルによる統計的潜在意味解析」 explains the method from this paper. The PDF of the paper can be downloaded from the web page below.
JMLR W&CP: Volume 5: AISTATS 2009
First, regarding resampling in the particle filter for LDA, the textbook says the following, but this seems to differ from what the paper describes.
Now, as for resampling: in LDA, particles are not created and destroyed by sampling with replacement. The reason given is that each particle is z_{d,i}^{(s)}, so when handling hundreds of thousands of documents the implementation becomes complicated and costly. Canini et al. perform resampling by a method called rejuvenation of past samples.
In the paper's algorithm (Algorithm 4), when the ESS falls below the threshold, resampling is performed first on line 8, and rejuvenation is then performed on lines 9 to 11. The body of the paper says the same:
As in the resample-move algorithm of Gilks and Berzuini (2001), Markov chain Monte Carlo (MCMC) is used after particle resampling to restore diversity to the particle set in the same way that the incremental Gibbs sampler rejuvenates its samples, by choosing a rejuvenation sequence R(i) of topic variables to resample.
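The control flow described above (check the ESS, resample, then rejuvenate) could be sketched roughly as follows. This is only a minimal illustration: the function names, the `rejuvenate_fn` callback, and the choice of multinomial resampling are assumptions of mine, not details taken from the paper's pseudocode.

```python
import copy
import random


def effective_sample_size(weights):
    """ESS = 1 / sum_p w_p^2, computed on normalized weights."""
    total = sum(weights)
    normalized = [w / total for w in weights]
    return 1.0 / sum(w * w for w in normalized)


def maybe_resample(particles, weights, ess_threshold, rejuvenate_fn, rng):
    """Resample and then rejuvenate only when the ESS is below the threshold.

    rejuvenate_fn(particle, rng) is a hypothetical callback that returns
    the rejuvenated particle.
    """
    if effective_sample_size(weights) >= ess_threshold:
        return particles, weights  # diverse enough, nothing to do

    # Multinomial resampling: draw parent indices with replacement.
    count = len(particles)
    parents = rng.choices(range(count), weights=weights, k=count)
    resampled = [copy.deepcopy(particles[i]) for i in parents]

    # Rejuvenation runs after resampling to restore diversity.
    resampled = [rejuvenate_fn(p, rng) for p in resampled]

    # After resampling, all particles carry equal weight.
    return resampled, [1.0 / count] * count
```

With 100 particles and a threshold of 20, for example, resampling is triggered only once the weights have become concentrated on a small share of the particles.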
Looking at the particle filter algorithm for LDA, both equation (3.172) and equation (3.175) in the textbook are written in a form that does not contain z. So if rejuvenation is not performed, there is no need to keep z as part of each particle's state; keeping the count statistics n_{k,v} and n_{d,k} is enough*2. n_{k,v} by itself has a size proportional to the number of distinct words, but it is needed in any case, so the implementation has to find an efficient way to hold it*3. From this, I think the purpose of rejuvenation is, as the paper says, to restore the diversity lost through resampling.
As for the rejuvenation step itself, as the textbook notes, the practical issues are the size of R and how far back in the past observation sequence the w_{l,m} are chosen from. If R is too large, execution slows down; if a long history of past observations is kept, the memory required grows. On this point, the experiments in the paper set the parameters as follows.
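To make the point about count statistics concrete, here is a minimal sketch of one per-token update for a single particle that touches only the counts. It assumes the standard collapsed form for LDA with symmetric hyperparameters `alpha` and `beta`; I have not checked it term by term against equations (3.172) and (3.175), and the function name and argument layout are hypothetical.

```python
import random


def particle_step(n_kv, n_k, n_dk, v, alpha, beta, vocab_size, rng):
    """Process one token with word id v for a single particle.

    n_kv[k][v]: count of word v assigned to topic k (all documents so far)
    n_k[k]:     total count of tokens assigned to topic k
    n_dk[k]:    count of tokens in the current document assigned to topic k
    Returns (weight_factor, topic) and updates the counts in place.
    """
    num_topics = len(n_k)
    n_d = sum(n_dk)
    probs = [
        (n_kv[k][v] + beta) / (n_k[k] + vocab_size * beta)
        * (n_dk[k] + alpha) / (n_d + num_topics * alpha)
        for k in range(num_topics)
    ]
    # The sum over topics is the predictive probability of the word,
    # which multiplies the particle's importance weight.
    weight_factor = sum(probs)

    # Sample the topic for this token; z itself is not stored anywhere.
    topic = rng.choices(range(num_topics), weights=probs, k=1)[0]
    n_kv[topic][v] += 1
    n_k[topic] += 1
    n_dk[topic] += 1
    return weight_factor, topic
```

Note that the sampled topic is used only to update the counts, which is why z does not need to be kept unless rejuvenation (or the output) requires it.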
Parameter | Value |
---|---|
Number of particles | 100 |
ESS threshold | 20 or 10 (varies by dataset) |
R | 30 or 10 (varies by dataset) |
Selection of w_{l,m} | Uniformly at random from all past observations |
As the table shows, the experiments in the paper choose w_{l,m} from all past observations, so with this setting memory consumption appears to grow every time a document is processed.
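The following sketch shows why this setting needs the whole history. It picks R past tokens uniformly at random and resamples each one from its Gibbs conditional, which requires every past token's document, word, and current topic to be kept. The data layout (`history` as a list of `[doc_id, word_id, topic]` entries, per-document counts in `n_dk`) and the function name are my own assumptions, not the paper's.

```python
import random


def rejuvenate_tokens(history, n_kv, n_k, n_dk, num_steps,
                      alpha, beta, vocab_size, rng):
    """Resample the topics of num_steps past tokens chosen uniformly.

    history: list of [doc_id, word_id, topic] for every token seen so far;
             it grows with each processed document.
    n_dk[d][k]: per-document topic counts, kept for all past documents.
    """
    if not history:
        return
    num_topics = len(n_k)
    for _ in range(num_steps):
        token = rng.choice(history)  # uniform over all past tokens
        d, v, old = token

        # Remove the token's current assignment from the counts.
        n_kv[old][v] -= 1
        n_k[old] -= 1
        n_dk[d][old] -= 1

        # Gibbs conditional given all other assignments made so far.
        probs = [
            (n_kv[k][v] + beta) / (n_k[k] + vocab_size * beta)
            * (n_dk[d][k] + alpha)
            for k in range(num_topics)
        ]
        new = rng.choices(range(num_topics), weights=probs, k=1)[0]

        # Put the token back with its new assignment.
        n_kv[new][v] += 1
        n_k[new] += 1
        n_dk[d][new] += 1
        token[2] = new
```

Both `history` and `n_dk` grow linearly with the amount of data processed, and in a particle filter they are needed per particle unless they are shared in some way.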
This memory growth is pointed out in the paper by Börschinger et al.*4, whose PDF is available from the web page below. Börschinger et al. appear to have devised a technique to address it, but it was somewhat difficult and I could not really follow the details.
ACL Anthology » P12
May et al. apply a technique called reservoir sampling to bound the length of the observation sequence kept for rejuvenation*5. The PDF of this paper is available from the web page below. Their method is given as Algorithm 2 in the paper, and it looks comparatively simple.
ACL Anthology » P14
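For reference, classic reservoir sampling keeps a fixed-size uniform sample of a stream without storing the stream itself. The sketch below is the textbook version of that technique (often called Algorithm R); it is not a transcription of Algorithm 2 in May et al., and the class and attribute names are hypothetical.

```python
import random


class Reservoir:
    """Fixed-size uniform sample over a stream of items."""

    def __init__(self, capacity, rng):
        self.capacity = capacity
        self.rng = rng
        self.items = []
        self.seen = 0  # number of items offered so far

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            # Still filling up: keep everything.
            self.items.append(item)
            return
        # Keep the new item with probability capacity / seen,
        # replacing a uniformly chosen slot.
        j = self.rng.randrange(self.seen)
        if j < self.capacity:
            self.items[j] = item


rng = random.Random(0)
reservoir = Reservoir(capacity=10, rng=rng)
for token_position in range(1000):
    reservoir.add(token_position)
```

After the loop, `reservoir.items` holds 10 of the 1000 positions, each position having had the same chance of being retained, so rejuvenation candidates can be drawn from a bounded buffer instead of the full history.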
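However, May et al. also run experiments comparing their method with that of Canini et al., and their conclusion states that rejuvenation is not what matters in the first place; what matters is the initialization by (batch) Gibbs sampling that is run before online learning begins. This leaves me unsure whether rejuvenation is worth doing or not needed at all.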
*1:Kevin R. Canini, Lei Shi, and Thomas L. Griffiths. Online Inference of Topics with Latent Dirichlet Allocation. In Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics. 2009.
*2:If z itself is wanted as output, it does of course have to be kept, but even then I think that for most uses it can be discarded once each document has been processed.
*3:In my own trial implementation, arranging the particles in a tree structure based on their parent-child relationships, as in Figure 1 of the paper, reduced the size considerably.
*4:Benjamin Börschinger and Mark Johnson. Using Rejuvenation to Improve Particle Filtering for Bayesian Word Segmentation. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2012.
*5:Chandler May, Alex Clemmer, and Benjamin Van Durme. Particle Filter Rejuvenation and Latent Dirichlet Allocation. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2014.