This is a continuation of "Implementing the Hierarchical Dirichlet Process (1): Comparing the HDP-LDA and LDA models" (Mi manca qualche giovedi`?).
This time, following [Teh+ 2006], I derive the update equations for Collapsed Gibbs sampling inference of the Hierarchical Dirichlet Process (HDP) within the framework of the Chinese Restaurant Franchise (CRF).
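The plan is roughly this. This post reduces a general HDP to the CRF. The next post derives the full conditionals from there (following [Teh+ 2006], the posterior distributions of t and k). Then, in the post after that or the one after, those update equations get applied to HDP-LDA (that is, the case where the base measure H and the emission(?) F(φ) from the previous post are given concrete distributions).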
First, as mentioned above, we introduce the CRF framework. The reason is that if we stay with the (Hierarchical) Dirichlet Process as is, we would have to sample infinitely many times, whereas with the CRF a finite number of samplings is enough.
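Ideally I would properly explain the Stick-Breaking Process and the CRF here, but I'll skip all of that and jump straight into the CRF notation. So if words like "customer", "table", "dish" and "restaurant" suddenly show up and leave you wondering what on earth they are, I'm truly sorry (wry smile).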
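By the way, in the HDP-LDA case, "restaurant" = document, "dish" = topic, and "customer" = word (x_ji). A "table" is what ties a customer to a dish (that is, a word to a topic). Calling that a metaphor is a stretch, since I don't think the resemblance is very strong, but that is the correspondence, and I'll get away with this rather lame explanation…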
Let K be "the number of dishes that are active at the time of sampling (that is, dishes to which at least one customer or table is assigned)", and let those K dishes be φ_1, …, φ_K.
Let j be the restaurant index, let θ_ji be the dish eaten by the i-th customer of restaurant j, and let ψ_jt be the dish served at the t-th table of restaurant j.
In other words, when customer x_ji sits at table t_ji, and the dish served at table t is dish k_jt, we have

$$\theta_{ji} = \psi_{j t_{ji}}, \qquad \psi_{jt} = \phi_{k_{jt}}.$$
It's a little confusing, but the gist is that θ_ji and ψ_jt are each one of the dishes φ_k that were originally sampled from H.
In the figure, customer 7 of restaurant 1 sits at table 3 and eats dish 1 (the third table from the left in the topmost box).
The idea is to express this as θ_17 = ψ_13 = φ_1.
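To make the index bookkeeping concrete, here is a minimal Python sketch of the two lookup tables t_ji and k_jt. The dictionaries, the helper names `psi`/`theta`, and the dish strings are hypothetical toy values chosen to reproduce the example, not data taken from the figure: following t_ji and then k_jt from a customer reaches a global dish, which is exactly the chain θ_17 = ψ_13 = φ_1.

```python
# Toy bookkeeping for the CRF indices (hypothetical values, 1-based like the text).
phi = {1: "dish-1", 2: "dish-2"}  # global dishes φ_k, assumed already drawn from H
t_ji = {(1, 7): 3}                # t_ji: table at which customer i of restaurant j sits
k_jt = {(1, 3): 1}                # k_jt: index of the dish served at table t of restaurant j


def psi(j, t):
    """ψ_jt = φ_{k_jt}: the dish served at table t of restaurant j."""
    return phi[k_jt[(j, t)]]


def theta(j, i):
    """θ_ji = ψ_{j t_ji}: the dish eaten by customer i of restaurant j."""
    return psi(j, t_ji[(j, i)])


print(theta(1, 7))  # dish-1, i.e. θ_17 = ψ_13 = φ_1
```

Only the integer indices t_ji and k_jt need to be stored; the dishes themselves live once in φ, which is why the sampler later works on t and k.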
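Now, with this setup, the CRF framework in a nutshell is: "a customer who comes into the restaurant tends to sit at a table that already has many customers, but there is still some chance of sitting at a new table."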
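Written as a formula (m_j· is the number of occupied tables in restaurant j):

$$\theta_{ji} \mid \theta_{j1},\dots,\theta_{j,i-1},\alpha_0,G_0 \;\sim\; \sum_{t=1}^{m_{j\cdot}} \frac{n_{jt\cdot}}{i-1+\alpha_0}\,\delta_{\psi_{jt}} + \frac{\alpha_0}{i-1+\alpha_0}\,G_0$$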
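Here n_jtk is "the number of customers in restaurant j, at table t, eating dish k", and an index replaced by a dot denotes the marginal (sum) over that index; for example, n_jt· = Σ_k n_jtk is the number of customers at table t.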
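The formula describes which table the newly arriving i-th customer sits at: it combines "the probability of sitting at the t-th table, which already has customers" and "the probability of sitting at a new table with no customers".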
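The seating rule just described can be sketched in a few lines of Python: an occupied table t gets weight n_jt·, a new table gets weight α_0, and both are divided by i−1+α_0 (the number of customers already seated plus α_0). This is a minimal sketch; `table_probs` and `sample_table` are hypothetical helper names, and tables are 0-based here, with index len(n_jt) meaning "open a new table".

```python
import random


def table_probs(n_jt, alpha0):
    """Seating probabilities for the next (i-th) customer of restaurant j.

    n_jt: customer counts n_jt. of the occupied tables (0-based list here).
    Returns one probability per existing table, plus a last entry for a new table.
    """
    denom = sum(n_jt) + alpha0  # i - 1 customers already seated, plus alpha_0
    return [n / denom for n in n_jt] + [alpha0 / denom]


def sample_table(n_jt, alpha0, rng=random):
    """Draw a table index; the value len(n_jt) means "open a new table"."""
    probs = table_probs(n_jt, alpha0)
    return rng.choices(range(len(probs)), weights=probs)[0]


print(table_probs([3, 1], 1.0))  # [0.6, 0.2, 0.2]
```

A table with more customers gets proportionally more weight, while α_0 keeps the new-table option alive; with no occupied tables at all, the first customer always opens a new table.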
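When the customer sits at a new table, a sample is drawn from the Dirichlet process G_0 built from H. Reducing this to a CRF in the same way gives a formula very similar to the one above:

$$\psi_{jt} \mid \psi_{11},\psi_{12},\dots,\psi_{21},\dots,\psi_{j,t-1},\gamma,H \;\sim\; \sum_{k=1}^{K} \frac{m_{\cdot k}}{m_{\cdot\cdot}+\gamma}\,\delta_{\phi_k} + \frac{\gamma}{m_{\cdot\cdot}+\gamma}\,H$$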
Here m_jk is "the number of tables in restaurant j that serve dish k", and dotted indices mean the same as before.
The rest works the same way: either "the already-served k-th dish" or "a new dish" is chosen with the respective probability, and for a new dish a fresh dish is drawn from the base measure H and K increases by one.
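The dish-level rule can be sketched the same way, except that the weights are table counts pooled over all restaurants: existing dish k gets m_·k, a new dish gets γ, and both are divided by m_··+γ. In this minimal sketch, `marginal_m` and `dish_probs` are hypothetical helpers, m_jk is assumed to be stored as a J×K nested list, and the last returned entry stands for "draw a new dish from H".

```python
def marginal_m(m_jk):
    """m_.k = sum over restaurants j of m_jk (m_jk given as a J x K nested list)."""
    return [sum(col) for col in zip(*m_jk)]


def dish_probs(m_dot_k, gamma):
    """Existing dish k: m_.k / (m_.. + gamma); new dish (last entry): gamma / (m_.. + gamma)."""
    denom = sum(m_dot_k) + gamma  # m_.. + gamma
    return [m / denom for m in m_dot_k] + [gamma / denom]


print(dish_probs(marginal_m([[1, 1], [2, 0]]), 4.0))  # [0.375, 0.125, 0.5]
```

Because the counts are shared across restaurants, a dish that is popular anywhere in the franchise becomes more likely everywhere, which is how the HDP shares topics across documents.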
Note that α_0 and γ are hyperparameters (the concentration parameters of the lower- and upper-level Dirichlet processes), but I'll skip the details here.
This may all seem hopelessly roundabout, but as I wrote at the start, it exists to avoid having to "sample infinitely many times", which is what staying with the plain Dirichlet process would require.
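In other words, if you think of it as "an algorithm that lets you sample a Dirichlet process, which is practically impossible to sample head-on, in a finite number of reasonable steps", a little peace of mind might come to you… or maybe not (wry smile).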
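To see that the CRF only ever needs finitely many draws, here is a minimal generative sketch in Python that seats customers and serves dishes using the two rules described in the text. It simulates the prior generative process only, not the Collapsed Gibbs sampler this series is building toward. `simulate_crf` is a hypothetical name, and `base_draw` is an assumed stand-in for sampling from H (a Gaussian in the usage below).

```python
import random


def simulate_crf(n_customers, alpha0, gamma, base_draw, seed=0):
    """Generative Chinese restaurant franchise (sketch, 0-based indices).

    n_customers: number of customers in each restaurant j.
    base_draw: callable taking an RNG and returning a new dish from H (assumed).
    Returns (phi, k_jt, t_ji): global dishes, dish index per table, table per customer.
    """
    rng = random.Random(seed)
    phi = []   # global dishes phi_k; len(phi) is K
    m_k = []   # m_.k: number of tables serving dish k, over all restaurants
    k_jt, t_ji = [], []
    for n_j in n_customers:
        n_jt, dishes, seats = [], [], []
        for _ in range(n_j):
            # Seat the customer: existing table t with weight n_jt., new table with alpha_0.
            t = rng.choices(range(len(n_jt) + 1), weights=n_jt + [alpha0])[0]
            if t == len(n_jt):
                # New table: existing dish k with weight m_.k, new dish with gamma.
                k = rng.choices(range(len(m_k) + 1), weights=m_k + [gamma])[0]
                if k == len(m_k):
                    phi.append(base_draw(rng))  # new dish from H; K grows by one
                    m_k.append(0)
                m_k[k] += 1
                n_jt.append(0)
                dishes.append(k)
            n_jt[t] += 1
            seats.append(t)
        k_jt.append(dishes)
        t_ji.append(seats)
    return phi, k_jt, t_ji


phi, k_jt, t_ji = simulate_crf([5, 3], 1.0, 1.0, lambda r: r.gauss(0.0, 1.0))
print(len(phi), k_jt, t_ji)
```

Every loop runs a fixed, finite number of times (once per customer), and new tables and dishes are created only when a customer actually needs them, which is the "finite, reasonable procedure" the CRF buys us.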