Overview
This paper compares three topic models: LSA (via SVD), LSA (via NMF), and LDA.
The coherence of the output topics is measured with two recently proposed measures (the UCI measure and the UMass measure), which are applied to the model as a whole by taking their mean and entropy.
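The models are also evaluated by the correlation coefficient between human-assigned similarity for word pairs and the similarity between vectors in topic space, and on a document classification task.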
According to prior work, LSA (via NMF) is apparently similar to PLSA.
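The word-pair evaluation mentioned above can be illustrated with a minimal sketch. It assumes cosine similarity between word vectors in topic space and a Pearson correlation coefficient; the notes only say "correlation coefficient", so a rank correlation could equally be intended. The function names and the toy vectors and ratings are hypothetical.

```python
import math


def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pearson(xs, ys):
    # Pearson correlation coefficient (an assumed choice of coefficient).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def similarity_correlation(word_vectors, human_ratings):
    # human_ratings maps (word_a, word_b) -> human similarity score.
    model = [cosine(word_vectors[a], word_vectors[b]) for a, b in human_ratings]
    human = list(human_ratings.values())
    return pearson(model, human)


# Toy data, invented purely for illustration.
vectors = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]}
ratings = {("cat", "dog"): 9.0, ("cat", "car"): 1.0, ("dog", "car"): 2.0}
print(similarity_correlation(vectors, ratings))
```

A value close to 1 would mean the topic space orders word pairs much as the human raters do.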
Measures of topic coherence
The coherence of a topic is the sum of pairwise scores between the words it contains (so it is defined per topic).
The score takes a smoothing factor as a parameter.
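As a sketch of the general form, and assuming notation that is not taken from the text above (a topic's word set $V$, words $v_i, v_j$, and a smoothing factor $\epsilon$), the coherence of a topic could be written as:

```latex
\mathrm{coherence}(V) = \sum_{(v_i, v_j) \in V} \mathrm{score}(v_i, v_j, \epsilon)
```

The two measures that follow differ in how the pairwise score is defined and in which corpus supplies the statistics.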
UCI measure("Evaluating topic models for digital libraries", Newman et al., 2010)
Sums the PMI (pointwise mutual information) between words in a topic over all pairs of words.
The probabilities underlying the PMI are computed on an external corpus.
UMass measure("Optimizing semantic coherence in topic models", Mimno et al., 2011)
The count term in the score is the number of documents in which the given word appears.
Unlike the UCI measure, this one uses values computed on the very corpus the topic model is applied to.
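Both measures described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's implementation: the UCI score is taken as log((p(v_i, v_j) + eps) / (p(v_i) p(v_j))) and the UMass score as log((D(v_i, v_j) + eps) / D(v_j)), the UCI probabilities are estimated from document-level co-occurrence (the original work may use a different co-occurrence window), and the UMass word list is assumed to be ordered from most to least probable. All function names, the toy corpus, and the `eps` values are hypothetical.

```python
import math
from itertools import combinations


def _doc_sets(docs):
    # Each document is a list of tokens; only presence/absence matters here.
    return [set(doc) for doc in docs]


def _count(doc_sets, *words):
    # Number of documents that contain every one of the given words.
    return sum(all(w in d for w in words) for d in doc_sets)


def uci_coherence(topic_words, external_docs, eps):
    """Sum of smoothed PMI over all word pairs of one topic (assumed form)."""
    sets = _doc_sets(external_docs)
    n = len(sets)
    total = 0.0
    for w1, w2 in combinations(topic_words, 2):
        p1, p2 = _count(sets, w1) / n, _count(sets, w2) / n
        if p1 == 0 or p2 == 0:
            continue  # assumption: ignore pairs with a word unseen externally
        p12 = _count(sets, w1, w2) / n
        total += math.log((p12 + eps) / (p1 * p2))
    return total


def umass_coherence(topic_words, model_docs, eps):
    """Sum of smoothed log conditional document frequencies (assumed form).

    `topic_words` is assumed to be ordered from most to least probable.
    """
    sets = _doc_sets(model_docs)
    total = 0.0
    for m in range(1, len(topic_words)):
        for l in range(m):
            d_l = _count(sets, topic_words[l])
            if d_l == 0:
                continue  # assumption: skip words absent from the corpus
            d_ml = _count(sets, topic_words[m], topic_words[l])
            total += math.log((d_ml + eps) / d_l)
    return total


# Toy corpus, used here as both the external and the modeled corpus.
docs = [["cat", "dog"], ["cat", "dog", "fish"], ["car", "road"], ["car", "fish"]]
print(uci_coherence(["cat", "dog"], docs, eps=0.01))
print(uci_coherence(["cat", "car"], docs, eps=0.01))
print(umass_coherence(["cat", "dog"], docs, eps=1.0))
print(umass_coherence(["cat", "car"], docs, eps=1.0))
```

In this toy example the pair that co-occurs ("cat", "dog") scores higher than the pair that never does ("cat", "car") under both measures. The smoothing factor keeps the logarithm finite when a pair never co-occurs, which is why its size matters for the scores.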
Results
On the coherence measures, the ranking was LDA > LSA (NMF) > LSA (SVD).
However, on the similarity and document classification tasks, LSA (SVD) performed best.
This suggests that LDA generates topics that are easy for humans to interpret by eye, while LSA (SVD) works better as a compact representation.
The UCI and UMass measures were originally proposed for LDA. For LSA, a different, smaller smoothing factor than the one used in the original papers turned out to be appropriate (the paper specifies the value it used).
Thoughts
LSA is an older method than LDA and rarely shows up in recent papers, so I had assumed LDA was better than LSA, but it turns out that is not necessarily the case.
I can more or less see intuitively why LSA (SVD), which has no non-negativity constraint, would come out ahead of LSA (NMF) in accuracy, but I do not really understand why LDA loses to LSA...
This was the first time I learned that topic models can be evaluated from the viewpoint of topic coherence, rather than by perplexity or downstream applications.