People who can't see through lies will have a hard time using Twitter
- A paper on detecting false rumors on Twitter has finally appeared at EMNLP 2011, so I'll introduce it.
- Paper: Rumor has it: Identifying Misinformation in Microblogs [Qazvinian et al., 2011]
- It also includes some interesting statistics about rumors on Twitter, so it's worth a read if you're interested.
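Overview
- Detects rumors and the tweets related to them, and at the same time estimates how widely each rumor is believed
- Experiments with a variety of features
- Simply building a classifier on the tweet text achieves high accuracy!
- However, annotated tweets are used as training data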
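Background
- Rumors spread rapidly on microblogs
- False rumors and misinformation can seriously harm businesses, so we want to identify them automatically
- This study detects false rumors and misinformation in the following steps
- Comprehensively retrieve the tweets that mention a specific rumor [Rumor Retrieval]
- Estimate what proportion of people believe the rumor (the rumor's degree of belief) [Belief classification]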
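Problem Setting and Approach
Task 1: Rumor Retrieval
- Identify the tweets that contain misinformation or false rumors
- Both high precision and high recall are required
- because we want to retrieve only tweets about the specific rumor [precision], and all of them [recall]
- Standard IR methods are not sufficient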
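To make the precision/recall requirement concrete, here is a minimal Python sketch that scores a set of retrieved tweet IDs against the set of tweets that really are about the rumor. The tweet IDs and the function name `precision_recall_f1` are made up for illustration; F1 is the harmonic mean of precision and recall.

```python
def precision_recall_f1(retrieved: set[str], relevant: set[str]) -> tuple[float, float, float]:
    """Score retrieved tweet IDs against the IDs that are truly about the rumor."""
    true_positives = len(retrieved & relevant)
    # Precision: how many of the retrieved tweets are actually about the rumor.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: how many of the rumor tweets we managed to retrieve.
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: 4 tweets retrieved, 3 of them are on-topic,
# and 5 on-topic tweets exist in total.
retrieved = {"t1", "t2", "t3", "t4"}
relevant = {"t1", "t2", "t3", "t5", "t6"}
print(precision_recall_f1(retrieved, relevant))
```

Here precision is 3/4 and recall is 3/5: missing two rumor tweets hurts recall even though most of what was retrieved is correct.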
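Task 2: Belief Classification
- Classify the tweets collected in Task 1 by their stance toward the rumor
- Tweets that believe the rumor
- Tweets that doubt or question the rumor
- This is sentiment analysis on a target ("rumor") whose ground truth is inherently ambiguous
- so this task also requires a carefully designed method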
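As a rough illustration of Task 2, the sketch below trains a plain multinomial Naive Bayes classifier (with add-one smoothing) on unigrams and then computes the share of tweets labeled as believing the rumor. This is not the paper's exact model: the labels `believe`/`doubt`, the toy tweets, and the helper names (`NaiveBayes`, `belief_ratio`) are all hypothetical, and measuring belief per tweet rather than per user is a simplifying assumption.

```python
import math
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercased whitespace tokenization; a real system would use a tweet-aware tokenizer.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes over unigrams with add-one (Laplace) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, text: str) -> str:
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = "", -math.inf
        for label, count in self.class_counts.items():
            # log P(label) + sum over tokens of smoothed log P(token | label)
            score = math.log(count / n_docs)
            for tok in tokenize(text):
                score += math.log(
                    (self.word_counts[label][tok] + 1) / (self.total_words[label] + vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def belief_ratio(labels: list[str]) -> float:
    """Share of rumor-related tweets labeled as believing the rumor."""
    return labels.count("believe") / len(labels) if labels else 0.0


# Hypothetical toy data for a made-up rumor (not from the paper's annotations).
train_texts = [
    "breaking the minister resigned today",
    "wow the minister resigned confirmed",
    "is it true the minister resigned",
    "this rumor is fake not true",
]
train_labels = ["believe", "believe", "doubt", "doubt"]

model = NaiveBayes().fit(train_texts, train_labels)
predicted = [model.predict(t) for t in ["minister resigned confirmed", "is this true"]]
print(predicted)                # ['believe', 'doubt']
print(belief_ratio(predicted))  # 0.5
```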
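Data
- Tweets related to each rumor were collected comprehensively using the Twitter API plus regular expressions (Regexp)
- The regular expressions were written by hand to achieve high recall
- To build training data, the tweets collected above were annotated (10,400 tweets)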
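Here is a minimal sketch of the recall-oriented regex step, assuming a made-up rumor about a minister resigning. The pattern, the tweets, and the helper `retrieve_candidates` are hypothetical (the paper's actual expressions aren't reproduced here); in practice the tweets would come from the Twitter API rather than a hard-coded list.

```python
import re

# Hypothetical hand-written pattern for a made-up rumor ("the minister resigned").
# Two lookaheads make it order-independent: the tweet must mention the
# minister/PM and some form of resign/quit, in either order.
RUMOR_PATTERN = re.compile(
    r"(?=.*\b(?:minister|pm)\b)(?=.*\b(?:resign|quit)\w*)",
    re.IGNORECASE,
)


def retrieve_candidates(tweets: list[str], pattern: re.Pattern) -> list[str]:
    """Keep every tweet matched by the broad, recall-oriented pattern."""
    return [t for t in tweets if pattern.search(t)]


# Hypothetical tweets standing in for an API response.
tweets = [
    "BREAKING: the Minister resigned today",
    "Resignation of the minister?",
    "The weather is nice",
    "The PM is quite busy",
]
print(retrieve_candidates(tweets, RUMOR_PATTERN))
```

Note that "The PM is quite busy" slips through because `quit\w*` also matches "quite". A broad pattern like this trades precision for recall, which is why Task 1 still needs a classifier on top of the regex.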
Method
- Both Task 1 and Task 2: likelihood maximization with Bayes classifiers
- L1-regularized log-linear model [Andrew and Gao, 2007] + OWL-QN [Gao et al., 2007]
- Experiments are run with different combinations of features
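For reference, this is the textbook form of an L1-regularized log-linear model. The symbols (f for the feature function, w for the weight vector, C for the regularization strength, N training tweets) are generic notation, not taken from the paper, and the paper's exact formulation may differ.

```latex
P(y \mid x; \mathbf{w}) = \frac{\exp\big(\mathbf{w}^\top \mathbf{f}(x, y)\big)}{\sum_{y'} \exp\big(\mathbf{w}^\top \mathbf{f}(x, y')\big)}

\hat{\mathbf{w}} = \arg\min_{\mathbf{w}} \; -\sum_{i=1}^{N} \log P(y_i \mid x_i; \mathbf{w}) + C \,\lVert \mathbf{w} \rVert_1
```

Because the L1 term is not differentiable at zero, plain L-BFGS can't be applied directly; OWL-QN (Orthant-Wise Limited-memory Quasi-Newton) is an L-BFGS variant built for this objective. A side effect of the L1 penalty is that many feature weights end up exactly zero, which keeps the model sparse.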
Content-based Features
- Word features [TXT1 : unigram] [TXT2 : bigram]
- Part-of-speech features (+HASHTAG/URL) [POS1 : unigram] [POS2 : bigram]
Network-based Features
- Whether the user who retweeted is Positive or Negative toward the rumor
- Whether the user who was retweeted is Positive or Negative toward the rumor
Twitter Specific Memes
- Hashtag
- URL [URL1 : unigram] [URL2 : bigram]
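A minimal sketch of turning one tweet into the feature types listed above. Assumptions: whitespace tokenization, the feature-name prefixes (`TXT1=`, `HASHTAG=`, `RT_BY=`, and so on), and encoding the network-based features as a stance string ("positive"/"negative") supplied by the caller are all my own hypothetical choices. POS features and URL2 are left out, since they need a tagger and a definition not given here.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical patterns; real tweet tokenization is more involved.
HASHTAG_RE = re.compile(r"#\w+")
URL_RE = re.compile(r"https?://\S+")


def ngrams(tokens: list[str], n: int) -> list[str]:
    """Join each run of n consecutive tokens with a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(
    text: str,
    retweeter_stance: Optional[str] = None,
    retweeted_user_stance: Optional[str] = None,
) -> Counter:
    """Map one tweet to sparse feature counts (feature names are hypothetical)."""
    feats: Counter = Counter()
    urls = URL_RE.findall(text)
    # Strip URLs before word tokenization so they only appear as URL features.
    words = URL_RE.sub(" ", text).lower().split()

    feats.update("TXT1=" + w for w in words)             # word unigrams
    feats.update("TXT2=" + b for b in ngrams(words, 2))  # word bigrams
    feats.update("HASHTAG=" + h.lower() for h in HASHTAG_RE.findall(text))
    feats.update("URL1=" + u for u in urls)              # each URL as one token

    # Network-based features: stance of the users on either side of a retweet.
    if retweeter_stance is not None:
        feats["RT_BY=" + retweeter_stance] += 1
    if retweeted_user_stance is not None:
        feats["RT_OF=" + retweeted_user_stance] += 1
    return feats


features = extract_features(
    "Minister resigned #breaking http://example.com/a",
    retweeter_stance="positive",
)
print(sorted(features))
```

Keeping every feature type behind its own prefix makes it easy to switch groups on and off, which is what the feature-ablation experiments below require.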
Results
- Content-based Features perform best for both Rumor Retrieval and Belief Classification
- F-measure: about 95% (Rumor Retrieval) / 93.2% (Belief Classification)
- Using all features gives roughly the same results
- An experiment on how precision changes with the amount of training data (Figure 2)
- With no training data at all (i.e., detecting a new rumor), precision is about 60%
Related Work
Detecting and Analyzing Rumors (Including False Rumors and Misinformation)
- Analysis of rumors on microblogs [Ratkiewicz et al., 2010]
- Identifying rumors on blogs using quotations [Leskovec et al., 2009]
- The "Truthy" system: identifying political memes on Twitter, including misinformation [Ratkiewicz et al., 2010]
- Analysis of Twitter user behavior during the 2010 Chile earthquake [Mendoza et al., 2010]
- Compares how news and rumors propagate, based on the topology of the RT network
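Sentiment Analysis
- Sentiment classification of movie reviews with machine learning [Pang et al., 2002]
- User polarity analysis on Usenet [Hassan et al., 2010]
- Uses a supervised Markov model, a POS tagger, and dependency patterns
- Estimating sentiment scores for news and blog articles [Godbole et al., 2007]
- Automatic detection of positive/negative (P/N) words
- Survey of sentiment analysis [Pang and Lee, 2008]
- Meme identification [Leskovec et al., 2009]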
Twitter Data Mining
- Research using Twitter data on NLP and information diffusion [Bifet and Frank, 2010]
- Building a corpus for sentiment analysis [Pak and Paroubek, 2010]