An explanation of Complement Naive Bayes, which the new Hatena Bookmark uses too
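This post is to celebrate the official release of the new Hatena Bookmark, although several weeks have already passed since the release.
The new Hatena Bookmark automatically classifies bookmarked entries into categories, and the algorithm behind this classification is apparently Complement Naive Bayes. Today I will try to introduce that algorithm.
Complement Naive Bayes is a method proposed by J. Rennie et al. at ICML 2003. ICML is (probably) the most competitive conference in machine learning, and its acceptance rate has been below 30% for the last few years. In 2003 it was apparently 119/371, an acceptance rate of 32.1%.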
Complement Naive Bayes can be characterized as follows:
- Easy to implement
- Short training time
- Reasonably good performance
In other words, even as of 2003 it lost to SVM in absolute performance. However, fast training matters a great deal in real applications. With older implementations, SVM training could take quite a long time, so once the data gets large, SVM is sometimes not usable at all. At my previous job, when I was building data sets, SVM training time casually grew to several hours as I added data, which made me panic.
Jumping straight into Complement Naive Bayes would probably not make much sense, so I will first explain Naive Bayes and then explain how Complement Naive Bayes differs from it. At first I intended to explain everything rigorously, but a rigorous explanation inevitably requires Bayes' theorem and so on, which is tedious, so this time I compromised and made the explanation slightly inaccurate. If you want more accurate information, read a textbook, ask someone around you, or see the naive Bayes (単純ベイズ) entry on the 朱鷺の杜Wiki.
In Naive Bayes (and in almost every machine learning algorithm, for that matter), processing is split into two phases: training and testing. Here, training is the phase that does the groundwork using data that has already been sorted into categories, and testing is the phase that estimates the category of a document whose category is unknown.
The training phase of Naive Bayes
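In the training phase, we prepare many documents for each category and use them to learn, per category, the occurrence probability of each word. "Occurrence probability" sounds complicated, but in practice we simply record what fraction of all word occurrences in the category belongs to that word. For example, suppose the category "Society" (社会) contains 150 words in total, and 3 of them are the word "president" (社長). Then the occurrence probability of "president" is 3 ÷ 150 × 100 = 2%. (It gets more complicated once you account for words that do not appear in the training data, but I will skip that this time.)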
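The counting step above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `train_word_probabilities`, the toy data, and the assumption that documents are already split into words are mine, and unseen words are ignored just as in the text.

```python
from collections import Counter


def train_word_probabilities(docs_by_category):
    """Return {category: {word: relative frequency}} from pre-tokenized documents."""
    probabilities = {}
    for category, docs in docs_by_category.items():
        # Count every word occurrence across all documents of this category.
        counts = Counter(word for doc in docs for word in doc)
        total = sum(counts.values())
        probabilities[category] = {w: c / total for w, c in counts.items()}
    return probabilities


# Toy data mirroring the example: 150 words in "society", 3 of them "president".
society_docs = [["president"] * 3 + ["other"] * 147]
probs = train_word_probabilities({"society": society_docs})
print(probs["society"]["president"])  # 3 / 150, i.e. 2%
```

Note that no smoothing is applied here, so a word that never appeared in training simply has no entry.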
The test phase of Naive Bayes
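In the test phase, for the document whose category we want to estimate, we use the word occurrence probabilities of each category to compute the occurrence probability of the document, and return the category with the highest probability as the classification result. Something as abstract as "the occurrence probability of a document" cannot be computed exactly, so we approximate it by the product of the occurrence probabilities of the words in the text.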
As an example, consider whether the sentence 「私は社長」 ("I am the president") should be classified into the category "Society" or the category "Net". Suppose we split the sentence into the three words 私 / は / 社長, that their occurrence rates in "Society" are 1%, 5%, and 2%, and that their occurrence rates in "Net" are 1%, 4%, and 3%. Then the occurrence probability of 「私は社長」 in the category "Society" is
0.01 × 0.05 × 0.02 × 100 = 0.001%
while in the category "Net" it is
0.01 × 0.04 × 0.03 × 100 = 0.0012%
Comparing 0.001 with 0.0012, the value for "Net" is larger, so 「私は社長」 is classified into the category "Net".
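The same calculation can be written as a short Python sketch. The function names `document_score` and `classify` and the lowercase category keys are illustrative assumptions; the probabilities are the ones from the example.

```python
import math


def document_score(words, word_probs):
    """Approximate a document's probability by the product of its word probabilities."""
    return math.prod(word_probs[w] for w in words)


def classify(words, probs_by_category):
    """Return the category with the highest approximate document probability."""
    return max(
        probs_by_category,
        key=lambda category: document_score(words, probs_by_category[category]),
    )


probs_by_category = {
    "society": {"私": 0.01, "は": 0.05, "社長": 0.02},
    "net": {"私": 0.01, "は": 0.04, "社長": 0.03},
}
sentence = ["私", "は", "社長"]
print(classify(sentence, probs_by_category))  # net
```

In practice the product of many small probabilities underflows quickly, so implementations usually add up logarithms instead of multiplying. The sketch keeps the plain product to stay close to the arithmetic in the text.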
Characteristics of Naive Bayes
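I glossed over it, but approximating the occurrence probability of a document by the product of word occurrence probabilities is quite a bold approximation, for the following reasons:
- It ignores word order entirely
- It ignores correlations between words entirely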
In other words, a document is treated as a mere collection of words. This bold approximation is why Naive Bayes is called naive. To be fair, an SVM with a linear kernel is sometimes not so different in this respect, but digging deeper leads to kernels, overlapping features, and other complications, so I will leave that out.
Naive Bayes is so simple that at first glance you may wonder whether it can possibly work, but it is known to do quite well in document classification. For example, Bayesian filters were hugely popular in spam filtering for a while, and those are Naive Bayes (or a slight variant of it).
That was a heavily abridged, rather sloppy explanation, but it concludes the description of Naive Bayes.
Characteristics of Complement Naive Bayes
Next, Complement Naive Bayes. "Complement" refers to the complement of a set, meaning "the collection of elements not contained in a given set". Naive Bayes trains each category on "the documents that belong to that category", whereas Complement Naive Bayes trains each category on "the documents that do not belong to that category". Because training uses the documents that do not belong, at estimation time we assign the category with the lowest "probability of not belonging". And, would you believe it, this alone improves the accuracy of category estimation.
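The complement idea can be sketched as follows. This is a minimal illustration under assumptions, not the paper's or Hatena's implementation: the function names, the toy data, the use of log scores, and the add-one smoothing controlled by `alpha` (needed so that unseen words do not produce zero probabilities) are all mine.

```python
import math
from collections import Counter


def train_complement(docs_by_category, alpha=1.0):
    """For each category, estimate word probabilities from documents NOT in it."""
    vocab = {w for docs in docs_by_category.values() for doc in docs for w in doc}
    model = {}
    for category in docs_by_category:
        counts = Counter(
            w
            for other, docs in docs_by_category.items()
            if other != category
            for doc in docs
            for w in doc
        )
        # Add-one style smoothing (an assumption of this sketch).
        total = sum(counts.values()) + alpha * len(vocab)
        model[category] = {w: (counts[w] + alpha) / total for w in vocab}
    return model


def classify_complement(words, model):
    """Pick the category whose complement explains the document worst."""

    def complement_log_score(category):
        probs = model[category]
        # Words outside the training vocabulary are skipped.
        return sum(math.log(probs[w]) for w in words if w in probs)

    return min(model, key=complement_log_score)


docs_by_category = {
    "sports": [["ball", "goal", "team"], ["team", "match", "goal"]],
    "it": [["code", "server", "bug"], ["server", "code", "deploy"]],
    "life": [["cook", "home", "family"]],
}
model = train_complement(docs_by_category)
print(classify_complement(["server", "bug"], model))  # it
```

The only structural changes from plain Naive Bayes are the `other != category` filter during counting and `min` instead of `max` at classification time.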
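Probably only about one reader in ten is convinced by that explanation, so let me go into a little more detail. In problems where documents are classified into several categories, the number of documents per category usually differs widely. For example, among current Hatena Bookmark entries, the "Computers / IT" (コンピュータ・IT) category probably has more than twice as many articles as the "Life" (生活・人生) category. In that case, simply because "Computers / IT" has more articles, the occurrence probability of a document (that is, of the words in the document) tends to come out higher for it. There is a separate correction term for the number of documents in each category, but it does not compensate well for this kind of imbalance.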
To reduce the imbalance, we train not on the documents "contained in a category" but on the documents "not contained in that category". Figure 1 illustrates how using the documents that are not contained reduces the variation in data volume.
It is a rough figure and a little hard to read, but it shows how the amount of data differs between Naive Bayes and Complement Naive Bayes when categories A, B, and C contain 20, 20, and 60 documents respectively. Complement Naive Bayes appears to use more data, but that is only because training on the documents outside each category means the same documents are counted more than once, so the absolute amount is not important. What matters is that the variation in data volume, which is up to 3x for Naive Bayes (20 documents vs. 60 documents), is held to 2x for Complement Naive Bayes (40 documents vs. 80 documents).
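The numbers in this example are easy to check with a few lines of Python. The helper name `spread` is an illustrative assumption; the document counts are the ones given above.

```python
def spread(counts):
    """Ratio between the largest and smallest amount of training data."""
    return max(counts.values()) / min(counts.values())


docs_per_category = {"A": 20, "B": 20, "C": 60}
total = sum(docs_per_category.values())

# Complement Naive Bayes trains each category on everything outside it.
complement_docs = {c: total - n for c, n in docs_per_category.items()}

print(spread(docs_per_category))  # 3.0
print(complement_docs)            # {'A': 80, 'B': 80, 'C': 40}
print(spread(complement_docs))    # 2.0
```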
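Since the method improves performance by suppressing this imbalance, it is effective for multi-class classification problems such as category estimation, but it is completely pointless for binary classification problems such as deciding whether a message is spam or not (with only two categories, the complement of one is simply the other). Even for multi-class problems, it has little effect when the amount of data does not vary much between categories.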
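In addition, the paper improves the weight vector with the following heuristics:
- To avoid being dragged around by very frequent words, take the logarithm of the occurrence count
- To reduce the influence of common words, weight each word down according to the logarithm of the number of documents it appears in (an inverse-document-frequency style correction)
- To reduce the influence of document length, normalize the occurrence counts by the length of the document
- To reduce the influence of multi-word units that form a single term (such as "New York"), normalize the weights
Apparently the paper does not say which of these improvements contributed to the performance gain. Anyway, the story is that if you work this hard at tweaking Naive Bayes, the performance gets better.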
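The first three heuristics can be sketched as a document transform. The post only describes them informally, so the concrete formulas used here are assumptions of this sketch: log(1 + count) for the logarithm, log(N / document frequency) for the common-word correction, and division by the Euclidean length for the document-length correction. The function name `transform` is also hypothetical, and the fourth heuristic (weight normalization) is not covered.

```python
import math


def transform(docs):
    """Apply log-count, common-word and length corrections to tokenized documents."""
    n_docs = len(docs)
    doc_freq = {}
    for doc in docs:
        for w in set(doc):
            doc_freq[w] = doc_freq.get(w, 0) + 1

    vectors = []
    for doc in docs:
        counts = {}
        for w in doc:
            counts[w] = counts.get(w, 0) + 1
        # 1) damp frequent words: log(1 + count)
        # 2) damp common words: multiply by log(N / document frequency)
        weights = {
            w: math.log(1 + c) * math.log(n_docs / doc_freq[w])
            for w, c in counts.items()
        }
        # 3) damp long documents: divide by the Euclidean length of the vector
        norm = math.sqrt(sum(v * v for v in weights.values()))
        vectors.append({w: (v / norm if norm else 0.0) for w, v in weights.items()})
    return vectors


vectors = transform([["the", "cat", "cat"], ["the", "dog"]])
print(vectors[0])  # "the" appears in every document, so its weight is 0
```

The transformed values would then replace the raw counts when the per-category word weights are estimated.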
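To summarize, Complement Naive Bayes is a method for multi-class classification problems such as category estimation that keeps the imbalance in data volume small by training on the complement sets. The paper also demonstrates that, with several additional heuristic improvements, it can achieve performance not much different from SVM. (I do not know how many of these heuristics the Hatena Bookmark classifier actually reproduces, though.)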
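Besides this, although I did not write about them, I think the following two points are very important in practical use. They are outside the scope of this entry, or rather, to be honest, if I had covered them too I would not have finished writing before the end of the year, so I am skipping them this time.
- How to split documents into words
- Smoothing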
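Also, since it pains me to publish only inaccurate information, I will close with links to the naive Bayes (単純ベイズ) entry on the 朱鷺の杜Wiki and the naive Bayes classifier (単純ベイズ分類器) article on Wikipedia. If you are interested, please study the accurate details there.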
Addendum: I am also adding a link to the complement naive Bayes entry on the 朱鷺の杜Wiki.