[Machine Learning] A few useful things to know about machine learning
The paper in the title is a review article from Communications of the ACM, 2012.
A draft version is available at the link below.
http://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf
I found it fairly interesting, so here are some of its points.
Overview
Machine learning systems can work out from data, automatically, how to perform tasks such as spam filtering and recommendation.
However, making a machine learning system succeed depends on unwritten rules of thumb that are hard to pick up from textbooks alone, so things often do not go as planned.
The paper summarizes, in 12 points, what machine learning researchers and practitioners should know.
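Generalization is what matters
The goal of machine learning is to generalize, that is, to make good predictions on data that is not in the training set. If classifying the training data were all that mattered, memorizing the examples would be enough. In that case, however, performance on data outside the training set is no better than chance.
For this reason it is important to keep training data and test data separate when evaluating a classifier. In the early days of machine learning this separation was rarely practiced. It caused little trouble at the time because the learners in use, such as linear discriminants, had low representational power (for example, the simple regression from statistics textbooks that predicts weight from height cannot be tuned very closely to the data, however hard one tries to fit it).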
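The memorization argument can be made concrete with a minimal sketch. The ground-truth rule (`label`), the data ranges, and the fixed fallback guess are all assumptions made up for this illustration; they do not come from the paper.

```python
def label(x):
    """Hypothetical ground truth: 1 for odd numbers, 0 for even numbers."""
    return x % 2

train = [(x, label(x)) for x in range(0, 100)]
test = [(x, label(x)) for x in range(100, 200)]  # disjoint from train

# A "learner" that only memorizes the training examples.
memory = dict(train)

def predict(x, default=0):
    # Unseen inputs fall back to a fixed guess, since nothing was generalized.
    return memory.get(x, default)

def accuracy(examples):
    return sum(predict(x) == y for x, y in examples) / len(examples)

train_acc = accuracy(train)
test_acc = accuracy(test)
print(train_acc, test_acc)  # prints: 1.0 0.5
```

The memorizer is perfect on the data it has seen and at chance level on the held-out data, which is only visible because the test set was kept separate.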
Data alone is not enough
For example, take a binary classification problem with one million examples, each described by 100 binary features. Of the 2^100 possible inputs, nothing is known about 2^100 - 10^6 of them. If the labels of those unseen inputs were assigned at random, no machine learning algorithm could do better than random guessing (the so-called "no free lunch" theorem).
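The arithmetic in this example can be checked directly. The variable names below are only illustrative; the numbers are the ones quoted in the text.

```python
from fractions import Fraction

n_features = 100       # binary features per example
n_examples = 10 ** 6   # one million training examples

n_possible = 2 ** n_features            # number of distinct inputs
n_unseen = n_possible - n_examples      # inputs with no information at all
coverage = Fraction(n_examples, n_possible)

print(n_possible)
print(n_unseen)
print(float(coverage))  # fraction of the input space that was observed
```

The observed fraction comes out below 10^-24, so essentially the whole input space has to be filled in by assumptions rather than by data.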
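In practice, however, the functions that generate real data are not drawn uniformly at random from all mathematically possible functions. Real data has regularities: similar examples tend to have similar labels, and there are dependencies between variables.
Making good use of knowledge about the problem domain therefore makes it possible to choose an appropriate learning method.
It should come as no surprise that learning requires knowledge about the data. Machine learning is not magic; it cannot get something from nothing.
Machine learning is like farming: just as a farmer combines seeds with fertilizer to grow crops, a machine learning algorithm combines data with knowledge about that data to produce a learner.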
On overfitting
A simple classifier does not go badly wrong even when data is scarce. A complex classifier, by contrast, often goes badly wrong unless it has a large amount of data.
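A small sketch of this contrast, under made-up assumptions: the 1-D data set, the true rule "label 1 if x >= 5", and the single mislabeled point at x = 2 are all invented for illustration. The simple learner is a one-threshold decision stump; the flexible learner is 1-nearest-neighbor, which reproduces every training label, noise included.

```python
# Toy 1-D data: the true rule is "label 1 if x >= 5"; the point x=2 is mislabeled noise.
train = [(1, 0), (2, 1), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
test = [(0.5, 0), (1.4, 0), (1.8, 0), (2.3, 0), (3.2, 0),
        (4.4, 0), (5.6, 1), (6.6, 1), (7.7, 1), (9.5, 1)]

def fit_stump(data):
    """Simple learner: pick the single threshold with the fewest training errors."""
    xs = sorted(x for x, _ in data)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def errors(t):
        return sum((x >= t) != bool(y) for x, y in data)

    return min(candidates, key=errors)

def predict_stump(t, x):
    return int(x >= t)

def predict_1nn(data, x):
    """Flexible learner: copy the label of the closest training point."""
    return min(data, key=lambda p: abs(p[0] - x))[1]

def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)

threshold = fit_stump(train)
stump_train = accuracy(lambda x: predict_stump(threshold, x), train)
stump_test = accuracy(lambda x: predict_stump(threshold, x), test)
nn_train = accuracy(lambda x: predict_1nn(train, x), train)
nn_test = accuracy(lambda x: predict_1nn(train, x), test)

print("stump:", stump_train, stump_test)  # prints: stump: 0.875 1.0
print("1-NN: ", nn_train, nn_test)        # prints: 1-NN:  1.0 0.8
```

The flexible learner scores higher on the training set but lower on the test set, because it has fitted the noisy point. With only eight training points this is a caricature, not a benchmark.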
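Feature engineering is the key
Some machine learning projects succeed and others fail. The most important factor separating the two is which features were used. If there are many variables that are independent of one another and each correlated with the target, learning goes well. Learning from variables that interact in complicated ways, on the other hand, is very hard. Since raw data is usually complicated, the key is deciding which variables to extract from it.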
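A toy sketch of why the choice of features matters, using XOR-style data where the label depends on the interaction of two inputs. The data set, the product feature, and the brute-force search over small integer weights are all assumptions chosen for illustration; the search stands in for a real training algorithm only to keep the example short.

```python
from itertools import product

# XOR-style toy data: the label depends on how the two inputs interact.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def raw_features(x):
    return [x[0], x[1]]

def engineered_features(x):
    # Hypothetical hand-crafted feature: the product x1 * x2.
    return [x[0], x[1], x[0] * x[1]]

def best_linear_accuracy(featurize):
    """Best accuracy of a linear threshold rule with weights and bias in -2..2."""
    n = len(featurize(data[0][0]))
    best = 0.0
    for *w, b in product(range(-2, 3), repeat=n + 1):
        correct = sum(
            int(sum(wi * fi for wi, fi in zip(w, featurize(x))) + b > 0) == y
            for x, y in data
        )
        best = max(best, correct / len(data))
    return best

raw_best = best_linear_accuracy(raw_features)
engineered_best = best_linear_accuracy(engineered_features)
print(raw_best, engineered_best)  # prints: 0.75 1.0
```

On the raw inputs no linear rule gets all four points right, while one extra feature makes the same kind of model sufficient (for instance weights 1, 1, -2 with zero bias).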
Intuition fails in high dimensions
At first glance, adding more features seems harmless: at worst, the extra features would simply go unused. In practice, however, accuracy can actually get worse as features are added.
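One way to see how extra features can hurt is a deliberately tiny model that can be computed exactly. All of the setup is an assumption made for illustration: a 1-nearest-neighbor classifier stores one training example per class; the two examples differ in a single relevant binary feature; every example also carries d irrelevant bits drawn uniformly at random; distance is Hamming distance; and ties go to the correct class. The function below (a hypothetical helper, `nn_error_rate`) returns the probability that a query is closer to the wrong-class example.

```python
from fractions import Fraction
from math import comb

def nn_error_rate(d):
    """Exact 1-NN error probability with d irrelevant uniformly random bits."""
    total = Fraction(0)
    for i in range(d + 1):       # i = irrelevant-bit mismatches with the same-class example
        for j in range(d + 1):   # j = irrelevant-bit mismatches with the other-class example
            # Distances are i (same class) and 1 + j (other class, relevant bit differs).
            if i > j + 1:        # the other-class example is strictly closer
                total += comb(d, i) * comb(d, j)
    return total / 4 ** d        # each (i, j) pattern has probability C(d,i)C(d,j) / 4^d

for d in (0, 1, 2, 10, 100):
    print(d, float(nn_error_rate(d)))
```

With zero or one irrelevant bit the classifier never errs, with two it errs with probability 1/16, and as d grows the error climbs toward chance level (1/2): the irrelevant bits come to dominate the distance even though the relevant feature is still present.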
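More data beats a cleverer algorithm
In most cases, accuracy improves as the amount of data grows. Moreover, complex learners that are reputed to be more accurate often do not scale when the data grows, so it is worth trying a simple learner first.
In addition, machine learning papers evaluate only accuracy and computational cost, but in real-world problems, presenting learned results in a form humans can easily understand makes it easier for machine learning experts and domain experts to work together.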
Learn multiple models
Research papers often pick the author's favorite learner and argue for its superiority, but in practice it is better to build several learners, as in bagging, and combine them, for example by majority vote.
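The mechanics of bagging with a majority vote can be sketched as follows. The base learner (a nearest-class-mean rule), the toy data, the number of models, and the rule of redrawing a bootstrap sample that lacks a class are all assumptions of this sketch, not details taken from the paper.

```python
import random
from collections import Counter

def fit_centroids(sample):
    """Base learner: store the mean x of each class."""
    sums, counts = {}, Counter()
    for x, y in sample:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] += 1
    return {y: sums[y] / counts[y] for y in sums}

def predict_centroids(model, x):
    """Predict the class whose mean is closest to x."""
    return min(model, key=lambda y: abs(x - model[y]))

def bagging_fit(data, n_models, rng):
    """Train n_models base learners, each on a bootstrap resample of data."""
    classes = {y for _, y in data}
    models = []
    while len(models) < n_models:
        sample = [rng.choice(data) for _ in data]  # draw with replacement
        if {y for _, y in sample} != classes:
            continue  # assumption of this sketch: redraw if a class is missing
        models.append(fit_centroids(sample))
    return models

def bagging_predict(models, x):
    """Combine the base learners by majority vote."""
    votes = Counter(predict_centroids(m, x) for m in models)
    return votes.most_common(1)[0][0]

data = [(0.0, 0), (1.0, 0), (2.0, 0), (10.0, 1), (11.0, 1), (12.0, 1)]
models = bagging_fit(data, n_models=25, rng=random.Random(0))
print(bagging_predict(models, -5.0), bagging_predict(models, 20.0))  # prints: 0 1
```

Each model sees a slightly different resample, so individual models differ, and the vote smooths out those differences. The data here is easy enough that every model agrees; on noisier data the vote is where the benefit would come from.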