This article is the day-19 entry of R Advent Calendar 2016.
#### Preface
Are you still actively using randomforest today? With newcomers such as xgboost, lightGBM, and mxnet appearing one after another, my impression is that its presence has faded compared with its heyday. Even so, it has advantages such as relatively easy parameter tuning and easy-to-interpret results, and I think it is still fully first-rate today.
#### Recent packages for running randomforest
For the various recent randomforest packages available through R, see: "If you do random forests in R, use Rborist or ranger", "Notes on 'ranger: A Fast Implementation of Random Forests'", "Recent random forest packages in R -ranger/Rbor
Machine learning in R: using a random forest to predict the number of active users one month later from new users' attribute data. Last time, in "Machine learning in R: using a support vector machine (SVM) to predict the number of active users one month later from new users' attribute data", I used a support vector machine to tackle the following task: starting from the attribute data of 500 newly registered users, fit unseen data and predict whether each user is still active one month later.
Overview
R is a programming language well suited to interactive data analysis. In addition, with its packages for data visualization and more, it can be called a "suite" for data analysis. This article is aimed at users who have never touched R. It explains the basics of R and then covers, in tutorial form, the steps for building predictive models with the algorithms known as decision trees and random forests. After finishing this tutorial, you will be able to compute statistics on structured data quickly and to build several analytical models and make predictions with them.
Environment
The OS used here is MacOS X Yosemite, but most of the steps should apply unchanged to other versions of Mac and to Windows.
Installation
Download the R build that matches your OS from the URL below. https://cran.ism.ac.jp The installer
Last time, I used the Titanic survivor data to practice decision trees and random forests in Python. www.randpy.tokyo This time it is the hands-on R edition! I previously used the Twitter API to make a Word Cloud, so I will continue in that vein and use tweet data. www.randpy.tokyo The subjects are the tweets of the two members of Speedwagon, a duo I love. I will analyze their tweets with a decision tree and classify Itoda-san's tweets versus Ozawa-san's tweets. After that, I will analyze the same data with a random forest and check how much the classification accuracy changes. For the theory of decision trees and random forests, please see the posts below. I think they are written quite clearly. www.randpy.tokyo www.randpy.tokyo The flow is: Itoda-san and
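Preparation
When doing decision tree analysis, first decide on the type of the target variable and the algorithm.
Algorithms: CART, CHAID, ID3 / C4.5 / C5.0
Type of the target variable: the handling changes with the type of the target variable.
Qualitative variable (binary variable): classification tree → if the target variable is 0/1 or T/F, convert it to the factor type with as.factor() beforehand.
Quantitative variable: regression tree.
survival object (two columns representing the event occurrence).
CART supports all of these; C4.5/C5.0 support qualitative variables only.
Here the explanation centers on rpart, which builds tree models with the CART algorithm, and the random forest package ranger.
Dataset and preprocessing
Default of Credit Card Clients Dataset
Main points to note about the dataset: 30000 rows, 25 variables. The first column is an identifier (ID) → exclude it. Column 3 SEX, column 4 EDUC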
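The preparation notes above (drop the identifier column, convert a 0/1 target with as.factor(), then fit a CART tree with rpart) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the original post's code: the data frame `toy` and its columns `id`, `age`, `balance`, and `default` are hypothetical stand-ins, not the Default of Credit Card Clients data, and only rpart is shown because it ships with standard R distributions.

```r
# Minimal sketch: classification tree with rpart on a hypothetical 0/1 target.
library(rpart)

set.seed(42)
n <- 200

# Hypothetical toy data; all column names are illustrative assumptions.
toy <- data.frame(
  id      = seq_len(n),                        # identifier, not a predictor
  age     = sample(20:70, n, replace = TRUE),
  balance = round(runif(n, 0, 1000))
)
toy$default <- as.integer(toy$balance > 600)   # 0/1 target, integer for now

# The identifier carries no information about the target, so exclude it.
toy$id <- NULL

# Convert the 0/1 target to a factor so that a classification tree is built.
toy$default <- as.factor(toy$default)

# method = "class" requests a classification tree explicitly.
fit <- rpart(default ~ ., data = toy, method = "class")

# Predicted class labels and accuracy on the training data.
pred <- predict(fit, newdata = toy, type = "class")
accuracy <- mean(pred == toy$default)
print(accuracy)
```

If the target were left as a numeric 0/1 column and `method` were not given, rpart would choose a regression tree (`method = "anova"`), which is why the factor conversion comes before fitting.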
Have you heard of POS data? POS stands for Point Of Sales, and POS data basically means customer purchase data from retail stores. In more detail, it records when, where, what, and how many items a customer bought. POS data has long been collected and analyzed by all kinds of retailers and used for inventory optimization, product recommendation, optimizing product placement in stores, predicting customer visits, and so on. This time, I will try predicting customer visits. As for the POS data, I will use the dataset that accompanies a book most intermediate-level data scientists have read, データ分析プロセス (edited by Mingzhe Jin, written by Shintaro Fukushima) http://www.kyoritsu-pub.co.jp/bookdetail/9784320123656 . This book
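Basically, I think nobody should bet on horse racing, because the house takes too large a cut. It is better than the lottery, but 20 to 30% of the money wagered still goes to the house.*1 This time, however, I decided on a whim to try predicting horse races. The reason is how cheap betting tickets are. I currently have little capital and am looking for investments where that is not a disadvantage, and a ticket price of 100 yen is attractive. With stocks, however cheap the stock, the minimum purchase is at least tens of thousands of yen*2, so a certain amount of capital is needed. Horse racing also leaves room for skill (the possibility of winning through effort). Here is one example: "Tens of billions of yen in easy profit! The astonishing method a UK investment company used to rake in money on Japanese horse racing" - NAVER まとめ. They were reportedly winning at horse racing through statistical analysis and concealing that income. The fact that news like this appears means that, depending on the analyst's skill, the possibility of winning at horse racing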
Come to think of it, I wrote a summary post like this three years ago. Much of its content became the draft of the book I published two years ago, so it is a post with a lot of memories attached. That said, in these three years the methods of statistics, machine learning, and data mining, and the business needs surrounding them, have advanced in various ways, and that content has started to look dated. So I will write an updated article that reflects three years of progress. Last time it was a "top 10"; this time I have revised it to a "top 10+2". The lineup is as follows. Statistical tests (t-test, chi-squared test, ANOVA, etc.): t-test; chi-squared test; ANOVA (analysis of variance); other tests. Multiple regression analysis (linear regression model). Generalized linear models (GLM: logistic regression, Poisson regression, etc.): logistic regression; Poisson regression. Regularization (L1 / L2 norm
After coming under intense pressure in places such as the R help desk (Rのお悩み相談室)*1, I wrote a very rough, barely adequate memo on applying {ranger}, the new Random Forest package, to "data with more columns than rows". For the package itself, see the links below. mnwright/ranger · GitHub CRAN - Package ranger [1508.04409] ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R As the "data with more columns than rows", I used news20.binary from the LIBSVM binary classification datasets. LIBSVM Data: Classification (Binary Cla