First Kaggle attempt: predicting survival rates from Titanic passenger profiles with a random forest
What is this article?
This is a record of trying the tutorial challenge "Titanic passenger survival prediction" on Kaggle, a data science competition site.
Starting from installing R, I got to around 1639th place out of 3160 teams. (I'm satisfied!)
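What is Kaggle?
It's a new web service for data science.
Kaggle is a platform for predictive modeling and analytics, and the company that operates it, where companies and researchers post data and statisticians and data analysts from around the world compete to produce the best models.
People commission data analyses, and teams or individuals compete over who can deliver the best results for each commission.
Apparently top performers can win prize money or receive recruiting offers.
Also, the top performers' analysis methods are published and the challenges are discussed in the forums, which makes it a good place to learn.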
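What is "predicting survival rates from Titanic passenger profiles"?
It is one of Kaggle's tutorial challenges.
From information such as each Titanic passenger's sex, age, name, ticket number, fare paid, number of children living overseas (?), and so on, you predict whether that passenger survived the sinking of the Titanic.
You are given data for roughly 800 passengers.
a. For about 400 of them, you get the profile information above together with whether they survived.
b. For about 400 of them, you get only the profile information above. Whether they survived is unknown.
Using the information in a., you build a predictive model, predict whether the passengers in b. survived, and send the predictions to Kaggle. They are scored right away and you get a ranking.
You can submit up to 10 times a day, so the competition is about working toward a higher rank through trial and error.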
What I actually did
Installing R
I decided to do the analysis with R, which is probably the de facto standard.
I had never used R, so I started by installing it.
Apparently there is R and there is RStudio. RStudio is something like an integrated development environment that lets you work with R through a GUI.
Both installed easily with brew and brew cask. Easy. Launch it, and it launches.
The left side is the console, where you can run code interactively. The right side shows plots and lets you view the History.
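Submitting random numbers for a start
I generated random numbers in Excel, decided survived/died at random, and submitted that as my answer. I did this because I wanted to check the answer format before doing any analysis.
It was worth doing, since it made me realize I had actually misunderstood the answer format.
As for the result: about 53.1% accuracy, ranking 3148th out of 3160 teams. That's bad. Unsurprisingly.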
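The same random-guess submission could also be produced in R instead of Excel. The following is only a sketch: it assumes the submission file has the two columns `PassengerId` and `Survived` (0 or 1), and the ids and the file name are placeholders rather than the real test set.

```r
# Sketch: a random-guess submission (placeholder ids, not the real test set).
# Assumption: the submission file has the columns PassengerId and Survived (0/1).
set.seed(1)                          # make the random guesses reproducible
passenger_ids <- 892:901             # placeholder ids for illustration only

submission <- data.frame(
  PassengerId = passenger_ids,
  Survived    = sample(0:1, length(passenger_ids), replace = TRUE)
)

# Hypothetical output file, written to a temporary directory
out_file <- file.path(tempdir(), "random_submission.csv")
write.csv(submission, out_file, row.names = FALSE)

# Read the file back to check the format before uploading
check <- read.csv(out_file)
print(head(check))
```

Reading the file back is a cheap way to catch format mistakes, such as an extra row-name column, before using up one of the daily submissions.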
Trying a random forest, following someone else's worked example
I tried all the methods described in this entry. Random forest did have the highest hit rate, so that's what I'll use.
I don't even know basic R syntax, so I'm fumbling my way through.
Accuracy improved to 77.5%.
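For readers who have not seen it, a baseline random forest fit in R looks roughly like the sketch below. It is not the code used for the submission: the data is synthetic and only mimics a few Titanic-like columns, the survival rule is made up, and it assumes the `randomForest` package from CRAN is installed (the block does nothing otherwise).

```r
# Sketch: fitting a baseline random forest on synthetic, Titanic-like data.
# This is NOT the Kaggle data; columns and the survival rule are invented.
if (requireNamespace("randomForest", quietly = TRUE)) {
  library(randomForest)
  set.seed(42)
  n <- 200
  train <- data.frame(
    Sex    = factor(sample(c("male", "female"), n, replace = TRUE)),
    Pclass = sample(1:3, n, replace = TRUE),
    Fare   = round(runif(n, 5, 100), 2)
  )
  # Made-up rule so that the forest has something to learn
  p <- ifelse(train$Sex == "female", 0.75, 0.2)
  train$Survived <- factor(rbinom(n, 1, p))

  # A factor response makes randomForest() do classification
  fit <- randomForest(Survived ~ Sex + Pclass + Fare, data = train, ntree = 200)
  print(fit)                       # includes the out-of-bag error estimate

  # Predict for "new" passengers (here simply the first rows again)
  pred <- predict(fit, newdata = head(train))
  print(pred)
}
```

With the real data, `train` would come from `read.csv()` on the training file, and the predictions for the unlabeled passengers would go into the submission file.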
Tuning the random forest
I tuned it using these sites as references.
Converting the data into a form the random forest can handle
I don't clearly understand the mechanism by which a random forest makes predictions. I did gather, though, that it "generates many decision trees and uses them to predict".
It follows that the explanatory variables given to a random forest have to be something a decision tree can handle.
A decision tree can only work with values that have an ordering, or with categories that don't have too many distinct values (sex, for example).
Information like "name", for example, can't be ranked, and a name itself isn't a category either, so it seems impossible to have the random forest process it as an explanatory variable.
However, if you extract an ordering or a category from such information, the random forest can handle it. For example, extracting honorifics such as "Mr.", "Mrs.", and "Don." from the name should give something usable as a categorical variable.
I added explanatory variables like the following.
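Honorific
- "Title" is probably the more accurate word. Things like "Mr.", "Mrs.", and "Don.".
First 3 digits of the cabin number
- I tried taking the first n digits of the cabin number to see whether they could serve as a categorical variable representing where the passenger stayed.
- I tried several lengths, and 3 digits seems to work best.
- (A random forest can tell you how much each explanatory variable contributes to predicting the target variable.)
First 2 digits of the numeric part of the ticket
- While staring blankly at the data, I noticed that many tickets share the same first 2 or 3 digits. I added this hoping that this part encodes the ticket grade or the location on the ship.
Number of duplicates of the ticket number
- Ticket numbers were not unique. Perhaps people staying in the same room were assigned the same ticket number? A single-occupancy room and an 11-person room would likely differ in survival rate, so I added it.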
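The four derived variables above can be sketched in base R as follows. This is an illustration, not the original code: the rows are invented, the column names `Name`, `Cabin`, and `Ticket` are assumed to match the Kaggle CSV, and "first 3 digits" of the cabin is interpreted here as the first three characters.

```r
# Sketch of the four derived variables on a tiny made-up data frame.
# Column names follow the Kaggle CSV (assumption); the rows are invented.
passengers <- data.frame(
  Name   = c("Smith, Mr. John", "Smith, Mrs. Jane",
             "Brown, Miss. Ann", "Gray, Don. Luis"),
  Cabin  = c("C123", "C123", "", "B5"),
  Ticket = c("PC 17599", "PC 17599", "113803", "A/5 21171"),
  stringsAsFactors = FALSE
)

# 1. Honorific: the text between ", " and the following "."
passengers$Title <- factor(sub("^.*, *([^.]+)\\..*$", "\\1", passengers$Name))

# 2. First 3 characters of the cabin (empty cabins become "unknown")
cabin3 <- substr(passengers$Cabin, 1, 3)
cabin3[cabin3 == ""] <- "unknown"
passengers$Cabin3 <- factor(cabin3)

# 3. First 2 digits of the trailing numeric part of the ticket
#    (a ticket without trailing digits would be left unchanged by sub())
ticket_num <- sub("^.*?([0-9]+)$", "\\1", passengers$Ticket, perl = TRUE)
passengers$Ticket2 <- factor(substr(ticket_num, 1, 2))

# 4. How many passengers share the same ticket number
ticket_counts <- table(passengers$Ticket)
passengers$TicketDup <- as.integer(ticket_counts[passengers$Ticket])

print(passengers[, c("Title", "Cabin3", "Ticket2", "TicketDup")])
```

Converting the extracted strings to `factor` is what lets a decision tree treat them as categories rather than free text.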
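Going to a bookstore and getting an R reference book
I understood so little R syntax that I went to a bookstore and picked up a reference book.
Having skimmed the parts I needed, my impression mostly matches this one from the reviews. Good book.
"It treats R as a language: quite a few books introduce only the functions, as a statistics tool, but this introductory book carefully explains how to use R as a so-called high-level language, including for statements, ifelse statements, and how to write functions."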
Tuning results
The following two changes raised accuracy to 78.5%.
- Random forest parameter tuning with tuneRF
- Transforming and adding explanatory variables
...Wait, only 1%...
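As a rough illustration of the tuning step, `tuneRF()` from the `randomForest` package searches over `mtry` (the number of variables tried at each split) using the out-of-bag error, and `importance()` shows how much each explanatory variable contributes. The sketch below uses synthetic data, and the parameter values are illustrative guesses, not the settings behind the 78.5% score. It assumes the `randomForest` package is installed.

```r
# Sketch: tuning mtry with tuneRF() and checking variable importance.
# Synthetic data; parameter values are illustrative guesses.
if (requireNamespace("randomForest", quietly = TRUE)) {
  library(randomForest)
  set.seed(7)
  n <- 300
  x <- data.frame(
    Sex    = factor(sample(c("male", "female"), n, replace = TRUE)),
    Pclass = sample(1:3, n, replace = TRUE),
    Fare   = runif(n, 5, 100),
    Noise  = rnorm(n)                # a variable unrelated to the outcome
  )
  y <- factor(rbinom(n, 1, ifelse(x$Sex == "female", 0.8, 0.2)))

  # doBest = TRUE returns a forest fitted with the best mtry that was found
  tuned <- tuneRF(x, y, ntreeTry = 100, stepFactor = 2, improve = 0.01,
                  trace = FALSE, plot = FALSE, doBest = TRUE)
  print(tuned$mtry)

  # One row per explanatory variable; larger values mean a larger contribution
  print(importance(tuned))
}
```

Comparing the importance of a newly added variable against a pure-noise column like `Noise` is one simple way to judge whether the new variable is worth keeping.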
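Looking at how the top ranks did their analysis
- An accuracy of around 80.9% gets you into the top 10%.
- The scores are really tightly packed, but I guess that's what the fight is about.
- Some people have 100% accuracy...
- What is that about...
- Almost everyone who publishes their code solves it with a random forest.
- Some extract the family name and count its duplicates.
- Meaning the number of family members aboard together, I suppose? I see.
- One person inferred, for minors, whether a parent such as the mother was accompanying them.
- That idea never occurred to me.
- The code is also simple and easy to read, and the intent comes through clearly.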
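The family-name idea from the list above can be sketched in a few lines of base R. This is my own reconstruction, not the published code: it assumes names are formatted as "Surname, Title. Given names" and uses invented rows.

```r
# Sketch: surname extraction and a count of passengers per surname.
# Assumes names look like "Surname, Title. Given names"; rows are invented.
people <- data.frame(
  Name = c("Smith, Mr. John", "Smith, Mrs. Jane",
           "Smith, Master. Tom", "Brown, Miss. Ann"),
  stringsAsFactors = FALSE
)

# Everything before the first comma is the family name
people$Surname <- sub(",.*$", "", people$Name)

# Number of passengers sharing each family name
people$FamilyCount <- as.integer(table(people$Surname)[people$Surname])

print(people[, c("Surname", "FamilyCount")])
```

One caveat: unrelated passengers can share a common surname, so this count is only an approximation of the family size.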
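Impressions
- Fun!
What I gained
I picked up a lot of concepts and knowledge I didn't have (including the knowledge that I don't know about them). I'll just list the terms.
Statistics
Explanatory variable, target variable, training data, dimension, random forest, ensemble learning, overfitting, cross-validation, missing values, missing completely at random, missing at random, missingness that depends on unobserved explanatory variables, MI
政治学方法論 I:欠測値を持つデータと Multiple Imputation (Political Science Methodology I: Data with Missing Values and Multiple Imputation)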
R
package, CRAN, assignment operator: <-, vector, data.frame, frame$column, merge(), cut(), randomForest(), tuneRF(), factor, character, summary(), direct arithmetic between one vector and another, write.csv(), NA, NULL, Inf, is.na()
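A few of the R items listed above, shown together in one small base R sketch. The values are invented, and filling missing ages with the median is just one simple choice for illustration.

```r
# A few of the listed items in action (base R only, invented values)
ages  <- c(22, NA, 38, 4)            # a numeric vector with a missing value
fares <- c(7.25, 8.05, 71.28, 16.7)

# is.na() finds the missing entries; here they are replaced by the median
ages_filled <- ifelse(is.na(ages), median(ages, na.rm = TRUE), ages)

# Arithmetic works element by element between two vectors
fare_per_year <- fares / ages_filled

# cut() turns a numeric vector into categories (a factor); Inf is the open end
age_group <- cut(ages_filled, breaks = c(0, 12, 18, Inf),
                 labels = c("child", "teen", "adult"))

# merge() joins two data frames on a shared column
people  <- data.frame(id = 1:4, age_group = age_group)
tickets <- data.frame(id = 1:4, fare = fares)
merged  <- merge(people, tickets, by = "id")
print(summary(merged))
```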