This article is the day 4 entry in the kaggle Advent Calendar 2018.
- Introduction
- Important considerations
- Functions provided by scikit-learn
- The regression case
- Not always "Trust CV"
- Conclusion
Introduction
The day 3 article explained why Cross Validation matters. This article looks at what kind of validation dataset makes for a "good CV".
The scikit-learn documentation covers this topic very thoroughly, and this article borrows heavily from its source code.
Visualizing cross-validation behavior in scikit-learn — scikit-learn 0.21.3 documentation
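Important considerations
When thinking about what a "good CV" is, the key point is to clearly understand the data and the problem to be solved, as bestfitting also says in the quote from the previous article.
Concretely, points like the following matter:
- Is it a classification problem or a regression problem?
- For a classification problem, are the classes imbalanced?
- Does the order of the data carry meaning (is it time-series data)?
Depending on the data and the problem to be solved, the validation dataset has to be carved out of the train dataset with an appropriate method.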
Functions provided by scikit-learn
Following the example in the scikit-learn documentation, consider a 3-class classification dataset. For the explanation, class 0 is shown in light blue, class 1 in orange, and class 2 in brown.
Independently of the classes, the whole dataset is divided into 10 groups of equal size. Groups can be hard to picture, but one typical use is to put all data from the same user into one group. If data from the same user appears in both the train dataset and the validation dataset, the measured accuracy may be unfairly high.
Let's use the functions provided by scikit-learn to split this dataset into a train dataset and a validation dataset.
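The dataset described above can be sketched as follows. The article only states that there are 3 classes and 10 equal groups. The sample count (100), the 10/30/60 class ratio, the random feature matrix, and the names `X`, `y`, and `groups` are assumptions borrowed from the scikit-learn visualization example.

```python
import numpy as np

# Assumed sizes: 100 samples, classes in a 10/30/60 ratio, 10 groups of 10.
rng = np.random.default_rng(0)
n_samples = 100
X = rng.normal(size=(n_samples, 10))  # features do not affect the splitters

# Labels are stored in order: all of class 0, then class 1, then class 2.
y = np.repeat([0, 1, 2], [10, 30, 60])

# Group ids are contiguous too: samples 0-9 are group 0, 10-19 group 1, ...
groups = np.repeat(np.arange(10), 10)

print(np.bincount(y))       # number of samples per class
print(np.bincount(groups))  # number of samples per group
```

Because the labels and groups are stored in contiguous blocks, any splitter that preserves order will produce folds dominated by one class.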
KFold
sklearn.model_selection.KFold — scikit-learn 0.21.3 documentation
As the figure above shows, KFold splits the dataset into the specified number n of parts (by default without changing the order). Red marks the validation dataset and blue the train dataset.
Looking at the individual splits, the validation datasets of split 2 and split 3 consist entirely of class 2 (brown), while the validation dataset of split 0 contains no class 2 (brown) at all. The classes are skewed in every split, so the resulting scores cannot be trusted.
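The skew can be checked numerically with a small sketch. It assumes 100 samples in a 10/30/60 class ratio and `n_splits=4`, which are illustrative values and not stated in the article.

```python
import numpy as np
from sklearn.model_selection import KFold

# Assumed toy labels: 10/30/60 samples of classes 0/1/2, stored in order.
y = np.repeat([0, 1, 2], [10, 30, 60])
X = np.zeros((len(y), 1))  # KFold ignores feature values

kfold = KFold(n_splits=4)  # shuffle=False by default, so order is preserved
val_class_counts = []
for fold, (train_idx, val_idx) in enumerate(kfold.split(X)):
    counts = np.bincount(y[val_idx], minlength=3).tolist()
    val_class_counts.append(counts)
    print(f"split {fold}: validation class counts = {counts}")

# split 0: validation class counts = [10, 15, 0]
# split 1: validation class counts = [0, 15, 10]
# split 2: validation class counts = [0, 0, 25]
# split 3: validation class counts = [0, 0, 25]
```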
StratifiedKFold
sklearn.model_selection.StratifiedKFold — scikit-learn 0.21.3 documentation
As the figure above shows, StratifiedKFold splits the dataset while preserving the class proportions of the whole dataset in each split. Compared with KFold, it gives a "good CV" for this dataset.
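A minimal sketch of the same check with StratifiedKFold, again assuming 100 samples in a 10/30/60 class ratio and `n_splits=4`:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Assumed toy labels: 10/30/60 samples of classes 0/1/2, stored in order.
y = np.repeat([0, 1, 2], [10, 30, 60])
X = np.zeros((len(y), 1))

skf = StratifiedKFold(n_splits=4)
stratified_counts = []
for fold, (train_idx, val_idx) in enumerate(skf.split(X, y)):
    # The labels y are required: they drive the stratification.
    counts = np.bincount(y[val_idx], minlength=3).tolist()
    stratified_counts.append(counts)
    print(f"split {fold}: validation class counts = {counts}")
```

Every validation fold now contains all three classes in roughly the 1:3:6 ratio of the full dataset.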
GroupKFold
sklearn.model_selection.GroupKFold — scikit-learn 0.21.3 documentation
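As the figure above shows, GroupKFold splits the dataset so that the same group never appears in the validation datasets of different splits, which also means a group is never in train and validation at the same time. For concrete cases where GroupKFold is needed, iwiwi's slides and video are easy to follow.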
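The group guarantee can be verified with a short sketch. The 10 groups of 10 samples and `n_splits=4` are assumed illustrative values.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Assumed toy data: 100 samples, 10 contiguous groups of 10 samples each.
y = np.repeat([0, 1, 2], [10, 30, 60])
X = np.zeros((len(y), 1))
groups = np.repeat(np.arange(10), 10)

gkf = GroupKFold(n_splits=4)
val_groups_per_split = []
overlap_found = False
for fold, (train_idx, val_idx) in enumerate(gkf.split(X, y, groups)):
    train_groups = set(groups[train_idx].tolist())
    val_groups = set(groups[val_idx].tolist())
    # A group must never be on both sides of the same split.
    overlap_found = overlap_found or bool(train_groups & val_groups)
    val_groups_per_split.append(sorted(val_groups))
    print(f"split {fold}: validation groups = {sorted(val_groups)}")

print("train/validation group overlap:", overlap_found)
```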
ShuffleSplit
sklearn.model_selection.ShuffleSplit — scikit-learn 0.21.3 documentation
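As the figure above shows, ShuffleSplit picks the validation samples at random, so they are scattered across the dataset. Because the dataset is reshuffled for every split, the validation datasets of different splits are allowed to overlap.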
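One way to see the overlap is to count how often each sample lands in a validation dataset. The sketch below assumes 100 samples, `n_splits=4`, `test_size=0.25`, and a fixed `random_state`. All of these are illustrative choices.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

X = np.zeros((100, 1))  # assumed toy data: 100 samples

ss = ShuffleSplit(n_splits=4, test_size=0.25, random_state=0)
val_sizes = []
times_in_validation = np.zeros(len(X), dtype=int)
for train_idx, val_idx in ss.split(X):
    val_sizes.append(len(val_idx))
    times_in_validation[val_idx] += 1

# Unlike KFold, some samples may be validated several times and others never.
print("never validated:     ", int((times_in_validation == 0).sum()))
print("validated repeatedly:", int((times_in_validation > 1).sum()))
```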
GroupShuffleSplit
sklearn.model_selection.GroupShuffleSplit — scikit-learn 0.21.3 documentation
As the figure above shows, GroupShuffleSplit assigns whole groups to the validation dataset at random. Compared with GroupKFold, the leftmost group, for example, is never used as a validation dataset. As with ShuffleSplit, the dataset is reshuffled for every split, so overlapping validation data (and therefore data that is never used for validation) is allowed.
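A minimal sketch of this behavior, assuming 10 groups of 10 samples, `n_splits=4`, and `test_size=0.2` (a fraction of the groups, not of the samples):

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Assumed toy data: 100 samples, 10 contiguous groups of 10 samples each.
X = np.zeros((100, 1))
y = np.repeat([0, 1, 2], [10, 30, 60])
groups = np.repeat(np.arange(10), 10)

gss = GroupShuffleSplit(n_splits=4, test_size=0.2, random_state=0)
group_overlap = False
used_val_groups = set()
for fold, (train_idx, val_idx) in enumerate(gss.split(X, y, groups)):
    train_groups = set(groups[train_idx].tolist())
    val_groups = set(groups[val_idx].tolist())
    group_overlap = group_overlap or bool(train_groups & val_groups)
    used_val_groups |= val_groups
    print(f"split {fold}: validation groups = {sorted(val_groups)}")

# Groups that were never drawn into any validation dataset.
never_validated = sorted(set(range(10)) - used_val_groups)
print("groups never used for validation:", never_validated)
```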
StratifiedShuffleSplit
sklearn.model_selection.StratifiedShuffleSplit — scikit-learn 0.21.3 documentation
As the figure above shows, StratifiedShuffleSplit splits the dataset while preserving the class proportions of the whole dataset. Just as GroupShuffleSplit compares with GroupKFold, StratifiedShuffleSplit differs from StratifiedKFold in that overlapping validation data is allowed. Not every sample is guaranteed to be used exactly once in a validation dataset.
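Both properties (stratified folds, but no guarantee of full coverage) can be checked with a sketch like the one below. The 10/30/60 class ratio, `n_splits=4`, and `test_size=0.25` are assumed values.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Assumed toy labels: 10/30/60 samples of classes 0/1/2.
y = np.repeat([0, 1, 2], [10, 30, 60])
X = np.zeros((len(y), 1))

sss = StratifiedShuffleSplit(n_splits=4, test_size=0.25, random_state=0)
sss_counts = []
validated = np.zeros(len(y), dtype=bool)
for fold, (train_idx, val_idx) in enumerate(sss.split(X, y)):
    counts = np.bincount(y[val_idx], minlength=3).tolist()
    sss_counts.append(counts)
    validated[val_idx] = True
    print(f"split {fold}: validation class counts = {counts}")

# Coverage is not guaranteed: some samples may never be validated.
print("samples validated at least once:", int(validated.sum()), "of", len(y))
```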
TimeSeriesSplit
sklearn.model_selection.TimeSeriesSplit — scikit-learn 0.21.3 documentation
As the figure above shows, TimeSeriesSplit splits a time-series dataset. With time-series data, a model can only be trained on data from the past, so a split like the one in the figure is the realistic choice. In general, putting future information in the train dataset and past information in the validation dataset is known to yield scores that are better than they should be.
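The "train on the past, validate on the future" property can be confirmed with a short sketch. It assumes 100 samples already sorted by time and `n_splits=4`.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.zeros((100, 1))  # assumed: rows are already in chronological order

tscv = TimeSeriesSplit(n_splits=4)
train_before_val = []
for fold, (train_idx, val_idx) in enumerate(tscv.split(X)):
    # Every training index precedes every validation index.
    train_before_val.append(bool(train_idx.max() < val_idx.min()))
    print(
        f"split {fold}: train = [0, {train_idx.max()}], "
        f"validation = [{val_idx.min()}, {val_idx.max()}]"
    )
```

The training window grows with each split, while the validation window always lies immediately after it.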
The regression case
So far the examples have been classification problems, but these splitting methods can also be used for regression. Regression has no notion of classes, but you can, for example, cluster the target variable on its own with k-means to create pseudo-classes and then use StratifiedKFold.
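A minimal sketch of the pseudo-class idea follows. The synthetic target, the number of clusters (`n_bins = 4`), and all variable names are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = rng.normal(size=300)  # hypothetical continuous target

# Cluster the 1-D target with k-means; the cluster ids act as pseudo-classes.
n_bins = 4  # assumed number of pseudo-classes
kmeans = KMeans(n_clusters=n_bins, n_init=10, random_state=0)
pseudo_class = kmeans.fit_predict(y.reshape(-1, 1))

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
bins_per_fold = []
for fold, (train_idx, val_idx) in enumerate(skf.split(X, pseudo_class)):
    # Stratify on the pseudo-classes, but train and evaluate on the real y.
    bins_per_fold.append(set(pseudo_class[val_idx].tolist()))
    print(f"split {fold}: validation target mean = {y[val_idx].mean():.3f}")
```

Each validation fold then covers the full range of the target instead of, say, only its high values.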
Not always "Trust CV"
The discussion so far has assumed that the class distributions of the train dataset and the test dataset are similar. Sometimes, however, the distributions of the train dataset and the test dataset may differ.
To take an extreme example, suppose the train dataset has class 0 : class 1 : class 2 = 1:3:6 while the test dataset has class 0 : class 1 : class 2 = 5:5:0. In that case, no matter how well you understand the train data and the problem and build a "good CV" (that is, a validation dataset with the same distribution as the train dataset), it is of no use.
The distribution of the test dataset is unknown until the competition ends. If you are confident you have built a "good CV" and yet the local CV and public LB scores diverge widely, it may be reasonable to suspect that the train and test datasets have different distributions.
In such a situation, "Adversarial Validation" is worth considering. I wrote a short summary article on it earlier, so please refer to it if you are interested.
Adversarial Validation
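The basic idea of Adversarial Validation can be sketched as follows: train a classifier to tell train rows from test rows, and read an AUC near 0.5 as "the distributions look similar". This is a rough illustration on synthetic data and not the code from the linked article. The helper `adversarial_auc`, the logistic regression model, and the size of the shift are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score


def adversarial_auc(train_X, test_X, seed=0):
    """Mean CV AUC of a classifier separating train rows from test rows."""
    X_all = np.vstack([train_X, test_X])
    is_test = np.r_[np.zeros(len(train_X), dtype=int),
                    np.ones(len(test_X), dtype=int)]
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X_all, is_test, cv=cv, scoring="roc_auc").mean()


rng = np.random.default_rng(0)
train_X = rng.normal(0.0, 1.0, size=(500, 5))
test_same = rng.normal(0.0, 1.0, size=(500, 5))     # same distribution
test_shifted = rng.normal(1.0, 1.0, size=(500, 5))  # shifted distribution

auc_same = adversarial_auc(train_X, test_same)
auc_shifted = adversarial_auc(train_X, test_shifted)
print(f"AUC, same distribution:    {auc_same:.3f}")
print(f"AUC, shifted distribution: {auc_shifted:.3f}")
```

An AUC well above 0.5 suggests that train and test can be told apart. A common follow-up is to build the validation dataset from the train rows the classifier considers most test-like.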
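Conclusion
This article summarized how to build a validation dataset that makes for a "good CV". The point worth stressing once more is that no single method works for everything: you have to understand the data and the problem to be solved and choose an appropriate method. There will of course be cases where the scikit-learn functions introduced here do not meet your requirements. What matters, I think, is to understand why Cross Validation is important, as explained in the previous article, and to try to build the most appropriate validation dataset you can for the situation at hand.
The source code is available on GitHub.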