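In January of this year I wrote about a topic like this.
As I briefly noted at the end of that post, cases of "too good to be true", that is, "isn't that model's accuracy far too high?" followed by "actually, we never looked at generalization performance", seem to be far more common in the real world than you would imagine. This is the most basic of basics, covered in Chapter 2 of "Hajipata" (the Japanese textbook はじめてのパターン認識) and in the early part of PRML, but I'd like to ramble through it once more here.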
To begin with, "100% accuracy" or "a correlation coefficient of 0.9 or higher" should put you on guard
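The example I mentioned at the start became a talking point in the first place because people all over were saying "100% accuracy can't be right", and when the lid came off it was a mess: there was leakage, and the model had only ever been evaluated on its training error.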
In general, whether you use statistical modeling or machine learning, a model fitted to a real-world dataset will essentially never reach 100% accuracy once you account for measurement error and noise. Likewise, a correlation coefficient of 0.9 or higher is, frankly, a number I would hardly ever expect to see when working with real-world data.
Personally, I think it's worth building the habit of asking "isn't something wrong here?" first whenever a "too good to be true" accuracy figure comes up. With that in mind, the following are the points I think deserve close scrutiny.
What overfitting and generalization performance are
This is an extremely famous story that appears early on in both "Hajipata" and PRML, but let's review it anyway. Doing it with exactly the same example would be boring, so, tedious as it is, I'll prepare a similar one myself *1.
First, suppose we have a dataset like the one below. In reality, only 16 of the 21 points generated according to a certain rule are plotted. We call these 16 points the "training data" and the remaining 5 points the "test data".
It looks roughly like an S-shaped curve, so let's try approximating (learning) it with, say, a cubic polynomial. The result looks like the following.
The accuracy still feels a little unsatisfying, though, so let's raise the degree and fit a degree-9 polynomial. The result is the blue line below.
Incidentally, if we compute the correlation coefficient between each of these two fits and the original 16 data points, we get 0.968 for the cubic polynomial and 0.986 for the degree-9 polynomial. So the degree-9 polynomial fits these 16 points more closely.
But is that really good enough? In fact, plotting all 21 data points gives the following.
Overlaying the prediction curve of the model fitted (learned) with a cubic polynomial and the prediction curve of the model fitted with a degree-9 polynomial gives this.
You can probably see it already. The cubic polynomial model fits the remaining 5 points almost as well, whereas the degree-9 polynomial model flies off in a completely wrong direction partway through.
The original 21 points were generated by adding normally distributed noise to a certain cubic function, so, unsurprisingly, the cubic polynomial fit predicts the remaining 5 test points, which "did not exist in the training data", more accurately. In other words, the best model is the cubic polynomial model, and the degree-9 polynomial model is a poor one.
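The experiment above can be sketched in R roughly as follows. This is only a minimal reconstruction under my own assumptions: the cubic coefficients (chosen to loosely resemble the lm3 estimates shown later), the noise level, the seed, and which 5 points are held out are all hypothetical, not the values actually used for the article.

```r
# Hypothetical stand-in for the article's data: 21 points from a cubic plus
# Gaussian noise. Coefficients, noise sd, seed and the split are assumptions.
set.seed(42)
x <- seq(0, 8, by = 0.4)                        # 21 x-values
f <- function(x) -12 + 35 * x - 12 * x^2 + x^3  # assumed "true" cubic signal
d <- data.frame(x = x, y = f(x) + rnorm(length(x), sd = 3.5))

test_idx <- sort(sample(nrow(d), 5))            # 5 held-out test points
train <- d[-test_idx, ]                         # 16 training points
test  <- d[test_idx, ]

# Ordinary least-squares polynomial fits of degree 3 and degree 9
fit3 <- lm(y ~ poly(x, 3), data = train)
fit9 <- lm(y ~ poly(x, 9), data = train)

# Mean squared error of a fitted model on a given data frame
mse <- function(fit, newdata) mean((newdata$y - predict(fit, newdata))^2)

res <- data.frame(
  degree    = c(3, 9),
  train_cor = c(cor(train$y, fitted(fit3)), cor(train$y, fitted(fit9))),
  train_mse = c(mse(fit3, train), mse(fit9, train)),
  test_mse  = c(mse(fit3, test),  mse(fit9, test))
)
print(res)
```

On the training points the degree-9 fit can never look worse than the cubic, because the cubic is a special case of it. The `test_mse` column is the one to compare.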
As "Hajipata" and PRML both describe, falling into the situation the degree-9 polynomial model is in is called "overfitting". On the other hand, the property the cubic polynomial model shows, namely "the fit to the training data is not necessarily the best, but the fit to unknown data (test data) is good", is called "generalization performance" (generalization ability).
As this example shows, when you want to select an appropriate model you should basically prefer one that "overfits as little as possible and generalizes well". Put differently, to get closer to the true model that generates the observed data (the model behind the data), you should adopt a model with good generalization performance, one that also fits unknown data (test data) properly.
The pitfall of valuing "explanation" over "prediction"
And yet there seem to be quite a few cases out there where a model is evaluated only by its accuracy on the training data. One reason I hear is that when "we care not about the predictive performance of the learned model but only about the relative size of its parameters (partial regression coefficients)", people tend to rely solely on accuracy on the training data (that is, the training error).
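In the example mentioned at the start, too, the authors apparently wanted the regression equation itself (the lineup of explanatory variables and their partial regression coefficients) to carry meaning, quite apart from predictive performance, and that seems to have led to looking at nothing but the training error.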
However, even if you only care about the relative size of the model parameters (partial regression coefficients), I think adopting an overfitted model with poor generalization performance is a serious problem. To illustrate, here are the cubic and degree-9 polynomial fits to the cubic-generated data from earlier, shown together with the parameters estimated by an ordinary linear regression model (least squares).
> summary(lm3)

Call:
lm(formula = y ~ V1 + V2 + V3, data = d)

Residuals:
    Min      1Q  Median      3Q     Max
-7.8075 -1.6179  0.7592  1.8565  5.8518

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -12.10054    2.86446  -4.224  0.00118 **
V1           35.24213    3.41855  10.309 2.58e-07 ***
V2          -12.15388    1.08009 -11.253 9.87e-08 ***
V3            1.02871    0.09454  10.882 1.43e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.559 on 12 degrees of freedom
Multiple R-squared:  0.9368,	Adjusted R-squared:  0.921
F-statistic: 59.28 on 3 and 12 DF,  p-value: 1.819e-07

> summary(lm9)

Call:
lm(formula = y ~ ., data = d)

Residuals:
    Min      1Q  Median      3Q     Max
-3.7139 -0.7857 -0.2104  0.8271  4.1045

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.389e+01  3.287e+00  -4.226  0.00553 **
V1           9.606e+01  4.339e+01   2.214  0.06879 .
V2          -2.145e+02  1.358e+02  -1.579  0.16530
V3           2.729e+02  1.694e+02   1.611  0.15836
V4          -1.871e+02  1.093e+02  -1.712  0.13773
V5           7.276e+01  4.047e+01   1.798  0.12229
V6          -1.664e+01  8.938e+00  -1.861  0.11199
V7           2.216e+00  1.163e+00   1.906  0.10529
V8          -1.591e-01  8.216e-02  -1.936  0.10099
V9           4.755e-03  2.431e-03   1.956  0.09829 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.289 on 6 degrees of freedom
Multiple R-squared:  0.973,	Adjusted R-squared:  0.9325
F-statistic: 24.03 on 9 and 6 DF,  p-value: 0.0004908
Needless to say, lm3 (the cubic polynomial model) returns parameter estimates that are close to the original cubic function, whereas lm9 (the degree-9 polynomial model) returns something close to nonsense. In other words, an overfitted model is useless even for parameter estimation.
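In the world of web marketing, parameter estimates from this kind of multiple regression are sometimes used to make decisions like "we should put more effort into campaign ○○" or "we should spend more on △△". Do that with an overfitted model and you can end up in the worst possible situation. Because of overfitting you believe campaign ○○ is effective, you pour a huge number of person-months into reworking the smartphone app, and when you finally release it, conversions (CV) don't grow at all. If it all turned out to be the fault of an inappropriate, overfitted model, you couldn't even cry about it.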
This is why, whatever your reasons, you should as far as possible judge a model's accuracy not only by its fit to the training data but also by its fit to unknown data (test data).
Why does this happen?
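This, too, is covered in detail in both "Hajipata" and PRML in the passage on the "bias-variance decomposition", but let's look at it in slightly plainer terms. First, writing the cubic polynomial model and the degree-9 polynomial model side by side gives this.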
Read naively, the cubic polynomial model is a model with 3 explanatory variables and the degree-9 polynomial is a model with 9. Meanwhile, the true cubic function that generated the original data had the following form.
In other words, the true model behind the original data has the structure "cubic polynomial signal + noise". This is the key point.
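In generic notation, the two fitted models and the "signal + noise" structure can be written as below. The symbols $\beta_j$, $a_j$, $\varepsilon$ and $\sigma$ are my own labels, and I am assuming that the regressors V1, ..., V9 in the R output correspond to $x, x^2, \dots, x^9$. The actual coefficients of the true cubic are not reproduced here.

```latex
\begin{align*}
\text{cubic model:}    \quad & y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \varepsilon \\
\text{degree-9 model:} \quad & y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_9 x^9 + \varepsilon \\
\text{true model:}     \quad & y = \underbrace{a_0 + a_1 x + a_2 x^2 + a_3 x^3}_{\text{signal}}
                               + \underbrace{\varepsilon}_{\text{noise}},
                               \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2)
\end{align*}
```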
First, a cubic polynomial model, which has the same degree as the true model, cannot fully fit the noise but can be expected to fit the signal closely. So what happens with the degree-9 polynomial model? It ends up "fitting both the signal and the noise". That is, "by raising the degree (adding explanatory variables), the model fitted not only the signal in the training data but the noise as well". Naturally, unknown data is expected to be generated along the true signal model with just a little noise on top *2, so there is no way a contorted model like that will predict it correctly.
In general, and not only for the polynomial fits used as the example here, it is said that the more explanatory variables you add beyond what is needed, the more the model fits the noise in the training data as well as the signal. As a test, following "Hajipata", I estimated models of degree 1, 2, ..., 9 on this polynomial data and plotted, for each degree, the MSE (mean squared error) on the 16 training points and the MSE on the 5 test points. The result is as follows.
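As we already saw, at degree 3 both the training error (on the training data) and the test error (on the test data) settle at a comfortably low level. On the other hand, especially once the degree goes to 5 and beyond, you can see that the training error keeps shrinking while the test error shoots up further and further.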
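So if you add explanatory variables without restraint, the training error does go down (the fit to the training data improves), but the test error actually starts rising partway through (the prediction accuracy on unknown test data gets worse). Now suppose that, as described above, you "value explanation over prediction", evaluate only the model's training error, and then keep piling on explanatory variables without thinking in order to push that training error down even further... that is the worst case (sweat). The model is nonsense, the parameter (partial regression coefficient) estimates are nonsense, and the damage becomes impossible to gauge.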
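The degree-by-degree comparison of training and test MSE described above can be sketched as follows. As before, this is a hedged reconstruction: the data-generating cubic, noise level, seed and train/test split are my own hypothetical choices, so the numbers will not match the article's plot.

```r
# Hypothetical data: cubic signal plus Gaussian noise (all values are assumptions)
set.seed(42)
x <- seq(0, 8, by = 0.4)
f <- function(x) -12 + 35 * x - 12 * x^2 + x^3
d <- data.frame(x = x, y = f(x) + rnorm(length(x), sd = 3.5))

test_idx <- sort(sample(nrow(d), 5))
train <- d[-test_idx, ]   # 16 training points
test  <- d[test_idx, ]    # 5 test points

# Mean squared error of a fitted model on a given data frame
mse <- function(fit, newdata) mean((newdata$y - predict(fit, newdata))^2)

# Fit polynomials of degree 1..9 on the training data only,
# then record the error on both the training and the test data
degrees <- 1:9
errs <- as.data.frame(t(sapply(degrees, function(k) {
  fit <- lm(y ~ poly(x, k), data = train)
  c(degree = k, train_mse = mse(fit, train), test_mse = mse(fit, test))
})))
print(errs)
```

Because each lower-degree polynomial is nested inside the higher-degree ones, `train_mse` can only go down as the degree grows. Whether and where `test_mse` turns upward depends on the particular data.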
Choosing a model with better generalization performance by cross validation
So how can we avoid this kind of trouble in actual modeling work? The most reliable way, it is said, is to evaluate the generalization performance of each model by cross validation.
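As "Hajipata" also mentions, there are various ways to do cross validation. For example, the hold-out method randomly splits the training data in two, trains the model on one part and evaluates its accuracy on the other. The k-folds method splits the training data into k parts, trains the model on k-1 of them, evaluates accuracy on the remaining one, and repeats this k times. The Leave-One-Out method removes one sample from the training data, trains the model on the rest, compares the model's prediction with the held-out sample, and repeats this as many times as there are samples. There are others as well, such as the bootstrap method and the Out-Of-Bag method used in random forests.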
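As a rough illustration of the k-folds procedure described above, here is a minimal base-R sketch. The helper name `kfold_mse`, the choice of k = 4, and the simulated data are all my own assumptions for illustration, not something taken from the article or from a particular package.

```r
# Hypothetical training data: 16 of 21 points from a cubic plus noise (assumptions)
set.seed(1)
x <- seq(0, 8, by = 0.4)
f <- function(x) -12 + 35 * x - 12 * x^2 + x^3
d <- data.frame(x = x, y = f(x) + rnorm(length(x), sd = 3.5))
train <- d[-sort(sample(nrow(d), 5)), ]   # keep 16 training points

# k-fold cross-validated MSE of a degree-`degree` polynomial regression.
# Each point is predicted by a model that never saw it during fitting.
kfold_mse <- function(data, degree, k = 4) {
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))  # random fold labels
  sq_err <- numeric(nrow(data))
  for (j in seq_len(k)) {
    held <- folds == j
    fit <- lm(y ~ poly(x, degree), data = data[!held, ])
    sq_err[held] <- (data$y[held] - predict(fit, data[held, ]))^2
  }
  mean(sq_err)
}

cv3 <- kfold_mse(train, degree = 3)
cv9 <- kfold_mse(train, degree = 9)
print(c(degree3 = cv3, degree9 = cv9))
```

Setting `k = nrow(data)` turns this into Leave-One-Out, and using a single random split instead of the loop gives the hold-out method.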
For this simple example, let's try evaluating the models with the Leave-One-Out method. For each of the 16 training points, we compute a prediction from a model estimated on the other 15 points, then plot those predictions and compare.
# Compute Leave-One-Out predictions on the held-out cross-validation points
# for the cubic polynomial model and the degree-9 polynomial model
> lm3_vec<-rep(0,16)
> for (i in 1:16){
+ 	tmp<-lm(y~V1+V2+V3,d[-i,c(1:3,10)])
+ 	lm3_vec[i]<-predict(tmp,newdata=d[i,1:3])
+ }
> lm9_vec<-rep(0,16)
> for (i in 1:16){
+ 	tmp<-lm(y~.,d[-i,])
+ 	lm9_vec[i]<-predict(tmp,newdata=d[i,-10])
+ }
# Compare with the training data
> plot(x,y,cex=4,xlim=c(0,8),ylim=c(-80,120))
> par(new=T)
> plot(x,lm3_vec,cex=4,xlim=c(0,8),ylim=c(-80,120),col='red')
> par(new=T)
> plot(x,lm9_vec,cex=4,xlim=c(0,8),ylim=c(-80,120),col='blue')
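The red points of the cubic polynomial model track the training data almost exactly, whereas some of the blue points of the degree-9 polynomial model fly off in a completely wrong direction. From this alone it should be easy to see that the degree-9 polynomial model is overfitting and is inappropriate.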
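So by doing cross validation, even when the training data is all you have at hand, you can choose a model that is not overfitted and has higher generalization performance. In practice the Leave-One-Out method is computationally heavy and cannot always be used, so you would choose the hold-out or k-folds method as appropriate.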
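As a side note on the cost of Leave-One-Out: for ordinary least-squares regression specifically, the held-out residual of point i equals e_i / (1 - h_ii), where e_i is the ordinary residual and h_ii the leverage from the full fit, so no refitting is needed. The sketch below checks this identity against an explicit loop on simulated data. The data, seed and function names (`loo_mse_loop`, `loo_mse_hat`) are my own hypothetical choices, and the shortcut does not carry over to models that are not fitted by least squares.

```r
# Hypothetical training data: 16 of 21 points from a cubic plus noise (assumptions)
set.seed(7)
x <- seq(0, 8, by = 0.4)
f <- function(x) -12 + 35 * x - 12 * x^2 + x^3
d <- data.frame(x = x, y = f(x) + rnorm(length(x), sd = 3.5))
train <- d[-sort(sample(nrow(d), 5)), ]

# Explicit Leave-One-Out: refit the model n times, once per held-out point
loo_mse_loop <- function(data, degree) {
  err <- sapply(seq_len(nrow(data)), function(i) {
    fit <- lm(y ~ poly(x, degree), data = data[-i, ])
    data$y[i] - predict(fit, data[i, ])
  })
  mean(err^2)
}

# Shortcut valid for least squares: a single fit plus the leverages h_ii
loo_mse_hat <- function(data, degree) {
  fit <- lm(y ~ poly(x, degree), data = data)
  mean((residuals(fit) / (1 - hatvalues(fit)))^2)
}

loo3_loop <- loo_mse_loop(train, 3)
loo3_hat  <- loo_mse_hat(train, 3)
loo9_hat  <- loo_mse_hat(train, 9)
print(c(loop_deg3 = loo3_loop, hat_deg3 = loo3_hat, hat_deg9 = loo9_hat))
```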
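Another idea is to prune unnecessary explanatory variables with L1 regularization. For regularization, see my earlier blog post referenced below. Here is the result of actually trying it. Note that when choosing the regularization parameter here as well, the cv.glmnet function picks the optimal value based on the cross-validation error.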
> lm_regL1<-cv.glmnet(as.matrix(d[,-10]),y,family='gaussian',alpha=1)
> plot(lm_regL1)
> coef(lm_regL1,s=lm_regL1$lambda.min)
10 x 1 sparse Matrix of class "dgCMatrix"
                       1
(Intercept) -6.811253828
V1          21.132445191
V2          -5.071913288
V3           .
V4           .
V5           0.005820756
V6           .
V7           .
V8           .
V9           .
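For some reason the cubic term has been dropped and a degree-5 term has crept in instead (sweat). It doesn't seem to work very well for this example. That said, it did work well on the tennis Grand Slam data and the heating and cooling efficiency data in my past posts, so it's worth remembering as a fairly general-purpose technique.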
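For reference, with family='gaussian' and alpha=1 the problem glmnet solves is, as far as I understand its documentation, the lasso objective below. The notation ($N$ samples, coefficient vector $\beta$, penalty weight $\lambda$) is my own shorthand, and glmnet's default standardization of the predictors is left out of this sketch.

```latex
\min_{\beta_0,\,\beta}\;
  \frac{1}{2N} \sum_{i=1}^{N} \bigl( y_i - \beta_0 - x_i^{\top}\beta \bigr)^2
  \;+\; \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

The absolute-value penalty is what pushes some coefficients to exactly zero (the "." entries in the output above), and `lambda.min` is the value of $\lambda$ with the smallest cross-validation error.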
In closing
What I've described so far is of course not limited to an elementary topic like polynomial fitting. It applies to nearly all statistical modeling and/or machine learning models: linear regression (multiple regression), generalized linear models (logistic regression and so on), support vector machines (SVM), random forests, Xgboost, and even Deep Learning. With essentially the same procedure as above, you can avoid overfitting and improve generalization performance.
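That said, for statistical modeling methods there are also ways to evaluate generalization performance analytically using information criteria such as AIC and BIC... but my personal impression is that those alone are no match for cross validation. So I think it's better to cross-validate whenever you can.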
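For models fitted with lm, the information criteria mentioned above are available through the base-R functions AIC() and BIC(). The following is a small sketch on simulated data. The data-generating cubic, noise level and seed are my own hypothetical choices, and which degree each criterion ends up preferring depends on the particular sample.

```r
# Hypothetical training data: 16 of 21 points from a cubic plus noise (assumptions)
set.seed(42)
x <- seq(0, 8, by = 0.4)
f <- function(x) -12 + 35 * x - 12 * x^2 + x^3
d <- data.frame(x = x, y = f(x) + rnorm(length(x), sd = 3.5))
train <- d[-sort(sample(nrow(d), 5)), ]

# Fit polynomial regressions of degree 1..9 and tabulate AIC and BIC
fits <- lapply(1:9, function(k) lm(y ~ poly(x, k), data = train))
ic <- data.frame(
  degree = 1:9,
  AIC    = sapply(fits, AIC),
  BIC    = sapply(fits, BIC)
)
print(ic)

# Degree preferred by each criterion (smaller is better)
best_aic <- ic$degree[which.min(ic$AIC)]
best_bic <- ic$degree[which.min(ic$BIC)]
print(c(best_aic = best_aic, best_bic = best_bic))
```

Both criteria use only the training fit plus a penalty on the number of parameters, which is why they are cheap to compute and also why I would still cross-check them against cross validation.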
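Also, and this is a rather unpleasant point, there are cases where "cross-validating does not guarantee generalization performance". In particular, when the training data and the genuinely new unknown data (not the test data) have completely different properties, even a model with high generalization performance cannot cope. Datasets like that show up on Kaggle now and then and cause controversy, and the same thing is not rare in practical work either.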
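A similar but distinct pattern is the case where "the model stays overfitted even though you cross-validated". This happens especially in high-dimensional cases where the number of explanatory variables (columns) is large relative to the sample size (rows), and when the dimensionality is so high that visualization is difficult, you may not even be able to tell whether overfitting is occurring. So, better safe than sorry.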
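Note that this post merely retraces, in a super casual way, the "overfitting / generalization performance" passage that appears in machine learning textbooks such as Chapter 2 of "Hajipata", PRML, and the "Castella book" (the nickname of the Japanese edition of The Elements of Statistical Learning). Theoretical analysis of generalization error, such as VC theory, is not in scope here at aaaall, so please bear with me. To be honest I still don't understand VC theory at all, so someone please teach me (tears). And of course, as always, "learning by getting flamed" is very welcome, so if anything here is wrong, please don't hesitate to point it out.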
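Incidentally, once you get into more in-depth theoretical topics in machine learning, having a grasp of this kind of theoretical analysis of generalization error should be important, so I'm thinking I might study it one of these days... *3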
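*1: Though the polynomial degrees are the same, lol
*2: This is an assumption generally taken as a premise in machine learning, so cases where it does not hold, such as those involving random effects, are set aside here for now
*3: As a rule, when I say this kind of thing I never actually study it on my own initiative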