This post concerns the claim by Cosma Shalizi of Carnegie Mellon University that "R-squared is useless."
Let's walk through the simulations by Clay Ford of the University of Virginia Library.
On Thursday, October 16, 2015, a disbelieving student posted on Reddit: "My stats professor just went on a rant about how R-squared values are essentially useless. Is there any truth to this?" The post drew a fair amount of attention, at least compared with other statistics posts on Reddit.
The student's statistics professor turned out to be Cosma Shalizi of Carnegie Mellon University. Shalizi makes his lecture materials freely available, so we can see exactly what he was "ranting" about. It begins in Section 3.2 of his Lecture 10 notes.
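In case you have forgotten, R-squared is a statistic that often accompanies regression output. It takes values from 0 to 1 and is usually interpreted as summarizing the percentage of variation in the response that the regression model explains. An R-squared of 0.65 would therefore mean that the model explains about 65% of the variation in the dependent variable. Following this logic, we prefer regression models with a high R-squared. Shalizi, however, disputes this logic with convincing arguments.
In R, we usually obtain R-squared by calling the summary function on a model object. Here is a quick example using simulated data.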
x <- 1:20                        # independent variable
set.seed(1)                      # for reproducibility
y <- 2 + 0.5*x + rnorm(20,0,3)   # dependent variable; a function of x plus random error
mod <- lm(y~x)                   # simple linear regression
summary(mod)$r.squared           # request just the R-squared value
Result:
[1] 0.6026682
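R-squared can be expressed as the sum of squared deviations of the fitted values divided by the sum of squared deviations of the original values, that is, R² = Σ(ŷᵢ − ȳ)² / Σ(yᵢ − ȳ)².
We can calculate it directly from the model object like this.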
f <- mod$fitted.values        # extract fitted (predicted) values from the model
mss <- sum((f - mean(f))^2)   # sum of squared deviations of the fitted values
tss <- sum((y - mean(y))^2)   # sum of squared deviations of the original values
mss/tss                       # R-squared
Result:
[1] 0.6026682
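The same number can also be reached as one minus the residual sum of squares over the total sum of squares. The following is a minimal sketch of that check, assuming a least-squares fit with an intercept (the identity does not hold in general without one). The names `rss` and `r2_alt` are ours and do not appear in the original article.

```r
set.seed(1)                          # same seed as the example above
x <- 1:20
y <- 2 + 0.5*x + rnorm(20, 0, 3)
mod <- lm(y ~ x)

rss <- sum(residuals(mod)^2)         # residual sum of squares
tss <- sum((y - mean(y))^2)          # total sum of squares
r2_alt <- 1 - rss/tss                # alternative form of R-squared
all.equal(r2_alt, summary(mod)$r.squared)
```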
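Now let's take a few of Shalizi's statements about R-squared and demonstrate them with simulations in R.
1. R-squared does not measure goodness of fit.
It can be arbitrarily low when the model is completely correct. By making σ² large, we can drive R-squared toward 0 even when every assumption of the simple linear regression model is correct in every particular.
What is σ²? When we run a linear regression, we assume that the model almost predicts the dependent variable. The difference between "almost" and "exactly" is assumed to be a draw from a normal distribution with mean 0 and a variance we call σ².
Shalizi's statement is easy to demonstrate. We write a function that (1) generates data satisfying the assumptions of simple linear regression (independent observations, normally distributed errors with constant variance), (2) fits a simple linear model to the data, and (3) reports the R-squared. Note that, for simplicity, sigma is the only parameter. We then "apply" this function to a series of increasing σ values and plot the results.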
r2.0 <- function(sig){
  x <- seq(1,10,length.out = 100)          # our predictor
  y <- 2 + 1.2*x + rnorm(100,0,sd = sig)   # our response; a function of x plus random noise
  summary(lm(y ~ x))$r.squared             # return the R-squared value
}
sigmas <- seq(0.5,20,length.out = 20)
rout <- sapply(sigmas, r2.0)               # apply the function to a series of sigma values
plot(rout ~ sigmas, type="b")
Sure enough, R-squared drops sharply as sigma increases, even though the model is completely correct in every respect.
2. R-squared can be arbitrarily close to 1 when the model is totally wrong.
Again, the point is that R-squared does not measure goodness of fit. Here we use code from a different section of Shalizi's Lecture 10 notes to generate non-linear data.
set.seed(1)
x <- rexp(50,rate=0.005)                     # the predictor is drawn from an exponential distribution
y <- (x-1)^2 * runif(50, min=0.8, max=1.2)   # non-linear data generation
plot(x,y)                                    # confirm that the relationship is clearly non-linear
Now check the R-squared.
summary(lm(y ~ x))$r.squared
[1] 0.8485146
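It is very high at about 0.85, yet the model is completely wrong. Using R-squared to justify the "goodness" of the model in this example would be a mistake. Ideally, one would plot the data first and recognize that a simple linear regression is inappropriate here.
3. R-squared says nothing about prediction error, even with σ² exactly the same and no change in the coefficients.
R-squared can be anywhere between 0 and 1 just by changing the range of X. As a measure of prediction error, we are better off using the mean squared error (MSE).
MSE is essentially the fitted y values minus the observed y values, squared, summed, and divided by the number of observations.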
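The description above can be sketched as a small function. `mse` is a hypothetical helper of ours, not something from the original article or from base R, and it computes the in-sample MSE only (it divides by the number of observations, not by the residual degrees of freedom).

```r
# Hypothetical helper (not from the original article): in-sample MSE of a fitted model
mse <- function(model) {
  mean(residuals(model)^2)   # squared residuals, averaged over the observations
}

set.seed(1)
x <- seq(1, 10, length.out = 100)
y <- 2 + 1.2*x + rnorm(100, 0, sd = 0.9)
fit <- lm(y ~ x)
mse(fit)                     # same quantity as sum((fitted(fit) - y)^2)/100
```

Averaging the squared residuals is the same arithmetic as summing them and dividing by the number of observations, so the helper only saves typing.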
Let's demonstrate this statement by first generating data that meet all the assumptions of simple linear regression and then regressing y on x to assess both R-squared and MSE.
x <- seq(1,10,length.out = 100)
set.seed(1)
y <- 2 + 1.2*x + rnorm(100,0,sd = 0.9)
mod1 <- lm(y ~ x)
summary(mod1)$r.squared
[1] 0.9383379
sum((fitted(mod1) - y)^2)/100   # mean squared error
[1] 0.6468052
Now repeat the code above, but this time with a different range of x.
x <- seq(1,2,length.out = 100)   # new range of x
set.seed(1)
y <- 2 + 1.2*x + rnorm(100,0,sd = 0.9)
mod1 <- lm(y ~ x)
summary(mod1)$r.squared
[1] 0.1502448
sum((fitted(mod1) - y)^2)/100   # mean squared error
[1] 0.6468052
R-squared falls from 0.94 to 0.15, but the MSE stays the same. In other words, the predictive ability is the same for both data sets, yet R-squared would lead you to believe that the first example somehow had a model with more predictive power.
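One way to see why the range of x matters: under the simple linear model, R-squared is approximately β₁²·s²ₓ / (β₁²·s²ₓ + σ²), where s²ₓ is the sample variance of x. The sketch below plugs in the slope and noise level used in the two simulations. `approx_r2` is a hypothetical helper name of ours, and the expression is a large-sample approximation, not the exact value in any single sample.

```r
# Hypothetical helper: approximate R-squared implied by the slope, the spread of x, and the noise sd
approx_r2 <- function(beta, x, sigma) {
  signal <- beta^2 * var(x)          # variance in y attributable to x
  signal / (signal + sigma^2)        # share of the total variance
}

r2_wide   <- approx_r2(1.2, seq(1, 10, length.out = 100), 0.9)
r2_narrow <- approx_r2(1.2, seq(1, 2,  length.out = 100), 0.9)
c(wide = r2_wide, narrow = r2_narrow)
```

The two values come out at roughly 0.93 and 0.13, close to the simulated 0.94 and 0.15. The same expression also shows why increasing σ alone pushes R-squared toward 0 while the model stays correct.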
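4. R-squared cannot be compared between a model with untransformed Y and one with transformed Y, or between different transformations of Y.
Let's examine this by generating data for which a transformation is appropriate. The R code below is very similar to our earlier efforts, but note that this time the y variable is exponentiated.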
x <- seq(1,2,length.out = 100)
set.seed(1)
y <- exp(-2 - 0.09*x + rnorm(100,0,sd = 2.5))
summary(lm(y ~ x))$r.squared
[1] 0.003281718
plot(lm(y ~ x), which=3)
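R-squared is very low, and the diagnostic plot (with which=3, the scale-location plot of the square root of the absolute standardized residuals against the fitted values) reveals outliers and non-constant variance. A common fix for this problem is to log-transform the data. Let's try that and see what happens.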
plot(lm(log(y)~x),which = 3)
The diagnostic plot looks much better, and our assumption of constant variance now appears to be met. But look at the R-squared.
summary(lm(log(y)~x))$r.squared
[1] 0.0006921086
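It is even lower! This is an extreme case, and it does not always turn out this way. In fact, a log transformation usually increases R-squared. But as just shown, better-satisfied assumptions do not necessarily produce a higher R-squared. R-squared therefore cannot be compared between such models.
5. It is very common to say that R-squared is "the fraction of variance explained" by the regression. Yet if we regressed X on Y, we would get exactly the same R-squared. This alone should be enough to show that a high R-squared says nothing about explaining one variable by another.
This is the easiest statement to demonstrate.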
x <- seq(1,10,length.out = 100)
y <- 2 + 1.2*x + rnorm(100,0,sd = 2)
summary(lm(y ~ x))$r.squared
[1] 0.7065779
summary(lm(x ~ y))$r.squared
[1] 0.7065779
Does x explain y, or does y explain x? Are we saying "explain" to dance around the word "cause"? In a simple two-variable scenario like this one, R-squared is just the square of the correlation between x and y.
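r2 <- c(summary(lm(y ~ x))$r.squared, summary(lm(x ~ y))$r.squared)
# compare both R-squared values with the squared correlation
# (all.equal() treats a third positional argument as the tolerance, so it cannot compare three values at once)
isTRUE(all.equal(rep(cor(x,y)^2, 2), r2))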
[1] TRUE
In that case, why not just use correlation instead of R-squared? Then again, correlation summarizes linear relationships, which may not be appropriate for the data. This is another situation where plotting the data is strongly recommended.
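As a small illustration of that caveat (our own toy example, not from the original article), the sketch below builds a y that is completely determined by x and still has essentially zero correlation with it, because the relationship is not linear.

```r
x <- seq(-3, 3, length.out = 101)    # symmetric around zero
y <- x^2                             # y is completely determined by x
r <- cor(x, y)                       # linear correlation is essentially zero
r2 <- summary(lm(y ~ x))$r.squared   # and so is the R-squared of the straight-line fit
c(correlation = r, r.squared = r2)
```

A scatterplot of x against y makes the parabola obvious at a glance, which neither number does.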
Let's recap.
- R-squared does not measure goodness of fit.
- R-squared does not measure prediction error.
- R-squared cannot be used to compare models with transformed responses.
- R-squared does not measure how one variable explains another.
Shalizi gives even more reasons in his lecture notes. It should also be noted that adjusted R-squared does nothing to address any of these problems.
So is there any reason to use R-squared at all? Shalizi says no ("I have never found a situation where it helped at all"). No doubt some statisticians and Redditors will disagree. Whatever your view, if you do use R-squared in a data analysis, it would be wise to double-check that it is telling you what you think it is telling you.