This article is a follow-up to the previous one, which somehow racked up an unexpected number of Hatena bookmarks.
The ASA's press release and statement do in fact include a comment to the effect that, as examples of "new approaches that do not rely on p-values," one should use things such as approaches that emphasize prediction*5, Bayesian modeling, decision-theoretic approaches*6, and the false discovery rate*7. That said, heavy multivariate modeling such as multiple regression or machine learning (with a large sample size to match) is one thing, but I suspect many people naturally wonder how any of this is supposed to work in the small-sample (data-poor) situations where statistical hypothesis testing is typically used.
So I actually tried replacing several of the usual statistical tests with Bayesian modeling in Stan, and in this article I'd like to walk through the results. The topics are the two kinds of test taken up in one section of the earlier article.
That said, this time I'll only cover the t-test and the chi-square test; ANOVA can be replaced by a linear regression model in the first place, so I'll skip it here.
Trying a (paired) t-test with Bayesian modeling
@TJO_datasci https://t.co/JTkxUd3J4q
— @SuperOyasumi, November 27, 2016
As this tweet points out ("the first Stan code in the article seems to take the form of a paired t-test rather than a plain t-test; what do you think?"), the Stan code below does indeed amount to a paired t-test. A way to turn it into an unpaired version is appended at the end of this section.
My first reaction was: wait, how do you even express a t-test as a model? But on reflection it is just modeling the "difference of means," so that turns out to be enough.
In short, if the sampling result shows that the posterior distribution of the difference between the two datasets sits sufficiently far from zero, we can infer that there is a real difference between them. Expressed in Stan code, it looks like this:
data {
  int<lower=0> N;
  real<lower=0> x1[N];
  real<lower=0> x2[N];
}
parameters {
  real s;
  real m;
}
model {
  for (i in 1:N)
    x2[i] ~ normal(x1[i] + m, s);
}
Save this Stan code under the name ttest.stan, then kick it with the following R code:
> d<-read.csv('https://raw.githubusercontent.com/ozt-ca/tjo.hatenablog.samples/master/r_samples/public_lib/DM_sampledata/ch3_2_2.txt',header=T,sep=' ')
> dat1<-list(N=nrow(d),x1=d$DB1,x2=d$DB2)
> fit1<-stan(file='ttest.stan',data=dat1,iter=1000,chains=4)

SAMPLING FOR MODEL 'ttest' NOW (CHAIN 1).

Chain 1, Iteration:    1 / 1000 [  0%]  (Warmup)
# ...
Chain 1, Iteration: 1000 / 1000 [100%]  (Sampling)#
#  Elapsed Time: 0.008053 seconds (Warm-up)
#                0.007259 seconds (Sampling)
#                0.015312 seconds (Total)
#
# ... omitted ...

SAMPLING FOR MODEL 'ttest' NOW (CHAIN 4).

Chain 4, Iteration:    1 / 1000 [  0%]  (Warmup)
# ...
Chain 4, Iteration: 1000 / 1000 [100%]  (Sampling)#
#  Elapsed Time: 0.007944 seconds (Warm-up)
#                0.008107 seconds (Sampling)
#                0.016051 seconds (Total)
#
# ... omitted ...

> fit1
Inference for Stan model: ttest.
4 chains, each with iter=1000; warmup=500; thin=1; 
post-warmup draws per chain=500, total post-warmup draws=2000.

      mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
s     0.73    0.01 0.15  0.50  0.62  0.70  0.81  1.08   577 1.01
m     0.72    0.01 0.19  0.35  0.59  0.72  0.84  1.09   824 1.00 # <- here!
lp__ -1.78    0.05 1.08 -4.57 -2.29 -1.43 -0.97 -0.67   509 1.00

Samples were drawn using NUTS(diag_e) at Wed Mar  9 22:06:55 2016.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

> fit1.coda<-mcmc.list(lapply(1:ncol(fit1),function(x) mcmc(as.array(fit1)[,x,])))
> plot(fit1.coda)
So: the 2.5% point of the posterior distribution of m (the mean difference) is greater than 0, and the sampling can be regarded as having converged properly, with a clean unimodal distribution. It therefore seems fair to say that the difference between the means of the two datasets is sufficiently larger than 0.
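If you prefer to read these quantities off programmatically rather than from the printed summary, a minimal sketch along the following lines (my addition, assuming the fit1 object from the session above and the rstan package) extracts the posterior draws of m and checks the 95% credible interval and the posterior probability that m > 0:

library(rstan)

# Post-warmup draws of the mean-difference parameter m (pooled across chains)
m_draws <- extract(fit1, pars = "m")$m

# 95% credible interval of the mean difference
quantile(m_draws, probs = c(0.025, 0.975))

# Posterior probability that the difference is positive
mean(m_draws > 0)

If the lower quantile stays above 0, that is exactly the "2.5% point greater than 0" criterion used above.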
Addendum (Mar 15, 2017)
If you swap in the following Stan script, the test becomes unpaired.
data {
  int<lower=0> N;
  real<lower=0> x1[N];
  real<lower=0> x2[N];
}
parameters {
  real s;
  real m;
}
model {
  real q[N];
  for (i in 1:N)
    q[i] <- x2[i] - x1[i];
  q ~ normal(m, s);
}

> d<-read.csv('https://raw.githubusercontent.com/ozt-ca/tjo.hatenablog.samples/master/r_samples/public_lib/DM_sampledata/ch3_2_2.txt',header=T,sep=' ')
> dat1<-list(N=nrow(d),x1=d$DB1,x2=d$DB2)
> fit1<-stan(file='ttest.stan',data=dat1,iter=1000,chains=4)
> fit1
Inference for Stan model: ttest.
4 chains, each with iter=1000; warmup=500; thin=1; 
post-warmup draws per chain=500, total post-warmup draws=2000.

      mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
s     0.73    0.01 0.18  0.49  0.61  0.70  0.80  1.18   224 1.02
m     0.74    0.01 0.20  0.34  0.61  0.74  0.85  1.16   424 1.01 # <- here!
lp__ -1.87    0.11 1.44 -6.02 -2.24 -1.41 -0.96 -0.67   181 1.03

Samples were drawn using NUTS(diag_e) at Wed Mar 15 16:45:13 2017.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

> fit1.coda<-mcmc.list(lapply(1:ncol(fit1),function(x) mcmc(as.array(fit1)[,x,])))
> plot(fit1.coda)

Compared with the paired case the posterior is somewhat wider, but the 2.5% point still exceeds 0, so it still looks fine.
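For reference, one could also cross-check against the classical t-tests on the same two columns; this is a side-by-side comparison of my own (assuming the data frame d loaded above), not part of the original analysis:

# Paired t-test, the classical counterpart of the first Stan model
t.test(d$DB2, d$DB1, paired = TRUE)

# Welch's (unpaired) t-test, the counterpart of the addendum version
t.test(d$DB2, d$DB1)

The point estimates and confidence intervals from these should end up in roughly the same place as the posterior summaries of m above, which makes for a handy sanity check.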
Trying a chi-square test with Bayesian modeling
This one is a bit more involved than the t-test, which we could handle simply by modeling the difference of means: here we model within a binomial-logit framework. Written out as formulas, it looks something like this:

  cv[i] ~ Binomial(cv[i] + ncv[i], p[i])
  p[i] = 1 / (1 + exp(-(a + b*iv[i] + cl[i])))

(where cv is the number of conversions (CVs), ncv the number of non-conversions, p the probability parameter of the binomial distribution, a the constant term, b the intervention-effect parameter, iv a binary parameter indicating whether the intervention was applied, cl the individual difference for each intervention trial, and the second line is the so-called inverse logit function)
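To make the second line concrete, here is a tiny R sketch of the inverse logit link (the values of a, b and cl below are hypothetical, purely for illustration):

# Inverse logit: maps a linear predictor onto a probability in (0, 1)
inv_logit <- function(z) 1 / (1 + exp(-z))   # identical to plogis(z)

a <- -1.7; b <- 0.9; cl <- 0                 # hypothetical parameter values
inv_logit(a + b * 0 + cl)                    # expected CVR without the intervention (iv = 0)
inv_logit(a + b * 1 + cl)                    # expected CVR with the intervention (iv = 1)

A positive b therefore means the intervention arm has a higher conversion probability.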
In this case the key quantity is the intervention-effect parameter b: if its posterior distribution is sufficiently far from 0, we can infer that the intervention lifted the CV rate. As for the computation in Stan, the Stan code used for the chi-square meta-analysis can actually be reused as-is.
Note, however, that cl in the code below, which represents the per-trial individual differences (the individual-difference term in the model formula above), should essentially be 0 anyway*1 and is not really needed; if you dislike needlessly inflating the computational load, feel free to delete it. The sampling results will probably change slightly, but the final conclusion should not be affected.
data {
  int<lower=0> N;
  int<lower=0> ncv[N];
  int<lower=0> cv[N];
  real<lower=0,upper=1> iv[N];
}
parameters {
  real cl[N/2];
  real s;
  real a;
  real b;
}
model {
  real p[N];
  s ~ uniform(0, 1e4);
  cl ~ normal(0, s);
  for (i in 1:N/2) {
    p[2*i-1] <- inv_logit(a + b*iv[2*i-1] + cl[i]);
    p[2*i]   <- inv_logit(a + b*iv[2*i]   + cl[i]);
  }
  for (i in 1:N)
    cv[i] ~ binomial(ncv[i] + cv[i], p[i]);
}
As in the previous article, save it under the name binom_hb_gen.stan, then kick it with the following R code:
> x<-data.frame(ncv=c(117,32),cv=c(25,16),iv=c(0,1))
> dat2<-list(N=2,ncv=x$ncv,cv=x$cv,iv=x$iv)
> fit2<-stan(file='binom_hb_gen.stan',data=dat2,iter=1000,chains=4)

SAMPLING FOR MODEL 'binom_hb_gen' NOW (CHAIN 1).

Chain 1, Iteration:    1 / 1000 [  0%]  (Warmup)
# ...
Chain 4, Iteration: 1000 / 1000 [100%]  (Sampling)#
#  Elapsed Time: 2.40675 seconds (Warm-up)
#                2.91676 seconds (Sampling)
#                5.32351 seconds (Total)
#
# ... omitted ...

> fit2
Inference for Stan model: binom_hb_gen.
4 chains, each with iter=1000; warmup=500; thin=1; 
post-warmup draws per chain=500, total post-warmup draws=2000.

         mean se_mean     sd     2.5%     25%     50%     75%   97.5% n_eff Rhat
cl[1] -283.14  275.47 560.17 -1371.49 -688.51 -158.22   63.54  768.06     4 1.42
s      902.56  325.75 808.28    41.42  279.46  671.98 1297.48 3132.93     6 1.79
a      281.58  275.48 560.18  -769.40  -65.01  156.69  686.87 1370.13     4 1.42
b        0.87    0.04   0.40     0.13    0.58    0.88    1.15    1.68    83 1.03 # <- here!
lp__  -104.30    0.28   1.37  -107.43 -105.07 -104.29 -103.44 -101.66    24 1.18

Samples were drawn using NUTS(diag_e) at Wed Mar  9 22:08:31 2016.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

> fit2.coda<-mcmc.list(lapply(1:ncol(fit2),function(x) mcmc(as.array(fit2)[,x,])))
> plot(fit2.coda)
So: the 2.5% point of the posterior distribution of the intervention-effect parameter b is greater than 0, and the sampling can be regarded as having converged to a more or less reasonable unimodal distribution*2. It therefore seems fair to say that the effect of the intervention is sufficiently larger (than 0).
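As a rough sanity check on the same 2x2 table, the classical counterparts can be run on the same counts as in x above; this comparison is my own addition, not part of the original analysis:

# 2x2 table: rows = intervention off / on, columns = CV / non-CV
tab <- matrix(c(25, 117,
                16,  32), nrow = 2, byrow = TRUE)

# Pearson's chi-square test of independence
chisq.test(tab)

# Equivalent two-sample test of proportions (CVR of 25/142 vs 16/48)
prop.test(x = c(25, 16), n = c(142, 48))

Both address the same question as the posterior of b, namely whether the conversion rate differs between the two arms.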
Thoughts after trying this
At bottom, all of this is just a fiddly methodology for explicitly producing a confidence interval, so in most cases I suspect you could simply report a CI and be done with it. That said, there is of course the question of what to do in cases where a CI is hard to obtain, and that remains an open issue.
And frankly, dragging out Stan and doing full Bayesian modeling for tests of this scale is complete overkill, if I'm honest. Still, in some situations you may need to show properly modeled results like this rather than just running tests mechanically, and with that in mind it doesn't hurt to have the approach in your back pocket.