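As a rule, most data analysis doesn't call for particularly advanced techniques. I find myself saying this out loud all the time: "It's only natural that statistics and machine learning rarely get their turn." They cost man-hours, and if you can get away without them, you're better off. Frankly, the cases where simple arithmetic is enough are probably the majority.
So in day-to-day work it should be overwhelmingly easier and quicker to finish the aggregation on the DB with SQL (or rather Hive and the like), using nothing but the four arithmetic operations, and display only the result. I think many BI tools are built on this idea.
And yet it is known that there are cases where "the result of aggregating with simple arithmetic" and "the result of an analysis that makes full use of data science" somehow disagree. I think these are on the rare side, but neglecting the contradiction often leads to serious trouble, so this time I'll write about just a small subset of them: in what kind of situation, and why, the contradiction arises.
(Note: there is also a sequel, "Why running a multivariate analysis (multiple regression) returns more accurate results than simply aggregating item by item".)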
Cases where a "combination" has a strong influence
Main text
I've put the sample data up on GitHub. It's hand-made sample data, so the results may be a bit iffy*1; please bear with that in advance. If you want to follow along in R, install the R packages {randomForest}, {arules}, and {arulesViz}*2.
Picture it as a behavior log from some e-commerce site. a1 through a7 record whether each user action occurred, as 0 or 1, and cv is whether the user converted (Yes = converted, No = did not). I made the data by hand, so CVR is exactly 50%. Tabulating the percentage of users who took each action, split by CV, gives:
a1 | a2 | a3 | a4 | a5 | a6 | a7 | CV |
---|---|---|---|---|---|---|---|
40.1% | 58.3% | 47.9% | 94.2% | 30.7% | 5.6% | 50.0% | No |
60.5% | 41.7% | 49.4% | 43.6% | 68.4% | 92.7% | 49.3% | Yes |
so you get a rough idea of which metrics affect CVR. For example, a6 seems to contribute enormously to pushing users toward CV = "Yes", and a4 conversely seems to push strongly toward CV = "No". And although the difference is not a blatant one*3, a7 looks like it contributes to pushing toward CV = "No".
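The cross-tabulation above can be reproduced with base R alone. This is a minimal sketch, assuming a data frame shaped like sample_d; the `toy` data frame, its two flag columns, and its sample size are made up for illustration and are not the author's data.

```r
# Hypothetical stand-in for sample_d: 0/1 behaviour flags plus a Yes/No outcome.
set.seed(1)
n <- 200
toy <- data.frame(
  a1 = rbinom(n, 1, 0.5),
  a2 = rbinom(n, 1, 0.5),
  cv = factor(rep(c("No", "Yes"), each = n / 2))
)

# Percentage of users with each flag set, computed separately for cv = No / Yes.
flags <- toy[, names(toy) != "cv"]
pct_by_cv <- aggregate(flags, by = list(CV = toy$cv),
                       FUN = function(x) 100 * mean(x))
print(pct_by_cv)
```

With the real data, replacing `toy` with sample_d should give one row per CV value and one percentage column per action, in the same layout as the table.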
By the way, data in this shape can be fed as-is into, say, a GLM or various machine-learning methods. Let's start with a GLM in R. The outcome is binary (converted or not)*4, so fitting with family="binomial" gives:
    # Assume the sample data has been loaded into a data frame called "sample_d"
    > sample_d.glm<-glm(cv~.,sample_d,family="binomial")
    > summary(sample_d.glm)

    Call:
    glm(formula = cv ~ ., family = "binomial", data = sample_d)

    Deviance Residuals:
        Min       1Q   Median       3Q      Max
    -3.6404  -0.2242  -0.0358   0.2162   3.1418

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept) -1.37793    0.25979  -5.304 1.13e-07 ***
    a1           1.05846    0.17344   6.103 1.04e-09 ***
    a2          -0.54914    0.16752  -3.278  0.00105 **
    a3           0.12035    0.16803   0.716  0.47386
    a4          -3.00110    0.21653 -13.860  < 2e-16 ***
    a5           1.53098    0.17349   8.824  < 2e-16 ***
    a6           5.33547    0.19191  27.802  < 2e-16 ***
    a7           0.07811    0.16725   0.467  0.64048     # <- this one, right here!!!
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 4158.9  on 2999  degrees of freedom
    Residual deviance: 1044.4  on 2992  degrees of freedom
    AIC: 1060.4

    Number of Fisher Scoring iterations: 7
Huh? In the cross-tabulated percentages a7 should be contributing toward CV = "No", but the GLM says there is a "tendency"*5 for it to contribute toward CV = "Yes". In other words, the plain cross-tabulation and the GLM contradict each other. The estimate is not significant, so you could say it doesn't matter and leave it at that, but...
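To put the a7 estimate in perspective, the coefficients can be turned into odds ratios with approximate Wald intervals. This is only a sketch built from the numbers printed in the summary output; the 95% level and the Wald approximation are my choices, not something the original analysis used.

```r
# Estimates and standard errors copied from the summary(sample_d.glm) output.
est <- c(a6 = 5.33547, a7 = 0.07811)
se  <- c(a6 = 0.19191, a7 = 0.16725)

# Odds ratios with approximate 95% Wald intervals (back-transformed from the logit scale).
odds_ratio <- exp(est)
wald_lower <- exp(est - qnorm(0.975) * se)
wald_upper <- exp(est + qnorm(0.975) * se)
print(round(cbind(odds_ratio, wald_lower, wald_upper), 3))
```

The interval for a7 straddles an odds ratio of 1, which is another way of saying that the sign of its coefficient is not pinned down by this sample.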
It bothered me, so I also checked variable importance with Random Forest, just in case.
    > tuneRF(sample_d[,-8],sample_d[,8],doBest=T) # tune mtry with a quick search, just in case
    mtry = 2  OOB error = 6.67%
    Searching left ...
    mtry = 1  OOB error = 8.6%
    -0.29 0.05
    Searching right ...
    mtry = 4  OOB error = 6.63%
    0.005 0.05

    Call:
     randomForest(x = x, y = y, mtry = res[which.min(res[, 2]), 1])
                   Type of random forest: classification
                         Number of trees: 500
    No. of variables tried at each split: 4

            OOB estimate of  error rate: 6.37%
    Confusion matrix:
          No  Yes class.error
    No  1391  109  0.07266667
    Yes   82 1418  0.05466667
    > sample_d.rf<-randomForest(cv~.,sample_d,mtry=4)
    > importance(sample_d.rf)
       MeanDecreaseGini
    a1        18.593800
    a2        10.052819
    a3         6.983980
    a4       189.841509
    a5        52.050336
    a6       956.140215
    a7         8.536062     # <- this guy
As you'd expect, a7's variable importance is not that large, but it is still noticeably larger than that of a3, which ranks last. If a client says "tell us, for every one of a1-a7, whether it increases or decreases CV", this is the kind of case that leaves you stuck*6.
Why does this happen? To find out, let's run an association analysis (market basket analysis)*7, visualize the result with the graph plot in {arulesViz}, and set the graph layout to the Fruchterman-Reingold algorithm, which "places similar things closer together". Here is the result.
From this figure you can read off how strongly each item contributes to CV from its distance to cv_yes or cv_no. Comparing with the GLM and Random Forest results above... yes, some parts match right away, such as a6 contributing strongly to CV = "Yes" and a4 contributing strongly to CV = "No". The problem is our friend a7. It sits relatively close to cv_yes, and even closer to a6.
In fact, when I built this sample data I deliberately rigged it so that a7 correlates strongly with a6 only in a subset of the CV = "Yes" cases. That is, the data contains a minority of cases in which the combination "a6 & a7" is strongly associated with CV = "Yes".
That is why, judged by the size of the cross-tabulated percentages, a7 contributes more toward CV = "No", while the data science methods say it actually contributes more toward CV = "Yes".
So that's the punchline, but in a real analysis job this would be a rather vexing situation. If I absolutely had to deliver some kind of report, I would probably answer: "a7 appears to contribute to higher CVR only in combination with a6, so we should look more closely at how a7 and a6 are related." Which only means more work for me (lol).
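One way to follow up on "a7 only matters together with a6" is to fit an explicit interaction term. The sketch below uses simulated data with coefficients I picked for illustration (a7 slightly harmful on its own, helpful in combination with a6); it is not how the author's sample data was generated.

```r
set.seed(42)
n  <- 5000
a6 <- rbinom(n, 1, 0.5)
a7 <- rbinom(n, 1, 0.5)

# Assumed data-generating process: a7 lowers the odds by itself,
# but raises them when it occurs together with a6.
eta <- -1 + 2 * a6 - 0.5 * a7 + 1.5 * a6 * a7
cv  <- rbinom(n, 1, plogis(eta))
toy <- data.frame(a6, a7, cv)

fit_main <- glm(cv ~ a6 + a7, data = toy, family = binomial)  # main effects only
fit_int  <- glm(cv ~ a6 * a7, data = toy, family = binomial)  # adds the a6:a7 term

print(coef(summary(fit_int)))
print(anova(fit_main, fit_int, test = "Chisq"))  # likelihood-ratio comparison
```

In the interaction model the a7 coefficient describes users without a6, and the a6:a7 coefficient describes the extra effect when both occur, which is the distinction the report in the text would need to draw.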
Note that the sample size here was small, at 3000, so the regression coefficient of a contradictory variable like a7 did not come out significant in the GLM. With a truly enormous sample it is not at all unusual for a variable to show a contradictory result like a7 and be statistically significant on top of that, so be careful.
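For anyone who wants to reproduce the association analysis, the data preparation described in footnote *7 might look roughly like this in base R. The tiny `toy` data frame is a made-up stand-in for sample_d, and only the matrix-building step is sketched; handing the result to apriori() is left as described in the footnote.

```r
# Hypothetical miniature of sample_d (the real data has a1-a7 and 3000 rows).
toy <- data.frame(
  a6 = c(1, 0, 1, 0),
  a7 = c(1, 0, 0, 1),
  cv = factor(c("Yes", "No", "Yes", "No"))
)

# Split cv into two indicator columns and bind them to the behaviour flags.
items <- cbind(
  as.matrix(toy[, c("a6", "a7")]),
  cv_yes = as.integer(toy$cv == "Yes"),
  cv_no  = as.integer(toy$cv == "No")
)

# Logical item matrix: one row per user, one column per item.
items <- items == 1
print(items)
```

Splitting cv into cv_yes and cv_no is what lets both outcomes appear as nodes in the graph plot, so that distances to each can be compared.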
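I got called out
裏先生 pointed out the following.
In multivariate analysis you assume that the other variables exert their influence at the same time, so conversely, what do you do if you want to know the "net correlation" between a7 and cv? The partial correlation coefficient, surely.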
            a1     a2     a3     a4     a5     a6     a7
    a2  -0.027
    a3  -0.005  0.003
    a4  -0.005  0.012  0.019
    a5   0.027 -0.007 -0.025  0.015
    a6  -0.031 -0.013  0.018 -0.020  0.015
    a7   0.006 -0.029 -0.031  0.007 -0.003 -0.011
    cv   0.112 -0.059  0.003 -0.284  0.176  0.807  0.006
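So the partial correlation coefficient between a7 and cv, with the influence of a1 to a6 removed, is 0.006.
The simple correlation coefficient was negative at -0.007, but the partial correlation coefficient is positive.
A logit model is not the same as multiple regression, but they are alike in that both bring out partial relationships. Whether it is a logit model or a multiple regression model, the coefficients on the independent variables are "partial regression coefficients", and that "partial" is the same "partial" as in the partial correlation coefficient: the weight on a variable after removing the influence of the other variables. Evaluating a simple correlation coefficient is the same as (mathematically equivalent to) regressing the dependent variable on a single independent variable (straight-line regression).
The point is that the result of a comprehensive analysis (multivariate analysis) cannot be predicted from a simple analysis.
It is not that such cases "somehow exist"; they "necessarily exist". If anything, it is "once in a while the result you would expect from a simple analysis happens to agree with the comprehensive one".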
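Quite right*8. Honestly, why didn't I think of the partial correlation coefficient first? There was no need at all for something as laborious and time-consuming as association analysis*9. I'll go back to being a first-year undergraduate and start over...
So, for my own study, I tried it hands-on in R. I could have coded the whole thing myself, but that's a hassle, so I installed and used the {ppcor} package.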
    # Convert sample_d$cv to a number beforehand with as.numeric() or similar (here stored as sample_d2);
    # as.numeric() on a factor gives 1/2 rather than 0/1, which leaves the correlations unchanged
    > pcor(sample_d2)
    $estimate
                 a1           a2           a3           a4           a5
    a1  1.000000000 -0.027097667 -0.004665751 -0.005058604  0.026653584
    a2 -0.027097667  1.000000000  0.003302999  0.011735712 -0.007277052
    a3 -0.004665751  0.003302999  1.000000000  0.019140545 -0.025398517
    a4 -0.005058604  0.011735712  0.019140545  1.000000000  0.014636602
    a5  0.026653584 -0.007277052 -0.025398517  0.014636602  1.000000000
    a6 -0.030659270 -0.013304820  0.018187916 -0.020112145  0.015498244
    a7  0.006356354 -0.028772727 -0.030596755  0.007394668 -0.002781801
    cv  0.111647484 -0.058979223  0.002985735 -0.284084473  0.175562475
                a6           a7           cv
    a1 -0.03065927  0.006356354  0.111647484
    a2 -0.01330482 -0.028772727 -0.058979223
    a3  0.01818792 -0.030596755  0.002985735
    a4 -0.02011215  0.007394668 -0.284084473
    a5  0.01549824 -0.002781801  0.175562475
    a6  1.00000000 -0.010926019  0.807244783
    a7 -0.01092602  1.000000000  0.005953911
    cv  0.80724478  0.005953911  1.000000000     # <- this is the part that was pointed out

    $p.value
                 a1          a2         a3           a4           a5
    a1 0.000000e+00 0.138136964 0.79855664 7.820066e-01 1.447173e-01
    a2 1.381370e-01 0.000000000 0.85662479 5.208875e-01 6.905865e-01
    a3 7.985566e-01 0.856624786 0.00000000 2.950240e-01 1.646120e-01
    a4 7.820066e-01 0.520887465 0.29502400 0.000000e+00 4.233077e-01
    a5 1.447173e-01 0.690586496 0.16461200 4.233077e-01 0.000000e+00
    a6 9.338115e-02 0.466719422 0.31972235 2.711839e-01 3.965254e-01
    a7 7.280697e-01 0.115372742 0.09405176 6.858500e-01 8.790585e-01
    cv 7.973833e-10 0.001230386 0.87026806 4.505601e-59 1.762439e-22
               a6         a7           cv
    a1 0.09338115 0.72806975 7.973833e-10
    a2 0.46671942 0.11537274 1.230386e-03
    a3 0.31972235 0.09405176 8.702681e-01
    a4 0.27118386 0.68584999 4.505601e-59
    a5 0.39652541 0.87905852 1.762439e-22
    a6 0.00000000 0.55005353 0.000000e+00
    a7 0.55005353 0.00000000 7.446666e-01
    cv 0.00000000 0.74466664 0.000000e+00

    $statistic
               a1         a2         a3          a4         a5         a6
    a1  0.0000000 -1.4827646 -0.2552155  -0.2767050  1.4584473 -1.6778256
    a2 -1.4827646  0.0000000  0.1806723   0.6419780 -0.3980593 -0.7278271
    a3 -0.2552155  0.1806723  0.0000000   1.0471639 -1.3897263  0.9950286
    a4 -0.2767050  0.6419780  1.0471639   0.0000000  0.8006959 -1.1003404
    a5  1.4584473 -0.3980593 -1.3897263   0.8006959  0.0000000  0.8478430
    a6 -1.6778256 -0.7278271  0.9950286  -1.1003404  0.8478430  0.0000000
    a7  0.3476943 -1.5744964 -1.6744013   0.4044933 -0.1521628 -0.5976799
    cv  6.1454476 -3.2317408  0.1633180 -16.2069243  9.7546290 74.8125538
               a7          cv
    a1  0.3476943   6.1454476
    a2 -1.5744964  -3.2317408
    a3 -1.6744013   0.1633180
    a4  0.4044933 -16.2069243
    a5 -0.1521628   9.7546290
    a6 -0.5976799  74.8125538
    a7  0.0000000   0.3256798
    cv  0.3256798   0.0000000

    $n
    [1] 3000

    $gp
    [1] 6

    $method
    [1] "pearson"
確ãã«a7ã¨cvã®åç¸é¢ä¿æ°ã¯0.006ã§ãæ£ãã®å¤ã«ãªã£ã¦ãã¦ãå
ã®åç¸é¢ä¿æ°ã®-0.007ã¨ã¯ç°ãªãCV = "Yes"ã«å¯ä¸ãã¦ããã¨ãããã¨ãåããã¾ããã
ã¨ãããã¨ã§ããææãåãã¦è£è¶³ãå¤å¤é解æãããæã¯åç¸é¢ä¿æ°ã«ã注æï¼ããã¦äº¤äºä½ç¨ãï¼ãè£å çããææãããã¨ããããã¾ããã
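The link between partial correlation and partial regression coefficients that the comment describes can be checked by hand for the linear case. This is a minimal sketch on simulated data (the variables x, y, z and their coefficients are made up); for a logit model the correspondence is only a loose analogy, as the comment itself notes.

```r
set.seed(7)
n <- 1000
z <- rnorm(n)                       # plays the role of "the other variables"
x <- 0.8 * z + rnorm(n)
y <- 0.8 * z + 0.1 * x + rnorm(n)

# Partial correlation of x and y given z:
# correlate what is left of each after regressing it on z.
rx <- resid(lm(x ~ z))
ry <- resid(lm(y ~ z))
partial_r <- cor(rx, ry)

# Partial regression coefficient of x in the full linear model.
beta_x <- coef(lm(y ~ x + z))[["x"]]

print(c(simple = cor(x, y), partial = partial_r, beta_x = beta_x))
```

The partial regression coefficient equals cov(rx, ry) / var(rx), so it always has the same sign as the partial correlation, while the simple correlation can differ from both in size and, in less friendly data, in sign.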
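Cases where you handle a "rate" but the denominator varies
This is the same situation as the question I took up in an earlier post ("'We did some kaizen and the conversion rate went up from ○○% to △△%!' may not hold once you stop ignoring the denominator"): is there any point in computing CVR alone?
Say you want to improve the route into a paid in-game event (call that the conversion), so you add a new event route alongside the existing old one, build a 2×2 table like the following*10,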
課éãã | 課éããªãã£ã | |
---|---|---|
æ°ã¤ãã³ãå°ç· | 5 | 2 |
æ§ã¤ãã³ãå°ç· | 150 | 140 |
and then go on to interpret it like this,
課éãã | 課éããªãã£ã | 課éç | |
---|---|---|---|
æ°ã¤ãã³ãå°ç· | 5 | 2 | 71% |
æ§ã¤ãã³ãå°ç· | 150 | 140 | 52% |
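"The new event route has a 71% payment rate! So the new route will definitely bring in more revenue, let's switch over completely!" and you get all excited*11. Isn't something wrong here? That's the story.
As I also covered in that earlier post, the proper thing to do in a case like this is to use a chi-squared test of independence or Fisher's exact test*12 to check whether CVR really changed, taking into account the effect of the differing denominators.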
    > x=matrix(c(5,2,150,140),ncol=2,byrow=T)
    > print(x)
         [,1] [,2]
    [1,]    5    2
    [2,]  150  140
    > chisq.test(x) # chi-squared test of independence

        Pearson's Chi-squared test with Yates' continuity correction

    data:  x
    X-squared = 0.4205, df = 1, p-value = 0.5167 # not significant: no evidence that the two routes differ

    Warning message:
    In chisq.test(x) : Chi-squared approximation may be incorrect
    # it is complaining that the sample for the "new event route" is too small
    > fisher.test(x) # Fisher's exact test: usable regardless of sample size, though it can be costly to compute

        Fisher's Exact Test for Count Data

    data:  x
    p-value = 0.4507 # again not significant: no evidence that the two routes differ
    alternative hypothesis: true odds ratio is not equal to 1
    95 percent confidence interval:
      0.3736408 24.8107839
    sample estimates:
    odds ratio
      2.326977
Even when the denominators differ hugely, you must not just divide, get "the ○○ rate", and swing between joy and despair over whether the value came out large or small without thinking. That's the message.
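Another way to see the effect of the small denominator is to attach an interval to each rate. The sketch below uses the counts from the 2×2 table and exact binomial intervals from binom.test(); choosing this interval (rather than, say, prop.test()) is my assumption, not part of the original analysis.

```r
# Counts taken from the 2x2 table in the text.
paid   <- c(new = 5, old = 150)
unpaid <- c(new = 2, old = 140)
total  <- paid + unpaid

rate <- paid / total

# Exact (Clopper-Pearson) 95% interval for each route's payment rate.
ci <- t(sapply(names(paid), function(k) binom.test(paid[[k]], total[[k]])$conf.int))
colnames(ci) <- c("lower", "upper")
print(round(cbind(rate, ci), 3))
```

The interval for the new route, based on only 7 users, is far wider than the one for the old route and comfortably contains the old route's rate, which fits the non-significant test results.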
Cases that involve time-series data
It's a mean-spirited example, but consider this case. I've put sample data no. 2 on GitHub, so grab it and load it into R under the name x1. The R package you need is {fGarch}.
Now, about this data: what is its mean? Incidentally, the sample size is 1200.
    > mean(x1)
    [1] 9989.522
    > mean(x1[1:500])
    [1] 9935.598
    > mean(x1[501:1200])
    [1] 10028.04
    > mean(x1[701:900])
    [1] 9934.387
    > mean(x1[301:400])
    [1] 9943.169
Whichever part you take, the mean seems to be roughly 10000. If you think "ah, so this is data that scatters moderately around a mean of about 10000", you are badly mistaken. Try computing the variance:
    > var(x1)
    [1] 1346957
    > var(x1[1:500])
    [1] 1489078
    > var(x1[501:1200])
    [1] 1243861
    > var(x1[701:900])
    [1] 252995.2
    > var(x1[301:400])
    [1] 3699340
Huh? The variance is completely different depending on which part you pull out! So let's plot it as a time series and see what it looks like.
And no wonder, because this data was actually
    > x1<-rbind(matrix(rnorm(300,mean=10000,sd=200)),matrix(rnorm(300,mean=10000,sd=2000)),
    + matrix(rnorm(300,mean=10000,sd=500)),matrix(rnorm(300,mean=10000,sd=1000)))
    > plot.ts(x1)
generated like this. In other words, the mean is constant but the variance (standard deviation) is all over the place from one time window to the next, and of course you learn nothing by just averaging a time series like that. With time-series data you need to keep in mind that there are cases where you must evaluate not only the mean but also this kind of dispersion, that is, the volatility.
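Short of fitting a model, a per-segment or rolling standard deviation already makes the changing volatility visible. This is a minimal sketch that regenerates a series with the same recipe as x1 (so the exact numbers differ from the sample file); the seed and the 50-point window are arbitrary choices of mine.

```r
set.seed(123)
# Same recipe as the x1 series in the text: constant mean, four volatility regimes.
sds <- c(200, 2000, 500, 1000)
x   <- unlist(lapply(sds, function(s) rnorm(300, mean = 10000, sd = s)))

# Mean and standard deviation within each 300-point regime.
block      <- rep(seq_along(sds), each = 300)
block_mean <- tapply(x, block, mean)
block_sd   <- tapply(x, block, sd)
print(round(rbind(block_mean, block_sd)))

# Rolling standard deviation over a sliding window of w points.
w <- 50
roll_sd <- sapply(seq_len(length(x) - w + 1),
                  function(i) sd(x[i:(i + w - 1)]))
print(summary(roll_sd))
```

Passing `roll_sd` to plot.ts() shows the four regimes as clear steps, even though the block means all stay close to 10000.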
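I think simply being careful and drawing a time-series plot every time is enough for this sort of thing (lol), but if you really want to analyze it in detail, you can for instance estimate a GARCH (generalized autoregressive conditional heteroskedasticity) model from econometric time-series analysis and use it for parameter estimation and forecasting.
What is a GARCH model? In short, it treats not the raw values of the series but "the size of the variance (= dispersion, or volatility)" as a time series in its own right and models its behavior. This idea is not very familiar in web marketing, but it is a very important methodology in fields such as finance, where volatility itself is a risk factor.
The actual computation, using the {fGarch} package, goes like this*13.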
    > x1.garch<-garchFit(~garch(1,1),data=x1,trace=T)

    Series Initialization:
     ARMA Model:                arma
     Formula Mean:              ~ arma(0, 0)
     GARCH Model:               garch
     Formula Variance:          ~ garch(1, 1)
     ARMA Order:                0 0
     Max ARMA Order:            0
     GARCH Order:               1 1
     Max GARCH Order:           1
     Maximum Order:             1
     Conditional Dist:          norm
     h.start:                   2
     llh.start:                 1
     Length of Series:          1200
     Recursion Init:            mci
     Series Scale:              1160.585

    Parameter Initialization:
     Initial Parameters:          $params
     Limits of Transformations:   $U, $V
     Which Parameters are Fixed?  $includes
     Parameter Matrix:
                         U         V   params includes
        mu     -86.07318384  86.07318 8.607318     TRUE
        omega    0.00000100 100.00000 0.100000     TRUE
        alpha1   0.00000001   1.00000 0.100000     TRUE
        gamma1  -0.99999999   1.00000 0.100000    FALSE
        beta1    0.00000001   1.00000 0.800000     TRUE
        delta    0.00000000   2.00000 2.000000    FALSE
        skew     0.10000000  10.00000 1.000000    FALSE
        shape    1.00000000  10.00000 4.000000    FALSE
     Index List of Parameters to be Optimized:
        mu  omega alpha1  beta1
         1      2      3      5
     Persistence:                  0.9

    --- START OF TRACE ---
    Selected Algorithm: nlminb

    R coded nlminb Solver:

      0:     1454.2321:  8.60732 0.100000 0.100000 0.800000
      1:     1339.3542:  8.56697 0.0446296 0.111299 0.779856
    # (middle part omitted)
     47:     1146.4477:  8.61814 0.000875581 0.172139 0.849785

    Final Estimate of the Negative LLH:
     LLH:  9614.463    norm LLH:  8.012052
              mu        omega       alpha1        beta1
    1.000208e+04 1.179370e+03 1.721390e-01 8.497848e-01

    R-optimhess Difference Approximated Hessian Matrix:
                      mu         omega        alpha1         beta1
    mu     -8.391052e-03  5.621736e-06  1.624651e-01  2.756090e-01
    omega   5.621736e-06 -4.616137e-06 -1.335149e-01 -1.959011e-01
    alpha1  1.624651e-01 -1.335149e-01 -1.819808e+04 -2.431246e+04
    beta1   2.756090e-01 -1.959011e-01 -2.431246e+04 -3.595277e+04
    attr(,"time")
    Time difference of 0.1320131 secs

    --- END OF TRACE ---

    Time to Estimate Parameters:
     Time difference of 0.7820778 secs
    > plot(x1.garch)

    Make a plot selection (or 0 to exit):

     1:   Time Series
     2:   Conditional SD
     3:   Series with 2 Conditional SD Superimposed
     4:   ACF of Observations
     5:   ACF of Squared Observations
     6:   Cross Correlation
     7:   Residuals
     8:   Conditional SDs
     9:   Standardized Residuals
    10:   ACF of Standardized Residuals
    11:   ACF of Squared Standardized Residuals
    12:   Cross Correlation between r^2 and r
    13:   QQ-Plot of Standardized Residuals

    Selection: 2 # plot the time series of "the size of the variance"
    > x1.pred<-predict(x1.garch,n.ahead=150,plot=T,nx=1200) # forecast "the size of the variance" 150 periods ahead and plot it
The forecast fans out because the variance got large partway through. This is, by the way, extreeemely sloppy sample data, far too haphazard to be handled properly with a GARCH model like this. For reference, here are the modeling and forecasting results for much cleaner simulated data generated with the garchSim() function.
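As a side note on the fan shape, the fitted parameters themselves offer one possible reading. In a GARCH(1,1) model the variance forecast for two or more steps ahead follows sigma2[h] = omega + (alpha1 + beta1) * sigma2[h - 1], and the estimates printed in the garchFit() output give alpha1 + beta1 slightly above 1. The sketch below just plugs in those printed values; the starting variance is a hypothetical number chosen for illustration.

```r
# Estimates copied from the "Final Estimate" section of the garchFit() output.
omega  <- 1.179370e+03
alpha1 <- 1.721390e-01
beta1  <- 8.497848e-01

# Persistence of volatility shocks; below 1 is needed for a finite long-run variance.
persistence <- alpha1 + beta1
print(persistence)

# Multi-step variance forecasts from the GARCH(1,1) recursion.
h <- 150
sigma2 <- numeric(h)
sigma2[1] <- 1000^2   # hypothetical one-step-ahead variance, for illustration only
for (i in 2:h) sigma2[i] <- omega + persistence * sigma2[i - 1]
print(sqrt(sigma2[c(1, 50, 100, 150)]))
```

With a persistence above 1 the forecast variance keeps growing with the horizon instead of settling down, which is at least consistent with the widening forecast band, and with the remark that this data is not a good fit for the model.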
    > set.seed(100)
    > x2<-garchSim(n=1200,garchSpec())
    > plot(x2,lwd=2)
    > x2.garch<-garchFit(~garch(1,1),data=x2,trace=F)
    > plot(x2.garch)

    Make a plot selection (or 0 to exit):

     1:   Time Series
     2:   Conditional SD
     3:   Series with 2 Conditional SD Superimposed
     4:   ACF of Observations
     5:   ACF of Squared Observations
     6:   Cross Correlation
     7:   Residuals
     8:   Conditional SDs
     9:   Standardized Residuals
    10:   ACF of Standardized Residuals
    11:   ACF of Squared Standardized Residuals
    12:   Cross Correlation between r^2 and r
    13:   QQ-Plot of Standardized Residuals

    Selection: 2
    > x2.pred<-predict(x2.garch,n.ahead=150,plot=T,nx=1200)
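This is simulated data in which "the size of the variance"*14 is stationary, so the 150-step-ahead forecast likewise comes out as "the size of the variance stays constant".
In closing
So the moral is: if you set up a BI tool that runs simple SQL aggregations on a schedule and content yourself with looking at the numbers or plots, you might be in for trouble, so run some machine learning or statistical analysis from time to time to check (lol).
By the way, GARCH models are outside my territory, so I can't answer if you press me on them (sob). Chapter 7 of the Okimoto book (沖本本) has a detailed explanation, but if anyone knows this well, please teach me...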
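Aside
裏先生's comment made me look back a bit. In my case, things I don't use in practice fade quickly even after I've studied them once, so it's not just partial correlation*15: correlation in general, interval estimation, nonparametric tests*16, and also stochastic processes and random-number theory are probably all quite shaky. I think I'll set aside some focused time over the summer break to review.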
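*1: For example, a regression coefficient not coming out significant at the 5% level when the data is fed to a GLM
*2: glm() is in {stats}, which is loaded by default, so nothing needs to be installed for it
*3: A difference of 0.7% is as good as no difference
*4: Converting with as.factor() in R puts 0 on the "No" side and 1 on the "Yes" side
*5: Because the regression coefficient is greater than 0
*6: Don't laugh; cases like this are a dime a dozen in real data analysis work
*7: First split cv into two columns, cv_yes (1 when CV = Yes, 0 otherwise) and cv_no (1 when CV = No, 0 otherwise), bind those two columns to the remaining seven columns a1-a7, and convert the result to a matrix; it can then be passed to the apriori() function in {arules}
*8: That said, I wrote "somehow" only to catch readers' attention; I did not mean "it is not inevitable"
*9: It does have the advantage of being easy to understand because it can be visualized, but statistically it is not elegant at all
*10: This is a wildly extreme example, just to be clear
*11: I'll say it as many times as it takes: this is a wildly extreme example, just to be clear
*12: Both chisq.test() and fisher.test() are in {stats}, so no installation is needed
*13: For why a GARCH(1,1) model is chosen, see for example p.156 of the Okimoto book (沖本本)
*14: Strictly speaking I think it is the logarithm of the variation, though...
*15: I recall using it about four years ago when preparing a presentation for an international conference
*16: The rank-sum test and the signed-rank test are about all I can think of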