I Tried Predicting Horse Races for Real
As a rule, I think nobody should bet on horse racing. The house takes too large a cut. It is still better than the lottery, but even so 20 to 30% of the money wagered goes to the house.*1
This time, though, on a whim I decided to try predicting horse races.
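The reason is how cheap betting tickets are. I am currently looking for an investment where having little capital is not a disadvantage, and at 100 yen a ticket, horse racing looks attractive. With stocks, even the cheapest ones have a minimum purchase of several tens of thousands of yen or more*2, so you need a reasonably large lump sum.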
Horse racing also leaves room for skill (the possibility of winning through effort).
Here is one example.
160億円ボロ儲け!英投資会社が日本の競馬で荒稼ぎした驚きの手法 - NAVER まとめ
They reportedly won at horse racing through statistical analysis and concealed the income. Still, the fact that news like this exists means that, depending on the analyst's skill, it is possible to win at horse racing.*3
First, Collect the Data
So I am going to do statistical analysis of horse racing, but nothing can start without data to analyze.
First, I decided to scrape horse racing data from the following site.
netkeiba.com - 競馬データベース
On netkeiba.com, viewing the speed index (a finishing time converted to a number against some baseline) and the track index (track condition converted to a number) requires a paid membership. I signed up as a paid member and scraped everything, including the speed index and track index.
The Scala code for scraping and feature generation is published below.
github.com
Incidentally, it is said that getting data into an analyzable form takes up 90% of the whole data analysis process. In fact, I spent several weeks writing this scraping and feature generation script*4. Those of you who get to use it for free are lucky.
The generated features end up stored in SQLite. You do not need a paid netkeiba.com membership to use this code, but be aware that in that case the speed index and track index columns will contain NULL values.
What to Predict
Now that the data is collected, the next step is to decide what to predict.
As far as I could find, there are two approaches to predicting horse races.*5
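- Take information about a race as input and predict whether that race will be an upset (whether or not the favorite finishes first)
- Take a horse's past results, its jockey's results, and so on as input and predict where that horse will finish in the race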
For example, the author of the following book tried both approaches and concluded that the latter is difficult, so predicting with the former works better.

実践データマイニング―金融・競馬予測の科学 (Practical Data Mining: The Science of Financial and Horse Racing Prediction)
- Author: 月本洋
- Publisher: オーム社 (Ohmsha)
- Release date: 1999/12
- Format: Book
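This post takes the per-horse route: I build a statistical model whose input is data about an individual horse and whose output is a binary value, namely whether that horse finishes first.
Note that what I predict this time is not the finishing order of the race, only the binary "finishes first or not."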
Variable | Description |
---|---|
order_of_finish | TRUE if the horse finished first, FALSE otherwise |
The reason is that, apparently, when a jockey realizes mid-race that "we are not going to finish near the top at this rate," he sometimes deliberately runs the horse slowly so as not to tire it needlessly (prize money is paid only for top finishes). So a high finish does indicate ability, but a low finish does not necessarily mean the horse lacks it. For that reason, a simple model that predicts only the binary "first or not" reportedly works better than one that predicts the exact finishing position*6 (reference: Identifying winners of competitive events: A SVM-based classification model for horserace prediction).
One thing to watch out for: few horses finish first, while many do not. Left as is, the training data is imbalanced*7, and a model trained on it comes out biased. There are several ways to handle imbalanced data, but since I could not be bothered this time, I simply reduced the majority class (horses that did not finish first) by sampling.
What to Feed the Statistical Model
The next question is which variables to use as input to the statistical model.
I decided to use the following features as input to this model.
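Variable | Description |
---|---|
age | Age of the horse |
avgsr4 | Average speed index over the past 4 races |
avgWin4 | Rate of top finishes over the past 4 races |
course | Whether the course is right-handed, left-handed, or straight |
dhweight | Change in the horse's body weight since the previous race |
disavesr | Average speed index on courses of the same distance as this race |
disRoc | (Difference from the horse's average distance) ÷ (average distance) |
distance | Distance of this race's course |
dsl | Days since the previous race |
enterTimes | Number of starts |
eps | Horse's average prize money earned |
grade | Grade of the race |
horse_number | Horse number |
hweight | Horse's current body weight |
jAvgWin4 | Jockey's win rate over the past 4 rides |
jEps | Jockey's average prize money earned |
jwinper | Jockey's first-place rate |
owinper | Owner's first-place rate |
placeCode | Which racecourse |
preOOF | Finishing position in the previous race |
pre2OOF | Finishing position two races ago |
preSRa | Speed index in the previous race |
preLastPhase | Final 3-furlong time in the previous race |
race_number | Which race of the day |
runningStyle | Horse's running style |
lateStartPer | Slow-start rate |
month | Month of the race |
sex | Sex of the horse |
surface | Whether the course is turf or dirt |
surfaceScore | Track index |
twinper | Trainer's win rate |
weather | Weather on race day |
weight | Carried weight |
weightper | Carried weight ÷ horse's body weight |
winRun | Number of wins by the horse |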
This list is just a grab bag of variables that seemed vaguely promising after I read some horse racing books and papers, so there is nothing inevitable about using these particular ones. If you are curious what happens with other variables, tweak the code and try it yourself.
Predicting with randomForest
I decided to use R's randomForest package to build the prediction model. Random forest is a supervised learning algorithm proposed by Leo Breiman in 2001. Anyone who reads this blog probably needs no explanation, but roughly speaking: "a decision tree has high variance in the sense of the bias-variance decomposition (its fitted result is unstable), so applying bagging and feature sampling brings the variance down and improves generalization." That algorithm is random forest.*8
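The variance argument can be made concrete with a standard identity that is not in the original post. As an illustration, assume $B$ trees $T_b$ whose predictions at a point $x$ each have variance $\sigma^2$ and pairwise correlation $\rho$ (all three symbols are introduced here, not taken from the post). The variance of the averaged prediction is then:

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right)
  = \rho\,\sigma^{2} + \frac{1-\rho}{B}\,\sigma^{2}
```

Under these assumptions, bagging shrinks the second term as $B$ grows, and sampling features at each split lowers $\rho$, which shrinks the first term.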
Now let's actually build a prediction model with R's randomForest package.
    > library(randomForest)
    > library(RSQLite)
    >
    > randomRows <- function(df, n) {
    +   df[sample(nrow(df),n),]
    + }
    >
    > downSample <- function(df) {
    +   c1 <- df[df$order_of_finish == "TRUE",]
    +   c2 <- df[df$order_of_finish == "FALSE",]
    +   size <- min(nrow(c1), nrow(c2))
    +   rbind(randomRows(c1,size), randomRows(c2,size))
    + }
    >
    > drv <- dbDriver('SQLite')
    >
    > conn <- dbConnect(drv, dbname='race.db')
    >
    > rs <- dbSendQuery(conn,
    +   'select
    +      order_of_finish,
    +      race_id,
    +      horse_number,
    +      grade,
    +      age,
    +      avgsr4,
    +      avgWin4,
    +      dhweight,
    +      disRoc,
    +      r.distance,
    +      dsl,
    +      enterTimes,
    +      eps,
    +      hweight,
    +      jwinper,
    +      odds,
    +      owinper,
    +      preSRa,
    +      sex,
    +      f.surface,
    +      surfaceScore,
    +      twinper,
    +      f.weather,
    +      weight,
    +      winRun,
    +      jEps,
    +      jAvgWin4,
    +      preOOF,
    +      pre2OOF,
    +      month,
    +      runningStyle,
    +      preLastPhase,
    +      lateStartPer,
    +      course,
    +      placeCode,
    +      race_number
    +    from
    +      feature f
    +    inner join
    +      race_info r
    +    on
    +      f.race_id = r.id
    +    where
    +      order_of_finish is not null
    +    and
    +      preSRa is not null
    +    limit 250000')
    >
    > allData <- fetch(rs, n = -1)
    >
    > dbClearResult(rs)
    [1] TRUE
    > dbDisconnect(conn)
    [1] TRUE
    >
    > # Convert categorical variables to factors
    > allData$placeCode <- factor(allData$placeCode)
    > allData$month     <- factor(allData$month)
    > allData$grade     <- factor(allData$grade)
    > allData$sex       <- factor(allData$sex)
    > allData$weather   <- factor(allData$weather)
    > allData$surface   <- factor(allData$surface)
    > allData$course    <- factor(allData$course)
    >
    > # Add carried weight / horse body weight as a feature
    > allData$weightper <- allData$weight / allData$hweight
    >
    > # Convert odds to support rate
    > allData$support <- 0.788 / (allData$odds - 0.1)
    > allData$odds <- NULL
    >
    > # Convert finishing order to a categorical variable
    > allData$order_of_finish <- factor(allData$order_of_finish == 1)
    >
    > # Balance the classes 50/50
    > allData.s <- downSample(na.omit(allData))
    > allData.s <- allData.s[order(allData.s$race_id),]
    >
    > # Number of samples used in this experiment
    > nrow(allData.s)
    [1] 30428
    >
    > # Split the data into 25428 training samples and 5000 test samples
    > train <- allData.s[1:(nrow(allData.s)-5000),]
    > test  <- allData.s[(nrow(allData.s)-4999):nrow(allData.s),]
    >
    > # Build the prediction model
    > (rf.model1 <- randomForest(
    +   order_of_finish ~ . - support - race_id, train))

    Call:
     randomForest(formula = order_of_finish ~ . - support - race_id, data = train)
                   Type of random forest: classification
                         Number of trees: 500
    No. of variables tried at each split: 5

            OOB estimate of  error rate: 29.72%
    Confusion matrix:
          FALSE TRUE class.error
    FALSE  8420 4362   0.3412611
    TRUE   3196 9450   0.2527281
    >
    > # Look at feature importance
    > importance(rf.model1)
                 MeanDecreaseGini
    horse_number        276.57124
    grade               191.53030
    age                 210.04150
    avgsr4              545.24005
    avgWin4             526.77427
    dhweight            296.32679
    disRoc              443.31973
    distance            232.20557
    dsl                 371.28809
    enterTimes          332.80342
    eps                 682.54396
    hweight             393.27570
    jwinper             417.62300
    owinper             366.49348
    preSRa              536.27096
    sex                  62.81792
    surface              45.83353
    surfaceScore        361.17891
    twinper             348.52685
    weather             123.82181
    weight              165.54897
    winRun              246.36929
    jEps                603.00998
    jAvgWin4            140.99460
    preOOF              870.35176
    pre2OOF             475.39642
    month               737.97377
    runningStyle        456.73422
    preLastPhase        408.51575
    lateStartPer        250.49252
    course               42.43917
    placeCode           564.23156
    race_number         278.57604
    weightper           430.13985
    >
    > # Check predictive power on the test data
    > pred <- predict(rf.model1, test)
    > tbl <- table(pred, test$order_of_finish)
    > sum(diag(tbl)) / sum(tbl)
    [1] 0.7067173
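The OOB estimate (an error rate of 29.72%) and the test data both give an accuracy of about 70%. Because the classes were balanced 50/50, chance level is 50%, so this model does seem to have real predictive power.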
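As a quick check of how the roughly 70% figure follows from the printed output, here is a minimal sketch that recomputes the OOB accuracy and per-class error from the confusion matrix reported for rf.model1. The variable names `conf`, `oob_accuracy`, and `class_error` are introduced here for illustration; only the four counts come from the post.

```r
# Confusion matrix printed for rf.model1
# (rows: actual class, columns: predicted class)
conf <- matrix(c(8420, 4362,
                 3196, 9450),
               nrow = 2, byrow = TRUE,
               dimnames = list(actual    = c("FALSE", "TRUE"),
                               predicted = c("FALSE", "TRUE")))

# Overall OOB accuracy: correctly classified samples / all samples
oob_accuracy <- sum(diag(conf)) / sum(conf)

# Per-class error: misclassified samples in a row / row total
class_error <- 1 - diag(conf) / rowSums(conf)

print(round(oob_accuracy, 4))
print(round(class_error, 7))
```

The accuracy is one minus the reported OOB error rate, and the per-class errors reproduce the class.error column of the printout.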
Now the real work begins. The question is whether this model's predictive power can beat that of the other bettors.
As a model representing "the other bettors' prediction," I use a model trained on the following feature alone.
Variable | Description |
---|---|
support | Support rate back-calculated*9 from the win odds |
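The support rate back-calculated from the win odds is, in effect, "the other bettors' prediction" itself. So if the horse racing market is efficient, no model should be able to exceed the accuracy of a model built on this support rate. Whether we can beat that accuracy is therefore one yardstick of how efficient the horse racing market is.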
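To see what the support rate looks like for a few odds values, here is a minimal sketch using the post's formula, support = 0.788 / (odds - 0.1). The function names `support_from_odds` and `odds_from_support` and the sample odds are made up for illustration; the inverse is just the same formula solved for odds.

```r
# Support rate implied by win odds, using the post's formula
support_from_odds <- function(odds) {
  0.788 / (odds - 0.1)
}

# Algebraic inverse of the formula above (hypothetical helper)
odds_from_support <- function(support) {
  0.788 / support + 0.1
}

odds    <- c(1.5, 2.0, 5.0, 20.0)  # made-up win odds
support <- support_from_odds(odds)

# Lower odds map to a higher support rate
print(data.frame(odds = odds, support = round(support, 3)))
```

Because the mapping is monotone, a model trained on `support` alone carries the same ranking information as one trained on the raw odds.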
    > # Build a prediction model using only the support rate
    > (rf.model2 <- randomForest(
    +   order_of_finish ~ support, train))

    Call:
     randomForest(formula = order_of_finish ~ support, data = train)
                   Type of random forest: classification
                         Number of trees: 500
    No. of variables tried at each split: 1

            OOB estimate of  error rate: 25.7%
    Confusion matrix:
          FALSE  TRUE class.error
    FALSE  8734  4048   0.3166954
    TRUE   2486 10160   0.1965839
    >
    > pred <- predict(rf.model2, test)
    > tbl <- table(pred, test$order_of_finish)
    > sum(diag(tbl)) / sum(tbl)
    [1] 0.7379048
This model's prediction accuracy is about 74%.
Sadly, my model sits at 70%, so it loses on predictive power...
Using Relative Ability Within Each Race as Features
Whether a horse can win a race is decided not by its absolute ability but by the difference in ability relative to the other horses. If so, shouldn't prediction accuracy improve if we use information about ability relative to the other horses in the same race, rather than absolute ability values?
Concretely, all we need to do is gather the data for the horses running in the same race and normalize it (rescale to mean 0 and variance 1). That way, only the ability differences against the other horses in the same race are taken into account. (Reference: Identifying winners of competitive events: A SVM-based classification model for horserace prediction)*10
Let's turn this idea into R code.
    > racewiseFeature <-
    +   c("avgsr4",
    +     "avgWin4",
    +     "dhweight",
    +     "disRoc",
    +     "dsl",
    +     "enterTimes",
    +     "eps",
    +     "hweight",
    +     "jwinper",
    +     "owinper",
    +     "preSRa",
    +     "twinper",
    +     "weight",
    +     "jEps",
    +     "jAvgWin4",
    +     "preOOF",
    +     "pre2OOF",
    +     "runningStyle",
    +     "preLastPhase",
    +     "lateStartPer",
    +     "weightper",
    +     "winRun")
    >
    > splited.allData <- split(allData, allData$race_id)
    >
    > scaled.allData <- unsplit(
    +   lapply(splited.allData,
    +     function(rw) {
    +       data.frame(
    +         order_of_finish = rw$order_of_finish,
    +         race_id = rw$race_id,
    +         age = rw$age,
    +         grade = rw$grade,
    +         distance = rw$distance,
    +         sex = rw$sex,
    +         weather = rw$weather,
    +         course = rw$course,
    +         month = rw$month,
    +         surface = rw$surface,
    +         surfaceScore = rw$surfaceScore,
    +         horse_number = rw$horse_number,
    +         placeCode = rw$placeCode,
    +         race_number = rw$race_number,
    +         support = rw$support,
    +         scale(rw[,racewiseFeature])) # normalization happens here
    +     }),
    +   allData$race_id)
    >
    > scaled.allData$order_of_finish = factor(scaled.allData$order_of_finish)
    >
    > is.nan.df <- function(x) do.call(cbind, lapply(x, is.nan))
    > scaled.allData[is.nan.df(scaled.allData)] <- 0
    >
    > scaled.allData <- downSample(na.omit(scaled.allData))
    > scaled.allData <- scaled.allData[order(scaled.allData$race_id),]
    >
    > # Split the data into training and test sets
    > scaled.train <- scaled.allData[1:(nrow(scaled.allData)-5000),]
    > scaled.test  <- scaled.allData[(nrow(scaled.allData)-4999):nrow(scaled.allData),]
    >
    > # Build a prediction model on the race-wise normalized data
    > (rf.model3 <- randomForest(
    +   order_of_finish ~ . - support - race_id, scaled.train))

    Call:
     randomForest(formula = order_of_finish ~ . - support - race_id, data = scaled.train)
                   Type of random forest: classification
                         Number of trees: 500
    No. of variables tried at each split: 5

            OOB estimate of  error rate: 28.63%
    Confusion matrix:
          FALSE TRUE class.error
    FALSE  8739 4047   0.3165181
    TRUE   3234 9408   0.2558140
    >
    > # Look at feature importance
    > importance(rf.model3)
                 MeanDecreaseGini
    age                 138.15954
    grade               157.86619
    distance            192.87544
    sex                  55.18635
    weather              92.09389
    course               33.38138
    month               529.20000
    surface              33.58647
    surfaceScore        287.95836
    horse_number        222.12282
    placeCode           537.07988
    race_number         193.98961
    avgsr4              858.85621
    avgWin4             726.16178
    dhweight            345.24014
    disRoc              371.22814
    dsl                 363.05980
    enterTimes          357.92536
    eps                1005.00112
    hweight             366.85535
    jwinper             471.85535
    owinper             367.94282
    preSRa              890.83216
    twinper             381.33466
    weight              336.16596
    jEps                530.81950
    jAvgWin4            352.48784
    preOOF              794.77337
    pre2OOF             500.63913
    runningStyle        358.16418
    preLastPhase        383.60317
    lateStartPer        338.66961
    weightper           359.51054
    winRun              264.60148
    >
    > # Check predictive power on the test data
    > pred <- predict(rf.model3, scaled.test)
    > tbl <- table(pred, scaled.test$order_of_finish)
    > sum(diag(tbl)) / sum(tbl)
    [1] 0.7221112
Both the OOB estimate and the test-data accuracy are now about 72%, roughly 2 points better than before. This suggests that using relative ability differences does improve accuracy.
Even so, it still falls short of the 74% accuracy of the support-rate model.
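To make the per-race normalization easier to picture, here is a minimal sketch on made-up data, using base R's `ave()` instead of the split/unsplit approach in the post. The data frame `toy`, the helper `z_within_race`, and the column `preSRa_z` are hypothetical. It also shows why the post replaces NaN with 0: when every horse in a race has the same value, the standard deviation is 0 and `scale()` returns NaN.

```r
# Toy data: two races, one numeric feature (values are made up)
toy <- data.frame(
  race_id = c(1, 1, 1, 2, 2, 2),
  preSRa  = c(60, 70, 80, 90, 90, 90)
)

# z-score within a race: (x - race mean) / race standard deviation
z_within_race <- function(x) as.numeric(scale(x))
toy$preSRa_z <- ave(toy$preSRa, toy$race_id, FUN = z_within_race)

# Race 2 has identical values, so its z-scores are NaN (0 / 0).
# As in the post's code, treat that as "no difference from the field".
toy$preSRa_z[is.nan(toy$preSRa_z)] <- 0

print(toy)
```

After this step a speed index of 80 in a weak field and 90 in a uniformly strong field are no longer compared directly; each value only says how the horse stands against the others in its own race.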
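Adding the Support Rate as a Feature
As a final push, let's go ahead and add the support rate to my model's features.
The reasoning is this: human predictive power is considerable, but humans are also known to have psychological biases (anchoring and the like). A machine, on the other hand, can only take into account features that can be clearly quantified, but in exchange it has no psychological biases. In other words, humans and machines are good at different things. If so, couldn't they make better predictions by covering each other's weaknesses? The support rate is the outcome of human prediction, so combining my model with the support rate might improve accuracy.
So I added the support rate to the features of both the absolute-ability model and the relative-ability model. The results are below.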
    > # Add the support rate to the absolute-ability model's features and build a model
    > (rf.model4 <- randomForest(
    +   order_of_finish ~ . - race_id, train))

    Call:
     randomForest(formula = order_of_finish ~ . - race_id, data = train)
                   Type of random forest: classification
                         Number of trees: 500
    No. of variables tried at each split: 6

            OOB estimate of  error rate: 24.88%
    Confusion matrix:
          FALSE  TRUE class.error
    FALSE  8967  3793   0.2972571
    TRUE   2534 10134   0.2000316
    >
    > # Check predictive power on the test data
    > pred <- predict(rf.model4, test)
    > tbl <- table(pred, test$order_of_finish)
    > sum(diag(tbl)) / sum(tbl)
    [1] 0.7491004
    >
    > # Add the support rate to the relative-ability model's features and build a model
    > (rf.model5 <- randomForest(
    +   order_of_finish ~ . - race_id, scaled.train))

    Call:
     randomForest(formula = order_of_finish ~ . - race_id, data = scaled.train)
                   Type of random forest: classification
                         Number of trees: 500
    No. of variables tried at each split: 5

            OOB estimate of  error rate: 25.26%
    Confusion matrix:
          FALSE  TRUE class.error
    FALSE  8936  3850   0.3011106
    TRUE   2572 10070   0.2034488
    >
    > # Check predictive power on the test data
    > pred <- predict(rf.model5, scaled.test)
    > tbl <- table(pred, scaled.test$order_of_finish)
    > sum(diag(tbl)) / sum(tbl)
    [1] 0.7457017
Both models beat the support-rate-only model, though only by about 0.8 to 1.1 percentage points on the test data.
With that, prediction accuracy has finally cleared 74%. Hooray! (*´ω｀*)
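For reference, the test-set accuracies printed in the sessions above can be lined up against the support-only baseline. This is only a restatement of the reported numbers; the names `acc` and `gain_pp` are introduced here for illustration.

```r
# Test-set accuracies reported in the post
acc <- c(
  rf.model1 = 0.7067173,  # absolute features
  rf.model2 = 0.7379048,  # support rate only (baseline)
  rf.model3 = 0.7221112,  # race-wise normalized features
  rf.model4 = 0.7491004,  # absolute features + support rate
  rf.model5 = 0.7457017   # normalized features + support rate
)

# Difference from the support-only baseline, in percentage points
gain_pp <- round((acc - acc[["rf.model2"]]) * 100, 2)
print(gain_pp)
```

Only the two models that include the support rate come out ahead of the baseline; both feature-only models trail it.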
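By the way, the R code up to this point is collected here, so help yourself if you like.
Well, an improvement this small will probably not be enough to make money in horse racing, where the takeout rate is high, but for now I will settle for having learned that the horse racing market is not perfectly efficient.
Once accuracy passed 74% my motivation somehow ran out, so that is all for this time. To be continued.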
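Future Plans
- Use JRDB data. JRDB has a wealth of data found nowhere else, such as IDM, horse physique, and hooves, so there seems to be room to improve accuracy. Incidentally, Mr. 卍, who drew attention in the losing-ticket court case, also used JRDB data.
- This time I added the final support rate as a feature as is, but the data actually available when betting is the support rate just before the race starts, which may differ from the final one. The win odds just before the start are also included in JRDB's pre-race data, so I will use those.
- This time I used data from central (JRA) racing, but regional racing looks more likely to be profitable, because it attracts less attention and the market is probably less efficient. (In exchange, the odds seem to swing wildly just before the start...)
- Make use of pedigree data. How to quantify pedigree is a bit of a puzzle, but the easiest way is probably what JRA-VAN does: turn the parent horses into categorical variables.
- Predicting whether a race will be an upset is apparently easier, so I will try that approach later. ...Or so I thought, and I just tried it a little, but it does not seem to work very well...
- The goal in the first place is to make money on horse racing, and for that what must be raised is not predictive power but the recovery rate. So it seems more promising to use reinforcement learning or a genetic algorithm and train for a high recovery rate. (Unlike supervised learning, reinforcement learning and genetic algorithms can be trained to maximize the recovery rate itself.)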
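The gap between predictive power and recovery rate in the last item can be illustrated with a small sketch. Everything here is made up: the betting log `bets`, the 100-yen stake, and the variable names. The recovery rate is taken to be total payout divided by total stake, and a winning win bet is assumed to pay stake times odds.

```r
# Toy betting log (all values are made up): one 100-yen win bet per row
bets <- data.frame(
  odds = c(2.0, 5.0, 3.5, 10.0, 1.8),
  won  = c(TRUE, FALSE, TRUE, FALSE, FALSE)
)
stake <- 100  # yen per ticket

# A winning win bet pays stake * odds; a losing one pays nothing
payout <- ifelse(bets$won, stake * bets$odds, 0)

hit_rate      <- mean(bets$won)                       # share of bets that won
recovery_rate <- sum(payout) / (stake * nrow(bets))   # payout per yen staked

print(c(hit_rate = hit_rate, recovery_rate = recovery_rate))
```

In this toy log fewer than half the bets win, yet the recovery rate is above 1, which is why a learner that optimizes the recovery rate directly can behave differently from one that optimizes classification accuracy.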
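References
In writing this article, the three sources I relied on most were JRA-VAN's explanation of its prediction model, Mr. 卍's books (this one and this one), and Stefan Lessmann's horse racing papers. If you think "your explanation is so bad I can't follow it," please refer to those pages.
I have written a follow-up.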
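*1: This assumes tickets are bought at random
*2: There are cheaper options such as mini stocks, but their fees are high
*3: Of course, it is also possible that they were simply lucky
*4: Though this was not the only thing I was doing
*5: There also seems to be an approach that predicts the finishing time, but since it ultimately predicts the finishing position from the predicted time, I counted it under the latter approach
*6: I have not actually run the experiment, so I do not know how much bias creeps in when predicting the exact finishing position. It may be small enough to ignore. Even if so, I think the simple approach should be tried first, so here I adopted the approach of predicting the binary "finishes first or not."
*7: Data in which the ratio of positive to negative examples is skewed, for example data with dozens of negative examples for every positive one
*8: Incidentally, I am a random forest devotee who uses nothing but random forest for classification problems. After all, the OOB error and feature importance are easy to inspect, and hyperparameter tuning is painless. In fact, the default parameters often perform well without any tuning at all...
*9: It can be computed with the formula support rate = 0.788 / (odds - 0.1)
*10: Incidentally, another way to use relative ability differences between horses is something like JRA-VAN's head-to-head model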