The dplyr package is sooooooo handy!
There is a slide deck on this:
"You too can be a preprocessing star with the plyr package", since retitled "A thorough introduction to the plyr package"
http://www.slideshare.net/teramonagi/tokyo-r30-20130420
The plyr package is extremely handy, but it is slow on very large data.
I imagine some people are thinking: "ddply is way too slow, just too slow!!!!! I'm doing my aggregation in Python-pandas from now on!"
Sure, you can speed things up to some extent by using the data.table package, but as someone steeped in the Hadley ecosystem (plyr, ggplot2, and so on), I want my aggregations to get faster while keeping the syntax I am used to!
And so a savior has arrived.
It is the dplyr package.
A speed comparison article has been posted on R-bloggers.
http://www.r-statistics.com/2013/09/a-speed-test-comparison-of-plyr-data-table-and-dplyr/
Note, however, that the package is still under development, so aggregating the way you always have
can give results that differ from what you got before. Be careful.
dplyr:summarise behavior differs when operating on data frame vs. data table
https://groups.google.com/forum/#!topic/manipulatr/sg_54p-6Sk4
install.packages("data.table")
# dplyr is still being developed on GitHub and is not on CRAN yet,
# so install it with install_github from devtools
# (recent devtools versions need the "user/repo" form)
library(devtools)
install_github("hadley/assertthat")
install_github("hadley/dplyr")
library(dplyr)

# Count the rows for each level of cyl
summarise(group_by(mtcars, cyl), count = length(cyl))

dt1 <- tbl_dt(mtcars)  # convert to a data table
# This gives a strange result
summarise(group_by(dt1, cyl), count = length(cyl))
# When counting, use the n() function instead of length()
summarise(group_by(dt1, cyl), count = n())
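As a cross-check on the counts, here is a minimal base R sketch (no dplyr needed) that computes the number of rows per cyl level in mtcars. A correct summarise call should agree with it. The variable name `expected` is only illustrative and does not come from the original code.

```r
# Reference row counts per cyl level, computed with base R only.
# tapply() applies length() separately within each cyl group.
expected <- tapply(mtcars$cyl, mtcars$cyl, length)
print(expected)
```

If a grouped summarise on the data table returns numbers that differ from these, that is the kind of discrepancy the mailing-list thread above describes.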
enjoy!