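Machine Learning Practice: Summarizing Data Numerically and Visualization Techniques
Working through 入門 機械学習 (the Japanese edition of Machine Learning for Hackers). Today is Chapter 2, "Data Exploration".
We learn how to summarize data numerically and how to visualize it.
Loading the test data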
> setwd("02-Exploration/")
> data.file <- file.path('data', '01_heights_weights_genders.csv')
> heights.weights <- read.csv(data.file, header = TRUE, sep = ',')
> head(heights.weights)
  Gender   Height   Weight
1   Male 73.84702 241.8936
2   Male 68.78190 162.3105
3   Male 74.11011 212.7409
4   Male 71.73098 220.0425
5   Male 69.88180 206.3498
6   Male 67.25302 152.2122
Numerical summaries of the data
summary() summarizes the numeric values of a vector.
> summary(heights.weights$Height)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  54.26   63.51   66.32   66.37   69.17   79.00
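From left to right, the output shows:
- Min.: the minimum
- 1st Qu.: the first quartile (the value at the 25% position from the bottom of the data)
- Median: the median (the value at the 50% position of the data)
- Mean: the mean
- 3rd Qu.: the third quartile (the value at the 75% position from the bottom of the data)
- Max.: the maximum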
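To make the six columns concrete, here is a minimal sketch that rebuilds them one at a time with base R functions. The vector x is a hypothetical sample chosen for illustration, not the heights data from the post.

```r
# Hypothetical sample vector, used only for illustration.
x <- c(2, 4, 4, 5, 7, 9, 10, 12, 19)

# The six values reported by summary(), computed one at a time.
by.hand <- c(
  min(x),
  quantile(x, probs = 0.25, names = FALSE),
  median(x),
  mean(x),
  quantile(x, probs = 0.75, names = FALSE),
  max(x)
)

# Compare against summary(); as.numeric() drops the names and class.
print(summary(x))
print(all.equal(as.numeric(summary(x)), by.hand))
```

The same comparison should work on heights once the CSV file is loaded, although very old R versions round the values stored by summary(), so a tolerance may be needed there.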
Finding the minimum and maximum
min() and max() compute the minimum and maximum values.
# Create a vector containing only Height
> heights <- with(heights.weights, Height)
> head(heights)
[1] 73.84702 68.78190 74.11011 71.73098 69.88180 67.25302
> min(heights)
[1] 54.26313
> max(heights)
[1] 78.99874
range() computes both at once.
> range(heights)
[1] 54.26313 78.99874
Finding quantiles
quantile() returns the values at given positions in the sorted data (the quartiles by default).
> quantile(heights)
      0%      25%      50%      75%     100%
54.26313 63.50562 66.31807 69.17426 78.99874
You can also specify the cut points, for example every 20%:
> quantile(heights, probs = seq(0, 1, by = 0.20))
      0%      20%      40%      60%      80%     100%
54.26313 62.85901 65.19422 67.43537 69.81162 78.99874
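The probs argument also accepts arbitrary cut points, which gives simple measures of spread. The sketch below assumes a simulated vector x standing in for the heights (its mean and standard deviation are made-up values). It computes the interquartile range by hand and the range covering the central 95% of the data.

```r
# Simulated stand-in for the heights; the parameters are assumptions.
set.seed(1)
x <- rnorm(1000, mean = 66, sd = 4)

# Interquartile range: distance between the 25% and 75% quantiles.
q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
iqr.by.hand <- q[2] - q[1]
print(iqr.by.hand)
print(IQR(x))  # built-in equivalent

# Values that bracket the central 95% of the data.
central.95 <- quantile(x, probs = c(0.025, 0.975))
print(central.95)
```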
Finding the variance and standard deviation
Use var() and sd().
# Variance
> var(heights)
[1] 14.80347
# Standard deviation
> sd(heights)
[1] 3.847528
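The two numbers are related: the standard deviation is the square root of the variance, and the output above agrees with that (3.847528 squared is about 14.80347). A minimal sketch of the relationship, using a simulated vector x as an assumed stand-in for the heights:

```r
# Simulated stand-in for the heights; the parameters are assumptions.
set.seed(1)
x <- rnorm(1000, mean = 66, sd = 4)

# Sample variance by hand: squared deviations from the mean,
# divided by n - 1 (the same denominator var() uses).
n <- length(x)
var.by.hand <- sum((x - mean(x))^2) / (n - 1)

print(all.equal(var(x), var.by.hand))
print(all.equal(sd(x), sqrt(var(x))))
```

Because the standard deviation is in the same units as the data, it is usually the easier of the two to interpret.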
Visualizing the data
Load the required library.
> library('ggplot2')
Histograms
> plot = ggplot(heights.weights, aes(x = Height)) + geom_histogram(binwidth = 1)
> ggsave(plot = plot, filename = "histgram.png", width = 6, height = 8)
Next, try a density plot. Its advantage is that the shape of the data set is easy to see even with a small amount of data.
> plot = ggplot(heights.weights, aes(x = Height)) + geom_density()
> ggsave(plot = plot, filename = "kde_histgram.png", width = 6, height = 8)
To see the characteristics of each gender, plot the density separately for each gender.
> plot = ggplot(heights.weights, aes(x = Height, fill = Gender)) + geom_density() + facet_grid(Gender ~ .)
> ggsave(plot = plot, filename = "gender_histgram.png", width = 6, height = 8)
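A quick classification of histogram shapes. See Wikipedia for details.
- Normal distribution
  - Unimodal: only one peak (= mode)
  - Symmetric
  - Thin tails (values far from the peak are rare)
- Cauchy distribution
  - Unimodal: only one peak (= mode)
  - Symmetric
  - Heavy tails (values far from the peak are common)
- Gamma distribution
  - Asymmetric; the mean and median can differ substantially
- Exponential distribution
  - Asymmetric; the mode is zero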
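The properties in this list can be checked numerically before any plotting. The following is a rough sketch on simulated samples; the sample size and all distribution parameters are assumptions chosen for illustration. It compares the mean and median (a gap suggests asymmetry) and the width of the observed range (a crude indicator of tail weight).

```r
set.seed(1)
n <- 10000  # hypothetical sample size

# One simulated sample per distribution; parameters are illustrative.
samples <- list(
  normal      = rnorm(n, mean = 0, sd = 1),
  cauchy      = rcauchy(n, location = 0, scale = 1),
  gamma       = rgamma(n, shape = 2, rate = 1),
  exponential = rexp(n, rate = 1)
)

# One column per distribution, one row per statistic.
shape.stats <- sapply(samples, function(x) {
  c(mean = mean(x), median = median(x), range.width = diff(range(x)))
})
print(round(shape.stats, 3))
```

For the skewed gamma and exponential samples the mean sits above the median, while the Cauchy sample spans a far wider range than the normal one. The Cauchy distribution has no defined mean, so its sample mean varies wildly from seed to seed and should not be read as an estimate of anything.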
An example of the normal distribution.
> set.seed(1)
> normal.values <- rnorm(250, 0, 1)
> plot = ggplot(data.frame(X = normal.values), aes(x = X)) + geom_density()
> ggsave(plot = plot, filename = "normal_histgram.png", width = 6, height = 8)
The Cauchy distribution.
> cauchy.values <- rcauchy(250, 0, 1)
> plot = ggplot(data.frame(X = cauchy.values), aes(x = X)) + geom_density()
> ggsave(plot = plot, filename = "cauchy_histgram.png", width = 6, height = 8)
The gamma distribution.
> gamma.values <- rgamma(100000, 1, 0.001)
> plot = ggplot(data.frame(X = gamma.values), aes(x = X)) + geom_density()
> ggsave(plot = plot, filename = "gamma_histgram.png", width = 6, height = 8)
No example for the exponential distribution.
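An exponential sample could be plotted the same way as the others. The following sketch is not from the post: the rate of 0.001 and the output filename are assumptions chosen to mirror the gamma example. A gamma distribution with shape 1 is an exponential distribution, so the plot should look much like the gamma plot above.

```r
library('ggplot2')

set.seed(1)
# Hypothetical parameters: 100000 draws with rate 0.001 (mean 1000).
exp.values <- rexp(100000, rate = 0.001)

# Same density-plot pattern as the other distribution examples.
plot <- ggplot(data.frame(X = exp.values), aes(x = X)) + geom_density()
ggsave(plot = plot, filename = "exp_histgram.png", width = 6, height = 8)
```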
Scatterplots
Draw a scatterplot of height against weight.
> plot = ggplot(heights.weights, aes(x = Height, y = Weight)) + geom_point()
> ggsave(plot = plot, filename = "scatterplots.png", width = 6, height = 8)
Height and weight are correlated, so we use geom_smooth() to overlay a smoothed trend line together with its confidence band.
> plot = ggplot(heights.weights, aes(x = Height, y = Weight)) + geom_point() + geom_smooth()
> ggsave(plot = plot, filename = "scatterplots2.png", width = 6, height = 8)
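The correlation can also be put into numbers. The sketch below uses simulated data, since the CSV file may not be at hand: the names height and weight and the coefficients -350 and 7.7 are made up for illustration, not estimates from the real data set. It computes the correlation coefficient and fits a straight line with lm(), a simpler alternative to the smoother drawn by geom_smooth().

```r
set.seed(1)

# Simulated stand-ins; every parameter here is an assumption.
height <- rnorm(500, mean = 66, sd = 4)
weight <- -350 + 7.7 * height + rnorm(500, mean = 0, sd = 12)

# Pearson correlation: near 1 means a strong positive linear relation.
r <- cor(height, weight)
print(r)

# Least-squares line; the slope is the weight change per unit of height.
fit <- lm(weight ~ height)
print(coef(fit))
```

With the real data, the same calls would be cor(heights.weights$Height, heights.weights$Weight) and lm(Weight ~ Height, data = heights.weights).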
Finally, draw the scatterplot colored by gender, and we are done.
> plot = ggplot(heights.weights, aes(x = Height, y = Weight, color = Gender)) + geom_point()
> ggsave(plot = plot, filename = "gender_scatterplots.png", width = 6, height = 8)