Quick notes on multivariate analysis methods
Data-analysis notes on the book "Data Science with R".
The book is good: a fairly comprehensive survey of the standard methods, with plenty of references.
Principal Component Analysis (PCA)
- Purpose
- Represent multivariate data with a smaller number of variables
- Usually reduced to 2-3 variables
- A method that maximizes variance
- One of the best-known multivariate techniques
- Can be viewed as an eigenvalue problem of the variance-covariance matrix
- Principal component scores, prediction, biplot...
- Closely related methods include kernel PCA (kpca; nonlinear PCA) and independent component analysis
- R
- princomp, prcomp, etc. are available
- princomp is the one normally used
- prcomp can also scale the data, etc.
Sample
require(graphics)

## The variances of the variables in the USArrests data
## vary by orders of magnitude, so scaling is appropriate
(pc.cr <- princomp(USArrests))  # inappropriate
princomp(USArrests, cor = TRUE) # =^= prcomp(USArrests, scale=TRUE)
## Similar, but different:
## The standard deviations differ by a factor of sqrt(49/50)

summary(pc.cr <- princomp(USArrests, cor = TRUE))
loadings(pc.cr)  ## note that blank entries are small but not zero
plot(pc.cr)      # shows a screeplot.
biplot(pc.cr)

## Formula interface
princomp(~ ., data = USArrests, cor = TRUE)

# NA-handling
USArrests[1, 2] <- NA
pc.cr <- princomp(~ Murder + Assault + UrbanPop,
                  data = USArrests, na.action = na.exclude, cor = TRUE)
pc.cr$scores
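The eigenvalue view mentioned above (PCA as the eigenproblem of the variance-covariance matrix) can be checked directly; a minimal sketch using only base R and the same USArrests data:

```r
## PCA by hand: eigendecomposition of the covariance matrix
X <- scale(USArrests, center = TRUE, scale = FALSE)  # center the data
ev <- eigen(cov(X))                                  # eigenvalues/vectors
pc <- prcomp(USArrests)                              # centered, unscaled PCA
## the variances of the principal components equal the eigenvalues
all.equal(unname(pc$sdev^2), ev$values)
## scores are the centered data projected onto the eigenvectors
scores <- X %*% ev$vectors
head(scores)
```

The eigenvector signs may differ from prcomp's, but the component variances agree.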
Factor Analysis
- Purpose
- Used mainly in psychology, sociology, etc. for exploratory data analysis: finding common factors in quantitative data that have no external criterion
- Derives common factors from the correlations between variables
- History
- Proposed by Spearman in 1904
- Caveats when using it
- Interpretations are ambiguous, so it is easy to pick whichever reading suits you; be careful
- It is important to use it in a way that allows objective interpretation
- Combining it with less ambiguous methods such as PCA or correspondence analysis is a good idea
- Algorithms
- Principal factor method, maximum likelihood, etc.
- Principal factor method: gives stable results
- Maximum likelihood: good when the data follow a normal distribution, though not for beginners
- Factor rotation
- For ease of interpretation, the axes in factor space are chosen so that highly correlated items load on a common factor; this operation is called factor rotation
- Types
- Orthogonal rotation and oblique rotation
- Orthogonal rotation: varimax (most widely used), biquartimax, quartimax, equamax
- Oblique rotation: promax (most widely used), covarimin, biquartimin, quartimin
- R
- Package: stats
- Function: factanal
Sample
# A little demonstration, v2 is just v1 with noise,
# and same for v4 vs. v3 and v6 vs. v5
# Last four cases are there to add noise
# and introduce a positive manifold (g factor)
v1 <- c(1,1,1,1,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
v2 <- c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
v3 <- c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
v4 <- c(3,3,4,3,3,1,1,2,1,1,1,1,2,1,1,5,6,4)
v5 <- c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
v6 <- c(1,1,1,2,1,3,3,3,4,3,1,1,1,2,1,6,5,4)
m1 <- cbind(v1,v2,v3,v4,v5,v6)
cor(m1)
factanal(m1, factors = 3) # varimax is the default
factanal(m1, factors = 3, rotation = "promax")
# The following shows the g factor as PC1
prcomp(m1)

## formula interface
factanal(~v1+v2+v3+v4+v5+v6, factors = 3, scores = "Bartlett")$scores

## a realistic example from Bartholomew (1987, pp. 61-65)
utils::example(ability.cov)
Correspondence Analysis
- What correspondence analysis is
- A method for analyzing patterns and the associations between cases and variables in frequency data and categorical data
- In Japanese known both as 対応分析 and コレスポンデンス分析
- Similar to quantification method III (Hayashi)
- Said to be used mostly in France
- R
- Package: MASS
- Function: mca
Sample
farms.mca <- mca(farms, abbrev = TRUE)
farms.mca
plot(farms.mca)
Multi-Dimensional Scaling (MDS)
- Goal
- Compute similarities or distances between the cases in the data, then plot them in 2-3 dimensions to grasp the structure and patterns in the data
- Classification
- Two kinds: metric and non-metric
- Metric MDS
- Analysis flow
- Compute the distances
- Compute coordinate values
- Place the cases in 2-3 dimensions (make a scatter plot)
- Examine reliability and so on
- How to do it in R
- Use cmdscale
Sample (distance data for European cities)
require(graphics)

loc <- cmdscale(eurodist)
x <- loc[,1]
y <- -loc[,2]
plot(x, y, type = "n", xlab = "", ylab = "", main = "cmdscale(eurodist)")
text(x, y, rownames(loc), cex = 0.8)

cmdsE <- cmdscale(eurodist, k = 20, add = TRUE, eig = TRUE, x.ret = TRUE)
utils::str(cmdsE)
- Non-metric MDS
- Metric MDS presupposes that distances can be computed directly, but proximity data such as psychological data do not satisfy the axioms of a distance
- Non-metric MDS makes such similarity data, which violate the distance axioms, usable
- Keywords
- Kruskal's stress
- A method that minimizes this stress
- R
- Packages: MASS, mlbench, e1071, vegan, etc.
- Functions: isoMDS, sammon, metaMDS
isoMDS
swiss.x <- as.matrix(swiss[, -1])
swiss.dist <- dist(swiss.x)
swiss.mds <- isoMDS(swiss.dist)
plot(swiss.mds$points, type = "n")
text(swiss.mds$points, labels = as.character(1:nrow(swiss.x)))
swiss.sh <- Shepard(swiss.dist, swiss.mds$points)
plot(swiss.sh, pch = ".")
lines(swiss.sh$x, swiss.sh$yf, type = "S")
Cluster Analysis
- Broadly speaking, there are three kinds
- Hierarchical, non-hierarchical (number of groups specified), and model-based methods
Hierarchical clustering
- Caveats
- The amount of computation explodes as the number of cases grows, so it is unsuited to large data sets
- Even on small data, apply it deliberately
- Analysis flow
- Compute distances (or similarities) from the data
- Apply a cluster-analysis method
- Compute the cophenetic matrix
- Build the dendrogram from the cophenetic matrix
- Examine the results
- Methods
- Single linkage, complete linkage, group average, median, centroid, Ward's method
- R
- Function: hclust
Sample
require(graphics)

hc <- hclust(dist(USArrests), "ave")
plot(hc)
plot(hc, hang = -1)

## Do the same with centroid clustering and squared Euclidean distance,
## cut the tree into ten clusters and reconstruct the upper part of the
## tree from the cluster centers.
hc <- hclust(dist(USArrests)^2, "cen")
memb <- cutree(hc, k = 10)
cent <- NULL
for(k in 1:10){
  cent <- rbind(cent, colMeans(USArrests[memb == k, , drop = FALSE]))
}
hc1 <- hclust(dist(cent)^2, method = "cen", members = table(memb))
opar <- par(mfrow = c(1, 2))
plot(hc, labels = FALSE, hang = -1, main = "Original Tree")
plot(hc1, labels = FALSE, hang = -1, main = "Re-start from 10 clusters")
par(opar)
Non-hierarchical clustering
- Analysis flow
- 1. Choose K cluster centers arbitrarily
- 2. Compute the distance of every data point to each of the K centers and assign each point to the nearest cluster
- 3. Compute the center of each cluster so formed
- Repeat steps 2 and 3
- R
- Function: kmeans
Sample
require(graphics)

# a 2-dimensional example
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")
(cl <- kmeans(x, 2))
plot(x, col = cl$cluster)
points(cl$centers, col = 1:2, pch = 8, cex = 2)

## random starts do help here with too many clusters
(cl <- kmeans(x, 5, nstart = 25))
plot(x, col = cl$cluster)
points(cl$centers, col = 1:5, pch = 8)
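The iteration in steps 1-3 above can also be written out by hand. A minimal sketch of the k-means loop (Lloyd's algorithm) on two iris columns; K = 3 and the fixed initial rows are arbitrary illustrative choices, not from the book:

```r
## k-means from scratch, following steps 1-3 above
X <- as.matrix(iris[, 1:2])
K <- 3
centers <- X[c(1, 60, 120), ]   # 1. pick K initial centers (fixed rows here)
for (iter in 1:20) {
  ## 2. assign every point to the nearest center
  d <- sapply(1:K, function(k) colSums((t(X) - centers[k, ])^2))
  cluster <- max.col(-d)        # argmin of the distances per row
  ## 3. recompute each cluster center
  centers <- t(sapply(1:K, function(k)
    colMeans(X[cluster == k, , drop = FALSE])))
}
table(cluster)
```

In practice you would just call kmeans as in the sample above; this only makes the loop explicit.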
Model-based methods
- model-based clustering
- Clustering via mixture distributions; also called latent-class clustering
- Uses probability distributions
- e.g. the EM algorithm with maximum-likelihood estimation
- R
- Package: mclust
- Functions: EMclust, mclustBIC, etc.
Sample
irisBIC <- mclustBIC(iris[,-5])
irisBIC
plot(irisBIC)

subset <- sample(1:nrow(iris), 100)
irisBIC <- mclustBIC(iris[,-5], initialization = list(subset = subset))
irisBIC
plot(irisBIC)

irisBIC1 <- mclustBIC(iris[,-5], G = seq(from = 1, to = 9, by = 2),
                      modelNames = c("EII", "EEI", "EEE"))
irisBIC1
plot(irisBIC1)
irisBIC2 <- mclustBIC(iris[,-5], G = seq(from = 2, to = 8, by = 2),
                      modelNames = c("VII", "VVI", "VVV"), x = irisBIC1)
irisBIC2
plot(irisBIC2)

nNoise <- 450
set.seed(0)
poissonNoise <- apply(apply(iris[,-5], 2, range), 2,
                      function(x, n) runif(n, min = x[1]-.1, max = x[2]+.1),
                      n = nNoise)
set.seed(0)
noiseInit <- sample(c(TRUE,FALSE), size = nrow(iris) + nNoise,
                    replace = TRUE, prob = c(3,1))
irisNdata <- rbind(iris[,-5], poissonNoise)
irisNbic <- mclustBIC(data = irisNdata,
                      initialization = list(noise = noiseInit))
irisNbic
plot(irisNbic)
Self-Organizing Maps (SOM)
- Overview
- A pattern-classification method using a neural network trained without teacher data (unsupervised).
- Projects high-dimensional data nonlinearly onto a two-dimensional plane
- A two-layer neural network consisting of an input layer and an output layer
- R
- Packages: kohonen, som
- Functions: som, somgrid, plot.kohonen, etc.
Sample
data(wines)
set.seed(7)
training <- sample(nrow(wines), 120)
Xtraining <- scale(wines[training, ])
Xtest <- scale(wines[-training, ],
               center = attr(Xtraining, "scaled:center"),
               scale = attr(Xtraining, "scaled:scale"))
som.wines <- som(Xtraining, grid = somgrid(5, 5, "hexagonal"))
som.prediction <- predict(som.wines, newdata = Xtest,
                          trainX = Xtraining,
                          trainY = factor(wine.classes[training]))
table(wine.classes[-training], som.prediction$prediction)
Linear Regression Analysis
- Regression is the most common technique, and the one richest in topics
- What linear regression is
- The most basic analysis method with quantitative data as the response variable (so the book says)
- One explanatory variable: simple regression
- Several explanatory variables: multiple regression
- R
- Function: lm
Sample
require(graphics)

## Annette Dobson (1990) "An Introduction to Generalized Linear Models".
## Page 9: Plant Weight Data.
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)
anova(lm.D9 <- lm(weight ~ group))
summary(lm.D90 <- lm(weight ~ group - 1)) # omitting intercept
summary(resid(lm.D9) - resid(lm.D90))     #- residuals almost identical

opar <- par(mfrow = c(2,2), oma = c(0, 0, 1.1, 0))
plot(lm.D9, las = 1) # Residuals, Fitted, ...
par(opar)

## model frame :
stopifnot(identical(lm(weight ~ group, method = "model.frame"),
                    model.frame(lm.D9)))
### less simple examples in "See Also" above
Nonlinear Regression Analysis
- What nonlinear regression is
- An analysis method with quantitative data as the response variable, but nonlinear
- Methods
- Logistic regression, polynomial regression, generalized linear models, smoothing regression and additive models, etc.
- R
- Packages: stats, mgcv
- Functions: nls, glm (generalized linear models), gam (smoothing regression)
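This section has no sample in the original notes; a minimal sketch of two of the functions listed above, on built-in data (the model choices are illustrative, not from the book):

```r
## Logistic regression: a GLM with binomial family.
## Model the probability that a car in mtcars has a manual
## transmission (am = 1) from its weight.
fit <- glm(am ~ wt, data = mtcars, family = binomial)
summary(fit)
## predicted probability for a hypothetical car with wt = 3
predict(fit, newdata = data.frame(wt = 3), type = "response")

## Nonlinear least squares with nls, for comparison:
## recover a and b from y = a * exp(b * x) + noise
x <- 1:20
set.seed(1)
y <- 2 * exp(-0.3 * x) + rnorm(20, sd = 0.02)
nfit <- nls(y ~ a * exp(b * x), start = list(a = 1, b = -0.1))
coef(nfit)
```

glm and nls are in stats; gam (mgcv) follows the same formula style.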
Linear Discriminant Analysis
- Comparison with regression analysis
- Regression: the external criterion is quantitative data
- Discriminant analysis: the external criterion is categorical data
- Fisher's linear discriminant function
- e.g. classifying iris
- A classical method
- Caveats for linear discrimination
- Requires the constraint of equal variances
- Not suited to large numbers of variables
- R
- Function: lda
- Use together with predict, cross-validation, etc.
Sample
Iris <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]),
                   Sp = rep(c("s","c","v"), rep(50,3)))
train <- sample(1:150, 75)
table(Iris$Sp[train])
## your answer may differ
##  c  s  v
## 22 23 30
z <- lda(Sp ~ ., Iris, prior = c(1,1,1)/3, subset = train)
predict(z, Iris[-train, ])$class
##  [1] s s s s s s s s s s s s s s s s s s s s s s s s s s s c c c
## [31] c c c c c c c v c c c c v c c c c c c c c c c c c v v v v v
## [61] v v v v v v v v v v v v v v v
(z1 <- update(z, . ~ . - Petal.W.))
- Not used that much nowadays compared with nonlinear methods
Nonlinear Discriminant Analysis
- What nonlinear discriminant analysis is
- (Roughly) every discriminant method other than linear discrimination
- So besides genuinely nonlinear methods, it also covers distance-based discrimination, majority-vote methods, Bayesian discrimination, and machine-learning methods
- Discriminant analysis via discriminant functions
- For discriminant functions given by quadratic forms, R has qda
- R
- Package: MASS
- Function: qda
Sample
tr <- sample(1:50, 25)
train <- rbind(iris3[tr,,1], iris3[tr,,2], iris3[tr,,3])
test <- rbind(iris3[-tr,,1], iris3[-tr,,2], iris3[-tr,,3])
cl <- factor(c(rep("s",25), rep("c",25), rep("v",25)))
z <- qda(train, cl)
predict(z, test)$class
- Distance-based discriminant analysis
- Uses e.g. the Mahalanobis distance
- R
- Function: mahalanobis
Sample
require(graphics)

ma <- cbind(1:6, 1:3)
(S <- var(ma))
mahalanobis(c(0,0), 1:2, S)

x <- matrix(rnorm(100*3), ncol = 3)
stopifnot(mahalanobis(x, 0, diag(ncol(x))) == rowSums(x*x))
##- Here, D^2 = usual squared Euclidean distances

Sx <- cov(x)
D2 <- mahalanobis(x, colMeans(x), Sx)
plot(density(D2, bw = .5),
     main = "Squared Mahalanobis distances, n=100, p=3"); rug(D2)
qqplot(qchisq(ppoints(100), df = 3), D2,
       main = expression("Q-Q plot of Mahalanobis" * ~D^2 *
                         " vs. quantiles of" * ~ chi[3]^2))
abline(0, 1, col = 'gray')
- Majority-vote discriminant analysis
- e.g. k-NN
- R
- Package: class
- Function: knn
Sample
train <- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])
test <- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])
cl <- factor(c(rep("s",25), rep("c",25), rep("v",25)))
knn(train, test, cl, k = 3, prob = TRUE)
attributes(.Last.value)
- Bayesian discriminant analysis
- R
- Packages: e1071, klaR, etc.
- Function: NaiveBayes
Sample
data(iris)
mN <- NaiveBayes(Species ~ ., data = iris)
plot(mN)

mK <- NaiveBayes(Species ~ ., data = iris, usekernel = TRUE)
plot(mK)
Survival Analysis
- What survival analysis is
- A method focusing on the relationship between an event and the time until that event occurs
- Scope
- Engineering: failures of mechanical systems and products
- Medicine: recurrence of disease, death, etc.
- In survival analysis, events such as failure, destruction, bankruptcy, and death are all regarded as "death" in the broad sense
- R
- Package: survival
- Functions: survfit, etc.
Sample
leukemia.surv <- survfit(Surv(time, status) ~ x, data = aml)
plot(leukemia.surv, lty = 2:3)
legend(100, .9, c("Maintenance", "No Maintenance"), lty = 2:3)
title("Kaplan-Meier Curves\nfor AML Maintenance Study")

lsurv2 <- survfit(Surv(time, status) ~ x, aml, type = 'fleming')
plot(lsurv2, lty = 2:3, fun = "cumhaz",
     xlab = "Months", ylab = "Cumulative Hazard")
Time Series
- Purpose
- Capture the characteristics of change in time-series data, in order to analyze the phenomenon and to predict and control the future
- Models
- AR, ARMA, ARIMA, ARFIMA, GARCH, VAR, etc.
- Before building a predictive model you need to check the acf, pacf, AIC, a spectral analysis and, in some cases (multiple-series models, etc.), the ccf
- After that, a unit-root test
- AR
- ARMA
- ARIMA
- ARFIMA
- Autoregressive fractionally integrated moving-average model
- GARCH
- Autoregressive conditional heteroskedasticity model; the subject of a Nobel Memorial Prize in economics
- Derived variants include TGARCH, APARCH, etc.
- VAR
- Multivariate (vector) autoregressive model
- Can be fitted with ar
- R
- Packages: tseries, fracdiff, fseries
- Functions: ar, arma, arima, fracdiff, garch, etc.
Sample
arima(lh, order = c(1,0,0))
arima(lh, order = c(3,0,0))
arima(lh, order = c(1,0,1))

arima(lh, order = c(3,0,0), method = "CSS")

arima(USAccDeaths, order = c(0,1,1), seasonal = list(order = c(0,1,1)))
arima(USAccDeaths, order = c(0,1,1), seasonal = list(order = c(0,1,1)),
      method = "CSS") # drops first 13 observations.
# for a model with as few years as this, we want full ML

arima(LakeHuron, order = c(2,0,0), xreg = time(LakeHuron)-1920)

## presidents contains NAs
## graphs in example(acf) suggest order 1 or 3
require(graphics)
(fit1 <- arima(presidents, c(1, 0, 0)))
tsdiag(fit1)
(fit3 <- arima(presidents, c(3, 0, 0))) # smaller AIC
tsdiag(fit3)
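The pre-fit checks listed in the notes above (acf, pacf, spectral analysis, unit-root test) can be sketched on the built-in lh series; adf.test comes from the tseries package already named above:

```r
require(graphics)

## identification checks before fitting a model
acf(lh)       # autocorrelation function
pacf(lh)      # partial autocorrelation function
spectrum(lh)  # spectral analysis

## unit-root test: augmented Dickey-Fuller, from the tseries package
library(tseries)
adf.test(lh)

## compare candidate orders by AIC
AIC(arima(lh, order = c(1,0,0)), arima(lh, order = c(3,0,0)))
```

The order choices here are only for illustration; in practice the acf/pacf plots guide them.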
- Chaotic time series
- Methods that analyze irregularly fluctuating time-series data in a nonlinear way
- R
- Package: tseriesChaos
- Function: embedd
Sample
library(scatterplot3d)
x <- window(rossler.ts, start = 90)
xyz <- embedd(x, m = 3, d = 8)
scatterplot3d(xyz, type = "l")
Decision Trees
- Build and draw a tree that splits the data with IF-THEN rules
- Widely used in all sorts of places
- CART is the most famous
- Uses entropy and the Gini index as splitting criteria
- Splitting criteria
- The F statistic, chi-squared statistic, etc. are also used
- R
- Package: mvpart
- Function: rpart
Sample
data(car.test.frame)
z.auto <- rpart(Mileage ~ Weight, car.test.frame)
summary(z.auto)
plot(z.auto); text(z.auto)

data(spider)
fit1 <- rpart(data.matrix(spider[,1:12]) ~ water+twigs+reft+herbs+moss+sand,
              spider, method = "mrt")
plot(fit1); text(fit1)
fit2 <- rpart(data.matrix(spider[,1:12]) ~ water+twigs+reft+herbs+moss+sand,
              spider, method = "mrt", dissim = "man")
plot(fit2); text(fit2)
fit3 <- rpart(gdist(spider[,1:12], meth = "bray", full = TRUE, sq = TRUE) ~
              water+twigs+reft+herbs+moss+sand, spider, method = "dist")
plot(fit3); text(fit3)
Neural Networks
- A famous family of methods for nonlinear regression and nonlinear discriminant analysis (pattern recognition)
- Error backpropagation is the well-known training method
- (Humans have some 14 billion neurons, incidentally)
- Models
- Layered networks, non-layered networks, single-hidden-layer networks, etc.
- Caveats
- Overfitting happens
- Results change with the training method, etc.
- R
- Package: nnet
Sample
# use half the iris data
ir <- rbind(iris3[,,1], iris3[,,2], iris3[,,3])
targets <- class.ind(c(rep("s", 50), rep("c", 50), rep("v", 50)))
samp <- c(sample(1:50,25), sample(51:100,25), sample(101:150,25))
ir1 <- nnet(ir[samp,], targets[samp,], size = 2, rang = 0.1,
            decay = 5e-4, maxit = 200)
test.cl <- function(true, pred) {
  true <- max.col(true)
  cres <- max.col(pred)
  table(true, cres)
}
test.cl(targets[-samp,], predict(ir1, ir[-samp,]))

# or
ird <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]),
                  species = factor(c(rep("s",50), rep("c",50), rep("v",50))))
ir.nn2 <- nnet(species ~ ., data = ird, subset = samp, size = 2, rang = 0.1,
               decay = 5e-4, maxit = 200)
table(ird$species[-samp], predict(ir.nn2, ird[-samp,], type = "class"))
Kernel Methods and Support Vector Machines
- What a kernel is
- Being able to turn nonlinear structure into linear structure is convenient, so we let a kernel function do the hard work
- A function of the form K(x, z)
- Scope
- Density estimation, principal component analysis, canonical correlation analysis, cluster analysis, discriminant analysis, etc.
- Kernel principal component analysis
- KPCA (Kernel Principal Component Analysis)
- Analysis flow
- Choose a kernel function K(x, z)
- Compute the n x n kernel matrix K from the data
- Compute the eigenvalues and eigenvectors of the n x n matrix K
- Normalize the eigenvalues and eigenvectors
- R
- Package: kernlab
- Function: kpca
Sample
# another example using the iris data
data(iris)
test <- sample(1:150, 20)
kpc <- kpca(~., data = iris[-test,-5], kernel = "rbfdot",
            kpar = list(sigma = 0.2), features = 2)

# print the principal component vectors
pcv(kpc)

# plot the data projection on the components
plot(rotated(kpc), col = as.integer(iris[-test,5]),
     xlab = "1st Principal Component", ylab = "2nd Principal Component")

# embed remaining points
emb <- predict(kpc, iris[test,-5])
points(emb, col = as.integer(iris[test,5]))
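The KPCA flow above can also be carried out by hand: build the kernel matrix, center it, take its eigendecomposition, and normalize. This sketch uses an RBF kernel of the same form as kernlab's rbfdot, exp(-sigma * |x - z|^2); sigma = 0.2 is an arbitrary illustrative choice:

```r
## Kernel PCA by hand, following the four steps above
X <- as.matrix(iris[, -5])
n <- nrow(X)
sigma <- 0.2                          # RBF kernel parameter (arbitrary)
D2 <- as.matrix(dist(X))^2            # squared Euclidean distances
K <- exp(-sigma * D2)                 # 1. kernel matrix K(x, z)
J <- matrix(1/n, n, n)
Kc <- K - J %*% K - K %*% J + J %*% K %*% J  # 2. center the kernel matrix
e <- eigen(Kc)                        # 3. eigenvalues / eigenvectors
## 4. normalize the eigenvectors by the square roots of the eigenvalues
alpha <- sweep(e$vectors[, 1:2], 2, sqrt(e$values[1:2]), "/")
scores <- Kc %*% alpha                # projections onto two components
plot(scores, col = as.integer(iris[, 5]))
```

kernlab's kpca packages the same steps (plus the out-of-sample projection used by predict above).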
- Support vector machines
- A data-analysis method aimed mainly at classification and regression problems
- Reduces to a margin-maximization problem
- R
- Package: kernlab
- Function: ksvm (there are various others too)
Sample
## simple example using the spam data set
data(spam)

## create test and training set
index <- sample(1:dim(spam)[1])
spamtrain <- spam[index[1:floor(2 * dim(spam)[1]/3)], ]
spamtest <- spam[index[((2 * ceiling(dim(spam)[1]/3)) + 1):dim(spam)[1]], ]

## train a support vector machine
filter <- ksvm(type ~ ., data = spamtrain, kernel = "rbfdot",
               kpar = list(sigma = 0.05), C = 5, cross = 3)
filter

## predict mail type on the test set
mailtype <- predict(filter, spamtest[,-58])

## Check results
table(mailtype, spamtest[,58])

## Another example with the famous iris data
data(iris)

## Create a kernel function using the build in rbfdot function
rbf <- rbfdot(sigma = 0.1)
rbf

## train a bound constraint support vector machine
irismodel <- ksvm(Species ~ ., data = iris, type = "C-bsvc",
                  kernel = rbf, C = 10, prob.model = TRUE)
irismodel

## get fitted values
fitted(irismodel)

## Test on the training set with probabilities as output
predict(irismodel, iris[,-5], type = "probabilities")

## Demo of the plot function
x <- rbind(matrix(rnorm(120), , 2), matrix(rnorm(120, mean = 3), , 2))
y <- matrix(c(rep(1, 60), rep(-1, 60)))
svp <- ksvm(x, y, type = "C-svc")
plot(svp, data = x)

### Use kernelMatrix
K <- as.kernelMatrix(crossprod(t(x)))
svp2 <- ksvm(K, y, type = "C-svc")
svp2

#### Use custom kernel
k <- function(x, y) { (sum(x*y) + 1) * exp(-0.001 * sum((x-y)^2)) }
class(k) <- "kernel"
data(promotergene)

## train svm using custom kernel
gene <- ksvm(Class ~ ., data = promotergene, kernel = k, C = 10, cross = 5)
gene

#### Use text with string kernels
data(reuters)
is(reuters)
tsv <- ksvm(reuters, rlabels, kernel = "stringdot",
            kpar = list(length = 5), cross = 3, C = 10)
tsv

## regression
# create data
x <- seq(-20, 20, 0.1)
y <- sin(x)/x + rnorm(401, sd = 0.03)

# train support vector machine
regm <- ksvm(x, y, epsilon = 0.01, kpar = list(sigma = 16), cross = 3)
plot(x, y, type = "l")
lines(x, predict(regm, x), col = "red")
Ensemble Learning
- What ensemble learning is
- In Japanese known both as 集団学習 and アンサンブル学習
- Builds a classifier with high accuracy by learning from the results of classifiers whose individual accuracy is by no means high
- Methods
- Bagging, boosting, random forest
- Random forest is especially new
- Bagging
- bagging (bootstrap aggregating), proposed by L. Breiman in 1996
- Creates several training data sets from the given data set via a resampling scheme called the bootstrap
- Raises accuracy by aggregating and combining the regression or classification results on the created data sets
- The bootstrap samples are roughly independent, so the learners can be trained in parallel.
- R
- Package: adabag
- Function: bagging
Sample
## rpart library should be loaded
library(rpart)
data(iris)
names(iris) <- c("LS","AS","LP","AP","Especies")
lirios.bagging <- bagging(Especies ~ LS + AS + LP + AP, data = iris,
                          mfinal = 10)

## rpart and mlbench libraries should be loaded
library(rpart)
library(mlbench)
data(BreastCancer)
l <- length(BreastCancer[,1])
sub <- sample(1:l, 2*l/3)
BC.bagging <- bagging(Class ~ ., data = BreastCancer[,-1], mfinal = 25,
                      maxdepth = 3)
BC.bagging.pred <- predict.bagging(BC.bagging,
                                   newdata = BreastCancer[-sub,-1])
BC.bagging.pred[-1]

# Data Vehicle (four classes)
library(rpart)
library(mlbench)
data(Vehicle)
l <- length(Vehicle[,1])
sub <- sample(1:l, 2*l/3)
Vehicle.bagging <- bagging(Class ~ ., data = Vehicle[sub, ], mfinal = 50,
                           maxdepth = 5)
Vehicle.bagging.pred <- predict.bagging(Vehicle.bagging,
                                        newdata = Vehicle[-sub, ])
Vehicle.bagging.pred[-1]
- Boosting
- Trains on supervised (labeled) data and repeatedly adjusts the weights in light of the previous learning results
- Raises accuracy by computing several learning results and aggregating and combining them
- AdaBoost (1996) is the most famous
- R
- Package: adabag
- Function: adaboost.M1
Sample
## rpart library should be loaded
library(rpart)
data(iris)
names(iris) <- c("LS","AS","LP","AP","Especies")
iris.adaboost <- adaboost.M1(Especies ~ LS + AS + LP + AP, data = iris,
                             boos = TRUE, mfinal = 10)

## rpart and mlbench libraries should be loaded
## Comparing the test error of rpart and adaboost.M1
library(rpart)
library(mlbench)
data(BreastCancer)
l <- length(BreastCancer[,1])
sub <- sample(1:l, 2*l/3)

BC.rpart <- rpart(Class ~ ., data = BreastCancer[sub,-1], maxdepth = 3)
BC.rpart.pred <- predict(BC.rpart, newdata = BreastCancer[-sub,-1],
                         type = "class")
tb <- table(BC.rpart.pred, BreastCancer$Class[-sub])
error.rpart <- 1 - (sum(diag(tb))/sum(tb))
tb
error.rpart

BC.adaboost <- adaboost.M1(Class ~ ., data = BreastCancer[,-1], mfinal = 25,
                           maxdepth = 3)
BC.adaboost.pred <- predict.boosting(BC.adaboost,
                                     newdata = BreastCancer[-sub,-1])
BC.adaboost.pred[-1]

## Data Vehicle (four classes)
library(rpart)
library(mlbench)
data(Vehicle)
l <- length(Vehicle[,1])
sub <- sample(1:l, 2*l/3)
mfinal <- 25
maxdepth <- 5

Vehicle.rpart <- rpart(Class ~ ., data = Vehicle[sub,], maxdepth = maxdepth)
Vehicle.rpart.pred <- predict(Vehicle.rpart, newdata = Vehicle[-sub, ],
                              type = "class")
tb <- table(Vehicle.rpart.pred, Vehicle$Class[-sub])
error.rpart <- 1 - (sum(diag(tb))/sum(tb))
tb
error.rpart

Vehicle.adaboost <- adaboost.M1(Class ~ ., data = Vehicle[sub, ],
                                mfinal = mfinal, maxdepth = maxdepth)
Vehicle.adaboost.pred <- predict.boosting(Vehicle.adaboost,
                                          newdata = Vehicle[-sub, ])
Vehicle.adaboost.pred[-1]
- Random forest
- Random Forest (RF) is a newer method proposed by Breiman, who also proposed bagging
- Superior to bagging and boosting in accuracy and in economical use of computing resources
- R
- Package: randomForest
- Function: randomForest
Sample
## Classification:
## data(iris)
set.seed(71)
iris.rf <- randomForest(Species ~ ., data = iris, importance = TRUE,
                        proximity = TRUE)
print(iris.rf)

## Look at variable importance:
round(importance(iris.rf), 2)

## Do MDS on 1 - proximity:
iris.mds <- cmdscale(1 - iris.rf$proximity, eig = TRUE)
op <- par(pty = "s")
pairs(cbind(iris[,1:4], iris.mds$points), cex = 0.6, gap = 0,
      col = c("red", "green", "blue")[as.numeric(iris$Species)],
      main = "Iris Data: Predictors and MDS of Proximity Based on RandomForest")
par(op)
print(iris.mds$GOF)

## The `unsupervised' case:
set.seed(17)
iris.urf <- randomForest(iris[, -5])
MDSplot(iris.urf, iris$Species)

## Regression:
## data(airquality)
set.seed(131)
ozone.rf <- randomForest(Ozone ~ ., data = airquality, mtry = 3,
                         importance = TRUE, na.action = na.omit)
print(ozone.rf)

## Show "importance" of variables: higher value mean more important:
round(importance(ozone.rf), 2)

## "x" can be a matrix instead of a data frame:
set.seed(17)
x <- matrix(runif(5e2), 100)
y <- gl(2, 50)
(myrf <- randomForest(x, y))
(predict(myrf, x))

## "complicated" formula:
(swiss.rf <- randomForest(sqrt(Fertility) ~ . - Catholic + I(Catholic < 50),
                          data = swiss))
(predict(swiss.rf, swiss))

## Test use of 32-level factor as a predictor:
set.seed(1)
x <- data.frame(x1 = gl(32, 5), x2 = runif(160), y = rnorm(160))
(rf1 <- randomForest(x[-3], x[[3]], ntree = 10))

## Grow no more than 4 nodes per tree:
(treesize(randomForest(Species ~ ., data = iris, maxnodes = 4, ntree = 30)))
Association Analysis
- The famous diapers-and-beer story
- What association analysis is
- Used to find useful information in data such as POS data
- POS data are called transactions or baskets (when these methods are applied)
- Representative techniques include association rules, frequent itemsets, etc.
- Association rules
- Rules about the relationships among items that frequently appear together in a transaction database
- Apriori, developed at IBM, is the most famous
- R
- Package: arules
- Function: apriori
Sample
data("Adult")
## Mine association rules.
rules <- apriori(Adult,
                 parameter = list(supp = 0.5, conf = 0.9, target = "rules"))
summary(rules)
- R
- Package: arules
- Function: eclat
Sample
data("Adult")
## Mine itemsets with minimum support of 0.1.
itemsets <- eclat(Adult, parameter = list(supp = 0.1, maxlen = 15))