Machine Learning Exercises: Clustering
Day 9 of working through *Machine Learning for Hackers* (Japanese edition: 『入門 機械学習』): Chapter 9, "MDS: Visually Exploring US Senator Similarity".
We learn multidimensional scaling (MDS), a technique for clustering observations, and use it to cluster US senators based on their votes.
```r
# Preparation
> setwd("09-MDS/")
```
Clustering with multidimensional scaling
We start with an example that groups customers using their product ratings, to learn the idea behind multidimensional scaling.
First, create the product rating data.
```r
> set.seed(851982)
> ex.matrix <- matrix(sample(c(-1, 0, 1), 24, replace = TRUE), nrow = 4, ncol = 6)
> row.names(ex.matrix) <- c('A', 'B', 'C', 'D')
> colnames(ex.matrix) <- c('P1', 'P2', 'P3', 'P4', 'P5', 'P6')
> ex.matrix
  P1 P2 P3 P4 P5 P6
A  0 -1  0 -1  0  0
B -1  0  1  1  1  0
C  0  0  0  1 -1  1
D  1  0  1 -1  0  0
```
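Columns (P1, P2, ...) are products and rows (A, B, ...) are customers. Each value is a rating: 1 means good, -1 means bad, and 0 means not rated.
The ratings are generated at random with `sample()`. (R 3.6.0 changed the default algorithm behind `sample()`, so on newer versions of R the same seed may not reproduce the matrix shown above.)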
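To extract the relationships between customers from this data, we first need to summarize them as numbers. Concretely, we turn the data into a matrix showing how strongly each pair of customers agrees, using these steps:

1. For two customers (say A and B), compare their ratings product by product and add up the scores:
   - both rated 1, or both rated -1: agreement (+1)
   - one rated 1 and the other -1: disagreement (-1)
   - either one rated 0: not rated (0)

   In other words, each product's score is the product of the two ratings.
2. Repeat this for every pair of customers to build a matrix.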
Example: agreement between A and B

agreement(A, B) = (P1 score) + (P2 score) + (P3 score) + (P4 score) + (P5 score) + (P6 score)
= (0 × -1) + (-1 × 0) + (0 × 1) + (-1 × 1) + (0 × 1) + (0 × 0)
= -1
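The two steps above can be written out literally as a loop over every pair of customers. This is a minimal sketch: `agreement_matrix` is a hypothetical helper name, and the rating matrix is typed in from the output above so the example does not depend on what `sample()` returns on your R version.

```r
# Rating matrix copied from the output above (rows = customers, columns = products)
ex.matrix <- matrix(c( 0, -1, 0, -1,  0, 0,
                      -1,  0, 1,  1,  1, 0,
                       0,  0, 0,  1, -1, 1,
                       1,  0, 1, -1,  0, 0),
                    nrow = 4, byrow = TRUE,
                    dimnames = list(c("A", "B", "C", "D"), paste0("P", 1:6)))

# Hypothetical helper: agreement score for every pair of customers.
# Per product, rating_i * rating_j is +1 (agree), -1 (disagree)
# or 0 (at least one of them did not rate it); the scores are summed.
agreement_matrix <- function(m) {
  n <- nrow(m)
  out <- matrix(0, n, n, dimnames = list(rownames(m), rownames(m)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      out[i, j] <- sum(m[i, ] * m[j, ])
    }
  }
  out
}

ex.agree <- agreement_matrix(ex.matrix)
ex.agree["A", "B"]  # -1, matching the hand calculation
```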
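Because each product's score is just the product of the two ratings, the whole calculation can be done in one step by multiplying the matrix by its transpose.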
```r
> ex.mult <- ex.matrix %*% t(ex.matrix)
> ex.mult
   A  B  C  D
A  2 -1 -1  1
B -1  4  0 -1
C -1  0  3 -1
D  1 -1 -1  3
```
Each value is the agreement score described above. The diagonal entries (A×A, B×B, ...) compare a customer with themself, so agreement is complete and the values are high.
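More precisely, every nonzero rating multiplied by itself gives 1, so each diagonal entry equals the number of products that customer rated. Here is a quick check, again using the matrix typed in from the output above:

```r
ex.matrix <- matrix(c( 0, -1, 0, -1,  0, 0,
                      -1,  0, 1,  1,  1, 0,
                       0,  0, 0,  1, -1, 1,
                       1,  0, 1, -1,  0, 0),
                    nrow = 4, byrow = TRUE,
                    dimnames = list(c("A", "B", "C", "D"), paste0("P", 1:6)))
ex.mult <- ex.matrix %*% t(ex.matrix)

# Diagonal of the agreement matrix vs. number of nonzero ratings per customer
diag(ex.mult)             # A B C D: 2 4 3 3
rowSums(ex.matrix != 0)   # A B C D: 2 4 3 3
```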
Next, we turn this data into distances between customers by computing Euclidean distances.
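Here, the Euclidean distance is the straight-line distance between two customers when each row of the matrix above is treated as a point in 4-dimensional space (one coordinate per customer).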
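A minimal sketch of that definition: the square root of the summed squared coordinate differences between two rows. `euclid` is a hypothetical helper name, and the matrix is typed in from the output above.

```r
ex.matrix <- matrix(c( 0, -1, 0, -1,  0, 0,
                      -1,  0, 1,  1,  1, 0,
                       0,  0, 0,  1, -1, 1,
                       1,  0, 1, -1,  0, 0),
                    nrow = 4, byrow = TRUE,
                    dimnames = list(c("A", "B", "C", "D"), paste0("P", 1:6)))
ex.mult <- ex.matrix %*% t(ex.matrix)

# Hypothetical helper: Euclidean distance between two points (rows of ex.mult)
euclid <- function(p, q) sqrt(sum((p - q)^2))

euclid(ex.mult["A", ], ex.mult["B", ])  # sqrt(39), about 6.245
euclid(ex.mult["A", ], ex.mult["D", ])  # sqrt(5), about 2.236
```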
Euclidean distances can be computed with the `dist` function.
```r
> ex.dist <- dist(ex.mult)
> ex.dist
         A        B        C
B 6.244998
C 5.477226 5.000000
D 2.236068 6.782330 6.082763
```
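A and D are close together, which means their opinions are similar.
Finally, we visualize this distance matrix with multidimensional scaling.
The MDS used here (classical MDS, via `cmdscale()`) takes a distance matrix that gives the distance between every pair of points in the data, and returns coordinates whose pairwise distances approximate those distances.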
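Classical (Torgerson) MDS can be sketched in a few lines. First square the distances and double-centre them to get an inner-product (Gram) matrix. Then take the top-k eigenvectors and scale each by the square root of its eigenvalue. `classical_mds` is a hypothetical name. The sketch assumes the top-k eigenvalues are positive. Coordinates are only defined up to a sign flip per axis, so the comparison with `cmdscale()` uses absolute values.

```r
ex.matrix <- matrix(c( 0, -1, 0, -1,  0, 0,
                      -1,  0, 1,  1,  1, 0,
                       0,  0, 0,  1, -1, 1,
                       1,  0, 1, -1,  0, 0),
                    nrow = 4, byrow = TRUE,
                    dimnames = list(c("A", "B", "C", "D"), paste0("P", 1:6)))
ex.dist <- dist(ex.matrix %*% t(ex.matrix))

classical_mds <- function(d, k = 2) {
  D2 <- as.matrix(d)^2                  # squared distances
  n <- nrow(D2)
  J <- diag(n) - matrix(1 / n, n, n)    # centring matrix
  B <- -0.5 * J %*% D2 %*% J            # double-centred Gram matrix
  e <- eigen(B, symmetric = TRUE)       # eigenvalues in decreasing order
  # Scale the top-k eigenvectors by sqrt(eigenvalue)
  coords <- e$vectors[, 1:k, drop = FALSE] %*% diag(sqrt(e$values[1:k]), nrow = k)
  rownames(coords) <- rownames(D2)
  coords
}

by.hand <- classical_mds(ex.dist)
by.cmd  <- cmdscale(ex.dist, k = 2)
all.equal(abs(by.hand), abs(by.cmd))   # TRUE up to per-axis sign
```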
We use `cmdscale()` to draw the relationships between customers as a 2-D plot.
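```r
# A 100x100 px device is too small for the default margins ("figure margins too large"), so use 400x400
> png("plot1.png", width = 400, height = 400)
> ex.mds <- cmdscale(ex.dist)
> plot(ex.mds, type = 'n')
> text(ex.mds, c('A', 'B', 'C', 'D'))
> dev.off()
```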
The scatter plot shows that A and D are similar.
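The 2-D plot is only an approximation of the 4-D distances, so it can be worth checking how much structure survives. `cmdscale(..., eig = TRUE)` returns the eigenvalues and a two-element `GOF` (goodness of fit) along with the points. Comparing the distances between the 2-D points with the original ones is another quick check. Because the input here is Euclidean, the 2-D configuration is a projection, so its distances can only shrink.

```r
ex.matrix <- matrix(c( 0, -1, 0, -1,  0, 0,
                      -1,  0, 1,  1,  1, 0,
                       0,  0, 0,  1, -1, 1,
                       1,  0, 1, -1,  0, 0),
                    nrow = 4, byrow = TRUE,
                    dimnames = list(c("A", "B", "C", "D"), paste0("P", 1:6)))
ex.dist <- dist(ex.matrix %*% t(ex.matrix))

fit <- cmdscale(ex.dist, k = 2, eig = TRUE)
fit$GOF   # share of the eigenvalue mass captured by the first two axes

# Original vs. 2-D pairwise distances, side by side
cbind(original = as.vector(ex.dist), mds.2d = as.vector(dist(fit$points)))
```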
Clustering US senators
Now that we have a feel for multidimensional scaling, we can tackle the main problem: classifying US senators using their yea/nay votes on bills.
First, load and clean the data.
```r
> library('foreign')
> library('ggplot2')
> data.dir <- file.path("data", "roll_call")
> data.files <- list.files(data.dir)
# Read every data file
> rollcall.data <- lapply(data.files, function(f) {
    read.dta(file.path(data.dir, f), convert.factors = FALSE)
  })
```
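We only need each senator's name, party, and votes. The function below keeps the vote columns (column 10 onward) and recodes them. Names and parties are added back later.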
```r
> rollcall.simplified <- function(df) {
    # State code 99 is the Vice President; exclude it
    no.pres <- subset(df, state < 99)
    # There are several vote codes; collapse them to yea (1) / nay (-1) / not voting (0)
    for (i in 10:ncol(no.pres)) {
      no.pres[, i] <- ifelse(no.pres[, i] > 6, 0, no.pres[, i])
      no.pres[, i] <- ifelse(no.pres[, i] > 0 & no.pres[, i] < 4, 1, no.pres[, i])
      no.pres[, i] <- ifelse(no.pres[, i] > 1, -1, no.pres[, i])
    }
    return(as.matrix(no.pres[, 10:ncol(no.pres)]))
  }
> rollcall.simple <- lapply(rollcall.data, rollcall.simplified)
```
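The three `ifelse()` passes map the raw vote codes onto 1, -1 and 0. Assume the codes are the integers 0 to 9. Under the usual Voteview convention, 1 to 3 are yea variants, 4 to 6 are nay variants, 7 to 9 are present or not voting, and 0 means not a member. That convention is an assumption and has not been checked against these files. Under it, a lookup vector does the same mapping in one step. `recode_loop` and `recode_votes` are hypothetical names.

```r
# The loop body from rollcall.simplified, applied to a plain vector
recode_loop <- function(v) {
  v <- ifelse(v > 6, 0, v)
  v <- ifelse(v > 0 & v < 4, 1, v)
  ifelse(v > 1, -1, v)
}

# Hypothetical one-step alternative (assumes integer codes 0..9):
# index a lookup table by code + 1
recode_votes <- function(v) {
  lookup <- c(0, 1, 1, 1, -1, -1, -1, 0, 0, 0)  # results for codes 0, 1, ..., 9
  lookup[v + 1]
}

codes <- 0:9
recode_loop(codes)    # returns 0 1 1 1 -1 -1 -1 0 0 0
recode_votes(codes)   # same result
```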
With the vote data ready, build a distance matrix for each congress.
```r
> rollcall.dist <- lapply(rollcall.simple, function(m) dist(m %*% t(m)))
```
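`m %*% t(m)` is the senator-by-senator agreement matrix, built exactly like the customer example. Base R's `tcrossprod(m)` computes the same product without forming `t(m)` explicitly. Here is a small check on a random matrix of -1/0/1 values (toy data, not the roll-call files):

```r
set.seed(1)
# Toy vote matrix: 5 "senators" x 8 "bills", entries in {-1, 0, 1}
m <- matrix(sample(c(-1, 0, 1), 40, replace = TRUE), nrow = 5)

agree <- tcrossprod(m)           # same as m %*% t(m)
all.equal(agree, m %*% t(m))     # TRUE
d <- dist(agree)                 # the distance step applied to each congress
```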
Compute the MDS coordinates.
```r
# Multiply by -1 to flip the coordinates so the plot's layout matches intuition
> rollcall.mds <- lapply(rollcall.dist, function(d) as.data.frame(cmdscale(d, k = 2) * -1))
> head(rollcall.mds[[1]])
          V1       V2
2  -11.44068 293.0001
3  283.82580 132.4369
4  885.85564 430.3451
5 1714.21327 185.5262
6 -843.58421 220.1038
7 1594.50998 225.8166
```
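Multiplying by -1 is safe because MDS coordinates are only determined up to rotation and reflection. Flipping the axes leaves every pairwise distance unchanged. Here is a quick check on a toy configuration. The comparison uses `as.vector()`, because `dist` objects also carry a `call` attribute that differs between the two calls.

```r
set.seed(2)
pts <- matrix(rnorm(20), ncol = 2)    # toy 2-D coordinates for 10 points

flipped  <- pts * -1                  # reflect through the origin (both axes)
mirrored <- cbind(pts[, 1], -pts[, 2])  # flip only the second axis

all.equal(as.vector(dist(pts)), as.vector(dist(flipped)))   # TRUE
all.equal(as.vector(dist(pts)), as.vector(dist(mirrored)))  # TRUE
```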
Next, take each senator's name and party from `rollcall.data` and add them to the MDS results.
```r
> congresses <- 101:111
> for (i in 1:length(rollcall.mds)) {
    names(rollcall.mds[[i]]) <- c("x", "y")
    congress <- subset(rollcall.data[[i]], state < 99)  # exclude the Vice President
    # Take the senator's name; some entries contain more than the surname, so keep only the first token
    congress.names <- sapply(as.character(congress$name), function(n) strsplit(n, "[, ]")[[1]][1])
    rollcall.mds[[i]] <- transform(rollcall.mds[[i]],
                                   name = congress.names,
                                   party = as.factor(congress$party),
                                   congress = congresses[i])
  }
> head(rollcall.mds[[1]])
           x        y      name party congress
2  -11.44068 293.0001    SHELBY   100      101
3  283.82580 132.4369    HEFLIN   100      101
4  885.85564 430.3451   STEVENS   200      101
5 1714.21327 185.5262 MURKOWSKI   200      101
6 -843.58421 220.1038 DECONCINI   100      101
7 1594.50998 225.8166    MCCAIN   200      101
```
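`strsplit(n, "[, ]")[[1]][1]` keeps everything before the first comma or space. The names below are made-up examples of the two formats the comment describes: a surname alone, and a surname followed by more of the name. A vectorised `sub()` gives the same result.

```r
# Made-up example names, not taken from the roll-call files
raw <- c("DOE", "ROE, JANE", "ROE JANE")

# Approach used above: split on a comma or space and keep the first piece
via.split <- sapply(raw, function(n) strsplit(n, "[, ]")[[1]][1], USE.NAMES = FALSE)

# Vectorised alternative: delete everything from the first comma/space onward
via.sub <- sub("[, ].*$", "", raw)

via.split   # "DOE" "ROE" "ROE"
```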
With the data assembled, let's plot it, starting with a scatter plot of the 110th Senate.
```r
# congresses is 101:111, so the 110th Congress is list element 10 (element 9 is the 109th)
> cong.110 <- rollcall.mds[[which(congresses == 110)]]
> base.110 <- ggplot(cong.110, aes(x = x, y = y)) +
    # Fixed alpha and size go outside aes(), so no extra scales or legends are needed
    geom_point(aes(shape = party), alpha = 0.75, size = 2) +
    scale_shape(name = "Party", breaks = c("100", "200", "328"),
                labels = c("Dem.", "Rep.", "Ind."), solid = FALSE) +
    theme_bw() +
    theme(axis.ticks = element_blank(),
          axis.text.x = element_blank(),
          axis.text.y = element_blank(),
          panel.grid.major = element_blank()) +
    ggtitle("Roll Call Vote MDS Clustering for 110th U.S. Senate") +
    xlab("") + ylab("")
> ggsave(plot = base.110, filename = 'plot2.png')
```
The senators split cleanly into two groups by party.
Finally, plot every congress (101st to 111th) side by side, and we're done.
```r
> all.mds <- do.call(rbind, rollcall.mds)
> all.plot <- ggplot(all.mds, aes(x = x, y = y)) +
    geom_point(aes(shape = party), alpha = 0.75, size = 2) +
    scale_shape(name = "Party", breaks = c("100", "200", "328"),
                labels = c("Dem.", "Rep.", "Ind."), solid = FALSE) +
    theme_bw() +
    theme(axis.ticks = element_blank(),
          axis.text.x = element_blank(),
          axis.text.y = element_blank(),
          panel.grid.major = element_blank()) +
    ggtitle("Roll Call Vote MDS Clustering for U.S. Senate (101st - 111th Congress)") +
    xlab("") + ylab("") +
    facet_wrap(~ congress)
> ggsave(plot = all.plot, filename = 'plot3.png')
```