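Note: I wanted this post to ride the buzz around the currently airing, very popular ハロー!!きんいろモザイク (Hello!! Kin-iro Mosaic) and the recent hype around deep learning. However, I could not get a sample voice for one of the Kin-iro Mosaic voice actors (Toyama Nao), and for a while I could not get a DNN implemented in time and was making do with random forest and multinomial logistic regression, which felt pretty bad. In the end I did manage to do it with a DNN, so the title is only half a bait-and-switch.
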
I was impressed by this post:
ご注文はDeep Learningですか? - kivantium活動日記
That post detects faces with OpenCV and feeds them to a DNN to classify each one as a main character or "other".
So, as someone trained in voice-actor statistics, I want to use speech analysis to identify who is singing at any given moment.
I tried this with a DNN.

The workflow is:
collecting sample voices → computing features → training → evaluating the classifier → prediction → making the video.

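Collecting sample voices
The cast of Kin-iro Mosaic is Nishi Asuka, Tanaka Manami, Taneda Risa, Uchiyama Yumi, and Toyama Nao, but there was no sample voice for Toyama Nao. As a fallback, I collected sample voices of the cast of ご注文はうさぎですか? (Is the Order a Rabbit?): Minase Inori, Sakura Ayane, Taneda Risa, Sato Satomi, and Uchida Maaya.
I used only samples without background music and roughly trimmed the silent stretches.
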
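Computing features
For now I used the mel-cepstrum. I take 12 coefficients per frame and combine them with the dynamic features that capture their change over time, so each 10 msec frame gives a 24-dimensional input vector.
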
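To make the 12 + 12 = 24 arithmetic concrete, here is a minimal sketch of how dynamic (delta) features can be appended to a cepstrum matrix. It uses random numbers in place of real cepstra, and the function name `delta_features` and the window size `dd = 2` are assumptions made for illustration. The delta is the slope of a least-squares line over neighbouring frames, which is the same idea as the `lm`-based helper in the feature-extraction script at the end of this post.

```r
# Slope of a least-squares line fitted to each run of 2 * dd + 1 frames.
# `cep` is a frames x coefficients matrix; the edges are padded by
# repeating the first and last frame.
delta_features <- function(cep, dd = 2) {
  n <- nrow(cep)
  padded <- cep[c(rep(1, dd), seq_len(n), rep(n, dd)), , drop = FALSE]
  x <- seq(-dd, dd)        # centred frame offsets, so sum(x) == 0
  denom <- sum(x^2)
  delta <- matrix(0, n, ncol(cep))
  for (i in seq_len(n)) {
    window <- padded[i:(i + 2 * dd), , drop = FALSE]
    # window * x multiplies every column by the offsets (column-major recycling)
    delta[i, ] <- colSums(window * x) / denom
  }
  delta
}

set.seed(1)
cep <- matrix(rnorm(5 * 12), nrow = 5, ncol = 12)  # synthetic stand-in: 5 frames x 12 coefficients
features <- cbind(cep, delta_features(cep))        # static + dynamic features
dim(features)
```

Each row of `features` is one frame, so a 12-coefficient cepstrum becomes a 24-column input row.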
Training
For now I used h2o, which is also available from R. Tuning the parameters did not go well at all on an i5 laptop with 8 GB of RAM…
The training data has six labels (the five main characters plus "Other") with 50,000 frames each.
"Other" stands for BGM and other voice actors' voices; I sampled it from the voice clips used in the previous analysis.
The lack of computing resources hurts. I am currently applying for access to a GPU machine, and I hope to use Theano or Caffe to run this faster.
# Deep learning
# Note: written against the h2o R API of the time; newer releases name some arguments differently (e.g. training_frame instead of data)
library(h2o)
# Connect to a local H2O instance for deep learning
localH2O <- h2o.init(ip = "localhost", port = 54321, startH2O = TRUE, nthreads = -1)

# The data has the label in column 1 and the input vector in columns 2 onward
# Load the data
h2o_tr <- h2o.importFile(localH2O, path = "dl_train_mel.csv")
h2o_ts <- h2o.importFile(localH2O, path = "dl_test_mel.csv")
df_tr <- as.data.frame(h2o_tr)
df_ts <- as.data.frame(h2o_ts)

# This is about the limit in terms of computation time
dl <- h2o.deeplearning(x = 2:ncol(df_tr), y = 1, data = h2o_tr, activation = "Tanh",
                       hidden = rep(1000, 3), epochs = 5, rate = 0.01)
pred.dl <- h2o.predict(object = dl, newdata = h2o_ts)

p <- as.data.frame(pred.dl)$predict  # predicted labels
table(p)
v <- as.data.frame(h2o_ts$V1)$V1     # true labels
mat <- table(v, p)                   # rows: true label, columns: predicted label
sum(diag(mat)) / length(v)           # overall accuracy

Evaluating the classifier
For the test data I prepared 10,000 frames for each of the six labels. As a first check, I computed the fraction of frames that were classified correctly.
The result was an accuracy of about 70%.
              p
v              Minase_Inori Sakura_Ayane Taneda_Risa Sato_Satomi Uchida_Maaya Other
  Minase_Inori         9655           72          29          33           11    59
  Sakura_Ayane           64         9478          97           5          168   169
  Taneda_Risa           194          323        8999          42           99   297
  Sato_Satomi           292           59          24        9425            8    72
  Uchida_Maaya           91          814         191          21         8532   305
  Other                6829        14666        9350        1451         4508 50331

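As a quick check on the 70% figure, the confusion matrix above can be typed in by hand and summarised. This is only a sketch: it assumes that rows are the true labels and columns the predictions (as in `table(v, p)`), and the names `recall` and `precision` are introduced here for illustration.

```r
labels <- c("Minase_Inori", "Sakura_Ayane", "Taneda_Risa",
            "Sato_Satomi", "Uchida_Maaya", "Other")

# Counts copied from the table above, one row per true label
mat <- matrix(c(9655,    72,   29,   33,   11,    59,
                  64,  9478,   97,    5,  168,   169,
                 194,   323, 8999,   42,   99,   297,
                 292,    59,   24, 9425,    8,    72,
                  91,   814,  191,   21, 8532,   305,
                6829, 14666, 9350, 1451, 4508, 50331),
              nrow = 6, byrow = TRUE,
              dimnames = list(true = labels, predicted = labels))

accuracy  <- sum(diag(mat)) / sum(mat)   # fraction of all frames on the diagonal
recall    <- diag(mat) / rowSums(mat)    # per true label: how often it is recovered
precision <- diag(mat) / colSums(mat)    # per predicted label: how often it is right

round(accuracy, 3)
round(recall, 3)
round(precision, 3)
```

Under this reading, each of the five voice-actor rows is recovered more than 85% of the time, while only about 58% of the "Other" frames are labelled "Other", so most of the errors are "Other" frames being assigned to a voice actor.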
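Prediction
This is just a call to predict. A prediction is made every 10 msec, and the character with the highest probability is taken as the estimate. For passages where several voices sing together, I assume the voices are mixed in proportion to these probabilities.
Right from the start it is overwhelmingly Taneda (lol).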
predict       Minase_Inori  Sakura_Ayane  Taneda_Risa   Uchida_Maaya           Sato_Satomi            Other
Taneda_Risa   0.0188608989  0.1630974859  0.8161082864  0.000003831            0.000003944            0.0019255573
Taneda_Risa   0.0069014449  0.0141900461  0.9757663012  4.78813080917462E-006  0.000431119            0.0027063172
Taneda_Risa   0.0010154428  0.2572121322  0.7388253808  2.58052859862801E-005  0.0002405713           0.0026806884
Sakura_Ayane  0.0272961222  0.9442297816  0.0271465927  4.11001019529067E-005  0.0002363344           0.0010500529
Taneda_Risa   0.1239169091  0.0305609833  0.8443024158  3.30391935676744E-006  3.27450834447518E-006  0.0012130789
Taneda_Risa   0.0202784631  0.3523899317  0.6195970774  0.000002033            3.84907465900142E-008  0.007732505
Taneda_Risa   0.0186102167  0.0161491111  0.9485545754  1.07119058156968E-006  2.20463178379759E-007  0.0166848917
Taneda_Risa   0.0865577981  0.1521711946  0.6512304544  1.5910183719825E-005   9.73642386270512E-007  0.1100237072
Taneda_Risa   0.0029677181  0.0482471623  0.9420395494  9.03342788660666E-006  5.52017036170582E-006  0.0067310599

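The per-frame decision can be sketched in a few lines. The probabilities below are three rows from the table above, rounded to four decimals for readability. The variable names are assumptions for illustration, and the 0.01 factor is the 10 msec frame length mentioned earlier.

```r
labels <- c("Minase_Inori", "Sakura_Ayane", "Taneda_Risa",
            "Uchida_Maaya", "Sato_Satomi", "Other")

# Three frames of class probabilities (rounded copies of rows from the table)
prob <- matrix(c(0.0189, 0.1631, 0.8161, 0.0000, 0.0000, 0.0019,
                 0.0273, 0.9442, 0.0271, 0.0000, 0.0002, 0.0011,
                 0.0866, 0.1522, 0.6512, 0.0000, 0.0000, 0.1100),
               ncol = 6, byrow = TRUE, dimnames = list(NULL, labels))

# Hard decision: the label with the highest probability in each frame
predicted <- labels[max.col(prob, ties.method = "first")]

# Time assigned to each label, at 10 msec per frame
seconds <- table(factor(predicted, levels = labels)) * 0.01

# Soft view: average probability per label, read as a mixing proportion
share <- colMeans(prob)

predicted
seconds
round(share, 3)
```

Summing frames per label like this is one way to arrive at statements such as "this singer is present for less than a second" for a whole song.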
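Making the video
In R this can be done with saveVideo from the animation package: given the number of seconds per frame and the output file extension, it takes a while but does produce the video.
The video itself has no audio, so I added the music with a video editor.
The character images were taken from the Twitter account on the official site; "Other" uses a separate stand-in image.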
library(png)
library(jpeg)
library(animation)
dat <- read.csv("predicted.csv")  # predictions from the previous step
# Twitter images, saved with file names that sort in the same order as the labels
pngs <- list.files("/cv/pic/", pattern = "jpg", full.names = TRUE)
pics <- mapply(function(x) readJPEG(x, native = TRUE), pngs, SIMPLIFY = FALSE)
ra <- 1  # I wanted to enlarge things because I expected squashing near the origin, but the worry was unfounded
xy0 <- sapply(pics, dim)[1:2, ]  # pixels
xy0[] <- min(xy0)
xy0[2, ] <- 1000  # the width got squashed for some reason, so prop it up
rownames(xy0) <- c("height", "width")
s0 <- 0.001  # scaling factor
cols <- c("pink", "lightblue", "violet", "yellow", "lightgreen", "black")  # character colors
saveVideo({
  ani.options(interval = 0.01, nmax = nrow(dat))
  for (j in seq(nrow(dat))) {
    b0 <- barplot(unlist(dat[j, -1]), ylim = c(0, 1), col = cols, axisnames = FALSE, las = 1)
    pa <- par()$usr
    text(pa[2], pa[4] + 0.02, paste(round(j * 0.01), "sec"), pos = 2, xpd = TRUE, cex = 2)
    lay0 <- cbind(b0, pa[3] - 0.09)  # image centres: under each bar, just below the axis
    for (i in seq(pics)) {
      xleft   <- lay0[i, 1] * ra - xy0[2, i] / 2 * s0
      ybottom <- lay0[i, 2] * ra - xy0[1, i] / 2 * s0
      xright  <- lay0[i, 1] * ra + xy0[2, i] / 2 * s0
      ytop    <- lay0[i, 2] * ra + xy0[1, i] / 2 * s0
      rasterImage(image = pics[[i]], xleft = xleft, ybottom = ybottom, xright = xright, ytop = ytop, xpd = TRUE)
    }
  }
}, video.name = "BM.mp4", other.opts = "-b 300k")  # higher bitrate, better quality
Uploading the video turned out to be a real hassle, so I made a time-series plot instead.
dat <- read.csv("predicted.csv")
# Display names; the order has to match the probability columns of predicted.csv
cvnames <- c("Sakura Ayane", "Minase Inori", "Taneda Risa", "Uchida Maaya", "Sato Satomi", "Other")
cols <- c("pink", "lightblue", "violet", "yellow", "lightgreen", "black")
par(mfrow = c(6, 1), mar = c(2, 4, 2, 2), cex.lab = 1.2)
for (i in 2:7) {
  plot(dat[, i], type = "l", col = cols[i - 1], xaxt = "n", xlab = "", ylab = "Probability", ylim = c(0, 1), las = 1)
  title(cvnames[i - 1])
  axis(1, at = seq(0, 9000, length = 10), labels = seq(0, 90, length = 10))  # frame index -> seconds
}

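Here is the finished result.
Even at the experimentation stage with h2o, the predictions would come out as all Taneda Risa or all Minase Inori, so generalization is extremely poor. However high the accuracy on the training data got, generalization to the test data remained poor. Most of the song is assigned to Uchida Maaya or Other, and Sato Satomi is present for less than one second.
I also tried the same thing without the "Other" label, but it did not improve at all.
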
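As for improvements: one of the selling points of a DNN is that it can extract features by itself, so the mel-cepstrum I fed in as features was probably a poor choice. The mel-cepstrum is usually explained as something that closely mimics the characteristics of human hearing and represents vowel frequencies, or something along those lines, but using an already finished (?) feature as input may not have been a good idea. "MFCC features are not suitable." is also what the linked source says. So improvement idea no. 1 is to use the data right after the FFT, before it becomes a mel-cepstrum. With just the FFT, the spectrum has reasonably distinctive peaks and gives about 2,000 dimensions, so if I throw that at the DNN it might sort things out by itself. (This is a rough guess. Applying something like a mel filter bank might still be fine.)
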
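A minimal sketch of what "just the FFT" could look like as an input vector, using only base R. The frame here is a synthetic two-tone signal standing in for a voice clip. The frame length `n = 4096`, the Hann window, and the log scaling are assumptions chosen so that the result has about 2,000 dimensions; they are not settings taken from this post.

```r
fs <- 44100                 # sampling rate, as in the feature-extraction script
n  <- 4096                  # hypothetical frame length in samples
t  <- (seq_len(n) - 1) / fs

# Synthetic stand-in for a voice frame: 440 Hz plus a weaker 880 Hz component
frame <- sin(2 * pi * 440 * t) + 0.5 * sin(2 * pi * 880 * t)

# Hann window to reduce spectral leakage at the frame edges
hann <- 0.5 - 0.5 * cos(2 * pi * (seq_len(n) - 1) / (n - 1))

# Magnitude spectrum; for a real signal only the first half is non-redundant
spec <- Mod(fft(frame * hann))[1:(n / 2)]
features <- log(spec + 1e-8)          # log-magnitude spectrum, n / 2 values

# Frequency of each bin, to check where the strongest peak sits
freqs <- (seq_len(n / 2) - 1) * fs / n
peak_hz <- freqs[which.max(spec)]

length(features)
peak_hz
```

One row like `features` per frame would replace the 24-dimensional mel-cepstrum row as DNN input. Whether that actually generalizes better is exactly the open question raised above.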
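Another point: I probably should have separated out the BGM of the OP. I did try independent component analysis, but I think that, as in the previous analysis, it failed to separate the sources well. There is a paper called Deep Karaoke, which reports that a DNN can handle even the cocktail-party effect, so trying that to reduce the noise from the BGM seems like the way to go.
There is also MedleyDB, a dataset of multitrack music recordings, and using it would probably be quicker.
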
library(tuneR)
library(seewave)
library(sound)
library(dtt)
library(fastICA)
library(phonTools)
library(e1071)

# Delta features: takes the cepstra matrix returned by melfcc
# dd is how many frames to take on each side
# (defined first so that it exists when the loops below call it)
delta_cepstrum <- function(mat, dd = 2) {
  res <- mat
  dat1 <- mat[c(rep(1, dd), seq(nrow(mat)), rep(nrow(mat), dd)), ]
  x <- seq(2 * dd + 1)
  for (j in seq(ncol(dat1))) {
    for (i in (dd + 1):(nrow(dat1) - dd)) {
      y <- dat1[(i - dd):(i + dd), j]
      lm1 <- lm(y ~ x)
      res[i - dd, j] <- lm1$coefficients[2]  # slope over the window
    }
  }
  return(res)
}

wd1 <- "/cv/cv_deep/"   # sample voices of the Gochiusa voice actors
wd2 <- "/cv/original/"  # sample voices of other voice actors
f1 <- list.files(wd1, pattern = "wav")
f2 <- list.files(wd2, pattern = "wav")
cv_gochiusa <- unique(mapply(function(x) x[1], strsplit(f1, "_silence_")))
wav1 <- mapply(readWave, paste(wd1, f1, sep = ""))
wav2 <- mapply(readWave, paste(wd2, f2, sep = ""))
fs <- 44100
msec <- 0.05    # length of each sampled clip (in seconds)
niter <- 10000  # number of clips to generate
n_frm <- 8      # number of formants

# Collect data for the Gochiusa voice actors
res_mel <- res_frm <- NULL
for (cv in seq(cv_gochiusa)) {
  pb <- txtProgressBar(max = niter, style = 3)
  for (n in seq(niter)) {
    setTxtProgressBar(pb, n)
    tmp_mel <- tmp_frm <- NULL
    i <- sample(grep(cv_gochiusa[cv], f1), size = 1)
    r <- rle(wav1[[i]]@left > 100)  # try to skip silent-looking stretches
    lidx <- cumsum(r$lengths)[r$lengths < msec * fs]
    if (length(lidx) > 1) {
      cutpoint <- sample(head(lidx, -1), size = 1)
      tmp_w <- extractWave(wav1[[i]], from = cutpoint, to = cutpoint + msec * fs)
      m0 <- try(melfcc(tmp_w, wintime = 0.01, spec_out = TRUE), silent = TRUE)
      if (!inherits(m0, "try-error")) {
        dc <- delta_cepstrum(m0$cepstra, dd = 5)
        tmp_mel <- rbind(tmp_mel, cbind(m0$cepstra, dc))
        tmp_mel <- as.data.frame(cbind(cv_gochiusa[cv], tmp_mel))
        res_mel <- rbind(res_mel, tmp_mel)
      }
      frm <- findformants(tmp_w@left, fs = fs, verify = FALSE)  # formant extraction
      if (length(frm$formant) >= n_frm) {
        tmp_frm <- rbind(tmp_frm, head(frm$formant, n_frm))
        tmp_frm <- as.data.frame(cbind(cv_gochiusa[cv], tmp_frm))
        res_frm <- rbind(res_frm, tmp_frm)
      }
    }
  }
}
write.csv(res_frm, "gochiusa_frm.csv")
write.csv(res_mel, "gochiusa_mel.csv")

# Build the data for the other voice actors ("Other")
res_mel <- res_frm <- NULL
pb <- txtProgressBar(max = niter, style = 3)
for (n in seq(niter)) {
  setTxtProgressBar(pb, n)
  tmp_mel <- tmp_frm <- NULL
  i <- sample(seq(wav2), size = 1)
  r <- rle(wav2[[i]]@left > 100)  # try to skip silent-looking stretches
  lidx <- cumsum(r$lengths)[r$lengths < msec * fs]
  if (length(lidx) > 1) {
    cutpoint <- sample(head(lidx, -1), size = 1)
    tmp_w <- extractWave(wav2[[i]], from = cutpoint, to = cutpoint + msec * fs)
    m0 <- try(melfcc(tmp_w, wintime = 0.01, spec_out = TRUE), silent = TRUE)
    if (!inherits(m0, "try-error")) {
      dc <- delta_cepstrum(m0$cepstra, dd = 5)
      tmp_mel <- rbind(tmp_mel, cbind(m0$cepstra, dc))
      tmp_mel <- as.data.frame(cbind("Other", tmp_mel))
      res_mel <- rbind(res_mel, tmp_mel)
    }
    frm <- findformants(tmp_w@left, fs = fs, verify = FALSE)  # formant extraction
    if (length(frm$formant) >= n_frm) {
      tmp_frm <- rbind(tmp_frm, head(frm$formant, n_frm))
      tmp_frm <- as.data.frame(cbind("Other", tmp_frm))
      res_frm <- rbind(res_frm, tmp_frm)
    }
  }
}
write.csv(res_mel, "other_mel.csv")
write.csv(res_frm, "other_frm.csv")