Earth Mover's Distance (EMD)
Here are my notes from reading up on the Earth Mover's Distance (EMD). EMD is a distance measure, like the Euclidean distance, that measures the distance between two distributions. I had not come across it much in natural language processing, but it seems to be a fairly well-known measure in image and audio processing.
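The problem setting where EMD applies looks like the figure below.
[Figure: distribution P with features P1, ..., Pm and distribution Q with features Q1, ..., Qn, each feature carrying a weight]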
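EMD is the distance between a distribution P and a distribution Q, each given as a set of features with weights (called a signature). It assumes that a distance $d_{ij}$ is already defined between individual features: the Euclidean distance when the features are vectors, for example, or the Kullback-Leibler divergence when the features are probability distributions. Given two sets of features, EMD turns the distances between individual features into a distance between the two sets themselves. That is pretty impressive.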
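To make the ground distance concrete, here is a minimal Python sketch of the two choices mentioned above. The function names `euclidean` and `kl_divergence` are made up for illustration and do not come from any library. Note also that the Kullback-Leibler divergence is not symmetric, so it is a "distance" only in a loose sense.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions.

    Terms with p[i] == 0 contribute nothing; q[i] must be positive
    wherever p[i] is positive.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

print(euclidean((0, 0, 0), (3, 4, 0)))        # 5.0
print(kl_divergence([0.5, 0.5], [0.5, 0.5]))  # 0.0
```

Either function can play the role of $d_{ij}$; EMD itself does not care which one is used as long as it returns a non-negative cost for every pair of features.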
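What the weights mean depends on the application, but they express the importance of each feature. For a histogram, for example, you can treat each bar as a feature, and the height of the bar is then its weight. In an earlier post on similar image search (2009/10/3) I computed a distance called Histogram Intersection from the color histograms of images, and EMD can be used for that as well. The original EMD paper listed in the references targets similar image search.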
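As a rough illustration of the histogram case, the sketch below turns a histogram into a signature. The helper name `histogram_to_signature` and the sample bins are hypothetical, and dropping empty bins is my own simplification rather than something the post prescribes.

```python
def histogram_to_signature(bin_centers, counts):
    """Turn a histogram into a signature: a list of (feature, weight) pairs.

    Each non-empty bin becomes one feature (its center), weighted by the
    height of the bar. Empty bins are skipped (an assumption of this sketch).
    """
    if len(bin_centers) != len(counts):
        raise ValueError("need exactly one count per bin")
    return [(c, float(h)) for c, h in zip(bin_centers, counts) if h > 0]

# Hypothetical 4-bin gray-level histogram; bin centers are 3-D color vectors.
centers = [(32, 32, 32), (96, 96, 96), (160, 160, 160), (224, 224, 224)]
counts = [12, 0, 5, 3]
signature = histogram_to_signature(centers, counts)
for feature, weight in signature:
    print(feature, weight)
```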
At first I thought: why bother with EMD, why not simply sum the distances over every combination of features? But that ignores the weights completely... and the weights are the important part!
The basic idea is the transportation problem
The definition of EMD is based on the transportation problem, one of the classic optimization problems, so let me summarize that first. In the figure above, suppose each location P1, ..., Pm of P holds an amount of cargo equal to its weight, and each location Q1, ..., Qn of Q has a warehouse whose capacity equals its weight. The transportation problem asks: when all the cargo in P is carried to Q*1, how much should go from where to where for the transport to be most efficient?
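Let $d_{ij}$ be the cost (distance) of transporting from Pi to Qj, and let $f_{ij}$ be the amount of cargo (the flow) transported from Pi to Qj. The work needed to carry cargo from Pi to Qj is then defined as $d_{ij} f_{ij}$. Carrying a lot of cargo over a long distance takes that much more work, which matches intuition. If we define the total work W as below, the $f_{ij}$ that minimizes W is the most efficient way to carry the cargo.

$$W = \sum_{i=1}^{m} \sum_{j=1}^{n} d_{ij} f_{ij}$$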
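The total work is just an element-wise product summed over all pairs. The sketch below uses a made-up 2x2 distance matrix and two hypothetical shipping plans (both move one unit out of each P and one unit into each Q) to show that different flows give different amounts of work; `total_work` is a name chosen for illustration.

```python
def total_work(dist, flow):
    """W = sum over all i, j of dist[i][j] * flow[i][j]."""
    return sum(d * f
               for d_row, f_row in zip(dist, flow)
               for d, f in zip(d_row, f_row))

# Made-up distances: dist[i][j] is the cost of moving one unit from Pi to Qj.
dist = [[1.0, 3.0],
        [2.0, 1.0]]

plan_a = [[1.0, 0.0],   # P1 -> Q1
          [0.0, 1.0]]   # P2 -> Q2
plan_b = [[0.0, 1.0],   # P1 -> Q2
          [1.0, 0.0]]   # P2 -> Q1

print(total_work(dist, plan_a))  # 2.0
print(total_work(dist, plan_b))  # 5.0
```

Plan A is the cheaper of the two here, which is exactly the kind of choice the transportation problem makes over all possible flows.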
The $d_{ij}$ are assumed to be given, so the only variables to optimize are the flows $f_{ij}$. The flows $f_{ij}$ are subject to the following four constraints.
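(1) Cargo always moves from P to Q, never in the opposite direction: $f_{ij} \ge 0$
(2) No more cargo can leave Pi than it holds, where $w_{p_i}$ is the weight of Pi: $\sum_{j=1}^{n} f_{ij} \le w_{p_i}$
(3) The warehouse at Qj cannot accept more cargo than its capacity, where $w_{q_j}$ is the weight of Qj: $\sum_{i=1}^{m} f_{ij} \le w_{q_j}$
(4) The total amount transported is capped by the smaller of the total cargo and the total warehouse capacity: $\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} = \min\left(\sum_{i=1}^{m} w_{p_i}, \sum_{j=1}^{n} w_{q_j}\right)$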
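The last condition is needed when the total cargo and the total warehouse capacity differ. If the total capacity is larger than the total cargo, everything can be transported, so the limit is the total cargo. If there is more cargo than capacity, not everything can be transported, so the limit is the total capacity. In the example I use here, the total cargo and the total capacity are the same.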
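The four constraints above can be checked mechanically for any candidate flow. This is only a sketch: the function name `is_feasible`, the tolerance `tol`, and the sample weights are assumptions made for illustration.

```python
def is_feasible(flow, w_p, w_q, tol=1e-9):
    """Check a candidate flow against constraints (1) to (4)."""
    m, n = len(w_p), len(w_q)
    # (1) cargo only moves from P to Q, so no flow may be negative
    if any(flow[i][j] < -tol for i in range(m) for j in range(n)):
        return False
    # (2) Pi cannot ship more than the cargo it holds
    if any(sum(flow[i]) > w_p[i] + tol for i in range(m)):
        return False
    # (3) Qj cannot receive more than its capacity
    if any(sum(flow[i][j] for i in range(m)) > w_q[j] + tol for j in range(n)):
        return False
    # (4) the total moved equals the smaller of total cargo and total capacity
    total = sum(sum(row) for row in flow)
    return abs(total - min(sum(w_p), sum(w_q))) <= tol

# Total cargo (4) exceeds total capacity (3), so only 3 units can be moved.
w_p = [3.0, 1.0]
w_q = [2.0, 1.0]
print(is_feasible([[2.0, 0.0], [0.0, 1.0]], w_p, w_q))  # True
print(is_feasible([[2.0, 1.0], [0.0, 1.0]], w_p, w_q))  # False: Q2 overflows
```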
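I will skip how to solve the transportation problem, but solving it yields the optimal flow $f^*_{ij}$. EMD is defined from this $f^*_{ij}$ as follows.

$$\mathrm{EMD}(P, Q) = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} d_{ij} f^*_{ij}}{\sum_{i=1}^{m} \sum_{j=1}^{n} f^*_{ij}}$$

Dividing by the total flow normalizes the value so that the scale of EMD does not change with the amount transported. I will check this with a concrete example later.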
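EMD says that the smaller the optimal work needed for the transport, the closer the two signatures are, which strikes me as a fairly natural idea. However, the sum runs over every pair of features, so the computational cost looks likely to become very large as the number of features grows. For that reason, methods have been proposed that build the signatures with vector quantization so that the number of features does not explode. I plan to cover that in a later post.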
Now that we have the definition of EMD, let's work through a concrete example.
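This example comes from the library by Rubner, who proposed EMD. The features are 3-dimensional vectors and the weights are floating-point numbers. Since the feature values are 3-dimensional vectors ranging from 0 to 255, distribution P presumably represents the color histogram of image 1 and distribution Q the color histogram of image 2. Let's compute the EMD between distribution P and distribution Q!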
Rubner's C implementation
First, let's use the C code that Rubner has published (example1.c). Running it requires emd.c and emd.h. You also need to rewrite the definition of feature_t in emd.h to suit the problem. Here the features are 3-dimensional vectors, so it is defined as
typedef struct { int X,Y,Z; } feature_t;
The code below uses the emd.c library to solve the example above.
#include <stdio.h>
#include <math.h>
#include "emd.h"

/* Euclidean distance between two features */
float dist(feature_t *F1, feature_t *F2)
{
    int dX = F1->X - F2->X;
    int dY = F1->Y - F2->Y;
    int dZ = F1->Z - F2->Z;
    return sqrt(dX*dX + dY*dY + dZ*dZ);
}

int main()
{
    /* Feature vectors of distribution P */
    feature_t f1[4] = { {100,40,22}, {211,20,2}, {32,190,150}, {2,100,100} };
    /* Feature vectors of distribution Q */
    feature_t f2[3] = { {0,0,0}, {50,100,80}, {255,255,255} };
    /* Weights of distribution P (four features, so four weights) */
    float w1[4] = { 0.4, 0.3, 0.2, 0.1 };
    /* Weights of distribution Q */
    float w2[3] = { 0.5, 0.3, 0.2 };
    /* Signature of distribution P */
    signature_t s1 = { 4, f1, w1 };
    /* Signature of distribution Q */
    signature_t s2 = { 3, f2, w2 };

    /* Compute the EMD */
    float e;
    e = emd(&s1, &s2, dist, 0, 0);
    printf("emd = %f\n", e);

    return 0;
}
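The emd() function is given the signatures of the two distributions and a function that computes the distance between features. In this implementation the weights are floating-point numbers. Each set of weights sums to 1.0, so the total cargo equals the total warehouse capacity. Running it gives: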
emd = 160.542770
So the distance between distribution P and distribution Q is 160.54. As mentioned earlier, EMD is normalized by the amount transported, so changing the weights while keeping their ratios, as below, gives the same result.
/* Weights of distribution P */
float w1[4] = { 4.0, 3.0, 2.0, 1.0 };
/* Weights of distribution Q */
float w2[3] = { 5.0, 3.0, 2.0 };
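The effect of the normalization can also be seen without running a solver. In the sketch below, the flow matrix is one I worked out by hand for this example, so treat it as an assumption rather than output taken from emd.c; the names `euclidean` and `emd_from_flow` are likewise just for illustration. Multiplying every weight by 10 multiplies the flow by 10, which scales the work and the total flow by the same factor.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def emd_from_flow(dist, flow):
    """Normalized work: sum(d_ij * f_ij) / sum(f_ij) for a given flow."""
    work = sum(d * f
               for d_row, f_row in zip(dist, flow)
               for d, f in zip(d_row, f_row))
    total = sum(sum(row) for row in flow)
    return work / total

f1 = [(100, 40, 22), (211, 20, 2), (32, 190, 150), (2, 100, 100)]
f2 = [(0, 0, 0), (50, 100, 80), (255, 255, 255)]
dist = [[euclidean(a, b) for b in f2] for a in f1]

# Hand-derived flow for weights P = (0.4, 0.3, 0.2, 0.1), Q = (0.5, 0.3, 0.2).
# Row i is Pi, column j is Qj. This is an assumption of the sketch.
flow = [[0.4, 0.0, 0.0],
        [0.1, 0.2, 0.0],
        [0.0, 0.0, 0.2],
        [0.0, 0.1, 0.0]]
# Same plan with every weight multiplied by 10.
scaled = [[10 * f for f in row] for row in flow]

print("emd = %f" % emd_from_flow(dist, flow))
print("emd = %f" % emd_from_flow(dist, scaled))
```

Both lines should print a value of about 160.54, in line with the result reported above.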
Implementation in R
Next, let's solve the same example in R. A function for solving transportation problems is included in a library called lpSolve, which is not installed by default, so install it first.
install.packages("lpSolve")
Create the following emd_sample.R file.
library(lpSolve)

# Euclidean distance
euclid_dist <- function(f1, f2) {
  return(sqrt(sum((f1 - f2)^2)))
}

# Compute the EMD
emd <- function(dist, w1, w2) {
  # Prepare the arguments for lp.transport()
  costs <- dist
  row.signs <- rep("<", length(w1))
  row.rhs <- w1
  col.signs <- rep(">", length(w2))
  col.rhs <- w2

  # Solve the transportation problem
  t <- lp.transport(costs, "min", row.signs, row.rhs, col.signs, col.rhs)

  # Get the optimal flow
  flow <- t$solution

  # Compute the work
  work <- sum(flow * dist)

  # Normalize to get the EMD
  e <- work / sum(flow)
  return(e)
}

# Features
f1 <- matrix(c(100, 40, 22, 211, 20, 2, 32, 190, 150, 2, 100, 100), 4, 3, byrow=TRUE)
f2 <- matrix(c(0, 0, 0, 50, 100, 80, 255, 255, 255), 3, 3, byrow=TRUE)

# Weights (integers required?)
w1 <- c(4, 3, 2, 1)
w2 <- c(5, 3, 2)

n1 <- nrow(f1)
n2 <- nrow(f2)

# Build the distance matrix
dist <- matrix(0, n1, n2)
for (i in 1:n1) {
  for (j in 1:n2) {
    dist[i, j] <- euclid_dist(f1[i,], f2[j,])
  }
}

# Compute the EMD from the distance matrix and the weights
e <- emd(dist, w1, w2)
cat(sprintf("emd = %f\n", e))
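This implementation takes the distance matrix computed from the signatures, together with the weights, rather than the signatures themselves. I did it this way because lp.transport() accepts a distance matrix, but either approach would do. Start R and type the following to run it.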
> source("emd_sample.R")
emd = 160.542763
Same result as the C version! That said, the distance matrix computation is not written in a very R-like way, so it may be inefficient. If you know a better way to write it, please let me know.
But I still want to write it in Python!
Python's SciPy does not seem to have a function corresponding to lp.transport(). I also skimmed other optimization libraries (openopt, cvxopt) but could not find one. I could have written it myself, but since the R function is already there, let's use rpy2 to call R's lp.transport() from Python.
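rpy2 is a Python library that lets you call R functionality from Python. It is a little complicated to use, but there are plenty of algorithms that exist in R and not in Python, as in this case, so being able to use it speeds things up. Since it wraps R, things do get a bit convoluted, though.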
#coding:utf-8
import numpy as np
import rpy2.robjects as robjects

# Import R's lp.transport()
robjects.r['library']('lpSolve')
transport = robjects.r['lp.transport']

def euclid_dist(feature1, feature2):
    """Compute the Euclidean distance"""
    if len(feature1) != len(feature2):
        print("ERROR: calc euclid_dist: %d <=> %d" % (len(feature1), len(feature2)))
        return -1
    return np.sqrt(np.sum((feature1 - feature2) ** 2))

def emd(dist, w1, w2):
    """Compute the EMD using R's transport() function"""
    # Prepare the arguments for transport()
    costs = robjects.r['matrix'](robjects.FloatVector(dist),
                                 nrow=len(w1), ncol=len(w2),
                                 byrow=True)
    row_signs = ["<"] * len(w1)
    row_rhs = robjects.FloatVector(w1)
    col_signs = [">"] * len(w2)
    col_rhs = robjects.FloatVector(w2)

    # Solve the transportation problem and get the optimal flow
    t = transport(costs, "min", row_signs, row_rhs, col_signs, col_rhs)
    flow = t.rx2('solution')

    # Work divided by total flow gives the EMD
    dist = dist.reshape(len(w1), len(w2))
    flow = np.array(flow)
    work = np.sum(flow * dist)
    return work / np.sum(flow)

if __name__ == "__main__":
    f1 = np.array([ [100, 40, 22], [211, 20, 2], [32, 190, 150], [2, 100, 100] ])
    f2 = np.array([ [0, 0, 0], [50, 100, 80], [255, 255, 255] ])

    # Weights must be natural numbers
    w1 = np.array([4, 3, 2, 1])
    w2 = np.array([5, 3, 2])

    n1 = len(f1)
    n2 = len(f2)

    # Build the distance matrix (flattened, row-major)
    dist = np.zeros(n1 * n2)
    for i in range(n1):
        for j in range(n2):
            dist[i * n2 + j] = euclid_dist(f1[i], f2[j])

    # Compute the EMD from the distance matrix and the weights
    print("emd = %.9f" % emd(dist, w1, w2))
Running it gives:
> python emd_sample.py
emd = 160.542762808
which matches the results from C and R.
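For a signature this small, the transportation problem can also be solved without R. The sketch below is a plain successive-shortest-path min-cost-flow solver using only the standard library. It is not the algorithm used by emd.c or lpSolve; the names `euclidean` and `emd` and the tolerance `eps` are my own choices for illustration, and I would only trust it on small inputs.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def emd(features_p, weights_p, features_q, weights_q, dist=euclidean, eps=1e-12):
    """EMD via successive shortest augmenting paths (Bellman-Ford)."""
    m, n = len(weights_p), len(weights_q)
    cost = [[dist(fp, fq) for fq in features_q] for fp in features_p]
    supply = [float(w) for w in weights_p]     # cargo still waiting at each Pi
    demand = [float(w) for w in weights_q]     # free capacity left at each Qj
    flow = [[0.0] * n for _ in range(m)]
    remaining = min(sum(supply), sum(demand))  # constraint (4)
    inf = float("inf")
    while remaining > eps:
        # Shortest distances from any Pi that still has cargo.
        # Nodes 0..m-1 are the Pi, nodes m..m+n-1 are the Qj.
        d = [0.0 if supply[i] > eps else inf for i in range(m)] + [inf] * n
        prev = [None] * (m + n)
        for _ in range(m + n):
            changed = False
            for i in range(m):
                for j in range(n):
                    # Forward edge Pi -> Qj: ship more cargo.
                    if d[i] + cost[i][j] < d[m + j] - 1e-9:
                        d[m + j], prev[m + j], changed = d[i] + cost[i][j], i, True
                    # Backward edge Qj -> Pi: undo cargo already shipped.
                    if flow[i][j] > eps and d[m + j] - cost[i][j] < d[i] - 1e-9:
                        d[i], prev[i], changed = d[m + j] - cost[i][j], m + j, True
            if not changed:
                break
        sinks = [j for j in range(n) if demand[j] > eps and d[m + j] < inf]
        if not sinks:
            break
        sink = min(sinks, key=lambda j: d[m + j])
        # Walk back to the starting Pi, collecting edges and the bottleneck.
        path, node, amount = [], m + sink, min(demand[sink], remaining)
        while prev[node] is not None:
            u = prev[node]
            if u >= m:  # backward edge: limited by the flow it cancels
                amount = min(amount, flow[node][u - m])
            path.append((u, node))
            node = u
        amount = min(amount, supply[node])
        for u, v in path:
            if u < m:
                flow[u][v - m] += amount
            else:
                flow[v][u - m] -= amount
        supply[node] -= amount
        demand[sink] -= amount
        remaining -= amount
    work = sum(cost[i][j] * flow[i][j] for i in range(m) for j in range(n))
    return work / sum(sum(row) for row in flow)

f1 = [(100, 40, 22), (211, 20, 2), (32, 190, 150), (2, 100, 100)]
f2 = [(0, 0, 0), (50, 100, 80), (255, 255, 255)]
print("emd = %f" % emd(f1, [0.4, 0.3, 0.2, 0.1], f2, [0.5, 0.3, 0.2]))
print("emd = %f" % emd(f1, [4, 3, 2, 1], f2, [5, 3, 2]))
```

Both calls should print a value of about 160.54, the same as the C, R, and rpy2 versions above. Bellman-Ford is used because the backward (cancelling) edges have negative cost; it is simple but slow, so a proper LP or network-simplex solver is the better choice for anything larger.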
Conclusion
In this post I summarized what I found out about EMD. I tried to be as accurate as possible, but there may be mistakes. Please don't take it at face value; check other sources as well.
Later I plan to apply EMD to a concrete problem, similar music retrieval. Stay tuned.
References
- Earth mover's distance - Wikipedia
- Y. Rubner, C. Tomasi and L. J. Guibas: The earth mover's distance as a metric for image retrieval (PDF), International Journal of Computer Vision, 40(2), pp.99-121, 2000 - The original EMD paper. It applies EMD to similar image search.
- Code for the Earth Movers Distance (EMD) - The C implementation published by Rubner
- Fast Earth Mover's Distance (EMD) Code - An implementation that computes EMD quickly
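- 柳本, 大松: Earth Mover's Distanceを用いたテキスト分類 (Text classification using Earth Mover's Distance), 人工知能学会全国大会 (Annual Conference of the Japanese Society for Artificial Intelligence), 2007. - The explanation of EMD is easy to follow. So a technique from image and audio processing can be used on text as well.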
- lpSolve - The manual for R's lpSolve package. The detailed specification of lp.transport() is here.
*1: In practice it does not have to be all of it; moving only as much as fits into Q is fine.