DTW Barycenter Averaging (DBA) ã®ææ³ãç¨ãã¦ãæç³»åãã¼ã¿ã®å¹³åãæ±ãã¦ã¿ã¾ããååã®è¨äº*1ã¨åæ§ã«å°é¢¨ã®çµè·¯æ
å ±ãé¡æã¨ãã¦ãè¤æ°ã®å°é¢¨ã®çµè·¯ã®ãå¹³åããè¨ç®ãã¾ãããçµæã¯ä»¥ä¸ã®ã¨ããã§ããæ°è±¡åºã®ã¦ã§ããµã¤ã*2ã§å
¬éããã¦ãããã¼ã¿ãã 10 åã®å°é¢¨ãé¸ã³ãDBA ãé©ç¨ãã¦å¹³åã®çµè·¯ãæ±ãããã®ã§ãéç·ãåå°é¢¨ã®çµè·¯ã赤ç·ã DBA ã«ããå¹³åã§ãã
ãã¼ã¿ã®æºå
å©ç¨ãããã¼ã¿ã¯ãååã®è¨äºã§ä½æããéå»ã®å°é¢¨ã®ãã¼ã¿ããã2012 å¹´ã®å°é¢¨ 17 å·ã« DTW è·é¢ã®è¿ã 10 åãé¸æãã¾ãããDTW ãè¨ç®ããããã°ã©ã ãç¨ãã¦æ¬¡ã®ããã«å®è¡ãã¾ãã
$ for f in T*.csv; do echo $f $(php dtw.php T1217.csv $f); done | sort -nsk2,2 | head -n 10 T1217.csv 0 T1101.csv 152.08961628221 T0315.csv 175.12050614974 T1304.csv 198.65671463983 T1506.csv 239.88518007505 T1102.csv 240.21226800402 T0306.csv 280.40652975797 T0715.csv 305.72729625626 T1203.csv 309.22264174911 T0423.csv 320.9641416908
ãã® 10 åã®å°é¢¨ã«ã¤ãã¦è·é¢è¡åãè¨ç®ã㦠medoid ãæ±ããã¨ãT1101 (2011 å¹´ã®å°é¢¨ 1 å·) ã«ãªãã¾ãã次ã®å³ã¯ T1101 ã赤ç·ã¨ã㦠10 åã®å°é¢¨ãæãããã®ã§ããMedoid 㯠DTW è·é¢ã®ç·åãæå°ã«ãããã®ãªã®ã§ãå³ã®ããã«ãçµè·¯ã®ä¸é¨ã¯å
¨ä½ã®å¹³åãã大ããå¤ãããã¨ãããã¾ãã
DBA ã®ã¢ã«ã´ãªãºã
DBA ã¯æç³»åãã¼ã¿ã®å¹³åãæ±ããã¢ã«ã´ãªãºã ã§ãã次ã®è«æã«æ¬ä¼¼ã³ã¼ããæ²è¼ããã¦ãã¾ããè«æã¯ç¬¬ä¸èè ã®ã¦ã§ããã¼ã¸ (http://www.francois-petitjean.com/Research/) ã§å ¬éããã¦ãã¾ãã
IEEE ICDM 2014: F. Petitjean, G. Forestier, G. Webb, A. Nicholson, Y. Chen and E. Keogh, "Dynamic Time Warping Averaging of Time Series allows Faster and more Accurate Classification."
ãªããDBA ã®ã¢ã«ã´ãªãºã èªä½ã¯åã第ä¸èè ã«ãã以ä¸ã®è«æã§çºè¡¨ããã¦ããããã§ãããã¡ãã¯æ¬æãã¡ã¤ã«ã¸ã®ãªã³ã¯ãè¦ã¤ãããªãã£ãã®ã§ãä»åã®è¨äºã§ã¯ä¸è¨ã®è«æã®ã³ã¼ããåç §ãã¾ããã
Pattern Recognition: F. Petitjean, A. Ketterlin & P. Gançarski, "A global averaging method for dynamic time warping, with applications to clustering," 2011.
DBA ã¯ãé©å½ãªä»®ã®ãå¹³åããåæå¤ã¨ãã¦ããããå復çã«æ´æ°ãããã¨ã§æç³»åãã¼ã¿ã®å¹³åãæ±ãã¾ãã次ã®å³ã®ä¾ã§ DBA ã®åä½ã説æãã¾ããå³ã®éç·ã¯ T1217 㨠T1203ã赤ç·ã¯ T1101 ã§ãã赤ç·ã®ç³»åãåæå¤ã¨ãã¦ãéç·ã® 2 ã¤ã®ç³»åã®å¹³åãæ±ãã¾ãããªããããã§ã¯ç°¡åã®ãã 2 ç³»åã®å¹³åãæ±ãã¾ããã3 ç³»å以ä¸ã®å ´åã§ãåæ§ã«è¨ç®ã§ãã¾ãã
ã¾ããåç³»å (éç·) ã¨å¹³åã®ç³»å (赤ç·) ã¨ã®é㧠DTW ãããããè¨ç®ãã¾ãããã®ã¨ããDTW ã®è¨ç®éç¨ã§å¾ããããè¦ç´ éã®å¯¾å¿ä»ããä¿æãã¦ããã¾ããä¸å³ã¯ãå·¦ã T1217 ã¨å¹³åç³»åã¨ã® DTW ã®çµæãå³ã T1203 ã¨å¹³åç³»åã¨ã® DTW ã®çµæã§ããç·è²ã®ç·ãè¦ç´ éã®å¯¾å¿ä»ãã表ãã¦ãã¾ãã
次ã«ãå¹³åç³»åã®åè¦ç´ ãããããã対å¿ä»ããããè¦ç´ ã®éå¿ã«ç§»åãã¾ãããã®æä½ãä¸å³ã«ç¤ºãã¾ãã左㯠DTW ã®çµæãã¾ã¨ãã¦æ¥æ¬å島ä»è¿ãæ¡å¤§ãããã®ã§ãå³ã¯ãããããå¹³åç³»åã®åè¦ç´ ã移åãããã®ã§ããå³å³ã®æ¡è²ã®ç·ã¯ç§»ååã®ç¶æ
ã表ãã¦ãã¾ããæ¯è¼ããã¨ãéå¿ã«ç§»åãããã¨ã§å¹³åç³»åãæ¹åããã¦ããæ§åãåããã¾ãã
以ä¸ã®æé ã§å¹³åã®ç³»åã移åãããã¨ã«ãããåç³»åã¨ã® DTW ã®çµæãå¤åãã¾ããDBA ã¯ããããç¹°ãè¿ãã¦å¹³åã®ç³»åãé 次æ´æ°ãã¦ããã¾ããä¾ã«ç¤ºãã 2 ç³»åã®å¹³åã§ã¯ã9 åã®å復å¦çã§åæãã¾ãããå¾ãããçµæã¯ä»¥ä¸ã®ã¨ããã§ãã
DBA ã®å®è£
ããã§ã¯ãDBA ã®ã¢ã«ã´ãªãºã ãå®è£ ãã¦ã¿ã¾ãããã¤ãã®ããã« PHP ã§å®è£ ãã¾ãã
ã¾ããDTW ãè¨ç®ãã¦å¯¾å¿ä»ããæ±ããé¢æ°ã次ã®ããã«å®è£ ãã¾ããã$a 㨠$b ãå ¥åã®æç³»åã¨ãã¦ãé¢æ°ã®ååã§ã¯ãäºæ¬¡å é å $d ã«ç´¯ç©ã® DTW è·é¢ãæ ¼ç´ãã¦ããã¾ããããã¨åæã«ãç´åã®ã¤ã³ããã¯ã¹ã $p ã«æ ¼ç´ãã¦ããã¾ããå¾åã§ã¯ $p ãæçµè¦ç´ ããéé ã«è¾¿ã£ã¦å¯¾å¿ä»ããæ§ç¯ããå¾ããã $path ãæ»ãã¾ãã
<?php function dtwpath($a, $b, $distance = 'euclid') { $p = array_fill(0, count($a), array_fill(0, count($b), false)); $d = array_fill(0, count($a), array_fill(0, count($b), false)); $d[0][0] = $distance($a[0], $b[0]); for ($i = 1; $i < count($a); ++$i) { $p[$i][0] = array($i - 1, 0); $d[$i][0] = $d[$i - 1][0] + $distance($a[$i], $b[0]); } for ($j = 1; $j < count($b); ++$j) { $p[0][$j] = array(0, $j - 1); $d[0][$j] = $d[0][$j - 1] + $distance($a[0], $b[$j]); } for ($i = 1; $i < count($a); ++$i) { for ($j = 1; $j < count($b); ++$j) { $prev = array($i - 1, $j - 1); if ($d[$i][$j - 1] < $d[$prev[0]][$prev[1]]) { $prev = array($i, $j - 1); } if ($d[$i - 1][$j] < $d[$prev[0]][$prev[1]]) { $prev = array($i - 1, $j); } $p[$i][$j] = $prev; $d[$i][$j] = $d[$prev[0]][$prev[1]] + $distance($a[$i], $b[$j]); } } $path = array(); list ($i, $j) = array(count($a) - 1, count($b) - 1); while ($i > 0 || $j > 0) { $path[] = array($i, $j); list ($i, $j) = $p[$i][$j]; } $path[] = array(0, 0); return array_reverse($path); } function euclid($a, $b) { return sqrt(array_sum(array_map(function ($x, $y) { return pow($x - $y, 2); }, $a, $b))); }
ãããç¨ãã¦ãDBA ã®ä¸åã®å復å¦çã次ã®ããã«å®è£ ãã¾ããã$sequences ãæç³»åãã¼ã¿ã®é åã$seed ã移ååã®å¹³åç³»åã§ããforeach ã®ã«ã¼ãã§ãå¹³åç³»åã¨åç³»åã¨ã® DTW ã®å¯¾å¿ä»ããæ±ããå¹³åç³»åã®åè¦ç´ ã«å¯¾å¿ä»ããããè¦ç´ ã $vertices[$ci] ã«è¿½å ãã¦ããã¾ããæå¾ã«ãaverage é¢æ°ã§å¹³åç³»åã®åè¦ç´ ãéå¿ã«ç§»åãã¦æ»ãã¾ãã
<?php require_once __DIR__ . '/dtwpath.php'; function dba($sequences, $seed) { $vertices = array_fill(0, count($seed) - 1, array()); foreach ($sequences as $s) { $path = dtwpath($seed, $s); foreach ($path as list ($ci, $si)) { $vertices[$ci][] = $s[$si]; } } return array_map('average', $vertices); } function average($vectors) { $n = count($vectors); $d = count($vectors[0]); $result = array_fill(0, $d, 0); for ($i = 0; $i < $d; ++$i) { foreach ($vectors as $v) { $result[$i] += $v[$i]; } $result[$i] /= $n; } return $result; }
ãã®å復å¦çãç¹°ãè¿ãã¦æç³»åãã¼ã¿ã®å¹³åãè¨ç®ããå¦çã¯ã次ã®ããã«å®è£ ãã¾ããã第ä¸å¼æ°ãåæå¤ã¨ãã¦ã第äºå¼æ°ä»¥éã®æç³»åãã¼ã¿ã®å¹³åãæ±ãã¾ããå¹³åã®ç³»åãå¤åããªããªãããå復åæ°ã 100 åã«éããã¾ã§å¦çãç¹°ãè¿ãã¾ãã
<?php require_once __DIR__ . '/dba.php'; function readCsv($filename) { return array_map( function ($line) { return explode(',', $line); }, file($filename, FILE_IGNORE_NEW_LINES)); } $average = readCsv($argv[1]); $sequences = array(); for ($i = 2; $i < $argc; ++$i) { $sequence = readCsv($argv[$i]); $sequences[] = $sequence; } for ($i = 0; $i < 100; ++$i) { $next_average = dba($sequences, $average); if ($next_average === $average) { break; } $average = $next_average; } foreach ($average as $values) { echo implode(',', $values) . "\n"; }
ãã®ããã°ã©ã ã次ã®ããã«å®è¡ããã¨ã10 åã®å°é¢¨ãå¹³åããçµè·¯ãå¾ããã¾ããåé ã®å³ã¯ããã®åºåçµæãæç»ãããã®ã§ãã
$ php dba_main.php T1101.csv T*.csv | tee result.csv 10.647916666667,139.06458333333 12.328571428571,132.61785714286 14,129.96666666667 12.676,129.036 14.236363636364,128.3 ... 42.7,161.17272727273 44.507692307692,165.56153846154 49.178260869565,169.88695652174 47.17,173.59 48.441666666667,177.85
åæå¤ã«ãã DBA ã®çµæã®éã
ããããã¯ãDBA ã§å¾ãããæç³»åã«ã¤ãã¦å°ã調ã¹ã¦ã¿ã¾ããã¾ããåæå¤ã«ããçµæã®éãã確èªãã¾ãã
ã¢ã«ã´ãªãºã ãèããã¨ãããããã«ãDBA ã§ã¯ãåæå¤ã¨ãã¦ä¸ããç³»åã®é·ãã¯æå¾ã¾ã§å¤ããã¾ãããè«æãèªãéãã§ã¯ããã®ã¢ã«ã´ãªãºã ã¯é·ãã®çããæç³»åãã¼ã¿ã®éåã«é©ç¨ãããã¨ãæå³ãã¦ããããã«ãæããã®ã§ãã*3ãä»åã®è¨äºã®ããã«é·ããã°ãã°ãã®æç³»åã«é©ç¨ããã¨ãã«ã¯ãåæå¤ã®ç³»åé·ãç¹ã«åé¡ã«ãªãããã§ããç´æçã«ã¯ã極端ã«çãæç³»åã§ã¯è¯ãçµæãå¾ãããªãããã«æããã¾ããä»åã®å®é¨ã§ã¯ 10 åã®å°é¢¨ã® medoid ã¨ã㦠T1101 ãåæå¤ã¨ãã¾ãããã確èªãã¦ã¿ãã¨ãããã¯å ¨ä½ã®ä¸ã§ 2 çªç®ã«çããã¼ã¿ã§ããã
$ wc -l T*.csv 61 T0306.csv 50 T0315.csv 57 T0423.csv 59 T0715.csv 36 T1101.csv 61 T1102.csv 53 T1203.csv 69 T1217.csv 34 T1304.csv 60 T1506.csv 540 åè¨
10 åã®å°é¢¨ã®ä¸ã§ä¸çªé·ã T1217 ãåæå¤ã«ããã¨ã次ã®çµæã«ãªãã¾ããã赤ç·ã T1217 ãåæå¤ã¨ããã¨ãã®çµæã§ããæ¯è¼ã®ãããT1101 ãåæå¤ã¨ããçµæ (åé ã®å³) ãæ¡è²ã§ç¤ºãã¦ãã¾ããæã£ãã»ã©ã®éãã¯ããã¾ããã§ããããå°ãç°ãªãè»è·¡ãæãã¦ãããã¨ããããã¾ãã
ç³»åãã¨ã«å¹³åãåã
ä»åå©ç¨ããå°é¢¨ã®ãã¼ã¿ã§ã¯ãDBA ã§å¾ãããå¹³åã®çµè·¯ãå ¨ä½çã«ã§ãã¼ããã¦ãã¦ãããä¸èªç¶ãªå°è±¡ãããã¾ãããã®åå ã¨ãã¦ãDTW ã§ä¸å¯¾å¤ã®å¯¾å¿ä»ããå¾ãããããã«éå¿ã®è¨ç®ã§ç¹å®ã®ç³»åã«å¼·ãå¼ãå¯ããããã®ã§ã¯ãªããã¨èãã¾ãã*4ã
ããã§ãDBA ã®ã¢ã«ã´ãªãºã ãå°ãå¤æ´ãã¦ãã¾ãåç³»åã®ä¸ã§å¹³åãã¦ãããç³»åãã¨ã«åãéãã§éå¿ãæ±ããããã«ãã¦ã¿ã¾ãããå¤æ´å¾ã® dba é¢æ°ã®å®è£ ã¯æ¬¡ã®ã¨ããã§ãã
<?php function dba($sequences, $seed) { $vertices = array_fill(0, count($seed) - 1, array()); foreach ($sequences as $s) { $path = dtwpath($seed, $s); $seqavg = array_fill(0, count($seed) - 1, array()); foreach ($path as list ($ci, $si)) { $seqavg[$ci][] = $s[$si]; } $seqavg = array_map('average', $seqavg); for ($i = 0; $i < count($seqavg); ++$i) { $vertices[$i][] = $seqavg[$i]; } } return array_map('average', $vertices); }
ãã®ããã«å¤æ´ãã¦å®è¡ããçµæã以ä¸ã®ã¨ããã§ããåæå¤ã¯ T1101 ã¨ãã¦ãã¾ãã赤ç·ãå¤æ´å¾ã®ããã°ã©ã ã«ãããã®ã§ãæ¡è²ã¯ãªãªã¸ãã«ã®ããã°ã©ã ã«ãããã®ã§ããä»åå©ç¨ãããã¼ã¿ã§ã¯ãæå¾
ã©ããæ»ãããªçµæãå¾ããã¾ããããã®ä¸æ¹ã§ãæç³»åã®æåã®ç¹ã西å´ã«ããã¦ãããæ±äº¬ã®åããåæ±æ¹åã§çºçãã¦ããå°é¢¨ã®æ§åãæãããããªããªã£ã¦ãã¾ãããã®ãããã¯ç¨éã«å¿ãã¦å·¥å¤«ã®ä½å°ãããããã§ãã
*1:k-medoids 法と DTW による時系列データのクラスタリング - y_uti のブログ
*2:http://www.data.jma.go.jp/fcd/yoho/typhoon/index.html, ä»åã®è¨äºã®å³ã¯ããããããã¦ã³ãã¼ãã§ãã PDF ãã¡ã¤ã«ã®ãã¼ã¿ãå å·¥ãã¦ä½æãããã®ã§ãã
*3:è«æ III ç¯ã® A. Definitions ã§æç³»åã®é·ããå®æ° L ã¨ãã¦è°è«ãé²ãã§ãã¾ããã¾ããTable III ã§ããã¼ã¿ã»ãããã¨ã«ç³»åé·ãä¸å®ã®ããã§ãã
*4:è«æã«æ¸ããã¦ããå 容ã§ã¯ãªãç§ã®åæãªæãã¤ããªã®ã§æ³¨æãã¦ãã ããã