Using the tracks of past typhoons as the subject, I tried clustering time-series data with the k-medoids method. As the distance measure I used Dynamic Time Warping (DTW), which I also tried in an earlier post*1. Wikipedia has explanations of both the k-medoids method and DTW.
k-medoids - Wikipedia, the free encyclopedia
Dynamic time warping - Wikipedia, the free encyclopedia
The result is shown below. I obtained the tracks of past typhoons from the Japan Meteorological Agency's typhoon position tables (http://www.data.jma.go.jp/fcd/yoho/typhoon/position_table/) and classified them into 5 clusters with the k-medoids method. This post walks through the steps up to drawing that figure.
Preparing the data
Data on past typhoons is published on the Japan Meteorological Agency's website*2. From the "typhoon position table" page, the data for typhoons from 2001 onward can be downloaded as PDF files. Here I extract the latitude and longitude at each time from these PDF files and use them as time-series data.
Japan Meteorological Agency | Past typhoon data (気象庁|過去の台風資料)
Download the PDF files for every typhoon from 2001 onward from the page above and convert each one to a text file. I used pdftotext for the conversion. The example below shows the commands for typhoon No. 1 of 2001. The output of pdftotext is saved to T0101.txt.
$ wget http://www.data.jma.go.jp/fcd/yoho/data/typhoon/T0101.pdf
$ pdftotext T0101.pdf
Next, extract the latitude and longitude from this text file and convert them to a csv file. As I wrote in the earlier post, this kind of processing needs ad hoc tweaks that depend on the data. This time I wrote the following script to do it.
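#!/bin/bash
# Reads pdftotext output on stdin and writes one "lat,lng" line per track point.
tr ' -' '\n' |\
grep '\(\.\|^[NEWS]$\)' |\
sed 's/\(.\)\([NEWS]\)$/\1\n\2/' |\
awk '
BEGIN { nextval = "lat"; }
# A hemisphere letter updates the current direction and is otherwise skipped.
$1 ~ /[NS]/ { latdir = $1; next; }
$1 ~ /[EW]/ { lngdir = $1; next; }
# Reaching a new latitude means the previous point is complete, so write it out.
nextval == "lat" && NR != 1 { writeline(lat, latdir, lng, lngdir); }
# The remainder modulo 100 drops a date/time prefix fused to the latitude (see *5).
nextval == "lat" { lat = $1 % 100; nextval = "lng"; next; }
nextval == "lng" { lng = $1 ; nextval = "lat"; next; }
END { writeline(lat, latdir, lng, lngdir); }
function writeline(lat, latdir, lng, lngdir) {
  printf("%.1f,%.1f\n", latdir == "N" ? lat : -lat, lngdir == "E" ? lng : 360 - lng);
}
'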
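In the first half of the script, tr puts each value on its own line, and grep then keeps only the lines that contain a latitude or longitude value or a letter indicating north latitude, east longitude, and so on. In some of the data a hemisphere letter follows the numeric value with no space in between, so sed picks up that pattern and splits it. At this point the data looks like the following. Comparing it with the original PDF file*3 shows that the latitudes and longitudes have been extracted correctly.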
$ cat T0101.txt | tr ' -' '\n' | grep '\(\.\|^[NEWS]$\)' | sed 's/\(.\)\([NEWS]\)$/\1\n\2/' | head
11.8
N
120.0
E
12.3
119.6
13.1
119.5
14.1
119.3
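The awk script in the second half then joins each latitude and longitude onto one line and writes them out in csv format*4*5. South latitudes and west longitudes are converted to north-latitude and east-longitude values respectively: a south latitude becomes a negative number, and a west longitude becomes 360 minus its value.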
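The conversion rule in the awk step can be written out as a small PHP sketch. The function name normalizeLatLng is hypothetical and is not part of the original script. The sample values are the first point of T0101.csv, a made-up southern/western point, and the fused field discussed in footnote *5.

```php
<?php
// Hypothetical helper (not part of the original script): applies the same
// sign and range convention that the awk step uses when writing a point.
function normalizeLatLng(float $lat, string $latDir, float $lng, string $lngDir): array
{
    // South latitudes are written as negative north latitudes.
    $signedLat = ($latDir === 'N') ? $lat : -$lat;
    // West longitudes are written as east longitudes (360 minus the value).
    $eastLng = ($lngDir === 'E') ? $lng : 360.0 - $lng;
    return [$signedLat, $eastLng];
}

// A field such as 7112113.9 fuses the date/time with the latitude (footnote *5),
// so only the remainder modulo 100 is kept as the latitude.
$lat = fmod(7112113.9, 100.0);

printf("%.1f\n", $lat);                                            // 13.9
printf("%.1f,%.1f\n", ...normalizeLatLng(11.8, 'N', 120.0, 'E'));  // 11.8,120.0
printf("%.1f,%.1f\n", ...normalizeLatLng(5.0, 'S', 170.0, 'W'));   // -5.0,190.0
```

PHP's % operator works on integers, so the sketch uses fmod to mirror awk's floating-point remainder.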
Saving the shell script above as extract_latlng.sh, run it as follows.
$ extract_latlng.sh <T0101.txt >T0101.csv
The output is shown below. Each line is the position of the typhoon at one point in time; column 1 is the north latitude and column 2 is the east longitude.
$ head T0101.csv
11.8,120.0
12.3,119.6
13.1,119.5
14.1,119.3
15.3,119.1
16.2,119.1
17.1,119.1
17.6,119.1
18.0,119.2
18.4,119.4
Computing the distance matrix
The steps so far yield the tracks of 355 typhoons from 2001 through 2015 in csv format. The next step is to compute the DTW distance for every pair of them and build a distance matrix with 355 rows and 355 columns.
The program that computes the distance between two time series is shown below. It is the program from the earlier post, used as is.
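<?php
// Returns the DTW distance between two sequences of points.
function dtw($a, $b, $distance = 'euclid')
{
    // $d[$i][$j]: minimum cumulative cost of aligning the first $i points of $a
    // with the first $j points of $b. INF marks cells that cannot be reached.
    $d = array_fill(0, count($a) + 1, array_fill(0, count($b) + 1, INF));
    $d[0][0] = 0;
    for ($i = 1; $i <= count($a); ++$i) {
        for ($j = 1; $j <= count($b); ++$j) {
            $d[$i][$j] = min([$d[$i - 1][$j - 1], $d[$i][$j - 1], $d[$i - 1][$j]])
                + $distance($a[$i - 1], $b[$j - 1]);
        }
    }
    return $d[count($a)][count($b)];
}

// Euclidean distance between two points given as arrays of coordinates.
function euclid($a, $b)
{
    return sqrt(array_sum(array_map(function ($x, $y) {
        return pow($x - $y, 2);
    }, $a, $b)));
}

// Reads a csv file into an array of rows, each row being an array of fields.
function readCsv($filename)
{
    return array_map(
        function ($line) {
            return explode(',', $line);
        },
        file($filename, FILE_IGNORE_NEW_LINES)
    );
}

echo dtw(readCsv($argv[1]), readCsv($argv[2])) . "\n";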
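To get a feel for what this distance measures, here is a minimal sketch that applies the same recurrence to three tiny tracks. The function name dtwDistance and the coordinates are made up for illustration. The function is a compact restatement of the recurrence, not the program used in this post.

```php
<?php
// Compact restatement of the DTW recurrence for (lat, lng) points.
// Hypothetical name; for illustration only.
function dtwDistance(array $a, array $b): float
{
    $n = count($a);
    $m = count($b);
    // $d[$i][$j]: cost of the best alignment of the first $i points of $a
    // with the first $j points of $b.
    $d = array_fill(0, $n + 1, array_fill(0, $m + 1, INF));
    $d[0][0] = 0.0;
    for ($i = 1; $i <= $n; ++$i) {
        for ($j = 1; $j <= $m; ++$j) {
            // Euclidean distance between the two points being matched.
            $cost = hypot($a[$i - 1][0] - $b[$j - 1][0], $a[$i - 1][1] - $b[$j - 1][1]);
            $d[$i][$j] = $cost + min($d[$i - 1][$j - 1], $d[$i][$j - 1], $d[$i - 1][$j]);
        }
    }
    return $d[$n][$m];
}

// Made-up tracks: $slow follows the same path as $fast but lingers at two points;
// $shifted is $fast moved one degree to the north.
$fast = [[10.0, 130.0], [12.0, 128.0], [15.0, 127.0]];
$slow = [[10.0, 130.0], [10.0, 130.0], [12.0, 128.0], [12.0, 128.0], [15.0, 127.0]];
$shifted = [[11.0, 130.0], [13.0, 128.0], [16.0, 127.0]];

echo dtwDistance($fast, $slow), "\n";     // 0: the repeated points are absorbed by the warping
echo dtwDistance($fast, $shifted), "\n";  // 3: each of the three points is matched at distance 1
```

In other words, two typhoons that follow the same path at different speeds come out close to each other, while a difference in the path itself adds to the distance.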
Save the DTW program as dtw.php and compute the DTW distance for every pair. I wrote and ran the throwaway script below. The result is a symmetric matrix with 355 rows and 355 columns. On my machine this computation took several hours. Since the result is known to be symmetric, a smarter implementation should need only half the time, but this script makes no such effort and computes every element of the matrix.
$ cat build_distance_matrix.sh
#!/bin/bash
for a in T*.csv; do
  for b in T*.csv; do
    echo -n $(php dtw.php $a $b),
  done
  echo
done
$ ./build_distance_matrix.sh | sed -e 's/,$//' >dtw_distance_matrix.csv
Here are the top-left 5 rows and 5 columns of the resulting matrix. They confirm that the matrix is symmetric and that the diagonal elements are 0.
$ head -n 5 dtw_distance_matrix.csv | cut -d, -f1-5
0,212.8224774021,359.27459542857,354.74975039293,188.25687465072
212.8224774021,0,383.83645213076,201.19359206238,162.92601640849
359.27459542857,383.83645213076,0,270.84594012043,225.63593029795
354.74975039293,201.19359206238,270.84594012043,0,133.08475475109
188.25687465072,162.92601640849,225.63593029795,133.08475475109,0
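Because the matrix is symmetric with a zero diagonal, it is enough to evaluate the distance for pairs with i < j and mirror the value. The following is a minimal sketch of that idea, not the script used in this post. The helper name buildSymmetricMatrix is hypothetical, and a stand-in distance on plain numbers replaces DTW so that the sketch runs on its own.

```php
<?php
// Hypothetical helper: fills an n x n matrix by calling $distance only for
// pairs with $i < $j and mirroring the result. Assumes that $items is a
// 0-indexed list, that $distance is symmetric, and that the distance from
// an item to itself is 0.
function buildSymmetricMatrix(array $items, callable $distance): array
{
    $n = count($items);
    $matrix = array_fill(0, $n, array_fill(0, $n, 0.0));
    for ($i = 0; $i < $n; ++$i) {
        for ($j = $i + 1; $j < $n; ++$j) {
            $matrix[$i][$j] = $matrix[$j][$i] = $distance($items[$i], $items[$j]);
        }
    }
    return $matrix;
}

// Stand-in data and distance. In this post's setting, $items would hold the
// tracks read from the T*.csv files and $distance would be the DTW function.
$calls = 0;
$items = [1.0, 4.0, 9.0, 16.0];
$matrix = buildSymmetricMatrix($items, function ($x, $y) use (&$calls) {
    ++$calls;
    return abs($x - $y);
});

foreach ($matrix as $row) {
    echo implode(',', $row), "\n";
}
echo "distance calls: $calls\n";  // 6 (= 4 * 3 / 2) instead of 16
```

Doing the whole loop inside one PHP process would also avoid starting a new php process for every pair, which the shell script above does.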
Clustering with the k-medoids method
Using the DTW distance matrix obtained in the previous step, we now run the clustering with the k-medoids method, which is the main topic of this post.
The k-medoids method classifies data into k clusters by the following procedure*6.
- Pick k items arbitrarily from the given data and make them the medoids of the k clusters
- Repeat the following operations until convergence
  - Assign each item to the cluster whose medoid is nearest to it
  - For each cluster, make the medoid the item whose total distance to the items in the cluster is smallest
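In the k-means method, the centroid of a cluster is computed as the point that minimizes the sum of squared distances to the items in the cluster. The k-medoids method differs in that each medoid is chosen from among the given data. Because the k-medoids method needs only the distances between items, supplying the full pairwise distance matrix in advance means that no distances have to be computed during the run.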
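The difference between a medoid and a centroid can be seen on a toy example. The sketch below uses made-up one-dimensional data and a hypothetical helper named medoidOf. The helper looks only at a precomputed distance matrix, which is the situation described above.

```php
<?php
// Hypothetical helper: returns the index of the medoid of a cluster, that is,
// the member whose summed distance to all members is smallest. Ties go to the
// member listed first; an empty cluster yields -1.
function medoidOf(array $members, array $distances): int
{
    $best = -1;
    $bestTotal = INF;
    foreach ($members as $candidate) {
        $total = 0.0;
        foreach ($members as $other) {
            $total += $distances[$candidate][$other];
        }
        if ($total < $bestTotal) {
            $bestTotal = $total;
            $best = $candidate;
        }
    }
    return $best;
}

// Made-up data. The matrix holds the absolute differences between the values.
$points = [0.0, 1.0, 2.0, 3.0, 14.0];
$distances = [];
foreach ($points as $i => $p) {
    foreach ($points as $j => $q) {
        $distances[$i][$j] = abs($p - $q);
    }
}

$medoid = medoidOf([0, 1, 2, 3, 4], $distances);
$centroid = array_sum($points) / count($points);

echo "medoid: index $medoid (value {$points[$medoid]})\n";  // medoid: index 2 (value 2)
echo "centroid: $centroid\n";                               // centroid: 4
```

The centroid 4 is not one of the data points, whereas the medoid is always an actual item. For typhoon tracks of different lengths an average track is not obviously defined, but a representative track can still be picked this way.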
As described above, the k-medoids method is a simple algorithm that is easy to implement in any general-purpose programming language. This time I implemented it in PHP as follows.
<?php
// Runs k-medoids on a precomputed distance matrix.
// Returns [cluster number of each item, item index of each cluster's medoid].
function kmedoids($distances, $k, $maxiter = 100)
{
    $medoids = initialize_medoids($distances, $k);
    $indices = false;
    for ($iter = 0; $iter < $maxiter; ++$iter) {
        $next = assign_to_nearest($distances, $medoids);
        // Converged: the assignment did not change.
        if ($next === $indices) {
            break;
        }
        $indices = $next;
        $medoids = update_medoids($distances, $indices, $k);
    }
    return array($indices, $medoids);
}

// Picks k distinct items at random as the initial medoids.
function initialize_medoids($distances, $k)
{
    $medoids = range(0, count($distances) - 1);
    shuffle($medoids);
    return array_slice($medoids, 0, $k);
}

// Assigns each item to the cluster whose medoid is nearest.
function assign_to_nearest($distances, $medoids)
{
    $k = count($medoids);
    $indices = array();
    foreach ($distances as $d) {
        $mindist = INF;
        $nearest = 0;
        for ($i = 0; $i < $k; ++$i) {
            if ($d[$medoids[$i]] < $mindist) {
                $mindist = $d[$medoids[$i]];
                $nearest = $i;
            }
        }
        $indices[] = $nearest;
    }
    return $indices;
}

// For each cluster, picks the member with the smallest total distance
// to the members of that cluster.
function update_medoids($distances, $indices, $k)
{
    $n = count($distances);
    $mindists = array_fill(0, $k, INF);
    $medoids = array_fill(0, $k, false);
    for ($i = 0; $i < $n; ++$i) {
        $m = $indices[$i];
        $dist = 0;
        for ($j = 0; $j < $n; ++$j) {
            if ($indices[$j] == $m) {
                $dist += $distances[$i][$j];
            }
        }
        if ($dist < $mindists[$m]) {
            $mindists[$m] = $dist;
            $medoids[$m] = $i;
        }
    }
    return $medoids;
}

// Reads a csv file into an array of rows, each row being an array of fields.
function readCsv($filename)
{
    return array_map(
        function ($line) {
            return explode(',', $line);
        },
        file($filename, FILE_IGNORE_NEW_LINES)
    );
}

$distances = readCsv($argv[1]);
$k = (int) $argv[2];
list ($indices, $medoids) = kmedoids($distances, $k);
echo implode(' ', $indices) . "\n";
echo implode(' ', $medoids) . "\n";
Save this as kmedoids.php and run it as follows. The arguments are the csv file of the distance matrix and the number of clusters.
$ php kmedoids.php dtw_distance_matrix.csv 5 >kmedoids_result.txt
The result looks like this. The first line gives, for each item (typhoon), the number of the cluster it was assigned to. The second line gives, for each cluster, the position in the input data of the item that is the medoid of that cluster.
$ cat kmedoids_result.txt
1 1 1 1 1 4 1 1 4 1 0 4 2 1 4 1 4 2 1 4 4 2 1 1 3 1 1 3 4 1 0 0 4 1 0 2 0 1 4 1 0 0 2 1 4 1 4 2 2 2 4 3 3 3 2 4 4 4 3 1 1 4 1 3 1 0 4 4 4 2 1 1 3 3 4 3 4 1 4 0 4 1 4 4 2 1 2 4 0 1 0 0 4 0 4 4 3 1 1 3 3 2 3 3 4 4 0 2 4 1 1 1 4 4 1 0 1 1 4 1 0 4 1 1 1 1 1 0 1 1 1 4 0 1 4 4 2 4 2 1 3 2 4 1 1 3 1 3 3 4 1 4 0 1 1 1 0 2 0 1 1 1 4 2 4 2 4 4 1 4 1 1 1 4 1 4 4 1 4 1 1 2 4 1 4 1 4 1 1 2 1 2 1 3 4 1 4 1 1 1 1 1 4 2 4 4 1 0 1 1 1 0 4 4 3 3 3 1 1 4 1 1 4 1 0 4 1 4 1 4 4 4 1 1 1 4 3 1 4 2 1 0 4 4 4 4 1 1 1 1 3 1 4 4 4 1 1 0 1 1 0 0 2 1 1 4 4 4 4 4 1 4 4 1 3 1 1 1 4 4 1 1 0 1 1 1 1 1 2 2 1 2 4 4 1 4 1 4 1 0 1 4 0 2 1 3 1 1 3 2 3 3 1 4 4 3 1 0 1 2 4 1 1 4 4 0 4 1 3 1 3 2 3 3 3 4 3 1 0 1 0 0 0 4 0 2 2 4 1 4 0 1 2 3 4
299 199 48 147 167
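Before drawing anything, it helps to regroup this output by cluster. The following is a minimal sketch with a hypothetical helper named parseKmedoidsResult. It assumes the two-line format shown above, and the input string in the example is a shortened, made-up result.

```php
<?php
// Hypothetical helper: parses the two-line output of kmedoids.php into
// [cluster number => list of item indices] and the list of medoid indices.
function parseKmedoidsResult(string $text): array
{
    $lines = preg_split('/\R/', trim($text));
    $assignments = array_map('intval', preg_split('/\s+/', trim($lines[0])));
    $medoids = array_map('intval', preg_split('/\s+/', trim($lines[1])));

    // One (initially empty) member list per cluster.
    $clusters = array_fill(0, count($medoids), []);
    foreach ($assignments as $index => $cluster) {
        $clusters[$cluster][] = $index;
    }
    return [$clusters, $medoids];
}

// Shortened, made-up result with 6 items and 2 clusters.
[$clusters, $medoids] = parseKmedoidsResult("1 1 0 1 0 0\n4 1\n");
foreach ($clusters as $c => $members) {
    echo "cluster $c: medoid {$medoids[$c]}, members ", implode(' ', $members), "\n";
}
```

The indices are 0-based positions in the distance matrix. Assuming the shell expands T*.csv in the same order as when the matrix was built, index i corresponds to the i-th csv file.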
Visualizing the clustering results
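All that remains is to visualize this result, which gives the figure at the top of this post. I used d3.js again this time. The typhoon tracks are drawn in a different color for each cluster number. The five figures below show the same result drawn separately for each cluster. The medoid of each cluster is drawn as a red track.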
*1: "Computing time-series similarity with Dynamic Time Warping" (Dynamic Time Warping による時系列データの類似度計算) - y_uti's blog
*2: Free use is permitted under the terms of use given in "Japan Meteorological Agency | Copyright, links, and protection of personal information" (気象庁 | 著作権・リンク・個人情報保護について).
*3:http://www.data.jma.go.jp/fcd/yoho/data/typhoon/T0101.pdf
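*4: The letters indicating north latitude, east longitude, and so on appear after the numeric values, but they are omitted when they are the same as for the preceding data point. That is why the script ended up somewhat complicated.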
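*5: The latitude is taken as the remainder after division by 100 because some values, such as 7112113.9, have the date and time fused with the latitude. This one reads as 13.9 degrees north latitude at 21:00 on July 11. The script cannot handle data in which the latitude is below 10 degrees, but fortunately there does not seem to be any.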
*6: This is the method described as the Voronoi iteration method in the Algorithms section of the Wikipedia article. Other concrete methods, such as PAM and CLARA, also seem to exist.