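A private project of mine needed to process a set of images. I could have done the computation locally, but I'll probably have to rerun it many times for parameter tuning, so I tried Amazon's MapReduce service.
Here is the architecture diagram for this setup. Everything runs on Amazon, and everything is written in PHP.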
Why Amazon?
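For only about 50,000 images, I honestly didn't need Amazon. Running four or five copies of the script in parallel on a single machine and writing the results to a local MySQL database would have worked too. I went with Amazon anyway because:
- I wanted to learn MapReduce for the future. Searching for MapReduce examples turned up mostly Apache log analysis and other text processing; I couldn't find any that handled images.
- I wanted the data online rather than on my local machine.
- Whether or not I ended up using MapReduce, S3 looked like a good place to keep the data.
- I suspected that processing everything at once with MapReduce might even be cheaper than keeping a single EC2 instance running.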
First, put the image files to be processed on S3
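Having Amazon MapReduce hit an ordinary rental server with simultaneous requests could cause trouble for the host, so I put the data on S3 (I had planned to use S3 from the start anyway). I uploaded the 50,000 data files, which came from several different URL structures, to S3 bit by bit. At first I intended to use a GUI, but S3Fox freezes and becomes unusable once a single directory holds a huge number of files. Next I tried s3sync.rb, but it writes its own metadata, which made the files unusable when accessed from a browser. So in the end I did it in PHP.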
- Reference: "I tried three libraries for using Amazon S3 from PHP (one of them didn't work)" : akiyan.com
- Library used: amazon-s3-php-class - Standalone Amazon S3 REST implementation for PHP 5 - Google Project Hosting
I then uploaded the 50,000 files bit by bit using a few threads. Counting folders, that comes to a little over 200,000 nodes, but S3 has no real concept of folders, so I only needed one PUT per file. During the upload I also wrote the list of URLs to a log (it gets used later).
Upload the files MapReduce needs
Next, upload the mapper and reducer that MapReduce needs. I used S3Fox for this; the access permissions (ACL) can be owner-only. Also take the URL list generated during the earlier upload, a text file with one URL per line like
https://s3.amazonaws.com/(omitted)/0001.png
https://s3.amazonaws.com/(omitted)/0002.png
https://s3.amazonaws.com/(omitted)/0003.png
https://s3.amazonaws.com/(omitted)/0004.png
and place it at /bucket_name/input/filelist.txt.
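A minimal sketch of producing such a list in PHP. The bucket name `example-bucket` and the keys are hypothetical, `s3_url()` and `build_file_list()` are illustrative helpers (not part of any library), and the path-style `https://s3.amazonaws.com/bucket/key` form simply mirrors the sample URLs above.

```php
<?php
// Build one path-style S3 URL per object key (hypothetical bucket and keys).
function s3_url(string $bucket, string $key): string
{
    // Encode each path segment separately so the "/" separators survive.
    $segments = array_map('rawurlencode', explode('/', $key));
    return 'https://s3.amazonaws.com/' . rawurlencode($bucket) . '/' . implode('/', $segments);
}

// Join the URLs one per line, ending with a newline so the last line is complete.
function build_file_list(string $bucket, array $keys): string
{
    $lines = [];
    foreach ($keys as $key) {
        $lines[] = s3_url($bucket, $key);
    }
    return implode("\n", $lines) . "\n";
}

echo build_file_list('example-bucket', ['images/0001.png', 'images/0002.png']);
// prints:
// https://s3.amazonaws.com/example-bucket/images/0001.png
// https://s3.amazonaws.com/example-bucket/images/0002.png
```

The resulting text would then be uploaded as a file inside the job's input directory.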
What the Mapper and Reducer do
The Reducer is just a word-count-style aggregation. I only meant to use it to output the Map results and look at their distribution, so I used this sample code as-is.
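For readers without that sample, here is a minimal word-count-style reducer in PHP. It is an illustrative sketch, not the sample code the post used, and `reduce_counts()` is a hypothetical name. It assumes the Hadoop Streaming convention that reducer input arrives as tab-separated `key\tvalue` lines already sorted by key, and it streams line by line, so only the current key and its running total are held in memory.

```php
<?php
// Word-count-style reducer sketch. Input: "key\tcount" lines sorted by key.
// Output: one "key\ttotal" line per distinct key.
function reduce_counts($in, $out): void
{
    $currentKey = null;
    $total = 0;
    while (($line = fgets($in)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line === '') {
            continue; // skip blank lines
        }
        // Split on the first tab only; a line with no value counts as 1.
        $parts = explode("\t", $line, 2);
        $key = $parts[0];
        $value = isset($parts[1]) ? (int) $parts[1] : 1;
        if ($key !== $currentKey) {
            // Key changed: flush the previous key's total before starting over.
            if ($currentKey !== null) {
                fwrite($out, $currentKey . "\t" . $total . "\n");
            }
            $currentKey = $key;
            $total = 0;
        }
        $total += $value;
    }
    if ($currentKey !== null) {
        fwrite($out, $currentKey . "\t" . $total . "\n"); // flush the last key
    }
}

// Local demo with an in-memory stream instead of STDIN.
$demo = fopen('php://memory', 'r+');
fwrite($demo, "640x480\t1\n640x480\t1\n800x600\t1\n");
rewind($demo);
reduce_counts($demo, STDOUT);
// prints "640x480\t2" then "800x600\t1"
```

In the actual job the demo lines would be replaced by `reduce_counts(STDIN, STDOUT);`, and the file would need the `#!/usr/bin/php` shebang before `<?php`.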
As for the Mapper, posting the code would make this too long, so here is an outline:
- Log output to SimpleDB
- while (there is still input on standard input)
- Read one line of the huge text file (one URL)
- file_get_contents it, then imagecreatefromstring
- Process the image (GD and the like work as usual)
- Insert the image-processing results into SimpleDB (PutAttributes)
- echo the key/value output to be passed to the Reducer
That's roughly it. Since the PHP script has to be a single file, I built it by gathering the methods I needed from each library and reassembling them.
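The loop outlined above can be sketched as follows. This is a hypothetical reconstruction, not the author's mapper: `map_urls()` and the injected `$fetch` callable are illustrative names, the "analysis" only reads the image's width and height with PHP's built-in `getimagesizefromstring()` instead of doing real GD processing, and the SimpleDB logging and PutAttributes steps are left out.

```php
<?php
// Mapper loop sketch: one image URL per input line, one "key\t1" line out.
// $fetch is injected so the loop can be exercised without a network.
function map_urls($in, $out, callable $fetch): void
{
    while (($line = fgets($in)) !== false) {
        $url = trim($line);
        if ($url === '') {
            continue; // skip blank lines
        }
        $bytes = $fetch($url); // e.g. file_get_contents($url), false on failure
        $info = false;
        if (is_string($bytes) && $bytes !== '') {
            // Returns false (warning suppressed) if the bytes are not an image.
            $info = @getimagesizefromstring($bytes);
        }
        // Placeholder key: the image's "WIDTHxHEIGHT", or "error" if unreadable.
        $key = ($info === false) ? 'error' : $info[0] . 'x' . $info[1];
        // Emit "key\t1" so a word-count-style reducer can total each key.
        fwrite($out, $key . "\t1\n");
    }
}

// In the real streaming job: map_urls(STDIN, STDOUT, 'file_get_contents');
```

Passing the fetcher in as a callable keeps the loop testable offline; in the real job `'file_get_contents'` is passed instead.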
Using SimpleDB
This part is also mostly for learning: I wanted to try one of the key/value stores that are popular these days. I've played with GAE/J, but for a KVS that scales linearly as load rises, can be used from PHP, is mainstream, and is easy to try out, SimpleDB looked like the best fit. So I built my own WebUI wrapper based on the sample code and also wired SimpleDB into this Mapper.
For an overview of SimpleDB, this article is easy to follow.
Here is the WebUI wrapper I built. It's only for my own use, so it's pretty rough, and the equivalents of an RDBMS UPDATE and of pagination aren't implemented yet.
Table (domain) metadata, displayed in Japanese
And the result: it failed.
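I don't know why, but it stopped at around 28,000 files, probably because reducer.php ran out of memory or something similar. Separately, progress also suddenly stalled at around 6,400 files, so something is clearly going wrong. The logs are hard to read, which is a drawback, and this seems to call for some Hadoop knowledge. I came across examples that put multiple values such as timestamps, so next time I'd like to try again while logging in more detail.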
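Since logs can't be written to a file, putting them into SimpleDB seems like a reasonable approach. There doesn't appear to be anything like auto_increment, so the one open question is how to assign unique keys; for logs, a microsecond timestamp or something similar is probably fine.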
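One way to act on the microsecond idea above, sketched under loose assumptions: `log_item_name()` is a hypothetical helper, and "microsecond wall-clock time plus the process ID plus a random suffix" is treated as unique enough for log entries, which lowers the collision risk between parallel mappers but does not guarantee uniqueness. The fixed-width, zero-padded fields make string order follow time order, which matters because SimpleDB stores and compares values as strings.

```php
<?php
// Build a sortable, probably-unique item name for a SimpleDB log entry.
function log_item_name(string $prefix): string
{
    // microtime() returns "0.XXXXXXXX SSSSSSSSSS" (fraction, then seconds).
    [$fraction, $seconds] = explode(' ', microtime());
    $micros = substr($fraction, 2, 6); // first six fractional digits
    return sprintf(
        '%s-%010d%s-%05d-%04x',
        $prefix,
        (int) $seconds,            // zero-padded seconds keep lexicographic order
        $micros,
        getmypid() % 100000,       // separates concurrent mapper processes
        random_int(0, 0xffff)      // extra guard against same-microsecond clashes
    );
}

echo log_item_name('mapper'), "\n";
```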
Cost: 20 instances for 2 hours came to 400 yen
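That's 360 yen for MapReduce plus about 40 yen for Amazon S3. Running it for tens of hours would get expensive for practice, but at this price you can afford to fail again and again. For practice, I recommend setting the instance count to 1 and cutting the input to about 10 lines; that keeps each test run to about 10 yen.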
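Working backwards from these figures gives a rough per-instance-hour rate. This assumes the 360 yen MapReduce charge covers exactly 20 instances for 2 billed hours and leaves the S3 charge aside:

```latex
\frac{360\ \text{yen}}{20\ \text{instances} \times 2\ \text{h}}
  = \frac{360\ \text{yen}}{40\ \text{instance-hours}}
  = 9\ \text{yen per instance-hour}
```

If a one-instance test run is billed as a single instance-hour, that is in line with the roughly 10 yen per practice test mentioned above.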
What I learned from this test:
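- For the input, you have to specify a directory, not a file.
- When using PHP, the script is run directly as a CGI-style executable, so it must start with a shebang line (#!/usr/bin/php).
- Logging is off by default. Turn it on in the options on the screen where you enter the number of instances.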