Summary of 「Hadoop MapReduce デザインパターン」 (Hadoop MapReduce Design Patterns)
I got hold of a copy, so I will summarize the parts I need as I read through it.
Chapter 2: MapReduce Basics
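The only practical approach to large-scale data problems is divide and conquer.
Implementing a divide-and-conquer algorithm involves many issues, mostly low-level ones, that have to be dealt with.
Hadoop provides an abstracted interface so that the programmer does not have to think about these low-level issues.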
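Hadoop and Google's MapReduce implementation differ in some respects.
In Google's implementation you can specify a secondary sort key that determines the order of the values passed to the reducer.
Hadoop has no such built-in option, so the values for a key arrive in no particular order.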
The number of map tasks varies with the input data, whereas the number of reduce tasks can be set exactly by the programmer.
Map and reduce tasks are allowed to perform operations that affect external state (side effects).
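The running time of the map phase and of the reduce phase is each bounded by its slowest task.
Speculative execution can speed this up: a copy of the same task is run on a different machine, and whichever copy finishes first supplies the result.
However, when the slowness comes from the work itself rather than from the machine running the task, speculative execution does little to improve the running time.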
A task is scheduled on the node that holds the data it needs or, failing that, on a node as close to the data as possible.
This avoids the extra cost of moving data over the network.
Besides map and reduce tasks, there are also combiners and partitioners.
A combiner aggregates the output of a map task, and a partitioner decides which reduce task receives the data for a given key.
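
As a rough illustration of these two roles, here is a minimal pure-Python sketch. It does not use the Hadoop API; the function names `combine` and `partition` and the choice of a CRC32 hash are assumptions made only for this example.

```python
import zlib
from collections import defaultdict

def combine(pairs):
    """Combiner sketch: sum the counts emitted by ONE map task, per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())

def partition(key, num_reducers):
    """Partitioner sketch: pick a reducer index from a stable hash of the key."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

# Output of a single (hypothetical) word-count map task.
map_output = [("a", 1), ("b", 1), ("a", 1)]
combined = combine(map_output)

# Every pair with the same key is routed to the same reducer.
reducer_for_a = partition("a", 3)
```

The combiner shrinks the data before it crosses the network, while the partitioner only decides where each key goes; neither changes the final result of a sum-style job.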
HDFS is split into datanodes and a namenode: the datanodes store the data itself, and the namenode manages the overall flow of data.
Data is transferred directly between clients and datanodes, and the namenode only exchanges metadata.
The HDFS namenode is responsible for the following.
- Managing the file namespace
- Coordinating file operations
- Maintaining the overall health of the file system
Designing efficient MapReduce algorithms requires understanding the following design choices.
- Store a relatively small number of large files (the HDFS block size is large, and every additional input file results in at least one more map task)
- Secure high sustained transfer bandwidth
- Build the system from inexpensive components that are not especially reliable
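
To make the first point concrete, the sketch below estimates the number of map tasks for the same amount of data stored as one large file versus many small files. The 128 MB block size and the "one map task per block-sized split" rule are simplifying assumptions for illustration, not figures taken from the book.

```python
import math

BLOCK_MB = 128  # assumed block size, for illustration only

def estimated_map_tasks(file_sizes_mb, block_mb=BLOCK_MB):
    """Rough estimate: one map task per block-sized split of every file."""
    return sum(math.ceil(size / block_mb) for size in file_sizes_mb)

one_large_file = estimated_map_tasks([1024])        # a single 1 GB file
many_small_files = estimated_map_tasks([1] * 1024)  # the same data as 1 MB files
```

Under these assumptions the single large file needs far fewer map tasks than the small files, which is why a few large files are preferred.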
Chapter 3: MapReduce Algorithm Design
Main design patterns
in-mapper-combining
The combiner logic is held inside the mapper class instance.
A regular combiner is not guaranteed to run, so this approach is useful when the aggregation must always happen.
A mapper instance is reused for every record within the same task, so it can, for example, keep an associative array as an instance variable, accumulate data in it, and emit the aggregated result when the instance is torn down.
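
The idea can be sketched in plain Python as follows. The class and its `map` and `close` methods are hypothetical names for this example; in Hadoop's Java API the corresponding hooks would be the mapper's `setup` and `cleanup` methods.

```python
from collections import defaultdict

class InMapperCombiningWordCount:
    """Hypothetical mapper: one instance handles every record of one map task."""

    def __init__(self):
        # State that survives across map() calls within the task.
        self.counts = defaultdict(int)

    def map(self, line):
        # Accumulate locally instead of emitting (word, 1) for every token.
        for word in line.split():
            self.counts[word] += 1

    def close(self):
        # Called once at the end of the task: emit the aggregated pairs.
        return sorted(self.counts.items())

mapper = InMapperCombiningWordCount()
for line in ["a b a", "b c"]:
    mapper.map(line)
emitted = mapper.close()
```

Five tokens go in but only three pairs come out, and unlike a regular combiner this aggregation is guaranteed to happen. The price is that the associative array has to fit in the task's memory.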
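pairs and stripes
pairs: map emits each co-occurring word pair separately, and reduce sums the counts.
stripes: map collects the words that co-occur with a word into a hash map and emits the map itself, and reduce sums the maps.
stripes is more efficient, but memory becomes the bottleneck for scalability.
Both can use a combiner for aggregation.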
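
The following pure-Python sketch contrasts the two approaches on a toy input. The window of one neighboring word on each side and all function names are assumptions chosen for this example.

```python
from collections import Counter, defaultdict

def neighbors(words, i, window=1):
    """Words within `window` positions of index i, excluding position i itself."""
    lo, hi = max(0, i - window), min(len(words), i + window + 1)
    return [words[j] for j in range(lo, hi) if j != i]

def map_pairs(line):
    # pairs: one ((word, neighbor), 1) record per co-occurrence.
    words = line.split()
    return [((w, u), 1) for i, w in enumerate(words) for u in neighbors(words, i)]

def map_stripes(line):
    # stripes: one (word, {neighbor: count}) record per distinct word.
    words = line.split()
    stripes = defaultdict(Counter)
    for i, w in enumerate(words):
        stripes[w].update(neighbors(words, i))
    return list(stripes.items())

def reduce_pairs(emitted):
    totals = Counter()
    for pair, n in emitted:
        totals[pair] += n
    return dict(totals)

def reduce_stripes(emitted):
    totals = defaultdict(Counter)
    for word, stripe in emitted:
        totals[word].update(stripe)  # element-wise sum of the stripes
    return {w: dict(c) for w, c in totals.items()}

lines = ["a b a", "b a"]
pairs = reduce_pairs([kv for line in lines for kv in map_pairs(line)])
stripes = reduce_stripes([kv for line in lines for kv in map_stripes(line)])
```

Both produce the same counts, but stripes emits fewer and larger records, which is where its efficiency and its memory pressure both come from.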
order inversion
The basic idea is to turn a problem of computation order into a sorting problem.
Sorting makes it possible to deliver the data a reducer needs first ahead of everything else, because a reducer processes its keys in sorted order.
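
A typical use is computing relative frequencies, where the reducer needs the total for a word before it sees the individual pair counts. The sketch below simulates this in plain Python; the special marker `"*"`, the function names, and the sample data are assumptions for illustration, and a real job would also have to partition on the first word only so that all of its pairs reach the same reducer.

```python
from itertools import groupby

def sort_key(record):
    (w, u), _ = record
    # False sorts before True, so (w, "*") comes before every other (w, u).
    return (w, u != "*", u)

def relative_frequencies(pair_counts):
    """pair_counts: list of ((w, u), n) records as a mapper might emit them."""
    emitted = []
    for (w, u), n in pair_counts:
        emitted.append(((w, "*"), n))  # contribution to the marginal count of w
        emitted.append(((w, u), n))
    emitted.sort(key=sort_key)         # stands in for the framework's sort

    result, marginal = {}, 0
    for (w, u), group in groupby(emitted, key=lambda record: record[0]):
        total = sum(n for _, n in group)
        if u == "*":
            marginal = total           # arrives first thanks to the sort order
        else:
            result[(w, u)] = total / marginal
    return result

freqs = relative_frequencies([(("a", "b"), 3), (("a", "c"), 1), (("b", "a"), 2)])
```

Because the marginal arrives first, the reducer never has to buffer the pair counts while waiting for the total.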
value-to-key conversion
Moving part of the value into the key lets the MapReduce execution framework itself do the sorting.
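
As a minimal sketch, assume records of the form (sensor id, (timestamp, reading)) whose readings should reach the reducer in time order. The data and variable names are made up for this example, and the explicit `sort` call stands in for the sorting the framework would perform.

```python
from itertools import groupby

# Hypothetical input: (sensor id, (timestamp, reading)), in arbitrary order.
records = [("s1", (3, "c")), ("s2", (1, "x")), ("s1", (1, "a")), ("s1", (2, "b"))]

# Map side: move the timestamp out of the value and into a composite key.
composite = [((sensor, t), reading) for sensor, (t, reading) in records]

# Framework side: sorting on the composite key orders readings by time per sensor.
composite.sort(key=lambda record: record[0])

# Reduce side: group on the natural key only (the sensor id).
by_sensor = {
    sensor: [reading for _, reading in group]
    for sensor, group in groupby(composite, key=lambda record: record[0][0])
}
```

The reducer receives each sensor's readings already ordered by timestamp without sorting anything in memory itself.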
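Ultimately, controlling synchronization in MapReduce comes down to making effective use of the following techniques.
- Building composite keys and values that bundle the data needed for a computation
- Running user-specified initialization and termination code in mappers and reducers
- Preserving state across multiple inputs in mappers and reducers
- Controlling the sort order of intermediate keys
- Controlling the partitioning of the intermediate key space (which reducer a key is sent to)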
Chapter 5: Graph Algorithms
Parallel breadth-first search
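map receives the current node id as the key and the current graph data for that node as the value, and emits the distances to the nodes reachable from it together with the graph structure itself.
reduce receives the graph data and the distances found so far, and produces the new distance data.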
When every edge cost is fixed at 1, the algorithm can stop once no node has a cost of ∞.
The MapReduce job is repeated until then. The termination check can be done with the counters available in the Hadoop API (increment a counter whenever a node's cost is updated).
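When edge costs are not fixed at 1, up to (number of nodes - 1) iterations are needed in the worst case.
The algorithm can stop once no node's shortest cost is updated any more, which can also be checked with a counter.
This is far less efficient than running Dijkstra's algorithm on a single machine (and worst of all when there are very many paths).
This is regarded as the cost of parallelization.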
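
The iteration and the counter-based termination check can be sketched in plain Python as below. This is a single-process simulation for unit edge costs, not Hadoop code; the function names, the `"updated"` counter, and the sample graph are assumptions for illustration. Replacing `dist + 1` with `dist + weight` would give the variable-cost version.

```python
import math
from collections import defaultdict

INF = math.inf

def bfs_map(node_id, dist, adjacency):
    # Pass the node's own data (distance and adjacency list) along unchanged.
    yield node_id, ("node", (dist, adjacency))
    if dist < INF:
        for neighbor in adjacency:
            yield neighbor, ("dist", dist + 1)  # unit edge cost

def bfs_reduce(node_id, values, counters):
    old, adjacency, best = INF, [], INF
    for kind, payload in values:
        if kind == "node":
            old, adjacency = payload
        else:
            best = min(best, payload)
    if best < old:
        counters["updated"] += 1  # plays the role of a job counter
    return node_id, (min(old, best), adjacency)

def run_bfs(graph, source):
    nodes = {n: (0 if n == source else INF, adj) for n, adj in graph.items()}
    iterations = 0
    while True:
        shuffled = defaultdict(list)  # stands in for the shuffle phase
        for node_id, (dist, adjacency) in nodes.items():
            for key, value in bfs_map(node_id, dist, adjacency):
                shuffled[key].append(value)
        counters = {"updated": 0}
        nodes = dict(bfs_reduce(k, v, counters) for k, v in shuffled.items())
        iterations += 1
        if counters["updated"] == 0:  # no distance changed: stop iterating
            return {n: d for n, (d, _) in nodes.items()}, iterations

graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": []}
distances, iterations = run_bfs(graph, "a")
```

Stopping when the counter stays at zero also handles unreachable nodes such as `e`, whose distance would otherwise remain ∞ forever.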
Parallel breadth-first search shows the prototypical structure of a large class of graph algorithms in MapReduce. These graph algorithms share the following characteristics.
- The graph structure is represented as adjacency lists
- map operates on each node's data structure, and reduce receives the data sent to each destination node and processes it
- The graph structure itself has to be passed along in addition to the results of the computation
- Graph algorithms in MapReduce are iterative by design
PageRank
A well-known technique used by Google's search engine to measure the quality of web pages.
The measurement is based on the graph structure of hyperlinks.
Source: 「Hadoop MapReduce デザインパターン」