MapReduce 2.0
Development of next-generation Hadoop is under way. As things stand, Hadoop is without doubt the leading candidate for an "OSS infrastructure" for distributed clouds, at the very least. In the technology race over distributed processing platforms in the cloud, Google and Amazon are now well ahead, and the only OSS that could plausibly compete lies on an extension of the Hadoop line. Hadoop MapReduce 2.0 looks like the form that will take. Here I summarize my own understanding of next-generation Hadoop as it currently stands. I have not been able to read through everything, so please bear with me on that. Broadly, next-generation Hadoop consists of two major components.
For now, the two pillars are HDFS and MapReduce 2.0.
First, MapReduce. This is a long way from the traditional "MapReduce". It is better understood as a comprehensive mechanism that allocates appropriate resources to an "arbitrary" distributed execution framework. Within it, today's MapReduce is just one of the execution frameworks it is designed to host. (Just one, but backward compatibility will presumably be guaranteed as far as possible.) For now, the execution frameworks envisaged besides MapReduce appear to be things like BSP and MPI, but as discussed later, it is probably not limited to these.
MapReduce 2.0 in turn consists of two major components: resource management and application management. Resource management arranges the various resources in the distributed environment into a uniform shape that execution frameworks can easily use, supplies them, and at the same time tracks their state. Application management provides, for the execution framework of the chosen algorithm, lifecycle management of the applications actually submitted; put more bluntly, monitoring that includes liveness tracking. For the time being, "application" here means a job or a DAG made up of several jobs (the term DAG is in common use by now), but it will probably come to mean asynchronous processing in general.
The architecture runs distributed processing through the combination of these two functions. Considering that earlier Hadoop had the two functions JobTracker and TaskTracker and supported only MapReduce, this is less a dramatic change than a completely different thing. Note that the terms JobTracker and TaskTracker have disappeared entirely. The details follow.
1-1 Resource management (Resource Manager)
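It manages the various resources of the distributed processing platform in fixed units and supplies the resources needed to run applications appropriately. Its main functions are consolidated into two: the Scheduler and the ApplicationsManager (the whitepaper's term).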
First of all, let's look at the basic unit. It is captured by the concept of a container (resource container).
http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Container.java
My impression is that this is very much a technology that is both old and new. lxc has been getting attention lately, and in essence this is a mechanism for partitioning resources (CPU, memory, disk I/O) as a bundle. For those who know it, http://h50146.www5.hp.com/products/software/oe/hpux/component/vse/prm/ may ring a bell.
Note that the current v1.0 applies to memory only, so this is clearly something to look forward to in later versions.
http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Resource.java
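To make the container idea a little more concrete, here is a minimal Python sketch of carving a node's memory into containers. It only models memory, in line with the memory-only state of v1.0 noted above. The names `Resource`, `Container`, `NodeCapacity`, and `memory_mb` are illustrative assumptions of mine, not the actual YARN record API in the linked Java files.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Resource:
    """A resource amount; only memory is modelled (hypothetical field name)."""
    memory_mb: int


@dataclass(frozen=True)
class Container:
    """A slice of one node's resources handed to an application."""
    container_id: int
    node: str
    resource: Resource


@dataclass
class NodeCapacity:
    """Tracks how much memory on a node is still free to be carved into containers."""
    node: str
    total: Resource
    used_mb: int = 0
    containers: List[Container] = field(default_factory=list)

    def allocate(self, container_id: int, request: Resource) -> Optional[Container]:
        # Refuse the request if it does not fit into the remaining memory.
        if self.used_mb + request.memory_mb > self.total.memory_mb:
            return None
        container = Container(container_id, self.node, request)
        self.used_mb += request.memory_mb
        self.containers.append(container)
        return container


node = NodeCapacity("node-1", Resource(memory_mb=4096))
first = node.allocate(1, Resource(memory_mb=3072))
second = node.allocate(2, Resource(memory_mb=2048))  # no longer fits on this node
```

The second request is rejected because only 1024 MB remain, which is the kind of partitioning guarantee a container is meant to express.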
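Seen through the container analogy, it is easy to imagine that this is steadily taking on the position of a "virtual OS that unifies distributed nodes". And, as the word container suggests, execution efficiency should be quite high. (Or rather, making it high seems to be the aim.)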
Next, the Scheduler is responsible for assigning the containers reported by the distributed nodes, one after another, to the applications that need them. It resembles the scheduler in earlier Hadoop, but feels lower-level.
http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java
The scheduler's allocation logic is pluggable.
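As a rough illustration of what "pluggable" could mean here, the Python sketch below lets a scheduler take its allocation policy as a function. This is a toy model under my own assumptions: the class `Scheduler`, the two policies, and the `node_report` method are hypothetical and do not correspond to the `YarnScheduler` interface linked above.

```python
from typing import Callable, List, Tuple

# A pending request is (application name, memory in MB); names are hypothetical.
Request = Tuple[str, int]
# A policy returns the index of the next pending request to serve.
Policy = Callable[[List[Request]], int]


def fifo_policy(pending: List[Request]) -> int:
    """Serve requests strictly in arrival order."""
    return 0


def smallest_first_policy(pending: List[Request]) -> int:
    """Serve the smallest memory request first."""
    return min(range(len(pending)), key=lambda i: pending[i][1])


class Scheduler:
    def __init__(self, policy: Policy) -> None:
        self.policy = policy
        self.pending: List[Request] = []
        self.assignments: List[Tuple[str, str, int]] = []

    def submit(self, app: str, memory_mb: int) -> None:
        self.pending.append((app, memory_mb))

    def node_report(self, node: str, free_mb: int) -> None:
        """Called when a node reports free memory; hand it out per the policy."""
        while self.pending:
            idx = self.policy(self.pending)
            app, need = self.pending[idx]
            if need > free_mb:
                break  # the chosen request does not fit; wait for another report
            self.pending.pop(idx)
            free_mb -= need
            self.assignments.append((app, node, need))


requests = [("job-a", 3000), ("job-b", 1000), ("job-c", 1000)]
fifo = Scheduler(fifo_policy)
small = Scheduler(smallest_first_policy)
for app, mb in requests:
    fifo.submit(app, mb)
    small.submit(app, mb)
fifo.node_report("node-1", 2048)
small.node_report("node-1", 2048)
```

With the same node report, the FIFO policy assigns nothing because the first request is too large, while the smallest-first policy places `job-b` and `job-c`. Swapping the policy changes behaviour without touching the scheduler itself.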
1-2 Application management (the ApplicationMaster and the libraries that govern it)
The other component is the ApplicationsManager (as the whitepaper calls it). (In the actual implementation it does not seem to be an ApplicationsManager but rather ApplicationMasterLauncher, ApplicationMasterService, and so on. This area looks quite fluid.) The application start sequence is as follows: when a job or DAG issues an execution request, the Scheduler assigns it a certain container, and on that basis the ResourceManager creates the ApplicationMaster corresponding to the application. The application lifecycle begins at this point. One of the problems of current Hadoop MapReduce was weak liveness management of jobs; for example, if the JobTracker died while a job was running, its tasks could fail to die cleanly. Lifecycle management has been introduced as a means of dealing with such problems. While it is alive, the ApplicationMaster keeps sending heartbeats to the ResourceManager.
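The heartbeat-based liveness management described above can be sketched roughly as follows in Python. This is a toy model under my own assumptions: the class `LivenessMonitor`, its methods, and the expiry interval are hypothetical, and time is passed in explicitly so the behaviour is easy to follow.

```python
from typing import Dict, List


class LivenessMonitor:
    """Tracks the last heartbeat per application master and expires silent ones."""

    def __init__(self, expiry_seconds: float) -> None:
        self.expiry_seconds = expiry_seconds
        self.last_seen: Dict[str, float] = {}

    def register(self, app_id: str, now: float) -> None:
        self.last_seen[app_id] = now

    def heartbeat(self, app_id: str, now: float) -> None:
        # Heartbeats from unknown or already expired masters are ignored.
        if app_id in self.last_seen:
            self.last_seen[app_id] = now

    def expire(self, now: float) -> List[str]:
        """Remove and return every application whose master has been silent too long."""
        dead = sorted(
            app_id
            for app_id, seen in self.last_seen.items()
            if now - seen > self.expiry_seconds
        )
        for app_id in dead:
            del self.last_seen[app_id]
        return dead


monitor = LivenessMonitor(expiry_seconds=10.0)
monitor.register("app-1", now=0.0)
monitor.register("app-2", now=0.0)
monitor.heartbeat("app-1", now=8.0)  # app-1 keeps reporting
expired = monitor.expire(now=12.0)   # app-2 has been silent for 12 seconds
```

Once an application is reported as expired, the resource manager side can reclaim its containers instead of leaving orphaned tasks behind, which is the failure mode mentioned for the old JobTracker.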
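An application is built on the ApplicationMaster, which can be regarded as a "framework". For every job, a paired ApplicationMaster is created one-to-one at run time. (More precisely, it seems, at the point the job becomes runnable.) This ApplicationMaster carries out the specified processing. In substance it is a distributed shell: it asks the NodeManager, which provides and guarantees the resources for the Container the work was assigned to, to execute the processing.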
http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
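The division of labour between an ApplicationMaster and the NodeManagers can be sketched like this in Python. It is a simplification under my own assumptions: the method names `start_container` and `run` are hypothetical, containers are plain `(id, node)` pairs, and nothing is really executed; the NodeManager only records what it was asked to run.

```python
from typing import Dict, List, Tuple


class NodeManager:
    """Runs commands inside containers on one node (here it only records them)."""

    def __init__(self, node: str) -> None:
        self.node = node
        self.launched: List[Tuple[int, str]] = []

    def start_container(self, container_id: int, command: str) -> None:
        self.launched.append((container_id, command))


class ApplicationMaster:
    """One instance per job; it drives the job on the containers it was given."""

    def __init__(self, job_id: str, command: str) -> None:
        self.job_id = job_id
        self.command = command

    def run(self, containers: List[Tuple[int, str]], nodes: Dict[str, NodeManager]) -> int:
        # For each (container id, node name), ask that node's manager to run the command.
        for container_id, node in containers:
            nodes[node].start_container(container_id, self.command)
        return len(containers)


nodes = {name: NodeManager(name) for name in ("node-1", "node-2")}
master = ApplicationMaster("job-42", "echo hello")
started = master.run([(1, "node-1"), (2, "node-2"), (3, "node-1")], nodes)
```

The point of the sketch is that the per-job master decides what runs where, while each node's manager is the only party that actually starts work inside a container.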
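That is the broad outline. What differs from the earlier MapReduce? First, the underlying architecture has changed. On the infrastructure side, it differs in drawing on earlier implementations from various distributed clouds. Containers are a technique also used in mesos, and the policy of doing fine-grained resource management is becoming clearer. The next difference is that the execution framework has been separated out. It is not only that BSP and the like can be used besides MapReduce; you can also implement a distributed algorithm of your own. Whether it will be efficient is another matter, but it also becomes possible to apply domain-specific modifications to a given algorithm, so used well this could be quite valuable.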
As for individual implementations, there are examples such as mesos/spark-yarn, which implements BSP as an ApplicationMaster on YARN.
mesos/spark-yarn
https://github.com/mesos/spark-yarn
2 HDFS: HDFS Federation
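To begin with, it is, how should I put it, completely separate from MapReduce 2.0. In earlier Hadoop, HDFS and MapReduce were two sides of the same coin, but with MapReduce 2.0 the positioning is quite different. HDFS Federation is fairly straightforward: it stays true to HDFS as "a way to manage distributed file systems transparently as a single file system" and concentrates throughout on raising availability and manageability. Rather than aiming for still more functionality as a layer, the emphasis seems to be on improving robustness and usability within the current layer. That said, the management layers have been tidied up, and the result is a clean layered architecture. Even so, compared with the substantial changes in MapReduce 2.0 it gives a rather modest impression, and the two look like a striking contrast. Well, I think this is what normal evolution looks like.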
Since this is normal evolution, I will survey the coming HDFS Federation from the standpoint of what the original problems were and how they are being solved. The problems of current HDFS are the SPoF, multi-tenancy, expansion of overall capacity, and improvement of append. What is visible so far concerns multi-tenancy and capacity expansion in particular.
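The most fundamental rework is the extension of the NameNode. In short, its functions are clearly separated so that the system can run with multiple NameNodes. Previously the only way to achieve this was to partition the DataNode cluster, so this is a major improvement. First, the functions the NameNode was responsible for are explicitly sorted out: namespace control is separated from data block control. The two used to be fused inside the NameNode; they are now split into clearly distinct layers.
Data block control in particular is separated out and handled independently as a BlockPool. Each NameNode is tied one-to-one to its own BlockPool, so that data blocks are managed transparently from the namespace's point of view. From the physical block side, a node only needs to talk to the BlockPool it corresponds to. Managing block pools on separate servers also seems to be under consideration. All this appears to compensate for a weakness of current Hadoop: for all the talk of BigData, control suddenly becomes a headache once a certain critical point is passed.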
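The one-to-one link between a NameNode's namespace and its block pool can be pictured with a small Python sketch. This is a conceptual model under my own assumptions: the classes, the pool ids `BP-1` and `BP-2`, and the `block_report` method are hypothetical and are not taken from the HDFS code base.

```python
from typing import Dict, List, Set


class NameNode:
    """Owns one namespace and exactly one block pool (the one-to-one link)."""

    def __init__(self, namespace: str, block_pool_id: str) -> None:
        self.namespace = namespace
        self.block_pool_id = block_pool_id
        self.files: Dict[str, List[int]] = {}  # path -> block ids in this pool
        self._next_block = 0

    def add_file(self, path: str, num_blocks: int) -> List[int]:
        blocks = list(range(self._next_block, self._next_block + num_blocks))
        self._next_block += num_blocks
        self.files[path] = blocks
        return blocks


class DataNode:
    """Stores blocks for several pools, but keeps each pool's blocks separate."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.pools: Dict[str, Set[int]] = {}

    def store(self, block_pool_id: str, block_id: int) -> None:
        self.pools.setdefault(block_pool_id, set()).add(block_id)

    def block_report(self, block_pool_id: str) -> Set[int]:
        """Report only the blocks that belong to the given pool."""
        return set(self.pools.get(block_pool_id, set()))


nn_users = NameNode("/users", "BP-1")
nn_logs = NameNode("/logs", "BP-2")
dn = DataNode("dn-1")
for nn, path, count in [(nn_users, "/users/a.txt", 2), (nn_logs, "/logs/x.log", 3)]:
    for block in nn.add_file(path, count):
        dn.store(nn.block_pool_id, block)

users_blocks = dn.block_report("BP-1")
logs_blocks = dn.block_report("BP-2")
```

Both namespaces reuse the same block ids without clashing because the ids are scoped to a pool. That scoping is what lets each NameNode manage its blocks independently while sharing the same storage nodes.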
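Beyond this there are various other matters, such as HA and the interplay with HBase, but that is all for here. I plan to take a more careful look at them later.
Reference sites are as follows.
Various materials from company H: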
http://www.slideshare.net/hortonworks/apache-hadoop-023
http://www.hortonworks.com/an-introduction-to-hdfs-federation/
http://www.slideshare.net/hortonworks/nextgen-apache-hadoop-mapreduce
HDFS-HA is here:
https://issues.apache.org/jira/browse/HDFS-1623
And here is MR2:
https://issues.apache.org/jira/browse/MAPREDUCE-279?attachmentSortBy=dateTime#attachmentmodule
https://issues.apache.org/jira/secure/attachment/12486023/MapReduce_NextGen_Architecture.pdf