Hadoop!
2010/05/16
  naoki yanai
   id:yanaoki




                1
Hadoop

Hadoop
(Elastic MapReduce)


                      2
naoki yanai (id:yanaoki)
Web

Hadoop



                           m        m

                           iPhone

          Ruby Java


                                        3
Hadoop




         4
Hadoop




Java

Apache




                  5
Hadoop
Google 2004

MapReduce
  http://labs.google.com/papers/mapreduce.html

Google File System (GFS)
  http://labs.google.com/papers/gfs.html

2010                          Google




                                                 6
Hadoop

Web




  →




               7
8
Hadoop
         9
Hadoop
Yahoo
  Yahoo      Hadoop




     Facebook Amazon

                       10
Hadoop
RDBMS




 Join   mapreduce join




SQL     Hadoop

                         11
Hadoop


                         MapReduce
        web
                     HDFS




          RDB

  Web           Hadoop



                                     12
                                     13
Hadoop


            N




Hadoop



                14
Hadoop
MapReduce HDFS

Hadoop

         MapReduce HDFS




                          15
MapReduce


      input → map → reduce → output

map   reduce    hadoop

                    key-value

               Hadoop




                                    16
HDFS

Hadoop


MapReduce




                   17
MapReduce

master   MR:JobTracker    (Job)
slave    MR:TaskTracker   (map, reduce)
slave    MR:TaskTracker   (map, reduce)




                                                            18
HDFS

master   HDFS:NameNode
slave    HDFS:DataNode
slave    HDFS:DataNode




                                               19
Hadoop
MapReduce HDFS
master   MR:JobTracker    HDFS:NameNode
slave    MR:TaskTracker   HDFS:DataNode
slave    MR:TaskTracker   HDFS:DataNode
Hadoop
HDFS
MapReduce map                   reduce             JobTracker
                                                   map    reduce
                                                                   20
MapReduce


input            output
AA               A:3
AB               B:2
BC               C:1


 map    reduce

                          21
MapReduce
Example (from the Google MapReduce paper)

map(String key, String value):
  // key: document name
  // value: document contents
  for each word w in value:
    EmitIntermediate(w, "1");

reduce(String key, Iterator values):
  // key: a word
  // values: a list of counts
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  Emit(AsString(result));
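
A minimal Python sketch of the same word count, run in a single process rather than on a Hadoop cluster. The names map_fn, reduce_fn and run_word_count are illustrative assumptions and are not part of Hadoop or the Google paper; the in-memory dictionary only stands in for the shuffle that the framework would perform.

```python
from collections import defaultdict


def map_fn(key, value):
    """key: document name, value: document contents."""
    for word in value.split():
        # Same as EmitIntermediate(w, "1") in the pseudocode.
        yield word, "1"


def reduce_fn(key, values):
    """key: a word, values: an iterable of counts (as strings)."""
    result = 0
    for v in values:
        result += int(v)
    return str(result)


def run_word_count(documents):
    """Run map, group by key, then reduce. Hypothetical local driver."""
    intermediate = defaultdict(list)
    for name, contents in documents.items():
        for word, count in map_fn(name, contents):
            intermediate[word].append(count)
    return {word: reduce_fn(word, counts)
            for word, counts in sorted(intermediate.items())}


if __name__ == "__main__":
    print(run_word_count({"doc1": "A A", "doc2": "A B", "doc3": "B C"}))
```

On a real cluster only the bodies of map_fn and reduce_fn would be user code; the grouping step is done by the framework.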
                                               22
MapReduce

input (HDFS):    AA, AB, BC
map:             AA → A:1, A:1
                 AB → A:1, B:1
                 BC → B:1, C:1
shuffle (sort):  A:<1,1,1>   B:<1,1>   C:<1>
reduce:          A:3   B:2   C:1
output (HDFS):   A:3, B:2, C:1
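
The three phases in this diagram can be traced in a few lines of Python. This is only a local sketch: map_phase, shuffle_phase and reduce_phase are hypothetical helper names, each character of an input line is treated as one key (as in the AA / AB / BC example), and a plain sort plus groupby stands in for Hadoop's distributed shuffle.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(lines):
    """Emit (key, 1) for every character of every input line."""
    return [(ch, 1) for line in lines for ch in line]


def shuffle_phase(pairs):
    """Sort by key and collect the values of each key into one list."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [(key, [value for _, value in group])
            for key, group in groupby(ordered, key=itemgetter(0))]


def reduce_phase(grouped):
    """Sum the value list of each key."""
    return [(key, sum(values)) for key, values in grouped]


if __name__ == "__main__":
    mapped = map_phase(["AA", "AB", "BC"])
    grouped = shuffle_phase(mapped)
    print(mapped)
    print(grouped)
    print(reduce_phase(grouped))
```

The sort inside shuffle_phase is what guarantees that each reduce call sees all values of one key together.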
                                                             23
MapReduce




  Google
            24
Hadoop
Mahout

Hadoop

Apache




  CollaborativeFiltering
  Classifier
  Clustering
  DecisionForest

                           25
Hadoop




         26
Hadoop




         27
Amazon Web Services   EC2




                           28
Amazon Web Services




WebAPI




                     29
Amazon Web Services
 EC2 (Elastic Compute Cloud)
    root/admin

 S3 (Simple Storage Service)

 EMR (Elastic MapReduce)
    Web
    Hadoop → MapReduce
    EC2   S3   +α
                                              30
Elastic MapReduce
            Hadoop




                  Hadoop


input    output     S3


                           31
Elastic MapReduce




 Amazon




                    32
Elastic MapReduce
client           cloud         master
          API            Job




         input/output                        slave


                          S3
                                            slave

                                    slave

                                                     33
Elastic MapReduce




      MapReduce




                    34
Elastic MapReduce
Finding Similar Items with Amazon Elastic MapReduce, Python, and Hadoop Streaming
http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2294
Item




                                                       35
Elastic MapReduce
                          map/reduce


               map/reduce
input       http://www.grouplens.org/
        5




                                        36
Elastic MapReduce

input S3
[       ID] [        ID] [     ]

    map/reduce

    output      S3
[         ID] [        ID] [   ]

                                   37
Elastic MapReduce
S                                                   S



    map
     map reduce     map
                     map reduce     map
                                     map reduce
      map reduce
           reduce     map reduce
                           reduce     map reduce
                                           reduce




                                                        38
Elastic MapReduce
          step1 :
input
  key:[] value:[   ID_           ID_       ]




           map                       ID
                         key:[        ID] values[          ID_   ]

         reduce                      ID
output
         ID \t      ID_          |          ID_     |...
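
A possible shape for step 1 in Python, written as plain generator functions so it can be tried locally. The field labels on this slide did not survive, so the following is an assumption: each input line is taken to be user ID, item ID and rating separated by tabs, the map emits the user ID as key with item_rating as value, and the reduce joins one user's values with "|". The names step1_map and step1_reduce are hypothetical.

```python
from itertools import groupby


def step1_map(lines):
    """Assumed input: 'user<TAB>item<TAB>rating'. Emit 'user<TAB>item_rating'."""
    for line in lines:
        user, item, rating = line.rstrip("\n").split("\t")[:3]
        yield f"{user}\t{item}_{rating}"


def step1_reduce(sorted_lines):
    """Input must be sorted by key. Emit 'user<TAB>item_rating|item_rating|...'."""
    keyed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for user, group in groupby(keyed, key=lambda kv: kv[0]):
        yield user + "\t" + "|".join(value for _, value in group)


if __name__ == "__main__":
    raw = ["u1\ti1\t5", "u2\ti1\t3", "u1\ti2\t4"]
    # sorted() stands in for the shuffle that Hadoop does between map and reduce.
    for out in step1_reduce(sorted(step1_map(raw))):
        print(out)
```

Under Hadoop Streaming the two functions would live in separate scripts that read sys.stdin and print each yielded line.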


                                                                     39
Elastic MapReduce
 step2 :
input
  key:[     ID] value:[       ID_          |     ID_          |...]



                                    ID
           map
                      key:[         IDx_       IDy] values[           x_   y]

                                    ID
          reduce
output
          IDx_            _    IDy
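
Step 2 might look like the sketch below. Several things here are assumptions, because the slide text is incomplete: the input is the step 1 output (user ID, then item_rating entries joined by "|"), item IDs contain no underscore, and cosine similarity is used as the measure since the slide does not show which one the author chose. step2_map and step2_reduce are hypothetical names.

```python
import math
from itertools import combinations, groupby


def step2_map(lines):
    """For each user, emit 'itemX_itemY<TAB>ratingX_ratingY' for every item pair."""
    for line in lines:
        _user, packed = line.rstrip("\n").split("\t", 1)
        # Sorting by item ID makes the pair key canonical (itemX < itemY).
        ratings = sorted(entry.rsplit("_", 1) for entry in packed.split("|"))
        for (item_x, rx), (item_y, ry) in combinations(ratings, 2):
            yield f"{item_x}_{item_y}\t{rx}_{ry}"


def step2_reduce(sorted_lines):
    """Input must be sorted by key. Emit 'itemX_similarity_itemY' per pair."""
    keyed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for pair, group in groupby(keyed, key=lambda kv: kv[0]):
        xs, ys = [], []
        for _, value in group:
            rx, ry = value.split("_")
            xs.append(float(rx))
            ys.append(float(ry))
        dot = sum(x * y for x, y in zip(xs, ys))
        norm = math.sqrt(sum(x * x for x in xs)) * math.sqrt(sum(y * y for y in ys))
        # Cosine similarity (an assumed choice), clamped against rounding error.
        sim = min(1.0, dot / norm) if norm else 0.0
        item_x, item_y = pair.split("_")
        yield f"{item_x}_{sim:.4f}_{item_y}"


if __name__ == "__main__":
    users = ["u1\ti1_5|i2_5", "u2\ti1_3|i2_3"]
    for out in step2_reduce(sorted(step2_map(users))):
        print(out)
```

Only users who rated both items contribute to a pair, so the similarity is computed over co-rated entries.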


                                                                                40
Elastic MapReduce
         step3 :
input
         IDx_        _           IDy


                             IDx_(1-              ) key map
          map map
                  key: <         IDx_(1-   )> values <   IDy>


         reduce             1-
output
         IDx_        IDy_
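
The (1 - similarity) key on this slide looks like a sorting trick: Hadoop sorts keys in ascending order, so putting 1 - similarity into the key makes the most similar items arrive first. A hedged Python sketch, assuming the step 2 output format itemX_similarity_itemY, item IDs without underscores, and similarities between 0 and 1 (step3_map and step3_reduce are hypothetical names):

```python
def step3_map(lines):
    """Emit 'itemX_(1-similarity)<TAB>itemY' so that an ascending sort ranks by similarity."""
    for line in lines:
        item_x, sim, item_y = line.rstrip("\n").split("_")
        # Fixed-width formatting keeps string order equal to numeric order.
        yield f"{item_x}_{1.0 - float(sim):.4f}\t{item_y}"


def step3_reduce(sorted_lines):
    """Turn (1-similarity) back into similarity and emit 'itemX_itemY_similarity'."""
    for line in sorted_lines:
        key, item_y = line.rstrip("\n").split("\t", 1)
        item_x, distance = key.rsplit("_", 1)
        yield f"{item_x}_{item_y}_{1.0 - float(distance):.4f}"


if __name__ == "__main__":
    pairs = ["i1_0.2500_i2", "i1_0.9000_i3"]
    # sorted() stands in for the key sort Hadoop performs before the reduce.
    for out in step3_reduce(sorted(step3_map(pairs))):
        print(out)
```

With this key layout the reduce output for each item lists its neighbours from most to least similar.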


                                                                41
Elastic MapReduce




                    42
Elastic MapReduce
              1



elastic-mapreduce \
  --create \
  --name "item similarity job" \
  --alive \
  --log-uri s3n://bucket/logs \
  --num-instances 10 \
  --instance-type m1.small \
  --availability-zone us-west-1a




                                 43
EC2




EC2
            44
Elastic MapReduce




WAITING
                         45
Elastic MapReduce
             2
         Upload to S3 (with s3cmd):
           input
           map/reduce python scripts

s3cmd.rb put bucket:input/input.tsv input.tsv
s3cmd.rb put bucket:script/map1.py map1.py
s3cmd.rb put bucket:script/reduce1.py reduce1.py
...


                                                            46
Elastic MapReduce
           4
     Job



elastic-mapreduce \
  --job-flow-id j-2ROU0QKL6KOV6 \
  --json item_similarity.json




                                 47
Elastic MapReduce




Step1       RUNNING
                      48
Elastic MapReduce
            5
      output



s3sync.rb -r --make-dirs bucket:output .

elastic-mapreduce \
  --terminate \
  --job-flow-id j-2ROU0QKL6KOV6




                                              49
Hadoop
         x




             50
Hadoop

Tom White




¥4,830




                    51
52

Introduction to Hadoop and Using the Cloud (Hadoop入門とクラウド利用)