This document discusses Hadoop and related technologies. It introduces Hadoop, its components MapReduce and HDFS, and how they work together. It also briefly mentions related Apache projects like Mahout and how companies like Amazon, Yahoo and Facebook use Hadoop in their systems. Finally, it covers Amazon's Elastic MapReduce service, which allows running Hadoop jobs in the cloud.
6. Hadoop
Google 2004
MapReduce: http://labs.google.com/papers/mapreduce.html
Google File System (GFS): http://labs.google.com/papers/gfs.html
2010 Google
21. MapReduce
[Figure: MapReduce data flow. The input records AA, AB, BC pass through map and reduce to produce the output A 3, B 2, C 1.]
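The data flow in the figure can be traced by hand. The following is a minimal Python sketch, assuming the figure counts how often each character occurs in the input records AA, AB, BC. The variable names (`records`, `mapped`, `grouped`, `counts`) are illustrative and do not come from the slides, and the grouping step between map and reduce is written out explicitly even though the figure only labels map and reduce.

```python
from collections import defaultdict

# Input records as read off the figure (assumed to be plain strings).
records = ["AA", "AB", "BC"]

# Map: emit one (character, 1) pair for every character in every record.
mapped = [(ch, 1) for record in records for ch in record]

# Group: collect the intermediate values that share the same key.
grouped = defaultdict(list)
for key, value in mapped:
    grouped[key].append(value)

# Reduce: sum the values collected for each key.
counts = {key: sum(values) for key, values in sorted(grouped.items())}

print(counts)  # {'A': 3, 'B': 2, 'C': 1}
```

The result matches the output column of the figure: A 3, B 2, C 1.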
22. MapReduce
Example: word count, from Google's MapReduce paper
map(String key, String value):
    // key: document name
    // value: document contents
    for each word w in value:
        EmitIntermediate(w, "1");

reduce(String key, Iterator values):
    // key: a word
    // values: a list of counts
    int result = 0;
    for each v in values:
        result += ParseInt(v);
    Emit(AsString(result));
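The pseudocode above can be tried out on a single machine. The following Python sketch mirrors its two functions and adds a small in-memory driver in place of the framework. It is not Hadoop's or Google's API: `map_fn`, `reduce_fn`, `run_mapreduce`, and the sample documents are hypothetical names chosen for illustration. Counts are kept as strings only to stay close to the pseudocode.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def map_fn(key: str, value: str) -> Iterator[Tuple[str, str]]:
    # key: document name (unused here), value: document contents
    for word in value.split():
        yield (word, "1")  # plays the role of EmitIntermediate(w, "1")


def reduce_fn(key: str, values: Iterable[str]) -> str:
    # key: a word, values: the counts emitted for that word
    result = 0
    for v in values:
        result += int(v)
    return str(result)  # plays the role of Emit(AsString(result))


def run_mapreduce(documents: Dict[str, str]) -> Dict[str, str]:
    # Hypothetical single-process driver: run map over every document,
    # group the intermediate pairs by key, then reduce each group.
    intermediate = defaultdict(list)
    for name, contents in documents.items():
        for word, count in map_fn(name, contents):
            intermediate[word].append(count)
    return {word: reduce_fn(word, counts) for word, counts in intermediate.items()}


# Made-up sample input for illustration.
docs = {"doc1": "hadoop mapreduce hadoop", "doc2": "hdfs hadoop"}
result = run_mapreduce(docs)
print(result)  # {'hadoop': '3', 'mapreduce': '1', 'hdfs': '1'}
```

In a real cluster the grouping step is done by the framework across many machines. The user supplies only the map and reduce functions.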