Trying Out Python with Hadoop - Continued
Continuing from the previous post.
Picking up from where we wrote the [mapper] and [reducer] in Python,
we start by formatting the file system.
hadoop@ubuntu-vm:~$ hadoop namenode -format
hadoop@ubuntu-vm:~$ start-all.sh
hadoop@ubuntu-vm:~$ jps
9258 TaskTracker
9043 SecondaryNameNode
17131 Jps
8885 DataNode
8751 NameNode
9122 JobTracker
If the jps command shows output like the above, everything is OK.
Next, we load the sample data created last time into the file system called [HDFS].
With a large amount of data, this step takes quite a while.
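The jps check above can also be done programmatically. The following is a minimal sketch, assuming the five daemon names shown in the listing are the ones to expect on this single-node setup; the helper name `missing_daemons` is hypothetical and not part of Hadoop.

```python
# Daemon names taken from the jps listing above (assumed to be the full
# set expected on this single-node setup).
EXPECTED_DAEMONS = {
    "NameNode",
    "SecondaryNameNode",
    "DataNode",
    "JobTracker",
    "TaskTracker",
}


def missing_daemons(jps_output):
    """Return the expected daemon names that do not appear in the jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # Each jps line looks like "<pid> <name>"; skip anything else.
        if len(parts) == 2 and parts[0].isdigit():
            running.add(parts[1])
    return sorted(EXPECTED_DAEMONS - running)


# The jps output copied from the listing above.
sample = """9258 TaskTracker
9043 SecondaryNameNode
17131 Jps
8885 DataNode
8751 NameNode
9122 JobTracker"""

print(missing_daemons(sample))  # prints []
```

An empty list means every expected daemon is running; any name that is printed points to a daemon that failed to start.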
hadoop@ubuntu-vm:~$ hadoop dfs -copyFromLocal input input
Check the files that were loaded into the file system.
hadoop@ubuntu-vm:~$ hadoop dfs -lsr
drwxr-xr-x - hadoop supergroup 0 2009-12-15 18:12 /user/hadoop/input
-rw-r--r-- 1 hadoop supergroup 96 2009-12-15 18:11 /user/hadoop/input/example.tsv
And now, at last, we run the job.
hadoop@ubuntu-vm:~$ hadoop jar contrib/streaming/hadoop-0.20.1-streaming.jar \
-mapper /usr/local/hadoop/work/python/map.py \
-reducer /usr/local/hadoop/work/python/reduce.py \
-input input \
-output output
packageJobJar: [/tmp/hadoop-hadoop/hadoop-unjar2202646462728913956/] [] /tmp/streamjob1294070877681960957.jar tmpDir=null
09/12/15 18:13:43 INFO mapred.FileInputFormat: Total input paths to process : 1
09/12/15 18:13:44 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-hadoop/mapred/local]
09/12/15 18:13:44 INFO streaming.StreamJob: Running job: job_200912151157_0005
09/12/15 18:13:44 INFO streaming.StreamJob: To kill this job, run:
09/12/15 18:13:44 INFO streaming.StreamJob: /usr/local/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:9001 -kill job_200912151157_0005
09/12/15 18:13:44 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_200912151157_0005
09/12/15 18:13:45 INFO streaming.StreamJob: map 0% reduce 0%
09/12/15 18:13:58 INFO streaming.StreamJob: map 100% reduce 0%
09/12/15 18:14:13 INFO streaming.StreamJob: map 100% reduce 100%
09/12/15 18:14:16 INFO streaming.StreamJob: Job complete: job_200912151157_0005
09/12/15 18:14:16 INFO streaming.StreamJob: Output: output
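Hadoop Streaming feeds input lines to the mapper, sorts the mapper output by key, and then feeds the sorted lines to the reducer. The flow can be tried locally before submitting a job. This is only a sketch: the real map.py and reduce.py are not shown in this post, so the functions below are stand-ins, and the assumption that the key is the first tab-separated field of each row is mine.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper stand-in: emit (key, 1) for the first tab-separated field."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        key = line.split("\t")[0]
        yield key, 1


def reduce_pairs(pairs):
    """Reducer stand-in: sum the counts of each run of identical keys."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


def run_local(lines):
    """Mimic map -> sort -> reduce, the order Hadoop Streaming applies."""
    # Sorting here plays the role of the sort that happens between the phases.
    shuffled = sorted(map_lines(lines))
    return ["[ %s ] : %d" % (key, total) for key, total in reduce_pairs(shuffled)]


# Hypothetical input rows; the real example.tsv is not shown in this post.
rows = ["test\tx\n", "aaaa\ty\n", "test\tz\n"]
for line in run_local(rows):
    print(line)
```

The reducer relies on identical keys arriving next to each other, which is why the sort step matters: without it, `groupby` would report the same key more than once.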
Check the processing results.
hadoop@ubuntu-vm:~$ hadoop dfs -lsr
drwxr-xr-x - hadoop supergroup 0 2009-12-15 18:12 /user/hadoop/input
-rw-r--r-- 1 hadoop supergroup 96 2009-12-15 18:11 /user/hadoop/input/example.tsv
drwxr-xr-x - hadoop supergroup 0 2009-12-15 18:14 /user/hadoop/output
drwxr-xr-x - hadoop supergroup 0 2009-12-15 18:13 /user/hadoop/output/_logs
drwxr-xr-x - hadoop supergroup 0 2009-12-15 18:13 /user/hadoop/output/_logs/history
-rw-r--r-- 1 hadoop supergroup 17173 2009-12-15 18:13 /user/hadoop/output/_logs/history/localhost_1260845857887_job_200912151157_0005_conf.xml
-rw-r--r-- 1 hadoop supergroup 8986 2009-12-15 18:13 /user/hadoop/output/_logs/history/localhost_1260845857887_job_200912151157_0005_hadoop_streamjob1294070877681960957.jar
-rw-r--r-- 1 hadoop supergroup 74 2009-12-15 18:14 /user/hadoop/output/part-00000
Let's look at the contents.
hadoop@ubuntu-vm:~$ hadoop dfs -cat /user/hadoop/output/part-00000
[ test ] : 4
[ aaaa ] : 2
[ bbbbb ] : 1
[ hagaeru3sei ] : 2
[ mochi ] : 2
Ooooh!!! It worked!!!
Next time I plan to increase the amount of data and try testing with something like Amazon S3.
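To work with the result in Python rather than reading it by eye, the lines of part-00000 can be parsed back into key and count pairs. A minimal sketch, assuming every line follows the `[ key ] : count` layout printed above; the helper name `parse_counts` is hypothetical.

```python
import re

# Matches lines such as "[ test ] : 4" from the part-00000 output above.
LINE_RE = re.compile(r"^\[\s*(.*?)\s*\]\s*:\s*(\d+)\s*$")


def parse_counts(text):
    """Turn the output text into a list of (key, count) tuples, keeping order."""
    results = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            results.append((match.group(1), int(match.group(2))))
    return results


# The contents of part-00000 copied from the listing above.
output = """[ test ] : 4
[ aaaa ] : 2
[ bbbbb ] : 1
[ hagaeru3sei ] : 2
[ mochi ] : 2"""

counts = parse_counts(output)
print(sum(count for _, count in counts))  # prints 11
```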