Building a pseudo-distributed Spark environment on CentOS + CDH5
The Japan Series is over, so it's time for Spark. I'd like to get Spark to the point where I can use it in practice by the end of this year.
Reference sites
- http://dmtolpeko.com/2015/02/06/installing-and-running-spark-on-yarn/
- http://datasciesotist.hatenablog.jp/entry/2014/05/10/225809
- http://stackoverflow.com/questions/27299923/how-to-load-local-file-in-sc-textfile-instead-of-hdfs
Environment
- CentOS 6.6
- CDH 5
Installing Scala
Installing Java is not covered here.
wget http://downloads.typesafe.com/scala/2.11.7/scala-2.11.7.tgz
tar xvf scala-2.11.7.tgz
mv scala-2.11.7 /usr/local/share/scala
Add the following three lines to /etc/profile
SCALA_HOME=/usr/local/share/scala
PATH=$SCALA_HOME/bin:$PATH
export PATH SCALA_HOME
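If you script this step, it helps to make it safe to run more than once. The following is only a minimal sketch and not part of the original procedure: `append_once` and `add_scala_env` are hypothetical helper names, and the target file is passed as an argument so you can try it on a scratch file before touching /etc/profile.

```shell
#!/bin/sh
# append_once LINE FILE
# Appends LINE to FILE unless an identical line is already there.
append_once() {
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# add_scala_env FILE
# Adds the three Scala lines from this article to FILE (hypothetical helper).
add_scala_env() {
    append_once 'SCALA_HOME=/usr/local/share/scala' "$1"
    append_once 'PATH=$SCALA_HOME/bin:$PATH' "$1"
    append_once 'export PATH SCALA_HOME' "$1"
}

# Example (as root): add_scala_env /etc/profile
```

The single quotes keep `$SCALA_HOME` and `$PATH` literal, so they are expanded when the profile is sourced rather than when the lines are written.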
Check that the environment variables are set correctly
$ source /etc/profile
$ scala -version
# OK if the version is printed correctly
Scala code runner version 2.11.7 -- Copyright 2002-2013, LAMP/EPFL
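To check the version from a script instead of by eye, the banner can be reduced to just the version number. This is a sketch under assumptions: `extract_scala_version` is a hypothetical helper, and it assumes the banner has the same shape as the line shown above. The example redirects stderr to stdout so it works whichever stream the banner is written to.

```shell
#!/bin/sh
# extract_scala_version
# Reads `scala -version` output on stdin and prints only the version number.
# Prints nothing if no line matches the expected banner format.
extract_scala_version() {
    sed -n 's/^Scala code runner version \([0-9][0-9.]*\).*/\1/p'
}

# Example: scala -version 2>&1 | extract_scala_version
```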
Installing CDH5 and Spark
Install the repository
wget http://archive.cloudera.com/cdh5/one-click-install/redhat/6/x86_64/cloudera-cdh-5-0.x86_64.rpm
rpm -i cloudera-cdh-5-0.x86_64.rpm
rpm --import http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
Install in pseudo-distributed mode
yum install hadoop-conf-pseudo
Set HDFS permissions and format the NameNode
chown hdfs:hdfs /var/lib/hadoop-hdfs
sudo -u hdfs hdfs namenode -format
Start the HDFS services
service hadoop-hdfs-datanode start
service hadoop-hdfs-namenode start
service hadoop-hdfs-secondarynamenode start
Create the related directories and set their permissions
sudo -u hdfs hadoop fs -ls -R /
sudo -u hdfs hadoop fs -mkdir -p /tmp/hadoop-yarn/staging/history/done_intermediate
sudo -u hdfs hadoop fs -chown -R mapred:mapred /tmp/hadoop-yarn/staging
sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
sudo -u hdfs hadoop fs -mkdir -p /var/log/hadoop-yarn
sudo -u hdfs hadoop fs -chown yarn:mapred /var/log/hadoop-yarn
sudo -u hdfs hadoop fs -ls -R /
Start the YARN services
service hadoop-yarn-resourcemanager start
service hadoop-yarn-nodemanager start
service hadoop-mapreduce-historyserver start
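The HDFS and YARN `service ... start` commands can also be wrapped in a small loop that stops at the first failure, which is handy when bringing everything back up later. This is a sketch, not something the CDH packages provide: `start_services` is a hypothetical helper, and `SERVICE_CMD` is a made-up override variable that exists only so the loop can be dry-run.

```shell
#!/bin/sh
# start_services NAME...
# Starts each service in the order given and stops at the first failure.
# SERVICE_CMD is a hypothetical override (not a standard variable);
# set SERVICE_CMD=echo to print the commands instead of running them.
start_services() {
    for svc in "$@"; do
        if ! ${SERVICE_CMD:-service} "$svc" start; then
            echo "failed to start $svc" >&2
            return 1
        fi
    done
}

# Example (as root), using the service names from this article:
# start_services hadoop-hdfs-datanode hadoop-hdfs-namenode hadoop-hdfs-secondarynamenode
# start_services hadoop-yarn-resourcemanager hadoop-yarn-nodemanager hadoop-mapreduce-historyserver
```

Running it once with `SERVICE_CMD=echo` shows exactly which commands would be issued before anything is actually started.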
Installing Spark
Hive is only needed if spark-shell reports errors at startup; if it starts cleanly, the second command can be skipped.
yum install spark-core spark-master spark-worker spark-history-server spark-python
yum install hive
Starting Spark
service spark-master start
service spark-worker start
Start the Spark shell and run a sample. The file:// prefix makes sc.textFile read the local file rather than a path on HDFS.
spark-shell
> val textFile = sc.textFile("file:///usr/lib/spark/LICENSE")
> textFile.count()
res0: Long = 859
> val linesWithSpark = textFile.filter(line => line.contains("Spark"))
> linesWithSpark.count()
res1: Long = 3
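As a sanity check, the same two counts can be reproduced outside Spark with grep. This is a rough sketch: `count_lines` and `count_matching` are hypothetical helpers, and it assumes the file uses plain newline line endings. The actual numbers depend on the LICENSE file shipped with your Spark package.

```shell
#!/bin/sh
# count_lines FILE
# Total number of lines, comparable to textFile.count().
# The empty pattern matches every line, including a last line
# that has no trailing newline.
count_lines() {
    grep -c '' "$1"
}

# count_matching FILE WORD
# Number of lines containing WORD as a case-sensitive fixed string,
# comparable to textFile.filter(line => line.contains(WORD)).count().
count_matching() {
    grep -cF -- "$2" "$1"
}

# Example:
# count_lines /usr/lib/spark/LICENSE
# count_matching /usr/lib/spark/LICENSE Spark
```

If these disagree with what spark-shell reports, the usual suspects are a different file being read or unusual line endings.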
Wrapping up
That's all for this time. Next time I'll run an external program instead of using the interactive shell.