Spark SQL with Hadoop Hive
Building Spark went so smoothly that I got carried away!
Sparkのビルドが簡単すぎた件について - なぜか数学者にはワイン好きが多い
Using spark-shell, I tried to access Hive, which already has a huge amount of data accumulated on HDFS, and promptly got stuck!
scala> val hiveContext=new org.apache.spark.sql.hive.HiveContext(sc)
<console>:12: error: object hive is not a member of package org.apache.spark.sql
       val hiveContext=new org.apache.spark.sql.hive.HiveContext(sc)
No org.apache.spark.sql.hive??? Looking inside spark-assembly-1.0.0-hadoop2.0.0-mr1-cdh4.2.0.jar, it is indeed missing!
Come to think of it, I went back to the Spark SQL documentation, which says:
Spark SQL also supports reading and writing data stored in Apache Hive. However, since Hive has a large number of dependencies, it is not included in the default Spark assembly. In order to use Hive you must first run "SPARK_HIVE=true sbt/sbt assembly/assembly" (or use -Phive for maven).
So, the build I ran last time:
> mvn -Pyarn-alpha -Dhadoop.version=2.0.0-cdh4.4.0 -DskipTests clean package
I redid as:
> time mvn -Phive -Pyarn-alpha -Dhadoop.version=2.0.0-cdh4.4.0 -DskipTests clean package
and rebuilt it!
No trouble at all:
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM .......................... SUCCESS [3.259s]
[INFO] Spark Project Core ................................ SUCCESS [3:32.873s]
[INFO] Spark Project Bagel ............................... SUCCESS [26.504s]
[INFO] Spark Project GraphX .............................. SUCCESS [1:18.451s]
[INFO] Spark Project ML Library .......................... SUCCESS [1:31.125s]
[INFO] Spark Project Streaming ........................... SUCCESS [1:41.268s]
[INFO] Spark Project Tools ............................... SUCCESS [15.847s]
[INFO] Spark Project Catalyst ............................ SUCCESS [1:36.652s]
[INFO] Spark Project SQL ................................. SUCCESS [1:23.676s]
[INFO] Spark Project Hive ................................ SUCCESS [1:40.685s]
[INFO] Spark Project REPL ................................ SUCCESS [47.597s]
[INFO] Spark Project YARN Parent POM ..................... SUCCESS [1.770s]
[INFO] Spark Project YARN Alpha API ...................... SUCCESS [39.965s]
[INFO] Spark Project Assembly ............................ SUCCESS [34.985s]
[INFO] Spark Project External Twitter .................... SUCCESS [22.567s]
[INFO] Spark Project External Kafka ...................... SUCCESS [24.920s]
[INFO] Spark Project External Flume ...................... SUCCESS [26.224s]
[INFO] Spark Project External ZeroMQ ..................... SUCCESS [24.192s]
[INFO] Spark Project External MQTT ....................... SUCCESS [21.597s]
[INFO] Spark Project Examples ............................ SUCCESS [1:03.254s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 18:38.366s
[INFO] Finished at: Mon Jul 07 12:56:23 JST 2014
[INFO] Final Memory: 57M/903M
[INFO] ------------------------------------------------------------------------

real    18m40.737s
user    44m34.005s
sys     0m24.722s
And with that, the build finished without a hitch!
I tried running it on Hadoop's YARN:
> ./bin/spark-shell --master yarn-client
Spark assembly has been built with Hive, including Datanucleus jars on classpath
14/07/07 13:00:37 INFO spark.SecurityManager: Changing view acls to: hadoop
14/07/07 13:00:37 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop)
14/07/07 13:00:37 INFO spark.HttpServer: Starting HTTP Server
14/07/07 13:00:37 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/07/07 13:00:37 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:43672
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.0.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_45)

scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
hiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@4a984a6f

scala> import hiveContext._
import hiveContext._

scala> hql("SHOW TABLES")
※ (a flood of error logs!)
14/07/07 13:03:38 ERROR hive.HiveContext:
====================== HIVE FAILURE OUTPUT ======================
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
====================== END HIVE FAILURE OUTPUT ======================
org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
I didn't really understand this, but figuring some configuration was missing, I went back to the documentation:
Configuration of Hive is done by placing your hive-site.xml file in conf/.
So separate from the Hive installation directory, Spark's own conf/ directory needs a hive-site.xml???
Let's try it!
> cp -pv /usr/local/hive/conf/hive-site.xml conf/
`/usr/local/hive/conf/hive-site.xml' -> `conf/hive-site.xml'
Net result: no change!!!
※ That said, this configuration is still required.
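For reference, the part of hive-site.xml that matters here is the metastore connection settings. A minimal sketch, assuming a MySQL-backed metastore like this setup (the host, user, and password values are placeholders):

```xml
<!-- Sketch of the metastore-related settings in hive-site.xml.
     Host, user, and password are placeholders for this environment. -->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://mymetastoreserver/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>********</value>
  </property>
</configuration>
```

It is exactly these values (the JDBC URL and the driver class) that show up in the errors that follow.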
scala> hql("SHOW TABLES")
14/07/07 13:28:00 INFO parse.ParseDriver: Parsing command: SHOW TABLES
14/07/07 13:28:00 INFO parse.ParseDriver: Parse Completed
(snip)
14/07/07 13:28:01 ERROR exec.DDLTask: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
(snip)
Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "BoneCP" plugin to create a ConnectionPool gave an error : The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
(snip)
14/07/07 13:28:01 ERROR hive.HiveContext:
====================== HIVE FAILURE OUTPUT ======================
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
====================== END HIVE FAILURE OUTPUT ======================
So it doesn't know where the mysql-connector jar is??? I tried specifying the option that looked right for that:
> ./bin/spark-shell --master yarn-client --jars /usr/local/hive/lib/mysql-connector-java.jar
Spark assembly has been built with Hive, including Datanucleus jars on classpath
14/07/07 16:24:09 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:42755
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.0.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_45)
Type in expressions to have them evaluated.
Type :help for more information.
(snip)
14/07/07 16:24:16 INFO spark.SparkContext: Added JAR file:/usr/local/hive/lib/mysql-connector-java.jar at http://192.168.26.25:55431/jars/mysql-connector-java.jar with timestamp 1404717856329

scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
(snip)
hiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@ea51a00

scala> import hiveContext._

scala> hql("SHOW TABLES")
Caused by: java.sql.SQLException: Unable to open a test connection to the given database. JDBC url = jdbc:mysql://mymetastoreserver/metastore?createDatabaseIfNotExist=true, username = hive. Terminating connection pool. Original Exception: ------
java.sql.SQLException: No suitable driver found for jdbc:mysql://mynamenode/metastore?createDatabaseIfNotExist=true
This one made no sense to me! No amount of fiddling with the JDBC URL helped either!
The final answer:
> ./bin/spark-shell --master yarn-client --jars /usr/local/hive/lib/mysql-connector-java.jar
is not it; rather,
> ./bin/spark-shell --master yarn-client --driver-class-path /usr/local/hive/lib/mysql-connector-java.jar
is the right one!
In yarn-client mode, the driver runs on the machine where you typed the command, so you have to point java directly at the jar with --driver-class-path, or it won't work! In yarn-cluster mode, ship the jar to the cluster with --jars instead.
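Instead of passing the flag on every invocation, the same driver classpath can apparently be set once in conf/spark-defaults.conf. A sketch, assuming Spark 1.0's spark.driver.extraClassPath property behaves the same as --driver-class-path (the jar path is the one used above):

```
# conf/spark-defaults.conf -- sketch; equivalent to --driver-class-path
spark.driver.extraClassPath  /usr/local/hive/lib/mysql-connector-java.jar
```

With that in place, plain ./bin/spark-shell --master yarn-client should pick up the MySQL driver automatically.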