Description
Hi,
In reference.conf I set the following (along with some other minor options):
# The execution modes in Sparta are: local, mesos or marathon
sparta.config.executionMode = yarn
# Yarn cluster name
sparta.yarn.master = yarn
# Cluster or Client. If the user need more than one policy running is necessary use "cluster". Is the same as the variable spark.submit.deployMode
sparta.yarn.deployMode = cluster
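My workflow is correct and runs in local mode, but after switching to yarn mode I get the logs below. It looks like Sparta cannot connect to the Resource Manager: the Submission Status goes from CONNECTED to LOST within a second, and the Submission Id and Resource Manager URL stay undefined. Could anybody help with this issue?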
02 Jul 2018 15:29:31.053 INFO c.s.s.s.c.a.ClusterLauncherActor Sparta submit options initialized correctly
02 Jul 2018 15:29:31.062 INFO c.s.s.s.c.a.ClusterLauncherActor Updating context d23359d0-de5b-4589-bb5a-236b1bde8eed with name test1:
Status: Failed ---> NotStarted
Status Information: The checker detects that the policy not start/stop correctly ---> Sparta submit options initialized correctly
Submission Id: undefined ---> undefined
Submission Status: LOST ---> LOST
Marathon Id: undefined ---> undefined
Last Error: undefined ---> undefined
Last Execution Mode: yarn-cluster ---> yarn-cluster
Resource Manager URL: undefined ---> undefined
02 Jul 2018 15:29:31.103 INFO c.s.s.s.c.a.ClusterLauncherActor Launching Sparta Job with options ...
Policy name: test1
Main Class: com.stratio.sparta.driver.SparkDriver
Driver file: http://0.0.0.0:9090/sparta/driver/driver-1.6.0-SNAPSHOT.jar
Master: yarn
Spark submit arguments: --deploy-mode -> cluster,--num-executors -> 1,--properties-file -> /etc/spark2/conf/spark-defaults.conf,--proxy-user -> hdfs
Spark configurations: spark.sql.parquet.binaryAsString -> true,spark.app.name -> test1-2018/07/02-03:29:30,spark.driver.memory -> 1G,spark.driver.cores -> 1,spark.mesos.driverEnv.SPARK_USER -> ,spark.executor.memory -> 1G,spark.executor.cores -> 1
Driver arguments: Map(plugins -> ICw=, clusterConfig -> eyJ5YXJuIjp7ImRlcGxveU1vZGUiOiJjbHVzdGVyIiwiZHJpdmVyQ29yZXMiOjEsImRyaXZlck1lbW9yeSI6IjFHIiwiZXhlY3V0b3JDb3JlcyI6MSwiZXhlY3V0b3JNZW1vcnkiOiIxRyIsImtpbGxVcmwiOiIvdjEvc3VibWlzc2lvbnMva2lsbCIsIm1hc3RlciI6Inlhcm4iLCJudW1FeGVjdXRvcnMiOjEsInByb3BlcnRpZXNGaWxlIjoiL2V0Yy9zcGFyazIvY29uZi9zcGFyay1kZWZhdWx0cy5jb25mIiwicHJveHktdXNlciI6ImhkZnMiLCJzcGFyayI6eyJzcWwiOnsicGFycXVldCI6eyJiaW5hcnlBc1N0cmluZyI6dHJ1ZX19fSwic3BhcmtIb21lIjoiL29wdC9jbG91ZGVyYS9wYXJjZWxzL1NQQVJLMi0yLjEuMC5jbG91ZGVyYTItMS5jZGg1LjcuMC5wMC4xNzE2NTgvbGliL3NwYXJrMiJ9fQ==, detailConfig -> eyJjb25maWciOnsiYWRkVGltZVRvQ2hlY2twb2ludFBhdGgiOmZhbHNlLCJhdXRvRGVsZXRlQ2hlY2twb2ludCI6dHJ1ZSwiYXdhaXRQb2xpY3lDaGFuZ2VTdGF0dXMiOiIxODBzIiwiYmFja3Vwc0xvY2F0aW9uIjoiL29wdC9zZHMvc3BhcnRhL2JhY2t1cHMiLCJjaGVja3BvaW50UGF0aCI6Ii90bXAvc3BhcnRhL2NoZWNrcG9pbnQiLCJkcml2ZXJQYWNrYWdlTG9jYXRpb24iOiIvb3B0L3Nkcy9zcGFydGEvZHJpdmVyIiwiZHJpdmVyVVJJIjoiaHR0cDovLzAuMC4wLjA6OTA5MC9zcGFydGEvZHJpdmVyL2RyaXZlci0xLjYuMC1TTkFQU0hPVC5qYXIiLCJleGVjdXRpb25Nb2RlIjoieWFybiIsImZyb250ZW5kIjp7InRpbWVvdXQiOjUwMDB9LCJwbHVnaW5QYWNrYWdlTG9jYXRpb24iOiIvb3B0L3Nkcy9zcGFydGEvcGx1Z2lucyIsInJlbWVtYmVyUGFydGl0aW9uZXIiOnRydWV9fQ==, storageConfig -> IA==, policyId -> d23359d0-de5b-4589-bb5a-236b1bde8eed, zookeeperConfig -> eyJ6b29rZWVwZXIiOnsiY29ubmVjdGlvblN0cmluZyI6IjEwLjAuMTEuMjI6MjE4MSwxMC4wLjExLjMwOjIxODEsMTAuMC4xMS4zMToyMTgxIiwiY29ubmVjdGlvblRpbWVvdXQiOjE1MDAwLCJyZXRyeUF0dGVtcHRzIjo1LCJyZXRyeUludGVydmFsIjoxMDAwMCwic2Vzc2lvblRpbWVvdXQiOjYwMDAwfX0=)
02 Jul 2018 15:29:31.128 INFO c.s.s.s.c.a.ClusterLauncherActor Sparta cluster job launched correctly
02 Jul 2018 15:29:31.131 INFO c.s.s.s.c.a.ClusterLauncherActor Updating context d23359d0-de5b-4589-bb5a-236b1bde8eed with name test1:
Status: NotStarted ---> Launched
Status Information: Sparta submit options initialized correctly ---> Sparta cluster job launched correctly
Submission Id: undefined ---> undefined
Submission Status: LOST ---> UNKNOWN
Marathon Id: undefined ---> undefined
Last Error: undefined ---> undefined
Last Execution Mode: yarn-cluster ---> yarn-cluster
Resource Manager URL: undefined ---> undefined
02 Jul 2018 15:29:31.205 INFO c.s.s.s.c.a.ClusterLauncherActor Cluster context listener added to test1 with id: d23359d0-de5b-4589-bb5a-236b1bde8eed
02 Jul 2018 15:29:31.218 INFO c.s.s.s.c.a.ClusterLauncherActor Starting scheduler task in awaitPolicyChangeStatus with time: 180s
02 Jul 2018 15:29:33.764 INFO c.s.s.s.c.a.ClusterLauncherActor Submission state changed to ... CONNECTED
02 Jul 2018 15:29:33.767 INFO c.s.s.s.c.a.ClusterLauncherActor Updating context d23359d0-de5b-4589-bb5a-236b1bde8eed with name test1:
Status: Launched ---> Launched
Status Information: Sparta cluster job launched correctly ---> Sparta cluster job launched correctly
Submission Id: undefined ---> undefined
Submission Status: UNKNOWN ---> CONNECTED
Marathon Id: undefined ---> undefined
Last Error: undefined ---> undefined
Last Execution Mode: yarn-cluster ---> yarn-cluster
Resource Manager URL: undefined ---> undefined
02 Jul 2018 15:29:34.299 INFO c.s.s.s.c.a.ClusterLauncherActor Submission state changed to ... LOST
02 Jul 2018 15:29:34.301 INFO c.s.s.s.c.a.ClusterLauncherActor Updating context d23359d0-de5b-4589-bb5a-236b1bde8eed with name test1:
Status: Launched ---> Launched
Status Information: Sparta cluster job launched correctly ---> Sparta cluster job launched correctly
Submission Id: undefined ---> undefined
Submission Status: CONNECTED ---> LOST
Marathon Id: undefined ---> undefined
Last Error: undefined ---> undefined
Last Execution Mode: yarn-cluster ---> yarn-cluster
Resource Manager URL: undefined ---> undefined
02 Jul 2018 15:29:51.657 INFO c.s.s.s.core.actor.StatusActor Updating context d23359d0-de5b-4589-bb5a-236b1bde8eed with name test1:
Status: Launched ---> Stopping
Status Information: Sparta cluster job launched correctly ---> Sparta cluster job launched correctly
Submission Id: undefined ---> undefined
Submission Status: LOST ---> LOST
Marathon Id: undefined ---> undefined
Last Error: undefined ---> undefined
Last Execution Mode: yarn-cluster ---> yarn-cluster
Resource Manager URL: undefined ---> undefined
02 Jul 2018 15:29:51.678 INFO c.s.s.s.c.a.ClusterLauncherActor Stopping message received from Zookeeper
02 Jul 2018 15:29:51.678 INFO c.s.s.s.c.a.ClusterLauncherActor The Sparta System don't have submission id associated to policy test1
02 Jul 2018 15:29:51.679 INFO c.s.s.s.c.a.ClusterLauncherActor Node cache to cluster context listener closed correctly
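In case it helps with diagnosis: the values in the "Driver arguments" log entry appear to be Base64-encoded, and the longer ones look like JSON. Below is a minimal Python sketch for decoding them to check which settings were actually passed to the driver. The helper name `decode_driver_arg` is hypothetical and not part of Sparta; the encoded strings are copied from the log above, and the remaining arguments (clusterConfig, detailConfig) can be pasted in the same way.

```python
import base64
import json


def decode_driver_arg(value):
    """Decode one Base64 driver argument and parse it as JSON when possible."""
    try:
        text = base64.b64decode(value).decode("utf-8")
    except ValueError:
        # Not valid Base64 (or not UTF-8 text): return the value unchanged.
        return value
    try:
        return json.loads(text)
    except ValueError:
        # Decoded correctly but is not JSON: return the plain text.
        return text


# Values copied from the "Driver arguments" log entry.
driver_args = {
    "plugins": "ICw=",
    "storageConfig": "IA==",
    "zookeeperConfig": (
        "eyJ6b29rZWVwZXIiOnsiY29ubmVjdGlvblN0cmluZyI6IjEwLjAuMTEuMjI6MjE4MSwx"
        "MC4wLjExLjMwOjIxODEsMTAuMC4xMS4zMToyMTgxIiwiY29ubmVjdGlvblRpbWVvdXQi"
        "OjE1MDAwLCJyZXRyeUF0dGVtcHRzIjo1LCJyZXRyeUludGVydmFsIjoxMDAwMCwic2Vz"
        "c2lvblRpbWVvdXQiOjYwMDAwfX0="
    ),
}

for name, encoded in driver_args.items():
    print(name, "->", repr(decode_driver_arg(encoded)))
```

The helper falls back to the raw text when a value is not JSON, so short placeholder values such as plugins and storageConfig are still printed rather than raising an error.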