FAQ: How to run Hadoop-free Spark in PadoGrid

How do I run Hadoop-free Spark in PadoGrid? I want to include my own Hadoop. Where do I set SPARK_DIST_CLASSPATH?
First, make sure you have Hadoop installed. You can install Hadoop in PadoGrid using install_padogrid as follows.
# Download and install Hadoop
install_padogrid -product hadoop
# Update the workspace with the desired Hadoop version
update_product -product hadoop
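
To confirm that the workspace now points to a working Hadoop installation, you can check HADOOP_HOME and ask Hadoop for its version. This is just a quick sanity check; the version printed depends on the release you installed.

# Verify that HADOOP_HOME is set and points to a working installation
echo $HADOOP_HOME
$HADOOP_HOME/bin/hadoop version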
There are two ways to include the Hadoop class path in Spark in PadoGrid.

The first way is to edit the bin_sh/setenv.sh file:
switch_cluster myspark
vi bin_sh/setenv.sh
In bin_sh/setenv.sh, uncomment the following line. HADOOP_HOME is set by PadoGrid when you updated the workspace with update_product. See above.
CLASSPATH="$CLASSPATH:$($HADOOP_HOME/bin/hadoop classpath)"
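
If you want to see exactly what this line appends, you can run the same hadoop classpath command yourself. It prints the colon-separated list of Hadoop directories and jars that Spark will pick up.

# Print the Hadoop class path that the line above appends to CLASSPATH
$HADOOP_HOME/bin/hadoop classpath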
Another way is to add SPARK_DIST_CLASSPATH in spark-env.sh as described in the Spark documentation [1]. This overrides bin_sh/setenv.sh.

Edit the etc/spark-env.sh file:
switch_cluster myspark
vi etc/spark-env.sh
In etc/spark-env.sh, add the following line. HADOOP_HOME is set by PadoGrid when you updated the workspace with update_product. See above.
export SPARK_DIST_CLASSPATH=$CLASSPATH:$($HADOOP_HOME/bin/hadoop classpath)
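
To verify that Spark can now load your Hadoop classes, one quick check is to resolve a Hadoop class from the Spark shell. This is a sketch; the version string printed depends on your Hadoop installation.

# Start the Spark shell, then load a Hadoop class in the Scala REPL
spark-shell
scala> org.apache.hadoop.util.VersionInfo.getVersion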
✏️ The CLASSPATH set by PadoGrid includes your cluster and workspace libraries in their plugins and lib directories. You can drop your jar files into any of these directories to make them part of the Spark class path, as shown below.
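
For example, assuming the default workspace layout and a cluster named myspark (both my-app.jar and myspark are placeholders here), dropping a jar into the cluster's plugins directory makes it part of the class path the next time the cluster members start:

# Hypothetical example: my-app.jar is a placeholder for your own jar
cp my-app.jar $PADOGRID_WORKSPACE/clusters/myspark/plugins/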
[1] Using Spark's "Hadoop Free" Build, https://spark.apache.org/docs/latest/hadoop-provided.html