FAQ: How to run Hadoop-free Spark in PadoGrid

How do I run Hadoop-free Spark in PadoGrid? I want to include my own Hadoop. Where do I set SPARK_DIST_CLASSPATH?
First, make sure you have Hadoop installed. You can install Hadoop in PadoGrid using install_padogrid as follows.
# Download and install Hadoop
install_padogrid -product hadoop
# Update the workspace with the desired Hadoop version
update_product -product hadoop
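
To confirm that the workspace now points to a working Hadoop installation, you can check HADOOP_HOME and ask Hadoop for its version. This is just a quick sanity check; the version printed depends on the release you installed.

# Verify that HADOOP_HOME is set and points to a working installation
echo $HADOOP_HOME
$HADOOP_HOME/bin/hadoop version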
There are two ways to include the Hadoop class path in Spark in PadoGrid.

The first way is to edit the bin_sh/setenv.sh file:
switch_cluster myspark
vi bin_sh/setenv.sh
In bin_sh/setenv.sh, uncomment the following line. HADOOP_HOME is set by PadoGrid when you updated the workspace with update_product. See above.
CLASSPATH="$CLASSPATH:$($HADOOP_HOME/bin/hadoop classpath)"
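
If you want to see exactly what this line appends, you can run the same hadoop classpath command yourself. It prints the colon-separated list of Hadoop directories and jars that Spark will pick up.

# Print the Hadoop class path that the line above appends to CLASSPATH
$HADOOP_HOME/bin/hadoop classpath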
Another way is to add SPARK_DIST_CLASSPATH in spark-env.sh as described in the Spark documentation [1]. This overrides bin_sh/setenv.sh.

Edit the etc/spark-env.sh file:
switch_cluster myspark
vi etc/spark-env.sh
In etc/spark-env.sh, add the following line. HADOOP_HOME is set by PadoGrid when you updated the workspace with update_product. See above.
export SPARK_DIST_CLASSPATH=$CLASSPATH:$($HADOOP_HOME/bin/hadoop classpath)
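
To verify that Spark can now load your Hadoop classes, one quick check is to resolve a Hadoop class from the Spark shell. This is a sketch; the version string printed depends on your Hadoop installation.

# Start the Spark shell, then load a Hadoop class in the Scala REPL
spark-shell
scala> org.apache.hadoop.util.VersionInfo.getVersion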
✏️ The CLASSPATH set by PadoGrid includes your cluster and workspace libraries in their plugins and lib directories. You can drop your jar files into any of these directories to make them part of the Spark class path, as shown below.
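
For example, assuming the default workspace layout and a cluster named myspark (both my-app.jar and myspark are placeholders here), dropping a jar into the cluster's plugins directory makes it part of the class path the next time the cluster members start:

# Hypothetical example: my-app.jar is a placeholder for your own jar
cp my-app.jar $PADOGRID_WORKSPACE/clusters/myspark/plugins/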
[1] Using Spark's "Hadoop Free" Build, https://spark.apache.org/docs/latest/hadoop-provided.html