This initialization action installs Starburst Presto on a Dataproc cluster. This script also configures Presto to work with Apache Hive on the cluster. The master Dataproc node will be the coordinator, and all Dataproc workers will be Presto workers.
Use this initialization action to create a Dataproc cluster with Presto installed:
- Use the `gcloud` command to create a new cluster that runs this initialization action.

      REGION=<region>
      CLUSTER_NAME=<cluster_name>
      gcloud dataproc clusters create ${CLUSTER_NAME} \
          --region ${REGION} \
          --initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/starburst-presto/presto.sh

- Presto is configured to run on port `8080` on the cluster's master node (you can change the port assignment in the script). To connect to the Presto web interface, create an SSH tunnel and use a SOCKS5 proxy (see Dataproc cluster web interfaces). You can also use the `presto` command line interface on the master node.
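
The connection step above can be sketched as follows. This is a minimal sketch, not a definitive procedure: the `PROJECT` and `ZONE` variables, their default values, and the local SOCKS port `1080` are assumptions that do not come from this document, and the master hostname assumes the usual Dataproc `<cluster_name>-m` naming for a non-HA cluster. The commands are printed rather than executed so they can be reviewed first.

```shell
# Assumed placeholder values; replace them with your own.
PROJECT="${PROJECT:-my-project}"
ZONE="${ZONE:-us-central1-a}"
CLUSTER_NAME="${CLUSTER_NAME:-presto-cluster}"
SOCKS_PORT=1080    # assumption: any free local port can be used
PRESTO_PORT=8080   # default Presto port set by this initialization action

# Assumed naming: the master node of a non-HA cluster is <cluster_name>-m.
MASTER="${CLUSTER_NAME}-m"

# Step 1: SSH tunnel that exposes a local SOCKS5 proxy (-D) and runs no remote command (-N).
TUNNEL_CMD="gcloud compute ssh ${MASTER} --project=${PROJECT} --zone=${ZONE} -- -D ${SOCKS_PORT} -N"

# Step 2: URL to open in a browser configured to use socks5://localhost:${SOCKS_PORT}.
PRESTO_UI_URL="http://${MASTER}:${PRESTO_PORT}"

# Print both values for review; run the tunnel command in a separate terminal.
echo "${TUNNEL_CMD}"
echo "${PRESTO_UI_URL}"
```

Once logged in to the master node, `presto --catalog hive --schema default` should open an interactive session against the Hive connector, assuming the catalog is registered under the name `hive`.
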
- Update the script to specify a Presto version to install.
- You can adjust the memory settings in `jvm.config`.
- By default, Presto uses HTTP port `8080`. Use the `presto-port` metadata value to configure a different port, for example, `--metadata presto-port=8060`.
- The Hive connector is configured by default.
- Dataproc High-Availability mode is not recommended because the coordinator is started only on `m-0`, and other master nodes will be idle.
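
As an illustration of the `presto-port` note above, the following sketch assembles a cluster-creation command that uses a non-default port. The default values for `REGION` and `CLUSTER_NAME` are placeholders chosen for this example, not values from this document, and the command is echoed rather than run so it can be checked before use.

```shell
# Assumed placeholder values; replace them with your own.
REGION="${REGION:-us-central1}"
CLUSTER_NAME="${CLUSTER_NAME:-presto-cluster}"
PRESTO_PORT=8060   # example port taken from the note above

# Regional bucket path of the initialization action, as used in the create command.
INIT_ACTION="gs://goog-dataproc-initialization-actions-${REGION}/starburst-presto/presto.sh"

# Dry run: prints the command. Remove the leading "echo" to create the cluster.
echo gcloud dataproc clusters create "${CLUSTER_NAME}" \
    --region "${REGION}" \
    --metadata "presto-port=${PRESTO_PORT}" \
    --initialization-actions "${INIT_ACTION}"
```

With this setting, the Presto web interface would be reached on port `8060` of the master node instead of `8080`.
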