Sep 13, 2019

Co-locating a Jupyter Server and Dask Scheduler

If you want, you can have Dask set up a Jupyter notebook server for you, co-located with the Dask scheduler. There are many ways to do this, but this blog post lists two.

First, why would you do this?

Sometimes people inside of large institutions have complex deployment pains. It takes them a while to stand up a process running on a machine in their cluster, with all of the appropriate networking ports open and such. In that situation, it can sometimes be nice to do this just once, say for Dask, rather than twice, say for Dask and for Jupyter.

Probably in these cases people should invest in a long-term solution like JupyterHub, or one of its enterprise variants, but this blog post gives a couple of hacks in the meantime.

Hack 1: Create a Jupyter server from a Python function call

If your Dask scheduler is already running, connect to it with a Client and run a Python function that starts up a Jupyter server.

from dask.distributed import Client

client = Client("scheduler-address:8786")

def start_jupyter_server():
    from notebook.notebookapp import NotebookApp
    app = NotebookApp()
    app.initialize([])  # add command line args here if you want

client.run_on_scheduler(start_jupyter_server)

If you have a complex networking setup (maybe you’re on the cloud or HPC and had to open up a port explicitly) then you might want to install jupyter-server-proxy (which Dask also uses by default if installed), and then go to http://scheduler-address:8787/proxy/8888. The Dask dashboard can route your connection to Jupyter (Jupyter is also kind enough to do the same for Dask if it is the main service).
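If it helps to see that routing spelled out, here is a tiny sketch that builds the proxied URL. The helper name is my own, and the ports are just the defaults mentioned above (8787 for the Dask dashboard, 8888 for Jupyter), not anything Dask computes for you:

```python
# Build the dashboard-proxied URL for a Jupyter server running next to
# the scheduler. Assumes jupyter-server-proxy is installed on that machine.
def proxied_jupyter_url(scheduler_host, dashboard_port=8787, jupyter_port=8888):
    return f"http://{scheduler_host}:{dashboard_port}/proxy/{jupyter_port}/"

print(proxied_jupyter_url("scheduler-address"))
# → http://scheduler-address:8787/proxy/8888/
```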

Hack 2: Preload script

This is also a great opportunity to learn about the various ways of adding custom startup and teardown. One such way is a preload script like the following:

# jupyter-preload.py
from notebook.notebookapp import NotebookApp

def dask_setup(scheduler):
    app = NotebookApp()
    app.initialize([])

dask-scheduler --preload jupyter-preload.py

That script will run at an appropriate time during scheduler startup. You can also put this into configuration:

distributed:
  scheduler:
    preload: ["/path/to/jupyter-preload.py"]

Really though, you should use something else

This is mostly a hack. If you’re at an institution then you should ask for something like JupyterHub.

Or, you might also want to run this in a separate subprocess, so that Jupyter and the Dask scheduler don’t collide with each other. This shouldn’t be so much of a problem (they’re both pretty lightweight), but isolating them probably makes sense.

Thanks Nick!

Thanks to Nick Bollweg, who answered a question on this topic here.