If you want, you can have Dask set up a Jupyter notebook server for you, co-located with the Dask scheduler. There are many ways to do this, but this blog post lists two.
Sometimes people inside large institutions have painful deployment setups. It can take them a while to stand up a process on a machine in their cluster with all of the appropriate network ports open. In that situation, it can be nice to do this just once, say for Dask, rather than twice, say for Dask and for Jupyter.
In these cases people should probably invest in a long-term solution like JupyterHub, or one of its enterprise variants, but this blog post gives a couple of hacks in the meantime.
If your Dask scheduler is already running, connect to it with a Client and run a Python function that starts up a Jupyter server.
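from dask.distributed import Client

client = Client("scheduler-address:8786")

def start_jupyter_server():
    from notebook.notebookapp import NotebookApp
    app = NotebookApp()
    # initialize() sets up the web server and binds its port; the scheduler's
    # already-running event loop then serves it, so app.start() is not called
    app.initialize([])  # add command line args here if you want

client.run_on_scheduler(start_jupyter_server)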
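Once `run_on_scheduler` returns, one quick way to check that Jupyter is actually listening is to try a plain TCP connection to its port from your own machine. The following is a small standard-library sketch. `port_is_open` is a hypothetical helper, and port 8888 is assumed only because it is Jupyter's usual default. A `False` result may simply mean your network blocks that port, which is the situation the proxy approach addresses.

```python
import socket

def port_is_open(host: str, port: int = 8888, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper: port 8888 is assumed to be where Jupyter listens.
    """
    try:
        # create_connection raises OSError (including timeouts and refusals) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_is_open("scheduler-address")` uses the same placeholder host as the `Client` above.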
If you have a complex networking setup (maybe you're on the cloud or HPC and had to open up a port explicitly), then you might want to install jupyter-server-proxy (which Dask also uses by default if installed), and then go to http://scheduler-address:8787/proxy/8888. The Dask dashboard can route your connection to Jupyter (Jupyter is also kind enough to do the same for Dask if it is the main service).
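The proxied address is the dashboard's host and port with `/proxy/<jupyter-port>` appended. As a small illustration, the following helper builds that URL. The function name is hypothetical, and the defaults 8787 (dashboard) and 8888 (Jupyter) are taken from the URL above.

```python
def proxied_jupyter_url(scheduler_host: str,
                        jupyter_port: int = 8888,
                        dashboard_port: int = 8787) -> str:
    """Build the dashboard URL that jupyter-server-proxy forwards to Jupyter.

    Hypothetical helper; the port defaults mirror the example URL in the text.
    """
    return f"http://{scheduler_host}:{dashboard_port}/proxy/{jupyter_port}"
```

Only the dashboard port needs to be reachable from outside. Jupyter's own port just has to be reachable from the scheduler machine itself.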
This is also a great opportunity to learn about the various ways of adding custom startup and teardown code to Dask. One such way is a preload script like the following:
# jupyter-preload.py
from notebook.notebookapp import NotebookApp

def dask_setup(scheduler):
    # Called by Dask while the scheduler is starting up
    app = NotebookApp()
    app.initialize([])
dask-scheduler --preload jupyter-preload.py
That script will run at an appropriate time during scheduler startup. You can also put this into the Dask configuration:
distributed:
  scheduler:
    preload: ["/path/to/jupyter-preload.py"]
This is mostly a hack. If you're at an institution, then you should ask for something like JupyterHub.
Or, you might want to run this in a separate subprocess, so that Jupyter and the Dask scheduler don't collide with each other. This shouldn't be much of a problem (they're both pretty lightweight), but isolating them probably makes sense.
Thanks to Nick Bollweg, who answered a question on this topic here.