Submit New Event

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

Submit News Feature

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

Sign up for Newsletter

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Jun 22, 2019

Dask Release 2.0

By

Please take the Dask User Survey for 2019.Your reponse helps to prioritize future work.

We are pleased to announce the release of Dask version 2.0.This is a major release with bug fixes and new features.

Most major version changes of software signal many new and exciting features.That is not the case with this release.Instead, we’re bumping the major version number becausewe’ve broken a few APIs to improve maintainability,and because we decided to drop support for Python 2.

This blogpost outlines these changes.

Install

As always, you can conda install Dask:

conda install dask

or pip install from PyPI:

pip install "dask[complete]" --upgrade

Full changelogs are available here:

Drop support for Python 2

Python 2 reaches end of life in 2020, just six months away. Most major PyDataprojects are dropping Python 2 support around now. See the Python 3Statement for more details about some of yourfavorite projects.

Python 2 users can continue to use older versions of Dask, which are inwidespread use today. Institutions looking for long term support of Dask inPython 2 may wish to reach out to for-profit consulting companies, likeQuansight.

Dropping Python 2 will allow maintainers to spend more of their time fixingbugs and developing new features. It will also allow the project to adopt moremodern development practices going forward.

Small breaking changes

We now include a list with a brief description of most of the breaking changes:

  • The distributed.bokeh module has moved to distributed.dashboard
  • Various ncores keywords have been moved to nthreads
  • Client.map/gather/scatter no longer accept iterators and Python queueobjects. Users can handle this themselves with submit/as_completed orcan use the Streamz library.
  • The worker /main route has moved to /status
  • Cluster.workers is now a dictionary mapping worker name to worker, ratherthan a list as it was before

Some larger fun changes

We didn’t only break things. We also added some new things :)

Array metadata

Previously Dask Arrays were defined by their shape, chunkshape, and datatype,like float, int, and so on.

Now, Dask Arrays also know the type of their chunks. Historically this wasalmost always a NumPy array, so it didn’t make sense to store, but now thatDask Arrays are being used more frequently with sparse array chunks and GPUarray chunks we now maintain this information as well in a ._meta attribute.This is already how Dask dataframes work, so it should be familiar to advancedusers of that module.

>>> import dask.array as da
>>> x = da.eye(1000000)
>>> x._meta
array([], shape=(0, 0), dtype=float64)

>>> import sparse
>>> s = x.map_blocks(sparse.COO.from_numpy)
>>> s._meta
<COO: shape=(0, 0), dtype=float64, nnz=0, fill_value=0.0>

This work was largely done by Peter Entschev

Array HTML output

Dask arrays now print themselves nicely in Jupyter notebooks, showing a tableof information about their size and chunk size, and also a visual diagram oftheir chunk structure.

import dask.array as da
x = da.ones((10000, 1000, 1000))

Array Chunk Bytes 80.00 GB 125.00 MB Shape (10000, 1000, 1000) (250, 250, 250) Count 640 Tasks 640 Chunks Type float64 numpy.ndarray 1000100010000

Proxy Worker dashboards from the Scheduler dashboard

If you’ve used Dask.distributed they you’re probably familiar with Dask’sscheduler dashboard, which shows the state of computations on the cluster witha real-time interactive Bokeh dashboard. However you maynot be aware that Dask workers also have their own dashboard, which shows acompletely separate set of plots for the state of that individual worker.

Historically these worker dashboards haven’t been as commonly used because it’shard to connect to them. Users don’t know their address, or network rulesdon’t enable direct web connections. Fortunately, the scheduler dashboard isnow able to proxy a connection from the user to the worker dashbaord.

You can access this by clicking on the “Info” tab and then selecting the“dashboard” link next to any of the workers. You will need to also installjupyter-server-proxy

pip install jupyter-server-proxy

Thanks to Ben Zaitlen for this fun addtition.We hope that now that these plots are made more visible, people will investmore into developing plots for them.

Black everywhere

We now use the Black code formatter throughoutmost Dask repositories. These repositories include pre-commit hooks, which werecommend when developing on the project.

cd /path/to/dask
git checkout master
git pull upstream master

pip install pre-commit
pre-commit install

Git will then call black and flake8 whenever you attempt to commit code.

Dask Gateway

We would also like to inform readers about the somewhat new DaskGateway project that enablesinstitutions and IT to control many Dask clusters for a variety of users.

Dask Gateway

Acknowledgements

There have been several releases since the last time we had a release blogpost.The following people contributed to the following repositories since the 1.1.0release on January 23rd:

  • dask/dask
  • (Rick) Richard J Zamora
  • Abhinav Ralhan
  • Adam Beberg
  • Alistair Miles
  • Álvaro Abella Bascarán
  • Anderson Banihirwe
  • Aploium
  • Bart Broere
  • Benjamin Zaitlen
  • Bouwe Andela
  • Brett Naul
  • Brian Chu
  • Bruce Merry
  • Christian Hudon
  • Cody Johnson
  • Dan O’Donovan
  • Daniel Saxton
  • Daniel Severo
  • Danilo Horta
  • Dimplexion
  • Elliott Sales de Andrade
  • Endre Mark Borza
  • Genevieve Buckley
  • George Sakkis
  • Guillaume Lemaitre
  • HSR05
  • Hameer Abbasi
  • Henrique Ribeiro
  • Henry Pinkard
  • Hugo
  • Ian Bolliger
  • Ian Rose
  • Isaiah Norton
  • James Bourbeau
  • Janne Vuorela
  • John Kirkham
  • Jim Crist
  • Joe Corbett
  • Jorge Pessoa
  • Julia Signell
  • JulianWgs
  • Justin Poehnelt
  • Justin Waugh
  • Ksenia Bobrova
  • Lijo Jose
  • Marco Neumann
  • Mark Bell
  • Martin Durant
  • Matthew Rocklin
  • Michael Eaton
  • Michał Jastrzębski
  • Nathan Matare
  • Nick Becker
  • Paweł Kordek
  • Peter Andreas Entschev
  • Philipp Rudiger
  • Philipp S. Sommer
  • Roma Sokolov
  • Ross Petchler
  • Scott Sievert
  • Shyam Saladi
  • Søren Fuglede Jørgensen
  • Thomas Zilio
  • Tom Augspurger
  • Yu Feng
  • aaronfowles
  • amerkel2
  • asmith26
  • btw08
  • gregrf
  • mbarkhau
  • mcsoini
  • severo
  • tpanza
  • dask/distributed
  • Adam Beberg
  • Benjamin Zaitlen
  • Brett Jurman
  • Brett Randall
  • Brian Chu
  • Caleb
  • Chris White
  • Daniel Farrell
  • Elliott Sales de Andrade
  • George Sakkis
  • James Bourbeau
  • Jim Crist
  • John Kirkham
  • K.-Michael Aye
  • Loïc Estève
  • Magnus Nord
  • Manuel Garrido
  • Marco Neumann
  • Martin Durant
  • Mathieu Dugré
  • Matt Nicolls
  • Matthew Rocklin
  • Michael Delgado
  • Michael Spiegel
  • Muammar El Khatib
  • Nikos Tsaousis
  • Olivier Grisel
  • Peter Andreas Entschev
  • Sam Grayson
  • Scott Sievert
  • Tom Augspurger
  • Torsten Wörtwein
  • amerkel2
  • condoratberlin
  • deepthirajagopalan7
  • jukent
  • plbertrand
  • dask/dask-ml
  • Alejandro
  • Florian Rohrer
  • James Bourbeau
  • Julien Jerphanion
  • Matthew Rocklin
  • Nathan Henrie
  • Paul Vecchio
  • Ryan McCormick
  • Saadullah Amin
  • Scott Sievert
  • Sriharsha Atyam
  • Tom Augspurger
  • dask/dask-jobqueue
  • Andrea Zonca
  • Guillaume Eynard-Bontemps
  • Kyle Husmann
  • Levi Naden
  • Loïc Estève
  • Matthew Rocklin
  • Matyas Selmeci
  • ocaisa
  • dask/dask-kubernetes
  • Brian Phillips
  • Jacob Tomlinson
  • Jim Crist
  • Joe Hamman
  • Joseph Hamman
  • Matthew Rocklin
  • Tom Augspurger
  • Yuvi Panda
  • adam
  • dask/dask-examples
  • Christoph Deil
  • Genevieve Buckley
  • Ian Rose
  • Martin Durant
  • Matthew Rocklin
  • Matthias Bussonnier
  • Robert Sare
  • Tom Augspurger
  • Willi Rath
  • dask/dask-labextension
  • Daniel Bast
  • Ian Rose
  • Matthew Rocklin
  • Yuvi Panda