Please take the Dask User Survey for 2019.Your reponse helps to prioritize future work.
We are pleased to announce the release of Dask version 2.0.This is a major release with bug fixes and new features.
Most major version changes of software signal many new and exciting features.That is not the case with this release.Instead, we’re bumping the major version number becausewe’ve broken a few APIs to improve maintainability,and because we decided to drop support for Python 2.
This blogpost outlines these changes.
As always, you can conda install Dask:
conda install dask
or pip install from PyPI:
pip install "dask[complete]" --upgrade
Full changelogs are available here:
Python 2 reaches end of life in 2020, just six months away. Most major PyDataprojects are dropping Python 2 support around now. See the Python 3Statement for more details about some of yourfavorite projects.
Python 2 users can continue to use older versions of Dask, which are inwidespread use today. Institutions looking for long term support of Dask inPython 2 may wish to reach out to for-profit consulting companies, likeQuansight.
Dropping Python 2 will allow maintainers to spend more of their time fixingbugs and developing new features. It will also allow the project to adopt moremodern development practices going forward.
We now include a list with a brief description of most of the breaking changes:
We didn’t only break things. We also added some new things :)
Previously Dask Arrays were defined by their shape, chunkshape, and datatype,like float, int, and so on.
Now, Dask Arrays also know the type of their chunks. Historically this wasalmost always a NumPy array, so it didn’t make sense to store, but now thatDask Arrays are being used more frequently with sparse array chunks and GPUarray chunks we now maintain this information as well in a ._meta attribute.This is already how Dask dataframes work, so it should be familiar to advancedusers of that module.
>>> import dask.array as da
>>> x = da.eye(1000000)
>>> x._meta
array([], shape=(0, 0), dtype=float64)
>>> import sparse
>>> s = x.map_blocks(sparse.COO.from_numpy)
>>> s._meta
<COO: shape=(0, 0), dtype=float64, nnz=0, fill_value=0.0>
This work was largely done by Peter Entschev
Dask arrays now print themselves nicely in Jupyter notebooks, showing a tableof information about their size and chunk size, and also a visual diagram oftheir chunk structure.
import dask.array as da
x = da.ones((10000, 1000, 1000))
Array Chunk Bytes 80.00 GB 125.00 MB Shape (10000, 1000, 1000) (250, 250, 250) Count 640 Tasks 640 Chunks Type float64 numpy.ndarray 1000100010000
If you’ve used Dask.distributed they you’re probably familiar with Dask’sscheduler dashboard, which shows the state of computations on the cluster witha real-time interactive Bokeh dashboard. However you maynot be aware that Dask workers also have their own dashboard, which shows acompletely separate set of plots for the state of that individual worker.
Historically these worker dashboards haven’t been as commonly used because it’shard to connect to them. Users don’t know their address, or network rulesdon’t enable direct web connections. Fortunately, the scheduler dashboard isnow able to proxy a connection from the user to the worker dashbaord.
You can access this by clicking on the “Info” tab and then selecting the“dashboard” link next to any of the workers. You will need to also installjupyter-server-proxy
pip install jupyter-server-proxy
Thanks to Ben Zaitlen for this fun addtition.We hope that now that these plots are made more visible, people will investmore into developing plots for them.
We now use the Black code formatter throughoutmost Dask repositories. These repositories include pre-commit hooks, which werecommend when developing on the project.
cd /path/to/dask
git checkout master
git pull upstream master
pip install pre-commit
pre-commit install
Git will then call black and flake8 whenever you attempt to commit code.
We would also like to inform readers about the somewhat new DaskGateway project that enablesinstitutions and IT to control many Dask clusters for a variety of users.
There have been several releases since the last time we had a release blogpost.The following people contributed to the following repositories since the 1.1.0release on January 23rd: