We look at how to create a basic image segmentation pipeline, using the dask-image library.
Bonus content: using arrays on GPU
The content of this blog post originally appeared as a conference talk in 2020.
If you want to run this yourself, you’ll need to download the example data from the Broad Bioimage Benchmark Collection: https://bbbc.broadinstitute.org/BBBC039
And install the requirements described in the setup section below.
Here’s our full pipeline:
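(The code below is a sketch rather than the verbatim pipeline: the data path, the smoothing sigma, and the simple per-frame mean threshold are illustrative choices, each explained in the walkthrough that follows.)

```python
import numpy as np
from dask_image.imread import imread
from dask_image import ndfilters, ndmorph, ndmeasure

# Step 1: lazily read all frames into a single Dask array
images = imread('data/BBBC039/images/*.tif')

# Step 2: denoise each frame with a small Gaussian blur
smoothed = ndfilters.gaussian_filter(images, sigma=[0, 1, 1])

# Step 3: threshold each frame independently against its own mean intensity
thresh = smoothed.mean(axis=(1, 2), keepdims=True)
threshold_images = smoothed > thresh

# Step 4: clean up the binary mask with a morphological opening, using a
# structuring element that doesn't mix neighboring frames
structuring_element = np.array([
    [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
    [[0, 1, 0], [1, 1, 1], [0, 1, 0]],
    [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
])
binary_images = ndmorph.binary_opening(threshold_images,
                                       structure=structuring_element)

# Step 5: label connected objects and measure their size and intensity
label_images, num_features = ndmeasure.label(binary_images,
                                             structure=structuring_element)
index = np.arange(1, int(num_features.compute()) + 1)
area = ndmeasure.area(images, label_images, index)
mean_intensity = ndmeasure.mean(images, label_images, index)
```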
You can keep reading for a step by step walkthrough of this image segmentation pipeline, or you can skip ahead to the sections on custom functions, scaling up computation, or GPU acceleration.
Our basic image segmentation pipeline has five steps:

1. Reading in the image data
2. Denoising the images with a Gaussian filter
3. Segmenting the objects from the background with thresholding
4. Cleaning up the binary mask with morphological operations
5. Measuring the labeled objects
Before we begin, we’ll need to set up our Python virtual environment.

At a minimum, you’ll need the dask-image library (installing it also brings in Dask and NumPy).
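For example, installing with pip (conda works just as well; no particular version is pinned here):

```
pip install dask-image
```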
Optionally, you can also install the napari image viewer to visualize the image segmentation.
To use napari from IPython or Jupyter, run the %gui qt magic in a cell before calling napari. See the napari documentation for more details.
We’ll use the publicly available image dataset BBBC039 (Caicedo et al. 2018), available from the Broad Bioimage Benchmark Collection (Ljosa et al., Nature Methods, 2012). You can download the dataset here: https://bbbc.broadinstitute.org/BBBC039
These are fluorescence microscopy images, where we see the nuclei in individual cells.
Step one in our image segmentation pipeline is to read in the image data. We can do that with the dask-image imread function.
We pass the path to the folder full of *.tif images from our example data.
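Here’s a sketch of that step (the path assumes the BBBC039 images were unzipped into a local data/ folder; adjust it to wherever you saved the data):

```python
from dask_image.imread import imread

# Lazily read every frame in the folder into a single Dask array
# (one chunk per .tif file on disk)
images = imread('data/BBBC039/images/*.tif')
print(images)
```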
By default, each individual .tif file on disk has become one chunk in our Dask array.
Denoising images with a small amount of blur can improve segmentation later on. This is a common first step in a lot of image segmentation pipelines. We can do this with the dask-image gaussian_filter function.
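For example (the sigma values here are illustrative; sigma=0 along the first axis means frames are smoothed individually, not blurred into each other):

```python
from dask_image import ndfilters

# Apply a light Gaussian blur within each 2D frame
smoothed = ndfilters.gaussian_filter(images, sigma=[0, 1, 1])
```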
Next, we want to separate the objects in our images from the background. There are lots of different ways we could do this. Because we have fluorescent microscopy images, we’ll use a thresholding method.
We could set an absolute threshold value, where we’d consider pixels with intensity values below this threshold to be part of the background.
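For example, with an arbitrary cutoff value (160 here is just an illustrative guess):

```python
# Keep only pixels brighter than a fixed cutoff value
absolute_threshold = smoothed > 160
```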
Let’s have a look at these images with the napari image viewer. First we’ll need to use the %gui qt magic:
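That magic command is just:

```python
%gui qt
```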
And now we can look at the images with napari:
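For example, adding both the raw images and the thresholded result as layers:

```python
import napari

viewer = napari.Viewer()
viewer.add_image(images)
viewer.add_image(absolute_threshold)
```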
But there’s a problem here.
When we look at the results for different image frames, it becomes clear that there is no “one size fits all” we can use for an absolute threshold value. Some images in the dataset have quite bright backgrounds, others have fluorescent nuclei with low brightness. We’ll have to try a different kind of thresholding method.
We can improve the segmentation using a local thresholding method.
If we calculate a threshold value independently for each image frame then we can avoid the problem caused by fluctuating background intensity between frames.
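One simple way to do this with plain Dask array operations is to compare each frame against its own mean intensity (this per-frame mean threshold is a stand-in for illustration; a dedicated local thresholding function would also work):

```python
# One threshold value per frame, shape (nframes, 1, 1),
# broadcast against the pixels of the corresponding frame
thresh = smoothed.mean(axis=(1, 2), keepdims=True)
local_threshold_images = smoothed > thresh
```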
The results here look much better: this is a much cleaner separation of nuclei from the background, and it looks good for all the image frames.
Now that we have a binary mask from our threshold, we can clean it up a bit with some morphological operations.
Morphological operations are changes we make to the shape of structures in a binary image. We’ll briefly describe some of the basic concepts here, but for a more detailed reference you can look at this excellent page of the OpenCV documentation.
Erosion is an operation where the edges of structures in a binary image are eaten away, or eroded.
Image credit: OpenCV documentation
Dilation is the opposite of an erosion. With dilation, the edges of structures in a binary image are expanded.
Image credit: OpenCV documentation
We can combine morphological operations in different ways to get useful effects.
A morphological opening operation is an erosion, followed by a dilation.
Image credit: OpenCV documentation
In the example image above, we can see the left hand side has a noisy, speckled background. If the structuring element used for the morphological operations is larger than the size of the noisy speckles, they will disappear completely in the first erosion step. Then when it is time to do the second dilation step, there’s nothing left of the noise in the background to dilate. So we have removed the noise in the background, while the major structures we are interested in (in this example, the J shape) are restored almost perfectly.
Let’s use this morphological opening trick to clean up the binary images in our segmentation pipeline.
You’ll notice here that we need to be a little bit careful about the structuring element. All our image frames are combined in a single Dask array, but we only want to apply the morphological operation independently to each frame. To do this, we sandwich the default 2D structuring element between two layers of zeros. This means the neighboring image frames have no effect on the result.
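Here’s a sketch of this step, building the 3D structuring element described above (variable names carry over from the earlier steps):

```python
import numpy as np
from dask_image import ndmorph

# The default 2D cross-shaped structuring element, sandwiched between
# two layers of zeros so neighboring frames have no effect on the result
structuring_element = np.array([
    [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
    [[0, 1, 0], [1, 1, 1], [0, 1, 0]],
    [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
])

binary_images = ndmorph.binary_opening(local_threshold_images,
                                       structure=structuring_element)
```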
The last step in any image processing pipeline is to make some kind of measurement. We’ll turn our binary mask into a label image, and then measure the intensity and size of those objects.
For the sake of keeping the computation time in this tutorial nice and quick, we’ll measure only a small subset of the data. Let’s measure all the objects in the first three image frames (roughly 300 nuclei).
Here’s a screenshot of the label image generated from our mask.
The dask-image ndmeasure subpackage includes a number of different measurement functions. In this example, we’ll choose to measure:

- the area (size in pixels) of each object
- the mean intensity of each object
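A sketch of this step, measuring only the first three frames as described above (variable names carry over from the earlier steps):

```python
import numpy as np
from dask_image import ndmeasure

# Label connected components in the first three frames
label_images, num_features = ndmeasure.label(binary_images[:3],
                                             structure=structuring_element)
num = int(num_features.compute())

# Measure the area and mean intensity of every labeled object
index = np.arange(1, num + 1)
area = ndmeasure.area(images[:3], label_images, index)
mean_intensity = ndmeasure.mean(images[:3], label_images, index)

print(area.compute())
print(mean_intensity.compute())
```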
What if you want to do something that isn’t included in the dask-image API? There are several options we can use to write custom functions.
The Dask array methods map_overlap and map_blocks are used to build most of the functions in dask-image. You can use them yourself, too: they apply a function to each chunk in a Dask array.
You can read more about overlapping computations here.
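As a toy example, here’s a hypothetical custom function applied to every chunk with map_blocks (if your function needs to see pixels from neighboring chunks, use map_overlap with a depth argument instead):

```python
def my_custom_function(block):
    # Receives one NumPy chunk at a time and must return an array
    return block - block.mean()

result = images.map_blocks(my_custom_function, dtype=float)
```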
If you want more flexibility and fine-grained control over your computation, then you can use Dask delayed. You can get started with the Dask delayed tutorial here.
If you’re a person who does a lot of image processing in Python, one tool you’re likely to already be using is scikit-image. I’d like to draw your attention to the apply_parallel function available in scikit-image. It uses map_overlap under the hood, and can be very helpful.

It’s useful not only when you have big data, but also when your data fits into memory and the computation you want to apply is memory intensive. That might cause you to exceed the available RAM, and apply_parallel is great for these situations too.
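For example, running a scikit-image filter over chunks of a large array (the chunk size, overlap depth, and sigma below are illustrative):

```python
import numpy as np
from skimage import filters
from skimage.util import apply_parallel

image = np.random.random((4096, 4096))

# Run the filter on 1024 x 1024 chunks, with an 8 pixel overlap
# between chunks to avoid artifacts at the chunk edges
result = apply_parallel(
    filters.gaussian,
    image,
    chunks=(1024, 1024),
    depth=8,
    extra_keywords={'sigma': 2},
)
```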
When you want to scale up from a laptop onto a supercomputing cluster, you can use dask-distributed to handle that.
See the documentation here to get set up for your system.
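A minimal sketch (the scheduler address shown in the comment is a placeholder):

```python
from dask.distributed import Client

# Start a local cluster of workers on this machine...
client = Client()

# ...or connect to the scheduler of an existing cluster, e.g.
# client = Client('tcp://scheduler-address:8786')
```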
We’ve recently been adding GPU support to dask-image.
We’re able to add GPU support using a library called CuPy. CuPy is an array library with a numpy-like API, accelerated with NVIDIA CUDA. Instead of having Dask arrays which contain numpy chunks, we can have Dask arrays containing cupy chunks instead. This blogpost explains the benefits of GPU acceleration and gives some benchmarks for computations on CPU, a single GPU, and multiple GPUs.
From dask-image version 0.6.0 onwards, four of the six dask-image subpackages have GPU array support. The remaining subpackages do not yet have GPU support, but we hope to add it in the future.
Here’s an example of an image convolution with Dask on the CPU:
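For instance (the array contents and kernel here are arbitrary):

```python
import numpy as np
import dask.array as da
from dask_image import ndfilters

# A NumPy-backed Dask array and a NumPy weights kernel
image = da.from_array(np.random.random((512, 512)), chunks=(256, 256))
w = np.ones((3, 3)) / 9

result = ndfilters.convolve(image, w)
result.compute()
```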
And here’s the same example of an image convolution with Dask on the GPU. The only thing necessary to change is the type of input arrays.
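(This assumes a CUDA-capable GPU and an installed cupy package.)

```python
import cupy
import dask.array as da
from dask_image import ndfilters

# The same computation, but with CuPy-backed chunks and CuPy weights
image = da.from_array(cupy.random.random((512, 512)), chunks=(256, 256))
w = cupy.ones((3, 3)) / 9

result = ndfilters.convolve(image, w)
result.compute()
```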
You can’t mix arrays on the CPU and arrays on the GPU in the same computation. This is why the weights w must be a cupy array in the second example above.
Additionally, you can transfer data between the CPU and GPU. In situations where the GPU speedup is larger than the cost of transferring data, this can be worthwhile.
Generally, we want to start our image processing by reading in data from images stored on disk. We can read directly into GPU memory by passing the arraytype="cupy" keyword argument to the dask-image imread function.
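For example (same placeholder path as earlier):

```python
from dask_image.imread import imread

# Read frames from disk directly into CuPy-backed chunks
images = imread('data/BBBC039/images/*.tif', arraytype="cupy")
```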
Create and share your own segmentation or image processing workflows with Dask (join the current discussion on segmentation or propose a new blogpost topic here)
Contribute to adding GPU support to dask-image: https://github.com/dask/dask-image/issues/133