Avoid sending SubDataset and use broadcast for datasets #140
Conversation
In mpi4py, objects sent with Comm.send() are pickled on every call. Because every SubDataset holds a reference to the whole dataset in self._dataset [1], the whole dataset was pickled once per rank > 0 node, which made scatter_dataset() inefficient: distributing the ImageNet 2012 dataset, even though it is just file names and labels, took more than 30 seconds, whereas the optimized version finishes in less than 5 seconds in our experimental environment.

This patch replaces the repeated Comm.send() calls with a combination of a Comm.bcast() of the whole dataset and a Comm.send() of each rank's index range. With this change, the number of times the whole dataset is pickled drops from N to 1, where N denotes the number of MPI processes.

This fix has a drawback in error handling, especially when mpi4py raises OverflowError. The rank 0 node can handle that error because it observes it, while the other nodes may or may not know about it and may busy-wait forever for the broadcast message. This behavior depends on the MPI implementation, as it is not defined in the MPI 3.1 specification [2], as far as I know.

To solve this second issue, the dataset passed to scatter_dataset() is forcibly split into chunks whose default size is 256 MB. The chunk size is configurable, but must not be larger than the maximum value of a signed integer. The chunks are then sent through a series of Comm.Bcast() calls instead of a single Comm.bcast(). Also, chainermn.datasets.DataSizeError is emptied.

Note that any dataset passed to scatter_dataset() is pickled into a single buffer without any streaming, so pickling and unpickling may take time proportional to the size of the dataset.

[1] https://github.com/chainer/chainer/blob/v3.1.0/chainer/datasets/sub_dataset.py#L50
[2] http://mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf

Signed-off-by: UENISHI Kota <[email protected]>
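For illustration, a minimal sketch of the broadcast-based scattering described above, assuming mpi4py. scatter_dataset_sketch and CHUNK_SIZE are illustrative names, not the actual ChainerMN API, and each rank computes its own slice locally instead of receiving a (begin, end) range via Comm.send() as the real patch does.

    import pickle

    from mpi4py import MPI

    CHUNK_SIZE = 256 * 1024 * 1024  # 256 MB; must stay below the signed-int max


    def scatter_dataset_sketch(dataset, comm=MPI.COMM_WORLD):
        """Pickle once on rank 0, broadcast in chunks, return this rank's slice."""
        if comm.rank == 0:
            buf = bytearray(pickle.dumps(dataset, pickle.HIGHEST_PROTOCOL))
        else:
            buf = None
        # The lowercase bcast() pickles its argument, which is cheap here
        # because the argument is just an int (the pickled size).
        total = comm.bcast(len(buf) if comm.rank == 0 else None, root=0)
        if comm.rank != 0:
            buf = bytearray(total)
        # Buffer-based Bcast() in fixed-size chunks avoids the OverflowError
        # mpi4py raises when a single message exceeds the signed-int limit,
        # and pickles the whole dataset only once instead of N times.
        view = memoryview(buf)
        for offset in range(0, total, CHUNK_SIZE):
            comm.Bcast(view[offset:offset + CHUNK_SIZE], root=0)
        if comm.rank != 0:
            dataset = pickle.loads(bytes(buf))
        # Each rank keeps a contiguous, roughly equal slice of the dataset.
        n = len(dataset)
        b = n * comm.rank // comm.size
        e = n * (comm.rank + 1) // comm.size
        return dataset[b:e]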
-class DataSizeError(RuntimeError):
-    def __init__(self, ds_size, pickled_size):
-        msg = """The dataset was too large to be scattered using MPI.
+class DataSizeError(object):
In my environment (Python 3.6.0), the following error is raised in test_dataset.py.
======================================================================
ERROR: test_scatter_large_dataset (test_dataset.TestDataset)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/kfukuda/chainermn/tests/test_dataset.py", line 102, in test_scatter_large_dataset
lambda: self.scatter_large_data(comm_type))
File "/home/kfukuda/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/unittest/case.py", line 728, in assertRaises
return context.handle('assertRaises', args, kwargs)
File "/home/kfukuda/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/unittest/case.py", line 158, in handle
(name, self._base_type_str))
TypeError: assertRaises() arg 1 must be an exception type or tuple of exception types
I guess this is because DataSizeError does not inherit from RuntimeError or any other exception class.
I'm not sure whether it is a version-dependent error, but could you look into it?
This is not because of DataSizeError; the test simply does not fail any more! I pushed a test fix. I also changed DataSizeError to inherit from RuntimeError, which is a trivial change. The class is kept for applications that have an except DataSizeError:
clause, even though it is no longer raised.
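For reference, a minimal sketch of the shape that fix takes: unittest's assertRaises() only accepts exception types, so the class must derive from an exception class even if it is kept only for backward compatibility.

    class DataSizeError(RuntimeError):
        """No longer raised; kept so existing ``except DataSizeError:``
        clauses and ``assertRaises(DataSizeError, ...)`` calls stay valid."""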
Fixed, updated, and all tests have passed!
Note: this is a substitute and rebased version of #138.