Avoid sending SubDataset and use broadcast for datasets #140
Conversation
In mpi4py, objects sent with Comm.send() are pickled on every call. Because every SubDataset holds a reference to the whole dataset in self._dataset [1], the whole dataset was pickled once per rank > 0 node, which made scatter_dataset() inefficient: distributing the ImageNet 2012 dataset, even though it is just file names and labels, took more than 30 seconds, whereas the optimized version finishes in less than 5 seconds in our experimental environment.

This patch replaces the repeated Comm.send() calls with a combination of a Comm.bcast() of the whole dataset and a Comm.send() of each rank's index range. With this change, the number of times the whole dataset is pickled drops from N to 1, where N denotes the number of MPI processes.

This fix has a drawback in error handling, especially when mpi4py raises OverflowError. The rank 0 node can handle that error because it observes it, while the other nodes may or may not know about it and may busy-wait forever for the broadcast message. This behavior depends on the MPI implementation, as it is not defined in the MPI 3.1 specification [2], as far as I know.

To solve this second issue, the dataset passed to scatter_dataset() is forcibly split into chunks whose default size is 256 MB. The chunk size is configurable, but must not be larger than the maximum value of a signed integer. The chunks are then sent through a series of Comm.Bcast() calls instead of a single Comm.bcast(). Also, chainermn.datasets.DataSizeError is emptied.

Note that any dataset passed to scatter_dataset() is pickled into a single buffer without any streaming, so pickling and unpickling may take time proportional to the size of the dataset.

[1] https://github.com/chainer/chainer/blob/v3.1.0/chainer/datasets/sub_dataset.py#L50
[2] http://mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf

Signed-off-by: UENISHI Kota <[email protected]>
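For illustration, a minimal sketch of the broadcast-based scattering described above, assuming mpi4py. scatter_dataset_sketch and CHUNK_SIZE are illustrative names, not the actual ChainerMN API, and each rank computes its own slice locally instead of receiving a (begin, end) range via Comm.send() as the real patch does.

    import pickle

    from mpi4py import MPI

    CHUNK_SIZE = 256 * 1024 * 1024  # 256 MB; must stay below the signed-int max


    def scatter_dataset_sketch(dataset, comm=MPI.COMM_WORLD):
        """Pickle once on rank 0, broadcast in chunks, return this rank's slice."""
        if comm.rank == 0:
            buf = bytearray(pickle.dumps(dataset, pickle.HIGHEST_PROTOCOL))
        else:
            buf = None
        # The lowercase bcast() pickles its argument, which is cheap here
        # because the argument is just an int (the pickled size).
        total = comm.bcast(len(buf) if comm.rank == 0 else None, root=0)
        if comm.rank != 0:
            buf = bytearray(total)
        # Buffer-based Bcast() in fixed-size chunks avoids the OverflowError
        # mpi4py raises when a single message exceeds the signed-int limit,
        # and pickles the whole dataset only once instead of N times.
        view = memoryview(buf)
        for offset in range(0, total, CHUNK_SIZE):
            comm.Bcast(view[offset:offset + CHUNK_SIZE], root=0)
        if comm.rank != 0:
            dataset = pickle.loads(bytes(buf))
        # Each rank keeps a contiguous, roughly equal slice of the dataset.
        n = len(dataset)
        b = n * comm.rank // comm.size
        e = n * (comm.rank + 1) // comm.size
        return dataset[b:e]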
-class DataSizeError(RuntimeError):
-    def __init__(self, ds_size, pickled_size):
-        msg = """The dataset was too large to be scattered using MPI.
+class DataSizeError(object):
In my environment (Python 3.6.0), the following error is raised in test_dataset.py.
======================================================================
ERROR: test_scatter_large_dataset (test_dataset.TestDataset)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/kfukuda/chainermn/tests/test_dataset.py", line 102, in test_scatter_large_dataset
lambda: self.scatter_large_data(comm_type))
File "/home/kfukuda/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/unittest/case.py", line 728, in assertRaises
return context.handle('assertRaises', args, kwargs)
File "/home/kfukuda/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/unittest/case.py", line 158, in handle
(name, self._base_type_str))
TypeError: assertRaises() arg 1 must be an exception type or tuple of exception types
I guess this is because DataSizeError does not inherit from RuntimeError or any other exception class.
I'm not sure whether it is a version-dependent error, but could you look into it?
This is not because of DataSizeError; the test simply does not fail any more! I pushed a test fix. I also changed DataSizeError to inherit from RuntimeError, which is a trivial change. The class is kept for applications that have an except DataSizeError:
clause, even though it is no longer raised.
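For reference, a minimal sketch of the shape that fix takes: unittest's assertRaises() only accepts exception types, so the class must derive from an exception class even if it is kept only for backward compatibility.

    class DataSizeError(RuntimeError):
        """No longer raised; kept so existing ``except DataSizeError:``
        clauses and ``assertRaises(DataSizeError, ...)`` calls stay valid."""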
Fixed, updated, and all tests have passed!
Note: this is a substitute and rebased version of #138.