
Avoid sending SubDataset and use broadcast for datasets #140

Merged
merged 5 commits into from
Dec 4, 2017
Conversation

kuenishi
Member

In mpi4py, objects sent with Comm.send() are pickled on every call.
Because self._dataset is included in every SubDataset [1], the whole
dataset was pickled once per send, for each of the rank > 0 nodes. This
made the scatter_dataset function inefficient: distributing the
ImageNet 2012 dataset took >30 seconds even though only file names and
tags were being sent; with this change it finishes in <5 seconds in our
experimental environment.
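The cost is easy to reproduce with plain pickle. Below is a minimal stand-in for SubDataset (a hypothetical simplification, not ChainerMN's or Chainer's actual class): a small "view" that keeps a reference to the whole dataset pickles to at least the size of the whole dataset.

```python
import pickle

# Hypothetical, simplified stand-in for chainer's SubDataset: it keeps
# a reference to the *whole* underlying dataset, not just its slice.
class SubDataset:
    def __init__(self, dataset, start, finish):
        self._dataset = dataset
        self._start = start
        self._finish = finish

dataset = list(range(100000))      # pretend these are file names and tags
sub = SubDataset(dataset, 0, 10)   # a 10-element view

# Pickling the view serializes the entire underlying dataset, so each
# Comm.send() of a SubDataset costs as much as sending everything.
whole = len(pickle.dumps(dataset))
view = len(pickle.dumps(sub))
print(view > whole)  # True: the "small" view is no cheaper to send
```

With N ranks, the naive scatter therefore pickles the whole dataset N times.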

This patch replaces the repeated Comm.send() calls with a combination
of one Comm.bcast() of the whole dataset and a Comm.send() of the index
range of each scattered sub-dataset. With this change the number of
times the whole dataset is pickled drops from N to 1, where N denotes
the number of MPI processes.
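The new flow can be sketched as follows. The range computation is testable without MPI; the mpi4py calls are shown as comments. scatter_ranges is an illustrative helper name, not ChainerMN's actual code.

```python
def scatter_ranges(n_total, n_ranks):
    # Split [0, n_total) into contiguous per-rank (begin, end) ranges:
    # the small tuples rank 0 sends instead of pickled SubDatasets.
    n_sub = (n_total + n_ranks - 1) // n_ranks  # ceiling division
    return [(r * n_sub, min((r + 1) * n_sub, n_total))
            for r in range(n_ranks)]

# Sketch of the mpi4py side (not runnable without an MPI launcher):
# on rank 0:
#   comm.bcast(dataset, root=0)               # whole dataset pickled once
#   for rank, (b, e) in enumerate(scatter_ranges(len(dataset), comm.size)):
#       if rank != 0:
#           comm.send((b, e), dest=rank)      # only a tiny tuple per rank
# on rank > 0:
#   dataset = comm.bcast(None, root=0)
#   b, e = comm.recv(source=0)
#   my_sub = dataset[b:e]

print(scatter_ranges(10, 3))  # [(0, 4), (4, 8), (8, 10)]
```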

But this fix has a drawback in error handling, especially when
OverflowError is raised by mpi4py. The rank 0 node can handle the error
because it observes it, while the other nodes may or may not learn of
it and may wait forever for the broadcast message in a busy loop. This
behavior depends on the MPI implementation, as it is not defined in the
MPI 3.1 specification [2], as far as I know.

To solve this second issue, the dataset passed to scatter_dataset() is
forcibly split into chunks whose default size is 256MB. This size is
configurable, but must not exceed the maximum value of a signed
integer. These chunks are then sent via a series of Comm.Bcast() calls
instead of a single Comm.bcast(). Also, the body of
chainermn.datasets.DataSizeError is emptied.
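The chunking step can be sketched like this (a minimal illustration, assuming the actual mpi4py Bcast calls happen elsewhere; to_chunks is a hypothetical helper name):

```python
import pickle

INT_MAX = 2**31 - 1                  # MPI message counts are signed 32-bit ints
DEFAULT_CHUNK = 256 * 1024 * 1024    # 256MB default, configurable

def to_chunks(obj, chunk_size=DEFAULT_CHUNK):
    # Pickle into one buffer (no streaming), then slice it into chunks
    # small enough to pass to a series of Comm.Bcast() calls.
    assert 0 < chunk_size <= INT_MAX
    buf = pickle.dumps(obj)
    return [buf[i:i + chunk_size] for i in range(0, len(buf), chunk_size)]

# Receiving ranks would Bcast each chunk into a buffer, then rebuild:
#   obj = pickle.loads(b"".join(chunks))
chunks = to_chunks(list(range(1000)), chunk_size=512)
print(all(len(c) <= 512 for c in chunks))  # True
```

Since each Comm.Bcast() carries at most chunk_size bytes, no single MPI call exceeds the signed-integer limit regardless of the total dataset size.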

Note that any dataset passed to scatter_dataset() is pickled into a
single buffer without any streaming, so pickling and unpickling may
take time proportional to the size of the dataset.

[1] https://github.com/chainer/chainer/blob/v3.1.0/chainer/datasets/sub_dataset.py#L50
[2] http://mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf

Note: this is a substitute and rebased version of #138.

kuenishi and others added 2 commits November 24, 2017 20:02
Signed-off-by: UENISHI Kota <[email protected]>
class DataSizeError(RuntimeError):
    def __init__(self, ds_size, pickled_size):
        msg = """The dataset was too large to be scattered using MPI.
class DataSizeError(object):
Member

In my environment of Python 3.6.0, the following error is raised in test_dataset.py.

======================================================================
ERROR: test_scatter_large_dataset (test_dataset.TestDataset)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/kfukuda/chainermn/tests/test_dataset.py", line 102, in test_scatter_large_dataset
    lambda: self.scatter_large_data(comm_type))
  File "/home/kfukuda/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/unittest/case.py", line 728, in assertRaises
    return context.handle('assertRaises', args, kwargs)
  File "/home/kfukuda/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/unittest/case.py", line 158, in handle
    (name, self._base_type_str))
TypeError: assertRaises() arg 1 must be an exception type or tuple of exception types

I guess this is because DataSizeError does not inherit from
RuntimeError or another exception class. I'm not sure if it is a
version-dependent error, but could you look into it?

Member Author

This is not because of DataSizeError; the test simply does not fail any
more! I pushed a test fix. I also changed DataSizeError to inherit from
RuntimeError, which is a trivial change. The class is kept for
applications that have an except DataSizeError: clause, even though it
is no longer raised.

@kuenishi
Member Author

kuenishi commented Dec 1, 2017

Fixed, updated and all tests have passed!

@keisukefukuda keisukefukuda merged commit bd1ab1d into chainer:master Dec 4, 2017
@kuenishi kuenishi deleted the bcast-scatter-data2 branch December 4, 2017 09:35
@iwiwi iwiwi added this to the v1.1.0 milestone Dec 13, 2017
3 participants