
Add MultiNodeBatchNormalization #106

Merged · 20 commits · Aug 24, 2017
Conversation

iwiwi (Contributor) commented Aug 10, 2017

This PR adds a new link, MultiNodeBatchNormalization. With chainer.links.BatchNormalization, the batch mean and std are computed independently from the local batch in each worker. In contrast, with MultiNodeBatchNormalization, workers communicate to conduct 'correct' batch normalization, i.e., they obtain the mean and std of the whole global batch.
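The idea above can be sketched in plain NumPy. This is a hypothetical illustration, not the actual ChainerMN implementation: each worker computes local sums, and an allreduce-style combination of those sums yields the exact mean and std of the whole global batch. Here the allreduce is simulated by a loop over per-worker batches.

```python
import numpy as np

def local_moments(x):
    # Per-worker statistics sufficient to reconstruct global mean/variance:
    # sample count, sum, and sum of squares (each reduced over the batch axis).
    n = x.shape[0]
    return n, x.sum(axis=0), (x ** 2).sum(axis=0)

def global_mean_std(worker_batches, eps=2e-5):
    # Simulated allreduce: sum the per-worker moments across all workers.
    n, s, sq = 0, 0.0, 0.0
    for x in worker_batches:
        ln, ls, lsq = local_moments(x)
        n += ln
        s = s + ls
        sq = sq + lsq
    mean = s / n
    var = sq / n - mean ** 2  # biased variance, as batch normalization uses
    return mean, np.sqrt(var + eps)

rng = np.random.default_rng(0)
batches = [rng.normal(size=(8, 3)) for _ in range(4)]  # 4 "workers"
mean, std = global_mean_std(batches)

# The combined statistics match those of the concatenated global batch.
full = np.concatenate(batches, axis=0)
assert np.allclose(mean, full.mean(axis=0))
assert np.allclose(std, np.sqrt(full.var(axis=0) + 2e-5))
```

In a real multi-node setting the three local moments would be packed into one buffer and combined with a single MPI allreduce, so the extra communication cost is one small collective per batch-normalization layer.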

TODO

  • Documentation
  • Ignore Chainer v1?

@iwiwi iwiwi added the feature label Aug 15, 2017
@iwiwi iwiwi changed the title [WIP] Add MultiNodeBatchNormalization Add MultiNodeBatchNormalization Aug 15, 2017
@shu65 shu65 self-requested a review August 16, 2017 05:21
@shu65 shu65 added this to the v1.0.0 milestone Aug 16, 2017
iwiwi (Contributor, Author) commented Aug 21, 2017

I talked with @beam2d-san. I will send a PR to Chainer to add hook functions to BatchNormalizationFunction ( https://github.com/chainer/chainer/blob/master/chainer/functions/normalization/batch_normalization.py#L32 ) that can customize the computation of mean, var, ggamma and gmean.
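A hook of the kind proposed above might look like the following sketch. This is purely illustrative: the function name, the `mean_var_hook` parameter, and the hook signature are assumptions for this example, not Chainer's actual API. The point is that a multi-node variant only needs to override how the statistics are computed, while the normalization itself stays unchanged.

```python
import numpy as np

def batch_normalization(x, gamma, beta, eps=2e-5, mean_var_hook=None):
    # Hypothetical hook point: by default, statistics come from the local
    # batch; a multi-node caller would pass a hook that allreduces them.
    if mean_var_hook is None:
        mean = x.mean(axis=0)
        var = x.var(axis=0)
    else:
        mean, var = mean_var_hook(x)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

def make_global_hook(global_mean, global_var):
    # A real multi-node hook would compute local moments and allreduce them;
    # here we stand in precomputed global statistics for illustration.
    return lambda x: (global_mean, global_var)

rng = np.random.default_rng(1)
x = rng.normal(size=(16, 3))
gamma, beta = np.ones(3), np.zeros(3)

y_local = batch_normalization(x, gamma, beta)
y_global = batch_normalization(
    x, gamma, beta,
    mean_var_hook=make_global_hook(np.zeros(3), np.ones(3)))
```

With such a hook, MultiNodeBatchNormalization could reuse Chainer's BatchNormalizationFunction instead of duplicating its forward and backward code, which is the motivation for the two options below.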

We have two options:

  1. We hold this PR until Chainer's BatchNormalizationFunction is modified, or
  2. We merge the current code tentatively, and then improve it after Chainer's BatchNormalizationFunction is modified.

shu65 (Member) commented Aug 23, 2017

The following error occurs in the unit test when I use 1 GPU on a single node.

======================================================================
FAIL: Tests correctness of MultiNodeBatchNormalization.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ssuzuki/workspace/chainermn_pr106/tests/links_tests/test_batch_normalization.py", line 156, in test_multi_node_bn
    self.assert_not_allclose(p1[1].grad, p2[1].grad)
  File "/home/ssuzuki/workspace/chainermn_pr106/tests/links_tests/test_batch_normalization.py", line 164, in assert_not_allclose
    x, y, atol=atol, rtol=rtol, verbose=verbose)
AssertionError: AssertionError not raised

----------------------------------------------------------------------

So, please check it.

@shu65 shu65 merged commit da39cb0 into master Aug 24, 2017
@shu65 shu65 deleted the distributed-batch-normalization branch August 24, 2017 09:58