Chainer: A flexible framework for neural networks
Chainer is a powerful, flexible, and intuitive framework for neural networks.
https://chainer.org/
Wed, 29 Jun 2022 05:51:08 +0000
Jekyll v3.9.2

Chainer/CuPy v7 release and Future of Chainer
<p>Today, we would like to announce two things: the release of Chainer/CuPy v7 and the shift of development efforts for Chainer.</p>
<h2 id="chainercupy-v7">Chainer/CuPy v7</h2>
<p>We have released Chainer and CuPy v7.0.0. The full list of changes can be found in the release notes of the pre-releases and the final release. Here are some notable updates.</p>
<p>Chainer v7 (<a href="https://github.com/chainer/chainer/releases/tag/v7.0.0a1">alpha</a>, <a href="https://github.com/chainer/chainer/releases/tag/v7.0.0b1">beta1</a>, <a href="https://github.com/chainer/chainer/releases/tag/v7.0.0b2">beta2</a>, <a href="https://github.com/chainer/chainer/releases/tag/v7.0.0b3">beta3</a>, <a href="https://github.com/chainer/chainer/releases/tag/v7.0.0b4">beta4</a>, <a href="https://github.com/chainer/chainer/releases/tag/v7.0.0rc1">rc1</a>, <a href="https://github.com/chainer/chainer/releases/tag/v7.0.0">major</a>):</p>
<ul>
<li>Most features of Chainer, including ChainerMN, are now compatible with ChainerX ndarray.</li>
<li>ONNX-Chainer is integrated into Chainer.</li>
<li><code class="language-plaintext highlighter-rouge">TabularDataset</code> is added. It is a rich abstraction of columnar datasets with pandas-like manipulation (see the sketch after this list).</li>
<li>NHWC support has been added. It greatly improves the performance of convolutions and batch normalization on GPUs with Tensor Cores.</li>
</ul>
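<p>As an illustration, here is a minimal sketch of how <code class="language-plaintext highlighter-rouge">TabularDataset</code> can be used. It assumes the <code class="language-plaintext highlighter-rouge">chainer.dataset.tabular.from_data</code> helper and the <code class="language-plaintext highlighter-rouge">keys</code>/<code class="language-plaintext highlighter-rouge">slice</code>/<code class="language-plaintext highlighter-rouge">fetch</code> interface; please check the v7 reference for the exact API.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Minimal TabularDataset sketch (assumed interface; see the v7 reference).
import numpy as np
from chainer.dataset import tabular  # assumed module path

# Build a columnar dataset from in-memory arrays.
dataset = tabular.from_data({
    'x': np.arange(10, dtype=np.float32),
    'y': (np.arange(10) % 2).astype(np.int32),
})

print(len(dataset))       # number of rows
print(dataset.keys)       # column names
print(dataset.slice[:3])  # a lazy view of the first three rows
print(dataset.fetch())    # materialize all columns at once
</code></pre></div></div>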
<p>CuPy v7 (<a href="https://github.com/cupy/cupy/releases/tag/v7.0.0a1">alpha</a>, <a href="https://github.com/cupy/cupy/releases/tag/v7.0.0b1">beta1</a>, <a href="https://github.com/cupy/cupy/releases/tag/v7.0.0b2">beta2</a>, <a href="https://github.com/cupy/cupy/releases/tag/v7.0.0b3">beta3</a>, <a href="https://github.com/cupy/cupy/releases/tag/v7.0.0b4">beta4</a>, <a href="https://github.com/cupy/cupy/releases/tag/v7.0.0rc1">rc1</a>, <a href="https://github.com/cupy/cupy/releases/tag/v7.0.0">major</a>):</p>
<ul>
<li>Support NVIDIA cuTENSOR and CUB for better performance.</li>
<li>Experimental support of ROCm. CuPy now runs on AMD GPUs.</li>
</ul>
<p>Also note that Python 2 support is dropped as <a href="https://chainer.org/announcement/2019/08/21/python2.html">announced</a>. Chainer/CuPy v7 only supports Python 3.5+.</p>
<h2 id="shift-of-development-efforts-for-chainer">Shift of Development Efforts for Chainer</h2>
<p>As <a href="https://preferred.jp/en/news/pr20191205/">announced today</a>, Preferred Networks, the company behind Chainer, is changing its primary framework to PyTorch. We expect that Chainer v7 will be the last major release for Chainer, and further development will be limited to bug-fixes and maintenance. The Chainer family products (ChainerCV, Chainer Chemistry, ChainerUI, and ChainerRL) will also follow this policy.</p>
<p>CuPy will continue its development as before. Although developed as a GPU backend for Chainer, it has been widely adopted by different communities and is relatively unique in accelerating computation with GPUs using NumPy syntax.</p>
<h3 id="background">Background</h3>
<p>This decision has been made after serious considerations based on the mission of the Chainer team: <em>speeding up research and development of deep learning and its applications.</em> With the introduction of Chainer in 2015, we proposed an imperative API set for the differentiable programming paradigm that we named <em>define-by-run</em>. It is now often called <em>eager</em> execution. The define-by-run approach was originally motivated by structured networks for natural language processing such as recurrent neural networks (RNN) and brought advantages to other kinds of networks as well. Its intuitiveness and debuggability helped accelerate the deep learning research development cycle. We believed in the advantages of an imperative execution framework compared to the existing <em>define-and-run</em> declarative approaches. Along the way, we worked on improvements like object-oriented network definition, higher-order differentiation, dynamic inference of layer input size, and training loop abstractions, while keeping the simplicity of the pure Python implementation and interoperability with the NumPy ecosystem.</p>
<p>The define-by-run approach has been widely adopted by the deep learning research community, and the designs of the major frameworks are converging to similar syntax and functionality. We are proud of the role that Chainer has played in this shift and pleased with its contribution to the community. We believe it is the right time to consider what contributions we should make to improve the research productivity of the deep learning community. Instead of separately developing frameworks with similar design goals, we have decided to support a framework with a larger user-base and ecosystem.</p>
<p>After reviewing the available frameworks, we believe PyTorch is the closest in spirit to the Chainer style of code and the appropriate replacement. Preferred Networks will start using PyTorch widely, and we look forward to contributing to PyTorch with the experience and knowledge gained from the development of Chainer.</p>
<h3 id="conclusion">Conclusion</h3>
<p>For users migrating to PyTorch, we are releasing resources to ease porting efforts: <a href="http://chainer.github.io/migration-guide">Migration Guide</a> and <a href="http://github.com/chainer/chainer-pytorch-migration">Migration Library</a>.</p>
<p>We would like to thank the contributors to the Chainer code base and the community surrounding it. We wouldn’t be here today without your support over all these years. Let’s continue improving deep learning software to accelerate research and development.</p>
<p><a href="https://chainer.org/announcement/2019/12/05/released-v7-ja.html">日本語版 (Japanese)</a></p>
Thu, 05 Dec 2019 00:00:00 +0000
https://chainer.org/announcement/2019/12/05/released-v7.html
https://chainer.org/announcement/2019/12/05/released-v7.html
Announcement

Chainer/CuPy v7 Release and the Future Development Structure of Chainer
<p>We would like to announce the release of Chainer/CuPy v7 and a change in Chainer's development structure.</p>
<h2 id="chainercupy-v7">Chainer/CuPy v7</h2>
<p>Today we released Chainer and CuPy v7.0.0. Please see the respective release notes for the full list of changes. The main updates are as follows.</p>
<p>Chainer v7 (<a href="https://github.com/chainer/chainer/releases/tag/v7.0.0a1">alpha</a>, <a href="https://github.com/chainer/chainer/releases/tag/v7.0.0b1">beta1</a>, <a href="https://github.com/chainer/chainer/releases/tag/v7.0.0b2">beta2</a>, <a href="https://github.com/chainer/chainer/releases/tag/v7.0.0b3">beta3</a>, <a href="https://github.com/chainer/chainer/releases/tag/v7.0.0b4">beta4</a>, <a href="https://github.com/chainer/chainer/releases/tag/v7.0.0rc1">rc1</a>, <a href="https://github.com/chainer/chainer/releases/tag/v7.0.0">major</a>):</p>
<ul>
<li>Most features of Chainer, including ChainerMN, now work with ChainerX ndarrays.</li>
<li>ONNX-Chainer has been integrated into Chainer.</li>
<li><code class="language-plaintext highlighter-rouge">TabularDataset</code> has been added. It lets you manipulate columnar datasets through a pandas-like abstraction API.</li>
<li>NHWC support has been added. It improves the performance of convolutions and batch normalization on GPUs equipped with Tensor Cores.</li>
</ul>
<p>CuPy v7 (<a href="https://github.com/cupy/cupy/releases/tag/v7.0.0a1">alpha</a>, <a href="https://github.com/cupy/cupy/releases/tag/v7.0.0b1">beta1</a>, <a href="https://github.com/cupy/cupy/releases/tag/v7.0.0b2">beta2</a>, <a href="https://github.com/cupy/cupy/releases/tag/v7.0.0b3">beta3</a>, <a href="https://github.com/cupy/cupy/releases/tag/v7.0.0b4">beta4</a>, <a href="https://github.com/cupy/cupy/releases/tag/v7.0.0rc1">rc1</a>, <a href="https://github.com/cupy/cupy/releases/tag/v7.0.0">major</a>):</p>
<ul>
<li>Support for NVIDIA cuTENSOR and CUB improves performance.</li>
<li>Experimental support for ROCm has been added, which allows CuPy to run on AMD GPUs.</li>
</ul>
<p>As already <a href="https://chainer.org/announcement/2019/08/21/python2.html">announced</a>, Python 2 support has ended. Chainer/CuPy v7 supports only Python 3.5 and later.</p>
<h2 id="chainer開発体制の変更について">Chainer開発体制の変更について</h2>
<p>As <a href="https://preferred.jp/ja/news/pr20191205/">announced</a> today, Preferred Networks, the company behind Chainer, will gradually move its research and development framework to PyTorch. At this point, Chainer v7 is expected to be the last major release of Chainer, and further development will be limited to bug fixes and maintenance. The Chainer family products (ChainerCV, Chainer Chemistry, ChainerUI, and ChainerRL) will follow the same policy. We are also considering renewing the content of <a href="https://tutorials.chainer.org/ja/">Introduction to Deep Learning: Chainer Tutorials</a>, which is operated by Preferred Networks.</p>
<p>CuPy development will continue as before. Although CuPy was originally developed as the GPU backend of Chainer, it is now used by many communities as one of the few libraries that let you write fast GPU computation with the same syntax as NumPy.</p>
<h3 id="背景">背景</h3>
<p>この決定は、「深層学習およびその応用の研究開発を高速化する」というChainerチームのミッションを踏まえ、様々な検討を重ねた上で慎重に行われました。</p>
<p>2015年に公開されたChainerは、微分可能プログラミングのための新たな命令的APIセットを提案し、それを <em>define-by-run</em> と名付けました。このパラダイムは、今日では <em>eager</em> executionとも呼ばれています。当初define-by-runのアプローチは、自然言語処理に用いられる回帰型ニューラルネットワーク(RNN)などの記述を容易にするというモチベーションから発案されたものでしたが、すぐにそれ以外のネットワークにも応用されてゆきました。その直感的な表記とデバッグの容易さは、深層学習研究における開発サイクルの高速化に大きく貢献しました。我々は命令的な実行方式を採用するフレームワークが、既存の宣言的な <em>define-and-run</em> 実行方式よりも優れているという確信を得て、開発を進めました。オブジェクト指向によるネットワーク定義、高次微分、レイヤの入力データサイズの動的推論、トレーニングループの抽象化といった様々な機能追加を、pure Pythonによる簡潔な実装とNumPyエコシステムとの相互運用性を保ったまま実現してきました。</p>
<p>define-by-runのアプローチは深層学習コミュニティにおいて広く受け入れられ、結果として多くのフレームワークは似通った文法と機能に集約されてゆきました。Chainerチームは、このトレンドの転換においてChainerが果たした役割を誇りに思うとともに、コミュニティに対してこのような貢献ができたことを嬉しく思います。そして今、研究開発の生産性を高めるために深層学習コミュニティに対してどのような貢献をしてゆくべきか改めて熟慮した結果、似通ったゴールを持つフレームワークを個別に開発するのではなく、より大きなユーザベースとエコシステムを持つフレームワークに貢献してゆくことが最良であると判断しました。</p>
<p>いくつかのフレームワークを検討したのち、PyTorchが最もChainerに近い思想を持っており、Chainerの後続として最適であると確信しました。Preferred Networksでは、今後PyTorchを主要なフレームワークとして使用するとともに、Chainerの開発を通じて得られた知識と経験を生かしてPyTorchへ貢献してゆきます。</p>
<h3 id="おわりに">おわりに</h3>
<p>PyTorchへの移行に際して、Chainerチームでは移行を容易にするためのドキュメントおよびライブラリを公開しました。</p>
<ul>
<li><a href="http://chainer.github.io/migration-guide">Migration Guide</a></li>
<li><a href="http://github.com/chainer/chainer-pytorch-migration">Migration Library</a></li>
</ul>
<p>We deeply thank everyone who has contributed to Chainer and to the community surrounding it. Today's results could not have been achieved without your support. We will continue to contribute to accelerating research and development in deep learning by improving deep learning software together with the community.</p>
<p><a href="https://chainer.org/announcement/2019/12/05/released-v7.html">英語版 (English)</a></p>
Thu, 05 Dec 2019 00:00:00 +0000
https://chainer.org/announcement/2019/12/05/released-v7-ja.html
https://chainer.org/announcement/2019/12/05/released-v7-ja.html
Announcement

Sunsetting Python 2 Support
<p><strong>Summary:</strong> Due to the end-of-life (EOL) of Python 2 in January 2020, Chainer and CuPy v7.0.0b3 (release planned in August 2019) will drop Python 2 support. Chainer and CuPy v6.x (current stable release branch) continue to support Python 2. Chainer v6.x will be supported at least until after the EOL of Python 2.</p>
<hr />
<p>The Chainer Team has decided to drop Python 2 support in Chainer and CuPy (referred to collectively as “Chainer” in this post) v7.x releases.
This decision was made considering the following facts:</p>
<ul>
<li>Python 2 will become end-of-life (EOL) in <a href="https://www.python.org/dev/peps/pep-0373/#maintenance-releases">January 2020</a>.</li>
<li>Many scientific computation packages, including NumPy, which is one of the core dependencies of Chainer, are <a href="https://python3statement.org/">planning to drop, or have already dropped, support for Python 2</a>.</li>
<li>The results of the open-source user survey held in the forum (<a href="https://groups.google.com/forum/#!topic/chainer/Yymm49chbC4">English</a> and <a href="https://groups.google.com/forum/#!topic/chainer-jp/b98cqvA9V9A">Japanese</a>) indicated that only a small fraction of users currently use Python 2, and that most of them plan to migrate to Python 3.</li>
<li>Supporting Python 2 and 3 in the same codebase requires extra effort, such as replicating Python 3 features in Python 2, keeping <code class="language-plaintext highlighter-rouge">six</code> usage in mind during pull-request reviews, and so on.</li>
</ul>
<p>We will sunset Python 2 support on the following schedule:</p>
<ul>
<li>In Chainer v7.0.0b3 (planned for August 2019), Python 2 will be deprecated. The code will still run on Python 2, but a warning will be shown when <code class="language-plaintext highlighter-rouge">import chainer</code> / <code class="language-plaintext highlighter-rouge">import cupy</code> is executed under Python 2.</li>
<li>In Chainer v7.0.0b4 (planned for September 2019), Python 2 support will be removed, and the code will not run on Python 2.</li>
</ul>
<p>Please note that Chainer v6.x (current stable) releases still support Python 2, so you can continue using current and future v6.x releases on Python 2 in your existing projects.
Chainer v6.x will be supported at least until after the EOL of Python 2.</p>
<h3 id="chainer-family-products">Chainer Family Products</h3>
<ul>
<li><strong>ChainerMN</strong> (which is already merged to Chainer), <strong><a href="https://github.com/chainer/chainercv">ChainerCV</a></strong>, <strong><a href="https://github.com/pfnet/chainer-chemistry">Chainer Chemistry</a></strong>, and <strong><a href="https://github.com/chainer/chainerui">ChainerUI</a></strong> will support Python 2 until EOL of Chainer v6.x series.</li>
<li><strong><a href="https://github.com/chainer/chainerrl">ChainerRL</a></strong> will drop Python 2 support in the near future (possibly next release, before the Chainer v6.x EOL) as the latest <code class="language-plaintext highlighter-rouge">gym</code> package (which ChainerRL depends on) no longer supports Python 2.</li>
<li><strong><a href="https://github.com/chainer/onnx-chainer">ONNX-Chainer</a></strong> and <strong><a href="https://github.com/chainer/chainerio">ChainerIO</a></strong> didn’t support Python 2 since the initial release.</li>
</ul>
Wed, 21 Aug 2019 00:00:00 +0000
https://chainer.org/announcement/2019/08/21/python2.html
https://chainer.org/announcement/2019/08/21/python2.html
Announcement

Released Chainer/CuPy v6.0.0
<p>We have released Chainer and CuPy v6.0.0 today!
This is a major release that introduces several new features.
Full updates can be found in the release notes: <a href="https://github.com/chainer/chainer/releases/tag/v6.0.0">Chainer</a>, <a href="https://github.com/cupy/cupy/releases/tag/v6.0.0">CuPy</a>.</p>
<h2 id="chainerx">ChainerX</h2>
<p>The biggest update is the introduction of <strong>ChainerX</strong>.
It is a fast and portable ndarray engine with autograd support written in C++ with a very thin Python wrapper.</p>
<p>We have released the beta version of ChainerX in v6.0.0b1 as we wrote in the <a href="https://chainer.org/announcement/2018/12/03/chainerx.html">previous blog post</a>.
Since then, we have been working on improving it in various aspects.
In particular, ChainerX in v6.0.0 expands feature coverage in several areas compared to v6.0.0b1.</p>
<ul>
<li><strong>Wider op coverage</strong>.
We have more Chainer functions that directly call ChainerX’s low-overhead implementation.
The effort is still ongoing at <a href="https://github.com/chainer/chainer/issues/6423">the tracking issue</a> with <a href="https://docs.google.com/spreadsheets/d/1B4E78tw9Awgpcdn5G7zsQ8NVFYJdOoJlIQg42QxKNfU">the spreadsheet of op-wise implementation status</a>.
We continue to expand the op coverage towards the next v7 release.
Contributions are always welcome!</li>
<li><strong>Wider Function coverage</strong>.
Most users will start using ChainerX through Chainer’s existing interface, just by replacing NumPy/CuPy arrays with ChainerX arrays (a minimal sketch follows this list).
When ChainerX does not have an implementation for an operation, Chainer automatically falls back to NumPy/CuPy-based implementation.
This fallback basically works without any fix for most functions, but not always.
We are fixing such bugs to enlarge the coverage of functions for ChainerX usage.
The effort is accompanied by the introduction of a test fixture class for function tests (you can find <a href="https://github.com/chainer/chainer/issues/6071">the tracking issue</a>).
Currently, 40% of the functions under <code class="language-plaintext highlighter-rouge">chainer.functions</code> are already tested with ChainerX.
They cover basic array operations resembling routines in NumPy and operations commonly used in convolutional neural networks such as convolution, deconvolution and pooling. Operations for recurrent neural networks will be addressed in the upcoming releases.
We hope the coverage will reach 100% in v7.
Contributions are always welcome here, too!</li>
<li><strong>Wider example coverage</strong>.
Most examples now support ChainerX.
By specifying ChainerX’s device names (e.g. <code class="language-plaintext highlighter-rouge">native</code> for CPU and <code class="language-plaintext highlighter-rouge">cuda:0</code>, <code class="language-plaintext highlighter-rouge">cuda:1</code>, … for GPUs), examples run with ChainerX arrays.
It also means that the coverage of ChainerX support in Chainer’s features in general is expanding.</li>
</ul>
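<p>As a rough illustration of what “replacing NumPy/CuPy arrays with ChainerX arrays” looks like in practice, here is a minimal sketch. It assumes a ChainerX-enabled build of Chainer v6; the exact device and transfer APIs are described in the documentation linked below.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Run an existing Chainer model on ChainerX arrays (sketch, assumed usage).
import numpy as np
import chainer
import chainer.functions as F

device = chainer.get_device('native:0')   # or 'cuda:0' for the first GPU

model = chainer.links.Linear(3, 2)
model.to_device(device)                   # move parameters to the ChainerX device

x = device.send(np.ones((4, 3), dtype=np.float32))  # now a ChainerX ndarray
loss = F.sum(model(x))
loss.backward()                           # backprop runs through ChainerX
print(loss.array)                         # a ChainerX scalar
print(model.W.grad is not None)           # gradients populated by backward
</code></pre></div></div>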
<p>See the <a href="https://chainer.org/announcement/2018/12/03/chainerx.html">previous blog post</a> for its background and overview,
and <a href="https://docs.chainer.org/en/v6.0.0/chainerx/index.html">ChainerX Documentation</a> for the installation guide, tutorial, and reference.</p>
<h2 id="other-updates">Other updates</h2>
<p>This release also includes many features other than ChainerX.
Notable updates are listed below.</p>
<ul>
<li><strong>More mixed precision support</strong>.
Chainer v6 introduces <em>mixed precision mode</em> and <em>dynamic loss scaling</em> for better support of mixed precision training.
Mixed precision mode is enabled by setting <code class="language-plaintext highlighter-rouge">CHAINER_DTYPE=mixed16</code> or <code class="language-plaintext highlighter-rouge">chainer.global_config.dtype = chainer.mixed16</code>.
In this mode, Chainer automatically chooses either <code class="language-plaintext highlighter-rouge">float16</code> or <code class="language-plaintext highlighter-rouge">float32</code> depending on what is appropriate in terms of a performance-to-precision tradeoff.
Dynamic loss scaling, which originated in <a href="https://github.com/NVIDIA/apex">Apex</a>, automatically adjusts the scaling coefficient used in backprop to avoid underflow. A combined sketch of this mode and the new device API follows this list.</li>
<li><strong>Device API</strong>.
We introduce a new device API for better interoperability between backends (including ChainerX).
It unifies the way in which devices are specified and data is transferred between devices.
In particular, a unified device specifier is introduced.
It is based on ChainerX’s device specifier of the format <code class="language-plaintext highlighter-rouge">'backend:id'</code>, e.g. <code class="language-plaintext highlighter-rouge">'native:0'</code> and <code class="language-plaintext highlighter-rouge">'cuda:N'</code> (where <code class="language-plaintext highlighter-rouge">N</code> is the CUDA device id).
For native (CPU), the id part can be omitted (like <code class="language-plaintext highlighter-rouge">'native'</code>).
For conventional devices backed by NumPy-like modules, the names are <code class="language-plaintext highlighter-rouge">@numpy</code>, <code class="language-plaintext highlighter-rouge">@cupy:N</code>, and <code class="language-plaintext highlighter-rouge">@intel64</code>.
This notation can be used, e.g., in the <code class="language-plaintext highlighter-rouge">to_device</code> function.
Note that the existing APIs related to devices (e.g. <code class="language-plaintext highlighter-rouge">to_cpu</code> and <code class="language-plaintext highlighter-rouge">to_gpu</code>) are still available.</li>
<li><strong><code class="language-plaintext highlighter-rouge">__array_function__</code> in CuPy</strong>.
NumPy’s <code class="language-plaintext highlighter-rouge">__array_function__</code> is an experimental feature for letting NumPy dispatch implementations of almost all functions to third-party duck arrays.
CuPy now supports this interface.
To use this feature, you need NumPy 1.16 and have to set <code class="language-plaintext highlighter-rouge">NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=1</code> (it will hopefully become the default in NumPy 1.17).
Then, many NumPy functions that CuPy supports will accept CuPy arrays and automatically call CuPy’s implementation (a sketch follows this list).</li>
</ul>
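<p>The following sketch combines the mixed precision mode and the new device API described above. It is an assumed usage example; in particular, the <code class="language-plaintext highlighter-rouge">loss_scaling()</code> call reflects our reading of the dynamic loss scaling API and may differ in detail, so please consult the v6 reference.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Mixed precision + unified device specifiers (sketch, assumed usage).
import chainer

# Let Chainer choose float16 or float32 per operation.
chainer.global_config.dtype = chainer.mixed16   # or: export CHAINER_DTYPE=mixed16

# Unified device specifiers: '@numpy', '@cupy:0', '@intel64', 'native', 'cuda:0', ...
device = chainer.get_device('@numpy')

model = chainer.links.Linear(784, 10)
model.to_device(device)

optimizer = chainer.optimizers.MomentumSGD(lr=0.01)
optimizer.setup(model)
optimizer.loss_scaling()   # assumed API for dynamic loss scaling (no fixed scale)
</code></pre></div></div>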
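<p>And here is a small sketch of the <code class="language-plaintext highlighter-rouge">__array_function__</code> dispatch with CuPy, assuming NumPy 1.16 with <code class="language-plaintext highlighter-rouge">NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=1</code> set in the environment before NumPy is imported.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># NumPy dispatching to CuPy via __array_function__ (sketch).
import numpy as np
import cupy as cp

x = cp.arange(6, dtype=np.float32).reshape(2, 3)
y = np.sum(x, axis=0)    # dispatched to cupy.sum; no host transfer happens
print(type(y))           # a CuPy ndarray, not a NumPy one
</code></pre></div></div>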
<p>We recommend updating to the latest version of Chainer and CuPy.
You can find the upgrade guide <a href="https://docs.chainer.org/en/latest/upgrade.html">here</a>.
Updating Chainer should be done as usual with the command <code class="language-plaintext highlighter-rouge">pip install -U chainer</code>.
Note that ChainerX is not built by default; see the <a href="https://docs.chainer.org/en/v6.0.0/chainerx/install/index.html">installation guide of ChainerX</a> for details.
CuPy can be updated with <code class="language-plaintext highlighter-rouge">pip</code> as well, but be careful to use the appropriate package name if you are using a wheel package (<code class="language-plaintext highlighter-rouge">cupy-cudaNN</code>, where <code class="language-plaintext highlighter-rouge">NN</code> is your CUDA version).</p>
<p>Any feedback to the dev team would be welcomed and appreciated.
You can ask questions or leave comments at <a href="https://gitter.im/chainer">gitter</a>, <a href="https://bit.ly/join-chainer-slack">Slack</a>, <a href="https://groups.google.com/forum/#!forum/chainer">Google Groups</a>, and <a href="https://stackoverflow.com/questions/tagged/chainer">StackOverflow</a>.</p>
Thu, 16 May 2019 00:00:00 +0000
https://chainer.org/announcement/2019/05/16/released-v6.html
https://chainer.org/announcement/2019/05/16/released-v6.html
Announcement

ChainerX Beta Release
<p>Today, we announce <strong>ChainerX</strong>, a fast, portable, and extensible backend of Chainer.
It is aimed at reducing the host-side performance overhead as well as making models much easier to ship for applications.
ChainerX is included as an optional feature of Chainer v6.0.0 beta1, and is planned to be officially released as a part of Chainer v6 series next Spring.
You can find <a href="https://docs.chainer.org/en/latest/chainerx/index.html">the official documentation</a>, including a quick tutorial.</p>
<h2 id="background">Background</h2>
<p>Chainer was developed as a pure Python package, which enabled a simple interface for a Define-by-Run deep learning framework.
It heavily depends on NumPy and CuPy, which are both implemented in fast, compiled languages (C and Cython, respectively).
Most heavy deep learning tasks work best on NVIDIA GPUs,
and, thanks to its asynchronous computing architecture, the framework overhead has been hidden by the sequence of heavy GPU kernel executions.
This enabled a deep network system based on Chainer to take the record for <a href="https://arxiv.org/abs/1711.04325">the fastest training of a large convolutional network at the time</a>.</p>
<p>The situation is changing.
GPUs are evolving rapidly compared to CPUs, and more and more accelerator chips optimized for deep learning computation are available.
As a result, the host side operations are becoming the bottleneck of many tasks, including computer vision, automatic speech recognition, and natural language processing.
Research outcomes are also being transferred to application areas, which increases the demand for deploying deep learning models in products and services in reliable and portable ways.</p>
<p>While pure Python is easier to work with and design, it incurs heavy host-side overhead, and dependency on CPython can be an obstacle to porting models to applications.
We found that the design of the multi-dimensional array and the define-by-run automatic differentiation is mature, and radical design changes are not expected.
ChainerX is designed from scratch as a C++ implementation of these mature components to solve both the performance and the portability issues.</p>
<h2 id="overview">Overview</h2>
<p>The “X” suffix of the name stands for three keywords that represent its aim.</p>
<ul>
<li>Accelerated: It implements an ndarray with autograd feature in C++, removing the host-side overhead related to automatic differentiation.</li>
<li>Exportable: Thanks to the C++ implementation, it opens the door to porting models onto Python-free environments.
Note that ChainerX itself does not include any features to actually port the models;
yet the pure C++ ndarray implementation with autograd makes it much easier to introduce such a mechanism.</li>
<li>Extensible: As noted above, there is increasing demand for supporting a wider range of computing environments.
The new ndarray has a modular design that lets us plug in a computing backend that supports new devices.</li>
</ul>
<p>ChainerX currently covers the ndarray and automatic differentiation part of Chainer.
The ndarray and the chainerx namespace follow NumPy-like APIs.
The implementation is written in C++, while a thin Python binding layer is provided.
We added built-in support of this new ndarray for existing Chainer APIs, including Variable, so that users can immediately start using ChainerX with only slight changes to the user code.</p>
<p><img src="https://chainer.org/assets/chainerx-stack.png" alt="Software stack of Chainer with ChainerX. ChainerX and its backends roughly correspond to Chainer’s automatic differentiation implementations and the NumPy/CuPy layers." /></p>
<p>ChainerX provides the following three levels of interfaces.</p>
<ul>
<li>C++ API (still unstable): the fastest interface to use if you do not need Python.
Backend plugins are required to cooperate with this layer.</li>
<li>Python API: A thin wrapper of the C++ API.
It follows the NumPy API design, so users familiar with NumPy can quickly learn this API (a short sketch follows this list).</li>
<li>Chainer on ChainerX: The existing Chainer API also supports the new ChainerX ndarray similarly to NumPy/CuPy ndarrays.
It incurs some overhead, but it is the easiest way to start using ChainerX based on existing code base.</li>
</ul>
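<p>For a feel of the Python API level, here is a minimal sketch (assuming ChainerX is built and installed): NumPy-like array creation plus define-by-run autograd implemented in C++.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># ChainerX Python API sketch (assumed usage).
import chainerx as chx

x = chx.ones((2, 3), dtype=chx.float32, device='native:0')
x.require_grad()              # mark the array as requiring gradients
y = (x * 2.0).sum()
chx.backward(y)
print(x.grad)                 # the gradient is itself a ChainerX ndarray
</code></pre></div></div>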
<p>The following table gives a quick comparison of host-side overhead.</p>
<table>
<thead>
<tr>
<th>Framework</th>
<th>Time per iteration (= fwd+bwd+update, msec)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Chainer on NumPy</td>
<td>14.48</td>
</tr>
<tr>
<td>Chainer on ChainerX</td>
<td><strong>7.54</strong></td>
</tr>
<tr>
<td>ChainerX Python</td>
<td><strong>1.88</strong></td>
</tr>
<tr>
<td>PyTorch</td>
<td>2.45</td>
</tr>
</tbody>
</table>
<p>While ChainerX lacks some operation implementations (see the <a href="https://docs.chainer.org/en/latest/chainerx/limitations.html">limitations</a> page for more information),
Chainer on ChainerX supports automatic fallback to existing NumPy/CuPy based code for forward, backward, and update.
Note that this compatibility layer also has some overhead, and we will continue exploring the best way of getting the maximum performance with small code changes.</p>
<h2 id="future">Future</h2>
<p>During the beta phase of Chainer v6, we will add more features to ChainerX so that it can accelerate a wider range of research and applications.
We will also continue exploring other ways to adopt the fast Python API of ChainerX in more research projects.</p>
<p>ChainerX is just half of the picture that covers the whole scenario of fast research and quick model shipping.
The Chainer team is also working on translating the models written in Python to a portable format based on ONNX, and running the exported neural net with the ChainerX C++ implementation.
We are looking forward to realizing this new research and application cycle, and seeing more and more researchers, practitioners, and engineers play with this evolving framework.</p>
Mon, 03 Dec 2018 00:00:00 +0000
https://chainer.org/announcement/2018/12/03/chainerx.html
https://chainer.org/announcement/2018/12/03/chainerx.html
Announcement

Released Chainer/CuPy v5.0.0
<p>We have released Chainer and CuPy v5.0.0 today!
This is a major release that introduces several new features.</p>
<p>The following is a list of selected updates. Full updates can be found in the release notes: <a href="https://github.com/chainer/chainer/releases/tag/v5.0.0">Chainer</a>, <a href="https://github.com/cupy/cupy/releases/tag/v5.0.0">CuPy</a>.</p>
<ul>
<li><strong>Static subgraph optimization</strong> <em>(experimental)</em>.
By applying the <code class="language-plaintext highlighter-rouge">@static_graph</code> decorator to the static part of your computation (which uses the same graph at every iteration), the computational graph of that part is cached and reused.
Fully-static models speed up by 20-60% in most cases.
Example code modified for the static subgraph feature can be found <a href="https://github.com/chainer/chainer/tree/v5/examples/static_graph_optimizations">here</a>.</li>
<li><strong>Float16 support</strong>.
Using half-precision floats is made much easier!
Since recent GPU technologies often focus on half and mixed precision computations, using float16 is crucial for fully utilizing the latest hardware performance.
In Chainer v5, the default floating point dtype is configurable via <code class="language-plaintext highlighter-rouge">CHAINER_DTYPE</code> environment variable or <code class="language-plaintext highlighter-rouge">config.dtype</code> entry.
Using this feature, most code will be able to use float16 without modification.
Many classes and functions are fixed to support float16 inputs and parameters.</li>
<li><strong>ChainerMN integration</strong>.
<a href="https://github.com/chainer/chainermn">ChainerMN</a> was an add-on package of Chainer for distributed deep learning,
but is now a built-in module of Chainer v5.
The APIs and the usage are not changed; just install <code class="language-plaintext highlighter-rouge">chainer</code> and <code class="language-plaintext highlighter-rouge">mpi4py</code> to start distributed deep learning.</li>
<li><strong>Probability distributions</strong>.
We introduced the <code class="language-plaintext highlighter-rouge">chainer.distributions</code> module that implements many parametric probability distributions with autograd capability.
Each distribution provides point-wise evaluation (e.g. log density), statistics computation, and sampling.
For its implementation, we also added many GPU sampling routines (under <code class="language-plaintext highlighter-rouge">cupy.random</code>) and special functions (e.g. log-gamma function).
While v5 includes many frequently used distributions, we are still expanding this feature for the upcoming releases (a short usage sketch follows this list).</li>
<li><strong>iDeep 2.0</strong>.
Chainer Backend for Intel Architecture, a.k.a. iDeep, is updated.
You can install it with <code class="language-plaintext highlighter-rouge">pip install ideep4py</code>, and use it by setting the environment variable <code class="language-plaintext highlighter-rouge">CHAINER_USE_IDEEP=auto</code>.
There are many performance improvements in this version.</li>
<li><strong>CuPy interoperability with other libraries and ecosystems</strong>.
CuPy ndarray can now be easily combined with other libraries.
For more details, see the <a href="https://docs-cupy.chainer.org/en/v5.0.0/reference/interoperability.html">Interoperability section</a> of the CuPy reference manual.
<ul>
<li>DLPack: <code class="language-plaintext highlighter-rouge">ndarray.toDlpack</code> and <code class="language-plaintext highlighter-rouge">cupy.fromDlpack</code> can be used to interchange the array with other deep learning frameworks.</li>
<li>NumPy: NumPy ufunc is directly applicable to CuPy’s ndarray. For example, <code class="language-plaintext highlighter-rouge">numpy.exp(cupy.arange(3))</code> is valid, which is equivalent to <code class="language-plaintext highlighter-rouge">cupy.exp(cupy.arange(3))</code>.</li>
<li>Numba: Numba’s JITed CUDA kernel is directly applicable to CuPy ndarrays.</li>
</ul>
</li>
</ul>
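<p>As a small illustration of the <code class="language-plaintext highlighter-rouge">chainer.distributions</code> module mentioned above, here is a sketch of point-wise evaluation, statistics, and sampling (assumed usage; see the v5 reference for exact signatures).</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># chainer.distributions sketch (assumed usage).
import numpy as np
import chainer
import chainer.distributions as D

loc = chainer.Variable(np.zeros(3, dtype=np.float32))
dist = D.Normal(loc, scale=np.ones(3, dtype=np.float32))

x = np.array([0.5, -1.0, 2.0], dtype=np.float32)
log_p = dist.log_prob(x)          # differentiable with respect to loc
print(log_p.array)
print(dist.mean.array)            # statistics as Variables
print(dist.sample().shape)        # draw a sample from the distribution
</code></pre></div></div>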
<p>We recommend updating to the latest version of Chainer and CuPy.
You can find the upgrade guide <a href="https://docs.chainer.org/en/latest/upgrade.html">here</a>.
Updating Chainer should be done as usual with the command <code class="language-plaintext highlighter-rouge">pip install -U chainer</code>.
CuPy can be updated in the same way, but be careful to use the appropriate package name if you are using a wheel package (<code class="language-plaintext highlighter-rouge">cupy-cudaNN</code>, where <code class="language-plaintext highlighter-rouge">NN</code> is your CUDA version).</p>
<p>Any feedback to the dev team would be welcomed and appreciated.
You can ask questions or leave comments at <a href="https://gitter.im/chainer">gitter</a>, <a href="https://bit.ly/join-chainer-slack">Slack</a>, <a href="https://groups.google.com/forum/#!forum/chainer">Google Groups</a>, and <a href="https://stackoverflow.com/questions/tagged/chainer">StackOverflow</a>.</p>
Thu, 25 Oct 2018 00:00:00 +0000
https://chainer.org/announcement/2018/10/25/released-v5.html
https://chainer.org/announcement/2018/10/25/released-v5.html
Announcement

ChainerMN on AWS with CloudFormation
<p><em>Japanese version is <a href="https://research.preferred.jp/2018/06/chainermn-on-aws-with-cloudformation/">here</a></em></p>
<p><a href="https://aws.amazon.com/cloudformation/">AWS CloudFormation</a> a service which helps us to practice <a href="https://en.wikipedia.org/wiki/Infrastructure_as_Code"><em>Infrastructure As Code</em></a> on wide varieties of AWS resources. <a href="https://aws.amazon.com/cloudformation/">AWS CloudFormation</a> provisions AWS resources in a repeatable manner and allows us to build and re-build infrastructure without time-consuming manual actions or write custom scripts.</p>
<p>Building distributed deep learning infrastructure requires some extra work, such as installing and configuring deep learning libraries, setting up EC2 instances, and optimizing computational and network performance. In particular, running <a href="https://github.com/chainer/chainermn">ChainerMN</a> requires you to set up an MPI cluster. <a href="https://aws.amazon.com/cloudformation/">AWS CloudFormation</a> helps us automate this process.</p>
<p>Today, we announce a <a href="https://github.com/chainer/chainer-ami">Chainer/ChainerMN pre-installed AMI</a> and a <a href="https://github.com/chainer/chainer-cfn">CloudFormation template for a ChainerMN cluster</a>.</p>
<ul>
<li><a href="https://github.com/chainer/chainer-ami">chainer/chainer-ami</a></li>
<li><a href="https://github.com/chainer/chainer-cfn">chainer/chainer-cfn</a></li>
</ul>
<p>These enable you to spin up a <a href="https://github.com/chainer/chainermn">ChainerMN</a> cluster on AWS and run your <a href="https://github.com/chainer/chainermn">ChainerMN</a> tasks in the cluster right away.</p>
<p>This article explains how to use them and how you can run distributed deep learning with <a href="https://github.com/chainer/chainermn">ChainerMN</a> on AWS.</p>
<h2 id="chainer-ami"><a href="https://github.com/chainer/chainer-ami">Chainer AMI</a></h2>
<p>The <a href="https://github.com/chainer/chainer-ami">Chainer AMI</a> comes with <a href="https://chainer.org">Chainer</a>/<a href="https://cupy.chainer.org/">CuPy</a>/<a href="https://github.com/chainer/chainermn">ChainerMN</a>, its families (<a href="https://github.com/chainer/chainercv">ChianerCV</a> and <a href="https://github.com/chainer/chainerrl">ChainerRL</a>) and <a href="https://developer.nvidia.com/cuda-zone">CUDA</a>-aware <a href="https://www.open-mpi.org/">OpenMPI</a> libraries so that you can run <a href="https://chainer.org">Chainer</a>/<a href="https://github.com/chainer/chainermn">ChainerMN</a> workloads easily on AWS EC2 instances even on ones with GPUs. This image is based on <a href="https://docs.aws.amazon.com/dlami/latest/devguide/overview-base.html">AWS Deep Learning Base AMI</a>.</p>
<p>The latest version is <code class="language-plaintext highlighter-rouge">0.1.0</code>. The version includes:</p>
<ul>
<li>OpenMPI version <code class="language-plaintext highlighter-rouge">2.1.3</code>
<ul>
<li>it was built only for <code class="language-plaintext highlighter-rouge">cuda-9.0</code>.</li>
</ul>
</li>
<li>All Chainer family libraries (built and installed for both the <code class="language-plaintext highlighter-rouge">python</code> and <code class="language-plaintext highlighter-rouge">python3</code> environments)
<ul>
<li><code class="language-plaintext highlighter-rouge">CuPy</code> version <code class="language-plaintext highlighter-rouge">4.1.0</code></li>
<li><code class="language-plaintext highlighter-rouge">Chainer</code> version <code class="language-plaintext highlighter-rouge">4.1.0</code>,</li>
<li><code class="language-plaintext highlighter-rouge">ChainerMN</code>, version <code class="language-plaintext highlighter-rouge">1.3.0</code></li>
<li><code class="language-plaintext highlighter-rouge">ChainerCV</code> version <code class="language-plaintext highlighter-rouge">0.9.0</code></li>
<li><code class="language-plaintext highlighter-rouge">ChainerRL</code> version <code class="language-plaintext highlighter-rouge">0.3.0</code></li>
</ul>
</li>
</ul>
<h2 id="cloudformation-template-for-chainermn"><a href="https://github.com/chainer/chainer-cfn">CloudFormation Template For ChainerMN</a></h2>
<p>This template automatically sets up a <a href="https://github.com/chainer/chainermn">ChainerMN</a> cluster on AWS. Here’s the setup overview for AWS resources:</p>
<ul>
<li>VPC and Subnet for the cluster (you can configure existing VPC/Subnet)</li>
<li>S3 Bucket for sharing ephemeral ssh-key, which is used to communicate among MPI processes in the cluster</li>
<li>Placement group for optimizing network performance</li>
<li>ChainerMN cluster which consists of:
<ul>
<li><code class="language-plaintext highlighter-rouge">1</code> master EC2 instance</li>
<li><code class="language-plaintext highlighter-rouge">N (>=0)</code> worker instances (via AutoScalingGroup)</li>
<li><code class="language-plaintext highlighter-rouge">chainer</code> user to run mpi job in each instance</li>
<li><code class="language-plaintext highlighter-rouge">hostfile</code> to run mpi job in each instance</li>
</ul>
</li>
<li>(Option) <a href="https://aws.amazon.com/efs/features/">Amazon Elastic Filesystem</a> (you can configure an existing filesystem)
<ul>
<li>This is mounted on cluster instances automatically to share your code and data.</li>
</ul>
</li>
<li>Several required SecurityGroups, IAM Role</li>
</ul>
<p>The latest version is <code class="language-plaintext highlighter-rouge">0.1.0</code>. Please see <a href="https://s3-us-west-2.amazonaws.com/chainer-cfn/chainer-cfn-v0.1.0.template">the latest template</a> for detailed resource definitions.</p>
<p>As stated on our <a href="https://chainer.org/general/2018/05/25/chainermn-v1-3.html">recent blog on ChainerMN 1.3.0</a>, using new features (double buffering and all-reduce in half-precision floats) enables almost linear scalability on AWS even at ethernet speeds.</p>
<h2 id="how-to-build-a-chainermn-cluster-with-the-cloudformation-template">How to build a <a href="https://github.com/chainer/chainermn">ChainerMN</a> Cluster with the <a href="https://github.com/chainer/chainer-cfn">CloudFormation Template</a></h2>
<p>This section explains how to set up a <a href="https://github.com/chainer/chainermn">ChainerMN</a> cluster on AWS in a step-by-step manner.</p>
<p>First, click the link below to create an <a href="https://aws.amazon.com/cloudformation/">AWS CloudFormation</a> stack, and then just click ‘Next’ on the page that opens.</p>
<p><a href="https://console.aws.amazon.com/cloudformation/home#/stacks/new?stackName=chainermn-sample&templateURL=https://s3-us-west-2.amazonaws.com/chainer-cfn/chainer-cfn-v0.1.0.template"><img src="https://s3.amazonaws.com/cloudformation-examples/cloudformation-launch-stack.png" alt="launch stack" /></a></p>
<p>On the “Specify Details” page, you can configure the stack name, VPC/subnet, cluster, and EFS parameters. The screenshot below is an example of configuring <code class="language-plaintext highlighter-rouge">4</code> <code class="language-plaintext highlighter-rouge">p3.16xlarge</code> instances, each of which has 8 NVIDIA Tesla V100 GPUs.</p>
<p><img src="/images/chainer-cfn-specifying-details.png" alt="chainer-cfn-specifying-details" /></p>
<p>On the last confirmation page, you will need to check a box in the CAPABILITY section because this template creates IAM roles for the cluster instances.</p>
<p><img src="/images/chainer-cfn-capabilities-confirmation.png" alt="chainer-cfn-specifying-details" /></p>
<p>After several minutes (depending on cluster size), the status of the stack should converge to <code class="language-plaintext highlighter-rouge">CREATE_COMPLETE</code> if all went well, meaning your cluster is ready. You can access the cluster with <code class="language-plaintext highlighter-rouge">ClusterMasterPublicDNS</code> which will appear in the output section of the stack.</p>
<h2 id="how-to-run-chainermn-job-in-the-cluster">How to run <a href="https://github.com/chainer/chainermn">ChainerMN</a> Job in the Cluster</h2>
<p>You can access the cluster instances with the key pair specified in the template parameters.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ssh -i keypair.pem [email protected]
</code></pre></div></div>
<p>Because <a href="https://github.com/chainer/chainer-ami">Chainer AMI</a> comes with all required libraries to run <a href="https://chainer.org">Chainer</a>/<a href="https://github.com/chainer/chainermn">ChainerMN</a> jobs, you only need to download your code to the instances.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># switch user to chainer
ubuntu@ip-ww-xxx-yy-zzz$ sudo su chainer
# download ChainerMN's train_mnist.py into EFS
chainer@ip-ww-xxx-yy-zzz$ wget https://raw.githubusercontent.com/chainer/chainermn/v1.3.0/examples/mnist/train_mnist.py -O /efs/train_mnist.py
</code></pre></div></div>
<p>That’s it! Now, you can run MNIST example with <a href="https://github.com/chainer/chainermn">ChainerMN</a> by just invoking <code class="language-plaintext highlighter-rouge">mpiexec</code> command.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># It will spawn 32 processes(-n option) among 4 instances (8 processes per instance (-N option))
chainer@ip-ww-xxx-yy-zzz$ mpiexec -n 32 -N 8 python /efs/train_mnist.py -g
...(you will see ssh warning here)
==========================================
Num process (COMM_WORLD): 32
Using GPUs
Using hierarchical communicator
Num unit: 1000
Num Minibatch-size: 100
Num epoch: 20
==========================================
epoch main/loss validation/main/loss main/accuracy validation/main/accuracy elapsed_time
1 0.795527 0.316611 0.765263 0.907536 4.47915
...
19 0.00540187 0.0658256 0.999474 0.979351 14.7716
20 0.00463723 0.0668939 0.998889 0.978882 15.2248
# NOTE: above output is actually the output of the second try because mnist dataset download is needed in the first try.
</code></pre></div></div>
Fri, 01 Jun 2018 00:00:00 +0000
https://chainer.org/general/2018/06/01/chainermn-on-aws-with-cloudformation.html
https://chainer.org/general/2018/06/01/chainermn-on-aws-with-cloudformation.html
General

Open source deep learning framework Chainer officially supported by Amazon Web Services
<p>Chainer has worked with Amazon Web Services (AWS) to provide access to the Chainer deep learning framework as a listed choice across many AWS applications. Chainer provides straightforward calculation of deep neural networks in Python. The combination with AWS leverages Chainer’s exceptional abilities in multi-GPU and multi-server scaling, as demonstrated when <a href="https://www.preferred-networks.jp/docs/imagenet_in_15min.pdf">PFN trained ResNet50 on ImageNet-1K using Chainer in 15 minutes</a>, four times faster than the previous record held by Facebook.</p>
<p>Usage of multi-GPU and multi-server scaling allows researchers to leverage the ability of the cloud to provide computing resources on demand. Chainer’s unparalleled ability for parallel computing combined with AWS cloud resources available on demand enables researchers and engineers to minimize their cost while training complex deep learning models in a fraction of the time required on more limited hardware.</p>
<p>Chainer is already available as part of the AWS Deep Learning Amazon Machine Image (AMI). This is further enhanced by Chainer’s recent release of a CloudFormation script, which enables easy deployment of multiple Chainer AMIs at a time. Chainer has been tested to provide 95% scaling efficiency up to 32 GPUs on AWS, which means training of a neural network can be done up to thirty times as fast.</p>
<p>To simplify the process of pre-processing data, tuning hyperparameters, and deploying a neural network, Chainer is now supported on Amazon SageMaker. Amazon SageMaker is a fully-managed platform that enables developers and data scientists to quickly and easily build, train, and deploy machine learning models at any scale. Using Chainer on SageMaker will provide speed increases from parallelization, in addition to the deployment benefits of SageMaker.</p>
<p>In an additional announcement, AWS now supports Chainer on AWS Greengrass, the AWS service that lets you run local compute, messaging, data caching, sync, and ML inference capabilities for connected devices in a secure way. Combined with Amazon SageMaker, this allows access to the ease and speed of Chainer when training models on SageMaker and direct deployment on AWS Greengrass to IoT devices.</p>
<p>The Chainer team is excited about these releases by AWS and looks forward to providing further advances as deep learning techniques continue to advance.</p>
Fri, 01 Jun 2018 00:00:00 +0000
https://chainer.org/general/2018/06/01/chainer-officially-supported-by-aws.html
https://chainer.org/general/2018/06/01/chainer-officially-supported-by-aws.html
General

New ChainerMN functions for improved performance in cloud environments and performance testing results on AWS
<p>ChainerMN is a package that adds multi-node distributed learning functionality to Chainer. We have added the following two new functions to v1.2.0 and v1.3.0 of ChainerMN, which are intended to improve the performance on systems whose inter-node communication bandwidth is low.</p>
<ul>
<li>Double buffering to conceal communication time</li>
<li>All-Reduce function in half-precision floats (FP16)</li>
</ul>
<p>It had previously been difficult to achieve high parallel performance in a system environment without a high-speed network, because ChainerMN was developed assuming a supercomputer-like system with a high-speed network. With these newly added functions, ChainerMN can achieve high parallel performance even in the cloud and on other common systems such as Amazon Web Services (AWS), as we presented at GTC 2018.</p>
<h2 id="background">Background</h2>
<p>In data-parallel distributed deep learning, the training time is typically dominated by the All-Reduce operation that computes the sum of the gradients calculated on each node. We solved this issue on PFN’s 1,024-GPU supercomputer by utilizing the high-speed InfiniBand interconnect, which is also used in supercomputers and Microsoft Azure, together with the NVIDIA Collective Communications Library (NCCL), which enables fast execution of All-Reduce [1]. However, AWS and other commonly used systems have larger communication overhead because they do not have a high-speed interconnect such as InfiniBand. As a result, in some cases we could not make training faster simply by increasing the number of nodes. To solve these issues, we have added two functions to ChainerMN v1.2.0 and v1.3.0: a double buffering function to conceal communication time and an All-Reduce function in FP16.</p>
<h2 id="function-to-conceal-communication-time-by-double-buffering">Function to conceal communication time by double buffering</h2>
<p>This function conceals the time it takes to communicate and shortens the overall computation time by having computation (forward, backward, and optimize) and communication (All-Reduce) processes overlapped. Normally in ChainerMN, one iteration consists of the four steps in the below diagram: forward, backward, All-Reduce, and optimize.</p>
<p><img src="https://chainer.org/assets/chainermn_v1_3_chainermn_iter.jpg" alt="Iteration" /></p>
<p>Using double buffering to conceal the communication time, the calculation and communication processes can overlap as in the below diagram.</p>
<p><img src="https://chainer.org/assets/chainermn_v1_3_dbuf_iter.jpg" alt="Double buffering iteration" /></p>
<p>In this case, the optimization process is performed using gradients in the previous iteration. This means it uses old gradients to optimize the model, possibly affecting accuracy. We have learned, however, that almost the same level of accuracy can be maintained when training on ImageNet as demonstrated in the experiment described later in this article. </p>
<p>You can use this function just by making <code class="language-plaintext highlighter-rouge">double_buffering=True</code> when creating a multi-node optimizer as shown below.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">optimizer</span> <span class="o">=</span> <span class="n">chainermn</span><span class="p">.</span><span class="n">create_multi_node_optimizer</span><span class="p">(</span><span class="n">optimizer</span><span class="p">,</span> <span class="n">comm</span><span class="p">,</span> <span class="n">double_buffering</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
</code></pre></div></div>
<p>Currently, this function only supports the <code class="language-plaintext highlighter-rouge">pure_nccl</code> communicator.</p>
<h2 id="all-reduce-function-in-fp16">All-Reduce function in FP16</h2>
<p>ChainerMN v1.2.0 only supported All-Reduce in FP32, but v1.3.0 supports FP16 as well. This allows you to perform distributed training with ChainerMN even for FP16 models. We can expect a significant reduction in All-Reduce time when using FP16 because the communication volume is halved in comparison with FP32.
In addition, you can now use FP16 only for All-Reduce and thus reduce the All-Reduce time, even if you use FP32 for computation. This is the technique we employed for training on ImageNet using 1,024 GPUs [1].</p>
<p>For FP16 models, All-Reduce is carried out in FP16 without making any change. You can use different data types for computation and All-Reduce by putting <code class="language-plaintext highlighter-rouge">allreduce_grad_dtype='float16'</code> when creating a communicator as shown below.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">comm</span> <span class="o">=</span> <span class="n">chainermn</span><span class="p">.</span><span class="n">create_communicator</span><span class="p">(</span><span class="s">'pure_nccl'</span><span class="p">,</span> <span class="n">allreduce_grad_dtype</span><span class="o">=</span><span class="s">'float16'</span><span class="p">)</span>
</code></pre></div></div>
<p>As of today, this function, like double buffering, only supports the <code class="language-plaintext highlighter-rouge">pure_nccl</code> communicator.</p>
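<p>Putting the two snippets above together, a typical setup that enables both features looks like the following (a sketch based on the snippets in this post; the model is a placeholder and the surrounding training code is omitted).</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Combined sketch: FP16 All-Reduce plus double buffering (pure_nccl only).
import chainer
import chainermn

comm = chainermn.create_communicator('pure_nccl', allreduce_grad_dtype='float16')

model = chainer.links.Linear(784, 10)  # placeholder model for illustration
optimizer = chainer.optimizers.MomentumSGD(momentum=0.9)
optimizer = chainermn.create_multi_node_optimizer(optimizer, comm, double_buffering=True)
optimizer.setup(model)
</code></pre></div></div>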
<h2 id="results">Results</h2>
<p>To demonstrate high parallel performance using the two new functions, we measured performance using image classification datasets on ImageNet. We used ResNet-50 as the CNN model. In this experiment, we used 10Gb Ethernet of PFN’s supercomputer MN-1 and AWS as low-speed networks. For more details on the experiment setting, please refer to the appendix at the end of this article.</p>
<h3 id="evaluation-using-10gb-ethernet">Evaluation using 10Gb Ethernet</h3>
<p>The following graphs show changes in the throughput as the number of GPUs is increased in three cases using MN-1: InfiniBand FDR, 10 Gb Ethernet, and 10 Gb Ethernet using the two new functions.</p>
<p><img src="https://chainer.org/assets/chainermn_v1_3_mn_1_result.png" alt="Throughput using 10 Gb Ethernet" /></p>
<p>As you can see in the figure, the performance did not improve even as we increased the number of GPUs when using the 10Gb Ethernet while the use of the new functions enabled it to achieve the ideal speedup with the performance scaling linearly with the number of GPUs.</p>
<p>The following table also shows the average validation accuracy and average training hours when conducting training for five times with the number of epochs = 90 and 32 GPUs.</p>
<table>
<thead>
<tr>
<th> </th>
<th>Validation Accuracy (%)</th>
<th>Computing Time (hours)</th>
</tr>
</thead>
<tbody>
<tr>
<td>InfiniBand FDR</td>
<td>76.4</td>
<td>6.96</td>
</tr>
<tr>
<td>10 Gb Ethernet</td>
<td>76.4</td>
<td>21.3</td>
</tr>
<tr>
<td>10 Gb Ethernet + Double Buffering + FP16 Allreduce</td>
<td>75.8</td>
<td>7.71</td>
</tr>
</tbody>
</table>
<p>As you can see, the two new functions had almost no impact on accuracy. Meanwhile, training took only 11% longer when using 10 Gb Ethernet with the new functions than when using InfiniBand FDR. With this, we can conclude that high parallel performance can be achieved while maintaining accuracy, without the need for InfiniBand or other high-speed networks.</p>
<h3 id="evaluation-using-aws">Evaluation using AWS</h3>
<p>In testing with AWS, we used p3.16xlarge instances. This instance type has eight V100 GPUs, the highest-performance GPU available as of May 2018. The following graphs show changes in the throughput as the number of GPUs increased when using this instance type.</p>
<p><img src="https://chainer.org/assets/chainermn_v1_3_aws_result.png" alt="Throughput using AWS" /></p>
<p>Scaling efficiency is an indicator often used to measure parallel performance. In this experiment, the scaling efficiency \(e\) is defined by the following equation, where \(t_0\) is the base throughput and \(t\) is the throughput when \(n\) times the base number of GPUs is used.</p>
\[e = \frac{t}{t_0 \times n}\]
<p>It indicates that the closer \(e\) gets to 1 (100%), the higher the parallel performance is. In this experiment, the scaling efficiency was 96% at 32 GPUs when using 8 GPUs as the base, demonstrating that high parallel performance has been achieved by using the new functions.</p>
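<p>As a concrete reading of this equation: with the 8-GPU throughput as the base, 32 GPUs correspond to \(n = 4\), so a measured throughput of about \(3.84 \times t_0\) gives \(e = 3.84 t_0 / (4 t_0) = 0.96\), which is the 96% reported above.</p>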
<h2 id="outlook-for-the-future">Outlook for the future</h2>
<p>We plan to add more functions to ChainerMN, including model parallelism to support various training models that are not achievable with data parallelism, as well as a function to improve fault tolerance. Our team is not only developing ChainerMN but also putting effort into making Chainer and CuPy faster, and carrying out large-scale research and development activities by making full use of MN-1, which is equipped with 1,024 P100 units, and the next-generation cluster with 512 V100 units. If you are interested in working with us on these activities, send us your application!</p>
<h2 id="appendix">Appendix</h2>
<h3 id="performance-measurement-details">Performance measurement details</h3>
<h4 id="experiment-setup">Experiment setup</h4>
<ul>
<li>Dataset: ImageNet-1k</li>
<li>Model: ResNet-50 (input image size 224×224)</li>
</ul>
<h4 id="setup-for-measuring-throughputs">Setup for measuring throughputs</h4>
<ul>
<li>Batch size: 64</li>
<li>Learning rate: fixed</li>
<li>Data augmentation: the same method as Goyal et al. [2]</li>
<li>Optimization: Momentum SGD (momentum=0.9)</li>
<li>Weight decay: 0.0001</li>
<li># of measurements: 400 iterations</li>
</ul>
<h4 id="setting-for-training-with--of-epochs--90">Setting for training with # of epochs = 90</h4>
<ul>
<li>Batch size: 64 per GPU until the 30th epoch, 128 afterwards</li>
<li>Learning rate: gradual warmup until the 5th epoch, then scaled by 0.2 at the 30th epoch and by 0.1 at the 60th and 80th epochs</li>
<li>Data augmentation: the same method as Goyal et al. [2]</li>
<li>Optimization: Momentum SGD (momentum=0.9)</li>
<li>Weight decay: 0.0001</li>
<li># of epochs: 90</li>
<li>In general, this setup is based on Goyal et al. [2] and uses the technique described in Smith et al. [3].</li>
</ul>
<h4 id="experiment-conditions-in-the-verification-test-using-10gb-ethernet">Experiment conditions in the verification test using 10Gb Ethernet</h4>
<ul>
<li>Max 4 nodes, 32 GPUs in total</li>
<li>Node
<ul>
<li>GPU: 8 * NVIDIA Tesla P100 GPUs</li>
<li>CPU: 2 * Intel Xeon E5-2667 processors (3.20 GHz, 8 cores)</li>
<li>Network: InfiniBand FDR</li>
<li>Save location for training data: local disk</li>
</ul>
</li>
</ul>
<h4 id="experiment-conditions-in-the-verification-test-using-aws">Experiment conditions in the verification test using AWS</h4>
<ul>
<li>Max 4 nodes, 32 GPUs in total</li>
<li>Node(p3.16xlarge)
<ul>
<li>GPU: 8 * NVIDIA Tesla V100 GPUs</li>
<li>CPU: 64 vCPUs</li>
<li>Network: 25 Gbps network</li>
<li>Save location for training data: RAM disk</li>
</ul>
</li>
</ul>
<h2 id="references">References</h2>
<p>[1] Akiba, T., et al. Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes. CoRR, abs/1711.04325, 2017.</p>
<p>[2] Goyal, P., et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour. CoRR, abs/1706.02677, 2017.</p>
<p>[3] Smith, S. L., et al. Don’t Decay the Learning Rate, Increase the Batch Size. CoRR, abs/1711.00489, 2017.</p>
Fri, 25 May 2018 00:00:00 +0000
https://chainer.org/general/2018/05/25/chainermn-v1-3.html
GeneralChainerMN on Kubernetes with GPUs<p><a href="https://kubernetes.io/">Kubernetes</a> is today the most popular open-source system for automating deployment, scaling, and management of containerized applications. With the rise of <a href="https://kubernetes.io/">Kubernetes</a>, many companies are running <a href="https://kubernetes.io/">Kubernetes</a> as a platform for various workloads, including web applications, databases, cron jobs, and so on. Machine learning workloads, including deep learning workloads, are no exception, even though such workloads require special hardware like GPUs.</p>
<p><a href="https://kubernetes.io/">Kubernetes</a> can <a href="https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/">schedule NVIDIA GPUs by default</a>. So, single node <a href="https://chainer.org/">Chainer</a> workloads are straightforward. You can simply launch a <code class="language-plaintext highlighter-rouge">Pod</code> or a <code class="language-plaintext highlighter-rouge">Job</code> with <a href="https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/"><code class="language-plaintext highlighter-rouge">nvidia.com/gpu</code> resource request</a>.</p>
<p>However, running <a href="https://github.com/chainer/chainermn/">ChainerMN</a> on <a href="https://kubernetes.io/">Kubernetes</a> is not straightforward because it requires setting up an MPI cluster. <a href="https://github.com/kubeflow/kubeflow">Kubeflow</a> can be a big help here. The <a href="https://github.com/kubeflow/kubeflow">Kubeflow</a> project is dedicated to making deployments of machine learning (ML) workflows on <a href="https://kubernetes.io/">Kubernetes</a> simple, portable, and scalable. Please refer to the two helpful slides below about <a href="https://github.com/kubeflow/kubeflow">Kubeflow</a>, which were presented at <a href="https://events.linuxfoundation.org/events/kubecon-cloudnativecon-europe-2018/">KubeCon + CloudNativeCon Europe 2018</a>.</p>
<ul>
<li><a href="http://sched.co/Duoq">Keynote: Cloud Native ML on Kubernetes - David Aronchick, Product Manager, Cloud AI and Co-Founder of Kubeflow, Google & Vishnu Kannan, Sr. Software Engineer, Google</a></li>
<li><a href="http://sched.co/Drnd">Kubeflow Deep Dive – David Aronchick & Jeremy Lewi, Google</a></li>
</ul>
<p>In this article, I would like to explain how to run <a href="https://github.com/chainer/chainermn/">ChainerMN</a> workloads on <a href="https://kubernetes.io/">Kubernetes</a> with the help of <a href="https://github.com/kubeflow/kubeflow">Kubeflow</a>.</p>
<h2 id="how-to-run-chainermn-on-kubernetes">How to run ChainerMN on Kubernetes</h2>
<p>I explain it in three steps below:</p>
<ul>
<li><a href="#step-1-build-your-docker-image">Step 1. Build Your Container Image</a></li>
<li><a href="#step-2-install-kubeflows-openmpi-package">Step 2. Install Kubeflow’s OpenMPI package</a></li>
<li><a href="#step-3-run-chainermn-on-kubernetes">Step 3. Run ChainerMN on Kubernetes</a></li>
</ul>
<h3 id="prerequisites">Prerequisites</h3>
<ul>
<li><a href="https://kubernetes.io/">Kubernetes</a> cluster equipped with Nvidia GPUs</li>
<li>on your local machine
<ul>
<li><a href="https://www.docker.com/community-edition">docker</a></li>
<li><a href="https://kubernetes.io/docs/tasks/tools/install-kubectl/">kubectl</a></li>
<li><a href="https://ksonnet.io/">ksonnnet</a></li>
</ul>
</li>
</ul>
<h3 id="step-1-build-your-container-image">Step 1. Build Your Container Image</h3>
<p>First, we need to build a container image to run your deep learning workload with ChainerMN. We can simply follow <a href="http://chainermn.readthedocs.io/en/stable/installation/index.html">the official ChainerMN installation guide</a>.</p>
<p>For <a href="https://chainer.org/">Chainer</a>/<a href="https://cupy.chainer.org/">CuPy</a>, the official Docker image <a href="https://hub.docker.com/r/chainer/chainer/"><code class="language-plaintext highlighter-rouge">chainer/chainer</code></a> is available on Docker Hub. It is very handy as a base or runtime image for deep learning workloads because it is already <code class="language-plaintext highlighter-rouge">nvidia-docker</code> ready.</p>
<p>Below is a sample <code class="language-plaintext highlighter-rouge">Dockerfile</code> that installs CUDA-aware <a href="https://www.open-mpi.org/">OpenMPI</a>, <a href="https://github.com/chainer/chainermn">ChainerMN</a>, and its sample <code class="language-plaintext highlighter-rouge">train_mnist.py</code> script. Please save the contents as <code class="language-plaintext highlighter-rouge">Dockerfile</code>.</p>
<div class="language-docker highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> chainer/chainer:v4.0.0-python3</span>
<span class="k">ARG</span><span class="s"> OPENMPI_VERSION="2.1.3"</span>
<span class="k">ARG</span><span class="s"> CHAINER_MN_VERSION="1.2.0"</span>
<span class="c"># Install basic dependencies and locales</span>
<span class="k">RUN </span>apt-get update <span class="o">&&</span> apt-get <span class="nb">install</span> <span class="nt">-yq</span> <span class="nt">--no-install-recommends</span> <span class="se">\
</span> locales wget <span class="nb">sudo </span>ca-certificates ssh build-essential <span class="o">&&</span> <span class="se">\
</span> <span class="nb">rm</span> <span class="nt">-rf</span> /var/lib/apt/lists/<span class="k">*</span> /var/cache/apt/archives/<span class="k">*</span> <span class="o">&&</span> <span class="se">\
</span> <span class="nb">echo</span> <span class="s2">"en_US.UTF-8 UTF-8"</span> <span class="o">></span> /etc/locale.gen <span class="o">&&</span> locale-gen
<span class="c"># Install OpenMPI with cuda</span>
<span class="k">RUN </span><span class="nb">cd</span> /tmp <span class="o">&&</span> <span class="se">\
</span> wget <span class="nt">-q</span> https://www.open-mpi.org/software/ompi/v<span class="k">${</span><span class="nv">OPENMPI_VERSION</span><span class="p">%\.*</span><span class="k">}</span>/downloads/openmpi-<span class="nv">$OPENMPI_VERSION</span>.tar.bz2 <span class="o">&&</span> <span class="se">\
</span> <span class="nb">tar</span> <span class="nt">-xjf</span> openmpi-<span class="nv">$OPENMPI_VERSION</span>.tar.bz2 <span class="o">&&</span> <span class="se">\
</span> <span class="nb">cd</span> /tmp/openmpi-<span class="nv">$OPENMPI_VERSION</span> <span class="o">&&</span> <span class="se">\
</span> ./configure <span class="nt">--prefix</span><span class="o">=</span>/usr <span class="nt">--with-cuda</span> <span class="o">&&</span> make <span class="nt">-j2</span> <span class="o">&&</span> make <span class="nb">install</span> <span class="o">&&</span> <span class="nb">rm</span> <span class="nt">-r</span> /tmp/openmpi-<span class="nv">$OPENMPI_VERSION</span><span class="k">*</span> <span class="o">&&</span> <span class="se">\
</span> ompi_info <span class="nt">--parsable</span> <span class="nt">--all</span> | <span class="nb">grep</span> <span class="nt">-q</span> <span class="s2">"mpi_built_with_cuda_support:value:true"</span>
<span class="c"># Install ChainerMN</span>
<span class="k">RUN </span>pip3 <span class="nb">install </span><span class="nv">chainermn</span><span class="o">==</span><span class="nv">$CHAINER_MN_VERSION</span>
<span class="c"># Download train_mnist.py example of ChainerMN</span>
<span class="c"># In practice, you would download your codes here.</span>
<span class="k">RUN </span><span class="nb">mkdir</span> <span class="nt">-p</span> /chainermn-examples/mnist <span class="o">&&</span> <span class="se">\
</span> <span class="nb">cd</span> /chainermn-examples/mnist <span class="o">&&</span> <span class="se">\
</span> wget https://raw.githubusercontent.com/chainer/chainermn/v<span class="k">${</span><span class="nv">CHAINER_MN_VERSION</span><span class="k">}</span>/examples/mnist/train_mnist.py
</code></pre></div></div>
<p>Then you are ready to build and push your container image.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># This takes some time (probably 10-15 min.). please enjoy ☕️.</span>
docker build <span class="nb">.</span> <span class="nt">-t</span> YOUR_IMAGE_HERE
docker push YOUR_IMAGE_HERE
</code></pre></div></div>
<h3 id="step-2-install-kubeflows-openmpi-package">Step 2. Install Kubeflow’s OpenMPI package</h3>
<p><a href="https://github.com/kubeflow/kubeflow/tree/master/kubeflow/openmpi/">Kubeflow’s OpenMPI package</a> in <a href="https://github.com/kubeflow/kubeflow">Kubeflow</a> enables us launch <a href="https://www.open-mpi.org/">OpenMPI</a> cluster on <a href="https://kubernetes.io/">Kubernetes</a> very easily.</p>
<p>Actually, <strong><a href="https://github.com/kubeflow/kubeflow/blob/master/kubeflow/openmpi">Kubeflow’s OpenMPI package</a> has not been released officially yet</strong>, but it is already available in the <code class="language-plaintext highlighter-rouge">master</code> branch of the <a href="https://github.com/kubeflow/kubeflow">Kubeflow</a> repository, so let’s use it. Please note that this package is still under development.</p>
<p>Kubeflow depends on <a href="https://ksonnet.io/">ksonnet</a>. If you’re not familiar with <a href="https://ksonnet.io/">ksonnet</a>, I recommend following <a href="https://ksonnet.io/docs/tutorial">their official tutorial</a>.</p>
<p>The steps are very similar to those described in <a href="https://github.com/kubeflow/kubeflow/blob/master/kubeflow/openmpi/">Kubeflow’s OpenMPI package</a>. I have modified the original steps slightly because we have to use a specific commit of the <a href="https://github.com/kubeflow/kubeflow">Kubeflow</a> repository.</p>
<p><em>NOTE: If you face <a href="https://developer.github.com/v3/#rate-limiting">rate limit errors</a> from the GitHub API, please set up <code class="language-plaintext highlighter-rouge">GITHUB_TOKEN</code> as described <a href="https://github.com/kubeflow/kubeflow#github-tokens">here</a>.</em></p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Create a namespace for kubeflow deployment.</span>
<span class="nv">NAMESPACE</span><span class="o">=</span>kubeflow
kubectl create namespace <span class="k">${</span><span class="nv">NAMESPACE</span><span class="k">}</span>
<span class="c"># Generate one-time ssh keys used by Open MPI.</span>
<span class="nv">SECRET</span><span class="o">=</span>openmpi-secret
<span class="nb">mkdir</span> <span class="nt">-p</span> .tmp
<span class="nb">yes</span> | ssh-keygen <span class="nt">-N</span> <span class="s2">""</span> <span class="nt">-f</span> .tmp/id_rsa
kubectl delete secret <span class="k">${</span><span class="nv">SECRET</span><span class="k">}</span> <span class="nt">-n</span> <span class="k">${</span><span class="nv">NAMESPACE</span><span class="k">}</span> <span class="o">||</span> <span class="nb">true
</span>kubectl create secret generic <span class="k">${</span><span class="nv">SECRET</span><span class="k">}</span> <span class="nt">-n</span> <span class="k">${</span><span class="nv">NAMESPACE</span><span class="k">}</span> <span class="nt">--from-file</span><span class="o">=</span><span class="nv">id_rsa</span><span class="o">=</span>.tmp/id_rsa <span class="nt">--from-file</span><span class="o">=</span>id_rsa.pub<span class="o">=</span>.tmp/id_rsa.pub <span class="nt">--from-file</span><span class="o">=</span><span class="nv">authorized_keys</span><span class="o">=</span>.tmp/id_rsa.pub
<span class="c"># Which version of Kubeflow to use.</span>
<span class="c"># For a list of releases refer to:</span>
<span class="c"># https://github.com/kubeflow/kubeflow/releases</span>
<span class="c"># (Specific commit hash is specified here.)</span>
<span class="nv">VERSION</span><span class="o">=</span>e2fbf9e25e087eeb6ee1f9414526c6ed917c4bf9
<span class="c"># Initialize a ksonnet app. Set the namespace for it's default environment.</span>
<span class="nv">APP_NAME</span><span class="o">=</span>chainermn-example
ks init <span class="k">${</span><span class="nv">APP_NAME</span><span class="k">}</span>
<span class="nb">cd</span> <span class="k">${</span><span class="nv">APP_NAME</span><span class="k">}</span>
ks <span class="nb">env set </span>default <span class="nt">--namespace</span> <span class="k">${</span><span class="nv">NAMESPACE</span><span class="k">}</span>
<span class="c"># Install Kubeflow components.</span>
ks registry add kubeflow github.com/kubeflow/kubeflow/tree/<span class="k">${</span><span class="nv">VERSION</span><span class="k">}</span>/kubeflow
ks pkg <span class="nb">install </span>kubeflow/openmpi@<span class="k">${</span><span class="nv">VERSION</span><span class="k">}</span>
</code></pre></div></div>
<h3 id="step-3-run-chainermn">Step 3. Run ChainerMN!</h3>
<p>Now we are ready to run the distributed <code class="language-plaintext highlighter-rouge">train_mnist.py</code>! Following the standard <a href="https://ksonnet.io/">ksonnet</a> way, we first generate a <em><code class="language-plaintext highlighter-rouge">train-mnist</code> component</em> from the <em><code class="language-plaintext highlighter-rouge">openmpi</code> prototype</em>.</p>
<p>When generating a component, we can specify several <em>parameters</em>. In this example, we specify</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">train-mnist</code> for its name,</li>
<li><code class="language-plaintext highlighter-rouge">4</code> workers,</li>
<li><code class="language-plaintext highlighter-rouge">1</code> GPU for each worker, and</li>
<li>launching the <code class="language-plaintext highlighter-rouge">mpiexec ... train_mnist.py</code> script for the <code class="language-plaintext highlighter-rouge">exec</code> param</li>
</ul>
<p>Then, the <code class="language-plaintext highlighter-rouge">ks apply</code> command deploys our <a href="https://www.open-mpi.org/">OpenMPI</a> cluster on the <a href="https://kubernetes.io/">Kubernetes</a> cluster.</p>
<p><em>Please be advised that this step requires authorization to create service accounts and cluster role bindings for the “view” cluster role. If you do not have such authorization, you will have to ask your administrator to create a service account that is granted the ‘get’ verb on ‘pods’ resources. Once such a service account is ready, set it in the <code class="language-plaintext highlighter-rouge">serviceAccountName</code> param of the <code class="language-plaintext highlighter-rouge">train-mnist</code> component; a hedged sketch follows.</em></p>
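<p>As a minimal sketch of that administrator step (the names <code class="language-plaintext highlighter-rouge">openmpi-sa</code> and <code class="language-plaintext highlighter-rouge">pod-reader</code> are illustrative, not prescribed by Kubeflow), the service account could be created and wired up like this:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Create a namespaced service account with read access to pods.
kubectl create serviceaccount openmpi-sa -n ${NAMESPACE}
kubectl create role pod-reader -n ${NAMESPACE} --verb=get,list --resource=pods
kubectl create rolebinding openmpi-sa-pod-reader -n ${NAMESPACE} \
  --role=pod-reader --serviceaccount=${NAMESPACE}:openmpi-sa
# After generating the component below, point it at the service account.
ks param set train-mnist serviceAccountName openmpi-sa
</code></pre></div></div>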
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># See the list of supported parameters.</span>
ks prototype describe openmpi
<span class="c"># Generate openmpi components.</span>
<span class="nv">COMPONENT</span><span class="o">=</span>train-mnist
<span class="nv">IMAGE</span><span class="o">=</span>YOUR_IMAGE_HERE
<span class="nv">WORKERS</span><span class="o">=</span>4
<span class="nv">GPU</span><span class="o">=</span>1
<span class="nv">EXEC</span><span class="o">=</span><span class="s2">"mpiexec -n </span><span class="k">${</span><span class="nv">WORKERS</span><span class="k">}</span><span class="s2"> --hostfile /kubeflow/openmpi/assets/hostfile --allow-run-as-root --display-map -- python3 /chainermn-examples/mnist/train_mnist.py -g"</span>
ks generate openmpi <span class="k">${</span><span class="nv">COMPONENT</span><span class="k">}</span> <span class="nt">--image</span> <span class="k">${</span><span class="nv">IMAGE</span><span class="k">}</span> <span class="nt">--secret</span> <span class="k">${</span><span class="nv">SECRET</span><span class="k">}</span> <span class="nt">--workers</span> <span class="k">${</span><span class="nv">WORKERS</span><span class="k">}</span> <span class="nt">--gpu</span> <span class="k">${</span><span class="nv">GPU</span><span class="k">}</span> <span class="nt">--exec</span> <span class="s2">"</span><span class="k">${</span><span class="nv">EXEC</span><span class="k">}</span><span class="s2">"</span>
<span class="c"># Deploy to your cluster.</span>
ks apply default
<span class="c"># Clean up, execute below two commands</span>
<span class="c"># ks delete default</span>
<span class="c"># kubectl delete secret ${SECRET}</span>
</code></pre></div></div>
<p>This launches <code class="language-plaintext highlighter-rouge">1</code> master pod, <code class="language-plaintext highlighter-rouge">4</code> worker pods, and some supplemental resources. Once the <code class="language-plaintext highlighter-rouge">train-mnist-master</code> pod reaches the <code class="language-plaintext highlighter-rouge">Running</code> state, training logs can be seen.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Inspect pods status</span>
<span class="c"># Wait until all pods are 'Running'</span>
kubectl get pod <span class="nt">-n</span> <span class="k">${</span><span class="nv">NAMESPACE</span><span class="k">}</span> <span class="nt">-o</span> wide
</code></pre></div></div>
<p>If all went well, the job’s progress can be seen in your terminal with <code class="language-plaintext highlighter-rouge">kubectl logs</code>. It will show that our deep learning job is distributed across <code class="language-plaintext highlighter-rouge">4</code> workers.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Inspect training logs</span>
kubectl logs <span class="nt">-n</span> <span class="k">${</span><span class="nv">NAMESPACE</span><span class="k">}</span> <span class="nt">-f</span> <span class="k">${</span><span class="nv">COMPONENT</span><span class="k">}</span><span class="nt">-master</span>
</code></pre></div></div>
<p>This will show you the training logs (I have omitted several warning messages that you can ignore):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>...
======================== JOB MAP ========================
Data for node: train-mnist-worker-0.train-mnist.kubeflow Num slots: 16 Max slots: 0 Num procs: 1
Process OMPI jobid: [13015,1] App: 0 Process rank: 0 Bound: N/A
Data for node: train-mnist-worker-1.train-mnist.kubeflow Num slots: 16 Max slots: 0 Num procs: 1
Process OMPI jobid: [13015,1] App: 0 Process rank: 1 Bound: N/A
Data for node: train-mnist-worker-2.train-mnist.kubeflow Num slots: 16 Max slots: 0 Num procs: 1
Process OMPI jobid: [13015,1] App: 0 Process rank: 2 Bound: N/A
Data for node: train-mnist-worker-3.train-mnist.kubeflow Num slots: 16 Max slots: 0 Num procs: 1
Process OMPI jobid: [13015,1] App: 0 Process rank: 3 Bound: N/A
=============================================================
==========================================
Num process (COMM_WORLD): 4
Using GPUs
Using hierarchical communicator
Num unit: 1000
Num Minibatch-size: 100
Num epoch: 20
==========================================
epoch main/loss validation/main/loss main/accuracy validation/main/accuracy elapsed_time
1 0.285947 0.106961 0.917333 0.9681 16.6241
2 0.0870434 0.0882483 0.9736 0.9708 23.0874
3 0.050553 0.0709311 0.9842 0.9781 28.6014
...
</code></pre></div></div>
Thu, 10 May 2018 00:00:00 +0000
https://chainer.org/general/2018/05/10/chainermn-on-kubernetes.html
General