Having trouble setting up environments for Deep Learning? We do this for you! From now on, you can say goodbye to annoying messages such as "Build failed..." or "An error occurred during installation...".
Currently, we maintain the following docker images:
- Keras using TensorFlow Backend
- Keras using CNTK Backend
- Keras using MXNet Backend
- Keras using Theano Backend
As the names suggest, all these environments use Keras as the frontend.
See below for more details about these environments.
- Before Getting Started
- Summary of the Images
- Keras using TensorFlow Backend
- Keras using MXNet Backend
- Keras using CNTK Backend
- Keras using Theano Backend
- ndrun - Run a Docker Container for Your Deep-Learning Research
- Getting Started with the Command Line
- Advanced Usage of the Command Line
- Getting Started with Jupyter Notebook
- NVIDIA-Docker2 has to be installed. See [here] for how to install and [here] for its introduction.
- Docker needs to be configured. For example, you may have to add your user to the docker group. See [here] for Docker setup.
- Beware: the recent images we've built contain CUDA 9.2, which requires NVIDIA driver version >= 396. You can get the latest NVIDIA driver [here].
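The driver requirement above can be checked with a few lines of shell. This is only a sketch: driver_ok is a hypothetical helper name and not part of ndrun, and it assumes the version string has the usual "major.minor" form.

```shell
# Hypothetical helper (not part of ndrun): succeeds when the given driver
# version string meets the ">= 396" requirement of the CUDA 9.2 images.
driver_ok() {
    # Keep only the major version, e.g. "396.54" -> "396".
    major=${1%%.*}
    [ "$major" -ge 396 ]
}

# Example with two hard-coded version strings.
for version in 396.54 384.81; do
    if driver_ok "$version"; then
        echo "$version: new enough for the CUDA 9.2 images"
    else
        echo "$version: too old for the CUDA 9.2 images"
    fi
done
```

On a host where the NVIDIA driver is installed, the version string can be read with nvidia-smi --query-gpu=driver_version --format=csv,noheader and passed to the helper.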
The following tables list the docker images maintained by us. All these listed images are retrievable through Docker Hub.
- Images within the repository: honghu/keras

| Keras Backend | Image's Tag | Description | Dockerfile |
|---|---|---|---|
| TensorFlow | tf-cu9.2-dnn7.2-py3-avx2-18.09, tf-latest | TensorFlow v1.10.1, Intel® Distribution for Python v2018.3.039, Keras v2.2.2, NCCL v2.2.13 | [Click] |
| TensorFlow | tf-cu9.2-dnn7.1-py3-avx2-18.08 | TensorFlow v1.10.0, Intel® Distribution for Python v2018.3.039, Keras v2.2.2, NCCL v2.2.13 | [Click] |
| TensorFlow | tf-cu9-dnn7-py3-avx2-18.03 | TensorFlow v1.6.0, Keras v2.1.5 | [Click] |
| TensorFlow | tf-cu9-dnn7-py3-avx2-18.01 | TensorFlow v1.4.1, Keras v2.1.2 | [Click] |
| MXNet | mx-cu9.2-dnn7.2-py3-18.09, mx-latest | MXNet v1.3.0-dev, GluonCV v0.3.0-dev, Intel® Distribution for Python v2018.3.039, Keras-MXNet v2.2.2, NCCL v2.2.13 | [Click] |
| MXNet | mx-cu9.2-dnn7.1-py3-18.08 | MXNet v1.3.0-dev, GluonCV v0.3.0-dev, Intel® Distribution for Python v2018.3.039, Keras-MXNet v2.2.0, NCCL v2.2.13 | [Click] |
| MXNet | mx-cu9-dnn7-py3-18.03 | MXNet v1.2.0, Keras-MXNet v2.1.3 | [Click] |
| MXNet | mx-cu9-dnn7-py3-18.01 | MXNet v1.0.1, Keras-MXNet v1.2.2 | [Click] |
| CNTK | cntk-cu9.2-dnn7.2-py3-18.09, cntk-latest | CNTK v2.5.1, Intel® Distribution for Python v2018.3.039, Keras v2.2.2 | [Click] |
| CNTK | cntk-cu9-dnn7-py3-18.08 | CNTK v2.5.1, Keras v2.2.2 | [Click] |
| CNTK | cntk-cu9-dnn7-py3-18.03 | CNTK v2.4, Keras v2.1.5 | [Click] |
| CNTK | cntk-cu8-dnn6-py3-18.01 | CNTK v2.2, Keras v2.1.2 | [Click] |
| Theano | theano-cu9.0-dnn7.0-py3-18.09, theano-latest | Theano v1.0.2, Intel® Distribution for Python v2018.3.039, Keras v2.2.2 | [Click] |
| Theano | theano-cu9-dnn7-py3-18.03 | Theano v1.0.1, Keras v2.1.5 | [Click] |
| Theano | theano-cu9-dnn7-py3-18.01 | Theano v1.0.1, Keras v2.1.2 | [Click] |

- Images within the repository: honghu/intelpython3

| Tag | Description | Dockerfile |
|---|---|---|
| gpu-cu9.2-dnn7.2-18.09 | Intel® Distribution for Python v2018.3.039, Ubuntu 18.04 | [Click] |
| gpu-cu9.0-dnn7.2-18.09 | Intel® Distribution for Python v2018.3.039, Ubuntu 16.04 | [Click] |
| gpu-cu9.2-dnn7.1-18.08 | Intel® Distribution for Python v2018.3.039, Ubuntu 18.04 | [Click] |
| cpu-18.09 | Intel® Distribution for Python v2018.3.039, Ubuntu 18.04 | [Click] |
| cpu-18.08 | Intel® Distribution for Python v2018.3.039, Ubuntu 18.04 | [Click] |
This environment can be obtained via:
docker pull honghu/keras:tf-cu9.2-dnn7.2-py3-avx2-18.09
which includes
- Keras v2.2.2
- TensorFlow v1.10.1
- Intel® Distribution for Python v2018.3.039, including accelerated NumPy and scikit-learn.
- NVIDIA CUDA 9.2, cuDNN 7.2 and NCCL 2.2.
- Must-have packages such as XGBoost, Pandas, OpenCV, imgaug, Matplotlib, Seaborn and Bokeh.
This environment can be obtained via:
docker pull honghu/keras:mx-cu9.2-dnn7.2-py3-18.09
which includes
- Keras-MXNet v2.2.2
- MXNet v1.3.0-dev
- GluonCV v0.3.0-dev
- Intel® Distribution for Python v2018.3.039, including accelerated NumPy and scikit-learn.
- NVIDIA CUDA 9.2, cuDNN 7.2 and NCCL 2.2.
- Must-have packages such as XGBoost, Pandas, OpenCV, imgaug, Matplotlib, Seaborn and Bokeh.
This environment can be obtained via:
docker pull honghu/keras:cntk-cu9.2-dnn7.2-py3-18.09
which includes
- Keras v2.2.2
- CNTK v2.5.1
- NVIDIA CUDA 9.2 and cuDNN 7.2.
- Must-have packages such as Pandas, OpenCV, imgaug, Matplotlib, Seaborn and Bokeh.
Remark
- According to Microsoft, the CNTK backend of Keras is still in beta. Even so, for tasks such as text generation, switching the backend from TensorFlow to CNTK may speed up training significantly. See the [benchmark] made by Max Woolf.
This environment can be obtained via:
docker pull honghu/keras:theano-cu9.0-dnn7.0-py3-18.09
which includes
- Keras v2.2.2
- Theano v1.0.2
- NVIDIA CUDA 9.0 and cuDNN 7.0.
- Must-have packages such as Pandas, OpenCV, imgaug, Matplotlib, Seaborn and Bokeh.
Remark
- As the development of Theano has stopped, we will not update this image regularly.
Before you proceed to the next section, please get ndrun first:
# Create the "bin" directory inside your home folder if it does not exist yet.
mkdir -p ~/bin
# Get the wrapper file and save it to "~/bin/ndrun".
wget -O ~/bin/ndrun https://raw.githubusercontent.com/chi-hung/DockerbuildsKeras/master/ndrun.sh
# Make the wrapper file executable.
chmod +x ~/bin/ndrun
ndrun is a tool that helps you run a deep-learning environment. Before using it, re-open your terminal so that the system knows where the newly added script ndrun is. In other words, make sure $HOME/bin is within your system's $PATH and then reload bash.
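To verify the PATH requirement described above, a small check along the following lines may help. It is a sketch only: in_path is a hypothetical helper name, and the message texts are illustrative.

```shell
# Hypothetical helper: succeeds when directory $1 is an entry of the
# colon-separated PATH-style string $2.
in_path() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Check the current shell's PATH for $HOME/bin.
if in_path "$HOME/bin" "$PATH"; then
    echo "$HOME/bin is on PATH"
else
    echo "$HOME/bin is not on PATH"
fi
```

Wrapping both strings in colons makes the match exact, so a directory such as $HOME/bin2 is not mistaken for $HOME/bin.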
Remark: ndrun has to be used along with the recent images (images made starting Sep. 2018). There's no guarantee that it will work with the older images.
Let's prepare a script that will import TensorFlow and print its version out:
# Create a script that prints TensorFlow's version.
printf "import tensorflow as tf \
\nprint('TensorFlow version=',tf.__version__)" \
> check_tf_version.py
Now, using ndrun, the script check_tf_version.py can be executed easily using our TensorFlow image. All you have to do is add ndrun before python3 check_tf_version.py:
ndrun python3 check_tf_version.py
And you should get the following output:
TensorFlow version= 1.10.1
which indicates that the current version of TensorFlow is 1.10.1. Now the question arises: where is this TensorFlow installed? The version you've seen comes from the TensorFlow installed inside our latest TensorFlow image.
To activate another image, we can use the option -t [IMG_TYPE]. For example, let's now prepare a script that will import CNTK and print its version out:
# Create a script that checks CNTK's version.
printf "import cntk \
\nprint('CNTK version=',cntk.__version__)" \
> check_cntk_version.py
To run this script using the CNTK image, simply add the option -t cntk:
# Print CNTK's version out.
ndrun -t cntk python3 check_cntk_version.py
Its output:
CNTK version= 2.5.1
Currently, the possible choices of [IMG_TYPE] are:
tensorflow
cntk
mxnet
theano
Remark
- If you select an image via its type, i.e. via [IMG_TYPE], then the latest image of that type will be selected.
- The latest TensorFlow image will be selected if you do not tell ndrun which image it should select.
- If you don't have the selected image locally, docker will pull it from Docker Hub, which might take some time.
- You can also select an image via its tag. Type ndrun --help for more details.
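The selection rule in the remarks above can be pictured as a lookup from image type to the "latest" tag listed in the summary table. The sketch below is an assumption about the idea, not the actual code of ndrun; latest_image is a hypothetical function name.

```shell
# Hypothetical sketch (not taken from ndrun): map an image type to the
# "latest" tag of the honghu/keras repository listed in the summary table.
latest_image() {
    case "$1" in
        tensorflow) echo "honghu/keras:tf-latest" ;;
        cntk)       echo "honghu/keras:cntk-latest" ;;
        mxnet)      echo "honghu/keras:mx-latest" ;;
        theano)     echo "honghu/keras:theano-latest" ;;
        *)          echo "unknown image type: $1" >&2
                    return 1 ;;
    esac
}

# Resolve the image for "-t cntk".
latest_image cntk
```

The resolved name can be passed to docker pull to fetch the image ahead of time, so that the first run does not have to wait for the download.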
Now, let's retrieve an example from Google's GitHub repository aimed at handwritten-digit classification. This simple model (it has only one hidden layer) is written in TensorFlow and uses the MNIST dataset.
# Get "mnist_with_summaries.py" from Google's GitHub repository.
wget https://raw.githubusercontent.com/tensorflow/tensorflow/master/tensorflow/examples/tutorials/mnist/mnist_with_summaries.py
Then, the retrieved script mnist_with_summaries.py can be executed easily via:
ndrun python3 mnist_with_summaries.py
The output should be similar to the following:
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting /tmp/tensorflow/mnist/input_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting /tmp/tensorflow/mnist/input_data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting /tmp/tensorflow/mnist/input_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting /tmp/tensorflow/mnist/input_data/t10k-labels-idx1-ubyte.gz
2017-10-16 17:33:59.597331: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:06:00.0
totalMemory: 15.77GiB freeMemory: 15.36GiB
2017-10-16 17:33:59.597368: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:06:00.0, compute capability: 7.0)
Accuracy at step 0: 0.1426
Accuracy at step 10: 0.6942
Accuracy at step 20: 0.8195
Accuracy at step 30: 0.8626
...
The previous example utilizes only one GPU. In this example, we suppose you have multiple GPUs at hand and would like to train a model that uses several of them.
To be more specific:
- Our goal is to demonstrate how you can run a script that classifies images of the CIFAR10 dataset.
- The model we are going to train is a small Convolutional Neural Network. For more details of this model, see TensorFlow's official tutorial.
First, let's pull some models from Google's GitHub repository. We also need to get the CIFAR10 dataset, which is roughly 162MB:
# Clone tensorflow/models to your local folder.
# Say, to your home directory.
git clone https://github.com/tensorflow/models.git $HOME/models
# There was a bug in the CIFAR10 example.
# We temporarily switch to an older version of this repository.
cd $HOME/models && \
git checkout c96ef83
# Let's also retrieve the CIFAR10 Dataset and put it into
# our home directory.
wget -O $HOME/cifar-10-binary.tar.gz \
https://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz
Now, you should have:
- cifar-10-binary.tar.gz (the CIFAR10 dataset)
- models (a folder that contains many deep-learning models)
in your home directory.
The script we are going to run is cifar10_multi_gpu_train.py, which is located at $HOME/models/tutorials/image/cifar10/. Before we train the model, we need to set up configurations for training. Let's use --help to find out the acceptable configurations of cifar10_multi_gpu_train.py:
ndrun python3 models/tutorials/image/cifar10/cifar10_multi_gpu_train.py --help
which returns the following output:
usage: cifar10_multi_gpu_train.py [-h] [--batch_size BATCH_SIZE]
[--data_dir DATA_DIR]
[--use_fp16 [USE_FP16]] [--nouse_fp16]
[--train_dir TRAIN_DIR]
[--max_steps MAX_STEPS]
[--num_gpus NUM_GPUS]
[--log_device_placement [LOG_DEVICE_PLACEMENT]]
[--nolog_device_placement]
optional arguments:
-h, --help show this help message and exit
--batch_size BATCH_SIZE
Number of images to process in a batch.
--data_dir DATA_DIR Path to the CIFAR-10 data directory.
--use_fp16 [USE_FP16]
Train the model using fp16.
--nouse_fp16
--train_dir TRAIN_DIR
Directory where to write event logs and checkpoint.
--max_steps MAX_STEPS
Number of batches to run.
--num_gpus NUM_GPUS How many GPUs to use.
--log_device_placement [LOG_DEVICE_PLACEMENT]
Whether to log device placement.
--nolog_device_placement
As you can see, you can choose the number of GPUs to be used via --num_gpus NUM_GPUS, and you can set --data_dir DATA_DIR, which tells the script where the downloaded CIFAR10 dataset is.
Now, we are ready to train the model:
# Switch to your home directory.
cd $HOME
# Train the model.
ndrun -n 2 python3 models/tutorials/image/cifar10/cifar10_multi_gpu_train.py \
--num_gpus=2 \
--data_dir=/workspace \
--batch_size=128 \
--max_steps=100 \
--use_fp16
Your output should be similar to what I've got (2x NVIDIA Tesla V100):
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
2017-10-17 04:36:51.811596: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:06:00.0
totalMemory: 15.77GiB freeMemory: 15.36GiB
2017-10-17 04:36:52.434640: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 1 with properties:
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:07:00.0
totalMemory: 15.77GiB freeMemory: 15.36GiB
2017-10-17 04:36:52.434689: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Device peer to peer matrix
2017-10-17 04:36:52.434702: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] DMA: 0 1
2017-10-17 04:36:52.434726: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 0: Y Y
2017-10-17 04:36:52.434748: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 1: Y Y
2017-10-17 04:36:52.434758: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:06:00.0, compute capability: 7.0)
2017-10-17 04:36:52.434765: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: Tesla V100-SXM2-16GB, pci bus id: 0000:07:00.0, compute capability: 7.0)
2017-10-17 04:36:59.790431: step 0, loss = 4.68 (38.3 examples/sec; 3.346 sec/batch)
2017-10-17 04:37:00.205024: step 10, loss = 4.59 (24464.4 examples/sec; 0.005 sec/batch)
2017-10-17 04:37:00.323271: step 20, loss = 4.58 (20327.2 examples/sec; 0.006 sec/batch)
2017-10-17 04:37:00.439341: step 30, loss = 4.50 (23105.6 examples/sec; 0.006 sec/batch)
2017-10-17 04:37:00.558475: step 40, loss = 4.35 (22412.1 examples/sec; 0.006 sec/batch)
2017-10-17 04:37:00.675634: step 50, loss = 4.48 (23193.5 examples/sec; 0.006 sec/batch)
2017-10-17 04:37:00.791710: step 60, loss = 4.21 (23634.6 examples/sec; 0.005 sec/batch)
2017-10-17 04:37:00.911417: step 70, loss = 4.26 (21293.4 examples/sec; 0.006 sec/batch)
2017-10-17 04:37:01.028642: step 80, loss = 4.22 (22391.1 examples/sec; 0.006 sec/batch)
2017-10-17 04:37:01.149847: step 90, loss = 3.98 (20516.7 examples/sec; 0.006 sec/batch)
Remark
- When you activate a docker image using ndrun, your current working directory on the host machine, e.g. $HOME, will automatically be mounted to /workspace, the default working directory inside the docker container. Since the script runs inside the docker container, it can only find the CIFAR10 dataset at /workspace (caution: not at $HOME!). Therefore, you should set --data_dir=/workspace.
- Use -n [NUM_GPUS] to specify the number of GPUs visible to the running image. If you don't pass this option to ndrun, then, by default, ndrun will use only 1 GPU to run your script.
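The mount described in the first remark implies a simple translation between host paths and container paths. The following sketch illustrates it; to_container_path is a hypothetical helper, and /home/alice is a made-up working directory used only for the example.

```shell
# Hypothetical helper: translate a host path into the path seen inside the
# container, assuming the host working directory $1 is mounted at /workspace.
to_container_path() {
    host_cwd=$1
    host_path=$2
    case "$host_path" in
        "$host_cwd")
            echo "/workspace" ;;
        "$host_cwd"/*)
            # Strip the host prefix and re-root the rest under /workspace.
            rel=${host_path#"$host_cwd"}
            echo "/workspace$rel" ;;
        *)
            echo "outside the mounted directory: $host_path" >&2
            return 1 ;;
    esac
}

# The dataset downloaded to the (made-up) home directory /home/alice.
to_container_path "/home/alice" "/home/alice/cifar-10-binary.tar.gz"
```

Paths outside the working directory are rejected because, under this assumption, they are not visible inside the container at all.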
Here is a common mistake: we make two GPUs visible to the script, yet ask it to use only one of them for training:
# Switch to your home directory.
cd $HOME
# Train the model.
ndrun -n 2 python3 models/tutorials/image/cifar10/cifar10_multi_gpu_train.py \
--num_gpus=1 \
--data_dir=/workspace \
--batch_size=128 \
--max_steps=100 \
--use_fp16
While this script is running, we can check the status of the GPUs via nvidia-smi:
chweng@server1:~$ nvidia-smi
Wed Aug 8 01:28:01 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81 Driver Version: 384.81 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:01:00.0 On | N/A |
| 31% 58C P2 143W / 250W | 10844MiB / 11171MiB | 72% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 108... Off | 00000000:02:00.0 Off | N/A |
| 27% 51C P2 58W / 250W | 10622MiB / 11172MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 3046 C python3 10451MiB |
| 1 3046 C python3 10611MiB |
+-----------------------------------------------------------------------------+
The above output reveals that GPU0 is being utilized (GPU-Util=72%). Interestingly, although GPU1's RAM is almost fully occupied, it's not utilized at all (GPU-Util=0%).
This is the default behavior of TensorFlow: when it starts, it aggressively occupies the RAM of all the GPU devices visible to it.
Avoid this mistake, otherwise you'll waste your GPU resources.
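One way to catch this mistake before launching is to compare the value given to -n with the value given to --num_gpus. The check below is a sketch; check_gpu_request is a hypothetical helper and not a feature of ndrun.

```shell
# Hypothetical helper: compare the number of GPUs made visible (-n) with the
# number the training script is asked to use (--num_gpus).
check_gpu_request() {
    visible=$1   # value passed to ndrun via -n
    used=$2      # value passed to the script via --num_gpus
    if [ "$used" -lt "$visible" ]; then
        echo "warning: $((visible - used)) visible GPU(s) would hold memory without doing work"
        return 1
    fi
    if [ "$used" -gt "$visible" ]; then
        echo "error: the script asks for $used GPU(s) but only $visible would be visible"
        return 1
    fi
    echo "ok: $used GPU(s) visible and used"
}

# The mistaken combination from above, followed by the matching one.
check_gpu_request 2 1 || true
check_gpu_request 2 2
```

Keeping the two numbers equal means that every GPU whose memory TensorFlow occupies also takes part in training.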
If you'd like to run your script using GPU6 and GPU7, you can pass NV_GPU=6,7 to ndrun. Let's look at an example:
# Switch to your home directory.
cd $HOME
# Train the model using GPU6 and GPU7.
NV_GPU=6,7 ndrun -n 2 python3 models/tutorials/image/cifar10/cifar10_multi_gpu_train.py \
--num_gpus=2 \
--data_dir=/workspace \
--batch_size=128 \
--use_fp16
or, if you want to utilize 4 GPUs, say, GPU0, GPU1, GPU2 and GPU3:
# Switch to your home directory.
cd $HOME
# Train the model using GPU0, GPU1, GPU2 and GPU3.
NV_GPU=0,1,2,3 ndrun -n 4 python3 models/tutorials/image/cifar10/cifar10_multi_gpu_train.py \
--num_gpus=4 \
--data_dir=/workspace \
--batch_size=128 \
--use_fp16
However, I would suggest you avoid passing NV_GPU to ndrun unless you are sure that's what you want, because ndrun will automatically find available GPU devices for you. Here, an available GPU device is one that has GPU utilization < 30% and free memory > 2048MB. If you wish, you can change these criteria inside ndrun.
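The availability criteria above can be expressed as a filter over nvidia-smi's CSV output. This is a sketch of the idea and not the code used inside ndrun; available_gpus is a hypothetical name, and the sample rows are made up.

```shell
# Hypothetical filter (not taken from ndrun): read lines of the form
# "index, utilization, free memory in MiB" and print the indices of GPUs
# with utilization < 30% and free memory > 2048 MiB.
available_gpus() {
    awk -F', *' '$2 < 30 && $3 > 2048 { print $1 }'
}

# Made-up sample rows: GPU0 is busy, GPU1 is idle, GPU2 is short on memory.
printf '0, 72, 327\n1, 0, 11000\n2, 5, 1024\n' | available_gpus
```

On a real machine, the input rows can be produced with nvidia-smi --query-gpu=index,utilization.gpu,memory.free --format=csv,noheader,nounits and piped into the filter.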
If you don't pass any script to ndrun, then ndrun will activate a docker container that runs Jupyter Notebook for you. See the example below for more details.
The easiest way to dive into Deep Learning with MXNet's new interface, gluon, is to follow the tutorials of [Deep Learning - The Straight Dope]. These tutorials are written as Jupyter Notebooks, which keep executable scripts, equations, explanations and figures in one place - a notebook.
Let's clone these tutorials into, say, our home directory:
cd $HOME && git clone https://github.com/zackchase/mxnet-the-straight-dope.git
Now, you'll see a folder called mxnet-the-straight-dope within your home directory. Let's switch to this directory and start a Jupyter Notebook daemon from there:
cd $HOME/mxnet-the-straight-dope
ndrun -n 1 -t mxnet -p 8889
The above command activates the latest MXNet image. It utilizes 1 GPU and now runs as a daemon that listens on port 8889 of your host machine.
Its output:
An intepreter such as python3 or bash, is not given.
You did not provide me the script you would like to execute.
NV_GPU=0
Starting Jupyter Notebook...
* To use Jupyter Notebook, open a browser and connect to the following address:
http://localhost:8889/?token=c5676caa643ecf9ebbfd8781381d0c0dbfbfcc1e67028e7a
* To stop and remove this container, type:
docker stop 5fb5489f198b && docker rm 5fb5489f198b
* To enter into this container, type:
docker exec -it 5fb5489f198b bash
Now, by opening a web browser and connecting to the URL given above, we are able to start learning MXNet gluon.
Bravo!