This project implements the image-to-image translation method described in the paper *Image-to-Image Translation with Conditional Adversarial Networks* by Phillip Isola et al. (arXiv:1611.07004).
It was made as the final project for the CS 763 - Computer Vision course in Spring 2019 at the Indian Institute of Technology (IIT) Bombay, India.
pix2pix uses a conditional generative adversarial network (cGAN) to build an efficient, general-purpose image-to-image translation system. Image-to-image translation involves learning a mapping from images in one distribution to corresponding images in another distribution. Many kinds of problems can be viewed as image-to-image translation, including image colorization, edges-to-object visualization, style transfer, etc.
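For context, the training objective from the paper combines a conditional GAN loss with an L1 reconstruction term; here x is the source image, y the target image, z the noise input, and λ the L1 weight (the paper uses λ = 100; this repo's default should be checked against the flags in `main.py`):

```latex
% Conditional GAN loss
\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}\left[\log D(x, y)\right]
                         + \mathbb{E}_{x,z}\left[\log\bigl(1 - D(x, G(x, z))\bigr)\right]
% L1 reconstruction loss
\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\left[\lVert y - G(x, z) \rVert_1\right]
% Full objective
G^{*} = \arg\min_{G}\max_{D}\; \mathcal{L}_{cGAN}(G, D) + \lambda\, \mathcal{L}_{L1}(G)
```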
For example, an output for Satellite-to-Maps view would be
All the image output files in this project will be of the above format, i.e.
[Source - Target_Ground_Truth - Target_Generated]
I tested this project with the following datasets released publicly by the authors (link in the Acknowledgements section):
- Facades
- Maps (satellite-to-map)
- Maps (map-to-satellite)
Follow the instructions below to get our project running on your local machine.
- Clone the repository and make sure you have the prerequisites below to run the code.
- Run
  `python src/main.py --help`
  to see the various options available.
- To train the model, run
  `python src/main.py ...`
  along with the appropriate flags. For example, to train on the Maps (map-to-satellite) dataset, you may run
  `python src/main.py --mode train --data_root '../datasets/maps' --num_epochs 100 --data_invert`
- All the outputs will be saved to `src/output/[timestamp]`, where `[timestamp]` is the start time of training.
- Python 3.7.1 or above
- PyTorch 1.0.0 or above
- CUDA 9.1 (or another version corresponding to your PyTorch build) to utilize any compatible GPU present for faster training
[The code has been tested and works with the above versions on a Windows 10 machine with a GTX 1070. It may also work with other (lower) versions.]
Code for the various modules can be found in the `modules.py` file.
- Generator
  - I used a U-Net (arXiv:1505.04597)-like architecture for the generator, which is simply an encoder-decoder architecture with skip connections between the corresponding encoder and decoder layers.

    [Image courtesy: authors' paper]
  - Precisely, the encoder channels vary as `in_channels -> 64 -> 128 -> 256 -> 512 -> 512 -> 512 -> 512`, and the decoder's channel sizes vary accordingly (a minimal generator sketch is given after this list).
- Discriminator
  - For the discriminator, a PatchGAN is used. A PatchGAN is similar to a common discriminator, except that it tries to classify each N × N patch of its input as real or fake.
  - In our case, we take N = 70. In our code, this is achieved by using a convolutional network whose receptive field on the discriminator's input image is 70. Mathematically, this can be checked to be equivalent to what is described in the paper.
  - The channel sizes in our PatchGAN vary as `in_channels -> 64 -> 128 -> 256 -> 512 -> out_channels` (a minimal discriminator sketch is given after this list).
- Hyperparameters
  - I used the default parameters mentioned in the code of `main.py`. You may easily test other values by suitably changing the flags.
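As referenced in the Generator item above, the following is a minimal PyTorch sketch of a U-Net-style generator following the channel progression listed there. It is illustrative only, not the exact code in `modules.py`: the kernel sizes, normalization choice (BatchNorm), and the absence of dropout are assumptions.

```python
import torch
import torch.nn as nn


def down(in_ch, out_ch, norm=True):
    """One encoder step: a strided 4x4 convolution that halves the spatial size."""
    layers = [nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1)]
    if norm:
        layers.append(nn.BatchNorm2d(out_ch))
    layers.append(nn.LeakyReLU(0.2, inplace=True))
    return nn.Sequential(*layers)


def up(in_ch, out_ch):
    """One decoder step: a transposed 4x4 convolution that doubles the spatial size."""
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class UNetGenerator(nn.Module):
    def __init__(self, in_channels=3, out_channels=3):
        super().__init__()
        # Encoder: in_channels -> 64 -> 128 -> 256 -> 512 -> 512 -> 512 -> 512
        chs = [64, 128, 256, 512, 512, 512, 512]
        self.encoders = nn.ModuleList(
            down(prev, ch, norm=(i > 0))
            for i, (prev, ch) in enumerate(zip([in_channels] + chs[:-1], chs))
        )
        # Decoder mirrors the encoder; input channels are doubled wherever a
        # skip connection from the encoder is concatenated.
        self.decoders = nn.ModuleList([
            up(512, 512), up(1024, 512), up(1024, 512),
            up(1024, 256), up(512, 128), up(256, 64),
        ])
        self.final = nn.Sequential(
            nn.ConvTranspose2d(128, out_channels, kernel_size=4, stride=2, padding=1),
            nn.Tanh(),
        )

    def forward(self, x):
        skips = []
        for enc in self.encoders:
            x = enc(x)
            skips.append(x)
        # Skip connections: deepest (non-bottleneck) encoder output first.
        for dec, skip in zip(self.decoders, reversed(skips[:-1])):
            x = torch.cat([dec(x), skip], dim=1)
        return self.final(x)


# Quick shape check (assumes 256x256 inputs, as in the paper's datasets)
if __name__ == "__main__":
    g = UNetGenerator()
    print(g(torch.randn(1, 3, 256, 256)).shape)  # -> torch.Size([1, 3, 256, 256])
```

Each decoder stage concatenates its upsampled output with the matching encoder feature map, which lets low-level structure (edges, layout) flow directly from the input to the output.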
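Likewise, for the Discriminator item above, here is a minimal PyTorch sketch of a 70 × 70 PatchGAN with the channel progression described there. The kernel sizes, strides, and normalization are assumptions chosen so that the receptive field grows as 4 → 10 → 22 → 46 → 70 across the five conv layers, not necessarily the exact configuration in `modules.py`.

```python
import torch
import torch.nn as nn


class PatchGANDiscriminator(nn.Module):
    """70x70 PatchGAN sketch: every output score looks at a 70x70 patch of the
    concatenated (source, target) pair and predicts real vs. fake."""

    def __init__(self, in_channels=6, out_channels=1):  # e.g. 3-channel source + 3-channel target
        super().__init__()
        self.model = nn.Sequential(
            # in_channels -> 64, stride 2   (receptive field:  4)
            nn.Conv2d(in_channels, 64, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            # 64 -> 128, stride 2           (receptive field: 10)
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2, inplace=True),
            # 128 -> 256, stride 2          (receptive field: 22)
            nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(256),
            nn.LeakyReLU(0.2, inplace=True),
            # 256 -> 512, stride 1          (receptive field: 46)
            nn.Conv2d(256, 512, kernel_size=4, stride=1, padding=1),
            nn.BatchNorm2d(512),
            nn.LeakyReLU(0.2, inplace=True),
            # 512 -> out_channels, stride 1 (receptive field: 70)
            nn.Conv2d(512, out_channels, kernel_size=4, stride=1, padding=1),
        )

    def forward(self, source, target):
        # The discriminator is conditioned on the source image, so the pair is
        # concatenated along the channel dimension before classification.
        return self.model(torch.cat([source, target], dim=1))


# Quick shape check: a 256x256 pair yields a 30x30 grid of patch scores.
if __name__ == "__main__":
    d = PatchGANDiscriminator()
    print(d(torch.randn(1, 3, 256, 256), torch.randn(1, 3, 256, 256)).shape)
    # -> torch.Size([1, 1, 30, 30])
```

For a 256 × 256 input pair this produces a 30 × 30 grid of patch scores, and the adversarial loss is averaged over all patches.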
All the results shown here are on test data.
[Result images on test data; each output image is in the format Source - Target_Ground_Truth - Target_Generated]
As a sanity check, I would like to point out that the model was able to give good outputs on the training set, as shown below, indicating that its capacity was quite sufficient.
[Example outputs on training data]
For the Facades dataset, the training loss curves are shown below.

[Plots: Generator Loss (Training) | Discriminator Loss (Training)]
- Vamsi Krishna Reddy Satti - vamsi3
- I would like to thank the authors of the paper for the amazing public datasets found here.
This project is licensed under the MIT License - please see the LICENSE file for details.