Vishal Ramesha, Abhishek Aditya BS, Yashas Kadambi, T Vijay Prashant, Shylaja S S
This repository contains the official implementation for the paper titled "Towards Faster and Efficient Lightweight Image Super Resolution using SwinV2 Transformers and Fourier Convolutions".
Lightweight single image super resolution (SISR) has seen many advances in recent times. Transformer-based methods have achieved great improvements over CNN-based methods, driven mainly by the transformer's ability to effectively model long-range dependencies in images. However, transformer-based approaches have a large number of parameters and are computationally expensive during inference. In this work, we propose SWIFT, a hybrid of Swin transformers and Fast Fourier Convolutions (FFC). SWIFT consists of three stages: shallow feature extraction, deep feature extraction, and high-quality image reconstruction. Deep feature extraction consists of SwinV2 transformer blocks extended with Attention Scaling and our novel Residual Frequency Blocks (RFB) to effectively extract high-frequency details and model long-range dependencies. Experimental results on popular benchmarking datasets show that SWIFT outperforms state-of-the-art methods like SwinIR in the realm of lightweight SISR while using 33.55% fewer parameters and being up to 60% faster during inference.
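For orientation, the sketch below illustrates the three-stage flow described above in plain PyTorch. It is a deliberately simplified stand-in, not the repository's model: the real deep feature extraction uses SwinV2 transformer blocks with attention scaling and Residual Frequency Blocks, which are replaced here with ordinary convolutions purely to show how shallow features, deep features, and reconstruction fit together.
# Simplified, hypothetical sketch of the three-stage pipeline (not SWIFT's actual code).
import torch
import torch.nn as nn

class ThreeStageSR(nn.Module):
    def __init__(self, scale=2, channels=60):
        super().__init__()
        # Stage 1: shallow feature extraction (a single convolution).
        self.shallow = nn.Conv2d(3, channels, 3, padding=1)
        # Stage 2: deep feature extraction. Plain convolutions stand in for
        # the SwinV2 + Residual Frequency Blocks used in SWIFT.
        self.deep = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.GELU())
            for _ in range(4)
        ])
        # Stage 3: high-quality reconstruction via sub-pixel (pixel shuffle) upsampling.
        self.reconstruct = nn.Sequential(
            nn.Conv2d(channels, 3 * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, lr):
        shallow = self.shallow(lr)
        deep = self.deep(shallow) + shallow   # long residual connection (assumed, SwinIR-style)
        return self.reconstruct(deep)

x = torch.rand(1, 3, 64, 64)                  # a 64x64 LR patch
print(ThreeStageSR(scale=2)(x).shape)         # torch.Size([1, 3, 128, 128])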
We use the DIV2K dataset (800 images) for training the model. We also experiment with training models on DF2K (DIV2K + Flickr2K). The scores reported in the paper and the released checkpoints use only DIV2K for training.
train.py is the main script used for training the model. Each of the 2x, 3x, and 4x models is trained for 700K iterations in total. Methods like SwinIR and SwinFIR train the 2x model from scratch and fine-tune it for the 3x and 4x scales; in our work, however, we train all models from scratch.
The structure of the dataset used in our work is shown below:
Datasets
|__DIV2K
   |__DIV2K_train_HR
   |  |__HR
   |     |__0001.png
   |     |__0002.png
   |     |   ...
   |     |__0800.png
   |__DIV2K_train_LR_bicubic
      |__X2
      |  |__0001x2.png
      |  |__0002x2.png
      |  |   ...
      |  |__0800x2.png
      |__X3
      |  |__0001x3.png
      |  |__0002x3.png
      |  |   ...
      |  |__0800x3.png
      |__X4
         |__0001x4.png
         |__0002x4.png
         |   ...
         |__0800x4.png
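Before starting a run, a quick check like the one below can confirm that your local copy matches this layout. The root path and the filename patterns are read off the tree above and may need adjusting for your setup.
# Sanity check for the dataset layout shown above. The paths and filename
# patterns (0001.png, 0001x2.png, ...) are assumptions taken from the tree.
from pathlib import Path

root = Path("Datasets/DIV2K")
missing = []
for i in range(1, 801):
    if not (root / "DIV2K_train_HR" / "HR" / f"{i:04d}.png").exists():
        missing.append(f"{i:04d}.png")
    for s in (2, 3, 4):
        lr = root / "DIV2K_train_LR_bicubic" / f"X{s}" / f"{i:04d}x{s}.png"
        if not lr.exists():
            missing.append(lr.name)

print("dataset looks complete" if not missing
      else f"{len(missing)} files missing, e.g. {missing[:5]}")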
The different training options that train.py provides are shown below:
$ python3 train.py --help
usage: train.py [-h] [--lr float] [--n_epochs int] [--batch_size int] [--test_batch_size int] [--gamma int] [--step_size int] [--root str] [--n_train int] [--n_val int] [--cuda] [--threads int] [--amp] [--load_mem] [--ckpt_dir str]
[--start_epoch int] [--log_every int] [--test_every int] [--save_every int] [--pretrained str] [--resume str] [--scale int] [--patch_size int] [--rgb_range int] [--n_colors int] [--seed int] [--show_metrics] [--ext str]
[--model str]
Towards Faster and Efficient Lightweight Image Super Resolution using Swin Transformers and Fourier Convolutions
options:
-h, --help show this help message and exit
--lr float Learning rate for training. Default=2e-4
--n_epochs int Number of epochs to train model. By default model trains till 700K iterations.
--batch_size int Batch size to use for training. Default=64
--test_batch_size int
Batch size to use for validation. Default=1
--gamma int Learning rate decay factor. Default=0.5
--step_size int learning rate decay per N epochs
--root str Path to root of dataset directory.
--n_train int Number of samples in training set. Default=800
--n_val int Number of images in validation set. Default=1
--cuda Use CUDA enabled devices for training
--threads int Number of workers for dataloader. Default=12
--amp Enables Automatic Mixed Precision for training.
--load_mem Loads entire dataset to RAM.
--ckpt_dir str Path to model checkpoint directory.
--start_epoch int Epoch number to resume training.
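The --lr, --gamma, and --step_size options describe a standard step-decay schedule: the learning rate starts at --lr and is multiplied by --gamma every --step_size epochs. Whether train.py implements this with PyTorch's StepLR (or uses Adam at all) is an assumption, and the step size of 200 below is purely illustrative, but the decay pattern the flags describe looks like this:
# Illustration of how --lr, --gamma, and --step_size interact as step decay.
# Optimizer, scheduler, and step_size=200 are illustrative assumptions only.
import torch

params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.Adam(params, lr=2e-4)                     # --lr (Default=2e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer,            # --step_size, --gamma (Default=0.5)
                                            step_size=200, gamma=0.5)

for epoch in range(1, 601):
    optimizer.step()        # a training epoch would go here
    scheduler.step()
    if epoch % 200 == 0:
        print(epoch, scheduler.get_last_lr())   # [1e-4], [5e-5], [2.5e-5]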
To train SWIFT, run the following command in a terminal:
python3 train.py --scale=2 --patch_size=128 --root=<path_to_Dataset> \
--lr=2e-4 --n_epochs=100000 --batch_size=64 --threads=8 --n_train=800 \
--ckpt_dir="./experiment" --log_every=100 --test_every=1000 \
--save_every=2000 --cuda --amp --load_mem --model="SWIFTx2"
The above command trains the x2 model on random patches of size 128 × 128. To train for x3 or x4, set --scale to 3 or 4 and set --patch_size to 192 (for x3) or 256 (for x4).
SWIFT uses TensorBoard to log all training metrics and image predictions. TensorBoard can be launched with the following command in a new terminal:
tensorboard --logdir=runs --bind_all
We use a single NVIDIA TESLA A100 GPU for training the models.
We perform validation on the Set5 dataset during training. We test the effectiveness of SWIFT on popular benchmarking datasets such as Set5, Set14, BSD100, Urban100, and Manga109. The test.py script was used to obtain the scores reported in the paper; it runs inference of the SWIFT model on the benchmarking datasets.
To execute the testing script, run the following commands in a terminal:
# x2 scale
python3 test.py --scale=2 --patch_size=128 --model_path="./model_zoo/SWIFT/SWIFT-S-2x.pth" --cuda
# x3 scale
python3 test.py --scale=3 --patch_size=192 --model_path="./model_zoo/SWIFT/SWIFT-S-3x.pth" --cuda
# x4 scale
python3 test.py --scale=4 --patch_size=256 --model_path="./model_zoo/SWIFT/SWIFT-S-4x.pth" --cuda
Other options provided by test.py are shown below:
$ python3 test.py --help
usage: test.py [-h] --scale int --model_path str [--batch_size int] [--cuda] [--jit] [--forward_chop] [--seed int] [--summary]
Towards Faster and Efficient Lightweight Image Super Resolution using Swin Transformers and Fourier Convolutions
options:
-h, --help show this help message and exit
--scale int Super resolution scale. Scales: 2, 3, 4.
--model_path str Path to the trained SWIFT model.
--batch_size int Batch size to use for testing. Default=1.
--cuda Use CUDA enabled device to perform testing.
--jit Perform inference using JIT.
--forward_chop Use forward_chop for performing inference on devices with less memory.
--seed int Seed for reproducibility.
--summary Print summary table for model.
The test.py script does not store the predictions to files; please see the next section for saving the results from SWIFT. Using --forward_chop reduces memory consumption but results in lower PSNR and SSIM scores compared to the reported scores.
If you want to use SWIFT to get predictions on the benchmarking datasets or on custom images, use the predict.py script instead of test.py. The predict.py script stores the prediction results in the results/ folder.
If you have both LR (low resolution) and HR (high resolution) image pairs, execute the following commands in a terminal:
# for x2
python3 predict.py --scale=2 --training_patch_size=128 --model_path="./model_zoo/SWIFT/SWIFT-S-2x.pth" --folder_lq=<path_to_LR_image_folder> --folder_gt=<path_to_HR_image_folder> --cuda
# for x3
python3 predict.py --scale=3 --training_patch_size=192 --model_path="./model_zoo/SWIFT/SWIFT-S-3x.pth" --folder_lq=<path_to_LR_image_folder> --folder_gt=<path_to_HR_image_folder> --cuda
# for x4
python3 predict.py --scale=4 --training_patch_size=256 --model_path="./model_zoo/SWIFT/SWIFT-S-4x.pth" --folder_lq=<path_to_LR_image_folder> --folder_gt=<path_to_HR_image_folder> --cuda
Passing both HR and LR images calculates the PSNR and SSIM scores using the HR images as ground truth.
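For reference, PSNR boils down to the standard formula 10·log10(MAX²/MSE). The snippet below is only a minimal illustration of that metric on 8-bit arrays; published super resolution scores are conventionally computed on the Y channel with a scale-dependent border crop, which this sketch does not attempt.
# Minimal PSNR between two equally shaped 8-bit images (higher is better).
# Reference illustration only; it omits the Y-channel/border-crop protocol.
import numpy as np

def psnr(pred: np.ndarray, gt: np.ndarray, max_val: float = 255.0) -> float:
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")                    # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)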
If you only have LR images and would like to obtain SWIFT's super-resolved outputs, run the following command in a terminal:
python3 predict.py --scale=4 --training_patch_size=256 --model_path="./model_zoo/SWIFT/SWIFT-S-4x.pth" --folder_lq=<path_to_LR_image_folder> --cuda
Note that in the above command, the path to the HR folder is omitted.
Optionally, you can use the --jit flag to compile the model with JIT, which speeds up inference. If memory is limited during inference, use the --forward_chop flag to reduce memory consumption; the general tiling idea behind it is sketched below.
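Conceptually, tile-based inference splits the LR input into overlapping tiles, super-resolves each tile independently, and averages the overlapping regions of the outputs. The sketch below shows that general pattern; it is not the repository's forward_chop implementation, and the tile size and overlap are arbitrary example values.
# Generic tile-based inference pattern (not SWIFT's forward_chop).
import torch

@torch.no_grad()
def tiled_sr(model, lr, scale, tile=64, overlap=8):
    """Super-resolve an LR batch (B, C, H, W) tile by tile, averaging overlaps."""
    b, c, h, w = lr.shape
    tile = min(tile, h, w)
    stride = tile - overlap
    h_starts = list(range(0, h - tile, stride)) + [h - tile]
    w_starts = list(range(0, w - tile, stride)) + [w - tile]
    out = lr.new_zeros(b, c, h * scale, w * scale)
    weight = torch.zeros_like(out)
    for top in h_starts:
        for left in w_starts:
            sr = model(lr[..., top:top + tile, left:left + tile])
            ys, xs = top * scale, left * scale
            out[..., ys:ys + tile * scale, xs:xs + tile * scale] += sr
            weight[..., ys:ys + tile * scale, xs:xs + tile * scale] += 1
    return out / weight          # average where tiles overlap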
The complete list of options provided by the predict.py script is shown below:
$ python3 predict.py --help
usage: predict.py [-h] --scale int --model_path str --folder_lq str [--folder_gt str] [--tile int] [--tile_overlap int] [--cuda] [--jit] [--forward_chop] [--summary]
Towards Faster and Efficient Lightweight Image Super Resolution using Swin Transformers and Fourier Convolutions
options:
-h, --help show this help message and exit
--scale int Super resolution scale. Scales: 2, 3, 4
--model_path str Path to the trained SWIFT model.
--folder_lq str Path to low-quality (LR) test image folder.
--folder_gt str Path to ground-truth (HR) test image folder. (Optional)
--tile int Tile size, None for no tile during testing (testing as a whole)
--tile_overlap int Overlapping of different tiles
--cuda Use CUDA enabled device for inference.
--jit Perform inference using JIT.
--forward_chop Use forward_chop for performing inference on devices with less memory.
--summary Print summary table for model.
If you want to test out SWIFT, we provide a Docker image that comes with all dependencies pre-installed. The image can run on both CPUs and CUDA-enabled GPUs. To run on GPU, please refer to the installation guide.
To run the SWIFT training image, type one of the following commands in a terminal.
To run on CPU,
docker run --rm -p 6006:6006 -it ivishalr/swift-training:latest bash
To run on GPU,
docker run --rm --gpus all -p 6006:6006 -it ivishalr/swift-training:latest bash
Note: The above image will be pulled from Docker Hub and requires an internet connection.
Use the following command to build the Docker image from scratch:
docker build -t swift:0.1 -f docker/swift.dockerfile .
To run the Docker container on CPU, type the following command in a terminal:
docker run --rm -it swift:0.1 bash
To run the Docker container on GPU, type the following command in a terminal:
docker run --rm --gpus all -it swift:0.1 bash
We provide easy-to-use SWIFT inference using TorchServe. This section provides instructions for running SWIFT with TorchServe in Docker. To set up TorchServe locally, please refer to README.md.
Make sure you have the following packages installed:
- Numpy
- Pillow
- Requests
To run the SWIFT inference image, type one of the following commands in a terminal.
To run on CPU,
docker run -p 8080:8080 -p 8081:8081 -d ivishalr/swift:latest
To run on GPU,
docker run --gpus all -p 8080:8080 -p 8081:8081 -d ivishalr/swift:latest-gpu
Note: The above image will be pulled from Docker Hub and requires an internet connection.
To build for CPU, type the following command in a terminal:
docker build -t swift_inference:0.1 -f docker/swift_inference.dockerfile .
To build for GPU, type the following command in a terminal:
docker build -t swift_inference:0.1 --build-arg image=pytorch/torchserve:latest-gpu -f docker/swift_inference.dockerfile .
To run on CPU,
docker run -p 8080:8080 -p 8081:8081 -d swift_inference:0.1
To run on GPU,
docker run --gpus all -p 8080:8080 -p 8081:8081 -d swift_inference:0.1
Running on GPU requires Docker to be set up with NVIDIA GPU support. Please refer to the Docker section.
To make predictions on images with SWIFT running on TorchServe, use the serve/infer.py script:
python3 serve/infer.py --path=<path_to_image> --scale=<2,3,4>
$ python3 serve/infer.py -h
usage: infer.py [-h] --path str --scale int [--save] [--save_dir str]
Towards Faster and Efficient Lightweight Image Super Resolution using Swin Transformers and Fourier Convolutions
options:
-h, --help show this help message and exit
--path str Path to image for prediction.
--scale int Super resolution scale. Scales: 2, 3, 4.
--save Store predictions.
--save_dir str Path to folder for saving predictions.
Pass the --save and --save_dir arguments to save the predictions made using SWIFT.
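For context, infer.py talks to a model served by TorchServe, whose standard inference API accepts a POST to /predictions/<model_name> on port 8080. If you want to call the endpoint directly instead, something along these lines should work; the registered model name ("swift") and the assumption that the handler returns encoded image bytes are guesses about this repository's model archive, so check the serve/ directory if they differ.
# Direct call to TorchServe's standard inference API, as an alternative to
# serve/infer.py. The model name "swift" and the returned payload format are
# assumptions about how the .mar archive is registered in this repository.
import io

import requests
from PIL import Image

URL = "http://localhost:8080/predictions/swift"      # assumed model name

with open("input_lr.png", "rb") as f:                # any LR image
    response = requests.post(URL, data=f.read(), timeout=120)
response.raise_for_status()

# Assuming the handler streams back the super-resolved image as encoded bytes:
Image.open(io.BytesIO(response.content)).save("output_sr.png")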
The figure below shows the qualitative comparisons of SWIFT and other state-of-the-art methods on a small patch highlighted by the red rectangle.
The table below shows the quantitative comparison of SWIFT with other state-of-the-art lightweight image super resolution methods on popular benchmarking datasets. We compare models based on PSNR and SSIM scores. The first- and second-best methods are highlighted in red and blue, respectively.
The table below compares the inference times of state-of-the-art methods on benchmarking datasets for the ×4 scale. The ▼ symbol indicates an improvement and the ▲ symbol indicates a deterioration in inference time compared to the reference model. The reference model used for comparison within each model architecture type is marked with an asterisk (*).
If you find our work useful in your research, please use the citation below. Thank you!
@inproceedings{ramesha2022towards,
title={Towards Faster and Efficient Lightweight Image Super Resolution Using Transformers and Fourier Convolutions},
author={Ramesha, Vishal and Kadambi, Yashas and Aditya, Abhishek and Prashant, T Vijay and Shylaja, SS},
booktitle={Artificial Intelligence and Applications},
year={2022}
}