The SRCNN architecture uses three convolutional layers to perform single-image super-resolution, and the SRCNN paper is one of the very first to apply a deep neural network to this task. During training, an original image (i.e. the ground truth) is preprocessed into a low-resolution (LR) image; SRCNN then upsamples the LR image into a high-resolution (HR) image and compares it against the ground truth.
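For reference, here is a minimal PyTorch sketch of the three-layer architecture. The 9-1-5 kernel sizes and 64/32 channel widths follow the baseline setting from the original paper; the exact hyperparameters used in this repo may differ.

```python
import torch
import torch.nn as nn

class SRCNN(nn.Module):
    """Three-layer SRCNN: patch extraction, non-linear mapping, reconstruction.

    The input is the LR image upscaled (e.g. bicubically) to the target size,
    so the network only learns to restore detail, not to change resolution.
    """
    def __init__(self, num_channels: int = 1):
        super().__init__()
        # Layer 1: patch extraction and representation
        self.conv1 = nn.Conv2d(num_channels, 64, kernel_size=9, padding=4)
        # Layer 2: non-linear mapping
        self.conv2 = nn.Conv2d(64, 32, kernel_size=1)
        # Layer 3: reconstruction
        self.conv3 = nn.Conv2d(32, num_channels, kernel_size=5, padding=2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.relu(self.conv1(x))
        x = self.relu(self.conv2(x))
        return self.conv3(x)  # no activation on the output layer
```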
The custom training dataset was obtained from Kaggle: https://www.kaggle.com/datasets/adityachandrasekhar/image-super-resolution It is convenient to use because it already contains paired LR and HR versions of each image, so no cumbersome data preprocessing is needed. However, since the upscale factor is not documented, the performance of the trained network cannot be measured precisely.
In dataset.py, there are two classes, one for the training dataset and one for the validation dataset; the two are almost identical. Notice that I only extract the 'Y' (luminance) channel of each input image for training, as mentioned in the paper.
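A minimal sketch of the Y-channel extraction, using PIL's built-in YCbCr conversion; the actual classes in dataset.py may differ in details such as normalization or cropping.

```python
from PIL import Image
import numpy as np
import torch

def load_y_channel(path: str) -> torch.Tensor:
    """Load an image and return only its luminance (Y) channel as a
    1 x H x W float tensor in [0, 1]. Illustrative helper, not the
    exact code from dataset.py."""
    img = Image.open(path).convert("YCbCr")
    y, _, _ = img.split()  # keep Y, discard the chroma channels (Cb, Cr)
    y = np.asarray(y, dtype=np.float32) / 255.0
    return torch.from_numpy(y).unsqueeze(0)  # add the channel dimension
```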
For training, I used the Adam optimizer instead of SGD. I used MSE as the loss and PSNR (peak signal-to-noise ratio) as the performance metric. PSNR, measured in dB, expresses how close the model output is to the ground truth; it is derived directly from the MSE as PSNR = 10 · log10(MAX² / MSE), where MAX is the maximum possible pixel value.
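A short sketch of the metric and the training setup under these choices; the learning rate shown is an assumed value, not necessarily the one used in this repo.

```python
import torch

def psnr(output: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> float:
    """PSNR in dB, computed directly from the MSE between output and target.
    Assumes pixel values are normalized to [0, max_val]."""
    mse = torch.mean((output - target) ** 2)
    return (10.0 * torch.log10(max_val ** 2 / mse)).item()

# Illustrative setup: Adam instead of the paper's SGD, with an MSE loss.
model = SRCNN()  # the class from the sketch above
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # assumed lr
```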
After training for 20 epochs, below are the MSE loss and PSNR per epoch, for both the training and validation sets.
Below is the output on one of the validation images. Notice that the initially blurred input turns into a sharper, higher-resolution image.