Deep CNN Based Blind Image Quality Predictor
Bachelor of Technology
in
Electronics and Communication Engineering
By
K. SUNDHAR (16RJ1A0453)
Under the esteemed Guidance of
CERTIFICATE
This is to certify that the Mini project report entitled Deep CNN-Based Blind Image Quality
Predictor being submitted by
K. SUNDHAR – 16RJ1A0453
in partial fulfillment of the requirements for the award of the Degree of Bachelor of Technology in
Electronics and Communication Engineering to the Jawaharlal Nehru Technological
University, Hyderabad, is a record of bonafide work carried out under my guidance and
supervision.
The results embodied in this project report have not been submitted to any other
University or Institute for the award of any Degree.
I, K. Sundhar, hereby declare that this project report entitled Deep CNN-Based Blind
Image Quality Predictor is my bonafide work, carried out under the supervision of
Mr. Y. Venkateswara Reddy (Ph.D). I declare that, to the best of my knowledge, the work
reported herein does not form part of any other project report or dissertation on the basis
of which a degree or award was conferred on an earlier occasion to any other candidate.
The content of this report is not being presented by any other student to this or any other
University for the award of a degree.
Signature:
K. SUNDHAR
Date:
ACKNOWLEDGEMENT
I take this opportunity to express my deepest gratitude and appreciation to all those who
have helped me directly or indirectly towards the successful completion of this project.
I would like to express my warm thanks to my guide, Mr. Y. Venkateswara Reddy (Ph.D),
for his guidance and support.
Next, I would like to express my gratitude to my parents, friends and all the faculty of our
department for their cooperation and keen interest throughout this project.
I would also like to thank all the teaching and non-teaching staff members of the
Electronics and Communication branch.
CONTENTS
List of Figures v
Abstract vi
1. Introduction 01
2. Literature Review 03
2.1 Review of Full Reference Image Quality Assessment (FR-IQA) 03
2.1.1 Limitations of FR-IQA Models 08
2.2 Review of Reduced Reference Image Quality Assessment (RR-IQA) 08
2.2.1 Limitations of RR-IQA 11
2.3 Review of No Reference Image Quality Assessment (NR-IQA) 12
2.3.1 Distortion Specific NR-IQA 12
2.3.2 Generalized NR-IQA 16
2.4 Research Gaps 18
3. Software Introduction 20
3.1. Introduction to MATLAB 20
3.2 The MATLAB system 21
3.3 Graphical User Interface (GUI) 22
3.4 Getting Started 24
3.4.1 Introduction 24
3.4.2 Development Environment 24
3.4.3 Manipulating Matrices 24
3.4.4 Graphics 24
3.4.5 Programming with MATLAB 24
3.5 Development environment 25
3.5.1 Introduction 25
3.5.2 Starting MATLAB 25
3.5.3 Quitting MATLAB 25
3.5.4 MATLAB Desktop 25
3.5.5 Desktop Tools 26
3.6 Manipulating Matrices 29
3.6.1 Entering Matrices 29
3.6.2 Expressions 30
3.6.3 Operators 32
3.6.4 Functions 32
3.7 GUI 33
3.7.1 Creating GUIs with GUIDE 34
3.7.2 GUI Development Environment 34
3.7.3 Features of the GUIDE-Generated Application M-File 35
3.7.4 Beginning the Implementation Process 36
3.7.5 User Interface Control 37
4. Proposed Method 48
4.1 Proposed Framework 48
4.2 Related Work 49
4.3 Deep Image Quality Assessment Predictor 49
4.3.1 Model Architecture 50
4.3.2 Image Normalization 51
4.3.3 Reliability Map Prediction 52
4.3.4 Learning Objective Error Map 54
4.3.5 Learning Subjective Opinion 54
4.3.6 Training 55
4.3.7 Patch-Based Training 55
5. Simulation Results 56
[Link] 60
[Link] 63
References 64
LIST OF FIGURES
ABSTRACT
Deep CNN-Based Blind Image Quality Predictor
Abstract— Image recognition based on convolutional neural networks (CNNs) has recently
been shown to deliver state-of-the-art performance in various areas of computer vision and
image processing. Nevertheless, applying a deep CNN to no-reference image quality assessment
(NR-IQA) remains a challenging task because of a critical obstacle: the lack of large training
databases. In this report, we propose a CNN-based NR-IQA framework that can effectively
address this problem. The proposed method, the deep image quality assessor (DIQA), separates
the training of NR-IQA into two stages: 1) an objective distortion part and 2) a human visual
system (HVS)-related part. In the first stage, the CNN learns to predict the objective error map,
and in the second stage the model learns to predict the subjective score. To compensate for the
inaccuracy of the objective error map prediction in homogeneous regions, we also propose a
reliability map. Two simple handcrafted features are additionally employed to further enhance
the accuracy. In addition, we propose a way to visualize perceptual error maps to analyze what
the deep CNN model has learned. In the experiments, DIQA yielded state-of-the-art accuracy on
various databases.
Index Terms— Convolutional neural network (CNN), deep learning, image quality assessment
(IQA), no-reference IQA (NR-IQA).
Chapter 1
INTRODUCTION
The goal of image quality assessment (IQA) is to predict the perceptual quality of digital
images in a quantitative manner. Digital images are inevitably degraded at various stages in the
process from content generation to consumption. The acquisition, transmission, storage, post-
processing, or compression of images introduces various distortions, such as Gaussian white
noise, Gaussian blur (GB), or blocking artifacts. A reliable IQA algorithm can help quantify the
quality of images obtained blindly from the Internet and accurately assess the performance of
image processing algorithms, such as image compression and super-resolution, from the
perspective of a human observer. IQA is generally classified into three categories, depending on
whether a reference image (the pristine version of an image) is available: full-reference IQA
(FR-IQA), reduced-reference IQA (RR-IQA), and no-reference IQA (NR-IQA). In general, the
performance of these techniques, in order of decreasing accuracy, is FR-IQA, RR-IQA, and
NR-IQA. However, since reference images are not accessible in many practical scenarios,
NR-IQA is the most generally applicable method.
The bit rate of computer networks has continued to increase in recent years and has
enabled the provision of high-quality entertainment to end users who do not have reference
images; hence, significant research efforts have been made to enhance the accuracy of NR-IQA
from the perspective of the end user. Many recently proposed NR-IQA algorithms involve the
use of machine learning, such as support vector machines (SVMs) and neural networks (NNs), to
blindly predict image quality scores. Research has shown that the accuracy of NR-IQA depends
heavily on designing elaborate features. Natural scene statistics (NSS) [1], [2] is one of the most
successful feature families, under the assumption that natural images exhibit a statistical
regularity that is altered when distortions are introduced. Because of the difficulty of obtaining
reliable features, most NR-IQA research has continued to build on NSS. Deep learning has lately
been adopted in a few NR-IQA studies as an alternative to conventional NSS-based approaches
[3], [4]. However, most such studies have continued to use handcrafted features, and deep
models, such as deep belief networks (DBNs) and stacked autoencoders, have simply been used
in place of conventional regression machines.
Convolutional neural networks (CNNs) form the most popular deep learning model
nowadays due to their strong representation capability and impressive performance. CNNs have
been successfully applied to various computer vision and image processing problems. The
performance of deep neural networks depends heavily on the amount of training data. However,
the currently available IQA databases are much smaller than the typical computer vision data
sets used for deep learning. For example, the LIVE IQA database [5] contains 174–233 images
for each distortion type, while widely used data sets for image recognition contain more than
1.2 million labeled images [6]. Moreover, obtaining large-scale, reliable human subjective labels
is very difficult. Unlike classification labels, constructing an IQA database requires a complex
and time-consuming psychometric experiment. To expand the training data set, one can
use data augmentation techniques such as rotation, cropping, and horizontal reflection.
Unfortunately, any transformation of images would affect perceptual quality scores. Moreover,
the perceptual process of the human visual system (HVS) includes multiple complex processes,
which makes training a deep model with a limited data set even harder. For example, the visual
sensitivity of the HVS varies with the spatial frequency of stimuli [7], [8], and the presence of
texture hinders the perception of other spatially coincident image changes [9]. In addition, the
perceived signals go through band-pass, multi-scale, and directional decompositions in the
visual cortex [10]. Such complex behaviors need to be embedded in a data set with human
subjective labels. However, it is difficult to claim that a small data set can represent general
visual stimuli, which results in an overfitting problem.

Fig. 1.1 Overall flowchart of DIQA. The training process consists of two stages: regression
onto objective error maps and regression onto subjective scores. The squares with a red
"train" (blue "train") indicate that the corresponding subnetwork is trained in the first
(second) stage.
Chapter 2
LITERATURE REVIEW
The literature shows that many researchers have contributed to the development of image quality
assessment metrics over the last decade. Articles published in journals, conference proceedings,
and books were referred to for this survey. Literature covering all relevant parameters was
studied in order to understand their effects and the possibility of extending them to no-reference
color image quality assessment. The models developed by researchers can be classified into
three main categories: Full Reference Image Quality Assessment (FR-IQA) metrics, Reduced
Reference Image Quality Assessment (RR-IQA) metrics, and No Reference Image Quality
Assessment (NR-IQA) metrics. The literature further shows that the models of each category are
divided into two sub-categories: measures applicable to gray scale images and measures
applicable to color images.
The development of IQA models started with FR image quality metrics. Conventionally,
image fidelity metrics like Mean Square Error (MSE) and Peak Signal to Noise Ratio (PSNR)
were used to evaluate the quality of images [126]. Though simple, they show certain limitations.
They focus on global errors and ignore local errors. In contrast to the foveated-vision property of
the HVS, they operate on a pixel-by-pixel basis. The spatial relationship among pixels is an
important characteristic perceived by the human eye, yet reordering the pixels does not change
the distortion measurement produced by these metrics. Therefore, these conventional metrics
fail to emulate the human visual system. Research in the field of image quality assessment was
truly accelerated after the development of the mean structural similarity metric [72]. It attempts
to measure image quality based on the preservation of image structures, instead of calculating
pixel-wise differences; the image quality score is computed from structure, luminance, and
contrast comparisons. The study reveals that most FR-IQA metrics are based on structural
similarity. The Mean Structural Similarity Index Metric (M-SSIM) developed by Z. Wang et al.
[71], [72], [73] attracted the attention of the entire IQA research community. The M-SSIM metric
is based on the hypothesis that the human eye is adapted to extract structural information from
an image. Luminance, structure, and contrast comparisons between the original image and the
distorted image are performed using the mean, variance, and covariance of the images, and are
combined into the SSIM index. A block-wise quality score of the image is computed, and the
average of the block-wise SSIM values, called M-SSIM, is the final quality score. This metric is
based on a similarity measure and quantifies any variation between the reference image and the
degraded image. The metric performed much better than conventional image fidelity measures
on image databases comprising different distortions. It is well known that the HVS is attracted by
different image textures to different degrees. Therefore, the authors suggested a modification of
the metric: a spatially variant weighted average of the SSIM index map can improve the HVS
consistency of this approach.
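As a concrete illustration of the quantities discussed above, the following MATLAB sketch computes MSE, PSNR, and a single global SSIM-style index for a pair of equally sized gray scale images. The file names are placeholders, and using one global window (rather than the block-wise windows of the published M-SSIM) is a simplifying assumption.
ref  = double(imread('reference.png'));    % placeholder file names
dist = double(imread('distorted.png'));    % same size, gray scale assumed
mse_val  = mean((ref(:) - dist(:)).^2);             % Mean Square Error
psnr_val = 10 * log10(255^2 / mse_val);             % PSNR for 8-bit images
C1 = (0.01*255)^2;  C2 = (0.03*255)^2;              % SSIM stabilizing constants
mu1 = mean(ref(:));   mu2 = mean(dist(:));          % luminance terms
v1  = var(ref(:));    v2  = var(dist(:));           % contrast terms
cv  = mean((ref(:) - mu1) .* (dist(:) - mu2));      % structure (covariance) term
ssim_global = ((2*mu1*mu2 + C1) * (2*cv + C2)) / ...
              ((mu1^2 + mu2^2 + C1) * (v1 + v2 + C2));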
The literature shows that researchers suggested many further improvements to this model.
The perceivability of details in an image depends on the sampling density of the image, the
distance between the observer and the image plane, and the perceptual capacity of the observer's
visual system. A single-scale method of quality assessment is therefore appropriate only for
specific settings. In order to provide more flexibility with respect to variations in image
resolution, display resolution, and viewing distance, a multi-scale M-SSIM model was proposed
by Z. Wang et al. [74]. This approach attempts to incorporate structural details at different
resolutions. Iterative decomposition of the images is performed at different scales using low-pass
filtering followed by downsampling. Structure and contrast comparisons are performed at every
stage, but the luminance comparison is performed only at the last stage. The final quality score is
computed from the quality measures obtained at the different scales. A distorted image often
looks more similar to the distortion-free image when the quality evaluation is done at a larger
scale, and the model exhibits higher HVS consistency as more scales are included. However,
this method suffers from certain limitations. It was tested only on the JPEG and JPEG2000
subsets of the LIVE database, and the metric is not effective on blurred and noisy images. The
authors suggested incorporating a more systematic approach so that it can be applicable to a
broad range of applications.
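A rough sketch of the multi-scale idea follows; it uses a plain product over scales instead of the published per-scale exponents, omits the luminance term at the coarsest scale, and assumes the Image Processing Toolbox for imresize, so it illustrates the structure rather than reproducing the method in [74].
ref  = double(imread('reference.png'));    % placeholder file names
dist = double(imread('distorted.png'));
C2 = (0.03*255)^2;
nScales = 5;
cs = zeros(1, nScales);                    % contrast/structure term per scale
for s = 1:nScales
    v1 = var(ref(:));  v2 = var(dist(:));
    cv = mean((ref(:) - mean(ref(:))) .* (dist(:) - mean(dist(:))));
    cs(s) = (2*cv + C2) / (v1 + v2 + C2);
    ref  = imresize(ref,  0.5, 'bilinear');   % low-pass filter and downsample
    dist = imresize(dist, 0.5, 'bilinear');
end
ms_score = prod(cs);                       % per-scale weights omitted for brevity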
The M-SSIM metric fails to predict the quality of images distorted by blur and noise. B. Liao
et al. [18] proposed a dual-scale structural similarity metric to improve the performance of
M-SSIM for such images. In the proposed measure, the first scale describes the coarse contours
of the objects, called the macro-edge image, and the second scale describes the subtle edges,
called the micro-edge image, which reflects the detailed edges of the objects. Subtracting the
edge image from the filtered image gives the macro-edge image, and the difference between this
macro-edge image and the original image is used as the micro-edge image. The macro-edge and
micro-edge similarities between the original image and the distorted image are computed to
generate the final quality score. The metric performs especially well for blur and noise
distortions.
Although the M-SSIM metric evaluates image quality accurately, it suffers from sensitivity
to geometrical distortions. In order to increase the immunity of the metric to such non-structural
distortions, C. L. Wang et al. [23] extended M-SSIM to the wavelet domain. A multilevel discrete
wavelet transform is used to calculate the wavelet coefficients of the reference image and of the
degraded image. The LH, HL, and HH bands of the same decomposition level are combined to
form one band each, for five levels. These five bands and the lowest subband (LL) together give
six bands. DWT-SSIM is computed as a similarity measure for each band, and the weighted
mean of the DWT-SSIM values gives the final quality score. As the human eye is highly
sensitive to the mid-frequency band, it is assigned a greater weight than the other bands. The
authors concluded that this metric outperforms M-SSIM. The metric was evaluated on the LIVE
image database after applying non-linear regression. M-SSIM in the spatial domain performs
poorly for Gaussian-blurred images, whereas the wavelet-domain metric shows improved
performance for such images.
B. Wang et al. [19] proposed an HVS-based SSIM metric based on the frequency and spatial
characteristics of the human eye. It is based on the hypothesis that the human eye does not pay
equal attention to all regions in an image. A frequency-sensitivity weight is calculated using DCT
coefficients, and, to mimic the foveated vision of the human eye, a spatial-effect weight is
calculated in the spatial domain. These weights are used in the calculation of M-SSIM. This
metric gives HVS-consistent results, especially for badly blurred images. The content-partitioned
structural similarity metric proposed by C. Li et al. [75] is based on a similar hypothesis, where
the foveated vision of the human eye is taken into consideration to increase prediction accuracy.
The metrics discussed above [23], [19] are computationally complex because they make use of
transforms.
All these derivatives of M-SSIM are mainly intended to evaluate the quality of gray scale
images. However, an image also suffers from color deviation due to the introduced distortion,
which must be considered [121]. A. Ninassi et al. [2] proposed a DFT/DWT-based SSIM for
full-reference assessment of color images using color space conversion and subband
decomposition. Perceptual subband decomposition is used to mimic the multi-channel behavior
of the HVS, and contrast sensitivity and masking functions are used to make the metric
HVS-consistent. The authors concluded that the DFT variant performs better than the DWT
variant. They also stated that the method overestimates the masking effect in regions of strong
edges with high contrast.
All the above image quality metrics exhibit good performance. However, they were
developed for the evaluation of gray scale images only, and they are applied to color images by
evaluating the intensity plane. It has been observed, however, that the introduction of distortion
causes deviation not only in luminance information but also in chrominance information, which
needs to be considered while evaluating color image quality. To cater to this need, G. Fahmi
et al. [78] proposed a novel idea based on the S-CIELAB metric, using color difference to
measure the quality of JPEG-compressed color images. The original image and the degraded
image are transformed into the CIEXYZ color space, filtered in the spatial domain, and then
transformed into the CIELAB color space. Histogram error features are calculated in the
CIELAB color space. All images are classified as smooth or textured, and a classifier is trained
using the histogram error feature vector. It has been observed that the classifier predicts JPEG
image quality with high accuracy, but the metric performs poorly for textured images. The
authors stated that the human eye cannot easily notice distortion in texture, so further
improvement of the metric is needed. V. Tsagaris et al. [63] also proposed a color image quality
metric using the CIELAB color space for the evaluation of color information. The relative
entropy between the corresponding image planes is calculated using the Kullback-Leibler
divergence formula. This method is useful for evaluating color-related information in different
regions of the image and in pseudo-color procedures. A. Kolaman et al. [1] combined the
structural similarity metric with a quaternion representation of color images in a Video Quality
Metric (VQM). The quaternion matrix has three imaginary planes, which are derived from the R,
G, and B planes of a color image. Using subjective tests, the authors showed that distortions such
as color cross-talk degrade the perceived image quality and that blur has a stronger effect on
perceived quality. They concluded that the metric is able to measure the combined degradation
of blur and saturation.
computed for assessment of the content-dependent part, and PSNR is computed for assessment
of the content-independent part. The results are combined using a non-linear equation to obtain
the perceptual quality score. The metric gives HVS-consistent results for the majority of
distortion types available in the TID image database.
2.1.1 Limitations of FR-IQA Models
All these metrics exhibit good performance. However, they work only if the original image
is available at the time the degraded image is assessed. In today's era of multimedia and the
Internet, the original image cannot be made available at assessment time in most applications;
in particular, in video communication, where bandwidth is a great constraint, it is not feasible.
This situation demands no-reference image quality assessment, which is a challenging task.
Reduced-reference IQA is a good compromise between FR and NR image quality assessment.
2.2 Review of Reduced Reference Image Quality Assessment (RR-IQA)
The RR-IQA framework makes use of partial information carried by features extracted from
the original image. This feature information is transmitted along with the image as side
information [42], [87]. At the receiver side, similar features are extracted from the received,
distorted version of the image, and the image quality is predicted by comparing these features
with the features of the original image. Such features must provide a brief summary of the
original image and should be able to reflect different types of distortions.
G. Cheng et al. [87] proposed an RR-IQA method based on natural scene statistics. It is based
on the hypothesis that natural images occupy a tiny subspace of the space of all possible images
and follow very specific distributions in the vertical and horizontal gradient domains, which can
be modeled by a Laplacian distribution. The introduction of distortion in an image causes a
proportional deviation in this distribution. The deviation between the model and the actual
distribution of the image is calculated with the Kullback-Leibler (KL) divergence. The feature
vector consists of the variance and kurtosis of the model distribution along with the computed
KL divergence. At the receiver end, a similar procedure is carried out to extract features from the
received image, and the image quality is evaluated from the received and extracted features. The
correlation between subjective and objective quality scores is greater than 90% for different
distortions, and the experimental results show that the method is more efficient than the popular
PSNR metric.
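The gradient-statistics feature described above can be sketched in MATLAB as follows; the input is assumed to be gray scale, and the bin edges and Laplacian fit are illustrative choices rather than the exact settings of [87].
img = double(imread('distorted.png'));          % placeholder file name
g   = diff(img, 1, 2);                          % horizontal gradient values
edges   = -256:8:256;
centers = edges(1:end-1) + 4;
p = histcounts(g(:), edges);  p = p / sum(p);   % empirical distribution
b = mean(abs(g(:)));                            % Laplacian scale estimate
q = exp(-abs(centers)/b) / (2*b);               % Laplacian model density
q = q / sum(q);                                 % normalize over the bins
valid = p > 0 & q > 0;
kl = sum(p(valid) .* log(p(valid) ./ q(valid)));  % KL-divergence feature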
Y. Ming et al. [89] proposed an effective reduced-reference IQA model for color images.
Color feature information is insensitive to the geometrical distortions introduced during
transmission, so a spatial-domain color difference analysis is performed in the hue and saturation
planes, while a frequency-domain distortion analysis is conducted in the intensity plane using the
contourlet transform. The feature vector includes chrominance information, a visual threshold,
and the proportion of visually sensitive coefficients. This information is transmitted through an
auxiliary channel from the transmitter to the receiver and is employed in a weighted evaluation
of image quality at the receiver. The authors suggested the need for a better mode of feature
extraction as a future direction. The correlation between subjective and objective quality scores
is 94%; however, the performance evaluation was done using only JPEG and JPEG 2000 images.
multimedia, the original image is not obtainable or does not even exist; only the degraded image
is available, and it is the one to be evaluated. In such cases, the design of NR-IQA is the only
solution.
Although important, NR-IQA is a very difficult task [91]. The development of blind image
quality assessment metrics is the only solution in many applications where the model must
evaluate image quality without any information about the distortion-free image [97]. It is based
on the hypothesis that humans can easily distinguish between a good-quality image and a
poor-quality image [72]. The human eye-brain system effectively pools the information
perceived from the degraded image to form an opinion about image quality. Early NR-IQA
models assume that the distortion introduced in the image is known. Each distortion causes a
particular visual effect in the perception of the image [25], and the NR-IQA algorithm is
developed by focusing on the impairments that the distortion is likely to cause. Fortunately, the
type of distortion is known in most applications; for example, JPEG 2000 images suffer from
ringing and blurring artifacts. Such metrics are application-specific or distortion-specific. The
literature shows that the available methods mainly focus on gray scale images and ignore
deviations in the color components.
Block-based transform coding is widely used in many image and video compression
algorithms. It causes visible blocking artifacts in images due to abrupt changes across block
boundaries, which have an unpleasant effect on perception. Popular FR-IQA metrics like PSNR
do not correlate well with this distortion, so many NR-IQA algorithms focus on predicting image
quality by measuring such blockiness. Generally, in a distortion-free image, most of the energy is
distributed among the low-frequency components and the energy associated with the
high-frequency components is insignificant. However, when blocking artifacts are introduced,
the block boundaries in the image carry significant high-frequency energy. The authors of [98]
state that true edges also show a similar property, so the detection of true edges as blocking
artifacts has to be intelligently avoided. An algorithm developed by considering only blocking
artifacts may fail in the evaluation of good-quality images, which must be taken care of in order
to increase the prediction accuracy of NR-IQA.
Z. Wang et al. [130] proposed an efficient perceptual image quality metric for the assessment
of JPEG-compressed images using an NR approach. The method extracts effective features that
reflect the blocking and blurring artifacts introduced during quantization in the JPEG coder.
Blockiness is measured by estimating the average difference across the boundaries of 8x8
blocks. Blurring reduces signal activity, so the average absolute difference between neighboring
pixels within a block is used to measure the amount of blur. Subjective test results and the
extracted features are used to train the model. The results of the metric correlate highly with the
HVS for the two JPEG image databases created; however, when the images in both databases are
combined, overtraining of the model reduces its generalization. The design of the model in the
spatial domain makes it computationally and memory efficient, as the method does not need to
store the whole image in memory. The authors stated that, in the future, additional features could
be used in the training process so that the metric becomes applicable to the MPEG compression
standard as well as other types of distortions.
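A minimal MATLAB sketch of the two spatial features described above, for a gray scale JPEG image, is given below; only the horizontal direction is shown (the vertical direction is handled analogously) and the file name is a placeholder, so this follows the idea of [130] rather than its exact formulation.
img = double(imread('compressed.jpg'));        % placeholder file name, gray scale assumed
dh  = img(:, 2:end) - img(:, 1:end-1);         % horizontal neighbor differences
blockiness = mean(mean(abs(dh(:, 8:8:end))));  % average jump across 8x8 block borders
activity   = mean(abs(dh(:)));                 % overall signal activity, reduced by blur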
JPEG2000 is accepted as a standard for digital film and is recommended for the storage,
transmission, and display of motion pictures. However, JPEG2000 images suffer from blur and
ringing due to distortions of pixels and edges, so it is necessary to measure the distortion in
JPEG2000 images to determine their usability. Z. M. Parvez Sazzad et al. [70] proposed an
NR-IQA method for JPEG2000 images based on spatial features. Subjective experiment results
on a JPEG2000 image database are used to train and test the model, and parameters based on
pixel distortion and edge distortion form the feature vector. The mean pixel value of the
neighborhood and the absolute difference between the central pixel and the second-closest
neighborhood pixel are computed as pixel distortion measures. The zero-crossing rate along the
horizontal and vertical directions and histogram measures with and without an edge-preserving
filter are used as edge distortion measures. These measures are combined using particle swarm
optimization to obtain the final quality score. The metric is consistent with the HVS and was
tested on images from databases not used in training. The authors suggested extending the work
toward generalization of the metric.
Blur appears as a loss of spatial detail and reduces edge sharpness [104]. The causes of this
distortion include the following.
1. During image acquisition, images are blurred due to relative motion between the object
and the camera or due to out-of-focus capturing.
spread of the edges to quantify the amount of blur, while ringing is measured from the ripples
around the edges. The start and end points are calculated in the gradient domain, and blur is
measured as the spread of the edge. After processing the original image, an FR-IQA model is
developed for the ringing artifact, where ringing is measured from the ring width found near the
edges or contours. The metric outperforms the widely used PSNR metric and is applicable to
auto-focusing of image capture devices, coding optimization, and network resource management.
The correlation between subjective and objective scores for the FR blur metric and the NR blur
metric is 87% and 73%, respectively. The authors suggested extending the FR ringing metric to
a no-reference version; generalization of the metric is also needed.
Partially blurred images can show high aesthetic quality; however, blur affects the saturation,
contrast, and other features of images. F. Mavridaki et al. [108] therefore proposed measuring
partial blur in the frequency domain. The introduction of blur attenuates the high frequencies, so
the metric uses information derived from the power spectrum of the Fourier transform of the
image to estimate the distribution of low and high frequencies. The image is partitioned into nine
patches, and power spectra are calculated for the whole image and for the partitions. A five-bin
frequency histogram of these ten spectra forms the feature vector, and a Support Vector Machine
(SVM) classifier is trained on these features to evaluate partial blur in the test image set. The
metric shows promising performance for both naturally and artificially blurred images.
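A simplified frequency-domain blur cue in the spirit of the method above is sketched below: a single global high-frequency energy ratio in place of the nine patches, histogram, and trained SVM, with an arbitrarily chosen radius threshold.
img = double(imread('test_image.png'));        % placeholder file name, gray scale assumed
F = fftshift(fft2(img));
P = abs(F).^2;                                 % power spectrum
[h, w] = size(P);
[X, Y] = meshgrid(1:w, 1:h);
r = hypot(X - (w+1)/2, Y - (h+1)/2);           % radial frequency from the center
highBand = r > 0.25 * min(h, w);               % assumed high-frequency band
hfRatio = sum(P(highBand)) / sum(P(:));        % a small ratio suggests blur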
R. Ferzli et al. [109] proposed a metric to measure the Just Noticeable Blur (JNB). It uses
probability summation over space to model the response of the HVS to sharpness at different
contrast levels. JNB is the minimum perceivable blurriness around an edge: blurriness around an
edge is masked up to a certain threshold, which is equal to the JNB. The JNB is determined
experimentally by calculating the standard deviation of the Gaussian filter corresponding to the
JNB threshold at a given contrast, and the corresponding edge width, called the JNB width, is
used as the blurriness measure. The image is divided into blocks of 64x64 pixels before edge
detection and edge-width computation. JNB widths are obtained subjectively, using local
contrasts in the neighborhood of the edges, for the blocks containing a large number of edge
pixels. A comparison between the measured widths and the subjectively obtained JNB widths is
used to classify the blur on each edge as perceptible or imperceptible, and these values are
pooled over all edges in an edge block. The authors suggested, as future directions, improving
the performance of the metric for very large blur values and incorporating noise immunity into
the model.
Further, F. Shao et al. [110] used an artificial neural network to combine a blocking metric, a
blur metric, a noise metric, and a ringing metric for JPEG2000 images and estimate the overall
quality of an image. The model performs a detailed characteristic analysis of each specific
distortion. Distortion-specific features are extracted and used to train a Support Vector
Regression (SVR) model. The extracted features include gradient histogram information to
evaluate the degree of blur, frequency amplitude information to measure the amount of white
noise, DCT coefficient contrast information to quantify blocking artifacts, and wavelet subband
decomposition to evaluate JPEG2000 images. Experimental results show that this metric
outperforms FR-IQA techniques. The metric can measure the distortions included in its design,
but it fails to measure other distortions if they are introduced into the images; to increase its
range of applications, it needs to be extended to other types of distortions.
Developing an NR-IQA algorithm that works for all types of distortions is a very challenging
task, and only a few NR-IQA algorithms are not restricted to a single distortion [91]. The
literature shows that natural scene statistics has great potential to solve the problem of
generalized NR image quality assessment [99]. The human brain is trained with high-quality
images from childhood onward; it develops models of high-quality images and learns to use
them to assess the quality of a degraded image [91].
NR image quality models make use of characteristics of the human visual system. Original,
distortion-free images are called natural or pristine images, and their quality is considered
perfect. It is believed that they form a tiny subset of the huge set of all possible images and
exhibit similar statistical properties, and that any distortion introduced in an image causes a
deviation from these properties. Statistical models of such high-quality images are developed to
describe the class of high-quality images, and such Natural Scene Statistics (NSS) based models
have the potential to predict the amount of distortion without any reference. FR-IQA and
RR-IQA models help to establish the exact nature of the statistical model of natural images while
developing an NR image quality model.
A. Mittal et al. [6], [114] proposed an NSS-based generic Blind/Referenceless Image Spatial
Quality Evaluator (BRISQUE) model for NR-IQA in the spatial domain. It does not compute
distortion-specific features such as blocking, ringing, or blur. The metric is based on the statistics
of locally normalized luminance coefficients, under the hypothesis that these normalized
coefficients follow a Gaussian distribution in a natural image, whereas the introduction of
distortion causes a deviation from this characteristic. Support vector regression is used to predict
the quality score. The authors stated that the method avoids any transform in order to reduce the
computational burden, and that its results are competitive with all available generic NR-IQA
algorithms. They also suggested that these features can be extended to identify the type of
distortion.
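The locally normalized luminance (MSCN) coefficients that such spatial-domain features are built on can be sketched in MATLAB as follows; the window size and stabilizing constant are the values commonly used in the literature, not taken from this report, and the Image Processing Toolbox is assumed.
img = double(imread('test_image.png'));        % placeholder file name, gray scale assumed
win   = fspecial('gaussian', 7, 7/6);          % 7x7 Gaussian weighting window
mu    = imfilter(img, win, 'replicate');       % local mean
sigma = sqrt(abs(imfilter(img.^2, win, 'replicate') - mu.^2));   % local deviation
mscn  = (img - mu) ./ (sigma + 1);             % normalized coefficients
% For a pristine natural image the mscn values are close to unit Gaussian;
% distortion changes the shape of their histogram.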
A. Mittal et al. [4] extended the metric to a completely blind quality analyzer that can
evaluate the quality of distorted images with very little prior knowledge of the distortions. The
model uses simple quality-aware statistical features based on a Natural Scene Statistics (NSS)
model. Such features of natural images are fitted with a MultiVariate Gaussian (MVG) model,
and the distance between this model and an MVG model fitted to the quality-aware features
extracted from the distorted image is used as the image quality score. The authors concluded that
the results are comparable with general NR-IQA models based on machine learning, and that the
model can be applied in unconstrained environments.
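The distance between two multivariate Gaussian (MVG) fits can be sketched as below. Here featNat and featDist stand for matrices of quality-aware features (one row per patch); they are filled with random placeholders because the feature extraction itself is omitted, so only the distance computation is illustrated.
featNat  = randn(200, 18);          % placeholder features from pristine patches
featDist = randn(150, 18) + 0.3;    % placeholder features from distorted patches
muN = mean(featNat);    covN = cov(featNat);
muD = mean(featDist);   covD = cov(featDist);
delta   = muN - muD;
quality = sqrt(delta * (((covN + covD)/2) \ delta'));   % larger distance = worse quality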
Y. Zhang et al. [115] proposed an NR-IQA method using log-derivative statistics of natural
scenes. The model, referred to as the DErivative Statistics-based Image QUality Evaluator
(DESIQUE), uses a two-stage framework that performs distortion identification followed by
distortion-specific evaluation. It extracts image-quality-related statistical features in both the
spatial and frequency domains at two image scales. Pixel-wise statistics are used along with
log-derivative statistics computed over pairs of pixels in the spatial domain, and log-Gabor filters
are used to extract the high-frequency components of the image in the frequency domain, which
are also modeled by log-derivative statistics. These statistical features are modeled by a
generalized Gaussian distribution, whose fitted parameters are used as the features of the method.
Experimental results show that the proposed method improves upon current NR-IQA models
while remaining computationally efficient.
Researchers have reported the following issues and challenges to be faced while developing
different IQA models.
3. No-reference image quality assessment (NR-IQA) is the most difficult task in the field of
image analysis. A general-purpose NR image quality metric is a more complex system, as it has
to handle several kinds of artifacts. The metric must be designed to assess unknown distortions
that may be introduced into images in the future.
4. The assumed statistical knowledge describing high-quality images in NR-IQA is not
restricted to a single original image; it is expressed as a probability distribution over all
high-quality natural images that fall within the space of possible images.
5. NR metrics must be able to differentiate between the signal and visible distortion with
limited input information, since many desired signals look like typical artifacts.
6. The human visual system is complex, and developing a perfect model of the human eye is
not possible. The sensitivity of the HVS to different errors differs and varies with visual context.
7. Mapping the computed quality score to a corresponding MOS value, so as to reflect the
way the HVS and the brain arrive at an assessment of the perceived distortion, is a difficult task.
2.4 Research Gaps
It is observed that research initially focused mainly on full-reference (FR) measures, which
assume the availability of the original image at the time of assessment. In fact, in this era of
multimedia, the original image is often not easily available or may not even exist. Therefore,
more work is needed to develop robust reduced-reference (RR) and no-reference (NR) metrics.
The techniques published up to 2010 for no-reference image quality assessment are mainly
distortion-specific: they assume a specific distortion in the image, and the metric is developed
considering the statistics of that distortion, so these metrics are not able to quantify other types
of distortions. The literature reveals that the development of generalized NR-IQA techniques has
accelerated recently, but researchers have concluded that there is still room for improvement and
that this domain is far from mature. Most generalized NR-IQA methods use transform-based
approaches, which increase the computational burden and slow down execution, making the
metrics unsuitable for real-time applications.
Moreover, most of these measures focus on gray scale image features. However, as human
beings can discern thousands of shades of color compared with only about two dozen shades of
gray, color is a powerful descriptor, and it has been shown that color information also deviates
when distortion is introduced into an image. Therefore, it is a challenging demand of this era to
develop a generalized no-reference image quality metric for color images that is efficient, fast,
and simple enough to be incorporated into real-time applications. Hence, it is proposed to design
a generalized framework for the assessment of color images in different color spaces using a
no-reference technique, with the following objective:
To design a full-reference HVS-based color image quality assessment measure based on
different color spaces.
Today's image quality assessment algorithms perform well in predicting human visual
judgment in conventional applications. Still, research in IQA needs to be extended to face current
challenges while keeping future challenges in view [127], [128]. The limitations of current IQA
algorithms show that there is room for the development of alternative methods beyond the
available techniques.
Chapter 3
SOFTWARE INTRODUCTION
3.1. Introduction to MATLAB
MATLAB is an interactive system whose basic data element is an array that does not
require dimensioning. This allows you to solve many technical computing problems, especially
those with matrix and vector formulations, in a fraction of the time it would take to write a
program in a scalar noninteractive language such as C or FORTRAN.
The name MATLAB stands for matrix laboratory. MATLAB was originally written to
provide easy access to matrix software developed by the LINPACK and EISPACK projects.
Today, MATLAB engines incorporate the LAPACK and BLAS libraries, embedding the state of
the art in software for matrix computation.
MATLAB has evolved over a period of years with input from many users. In university
environments, it is the standard instructional tool for introductory and advanced courses in
mathematics, engineering, and science. In industry, MATLAB is the tool of choice for high-
productivity research, development, and analysis.
MATLAB features a family of add-on application-specific solutions called toolboxes.
Very important to most uses of MATLAB, toolboxes allow you to learn and apply specialized
technology. Toolboxes are comprehensive collections of MATLAB functions (M – files) that
extend the MATLAB environment to solve particular classes of problems. Areas in which
toolboxes are available include signal processing, control systems, neural networks, fuzzy logic,
wavelets, simulation, and many others.
Development Environment:
This is the set of tools and facilities that help you use MATLAB functions and files.
Many of these tools are graphical user interfaces. It includes the MATLAB desktop and
command window, a command history, an editor and debugger, and browsers for viewing help,
the workspace, files, and the search path.
The MATLAB Language:
This is a high-level matrix/array language with control flow statements, functions, data
structures, input/output, and object-oriented programming features. It allows both “programming
in the small” to rapidly create quick and dirty throw-away programs, and “programming in the
large” to create large and complex application programs.
Graphics:
MATLAB has extensive facilities for displaying vectors and matrices as graphs, as well
as annotating and printing these graphs. It includes high-level functions for two-dimensional and
three-dimensional data visualization, image processing, animation, and presentation graphics. It
also includes low-level functions that allow you to fully customize the appearance of graphics as
well as to build complete graphical user interfaces on your MATLAB applications.
The MATLAB Application Program Interface (API):
This is a library that allows you to write C and FORTRAN programs that interact with
MATLAB. It includes facilities for calling routines from MATLAB (dynamic linking), calling
MATLAB as a computational engine, and for reading and writing MAT-files.
Various toolboxes are available in MATLAB for different computation and recognition
techniques; in this project, the Image Processing Toolbox is used.
A GUI made with MATLAB tools is stored in two files, which are generated the first time the
GUI is saved or run:
A file with extension .fig, called a FIG-file, that contains a complete graphical description of
all the GUI's objects or elements and their spatial arrangement. A FIG-file contains binary data
that does not need to be parsed when the associated GUI-based M-function is executed.
A file with extension .m, called a GUI M-file, which contains the code that controls the GUI
operation. This file includes the functions that are called when the GUI is launched and exited,
and callback functions that are executed when a user interacts with GUI objects, for example,
when a button is pushed.
To open GUIDE on an existing GUI, type
guide filename
where filename is the name of an existing FIG-file on the current path. If filename is omitted,
GUIDE opens a new, untitled FIG-file.
GUI components can include menus, toolbars, push buttons, radio buttons, list boxes, and
sliders just to name a few. GUIs created using MATLAB tools can also perform any type of
computation, read and write data files, communicate with other GUIs, and display data as tables
or as plots.
If you are new to MATLAB, you should start by reading Manipulating Matrices. The most
important things to learn are how to enter matrices, how to use the : (colon) operator, and how to
invoke functions. After you master the basics, you should read the rest of the sections below and
run the demos.
At the heart of MATLAB is a new language you must learn before you can fully exploit its
power. You can learn the basics of MATLAB quickly, and mastery comes shortly after. You will
be rewarded with high productivity, high-creativity computing power that will change the way
you work.
3.4.3 Manipulating Matrices - introduces how to use MATLAB to generate matrices and
perform mathematical operations on matrices.
3.5 DEVELOPMENT ENVIRONMENT
3.5.1 Introduction
This chapter provides a brief introduction to starting and quitting MATLAB, and the tools
and functions that help you to work with MATLAB variables and files. For more information
about the topics covered here, see the corresponding topics under Development Environment in
the MATLAB documentation, which is available online as well as in print.
You can change the directory in which MATLAB starts, define startup options including running
a script upon startup, and reduce startup time in some situations.
To end your MATLAB session, select Exit MATLAB from the File menu in the desktop, or
type quit in the Command Window. To execute specified functions each time MATLAB quits,
such as saving the workspace, you can create and run a finish.m script.
When you start MATLAB, the MATLAB desktop appears, containing tools (graphical user
interfaces) for managing files, variables, and applications associated with MATLAB. The first
time MATLAB starts, the desktop appears as shown in the following illustration, although your
Launch Pad may contain different entries.
You can change the way your desktop looks by opening, closing, moving, and resizing
the tools in it. You can also move tools outside of the desktop or return them back inside the
desktop (docking). All the desktop tools provide common features such as context menus and
keyboard shortcuts.
You can specify certain characteristics for the desktop tools by selecting Preferences from
the File menu. For example, you can specify the font characteristics for Command Window text.
For more information, click the Help button in the Preferences dialog box.
This section provides an introduction to MATLAB's desktop tools. You can also use
MATLAB functions to perform most of the features found in the desktop tools. The tools are:
Command Window
Use the Command Window to enter variables and run functions and M-files.
Command History
Lines you enter in the Command Window are logged in the Command History window. In
the Command History, you can view previously used functions, and copy and execute selected
lines. To save the input and output from a MATLAB session to a file, use the diary function.
You can run external programs from the MATLAB Command Window. The exclamation point
character (!) is a shell escape and indicates that the rest of the input line is a command to the
operating system. This is useful for invoking utilities or running other programs without quitting
MATLAB. On Linux, for example, !emacs magik.m invokes an editor called emacs for a file
named magik.m. When you quit the external program, the operating system returns control to
MATLAB.
Launch Pad
MATLAB's Launch Pad provides easy access to tools, demos, and documentation.
Help Browser
Use the Help browser to search and view documentation for all your MathWorks products.
The Help browser is a Web browser integrated into the MATLAB desktop that displays HTML
documents.
To open the Help browser, click the help button in the toolbar, or type helpbrowser in the
Command Window. The Help browser consists of two panes, the Help Navigator, which you use
to find information, and the display pane, where you view the information.
Help Navigator
Product filter - Set the filter to show documentation only for the products you specify.
Contents tab - View the titles and tables of contents of documentation for your products.
Index tab - Find specific index entries (selected keywords) in the MathWorks documentation
for your products.
Search tab - Look for a specific phrase in the documentation. To get help for a specific
function, set the Search type to Function Name.
Display Pane
After finding documentation using the Help Navigator, view it in the display pane. While
viewing the documentation, you can:
Browse to other pages - Use the arrows at the tops and bottoms of the pages, or use the back
and forward buttons in the toolbar.
Find a term in the page - Type a term in the Find in page field in the toolbar and click Go.
Other features available in the display pane are: copying information, evaluating a selection,
and viewing Web pages.
MATLAB file operations use the current directory and the search path as reference points.
Any file you want to run must either be in the current directory or on the search path.
Search Path
To determine how to execute functions you call, MATLAB uses a search path to find M-files
and other MATLAB-related files, which are organized in directories on your file system. Any
file you want to run in MATLAB must reside in the current directory or in a directory that is on
the search path. By default, the files supplied with MATLAB and MathWorks toolboxes are
included in the search path.
Workspace Browser
The MATLAB workspace consists of the set of variables (named arrays) built up during a
MATLAB session and stored in memory. You add variables to the workspace by using
functions, running M-files, and loading saved workspaces.
To view the workspace and information about each variable, use the Workspace browser, or
use the functions who and whos.
To delete variables from the workspace, select the variable and select Delete from the Edit
menu. Alternatively, use the clear function.
The workspace is not maintained after you end the MATLAB session. To save the workspace
to a file that can be read during a later MATLAB session, select Save Workspace As from the
File menu, or use the save function. This saves the workspace to a binary file called a MAT-file,
which has a .mat extension. There are options for saving to different formats. To read in a MAT-
file, select Import Data from the File menu, or use the load function.
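The same workspace operations can also be performed from the command line; the MAT-file name below is only an example.
who                      % list the variables currently in the workspace
whos                     % list variables with their size and class
save('mysession.mat')    % save the entire workspace to a MAT-file
clear                    % remove all variables from the workspace
load('mysession.mat')    % restore the saved variables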
Array Editor
Double-click on a variable in the Workspace browser to see it in the Array Editor. Use the
Array Editor to view and edit a visual representation of one- or two-dimensional numeric arrays,
strings, and cell arrays of strings that are in the workspace.
Editor/Debugger
Use the Editor/Debugger to create and debug M-files, which are programs you write to
run MATLAB functions. The Editor/Debugger provides a graphical user interface for basic text
editing, as well as for M-file debugging.
You can use any text editor to create M-files, such as Emacs, and can use preferences
(accessible from the desktop File menu) to specify that editor as the default. If you use another
editor, you can still use the MATLAB Editor/Debugger for debugging, or you can use debugging
functions, such as dbstop, which sets a breakpoint.
If you just need to view the contents of an M-file, you can display it in the Command
Window by using the type function.
The best way for you to get started with MATLAB is to learn how to handle matrices. Start
MATLAB and follow along with each example.
You can enter matrices into MATLAB in several different ways:
A=
16 3 2 13
5 10 11 8
9 6 7 12
4 15 14 1
This exactly matches the numbers in the engraving. Once you have entered the matrix, it is
automatically remembered in the MATLAB workspace. You can refer to it simply as A.
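For example (following the standard MATLAB getting-started material), a matrix such as the one shown above can be entered as an explicit list of elements, separating the elements of a row with blanks or commas and using a semicolon to end each row:
A = [16 3 2 13; 5 10 11 8; 9 6 7 12; 4 15 14 1]
MATLAB then echoes the matrix exactly as displayed above, and A can be used in subsequent expressions.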
3.6.2 Expressions
Like most other programming languages, MATLAB provides mathematical expressions, but
unlike most programming languages, these expressions involve entire matrices. The building
blocks of expressions are:
Variables
Numbers
Operators
Functions
Variables
MATLAB does not require any type declarations or dimension statements. When MATLAB
encounters a new variable name, it automatically creates the variable and allocates the
appropriate amount of storage. If the variable already exists, MATLAB changes its contents and,
if necessary, allocates new storage. For example, num_students = 25 creates a 1-by-1 matrix
named num_students and stores the value 25 in its single element.
Variable names consist of a letter, followed by any number of letters, digits, or underscores.
MATLAB uses only the first 31 characters of a variable name. MATLAB is case sensitive; it
distinguishes between uppercase and lowercase letters. A and a are not the same variable. To
view the matrix assigned to any variable, simply enter the variable name.
Numbers
MATLAB uses conventional decimal notation, with an optional decimal point and leading
plus or minus sign, for numbers. Scientific notation uses the letter e to specify a power-of-ten
scale factor. Imaginary numbers use either i or j as a suffix. Some examples of legal numbers are
3 -99 0.0001
1i -3.14159j 3e5i
All numbers are stored internally using the long format specified by the IEEE floating-point
standard. Floating-point numbers have a finite precision of roughly 16 significant decimal digits
and a finite range of roughly 10^-308 to 10^+308.
3.6.3 Operators
+ Addition
- Subtraction
* Multiplication
/ Division
^ Power
3.6.4 Functions
Some of the functions, like sqrt and sin, are built-in. They are part of the MATLAB core
so they are very efficient, but the computational details are not readily accessible. Other
functions, like gamma and sinh, are implemented in M-files. You can see the code and even
modify it if you want. Several special functions provide values of useful constants.
pi 3.14159265...
i, j Imaginary unit, sqrt(-1)
Inf Infinity
NaN Not-a-number
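A few short statements illustrating how these building blocks combine; the particular values are arbitrary examples, not taken from this report.
rho  = (1 + sqrt(5))/2      % numbers, operators and a built-in function
area = pi * 2.5^2           % the constant pi with the power operator
z    = 3 - 4j;              % a complex number using the j suffix
mag  = abs(z)               % magnitude of z, returns 5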
3.7 GUI
A graphical user interface (GUI) is a user interface built with graphical objects, such as
buttons, text fields, sliders, and menus. In general, these objects already have meanings to most
computer users. For example, when you move a slider, a value changes; when you press an OK
button, your settings are applied and the dialog box is dismissed. Of course, to leverage this
built-in familiarity, you must be consistent in how you use the various GUI-building
components.
Applications that provide GUIs are generally easier to learn and use since the person using
the application does not need to know what commands are available or how they work. The
action that results from a particular user action can be made clear by the design of the interface.
The sections that follow describe how to create GUIs with MATLAB. This includes laying
out the components, programming them to do specific things in response to user actions, and
saving and launching the GUI; in other words, the mechanics of creating GUIs. This
documentation does not attempt to cover the "art" of good user interface design, which is an
entire field unto itself. Topics covered in this section include:
While it is possible to write an M-file that contains all the commands to lay out a GUI, it is
easier to use GUIDE to lay out the components interactively and to generate two files that save
and launch the GUI:
A FIG-file - contains a complete description of the GUI figure and all of its
children (uicontrols and axes), as well as the values of all object properties.
An M-file - contains the functions that launch and control the GUI and the callbacks,
which are defined as subfunctions in the M-file.
Note that the application M-file does not contain the code that lays out the uicontrols; this
information is saved in the FIG-file.
The M-file contains code to implement a number of useful features (see Configuring
Application Options for information on these features). The M-file adopts an effective approach
to managing object handles and executing callback routines (see Creating and Storing the Object
Handle Structure for more information). The M-file provides a way to manage global data (see
Managing GUI Data for more information).
The automatically inserted subfunction prototypes for callbacks ensure compatibility with
future releases. For more information, see Generating Callback Function Prototypes for
information on syntax and arguments.
You can elect to have GUIDE generate only the FIG-file and write the application M-file
yourself. Keep in mind that there are no uicontrol creation commands in the application M-file;
the layout information is contained in the FIG-file generated by the Layout Editor.
Selecting GUIDE Application Options - set both FIG-file and M-file options.
Command-Line Accessibility
When MATLAB creates a graph, the figure and axes are included in the list of children of
their respective parents and their handles are available through commands such as findobj, set,
and get. If you issue another plotting command, the output is directed to the current figure and
axes.
GUIs are also created in figure windows. Generally, you do not want GUI figures to be
available as targets for graphics output, since issuing a plotting command could direct the output
to the GUI figure, resulting in the graph appearing in the middle of the GUI.
In contrast, if you create a GUI that contains an axes and you want commands entered in the
command window to display in this axes, you should enable command-line access.
3.7.5 User Interface Control
The Layout Editor component palette contains the user interface controls that you can use in
your GUI. These components are MATLAB uicontrol objects and are programmable via their
Callback properties. This section provides information on these components.
Push Buttons
Sliders
Toggle Buttons
Frames
Radio Buttons
Listboxes
Checkboxes
Popup Menus
Edit Text
Axes
Static Text
Figures
Push Buttons
Push buttons generate an action when pressed (e.g., an OK button may close a dialog box and
apply settings). When you click down on a push button, it appears depressed; when you release
the mouse, the button's appearance returns to its nondepressed state; and its callback executes on
the button up event.
Properties to Set
String - set this property to the character string you want displayed on the push button.
Tag - GUIDE uses the Tag property to name the callback subfunction in the application M-file.
Set Tag to a descriptive name (e.g., close_button) before activating the GUI.
Programming the Callback
When the user clicks on the push button, its callback executes. Push buttons do not return a
value or maintain a state.
Toggle Buttons
Toggle buttons generate an action and indicate a binary state (e.g., on or off). When you click
on a toggle button, it appears depressed and remains depressed when you release the mouse
button, at which point the callback executes. A subsequent mouse click returns the toggle button
to the nondepressed state and again executes its callback.
The callback routine needs to query the toggle button to determine what state it is in.
MATLAB sets the Value property equal to the Max property when the toggle button is depressed
(Max is 1 by default) and equal to the Min property when the toggle button is not depressed (Min
is 0 by default).
From the GUIDE Application M-File: The following code illustrates how to program the
callback in the GUIDE application M-file.
button_state = get(h,'Value');
if button_state == get(h,'Max')
    % toggle button is pressed - take appropriate action
elseif button_state == get(h,'Min')
    % toggle button is not pressed - take appropriate action
end
You can add an image to a push button or toggle button by assigning its CData property an
m-by-n-by-3 array of RGB values that defines a truecolor image. For example, the array a defines
a 16-by-128 truecolor image using random values between 0 and 1 (generated by rand).
a(:,:,1) = rand(16,128);
a(:,:,2) = rand(16,128);
a(:,:,3) = rand(16,128);
set(h,'CData',a)
Radio Buttons
Radio buttons are similar to checkboxes, but are intended to be mutually exclusive within a
group of related radio buttons (i.e., only one button is in a selected state at any given time). To
activate a radio button, click the mouse button on the object. The display indicates the state of
the button.
Radio buttons have two states - selected and not selected. You can query and set the state of a
radio button through its Value property: Value equals Max (1 by default) when the button is
selected, and Min (0 by default) when it is not.
To make radio buttons mutually exclusive within a group, the callback for each radio button
must set the Value property to 0 on all other radio buttons in the group. MATLAB sets the Value
property to 1 on the radio button clicked by the user.
The following subfunction, when added to the application M-file, can be called by each radio
button callback. The argument is an array containing the handles of all other radio buttons in the
group that must be deselected.
function mutual_exclude(off)
set(off,'Value',0)
The handles of the radio buttons are available from the handles structure, which contains the
handles of all components in the GUI. This structure is an input argument to all radio button
callbacks.
The following code shows the call to mutual_exclude being made from the first radio
button's callback in a group of four radio buttons.
off = [handles.radiobutton2,handles.radiobutton3,handles.radiobutton4];
mutual_exclude(off)
After setting the radio buttons to the appropriate state, the callback can continue with its
implementation-specific tasks.
Checkboxes
Check boxes generate an action when clicked and indicate their state as checked or not
checked. Check boxes are useful when providing the user with a number of independent choices
that set a mode (e.g., display a toolbar or generate callback function prototypes).
The Value property indicates the state of the check box by taking on the value of the Max or
Min property (1 and 0, respectively, by default): Value equals Max when the box is checked and
Min when it is not.
You can determine the current state of a check box from within its callback by querying the
state of its Value property, as illustrated in the following example:
function checkbox1_Callback(h,eventdata,handles,varargin)
if (get(h,'Value') == get(h,'Max'))
    % checkbox is checked - take appropriate action
else
    % checkbox is not checked - take appropriate action
end
Edit Text
Edit text controls are fields that enable users to enter or modify text strings. Use edit text
when you want text as input. The String property contains the text entered by the user.
To obtain the string typed by the user, get the String property in the callback.
user_string = get(h,'string');
MATLAB returns the value of the edit text String property as a character string. If you want
users to enter numeric values, you must convert the characters to numbers. You can do this using
the str2double command, which converts strings to doubles. If the user enters non-numeric
characters, str2double returns NaN.
You can use the following code in the edit text callback. It gets the value of the String
property and converts it to a double. It then checks if the converted value is NaN, indicating the
user entered a non-numeric character (isnan) and displays an error dialog (errordlg).
function edittext1_Callback(h,eventdata,handles,varargin)
user_entry = str2double(get(h,'string'));
if isnan(user_entry)
    errordlg('You must enter a numeric value','Bad Input','modal')
end
On UNIX systems, clicking on the menubar of the figure window causes the edit text
callback to execute. However, on Microsoft Windows systems, if an editable text box has focus,
clicking on the menubar does not cause the editable text callback routine to execute. This
behavior is consistent with the respective platform conventions. Clicking on other components in
the GUI executes the callback.
Static Text
Static text controls display lines of text. Static text is typically used to label other controls,
provide directions to the user, or indicate values associated with a slider. Users cannot change
static text interactively, and there is no way to invoke the callback routine associated with it.
Frames
Frames are boxes that enclose regions of a figure window. Frames can make a user interface
easier to understand by visually grouping related controls. Frames have no callback routines
associated with them and only uicontrols can appear within frames (axes cannot).
Frames are opaque. If you add a frame after adding components that you want to be
positioned within the frame, you need to bring forward those components. Use the Bring to
Front and Send to Back operations in the Layout menu for this purpose.
List Boxes
List boxes display a list of items and enable users to select one or more items.
The String property contains the list of strings displayed in the list box. The first item in the
list has an index of 1.
The Value property contains the index into the list of strings that correspond to the selected
item. If the user selects multiple items, then Value is a vector of indices. By default, the first item
in the list is highlighted when the list box is first displayed. If you do not want any item
highlighted, then set the Value property to empty.
The ListboxTop property defines which string in the list displays as the top most item when
the list box is not large enough to display all list entries. ListboxTop is an index into the array of
strings defined by the String property and must have a value between 1 and the number of
strings. Noninteger values are fixed to the next lowest integer.
The values of the Min and Max properties determine whether users can make single or
multiple selections:
If Max - Min > 1, then list boxes allow multiple item selection.
If Max - Min <= 1, then list boxes do not allow multiple item selection.
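As a small illustration of this rule (the tag listbox1 and the handles structure are assumed, as in the GUIDE examples above), the following sketch enables multiple selection and reads the result:
set(handles.listbox1,'Min',0,'Max',2);     % Max - Min > 1, so multiple selection is allowed
selected = get(handles.listbox1,'Value');  % vector of indices when several items are selected
items = get(handles.listbox1,'String');    % cell array (or string matrix) of the displayed items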
Selection Type
Listboxes differentiate between single and double clicks on an item and set the figure
SelectionType property to normal or open accordingly. See Triggering Callback Execution for
information on how to program multiple selection.
MATLAB evaluates the list box's callback after the mouse button is released or a keypress
event (including arrow keys) that changes the Value property (i.e., any time the user clicks on an
item, but not when clicking on the list box scrollbar). This means the callback is executed after
the first click of a double-click on a single item or when the user is making multiple selections.
In these situations, you need to add another component, such as a Done button (push button) and
program its callback routine to query the list box Value property (and possibly the figure
SelectionType property) instead of creating a callback for the list box. If you are using the
automatically generated application M-file option, you need to do one of the following:
Set the list box Callback property to the empty string ('') and remove the callback
subfunction from the application M-file, or
Leave the callback subfunction stub in the application M-file so that no code executes when
users click on list box items.
The first choice is best if you are sure you will not use the list box callback and you want to
minimize the size of the application M-file. However, if you think you may want to define a
callback for the list box at some time, it is simpler to leave the callback stub in the M-file.
Popup Menus
Popup menus open to display a list of choices when users press the arrow. The String
property contains the list of strings displayed in the popup menu. The Value property contains the
index into the list of strings that correspond to the selected item. When not open, a popup menu
displays the current choice, which is determined by the index contained in the Value property.
The first item in the list has an index of 1.
Popup menus are useful when you want to provide users with a number of mutually
exclusive choices, but do not want to take up the amount of space that a series of radio buttons
requires.
You can program the popup menu callback to work by checking only the index of the item
selected (contained in the Value property) or you can obtain the actual string contained in the
selected item.
This callback checks the index of the selected item and uses a switch statement to take action
based on the value. If the contents of the popup menu are fixed, then you can use this approach.
function varargout = popupmenu1_Callback(h,eventdata,handles,varargin)
val = get(h,'Value');
switch val
case 1
    % take action for the first menu item
case 2
    % take action for the second menu item
% etc.
end
This callback obtains the actual string selected in the popup menu. It uses the value to index
into the list of strings. This approach may be useful if your program dynamically loads the
contents of the popup menu based on user action and you need to obtain the selected string. Note
that it is necessary to convert the value returned by the String property from a cell array to a
string.
val = get(h,'Value');
string_list = get(h,'String');
selected_string = string_list{val}; % convert the selected entry from a cell array to a string
You can control whether a control responds to mouse button clicks by setting the Enable
property. Controls have three states:
on - The control is operational.
off - The control is disabled and its label (set by the String property) is grayed out.
inactive - The control is disabled, but its label is not grayed out. When a control is disabled,
clicking on it with the left mouse button does not execute its callback routine. However, the left-
click causes two other callback routines to execute: First the figure WindowButtonDownFcn
callback executes. Then the control's ButtonDownFcn callback executes. A right mouse button
click on a disabled control posts a context menu, if one is defined for that control. See the Enable
property description for more details.
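For example (assuming a control tagged pushbutton1, a hypothetical name), a callback could switch between these states as follows; this is only a sketch of how the Enable property is used:
set(handles.pushbutton1,'Enable','off');       % disabled, label grayed out
set(handles.pushbutton1,'Enable','inactive');  % disabled, label not grayed out
set(handles.pushbutton1,'Enable','on');        % restore normal operation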
Axes
Axes enable your GUI to display graphics (e.g., graphs and images). Like all graphics objects,
axes have properties that you can set to control many aspects of their behavior and appearance. See
Axes Properties for general information on axes objects.
Axes Callbacks
Axes are not uicontrol objects, but can be programmed to execute a callback when users click
a mouse button in the axes. Use the axes ButtonDownFcn property to define the callback.
GUIs that contain axes should ensure the Command-line accessibility option in the
Application Options dialog is set to Callback (the default). This enables you to issue plotting
commands from callbacks without explicitly specifying the target axes.
If a GUI has multiple axes, you should explicitly specify which axes you want to target when
you issue plotting commands. You can do this using the axes command and the handles
structure. For example, axes(handles.axes1) makes the axes whose Tag property is axes1 the
current axes, and therefore the target for plotting commands. You can switch the current axes
whenever you want to target a different axes. See GUI with Multiple Axes for an example that
uses two axes.
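For instance, with two axes tagged axes1 and axes2 (hypothetical tags), a callback could direct its output explicitly:
axes(handles.axes1);      % make axes1 the current axes
plot(rand(10,1));         % this plot is drawn in axes1
axes(handles.axes2);      % switch the target axes
imagesc(rand(64,64));     % this image is displayed in axes2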
Figure
Figures are the windows that contain the GUI you design with the Layout Editor. See the
description of figure properties for information on what figure characteristics you can control.
Chapter 4
Proposed Method
4.1 Proposed Framework
To tackle this problem, we propose a novel NR-IQA framework called the deep blind
image quality assessor (DIQA). The DIQA is trained in two separate stages, as shown in Fig. 1.1.
In the first stage, an objective error map is used as a proxy training target to expand the data set
labels. The existing databases provide a subjective score for each distorted image; in other
words, one training data item maps a 3-D tensor (width, height, and channel) to a scalar value.
Given a distorted image I_d and a scalar subjective score S, the optimal parameters θ of a model
are sought by argmin_θ || f(I_d; θ) − S ||², where f(·) is a prediction function. In contrast, the
DIQA utilizes reference images during training and generates a 2-D intermediate target called
the objective error map. Note that the reference images are accessible during training as long as
the database provides them, and the ground-truth objective error map can easily be derived by
comparing the reference and distorted images. By expanding the training target to a 2-D error
map e, we instead solve argmin_θ Σ_(i,j) || f(I_d; θ)(i, j) − e(i, j) ||², where (i, j) is a pixel
index. In other words, this yields the effect of increasing the number of training pairs up to the
dimensions of the error map by imposing more constraints. Once the deep neural network is
trained with a sufficient training data set, the model is fine-tuned to predict the subjective scores.
Since the objective error map is somewhat correlated with the subjective score, the second stage
can be trained without great difficulty even with a limited data set. In the end, our model can
predict the subjective scores without accessing the ground-truth objective error maps during
testing.
Overall, we resolve the NR-IQA problem by dividing it into the objective distortion and
the HVS-related parts. In the objective distortion part, a pixelwise objective error map is
predicted using the CNN model. In the HVS-related part, the model further learns the human
visual perception behavior.
However, there persists another problem in the objective error map prediction phase.
When severe distortion is applied to an image and its high-frequency detail is lost, its error map
obtains more high-frequency components, while the distorted image itself does not retain such
details. Therefore, without the reference image, it is difficult to predict an accurate
error map from the distorted image, in particular, on homogeneous regions. To avoid this
problem, we propose deriving a reliability map by measuring textural strength to compensate for
the inaccuracy of the error map.
To visualize and analyze the learned human visual sensitivity, we further propose an
alternative model, which we call DIQA-SENS. We use two separate CNN branches, one
dedicated to learning the objective distortion and the other to learning the human visual
sensitivity. In particular, the visual sensitivity branch predicts local visual weights of the
objective error map by seeing the triplet of a distorted image, its objective error map, and its
ground-truth subjective score. The multiplication of the objective error map and the sensitivity
map results in a perceptual error map, which can explain the degree of distortion from the
perspective of the HVS. The main advantages of the proposed framework are summarized as follows:
1) Using the simple objective error map, the training data set can be easily augmented, and the
deep CNN model can be trained without an overfitting problem.
2) The DIQA is trained via end-to-end optimization, so that the parameters can be thoroughly
optimized to achieve state-of-the-art correlation with human subjective scores.
3) The DIQA-SENS generates the objective error map and the perceptual error map as
intermediate results, which provide an intuitive analysis of local artifacts in distorted images.
4.2 Related Work
Most previously proposed NR-IQA methods were developed based on the machine learning
framework. Researchers attempted to design elaborate features that could discriminate distorted
images from pristine images. One popular family of features is natural scene statistics (NSS),
which assumes that natural scenes contain statistical regularities. Various types of NSS features
have been defined in transform and spatial domains in the literature. Moorthy and Bovik [1]
extracted features in the wavelet domain, and Saad et al. [2] defined them over the discrete
cosine transform coefficients. Recently, Mittal et al. [11], [12] captured NSS features using only
locally normalized images without any domain transformation.
In addition to NSS features, various kinds of features have been developed for NR-IQA.
Li et al. [13] employed a general regression neural network operating on phase congruency,
entropy, and image gradient features. Tang et al. [14] considered multiple features such as natural
image statistics, distortion textures, blur, and noise statistics. Meanwhile, in [15] and [16], dictionary
learning was adapted to capture effective features from the raw patches. Most of these studies
were based on conventional machine learning algorithms, such as SVMs and NNs. Since such
models have a limited number of parameters, the size of the data set was not a significant issue.
However, they yielded lower accuracies than FR-IQA metrics.
Relatively recently, attempts have been made to adopt a deep learning technique for the
NR-IQA problem to enhance prediction accuracy [42]. Hou et al. [3] used a DBN, where NSS-
related features were extracted in the wavelet domain and fed into the deep model. Similarly, Li
et al. [4] derived NSS-related features from Shearlet-transformed images. The extracted features
were then regressed onto a subjective score using a stacked autoencoder. Lv et al. [17] used DoG
features and the stacked autoencoder. Ghadiyaram and Bovik [18] attempted to capture a large
number of NSS features using multiple transforms and then used a DBN to predict the subjective
score. However, most studies have used the deep model in place of the conventional regression
machine. Because the handcrafted features were designed to be of small size, the neural
networks were not deep enough to take full advantage of deep learning. Kang et al.
[19] applied a CNN to the NR-IQA problem without handcrafted features to conduct end-to-end
optimization. To resolve the data set size issue, an input image was divided into multiple patches, and
an equal mean opinion score (MOS) was used for all patches in an image. Strictly speaking, this
approach cannot reflect properties of the HVS, the pixelwise perceptual quality of which varies
over the spatial domain. Bosse et al. [20] adopted a deep CNN model with 12 layers. The loss
function was similar to [19]; however, they suggested an additional model, which learns the
individual importance of each patch. Recently, we proposed a CNN-based NR-IQA framework,
where FR-IQA metrics were employed as intermediate training targets of the CNN [21], and the
statistical pooling over minibatch was introduced for end-to-end optimization. On the other hand,
to overcome the limited training set, other attempts have been made by generating discriminable
image pairs [22], or employing multitask learning [23]. In contrast to past work, the DIQA
resolves the issue of the lack of a data set by utilizing reference images in training to generate an
intermediate target. Different from our previous work [21], the DIQA does not depend on
complicated FR-IQA metrics. In addition, the DIQA uses only convolutional layers in the
pretraining stage so that the model can be deeper and can use a larger proxy target. Our proposed
framework achieves state-of-the-art prediction accuracy using the strong representation
capability of CNN models.
4.3 Deep Image Quality Assessment Predictor
The overall framework of the DIQA is shown in Fig. 1.1. Once an input distorted image is
normalized, it passes through two paths: 1) a CNN branch and 2) a reliability map prediction
branch. In the first training stage, the CNN branch is trained to predict an objective error map e.
The ground-truth error map e_gt is obtained by comparing the reference and distorted images. In
the second stage, the model is further trained to predict a human subjective score S (Section
4.3.5). In each stage, the reliability map r is supplemented to compensate for the inaccuracy on
homogeneous regions.
4.3.1 Model Architecture
The design of the proposed CNN architecture is motivated by [24]. The structure of the DIQA is
shown in Fig. 4.3.1. For the error map prediction part, the model consists of only convolutional
layers, and zeros are padded around the border before each convolution; therefore, the output
does not lose relative pixel position information. Each layer except the last one has a 3 × 3 filter
and a rectified linear unit (ReLU) [25]. We call the output of Conv8 the feature map (filled with
yellow in Fig. 4.3.1), which is reused in the second stage of training. In the last layer of the first
training stage, the feature map is reduced to a one-channel objective error map using a 1 × 1
filter without nonlinear activation. If we directly fed the predicted error map into the modules of
the second stage, it would hinder the abundant representation of features, because there is only
one channel in the error map. To avoid this problem, we employ a simple linear combination
over channels in Conv9, so that we can generate a meaningful feature map closely related to the
ground-truth error map while keeping multiple channels for better representation. The size of the
output of Conv9 is 1/4 of the original input image; correspondingly, the ground-truth objective
error maps are downscaled by 1/4. For the downsampling operation, convolution with a stride of
2 is used. In the second training stage, the extracted feature map is fed into a global average
pooling layer followed by two fully connected layers. We additionally use two handcrafted
features, which will be explained later. The handcrafted features are concatenated with the
pooled features before FC1, and then regressed onto a subjective score. For convenience, we
denote the procedure from Conv1 to Conv8 by f(·), the operation of Conv9 by g(·), and the
procedure including FC1 and FC2 by h(·).
Fig. 4.3.1 Architecture of the objective pixel error map prediction subnetwork. "Conv"
indicates the convolutional layers, and "FC" indicates fully connected layers. The text below
each "Conv" indicates its filter size. The red (blue) arrows indicate the flow of the first
(second) stage.
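A rough sketch of the error map prediction branch in MATLAB Deep Learning Toolbox syntax is shown below. The 112 × 112 grayscale input corresponds to the patch size used later; the channel widths of Conv1-Conv7 and the positions of the two stride-2 convolutions are assumptions made only for illustration and are not specified in the text above.
layers = [
    imageInputLayer([112 112 1],'Normalization','none')   % normalized grayscale patch
    convolution2dLayer(3,48,'Padding','same')              % Conv1, 3x3 filter
    reluLayer
    convolution2dLayer(3,48,'Padding','same','Stride',2)   % Conv2, assumed stride-2 downscaling
    reluLayer
    convolution2dLayer(3,64,'Padding','same')              % Conv3
    reluLayer
    convolution2dLayer(3,64,'Padding','same','Stride',2)   % Conv4, assumed stride-2 downscaling
    reluLayer
    convolution2dLayer(3,64,'Padding','same')              % Conv5
    reluLayer
    convolution2dLayer(3,64,'Padding','same')              % Conv6
    reluLayer
    convolution2dLayer(3,128,'Padding','same')             % Conv7
    reluLayer
    convolution2dLayer(3,128,'Padding','same')             % Conv8, 128-channel feature map
    reluLayer
    convolution2dLayer(1,1)                                % Conv9, 1x1 linear map to one channel
    ];
The output of this stack is 28 × 28, i.e., 1/4 of the 112 × 112 input in each dimension, matching the downscaled ground-truth error map.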
4.3.2 Image Normalization
As a preprocessing step, the input images are first converted to grayscale, and their low-pass
filtered versions are subtracted from them. Let I_r be a reference image and I_d be the
corresponding distorted image; the normalized versions are denoted by Î_r and Î_d, respectively.
The low-frequency image is obtained by downscaling the input image to 1/4 of its size and
upscaling it again to the original size, and is denoted by I_r^low and I_d^low. A Gaussian
low-pass filter and subsampling were used to resize the images.
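A minimal MATLAB sketch of this normalization is given below; the file name is a placeholder, and imresize's default interpolation stands in here for the Gaussian low-pass filtering and subsampling described above:
I    = im2double(rgb2gray(imread('distorted_image.jpg')));  % placeholder file name
Ilow = imresize(imresize(I, 1/4), size(I));                 % downscale to 1/4 and upscale back
Ihat = I - Ilow;                                            % normalized (high-frequency) image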
There are two reasons for this simple normalization. First, image distortions barely affect
the low-frequency component of images. For example, white Gaussian noise (WN) adds random
high-frequency components to images, GB removes high-frequency details, and blocking
artifacts introduce new high-frequency edges. The distortions due to JPEG and JPEG2000
(JP2K) can be modeled by a combination of these artifacts [26]. Second, the HVS is not sensitive
to a change in the low-frequency band. The contrast sensitivity function (CSF) shows a bandpass filter shape peaking at
approximately four cycles per degree, and sensitivity drops rapidly at low frequency [8].
Although there are small distortions in the low-frequency band, the HVS hardly notices them.
Though there are benefits of employing this normalization scheme, there is also a drawback of
losing information. To compensate for this, two handcrafted features are supplemented in the second
training stage.
4.3.3 Reliability Map Prediction
Many distortions, such as quantization by JP2K or GB, make images blurry. However,
unlike FR-IQA, it is difficult to determine whether a blurry region is distorted without knowing
its pristine image. Furthermore, as severe distortion is applied to an image, its error map receives
more high-frequency components, while the distorted image loses more high-frequency details,
as shown in Fig. 4.3.3. Therefore, the model is likely to fail to predict the objective error map
on homogeneous regions.
Fig. 4.3.3 Examples of estimated reliability maps. (a)–(c) JPEG2000 distorted images in the
TID2013 data set at distortion levels 1, 3, and 5. (d)–(f) Difference maps derived by
using (5). (g)–(i) Reliability maps of (a)–(c).
To avoid this problem, the reliability of the predicted error map is estimated by measuring the
texture strength of the distorted image. Our assumption is that blurry regions have lower
reliability than textured regions. The preprocessed (bandpass-filtered) image is used to measure
the reliability map as
r = 2 / (1 + exp(−α · |Î_d|)) − 1,    (1)
where α controls the saturation property of the reliability map. The positive half of the sigmoid
function is used in (1), so that pixels with small values are assigned sufficiently large reliability
values.
The images shown in Fig. 4.3.3(a)–(c) are distorted by JPEG2000 at different levels, and
the corresponding difference maps derived by (5) are shown in Fig. 4.3.3(d)–(f). It can easily be
checked that it is difficult to derive an accurate error map (f) from a severely distorted image (c).
The estimated reliability maps with α = 1 are shown in Fig. 4.3.3(g)–(i). As shown in
Fig. 4.3.3(i), the reliability map has zero values where there is no meaningful spatial information
in Fig. 4.3.3(c).
Fig. [Link] Histograms of error maps with different values of p. (a) p = 1. (b) p = 0.2.
To prevent the reliability map from directly affecting the predicted score, it is divided by
its average as
r̂_(i,j) = r_(i,j) / ( (1/(H·W)) Σ_(i′,j′) r_(i′,j′) ),    (2)
where H and W are the height and width of the reliability map.
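Assuming the sigmoid form written in (1), the reliability map and its normalized version could be computed as follows (Ihat_d denotes the normalized distorted image; α = 1 as in the example of Fig. 4.3.3):
alpha = 1;
r     = 2 ./ (1 + exp(-alpha .* abs(Ihat_d))) - 1;  % positive half of the sigmoid, as in (1)
r_hat = r ./ mean(r(:));                            % divide by the average, as in (2)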
4.3.4 Learning Objective Error Map
In the first stage of training, the objective error maps are used as proxy regression targets
to obtain the effect of increased data. The loss function is defined by the mean squared error
between the predicted and ground-truth error maps, weighted by the reliability map:
L_obj(θ) = || r̂ ⊙ g( f(Î_d; θ) ) − r̂ ⊙ e_gt ||²,    (3)
where f(·) and g(·) are defined in Fig. 4.3.1, ⊙ denotes element-wise multiplication, θ represents
the CNN's parameters, and e_gt is defined by
e_gt = err(Î_r, Î_d).    (4)
Here, any error metric function can be used for err(·). In our experiment, we chose the exponent
difference function
err(Î_r, Î_d) = |Î_r − Î_d|^p,    (5)
where p is the exponent. When the absolute difference (p = 1) is used as the error metric, most
values in the error map are small numbers close to zero, and the model tends to fail to predict an
accurate error map; when the training process converged, most predicted values were zero in our
experiment. Therefore, we chose p = 0.2 to spread the distribution of the difference map over
higher values. Fig. [Link] shows a comparison of histograms for the two exponents, where the
histogram for p = 0.2 has a broader distribution between 0 and 1.
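As a short sketch of this target (Ihat_r and Ihat_d are the normalized reference and distorted images, and the 1/4 downscaling matches the Conv9 output size):
p    = 0.2;
e_gt = abs(Ihat_r - Ihat_d).^p;    % exponent difference error map, as in (5)
e_gt = imresize(e_gt, 1/4);        % downscale to 1/4 to match the predicted map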
4.3.5 Learning Subjective Opinion
Once the model is trained to predict the objective error maps, we move to the next
training stage, where the DIQA is trained to predict subjective scores. To achieve this, the trained
subnetwork f(·) is connected to a global average pooling layer followed by the fully connected
layers, as shown in Fig. 4.3.1. The feature map is averaged over the spatial domain, leading to a
128-D feature vector. Here, to compensate for the lost information, we consider two additional
handcrafted features: the mean of the non-normalized reliability map, μ_r, and the standard
deviation of the low-frequency image of the distorted image, σ_(I_d^low). If the distorted image
is too blurred, the reliable area becomes too small; in this case, the overall textural strength of the
distorted image becomes an important feature, which can be captured by μ_r. Therefore, the loss
function is defined as
L_subj(θ) = || h(v; θ) − S ||²,    (6)
where h(·) is the nonlinear regression function composed of FC1 and FC2, S is the ground-truth
subjective score of the input distorted image, and v is the pooled feature vector. v is defined by
concatenating the globally averaged feature map with the two handcrafted features:
v = [ GAP(f(Î_d; θ)), μ_r, σ_(I_d^low) ],
where GAP(·) denotes global average pooling over the spatial domain.
4.3.6 Training
In this section, we describe the training details of the DIQA. The layers for error map
prediction are first trained by minimizing (3), where the ground-truth error map is derived from
(5). When the first stage converges to a sufficient extent, (6) is then minimized in the second
stage.
Since zeros are padded before each convolution, the feature maps near the borders tend to
be zeros. Therefore, during the minimization of the loss functions in (3) and (6), we ignored
pixels near the borders of the error and perceptual error maps. Four rows or columns at each
border were excluded in the experiment, which compensated for the information loss in the last
two convolutional layers.
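For example, with four ignored rows and columns per border, the first-stage loss could be evaluated only on the cropped interior; e_pred and e_gt below denote the predicted and ground-truth maps at 1/4 scale and are placeholder names:
b = 4;                                        % ignored pixels per border
e_pred_c = e_pred(b+1:end-b, b+1:end-b);      % crop the predicted error map
e_gt_c   = e_gt(b+1:end-b, b+1:end-b);        % crop the ground-truth error map
loss = mean((e_pred_c(:) - e_gt_c(:)).^2);    % mean squared error on the interior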
4.3.7 Patch-Based Training
In the DIQA framework, the sizes of the input images must be fixed to train the model on a GPU.
Therefore, to train the DIQA using images of various sizes, such as in the LIVE IQA database
[5], each input image should be divided into multiple patches of the same size. Here, the step of
the sliding window is determined by the patch size and the number of ignored pixels around the
borders, so as to avoid overlapping regions when the perceptual error map is reconstructed.
When four pixels around each border are ignored, the step should be step_patch = size_patch − 32,
where 32 is determined by 4 (ignored pixels) × 2 (both sides of the border) × 4 (upscaling by 4).
In the experiment with the LIVE IQA database, the patch size was 112 × 112 and each step was
80 × 80. In addition, during the training of the second stage, all patches composing an image
should be in the same mini-batch [21], so that v, μ_r, and σ_(I_d^low) can be derived from the
reconstructed perceptual error and reliability maps.
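A sketch of the sliding-window patch extraction for the LIVE IQA setting described above (patch size 112, step 112 − 32 = 80; Ihat_d is the normalized distorted image):
patchSize = 112;
step      = patchSize - 32;                    % = 80, so the retained regions do not overlap
[H, W]    = size(Ihat_d);
patches   = {};
for row = 1:step:H-patchSize+1
    for col = 1:step:W-patchSize+1
        patches{end+1} = Ihat_d(row:row+patchSize-1, col:col+patchSize-1);
    end
end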
Chapter 5
SIMULATION RESULTS
Fig. 5.1 Simulation results
Chapter 6
PROGRAM
clc
clear all
close all
warning off
[filename, pathname] = uigetfile({'*.jpg'},'pick an image');
if isequal(filename,0) || isequal(pathname,0)
warndlg('File is not selected');
else
end
[pathstr,name,ext] = fileparts(filename);
filename11=char(filename);
I=imread(filename);
I=imresize(I,[256 256]);
nSig=imnoise(I,'poisson');
figure,imshow(nSig);
imwrite(nSig,'[Link]');
img=imcrop(nSig);
figure,imshow(img);title('Original image');
%%%%%%%%%%%%%%% Apply Discrete wavelet transform %%%%%%%%%%%%%%
[cA,cH,cV,cD]=dwt2(I ,'haar');
figure,subplot(2,2,1),imshow(mat2gray(cA));title('LL Image');
subplot(2,2,2),imshow(cH);title('LH Image');
subplot(2,2,3),imshow(cV);title('HL Image');
subplot(2,2,4),imshow(cD);title('HH Image');
v = y;                          % assumes x, y, u and w1 are defined earlier in the script
quiver(x,y,u,v);
w2=gray2rgb(y);
w2=num2cell(w2(:,1:7),2);
e_j=std(y);
disp('standard deviation of energy---');
disp(e_j);
clc;
% Image fusion process starts here
for k1=1:k
    for p=1:2
        for d1=1:2
            for d2=1:3
                k=6;
                x = w1{k};
                y = w2{k};
                x = sort(x);                          % sorting or ordering rank coefficients
                D = (abs(x)-abs(y)) >= 0;
                wf{k1}{p}{d1}{d2} = D.*x + (~D).*y;   % keep the larger-magnitude coefficient
            end
        end
    end
end
[nndata AlexNet]=size(wf);
nn_data_image=0:0.01:nndata;
y=nn_data_image.^2;
net=newff(minmax(nn_data_image),[20,AlexNet],{'logsig','purelin'},'trainlm'); % feed-forward net, 20 hidden neurons, Levenberg-Marquardt training
[Link]=4000;
[Link]=1e-25;
[Link]=0.01;
% net=train(net,nn_data_image,y);
out_nn=y(20);
out_nn=net(nn_data_image(20));
nn_data=(ceil(out_nn)./20)/out_nn;
out_AlexNet=ceil(nn_data);
out_AlexNet=length(out_AlexNet)-2;
%%%%%%%%% Apply inverse double density DWT %%%%%%%%%
load w
y = Iterative_HS(w,out_AlexNet,Fsf,sf);
y=imadd(y,33);
y=double(y);
figure; imshow(mat2gray(y)); title('NR-IQA image');
imwrite(y,'out_img.jpg');
y=imresize(y,[256 256]);
nSig=imresize(nSig,[256 256]);
[PSNR,MSE,MAXERR,L2RAT]=measerr(nSig,y)   % quality measures between the noisy input and the output
Chapter 7
CONCLUSION
We described a deep CNN-based NR-IQA framework. Applying a CNN to NR-IQA is
challenging, mainly because the available subjective databases are too small to train a deep
model directly. In the DIQA, an objective error map was used as an intermediate regression
target to avoid overfitting with the limited database. When the first training stage is not run long
enough, the DIQA suffers from overfitting, leading to a degradation of performance. The input
normalization and the reliability map also increased the accuracy significantly. The final DIQA
model outperformed the benchmarked full-reference methods as well as the no-reference
methods. We further showed that the performance of
the DIQA is independent of the selection of the database. We additionally proposed the DIQA-
SENS to visualize and analyze the learned perceptual error maps. The perceptual error maps
followed the behavior of the HVS. In the future, we will investigate a new way to obtain more
meaningful sensitivity maps that can provide a more interpretable analysis with respect to the
HVS.
REFERENCES
[1] A. K. Moorthy and A. C. Bovik, “Blind image quality assessment: From natural scene
statistics to perceptual quality,” IEEE Trans. Image Process., vol. 20, no. 12, pp. 3350–3364,
Dec. 2011.
[2] M. A. Saad, A. C. Bovik, and C. Charrier, “Blind image quality assessment: A natural
scene statistics approach in the DCT domain,” IEEE Trans. Image Process., vol. 21, no. 8, pp.
3339–3352, Aug. 2012.
[3] W. Hou, X. Gao, D. Tao, and X. Li, “Blind image quality assessment via deep learning,”
IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 6, pp. 1275–1286, Jun. 2015.
[4] Y. Li et al., “No-reference image quality assessment with shear- let transform and deep
neural networks,” Neurocomputing, vol. 154, pp. 94–109, Apr. 2015.
[5] H. R. Sheikh, M. F. Sabir, and A. C. Bovik, "A statistical evaluation of recent full reference
image quality assessment algorithms," IEEE Trans. Image Process., vol. 15, no. 11, pp. 3440–3451, Nov. 2006.
[6] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale
hierarchical image database,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun.
2009, pp. 248–255.
[7] S. J. Daly, “The visible differences predictor: An algorithm for the assessment of image
fidelity,” Proc. SPIE, vol. 1666, pp. 179–206, Jan. 1992.
[8] A. B. Watson and A. J. Ahumada, “A standard model for foveal detection of spatial
contrast,” J. Vis., vol. 5, no. 9, p. 6, 2005.
[9] G. E. Legge and J. M. Foley, “Contrast masking in human vision,” J. Opt. Soc. Amer., vol.
70, no. 12, pp. 1458–1471, Dec. 1980.
[10] D. J. Field, “Relations between the statistics of natural images and the response properties
of cortical cells,” J. Opt. Soc. Amer. A, Opt. Image Sci., vol. 4, no. 12, pp. 2379–2394, 1987.
[11] A. Mittal, A. K. Moorthy, and A. C. Bovik, "No-reference image quality assessment in the
spatial domain," IEEE Trans. Image Process., vol. 21, no. 12, pp. 4695–4708, Dec. 2012.
[12] A. Mittal, R. Soundararajan, and A. C. Bovik, "Making a 'completely blind' image quality
analyzer," IEEE Signal Process. Lett., vol. 20, no. 3, pp. 209–212, Mar. 2013.
[13] C. Li, A. C. Bovik, and X. Wu, “Blind image quality assessment using a general
regression neural network,” IEEE Trans. Neural Netw., vol. 22, no. 5, pp. 793–799, May 2011.
[14] H. Tang, N. Joshi, and A. Kapoor, “Learning a blind measure of perceptual image
quality,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2011, pp. 305–312.
[15] P. Ye, J. Kumar, L. Kang, and D. Doermann, “Unsupervised feature learning framework
for no-reference image quality assessment,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit.
(CVPR), Jun. 2012, pp. 1098–1105.
[16] J. Xu, P. Ye, Q. Li, H. Du, Y. Liu, and D. Doermann, “Blind image quality assessment
based on high order statistics aggregation,” IEEE
Trans. Image Process., vol. 25, no. 9, pp. 4444–4457, Sep. 2016.
[17] Y. Lv, G. Jiang, M. Yu, H. Xu, F. Shao, and S. Liu, “Difference of Gaussian statistical
features based blind image quality assessment: A deep learning approach,” in Proc. IEEE Int.
Conf. Image Process. (ICIP), Sep. 2015, pp. 2344–2348.
[18] D. Ghadiyaram and A. C. Bovik, “Feature maps driven no-reference image quality
prediction of authentically distorted images,” Proc. SPIE, vol. 9394, p. 93940J, Mar. 2015.
[19] L. Kang, P. Ye, Y. Li, and D. Doermann, “Convolutional neural net- works for no-
reference image quality assessment,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit.
(CVPR), Jun. 2014, pp. 1733–1740.
[20] S. Bosse, D. Maniry, T. Wiegand, and W. Samek, “A deep neural network for image
quality assessment,” in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2016, pp. 3773–3777.
[21] J. Kim and S. Lee, “Fully deep blind image quality predictor,” IEEE J. Sel. Topics Signal
Process., vol. 11, no. 1, pp. 206–220, Feb. 2017.
[22] K. Ma, W. Liu, T. Liu, Z. Wang, and D. Tao, “dipIQ: Blind image quality assessment by
learning-to-rank discriminable image pairs," IEEE Trans. Image Process., vol. 26, no. 8, pp. 3951–3964, Aug. 2017.
[23] K. Ma, W. Liu, K. Zhang, Z. Duanmu, Z. Wang, and W. Zuo, “End-to- end blind image
quality assessment using deep neural networks,” IEEE
Trans. Image Process., vol. 27, no. 3, pp. 1202–1213, Mar. 2018.
[24] K. Simonyan and A. Zisserman. (Sep. 2014). “Very deep convolutional networks for
large-scale image recognition.” [Online]. Available: [Link]
[25] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann
machines,” in Proc. Int. Conf. Mach. Learn. (ICML), 2010, pp. 807–814.
[26] H. R. Sheikh and A. C. Bovik, "Image information and visual quality," IEEE Trans. Image
Process., vol. 15, no. 2, pp. 430–444, Feb. 2006.
[27] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in Proc. Int. Conf.
Learn. Represent. (ICLR), 2015, pp. 1–15.
[28] T. Dozat, “Incorporating Nesterov momentum into Adam,” in Proc. ICLR Workshop,
2016.
[34] Final Report From the Video Quality Experts Group on the Validation of Objective
Models of Video Quality Assessment, Phase II (FR-TV2),
[35] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment:
From error visibility to structural similarity,” IEEE
Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[36] L. Zhang, L. Zhang, X. Mou, and D. Zhang, “FSIM: A feature similarity index for image
quality assessment,” IEEE Trans. Image Process., vol. 20, no. 8, pp. 2378–2386, Aug. 2011.
[37] J. Kim and S. Lee, “Deep learning of human visual sensitivity in image quality
assessment framework,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017,
pp. 1969–1977.
[38] L. Zhang, L. Zhang, and A. C. Bovik, “A feature-enriched completely blind image quality
evaluator,” IEEE Trans. Image Process., vol. 24, no. 8, pp. 2579–2591, Aug. 2015.
[39] W. Xue, X. Mou, L. Zhang, A. C. Bovik, and X. Feng, “Blind image quality assessment
using joint statistics of gradient magnitude and Laplacian features,” IEEE Trans. Image Process.,
vol. 23, no. 11, pp. 4850–4862, Nov. 2014.
[40] Q. Li, W. Lin, J. Xu, and Y. Fang, “Blind image quality assessment using statistical
structural and luminance features,” IEEE Trans. Multimedia, vol. 18, no. 12, pp. 2457–2469,
Dec. 2016.
[41] A. Liu, W. Lin, M. Paul, C. Deng, and F. Zhang, “Just noticeable difference for images with
decomposition model for separating edge and textured regions,” IEEE Trans. Circuits Syst.
Video Technol., vol. 20, no. 11, pp. 1648–1652, Nov. 2010.
[42] J. Kim, H. Zeng, D. Ghadiyaram, S. Lee, L. Zhang, and A. C. Bovik, “Deep convolutional
neural models for picture-quality prediction: Challenges and solutions to data-driven image
quality assessment,” IEEE Signal Process. Mag., vol. 34, no. 6, pp. 130–141, 2017.
[43] J. Kim and S. Lee, “Deep learning of human visual sensitivity in image quality assessment
framework,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 1676–1684.