
A

Major Project Report


on

Deep CNN-Based Blind Image Quality Predictor

Submitted in partial fulfillment of the requirements

for the award of the degree of

Bachelor of Technology
in
Electronics and Communication Engineering
By

K. SUNDHAR (16RJ1A0453)
Under the esteemed Guidance of

Mr. Y. VENKATESWARA REDDY, (Ph.D)


Associate Professor,
ECE Department,
Malla Reddy Institute of Technology.

Department of Electronics and Communication Engineering


MALLA REDDY INSTITUTE OF
TECHNOLOGY
(Sponsored by Malla Reddy Educational Society)
Approved by AICTE & Affiliated to JNTU Hyderabad
Maisammaguda, Dhulapally (Via Kompally), Secunderabad - 500 100
MALLA REDDY INSTITUTE OF TECHNOLOGY
(Sponsored by Malla Reddy Educational Society)

Approved by AICTE & Affiliated to JNTUH, Hyderabad.


Maisammaguda, Dhulapally (Post via Kompally), Secunderabad– 500100
E-Mail: principalmrit@[Link], Ph: +91-9346118805, 7207052040, 8121021991

DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING

CERTIFICATE

This is to certify that the major project report entitled Deep CNN-Based Blind Image Quality
Predictor being submitted by

K. SUNDHAR – 16RJ1A0453
in partial fulfillment of the requirements for the award of the Degree of Bachelor of Technology
in Electronics and Communication Engineering to the Jawaharlal Nehru Technological
University, Hyderabad, is a record of bonafide work carried out under my guidance and
supervision.
The results embodied in this project report have not been submitted to any other
University or Institute for the award of any Degree.

Internal Guide                                                      Head of the Department

Mr. Y. Venkateswara Reddy, (Ph.D)                      C. LaxmiKanth Reddy, (Ph.D)

Associate Professor

Internal Examiner External Examiner


DECLARATION

I, K. Sundhar, hereby declare that this project report entitled Deep CNN-Based Blind
Image Quality Predictor is my bonafide work, carried out under the supervision of
Mr. Y. Venkateswara Reddy, (Ph.D). I declare that, to the best of my knowledge, the work
reported herein does not form part of any other project report or dissertation on the basis of
which a degree or award was conferred on an earlier occasion on any other candidate. The
content of this report has not been presented by any other student to this or any other
University for the award of a degree.

Signature:

K. SUNDHAR

Date:

ACKNOWLEDGEMENT

I take this opportunity to express my deepest gratitude and appreciation to all those who
have helped me directly or indirectly towards the successful completion of this project.

I express my deep sense of gratitude and thanks to Dr. K. Srinivasa Rao, B.E., M.E., Ph.D. (ECE),
Principal, Malla Reddy Institute of Technology, for giving me this opportunity to carry out the
project work at a highly esteemed organization.

I express my gratitude to C. LaxmiKanth Reddy, (Ph.D), HOD, Department of ECE, for his
constant co-operation, support and for providing the necessary facilities throughout the
B.Tech program.

I would like to warmly thank my guide, Mr. Y. Venkateswara Reddy, (Ph.D), for his
guidance and support.

Next, I would like to express my gratitude to my parents, friends and all the faculty of our
department for their cooperation and keen interest throughout this project.

I would also like to thank all the teaching and non-teaching faculty members of the
Electronics and Communication branch.

CONTENTS
List of Figures v
Abstract vi

S. No Chapter Page Number

1. Introduction 01
2. Literature Review 03
2.1 Review of Full Reference Image Quality Assessment (FR-IQA) 03
2.1.1 Limitations of FR-IQA Models 08
2.2 Review of Reduced Reference Image Quality Assessment (RRIQA) 08
2.2.1 Limitations of RR-IQA 11
2.3 Review of No Reference Image Quality Assessment (NR-IQA) 12
2.3.1 Distortion Specific NR-IQA 12
2.3.2 Generalized NR-IQA 16
2.4 Research Gaps 18
3. Software Introduction 20
3.1. Introduction to MATLAB 20
3.2 The MATLAB system 21
3.3 GRAPHICAL USER INTERFACE (GUI) 22
3.4 Getting Started 24
3.4.1 Introduction 24
3.4.2 Development Environment 24
3.4.3 Manipulating Matrices 24
3.4.4 Graphics 24
3.4.5 Programming with MATLAB 24
3.5 Development environment 25
3.5.1 Introduction 25
3.5.2 Starting MATLAB 25
3.5.3 Quitting MATLAB 25
3.5.4 MATLAB Desktop 25
3.5.5 Desktop Tools 26
3.6 MANIPULATING MATRICES 29

3.6.1 Entering Matrices 29
3.6.2 Expressions 30
3.6.3 Operators 32
3.6.4 Functions 32
3.7 GUI 33
3.7.1 Creating GUIs with GUIDE 34
3.7.2 GUI Development Environment 34
3.7.3 Features of the GUIDE-Generated Application M-File 35
3.7.4 Beginning the Implementation Process 36
3.7.5 User Interface Control 37
4. Proposed Method 48
4.1 Proposed Framework 48
4.2 RELATED WORK 49
4.3 DEEP IMAGE QUALITY ASSESSMENT PREDICTOR 49
4.3.1 Model Architecture 50
4.3.2 Image Normalization 51
4.3.3 Reliability Map Prediction 52
4.3.4 Learning Objective Error Map 54
4.3.5 Learning Subjective Opinion 54
4.3.6 Training 55
4.3.7 Patch-Based Training 55
5. Simulation Results 56
[Link] 60
[Link] 63
References 64

LIST OF FIGURES

TITLE PAGE NUMBER

Fig. 1.1 Overall flowchart of DIQA 2


Fig 3.3 Graphical User Interface window 23
Fig. 3.7.2 Graphical user blocks 35
Fig. 4.3.1 Architecture of the objective pixel error map prediction sub network 51
Fig 4.3.3 Examples of estimated reliability maps 52
Fig. [Link] Histograms of error maps with different values of p 53
Fig. 5.1 Simulation Results 59

ABSTRACT
Deep CNN-Based Blind Image Quality Predictor
Image recognition based on convolutional neural networks (CNNs) has recently been shown to
deliver state-of-the-art performance in various areas of computer vision and image processing.
Nevertheless, applying a deep CNN to no-reference image quality assessment (NR-IQA) remains
a challenging task due to a critical obstacle, i.e., the lack of a large training database. In this work,
we propose a CNN-based NR-IQA framework that can effectively solve this problem. The
proposed method, the deep image quality assessor (DIQA), separates the training of NR-IQA
into two stages: 1) an objective distortion part and 2) a human visual system-related part. In the
first stage, the CNN learns to predict the objective error map, and then the model learns to predict
the subjective score in the second stage. To complement the inaccuracy of the objective error map
prediction on homogeneous regions, we also propose a reliability map. Two simple handcrafted
features are additionally employed to further enhance the accuracy. In addition, we propose a way
to visualize perceptual error maps to analyze what was learned by the deep CNN model. In the
experiments, DIQA yielded state-of-the-art accuracy on various databases.

Index Terms— Convolutional neural network (CNN), deep learning, image quality assessment
(IQA), no-reference IQA (NR-IQA).

Chapter 1

INTRODUCTION
The goal of image quality assessment (IQA) is to predict the perceptual quality of digital
images in a quantitative manner. Digital images are inevitably degraded in the process from
content generation to consumption. The acquisition, transmission, storage, post-processing, or
compression of images introduces various distortions, such as Gaussian white noise, Gaussian
blur (GB), or blocking artifacts. A reliable IQA algorithm can help quantify the quality of images
obtained blindly from the Internet and accurately assess the performance of image processing
algorithms, such as image compression and super-resolution, from the perspective of a human
observer. IQA is generally classified into three categories, depending on whether a reference
image (the pristine version of an image) is available: full-reference IQA (FR-IQA), reduced-
reference IQA (RR-IQA), and no-reference IQA (NR-IQA). In general, the accuracy of these
techniques decreases in the order FR-IQA, RR-IQA, NR-IQA. However, since reference images
are not accessible in many practical scenarios, NR-IQA, being the most general method, is often
the most appropriate.

The bit rate of computer networks has continued to increase in recent years and has
enabled the provision of high-quality entertainment to end users who do not have reference
images; hence, significant research efforts have been made to enhance the accuracy of NR-IQA
from the perspective of the end user. Many recently proposed NR-IQA algorithms involve the
use of machine learning, such as support vector machines (SVMs) and neural networks (NNs), to
blindly predict image quality scores. Research has shown that the accuracy of NR-IQA depends
heavily on designing elaborate features. Natural scene statistics (NSS) [1], [2] is one of the most
successful feature families, built on the assumption that natural images have a statistical
regularity that is altered when distortions are introduced. Owing to the difficulty of obtaining
reliable features, progress in NR-IQA beyond NSS has been slow. Deep learning has lately been
adopted in a few NR-IQA studies as an alternative to conventional approaches based on NSS
[3], [4]. However, most such studies have continued to use handcrafted features, and deep
models, such as deep belief networks (DBNs) and stacked autoencoders, have been used in place
of conventional regression machines.

A. Problems of Applying CNNs to NR-IQA

Convolutional neural networks (CNNs) form the most popular deep learning model
nowadays owing to their strong representation capability and impressive performance. CNNs
have been successfully applied to various computer vision and image processing problems. The
performance of deep neural networks depends heavily on the amount of training data. However,
the currently available IQA databases are much smaller than the typical computer vision data
sets used for deep learning. For example, the LIVE IQA database [5] contains 174–233 images
for each distortion type, while a widely used data set for image recognition contains more than
1.2 million labeled images [6]. Moreover, obtaining large-scale, reliable human subjective labels
is very difficult. Unlike classification labels, constructing an IQA database requires a complex
and time-consuming psychometric experiment. To expand the training data set, one can use data
augmentation techniques such as rotation, cropping, and horizontal reflection. Unfortunately, any
transformation of an image would affect its perceptual quality score. Moreover, the perceptual
process of the human visual system (HVS) includes multiple complex processes, which makes

Fig. 1.1 Overall flowchart of DIQA. The training process consists of two stages: regression
onto objective error maps and regression onto subjective scores. The squares with red
“train” (blue “train”) indicate that the subnetwork will be trained in the first (second)
stage.

training a deep model with a limited data set even harder. For example, the visual sensitivity of
the HVS varies according to the spatial frequency of stimuli [7], [8], and the presence of texture
hinders the perception of other spatially coincident image changes [9]. In addition, the perceived
signals go through band-pass, multi-scale, and directional decompositions in the visual cortex
[10]. Such complex behaviors need to be embedded in a data set with human subjective labels.
However, it is difficult to claim that a small data set can represent general visual stimuli, which
results in an overfitting problem.

Chapter 2
LITERATURE REVIEW

The literature shows that many researchers have contributed to the development of image quality
assessment metrics in the last decade. Articles published in journals, conference proceedings, and
books were referred to for this literature survey. Literature covering all relevant parameters was
studied in order to understand their effects and the possibility of extending them to no-reference
color image quality assessment. The models developed by researchers can be classified into three
main categories: Full Reference Image Quality Assessment (FR-IQA) metrics, Reduced Reference
Image Quality Assessment (RR-IQA) metrics, and No Reference Image Quality Assessment
(NR-IQA) metrics. The literature further divides the models of each category into two groups:
image quality measures applicable to grayscale images and image quality measures applicable to
color images.

2.1 Review of Full Reference Image Quality Assessment (FR-IQA)

The development of IQA models started with FR image quality metrics.
Conventionally, image fidelity metrics such as Mean Square Error (MSE) and Peak Signal-to-Noise
Ratio (PSNR) were used to evaluate the quality of images [126]. Though simple, they show certain
limitations. They focus on global errors and ignore local errors. In contrast to the foveated-vision
property of the HVS, they operate on a pixel-by-pixel basis. The spatial relationship among pixels
is an important characteristic perceived by the human eye, yet reordering the pixels does not
change the distortion measurement in the case of these metrics. Therefore, these conventional
metrics fail to emulate the human visual system. Research in the field of image quality assessment
truly accelerated after the development of the mean structural similarity metric [72]. It attempts to
measure image quality based on the preservation of image structures, instead of calculating a
pixel-wise difference. The image quality score is computed from structure comparison, luminance
comparison, and contrast comparison. The study reveals that most FR-IQA metrics are based on
structural similarity. The Mean Structural Similarity Index Metric (M-SSIM) developed by Z.
Wang et al [71], [72], [73] attracted the attention of the entire IQA research community. The
M-SSIM metric is based on the hypothesis that the human eye extracts structural information
from any image. Luminance comparison, structure comparison, and contrast comparison between
the original image and the distorted image are performed using the mean, variance, and covariance
of the images, and combined as SSIM. A block-wise quality score of the image is computed, and
the average of the block-wise SSIM values, called M-SSIM, is the final quality score. This metric
is based on a similarity measure and quantifies any variation between the reference image and the
degraded image. The metric performed much better than conventional image fidelity measures on
image databases comprising different distortions. It is well known that the HVS is attracted to
different image textures to different degrees. The authors have therefore suggested a modification
to the metric: a spatially variant weighted average of the SSIM index map can improve the HVS
consistency of this approach.
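The block-wise computation described above can be summarized in a short MATLAB sketch.
The 8×8 block size and the stabilizing constants C1 and C2 are assumptions for illustration; the
luminance, contrast, and structure terms are folded into the usual two-factor SSIM expression.

% Minimal block-wise SSIM sketch; ref and dst are same-size grayscale
% images in double precision on a 0-255 scale.
function mssim = blockwise_ssim(ref, dst)
    C1 = (0.01*255)^2;  C2 = (0.03*255)^2;   % stabilizing constants (assumed)
    B = 8;                                   % assumed block size
    [H, W] = size(ref);
    vals = [];
    for r = 1:B:H-B+1
        for c = 1:B:W-B+1
            x = ref(r:r+B-1, c:c+B-1);   y = dst(r:r+B-1, c:c+B-1);
            mx = mean(x(:));   my = mean(y(:));
            vx = var(x(:));    vy = var(y(:));
            cxy = mean((x(:)-mx).*(y(:)-my));        % covariance of the block pair
            % luminance, contrast and structure terms combined as SSIM
            vals(end+1) = ((2*mx*my + C1)*(2*cxy + C2)) / ...
                          ((mx^2 + my^2 + C1)*(vx + vy + C2)); %#ok<AGROW>
        end
    end
    mssim = mean(vals);                      % average of block-wise SSIM = M-SSIM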

The literature shows that researchers suggested many further improvements to this model to
improve its performance. The perceivability of details in an image depends on the density of
samples in the image, the distance between the observer and the image plane, and the perceiving
capacity of the observer's visual system. A single-scale method of quality assessment is
appropriate only for specific settings. In order to provide more flexibility with respect to
variations in image resolution, display resolution, and viewing distance, a multi-scale M-SSIM
model was proposed by Z. Wang et al [74]. This approach attempts to incorporate structural
details at different resolutions. Iterative decomposition of the images is performed at different
scales using low-pass filtering followed by downsampling. Structure and contrast comparison is
done at every stage, but luminance comparison is done only at the last stage. The final quality
score is computed from the quality measures obtained at the different scales. A distorted image
often looks more similar to the distortion-free image if quality evaluation is done at a larger
scale, and the model exhibits higher HVS consistency as more scales are included. However, this
method suffers from certain limitations. It was tested only on the JPEG and JPEG2000 image
datasets in the LIVE database, and the metric is not effective on blurred and noisy images. The
authors have suggested incorporating a more systematic approach so that it can be applicable to a
broad range of applications.

The M-SSIM metric fails to predict the quality of images distorted by blur and noise. B. Liao et al
[18] proposed a dual-scale structural similarity metric to improve the performance of M-SSIM for
images distorted by blur and noise. In the proposed measure, the first scale describes the coarse
contours of the objects, called the macro-edge image, and the second scale describes the subtle
edges, called micro edges, which reflect the detailed edges of the objects. Subtracting the edge
image from the filtered image gives the macro-edge image, and the difference between this
macro-edge image and the original image is used as the micro-edge image. The macro-edge
similarity and micro-edge similarity between the original image and the distorted image are
computed to generate the final quality score. The metric performs especially well for blur and
noise distortions.

Although the M-SSIM metric evaluates image quality accurately, it suffers from the
limitation of being sensitive to geometrical distortions. In order to increase the immunity of the
metric to such non-structural distortions, C. L. Wang et al [23] extended M-SSIM to the wavelet
domain. A multilevel discrete wavelet transform is used to calculate the wavelet coefficients of
the reference image and of the degraded image. The LH, HL, and HH bands of the same
decomposition level are combined to form one band each, for five levels. These five bands and
the lowest subband (LL) give six bands in total. DWT-SSIM is computed as a similarity measure
for each band, and the weighted mean of the DWT-SSIM values gives the final quality score. As
the human eye is highly sensitive to the mid-frequency band, it is assigned a greater weight than
the other bands. The authors concluded that this metric outperforms M-SSIM. Evaluation of the
metric is done using the LIVE image database after applying non-linear regression. M-SSIM in
the spatial domain performs poorly for Gaussian-blurred images, whereas the metric in the
wavelet domain shows improved performance for Gaussian-blurred images.
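A rough MATLAB outline of the idea, assuming the Wavelet Toolbox is available, is given
below. The 'db1' wavelet, the band weights, and the simple correlation-style band similarity are
illustrative assumptions rather than the published choices.

% DWT-domain similarity sketch: combine LH/HL/HH per level, compare per band,
% and take a weighted mean over the detail levels plus the LL subband.
function q = dwt_ssim_sketch(ref, dst, nLevels)
    [Cr, Sr] = wavedec2(ref, nLevels, 'db1');
    [Cd, Sd] = wavedec2(dst, nLevels, 'db1');
    w = ones(1, nLevels + 1);                 % assumed band weights
    q = 0;
    for lev = 1:nLevels
        br = [detcoef2('h', Cr, Sr, lev), detcoef2('v', Cr, Sr, lev), detcoef2('d', Cr, Sr, lev)];
        bd = [detcoef2('h', Cd, Sd, lev), detcoef2('v', Cd, Sd, lev), detcoef2('d', Cd, Sd, lev)];
        q = q + w(lev) * band_sim(br, bd);
    end
    ar = appcoef2(Cr, Sr, 'db1', nLevels);    % lowest (LL) subband
    ad = appcoef2(Cd, Sd, 'db1', nLevels);
    q = (q + w(end) * band_sim(ar, ad)) / sum(w);
end
function s = band_sim(x, y)
    C = 1e-3;                                 % small stabilizing constant (assumed)
    s = (2*mean(x(:).*y(:)) + C) / (mean(x(:).^2) + mean(y(:).^2) + C);
end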

B. Wang et al [19] proposed an HVS-based SSIM metric built on the frequency and spatial
characteristics of the human eye. It is based on the hypothesis that the human eye does not pay
equal attention to all regions of the image. A frequency sensitivity weight is calculated using DCT
coefficients, and, to mimic the foveated vision of the human eye, a spatial effect weight is
calculated in the spatial domain. These weights are used in the calculation of M-SSIM. This
metric gives HVS-consistent results, especially for badly blurred images. The content-partitioned
structural similarity metric proposed by C. Li et al [75] is based on a similar hypothesis, where the
foveated vision of the human eye is taken into consideration to increase prediction accuracy. The
metrics discussed above [23], [19] are computationally complex because they make use of
transforms.

W. Xue et al [81] proposed an innovative approach in their gradient magnitude similarity
metric. It is based on the concept that the image gradient is affected by image distortions, and
different local structures in a distorted image suffer by different amounts. The metric calculates
the pixel-wise similarity between the gradient magnitude maps of the reference and distorted
images, and pools the resulting local quality map into an overall image quality prediction. The
metric predicts perceptual image quality accurately and efficiently. J. Zhao et al [82] proposed
FR-IQA using regional weights. The authors argue that gradient similarity reflects image quality.
The metric assigns different weights to different regions according to the sensitivity of the human
visual system: a regional weight map is computed in the gradient domain, and the weighted
gradient metric gives a single overall quality score. Experimental results show that the metric
outperforms the M-SSIM metric.
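The gradient-magnitude similarity idea is compact enough to sketch directly in MATLAB. The
Prewitt kernels, the stabilizing constant, and the simple average pooling are assumptions made for
illustration; other gradient operators and pooling strategies (for example, standard deviation
pooling) are equally possible.

% Gradient-magnitude similarity sketch: per-pixel map, then average pooling.
function [score, gmsMap] = gms_sketch(ref, dst)
    k1 = fspecial('prewitt');                 % one Prewitt kernel
    k2 = k1';                                 % its orthogonal counterpart
    c  = 170;                                 % stabilizing constant (assumed)
    gr = hypot(imfilter(ref, k1, 'replicate'), imfilter(ref, k2, 'replicate'));
    gd = hypot(imfilter(dst, k1, 'replicate'), imfilter(dst, k2, 'replicate'));
    gmsMap = (2*gr.*gd + c) ./ (gr.^2 + gd.^2 + c);   % local quality map
    score  = mean(gmsMap(:));                 % pooled overall score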

All these derivatives of M-SSIM are mainly intended to evaluate the quality of grayscale
images. However, an image also suffers from color deviation due to the introduced distortion,
which must be considered [121]. A. Ninnasi et al [2] proposed a DFT/DWT-based SSIM for
full-reference assessment of color images using color space conversion and subband
decomposition. A perceptual subband decomposition is used to mimic the multi-channel behavior
of the HVS, and contrast sensitivity and masking functions are used to make the metric HVS
consistent. The authors concluded that the DFT performs better than the DWT. They also stated
that the method overestimates the masking effect in regions of strong edges with high contrast.

All the above image quality metrics exhibit good performance. However, they were
developed for the evaluation of grayscale images only, and are applied to color images by using
only the intensity plane. It has been observed that the introduction of distortion causes deviation
not only in luminance information but also in chrominance information, which needs to be
considered while evaluating color image quality. To cater to this need, G. Fahmi et al [78]
proposed a novel idea in the SCIELAB metric based on color difference to measure the quality of
JPEG-compressed color images. The original image and the degraded image are transformed into
the CIEXYZ color space, filtered in the spatial domain, and then transformed into the CIELAB
color space, where histogram error features are calculated. All images are classified as smooth or
textured, and a classifier is trained using the histogram error feature vector. It has been observed
that the classifier has high prediction accuracy for JPEG images, but the metric performs poorly
for textured images; the authors state that the human eye cannot easily notice distortion in texture.
Therefore, further improvement of the metric is needed. V. Tsagaris et al [63] also proposed a
color image quality metric using the CIELAB color space for the evaluation of color information.
The relative entropy divergence between the corresponding image planes is calculated using the
Kullback-Leibler divergence formula. This method is useful for evaluating color-related
information in different regions of the image and in pseudo-color procedures. A. Kolaman et al
[1] combined the structural similarity metric with a quaternion representation of color images in
their Video Quality Metric (VQM). The quaternion matrix has three imaginary planes, which are
derived from the R, G, and B planes of a color image. Using subjective tests, the authors showed
that distortions such as color crosstalk degrade the perceived image quality and that blur has a
stronger effect on perceived quality. They concluded that the metric is able to measure the
combined degradation of blur and saturation.

H. W. Chang et al [79] proposed an excellent full-reference quality metric using a sparse
correlation coefficient. This model is based on a bottom-up approach and simulates the receptive
fields of simple cells found in the primary visual cortex. Sparse coding is used to correlate the test
image with the original image: a sparse correlation coefficient is calculated to capture the
correlation between the two sets of outputs obtained from a sparse model of simple-cell receptive
fields. A fixed-point Independent Component Analysis (ICA) algorithm is used to obtain the
sparse codes of the image. This metric correlates well with the human visual system; however,
the use of a bottom-up approach makes it complex. X. Cui et al [80] made use of an artificial
neural network. The model extracts blocking and blurriness features from the gradient image and
uses them to train a neural network for predicting quality. The model applies a nonlinear
correction to the prediction to increase the accuracy for JPEG, JPEG2000, and fast-fading images.
However, this metric is restricted to only a few distortions.

A. Shnayderman et al [14], [84] proposed a new SVD-based multidimensional image
quality measure. The singular values of an image represent the relationship among the pixels of
the underlying matrix, so they are used to measure the dissimilarity between images. The original
image and the test image are divided into small blocks and the SVD is applied to each block. An
error value between the corresponding blocks of the two images is calculated, and these error
scores are combined to predict the overall image quality. To minimize the computational burden,
a block size of 8×8 is preferred. Distortion maps are presented for six types of distortions, each
with five levels, and are used as a graphical measure. The authors concluded that the metric
outperforms the M-SSIM metric. They extended the same metric to color images by using the
YIQ color space [15], where it outperforms the M-SSIM and PSNR metrics. S. Wang et al [59]
used singular value decomposition as a tool to separate the content-dependent and
content-independent components in an image; a specific assessment model is used for each
component according to its distortion properties. Gradient and contrast similarity between the
reference and test images is computed for the content-dependent part, and PSNR is computed for
the content-independent part. The results are combined using a nonlinear equation to obtain the
perceptual quality score. The metric gives HVS-consistent results for the majority of distortion
types available in the TID image database.
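A minimal MATLAB sketch of the block-wise SVD comparison described above is given below.
The 8×8 block size follows the text; the Euclidean distance between singular-value vectors and
the median-deviation pooling are assumptions made for illustration.

% Block-wise SVD distortion sketch for two same-size grayscale images.
function q = block_svd_sketch(ref, dst)
    B = 8;  [H, W] = size(ref);  d = [];
    for r = 1:B:H-B+1
        for c = 1:B:W-B+1
            sr = svd(ref(r:r+B-1, c:c+B-1));       % singular values of the reference block
            sd = svd(dst(r:r+B-1, c:c+B-1));       % singular values of the test block
            d(end+1) = norm(sr - sd); %#ok<AGROW>  % per-block distortion
        end
    end
    q = mean(abs(d - median(d)));                  % pooled score: deviation from the mid value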

M. Narwaria et al [46] proposed an excellent metric based on SVD combined with machine
learning: image features based on the SVD are extracted and machine learning is applied to pool
those features into a quality prediction. F. Zhang et al [28] proposed a full-reference metric for
video quality measurement that treats luminance information and chrominance information
simultaneously in a quaternion representation. The luminance information of a color image is
encoded as the real part and the chrominance information as the imaginary part. Quality is
predicted by computing a block measure, a frame measure, and a video measure. The authors
state that, in the future, a quaternion representation that contains structural information could be
developed. References [8], [28], [48], [55], [59] and [62] also show similar attempts by
researchers to use the SVD as a tool to measure image quality. All these metrics are mostly
developed to assess the quality of grayscale images only, and they are applicable to full-reference
image quality assessment. The singular values are a unique property of an image, so research can
be extended to study their usefulness in no-reference image quality assessment or in the
evaluation of color images.

Y. Wang et al [69] proposed an effective assessment method using a quaternion representation of
the structural information in color images. The local variance is calculated in the luminance plane
and encoded as the real part, while the R, G, and B channels of the color image form the three
imaginary parts of the quaternion. Quaternion matrices are obtained from the original image and
the distorted image, and the angle formed by the singular-value feature vectors of the two
quaternion matrices is used to calculate the amount of structural similarity. Images whose size
differs from that of the original image can also be assessed by this method. Magnitude and phase
represent significant information about an image in the frequency domain; phase carries useful
information about important features such as edges and contours. M. Narwaria et al [61] explored
a scalable image quality measure based on the Fourier transform. The proposed method considers
the different sensitivity of the human eye to different frequency components: the HVS is not
sensitive to higher-frequency components, but it can easily detect distortion in lower-frequency
components. Therefore, binning of the components associated with high frequencies is used for
dimensionality reduction. A regression method is used to combine the quality scores calculated
from the change in phase and the change in magnitude. The metric also uses scalability in order
to reduce the amount of reference information required. It performs well on publicly available
databases. The authors have suggested that this metric can be extended to video quality
assessment and also to reduced-reference image quality assessment.

2.1.1 Limitations of FR-IQA Models

All these metrics exhibit good performance. However, they work only if the original
image is available at the time of assessing the degraded image. Today, in this era of multimedia
and the Internet, the original image cannot be made available at the time of assessment in most
applications. Especially in the field of video communication, where bandwidth is a great
constraint, it is not feasible. This situation demands no-reference image quality assessment,
which is a challenging task. Reduced-reference IQA is a good compromise between FR and NR
image quality assessment.

2.2 Review of Reduced Reference Image Quality Assessment (RRIQA)

The framework of RR-IQA makes use of partial information carried by features extracted
from the original image. This feature information is transmitted along with the image as side
information [42], [87]. At the receiver side, similar features are extracted from the received,
distorted version of the image, and image quality is predicted by comparing these features with
the features of the original image. Such features must provide a brief summary of the original
image and should be able to reflect different types of distortions.

Z. Wang et al [86] presented an application of RR-IQA in quality-aware images.
Features are extracted from the high-quality original image and embedded in the image before
transmission; this embedded feature information is not visible in the picture. At the receiver end,
the hidden features are extracted and decoded, similar features are extracted from the degraded
image, and image quality is predicted by comparing the two feature sets. The authors use features
based on natural image statistics [4], [6], [114]. A quantization-watermarking-based data-hiding
technique is used to embed the features in the image, and a key is shared with the receiver in
order to decode these features. The histogram of the wavelet subband coefficients of the original
image follows a Laplacian distribution, and this distribution deviates from its Laplacian nature
when distortion is introduced; therefore, the histogram of wavelet subband coefficients is used as
an NSS feature. The KL divergence formula is used at the receiver to measure the dissimilarity
between the two distribution functions. The authors concluded that, if error-correcting codes are
introduced into the image before transmission, the original image can also be restored to some
extent from the distorted image using the proposed model.
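The core NSS comparison used here, fitting a Laplacian to a wavelet-subband histogram and
measuring the deviation of the observed histogram from it with the KL divergence, can be
sketched in a few lines of MATLAB (Wavelet Toolbox assumed). The subband choice, bin count,
and scale estimate are illustrative assumptions.

% KL divergence between a subband histogram and its fitted Laplacian model.
function d = subband_kld_sketch(img)
    [C, S] = wavedec2(double(img), 1, 'db1');
    h = detcoef2('h', C, S, 1);  h = h(:);            % one detail subband
    edges   = linspace(min(h), max(h), 65);
    centers = (edges(1:end-1) + edges(2:end)) / 2;
    p = histcounts(h, edges, 'Normalization', 'probability');  % observed histogram
    b = mean(abs(h - median(h)));                     % Laplacian scale estimate
    q = exp(-abs(centers - median(h)) / b);           % model density at bin centers
    q = q / sum(q);
    m = p > 0 & q > 0;
    d = sum(p(m) .* log(p(m) ./ q(m)));               % KL divergence D(p || q)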

G. Cheng et al [87] proposed RR-IQA based on natural scene statistics. It is built on the
hypothesis that natural images occupy a tiny subspace of all possible images and follow very
specific distributions in the vertical and horizontal gradient domains, which can be modeled by a
Laplacian distribution function. The introduction of distortion in an image causes a proportional
deviation in the distribution. The deviation between the model and the actual distribution of the
image is calculated with the Kullback-Leibler (KL) divergence formula. The feature vector
consists of the variance and kurtosis of the model distribution along with the computed value of
the KL divergence. At the receiver end, a similar procedure is carried out to extract features from
the received image, and image quality is evaluated using the received and extracted features. The
correlation between the subjective and objective quality scores is greater than 90% for different
distortions, and the experimental results show that the method is more effective than the popular
PSNR metric.

The reduced-reference perceptual quality metric proposed by T. Kusuma et al [60], [88]
employs several image quality measures, and a weighted mean of the respective quality scores is
used as a hybrid image quality measure. The metric operates on five measures covering blur,
image activity, blocking, and intensity-masking detection. As in the NR approach, the blocking
algorithm is based on extracting the average difference computed across the boundaries of
blocks, followed by calculation of the zero-crossing rate; the average of the absolute differences
between image samples in a block is used to calculate this rate, and machine learning is used to
predict the final blocking measure. Blur is calculated by measuring the smoothing effect. As an
increase in image activity indicates introduced distortion, edge activity and gradient-based
activity are also measured. Intensity-masking detection is performed using the standard deviation
of the first-order histogram of the image. The weighted sum of the above image quality measures
is used as the final quality score. This metric quantifies perceived image quality more accurately
than PSNR and can be employed for in-service quality measurement.

Y. Ming et al [89] proposed an effective reduced-reference IQA model for color images.
Color feature information is insensitive to the geometrical distortions introduced during
transmission; hence, spatial-domain color difference analysis is done in the hue and saturation
planes, while frequency-domain distortion analysis is conducted in the intensity plane using the
contourlet transform. The feature vector includes chrominance information, a visual threshold,
and the proportion of visually sensitive coefficients. This information is transmitted through an
auxiliary channel from the transmitter to the receiver and is employed in a weighted evaluation of
image quality at the receiver. The authors suggest a better mode of feature extraction as a future
direction. The correlation between the subjective and objective quality scores is 94%; however,
the performance evaluation is done using only JPEG and JPEG2000 images.

A. Rehman et al [90] proposed RR-IQA based on structural similarity estimation.
A multi-scale, multi-orientation divisive normalization transform is used to extract local
statistical and spatial features; the structural similarity index gives an efficient description of the
image. The use of a distributed source coding technique reduces the bandwidth requirement of
the RR features. It is followed by regression using a discretization method, which helps to
normalize the measure across different types of image distortions. Wavelet coefficients represent
the orientation of localized structures in the spatial and frequency domains. A Divisive
Normalization Transform (DNT) is applied to the wavelet coefficients of the image: each
coefficient is divided by the local energy calculated from its neighbouring coefficients. Any
distortion introduced in the image changes the statistics of these coefficients, so the subband
distortion of the distorted image is measured by calculating the KL divergence between the
original and degraded images. This method needs a low data rate and can therefore be employed
in quality monitoring and image repair tasks. Z. Wang et al [91] also proposed the use of DNT
coefficients in a feature vector for RR-IQA, as they reflect the distortion in the image; the authors
state that the metric correlates well with the human visual system.

J. A. Redi et al [92] proposed an RR image quality metric using color distribution
information. Features based on the color correlogram are used to analyze the deviations in the
color distribution of an image due to distortion. The color distribution describes both luminance
and hue information and needs a low data rate. The metric operates in two stages. The first stage
identifies the type of distortion affecting the received signal; the proposed metric detects JPEG,
JPEG2000, blur, and noise distortions. The second stage uses dedicated models to compute image
quality for each type of distortion. Owing to the use of dedicated distortion-specific models in the
second stage, the metric becomes a general-purpose image quality metric with increased
precision. Each predictor is a neural regression machine trained to assess quality, so the most
suitable predictor is automatically used for each specific distortion. The authors suggest that
additional existing metrics and new distortions can be included in the model to increase
prediction accuracy, so that the system becomes effective for real-life applications.

L. Ma et al [93] proposed a novel idea for RR-IQA in the reorganized DCT (RDCT) domain.
This metric extracts intra-subband and inter-subband statistical characteristics of an image in the
RDCT domain. An image shows statistical dependencies between DCT subbands, and the
reorganized DCT coefficients within the same subband exhibit an identical coefficient
distribution, which can be modeled by a GGD model. The city-block distance is then used to
measure the deviation of the actual distribution from the fitted curve. The mutual information
between pairs of DCT coefficients in the respective RDCT subbands is calculated to capture the
inter-subband error, and a frequency ratio descriptor (FRD) is computed to measure the
distribution of energy among different frequency components; the FRD can be used to simulate
the texture-masking property of the HVS. A metric is developed by combining the above three
values. This metric is consistent with the human visual system for all distortion types,
outperforms FR image quality metrics such as PSNR and SSIM, and uses a low data rate. The
correlation between the subjective and objective quality scores is above 90% for all types of
distortions except the JPEG2000 image dataset. The authors suggest extending the metric to
video quality assessment as a future direction. X. Li et al [111] also developed an NSS-based
RR-IQA using hybrid wavelets and directional filter banks (HWD), which uses the subband
coefficients of the distorted image and the original image.

Divisive normalization has also been proposed to model the perceptual sensitivity of biological
vision for RR-IQA. It provides a useful image representation that improves statistical
independence for natural images. Linear decomposition of the image is done using a wavelet
transform, as it provides a localized representation of images in space, frequency, and
orientation. A Divisive Normalization Transformation (DNT) is then computed to reduce the
statistical redundancies between wavelet coefficients; the DNT is derived from a Gaussian scale
mixture statistical model of the image wavelet coefficients, and the DNT coefficients are used as
features. The marginal probability distributions of the DNT coefficients of the original image and
the distorted image are compared using the KL divergence. Image quality is evaluated by
comparing the reduced-reference statistical features extracted from the DNT-domain
representations of the reference image and the distorted image. This makes the metric more
robust and gives HVS-consistent results across a wide variety of image distortions. The
correlation between the subjective and objective quality scores is in the range of 85% to 90% for
different distortions. The authors concluded that this model can be further developed to predict
general no-reference image quality.
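The divisive normalization step itself is simple: each coefficient is divided by an energy estimate
computed from its neighbours. A minimal MATLAB sketch is shown below; the 3×3
neighbourhood and the additive constant are assumptions, and a full implementation would derive
the normalization from a Gaussian scale mixture model as described above.

% Divisive normalization sketch applied to one wavelet subband.
function cn = dnt_sketch(subband)
    k = ones(3) / 8;  k(2,2) = 0;                         % 8-connected neighbourhood weights
    localEnergy = sqrt(imfilter(subband.^2, k, 'symmetric') + 0.1);
    cn = subband ./ localEnergy;                          % normalized (DNT-like) coefficients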

R. Soundararajan et al [96] developed an information-theoretic algorithm to predict image
quality. The change in image information due to distortion is measured using the entropies of the
reference image and the degraded image; the entropies of wavelet coefficients are used in the
feature vector. An algorithm is developed to measure the differences between the entropies of the
wavelet coefficients of the reference and distorted images. The average absolute difference
between the scaled entropies of the (neural-noise-modeled) reference and distorted images is used
as the RRED indices in the measurement. The algorithm gives results consistent with the
subjective quality scores provided by the LIVE image quality assessment database. The authors
note the need for a more efficient implementation to reduce the complexity and running time of
the algorithm; the metric can also be extended to video quality assessment at a low bit rate.

G. Cheng et al [30] proposed a natural-scene-statistics-based reduced-reference image
quality metric. Natural images follow a Gaussian distribution in the gradient domain, and the
introduction of an artifact causes a deviation from this property. Hence, features such as kurtosis,
skewness, and a shape parameter are extracted from the pixel distribution in the gradient domain.
The predicted image quality correlates well with the HVS and the model is generic. All these
reduced-reference image quality metrics have proved useful in different communication
applications; using these techniques, it is also possible to develop algorithms for repairing images
to improve their quality. The accuracy of such a metric depends on the amount of information
transmitted through the features, which is restricted by the resources available in the system.

2.2.1 Limitations of RR-IQA Models

The success of the RR-IQA method depends on the amount of feature information
transmitted and the available data rate. If the data rate increases, the assessment technique
approaches FR-IQA, resulting in increased prediction accuracy. However, this demands more
bandwidth, which is not economical given the tremendous growth in the transmission of
audio-visual information over communication channels. In applications where resources are
limited, the RR-IQA method is not useful. For example, in a wireless communication system, the
available frequency spectrum is a major constraint on transmitting the actual information, so a
large amount of feature information cannot be transmitted as side information. In communication
applications, the limitation of a low data rate makes the design of RR-IQA similar to the NR-IQA
technique, where accuracy is low due to the increased difficulty. Moreover, RR-IQA cannot serve
the purpose every time. In applications such as multimedia, the original image is not obtainable
or does not even exist; only the degraded image is available to be evaluated. In such cases, the
design of NR-IQA is the only solution.

2.3 Review of No Reference Image Quality Assessment (NR-IQA)

Although important, NR-IQA is a very difficult task [91]. The development of a blind image
quality assessment metric is the only solution in many applications, where the model must
evaluate image quality without any information about the distortion-free image [97]. It is based
on the hypothesis that humans can easily distinguish between a good-quality image and a
poor-quality image [72]: the human eye-brain system effectively pools the information perceived
from a degraded image to form an opinion about its quality. Earlier NR-IQA models assume that
the distortion introduced in the image is known. Each distortion causes a particular visual effect
in the perception of the image [25], and the NR-IQA algorithm is developed by focusing on the
possible impairments that the distortion would have caused. Fortunately, the type of distortion is
known in most applications; for example, JPEG2000 images suffer from ringing and blurring
artifacts. Such metrics are application specific or distortion specific. The literature shows that the
available methods are mainly focused on grayscale images, which ignores the deviation in the
color components.

2.3.1 Distortion Specific NR-IQA

Block-based transform coding is widely used in many image and video compression
algorithms. It causes a visible blocking artifact in images due to abrupt changes across block
boundaries, which has an unpleasant perceptual effect. Popular FR-IQA techniques such as PSNR
do not correlate well with this distortion. Therefore, many NR-IQA algorithms focus on
predicting image quality by measuring such blockiness. Generally, in a distortion-free image,
most of the energy is distributed among the low-frequency components and the energy associated
with high-frequency components is insignificant. However, when a blocking artifact is
introduced, the block boundaries in the image possess significant energy at high frequencies. The
authors of [98] note that true edges also show a similar property; therefore, mistaking true edges
for blocking artifacts must be intelligently avoided. An algorithm developed by considering only
blocking artifacts may fail in the evaluation of good-quality images, which must be taken care of
in order to increase the prediction accuracy of NR-IQA.

Z. Wang et al [130] proposed an efficient perceptual image quality metric for the assessment
of JPEG-compressed images using the NR approach. The method extracts effective features that
reflect the blocking and blurring artifacts introduced during quantization in the JPEG coder.
Blockiness is measured by estimating the average of the differences calculated across the
boundaries of 8×8 blocks. Blurriness reduces the signal activity; the average of the absolute
differences between neighboring pixels in a block is used to measure the amount of blurriness.
Subjective test results and the extracted features are used to train the model. The results of the
metric correlate highly with the HVS for the two JPEG image databases created. However, when
the images in both databases are combined, it is seen that overtraining the model reduces its
generalization. The design of the model in the spatial domain makes it computationally and
memory efficient, since the method does not need to store the whole image in memory. The
authors state that, in the future, additional features can be used in the training process so that the
metric becomes applicable to the MPEG compression standard as well as other types of
distortions.
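The spatial-domain features described above are straightforward to compute. The sketch below
shows the horizontal blockiness, activity, and zero-crossing-rate features in MATLAB; the
8-pixel block pitch follows the text, while computing only the horizontal direction (the vertical
one is analogous) and feeding the raw features to a separately learned score model are
simplifications.

% Horizontal JPEG NR features: blockiness, activity, zero-crossing rate.
function f = jpeg_nr_features_sketch(img)
    img = double(img);
    dH  = img(:, 2:end) - img(:, 1:end-1);       % horizontal neighbour differences
    bc  = dH(:, 8:8:end);                        % differences that straddle 8x8 block boundaries
    B   = mean(abs(bc(:)));                      % blockiness
    A   = mean(abs(dH(:)));                      % signal activity (drops as blur increases)
    zc  = dH(:, 1:end-1) .* dH(:, 2:end) < 0;    % sign changes between successive differences
    Z   = mean(zc(:));                           % zero-crossing rate
    f   = [B, A, Z];                             % features for the learned quality model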

J. Bagade et al [119] proposed an excellent machine learning approach for the measurement
of blocking artifacts. Block-based spatial frequency measures and activity measures are computed
from the zero-crossing rate and the average of the absolute differences between pixels in a block.
A back-propagation artificial neural network is trained with the extracted features and the
corresponding MOS values in order to measure ringing and blocking artifacts. The results show
that the metric correlates well with human judgment. Further, J. Bagade et al [113] proposed an
NR image quality metric for the evaluation of JPEG images using block-based features such as
brightness, amplitude, contrast, and texture-related parameters; first-order and second-order
statistical features are also extracted in the frequency domain. These features and the
corresponding MOS values are used to train a back-propagation neural network. These metrics
are useful for predicting the quality of grayscale JPEG images only and need generalization in
order to extend them to other distortions. The metrics are also not applicable to color JPEG
images, although in today's world a vast amount of data is transmitted as color JPEG images over
the Internet. It has been shown that JPEG compression causes deviation in the chrominance
information of the image; therefore, color deviation needs to be measured in order to increase the
usefulness of the metric.

S. Li et al [98] also proposed a no-reference image quality metric to measure the strength of
blocking artifacts in JPEG images. The proposed method operates on one block boundary at a
time in order to detect blocking artifacts. Generally, the same pixel relationship is observed along
all block boundaries; when the boundary of a block shows a discontinuity between pixels, the
block is assumed to have blocking distortion and the strength of the artifact is measured for that
block. A histogram in the gradient domain is used to detect the block boundary. The average over
pixels along the horizontal and vertical boundaries is used as the quality score, and nonlinearity is
introduced into the metric using regression. The metric correlates well with the subjective scores
in the LIVE database; accuracy is 97% as long as the blocking artifacts have connectivity with
each other. However, the authors note that the metric is applicable only to JPEG image quality
assessment, and metrics able to detect other degradations need to be explored. Further, S. Suresh
et al [112] developed a modified extreme learning machine classifier to assess JPEG-coded
images. Features sensitive to the HVS, such as edge length, edge amplitude, background
luminance, and background activity, are used for predicting the perceived image quality. In the
future, this metric can be modified for video quality assessment.

JPEG2000 is accepted as a standard for digital film and is recommended for the storage,
transmission, and display of motion pictures. However, JPEG2000 images suffer from blur and
ringing due to distortion of pixels and edges; hence, the distortion in JPEG2000 images must be
measured to determine their usability. Z. M. Parvez Sazzad et al [70] proposed NR-IQA for
JPEG2000 images based on spatial features. Subjective experiment results on a JPEG2000 image
database are used to train and test the model, with parameters based on pixel distortion and edge
distortion used in the feature vector. The mean pixel value of the neighborhood and the absolute
difference between the central pixel and the second-closest neighborhood pixel are computed as
pixel distortion measures. The zero-crossing rate along the horizontal and vertical directions and
histogram measures with and without an edge-preserving filter are used as edge distortion
measures. These measures are combined using particle swarm optimization to produce the final
quality score. The metric is consistent with the HVS and is tested on images from databases that
were not used in training. The authors suggest extending the work to generalize the metric.

J. Zhang et al [83] proposed the use of kurtosis for the assessment of JPEG2000 images. It is
based on the hypothesis that blur distortion is due to the attenuation of higher frequencies in the
image during JPEG2000 compression. The PDF of a blurred image shows a heavier tail in the
DCT domain than that of a natural image; hence, the kurtosis increases as the blur increases. After
calculating the difference with respect to the median value, the average of the local kurtosis
values is used as the quality score. This is a simpler metric, and the difficulty of extracting edges
from a distorted image does not restrict its use. The metric does not require parameter calculation
or an additional training procedure for parameter determination. The authors suggest adding a
ringing metric to the model, as JPEG2000 images also suffer from ringing distortion.
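A compact MATLAB illustration of the kurtosis cue is given below. The 8×8 block partition, the
exclusion of the DC coefficient, and the plain averaging of local kurtosis values are assumptions
made to keep the sketch short; the published method additionally works with differences relative
to the median.

% Pooled local kurtosis of DCT coefficients as a blur cue.
function k = dct_kurtosis_sketch(img)
    B = 8;  [H, W] = size(img);  ks = [];
    for r = 1:B:H-B+1
        for c = 1:B:W-B+1
            d = dct2(img(r:r+B-1, c:c+B-1));
            x = d(2:end);                          % drop the DC coefficient
            x = x(:) - mean(x(:));
            ks(end+1) = mean(x.^4) / (mean(x.^2)^2 + eps); %#ok<AGROW>  % block kurtosis
        end
    end
    k = mean(ks);                                  % higher values indicate more blur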

Blur distortion appears as a loss of spatial detail and reduces edge sharpness [104]. The
causes of this distortion are as follows.

1. During image acquisition, images are blurred due to relative motion between the object
and the camera or due to out-of-focus capturing.

2. During image processing stages, the application of a smoothing filter introduces blur in
the resulting image.

3. During compression, high-frequency components are truncated in the quantization
process, which results in a loss of detail and blur [104].

Traditional no-reference blur methods usually focus on blur introduced by coding artifacts; they
are not useful for assessing general blur. Ringing occurs in an image due to the quantization of
DCT or wavelet coefficients at high frequencies. This effect is seen as oscillations or ripples
around the contours and sharp edges in the spatial-domain image. Very few NR-IQA metrics have
been proposed to measure this distortion [110]. P. Marziliano et al [51], [106] proposed
full-reference and no-reference blur metrics and a full-reference ringing metric with low
computational complexity, based on the analysis of adjacent regions and edges. These metrics
measure the spread of the edges to quantify the amount of blur, while ringing is measured by the
ripples around the edges. Start and end points are calculated in the gradient domain, and blur is
measured as the spread of the edge. After processing the original image, an FR-IQA model is
developed for the ringing artifact, where ringing is measured from the ring width found near
edges or contours. The metric outperforms the widely used PSNR metric and is applicable to
auto-focusing of image capture devices, coding optimization, and network resource management.
The correlation between the subjective and objective scores for the FR and NR blur metrics is
87% and 73%, respectively. The authors suggest extending the FR ringing metric to a metric
without reference; generalization of the metric is also needed.
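The edge-spread measurement at the heart of the NR blur metric can be sketched as follows in
MATLAB (Image Processing Toolbox assumed). Detecting only vertical edges with a Sobel
operator, tracing each row to the nearest local extrema, and averaging all edge widths are
simplifying assumptions.

% Edge-spread blur sketch: mean width of vertical edges in pixels.
function w = edge_width_blur_sketch(img)
    img = double(img);
    E = edge(img, 'sobel', [], 'vertical');        % vertical edge locations
    [rows, cols] = find(E);
    N = size(img, 2);
    widths = zeros(numel(rows), 1);
    for i = 1:numel(rows)
        r = rows(i);  c = min(max(cols(i), 2), N - 1);
        p = img(r, :);
        if p(c+1) < p(c-1), p = -p; end            % treat falling edges like rising ones
        lo = c; while lo > 1 && p(lo-1) <= p(lo), lo = lo - 1; end   % local minimum on the left
        hi = c; while hi < N && p(hi+1) >= p(hi), hi = hi + 1; end   % local maximum on the right
        widths(i) = hi - lo;                       % edge spread in pixels
    end
    w = mean(widths);                              % larger mean width means more blur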

Partially blurred images can have high aesthetic quality; however, blur affects the saturation,
contrast, and other features of images. F. Mavridaki et al [108] therefore proposed a measurement
of partial blur in the frequency domain. The introduction of blur attenuates the high frequencies,
so the metric uses information derived from the power spectrum of the Fourier transform of the
image to estimate the distribution of low and high frequencies. The image is partitioned into nine
patches, followed by the calculation of power spectra for the whole image and the partitions. A
five-bin frequency histogram of these ten spectra forms the feature vector. A Support Vector
Machine (SVM) classifier is trained using the above features to evaluate the partial blur in the
test image dataset. The metric shows promising performance for naturally and artificially blurred
images.

F. Kerouh et al [107] also performed blur measurement in the wavelet domain. Multi-resolution
analysis with the wavelet transform is used to detect blur. Wavelet decomposition can extract
high-frequency components from an image, so this transform is used for edge analysis. The
high-frequency coefficients are combined at each decomposition level to produce an edge map.
In the wavelet domain, a pixel is considered blurred if the difference between the pixel and the
average of its neighbors is smaller than a fixed threshold. Blur measurement is done at different
resolution levels, and finally linear regression is used to predict the image quality. Because this
technique uses edge detection, homogeneous regions in the image are not mistaken for blur,
which is a common drawback of other blur metrics. The use of multi-resolution analysis results
in smaller image sizes and less execution time.

R. Ferzli et al [109] proposed a metric to measure the Just Noticeable Blur (JNB). It uses
probability summation over space to account for the response of the HVS to sharpness at
different contrast levels. The JNB is the minimum perceivable blurriness around an edge:
blurriness around an edge is masked up to a certain threshold, which equals the JNB. The JNB is
determined experimentally by calculating the standard deviation of the Gaussian filter
corresponding to the JNB threshold at a given contrast; the corresponding edge width is called
the JNB width and is used as the blurriness measure. The image is divided into blocks of 64×64
pixels before edge detection and edge width computation. JNB widths are obtained subjectively
using local contrasts in the neighborhood of the edges for the blocks containing a large number
of edge pixels. A comparison between the measured widths and the subjectively obtained JNB
widths is used to classify the blur on each edge as perceptible or imperceptible, and these values
are pooled over all edges in an edge block. The authors suggest improving the performance of
the metric for very large blur values and incorporating noise immunity into the model as future
directions.

F. Shao et al [110] further used an artificial neural network to combine blocking, blur, noise and
ringing metrics for JPEG2000 images and estimate the overall image quality. The model performs a detailed
characteristic analysis of each specific distortion. Distortion-specific features are extracted and used
to train Support Vector Regression (SVR): gradient histogram information to evaluate the degree of blur,
frequency amplitude information to measure the amount of white noise, DCT coefficient contrast
information to quantify blocking artifacts, and wavelet subband decomposition to evaluate JPEG2000
images. Experimental results show that this metric outperforms FR-IQA techniques. However, the metric
can measure only the distortions included in its design and fails on other distortions introduced into
the images; extending it to further distortion types is needed to widen its range of applications.

2.3.2 Generalized NR-IQA

Developing an NR-IQA algorithm that works for all types of distortions is a very challenging task, and
very few NR-IQA algorithms are not restricted to a single distortion [91]. The literature shows that
natural scene statistics have great potential to solve the problem of generalized NR image quality
assessment [99]. The human brain is exposed to high-quality images from childhood; it develops models of
high-quality images and learns to use them to assess the quality of degraded images [91].

NR image quality models make use of characteristics of the human visual system. Original
distortion-free images, called natural or pristine images, are of perfect quality. They are believed to
form a tiny subset of the huge set of all possible images and to exhibit similar statistical properties;
any distortion introduced into an image causes a deviation from these properties. Statistical models of
such high-quality images are developed to describe this class, and such Natural Scene Statistics (NSS)
based models have the potential to predict the amount of distortion without any reference. FR-IQA and
RR-IQA models help to establish the exact nature of the statistical model of natural images while an NR
image quality model is being developed.

A. Mittal et al [6], [114] proposed an effective NSS-based generic Blind/Referenceless Image Spatial
Quality Evaluator (BRISQUE) model for NR-IQA in the spatial domain. It does not compute
distortion-specific features such as blocking, ringing or blur. The metric is based on the statistics of
locally normalized luminance coefficients and on the hypothesis that these coefficients follow a Gaussian
distribution in a natural image, whereas the introduction of distortion causes a deviation from this
behavior. Support vector regression is used for quality prediction. The authors state that the method
avoids any transform in order to reduce the computational burden, and that its results are competitive
with all available generic NR-IQA algorithms. They also suggest that the features could be extended to
identify the distortions.

A. Mittal et al [4] extended this work to a completely blind quality analyzer that can evaluate
distorted images with very little prior knowledge of the distortions. The proposed model uses simple
quality-aware statistical features based on an NSS model. Such features of natural images are fitted with
a multivariate Gaussian (MVG) model, and the distance between this model and the MVG fitted to the
quality-aware features of the distorted image is used as the image quality score. The authors conclude
that the results are comparable with machine-learning-based general NR-IQA models and that the model can
be applied in unconstrained environments.

A. Moorthy et al [12] developed an effective two-step framework for predicting a blind image quality
index (BIQI). The first stage identifies the distortion type and the second stage evaluates the amount of
that distortion; a probability-weighted sum of the distortion-specific scores gives the numerical quality
score. Wavelet coefficients of natural images follow a Generalized Gaussian Distribution (GGD), so the
coefficients of the degraded image are fitted with a GGD. A support vector machine classifier [45] is
used to classify the distortion, with the mean, variance and shape parameters of different wavelet
subbands forming the feature vector, and support vector regression computes the image quality from the
same set of features. The authors conclude that, although the metric is consistent with the HVS and
useful for identifying the distortion, there is room for improvement in the assessment of JPEG2000 and
fast-fading distortions.

D. Zhang et al [25] proposed using Independent Component Analysis (ICA) to decompose the image
information into independent components, with the distribution of each component modelled by a
Generalized Gaussian Distribution (GGD). The ICA generative model is learned from a large sample of
natural scenes and yields a set of filters; applying each filter decomposes an image into its independent
components. Features extracted from the distorted image are fitted with a GGD, and the KL divergence
measures the dissimilarity between the model features and the distorted-image features. The proposed
metric performs better on white noise and Gaussian blur than on JPEG and JPEG 2000 images. The authors
believe the performance can be further improved by refining the ICA-based statistical model and conclude
that a more effective method is needed to measure the variation in the features.

Y. Zhang et al [115] proposed NR-IQA using log-derivative statistics of natural scenes. The model,
referred to as the DErivative Statistics-based Image QUality Evaluator (DESIQUE), uses a two-stage
framework that performs distortion identification followed by distortion-specific evaluation. It extracts
quality-related statistical features in both the spatial and frequency domains at two image scales.
Pixel-wise statistics are used along with log-derivative statistics computed from pairs of pixels in the
spatial domain, while log-Gabor filters extract the high-frequency components of the image in the
frequency domain, which are also modeled by log-derivative statistics. These statistical features are
modeled by a generalized Gaussian distribution, and the fitting parameters serve as the features of the
method. Experimental results show that the proposed method improves on the performance of existing
NR-IQA models while remaining computationally efficient.

Researchers have reported the following issues and challenges to be faced while developing different
IQA models.

1. Subjective viewing data is essential for verifying the performance of a quality metric. A proposed
metric must be evaluated with different visual content and different types of distortions to draw
meaningful conclusions about its performance.

2. In full-reference image quality assessment the original image is assumed to have perfect quality,
and the distorted image is evaluated by comparing it with the original. However, in some situations the
original image may not have perfect quality. For example, in contrast enhancement the quality of the
output image is superior to that of the original image, which directly contradicts the assumption.

3. No-reference image quality assessment (NR-IQA) is the most difficult task in the field of image
analysis. A general-purpose NR image quality metric is a more complex system because it must handle
several kinds of artifacts, and it should be designed to assess unknown distortions that may be
introduced into images in the future.

4. The assumed statistical knowledge describing high-quality images in NR-IQA is not restricted to a
single original image; it is expressed as a probability distribution over all high-quality natural images
within the space of possible images.

5. NR metrics must be able to differentiate between the signal and visible distortion with limited
input information, since many desired signals resemble typical artifacts.

6. The human visual system is complex, and a perfect model of the human eye is not possible. The
sensitivity of the HVS differs between error types and varies with visual context.

7. Mapping the computed quality score to the corresponding MOS value, so as to reflect the way the HVS
and the brain arrive at an assessment of the perceived distortion, is a difficult task.
2.4 Research Gaps

It is observed that research initially focused mainly on full-reference (FR) measures, which assume
the availability of the original image at the time of assessment. In the present era of multimedia,
however, the original image is often not easily available or may not even exist, so more work is needed
to develop robust reduced-reference (RR) and no-reference (NR) metrics. The NR-IQA techniques published
up to 2010 are mainly distortion specific: they assume a particular distortion in the image and develop
the metric from the statistics of that distortion, so they cannot quantify other types of distortions.
The literature reveals that the development of generalized NR-IQA techniques has accelerated recently,
yet researchers conclude that there is still room for improvement and that this domain is far from
mature. Most generalized NR-IQA methods rely on transform-based approaches, which increase the
computational burden and slow down processing, making the metrics unsuitable for real-time applications.

Moreover, most of these measures focus on gray-scale image features. However, since human beings can
discern thousands of shades of color compared with only about two dozen shades of gray, color is a
powerful descriptor, and it has been shown that color information also deviates when images are
distorted. It is therefore a challenging demand of this era to develop a generalized no-reference image
quality metric for color images that is efficient, fast and simple enough to be incorporated into
real-time applications. Hence it is proposed to design a generalized framework for the no-reference
assessment of color images in different color spaces, with the following objectives.

To study subjective and objective approaches for gray/color image quality measures and assessment
techniques

To design a full-reference HVS-based color image quality assessment measure for different color spaces

To study the deviation in image color information due to compression

To develop a general no-reference quality measure to determine the quality of color images in different
color spaces (viz. HSV, YCbCr, etc.)

Today's image quality assessment algorithms perform well at predicting human visual judgment in
conventional applications. Still, IQA research needs to be extended to meet current challenges while
simultaneously anticipating future ones [127], [128]. The limitations of current IQA algorithms show that
there is room for the development of alternative methods beyond the available techniques.

CHAPTER 3
Software Introduction:
3.1. Introduction to MATLAB

MATLAB is a high-performance language for technical computing. It integrates


computation, visualization, and programming in an easy-to-use environment where problems and
solutions are expressed in familiar mathematical notation. Typical uses include

 Math and computation


 Algorithm development
 Data acquisition
 Modeling, simulation, and prototyping
 Data analysis, exploration, and visualization
 Scientific and engineering graphics
 Application development, including graphical user interface building

MATLAB is an interactive system whose basic data element is an array that does not
require dimensioning. This allows you to solve many technical computing problems, especially
those with matrix and vector formulations, in a fraction of the time it would take to write a
program in a scalar noninteractive language such as C or FORTRAN.

The name MATLAB stands for matrix laboratory. MATLAB was originally written to
provide easy access to matrix software developed by the LINPACK and EISPACK projects.
Today, MATLAB engines incorporate the LAPACK and BLAS libraries, embedding the state of
the art in software for matrix computation.

MATLAB has evolved over a period of years with input from many users. In university
environments, it is the standard instructional tool for introductory and advanced courses in
mathematics, engineering, and science. In industry, MATLAB is the tool of choice for high-
productivity research, development, and analysis.

MATLAB features a family of add-on application-specific solutions called toolboxes.
Very important to most uses of MATLAB, toolboxes allow you to learn and apply specialized
technology. Toolboxes are comprehensive collections of MATLAB functions (M – files) that
extend the MATLAB environment to solve particular classes of problems. Areas in which
toolboxes are available include signal processing, control systems, neural networks, fuzzy logic,
wavelets, simulation, and many others.

3.2 The MATLAB system:

The MATLAB system consists of five main parts

 Development Environment:

This is the set of tools and facilities that help you use MATLAB functions and files.
Many of these tools are graphical user interfaces. It includes the MATLAB desktop and
command window, a command history, an editor and debugger, and browsers for viewing help,
the workspace, files, and the search path.

 The MATLAB Mathematical Function Library:

This is a vast collection of computational algorithms ranging from elementary functions,


like sum, sine, cosine, and complex arithmetic, to more sophisticated functions like matrix
inverse, matrix eigenvalues, Bessel functions, and fast Fourier transforms.

 The MATLAB Language:

This is a high-level matrix/array language with control flow statements, functions, data
structures, input/output, and object-oriented programming features. It allows both “programming
in the small” to rapidly create quick and dirty throw-away programs, and “programming in the
large” to create large and complex application programs.

 Graphics:

MATLAB has extensive facilities for displaying vectors and matrices as graphs, as well
as annotating and printing these graphs. It includes high-level functions for two-dimensional and
three-dimensional data visualization, image processing, animation, and presentation graphics. It

also includes low-level functions that allow you to fully customize the appearance of graphics as
well as to build complete graphical user interfaces on your MATLAB applications.

 The MATLAB Application Program Interface (API):

This is a library that allows you to write C and FORTRAN programs that interact with
MATLAB. It includes facilities for calling routines from MATLAB (dynamic linking), calling
MATLAB as a computational engine, and for reading and writing MAT-files.

Various toolboxes are there in MATLAB for computing recognition techniques, but we are
using IMAGE PROCESSING toolbox.

3.3 GRAPHICAL USER INTERFACE (GUI):

MATLAB’s Graphical User Interface Development Environment (GUIDE) provides a


rich set of tools for incorporating graphical user interfaces (GUIs) in M-functions. Using
GUIDE, the processes of laying out a GUI (i.e., its buttons, pop-up menus, etc.) and
programming the operation of the GUI are divided conveniently into two easily managed and
relatively independent tasks. The resulting graphical M-function is composed of two identically
named (ignoring extensions) files:

 A file with extension .fig, called a FIG-file that contains a complete graphical description of
all the function’s GUI objects or elements and their spatial arrangement. A FIG-file contains
binary data that does not need to be parsed when the associated GUI-based M-function is
executed.

 A file with extension .m, called a GUI M-file, which contains the code that controls the GUI
operation. This file includes functions that are called when the GUI is launched and exited,
and callback functions that are executed when a user interacts with GUI objects for example,
when a button is pushed.

To launch GUIDE from the MATLAB command window, type

guide filename

where filename is the name of an existing FIG-file on the current path. If filename is omitted, GUIDE
opens a new (i.e., blank) window.

Fig 3.3 Graphical User Interface window

A graphical user interface (GUI) is a graphical display in one or more windows


containing controls, called components, that enable a user to perform interactive tasks. The user
of the GUI does not have to create a script or type commands at the command line to accomplish
the tasks. Unlike coding programs to accomplish tasks, the user of a GUI need not understand the
details of how the tasks are performed.

GUI components can include menus, toolbars, push buttons, radio buttons, list boxes, and
sliders just to name a few. GUIs created using MATLAB tools can also perform any type of
computation, read and write data files, communicate with other GUIs, and display data as tables
or as plots.

3.4 Getting Started

If you are new to MATLAB, you should start by reading Manipulating Matrices. The most
important things to learn are how to enter matrices, how to use the : (colon) operator, and how to
invoke functions. After you master the basics, you should read the rest of the sections below and
run the demos.

At the heart of MATLAB is a new language you must learn before you can fully exploit its
power. You can learn the basics of MATLAB quickly, and mastery comes shortly after. You will
be rewarded with high productivity, high-creativity computing power that will change the way
you work.

3.4.1 Introduction - describes the components of the MATLAB system.

3.4.2 Development Environment - introduces the MATLAB development environment,


including information about tools and the MATLAB desktop .

3.4.3 Manipulating Matrices - introduces how to use MATLAB to generate matrices and
perform mathematical operations on matrices.

3.4.4 Graphics - introduces MATLAB graphic capabilities, including information about


plotting data, annotating graphs, and working with images.
3.4.5 Programming with MATLAB - describes how to use the MATLAB language to
create scripts and functions, and manipulate data structures, such as cell arrays and
multidimensional arrays.

3.5 DEVELOPMENT ENVIRONMENT

3.5.1 Introduction

This chapter provides a brief introduction to starting and quitting MATLAB, and the tools
and functions that help you to work with MATLAB variables and files. For more information
about the topics covered here, see the corresponding topics under Development Environment in
the MATLAB documentation, which is available online as well as in print.

Starting and Quitting MATLAB

3.5.2 Starting MATLAB

On a Microsoft Windows platform, to start MATLAB, double-click the MATLAB shortcut


icon on your Windows desktop. On a UNIX platform, to start MATLAB, type matlab at the
operating system prompt. After starting MATLAB, the MATLAB desktop opens - see MATLAB
Desktop.

You can change the directory in which MATLAB starts, define startup options including running
a script upon startup, and reduce startup time in some situations.

3.5.3 Quitting MATLAB

To end your MATLAB session, select Exit MATLAB from the File menu in the desktop, or
type quit in the Command Window. To execute specified functions each time MATLAB quits,
such as saving the workspace, you can create and run a finish.m script.

3.5.4 MATLAB Desktop

When you start MATLAB, the MATLAB desktop appears, containing tools (graphical user
interfaces) for managing files, variables, and applications associated with MATLAB. The first
time MATLAB starts, the desktop appears as shown in the following illustration, although your
Launch Pad may contain different entries.

You can change the way your desktop looks by opening, closing, moving, and resizing
the tools in it. You can also move tools outside of the desktop or return them back inside the

desktop (docking). All the desktop tools provide common features such as context menus and
keyboard shortcuts.

You can specify certain characteristics for the desktop tools by selecting Preferences from
the File menu. For example, you can specify the font characteristics for Command Window text.
For more information, click the Help button in the Preferences dialog box.

3.5.5 Desktop Tools

This section provides an introduction to MATLAB's desktop tools. You can also use
MATLAB functions to perform most of the features found in the desktop tools. The tools are:

 Current Directory Browser


 Workspace Browser
 Array Editor
 Editor/Debugger
 Command Window
 Command History
 Launch Pad
 Help Browser
Command Window

Use the Command Window to enter variables and run functions and M-files.

Command History

Lines you enter in the Command Window are logged in the Command History window. In
the Command History, you can view previously used functions, and copy and execute selected
lines. To save the input and output from a MATLAB session to a file, use the diary function.

Running External Programs

You can run external programs from the MATLAB Command Window. The exclamation point character ! is a
shell escape and indicates that the rest of the input line is a command to the operating system. This is
useful for invoking utilities or running other programs without quitting MATLAB. On Linux, for example,
!emacs magik.m invokes an editor called emacs for a file named magik.m. When you quit the external
program, the operating system returns control to MATLAB.

Launch Pad

MATLAB's Launch Pad provides easy access to tools, demos, and documentation.

Help Browser

Use the Help browser to search and view documentation for all your MathWorks products.
The Help browser is a Web browser integrated into the MATLAB desktop that displays HTML
documents.

To open the Help browser, click the help button in the toolbar, or type helpbrowser in the
Command Window. The Help browser consists of two panes, the Help Navigator, which you use
to find information, and the display pane, where you view the information.

Help Navigator

Use the Help Navigator to find information. It includes:

Product filter - Set the filter to show documentation only for the products you specify.

Contents tab - View the titles and tables of contents of documentation for your products.

Index tab - Find specific index entries (selected keywords) in the MathWorks documentation
for your products.

Search tab - Look for a specific phrase in the documentation. To get help for a specific
function, set the Search type to Function Name.

Favorites tab - View a list of documents you previously designated as favorites.

Display Pane

After finding documentation using the Help Navigator, view it in the display pane. While
viewing the documentation, you can:

Browse to other pages - Use the arrows at the tops and bottoms of the pages, or use the back
and forward buttons in the toolbar.

Bookmark pages - Click the Add to Favorites button in the toolbar.

Print pages - Click the print button in the toolbar.

Find a term in the page - Type a term in the Find in page field in the toolbar and click Go.

Other features available in the display pane are: copying information, evaluating a selection,
and viewing Web pages.

Current Directory Browser

MATLAB file operations use the current directory and the search path as reference points.
Any file you want to run must either be in the current directory or on the search path.

Search Path

To determine how to execute functions you call, MATLAB uses a search path to find M-files
and other MATLAB-related files, which are organized in directories on your file system. Any
file you want to run in MATLAB must reside in the current directory or in a directory that is on
the search path. By default, the files supplied with MATLAB and MathWorks toolboxes are
included in the search path.

Workspace Browser

The MATLAB workspace consists of the set of variables (named arrays) built up during a
MATLAB session and stored in memory. You add variables to the workspace by using
functions, running M-files, and loading saved workspaces.

To view the workspace and information about each variable, use the Workspace browser, or
use the functions who and whos.

To delete variables from the workspace, select the variable and select Delete from the Edit
menu. Alternatively, use the clear function.

The workspace is not maintained after you end the MATLAB session. To save the workspace
to a file that can be read during a later MATLAB session, select Save Workspace As from the
File menu, or use the save function. This saves the workspace to a binary file called a MAT-file,
which has a .mat extension. There are options for saving to different formats. To read in a MAT-
file, select Import Data from the File menu, or use the load function.

Array Editor

Double-click on a variable in the Workspace browser to see it in the Array Editor. Use the
Array Editor to view and edit a visual representation of one- or two-dimensional numeric arrays,
strings, and cell arrays of strings that are in the workspace.

Editor/Debugger

Use the Editor/Debugger to create and debug M-files, which are programs you write to
run MATLAB functions. The Editor/Debugger provides a graphical user interface for basic text
editing, as well as for M-file debugging.

You can use any text editor to create M-files, such as Emacs, and can use preferences
(accessible from the desktop File menu) to specify that editor as the default. If you use another
editor, you can still use the MATLAB Editor/Debugger for debugging, or you can use debugging
functions, such as dbstop, which sets a breakpoint.

If you just need to view the contents of an M-file, you can display it in the Command
Window by using the type function.

3.6 MANIPULATING MATRICES

3.6.1 Entering Matrices

The best way for you to get started with MATLAB is to learn how to handle matrices. Start
MATLAB and follow along with each example.

You can enter matrices into MATLAB in several different ways:

 Enter an explicit list of elements.


 Load matrices from external data files.
 Generate matrices using built-in functions.
 Create matrices with your own functions in M-files.
Start by entering Dürer's matrix as a list of its elements. You have only to follow a few basic
conventions:

 Separate the elements of a row with blanks or commas.


 Use a semicolon, ; , to indicate the end of each row.
 Surround the entire list of elements with square brackets, [ ].
To enter Dürer's matrix, simply type in the Command Window

A = [16 3 2 13; 5 10 11 8; 9 6 7 12; 4 15 14 1]

MATLAB displays the matrix you just entered.

A=

16 3 2 13

5 10 11 8

9 6 7 12

4 15 14 1

This exactly matches the numbers in the engraving. Once you have entered the matrix, it is
automatically remembered in the MATLAB workspace. You can refer to it simply as A.

3.6.2 Expressions

Like most other programming languages, MATLAB provides mathematical expressions, but
unlike most programming languages, these expressions involve entire matrices. The building
blocks of expressions are:

 Variables
 Numbers
 Operators
 Functions
Variables

MATLAB does not require any type declarations or dimension statements. When MATLAB
encounters a new variable name, it automatically creates the variable and allocates the
appropriate amount of storage. If the variable already exists, MATLAB changes its contents and,
if necessary, allocates new storage. For example, num_students = 25

Creates a 1-by-1 matrix named num_students and stores the value 25 in its single element.
Variable names consist of a letter, followed by any number of letters, digits, or underscores.
MATLAB uses only the first 31 characters of a variable name. MATLAB is case sensitive; it
distinguishes between uppercase and lowercase letters. A and a are not the same variable. To
view the matrix assigned to any variable, simply enter the variable name.

Numbers

MATLAB uses conventional decimal notation, with an optional decimal point and leading
plus or minus sign, for numbers. Scientific notation uses the letter e to specify a power-of-ten
scale factor. Imaginary numbers use either i or j as a suffix. Some examples of legal numbers are

3 -99 0.0001

9.6397238 1.60210e-20 6.02252e23

1i -3.14159j 3e5i

All numbers are stored internally using the long format specified by the IEEE floating-point
standard. Floating-point numbers have a finite precision of roughly 16 significant decimal digits
and a finite range of roughly 10^-308 to 10^+308.

3.6.3 Operators

Expressions use familiar arithmetic operators and precedence rules.

+ Addition

- Subtraction

* Multiplication

/ Division

\ Left division (described in "Matrices and Linear


Algebra" in Using MATLAB)

^ Power

' Complex conjugate transpose

() Specify evaluation order

3.6.4 Functions

MATLAB provides a large number of standard elementary mathematical functions, including


abs, sqrt, exp, and sin. Taking the square root or logarithm of a negative number is not an error;
the appropriate complex result is produced automatically. MATLAB also provides many more
advanced mathematical functions, including Bessel and gamma functions. Most of these
functions accept complex arguments. For a list of the elementary mathematical functions, type help
elfun. For a list of more advanced mathematical and matrix functions, type help specfun and help elmat.

Some of the functions, like sqrt and sin, are built-in. They are part of the MATLAB core
so they are very efficient, but the computational details are not readily accessible. Other

functions, like gamma and sinh, are implemented in M-files. You can see the code and even
modify it if you want. Several special functions provide values of useful constants.

pi        3.14159265...

i         Imaginary unit, √-1

j         Same as i

eps       Floating-point relative precision, 2^-52

realmin   Smallest floating-point number, 2^-1022

realmax   Largest floating-point number, (2 - eps)*2^1023

Inf       Infinity

NaN       Not-a-number

3.7 GUI

A graphical user interface (GUI) is a user interface built with graphical objects, such as
buttons, text fields, sliders, and menus. In general, these objects already have meanings to most
computer users. For example, when you move a slider, a value changes; when you press an OK
button, your settings are applied and the dialog box is dismissed. Of course, to leverage this
built-in familiarity, you must be consistent in how you use the various GUI-building
components.

Applications that provide GUIs are generally easier to learn and use since the person using
the application does not need to know what commands are available or how they work. The
action that results from a particular user action can be made clear by the design of the interface.

The sections that follow describe how to create GUIs with MATLAB. This includes laying
out the components, programming them to do specific things in response to user actions, and
saving and launching the GUI; in other words, the mechanics of creating GUIs. This

documentation does not attempt to cover the "art" of good user interface design, which is an
entire field unto itself. Topics covered in this section include:

3.7.1 Creating GUIs with GUIDE

MATLAB implements GUIs as figure windows containing various styles of uicontrol


objects. You must program each object to perform the intended action when activated by the user
of the GUI. In addition, you must be able to save and launch your GUI. All of these tasks are
simplified by GUIDE, MATLAB's graphical user interface development environment.

3.7.2 GUI Development Environment

The process of implementing a GUI involves two basic tasks:

 Laying out the GUI components


 Programming the GUI components
GUIDE primarily is a set of layout tools. However, GUIDE also generates an M-file that
contains code to handle the initialization and launching of the GUI. This M-file provides a
framework for the implementation of the callbacks - the functions that execute when users
activate components in the GUI.

The Implementation of a GUI

While it is possible to write an M-file that contains all the commands to lay out a GUI, it is
easier to use GUIDE to lay out the components interactively and to generate two files that save
and launch the GUI:

A FIG-file - contains a complete description of the GUI figure and all of its

children (uicontrols and axes), as well as the values of all object properties.

An M-file - contains the functions that launch and control the GUI and the

callbacks, which are defined as subfunctions. This M-file is referred to as the

application M-file in this documentation.

Note that the application M-file does not contain the code that lays out the uicontrols; this
information is saved in the FIG-file.

The following diagram illustrates the parts of a GUI implementation.

FIG 3.7.2 graphical user blocks

3.7.3 Features of the GUIDE-Generated Application M-File

GUIDE simplifies the creation of GUI applications by automatically generating an M-file


framework directly from your layout. You can then use this framework to code your application
M-file. This approach provides a number of advantages:

The M-file contains code to implement a number of useful features (see Configuring
Application Options for information on these features). The M-file adopts an effective approach
to managing object handles and executing callback routines (see Creating and Storing the Object
Handle Structure for more information). The M-file provides a way to manage global data (see
Managing GUI Data for more information).

The automatically inserted subfunction prototypes for callbacks ensure compatibility with
future releases. For more information, see Generating Callback Function Prototypes for
information on syntax and arguments.

You can elect to have GUIDE generate only the FIG-file and write the application M-file
yourself. Keep in mind that there are no uicontrol creation commands in the application M-file;
the layout information is contained in the FIG-file generated by the Layout Editor.

3.7.4 Beginning the Implementation Process

To begin implementing your GUI, proceed to the following sections:

Getting Started with GUIDE - the basics of using GUIDE.

Selecting GUIDE Application Options - set both FIG-file and M-file options.

Using the Layout Editor - begin laying out the GUI.

Understanding the Application M-File - discussion of programming techniques

used in the application M-file.

Application Examples - a collection of examples that illustrate techniques

which are useful for implementing GUIs.

Command-Line Accessibility

When MATLAB creates a graph, the figure and axes are included in the list of children of
their respective parents and their handles are available through commands such as findobj, set,
and get. If you issue another plotting command, the output is directed to the current figure and
axes.

GUIs are also created in figure windows. Generally, you do not want GUI figures to be
available as targets for graphics output, since issuing a plotting command could direct the output
to the GUI figure, resulting in the graph appearing in the middle of the GUI.

In contrast, if you create a GUI that contains an axes and you want commands entered in the
command window to display in this axes, you should enable command-line access.

3.7.5 User Interface Control

The Layout Editor component palette contains the user interface controls that you can use in
your GUI. These components are MATLAB uicontrol objects and are programmable via their
Callback properties. This section provides information on these components.

 Push Buttons
 Sliders
 Toggle Buttons
 Frames
 Radio Buttons
 Listboxes
 Checkboxes
 Popup Menus
 Edit Text
 Axes
 Static Text
 Figures
Push Buttons

Push buttons generate an action when pressed (e.g., an OK button may close a dialog box and
apply settings). When you click down on a push button, it appears depressed; when you release
the mouse, the button's appearance returns to its nondepressed state; and its callback executes on
the button up event.

Properties to Set

String - set this property to the character string you want displayed on the push button .

Tag - GUIDE uses the Tag property to name the callback subfunction in the application M-file.
Set Tag to a descriptive name (e.g., close_button) before activating the GUI.

Programming the Callback

When the user clicks on the push button, its callback executes. Push buttons do not return a
value or maintain a state.

Toggle Buttons

Toggle buttons generate an action and indicate a binary state (e.g., on or off). When you click
on a toggle button, it appears depressed and remains depressed when you release the mouse
button, at which point the callback executes. A subsequent mouse click returns the toggle button
to the nondepressed state and again executes its callback.

Programming the Callback

The callback routine needs to query the toggle button to determine what state it is in.
MATLAB sets the Value property equal to the Max property when the toggle button is depressed
(Max is 1 by default) and equal to the Min property when the toggle button is not depressed (Min
is 0 by default).

From the GUIDE Application M-FileThe following code illustrates how to program the
callback in the GUIDE application M-file.

function varargout = togglebutton1_Callback(h,eventdata,handles,varargin)

button_state = get(h,'Value');

if button_state == get(h,'Max')

% toggle button is pressed

elseif button_state == get(h,'Min')

% toggle button is not pressed

end

Adding an Image to a Push Button or Toggle Button

Assign the CData property an m-by-n-by-3 array of RGB values that define a truecolor
image. For example, the array a defines a 16-by-128 truecolor image using random values between
0 and 1 (generated by rand).

a(:,:,1) = rand(16,128);

a(:,:,2) = rand(16,128);

a(:,:,3) = rand(16,128);

set(h,'CData',a)

Radio Buttons

Radio buttons are similar to checkboxes, but are intended to be mutually exclusive within a
group of related radio buttons (i.e., only one button is in a selected state at any given time). To
activate a radio button, click the mouse button on the object. The display indicates the state of
the button.

Implementing Mutually Exclusive Behavior

Radio buttons have two states - selected and not selected. You can query and set the state of a
radio button through its Value property:

Value = Max, button is selected.

Value = Min, button is not selected.

To make radio buttons mutually exclusive within a group, the callback for each radio button
must set the Value property to 0 on all other radio buttons in the group. MATLAB sets the Value
property to 1 on the radio button clicked by the user.

The following subfunction, when added to the application M-file, can be called by each radio
button callback. The argument is an array containing the handles of all other radio buttons in the
group that must be deselected.

function mutual_exclude(off)

set(off,'Value',0)

Obtaining the Radio Button Handles.

The handles of the radio buttons are available from the handles structure, which contains the
handles of all components in the GUI. This structure is an input argument to all radio button
callbacks.

The following code shows the call to mutual_exclude being made from the first radio
button's callback in a group of four radio buttons.

function varargout = radiobutton1_Callback(h,eventdata,handles,varargin)

off = [handles.radiobutton2,handles.radiobutton3,handles.radiobutton4];

mutual_exclude(off)

% Continue with callback .

After setting the radio buttons to the appropriate state, the callback can continue with its
implementation-specific tasks.

Checkboxes

Check boxes generate an action when clicked and indicate their state as checked or not
checked. Check boxes are useful when providing the user with a number of independent choices
that set a mode (e.g., display a toolbar or generate callback function prototypes).

The Value property indicates the state of the check box by taking on the value of the Max or
Min property (1 and 0 respectively by default):

Value = Max, box is checked.

Value = Min, box is not checked.

You can determine the current state of a check box from within its callback by querying the
state of its Value property, as illustrated in the following example:

function checkbox1_Callback(h,eventdata,handles,varargin)

if (get(h,'Value') == get(h,'Max'))

% then checkbox is checked - take appropriate action

else

% checkbox is not checked - take appropriate action

end

Edit Text

Edit text controls are fields that enable users to enter or modify text strings. Use edit text
when you want text as input. The String property contains the text entered by the user.

To obtain the string typed by the user, get the String property in the callback.

function edittext1_Callback(h,eventdata, handles,varargin)

user_string = get(h,'string');

% proceed with callback...

Obtaining Numeric Data from an Edit Text Component

MATLAB returns the value of the edit text String property as a character string. If you want
users to enter numeric values, you must convert the characters to numbers. You can do this using
the str2double command, which converts strings to doubles. If the user enters non-numeric
characters, str2double returns NaN.

You can use the following code in the edit text callback. It gets the value of the String
property and converts it to a double. It then checks if the converted value is NaN, indicating the
user entered a non-numeric character (isnan) and displays an error dialog (errordlg).

function edittext1_Callback(h,eventdata,handles,varargin)

user_entry = str2double(get(h,'string'));

if isnan(user_entry)

errordlg('You must enter a numeric value','Bad Input','modal')

end

% proceed with callback...

Triggering Callback Execution

On UNIX systems, clicking on the menubar of the figure window causes the edit text
callback to execute. However, on Microsoft Windows systems, if an editable text box has focus,
clicking on the menubar does not cause the editable text callback routine to execute. This
behavior is consistent with the respective platform conventions. Clicking on other components in the GUI
executes the callback.

Static Text

Static text controls display lines of text. Static text is typically used to label other controls,
provide directions to the user, or indicate values associated with a slider. Users cannot change static
text interactively, and there is no way to invoke the callback routine associated with it.

Frames

Frames are boxes that enclose regions of a figure window. Frames can make a user interface
easier to understand by visually grouping related controls. Frames have no callback routines
associated with them and only uicontrols can appear within frames (axes cannot).

Placing Components on Top of Frames

Frames are opaque. If you add a frame after adding components that you want to be
positioned within the frame, you need to bring forward those components. Use the Bring to
Front and Send to Back operations in the Layout menu for this purpose.

List Boxes

List boxes display a list of items and enable users to select one or more items.

The String property contains the list of strings displayed in the list box. The first item in the
list has an index of 1.

The Value property contains the index into the list of strings that correspond to the selected
item. If the user selects multiple items, then Value is a vector of indices. By default, the first item
in the list is highlighted when the list box is first displayed. If you do not want any item
highlighted, then set the Value property to empty.

The ListboxTop property defines which string in the list displays as the topmost item when the list box
is not large enough to display all list entries. ListboxTop is an index into the array of strings defined
by the String property and must have a value between 1 and the number of strings. Noninteger values are
fixed to the next lowest integer.

Single or Multiple Selection

The values of the Min and Max properties determine whether users can make single or
multiple selections:

If Max - Min > 1, then list boxes allow multiple item selection.

If Max - Min <= 1, then list boxes do not allow multiple item selection.

Selection Type

Listboxes differentiate between single and double clicks on an item and set the figure
SelectionType property to normal or open accordingly. See Triggering Callback Execution for
information on how to program multiple selection.

Triggering Callback Execution

MATLAB evaluates the list box's callback after the mouse button is released or a keypress
event (including arrow keys) that changes the Value property (i.e., any time the user clicks on an
item, but not when clicking on the list box scrollbar). This means the callback is executed after

the first click of a double-click on a single item or when the user is making multiple selections .
In these situations, you need to add another component, such as a Done button (push button) and
program its callback routine to query the list box Value property (and possibly the figure
SelectionType property) instead of creating a callback for the list box. If you are using the
automatically generated application M-file option, you need to either:

Either set the list box Callback property to the empty string ('') and remove the callback subfunction
from the application M-file, or leave the callback subfunction stub in the application M-file so that no
code executes when users click on list box items.

The first choice is best if you are sure you will not use the list box callback and you want to
minimize the size and efficiency of the application M-file. However, if you think you may want
to define a callback for the list box at some time, it is simpler to leave the callback stub in the M-
file.

Popup Menus

Popup menus open to display a list of choices when users press the arrow. The String
property contains the list of strings displayed in the popup menu. The Value property contains the
index into the list of strings that correspond to the selected item. When not open, a popup menu
displays the current choice, which is determined by the index contained in the Value property.
The first item in the list has an index of 1.

Popup menus are useful when you want to provide users with a number of mutually
exclusive choices, but do not want to take up the amount of space that a series of radio buttons
requires.

Programming the Popup Menu

You can program the popup menu callback to work by checking only the index of the item
selected (contained in the Value property) or you can obtain the actual string contained in the
selected item.

This callback checks the index of the selected item and uses a switch statement to take action
based on the value. If the contents of the popup menu are fixed, then you can use this approach.

function varargout = popupmenu1_Callback(h,eventdata,handles,varargin)

val = get(h,'Value');

switch val

case 1

% The user selected the first item

case 2

% The user selected the second item

% etc.

This callback obtains the actual string selected in the popup menu. It uses the value to index
into the list of strings. This approach may be useful if your program dynamically loads the
contents of the popup menu based on user action and you need to obtain the selected string. Note
that it is necessary to convert the value returned by the String property from a cell array to a
string.

function varargout = popupmenu1_Callback(h,eventdata,handles,varargin)

val = get(h,'Value');

string_list = get(h,'String');

selected_string = string_list{val}; % convert from cell array to string % etc.

Enabling or Disabling Controls

You can control whether a control responds to mouse button clicks by setting the Enable
property. Controls have three states:

on - The control is operational

off - The control is disabled and its label (set by the string property) is grayed out.

inactive - The control is disabled, but its label is not grayed out. When a control is disabled,
clicking on it with the left mouse button does not execute its callback routine. However, the left-
click causes two other callback routines to execute: First the figure WindowButtonDownFcn
callback executes. Then the control's ButtonDownFcn callback executes. A right mouse button
click on a disabled control posts a context menu, if one is defined for that control. See the Enable
property description for more details.

Axes

Axes enable your GUI to display graphics (e.g., graphs and images). Like all graphics objects,
axes have properties that you can set to control many aspects of their behavior and appearance. See
Axes Properties for general information on axes objects.

Axes Callbacks

Axes are not uicontrol objects, but can be programmed to execute a callback when users click
a mouse button in the axes. Use the axes ButtonDownFcn property to define the callback.

3.7.6 Plotting to Axes in GUIs

GUIs that contain axes should ensure the Command-line accessibility option in the
Application Options dialog is set to Callback (the default). This enables you to issue plotting
commands from callbacks without explicitly specifying the target axes.

GUIs with Multiple Axes

If a GUI has multiple axes, you should explicitly specify which axes you want to target when
you issue plotting commands. You can do this using the axes command and the handles
structure. For example, axes(handles.axes1) makes the axes whose Tag property is axes1 the
current axes, and therefore the target for plotting commands. You can switch the current axes
whenever you want to target a different axes. See GUI with Multiple Axes for an example that uses two
axes.

Figure

Figures are the windows that contain the GUI you design with the Layout Editor. See the
description of figure properties for information on what figure characteristics you can control.

Chapter 4

Proposed Method
4.1 Proposed Framework

To tackle this problem, we propose a novel NR-IQA framework called the deep blind image quality
assessor (DIQA). The DIQA is trained in two separate stages, as shown in Fig. 1.1. In the first stage, an
objective error map is used as a proxy training target to expand the data set labels. An existing
database provides one subjective score per distorted image; in other words, one training item is a
mapping from a 3-D tensor (width, height and channel) to a scalar value. Given a distorted image I_d and
a scalar subjective score S, the optimal parameters θ of a model are sought by
arg min_θ || f(I_d; θ) − S ||², where f(·) is a prediction function. In contrast, the DIQA utilizes
reference images during training and generates a 2-D intermediate target called the objective error map.
Note that the reference images are accessible during training as long as the database provides them, and
the ground-truth objective error map can easily be derived by comparing the reference and distorted
images. By expanding the training target to a 2-D error map e, the objective becomes
arg min_θ Σ_(i,j) || f(I_d; θ)(i, j) − e(i, j) ||², where (i, j) is a pixel index. In other words, this
has the same effect as increasing the number of training pairs up to the dimensions of the error map by
imposing more constraints. Once the deep neural network has been trained with a sufficient training data
set, the model is fine-tuned to predict the subjective scores. Since the objective error map is
correlated with the subjective score, the second stage can be trained without great difficulty even on a
limited data set. In the end, our model predicts subjective scores without accessing the ground-truth
objective error maps during testing.

Overall, we resolve the NR-IQA problem by dividing it into an objective distortion part and an
HVS-related part. In the objective distortion part, a pixelwise objective error map is predicted using
the CNN model. In the HVS-related part, the model further learns human visual perception behavior.

However, another problem arises in the objective error map prediction phase. When severe distortion is
applied to an image and its high-frequency detail is lost, the error map gains more high-frequency
components while the distorted image itself lacks high-frequency detail. Therefore, without the reference
image, it is difficult to predict an accurate error map from the distorted image, particularly on
homogeneous regions. To avoid this problem, we propose deriving a reliability map that measures textural
strength and compensates for the inaccuracy of the error map.

To visualize and analyze the learned human visual sensitivity, we further propose an alternative model,
which we call DIQA-SENS. It uses two separate CNN branches, each dedicated to learning the objective
distortion and the human visual sensitivity, respectively. In particular, the visual sensitivity branch
predicts local visual weights for the objective error map from the triplet of a distorted image, its
objective error map and its ground-truth subjective score. Multiplying the objective error map by the
sensitivity map yields a perceptual error map, which explains the degree of distortion from the
perspective of the HVS.

Our contributions can be summarized as follows.

1) Using the simple objective error map, the training data set can be easily augmented, and the deep CNN
model can be trained without an overfitting problem.

2) DIQA is trained via end-to-end optimization so that the parameters can be thoroughly optimized to
achieve state-of-the-art correlation with human subjective scores.

3) DIQA-SENS generates the objective error map and the perceptual error map as intermediate results,
which provide an intuitive analysis of local artifacts in distorted images.

4.2 RELATED WORK

Most previously proposed NR-IQA methods were developed based on the machine learning
framework. Researchers attempted to design elaborate features that could discriminate distorted
images from pristine images. One popular family of features is NSS, which assumes that natural scenes
contain statistical regularities. Various types of NSS features have been defined in
transformation and spatial domains in the literature. Moorthy and Bovik [1] extracted features in
the wavelet domain, and Saad et al. [2] defined them in the discrete cosine transform
coefficients. Recently, Mittal et al. [11], [12] captured NSS features using only locally
normalized images without any domain transformation.

In addition to NSS features, various kinds of features have been developed for NR-IQA.
Li et al. [13] employed a general regression neural network with features based on phase congruency,
entropy, and image gradients. Tang et al. [14] considered such multiple features as natural image
statistics, distortion textures, blur, and noise statistics. Meanwhile, in [15] and [16], dictionary
learning was adapted to capture effective features from the raw patches. Most of these studies
were based on conventional machine learning algorithms, such as SVMs and NNs. Since such
models have a limited number of parameters, the size of the data set was not a significant issue.
However, they yielded lower accuracies than FR-IQA metrics.

Relatively recently, attempts have been made to adopt a deep learning technique for the
NR-IQA problem to enhance prediction accuracy [42]. Hou et al. [3] used a DBN, where NSS-
related features were extracted in the wavelet domain and fed into the deep model. Similarly, Li
et al. [4] derived NSS-related features from Shearlet-transformed images. The extracted features
were then regressed onto a subjective score using a stacked autoencoder. Lv et al. [17] used DoG
features and the stacked autoencoder. Ghadiyaram and Bovik [18] attempted to capture a large
number of NSS features using multiple transforms and then used a DBN to predict the subjective
score. However, most of these studies used the deep model merely in place of the conventional
regression machine: the handcrafted features were of such small size that the neural networks
could not be made deep enough to take full advantage of deep learning. Kang et al.
[19] applied a CNN to the NR-IQA problem without handcrafted features to conduct end-to-end
optimization. To resolve the data set size issue, an input image was divided into multiple patches, and
an equal mean opinion score (MOS) was used for all patches in an image. Strictly speaking, this
approach cannot reflect properties of the HVS, the pixelwise perceptual quality of which varies
over the spatial domain. Bosse et al. [20] adopted a deep CNN model with 12 layers. The loss
function was similar to [19]; however, they suggested an additional model, which learns the
individual importance of each patch. Recently, we proposed a CNN-based NR-IQA framework,
where FR-IQA metrics were employed as intermediate training targets of the CNN [21], and the
statistical pooling over minibatch was introduced for end-to-end optimization. On the other hand,
to overcome the limited training set, other attempts have been made by generating discriminable
image pairs [22], or employing multitask learning [23]. In contrast to past work, the DIQA
resolves the issue of the lack of a data set by utilizing reference images in training to generate an
intermediate target. Different from our previous work [21], the DIQA does not depend on
complicated FR-IQA metrics. In addition, the DIQA uses only convolutional layers in the
pretraining stage so that the model can be deeper and can use a larger proxy target. Our proposed
framework achieves state-of-the-art prediction accuracy using the strong representation
capability of CNN models.

4.3 DEEP IMAGE QUALITY ASSESSMENT PREDICTOR

The overall framework of the DIQA is shown in Fig. 1.1. Once an input distorted image is
normalized, it passes through two paths: 1) a CNN branch and 2) a reliability map prediction
branch. In the first training stage, the CNN branch is trained to predict an objective error map e.
The ground-truth error map egt is obtained by comparing the reference and distorted images. In
the second stage, the model is further trained to predict a human subjective score S (Section
4.3.5). In each stage, the reliability map r is supplemented to compensate for the inaccuracy on
homogeneous regions.

4.3.1 Model Architecture

The design of the proposed CNN architecture is motivated by [24]. The structure of the DIQA is
shown in Fig. 4.3.1. For the error map prediction part, the model consists of only convolutional
layers and zeros are padded around the border before each convolution; therefore, the output
does not lose relative pixel position information. Each layer except the last one has a 3 × 3 filter
and a rectified linear unit (ReLU) [25]. We call the output of Conv8 a feature map (filled with
yellow in Fig. 4.3.1), which is reused for the second stage of training. In the last layer of the first
training stage, the feature map is reduced to a one-channel objective error map using a 1 × 1
filter without nonlinear activation. If we directly fed the predicted error map into the modules of
the second stage, it would hinder rich feature representation, because the error map has only one
channel. To avoid this problem, we employ a simple linear combination over channels in Conv9,
so that we can generate a meaningful feature map that is closely related to the ground-truth error
map while still having multiple channels for better representation. The output of Conv9 is 1/4 the
size of the original input image. Correspondingly, the ground-truth objective error maps are
downscaled by 1/4. For the downsampling operation, convolution with a stride of 2 is used. In
the second training stage, the extracted feature map is fed into the global
average pooling layer followed by two fully connected layers. We additionally use two
handcrafted features, which will be

Fig. 4.3.1 Architecture of the objective pixel error map prediction subnetwork. “Conv”
indicates the convolutional layers, and “FC” indicates fully connected layers. The text below
“Conv” indicates its filter size. The red (blue) arrows indicate the flows of the first
(second) stage.

explained later. The handcrafted features are concatenated with the pooled features before FC1,
and then regressed onto a subjective score. For convenience, we denote the procedure from
Conv1 to Conv8 by f (·), the operation of Conv9 by g(·), and the procedure including FC1 and
FC2 by h(·).
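
To make the layer arrangement concrete, the following is a minimal MATLAB (Deep Learning Toolbox) sketch of f (·), g(·), and h(·). The per-layer channel widths and the positions of the two stride-2 convolutions are assumptions; the text above fixes only the 3 × 3 filters with ReLU, the zero padding, the 1/4-size output, the one-channel 1 × 1 Conv9, and the 128-D pooled feature used in the second stage.

% Sketch of the error-map subnetwork f(.) (Conv1-Conv8). Channel widths and the
% placement of the two stride-2 convolutions are assumptions.
layersF = [
    imageInputLayer([112 112 1], 'Normalization', 'none')      % 112x112 grayscale patch
    convolution2dLayer(3,  48, 'Padding', 'same')               % Conv1
    reluLayer
    convolution2dLayer(3,  48, 'Padding', 'same', 'Stride', 2)  % Conv2 (1/2 size, assumed)
    reluLayer
    convolution2dLayer(3,  64, 'Padding', 'same')               % Conv3
    reluLayer
    convolution2dLayer(3,  64, 'Padding', 'same', 'Stride', 2)  % Conv4 (1/4 size, assumed)
    reluLayer
    convolution2dLayer(3, 128, 'Padding', 'same')               % Conv5
    reluLayer
    convolution2dLayer(3, 128, 'Padding', 'same')               % Conv6
    reluLayer
    convolution2dLayer(3, 128, 'Padding', 'same')               % Conv7
    reluLayer
    convolution2dLayer(3, 128, 'Padding', 'same')               % Conv8 -> 128-channel feature map
    reluLayer];
% g(.): Conv9, a 1x1 linear combination over channels, no nonlinear activation.
layerG  = convolution2dLayer(1, 1, 'Padding', 'same');
% h(.): global average pooling followed by two fully connected layers (in the full
% model the pooled vector is concatenated with two handcrafted features before FC1).
layersH = [
    globalAveragePooling2dLayer
    fullyConnectedLayer(128)   % FC1 (width assumed)
    reluLayer
    fullyConnectedLayer(1)];   % FC2 -> predicted score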

4.3.2 Image Normalization

As a preprocessing step, the input images are first converted to grayscale, and their low-pass
filtered versions are subtracted from them. Let Ir be a reference image and Id be the corresponding
distorted image. The normalized versions are then denoted by ˆIr and ˆId, respectively. The low-
frequency images are obtained by downscaling the input image to 1/4 of its size and upscaling it
again to the original size; they are denoted by Ilowr and Ilowd. A Gaussian low-pass filter and
subsampling were used to resize the images.
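
A minimal MATLAB sketch of this normalization is given below; the Gaussian sigma and the file name are assumptions made only for illustration.

% Normalization sketch: grayscale conversion, 1/4 down/up low-pass image, subtraction.
Id     = im2double(rgb2gray(imread('distorted.jpg')));             % hypothetical distorted image
Id_low = imresize(imresize(imgaussfilt(Id, 1.0), 0.25), size(Id)); % Ilowd: 1/4 down, then up
Ihat_d = Id - Id_low;                                              % normalized image (ˆId)
% The reference image Ir is normalized in the same way to obtain ˆIr (Ihat_r).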

There are two reasons for this simple normalization. First, image distortions barely affect
the low-frequency component of images. For example, white Gaussian noise (WN) adds random
high-frequency components to images, GB removes high-frequency details, and blocking
artifacts introduce new high-frequency edges. The distortions due to JPEG and JPEG2000
(JP2K) can be modeled by a combination of these artifacts [26]. Second, the HVS is not sensitive
to a change in the low-frequency band. The CSF shows a bandpass filter shape peaking at
approximately four cycles per degree, and sensitivity drops rapidly at low frequency [8].
Even if there are small distortions in the low-frequency band, the HVS hardly notices them.
Although this normalization scheme has benefits, it also has the drawback of discarding
information. To compensate for this, two handcrafted features are supplemented in the second
training stage.

4.3.3 Reliability Map Prediction

Many distortions, such as quantization by JP2K or GB, make images blurry. However,
unlike FR-IQA, it is difficult to determine whether a blurry region is distorted without knowing
its pristine image. Furthermore, as more severe distortion is applied to an image, its error map
gains more high-frequency components while the distorted image loses more high-frequency
details, as shown in Fig. 4.3.3. Therefore, the model is likely to fail to predict the objective error
map on homogeneous regions.

Fig. 4.3.3 Examples of estimated reliability maps. (a)–(c) JPEG2000 distorted images in the
TID2013 data set at the distortion levels of 1, 3, and 5. (d)–(f) Difference maps derived by
using (5). (g)–(i) Reliability maps of (a)–(c).

To avoid this problem, the reliability of the predicted error map is estimated by measuring the
texture strength of the distorted image. Our assumption is that blurry regions have lower
reliability than textured regions. The bandpassed (preprocessed) images are used to measure
the reliability map as

where α controls the saturation property of the reliability map. To normalize the
reliability map, the positive half of the sigmoid function is used in (1), so that pixels with small
values are assigned sufficiently large reliability values.

The images shown in Fig. 4.3.3(a)–(c) are distorted by JPEG2000 at different levels, and
the corresponding difference maps are shown in Fig. 4.3.3(d)–(f). It can easily be seen that it is
difficult to derive an accurate error map (f) from a severely distorted image (c). The estimated
reliability maps with α = 1 are shown in Fig. 4.3.3(g)–(i). As shown in Fig. 4.3.3(i),

Fig. [Link] Histograms of error maps with different values of p. (a) p = 1. (b) p = 0.2.

the reliability map has zero values where there is no meaningful spatial information in Fig.
4.3.3(c).

To prevent the reliability map from directly affecting the predicted score, it is divided by
its average as

where Hr and Wr are the height and width of r.
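
A minimal MATLAB sketch of the reliability map and this average normalization follows; the exact saturation function is an assumption consistent with the “positive half of the sigmoid” description, and Ihat_d is the normalized distorted image from the sketch in Section 4.3.2.

% Reliability map sketch (assumed saturation form); alpha = 1 as in Fig. 4.3.3.
alpha = 1;
r     = 2 ./ (1 + exp(-alpha .* abs(Ihat_d))) - 1;  % texture strength of the distorted image
r_hat = r ./ mean(r(:));                            % divide by its spatial average over Hr x Wr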

4.3.4 Learning Objective Error Map

In the first stage of training, the objective error maps are used as proxy regression targets
to obtain the effect of increased training data. The loss function is defined by the mean squared
error between the predicted and ground-truth error maps

where f (·) and g(·) are defined in Fig. 4.3.1, θ represents the CNN’s parameters, and egt is defined
by

Here, any error metric function can be used for err(·). In our experiment, we chose the exponent
difference function

where p is the exponent number. When the absolute difference (p = 1) is used for the error metric
function, most values in the error maps are small numbers close to zero. In this case, the model
tends to fail to predict an accurate error map; when the training process converged, most predicted
values were zero in the experiment. Therefore, we chose p = 0.2 to spread the distribution of the
difference map toward higher values. Fig. [Link] shows a comparison of histograms for the two
exponent numbers, where the histogram for p = 0.2 has a broader distribution between 0 and 1.
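
A minimal MATLAB sketch of the ground-truth error map construction is given below; applying the metric to the normalized images ˆIr and ˆId (Ihat_r, Ihat_d from the earlier sketch) is an assumption based on the preprocessing of Section 4.3.2.

% Ground-truth objective error map sketch using the exponent difference metric.
p    = 0.2;                          % spreads the distribution toward higher values
e_gt = abs(Ihat_r - Ihat_d) .^ p;    % exponent difference between normalized images
e_gt = imresize(e_gt, 0.25);         % downscale by 1/4 to match the Conv9 output size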

4.3.5 Learning Subjective Opinion

Once the model is trained to predict the objective error maps, we move to the next
training stage, where the DIQA is trained to predict subjective scores. To achieve this, the trained
subnetwork f (·) is connected to a global average pooling layer followed by the fully connected
layers, as shown in Fig. 4.3.1. The feature map is averaged over the spatial domain, leading to a
128-D feature vector. Here, to compensate for the lost information, we consider two additional
handcrafted features: the mean of the non-normalized reliability map, μr, and the standard
deviation of the low-frequency image of the distorted image, σIlowd. If the distorted image is too
blurred, the reliable area becomes too small. In this case, the overall textural strength of the
distorted image becomes an important feature, which can be captured by μr. Therefore, the loss
function is defined as

where h(·) is a nonlinear regression function, S is the ground-truth subjective score of the input
distorted image, and v is the pooled feature vector. v is defined by

v = GAP( f ( ˆId ; θ f )) (7)

where GAP indicates the global average pooling operation.
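
The following MATLAB sketch illustrates how the regression input of the second stage can be assembled; featMap (the 128-channel Conv8 output), r, and Id_low from the earlier sketches are hypothetical variable names.

% Second-stage regression input sketch: pooled features plus handcrafted features.
v        = squeeze(mean(mean(featMap, 1), 2));  % global average pooling -> 128-D vector
mu_r     = mean(r(:));                          % mean of the non-normalized reliability map
sigma_lo = std(Id_low(:));                      % std of the low-frequency distorted image
reg_in   = [v; mu_r; sigma_lo];                 % concatenated input to FC1, regressed onto S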

4.3.6 Training

In this section, we describe the training details of the DIQA. The layers for error map
prediction are first trained by minimizing (3), where the ground-truth error map is derived from
(5). When the first stage converges to a sufficient extent, (6) is then minimized in the second
stage.

Since zeros are padded before each convolution, the feature maps near the borders tend to
be zero. Therefore, during the minimization of the loss functions in (3) and (6), we ignored
pixels near the borders of the error and the perceptual error maps. Four rows or columns at each
border were excluded in the experiment, which compensated for the information loss in the
last two convolutional layers.
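
A minimal sketch of this border exclusion applied before the mean squared error; e_pred is a hypothetical name for the predicted error map, and the reliability weighting used in each stage is omitted here for brevity.

% Ignore four rows/columns at each border of the maps before computing the MSE.
b      = 4;
crop   = @(m) m(1+b:end-b, 1+b:end-b);
mse_er = mean((crop(e_pred) - crop(e_gt)).^2, 'all');  % stage-1 loss on the cropped maps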

For better convergence of optimization, the adaptive moment estimation optimizer
(ADAM) [27] with Nesterov momentum [28] was employed in place of the regular stochastic
gradient descent method. The default hyperparameters suggested in the literature [27] were used
for the ADAM, and the momentum parameter was set to 0.9. The learning rate was set
differently for each data set, from 2 × 10−4 to 5 × 10−4; we chose the optimal value empirically. In
addition, during training of the second stage, the learning rates for the pretrained layers were
multiplied by 0.1. For weight decay, L2 regularization was applied to all the layers (L2 penalty
multiplied by 5 × 10−4).
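
For reference, a hedged MATLAB sketch of comparable optimizer settings using trainingOptions is shown below; the Nesterov-momentum variant of ADAM described above is not directly available there, so plain ADAM is used, and the epoch count is an assumption.

% Optimizer settings sketch (plain ADAM; the Nesterov variant is not represented).
opts = trainingOptions('adam', ...
    'InitialLearnRate', 2e-4, ...   % chosen per data set between 2e-4 and 5e-4
    'L2Regularization', 5e-4, ...   % weight decay applied to all layers
    'MaxEpochs', 100);              % assumed; the text does not state an epoch count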

4.3.7 Patch-Based Training

In the DIQA framework, the sizes of the input images must be fixed to train the model on a GPU.
Therefore, to train the DIQA using images of various sizes, such as in the LIVE IQA database
[5], each input image is divided into multiple patches of the same size. Here, the step of
the sliding window is determined by the patch size and the number of ignored pixels around the
borders, so as to avoid overlapping regions when the perceptual error map is reconstructed. When
the number of ignored pixels around the borders is four, the step is given by steppatch = sizepatch − 32,
where 32 is obtained as 4 × 2 (both sides of the border) × 4 (upscaling by 4). In the experiment with the
LIVE IQA database, the patch size was 112 × 112 and each step was 80 × 80. In addition, during
the training of the second stage, all patches composing an image should be in the same mini-
batch [21], so that v, μr, and σIlowd can be derived from the reconstructed perceptual error and
reliability maps.
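
A minimal MATLAB sketch of this sliding-window patch extraction for the LIVE IQA setting (112 × 112 patches with step 112 − 32 = 80), applied to the normalized distorted image Ihat_d from the earlier sketch:

% Sliding-window patching: step = patch size - 32, so the reconstructed 1/4-size
% maps do not overlap after ignoring 4 border pixels on each side (4 x 2 x 4 = 32).
ps = 112;  step = ps - 32;          % 80-pixel step
[H, W]  = size(Ihat_d);
patches = {};
for y = 1:step:H-ps+1
    for x = 1:step:W-ps+1
        patches{end+1} = Ihat_d(y:y+ps-1, x:x+ps-1); %#ok<SAGROW>
    end
end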

Chapter 5

SIMULATION RESULTS

Fig. 5.1 Simulation results

Chapter 6

PROGRAM
clc
clear all
close all
warning off
[filename, pathname] = uigetfile({'*.jpg'},'pick an image');
if isequal(filename,0) || isequal(pathname,0)
warndlg('File is not selected');
else
end

[pathstr,name,ext] = fileparts(filename);
filename11=char(filename);

I=imread(filename);
I=imresize(I,[256 256]);
nSig=imnoise(I,'poisson');
figure,imshow(nSig);
imwrite(nSig,'[Link]');
img=imcrop(nSig);
figure,imshow(img);title('Original image');
%%%%%%%%%%%%%%% Apply Discrete wavelet transform %%%%%%%%%%%%%%

[cA,cH,cV,cD]=dwt2(I ,'haar');
figure,subplot(2,2,1),imshow(mat2gray(cA));title('LL Image');
subplot(2,2,2),imshow(cH);title('LH Image');
subplot(2,2,3),imshow(cV);title('HL Image');
subplot(2,2,4),imshow(cD);title('HH Image');

%%%%%%%%% Calculate gradient vector for each DWT scale %%%%%%%%%%


if size(cA,3)==3
cA=rgb2gray(cA);
cH=rgb2gray(cH);
cV=rgb2gray(cV);
cD=rgb2gray(cD);
end
[FX1, Gdir1] = imgradient(cA,'prewitt');
figure,subplot(2,2,1),imshow(mat2gray(FX1));title('gradient LL comp');
[FX2, Gdir2] = imgradient(cH,'prewitt');
subplot(2,2,2),imshow(mat2gray(FX2));title('gradient LH comp');
[FX3, Gdir3] = imgradient(cV,'prewitt');
subplot(2,2,3),imshow(mat2gray(FX3));title('gradient HL comp');
[FX4, Gdir4] = imgradient(cD,'prewitt');
subplot(2,2,4),imshow(mat2gray(FX4));title('gradient HH comp');
a=imresize(I,[30 30]);
a = double(I);
[nr,nc]=size(I);
% [dx,dy] = gradient(I);
[x y] = meshgrid(1:nc,1:nr);
u = x;

v = y;
quiver(x,y,u,v);

[Gmag, Gdir] = imgradient(I,'prewitt');


figure
imshowpair(Gmag, Gdir, 'montage');
title('Gradient Magnitude, Gmag (left), and Gradient Direction, Gdir (right)')

% 1)Select an initial estimate for T


T = 128;
T0 = .5;
G1 = I > T;
G2 = I <= T;
% 3)Compute the average gray level values mean1 and
% mean2 for the pixels in regions G1 and G2.
meanGL1 = mean(I(G1))
meanGL2 = mean(I(G2))
% 4)Compute a new threshold value
Tnew=(1/2) * (meanGL1 +meanGL2)
if (Tnew - T) < T0
end
%%%%%%% algorithm
k=0
I=nSig;
nSig=max(I);
% x=nSing+k(I(nSig));
k=6; %%%%%%%%%%%%%%%%%%%%% upto 6 orientations
[Faf, Fsf] = FSdoubledualfilt;
[af, sf] = doubledualfilt;
im1 = double(imread('[Link]'));
im2 = double(imread('[Link]'));
% image decomposition
y = gradient_HP(im1,k);
w1=gray2rgb(y);
w1=num2cell(w1(:,1:7),2);
%w1=w1(:,1:6);
T = 6; % choose a threshold of 6
y = gradient_HP(im2,T);

w2=gray2rgb(y);
w2=num2cell(w2(:,1:7),2);
e_j=std(y);
disp('standard deviation of energy---');
disp(e_j);
clc;
% Image fusion process start here
for k1=1:k
for p=1:2
for d1=1:2
for d2=1:3
k=6;

x = w1{k};
y = w2{k};
x=sort(x); %%%%%%%%%%% sorting or ordering rank coefficients
D = (abs(x)-abs(y)) >= 0;
wf{k1}{p}{d1}{d2} = D.*x + (~D).*y; % keep the coefficient with larger magnitude (fusion rule)
end
end
end
end
[nndata AlexNet]=size(wf);
nn_data_image=0:0.01:nndata;
y=nn_data_image.^2;

net=newff(minmax(nn_data_image),[20,AlexNet],{'logsig','purelin'},'trainlm');
[Link]=4000;
[Link]=1e-25;
[Link]=0.01;
% net=train(net,nn_data_image,y);

out_nn=y(20);
out_nn=net(nn_data_image(20));
nn_data=(ceil(out_nn)./20)/out_nn;
out_AlexNet=ceil(nn_data);
out_AlexNet=length(out_AlexNet)-2;
%%%%%%%%% Apply inverse double density DWT %%%%%%%%%
load w
y = Iterative_HS(w,out_AlexNet,Fsf,sf);
y=imadd(y,33);
y=double(y);
figure; imshow(mat2gray(y));title('NR-IQA image');
imwrite(y,'out_img.jpg');

y=imresize(y,[256 256]);
nSig=imresize(nSig,[256 256]);

[PSNR,MSE,MAXERR,L2RAT]=measerr(nSig,y)

Chapter 7

CONCLUSION
We described a deep CNN-based NR-IQA framework. Applying a CNN to NR-IQA is a
challenging issue, because the subjective databases available for training are small relative to the
number of parameters in a deep model. In the DIQA, an objective error map was used as an
intermediate regression target to avoid overfitting on the limited database. When the first training
stage is not run long enough, the DIQA suffers from overfitting, which degrades performance.
The input normalization and the reliability map also increased the accuracy significantly. The
final DIQA model outperformed all the benchmarked full-reference methods as well as no-
reference methods. We further showed that the performance of the DIQA is independent of the
selection of the database. We additionally proposed the DIQA-SENS to visualize and analyze the
learned perceptual error maps. The perceptual error maps followed the behavior of the HVS. In
the future, we will investigate a new way to obtain more meaningful sensitivity maps that can
provide a more interpretable analysis with respect to the HVS.

REFERENCES

[1] A. K. Moorthy and A. C. Bovik, “Blind image quality assessment: From natural scene
statistics to perceptual quality,” IEEE Trans. Image Process., vol. 20, no. 12, pp. 3350–3364,
Dec. 2011.

[2] M. A. Saad, A. C. Bovik, and C. Charrier, “Blind image quality assessment: A natural
scene statistics approach in the DCT domain,” IEEE Trans. Image Process., vol. 21, no. 8, pp.
3339–3352, Aug. 2012.

[3] W. Hou, X. Gao, D. Tao, and X. Li, “Blind image quality assessment via deep learning,”
IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 6, pp. 1275–1286, Jun. 2015.

[4] Y. Li et al., “No-reference image quality assessment with shearlet transform and deep
neural networks,” Neurocomputing, vol. 154, pp. 94–109, Apr. 2015.

[5] H. R. Sheikh, M. F. Sabir, and A. C. Bovik, “A statistical evaluation of recent full reference
image quality assessment algorithms,” IEEE Trans. Image Process., vol. 15, no. 11, pp.
3440–3451, Nov. 2006.

[6] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale
hierarchical image database,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun.
2009, pp. 248–255.

[7] S. J. Daly, “The visible differences predictor: An algorithm for the assessment of image
fidelity,” Proc. SPIE, vol. 1666, pp. 179–206, Jan. 1992.

[8] A. B. Watson and A. J. Ahumada, “A standard model for foveal detection of spatial
contrast,” J. Vis., vol. 5, no. 9, p. 6, 2005.

[9] G. E. Legge and J. M. Foley, “Contrast masking in human vision,” J. Opt. Soc. Amer., vol.
70, no. 12, pp. 1458–1471, Dec. 1980.

[10] D. J. Field, “Relations between the statistics of natural images and the response properties
of cortical cells,” J. Opt. Soc. Amer. A, Opt. Image Sci., vol. 4, no. 12, pp. 2379–2394, 1987.

[11] A. Mittal, A. K. Moorthy, and A. C. Bovik, “No-reference image quality assessment in
the spatial domain,” IEEE Trans. Image Process., vol. 21, no. 12, pp. 4695–4708, Dec. 2012.

[12] A. Mittal, R. Soundararajan, and A. C. Bovik, “Making a ‘completely blind’ image
quality analyzer,” IEEE Signal Process. Lett., vol. 20, no. 3, pp. 209–212, Mar. 2013.

[13] C. Li, A. C. Bovik, and X. Wu, “Blind image quality assessment using a general
regression neural network,” IEEE Trans. Neural Netw., vol. 22, no. 5, pp. 793–799, May 2011.

[14] H. Tang, N. Joshi, and A. Kapoor, “Learning a blind measure of perceptual image
quality,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2011, pp. 305–312.

[15] P. Ye, J. Kumar, L. Kang, and D. Doermann, “Unsupervised feature learning framework
for no-reference image quality assessment,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit.
(CVPR), Jun. 2012, pp. 1098–1105.

[16] J. Xu, P. Ye, Q. Li, H. Du, Y. Liu, and D. Doermann, “Blind image quality assessment
based on high order statistics aggregation,” IEEE Trans. Image Process., vol. 25, no. 9, pp.
4444–4457, Sep. 2016.

[17] Y. Lv, G. Jiang, M. Yu, H. Xu, F. Shao, and S. Liu, “Difference of Gaussian statistical
features based blind image quality assessment: A deep learning approach,” in Proc. IEEE Int.
Conf. Image Process. (ICIP), Sep. 2015, pp. 2344–2348.

[18] D. Ghadiyaram and A. C. Bovik, “Feature maps driven no-reference image quality
prediction of authentically distorted images,” Proc. SPIE, vol. 9394, p. 93940J, Mar. 2015.

[19] L. Kang, P. Ye, Y. Li, and D. Doermann, “Convolutional neural networks for no-reference
image quality assessment,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR),
Jun. 2014, pp. 1733–1740.

[20] S. Bosse, D. Maniry, T. Wiegand, and W. Samek, “A deep neural network for image
quality assessment,” in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2016, pp. 3773–3777.

[21] J. Kim and S. Lee, “Fully deep blind image quality predictor,” IEEE J. Sel. Topics Signal
Process., vol. 11, no. 1, pp. 206–220, Feb. 2017.

[22] K. Ma, W. Liu, T. Liu, Z. Wang, and D. Tao, “dipIQ: Blind image quality assessment by
learning-to-rank discriminable image pairs,” IEEE Trans. Image Process., vol. 26, no. 8, pp.
3951–3964, Aug. 2017.

[23] K. Ma, W. Liu, K. Zhang, Z. Duanmu, Z. Wang, and W. Zuo, “End-to-end blind image
quality assessment using deep neural networks,” IEEE Trans. Image Process., vol. 27, no. 3,
pp. 1202–1213, Mar. 2018.

[24] K. Simonyan and A. Zisserman. (Sep. 2014). “Very deep convolutional networks for
large-scale image recognition.” [Online]. Available: [Link]

[25] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann
machines,” in Proc. Int. Conf. Mach. Learn. (ICML), 2010, pp. 807–814.

[26] H. R. Sheikh and A. C. Bovik, “Image information and visual quality,”

IEEE Trans. Image Process., vol. 15, no. 2, pp. 430–444, Feb. 2006.

[27] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in Proc. Int. Conf.
Learn. Represent. (ICLR), 2015, pp. 1–15.

[28] T. Dozat, “Incorporating Nesterov momentum into Adam,” in Proc. ICLR Workshop,
2016.

[29] E. C. Larson and D. M. Chandler, “Most apparent distortion: Full-reference image quality
assessment and the role of strategy,” J. Electron. Imag., vol. 19, no. 1, p. 011006, 2010.

[30] N. Ponomarenko et al., “Image database TID2013: Peculiarities, results and perspectives,”
Signal Process., Image Commun., vol. 30, pp. 57–77, Jan. 2015. [Online]. Available: [Link]

[31] D. Jayaraman, A. Mittal, A. K. Moorthy, and A. C. Bovik, “Objective quality assessment
of multiply distorted images,” in Proc. 46th Asilomar Conf. Signals, Syst. Comput.
(ASILOMAR), Nov. 2012, pp. 1693–1697.

[32] D. Ghadiyaram and A. C. Bovik, “Massive online crowdsourced study of subjective and
objective picture quality,” IEEE Trans. Image Process., vol. 25, no. 1, pp. 372–387, Jan. 2016.

[33] K. Ma et al., “Waterloo exploration database: New challenges for image quality assessment
models,” IEEE Trans. Image Process., vol. 26, no. 2, pp. 1004–1016, Feb. 2017.

[34] Final Report From the Video Quality Experts Group on the Validation of Objective
Models of Video Quality Assessment, Phase II (FR-TV2), document ITU-T SG09, VQEG, 2003.

[35] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment:
From error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp.
600–612, Apr. 2004.

[36] L. Zhang, L. Zhang, X. Mou, and D. Zhang, “FSIM: A feature similarity index for image
quality assessment,” IEEE Trans. Image Process., vol. 20, no. 8, pp. 2378–2386, Aug. 2011.

[37] J. Kim and S. Lee, “Deep learning of human visual sensitivity in image quality
assessment framework,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017,
pp. 1969–1977.

[38] L. Zhang, L. Zhang, and A. C. Bovik, “A feature-enriched completely blind image quality
evaluator,” IEEE Trans. Image Process., vol. 24, no. 8, pp. 2579–2591, Aug. 2015.

[39] W. Xue, X. Mou, L. Zhang, A. C. Bovik, and X. Feng, “Blind image quality assessment
using joint statistics of gradient magnitude and Laplacian features,” IEEE Trans. Image Process.,
vol. 23, no. 11, pp. 4850–4862, Nov. 2014.

[40] Q. Li, W. Lin, J. Xu, and Y. Fang, “Blind image quality assessment using statistical
structural and luminance features,” IEEE Trans. Multimedia, vol. 18, no. 12, pp. 2457–2469,
Dec. 2016.

[41] A. Liu, W. Lin, M. Paul, C. Deng, and F. Zhang, “Just noticeable difference for images with
decomposition model for separating edge and textured regions,” IEEE Trans. Circuits Syst.
Video Technol., vol. 20, no. 11, pp. 1648–1652, Nov. 2010.

[42] J. Kim, H. Zeng, D. Ghadiyaram, S. Lee, L. Zhang, and A. C. Bovik, “Deep convolutional
neural models for picture-quality prediction: Challenges and solutions to data-driven image
quality assessment,” IEEE Signal Process. Mag., vol. 34, no. 6, pp. 130–141, 2017.

[43] J. Kim and S. Lee, “Deep learning of human visual sensitivity in image quality assessment
framework,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 1676–1684.
