Emerging Research in Computing, Information, Communication and Applications
N. R. Shetty
L. M. Patnaik
N. H. Prasad Editors
Emerging Research
in Computing,
Information,
Communication
and Applications
Proceedings of ERCICA 2022
Lecture Notes in Electrical Engineering
Volume 928
Series Editors
Leopoldo Angrisani, Department of Electrical and Information Technologies Engineering, University of Napoli
Federico II, Naples, Italy
Marco Arteaga, Departament de Control y Robótica, Universidad Nacional Autónoma de México, Coyoacán,
Mexico
Bijaya Ketan Panigrahi, Department of Electrical Engineering, Indian Institute of Technology Delhi,
New Delhi, Delhi, India
Samarjit Chakraborty, Fakultät für Elektrotechnik und Informationstechnik, TU München, Munich, Germany
Jiming Chen, Zhejiang University, Hangzhou, Zhejiang, China
Shanben Chen, Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
Tan Kay Chen, Department of Electrical and Computer Engineering, National University of Singapore,
Singapore, Singapore
Rüdiger Dillmann, Humanoids and Intelligent Systems Laboratory, Karlsruhe Institute for Technology,
Karlsruhe, Germany
Haibin Duan, Beijing University of Aeronautics and Astronautics, Beijing, China
Gianluigi Ferrari, Università di Parma, Parma, Italy
Manuel Ferre, Centre for Automation and Robotics CAR (UPM-CSIC), Universidad Politécnica de Madrid,
Madrid, Spain
Sandra Hirche, Department of Electrical Engineering and Information Science, Technische Universität
München, Munich, Germany
Faryar Jabbari, Department of Mechanical and Aerospace Engineering, University of California, Irvine, CA,
USA
Limin Jia, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Alaa Khamis, German University in Egypt El Tagamoa El Khames, New Cairo City, Egypt
Torsten Kroeger, Stanford University, Stanford, CA, USA
Yong Li, Hunan University, Changsha, Hunan, China
Qilian Liang, Department of Electrical Engineering, University of Texas at Arlington, Arlington, TX, USA
Ferran Martín, Departament d’Enginyeria Electrònica, Universitat Autònoma de Barcelona, Bellaterra,
Barcelona, Spain
Tan Cher Ming, College of Engineering, Nanyang Technological University, Singapore, Singapore
Wolfgang Minker, Institute of Information Technology, University of Ulm, Ulm, Germany
Pradeep Misra, Department of Electrical Engineering, Wright State University, Dayton, OH, USA
Sebastian Möller, Quality and Usability Laboratory, TU Berlin, Berlin, Germany
Subhas Mukhopadhyay, School of Engineering and Advanced Technology, Massey University,
Palmerston North, Manawatu-Wanganui, New Zealand
Cun-Zheng Ning, Department of Electrical Engineering, Arizona State University, Tempe, AZ, USA
Toyoaki Nishida, Graduate School of Informatics, Kyoto University, Kyoto, Japan
Luca Oneto, Department of Informatics, Bioengineering, Robotics and Systems Engineering, University of
Genova, Genova, Genova, Italy
Federica Pascucci, Dipartimento di Ingegneria, Università degli Studi “Roma Tre”, Rome, Italy
Yong Qin, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China
Gan Woon Seng, School of Electrical and Electronic Engineering, Nanyang Technological University,
Singapore, Singapore
Joachim Speidel, Institute of Telecommunications, Universität Stuttgart, Stuttgart, Germany
Germano Veiga, Campus da FEUP, INESC Porto, Porto, Portugal
Haitao Wu, Academy of Opto-electronics, Chinese Academy of Sciences, Beijing, China
Walter Zamboni, DIEM—Università degli studi di Salerno, Fisciano, Salerno, Italy
Junjie James Zhang, Charlotte, NC, USA
The book series Lecture Notes in Electrical Engineering (LNEE) publishes the
latest developments in Electrical Engineering—quickly, informally and in high
quality. While original research reported in proceedings and monographs has
traditionally formed the core of LNEE, we also encourage authors to submit books
devoted to supporting student education and professional training in the various
fields and application areas of electrical engineering. The series covers classical and
emerging topics concerning:
• Communication Engineering, Information Theory and Networks
• Electronics Engineering and Microelectronics
• Signal, Image and Speech Processing
• Wireless and Mobile Communication
• Circuits and Systems
• Energy Systems, Power Electronics and Electrical Machines
• Electro-optical Engineering
• Instrumentation Engineering
• Avionics Engineering
• Control Systems
• Internet-of-Things and Cybersecurity
• Biomedical Devices, MEMS and NEMS
For general information about this book series, comments or suggestions, please
contact [email protected].
To submit a proposal or request further information, please contact the Publishing
Editor in your country:
China
Jasmine Dou, Editor ([email protected])
India, Japan, Rest of Asia
Swati Meherishi, Editorial Director ([email protected])
Southeast Asia, Australia, New Zealand
Ramesh Nath Premnath, Editor ([email protected])
USA, Canada
Michael Luby, Senior Editor ([email protected])
All other Countries
Leontina Di Cecco, Senior Editor ([email protected])
** This series is indexed by EI Compendex and Scopus databases. **
N. R. Shetty · L. M. Patnaik · N. H. Prasad
Editors
Emerging Research
in Computing, Information,
Communication
and Applications
Proceedings of ERCICA 2022
Editors
N. R. Shetty, Nitte Meenakshi Institute of Technology, Bengaluru, Karnataka, India
L. M. Patnaik, National Institute of Advanced Studies (NIAS), Bengaluru, Karnataka, India
N. H. Prasad, Nitte Meenakshi Institute of Technology, Bengaluru, Karnataka, India
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2023
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Organizing Committee
ERCICA-2022
ERCICA 2022—Committees
Chief Patrons
Conference Chair
Program Chairs
Publication
Springer-LNEE Series
Advisory Chairs
Advisory Committee
Program Committee
Dr. Ankit Singhal, Power System Research Engineer, Pacific Northwest National
Laboratory, USA
Dr. M. A. Ajay Kumara, D&H Schort School of Computing Sciences and Mathe-
matics, Lenoir-Rhyne University, Hickory, NC, USA
Hardik Gohel, Department of Computer Science, School of Art and IT Sciences,
University of Houston, Victoria, TX, USA
Dr. Sanjeev K. Cowlessur, Department of Software Engineering, Faculty of Infor-
mation and Communication Technology, Université des Mascareignes, Beau Plan,
Pamplemousses, Mauritius
Mr. Sreenivas Divi, Director of IT and Product Management, 46030, Manekin Plaza,
Suite 150, Sterling, VA 20166, USA
Mr. Hemant Gaur, Principal Program Manager, Microsoft Power Apps, Redmond, WA 98054, USA
Prof. Subarna Shakya, Professor, Department of Electronics and Computer
engineering, Pulchowk Campus, Institute of Engineering, Tribhuvan University,
Pulchowk, Lalitpur, Nepal
Prof. Savitri Bevinakoppa, School of Information Technology and Engineering
(SITE), Academic Department, The Argus, Level 6, 284–294 La Trobe Street,
Melbourne Victoria 3000, Australia
Dr. Tom Wallingo, School of Electrical, Electronics and Computer Engineering,
Howard College Campus, University of KwaZulu-Natal, Durban, South Africa
Organizing Co-Chairs
Acknowledgments
First of all, we would like to thank Professor N. R. Shetty who has always been
the guiding force behind this event’s success. It was his dream that we have striven
to make a reality. Our thanks to Dr. H. C. Nagaraj, who has monitored the whole
activity of the conference from the beginning till its successful end.
Our special thanks to Springer, and especially to the editorial staff, who were patient, meticulous and friendly with their constructive criticism on the quality of papers, and with outright rejection at times, without ever compromising on quality, as they are known for publishing the best international papers.
We would like to express our gratitude to all the review committee members of
all the themes of computing, information, communication and applications and the
best paper award review committee members.
Finally, we would like to express our heartfelt gratitude and warmest thanks to the
ERCICA 2022 organizing committee members for their hard work and outstanding
efforts. We know how much time and energy this assignment demanded, and we
deeply appreciate all the efforts to make it a grand success.
Our special thanks to all the authors who contributed their research work to this conference and participated in making it a grand success. Thanks to everyone who directly or indirectly contributed to the success of ERCICA 2022.
Program Chairs
ERCICA 2022
About the Conference
ERCICA 2022
Contents
L. M. Patnaik obtained his Ph.D. in 1978 in the area of real-time systems and D.Sc.
in 1989 in the areas of computer systems and architectures, both from the Indian
Institute of Science (IISc), Bengaluru. From March 2008 to August 2011, he was
Vice-Chancellor, Defence Institute of Advanced Technology, Deemed University,
Pune. Currently, he is Honorary Professor with the Department of Electronic Systems
Engineering, Indian Institute of Science, Bengaluru, and INSA Senior Scientist
and Adjunct Professor with the National Institute of Advanced Studies, Bengaluru.
During the last 50 years of his long service, his teaching, research, and development
interests have been in the areas of parallel and distributed computing, computer archi-
tecture, CAD of VLSI systems, high-performance computing, mobile computing,
theoretical computer science, real-time systems, soft computing and computational
neuroscience including machine cognition. In these areas, he has 1286 publications
in refereed international journals and refereed international conference proceedings
including 30 technical reports, 43 books, and 26 chapters in books.
1 Introduction
Depletion of raw materials required for conventional coal power plants, changing
rain patterns, environmental concerns such as pollution and climatic changes and
growth of deregulated energy markets have turned the focus in the energy sector
to naturally available renewable energy sources (RESs). Being situated near the load ends, these sources are also referred to as distributed generation (DG) sources. In addition to being environmentally friendly, these small-scale sources, when properly sized and located, can provide technical benefits such as reduced system losses, an improved voltage profile, and better voltage stability owing to their proximity to the loads. Though DG sources were earlier considered largely the preserve of independent owners, with distributed sources becoming an integral part of the electrical system, utilities today view themselves as cost-effective distributed generation providers.
Optimal deployment of DG has been a topic of continued interest among
researchers. Earlier research mainly concentrated on DG sources (DGs) which
supplied only active power [1, 2] or only reactive power [3]. But with high penetra-
tion of DGs, expectation of reactive power from the sources also increased. Later
studies have therefore concentrated on optimal power factor (pf) determination of DG
along with their positions and sizes. The methods have been analytical or heuristic
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
N. R. Shetty et al. (eds.), Emerging Research in Computing, Information, Communication
and Applications, Lecture Notes in Electrical Engineering 928,
https://doi.org/10.1007/978-981-19-5482-5_1
to improve the exploration and exploitation capabilities of the basic TLBO algo-
rithm. The basic TLBO algorithm considers the learning enthusiasm of all learners
to be same. Learning enthusiasm-based TLBO (LebTLBO) was proposed by the
authors in [18]. LebTLBO considers a learning enthusiasm value for each learner.
This improves the search efficiency of the algorithm.
This paper considers the optimal deployment and operational power factor of
DGs in a distribution system embedded with DisCo-owned DGs. With PV systems
constituting 50% of the renewable energy sources [19], the DGs considered are
PV systems with inverter control capability. A multi-objective function has been
formulated with the aim of reducing distribution losses, improving voltage stability,
and reducing energy costs. The proposed solution is based on LebTLBO algorithm
and backward/forward sweep load flow method [20]. The study assumes availability
of storage backed PV sources for continuous supply.
2 Problem Formulation
If the distribution losses are minimized, the active power delivered to the load increases. The distribution loss minimization can be formulated as in (1), where nbr is the number of branches and $I_k$ and $R_k$ are the current through and the resistance of the kth branch.

$f_1 = \sum_{k=1}^{nbr} I_k^2 R_k$  (1)
Improvement of voltage stability can be analyzed with voltage stability indices. The voltage stability index (VSI), introduced in [21], can be determined at each node; the least value is considered as the index of the system, and the maximum value of the index is 1. For a line connecting buses i and j, the voltage stability index at the jth bus can be determined as shown in Eq. (2), where $V_i$ is the voltage of the ith bus and $R_{line}$ and $X_{line}$ are the resistance and reactance of the line.

$f_2 = |V_i|^4 - 4\,(P_j X_{line} - Q_j R_{line})^2 - 4\,(P_j R_{line} + Q_j X_{line})\,|V_i|^2$  (2)
Real power demand (P) from the grid is given by (3). As the distribution companies are the DG owners, it is important that the company derives the maximum operational benefit by minimizing the amount of energy purchased from the grid. $P_i$ is the active load at bus i, $P_{DGn}$ is the real power delivered by the nth DG, and $N_{DG}$ is the number of buses with DG.

$P = \sum_{i=1}^{N} P_i - \sum_{n=1}^{N_{DG}} P_{DGn} + P_{loss}$  (3)
With DG sources considered as renewable energy sources, the fuel costs are
negligible. The energy cost incurred by the utility (DG operator) consists of the cost
to be paid for energy bought from grid. The energy cost can be calculated as in (4).
C p is the cost of energy, and P the number of units bought from the grid.
f3 = P × C p (4)
2.2 Constraints
Constraints in this optimization problem are given by Eqs. (5) and (6).
The voltage at every bus must be kept within standard limits. Equation (5) represents
the voltage constraint. The limits considered are –6% to +6%, i.e., 0.94 pu and 1.06
pu, respectively.
$OF = \min\{k_1 f_1 + k_2 (1 - f_2) + k_3 f_3\}$  (7)
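As a rough illustration of how the three objectives combine into Eq. (7), the sketch below evaluates f1, f2 and f3 for one candidate DG deployment. It assumes that the branch currents, sending-end bus voltages and line power flows have already been obtained from a backward/forward sweep load flow (not shown), and all variable names and weight values are illustrative rather than taken from the paper.

```python
import numpy as np

def objective(branch_I, branch_R, bus_Vi, line_R, line_X, P_j, Q_j,
              P_load, P_dg, C_p, k1=1.0, k2=1.0, k3=1.0):
    """Weighted objective of Eq. (7) for one candidate DG deployment.
    Electrical quantities are assumed to be in per unit and to come from a
    backward/forward sweep load flow solved beforehand."""
    # f1: total distribution loss, Eq. (1)
    f1 = np.sum(branch_I ** 2 * branch_R)

    # f2: minimum voltage stability index over all receiving-end buses, Eq. (2);
    # bus_Vi holds the sending-end voltage of each line, P_j/Q_j the real and
    # reactive power fed through each line to its receiving end.
    vsi = (bus_Vi ** 4
           - 4.0 * (P_j * line_X - Q_j * line_R) ** 2
           - 4.0 * (P_j * line_R + Q_j * line_X) * bus_Vi ** 2)
    f2 = vsi.min()

    # f3: cost of energy purchased from the grid, Eqs. (3) and (4)
    P_grid = np.sum(P_load) - np.sum(P_dg) + f1
    f3 = P_grid * C_p

    return k1 * f1 + k2 * (1.0 - f2) + k3 * f3
```

In the optimization, such a function would serve as the fitness evaluated for each learner vector (candidate DG sites, sizes and power factors).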
The basic TLBO algorithm is based on the influence of teacher and peers on the
learning of a student. The algorithm involves two phases—the teacher phase and
the student phase. The teacher phase accounts for the influence of the teacher, and the learner phase accounts for the influence of peers on a student's result. However, the learning enthusiasm (LE) of each student is different in the real-world scenario.
Therefore, in LebTLBO, the learning enthusiasm mechanism has been introduced
into the teacher phase and the learner phase. LebTLBO, in addition, has a poor
student tutoring phase.
In basic TLBO, learners represent the population and subjects represent the design variables. Let $x_i = (x_i^1, x_i^2, x_i^3, \ldots, x_i^d)$ be the ith learner vector, where 1, 2, …, d index the design variables (subjects). After evaluating the fitness function, the learner with the best fitness is chosen as the teacher. In a class of NL learners, the mean position can be taken as (8).

$x_{mean}^d = \frac{1}{NL}\sum_{i=1}^{NL} x_i^d$  (8)
Each learner position is modified in the teacher phase as Eq. (9). T F is the teacher
factor which is either 1 or 2.
The modified vectors from teacher phase become the input to the learner phase.
Learner phase involves the interaction of learner i with learner j, and the ith learner
vector is modified as in Eq. (10).
3.2 LebTLBO
It is assumed that learners with high grades have good LE, while learners with low
grades have poor LE. Therefore, the learners are sorted in the order of best to worst
based on grades Eq. (11). For a problem with minimization of fitness value involving
NL learners, let
The LE value for the kth learner is then obtained as Eq. (12).
$LE_k = LE_{min} + (LE_{max} - LE_{min})\,\frac{NL - k}{NL}$  (12)

where k = 1, 2, …, NL, $LE_{max} = 1$ and $LE_{min} \in [0.1, 0.5]$.
The teacher is considered the best performer of the class. For every learner $x_k$, a random number $r_k \in [0, 1]$ is generated. If $r_k < LE_k$, then it is considered that the learner $x_k$ will learn from the teacher; otherwise, the learner will disregard the knowledge from the teacher. A diversity-enhanced teaching strategy is then used for a learner who is considered to learn from the teacher. The updated vector is given by (13).
$x_{k,new}^d = \begin{cases} x_{k,old}^d + rand_2\,(x_{teacher}^d - T_F \times x_{mean}^d) & \text{if } rand_1 < 0.5 \\ x_{r_1}^d + F\,(x_{r_2}^d - x_{r_3}^d) & \text{if } rand_1 \ge 0.5 \end{cases}$  (13)

where $r_1$, $r_2$ and $r_3$ are distinct integers selected randomly from {1, 2, …, NL}, d ∈ {1, 2, …, D}, $rand_1$ and $rand_2$ are two uniformly distributed random numbers in the range [0, 1], and F is a scale factor in [0, 1]. The update of the current vector employs a mix of basic TLBO and the differential evolution mutation operator, unlike basic TLBO, which uses the same differential vector $x_{teacher} - T_F \times x_{mean}$ to steer every learner
to the level of teacher. If the vector xk,new is better than xk,old , it is accepted, else the
value of xk,old is retained.
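A minimal sketch of this learning-enthusiasm gated teacher phase is given below, assuming a population matrix X of shape (NL, D) kept sorted from best to worst, a fitness array, and a user-supplied fitness function fit() for a minimization problem. The per-dimension update of Eq. (13) is applied at the vector level here for brevity, and all names and default values are illustrative.

```python
import numpy as np

def teacher_phase(X, fitness, fit, LE_min=0.3, F=0.5):
    """One LebTLBO teacher phase; X is (NL, D) sorted from best to worst."""
    NL, D = X.shape
    # Eq. (12): learning enthusiasm decreases linearly from best to worst learner
    ranks = np.arange(1, NL + 1)
    LE = LE_min + (1.0 - LE_min) * (NL - ranks) / NL      # LE_max = 1
    teacher, x_mean = X[0], X.mean(axis=0)                # best learner, Eq. (8)
    for i in range(NL):
        if np.random.rand() >= LE[i]:
            continue                                      # learner ignores the teacher
        TF = np.random.randint(1, 3)                      # teacher factor: 1 or 2
        if np.random.rand() < 0.5:                        # Eq. (13), TLBO-style move
            new = X[i] + np.random.rand(D) * (teacher - TF * x_mean)
        else:                                             # Eq. (13), DE-style mutation
            r1, r2, r3 = np.random.choice(NL, 3, replace=False)
            new = X[r1] + F * (X[r2] - X[r3])
        f_new = fit(new)
        if f_new < fitness[i]:                            # greedy acceptance
            X[i], fitness[i] = new, f_new
    return X, fitness, LE
```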
The learner phase is similar to basic TLBO but with the introduction of learning
enthusiasm. The learning enthusiasm values are defined using (12) after sorting
the learners in the order of grades. As in the teaching phase, a random value $r_k$ is generated, and if $r_k$ is less than $LE_k$, then it is considered that the kth learner can learn from other learners. The updated vector for $x_k$ after interaction with the jth learner is given by Eq. (14).

$x_{k,new} = x_{k,old} + rand\,(x_{k,old} - x_{j,old})$, if $f(x_{k,old})$ is better than $f(x_{j,old})$
$x_{k,new} = x_{k,old} + rand\,(x_{j,old} - x_{k,old})$, if $f(x_{j,old})$ is better than $f(x_{k,old})$  (14)

where rand is a uniformly distributed random vector within the interval [0, 1]. If the vector $x_{k,new}$ is better than $x_{k,old}$, it is accepted; else the value of $x_{k,old}$ is retained.
Students with lower grades have fewer opportunities to improve their grades during the teaching and learning phases than students with higher grades. The third phase, poor student tutoring, helps to resolve this. According to their grades, students are ranked from best to worst. The learners in the bottom ten percent of the ranking are considered poor learners. For each poor learner, a random learner $x_{top}$ from the top fifty percent is selected. The updated vector can be represented as (15).

$x_{k,new} = x_{k,old} + rand\,(x_{top} - x_{k,old})$  (15)
If the vector xk,new is better than xk,old , it is accepted, else the value of xk,old is
retained.
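The remaining two phases can be sketched in the same illustrative style, again for a minimization problem with a fitness callable fit(); the LE array is the one computed from Eq. (12) in the teacher-phase sketch above, and all names are illustrative.

```python
import numpy as np

def learner_phase(X, fitness, fit, LE):
    """LebTLBO learner phase, Eq. (14), gated by learning enthusiasm."""
    NL, D = X.shape
    for k in range(NL):
        if np.random.rand() >= LE[k]:
            continue                                   # learner skips peer interaction
        j = np.random.choice([p for p in range(NL) if p != k])
        if fitness[k] < fitness[j]:                    # x_k better: move away from x_j
            new = X[k] + np.random.rand(D) * (X[k] - X[j])
        else:                                          # x_j better: move toward x_j
            new = X[k] + np.random.rand(D) * (X[j] - X[k])
        f_new = fit(new)
        if f_new < fitness[k]:
            X[k], fitness[k] = new, f_new
    return X, fitness

def poor_student_tutoring(X, fitness, fit):
    """Bottom 10% of learners move toward a random top-50% learner, Eq. (15)."""
    NL, D = X.shape
    order = np.argsort(fitness)                        # best (lowest fitness) first
    poor, top = order[-max(1, NL // 10):], order[:max(1, NL // 2)]
    for k in poor:
        t = np.random.choice(top)
        new = X[k] + np.random.rand(D) * (X[t] - X[k])
        f_new = fit(new)
        if f_new < fitness[k]:
            X[k], fitness[k] = new, f_new
    return X, fitness
```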
The IEEE 33-bus standard test system with a connected load of 3715 kW + 2300 kVAr [22] has been used for the simulations. The PV systems are assumed to have continuous ratings. Optimal sizes and locations are determined considering unity power factor
for the PV inverters. With the same PV deployment, the optimal inverter power
factor is determined with the single objective function of minimizing losses (case 1)
and with the multi-objective function of minimizing distribution losses, maximizing
voltage stability, and minimizing energy cost (case 2). The PV inverter power factor is allowed to vary from 0.8 leading to unity in order to find the optimal operational pf of the PV systems. Table 1 summarizes the base case results.
Optimal PV deployment to minimize distribution losses
The objective function is formulated as in Eq. (1) to minimize distribution losses.
Table 2 summarizes the findings. Compared to the base case, the distribution losses are reduced by 67.15%. Comparative analysis of
results with existing solutions shows the effectiveness of the proposed algorithm. The
proposed method is able to give lower losses as compared to existing methods. The
lower losses are attained at a lower penetration level. To highlight the competitiveness
of LebTLBO, the obtained performance characteristics are compared with that of
other heuristic algorithms like PSO, CSA, and basic TLBO in solving the stated
OPDG problem. This is depicted in Fig. 2.
Optimal PV inverter power factor to minimize distribution losses (Case 1)
The DG deployment is considered same as in Table 2. The variations in load profile
and generation have not been considered here. The results obtained for the 33 bus
test system, listed in Table 3, show that distribution losses reduce and the voltage
Fig. 2 Convergence characteristics of LebTLBO, TLBO and PSO (y-axis: P_loss in kW; x-axis: iteration number)
stability of the system improves when the PV inverters operate at non-unity power
factors. The distribution losses have been reduced to 15.2 kW from the base case loss
of 210.07 kW. The comparative analysis with existing results in literature as given
in Table 3 shows that the PV deployment results given by the proposed method are
able to give lower losses. As the power factor constraint requires the inverter power
factor to be above 0.8, the compared results have been chosen where the power factor
of DG is maintained above 0.8. The proposed method has achieved lower losses as
compared to other methods, and as the results show, this has been achieved with a
lower DG penetration level. The simulation findings demonstrate the efficacy of the
proposed method.
5 Conclusion
In this study, LebTLBO algorithm is used to develop a new approach for deter-
mining the best allocation and operating power factors of DisCo-owned DG sources
in a radial distribution network. With the proposed method, the optimal deployment
of PV systems and optimal PV inverter power factor have been determined with the
objectives of minimizing distribution losses, improving voltage stability, and mini-
mizing energy cost. Comparative analysis with existing results has been presented
to show the effectiveness of the LebTLBO algorithm. The multi-objective function
improves the voltage stability and reduces the system losses. The improved utilization
of the PV system capacities reduces the incurred energy cost. The total operational
cost for the DisCos therefore reduces. In this work, variations in network load profiles
and DG powers are ignored, which can result in changes in voltage stability and even
economic issues. This can be considered an extension.
References
20. Haque MH (1996) Efficient load flow method for distribution systems with radial or mesh
configuration. IEE Proc Gener Transm Distrib 143(1):33–38
21. Chakravorty M, Das D (2001) Voltage stability analysis of radial distribution networks. Int J
Electr Power Energy Syst 23(2):129–135
22. Kashem MA, Ganapathy V, Jasmon GB, Buhari MI (2000) A novel method for loss minimiza-
tion in distribution networks. In: Proceedings of the international conference on electric utility
deregulation and restructuring and power technologies, City University, London, pp 251–256
23. Kefayat M, Lashkar Ara A, Nabavi Niaki SA (2015) A hybrid of ant colony optimization and
artificial bee colony algorithm for probabilistic optimal placement and sizing of distributed
energy resources. Energy Convers Manage 92:149–161
24. Mahmoud K, Yorino N, Ahmed A (2016) Optimal distributed generation allocation in
distribution systems for loss minimization. IEEE Trans Power Syst 31(2):960–969
25. Injeti SK, Prema Kumar N (2013) A novel approach to identify optimal access point and
capacity of multiple DGs in a small, medium and large scale radial distribution systems. Int J
Electr Power Energy Syst 45(1):142–151
26. Hung DQ, Mithulananthan N (2013) Multiple distributed generator placement in primary
distribution networks for loss reduction. IEEE Trans Ind Electron 60(4):1700–1708
27. Muthukumar K, Jayalalitha S (2016) Optimal placement and sizing of distributed generators
and shunt capacitors for power loss minimization in radial distribution networks using hybrid
heuristic search optimization technique. Int J Electr Power Energy Syst 78:299–319
28. Kansal S, Tyagi B, Kumar V (2017) Cost–benefit analysis for optimal distributed generation
placement in distribution systems. Int J Ambient Energy 38(1):45–54
Machine Learning Framework
for Prediction of Parkinson’s Disease
in Cloud Environment
1 Introduction
The severity of “Parkinson’s disease” worsens over time, making it one of the most
complicated conditions that affects a person’s motor functions. This disease affects
around 1% of the population over sixty. The prevalence is approximately 250 per
10,000 persons. The average onset age is between 55 and 65 years [1]. Predicting Parkinson's disease with ML methods is a comparatively recent and active research problem. The early diagnosis of
the phases of Parkinson’s disease can be very beneficial for treating the illness [2].
The use of computing resources in healthcare departments is increasing all the time,
and it is becoming the norm to electronically record patient data that was formerly
recorded on paper-based forms. As a result, a substantial number of electronic health
records are now more accessible. ML and data mining techniques can be used to
improve the quality and productivity of medical and healthcare facilities, as well as
predict the likelihood of Parkinson’s disease [3].
Over the last two decades, the rate of innovation in the field of machine learning
has risen dramatically. From machine learning models that can classify every object
in a photograph to disease detection, progress can be seen in a range of sectors
[4]. Machine learning has led to a number of successes, including the detection of
diseases in patients, the development of AI chatbots, and the enhancement of speech
recognition in natural language processing [5]. The objective of this effort was to
build a system for forecasting Parkinson’s disease that takes voice data as input,
transmits it to the cloud for speech analysis, and then uses a machine learning model
to come up with a prediction.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
N. R. Shetty et al. (eds.), Emerging Research in Computing, Information, Communication
and Applications, Lecture Notes in Electrical Engineering 928,
https://doi.org/10.1007/978-981-19-5482-5_2
2 Related Work
Various research on the diagnosis of Parkinson’s disease have been done in recent
years, employing models such as neural networks, decision trees, and regression,
thanks to advancements in the field of machine learning and natural language
processing. This section discusses certain relevant research works in the area of
Parkinson detection using ML techniques.
Pramanik and Sarker [6] conducted one such research on the detection of
“Parkinson’s” utilizing vocal information from patients. The information employed
in this research was provided by the “Department of Neurology in Cerrahpasa,
Faculty of Medicine, Istanbul University”. The data set includes information from
“188” people with Parkinson’s disease (in which 81 are women and are 107 men).
The patients in this data set range in age from 33 to 87, with an average age of 65.1.
The “data set” also includes information of “64” healthful people (“41” female and
“23” male), with a median age of 61.1. In [7], the researchers intended to discriminate
Parkinson Disease subjects from the people who were not afflicted by the disease. The
study solicited the help of 40 Parkinson’s disease sufferers and 40 healthy people.
This investigation’s methodology includes a brief questionnaire and three record-
ings from each participant. A total of 44 acoustic features were examined in each
recording. i.e., 44-dimensional vector per voice recording. These extracted traits are
classified into many groups based on their ability to predict whether or not a person
will be impacted by Parkinson’s disease.
The authors of [8] built a cloud-based system for calculating, storing, and moni-
toring voice and tremor samples taken by cell phones to identify Parkinson’s disease.
They discovered that k-nearest neighbors (k-NN) outperformed support vector machine (SVM) and naive Bayes (NB) in terms of accuracy. To identify Parkinson's
patient samples from healthy people, [9] employed a classification system based on
convolutional neural networks (CNN), artificial neural networks (ANN), and hidden
Markov models (HMM). The authors found that the ANN-based Parkinson detection
system outperforms the HMM and CNN-based Parkinson detection systems.
3 Dataset
The voice dataset was collected from the UCI ML repository [11] and used in our
model. The given voice data was then processed using Parselmouth, a speech analysis
package. It splits the voice into numerous parameters, and we use our model to
forecast the outcome based on these parameters.
Table 1 explains the parameters of the voice dataset [12].
4 Proposed Work
This section describes the proposed work. Figure 1 depicts the application’s
framework.
The disease-related data is stored in the cloud. The Parkinson’s input parameters
are used to determine whether the individual has Parkinson’s disease. The prediction
model is then deployed on the cloud, enabling doctors all over the globe to access
the findings.
The data flow diagram for the application is shown in Fig. 2.
We have voice recordings of users who have the condition and those who do not.
We use that to train the model. The user who wants to test for Parkinson’s disease
uploads a voice recording to a server that holds the speech processing library and
machine learning model. The result is then retrieved and communicated to the user,
who is informed whether or not they are likely to have the disease.
In this work, the decision tree classifier and the random forest classifier were
compared. The algorithms are explained in Sects. 4.1 and 4.2.
Random forests, also called random decision forests, are a category of ensemble learning techniques for classification, regression and other tasks that operate by constructing a collection of decision trees at training time. For classification tasks, the random forest's output is the class selected by most of the trees; for regression tasks, the average prediction of the individual trees is returned. Decision trees tend to overfit their training set, which is corrected by random forests. Random forests perform better than single decision trees in the majority of instances; nevertheless, they are generally less accurate than gradient-boosted trees. The performance of a random forest can be adversely impacted by the quality of the data [14].
Algorithm 2 shows the random forest classifier used for Parkinson disease
prediction.
Figure 3 demonstrates the working of the “random forest” classifier for detecting
Parkinson.
As shown in Fig. 3, the random forest works by sampling the recordings and
bagging the properties of the Parkinson voice data. Then, for each of the samples
and bagged features, unique decision trees are generated. The outputs from each of
the trees are then concatenated, and the final class is predicted using majority voting.
For regression problems, mean regression is utilised, whereas for classification tasks,
majority voting is utilised. Because this was a classification task, the final class was
determined by majority voting.
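A minimal sketch of such a random forest classifier with scikit-learn is shown below; the CSV file name and the 'status' column label are assumptions about how the extracted voice features are stored, not details taken from the paper.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumed layout: one row per recording with numeric voice features and a
# binary 'status' label (1 = Parkinson's, 0 = healthy); file name is illustrative.
df = pd.read_csv("parkinsons_features.csv")
X, y = df.drop(columns=["status"]), df["status"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=42)

# Each of the 30 trees is fitted on a bootstrap sample of the recordings with a
# random subset of features considered at every split; prediction is by majority vote.
rf = RandomForestClassifier(n_estimators=30, max_features="sqrt", random_state=42)
rf.fit(X_tr, y_tr)
print("Random forest accuracy:", accuracy_score(y_te, rf.predict(X_te)))
```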
The proposed system involved running two different algorithms on a data set,
namely:
• Decision tree classifier
• Random forest
We experimented with different attributes and identified 14 attributes that would
constitute the ML model. We then used the Parselmouth library to extract those
attributes from the input audio file and transmit them to the backend API which is
linked to the Heroku-hosted ML model. This gives us our result which we display
on our website built in React.js. We used the axios library to make a post request and
submit our form data, which includes the user's voice input. We then wait for the response from the API, after which it is stored in a data variable using state management in React and is then displayed to the user.
“Python 3.7”, “NumPy 1.19.5”, “Pandas 1.2.4”, Pycaret 2.3.1, and sklearn were
applied in this research. The hardware used for this work were 4 GB of “DDR4
RAM”, 512 GB of “Solid-State Drive (SSD)”, Intel Xeon Processor, NVIDIA Tesla
K80/any CUDA compatible GPU with a minimum of 3.0 compute capability.
We used over 100 recordings of Parkinson's patients and non-patients to train our model. To achieve maximum accuracy, the training is based on a range of characteristics included in the audio data. Parselmouth is the audio processing library that we used; it is built on Praat, a voice processing software. Parselmouth not only gives access to Praat's C/C++ code but also provides fast access to the resulting measurements. It also has an interface that feels like any other Python library. The extracted
values from Parselmouth were then converted to a csv file for analysis in Jupyter
Notebook or Google Colab.
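The sketch below illustrates the kind of Parselmouth/Praat calls that can extract a few such features from a .wav file; the chosen measures and parameter values are a common illustrative subset, not the exact 14-attribute set used in this work.

```python
import parselmouth
from parselmouth.praat import call

def extract_voice_features(wav_path, f0_min=75, f0_max=500):
    """Return a small dictionary of Praat voice measures for one recording."""
    snd = parselmouth.Sound(wav_path)
    pitch = call(snd, "To Pitch", 0.0, f0_min, f0_max)
    point_process = call(snd, "To PointProcess (periodic, cc)", f0_min, f0_max)
    return {
        "mean_f0_hz": call(pitch, "Get mean", 0, 0, "Hertz"),
        "jitter_local": call(point_process, "Get jitter (local)",
                             0, 0, 0.0001, 0.02, 1.3),
        "shimmer_local": call([snd, point_process], "Get shimmer (local)",
                              0, 0, 0.0001, 0.02, 1.3, 1.6),
    }

# e.g. pd.DataFrame([extract_voice_features(p) for p in wav_paths]).to_csv("features.csv")
```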
For analysis, we request that the user provide a file in .wav format. We use our
trained ML model to process it and offer the outcome. The model processes the
user’s voice input and then delivers the result back to the API, which is housed on
a cloud server (Heroku), where it is finally shown to the user. The data is sent to
the cloud server using a backend API implemented in Flask. We chose Heroku as
the cloud server because of the several advantages it offers developers, including a
large range of services, affordable price packages, and an optional free service for
less production-intensive applications. It enables us to link our app to a ready-to-
use backend API and, as a result, handle communication between the frontend and
backend.
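A minimal sketch of such a Flask endpoint is given below; the route name, model file and helper import are hypothetical, and on Heroku the app would normally be started through a Procfile rather than app.run().

```python
import pickle

from flask import Flask, jsonify, request

from features import extract_voice_features  # hypothetical module with the helper sketched above

app = Flask(__name__)
model = pickle.load(open("parkinson_rf.pkl", "rb"))   # hypothetical trained model file

@app.route("/predict", methods=["POST"])
def predict():
    # Save the uploaded .wav file, extract voice features and run the classifier.
    request.files["voice"].save("upload.wav")
    feats = extract_voice_features("upload.wav")
    row = [[feats[name] for name in sorted(feats)]]
    return jsonify({"parkinsons": bool(model.predict(row)[0])})

if __name__ == "__main__":
    app.run()
```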
Figure 4 shows the screenshot of the screen displayed to the user when he first
enters the website.
The website commences the prediction process using the ML model after the user
uploads a voice file, and the user is presented the analysis screen in Fig. 5.
Figure 5 depicts the results produced from the voice input provided by the user
after successful prediction and analysis. The result is extracted from the API hosted
on a cloud platform that manages communication between the frontend and backend.
Figure 6 illustrates the outcome of the application for the input voice file.
A decision tree classifier was the first model we implemented. With that model,
we were able to achieve 62% accuracy. After that, we implemented a random forest classifier-based model. With 30 trees, we were able to achieve a 90% accuracy rate.
The number of trees in a “random forest” classifier indicates the total number of
trees in the forest. Each tree receives the same input parameters and votes yes or
no separately (in binary classification). The final result is then computed using the
average of the votes. Because a large number of trees can cause bias in the findings,
this value needs to be fine-tuned. It is evident that we got a good boost in accuracy
by switching our model from decision tree classifier to a random forest classifier.
This is because a random forest comprises several individual trees, and every tree is built on a random sample of the training data. In several cases, this yields
substantially higher accuracy than a single decision tree.
The graph of the number of trees in a random forest versus the accuracies obtained
for each number of trees is shown in Fig. 7.
The number of trees were varied from 1 to 400. The best accuracy of 93% was
obtained for 30 trees. Figure 8 shows the comparison of decision tree and random
forest classifiers for predicting the Parkinson disease on voice data.
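A sketch of this comparison and of the sweep over the number of trees is given below; it reuses the train/test split from the earlier random forest sketch and steps the tree count in increments of 10 to keep the runtime modest.

```python
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# X_tr, X_te, y_tr, y_te are the split from the earlier random forest sketch.
dt = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)
print("Decision tree accuracy:", accuracy_score(y_te, dt.predict(X_te)))

tree_counts = list(range(1, 401, 10))
accuracies = []
for n in tree_counts:
    rf = RandomForestClassifier(n_estimators=n, random_state=42).fit(X_tr, y_tr)
    accuracies.append(accuracy_score(y_te, rf.predict(X_te)))

plt.plot(tree_counts, accuracies)
plt.xlabel("Number of trees")
plt.ylabel("Test accuracy")
plt.show()
```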
The system was subjected to the following forms of testing:
• Unit Testing:
Unit testing is the first level of testing and usually comes at the beginning of testing
any application. This type of testing utilizes individual units or components of the
software to be tested. We isolated the smaller units of the implemented code and tested whether each function, loop, and statement in the program worked as intended. This helped us optimize the code further and use proper code across the entire
script. This allowed us to ultimately run two different algorithms on the model to
acquire a prediction based on 14 different input parameters.
• Integration Testing:
It is the subsequent stage of assessment and usually occurs once the “unit testing”
process is over. The aim of “integration testing” is to detect and uncover deficiencies
during the interaction among modules. It utilizes different units for assessment, and
these units are further combined and analyzed in an “integration testing”. Here, we
tested all the libraries that have been integrated into the project to verify whether
the libraries are able to communicate and provide the necessary desired output to
our code. We primarily utilised the Parselmouth library for voice analysis, and this testing helped us to determine whether or not the several parameters supplied to the library produced the appropriate result. We then created an API using Flask and used it to detect Parkinson's based on the input audio file with the help of the Parselmouth library. Lastly, we deployed the application to the Heroku cloud platform to further check its integration with the front-end.
• Functional Testing:
Functional testing is the third level of testing and usually comes after the integration
testing process is over. The purpose of functional testing is to verify the functionality
of the entire application. This helps us to decide if the behavior of the application is
as expected according to our needs. Here, we tested the functionality of the project
by providing an audio file as input from the front-end website to the API that further
connected our application to the back end and supplied the audio input to detect
whether the person has Parkinson’s or not based on the accuracy provided from
running it against the algorithms in the model. If the person was detected to have
Parkinson’s, then the website displays the same and vice versa.
The ML models used in this work for predicting Parkinson's disease proved effective, to an extent, in identifying people who might have Parkinson's disease. Upon analysis, it was found that random forest showed the
highest accuracy compared to all the other models used. The results proved that
the machine learning model can be used in detection of the disease, as well as to
create an awareness among people. The ML model was then deployed on cloud
environment for better accessibility to the end users. In the future, the model can be
trained with a much greater amount of data to achieve higher performance. The work
can then be effectively used as a preliminary check for a person wanting a diagnosis
for Parkinson’s disease.
References
1. Mei J, Desrosiers C, Frasnelli J (2021) Machine learning for the diagnosis of Parkinson’s
disease: a review of literature. Front Aging Neurosci 13(184). https://doi.org/10.3389/fnagi.
2021.633752
2. Radhakrishnan DM, Goyal V (Mar–Apr 2018) Parkinson’s disease: a review. Neurol India.
66(Supplement):S26–S35. https://doi.org/10.4103/0028-3886.226451. PMID: 29503325.
3. Poewe W, Seppi K, Tanner C et al (2017) Parkinson disease. Nat Rev Dis Primers 3:17013.
https://doi.org/10.1038/nrdp.2017.13
4. Armstrong MJ, Okun MS (2020) Diagnosis and treatment of Parkinson disease: a review.
JAMA 323(6):548–560. https://doi.org/10.1001/jama.2019.22360
5. Dhall D, Kaur R, Juneja M (2020) Machine learning: a review of the algorithms and its appli-
cations. In: Singh P, Kar A, Singh Y, Kolekar M, Tanwar S (eds) Proceedings of ICRIC 2019.
Lecture notes in electrical engineering, vol 597. Springer, Cham. https://doi.org/10.1007/978-
3-030-29407-6_5
6. Pramanik A, Sarker A (2021) Parkinson’s disease detection from voice and speech data using
machine learning. In: Uddin MS, Bansal JC (eds) Proceedings of international joint confer-
ence on advances in computational intelligence. Algorithms for intelligent systems. Springer,
Singapore. https://doi.org/10.1007/978-981-16-0586-4_36
7. Naranjo L, Pérez CJ, Martín J, Campos-Roca Y (2017) A two-stage variable selection and
classification approach for Parkinson’s disease detection by using voice recording replica-
tions. Comput Methods Programs Biomed 142:147–156. https://doi.org/10.1016/j.cmpb.2017.
02.019 Epub 2017 Feb 22 PMID: 28325442
8. Sajal MSR, Ehsan MT, Vaidyanathan R et al (2020) Telemonitoring Parkinson’s disease using
machine learning by combining tremor and voice analysis. Brain Inf 7:12. https://doi.org/10.
1186/s40708-020-00113-1
9. Radha N, Sachin Madhavan RM, Sameera holy S (2021) Parkinson’s disease detection using
machine learning techniques. Rev Argent de Clínica Psicológica XXX:543–552. https://doi.
org/10.24205/03276716.2020.4055
26 K. A. Shastry et al.
10. Jaichandran R, Leelavathy S, Usha Kiruthika S, Krishna G, Mathew MJ, Baiju J (2022) Machine
learning technique based Parkinson’s disease detection from spiral and voice inputs. Eur J Mol
Clin Med 7(4):2815–2820
11. Parkinsons data set, UCI, Machine learning repository. https://archive.ics.uci.edu/ml/datasets/
parkinsons
12. Teixeira J, Gonçalves A (2014) Accuracy of jitter and shimmer measurements. Procedia Technol
16:1190–1199. https://doi.org/10.1016/j.protcy.2014.10.134
13. Pramanik M, Pradhan R, Nandy P, Bhoi AK, Barsocchi P (2021) Machine learning methods
with decision forests for Parkinson’s detection. Appl Sci 11:581. https://doi.org/10.3390/app
11020581
14. Açıcı K, Erdaş Ç, Aşuroğlu T, Toprak MK, Erdem H, Oğul H (2017) A random forest method
to detect Parkinson’s disease via gait analysis, 609–619. https://doi.org/10.1007/978-3-319-
65172-9_51
A Comparative Analysis to Measure
Scholastic Success of Students Using
Data Science Methods
1 Introduction
These days, there are numerous examinations and studies aimed at anticipating students' behavior, among other connected subjects of interest in the educational domain. Lynn and Emanuel [5] released a review paper in which the authors discussed machine learning algorithms such as naive Bayes, support vector machines, decision trees, neural networks and k-nearest neighbor. We expanded our work in this paper to include a few more algorithms, such as clustering, association rule mining and regression. Noticing the worth
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
N. R. Shetty et al. (eds.), Emerging Research in Computing, Information, Communication
and Applications, Lecture Notes in Electrical Engineering 928,
https://doi.org/10.1007/978-981-19-5482-5_3
Table 2 Consideration and rejection paradigms used for collecting literature in this survey

Period. Consideration: proclaimed during 2000–2020. Rejection: proclaimed before 2000.

Subject of research. Consideration: studies that use a direct link to the learning outcomes to predict student success. Rejection: studies that do not use a direct link to the learning outcomes to predict student success.

Kind of record and publication locus. Consideration: refereed, invited or other conference papers presented at an academic or professional conference; scholarly books, monographs and chapters; journal publications; course materials. Rejection: conference papers that are not refereed, invited or presented at an academic or professional conference.

Accessibility and availability. Consideration: open access and full articles available. Rejection: open access and full articles available.

Vocabulary. Consideration: written in English. Rejection: written in another language.

Appropriateness. Consideration: empirical papers; conducted in learning settings; the underlying design investigates an educational setting. Rejection: non-empirical papers; not conducted in learning settings; the underlying design does not investigate an educational setting.
This examination has followed the suggestion given by [7, 8] of defining the sample, phenomenon of interest, design, evaluation and research type (SPIDER), as demonstrated in Table 1. The objective here is to identify the gaps or focal issues, with a specific focus on literature related to student performance analysis and prediction. This neutral review is not a comprehensive one; rather, it addresses the literature crucial to the research questions outlined [8].
Once the taxonomy was defined, we also adopted the regular methodology for searching through significant literature. In this methodology, we searched Google Scholar, which allowed us to set up an alert with the search strings "educational data mining and student performance," "learning analytics and student performance," "student performance and teaching quality," "student performance and domain knowledge," "Predicting students' performance," "Predicting algorithm students," "Recommender systems prediction students," "Artificial neural network prediction students," "Algorithms analytics students" and "Students analytics prediction." Because of this setup, Google Scholar routinely sends a list of recently published pertinent papers to our email. In addition, we identified and searched seven major online bibliographic databases, which contain engineering and science publications. These databases include the ACM Digital Library, IEEE Xplore, Google Scholar, Science Direct, Scopus, Springer and Web of Science.
grouping. There are a few algorithms under the classification task that have been applied to predict student performance. Among the algorithms used are decision tree (DT), support vector machine (SVM), neural networks (NN), k-nearest neighbor (KNN) and naive Bayes (NB). Each of the data mining strategies, together with the algorithms used for predicting student performance, is described in the following section.
Decision trees are the most popular classification model. A decision tree starts with a root node, from which decisions are made; from this node, each node can be recursively split according to the decision tree learning algorithm. The end result is a decision tree in which each branch represents a possible choice scenario and its outcome [13]. The vast majority of researchers have used this method because of its simplicity and understandability in uncovering small or large data structures and predicting values [14, 10, 15]. Romero and Ventura [7] noted that decision tree models are easily understood because of their reasoning process and can be directly converted into a set of IF–THEN rules [16]. As shown in the table, around eight papers have used the decision tree as their strategy to evaluate student performance. Examples of past studies using the decision tree method include predicting dropout features from student data for academic performance [14], predicting the third-semester performance of MCA students [17], and predicting a suitable career for a student from their personal behavior patterns [18]. The evaluation of student performance depends on features extracted from logged data in an online training system. Examples of datasets are student final grades [19], final CGPA [3] and marks obtained in a particular examination [16]. Such datasets are studied and analyzed to discover the primary attributes or factors that may influence student performance [15, 20]. The appropriate data mining algorithm is then explored to predict student performance [21]. Chatterjee and Hadi [22] compared the classification strategies for predicting student performance in their study [12]. Helal et al. [23] investigated the accuracy of classification models in predicting students' progression in tertiary education [24].
This classifier uses probabilities to predict class membership, such as the probability that a given example belongs to a particular class. A few Bayes algorithms have been developed, with Bayesian and naive Bayes being the two most common. The naive Bayes algorithm assumes that the effect of an attribute on a given class is independent of the values of the other attributes [29]. The naive Bayes algorithm is likewise an option for researchers making a prediction. Among the thirty papers, seven have utilized naive Bayes algorithms to estimate student performance. The specific target of each of these seven papers was to find the best prediction method for forecasting student performance by conducting comparisons [10, 12, 21, 30]. Their research showed that naive Bayes uses all of the attributes contained in the data and then examines every one of them to show the significance and independence of each attribute [10].
When class boundaries are nonlinear but there is insufficient data to train complex nonlinear models, support vector machines (SVM) are the best technique. SVM focuses solely on the class boundaries; points that can be classified either way are skipped. The objective is to find the "thickest hyperplane" which divides the groups [31]. SVM is a supervised learning technique aimed at pattern recognition. Three articles have used SVM as a tool to predict student performance. Hamalainen and Vinni [32] chose the support vector machine as their forecasting tool because it is well suited to small datasets [7]. According to [33], SVM has good generalization potential and is faster than other techniques. Polyzou and Karypis [31] also applied SVM, and the analyses of [32] and Helal et al. [23] showed that the SVM technique yielded the highest prediction accuracy in identifying students on the verge of failing [24]. The following table depicts the prediction results.
Association rule mining is one of the notable and popular data mining strategies used widely for educational purposes [34]. It is certainly advantageous for understanding the educational aspects of learning, which in turn assists academic administrators in framing policies [35, 36]. Presently, a number of studies exist which utilize association rule mining for investigating student performance [24, 34, 35]. To acquire a set of significant rules, it is crucial to pre-determine the minimal support and confidence. However, it is hard for an instructor to choose these two input parameters in advance. Moreover, the number of obtained rules may be too high at times, and the greater part of them are uninteresting and have low understandability. A combined measure of collective interestingness may likewise be effective in this context [35].
3.7 Regression
A relationship between the dependent variable and one or more independent variables is characterized by an equation in regression analysis. In a forecasting setting, it is likewise used for assessing the significance of individual predictors and understanding the relationships between variables [37]. The widespread use of regression is evident in the educational data mining literature. It is in fact frequently used in EDM studies for establishing or refuting the effect of teaching quality. Many research works have investigated regression analysis for student performance analysis [38–40].
3.8 Clustering
When researchers use an algorithm to separate the elements of a dataset with two conditions (for instance, positive and negative), they can create a two-class confusion matrix, which represents the number of elements that were correctly predicted and the number that were incorrectly classified [4, 6, 11]. True positives (TP) are those instances of positive data that the algorithm correctly identified as positive, while false negatives (FN) are positive instances incorrectly labeled as negative. Conversely, true negatives (TN) are negative elements that are accurately labeled as such, while false positives (FP) are negative elements that are incorrectly predicted as positive.
A receiver operating characteristic (ROC) curve [58, 56] or a precision–recall (PR) curve [6, 56] can be constructed from confusion matrices. Researchers can then evaluate the performance of the classifier by computing the area under curve (AUC) of the ROC curve or the PR curve. Previously, researchers have proposed a number of confusion matrix rates [13, 18, 56]. Some of them involve just two confusion matrix categories.
1. Sensitivity (Eq. 1) is defined as the proportion of successful students who were correctly labeled "successful" among all successful students [56]. It focuses on lowering FN.
2. Specificity (Eq. 2) is the proportion of non-successful students who are correctly labeled "non-successful" among all non-successful students [56]. It focuses on recognizing negative results.
3. Precision (Eq. 3) is the proportion of successful students correctly labeled "successful" among all students predicted as "successful" by the algorithm [56]. It focuses on reducing FP.
4. The negative predictive value (Eq. 4) represents the proportion of non-successful students correctly labeled "non-successful" among all students predicted as "non-successful" [56]. It focuses on lowering FN.
These four rates are referred to as simple confusion matrix rates [56].
$\text{Sensitivity} = \frac{TP}{TP + FN}$  (1)

$\text{Specificity} = \frac{TN}{TN + FP}$  (2)

$\text{Precision} = \frac{TP}{TP + FP}$  (3)

$\text{Negative predictive value} = \frac{TN}{TN + FN}$  (4)

$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$  (5)
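A small helper that computes these five rates from the confusion matrix counts might look as follows; the example counts are purely illustrative.

```python
def confusion_matrix_rates(tp, tn, fp, fn):
    """Simple confusion matrix rates of Eqs. (1)-(5)."""
    return {
        "sensitivity": tp / (tp + fn),                      # Eq. (1)
        "specificity": tn / (tn + fp),                      # Eq. (2)
        "precision": tp / (tp + fp),                        # Eq. (3)
        "negative_predictive_value": tn / (tn + fn),        # Eq. (4)
        "accuracy": (tp + tn) / (tp + tn + fp + fn),        # Eq. (5)
    }

# Illustrative counts: 40 successful students correctly flagged, 35 unsuccessful
# correctly flagged, 5 false alarms and 10 misses.
print(confusion_matrix_rates(tp=40, tn=35, fp=5, fn=10))
```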
attributes that concern forecasting results are evident, various factors might be subtle as well as intricate to recognize and classify without applying a more refined analysis. Consequently, utilizing recent data mining methods like clustering, regression and neural networks might more precisely forecast student performance (pass/fail) compared with other methods. The results from accurate forecasting can assist institutions in attaining excellence in education.
6 Result
Student performance is a basic factor that should be thoroughly analyzed if the goal of training in higher educational institutions, and at all levels of schooling, is to be met. This is because predicting student achievement helps institutional leaders in developing their instructional structures. This examination aimed at reviewing the commonly used classification techniques for predicting student performance. Among the widely used strategies, the clustering strategy turned out to be the best technique for predicting student performance compared to association rule mining, naive Bayes, regression, support vector machines and k-nearest neighbor. In light of its convenience and capacity to uncover small or large data structures and predict values, it gives high accuracy. Overall, the findings of this review will assist instructors in intentionally monitoring students' success by using the least demanding and most precise technique to predict student performance. The authors believe that using the best forecasting technique helps instructors assess students' performance, which permits early interventions that may increase the rate of excellent academic performance, thereby advancing education with high achievers. The authors will use the findings from this review as a starting point for other comparative examinations in the educational data mining area later on. It is likewise crucial to work out the most suitable approach to increase the accuracy of the different techniques.
References
1. Gedeon TD, Turner S (1993) Explaining student grades predicted by a neural network. In:
Proceedings of 1993 international joint conference on neural networks, IJCNN’93-Nagoya,
vol 1. IEEE, pp 609–612
2. Aghabozorgi S, Mahroeian H, Dutt A, Wah TY, Herawan T (2014) An approachable analytical
study on big educational data mining. In: International conference on computational science
and its applications, Springer, pp 721–737
3. Asif R, Merceron A, Ali SA, Haider NG (2017) Analyzing undergraduate students’ perfor-
mance using educational data mining. Comput Educ 113:177–194
4. Baker RS (2014) Educational data mining: an advance for intelligent systems in education.
IEEE Intell Syst 29(3):78–82
5. Lynn ND, Emanuel AWR (2021) Using data mining techniques to predict students’ perfor-
mance: a review. In: IOP Conference series: materials science and engineering
6. Khanna L, Singh SN, Alam M (2016) Educational data mining and its role in determining factors
affecting student’s academic performance: a systematic review. In: 2016 1st India international
conference on information processing (IICIP), IEEE, pp 1–7
7. Romero C, Ventura S (2010) Educational data mining: a review of the state of the art. IEEE
Trans Syst Man Cybern Part C Appl Rev 40(6):601–618
8. Khan A, Ghosh SK (2020) Student performance analysis and prediction in classroom learning:
a review of educational data mining studies. Educ Inf Technol
9. Lemay DJ, Baek C, Doleck T (2021) Comparison of learning analytics and educational data
mining: a topic modeling approach. Comput Educ Artif Intell
10. Wook M, Yusof ZM, Nazri MZA (2016) Educational data mining acceptance among
undergraduate students. Educ Inf Technol 22(3):1195–1216
11. Pena-Ayala A (2014) Educational data mining: a survey and a data mining-based analysis of
recent works. Expert Syst Appl 41(4):1432–1462
12. Koedinger KR, D’Mello S, McLaughlin EA, Pardos ZA, Rosé CP (2015) Data mining and
education. Wiley Interdisc Rev Cogn Sci 6(4):333–353
13. Kumar DA, Selvam RP, Kumar KS (2018) Review on prediction algorithms in educational data
mining. Int J Pure Appl Math 118(8):531–537
14. Ogor EN (2007) Student academic performance monitoring and evaluation using data mining
techniques. In: Electronics, robotics and automotive mechanics conference, IEEE, pp 354–359
15. Dutt A, Ismail MA, Herawan T (2017) A systematic review on educational data mining. IEEE
Access 5:15991–16005
16. Sweeney M, Rangwala H, Lester J, Johri A (2016) Next-term student performance prediction:
a recommender systems approach. J Educ Data Min 8(1):22–51
17. Wang Y, Ostrow K, Adjei S, Heffernan N (2016) The opportunity count model: a flexible
approach to modeling student performance. In: Proceedings of the third (2016) ACM conference
on learning@ Scale, ACM, pp 113–116
18. Hu X, Cheong CWL, Ding W, Woo M (2017) A systematic review of studies on predicting
student learning outcomes using learning analytics. In: Proceedings of the seventh international
learning analytics & knowledge conference, ACM, pp 528–529
19. Zollanvari A, Kizilirmak RC, Kho YH, Hernández-Torrano D (2017) Predicting students’ GPA
and developing intervention strategies based on self-regulatory learning behaviors. IEEE
Access 5:23792–23802
20. Bendikson L, Hattie J, Robinson V (2011) Identifying the comparative academic performance
of secondary schools. J Educ Adm 49(4):433–449
21. Al-Obeidat F, Tubaishat A, Dillon A, Shah B (2017) Analyzing students’ performance using
multicriteria classification. Clust Comput 21(1):623–632
22. Chatterjee S, Hadi AS (2015) Regression analysis by example. Wiley, New York
23. Helal S, Li J, Liu L, Ebrahimie E, Dawson S, Murray DJ (2018) Identifying key factors of
student academic performance by subgroup discovery. Int J Data Sci Analytics 7(3):227–245
24. Jishan ST, Rashu RI, Haque N, Rahman RM (2015) Improving accuracy of students’ final grade
prediction model using optimal equal width binning and synthetic minority over-sampling
technique. Decis Analytics 2(1):1
25. Natek S, Zwilling M (2014) Student data mining solution-knowledge management system
related to higher education institutions. Expert Syst Appl 41(14):6400–6407
26. Christian TM, Ayub M (2014) Exploration of classification using NBTree for predicting
students’ performance. In: 2014 international conference on data and software engineering
(ICODSE), IEEE, pp 1–6
27. Quadri MMN, Kalyankar NV (2010) Drop out feature of student data for academic performance
using decision tree techniques. Global J Comput Sci Technol 10(2)
28. Bäckenköhler M, Wolf V (2017) Student performance prediction and optimal course selection:
an MDP approach. In: International conference on software engineering and formal methods,
Springer, pp 40–47
29. O’Connell KA, Wostl E, Crosslin M, Berry TL, Grover JP (2018) Student ability best predicts
final grade in a college algebra course. J Learn Analytics 5(3):167–181
30. Kumar M, Singh AJ, Handa D (2017) Literature survey on student’s performance prediction
in education using data mining techniques. Int J Edu Manage Eng 6:40–49
31. Chaturvedi R, Ezeife CI (2017) Predicting student performance in an ITS using task-driven
features. In: 2017 IEEE international conference on computer and information technology
(CIT), IEEE, pp 168–175
32. Hamalainen W, Vinni M (2006) Comparison of machine learning methods for intelligent
tutoring systems. In: Intelligent tutoring systems, Springer, pp 525–534
33. Polyzou A, Karypis G (2019) Feature extraction for next-term prediction of poor student
performance. IEEE Trans Learn Technol
34. Damasevicius R (2010) Analysis of academic results for informatics course improvement using
association rule mining. In: Information systems development, Springer, Berlin, pp 357–363
35. Angeli C, Howard S, Ma J, Yang J, Kirschner PA (2017) Data mining in educational technology
classroom research: can it make a contribution? Comput Educ 113:226–242
36. Adjei SA, Botelho AF, Heffernan NT (2016) Predicting student performance on post-requisite
skills using prerequisite skill data: an alternative method for refining prerequisite skill struc-
tures. In: Proceedings of the sixth international conference on learning analytics & knowledge,
ACM, pp 469–473
37. Goos M, Salomons A (2016) Measuring teaching quality in higher education: assessing
selection bias in course evaluations. Res High Educ 58(4):341–364
38. Chen W, Brinton CG, Cao D, Mason-singh A, Lu C, Chiang M (2018) Early detection prediction
of learning outcomes in online short-courses via learning behaviors. IEEE Trans Learn Technol
12(1):44–58
39. Ustunluoglu E (2016) Teaching quality matters in higher education: a case study from Turkey
and Slovakia. Teachers Teaching 23(3):367–382
40. Kesavaraj G, Sukumaran S (2019) A study on classification techniques in data mining. In: 2019
11th International conference on computing, communications and networking technologies
(ICCCNT), IEEE, pp 1–7
41. She HC, Cheng MT, Li TW, Wang CY, Chiu HT, Lee PZ et al (2012) Web-based undergraduate
chemistry problem-solving: the interplay of task performance, domain knowledge and web-
searching strategies. Comput Educ 59(2):750–761
42. Osmanbegovic E, Suljic M Data mining approach for predicting student performance. Econ
Rev 10(1)
43. Meier Y, Xu J, Atan O, van der Schaar M (2016) Predicting grades. IEEE Trans Signal Process
64(4):959–972
44. Quille K, Bergin S (2018) Programming: predicting student success early in CS1. A re-
validation and replication study. In: Proceedings of the 23rd annual ACM conference on
innovation and technology in computer science education, ACM, pp 15–20
45. Yu L, Lee C, Pan H, Chou C, Chao P, Chen Z et al (2018) Improving early prediction of
academic failure using sentiment analysis on self evaluated comments. J Comput Assist Learn
34(4):358–365
46. Chanlekha H, Niramitranon J (2019). Student performance prediction model for early-
identification of at-risk students in traditional classroom settings. In: Proceedings of the
10th international conference on management of digital ecosystems—MEDES ’19, ACM,
pp 239–245
47. Yaacob WFW, Nasir SAM, Yaacob WFW, Sobri NM (2019) Supervised data mining approach
for predicting student performance. Indonesian J Electr Eng Comput Sci 16(3):1584–1592.
ISSN: 2502-4752. https://doi.org/10.11591/ijeecs.v16.i3.pp1584-1592
48. Salal YK, Abdullaev SM, Kumar M (2019) Educational data mining: student performance
prediction in academic. Int J Eng Adv Technol (IJEAT) 8(4C). ISSN: 2249-8958
49. Khan A, Ghosh SK (2020) Student performance analysis and prediction in classroom
learning: a review of educational data mining studies. Educ Inf Technol 26:205–240. https://
doi.org/10.1007/s10639-020-10230-3
50. Bin Mat U, Buniyamin N, Arsad PM, Kassim R (2013) An overview of using academic analytics
to predict and improve students’ achievement: a proposed proactive intelligent intervention.
In: Engineering education (ICEED) 2013 IEEE 5th conference on, IEEE, pp 126–130
51. Ibrahim, Z, Rusli D (2007) Predicting students academic performance: comparing artificial
neural network, decision tree and linear regression. In: 21st Annual SAS Malaysia forum
52. Romero C, Ventura S (2010) Educational data mining: a review of the state of the art. Trans
Sys Man Cyber Part C 40(6):601–618. https://doi.org/10.1109/TSMCC.2010.2053532
53. Quadri MM, Kalyankar N Drop out feature of student data for academic performance using
decision tree techniques. Global J Comput Sci Technol 10(2)
54. Sukumar Letchuman MW Mac Roper, Pragmatic cost estimation for web applications
55. Angeline DMD (2013) Association rule generation for student performance analysis using
Apriori algorithm. SIJ Trans Comput Sci Eng Appl (CSEA) 1(1):12–16
56. Chicco D, Tötsch N, Jurman G (2021) The Matthews correlation coefficient (MCC) is more reli-
able than balanced accuracy, bookmaker informedness, and markedness in two-class confusion
matrix evaluation. BioData Min
57. Kamley S, Jaloree S, Thakur RS (2016) A review and performance prediction of students’
using association rule mining based approach. Data Min Knowl Eng 8(8):252–259
58. Livieris IE, Drakopoulou K, Mikropoulos TA, Tampakas V, Pintelas P (2018) An ensemble-
based semi-supervised approach for predicting students’ performance. In: Research on
e-learning and ICT in education, Springer, pp 25–42
59. Mimis M, El Hajji M, Es-saady Y, Oueld Guejdi A, Douzi H, Mammass D (2019) A framework
for smart academic guidance using educational data mining. Educ Inf Technol 24(2):1379–1393
Gesture-Controlled Speech Assist Device
for the Verbally Disabled
1 Introduction
Sign language is a method of communication that makes use of the user’s move-
ments. Communication between deaf and hearing persons is at a substantial disad-
vantage when contrasted with communication between blind and sighted people
[1]. The blind can communicate freely in spoken languages, whereas the deaf have their own
set of symbols that they use as a language. As per an assessment undertaken by
the Government of India in 2011, the number of speech- and hearing-impaired people
in India is estimated at about 20.02 lakhs, with 56% and 46% of this population suffering from speech
and hearing abnormalities, respectively. According to survey data, 1.33 billion
people encounter communication issues in everyday situations, with sign language
being utilized to express messages [2]. Speech is every typical human being's primary mode
of communication. However, those who are speech-impaired use sign
languages. The majority of people are unable to comprehend sign language. As a
languages. The majority of people are unable to comprehend sign language. As a
result, it becomes difficult for a person with speech impairment to convey his or her
opinions and beliefs. This creates a barrier to communication between the deaf and
the rest of society. A device or tool that can convert hand motions into auditory words
is required to tackle this problem. Several devices have been developed; however, all
of them have limitations with mobility, size, and cost. The primary goal of the hand
sign translation system is to recognize and communicate via hand gestures [3].
The verbally-disabled people often find it difficult to convey what they want to
say. To overcome this problem, they make use of sign language. Sign language is a
language that uses manual communication to convey meaning. They employ hand gestures to convey their messages.
The overall system block diagram is shown in Fig. 1. System facilitates two-way
interaction between the disabled and the general population. Transmitter and receiver
are the two components of the system. The flex sensor plays the major role [3, 10].
The glove is fitted with flex sensors along the length of each finger and the thumb.
The flex sensors give output in the form of voltage variation that varies with degree
of bend. This flex sensor output is given to the ADC channels of the microcontroller,
which processes the signals and performs analog-to-digital conversion. The
processed data is then sent wirelessly to the receiver section.
Figure 3 depicts the device’s design, while Fig. 4 depicts the arrangement of the
various electronic components utilized in the device’s design, such as the main
controller, sensors, and output modules. The output of the flex sensor is given as
analog input to the Arduino Nano. The Arduino Nano is a small, complete, and
breadboard-friendly microcontroller board based on the ATmega328P. The input
from the flex sensors is then converted into discrete digital values by the Arduino
Nano. The Arduino compares the input values against the values assigned to different
gestures. The corresponding output is sent via a 433 MHz radio frequency (RF)
module which is a (usually) small electronic device used to transmit and/or receive
radio signals between two devices. In an embedded system, it is often desirable to
communicate with another device wirelessly.
The signal sent by the input-side 433 MHz RF transmitter is received by the RF
module receiver. This value is processed in the Arduino, where the corresponding
phrase is displayed on a 16 × 2 LCD display unit after the required conversions,
according to the phrases assigned to the corresponding values. If the condition matches, then the
phrase is sent wirelessly to the output-side microcontroller. If the condition fails,
i.e., there is no match for the particular set of input values from the flex sensors, the
LCD will not display anything. In this manner, different gestures are recognized,
and phrases are continuously sent to the LCD display.
On the output side, the receiver and liquid crystal display are interfaced with
Arduino Nano and are mounted on a stripboard as shown in Fig. 4. A 9 V battery is
mounted on the stripboard for power supply to Arduino. The flowchart of software
implementation is shown in Fig. 5.
Messages assigned to gestures (see the software flowchart in Fig. 5):

// Phrases assigned to the five recognized gestures (one per flex-sensor pattern)
const char *m1 = " HI HOW ARE YOU?";
const char *m2 = "PEACE";
const char *m3 = "WHAT?";
const char *m4 = "SUPER";
const char *m5 = " SHOULD DO SOMETHING ABOUT IT ";
Fig. 3 Final model of the glove a top view, b rear view of the stripboard
To detect finger movements, flex sensors were used in the design. It is made up of
five sensors that have been arranged in a hand glove to make them more comfortable
to use.
Fig. 5 Software implementation flowchart
When bending, the resistance value of the flex sensors changes. A constant-
value resistor connects one side of the voltage divider to the other. The Arduino
detects the voltage difference as the sensors bend and orders the servos to move
proportionally. On the robotic hand’s side, a 12 V external power supply is utilized.
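As a quick illustration of this voltage-divider arrangement, the voltage seen at the ADC pin can be computed as below (the 5 V supply, the 10 kΩ fixed resistor and the flex-sensor resistance values are assumptions for the example, not measured values from the device):

# Voltage divider: flex sensor in series with a fixed resistor; the ADC reads the midpoint.
def divider_output(v_in, r_fixed, r_flex):
    """Voltage across the fixed resistor, as seen by the ADC pin."""
    return v_in * r_fixed / (r_fixed + r_flex)

# A flex sensor might read roughly 25 kOhm when straight and 100 kOhm when fully bent,
# so the ADC voltage drops as the finger bends:
print(round(divider_output(5.0, 10_000, 25_000), 2))   # ~1.43 V (straight)
print(round(divider_output(5.0, 10_000, 100_000), 2))  # ~0.45 V (bent)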
The gesture-controlled speech assistance device is successfully built. The necessary
hardware connections to the device are checked and tested. The Arduino Nano boards
on the input and output sides are configured in their platforms to achieve the desired tasks of
gesture recognition and output display, respectively. All the flex sensors are tested
and mounted on the glove using synthetic glue. The LCD display is checked and
tested with required connection to the microcontroller. The RF transmitter/receiver
pair is installed, and communication between input side and output side is checked.
The recognition of gestures by movement of fingers is achieved as shown in Fig. 6.
The transfer of data in the form of phrases is achieved between the input and output
side. The display of phrases on output side LCD is controlled by input side glove
gestures. At output side, phrases are displayed on the LCD which has a functioning
backlight to facilitate reading in poor lighting conditions shown in Fig. 7. The key
advantages of the device are: It is wireless, portable, light in weight, easy to handle,
multilingual.
5 Conclusion
For the vocally disabled, the gesture-controlled voice aid technology provides a first
step into communication. This device could be part of a larger network of devices that
collaborate to break down communication barriers. We can improve the accuracy of
gesture detection while also expanding the number of gestures that can be stored by
adding more sensors such as the accelerometer and gyroscope. This glove can also
be used in conjunction with speakers to create a computerized assistant that acts as a
virtual voice and reads out messages from the user. The use of such gloves is currently
limited, but various efforts are in the works to make them more economical and simple
to use. With modern technologies such as artificial intelligence and machine learning
entering the scene, incorporating these into a speech aid device opens up a world of
possibilities. To put it another way, the glove can turn into a complete 360° helper for
the user by connecting to the Internet and executing various IoT-based operations,
such as turning on and off electronic gadgets in a smart home using gestures. To
expand its capabilities, this glove can be linked to a smartphone via apps. Brain
mapping and artificial intelligence are two other technologies being developed now
that will have a significant impact. The ultimate objective of this technology will be
achieved when disabled persons may fully communicate with the rest of the world by
using such gadgets to entirely overcome their disabilities. This would also provide
disabled people access to a wider range of job opportunities that they may not have
access to now. A highly precise, cost-effective, and self-contained glove for deaf
and dumb persons was devised to serve as a communication bridge. Their sign
language gestures can be converted into speech using the glove. Smart Glove is a
gesture-to-phrase translator.
References
1. Oo HM, Tun KT, Thant ML (2019) Deaf sign language using automatic hand gesture robot
based on microcontroller system. Int J Trend Sci Res Dev 3:2132–2136
2. Telluri P, Manam S, Somarouthu S, Oli JM, Ramesh C (2020, July) Low cost flex powered
gesture detection system and its applications. In: 2020 Second international conference on
inventive research in computing applications (ICIRCA), IEEE, pp 1128–1131
3. Nagpal A, Singha K, Gouri R, Noor A, Bagwari A (2020, September) Hand sign transla-
tion to audio message and text message: a device. In: 2020 12th International conference on
computational intelligence and communication networks (CICN), IEEE, pp 243–245
4. Flores MBH, Siloy CMB, Oppus C, Agustin L (2014, November) User-oriented finger-gesture
glove controller with hand movement virtualization using flex sensors and a digital accelerom-
eter. In: 2014 International conference on humanoid, nanotechnology, information technology,
communication and control, environment and management (HNICEM), IEEE, pp 1–4
5. Dhepekar P, Adhav YG (2016, September) Wireless robotic hand for remote operations using
flex sensor. In: 2016 International conference on automatic control and dynamic optimization
techniques (ICACDOT), IEEE, pp 114–118
6. Padmanabhan V, Sornalatha M (2014) Hand gesture recognition and voice conversion system
for dumb people. Int J Sci Eng Res 5(5):427
7. Manikandan K, Patidar A, Walia P, Roy AB (2018) Hand gesture detection and conversion to
speech and text. arXiv preprint arXiv:1811.11997
8. Yamunarani T, Kanimozhi G (2018) Hand gesture recognition system for disabled people using
arduino. Int J Adv Res Innovative Ideas Educ (IJARIIE) 4(2):3894–3900
9. Nagpal A, Singha K, Gouri R, Noor A, Bagwari A, Qamarma S (2020, October) Helping
hand device for speech impaired people. In: 2020 Global conference on wireless and optical
technologies (GCWOT), IEEE, pp 1–4
10. Poornima N, Yaji A, Achuth M, Dsilva AM, Chethana SR (2021, May) Review on text and
speech conversion techniques based on hand gesture. In: 2021 5th International conference on
intelligent computing and control systems (ICICCS), IEEE, pp 1682–1689
11. Speak H (2014) Why is communication important to human life. Retrieved From
12. Sunitha KA, Saraswathi PA, Aarthi M, Jayapriya K, Lingam S (2016) Deaf mute communica-
tion interpreter-a review. Int J Appl Eng Res 11:290–296
13. Sturman DJ, Zeltzer D (1994) A survey of glove-based input. IEEE Comput Graphics Appl
14(1):30–39
14. Kunjumon J, Megalingam RK (2019, November) Hand gesture recognition system for trans-
lating indian sign language into text and speech. In: 2019 International Conference on Smart
Systems and Inventive Technology (ICSSIT), IEEE, pp 14–18
15. Verdadero MS, Cruz JCD (2019) An assistive hand glove for hearing and speech impaired
persons. In: 2019 IEEE 11th International conference on humanoid, nanotechnology, infor-
mation technology, communication and control, environment, and management (HNICEM),
IEEE, pp 1–6
16. https://www.dreamstime.com/illustration/american-sign-language.html
Highly Classified with Two Factor
Authentication Encrypted Secured Mail
1 Introduction
Nowadays, security is the most concerning aspect in any IT-related transaction.
Keeping this in mind, we studied the present system of “mail server”
services. One of the main drawbacks of the present system is that it has only one
level of authentication, i.e., a specific username and password,
which gives the mail limited security. This problem is solved by giving the mail extra
security, that is, by assigning a secondary key to the mail. Here, we have also applied the concept of
priority queuing of the mail, where the inbox may be sorted in a user-specified
format consistent with the user's given priority, such as date, email ID (alphabetical), and
secured mail [1]. The objective of our project is to provide second-level security to
the present email system. This makes use of a secondary key for every mail, which
is vital. Each mail that is sent by the sender will have a key of its own
which is sent to the mobile phone of the receiver. We have also tried to make
the email system a reliable and user-friendly communication mode [2].
2 Literature Review
The recipient does not have to be present (or making use of the computer) at the time you send your message. This system is similar
to sending and receiving a letter. At the start, email messages were constrained
to simple text, but now many systems can cope with more complicated formats,
like graphics and word-processed files. When mail is received on a computer system, it is
normally saved in an electronic mailbox for the recipient to examine later. Electronic
mailboxes are usually special files on a computer that can be accessed using various
commands. Each person normally has a personal mailbox.
3 Proposed System
Machine memory such as RAM stores all of the active logs, disk buffers, and associated
records. Moreover, it stores all the transactions that are currently being executed.
If such a store crashes unexpectedly, it loses all the logs and active
copies of the database. This makes recovery almost impossible, as everything that
is required to recover the data is lost. The following strategies can be adopted
in case of loss of volatile storage [3]: checkpoints can be kept at multiple
levels so that much of the content of the database is saved periodically. A state of
the active database in volatile memory is periodically dumped onto
stable storage, which may also contain logs, active transactions and buffer blocks.
When the machine recovers from a failure, it can restore the most recent dump. It
can maintain a redo-list and an undo-list as checkpoints, and it can recover the
system by consulting the undo–redo lists to restore the state of all transactions
up to the last checkpoint [4].
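A minimal sketch of the checkpoint plus undo/redo idea described above (the log format, the transaction states and the example values are assumptions for illustration, not the authors' implementation):

# Illustrative checkpoint/undo-redo recovery over a simple write-ahead log.
# Log records are (txn, "WRITE", key, old_value, new_value) or (txn, "COMMIT").
def recover(db, log, checkpoint_index):
    """Replay the log after the last checkpoint: redo committed txns, undo the rest."""
    tail = log[checkpoint_index:]
    committed = {rec[0] for rec in tail if rec[1] == "COMMIT"}
    # Redo committed writes in forward order.
    for txn, op, *rest in tail:
        if op == "WRITE" and txn in committed:
            key, _old, new = rest
            db[key] = new
    # Undo uncommitted writes in reverse order.
    for txn, op, *rest in reversed(tail):
        if op == "WRITE" and txn not in committed:
            key, old, _new = rest
            db[key] = old
    return db

db = {"balance": 100}
log = [("T1", "WRITE", "balance", 100, 80), ("T1", "COMMIT"),
       ("T2", "WRITE", "balance", 80, 10)]           # T2 never commits
print(recover(db, log, checkpoint_index=0))          # {'balance': 80}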
In all mail systems, the procedure is to test the mail using an existing security tool
before sending it to the mail server. The email header is taken and matched against
the label in the security tool; if it matches, further processing continues, and if it
does not, the mail is reported to a higher authority for further action [5].
Buffers are allocated for every inbox mail, and details such as sender ID, date and
time are noted. If the centralized servers are attacked, the confidential
messages and the private information will be leaked to the attacker [6]. The buffers then
search for packet IDs; if the details stored in the storage match the PID, the mail
will be allowed to be read.
Due to the untrustworthy nodes participating in the blockchain network, the
blockchain-based system needs to adopt a suitable consensus algorithm to achieve
the consistency of the decentralized distributed databases with the node competition
[11].
4 Architecture
5 Methodology
Secured mail is an email system that is developed to provide higher security for
confidential emails. The user roles of this mail system are
• The sender
• The receiver
To provide a reasonable level of privacy, all the routes within the email pathway
and every connection between them must be secured. Here, the second level
of mail security means that we encrypt the mail using a unique key as
a second-stage security method, to protect the mail contents even if the user loses
ownership of his/her account [7]. Here, we implement the
secured mail system which provides higher security to the mails. The sender sends
the confidential mail, and it is received by the receiver; when the receiver clicks to
open the closed mail, he is asked for the key, and at the identical time, he gets a message
with the OTP. This key has to be entered and checked for validity, after which the receiver
can easily read the mail. This key is unique, and it gets refreshed [8] (Table 1).
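As an illustration of this kind of per-mail one-time key flow (not the authors' implementation; the key length, the expiry window and the SMS delivery step are assumptions for the example):

# Illustrative per-mail OTP generation and verification.
import hmac
import secrets
import time

def generate_otp(length=6, ttl_seconds=300):
    """Create a numeric one-time key and its expiry time (a 5-minute validity is assumed)."""
    otp = "".join(secrets.choice("0123456789") for _ in range(length))
    return otp, time.time() + ttl_seconds

def verify_otp(submitted, issued, expires_at):
    """Constant-time comparison plus an expiry check before the mail is opened."""
    return time.time() <= expires_at and hmac.compare_digest(submitted, issued)

otp, expires_at = generate_otp()
# ... the OTP would be sent to the receiver's mobile phone here (delivery not shown) ...
print(verify_otp(otp, otp, expires_at))       # True
print(verify_otp("000000", otp, expires_at))  # almost certainly False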
The interface uses HTTP and the GET method to send an account ID and encrypted password
to the server. Login pages that make use of plain HTTP allow a hacker to obtain the password
by eavesdropping on the network. Even though the login page uses HTTP, the
account credentials are all sent through the secure socket layer to the server; at this stage,
a hacker can still identify the account ID followed by the password [9].
The details are taken from two sources: the email header and the content. The header
contains account ID and password with IP address, and content contains packet-
related information [10].
6 Results
7 Conclusion
To provide reasonable privacy, all routers within the electronic mail pathway and
the connections between them have to be secured. Here, the title “secured mail system”
means that we encrypt the mail using an OTP as a second-stage security method,
to safeguard the mail contents even if the user loses ownership
of his/her account.
Here, we have implemented the secured mail system which supplies higher security
to mails. The sender sends the confidential mail, and it is received by the user;
when the receiver clicks to open the mail, it will ask for the OTP, and at the same
time, he gets a message with the OTP. This OTP must be entered and checked for
validity, after which the receiver can easily read the mail. This key is unique, and it gets
regenerated.
8 Future Enhancements
References
1. Benarous L, Kadri B, Bouridane A (2017) A survey on cyber security evolution and threats:
biometric authentication solutions. In: Biometric security and privacy, Springer, Berlin,
Germany, pp 371–411
2. Boyd C, Mathuria A (2013) Protocols for Authentication and Key Establishment. Springer,
Berlin, Germany
3. Mohsin J, Han L, Hammoudeh M, Hegarty R (2017) Two factor versus multi-factor, an authen-
tication battle in mobile cloud computing environments. In: Proceedings of the international
conference on future networks and distributed systems, Cambridge, UK, 19–20 July 2017;
ACM, New York, NY, USA, p 39
4. Pathan ASK (2016) Security of self-organizing networks: MANET, WSN, WMN, VANET.
CRC Press, Boca Raton, FL, USA
5. Borran F, Schiper A (2010) A leader-free byzantine consensus algorithm. In: International
conference on distributed computing and networking, Springer
6. Li T, Mehta A, Yang P (2017) Security analysis of email systems. In: 2017 IEEE 4th Inter-
national conference on cyber security and cloud computing (CSCloud). IEEE Xplore: 24 July
2017. https://doi.org/10.1109/CSCloud.2017.20
7. Balloon AM (2001) From wax seals to hypertext: electronic signatures, contract formation,
and a new model for consumer protection in internet transactions. Emory Law J 50:905
8. Danny T (2017) MFA (Multi-Factor Authentication) with Biometrics. Available online:
https://www.bayometric.com/mfa-multi-factor-authentication-biometrics/. Accessed online 4
Jan 2018
9. Huang JW, Chiang CW, Chang JW (2018) Email security level classification of imbalanced
data using artificial neural network: the real case in a world-leading enterprise. In: Engineering
applications of artificial intelligence, vol 75. October 2018
10. Bao X (2020) A decentralized secure mailbox system based on blockchain. In: 2020 Interna-
tional conference on computer communication and network security (CCNS) IEEE Xplore: 02
Nov 2020. https://doi.org/10.1109/CCNS50731.2020.00038
Efficacious Intrusion Detection on Cloud
Using Improved BES and HYBRID
SKINET-EKNN
1 Introduction
Though various approaches have been proposed for solving the challenges of DDoS
attack, the best solution in terms of crucial paradigms like false positive ratio (FPR)
and detection rate (DR) with utmost accuracy and precision is yet to come by. In this
study, it is proposed to detect and curtail the propagation of DDoS attacks in a cloud
environment by an enhanced IDS. It consists of an EPIA and an IBES, which are two novel
additions that strengthen the existing IDS. A hybrid classifier is also introduced for
the classification of malicious attacks.
The use of cooperative IDS not only helps in the accurate detection of cloud
intrusion attacks but also significantly increases the security of the system. Self-
explanatory as it is, Intrusion detection system detects anomalies in the network
traffic. On account of this functional attribute IDS has become a crucial component
of security information system model meant to protect the Internet of Things (IoT)
network from cyber-attacks.
Until recently, the single intrusion detection system was considered effective as
the flow of traffic was limited. But with emerging developments in IoT happening
in quicker strides, the nature of attacks is also getting complex. In these circum-
stances, single IDS is considered incapable of arresting such advanced attacks
owing to constraints like incomplete knowledge of the implications of such attack
patterns. A new approach using IDS was suggested by [1] that helped in selecting
IoT data features. Before the feature selection, the lowest confidence level packets
are eliminated through confidence level of EPIA algorithm and the flow properties
are inspected through Shannon property. Then, optimal features are obtained from
the filtered packets using IBES optimization. Finally, classification is performed by
HSkiNET-EKNN.
Many innovative approaches such as swarm intelligence, data mining and machine
learning have been introduced to protect the network from different attacks. Authors
in [2] used a beta mixture model (BMM) for training the classifier using a raw dataset to
detect attacks. Signature-based detection method was done using individual machine
learning approaches by adopting K-nearest neighbour (KNN). The objectives of the
study are as follows:
1. To strengthen the existing packet inspection algorithm by integrating it with
Shannon entropy for the inspection of flow properties.
2. To optimize the feature selection process by adding, for the first time, an improved
bald eagle search (IBES) algorithm.
3. To achieve a better rate of accuracy by adding, for the first time, a hybrid classifier
algorithm called HSkiNET-EKNN.
2 Review of Literature
An intrusion detection system is a software that analyses the network traffic for
malicious activity. The wide range of IDS is further classified as NIDS, HIDS, PIDS
and APIDS. Anomaly-based and signature-based detection are the techniques
most commonly used for detection in all variants of IDS.
Ficco et al. [3] deployed a hierarchical IDS in which the obtained data is trans-
mitted to the security engines over the physical server where the data correlation is
performed to determine the distributed attacks. A collaborative IDS framework was
proposed by [4]. The unknown attacks were detected using a fusion of Decision Tree
classifier and SVM and the known attacks are determined by Snort. For collecting
information regarding the attacks from VM at both virtualization and network levels,
the VMs were monitored using the proposed technique of [5]. The malicious network
packet detection model handles high network traffic at the hypervisor level. The
drawback of this work was that memory introspection was not performed in VMM
necessary for the detailed investigation of attacks.
Arjunan and Modi in [6] predicted the types of attacks by combining both signa-
ture and anomaly techniques. This work was a maiden venture for detecting both
known and unknown attacks. By means of alert correlation, the distributed attack
is detected using the correlation module existing in the centralized server. Many
machine learning techniques like ET, RF, DT and naive Bayes were used to improve
the accuracy rate and reduce the FPR. Alerts from each classifier are fed into Demp-
ster–Shafer theory for further improving detection accuracy. The IDS evaluation has
maximum encouraging outcomes, but many detection defects like maximum FPR
for unbalanced samples and minimum detection rate for unknown attacks are yet to
be solved.
Authors in [7] proposed anomaly-based intrusion detection system for cloud
providers. The conventional network based IDS inspects the traffic by firing rules
through their detection engine. This work deploys IDS sensors across VMs and
hypervisor that help in improving the Detection Accuracy in the cloud environment.
The limitation of this work is that the aforementioned sensor deployment is tested
offline which avoids many real time challenges faced in real cloud testbed through
VM, containers and Dockers. For anomaly detection, Homayoun et al. [8] devel-
oped BotShark using stacked autoencoders, which extract relevant features from
an enormous input. The reduced subset of features is passed to a softmax classi-
fier to generate probabilities for detecting the most probable malicious and benign
traffic. Though this technique achieved an FPR of 0.13, it requires a pre-training phase
to understand the actual profile and network features.
Patil et al. in [9] devised protocol specific multi-threaded network intrusion detec-
tion system (PM-NIDS) for the cloud environment. The model captures DoS/DDoS
attacks by sniffing packet protocols. They segregate the packets into different queues
based on the protocol so that they can be handled in parallel thereby reducing packet
loss. Protocol specific classifiers like DT, RF and ONER were used for generating
alerts. Better performance was achieved, but still the handling of different protocols
remains a critical issue. Li et al. [10] devised an IDS model that detects malicious
traffic using a RNN-RBM hybrid model. From the encountered traffic RBM retrieves
features to build a feature vector whereas the RNN identifies Flow features. The
identified Flow features are then fed to softmax classifier to find the probable ones.
Chergui and Boustia [11] designed a Non-monotonic Ontological Contextual-based
strategy to minimize the FPRs of IDS. They address issues like raising false alerts,
vulnerability and network updation. In this context if the system handles the issue
of vulnerability, then the exploited sequence is categorized neither as true alert nor
false alert. The class-imbalance issue for improving IDS performance was addressed
by [12]. Cost-sensitive stacked autoencoder (CSSAE) generates costs for each class.
Furthermore, features are learnt from minority and majority classes based on their
derived cost values. The cost matrix is built by adjusting the neurons’ corresponding
cost through the cost function.
A cloud’s function is to collect and store packets from distant cloud users through
routers in cloudlets. In this process, it becomes vulnerable to attacks from botnets.
The CC in the system model displayed in Fig. 1 observes all files on cloudlets from
clients and produces a threshold level for each cloudlet so as to avoid imbalance in
packet’s traffic. It transfers the files to idle cloudlets when there is a surge in traffic
flow. In this study, two novel additions, namely EPIA and IBES are introduced to
streamline the inspection of packet flow traffic through Feature Selection.
In this model, the PIA is enhanced using Shannon entropy method, where packets
flow and arrival time are analysed using confidence level and then the packet flow
properties are analysed using Shannon entropy method. At first, every packet is
analysed according to the packet flow and arrival time. It foresees a surge in flow
based on features like packet count, arrival time and checks the confidence level.
The confidence level (CL) is evaluated according to single and pair attributes
Balamurugan and Saravanan in [13].
(i) Single Attribute’s Confidence
$$\mathrm{CL}\left(X_i = X_{i,j}\right) = \frac{N\left(X_i = X_{i,j}\right)}{N_n}, \quad i = 1, 2, \ldots, n,\ j = 1, 2, 3, \ldots, m \tag{1}$$
Then, every packet’s confidence level is evaluated; if it is low, then the packet is
eliminated, or else it is admitted. Then, the packets are checked for flow properties.
Feature-based IDS has been introduced not only for controlling traffic volume but
also to explore the network traffic’s flow properties. General Internet Protocol (IP)
attributes such as port numbers, Source and Destination IP address are utilised for
this purpose. The Shannon entropy is applied in the context of Intrusion Detection.
For instance, assume the probability distribution $P = \langle p_1, p_2, \ldots, p_n \rangle$ with $0 \le p_i \le 1$
and $\sum_{i=1}^{n} p_i = 1$; the entropy is then defined as

$$E_s = -\sum_{i=1}^{n} p_i \log p_i \tag{3}$$
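A minimal sketch of how the per-attribute confidence level (Eq. 1) and the Shannon entropy (Eq. 3) could be computed over a batch of packets (the attribute chosen, the sample values and the use of log base 2 are assumptions for illustration, not details fixed by the paper):

# Illustrative confidence-level and Shannon-entropy computation for one packet attribute.
import math
from collections import Counter

def confidence_levels(values):
    """CL(X_i = x) = N(X_i = x) / N_n for every observed value of the attribute."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}

def shannon_entropy(values):
    """E_s = -sum(p_i * log2(p_i)) over the empirical distribution of the attribute."""
    return -sum(p * math.log2(p) for p in confidence_levels(values).values())

# Hypothetical source-IP column of a small packet batch.
src_ips = ["10.0.0.1"] * 6 + ["10.0.0.2"] * 3 + ["10.0.0.3"]
print(confidence_levels(src_ips))           # {'10.0.0.1': 0.6, '10.0.0.2': 0.3, '10.0.0.3': 0.1}
print(round(shannon_entropy(src_ips), 3))   # 1.295; a drop in entropy can flag a traffic surge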
The structure of basic bald eagle search is improved to increase convergence speed,
reliability and solution accuracy. A new component, opposition-based learning (OBL),
is added for enhancing the efficiency. The new method, known as IBES, is tested in a
feature selection approach. The population diversity of BES in the search space is
improved by OBL. In IBES, the algorithm follows an effective fitness function to
eliminate irrelevant and redundant features.
The collection of candidate feature subsets FS is exhaustive in number, and various
strategies have been developed to avoid enumerating all of them. The metaheuristic
IBES algorithm is incorporated into a wrapper-based approach, which is used for the
first time in this study for enhancing the intrusion detection system. BES is a
strategy originally proposed by Alsattar et al. in [14] which works on the assumption
of Eagle’s strategy of searching and obtaining prey using optimal hunting decision.
IBES, an enhanced version of BES, adopts the OBL function for enhancing the
population diversity and makes use of an effective fitness function for the removal
of redundant and irrelevant features.
An improved version of BES algorithm is adopted for feature selection as the
conventional BES algorithm lags in convergence speed. Opposition-based learning
(OBL) proposed by Tubishat et al. in [15] is adopted as it improves BES’s ability for
generating solutions. In addition, an effective fitness function is used for removal of
redundant and irrelevant features. After the BES algorithm has generated its initial
population, the OBL approach finds each of the initial solution’s opposite solution.
$$Y = lb + ub - \gamma \tag{5}$$
Then, the features are compared with the fitness function in each stage based on the
objective of maximizing accuracy, detection rate and minimizing false positive rate
for getting the best feature. Thus, based on the fitness, OBL takes the best n solution
from the set of initial and opposite solutions. Based on the classification accuracy,
the IBES calculates the fitness value for each possible solution. Furthermore, while
testing the dataset used by IBES to evaluate the SKiNET classification accuracy, the
training dataset is used to train the SKiNET classifier. Thus, IBES chooses the best
solution X* from the selected solutions.
$$\text{Fitness function:}\quad \text{fitness}_{\mathrm{IBES}}(Y) = \mathrm{DR} + [1 - \mathrm{FAR}] + F\left[1 - \frac{\sum_{i=1}^{N} F_i}{N}\right] \tag{6}$$
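A minimal sketch of the OBL step (Eq. 5) and the fitness evaluation (Eq. 6) used to rank candidate feature subsets (the bounds, the random initialization, the weight F and the stand-in DR/FAR values are assumptions for illustration; in the actual pipeline DR and FAR come from the trained classifier):

# Illustrative opposition-based learning step and IBES fitness evaluation.
import random

def opposite_solution(solution, lb, ub):
    """Eq. (5): Y = lb + ub - gamma, applied component-wise to a candidate solution."""
    return [lb + ub - gamma for gamma in solution]

def fitness_ibes(detection_rate, false_alarm_rate, selected_mask, f_weight=0.1):
    """Eq. (6): reward DR and (1 - FAR), and penalize large feature subsets."""
    subset_ratio = sum(selected_mask) / len(selected_mask)
    return detection_rate + (1.0 - false_alarm_rate) + f_weight * (1.0 - subset_ratio)

lb, ub = 0.0, 1.0
candidate = [random.uniform(lb, ub) for _ in range(5)]
opposite = opposite_solution(candidate, lb, ub)
# Keep whichever of the two candidates scores higher once DR/FAR are measured.
for solution in (candidate, opposite):
    mask = [value > 0.5 for value in solution]
    print(fitness_ibes(detection_rate=0.99, false_alarm_rate=0.01, selected_mask=mask))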
The network system consists of an input, a hidden and an output layer. On receiving
external input information, the active neurons respond to the input. Ultimately, the
winning neurons are excited to display the classification results, while the losing
neurons remain inactive. The SKiNET proposed
by Banbury et al. in [16] calculates the minimum distance probability based on
the Euclidean distance and classifies using maximum probability method. EKNN
predicts the appropriate feature class label by using kernel similarity function. In this
study, a hybrid classification algorithm is obtained by integrating SKiNET-EKNN.
Here, VMM executes the hybrid classification algorithm by which several DDoS
attacks are detected, especially in an IoT cloud environment. The conventional KNN
algorithm follows Euclidean distance to find out the similarity amongst the neigh-
bours. Here to enhance the KNN performance, Kernel-Based Similarity Measure is
introduced instead of the Euclidean distance measure. Since Euclidean distance cannot
classify linearly inseparable, complex datasets, a kernel function is adopted to
map the data points into a high-dimensional feature space, which eases handling of the data
and finding the closest one. Initially, in SKiNET, the feature values are normalized, and
the weight value is randomly assigned. Then, Euclidean distance is calculated for
every feature and weight. Finally, minimum distance value is determined using the
maximum probability method.
The testing distance values are compared with the SKiNET value; here, the K value
is fixed, and based on the K value all the distance values are ranked. From the ranked
results, the majority vote is taken as the classification prediction. Since it is
challenging to cluster linearly inseparable, complex datasets using Euclidean distances,
a kernel function is followed to create a high-dimensional feature space by mapping
the data points, which eases the clustering of the data. Let $S = \{\vec{u}_i\}_{i=1}^{N}$ represent
the dataset, where $\vec{u}_i \in \mathbb{R}^d$, and let $\phi$ be the nonlinear kernel function which is used for
mapping the raw data obtained from the input space $\mathbb{R}^d$ to the high-dimensional
feature space $H$, as shown below:

$$\phi : \mathbb{R}^d \rightarrow H, \quad \vec{u}_i \mapsto \phi\left(\vec{u}_i\right) \tag{7}$$

Hereafter, the following equations denote the kernel distance between two data points $\vec{u}_i$ and $\vec{u}_j$:

$$\left\|\phi\left(\vec{u}_i\right) - \phi\left(\vec{u}_j\right)\right\|^2 = \left(\phi\left(\vec{u}_i\right) - \phi\left(\vec{u}_j\right)\right)^{\top}\left(\phi\left(\vec{u}_i\right) - \phi\left(\vec{u}_j\right)\right) \tag{8}$$

$$= \phi^{\top}\left(\vec{u}_i\right)\phi\left(\vec{u}_i\right) - 2\,\phi^{\top}\left(\vec{u}_i\right)\phi\left(\vec{u}_j\right) + \phi^{\top}\left(\vec{u}_j\right)\phi\left(\vec{u}_j\right) \tag{9}$$

$$= K\left(\vec{u}_i, \vec{u}_i\right) - 2K\left(\vec{u}_i, \vec{u}_j\right) + K\left(\vec{u}_j, \vec{u}_j\right) \tag{10}$$
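A minimal sketch of this kernel distance and its use in the KNN step (the RBF kernel and its bandwidth are assumptions for illustration; the paper does not fix a particular kernel here):

# Illustrative kernel distance (Eqs. 8-10) with an assumed RBF kernel, used inside KNN.
import math

def rbf_kernel(u, v, gamma=0.5):
    """K(u, v) = exp(-gamma * ||u - v||^2); the kernel choice is an assumption."""
    squared = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared)

def kernel_distance_sq(u, v, kernel=rbf_kernel):
    """Eq. (10): K(u, u) - 2 K(u, v) + K(v, v), i.e. the squared distance in feature space."""
    return kernel(u, u) - 2.0 * kernel(u, v) + kernel(v, v)

def knn_predict(query, train, k=3):
    """Rank training points by kernel distance and take the majority label of the top k."""
    ranked = sorted(train, key=lambda item: kernel_distance_sq(query, item[0]))[:k]
    labels = [label for _, label in ranked]
    return max(set(labels), key=labels.count)

train = [([0.1, 0.2], "benign"), ([0.2, 0.1], "benign"), ([0.9, 0.8], "attack"),
         ([0.8, 0.9], "attack"), ([0.85, 0.95], "attack")]
print(knn_predict([0.88, 0.9], train))   # attack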
4 Experimental Analysis
NSL-KDD is an extended version of the original KDD cup 99 dataset. This dataset
consists of redundant data that causes imbalance in learning leading to biased results
(Table 1).
Thus, in conclusion, the accuracy of the IBES-based approach is better, peaking at 99.5%.
In terms of the next parameter, FPR, the proposed approach again leads with 0.3%.
Considering DR, the current approach comes next to Muhammed et al. with a rate of 99.12%, which is a mere
0.26% less than the highest performance of 99.38% recorded by Muhammed et al.
Table 3 Categorization of attacks
Category of attack    ACC (%)   DR (%)   FPR (%)
Normal                99.91     100      0.9
Combo                 99.82     99.9     0.81
Junk                  99.4      99.7     2.41
TCP                   99.91     99.9     0.39
UDP                   99.61     99.9     3.2
SCAN_MIRAI            99.73     99.8     0.81
SYN                   99.57     99.71    1.62
UDP_Mirai             99.82     99.9     0.81
UDP_Plain             99.56     99.7     1.62
Table 4 Overall performance of datasets
Dataset used    ACC (%)   DR (%)   FPR (%)
NSL KDD-99      99.5      99.12    0.3
UNSW NB15       99.6      99.02    1
N-BaIoT         99.7      99.8     0.2
BASHLITE and Mirai are two of the foremost general IoT botnet dataset families.
This is the new IoT Botnet dataset that is valued as a benchmark dataset for the
proposed IDS. Table 3 presents the performance of each intrusion category in terms of the
three aspects, namely accuracy (ACC), detection rate (DR) and false positive rate (FPR),
for the N-BaIoT dataset.
In summation from Table 4, the current approach using N-BaIoT dataset has
outperformed the other two. It is observed that the IBES optimization approach has
yielded a better performance. The use of EPIA for filtering packets on the basis of
confidence level and Shannon entropy for achieving a reduction or easing congestion
in traffic flow is worthy of implementation; the use of the IBES optimization algorithm
for selecting optimal features and the adoption of HSkiNET_EKNN hybrid classifi-
cation have all contributed for achieving better outcomes. This stands as a testimony
for the validity of the proposed improved bald eagle search approach in the substantial
detection and confrontation of flash attack in a cloud environment.
5 Conclusion
This study proposes the adoption of a new IDS for effectively predicting the malware
attacks in the VM. Initially, EPIA filtered the low-risk malicious packets and it used
Shannon entropy for identifying the packet flow properties. Then, the filtered packets
are sent for selection of optimal features. The OBL with added fitness function
enhanced the algorithm and improved the convergence time yielding commend-
able results. Finally, the accuracy in classification has been improved by HSKiNET-
EKNN. Here, SKiNET takes the minimum distance probability value and predicts
the results using EKNN. Also, the KNN is enhanced by kernel-based similarity
function. The simulated performance of proposed approach has been tested on three
benchmarked datasets such as NSL-KDD, UNSW-NB15 and N-BaIoT. The proposed
model has achieved a maximum ACC of 99.7%, DR of 99.80% and low FPR of 0.2%,
pointing out that the enhanced IDS as a detection and confrontation strategy for
curbing DDoS attack in a cloud environment is efficacious.
References
1. Li D, Deng L, Lee M, Wang H (2019) IoT data feature extraction and intrusion detection system
for smart cities based on deep migration learning. Int J Inf Manage 49:533–545
2. Moustafa N, Creech G, Slay J (2018) Anomaly detection system using beta mixture models and
outlier detection. In: Progress in computing, analytics and networking. Springer, pp 125–135
3. Ficco M, Tasquier L, Aversa R (2013) Intrusion detection in cloud computing. In: P2P, parallel,
grid, cloud and internet computing, pp 276–283
4. Singh DP, Borisaniya B, Modi C (2016) Collaborative ids framework for cloud. Int J Network
Secur 18(4):699–709
5. Mishra P, Pilli ES, Varadharajan Y, Tupakula U (2017) Out-VM monitoring for malicious
network packet detection in cloud. In: ISEA asia security and privacy IEEE, pp 1–10
6. Arjunan K, Modi CN (2017) An enhanced intrusion detection framework for securing network
layer of cloud computing. In: ISEA asia security and privacy IEEE, pp 1–10
7. Rezvani M (2018) Assessment methodology for anomaly-based intrusion detection in cloud
computing. J AI Data Min 6(2):387–397
8. Homayoun S, Ahmadzadeh M, Hashemi S, Dehghantanha A, Khayami R (2018) BoTShark: a
deep learning approach for botnet traffic detection. In: Cyber threat intelligence, Springer, pp
137–153
9. Patil R, Dudeja H, Gawade S, Modi C (2018) Protocol specific multi-threaded network intru-
sion detection system (PM-NIDS) for DoS/DDoS attack detection in cloud. In: 2018 9th Inter-
national conference on computing, communication and networking technologies IEEE, pp
1–7
10. Li C, Wang J, Ye X (2018) Using a recurrent neural network and restricted Boltzmann machines
for malicious traffic detection. Neuro Quantology 16(5):823–831
11. Chergui N, Boustia N (2019) Contextual-based approach to reduce false positives. IET Inf
Secur 14(1):89–98
12. Telikani A, Gandomi AH (2019) Cost-sensitive stacked auto-encoders for intrusion detection
in the Internet of Things. Internet of Things 1–25
13. Balamurugan V, Saravanan R (2019) Enhanced intrusion detection and prevention system on
cloud environment using hybrid classification and OTS generation. Clust Comput 22(6):13027–
13039
14. Alsattar HA, Zaidan AA, Zaidan BB (2020) Novel meta-heuristic bald eagle search optimisation
algorithm. Artif Intell Rev 53:2237–2264
15. Tubishat M, Idris N, Shuib L, Abushariah MAM, Mirjalili S (2020) Improved Salp Swarm
Algorithm based on opposition based learning and novel local search algorithm for feature
selection. Expert Syst Appl 145:113122
1 Introduction
There is no single specific test that can be used to detect Parkinson’s disease.
To rule out the presence of other illnesses, various procedures such as positron
emission tomography (PET) scan, brain ultrasonography, and magnetic resonance
imaging (MRI) can be performed. However, they are not very useful in the detection
of Parkinson’s disease.
Parkinson’s disease takes time to diagnose. Following the diagnosis, regular meet-
ings with neurologists are required to assess the patient’s status and symptoms over
time and accurately diagnose this disease. According to a study, the clinical diagnosis
of Parkinson’s disease is accurate 80.6% of the time [1].
With the advance of technology, various deep neural network
(DNN), machine learning (ML), and artificial neural network (ANN) models are avail-
able, but they are extremely generalized. No two people experience this disease in the
same way. Hence, it has been decided to come up with an amalgamated and person-
alized model that takes into consideration various parameters like medical history,
voice analysis, and handwriting analysis which will potentially help in detecting this
disease.
2 Related Work
A literature survey of some latest research papers gave an insight into how to go
about the project and the areas that require focus to overcome the shortcomings of
the previous work.
The following significant insights can be drawn from the research paper [2]:
• Provide the long-term monitoring and detection of movement-related and other
non-movement-related symptoms.
• There will be continuous monitoring.
• Smartphones are widely used these days and making use of them as smart devices
with multiple sensors is more feasible.
• Since all minor symptoms are to be captured, monitoring must be done for a long
time. The users must trust the system and be up to it.
• More data is required.
According to the research paper [3], the following major insights can be derived:
• From the speech recordings, feature extraction, feature subset selection, and
classifying features were done. The performance of the system is assessed.
• In this voice-based detection system,
– The most recent and largest publicly available dataset is used.
– A small number of features are considered.
• The key lies in selecting features in a small number, which are relevant and
unrelated.
• No other symptoms are explored.
The authors of the research paper [4] propose the following:
• Two vocal recordings are taken using a smartphone (SP), microphone, and acoustic
cardioid (AC) channels. This data is classified using support vector machine
(SVM) and K-nearest neighbors (KNN) algorithms.
• The features of voice data such as vocal frequency when the patient pronounces
certain words are taken as inputs.
• Validation is checked.
• No other symptoms are explored.
• In this model, the language used in the input must be the same as dataset languages.
The research paper [5] suggests the following:
• The following types of algorithms are used in this system:
– Four feature selection algorithms.
– Six classification algorithms, including multilayer perceptron (MLP), naive Bayes,
K-nearest neighbors (KNN), and support vector machine (SVM).
– Two validation algorithms.
• Some new dysphonia measures were used to extract 132 features of speech signals.
The dataset comprises only certain vowels.
• Feature selection methods are studied for feasibility and then applied to the
samples. This is to find more relevant features as this will increase the performance
of the classifiers.
• The input words are restricted to certain words only.
• No other symptoms are explored.
The authors of the research paper [6] intend to put forth the following:
• Parkinson’s disease (PD) is detected by analyzing gait features using a deep
learning approach.
• Data is collected via wireless sensors. This can be used to know the severity level
of the disease progression.
• Sensors are placed under the feet of the patient, which will measure the changing
weight when the patient is in motion.
• No other symptoms are explored.
The research paper [7] proposes the following methodology:
• Speech is one of the factors which help in the detection of Parkinson’s disease as
it affects various factors of speech.
• Speech signals are recorded and pre-processed; conversion into intrinsic mode functions (IMFs), feature extraction, and classification using support vector machine (SVM) and random forest (RF) are then performed.
• The voice signals are converted into IMFs by making use of the empirical mode decomposition (EMD) technique.
• The following insights can be derived:
– Initial four IMFs give vocal tract information of the patient.
– Other IMFs give vocal fold vibration information.
• Only the voice dataset is used. Other symptoms are not explored.
Research paper [8] suggests the following:
• Thirteen people with Parkinson's disease were chosen for the study and were required to wear a flexible sensor and a smartwatch on the hand that is most affected.
• A physician rated the severity of tremor and involuntary movements in each hand as the participants completed various motor tasks. Then, machine learning models were employed on the acquired data to distinguish between them, and their performance was compared while utilizing multiple types of sensors.
• Data collection is very flexible, as a wearable sensor and a smartwatch are used on the affected hand.
• Sensors have limited battery life and memory capacity.
Some of the evident and major deliverables from the literature survey are:
• Only voice criteria are taken into consideration, which may or may not be reliable in many cases.
• Other major symptoms are almost unexplored.
• These tests are not personalized. Only general outcomes are given.
• The patient’s health history is not taken into consideration.
3 Proposed Methodology
The model proposed is not exclusive to any one symptom of Parkinson's. Since the goal is to make it as personalized as possible, factors of personal health condition along with voice and handwriting are taken into consideration, and an overall result is obtained which will help the user take appropriate measures to alter their lifestyle so as not to aggravate the disease.
To achieve this, various machine learning models are built. The inputs given by the user are quantified to the required type and fed to the model that gives the best/most accurate results. Furthermore, the results obtained from the models are processed, and the most suitable final output is given with three possibilities:
1. There are high chances the person has developed PD (Parkinson’s disease).
2. There are chances the person may develop PD in the future.
3. The person is healthy (Fig. 1).
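Purely as an illustrative sketch, one simple way to combine the three module results ("Result 1" to "Result 3") into these three possibilities is shown below in Python; the voting rule and function name are assumptions for illustration, not the authors' method.

# Hypothetical sketch: combining the questionnaire, voice, and spiral results.
# result1: "high" / "at_risk" / "healthy" from the questionnaire module,
# result2, result3: 1 (Parkinson's) or 0 (healthy) from the voice and
# spiral-drawing classifiers. The voting rule is an illustrative assumption.

def final_output(result1, result2, result3):
    positives = int(result2) + int(result3) + (1 if result1 == "high" else 0)
    if positives >= 2:
        return "There are high chances the person has developed PD"
    if positives == 1 or result1 == "at_risk":
        return "There are chances the person may develop PD in the future"
    return "The person is healthy"

print(final_output("at_risk", 0, 1))  # -> may develop PD in the future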
• Based on studies from various medical journals, some of the major factors that
may lead to the development of Parkinson’s disease were compiled [17, 18].
• Questions are framed with certain weights as follows:
I. Is your age over 60?
Yes-1 No-0.
II. Your gender?
Male-1 Female-0.
III. Do you have a parent(s) or sibling(s) who is/are affected by Parkinson’s?
Yes-1 No-0.
IV. Have you ever been exposed to chemicals like pesticides and herbicides;
Agent Orange (an herbicide and defoliant chemical); or worked with heavy
metals, detergents, or solvents?
Yes-1 No-0.
V. Have you ever received severe head trauma?
Yes-1 No-0.
VI. Do you consume medications such as antipsychotics for treating severe
paranoia and schizophrenia?
Yes-1 No-0.
VII. Have you been exposed to nicotine for longer durations of time?
Yes-0 No-1.
VIII. Have you been consuming coffee/caffeine products for a long duration of
time?
Yes-0 No-1.
IX. Do you lead an active lifestyle with regular exercise?
Yes-0 No-1.
X. Have you been consuming statins to reduce cholesterol levels?
Yes-0 No-1.
• A weight of 1 indicates that the answer raises the probability of developing Parkinson's disease, while 0 indicates the opposite.
• The threshold value is set to 5, which is the midpoint of the total weightage of the questions.
• If the patient's score is above 5, the probability of developing the disease is very high; if it is less than 4, the probability is low.
• A score around the midpoint, i.e., 4, 5, or 6, may indicate that the person is at risk of developing this disease later on (a scoring sketch is given after this list).
• The output from this module is known as “Result 1” (Fig. 2).
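As an illustration of this questionnaire module, a minimal Python sketch of the weighted scoring and banding described above is given below; the exact band edges (the midpoint band 4-6 is treated here as "at risk") and the function name are assumptions for illustration, not the authors' implementation.

# Hypothetical sketch of the questionnaire-based risk scoring ("Result 1").
# The ten binary answers are already mapped to their 0/1 weights as described
# in the questionnaire above; the band edges are an assumed reading of the text.

def questionnaire_result(weights):
    """weights: list of ten 0/1 values, one per question."""
    if len(weights) != 10:
        raise ValueError("expected answers to all ten questions")
    score = sum(weights)
    if score >= 7:
        return "High chances the person has developed PD"
    if score >= 4:          # scores of 4, 5, or 6 fall in the midpoint band
        return "The person may develop PD in the future"
    return "The person is healthy"

# Example: an elderly male with a family history but an active lifestyle.
print(questionnaire_result([1, 1, 1, 0, 0, 0, 1, 1, 0, 0]))  # sum = 5 -> at risk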
• The complete input image is separated into several parts, where the magnitude
and direction are calculated for each part and these are utilized to determine the
change in direction of spiral images.
• As a result, it is used to distinguish between the shape of the spiral in PD and
non-PD patients.
Random Forest (RF) Classifier
• This is a technique that builds a series of decision trees from a randomly selected
subset of the training data.
• The votes from several decision trees are then combined to determine the final
result or class of the object.
• The feature vectors from all images retrieved using the HOG descriptor are used
to train a random forest model.
• It is used to classify testing data as healthy or Parkinson’s, and the accuracy of
the model is calculated.
• Finally, the spiral images will be classified as healthy (0) or Parkinson’s (1).
• Output from this module is known as “Result 3” (a sketch of this pipeline is given after this list).
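A minimal sketch of this spiral-drawing pipeline (HOG features feeding a Random Forest), using scikit-image and scikit-learn, is shown below; the image size, HOG parameters, and variable names are assumptions for illustration rather than the authors' exact configuration.

# Hypothetical sketch of the spiral-image module ("Result 3"):
# HOG descriptors of spiral drawings are classified by a Random Forest.
import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def hog_features(image, size=(200, 200)):
    """Resize a grayscale spiral image and extract its HOG descriptor."""
    image = resize(image, size)
    return hog(image, orientations=9, pixels_per_cell=(10, 10),
               cells_per_block=(2, 2))

def train_spiral_classifier(train_images, train_labels):
    """train_labels: 0 = healthy, 1 = Parkinson's."""
    X = np.array([hog_features(img) for img in train_images])
    clf = RandomForestClassifier(n_estimators=100, random_state=42)
    clf.fit(X, train_labels)
    return clf

def evaluate(clf, test_images, test_labels):
    X = np.array([hog_features(img) for img in test_images])
    preds = clf.predict(X)
    return preds, accuracy_score(test_labels, preds)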
5 Results
The expected results from the possible inputs and the verification based on actual
results are as in Table 1.
Table 1 (continued) pairs each test case's questionnaire responses and their weights (sums of 3, 6, and 7 in the cases shown) with the corresponding extracted voice-feature values and the resulting classification; for one misclassified case, the table attributes the inaccurate output to the Random Forest classifier's accuracy of 86.67%.
6 Conclusion
References
Parkinson’s disease symptoms using wearable sensors. J NeuroEng Rehabil. 17. https://doi.
org/10.1186/s12984-020-00684-4
9. Ali L, Zhu C, Zhang Z, Liu Y (2019) Automated detection of Parkinson’s disease based
on multiple types of sustained phonations using linear discriminant analysis and genetically
optimized neural network. IEEE J Trans Eng Health Med 1–1. https://doi.org/10.1109/JTEHM.
2019.2940900
10. Schrag A, Anastasiou Z, Ambler G, Noyce A, Walters K (2019) Predicting diagnosis of
Parkinson’s disease: a risk algorithm based on primary care presentations. https://doi.org/10.
1002/mds.27616
11. Grover S, Bhartia S, Yadav A, Seeja KR (2018) Predicting severity of Parkinson’s disease
using deep learning. Procedia Comput Sci 132:1788–1794. ISSN 1877–0509. https://doi.org/
10.1016/j.procs.2018.05.154
12. Fayyazifar N, Samadiani N (2017) Parkinson’s disease detection using ensemble techniques
and genetic algorithm. Artif Intell Sig Process Conf (AISP) 2017:162–165. https://doi.org/10.
1109/AISP.2017.8324074
13. Sriram TVS, Rao MV, Narayana GVS, Kaladhar DSVGK (2015) Diagnosis of Parkinson
disease using machine learning and data mining systems from voice dataset. In: Satapathy S,
Biswal B, Udgata S, Mandal J (eds) Proceedings of the 3rd international conference on fron-
tiers of intelligent computing: theory and applications (FICTA) 2014. Advances in intelligent
systems and computing, vol 327. Springer, Cham. https://doi.org/10.1007/978-3-319-11933-
5_17
14. Williamson JR, Quatieri TF, Helfer BS, Ciccarelli G, Mehta DD (2015) Segment-dependent
dynamics in predicting Parkinson’s disease. In: Proceedings of InterSpeech, pp 518–522
15. Jobbagy A, Furnee H, Harcos P, Tarczy M, Krekule I, Komjathi L (1997) Analysis of movement
patterns aids the early detection of Parkinson’s disease. In: Proceedings of the 19th annual
international conference of the IEEE engineering in medicine and biology society. Magnificent
milestones and emerging opportunities in medical engineering (Cat. No.97CH36136), pp 1760–
1763 vol 4. https://doi.org/10.1109/IEMBS.1997.757066
16. Hariharan M, Polat K, R Sindhu (2014) A new hybrid intelligent system for accurate detection
of Parkinson’s disease. Comput Methods Prog Biomed 113(3):904–913. ISSN 0169–2607.
https://doi.org/10.1016/j.cmpb.2014.01.004
17. Johns Hopkins Medicine, Parkinson’s Disease risk factors and causes. https://www.hopkin
smedicine.org/health/conditions-and-diseases/parkinsons-disease/parkinsons-disease-risk-fac
tors-and-causes
18. Medical News Today. https://www.medicalnewstoday.com/articles/323440
19. ICS UCI Machine Learning Databases. https://archive.ics.uci.edu/ml/machine-learning-databa
ses/parkinsons//
Design and Implementation of Flyback
Converter Topology for Dual DC Outputs
1 Introduction
Due to the increasing power demand, there is a vast increase in power generation, which may come from renewable or non-renewable energy sources. To reduce carbon emissions and protect the environment, renewable (green) energy should be preferred, and the energy obtained may be in AC or DC form. Because energy must be converted between AC–DC, DC–AC, AC–AC, DC–DC, and AC–DC–AC, converter topologies have evolved [1–3], and many different topologies have appeared in the last few years. This article focuses on the flyback converter topology for charging adapters (mobile, laptop, etc.) and SMPS circuits. It is used for many low-power output applications; the output power rating of a flyback converter should not exceed 100 W, and it is generally suitable for high-voltage, low-power applications. Its most important features are simplicity, low cost, and galvanic isolation [4, 5].
The flyback converter is a converter used to convert electrical energy from DC to DC or AC to DC [6]. It is similar to the buck–boost converter; the distinction is the transformer, which stores energy and provides isolation between input and output in the flyback converter, whereas the buck–boost converter provides no isolation between the input and output terminals. It is an SMPS circuit used for many low-power output applications, and the output power rating of the flyback converter should not exceed 100 W [7–9]. DC or unregulated DC voltage is supplied as input to the circuit, or an AC voltage is rectified and given as input to the flyback converter. With this converter, single or multiple output voltages isolated from the input voltage can be obtained [10].
DC-DC converters can be divided into hard-switching and soft-switching converters. Because of the low efficiency of hard-switching converters, soft-switching technology is becoming more and more popular. Isolated converter topologies such as forward, flyback, and push–pull are usually used for SMPS applications. The switching frequencies of these converters are typically in the kHz range, which reduces the size of the transformer [11, 12]. The flyback converter is the most used SMPS circuit for low-power output applications, with the advantage of an isolated output. From an energy efficiency standpoint, flyback power supplies are inferior to many other SMPS circuits, but their simple topology makes them popular in the low-cost, low-output power range [13]. Reference [14] proposes ZVS technology for various non-isolated DC-DC converters. There is a limit to the voltage gain that can be achieved using a buck–boost or boost converter; it is undesirable to operate a boost or buck–boost converter at a very high duty ratio because of the very high capacitor current ripple. The solution is therefore to choose an isolated topology to obtain the required high voltage amplification between the battery and the DC bus.
The topology of a flyback converter is essentially a coupled inductor, a PWM
controlled switch on the primary side, and a diode on the secondary side of the
coupled inductor with the capacitor connected across the load. Figure 1 shows the
basic configuration of a flyback converter. Here, MOSFET is used to get fast dynamic
control over duty ratio. The coupled inductor is used for voltage isolation with a series
opposition connection.
This paper also gives information on the classification of converters in terms of efficiency and power density; SMPSs are more useful than linear power supplies. Most advanced communication and computer systems need SMPSs with high power density, high efficiency, and constant operating frequency. In the last decade, many power converter topologies have been proposed for switch-mode power supply applications. A power converter is an electrical or electromechanical device used for converting power from AC to DC or DC to DC [15]. In [16], the classification of DC-DC converters is given. A design methodology for the transformer in a flyback converter that automatically achieves soft switching of the main switch is given in [17]. No extra circuit or control scheme is required to achieve zero-current turn-ON and full zero-voltage switching (during both turn-ON and turn-OFF) of the MOSFET. It also eliminates the dissipative snubber, which is otherwise a vital part of the flyback converter. However, this scheme is only applicable to fixed-input and constant-load applications.
• The input voltage source is directly connected to the transformer primary winding
when the switch is closed. The transformer stores energy as the primary current
and magnetic flux increase in the transformer. Due to the sign convention, a negative voltage is induced in the secondary winding, so the diode is reverse biased. The load is then supplied from the output capacitor.
Fig. 3 Mode of operation a Mode 1 equivalent circuit diagram and b Mode 2 equivalent circuit
diagram
• When the switch is turned OFF, the primary current and magnetic flux drop. Due to the sign convention, the secondary voltage becomes positive and the diode is forward biased, which makes current flow from the secondary of the transformer; as a result, the output capacitor is charged and supplies the load.
Different circuit configurations are adopted during the operation of the flyback converter. Each circuit configuration is referred to as a mode of circuit operation. With the help of equivalent circuits, the different modes of operation of the flyback converter circuit are explained.
Mode 1: When the switch is closed, the primary winding of the transformer is connected to the input supply with its dotted end connected to the positive side. Current then starts to flow in the primary winding. Because the diode is reverse biased, no current flows in the secondary winding. This primary current is responsible for establishing flux in the core of the transformer. In the equivalent circuit shown below, a conducting device is treated as a short circuit and a non-conducting device as an open circuit. The switches and diodes used are assumed to be ideal, with zero voltage drop during the ON state and zero leakage current during the OFF state. The current through the primary winding is related to the applied DC voltage by Eq. (1).
E_{dc} = L_{pri} \cdot \frac{dI_{pri}}{dt} \quad (1)
where E_{dc} is the DC input voltage, L_{pri} is the primary winding inductance, and I_{pri} is the current through the primary winding.
During this operation, the voltage induced in the secondary winding is constant and is given by Eq. (2):
V_{sec} = E_{dc} \cdot \frac{N_2}{N_1} \quad (2)
The voltage across the diode, which is connected in series with the secondary winding, is given by Eq. (3):
V_d = V_0 + V_{sec} \quad (3)
Mode 2: After conducting for some time, the switch is turned OFF and the primary side becomes an open circuit. Both the voltage and the current in the primary winding drop. The diode becomes forward biased due to the change in polarity of the secondary winding, so it starts conducting and recharges the capacitor, which supplies the load. The secondary current in terms of the secondary inductance during this operation is given by Eq. (4):
L_{sec} \cdot \frac{dI_{sec}}{dt} = -V_0 \quad (4)
The voltage across the switch during this operation is
V_{SW} = E_{dc} + V_0 \cdot \frac{N_1}{N_2} \quad (5)
Considering the input voltage range (V_{cc,max} = 370 V and V_{cc,min} = 210 V), D_{max} = 0.45, and B_{max} = 0.2 Wb/m², the circuit parameters for the first DC output (V_{01} = 12 V and I_{01} = 2 A) are calculated as follows.
The secondary power (P_{02}) can be calculated using
P_{02} = \sum_i V_{0i} \cdot I_{0i} \cdot \frac{1 - D_{min}}{D_{min}} \quad (6)
where the minimum duty ratio is
D_{min} = \frac{D_{max}}{D_{max} + (1 - D_{max}) \cdot V_{cc,max}/V_{cc,min}} \quad (8)
Now, substituting Eq. (8) in Eq. (6), we get the value of P_{02}. The turns ratio N can be calculated using Eq. (9):
N = \frac{V_{0i}}{V_{cc,min}} \cdot \frac{1 - D_{max}}{D_{max}} \quad (9)
where N is the secondary-to-primary turns ratio. The core for the flyback converter is then selected using the area product given by Eq. (10):
A_p = \frac{\dfrac{P_{02}}{\eta}\left(\sqrt{\dfrac{4D}{3}} + \sqrt{\dfrac{4(1-D)}{3}}\right)}{K_w \cdot J \cdot B_m \cdot f_s} \quad (10)
where K_w is the window utilization factor, J is the current density, B_m is the maximum flux density, and f_s is the switching frequency (100 kHz). Considering 80% efficiency and the calculated value of P_{02} = 58.17 W, we get A_p = 5304 mm². Therefore, the EE30/15/7 core is chosen for the design. The numbers of turns in the primary and secondary windings can be found from Eqs. (11) and (12):
N_2 = N \cdot N_1 \quad (12)
By substituting the calculated turns ratio N = 0.0784 and the number of turns in the primary winding (N_1), the number of turns in the secondary (N_2) is obtained as 4.
Similarly, for V_0 = 5 V, I_0 = 0.5 A, and P_{02} = 14 W, the number of turns for the second supply output is calculated in the same manner, giving 2 turns.
The design values of the flyback converter are stated in Table 1.
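For context, the short Python sketch below walks through the design relations of Eqs. (6)-(12) with the specification quoted above; the window utilization factor, current density, and the primary turn count are assumed values (they are not given in this form in the text), so the printed numbers only approximate the figures reported by the authors.

# Hypothetical worked example of the flyback design equations (6)-(12).
# Specification values are taken from the text; Kw, J, and N1 are assumed,
# so results only approximate the paper's quoted numbers.
import math

Vcc_max, Vcc_min = 370.0, 210.0      # input voltage range (V)
D_max = 0.45                         # maximum duty ratio
outputs = [(12.0, 2.0), (5.0, 0.5)]  # (V0, I0) for the two DC outputs
eta = 0.8                            # efficiency considered in the text
Bm = 0.2                             # maximum flux density (Wb/m^2)
fs = 100e3                           # switching frequency (Hz)
Kw, J = 0.4, 3e6                     # assumed window factor, current density (A/m^2)

# Eq. (8): minimum duty ratio at maximum input voltage
D_min = D_max / (D_max + (1 - D_max) * Vcc_max / Vcc_min)

# Eq. (6): total secondary power
P02 = sum(v * i for v, i in outputs) * (1 - D_min) / D_min

# Eq. (9): secondary-to-primary turns ratio for the 12 V output
N = (outputs[0][0] / Vcc_min) * (1 - D_max) / D_max

# Eq. (10): area product used for core selection
Ap = (P02 / eta) * (math.sqrt(4 * D_max / 3) + math.sqrt(4 * (1 - D_max) / 3)) \
     / (Kw * J * Bm * fs)

print(f"D_min = {D_min:.3f}")
print(f"P02   = {P02:.1f} W   (text quotes 58.17 W)")
print(f"N     = {N:.4f}      (text quotes 0.0784)")
print(f"Ap    = {Ap * 1e12:.0f}        (text quotes 5304)")

# Eq. (12): secondary turns from an assumed primary turn count
N1 = 51                              # hypothetical primary turns (Eq. (11) not reproduced here)
N2 = round(N * N1)
print(f"N2    = {N2} turns for the 12 V output")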
In a flyback converter, the selection of the MOSFET is important. It carries the primary side current when the switch is ON, and the voltage across the switch is then zero. When the switch is OFF, the voltage across the switch is the input voltage (V_D) plus the reflected secondary side voltage (V_{01}/n).
• The voltage across the MOSFET during the ON period: V_{SW} = 0 V.
• The voltage across the MOSFET during the OFF period: V_{SW} = V_D + (V_{01}/n).
where V_D is the input DC voltage and V_{01} is the output voltage. While selecting the MOSFET, we should also consider the transformer leakage inductance. Allowing for ringing of the MOSFET of 30-40% of the total voltage, the voltage across the switch during the turn-OFF period is given as
V_{SW} = \left(V_D + \frac{V_{01}}{n}\right) + 0.3\left(V_D + \frac{V_{01}}{n}\right) \quad (13)
It is therefore desirable to choose a MOSFET whose maximum voltage rating is higher than the calculated value V_{SW} = 676 V. Accordingly, the IRFBG30 is suitable; it has a maximum voltage rating of 1000 V and a maximum current rating of 3.1 A. For the selection of the output diode in the flyback converter, the voltage across the diode when the MOSFET is in the ON state is
V_{diode} = -(V_D \cdot n) \quad (14)
According to Eq. (14), because of the polarity of the secondary side of the transformer, a negative voltage is applied across the diode, so the diode must withstand this negative voltage (−30 V in this case). During the MOSFET OFF period, the diode is in the ON state and the voltage across it is zero.
Also, in the flyback converter, the average value of the diode current is the output current, so the diode must carry the output current of 2 A. Hence, the 1N4007 is a suitable diode for this application.
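As a quick numerical check on these ratings, the snippet below evaluates Eqs. (13) and (14) with the values quoted in the text; it is an illustrative calculation only, not part of the authors' design files, and the 30% ringing allowance follows Eq. (13).

# Hypothetical stress check for the switch and output diode (Eqs. (13)-(14)).
VD = 370.0      # maximum DC input voltage (V)
V01 = 12.0      # first output voltage (V)
n = 0.0784      # secondary-to-primary turns ratio from the design

Vsw = (VD + V01 / n) * 1.3      # Eq. (13): 30% ringing allowance
Vdiode = -(VD * n)              # Eq. (14): reverse voltage across the diode

print(f"Switch stress ~ {Vsw:.0f} V (text quotes 676 V); the 1000 V IRFBG30 has margin")
print(f"Diode reverse ~ {Vdiode:.0f} V, i.e., about -30 V as stated")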
A pulse width modulation (PWM) controller is used to control the power flow from the input to the output. Varying the pulse width changes the duty ratio, which regulates the power transferred from input to output. Here, the pulse generator provides the gate pulses for the MOSFET used in this work while keeping the operating frequency constant. The UC3845 is a high-performance, fixed-frequency, 8-pin PWM controller IC designed for DC-DC converter applications, offering the designer a low-cost solution.
The effectiveness of the designed flyback converter for different DC-link voltages is evaluated and verified through the MATLAB/SIMULINK platform. After validation of the simulation results, the hardware realization is carried out. The overall flyback converter design schematic with the controller is shown in Fig. 5.
Simulation Results: Dual output waveforms of the flyback converter are shown
in Figs. 6 and 7.
The hardware realization for the dual output flyback converter is shown in Fig. 8.
The hardware circuit of the flyback converter with the desired output voltage levels
can be seen. Hardware results are compared with the theoretical and simulation
results in Table 2. A slight difference can be seen between the voltage levels obtained in simulation and in hardware.
5 Conclusion
The concepts behind any power electronics design are best understood when an effort is made to develop the system. Hardware development gives an opportunity to study issues that are typically overlooked during theoretical study. In this article, a methodology was presented to design and implement a flyback converter circuit. The flyback converter, the most suitable SMPS topology here, was designed for dual outputs with a switching frequency of 100 kHz, with careful development of the converter transformer and the PWM controller. The MATLAB/SIMULINK simulation results and the experimental results of the hardware were compared and analyzed.
Fig. 8 Hardware realization: a flyback converter circuit, b output DC-link voltage of 11.15 V, c
output current of 1.28 A for first supply output, d output DC-link voltage of 5.51 V, and e output
current of 0.47 A for second supply output
Table 2 Comparison of output voltage with the theoretical and simulated value

Parameter   Theoretical value (V)   Simulation value (V)   Practical value (V)
V o1        12                      12.05                  11.15
V o2        5                       5.1                    5.51
References
1 Introduction
Humans and machines, including computers, may now communicate more quickly
because of recent electronics and sensor technology improvements. For IoT and
universal computing, this human–machine interface (HMI) system will become
increasingly important [1]. In most circumstances, communication begins when a machine (or an object) receives and interprets the intent of a human (the user). As a result, the HMI requires an input device to record the user's intent.
Human gestures provide for a more natural approach to HMI input. Human body
language is an intuitive communication technique for conveying, exchanging, inter-
preting, and understanding people’s thoughts, intentions, and emotions. As a result,
body language emphasizes or complements spoken language; it is a language in and of itself. Thus, human motions, such as hand gestures, should be included
for HMI input [2]. Gesture-based interactions are one of the most comfortable and
straightforward ways to communicate. On the other hand, gesture recognition has
various challenges before becoming widely recognized as an HMI input.
Human hand motions are substantially less diversified than the tasks required by the HMI, which poses a considerable challenge, since the functions of an HMI are more varied and complex. In the case of smartphones, this diversification tendency may
be seen. Only a decade ago, a variety of handheld electronic devices, such as mp3
players, cell phones, and calculators, coexisted to meet various human needs. On the
other hand, almost all these tasks have now converged into a single mobile device: the
smartphone. On the other hand, all human intentions are only conveyed by swiping
or tapping fingers on a smartphone’s touch screen.
When it comes to HMI inputs, one prevalent approach is gesture-based interac-
tion [3]. There are two hand gesture recognition techniques: vision-based recognition
(VBR) and sensor-based recognition (SBR). There have been studies in gesture recog-
nition, but most rely on computer vision. The efficiency of vision-based approaches or
the operation of such devices is highly dependent on lighting conditions and camera-
facing angles. It is inconvenient, and such limitations often limit the technology’s
usage in specific environments or for certain users.
Sensors include electromyography, touch, strain gages, flex, inertial, and ultra-
sonic sensors [4]. The most often utilized sensors are inertial sensors [5, 6], which comprise accelerometers, gyroscopes, and magnetometers.
Sensor-fusion algorithms frequently combine many sensors. For example, a glove
with several wearable sensors has been claimed to monitor hand motions [7]. A 3D
printer was used to create the glove housing, which includes flex sensors (on fingers),
pressure sensors (at fingertips), and an inertial sensor (on the back of one’s hand).
Inertial sensors are used to track hand motions in numerous sensor-fusion algo-
rithms. Additional hand data, such as finger snapping, hand grabbing, or finger-
spelling, is detected by other sensors, such as EMG. [8, 9]. Inertial and EMG sensors
are a popular combination. [8–13]. The inertial sensor determines the hand location,
while the EMG sensors offer additional information to comprehend complex finger
or hand gestures fully. Instead of EMG sensors, strain gages, tilt, and even vision
sensors can be used.
As a result, the amount of sensor data generated by these advanced gesture detec-
tion systems increases. Machine learning is being used to deal with the increasing
data. A variety of machine learning approaches have been applied to the sensor data. One sensor device runs a linear discriminant analysis or a support vector machine classifier [9, 14]. In another study, a feedforward neural network (FNN) is used for digitizing,
coding, and interpreting signals from a MEMS accelerometer [15].
In the meantime, inertial sensor-only techniques have been developed. This
inertial-sensor-only technique may improve portability and mobility while mini-
mizing processing needs in cases involving numerous sensors or complicated algo-
rithms. The handwriting was rebuilt using the phone’s gyroscope and accelerometer
after users used a smartphone as a pen to write words [16]. English and Chinese characters, as well as emojis, were written in this way. Kinematics based on inertial sensor inputs were employed in other studies to track the movement of hands and arms [17–19]. Recognizing head or foot motions has also been described [20, 21], but these methods have not been adapted for hand gesture identification.
Inertial sensors are unquestionably accurate and fast as HMI input devices.
However, these two objectives are incompatible because increased accuracy typi-
cally increases computing load, resulting in sluggish speed. Furthermore, user move-
ments should be uncomplicated. Inertial-sensor-based gesture recognition systems,
yet again, have substantial drawbacks. One limitation is the accumulation of iner-
tial sensor noise, which creates bias or drift in the system output [22]. The second
disadvantage is that MEMS gyroscope and accelerometer signals can be corrupted by vibration [23].
To overcome these challenges, signal processing of inertial sensor outputs has been intensively researched, from simple constructions (such as moving average filters) to recently developed techniques (such as machine learning). Two recent approaches are digitizing sensor data to form codes and generating statistical measures of the signals to describe their patterns. One method identified seven hand motions using a three-axis MEMS accelerometer [24]; a Hopfield network labels positive and negative signs on the digitized and restored accelerometer signals.
These accelerometer-only systems are good at capturing linear gestures (such as up/down or left/right patterns) but not so good at capturing circular motions (e.g., clockwise rotation or hand waving). Recognizing linear and rotational gestures using both accelerometers and gyroscopes has therefore been suggested. Using accelerometer and gyroscope sensors mounted on the forearms, researchers applied the Markov chain method to track the movement of the arms [25]. Continuous hand gestures (CHG), a real-time gesture identification method, was disclosed in another recent paper [26]. The approach begins by defining six basic gestures, determining their statistical measures, such as means and standard deviations (STDs), and then generating a database for each motion.
These accelerometer-gyroscope combinations are highly accurate, but they are neither portable nor inexpensive. A solution is needed if we want to reduce the system's size and cost and still provide numerous functions with a limited set of hand motions. To address these issues, this research aimed to create a small gesture detection device, a multifunctional HMI input device that responds accurately and quickly to the user's intention.
Recent work using sensor-based gestures as the HMI input has reported accelerometer, gyroscope, accelerometer-gyroscope fusion, ultrasonic, and combined accelerometer-gyroscope-electromyography approaches. Rotational motions cannot be detected with merely an accelerometer. As a result, this paper opts for an accelerometer-gyroscope fusion system in the hope of superior rotation sensing compared to accelerometer-only systems. We believe that the originality of this
project is critical for portable HMI input devices, and it is demonstrated using an
Arduino Nano 33 BLE board, a very light embedded device. It is one of the most
suitable embedded devices for the project, with a weight of 5 g, a length of 45 mm,
and a width of 18 mm. Although the Arduino Nano is one of the most portable and
lightweight, it also poses a challenge, i.e., memory constraints. Due to its small size,
it has only 1 MB of flash memory and 256 KB of static RAM, making working on
it very difficult.
Our proposed system is set up to implement several essential features. First, we use a collection of simple hand motions, each with predefined functions for different applications. Our system is aware of the program that is currently running. As a result, the procedure carried out by each motion can vary depending on the application, allowing for multifunction capabilities while reducing gesture complexity, resulting in a highly diverse HMI input device.
The second feature is that the complete hardware used is an Arduino Nano 33
BLE which is lightweight and quickly fitted on a stick. The Arduino Nano 33 BLE
consists of an inbuilt IMU with one miniature three-axis gyroscope and one miniature
three-axis accelerometer. The fusion of these two sensors provides us with a large amount of data about the acceleration and rotational motion of the object to which it is attached, i.e., the Arduino Nano 33 BLE itself.
The third feature is hand gesture recognition in real time. To lessen the delay caused by computing load, we train our model on a machine with ample processing capacity and then lower the model's size using quantization, which reduces the model to a size that the embedded device can handle.
The last feature is system accuracy. Even though the complexity grows and
numerous sensors are employed as input sensors to produce a single gesture, suffi-
cient accuracy should be ensured. We strive to apply a pre-processing approach of
rasterization that turns the data from the accelerometer and gyroscope into a raster-
ized image. We train our model, which gives us a very high accuracy, to eliminate
errors caused by hand tremors or inadvertent hand gestures.
Our input device can be used in a variety of ways. This technology can benefit input
devices such as computers, laptops, portable multimedia players, wireless remote
controllers for presentation applications, and virtual reality modules. For example, a
user might connect the input device to a laptop and give a presentation to an audience.
If he wants to show a video, he can pause it, play it, or turn up the volume.
Even if we want to interact with the computer in such situations, many input
devices, such as a keyboard or mouse, may be required. However, all these can be
replaced by a single input device, which is portable, accurate, and our approach’s
primary target.
An overview of the system design is shown in Fig. 1. The IMU containing the accelerometer and gyroscope generates acceleration and angular velocity data from hand gestures. This data is fed to a rasterization process, which converts the data stream into a rasterized image that is then given to a CNN machine learning model for training. This rasterization pre-processing makes the model robust to sensor noise, sensor limitations, and unwanted gestures. In addition, while training, the machine initially “learns” the preferences and habits of users; the pattern is fitted to the user's gestures by recording the data of a single motion multiple times (Fig. 2).
Fig. 2 Flowchart of the system in its different processes: a while training; b while predicting a gesture
4 Implementation
There are two primary components: the initialization phase and the main loop (Fig. 4).
4.1 Initialization
The initialization phase’s job is primarily to set up the IMU, and all the resources
needed to run the TensorFlow lite macro-model (Fig. 5).
The first step of the initialization phase is the IMU initialization, which is done
using this setup IMU routine. When you go into the setup IMU routine, you will
find device-specific calls that tap into the IMU functions that the library provides
(Fig. 6).
The second component is setting up all the resources needed to run the model. These include the pointer to the model, the interpreter initialized with the tensor arena, the resolver, and so on.
4.2 Main Loop
The main loop's job is to get data from the gyroscope and the accelerometer and then process it. Readily available function calls allow us to read the data from the gyroscope and the accelerometer. So, if data is available from the IMU, we process that data (Fig. 7).
The gyroscope ends up having a little bit of drift, so we must compensate for that drift, and that is what the gyroscope-drift estimation function does. When we determine from the accelerometer that the IMU is not moving, we can calculate the gyroscope's drift and then account for it. Next, we integrate the gyroscope's incremental angular changes that arrive over time, because that gives us the path of the gesture in a spherical coordinate system; that is what the orientation-update function does, continuously capturing the incoming path. Next, we project the path onto a two-dimensional plane, because it is much easier to interpret a 2D gesture than a complex 3D gesture, and the stroke-update function performs that flat mapping.
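The Python sketch below illustrates the drift compensation and orientation integration just described; the Arduino implementation itself is in C++, and the stationarity threshold, sample rate, axis mapping, and function names here are illustrative assumptions.

# Hypothetical sketch of gyroscope drift estimation and orientation integration.
import numpy as np

SAMPLE_RATE_HZ = 100.0            # assumed IMU sample rate
STATIONARY_ACCEL_TOL = 0.05       # assumed "not moving" tolerance in g

def estimate_gyro_drift(gyro_samples, accel_samples):
    """Average the gyroscope output over samples where the accelerometer
    magnitude is ~1 g, i.e., the device is judged to be stationary."""
    gyro_samples = np.asarray(gyro_samples, dtype=float)
    accel_samples = np.asarray(accel_samples, dtype=float)
    accel_mag = np.linalg.norm(accel_samples, axis=1)
    still = np.abs(accel_mag - 1.0) < STATIONARY_ACCEL_TOL
    if not np.any(still):
        return np.zeros(3)
    return gyro_samples[still].mean(axis=0)

def integrate_orientation(gyro_samples, drift):
    """Integrate drift-corrected angular rates (deg/s) into cumulative angles."""
    dt = 1.0 / SAMPLE_RATE_HZ
    return np.cumsum((np.asarray(gyro_samples, dtype=float) - drift) * dt, axis=0)

def update_stroke(angles):
    """Flatten the 3D angular path onto a 2D plane (assumed yaw -> x, pitch -> y)."""
    return angles[:, [2, 1]]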
Fig. 6 The model initialization. a Model variable and space allocation, b model interpreter variable allocation
Then comes the processing of the accelerometer data. We want to estimate gravity's direction in order to correct for the sensor's roll in the gyroscope readings. Everyone holds the stick at a slightly different angle, so this effect must be neutralized or normalized. For example, you might hold the stick with your right hand or with your left hand; the gesture you are performing is still the same thing, and either way the same character is drawn, so we have to compensate for that. The way we do that is by effectively estimating the roll from the gyroscope readings. We also update the velocity estimate to know when the sensor is still, so that we can correct the sensor drift.
4.3 Rasterization
After capturing the data, the next step is to rasterize the stroke. We pre-process the data in this way because it is easy to feed an image into a convolutional neural network, and the rasterize-stroke function does exactly that (Figs. 8 and 9).
Fig. 7 Read the data of the gyroscope and accelerometer and estimate the drift of the gyroscope
Fig. 8 Flatten the three-dimensional coordinates to two-dimensional coordinates and then rasterize
that into an image
4.4 Model
After pre-processing, the next part is to hand the rasterized image directly to our convolutional neural network. In this case, we pass in an RGB image, i.e., a red, green, and blue image, so there are three channels in the input we feed into the net. That input is then run through the convolutional neural network, which predicts the gesture. To invoke the model, we must set up the input buffers. Also, because of the memory constraints, we must quantize our model. Table 1 shows how quantization reduces the size of the model (Figs. 10, 11 and 12).
Fig. 10 Calling the TensorFlow lite micro-model for learning and classification
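For context, a small Keras CNN over a rasterized image and its post-training quantization to a TensorFlow Lite flatbuffer might look like the sketch below; the layer sizes, the 32x32x3 input, and the training call are assumptions for illustration, not the authors' exact architecture.

# Hypothetical sketch: a small CNN for rasterized gesture images, then
# post-training quantization so the model fits the microcontroller.
import tensorflow as tf

NUM_GESTURES = 5  # C, L, I, O, Z in this paper

def build_model(input_shape=(32, 32, 3)):
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Conv2D(8, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(NUM_GESTURES, activation="softmax"),
    ])

model = build_model()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=20)   # training happens on a host machine

# Post-training quantization shrinks the model for the 1 MB flash / 256 KB RAM
# budget of the Arduino Nano 33 BLE; the .tflite bytes are then converted to a
# C array for TensorFlow Lite Micro.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
with open("gesture_model.tflite", "wb") as f:
    f.write(tflite_model)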
4.5 Output
We read the output of the neural network to see which gesture it has identified, and to process the result we print it to the screen (Fig. 13).
The input device for HMI must perform a wide range of operations, yet it can only recognize a limited number of hand motions. The five gestures are depicted
in Figs. 14, 15, 16, 17 and 18. We move our IMU consisting of an accelerometer and
gyroscope in three-dimensional space.
Fig. 14 C Alphabet
Fig. 15 L Alphabet
Fig. 16 I Alphabet
Fig. 17 O Alphabet
Fig. 18 Z Alphabet
6.1 Verification
The first function is opening the media player. The second function plays and pauses the currently playing file. The third is to mute the video and to increase or decrease the volume. Figure 20 shows an experimental video file playback sequence with the play, pause, and volume controls.
Fig. 20 Sequence of experiments uses gestures as input to a multimedia video player and plays and pauses the video
7 Conclusion
This paper proposes a sensor-based gesture recognition system that can be used as an input device for an HMI system. Five gestures are used across multiple applications. The same device behaves differently depending on the type of application running on it: a gesture used to open one application could, in another application, mute the device, pause playback, or play it. The project's primary emphasis is on portability and on fast and reliable recognition. For portability, we have used the Arduino Nano 33 BLE. For fast and reliable recognition, we use the rasterization process, which converts the three-dimensional spherical coordinates into two-dimensional coordinates and then rasterizes them into an image. This image makes it easy to build a highly accurate and robust model, which ensures reliable gesture recognition.
References
1. Pavlovic VI, Sharma R, Huang TS (1997) Visual interpretation of hand gestures for human-
computer interaction: a review. IEEE Trans Pattern Anal Mach Intell 19:677–695
2. Zhang Z (2012) Microsoft Kinect Sensor and Its Effect. IEEE Multimed 19:4–10
3. Cavalieri L, Mengoni M, Ceccacci S, Germani MA (2016) Methodology to introduce
gesture-based interaction into existing consumer product. In: Proceedings of the international
conference on human-computer interaction, Toronto, ON, Canada, 17–22 July 2016; pp 25–36
4. Yang X, Sun X, Zhou D, Li Y, Liu H (2018) Towards wearable A-mode ultrasound sensing for
real-time finger motion recognition. IEEE Trans Neural Syst Rehabil Eng 26:1199–1208
5. King K, Yoon SW, Perkins N, Najafi K (2008) Wireless MEMS inertial sensor system for golf
swing dynamics. Sens Actuators A Phys 141:619–630
6. Luo X, Wu X, Chen L, Zhao Y, Zhang L, Li G, Hou W (2019) Synergistic myoelectrical
activities of forearm muscles improving robust recognition of multi-fingered gestures. Sensors
19:610
7. Lee BG, Lee SM (2018) Smart wearable hand device for sign language interpretation system
with sensors fusion. IEEE Sens J 18:1224–1232
8. Liu X, Sacks J, Zhang M, Richardson AG, Lucas TH, Van der Spiegel J (2017) The virtual
trackpad: an electromyography-based, wireless, real-time, low-power, embedded hand-gesture-
recognition system using an event-driven artificial neural network. IEEE Trans Circ Syst II Exp
Briefs 64:1257–1261
9. Jiang S, Lv B, Guo W, Zhang C, Wang H, Sheng X, Shull PB (2018) Feasibility of wrist-worn,
real-time hand, and surface gesture recognition via sEMG and IMU sensing. IEEE Trans Ind
Inform 14:3376–3385
10. Pomboza-Junez G, Holgado-Terraza JA, Medina-Medina N (2019) Toward the gestural inter-
face: a comparative analysis between touch user interfaces versus gesture-based user interfaces
on mobile devices. Univers Access Inf Soc 18:107–126
11. Lopes J, Simão M, Mendes N, Safeea M, Afonso J, Neto P (2019) Hand/arm gesture
segmentation by motion using IMU and EMG sensing. Procedia Manuf 11:107–113; Sensors
19:2562
12. Kartsch V, Benatti S, Mancini M, Magno M, Benini L (2018) Smart wearable wristband for
EMG based gesture recognition powered by solar energy harvester. In: Proceedings of the 2018
IEEE international symposium on circuits and systems (ISCAS), Florence, Italy, 27–30 May
2018, pp 1–5
13. Kundu AS, Mazumder O, Lenka PK, Bhaumik S (2017) Hand gesture recognition based
omnidirectional wheelchair control using IMU and EMG sensors. J Intell Robot Syst 91:1–13
14. Tavakoli M, Benussi C, Lopes PA, Osorio LB, de Almeida AT (2018) Robust hand gesture
recognition with a double channel surface EMG wearable armband and SVM classifier. Biomed
Signal Process Control 46:121–130
15. Xie R, Cao J (2016) Accelerometer-based hand gesture recognition by neural network and
similarity matching. IEEE Sens J 16:4537–4545
16. Deselaers T, Keysers D, Hosang J, Rowley HA (2015) GyroPen: gyroscopes for pen-input with
mobile phones. IEEE Trans Hum-Mach Syst 45:263–271
17. Abbasi-Kesbi R, Nikfarjam A (2018) A miniature sensor system for precise hand position
monitoring. IEEE Sens J 18:2577–2584
18. Wu Y, Chen K, Fu C (2016) Natural gesture modeling and recognition approach based on joint
movements and arm orientations. IEEE Sens J 16:7753–7761
19. Kortier HG, Sluiter VI, Roetenberg D, Veltink PH (2014) Assessment of hand kinematics using
inertial and magnetic sensors. J Neuroeng Rehabil 11:70
20. Jackowski A, Gebhard M, Thietje R (2018) Head motion, and head gesture-based robot control:
a usability study. IEEE Trans Neural Syst Rehabil Eng 26:161–170
21. Zhou Q, Zhang H, Lari Z, Liu Z, El-Sheimy N (2016) Design, and implementation of foot-
mounted inertial sensor-based wearable electronic device for game play application. Sensors
16:1752
22. Yazdi N, Ayazi F, Najafi K (1998) Micromachined inertial sensors. Proc IEEE 86:1640–1659
23. Yoon SW, Lee S, Najafi K (2012) vibration-induced errors in MEMS tuning fork gyroscopes.
Sens Actuators A Phys 180:32–44
24. Xu R, Zhou S, Li WJ (2012) MEMS accelerometer based nonspecific-user hand gesture
recognition. IEEE Sens J 12:1166–1173
25. Arsenault D, Whitehead AD (2015) Gesture recognition using Markov Systems and wearable
wireless inertial sensors. IEEE Trans Consum Electron 61:429–437
26. Gupta HP, Chudgar HS, Mukherjee S, Dutta T, Sharma K (2016) A continuous hand gestures
recognition technique for human-machine interaction using accelerometer and gyroscope
sensors. IEEE Sens J 16:6425–6432
To Monitor Yoga Posture Without
Intervention of Human Expert Using 3D
Kinematic Pose Estimation
Model—A Bottom-Up Approach
1 Introduction
In the past few decades, much research has been carried out on yoga. As a result, a few applications have been developed that give details of yoga on a daily basis, and a few databases have been built that contain collections of different types of yoga activities. In current trends, research on yoga has taken a different turn, with systems such as real-time posture monitoring and the analysis of human body temperatures and pressures during yoga being developed.
1.1 Significance
The current paper mainly focuses on a real-time system which considers different
aspects, while performing yoga, i.e., posture monitoring, based on comparing the
gathered data with the existing data of yoga in the database.
The proposed system deals with monitoring the posture of yoga aasanaas without
human expert guidance while doing different steps in some aasanaas, in real time. The
system uses its underlying knowledge about the postures for aasanaas as a comparing
tool with the real-time yoga practitioner and thus monitors the posture. In summary,
this system helps in smooth practicing of yoga for the practitioners without any
human expert guidance.
Human pose estimation aims at predicting the poses of human body parts and joints in images or videos. Since gestures are frequently determined by specific body poses, perceiving the body pose of a human is critical for action recognition.
Broadly, methods for pose estimation can be grouped into bottom-up and top-down approaches. Bottom-up approaches estimate body joints first and then group them to form a unique pose; bottom-up methods were pioneered by DeepCut. Top-down methods run a person detector first and then estimate body joints inside the detected bounding boxes.
3D human pose estimation is used to predict the locations of body joints in 3D space. Beyond the 3D pose, some approaches also recover the 3D human mesh from images or videos. This area has attracted much attention in recent years, since it provides extensive 3D structure information about the human body. It can be applied to numerous applications, such as the 3D animation industry, virtual or augmented reality, and 3D action estimation. 3D human pose estimation can be performed on monocular images or videos.
Most approaches use an N-joint rigid kinematic model in which a human body is represented as an entity with joints and limbs, comprising body kinematic structure and body shape information.
There are three kinds of models for human body modeling:
Kinematic Model, also known as the skeleton-based model, is used for 2D pose estimation as well as 3D pose estimation [9]. This flexible and intuitive human body model comprises a set of joint locations and limb orientations to represent the human body structure. Consequently, skeleton pose estimation models are used to capture the relationships between different body parts [10]. However, kinematic models are limited in representing texture or shape information, as shown in Fig. 1.
Planar Model, or contour-based model, is used for 2D pose estimation. Planar models are used to represent the appearance and shape of a human figure [11]. Typically, body parts are represented by several rectangles approximating the human body contours [12]. A popular example is the active shape model (ASM), which is used to capture the full human body and the silhouette deformations by means of principal component analysis, as shown in Fig. 1.
Volumetric Model is used to estimate the 3D pose [13]. Several standard 3D human body models are used in deep-learning-based 3D human pose estimation for recovering the 3D human mesh [14, 15]. For example, GHUM and GHUML(ite) are fully trainable end-to-end deep learning pipelines trained on a high-resolution dataset of full-body scans of over 60,000 human configurations to model statistical and articulated 3D human body shape and pose, as shown in Fig. 1.
2 Literature Survey
Yoga is an activity which boosts physical and mental health. Various aasanaas are performed in yoga, and it is mandatory to practice yoga under the guidance of a yoga expert. If yoga is done in the absence of an expert, mistakes may happen that lead to physical problems. In this context, various systems have been designed for posture monitoring, yoga databases, measuring the effectiveness of yoga, and checking for the right posture based on body temperature, blood pressure, etc.
Thangavelu and Mani [1] have proposed a real-time monitoring system for yoga practitioners, which monitors yoga activity. Bowyer [2] has made a survey of approaches to three-dimensional face recognition, demonstrating ways of recognizing points in the face. Dileep and Danti [3] demonstrated a lines-of-connectivity face model for recognition of human facial expressions, which explains how to identify points in the face for deciding different expressions. Lee et al. [4] have demonstrated a unique posture monitoring system for preventing physical illness of smartphone users, which discusses the negative impact of the posture adopted while looking at a mobile phone. Islam et al. [5] worked on yoga posture recognition by detecting human joint points in real time using Microsoft Kinect, where points are identified for detection. Patsadu et al. [6] have worked on human gesture recognition using a Kinect camera, which demonstrates the kind of data gathered. Obdržálek et al. [7] have proposed real-time human pose detection and tracking for telerehabilitation in virtual reality, which describes the processing of pose detection. Lee and Nguyen [8] have demonstrated human posture recognition using the human skeleton provided by Kinect.
The prime focus of this paper is monitoring body posture while performing yoga using video, by comparing it with the existing data in the database, i.e., the training set. The main yoga sequence used in this paper is Suryanamaskar, which consists of 12 different steps, and the proposed system is built for identifying the postures in it. The rest of this paper is organized as follows: Sect. 3 describes the proposed methodology. Section 4 presents the proposed algorithm. Section 5 deals with experimental analysis and assumptions. Section 6 draws the conclusions and discussions. Finally, Sect. 7 provides the future scope of the proposed approach.
3 Proposed Methodology
3.1 Suryanamaskar
Step 1: Namaskarasana
In this aasanaa, bring both hands together in front of the chest, press the palms together firmly, and keep the breathing normal.
Step 2: Urdhvasana
In this step, raise both hands upward and, along with them, raise the head in proportion with the hands at a 90-degree inclination; inhale and hold the breath in that position for a few seconds.
Step 3: Hastapadasana
In this step, place both palms flat on the floor at shoulder width; the knees should be straight, and both palms and toes should be in a proper reference line. Exhale and hold for a few seconds.
Step 4: Ekapada Prasaranasana (right or left leg alternatively)
In this step, move one leg backward, landing on the floor only on the toes, and let the knee of that leg just touch the floor (no weight should be applied on the knee); keep the breathing normal.
Step 5: Dwipada Prasaranasana
In this step, move the other foot back adjacent to the first leg as in Step 4, and keep the body straight like a stick parallel to the ground. Now, the entire body weight rests on the palms and toes. Keep the breathing normal and hold the same position for a few seconds.
Step 6: Bhoodharasana
In this step, using the strength of the whole body, bring the entire feet onto the ground by raising the hips upward; the body takes an inverted “V” shape. Stay in the same position for a few seconds with normal breathing.
Step 7: Saashtaangapraneepaataasana
In this step, bring the body into a horizontal position close to the floor, with only the forehead, chest, and knees touching the floor (all the body weight should be on the palms and toes); inhale and hold the same position for a few seconds.
Step 8: Bhujangaasana
In this step, raise the head and shoulders and look upward; in this position, the waist should be close to the floor. Inhale and hold the same position for a few seconds.
Step 9: Bhoodharasana
Perform step 6.
Step 10: Ekapada Prasaranasana (right or left leg alternatively)
Perform step 4 for the corresponding leg in reverse order.
Step 11: Hastapadasana
Perform step 3 for the corresponding leg in reverse order.
Step 12: Namaskarasana
Return to Namaskarasana, i.e., step 1.
Fig. 3 Identification of points
Fig. 5 Identification of angles
In the next step, the angle between various points is measured and recorded for each aasanaa. In Fig. 5, the angles are measured for Bhoodharasana as a sample for presentation; similarly, for each aasanaa the angles will be measured and recorded. The angles are measured by using vectors formed from each identified point.
4 Proposed Algorithm
Step 1: Read the input video data from both cameras, i.e., the left and front cameras, and the breath data from the fitted breathing sensor, and store them instantaneously. (The first reading will be taken as the training set and stored in the database; repeat Step 1 for the testing set.) The activities performed while reading the video data are the identification of the points, given by θn, and the time spent in a particular posture, Tn, measured in milliseconds.
Step 2: Verify the mapping between the front camera data, given by βF, and the left side camera data, given by βL, for the training set:
\beta_n = \text{MappingFunction}(\beta_F, \beta_L) \quad (1)
Step 3: Compare the data collected from the front and left side cameras with the training set data instantaneously, while the system completes the reading of the identification points for the testing set video data βn. The difference in angle between the training and the testing set is given by
\mu_n = \sum_{n=0}^{n-1} \text{MatchDistance}(\theta_n - \beta_n) \quad (2)
Step 4: Verify the breath data for each posture, given by BTn at time intervals Tn, by comparing it with the training set data, given by Bn at time intervals Tn. The difference, denoted BTRn at time intervals Tn, is as follows:
BTR_n = \sum_{n=0}^{n-1} \text{MatchDistance}(B_n - BT_n) \quad (3)
Step 5: Set the allowable threshold value in terms of angle for the video data and
in terms of BPM for the breath data for the successful completion of a posture.
The reason for including threshold values is the differently sized body structures of the practitioners. The calculations can be done in the way shown below.
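The paper does not spell out this calculation, so the following is only an illustrative sketch under our own assumptions (hypothetical angle values and a single angular threshold standing in for MatchDistance):

```python
def posture_matches(training_angles, testing_angles, threshold_deg):
    """Accept the posture if every testing angle lies within +/- threshold_deg
    of the corresponding training angle (a simple stand-in for MatchDistance)."""
    return all(abs(tr - te) <= threshold_deg
               for tr, te in zip(training_angles, testing_angles))

# Hypothetical angles (degrees) recorded for one aasanaa
training = [92.0, 121.5, 74.0]
testing = [95.5, 118.0, 71.0]
print(posture_matches(training, testing, threshold_deg=5.0))  # True
```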
The experimental analysis and assumptions include the following. The posture of the yoga practitioner is captured as video, and a single camera captures the video from one of the side angles; in this paper, left-side video capture is preferred. In the next stage, the proposed algorithm compares the angles from the collected vectors (testing set) with the training set, which was captured earlier as video by the same proposed algorithm and is already stored in the database.
A threshold value should be fixed to compensate for postures in the aasanaas performed with slightly different angles. The threshold, termed T, is fixed for the training set and consists of the angle value of a vector for a particular aasanaa along with a plus-or-minus tolerance. These plus or minus values constitute the threshold, and the vector value for a particular aasanaa in the testing set is considered valid when it matches the training set value within the threshold.
The proposed model specifies an automated system for monitoring yoga posture. The system gets its input from a camera in the form of video; in the proposed system, the camera is placed on the left side of the practitioner, from where the data is captured. This data is called the testing set. The testing set is compared with the training set, which is already stored in the database in video format. The training set consists of the angles for the yoga aasanaas performed by an expert practitioner with perfect postures, and hence it is considered the training set. A mathematical model compares the training set and the testing set in terms of vectors, and a conclusion is drawn. Since the proposed system is intended for global use, and considering the various body structures of humans, an attempt has been made to allow some relaxation in the angles of the aasanaas; i.e., a threshold T gives the plus-or-minus angular error allowed in performing an aasanaa. The method of fixing the threshold T for each aasanaa is out of the scope of this paper and may be considered for future development.
7 Future Scope
In this paper, a model system for monitoring yoga posture is proposed. The proposed system uses video capture for monitoring yoga postures. In this context, the system uses a high-definition camera, which is placed on the left side of the yoga practitioner, and posture information is gathered from the video by identifying certain points.
The proposed system can be further developed by placing one more high-definition camera in front of the yoga practitioner, thereby gathering additional posture data through identification points. At that stage, the yoga monitoring system will have two HD cameras, one on the left side and one in front. The front camera is needed to verify the balanced structure of the body while performing a particular yoga aasanaa. At the next level, an algorithm compares the data collected by the front and side cameras subject to certain constraints and thresholds, and a conclusion is drawn.
Upshot and Disparity of AI Allied
Approaches Over Customary Techniques
of Assessment on Chess—An Observation
1 Introduction
Chess is one of the few arts where the composition of tactics must be applied appropriately to achieve the best performance. The best human chess players implement several tactics, strategies, and plans towards achieving the best outcomes. With a computer chess engine, however, the case is different: the computer's thought process depends entirely on calculating all possible moves and choosing the most suitable one. This approach has its own limitations in terms of the size, speed, and dimension of the data and the depth of the search. To compensate for this problem, computer scientists came up with different approaches so that the accuracy of the system is achieved without diminishing its speed. The following subsections demonstrate a few such techniques implemented in the traditional and modern approaches of evaluation.
In the traditional approach, the chess engine considered is Stockfish. The evaluation methodology followed by the engine involves two famous algorithms, namely the Minimax algorithm and the Alpha–Beta pruning algorithm.
The players in these algorithms are termed the minimizer and the maximizer. The maximizer attempts to obtain the highest score possible, whereas the minimizer attempts to do the opposite and obtain the lowest score possible. Each board state has a value associated with it. If, at some position, the maximizer has the upper hand, a positive value is assigned to the board position; if the minimizer has the upper hand, a negative value is assigned. Some heuristic techniques are used to perform these calculations in the game. The working principle of the Minimax algorithm is depicted in Fig. 1 with an example.
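As a minimal sketch of how these two algorithms work together (a toy two-ply game tree with invented leaf values, unrelated to Fig. 1):

```python
import math

def alphabeta(node, depth, alpha, beta, maximizing):
    """Minimax search with Alpha-Beta pruning over a nested-list game tree.
    Leaves are numeric evaluations; internal nodes are lists of child nodes."""
    if depth == 0 or not isinstance(node, list):
        return node  # heuristic value of a leaf position
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # beta cut-off: the minimizer will never allow this branch
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, depth - 1, alpha, beta, True))
        beta = min(beta, value)
        if alpha >= beta:
            break  # alpha cut-off
    return value

# Maximizer to move at the root; the minimizer replies at the next level
tree = [[3, 5], [6, 9], [1, 2]]
print(alphabeta(tree, 2, -math.inf, math.inf, True))  # 6
```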
In the modern approach, the chess engine considered is AlphaZero. The evaluation methodology followed by the engine involves the well-known Monte Carlo Tree Search (MCTS) algorithm.
Monte Carlo Tree Search (MCTS) is a method in the field of artificial intelligence (AI). It is a probabilistic, heuristic-driven search algorithm that combines classic tree search with the machine learning principles of reinforcement learning. In tree search there is always the possibility that the currently best-looking action is in fact not the optimal action. In such cases, the MCTS algorithm becomes valuable, as it continues to evaluate other alternatives periodically during the learning phase by executing them, instead of following only the currently perceived optimal policy. This is known as the "exploration-exploitation trade-off": the algorithm exploits the actions and policies found to be the best so far but must also continue to explore the local space of alternative decisions to find out whether they could replace the current best. The main steps in Monte Carlo Tree Search are selection, expansion, simulation, and backpropagation.
Selection: The MCTS algorithm traverses the current tree from the root node using a specific policy. The policy uses an evaluation function to pick the nodes with the highest estimated value. MCTS uses the Upper Confidence Bound (UCB) principle applied to trees as the policy in the selection procedure to traverse the tree. It balances the exploration-exploitation trade-off.
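For reference, the UCB-for-trees (UCT) score usually used in this step, which is not written out in the text, is

$$\mathrm{UCT}(i) = \frac{w_i}{n_i} + c\sqrt{\frac{\ln N}{n_i}},$$

where $w_i$ is the number of wins recorded through child node $i$, $n_i$ its visit count, $N$ the visit count of its parent, and $c$ an exploration constant (often taken as $\sqrt{2}$); the first term rewards exploitation and the second rewards exploration.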
Expansion: In this step, a new child node is added to the tree under the node that was selected during the selection procedure.
Simulation: In this step, a simulation (playout) is performed by selecting moves or policies until a result or a predefined condition is reached.
Backpropagation: After determining the value of the newly added node, the remaining tree must be updated. The backpropagation step therefore propagates from the new node back to the root node. In the process, the simulation count stored in each node is incremented, and if the new node's simulation results in a victory, the win count is also incremented. The working of MCTS is as shown in Fig. 3.
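To make the four phases concrete, here is a compact, self-contained MCTS sketch in Python for a toy take-1-or-2 Nim game (purely illustrative; not the AlphaZero implementation, which couples MCTS with a neural network):

```python
import math
import random

class Node:
    def __init__(self, state, player, parent=None):
        self.state, self.player, self.parent = state, player, parent
        self.children, self.wins, self.visits = {}, 0.0, 0

def moves(state):
    """Toy Nim game: take 1 or 2 stones; whoever takes the last stone wins."""
    return [m for m in (1, 2) if m <= state]

def uct(child, parent, c=1.4):
    if child.visits == 0:
        return math.inf
    return child.wins / child.visits + c * math.sqrt(math.log(parent.visits) / child.visits)

def mcts(root_state, iterations=3000):
    root = Node(root_state, player=1)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes via the UCT policy
        while node.children and len(node.children) == len(moves(node.state)):
            node = max(node.children.values(), key=lambda ch: uct(ch, node))
        # 2. Expansion: add one untried move, if the position is not terminal
        untried = [m for m in moves(node.state) if m not in node.children]
        if untried:
            m = random.choice(untried)
            node.children[m] = Node(node.state - m, 3 - node.player, parent=node)
            node = node.children[m]
        # 3. Simulation: random playout from the new node until the game ends
        state, player = node.state, node.player
        winner = 3 - player if state == 0 else None
        while winner is None:
            state -= random.choice(moves(state))
            if state == 0:
                winner = player
            player = 3 - player
        # 4. Backpropagation: update visit/win counts back up to the root
        while node is not None:
            node.visits += 1
            if winner != node.player:  # a win for the player who moved into this node
                node.wins += 1
            node = node.parent
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

print("Best move from a pile of 4 stones:", mcts(4))  # 1 (leaves a losing pile of 3)
```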
2 Literature Survey
From ancient times, chess has drawn the interest of mathematicians because of its enormous set of possibilities and combinations. Several works have been carried out in the past towards the development of efficient chess engines. In 1997, IBM's Deep Blue became the first officially rated chess engine to show superhuman capability by defeating the reigning human world champion. In the subsequent years, tremendous research work has been done towards the development of chess engines by incorporating scientific and mathematical approaches. As a result, many efficient chess engines have been incubated to perform in the real world. The algorithms developed for chess engines are also capable of solving real-world problems other than chess, mainly problems involving huge sets of combinatorial and complex data in which decisions are drawn on the basis of random factors.
Browne et al. [1] conducted a survey of Monte Carlo Tree Search methods. Carlsson et al. [2] demonstrated the process involved in going from AlphaZero to alpha hero: a pre-study on additional tree sampling within self-play reinforcement learning. Chan et al. [3] made a detailed study of the theory and applications of Monte Carlo simulations. Dehghani et al. [4] carried out a cumulative study of a GA-based method for search-space reduction of the chess game tree. Fuller et al. [5] made a detailed analysis of the Alpha–Beta pruning algorithm. Gao et al. [6] made a detailed survey on efficiently mastering the game of NoGo with deep reinforcement learning supported by domain knowledge. Johnson et al. [7] made a holistic survey of a new family of probability distributions with applications to Monte Carlo studies. Kasparov [8] gave a deep analysis of chess as a Drosophila of reasoning.
3 Comparative Analysis
Since the Stockfish and AlphaZero chess engines are built on different evaluation methods, the process of making a comparative analysis also differs. Some of the common parameters considered for comparison are search depth, time complexity, and accuracy in making the best move.
3.1 Stockfish
The performance analysis of the Stockfish chess engine can be viewed from several perspectives. Table 1 depicts the search depth of the evaluation method used in Stockfish, where the search is performed on the game tree created during the game.
In Table 1, the search for the best move was experimented with at a certain stage of the game with varying depths and for different sets of CPUs, which show diversity in playing strength under certain time constraints.
Table 2 represents the playing capability of Stockfish when executed on a single CPU, 4 CPUs, and 8 CPUs. One hundred games were played for each set of CPUs with a fixed depth of 42, and it was observed that performance increased greatly with the number of processors. Each win is worth 1 point, a loss 0 points, and a draw 0.5 point.
Another important dimension of the performance analysis of Stockfish is the mode of play. The prerequisites in this case are a fixed depth, a fixed number of CPUs, fixed or varying time, and a fixed opponent. A remarkable difference in performance is observed when the engine is compelled to play in different modes, as given in Table 3.
3.2 AlphaZero
AlphaZero searches just 80,000 positions per second in chess, compared with 70 million for Stockfish. AlphaZero compensates for the lower number of evaluations by using its deep neural network to focus much more selectively on the most promising variations. It was trained using 5000 tensor processing units (TPUs) and used a 44-core CPU in its matches, as given in Table 4.
As per the study done on the two famous chess engines, the following observations are made. With the increase in the processing capacity of the computer, the performance of both engines was enhanced greatly. As for depth levels, Stockfish achieved greater performance, with an exponential rise in the nodes analysed versus nodes visited towards finding the best solution from the game tree. AlphaZero, however, relied purely on a logical perspective for finding the best solution by only partially considering the depths of the tree, which basically depends on the randomization process involved within it. Coming to the first-move advantage, both engines are able to achieve superior win, loss, and draw ratios. Finally, when both engines were made to play against each other under standard conditions, it was observed that AlphaZero was the stronger, as shown in Fig. 4.
5 Conclusion
6 Future Enhancement
The approaches discussed in the paper are not limited to the context of the subject conferred but can also be extended to fields where high-dimensional data with complex combinations arises and to problems where randomization is required. The fields where such kinds of problems arise are as follows. The first is stock marketing, where continuous variations arise over commodities, and analysing them collectively may give rise to complex structures. The second field is medicine, where protein structures can be studied based on structural parameters, and data of high size and dimension may arise. The third is image processing, in particular image recovery, since high-dimensional data can be expected from the concerned images.
References
1. Browne CB, Powley E, Whitehouse D, Lucas SM, Cowling PI, Rohlfshagen P, Tavener S, Perez
D, Samothrakis S, Colton S (2012) A survey of Monte Carlo tree search methods. IEEE Trans
Comput Intell AI Games 4(1):1–43
2. Carlsson F, Öhman J (2019) Alphazero to alpha hero: a pre-study on additional tree sampling
within self-play reinforcement learning
3. Chan WKV (ed) (2013) Theory and applications of Monte Carlo simulations. BoD—Books on Demand
4. Dehghani H, Babamir SM (2017) A GA based method for search-space reduction of chess
game-tree. Appl Intell 47(3):752–768
5. Fuller SH, Gaschnig JG, Gillogly JJ (1973) Analysis of the alpha-beta pruning algorithm.
Department of Computer Science, Carnegie-Mellon University
6. Gao Y, Lezhou W (2021) Efficiently mastering the game of NoGo with deep reinforcement
learning supported by domain knowledge. Electronics 10(13):1533
7. Johnson ME, Tietjen GL, Beckman RJ (1980) A new family of probability distributions with
applications to Monte Carlo studies. J Am Stat Assoc 75(370):276–279
8. Kasparov G (2018) Chess, a Drosophila of reasoning. Science 362(6419):1087
9. Lai M (2015) Giraffe: using deep reinforcement learning to play chess. arXiv preprint arXiv:
1509.01549
10. Maesumi A (2020) Playing chess with limited look ahead. arXiv preprint arXiv:2007.02130
11. McGrath T, Kapishnikov A, Tomašev N, Pearce A, Hassabis D, Kim B, Paquet U, Kramnik V
(2021) Acquisition of chess knowledge in AlphaZero. arXiv preprint arXiv:2111.09259
12. Moerland TM, Broekens J, Plaat A, Jonker CM (2018) A0c: AlphaZero in continuous action
space. arXiv preprint arXiv:1805.09613
13. Mordechai S (2011) Applications of Monte Carlo method in science and engineering
14. Motohiro T (1986) Applications of Monte Carlo simulation in the analysis of a sputter-
deposition process. J Vac Sci Technol A Vac Surf Films 4(2):189–195
15. Pearl J (1982) The solution for the branching factor of the alpha–beta pruning algorithm and
its optimality. Commun ACM 25(8):559–564
16. Silver D, Hubert T, Schrittwieser J, Antonoglou I, Lai M, Guez A, Lanctot M et al (2018) A
general reinforcement learning algorithm that masters chess, shogi, and Go through self-play.
Science 362(6419):1140–1144
17. Stockman GC (1979) A minimax algorithm better than alpha–beta? Artif Intell 12(2):179–196
18. Vardi A (1992) New minimax algorithm. J Optim Theory Appl 75(3):613–634
19. Wang H, Preuss M, Plaat A (2021) Adaptive warm-start MCTS in AlphaZero-like deep
reinforcement learning. arXiv preprint arXiv:2105.06136
20. Wang H, Preuss M, Emmerich M, Plaat A (2020) Tackling Morpion solitaire with AlphaZero-
like ranked reward reinforcement learning. In: 2020 22nd international symposium on symbolic
and numeric algorithms for scientific computing (SYNASC). IEEE, pp 149–152
Network Intrusion Detection Using
Neural Network Techniques
1 Introduction
With the advent of technology, network security has become a major concern in today's society. A network intrusion detection system (NIDS) observes data traffic for suspicious activity and issues alarms when such activity is detected. Any hazardous activity is usually reported to the supervisor or collected centrally using a security information and event management (SIEM) system. Using the SIEM framework, one can separate dangerous activity from false alarms by integrating output from a variety of sources. Even though intrusion detection systems monitor networks for potentially harmful activity, false alarms might be generated as well. Hence, IDS products should be tuned when first deployed in an organization, which means appropriately configuring the intrusion detection system to determine what normal traffic on the network looks like compared with malicious activity.
A. Classification of Intrusion Detection System:
Signature-based Method:
A signature-based IDS detects attacks on the basis of known patterns in network traffic, such as the number of bytes or particular sequences of 1s and 0s. It can also detect attacks based on the malicious command sequences used by malware. The detected patterns are known as signatures.
A signature-based IDS can readily detect an attack whose pattern (signature) already exists in the system, while it is very difficult for it to recognize new malware attacks, as their patterns (signatures) are unknown.
Anomaly-based Method:
To detect unknown malware attacks, which are being created rapidly, anomaly-based detection has been introduced. An anomaly-based IDS uses machine learning to build a model of trustworthy behaviour; anything that comes in is compared with that model, and if it does not fit the model, it is flagged as suspicious.
2 Related Work
In [1], a network intrusion detection (NID) model based on CNN-IDS is proposed. Using a variety of dimensionality reduction techniques, irrelevant features are first removed from the network traffic data. Then, features of the reduced data are automatically extracted using a convolutional neural network, and the most effective information for identifying intrusive traffic is obtained through supervised learning. To minimize computational costs, the original traffic vector format is converted into an image format, and the standard KDD-CUP99 dataset is used to evaluate the performance of the CNN model. The test results suggest that, in terms of accuracy (AC), false alarm rate (FAR), and timeliness, the CNN-based intrusion detection model outperforms conventional algorithms. Thus, the proposed model has not only research value but also practical value.
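As an illustrative sketch only (the 11 × 11 image shape, layer sizes, and class count below are our assumptions, not details taken from [1]), a small Keras CNN for traffic records reshaped into images could look like this:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical setup: 121 pre-processed features per record, reshaped to 11x11
# "images", with 5 traffic classes (normal + 4 attack categories).
num_features, num_classes = 121, 5
x_train = np.random.rand(256, num_features).reshape(-1, 11, 11, 1)
y_train = keras.utils.to_categorical(np.random.randint(num_classes, size=256), num_classes)

model = keras.Sequential([
    layers.Input(shape=(11, 11, 1)),
    layers.Conv2D(32, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Conv2D(64, kernel_size=3, activation="relu"),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=2, batch_size=32, verbose=0)
```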
In [2], network traffic is modelled as a time series, especially Transmission Control Protocol/Internet Protocol (TCP/IP) packets over a particular period, with supervised learning methods such as the multilayer perceptron (MLP), CNN, CNN-recurrent neural network (CNN-RNN), CNN with long short-term memory (CNN-LSTM), and CNN with gated recurrent units (CNN-GRU), using millions of known good and bad network connections. To measure the effectiveness of these methods, they are tested on a dataset such as KDD-CUP99. The complete network configurations and tests for the various MLP, CNN, CNN-RNN, CNN-LSTM, and CNN-GRU topologies are selected using their network settings and parameters. The models in each test are run for up to 1000 epochs with learning rates in the range [0.01–0.5]. Compared with classic machine learning classifiers, CNN and its variant architectures performed very well. This is because CNN is able to extract high-quality feature representations from the low-level network connection features.
As is well known, deep learning is a modern technology that automatically extracts features from samples [3]. Considering that the accuracy of intrusion detection based on traditional machine learning techniques is not high, that paper proposes a network intrusion model based on CNN algorithms. The model can extract powerful features from the input samples so that the input samples can be classified correctly. The test results on the KDD99 dataset show that the proposed model can significantly improve the accuracy of the intrusion detection model.
The authors of [4] proposed a novel NIDS based on a CNN (convolutional neural network). They train deep learning-based detection models using both extracted features and raw network traffic and conduct comprehensive experiments using well-known benchmark datasets. The results verify the effectiveness of the system and also demonstrate that the model trained on raw traffic performs better than the model trained on extracted features.
Research on network threats from attackers has increased [5], and system security has become increasingly important because the number of devices connected to the Internet keeps growing. There are also common attacks, such as DDoS (Distributed Denial of Service), that have caused widespread damage to companies. A new NID (network intrusion detection) system is proposed based on the Tree-CNN algorithm with the SRS (Soft-Root-Sign) activation function. The test results show that the proposed hierarchical model achieves a significant reduction in execution time, by about 36%, with an average detection accuracy of 0.98 taking into account all the attacks analysed.
The main purpose of [6] is to improve Internet security through an IDS (intrusion detection system) based on CNN. The proposed IDS model is intended to detect network intrusions by classifying the packet traffic in a network as benign or malicious during training. A detailed study of the proposed model against nine different classifiers is presented.
In [7], a novel intrusion detection method is proposed based on the adaptive synthetic sampling algorithm (ADASYN) and the convolutional neural network (CNN), to develop complete IDS capabilities and strengthen network security. First, the ADASYN method is used to balance the sample distribution, which effectively prevents the model from being dominated by large classes and forgetting small ones. Second, an enhanced CNN is built on the split convolution module (SPC-CNN), which can increase the diversity of features and eliminate the impact of inter-channel information redundancy on model training. Then, the AS-CNN model, combining ADASYN and SPC-CNN, is used for the detection task. Finally, the standard NSL-KDD dataset is chosen for testing AS-CNN. The simulation shows that its accuracy is 4.60% and 0.79% higher than that of traditional CNN and RNN models, respectively, with the detection rate (DR) increasing by 11.34% and 10.27%, respectively.
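For readers unfamiliar with ADASYN, a minimal sketch of the resampling step using the imbalanced-learn library (synthetic data; not the configuration used in [7]) is:

```python
from collections import Counter
from imblearn.over_sampling import ADASYN
from sklearn.datasets import make_classification

# Synthetic, heavily imbalanced two-class problem standing in for normal/attack traffic
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.95, 0.05],
                           random_state=42)
print("before:", Counter(y))

# ADASYN generates synthetic minority samples, focusing on harder-to-learn regions
X_res, y_res = ADASYN(random_state=42).fit_resample(X, y)
print("after: ", Counter(y_res))
```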
In many applications, the capability of deep learning methods has proven superior to older techniques [8]. Along these lines, this work focuses on intrusion detection and the use of LeNet-5-based convolutional neural networks (CNNs) to detect network threats. Tests show that the accuracy of the IDS reaches up to 99.6% with more than a thousand samples; the average accuracy is 97.53%.
Based on a convolutional neural network [9], two convolution layers and two pooling layers are used, and a batch normalization layer is introduced after each convolution layer to improve convergence and keep the model away from the collapse mode. In this test, the Adam and SGD optimizers were used to train the model, respectively, and the Adam optimizer showed the higher performance. When epoch = 200, the model accuracy reaches 0.9507, and the average F1 value reaches 0.9438.
In [10], a network detection structure is proposed based on examining different algorithms, such as NB and XGBoost, and then applying the SSA as the feature selection (FS) technique. In the proposed model, the most applicable and best features are selected by applying the SSA, which increases the model performance. A sturdy anomaly network detection model was built using the SSA-XGBoost and SSA-NB classification algorithms. To examine the model performance, the NSL-KDD and UNSW-NB15 benchmark datasets were used. High accuracy and performance were achieved by the proposed network detection model, with a high detection rate, low false alarms, and overall effectiveness. The detection rate could be increased in future work by using other methods with different datasets and by overcoming the difficulties posed by data imbalance in order to improve model efficiency.
In [11], a wireless network intrusion detection model based on an improved convolutional neural network (CNN) is proposed. Training and testing experiments were conducted in IBWNIDM with the help of pre-processed training and test sets. The experiments showed high true positive rates and low false positive rates of network intrusion detection for IBWNIDM.
The paper [12] proposes network intrusion detection based on a CNN combined with SMOTE-ENN. The model is adaptive to different data environments because the CNN selects features automatically. Minority samples are synthesized by applying the SMOTE-ENN algorithm. The accuracy rate is 83.31%, and the detection rate increases markedly, in particular from 26 to 77%. It can be concluded that the proposed CNN model is well suited to imbalanced network traffic in a network intrusion detection system.
A. Feature Extraction
Feature extraction is a prerequisite for the efficient working of an intrusion detection system. It aims to decrease the number of resources required to describe a large set of data. The performance of the model is badly affected when features are selected inappropriately.
B. Classifier Construction
As it is extremely difficult to find new attacks by training only on limited audit data, the classification precision of most existing models needs to be enhanced. Modelling normal patterns is difficult, and although such models can detect novel attacks, their false alarm rates are normally high. Hence, classifier construction for machine learning-based intrusion detection remains another technical challenge.
C. The False Positive Rate
When the IDS detects an activity as an attack although the behaviour is in fact acceptable, this is a false positive. The more serious or dangerous state is the false negative: an activity that is actually an attack but that the IDS classifies as acceptable. It has been estimated that up to 99% of alerts raised by IDSs are not related to security issues.
4 Comparison Table
Author: Zhiquan Hu, Liejun Wang, Lei Qi, Yongming Li, and Wenzhong Yang. Year of publishing: 2020. Methods/software: Improved CNN and AS-CNN models using adaptive synthetic sampling (ADASYN). Dataset: NSL-KDD. Challenges and issues: Existing CNN-based approaches neglected to resolve the issues of inconsistent data distribution and the requirement for exchange data.
Author: Wen-Hui Lin, Hsiao-Chung Lin, Ping Wang, Bao-Hua Wu, and Jeng-Ying Tsai. Year of publishing: 2018. Methods/software: LeNet-5 model. Dataset: KDD-CUP99. Challenges and issues: By using advanced behavioural features from trained CNNs, the proposed technique improves the accuracy of intrusion detection to detect threats.
Author: Wei-Fa Zheng. Year of publishing: 2020. Methods/software: Models trained using stochastic gradient descent (SGD) and Adam optimizers. Dataset: KDD-CUP99. Challenges and issues: To overcome the technical limitations of intrusion detection, such as low accuracy and flexibility.
Author: Alanoud Alsaleh and Wojdan Binsaeedan. Year of publishing: 2021. Methods/software: Salp swarm algorithm, feature selection. Dataset: NSL-KDD and UNSW-NB15. Challenges and issues: The anomaly NIDS performs much better than the latest strategies suggested in the literature in terms of memory, detection rate and false alarm level on both databases.
Author: Hong Yu Yang and Fengyan Wang. Year of publishing: 2019. Methods/software: TensorFlow, convolutional neural network (CNN). Dataset: ICNN and IBWNIDM. Challenges and issues: There are issues with over-fitting and generalization during the model training process.
Author: Xiaoxuan Zhang, Jing Ran, and Jize Mi. Year of publishing: 2019. Methods/software: Synthetic minority oversampling technique combined with edited nearest neighbors (SMOTE-ENN) algorithm. Dataset: NSL-KDD. Challenges and issues: Using traditional machine learning to improve IDS performance.
D. Unbalanced Dataset
A great difference in the distribution of the classes in a dataset is what defines an unbalanced dataset. It means that the dataset is biased toward one class, and an algorithm trained on such data will be biased toward the same class.
E. Lack of Realism
To evaluate the performance of an intrusion detector with the help of synthetic traces, the traces must reflect the environment in which the detector will be deployed. If the traces are not realistic, the detection task can be too difficult or too easy, which results in an underestimation or overestimation of the detector's performance.
F. Low Detection Rate
The classifier lacks the ability to classify instances (events) correctly. This affects the detection rate, and the accuracy of the system is reduced.
G. Understanding and Investigating Alerts
Investigating IDS alerts is hugely time- and resource-consuming and requires supplementary information from other systems to help decide whether an alarm is serious. Professional skills are required to interpret the system outputs, and many organizations lack dedicated security experts capable of executing this crucial function.
5 Proposed Approach
The NSL-KDD dataset is not entirely new of its kind; it is a refined version of the KDD Cup 99 dataset, developed by Ali A. Ghorbani's group in the Network Security Laboratory (NSL), and it addresses some of the underlying issues, although problems such as those discussed by McHugh remain.
The IDS dataset developed by Tezpur University (TUIDS), India, created features during the flow stage in a physical test bed and did not include features that could arise during the flow-capturing process. The dataset's features consist of attacks only.
The UNSW-NB15 dataset is the 100 GB synthetic dataset used in this research, created by Moustafa et al. at the University of New South Wales, Canberra, Australia, using the IXIA PerfectStorm device in the Cyber Range Laboratory. The dataset is generated as PCAP files containing an aggregate of normal and intrusive traffic along with other useful features.
(ii) Data Pre-processing
Data pre-processing is the process of preparing raw data and adapting it to a
machine learning model. It is the first and most important step in creating a
machine learning model. Pre-processing is mainly performed to assess data
quality.
Steps involved in Data pre-processing:
relevant to the intrusion detection task and also the redundancy in between the
features.
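As a minimal, illustrative pre-processing sketch (the column names and transformers here are our assumptions, not the authors' pipeline), categorical protocol fields can be one-hot encoded and numeric fields scaled before training:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical miniature of an NSL-KDD-style table
df = pd.DataFrame({
    "duration": [0, 12, 3],
    "src_bytes": [181, 239, 0],
    "protocol_type": ["tcp", "udp", "icmp"],
    "label": ["normal", "dos", "normal"],
})

X, y = df.drop(columns="label"), df["label"]
pre = ColumnTransformer([
    ("num", MinMaxScaler(), ["duration", "src_bytes"]),                 # scale numeric fields
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["protocol_type"]),  # encode categorical field
])
X_ready = pre.fit_transform(X)
print(X_ready.shape)  # (3, 5): 2 scaled numeric columns + 3 one-hot protocol columns
```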
(iv) Machine Learning Applications
• Decision Tree Algorithm:
The decision tree algorithm evaluates the information and identifies the critical attributes in the system that indicate malicious activities, and it adds value to security frameworks by ranking the intrusion identification details.
• Naive Bayes Algorithm:
The Naive Bayes algorithm is a supervised learning algorithm; it is an effective and very simple classification algorithm that aids in building fast machine learning models that make quick predictions. It is a probabilistic classifier; that is, it predicts the class of an item on the basis of the probability of its attributes.
• Convolutional Neural Network:
A convolutional neural network (CNN) is a deep learning technique with many state-of-the-art implementations for classification tasks. CNN originated in image processing and differs from a fully connected neural network in having convolutional filters. The three parts of a CNN are the convolutional layer, the pooling layer, and the classification layer. The convolutional and pooling layers are used for feature extraction, while the classification layer connected at the end of the network performs the intrusion detection.
• MLP Classifier:
A multilayer perceptron (MLP) is a feed-forward artificial neural network. The term MLP is used loosely, sometimes to mean any feed-forward ANN and sometimes strictly to refer to networks composed of multiple layers of perceptrons. Multilayer perceptrons are referred to as "vanilla" neural networks, especially when they have a single hidden layer. An MLP consists of at least three layers of nodes, namely an input layer, a hidden layer and an output layer. Every node is a neuron that uses a nonlinear activation function. MLP uses a learning technique called backpropagation for training.
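A compact, illustrative sketch of training the scikit-learn counterparts of the classifiers named above (synthetic features standing in for pre-processed traffic records; not the authors' experimental setup, and the CNN is omitted since it needs a deep learning framework):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for pre-processed intrusion records (binary: normal vs attack)
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "MLP": MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0),
}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    print(f"{name}: test accuracy = {clf.score(X_test, y_test):.3f}")
```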
6 Conclusion
References
1. Mendonça RV, Teodoro AAM, Rosa RL, Saadi M, Melgarejo DC, Nardelli PHJ, Rodríguez DZ
(2021) Intrusion detection system based on fast hierarchical deep convolutional neural network
2. Chen L, Kaung X, Xu A, Suo S, Yang Y (2020) A novel network intrusion detection system
based CNN. In: 2020 eighth international conference on advanced cloud and big data (CBD)
3. Vinayakumar R, Soman KP, Poornachandran P (2017) Applying convolutional neural network
for network intrusion detection
4. Xiao Y, Xing C, Zhang T, Zhao Z (2019) An intrusion detection model based on feature
reduction and convolutional neural networks. IEEE Access
5. Ho S, Jufout SA, Dajani K, Mozumdar M (2021) A novel intrusion detection model for detecting
known and innovative cyberattacks using convolutional neural network
6. Khan RU, Zhang X, Alazab M, Kumar R (2019) An improved convolutional neural network
model for intrusion detection in networks. In: 2019 cyber security and cyber forensics
conference (CCC)
7. Hu Z, Wang L, Qi L, Li Y, Yang W (2020) A novel wireless network intrusion detection method
based on adaptive synthetic sampling and an improved convolutional neural network
8. Lin W-H, Lin H-C, Wang P, Wu B-H, Tsai J-Y (2018) Using convolutional neural networks to
network intrusion detection for cyber threats. In: Meen, Prior, Lam (eds) Proceedings of IEEE
international conference on applied system innovation 2018 IEEE ICASI 2018
9. Zheng W-F (2020) Intrusion detection based on convolutional neural network. In: 2020
international conference on computer engineering and application (ICCEA)
10. Alsaleh A, Binsaeedan W (2021) The influence of salp swarm algorithm-based feature selection
on network anomaly intrusion detection
11. Yang H, Wang F (2019) Wireless network intrusion detection based on improved convolutional
neural network
12. Zhang X, Ran J, Mi J (2019) An intrusion detection system based on convolutional neural
network for imbalanced network traffic. In: 2019 IEEE 7th international conference on
computer science and network technology (ICCSNT)
Performing Cryptanalysis on the Secure
Way of Communication Using Purple
Cipher Machine
1 Introduction
Fig. 1 Fragment of an original purple cipher machine [4]
text, which was preceded by a set of coded numbers which revealed the permutations
used (Fig. 1).
A. Input Plugboard:
The input plugboard has an internal alphabet part and an external alphabet part.
The external alphabet is the keyboard where the user enters the message that
needs to be enciphered or deciphered. Each character from the external alphabet
is mapped onto a fixed internal alphabet. This permutation is done manually to
make sure we get a valid permutation. The result from the internal plugboard
is used for the encipherment or decipherment process. Any of the letters can be
mapped to either the sixes or the twenties in the plugboard.
B. Switches:
The output from the input plugboard (internal alphabet) is then encrypted. The
character based on whether it is a SIXES or a TWENTIES is passed to the
six-switch or the twenty-switch, respectively. As seen earlier, the sixes switch
permutes the sixes character and gives out 25 possible permutations out of the 720
permutations in the total space. On the other hand, the TWENTIES have a larger
permutation space due to the three-switch pipeline where each switch produces
25 permutations. As a result, the total number of permutations is equal to 25³ = 15,625.
The TWENTIES switches can have modes (fast, medium and slow switches),
producing six possible switch motions. The switches advance by one position
every time a character is enciphered. There are certain rules followed:
• When a sixes character is encrypted, the switch moves up by one position.
• When a twenties character is encrypted, usually the fast switch advances by
one position except when the sixes switch is at position 24 or position 25.
– When the sixes switch is at position 24, and middle switch is at position
25, the slow switch advances.
– When the sixes switch is at position 25, the middle switch advances.
While advancing, if a switch is at position 25, it returns to position 1. The switch-advancing mechanism ensures that a new alphabet is generated every time the next letter is enciphered.
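A literal Python transcription of the stepping rules as stated above (our own variable names; the full historical stepping logic has additional subtleties not covered here):

```python
def advance(pos):
    """Step a switch one position forward, wrapping from 25 back to 1."""
    return 1 if pos == 25 else pos + 1

def step_switches(sixes, fast, middle, slow, is_sixes_char):
    """Apply the stepping rules exactly as described above (one step per character)."""
    if is_sixes_char:
        # A sixes character advances the sixes switch.
        return advance(sixes), fast, middle, slow
    # A twenties character: the fast switch advances, except at the special
    # sixes positions 24 and 25, where the slow or middle switch advances instead.
    if sixes == 24 and middle == 25:
        return sixes, fast, middle, advance(slow)
    if sixes == 25:
        return sixes, fast, advance(middle), slow
    return sixes, advance(fast), middle, slow

# Example: sixes at 24, middle at 25 -> a twenties character steps the slow switch
print(step_switches(24, 3, 25, 7, is_sixes_char=False))  # (24, 3, 25, 8)
```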
C. Output Plugboard:
The output from the switches is passed on to the internal alphabet of the output
plugboard. These characters are mapped from the internal alphabet to the external
alphabet. This mapping is identical to the mapping done in the input plugboard.
The external alphabet is the output typewriter (Fig. 2).
Algorithm
1. The machine starts by accepting input through the typewriter. Every character of the input is first permuted through a fixed substitution. Any contact of the external plugboard can be connected to any contact of the internal plugboard. For example, if the letter 'O' is connected to 'U,' it will be encrypted as a sixes character; if the letter 'O' is connected to 'S,' it will be encrypted as one of the twenties characters.
2. The primary permuted character is then passed through a switch based on whether it is a SIXES character or a TWENTIES character. The connections of the switches are defined in a switch table; refer to Appendix A and Appendix B of [3] for a detailed switch table.
3. The output of the switches goes to the output plugboard, where a fixed substitution takes place, and the output is sent to the typewriter.
This encryption method made sure that there was no logical way of predicting how a character would change from the first typewriter to the second typewriter, which rendered frequency analysis useless for breaking the code. Purple relied on five-digit groups. It allowed 45,000 entries in both an alphabetic encoding dictionary
5 Pseudo-Code
6 Drawbacks
The Purple machine was one of the most complex yet well-developed cryptographic devices of its time. It worked without being broken for more than two years during the World War II period. Even though the machine was this advanced, it still had its fair share of drawbacks.
The machine was large and heavy and was therefore not suitable for use in combat locations.
The splitting of the alphabet into sixes and twenties reduced the total number of permutations from 26! to 6! × 20!, a reduction by a factor of 26!/(6! × 20!) = 230,230. This reduced the time cryptanalysts needed to break the cipher.
The machine inherited a property of its predecessor RED, which had been broken earlier: the six letters were encrypted separately. Because of this, the U.S. Army was able to break the sixes before the twenties.
The cryptanalysts used a hill-climbing attack to decipher messages more easily. It relied on the fact that once the sixes are separated, the number of possibilities for the twenties reduces, making them easier to decode.
A hill-climbing attack is an approach that uses a graph search algorithm in which the current path is extended with a successor node that is closer to the solution. For a Purple cipher message, the attack works in the following way: it investigates and determines which trial key is more likely to produce the best result based on English letter frequency statistics, as well as how to improve an existing trial key (hill climbing). For a more detailed understanding, please refer to [1].
In 1936, when the RED machine was broken by the U.S. Signal Intelligence Service, the Japanese began creating a new machine to encipher their messages. The machine was named "97-shiki O-bun In-ji-ki" and was later given the codename PURPLE by the U.S. This machine was an improvement on the Enigma machine.
This machine was used only to send the most secret messages, because of which there was only a little ciphertext to work with. One advantage for the cryptanalysts was that, since the machine was relatively new, messages were sent on both RED and PURPLE, which helped them compare.
In August 1939, the U.S. Army hired cryptanalyst William Friedman to help with breaking the PURPLE code. Eighteen months into his work, Friedman suffered a mental breakdown and was institutionalized. After this, his team was able to use his work thus far and started making progress. Eight functional replicas of the machine were created. This was a huge achievement, as the U.S. Army had never seen an actual PURPLE machine [4].
After much effort from the team, the working of the machine was completely discovered. However, since the daily keys kept changing and the U.S. Army had still not discovered how these keys worked, they could not break the messages.
By this time, Lt. Francis A. Raven was able to work out how the daily keys worked. He found that each month was broken into three ten-day segments, within each of which a pattern could be discerned. He later made a few changes and ultimately broke the code. With this, the PURPLE cipher was totally broken and the messages could be deciphered [5].
U.S. cryptanalysts broke the 14-part message from Japan which indicated the breaking-off of negotiations with the United States on 7 December 1941. This message gave information about the Pearl Harbor attack, but it was not delivered on time due to typing difficulties and ignorance [4].
Given below is an excerpt from the 14-part message:
“YHFLO WDAKW HKKNX EBVPY HHGHE KXIOH QHUHW IKYJY
HPPFE ALNNA KIBOO ZNFRL QCFLJ T TSSD DOIOC VTAZC KQTSH XTIJC
NWXOK UFNQR TTAOI HWTATW VHOTG CGAKV ANKZA NMUIN YOYJF
SR”
To decode this, we will use the following configuration:
(i) Plugs NOKTYU-XEQLHBRMPDICJASVWGZF
(ii) Sixes switch at 13
(iii) Twenties switch at 1, 24, 10
(iv) Switch motion 4, i.e., 231.
The message was decoded as:
“THEGO VEENM ENTOF JAPAN LFLPR OMPTE DBYAG ENUIN EDESI
RETOC OMETO ANAMI CABLE UNDERSTAND INLWI THTHE GOVER
NMENT OFTHE UNITE DSTAT ESINO RDERT HATTH ETWOC OUNTR IES”
9 Implementation
We used Python to develop a working model of the Purple machine. The code contains both encryption and decryption modules.
1. The code starts by accepting input from the user: the enciphering or deciphering mode, the text to be processed, the positions of the switches and the motion of the twenties switches.
2. The text is then primary permuted with the fixed substitution. The pattern used for permuting is:
SIXES: AEIOUY to NOKTYU
TWENTIES: BCDFGHJKLMNPQRSTVWXZ to XEQLHBRMPDICJASVWGZF.
3. As every character is permuted, the permuted character is added to an array. A for loop is used to iterate through the array and perform encryption or decryption.
4. Either the sixes or the twenties permutation is called based on the character, and encryption takes place, incrementing the switch positions each time according to the stepping conditions described earlier.
5. Once the letter is permuted, it is mapped back through the internal alphabet and appended to a string, which is returned as the output to the user.
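A condensed, illustrative sketch of the primary substitution in step 2 (our own variable names and a simplified sixes/twenties tagging; the switch permutations that follow are omitted here):

```python
SIXES_IN = "AEIOUY"
SIXES_OUT = "NOKTYU"
TWENTIES_IN = "BCDFGHJKLMNPQRSTVWXZ"
TWENTIES_OUT = "XEQLHBRMPDICJASVWGZF"

PLUGBOARD = {**dict(zip(SIXES_IN, SIXES_OUT)),
             **dict(zip(TWENTIES_IN, TWENTIES_OUT))}

def primary_permute(text):
    """Map each letter through the fixed input plugboard and tag it as a
    sixes or twenties character for the switch stage that follows."""
    tagged = []
    for ch in text.upper():
        if ch not in PLUGBOARD:
            continue  # ignore spaces and punctuation
        mapped = PLUGBOARD[ch]
        tagged.append((mapped, mapped in SIXES_OUT))
    return tagged

print(primary_permute("HELLO"))
# [('B', False), ('O', True), ('P', False), ('P', False), ('T', True)]
```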
Encryption:
• The text we would like to encrypt here is “Hello World.”
• We have specified the switch positions as (3, 6, 2, 15) where 3 is the position of
the sixes switch and the other three are of the twenties switch.
• The switch motion is 231, i.e., the second switch is the fast, third switch as medium
and first as slow switch.
• The encrypted text is "CZMVT PKWJC" (Fig. 3).
Decryption:
• We use the output of encryption as the text for decryption, i.e., “CZMVT PKWJC.”
• We use the same configuration for the position of switches and the switch motion.
• The decrypted text shows "HELLO WORLD" (Fig. 4).
We performed a time analysis of the code by passing strings of various sizes. The results are shown in Table 2.
From the graph, we can infer that the time taken for encryption and decryption is almost the same for shorter strings, whereas as the length increases, decryption takes a little longer than encryption (Fig. 5).
Fig. 5 Graph between the number of characters in the string and time taken to encrypt and decrypt
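Such a timing comparison can be reproduced with Python's timeit module; the encrypt/decrypt stand-ins below are placeholders for the real routines described in this section:

```python
import random
import string
import timeit

# Placeholder stand-ins: substitute the real Purple encrypt/decrypt functions here
def encrypt(text):
    return text[::-1]

def decrypt(text):
    return text[::-1]

for length in (10, 100, 1000, 10000):
    msg = "".join(random.choices(string.ascii_uppercase, k=length))
    cipher = encrypt(msg)
    enc_t = timeit.timeit(lambda: encrypt(msg), number=100)
    dec_t = timeit.timeit(lambda: decrypt(cipher), number=100)
    print(f"{length:>6} chars: encrypt {enc_t:.4f} s, decrypt {dec_t:.4f} s (100 runs)")
```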
The Purple machine was a complex yet unique machine. It was considered one of the best cryptographic devices of the World War II era. Though it kept the alphabet partition of its predecessor RED, it was able to function and deliver secret military messages for over two years. The machine was an improvement on the famous Enigma machine: the Enigma machine provided output as blinking lights, whereas Purple used an output typewriter. The driving factor of the Purple machine was the stepping switches, which advance one position each time a character is enciphered.
The U.S. Army considered the PURPLE machine one of the hardest machines to break. U.S. cryptanalysts were eventually able to decrypt the cipher text as fast as their Japanese counterparts, owing to the limitations of the machine and the hill-climbing attack. They were able to decrypt the 14-part message that broke the ties between Japan and the United States.
Our implementation of the PURPLE machine enables enciphering and deciphering of any text with any desired configuration of the positions of the sixes and twenties switches. It also provides a way to specify the motion of the switches, i.e., which of the twenties switches will act as the fast, medium and slow switch.
Future work would include designing a front end for the machine. This could be done using front-end technologies, including PHP and React.
Another aspect of future work would be finding a way to incorporate the hill-climbing attack while deciphering the code, as we noticed that deciphering takes longer than encryption.
Acknowledgements To begin, we would like to express our gratitude to PES University for giving us the opportunity to learn, investigate and implement this project. None of this would have been feasible without their resources.
We are grateful to IFSCR and Prasad B Honnavali, Professor, Dept. of CSE, PES University, for providing us with this fantastic opportunity to learn so much. Being able to learn more about cryptography and cryptographic equipment was a fantastic experience.
We would like to express our heartfelt gratitude to Vineetha B, Assistant Professor, Department
of Computer Science and Engineering, PES University, Bengaluru. We would not have been able
to complete this internship satisfactorily without her invaluable advice.
References
1 Introduction
Artificial intelligence has supported the growth and enhancement of production and the customer experience. This technology has the ability to interact with and personalize processes in manufacturing [5].
The technology is supported by real-time data analysis from different channels of production and hence increases the responsiveness of production planning and control. Artificial intelligence combined with machine learning acts as a powerful collaboration that supports leveraging the data and information related to production planning and control in foundry units [11].
The foundry industry is a primitive industry in which a material is melted and poured into a mould, where it hardens into the preferred profile. The products of the foundry industry are applied in various sectors such as automotive, water management, agriculture, aeronautics, and defence. Hence, the foundry units need to make sure that the final quality of their products meets the highest standards [9].
P. M. Kulkarni (B)
Department of MBA, K.L.S. Institute of Management Education and Research, Belagavi, Karnataka 590006, India
e-mail: [email protected]
P. Gokhale
Department of MBA, KLE Technological University Dr. M. S. Sheshgiri College of Engineering and Technology, Udyambag Belgaum 590008, India
L. V. Appasaba · K. Lakshminarayana
Department of Management Studies, Visvesvaraya Technological University, Belagavi, India
B. S. Tigadi
Visvesvaraya Technological University, Belagavi, India
Foundry units can apply artificial intelligence, which supports accelerating manufacturing and digital transformation by reducing costs and enhancing
efficiency in the foundry units. However, there are challenges with regard to the implementation of this technology in foundry units, namely (a) shortage of talent, (b) technology infrastructure, (c) data quality, (d) real-time decision-making, and (e) trust and transparency [3].
Despite the challenges of implementing this technology, there is also great potential for it in foundries. This is supported by a study conducted by McKinsey and Co., which mentions that production costs can be reduced by up to 40% through the application of this technology; the study also indicates that over a longer duration this technology would reduce the cost of depreciation by 17% [12].
Therefore, against the backdrop of these challenges and opportunities for implementing this technology in the foundry industry, this study is undertaken to understand the influence of these challenges on the implementation of artificial intelligence and machine learning in foundry units.
These challenges are examined by applying multi-criteria decision-making methods, in particular TOPSIS, which stands for "Technique for Order of Preference by Similarity to Ideal Solution", and the Fuzzy Analytical Hierarchy Process (Fuzzy AHP) method, to evaluate, select, and rank the challenges of implementing AI and ML in the foundry industry [4].
The use of Fuzzy set theory aids in eliminating uncertainty, supports the identification of challenges, and gives directions for improving the decision-making process. Likewise, the TOPSIS Grey method is applied to rank the challenges identified by the Fuzzy AHP method. Therefore, this study provides direction in understanding the challenges of implementing AI and ML in foundry units.
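For readers unfamiliar with TOPSIS, a minimal numeric sketch of the classical (non-fuzzy, non-grey) ranking step is given below; the decision matrix, weights, and criterion directions are invented for illustration and are not the study's data:

```python
import numpy as np

# Rows: alternatives (e.g., challenges A-E); columns: evaluation criteria.
# benefit[j] = True if larger values are better for criterion j.
scores = np.array([[7, 9, 9, 8],
                   [8, 7, 8, 7],
                   [9, 6, 8, 9],
                   [6, 7, 8, 6]], dtype=float)
weights = np.array([0.3, 0.2, 0.3, 0.2])
benefit = np.array([True, True, False, True])

# 1. Vector-normalize each column, then apply the criterion weights.
norm = scores / np.linalg.norm(scores, axis=0)
weighted = norm * weights

# 2. Ideal best/worst per criterion depend on whether it is a benefit or a cost.
ideal_best = np.where(benefit, weighted.max(axis=0), weighted.min(axis=0))
ideal_worst = np.where(benefit, weighted.min(axis=0), weighted.max(axis=0))

# 3. Closeness to the ideal solution; higher means a better rank.
d_best = np.linalg.norm(weighted - ideal_best, axis=1)
d_worst = np.linalg.norm(weighted - ideal_worst, axis=1)
closeness = d_worst / (d_best + d_worst)
print("Ranking (best first):", np.argsort(-closeness) + 1)
```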
The next sections of the study cover the literature review, methodology, and results, followed by the discussion and conclusions.
2 Literature Review
The literature review for the study covers the following aspects: (a) artificial intelligence (AI), (b) machine learning (ML), (c) methods of applying ML in production, and (d) challenges of implementing machine learning (ML) and artificial intelligence (AI) in the foundry industry.
In supervised learning, the machine learning model is developed to identify patterns based on previous data such as product quality data, production output data, and service quality data [10]. In unsupervised learning, data is analysed without any previously known results; instead, the technology develops an algorithm that learns from the environment, for example by comparing sensor data from machines to detect anomalies, and the results are then analysed. In reinforcement learning, an agent takes actions in an environment so as to maximize some notion of cumulative return [1].
The foundry industry operates in a different ecosystem, where products are made largely in a customized mode of production. There is mass production, but patterns and moulds are developed according to the requirements of the customer. The challenges of implementing artificial intelligence and machine learning in the foundry industry come from five domains, namely (a) talent shortage, (b) technology infrastructure, (c) data collection and management, (d) real-time information, and (e) edge deployment [8].
Talent shortage: Skilled data scientists and AI professionals are rare. AI projects require an interdisciplinary group of data scientists, software architects, ML engineers, BI analysts, and subject matter experts (SMEs). This need is particularly evident in manufacturing, a sector that many young data scientists consider to be monotonous, repetitive, and unstimulating [8].
Technology infrastructure: Manufacturing locations often have a variety of machinery, tools, and manufacturing methods that use different and sometimes competing technologies, some of which may be running on outdated software that is not compatible with the rest of the system. In the absence of standards and common frameworks, plant engineers must determine the best way to connect their machines and systems, and which sensors or converters to install [8].
Data quality: Access to clean, meaningful, and high-quality data is critical for the
success of AI initiatives, but can be a challenge in manufacturing. Manufacturing
data often is biased, outdated, and full of errors, which can be caused by multiple
factors. One example is sensor data collected on the production floor in extreme, harsh
operating conditions, where extreme temperature, noise, and vibration variables can
produce inaccurate data. Plants have historically been built using many proprietary systems that do not talk to one another, and operational data may be spread across several databases in numerous formats not suitable for analytics, requiring extensive preprocessing [8].
Real-time decision-making: This is becoming increasingly important in manu-
facturing applications, such as monitoring quality, meeting customer delivery dates,
and more. Often, decisions need to be acted upon immediately—within seconds—to
identify a problem before it results in unplanned outages, defects, or safety issues.
Rapid decision-making requires streaming analytics and real-time prediction services
that enable manufacturers to act immediately and prevent undesirable consequences
[8].
Edge deployments: There are many potential use cases of edge computing in manufacturing, as it allows manufacturers to process data locally, filter data, and reduce the amount of data sent to a central server, either on site or in a cloud. Furthermore, a key goal in contemporary manufacturing is to be able to use data from several processes, machines, and systems to adapt the manufacturing process in real time. This close monitoring and control of manufacturing assets and processes uses large amounts of data and needs machine learning to decide the best action based on the insight from the data, and therefore entails edge-based computing. The capability to deploy predictive models is critical for enabling smart manufacturing applications [8].
In this study, a wide-ranging literature review was conducted to identify and evaluate criteria critical for understanding the challenges of implementing AI and ML in the foundry industry. The study selects five challenges, which form the criteria for the present study: talent shortage (A), technology infrastructure (B), data quality (C), real-time data (D), and edge deployments (E). Table 1 lists the criteria and sub-criteria related to the study of the challenges of implementing AI and ML in the foundry industry.
4 Research Methodology
The planned approach, i.e., Fuzzy AHP and TOPSIS Grey, is applied to understand the challenges of applying AI and ML in foundry units in India. The respondents selected for the study comprise 35 foundry units operating in the Belgaum Foundry Cluster, Belagavi, Karnataka, India. This cluster was selected because it has the largest number of operating foundry units and contributes to the export of foundry products across the globe. The study engaged twelve experts to allocate weights to the various criteria and sub-criteria and to score each alternative for each sub-criterion. The profile of the foundry units and respondents is indicated in Table 2. The suggested criteria for understanding the challenges of implementing AI and ML in the foundry industry are analysed using Fuzzy AHP. TOPSIS Grey has been used to assess and highlight the significant factors that act as opportunities for overcoming these challenges in implementing AI and ML in foundry units.
Table 1 (continued) lists the remaining sub-criterion under the edge deployments criterion (E): edge computing requires an effective networking system for real-time outcomes (E5).
\left[\sum_{i=1}^{n}\sum_{j=1}^{m} T_{g_i}^{j}\right]^{-1}=\left(\frac{1}{\sum_{i=1}^{n}\sum_{j=1}^{m} b_{3ij}},\ \frac{1}{\sum_{i=1}^{n}\sum_{j=1}^{m} b_{2ij}},\ \frac{1}{\sum_{i=1}^{n}\sum_{j=1}^{m} b_{1ij}}\right)

V\left(SY_j \ge SY_i\right)=
\begin{cases}
1, & \text{if } b_{2j} \ge b_{2i}\\
0, & \text{if } b_{1i} \ge b_{3j}\\
\dfrac{b_{1i}-b_{3j}}{\left(b_{2j}-b_{3j}\right)-\left(b_{2i}-b_{1i}\right)}, & \text{otherwise}
\end{cases}
\qquad (2)

for (i = 1, 2, 3, ..., k)
V\left(SY \ge SY_1, SY_2, \ldots, SY_k\right)=\min_{i} V\left(SY \ge SY_i\right), \quad i = 1, 2, \ldots, k \qquad (3)
Stage 5: The weight vector is then assembled from the minimum degrees of possibility obtained in (3).
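For illustration, a minimal Python sketch of the extent-analysis steps behind Eqs. (1)-(3) is given below. It assumes Chang's classical formulation with triangular fuzzy numbers (l, m, u); the 2 × 2 pairwise-comparison matrix in the example is hypothetical and not taken from the study.

def synthetic_extents(matrix):
    # Row-wise fuzzy sums multiplied by the inverse of the grand total, cf. Eq. (1)
    row_sums = [tuple(sum(x[k] for x in row) for k in range(3)) for row in matrix]
    total = tuple(sum(r[k] for r in row_sums) for k in range(3))
    return [(r[0] / total[2], r[1] / total[1], r[2] / total[0]) for r in row_sums]

def possibility(s2, s1):
    # Degree of possibility V(S2 >= S1), cf. Eq. (2)
    l1, m1, u1 = s1
    l2, m2, u2 = s2
    if m2 >= m1:
        return 1.0
    if l1 >= u2:
        return 0.0
    return (l1 - u2) / ((m2 - u2) - (m1 - l1))

def fuzzy_ahp_weights(matrix):
    s = synthetic_extents(matrix)
    # Minimum degree of possibility against all other extents, cf. Eq. (3)
    d = [min(possibility(s[i], s[k]) for k in range(len(s)) if k != i) for i in range(len(s))]
    return [x / sum(d) for x in d]        # normalized crisp weights (Stage 5)

m = [[(1, 1, 1), (1, 2, 3)],
     [(1/3, 1/2, 1), (1, 1, 1)]]          # hypothetical comparison of two criteria
print(fuzzy_ahp_weights(m))               # approximately [0.69, 0.31]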
6 TOPSIS
TOPSIS is widely used for solving complex multi-criteria decision problems. The TOPSIS method is applied using the following seven stages:
Stage 1: Build the decision matrix H

H=\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1m}\\
x_{21} & x_{22} & \cdots & x_{2m}\\
\vdots & \vdots & \ddots & \vdots\\
x_{n1} & x_{n2} & \cdots & x_{nm}
\end{bmatrix} \qquad (6)
q_{ij}=w_j\,g_{ij}, \quad (j=1,2,\ldots,m),\ (i=1,2,\ldots,n) \qquad (8)
A^{-}=\left\{\left(\min_{i} q_{ij}\,\middle|\, j\in J\right),\left(\max_{i} q_{ij}\,\middle|\, j\in J'\right)\ \middle|\ i=1,\ldots,n\right\}=\left\{q_{1}^{-}, q_{2}^{-},\ldots, q_{m}^{-}\right\} \qquad (10)
Stage 5:
d_{i}^{+}=\left[\sum_{j=1}^{m}\left(q_{ij}-q_{j}^{+}\right)^{2}\right]^{1/2}, \quad (i=1,2,\ldots,n) \qquad (11)

d_{i}^{-}=\left[\sum_{j=1}^{m}\left(q_{ij}-q_{j}^{-}\right)^{2}\right]^{1/2}, \quad (i=1,2,\ldots,n) \qquad (12)
Stage 6:
C_{i}^{+}=\frac{d_{i}^{-}}{d_{i}^{+}+d_{i}^{-}}, \quad (i=1,2,\ldots,n) \qquad (13)
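Stages 1-6 can be condensed into the short NumPy sketch below; this is a generic TOPSIS implementation consistent with Eqs. (6)-(13), not the authors' code, and the toy decision matrix is purely illustrative.

import numpy as np

def topsis(H, w, benefit):
    H = np.asarray(H, dtype=float)
    G = H / np.sqrt((H ** 2).sum(axis=0))                    # vector normalization
    Q = G * np.asarray(w)                                    # weighted matrix, Eq. (8)
    a_pos = np.where(benefit, Q.max(axis=0), Q.min(axis=0))  # ideal solution
    a_neg = np.where(benefit, Q.min(axis=0), Q.max(axis=0))  # anti-ideal solution, Eq. (10)
    d_pos = np.sqrt(((Q - a_pos) ** 2).sum(axis=1))          # Eq. (11)
    d_neg = np.sqrt(((Q - a_neg) ** 2).sum(axis=1))          # Eq. (12)
    return d_neg / (d_pos + d_neg)                           # closeness coefficient, Eq. (13)

# Toy example: 3 alternatives scored on 2 benefit criteria with equal weights
scores = topsis([[7, 9], [8, 7], [6, 8]], w=[0.5, 0.5], benefit=np.array([True, True]))
print(scores.argsort()[::-1] + 1)                            # ranking, best alternative first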
\otimes a-\otimes b=\left[\underline{a}-\overline{b};\ \overline{a}-\underline{b}\right] \qquad (15)

\otimes a\times\otimes b=\left[\min\!\left(\underline{a}\underline{b},\underline{a}\overline{b},\overline{a}\underline{b},\overline{a}\overline{b}\right);\ \max\!\left(\underline{a}\underline{b},\underline{a}\overline{b},\overline{a}\underline{b},\overline{a}\overline{b}\right)\right] \qquad (16)

\otimes a\div\otimes b=\otimes a\times\left[\frac{1}{\overline{b}},\ \frac{1}{\underline{b}}\right], \quad 0\notin\otimes b \qquad (17)

Triangular fuzzy numbers \tilde{a}=(a_1,a_2,a_3) and \tilde{b}=(b_1,b_2,b_3) are converted into the grey numbers \otimes a=[a_1,a_2] and \otimes b=[b_1,b_2], and the Euclidean distance between \otimes a and \otimes b is given by:

d(\otimes a,\otimes b)=\sqrt{\frac{1}{2}\left[\left(\underline{a}-\underline{b}\right)^{2}+\left(\overline{a}-\overline{b}\right)^{2}\right]} \qquad (18)
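A compact sketch of the interval grey-number arithmetic and distance in Eqs. (15)-(18) follows; a grey number is represented as a (lower, upper) pair, and the sample values are illustrative assumptions.

import math

def g_sub(a, b):
    return (a[0] - b[1], a[1] - b[0])                        # Eq. (15)

def g_mul(a, b):
    p = [a[i] * b[j] for i in range(2) for j in range(2)]
    return (min(p), max(p))                                  # Eq. (16)

def g_div(a, b):
    return g_mul(a, (1.0 / b[1], 1.0 / b[0]))                # Eq. (17), assuming 0 is not in b

def g_dist(a, b):
    return math.sqrt(0.5 * ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2))   # Eq. (18)

print(g_dist((0.2, 0.5), (0.6, 0.9)))                        # distance between two grey ratings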
The results are presented in two stages. In the first stage, the Fuzzy AHP results are presented, giving the weights for the main criteria and sub-criteria. In the second stage, the TOPSIS Grey results are presented, ranking the alternatives related to the challenges of implementing AI and ML in foundry units.
Fuzzy AHP Results
The Fuzzy AHP analysis has four levels: first, development of the hierarchical structure; second, main criteria weights; third, sub-criteria weights; and fourth, the final (global) sub-criteria weights.
At the first level, the hierarchical structure is developed from four parts, namely goals, criteria, sub-criteria, and alternatives. The detailed hierarchical structure is presented in Fig. 1.
The main criteria weights in Table 4 show that technology infrastructure (B), with a weight of 0.221, is ranked as the highest challenge for implementing AI and ML in the foundry units studied. Data quality (C), with a weight of 0.216, is ranked as the second challenge. The third weight, 0.210, corresponds to real-time data (D), i.e. the challenge of collecting and analysing information for AI and ML in foundry units. Talent shortage (A) is ranked fourth with a weight of 0.205, and edge deployment (E) is ranked fifth with a weight of 0.149.
Table 4 Results with regards to main criteria, sub-criteria, and ranking of the criteria on challenges of implementation of AI and ML in foundry

Criteria                        Main criteria weight   Sub-criteria code   Sub-criteria weight   Global weight   Rank
Talent shortage (A)             0.205                  A1                  0.225                 0.0461          12
                                                       A2                  0.2                   0.0410          16
                                                       A3                  0.17                  0.0349          20
                                                       A4                  0.24                  0.0492          11
                                                       A5                  0.165                 0.0338          21
Technology infrastructure (B)   0.221                  B1                  0.276                 0.0610          7
                                                       B2                  0.173                 0.0382          17
                                                       B3                  0.165                 0.0365          18
                                                       B4                  0.232                 0.0513          10
                                                       B5                  0.154                 0.0340          20
Data quality (C)                0.216                  C1                  0.201                 0.0434          13
                                                       C2                  0.199                 0.0430          14
                                                       C3                  0.167                 0.0361          19
                                                       C4                  0.239                 0.0516          9
                                                       C5                  0.194                 0.0419          15
Real-time data (D)              0.210                  D1                  0.301                 0.0632          4
                                                       D2                  0.291                 0.0611          6
                                                       D3                  0.541                 0.1136          2
                                                       D4                  0.612                 0.1285          1
                                                       D5                  0.356                 0.0748          3
Edge deployments (E)            0.149                  E1                  0.231                 0.0344          21
                                                       E2                  0.33                  0.0492          11
                                                       E3                  0.415                 0.0618          5
                                                       E4                  0.122                 0.0182          22
                                                       E5                  0.356                 0.0530          8
The Fuzzy AHP sub-criteria weights show that, among the sub-criteria ranked 1-5 by global weight, the complexity of foundry technology and a working environment that makes it difficult to implement sensors is ranked first with 0.1285, while manufacturing complexity due to the technology infrastructure is ranked second with 0.1136. Third is the complexity due to the batch production method applied in the foundry, which influences AI and ML implementation. Fourth is the difficulty in the production, planning, and control methods of foundry units, and fifth is the difficulty in connecting sensors and computers to collect data for AI and ML applications in the foundry. The detailed weights for the remaining sub-criteria are presented in Table 4.
Table 5 Ranking of opportunity for implementation of AI and ML in foundry units through TOPSIS Grey

Alternatives                                              D+        D−        C+       Ranking
AI and ML reduce cost in overall production of foundry    9.66908   6.77085   0.4118   3
AI and ML reduce wastage                                  9.59228   6.24699   0.3943   4
AI and ML need skill development                          10.0157   5.76561   0.3653   5
AI and ML improve customer satisfaction                   8.35958   7.78878   0.4823   1
AI and ML reduce supply chain cost                        9.03350   6.98951   0.4362   2
The TOPSIS Grey-integrated methodology has been used to assess and prioritize the alternatives, i.e. the opportunities for implementing AI and ML in foundry units. The results show that AI and ML provide the greatest opportunity for improving customer satisfaction in the foundry industry (0.482326). The technology also provides an opportunity to reduce the supply chain cost of the foundry industry (0.436217), which is ranked second. Reducing the overall cost of production (0.411854) is ranked third. The detailed TOPSIS Grey results are provided in Table 5.
8 Discussion
The results indicate that technology infrastructure is the key challenge for implementing AI and ML in foundry units. The development of technology infrastructure is influenced by factors that affect data collection and analysis through AI and ML, such as heat, dust, and the batch production process of the foundry.
The findings give a new direction to the study of challenges in AI and ML implementation in foundries, as previous studies have indicated that talent shortage is the key challenge for implementing this technology in foundry units. Studies have also indicated edge technology, which relates to the networking and advanced computing required for implementation, as a further challenge. Nevertheless, there is an opportunity in implementing this technology in foundry units: the results indicate that it supports foundry units with improved customer satisfaction and reduced cost in supply chain management.
The above discussion provides practical implications for improving the implementation of AI and ML in foundry units. Firstly, foundry units need to invest in technology, especially sensors and cloud computing for data capture and analysis, as this improves the efficiency of real-time data analysis. Secondly, foundry units need to invest in this technology because it supports cost reduction in supply chain management and other manufacturing costs: it collects real-time data, on the basis of which foundry managers can make faster decisions. Thirdly, foundry units need to train employees to use this technology.
The overall results indicate that this technology is effective in the foundry industry, as it supports customer satisfaction and reduces the cost of production. However, there are challenges in developing the technology for collecting real-time data due to the foundry manufacturing eco-system. Further studies can be undertaken in foundry clusters of other Indian states, and comparative studies can be carried out between foundries in developed and underdeveloped countries. Finally, the study results indicate that AI and ML are powerful tools for the foundry industry to improve production efficiency and enhance customer satisfaction.
References
11. Sahu CK, Young C, Rai R (2021) Artificial intelligence (AI) in augmented reality (AR)-assisted
manufacturing applications: a review. Int J Prod Res 59(16):4903–4959
12. Thomas DS, Gilbert SW (2014) Costs and cost effectiveness of additive manufacturing. NIST
Spec Publ 1176:12
Conversion of Sign Language to Text
Using Machine Learning
1 Introduction
Sign language is a form of communication used by people who are deaf. People who are differently abled use hand and body movements as a non-verbal communication technique to convey their feelings and ideas to other people. However, since ordinary people have a hard time understanding these expressions, experienced sign language interpreters are required for medical and legal appointments, as well as educational and training sessions, and demand for such services has risen in recent years. Other services, such as video remote interpreting by a human interpreter over an internet connection, have been proposed, providing an easy-to-use sign language interpreting service, but with significant limitations. To address this, we created a custom CNN model to recognize sign language gestures. The convolutional neural network is built from three convolution layers together with max-pooling layers, a dense layer, and a flattening layer. To train the model to recognize gestures, we utilize the MNIST Indian Sign Language dataset, which includes the characteristics of Indian alphabets and digits, and we use OpenCV with the custom CNN (fully convolutional) model to recognize a sign from a real-time webcam. The precise extraction of hand movements, as well as facial emotions, is vital. Existing studies are mostly focused on particular sign computational linguistics.
2 Related Works
Communication is one of the most fundamental prerequisites of social life. Individuals with hearing problems communicate with one another using gestures, which are difficult for hearing persons to understand. In today's world, almost every language has its own established gesture-based communication. It is critical to provide an understanding of signs for the various populations that are unfamiliar with sign language [1]. Sign language is a system of hand signals that includes visual gestures and signs. Conversational gestures, regulating gestures, manipulating motions, and instructive motions are some of the other types of hand motions [2]. The use of a few parts of our body, such as the fingers, hand, and arm, to express information is known as sign language. A system that recognizes hand movements from a digitally enhanced dataset is known as image recognition [3]. This technique is now used in a wide range of applications, including robotics and telerobotics, games, virtual reality, and human-computer interaction (HCI). The majority of the presented models rely on traditional pattern recognition, which necessitates human competence for feature extraction and recognition. Deep learning's recent success, particularly the residual neural network (ResNet) for machine vision, is being utilized to treat Indian Sign Language (ISL) detection as an object recognition challenge [4].
The paper in [5] suggests employing hidden Markov models (HMMs) to categorize the outcome, trajectory parameters, and alignment structure of sign languages. The intrinsic features of this technique make it suitable for use in gesture recognition [6]. There, the authors provided a technique in which a total of 262 signs were gathered from two distinct signers, with a precision of 95% using an HMM classifier; when the database is trained and evaluated using the signs of various people, the accuracy is considerably reduced. Artificial neural network-based approaches for real-time American Sign Language (ASL) identification have also been suggested [7]. The Microsoft Kinect sensor is utilized in that study to detect signs for two applications, arithmetic calculation and the rock-paper-scissors game, with an accuracy of more than 90% for ASL. Setting-subordinated HMMs and a technique for linking three-dimensional strategies were used on ASL in [8]; the framework classified 53 ASL signs with an accuracy of 89.91%. A convolutional neural network (CNN) was utilized in [9] to recognize signs of Indian Sign Language. To identify Arabic Sign Language, the authors of [10] employed a neural network. Kadam and Ghodke [11] developed an Indian sign identification system for twenty-five English letters and nine number signs; the study used PCA for sign classification and a fingertip segmentation method for feature extraction, achieving an accuracy of 94%. Others suggested a technique for detecting Tamil sign letters [12]. This approach employed images with a resolution of 640 × 480 pixels, which are then transformed to grayscale images. The technique achieved a precision of 96.87% for the static method and 98.75% for the dynamic method [13].
3 System Analysis
We describe a system for the conversion of sign language to text using machine learning, to support better communication between hearing- and speech-impaired people and other people. The system is efficient as well as economical, as it does not require instrumented gloves or any other external device. It eliminates the communication barrier by converting sign language to text, so hearing- and speech-impaired people do not need to rely on a human interpreter and can be more independent.
4 System Implementation
We have proposed a system that helps to translate Indian Sign Language into text form. The proposed system follows the process below:
1. Dataset
2. Image preprocessing and hand segmentation
3. Feature extraction from data
4. Classification algorithm
4.1 Dataset
The suggested technique is divided into several stages. In the first stage, we collected data from a standard dataset. After gathering the data, we converted each picture to grayscale and resized it to 64 × 64 pixels. We then applied normalization to transform the gray-level data from the range 0-255 to the range 0-1. In the convolution stage, we skip two convolution operations and add the input back just before the final ReLU activation; here the algorithm recognizes Indian Sign Language by extracting the attributes of the picture data. To increase efficiency, we used an optimization approach (the Adam optimizer) in the fourth stage. Finally, we employed real-time augmentation to improve the variety of data available for the training set. We have used the MNIST-style Indian Sign Language dataset from Kaggle; the pictures of Indian Sign Language signs come from this common dataset, and, to provide fixed-length input to the proposed model, each picture is converted to grayscale and resized to 64 × 64 pixels. In the field of image processing, pattern matching is a rapidly developing discipline, and ResNet is a crucial component of computer vision. Convolutional layers are the most significant layers for extracting features from images in ResNet. ResNet can be thought of as a collection of residual blocks, each of which has a convolutional layer, batch normalization, and the ReLU activation function. ResNet also includes a skip connection that aids in resolving the vanishing gradient problem; sample Indian Sign Language alphabets from the dataset are shown in Fig. 1 (image source [14]). To create this model, we utilized the Keras API with TensorFlow as a backend. The output of a block is combined with the result of the first layer, a process known as a skip (multiple-hop) connection, and the ReLU transfer function is then applied to the output. This is maintained until the last layer; we then utilize a flatten layer and a fully connected dense layer with 24 classes, followed by a Softmax function for the posterior distribution (Figs. 1 and 2).
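As an illustration of the residual structure just described, the Keras sketch below builds one residual block (convolution, batch normalization, ReLU, and a skip connection added before the final activation) feeding a 24-class softmax. It is a simplified stand-in, not the authors' exact architecture; the 64 × 64 grayscale input shape follows the preprocessing above.

from tensorflow.keras import layers, models

def residual_block(x, filters):
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([x, y])                 # skip connection added before the final activation
    return layers.ReLU()(y)

inputs = layers.Input(shape=(64, 64, 1))                      # 64 x 64 grayscale sign image
x = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
x = residual_block(x, 16)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(24, activation="softmax")(x)           # 24 sign classes
model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])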
The resolution of different camera devices is not the same, so the captured images have different resolutions. All images are rescaled to a uniform size in order to decrease computational effort and to compare features accurately. To obtain accurate predictions, we preprocess the image before feeding it to the algorithm. The image is converted to grayscale; in the region of hand movement, the colour is converted to grayscale, which results in a binarized image. The image is split into small segments for more accurate prediction. Colour is utilized as a descriptor for object detection, and we use the YCbCr colour space for hand detection. We have used YCbCr over RGB because we need to channelize all the colours in one direction. The next step is feature extraction from the collected data.
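Before moving on, a hedged OpenCV sketch of the preprocessing and hand-segmentation step just described is shown here; the YCbCr skin-tone bounds are assumptions chosen for illustration, not values reported by the authors.

import cv2
import numpy as np

def segment_hand(frame, size=(64, 64)):
    frame = cv2.resize(frame, size)                           # rescale to a uniform size
    ycrcb = cv2.cvtColor(frame, cv2.COLOR_BGR2YCrCb)          # move to the YCbCr colour space
    lower = np.array([0, 135, 85], dtype=np.uint8)            # assumed skin-tone bounds (Y, Cr, Cb)
    upper = np.array([255, 180, 135], dtype=np.uint8)
    mask = cv2.inRange(ycrcb, lower, upper)                   # binarized hand region
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cv2.bitwise_and(gray, gray, mask=mask)             # grayscale image masked to the hand

# roi = segment_hand(cv2.imread("sign.jpg"))                  # hypothetical input file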
From the data we have collected, we need to extract features. Feature extraction is also known as dimensionality reduction. A dataset contains a large number of variables, which increases the computing cost; dimensionality reduction helps to extract the best features that describe the dataset. We use feature extraction for easier processing of the dataset and provide the extracted features to the algorithm.
Distance transformations are used to blur the location of features. Fourier descriptors are used to describe the boundary shape of an object in the image. In content-based image retrieval, shape is a very important feature, and Fourier descriptors can be applied irrespective of the size and scale of the image. Therefore, we have used the Fourier transformation, as it is independent of the shape and size of the object in the image. To match the query object with objects in the database, we use a feature vector; some of the features are colour, area, length, and gradient direction. These features are then compared with the feature vector.
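The boundary-based Fourier descriptor idea can be sketched as follows; dropping the DC term and dividing by the first harmonic are the usual normalizations for translation and scale invariance and are assumed here rather than taken from the paper.

import cv2
import numpy as np

def fourier_descriptor(binary_img, n_coeffs=16):
    contours, _ = cv2.findContours(binary_img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    boundary = max(contours, key=cv2.contourArea).squeeze()   # (N, 2) boundary points
    z = boundary[:, 0] + 1j * boundary[:, 1]                  # treat (x, y) as complex samples
    f = np.fft.fft(z)
    f[0] = 0                                                  # drop DC term for translation invariance
    f = f / np.abs(f[1])                                      # divide by first harmonic for scale invariance
    return np.abs(f[1:n_coeffs + 1])

# Descriptors of a query hand shape and of database shapes can then be compared
# with a simple Euclidean distance between the resulting feature vectors.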
Figure 4 shows the layers of the convolutional neural network. As illustrated there:
(1) There are 4 feature units and 5 activation units in the next hidden layer.
(2) The bias units in every layer are set to 1.
(3) The input values are b01, b02, b03 and b04; these are the basic features.
(4) The 4 feature units are connected to the 5 activation units of the hidden layer; the weights of each feature connect the two layers.
This is followed by a pooling layer.
We have used three convolutional layers. The first layer accepts the low-level features of a 50 × 50 grayscale image. Its activation map is 49 × 49 with 16 filters, giving a total size of 16 × 49 × 49. An activation layer is applied to remove negative values, which are replaced with zero. Max-pooling is then applied, which reduces the map to 25 × 25 by taking only the maximum value of each region. The second convolutional layer recognizes curves and angles; it has 32 filters and a 23 × 23 activation map, for a total of 32 × 23 × 23. Max-pooling is then applied, and the result is 32 × 8 × 8, again taking the maximum values of the regions. High-level gestures are identified in the third convolutional layer; here 64 filters are used, and the total size is 64 × 4 × 4. Max-pooling reduces the map to 64 × 1 × 1, so the output is a 1-D array of length 64. The dense layer then expands the array to 128, and the next layer drops random elements of the map. In the last step, a dense layer reduces the array to 44 elements, which is the number of classes (Fig. 7).
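A minimal Keras sketch of this three-convolution network is given below. The kernel and pooling sizes are assumptions chosen to roughly reproduce the activation-map sizes quoted above; the resulting shapes differ slightly from the quoted figures (e.g. 24 × 24 instead of 25 × 25).

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(50, 50, 1)),              # 50 x 50 grayscale input
    layers.Conv2D(16, 2, activation="relu"),      # 49 x 49 x 16 activation map
    layers.MaxPooling2D(2),                       # 24 x 24 (text quotes 25 x 25)
    layers.Conv2D(32, 2, activation="relu"),      # 23 x 23 x 32
    layers.MaxPooling2D(3),                       # 7 x 7 (text quotes 8 x 8)
    layers.Conv2D(64, 4, activation="relu"),      # 4 x 4 x 64
    layers.MaxPooling2D(4),                       # 1 x 1 x 64
    layers.Flatten(),                             # 1-D array of length 64
    layers.Dense(128, activation="relu"),         # dense layer expands the array to 128
    layers.Dropout(0.5),                          # drops random elements of the map
    layers.Dense(44, activation="softmax"),       # class count quoted in the text
])
model.summary()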
The Keras ImageDataGenerator is used for training and testing the dataset. The validation split of the dataset is used to measure the loss and accuracy of each epoch and to prevent the model from overshooting the loss minima.
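A brief sketch of how such a generator might be configured follows; the augmentation parameters, directory path, and validation fraction are illustrative assumptions rather than the authors' settings.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1.0 / 255, rotation_range=10,
                             width_shift_range=0.1, height_shift_range=0.1,
                             validation_split=0.2)
train_gen = datagen.flow_from_directory("dataset/", target_size=(50, 50),
                                        color_mode="grayscale", subset="training")
val_gen = datagen.flow_from_directory("dataset/", target_size=(50, 50),
                                      color_mode="grayscale", subset="validation")
# model.fit(train_gen, validation_data=val_gen, epochs=20)   # validation loss/accuracy per epoch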
6 Accuracy
The accuracy is evaluated using recall and the F-score. The F-score is defined as the harmonic mean of recall and precision.
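For reference, the standard definition used here is

F = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}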
An Intel i7 CPU at 2.7 GHz with 16 GB of random access memory is used for execution. ResNet versions (32, 50, 101, and 152) are used for the experimental investigation of the proposed system, including a 5G network. The major factors considered for evaluating the efficiency of the proposed system are execution time (including data processing, data uploading, downloading, etc.), memory consumption, network overhead, and energy (Table 1 and Fig. 8).
8 Conclusion
(Fig. 8 shows the analysis of the proposed system using ResNet-100 with CNN and various machine learning algorithms.)
A system for the conversion of sign language to text using machine learning is described in this paper. In terms of time consumption as well as memory usage, the system trades off some performance. To this end, through code-block execution, we proposed a comprehensive mechanism that does not lead to any loss of accuracy and is friendly to resource-constrained environments. The experiments investigate the performance, configurability, and original machine learning process of the system by installing the application with the Caffe and TensorFlow frameworks.
9 Future Scope
Our system can be extended with the additional features and techniques that can create
a system of sign language detection system that can recognize the facial expression
and video streaming. A fully automated Indian Sign Language recognition with a
sign to text and text to speech converter which can analyze the video in real time
and generate an output based on voice and text can be developed. For future work
to detect moving objects in runtime direction using hybrid machine learning will be
interesting task for enhancement of current research.
Acknowledgements I take the opportunity to thank my mentor Mrs. Poonam Gupta and Mrs.
Nivedita Kadam for their most valuable guidance.
References
1. Wu Y, Huang TS (1999) Human hand modeling, analysis and animation in the context of HCI.
In: Proceedings 1999 international conference on image processing (Cat. 99CH36348), vol 3,
pp 6–10. https://doi.org/10.1109/ICIP.1999.817058
2. Muhammed MAE, Ahmed AA, Khalid TA (2017) Benchmark analysis of popular ImageNet
classification deep CNN architectures. In: 2017 International conference on smart technologies
for smart nation (SmartTechCon), pp 902–907. https://doi.org/10.1109/SmartTechCon.2017.
8358502
3. Shawon A, Jamil-Ur Rahman M, Mahmud F, Arefin Zaman MM (2018) Bangla handwritten
digit recognition using deep CNN for large and unbiased dataset. In: 2018 international confer-
ence on Bangla speech and language processing (ICBSLP), pp 1–6. https://doi.org/10.1109/
ICBSLP.2018.8554900
4. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016
IEEE conference on computer vision and pattern recognition (CVPR), pp 770–778. https://doi.
org/10.1109/CVPR.2016.90
5. Das, Gawde S, Suratwala K, Kalbande D (2018) Sign language recognition using deep learning
on custom processed static gesture images. In: 2018 international conference on smart city and
emerging technology (ICSCET), pp 1–6. https://doi.org/10.1109/ICSCET.2018.8537248
6. Mustafa M (2021) A study on Arabic sign language recognition for differently abled using
advanced machine learning classifiers. J Ambient Intell Human Comput 12:4101–4115. https://
doi.org/10.1007/s12652-020-01790-w
7. Gurbuz SZ et al (2021) American sign language recognition using RF sensing. IEEE Sens J
21(3):3763–3775. https://doi.org/10.1109/JSEN.2020.3022376
8. Roy PP, Kumar P, Kim BG (2021) An efficient sign language recognition (SLR) system using
Camshift tracker and hidden Markov model (HMM). SN Comput Sci 2:79. https://doi.org/10.
1007/s42979-021-00485-z
9. Sruthi CJ, Lijiya A (2019) Signet: a deep learning based Indian sign language recognition
system. In: 2019 international conference on communication and signal processing (ICCSP),
pp 0596–0600. https://doi.org/10.1109/ICCSP.2019.8698006
10. Aly S, Aly W (2020) DeepArSLR: a novel signer-independent deep learning framework for
isolated Arabic sign language gestures recognition. IEEE Access 8:83199–83212. https://doi.
org/10.1109/ACCESS.2020.2990699
11. Kadam S, Ghodke A, Sadhukhan S (2019) Hand gesture recognition software based on
Indian sign language. In: 2019 1st international conference on innovations in information
and communication technology (ICIICT), pp 1–6. https://doi.org/10.1109/ICIICT1.2019.874
1512
12. Sharma S, Singh S (2020) Vision-based sign language recognition system: a comprehensive
review. In: 2020 international conference on inventive computation technologies (ICICT), pp
140–144. https://doi.org/10.1109/ICICT48043.2020.9112409
13. Jayanthi P, Thyagharajan KK (2013) Tamil alphabets sign language translator. In: 2013 fifth
international conference on advanced computing (ICoAC), pp 383–388. https://doi.org/10.
1109/ICoAC.2013.6921981
14. https://images.app.goo.gl/kQmp2suEoq2YYoSH8
15. https://upload.wikimedia.org/wikipedia/commons/thumb/c/c8/Asl_alphabet_gallaudet.svg/
1200px-Asl_alphabet_gallaudet.svg.png
A Hexagonal Sierpinski Fractal Antenna
for Multiband Wireless Applications
1 Introduction
The concept of the fractal was first developed by the scientist Benoit Mandelbrot in 1975. There are basically two types of fractal antennas, viz. the Sierpinski fractal and the Koch fractal. As time passes, the applications of antennas are increasing rapidly. The fractal structure uses the self-similarity concept in its design, which maximizes the effective length or increases the perimeter of the antenna geometry. The key aspect of a fractal lies in the iterations or repetitions formed. Due to these iterations, fractal antennas can become compact, multiband, and wideband, and they are used in many wireless applications [1, 2]. Any patch antenna consists of 3 layers: the patch at the top, the substrate in the middle, and the ground at the bottom [3]. The antenna size depends on the operating frequency. In this paper, a hexagonal Sierpinski fractal antenna is designed for UWB applications. The coplanar waveguide (CPW) feed technique is used to obtain a response at high frequencies [4]. A modified Sierpinski fractal-based microstrip antenna for ultrahigh frequency (UHF) radio frequency identification (RFID) can be designed by combining corner-cutting techniques with the fractal shape [5]. A Sierpinski carpet fractal antenna has been designed at 2.4 GHz by introducing a C-shaped slot in a rectangular patch, which supports multiband characteristics [6].
A Sierpinski gasket fractal multiband antenna with a modified structure can be used for Wi-Fi and cognitive radio applications. A novel microstrip triangular fractal antenna shows multiband behaviour covering frequency bands from LTE, X-band (8-12 GHz), Ku-band (12-18 GHz), K-band (18-26.5 GHz) to Ka-band (26.5-40 GHz) [7-11]. Array formation in the Sierpinski fractal design increases the bandwidth, the resonant frequencies, and the gain of the antenna [12, 13]. A fractal antenna with multiple frequency bands is always better than separate patch antennas for the same frequencies [14]. A Sierpinski fractal antenna with an electromagnetic band-gap structure helps to suppress harmonics and improves gain and bandwidth compared with a Sierpinski fractal antenna without an electromagnetic band gap [15]. Combining the planar metamaterial concept with a compact ultra-wideband Sierpinski antenna helps to effectively increase the bandwidth of the antenna [16]. Local optimization helps to improve the performance and miniaturization of an antenna [17]. A Sierpinski antenna with triangular slots, using the midpoint geometry of the triangle, enhances the bandwidth when modified with a circular shape [18, 19].
2 Antenna Design
The original antenna design is a microstrip antenna with a hexagonal patch. The antenna has been designed on an FR4 substrate with length 30 mm, width 28 mm, thickness 1.6 mm, and dielectric constant 4.4. It has a ground plane of length 30 mm and width 28 mm. The side of the regular hexagon is 8.8 mm, with a feed line of length 9 mm and width 1.4 mm. The design is shown in Fig. 1.
The design and simulation of the antenna are done with the help of the CAD FEKO software. Simulation results are obtained for the reflection coefficient, total gain, radiation pattern, voltage standing wave ratio (VSWR), and impedance of the designed antenna. The simulation results of the hexagonal patch antenna with full ground and of the hexagonal antenna with partial ground are shown in Table 1, and the graphical representation of the reflection coefficient is shown in Fig. 2. From the simulation results, we can observe that the frequency bands as well as the bandwidths are improved for the partial ground plane.
The Sierpinski fractal structure has been embedded in the design for further improvement of the antenna results. The simulation results, viz. resonant frequencies, VSWR, impedance, reflection coefficient, and total gain for Iteration 0, Iteration 1, and Iteration 2, are shown in Table 2, and the graphical representation of the frequencies and reflection coefficients for the proposed antenna (Iteration 2) is shown in Fig. 3.
From the simulation results, it has been observed that all antenna parameters achieve their best values in Iteration 2 compared with Iteration 0 and Iteration 1. The proposed antenna has VSWR values of 1.04, 1.05, 1.5, 1.6 and 1.2; impedances of 49.3 Ω, 52.2 Ω, 54.9 Ω, 75.6 Ω and 45.6 Ω; reflection coefficients of −35.51 dB, −33.17 dB, −13.79 dB, −12.77 dB and −19.56 dB; and total gains of 4 dBi, 6 dBi, 8 dBi, 7.5 dBi and 6 dBi at the frequencies 5.57 GHz, 8.18 GHz, 11.31 GHz, 13.55 GHz and 17.54 GHz, respectively. The bandwidth of the antenna increases effectively in Iteration 2: the maximum bandwidth is 7.472 GHz for Iteration 0, 5.6355 GHz for Iteration 1, and 7.56123 GHz for Iteration 2. The frequency bands for the proposed antenna, i.e., Iteration 2, are 4.3-11.9 GHz, 13.148-13.980 GHz, and 16.83-18.60 GHz.
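As a quick consistency check using the standard VSWR-to-reflection-coefficient relations (not anything specific to this design), the reported VSWR values can be converted to reflection coefficients in dB; small differences from the quoted values are expected because the VSWR figures are rounded.

import math

for vswr, s11_reported in [(1.04, -35.51), (1.05, -33.17), (1.5, -13.79),
                           (1.6, -12.77), (1.2, -19.56)]:
    gamma = (vswr - 1) / (vswr + 1)                               # |reflection coefficient|
    print(vswr, round(20 * math.log10(gamma), 2), s11_reported)   # computed vs reported dB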
4 Conclusion
This paper analyses a square-shaped Sierpinski fractal structure in a hexagonal patch. The proposed antenna radiates in 3 different frequency bands, viz. 4.3-11.86, 13.148-13.98 and 16.83-18.60 GHz, with 5 resonant frequencies of 5.57, 8.18, 11.31, 13.55 and 17.54 GHz; VSWR values of 1.04, 1.05, 1.5, 1.6 and 1.2; impedances of 49.3, 52.2, 54.9, 75.6 and 45.6 Ω; reflection coefficients of −35.51, −33.17, −13.79, −12.77 and −19.56 dB; and total gains of 4, 6, 8, 7.5 and 6 dBi at the respective frequencies. It has been observed that the designed antenna gives its best performance for Iteration 2, and it can be used for C-band (4-8 GHz), X-band (8-12 GHz) and part of Ku-band (12-18 GHz) applications. The designed antenna is also suitable for vehicle-to-everything (V2X), Dedicated Short Range Communications (DSRC) and Wireless Access in Vehicular Environments (WAVE) communication bands (5.850-5.925 GHz).
References
1. Mathpati MS, Bakhar M, Dakulagi V (2021) Design and analysis of a fractal antenna using
jeans material for WiMax/WSN applications. Wireless Pers Commun. https://doi.org/10.1007/
s11277-021-08418-y
2. Bhaldar H, Gowre S, Mahesh, Dakulagi V (2021) Design of circular shaped microstrip textile
antenna for UWB application. IETE J Res. https://doi.org/10.1080/03772063.2021.1982416
3. Kaur R, Singh H (2017) Review on different shape fractal antenna for different applications.
Int J Adv Res Comput Sci 8(4):339–342
4. Bairy P, Ashok Kumar S, Shanmuganantham T (2017) Design of CPW fed hexagonal sierpinski
fractal antenna for UWB band applications. In: IEEE international conference on circuits and
systems, pp 107–108
5. Yin-kun W, Du L, Lei S, Jian-Shu L (2013) Modified sierpinski fractal based microstrip antenna
for RFID. In: IEEE international wireless symposium
6. Sran SS, Sivia JS (2016) Design of C shape modified sierpinski carpet fractal antenna for
wireless applications. In: International conference on electrical, electronics, and optimization
techniques, pp 821–824
7. Prasanthi Jasmine K, Ratna Spandana S (2018) Design and performance analysis of sierpinski
diamond fractal antenna for multi-band applications. In: SPACES, pp 85–89
8. Rajshree A, Sivasundarapandian S, Suriyakala CD (2014) A modified sierpinski gasket
triangular multiband fractal antenna for cognitive radio. In: ICICES2014
9. Petkov PZ, Bonev BG (2014) Analysis of a modified sierpinski gasket antenna for Wi-Fi
applications. In: 24th international conference Radioelekronoka
10. Lizzi L, Massa A (2011) Dual-band printed fractal monopole antenna for LTE applications.
IEEE Antennas Wirel Propag Lett 10:760–763
11. Baliarda CP, Borau CB, Rodero MN, Robert JR (2000) An iterative model for fractal antennas:
application to the sierpinski gasket antenna. IEEE Trans Antennas Propag 48(5):713–719
12. Mathpati MS, Bakhar Md, Vidyashree D (2017) Design and simulation of sierpinski fractal
antenna array. In: International conference on energy, communication, data analytics and soft
computing, pp 2514–2518
13. Cao TN, Krzysztofik WJ (2018) Design of multiband sierpinski fractal carpet antenna array
for C-band. In: 22nd international microwave and radar conference (MIKON), pp 41–44
14. Maharana MS, Mishra GP, Mangaraj BB (2017) Design and simulation of a sierpinski carpet
fractal antenna for 5G commercial applications. In: IEEE WiSPNET 2017 conference, pp
1718–1721
15. MAA X, Mi S, Lee YH (2015) Design of a microstrip antenna using square sierpinski fractal
EBG structure. In: IEEE 4th Asia–Pacific conference on antennas and propagation (APCAP),
pp 610–611
16. Lamari S, Kubacki R, Czyzewski M (2014) Frequency range widening of the microstrip antenna
with the sierpinski fractal patterned metamaterial structure. In: 20th international conference
on microwaves, radar and wireless communications
17. Koziel S, Saraereh O, Jayasinghe JW, Uduwawala D (2017) Local optimization of a sierpinski
carpet fractal antenna. In: ICIIS, pp 1–5
18. Ramprakash K, Shibi Kirubavathy P (2017) Design of sierpinski fractal antenna for wide-
band applications. In: International conference on innovations in information, embedded and
communication systems
19. Bal VK, Bhomia Y, Bhardwaj A (2017) Carpet structure of combination of crown square and
sierpinski gasket fractal antenna using transmission line feed. In: International conference on
electrical and computing technologies and applications (ICECTA)
Multi-level Hierarchical
Information-Driven Risk-Sensitive
Routing Protocol for Mobile-WSN:
MHIR-SRmW
1 Introduction
In the last few years, the exponential increase in wireless communication demands has driven academia and industry to develop quality of service (QoS)-oriented communication systems. Among the major wireless technologies, wireless sensor networks (WSNs) have been used in a diverse range of applications serving health care, industrial monitoring and control, surveillance systems, and numerous machine-to-machine (M2M) communication purposes. On the other hand, the rise of technologies like the Internet of Things (IoT) has broadened the horizon for WSNs to serve different real-time decision purposes. Their decentralized and infrastructure-less characteristics make WSNs one of the most sought-after technologies for real-time surveillance and communication, even under disaster or natural calamity [1-4]. Despite a significantly large application horizon, WSNs have always been challenging networks due to their resource-constrained, greedy, and static network characteristics. At the same time, in the last few years, IoT and M2M communication systems have turned to mobile-WSN to provide QoS communication while guaranteeing energy efficiency [5]. Most classical WSN protocols apply reactive route management strategies, which often suffer limited performance due to lack of dynamism and inferior scalability over large network sizes [5]. The classical IEEE 802.15.4 standard applies a reactive routing strategy and does not address topological variation and its impact on communication efficiency. A change in network topology often results in network outage, link loss, packet drops, congestion, etc., and consequently in degraded performance of the overall network [6]. A
few studies indicate that multihop transmission, which is often employed over large networks, suffers delay and packet loss that impact overall QoS performance. M2M communication, which operates over dynamic topologies, also requires a robust routing protocol to retain QoS performance. To alleviate such problems, introducing mobility into the network is of great significance. The use of mobility (mobile-WSN) can help reduce the number of hops in data gathering or distribution [6], and it can broaden the scalability of the network to serve more applications demanding mobile communication. In this context, a few research efforts have been made in the last few years towards combining mobility with WSN. However, guaranteeing network reliability and QoS performance has remained a challenge [1-7].
Exploring in depth, one finds that the medium access control (MAC) framework has a decisive role in QoS assurance in WSNs. MAC protocols, which often control network-adaptive activities to meet network demands, require intrinsic improvement and mobility-adaptive scheduling capability to ensure QoS-oriented communication over mobile-WSN [7]. Unlike PHY-centric routing decisions, MAC-based approaches help in making optimal routing decisions over diverse operating or network conditions [8]. Unlike standalone layer-based routing decisions (i.e. PHY-layer-adaptive or system-layer-adaptive), cross-layer architecture-based methods have yielded superior performance even under mobile topology [7, 8]. In fact, the PHY layer helps in controlling the transmission rate and power management, while system-layer information provides a large amount of dynamic network information such as congestion, flooding probability, packet velocity, network link availability or link-loss probability, etc. Noticeably, all of these parameters vary over time in mobile-WSN communication [7]. Therefore, applying such (dynamic) network parameters adaptively in routing can be of great significance. A MAC routing protocol applying different cross-layer information can help achieve QoS performance in WSNs [10, 11]. Recent literature also reveals that the strategic amalgamation of the routing protocol and MAC can help ensure QoS performance [10-12]. In this context, we developed a first-of-its-kind dynamic network-driven MAC protocol named JSMCRP [9], which exploited both PHY-layer information and system-layer information from the network layer, application layer, and MAC layer to perform mobile-WSN communication. Unlike classical MAC protocols or routing solutions, the JSMCRP protocol was designed as a proactive network solution exploiting congestion information from the MAC layer, link quality information from the network layer, and data-specific priority information from the application layer to perform dynamic routing decisions. Undeniably, unlike classical standalone-parameter-based routing, the JSMCRP protocol yields superior performance in terms of optimal resource utilization, higher throughput, low packet loss, etc., for both real-time and non-real-time traffic. This approach exploited single-hop information, i.e. node information (cumulative congestion degree, packet velocity, dynamic link quality, and node rank information) pertaining to each neighbouring node towards the destination. Despite the fact that the JSMCRP protocol yields better performance, it could not address the uncertainty in link reliability or the outage caused by malicious nodes or hardware failure. In other words,
JSMCRP applied one-hop information to perform routing decisions, where a neighbouring node with suitable node features was selected as the best forwarding node. However, it did not address problems that could be caused by the failure of the subsequent forwarding node. In addition, the iterative estimation of the node score for each forwarding node as per the JSMCRP criteria can become an exhaustive process and hence impact energy consumption. To alleviate such problems, it is vital to consider both node transmission efficiency (i.e. energy efficiency and QoS constraints) and reliability (i.e. fault resilience) while performing QoS-centric routing decisions in mobile-WSN.
Considering the above problems and the allied scope, in this research paper a new and robust QoS-oriented routing protocol is developed. The proposed model can be seen as an extension of our previous work, JSMCRP [9]. In this paper, a multi-level hierarchical information-driven risk-sensitive routing protocol for mobile-WSN (MHIR-SRmW) is developed. As the name indicates, the proposed MHIR-SRmW protocol exploits dynamic network information as well as node behavioural aspects to perform node profiling and risk assessment, based on which it executes routing decisions. At first, the MHIR-SRmW protocol exploits node information such as IEEE 802.11 MAC information exchange, flooding information, and link availability, and topological information such as the link-index variation rate (LIVR) and the cumulative congestion degree. The proposed risk-profiling model helps MHIR-SRmW to select the most suitable forwarding nodes for future transmission. However, since nodes may die due to hardware malfunction, physical damage, or even sudden malicious node attacks, merely estimating the best forwarding node cannot ensure QoS provision. To address this problem, MHIR-SRmW incorporates a heuristic-driven multi-path availability-based recovery strategy (HMPARS). The proposed MPARS model designs a multi-path transmission or relaying strategy with a minimum number of shared components in the forwarding paths. Noticeably, the proposed method uses a single transmission path at a time but retains two additional relaying paths for recovery transmission in case of any node failure or node death. Moreover, when deciding the forwarding paths, the MHIR-SRmW model guarantees that the three forwarding paths have no shared or connected components. Unlike the classical Dijkstra shortest-distance-based recovery strategy, the MPARS model applies a heuristic concept that guarantees that the recovery paths have no shared components and considers only nodes with optimal node-profile values for data transmission. In case of any fault during dynamic routing, the MHIR-SRmW protocol switches the relaying path without undergoing beaconing or node discovery, and thus saves energy as well as time. In MHIR-SRmW, the proposed model estimates the supplementary recovery paths only at the network start phase and saves them in a proactive table. Once any link outage is identified, it switches to another disjoint path and completes the data transmission. In this manner, the MHIR-SRmW protocol ensures higher reliability as well as QoS assurance, which makes it suitable for real-time WSN communication. The overall proposed routing protocol (i.e. MHIR-SRmW) was simulated using Network Simulator 2 (NS2), and its performance was examined in terms of throughput, packet loss, delay, and energy consumption.
The rest of this manuscript is organized as follows. Section 2 discusses the related work, followed by the proposed method and its implementation in Sect. 3. Section 4 presents the results and discussion, and the overall conclusions and allied inferences are given in Sect. 5. The references used in this manuscript are given at the end.
2 Related Work
This section discusses some of the key literature pertaining to WSN MAC protocols for QoS-centric communication. The authors in [8] proposed three MAC protocols, routing-enhanced MAC (RMAC), pipelined routing-enhanced MAC (PRMAC), and CLMAC, to improve delay and reliability; to perform routing decisions, they applied merely the packet delivery rate and delay information. Recognizing congestion as a frequent problem in WSN, the authors of [10] proposed contention-radio-based MAC (CR-MAC). Additionally, applying CR-MAC, they reframed a synchronous routing model named joint routing and MAC (JRAM-MAC) in [11]. However, [10-12] failed to address the fault probability and its impact on routing performance. A further improved model was developed in [13], which found that JRAM-MAC can be suitable in terms of energy and packet delivery; even so, it could not address other QoS aspects such as reliability under mobile topology and delay. In [14], the authors proposed an adaptive-operating-cycle MAC protocol focused on maintaining an optimal trade-off between energy and QoS in a wireless mesh network. Similarly, the authors in [15] designed MAC and PHY
layer information to perform routing in WSN. Authors applied shortest path infor-
mation to perform QoS-centric routing in mobile network. In [16], authors designed
a non-destructive interference MAC (NDI-MAC) model for constrained network
routing. Author proposed a multi-channel pure collective Aloha (MC-PCA) MAC
protocol [16]. However, it was highly computationally exhaustive. In [17], authors
proposed routing-enhanced MAC (RM-MAC) protocol where authors focused on
achieving timely data delivery.
Authors in [17] suggested to design a cross-layer routing concept towards WSN
routing, yet it failed in addressing network dynamism and link vulnerability adaptive
routing solution. In [18], a directed diffusion routing protocol (DDRP) was devel-
oped. Authors designed packet delivery rate analysis-based MAC routing protocol
for WSN. In [19], a receiver-centric MAC (RC-MAC) was proposed where authors
applied the different traffic conditions to perform routing decision. In [20], a MAC
protocol named harvested energy adaptive MAC (HEMAC) was developed by using
periodic listen and sleep mechanism with two frames pioneer (PION) and explorer
(EXP). Authors applied delay as the decision variable to perform routing deci-
sion in WSN. In [21], authors proposed a joint MAC and routing protocol for
Wireless Body Network (WBAN). To achieve it, authors designed a cross-layer
routing concept by applying MAC and PHY layer information. To further improve
MAC performance towards QoS-oriented WSN routing, different algorithms like
Hybrid Medium Access Control (H-MAC) and Hybrid Sensor Medium Access Control (HSMAC) were proposed in [22]. These algorithms were developed to improve the performance of ad hoc on-demand multipath distance vector (AOMDV) routing. The authors found that H-MAC-based AOMDV performs better than the other methods. A few efforts like [23, 24] applied
energy-aware routing protocol EAMP-AIDC by using adaptive individual duty cycle
(AIDC) optimization model which applies residual energy as the decision variable to
decide duty cycle [24]. A similar effort was made in [25] named adaptive energy effi-
cient and rate adaptation-based MAC routing protocol (AEERA – MACRP). In [26],
authors proposed an energy-aware MAC protocol routing MAC protocol (EARMP)
that changes duty cycles to cope up with network demands.
Towards fault-resilient routing, authors [26] proposed location-based RMAC (RL-
MAC) which exploited inter-node information to configure network with minimum
contention window [26]. Unfortunately, authors could not address fault resilience
and post-fault proactive routing decision. Authors [27] improved CSMA/TDMA
MAC model with channel-adaptive named iQueue-MAC to retain QoS performance
under burst traffic. As an enhanced solution, authors [28] proposed power-control
and delay aware routing MAC (PCDARM) for multi-path transmission. To incul-
cate multihop transmission, authors applied hop extended pipelined routing MAC
(HE-PRMAC) for WSNs. To ensure timely data transmission, authors [29] devel-
oped residual time-driven RD-MAC and depth-base routing MAC (DBR-MAC). In
[30], authors suggested a cross-layer model using network layer and MAC layer
to perform routing over WSNs. Similarly, authors in [31] proposed cross-layer
protocol for multi-sink WSN (MS-WSN). In [32], buffer-reservation-based MAC
was proposed for WSN. In [33], authors developed adaptive geographic any cast
(AGA-MAC) protocol by solving sleep-delay problem of asynchronous preamble-
based MAC. AGA-MAC protocol selects best forwarding node to complete trans-
mission. However, it failed in addressing fault proneness under dynamic topology.
Authors [34] exploited dynamic information from the different layers PHY, MAC,
and network layers to perform WSN routing protocol. Contention-based cross-layer
synchronous MAC protocol (CROP-MAC) was developed for WSN [24]. COP-
MAC applied staggered sleep/wake scheduling, synchronization, and routing layer
information to perform routing decision. Author [28] proposed a cross-layer model
that dynamically switches the MAC behaviour between TDMA and CSMA. In [35],
MAC-aware routing protocol for WSNs was proposed using two-hop information to
make next hop routing. However, authors could not address the link-outage proba-
bility at run time while performing QoS-centric routing. Despite these different efforts, most existing methods fail to address the fault possibility caused by network dynamism when making QoS-centric routing decisions. Moreover, no significant effort has addressed the possibility of iterative faults due to joint or shared node death in the forwarding path. This can be considered the key driving force behind this study.
3 System Model
This section discusses the overall proposed MHIR-SRmW protocol and its implementation. To ensure QoS-oriented communication in mobile-WSN,
our proposed MHIR-SRmW model encompasses two key steps. These are:
1. Multi-layered hierarchical dynamic information-based node profiling,
2. Heuristic-driven multi-path availability-based recovery strategy (HMPARS).
As an extension of our previous work, JSMCRP [9], in this research we exploit dynamic node information from both the PHY layer and the system layers (in the OSI model, the layers above PHY are referred to here as system layers) to perform node profiling, which acts as the decision variable for estimating suitable forwarding-node candidates. Once the best set of forwarding nodes is identified, MHIR-SRmW executes MPARS, which identifies the best multiple forwarding paths with no shared components. Here, we hypothesize that ensuring no shared component(s) between two disjoint paths can help alleviate iterative faults or link loss over a large dynamic network, especially under attack or physical-damage conditions. In this context, the proposed MPARS model applies a particle swarm optimization algorithm that exploits the node-profile information along with node-connectivity information to identify a set of recovery paths, which helps in rerouting or forwarding the data under any node failure or link outage. In this manner, it intends to guarantee QoS-centric transmission in the WSN. The detailed discussion of the overall proposed model is given in the subsequent sections.
In a dynamic network such as a mobile-WSN, the nodes typically undergo exceedingly high network dynamism: changes in topology, swift and frequent link changes and outages, congestion, etc. This dynamism often causes link unavailability and hence transmission failures, which not only impose retransmission but also add large redundant transmission cost, energy consumption, and delay. Eventually, such events result in QoS violations. Classical WSN protocols, such as reactive protocols, often fail to address these challenges due to high delay and energy consumption. To alleviate this, in this work we define the MHIR-SRmW model as a proactive network-management model. As the name indicates, we exploit dynamic network information from the different layers of the protocol stack to perform node profiling, which helps identify the suitable set of nodes for further routing decisions. Furthermore, in contemporary networks, the presence of malicious nodes or intruders imposes significant (mischief-caused)
specific node from any future path formation. Moreover, in case a node does not respond to the request beacon message (say, a HELLO message), it is classified as a malicious or misbehaving node and is thus excluded from further routing. In this approach, a participating node X examines the likelihood that a node can transmit the data successfully to node Z. We estimate the likelihood of successful transmission using (1).
$P_M = \frac{\xi_{Rx}(t_{i-1}, t_i)}{\xi_{Exp}(t_{i-1}, t_i)}$  (1)
In (1), the parameters $\xi_{Rx}$ and $\xi_{Exp}$ represent the total number of beacon messages received and the total number of beacons expected during the time interval $(t_{i-1}, t_i)$, respectively.
In addition to the probability of successful transmission by a node, the proposed model estimates two other MAC parameters: the dynamic link quality and the cumulative congestion probability. Similar to our previous work, JSMCRP [9], in the MHIR-SRmW protocol we estimate the dynamic link quality of a node during the period $(t_{i-1}, t_i)$ using Eq. (2).
$PDR_{ij} = \frac{P_{Rx}}{P_{Tx}}$  (2)
In (2), the parameter $P_{Rx}$ represents the total number of packets received, while $P_{Tx}$ represents the total number of packets transmitted by the $i$th node to the destination node $j$. Additionally, the proposed MHIR-SRmW protocol applies Eq. (3) to estimate the dynamic link quality of each participating node in the deployed network.
$\beta_{DLQI} = \mu \cdot \beta_{DLQI} + (1 - \alpha) \cdot PDR_{ij}$  (3)
In (3), $\beta_{DLQI}$ represents the dynamic link quality of the $i$th node, while $\mu$ is a network coefficient that varies in the range 0–1. In addition to the transmission probability and the dynamic link quality at the MAC layer, the congestion probability can also be estimated to perform node profiling. In this reference, the proposed MHIR-SRmW protocol exploits information such as the maximum buffer capacity and the current buffer availability to assess the congestion probability of a node. Recalling that in a mobile-WSN, network dynamism or topological changes can cause iterative congestion at the participating nodes, we estimate the cumulative congestion degree using Eqs. (4) and (5).
$CD_F = \frac{CD_{RPD} + CD_{NR2D}}{CD_{RPD\_Max} + CD_{NR2D\_Max}}$  (4)

$CD_r = \sum_{i=1}^{N} CD_{F_i}$  (5)
In the above model (4), the parameters $CD_{RPD}$ and $CD_{NR2D}$ represent the buffer available for real-time traffic and for non-real-time traffic, respectively. Similarly, $CD_{RPD\_Max}$ and $CD_{NR2D\_Max}$ are the maximum buffers available for real-time data and non-real-time data, respectively. The cumulative congestion of a node during the time interval $(t_{i-1}, t_i)$ is given by $CD_r$. Thus, applying the above methods, the proposed MHIR-SRmW routing protocol estimates the key MAC information, including the successful transmission probability (1), dynamic link quality (3), and cumulative congestion degree (5) of a node. These MAC parameters are used as node-profile variables for further computation. A minimal sketch of these computations is given below.
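To make the profiling step concrete, the following is a minimal Python sketch (not the authors' NS/C++ implementation) of how the MAC-layer profile terms of Eqs. (1)–(5) could be evaluated per node; all function names, variable names, and default coefficient values are illustrative assumptions.

# Illustrative sketch only: per-node MAC profile variables from Eqs. (1)-(5).
def transmission_probability(beacons_rx, beacons_expected):
    """Eq. (1): P_M = received beacons / expected beacons in (t_{i-1}, t_i)."""
    return beacons_rx / beacons_expected if beacons_expected else 0.0

def packet_delivery_ratio(pkts_rx, pkts_tx):
    """Eq. (2): PDR_ij over the observation window."""
    return pkts_rx / pkts_tx if pkts_tx else 0.0

def update_dlqi(prev_dlqi, pdr_ij, mu=0.7, alpha=0.7):
    """Eq. (3): exponentially smoothed dynamic link quality (mu, alpha assumed in [0, 1])."""
    return mu * prev_dlqi + (1.0 - alpha) * pdr_ij

def congestion_degree(buf_rt, buf_nrt, buf_rt_max, buf_nrt_max):
    """Eq. (4): CD_F from real-time and non-real-time buffer occupancy."""
    return (buf_rt + buf_nrt) / (buf_rt_max + buf_nrt_max)

def cumulative_congestion(cd_samples):
    """Eq. (5): CD_r, the cumulative congestion over the observation window."""
    return sum(cd_samples)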
$T_{load\_i} = \frac{1}{L}\sum_{j=1}^{N} l_j$  (6)
Let $l_{max}$ be the highest queue length of a participating node (at the MAC layer); then the total traffic density is estimated using (7).
$T_{loadDens\_i} = \frac{T_{load\_i}}{l_{max}}$  (7)
Since the result of (8) is directly related to the transmission delay and energy consumption, the proposed MHIR-SRmW protocol considers a node with minimum queue length for forwarding-node selection. Thus, after estimating the above-stated MAC information from the IEEE 802.15.4 protocol stack of each node, the MHIR-SRmW protocol executes a heuristic concept to identify the best forwarding path and guarantee QoS in WSN. The queue-based load terms are illustrated below.
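As an illustration only, the queue-based load terms of Eqs. (6)–(7) could be evaluated as follows; the names and the guard against an empty queue are assumptions, not part of the paper.

# Illustrative sketch of Eqs. (6)-(7); variable names are assumptions.
def traffic_load(queue_lengths, L):
    """Eq. (6): mean MAC-layer queue length observed over L sampling instants."""
    return sum(queue_lengths) / L

def traffic_density(t_load, l_max):
    """Eq. (7): traffic load normalised by the largest observed queue length."""
    return t_load / l_max if l_max else 0.0

# A node with lower traffic density (shorter queue) would be preferred as a forwarder.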
Recalling that in a mobile-WSN the random movement pattern can create overhearing situations, node(s) might be forced into redundant communication and hence into delay and/or QoS violations. Moreover, a malfunctioning node or an intruder can also create overhearing conditions, such as a replay attack in WSN, which can cause wrong transmission decisions. To alleviate this, the MHIR-SRmW protocol intends to deploy the network in such a manner that data transmission to the next-hop node does not create any overhearing condition in the vicinity. If a node is able to overhear the packet forwarding from the one-hop-distant node, it is labelled as a normal node. On the contrary, a node creating iterative beaconing is classified as a malicious node. Similarly, when a transmitter node is unable to overhear the retransmission of its packet because the destination node is unreachable due to stale or repeated routing information, the corresponding forwarding node is identified as a malicious node. In reference to this information, MHIR-SRmW estimates a trustworthiness factor for each participating node using the Link-Index Change Rate (LICR), estimated as per (9) for the $i$th node.
$\eta_i = \gamma_i + \delta_i$  (9)
In (9), the parameter $\gamma_i$ states the rate of arrival, while $\delta_i$ represents the rate of link outage of the $i$th node. Noticeably, the highest feasible rate of arrival $\gamma_{i\_Max}$ should be the same as the highest rate of link outage, and thus the highest link outage ($\delta_{i\_Max}$) can be estimated from $\gamma_{i\_Max} + \delta_{i\_Max} = 2\sigma_i$ [36]. Thus, the change in link quality, or LICR, can be estimated using (10).
$\eta = \frac{\gamma_i + \delta_i}{2\sigma_i}$  (10)
$P_\eta = 1 - \eta$  (11)
Thus, in reference to the above-derived parameter, nodes with a high link-change rate (LICR) can be avoided during forwarding-path selection. Once the proposed model has estimated the above-stated parameters, i.e., the likelihood of successful transmission, dynamic link quality, cumulative congestion, and change in link quality for each participating node, each node is characterized for its suitability to become a forwarding node. The risk analysis and labelling are performed using Eq. (12).
$Node_{sel} = f\left[\min_{i \in N}\left(P_\eta\right),\ \min_{i \in N}\left(CD_r\right),\ \max_{i \in N}\left(P_{Succ\_i}\right),\ \max_{i \in N}\left(\beta_{DLQI}\right)\right]$  (12)
the time for which the node remains connected to its peer nodes. In other words, it assesses whether the participating nodes can remain connected during $(t_{i-1}, t_i)$ to complete the transmission. Thus, a node with higher connectivity or availability is considered for recovery-path formation. Moreover, forwarding paths involving minimum source-to-destination distance and no shared component are considered the best forwarding paths to ensure QoS. Here, we intend to obtain two disjoint paths with higher connectivity and no shared component. Thus, employing the suitable set of nodes (in reference to (12)), the proposed HMPARS model performs the best disjoint-path selection while fulfilling the criteria of (12), high connectivity, and availability with no shared component. This problem resembles a typical NP-hard, non-convex optimization problem, which can be solved by a heuristic method. In this reference, we developed a particle swarm optimization (PSO) algorithm that exploits the above parameters and identifies a set of forwarding paths with high connectivity (no link loss or connectivity loss) and no shared component. In the proposed HMPARS model, we employ a first-order approximation to obtain node or path unavailability as the sum of the unavailability of all connected nodes. In this work, we executed the proposed HMPARS model as a Monte Carlo simulation, which helped in capturing the dynamic topology and allied network conditions. It also helped in deploying the network as a probabilistic network, and hence we applied a Bayesian network model to deploy the mobile-WSN over the defined geographical region. A simplified sketch of this candidate filtering and disjoint-path search is given below.
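Because the PSO details are omitted here, the following Python sketch is only a simplified stand-in for the intent of HMPARS: it filters candidate forwarders in the spirit of Eq. (12) (the thresholds are assumed values) and then uses plain breadth-first search to obtain two node-disjoint paths. It is not the authors' PSO algorithm.

# Simplified stand-in (not the paper's PSO): filter candidates via the Eq. (12)
# criteria, then find two paths that share no intermediate node.
from collections import deque

def select_candidates(profiles, p_eta_max=0.3, cd_max=0.5, p_succ_min=0.8, dlqi_min=0.6):
    """Keep nodes with low link-change probability, low congestion, high success
    probability, and high link quality (thresholds are illustrative assumptions)."""
    return {n for n, p in profiles.items()
            if p["p_eta"] <= p_eta_max and p["cd_r"] <= cd_max
            and p["p_succ"] >= p_succ_min and p["dlqi"] >= dlqi_min}

def bfs_path(adj, src, dst, allowed):
    """Shortest-hop path from src to dst using only 'allowed' intermediate nodes."""
    prev, seen, q = {src: None}, {src}, deque([src])
    while q:
        u = q.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj.get(u, ()):
            if v not in seen and (v in allowed or v == dst):
                seen.add(v)
                prev[v] = u
                q.append(v)
    return None

def two_disjoint_paths(adj, src, dst, candidates):
    """Return two paths with no shared intermediate node, if they exist."""
    p1 = bfs_path(adj, src, dst, candidates)
    if p1 is None:
        return None, None
    remaining = candidates - set(p1[1:-1])   # remove shared components
    return p1, bfs_path(adj, src, dst, remaining)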
B. Link Connectivity Estimation
Typically, in wireless communication, node connectivity states the likelihood that at least one node path between the source and destination is present. In this reference, a sensor node $n_0$ can be connected to the recovery path only when it (i.e., $n_0$) is active in conjunction with at least one path joining the source to the destination. Towards QoS-oriented communication in WSN, the proposed MHIR-SRmW protocol hypothesizes that each node possesses two disjoint forwarding paths with no shared component(s). Consider that the forwarding paths for $n_0$ are $P_0, \ldots, P_{K-1}$; then the connected path can be obtained as per (13).
$C(n_0) = A(n_0)\, A\!\left(\bigcup_{k=0}^{K-1} P_k\right) A(C)$  (13)
In (13), $A\!\left(\bigcup_{k=0}^{K-1} P_k\right)$ represents the availability of the set of forwarding paths available for recovery. Even with an active terminal node, the connectivity of $n_0$ might undergo a loss condition when route $P_k$ fails. Thus, hypothesizing that node and link outages are independent in nature, we estimate the link availability as per (14).
$A\!\left(\bigcup_{k=0}^{K-1} P_k\right) = 1 - \prod_{k=0}^{K-1} U_r(P_k)$  (14)
Consider that path $k$ encompasses the sensor nodes $\{n_{k,0}, \ldots, n_{k,f_k}\}$ and the corresponding links $\{e_{k,0,1}, \ldots, e_{k,f_k-1,f_k}\}$, with $n_{k,0} = n_0$ and $n_{k,f_k} = C$; then the link availability is estimated as per (15).

$U_r(P_k) = 1 - A_r(P_k) = 1 - \prod_{i=1}^{f_k-1} A_n(n_{k,i}) \prod_{j=0}^{f_k-1} A_e(e_{k,j,j+1})$  (15)
Now, employing the above-derived link availability model, Eqs. (13)–(15), the link connectivity for the transmitter $n_0$ is obtained using (16).
$C(n_0) = A(n_0) A(C) \times \left(1 - \prod_{k=0}^{K-1}\left(1 - \prod_{i=1}^{f_k-1} A_n(n_{k,i}) \prod_{j=0}^{f_k-1} A_e(e_{k,j,j+1})\right)\right)$  (16)
$C(n_0) = \prod_{j \in \varphi_n} A_n(n_j) \prod_{k \in \varphi_e} A_e(e_{k,k+1}) \times \left(1 - \left(1 - \prod_{i \in \varphi_{n,0}} A_n(n_{0,i}) \prod_{j \in \varphi_{e,0}} A_e(e_{0,j,j+1})\right)\left(1 - \prod_{i \in \varphi_{n,1}} A_n(n_{1,i}) \prod_{j \in \varphi_{e,1}} A_e(e_{1,j,j+1})\right)\right)$  (17)
Now, the loss probability contributed by the unshared links or nodes of the path, which does not impact the retransmission probability or link loss of the other path, is obtained as per (20).
$L(n_0) \approx \sum_{j \in \varphi_n} U_n(n_j) + \sum_{k \in \varphi_e} U_e(e_{k,k+1})$  (20)
Observing the above-derived models (19) and (20), it can be seen that a QoS-oriented recovery path pair can be designed without any shared component. To achieve this, we apply a PSO algorithm that exploits link availability (or link-loss probability) as the cost function (21) to perform disjoint recovery-path estimation and guarantee QoS performance. Due to space constraints, a detailed discussion of the PSO algorithm is not given in this paper. An illustration of the availability and loss terms that such a cost function would use is given below.
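As a hedged illustration of the cost-function ingredients, the availability terms of Eqs. (14)–(16) and the first-order loss of Eq. (20) could be evaluated as below; the per-node and per-link availability dictionaries are assumed inputs, not the paper's data structures.

# Illustrative sketch of the availability model in Eqs. (14)-(16) and the
# first-order loss approximation of Eq. (20).
from math import prod

def path_availability(path_nodes, path_links, A_n, A_e):
    """A_r(P_k): product of intermediate-node and link availabilities (Eq. (15))."""
    return prod(A_n[n] for n in path_nodes[1:-1]) * prod(A_e[e] for e in path_links)

def recovery_set_availability(paths, A_n, A_e):
    """Eq. (14): availability of the union of the K recovery paths."""
    return 1.0 - prod(1.0 - path_availability(nodes, links, A_n, A_e)
                      for nodes, links in paths)

def connectivity(n0_avail, sink_avail, paths, A_n, A_e):
    """Eq. (16): C(n0) = A(n0) * A(C) * availability of at least one recovery path."""
    return n0_avail * sink_avail * recovery_set_availability(paths, A_n, A_e)

def first_order_loss(path_nodes, path_links, A_n, A_e):
    """Eq. (20): loss approximated by summing node and link unavailabilities."""
    return (sum(1.0 - A_n[n] for n in path_nodes[1:-1])
            + sum(1.0 - A_e[e] for e in path_links))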
In HMPARS, PSO intends to retain dual disjoint paths in such a manner that each retains minimum hops, low link-connectivity-loss probability, and no shared component. The PSO algorithm is executed over all possible paths, and the process continues by adding a new hop sensor node (following (12)) with minimum link-loss probability and no shared component. This continues until the likelihood of achieving a superior path becomes very low. The process employs two iterative mechanisms: path selection and pruning. Once a suitable forwarding path is selected in iteration $k$, the other path(s) from $S_k$ possessing a poor cost function or poor link connectivity are pruned. This helps not only in achieving QoS but also in reducing computational costs and allied resource exhaustion. The HMPARS model applies a cost function $c(P)$ to each possible path $P$ using (21).
Now, let $P$ be the forwarding path formed with zero link unavailability. Then the path $R$ can be connected to the node $n_f$, and thus for any complete forwarding path $M_i \in S_k$, let $L(P, M_i)$ be the connectivity loss for the source node $n_0$. The mean connectivity loss is then estimated using (22).
$\tilde{L}(P) = \frac{1}{N_c}\sum_{i=1}^{N_c} L(P, M_i)$  (22)
It is a matter of fact that, in dynamic networks like mobile-WSN, the link-loss probability varies, and so does that of the path $P$. In this reference, the cost function derived in (22) can be reframed as (23). In the above-derived function (23), $E(P)$ is obtained based on the average loss caused per link across the path pairs. In other words, $E(P)$ is estimated as per (24).
$E(P) = \frac{1}{N_c}\sum_{i=1}^{N_c} E(P, M_i)$  (24)
where

$E(P, M_i) = \frac{\tilde{L}(M_i)}{\lambda}\, d(n_P, n_f)$  (25)
Now, to estimate the distance values, graph theory is applied, representing the network by an adjacency matrix $A$ with components $a_{ij}$, where $a_{ij} = 1$ when the link between node $i$ and node $j$ is active; otherwise $a_{ij} = 0$, with $a_{ii} = 1$. In this reference, a matrix $B(k)$ is estimated, defined as (26).
$B(k) = A^k$  (26)
In expression (26), the matrix $B(k)$ has entries $b_{ij}(k)$ equal to the total number of paths from $i$ to $j$ with at most $k$ hops. Thus, when $b_{ij}(k) = 0$, there is no feasible path from node $i$ to node $j$ within $k$ hops. In the proposed model, we estimate the distance between nodes $i$ and $j$ as the shortest path, which can be estimated as per (27). The model in (27) states that $d(i, j)$ takes the minimum value of $k$ hops for which $b_{ij}(k) > 0$. Thus, employing (19) and (27) with high link availability, the proposed HMPARS model obtains two disjoint paths with no shared components. An illustrative sketch of this hop-distance computation is given below.
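A minimal sketch of the hop-distance idea around Eqs. (26)–(27) is given below, assuming a NumPy adjacency matrix; it simply raises $A$ to successive powers and returns the smallest $k$ with $b_{ij}(k) > 0$.

# Minimal sketch: B(k) = A^k and d(i, j) = min{k : b_ij(k) > 0}.
import numpy as np

def hop_distance(A, i, j, k_max=None):
    """Return the minimum number of hops between nodes i and j, or None."""
    n = A.shape[0]
    k_max = k_max or n
    B = np.eye(n, dtype=np.int64)
    for k in range(1, k_max + 1):
        B = B @ A                      # B(k) = A^k
        if B[i, j] > 0:                # at least one path of <= k hops exists
            return k
    return None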
In the proposed MHIR-SRmW protocol, once the values for (25) are estimated, all paths available in $S_k$ are updated proactively. Thus, if during transmission the MHIR-SRmW protocol identifies any fault or sudden link loss with "0" link connectivity, it switches to the available disjoint path and completes the transmission without undergoing node discovery and recovery-path estimation. As a result, this helps ensure optimal QoS for the mobile-WSN.
The overall proposed model was developed using the Network Simulator platform, where the algorithms were implemented in the C++ programming language. The model was simulated on Ubuntu 14 on a CPU equipped with 8 GB RAM. The experimental setup considered for the simulation is given in Table 1.
To assess the relative performance of the proposed MHIR-SRmW protocol, we considered our previous work, the JSMCRP model [9], as the reference. Similar to the proposed MHIR-SRmW protocol, JSMCRP employs the application, network, MAC, and PHY layers to perform network-adaptive MAC scheduling and dynamic routing decisions. JSMCRP employs Data Traffic Assessment, Prioritization and Scheduling (DTAPS), Proactive Network Monitoring and Knowledge (PNMK), Dynamic Congestion Index Estimation (DCIE), Adaptive Link Quality, Dynamic Packet Injection Rate (DPIR), and Cumulative Rank Sensitive Routing Decision (CRSRD) to perform routing decisions. Thus, exploiting the dynamic values of the cumulative congestion degree, packet injection rate, and dynamic link quality, our previous work JSMCRP performed best forwarding-node selection and completed the transmission. The packet delivery rate (PDR) performance of JSMCRP was almost 99%, which signifies its robustness towards QoS-oriented communication. However, the JSMCRP model did not consider any sudden link-outage probability and its consequence on performance. On the contrary, the work proposed in this paper (i.e., the MHIR-SRmW protocol) considers link-outage-adaptive routing to guarantee QoS provision. To assess the efficacy of the proposed MHIR-SRmW protocol as well as JSMCRP, we introduced a predefined node-death scenario and accordingly examined their performance towards complete data transmission. We estimated four key performance parameters (packet delivery rate, packet loss rate, latency, and energy consumption) to assess the relative performance of the proposed MHIR-SRmW protocol and the previous JSMCRP protocol. The simulation results and allied inferences are given as follows.
Being a mobile-topology network, a mobile-WSN might undergo high congestion, link-outage probability, etc., and the severity might increase with network density. In other words, with a large number of autonomously communicating nodes, the likelihood of congestion, and hence packet loss, might increase. On the other hand, with a large number of nodes in a dense mobile-WSN, the search space for HMPARS might have to process a large amount of data, and hence it might be exhaustive, especially in terms of delay, packet loss, and retransmission (and hence high energy consumption). Considering this hypothesis, in this study we examined whether the proposed MHIR-SRmW protocol alleviates the above-stated problem and yields timely data delivery while guaranteeing minimum delay, higher throughput, and minimum energy consumption. Achieving such performance helps accomplish the targeted QoS. In this reference, we simulated the proposed MHIR-SRmW protocol as well as the JSMCRP model independently with different node densities (i.e., 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100 nodes). Moreover, we deployed random link loss in the simulation, even though the number of faults or faulty nodes considered (or deployed) was one. Thus, once the proposed methods (i.e., the MHIR-SRmW protocol and the previous JSMCRP) were simulated, the node death was identified in the forwarding path, and subsequently the efficiency of the proposed model towards QoS performance was examined. To estimate the packet delivery rate (PDR), we applied the classical equation representing the ratio of the total received data to the total transmitted data. Correspondingly, the packet loss was estimated as 100 − PDR (%). For energy estimation, we considered the classical energy model, where the key energy consumption takes place at the time of node discovery and routing-table estimation towards dual disjoint-path estimation (with no shared component). Additionally, energy is exhausted by the power amplifier and the per-bit transmission cost (mJ). The simulation parameters used are given in Table 1. A small illustration of the metric computation is given below.
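For illustration only (the actual simulation was run in the Network Simulator with C++), the PDR and PLR metrics described above reduce to the following arithmetic; the function names are assumptions.

# Illustrative metric computation as described in the text.
def pdr_percent(pkts_received, pkts_sent):
    """Packet delivery rate (%): received data over transmitted data."""
    return 100.0 * pkts_received / pkts_sent if pkts_sent else 0.0

def plr_percent(pkts_received, pkts_sent):
    """Packet loss rate (%) = 100 - PDR (%)."""
    return 100.0 - pdr_percent(pkts_received, pkts_sent)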
Figure 1 presents the packet delivery ratio (PDR) performance of the proposed MHIR-SRmW protocol and the previous contribution JSMCRP [9]. It is a matter of fact that the previous work JSMCRP exhibited almost 99% successful packet delivery (PDR); however, there was no fault or node-death case during that simulation. In other words, JSMCRP did not consider link-outage probability and the resulting recovery-path estimation. In JSMCRP, we merely estimated node parameters to decide the best forwarding node, and based on that a forwarding path was defined. On the contrary, in the presence of a fault or a node-death condition, JSMCRP has to identify the best suitable alternate path for recovery transmission, which can be achieved only by performing node discovery (after the node death) and best-forwarding-path selection. This process can not only cause data drops but can also impose congestion, delay, and energy consumption. Therefore, JSMCRP might suffer packet losses. This fact is quite visible in the simulation results (Figs. 1 and 2). Observing the results, it can be found that the proposed MHIR-SRmW protocol, which embodies a self-configuring dual disjoint forwarding path during node death, achieves superior performance to JSMCRP. An in-depth assessment reveals that the proposed MHIR-SRmW protocol achieves an average PDR of 97.6%, while JSMCRP achieves an average PDR of 86.8%, which is significantly lower than the proposed model. One interesting outcome is that, even with a large number of nodes, both JSMCRP and the proposed MHIR-SRmW protocol retain near-stable PDR performance. This can be attributed to their proactive network-management capability. The packet loss rate (PLR) results too indicate (Fig. 2) that the proposed MHIR-SRmW protocol exhibits a lower PLR than the JSMCRP mobile-WSN protocol.
In real-time communication across WSN application environments, guaranteeing timely data transmission is inevitable. Delay or latency might be caused by higher computation, packet loss, retransmission, etc. The above results confirm that the proposed MHIR-SRmW protocol exhibits a lower PLR, hence lower retransmission, and can therefore be expected to exhibit low latency. However, both JSMCRP and the MHIR-SRmW protocol might have to execute an alternate forwarding-path selection process during link loss, and this process can be time-consuming. Realizing this, we examined the proposed MHIR-SRmW protocol as well as the previous work, JSMCRP, in terms of their corresponding time efficiency. To assess latency over a continuous channel-access period of 60 s, we simulated the proposed model with 60 nodes. The simulation results for JSMCRP as well as the proposed MHIR-SRmW protocol are shown in Fig. 4. Observing the results in Fig. 4, it can easily be found that the proposed MHIR-SRmW protocol incurs significantly smaller delay in comparison with the previous work JSMCRP. The key reason is the network discovery and best-forwarding-path formation cost in JSMCRP. On the contrary, the proposed MHIR-SRmW protocol obtains the dual disjoint forwarding paths at the start of the network, and hence, once it identifies any link outage or node death, it switches to the disjoint forwarding path without undergoing node discovery or allied processes. This makes the proposed MHIR-SRmW protocol more time-efficient (Figs. 3 and 4). The results obtained (Figs. 3 and 4) reveal that, although both JSMCRP and the proposed MHIR-SRmW protocol exhibit the same delay at the start of the simulation, once node death and link loss are detected, JSMCRP undergoes higher delay due to retransmission caused by data loss, the node discovery process, and the estimation of another forwarding path.
It is a matter of fact that energy consumption is directly related to the retransmission or packet-loss rate and to computation. Although the MHIR-SRmW protocol employs more computing elements, its higher PDR performance enables it to reduce energy consumption. In this reference, the results obtained (Fig. 5) reveal that the proposed MHIR-SRmW protocol exhibits less than 92 mJ of power consumption even in a high-density network. On the contrary, despite being less computationally exhaustive, JSMCRP underwent higher power consumption due to retransmission. In reference to the PLR performance (Fig. 2), the retransmission probability is higher for JSMCRP, and hence it undergoes higher power consumption (Fig. 5). Observing the overall performance in terms of high packet delivery, low packet loss, low delay, and fault resiliency, the proposed MHIR-SRmW protocol accomplishes the targeted QoS performance. The results obtained signify that the proposed model can be applied in real-time mobile-WSN applications, where it can guarantee both network reliability and the statistical performance needed to meet application demands. The overall research conclusions and allied inferences are given in the subsequent section.
5 Conclusion
References
1. Ehsan S, Hamdaoui B (2012) A survey on energy-efficient routing techniques with QoS assur-
ances for wireless multimedia sensor networks. Commun Surv Tutorials IEEE 14(2):265–278
2. Spachos P, Toumpakaris D, Hatzinakos D (2015) QoS and energy-aware dynamic routing in
wireless multimedia sensor networks. In: 2015 IEEE international conference on communica-
tions (ICC), pp.6935–6940
3. Sen J, Ukil A (2009) An adaptable and QoS-aware routing protocol for wireless sensor networks. In: 2009 1st international conference on wireless communication, vehicular technology, information theory and aerospace & electronic systems technology, pp 767–771
4. Khanke K, Sarde M (2015) An energy efficient and QoS aware routing protocol for wireless
sensor network. Int J Adv Res Comput Commun Eng 4(7)
5. Lombardo L, Corbellini S, Parvis M, Elsayed A, Angelini E, Grassini S (2018) Wireless Sensor
Network for Distributed Environmental Monitoring. IEEE Trans Instrum Measure 67(5):1214–
1222
6. Boukerche A, Nelem Pazzi RW (2007) Lightweight mobile data gathering strategy for wireless sensor networks. In: 2007 9th IFIP international conference on mobile wireless communications networks, Cork, pp 151–155
7. de Araújo GM, Becker LB (2011) A network conditions aware geographical forwarding protocol for real-time applications in mobile wireless sensor networks. In: 2011 IEEE international conference on advanced information networking and applications, pp 38–45
8. Singh R, Rai BK, Bose SK (2016) A novel framework to enhance the performance of contention-
based synchronous MAC protocols. IEEE Sens J 16(16):6447–6457
9. Shruti BV, Nagendrappa TM, Venkatesh KR (2021) JSMCRP: cross-layer architecture based
joint-synchronous MAC and routing protocol for wireless sensor network. ECTI Trans Electr
Eng Electron Commun 19(1):94–113.
10. Singh R, Rai BK, Bose SK (2017) A contention-based routing enhanced MAC protocol for
transmission delay reduction in a multi-hop WSN. In: TENCON 2017—2017 IEEE Region 10
conference, Penang, pp 398–402
11. Singh R, Rai BK, Bose SK (2017) A joint routing and MAC protocol for transmission delay
reduction in many-to-one communication paradigm for wireless sensor networks. IEEE Internet
Things J 4(4):1031–1045
12. Arifuzzaman M, Dobre OA, Ahmed MH, Ngatched TMN (2016) Joint routing and MAC
layer QoS-aware protocol for wireless sensor networks. In: 2016 IEEE global communications
conference (GLOBECOM), Washington, DC, pp 1–6
13. Sefuba M, Walingo T (2018) Energy-efficient medium access control and routing protocol for
multihop wireless sensor networks. IET Wirel Sens Syst 8(3):99–108
14. Chen S, Yuan Z, Muntean GM (2017) Balancing energy and quality awareness: a MAC-layer duty cycle management solution for multimedia delivery over wireless mesh networks. IEEE Trans Vehicular Technol 66(2):1547–1560
15. Haqbeen JA, Ito T, Arifuzzaman M, Otsuka T (2017) Joint routing, MAC and physical layer
protocol for wireless sensor networks. In: TENCON 2017—2017 IEEE Region 10 conference,
Penang, pp 935–940
16. Liu Y, Chen Q, Liu H, Hu C, Yang Q (2016) A non-destructive Interference based receiver-
initiated MAC protocol for wireless sensor networks. In: 2016 13th IEEE annual consumer
communications & networking conference (CCNC), Las Vegas, NV, pp 1030–1035
17. Liu Y, Liu H, Yang Q, Wu S (2015) RM-MAC: a routing-enhanced multi-channel MAC protocol
in duty-cycle sensor networks. In: 2015 IEEE international conference on communications
(ICC), London, pp 3534–3539
18. Mohapatra S, Mohapatra RK (2017) Comparative analysis of energy efficient MAC protocol in
heterogeneous sensor network under dynamic scenario. In: 2017 2nd International conference
on man and machine interfacing (MAMI), Bhubaneswar, pp 1–5
19. Khalil MI, Hossain MA, Haque MJ, Hasan MN (2017) EERC-MAC: energy efficient receiver
centric MAC protocol for wireless sensor network. In: 2017 IEEE international conference on
imaging, vision & pattern recognition (icIVPR), Dhaka, pp 1–5
20. Senthil T, Bifrin Samuel Y (2014) Energy efficient hop extended MAC protocol for wireless
sensor networks. In: 2014 IEEE international conference on advanced communications, control
and computing technologies, Ramanathapuram, pp 901–907
21. Lahlou L, Meharouech A, Elias J, Mehaoua A (2015) MAC-network cross-layer energy opti-
mization model for wireless body area networks. In: 2015 International conference on protocol
engineering (ICPE) and international conference on new technologies of distributed systems
(NTDS), Paris, pp 1–5
22. Kalaivaani PT, Rajeswari A (2015) An analysis of H-MAC, HSMAC and H-MAC based
AOMDV for wireless sensor networks to achieve energy efficiency using spatial correlation
concept. In: 2015 2nd International conference on electronics and communication systems
(ICECS), Coimbatore, pp 796–801
23. Bouachir O, Ben Mnaouer A, Touati F, Crescini D (2017) EAMP-AIDC—energy-aware mac
protocol with adaptive individual duty cycle for EH-WSN. In: 2017 13th International wireless
communications and mobile computing conference (IWCMC), Valencia, pp 2021–2028
24. Reddy PC, Sarma NVSN (2016) An energy efficient routing and MAC protocol for bridge
monitoring. In: 2016 International conference on wireless communications, signal processing
and networking (WiSPNET), Chennai, pp 312–315
25. Thenmozhi M, Sivakumari S (2017) Adaptive energy efficient and rate adaptation based
medium access control routing protocol (AEERA—MACRP) for fully connected wireless
ad hoc networks. In: 2017 8th international conference on computing, communication and
networking technologies (ICCCNT), Delhi, pp 1–7
26. Cheng B, Ci L, Tian C, Li X, Yang M (2014) Contention window-based MAC protocol for wire-
less sensor networks. In: 2014 IEEE 12th international conference on dependable, autonomic
and secure computing, Dalian, pp 479–484
27. Zhuo S, Wang Z, Song YQ, Wang Z, Almeida L (2016) A traffic adaptive multi-channel MAC
protocol with dynamic slot allocation for WSNs. IEEE Trans Mob Comput 15(7):1600–1613
28. Rachamalla S, Kancharla AS (2015) Power-control delay-aware routing and MAC protocol for
wireless sensor networks. In: 2015 IEEE 12th international conference on networking, sensing
and control, Taipei, pp 527–532
29. Ananda Babu J, Siddaraju and Guru R (2016) An energy efficient routing protocol using RD-
MAC in WSNs. In: 2016 2nd International conference on applied and theoretical computing
and communication technology (iCATccT), Bangalore, pp 799–803
30. Wahid A, Ullah I, Khan OA, Ahmed AW, Shah MA (2017) A new cross layer MAC protocol for
data forwarding in underwater acoustic sensor networks. In: 2017 23rd international conference
on automation and computing (ICAC), Huddersfield, pp 1–5
31. Leao L, Felea V, Guyennet H (2016) MAC-aware routing in multi-sink WSN with dynamic
back-off time and buffer constraint. In: 2016 8th IFIP international conference on new
technologies, mobility and security (NTMS), Larnaca, pp 1–5
32. Seddar J, Khalifé H, Al Safwi W, Conan V (2015) A full duplex MAC protocol for wireless
networks. In: 2015 international wireless communications and mobile computing conference
(IWCMC), Dubrovnik, 2015, pp 244–249
33. Heimfarth T, Giacomin JC, de Araujo JP (2015) AGA-MAC: adaptive geographic anycast
MAC protocol for wireless sensor networks. In: 2015 IEEE 29th international conference on
advanced information networking and applications, Gwangiu, pp 373–381
34. Akhtar AM, Behnad A, Wang X (2015) Cooperative ARQ-based energy-efficient routing in
multihop wireless networks. IEEE Trans Veh Technol 64(11):5187–5197
35. Louail L, Felea V, Bernard J, Guyennet H (2015) MAC-aware routing in wireless sensor
networks. In: 2015 IEEE international black sea conference on comm. and networking
(BlackSeaCom), Constanta, pp 225–229
1 Introduction
For any military force, border surveillance is a critical aspect of regulating the line of control. Perimeter surveillance can be carried out by a modern radar [1, 2] for timely detection and tracking of targets. Target classification is vital in the later stages of signal processing and has traditionally been carried out manually by an operator listening to doppler audio. This type of classification is tedious and varies from operator to operator. Artificial intelligence (AI) gained momentum in the past decade due to the availability of larger datasets and faster hardware. The present work is a humble application of deep learning (a subdomain of AI) to automate the task of non-cooperative target classification based on radar backscattered signals.
There has been considerable emerging research on target classification based on deep learning. Deep learning typically searches the data for patterns for prediction or classification. A feed-forward network lacks memory and is not suitable for prediction where time dependency among data samples is present, since it considers only the present input sample for a prediction. For capturing time-domain relationships, recurrent neural networks (RNNs) come in handy, as they have memory, but they do not have long-term memory [3].
They also suffer from what is known as the vanishing/exploding gradient problem [4]. To overcome this, an improved version of the RNN, the long short-term memory (LSTM) network [5], is used. On the other hand, convolutional neural networks (CNNs) are extensively used for image classification because of their advantage of automatic feature extraction [6].
Most target classification work utilizing deep learning techniques has been implemented for various kinds of radars, namely high-resolution range profile (HRRP) radars, micro-doppler radar, passive radar, synthetic aperture radar (SAR), and forward scattering radar (FSR). In these works, radar data generally has a large number of features and undergoes various transformations after acquisition. For training any model, a large amount of data is the primary requirement, and this is typically obtained by simulating practical data [7–10]. With the advent of faster hardware, many classification algorithms are based on the RNN or its enhanced version, the LSTM, to extract time-dependency information. The LSTM has become an integral part of target classification on time-series data. The aim of the present work is to maintain the degree of classification accuracy as the input sample size to the LSTM network decreases.
In this work, the target classification task is performed by utilizing the radar backscattered signals, also called radar echoes, of a ground surveillance radar. For training the model, the time-series parameters under consideration are RCS, doppler frequency, and velocity, generated by the mono-static radar for four different target classes. For testing, the block diagram of the auto-classification steps is shown in Fig. 1. The echoes acquired by the radar antenna follow a chain of processing, typical of any radar, before reaching the expert system. The echo signals from the present radar consist of four parameters, namely range, power, doppler bin number, and velocity of the target. The expert system has various stages consisting of CNN, LSTM, and FC layers before predicting the target class. This time-sequence radar data, after pre-processing, is reshaped and sent to a CNN network for feature extraction, followed by classification by an LSTM network. The rest of the paper is organized as follows: Sect. 2 describes the proposed methodology, Sect. 3 describes the generation of the database, experimental results follow in Sect. 4, and the proposed work is concluded in Sect. 5.
2 CNN-LSTM Methodology
Convolutional neural networks (CNNs) are a type of neural network designed for handling image data. They operate directly on raw data instead of domain-specific or handcrafted features. The model learns to extract features automatically, which is called representation learning, and features are extracted regardless of how they occur, known as transform or distortion invariance. This ability of the CNN to learn and automatically extract features can also be applied to time-series problems.
The proposed model makes use of a CNN for feature extraction and an LSTM for prediction. Raw radar data is provided to the radar signal processor (RSP), which extracts the target signal from noise; its output is given to the radar data processor (RDP), which is used for track association, filtering, and computing velocity based on the change in target position. Based on the track IDs, reports are clubbed together in a set of 8 (the sample size) and provided to the input layer. Each report consists of three parameters, namely RCS, doppler, and velocity. The convolution layer is followed by max pooling to reduce the number of inferences and retain the most significant features. The structures are then flattened to a one-dimensional vector to be used as a single input time step to the LSTM layer. As the next report arrives, the first report is discarded in first-in-first-out (FIFO) fashion and the process is repeated.
Figure 2 shows the architecture used for classification. The CNN-LSTM architecture is similar to the LSTM architecture in Fig. 3. As stated earlier, the CNN acts as a feature extractor from the radar signals and forwards the features to the LSTM network. A minimal sketch of such a network is given below.
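Since the exact layer configuration of Fig. 2 is not reproduced here, the following Keras sketch is only an assumed instantiation of the described CNN-LSTM: a Conv1D/max-pooling front end over a window of 8 reports with 3 parameters each (RCS, doppler, velocity), followed by an LSTM and fully connected layers producing 4 class scores. Filter counts, units, and the optimizer are illustrative choices, not the paper's settings.

# Assumed CNN-LSTM sketch, not the paper's exact architecture.
from tensorflow.keras import layers, models

def build_cnn_lstm(sample_size=8, n_features=3, n_classes=4):
    model = models.Sequential([
        layers.Input(shape=(sample_size, n_features)),
        layers.Conv1D(32, kernel_size=3, padding="same", activation="relu"),
        layers.MaxPooling1D(pool_size=2),          # keep only the strongest responses
        layers.LSTM(64),                           # capture time dependency
        layers.Dense(32, activation="relu"),       # fully connected stage
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Usage (assumed arrays): model = build_cnn_lstm()
# model.fit(x_train, y_train, epochs=50, validation_data=(x_val, y_val))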
3 Data Modeling
The parameters under consideration for training our classifier are the RCS of the target, doppler frequency, and velocity.
Every target is characterized by a radar cross section (RCS), which accounts for the amount of reflected energy. The Institute of Electrical and Electronics Engineers (IEEE) dictionary of electrical and electronics terms [11] defines RCS as a measure of the reflective strength of a target. The mathematical relation for RCS is given by Eq. (1).
$\sigma = \lim_{R \to \infty} 4\pi R^2 \frac{\left|E^{scat}\right|^2}{\left|E^{inc}\right|^2}$  (1)
Fig. 2 CNN-LSTM
Fig. 3 LSTM
$p(\sigma) = \begin{cases} \dfrac{1}{\bar{\sigma}} \exp\!\left[-\dfrac{\sigma}{\bar{\sigma}}\right], & \sigma \ge 0 \\ 0, & \sigma < 0 \end{cases}$  (2)

$p(\sigma) = \begin{cases} \dfrac{4\sigma}{\bar{\sigma}^2} \exp\!\left[-\dfrac{2\sigma}{\bar{\sigma}}\right], & \sigma \ge 0 \\ 0, & \sigma < 0 \end{cases}$  (3)

Here $\bar{\sigma}$ denotes the average RCS.
In the present radar, pulse-doppler processing is used to find the doppler shift from the radial velocity component. The doppler frequency shift ($f_d$) is given by
Fig. 5 Histogram of velocity samples of a walking person fitting to a Gaussian pdf of mean 1 m/s
$f_d = \frac{2v}{c} f = \frac{2v}{\lambda}$  (4)

Here $c$ is the speed of the electromagnetic wave, $v$ the radial velocity of the target, and $f$ and $\lambda$ the operating frequency and wavelength, respectively. The maximum frequency shift is bounded between $-\frac{PRF}{2}$ and $+\frac{PRF}{2}$. The $f_d$ is assigned to a particular doppler bin number.
Based on practical data for the four targets, doppler data is generated synthetically according to the maximum achievable radial speed. For doppler samples over time, we have considered a Gaussian distribution with mean values calculated from Eq. (4) for various velocities in the range.
A single time sample consists of RCS, doppler bin number, and velocity. Input data to the CNN is generated by stacking radar echoes one after the other with respect to time to form a block. The numbers of samples used for making the 2D matrix here are 8, 16, and 32 for the three model configurations. Arranging data in such a way preserves the time dependency. This form of data is then fed to the CNN layers followed by the LSTM layers, finally producing the output. A sketch of such a block generator is given below.
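A rough sketch of how such a block could be assembled is shown below, assuming an exponential (Swerling-I-like) RCS draw and a Gaussian doppler spread around the mean shift of Eq. (4); every magnitude here is an illustrative assumption, not a parameter of the paper's dataset.

# Illustrative synthetic block generator (all magnitudes are assumptions).
import numpy as np

def doppler_shift(velocity_mps, wavelength_m):
    """Eq. (4): f_d = 2 v / lambda."""
    return 2.0 * velocity_mps / wavelength_m

def make_block(sample_size=8, mean_rcs=1.0, velocity_mps=1.0,
               wavelength_m=0.03, doppler_std=5.0, rng=None):
    rng = rng or np.random.default_rng()
    rcs = rng.exponential(mean_rcs, sample_size)              # Swerling-I-like RCS
    doppler = rng.normal(doppler_shift(velocity_mps, wavelength_m),
                         doppler_std, sample_size)            # Gaussian doppler spread
    velocity = np.full(sample_size, velocity_mps)
    return np.stack([rcs, doppler, velocity], axis=1)         # shape (sample_size, 3)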
4 Experiments
Table 2 Comparison of CNN-LSTM and LSTM architectures with different sample sizes

Sample size | Architecture | Training accuracy (%) | Test accuracy (%) | Training loss
8           | CNN-LSTM     | 96.332                | 93.269            | 0.119
8           | LSTM         | 90.044                | 86.343            | 0.224
16          | CNN-LSTM     | 96.428                | 96.25             | 0.095
16          | LSTM         | 93.573                | 91.629            | 0.169
32          | CNN-LSTM     | 96.436                | 97.501            | 0.107
32          | LSTM         | 96.559                | 96.744            | 0.118
Fig. 10 PPI screen displaying classifier results for a vehicle moving from 400 to 1900 m at an azimuth of approximately 50 degrees: a CNN-LSTM and b LSTM
5 Conclusion
The results show that the proposed model can be effectively employed in any radar for surveillance. For a sample size of 8 timestamps, the CNN-LSTM combination gave better accuracy than the LSTM. Only 5.75% of the overall targets were misclassified within sub-classification. In this study, we have developed an expert system for the classification of ground targets in radar, combining a CNN for feature extraction and an LSTM for classification. The proposed model showed better performance, with a test accuracy of 93.27% compared to 86.34% for the LSTM alone. Further research will focus on classification with an increased number of classes.
Acknowledgements The authors would like to thank General Manager, Military Radars, BEL for
their valuable help, encouragement and motivation during the implementation of the work described
in this paper.
References
1. Skolnik M (2001) Introduction to radar systems, 3rd edn. McGraw Hill, New York
2. Richards MA, Holm WA, Scheer JA (eds) (2010) Principles of modern radar, Vol. I: basic principles. SciTech Publishing
3. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
4. Nir Arbel, How LSTM networks solve the problem of vanishing gradients, Posted on December
21 (2018). https://medium.datadriveninvestor.com/how-do-lstm-networks-solve-the-problem-
of-vanishing-gradients-a6784971a577
5. Christopher Olah, Understanding LSTM Networks, Posted on August 27 (2015). http://colah.
github.io/posts/2015-08-Understanding-LSTMs
6. Albawi S, Mohammed TA, Al-Zawi S (2017) Understanding of a convolutional neural network.
Int Conf Eng Technol (ICET) 2017:1–6. https://doi.org/10.1109/ICEngTechnol.2017.8308186
7. Kouba E, Rogers S, Ruck D, Bauer K (1993) Recurrent neural networks for radar target identification. Proc SPIE Int Soc Opt Eng. https://doi.org/10.1117/12.152541
8. Jithesh V, Sagayaraj MJ, Srinivasa KG (2017) LSTM recurrent neural networks for high reso-
lution range profile-based radar target classification. In: 2017 3rd International conference on
computational intelligence & communication technology (CICT), pp 1–6. https://doi.org/10.
1109/CIACT.2017.7977298
9. Sehgal B, Shekhawat HS, Jana SK (2019) Automatic target recognition using recurrent neural
networks. Int Conf Range Technol (ICORT) 2019:1–5. https://doi.org/10.1109/ICORT46471.
2019.9069656
10. Wan J, Chen B, Xu B et al (2019) Convolutional neural networks for radar HRRP target
recognition and rejection. EURASIP J Adv Signal Process 2019:5
11. Jay F (ed) (1984) IEEE standard dictionary of electrical and electronic terms, ANSI/IEEE Std
100–1984, 3d edn. IEEE Press, New York
12. Swerling P (1960) Probability of detection for fluctuating targets. IRE Trans Inf Theory
I(6):269–308
Iron Oxide Nanoparticle Image Analysis
Using Machine Learning Algorithms
1 Introduction
Fig. 1 TEM images of Iron Oxide Nanoparticles synthesized at varying temperatures using
chemical (a at 300 °C, b at 300 °C, c at 500 °C, d at 500 °C) and biological (e at 300 °C) methods
The TEM images a, b, c, and d were obtained using the chemical method, and image e was obtained using the biological method; the Iron Oxide Nanoparticle images used in the study are presented in Fig. 1.
The synthesis and characterization of Iron Oxide Nanoparticles have been contributed to by many researchers in their respective studies. Cuenya [2] proposed that physical synthesis methods have less control over the size of NPs. Ling Li et al. [3] described the physical and chemical synthesis processes of nanoscale iron-based materials and their environmental applications. Toyokazu Yokoyama [4] explained the size-based effects on the thermal, electromagnetic, optical, and morphological properties of nanoparticles. Amyn et al. [5] described how continuous supercritical hydrothermal synthesis provides control and scalability for magnetic Iron Oxide Nanoparticles. Lazar et al. [6] explained particle shape analysis of silica-coated Iron Oxide Nanoparticle clusters using computational methods. Nene et al. [7] explained different applications of Iron Oxide Nanoparticles such as curing cancer, drug delivery, antifungal activity, antibiotic activity, imaging, and cellular labeling. The use of Iron Oxide Nanoparticles for drug-delivery control in 3-D MRI was proposed by Mohammed et al. [8]. Bannigidad et al. [9] have studied the characterization of nano-membrane engineering of anodic aluminum oxide templates. Ethan [10] has described the functionalization of Iron Oxide Nanoparticles for controlling the movement of immune cells. Bannigidad et al. [11] investigated the impact of time on Al2O3 nanopore structures using automated devices on FESEM images. Characterization of various nanoparticles is a vibrant process that can be useful in understanding the computational results of nanoparticles, and it is carried out using various image processing techniques. Ismail et al. [12] proposed diverse image segmentation techniques, namely thresholding, Hidden Markov Random Field Model-Expectation Maximization (HMRF-EM), and Gaussian Mixture Model-Expectation Maximization (GMM-EM), to segment Iron Oxide Nanoparticles. Vidyasagar et al. [13] studied the influence of anodizing time on the porosity of nanopore structures. Reem et al. [14] used preprocessing techniques, namely noise filtering and background subtraction, and applied the K-means clustering algorithm to segment Iron Oxide Nanoparticles from 3-dimensional MRI images. Bannigidad et al. [15] proposed several segmentation techniques, K-means, active contour, global thresholding, region growing, and watershed, to characterize nanoporous membranes. In this study, an effort is made
Table 1 The details of chemicals used in obtaining the TEM nanoparticles and the temperature
maintained during the synthesis process
Image Chemicals used in the synthesis of the nanoparticles Temperature (°C)
a Prepared by Iron Nitrate 300
b Prepared by FeCl2 , 2H2 O 300
c Prepared by Iron Nitrate 500
d Prepared by FeCl2 , 2H2 O 500
e Prepared by Iron Nitrate using plant extract 300
to characterize the Iron Oxide Nanoparticles synthesized by both chemical and biological methods.
The dataset in this study consists of various TEM images of Iron Oxide Nanoparticles synthesized and prepared using two different methods, i.e., chemical and biological. The details of the chemicals used in obtaining the Iron Oxide Nanoparticle TEM images and the temperature maintained during the synthesis process are given in Table 1.
3 Proposed Method
The objective of this study is to develop an algorithm to analyze the Fe2O3 nanoparticles in TEM images synthesized at varying temperatures using chemical (a at 300 °C, b at 300 °C, c at 500 °C, d at 500 °C) and biological (e at 300 °C) methods. The features of the nanoparticles extracted from the TEM images in this study include the number of nanoparticles, size (area), perimeter, major axis, minor axis, porosity, circularity, and interparticle distance. These features are defined as:
• Area: Total number of pixels in an extracted nanoparticle.
• Major Axis: Total number of pixels along the length of the largest axis in a
nanoparticle.
• Minor Axis: Total number of pixels along the length of the smallest axis in a
nanoparticle.
• Porosity: (Total particle size / Total area) × 100
• Circularity: Minor axis / Major axis
• Interparticle distance: the distance d between two particles whose coordinates are (x1, y1) and (x2, y2), given by
$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$
The algorithm used in the proposed work is given below:
Step 1. Input the Iron Oxide Nanoparticle TEM image.
Step 2. Apply image preprocessing techniques (gamma correction) to enhance the quality of the image.
Step 3. Apply the Gaussian Mixture Model-Expectation Maximization (GMM-EM) segmentation technique to binarize the given input image.
Step 4. Remove unwanted background and noise from the segmented image.
Step 5. Extract the individual particles and label them on the segmented image.
Step 6. Compute the geometric features, i.e., area (size), porosity, circularity, interparticle distance, and average area, and store them in the database.
Step 7. Compare and interpret the results against the manual results.
Categorize the nanoparticles based on the following condition:
    if the area is between 0 and 50 nm then extract, count, and display the nanoparticles
    else if the area is between 51 and 100 nm then extract, count, and display the nanoparticles
    else if the area is between 101 and 150 nm then extract, count, and display the nanoparticles
    else if the area is between 151 and 200 nm then extract, count, and display the nanoparticles
    end
Store all the nanoparticles in the database.
A rough sketch of this pipeline is given below.
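Although the authors' pipeline was implemented in MATLAB R2018a, the following Python sketch is only a rough stand-in for Steps 1–6, using scikit-learn's GaussianMixture for the GMM-EM segmentation and scikit-image region properties for the geometric features; the gamma value and the assumption that particles form the darker intensity class are illustrative, not taken from the paper.

# Rough Python stand-in for Steps 1-6 (not the authors' MATLAB code).
import numpy as np
from skimage import io, exposure, measure
from sklearn.mixture import GaussianMixture

def analyse_tem(path, gamma=0.8):
    img = io.imread(path, as_gray=True)
    img = exposure.adjust_gamma(img, gamma)                  # Step 2: enhancement
    gmm = GaussianMixture(n_components=2, random_state=0)    # Step 3: GMM-EM segmentation
    labels = gmm.fit_predict(img.reshape(-1, 1)).reshape(img.shape)
    particles = labels == np.argmin(gmm.means_.ravel())      # assumed: particles are darker
    labelled = measure.label(particles)                      # Step 5: label particles
    feats = []
    for r in measure.regionprops(labelled):                  # Step 6: geometric features
        feats.append({
            "area": r.area,
            "perimeter": r.perimeter,
            "major_axis": r.major_axis_length,
            "minor_axis": r.minor_axis_length,
            "circularity": (r.minor_axis_length / r.major_axis_length
                            if r.major_axis_length else 0.0),
            "centroid": r.centroid,
        })
    return feats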
For the purpose of experimentation, a total of 137 TEM images of Iron Oxide Nanoparticles obtained from two different synthesis methods, i.e., chemical and biological, are considered. Images a, b, c, and d are obtained from the chemical method, and image e is obtained from the biological method. The features of each image are extracted using MATLAB R2018a software on an Intel(R) Core™ i5-10210U CPU @ 1.60 GHz system. In computing the features of the TEM images of Iron Oxide Nanoparticles, the significant challenge is the overlapping of particles in each image. The experimentation is carried out by applying basic preprocessing operations such as resizing and noise removal on the original images (Fig. 2(a)). To extract individual particles from every image, the Gaussian Mixture Model-Expectation Maximization (GMM-EM) segmentation technique is used to segment the nanoparticles from each TEM image (Fig. 2(b)). Various segmentation techniques, namely Gaussian Mixture Model-Expectation Maximization (GMM-EM), K-means, and Fuzzy C-Means clustering, were tried; out of these segmentation methods, the Gaussian Mixture Model-Expectation Maximization (GMM-EM)
Fig. 2 a Original TEM images of iron oxide nanoparticles, b Segmented images using GMM-EM
technique c Labeled nanoparticles
Table 2 Details of computed geometric features: area (size-wise, in the ranges 0–50 nm, 51–100 nm, 101–150 nm, and 151–200 nm), average area (%), porosity (%), average circularity (%), and average interparticle distance (nm) for TEM images of Iron Oxide Nanoparticles a, b, c, d, and e

Synthesis method  | Image (°C) | 0–50 nm | 51–100 nm | 101–150 nm | 151–200 nm | Avg. area (%) | Porosity (%) | Avg. circularity (%) | Avg. interparticle distance (nm)
Chemical method   | a (300)    | 23      | 13        | 11         | 6          | 81.21         | 16.9784      | 57.26                | 66.1755
Chemical method   | b (300)    | 18      | 7         | 4          | 0          | 53.12         | 5.0766       | 51.36                | 54.6966
Chemical method   | c (500)    | 12      | 3         | 5          | 1          | 68.35         | 8.2366       | 56.08                | 54.0291
Chemical method   | d (500)    | 14      | 4         | 1          | 1          | 48.95         | 3.6730       | 62.70                | 52.1772
Biological method | e (300)    | 9       | 1         | 3          | 1          | 60.12         | 4.7216       | 54.39                | 59.2356
Table 3 The overall percentage of nanoparticles that are in the range of 0–50 nm, extracted from
both synthesis methods; chemical and biological
Synthesis method  | TEM image | Temperature (°C) | Area (size in range 0–50 nm) in percentage
Chemical method   | a         | 300              | 43.39
Chemical method   | b         | 300              | 62.06
Chemical method   | c         | 500              | 57.14
Chemical method   | d         | 500              | 70.00
Biological method | e         | 300              | 64.28
It is observed that the area-wise percentages (0–50 nm range) of images a, b, c, and d are 43.39%, 62.06%, 57.14%, and 70.00%, respectively, for the chemical method, while the area-wise percentage of image e, synthesized using the biological method, is 64.28%. Of the images synthesized at 500 °C, i.e., images c and d, image d has the highest area-wise percentage of 70.00%. Of the images synthesized at 300 °C, i.e., images a, b, and e, image e has the highest area-wise percentage of 64.28%. Since the biological synthesis method is universally accepted, as it uses no toxic chemicals, is cost-effective, and is environmentally friendly, it is proposed that the biological method is suitable for synthesizing the Iron Oxide Nanoparticles.
The computed features of image a are interpreted and analyzed in comparison with the manual results obtained from chemical experts; based on these values, the other results of images b, c, d, and e are interpreted.
5 Conclusion
In this study, the TEM images (a, b, c, d, and e) of Iron Oxide Nanoparticles synthesized using chemical and biological methods at 300 °C and 500 °C are analyzed. The motivation behind the present work is that traditional characterization techniques are time-consuming and not economically cost-effective. Hence, an effort is made to automate a tool that can determine the size of nanoparticles in TEM images of Iron Oxide Nanoparticles. The Gaussian Mixture Model-Expectation Maximization (GMM-EM) segmentation technique is applied, and diverse features, namely size (area), perimeter, major axis, minor axis, porosity, circularity, interparticle distance, and average area, are computed from the TEM images. It is observed that the area-wise percentages of images a, b, c, and d, which are synthesized using the chemical method, are 43.39%, 62.06%, 57.14%, and 70.00%, respectively, while the area-wise percentage of image e, obtained using the biological method, is 64.28%. The biological synthesis method is universally accepted, as it uses no toxic chemicals, is cost-effective, and is environmentally friendly. The proposed results are analyzed and compared with manual results obtained from chemical experts and are found to show good performance.
Acknowledgements The authors are grateful to the ‘KSTEPS, DST, GOVT. OF KARNATAKA’
for providing financial assistance and sanctioning Ph.D. fellowship to carry out this research work.
Authors are also grateful to the reviewers for their valuable suggestions that helped in improving
this manuscript.
References
1. Ali A, Zafar H, Zia M, ul Haq I, Phull AR, Ali JS, Hussain A (2016) Synthesis, characterization,
applications, and challenges of iron oxide nanoparticles. Nanotechnol Sci Appl 9:49–67
2. Cuenya BR (2010) Synthesis and catalytic properties of metal nanoparticles: size, shape,
support, composition, and oxidation state effects. Thin Solid Films 518(12):3127–3150
3. Li L, Fan M, Brown RC, Van Leeuwen J, Wang J, Wang W, Song Y, Zhang P (2006) Synthesis,
properties, and environmental, applications of nanoscale iron-based materials: a review. Crit
Rev Environ Sci Technol 36:405–431
4. Hosokawa M, Nogi K, Naito M, Yokoyama T (2008) Basic properties and measuring methods
of nanoparticles. Nanoparticle technology handbook, pp 3–48
5. Teja AS, Koh P-Y (2009) Synthesis Growth, properties, and applications of magnetic iron
oxide nanoparticles. Prog Cryst Charact Mater 55:2245
6. Kopanja L, Kralj S, Zunic D, Loncar B, Tadic M (2016) Core–shell superparamagnetic
iron oxide nanoparticle (SPION) clusters: TEM micrograph analysis. Part Des Shape Anal
42(9):10976–10984
7. Ajinkya N, Yu X, Kaithal P, Luo H, Somani P, Ramakrishna S (2020) Magnetic iron oxide
nanoparticle (IONP) synthesis to applications: present and future. Materials 13:4644
8. Almijalli M, Saad A, Alhussaini K, Aleid A, Alwasel A (2021) Towards drug delivery
control using iron oxide nanoparticles in three-dimensional magnetic resonance imaging.
Nanomaterials 11:1876–1888
9. Bannigidad P, Udoshi J, Vidyasagar CC (2020) Automated characterization of aluminum oxide
nanopore fesem images using machine learning algorithms. Int J Adv Sci Technol 29(03):6932–
6942
10. White EE, Pai A, Weng Y, Suresh AK, Van Haute D, Pailevanian T, Alizadeh D, Hajimiri
A, Badie B, Berlin JM (2015) Functionalized iron oxide nanoparticles for controlling the
movement of immune cells. Nanoscale 7(17):7780–7789
11. Bannigidad P, Udoshi J, Vidyasagar CC (2018) Effect of time on Aluminium FESEM nanopore
images using fuzzy inference system. Recent Trends Image Process Pattern Recogn 1037:397–
405
12. Ismail HJ, Barzinjy AA, Hamad SM (2019) Analysis of nanopore structure images using
MATLAB software. Eurasian J Sci Eng 4(3):84–93
13. Vidyasagar CC, Bannigidad P, Muralidhara HB (2016) Influence of anodizing time on porosity
of nanopore structures grown on flexible TLC aluminium films and analysis of images using
MATLAB software. VBRI, Adv Mater Lett 1:71–77
14. Alanazi RS, Saad AS (2020) Extraction of iron oxide nanoparticles from 3-dimensional MRI images using K-means algorithm. J Nanoelectron Optoelectron 15:1–7
15. Bannigidad P, Udoshi J, Vidyasagar CC (2019) Characterization of Aluminium oxide
nanoporous images using different segmentation techniques. Int J Innov Technol Exploring
Eng (IJITEE) 8(12):2491–2497
Bankruptcy Prediction Using Bi-Level
Classification Technique
1 Introduction
Bankruptcy is a financial state in which a firm or person is unable to pay their
debts. Financial investors, banks, governments, and money lenders seek an efficient
method to determine the bankruptcy status of a firm, and bankruptcy prediction
helps all the stakeholders of the company. For this reason, intensive research
regarding the prediction of bankruptcy has been carried out. For example, Altman [1]
uses multivariate discriminant analysis to obtain a z-score which is used to classify
bankrupt and non-bankrupt companies. Carton [2] gave measures to determine the
performance of the organization. Parameters are divided into different categories
such as profitability measures, growth-based measures, and market-based measures.
Bankruptcy prediction can be treated as a classification problem. Machine Learning
(ML) can be used to classify bankrupt and non-bankrupt companies. Altman, Kimura,
and Barboza [3] presented ML models to predict bankruptcy based on Carton’s [2]
and Altman z-score [1] parameters. The main aim of this work is to improve the
prediction performance of bankruptcy using ML algorithms. The features that have
been used to train the ML algorithms are the financial ratios that are available for
public access. The key observation is that adding indicators of the organization's
performance has resulted in improved performance for
bankruptcy prediction [3]. The indicator that is used as a feature in this work is
Tobin’s Q. The key contributions of this work are listed as follows:
1. A new feature set including both organizational indicators and market-based
measures is used to predict bankruptcy.
2. A heterogeneous bi-level classification technique is introduced to perform
bankruptcy prediction.
3. An improvement in the results has been observed for the proposed work.
The remainder of the paper is organized as follows. Section 2 delves into previous
research. Section 3 explains the methodology. Section 4 digs into the details of results
and analysis. Section 5 brings the project to a conclusion.
2 Previous Work
Financial investors, banks, governments, and money lenders seek to know the
bankruptcy status of companies. For this reason, extensive research has been carried
out on bankruptcy prediction.
In 1968, Edward Altman [1] introduced financial ratios and a discriminant analysis
method to predict bankruptcy, with the goal of evaluating the analytical quality of
ratio analysis. The final discriminant function uses five financial ratios, shown in Eq. 1.
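For reference, Altman's [1] discriminant function (Eq. 1) is commonly written in the literature as

Z = 1.2X_1 + 1.4X_2 + 3.3X_3 + 0.6X_4 + 1.0X_5,

where X_1 = working capital / total assets (liquidity), X_2 = retained earnings / total assets (profitability), X_3 = EBIT / total assets (productivity), X_4 = market value of equity / book value of total liabilities (leverage), and X_5 = sales / total assets (asset turnover).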
Altman [1] used these five financial ratios to obtain a z-score which is used to predict
bankruptcy. These ratios measure liquidity, profitability, productivity, leverage, and
asset turnover. The ML models in [3] used 11 financial ratios as features to predict
bankruptcy. Out of these 11 financial ratios, five are Altman z-score parameters, and
the other six parameters are from Carton's study [2], such as return on equity, operational
margin, market to book ratio, growth in sales, growth in employees, and growth in
assets. In this work, use of Tobin’s Q as a market-based measure [2] along with
Altman [1] and Carton’s [2] parameters is proposed.
Tobin’s Q: The Q ratio or Tobin’s Q is the market value of the company divided
by the replacement cost of its assets [7]. It represents the relationship between the
market value of the company and intrinsic values. It can be used to determine whether
a firm is overvalued or undervalued. The formula of Tobin’s Q is stated below:
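A standard approximation from the literature, stated here with the abbreviations defined later in this section (the exact form used by the authors is part of the feature definitions in Table 1), is

Q = (MVS + TD) / TA,

where MVS is the market value of shares, TD is total debt, and TA is total assets, with the book value of total assets serving as a proxy for the replacement cost of the assets.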
3 Proposed Methodology
Any ML model is driven by its data. The data set used for this work is collected from
Kaggle.1 It has around 92 k records, of which 558 are records of bankrupt firms.
These are US-based firms whose information is publicly available.
The data set covers the years 1971 to 2017. It has 13 columns: one is the class
label and the rest are features for the model. The features and their formulas are listed
in Table 1. After missing-value imputation and data balancing, 80% of the data is
used to train the model, while the remaining 20% is used to test it.
Here, TA = total assets, EBIT = earnings before interest and tax, TD = total
debt, MVS = market value of shares, and NI = net income. The features and formulas
are taken from the studies [1] and [2].
1 https://www.kaggle.com/shuvamjoy34/us-bankruptcy-prediction-data-set-19712017
Balancing the data set is an important step when making predictions using ML [8].
The event addressed in this study occurs rarely: in the real world, far more companies
are non-bankrupt than bankrupt. This data set has around 92 k records, and out of
these only 558 are records of bankrupt companies; the remaining records are of
non-bankrupt companies. Thus, about 99% of the records belong to non-bankrupt
companies and only about 1% to bankrupt companies. In this study, the Near-Miss
downsampling technique is used to balance the data set. It selects examples from the
majority class based on their Euclidean distance to minority class examples and stops
when the number of majority class samples equals the number of minority class
samples. After applying the Near-Miss algorithm, both classes (bankrupt and
non-bankrupt) have 558 records each.
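A minimal sketch of this balancing step is shown below, assuming the imbalanced-learn (imblearn) implementation of Near-Miss and an assumed CSV file name for the Kaggle data set; the paper does not name its tooling, so these choices are illustrative only.

```python
# Near-Miss undersampling sketch, under the assumptions stated above.
import pandas as pd
from imblearn.under_sampling import NearMiss
from sklearn.model_selection import train_test_split

df = pd.read_csv("us_bankruptcy_1971_2017.csv")   # assumed file name
X = df.drop(columns=["bankrupt"])                 # financial-ratio features (Table 1)
y = df["bankrupt"]                                # 1 = bankrupt, 0 = non-bankrupt

# Near-Miss keeps the majority-class samples closest (Euclidean distance) to the
# minority class and stops when both classes have the same count (558 each here).
X_bal, y_bal = NearMiss(version=1).fit_resample(X, y)

# 80/20 train-test split after balancing, as described in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X_bal, y_bal, test_size=0.2, stratify=y_bal, random_state=42)
```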
Figure 1 illustrates the architecture of the bi-level classification used in this work.
Here, the meta-model uses the predictions made by the base models as features, along
with the training data, to give the final prediction. The training data samples used as
input to the meta-model are different from the training data samples used as inputs
to the base models. First, the training data are given as input to all the base models.
Each base model uses the input data as features and gives its prediction. These
predictions are stored as level-one predictions and are given as input to the
meta-model. The final prediction is produced by the meta-model using these level-one
predictions together with the training data.
In this bi-level classification technique, two single-level classifiers (a decision tree
and a linear support vector machine) and four ensemble techniques (AdaBoost, gradient
boost, bagging with random forest estimators, and bagging with AdaBoost estimators)
are used as heterogeneous base classification models. The decision tree uses the
Gini index and has a maximum depth of 5. The SVM uses a linear kernel. AdaBoost
and gradient boost use 50 estimators each, and gradient boost uses the logistic
(log-loss) function as its loss. Bagging with random forest and bagging with AdaBoost
use 10 estimators each. The predictions of these heterogeneous classification models
are stored as level-one predictions and are given as input to the
meta-algorithm.
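An illustrative scikit-learn version of this bi-level (stacking) set-up is sketched below, continuing from the balancing sketch above (X_train, y_train, X_test, y_test). The logistic-regression meta-model is an assumption, since the paper does not name its meta-algorithm.

```python
# Bi-level (stacking) classification sketch with the base models named above.
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

base_models = [
    ("dt", DecisionTreeClassifier(criterion="gini", max_depth=5)),
    ("svm", SVC(kernel="linear")),
    ("ada", AdaBoostClassifier(n_estimators=50)),
    ("gb", GradientBoostingClassifier(n_estimators=50)),  # logistic (log-loss) objective
    ("bag_rf", BaggingClassifier(RandomForestClassifier(), n_estimators=10)),
    ("bag_ada", BaggingClassifier(AdaBoostClassifier(), n_estimators=10)),
]

# Level-one predictions of the base models are combined with the original
# features (passthrough=True) and fed to the meta-model for the final decision.
bi_level = StackingClassifier(estimators=base_models,
                              final_estimator=LogisticRegression(max_iter=1000),
                              passthrough=True, cv=5)
bi_level.fit(X_train, y_train)
print("Test accuracy:", bi_level.score(X_test, y_test))
```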
As the data set used by Altman is not available, four independent models with
different feature sets have been built to compare the performance of the current model
with the studies of Altman, Kimura, and Barboza [3]. These results illustrate the
importance of including Tobin's Q as a parameter. The four feature sets used are listed below.
Type 1: Altman z-score [1] (5 features): liquidity, profitability, productivity,
leverage, asset turnover.
Type 2: Altman z-score [1] and Carton [2] (11 features): liquidity, profitability,
productivity, leverage, asset turnover, return on equity, operational margin, market
to book ratio, growth in sales, growth in assets, growth in employees.
Type 3: Altman z-score [1], Carton [2], and Tobin's Q (12 features): liquidity,
profitability, productivity, leverage, asset turnover, return on equity, operational
margin, market to book ratio, growth in sales, growth in assets, growth in employees,
Tobin's Q.
Type 4: Altman z-score [1], Carton [2] (except market to book ratio), and Tobin's
Q (11 features): liquidity, profitability, productivity, leverage, asset turnover,
return on equity, operational margin, growth in sales, growth in assets, growth in
employees, Tobin's Q.
Altman [1] uses the Type 1 feature set. Barboza, Kimura, and Altman [3] use
the Type 2 feature set. In this study, the best results are achieved with the Type 4 feature
set. To compare the performance of the developed model with that of prior studies,
the Random Forest algorithm is evaluated with the available data set and Type 2
features. The Bi-level classification algorithm is tested with the same data set and
Type 4 features.
4.2 Results
Table 2 Comparison of missing value imputation (F1-score)

Algorithm                     Mean    Median    KNN imputation
Decision tree                 0.91    0.91      0.92
Random forest                 0.94    0.95      0.96
AdaBoost                      0.94    0.94      0.96
Linear SVM                    0.89    0.92      0.94
Bagging with random forest    0.94    0.95      0.96
Bagging with AdaBoost         0.94    0.94      0.96
Table 5 Performance of bi-level classification

Feature set    Accuracy    F1-score (scaled to 100)
Type 1         91.1        91
Type 2         96          96
Type 3         96.4        97
Type 4         97.8        98
5 Conclusion
References
1. Altman EI (1968) Financial ratios, discriminant analysis and the prediction of corporate
bankruptcy. J Finance 23(4):589–609
2. Carton RB (2004) Measuring organizational performance: an exploratory study (Doctoral
dissertation, University of Georgia)
3. Barboza F, Kimura H, Altman E (2017) Machine learning models and bankruptcy prediction.
Expert Syst Appl 83:405–417
4. Fen Y, P’ng Y (2019) Tobin’s Q and its determinants: a study on Huawei technologies Co., Ltd
5. Tarliman D (2019) The corporate scandal and the probability of bankruptcy: a case study of
Mylan NV. Available at SSRN 3385217
6. Wolfe J, Sauaia ACA (2003) The Tobin Q as a company performance indicator. In: Developments
in business simulation and experiential learning: proceedings of the annual ABSEL conference
30
7. Fu L, Singhal R, Parkash M (2016) Tobin’s Q ratio and firm performance. Int Res J Appl Finance
7(4):1–10
8. Veganzones D, Séverin E (2018) An investigation of bankruptcy prediction in imbalanced
datasets. Decis Support Syst 112:111–124
Infant Brain MRI Segmentation Using
Deep Volumetric U-Net with Gamma
Transformation
1 Introduction
Studies show that most brain abnormalities form during the first year of brain
development. The Baby Connectome Project was started to identify the factors that
contribute to healthy brain development; the University of North Carolina heads
this project [1]. The Medical Image Computing and Computer-Assisted Intervention
Society (MICCAI) creates awareness in the research community about current
problems in the medical field by hosting conferences and organizing competitions
with challenging problem statements, such as iSeg, named after infant segmentation [2].
This competition deals with the segmentation of 6-month-old infants' brains into
three parts: white matter, gray matter, and cerebrospinal fluid. The University of
North Carolina backs this competition as part of the Baby Connectome Project.
Research on brain abnormalities has led to the understanding that most brain
functions develop in the first year of growth, so researchers are studying brain
growth during 6–8 months, called the isointense phase.
There are three phases in the brain development of a 1-year-old: 0–6 months is
the infantile phase, 6–8 months is the isointense phase, and above eight months is
called the adult phase [3]. In the infantile phase, the brain is not sufficiently developed,
so segmentation is not feasible; in the adult phase, brain segmentation is possible and
not very difficult. However, the major challenge is to segment the brain into the three
defined parts in the isointense phase, as most of the white matter and gray matter
overlap and there are not many magnetic resonance imaging (MRI) scans available
for models to learn from. The iSeg dataset is the largest available dataset, with MRI
images of ten patients for training and another ten for testing. For each subject there
are two MRI scans, T1 and T2 images [2].
Image segmentation techniques have greatly helped in analyzing images. Even
before deep learning, many methods were used to perform this task [4–6], but deep
learning models have produced state-of-the-art performance [2]. Image segmentation
means categorizing each pixel value in the image into a set of classes. It has numerous
applications in domains such as medicine, self-driving cars, and satellite imaging,
and several algorithms have been developed to make segmentation as accurate
as possible.
Segmentation of an image using deep learning uses neural networks. U-Net is
perhaps the most famous architecture in the field of image segmentation. U-Net uses
an encoder-decoder (autoencoder-like) architecture with contracting and expanding
paths: downsampling decreases the spatial resolution of the image and upsampling
increases it. Pooling layers (e.g., max pooling) perform downsampling, and either
upsampling layers or transposed convolution layers perform upsampling [7]. The
main aim of this work is to develop a segmentation algorithm for infant brain MRI
scans. Segmenting the brain of children less than one year old is a challenging task.
Therefore, in this work, a deep learning architecture is introduced to perform brain
segmentation on MRI images (a sketch of such an architecture follows the list
below). The key contributions of this work are listed below:
1. A three-dimensional U-Net architecture is proposed for image segmentation in
brain MRI scan.
2. The proposed method is compared with state-of-the-art image segmentation
techniques to analyze the performance.
3. The proposed method resulted in a good performance as compared to other
state-of-the-art image segmentation methods.
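As referenced above, a minimal sketch of a volumetric U-Net with contracting and expanding paths is given below. The framework (Keras), patch size, and depth are illustrative assumptions; only the starting width of 16 filters and the four output classes (background, CSF, GM, WM) follow from the descriptions in this paper.

```python
# Minimal 3D U-Net sketch: encoder (conv + max pooling), decoder (transposed
# conv + skip connections), softmax output over the four tissue classes.
import tensorflow as tf
from tensorflow.keras import layers, models

def conv_block(x, filters):
    x = layers.Conv3D(filters, 3, padding="same", activation="relu")(x)
    return layers.Conv3D(filters, 3, padding="same", activation="relu")(x)

def unet3d(input_shape=(64, 64, 64, 1), base_filters=16, n_classes=4):
    inputs = layers.Input(input_shape)
    # Contracting path: convolutions + max pooling reduce spatial resolution.
    c1 = conv_block(inputs, base_filters)
    p1 = layers.MaxPooling3D(2)(c1)
    c2 = conv_block(p1, base_filters * 2)
    p2 = layers.MaxPooling3D(2)(c2)
    c3 = conv_block(p2, base_filters * 4)
    # Expanding path: transposed convolutions restore resolution; skip
    # connections concatenate the corresponding encoder features.
    u2 = layers.Conv3DTranspose(base_filters * 2, 2, strides=2, padding="same")(c3)
    c4 = conv_block(layers.concatenate([u2, c2]), base_filters * 2)
    u1 = layers.Conv3DTranspose(base_filters, 2, strides=2, padding="same")(c4)
    c5 = conv_block(layers.concatenate([u1, c1]), base_filters)
    outputs = layers.Conv3D(n_classes, 1, activation="softmax")(c5)
    return models.Model(inputs, outputs)

model = unet3d()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```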
The rest of the paper is organized as follows: Existing brain image segmentation
techniques are discussed in Sect. 2. The details of the dataset used to develop the
segmentation technique are given in Sect. 3. The detailed description of the proposed
method is presented in Sect. 4. The results obtained for the proposed method, and
the future possibilities are given in Sect. 5. Finally, the summarization of the work
is presented in Sect. 6.
2 Literature Review
Kendall et al. [1] have shown that 3D segmentation of a volumetric image is better
than using 2D segmentation. The work also studied ensemble methods working on
the Iseg-2017 dataset and proposed a dense network for brain MRI segmentation.
Gao et al. [8] proposed a fully convolutional neural network to address the segmentation
of the infant brain. Instead of using explicit images, the work used coarse
and dense feature maps to learn some of the regions of an image and used a
transformation module to combine all the layers. Wu et al. [4] dealt with MRI images
of 1-, 2-, and 3-year-olds. They gathered about 95 MRI images and used registration
and atlas methods to learn the patterns; the work also added brain contrast probability
maps to give the model more information. Weinberger et al. [5], after observing that
convolutional neural networks perform well when connections between layers are
short, developed a new model called DenseNet in which every layer receives inputs
from all preceding layers. Jiang et al. [6] proposed a segmentation correction
algorithm: because of the low contrast between gray matter and white matter in both
T1 and T2 images, misclassifications are common, so the work proposed a method
that can correct some of those errors.
Lei et al. [9] proposed a 3D convolutional neural network that uses pyramid dilation
while downsampling and a special type of attention network while upsampling. This
model achieved an excellent Dice value of 0.90 for white matter and ranked first in
the iSeg-2019 challenge. Lienkamp et al. [10] proposed a three-dimensional U-Net
that replaces all 2D components of the original U-Net with 3D counterparts. Moreover,
the work was specially designed to learn from sparsely annotated images: by
annotating only some parts of an image, the model should still be able to learn,
which the authors found useful in many applications.
Atlas generally means a predefined map that is used for reference. Similarly, in
image segmentation the atlas method uses reference atlases generated from previous
patients' data; there can be one atlas or multiple atlases. Parametric models, in
contrast, do not have the freedom to learn arbitrary functions: they work under prior
restrictions and assumptions, and the model learns within those constraints.
Based on the review of existing systems, it is observed that 3D deep learning
models perform better than 2D deep learning models in segmenting the infant brain.
The results show improved Dice scores when the MRI images are trained on 3D deep
learning image segmentation methods. Therefore, in this paper, a deep learning
algorithm is proposed for the segmentation of infant brain MRI images, as deep
learning has proved more effective for segmentation than atlas-based and
parametric methods due to the freedom it gives the model to learn very complex
functions. It is also seen that for volumetric images 3D segmentation works better
than 2D convolutions [11]. A U-Net-based architecture is used in this work as it is
specifically designed for medical image segmentation.
3 Dataset
The dataset of this work is not public, as this problem statement is part of the MICCAI
challenge.1 However, the details of the dataset are as follows. There are ten patients'
MRI images of 144 × 192 × 256 voxels. For every patient, both T1 and T2 MRI
scans are given. The validation set also contains ten patients' data with the same
configuration as the training images. T1-weighted MR images were acquired with 144
sagittal slices: TR/TE = 1900/4.38 ms, flip angle = 7°, resolution = 1 × 1 × 1 mm3;
T2-weighted MR images were obtained with 64 axial slices: TR/TE = 7380/119 ms,
flip angle = 150°, resolution = 1.25 × 1.25 × 1.95 mm3. Along with the training data,
the segmentation output of each training sample is provided with the following labels:
0: background (everything outside the brain), 1: cerebrospinal fluid (CSF), 2: gray
matter (GM), and 3: white matter (WM).
Access to the dataset is possible only by participating in the competition;
information about the dataset is available on the competition page.
4 Proposed Methodology
The dataset has images in hdr format. Those images are converted from hdr to nii
format.
1 https://iseg2019.web.unc.edu/.
Fig. 2 MRI images a original image, b gamma transformed image, and c piecewise transformed
image
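A minimal sketch of the two preprocessing steps referred to above (the hdr-to-nii conversion and the gamma transformation shown in Fig. 2b) is given below. The nibabel library, the file names, and the gamma value are illustrative assumptions, as the paper does not report them.

```python
# Preprocessing sketch under the assumptions stated above.
import nibabel as nib
import numpy as np

# 1. Format conversion: load an Analyze (.hdr/.img) volume and save it as NIfTI.
vol = nib.load("subject-1-T1.hdr")                      # assumed file name
nib.save(nib.Nifti1Image(vol.get_fdata(), vol.affine), "subject-1-T1.nii")

# 2. Gamma (power-law) transformation: normalise intensities to [0, 1] and
#    apply s = r ** gamma to adjust image contrast.
img = vol.get_fdata().astype(np.float32)
img = (img - img.min()) / (img.max() - img.min() + 1e-8)
gamma = 0.8                                             # assumed, tuned experimentally
img_gamma = np.power(img, gamma)
```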
This section mainly focuses on the discussion of results for the various experiments
carried out. The U-Net architecture, along with gamma transformation, resulted in
Dice scores of 85.64 for white matter, 88.24 for gray matter, and 93.75 for
cerebrospinal fluid. The model is trained for 2000 epochs with the Adam optimizer.
The performance of the proposed model on the validation set is shown in Table 1;
all values are Dice values, with the first column giving the Dice values of CSF
segmentation, followed by GM and WM segmentation. Table 2 shows the experiments
based on the U-Net architecture. The architecture column describes the modifications
made to the U-Net to obtain the Dice value in the second column, in the format:
number of epochs + images used for training + batch size. Some architectures used
piecewise and gamma transformations. In this column, 'high' represents the
increased-capacity U-Net which starts from 32 filters, while the base U-Net starts from
16 filters.
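For reference, the Dice similarity coefficient [13] reported in these tables, for a predicted segmentation P and ground truth G, is

Dice(P, G) = 2|P ∩ G| / (|P| + |G|),

which equals 1 for a perfect overlap and 0 when the two regions do not overlap at all.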
The architecture column in Tables 3, 4, and 5 follows the same format as in
Table 2. In Table 3, experiments are performed with augmentation techniques such
as zooming and rotation. Table 4 lists the experiments with attention U-Net, and
Table 5 lists the experiments with attention U-Net combined with augmentation.
The patchwise results in Table 6 are based on a 16 × 16 × 16 patch size; each of
these models is trained for 150 epochs. Segmentation images of white matter,
gray matter, and cerebrospinal fluid are shown in Fig. 3.
Table 2 U-Net-based experiments without augmentation

Architecture                              Dice (in %)
1000 + (T1,T2) + 1                        78.84
2500 + (T1,T2) + 2                        80.03
2500 + T1 + 2                             81.90
4000 + T1 + 2                             82.39
1250 + T1 + 1 + piecewise                 83.61
900 + T1 + 1 + gamma                      83.37
1250 + T1 + 1 + gamma                     84.38
2000 + T1 + 1 + piece + highcap           85.10
150 + T1 + 1 + piecewise + patch(64)      76.99
150 + T1 + 1 + gamma + patch(64)          78.57
150 + (T1,T2) + 1 + gamma                 80.2
150 + T1 + 1 + overlap-patch(64)          78
2000 + T1 + 1 + gamma                     85.32
1000 + T1 + 10 training images            99.28
Table 3 U-Net-based experiments with augmentation

Architecture        Dice (in %)
1000 + (T1,T2)      77.60
1500 + T1           78.59
1000 + high         81.38
A comparison of the proposed model with other models that participated in the
iSeg-2019 challenge is shown in Table 7; the models are referred to by their team
names, as the teams did not give specific names based on their approaches. All values
are Dice values, and all the models are taken from [2]; they are part of the iSeg
competition and trained on the same dataset. As Table 7 shows, although the
proposed model is not the best model for this segmentation task, it performs better
than several models. This paper also includes various experimental results so that the
limitations of the approaches presented here can be understood directly, without
further experimentation.
Fig. 3 White matter segmentation, gray matter segmentation, and cerebrospinal fluid segmentation, where a actual output, b predicted output
6 Conclusion
References
1. Melbourne A, Cardoso MJ, Kendall GS, Robertson NJ, Marlow N, Ourselin S (2012)
NeoBrainS12 challenge: adaptive neonatal MRI brain segmentation with myelinated white
matter class and automated extraction of ventricles I–IV. In: Proceedings of the MICCAI grand
challenge: neonatal brain segmentation, pp 16–21
2. Sun Y, Gao K, Wu Z, Li G, Zong X, Lei Z, Wang L (2021) Multi-site infant brain segmentation
algorithms: the iSeg-2019 challenge. IEEE Trans Med Imaging 40(5):1363–1376
3. Wang L, Nie D, Li G, Puybareau É, Dolz J, Zhang Q et al (2019) Benchmark on automatic
six-month-old infant brain segmentation algorithms: the iSeg-2017 challenge. IEEE Trans Med
Imaging 38(9):2219–2230
4. Shi F, Yap PT, Wu G, Jia H, Gilmore JH, Lin W, Shen D (2011) Infant brain atlases from
neonates to 1- and 2-year-olds. PLoS ONE 6(4):e18746
5. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional
networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition,
pp 4700–4708
6. Xue H, Srinivasan L, Jiang S, Rutherford M, Edwards AD, Rueckert D, Hajnal JV (2007)
Automatic segmentation and reconstruction of the cortex from neonatal MRI. Neuroimage
38(3):461–477
7. Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image
segmentation. In: International conference on medical image computing and computer-assisted
intervention. Springer, Cham, pp 234–241
8. Nie D, Wang L, Gao Y, Shen D (2016) Fully convolutional networks for multi-modality isoin-
tense infant brain image segmentation. In: 13th international symposium on biomedical imaging
(ISBI). IEEE, pp 1342–1345
9. Lei Z, Qi L, Wei Y, Zhou Y (2019) Infant brain MRI segmentation with dilated convolution
pyramid downsampling and self-attention. arXiv preprint arXiv:1912.12570
10. Çiçek Ö, Abdulkadir A, Lienkamp SS, Brox T, Ronneberger O (2016) 3D U-net: learning dense
volumetric segmentation from sparse annotation. In: International conference on medical image
computing and computer-assisted intervention. Springer, Cham, pp 424–432
11. Kamnitsas K, Ledig C, Newcombe VF, Simpson JP, Kane AD, Menon DK, Glocker B (2017)
Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation.
Med Image Anal 36:61–78
12. Xu J, Li Z, Zhang M, Liu J (2020) Reluplex made more practical: leaky ReLU. In: 2020 IEEE
symposium on computers and communications (ISCC), pp 1–7
13. Sörensen T (1948) A method of establishing groups of equal amplitude in plant sociology
based on similarity of species content and its application to analyses of the vegetation on
Danish commons
14. Hamann B, Chen JL (1994) Data point selection for piecewise linear curve approximation.
Comput Aided Geom Des 11(3):289–301
Analysis of Deep Learning
Architecture-Based Classifier
for the Cervical Cancer Classification
1 Introduction
The fourth most ubiquitous cancer affecting women worldwide is cervical cancer [1].
Every year, more than 500,000 women are diagnosed with the disease and over
300,000 deaths occur globally. The lack of screening and HPV vaccination programs
in economically developing countries leads to about 90% of the incident cases of
cervical cancer occurring in these countries. However, in affluent countries, both
incidence and mortality have been decreasing due to the introduction of proper
screening programs [2]. In most cases, cervical cancer goes through a long
asymptomatic phase before the disease becomes clinically apparent, which may
ultimately lead to death. Hence, regular screening makes early and timely detection
of cancer possible and can prevent its progression [3].
Continual screening among women is essential to enable doctors to identify
cancer at an early stage, before it reaches the final stage. The Pap smear test (a smear
collected from the uterine cervix and stained) is used as a screening method to detect
cervical cancer, but public awareness of the screening test is limited. A routine cancer
screening must be done every three years, and a Pap smear with an HPV DNA test
is recommended every five years as a screening method [4].
Figure 1a shows the abnormal cells with the enlarged nucleus as one of the features
in identifying the cancer cells, and Fig. 1b depicts the normal cells without any
changes in the nucleus size. Figure 2 shows the clinical data. Figure 2a shows the
abnormal cells with an enlarged nucleus, and Fig. 2b depicts the normal cells without
any changes in the nucleus size.
To assist clinical providers in preparing reports for subsequent patient management,
a general classification of cancer images into "normal" and "abnormal" is
recommended [5].
The manual screening procedure is time-consuming, uneconomical, and error-prone,
since it can exhaust the paramedical workers. Hence, there is a need for an automatic
diagnosis system that can technically support clinicians and reduce the cost, time,
and expertise needed for cervical cancer screening.
Therefore, a system must be designed to make decisions faster in all aspects. The
automated system must adapt itself to fresh and complicated cases of cervical cancer.
With the active participation of medical and paramedical health workers and the
recent advent of technology, an automated diagnosis system is to be formulated and
implemented.
The tremendous growth in artificial intelligence (AI) has given a supporting technical
hand to doctors in providing personalized health care to patients, enhancing their
efficiency in diagnosing diseases at an early stage [6].
2 Literature Review
Shin et al. [19] discuss how convolutional neural network models can be adapted to
design high-performance automated systems for medical imaging tasks; deep learning
techniques were applied in the segmentation process, and the authors present
region-of-interest detection with deep learning based on convolutional neural networks.
Deep learning algorithms provide superior performance to traditional machine
learning for the classification of cervicography images into different classes [20].
The faster region-based convolutional neural network system [21] was applied to
the classification task, obtaining a sensitivity of 99.4%, a specificity of 34.8%, and
an area under the curve of 0.67. Wu et al. [22] present a simple convolutional neural
network (CNN) that is trained and tested on an original image group (3012 images)
and an augmented image group (108,432 images); the work shows that augmentation
improved the classification accuracy.
From the above survey, both conventional machine learning and deep convolutional
neural networks have influenced the classification of cervical cells. Classification
can be performed by extracting features and then applying a classifier, whereas a
deep convolutional neural network classifies directly, without any manual feature
extraction through image processing. However, validation of the deep learning models
is limited to the standard benchmark dataset only.
In this work, pre-trained architectures are considered and the models are fine-tuned
through transfer learning. The trained networks are validated and tested on the standard
benchmark dataset and also on real-time clinical data. A comparative analysis is
carried out to find the better architecture and training parameters, which is also
verified with the clinical data.
3 Methodology
In this work, input images from both the standard and the real-time clinical dataset
are considered. The orientation of the samples is varied through the data augmentation
process. Through the adoption of the transfer learning process, pre-existing deep
convolutional neural networks are fine-tuned for this work, and a few important
hyper-parameters are tuned. The tuned architectures are trained, validated, and tested
on the dataset. The considered architectures are trained on an Intel E-2224G CPU at
3.50 GHz with a 64-bit operating system and x64 processor. The software platform
used is MATLAB R2021 (academic use). Figure 3 illustrates the proposed methodology.
Data augmentation
The orientation of the samples in the dataset is restricted. Hence, in order to have
a robust architecture that can classify samples of any orientation, data augmentation
techniques are applied. The data augmentation techniques used are image rotation,
image reflection, and image translation, as sketched below.
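The paper implements these augmentations in MATLAB R2021; as an illustrative Python/Keras equivalent (the parameter values and directory layout below are assumptions), the same rotation, reflection, and translation operations can be expressed as:

```python
# Rotation / reflection / translation augmentation sketch (assumed parameters).
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=20,         # image rotation
    horizontal_flip=True,      # image reflection
    vertical_flip=True,
    width_shift_range=0.1,     # image translation
    height_shift_range=0.1,
    rescale=1.0 / 255,
)

# Assumed layout: one sub-folder per class ("normal", "abnormal").
train_flow = augmenter.flow_from_directory(
    "pap_smear/train", target_size=(299, 299),
    batch_size=32, class_mode="categorical")
```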
Deep learning-based architectures
For image analysis and classification, deep learning networks are very appropriate
and result in successful image classification [23]. Generally, a deep learning network
consists of (a) a convolutional layer, (b) an activation layer, (c) a pooling layer,
and (d) a fully connected layer [4].
For the proposed work, the deep learning architectures GoogleNet and Alexnet
are considered.
The GoogleNet model is 22 layers deep (27 layers including pooling), with 9 inception
modules stacked linearly; the outputs of the inception modules are connected to a
pooling layer [24].
The Alexnet architecture consists of five convolutional layers followed by three
fully connected layers; the output of the last fully connected layer is fed to a softmax
activation layer which produces 1000 class labels [25].
For medical applications, building a model from scratch is impractical due to
limited clinical data and the computational resources required. Hence, the concept
of transfer learning arises [24, 26, 27].
Transfer learning is similar to the way humans relate knowledge of one task to
enable the learning of another task [28, 29]. The fine-tuning of an already existing
pre-trained network is termed transfer learning. In this process, the last fully connected
layers of the models are replaced with a new classification layer comprising two
output nodes representing the two classes of cancer.
The hyper-parameters considered for the fine-tuning are a network learning rate
of 0.0001, stochastic gradient descent with a momentum of 0.9, and epoch counts of 06
and 30. An epoch refers to one cycle through the full training dataset.
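The fine-tuning described above is performed in MATLAB; a minimal Python/Keras sketch of the same idea is given below, with an InceptionV3 backbone standing in for GoogleNet purely as an illustration (the backbone choice and input size are assumptions).

```python
# Transfer-learning sketch: replace the final classification layer with a
# two-node layer and train with SGD (momentum 0.9, learning rate 0.0001).
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

base = tf.keras.applications.InceptionV3(weights="imagenet", include_top=False,
                                          input_shape=(299, 299, 3))
base.trainable = False                               # keep pre-trained features

x = layers.GlobalAveragePooling2D()(base.output)
outputs = layers.Dense(2, activation="softmax")(x)   # normal vs. abnormal
model = models.Model(base.input, outputs)

model.compile(optimizer=optimizers.SGD(learning_rate=1e-4, momentum=0.9),
              loss="categorical_crossentropy", metrics=["accuracy"])

# model.fit(train_flow, epochs=30)   # train_flow from the augmentation sketch
```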
The splitting of data for training, testing, and validation is 60:20:20. The numbers of
normal images considered for testing, training, and validation are 145, 48, and 48,
respectively, and the numbers of abnormal images considered for testing, training,
and validation are 405, 135, and 184, respectively.
[Bar chart for Fig. 4: performance metrics of Classifier A at Epoch_06 and Epoch_30; plotted values include 0.9235, 0.9399, 0.906, 0.9306, 0.9344, 0.9779, 0.0765, and 0.0601.]
Fig. 4 Classifier A performance for epochs 06 and 30 with HErlev Dataset (standard)
Figure 4 displays the performance metrics of Classifier A for the two different epoch
settings, which shows that with an increase in the number of epochs the model is
trained better, and the other measured parameters also improve. The rate at which the
classifier incorrectly predicts the abnormal class decreases as the number of iterations
increases. For larger epochs, the model is robust in identifying cancerous and
non-cancerous cells, with a decreased misclassification rate on the test dataset.
From Fig. 5, the behavior of Classifier B for the two epoch settings is almost
constant, with little variation. As the iterations (number of epochs) increase in the
training phase, the model's area under the curve increases by 0.04.
[Bar chart for Fig. 5: confusion-matrix-derived metrics of Classifier B at Epoch_06 and Epoch_30; plotted values include 0.89, 0.8852, 0.91, 0.9454, 0.87, 0.88, 0.11, and 0.1148.]
Fig. 5 Classifier B confusion matrix parameters for different epochs with HErlev Dataset (standard)
[Bar chart: performance of Classifier A (Alexnet) and Classifier B (GoogleNet) on the clinical test data — accuracy and misclassification rate; plotted values include 0.9508, 0.5, and 0.0492.]
From the results, the classifier performance is better when the network is trained for
30 epochs. The model trained with 30 epochs is further considered for validation
with the real-time clinical data.
Figure 7 shows that Classifier A and Classifier B predict cancer with an accuracy
of 0.9508 and 0.5, respectively.
5 Conclusion
The incidence and mortality rates of cervical cancer among women have to be reduced.
A very efficient automated system is essential for the classification of cases into
non-cancerous and cancerous. Artificial intelligence already lends a technical hand
(automated systems) to health care, and deep learning architectures are used in the
classification of cervical cancer to build such an automated system.
In this work, the existing architectures Alexnet and GoogleNet are fine-tuned
through the transfer learning process for the classification task. The two classes
considered for this work are normal and abnormal. The pre-existing models are
fine-tuned by adjusting hyper-parameters such as the learning rate, stochastic gradient
descent momentum, and number of epochs. The HErlev Dataset (standard) alone is
used for the training and validation phases, whereas for the testing phase both the
HErlev Dataset (standard) and the real-time clinical dataset are used. There is a 1%
variation in the performance of GoogleNet between epochs 6 and 30. The performance
of the pre-trained Alexnet increases by 4–8% for epoch 30 compared to epoch 6. For
30 epochs, the accuracy, precision, and AUC are 0.9399, 0.9306, and 0.9779, respectively.
Further, the two pre-trained classifiers that are trained, validated, and tested with
30 epochs are considered for the new non-trained real-time clinical test data. The
Alexnet architecture performs better on the clinical data than the GoogleNet
architecture, with an accuracy of 0.9508 and a misclassification rate of 0.0492.
In the future, the robustness of the architecture in classification can be further
enhanced, and the work can be extended to multiclass classification and verified on
different datasets.
References
1. Jemal A, Center MM, DeSantis C, Ward EM (2010) Global patterns of cancer incidence and
mortality rates and trends. Cancer Epidemiol Biomark Prev 19(8):1893–1907
2. Cohen PA, Jhingran A, Oaknin A, Denny L (2019) Cervical cancer. Lancet 393(10167):169–
182. https://doi.org/10.1016/S0140-6736(18)32470-X
3. Canavan TP, Doshi NR (2000) Cervical cancer. Am Fam Physician 61(5):1369–1376
4. Saslow D, Solomon D, Lawson HW, Killackey M, Kulasingam SL, Cain J, Garcia FA, Mori-
arty AT, Waxman AG, Wilbur DC, Wentzensen N, Downs LS Jr, Spitzer M, Moscicki AB,
Franco EL, Stoler MH, Schiffman M, Castle PE, Myers ER, ACS-ASCCP-ASCP Cervical
Cancer Guideline Committee, American Cancer Society, American Society for Colposcopy and
Cervical Pathology, American Society for Clinical Pathology (2012) Screening guidelines for
the prevention and early detection of cervical cancer. CA Cancer J Clin 62(3):147–172. https://
doi.org/10.3322/caac.21139. Epub 2012 Mar 14. PMID: 22422631; PMCID: PMC3801360
5. Nayar R, Wilbur DC (2017) The Bethesda system for reporting cervical cytology: a historical
perspective. Acta Cytol 61(4–5):359–372. https://doi.org/10.1159/000477556. Epub 2017 Jul
11. PMID: 28693017
6. Reddy S, Allan S, Coghlan S, Cooper P (2020) A governance model for the application of AI
in health care. J Am Med Inform Assoc 27(3):491–497. https://doi.org/10.1093/jamia/ocz192
7. Iwai, Tanaka T (2017) Automatic diagnosis supporting system for cervical cancer using image
processing. In: 2017 56th annual conference of the society of instrument and control engineers
of Japan (SICE), pp 479–482. https://doi.org/10.23919/SICE.2017.8105610
8. William W, Ware A, Basaza-Ejiri AH, Obungoloch J (2018) A review of image analysis and
machine learning techniques for automated cervical cancer screening from pap-smear images.
Comput Methods Programs Biomed 164:15–22. https://doi.org/10.1016/j.cmpb.2018.05.034
9. Hussain E, Mahanta LB, Das CR, Talukdar RK (2020) A comprehensive study on the multi-
class cervical cancer diagnostic prediction on pap smear images using a fusion-based decision
from ensemble deep convolutional neural network. Tissue Cell 65:101347
10. Supriyanto E, Pista NA, Ismail L, Rosidi B, Mengko T (2011) Automatic detection system of
cervical cancer cells using color intensity classification
11. Sokouti B, Haghipour S, Tabrizi AD (2012) A pilot study on image analysis techniques for
extracting early uterine cervix cancer cell features. J Med Syst 36(3):1901–1907. https://doi.
org/10.1007/s10916-010-9649-y
12. Ashok B, Aruna D (2016) Comparison of feature selection methods for diagnosis of cervical
cancer using SVM classifier
13. Su J, Xu X, He Y, Song J (2016) Automatic detection of cervical cancer cells by a two-level
cascade classification system. Anal Cell Pathol 2016:1–11. Article ID 9535027. https://doi.
org/10.1155/2016/9535027
14. Sharma M, Singh S, Agrawal P, Madaan V (2016) Classification of clinical dataset of cervical
cancer using KNN. Indian J Sci Technol
15. Kumar R, Srivastava R, Srivastava S (2015) Detection and classification of cancer from micro-
scopic biopsy images using clinically significant and biologically interpretable features. J Med
Eng 2015:457906. https://doi.org/10.1155/2015/457906
16. Singh S, Tejaswini V, Murthy RP, Mutgi A (2015) Neural network based automated system for
diagnosis of cervical cancer. Int J Biomed Clin Eng
17. Mustafa N, Mat Isa NA, Mashor MY, Othman NH (2007) New features of cervical cells
for cervical cancer diagnostic system using neural network. In: International symposium on
advanced technology
18. Chen YF, Huang PC, Lin KC, Lin HH, Wang LE, Cheng CC, Chen TP, Chan YK, Chiang
JY (2014) Semi-automatic segmentation and classification of pap smear cells. IEEE J Biomed
Health Inform 18(1):94–108. https://doi.org/10.1109/JBHI.2013.2250984
19. Shin HC, Roth HR, Gao M, Lu L, Xu Z, Nogues I, Yao J, Mollura D, Summers RM (2016)
Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset
characteristics and transfer learning. IEEE Trans Med Imaging 35(5):1285–1298. https://doi.
org/10.1109/TMI.2016.2528162
20. Park YR, Kim YJ, Ju W et al (2021) Comparison of machine and deep learning for the classi-
fication of cervical cancer based on cervicography images. Sci Rep 11:16143. https://doi.org/
10.1038/s41598-021-95748-3
21. Tan X, Li K, Zhang J et al (2021) Automatic model for cervical cancer screening based on
convolutional neural network: a retrospective, multicohort, multicenter study. Cancer Cell Int
21:35. https://doi.org/10.1186/s12935-020-01742-6
22. Wu M, Yan C, Liu H, Liu Q, Yin Y (2018) Automatic classification of cervical cancer from
cytological images by using convolutional neural network. Biosci Rep 38(6):BSR20181769.
https://doi.org/10.1042/BSR20181769
23. Krizhevsky A, Sutskever I, Hinton G (2012) ImageNet classification with deep convolutional
neural networks. Adv Neural Inf Process Syst. https://doi.org/10.1145/3065386
24. Han J, Kamber M, Pei J (2012) Classification: basic concepts, chap 8. In: Han J, Kamber M, Pei
J (eds) Data management systems, data mining, 3rd edn. The Morgan Kaufmann series. Morgan
Kaufmann, pp 327–391. ISBN 9780123814791. https://doi.org/10.1016/B978-0-12-381479-
1.00008-3. https://www.sciencedirect.com/science/article/pii/B9780123814791000083
25. Szegedy C, Liu W, Jia YQ, Sermanet P, Reed S, Anguelov D (2015) Going deeper with convo-
lutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition
(CVPR), Boston, MA, pp 1–9
26. Han J, Kamber M, Pei J (2012) Classification: advanced methods, chap 9. In: Han J,
Kamber M, Pei J (eds) Data management systems, data mining, 3rd edn. The Morgan Kauf-
mann series. Morgan Kaufmann, pp 393–442. ISBN 9780123814791. https://doi.org/10.1016/
B978-0-12-381479-1.00009-5. https://www.sciencedirect.com/science/article/pii/B97801238
14791000095
27. Chandraprabha R, Hiremath S (2021) Computer processing of an image: an introduction.
In: Handbook of research on deep learning-based image analysis under constrained and
unconstrained environments. IGI Global, pp 1–22
28. Mikołajczyk, Grochowski M (2018) Data augmentation for improving deep learning in image
classification problem. In: 2018 international interdisciplinary PhD workshop (IIPhDW), pp
117–122. https://doi.org/10.1109/IIPHDW.2018.8388338
29. Ying X (2019) An overview of overfitting and its solutions. J Phys Conf Ser 1168:022022
30. Jantzen J, Dounias G (2006) Analysis of pap-smear image data
31. Chandraprabha R, Singh S (2016) Artificial intelligent system for diagnosis of cervical cancer:
a brief review and future outline. J Latest Res Eng Technol
32. Jones OT, Calanzani N, Saji S, Duffy SW, Emery J, Hamilton W, Singh H, de Wit NJ, Walter FM
(2021) Artificial intelligence techniques that may be applied to primary care data to facilitate
earlier diagnosis of cancer: systematic review. J Med Internet Res 23(3):e23483. https://doi.
org/10.2196/23483. PMID: 33656443; PMCID: PMC7970165
Covid Vaccine Adverse Side-Effects
Prediction with Sequence-to-Sequence
Model
1 Introduction
The situation in the entire world has changed in recent times because of the outbreak of
the COVID-19 pandemic. Originating in Wuhan, the disease started spreading drastically
and was declared a pandemic. The increase in infections led to increased deaths,
lay-offs, and crises, and a decrease in the human population. Many measures and
actions were taken by the World Health Organization (WHO) to prevent the spread of
this disease, including wearing masks, washing hands, and maintaining social distance
between people. A global immunization scheme is also one of the safest and most
reliable measures for preventing the spread of the disease. The vaccines are administered
to various age groups in all countries and have proved their efficacy. Nevertheless,
these vaccines have side effects, along with their preventive nature, among different
sections of the population. The vaccine side effects are not only temporarily harmful
but have also proved fatal in some cases. The history of previous pandemics illustrates
the seriousness of any pandemic situation: previous pandemics were eradicated through
the administration of vaccines along with preventive measures. One such example of
the complete eradication of a disease is polio. In 1988, more than 350,000 children
were paralyzed by polio [1]; with the effect of immunization, there are now hardly any
polio cases globally. Not only polio, but the eradication of smallpox was also achieved
with the increase in vaccination [2]. The current work is aimed at predicting the
adverse side effects of COVID-19 vaccines.
2 Literature Survey
In the previous works, the use of machine learning and deep learning has shown
significant results in the field of automation. Menni et al. conducted a survey on the
side effects reported eight days after vaccination, using data recorded in the COVID
symptom study app in the UK; the BNT162b2 and ChAdOx1 vaccines showed a
reduced risk of SARS-CoV-2 infection after 12 days of vaccination [3]. Leng et al.
selected vaccines based on seven different attributes, among which vaccine decision,
effectiveness, side effects, and number of doses are the important ones. The discrete
choice models were based on the Bayesian information criterion (BIC) and the Akaike
information criterion [4].
Riad et al., based on a month-long survey of Czech healthcare workers vaccinated
with the Pfizer-BioNTech vaccine, found the common side effects to be pain at the
injection site, fatigue, muscle pain, chills, and joint pain; the chi-squared test and
ANOVA were used with a significance level of 0.05 [5]. Alam et al. used a deep
learning-based technique for analyzing the COVID-19 vaccine response from Twitter
data. To achieve this, data were downloaded from Kaggle, the data were detokenized,
and performance was checked using long short-term memory (LSTM) and
bi-directional LSTM; it was found that both models gave good performance [6].
Zaman et al. collected information at several vaccination centers among different age
groups and applied deep learning models to predict the side effects due to the vaccine.
Serious conditions such as cardiovascular diseases and diabetes are considered the
main variables for predicting the side effects. The deep learning techniques applied are
artificial neural networks (ANN), LSTM, and GRU, and it was found that GRU gave
the best accuracy [7]. Muneer et al. predicted mRNA vaccine degradation using deep
learning techniques; the data were obtained from Kaggle, and two deep hybrid neural
network models, GCN-GRU and GCN-CNN, were proposed, achieving MCRMSE
scores of 0.22614 and 0.34152, while the GCN-GRU pre-trained model achieved an
AUC score of 0.938 [8].
Jarynowski et al. surveyed the effects of the Sputnik V vaccine in Russia through
social media analysis, in which natural language processing was used to extract text
from Telegram groups. The BERT technique of natural language processing was used
to perform multi-label classification, which resulted in an AUROC value of 0.991;
fever, pains, chills, fatigue, and many such symptoms were recorded [9]. Aryal and
Bhattarai proposed two models: a machine learning approach, Naïve Bayes, and a
deep learning approach, LSTM. Twitter data were extracted and preprocessed, and it
was found that the accuracy of the LSTM model is higher than that of the machine
learning model by 7% [10].
Kerr et al. prepared a textual question-and-answer format for the analysis of the
efficacy and side effects of vaccines, and information tests were conducted based on
the responses obtained. Participants were randomized using the Qualtrics randomization
tool, and the results showed no overall effect on increasing vaccination intentions,
with effects depending instead on individual perspective [11]. Sen et al. prepared a
graphical visualization and analysis of the spread of the COVID-19 pandemic from
the current pandemic situation using machine learning regression algorithms and
obtained satisfying results; however, the models are yet to be explored to obtain better
results with the help of deep learning techniques [12]. Ashwini et al. [13], using LSTM,
predicted the number of possible cases for the next 10 days.
The majority of the literature reviewed uses deep learning algorithms for sentiment
analysis and survey-based approaches to analyze COVID vaccine symptoms. In the
present work, importance is given to the analysis of vaccine side effects.
3 Methodology
The Vaccine Adverse Event Reporting System (VAERS) was created by the Food
and Drug Administration (FDA) and the Centers for Disease Control and Prevention
(CDC) to receive reports about adverse events that may be associated with vaccines
in the US [14]. COVID-19-related VAERS data have been used in the current experiment.
Table 1 provides a detailed description of the data provided in each field of the
VAERSDATA.CSV file.
The total distinct count of VAERS_ID records was 7462. We can see the distribution
of different side effects with respect to age and gender in the data; for example,
we can observe that the death adverse effect is high at older ages, as shown in Fig. 1.
Fig. 1 Death % with respect to age
Our aim was to predict adverse side-effect symptoms based on age and sex. For the age
and gender input, the following side effects were reported:
• COVID-19, Chills, Death, Fatigue, Headache, Pain
To process this structured data in a Seq2Seq model, we had to convert the data into an
input sequence (encoder input) and output sequence (decoder output) format, so we
formulated the data into the format shown in Table 2.
Once we formulated the 7462 records into this format, the total number of distinct
records came to 184. Then, we split the data into train and test categories with the
record counts shown in Table 3.
A sequence-to-sequence model with LSTM maps the input sequence to a fixed-size
vector and maps that vector to another target sequence [14]. Due to this capability,
Seq2Seq models are widely used in sequence learning tasks such as machine
translation [15], speech recognition [16], and video captioning [17].
In the input layer, we give age and gender as input. A bidirectional LSTM layer
is used as the encoder layer, and the encoder output is fed into an LSTM-based decoder
layer. We add a dense layer at the end of the sequence-to-sequence stack to capture
the adverse symptoms, with a sigmoid activation function in this output layer. The
model architecture is shown in Fig. 2.
The output of the decoder layer is fed into the dense layer and generates a
6-dimensional vector output to predict the symptoms. The model uses the binary
cross-entropy loss function with the Adam optimizer.
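A minimal Keras sketch of this set-up (bidirectional-LSTM encoder, LSTM decoder, and a 6-unit sigmoid output trained with binary cross-entropy and Adam) is given below; the layer sizes, input shapes, and decoder-input format are assumptions, since the paper does not report them.

```python
# Seq2Seq sketch under the assumptions stated above.
import tensorflow as tf
from tensorflow.keras import layers, models

enc_in = layers.Input(shape=(2, 1))              # encoder input: age, gender
enc = layers.Bidirectional(layers.LSTM(32, return_state=True))
_, fh, fc, bh, bc = enc(enc_in)                  # forward/backward LSTM states
state_h = layers.concatenate([fh, bh])
state_c = layers.concatenate([fc, bc])

dec_in = layers.Input(shape=(1, 6))              # assumed decoder input sequence
dec_out = layers.LSTM(64)(dec_in, initial_state=[state_h, state_c])

# Dense sigmoid output: one unit per symptom (COVID-19, chills, death,
# fatigue, headache, pain), i.e. a multi-label prediction.
symptoms = layers.Dense(6, activation="sigmoid")(dec_out)

model = models.Model([enc_in, dec_in], symptoms)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```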
4 Result
We also tried a linear activation function in the last layer with a mean squared error
loss function but could only obtain a micro-average accuracy of 85%. We observed
that, after the sigmoid transformation, the multi-label one-hot encoded values were
close to 0 or 1, and binary cross-entropy yields higher accuracy in this setting.
5 Conclusion
Due to the COVID-19 pandemic, there was tremendous change in multiple strata
of life, such as clinical practice, regular life, work-from-home culture, online classes
for students, the economy, policing, and crime control. In order to contain the outbreak
of COVID-19, many vaccines were discovered, and each vaccine underwent a large
number of preclinical and clinical trials before regulatory approval. Still, a few adverse
events due to the side effects of the vaccines are reported around the world. Therefore,
the development of an efficient methodology to predict the possible adverse side effects
of a vaccine is essential. In this direction, the present study predicts the adverse
side effects of the COVID-19 vaccine using deep learning computational models. The
proposed methodology, when tested with VAERS data, provided 88% accuracy
using LSTM. The developed methodology can help in predicting the adverse side
effects of the vaccine based on age and gender.
References
1. Bigouette JP, Wilkinson AL, Tallis G, Burns CC, Wassilak SGF, Vertefeuille JF (2021) Progress
toward polio eradication—worldwide, January 2019–June 2021. Morb Mortal Wkly Rep
70(34):1129
2. Moss B, Smith GL (2021) Research with variola virus after smallpox eradication: development
of a mouse model for variola virus infection. PLoS Pathog 17(9):e1009911
3. Menni C, Klaser K, May A, Polidori L, Capdevila J, Louca P, Sudre CH et al (2021) Vaccine
side-effects and SARS-CoV-2 infection after vaccination in users of the COVID symptom study
app in the UK: a prospective observational study. Lancet Infect Dis
4. Leng A, Maitland E, Wang S, Nicholas S, Liu R, Wang J (2021) Individual preferences for
COVID-19 vaccination in China. Vaccine 39(2):247–254
5. Riad A, Pokorná A, Attia S, Klugarová J, Koščík M, Klugar M (2021) Prevalence of COVID-19
vaccine side effects among healthcare workers in the Czech Republic. J Clin Med 10(7):1428
6. Alam KN, Khan MS, Dhruba AR, Khan MM, Al-Amri JF, Masud M, Rawashdeh M (2021)
Deep learning-based sentiment analysis of COVID-19 vaccination responses from twitter data.
Comput Math Methods Med 2021
7. Zaman FU, Siam TR, Nayen Z. Prediction of vaccination side-effects using deep learning
8. Muneer A, Fati SM, Akbar NA, Agustriawan D, Wahyudi ST (2021) iVaccine-deep: prediction
of COVID-19 mRNA vaccine degradation using deep learning. J King Saud Univ-Comput Inf
Sci
9. Jarynowski A, Semenov A, Kamiński M, Belik V (2021) Mild adverse events of Sputnik V
vaccine in Russia: social media content analysis of telegram via deep learning. J Med Internet
Res 23(11):e30529
10. Aryal RR, Bhattarai A (2021) Sentiment analysis on covid-19 vaccination tweets using Naïve
Bayes and LSTM. Adv Eng Technol Int J 1(1):57–70
11. Kerr JR, Freeman ALJ, Marteau TM, van der Linden S (2021) Effect of information about
COVID-19 vaccine effectiveness and side effects on behavioural intentions: two online
experiments. Vaccines 9(4):379
12. Sen S, Thejas BK, Pranitha BL, Amrita I (2021) Analysis, visualization and prediction of
COVID-19 pandemic spread using machine learning. In: Innovations in computer science and
engineering. Springer, Singapore, pp 597–603
13. Raj A, Umrani NR, Shilpashree GR, Audichya S, Kodipalli A, Martis RJ (2021) Forecast of
covid-19 using deep learning. In: 2021 IEEE international conference on electronics, computing
and communication technologies (CONECCT), pp 1–5. https://doi.org/10.1109/CONECCT52
877.2021.9622721
14. Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks.
In: Advances in neural information processing systems, pp 3104–3112
15. Venugopalan S, Rohrbach M, Donahue J, Mooney R, Darrell T, Saenko K (2015) Sequence
to sequence-video to text. In: Proceedings of the IEEE international conference on computer
vision, pp 4534–4542
16. Chiu C-C, Sainath TN, Wu Y, Prabhavalkar R, Nguyen P, Chen Z, Kannan A et al (2018) State-
of-the-art speech recognition with sequence-to-sequence models. In: 2018 IEEE international
conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 4774–4778
17. VAERS dataset page. https://vaers.hhs.gov/data/datasets.html. Accessed 13 Dec 2021
Comparison Between ResNet 16
and Inception V4 Network for COVID-19
Prediction
1 Introduction
In December 2019, the novel coronavirus later named COVID-19 appeared in the Wuhan city of China [1]. Two years after this deadly virus emerged, there is still no specific medication that combats it permanently; only temporary medicines are available. Deep learning algorithms have made it easier to diagnose COVID-19 and have proved to be a boon in improving common diagnostic methods. Coronavirus disease has become a serious concern for everyone around the globe. All activities came to a halt as soon as this virus appeared. Students were deprived of offline schooling and were forced to confine themselves to a four-walled room and attend online classes, which did not benefit them [2]. This virus has directly affected the health of mankind and is also one of the main reasons for economic crises all over the world. With its different variants, coronavirus has shown an adverse long-term impact on patients, and the symptoms also keep changing depending on the variant [3]. Symptoms of the delta variant include a lower pulse rate and a drop in the oxygen level, whereas the omicron variant is associated with a higher pulse rate and no significant change in oxygen levels. Artificial
P. J. Rachana
Department of Mechanical Engineering, National Institute of Technology Karnataka, Surathkal,
Karnataka, India
e-mail: [email protected]
A. Kodipalli (B) · T. Rao
Department of Artificial Intelligence and Data Science, Global Academy of Technology,
Bangalore, India
e-mail: [email protected]
T. Rao
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 283
N. R. Shetty et al. (eds.), Emerging Research in Computing, Information, Communication
and Applications, Lecture Notes in Electrical Engineering 928,
https://doi.org/10.1007/978-981-19-5482-5_25
intelligence is useful in detecting infected patients and diagnosing them [4]. COVID-19 belongs to the SARS family of coronaviruses. Coronaviruses cause dangerous diseases such as Middle East Respiratory Syndrome (MERS), mainly affecting the heart and lungs. A deep convolutional neural network-based model is helpful in detecting infection in the lungs. Measures like social distancing, washing hands frequently or using alcohol-based sanitizer, masking, and getting vaccinated help minimize the spread of this deadly virus among the masses [5]. Scientists have developed many vaccines such as Covishield, Covaxin, Sputnik, and many more, but they give temporary relief only; with new variants emerging, these vaccines may lose their effectiveness. A good immune system can help in fighting this disease. Coronavirus has a long-term impact on the health of the infected person: it mainly affects the growth of the young and leads to several unknown infections in adults. This virus brought mankind to a situation wherein there was not enough place to bury those who died of the novel coronavirus. Indirectly, this virus ruined the lives of people belonging to underdeveloped and developing countries around the globe. The World Health Organization is also striving hard, along with the governments of different countries, to make sure people follow a set of guidelines, usually called standard operating procedures (SOPs).
The organization of the paper is as follows: Sect. 2 describes the literature survey,
Sect. 3 describes the methodology, Sect. 4 describes the results, and Sect. 5 concludes
the paper.
2 Literature Survey
Al Husaini et al. [6] used Inception v4 and modified models such as Inception mv4 to identify breast disorders and obtained accurate image-based detections with both. Moreover, the Inception v4 and mv4 models are not restricted to detecting and treating early breast cancer; they can also be used to detect lung cancer and possibly COVID-19. With an increasing number of training epochs, the accuracy of Inception mv4 decreases relative to the Inception v4 model. These models helped in detecting cancer in the human body and are efficient in terms of both time and energy consumption. Talo et al. [7] used AlexNet, ResNet-18, ResNet-34, ResNet-50, and VGG-16 architectures to identify multi-class brain disease from MRI images of the brain. These pre-trained models classify given MR images into different categories, including normal, cerebrovascular, degenerative, and inflammatory diseases. The main aim of introducing these models for disease detection is to minimize or prevent the errors caused by humans in the manual reading of MRI images. Manual reading might not enable early detection of multi-class brain disease, whereas these models help in early detection and therefore in better treatment and earlier recovery. Among the above five models, ResNet-50 has the highest classification accuracy. He et al. [8] evaluated residual networks (ResNet) on ImageNet and applied them to COCO object detection and COCO segmentation, performing tasks such as ImageNet localization with these models while achieving a good level of accuracy. Pravin [9] used ResNet models for image recognition. Using more layers does not by itself increase the accuracy of a model, because a larger number of layers causes the vanishing-gradient problem. This problem can be overcome by using architectures like ResNet, which help restore accuracy and minimize the effect of vanishing gradients to a good extent. Song et al. [10] used the ResNet-18 model for semantic segmentation, with the main aim of improving pixel-wise semantic segmentation using deep neural models. The ResNet-18 architecture helped in reducing the number of parameters used in the models. It was evaluated on the CamVid and Cityscapes datasets and proved its efficiency.
John et al. prepared a textual question-and-answer format for the analysis of the efficacy and side effects of vaccines, and information tests were conducted based on the responses obtained. Participants were randomized using an online randomization tool, and the study showed that the information did not increase vaccination intentions overall, with effects depending on individual perspective [11]. Snigdha et al. prepared a graphical visualization and analysis of the spread of the COVID-19 pandemic using machine learning regression algorithms and obtained satisfactory results; however, the models are yet to be explored with deep learning techniques to obtain better results [12]. Ashwini et al. [13] used LSTM to predict the number of possible cases for the next 10 days.
The majority of the literature referred to above uses deep learning algorithms for sentiment analysis and survey-based approaches to analyze COVID vaccine symptoms. In the present work, the analysis of vaccine side effects is given importance.
3 Methodology
Two approaches were used to classify the given images into covid and non-covid classes:
1. ResNet 16
2. Inception v4.
The input image size is 64 × 64 × 3. The architecture of Inception v4 is shown in Fig. 1, and the architecture of ResNet 16 is shown in Table 1.
At the end, after flattening, two dense layers are added. The second-to-last dense layer uses the ReLU activation function, and the last layer uses the sigmoid activation function. The model is trained with the Adam optimizer and binary cross-entropy loss, with early stopping based on the validation accuracy.
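As a hedged illustration of the training setup just described (not the authors' exact code), the sketch below uses a stand-in convolutional front end, a ReLU dense layer, a sigmoid output, Adam with binary cross-entropy, and early stopping on the validation accuracy; the front-end layers and the dense-layer width are assumptions.

```python
# Stand-in convolutional front end (the actual ResNet 16 / Inception v4 layers
# are not reproduced here) followed by the classification head described above.
import tensorflow as tf
from tensorflow.keras import layers, models, callbacks

model = models.Sequential([
    layers.Conv2D(64, 7, strides=2, activation="relu", input_shape=(64, 64, 3)),
    layers.MaxPooling2D(3, strides=2),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),      # second-to-last dense layer (width assumed)
    layers.Dense(1, activation="sigmoid"),     # covid / non-covid output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Early stopping by monitoring the validation accuracy, as stated above.
early_stop = callbacks.EarlyStopping(monitor="val_accuracy", patience=3,
                                     restore_best_weights=True)
# model.fit(train_data, validation_data=val_data, epochs=20, callbacks=[early_stop])
```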
Table 1 Architecture of ResNet 16
Layer name ResNet 16
Conv1 7 × 7, 64, stride 2
3 × 3 max pool, stride 2
conv2_x 3 × 3, 64
3 × 3, 64
3 × 3, 64
3 × 3, 64
conv3_x 3 × 3, 128
3 × 3, 128
3 × 3, 128
3 × 3, 128
conv4_x 3 × 3, 256
3 × 3, 256
3 × 3, 256
3 × 3, 256
conv5_x 3 × 3, 512
3 × 3, 512
4 Result
ResNet 16: The model was trained with 238 images, and the number of test images is 39. The optimizer used is Adam, the number of epochs is 20, and the validation steps are 5.
After training for 20 epochs, the results are shown in Fig. 2.
Training stopped after 20 epochs. From the graphs, we can observe that, as the number of epochs increases, the training accuracy increases and the loss decreases. The training accuracy is 98.7%, and the validation accuracy is 94.8% (Fig. 3).
The training loss is 0.0719, the training accuracy is 0.9643, the validation loss is 0.0318, and the validation accuracy is 1.0000. The training process achieved reasonable accuracy with this architecture, and both the training and validation losses are very low (Fig. 4).
From the confusion matrix, it is observed that the precision is 77%, the recall is 100%, and the F1 score is 87%; an F1 score of 87% is achieved on the test data.
The model predicted covid images correctly, but non-covid images were also detected as covid. This can be because of overfitting or too few sample images. Simplifying the architecture may give better results.
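For reference, the short sketch below (toy labels, not the study's data) shows how the reported precision, recall, and F1 score are computed with covid treated as the positive class.

```python
# Toy illustration of precision, recall, and F1 with covid as the positive class.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 1]   # 1 = covid, 0 = non-covid (hypothetical)
y_pred = [1, 1, 1, 1, 0, 1, 1, 0, 1, 1]   # a model that over-predicts covid

print(precision_score(y_true, y_pred))    # share of predicted covid that is covid
print(recall_score(y_true, y_pred))       # share of actual covid that is detected
print(f1_score(y_true, y_pred))           # harmonic mean of precision and recall
```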
Inception v4: The model was trained with 238 images, and the number of test images is 39. The optimizer is Adam, and the number of epochs is 10 (Figs. 5 and 6).
From Fig. 7, it is observed that the training loss is 2.7406, the training accuracy is 0.9231, the validation accuracy is 0.2308, the precision is 77%, the recall is 100%, and the F1 score is 87%. This model also achieved an 87% F1 score but was not able to detect non-covid images correctly; the model appears to be overfitted.
5 Conclusion
References
10. Song H, Zhou Y, Jiang Z, Guo X, Yang Z. ResNet with global and local image features, stacked
pooling block, for semantic segmentation
Computational Deep Learning Models
for Detection of COVID-19 Using Chest
X-Ray Images
1 Introduction
S. Guha (B)
Indian Institute of Science, Bangalore, India
e-mail: [email protected]
A. Kodipalli · T. Rao
Department of Artificial Intelligence and Data Science, Global Academy of Technology,
Bangalore, India
e-mail: [email protected]
T. Rao
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 291
N. R. Shetty et al. (eds.), Emerging Research in Computing, Information, Communication
and Applications, Lecture Notes in Electrical Engineering 928,
https://doi.org/10.1007/978-981-19-5482-5_26
The substantial impact of this disease is that it is extremely contagious, which brings life to a pause. However, as soon as some data on the virus became available, research on COVID-19 diagnosis accelerated. At present, the standard diagnosis method for COVID-19 is centered around the swab test, i.e., a sample collected from the nose and throat, which is time-consuming and subject to human error. Moreover, the sensitivity of swab tests is not good enough for timely detection of the disease [3].
Early detection of positive cases is important to avert further spread of the disease.
In the diagnostic phase, radiological images of the chest are determinative as well as
the reverse transcription-polymerase chain reaction (RT-PCR) test [4].
AI technology can be used to spot the features of COVID-19 in CT images, swiftly screen COVID-19 patients, achieve quick triage and treatment of suspected patients, reduce the infection rate, and control the spread of the disease [5].
This disease had caused about 230,000 deaths all over the world by the end of April 2020. Within a span of six months, it infected millions of people across the globe because of its high spreading rate. Thus, many countries have put a lot of effort into improving the diagnostic capability of their healthcare centers and hospitals so that the disease can be recognized as early as possible. However, the results of the standard swab test take a day or two, which increases the chance of spreading the disease due to late diagnosis. Hence, a fast-screening method employing already existing tools such as X-ray and computerized tomography (CT) scans can help alleviate the load of mass diagnostic testing. Pneumonia is the first symptom of COVID-19, and chest X-ray is the best technique for diagnosing it [6].
The organization of the paper is as follows: Sect. 2 describes the detailed liter-
ature survey. Section 3 shows the methodology and the dataset. Section 4 provides
results and the comparative study. Section 5 explains the discussion and future work.
Section 6 gives the conclusion of the work.
2 Literature Survey
Othman et al. [7] aimed to give a tool employed for forecasting which determines the
COVID-19 cases for seven days. The author employed computational algorithms,
namely artificial neural network (ANN), autoregressive integrated moving average
(ARIMA), convolutional neural network (CNN), and long short-term memory
(LSTM) for predicting COVID-19 cases. This paper mainly aimed to fine-tune each
process, and comparisons were done using various performance measures, namely
root mean squared logarithmic error (RMSLE), mean absolute percentage error
(MAPE), and mean squared logarithmic error (MSLE).
Sevi et al. [8] used X-ray images of COVID-19 patients, aiming to classify X-ray images into COVID-19, viral pneumonia, and healthy classes. A data augmentation method was applied to the dataset, and multi-class classification was performed using deep learning models.
Ghada et al. [9] detected and identified COVID-19 using X-ray images. For the purpose of comparison, the CNN deep learning models used are the Inception v3 network, GoogLeNet, ResNet-101, and DAG3Net. Among the various deep learning models employed, DAG3Net performed well, with accuracies of 96.15%, 94.34%, 96.75%, and 96.58% for validation, training, testing, and overall, respectively, whereas GoogLeNet, Inception v3, and ResNet-101 produced accuracies of 98.08%, 99.59%, and 100%, respectively.
Mohammad et al. [10] have used some of the AI-based DL models, namely long
short-term memory (LSTM), generative adversarial networks (GANs) to provide
the user-friendly platform for the detection of COVID-19 by both physicians and
researchers.
Milon et al. [11] used X-ray and computed tomography (CT) for detecting COVID-19 by employing DL models. They carried out a detailed survey of the datasets and of the deep learning models developed by researchers in this field. The paper aimed at helping experts (medical or non-medical) and technicians understand the ways DL techniques are employed in this regard and how they could be used to combat the COVID-19 outbreak.
Talha et al. [12] used DL technology for diagnosing COVID-19 from chest CT scans. For early and accurate detection of coronavirus, the EfficientNet deep learning architecture was employed, and the following performance measures were achieved: accuracy 0.897, F1-score 0.896, and AUC 0.895. Three distinct learning rate strategies were employed: decreasing the learning rate as soon as model performance stops improving (reduce on plateau), a cyclic learning rate, and a constant learning rate. F1-scores of 0.9, 0.86, and 0.82 were achieved with the reduce-on-plateau, cyclic, and constant learning rate strategies, respectively.
Samira et al. [13] proposed CoviNet, a DL network which can automatically detect the presence of COVID-19 in chest X-ray images. The architecture is built on histogram equalization, an adaptive median filter, and a CNN, and the dataset used for the study is publicly available. The model attained 98.62% and 95.77% accuracy for binary and multi-class classification, respectively. This framework may be employed to aid radiologists in the early diagnosis of COVID-19, as early diagnosis will limit the spreading rate of the virus.
Loveleen et al. [14] presented a methodology to detect COVID-19 from chest X-rays while differentiating it from normal cases and viral pneumonia using deep convolutional neural networks (DCNNs). Three pre-trained CNN models (InceptionV3, VGG16, and EfficientNetB0) were assessed through transfer learning. The rationale for selecting these three specific models is their balance between accuracy and efficiency with fewer parameters, which is appropriate for mobile applications. The models were trained on a publicly accessible dataset collected from different sources. The study employs DL techniques and the following performance metrics: accuracy, F1-score, precision, recall, and specificity. The results obtained, an accuracy of 92.93% and a sensitivity of 94.79%, indicate that the proposed technique is a high-quality model.
3 Methodology
The dataset comprises 238 images for training and 39 images for testing, each
consisting of two classes—COVID and non-COVID.
The distribution of the two classes is as in Table 1.
3.2 ResNet 16
The ResNet16 architecture, as shown in Fig. 1, was used in this research with the following layer specifications. The train and test images are rescaled before being fed into the model. The size of the input image is 224 × 224 × 3. Zero padding is used along with 2D convolution, and batch normalization is used to normalize the data. Max pooling is used for the pooling layers, and the activation function used for the hidden layers is ReLU. Since it is a binary classification task of classifying images as COVID or non-COVID, the sigmoid activation function is used at the output layer, with a batch size of 32. There are 3 fully connected (FC) layers at the end of the network:
FC1: 256 units
FC2: 128 units
FC3 (output layer): 1 unit.
The detailed implementation of ResNet 16 is shown in Fig. 2. The same is available
here.
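The sketch below is a hedged rendering of the layer specification listed above (224 × 224 × 3 input, zero padding, 2D convolution, batch normalization, max pooling, ReLU, and the three FC layers with a sigmoid output); the convolutional part is heavily simplified and merely stands in for the full ResNet 16 body shown in Fig. 2.

```python
# Simplified sketch of the described head: zero padding, convolution, batch
# normalization, max pooling, then FC layers of 256, 128, and 1 (sigmoid) units.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.ZeroPadding2D(3, input_shape=(224, 224, 3)),
    layers.Conv2D(64, 7, strides=2),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.MaxPooling2D(3, strides=2),
    # ... the residual blocks of the ResNet 16 backbone would go here ...
    layers.Flatten(),
    layers.Dense(256, activation="relu"),   # FC1
    layers.Dense(128, activation="relu"),   # FC2
    layers.Dense(1, activation="sigmoid"),  # FC3 (output layer): COVID / non-COVID
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```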
The model was trained with 238 images comprising the two classes, COVID and non-COVID, and tested on 39 images belonging to either of the two classes. To make the model robust, variation in the training data was introduced by setting shear range = 0.2, zoom range = 0.2, and horizontal flip = True.
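A minimal sketch of the rescaling and augmentation settings just described, using Keras' ImageDataGenerator, is given below; the directory paths and folder layout (one sub-folder per class) are assumptions.

```python
# Augmentation as described above: rescaling plus shear, zoom, and horizontal flip.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale=1.0 / 255,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1.0 / 255)

train_gen = train_datagen.flow_from_directory("data/train",   # hypothetical path
                                              target_size=(224, 224),
                                              batch_size=32,
                                              class_mode="binary")
test_gen = test_datagen.flow_from_directory("data/test",      # hypothetical path
                                            target_size=(224, 224),
                                            batch_size=32,
                                            class_mode="binary")
```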
3.3 Inception V4
The Inception v4 architecture is briefly explained in Fig. 3. The train and test images are rescaled before being fed into the model. The dimension of the input image is 299 × 299 × 3. Zero padding is used along with 2D convolution, and batch normalization is used to normalize the data. Max pooling is used for the pooling layer, and the ReLU activation function is used for all the hidden layers. Since it is binary classification, the sigmoid activation function is used at the output layer, and the batch size considered is 8. The model was trained with 238 images comprising the two classes, COVID and non-COVID, and tested on 39 images belonging to either of the two classes. To make the model robust, variation in the training data was introduced by setting shear range = 0.2, zoom range = 0.2, and horizontal flip = True. The diagram showing the detailed implementation of Inception V4 is available here.
4 Results
ResNet 16
Using the Adam optimizer, the binary cross-entropy loss function, and 10 steps per epoch, we trained for 100 epochs with 5 validation steps.
At the end of 100 epochs of training, the train and test results are as shown in Table 2.
From Fig. 4, it is observed that initially the test loss is higher than the train loss
on an average for the first 40 epochs, and as the epochs progress, the train and test
loss become closer. Initially, the train accuracy is higher than the test accuracy on
an average for the first 40 epochs, and as the epochs progress, the train and test
accuracies become close to 90%.
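The curves in Fig. 4 can be reproduced from a Keras History object with a short plotting helper such as the one below; the history variable is assumed to come from a model.fit call like the training described above.

```python
# Plot train/test loss and accuracy curves, as in Fig. 4, from a Keras History.
import matplotlib.pyplot as plt

def plot_history(history):
    """Plot loss and accuracy for the training and validation (test) sets."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.plot(history.history["loss"], label="train loss")
    ax1.plot(history.history["val_loss"], label="test loss")
    ax1.set_xlabel("Epochs"); ax1.legend()
    ax2.plot(history.history["accuracy"], label="train accuracy")
    ax2.plot(history.history["val_accuracy"], label="test accuracy")
    ax2.set_xlabel("Epochs"); ax2.legend()
    plt.show()
```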
Fig. 4 Plots of train loss versus test loss and train accuracy versus test accuracy for various epochs
of ResNet16
The test accuracy of 97.44% indicates that the model is also able to generalize well on unseen data.
Inception V4
Using the Adam optimizer, the binary cross-entropy loss function, and 10 steps per epoch, we trained for 100 epochs with 5 validation steps. At the end of 100 epochs of training, the train and test results are as shown in Table 4.
From the graphs in Fig. 6, we see that the training loss is almost constant throughout the epochs and stays around 0, while the test loss increases until the 7th epoch, declines sharply thereafter, and becomes constant after the 20th epoch. The training accuracy increases sharply and becomes roughly constant after the 20th epoch, while the test accuracy fluctuates and reaches a constant value of 76.92% after 100 epochs. After the completion of training, other model evaluation metrics were generated on the test set (considering COVID as the positive class and non-COVID as the negative class) as shown in Table 5 (Fig. 7).
Fig. 6 Plots of train loss versus test loss and train accuracy versus test accuracy for various epochs
of Inception V4
From the results, it is observed that a precision of 0.77 indicates that 77% of the COVID predictions made by the model were correct; in 77% of the predicted COVID cases, the model accurately identified a COVID image. A recall of 1 indicates that 100% of the actual COVID images were identified correctly by the model. An F1-score of 0.87 is indicative of good model performance, since there is a slight class imbalance in the dataset. The model yields a specificity of 0, indicating that none of the actual non-COVID images were classified correctly; thus, the model tends to predict positive (COVID) most of the time. A false-positive rate of 1 indicates that the model tends to raise a false alarm, that is, predict COVID even when the actual image might be non-COVID. A training accuracy of 100% indicates that the model has low bias, that is, it has learnt the training data well. However, the test accuracy of 76.92% indicates that the model is not able to generalize well; hence, the model has slightly overfitted the data. This overfitting problem can be addressed by training the model with more images of varying characteristics.
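To make the specificity and false-positive-rate figures concrete, the toy computation below derives them from a confusion matrix with COVID as the positive class; the label arrays are illustrative only.

```python
# Specificity and false-positive rate from a confusion matrix (COVID = positive class).
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 1, 1, 1, 1, 1]   # 1 = COVID, 0 = non-COVID (hypothetical)
y_pred = [1, 1, 1, 1, 1, 1, 1, 1]   # a model that predicts COVID for everything

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
specificity = tn / (tn + fp)          # 0 here: no non-COVID image is recognized
false_positive_rate = fp / (fp + tn)  # 1 here: every non-COVID image raises an alarm
print(specificity, false_positive_rate)
```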
When the comparison is made, it is observed that ResNet16 performs better on
the COVID dataset. It has fitted the data well and is also able to generalize well.
However, due to class imbalance in the dataset, there is a tendency to classify an
image as COVID in some cases by both models.
5 Discussion
ResNet16 is certainly a more accurate classifier than Inception v4. Inception v4 suffers from an overfitting problem due to its relatively more complex architecture and is unable to generalize well. However, some of the labeled COVID images in the training dataset exhibit characteristics similar to non-COVID images, and this affects the precision and recall performance of the classifiers. ResNet16 performs better than Inception V4 on the test set, with the latter exhibiting a tendency to predict COVID positive in most cases. ResNet16 shows an improvement in performance after 40 epochs, while for Inception v4 the test accuracy keeps oscillating with an upper bound of 76.92%. The computational complexity of Inception v4, combined with its overfitting problem, makes the ResNet16 architecture better suited for this use case.
6 Conclusion
We presented two architectures for detection of COVID-19 from chest X-ray images
in this paper:
i. ResNet16 with 16 convolutional layers and 3 fully connected layers, which is a
custom-made architecture for this use case.
ii. Inception v4 that builds on previous iterations of the Inception family by
simplifying the architecture and using more inception modules than Inception
v3.
Our results indicate that ResNet16 performed better on the test set after a few epochs, while Inception v4, owing to its complexity, overfitted the data and was unable to generalize well on the test set. We saw that Inception V4 is inclined to predict COVID positive in most cases. On investigating the cause of the false positives, it was found that some images which exhibited the characteristics of non-COVID were labeled as COVID. This has likely impacted the true-positive and true-negative predictive performance of Inception V4 and, owing to the overfitting problem, caused the model to learn those image characteristics as COVID. This problem can be overcome by applying regularization to reduce the complexity of the Inception V4 model and by training on more samples.
References
4. Istaiteh O, Owais T, Al-Madi N, Abu-Soud S (2020) Machine learning approaches for COVID-
19 forecasting. In: International conference on intelligent data science technologies and
applications (IDSTA)
5. Sevi M, Aydin İ (2020) COVID-19 detection using deep learning methods. In: 2020 inter-
national conference on data analytics for business and industry: way towards a sustainable
economy (ICDABI)
6. Ghada A, Abdullah A, Nahedh H (2021) COVID-19 detection using deep learning models. In:
1st Babylon international conference on information technology and science 2021 (BICITS
2021), Babil, Iraq
7. Anwar T, Zakir S (2020) Deep learning based diagnosis of COVID-19 using chest CT-scan
images. In: 2020 IEEE 23rd international multitopic conference (INMIC). 978-1-7281-9893-
4/20/$31.00 ©2020 IEEE. https://doi.org/10.1109/INMIC50486.2020.9318212
8. Lafraxo S, El Ansari M (2020) CoviNet: automated COVID-19 detection from X-rays using
deep learning techniques. In: 2020 6th IEEE congress on information science and technology
(CiSt). 978-1-7281-6646-9/21/$31.00 ©2021 IEEE. https://doi.org/10.1109/CIST49399.2021.
9357250
9. Zheng C, Deng X, Fu Q, Zhou Q, Feng J, Ma H et al (2020) Deep learning based detection for
COVID-19 from chest CT using weak label. medRxiv
10. Zhang J, Xie Y, Li Y, Shen C, Xia Y (2020) Covid-19 screening on chest X-ray images using
deep learning based anomaly detection. arXiv preprint arXiv:2003.12338
11. Wang Y, Hu M, Li Q, Zhang XP, Zhai G, Yao N (2020) Abnormal respiratory patterns classifier
may contribute to a largescale screening of people infected with COVID-19 in an accurate and
unobtrusive manner. arXiv preprint arXiv:2002.05534
12. Wang X, Deng X, Fu Q, Zhou Q, Feng J, Ma H et al (2020) A weakly-supervised framework
for COVID-19 classification and lesion localization from chest CT. IEEE Trans Med Imaging
13. Hu S, Hoffman EA, Reinhardt JM (2001) Automatic lung segmentation for accurate
quantitation of volumetric X-ray CT images. IEEE Trans Med Imaging 20(6):490–498
14. Dong L, Hu S, Gao J (2019) Discovering drugs to treat coronavirus disease 2019 (COVID-19).
Drug Discov Ther 14(1):58–60
15. Springenberg JT, Dosovitskiy A, Brox T, Riedmiller M (2014) Striving for simplicity: the all
convolutional net. arXiv preprint arXiv:1412.6806
An Ensemble Approach for Detecting
Malaria Using Classification Algorithms
1 Introduction
Malaria is estimated to cause one million deaths annually according to the WHO report [1]. Though there are various factors that may contribute toward this life-threatening disease, such as climatic conditions, underdeveloped sanitation, and deforestation [2], it is mainly due to the infection caused by the bite of a female mosquito. Fever, headache, and other symptoms reveal the presence of the parasite in the patient's body. In our country, malaria continues to be a major threat to public health. Though this is a treatable disease and medications are available, it is important to understand the type of parasite that is found in the patient's body. The thick blood smear test is used to detect whether the malaria parasite is present, and the other blood test, done with a thin blood smear, helps find out the species of malaria parasite that caused the infection [3]. However, the precision of these tests depends upon the quality of the extracted smear and also on the classification of infected and other cells. Over time, a few more alternative methods were suggested for diagnosing malaria, such as the polymerase chain reaction (PCR) test and rapid diagnostic tests (RDTs) for malaria. However, these alternative tests have been found to have some performance issues and are less cost-effective [4, 5].
Technology can assist in the detection of the malaria parasite, thereby helping to manage the treatment more accurately and ensuring a timely service. With significant improvement possible over the current diagnosis, this research intends to study the detection of malarial parasites by using machine learning algorithms over the health
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 307
N. R. Shetty et al. (eds.), Emerging Research in Computing, Information, Communication
and Applications, Lecture Notes in Electrical Engineering 928,
https://doi.org/10.1007/978-981-19-5482-5_27
data, depending on the ability to extract features from the dataset. Machine learning is used in medical diagnosis and intelligent systems; it is about developing machines with the ability to be intelligent, and this intelligence is imparted to the application by the data it is trained with. With the many advances in data collection, data processing, and data computation, intelligent AI systems are used in multiple tasks that were once accomplished through manual intervention. From autonomous vehicles to health care [6], situations are changing so fast that it is hard to comprehend. However, the challenge in developing these AI-based applications is the availability of data. There are plenty of data available in different silos of healthcare organizations; however, most of these available data are written by hand [7]. Poorly handwritten clinical notes also pose a severe challenge for the data scientists involved in analyzing the data.
Machine learning algorithms have been used by researchers to predict malaria, as they have been for other infectious diseases. One of the sub-domains of machine learning, called deep learning, has become very popular these days. Lee et al. [8], in their work on malaria detection, used back-propagation neural networks, one of the commonly used methods in deep learning, to model the climatic factors and other attributes related to malaria.
Another familiar method, the long short-term memory (LSTM) network, was used for prediction by the researchers in [9]. Chae et al. [10], in their work, found that the prediction performance could be increased using LSTM. All the works quoted above use deep learning methods to develop models for predicting infectious diseases. However, by using just one algorithm, the accuracy may be limited [11]. Hence, an innovative method called stacked generalization has received more attention in recent days, where stacking frameworks with different machine learning algorithms can be applied together to increase the prediction performance. In another work, Bhatt et al. [12] came out with a model that displayed better prediction performance than the other models in predicting malaria prevalence. In this work, we propose a multi-classifier using different classifiers, namely the XGBoost classifier, random forest (RF), and gradient boosting.
This part of the paper discusses the data sources from which the real-time data were collected, the format of the malaria data, the data gathering process, data preprocessing, data processing, and the ensemble classifiers.
As a coastal city, Mangalore is named as one of the malaria-prone cities [13] in south India. Most of the works on malaria done in the Indian scenario
A person suffering from malaria exhibits various symptoms such as fever and flu-like sickness with chills, headache, muscle ache, and tiredness. Mostly, people suffer from fever and chills. If not identified at the right time, malaria may lead to anemia and jaundice, which may in turn lead to kidney failure, seizures, mental confusion, coma, and death [14]. These symptoms normally last from 2 days to a week. The outpatient registration and medical records departments had all the necessary information for building a model, and the data were arranged according to ICD code [15].
After obtaining the necessary approval from the Father Muller Medical College's scientific and ethical council, the clinical notes linked to malaria were retrieved from Father Muller Medical College. Malaria-related information was kept in the form of electronic medical records (EMRs). The case sheets and clinical notes were scanned and archived in the Medical Records Department's repository. As a result, the scanned images could not be used directly for data analysis; they had to be converted into a format that the machine learning algorithms can understand. The patient's clinical notes were retrieved using the inpatient number, which serves as a unique identifier for the data in the Medical Records Department and the data in the Registration Department.
2.4 Preprocessing
The original data gathered from the clinical notes were unprocessed, and most of them were handwritten. Most of these handwritten portions of the clinical notes were written by the various doctors and nurses attending to the patients undergoing treatment; only the discharge summary, which is part of the clinical notes, was typed. The quality of the data plays an important role in a machine learning project. The missing entries, inconsistencies, and typographical and semantic errors in the raw data were clarified and rectified based on discussions with the healthcare professionals assigned for that purpose. This step does not by itself give any meaningful insight; however, it helps identify the right assumptions for the analysis and the features that have to be extracted. We used the Tesseract optical character recognition (OCR) engine [16] for extracting the raw data from the clinical notes. However, the accuracy of the extracted data was moderate, depending upon the clarity, blur, and noise of the images, so the extracted data had to be manually checked by the healthcare professionals. This was followed by pattern identification. Preprocessing of the data was done in four stages.
Cleansing the Data: This step was done with the help of the healthcare professionals
from the medical college hospital, who helped in identifying the errors that have crept
into the data.
Integrating the Data: The data that were available in two places were captured and
integrated. The medical records section provided the clinical data. The IP No, which
serves as a main key, was then used to combine both slices of data pertaining to a
patient.
Transforming the Data: This crucial step takes care of transforming the data from
its raw format to a format which is computable. This transformation also makes sure
the original intended meaning of the data is not lost.
Dimensionality Reduction: This step is important with reference to the processing of the application. It ensures that no repeated data are present and that data not relevant to the analysis are pruned.
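For illustration, the snippet below sketches the Tesseract OCR extraction step mentioned before the four stages, using the pytesseract wrapper; the file name is hypothetical, and the real pipeline additionally required manual checking by healthcare professionals as described above.

```python
# Illustrative OCR extraction from a scanned clinical-note image using Tesseract.
from PIL import Image
import pytesseract  # assumes Tesseract and the pytesseract package are installed

def extract_text(image_path: str) -> str:
    """Run Tesseract OCR on one scanned clinical-note page."""
    return pytesseract.image_to_string(Image.open(image_path))

raw_text = extract_text("scanned_case_sheet_001.png")  # hypothetical file name
print(raw_text[:200])
```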
Based on the discussions with the physicians, a data dictionary was created, which acts as metadata for the efficient and smooth processing of the data, as shown in Table 1.
Machine learning techniques are basically algorithms that try to find the relationships between the different features found in a dataset. Most machine learning techniques are classified into supervised and unsupervised learning. Supervised learning refers to a set of methods that are trained on a set of factors mapped to a target label. A machine learning model that produces discrete categories [17] is called a classification model. Examples include predicting whether a person has malaria or not, and predicting whether a tumor is malignant or benign. In medicine, such classification problems do exist, and classification algorithms are used in those areas. There are many classification algorithms available. In this research work, we have used a few algorithms, namely the random forest algorithm, the gradient boosting algorithm, and the XGBoost algorithm. XGBoost is a classification algorithm that is quite popular these days, since it is used in many machine
learning and Kaggle competitions [18] that deal with structured data. This algorithm is mainly designed for fast response and good performance. Gradient boosting is one of the widely used classification algorithms [19]. The random forest classification algorithm is a supervised machine learning algorithm [20] based on decision trees; it is also used in many practical applications in finance and health care. All the above algorithms, together with bagging and boosting [21], are normally referred to as ensemble learning algorithms. Our earlier work in this area was done using standard machine learning algorithms [22, 23]. Ensemble methods are machine learning methods that combine many base models [24] to provide the best optimal result.
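A minimal sketch of training and comparing the three ensemble classifiers named above is given below; it uses synthetic data and default hyperparameters, so it illustrates the workflow rather than the study's actual feature set or tuning.

```python
# Train and compare the three ensemble classifiers on synthetic tabular data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from xgboost import XGBClassifier   # assumes the xgboost package is installed

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "Random forest": RandomForestClassifier(random_state=0),
    "Gradient boosting": GradientBoostingClassifier(random_state=0),
    "XGBoost": XGBClassifier(eval_metric="logloss", random_state=0),
}
for name, clf in models.items():
    clf.fit(X_tr, y_tr)
    print(name)
    print(classification_report(y_te, clf.predict(X_te)))   # precision/recall/F1/accuracy
```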
The analysis done on the collected data revealed a few important insights, which are displayed in Figs. 1, 2, 3 and 4.
The raw data captured from the patients' case sheets are now transformed into a format that is suitable for creating a simple model, which can then perform the required function. Since our task is related to classification, a few algorithms that are known to perform well are used in this research work (Figs. 5 and 6).
The ensemble methods give better results than other traditional algorithms, since they use different base methods within them. The model was trained using the XGBoost classifier, the gradient boosting classification algorithm, and the random forest classification algorithm. The different performance measures were compared with one another, and the metrics are displayed in Table 2.
Table 2 Comparison of the ensemble algorithms with respect to the performance metrics
Algorithm Precision Recall F1-score Accuracy (%)
Random forest classifier 0.77 0.71 0.74 90.97
XGBoost classifier 0.73 0.76 0.75 82.08
Gradient booster classifier 0.70 0.56 0.63 68.47
4 Conclusion
This study, which was based on malaria patients' clinical records, offers insight into the types of symptoms that patients encounter before being brought to the hospital. It also investigates the efficacy of malaria diagnosis and therapy. The sooner a doctor checks a patient based on their symptoms, the more likely the treatment will be beneficial. This study was based on information from a single source; more data from different hospital settings and locations could help the system perform better. Other traditional methods have been demonstrated to be less effective than ensemble approaches, so it is crucial to determine which ensemble method is the most effective. In this investigation, the random forest classifier achieved an accuracy of 90.97%, the highest among the techniques compared. The same approaches employed in the preparation phases can be utilized to gather data and transform raw clinical data into valuable data in any medical setting.
Acknowledgments Authors acknowledge, that this work was carried out in the Big Data Analytics
Lab funded by VGST, Govt. of Karnataka, under K-FIST(L2)-545, and the data were collected from
Father Muller Medical College, protocol no: 126/19 (FMMCIEC/CCM/149/2019).
References
8. Lee KY, Chung N, Hwang S (2016) Application of an artificial neural network (ANN) model
for predicting mosquito abundances in urban areas. Ecol Inform 172–180
9. Gers FA, Schmidhuber J, Cummins F (2000) Learning to forget: continual prediction with
LSTM. Neural Comput 12(10):2451–2471. https://doi.org/10.1162/089976600300015015
10. Chae S, Kwon S, Lee D (2018) Predicting infectious disease using deep learning and big data.
Int J Environ Res Public Health 5(8):1596
11. Wang M, Wang H, Wang J, Liu H, Lu R, Duan T et al (2019) A novel model for malaria
prediction based on ensemble algorithms. PLoS ONE 14(12). https://doi.org/10.1371/journal.
pone.0226910
12. Bhatt S, Cameron E, Flaxman SR, Weiss DJ, Smith DL, Gething PW (2017) Improved predic-
tion accuracy for disease risk mapping using Gaussian process stacked generalization. J R Soc
Interface 14(134)
13. Shivakumar, Rajesh BV, Kumar A, Achari M, Deepa S, Vyas N (2015) Malarial trend in
Dakshina Kannada, Karnataka: an epidemiological assessment from 2004 to 2013. Indian J
Health Sci 8:91–94
14. Symptoms of malaria. https://www.cdc.gov/malaria/about/faqs.html. Accessed 26 June 2021
15. ICD code of malaria. https://www.icd10data.com/ICD10CM/Codes/A00-B99/B50-B64/
B54-/B54. Accessed 1 July 2021
16. Smith R (2007) An overview of the Tesseract OCR engine. In: Proceedings of ninth international
conference on document analysis and recognition (ICDAR). IEEE Computer Society, pp 629–
633
17. Sidey-Gibbons JAM, Sidey-Gibbons CJ (2019) Machine learning in medicine: a practical
introduction. BMC Med Res Methodol 19:64. https://doi.org/10.1186/s12874-019-0681-4
18. XGBOOST machine learning algorithm. https://machinelearningmastery.com/gentle-introduct
ion-xgboost-applied-machine-learning/. Accessed 2 July 2021
19. Gradient boost machine learning algorithm. https://towardsdatascience.com/machine-learning-
part-18-boosting-algorithms-gradient-boosting-in-python-ef5ae6965be4. Accessed 2 July
2021
20. Random forest machine learning algorithm. https://www.section.io/engineering-education/int
roduction-to-random-forest-in-machine-learning/. Accessed 2 July 2021
21. Lohumi P, Garg S, Singh TP, Gopal M (2020) Ensemble learning classification for medical diag-
nosis. In: 5th international conference on computing, communication and security (ICCCS),
pp 1–5. https://doi.org/10.1109/ICCCS49678.2020.9277277
22. Ruban S, Rai S (2021) Enabling data to develop an AI-based application for detecting malaria
and dengue. In: Tanwar P, Kumar P, Rawat S, Mohammadian M, Ahmad S (eds) Computational
intelligence and predictive analysis for medical science: a pragmatic approach. De Gruyter,
Berlin, Boston, pp 115–138. https://doi.org/10.1515/9783110715279-006
23. Ruban S, Naresh A, Rai S (2021) A noninvasive model to detect malaria based on symptoms
using machine learning. In: Advances in parallel computing technologies and applications. IOS
Press, pp 23–30
24. Ensemble methods. https://towardsdatascience.com/ensemble-methods-in-machine-learning-
what-are-they-and-why-use-them-68ec3f9fef5f. Accessed 10 July 2021
IoT-Enabled Intelligent Home Using
Google Assistant
1 Introduction
Home is the place where one wishes to feel comfortable after working the whole day. Some people have exhausting working hours, and in such times life becomes pleasant if a device or technology could switch the lights on or off, play favorite music, control the geyser, and adjust the room temperature before one reaches home, just through simple voice commands on a smartphone [1–8]. Housekeepers have long been a way for rich people to keep their homes in order with ease; yet even after technological advancements, only the houses of the rich are equipped with the latest smart home devices, as they cost too much. Hence, realizing a low-cost smart device for home automation for ordinary families is the need of the hour [10–13].
This paper proposes an affordable system which uses the ESP 8266 Node MCU IC and a relay board with 4 relays as the major hardware elements, and the Google Assistant, IFTTT, and BLYNK applications as the major software components. All the elements are interconnected over the Internet using Wi-Fi, which places this system under the Internet of things (IoT) [14–17] (Fig. 1).
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 317
N. R. Shetty et al. (eds.), Emerging Research in Computing, Information, Communication
and Applications, Lecture Notes in Electrical Engineering 928,
https://doi.org/10.1007/978-981-19-5482-5_28
2 System Architecture
The architecture explains the home automation system. The voice commands given by the user are sent through the Google Assistant application, which is the front end of our project. In the background, the BLYNK application takes command of the hardware circuit designed as shown in Fig. 2; BLYNK is an IoT application which connects with the ESP 8266 Node MCU over Wi-Fi through a hotspot. When a voice command is given on the smartphone, the corresponding node of the MCU actuates the concerned relay, and the home appliance starts functioning as per the voice command. This is how the system functions using the Internet of things (IoT). In our project, we control four home appliances using four relays [18–20].
The system hardware mainly comprises an ESP 8266 Node MCU, four relays, a 5 V DC source, a 220 V 50 Hz domestic supply, four home appliances, a smartphone, and Wi-Fi connectivity.
3 Design Implementation
Initially, we write the code for the functioning of the proposed project using the Arduino software. The code is written in the Arduino application and uploaded into the ESP 8266 Node MCU board; the uploaded code is shown in Fig. 3.
The Arduino application software is used to write the code that hands control to the BLYNK application; thereby, the BLYNK application on the smartphone takes full control over the circuit and acts as the heart of the project, communicating the commands from the Wi-Fi module to the home electrical equipment.
We cannot directly connect Google Assistant with BLYNK; therefore, we use the If This Then That (IFTTT) application as an intermediate interfacing tool between Google Assistant and BLYNK.
In order to activate Google Assistant, we start with the "OK GOOGLE" command. Once Google Assistant is activated, the commands used to control the light, TV, and fan are given one after the other. For each command given as input, the respective output system will be controlled (Fig. 4).
Flowchart: A simple flowchart defining the operation of the home automation model is given in Fig. 5.
Operation: In Case 1, we consider performing the ON and OFF operation of a television. We provide the command "Switch ON the TV" to Google Assistant, and with the help of the Wi-Fi module the TV gets switched ON. The command "Switch OFF the TV" is used to switch the TV off. The operation can be performed from anywhere, with the only condition that there is Internet connectivity. In the same way, we consider the ON and OFF operation of a fan in Case 2, a light in Case 3, and an air conditioner in Case 4.
Fig. 3 ESP8266_Standalone | Arduino 1.8.15 code
4 Software Applications
4.2 BLYNK
BLYNK is the software application which allows us to interface the smartphone with the ESP 8266 microcontroller unit. The application takes charge after the requisite library and Arduino code are uploaded into the microcontroller. After loading the code, the BLYNK software takes over control of the microcontroller's input and output nodes; therefore, all the electrical appliances are controlled using this application (Fig. 6).
IFTTT stands for "If This Then That". IFTTT is a Website launched in 2010 with the motto "put the Internet to work for you". The main vision of IFTTT is to automate everything, from our favorite apps and Websites to application-enabled accessories and smart devices. Here, the IFTTT application is used as an intermediate platform between the Google Assistant and the BLYNK application.
After logging into the IFTTT Web page, we need to create an applet and then choose "This", i.e., the trigger; here, we select Google Assistant and then type the command to which the Google Assistant should respond, and on this command the concerned home appliance actuates. The response from the Google Assistant can also be written as desired (Fig. 7).
Google Assistant is a software application that permits users to have direct control over all the applications on the device using voice commands. It eases use, more specifically for disabled people such as the blind, as they only have to give voice commands to the Google Assistant. It is an easy-to-use application, and its interfacing with BLYNK is done through IFTTT (Fig. 8).
5 Results
Table 1 Response of Google Assistant against the voice command delivered for 15 instances
Trial no. Voice command delivered (correct/wrong) Response from Google Assistant (accepted/rejected)
1 Correct Accepted
2 Correct Accepted
3 Wrong Rejected
4 Correct Accepted
5 Correct Accepted
6 Correct Accepted
7 Wrong Rejected
8 Correct Accepted
9 Correct Accepted
10 Correct Accepted
11 Wrong Rejected
12 Correct Accepted
13 Wrong Rejected
14 Correct Accepted
15 Correct Accepted
Fig. 9 Switching ON condition
6 Conclusion
The goal of this article is to design and realize an IoT-based intelligent home using Google Assistant for managing common household appliances for the people in need. As the Google Assistant-controlled home automation concept was effectively executed, the technique outlined in the paper was successful.
This article proposes and realizes a low-cost IoT-based intelligent home based on the ESP 8266 Node MCU microcontroller. Overall, Arduino is simple to comprehend and program. We can also assure energy conservation and efficiency of our appliances with the aid of this technology. In particular, disabled people such as the blind can have total control over their household appliances from afar. It also improves human comfort and reduces human effort.
References
1. Li RYM, Li HCY, Mak CK, Tang TB (2016) Sustainable smart home and home automation:
big data analytics approach. Int J Smart Home 10(8):177–198
2. Sharma R, Chirag P, Shankar V (2014) Advanced low-cost security system using sensors,
Arduino and GSM communication module. In: Proceedings of IEEE TechSym 2014 satellite
conference, VIT University
3. Javale D, Mohsen M, Nandewar S, Shingate M (2013) Home automation and security using
Android ADK
4. Yavuz E, Hasan B, Serkan I, Duygu K (2007) Safe and secure PIC based remote control
application for intelligent home. Int J Comput Sci Netw Secur 7(5)
5. Sriskanthan N, Karand T (2002) Bluetooth based home automation system. J Microprocess
Microsyst 26:281–289
6. Kusuma SM (1999) Home automation using internet of things
7. Shrotriya N, Kulkarni A, Gadhave P (1996) Smart home using Wi-Fi. Int J Sci Eng Technol
Res (IJSETR)
8. Celtek SA, Durgun M, Soy H (2017) Internet of things based smart home system design
through wireless sensor/actuator networks. In: 2017 2nd international conference on advanced
information and communication technologies (AICT), pp 15–18
9. Guravaiah K, Velusamy RL (2019) Prototype of home monitoring device using internet of
things and river formation dynamics-based multi-hop routing protocol (RFDHM). IEEE Trans
Consum Electron 65(3):329–338
10. Li X, Lu R, Liang X, Shen X, Chen J, Lin X (2011) Smart community: an internet of things
application. IEEE Commun Mag 49(11):68–75
11. Wang D (2016) The internet of things the design and implementation of smart home control
system. In: 2016 international conference on robots & intelligent system (ICRIS), pp 449–452
12. Tang S, Kalavally V, Ng KY, Parkkinen J (2017) Development of a prototype smart home
intelligent lighting control architecture using sensors onboard a mobile computing system.
Energy Build 138:368–376
13. Kelly SDT, Suryadevara NK, Mukhopadhyay SC (2013) Towards the implementation of IoT
for environmental condition monitoring in homes. IEEE Sens J 13(10):3846–3853
14. Suryanegara M, Arifin AS, Asvial M, Wibisono G (2017) A system engineering approach
to the implementation of the internet of things (IoT) in a country. In: 2017 4th international
conference on information technology, computer, and electrical engineering (ICITACEE), pp
20–23
15. Arifin AS, Suryanegara M, Firdaus TS, Asvial M (2017) IoT based maritime application: an
experiment of ship radius detection. In: Proceedings of the international conference on big data
and internet of thing, London
16. Atzori L, Iera A, Morabito G (2010) The internet of things: a survey. Comput Netw 2787–2805
17. Chebudie AB, Minerva R, Rotondi D (2014) Towards a definition of the internet of things (IoT)
18. Al-Fuqaha A, Guizani M, Mohammadi M, Aledhari M, Ayyash M (2015) Internet of things:
a survey on enabling technologies, protocols, and applications. IEEE Commun Surv Tutor
17(4):2347–2376
19. Pavithra D, Balakrishnan R (2015) IoT based monitoring and control system for home
automation. In: 2015 global conference on communication technologies (GCCT), pp 169–173
20. Schneider GP (2015) Electronic commerce. Cengage Learning, Stamford, CT
Analysis of a Microstrip Log-Periodic
Dipole Antenna with Different Substrates
1 Introduction
S. Prasad (B)
Department of Electronics and Communication Engineering, Andhra University College of
Engineering, Visakhapatnam, A.P., India
e-mail: [email protected]
S. Aruna
Andhra University College of Engineering, Visakhapatnam, India
K. Srinivasa Naik
Vignan’s Institute of Information Technology, Visakhapatnam, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 327
N. R. Shetty et al. (eds.), Emerging Research in Computing, Information, Communication
and Applications, Lecture Notes in Electrical Engineering 928,
https://doi.org/10.1007/978-981-19-5482-5_29
The lower the dielectric constant of the substrate, the larger the size of the antenna, but the better its efficiency and the larger its bandwidth. The choice of dielectric constant (εr) is limited by the radio frequency or microwave circuit connected to the antenna. When substrates with higher dielectric constants are used, the performance degrades [2]. To achieve a larger bandwidth, the antenna is designed without a ground plane [3].
In this work, a microstrip log-periodic antenna is first designed, and its radiation parameters are then compared for different substrate materials. A log-periodic array is a collection of dipole antennas of various sizes that are linked together and fed alternately through a common transmission line. The design variables are the apex angle (α), the scale factor (τ), and the spacing factor (σ) [4].
The spacing factor (σ), scale factor (τ), and substrate permittivity (εr) are important factors in the antenna design. The bandwidth of a microstrip antenna is inversely related to the substrate permittivity (εr), or dielectric constant. Therefore, five different substrate materials are used here to study the antenna radiation characteristics: return loss, VSWR, directivity, and gain.
Figure 1 shows the geometrical dimensions of the antenna: the dipole lengths $L_n$, the spacings $R_n$, the gap spacings at the dipole centers $S_n$, and the diameters $d_n$ of the LPDA. The defining relationship [5] is

$$\frac{L_n}{L_{n+1}} = \frac{D_n}{D_{n+1}} = \tau = 0.802$$
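As a quick worked illustration of this geometric progression (the 60 mm starting length and the number of elements below are arbitrary values chosen only for the sketch, not the dimensions of the present design), successive dipole lengths can be generated directly from τ:

```python
# Generate successively shorter dipole lengths from the scale factor tau = 0.802,
# starting from a hypothetical longest element of 60 mm (illustrative value only).
tau = 0.802
lengths_mm = [60.0]
for _ in range(5):
    # Each shorter element is tau times the previous one: L_n = tau * L_(n+1).
    lengths_mm.append(lengths_mm[-1] * tau)

print([round(l, 2) for l in lengths_mm])
# -> [60.0, 48.12, 38.59, 30.95, 24.82, 19.91]
```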
Fig. 2 a Microstrip log-periodic antenna design. b Microstrip log-periodic antenna, substrate view
The antenna was designed and its properties analyzed using ANSOFT HFSS 15.0 software. To examine the output characteristics, five different substrate materials were employed (Table 2).
Figures 4, 8, 12, 16, and 20 show the simulated VSWR plots. See Table 3.
Table 3 Results

Substrate       Dielectric constant   S11 (return loss, dB)   VSWR (max)   Gain
Duroid          2.2                   −16.2                   1.92         2.85
Taconic TLC     3.3                   −24.62                  1.78         3.03
Roger RO4003    3.55                  −23.84                  1.73         2.99
FR4 Epoxy       4.4                   −30.36                  1.62         3.14
Roger RO3005    6.15                  −36.68                  1.57         5.2
5 Conclusion
The properties of the microstrip log-periodic antenna were tabulated for the five different substrates. Comparing the results shows that as the dielectric constant of the substrate increases, the return loss, VSWR, and gain improve, and the antenna’s efficiency therefore improves. Among the materials considered, Roger RO3005 is the best substrate for this antenna. When choosing a substrate material, other factors such as size, price, availability, and loss tangent must also be taken into account.
6 Future Perspective
References
1. Rahim MKA, Gardner P (2004) The design of nine element quasi microstrip log periodic antenna.
In: RF and microwave conference, RFM 2004, proceedings, 5–6 Oct 2004, pp 132–135
2. Jain K, Gupta K. Different substrates use in microstrip patch antenna—a survey. Int J Sci Res
(IJSR). ISSN (Online): 2319-7064
3. Casula G, Montisci G, Mazzarella G (2013) A wideband PET inkjet-printed antenna for UHF
RFID. IEEE Antennas Wirel Propag Lett 12:1400–1403
4. Malusare S, Patil V, Wakode S, Bhilegaonkar SM. Microstrip log periodic antenna array. Int J
Adv Manag Technol Eng Sci. ISSN no: 2249-7455
5. Balanis CA (2016) Antenna theory: analysis and design, 4th edn. Wiley, Hoboken, NJ
Detection of Diabetic Retinopathy Using
Fundus Images
1 Introduction
One of the key concerns of modern health care is the rapidly growing rate of diabetes
as the number of people suffering from the condition is increasing at an alarming rate.
Over the next 10 years, the World Health Organization predicts that the number of people with diabetes will rise from 135 to 400 million [1]. The fact that only half of the patients are aware of the ailment aggravates the issue. Diabetes, from a medical standpoint,
causes serious late consequences. Heart disease, kidney difficulties, and retinopathy
are among the consequences that might occur as a result of macro and microvascular
alterations. Diabetic Retinopathy is a condition that develops as a result of long-term
diabetes [2]. Arteriosclerosis, or the hardening and thickening of artery walls, plays
a role in the development of cardiovascular illnesses, which are the leading cause of
mortality in persons over the age of 45.
Figure 1 depicts the difference between a healthy retina and a non-healthy retina.
The retina is the only place in the body where blood vessels may be seen noninva-
sively and in real-time. Digital ophthalmoscopes can now obtain very clear images
of the retina, save them in a digital format, and do automated image processing and
analysis. Despite the fact that this concept has piqued the interest of several research
organizations, the problem remains unsolved. The retinal images impose important
technical challenges, both while capturing and while processing them.
In the context of this study, the retina is the light-sensitive layer of the eye that
is the most essential anatomical portion of the eye. The retina is a multi-layered
structure made up of several cells that convert light into energy, pre-process visual
information, and send neurological signals. Next to the choroid and pigment epithe-
lium, the photoreceptive layer is the furthest away from the pupil. The retina receives
a twofold blood supply from the top and bottom of the layer; the component that
comes via the choroid provides 65% of the blood supply, while the portion that comes
from the top of the retina provides 35%. The photoreceptive cells are separated into
rods and cones, which provide achromatic and color vision, respectively.
The fovea, which has the largest density of cones, is responsible for pin-focus
high-resolution coloring. Rods outnumber cones on the remainder of the retinal
surface. The Optic Disc (OD) is the portion of the retina where nerve fibers and blood vessels enter the retina; it contains no photoreceptive cells, hence the name “blind spot”. One artery and one vein enter the retina inside the OD and then branch
out to fill the retinal tissue. From a technological standpoint, each vessel creates a
tree-like structure in actual three-dimensional space, with one root at the OD. Two-
dimensional projections of the trees overlap in the retinal pictures, generating vessel
crossings and cycles. However, even in two-dimensional projections, the arteries do
not cross arteries and veins do not cross veins [3].
2 Literature Survey
• Feng et al. [4] investigate the relationship between DenseNet performance and connection density. The authors first give a brief introduction to CNNs and the evolution of the related algorithms, and then explain connection trimming of DenseNet, in which the reduction of the connections within a dense block is elaborated. The implementation targets small image inputs such as CIFAR and SVHN. They used a 264-layer DenseNet design made up of four dense blocks of six, twelve, sixty-four, and forty-eight layers,
3 Methodology
DenseNet was proposed by Cornell University, Tsinghua University, and Facebook AI Research in the paper “Densely Connected Convolutional Networks” [5]. It is an architecture that focuses on making deep learning networks go even deeper while, at the same time, making them more efficient to train by using shorter connections between the layers [8]. DenseNet achieves accuracy similar to ResNet on the large-scale ILSVRC 2012 (ImageNet) dataset while employing less than half the number of parameters and roughly half the number of floating-point operations. DenseNet has been used on a variety of datasets, and different sorts of dense blocks are utilized depending on the dimensionality of the input. The following is a basic summary of these layers:
i. Basic DenseNet Composition Layer: Each layer is followed by a pre-activated
batch normalization layer, ReLU activation function, and 3 × 3 convolution in
this form of dense block (Fig. 3).
ii. Bottleneck DenseNet (DenseNet-B): Because each layer generates k output
feature maps, computation becomes more difficult at each level. As a result, a
bottleneck structure is adopted, with 1 × 1 convolutions used before a 3 × 3
convolution layer [10] (Fig. 4).
iii. DenseNet Compression: To increase model compactness, the number of feature maps is reduced at the transition layers. If a dense block has m feature maps, the transition layer produces ⌊θm⌋ output feature maps, where 0 < θ ≤ 1 is the compression factor. The number of feature maps across transition layers remains unaltered if θ = 1. If θ < 1, the architecture is known as DenseNet-C, and θ is typically set to 0.5. The model is known as DenseNet-BC when both the bottleneck layers and transition layers with θ < 1 are employed.
iv. Multiple Dense Blocks with Transition Layers: In the overall design, each dense block is followed by a transition layer consisting of a 1 × 1 convolution layer and a 2 × 2 average pooling layer. Feature maps can only be concatenated when their sizes are the same, which is why the dense blocks are separated by these transition layers. A global average pooling is performed at the end of the last dense block and is linked to a softmax classifier (Fig. 5); a minimal code sketch of these building blocks is given below.
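To make these building blocks concrete, the following is a minimal sketch using the tf.keras functional API. The input size (224 × 224 × 3), growth rate, block depths, and the five output classes are illustrative assumptions, not the exact configuration of the DenseNet121 evaluated in the next section.

```python
import tensorflow as tf
from tensorflow.keras import layers

def bottleneck_layer(x, growth_rate):
    # DenseNet-B composition: BN -> ReLU -> 1x1 conv, then BN -> ReLU -> 3x3 conv.
    y = layers.BatchNormalization()(x)
    y = layers.ReLU()(y)
    y = layers.Conv2D(4 * growth_rate, 1, use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(growth_rate, 3, padding="same", use_bias=False)(y)
    # Dense connectivity: concatenate the new feature maps with all previous ones.
    return layers.Concatenate()([x, y])

def dense_block(x, num_layers, growth_rate=12):
    for _ in range(num_layers):
        x = bottleneck_layer(x, growth_rate)
    return x

def transition_layer(x, compression=0.5):
    # Compression (theta = 0.5): halve the feature maps, then downsample spatially.
    filters = int(int(x.shape[-1]) * compression)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(filters, 1, use_bias=False)(x)
    return layers.AveragePooling2D(2)(x)

inputs = layers.Input(shape=(224, 224, 3))
x = layers.Conv2D(64, 7, strides=2, padding="same", use_bias=False)(inputs)
x = layers.MaxPooling2D(3, strides=2, padding="same")(x)
for depth in (6, 12, 24):                  # illustrative block depths
    x = transition_layer(dense_block(x, depth))
x = dense_block(x, 16)
# Global average pooling at the end of the last dense block, linked to a softmax classifier.
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(5, activation="softmax")(x)  # e.g., five DR severity grades

model = tf.keras.Model(inputs, outputs)
model.summary()
```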
4 Results
The results obtained from training the CNN model are stated in the form of tables. Hyperparameters are values that are given to the network when it is created; the network cannot learn these values during training. The image size [9], kernel size, number of layers in the neural network, batch size, and number of epochs to train are some of these factors. The various terminologies used in this section are:
• Batch size: the number of training instances processed in one forward/backward pass. More memory space is required as the batch size grows.
• Epoch: one complete pass of the whole dataset forward and backward through the neural network. Because a single epoch is usually too large to feed to the network at once, it is split into smaller batches [11].
• Accuracy: the number of correct predictions divided by the total number of predictions.
• Train accuracy: the accuracy of the model on the examples it was trained on.
• Validation accuracy: the accuracy of the model on examples it has not seen.
• Kappa: the Kappa statistic measures how well the cases categorized by the machine learning classifier match the ground-truth labels, while accounting for the agreement expected from a random classifier [12]; a minimal computation is sketched below.
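The accuracy and Kappa metrics above can be computed with scikit-learn. The grade labels below are hypothetical values used only to show the call; quadratic weighting is one common choice for ordinal DR grades.

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score

# Hypothetical ground-truth and predicted DR grades (0-4) for ten fundus images.
y_true = [0, 1, 2, 2, 0, 3, 4, 1, 0, 2]
y_pred = [0, 1, 2, 1, 0, 3, 3, 1, 0, 2]

print("accuracy:", accuracy_score(y_true, y_pred))
# Weighted kappa penalises predictions that are further from the true grade more heavily.
print("kappa:", cohen_kappa_score(y_true, y_pred, weights="quadratic"))
```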
For the DenseNet model, the results of the various trials obtained by changing parameters such as the image size, number of epochs, and batch size are shown in Tables 4, 5, and 6.
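A minimal sketch of how the image size, batch size, and number of epochs enter a training run, assuming a recent TensorFlow/Keras version, a hypothetical directory of fundus images sorted into per-grade subfolders, and the stock DenseNet121 with five output classes:

```python
import tensorflow as tf

IMAGE_SIZE, BATCH_SIZE, EPOCHS = (224, 224), 32, 10  # hyperparameters varied per trial

# Hypothetical dataset layout: fundus_images/train/<grade>/<image>.png
train_ds = tf.keras.utils.image_dataset_from_directory(
    "fundus_images/train", image_size=IMAGE_SIZE, batch_size=BATCH_SIZE)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "fundus_images/val", image_size=IMAGE_SIZE, batch_size=BATCH_SIZE)

# DenseNet121 trained from scratch; five output classes (DR grades) are assumed.
model = tf.keras.applications.DenseNet121(
    weights=None, input_shape=IMAGE_SIZE + (3,), classes=5)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Each epoch is one full pass over the training set, processed in batches.
history = model.fit(train_ds, validation_data=val_ds, epochs=EPOCHS)
```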
5 Conclusion
The CNN model DenseNet121 was tested on the data. The comparison shows that DenseNet121 offers good accuracy for the detection of diabetic retinopathy from fundus images, and its prediction error rate is found to be better than that of other CNN models.
Although diabetic retinopathy detection offers a unique opportunity to prevent a significant proportion of vision loss, the results of this project cannot on their own be considered a final diagnosis (i.e., without the consultation and analysis of a physician), as the system does not deliver 100% accurate results. It can nevertheless be used by physicians, since it saves a significant amount of time and effort, reduces human error, and helps physicians better judge the state of the disease and the patient.
References
1. Author F (1998) World Diabetes, A newsletter from the World Health Organization, 4
2. Solanki MS (1998) CS365: artificial intelligence: diabetic retinopathy detection using eye.
World Diabetes, A newsletter from the World Health Organization, 4
3. Umarfarooq AS. Blood vessel identification and segmentation from retinal images for diabetic
retinopathy
4. Feng X, Yao H, Zhang S (2019) An efficient way to refine DenseNet. Springer-Verlag London
Ltd., part of Springer Nature
5. Huang G, Liu Z, van der Maaten L, Weinberger KQ (2018) Densely connected convolutional
networks. Facebook AI Research, 28 Jan 2018
6. Mishra S, Hanchate S, Saquib Z (2020) Diabetic retinopathy detection using deep learning.
In: International conference on smart technologies in computing, electrical and electronics
(ICSTCEE 2020)
7. Albahli S, Nazir T, Irtaza A, Javed A. Recognition and detection of diabetic retinopathy using
DenseNet-65 based faster-RCNN. Comput Mater Contin. https://doi.org/10.32604/cmc.2021.
014691
8. Xia M, Song W, Sun X, Liu J, Ye T, Xu Y (2019) Weighted densely connected convolutional
networks for reinforcement learning. Int J Pattern Recognit Artif Intell. https://doi.org/10.1142/
S0218001420520011
9. Luke JJ, Joseph R, Balaji M (2019) Impact of image size on accuracy and generalization of
convolutional neural networks. Int J Res Anal Rev (IJRAR)
10. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image
recognition. arXiv preprint arXiv:1409.1556
11. Kale S, Sekhar A, Sridharan K (2021) SGD: the role of implicit regularization, batch-size and
multiple epochs. In: 35th conference on neural information processing systems (NeurIPS 2021)
12. Ben-Davi A (2008) Comparison of classification accuracy using Cohen’s weighted kappa.
Expert Syst Appl 34(2):825–832. https://doi.org/10.1016/j.eswa.2006.10.022
Artificial Intelligence in the Tribology:
Review
Manoj Rajankunte Mahadeshwara , Santosh Kumar ,
and Anushree Ghosh Dastidar
1 Introduction
Considerable time is spent in performing tribological experiments and analyzing the results obtained; experiments are also expensive, and modern techniques are required to minimize the time and cost involved. As a result, the application of computers in the field
of mechanical systems has considerably increased over time [1]. Artificial neural
network (ANN) is a method that plays a significant role in this application. It has been
proved that ANN can effectively minimize the cost and time involved in conducting
experiments [2].
This approach was formulated to study various tribological properties such as
coefficient of friction, wear, lubricant properties, film thickness formation, other
surface mechanical properties of composites, polymers, and so on. The models
designed by ANN are enabled to predict the performance of a mechanism in the
conceptual phase by using the critical performance parameters of the experiment.
ANN is a mathematical tool that resembles the nervous system in the human brain.
It accepts the required input and output data and solves complex engineering and
scientific problems [3]. This mathematical technique is used for simulating and under-
standing mechanisms that are otherwise difficult to describe by experimental proce-
dures. Furthermore, ANN has the capability of predicting the output with limited
M. R. Mahadeshwara (B)
University of Leeds, Leeds LS29JT, UK
e-mail: [email protected]
S. Kumar
BMS Institute of Technology and Management, Bengaluru, India
e-mail: [email protected]
A. G. Dastidar
Queens University Belfast, Belfast BT71NN, UK
e-mail: [email protected]
input data after the learning process which is not possible in the case of conventional
analytical techniques [4]. A major advantage of ANN is its superior learning poten-
tial and its capability to build models of multi-dimensional, nonlinear, and complex
functions. The ANN ‘learns’ by organizing the experimental data provided and by
assuming the nature of the relationships in the given problem [5]. The insensitivity
of the neural network to minute changes such as noise helps in avoiding errors. In
recent times, experiments such as pin-on-disk (POD), fretting wear test, tensile, and
compressive strength experiments have been investigated using ANN [6].
An ANN is composed of basic units called artificial nodes, which resemble the
neurons in the human brain. These neurons connect to form synapses and send
signals. The neurons receive the signal, process it, and transmit it to the next receiving
neuron. The input to each neuron is the output of the previous layer of neurons [7]. The
input values are scaled in distinct functions such as linear, logistic, and hyperbolic
tangential functions. These inputs are multiplied with a weight factor that imitates
the synaptic strength in the nervous system of the brain. They also determine the
activation level of the neuron where it is manipulated by transfer functions to obtain
the output signal. In most cases, the transfer function can be a sigmoid or logistic
function of the form in Eq. (1).
$$f(x) = \frac{1}{1 + e^{-x}} \qquad (1)$$
However, the transfer function can also be any function that represents the
nonlinear characteristics of the system [8]. Complex relations can be modeled by
using multiple neurons in single or in multiple layers. There are several types of neural
network models with three things in common, namely the neurons, the connections,
and the learning rules. Figure 1 illustrates the artificial neuron. A neural network
model has an output that is dependent only on the input variables and the weight
function. However, this can be modified by recurrent models where the output is
re-circulated back to the neurons in the same or previous layers so the output thus
generated will be changed at every stage [9].
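A minimal numerical sketch of such an artificial neuron, with illustrative (assumed) input values, weights, and bias, using the logistic transfer function of Eq. (1):

```python
import numpy as np

def sigmoid(z):
    # Logistic transfer function of Eq. (1).
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical scaled inputs and synaptic weight factors (illustrative values).
inputs = np.array([0.2, 0.7, 0.1])
weights = np.array([0.9, -0.4, 0.3])
bias = 0.05

# Activation level: weighted sum of the inputs plus the bias.
activation = np.dot(inputs, weights) + bias

# Output signal produced by the transfer function.
print(sigmoid(activation))
```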
Based on the primary components of the ANN structure, such as the data flow, the neurons, and the number of layers in the network, neural network models are classified into the following types.
1.2.1 The perceptron is the basic model of neurons. It accepts the input data,
processes it, and supplies the output data [10].
1.2.2 The feed-forward neural network is a widely used model in engineering
applications [11]. It consists of single or multi-layered perceptrons in which the
neurons are in the input, output, or hidden layers. The neurons present in the input
and output layers communicate with the outside environment, while the neurons in the
hidden layer communicate with interconnecting neurons. An ANN with an enlarged
neuron is shown in Fig. 2. In the feed-forward neural network, the activation is fed
from the input to the output layers via the weighted interconnections which are only
in the forward direction. The lack of backpropagation in the single-layered feed-
forward neural network is the reason for its loss of capability in deep learning. In the
case of a multi-layer feed-forward neural network, the layers contain one or more
hidden layers. These are placed between the input and the output layers, in which
each layer has several nodes which are interconnected with each other. It has a bi-
directional propagation approach, i.e., both forward and backward. The input data is
multiplied with a weight factor and is fed in the form of an activation function. These
neural networks use backpropagation to help modify the output, thereby reducing
the error loss and improving its self-dependency [12]. Also, the difference in the
predicted outputs and the trained input can be identified using this approach. Hence,
these types of networks are used in deep learning and are popular among tribological
applications [13].
Other types of neural networks include radial basis function neural networks,
recurrent neural networks, convolutional neural networks, sequence-based modular
neural networks, etc. These neural networks are used based on their applications.
The details of these neural networks are beyond the scope of this paper. However,
more information about these types of neural networks can be found in the following
citations [14–16].
The developed architecture should be fed with a training algorithm to perform its
function of learning the input and providing the desired outputs. In this process, the
applied inputs having the corresponding weights are adjusted in a way to obtain the
desired output values. The network training will be considered complete when no
further modifications in the input weights are necessary and when closer approxima-
tions to the output values are obtained. This minimizes the inaccuracy between the
approximate and the actual output values [17]. Hence, the weights are analyzed to determine which input variables actually drive the correct output. The greater the weight on a specific input variable, the larger its impact on the output parameter; this is regarded as the contribution strength of the input variable [18].
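The iterative weight adjustment described above can be sketched for a small single-hidden-layer network trained by gradient descent. The synthetic data, layer sizes, and learning rate are assumptions chosen only to keep the example self-contained; it is not a model of any specific experiment from the literature reviewed here.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic training data: two inputs, one target output (illustrative only).
X = rng.uniform(-1, 1, size=(200, 2))
y = (0.6 * X[:, 0] - 0.3 * X[:, 1] ** 2).reshape(-1, 1)

# One hidden layer of 8 sigmoid neurons and a linear output neuron.
W1 = rng.normal(scale=0.5, size=(2, 8)); b1 = np.zeros((1, 8))
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros((1, 1))
lr = 0.1

for epoch in range(2000):
    # Forward pass: weighted inputs passed through the transfer function.
    h = sigmoid(X @ W1 + b1)
    y_hat = h @ W2 + b2
    err = y_hat - y                       # difference from the desired output

    # Backward pass: the error is propagated back as weight corrections.
    dW2 = h.T @ err / len(X)
    db2 = err.mean(axis=0, keepdims=True)
    dh = (err @ W2.T) * h * (1 - h)       # sigmoid derivative
    dW1 = X.T @ dh / len(X)
    db1 = dh.mean(axis=0, keepdims=True)

    # Adjust the weights until the approximation to the output improves.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print("final mean squared error:", float((err ** 2).mean()))
```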
Training an ANN is conducted by three methods, namely supervised learning,
unsupervised learning, and reinforcement learning.
Supervised Learning: In the supervised learning method, both the input and desired
output are provided. The network analyses the input data to produce an output which
is then compared to the desired outputs. Any errors are then propagated back through the network and used to adjust the weights. The final output is provided after the weights have been adjusted suitably. This process of adjusting the weights continues until the desired output is obtained. The dataset that enables this training is called a training set [19].
Unsupervised Learning: In the unsupervised method of training, only the input
datasets are provided along with a relative function without the desired output. The
network segregates the distinct groups of datasets provided in a process called ‘clus-
tering’ where the training examples are automatically grouped into categories based
on similarities. This may be followed by principal component analysis, which discards redundant components and compresses the training data so that its most useful features can be identified. However, in the case of supervised learning, there are already pre-assigned
category labels for desired outputs. The main advantage of unsupervised learning is
the minimal workload compared to supervised learning [20].
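A minimal scikit-learn sketch of this clustering-then-compression idea on synthetic, unlabeled feature vectors; the data, the number of clusters, and the number of retained components are assumptions made only for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# Unlabeled feature vectors, e.g. descriptors extracted from wear-debris images.
X = rng.normal(size=(300, 10))

# Clustering: group the training examples into categories based on similarity.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Principal component analysis: compress the data to its most informative directions.
X_compressed = PCA(n_components=3).fit_transform(X)

print(labels[:10], X_compressed.shape)
```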
Reinforcement Learning: It is applicable in areas such as operational research, game
theory, information theory, and simulation-based theory, which is beyond the scope
of this paper. However, for more details about reinforcement learning the following
citations can be referred [21–23].
The ANN technique has been adapted in various condition monitoring applica-
tions for different machining tools to reduce tool wear. These approaches are briefly
explained below.
Lin and Lin [26] monitored the tool wear in a face milling operation using two
methods. In the first method, i.e., the backpropagation neural network technique, the
condition of the tool wear was obtained. The inputs provided were cutting param-
eters, and the output obtained was the average flank wear on cutter inserts. In the
second method, a regression model was used to estimate the tool wear using the
experimental data. In a regression model, the output is predicted as a function of the
given input, and the input features can be either categorical (nominal or ordinal data)
or numeric (continuous or discrete data). It was seen from both the models that an
ANN can be utilized in predicting the tool wear for aluminum using a multi-tooth
cutter and with varying geometries of the workpiece. Subrahmanyam and Sujatha
[27] evaluated localized defects in ball bearings using two ANN methods, namely
a multi-layered feed-forward neural network which was further trained with super-
vised error backpropagation (EBP) technique and an unsupervised adaptive reso-
nance theory-2 (ART2). The ART2 is a method of segregating data in an unsuper-
vised learning training, in which the data is evaluated using cohesion and separation.
These networks were trained with vibrational accelerating signals from a rolling
bearing test rig which helped in differentiating between a defective and a normal
ball bearing. It was proved that these techniques were 95% capable of detecting the
defects in the ball bearings. Assessing the wear in turning carbide inserts with the
help of neural networks was investigated by Das et al. [28]. A simple neural network
system with a 5–3–1 (5 input layers—3 hidden layers—1 output layer) structure was
utilized for monitoring the cutting tool wear utilizing the components of the cutting
force. The measured value was comparable to the model output but was unstable
due to external factors such as chipping and mild vibration that occurred during
machining. Monitoring and detecting drilling wear during a cutting process were
evaluated using multi-layer feed-forward neural network using a backpropagation
algorithm. The neural network gave 90% of the accurate classification, and hence,
it was concluded that a well-trained ANN would be an exceptionally dependable
tool for solving pattern recognition problems in the applications that monitor the
drilling process [29]. Following drilling wear, tool wear in a milling operation was
predicted using a backpropagation neural network by Chen and Chen [30]. The input
parameters used were the depth of cut, feed rate, and average peak force for 100
units acquired from experimental data. Tool wear could be predicted with an error of
±0.037 mm on average using this neural network. Palanisamy et al. [31] compared
the prediction of tool wear using a regression model and feed-forward neural network
technique at an end milling operation. In this study, the neural networks were trained
with experimental values for predicting flank wear in a tool. The predicted values
by both the methods, i.e., the mathematical regression model with an error less
than 5% and the feed-forward neural network technique with an error less than 2%,
were found to be comparable to the obtained experimental results. Gouarir et al.
[32] studied the tool flank wear in a milling operation incorporating a convolution
neural network model. This neural network model was incorporated with the adap-
tive control which adjusted the feed rate and spindle speed to correct the flank wear,
and this combination displayed accuracy of 90% similarities with experimental data.
Ozel and Karpat [33] used a feed-forward neural network model for predicting tool
wear and surface roughness in a hard turning process along with a regression tech-
nique. Thus, the prediction model developed was found to be capable of accurately
predicting surface roughness and tool wear for the range in which it was trained.
The neural network models were further compared to regression models. The neural
network models provided better prediction capabilities than the regression models.
Rao et al. [34] studied the development of a hybrid model using a multi-perceptron neural network in a neural network simulation software package. The surface roughness in an electric discharge
machine was optimized utilizing this model which resulted in a reduction of the error
from 5 to 2%. Further analysis concluded that the type of materials used in the EDM
influences the performance measures of the surface roughness.
In addition to the condition monitoring, the ANN approach has also been utilized to
analyze the wear and frictional properties of the materials. Jones et al. [35] first intro-
duced Artificial Intelligence in tribology and showcased modeling neural networks
for complex mechanical systems. Models for POD rig, rub shoe rig, and four-ball
rig have emerged to predict wear regardless of the lubricants used in the system.
The wear rates were predicted by extrapolating or interpolating the existing data
between the known inputs which resulted in an approximate wear rate. Myshkin
et al. [36] utilized two techniques to classify wear debris based on its morpholog-
ical features. Fourier descriptors were utilized to create a set of points in a cluster
that depends on the location, morphology of the wear particles, and the current
conditions of the contact system. These descriptors were used to coordinate with
the cluster that was followed by training the backpropagation neural network. It was
proved that the neural network was capable of classifying the wear debris. However,
a large volume of wear particles was required to classify the information; hence, using different features would reduce the required wear particle volume. Velten et al. [37]
studied the wear behavior of glass, carbon fiber, poly tetra fluoro ethylene (PTFE),
and graphite modified polyamide 4.6 (PA4.6) composites using three-layered feed-
forward neural network technique. A database of 60 wear volume measurements
was investigated and concluded with comparable predictions of the wear volume
that were obtained in comparison to the neural network architecture investigated by
Jones et al. [35]. Zhang et al. [38] studied the coefficient of friction and specific
wear rate of short fiber reinforced PA4.6 composites using a multiple-layer feed-
forward neural network based on experimental data. The predicted values obtained
were comparable to the real test values which could be improved by expanding the
training datasets and optimizing the neural network. Zhang et al. [39] studied erosive
wear in three polymers, namely polyurethane, epoxy, and polyethylene which were
modified by hygrothermally decomposed polyurethane and assessed using a multi-
layer feed-forward neural network. The random datasets were selected, in which
35–80% of the tests predicted that the coefficient of determination, which is the
value proportional to the variation in the dependent variable that is predictable from
the independent variable, is greater than or equal to 0.9 in all the cases. Ranking this
coefficient of determination property could provide information about the dominant
properties among the polymers to cause erosive wear. Genel et al. [40] utilized a
multi-layer feed-forward neural network to study the tribological properties of zinc-
aluminum composites reinforced with alumina fiber. It was found that with increasing
fiber volume fraction, a decrease in the specific wear rate, and by increasing the load,
an increase in the wear rate was observed because the fibers enhance the mechanical
strength in the composite. However, it was established that the composites had a
better friction coefficient and wear resistance compared to that of the unreinforced
materials due to the addition of short Saffil fiber (δ-Al2O3). This was further verified
with the ANN model along with experimental data, and the degree of accuracy of
the prediction was 99.4% and 94.2% for the friction coefficient and specific wear rate,
respectively, thus proving ANN as an excellent analytical tool. The structure of the
ANN is shown in Fig. 4 along with the input and output variables.
The ANN technique was also utilized to predict the flank wear during a drilling operation using a backpropagation neural network algorithm; it was shown that various parameters, for instance the spindle speed, feed rate, and drill diameter, affect the flank wear and cutting conditions of a high-speed steel drill bit used for drilling holes in copper. These served as the input parameters for predicting the wear rate, and it
was proved that the predicted values by the neural network coincided with the exper-
imental data which was within ±7.5% of the experimental value [41]. Durmus et al.
[42] predicted surface roughness and abrasive wear of the aluminum alloy AA 6351
Fig. 4 The structure of the three-layered neural network to study the wear rate and COF [40]
model considering various input parameters such as the hydroxyapatite volume frac-
tion, wear load, and wear distance to predict the output parameter, i.e., wear volume.
The model was further used to predict the loss of volume of different composites
for wear distances in the range of 0–1000 m at different applied wear loads. The
specific wear rate and friction coefficient of polyphenylene sulfide composites were
predicted using the ANN model using input variables of mechanical and thermome-
chanical properties (e.g., compressive and tensile properties tested at room temper-
ature along with dynamic mechanical thermal analysis properties analyzed in the
range of 23–230 °C). Optimizing and improving the ANN was done through the
implementation of the optimal brain surgeon (OBS) method [47]. The OBS algo-
rithm is a powerful technique for network optimization, which is used to improve the
efficiency and performance of the ANN. This algorithm identified and removed the
irrelevant nodes in the ANN model. Further, the optimized ANN predicted the tribo-
logical properties of PPS composite material which was comparable to the real test
values [48]. The abrasive wear rates of Cu–Al2O3 nanocomposites were predicted
using the backpropagation neural network and multi-variable regression analysis.
This is a model used to establish the relationship between multiple independent vari-
ables and a dependent variable. The data acquired from a range of wear tests was
used as the input and compared with the two models. In addition to this, a compar-
ison was made by implementing the genetic algorithm (GA) which is a technique of
solving unconstrained and constrained problems using the process of natural selec-
tion. It was shown that ANN with GA is a better tool to predict the abrasive wear rate
accurately on Cu–Al2O3 nanocomposite materials when compared with the results
shown of ANN without GA [49]. Kumar et al. [50] predicted the dry sliding wear
in a metal matrix composite of aluminum 6061 alloy reinforced with Al2O3 using a
backpropagated neural network to show a nonlinear relation between wear and other
influential factors, i.e., density, the weight percentage of reinforcement, the applied load, and the sliding distance. The nonlinear relationship between applied load, density,
sliding distance, the weight percentage of reinforcement, and height with wear has
been predicted using ANN. The results obtained by the ANN were comparable with
that of experimental results. A study was conducted to predict the wear between a rail
and the train wheel with different contact conditions using a nonlinear autoregres-
sive model with exogenous input neural network. This is a nonlinear autoregressive
model which has the exogenous inputs that can relate both the current and the past
values of the externally determined series which influences the series of interest. The
results were obtained as the mean absolute percentage error from the ANN which was
compared with the experimental data from the profilometer demonstrating equiva-
lent results [51]. Borjali et al. [52] quantified the results of various POD experiments
using a series of machine learning techniques. To begin with, an interpretable model-
based method such as linear regression was used, wherein the relationship between
input and target attributes is defined. In addition, data-driven models have been used
that utilize a dataset, without explicitly defining the relation between the input and
target attributes. Here, the neural network is trained by the dataset from polyethy-
lene wear rate and relates the operating parameters to the wear rate of polyethylene
by employing neurons that communicate with each other in a nonlinear manner.
Instance-based methods like the K-Nearest Neighbor (KNN) technique were then
implemented, which predicted the wear rate of polyethylene, based on clustering the
data into subgroups, thus reducing the prediction error. This study proves that the
data-driven model can successfully predict the polyethylene wear rate for new POD
experiments provided that the operating parameters fall within the dataset ranges
that were used for training the model. This could help to reduce the need for more
experimental studies or designing a new experiment.
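As a sketch of such an instance-based, data-driven approach (the operating parameters and the synthetic wear-rate relation below are assumptions chosen for illustration, not the dataset or features of [52]), a nearest-neighbour regressor can be fitted to pin-on-disc style inputs:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(2)
# Hypothetical POD operating parameters: [load (N), sliding speed (m/s), roughness (um)].
X = rng.uniform([10, 0.1, 0.05], [200, 2.0, 1.0], size=(150, 3))
# Synthetic wear-rate target with added noise (illustrative only).
y = 1e-7 * X[:, 0] * X[:, 1] / (1 + X[:, 2]) + rng.normal(0, 1e-6, 150)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Instance-based prediction: a new point's wear rate is estimated from its nearest neighbours.
knn = KNeighborsRegressor(n_neighbors=5).fit(X_train, y_train)
print("R^2 on held-out points:", knn.score(X_test, y_test))
```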
In addition to the friction, wear, and other tribological properties, the ANN tech-
nique can also be applied in the lubrication and lubricant formulation of various oils.
Bhaumik et al. [53] studied bio-degradable lubricants based on various vegetable
oils (e.g., palm oil, coconut oil, castor oil, etc.) with additives of nano-frictional
modifiers such as carbon nanotubes and graphene. The database from the previous
literature was used to train the ANN model along with the genetic algorithm. This
study was performed to study the simulation understandings of the four-ball test and
POD experimental results. The predicted results proved that the ANN technique can
be implemented to study and design lubricants with various tribological properties.
Furthermore, a feed-forward neural network ANN technique was utilized to analyze
the experimental database obtained from a four-ball tester and POD technique to
predict the anti-wear properties in the castor oil with dispersing non-carbonaceous
and carbonaceous friction modifiers such as graphite, zinc oxide nanoparticles, multi-
walled carbon nanotubes, and graphene. The speed, load, and concentration of the
friction modifiers were the input variables to obtain the COF in this experiment [54].
It was concluded that the COF of the lubricant with the multi-frictional modifiers is
40–50% lower and the diameter of the wear scar is 87.5% less in comparison with
other mineral oils. Furthermore, the method of implementing ANN for analyzing the
compound relationship between the percentages of vegetable oil in the fuel mixtures
to reduce the coefficient of friction was assessed [55]. The data obtained from exper-
iments such as the POD for sunflower seed oil in biodiesel mixtures was used as
an input. A backpropagation neural network algorithm was used to predict the data
which showed a close correlation between the experimental and predicted data. Two types of biodiesel
were compared with 0 and 6.5% sunflower oil. It was found that the coefficient of
friction in the biodiesel having 6.5% of sunflower oil was more than double that of
the 0% sunflower oil in biodiesel which thus identified the optimal percentage of
sunflower oil in the biofuel mixture.
Rutherford et al. [56] studied the abrasive wear resistance of a multi-layered coating
of TiN/NbN deposited by physical vapor deposition using a multi-layer perceptron
model. It was found that the most influential parameters on abrasive wear included
the hardness of the interlayer, deposition pressure, interlayer mixing, and relative
proportion of two layers of the material with the multi-layer coatings. Moder et al.
[57] identified the lubrication regimes in hydrodynamic journal bearings. The neural
network was trained using logistic regression models with high-speed data signals
from torque sensors. The results obtained displayed the 99.25% accuracy of fast
Fourier transforms of the high-speed torque signals to predict lubrication regimes
with distinctive frequencies. Gorasso and Wang [58] optimized the journal bearing
for its power loss and mass flow using a genetic algorithm and a multi-perceptron
neural network. The ANN was trained with Reynold’s equation and computational
fluid dynamic simulations from Ansys Gambit and Ansys Fluent. In conclusion, it
was shown that ANN can accurately predict the performance like power loss and
mass flow of a hydrodynamic journal bearing. In a different study, the feed-forward
neural network was utilized to predict the friction coefficient in lubricated conditions.
The neural networks were trained through the data obtained from tribological tests
on a mini-traction machine which were then compared with conventional simulation
tools such as linear regression models. It was concluded that the ANN can be used as
an excellent simulation tool to predict the COF in the thermal elastohydrodynamic
contacts [59].
Along with the tribological properties of materials, the materialistic properties have
also been studied using the ANN approach. Zhang et al. [60] investigated the damping
and storage modulus of SCF-reinforced PTFE-based composites using the dynamic
mechanical thermal analysis method. The results obtained in this method were further
verified using the Bayesian regularization of a backpropagated algorithm. Bayesian
regularization decreases the linear combination of weights and squared errors, which
proved that the complexity of the nonlinear relation between the input and output data
increased, as the number of the training dataset increased. This proved that ANN is
the potential analytical tool for structural property analysis of polymer composites.
Altinkok and Koker [61] studied the tensile properties and the density of Al2O3/SiC
dual ceramic reinforced aluminum matrix composites produced by a stir casting
process using a backpropagated neural network with a gradient descent learning algo-
rithm. The sizes of SiC particles were provided as an input to the neural network. The
density and the tensile strength values were predicted using the neural network with
an error of 0.000472 compared to the experimental results. Koker et al. [62] assessed
the mechanical properties such as bending strength and the hardness behavior of
the Al–Si–Mg metal matrix composites using various neural network training algo-
rithms. The four different neural networks were investigated in studying the bending
strength and the hardness behavior by feeding the SiC size particle as the input. The
neural networks studied were quasi-Newton, Levenberg–Marquardt, resilient back-
propagation, and variable learning rate backpropagation. In this comparative study,
it was found that the Levenberg–Marquardt algorithm supplied the highest accuracy and the fastest prediction of the output for the composites. In a different study, the fretting wear and mechanical properties (Izod impact
energy, tensile strength, modulus, flexural strength, and modulus) of the reinforced
PA composites with two experimental databases were studied by Jiang [63]. The
ANN equipped with a backpropagated algorithm was trained with the input of 101
independent wear tests from PA 4.6 composite and the 93 pairs of independent tension
test, Izod impact test, and bending tests of PA 6.6 composites. The property profiles
of the composite as a function of the short fiber content were suitably predicted by the
neural network, thus proving the capability of a well-optimized model. Partheepan
[64] evaluated the fracture toughness of different steels such as chromium steel (H11),
die steel (D3), medium carbon steel (MC), and low carbon steel (LC) using a minia-
ture specimen test and feed-forward neural network. The load elongation obtained
from finite element methods was fed as the input to the feed-forward neural network
model. The fracture toughness was predicted as the output which was then compared
to the ASTM standard test results. The predictions deviated from the standard values by 1 to 6.63% for the various materials. Hassan et al. [65] predicted the porosity, density, and
hardness of aluminum-copper-based composite materials using the volume fraction
of the reinforced particle and the weight percentage of copper as the input using the
feed-forward backpropagated neural network model. The maximum absolute relative
error using the ANN technique did not exceed 5.99% proving the ANN to be a time-
and cost-saving analytical tool. Hafizpour et al. [66] analyzed the influence of rein-
forcing particles on the compressibility of Al–SiC composite powders by utilizing
a backpropagated neural network model. The model predicted, with an accuracy of 97%, the effect of the reinforced particle size and volume fraction on the densification of the Al–SiC composite powder using iso-density curves. These curves
refer to an ellipse on a two-dimensional scatter diagram or a scatter plot that encircles
a specified proportion of the cases constituting groups which in this case is the Si
group. Suresh et al. [67] evaluated the solid particle erosion on PPS composites using
Bayesian regularization trained neural network. The steady-state erosion rates were
calculated at different velocities and impact angles using silica sand particles as an
erodent to obtain experimental values which were verified with the predicted values
from the neural network, thus obtaining the results which matched the experimental
values.
3 Conclusion
The purpose of this study was to summarize the potential of ANN in the field of
tribology. ANN is a promising mathematical technique used in modeling and is one
of the most efficient techniques in reducing the time for analysis compared to the
conventional modeling techniques. Complex nonlinear fundamental problems can be
solved using ANN, which can help tackle various tribological problems. Applications such as condition and tool-wear monitoring, wear and friction, lubrication and lubricant formulation, surface modification and surface technologies, and other material properties have been studied using ANN, which contributes to a reduction in the duration of an experiment. Using the first experi-
mental datasets, subsequent results can be predicted without performing experiments
which is one of the most promising applications of ANN. ANN is thus a powerful
simulation tool and can direct future tribologists to use this technique effectively to
save time and resources.
References
12. Laguna M, Martí R (2002) Neural network prediction in a system for optimizing simulations.
IIE Trans 34:273–282. https://doi.org/10.1023/A:1012485416856
13. Argatov I (2019) Artificial neural networks (ANNs) as a novel modeling technique in tribology.
Frontiers Mech Eng 5(30). https://doi.org/10.3389/fmech.2019.00030
14. Bishop CM (1995) Neural networks for pattern recognition. Oxford University Press
15. Dash CSK, Behera AK, Dehuri S, Cho SB (2016) Radial basis function neural networks: a
topical state-of-the-art survey. Open Comput Sci 6(1):33–63. https://doi.org/10.1515/comp-
2016-0005
16. Parfitt S (1991) An introduction to neural computing by Igor Aleksander and Helen Morton,
Chapman and Hall, London, 1990, pp 255, £15.95. The Knowl Eng Rev 6(4):351–352. https://
doi.org/10.1017/s0269888900005968
17. Burger C, Traver R (1996) Applying neural networks system auditing. EDPACS EDP Audit
Control Secur Newsl 24(6):1–10. https://doi.org/10.1080/07366989609452285
18. Marshall JA (1995) Neural networks for pattern recognition. Neural Netw 8:493–494. https://
doi.org/10.1016/0893-6080(95)90002-0
19. Ojha VK, Abraham A, Snášel V (2017) Metaheuristic design of feedforward neural networks:
a review of two decades of research. Eng Appl Artif Intell 60:97–116. https://doi.org/10.1016/
j.engappai.2017.01.013
20. Buhmann J, Kuhnel H (1992) Unsupervised and supervised data clustering with competitive
neural networks. In: IJCNN international joint conference on neural networks, vol 4, pp 796–
801
21. Jaksch T, Ortner R, Auer P (2010) Near-optimal regret bounds for reinforcement learning. J
Mach Learn Res 11(4)
22. Busoniu L, Babuska R, De Schutter B, Ernst D (2017) Reinforcement learning and dynamic
programming using function approximators. CRC press
23. Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. MIT press
24. Leung H, Haykin S (1991) The complex backpropagation algorithm. IEEE Trans Sign Process
39(9):2101–2104. https://doi.org/10.1109/78.134446
25. Chauvin Y, Rumelhart DE (2013) Backpropagation: theory, architectures, and application.
Psychology press
26. Lin SC, Lin RJ (1996) Tool wear monitoring in face milling using force signals. Wear 198(1–
2):136–142. https://doi.org/10.1016/0043-1648(96)06944-x
27. Subrahmanyam M, Sujatha C (1997) Using neural networks for the diagnosis of localized
defects in ball bearings. Tribol Int 30(10):739–752. https://doi.org/10.1016/s0301-679x(97)000
56-x
28. Das S, Roy R, Chattopadhyay AB (1996) Evaluation of wear of turning carbide inserts using
neural networks. Int J Mach Manuf 36(7):789–797. https://doi.org/10.1016/0890-6955(95)000
89-5
29. Abu-Mahfouz I (2003) Drilling wear detection and classification using vibration signals and
artificial neural network. Int J Mach Tools Manuf 43(7):707–720. https://doi.org/10.1016/
s0890-6955(03)00023-3
30. Chen JC, Chen JC (2004) An artificial-neural-networks-based in-process tool wear prediction
system in milling operations. Int J Adv Manuf Technol 25(5–6):427–434. https://doi.org/10.
1007/s00170-003-1848-y
31. Palanisamy P, Rajendran I, Shanmugasundaram S (2007) Prediction of tool wear using regres-
sion and ANN models in end-milling operation. Int J Adv Manuf Technol 37(1–2):29–41.
https://doi.org/10.1007/s00170-007-0948-5
32. Gouarir A, Martínez-Arellano G, Terrazas G, Benardos P, Ratchev SJPC (2018) Process tool
wear prediction system based on machine learning techniques and force analysis. Procedia
CIRP 77:501–504. https://doi.org/10.1016/j.procir.2018.08.253
33. Özel T, Karpat Y (2005) Predictive modeling of surface roughness and tool wear in hard turning
using regression and neural networks. Int J Mach Tools Manuf 45(4–5):467–479. https://doi.
org/10.1016/j.ijmachtools.2004.09.007
34. Rao GKM, Rangajanardhaa G, Hanumantha Rao D, Sreenivasa Rao M (2009) Development
of hybrid model and optimization of surface roughness in electric discharge machining using
artificial neural networks and genetic algorithm. J Mater Process Technol 209(3):1512–1520.
https://doi.org/10.1016/j.jmatprotec.2008.04.003
35. Jones SP, Jansen R, Fusaro RL (1997) Preliminary investigation of neural network techniques
to predict tribological properties. Tribol Trans 40(2):312–320. https://doi.org/10.1080/104020
09708983660
36. Myshkin NK, Kwon OK, Grigoriev AY, Ahn HS, Kong H (1997) Classification of wear debris
using a neural network. Wear 203:658–662. https://doi.org/10.1016/s0043-1648(96)07432-7
37. Velten K, Reinicke R, Friedrich K (2000) Wear volume prediction with artificial neural
networks. Tribol Int 33(10):731–736. https://doi.org/10.1016/s0301-679x(00)00115-8
38. Zhang Z, Friedrich K, Velten K (2002) Prediction on tribological properties of short fibre
composites using artificial neural networks. Wear 252(7–8):668–675. https://doi.org/10.1016/
s0043-1648(02)00023-6
39. Zhang Z, Barkoula NM, Karger-Kocsis J, Friedrich K (2003) Artificial neural network predic-
tions on erosive wear of polymers. Wear 255(1–6):708–713. https://doi.org/10.1016/s0043-
1648(03)00149-2
40. Genel K, Kurnaz SC, Durman M (2003) Modeling of tribological properties of alumina fiber
reinforced zinc-aluminum composites using artificial neural network. Mater Sci Eng A 363(1–
2):203–210. https://doi.org/10.1016/s0921-5093(03)00623-3
41. Singh AK, Panda SS, Chakraborty D, Pal SK (2005) Predicting drill wear using an artificial
neural network. Int J Adv Manuf Technol 28(5–6):456–462. https://doi.org/10.1007/s00170-
004-2376-0
42. Durmuş HK, Özkaya E, Meri C (2006) The use of neural networks for the prediction of wear
loss and surface roughness of AA 6351 aluminium alloy. Mater Des 27(2):156–159. https://
doi.org/10.1016/j.matdes.2004.09.011
43. Jiang Z, Zhang Z, Friedrich K (2007) Prediction on wear properties of polymer composites
with artificial neural networks. Compos Sci Technol 67(2):168–176. https://doi.org/10.1016/j.
compscitech.2006.07.026
44. Zhenyu J, Gyurova LA, Schlarb AK, Friedrich K, Zhang Z (2008) Study on friction and wear
behavior of polyphenylene sulfide composites reinforced by short carbon fibers and sub-micro
TiO2 particles. Compos Sci Technol 68(3–4):734–742. https://doi.org/10.1016/j.compscitech.
2007.09.022
45. Rashed FS, Mahmoud TS (2009) Prediction of wear behaviour of A356/Sicp MMCs using
neural networks. Tribol Int 42(5):642–648. https://doi.org/10.1016/j.triboint.2008.08.010
46. Younesi M, Bahrololoom ME, Ahmadzadeh M (2010) Prediction of wear behaviors of nickel
free stainless steel-hydroxyapatite bio-composites using artificial neural network. Comput
Mater Sci 47(3):645–654. https://doi.org/10.1016/j.commatsci.2009.09.019
47. Hassibi B, Stork DG, Wolff GJ (1993) Optimal brain surgeon and general network pruning.
In: IEEE international conference on neural networks, vol 1, pp 293–299. https://doi.org/10.
1109/ICNN.1993.298572.
48. Gyurova LA, Miniño-Justel P, Schlarb AK (2010) Modeling the sliding wear and friction
properties of polyphenylene sulfide composites using artificial neural networks. Wear 268(5–
6):708–714. https://doi.org/10.1016/j.wear.2009.11.008
49. Fathy A, Megahed AA (2011) Prediction of abrasive wear rate of in situ Cu–Al2O3 nanocom-
posite using artificial neural networks. Int J Adv Manuf Technol 62(9–12):953–963. https://
doi.org/10.1007/s00170-011-3861-x
50. Kumar GBV, Pramod R, Rao CSP, Shivakumar Gouda, PS (2018) Artificial neural network
prediction on wear of Al6061 alloy metal matrix composites reinforced with Al2O3. Mater
Today: Proc 5(5):11268–11276. https://doi.org/10.1016/j.matpr.2018.02.093
51. Shebani A, Iwnicki S (2018) Prediction of wheel and rail wear under different contact condi-
tions using artificial neural networks. Wear 406:173–184. https://doi.org/10.1016/j.wear.2018.
01.007
52. Borjali A, Monson K, Raeymaekers B (2019) Predicting the polyethylene wear rate in pin-
on-disc experiments in the context of prosthetic hip implants: deriving a data-driven model
using machine learning methods. Tribol Int 133:101–110. https://doi.org/10.1016/j.triboint.
2019.01.014
53. Bhaumik S, Mathew BR, Datta S (2019) Computational intelligence-based design of lubricant
with vegetable oil blend and various nano friction modifiers. Fuel 241:733–743. https://doi.
org/10.1016/j.fuel.2018.12.094
54. Bhaumik S, Pathak SD, Dey S, Datta S (2019) Artificial intelligence based design of multiple
friction modifiers dispersed castor oil and evaluating its tribological properties. Tribol Int
140:105813. https://doi.org/10.1016/j.triboint.2019.06.006
55. Humelnicu C, Ciortan S, Amortila V (2019) Artificial neural network-based analysis of the
tribological behavior of vegetable oil-diesel fuel mixtures. Lubricants 7(4):32. https://doi.org/
10.3390/lubricants7040032
56. Rutherford KL, Hatto PW, Davies C, Hutchings IM (1996) Abrasive wear resistance of
TiN/NbN multi-layers: measurement and neural network modelling. Surf Coat Technol
86:472–479. https://doi.org/10.1016/s0257-8972(96)02956-8
57. Moder J, Bergmann P, Grün F (2018) Lubrication regime classification of hydrodynamic journal
bearings by machine learning using torque data. Lubricants 6(4):108. https://doi.org/10.3390/
lubricants6040108
58. Gorasso L, Wang L (2014) Journal bearing optimization using nonsorted genetic algorithm
and artificial bee colony algorithm. Adv Mech Eng 6:213548. https://doi.org/10.1155/2014/
213548
59. Echávarri Otero J, De La Guerra Ochoa E, Chacón Tanarro E, Lafont Morgado P, Díaz Lantada A, Muñoz-Guijosa JM, Muñoz Sanz JL (2013) Artificial neural network approach to predict
the lubricated friction coefficient. Lubr Sci 26(3):141–162. https://doi.org/10.1002/ls.1238
60. Zhang Z, Klein P, Friedrich K (2002) Dynamic mechanical properties of PTFE based short
carbon fibre reinforced composites: experiment and artificial neural network prediction.
Compos Sci Technol 62(7–8):1001–1009. https://doi.org/10.1016/s0266-3538(02)00036-2
61. Altinkok N, Koker R (2006) Modelling of the prediction of tensile and density properties in
particle reinforced metal matrix composites by using neural networks. Mater Des 27(8):625–
631. https://doi.org/10.1016/j.matdes.2005.01.005
62. Koker R, Altinkok N, Demir A (2007) Neural network based prediction of mechanical proper-
ties of particulate reinforced metal matrix composites using various training algorithms. Mater
des 28(2):616–627. https://doi.org/10.1016/j.matdes.2005.07.021
63. Jiang Z, Gyurova L, Zhang Z, Friedrich Z, Schlarb AK (2008) Neural network based prediction
on mechanical and wear properties of short fibers reinforced polyamide composites. Mater Des
29(3):628–637. https://doi.org/10.1016/j.matdes.2007.02.008
64. Partheepan G, Sehgal DK, Pandey RK (2008) Fracture toughness evaluation using miniature
specimen test and neural network. Comput Mater Sci 44(2):523–530. https://doi.org/10.1016/
j.commatsci.2008.04.013
65. Hassan AM, Alrashdan A, Hayajneh MT, Mayyas AT (2019) Prediction of density, porosity
and hardness in aluminum–copper-based composite materials using artificial neural network.
J Mater Process Technol 209(2):894–899. https://doi.org/10.1016/j.jmatprotec.2008.02.066
66. Hafizpour HR, Sanjari M, Simchi A (2009) Analysis of the effect of reinforcement particles on
the compressibility of Al–SiC composite powders using a neural network model. Mater Des
30(5):1518–1523. https://doi.org/10.1016/j.matdes.2008.07.052
67. Suresh A, Harsha AP, Ghosh MK (2009) Solid particle erosion studies on polyphenylene sulfide
composites and prediction on erosion data using artificial neural networks. Wear 266(1–2):184–
193. https://doi.org/10.1016/j.wear.2008.06.008
House Price Prediction Using Advanced
Regression Techniques
1 Introduction
Machine learning predictions have proven very useful for tasks such as stock price prediction
and market trend analysis. Real estate is one of the prime fields in which to apply machine
learning ideas to estimate and forecast prices with high accuracy [1]. We have applied machine
learning to the problem of house price prediction. The task consists of predicting the price of a
house depending upon various factors. These factors include variables such as square footage,
number of rooms, and location; countless features can influence the price. We have approached
the problem through feature engineering and applied methods such as imputation, handling
outliers, log transforming skewed variables, one-hot encoding categorical features, and
feature selection.
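As a concrete illustration of these steps, the following is a minimal pandas/scikit-learn-style sketch of such a feature-engineering pipeline; the skewness threshold and clipping percentiles are illustrative assumptions rather than the exact choices made in this work.

```python
# Illustrative pre-processing sketch: imputation, simple outlier clipping,
# log transform of skewed variables, and one-hot encoding.
import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame, target: str = "SalePrice") -> pd.DataFrame:
    df = df.copy()
    cat_cols = df.select_dtypes(include="object").columns
    num_cols = df.select_dtypes(include=np.number).columns.drop(target, errors="ignore")

    # Imputation: 'none' for absent categorical features, median for numeric ones
    df[cat_cols] = df[cat_cols].fillna("none")
    df[num_cols] = df[num_cols].fillna(df[num_cols].median())

    # Handle outliers by clipping extreme values (one simple option)
    for col in num_cols:
        lo, hi = df[col].quantile([0.01, 0.99])
        df[col] = df[col].clip(lo, hi)

    # Log-transform strongly skewed, non-negative numeric variables
    skewed = [c for c in num_cols if df[c].min() >= 0 and abs(df[c].skew()) > 0.75]
    df[skewed] = np.log1p(df[skewed])

    # One-hot encode the categorical features
    return pd.get_dummies(df, columns=list(cat_cols))
```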
The organization of this paper is as follows: Sect. 2 provides a brief review of contemporary
work done by researchers. Section 3 presents a brief description of the different methods
applied throughout the process along with their mathematical formulations. The description of
the dataset and experimental setup is presented in Sect. 4. Section 5 contains the experimental
results and analysis on the popular house pricing dataset, and the final section presents the
conclusions and future work.
2 Literature Survey
In [2], the authors propose a hybrid model to predict house prices. They employed feature
engineering techniques that improved the normality and linearity of the data and reported RMSE
for different numbers of features. Truong et al. [3] implement random forest, XGBoost, and
LightGBM machine learning models. If there is much irrelevant and redundant information present,
or if the data is noisy and incomplete, then knowledge discovery during the training phase
becomes difficult [4]. Data pre-processing gave us insight into the data, and extraction of
features through different feature selection methods allowed us to model advanced regression
techniques on the most important features. Apart from this, we have tried to work toward the
interpretability of the machine learning models; models like random forest, gradient boosting,
and XGBoost are considered black-box models.
3 Methodology
Ridge regression adds an L2 (shrinkage) penalty to the residual sum of squares (RSS), where λ is the shrinkage parameter. Equation (1) shows the mathematical formulation of ridge regularization:

L_{ridge}(\beta) = \sum_{i=1}^{n} \left( y_i - x_i'\beta \right)^2 + \lambda \sum_{j=1}^{m} \beta_j^2 = \mathrm{RSS} + \lambda \lVert \beta \rVert_2^2 \qquad (1)
Lasso regression uses shrinkage. Shrinkage is basically where data points are
shrunk toward a central point. Lasso is usually applied on models that contain high
levels of multicollinearity. The additional term after RSS is called the shrinkage
penalty or L1 norm. Equation (2) shows the mathematical formulation of lasso
regularization, where λ stands for shrinkage parameter.
L_{lasso}(\beta) = \sum_{i=1}^{n} \left( y_i - x_i'\beta \right)^2 + \lambda \sum_{j=1}^{m} |\beta_j| = \mathrm{RSS} + \lambda \lVert \beta \rVert_1 \qquad (2)
ElasticNet linear regression uses the penalties from both the lasso and ridge tech-
niques to regularize regression models. ElasticNet combines both the regularization
techniques (i.e., Lasso and Ridge) by learning from their deficits, and thus, the regu-
larization of statistical models is improved. Equation (3) shows the mathematical
formulation of ElasticNet regularization.
L_{elastic}(\beta) = \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - x_i'\beta \right)^2 + \lambda \left( \frac{1-\alpha}{2} \sum_{j=1}^{m} \beta_j^2 + \alpha \sum_{j=1}^{m} |\beta_j| \right) \qquad (3)
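For reference, a minimal scikit-learn sketch of the three regularized models in Eqs. (1)-(3) is given below; X_train/y_train are assumed to be the prepared features and the (log-transformed) sale price, and the alpha and l1_ratio values mirror those reported later in the paper.

```python
from sklearn.linear_model import Ridge, Lasso, ElasticNet

def fit_regularized_models(X_train, y_train, X_test, y_test):
    models = {
        "ridge": Ridge(alpha=3.0),                              # L2 penalty, Eq. (1)
        "lasso": Lasso(alpha=0.003),                            # L1 penalty, Eq. (2)
        "elasticnet": ElasticNet(alpha=0.0003, l1_ratio=0.95),  # mixed penalty, Eq. (3)
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)  # R^2 on the held-out set
    return scores
```

Note that scikit-learn's objectives differ from Eqs. (1)-(3) at most by constant scaling of the terms, which is equivalent to a rescaling of λ.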
Random forest can be defined as an ensemble technique that can perform classification and
regression tasks via decision trees and bootstrap aggregation (bagging). Gradient boosting can
be defined as grouping together multiple weak machine learning models to obtain a stronger
model; usually, decision trees are used as the weak learners in the ensemble. Light gradient
boosting, or LightGBM, is a fast and highly efficient gradient boosting framework based on
decision tree algorithms. LightGBM is capable of handling large-scale data and converges faster
than the standard gradient boosting method. XGBoost stands for extreme gradient boosting. It is
an advanced version of gradient boosting that learns from its mistakes and offers a large number
of hyperparameters for fine-tuning.
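The ensemble models can be fitted in the same way as the linear models; the sketch below assumes the lightgbm and xgboost packages are available and uses default hyperparameters except for the tree count mentioned later in the paper.

```python
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from lightgbm import LGBMRegressor   # assumes the lightgbm package is installed
from xgboost import XGBRegressor     # assumes the xgboost package is installed

def fit_ensemble_models(X_train, y_train, X_test, y_test):
    models = {
        "random_forest": RandomForestRegressor(n_estimators=600),
        "gradient_boosting": GradientBoostingRegressor(),
        "lightgbm": LGBMRegressor(),
        "xgboost": XGBRegressor(),
    }
    return {name: model.fit(X_train, y_train).score(X_test, y_test)
            for name, model in models.items()}
```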
4 Dataset Description
The dataset used in this paper is the Ames housing dataset from Kaggle [5]. The dataset was
originally used for house price prediction with advanced regression techniques in a Kaggle
competition. The motivation for choosing this dataset comes from the fact that it is an extended
version of the Boston housing dataset and consists of 79 explanatory variables, both numerical
and categorical. The target variable is the Sale Price, which represents the actual cost of the
property/house given the independent variables.
We started the experiment with pre-processing of the data. The experiments are carried out on
the Google Colab platform, where the models are developed and tested using the Keras,
TensorFlow, NumPy, and matplotlib libraries. We implemented several distinct models on the
dataset and compared their results with each other.
The dataset was taken from the Kaggle website [5]. We visualized the numerical attributes via
scatter plots and the categorical attributes with box plots. Figure 1 shows the scatter plots
for the 36 numerical attributes. It can be seen that certain attributes contain outliers, which
ought to be removed during the outlier detection process. The next step was carrying out the
imputation process. For normally distributed categorical variables, missing values were filled
with 'none'. For example, a missing entry in 'Mas Vnr Type' practically meant that a veneer is
not present for that property. Two attributes, 'Garage Qual' and 'Garage Cond', were imputed
with the most frequently occurring category in those columns. All numerical attributes were
imputed with the median of the values in those columns. After this process, the resultant shape
of the dataset was (183,982). Following this step, it was necessary to examine the distribution
of the target variable, i.e., the 'Sale Price' attribute. Figure 2 shows the distribution of the
target variable. It can be seen that the variable is right skewed (positively skewed) with a
skewness of 1.297067. After log transformation, it is transformed close enough to a Gaussian
(normal) distribution. Figure 3 shows the log-transformed variable, which has a skewness of
−0.269862.
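The skewness check and log transform can be reproduced with a short snippet like the following; np.log1p is one common way to implement the log transformation and is an assumption about the exact variant used.

```python
import numpy as np
from scipy.stats import skew

def log_transform_target(sale_price):
    """Log-transform a right-skewed target such as 'Sale Price'."""
    print("skewness before:", skew(sale_price))  # about 1.30 for the raw prices
    y_log = np.log1p(sale_price)                 # log(1 + x)
    print("skewness after:", skew(y_log))        # close to 0 after the transform
    return y_log
```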
(1) Linear Regression with Ridge Regularization
Upon trying different values of the hyperparameter alpha, the best value of alpha was found to
be 3.0. Ridge regularization is also called the L2 penalty; it prevents the model from
overfitting and thus makes the model a good fit for the data. Figure 5 shows the plot of
residuals versus predictions for the test set. It can be seen from the plot that, as the
residuals are near the origin (i.e., zero), the model is a good fit for our data. Figure 5 also
shows the coefficients of the top 20 variables, among which the first 10 represent the
attributes with the highest coefficients and the last 10 the lowest coefficients.
(2) Linear Regression with Lasso Regularization
Upon trying different values of the hyperparameter alpha, the best value of alpha was found to
be 0.003. Lasso regularization is also called the L1 penalty; it prevents the model from
overfitting and thus makes the model a good fit for the data. Figure 5 shows the plot of
residuals versus predictions for the test set. It can be seen from the plot that, as the
residuals are near the origin (i.e., zero), the model is a good fit for our data. Figure 5 also
shows the coefficients of the top 20 variables, among which the first 10 represent the
attributes with the highest coefficients and the last 10 the lowest coefficients, or least
relevance.
(3) Linear Regression with ElasticNet Regularization
Upon trying different values of the hyperparameter alpha, the best value of alpha was found to
be 0.0003, and the best value of the l1 ratio turned out to be 0.95. ElasticNet regularization
is a combination of the L1 and L2 penalties; it prevents the model from overfitting and thus
makes the model a good fit for the data. Figure 6a shows the plot of residuals versus
predictions for the test set. It can be seen from the plot that, as the residuals are near the
origin (i.e., zero), the model is a good fit for our data. Figure 6b shows the coefficients of
the top 20 variables, among which the first 10 represent the attributes with the highest
coefficients and the last 10 the lowest coefficients.
We experimented with the random forest model using n_estimators = 600, which indicates the
number of trees to be considered while training the model. Figure 8 shows the plot of residuals
versus predictions for the test set. It can be seen from the plot that, as the residuals are
near the origin (i.e., zero), the model is a good fit for our data. Evaluation was carried out
using repeated K-fold cross-validation on both the test and train sets with n_splits = 10 and
n_repeats = 5. The actual value of the first entry in the test set was 254,900.0000000001, and
the value predicted by the random forest regressor was quite close to the actual value; this
step requires an inverse log transformation of the predictions. XGBoost has the best performance
of all the models mentioned. We can infer from Fig. 11 that as the house condition moves toward
excellent, the price of that property indeed increases.
Fig. 6 a Residuals (linear regression with Lasso regularization), b coefficients (linear regression
with Lasso regularization)
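A sketch of the repeated K-fold evaluation and the inverse log transform of predictions is given below; the scorer name follows scikit-learn's conventions (its neg_* scorers may also explain why the MAE values in Table 1 carry a negative sign), and the random_state is an arbitrary assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RepeatedKFold, cross_val_score

def evaluate_repeated_kfold(X, y_log):
    # y_log is the log-transformed SalePrice, so errors are measured in log space
    cv = RepeatedKFold(n_splits=10, n_repeats=5, random_state=0)
    model = RandomForestRegressor(n_estimators=600)
    mae = cross_val_score(model, X, y_log, cv=cv, scoring="neg_mean_absolute_error")
    return mae.mean()  # negative by scikit-learn convention

# Predictions made in log space are mapped back to prices with the inverse
# log transform, e.g. predicted_price = np.expm1(model.predict(X_test)).
```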
Table 1 Evaluation performed on the validation set
Model | RMSE | MAE | r2 score
RidgeCV | 0.126140 | −0.085 | 0.913816
LassoCV | 0.127350 | −0.080 | 0.917159
ElasticNetCV | 0.128725 | −0.081 | 0.917537
Random forest regressor | 0.135762 | −0.144 | 0.903091
Gradient boosting regressor | 0.121118 | −0.145 | 0.915101
LightGBM | 0.128655 | −0.144 | 0.909732
XGBoost | 0.128740 | −0.148 | 0.921972
References
1. Kuvalekar A, Manchewar S, Mahadik S, Jawale S (2020) House price forecasting using machine
learning. In: ICAST. https://doi.org/10.2139/ssrn.3565512
2. Lu S, Li Z, Qin Z, Yang X, Goh RSM (2017) A hybrid regression technique for house prices
prediction. In: IEEM, pp 319–323. https://doi.org/10.1109/IEEM.2017.8289904
3. Truong Q, Nguyen M, Dang H, Mei B (2020) Housing price prediction via improved machine
learning techniques. Procedia Comput Sci 174:433–442, ISSN 1877-0509
4. Ribeiro MT, Singh S, Guestrin C (2016) Model-agnostic interpretability of machine learning.
Preprint, arXiv:1606.05386
5. Dataset Source: https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data
6. Kotsiantis SB, Kanellopoulos D, Pintelas PE (2006) Data preprocessing using supervised learning.
Int J Comput Sci 1(1), ISSN: 1306-4428
7. Abbasi S (2020) Advanced regression techniques based housing price prediction model. https://
doi.org/10.13140/RG.2.2.18572.87684
Product Integrity Maintenance
and Counterfeit Avoidance System Based
on Blockchain
1 Introduction
Since the evident growth of globalization, many big organizations have followed a globalized
approach to manufacturing in order to reduce costs. This transition, however, comes at a cost,
as rendering a general model of quality control becomes complicated due to a multitude of social
and political factors. With increasing globalization, the supply chain also becomes more complex
and thus becomes a contributing factor to the complicated nature of quality control. This
situation leaves the door open for an increase in counterfeiting. Counterfeiting is not
restricted to a particular industry sector but rather affects a wide array of industries; it
damages the brand value of the organizations that manufacture the genuine products and, by
extension, impacts the global economy.
Many studies that have tracked the growth of counterfeiting have observed a marked increase in
sales of these products; thus, it is quite evident that counterfeiting has become a true threat
to many organizations and has continued to escalate over the years. Based on the report on the
enforcement of Intellectual Property Rights by EU customs [1], it was observed that the total
number of cases of counterfeiting increased by 32% in 2019 in the European Union. As seen in
Fig. 1, there is a sharp increase in the number of counterfeit products being produced every
year, and thus it can be established that counterfeiting is a serious and escalating problem
that is impacting the global economy.
Fig. 1 Number of cases registered across the years by the EU customs [1]
Counterfeiting has a far-reaching impact on companies, governments and, by extension, society as
a whole. The direct impact of counterfeiting is felt by the victim companies, which face a loss
in sales and brand value. However, counterfeiting also impacts customers, who often cannot tell
the difference between these products and genuine products and can end up with a poor-quality
product. Furthermore, counterfeit products undermine the efforts of Intellectual Property Rights
and copyright standards, thereby directly affecting the capability of governments to enforce
these standards across industries.
The inherent immutable nature of Blockchain along with traceability could be
considered as a possible technological solution to solve the counterfeiting problem.
By using Blockchain with barcode-based authentication, a precise and robust product
authentication system can be created. The decentralized nature of Blockchain further
improves the robustness of the system and also helps in maintaining trust across the
supply chain. The proposed solution uses Blockchain to address the issue of counterfeiting by
providing a product authentication system based on this novel technology.
2 Literature Review
There are many approaches that address this problem, but secure authentication still stands as a
recurring problem for physical products. There has been previous work that provides a system for
product authentication. Turcu et al. [2] propose the use of a Radio Frequency Identification
(RFID)-based integrated system to solve this issue by providing a low-cost solution for
rendering traceability and a software-based solution for controlled distribution across the
supply chain. Rotunno et al. [3] provided a clear analysis of the impact of traceability systems
in this context, both positive and negative, by considering the pharmaceutical supply chain. The
observations made in [3] provide insights into the impact of traceability systems on the
operations side of the supply chain. Apart from these, [4] illustrated the implementation of
serialization and traceability in a pharmaceutical packaging company. Shi et al. [5]
illustrated the design and implementation of a secure traceability system for supply
chains that use an RFID integrated system, based on the EPC global network. Brock
and Sanders [6] also presented a representation of such a system along with a method for
anti-counterfeit barcode labeling. Pun et al. [7] explain how Blockchain can be adapted to
combat deceptive counterfeits in markets where customers have an intermediate level of distrust
of products. Toyoda et al. [8] used Blockchain as the core philosophy to build a Product
Ownership Management System (POMS) for RFID-tagged products in order to tackle counterfeiting in
the context of the post-supply chain.
A general idea shared among most of the implementations of a product authen-
tication system is the usage of RFID-based systems. Various implementations have
previously been done to authenticate RFID tagged products. A general review of
RFID-based product authentication was presented in Lehtonen et al. [9]. One general approach,
based on what the product is, is object-specific feature-based authentication. This provides a
degree of assurance of genuineness and makes the cloning of products more difficult.
Object-specific features, such as physical or chemical features, called unique product
identifiers, are used for creating tags for different products,
and these tags are used for authentication [10]. The RFID tags store a signature value
based on the unique product identifier, unique tag identifier, signature method and
a private key [9]. An individual who wants to verify the authenticity of the product
can thus use the public validation key to do so. Another approach for an RFID-based
authentication system is tag authentication. In this approach, rather than the assump-
tion of object-specific features being hard to clone, the focus is on security features
that are hard to clone. In this scenario, the reader device just has to verify whether the
tag has knowledge about a certain secret key. Such a system generally uses authen-
tication protocols based on cryptographic primitives such as hashing or symmetric
key operations [9]. Location-based authentication is another approach for product
authentication. This type of authentication is targeted and restricted to an environment of a
certain size; therefore, it does not prevent cloning of products on a large scale, but it does
avoid cloning within a limited scope. This approach is sometimes referred to as a
track-and-trace-based plausibility check [9]. Serialization is another way of combating
counterfeiting. An example of this is provided in [11], where a Victorian painter is considered
who uses
3 Blockchain as a Solution
Blockchain has become a trending term in the technical ecosystem. With the introduction of the
Ethereum Blockchain, Blockchain shifted into a prevalent technology that could help tackle
real-world problems. This section gives a detailed view of Blockchain, Ethereum and the concept
of Decentralized Applications (DApps).
3.1 Blockchain
3.2 Ethereum
4 System Architecture
The previous section presented a clear view of Blockchain technology along with
its real-world implementations. Blockchain can be used as a solution to tackle coun-
terfeiting as it inherently exhibits trust, thus preventing a single point of failure or
reliance on a trusted third party.
When a user scans a tag present in the product, a SCAN request is initiated and
sent to the Authentication Module. The unique product identifier is extracted in
this module, and the identifier is queried to both the centralized database and the
Blockchain controller. Once the Blockchain controller receives this identifier, it trig-
gers the Blockchain Module by transferring the identifier. The Authentication Module
then waits for confirmation from both the Blockchain controller and the centralized
database. Based on these confirmations, the Authentication Module will confirm to
the user whether a particular product is genuine or not. Figure 2 shows a clear repre-
sentation of interactions of the Authentication Module with the centralized database
and the Blockchain Module.
When the Blockchain Module receives the unique product identifier, a new transaction is
requested from an EOA, which is then initiated by the CA with the identifier passed as the
payload. This transaction invokes the authentication procedure present in the Smart Contract
with the payload. This procedure then verifies the availability of the identifier in the
Blockchain and returns true if the identifier is present or false if it is not. Once the
transaction is completed, it is mined and added to the Blockchain.
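The decision logic of the Authentication Module can be summarised by the following hypothetical Python sketch; the class and method names (handle_scan, is_registered) are illustrative stand-ins rather than the authors' implementation, and the Blockchain transaction is reduced here to a simple lookup.

```python
class InMemoryContract:
    """Toy stand-in for the Smart Contract's authentication procedure."""
    def __init__(self, registered_identifiers):
        self._identifiers = set(registered_identifiers)

    def is_registered(self, identifier: str) -> bool:
        # In the real system this check runs inside a transaction initiated
        # by an EOA and is mined into the Blockchain once completed.
        return identifier in self._identifiers

class AuthenticationModule:
    def __init__(self, database: set, contract: InMemoryContract):
        self.database = database      # stand-in for the centralized database
        self.contract = contract      # stand-in for the Blockchain controller

    def handle_scan(self, product_identifier: str) -> bool:
        # The product is reported genuine only if both sources confirm it
        in_db = product_identifier in self.database
        on_chain = self.contract.is_registered(product_identifier)
        return in_db and on_chain
```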
As mentioned in the previous section, there are generally four types of users in
the system: Company, Distributor, Retailer and Customer. This section observes the
interaction of these various types of users with the system.
1. Company: Company refers to the organization that produces a particular product
and uses the proposed system as an anti-counterfeiting tool. The sequence of operations and
the interactions of this user with the system are presented in Fig. 3. A user of type Company
first has to register for the system and authenticate themselves for each session. Once
authentication is successful, the user can add various products from their inventory to the
system. It is worth noting that whenever the Company adds the product information, the general
product information is stored in the centralized database while the unique product identifier
is stored in the Blockchain, which is then used for authentication.
2. Distributor: Distributors are generally the next step after manufacturing. These users are
responsible for the distribution of products among retailers. Similar to the Company, the
Distributor also has to be registered with the system in order to use it. The Distributor
scans the tag on the product, extracts the unique product identifier, and uses it to
authenticate the product (see Sect. 4). If the product is authenticated successfully, the
Distributor sends the product further along the supply chain; otherwise, the product is
rejected. Figure 4 shows the interactions of the Distributor with the proposed system.
3. Retailer: Retailers are the next step in the supply chain. These users receive
the product from the Distributors and place them in the market. The interactions
of the Retailer are similar to those of a Distributor. The Retailer scans the tag of the
product and uses the unique product identifier to verify the authenticity of the product.
Based on the result, the Retailer decides whether or not to place the product in the market.
Figure 5 represents a clear view of the interaction of the Retailer with the proposed system.
4. Customer: Customer is the last and the most important step in a supply chain.
Unlike the previous user types, Customers do not need to authenticate themselves
in the system. Customers extract the unique product identifier from the product
tags and verify them against the proposed system. The system then responds
with a positive or a negative result for the authenticity of the product. Therefore,
the customer can verify if the product is genuine or not. Figure 6 represents the
interaction between the Customer and the proposed system.
The proposed system was developed and checked for functionality. The visual results of the
system are presented in this section along with a few observations made during implementation.
The most important cases are covered in order to demonstrate the functionality of the proposed
system.
1. Addition of Products to the system: An authenticated Company can access the form to add
products into the system. Using this form, the Company can provide various pieces of
information about the product. The representation of this portal is presented in Fig. 7.
The system then generates a unique product identifier in the form of a barcode and displays
it as shown in Fig. 8.
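A hypothetical way to mint such an identifier is sketched below; the hashing scheme and truncation length are assumptions used only for illustration, and rendering the value as a barcode is left to a dedicated library.

```python
import hashlib
import uuid

def mint_product_identifier(company_id: str, product_name: str) -> str:
    nonce = uuid.uuid4().hex                         # ensures a distinct value per product unit
    payload = f"{company_id}:{product_name}:{nonce}".encode()
    return hashlib.sha256(payload).hexdigest()[:24]  # shortened for barcode encoding
```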
7 Conclusion
References
1 Introduction
Safety from fire hazards is one of the most important concerns of a city as well as a country.
The fire service is an emergency, quick-response service, and under the Constitution it comes
under the 12th Schedule dealing with municipal functions [1]. The Indian insurance companies
estimate that among the major losses reported in the year 2007–2008, about 45% of the claims
were due to fire losses. According to another study, it is estimated that about Rs. 1000 crores
are lost every year due to fire hazards [2]. Another interesting fact is that from 2011 to 2012,
fire and rescue services (FRSs) in Britain received 584,500 callouts, among which 53.4% were
false alarms. So, not only fire hazards but also the frequency of false alarms is a matter of
concern.
In previous work, we can notice different types of approaches to detect fire and smoke and also
to reduce false alarms. In some recent works, two different types of alarming systems for fire
safety are popular: (1) traditional sensor-based fire alarms (consisting of thermal, smoke, and
heat sensors) and (2) machine learning-based fire detection systems (using CNNs, ANNs, etc.).
For the first type of system, the sensors require a sufficient intensity of fire to detect it
correctly. The main drawback of this type of system is that it needs a certain amount of time
for detection, by which point considerable damage may already have occurred. The second type of
fire detection system overcomes this drawback efficiently.
In a recent research work, Valikhujaev and Abdusalomov proposed an algorithm that uses a dilated
CNN to reduce time-consuming effort [3]. Their method extracts practical features to train the
model automatically. Jadon and Omama established a new computer vision-based fire detection
model, called FireNet, which
is lightweight and also suitable for mobile devices [4]. However, the main issue for video or
image-based systems is that they need sufficient light to detect fire or smoke and may not work
well in the dark. Kharisma and Setiyansah used three different types of sensors and a
microcontroller for their system, i.e. LM35 for heat, TGS2600 for smoke, and QM6 for gas [5].
Their system has a limiting value for heat, smoke, and gas, above which it triggers the alarm.
This may tend to trigger false alarms frequently. They provide an SMS alert in addition to the
alarm, which is much appreciated. Dubey and Kumar's work provides early fire detection in the
case of forest fires by detecting heat and smoke and using a flame sensor connected to a
Raspberry Pi microcontroller, with prediction performed by a fully connected feed-forward
neural network [6].
In Park and Lee's research, a fire detection framework is introduced that composes multiple
machine learning algorithms together with an adaptive fuzzy algorithm; a Direct-MQTT scheme
based on SDN is also developed to solve the traffic concentration problem of traditional MQTT
[7]. Evalina and Azis mainly focused on fire incidents that happen due to LPG leakage, so they
described how the MQ-6 gas sensor with the ATmega8 microcontroller can be used to detect LPG gas
leakage. Here also a limit is set for the gas sensor, above which a buzzer sounds and an LCD
shows "leakage" as an alarm [8]. Bahrepour and Meratnia used a set of ionization, temperature,
CO, and photoelectric sensors as an optimal set of sensors, and they assumed that every sensor
is present at the sensor node, i.e. a centralized system. For the algorithm, they showed a
comparison among a feed-forward neural network, Naïve Bayes, and D-FLER. In their paper, it is
established that Naïve Bayes is suitable for a centralized system, whereas D-FLER is useful for
a distributed system [9]. In 2019, Sarwar and Bajwa worked on an IoT-based application that
warns about fire, using an adaptive neuro-fuzzy inference system (ANFIS) to obtain a minimum of
false alarms. For the experiment, they used the Arduino UNO R3 microcontroller, which is based
on the ATmega328P [10]. Wu and Zhang proposed a combination of transposed CNN and long
short-term memory models for a real-time forecast of tunnel fire and smoke. They used a
numerical dataset and images to train the layers of their model [11]. Angayarkkani and
Radhakrishnan established a method for forest fire detection using spatial data and artificial
intelligence. They used a radial basis function neural network on analysed spatial data, but it
is mostly useful once the fire has already spread and does not reduce false alarms [12].
Abdusalomov and Baratov sought to improve fire detection using classification and a surveillance
system based on the YOLOv3 algorithm. With this approach, they obtained 77.8% testing accuracy
for a 57 h training time as well as 82.4% testing accuracy. But this is also an image-based
system, which occupies more space and needs sufficient light to detect fire [13]. In August
2020, Suhas and Kumar built a fire detection model using transfer learning. They used different
models (ResNet-50, InceptionV3, and Inception-ResNetV2) for feature extraction and different
machine learning algorithms (decision tree, Naïve Bayes, logistic regression, and SVM) for
prediction. Their results show that features extracted by ResNet-50 and then classified by the
SVM algorithm give the highest accuracy (97.80%) [14].
After this survey, it can be stated that besides new image- and video-based fire detection
systems, traditional sensor-based systems are also improving day by day.
2 Methodology
The main focus of this paper is to build a machine learning-based technique that will detect
fire more accurately. For this purpose, we have used an artificial neural network (ANN), which
can learn and improve its results by itself. Here, a sequential model, which is basically a
feed-forward neural network, has been used. In our work, we have tried to use different
activation functions in the different layers (input, hidden, and output) of the neural network
to obtain higher accuracy. In a neural network, most of the time, only one activation function
is used in every layer. The detailed workflow is presented in the following subsections (Fig. 1).
The accuracy and performance of a neural network mostly depend on the number of layers of the
model and the activation functions used in that model. Activation functions introduce
nonlinearity into an ANN; without an activation function, a neural network behaves like a
linear regression model [15] (Figs. 2 and 3).
Equation of linear regression: Y = mX + C
Equation of ANN (without activation function): Y = Σ(Xi · Wi) + bias
Equation of ANN (with activation function): Y = Activation(Σ(Xi · Wi) + bias)
So, basically, an activation function transforms an input signal into an output signal, and
that output signal is then fed to the next layer as an input. Activation functions act on
stacks of layers. If the output is far from the desired value, the network calculates the error
and updates the weight and bias values of every neuron.
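A tiny NumPy sketch of a single neuron with and without an activation makes this concrete; the sigmoid choice here is only an example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w, bias, activation=sigmoid):
    # Without the activation this reduces to the linear form (Xi * Wi) + bias;
    # the activation supplies the nonlinearity discussed above.
    return activation(np.dot(x, w) + bias)
```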
There are different types of activation functions for neural networks. We have chosen four of
the most popular: (1) the sigmoid function, (2) the linear function, (3) the ReLU function, and
(4) the Softmax function [15].
Sigmoid function—It is an 'S'-shaped nonlinear function and a common choice as an activation
function for neural networks. The sigmoid function gives values from 0 to 1.
Linear function—The linear function defines a straight line that passes through the origin. It
is simply a form of linear regression; that is why it is not as helpful in the case of neural
networks, as it cannot update the values of the weights and biases.
ReLU function—ReLU stands for rectified linear unit. It is also a nonlinear function and is
computed very quickly. An advantage of this function is that it does not activate all the
neurons at the same time. But, like the linear function, it sometimes fails to update the
values of the weights and biases because the gradient of ReLU can be 0.
Softmax function—It is a generalized combination of multiple sigmoid functions and is very
useful for classification problems. The sigmoid function is mostly used for binary
classification, but Softmax is useful for both binary and multi-class classification (Table 1).
3.1 Dataset
We have built a numerical sensor dataset by collecting data from the NIST website
https://www.nist.gov/el/nist-report-test-fr-4016. In that experiment, different kinds of fire
hazard situations, including flaming and smouldering of different items (chair, mattress,
cooking oil, etc.), were observed in a controlled experimental environment. O2, CO2, and CO gas
concentrations, temperatures at multiple positions, and smoke obscuration in the structure were
recorded. From seven datasets (sdc01–sdc07), a total of 5450 entries were obtained to make our
dataset; 70% of it was used to train and 30% to test the model. Only six sensor channels
(TCB_1, TCFIRE, SMB_1, GASB_1, GASB_3 and GASB_6) were selected, on the basis of their
importance, to feed the model. The description of the column headings and units can be found at
https://www.nist.gov/document/hookupmh1csv.
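The following sketch shows how such a merged dataset could be loaded and split 70/30; the CSV file name and the label column name are assumptions about how the NIST files were combined, while the six feature columns follow the list above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

FEATURES = ["TCB_1", "TCFIRE", "SMB_1", "GASB_1", "GASB_3", "GASB_6"]

def load_fire_dataset(csv_path="fire_sensors.csv", label_col="fire"):
    df = pd.read_csv(csv_path)       # merged sdc01-sdc07 entries (5450 rows)
    X = df[FEATURES].values
    y = df[label_col].values         # 1 = fire, 0 = no fire (assumed labelling)
    # 70% of the entries for training, 30% for testing
    return train_test_split(X, y, test_size=0.3, random_state=42)
```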
3.2 Explanation
In this work, we have compared the results obtained from some popular activation functions
after feeding the model with our dataset. In addition, a new method of applying a different
activation function at each layer, and how these combinations behave, has been shown. To build
our model, we have used the initializer called "uniform". For the output layer, the loss
function "binary cross-entropy" [16, 17] was chosen from among popular loss functions such as
"mean squared error" and "mean absolute error". To optimize the error between the true value
and the predicted value, the "Adam" optimizer has been adopted [18, 19].
For the activation functions, at first we ran our code using the same activation function in
all the layers (input, hidden, and output), and then we combined them and used different
activation functions in different layers. For the "SSL" combination, we adopted "sigmoid" in
the input layer, "Softmax" in the hidden layer, and "linear" in the output layer. For the "RLS"
combination, "ReLU", "linear", and "sigmoid" were used as the activation functions in the
input, hidden, and output layers, respectively. A comparison among all the results of our
computation is given in Table 2.
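A minimal Keras sketch of these combinations is shown below; the layer widths are illustrative assumptions, while the uniform-style initializer, binary cross-entropy loss, Adam optimizer, and the SSL/RLS activation orderings follow the text.

```python
from tensorflow import keras

def build_model(activations, n_features=6):
    model = keras.Sequential([
        keras.layers.Dense(8, activation=activations[0],
                           kernel_initializer="random_uniform",  # the "uniform" initializer
                           input_shape=(n_features,)),           # input layer
        keras.layers.Dense(8, activation=activations[1],
                           kernel_initializer="random_uniform"), # hidden layer
        keras.layers.Dense(1, activation=activations[2],
                           kernel_initializer="random_uniform"), # output layer
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

ssl_model = build_model(("sigmoid", "softmax", "linear"))  # SSL combination
rls_model = build_model(("relu", "linear", "sigmoid"))     # RLS combination
```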
It is clear from the table and plots that the first combination, SSL (input layer activated by
sigmoid, hidden layer activated by Softmax, and output layer activated by the linear function),
shows the best testing accuracy (97.61%). Sigmoid, linear, and the second combination, RLS
(input layer ReLU, hidden layer linear, and output layer sigmoid), also show satisfactory
results for both training and testing accuracy. In Fig. 4a and b, the confusion matrices of SSL
and RLS are shown. A confusion matrix shows how many times our model predicted the desired
class correctly on the testing dataset. It is clear from these that the "False positive" count
is 0 for both combinations, and our aim was to reduce the false positive value in the case of
fire detection.
Table 2 Performance results of different methods
Metric | Softmax-based layer | Sigmoid-based layer | Linear-based layer | Sigmoid–softmax–linear (SSL) | ReLU–linear–sigmoid (RLS)
Training accuracy (%) | 75.81 | 97.85 | 94.71 | 97.67 | 98.19
Testing accuracy (%) | 76.21 | 97.37 | 94.37 | 97.61 | 97.49
Mean squared error | 0.2379 | 0.02629 | 0.05626 | 0.02385 | 0.02507
Mean absolute error | 0.2379 | 0.02629 | 0.05626 | 0.02385 | 0.02507
Precision | 0.76 | 0.99 | 0.97 | 1.00 | 1.00
Recall | 1.00 | 0.97 | 0.96 | 0.97 | 0.97
F1-score | 0.86 | 0.98 | 0.96 | 0.98 | 0.98
The "True positive" and "True negative" counts are 1218 and 378, respectively, for the
combination SSL. The combination RLS shows 1197 "True positive" and 397 "True negative" values
after testing the model, which is also satisfactory. The mean squared error (MSE) and mean
absolute error (MAE) are also lower, by at least 0.3–0.1%, in these two combinations (Table 2).
The loss and accuracy curves during training are shown in Fig. 5a and b. The loss and accuracy
curves of combination SSL are shown in Fig. 5a, and those of combination RLS in Fig. 5b. In
Fig. 5a, the loss is quite high at the initial stage, but it then decreases logarithmically,
and the accuracy increases to almost 1 with increasing epochs. In Fig. 5b, we can see a
symmetric but opposite pair of loss and accuracy curves, which also shows a satisfactory result.
4 Conclusion
In this paper, a new approach that uses different activation functions in the different layers
of an ANN to detect fire has been shown. A comparison with existing methods and an explanation
of the results are provided accordingly. It is observed that the proposed approach reduces the
false positive score as well as increases the true positive value, although this can vary with
different types of datasets. We have used a two-layered ANN with an output layer which, in the
case of fire detection, shows promising results and will be useful and reliable for a
sensor-based fire detection system. As future scope of research, different results and
improvements may be obtained by adding layers and shuffling the activation functions.
References
1. Directorate general NDRF and civil defence (Fire) ministry of home affairs, N.D (2011) Fire
hazard and risk analysis in the country for revamping the fire services in the country (New
Delhi)
2. Goplani S, AP (2021) To study factors governing fire safety aspect of highrise building in
Ahmedabad region. IJCRT
3. Valikhujaev Y, Abdusalomov A (2020) Automatic fire and smoke detection method for
surveillance systems based on dilated CNNs. Atmosphere
4. Jadon A, Omama M (2019) FireNet: a specialized lightweight fire and smoke detection model
for real-time IoT applications. arXiv
5. Kharisma RS, Setiyansah A (2021) Fire early warning system using fire sensors, microcon-
troller, and SMS gateway. J Robot Control (JRC) 2(3)
6. Dubey V, Kumar P (2018) Forest fire detection system using IoT and artificial neural network,
Springer
7. Park JH, Lee S (2019) Dependable fire detection system with multifunctional artificial
intelligence framework. Sensors (MDPI)
8. Evalina N, Azis HA (2020) Implementation and design gas leakage detection system using
ATMega8 microcontroller. In: IOP conference series: materials science and engineering, IOP
Publishing
9. Bahrepour M, Meratnia N (2009) Use of AI techniques for residential fire detection in wireless
sensor networks. AIAI-2009 Workshops Proceedings
10. Sarwar B, Bajwa IS (2019) An intelligent fire warning application using IoT and an adaptive
neuro-fuzzy inference system. Sensors (MDPI)
11. Wu X, Zhang X (2021). A real-time forecast of tunnel fire based on numerical database and
artificial intelligence. Springer-Verlag GmbH, Germany
12. Angayarkkani K, Radhakrishnan N (2010) An intelligent system for effective forest fire
detection using spatial data. Int J Comput Sci Inf Secur
13. Abdusalomov A, Baratov N (2021) An improvement of the fire detection and classification
method using YOLOv3 for surveillance systems. Sensors (MDPI)
14. Suhas G, Kumar C (2020) Fire detection using deep learning. Int J Progressive Res Sci Eng
15. Sharma S, Sharma S (2020) Activation functions in neural networks. Int J Eng Appl Sci Technol
16. Sun M, Raju A (2016) Max-pooling loss training of long short-term memory networks for
small-footprint keyword spotting. In: spoken language technology workshop (SLT), IEEE
17. Kingma DP, Ba J (2015) Adam: a method for stochastic optimization. ICLR
18. Cleary TG. An analysis of the performance of smoke alarms. Fire Research Division,
Engineering Laboratory, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA
19. Chaurasia D, Shome SK (2021) Intelligent fire outbreak detection in wireless sensor network,
Springer, Singapore
Performance Evaluation of Spectral
Subtraction with VAD
and Time–Frequency Filtering
for Speech Enhancement
1 Introduction
The field of speech enhancement has progressed and developed on its own throughout
the years. Various algorithms were suggested during this period in response to
the growing demands of our technology-oriented way of life. Spectral subtraction
(SS) [1–4] is by far the most popular method in speech enhancement, possibly due
to its simplicity. A well-known shortcoming of the SS algorithms is the resulting
residual noise consisting of musical tones. Spectral smoothing has been proposed
as a solution to the musical noise problem; however, it results in low resolution and
variance [5].
Over the decades, VAD-based noise estimation techniques have been employed for estimating noise
in speech enhancement algorithms. Though much advancement has been made in VAD methods,
developing VAD techniques that are precise in real-time scenarios and work well at low
signal-to-noise ratios (SNRs) is still challenging. A combination of SS-VAD and a linear
predictive coding scheme was developed in prior work to increase the SNR and audibility features
of encoded audio recordings [6]. It was observed that the musical noise resulting from SS had an
adverse effect on encoding performance. The work in [3] described a noise
G. T. Yadava (B)
Nitte Meenakshi Institute of Technology, Bengaluru, Karnataka, India
e-mail: [email protected]
B. G. Nagaraja
K. L. E. Institute of Technology, Hubballi, Karnataka, India
e-mail: [email protected]
H. S. Jayanna
Siddaganga Institute of Technology, Tumkur, Karnataka, India
e-mail: [email protected]
suppression algorithm using SS. For various SNR conditions (−6 to 16 dB), experimental findings
revealed a 10 dB decrease in background noise. In comparison with the SS method, the proposed
method also achieved improved speech quality and reduced noise artifacts.
In recent studies, several speech enhancement techniques independent of VAD
and SNR have been reported [7, 8]. The performance evaluation of SS and different
minimum mean square error (MMSE)-based algorithms, viz. MMSE short time
spectral amplitude, β-order MMSE, Log MMSE, and adaptive β-order MMSE was
done in [8]. The objective, subjective, and composite objective measures showed
that the adaptive β-order MMSE technique outperforms the others. To reduce the
background noise, a zero-frequency filtering-based foreground speech separation
front-end enhancement scheme was introduced into the automatic speech recogni-
tion system [9]. An absolute word error rate reduction of 6.24% was observed in comparison with
the previously reported spoken query system performance [10]. In [11], a modified cascaded
median (MCM)-based speech enhancement algorithm was implemented using the TMS320C6416T
processor. The MCM-based system showed superior performance in terms of speech quality, memory
consumption, and execution time compared to the dynamic quantile tracking and cascaded median
techniques.
Another way to tackle the musical noise problem in SS is to combine an over-subtraction factor
with a spectral floor [5]. This approach has the shortcoming of degrading the required speech
data when musical tones are adequately reduced. To overcome this, we propose the combined
SS-time-frequency (SS-TF) filtering method. In the proposed method, a continuous recursive
algorithm is employed to compute the spectral magnitude of the noise. Also, time-frequency
filtering is used in place of the residual noise reduction technique to reduce the additive
noise. The remainder of the paper is organized as follows: Sect. 2 explains the implementation
of the proposed SS-TF filtering method. Section 3 describes the speech quality and
intelligibility evaluations and discusses the results. Finally, conclusions and future work are
presented in Sect. 4.
In this section, for completeness, we briefly describe SS-VAD and then present the
implementation of the proposed SS-TF method for speech enhancement.
The VAD in SS plays an important role by detecting the speech activity in the degraded signal
[12, 13]. The degraded speech data y(n), which is the sum of the original speech data s(n) and
the noise n(n), can be mathematically represented as

y(n) = s(n) + n(n) \qquad (1)
The block diagram of SS-VAD is shown in Fig. 1. The energy E, the normalized linear prediction
error NLPE, and the zero-crossing rate ZCR were calculated for each speech segment. All of
these parameters were used to calculate a decision factor Z.
The parameter Z_max was calculated over all the frames in the speech signal, and the ratio
Z/Z_max was used to determine whether a frame contains speech activity or not. The frames
without speech activity were considered for the estimation of the noise spectrum. Let μ be the
noise spectral magnitude estimate; then the spectral subtraction output can be written as
follows:
\tilde{S}_i(w) = |X_i(w)| - \mu_i(w); \quad w = 0, 1, \ldots, L-1, \; i = 0, 1, \ldots, M-1 \qquad (4)
where L is the length of the FFT and M is the number of frames. After the subtraction process,
any negative difference values are set to zero. The residual noise reduction can be done using
the following model:
\tilde{S}_i(w) = \begin{cases} S_i(w), & |S_i(w)| \geq \max|N_R(w)| \\ \min_{j=i-1,\,i,\,i+1} |S_j(w)|, & |S_i(w)| < \max|N_R(w)| \end{cases} \qquad (5)
where max|N_R(w)| is the maximum residual noise during the absence of speech activity. To reduce
the noise further in the regions without speech activity, an attenuation process was applied.
For each frequency bin, the noise estimate is computed by considering the average signal
magnitude spectrum |Y| over the non-speech frames. The spectral subtraction estimate can then be
calculated in the same way as presented in Eq. (5). The negative magnitudes are set to zero
after subtraction; this step is called half-wave rectification. Many of the SS algorithms
implemented in [6, 14–16] show that there is less suppression of musical noise when the noise
model is
non-stationary. To overcome this problem, we use a continuous recursive algorithm to compute the
spectral magnitude of the noise, which can be mathematically represented as follows:

N_i(k) = \begin{cases} \alpha N_i(k-1) + (1-\alpha) Y_i(k), & Y_i(k) \leq \beta N_i(k-1) \\ N_i(k-1), & Y_i(k) > \beta N_i(k-1) \end{cases} \qquad (6)
where Y_i(k) represents the spectral magnitude of the degraded speech samples at frame k in the
ith subband. A threshold, βN_i(k − 1), has been set to separate the speech and non-speech
frames. The value of α is 0.9, and β is in the range of 1.5–2.5. When the estimated spectral
value Y_i(k) is greater than the threshold, the frame is treated as the closest speech detection
and the recursive update stops. The TF filtering is performed to minimize the musical noise in
the corrupted speech data using preceding frames and several frames following the frame of
interest. To analyze the TF filtering on a particular frame, we consider the two regions shown
in Fig. 3:
P_B(m, n) = \sum_{i=n-b_2}^{n+b_2} \sum_{w=m-b_1}^{m+b_1} Y_i(w)

P_A(m, n) = \sum_{i=n-a_2}^{n+a_2} \sum_{w=m-a_1}^{m+a_1} Y_i(w) - P_B(m, n) \qquad (7)
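A compact NumPy sketch of the recursive noise estimate and the magnitude subtraction described above is given below; Y is assumed to be a (frames × bins) magnitude spectrogram, and the exact update rule is a standard recursive average offered as an assumption rather than the authors' code.

```python
import numpy as np

def spectral_subtraction(Y, alpha=0.9, beta=2.0):
    """Y: magnitude spectrogram of the degraded speech, shape (frames, bins)."""
    n_frames, _ = Y.shape
    N = Y[0].copy()                          # initial noise magnitude estimate
    S = np.zeros_like(Y)
    for k in range(n_frames):
        speech = Y[k] > beta * N             # per-bin speech/non-speech decision
        # Recursively update the noise estimate only where no speech is detected
        N = np.where(speech, N, alpha * N + (1.0 - alpha) * Y[k])
        S[k] = np.maximum(Y[k] - N, 0.0)     # subtraction + half-wave rectification
    return S
```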
Table 1 PESQ values for proposed and existing methods for the assessment of speech quality
Algorithm Types of noise 0 dB 5 dB 10 dB
SS-VAD in [6] Airport 1.9085 2.1752 2.4814
Exhibition 1.6571 1.9992 2.3984
Restaurant 1.9950 2.0314 2.4362
Station 1.6517 2.1396 2.5386
Proposed (SS-TF) Airport 1.9501 2.1758 2.4778
Exhibition 1.6614 2.0123 2.3811
Restaurant 2.0900 2.0364 2.4101
Station 1.6801 2.1294 2.5111
Table 2 NCM values for proposed and existing methods for the assessment of speech intelligibility
Algorithm Types of noise 0 dB 5 dB 10 dB
SS-VAD in [6] Airport 0.5641 0.6757 0.8286
Exhibition 0.4886 0.6072 0.7736
Restaurant 0.4963 0.6223 0.8337
Station 0.5061 0.7528 0.8466
Proposed (SS-TF) Airport 0.5744 0.6777 0.8247
Exhibition 0.4887 0.6099 0.7514
Restaurant 0.5010 0.6314 0.8247
Station 0.5179 0.7491 0.8310
4 Conclusions
References
4. Kumar B (2015) Spectral subtraction using modified cascaded median based noise estimation
for speech enhancement. In: Proceedings of the sixth international conference on computer
and communication technology 2015, pp 214–218
5. Jelinek M, Salami R (2004) Noise reduction method for wideband speech coding. In: 2004
12th European signal processing conference, IEEE, pp 1959–1962
6. Thimmaraja YG, Nagaraja BG, Jayanna HS (2021) Speech enhancement and encoding by
combining SS-VAD and LPC. Int J Speech Technol 24(1):165–172
7. Kumar B (2021) Comparative performance evaluation of greedy algorithms for speech
enhancement system. Fluctuation Noise Lett 20(02):2150017
8. Kumar B (2018) Comparative performance evaluation of MMSE-based speech enhancement
techniques through simulation and real-time implementation. Int J Speech Technol 21(4):1033–
1044
9. Shahnawazuddin S, Thotappa D, Dey A, Imani S, Prasanna SRM, Sinha R (2017) Improvements
in IITG assamese spoken query system: background noise suppression and alternate acoustic
modeling. J Signal Proc Syst 88(1):91–102
10. Shahnawazuddin S, Deepak KT, Sarma BD, Deka A, Prasanna SRM, Sinha R (2015) Low
complexity on-line adaptation techniques in context of assamese spoken query system. J Signal
Proc Syst 81(1):83–97
11. Kumar B (2019) Real-time performance evaluation of modified cascaded median based noise
estimation for speech enhancement system. Fluctuation Noise Lett 18(04):1950020
12. Ramirez J, Górriz JM, Segura JC (2007) Voice activity detection: fundamentals and speech
recognition system robustness. Robust Speech Recogn Underst 6(9):1–22
13. Jainar SJ, Sale PL, Nagaraja BG (2020) VAD, feature extraction and modelling techniques for
speaker recognition: a review. Int J Sign Imaging Syst Eng 12(1–2):1–18
14. Kumar B (2016) Mean-median based noise estimation method using spectral subtraction for
speech enhancement technique. Indian J Sci Technol 9(35)
15. Tan Z-H, Dehak N et al (2020) rVAD: An unsupervised segment-based robust voice activity
detection method. Comput Speech Lang 59:1–21
16. Yadava GT, Jayanna HS (2019) Speech enhancement by combining spectral subtraction and
minimum mean square error-spectrum power estimator based on zero crossing. Int J Speech
Technol 22(3):639–648
17. Hu Y, Loizou PC (2007) Subjective evaluation and comparison of speech enhancement
algorithms. Speech Commun 49:588–601
18. Rix AW, Beerends JG, Hollier MP, Hekstra AP (2001) Perceptual evaluation of speech quality
(PESQ)-a new method for speech quality assessment of telephone networks and codecs. In:
2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings
(Cat. No. 01CH37221) vol 2, pp 749–752
19. Hu Y, Loizou PC (2007) Evaluation of objective quality measures for speech enhancement.
IEEE Trans Audio Speech Lang Process 16(1):229–238
20. Hu Y, Loizou PC (2006) Subjective comparison of speech enhancement algorithms. In: 2006
IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, vol
1, IEEE, pp I–I
21. Yuan W (2021) Incorporating group update for speech enhancement based on convolutional
gated recurrent network. Speech Commun
22. ITU-T Recommendation (2001) Perceptual evaluation of speech quality (PESQ): an objective
method for end-to-end speech quality assessment of narrow-band telephone networks and
speech codecs. Rec. ITU-T P. 862
A Unified Libraries for GDI Logic
to Achieve Low-Power and High-Speed
Circuit Design
Jebashini Ponnian, Senthil Pari, Uma Ramadass, and Ooi Chee Pun
1 Introduction
J. Ponnian (B)
Infrastructure University Kualalumpur, Hulu Langat, Malaysia
e-mail: [email protected]
S. Pari · O. C. Pun
Multimedia University, Cyberjaya, Malaysia
e-mail: [email protected]
O. C. Pun
e-mail: [email protected]
U. Ramadass
Jeppiaar Institute of Technology, Kanchipuram, India
The primary contributions of this work are the signal connectivity of the GDI cell and the
creation of GDI libraries in four different ways. The first presents the basic GDI cell without
a buffer, the second presents GDI cell creation including a buffer, the third presents GDI cell
creation using F1 and F2, and the last specifies the GDI cell using level restoration. All these
libraries are created using the Silterra 130 nm process in Mentor Graphics Pyxis software, and
parameters such as rise time, fall time, delay, and dynamic power have been analysed. These four
library cells are compared with their CMOS counterparts and reveal significant improvements in
terms of transistor count, delay and power.
This section illustrates the basics of GDI cell representation, signal connectivity and the
creation of the various library cells.
GDI is a well-known logic technique that supports low-power design. The fundamental cell is
derived from the CMOS inverter configuration, with the gate inputs shorted and the diffusion
terminals connected to inputs such as variable inputs, the power supply VDD, or ground VSS.
This modified configuration offers minimal transistor count for a design, so the combined
delay-power factor can be minimized. Though this technique supports low-power and high-speed
design, the output of the circuit is degraded due to threshold-voltage variation and complexity
in the fabrication process. The basic structure of the GDI cell is shown in Fig. 1.
Fig. 1 Basic structure of the GDI cell (gate input VG, P-diffusion input VP to the PMOS transistor T(PMOS), N-diffusion input VN to the NMOS transistor T(NMOS), common output Y)
The general structure of the GDI cell is MUX-based. Therefore, the GDI cell for any Boolean
function can be derived using the MUX mapping algorithm demonstrated in Fig. 2.
The algorithm maps the output of the gate to be designed onto a 2 × 2 K-map and observes the
literal entries in the map column-wise. If a P-diffusion column is (0, 0) or (1, 1), the
P-diffusion node is connected to 0 or 1, respectively. If the column is (0, 1) or (1, 0), then
the literal assigned along the row is connected to the P-diffusion. For elucidation, the AND
gate realization is shown in Fig. 3.
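The column rule can be expressed as a short Python sketch; the function names are illustrative, and the complemented-literal connection for the (1, 0) column is this sketch's interpretation of the mapping.

```python
def diffusion_connection(column, literal="B"):
    """column = (f(literal=0), f(literal=1)) for a fixed value of the gate input."""
    if column == (0, 0):
        return "GND"                 # constant 0 column
    if column == (1, 1):
        return "VDD"                 # constant 1 column
    if column == (0, 1):
        return literal               # output follows the row literal
    return f"not {literal}"          # (1, 0): complemented row literal

def map_gdi(truth, gate="A", literal="B"):
    """truth[(g, b)] is the desired output; returns the G, P and N connections."""
    p_col = (truth[(0, 0)], truth[(0, 1)])   # column with gate input = 0 -> P diffusion
    n_col = (truth[(1, 0)], truth[(1, 1)])   # column with gate input = 1 -> N diffusion
    return {"G": gate,
            "P": diffusion_connection(p_col, literal),
            "N": diffusion_connection(n_col, literal)}

# Example: 2-input AND, f(A, B) = A AND B  ->  {'G': 'A', 'P': 'GND', 'N': 'B'}
print(map_gdi({(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}))
```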
The basic GDI cell created using the MUX-mapped signal connectivity is shown in Fig. 4. The
patterns are created for all the primitive gates, and the parameter values are tabulated using
the Mentor Graphics software. This library is optimal in terms of delay and power but suffers
from threshold variation and the full-swing problem. This can be mitigated by inserting a buffer
at each node, which increases the transistor count but provides full swing.
For GDI logic, F1 and F2 are functional components similar to NAND/NOR for CMOS logic. All
primitive structures can be implemented using F1 and F2.
2.6 Library for Basic GDI Cell with F1 and F2 Using Level Restoration
The experimentation is done using the Silterra 130 nm process in Mentor Graphics Pyxis software,
and parameters such as rise time, fall time, delay, and dynamic power have been analysed. The
simulation setup is defined over 0–1.2 V in steps of 0.2 V and tested for all design corners.
Extensive simulation is performed to cover every possible input pattern for all the logic gates.
The simulated results of the four GDI libraries along with CMOS logic are shown in Tables 1, 2,
3, 4 and 5.
Fig. 8 2-input GDI library cells using F1 and F2 with restoration circuit
4 Discussion
The comparative analysis with respect to delay and PDP is shown in Fig. 9. From this study, it
is observed that GDI with F1 and F2 using the restoration circuit produces the optimal delay and
PDP compared with the other three libraries, at the cost of additional transistor count.
For CMOS logic, only the XOR and XNOR cells are slightly comparable with GDI using F1 and F2
with restoration. The GDI library listed in Table 1 utilizes the fewest transistors, but its
delay is high compared to the other three libraries.
Table 1 Simulated Input for 2-input GDI library cells using Mux signal connectivity
GDI RS (PS) FT (PS) Delay (nS) Tr PD (PW) Tr * Delay PDP
AND 1.55 3.06 12.67 2 282.9 25.34 3584.343
OR 20.86 910.63 12.66 2 330.56 25.32 4184.8896
NAND 180.44 136.06 24.9 4 390.69 99.6 9728.181
NOR 137.76 148.83 24.96 4 480.07 99.84 11,982.547
XOR 180.49 29.48 28.89 4 942.84 115.56 27,238.648
XNOR 31.69 174.49 28.03 4 912.07 112.12 25,565.322
MUX 21.11 19.21 19.96 2 164.97 39.92 3292.8012
F1 173.2 19.21 14.87 2 222.88 29.74 3314.2256
F2 11.33 173.16 14.08 2 230.09 28.16 3239.6672
Table 2 Simulated Input for 2-input GDI library cells including buffer
GDI RS (PS) FT (PS) Delay (nS) Tr PD (PW) Tr * Delay PDP
AND 76.19 116.02 11.67 6 284.23 70.02 3316.9641
OR 103.84 169.8 11.66 6 332.56 69.96 3877.6496
NAND 39.68 36.92 22.9 8 393.61 183.2 9013.669
NOR 44.62 20.81 22.96 8 484.07 183.68 11,114.247
XOR 40.47 120.97 24.89 8 944.82 199.12 23,516.57
XNOR 110.1 115.75 24.03 8 921.07 192.24 22,133.312
MUX 68.93 115.69 17.96 8 166.97 143.68 2998.7812
F1 37.91 169.81 12.87 6 226.88 77.22 2919.9456
F2 77.21 22.99 12.08 6 234.09 72.48 2827.8072
Table 3 Simulated results for 2-input GDI library cells including F1 and F2
GDI RT (ps) FT (ps) Delay (ns) Tr PD (pW) Tr * Delay PDP
AND_F1 136.06 180.49 10.17 4 286.23 40.68 2910.9591
OR_F2 148.83 31.69 10.66 4 336.56 42.64 3587.7296
NAND_F1 29.48 21.11 20.9 6 394.62 125.4 8247.558
NOR_F2 137.76 21.11 20.96 6 486.22 125.76 10,191.171
XOR_F1, F2 180.49 173.2 22.82 8 946.41 182.56 21,597.076
XNOR_F1, F2 31.69 40.47 22.32 8 923.71 178.56 20,617.207
MUX_F1, F2 21.11 110.1 15.96 10 166.96 159.6 2664.6816
F1 173.2 19.21 14.87 2 222.88 29.74 3314.2256
F2 11.33 173.16 14.08 2 230.09 28.16 3239.6672
Table 4 Simulated results for 2-input GDI library cells including F1 and F2 with restoration
GDI RT (ps) FT (ps) Delay (ns) Tr PD (pW) Tr * Delay PDP
AND_F1 396.79 795.56 9.98 5 288.32 49.9 2877.43
OR_F2 474.82 122.83 9.99 5 338.55 49.95 3382.11
NAND_F1 134.41 65.83 18.81 7 396.72 131.67 7462.30
NOR_F1 50.72 106.33 18.98 7 488.24 132.86 9266.79
XOR_F1 + F2 60.93 62.96 19.96 17 948.44 339.32 18,930.8
XNOR_F1 + F2 245.68 780.83 19.95 17 925.72 339.15 18,468.1
MUX_F1 + F2 367.77 59.44 10.78 19 168.22 204.82 1813.41
F1 811.08 178.54 10.98 5 224.73 54.9 2467.53
F2 179.14 796.64 10.99 5 232.1 54.95 2550.77
5 Conclusion
This work presents a systematic creation of four different GDI libraries to implement combinational and sequential circuits with minimal power and delay. The proposal also defines a unified connectivity model for GDI logic using the MUX mapping algorithm. The experimentation is done using the Silterra 130 nm process in Mentor Graphics Pyxis software, and parameters such as rise time, fall time, delay and dynamic power have been analysed. From this work, it is observed that GDI with F1 and F2 using the restoration circuit produces the best delay and PDP among the four libraries, at the cost of an additional transistor count.
Fig. 9 Comparative analysis of delay (ns) and PDP for the GDI, GDI_buffer, GDI_F1,F2, GDI_F1,F2_restoration and CMOS libraries across the AND, OR, NAND, NOR, XOR, XNOR, MUX, F1 and F2 cells
Detection of Diabetic Retinopathy Using
Convolution Neural Network
1 Introduction
During the initial years of discovery of this disease, most of the patients were diag-
nosed with diabetes and greater than 65% of the patients were diagnosed with
diabetes were discovered with retinopathy. From 2015 to 2019, there were more
than 27% of cases of diabetic retinopathy. Person who is diagnosed with diabetic
retinopathy is prone to blindness. One out of 3 people diagnosed with diabetes will
have diabetic retinopathy and 1 out of 10 people will suffer from vision loss. Diabetic
retinopathy can eventually lead to vision loss. Diabetic retinopathy causes damage
in the blood vessels in the retinal region of the eye [1]. There are four various stages
in diabetic retinopathy, namely mild non-proliferative retinopathy also known as
microaneurysms; it is the earliest stage which is clinically visible changes in diabetic
retinopathy. A round localized capillary dilatations are found. They are usually small
red dots usually found in clusters and sometimes can also occur in isolation. But it
does not affect the vision. The number of microaneurysms has strong predictive value
in the progression of the diabetic retinopathy. Blockage in blood vessels is caused
due to moderate non-proliferative retinopathy. Severe non-proliferative retinopathy
which causes more proliferative retinopathy results in the growth of abnormal blood
vessels on the retina. An algorithm for the detection of diabetic retinopathy might
help doctors and researchers to recognize the symptoms of diabetic retinopathy in
the people and reduce the burden of clinical trials on the specialists and researchers.
An effective way of DR detection is through image processing (Fig. 1).
Fig. 1 a Normal fundus image with no sign of DR, b fundus image with red lesions, c fundus
image with exudates
The research on automatic detection of DR has become increasingly crucial in the past few years. In our study, we focus on anomalies in the retina in the form of exudates and red lesions. It is challenging to locate red lesions using conventional image processing techniques because of their similar color characteristics, and considerable work is needed for blood vessel extraction and optic disk removal, both of which may also result in false detections. The above-discussed problems can be overcome through CNN techniques. During the training phase, a CNN is capable of learning the relevant features, and different features with different complexities can be learned in the various layers of the network. As features are extracted automatically, there is no need for manual feature extraction. The hidden layers may include a sigmoid activation layer (logistic function), convolutional layers and fully connected layers.
2 Related Work
The work by Carson Lam uses an automatic DR grading system which is capable of classifying images with respect to the different stages of the disease based on severity [2]. The convolutional neural network (CNN) takes an input image and extracts specific features of the image without losing information on spatial arrangement. Initially, different architectures are evaluated to determine the best CNN for the binary classification task and to achieve standard performance levels. Then, different models are trained to enhance the sensitivities for different classes, which includes data pre-processing and augmentation to improve test accuracy and to increase the effective size of the dataset. Concerns about data quality and fidelity are addressed using images verified by a specialist doctor. Finally, the issue of insufficient sample size is addressed by utilizing a deep-layered CNN on the color space for the recognition task. Then, training and testing of the CNN architectures AlexNet and GoogLeNet [2] are done as 2-ary, 3-ary and 4-ary classification tasks. They are also trained using many techniques, including batch normalization, L2 regularization, learning rate policies and gradient descent update rules. Other studies were done using the publicly available Kaggle dataset of 36,000 retinal images [3] with 5-class labels (normal, mild, moderate, severe and end stage) and the MESSIDOR-1 dataset of 1200 color fundus images verified by physicians with 4-class labels. Throughout the study, the main aim has been to garner a more effective means of classifying diabetic retinopathy at an early stage and thus give patients a better chance of treatment.
The work done by Nikhil and Rose involves three CNN designs used to develop a classification system for the different stages of DR based on color fundus images [4]. The classification is done based on the 5-stage severity of DR, and deep learning-based CNN networks are deployed. Many medical studies have been conducted in the field of designing algorithms for the classification of DR from fundus images of the retina, but usually they were only binary classifiers differentiating two states, normal and DR [4]. Here, however, they check the prediction accuracies of three different deep CNN architectures and combinations of these networks when deployed as DR stage classifiers. The study is done using a Kaggle dataset of 500 retinal images. They found that using VGG16, AlexNet and Inception Net V3, they obtained the highest accuracy of around 80.1%.
The work by Wang and Yang adopts CNN as the predicting algorithm with the aim of developing a more efficient CNN architecture that is specifically useful for large-scale datasets. The CNN they built has no fully connected layers, only pooling layers and convolutional layers [5]. This reduces the number of parameters significantly (fully connected layers have more parameters than convolutional layers in a CNN) and thus provides better conditions for neural network interpretability. They have shown in their experiments that, with fewer parameters and no fully connected layers, the CNN architecture has better prediction performance. The advantage of the proposed network structure is that it can provide a RAM of the input image, showing how each pixel of the input image contributes to DR detection. This RAM output mitigates, to some extent, the known shortcoming of CNNs as black-box methods [5]. They are of the opinion that the RAM output makes the proposed solution more self-explanatory, motivating doctors to trace the cause of the disease for the patient and explain clearly which region of the color fundus image is the main cause of the disease. By analyzing the insights of the CNN, they achieved good performance in terms of disease prediction despite the challenges posed by the non-convex character of CNNs.
In this paper, Arcadu and Benmansour have utilized artificial intelligence (AI) to offer a solution to the problem. Deep learning (DL), specifically deep convolutional neural networks (DCNNs), has been used for the assessment of images against a particular target for prediction of the outcome. DCNN algorithms are now used in numerous areas of health care, such as dermatology and radiology. Particularly in ophthalmology, significant work has recently been conducted on the automation of DR prediction and grading and of risk-factor assessment by DCNNs [6]. The main goal of the work is to go beyond the use of DL for DR diagnostics and to assess the feasibility of DCNNs operating on 7-field CFPs for predicting future threats of significant worsening of a patient's DR over a given span of 2 years. The DCNNs are trained on high-quality 7-field CFPs which have been graded for DR severity by highly trained, masked experts using the diabetic retinopathy severity scale of the Early Treatment Diabetic Retinopathy Study, drawn from many clinical trials. Earlier studies have limited the use of DCNNs to CFPs centered on the optic nerve [7] or fovea [6]. The findings highlight the significance of the predictive signal located in the peripheral retinal fields [3] of patients with DR, and suggest that such an algorithm, with further development and proper validation, can help fight blindness by quickly identifying DR progressors for referral to a retina specialist or for enrolment in a clinical trial intended to target early-stage diabetic retinopathy.
3 Architecture
The machine learning model detects the presence of diabetic retinopathy and implements a deep learning algorithm. The model takes an input image from the user and reports the probability of diabetic retinopathy. To spot the existence of retinopathy reliably, the model has to be trained repeatedly.
Implementing web services exposes the model to end-users. The user uploads the fundus image [8] and then places a request to execute the model. Once the request is received by the web service API, the input image is stored on the receiver machine and the Python code is invoked with the folder location; the code then processes the image. After processing, the output is displayed in a user-friendly graphical interface. A REST client is used for testing the implementation of the web services.
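A minimal sketch of such an upload-and-predict endpoint is shown below, using Flask as an example web framework. The route name, the form-field name "fundus", the upload folder and the run_dr_model() helper are hypothetical placeholders, since the paper does not specify its web service interface.

```python
import os
from flask import Flask, request, jsonify

app = Flask(__name__)
UPLOAD_DIR = "uploads"                  # assumed folder on the receiver machine
os.makedirs(UPLOAD_DIR, exist_ok=True)

def run_dr_model(folder):
    # Placeholder for the Python code that processes the stored image;
    # a real implementation would load the trained CNN and classify the
    # newest image found in `folder`.
    return {"dr_found": True, "stage": "mild"}   # dummy result

@app.route("/predict", methods=["POST"])
def predict():
    image = request.files["fundus"]              # fundus image uploaded by the user
    path = os.path.join(UPLOAD_DIR, image.filename)
    image.save(path)                             # store the image on the receiver machine
    result = run_dr_model(UPLOAD_DIR)            # invoke the model on that folder
    return jsonify(result)                       # rendered by the graphical interface

if __name__ == "__main__":
    app.run()
```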
3.3 End-User
4 Methodology
4.1 Dataset
A high-resolution fundus image database consisting of more than 1000 images of dimension 1050 × 1050 is used for training the CNN algorithm, and around 100 fundus images are used for testing. Sample fundus images are shown in Fig. 3. Initially, the original fundus images are resized to 224 × 224. Because of the immense information content and varying contrast of images taken from fundus cameras, pre-processing is very necessary; without pre-processing, the images suffer from vignetting effects and image distortion. Pre-processing is also necessary because of the nonlinearity in fundus images.
At first, all the fundus images, which are in Red, Green, Blue (RGB) format, are converted to greyscale as a weighted average of the Red, Green and Blue pixels, in which 0.299 of the red component, 0.114 of the blue component and 0.587 of the green component are considered:
I = 0.299 * R + 0.114 * B + 0.587 * G, where I is the resultant greyscale pixel.
4.2.2 Resizing
The images that have been transformed to greyscale are rescaled to a fixed size of 336 × 448 pixels.
The fundus image pixel values are divided by 255 so that they lie between 0 and 1 for better computation. Pixels below a threshold value are then set to 0, and the others are set to 1.
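A minimal sketch of these pre-processing steps (weighted greyscale conversion, resizing and threshold-based binarization) using OpenCV and NumPy is shown below; the threshold value and the file name are assumptions, as the paper does not state them.

```python
import cv2
import numpy as np

def preprocess(path, threshold=0.5):
    img = cv2.imread(path).astype(np.float32)      # fundus image in BGR order
    b, g, r = cv2.split(img)
    gray = 0.299 * r + 0.114 * b + 0.587 * g       # weighted greyscale, as in the text
    gray = cv2.resize(gray, (448, 336))            # rescale to 336 x 448 pixels
    gray = gray / 255.0                            # bring pixel values into [0, 1]
    return (gray >= threshold).astype(np.float32)  # binarize around the threshold

mask = preprocess("fundus.bmp")                    # assumed input file name
```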
CNN is extensively used for various applications such as image processing, pattern recognition and video recognition, and has been widely recognized for image classification. It takes an image as input and classifies it into the appropriate category. A CNN has several hidden layers in which convolution is performed to extract features and other valuable information from the image; the output is obtained from the classification layer. CNN is a class of deep learning neural networks that combines the learned attributes with the input data and uses 2D convolutional layers. A database with around 1000 high-definition retinal images was taken for training the CNN model, and the database undergoes several pre-processing procedures. Then, 30 fundus images are used for testing the trained CNN model. Comparatively, a CNN requires less pre-processing, because it can learn the features of an image during the pre-processing and training session itself (Fig. 4).
A CNN is a blend of different layers: an input (I/P) layer, an output (O/P) layer and the hidden layers. The hidden layers consist of convolution layers, sigmoid layers, pooling layers and fully connected layers. The convolution operation is applied in the convolution layer, where the product between the filters and image patches is computed to produce the convolutional features. The features and information from the convolution layer are passed on to the activation layer, which performs a threshold operation on each element of the image; element-wise activation is applied to the output of the previous layer. The information from the sigmoid layer is passed on to the pooling layer. The average pooling layer down-samples the volume of the image pixels and lessens the memory used. The data collected is passed on to the fully connected layer, which is the final layer of the convolution network. This layer has access to all the features, such as edges, contrast, blobs and shapes, and collects the accumulated information of all the preceding layers.
The number of neurons in the I/P layer equals the number of pixels in the input fundus image. The CNN layer employs the convolutional filters and generates the outcome between the filters and the image patches. A sigmoid function can be utilized for the activation layer; the sigmoid activation function is also called the logistic function, and its output lies in the range 0–1. It is therefore especially used in models that need to predict a probability as output, since probabilities also lie in the range 0–1. Element-wise activation is applied to the outcome of the convolutional layer. To bring down the memory needs and to speed up the computation, the volume is down-sampled in the pooling layer. Properties like contrast, shapes, edges and blobs are held as information in the fully connected layer. A softmax layer is used as the terminating layer of the CNN; it assigns a decimal probability to each class. The input images are categorized into two classes: abnormal (with DR) and normal (without DR).
A fully connected layer takes the output of all the neurons from the former layer.
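A minimal Keras sketch of this layer stack (convolution, sigmoid activation, average pooling, fully connected layer and a two-class softmax output) is given below. The filter counts, kernel sizes and the 224 × 224 grayscale input shape are assumptions for illustration; the paper does not specify these hyperparameters.

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, 3, padding="same", input_shape=(224, 224, 1)),  # convolution layer
    layers.Activation("sigmoid"),                                     # sigmoid (logistic) activation
    layers.AveragePooling2D(2),                                       # average pooling layer
    layers.Conv2D(64, 3, padding="same"),
    layers.Activation("sigmoid"),
    layers.AveragePooling2D(2),
    layers.Flatten(),
    layers.Dense(128, activation="sigmoid"),                          # fully connected layer
    layers.Dense(2, activation="softmax"),                            # abnormal (DR) vs normal
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```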
Diabetic retinopathy is classified into four stages:
(i) Mild non-proliferative retinopathy: This is considered the earliest stage of diabetic retinopathy. Microaneurysms occur in this stage; they are tiny areas of swelling within the small blood vessels of the retina.
(ii) Moderate non-proliferative retinopathy: The blood vessels that nourish the retinal region swell, and in some instances they are even blocked.
(iii) Severe non-proliferative retinopathy: This is a more advanced stage. In this phase, an increasing number of blood vessels [9] become blocked in the retinal region, depriving several areas of the retina of their blood supply.
(iv) Proliferative retinopathy: This is considered the most advanced stage of diabetic retinopathy. Here, the growing blood vessels are abnormal and fragile. This stage is the most dangerous and is incurable.
The CNN model is trained to spot and categorize the fundus image into the different stages of diabetic retinopathy. The classification is made based on the severity of hemorrhages, microaneurysms, distortion in the inner layer of the retina, and red lesions in the fundus image.
5 Performance Parameters
The parameters used to gauge the performance of the proposed model are accuracy and loss. To calculate the accuracy, we use the quadratic weighted kappa, also known as Cohen's kappa, the official evaluation metric. For our model, we use a custom callback to monitor the score and plot it at the end of training.
The definition of Cohen's kappa (κ) is

κ = (p_o − p_e) / (1 − p_e)

where p_o is the observed agreement and p_e the expected agreement, with

p_o = acc = (tp + tn) / all = (2 + 26) / all = 0.66
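A hedged sketch of monitoring the quadratic weighted kappa during training with a custom Keras callback is shown below, using scikit-learn's implementation of the metric. The validation arrays X_val and y_val are assumed inputs, not variables defined in the paper.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score
from tensorflow.keras.callbacks import Callback

class KappaMonitor(Callback):
    """Compute the quadratic weighted kappa on a validation set after each epoch."""

    def __init__(self, X_val, y_val):
        super().__init__()
        self.X_val, self.y_val = X_val, y_val
        self.scores = []                         # collected so it can be plotted later

    def on_epoch_end(self, epoch, logs=None):
        pred = np.argmax(self.model.predict(self.X_val), axis=1)
        kappa = cohen_kappa_score(self.y_val, pred, weights="quadratic")
        self.scores.append(kappa)
        print(f"epoch {epoch + 1}: quadratic weighted kappa = {kappa:.3f}")

# Usage (assumed data): model.fit(X_train, y_train, epochs=10,
#                                 callbacks=[KappaMonitor(X_val, y_val)])
```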
6 Experimental Result
In this project, the experiment is done on a number of images obtained from the online dataset and also on images obtained from various Web sites. The experiment is done on fundus images both with and without diabetic retinopathy. After pre-processing and feature extraction, the images were trained with the CNN algorithm, which consisted of multiple layers. The training was done on more than 3000 images for ten epochs; the greater the number of epochs (iterations), the higher the accuracy. Testing the trained model resulted in an accuracy of 94.5%.
Here, we evaluate the performance, efficiency and results of the trained CNN model with various parameters like speed, accuracy, etc.
The input image is diagnosed to determine whether the person has DR or not, and the result is then analyzed to classify and display the stage in which the disease is probably present (Fig. 6).
In the image above, the result is shown as diabetic retinopathy found, because the probability of the disease is highest for the mild stage (Fig. 7).
In the image above, the result is shown as diabetic retinopathy not found, because the probability is highest for no disease according to the model.
In this experiment, the diagnosis first determines whether the person has DR or not; if the person has DR, the result is further analyzed to determine the stage in which the disease is most probably present.
7 Challenges
The major challenge was dealing with the computational complexity, which the cloud service initially available to us could not handle. The program had to deal with a large number of neurons. Initially, working with a laptop running the workload on AWS did not give the expected results. We then moved to the DigitalOcean cloud service, which handled our machine learning workload better than AWS. With this, we were able to test more images and obtain correct predictions.
8 Conclusion
Today, the huge population of diabetic patients and the possibility of them suffering from diabetic retinopathy have put automatic DR detection systems in great demand; automated detection provides an opportunity to prevent loss of vision among patients due to diabetic retinopathy. In this paper, the objective is to make a clinically usable system. Here, the system uses fundus images captured during screening to detect diabetic retinopathy. This paper uses a CNN architecture for the detection and classification of diabetic retinopathy. For clinical applications, the trained models are deployed on cloud computing platforms. The studies in this paper are intended to assist medical doctors in easily detecting diabetic retinopathy, to reduce the number of reviews required from doctors, and also to help patients be aware of their medical condition.
In the future, a larger dataset will be included and a broader study will be done. The data obtained will be used to further modify and improve the accuracy and efficiency of the models. From the observations made, it is understood that neural networks, a machine learning technique, have promising scope in disease detection. The competency of the R-CNN technique has been proven in the field of object detection. In our work, it is shown that CNN is useful for identifying fine, low-level features. When it comes to lesion detection, CNN is considered to be more accurate, with a 93.8% accuracy rate.
References
1. Aslani C (1999) The Human Eye: Structure and Function. Sinauer Associates, Sunderland, MA
2. Lam C, Yi D, Guo M, Lindsey T (2018) Automated detection of diabetic retinopathy using
deep learning
3. Zhang L, Fisher M, Wang W (2015) Retinal vessel segmentation using multi-scale textons
derived from keypoints. J Comput Med Imaging Grap 45:47–56
4. Nikhil MN, Rose AA Diabetic retinopathy stage classification using CNN
5. Wang Z, Yang J (2018) Diabetic retinopathy detection via deep convolutional networks for discriminative localization and visual explanation. In: The workshops of the thirty-second AAAI conference on artificial intelligence
6. Arcadu F, Benmansour F (2019) Deep learning algorithm predicts diabetic retinopathy progres-
sion in individual patients 2, 92 (2019), sep 20. https://doi.org/10.1038/s41746-019-0172-3.
eCollection 2019
7. Hassana C, Boyce JF, Cook HL, Williamson TH (1999) Automated localization of the optic disc,
fovea and retinal blood vessels from digital color fundus images. British J Ophthal 83(8):902–
910
8. Sinthanayothin C, Boyce JF, Williamson TH, Cook HL, Mensah E, Lal S, Usher D (2002)
Automated detection of diabetic retinopathy on digital fundus images. Diab Med J 19
9. Akram MU, Khan SA (2013) Multilayered thresholding-based blood vessel segmentation for
screening of diabetic retinopathy. J Eng Comput 29(2):165–173
10. Vermeer KA, Vos FM, Lemij HG, Vossepoel AM (2004) Model based method for retinal blood
vessel detection. Comps Bio Med 34(3):209–219
11. Chaudhuri S, Chateterjee S, Katz N, Nelson M, Goldbaum M (1989) Detection of blood
vessels in retinal images using two-dimensional matched filters. IEEE Trans Med Imaging
8(3):263–269
12. Chanwimaluang T, Fan G (2003) An efficient algorithm for extraction of anatomical structures
in retinal images. In: Proceedings of ICIP, pp 1193–1196
13. Hoover AD, Kouznetsova V, Goldbaum M (2000) Locating blood vessels in retinal images by
piecewise threshold probing of a matched filter response. IEEE Trans Med Imaging 19(3):203–
211
14. Martinez-Perez ME, Hughes AD, Stanton AV, Thom SA, Bharath AA, Parker KH (1999)
Segmentation of retinal blood vessels based on the second directional derivative and region
growing. In: Proceedings of ICIP, pp 173–176
15. Martinez-Perez ME, Hughes AD, Stanton AV, Thom SA, Bharath AA, Parker KH (1999)
Scale-space analysis for the characterization of retinal blood vessels. In: Taylor, Colchester A
(eds.) Medical Image Computing and Computer-assisted intervention (Lecture notes computer
science) vol 16794, Springer, New York, pp 90–97
16. Wang Y, Lee SC (1998) A fast method for automated detection of blood vessels in retinal
images. In: IEEE Computer Society Proceedings of Asilomar Conference, pp 1700–1704
17. Jiang X, Mojon D (2003) Adaptive local thresholding by verification based multi-threshold
probing with application to vessel detection in retinal images. IEEE Trans Patt Anal Mach
Intell 254(1):131–137
18. Zana F, Klein JC (2001) Segmentation of vessel-like patterns using mathematical morphology
and curvature evaluation. IEEE Trans Med Imag 11(7):1111–1119
19. Niemeijer M, Staal J, Van Ginneken B, Loog M, Abràmoff MD (2004) Comparative study of
retinal vessel segmentation methods on a new publicly available database. In: Fitzpatrick M,
Sonka M (eds) Proceedings of SPIE Medical Imageing, vol 5370, pp 648–656
20. Staal J, Abramoff MD, Niemeijer M, Viergever MA, van Ginneken B (2004) Ridge-based
vessel segmentation in color images of the retina. IEEE Trans Med Imag 23(4):501–509
21. Zhou L, Rzeszotarski MS, Singerman LJ, Chokreff JM (1994) The detection and quantification
of retinopathy using digital angiograms. IEEE Trans Med Imag 13(4):619–626
22. Tolias Y, Panas SM (1998) A fuzzy vessel tracking algorithm for retinal images based on fuzzy clustering. IEEE Trans Med Imag 17(2):263–273
Drone-Based Security Solution
for Women: DroneCop
1 Introduction
India tops the list of the most dangerous places for women even in the twenty-first century. As per the NCRB Report 2019, 88 rape and sexual harassment cases happen on an average day [1], which means almost 4 cases per hour. Less than 0.14% of cases are solved in India, and 99% of these cases go unreported according to government reports [2]. The existing conventional methods and the police mostly provide help after the crime has been committed; therefore, help is not offered to the victim in real time. If it were possible to decrease this time lag, many of the victims could be saved from these incidents.
Many apps and smart gadgets have been developed for women's safety in the past, but they have many drawbacks. The apps keep crashing, or they need to be opened to send an SOS to emergency contacts, which is not possible at all times. The location captured by these apps is not accurate. A few apps are paid and have excessive advertisements on the screen. Features like voice-activated SOS, pressing the power/volume button 3 times to send an SOS to emergency contacts, and detecting screams to send an SOS or alert sometimes fail to work. Smart gadgets for women's safety like gloves, wearable bracelets, shoes, etc., have been developed, but these gadgets are bulky and cannot be carried everywhere, so they are not portable. They also require more hardware, which in turn increases the implementation cost.
To overcome the above problems, the proposed solution is an amalgamation of drones, a smartwatch, and a mobile application, which provides help to users in real time, thus decreasing the time lag. Drones are unmanned aerial vehicles (UAVs) with mounted sensors, GPS, navigation systems, cameras, and other features that are primarily used for surveillance, security, and other purposes. Unmanned aerial vehicles (UAVs) are aircraft that fly without a crew or passengers. They may be remotely controlled vehicles or autonomous 'drones'. Drones are used to monitor climate change, execute rescue operations in the aftermath of natural catastrophes, take photographs and video, and deliver commodities, among other things [3].
Being virtually connected on a platform to access help in real time makes a woman feel safer. Here, the user can send an SOS through a mobile application or a smartwatch. The mobile application connects to the IoT safety device module (smartwatch) and triggers the drone. On activation, the drone reaches the location and tracks the movement of the victim/attacker. By following the drone, the location of the attacker can be traced, and the situation can thus be brought under control.
2 Literature Survey
The literature survey was conducted in three different directions: analysis of existing apps for women's safety, research papers on solutions proposed for women's safety, and a real-time survey on current security situations (including women and children).
Existing apps for women's safety: There are many trending apps in the market for women's safety. A few apps were analyzed based on the details of the application, their working, benefits, drawbacks, UI, and reviews. The apps are as follows (Table 1).
Research papers related to women’s safety:
In the paper [10], the authors developed a solution using Internet of Things devices and built a mobile application that keeps track of the user's live coordinates and sends them to nearby law enforcement agencies. The application user gets to know the nearest available secure places. The disadvantage is that the IoT device is too bulky to wear, and there are bugs in the application which cause the SMS to be delivered to law enforcement agencies in different zones.
In the paper [11], the authors developed an embedded system having GSM and GPS subsystems. This system sends an emergency message and the live location to predefined contacts and also triggers an alarm. The device is easy to use and is triggered by the user at the click of a button. However, the system does not work in offline mode, and the switch needs to be pressed every time the location changes; hence, it is not automated.
In the paper [12], the authors proposed an app that provides articles, reviews, and safety levels about a location through the use of the camera. Users can connect with law enforcement agencies and first responders by sending live coordinates, and they can post their reviews about the application. The drawback is that the app does not work in offline mode.
Real-time survey on current safety situations:
Real-time data was collected from family, close friends, and relatives through a
Google Form to understand the current safety situation. In the form, a set of 10 ques-
tions based on women’s safety were asked. A total of 338 responses were received
in 3 days from both men and women. The results are shown below (Table 2).
Table 1 (continued)

App Name: ProtectMii—Personal Safety App with Panic Alarm [8]
Working of the app: (1) It sends SOS messages along with the current battery power of your phone. (2) It triggers an alarm on its own when the headphone cable or any USB device is unplugged. (3) Using acoustic signals, it sends alerts to nearby users for help. (4) False alarm and deactivation protection enable the user to cancel an accidentally triggered alarm within 15 s.
Drawbacks of the app: (1) The user is unable to log in properly.

App Name: UrSafe: Personal Security App [9]
Working of the app: (1) The app uses voice-activated triggers to send SOS messages. (2) It shares video and audio with the emergency contacts. (3) It alerts emergency contacts, police stations, and nearby users of the app.
Drawbacks of the app: (1) The voice-activated SOS feature does not work properly. (2) It is not user-friendly. (3) It does not immediately place the call to the emergency contacts.
Table 2 (continued)
Question No Real-time survey (women’s safety) results
7. If yes, did you try to help? If not, why?
Inference:
A few responses were positive, as the responders were confident enough to take a stand for people in need, including their relatives, friends, and other members of the public. Meanwhile, a few responses were negative because the responders feared not being supported by the public or were hesitant to put themselves in danger along with the victim
8. Of the existing apps, how much have they been useful?/do you use any of them?
(Fig. 7)
Inference:
52.7% of responders have not used an app for women’s safety
9. What do you think will be the best way to defend yourselves from an unusual
situation using technology?
Inference:
The majority of the responses were to use the technology at its best by creating an
app that uses real-time GPS, sends SOS to emergency contacts, and alerts nearby
police stations
Fig. 2 Mode of traveling (alone, in groups, necessarily with a male member) versus the number of
responses
Fig. 7 The usefulness of the apps existing in the market for women’s safety
3 Proposed Methodology
Here is the sequence of activities that take place when a user triggers our system, as depicted in Fig. 9.
The level 0 data flow diagram depicts an abstract view of the entire project, as shown in Fig. 10.
The level 1 data flow diagram represents each of the sub-processes that form the complete system, as depicted in Fig. 11.
Algorithm—DroneCop System:
Input: Emergency trigger by the user (using mobile app or the smartwatch).
Output: The drone reaches the user's location and live-streams the situation to the control station.
Step 0: Start.
Step 1: The user registers on the app, and the information is acknowledged. When
the user logs in, the information is verified.
Step 2: Using the update module, the user can update his/her details.
Step 3: The history module stores the record of the previous help offered to the
user.
Step 4: When an emergency occurs, the user triggers the application. This can be done in two ways: by pressing the SOS button, or automatically, when there is a sudden surge in the sensor values of the watch, in which case the watch triggers an SOS.
Step 5: The application further triggers the drone by sending the live geo-coordinates.
Step 6: The application shares the GPS coordinates with the drone and also the
first responders to inform them about the help required.
Step 7: Once the drone is triggered, it takes off, reaches the destination, and live
feeds to the ground station via the Internet.
Step 8: Stop.
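A minimal Python sketch of this trigger flow (Steps 4–7) is given below. The helper functions, the heart-rate threshold and the example coordinates are hypothetical placeholders; the paper does not define a concrete programming interface.

```python
HEART_RATE_LIMIT = 120     # assumed threshold for a "sudden surge" in the watch sensor

def emergency_triggered(sos_pressed, heart_rate):
    """Step 4: SOS button press or a surge in the smartwatch sensor values."""
    return sos_pressed or heart_rate > HEART_RATE_LIMIT

def handle_emergency(user, get_gps, notify_responders, dispatch_drone):
    lat, lon = get_gps(user)              # live geo-coordinates of the user (Step 5)
    notify_responders(user, lat, lon)     # inform first responders (Step 6)
    dispatch_drone(lat, lon)              # drone takes off and live-streams (Step 7)

# Example wiring with stub functions
if emergency_triggered(sos_pressed=True, heart_rate=80):
    handle_emergency(
        user="demo-user",
        get_gps=lambda u: (12.9716, 77.5946),
        notify_responders=lambda u, la, lo: print("SOS sent for", u, "at", la, lo),
        dispatch_drone=lambda la, lo: print("drone dispatched to", la, lo),
    )
```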
During emergencies, the novel security system developed helps the user in real time by sending a drone to the current location and streaming live videos and images to the cloud and to the first responders. Out of fear of being watched over by the drone, the attacker might end up escaping from the location, and with this the entire situation can be defused. The system is tested under different situations, and the parameters considered here are the time to respond, the time of day (morning, evening, night), distance, altitude, and weather conditions. The summary is presented in Table 3 (Figs. 12 and 13).
From the above data, it can be concluded that the approximate time to initialize
the system is 20 s.
For the script to fetch the geo-coordinates, it takes around 20 s. The takeoff for
the drone can be estimated to be 60 s and the peak speed to an approximation of
10 m/s, and the average speed is around 7-8 m/s (Figs. 14, 15 and 16).
Time taken to initialize the system ≈ 20 s.
Time taken for the script to fetch the geo-coordinates ≈ 20 s.
Time taken by the quad to take off ≈ 60 s.
Peak speed of the drone = 10 m/s.
The average speed of the drone = 7–8 m/s.
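Putting these figures together gives a rough end-to-end response-time estimate. The calculation below assumes an example distance of 1 km between the drone station and the user, which is not a value from the paper.

```python
init_s, fetch_s, takeoff_s = 20, 20, 60   # initialization, geo-coordinate fetch, takeoff
avg_speed_mps = 7.5                       # average drone speed of 7-8 m/s
distance_m = 1000                         # assumed distance to the victim

flight_s = distance_m / avg_speed_mps     # about 133 s of flight time
total_s = init_s + fetch_s + takeoff_s + flight_s
print(round(total_s))                     # about 233 s, i.e. under 4 minutes
```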
Fig. 16 Drone
DroneCop, a system built for women’s safety, is described in this study report. By
tapping a button on the app or sensing the heart rate using a smartwatch, users can send
alert messages to first responders and law enforcement agencies. The benefit of using
this technology is that a drone can get to a victim's location much faster than police
and emergency personnel can, allowing the drone to trace the assailant. This system
has been tested under various circumstances and environmental conditions, and the
results have been studied. This system’s scope could be expanded in the future to
include natural disasters, survey and surveillance, and road and pole inspection. It might
also be extended to include rescuing victims if the mobile network is unavailable
after the initial alert, and alternative methods of triggering, such as pushing the
power button or using voice to activate SOS.
References
1. https://www.indiatoday.in/diu/story/no-country-for-women-india-reported-88-rape-cases-every-day-in-2019-1727078-2020-09-30
2. https://www.livemint.com/Politics/AV3sIKoEBAGZozALMX8THK/99-cases-of-sexual-assaults-go-unreported-govt-data-shows.html
3. https://en.wikipedia.org/wiki/Unmanned_aerial_vehicle
4. https://economictimes.indiatimes.com/tech-life/14-personal-safety-apps-for-women/14-circle-of-6/slideshow/45451296.cms
5. https://inc42.com/buzz/dror-raises-funding-from-ip-ventures-to-make-safe-spaces-for-women-tourists/
6. https://therodinhoods.com/post/chilla-personal-safety-app-that-detects-a-scream/
7. https://download.cnet.com/SafeON-Personal-Safety-App-Emergency-Alert/3000-31713_4-78687765.html
8. https://protectmii.com/
9. https://www.usatoday.com/story/money/2019/12/10/digital-safety-ursafe-hands-free-safety-app-livestreams/2628566001/
10. Kabir AZMT, Tasneem T (2020) Safety solution for women using smart band and CWS
app. In: 2020 17th International conference on electrical engineering/electronics computer,
telecommunications and information technology (ECTI-CON). IEEE, 2020
11. Ruman MR, Badhon JK, Saha S (2019) Safety assistant and harassment prevention for women.
In: 2019 5th International conference on advances in electrical engineering (ICAEE). IEEE
12. Chaudhar P et al. (2018) Street smart’: safe street app for women using augmented reality. In:
2018 Fourth international conference on computing communication control and automation
(ICCUBEA). IEEE
Analysis of Granular Parakeratosis
Lesion Segmentation: BCE U-Net vs
SOTA
1 Introduction
Yang et al. [6] built a multiple target CNN to segment and classify the lesions of
ISIC 2017 challenge dataset. The network’s encoder component used a GoogleNet
CNN that had been pre-trained, while the segmentation branches used a U-Net
decoder model. The results were better than a GoogleNet-based classification model
with an accuracy of 89% vs. 85.7%, but not as good as the best performance in the
ISIC 2017 competition (accuracy of 88.6% vs. 91.1% [7]).
The segmentation process can be carried out using a variety of techniques, but over time no single technique has remained feasible for all types of images. Hence, a new way of implementation needs to be incorporated. In this research, the suggested U-Net with the binary cross-entropy loss function is described and compared to existing state-of-the-art image segmentation approaches on granular parakeratosis lesions.
The paper is organized as follows. In Sect. 2, we review the literature concerned with the segmentation of skin lesions. Thereafter, in Sect. 3, the architecture of the proposed system is given. Section 4 gives a brief description of the existing segmentation techniques and the proposed U-Net with BCE. Then, Sect. 5 covers the results obtained by implementing U-Net with BCE and the comparison with other state-of-the-art techniques. Section 6 concludes the paper.
2 Related Studies
Ünver et al. [12] proposed a skin image segmentation pipeline that combined the GrabCut algorithm with a deep convolutional neural network (DCNN) called "You Only Look Once" (YOLO). The approach was tested on the PH2 and the ISBI 2017 datasets. This pipeline model has a sensitivity of 90%, an accuracy of 93% and a specificity of 93%.
The work of [13] executes an auxiliary job, edge prediction, simultaneously with the segmentation task: "The edge prediction branch directs the learned neural network to focus on the segmentation mask's border." During the training stage, this approach predicts both the segmented mask and its matching edge (contour), and only the segmentation mask is employed for prediction during the testing phase, yielding a sensitivity of 88.76% and an accuracy of 94.32%.
Lei Bi et al. [14] introduced a new FCN approach for automatically segmenting skin lesions. The method used the primary visual aspects of skin lesions learnt and deduced from several embedded FCN stages to achieve accurate segmentation without utilizing any pre-processing procedures. On the PH2 dataset, the approach has an accuracy of 88.78%, a sensitivity of 91.88% and a specificity of 89.42%.
All of these approaches are less feasible for segmenting granular parakeratosis lesions. Hence, a U-Net with binary cross-entropy is proposed, as described in the next section.
3 System Architecture
Granular parakeratosis is a rare red, scaly skin disorder that mostly affects body folds, particularly the armpits. It is most commonly associated with middle-aged women, but it has also been documented in newborns, children and males of all ethnicities. It may progress to a malignant stage if not treated at an early time. Thus, an automated and intelligent identification technique is required while processing skin lesion images. The automated model implemented here uses the U-Net architecture with the binary cross-entropy loss function. The system architecture is shown in Fig. 2.
The dataset obtained from the publicly available site is initially input to the proposed model. The images of size 224 × 224 × 3 go through the encoder and decoder with the application of the binary cross-entropy loss function. Finally, we get the segmented images of size 224 × 224 × 3, as shown in the figure, obtaining the highest accuracy and sensitivity with minimal loss.
The purpose of this study is to compare several segmentation strategies on granular parakeratosis lesions and then to compare the results produced from the implementation of the suggested U-Net binary cross-entropy method with numerous existing state-of-the-art techniques.
Fig. 2 System architecture: from the input dataset, through the encoder–decoder network, to the segmented image
4 Segmentation Techniques
4.1 FCN
4.2 SegNet
The SegNet architecture includes an encoder network, a decoder network and a pixel-wise classification layer. The encoder network includes 13 convolutional layers that match the first 13 convolutional layers of the VGG16 network [20]. Using the encoder's memorized max-pooling indices, the decoder network up-samples the input feature maps; a sparse feature map is created as a result of this approach [21, 22]. The disadvantage is that it provides lower accuracy for smaller datasets, whereas a model should be designed in such a way that it provides good accuracy for small as well as large datasets.
4.3 DeepLabv3 +
4.4 U-Net
Olaf Ronneberger et al. [25, 26] created the U-Net for bio-medical image segmentation. There are two paths in the architectural structure. The contraction path, also known as the encoder, captures the context of an image; the encoder is made up of convolutional and max-pooling layers. The decoder, or expanding path, uses transposed convolutions for localization.
The binary cross-entropy loss function is implemented here, and the model is built using the Adam optimizer. The binary cross-entropy loss is given by Eq. (1):

H_p(q) = −(1/N) Σ_{i=1}^{N} [ y_i · log(p(y_i)) + (1 − y_i) · log(1 − p(y_i)) ]    (1)
where y_i represents the label of point i and p(y_i) indicates the predicted probability of that point, over all N points. That is, for each point with y = 1, the term log(p(y)) is added to the loss; conversely, log(1 − p(y)) is added for each point with y = 0.
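A minimal NumPy sketch of Eq. (1), applied to a handful of illustrative pixel labels and predicted probabilities, is shown below; in Keras, the same behaviour is obtained by compiling the model with loss="binary_crossentropy" and optimizer="adam".

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Eq. (1): mean of -[y*log(p) + (1 - y)*log(1 - p)] over all points."""
    y_pred = np.clip(y_pred, eps, 1 - eps)   # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0, 0.0])      # ground-truth mask pixels (toy example)
y_pred = np.array([0.9, 0.2, 0.8, 0.1])      # predicted probabilities
print(binary_cross_entropy(y_true, y_pred))  # about 0.164
```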
The model is trained on a Windows 10 system with an Intel Core i5-2.4 GHz
processor and 8 GB RAM configuration using Python.
The results of parakeratosis lesion segmentation using the U-Net with the binary
cross-entropy method are shown in Fig. 3.
Figure 4 shows the images obtained by performing the segmentation using FCN, SegNet and DeepLabv3 +.
5.1 Dataset
A dataset is made up of several pieces of data that are used to train the model to uncover predictable patterns throughout the entire dataset. Datasets are essential for the advancement of numerous computational domains, providing results with scope, robustness and confidence [27, 28]. "With the advancement of machine learning, artificial intelligence and deep learning, datasets have become popular." This paper retrieves the dataset from freely available resources like DermNet NZ, a dataset available worldwide consisting of RGB images of granular parakeratosis. Here, the skin lesion images, as well as the mask images, are converted to BMP format. The images have been categorized into a training set of 1080 images, a test set of 147 images and a validation set of 278 images. All the images are initially converted to 224 × 224 × 3 pixels and loaded into the model, thereby obtaining the segmented images.
5.2 Evaluation
The methods are evaluated using the accuracy, sensitivity and specificity metrics, computed as

Accuracy = (TP + TN) / (TP + FN + TN + FP)
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
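For reference, the small helper below computes the three metrics directly from confusion-matrix counts; the counts used in the example are illustrative and are not results from the paper.

```python
def metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity

print(metrics(tp=90, tn=50, fp=5, fn=2))   # (0.952..., 0.978..., 0.909...)
```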
Fig. 5 Performance comparison of FCN, SegNet, DeepLabv3 + and U-Net with BCE
Table 1 Accuracy, sensitivity and specificity values for FCN, SegNet, DeepLabv3 + and U-Net with BCE
FCN SegNet DeepLabv3 + U-Net(BCE)
Accuracy 89 89 90 96.71
Sensitivity 98 80.05 84 100
Specificity 81 85 81 91
Table 2 Values obtained for binary cross-entropy for FCN, SegNet, DeepLabv3 + and U-Net with
BCE
FCN SegNet DeepLabv3 + U-Net (BCE)
Binary cross-entropy loss 0.29 0.89 0.81 0.21
Fig. 6 Significance of binary cross-entropy on the proposed method and existing methods
In the field of machine learning, several loss functions are available, such as the Jaccard distance, mean squared error (MSE), the dice coefficient and the binary cross-entropy loss function. Out of these, the binary cross-entropy loss function is found to provide the minimal loss value. The proposed method is implemented using all of these loss functions; the corresponding values obtained are shown in Table 3, and these values are used to draw the graph shown in Fig. 7.
6 Conclusion
This paper gives a brief discussion of the BCE-based U-Net model's implementation as well as a comparison of segmentation approaches such as FCN, SegNet and DeepLabv3 + with the proposed model. The goal of this research is to compare these well-known existing image segmentation methods with the BCE-based U-Net technique in order to determine which method is best for medical image segmentation. When compared to the ensemble and probabilistic-based state-of-the-art approaches, the accuracy and sensitivity of the results found with BCE U-Net are substantially greater. BCE U-Net obtained an accuracy of 96.71%, a sensitivity of 100% and a specificity of 91%.
References
1 Introduction
It is important to monitor relays and circuit breakers while they are working, to reduce power outages and the expensive charges for the damage caused after a failure [1]. To maintain reliability and stability, it is necessary to protect the equipment from abnormal conditions. Relays and circuit breakers have a defined service life when operated under rated conditions; when abnormal conditions occur, the life of the equipment reduces significantly [2]. To maintain the health of the relays and circuit breakers, the parameters that cause problems first need to be identified so that faults can be prevented. When the parameters exceed their rated values, an error message and a precaution message are sent using the GSM module. The GSM module delivers the message to the registered mobile number so that precautionary measures can be taken. In the hardware circuit, the GSM module can be replaced with a Wi-Fi module along with smart cloud technology, in which case the message is received in an application on the smartphone. Data logs can also be kept for future reference.
Status monitoring is a big part of predictive maintenance. Data collected from status monitoring over time provides important information about the current status and history of the equipment. This information can be used to predict how the equipment will perform over time and how it may degrade, allowing maintenance to be adjusted according to these predictions [3]. For analyzing equipment failures, an intelligent decision-making system is used. In the present scenario, industries use SCADA and PLC systems to get the status of circuit breakers and relays.
3 Related Work
4 Simulation
Proteus is used to design, draw and simulate electronic circuits. Proteus Virtual System Modeling (VSM) combines mixed-mode SPICE circuit simulation, animated components, and microprocessor models to facilitate co-simulation of complete microcontroller-based designs [9]. Designing a circuit in Proteus consumes less time compared to the practical construction of the circuit, and there is no possibility of damaging any electronic component. The software includes PCB design as well. The two main parts of Proteus are circuit design and PCB layout design: ISIS is used for designing and simulating circuits, while ARES is used for designing a printed circuit board. ISIS has a wide range of components in its library, and ARES offers PCB design with surface-mount packages [10].
5.1 Simulation
Initially, the relay is in the OFF condition; once power is turned ON, the message "Starting, Please Wait" is displayed on the LCD. A delay of 1 s (1000 ms) is set to power up the circuit. Figure 2 shows the circuit connections of the simulation. A switch keeps track of the relay ON/OFF condition. As soon as the switch is pressed, it is considered as one count, and the count value is stored in the controller. The count gets incremented after every press of the switch, and the same value is displayed on the LCD.
A permissible limit value is set in the controller; if the switch is pressed more than the limit value, an error message is displayed on the LCD as shown in Fig. 3. This is a warning message indicating that the relay has reached its maximum usage and hence needs to be replaced. If the user ignores this warning message, it will lead to failure of the equipment.
An LM35 sensor and a current sensor acquire the temperature and the current value of the load under normal healthy conditions. The temperature data obtained is an analog value; therefore, a conversion is done to display the value in a standard format.
In the simulation, a lamp load is used to keep track of current consumption. A fault message is displayed in case of excess current consumption, as shown in Fig. 3. The message is displayed on the LCD and on a virtual terminal of the GSM module. This condition is achieved when the resistance of the lamp load is decreased. The message gives a warning that the circuit is in a short-circuit condition and hence the fault needs to be cleared for safe operation of the equipment.
6 Hardware Development
The electrical life of a relay is normally expected to be up to 100,000 cycles, varying with the type of relay. The relay count increases as it opens and closes. Here, the relay life cycle count is set to 5: once 5 counts are reached, the message "Relay Life Expectancy Reached" is shown on the LCD as in Fig. 5, and the same can be seen in the status bar of the Blynk app.
A short circuit is created by shorting a sensor, since two wires cannot be shorted directly in electronic circuits; shorting two wires directly damages the components. The resistance variation in the circuit is taken into consideration for detecting the fault. Whenever a short circuit happens, the message "Short circuit detected" is displayed both on the LCD and in the Blynk app.
Relays operate within a specific temperature range; whenever there is an increase in temperature, the contacts may get damaged. Here, the temperature limit is set to 35 °C. The temperature sensor measures the temperature; an increase beyond this limit is considered a fault, and an error message similar to Fig. 5 is printed on the LCD and also on the Blynk app.
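A minimal Python sketch of these monitoring rules (relay life count, short-circuit detection and the 35 °C temperature limit) is shown below. The sensor readings are passed in as plain arguments, standing in for the microcontroller firmware of the actual prototype, so the function and the message strings are used here only for illustration.

```python
RELAY_LIFE_LIMIT = 5       # demonstration limit used for the relay life cycle count
TEMP_LIMIT_C = 35.0        # temperature limit set in the hardware

def check_status(switch_count, temperature_c, short_circuit):
    """Return the warning messages to show on the LCD / Blynk app."""
    alerts = []
    if switch_count > RELAY_LIFE_LIMIT:
        alerts.append("Relay Life Expectancy Reached")
    if short_circuit:
        alerts.append("Short circuit detected")
    if temperature_c > TEMP_LIMIT_C:
        alerts.append("Temperature limit exceeded")
    return alerts

print(check_status(switch_count=6, temperature_c=38.2, short_circuit=False))
```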
AI-Based Live-Wire News Categorization
1 Introduction
The above objectives can be achieved by creating a model using machine learning
algorithms, which classifies the live data into different categories like Sports news,
Entertainment, Political, Health, Business, Science and Technology. Multinomial
Naive Bayes model is used for the classification of news. Multinomial Naive Bayes
is a probabilistic learning method that is used for the analysis of categorical text
data. The training is done using the UCI News Aggregator dataset, which consists of around 4 lakh (400,000) records, and testing is performed using live news from the News API, comprising 100 records. The model gave an accuracy of 92%. The front-end user interface is a web application built using Flask in Python, HTML, and CSS.
Developing and deploying a news categorization system [1] based on the content of interest to a news broadcaster is a difficult task. To address this challenge, an attempt has been made to conceive and develop a news categorization system. Major aspects of the system are presented in this paper.
2 System Architecture
In designing news categorization system [2], the following inputs are taken into
account.
• Multinomial Naive Bayes algorithm: This algorithm is a probabilistic learning
approach that is commonly used in natural language processing. The Bayes
theorem underpins the algorithm.
The data module contains the news categorization model definitions and collects live news data from the live wire using the API. The news content is categorized into various categories based on management console inputs, and the data is stored in various baskets in the back-end server.
The application module provides news categorization [5] to the user of this system. It is an interface that delivers the categorization generated in the business module to the end user at regular intervals via the web module.
3 Software Implementation
3.1 NewsAPI
It is an HTTP REST API for searching and fetching live news articles. It provides both live and historical articles; an API key is created to access the data.
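A minimal sketch of fetching live articles with the requests library is shown below. The endpoint follows NewsAPI's public documentation, while the API key placeholder, country code, page size and the fields extracted are assumptions for illustration.

# Sketch of pulling live headlines from NewsAPI; the key placeholder and
# parameter values are assumptions, not the authors' exact configuration.
import requests

API_KEY = "YOUR_NEWSAPI_KEY"           # obtained by registering at newsapi.org
URL = "https://newsapi.org/v2/top-headlines"

response = requests.get(URL, params={"country": "in", "pageSize": 100, "apiKey": API_KEY})
articles = response.json().get("articles", [])

# Keep only the headline text, which is what the classifier consumes
headlines = [a["title"] for a in articles if a.get("title")]
print(len(headlines), "headlines fetched")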
The data is preprocessed by converting it to lower case and removing links, numbers, and punctuation. Preprocessing is an important step in our application, as it reduces the dimensionality of the vectors. We deal with a multi-class problem. The class names are categorical [3] and hence human-readable. Label encoding is a technique that converts categorical data to numerical values; in our case, it converts the category names into numbers, assigning each label a value from 0 to n − 1, where n is the number of distinct classes.
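For example, label encoding with scikit-learn's LabelEncoder might look as follows; this is a sketch using the category names mentioned above, not the authors' exact code.

# Label encoding example: category names are mapped to integers 0 .. n-1.
from sklearn.preprocessing import LabelEncoder

categories = ["Sports", "Entertainment", "Political", "Health", "Business", "Sports"]
encoder = LabelEncoder()
encoded = encoder.fit_transform(categories)

print(list(encoder.classes_))   # ['Business', 'Entertainment', 'Health', 'Political', 'Sports']
print(encoded.tolist())         # [4, 1, 3, 2, 0, 4] -- values from 0 to n_classes - 1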
The count vectorizer converts a text into a vector based on the count of each word in the text. This creates a matrix in which the columns are unique words and each row is a document. Inside the count vectorizer, a column is not stored as a string; each word is assigned a particular index. An object of CountVectorizer() is created, and the values are converted to numerical form using fit_transform(). The TF-IDF vectorizer calculates the relevance of a word to a document within a set of documents, computed by multiplying the term frequency (TF) of the word in the document by the inverse document frequency (IDF) of the word across the documents. If the word is common across many documents, the score approaches 0; otherwise, it approaches 1.
The Multinomial Naive Bayes classifier is based on the Bayes theorem. It uses the Laplace smoothing technique internally, which solves the zero-probability problem.
3.6 Pseudocode
The UCI News Aggregator dataset is used to obtain the data; after preprocessing, the data is divided in an 80:20 ratio as shown in Fig. 4. The dataset contains four lakh rows of data, on which the accuracy is obtained. The model gave an accuracy of 92 percent. The algorithm's performance is evaluated on test data of varied sizes and with training data of diverse content.
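A minimal sketch of the pipeline described here is shown below, assuming the UCI News Aggregator CSV layout (TITLE and CATEGORY columns) and standard scikit-learn components; the file name and column names are assumptions, and this is an illustration rather than the authors' exact code.

# Sketch of the described pipeline: 80:20 split, count + TF-IDF vectorisation,
# Multinomial Naive Bayes (Laplace smoothing via alpha=1.0) and accuracy evaluation.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

df = pd.read_csv("uci-news-aggregator.csv")          # ~4 lakh rows; assumed columns TITLE, CATEGORY
X_train, X_test, y_train, y_test = train_test_split(
    df["TITLE"], df["CATEGORY"], test_size=0.2, random_state=42)

count_vec = CountVectorizer(lowercase=True, stop_words="english")
tfidf = TfidfTransformer()
clf = MultinomialNB(alpha=1.0)                        # Laplace smoothing

X_train_counts = count_vec.fit_transform(X_train)
clf.fit(tfidf.fit_transform(X_train_counts), y_train)

X_test_tfidf = tfidf.transform(count_vec.transform(X_test))
print("accuracy:", accuracy_score(y_test, clf.predict(X_test_tfidf)))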
Experiments were also carried out with the support vector machine (SVM) and multilayer perceptron (MLP) algorithms. Unlike SVM and MLP, Naive Bayes considers all features to be mutually independent, ignoring correlations that might be important in the dataset. Even so, Multinomial Naive Bayes shows better accuracy than the former algorithms (Figs. 5 and 6).
• The model built by applying the multinomial Naive Bayes algorithm categorizes the data collected from different news sources using NewsAPI into the different news categories described above.
Fig. 4 News categorization system dataset division for training and testing and accuracy
References
1. Patro A, Patel M, Shukla R, Save DJ (2020) Real time news classification using machine learning.
Int J Adv Sci Technol. Dept. of Information Technology, Fr. Conceicao Rodrigues college of
Engineering, Mumbai, India
2. Tsai C-W (2017) Real-time news classifier search engine architecture final report
3. Sivakami M, Thangaraj M (2018) Text classification techniques: a literature review. Interdisc J
Inf Knowl Manage
4. Tang X, Xu A (2016) Multi-class classification using kernel density estimation on k-nearest
neighbours. Electron Lett
5. Jivan NE, Yousefi KS, Fazeli M (2010) New approach for automated categorizing and finding
similarities in online Persian news. In: Technological convergence and social networks in infor-
mation management, international symposium on information management in a changing world,
2010
6. Suleymanov U, Rustamov S (2018) Automated news categorization using machine learning
methods, vol 459. IOP Publishing, p 012006. [Online]. Available, https://doi.org/10.1088/1757-
899x/459/1/012006
Implementation of STFT for Auditory
Compensation on FPGA
1 Introduction
Hearing aids are used to compensate for the hearing loss. Hearing loss may occur
in outer ear or middle ear (conductive loss) or inner ear (sensorineural hearing loss).
Audibility loss and a reduction in the dynamic hearing range are common symptoms
of hearing loss. Most common treatment for hearing loss is use of hearing aids.
The most important part in hearing aid is the auditory compensation block. It is
responsible for compensating for the hearing loss of the patient. In hearing aids, a
microphone picks up sounds, an amplifier makes them louder, and a receiver delivers the amplified signal to the ear. Frequency structuring and dynamic range correction are two functions that auditory compensation provides. The former compensates for the frequency-dependent loss and provides flexibility in meeting the requirement of hearing-impaired people to increase the gain of the audio signal. The latter adjusts the received signal's range to fit within the residual dynamic range. The
amount of auditory compensation required is determined by the audiogram which
is obtained by performing an audiometry test on the subject. The audiogram depicts
the kind, severity and pattern of hearing loss. Based on the patient’s audiogram, the
amplification required for a specific frequency band is established [1].
Researchers have so far implemented auditory compensation by using filter banks
[2]. The general methodology adopted for auditory compensation has been using
multi-channel filter banks [3–5]. This requires more filter coefficients. The number of
banks is usually limited to 8 in order to strike a balance between the needed frequency
response resolution, processing complexity, and signal delay caused by processing.
The majority of hearing aids use digital technology which allows for more precise
programming and feedback control. The possibility of aliasing is present while digi-
tizing analog signals. The rate of sampling must always be greater than twice the highest frequency of the sampled waveform. Application of an anti-aliasing filter was
also suggested [6]. Overall hearing aid performance is improved by adjusting gain
function. The filter bank approach enables a variety of capabilities, such as voice
coding and noise reduction, auditory compensation and sub-band coding. Noise
reduction, echo cancelation, auditory compensation, and voice enhancement are all
performed repeatedly in the sub-band coding DSP algorithm. The majority of designs use a fixed sub-band structure to minimize the filter bank. Such a fixed structure cannot always be adapted to the needs of hearing-impaired users; although it can be customized for each patient, the main disadvantage is that fewer sub-bands are available. When FIR filters with extremely small transition bands are required, frequency
response masking is a regularly used algorithm [7]. The filter bank approach has
also higher power dissipation due to computational complexity. To reduce filter bank
hearing aid power dissipation, a low computational complexity interpolated FIR filter
bank was proposed, as well as a strategy to reduce the amount of power consumed
due to multiplications, which are the filter bank’s main operations [8–10]. The filter
bank’s power dissipation is further decreased with effective word length reduction
of the coefficient of filter channels without compromising with the performance.
The usage of the 1/3 octave filter bank in acoustic applications closely matches
the frequency characteristics of the human ear. The high computational complexity
and power consumption limit its applications. A novel approach using short-time
Fourier transformation (STFT) is used to provide auditory compensation [11–13].
The Fourier transform of a tiny portion of the audio signal is performed. Using
the quasi-stationary property of speech signals, the method of STFT for auditory
compensation is being implemented which can overcome many of the drawbacks of
the filter bank approach. STFT is a two-dimensional transformation calculated by separating the input signal into segments and computing the DFT of each segment using a sliding time-limited window; the dominant audio frequency is then determined, the required correction is applied to the incoming signal, and the result is delivered to the subject through an earphone. There is no need for a filter bank, which reduces computational complexity. This also eliminates the signal reconstruction problem that exists with the filter bank approach, leading to very low power dissipation.
Implementing a larger number of bands is also straightforward: a higher sampling rate can be used, and a 32-point FFT can be used instead of a 16-point FFT. This increases the number of bands and can provide better correction; however, the power dissipation will also increase.
This work focuses on the implementation of STFT for auditory compensation, a potential algorithm that could replace the current filter bank approach with significantly less logic, which leads to better performance and lower power consumption with a smaller footprint.
x(t) \overset{\text{FT}}{\longleftrightarrow} X(i\omega)    (1)
where x(t) is the time-domain signal and X(iω), obtained in units of rad/s, is its complex-valued Fourier transform. The Fourier transform from time domain to frequency domain is:
X(i\omega) = \int_{-\infty}^{\infty} x(t)\, e^{-i\omega t}\, dt    (2)
The inverse Fourier transform from frequency domain to time domain is:
x(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} X(i\omega)\, e^{i\omega t}\, d\omega    (3)
For a discrete sequence f_j of length n, the discrete Fourier transform (DFT) and its inverse are:
f_k = \sum_{j=0}^{n-1} f_j\, e^{-i 2\pi j k / n}    (4)
f_j = \frac{1}{n} \sum_{k=0}^{n-1} f_k\, e^{i 2\pi j k / n}    (5)
The fast Fourier transform, or FFT, is an algorithm developed to compute the DFT in an extremely economical fashion. It is much faster because it reuses the results of previous computations to reduce the number of operations and employs a divide-and-conquer approach. In particular, it exploits the periodicity and symmetry of trigonometric functions to compute the transform with approximately N log N operations. The DFT can also be expressed as
F_k = \sum_{n=0}^{N-1} f_n\, W^{nk}    (6)
where W = e^{-i 2\pi / N}. Splitting the sample in half, the sum can be expressed in terms of the first and last N/2 points:
F_k = \sum_{n=0}^{(N/2)-1} f_n\, e^{-i(2\pi/N)kn} + \sum_{n=N/2}^{N-1} f_n\, e^{-i(2\pi/N)kn}    (7)

F_k = \sum_{n=0}^{(N/2)-1} f_n\, e^{-i(2\pi/N)kn} + \sum_{m=0}^{(N/2)-1} f_{m+N/2}\, e^{-i(2\pi/N)k(m+N/2)}    (8)

or

F_k = \sum_{n=0}^{(N/2)-1} \left( f_n + e^{-i\pi k} f_{n+N/2} \right) e^{-i 2\pi k n / N}    (9)

F_{2k} = \sum_{n=0}^{(N/2)-1} \left( f_n + f_{n+N/2} \right) e^{-i 2\pi (2k) n / N}    (10)

= \sum_{n=0}^{(N/2)-1} \left( f_n + f_{n+N/2} \right) e^{-i 2\pi k n / (N/2)}    (11)

F_{2k+1} = \sum_{n=0}^{(N/2)-1} \left( f_n - f_{n+N/2} \right) e^{-i 2\pi (2k+1) n / N}    (12)

= \sum_{n=0}^{(N/2)-1} \left( f_n - f_{n+N/2} \right) e^{-i 2\pi n / N}\, e^{-i 2\pi k n / (N/2)}, \quad \text{for } k = 0, 1, 2, \ldots, (N/2)-1    (13)
where f_n is the input sequence, N is its length, F_{2k} are the even-indexed output values and F_{2k+1} are the odd-indexed output values. Combining the even and odd values gives the desired output F_k. Figure 1 depicts the radix-2 8-point FFT butterfly diagram, in which smaller DFTs are combined to form a larger DFT. The computation is broken down into three stages: the first stage with four 2-point DFTs, then two 4-point DFTs and finally an 8-point DFT. For an N-point FFT, there are N/2 butterflies per stage and log2(N) stages. Each butterfly involves 1 complex multiplication and 2 complex additions. Hence, there are (N/2)·log2(N) multiplications and N·log2(N) additions.
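The even/odd decomposition of Eqs. (10)–(13) can be illustrated with a compact recursive radix-2 (decimation-in-frequency) FFT. The sketch below is for illustration only, and numpy's reference FFT is used merely to verify the result.

# Radix-2 decimation-in-frequency FFT mirroring the even/odd split above.
import numpy as np

def fft_radix2(x):
    """Radix-2 DIF FFT; the length of x must be a power of two."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N == 1:
        return x
    half = N // 2
    twiddle = np.exp(-2j * np.pi * np.arange(half) / N)
    even = fft_radix2(x[:half] + x[half:])               # F_2k,   Eq. (10)
    odd = fft_radix2((x[:half] - x[half:]) * twiddle)    # F_2k+1, Eq. (12)-(13)
    out = np.empty(N, dtype=complex)
    out[0::2], out[1::2] = even, odd                     # interleave even/odd outputs
    return out

samples = np.random.rand(16)                             # a 16-point frame, as used in this work
print(np.allclose(fft_radix2(samples), np.fft.fft(samples)))   # True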
A signal is considered stationary when its statistical properties do not change with time. The motion of visual objects appears continuous if frames are presented successively within an interval of 42 ms [14]. This raises the question of whether the same paradigm can be applied to speech signals, since they are quasi-stationary. To analyze the local frequency spectrum of an audio signal, a particular number of samples (a frame) can be collected periodically and converted to the frequency domain using the Fourier transform. This can be achieved with the short-time Fourier transform. The window's hop length in STFT separates the frames temporally, and the window length defines the number of samples in a frame. Essentially, STFT is a DFT whose input is gated by a window function. Hence, for a discretized audio signal x[n], a discretized time-limited window w[n] of length T, hop length D and ith frame x_i, we have:
x_i(n) = x(iD + n)\, w(n), \quad n = 0, 1, \ldots, T-1    (14)

X_i(k) = \sum_{n=0}^{T-1} x_i(n)\, e^{-j 2\pi k n / T}, \quad k = 0, 1, 2, \ldots, T-1    (15)
The length of DFT is equal to the frame length, which in turn is equal to the
window length. Since the Fourier transform is applied on discrete signals, T (length
of DFT) discrete bins are obtained. When DFT of length T is applied on a signal
sampled at f s Hz, frequency of kth bin is given by:
f_k = \frac{k f_s}{T}    (16)
To sum up, Eqs. (14) and (15) collectively work as STFT and can be used to
analyze local frequency spectrum of the input signal. Figure 2 represents the block
diagram of STFT algorithm for auditory compensation. The input audio signal is
sensed by the microphone and digitized using an ADC.
In the audiometry test, the subject’s audibility is tested for octave frequencies
250 Hz–8 kHz. Hence, the input audio must be sampled at 16 kHz at least to be
able to recover the highest frequency. For the STFT, a window function must be
provided as input apart from the input audio signal. Length of DFT is equal to length
of the window since the frame length is limited by the window. Magnitude of the
DFT’s outputs is taken and is fed to MAX bin block, where the index of the highest
magnitude component is determined (dominant frequency for that frame). This index
is then fed as input to the amplifier which is essentially a look-up table (LUT). The
index obtained from MAX bin block is used to address the gain values stored in
LUT, the gain values are loaded to the LUT beforehand based on the audiometry
test done on the subject according to the respective bins. The gain value obtained
from the LUT is multiplied with the input audio signal and is then given as input
to DAC which eventually will drive the speaker. Since the time taken to process the
frame (determine the dominant frequency component and its respective gain value)
is lesser than the quasi-stationary period of the speech signal, this model should be
able to produce specified amount of amplification for the respective frequencies.
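A minimal numpy sketch of this flow is given below: frame the 16 kHz input, take a 16-point FFT, pick the dominant bin among the first N/2 outputs, look up a gain and scale the frame. The gain table values and the test tone are hypothetical placeholders, not values from the paper.

# Sketch of STFT-based auditory compensation; GAIN_LUT is a placeholder
# for an audiogram-derived look-up table.
import numpy as np

FS = 16000                 # sampling rate (Hz)
T = 16                     # window / FFT length
HOP = 16                   # skipped samples between processed frames (alternate frames)
GAIN_LUT = np.array([1, 1, 2, 4, 4, 2, 1, 1], dtype=float)   # one gain per bin 0..7 (assumed)

def compensate(signal):
    out = np.copy(signal).astype(float)
    for start in range(0, len(signal) - T + 1, T + HOP):
        frame = signal[start:start + T]
        spectrum = np.fft.fft(frame)                  # 16-point FFT
        k = np.argmax(np.abs(spectrum[:T // 2]))      # dominant bin among first N/2 outputs
        out[start:start + T] = frame * GAIN_LUT[k]    # scale the processed frame; skipped frames pass through
    return out

t = np.arange(FS // 100) / FS                         # 10 ms test tone at 3 kHz
print(compensate(np.sin(2 * np.pi * 3000 * t))[:4])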
The STFT algorithm for auditory compensation wave was implemented on
Simulink. Figure 3 shows the block diagram of the implemented system on Simulink.
An audio file (in any common format, e.g., .mp3, .wav, .flac or .wma) is used in this work as the audio signal. The input signal is sampled at 16 kHz. The output of the "From Multi-
media File” block is discrete, hence, to perform STFT, along with the discrete input
a discrete window function is required for which a discrete pulse generator is used.
Parameters defined in this block define STFT properties such as the window length,
DFT length (FFT is used instead owing to its reduced computational complexity) and
hop length. Sample time of both, the pulse generator block and “from multimedia
file” block should match since both these values are multiplied in STFT block before
the FFT is performed. Figure 4 represents the internal block diagram of STFT.
The only functional units are the multiplier and the FFT unit. Buffers are added to feed the input element by element (sample-wise), and another buffer is added to match the FFT length. The blocks "input", "window" and "fft_in" record the data passing through them and send it to the MATLAB workspace. The buffer connected to the output of the multiplier collects 16 products and then sends them to the FFT unit.
The STFT parameter "hop length" is defined by specifying the number of zeroes within one time period T of the window; it is given by the difference between the period and the pulse width of the pulse generator. The samples for which the corresponding window value is '1' are buffered and sent to the FFT unit. The FFT length and the window length must be the same, and in this project they are both set to 16. The 16 complex outputs obtained from the FFT are fed to a custom Simulink block which determines the magnitudes and then the index of
the component which has highest magnitude. This index corresponds to bin number
of the dominant frequency component. This is then fed to another custom Simulink
block which outputs the gain value based on the input fed (index). The gain values
are determined beforehand for the subject by audiometry test and stored according to
the bin numbers. The discretized digital output from “From multimedia file” block
is multiplied with the obtained gain value for amplification, and this product is sent
to “To multimedia file” block which writes another audio file. This represents the
corrected audio signal as per audiogram. The generated audio file can be played
either on MATLAB or on any other media player to observe amplification.
The STFT parameters, window length and hop length, are determined by trial.
Various audio files sampled at 16 kHz with 16-bit sample precision were given as
input to the system, and after amplification, it was written to another audio file and
the generated audio file was tested for quality by playing it on a media player. Aging
does not significantly affect the low-frequency hearing thresholds, so a 16-point FFT is used in this work; the first bin then covers a 1 kHz band, and the frequency resolution is 1 kHz. Although a 32-point FFT could provide better results, since it has a frequency resolution of 500 Hz, its resource utilization would be significantly higher [15].
Since the FFT length is set to 16, the window length must also be 16. The hop length determines the number of samples that are not processed; the larger it is, the less computation is done, but the audio quality may also degrade. The maximum hop length must be determined such that the audio quality is retained relative to the source. This is done by incrementing the hop length by 1 and running the simulation to check the audio quality; the hop length is incremented as long as the quality is retained. With this methodology, a maximum hop length of 16 was achieved. In the
Discrete Pulse Generator block, difference between period and pulse width serves
as the hop length. The parameter phase delay could also be used instead to set the
hop length. Pulse width specifies the window length. This implies, alternate frames
consisting of 16 samples are being processed, which in turn implies that only 50%
of the total samples are processed, thereby reducing the switching in circuit by 50%
and reducing dynamic power dissipation.
3 HDL Implementation
The whole system as indicated in Fig. 5 is implemented in RTL using Verilog HDL.
Verilog code for the 16-point FFT is generated using MATLAB, while the Verilog code for all other modules was designed manually. The other modules required are clock dividers, a SIPO unit, a MAX bin block, a gain look-up table and a gain multiplier. The system's main
clock which runs at 256 kHz is fed to the FFT block and a clock divider circuit.
The FFT module is computationally intensive and requires seven clock cycles to provide the output, which is why it is driven with the faster clock. The system requires two clock dividers: clock divider 16 (clk_div_16) and clock divider 32 (clk_div_32).
The multiplexer must select input from the ADC for a duration of one frame.
Since each frame consists 16 samples and the sampling rate is 16 kHz, one frame
happens to be 1 ms long. As alternate frames are being processed, multiplexer must
output '0' for 1 ms as well, for which the select line must be '0' for 1 ms. The system clock runs at a 50% duty cycle, and with the help of the clock dividers the required window can be generated. Hence, a window of 1 ms and a hop length of 1 ms are required, and the clock period for such a window must be 2 ms with a 50% duty cycle. clk_div_32 (the 8 kHz clock) is used as the select signal of the multiplexer.
The Serial-In Parallel-Out (SIPO) unit is driven by clk_div_16 (the 16 kHz clock). It collects 16 input samples and then outputs them in parallel. It has an inbuilt counter which counts the number of inputs received; once the count reaches 16, it outputs all the received inputs at once and sets the valid bit high to indicate that the output is ready.
The input from ADC, each of 16-bit, after passing through the multiplexer, gets
accumulated in SIPO. Sixteen such samples from ADC get accumulated in SIPO
after which they are fed to FFT block for processing. Since the second N/2 values
of FFT are repetition of first N/2 values, only first N/2 outputs from this block are
carried to the MAX bin block. As 16 point FFT is performed, the first 8 outputs of FFT
block are real components and the next 8 are corresponding imaginary components.
The MAX bin block uses first eight outputs of FFT, determines the maximum among
them and finds its index (bin number) and outputs it. The module must be initially
reset to initialize the states of registers. The MAX BIN block calculates the dominant
frequency component and outputs its index, which is then fed to the gain lookup table
wherein gain values are pre-stored; hence, the gain value according to the index
stored in lookup table is given as output. The amplification required at frequencies
corresponding to the bin values is determined by the audiogram and stored in the
gain look up table. The gain multiplier is a signed 16-bit multiplier. The sampled
input in real time is multiplied by the gain value in gain multiplier block to produce
the required amplification. It is assumed that the audio signal is quasi-stationary.
Implementation of the proposed system was done with respect to Xilinx Artix-7
(xc7a100tcsg324-1) using Xilinx Vivado HLx Design suite 2019.2. Figure 6 depicts
the elaborated design schematic for the system. It is very similar to the RTL block
diagram shown in Fig. 5.
Table 1 depicts timing summary of implemented design. It can be seen that
setup, hold and pulse-width slacks are positive and that the design has met timing
constraints. It can be inferred that, after logic, power and area optimizations, the elaborated schematic, which was merely a graphical representation of the RTL, has been transformed into a device-specific schematic by synthesis and place-and-route (PAR). Table 2 represents the resource utilization of the implemented design.
It can be seen that FFT block contains significantly more logic than any other
blocks. The large number of IOBs in the top module is attributed to the gain values being fed using the verification IP cores VIO, ICON and ILA. While creating these IP cores, the nets to be observed and probed must be specified, after which the design is re-implemented. For this re-implemented design, a bitstream is generated and the FPGA is programmed, after which input signals (or nets) are triggered from Vivado and sent to the FPGA board; the board processes them and sends the outputs back to Vivado. In this manner, all desired inputs, outputs and intermediate nets can be probed. To program the device (FPGA), the implemented design is converted to a bitstream using the Xilinx bitstream generation tool. The FPGA is programmed and verified for functionality.
Fig. 7 Comparison of outputs from magnitude FFT blocks of input and output
The model built on Simulink was tested for numerous audio files sampled at 16 kHz
with 16-bit precision. The amplification was observed by keeping the volume levels
of the media player at same level for input and output audio files. The amplification
was also observed by comparing the outputs of magnitude FFT blocks for input and
output frames (16 samples). Figure 7 depicts the comparison of outputs between the
magnitude FFT blocks of the input and output. Magnitude FFT represented in blue is
for the input signal whereas magnitude FFT1 represented in yellow is of the output.
It can be seen from the waveform that the magnitude of output is significantly higher
than that of the input.
To verify frequency-based amplification, control over each frequency bin also had to be tested: single-tone signals were fed in, the gain values were configured to amplify only that particular bin, and the corresponding magnitude FFT graphs were compared.
When 16-point FFT is applied on a signal sampled at 16 kHz, each bin has resolution
of 1 kHz. Figure 8 represents magnitude FFT comparison for a monotone signal of
3 kHz. Only the third address of the LUT was filled, rest all were initialized to zero.
This way, it was ensured that only the 3rd bin was amplified.
The delay that the implemented system imposed is obtained from post-
implementation simulation. The same (delay of about 1.15 ms) has been modeled in
Simulink and another audio file is written with these constraints. Figure 9 represents
the system being modeled on Simulink with the delays. The output of the second
custom block is made to go through 16-unit delay elements. Each of these unit delay
blocks provide 1 sample time delay, and 16 such blocks are used to model for the
parasitic effects (in terms of delay) introduced by the FPGA.
Figure 10 represents the audiogram of a patient. Custom block 2 is populated with gain values in accordance with this audiogram and then simulated; Fig. 11 shows the FFT magnitude for the same configuration. Since the gain values are required to vary logarithmically, they are significantly higher than the input samples.
Fig. 10 Audiogram
Fig. 11 Magnitude FFT when configured with respect to the above audiogram
6 Conclusion
Hearing loss is diagnosed when hearing testing finds that a person is unable to hear sounds at 25 decibels in at least one ear. Treatment for hearing loss depends on the cause and severity of the loss. The hearing aid is the most commonly used solution.
The STFT approach for auditory compensation was verified on Simulink and imple-
mented on Digilent Nexys-4 DDR Artix-7 FPGA using Xilinx VIVADO. Though
filter bank approach provides high quality and high personalization (in terms of
amplification), its resource utilization is significantly higher when compared to that
of STFTs. Merely determining the dominant frequency component at each instant and applying the appropriate gain replaces the reconstruction process followed in the filter bank approach; by doing so, not only is the synthesis filter eliminated, the additional noise filter that accompanies it can be dropped as well. The frequency resolution can be changed by varying the FFT length and window length instead of adding filters. STFT introduces a new paradigm for the design of digital hearing aids and can replace the filter bank approach.
References
1. Alexandro MSA, Eduardo LR, Bampi S (1999) Dynamically reconfigurable architecture for
image processor applications. In: DAC ’99: proceedings of the 36th annual ACM/IEEE design
Automation Conference, USA, pp 623–628. https://doi.org/10.1109/DAC.1999.782018
2. Gonzalez RC, Woods RE (2014) Digital image processing. Pearson Education Limited
3. Fowers SG, Lee D-J, Ventura DA, Archibald JK (2012) The nature-inspired BASIS feature
descriptor for UAV imagery and its hardware implementation. IEEE Trans Circuits Syst Video
Technol 23(5):756–768. https://doi.org/10.1109/TCSVT.2012.2223631
4. Fularz M, Kraft M, Schmidt A, Kasiński A (2015) A high-performance FPGA-based image
feature detector and matcher based on the FAST and BRIEF algorithms. Int J Adv Rob Syst
12(10):1–15. https://doi.org/10.5772/61434
5. Kuo YT, Lin TJ, Li YT, Liu CW (2016) Design and implementation of low-power ANSI S1.11
filter bank for digital hearing aids. IEEE Trans Circuits Syst: Regular Papers 57(7):1684–1696.
https://doi.org/10.1109/TCSI.2009.2033539
6. Levyitt H (1987) Digital hearing aids: a tutorial review. J Rehabil Res Dev 24(4):7–20
7. Liu CW, Chang KC, Chuang MH, Lin CH (2013) 10 ms 18 band quasi-ANSI S1.11, 1/3 octave
filter bank for digital hearing aids. IEEE Trans Circuits Syst 60(3):638–649. https://doi.org/10.
1109/VLSI-DAT.2012.6212620
8. Chong KS, Gwee BH, Chang JS (2006) A 16-channel low-power non-uniform spaced filter
bank core for digital hearing aids. IEEE Trans Circuits Syst 5(9):853–857. https://doi.org/10.
1109/TCSII.2006.881821
9. Wei Y, Liu D (2013) A reconfigurable digital filterbank for hearing-aid systems with a variety
of sound wave decomposition plans. IEEE Trans Biomed Eng 60(6):1628–1635. https://doi.
org/10.1109/TBME.2013.2240681
10. Chong KS, Gwee BH, Chang JS (2006) A 16-Channel low-power nonuniform spaced filter
bank core for digital hearing aids. In: 2006 IEEE Biomedical Circuits and Systems Conference,
pp 186–189. https://doi.org/10.1109/TCSII.2006.881821
11. Girisha GK, Pinjare SL (2020) Implementation of novel algorithm for auditory compensation
in hearing aids using STFT algorithm. Acta Technica Corviniensis—Bull Eng 13:4654–4660
12. Griffin D, Lim J (1984) Signal estimation from modified short-time Fourier transform. IEEE
Trans Acoust Speech Signal Process 32(2):236–243. https://doi.org/10.1109/TASSP.1984.116
4317
13. Alsteris LD, Paliwal KK (2007) Iterative reconstruction of speech from short-time Fourier transform phase and magnitude spectra. Comput Speech Lang 21(1):174–186. https://doi.org/
10.1016/j.csl.2006.03.001
14. Nakajima Y, Matsuda M, Ueda K, Remijn GB (2018) Temporal resolution needed for auditory
communication: measurement with mosaic speech. Frontiers Human Neurosci 12:149. https://
doi.org/10.3389/fnhum.2018.00149
15. Kurakata K, Mizunami T, Sato H, Inukai Y (2008) Effect of Ageing on Hearing Thresholds in
the Low Frequency Region. J Low Freq Noise Vib Active Control 27(3):175–184
Smart Headgear for Unsafe Operational
Environment
1 Introduction
Two-wheeler accidents [2] are increasing day by day, leading to numerous deaths. The probability of these deaths can be decreased significantly by using Smart Headgear (helmets). About 1.2 million individuals lose their lives in road accidents. The death rate is not decreasing even though hospitals provide emergency ambulance services. To overcome these issues, two rules must be met by the wearer of the smart helmet: first, the rider must wear the helmet, which is checked by an FSR sensor; second, the rider should not have consumed alcohol before riding, which is checked by the MQ3 alcohol sensor.
When the rider has consumed alcohol, the MQ3 sensor analyses the rider's breath to identify the amount of alcohol content and sends a message to the registered contact. Third, when the rider meets with an accident, the gyro sensor identifies it and the module sends the user's location to emergency services and the registered contacts through an SMS. Thus, the helmet can detect an accident using the gyro sensor, the MQ3 alcohol sensor analyses the rider's breath to check whether the alcohol level is within the authorized limit, and the FSR sensor identifies whether the rider is wearing the helmet or not.
The purpose of Smart Headgear/Helmet [3] is to provide safety for the vehicle
rider. This Headgear has advanced features like alcohol detection [1], accident iden-
tification, location tracking, use as a hands-free device, and fall detection. This makes
it not only a Smart Headgear but also a feature of a smart bike: the ignition of the vehicle cannot be turned ON unless the helmet is worn, which makes wearing the helmet compulsory.
2 System Architecture
• The sensors collect the monitoring parameters such as the level of intoxication, drowsiness of the driver, speed of the vehicle, fall detection, location of the headgear, user details, etc.
• The sensor readings are time-stamped, packetized and communicated at regular intervals to the storage location (cloud database).
• A mobile app is developed for users to register for the service and avail of the services.
• Users can also use the web app for registration and for service-related queries.
• AWS EC2 is used for analysis of sensor health using an outlier model.
This module is used for checking whether the user has consumed alcohol or not. The alcohol sensor was tested with different substances such as sanitizer, water, etc. If the user has consumed alcohol beyond the limit, an SMS is sent to the parents or guardians informing them that the user has consumed excess alcohol. The alcohol sensor has high sensitivity and a fast response time, which helps keep the user safe.
Features of alcohol sensor are as follows:
• Wider detecting scope that helps for user.
• Highly stable and also have long life.
• Produces fast response and also has high sensitivity.
Gyro sensors are angular rate sensors. The angular rate can be characterized as the change in rotational angle per unit time. Gyro sensors can detect the rotational movement of the helmet as well as changes in its orientation. In recent years, gyro sensors have found use in camera shake detection systems, motion sensing, and vehicle stability (anti-slip) control systems, and they have grown rapidly in areas such as vehicle driver safety and support systems and also in robot motion control.
They are used for sensing angular velocity and angle.
Vibration sensors use the piezoelectric effect, which converts changes in pressure, temperature, and force into an electrical charge. They are able to sense weak vibration signals. Whenever an accident occurs while the user is wearing this smart helmet, the sensor first senses the impact and immediately sends the data to the NodeMCU, which then sends an SMS with the location details to the user's family members without delay.
This sensor is mainly used in our project to detect whether the user is wearing the helmet or not. The sensor senses human touch to fetch the data and is activated before the user starts the bike. If the user wears the helmet and satisfies the condition, a signal is sent indicating that the bike can be started. FSR sensors are used extensively in medical systems, robotics, and industrial applications.
3 Software Implementation
The project has been implemented using a Raspberry Pi 3 and sensors such as the FSR, MQ3, gyro, and GPS modules; the steps followed are described below. As the values from the controller and sensors must be stored for future analysis and reference, a cloud-based database is needed. Since the parameters are dynamic, a NoSQL database (MongoDB Atlas) has been used, and it is made available for the team members to access.
• The tables are Users, Alcohol Values, and Sensor Status.
• The tables are mapped to each other.
The REST API is used by the app and web app to communicate with the sensors. The REST API was created using Flask, tested using Postman, and deployed on Heroku; each URL takes parameters with predefined arguments and renders JSON. The data from these URLs is inserted into or updated in MongoDB.
URLs:
https://shmet.herokuapp.com/
https://shmet.herokuapp.com/api/user/12345/status/off
https://shmet.herokuapp.com/api/user/12345/getalcohol
https://shmet.herokuapp.com/api/valid/[email protected]/Dhanushp
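A hedged sketch of such a Flask API covering two of the routes listed above is shown below. The MongoDB connection string, database name and collection names are placeholders, and error handling and authentication are omitted for brevity.

# Sketch of the REST API; connection string and collection names are assumptions.
from flask import Flask, jsonify
from pymongo import MongoClient

app = Flask(__name__)
db = MongoClient("mongodb://localhost:27017")["shmet"]   # placeholder for the Mongo Atlas URI

@app.route("/api/user/<user_id>/status/<state>")
def set_status(user_id, state):
    # insert or update the helmet ON/OFF status for this user
    db.sensor_status.update_one({"user": user_id}, {"$set": {"status": state}}, upsert=True)
    return jsonify({"user": user_id, "status": state})

@app.route("/api/user/<user_id>/getalcohol")
def get_alcohol(user_id):
    doc = db.alcohol_values.find_one({"user": user_id}, {"_id": 0}) or {}
    return jsonify(doc)

if __name__ == "__main__":
    app.run()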
This sensor is used to test the alcohol content in the breath. As the Raspberry Pi does not have an analog-to-digital converter, an ADC IC called the MCP3008 is used. Pin diagram of the MQ3 module:
VCC supplies power to the module and can be connected to the 5 V output; GND is the ground pin and must be connected to a GND pin; D0 provides a digital indication of the presence of alcohol; and A0 provides an analog output voltage proportional to the concentration of alcohol. A0 is connected to channel 0 of the MCP3008. The circuit connection is shown in the figure below.
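A sketch of reading the MQ3's analog output through MCP3008 channel 0 with the spidev library is shown below; the SPI wiring and the alcohol threshold are assumptions for illustration, not the authors' exact code.

# Reading the MQ3 through the MCP3008 on a Raspberry Pi (illustrative sketch).
import spidev
import time

spi = spidev.SpiDev()
spi.open(0, 0)                     # SPI bus 0, chip-select 0 (assumed wiring)
spi.max_speed_hz = 1350000

def read_adc(channel):
    """Return the 10-bit reading (0-1023) from the given MCP3008 channel."""
    reply = spi.xfer2([1, (8 + channel) << 4, 0])
    return ((reply[1] & 3) << 8) | reply[2]

ALCOHOL_THRESHOLD = 400            # assumed raw-value limit after calibration

while True:
    value = read_adc(0)            # MQ3 A0 is wired to channel 0
    if value > ALCOHOL_THRESHOLD:
        print("Alcohol above authorized level:", value)
    time.sleep(1)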
The FSR is used to check whether the rider is wearing the helmet. A force sensitive resistor (FSR) is a material whose resistance changes when a force or pressure is applied; in other words, it is a sensor that allows physical pressure, squeezing, and weight to be detected. The circuit connection is as shown in the figure.
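A similar sketch can be used for the helmet-wear check; here the FSR divider is assumed to be wired to MCP3008 channel 1, and the pressure threshold is an illustrative calibration value.

# Helmet-wear check from the FSR via the MCP3008 (illustrative sketch).
import spidev

spi = spidev.SpiDev()
spi.open(0, 0)
spi.max_speed_hz = 1350000

def read_adc(channel):
    reply = spi.xfer2([1, (8 + channel) << 4, 0])
    return ((reply[1] & 3) << 8) | reply[2]

FSR_THRESHOLD = 200          # raw value indicating the helmet is being worn (assumed)

if read_adc(1) > FSR_THRESHOLD:
    print("Helmet detected - rider may start the bike")
else:
    print("Helmet not worn")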
GPS is used to track or fetch the location of the helmet and the bike rider, which helps forward the location to the guardians whenever the person consumes alcohol or meets with an accident. The connections are shown below.
To test the working of the sensor, alcohol-based solutions were used: the fumes of the alcohol are directed into the sensor and the potentiometer is turned clockwise until the status LED turns ON; as the fumes reduce, the potentiometer is turned back counterclockwise until the LED goes OFF. The sensor can thus be adjusted to the authorized level and prepared for use.
We were able to check the working by applying some pressure on the sensor with our hands and placing some weight on it using objects. We could then notice the change in the sensor value in the terminal while it was running on the IoT model.
We were able to verify the working of the gyro sensor after connecting it to a breadboard and power supply and changing the orientation and rotating the position of the sensor. We could see the value at the terminal change from 'not fall' to 'fall', indicating that the axial position of the sensor had been disturbed.
After implementing the GPS sensor in the model and configuring it using an API, the user was able to see the current position within the app itself, and we were also able to send the coordinates in the SOS emergency alert.
• The smart protective headgear ensures that the vehicle rider wears the helmet and also ensures that the rider has not consumed alcohol beyond the prescribed limit.
• If any of the safety rules is violated, the proposed framework prevents the rider from starting the ride.
• The smart protective headgear also helps in handling the aftermath of accidents by sending an SMS with the biker's location to the police station and guardians.
• It helps ensure that the casualty is attended to in case he/she meets with an accident.
• The Smart Headgear is built with a number of special systems and features, yet it can still be worn like a standard helmet. Thus, this paper presents a smart helmet with useful features that remains practical in every respect.
References
1 Introduction
Energy utilization is an essential concern for all developed countries, and the demand for electrical energy is increasing day by day in many countries. The move towards electric cars and e-vehicles in the automobile field, aimed at reducing global warming, lowering fuel consumption and cutting cost, drives the advancement of renewable energy systems and the development of new techniques for utilizing renewable and free energy sources such as solar, wind and tidal energy. India is a country which receives a large amount of sunlight throughout the year. As the energy produced from non-renewable resources in our country is on the verge of exhaustion, the future world needs alternative energy, which leads the way to harvesting energy from renewable sources. Renewable sources other than the sun are periodic, so we cannot rely on them for a stable power supply throughout the year; solar energy, however, can provide a stable power supply without fuel or maintenance cost.
Though it has a high initial cost, it produces reliable power without polluting the environment. In recent years, research has been widely carried out to minimize its initial cost and improve power production. Sunlight is captured by photovoltaic cells; an array of such cells placed in a panel is called a photovoltaic panel or solar panel. The maximum amount of power can be captured when the maximum amount of sunlight falls on the solar panel, which can be achieved by providing the solar panel with a solar tracker and a mirror [1–3]. In the evening, solar power disappears and cannot be utilized, so an alternative source of light energy must be found; street light energy is used here as one such source. Here, the mechanical design of the street light has been changed for mounting the solar panel,
and the street light is utilized as an energy source for power production; the power produced from both the solar and street light energy has to be conditioned using a DC–DC boost converter to increase the output voltage [4]. The MPPT algorithm is used to increase the efficiency of the solar panel.
The solar panel arrangement is designed to utilize solar power during the day and street light power during the night. The placement of the solar panel is shown in Fig. 1. This arrangement contains various components such as a mirror, a solar power tracker, cooling equipment and GSM-based monitoring to observe the performance and improve the efficiency [4]. The mechanical design and arrangement of the street light have been changed. Four aluminium layers are normally used to focus the light at night, but two aluminium covers are enough to focus the light without affecting the luminance. The mechanical design is therefore modified to replace two of the aluminium covers with the solar panel: the solar panel with its solar power tracker is placed over the street light by removing two layers of the aluminium covering, since the remaining aluminium covering is sufficient to focus the light.
Solar tracker
A light dependent resistor (LDR) is a cost-effective device widely used for light detection and tracking applications. LDRs are placed at the top and bottom of the solar panel and are used as the tracking devices to find the position of maximum sunlight. A relay is used to control two limit switches: one limit switch is used to return the solar panel to its original position in the early morning, and the other is used to stop the solar panel facing east before sunrise. A stepper motor is used for the rotation of the solar panel, and the control of the motor for solar tracking is done using a controller. The MPPT algorithm is used to track the maximum output power, which increases the efficiency; here, the MPPT algorithm is implemented on an FPGA-based controller. The DC–DC boost converter is used to increase the output voltage, which is then given to the inverter circuit to obtain an AC power supply. The basic block diagram of the above method is given in Fig. 2.
3 Proposed System-Overview
The system has various elements such as the solar panel with solar tracker, the MPPT algorithm that generates the high-frequency gating signal for the IGBT in the boost converter, a battery to store energy, and the street light as the load. During the day, the solar tracker turns the solar panel towards the sunlight so as to receive the highest light intensity from the sun and thereby achieve high efficiency [5].
At night, the panel is turned towards the street light to capture the maximum light from it as free energy, so that about 80% of the light energy the street light uses is recovered. The boost converter is used together with the MPPT algorithm [6, 7]. The perturb and observe (P and O) method, which is the easiest and cheapest way of implementing MPPT, is used. It requires only two sensors, for measuring voltage and current, and by perturbing the operating point we find the point at which the output power is maximum. This is used to produce the gating signal for the boost converter to obtain high efficiency. The battery is used to store the energy, and the street light is used for illumination during the night.
Fig. 4 The flow chart for solar panel with solar tracker
MPPT is a simple method implemented to extract the maximum power from the solar panel. Without an MPPT algorithm, much of the energy available from the solar panel would be wasted.
The main objective of the MPPT algorithm is to find the panel operating voltage
that will provide the maximum power output from the solar panel [9–12]. In MPPT
algorithm, the output power of the circuit is maximum only when the Thevenin impedance of the source matches the load impedance. The MPPT algorithm is implemented by means of an FPGA-based controller.
There are different methods of implementing the MPPT algorithm such as:
(i) Perturb and observe method
(ii) Incremental conductance method
(iii) Fractional short circuit current
(iv) Fractional open circuit voltage
(v) Neural networks
(vi) Fuzzy logic.
The choice of algorithm is made by considering the cost and complexity of the MPPT implementation. Here, we use the perturb and observe method (also termed the hill climbing method) as it is cheap and easy to implement.
Perturb and observe method (P and O method)
It is the cheapest method of implementing the MPPT algorithm. A single voltage sensor is used to track the output voltage of the PV panel [11, 13, 14], so the cost of implementing maximum power point tracking is less than that of other methods. The algorithm does not stop on reaching the maximum power point; it keeps perturbing in both directions around it. The panel operating voltage is modified by modifying the converter duty cycle. The flow of the MPPT algorithm using the perturb and observe method is shown in Fig. 5.
At each step, the new power is compared with the previous power. If the power has increased, the perturbation of the operating voltage is continued in the same direction; if the power has decreased, the direction of perturbation is reversed, so that the maximum power from the solar panel can be tracked. From the figure, it can be understood that on the right side of the maximum power point, decreasing the voltage increases the power, while on the left side, increasing the voltage increases the power. This is the main idea of the perturb and observe algorithm.
For implementing perturb and observe algorithm, it is necessary to find the voltage
and the current of the PV panel.
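A minimal sketch of the perturb and observe loop is shown below; read_panel() and the perturbation step size are illustrative stand-ins for the real voltage and current sensors and converter interface, not the authors' implementation.

# Perturb and observe MPPT sketch: keep perturbing the duty cycle in the
# direction that increases measured power; reverse when power falls.
import random

def read_panel(duty):
    """Hypothetical V/I measurement; replace with real voltage/current sensors."""
    v = 18.0 * (1.0 - duty) + random.uniform(-0.05, 0.05)
    i = 5.0 * duty + random.uniform(-0.05, 0.05)
    return v, i

def perturb_and_observe(duty=0.3, step=0.01, iterations=200):
    v, i = read_panel(duty)
    prev_power = v * i
    direction = 1
    for _ in range(iterations):
        duty = min(max(duty + direction * step, 0.05), 0.95)
        v, i = read_panel(duty)
        power = v * i
        if power < prev_power:       # power fell: reverse the perturbation direction
            direction = -direction
        prev_power = power
    return duty

print("duty cycle near the MPP:", round(perturb_and_observe(), 2))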
The DC–DC boost converter is used to boost the low-voltage DC supply obtained from the output of the solar panel to a higher DC voltage. A capacitor is connected in parallel on the load side to increase the output voltage of the solar panel; by increasing the output voltage, the efficiency can be increased. The output voltage of the DC–DC boost converter is V_0 = V_s / (1 − α), where V_0 is the output voltage of the converter, V_s is the source (input) voltage from the solar panel, and α is the duty ratio of the converter switch. By varying the duty ratio, the operating point can easily be moved to the maximum power point.
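As a worked numerical example with assumed values: if the panel supplies V_s = 12 V and the converter operates at a duty ratio of α = 0.6, the output is V_0 = 12/(1 − 0.6) = 30 V; raising α to 0.75 would give V_0 = 12/0.25 = 48 V.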
Cooling
Cooling the solar panel improves its efficiency: when the panel heats up, its ability to produce output power is reduced. Natural cooling is provided by the external wind [15]. If this air flow is not sufficient to cool the solar panel, an external cooling fan controlled by the microcontroller is provided; the microcontroller turns on the cooling fan only when it is necessary.
The solar panel can be monitored continuously to improve the performance and extract the maximum power, by noting the power output at various times either manually or remotely [15–17]. A GSM module is used to check whether the solar panel is working properly or whether some external factor such as dust, dirt or a fallen leaf is reducing its performance, so that it can be cleaned manually once this is known, thereby improving the performance. The GSM module can also be used to check the status of the battery and switch the load on or off accordingly. When the battery is full during the day, the solar panel can directly supply the load online with battery backup. When the battery is half full or empty, the user is instructed to reduce the connected load, and the panel then simultaneously supplies the load and charges the battery. The user is also instructed to clean the solar panel when dust or dirt reduces its performance. The scheme for monitoring the performance and improving it is given in Table 3.
When the sun is in the east, the solar tracker placed above the solar panel tracks the position of the sun and moves the panel towards it, so the maximum amount of light is focused on the panel, which increases the power output. During the night, the panel is turned towards the street light so that about 80% of the power that the street light uses is recovered. The solar panel efficiency reduces when it gets heated, so forced cooling is applied when natural cooling is insufficient. The performance of the solar panel is monitored remotely using the GSM module. This has been done here for a single street light; if it is implemented across all areas of a city, a surplus amount of power is obtained which can be connected to the grid to help meet the current power demand (Fig. 6).
Fig. 6 Hardware implementation of our project
8 Conclusion
The main objective of this paper is to increase the efficiency of the solar panel in a way that can be implemented in practice, using the simplest possible methods. The solar tracking system uses LDRs to detect the position of the sun and turns the solar panel to increase the concentration of sunlight on it; during the night, the panel turns towards the street light to further increase the overall efficiency. The MPPT algorithm is used to track the maximum power from the solar panel, and sufficient cooling is provided in order to obtain the optimum efficiency.
References
1. Piegari L, Rizzo R (2010) Adaptive perturb and observe algorithm for photovoltaic maximum
power point tracking. Renew Power Gener, IET 4(4):317–328
2. Femia N, Petrone G, Spagnuolo G, Vitelli M (2004) Optimizing sampling rate of P and O
MPPT technique. In: Proceedings IEEE PESC, pp 1945–1949
3. Esram T, Chapman PL (2007) Comparison of photovoltaic array maximum power point tracking
techniques. IEEE Trans Energy Convers 22(2):439–449
4. Mokhtar A et al. (2011) Design and development of energy-free solar street LED light system.
In: 2011 IEEE PES conference on innovative smart grid technologies-middle east, IEEE
5. Hossain E, Muhida R, Ali A (2008) Efficiency improvement of solar cell using compound
parabolic concentrator and sun tracking system. In: 2008 IEEE Canada electric power
conference, IEEE
6. Mamarelis E, Petrone G, Spagnuolo G (2014) A two-steps algorithm improving the P and O
steady state MPPT efficiency. Appl Energy 113:414–421
7. Arshad R et al. (2014) Improvement in solar panel efficiency using solar concentration by
simple mirrors and by cooling. In: 2014 International conference on robotics and emerging
allied technologies in engineering (iCREATE), IEEE
8. Winston DP et al. (2020) Performance improvement of solar PV array topologies during various
partial shading conditions. Sol Energy 196:28–242
9. Pakkiraiah B, Sukumar GD (2016) Research survey on various MPPT performance issues to
improve the solar PV system efficiency
10. Hua C-C, Fang Y-H, Wong C-J (2018) Improved solar system with maximum power point
tracking. IET Renew Power Gener 12(7):806–814
11. Zakzouk NE et al. (2016) Improved performance low-cost incremental conductance PV MPPT
technique. IET Renew Power Gener 10(4):561–574
12. Yilmaz U, Turksoy O, Teke A (2019) Improved MPPT method to increase accuracy and speed
in photovoltaic systems under variable atmospheric conditions. Int J Electr Power Energy Syst
113:634–651
13. Abdel-Salam M, El-Mohandes M-T, Goda M (2018) An improved perturb-and-observe based
MPPT method for PV systems under varying irradiation levels. Sol Energy 171:547–561
14. Kazmi SMR et al. (2009) An improved and very efficient MPPT controller for PV systems
subjected to rapidly varying atmospheric conditions and partial shading. In: 2009 Australasian
universities power engineering conference, IEEE
15. Prudhvi P, Sai PC (2012) Efficiency improvement of solar PV panels using active cooling. In:
2012 11th International conference on environment and electrical engineering, IEEE
16. Alhammad YA, Al-Azzawi WF (2015) Exploitation the waste energy in hybrid cars to improve
the efficiency of solar cell panel as an auxiliary power supply. In: 2015 10th International
symposium on mechatronics and its applications (ISMA), IEEE
17. Nazar R (2015) Paper title: improvement of efficiency of Solar panel using different methods.
Int J Electr Electron Eng (IJEEE) 7
18. Overall efficiency of the grid connected photovoltaic inverters, European Standard EN 50530,
2010
19. Sera D et al. (2006) Improved MPPT algorithms for rapidly changing environmental conditions.
In: 2006 12th International power electronics and motion control conference, IEEE
20. Gomathy SSTS, Saravanan S, Thangavel S (2012) Design and implementation of maximum
power point tracking (MPPT) algorithm for a standalone PV system. Int J Sci Eng Res 3(3):1–7
Hardware Implementation of Machine
Vision System for Component Detection
1 Introduction
P. Smruthi
VLSI and Embedded Systems, Nitte Meenakshi Institute of Technology, Bengaluru, India
K. B. Prajna (B)
Department of Electronics and Communication, Nitte Meenakshi Institute of Technology,
Bengaluru, India
e-mail: [email protected]
J. G. John · A. T. Pasha
Research and Technology Development Group, ACE Designers Ltd, Bengaluru, India
e-mail: [email protected]
A. T. Pasha
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 519
N. R. Shetty et al. (eds.), Emerging Research in Computing, Information, Communication
and Applications, Lecture Notes in Electrical Engineering 928,
https://doi.org/10.1007/978-981-19-5482-5_45
2 Literature Survey
A machine vision system is made up of several components, from the camera that
captures an image for examination to the processing engine that generates and
conveys the results. Incoming image data is processed by software, which then
provides pass/fail outcomes. The collected images are stored in a database which is
used in the testing process. As the testing process starts, the system accesses a stored
image and compares it with the newly obtained image. A parallel communication protocol
is used to communicate between the industrial controller and the peripheral interface.
Machine vision software can take numerous forms and is used for application-oriented
purposes.
Golnabi and Asadpour [1] describe the relevance and importance of machine vision systems
in machine tool industrial applications. A basic machine vision model and the system
design are detailed. The system design includes various sub-systems which are chosen
based on the application. The vision system is described as a four-step process: acquiring
the image, converting the pixels to an array of images, reviewing and analyzing the image
and, finally, sending the output to external devices. The sequence and the operation of
each element of the system are explained. Finally, applications of machine vision systems
are described.
Al-Kindi and Zughaer [2] propose using a vision-based monitoring and control system on
the manufacturing line to increase CNC machining performance. A solution is presented
and developed to enable the integration of vision processing with CNC machines by
addressing a number of pinpointed concerns that prevent such integration. These
processes are refined into a practical methodology that is used on laboratory-scale
CNC milling machines. To generalize the findings, two distinct kinds of bench-type
CNC machines are used. Each of the two CNC machines has two cameras positioned
on the machine spindle to provide accurate picture data in the cutting direction. An
indicative parameter is proposed and used to evaluate the resulting tool imprints.
The overall results demonstrate the validity of the approach and motivate additional
detection. Only pixel points with a different intensity are considered in edge detection.
By combining the gPb-owt-ucm method with a Gaussian smoothing filter, the
performance of the image segmentation has been improved. The change of colour
space is conducted and explored on the same dataset for improved performance. As
a result, improved accuracy on several datasets is obtained.
Seo and Kim [9] describe a novel Hough transform-based technique for circle recognition.
For edge identification in the voting process, the design uses a scan line-based
ball detection algorithm. The simulation results reveal that an image with a VGA
resolution of 640 × 480 is processed in 10 ms, i.e., the proposed architecture can
meet the required speed.
3 Methodology
4 Design
A machine vision system refers to the processes and methods used to extract data from
an image on an automatic basis, with the end result being another image format. The
extracted information consists of a set of data such as the position, orientation and
identification of each object, which can be a good-part/bad-part signal. This signal is
used for the further manufacturing process. It can also be used for applications such as
industrial robot and process guidance, automatic inspection, security monitoring and
vehicle guidance. The image depicts the design of a machine vision system. It is built
around a conveyor that moves the component towards the proximity sensor. A signal is
sent to the controller to capture the image, which is processed in the controller after
it is captured. Rule-based training imposes higher processing performance requirements.
Multiple stages of processing are typically performed in a sequence that results in the
desired outcome (Fig. 1).
A typical sequence might begin with tools that modify the image, followed by data
extraction from those objects, followed by communication of that data or comparison
with target values to produce and convey a "pass/fail" result.
The figure depicts the key mechanisms of a vision system. Numerous operations, such as
image acquisition, pre-processing, segmentation and feature extraction, are carried
out in a continuous and real-time way. When the proximity sensor detects an object,
it sends a signal to the controller, which causes the camera to record an image. In a
vision system, the optical-acquisition sub-system converts optical image information
into an array of numerical data that can be processed by a processor. The light from a
source illuminates a segment of the scene onto the image sensor, resulting in an optical
picture. Image arrays are used to convert an optical image to an electrical signal that
can then be transformed into a digital image. In general, cameras with either line-scan
or area-scan elements offer significant advantages. For light detection, the camera
system may use a charge-coupled device (CCD) sensor or a CMOS sensor.
All pre-processing, segmentation, feature extraction and other operations are
accomplished using digital images. At this point, the image classification and
interpretation are complete, and the actuation operation can be performed in order to
interact with the component. Thus, the actuation sub-system communicates with the
controller in order to regulate or change any given prerequisite in order to achieve
better image acquisition. A vision system performs the following functions: picture
capture, image processing and item recognition within an object collection. The
scene is illuminated by light from a source, and the image sensors generate a
photosensitive image. Image processing methods include image sensing, displaying image
data and digitization. Image processing is the process of changing and creating the
pixel values of a digital image in order to provide a more suitable form for subsequent
processes. Segmentation is the process of dividing an image into expressive sections
that correspond to different parts of the image (Fig. 2).
The machine vision system consists of several elements such as the camera, lens,
component presence sensor and solenoid cylinder. The basic elements of the system are
shown in Fig. 3.
The banded piston component is used for feature identification. As the component is
placed on the conveyor belt, the belt is switched on and the component moves along it.
The proximity sensor is used to identify the component; once the component is detected,
the conveyor motor is stopped. Image processing is then performed, and a decision is
taken as to whether it is a good or bad component (Fig. 4).
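The overall detect-stop-capture-decide cycle can be summarized by the sketch below. It is a simplified illustration under stated assumptions: component_present(), stop_conveyor()/start_conveyor() and is_good_component() are placeholders standing in for the proximity-sensor input, the motor control through the contactor and the image-processing decision described later.

import cv2

def component_present():
    # Stand-in for reading the proximity-sensor digital input.
    return False

def stop_conveyor():
    # Stand-in for the conveyor-motor control.
    pass

def start_conveyor():
    pass

def is_good_component(frame):
    # Stand-in for the rule-based image-processing decision described later.
    return True

camera = cv2.VideoCapture(0)      # assumed camera index
while True:
    if component_present():
        stop_conveyor()
        ok, frame = camera.read()
        if ok:
            print("GOOD" if is_good_component(frame) else "BAD")
        start_conveyor()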
Figure 5 shows the flowchart of hardware implementation of MVS.
The placement of each component in the fetcher (Fig. 6) is according to the process.
This fetcher is a prototype which consists of a workpiece conveyor where the components
are loaded for the process [10]. The sensor is placed based on the sensing distance
and the sensing range. As the component passes the sensor, the light ray which the
sensor continuously transmits is broken and a reflection signal is sent back. The
mechanical fetcher is designed using aluminium sheet metal. The size of the sheet is
calibrated using software, and the fetcher is designed accordingly. The entire
architecture is designed to inspect, analyze and identify the moving component. The
process is carried out in a continuous manner.
The electrical cabinet box for the machine vision system is as shown in the figure.
The electrical cabinet (Fig. 7) consists of elements such as the controller, SMPS, SSR
relays and MCB. Supply to the cabinet is taken from a 3-phase supply, from which a
single pole feeds the entire cabinet. The single-pole supply is given directly to the
SMPS; all other elements are fed through the SMPS terminal block.
The controller power is given through a 10 A MCB to avoid shorts. This MCB is
connected to the SMPS terminal block. The exhaust fan and the lighting are also
connected to the SMPS terminal block through MCBs. The terminal block is used as the
connection point for all the elements in the cabinet. A contactor is used to control
the speed of the motor using digital input–output signals. If there is a short at the
contactor, a motor-trip signal is issued to stop the conveyor motor. Two solid-state
relays are used to control the forward and home positions of the pneumatic cylinder.
The supply to the SSR relays is 24 V.
Fig. 9 Conveyor motor and sensor connection in the MVS electrical diagram
The software for the machine vision system deals with everything from image capture
to decision-making and is developed using the PyCharm IDE. PyInstaller is used, as it
is compatible with third-party packages, to create an executable file. The source
code is converted to an executable format which is understood by the controller. The
objective is feature detection for a dot of 0.5 mm, which is done using a rule-based
technique. In this method, the real-time image is captured and processed. The
processing methods are blurring, thresholding and Hough circle detection. Once the
processing is completed, the decision is taken by the software as to whether the
component is accepted or rejected.
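A possible realization of this blur-threshold-Hough pipeline in OpenCV is sketched below. The kernel size, Hough parameters, radius range and the expected number of dots are illustrative assumptions, not the tuned values of the implemented system.

import cv2

def inspect_component(frame, expected_dots=6, min_r=3, max_r=12):
    # Blurring, thresholding and Hough circle detection, then an accept/reject decision.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    _, binary = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    circles = cv2.HoughCircles(binary, cv2.HOUGH_GRADIENT, dp=1.2, minDist=10,
                               param1=100, param2=15, minRadius=min_r, maxRadius=max_r)
    found = 0 if circles is None else circles.shape[1]
    return found == expected_dots    # accept only when every expected dot is detected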
A proximity sensor is a non-contact sensor. Proximity sensors detect objects when the
light emitted from the sensor is reflected back to the photoelectric receiver. Through-
beam sensors detect targets when the target breaks the beam of light between the
sensor's emitter and receiver (Fig. 10).
Pins 17 and 18 of the controller DIO are 24 V and ground. Pin 3 is a digital input
which is connected to the signal of the proximity sensor. The sensor circuit used here is
an LC oscillator, which is used to generate a designed resonant oscillation continuously.
A pull-down resistor of 2.2 kΩ is connected to the signal line to hold it at 0 V by
default. Since a PNP-type sensor is connected, a resistor is used so that it behaves
like an NPN-type output. The code below is used for accessing the input pins:

from ctypes import cdll

mydll = cdll.LoadLibrary("../inpoutx64.dll")   # load the InpOut driver DLL
driver = mydll.IsInpOutDriverOpen()            # check that the driver is loaded
output = mydll.Inp32(0xA03)                    # read the digital input port
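Building on the calls above, a simple polling loop might look as follows. The bit mask used to isolate the pin-3 input within the value returned by Inp32 is an assumption; it depends on the wiring of the DIO port.

import time
from ctypes import cdll

mydll = cdll.LoadLibrary("../inpoutx64.dll")
SENSOR_PORT = 0xA03      # digital input port address used above
SENSOR_MASK = 0x01       # assumed bit position corresponding to pin 3

if mydll.IsInpOutDriverOpen():
    while True:
        value = mydll.Inp32(SENSOR_PORT)
        if value & SENSOR_MASK:      # signal pulled high: component detected
            print("component detected")
        time.sleep(0.01)             # 10 ms polling interval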
A solenoid plunger is the moving part of a solenoid that transfers linear motion
from the solenoid to the component it is designed to operate. It controls the flow
direction of compressed air: a moving part inside the valve blocks or opens the ports
of the valve. The SSR relay has a Zener diode and an optocoupler; this is used to
switch the position of the cylinder when the voltage is applied (Fig. 11).
A conveyor belt works by using two motorized pulleys that loop over a long stretch
of thick, durable material. When the motors in the pulleys operate at the same speed
and spin in the same direction, the belt moves between the two. The width of the belt
is 30 mm, the length of the belt is 100 mm and the speed of the belt is 5 mm/s
(Fig. 12) [11–13].
5 Results
6 Conclusion
[Results table fragment: only the remarks column survived extraction. The noted failure cases are: a dot of the same radius as the other six dots detected in addition to the 6 dots; a spacing between two circles that was not 2 mm; a dot that is not available, so the component feature is not visible; and dots of a different diameter, where diameters greater than 1 mm are not detected.]
7 Future Scope
The developed system is to be made flexible for a variety of components, and the
graphical user interface is to be upgraded. The system needs to be integrated with a
CNC machine for the complete process.
Fig. 15 Difference in probability of detection chart (probability of detection versus defect size in mm, for manual testing and the automated system)
Table 2 Component detection for different belt speeds
Conveyor belt speed    No. of components identified in one cycle
30 rpm                 4
80 rpm                 7
90 rpm                 10
References
1. Golnabi H, Asadpour A (2007) Design and application of industrial machine vision systems.
Robot Comput Integr Manuf 23(6):630–637
2. Al-Kindi G, Zughaer H (2012) An approach to improved CNC machining using vision-based
system. Mater Manuf Process 27(7):765–774
3. He X et al. (2020) Design of high-power LED automatic dimming system for light source of on-
line detection system. In: 2020 IEEE 5th Information technology and mechatronics engineering
conference (ITOEC), IEEE
4. Baginski A, Covarrubias G (1997) Open control-the standard for PC-based automation tech-
nology. In: Proceedings 1997 IEEE international workshop on factory communication systems.
WFCS’97, IEEE
5. Liao W et al. (2010) An industrial camera for color image processing with mixed lighting
conditions. In: 2010 The 2nd international conference on computer and automation engineering
(ICCAE), vol 5, IEEE
6. He X et al (2020) An adaptive dimming system of high-power LED based on fuzzy PID control
algorithm for machine vision lighting. In: 2020 IEEE 4th Information technology, networking,
electronic and automation control conference (ITNEC), vol 1, IEEE
7. He X et al (2020) Design of high-power LED automatic dimming system for light source of on-
line detection system. In: 2020 IEEE 5th Information technology and mechatronics engineering
conference (ITOEC). IEEE
8. Deng G, Cahill LW (1993) An adaptive Gaussian filter for noise reduction and edge detection.
In: 1993 IEEE conference record nuclear science symposium and medical imaging conference,
IEEE
9. Seo SW, Kim M (2015) Efficient architecture for circle detection using Hough transform. In:
2015 International conference on information and communication technology convergence
(ICTC), IEEE
10. Noble FK (2016) Comparison of openCV’s feature detectors and feature matchers. In: 2016
23rd International conference on mechatronics and machine vision in practice (M2VIP), IEEE
11. Lü C, Wang X, Shen Y (2013) A stereo vision measurement system based on openCV. In: 2013
6th International congress on image and signal processing (CISP), vol 2, IEEE
12. Higuchi T et al (2019) ClPy: a NumPy-compatible library accelerated with OpenCL. In: 2019
IEEE International parallel and distributed processing symposium workshops (IPDPSW), IEEE
13. Seman P et al. (2013) New possibilities of industrial programming software. In: 2013
International conference on process control (PC), IEEE
14. Martin S (1990) PC-based data acquisition in an industrial environment. In: IEE Colloquium
on PC-based instrumentation, IET
15. Wuth SN, Coetzee R, Levitt SP (2004) Creating a python GUI for a C++ image processing
library. In: 2004 IEEE Africon. 7th Africon conference in Africa (IEEE Cat. No. 04CH37590),
vol 2, IEEE
16. Hernandez-Ordonez M et al. (2007) Development of an educational simulator and graphical
user interface for diabetic patients. In: 2007 4th International conference on electrical and
electronics engineering, IEEE
17. Bright G, Potgieter J (1998) PC-based mechatronic robotic plug and play system for part
assembly operations. In: IEEE International symposium on industrial electronics. Proceedings.
ISIE’98 (Cat. No. 98TH8357), vol 2, IEEE
18. Canny J (1986) A computational approach to edge detection. IEEE Trans Pattern Anal Mach
Intell 6:679–698
19. Yan S, Yao L, Zhang Y (2019) Design of industrial robot sorting system based on smart
camera. In: 2019 International conference on artificial intelligence and advanced manufacturing
(AIAM), IEEE
20. Chakole S, Ukani N (2020) Low-cost vision system for pick and place application using camera
and ABB industrial robot. In: 2020 11th International conference on computing, communication
and networking technologies (ICCCNT), IEEE
DEOMAC—Decentralized
and Energy-Efficient Framework
for Offloading Mobile Applications
to Clouds
1 Introduction
The deployment and usage of smartphone applications and platforms have increased
dramatically around the world for various tasks such as sending emails, watching
videos, online banking, browsing the Internet, navigating using online maps and
using social media [1]. To perform these tasks, different applications are used.
Due to the rapid growth of user demands and mobile applications, the Quality of
Service (QoS) is constrained by limitations on the mobile side such as limited
available connectivity, finite energy, resource limitations and the shared wireless
medium. Running complex applications on resource-limited mobile devices, which have
slow processors, limited storage and low battery power, widens the gap between the
available limited resources and the demands of these complex programs, resulting in
lower performance and slow functioning of mobile devices [2].
However, offloading a computational task could consume a huge amount of energy and
incur delay between cloud clones and mobile devices, which could involve considerable
communication. Hence, the offloading decision should be made carefully at each mobile
device for the execution of a computation task either in the cloud or on the mobile
device, by considering the status of the wireless network as well as the delay and
energy consumption of the various operations. Clearly, computation offloading is
recommended only when the mobile device execution time is higher than the cloud
execution time. Many factors can impact the offloading process and could influence the
offloading decision [3–7].
Earlier research suggested a threshold-based policy for making the offloading
decision. Apparently, it is very tedious to set a common threshold as it is often
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 537
N. R. Shetty et al. (eds.), Emerging Research in Computing, Information, Communication
and Applications, Lecture Notes in Electrical Engineering 928,
https://doi.org/10.1007/978-981-19-5482-5_46
2 Literature Review
Several offloading frameworks and models exist to enhance the execution capability
and energy optimization in the mobile cloud environment, but each of these has its
own advantages and limitations. Conventional offloading frameworks use adaptive
algorithms that migrate heavy computations to remote servers. These frameworks employ
different levels for offloading applications at runtime, but this includes the
migration cost of the computational components of the mobile application.
The main goal of these frameworks is to find the optimum solution to offload the
intensive parts of the applications with proper use of assets. However, every
framework has its own benefits and limitations. Most frameworks have not considered
dynamic execution time, which is a major aspect for real-time applications. Some of
the frameworks have not focused on resource allocation in the cloud environment during
runtime.
Hassan et al. [14] introduced the POMAC framework for transparent and dynamic
offloading in mobile devices. Mobile computation offloading operations are based on
two issues in offloading executions to the cloud. Their scheme first decided whether
code should be offloaded and, on a decision to offload, did so transparently. They
implemented their prototype on the Dalvik VM. Their initial evaluations showed that
the proposed schemes worked well on real-time applications and outperformed many
existing schemes.
The proposed POMAC implemented an MLP-based decision engine that captured dynamic
elements and the relationships between the captured elements. POMAC also implemented
a transparent method-level offloading system. This framework mainly considered network
profiling parameters rather than device and application profiling, which is not
sufficient for making a dynamic offloading decision and gives scope for further
research.
Energy savings in smart mobile devices using the MAUI framework were proposed by
Cuervo et al. [15]. The proposed framework targets offloading processes in a highly
dynamic manner as it continuously profiles the processes. The scheme masks the
complexity of remote execution from smart mobile devices, giving the impression that
the application is executed locally on the smart mobile device. MAUI partitions are
based on code annotations and specify components that can be executed remotely on a
cloud server. The MAUI profiler assesses device characteristics, after which it
monitors them and the characteristics of the network during the whole execution time,
as these parameters can change and cause inaccurate measurements and thus erroneous
decisions. Offloading decisions happen at runtime, where the framework selects the
components that are executed remotely based on the decision from the MAUI solver,
which takes the MAUI profiler inputs.
Existing frameworks require special compilations or modifications of source code or
binaries, which makes them complex to implement or adapt. Effective decisions need to
establish whether offloading is beneficial or unprofitable in terms of energy
conservation or execution time. Predicting offloading decisions for smart mobile
devices based on an analysis of static thresholds is a laborious process, as the
parameters may change dynamically and frequently. Static profiling leads to a constant
offloading decision. Frameworks that consider only latency delays, bandwidths,
available memory space and CPU load as QoS parameters do not take dynamic offloading
decisions, creating a need to improve on QoS factors in decision-making.
Previous frameworks did not focus on the energy efficiency of smart mobile devices in
cloud resource allocation, which makes it necessary to consider the creation of
optimized decision-making in smart mobile devices. However, the decisions should
satisfy the reduction of overheads in offloading. One major issue in offloading has
been the accurate allocation of cloud resources for obtaining the desired results at
minimal execution cost. Most studies have catered to mobile computation offloading
operations in the cloud without specifying virtual machine allocations for reduced
execution times and improved performance (Table 1).
Contribution of the paper
• To propose and implement a novel mobile application offloading framework that
expands the capabilities of mobile devices, with minimized energy consumption and
delay for executing computation-intensive tasks.
• To find a felicitous classifier that can accurately make the offloading decision
and is precise yet lightweight enough to run on mobile devices.
• To apply decentralized resource allocation in the cloud for improving energy
efficiency and reducing the computational complexity.
Profiler
The profiler is the fundamental part of the framework, used to gather QoS parameters
such as transfer speed, available RAM on the mobile device and in the cloud, input
data size, battery level, and average execution time on the mobile device and in the
cloud, respectively. Typically, the profiler gathers the data and stores it in the
database. The profiler gathers the data that allows the framework to make a precise
offloading choice by collecting context information about the device, the application
and the network. The mobile device profiler reports the status of the mobile device,
such as its RAM and battery status. The network profiler monitors information about
the network condition such as its type, signal quality and transfer speed. The
application profiler gathers features such as the input size and average execution
time. The average execution time is determined by averaging how much time is required
to execute the compute-intensive task in the cloud and on the mobile device. RAM
availability on the mobile device and in the cloud is determined by reading the
/proc/meminfo records of Android.
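A minimal sketch of the device-side profiling step is given below. Reading /proc/meminfo follows the description above; the sysfs battery path is an assumption that varies between devices, and the network and application profilers are only indicated.

def available_ram_mb(meminfo_path="/proc/meminfo"):
    # Read MemAvailable from /proc/meminfo, as described for the mobile-device profiler.
    with open(meminfo_path) as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) // 1024   # kB -> MB
    return None

def battery_level(capacity_path="/sys/class/power_supply/battery/capacity"):
    # Assumed sysfs location of the battery percentage.
    try:
        with open(capacity_path) as f:
            return int(f.read().strip())
    except OSError:
        return None

profile = {
    "ram_mb": available_ram_mb(),
    "battery_pct": battery_level(),
    # bandwidth, input size and average execution times would be filled in
    # by the network and application profilers described above
}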
Decision-Making
To make an ideal offloading choice, an offloading decision model is developed to
anticipate whether to offload or not, which is realized by a neuro-fuzzy controller
(NFC). The fundamental goal of the neuro-fuzzy controller is to choose the execution
environment for the compute-intensive task, either in the cloud or on the local mobile
device. QoS parameters such as transfer speed, available RAM on the mobile device and
in the cloud, input data size, battery level, and average execution time on the mobile
device and in the cloud are used as inputs for the decision-making process. The
detailed explanation of the proposed NFC is described in the following sections.
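The actual decision is taken by the trained neuro-fuzzy controller; the function below is only a simplified stand-in that shows which profiled QoS parameters feed the decision. The cost model, the weights and the battery threshold are assumptions.

def offload_decision(profile):
    # Estimate local and remote costs from the profiled QoS parameters and offload
    # only when the remote estimate is lower (simplified stand-in for the NFC).
    local_cost = profile["avg_exec_time_mobile"]
    transfer_time = profile["input_size_kb"] / max(profile["bandwidth_kbps"], 1)
    remote_cost = profile["avg_exec_time_cloud"] + transfer_time
    if profile["battery_pct"] < 15:
        remote_cost *= 1.5    # assumed penalty: radio use is costly on a low battery
    return "cloud" if remote_cost < local_cost else "mobile"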
Resource Allocation in Cloud server
In the cloud, decentralized resource distribution is performed so that a
compute-intensive task is executed with minimum time consumption, with the assistance
of the adaptive cuckoo search optimization algorithm. The request handler in the cloud
module deals with all the requests from the mobile devices and forwards them to the
resource manager (RM) for allocating virtual machines to execute the offloading
requests. The RM performs decentralized resource allocation of VMs using ACSOA.
Finally, the classification process is also carried out under the NFC method; here,
classification denotes the image-matching process. In the DEOMAC framework, an API is
designed for each phase of the process.
Application Programming Interface
The developed Android application has three application programming interfaces to
execute the offloading demand in the cloud environment. The primary API is created
for sending and receiving input and output from the application. The second interface
is used for sending and receiving information and output from the cloud. The purpose
of the third API is to schedule the tasks on virtual machines in the cloud
environment.
In this research, the decentralized resource allocation process is carried out by the
adaptive cuckoo search optimization algorithm (ACSOA) to allocate virtual machines for
executing offloading demands efficiently in the cloud environment. The fundamental aim
of the proposed algorithm is to allot each task to a virtual machine and to reassign
the allocated machines to other tasks so as to execute a greater number of requests
and finish all requests with the least time consumption. An offloading request task
comprises parameters such as the RAM size required to finish the task in MB (t) and
the size of the task in million instructions (r). The request handler in the cloud
handles the request from the mobile device with the assistance of the interface and
forwards these parameters to the resource manager, which uses ACSOA to map it onto
optimal virtual machines to reach an optimal solution. It initializes the number of
offloading requests, the virtual machines and parameters such as image size, average
execution time and available memory size as X_i^tr and generates the initial solution
to execute the task.
Task completion time and available memory are computed for the virtual machines in
the cloud environment. Based on these, the fitness function is determined by
minimizing the execution time and maximizing the available memory for task execution.
The update function in ACSOA is an improvement over cuckoo search: it finds and ranks
the best arrangement and builds a new one for the worst nests using gradient descent
in order to seek the optimal virtual machine for task execution. The above process
continues until all the requests finish their execution. This algorithm builds on the
basic cuckoo search without losing the high-efficiency search quality of Lévy flights;
it incorporates a fast search strategy with the gradient descent (GD) algorithm to
improve the convergence rate while sustaining the attractive attributes of the cuckoo
search algorithm. Nonetheless, the search cycle may be time consuming because of its
randomized behaviour. With the help of Eqs. (1) and (2), the local search is
calculated using the GD approach, and with the help of Eq. (3) the global search is
calculated using the Lévy flight.
A Lévy flight is a random walk in which the steps are expressed in terms of step
lengths that are distributed according to a heavy-tailed probability distribution,
with the direction of the steps being isotropic and random.
For the quality of the solution, the update function (9) is reinforced with the GD
approach. The DEOMAC framework has a positive effect on minimizing delay and saving
energy on the server side as well as on the mobile devices during the decision-making
for offloading applications. The algorithm symbol notations are mentioned in Table 2.
The algorithm pseudocode is mentioned below.
In this part, we report the implementation and test results validating the performance
of the DEOMAC framework. An image comparison Android application was developed and
installed on a mobile device. The image comparison method takes the majority of the
computation, and if it is done on the local device, the battery will be depleted
rapidly and the response time will be longer. Using the DEOMAC framework, the battery
performance of mobile devices is improved, as the comparison method can be offloaded
to the cloud depending upon the QoS parameters. Four hundred images are stored in the
database for 80 objects, with five images of each object. Three-quarters of the images
are used for training and the remaining images for testing. Image
Table 2 Notations
Adaptive cuckoo search optimization algorithm
1  Input: number of VMs and tasks
2  Output: optimal VM
3  Objective function: F(Xi) = f(x1, x2, ..., xd)
4  Initialization: initialize or generate the tasks (Ti) and resources (Ri)
5  Xi^tr = (x1^tr, x2^tr, ..., xn^tr)
6  Initial solution generation: Yi^tr = (y1^tr, y2^tr, ..., yn^tr)
7  While t < maximum iteration do
8    For i = 1 to Yj
9      For j = 1 to N            // N → number of tasks
10       Fitness function Fi = max(γi + 1 − δi)
11       δi = Eij + Ri           // task completion time
12       γi = Mi / Si            // available memory evaluation
13     End for
14   End for
15   Select the best solution
16   Update:
17   Worst nests are abandoned and a new one is built using the gradient descent algorithm
18   Keep the best solution
19   Rank the best solution
20 End while
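The fitness-evaluation and ranking steps of this pseudocode can be transcribed roughly as follows. The execution-time model for Eij and the replacement of the Lévy-flight/gradient-descent update by a simple random re-draw are simplifying assumptions, so this is only an illustrative skeleton of ACSOA rather than the full algorithm.

import random

def fitness(vm, task):
    # delta: task completion time (execution time on the VM plus its ready time, an assumed Eij model)
    # gamma: available memory on the VM relative to the memory the task needs
    delta = task["mi"] / vm["mips"] + vm["ready_time"]
    gamma = vm["free_mem_mb"] / task["mem_mb"]
    return gamma + 1 - delta

def acsoa_assign(vms, task, iterations=50, p_abandon=0.25):
    # Rank candidate VMs (nests) by fitness and rebuild the worst fraction each
    # iteration; the paper uses Lévy flights plus gradient descent for this step.
    nests = [random.choice(vms) for _ in range(len(vms))]
    for _ in range(iterations):
        nests.sort(key=lambda vm: fitness(vm, task), reverse=True)
        n_worst = max(1, int(p_abandon * len(nests)))
        nests[-n_worst:] = [random.choice(vms) for _ in range(n_worst)]
    return max(nests, key=lambda vm: fitness(vm, task))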
comparison is performed on the input image, and features of the input image are
extracted for the image comparison process, which is utilized by the neuro-fuzzy
controller. If the offloading decision is the cloud, then the server compares the input
image with the images stored in its database and sends the matched result to the
mobile device. Otherwise, the execution is done on the mobile device.
Our implementation and experiments were designed to evaluate whether offloading
achieves better execution time and reduced energy consumption for mobile devices. The
smart mobile device used for the research is a Samsung Galaxy A7 with Android 8.0, and
the AccuBattery application was used for evaluating battery consumption. In this
research, the private cloud was purchased from the website https://veloxitec.com, as
it provides secure and resizable compute capacity in the cloud. About 50 VMs of
various sizes were created as instances and run on the cloud through the API. Images
with different sizes such as 100 * 100, 200 * 200, 300 * 300, 400 * 400, 500 * 500 and
600 * 600 were used for evaluation. The proposed framework was evaluated with respect
to the number of images in the database by adding 400 images to the database and
analyzing the results for response time and energy consumption for cloud and mobile
execution for various methods. Figures 2, 3, 4 and 5 represent the response time for
cloud and mobile, the energy consumption for mobile and cloud, the classification
accuracy and the offloading prediction for 400 images. NFC achieves minimum response
time and battery consumption in the cloud and on the mobile device compared with
traditional methods. It also achieves maximum prediction and classification accuracy
when compared to other methods.
Fig. 4 Classification accuracy
Fig. 6 Analysis of various frameworks a average response time b average energy consumption
4 Conclusion
environment to meet the exact requirements of the offloading process. The proposed
framework is implemented in a real-time environment using the Android platform, and
the battery performance of mobile devices is evaluated using the AccuBattery
application. The proposed framework was evaluated by varying the number of images in
the database and compared with the existing method, showing better energy capability
and response time than existing methods. The proposed framework was then compared with
existing frameworks, which showed that its execution time and energy consumption are
better than those of the traditional frameworks. The main outcomes of this research
are an energy-efficient offloading framework with improved QoS factors, which includes
decentralized resource allocation in the cloud. In the future, we would like to extend
this research to situations in which numerous cloud servers are reachable from the
mobile device.
References
1. https://economictimes.indiatimes.com/tech/internet/internet-users-in-india-to-rise-by-40-sma
rtphones-to-double-by-2023-mckinsey
2. https://www.ericsson.com/en/mobility-report/reports/june-2020/mobile-subscriptions-out
look
3. Wang Y, Chen IR, Wang DC (2015) A survey of mobile cloud computing applications:
perspectives and challenges. J Wirel Pers Commun 1607–1623
4. https://www.bankmycell.com/blog/how-many-phones-are-in-the-world
5. https://www.statista.com/statistics/330695/number-of-smartphone-users-worldwide
6. Liu L, Du Y, Fan Q, Zhang W (2019) A survey on computation offloading in the mobile cloud
computing environment. Int J Comput Appl Technol 59(2)
7. Shruthi BM, Pruthvi PR, Kavana MD (2017) Mobile cloud computation: issues and challenges.
Int J Recent Trends Eng Technol 3(4) ISSN: 2455-1457
8. Sareen B, Sharma S, Arora M (2014) Mobile cloud computing security as a service using
android. Int J Comput Appl 99(17):0975–8887
9. Akherfi K, Gerndt M, Harroud H (2018) Mobile cloud computing for computation offloading:
issues and challenges. Appl Comput Inform 14(1):1–16
10. Kim Y, Lee H-W, Chong S (2019) Mobile computation offloading for application throughput
fairness and energy efficiency. IEEE Trans Wirel Commun 18(1):3–19
11. Wang Y, Sheng M, Wang X, Wang L, Li J (2015) Mobile-edge computing: partial computation
offloading using dynamic voltage scaling. J Latex Class Files 14(8)
12. Cardellini V, De Nitto Persone V, Di Valerio V, Facchinei F, Grassi V, Lo Presti F, Piccialli
V (2015) A game-theoretic approach to computation offloading in mobile cloud computing.
Math Program ISSN 0025-5610
13. Wu H (2018) Multi-objective decision-making for mobile cloud offloading: a survey. IEEE
Access 6:3962–3976
14. Hassan MA, Bhattarai K, Wei Q, Chen S (2014) POMAC: properly offloading mobile
applications to clouds
15. Cuervo E, Balasubramanian A, Cho DK, Wolman A, Saroiu S, Chandra R, Bahl P (2010) MAUI:
making smartphones last longer with code offload. In: Proceedings of the 8th international
conference on mobile systems, applications, and services. ACM, pp. 49–62
16. Chun BG, Sunghwan I, Petros M, Mayur N, Ashwin P (2011) CloneCloud: elastic execution
between mobile device and cloud. In: 6th Conference on computer systems (EuroSys), pp. 301–
314
17. Kosta S, Aucinas A, Hui P, Mortier R, Zhang X (2012) Thinkair: dynamic resource allocation
and parallel execution in the cloud for mobile code offloading. In: Infocom, 2012 proceedings
IEEE, IEEE, pp. 945–953
A Novel Approach for Identification
of Healthy and Unhealthy Leaves Using
Scale Invariant Feature Transform
and Shading Histogram-PCA Techniques
1 Introduction
Because of variations in climatic conditions, cultivated crops and different parts of
the plants, such as roots, stem, leaves and seeds, are attacked by diseases [1].
Additionally, horticultural crop diseases spread at a faster rate compared with other
classes of crops. This results in low crop yield and causes financial loss to the
farmers. The health of the plants and the crops can be maintained by applying the
necessary treatments to the plants, but this indirectly affects the health of
consumers [2]. Further, the continuous increase in population demands more crops.
Therefore, necessary steps must be taken to produce more and healthier crops.
The cultivated crops comprise fruits and vegetables. According to the Indian
government, in the year 2015, around 500 million people belonged to the middle class
or were below the poverty line [3]. For these categories of people, in addition to
food grains, vegetables are of higher priority and
K. S. Shashidhara · B. K. Rai
Nitte Meenakshi Institute of Technology, Bengaluru, Karnataka, India
e-mail: [email protected]
H. Girish
Cambridge Institute of Technology, Bengaluru, Karnataka, India
e-mail: [email protected]
M. C. Parameshwara
Vemana Institute of Technology, Bengaluru, Karnataka, India
e-mail: [email protected]
V. Dakulagi (B)
Guru Nanak Dev Engineering College, Bidar, Karnataka, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 549
N. R. Shetty et al. (eds.), Emerging Research in Computing, Information, Communication
and Applications, Lecture Notes in Electrical Engineering 928,
https://doi.org/10.1007/978-981-19-5482-5_47
form the essential and significant part of the meal compared with fruits.
The vegetables most frequently used in everyday life are given in Table 1 [4, 5].
Experts (experienced farmers) have suggested that, compared with the roots and stem
portions of the plants, leaves are more vulnerable to disease attacks. They have
recommended that identifying the health condition of the plant from the leaves is
important and also easier compared with other parts of the plant. Consequently, the
leaf is considered in this work for identifying the health condition of the plants.
Experts have given the number of days taken by a disease to attack the plants
completely, and this information is summarized in Table 1. They have also suggested
that the higher the water content of the stem, the more the plant is prone to disease.
The information given by the experts on the water content of the stem is summarized
in Table 2. From the table, it can be seen that the following three vegetables have
high water content, namely tomato, potato and beans. Potato is a one-time crop for
each plant, as the yield is obtained after uprooting the plant. The lifespan of a
beans plant is around three months, and during this span it yields four or more
times. The lifespan of a tomato plant is around six months, and during this span it
yields seven or more times [6, 7]. Additionally, tomato has more health benefits
compared with potato and beans [7–9]. Some of the health benefits of consuming tomato
are as follows: it protects vision and guards against degenerative eye disease,
reduces cardiovascular disease, decreases the risk of prostate cancer and breast
cancer, prevents kidney and gall bladder stones, reduces the risk of blood clots,
increases fat-burning capacity, prevents stroke, restores biochemical balance in
diabetics, improves digestion and prevents constipation, gives healthy and glowing
skin, and nourishes the hair [10, 11]. Because of these health benefits and the high
yield from the tomato plant, we have considered tomato in our work [12].
2 Mathematical Framework
One of the difficulties faced by farmers is to locate the leaves of a particular plant
that have become diseased, out of the enormous number of tomato plants grown on a
farm. Assuming that a good system exists for image acquisition, this work automates
the process of identifying whether a particular plant (leaf) is healthy or not. Here,
two methods are proposed for the identification of healthy or unhealthy tomato
leaves. The first method is based on the scale invariant feature transform (SIFT),
and the second is based on the colour histogram and principal component analysis
(PCA). A short review of SIFT, the colour histogram and PCA follows.
2.1 SIFT
SIFT extracts stable local feature points from an input image that are invariant to
basic image transformations such as scale and rotation. The input image is subjected
to Gaussian filters with different mask sizes using different values of σ to obtain a
set of images with different degrees of blurring. The extreme points are extracted by
comparing each point of the blurred image with the points in the same scale and the
points in the neighbouring scales. The Gaussian scale-change operator used by Lowe was
the difference of Gaussians (DoG), and the scale space is the Gaussian difference
scale space {G(x, y, σ)}. Lowe showed that the points extracted by this operator are
invariant to scale changes of the original image I(x, y).
Every pixel in an RGB colour image is a combination of components from the red, green
and blue planes. The colour histogram is a graphical representation of the colour
distribution in an image, obtained by plotting the histogram of each plane (red, green
and blue) separately. Using the histogram information, the colour image can be
equalized in two different ways to enhance the image quality, if necessary. In the
first method, colour image equalization is performed by applying grayscale histogram
equalization to each plane separately and merging them back together to obtain the
equalized colour image.
In the second method, the RGB image is converted into YIQ values and then grayscale
histogram equalization is applied to the Y channel, leaving the I and Q channels
unmodified. The equalized Y and the unmodified I and Q channels are combined and
converted back to RGB to obtain the equalized colour image.
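Both equalization variants can be expressed compactly with OpenCV, as sketched below. OpenCV does not provide a YIQ conversion, so the YCrCb space is used here as a close stand-in for the YIQ transform described above; this substitution is an assumption of the sketch.

import cv2

def equalize_per_channel(bgr):
    # First method: equalize each colour plane independently and merge them back.
    b, g, r = cv2.split(bgr)
    return cv2.merge([cv2.equalizeHist(b), cv2.equalizeHist(g), cv2.equalizeHist(r)])

def equalize_luma(bgr):
    # Second method: equalize only the luminance channel, leaving chrominance untouched.
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
    ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)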
2.3 PCA
The PCA extracts significant features from the data sets and reduces the data from a
higher-dimensional space to a lower-dimensional space. It is one of the most
frequently used techniques for image recognition applications [13]. Consider a set of
training images.
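A minimal sketch of PCA-based dimensionality reduction on histogram features is shown below; the feature dimensions and the number of retained components are illustrative assumptions, and random data stands in for real training images.

import numpy as np
from sklearn.decomposition import PCA

# Assume each training image has been turned into a flattened colour-histogram
# feature vector (3 planes x 256 bins = 768 values).
features = np.random.rand(60, 768)             # 60 training images
pca = PCA(n_components=20)                     # assumed reduced dimensionality
reduced = pca.fit_transform(features)          # project training features
query = pca.transform(np.random.rand(1, 768))  # project a query image the same way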
To identify whether a tomato leaf is healthy or not, two algorithms are proposed in
this paper. In both algorithms, the execution is carried out in two phases. The first
phase is the database creation phase, in which features are extracted from the input
training set of tomato leaf images. The training set consists of both healthy and
unhealthy tomato leaves.
The second phase is the identification phase, in which features are extracted from
the query image and compared with the features stored in the database. If there is a
match above a specified threshold, the relevant metadata (healthy or unhealthy) is
generated as the output. One algorithm is based on SIFT, and the other is based on
the colour histogram and PCA. Both algorithms are explained in the following
subsections.
The block diagram of the proposed SIFT-based algorithm is shown in Fig. 1. From the
block diagram, it can be seen that the sequence of steps followed in the database
creation phase and the identification phase is the same: input image (training image
or query image), normalization and feature extraction.
In the database creation phase, the input training image is normalized by converting
the RGB image into the PGM format and then resizing it to a predefined size. After
normalization, features are extracted using SIFT and stored in the database along
with the metadata of the corresponding input training image. The metadata simply
records whether the tomato leaf is healthy or not. In other words, if the input
tomato leaf is unhealthy, the corresponding extracted features are stored under the
unhealthy class, and if the input tomato leaf is healthy, the corresponding extracted
features are stored under the healthy class. In the identification phase, a query
image whose health condition is yet to be identified by the algorithm is given as
input. Following the same procedure as in the database creation phase, the features
are extracted. The extracted features are compared with the features stored in the
database, and the relevant metadata is generated as the output of the algorithm.
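A compact sketch of the two phases with OpenCV is given below. The resize dimensions, the ratio-test factor, the match threshold and the training paths are assumptions, and grayscale conversion plus resizing stands in for the PGM normalization step.

import cv2

sift = cv2.SIFT_create()
matcher = cv2.BFMatcher(cv2.NORM_L2)

def extract(path, size=(256, 256)):
    # Normalization (grayscale + resize) followed by SIFT descriptor extraction.
    img = cv2.resize(cv2.imread(path, cv2.IMREAD_GRAYSCALE), size)
    _, descriptors = sift.detectAndCompute(img, None)
    return descriptors

# Database creation phase: hypothetical training paths with their metadata labels.
training_set = [("leaf_001.jpg", "healthy"), ("leaf_002.jpg", "unhealthy")]
database = [(extract(path), label) for path, label in training_set]

def identify(query_path, min_good_matches=30):
    # Identification phase: ratio-test matching against every stored image.
    qdesc = extract(query_path)
    best_label, best_score = None, 0
    for desc, label in database:
        pairs = matcher.knnMatch(qdesc, desc, k=2)
        good = [p[0] for p in pairs if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
        if len(good) > best_score:
            best_label, best_score = label, len(good)
    # Following the rule stated in the conclusion: if the key points do not match,
    # the leaf is treated as unhealthy.
    return best_label if best_score >= min_good_matches else "unhealthy"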
The database contains 80 leaf images of tomato plants captured using a Sony digital
camera. The features of the digital camera are as follows: spatial resolution of 18.2
megapixels, digital zoom of 120X and optical zoom of 8X. The approximate distance
between the outer part of the camera lens and the leaf was 17–20 cm. Each image was
captured by keeping a sheet of black paper under the leaf, thereby making the
background black. This setup improves the performance of the algorithm in detecting
and classifying the leaf as healthy or not. Eighty images were captured using the
above setup conditions. Out of these 80 images, 28 are healthy and 52 are unhealthy.
Further, out of those 52 unhealthy images, 37 are used for training the system and 12
are used as test images. These 37 images are labelled Type 1 and Type 2: 30 images
are of Type 1 and 7 are of Type 2. The leaves under Type 1 are considered to be in
the early stage of disease attack, and those under Type 2 in the final stage. Under
Type 1, the disease can be cured easily by applying treatments, and it will not
affect the surrounding plants. Under Type 2, the disease is in its final stage; by
applying treatments it is not possible to cure it completely, and it will affect the
surrounding plants.
This paper presents the proposed system, which is implemented using the SIFT
algorithm and the colour histogram-PCA method. The experimental results are shown in
Figs. 2 and 3. The key points extracted from the leaf images using the SIFT algorithm
are used to identify the specific grade.
4 Conclusion
Farmers face huge losses because of leaf diseases, which are caused by unbalanced
environmental conditions. Because of this, the overall crop yield is reduced and the
harvested crop is sold at high prices. Consumers are indirectly affected by paying a
huge sum. Hence, precautions must be taken to prevent these diseases by designing a
system which monitors real-time data from the crops.
The proposed paper uses the SIFT and histogram-PCA algorithms to detect leaf
disease, i.e. to check whether the leaf is healthy or not. The SIFT algorithm is used
to find the features of each dataset image and compare them with the query image. In
this dataset, the key points of the input image and the query image are considered
for matching. If the key points do not match, the leaf is considered unhealthy.
References
1 Introduction
Many machine learning classifiers are trained with the assumption that the distri-
bution of all the classes in the training set is equal. However, in most real-world
applications such as network intrusion detection, credit card fraud detection and
health screening [1], there is a dearth of instances of the classes which matter the
most. For example, in credit card fraud detection, the volume of data pertaining to
legitimate transactions is humongous when compared to the data pertaining to frauds.
The situation where the ratio of instances in a class is uneven is termed as class
imbalance. Class imbalance leads to many challenges in training the classifiers. Class
imbalance occurs in data which has only two classes (binary class imbalance) and in
data which has multiple classes (multiclass imbalance). The range of methods used
to solve the problem is categorized as Data Level, Algorithmic Level and Hybrid
Level; the taxonomy of the methods has been given in Fig. 1.
The contribution of this paper is as follows.
• Comprehensive categorization of various families and methods to solve the class
imbalance problem.
• Discussion of techniques and evaluation strategies proposed in recent research
works.
• To elaborate on the broader categories of methods, such as cost-sensitive
methods and ensemble methods, for solving the class imbalance problem.
P. P. Wagle (B)
Ethnus Consultancy Services Pvt. Ltd, Bengaluru, India
e-mail: [email protected]
M. V. Manoj Kumar
Nitte Meenakshi Institute of Technology, Yelahanka, Bengaluru 560064, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 557
N. R. Shetty et al. (eds.), Emerging Research in Computing, Information, Communication
and Applications, Lecture Notes in Electrical Engineering 928,
https://doi.org/10.1007/978-981-19-5482-5_48
The motivation of this study is to review the existing techniques to handle the
imbalance in the distribution of the classes. The novelty of this research lies in
presentation of the latest techniques of handling class imbalance like MAHAKIL:
diversity-based oversampling approach [2], clustering-based instance selection
(CBIS) [3], stacking ensemble learning [4], etc., and evaluation strategies like
Matthews correlation coefficient (MCC) [5].
This paper provides a comprehensive overview of the techniques used to handle
the effects of skewed class distribution. We have reviewed the standard methods
and latest research works which intended to solve the class imbalance problem,
also known as the skewed class distribution in machine learning. Section 2 deals
with the additional problems that potentially occur along with the problem of class
imbalance. Section 3 deals with the techniques published by various resources to
deal with the problem. Section 4 provides some examples of domain-specific class
imbalance problems and the techniques used to handle them. Section 5 deals with
the evaluation metrics to compare the techniques.
We present some research issues for the future and conclude the paper in Sect. 6.
In a binary classification scenario, the minority-to-majority class ratio can be nearly
equal or can be at a ratio of 1:2, 1:5, 1:10,000 and so on. Several research papers
discuss these issues, and certain classification algorithms have been reported to be
better suited for a smaller skew and certain algorithms for a larger skew. However,
the selection of algorithms also depends on other factors (Table 1).
In problems with a larger skew, the algorithms try to achieve higher accuracy
by incorrectly classifying all test examples as belonging to the majority class. This
scenario may result in non-detection of the minority class, which in most cases is
more important than the majority classes: a problem which is omnipresent in all
domains presented in Table 3.
It is also noted that alternative methods of measuring classifiers’ performance such
as receiver operating characteristic (ROC) curves are used as opposed to other scalar
methods. Many studies have been carried out studying the effect of class imbalance
ratio on the performance of the classifiers, where some studies say that an equal ratio
may not yield the best performance. Hence, there may be other factors apart from
just the ratio.
Internal clustering is also referred to as disjuncts in the dataset. The imbalance
present within the classes is sometimes ignored, which results in erroneous decision
boundaries. These imbalances can be imagined as clusters within a class. The
experiment described in [11] suggests that the problem is not directly caused by class
imbalances, but rather that class imbalances may yield small disjuncts which, in turn,
will cause degradation. The authors of [8] argue that, in order to improve classifier
performance, it is more useful to focus on the issues caused by small disjuncts than
to focus on class imbalance.
In addition to an imbalanced class ratio in the training dataset, another challenge
arises when there is not much information about the minority class. More data is
usually preferred as it aids better modelling. In particular, the lack of proper data
would lead to misclassification, as it affects the accuracy of the decision boundary
in a classification scenario.
The degree of overlapping between various classes presents several challenges for the
classifier to effectively separate the different classes by forming a decision boundary.
Japkowicz [6] presents class complexity as an important factor determining the
performance. Especially, the work in [10] develops a systematic study aiming to
question whether class imbalances are truly to blame for the loss of performance
of learning systems or whether class imbalances are not a problem by themselves.
Hence, in this review, we will not be considering class overlapping as a problem
caused by class imbalance.
In the next section, we shall present the existing methods to prevent the problem
of class imbalance, along with the other issues discussed above.
There have been various techniques developed to handle class imbalance in binary
classification problems. We have opted for binary classification problems for
presenting the methods for ease of understanding. These techniques can be grouped
into data preprocessing methods, algorithmic methods and hybrid methods [1].
Data preprocessing techniques or methods are concerned with balancing the class
distribution of the instances of both classes in the dataset. Balancing the class
distribution can be done either by including more instances of the minority class
(oversampling) or fewer instances of the majority class (undersampling) [12]. Both
oversampling and undersampling have drawbacks: undersampling can discard potentially
useful data, while oversampling may cause overfitting, or a combination of both
drawbacks may occur [13]. Hence, we have a third category of methods, which we call
Hybrid Sampling, which attempts to overcome the drawbacks previously described [14].
Additionally, there is another family of techniques which falls under data
preprocessing methods, known as feature selection methods, which are effective for
high-dimensional data exhibiting class imbalance.
Undersampling Random undersampling [15] is a naive non-heuristic method of
randomly removing the instances of the majority class in order to balance the class
distribution.
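As an illustration of the idea (a minimal sketch and not the method of [15]; the arrays X and y, and the exact 1:1 target ratio, are assumptions), random undersampling can be written in a few lines of NumPy:

```python
import numpy as np

def random_undersample(X, y, majority_label, seed=None):
    """Randomly drop majority-class rows until both classes are the same size."""
    rng = np.random.default_rng(seed)
    maj_idx = np.flatnonzero(y == majority_label)   # majority-class row indices
    min_idx = np.flatnonzero(y != majority_label)   # minority-class row indices
    keep_maj = rng.choice(maj_idx, size=len(min_idx), replace=False)
    keep = np.concatenate([keep_maj, min_idx])
    rng.shuffle(keep)
    return X[keep], y[keep]

# Toy example: a 9:1 imbalance reduced to 1:1
X = np.random.randn(1000, 5)
y = np.array([0] * 900 + [1] * 100)
X_bal, y_bal = random_undersample(X, y, majority_label=0, seed=42)
print(np.bincount(y_bal))   # -> [100 100]
```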
Tomek links (TL) [16] is an undersampling method which can also be used as a data cleaning method. It is a modification of the condensed nearest neighbour rule (CNN) [17]. A pair of instances E_i and E_j belonging to two different classes forms a Tomek link if neither has a closer neighbour than the other, i.e. there is no instance E_l such that d(E_i, E_l) < d(E_i, E_j) or d(E_j, E_l) < d(E_i, E_j). TL removes unwanted class overlap by removing majority class instances that participate in such links until all minimally distanced nearest-neighbour pairs belong to the same class. The procedure involved in Tomek links is outlined in Fig. 2.
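A minimal sketch of the Tomek-link cleaning step is shown below; it is an illustrative reconstruction rather than the original formulation in [16], and it assumes a feature matrix X, binary labels y, and Euclidean distance.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def remove_tomek_links(X, y, majority_label):
    """Drop the majority-class member of every Tomek link.
    A pair (i, j) is a Tomek link when i and j are mutual nearest
    neighbours and carry different class labels."""
    nn = NearestNeighbors(n_neighbors=2).fit(X)
    # column 0 of the result is the point itself, column 1 its nearest other point
    nearest = nn.kneighbors(X, return_distance=False)[:, 1]
    drop = set()
    for i, j in enumerate(nearest):
        if y[i] != y[j] and nearest[j] == i:              # mutual nearest neighbours
            drop.add(i if y[i] == majority_label else j)  # remove the majority member
    keep = np.array([k for k in range(len(y)) if k not in drop])
    return X[keep], y[keep]
```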
One-Sided Selection (OSS) [19] is an undersampling method which combines Tomek links with CNN: TL removes noisy and borderline majority class examples, whereas CNN eliminates majority class examples that are distant from the decision border.
The Edited Nearest Neighbour (ENN) rule [20] removes any example whose class label differs from the class of at least two of its three nearest neighbours. The Neighbourhood Cleaning Rule (NCL) [21] is an undersampling method which modifies ENN: if an instance belongs to the majority class and its label differs from the class of at least two of its neighbours, the instance is removed; however, if the instance belongs to the minority class and is surrounded by majority class instances, then the latter are removed [18].
Another family of methods known as the Near-Miss family [22] perform under-
sampling of instances in the majority class based on their vicinity to other instances
in the training set.
The work of Drummond and Holte [23] has shown, using cost curves as an evaluation metric, that undersampling produces much better results than oversampling.
Several clustering-based undersampling methods have been implemented in the past and have yielded good results.
The work in [24] proposes two novel clustering-based strategies which have been applied to ensemble algorithms [25]. In the first strategy, 'k' clusters are generated using the k-means clustering algorithm, where 'k' equals the number of samples in the minority class; the majority class examples are then replaced by the cluster centres, which act as representatives of the replaced examples. The second strategy instead selects the nearest neighbour of each cluster centre, since it is a real data sample, to replace the centroid. Studies using datasets with varying levels of imbalance showed that the second strategy combined with ensemble methods is preferable for datasets with a large imbalance ratio. The most common clustering procedure used in these algorithms is k-means clustering. However, k-means has certain limitations, including the need to determine the number of clusters before running the algorithm, among others [26]. Hence, [3] proposes a clustering-based instance selection (CBIS) method built on the affinity propagation clustering algorithm [27]. As the name says, CBIS has two components: clustering and instance selection.1 Interestingly, both [3, 28] conclude that a clustering algorithm used with a Multi-layer Perceptron (MLP) classifier yields the best results, especially for small-scale datasets [28].
In the work [28], techniques based on clustering have been proposed where
backpropagation neural networks are used to solve the skewed class distribution.
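The first strategy of [24] can be sketched roughly as follows; this is an illustrative reconstruction rather than the authors' code, and the use of scikit-learn's KMeans and the variable names are assumptions. The second strategy would differ only in replacing each centroid with its nearest real majority class sample.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_centroid_undersample(X, y, majority_label, random_state=0):
    """Replace the majority class with k-means centroids, k = minority count."""
    X_maj, X_min = X[y == majority_label], X[y != majority_label]
    k = len(X_min)                                   # one centroid per minority sample
    km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X_maj)
    X_new = np.vstack([km.cluster_centers_, X_min])  # centroids stand in for the majority
    y_new = np.concatenate([np.full(k, majority_label), y[y != majority_label]])
    return X_new, y_new
```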
Oversampling Random oversampling (ROS) is the oversampling counterpart of random undersampling. It is a naive, non-heuristic method of balancing the class distribution by replicating the minority class examples.
1Instance selection is used in the preprocessing step of training an ML model, where it retains
characteristics of the dataset while minimizing the bulk of the dataset.
Fig. 3 Example of an instance synthesized using SMOTE as described in [29]
ROS has two major shortcomings. Firstly, it may cause overfitting since the minority class instances are replicated [30]. Replication may also make the decision region for the minority class overly specific, as observed by [29]. Hence, a new method, the Synthetic Minority Oversampling Technique (SMOTE), was proposed in [29], which adds synthetic instances to the training data. Oversampling of the minority class is done by introducing synthetic examples along the line segments joining existing minority class examples. Figure 3 shows an example of SMOTE [23]: x_i is a selected minority class instance, its four nearest neighbours x_i1–x_i4 are chosen from the training set, and r_1 through r_4 are the newly generated synthetic examples.
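The core interpolation step of SMOTE can be sketched as below; this is an illustrative reconstruction of the idea in [29] rather than the reference implementation, and the choice of k = 5 neighbours and the variable names are assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_samples(X_min, n_synthetic, k=5, seed=None):
    """Create synthetic minority samples by interpolating each selected
    instance towards one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neigh = nn.kneighbors(X_min, return_distance=False)[:, 1:]   # drop self-match
    synthetic = np.empty((n_synthetic, X_min.shape[1]))
    for s in range(n_synthetic):
        i = rng.integers(len(X_min))          # pick a minority instance x_i
        j = neigh[i, rng.integers(k)]         # pick one of its k neighbours
        gap = rng.random()                    # interpolation factor in [0, 1)
        synthetic[s] = X_min[i] + gap * (X_min[j] - X_min[i])
    return synthetic
```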
Borderline-SMOTE is a popular extension of SMOTE which involves selecting the borderline instances that are more likely to be misclassified [31]. Adaptive Synthetic (ADASYN) sampling [32] is an oversampling procedure that improves over SMOTE: it not only generates synthetic examples adaptively, but also shifts the decision boundary to emphasize learning the difficult examples, thereby improving performance.
Combining Undersampling and Oversampling The work in [31] shows that SMOTE combined with undersampling methods performs better than plain undersampling, based on the former dominating the latter in receiver operating characteristic (ROC) space. However, [33] lists the disadvantages of SMOTE: noisy samples may produce noisy synthetic samples, which can blur the boundaries between the classes, and the generated samples may lack diversity. Thus, Liang et al. [33] propose an improvement over SMOTE named LR-SMOTE, which claims to address these shortcomings. The experimental results show a significant reduction in noise and demonstrate the stability of the algorithm.
Hybrid Sampling To account for the limitations of oversampling and undersampling, ensemble and hybrid methods have been proposed and have been shown to address the problems faced by the techniques discussed previously.
A critical issue that is not addressed in any of the methods presented so far is diversity. Certain relatively recent publications address the lack of diversity2 in the oversampled minority class after the sampling process, especially if the dataset consists of many sub-clusters (Fig. 1). MAHAKIL [2] is one such diversity-based oversampling method.
2 Here, diversity refers to the diversity of the instances of the training data.
Unlike the previous methods, algorithmic methods work on modifying the learning
algorithms to avoid the bias towards the majority class. There are multiple methods
of doing the same, and they are categorized under the methods shown in Fig. 1.
Cost-sensitive learning has gained attention from the machine learning community
3 The chromosomal theory of inheritance, proposed by Sutton and Boveri, states that chromosomes
are the vehicles of genetic heredity [34] where both parents contribute chromosome pair of the
offspring.
The hybrid category in the taxonomy of Fig. 1 involves augmenting data prepro-
cessing and algorithmic techniques to alleviate the imbalance. The hybrid ensembles
4Auto-associative neural networks are a type of neural network used to simulate and explore the
associative process [57].
out to be mostly much better than the fifteen other methods with a higher area under
the curve (AUC), F-measure and G-mean. Another novel ensemble method where
imbalanced data is converted into several balanced datasets and fed into classification
algorithms has been discussed in [67].
Sun et al. [68] present an intelligent undersampling and ensemble-based classification method to resolve the problem of imbalanced classes in noisy situations, which has been shown to perform better than other classifiers.
Other algorithms which combine sampling, feature selection and a combination of one or more classifiers, especially for particular domains, are categorized as other hybrid techniques. For instance, [69] shows that feature selection followed by undersampling leads to the generation of better Support Vector Machines to account for the class imbalance in predicting protein function from sequence. The work in [68] proposes a biased random forest that employs the k-nearest neighbours (k-NN) algorithm to identify the critical areas in a training set, and based on the critical
areas, the standard random forest is fed with more random trees. Another instance
4 Applicable Domains
The selection of a particular technique depends on the domain and the nature of the problem. We have listed a few domains in Table 3, where researchers have applied techniques for minimizing the effect of class imbalance (Fig. 5).
5 Evaluation Metrics
minority classes which are usually the positive classes. Recall is the count of true
positives divided by the sum of the count of false negatives and true positives. It is
also known as sensitivity. Equation 8 is F-measure which is derived from the scores of
precision and recall [79], and it is the harmonic mean of both of the above-discussed
quantities, intended to balance them. G-mean provides a score which indicates the
capability of a classifier to balance between the accuracies of positive class and
negative class and is shown in Eq. 5. True-positive rate (TPR), also known as recall or sensitivity, is the proportion of actual positive instances that are predicted as positive, as shown in Eq. 3. A higher TPR means the model is better at correctly identifying positive cases. Equation 4 gives specificity, the true-negative rate (TNR); the complementary fraction of actual negatives that are predicted as positive constitutes the false-positive rate (FPR). The sum of specificity and the false-positive rate given in Eq. 2 is always 1. A good ML model should have high specificity, which means a low FPR and a high
TNR. Fawcett [80] presents additional metrics like adjusted GM, optimized precision,
Mean-Class-Weighted Accuracy, Kappa, etc. The formulae for the metrics described
in this section can be found below.
$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{1}$$

$$\text{False positive rate} = \frac{FP}{FP + TN} \tag{2}$$

$$\text{True positive rate (Sensitivity)} = \frac{TP}{TP + FN} \tag{3}$$

$$\text{True negative rate (Specificity)} = \frac{TN}{TN + FP} \tag{4}$$

$$\text{G-mean} = \sqrt{\text{Sensitivity} \times \text{Specificity}} \tag{5}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{6}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{7}$$

$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{8}$$
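To make Eqs. 1–8 concrete, the sketch below computes them directly from raw confusion-matrix counts; the example numbers are invented purely for illustration.

```python
import math

def imbalance_metrics(tp, fp, tn, fn):
    """Scalar metrics of Eqs. 1-8 computed from confusion-matrix counts."""
    accuracy    = (tp + tn) / (tp + fp + tn + fn)                 # Eq. 1
    fpr         = fp / (fp + tn)                                  # Eq. 2
    sensitivity = tp / (tp + fn)                                  # TPR / recall, Eq. 3
    specificity = tn / (tn + fp)                                  # TNR, Eq. 4
    g_mean      = math.sqrt(sensitivity * specificity)            # Eq. 5
    precision   = tp / (tp + fp)                                  # Eq. 6
    f_measure   = 2 * precision * sensitivity / (precision + sensitivity)  # Eq. 8
    return {"accuracy": accuracy, "FPR": fpr, "sensitivity": sensitivity,
            "specificity": specificity, "G-mean": g_mean,
            "precision": precision, "recall": sensitivity, "F-measure": f_measure}

# A classifier that nearly ignores a 1:99 minority: high accuracy (0.985)
# despite modest precision and recall, hence the need for G-mean and F-measure.
print(imbalance_metrics(tp=5, fp=10, tn=980, fn=5))
```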
Precision-recall (PR) curves focus on the performance of the classifiers on the minority classes; a PR curve is a plot of precision, i.e. positive predictive power, versus sensitivity. The relationship between PR curves and ROC curves is detailed in [81]. In these curves, the performance of individual classifiers is indicated as curves in a 2D graph, and the curve having the maximum area under the curve (AUC) is considered the best among a set of classifiers. Other metrics aim to improve upon ROC curves: [82] introduces cost curves, which share many properties of ROC curves while also enabling performance assessments that cannot be made using ROC curves. [5] discusses the Matthews correlation coefficient (MCC), arguing that it is better than other metrics because it provides a high score only if the prediction obtained good results in all cells of the confusion matrix.
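For completeness, the MCC discussed in [5] can be computed from the same confusion-matrix counts using its standard definition; the helper name below is our own.

```python
import math

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient (range -1 to +1): high only when
    all four confusion-matrix cells are good."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

print(mcc(tp=5, fp=10, tn=980, fn=5))   # modest score despite 98.5% accuracy
```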
The techniques for handling class imbalance are numerous and diverse. As class imbalance affects a large number of domains, there are many open issues and challenges to be resolved for better classification results. A few of the future research directions that we propose are as follows:
1. Clustering-based undersampling techniques have primarily used the k-means clustering algorithm. The use of other clustering algorithms, such as the affinity propagation [27] employed for undersampling in [3], has to be explored further.
2. ROC curves are the most popular measure of a classifier's performance when learning under class imbalance. However, other techniques like cost curves [82] and the MCC [5] should also be covered in detail.
3. An effort must be made to port existing binary class imbalance techniques to multi-class imbalance problems [39]. However, decomposition itself involves many complexities, and there is a need for a comprehensive study covering all of them.
4. The datasets for training the model are usually prepared with expert supervision
and sometimes may perform poorly post-training due to the difference in the
underlying distribution of the training and testing datasets; the development of
algorithms that are mostly independent of the distribution is an open research
problem.
5. The work in [38] deals with feature selection techniques for small-scale
datasets with high dimensionality. However, the authors have suggested addi-
tional study in future with respect to applying feature selection techniques and
deducing optimal feature selection techniques to handle class imbalance in high
dimensional datasets, where sampling methods were found to be ineffective.
6. The effect of noise on the performance of the classifiers has been generally
neglected, and there is a need for work to be done in studying the characteristics
of noise and its impact in predictive modelling involving imbalanced classes.
7 Conclusion
In the domain of machine learning and data mining, class imbalance remains one of the most important problems, and a substantial amount of research has been proposed to date. Innovative methodologies and effective evaluation strategies continue to be proposed, as the shortcomings of the class imbalance problem are yet to be fully addressed. We should also note that the choice among the various techniques for handling class imbalance that we have discussed is domain specific. In general, hybrid and ensemble strategies, in which a combination of sampling, feature selection and algorithmic techniques is applied, are to be preferred, as seen in Table 3; in this way, multiple strategies compensate for the shortcomings of individual techniques. This paper, apart from including the latest proposed techniques, also dives deep into cost-sensitive and ensemble methods and provides an extensive taxonomy for handling class imbalance and evaluating the techniques, aspects which we found to be limitations of previous review works.
In the light of the boom in data driving the connected world, classification has been
affected by the problem of skewed class distribution, and alleviating this problem
is a major step towards making the classifiers more performant for better results in
predictive modelling.
References
1. Krawczyk B (2016) Learning from imbalanced data: open challenges and future directions.
Prog Artif Intell 5(4):221–232
2. Bennin KE, Keung J, Phannachitta P, Monden A, Mensah S (2017) Mahakil: diversity based
oversampling approach to alleviate the class imbalance issue in software defect prediction.
IEEE Trans Software Eng 44(6):534–550
3. Tsai C-F, Lin W-C, Hu Y-H, Yao G-T (2019) Under-sampling class imbalanced datasets by
combining clustering analysis and instance selection. Inf Sci 477:47–54
4. Rajagopal S, Kundapur PP, Hareesha KS (2020) A stacking ensemble for network intrusion
detection using heterogeneous datasets. Secur Commun Netw 2020
5. Chicco D, Jurman G (2020) The advantages of the Matthews correlation coefficient (mcc) over
f 1 score and accuracy in binary classification evaluation. BMC Genomics 21(1):1–13
6. Japkowicz N (2000) The class imbalance problem: significance and strategies. In: Proceedings
of the 2000 international conference on artificial intelligence, vol 56. Citeseer
7. Chawla NV, Japkowicz N, Kotcz A (2004) Editorial: special issue on learning from imbalanced
data sets. SIGKDD Explor Newsl 6(1):1–6. [Online]. Available: https://doi.org/10.1145/100
7730.1007733
8. Japkowicz N, Stephen S (2002) The class imbalance problem: a systematic study. Intell Data
Anal 6(5):429–449
9. Das B, Krishnan NC, Cook DJ (2013) Handling class overlap and imbalance to detect prompt
situations in smart homes. In: 2013 IEEE 13th international conference on data mining
workshops. IEEE, pp 266–273
10. Prati RC, Batista GE, Monard MC (2004) Class imbalances versus class overlapping: an
analysis of a learning system behaviour. In: Mexican international conference on artificial
intelligence. Springer, pp 312–321
11. Jo T, Japkowicz N (2004) Class imbalances versus small disjuncts. SIGKDD Explor Newsl
6(1):40–49. [Online]. Available: https://doi.org/10.1145/1007730.1007737
12. Chawla NV (2009) Data mining for imbalanced datasets: an overview. In: Data mining and
knowledge discovery handbook, pp 875–886
13. Batista GE, Bazzan AL, Monard MC et al (2003) Balancing training data for automated
annotation of keywords: a case study. In: WOB, pp 10–18
14. Ali A, Shamsuddin SM, Ralescu AL (2013) Classification with class imbalance problem. Int
J Adv Soft Comput Appl 5(3)
15. Kotsiantis S, Pintelas P (2003) Mixture of expert agents for handling imbalanced data sets.
Ann Math Comput Teleinform 1(1):46–55
16. Tomek I (1976) Two modifications of CNN. IEEE Trans Syst Man Cybern SMC-6(11):769–772
17. Hart P (1968) The condensed nearest neighbor rule (corresp.). IEEE Trans Inf Theory
14(3):515–516
18. Batista GE, Prati RC, Monard MC (2004) A study of the behavior of several methods for
balancing machine learning training data. ACM SIGKDD Explor Newsl 6(1):20–29
19. Kubat M, Matwin S et al (1997) Addressing the curse of imbalanced training sets: one-sided
selection. In: ICML, vol 97. Citeseer, pp 179–186
20. Wilson DL (1972) Asymptotic properties of nearest neighbor rules using edited data. IEEE
Trans Syst Man Cybern 3:408–421
21. Laurikkala J (2001) Improving identification of difficult small classes by balancing class
distribution. In: Conference on artificial intelligence in medicine in Europe. Springer, pp 63–66
22. Mani I, Zhang J (2003) kNN approach to unbalanced data distributions: a case study involving
information extraction. In: Proceedings of workshop on learning from imbalanced datasets, vol
126. ICML United States
23. Fernández A, García S, Herrera F, Chawla NV (2018) Smote for learning from imbalanced
data: progress and challenges, marking the 15-year anniversary. J Artif Intell Res 61:863–905
24. Sun Z, Song Q, Zhu X, Sun H, Xu B, Zhou Y (2015) A novel ensemble method for classifying
imbalanced data. Pattern Recogn 48(5):1623–1637
25. Galar M, Fernandez A, Barrenechea E, Bustince H, Herrera F (2011) A review on ensembles for
the class imbalance problem: bagging-, boosting-, and hybrid-based approaches. IEEE Trans
Syst Man Cybern Part C (Appl Rev) 42(4):463–484
26. Raykov YP, Boukouvalas A, Baig F, Little MA (2016) What to do when k-means clustering
fails: a simple yet principled alternative algorithm. PLoS ONE 11(9):e0162259
27. Wang K, Zhang J, Li D, Zhang X, Guo T (2008) Adaptive affinity propagation clustering. arXiv
preprint arXiv:0805.1096
28. Yen S-J, Lee Y-S (2009) Cluster-based under-sampling approaches for imbalanced data
distributions. Expert Syst Appl 36(3):5718–5727
29. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) Smote: synthetic minority over-
sampling technique. J Artif Intell Res 16:321–357
30. Chawla NV (2003) C4. 5 and imbalanced data sets: investigating the effect of sampling method,
probabilistic estimate, and decision tree structure. In: Proceedings of the ICML, vol 3, p 66
31. Han H, Wang W-Y, Mao B-H (2005) Borderline-smote: a new over-sampling method in imbal-
anced data sets learning. In: International conference on intelligent computing. Springer, pp
878–887
32. He H, Bai Y, Garcia EA, Li S (2008) Adasyn: adaptive synthetic sampling approach for imbal-
anced learning. In: 2008 IEEE international joint conference on neural networks (IEEE world
congress on computational intelligence). IEEE, pp 1322–1328
33. Liang X, Jiang A, Li T, Xue Y, Wang G (2020) Lr-smote: an improved un-balanced data set
oversampling based on k-means and Svm. Knowl-Based Syst 196:105845
34. Lumen. Genetics and inheritance. [Online]. Available: https://courses.lumenlearning.com/san
jacinto-biology1/chapter/chromosomal-theory-of-inheritance-and-genetic-linkage
35. Wong GY, Leung FH, Ling S-H (2013) A novel evolutionary preprocessing method based
on over-sampling and under-sampling for imbalanced datasets. In: IECON 2013-39th annual
conference of the IEEE industrial electronics society. IEEE, pp 2354–2359
36. Barua S, Islam MM, Yao X, Murase K (2012) Mwmote–majority weighted minority
oversampling technique for imbalanced data set learning. IEEE Trans Knowl Data Eng
26(2):405–425
37. Menzies T, Dekhtyar A, Distefano J, Greenwald J (2007) Problems with precision: a response
to comments on ‘data mining static code attributes to learn defect predictors.’ IEEE Trans
Software Eng 33(9):637–640
38. Wasikowski M, Chen X-W (2009) Combating the small sample class imbalance problem using
feature selection. IEEE Trans Knowl Data Eng 22(10):1388–1400
39. Van Hulse J, Khoshgoftaar TM, Napolitano A, Wald R (2012) Threshold-based feature selec-
tion techniques for high-dimensional bioinformatics data. Netw model Anal Health Inform
Bioinform 1(1–2):47–61
40. Threshold-based feature selection techniques for high-dimensional bioinformatics data. Netw
Model Anal Health Inform Bioinform 1(1–2):47–61
41. Zhou Z-H, Liu X-Y (2010) On multi-class cost-sensitive learning. Comput Intell 26(3):232–257
42. Sammut C, Webb GI (2011) Encyclopedia of machine learning. Springer Science & Business
Media
43. Ling CX, Sheng VS (2008) Cost-sensitive learning and the class imbalance problem. Encycl
Mach Learn 2011:231–235
44. Fernández A, García S, Galar M, Prati RC, Krawczyk B, Herrera F (2018) Learning from
imbalanced data sets. Springer, vol 10
45. Turney PD (1994) Cost-sensitive classification: empirical evaluation of a hybrid genetic
decision tree induction algorithm. J Artif Intell Res 2:369–409
46. Ling CX, Yang Q, Wang J, Zhang S (2004) Decision trees with minimal costs. In: Proceedings
of the twenty-first international conference on machine learning, p 69
47. Drummond C, Holte RC et al (2003) C4.5, class imbalance, and cost sensitivity: why under-
sampling beats over-sampling. In: Workshop on learning from imbalanced datasets II, vol 11.
Citeseer, pp 1–8
48. Zadrozny B, Langford J, Abe N (2003) Cost-sensitive learning by cost-proportionate example
weighting. In: Third IEEE international conference on data mining. IEEE, pp 435–442
49. Domingos P (1999) Metacost: a general method for making classifiers cost-sensitive. In:
Proceedings of the fifth ACM SIGKDD international conference on knowledge discovery
and data mining, pp 155–164
50. Fan W, Stolfo SJ, Zhang J, Chan PK (1999) Adacost: misclassification cost sensitive boosting.
In: Icml, vol 99. Citeseer, pp 97–105
51. Fernández A, García S, Galar M, Prati RC, Krawczyk B, Herrera F (2018) Cost-sensitive
learning. Springer International Publishing, Cham, pp 63–78
52. Elkan (2001) The foundations of cost-sensitive learning. In: International joint conference on
artificial intelligence, vol 17(1). Lawrence Erlbaum Associates Ltd, pp 973–978
53. Veropoulos, Campbell C, Cristianini N et al (1999) Controlling the sensitivity of support vector
machines. In: Proceedings of the international joint conference on AI, vol 55. Stockholm, p 60
54. Tax DMJ (2001) One-class classification. PhD dissertation, Delft University of Technology, Delft, Netherlands
55. Attenberg J, Ertekin S (2013) Class imbalance and active learning. In: Imbalanced learning:
foundations, algorithms, and applications, pp 101–149
56. Bellinger C, Sharma S, Japkowicz N (2012) One-class versus binary classification: Which and
when? In: 2012 11th International conference on machine learning and applications, vol 2.
IEEE, pp 102–106
81. Davis J, Goadrich M (2006) The relationship between precision-recall and ROC curves. In:
Proceedings of the 23rd international conference on Machine learning, pp 233–240
82. Drummond C, Holte RC (2006) Cost curves: an improved method for visualizing classifier
performance. Mach Learn 65(1):95–130
An Ameliorate Analysis
of Cryptocurrencies to Determine
the Trading Business with Deep Learning
Techniques
1 Introduction
2 Related Works
Artificial intelligence and big data solutions are used in many areas of finance and business, including loan/insurance underwriting, fraud detection, customer service, sentiment/news analysis, algorithmic trading, portfolio management, marketing, financial product recommendation systems, commercial recommendations, and so on. The following sections review a few AI strategies and their applications in these domains. The objective of this work is to provide technical and fundamental analysis using machine learning approaches.
Trading is one of the most essential concepts in economics and business. It can be broadly characterized as the buying and selling of a financial entity, such as goods, stocks, or currencies. Individuals and companies engage in trading to make a profit. The trading process comprises four components: pre-trade analysis, trading signal generation, trade execution, and post-trade analysis [19]. Algorithmic trading is a broad term that refers to the automation of any combination of these processes, or all of them. Artificial intelligence (AI) has fundamentally altered this industry by automating the trading process, and many trading algorithms can generate profit without human intervention.
Fraud is one of the most pressing financial problems of our time. Across the globe, fraudulent behaviour costs companies billions of dollars. According to several estimates, total fraud losses were $27.85 billion in 2018, and this figure is predicted to rise to $40.63 billion over the following ten years [23]. This sum is more than the annual GDP of several developing countries. As a result, computer systems able to identify and stop fraud are essential, as doing so could considerably boost corporate and economic growth. Researchers have devised systems that integrate artificial intelligence and machine learning for this purpose. Fraud detection techniques fall into two broad classes: anomaly detection and misuse detection [31]. Anomaly detection learns a customer's transaction behaviour: any new transaction completed by that customer is classified as normal or abnormal based on the customer's earlier transactions. In misuse detection, a model is built using a tagged data set of all customers, and fraudulent behaviour is determined based on common fraudulent patterns.
The work in [13] proposes a learning-based stock price prediction technique. The authors use the New York Stock Exchange dataset, incorporating the open price, close price, and volume data, and employ a long short-term memory (LSTM) recurrent neural network. LSTMs feature memory cells that correspond to neurons, much like conventional artificial neural networks; these memory cells may provide a link between memories within the input and the learned data structure, resulting in improved predictions. The LSTM they employed had a sequential input layer, LSTM layers, a dense layer using the ReLU activation function, and a linear-activation output layer. Roondiwala et al. ran a series of tests, tuning various parameters, and observed that the best-performing LSTM had an RMSE (root mean square error) of 0.00859, which is relatively low, demonstrating that artificial intelligence can be used to accurately predict stock values.
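Broadly following the architecture described above (a sequential input layer, LSTM layers, a dense ReLU layer, and a linear output), one possible Keras sketch is given below; the window length, layer sizes, and number of input features are assumptions and are not taken from the cited work.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

WINDOW, FEATURES = 30, 4          # 30 past days of open, close, high, low (assumed)

model = keras.Sequential([
    layers.Input(shape=(WINDOW, FEATURES)),
    layers.LSTM(64, return_sequences=True),   # first LSTM layer
    layers.LSTM(32),                          # second LSTM layer
    layers.Dense(16, activation="relu"),      # dense layer with ReLU
    layers.Dense(1, activation="linear"),     # linear output: next closing price
])
model.compile(optimizer="adam", loss="mse",
              metrics=[keras.metrics.RootMeanSquaredError()])

# X: sliding windows of shape (samples, WINDOW, FEATURES); y: next-day close.
X = np.random.rand(200, WINDOW, FEATURES)     # placeholder data for illustration
y = np.random.rand(200)
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
```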
Customer support has also been automated using artificial intelligence and data science techniques [12]. Businesses no longer need to recruit as many expert customer service representatives thanks to AI technology: a chatbot that is available 24 hours a day, seven days a week can respond to many customer questions, and this approach has proven very popular in recent years across many industries. AI can handle customer questions by drawing on a large knowledge database from which it has learned. This considerably lowers a firm's employment costs, and the same work can be performed for less money, allowing the company to flourish.
Marketing has also been impacted by artificial intelligence (AI) and big data. Data-driven advertising strategies are considerably more effective than human-designed advertising tactics. YouTube, Facebook, Instagram, and other social media websites all leverage artificial intelligence and machine learning technology to offer specific and personalized advertisements; a guitar enthusiast, for example, is more likely to come across a guitar-related advertisement on such networks. In the business realm, artificial intelligence has arguably had its greatest effect on advertising [10].
Table 1 summarizes previous work, particularly that completed in recent years. According to the survey, artificial intelligence can be utilized to help businesses flourish in a variety of ways, from fraud detection to product recommendations. Business analytics is used by stock traders and Bitcoin traders to build predictive models.
3 Cryptocurrency
Bitcoin is a peer-to-peer (P2P) electronic cash system created in 2008 as a non-regulated virtual currency without legal status. It is classed as a type of cryptocurrency because of the cryptographic techniques used in the generation and transfer of funds. Bitcoin has been the most popular currency by trading volume in recent years, making it the leading promising financial medium for investors [18]. It secures transactions by encrypting the sender, receiver, and transaction amount [11].
Ethereum (ETH) is a Turing-complete, decentralized, blockchain-based framework for developing and executing smart contracts and distributed systems [7, 25]. Its coin is called ether. Buterin founded it in 2013, and it was funded a year later with a total of US$18 million in Bitcoins raised via an online public crowd sale. Ether has no restrictions on its movement and may be traded on cryptocurrency exchanges. It is not intended to be a payment system; instead, it is meant to be used within the Ethereum network [5].
Charles Lee created Litecoin (LTC), which was launched in October 2011 and uses technology similar to Bitcoin's. The block generation time is a quarter of Bitcoin's (2.5 min instead of 10 min per block), and the maximum supply has been raised to 84 million, four times that of Bitcoin [15]. Litecoin is regarded as the silver standard of cryptocurrencies and is now the second most widely accepted by both miners and exchanges. It uses the scrypt encryption algorithm, which differs from SHA-256 and was designed to speed up transaction confirmation relative to the Bitcoin network. It also uses an algorithm that is resistant to hardware advancements.
NEM is a peer-to-peer network and blockchain notarization platform that allows customers to send and receive money online. Because it has a mutually owned notarization service, NEM became the first public/private blockchain combination [7]. Ripple is a distributed peer-to-peer payment network owned and maintained by a single company [11]. It was founded by Jed McCaleb and Chris Larsen as an open-source digital currency and adds another layer of security. Ripple was built using the Byzantine Consensus Protocol, and the maximum supply of Ripple is 100 billion.
Stellar, like Ripple, is a secure system built using the Byzantine Consensus Protocol. It has established a new way to execute financial transactions that features open source code and dispersed, distributed ownership [11].
4 Proposed Methodology
All models in this study are fed with time-series data based on five years of daily history; however, this may change depending on the datasets available from the source. The data were generated from the daily open, close, high, and low prices of daily trading between 2013 and 2018 for a total of six distinct cryptocurrencies, obtained from the market capitalization database.
Mastering analysis is important for trading success. Technical analysis and fundamental analysis are strategies for estimating future value. Technical analysis forecasts the future price using trading statistics from the market, such as price and trading volume, while other strategies use information from outside the market, such as economic conditions, interest rates, and geopolitical events, to estimate the future direction [6]. Many traders focus on technical analysis, while others focus on fundamental analysis. Some traders, on the other hand, are interested in the overlap between fundamental and technical analysis. The goal of this work is to use machine learning techniques to provide technical analysis. Machine learning has been established as a serious alternative to classical statistics in the forecasting domain for more than two decades [2, 16]. The proposed methodology is shown in Fig. 1.
Fig. 1 Proposed methodology: data acquisition → processing → supervised learning → predictive model → prediction output results
Advanced artificial intelligence approaches can be used to get more value out of business analytics, for example to:
1. Create strong prediction and classification models using neural networks, genetic algorithms, support vector machines, and fuzzy systems, among other methods;
2. Improve fraud detection, cross-selling, credit score analysis, and profiling;
3. Draw on new case studies and examples from across the company.
Artificial intelligence can help people get more value out of business analytics, account for uncertainty and complexity more effectively, and make smarter decisions. This work covers today's most important artificial intelligence principles, tools, knowledge, and tactics, as well as how to put them to use in the real world. Some of the common terminologies are specified in Table 2.
The performance measures for each cryptocurrency according to the classifiers are presented first in the results section; these serve as a checkpoint for the rest of the discussion. The investigation is divided into two major experiments: (i) the performance measures of the various classifiers and (ii) the Bitcoin value predicted by machine learning algorithms versus the actual value. Figure 3 demonstrates the performance accuracy of the four classifiers on the cryptocurrency market capitalization data. The training and testing splits of our time-series data are shown in Fig. 2 (Table 3).
Table 3 Cryptocurrencies with different training and testing data
Cryptocurrency   Training data (observations)   Testing data (observations)
Bitcoin          1388                           364
Ethereum         526                            364
Litecoin         1358                           364
NEM              657                            364
Ripple           1262                           364
Stellar          896                            364
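A chronological split of the kind summarized in Table 3 (holding out the most recent 364 daily observations of each coin for testing) could be produced as in the sketch below; the column names and the pandas-based layout are assumptions.

```python
import pandas as pd

def chronological_split(df, test_days=364):
    """Use the most recent `test_days` rows for testing; earlier rows for training."""
    df = df.sort_values("date")
    return df.iloc[:-test_days], df.iloc[-test_days:]

# Hypothetical daily OHLC history for one coin
prices = pd.DataFrame({
    "date": pd.date_range("2013-04-28", periods=1752, freq="D"),
    "open": 0.0, "high": 0.0, "low": 0.0, "close": 0.0,
})
train, test = chronological_split(prices)
print(len(train), len(test))   # 1388 364, matching the Bitcoin row of Table 3
```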
References
1. Adhikari S, Thapa S, Shah BK (2020) Oversampling based classifiers for categorization of radar
returns from the ionosphere. In: 2020 international conference on electronics and sustainable
communication systems (ICESC). IEEE, pp 975–978
2. Ahmed NK, Atiya AF, Gayar NE, El-Shishiny H (2010) An empirical comparison of machine
learning models for time series forecasting. Economet Rev 29(5–6):594–621
3. Awoyemi JO, Adetunmbi AO, Oluwadare SA (2017) Credit card fraud detection using machine
learning techniques: a comparative analysis. In: 2017 international conference on computing
networking and informatics (ICCNI). IEEE, pp 1–9
4. Bhatt G (2020) Agriculture and food e-newsletter
5. Buterin V et al (2014) A next-generation smart contract and decentralized application platform.
White Paper 3(37)
6. Chaigusin S (2014) An application of decision tree for stock trading rules: a case of the stock
exchange of Thailand
7. Chuen DLK, Guo L, Wang Y (2017) Cryptocurrency: a new investment opportunity? J Altern
Investments 20(3):16–40
8. Colianni S, Rosales S, Signorotti M (2015) Algorithmic trading of cryptocurrency based on
twitter sentiment analysis. CS229 Project, pp 1–5
9. Cui L, Huang S, Wei F, Tan C, Duan C, Zhou M (2017) Superagent: a customer service chatbot
for e-commerce websites. In: Proceedings of ACL 2017, system demonstrations, pp 97–102
10. Erevelles S, Fukawa N, Swayne L (2016) Big data consumer analytics and the transformation
of marketing. J Bus Res 69(2):897–904
11. Farell R (2015) An analysis of the cryptocurrency industry. Wharton Res Scholars J Paper 130
12. Ghimire A, Thapa S, Jha AK, Adhikari S, Kumar A (2020) Accelerating business growth with
big data and artificial intelligence. In: 2020 fourth international conference on I-SMAC (IoT
in social, mobile, analytics and cloud) (I-SMAC). IEEE, pp 441–448
13. Ghosh A, Bose S, Maji G, Debnath N, Sen S (2019) Stock price prediction using LSTM on
Indian share market. In: Proceedings of 32nd international conference on computer applications
in industry and engineering, vol 63, pp 101–110
14. Gupta S, Singhal A (2017) Phishing URL detection by using artificial neural network with
PSO. In: 2017 2nd international conference on telecommunication and Networks (TEL-NET).
IEEE, pp 1–6
15. Heid A (2013) Analysis of the cryptocurrency marketplace. Retrieved 15 Feb 2014
16. Hitam NA, Ismail AR (2018) Comparative performance of machine learning algorithms for
cryptocurrency forecasting. Ind J Electr Eng Comput Sci 11(3):1121–1128
17. Kim J, Kim J, Thu HLT, Kim H (2016) Long short term memory recurrent neural network
classifier for intrusion detection. In: 2016 international conference on platform technology and
service (PlatCon). IEEE, pp 1–5
18. Krause D, Pham N (2017) Bitcoin a favourable instrument for diversification? A quantitative
study on the relations between bitcoin and global stock markets
Gender Prediction Using Iris Features

1 Introduction
Most research on human authentication has used features extracted from the face. In recent years, texture feature extraction [10] from iris images has drawn attention as a soft biometric attribute for identifying the gender of a person. The major advantage of using soft biometrics is that it helps in faster retrieval of identities when aggregated with the corresponding biometric data. Iris information has been applied effectively in diverse areas such as airport check-in and refugee control [1], and it can be used in cross-spectral matching scenarios [5] when comparing RGB and NIR images. Improving the recognized attributes and accuracy provides additional semantic information about an unfamiliar subject, filling the gap between machine and human descriptions of entities [1].
The iris is well suited for texture feature extraction: it is an internal organ of the eye and thus well protected, yet externally visible from a distance, unique, and highly complex in pattern. The pattern is stable over a lifetime except for pigmentation changes. Images of the iris are taken in visible and near-infrared light. The outer layer of the eye, which includes the sclera and cornea, is fibrous and protective; the middle layer, which includes the choroid, ciliary body, and iris, is vascular; and the innermost layer, which includes the retina, is nervous or sensory [11].
The major challenges in extracting iris information are the distance between the camera and the eyes, occlusion by the eyelid and eyelashes, eye rotation, and the lighting conditions when acquiring the image. A camera placed at a distance will capture inconsistent iris sizes. Occlusion by eyelids and eyelashes may result in inappropriate and/or insufficient features.
B. Patil (B)
Gulbarga University, Kalaburagi, India
e-mail: [email protected]
M. Hangarge
Department of Computer Science, KASC College Bidar, Bidar, India
The variation in light causes pupil dilation, which affects the segmentation method. Eye rotation or head tilt adds intra-class variation to the segmentation process.
The aim of this paper is to study the dependencies of gender prediction on several factors: whole eye image versus normalized iris image, the split ratio between training and testing data, traditional machine learning models versus neural network models for feature extraction and classification, and small versus augmented datasets. The rest of the paper discusses the general gender prediction steps, related work in gender prediction using iris images, a discussion of the results, and the conclusion of the work.
2 General Steps
3 Related Work
Thomas et al. [18] published the first paper on gender prediction from geometric and texture features of iris images. The researchers combined the CASIA, UPOL, and UBIRIS datasets (a total of 57,137 images) with an equal distribution of both genders, generated a feature vector by applying 1D Gabor filters to the iris image normalized using Daugman's rubber sheet method, used information gain for feature selection, and later applied the C4.5 decision tree algorithm for classification.
Table 1 (continued)
S. No.  Dataset                  Creator                                                                    Size (no. of images)  Resolution  Format  Remark
10      WVU                      The research group at West Virginia University, USA                       1852                  –           –       Synthetic iris dataset collection
11      ND (GFI) Iris database   The Computer Vision Research Lab (CVRL) at University of Notre Dame, USA  64,980                –           –       –
12      IIT Delhi                IIT Delhi, New Delhi, India                                                1120                  320 × 240   BMP     –
Initially, the authors had used SVM and neural networks for classification; however, they could not get better results than with the decision tree technique. The authors achieved 75% accuracy and enhanced it to 80% by combining bagging and random subspaces with a decision tree. Here, the authors considered only the left iris for the experimentation.
Lagree and Bowyer [9] carried out gender prediction based on SVM classifier training. The classification is based on features generated by applying simple texture feature extraction methods, such as spot detectors, line detectors, and Laws texture features, to a normalized iris image of size 40 × 240, with occlusions such as eyelids and eyelashes eliminated. The accuracy achieved using twofold, fivefold, and tenfold cross-validation with the Weka SMO SVM classifier was about 62%. The authors attribute their lower accuracy relative to Thomas et al. to the smaller size of their dataset. The researchers used the same dataset for predicting both gender and ethnicity.
Tapia et al. [16] claimed an accuracy of 91.33% in gender prediction using an SVM classifier with uniform LBP and conventional LBP features on a subject-disjoint dataset for training and testing, also using tenfold validation. Gabor filters were applied to the normalized image, which was then transformed into a binary iris code with four levels, considered more stable iris information for predicting gender.
Tapia et al. [17] clarified that in [16] the authors had used 1500 images from unique subjects with incorrect labels, and that the 91% accuracy achieved there was due to overlapping training and testing datasets. In [17], disjoint train-test sets were created with respect to the subjects, mutual information measures (mRMR, CMIM, weighted mRMR, and weighted CMIM) were used for feature selection, and the statistical significance of the gender information distribution across the different bands of the iris was tested using ANOVA. In this work, the three datasets used are the UND Dataset, the ND-Gender-From-Iris (NDGFI) Dataset, and a subject-disjoint validation set (UND V). The authors observed that CMIM gives better accuracy than mRMR and obtained 89% prediction accuracy by fusing the best features from the left and right iris codes.
Tapia and Aravena [14] proposed a modified LeNet-5 CNN model to achieve a better gender prediction rate. The modified network consists of four convolution layers and one fully connected layer with a minimum number of neurons. A minimum number of neurons is used to reduce the risk of over-fitting while solving the two-class gender prediction problem. The authors adopted data augmentation to increase the dataset size from 1500 to 9500 images for each eye. The authors conclude that the fusion of CNNs for the right and left eyes gives better prediction than a single eye separately.
Tapia and Perez [14] used a 2D quadrature quaternionic filter for classification, replacing the 1D log-Gabor filter with 2D Gabor filters. The 2D Gabor filters encode the normalized image phase information using 4 bits per pixel. The authors conducted five experiments. The first used all the features from the normalized image for classification, and the other experiments were built over this model. The second experiment used transfer learning with a VGG19 model for extracting features. The next experiment applied a genetic algorithm to select blocks of the normalized images and used raw pixel values, principal component analysis (PCA), and local binary patterns (LBP) as features. The fourth experiment used different variants of mutual information for feature selection and applied SVM and ten ensemble classifiers for classification. In the last experiment, gender classification was done by encoding the images with a quaternionic code (QC) using 3 and 4 bits per pixel; 4 bits per pixel showed better results than 3 bits per pixel. The authors achieved a maximum accuracy of 95.45% for gender prediction.
Tapia and Arellano [15] proposed modified binary statistical image features (mBSIF) for gender prediction. The experiments were carried out with filter sizes ranging from 5 × 5 to 13 × 13 and numbers of bits from 5 to 12; the 11 × 11 filter showed the best prediction accuracy for the mBSIF histogram, with 94.66% for the left eye and 92% for the right eye at 10 bits per pixel.
Bobeldyk and Ross [1] attempted to determine which of the extended ocular region, the iris-excluded ocular region, the iris-only region, and the normalized iris-only region yields the best gender prediction accuracy. The authors used BSIF codes for feature extraction and applied an SVM classifier to classify males and females. They made a geometric adjustment so that the iris was at the centre of the image and tessellated it into blocks. The histogram of BSIF codes was then evaluated for each region; the histograms were normalized before being concatenated into a feature vector, and the resulting feature vector was used for classification. The authors also observed the prediction accuracy for varying BSIF window sizes and obtained an accuracy of 85.7%. The BioCOP2009 Dataset was used for this research.
Bobeldyk and Ross [2, 3] expanded their earlier work [1] by considering local binary pattern (LBP) features along with BSIF features and were able to achieve a maximum accuracy of 87.9%. The authors also observed the impact of the number of bits in the BSIF code on computational time and memory, studied the impact of race on gender prediction, and tested the results in cross-dataset settings. They used three different datasets (the BioCOP2009 Dataset, the Cosmetic Contact Dataset, and the GFI Dataset) for their research.
Bobeldyk and Ross [4] investigated the impact of image resolution on gender prediction without reconstructing the low-resolution image into a high-resolution one, using the BioCOP2009 and Cosmetic Contact datasets. In this work, the researchers used the BSIF code with an SVM classifier and a CNN-based classifier and observed 72.1% and 77.1% accuracy, respectively, for 30-pixel images. The authors used small networks with fewer neurons for the CNN, as the input image size is small and needs fewer training samples. They also carried out experiments on gender prediction accuracy by varying the image size from 340 × 400 down to 2 × 3 and concluded that 5 × 6 ocular images still contain gender information, with reduced complexity.
Singh et al. [12] utilized a variation of an auto-encoder in which the attribute class label is included in conjunction with the reconstruction layer. They used NIR ocular images scaled down to 48 × 64 pixels, with the GFI and ND-Iris-0405 datasets. The authors applied RDF and NNet classifiers and achieved an accuracy of 83.17%. They claim that the deep class-encoder only
takes a quarter of the overall training time, and their results outperform the outcomes
of Tapia et al. [17].
Sreya and Jones [13] used the IITD Dataset for investigation and an ANN for iris recognition. The authors explain the steps involved in recognition in detail. The experiments were carried out on cropped NIR images to locate the pupil region. The authors conclude that the prediction accuracy depends on the processing steps.
Kuehlkamp and Bowyer [7] investigated the impact of mascara on iris-based gender prediction. They obtained 60% gender prediction accuracy using only the occlusion mask from each image and 66% accuracy when LBP was used in conjunction with an MLP network. They were also able to attain up to 80% accuracy using the complete eye image with CNNs and MLPs. The authors used the GFI Dataset and classified subjects as Males, Females With Cosmetics (FWC), and Females No Cosmetics (FNC).
In this work, the experiments are conducted using different approaches in order to identify suitable criteria for gender prediction. We have used two publicly available datasets: the IITD Dataset [8] with an image size of 320 × 240 and the SDUMLA-HMT Dataset with 768 × 576. Both datasets have fewer female eye images than male eye images, so the eye images were augmented to generate 11,512 male and 11,906 female eye images to meet the experimentation requirements.
Initial experiments were conducted using traditional machine learning classification methods based on normalized iris texture features, as shown in Fig. 1 and as cited in the literature study. We used local binary pattern (LBP) and Gabor filter-based feature extraction methods to obtain texture features from the normalized iris image, and used SVM and random forest for classification. The experiments were carried out using the IITD Dataset [8] and the SDUMLA-HMT Dataset; the results are given in Table 2, and SVM with Gabor features shows the best results.
The next experiment used a dense neural network for classification with 20% dropout and a convolutional neural network for feature extraction from the whole eye image and from normalized iris images. The deep neural network gives an accuracy of 73.96% and 90.97% for the SDUMLA-HMT Dataset when trained using the whole eye image and the normalized iris image, respectively, and an accuracy of 98.92% for the normalized IITD Dataset.
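A rough Keras sketch of such a network, a small convolutional feature extractor followed by a dense classification head with 20% dropout, is shown below; the layer sizes and the assumed input shape of the normalized iris image are illustrative rather than the exact configuration used in these experiments.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(64, 512, 1)),            # normalized iris strip (assumed size)
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.2),                         # 20% dropout, as in the experiments
    layers.Dense(1, activation="sigmoid"),       # male / female
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```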
Another experiment was conducted by varying the split ratio of training and testing data. The split was done as 60:40, 80:20, and 90:10 for training and testing; the support vector machine (SVM) shows better results for the smaller dataset, and the deep neural network shows better accuracy for the larger dataset, independent of the split ratio of training and testing data, as shown in Table 2.
5 Conclusion
The experiments were carried out to study feature extraction and classification methods appropriate for gender prediction. The results in Table 2 show that SVM gives better outcomes for the smaller dataset, independent of the feature extraction method. Gender prediction accuracy increases when normalized iris images are used as input to the feature extraction methods, and Gabor filter-based feature extraction shows better gender prediction accuracy. The neural network model was trained for gender prediction using whole eye images and normalized iris images; the gender prediction accuracy is higher for normalized input with a larger dataset size. It is observed that SDUMLA-HMT images contain the full eye, including eyelids, are noisy, and the pupil is not at the centre of the image, which makes iris localization and normalization more challenging. Hence, the IITD Dataset shows good accuracy compared with the SDUMLA-HMT Dataset, as its images are focused on the region of interest with minimum noise. Further, the same setup can be used to predict other soft biometric attributes such as age and ethnicity.
References
1. Bobeldyk D, Ross A (2016) Iris or periocular? Exploring sex prediction from near infrared
ocular images. Lecture notes in informatics (LNI). In: Proceedings—series of the Gesellschaft
Fur Informatik (GI), p 260. https://doi.org/10.1109/BIOSIG.2016.7736928
2. Bobeldyk D, Ross A (2018a) Predicting eye color from near infrared iris images. In: Proceed-
ings—2018 international conference on biometrics, ICB 2018, pp 104–110. https://doi.org/10.
1109/ICB2018.2018.00026
3. Bobeldyk D, Ross A (2018b) Analyzing covariate influence on gender and race prediction from
near-infrared ocular images. https://doi.org/10.1109/ACCESS.2018.2886275
4. Bobeldyk D, Ross A (2019) Predicting soft biometric attributes from 30 pixels: a case study
in NIR ocular images. In: Proceedings—2019 IEEE winter conference on applications of
computer vision workshops, WACVW 2019, pp 116–124. https://doi.org/10.1109/WACVW.
2019.00024
5. Dantcheva A, Elia P, Ross A (2016) What else does your biometric data reveal? A survey on
soft biometrics. IEEE Trans Inf Forensics Secur 11(3):441–467. https://doi.org/10.1109/TIFS.
2015.2480381
6. Daugman J (2004) How iris recognition works. IEEE Trans Circuits Syst Video Technol
14(1):21–30. https://doi.org/10.1109/TCSVT.2003.818350
7. Kuehlkamp A, Becker B, Bowyer K (2017) Gender-from-iris or gender-from-mascara? http://
arxiv.org/abs/1702.01304
8. Kumar A, Passi A (2008) Comparison and combination of iris matchers for reliable personal
identification. In: 2008 IEEE computer society conference on computer vision and pattern
recognition workshops, CVPR Workshops, vol 43, pp 1016–1026. https://doi.org/10.1109/
CVPRW.2008.4563110
9. Lagree S, Bowyer KW (2011) Predicting ethnicity and gender from iris texture
10. Majumdar J, Patil BS (2013) A comparative analysis of image fusion methods using texture.
Lecture notes in electrical engineering, 221 LNEE, vol 1, pp 339–351. https://doi.org/10.1007/
978-81-322-0997-3_31
11. Ramlee RA, Ranjit S (2009) Using iris recognition algorithm, detecting cholesterol presence.
In: Proceedings—2009 international conference on information management and engineering,
ICIME 2009, pp 714–717. https://doi.org/10.1109/ICIME.2009.61
12. Singh M, Nagpal S, Vatsa M, Singh R, Noore A, Majumdar A (2018) Gender and ethnicity
classification of Iris images using deep class-encoder. In: IEEE international joint conference
on biometrics, IJCB 2017, pp 666–673. https://doi.org/10.1109/BTAS.2017.8272755
13. Sreya KC, Jones BRS (2020) Gender prediction from iris recognition using artificial neural
network (ANN). www.ijert.org
14. Tapia J, Aravena CC (2018) Gender classification from periocular NIR images using fusion of
CNNs models. In: 2018 IEEE 4th international conference on identity, security, and behavior
analysis, ISBA 2018, pp 1–6. https://doi.org/10.1109/ISBA.2018.8311465
15. Tapia JE, Perez CA (2019) Gender classification from NIR images by using quadrature encoding
filters of the most relevant features. IEEE Access 7:29114–29127. https://doi.org/10.1109/ACC
ESS.2019.2902470
16. Tapia JE, Perez CA, Bowyer KW (2015) Gender classification from iris images using fusion of
uniform local binary patterns. Lecture notes in computer science (including subseries lecture
notes in artificial intelligence and lecture notes in bioinformatics), vol 8926, pp 751–763.
https://doi.org/10.1007/978-3-319-16181-5_57
17. Tapia JE, Perez CA, Bowyer KW (2016) Gender classification from the same iris code used for
recognition. IEEE Trans Inf Forensics Secur 11(8):1760–1770. https://doi.org/10.1109/TIFS.
2016.2550418
18. Thomas V, Chawla NV, Bowyer KW, Flynn PJ (2007) Learning to predict gender from iris
images
Hardware Implementation of an Activation Function for Neural Network Processor
S. Mayannavar, Nitte Meenakshi Institute of Technology, Yelahanka, Bangalore 560064, India; e-mail: [email protected]
U. Wali, C-Quad Research, Desur IT Park, Belagavi 590014, India
1 Introduction
Processor design is undergoing a grand upheaval. A large part of this evolution is due to the introduction of neural computation in virtually every field of daily life. The computational requirements of a neural network processor (NNP) are vastly different from those of conventional processors. The workload of an NNP requires implementation of
massively parallel compute cores. The network architecture tends to depend on the
application area, and hence, there is a need to develop customized processors. Many
companies are using their own processors to implement such hardware. For example,
a special purpose processor for software defined radio and heuristic cognitive radio
algorithms has been reported by Saha et al. in [1]. Use of graphic processor units
(GPU) has shown significant performance advantages [2]. Many new neural archi-
tectures are being proposed to address domain-specific needs like robotic motion
control [3], image recognition [4] etc. A domain-specific instruction set architecture
(ISA) for neural accelerators, called Cambricon, is reported by Liu [5]. Its load-
store architecture integrates scalar, vector, matrix, logical, data transfer and control
instructions, based on a comprehensive analysis of existing neural networks (NN).
Recently, Intel has revealed a new processor called Nervana NNP capable of
performing tensor operations at processor level [6]. Nervana NNP uses a fixed-point
number format named flexpoint that supports a large dynamic range using a shared
exponent. Mantissa is handled as a part of the op-code. IBM has developed a brain-
like chip called TrueNorth, with 4096 processor cores, each capable of emulating 256
neurons with 256 synapses each. Neurons and synapses are two of the fundamental
biological building blocks that make up the human brain. Hence, the chip mimics one
million human neurons and 256 million synapses [7]. DynamIQ is a new technology
for ARM Cortex. It has dedicated processor instructions for artificial intelligence
(AI) and machine learning (ML) with faster and more complex data processing.
Wathan [8] has reported that DynamIQ performs AI computations with 50× higher performance compared to Cortex-A73 systems. It supports flexible computation with up to 8 cores on a single SoC cluster, where each core can have different performance and power characteristics. Therefore, it is not difficult to foresee a situation where many companies would like to design their own processors to support deep learning neural networks.
Implementations of the sigmoid function using piecewise linear (PWL) approximation [9] and a second-order nonlinear function (SONF) with a look-up table [10] have been reported, with a maximum error of 0.002 between software- and hardware-based computation. Similar work using PWL approximation is reported in [11].
In this paper, improved implementations using two-point (PWL) approximation and second-order interpolation (SOI) are reported. For PWL, maximum errors of 0.0974% and 0.0238% were observed with uniform and non-uniform spacing, respectively. Further reduction in error, down to 0.0005%, has been observed using SOI. These modules
use a specific number format suitable for implementing the activation function. The
module has two modes of operation with user selectable speed and error behavior.
Programmers will be able to shift between these modes depending on the end appli-
cation. This work is part of an ongoing work on a design framework for development
of special purpose processors [12].
One of the most frequently used operations in the neural networks is the activation
function. It is possible to define various activation functions but the S-shaped sigmoid
function is most widely used because of its gradient descent properties [13]. Sigmoid
can be defined by the formula given below:
\[ y = \frac{1}{1 + e^{-x}} \]  (1)

\[ y = \frac{1}{1 + 1 - x + \frac{x^{2}}{2!} - \frac{x^{3}}{3!} + \frac{x^{4}}{4!} - \cdots} \]  (2)
Consider the x–y plane as shown in Fig. 2. Points (x1, y1) and (x2, y2) are two known points on the curve. The sigmoid of x can be interpolated using the two-point approximation given in (3).

\[ y = \left( \frac{y_{2} - y_{1}}{x_{2} - x_{1}} \right)(x - x_{1}) + y_{1} = m(x - x_{1}) + y_{1} \]  (3)
The stored values of (x1, y1) and m are obtained from a look-up table such that the given value of x lies between two stored (x, y) points. This approximation may be carried out with both uniform and non-uniform spacing between the points.
Uniform spacing of 0.5
The two known points are assumed to be at a uniform distance of 0.5. For example, the sigmoid of x between x1 = 0 and x2 = 0.5 is depicted in Fig. 3. This has a maximum error deviation of 0.3838%. Note that Fig. 3 shows the error scaled up by 100 for clarity.
Fig. 2 Two-point approximation of a curve
Uniform spacing of 0.25
By decreasing the interval between points, we can improve the accuracy. For example, Fig. 4 shows the two-point approximation with a uniform spacing of 0.25. This has a maximum error deviation of 0.0974%, an improvement of nearly 75%. Here also the error is scaled by 100 for clarity.
Non-uniform spacing
From Figs. 3 and 4, it is observed that the gradient of the sigmoid curve is significant for |x| < 6, and therefore, non-uniform spacing gives a better approximation with fewer computations compared to uniform spacing. Figure 5 shows the approximation of a curve with non-uniform spacing. A maximum error of 0.0238% is noted.
Look-Up Table format
From Eq. 3, it is clear that we will need three variables to be stored per point. If we store them as a tuple {x, y, m} sequentially, and assuming the 16-bit format suggested in Sect. 3.1, we will need fairly little memory. For a 15-point approximation, we will need 16 × 3 × 15 = 720 bits, or only 90 bytes. Reducing the number of points in the LUT reduces the memory requirement but affects the accuracy.
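As a rough illustration of this two-point scheme, the following Python sketch builds the {x, y, m} table for the sigmoid and evaluates Eq. (3). The uniform spacing of 0.25, the cut-off at x = 6, and the use of sigmoid symmetry for negative inputs are assumptions made for the example; the hardware would store each entry in the 16-bit format rather than as Python floats.

import math

SPACING = 0.25          # uniform spacing between stored points (illustrative)
X_MAX = 6.0             # sigmoid is taken as 1 for x >= 6, as in the text

def build_lut():
    # Build one {x, y, m} tuple per interval, as described above.
    xs = [i * SPACING for i in range(int(X_MAX / SPACING) + 1)]
    ys = [1.0 / (1.0 + math.exp(-x)) for x in xs]
    lut = []
    for (x1, y1), (x2, y2) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        m = (y2 - y1) / (x2 - x1)
        lut.append((x1, y1, m))
    return lut

LUT = build_lut()

def sigmoid_pwl(x):
    if x < 0:
        return 1.0 - sigmoid_pwl(-x)     # sigmoid(-x) = 1 - sigmoid(x)
    if x >= X_MAX:
        return 1.0
    x1, y1, m = LUT[int(x / SPACING)]
    return m * (x - x1) + y1             # Eq. (3): y = m(x - x1) + y1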
Fig. 6 Three-point approximation of a curve
The coefficients a, b and c are calculated by solving the Eqs. (4)–(6), simultane-
ously. The simplified equations for a, b and c are given in (7)–(9).
The pre-computed values of a, b and c are loaded from the look-up table, and the
sigmoid of any given x is computed using Eq. (10). This method is carried out for
uniform spacing between x = 0 and x = 6. The sigmoid value is assumed to be 1 for
x ≥ 6. This method has maximum error of 0.0030%. The graph of SOI with uniform
spacing of 0.125 is shown in Fig. 7. For visibility purpose, the error is multiplied by
10,000.
In Fig. 7, we can see that for values of x greater than 6, the curve remains at 1, which means that the gradient of the curve is significant only for values less than 6; hence, there is no need for any calculation for x ≥ 6.
As we can see from Fig. 7, the error is maximum at x = 0.05. This can be reduced by decreasing the interval between 0 and 1 to 0.0625 and by keeping the interval outside this range at 0.125. This non-uniform spacing reduces the error down to 0.00057%, as shown in Fig. 8.
The error deviation is calculated using the formula given in Eq. (11) below.
The basic operations required to implement the sigmoid functions are addition and
multiplication. The algorithm to calculate the sigmoid function is explained in Table
1. Input x is compared with the known values of (x 1 , y1 ), (x 2 , y2 ) and (x 3 , y3 ). The
corresponding a, b and c values are loaded from the look-up table which are used to
evaluate the sigmoid.
The basic idea of implementing the sigmoid function is explained with the help
of finite state machine as shown in Fig. 9.
The FSM shown in Fig. 9 has six states, viz. idle, start, linear search, multiplier, adder and exit. In the idle state, if there is a load command, the input is loaded into the internal register and the state changes to linear search. In the linear search state, the input x is compared with the stored (x, y) values; when x falls within one of the stored intervals, the corresponding a, b and c values are loaded from the memory and the state changes to the multiplier state. There are two multiplication and two addition steps involved. In the first step of multiplication, ax is computed, and in the first step of addition, (ax + b) is computed. In the second step of multiplication, (ax + b)x is computed. Finally, in the last step of addition, (ax + b)x + c is computed and the process terminates.
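The following Python sketch mirrors these FSM steps under stated assumptions: the closed-form expressions for a, b and c are one standard way of solving Eqs. (4)–(6) for a parabola through three points on the sigmoid (the paper's simplified forms (7)–(9) are not reproduced here), and the evaluation follows the ax, ax + b, (ax + b)x, (ax + b)x + c sequence of the FSM.

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fit_parabola(x1, x2, x3):
    # Fit y = a*x^2 + b*x + c through three points on the sigmoid curve
    # (one way of solving Eqs. (4)-(6) simultaneously).
    y1, y2, y3 = sigmoid(x1), sigmoid(x2), sigmoid(x3)
    denom = (x1 - x2) * (x1 - x3) * (x2 - x3)
    a = (x3 * (y2 - y1) + x2 * (y1 - y3) + x1 * (y3 - y2)) / denom
    b = (x3 ** 2 * (y1 - y2) + x2 ** 2 * (y3 - y1) + x1 ** 2 * (y2 - y3)) / denom
    c = (x2 * x3 * (x2 - x3) * y1 + x3 * x1 * (x3 - x1) * y2
         + x1 * x2 * (x1 - x2) * y3) / denom
    return a, b, c

def soi_eval(x, a, b, c):
    # The four FSM compute steps: ax, ax + b, (ax + b)x, (ax + b)x + c
    step1 = a * x
    step2 = step1 + b
    step3 = step2 * x
    return step3 + c

# Example: coefficients for the interval [0, 0.25] with mid-point 0.125
a, b, c = fit_parabola(0.0, 0.125, 0.25)
print(soi_eval(0.1, a, b, c), sigmoid(0.1))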
Optimized bit format
The neural network structure deals with real-valued inputs, and therefore, there is a need to implement a floating-point arithmetic unit. In order to get precision up to 4 significant figures, we have fixed 12 bits for the fraction, as shown in Fig. 10. For example, the number 0.0125 is represented in the optimized bit format as 0000000000110011b.
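A minimal sketch of this quantization, assuming the remaining bits hold the integer part and ignoring negative values for simplicity (the text specifies only the 12-bit fraction):

FRAC_BITS = 12
SCALE = 1 << FRAC_BITS               # 2^12 = 4096 fraction steps

def to_fixed(value):
    # Quantize a non-negative real number to the 12-bit-fraction word.
    return int(round(value * SCALE))

def from_fixed(word):
    # Recover the (approximate) real value from the fixed-point word.
    return word / SCALE

word = to_fixed(0.0125)
print(format(word, "016b"))          # -> 0000000000110011, matching the text
print(from_fixed(word))              # -> 0.012451..., a small quantization error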
Errors due to Optimized bit format
The optimized bit format has 12 bits to represent the fraction part of a floating-point number. Due to this restriction, there will be a small amount of numerical error in the calculated results. Tables 2 and 3 summarize the errors due to the optimized bit format for both the two-point and three-point approximation methods.
It can be concluded from Tables 2 and 3 that the computational accuracy can be
controlled by selecting the interval. Increasing the number of intervals improves
accuracy but also increases the size of LUT, providing a trade-off between the two.
However, using non-uniform spacing of points, this problem can be overcome to
a certain extent. The other aspect of calculation of the activation function is the
speed with which computations can be carried out. Table 4 shows that the number
of mathematical operations reduces considerably for interpolation methods over
direct computation. Therefore, the time required for the computation is considerably
reduced.
The data presented in this paper shows that interpolation methods can be used to achieve sufficiently accurate computation of the activation function. The computational complexity is also considerably reduced for the proposed approximations. Therefore, hardware modules could be implemented using the methods discussed in the above sections. The SOI approximation is implemented using Verilog. With a serial multiplier, a maximum of 30 clock cycles is required to compute the sigmoid function. Therefore, these methods can be used to achieve better performance with reduced computational complexity.
There are several other types of nonlinear functions used in deep neural networks. For example, auto resonance networks, radial basis functions, self-organizing maps, etc., use other nonlinear functions which can be implemented using similar interpolation methods.
Acknowledgements The authors would sincerely like to thank C-Quad Research, Belagavi, for
all the help and support.
References
1 Introduction
2 Literature Review
The research and development that pertain to sustainable transportation are meant to eventually uphold the self-explanatory term sustainable transportation. Without it, we would not know where to begin, and we would also miserably fail at spreading the word about it and convincing people to pursue it. To be specific, if the higher-ups do not clearly know what the proposed transportation model is, it is near impossible for them to go about spreading awareness about it.
This additionally leads to the issue of not having a concrete structure around which other policies and programs can be created [3]. There have always been a lot of research programs on sustainable development, but nothing specific to an exclusive market such as transportation. Many entities have managed to adopt the aforesaid definition as their own at various commissions. It provides more information on the existing research and development that has gone into the system by various means. All prototypes have been designed and developed based on partial information, and no concrete references have been adhered to, simply because none exists. For each review, multiple definitions were observed and arrived upon. But now, with a better understanding of the scenario, a rough yet stable discussion is in place [4].
Gamification can be defined as the use of simple game design elements and principles in a non-gaming environment. It is based on a simple flow of data, a thought-driven operation which requires the person engaged in an activity to be completely immersed in the process of doing that activity. It is usually adopted to increase and improve user engagement, productivity, control over the activity, ease of understanding, and other organizational criteria [5]. A lot of research has been carried out on this, and it was observed to have positive effects on users. It is also said to help improve one's ability to process digital content and analyze the area of study.
Sustainable transportation models should include the following phenomena that
help understand the proposed model better:
• Exhaustion of natural resources
• Atmospheric impacts
• Threat to mankind
• Air quality index
• Space constraints
• Equality.
Collectively, from all the work above, it is practically still not possible to define
sustainable transportation. This also leads us to the following observations:
• The concept of sustainable transportation is based on a sustainable development
approach.
• Sustainable transportation is a balance of multiple entities. At its simplest, ST
should be a direct contributor to the local economy of any given place.
• To help define it better, the systems are required to somehow help us understand what sustainable and unsustainable really equate to in terms of numbers. A concrete model for the same has to be defined mathematically before it can even be adopted.
• The system should place more emphasis on the reforms, the governing body, and the interdependence on other factors and sectors in society.
• The absence of a definition should not stop people from promoting it.
3 Methodology
Gamification elements and techniques intend to bring out people's natural instincts to socialize, learn, master, compete, achieve status, express themselves, observe, and find closure. The first gamification policies and strategies used rewards for users who managed to accomplish intended tasks or compete, to ensure player engagement. The various types of rewards offered were points, badges, levels, progress-bar-based status, or allowing players to redeem coupons or virtual currency. Allowing players to view the rewards and accomplished tasks of other players instantly induces competitiveness to outdo one another [6]. This led to dynamic leaderboards and other UI-based frameworks to encourage more and more players to use the system. There have also been instances of clashes between two or more players during the use of these frameworks, which result from unethical behavior on the players' part, from not being sporting enough to understand the motive at hand, or from something as simple as demographic disadvantages, such as those faced by a woman player. Best-practice gamification designs try to refrain from using this element [7] (Fig. 1).
Another way of using gamification is to make existing operations and tasks feel more like reward-based games. Some examples where this approach can be integrated are virtual onboarding of employees, increasing context screen engagement time, quick narratives, and additional choices.
Creating new ways of harnessing data as well as giving back value is the only way we can understand the needs of users better. In the transportation industry, the vast majority of participants are end users whose objective is to move from point A to point B, beyond which there are much more complex needs for organizations and their operations [8]. There is a distinction between how the data can be used to create more convenience for the users and how it can be used to understand their needs at the same time.
In the business-to-consumer segment, one of the important things is cost and fuel
optimization, something that every end consumer really wants as they want to save
money. That is something that data can be extensively used for: to detect driving
patterns, for example, to suggest more fuel-optimized routes and greener ways of
traveling. Those are all sort of key values where big data can be used, and today, it
is totally doable and possible.
The future of efficient transportation is highly multi-modal and highly multi-
dimensional. As we all know, multi-modal means using many means of transport
[9]. This has been in existence for a few decades now. But multi-dimensional means that when you change the mode of transport, you also change the user experience, the data flow and the pricing structures; there are so many variations that happen when you change modes.
This multi-dimensional transport application is going to be the next big thing in the
transport industry. The efficiency is one side of the coin, and the other side of the coin
is user experience. People want to choose the most interesting way of traveling [10].
It does not always need to be the most convenient or the cheapest. Of course, cost is
a high criterion, but not the only one—people also want a comfortable journey, they
want a nice experience, and most importantly, people want to be engaged (Fig. 2).
The general form of a notification is <gamifiableActionID, playerID, timeStamp,
parametersMap> where gamifiableActionID is a unique ID and the parametersMap
contains a set of key/value pairs that are specific to that gamifiable action [11]. The
wrapping layer issues notifications on behalf of the wrapped information and commu-
nications technology systems through a simple actionPerformed service interface.
Moreover, the wrapping layer enables strongly decoupled interactions between the native Smart City functionalities involved in a specific game and the component responsible for executing that game and managing its status.
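As a rough illustration of this notification format, the sketch below models the tuple as a Python data class; the concrete field types, the actionPerformed placeholder body, and the bicycle-trip example values are assumptions, not part of the cited system.

from dataclasses import dataclass, field
from typing import Any, Dict
import time

@dataclass
class Notification:
    gamifiableActionID: str                    # unique ID of the gamifiable action
    playerID: str
    timeStamp: float
    parametersMap: Dict[str, Any] = field(default_factory=dict)

def actionPerformed(notification: Notification) -> None:
    # Wrapping-layer service: forward the notification to the component that
    # executes the game and manages its status (placeholder body).
    print(f"{notification.gamifiableActionID} performed by {notification.playerID}")

actionPerformed(Notification("bike_trip", "player-42", time.time(),
                             {"distanceKm": 3.2, "co2SavedKg": 0.6}))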
4 Future Enhancements
The proposed model could be used to benchmark and evaluate transportation sustainability within the general existing framework. Since integrating sustainability is in itself self-explanatory, guides about it for transport planning could be formed. Various options for inter-state departmental collaborations could give rise to much more feedback-oriented development. Efforts to promote some aspects of non-urban transportation too could pave the way for last mile connectivity. Lastly, a one-stop portal to provide development and sustainable transportation could be deployed [12].
5 Conclusion
References
1. How will big data impact the future of transportation?—Part 3 by Vinay Venkatraman. https://
www.move-forward.com/how-will-big-data-impact-the-future-of-transportation-part-3/
2. The role of gamification and big data in today’s world of business. Brigg Patten. https://www.hr.
com/en/app/blog/2016/05/the-role-of-gamification-and-big-data-in-todays-wo_inuulx6g.html
3. Kazhamiakin R, Marconi A, Perillo M, Pistore M, Piras L, Avesani F, Perri N (2015) Using
gamification to incentivize sustainable urban mobility. Trento, Italy 38123
4. Zhou J (2012) Sustainable transportation in the US: a review of proposals, policies, and
programs since 2000. Frontiers Architect Res 1:150–165. https://doi.org/10.1016/j.foar.2012.
02.012
5. Steg L (2007) Sustainable transportation. IATSS Res 31:58–66. https://doi.org/10.1016/S0386-
1112(14)60223-5
6. Bamwesigye D, Hlavackova P (2019) Analysis of sustainable transport for smart cities.
Sustainability 11. https://doi.org/10.3390/su11072140
7. Bran F, Burlacu S, Alpopi C (2018) Urban transport of passengers in large urban agglomerations
and sustainable development. Experience of Bucharest municipality in Romania. Eur J Sustain
Dev 7. https://doi.org/10.14207/ejsd.2018.v7n3p265
8. https://www.researchgate.net/publication/281377423_Using_Gamification_to_Incentivize_
Sustainable_Urban_Mobility
9. Giffinger R, Haindlmaier G, Kramar H (2010) The role of rankings in growing city competition.
Urban Res Pract 3(3):299–312
10. Nam T, Pardo T (2011) Conceptualizing smart city with dimensions of technology people
and institutions. In: Proceedings of the 12th annual international digital government research
conference: digital government innovation in challenging times, pp 282–291
11. Merugu D, Prabhakar B, Rama N (2009) An incentive mechanism for decongesting the roads:
a pilot program in Bangalore. In: Proceedings of ACM NetEcon workshop 2009
12. Gabrielli S, Maimone R, Forbes P, Masthoff J, Wells S, Primerano L et al (2013) Designing
motivational features for sustainable urban mobility. In: CHI’13 extended abstracts on human
factors in computing systems, pp 1461–1466
Optical Character Recognition System
of Telugu Language Characters Using
Convolutional Neural Networks
K. V. Charan, Department of Computer Science and Engineering, Shridevi Institute of Engineering and Technology, Tumkur, Karnataka, India; e-mail: [email protected]
T. C. Pramod, Department of Computer Science and Engineering, Siddaganga Institute of Technology, Tumkur, Karnataka, India
1 Introduction
Telugu is spoken in many states like Andhra Pradesh and Telangana (official language), some parts of Puducherry, and the Andaman and Nicobar Islands. It is also one of the six languages declared as classical languages by the Indian government. The Telugu language consists of 52 letters (shown in Fig. 1): 16 vowels and 36 consonants. There are numerous documents that contain both Telugu and English jointly. Some of these documents are Aadhar card acknowledgments, reports such as death and birth certificates, railway reservation forms, etc.
Deep learning is a data mining procedure that utilizes deep neural network architectures, which are particular kinds of machine learning and AI algorithms. There are three network types that permit deep learning to resolve different problems:
• Fully Connected Neural Networks
• Convolutional Neural Networks
• Recurrent Neural Networks.
These neural networks are standard networks utilized in various applications. Fully connected means every neuron in the previous layer is connected to each neuron in the following layer. Feed-forward implies that neurons in a former layer are connected only to neurons in a succeeding layer. Every neuron has an activation function that transforms its output. It is possible to build a network with various inputs, different outputs, distinct hidden layers, and separate activation functions. Different combinations of these networks allow us to build powerful networks that solve a wide range of problems.
CNN is a kind of deep neural network architecture intended for specific tasks like the classification of images. In this process, feature extraction is done implicitly within the network. CNNs are used for various tasks like image processing, segmentation, recognition, and natural language processing.
RNNs work effectively on sequences of data with variable input length, which means an RNN uses information from its previous state as an input for its current prediction; this procedure can be repeated an arbitrary number of times, permitting the network to propagate data through time by means of its hidden state. With this feature, RNNs become extremely effective for working with sequences of data that occur over time. They function well for applications that require a series of information which changes over time, such as language translation, speech recognition, etc.
In this paper, we are using deep learning techniques like Convolutional Neural
Networks for recognizing Telugu characters. The rest of the paper is organized as
follows: Sect. 3 gives the related work. Proposed solution is discussed in Sect. 4.
Experimental results are discussed in Sect. 5, and conclusion is given in Sect. 6.
3 Related Work
Sahara and Dhok [1] proposed a process for the recognition and segmentation of characters of languages like Devanagari and Latin. For character segmentation, a heuristic-based algorithm is integrated with Support Vector Machines. For character recognition, a K-Nearest Neighbor classifier is used to recognize the input character. It is shown that an accuracy of 98.86% is obtained for segmentation and 99.84% for the recognition process.
Sharma et al. [2] put forward an algorithm deployed on the idea of learning by
itself. In this paper, numbers and English letters are taken for recognition analysis.
SVM classifies the given input into different classes by using hyper planes, and an
optimal hyper-plane is selected among multiple hyper planes. Both multi-class and
bi-class are used to accomplish the identification of characters with 95.23% accuracy.
Jabir Ali and Joseph [3] presented a model of Convolutional Neural Networks
for categorizing Malayalam handwritten characters. It involved image acquisition,
greyscale transformation, word decomposition, binarization, character segmentation,
and estimation. CNN is used for the classification of Malayalam characters. Accuracy
of 97.26% is obtained in examining the CNN model.
Dara and Panduga [4] described a method for offline handwritten character recognition for documents containing Telugu characters, extracting features using the 2D Fast Fourier Transform and classifying with SVM. A complete set of 750 and 1500 samples is used for training, and 750 trials are used for testing. The accuracy obtained with this method is 71%.
Angadi et al. [5] suggested and evaluated a classic CNN for online Telugu character recognition. In this process, a character image is classified into 166 classes over 270 trials. The network includes 4 standard layers, the first with 5 × 5 kernels and the remaining with 3 × 3 kernels and ReLU activation functions, followed by 2 dense layers with a SoftMax function. The result of this method is quite imposing compared to other algorithms. This technique provided 92.4% accuracy.
Inuganti and Ramisetty [6] have given an analysis of datasets, feature extraction techniques, and classification for various Indian scripts like Assamese, Kannada, Telugu, Gurumukhi, Tamil, Bangla, and Devanagari. The steps involved in this process are data collection, pre-processing of images, extraction of features, identification, and post-processing. Classification is done using structure-based models, neural network models, motor models, and statistical models. In post-processing, confusing pairs are found and script-specific features are used to resolve ambiguities in the characters. An accuracy of 92% is reported over the collection of datasets from 168 users.
Manjunathachari et al. [7] described the morphological analysis for the classifi-
cation of Telugu Handwritten composite characters. A dataset is created by writing
a composite character on paper and scanning it using a scanner. 250 samples of
each character are taken. Segmentation of characters is done using morphological
operation and attained a 98.1% accuracy rate.
Mohana Lakshmi et al. [8] presented a coherent algorithm to recognize hand-
written Telugu characters based on Histogram of Oriented Gradients (HOG) features
and classification using the Bayesian Classifier. Binarization and smoothing are the
two techniques used to improve the resultant picture. Based on the number of nega-
tive and positive testing and training dataset, the recognition rate was calculated.
This technique provided 87.5% accuracy.
Chakradhar et al. [9] suggested diverse methods to identify Telugu handwritten
characters. The authors have analyzed some of the existing systems for recognition
of handwritten characters of Telugu scripts. It shows that the accuracy rate obtained
using Hybrid Models is 93.1% and using SVM is 90.55%.
Kaur and Kaur [10] analyzed and reviewed many technologies to seek out charac-
ters from input images containing text characters. The process contains the following
phases: Image Scanning, Pre-processing, Extraction of features, Recognition, and
post-processing. In pre-processing, the following steps are processed: noise removal,
thresholding, and skeletonization. For classification, the techniques used are Thomas
Bayes classifier, Support Vector Machines, Neural Networks, and Nearest Neighbor
classifier. In post-processing, symbols are grouped.
Fig. 2 Vowels
Fig. 3 Consonants
4 Proposed Work
The proposed system utilizes CNN framework to classify the characters. The system
mainly comprises five steps:
• Data acquisition
• Defining CNN architecture
• Training the network
• Deploy the model
• Testing the network.
Creating a dataset requires a lot of time and effort. There is no accessible dataset for Telugu characters. We have collected the dataset from students of a particular school and modified it. This dataset contains images of 52 Telugu characters, of which 18 are vowels and the remaining are consonants. Among them, only 35 consonants and 13 vowels are used commonly. For training the model, an enormous dataset is required. Each character is written on paper in a particular order, in various designs and sizes, by 60 individual writers. These documents were then scanned with a scanner. Each character is kept in a separate folder, and each folder is labeled with a different name. Some of the vowels and consonants are shown in Figs. 2 and 3.
The architecture contains six layers. It contains two groups of convolutional, activation, and pooling layers, followed by a fully connected layer and, at last, a SoftMax classifier. It has convolutional filters, each of size 5 × 5. Then, the ReLU activation function is applied, followed by 2 × 2 max-pooling. The Adam optimizer is used for the gradient descent algorithm that accomplishes the training. An individual image of a Telugu character with its label is passed through the layers and to the gradient descent algorithm, and then the weights get updated. The output layer consists of 44 classes, one for each character in the dataset.
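A minimal Keras sketch of an architecture along these lines is shown below; the filter counts (32 and 64), the dense layer width, and the 28 × 28 grayscale input follow common practice and the pre-processing described later, and are assumptions rather than the exact configuration used.

from tensorflow.keras import layers, models

def build_model(num_classes=44):
    model = models.Sequential([
        # Group 1: convolution + ReLU activation + 2x2 max-pooling
        layers.Conv2D(32, (5, 5), activation="relu", input_shape=(28, 28, 1)),
        layers.MaxPooling2D((2, 2)),
        # Group 2: convolution + ReLU activation + 2x2 max-pooling
        layers.Conv2D(64, (5, 5), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        # Fully connected layer followed by the SoftMax classifier
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model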
Firstly, the network needs to be prepared by training it with the created dataset. Each character in the dataset is labeled, and the label format is prepared such that the network can understand it. The created dataset is prepared, and each character is labeled. This dataset is separated into a training set and a testing set in the ratio 75:25. The training dataset is again separated into training and validation sets. The validation set is used during training so that we can see whether our model overfits during each epoch. An epoch means a complete iteration of training over the whole dataset. After the model is well trained, the weights are stored. The algorithm for training the network is as follows:
1. Input image.
2. Resize the image to 28 × 28 pixels.
3. The image is converted into an array of 2-D as CNN takes input as 2-D array.
4. Extract the label of the class from the image path and update it to the list.
5. Change the intensities of raw pixels to the range of [0, 1].
6. Split the data into training dataset and testing dataset as 75% and 25%,
respectively.
7. Convert the labels of integers into vectors.
8. Construct a generator for the images used during data augmentation.
9. Save the network.
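The listed steps could be realized with Keras and scikit-learn roughly as follows; the augmentation settings, batch size, epoch count, and file name are illustrative assumptions.

import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def prepare_and_train(images, labels, model, num_classes=44):
    # Steps 2-3: images already resized to 28 x 28 and stacked as (N, 28, 28, 1)
    data = np.asarray(images, dtype="float32").reshape(-1, 28, 28, 1)
    # Step 5: scale raw pixel intensities to the range [0, 1]
    data /= 255.0
    # Step 6: split into 75% training and 25% testing data
    x_train, x_test, y_train, y_test = train_test_split(
        data, labels, test_size=0.25, random_state=42)
    # Step 7: convert integer labels into one-hot vectors
    y_train = to_categorical(y_train, num_classes)
    y_test = to_categorical(y_test, num_classes)
    # Step 8: generator for the images used during data augmentation
    aug = ImageDataGenerator(rotation_range=10, zoom_range=0.1,
                             width_shift_range=0.1, height_shift_range=0.1)
    model.fit(aug.flow(x_train, y_train, batch_size=32),
              validation_data=(x_test, y_test), epochs=25)
    # Step 9: save the network
    model.save("telugu_ocr_cnn.h5")
    return x_test, y_test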
Fig. 5 Flowchart of proposed method
After effective training, the model must be deployed so that the user can input an image of a handwritten Telugu script and the output is predicted using the saved model. The handwritten Telugu character image has to be read to predict the output. For every predicted class, each character is mapped to the equivalent Telugu character.
After model deployment, the model is ready for testing. 20% of the total dataset is considered for testing. First, the image is loaded from the dataset. If the image does not load, an error message is displayed and the program exits. After loading, the image is resized to 28 × 28 pixels, converted to a binary image and then converted into arrays. The network then finds the matching character and gives the accuracy for each character. The algorithm for testing is given below:
1. Load the image.
2. If there is error in loading the image, it exits and displays the error message.
3. Pre-process the image for classification, i.e., resized to 28 × 28 and converting
image into binary image and converting into arrays.
4. Load the trained convolutional neural network.
5. Classify the input image (Fig. 5).
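A hedged sketch of these testing steps, with the model file name and binarization threshold as placeholders:

import numpy as np
from tensorflow.keras.models import load_model
from PIL import Image

def classify_character(image_path, model_path="telugu_ocr_cnn.h5", threshold=128):
    try:
        img = Image.open(image_path).convert("L")        # Step 1: load as greyscale
    except OSError as err:
        raise SystemExit(f"Error loading image: {err}")  # Step 2: report and exit
    # Step 3: resize to 28 x 28, binarize, and convert into a (1, 28, 28, 1) array
    img = img.resize((28, 28))
    arr = (np.asarray(img) >= threshold).astype("float32").reshape(1, 28, 28, 1)
    model = load_model(model_path)                       # Step 4: load trained CNN
    probs = model.predict(arr)[0]                        # Step 5: classify
    return int(np.argmax(probs)), float(np.max(probs))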
5 Experimental Results
This section addresses the results obtained after applying the convolutional neural network technique to the dataset of handwritten Telugu characters. With this algorithm, we have achieved an accuracy of 90%. The accuracy can be increased by enlarging the dataset. A graph is also plotted for each character that is recognized. The graph shows the accuracy of the trained character (Figs. 6 and 7) and also the loss of information for each character (Fig. 8).
6 Conclusion
In OCR, there is a need to recognize characters more accurately. In this paper, we have used a deep learning technique, i.e., a CNN, which suits image recognition best. The model is trained using a dataset that has over 1000 images of 52 handwritten Telugu characters. The accuracy and effectiveness of the CNN method can be increased by enlarging the dataset of characters. The accuracy obtained using this model is approximately 90%. A future enhancement of this work can lie in the effective identification of characters from connected sentences or words.
References
1. Sahara P, Dhok SB (2018) Multilingual character segmentation and recognition schemes for
Indian document images, Jan 2018. IEEE
2. Sharma S, Cheeran AN, Sasi A (2017) An SVM based character recognition system, May
2017. IEEE
3. Jabir Ali V, Joseph JT (2018) A convolutional neural network based approach for recognizing
malayalam handwritten characters. Int J Sci Eng Res
4. Dara R, Panduga U (2015) Telugu handwritten isolated character recognition using two
dimensional fast Fourier transform and support vector machines. IJCA
5. Angadi A, Vatsavayi VK, Gorripati SK (2018) A deep learning approach to recognize
handwritten Telugu character using convolutional neural networks. In: Proceedings of 4th
ICCM2018
6. Inuganti SL, Ramisetty RR (2019) Online handwritten Indian character and its extension to
Telugu character recognition. IJRE
7. Vishwanath NV, Manjunathachari K, Satya Prasad K (2018) Handwritten Telugu composite
characters recognition using morphological analysis. IJPAM
8. Mohana Lakshmi K, Venkatesh K, Sunaina G, Sravani D, Dayakar P (2017) Handwritten
Telugu character recognition using Bayesian classifier. IJET
9. Chakradhar CV, Rajesh B, Raghavendra Reddy M (2016) A study on online handwritten Telugu
character recognition. IJSETR
10. Kaur J, Kaur R (2017) Review of the character recognition system process and optical character
recognition approach. IJCSMC
Implementation of Python
in the Optimization of Process
Parameters of Product Laryngoscope
Manufactured in the Injection Mold
Machine
Balachandra P. Shetty, J. Sudheer Reddy , B. A. Praveena ,
and A. Madhusudhan
1 Introduction
2 Methodology
To reduce the force transmitted to the organs, which causes tooth and soft tissue lesions, and to ensure positive tracheal intubation, there is a requirement to develop an alternative device. The innovative laryngoscope product shown in Fig. 1 is designed and manufactured to obtain video imaging of the larynx.
The present work is focused on manufacturing the double-channel laryngoscope through the injection molding process, as it is considered the best choice [15]. In injection molding, the raw material is heated and melted in the injection unit and later transferred to the mold under high pressure by clamping the molds.
The design features of the product need to be carefully modified such that the part
ensures better surface finish. Polymer chains will deform, when the melt transfer
takes place from sprue to runner, followed by gates, ingates, and cavities. During the
process of deformation, high shear stress and rapid cooling take place at the forefront
of mold surfaces. Thereby, complex crystalline morphology can be observed clearly
in the microstructure of semi-crystalline polymers of injection molded parts [16].
Fig. 1 Fabricated laryngoscope samples
presented in Table 2. The objective function is to minimize the surface roughness, and
therefore, signal-to-noise (S/N) ratio equation is identified. S/N ratio values corre-
spond to the surface roughness values which are computed for every experimental
condition presented in Table 2.
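For reference, the standard smaller-the-better S/N ratio used in such Taguchi studies (and computed per run by the sn_ratio() module in Sect. 3) can be written as

\[ S/N = -10 \log_{10} \left( \frac{1}{k} \sum_{i=1}^{k} y_{i}^{2} \right) \]

which, for a single surface roughness measurement per experimental run (k = 1), reduces to −10 log10(y²).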
The effect of the injection molding factors on the quality of parts presented in Table 1 is experimentally studied. Higher injection pressure forces the hot melt close to the die surface walls to create a replica of the mold surface; hence, an increase in injection pressure tends to reduce the surface roughness of injection molded parts. The desired
minimal surface roughness was obtained at the mid-values of injection speed as
shown in Fig. 2. This might be due to filling defects at low injection speed and burrs
and jetting at higher injection speed [8]. A combination of low melt temperature and
high mold temperature resulted in reduced surface roughness values (refer Fig. 2). Too
low mold and melt temperature results in low melt viscosity, promoting shrinkage,
warpage, and flow lines [21, 22]. In contrast, too high a temperature results in an
excessive flash and burning [8]. The obtained results strongly justify the published
literature.
The summary of results of the Pareto analysis of variance was constructed for performing the factor analysis (sum at factor levels, percent contribution) and to
determine the optimal conditions for surface roughness. The sum at factor levels
The surface roughness evaluations are made on the fabricated parts corresponding to the experimental conditions given in Table 4. The optimal conditions are compared
with the initial conditions of the L9 orthogonal array for surface roughness quality
assessments. The laryngoscope parts (optimal and initial conditions) are subjected to
quality assessment to graphically visualize the surface textures using a high magni-
fication objective lens equipped in the non-contact-type confocal microscope. Note
that optimal conditions were demonstrated with smooth surface peaks compared to
initial conditions of Taguchi L9 orthogonal array experiments (refer Table 4).
3 Implementation of Python
The implementation of the Taguchi optimization is done through Python code for surface roughness optimization. The Python code is used to increase computational efficiency and to take care of the nonlinearities, and it provides an efficient platform from which further extensions are easily possible. As the program code is of modular construction, new functions and variables can be built in easily, and various other algorithms can also be implemented through it. The code is structured to compute the signal-to-noise ratio, to build up the orthogonal array, to develop from that the sum at factor level (SFL) table, and finally to obtain the parameter-wise optimum levels. The SFL table is further utilized to obtain the graph of the Pareto analysis of variance of surface roughness using the matplotlib library. The important modules of the Python implementation are shown below.
# To find surface roughness sn ratio (smaller the better)
import math
import numpy as np

sr_response = [0.589, 0.495, 0.433, 0.314, 0.354, 0.341, 0.373, 0.214, 0.321]
sr_sn = []

def sn_ratio():
    for n in sr_response:
        res1 = n * n
        res2 = math.log10(res1)
        res3 = -10 * res2
        sr_sn.append(round(res3, 2))

sn_ratio()

# To find surface roughness sum at variable (factor) levels and OA list.
sr_a = np.array([[1, 1, 1], [4.60, 6.11, 7.27], [2, 2, 2], [10.06, 9.02, 9.34],
                 [3, 3, 3], [8.56, 13.39, 9.87]])
sr_b = np.array([[1, 1, 1], [4.60, 10.06, 8.56], [2, 2, 2], [6.11, 9.02, 13.39],
                 [3, 3, 3], [7.27, 9.348, 9.87]])
sr_c = np.array([[1, 1, 1], [4.60, 9.34, 13.39], [2, 2, 2], [6.11, 9.02, 9.87],
                 [3, 3, 3], [7.27, 10.06, 8.56]])

# m holds the per-factor level sums (one row of three values per factor); it is
# assembled from the arrays above and the remaining factor in the full program.
t_ssd = []

def ssdiff(i):
    # Sum of squared differences among the three level sums of factor i
    ssd1 = (m[i][0] - m[i][1]) ** 2
    ssd2 = (m[i][0] - m[i][2]) ** 2
    ssd3 = (m[i][1] - m[i][2]) ** 2
    ssd = ssd1 + ssd2 + ssd3
    t_ssd.append(round(ssd, 2))

ssdiff(0)
ssdiff(1)
ssdiff(2)
ssdiff(3)

# Percent contribution of each factor to the total sum of squared differences
sum_t_ssd = sum(t_ssd)
p_c_ratio = []

def percent_cont_ratio():
    for i in range(len(t_ssd)):
        p_c_ratio.append(round((t_ssd[i] / sum_t_ssd) * 100, 2))

percent_cont_ratio()
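The percent contributions computed above can then be plotted as the Pareto chart mentioned earlier. The snippet below is a generic matplotlib sketch; the factor names come from the study, while the ordering and bar heights are taken from p_c_ratio at run time rather than from the paper's reported results.

import matplotlib.pyplot as plt

factors = ["Injection pressure", "Injection velocity",
           "Mold temperature", "Melt temperature"]
# Sort factors by their percent contribution (p_c_ratio from the module above)
order = sorted(range(len(factors)), key=lambda i: p_c_ratio[i], reverse=True)
plt.bar([factors[i] for i in order], [p_c_ratio[i] for i in order])
plt.ylabel("Percent contribution (%)")
plt.title("Pareto analysis of variance of surface roughness")
plt.tight_layout()
plt.show()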
4 Conclusion
Optimum process parameters in the injection molding process are of high priority as they are not governed by exact equations and depend more on inequations. In general, setting the process parameters is left to the experience of the plastics engineer. Since plastic exhibits complex thermo-viscoelastic properties, selecting proper parameters from varying values is a challenge. Plastics engineers otherwise select the parameters from handbooks and then adjust them by the trial-and-error method. The purpose of this variance analysis is to investigate which factors primarily affect the performance characteristic of the injection molding process. The implementation of this optimization process is carried out through Python code.
The conclusions drawn from the present work are as follows:
(a) Taguchi method was applied to study the influencing parameters (injection pres-
sure, injection velocity, and mold and melt temperature) that could affect the
surface roughness of laryngoscope parts. Injection pressure showed a signifi-
cant impact on surface roughness, followed by mold temperature and injection
velocity.
(b) The Taguchi method determined optimal conditions (injection pressure of 160 kg/cm2, injection velocity of 30 mm/s, mold temperature of 100 °C, and melt temperature of 220 °C) which could reduce the surface roughness of the laryngoscope part to 0.214–0.589 µm compared to the initial setting conditions (injection pressure of 100 kg/cm2, injection velocity of 20 mm/s, mold temperature of 60 °C, and melt temperature of 220 °C). Smooth surface textures are obtained for the optimal conditions compared to the initial setting of the Taguchi L9 method.
References
1. Vojnova E (2016) The benefits of a conforming cooling systems the molds in injection molding
process. Procedia Eng 149:535–543
2. Bianchi MF, Gameros AA, Axinte DA, Lowth S, Cendrowicz AM, Welch ST (2021)
Regional temperature control in ceramic injection molding: an approach based on cooling
rate optimization. J Manuf Process 68:1767–1783
3. Lakkanna M, Kumar GCM, Kadoli R (2016) Computation design of mould sprue for injection
moulding thermoplastics. J Comput Des Eng 3:37–52
4. Bryce DM (1996) Plastic injection molding: manufacturing process fundamentals. Society of
Manufacturing Engineers, Dearborn, MI, p 253
5. Mihara R, Komasawa N, Matsunami S, Minami T (2015) Comparison of direct and indirect
laryngoscopes in vomitus and hematemesis settings: a randomized simulation trial. Biomed
Res Int. https://doi.org/10.1155/2015/806243
6. Llewelyn G, Rees A, Griffiths C, Jacobi M (2020) A design of experiment approach for surface
roughness comparisons of foam injection-moulding methods. Materials 13(10):2358
7. Mohan M, Ansari MNM, Shanks RA (2017) Review on the effects of process parameters on
strength, shrinkage, and warpage of injection molding plastic component. Polym Plast Technol
Eng 56(1):1–12
8. Kashyap S, Datta D (2015) Process parameter optimization of plastic injection molding: a
review. Int J Plast Technol 19(1):1–18
9. Fernandes C, Pontes AJ, Viana JC, Gaspar-Cunha A (2018) Modeling and optimization of the
injection-molding process: a review. Adv Polym Technol 37(2):429–449
10. Davis R, John P (2018) Application of Taguchi-based design of experiments for industrial chem-
ical processes. In: Silva V (ed) Statistical approaches with emphasis on design of experiments
applied to chemical processes, p 137. https://doi.org/10.5772/65616
11. Antony J, Antony FJ (2001) Teaching the Taguchi method to industrial engineers. Work Study
50(4):141–149
12. Li K, Yan S, Zhong Y, Pan W, Zhao G (2019) Multi-objective optimization of the fiber-
reinforced composite injection molding process using Taguchi method, RSM, and NSGA-II.
Simul Model Pract Theory 91:69–82
13. Kiatcharoenpol T, Vichiraprasert T (2018) Optimizing and modeling for plastic injection
molding process using Taguchi method. Int J Phys Conf Ser 1026(1):012018
14. Martowibowo SY, Khloeun R (2019) Minimum warpage prediction in plastic injection process
using Taguchi method and simulation. Manuf Technol 19(3):469–476
15. Serban D, Lamanna G, Opran CG (2019) Mixing, conveying and injection molding hybrid
system for conductive polymer composites. Procedia CIRP 81:677–682
16. Praveena BA, Shetty BP, Lokesh N, Santhosh N, Buradi A, Jalapur R (2023) Design of injec-
tion mold for manufacturing of Cup. In: Pradhan P, Pattanayak B, Das HC, Mahanta P (eds)
Recent advances in mechanical engineering. Lecture notes in mechanical engineering. Springer,
Singapore. https://doi.org/10.1007/978-981-16-9057-0_8
17. Dehnad K (2012) Quality control, robust design, and the Taguchi method. Springer Science &
Business Media
18. Roy RK (2010) A primer on the Taguchi method. Society of Manufacturing Engineers,
Dearborn, MI
19. Jeevamalar J, Kumar SB, Ramu P, Suresh G, Senthilnathan K (2021) Investigating the effects
of copper cadmium electrode on Inconel 718 during EDM drilling. Mater Today Proc 45:1451–
1455
20. Suresh G, Srinivasan T, Rajan AJ, Aruna R, Ravi R, Vignesh R, Krishnan GS (2020) A study
of delamination characteristics (drilling) on carbon fiber reinforced IPN composites during
drilling using design experiments. IOP Conf Ser Mater Sci Eng 988(1):012008
21. Azad R, Shahrajabian H (2019) Experimental study of warpage and shrinkage in injection
molding of HDPE/rPET/wood composites with multiobjective optimization. Mater Manuf
Process 34(3):274–282
22. Benedetti L, Brulé B, Decreamer N, Evans KE, Ghita O (2019) Shrinkage behaviour of semi-
crystalline polymers in laser sintering: PEKK and PA12. Mater Des 181:107906
A Practical Approach to Software
Metrics in Beehive Requirement
Engineering Process Model
K. S. Swarnalatha
Department of Information Science and Engineering, Nitte Meenakshi Institute of Technology, Bangalore 560064, India
e-mail: [email protected]
1 Introduction
2 Performance Measurement
3 Metrics
cognizance and human mistakes during information extraction, and unavoidable issues that emerge at this phase of the development cycle. To this end, it is of vital significance that we recognize the areas and extent of the mistakes thus created. The logical successor to this step is to constrain uncertainty and strengthen clarity, to further improve the balance and exactness of the final result. The ambiguity of any given requirement lies between 0 and 1, where 0 indicates the requirement is certainly not ambiguous and 1 indicates the requirement is highly ambiguous.
\[ \text{ambiguity} = 1 - \frac{\text{clearness value}}{\text{Number of Domains}} \]  (1)
\[ \text{Smallest range of conciseness} = \frac{1}{1 + 1} = 0.5 \quad \text{(Upper Threshold)} \]

\[ \text{Conciseness} = \frac{1}{1 + \text{Very Large}} \approx 0 \quad \text{(Lower Threshold)} \]

\[ \text{Conciseness} = \frac{1}{1 + \text{size}} \]  (3)
• Reusability—Reusability refers to code, processes, modules and so on that can be used again and again to accomplish comparable objectives in different products. Reusing existing code/processes speeds up the development procedure as well as diminishes the number of mistakes. A comparison with the existing models demonstrates that the least time is required to create a similar product using the beehive RE process, considering five cycles. The software/product is developed from version 1.0 to 1.5 (5 iterations). The time taken to deliver the software/product (in days) is shown below. The reusability range cannot be determined since it is software/product dependent.
• Level of Detail—This is an important metric to understand and estimate clear requirements. To know the level of detail of an individual requirement, the following metric separates the requirements into clear and unclear (non-ambiguous/ambiguous); it is a measure of the clear and non-ambiguous requirements. The metric ranges from a minimum of 0 to a maximum of 1, computed as the number of non-ambiguous requirements divided by the total number of requirements.
\[ \text{Internal consistency} = \frac{\text{no. of unique requirements} - \text{no. of unique and non-deterministic requirements}}{\text{no. of unique requirements}} \]  (5)
\[ \text{Average degree of requirements} = \frac{\sum_{i=1}^{n} \text{Degree of requirements}_{i}}{\text{Total number of requirements}} \]  (10)
which is 4, or the lowest importance class, which is 1; multiplying the ambiguity value by the importance class gives us the practicable uncertainty for each requirement, computed in every neighbourhood.

\[ \text{Effective Ambiguity} = \frac{\sum (\text{Ambiguity} \times \text{Importance Class})}{\sum \text{Number of Requirements}} \]  (12)
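A small Python sketch of these metrics, assuming the per-requirement attributes (clearness value, number of domains, size, importance class) are available as plain numbers; this illustrates the equations and is not the model's actual tooling.

def ambiguity(clearness_value, number_of_domains):
    # Eq. (1): 0 = certainly not ambiguous, 1 = highly ambiguous
    return 1 - clearness_value / number_of_domains

def conciseness(size):
    # Eq. (3): 0.5 for the smallest requirement, tending to 0 as size grows
    return 1 / (1 + size)

def internal_consistency(unique, unique_non_deterministic):
    # Eq. (5)
    return (unique - unique_non_deterministic) / unique

def effective_ambiguity(requirements):
    # Eq. (12): requirements is a list of (ambiguity, importance_class) pairs,
    # with importance class ranging from 1 (lowest) to 4 (highest)
    return sum(a * cls for a, cls in requirements) / len(requirements)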
The beehive process model has been compared with two other conventional models, namely the waterfall and spiral models. The beehive RE process model has been validated in 3 companies. Table 1 gives the consolidated results for a set of 92 requirements. The beehive RE process model was designed and developed to overcome most of the shortfalls of traditional methods like the waterfall model, the spiral model and the agile method. This model has proved to be more effective in terms of less ambiguity, a better level of understanding, reduced redundancy, reduced randomness and increased usability. The beehive RE process has proved to be an operative process.
Suitability of Process Models
for Software Development
K. S. Swarnalatha
1 Introduction
A lot of software development process models are available today. As each of these
comes with their own strengths and weaknesses, developers are often faced with the
challenge of choosing the right model [1]. Choosing the right model being a very
important first step, quantitative tests to evaluate models for desired characteristics
can hugely simplify the task and help developers make better decisions in the choice
of the model. Well-defined quantitative measures can also reduce the amount of time
required to choose the right model. A defined set of metrics for understanding a model is available today [2], but a quantitative approach to measuring them is lacking. Some work has been done in analysing metrics and enumerating their properties [3]. In this paper, we define methods for measuring (i) redundancy in a model, (ii) persistence of learning and (iii) flexibility of a model. Each of the metrics below is defined, and a generic formulation is provided. The formulae are then applied directly or in a modified form to three models.
2 Evaluation Metrics
If there is no return to previous steps and the model flows step-by-step forward till the end, it is said to have no redundancy. Redundancy for a model is not straightforward to calculate, for the reasons that:
i. Failure can happen at any step, thereby causing a backtrack to previous steps.
ii. The probability of failure is not known beforehand.
iii. The amount of backtracking is not the same for failures at different steps of a model.
Here we calculate the average number of redundant steps in a model as specified
below in Eq. 1.
Let R be the average number of redundant steps in a software engineering model.
n → number of steps in the model
p_i → probability of failure at step i (1 ≤ i ≤ n)

\[ R = \sum_{i=1}^{n} \big(\text{number of steps backtracked from step } i\big) \times \big(\text{probability of failure at step } i\big) \]  (1)

\[ R_{\text{simple}} = \frac{1}{n} \sum_{i=1}^{n} \big(\text{number of steps backtracked from step } i\big) \]  (2)
RC = \sum_{i=1}^{n} p_i \cdot (\text{sum of costs of all the backtracked steps from step } i)

RC = \sum_{i=1}^{n} \left( p_i \sum_{j=i-n_i}^{i} c_{ij} \right) \quad (3)
Note that for any step i, n_i is the total number of times the step is repeated. If the model has no repetition, then n_i = 0, with the step performed only once.
The average RC for each step (RC_step) can be computed by dividing RC by the average number of redundant steps in the model (R), when R ≠ 0. RC = 0 when R = 0.
RC_{step} = \frac{RC}{R}
A simpler formulation of the RC formula can be obtained by assuming an equal probability of failure at each step, similar to the redundancy formula simplification, reducing RC to RC_simple as shown in Eq. 4 below.
RC_{simple} = \frac{1}{n} \sum_{i=1}^{n} \left( \sum_{j=i-n_i}^{i} c_{ij} \right) \quad (4)
One complication is the calculation of the cost at each step. The problem becomes worse when repeat costs need to be computed. One of the problems faced in calculating redundancy, determining the probability of failure, resurfaces in this formulation too.
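As a minimal sketch of Eqs. 1–4, the Python fragment below computes R, RC and RC_step for a hypothetical model described by its per-step failure probabilities, backtrack counts and backtrack costs; all of the numbers and function names are invented for illustration and are not taken from this paper.

# Sketch of Eqs. 1-4 for a hypothetical 5-step model (illustrative numbers only).
# backtracks[i] = number of steps backtracked when failure occurs at step i+1
# costs[i]      = sum of the costs of all the backtracked steps from step i+1
# probs[i]      = probability of failure at step i+1

def redundancy(probs, backtracks):
    # Eq. 1: R = sum_i p_i * (number of steps backtracked from step i)
    return sum(p * b for p, b in zip(probs, backtracks))

def redundancy_cost(probs, costs):
    # Eq. 3: RC = sum_i p_i * (sum of costs of the backtracked steps from step i)
    return sum(p * c for p, c in zip(probs, costs))

def rc_per_step(probs, backtracks, costs):
    # RC_step = RC / R, taken as 0 when R = 0
    r = redundancy(probs, backtracks)
    return redundancy_cost(probs, costs) / r if r else 0.0

probs = [0.2] * 5                 # equal failure probability at every step
backtracks = [0, 1, 1, 2, 3]      # steps re-done on failure at each step
costs = [0, 4, 5, 9, 14]          # total cost of those re-done steps

print(redundancy(probs, backtracks))           # R
print(redundancy_cost(probs, costs))           # RC
print(rc_per_step(probs, backtracks, costs))   # RC_step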
c. Flexibility: Flexibility is an important metric for evaluating a model. Gao and Yang have provided a preliminary definition and reasoning for it [4]. We define flexibility as the ability of a model to take corrective action with the minimum number of backtracking steps in case of a change such as the introduction of new requirements, an error in design, etc.
A model is characterized by high flexibility if it performs corrective action in the
same step where the problem occurred, without any backtracking. On the other
end of this spectrum is a model that requires restarting the process from step
1 in case of failure. Such a model is the least flexible. The value of flexibility
falls in the range [0, 1], with 1 characterizing the former type of model and 0
characterizing the latter. The equation for flexibility is shown below in Eq. 5.
Flexibility, when failure happens at step i, is
t ← time of failure

PL = \sum_{\text{for all steps } i \text{ where failure occurred}} p_i \sum_{j=i-n_b}^{i} \frac{(C_j)_{t-1} - (C_j)_t}{(C_j)_{t-1}} \quad (7)
than the others. From a performance standpoint, a good model should have minimal
redundancy, and it should have high flexibility and high persistence of learning.
3 Application
The formulae can be applied to three models. Waterfall: the waterfall model is one of the oldest software development models; it can be adopted when the project size is large and the requirements are known in advance. V-model: it is an extension of the waterfall model; the V stands for verification and validation, which, as the name says, are applied at each and every phase of the software development life cycle. Iterative and incremental model: an initial version of the software is developed with few specifications and is later iterated to incorporate new requirements; in the incremental model the requirements are divided into independent modules, and each module of the software undergoes all the phases of the software development life cycle [5, 6].
a. Redundancy
1. Waterfall [5, 6]:
Here if we assume the classic model where the requirements are fixed
throughout the various steps, without backtracking, then redundancy = 0.
2. V-model [5, 6]:
In this model, from steps 1–5, failure at any step leads to restarting the process.
This half is equivalent to a waterfall model. In the second half, failure at any
step leads to partial backtracking, not restarting.
R = \sum_{i=1}^{n} p_i \cdot (\text{number of steps backtracked from step } i \text{ on occurrence of failure})
  = p_1(1) + p_2(2) + p_3(3) + p_4(4) + p_5(5) + p_6(3) + p_7(5) + p_8(7) + p_9(9)
For simplicity's sake, if we consider an equal probability p = 1/9 for all steps,

R_{simple} = p(1 + 2 + 3 + 4 + 5 + 3 + 5 + 7 + 9) = 39p = 39/9 ≈ 4.33

This value changes based on the actual number of steps and the backtracking employed in variations of the model.
3. Iterative and incremental [5, 6]:
b. Redundancy Cost
1. Waterfall model:
Redundancy cost is 0 as redundancy = 0 in the classic model.
2. V-model:
RC = \sum_{i=1}^{n} \left( p_i \sum_{j=i-n_i}^{i} c_{ij} \right)

RC_{ave} = \frac{RC}{R} = \frac{\sum_{i=1}^{n} p_i \sum_{j=i-n_i}^{i} c_{ij}}{\sum_{i=1}^{n} p_i \cdot i}
3. Iterative and incremental:

C_i ← cost of step i

RC = \sum_{\text{for each of the extra steps } i} C_i

Here the probabilities are excluded, as they apply not to any single step but to the steps of the mini waterfall contained in each step.

RC_{step} = \frac{RC}{R} = \frac{\sum_{\text{for each extra iteration } i} C_i}{\max(0, \text{actual} - \text{expected number of iterations})}
c. Flexibility
1. Waterfall model:

∴ F_i = 1 - \frac{i}{i} = 0

F_{ave} = \frac{\sum_i F_i}{\text{number of steps where failure occurred}} = 0
2. V-model:
F_1 = 1 - \frac{1}{1} = 0

F_2 = 1 - \frac{2}{2} = 0

\vdots

F_5 = 1 - \frac{5}{5} = 0

F_1 = F_2 = F_3 = F_4 = F_5 = 0
This is because as pointed out earlier, the first half of the model is similar to
the waterfall model.
F_6 = 1 - \frac{2}{6} = \frac{2}{3}

F_7 = 1 - \frac{4}{7} = \frac{3}{7}
F_8 = 1 - \frac{6}{8} = \frac{1}{4}

F_9 = 1 - \frac{9}{9} = 0

F_{ave} = \frac{\sum F_i}{n} = \frac{\frac{2}{3} + \frac{3}{7} + \frac{1}{4}}{9}
In actual practice, the value of F will vary from the F_ave shown here, as failure is not required to happen at every step of the model. In such cases, the flexibility of one implementation of the model can be used as an approximate measure of the flexibility of a future, similar implementation.
3. Iterative and incremental:
The formula for flexibility is not directly applicable for this model.
For one method of applying the formula, assume that the expected number
of iterations is m and the actual number of iterations is n with n > m.
Let m < i ≤ n.
For each step < i, Fstep = 1.
For every step i, if we think that i might be the last iteration of model, then
i − m is the number of backtracked steps.
∴ F_i = 1 - \frac{i - m}{i} = \frac{m}{i}

∴ F_{ave} = \frac{\sum_{j=1}^{m} 1 + \sum_{j=m+1}^{n} \frac{m}{j}}{n}
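As a small sketch of this flexibility computation, the fragment below evaluates F_ave for the iterative and incremental model from the expected (m) and actual (n) number of iterations; the values of m and n are chosen purely for illustration.

# Flexibility of the iterative and incremental model: the first m iterations have
# F = 1, each extra iteration i (m < i <= n) has F_i = m / i, and F_ave is the
# average over all n iterations.

def flexibility_average(m, n):
    assert n >= m > 0
    total = float(m)                                   # F = 1 for the planned iterations
    total += sum(m / i for i in range(m + 1, n + 1))   # F_i = m / i for the extra ones
    return total / n

print(flexibility_average(4, 6))   # illustrative: 4 planned, 6 actual -> about 0.91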
d. Persistence of Learning
1. Waterfall model:
As redundancy = 0, persistence of learning property is not applicable.
2. V-model:
∴ PL = \sum_{i} p_i \sum_{j=i-n_b}^{i} \frac{(C_j)_{t-1} - (C_j)_t}{(C_j)_{t-1}}
For this model, the formula can be directly applied, without any modification.
3. Iterative and incremental:
For this model, we apply the formula using a different interpretation of the
extra iterations than for flexibility. The former interpretation can lead to
repeated values in each term of the outer summation.
Here we consider that for each i (with i, n, and m having the same meaning as explained in the PL formulation), the value \frac{C_{i-1} - C_i}{C_{i-1}} is used in place of p_i \sum_{j=i-n_b}^{i} \frac{(C_j)_{t-1} - (C_j)_t}{(C_j)_{t-1}}.

∴ PL = \sum_{i=m+1}^{n} \frac{C_{i-1} - C_i}{C_{i-1}}
4 Conclusion
References
1. Bokhari MU, Siddiqui ST (2011) Metrics for requirements engineering and automated
requirements tools. In: Proceedings of the 5th national conference; INDIACom-2011
2. Ali MJ (2006) Metrics for requirements engineering. Master’s thesis, 15 June 2006
3. Srinivasan KP, Devi T (2014) Software metrics validation methodologies in software engi-
neering. Int J Softw Eng Appl (IJSEA) 5(6)
4. Gao Y, Yang Y. “Flexibility” of software development method
5. Pressman R. Software engineering—a practitioner’s approach, 7th edn. McGraw-Hill. Accessed
Nov 2016
6. Sommerville I. Software engineering, 9th edn. Pearson Education Ltd. Accessed Nov 2016
Solving Problems of Large Codebases:
Uber’s Approach Using Microservice
Architecture
1 Introduction
Back in 2016, Uber chose a drastic plan to change from a monolithic to a microservices architecture to increase its dominance over the pay-per-travel industry, having revolutionized doorstep pickup and drop for people.
Monolithic Model: A monolithic architecture has a single codebase with different modules. Modules are separated either by business features or by technical features. It has a single build system which builds the whole application and/or its dependencies, and it has a single executable or deployable artifact.
Architecture: Monolithic apps were designed keeping in mind the ability to handle a plethora of tasks simultaneously. In simple terms, they are complex apps which house several tightly coupled functions. Monolithic tools are also used specifically for huge codebases. This requires compiling and testing the whole platform even for a small change in a function, which is time consuming [1]. It is also called a multi-layer architecture, as a monolith usually has more than three layers, which comprise the following:
UI layer: This is the layer which the user uses to interact with the service which is
responsible for all the actions, requests and the retrieval of information.
Business layer: This is responsible for the company’s specific business logic or the
data logic which changes according to the need of the application.
K. S. Swarnalatha (B)
Professor, Department of Information Science and Engineering, Nitte Meenakshi Institute of
Technology, Bangalore 560064, India
e-mail: [email protected]
A. Mallya · G. Mukund · R. Ujwal Bharadwaj
Student, Department of Information Science and Engineering, Nitte Meenakshi Institute of
Technology, Bangalore 560064, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
N. R. Shetty et al. (eds.), Emerging Research in Computing, Information, Communication
and Applications, Lecture Notes in Electrical Engineering 928,
https://doi.org/10.1007/978-981-19-5482-5_57
Fig. 1 Monolithic
architecture
Fault Tolerance Is Heightened: Any error found in a microservice affects only that specific service and not the whole working system. So all alterations and tests are executed with fewer errors and less risk.
Microservices Working Principle: The microservice working principle is shown in Fig. 3.
A general microservice architecture model is composed of
• Clients
• Identity providers
• API gateway
• Messaging formats
• Databases
• Static content
• Management
• Service discovery.
Clients
Clients are different entities sending requests and waiting for responses. Different
applications from various devices can be considered as clients, which try to perform
various functions like search, configure, manage, build, etc.
Identity Providers
All the requests from the clients are sent to these identity providers, which authen-
ticate them and map those requests to the API gateway. The API gateway then takes
over to communicate those requests.
API Gateway
The API gateway serves as the entry point for all requests coming from clients; it processes them and forwards the requests to the appropriate microservice.
Advantages of using an API gateway
Microservices can be modified/updated without the clients needing to know about it. Since the API gateway is independent of the clients, services can also use messaging protocols that do not adhere to web standards. The API gateway can also perform certain important functions such as authentication, security and load balancing.
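To make the routing role described above concrete, here is a minimal, hypothetical sketch of a gateway that maps request path prefixes to backend microservice addresses; the service names and URLs are invented and this is not Uber's actual implementation.

# Hypothetical API gateway routing sketch (service names and URLs are illustrative).
import urllib.request

ROUTES = {
    "/rides":    "http://rides-service.internal:8001",
    "/payments": "http://payments-service.internal:8002",
    "/users":    "http://users-service.internal:8003",
}

def route(path):
    # Resolve the backend URL responsible for the given request path prefix.
    for prefix, backend in ROUTES.items():
        if path.startswith(prefix):
            return backend + path
    raise ValueError("no microservice registered for " + path)

def forward(path):
    # Forward a GET request to the resolved microservice; authentication,
    # security checks and load balancing would also be applied at this point.
    with urllib.request.urlopen(route(path)) as resp:
        return resp.read()

print(route("/rides/123/status"))   # -> http://rides-service.internal:8001/rides/123/status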
Messaging Formats
Services communicate through two types of messages:
Asynchronous messages
These messages are used by clients who do not wait for a response from a service.
Type of the message is usually defined, and these messages have to be interoperable
between implementations.
Synchronous messages
In this messaging model, a response is awaited from the service, after a request
is sent by a client. Microservices commonly use the representational state transfer
(REST) framework, because it is based on a stateless, client–server model and the
HTTP protocol. This protocol offers a major advantage by providing a distributed
environment where each functionality is provided with separate resources to carry
out its operations.
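The difference between the two messaging styles can be sketched as follows; this is a generic illustration using Python's standard library, with invented service and event names, not the actual protocol stack used at Uber.

# Generic sketch of synchronous vs. asynchronous messaging between services.
import queue

# Synchronous (request/response): the caller blocks until the service replies.
def fare_service(request):
    return {"trip_id": request["trip_id"], "fare": 12.5, "currency": "USD"}

response = fare_service({"trip_id": 42})   # the caller waits for this result
print(response)

# Asynchronous (fire-and-forget): the caller publishes a message and moves on;
# a consumer processes it later, so no response is awaited.
events = queue.Queue()
events.put({"event": "trip_completed", "trip_id": 42})   # publish and continue

while not events.empty():
    print("consumer handling:", events.get())            # processed independently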
Data Handling
Each microservice owns a database that is independent of all the other databases; the service uses it to capture data and operates on the data through its own functionality. The database of each microservice is handled by its service API only, as shown in Fig. 4.
Static Content
After processing among the microservices is complete and they are done communicating, they send the static data response to a cloud-based storage service, which in turn delivers it to the clients using content delivery networks (CDNs).
Apart from the core components of the architecture, there are a few other significant components:
Management
This component deals with non-functional requirements such as load balancing,
distribution of work on separate nodes and identification of failed modules.
Service Discovery
Fig. 4 DB of microservice
Uber's Transition: Uber started to grow into a larger team of engineers, with multiple teams owning pieces of the stack, and they wanted to break free. Microservices would allow teams to be more autonomous and make their systems more flexible. So, Uber transitioned into a microservice architecture that would increase the reliability of systems, provide clear separation of concerns, define ownership and simplify deployments. This was the "service-oriented microservices architecture" (SOA).
However, as the team grew larger, they began to notice some issues associated with the complexity of this system. Root cause analysis became extremely difficult, as understanding the interdependencies between services became difficult. The codebase was now a set of black boxes that could show unexpected behaviours, and getting visibility into them needed the right tools, making debugging hard. When you transition from a monolithic codebase to microservices, it is important to make some critical infrastructure changes. Two of them are defining sensible contracts and communications.
tructure changes. Two of them are defining sensible contracts and communications.
With newer features added as individual microservices, it is important to set up
endpoints between each of them for communication and have well-defined contracts
for response. Uber observed that services providing REST or RPC endpoints offered
weak contracts, impacting the overall resilience of the system. They needed a more
standard way of communication that could provide type safety, validation and fault
tolerance. Uber found that Apache Thrift was one such tool that met their needs
best. It helps in building cross-language services: with the data types and interfaces defined in language-agnostic files, services written in different languages (Python, Node, etc.) could now communicate with each other. Thrift binds services to strict contracts, guaranteeing type safety. A contract is basically a set of rules that must be adhered to when interacting with a service; it describes how to call service procedures, what inputs to provide and what response to expect. A strict contract reduces the time spent figuring out how to communicate with a particular service. Deployment of services stayed simple even as a microservice evolved, since Thrift solved the problems of safety. To handle latency and fault tolerance, Uber wrote
libraries from the ground up, taking inspiration from libraries used in other companies such as Netflix's Hystrix. But their monolithic API had now turned into a distributed monolith. To build a simple feature, one had to work across multiple services, collaborating with the teams that owned them. Lines of service ownership blurred, and services that appeared independent had to be deployed together on changes. Once you adopt a microservice architecture there is no turning back; you need to adjust and adapt for the larger scheme of things.
Introduction to DOMA: Over the years, as Uber grew to run 2000+ microservices, they started experiencing the downsides of this complexity. The operational benefits were too good to be rejected or replaced, given the lack of alternatives in the market. Uber therefore came up with a more generalized approach that could find a fine balance between overall system complexity and the flexibility associated with a microservices architecture: the "domain-oriented microservices architecture" (DOMA).
Principles of DOMA
• Orientation towards domains
• Layer designs
• Well-defined gateways
• Mechanism to extend domains.
Uber’s Architecture
Domains: A logical grouping of one or more microservices representing a function-
ality forms a domain. In Uber, there are domains of map search services, fare services
and matching platforms with different gateways for each.
Layer Design: A layered design helps with separation of concerns and dependency management at scale.
They designed it keeping the failure blast radius and product specificity in mind, which means the bottom layers are the ones that have more dependencies and tend to have a larger blast radius, while also representing more general functionalities. A layer depends only on the layers below it, and functionality moves down the pyramid, i.e. from specific to generic.
Uber’s layer stack looks like this as shown in Fig. 6.
An API gateway is a way to decouple the client interface from the backend implementation. Microservices communicate through API calls (RPC or REST), so when a client makes a request, the gateway routes it to the right service and produces the expected response, keeping track of everything (Fig. 7).
Extensions
Extensions help when you might want to make use of the functionality of an under-
lying service, without affecting its implementation or reliability. Uber uses extensions
for both logic and data.
2 Conclusion
A microservices architecture like DOMA has shown positive signs at Uber, simplifying the developer experience and reducing system complexity. Product teams at Uber saw accelerated development, with a reduction in the time taken for code reviews, tests and planning. The onboarding time for a new feature was reduced by 25–50%. As Uber was stepping into offering newer services, adopting a microservices architecture helped them maintain platforms more easily and at lower expense. We think organizations that are trying to scale their teams and venture into offering more services to their customers can benefit the most from this design pattern.
References
1 Introduction
The current population of India is 1.3 billion as of November 2021, based on the Worldometer elaboration of the latest United Nations data. India's population is equivalent to 17.7% of the total world population. On average, the yearly growth rate is estimated to be 1%. At present, the urban population in India is 35.0% of the total population. The rural-to-urban migration, or urbanization trend, is expected to continue for the next 50 years in pursuit of better living. It is forecast that by 2050, the urban population will grow to 55%. Figure 1 shows the statistics of urban versus rural population in India [1].
Urbanization leads to increased usage of products and services. Hence, modern living demands more resources, resulting in increased pollution and waste generation. The gap between the increasing demand and the limited resources creates an alarming situation which needs to be addressed to ensure that future generations will also have sufficient resources. This calls for a paradigm shift toward building a new modern society based on an economic system which aims at eliminating waste while repeatedly reusing the limited resources [2].
The Circular Economy is a smart and innovative approach toward increasing
the economic benefits, thereby reducing the environmental damage and assuring
the efficient management of resources [3]. In this way, the life cycle of products
is extended. The goal of this Circular Economy concept is to decouple economic
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
N. R. Shetty et al. (eds.), Emerging Research in Computing, Information, Communication
and Applications, Lecture Notes in Electrical Engineering 928,
https://doi.org/10.1007/978-981-19-5482-5_58
Fig. 1 Urban versus rural population from 1955 to 2020. Source https://www.worldometers.info/
demographics/india-demographics/#urb
growth from resource consumption [4]. The Circular Economy attempts to increase
resource productivity, reduce energy consumption, and reduce greenhouse gas emis-
sions by using the 3R (Reduce, Reuse, Recycle) waste management method [5]. In
recent times, the concept of the Circular Economy has been gaining popularity in the field of sustainability due to the evident increase in the demand–supply imbalance [6]. Besides, roof-top solar power generation has become the order of the day, and regulatory bodies and the Government have encouraged it in a big way. Many office spaces with a relatively small number of floors have installed Solar Photovoltaic Systems (SPV) [7]. Earlier, the waste generated by discarded electrical machinery and huge metallic debris was scrapped and allowed to go as land use waste. This was nothing but
debris was scrapped and allowed to go as land use waste. This was nothing but
following the “Linear Economy Model” where no thought was given to recycling or
upcycling. Yet another upcoming area that is contributing to Circular Economy is
making scaled models from industry scrap. This involves converting used spare parts
into highly creative models/artifacts that can be displayed. Such models can be either
working models or non-working models. This activity that has picked up recently
contributes in a big way toward “land use waste minimization” [8]. The information
reported in the present work focuses on these aspects.
The Circular Economy is a model that has brought a change from the Linear Economy. It is a transition toward the utilization of renewable energy (non-conventional sources of energy) and the elimination of harmful chemicals and waste by implementing better design of products. Its prime emphasis is on reusing, recovering, remanufacturing, and regenerating products and materials after the lifetime of the product. India is rich in iron ore and bauxite, yet it is import-dependent for some of the rare earth metals
which are required for manufacturing Electrical and Electronic Equipment (EEE). Further, electrical and electronic equipment are vastly used in automobiles; they can be salvaged after their useful life and converted into artifacts of high aesthetic value. It is known that the cost incurred in extracting the raw materials is higher than the rate of their occurrence in crude form. So instead of investing in mining raw materials, it is recommended to use secondary raw materials which can be derived from e-waste. This alternative to mining raw materials leads to resource security and environmental sustainability. There is a need for modern tools/techniques/practices which focus on limiting the use of nonrenewable resources and explore the usage of secondary resources which can be recovered from waste, thus resulting in transparency and socio-economic and environmental benefits. This Circular Economy approach builds connections among the various stakeholders across the value chain.
Thus, in short, the Circular Economy can be defined as a system which comprises three aspects (3R), namely:
1. To Reduce the usage of nonrenewable and toxic materials, thereby transitioning toward utilizing renewable energy.
2. To Reuse products which are still in working condition by adopting better designs.
3. To Recycle the e-waste into new resources for further utilization [9].
In this paper, the Circular Economy concept in managing e-waste for Electrical and Electronic Equipment is highlighted. In recent years, there has been a paradigm shift in the thinking process of individuals, organizations, and leaders. This new shift has been toward the "Circular Economy", where at least a part of the discarded material is reused in some way (Fig. 2).
• Environment protection, waste reduction, and planned recycling are the major
benefits of a Circular Economy.
• CE creates employment opportunities and enhances the economic growth.
• Consumers benefit by adopting the Circular Economy in their purchases.
• Consumption of nonrenewable resources is reduced.
• Land use waste is minimized.
Our Planet Earth is very delicate, with limited natural resources and a balanced ecosystem. Due to urbanization, there is a huge demand for the creation of new products, for which the naturally available finite resources are exploited, and the used products are disposed of unsystematically. This is the "cradle-to-grave" approach, which is mainly consumption driven. With the increase in population, the gap between demand and supply grows, thereby increasing the pressure on the finite resources. Hence, awareness should be created to replace the linear economic model, which is oriented toward the take-make-dispose (cradle-to-grave) approach, with a Circular Economy model which adopts a cradle-to-cradle approach [4].
There is a transition in the socioeconomic and ecological balance, advancement in technology, rapid growth in population, and global awareness regarding the finite nature and rapid depletion of resources. The transformation from a Linear to a Circular Economy is expensive; to achieve a Circular Economy, financial incentives should be provided. The production process should be designed to obtain a better product with high performance. A reduction in per-unit production cost will not only increase production but also increase consumption levels and thereby override the environmental benefits and eco-effectiveness. Usage of the product must be made efficient throughout its life cycle. A proper technique should be devised for reusing the product before disposal. A few more solutions that can be thought of to achieve a Circular Economy include sharing resources, adapting products to a new purpose, reprocessing, categorizing the waste into organic and inorganic materials, and making use of renewable energy sources. Also, a fundamental approach to the Circular Economy is to stop producing surplus and redundant products. However, it is observed that recycling products is the most practiced solution to attain a Circular Economy.
A proper framework must be designed to promote the concept of the Circular Economy at both the national and the international level. Dependency on inter-sectoral and inter-organizational changes can be avoided by restructuring societal and institutional policies. Awareness must be created about the new tradition of consumption.
the appropriate processes to guarantee that their findings and recommendations are
implemented effectively [10].
Even end-of-life goods, recyclable materials, and wastes are among the priority
areas, which either continue to represent significant issues or are emerging as new
difficulty areas that must be addressed holistically.
In the coming years, India's vision should be in the direction of a steady shift toward renewable energy sources and a reduction of carbon emissions. However, this transition may not be easy, as many challenges such as the optimization of energy saving and the reduction in energy demand must be met. Modern solar energy systems and materials are a key factor in addressing these challenges. The Circular Economy concept can be used to tackle the critical situation of recycling the large mass of PV waste [7].
Recycling processes for crystalline silicon (c-Si) modules lead to cost savings, thereby resulting in sustainability of the supply chain in the long run. This is expected to influence the recovery of energy and embedded materials, while lowering CO2 emissions and the energy payback time for the PV industry.
they are reaching their end of life is also increasing. By 2050, this figure is expected
to touch 5.5–6 million tons. A typical generic process for recycling crystalline silicon
solar module is shown in Fig. 4.
5 Conclusion
References
1. https://www.worldometers.info/demographics/india-demographics/#urb
2. D’Amato D, Korhonen J (2021) Integrating the green economy, circular economy and bioe-
conomy in a strategic sustainability framework. J Ecol Econ 188. https://doi.org/10.1016/j.eco
lecon.2021.107143
3. Barrie J, Schröder P (2021) Circular economy and international trade: a systematic literature
review. Circ Econ Sustain. https://doi.org/10.1007/s43615-021-00126-w
4. Goyal S, Kapoor A, Esposito M (2016) Circular economy business models in developing
economies—lessons from India on reduce, recycle, and reuse paradigms. Thunderbird Int Bus
Rev 60(5):729–740. https://doi.org/10.1002/tie.21883
5. Ramakrishna S (2020) Circular economy and sustainability pathways to build a new-modern
society. Dry Technol. https://doi.org/10.1080/07373937.2020.1758492
6. Circular economy in electronics and electrical sector. Ministry of Electronics and Information
Technology, Government of India, New Delhi
7. Circular economy: recent trends in global perspective (2021) Springer Science and Business
Media LLC
8. Yi S, Lee H, Lee J, Kim W (2019) Upcycling strategies for waste electronic and electrical
equipment based on material flow analysis. Environ Eng Res. https://doi.org/10.4491/EER.
2018.092
9. Brenner W, Adamovic N (2020) Creating sustainable solutions for photovoltaics. In: 2020 43rd
international convention on information, communication and electronic technology (MIPRO),
pp 1777–1782. https://doi.org/10.23919/MIPRO48935.2020.9245369
10. Brenner W, Adamovic N (2016) The European project solar design illustrating the role of
standardization in the innovation system. In: 2016 39th international convention on information
and communication technology electronics and microelectronics (MIPRO)
Analysis and Evaluation
of Pre-processing Techniques for Fault
Detection in Thermal Images of Solar
Panels
1 Introduction
2 Previous Work
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
N. R. Shetty et al. (eds.), Emerging Research in Computing, Information, Communication
and Applications, Lecture Notes in Electrical Engineering 928,
https://doi.org/10.1007/978-981-19-5482-5_59
[1–11]. Wahab et al. [12] have carried out a comparative evaluation of image filtering and contrast stretching of the individual channels of color thermal images. The effectiveness of the individual filters is evaluated using MSE and PSNR. Three image filtering techniques, namely the median filter, the Gaussian filter and the Wiener filter, are used along with contrast stretching. The performance of four filters, median, mean, Wiener, and adaptive, is compared in [13] on the basis of quantitative parameters such as PSNR and RMSE on MR images. In [14], existing denoising algorithms such as wavelet-based approaches and filtering approaches are reviewed, and their performance is compared against bilateral filters. Various noise models are used to describe multiplicative and additive noise in an image. The results on the degraded noisy images are analyzed in terms of MSE, PSNR, and the universal quality index (UQI).
The bilateral filter is a type of nonlinear filter proposed by Tomasi and Manduchi [15]. This technique reduces additive noise and smoothens the image while preserving edges. A weighted sum of the local neighborhood pixels is considered, with weights that depend on both the intensity difference and the spatial distance. Hence, noise is averaged out while edges are well preserved. The bilateral filter's output at a specific pixel location x is computed mathematically as shown in Eq. 1.
\tilde{I}(x) = \frac{1}{C} \sum_{y \in N(x)} e^{-\frac{(y - x)^2}{2\sigma_d^2}} \, e^{-\frac{(I(y) - I(x))^2}{2\sigma_r^2}} \, I(y) \quad (1)
Mean filtering is an easy and simple method frequently used for smoothing and noise reduction in images. It works by decreasing the amount of intensity difference between neighboring pixels. The method replaces each pixel value with the average, or mean, value of its neighbors, including itself, thereby eliminating pixel values which are not representative of their neighborhood. The filter is based on a kernel, like a convolution filter, whose shape and size define the neighborhood used during the mean calculation. For average smoothing a 3 × 3 square mask is most common, but for more severe smoothing, larger masks such as 5 × 5 squares can be used [3].
Many graphics software packages use the Gaussian filter to reduce image noise by blurring the image. The smoothing 2D convolution operator, i.e., the Gaussian operator, is used to blur images and remove noise. The probability distribution of the noise is defined by the Gaussian function. Image structures can be enhanced by Gaussian smoothing [16]. In 2D, the circularly symmetric Gaussian has the form:
smoothing [16]. In 2D, circularly symmetric Gaussian has the form:
1 − x 2 +y2 2
G(x, y) = e 2σ (2)
2π σ 2
The median filter preserves useful detail in the image better than the mean filter by considering the nearby neighbors. Each pixel is replaced with the median of the neighboring pixels: all the neighborhood pixel values are arranged in numerical order, and the pixel under consideration is replaced with the midpoint value. High spatial frequency details are passed while the filter remains highly effective at removing noise from images. For images corrupted with Gaussian noise, median filtering may not be very effective at removing the noise. Also, it is relatively costly and complicated to compute: all the values in the neighborhood are sorted, which makes processing very slow [3].
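As a brief illustration of how the five filters compared in this work can be applied in practice, the sketch below uses OpenCV and SciPy on a grayscale thermal image; the file name and the filter parameters are placeholders chosen for illustration, not the settings used in the experiments.

# Sketch: apply the five compared filters to a grayscale thermal image.
# The file name and parameter values are illustrative placeholders.
import cv2
import numpy as np
from scipy.signal import wiener

img = cv2.imread('thermal_sample.png', cv2.IMREAD_GRAYSCALE)

bilateral = cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75)
gaussian  = cv2.GaussianBlur(img, (5, 5), sigmaX=1.0)
median    = cv2.medianBlur(img, 5)                   # 5 x 5 neighborhood
mean      = cv2.blur(img, (3, 3))                    # 3 x 3 averaging mask
wienered  = wiener(img.astype(np.float64), (5, 5))   # Wiener filter from SciPy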
where MN is the total number of pixels in the image, and n_k denotes the number of pixels with intensity r_k. The discrete form of the transformation for histogram equalization is
s_k = T(r_k) = (L - 1) \sum_{j=0}^{k} p_r(r_j), \qquad k = 0, 1, 2, \ldots, L - 1 \quad (4)
where L is the number of gray levels in the image (256 for an 8-bit image). The equalized image is obtained by using Eq. 4 to map every pixel in the input image with intensity r_k to a corresponding pixel with level s_k in the output image. This process is called histogram equalization or the histogram linearization transformation [17–22].
used clip limit value is between 3 and 4. The obtained histogram is the actual histogram of the recorded intensities centered at the pixel under consideration, but it is clipped at a particular height, with the clipped pixels redistributed uniformly across all intensities in the range of the recorded image. Due to the clipping, the amplification of noise in comparatively homogeneous areas of the image is reduced by limiting the highest possible level of contrast enhancement [21, 23–25].
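A short sketch of global histogram equalization (the transform of Eq. 4) and of CLAHE using OpenCV follows; the clip limit and tile grid size are illustrative values of the kind discussed above, not values prescribed by this paper.

# Sketch: global histogram equalization and CLAHE on a grayscale image.
import cv2

gray = cv2.imread('thermal_sample.png', cv2.IMREAD_GRAYSCALE)

# Global histogram equalization (the discrete transform of Eq. 4)
he = cv2.equalizeHist(gray)

# Contrast-limited adaptive histogram equalization with a clip limit of 3
clahe = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(8, 8))
clahe_img = clahe.apply(gray)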
In this method, an input image is initially decomposed into two parts based on the mean of the input image. The first part consists of the samples less than or equal to the mean, whereas the other consists of the samples greater than the mean. Then, this method equalizes the two parts of the image independently based on their respective histograms, with the restriction that the samples in the first set are mapped between the minimum gray level and the mean, and the samples in the second set are mapped between the mean and the maximum gray level. So, one sub-section of the image is equalized up to the mean value and the other sub-section is equalized from the mean, based on the respective histograms. The resulting equalized sub-images are bounded by each other around the input mean, due to which the mean brightness is preserved [26].
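The decomposition and independent equalization described above can be sketched in a few lines of NumPy, assuming an 8-bit grayscale image; this is a simplified illustration of the BBHE idea, not the exact implementation evaluated in this work.

# Simplified BBHE sketch for an 8-bit grayscale image (NumPy only).
import numpy as np

def bbhe(img):
    imgi = img.astype(np.int32)          # work in int to avoid uint8 edge cases
    mean = int(imgi.mean())
    out = np.zeros_like(img)
    # Equalize the lower part onto [0, mean] and the upper part onto [mean+1, 255]
    for lo, hi in ((0, mean), (mean + 1, 255)):
        mask = (imgi >= lo) & (imgi <= hi)
        if not mask.any():
            continue
        vals = imgi[mask]
        hist, _ = np.histogram(vals, bins=hi - lo + 1, range=(lo, hi + 1))
        cdf = hist.cumsum() / hist.sum()     # cumulative distribution of this part
        out[mask] = (lo + np.round(cdf[vals - lo] * (hi - lo))).astype(img.dtype)
    return out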
The basic concept behind DSIHE and BBHE is similar. In DSIHE the histogram is separated at the gray level whose cumulative probability density equals 0.5, instead of at the mean as in BBHE. In DSIHE, the original image is separated into two equal-area sub-sections based on its gray level probability density function. The two sub-sections are equalized individually, and then the equalized sub-sections are composed into one image. This algorithm enhances the visual information in an image. It also prevents a big shift of the average luminance of the original image [27].
This algorithm uses the notion of smoothing a global image histogram using Gaussian
kernel. After this valley regions are segmented for dynamic equalization. BPDFHE
[28, 29] handles the image histogram in such a way that no remapping of the histogram
peaks is required; only a redistribution of the gray-level values in the valley portions between two successive peaks takes place. It consists of the following working steps:
(A) Fuzzy Histogram Computation.
(B) Partitioning of the Histogram.
(C) Dynamic Histogram Equalization of the Partitions.
(D) Normalization of the Image Brightness.
5 Performance Measures
Consider an image of size A × B, where x(i, j) is the original image and y(i, j) is the noisy image. Based on this, different measurement parameters can be defined as below:
PSNR is represented as the ratio of maximum power in the image to the corrupted
noise in the image [16]. The unit of PSNR is dB (decibels). If the PSNR value is
higher, then the quality of the filtered image will be good. The formula to calculate
PSNR is
\mathrm{PSNR} = 10 \log_{10}\left(\frac{R^2}{\mathrm{MSE}}\right) \quad (5)
It is defined as:
\mathrm{MSE} = \frac{1}{A \cdot B} \sum_{i=1}^{A} \sum_{j=1}^{B} \left[x(i, j) - y(i, j)\right]^2 \quad (6)
SNR (or S/N) is the ratio of the signal power to the noise power; it compares the signal level with the noise level. It is given by
AMBE is described as the absolute difference between the input image mean and the output image mean [22]. It can be calculated as:
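Since the SNR and AMBE formulas themselves are not reproduced above, a compact sketch of all the quality measures used in this work is given here, assuming 8-bit images (peak value R = 255) and using scikit-image only for SSIM; the function names are ours, not from the paper.

# Sketch of the quality measures for 8-bit images x (reference) and y (processed).
import numpy as np
from skimage.metrics import structural_similarity

def mse(x, y):
    return np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)       # Eq. 6

def psnr(x, y, peak=255.0):
    m = mse(x, y)
    return float('inf') if m == 0 else 10 * np.log10(peak ** 2 / m)          # Eq. 5

def snr(x, y):
    noise = x.astype(np.float64) - y.astype(np.float64)
    return 10 * np.log10(np.sum(x.astype(np.float64) ** 2) / np.sum(noise ** 2))

def ambe(x, y):
    return abs(float(x.mean()) - float(y.mean()))     # absolute mean brightness error

def ssim(x, y):
    return structural_similarity(x, y, data_range=255)    # SSIM in [0, 1]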
In this research, a total of 1487 thermal images of solar panels containing hotspots of various categories are considered. These images are captured under different environmental conditions and at different sites. Some sample images are shown in Fig. 1. A sample thermal color image and the images after application of various filters are shown in Fig. 2. Image quality is affected by noise during thermal image capture. Various filters are applied on thermal images with different faults. Performance analysis of the different filters is done using statistical parameters such as PSNR, SNR, MSE, and SSIM.
A. MSE: Table 1 shows MSE values for various images after application of five
filters on five thermal images belonging to five different faults. Graphical repre-
sentation of the same is shown in Fig. 3. Bilateral filter gives minimum values
for mean square error for the sample images tested.
B. PSNR: For good filtering effect, the values of PSNR should be high. The PSNR
values along with graphical representation are as shown in Table 2 and Fig. 4,
respectively. Bilateral filter gives highest PSNR values.
C. SNR: Signal-to-noise ratio should be high after application of filters. The SNR
values are shown in Table 3 along with graphical representation in Fig. 5. From
the values of SNR, bilateral filter gives good results.
D. SSIM: SSIM is used for measuring the resemblance between two images. SSIM
values range between 0 and 1; one indicating highest similarity. SSIM values
Fig. 2 Thermal images a original image, b bilateral filtered image, c Gaussian filtered image, d
median filtered image, e Wiener filtered image, f mean filtered image
after application of five different filters are shown in Table 4 along with graphical
representation in Fig. 6. Bilateral filter gives highest values for SSIM.
Comparing the MSE, PSNR, SNR and SSIM values, it can be concluded that the bilateral filter performs well in all the cases for noise removal. It gives higher values of PSNR, the minimum value of MSE and the highest value of SSIM. An SSIM value closer to 1 indicates that the image quality is very good after filtering.
Fig. 3 MSE values for the five sample images (IMG1–IMG5) after application of the bilateral, Wiener, mean, Gaussian and median filters
Fig. 4 PSNR values for the five sample images after application of the five filters
Fig. 5 SNR values for the five sample images after application of the five filters
Fig. 6 SSIM values for the five sample images after application of the five filters
After filtering the images with the bilateral filter, five different histogram equalization techniques are applied to these images.
The sample original thermal color and bilateral filtered images, along with the gray scale image and histogram, are shown in Fig. 7. On the bilateral filtered image, the various histogram equalization techniques are applied as shown in Fig. 8. These histogram techniques are applied to all the available thermal images, and the HE technique providing good quality is selected as one of the pre-processing techniques.
The performance of the various histogram equalization techniques is evaluated using the PSNR, AMBE, and SSIM values. As shown in Table 5 and Fig. 9, BPDFHE gives the highest values of PSNR. Higher PSNR values indicate good image quality.
As per the results shown in Table 6, it may be observed that BPDFHE gives the least AMBE values. Looking at the average AMBE values shown in the last row of the table, BPDFHE gives the minimum value compared to the other methods. The graphical representation is shown in Fig. 10.
SSIM values for ten different types of thermal images are shown in Table 7 along with the graph in Fig. 11. SSIM assesses image quality and ranges from 0 to +1; +1
Fig. 7 Original thermal and gray image for one type of fault
Fig. 9 PSNR values for the ten sample images (IMG1–IMG10) after application of the histogram equalization techniques
Fig. 10 AMBE values for the ten sample images (IMG1–IMG10) after application of the histogram equalization techniques
indicates that two images are very similar or the same. From the calculated SSIM values, it can be stated that for the given sample images, BPDHE gives the maximum value, indicating that the original and histogram-equalized images are more similar.
7 Conclusion
Noise removal techniques are essential in any image processing application. Several filtering techniques have been analyzed and their performance assessed for solar panel thermal images using parameters such as SNR, PSNR, MSE, and SSIM. From the obtained results, it can be noted that the bilateral filter gives excellent results for noise removal, with the highest values of SSIM and the lowest values of MSE for the solar panel thermal images. Once the filter is finalized, the next step is histogram equalization of the images to increase the intensity variation. Analysis of five
Fig. 11 SSIM values for the ten sample images (IMG1–IMG10) after application of the histogram equalization techniques
Acknowledgements Authors would like to thank PV Diagnostics Ltd. Mumbai for providing
thermal images of solar panel.
References
1. Prasad V, Gopal R (2016) LHM filter for removal salt and pepper with random noise in images.
Int J Comput Appl 139:9–15. https://doi.org/10.5120/ijca2016908962
2. Yadav RB, Srivastava S, Srivastava R (2017) Identification and removal of different noise
patterns by measuring SNR value in magnetic resonance images. In: 2016 9th international
conference on contemporary computing, IC3 2016, pp 9–13. https://doi.org/10.1109/IC3.2016.
7880212
3. Tania S, Rowaida R (2016) A comparative study of various image filtering techniques for
removing various noisy pixels in aerial image. Int J Signal Process Image Process Pattern
Recognit 9:113–124. https://doi.org/10.14257/ijsip.2016.9.3.10
4. Khetkeeree S, Thanakitivirul P (2020) Hybrid filtering for image sharpening and
smoothing simultaneously. In: ITC-CSCC 2020—35th international technical conference on
circuits/systems, computers and communications, pp 367–371
5. Isa IS, Sulaiman SN, Mustapha M, Darus S (2015) Evaluating denoising performances of
fundamental filters for T2-weighted MRI images. Procedia Comput Sci 60:760–768. https://
doi.org/10.1016/j.procs.2015.08.231
6. Hoshyar AN, Al-Jumaily A, Hoshyar AN (2014) Comparing the performance of various filters
on skin cancer images. Procedia Comput Sci 42:32–37. https://doi.org/10.1016/j.procs.2014.
11.030
7. Srivastava C et al (2013) Performance comparison of various filters and wavelet transform for
image de-noising. IOSR J Comput Eng 10:55–63. https://doi.org/10.9790/0661-01015563
8. Janaki K, Madheswaran M (n.d.) Performance analysis of different filters with various noises
in preprocessing of images. Int J Adv Netw Appl 372–376
9. Kumar MP, Murthy PHST, Kumar PR (2011) Performance evaluation of different image
filtering algorithms using image quality assessment. Int J Comput Appl 18:20–22. https://
doi.org/10.5120/2289-2972
10. Dwivedy P, Potnis A, Soofi S, Giri P (2018) Performance comparison of various filters for
removing different image noises. In: International conference on recent innovations in signal
processing and embedded systems, RISE 2017, Jan 2018, pp 181–186. https://doi.org/10.1109/
RISE.2017.8378150
11. Varghese J (2013) Literature survey on image filtering techniques. Int J Comput Appl Technol
Res 2:286–288. https://doi.org/10.7753/ijcatr0203.1014
12. Wahab AA, Salim MIM, Yunus J, Ramlee MH (2018) Comparative evaluation of medical
thermal image enhancement techniques for breast cancer detection. J Eng Technol Sci 50:40–52
13. Garg S, Vijay R, Urooj S (2019) Statistical approach to compare image denoising techniques in
medical MR images. Procedia Comput Sci 152:367–374. https://doi.org/10.1016/j.procs.2019.
05.004
14. Paudel S, Rijal R (2015) Performance analysis of spatial and transform filters for efficient
image noise reduction
15. Tomasi C, Manduchi R (1998) Bilateral filtering for gray and color images. In: IEEE
international conference on computer vision. https://doi.org/10.1677/joe.0.0930177
16. Umamaheswari D, Karthikeyan E (2019) Comparative analysis of various filtering techniques
in image processing. Int J Sci Technol Res 8:109–114
17. Zeng M, Li Y, Meng Q, Yang T, Liu J (2012) Improving histogram-based image contrast
enhancement using gray-level information histogram with application to X-ray images. Optik
(Stuttg) 123:511–520. https://doi.org/10.1016/j.ijleo.2011.05.017
18. Akila K, Jayashree LS, Vasuki A (2015) Mammographic image enhancement using indirect
contrast enhancement techniques—a comparative study. Procedia Comput Sci 47:255–261.
https://doi.org/10.1016/j.procs.2015.03.205
19. Cheng HD, Shi XJ (2004) A simple and effective histogram equalization approach to image
enhancement. Digit Signal Process 14:158–170. https://doi.org/10.1016/j.dsp.2003.07.002
20. Lu L, Zhou Y, Panetta K, Agaian S (2010) Comparative study of histogram equalization
algorithms for image enhancement. In: Mobile multimedia/image processing, security, and
applications 2010, vol 7708, pp 770811-1–770811-11. https://doi.org/10.1117/12.853502
21. Suryavamsi RV, Reddy LST, Saladi S, Karuna Y (2018) Comparative analysis of various
enhancement methods for astrocytoma MRI images. In: Proceedings of the 2018 IEEE inter-
national conference on communication and signal processing ICCSP 2018, vol 1, pp 812–816.
https://doi.org/10.1109/ICCSP.2018.8524441
22. Senthilkumaran N, Thimmiaraja J (2014) Histogram equalization for image enhancement using
MRI brain images. In: Proceedings of 2014 world congress on computing and communication
technologies WCCCT 2014, pp 80–83. https://doi.org/10.1109/WCCCT.2014.45
23. Pizer SM, Johnston RE, Ericksen JP, Yankaskas BC, Muller KE (1990) Contrast-limited adap-
tive histogram equalization: speed and effectiveness. In: Proceedings of the first conference on
visualization in biomedical computing, pp 337–345. https://doi.org/10.1109/vbc.1990.109340
24. Gupta S, Gupta R, Singla C (2017) Analysis of image enhancement techniques for astrocytoma
MRI images. Int J Inf Technol 9:311–319. https://doi.org/10.1007/s41870-017-0033-8
25. Raj D, Mamoria P (2016) Comparative analysis of contrast enhancement techniques on different
images. In: Proceedings of 2015 international conference on green computing and internet of
things, ICGCIoT 2015, pp 27–31. https://doi.org/10.1109/ICGCIoT.2015.7380422
26. Kim YT (1997) Contrast enhancement using brightness preserving bi-histogram equalization.
IEEE Trans Consum Electron 43:1–8. https://doi.org/10.1109/30.580378
27. Wang Y, Chen Q, Zhang B (1999) Image enhancement based on equal area dualistic sub-image
and non-parametric modified histogram equalization method. IEEE Trans Consum Electron
45:68–75. https://doi.org/10.1109/30.754419
1 Introduction
The thyroid is an endocrine (ductless) gland of the endocrine system, located in the front of the neck between the collar bones and below the Adam's apple (the larynx). Endocrine glands are glands of the endocrine system that secrete their products, the hormones, directly into the blood rather than through a duct. The major glands of the endocrine system include the thyroid, hypothalamus, pituitary, pineal gland, pancreas, testes, ovaries, and adrenal glands.
G. Drakshaveni (B)
Department of MCA, BMSIT and Management, Bengaluru, India
e-mail: [email protected]
P. N. Hamsavath
Department of MCA and Advisor—Foreign Students, NMIT, Bengaluru, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
N. R. Shetty et al. (eds.), Emerging Research in Computing, Information, Communication
and Applications, Lecture Notes in Electrical Engineering 928,
https://doi.org/10.1007/978-981-19-5482-5_60
The thyroid gland produces the hormones T4 and T3, which affect different organs and functions of the human body: sleep, metabolic rate, bone, hair, brain, heart, skin and nails, intestines, muscles, and temperature regulation.
Thyroid gland disorders are hyperthyroidism (high levels of thyroid hormone), hypothyroidism (low levels of thyroid hormone), thyroid nodules (growth/enlargement of the thyroid, i.e., goiter), and thyroid cancer.
Tests done for the thyroid gland are: a blood test to identify thyroid stimulating hormone (TSH), Free T3, Free T4 and Total T4 (protein-bound); antibody tests (thyroid peroxidase, thyroid-stimulating immunoglobulin, thyroglobulin); ultrasound of the thyroid; and a radioactive scan (of the activity of the thyroid gland), which is also used for the treatment of hyperthyroidism and of thyroid cancer.
Procedure for hormone generation in the human body: In the hypothalamic–pituitary–thyroid (HPT) axis, the hypothalamus sends positive signals to the pituitary gland to produce thyroid-stimulating hormone, and the pituitary gland in turn sends positive signals to the thyroid gland to release the T4 and T3 hormones directly into the bloodstream. In turn, the peripheral tissues send negative feedback signals to the hypothalamus and pituitary gland to stop further hormone generation.
Radioactive iodine scans for hyperthyroidism are done with radioactive iodine (I-123 or I-131); normally, about 5 h after the injection, a scan is done to check the thyroid gland.
Usually, pregnant women are diagnosed with hyperthyroidism by a blood test, since RAI scanning cannot be used during pregnancy. A pregnant patient's lab work is kept slightly in the upper limit of normal (because, if controlled too tightly, it can cause hypothyroidism in the baby).
Hypothyroidism: Hypothyroidism is a condition in which the thyroid gland is underactive and produces less than the normal amount of thyroid hormones. Hypothyroidism slows down body functions. It is most often permanent, but sometimes it can be temporary. Studies have shown that 10% of women and 3% of men are hypothyroid. Initially, the gland can secrete more hormones to compensate but is not able to keep up.
Hypothyroidism symptoms: Hypothyroidism symptoms include a puffy face, dry itchy skin, constipation, drowsiness, dry and brittle hair, forgetfulness, difficulty with learning (in children), sore muscles, increased frequency of miscarriages, heavy and/or irregular menstrual periods, and increased sensitivity to many medications.
Causes of hypothyroidism: Congenital hypothyroidism is one type of hypothyroidism in which an infant is born with inadequate thyroid tissue or an enzyme defect. If not treated adequately, it can lead to physical stunting and mental damage; one in every 3000–4000 babies is diagnosed with hypothyroidism.
may lead to anemia, premature birth, and miscarriage can occur in pregnant women
[2].
Thyroid nodules: Thyroid nodules occur in 5% of women and 1% of men and increase in frequency with age. When a nodule is found, cancer needs to be ruled out; 95–97% are not cancerous. Usually there are no symptoms, and nodules are found on a routine physical exam, or sometimes on ultrasound, MRI or CT of the spine or chest, or on a PET scan.
Thyroid nodule symptoms: Most of the time there are no symptoms; sometimes there is difficulty in swallowing or difficulty in breathing, and sometimes hoarseness of voice if the nodule presses on the nerve which supplies the voice box. It can even affect cosmetic appearance.
Thyroid nodule treatment: There is no medical treatment for thyroid nodules; if they cause compression symptoms, then surgery is the only option.
Thyroid cancer: The chance of being diagnosed with thyroid cancer has increased three-fold in the last 30 years, due to the use of thyroid ultrasound and the detection of smaller cancers at a younger age than most adult cancers. Three out of four cases are women. Women in their 40s and 50s are more prone to thyroid cancer, while for men the chance is higher in their 60s and 70s; the death rate has been steady for many years. It can be hereditary in the case of medullary thyroid cancer. Having a first-degree relative such as a parent, sibling, or child with thyroid cancer increases the risk, and a diet low in iodine can increase the risk of follicular and papillary thyroid cancer. Radiation, particularly head and neck radiation in childhood, is a proven risk factor.
Thyroid cancer signs and symptoms: A lump in the neck, sometimes growing quickly; swelling in the neck; pain in the front of the neck, sometimes extending up to the ears; hoarseness or other voice changes that do not go away; trouble swallowing; trouble breathing; and a persistent cough that is not due to a cold.
Thyroid cancer diagnosis and types: Thyroid cancer is diagnosed by ultrasound and biopsy.
Types of thyroid cancer: The thyroid cancer types are papillary thyroid cancer, which is the most common; follicular thyroid cancer, the second most common; Hürthle cell cancer; medullary thyroid cancer; and anaplastic thyroid cancer, which has poor outcomes.
RAI scan and treatment: RAI is used after surgery to diagnose any residual disease and also, if needed, after the treatment. PET scans are used to find whether the cancer has spread if the cancer does not take up iodine [3].
Thyroid cancer treatment: Surgery is the main treatment for any type of thyroid cancer. Radioactive iodine can be used if patients are in the intermediate to high risk category. Preparation for RAI diagnosis or treatment can be done by stopping the thyroid medications or by using a medication that stimulates the thyroid. All patients need to be on thyroid replacement, with the dose adjusted based on TSH depending on the risk of the patient, and neck ultrasound is done periodically to check whether there is any recurrence [4].
2 Thermogram Images
Due to COVID-19 screening, each person's skin temperature is checked before they enter a building. It is an easy and efficient way of helping to reduce the risk of spreading the virus. FLIR is among the best-known thermal imaging cameras.
Procedure for the Thermal Imaging Camera
The first person to discover infrared light was Sir Frederick William Herschel, who passed sunlight through a prism and observed the seven colors coming out of the prism [5].
He then measured the temperature of each of the colors and found that the temperature was lowest at the blue/violet end and increased as he moved toward the red end, where it was highest. He then measured beyond the red color, where there is no visible light, and found a temperature even higher than that of red. He was thus the first person to discover the infrared region, with a wavelength of about 1050, which is invisible to the human eye.
Later, Max Planck gave the mathematical equation E = hν, where h is Planck's constant, ν is the frequency of the radiation and E is the energy.
Wien's law formula
The equation describing Wien's law is very simple: λmax = b/T, where λmax is the peak wavelength of the emitted light, T is the absolute temperature of the black body, and b = 2.8977719 mm·K is Wien's displacement constant [7].
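As a quick worked example of Wien's law, the peak emission wavelength of a black body at roughly human skin temperature can be computed as follows; the temperature value is illustrative.

# Wien's law: peak wavelength (in metres) of a black body at temperature T (kelvin).
b = 2.8977719e-3   # Wien's displacement constant, m*K (i.e. 2.8977719 mm*K)

def peak_wavelength(T):
    return b / T

T_skin = 305.0                          # roughly 32 degrees C, illustrative value
print(peak_wavelength(T_skin) * 1e6)    # about 9.5 micrometres: long-wave infrared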
Any camera has three components: a main lens (glass), a sensor, and an electronic processing unit. In a thermal camera, the lens is a germanium lens (band gap 0.7 eV), the sensors are made of indium, gallium and arsenic, and there is an image processing unit.
Types of thermal imaging cameras: They are the un-cooled and the cooled thermal imaging cameras.
3 Result
(Bar chart titled "Hypothyroid": number of records in the thyroid data versus negative hypothyroid cases, on a scale of 0–4000.)
# summarize the result and plot the training and test loss
698 G. Drakshaveni and P. N. Hamsavath
import matplotlib.pyplot as plt

plt.plot(result.history['loss'])      # training loss per epoch
plt.plot(result.history['val_loss'])  # validation (test) loss per epoch
# Set the plot parameters
plt.title('Deep learning model loss')
plt.ylabel('Loss')
plt.xlabel('Epochs')
plt.legend(['train', 'test'], loc='upper right')
# Display the plot
plt.show()
# Build the model
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
# Hidden layer (24 input features)
model.add(Dense(64, kernel_initializer='uniform', input_dim=24, activation='relu'))
# Output layer (binary classification)
model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
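For completeness, a hedged sketch of how this model could be compiled and trained to produce the result history object used in the plotting code above is given below; the synthetic data, optimizer, epoch count, and batch size are illustrative assumptions, not settings reported in the paper.

# Sketch only: compile and train the model so that result.history
# contains the 'loss' and 'val_loss' curves plotted above
# (synthetic data and hyperparameters are assumptions)
import numpy as np

X = np.random.rand(500, 24)             # 24 features, matching input_dim=24 above
y = np.random.randint(0, 2, size=500)   # binary hypothyroid / negative labels

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
result = model.fit(X, y, validation_split=0.2, epochs=50, batch_size=32, verbose=0)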
4 Conclusion
The thyroid gland is very important for a human being to remain healthy. If the gland's secretion of hormones increases, the person becomes hyperthyroid; if its secretion of hormones decreases, the person becomes hypothyroid. It is therefore very important for any human being to be health conscious. Designing better image enhancement techniques, which will be taken up in future work, will aid in detecting and segmenting the thyroid more efficiently.
References
1. Shi W, Zhuang X, Wang H, Duckett S, Luong DV, Tobon-Gomez C, Tung K, Edwards PJ,
Rhode KS, Razavi RS et al (2012) A comprehensive cardiac motion estimation framework
using both untagged and 3-D tagged MR images based on nonrigid registration. IEEE Trans
Med Imaging 31(6):1263–1275
2. Fa Y, Mendis S (2014) Global status report on non communicable diseases 2014. World Health
Organization reports
3. Sathish D, Kamath S, Rajagopal KV, Prasad K (2016) Medical imaging techniques and
computer-aided diagnostic approaches for the detection of breast cancer with an emphasis
on thermography—a review. Int J Med Eng Inform
4. Lustig M, Donoho DL, Santos JM, Pauly JM (2008) Compressed sensing MRI. IEEE Signal
Process Mag 25:72–82
5. Liang D, DiBella EV, Chen RR, Ying L (2012) k-t ISD: dynamic cardiac MR imaging using
compressed sensing with iterative support detection. Magn Reson Med 68:41–53
6. Axel L, Montillo A, Kim D (2005) Tagged magnetic resonance imaging of the heart: a survey.
Med Image Anal 9(4):376–393; Lustig M, Donoho DL, Pauly JM (2007) Sparse MRI: the
application of compressed sensing for rapid MR imaging. Magn Reson Med 58:1182–1195
7. www.healthand.com
8. Zhao B, Haldar JP, Christodoulou AG, Liang ZP (2012) Image reconstruction from highly
undersampled (k, t)-space data with joint partial separability and sparsity constraints. IEEE
Trans Med Imaging 31:1809–1820
9. www.patientmemoirs.com
10. www.allbreed.net
11. www.data.conferecneworld.com
12. www.adclinic.com
13. www.cancer.org
14. www.endocrine.org
15. www.hopetbi.com
16. https://en.wikipedia.org/wiki/Infrared
17. https://en.wikipedia.org/wiki/Max_Planck
18. www.omnicalculator.com/physics/wiens-law
19. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document
recognition. Proc IEEE 86(11):2278–2324
20. Ramos-Llordén G, den Dekker AJ, Sijbers J (2017) Partial discreteness: a novel prior for
magnetic resonance image reconstruction. IEEE Trans Med Imaging 36(5):1041–1053
21. Salem N, Malik H, Shams A (2019) Medical image enhancement based on histogram algo-
rithms. Procedia Comput Sci 163:300–311. In: 16th international learning & technology
conference
The AgroCart Android Application
to Manage Agriculture System
1 Introduction
India is a country where a large share of the population is engaged, directly or indirectly, in the agriculture business. The production of food and raw materials in agriculture ultimately sustains the population, and the demand for food production technology is continuously increasing. Although much of India's population is devoted to agriculture, we are still unable to make the best use of it, so there is a need to review the mechanisms for improving the technology. Scarcity of rain, overproduction of crops in the market, limited irrigation facilities, poor water management, and the choice of seeds are among the foremost reasons. The majority of Indian farmers, including small-scale producers, are often unable to access the information that could increase the yield of their crops and
lead to a better price for their crops. Marketing and selling crops at a fair price with a reasonable profit is the most important problem faced by farmers, largely because of a lack of pricing information. The widespread network of mobile phones could be the game-changer for this problem. The main purpose of this work is to develop smartphone-based solutions that help in farm management, which improves agricultural revenue and also helps conserve agricultural farms. Agro Cart is an Android-based application developed to maximize profits for farmers. The application acts as a bridge between the farmer and the customer, helping buyers and growers in such a way that neither of them has to compromise.
out his product and have to pay according to his needs. This application is like a
bridge between the farmer and the customer.
(iii) Blogs
Many farmers lack expert guidance and are totally dependent on shops for fertilizer advice. To overcome this, Agro Cart guides farmers by providing proper information on which fertilizers to use to get a better crop yield. In the Blogs module, farmers can get information from experts, along with YouTube links, so that they can learn and cultivate using proper methods. Farmers can also post their questions about which fertilizer should be sprayed on their crops. Experts and agriculture doctors are connected to the application to help farmers by recommending proper doses for their crops.
(iv) IoT-Based Smart Irrigation
Irrigation is one of the most important factors for the successful cultivation of a crop. IoT is remodeling agriculture with modern techniques: IoT technology helps in collecting information about field conditions such as moisture, temperature, and soil humidity. In Agro Cart, IoT is used to check humidity, soil, and moisture before cultivating. Farmers can use this for smart irrigation, for example monitoring the water supplied to the crops through the application. It also supports pest detection: sensors fixed in the field trigger an alarm and send a notification to the owner's phone number if a pest or animal enters the cultivated land.
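As a rough illustration of the monitoring logic described above (purely a sketch; the threshold, action names, and sensor readings are assumptions rather than part of the Agro Cart implementation), one cycle of the decision rule could look as follows.

# Illustrative sketch of the smart-irrigation and pest-alert logic described above
# The threshold and readings are assumed values, not taken from the paper
MOISTURE_THRESHOLD = 30.0  # percent soil moisture below which irrigation starts

def decide_actions(soil_moisture, pest_sensor_triggered):
    # Return the list of actions for one monitoring cycle
    actions = []
    if soil_moisture < MOISTURE_THRESHOLD:
        actions.append("START_PUMP")
    else:
        actions.append("STOP_PUMP")
    if pest_sensor_triggered:
        actions.append("SOUND_ALARM")
        actions.append("NOTIFY_OWNER: pest or animal detected in the field")
    return actions

print(decide_actions(soil_moisture=22.5, pest_sensor_triggered=True))
# ['START_PUMP', 'SOUND_ALARM', 'NOTIFY_OWNER: pest or animal detected in the field']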
1.2 Applications
• Agro Cart is a mobile app that helps farmers to grow crops according to the market
requirement.
• With help of Agro Cart, farmers will get a good price for their crops. There will
be no wastage of crops or overproduction of crops.
• Agro Cart is a bridge between the customer and farmers.
• Agro Cart helps farmers to plan sprays of their crops efficiently with the help of
weather conditions.
• Agro Cart is a mobile app that detects and identifies crop disease simply by taking photos and uploading them in the application.
• Agro Cart helps farmers detect and guard their crops against pests and animals.
• Agro Cart helps farmers use fertilizers to get a better yield of crops.
2 Related Work
An online shopping portal addresses specific customer requirements by combining buying, selling, and promotion of offerings tailored to its
customers. Reports may be generated at any time within a few seconds, so that manual effort is not required, and additional analysis can be carried out much more often, which helps in taking decisions. It allows a customer to register from their location or area and transact for the specified product. Agriculture has always been India's most significant economic sector. Farming can be characterized as an integrated set of strategies to manage the growth and harvesting of animals, fruits, and vegetables, and it contributes a recognizable share of Indian GDP, as most activities are agro-based. Maharashtra, in particular, is actively agricultural.
A. An Android Application for Plant Leaf Disease Detection and Automated
Irrigation
Due to climatic changes, the need for correct and sustainable irrigation techniques is in high demand. Crops need to be irrigated according to their water requirements, based on climatic conditions. In this paper [1], a smart irrigation system is proposed that can manage irrigation automatically using a mobile application. Images of leaves are captured and sent to the server, where they are processed and compared with diseased images in the cloud database. Based on this comparison, a list of suspected diseases is sent to the user through the mobile application. The system consists of (1) an embedded system: the embedded system has a microcontroller, a soil moisture sensor, a temperature sensor, a relay switch, and a Wi-Fi module. The user can log in to the cloud using a username and password. A digital camera is provided in the application for taking an image and sending it to the cloud. The image is then processed in the cloud and compared with the database provided. If the image matches any of the pictures from the database, then the disease prediction is sent to the user's application. (2) The Android application: after user verification, the user has the option to control irrigation and take photos of leaves for analyzing disease. This control data is then processed by the microcontroller in automatic mode. Users can view statistics about the current soil moisture sensor and temperature sensor readings. This system is simple and cost-effective. The user can view and interact with the application, which is connected to the cloud, and can also detect disease by taking photos of leaves and sending them to the cloud.
B. Crop-Shop: An Application to Maximize Financial Gain for Farmers
In this paper [2], it is noted that for a long time Indian farmers have had little freedom in choosing markets and buyers for distributing their produce. In all states except three, the law requires that the marketing and trading of farm produce be coordinated through state-owned and regulated market yards, where intermediaries squeeze the margins farmers can build. According to Goldman Sachs, middlemen have become the dominant buyers in the agrarian market, taking control of the farmers' situation and swallowing most of the profits. In the paper [3], farmers work day and night hoping for a proper yield. They use a lot of monetary
resources, borrowing money and buying fertilizers, seeds, and so on; thus, they deserve to enjoy every rupee earned from their crop. In this situation, we suggest a framework that brings farmers close to the outlets, reducing the intermediaries. The intermediaries commonly take up to 70% of the farmers' earnings, leaving them vulnerable. Our framework contains a mobile application that serves as a platform for the cultivators and the outlets or customers to sell and buy their farm goods. The framework aims to give farmers a fair price for their produce by reducing the go-betweens. This also allows the outlets or the customers to buy goods from the farmers at a lower than usual price.
C. E-Commerce Application for Farmers
In this paper [4, 5], the digital market is presented as a platform to integrate and bridge the gap between the farmer, markets, government, and users. It also lets everyone stay updated on the changing business situation. Indian farmers face various challenges, such as not getting a good return on their effort and the investments they make in farming. There are several reasons for this, such as a limited season and the short shelf life of produce, which leave too little time to study market conditions. Studying the plants and products currently on the market in the agricultural sector is very important to get a good price. It makes no sense for farmers to physically reach all traders, since that takes a lot of time and effort that farmers do not have. In addition, the methods farmers traditionally use give them limited access to customers (traders), so there are very few options for selling the crop in the market. With the new marketing method, a farmer can sell the harvest at every level of the marketing chain (trader), in markets, or directly to users, with several options. By selling their crops at a minimum price, they may not be able to fulfill their changing demands. The platform helps to sell the produce at different levels of the marketing chain, where an analysis of the current market situation can be carried out with the help of KNN algorithms and GPS to buy or sell the produce (a minimal sketch of such a KNN-based analysis is given below). It also provides a complaint box for lodging complaints.
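The paper only names the KNN approach; as one possible illustration of such a market analysis, the scikit-learn sketch below estimates a price from two invented features (month of sale and distance to market). All feature names and numbers are assumptions made purely for demonstration.

# Minimal KNN sketch for estimating a crop's market price
# Every feature and price value below is invented for illustration only
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X = np.array([[1, 10], [2, 12], [3, 8], [4, 40], [5, 35], [6, 15]])  # [month, km to market]
y = np.array([2100, 2150, 2300, 1800, 1850, 2250])                   # price per quintal

model = KNeighborsRegressor(n_neighbors=3)
model.fit(X, y)

# Estimate the price for a sale in month 3 at 20 km from the market
print(model.predict(np.array([[3, 20]])))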
D. Crop Disease Detection and Prevention Using Android Applications
This technique was introduced by the students of JCEM in 2015. In this paper [6, 7], it is reported that, according to their survey, in some villages more than 80 percent of farmers try new kinds of crops instead of their traditional ones because of a lack of experience or knowledge. The application proposes the following services for farmers:
(1) Image Processing: In this module, images are captured by the camera, and image processing algorithms detect leaf diseases. (2) Online Marketplace: This module facilitates third-party vendors by allowing them to sell agricultural products in one marketplace. Marketplace e-commerce benefits everyone in a number of ways. Vendor: smaller stores can establish an online presence without the investment of building their own e-commerce website. Consumer: customers can browse the options in the application on their mobile and find lower-priced, quality products. (3) Market Rate Guide: In this module, users get information about market rates across all the available markets in geographically distributed areas. This is a web service which is
available through SMS only. The drawback is that data and information are sent through SMS, which introduces delays and security issues.
G. Real-Time Automation of Irrigation Systems for Social Transformation of the Indian Agriculture System
The paper “Real-time automation of the agricultural environment for social modernization of the Indian agricultural system using ARM7 and GSM” focuses on an automatic irrigation system for the development of the Indian agricultural system and on providing better irrigation in a particular area. The setup consists of an ARM7TDMI
core, which is a 32-bit-microprocessor; GSM plays an important role, since it is
responsible for the management and control of the irrigation in the field and sends
this in the form of coded signals to the receiver. GSM operates through SMS, and it is
the link between ARM processor and centralized unit. ARM7TDMI is an advanced
version of microprocessors and acts as the heart of the system. The goal of the project
is to realize basic applications in the field of automatic irrigation by programming
the components and placing the necessary equipment. This project is used to find
the field condition and use GSM for information sharing through SMS. So, in this
paper [10, 11] the purpose of this project is to improve the irrigation system of the
Indian agricultural system. Good environmental conditions are very important for
better plant growth, higher yields, and rational use of water and other resources
(such as fertilizers). This project was developed with mobile phones, based on agri-
cultural electric pumps to remotely control irrigation, thereby reducing labor costs.
The project has further applications in the field of family farming. It has a precise
irrigation system through controlled irrigation.
H. A Survey of Automated GSM-Based Irrigation Systems
A GSM-based irrigation system was proposed by students of the Institute of Engineering and Management, Kolkata, in 2019. India has different types of soil in different areas. Agriculturists depend on the monsoon, and not all areas of the country receive the same amount of rainwater every year. If crops do not get water at the scheduled time, they will not grow as expected, and since this happens every year, farmers incur losses every year. There are many different solutions to this problem, but in this paper [12], a low-cost and sustainable solution is proposed. In this paper [13], it is clearly mentioned that the proposed system will sense the soil moisture content and fertility. According to the moisture of the soil, water is supplied, and everything is done automatically. This saves time, reduces wastage of water, and lowers labor costs.
3 System Requirements
User Requirements:
The system should be user-friendly so that it is easy to use. It should run 24 h a day, refresh quickly, and take as little time as possible to respond. Loading of the UI must be fast. Users will search for different types of crops, so the system must display accurate results. It should handle a large amount of data and deal with unexpected errors gracefully.
4 System Design
As discussed, the proposed system has three modules; Fig. 1 describes the data flow of the application.
The user needs to log in using login credentials. The system authenticates them: if the credentials are correct, it grants access to the application; if they are wrong, it gives an error alert. A new user has to register before logging in. After login, farmers
are requested to enter their personal information, details of the crops they cultivate, and the total area of their cultivation land. All these details are stored in the system database. Users can also see their own information as well as the names and contact numbers of other users who have registered for crop cultivation.
(ii) Use case Diagram
The pre-processing module data flow is described in Fig. 2 (use case diagram); the system is handled by the admin and used by the user.
Users are first requested to register, after which they are given access to log in. The admin has the option to update crop details and add new crops manually according to market needs. The user can view the crop details, which contain the name of the crop, the total quantity needed in the market, the available area, and the registered area. Users are requested to register for crop cultivation: the user has to select a crop and provide the total area on which the respective crop will be cultivated. All details are updated in the database and reflected in the application. A first-come-first-serve policy is followed: if the registered area already meets the market need for a particular crop, the user is requested to register for the other available crops, so that every user gets at least a minimum profit. A sketch of this registration check is shown below.
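A minimal sketch of this first-come-first-serve check is given below; the crop table, field names, and numbers are invented for illustration and are not taken from the Agro Cart database design.

# Sketch of the first-come-first-serve crop registration check described above
# The crop table and the numbers are assumed example values
crops = {
    "tomato": {"allowed_area": 100.0, "registered_area": 85.0},  # areas in acres
    "onion":  {"allowed_area": 150.0, "registered_area": 40.0},
}

def register_crop(crop_name, requested_area):
    crop = crops[crop_name]
    remaining = crop["allowed_area"] - crop["registered_area"]
    if requested_area <= remaining:
        crop["registered_area"] += requested_area
        return f"Registered {requested_area} acres of {crop_name}"
    return f"{crop_name} quota already met; please register another available crop"

print(register_crop("tomato", 20.0))  # exceeds the remaining 15 acres, so it is refused
print(register_crop("onion", 20.0))   # accepted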
The working of the buying and selling module is described in Fig. 3 (use case diagram). Here the admin updates and adds new product (crop) details such as name, price, quantity, quality, and name of the seller.
Product management is taken care of by the admin, who has the option to delete and modify products. The user needs to log in to the application to view the product details and select the products they want to buy. The selected products are automatically stored in the cart, which can be managed by the user. Next, the user needs to provide an address and contact number; this information is verified by the admin. The user is then offered different payment options, must select one, and must complete the payment. Payments made by the user are verified, and the order is confirmed by the admin.
The blog module is shown in Fig. 4. The user needs to log in first and then has the option to ask queries related to cultivation methods, information about medicines, or any other guidelines.
These queries are answered by experts, and if relevant YouTube videos are available, the system provides them to the user.
5 Technical Requirements
6 Results
• Agro Cart is a mobile app that helps farmers to grow crops according to the market
requirement.
• With help of Agro Cart, farmers will get a good price for their crops. There will
be no wastage of crops or overproduction of crops.
• Agro Cart helps farmers to sell their products at a good price. Farmers can check
the market price and demand their products to sell at a good price.
• Agro Cart is a bridge between the customer and farmers.
• Agro Cart helps farmers to plan sprays of their crops efficiently with the help of
weather conditions.
• Agro Cart is a mobile app that detects and identifies crop disease simply by taking photos and uploading them in the application.
The stage where planning and development are done involves testing, operating, developing, and maintaining the software product. This model improves project communication and increases project manageability, cost control, and product quality. Eliciting the models and business rules is completed in the analysis phase, and the software architecture is finished within the design phase. Operations cover installation, migration, and maintenance of the total system. This model is also called the Waterfall Model. The implementation phase of software development involves turning the design specification into source code, followed by debugging and unit testing of the source code.
Within the operations, the system mediates between the farmers, the users, and the traders; customer usernames and passwords are stored in the system database, and a system session starts on login. Queries receive data that are stored in the cloud via web content to obtain a general picture of market conditions with the help of KNN algorithms and the use of GPS to buy or sell crops. A complaint box is provided to file complaints.
Step 1: Login Page: Users need to register before they can use the application. They complete the registration form by filling in details such as User Name, Email Id, and Phone Number, as shown in Fig. 5.
Step 2: Home Page: After successful registration user can view the home page
(Fig. 6) of the application. The home page consists of Register Crop, Market Rate,
Best Practices, and Blogs.
Step 3: Register Crop: The user can view the allowed area and registered area for a particular crop, as shown in Fig. 7, and can register crops according to the allowed area.
Step 4: Buying and Selling (Fig. 8): Customers can view the agricultural products available in the application and then order according to their requirements.
Step 5: Blogs: There are two options, as shown in Fig. 9: first, the user can view the suggestions provided by the agriculture experts, and second, the user can write their views and queries so that others can make use of them.
Fig. 6 Home
Fig. 9 Blogs
7 Conclusion
AGROCART is conceptually a new idea in the era of online markets. Compared with previous works, the paper aims to define a new concept in which farmers sell their stock directly to customers, and customers buy directly through a virtual intermediary, i.e., our system. The project aims at providing maximum profitability to farmers who do not get profits because wholesalers quote their own price for the stock.
With the pre-production concept, farmers can plan before cultivating a crop, so that they can get more profit. Our system therefore aims at providing maximum profit to farmers through direct deals with customers, as well as providing fertilizers and guidance so that farmers gain good profits. According to the needs of the market, the project guides the farmers on which crops to plant to gain maximum profit.
References
1. Ranjith, Anas S, Badhusha I, Zaheema OT, Faseela K, Shelly M (2017) Cloud based automated
irrigation and plant leaf disease detection system using an android application. Department of
Computer Science Engineering Eranad Knowledge City Technical Campus Manjeri, Kerala,
India
2. Chauhan N, Krishnakanth M, Kumar GP, Jotwani P, Tandon U, Gosh A, Garg N, Santhi V (2019)
Crop shop—an application to maximize profit for farmers. School of Computer Science and
Engineering, VIT Vellore
3. Grajales DFP et al (2015) Crop-planning, making smarter agriculture with climate data. In: 4th
International conference on agro-geoinformatics, pp 240–244
4. Bhende M, Moheni S, Patil S, Mishra P, Prasad P (2018) Digital market: e-commerce application
for farmers. Computer Engineering Department DYPIEMR, Akurdi, Pune, India
5. Li J, Zhou L (2018) Research on recommendation system of agricultural products e-commerce
platform based on hadoop. School of Information Science and Engineering Guilin University
of Technology Guilin, Guangxi Province, China
6. Anand VKM, Harshitha K, Chandan Kumar KN, Kumar N, Kashif Khan MK (2018) An
improved agriculture monitoring system using agri-app for better crop production. Department
of E&CE, SVCE, Bangalore, India
7. Reddy S, Pawar A, Rasane S, Kadam S (2015) A survey on crop disease detection and prevention
using android application. Department of Computer Science Engineering, JCEM K.M. Gad
8. Madhu A, Archana K, Kulal DH, Sunitha R, Honnavalli PB (2020) Smart Bot and e-commerce
approach based on internet of things and blockchain. Department of Computer Science
Engineering, PES University, Bangalore, Karnataka, India
9. Bhave A, Joshi R, Fernandes R, Somaiya KJ (2014) MahaFarm—an android based solution for
remunerative agriculture. Institute of Engineering & Information Technology, Mumbai, India
10. Galgalikar MM, Deshmukh GS (2013) Real-time automization of irrigation system for social
modernization of Indian agricultural system. Department of Electronics and Telecommunica-
tion Jawaharlal Darda Institute of Engineering & Technology, Yavatmal, India
11. Shiraz Pasha BR, Yogesha DB (2014) Microcontroller based automated irrigation system.
Department of Mechanical Engineering, MCE, Hassan
12. Ingale HT, Kasat NN (2014) Automated irrigation system. GF’s G.C.O.E, Jalgaon, C.O.E.T,
Amaravati
13. Chanda C, Agarwal D, Er B, Persis UI (2013) A survey of automated GSM based irrigation
systems. School of Information Technology and Engineering, VIT University, TN, India
Dielectric Recovery and Insulating
Properties of Coconut Oil
and Transformer Oil
1 Introduction
The intrinsic dielectric properties of some liquids have led scientists the world over to believe that they could be superior to their gaseous and solid counterparts. Hence, such liquids find application in devices such as capacitors, cables, circuit breakers,
and more importantly transformers. In transformers, apart from providing electrical
insulation between windings, such liquids take away heat. The most widely studied
properties are insulation strength, dielectric loss tangent, and thermal conductivity.
Besides, other liquid properties like viscosity, thermal stability, specific gravity, and
flash point are also studied [1]. Fine water droplets and the fibrous impurities affect
the dielectric properties. However, chemical stability is often a concern. Some other
factors are the cost, space, and environmental effects. Insulation characteristics of
coconut oil as an alternative to the liquid insulation of power transformers have been
analyzed and reported [2].
Coconut oil is abundantly available in India, Sri Lanka, and other tropical countries
and is used as edible oil. The insulating properties of this oil have been a subject
of study, but its insulating behavior under different conditions is yet to be under-
stood. Generally, vegetable oils are rich in fat (nearly 90%), of which about 65% comprises short- and medium-chain fatty acids; the saturated fats present in them increase the melting point. They suffer from the disadvantage of low oxidation
resistance. Owing to concerns about future scarcity and poor biodegradability, attention is focused on alternative natural esters such as vegetable oils, soya, and sunflower oil [2]. Coconut oil, like most natural esters, contains more free fatty acids, which increase the conductivity. Some workers have experimented and shown that virgin coconut
oil has more potential dielectric properties in comparison to vegetable oil, palm oil,
and commercial grade coconut oil [3]. One of the main components in any power
system is the power transformer, and oil/paper/pressboard insulation is considered as
one of its major components. Mechanical, thermal, and electrical stresses cause dete-
rioration of insulation during service, and insulation failure is very common. Over
the last seven decades, mineral oil extracted from crude oil is a preferred candidate.
Studies have shown that esters must be processed, treated, and purified to enhance their insulating properties, and this is a non-trivial task. The operation of most distribution transformers is affected by the insulation systems supporting them. Accidental spills and
consequent environmental negative influences are a major concern in the use of
mineral oils because of their poor biodegradability. Therefore, there is a strong need
for a suitable substitute for mineral oil. Refined, bleached, and deodorized (RBD)
coconut oil has shown superior insulating properties compared to copra coconut oil,
and prototypes of transformers filled with these oils have been in use in Sri Lanka since 2001. It has also been shown that, on a relative scale, copra (the direct natural extract) displays the least impressive insulating behavior [4]. Elaborate studies on the aging of coconut oil and comparisons with mineral oil have been reported [5]. Studies on the thermal properties of vegetable oils have also been reported [6]. Some have used insulating electrodes under
a quasi-uniform AC field for their study with mineral/palm/coconut oils [7]. Some
workers have experimented with blending two types of oils such as sesame oil and
field-aged mineral oil [8]. Small to medium distribution transformers are currently
using such insulating oils. Some other liquid dielectrics are oils extracted from rape-
seed, canola, and palm. Comparison of sesame, castor, and coconut oils has also been
tried using frequency domain spectroscopy by some workers [9]. The feasibility and
some important insulating properties such as breakdown voltage (BDV) and dielec-
tric recovery characteristics and a comparison with mineral oil are reported in this
paper. The scope of the present study is limited to breakdown voltage studies and
dielectric recovery of natural coconut oil in “as is” condition without any purifica-
tion, dehydration, or neutralization processes. Standard testing procedures (ASTM
D 877-02p1 ) are followed during experimentation to obtain the BDV. The present
work is focused on recovery of the dielectric property apart from studying electrical
breakdown. Results are compared for mineral and coconut oil (Fig. 1).
Two test methods A and B are made available in standards documents, and the details
of choices of tests are mentioned in equipment manufacturing catalogs [10].
2.1.1 Procedure A
2.1.2 Procedure B
Insoluble material that does not settle during the interval between two tests is gener-
ally handled by this test. This is described as Procedure B in ASTM standards docu-
ment [10]. Insulating oils used in load tap changers, circuit breakers, and equipment
with heavy contaminants come under this category. After each successive breakdown,
fresh oil is refilled in the test cell (Fig. 2).
Since commercial untreated samples were used, heavier particles settling down
were expected, and therefore, Procedure A of ASTM standards was preferred. This
test does not require refilling after each experiment (Fig. 3).
The rate of rise of the high voltage applied across the gap influences the observed breakdown voltage in any insulation system. This is, however, accounted for by standards bodies such as IEC and ASTM: IEC standards specify a rate of rise of 2 kV/s, whereas ASTM standards specify 3 kV/s. To ascertain consistency of results, some experiments were conducted by applying both rates of rise. This did not result in any significant change in the results (less than 5% variation), as shown in Fig. 4a, b for the mineral oil sample.
In this experiment, the voltage was raised at the rate of 3 kV/s until a voltage collapse occurred. The voltage was brought to zero and, without any other change, increased again at 3 kV/s until the next breakdown occurred. Every trial was recorded and plotted against the trial number, as shown in Fig. 4a, b. An experiment consisting of 110 trials without changing the oil was conducted. Contrary to expectations, the BDV improved with the number of trials in the case of the mineral oil sample and displayed a reducing trend only after 110 trials. The dotted line in Fig. 5 shows the polynomial fit to indicate the trend.
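The polynomial trend fit mentioned above can, in principle, be reproduced as in the sketch below; the BDV values here are synthetic stand-ins, since the measured data appear only in Fig. 5.

# Sketch of fitting a polynomial trend to BDV-versus-trial data (cf. Fig. 5)
# The BDV values below are synthetic placeholders, not the measured data
import numpy as np
import matplotlib.pyplot as plt

trials = np.arange(1, 111)                          # 110 trials
bdv = 30 + 0.1 * trials - 0.0008 * trials**2        # invented trend-like values
bdv = bdv + np.random.normal(0, 1.5, trials.size)   # add measurement scatter

coeffs = np.polyfit(trials, bdv, deg=2)             # second-order polynomial fit
trend = np.polyval(coeffs, trials)

plt.plot(trials, bdv, '.', label='BDV (kV)')
plt.plot(trials, trend, '--', label='polynomial trend')
plt.xlabel('Trial number')
plt.ylabel('BDV (kV)')
plt.legend()
plt.show()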
Fig. 5 BDV against number of trials for a rate of rise of 3 kV/s (average BDV = 34.6 kV for the standard test gap)
The applied AC voltage was gradually increased, maintaining a constant rate of rise of 3 kV/s as specified by the standards. The same procedure was adopted in
all the trials. For each fresh sample, five breakdowns were allowed, and the data were
recorded to obtain consistent results. Increasing the number of trials beyond five did
not result in any variations. Therefore, the trials for each sample were maintained
at five. The variation of average BDV (RMS value in kV) is plotted against the trial
numbers as shown in Fig. 6 (for example, 1 indicates the first trial, and 6 indicates
the 6th trial in Fig. 6). The BDV for mineral oil reduced from 21 to 18 kV whereas
the BDV improved from 15 kV to 17.5 kV in case of coconut oil. However, the
average value after 5 trials stabilized at 19.5 kV for mineral oil and 16.8 kV for
coconut oil. The results demonstrate that even though the BDV reduces initially
in mineral oils, after reaching a stable value, the withstand capability is not lost.
The breakdown does not significantly reduce even after 100–110 trials—see Fig. 5.
This kind of dielectric recovery pattern was observable and recordable in case of
mineral oil. On the other hand, it was not possible to deduce a comparison with
coconut oil since commercially available coconut oil showed inconsistent results
after 30 trials. Nevertheless, the present results indicate that untreated commercially
available coconut oils can also be used as a dielectric, and it does not reach its flash
point even after repeated application of voltage (Fig. 7 and Table 1).
In this experiment, 5 trials for each oil are conducted, and average breakdown
voltage is calculated, and a graph is plotted for the same.
Fig. 7 Graphical comparisons between BDVs of transformer oil and coconut oil
Two different samples of the same type of oil were tested for BDV to check the
repeatability of the results. Fig. 8a, b show the variation of the BDV with the trial
number. The x-axis figures indicate the trial number (1st trial, 2nd trial, etc.), and
the y-axis shows the BDV in kV (RMS). The results indicate that, over the 5 trials, coconut oil displayed larger variations between the first and last trials (see Fig. 8a), whereas the variation was smaller for the corresponding transformer mineral oil data; the first and last trials showed more consistency for mineral oil, as seen in Fig. 8b.
4 Conclusion
The present work addresses the dielectric recovery property of both widely used
mineral oil and an organic oil like commercially available coconut oil. Measurements
of breakdown voltages have been customary, but the present work also throws light on
the less focused “dielectric recovery.” The BDV measurement methods and the choice
of the appropriate method for measurement, especially for testing commercially available oils such as coconut oil, are highlighted. The rate of rise of the applied voltage (2 kV/s vs 3 kV/s) prescribed by standards bodies was found to have no significant influence on the BDV values. A manual method of tracking the total time and gradual variation of the applied voltage was found to give reasonably repeatable results, even though many automated testers are available in the market; these testers also suffer from the disadvantage of not being periodically calibrated. The dielectric recovery of mineral
oil was found to be more compared to commercially available untreated coconut
oil. The experiments demonstrated that commercially available coconut oils have a
potential to be used as a liquid insulator in as is condition. Further work is in progress
to study the effect of different types of treatment of such oils before subjecting them
to tests and the influence of suspended metallic nanoparticles.
Fig. 8 a Repeatability test (coconut oil). b Repeatability test (mineral oil)
References
1. Naidu MS, Kamaraju V (2013) High voltage engineering, 4th edn. Tata Mac-graw Hill
Publishing Company, New Delhi, India
2. Ranawana S, Ekanayaka CMB, Kurera NASA, Fernando MARM, Perera KAR (2008) Anal-
ysis of insulation characteristics of coconut oil as an alternative to the liquid insulation of
power transformers. In: IEEE region 10 and the 3rd international conference on industrial and
information systems, pp 1–5. https://doi.org/10.1109/ICIINFS.2008.4798493
3. Zaidi AAH, Hussin N, Jamil MKM (2015) Experimental study on vegetable oils properties
for power transformer. In: IEEE conference on energy conversion (CENCON), pp 349–353.
https://doi.org/10.1109/CENCON.2015.7409567
4. Matharage BSHMSY, Fernando MARM, Bandara MAAP, Jayantha GA, Kalpage CS (2013)
Performance of coconut oil as an alternative transformer liquid insulation. IEEE Trans Dielectr
Electr Insul 20(3):887–898. https://doi.org/10.1109/TDEI.2013.6518958
5. Matharage BSHMSY, Bandara MAAP, Fernando MARM, Jayantha GA, Kalpage CS (2012)
Aging effect of coconut oil as transformer liquid insulation—comparison with mineral oil.
In: IEEE 7th international conference on industrial and information systems (ICIIS), pp 1–6.
https://doi.org/10.1109/ICIInfS.2012.6304770
6. Ahmed MR, Islm MS, Karmaker AK (2021) Experimental investigation of electrical and
thermal properties of vegetable oils for use in transformer. In: International conference on
automation, control and mechatronics for industry 4.0 (ACMI), pp 1–4. https://doi.org/10.
1109/ACMI53878.2021.9528278
7. Katim NIA et al (2017) Investigation on AC breakdown of vegetable oils with insulated elec-
trodes. In: International conference on high voltage engineering and power systems (ICHVEPS)
2017:312–316. https://doi.org/10.1109/ICHVEPS.2017.8225963
8. Bandara DU, Kumara JRSS, Fernando MARM, Kalpage CS (2017) Possibility of blending
sesame oil with field aged mineral oil for transformer applications. In: IEEE international
conference on industrial and information systems (ICIIS) 2017:1–4. https://doi.org/10.1109/
ICIINFS.2017.8300411
9. Kumara JRSS, Fernando MARM, Kalpage CS (2017) Comparison of coconut/sesame/castor
oils and their blends for transformer insulation. In: IEEE international conference on industrial
and information systems (ICIIS) 2017:1–6. https://doi.org/10.1109/ICIINFS.2017.8300410
10. https://hvtechnologies.com/which-oil-testing-standard-should-you-choose-to-determine-dielectric-breakdown-voltage/
Predictive Maintenance of Lead-Acid
Batteries Using Machine Learning
Algorithms
1 Introduction
due to time constraints or the cost of the operation [3]. An internal short, sulphation,
corrosion, or other forms of degradation can cause a battery to fail [4].
Battery degradation can be identified by:
(1) Reduced battery capacity.
(2) Reduced discharge voltage.
(3) Increase in battery internal resistance.
The formation of lead sulphate crystals on the surface of the electrodes happens over a long period of time and, since lead sulphate is non-conductive, causes the electrodes to become passive to further electrochemical activity. Corrosion of the Pb/Pb alloy substrates
in the anode is another significant source of failure. During battery recharging, the
continual corrosion of the Pb/Pb alloy grid takes place. Natural oxidation of the
exposed Pb grid will result in the formation of PbO2 . Corrosion due to stress and
electrochemical attack all occur as a result of the grid’s ageing [5].
It is possible that unexpected battery failures will result in equipment becoming
unavailable, which can be quite costly [6]. It is the goal of this study to develop
prediction models for flexible maintenance of lead-acid batteries in order to extend
the battery life to its maximum potential.
By adopting data-based predictive maintenance procedures, it is possible to avert
unexpected battery failure.
The proposed battery maintenance model is based on measuring the internal resis-
tance of battery modules to evaluate how well they are working, and it was originally
created for lead-acid batteries [7].
The internal resistance of:
(1) new/healthy batteries was found through experiments to be in the range of 0.1–0.3;
(2) old/replaced batteries was found to be greater than 5.
The data collected should be accurate and comprehensive. It must include:
(1) a date and time stamp,
(2) the battery's serial number, and
(3) the battery pack's serial number.
The fundamental battery characteristics are listed in Table 1.
When an event happens that necessitates the replacement of a battery, the battery
is labelled as failed. This type of occurrence is referred to as an event of interest
(EoI) [8].
An EoI is typically classified into one of the categories listed below:
(1) Normal/spontaneous Ageing—the resistance of a battery gradually increases as
it ages, resulting in a reduction in battery capacity.
(2) Internal Fault—a battery’s condition could deteriorate dramatically as a result
of an internal fault.
Battery replacement, however, takes place only during a manual inspection. We
must check historical data manually to determine the exact timing of EoI. We aim
to design algorithms that automatically detect when a battery begins to degrade in
order to remove the need for recurrent and significant manual labour.
\mu_t^V = \frac{1}{N} \sum_{i=1}^{N} V_t^{(i)} \quad (1)

\sigma_t^V = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( V_t^{(i)} - \mu_t^V \right)^2} \quad (2)
where N denotes the number of batteries in a single battery pack and V_t^{(i)} is the voltage of the individual battery cell i, i ∊ {1, …, N}, at time instant t. The corresponding resistance statistics \mu_t^r and \sigma_t^r are calculated with the same method.
Additionally, comparing relative performance to the intra-pack average could be helpful in identifying a battery cell's health. As a result, we use two indicators in our prediction model, the relative resistance R_t^r and the relative voltage R_t^v, as shown in Eqs. (3) and (4).
Mathematically,
(3) Time series Features—the surveillance data are accumulated over a period of
time. A number of observations highlight the importance of building features
of time series, including the following:
The time period utilised to calculate the rate of change is denoted by TC , and the
number of time occurrences included in a single day is D.
VGt is the voltage gradient at time t, which is calculated by solving the least
squares regression problem in Eq. (6) as follows:
\min_{a_0, a_1} \sum_{i = t - T_g}^{t} \left\| V_i - (a_0 + a_1 \cdot i) \right\|^2 \quad (6)
T_g is the time period over which the gradient is calculated. After finding the optimal solution, we set
VG_t = a_1
Rate of change of resistance and resistance gradient, RCt and RGt , are both
formulated in a similar way.
(4) Combined features—to introduce nonlinearity into our model, the feature
combination approach is employed. This generates a new feature, shown in
Eq. (7), which is combined with other existing features.
VDR_t = \frac{V_t}{R_t} \quad (7)
Accordingly, the feature space has been expanded to include 14 attributes in total,
as shown in Table 2.
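A minimal NumPy sketch of how the statistics and features of Eqs. (1), (2), (6), and (7) could be computed is shown below; the pack size, sampling window, and all numerical values are assumptions made for illustration, and the form of the relative indicator is likewise assumed since Eqs. (3) and (4) are not reproduced here.

# Sketch of the feature computations in Eqs. (1), (2), (6) and (7)
# The data (4 cells, 3 time steps) are invented for illustration
import numpy as np

# voltages[t, i]: voltage of cell i in the pack at time index t
voltages = np.array([[12.6, 12.5, 12.7, 12.4],
                     [12.5, 12.4, 12.6, 12.3],
                     [12.4, 12.3, 12.5, 12.1]])

mu_v = voltages.mean(axis=1)            # Eq. (1): intra-pack mean at each time step
sigma_v = voltages.std(axis=1)          # Eq. (2): intra-pack standard deviation
relative_v = voltages / mu_v[:, None]   # relative voltage indicator (assumed form)

# Eq. (6): voltage gradient of one cell over the window, via least squares
t_idx = np.arange(voltages.shape[0])
a1, a0 = np.polyfit(t_idx, voltages[:, 0], deg=1)  # slope a1 is the gradient VG_t
print("VG_t for cell 0:", a1)

# Eq. (7): combined feature VDR_t = V_t / R_t (resistances assumed)
resistances = np.array([0.15, 0.18, 0.16, 0.30])
print("VDR_t:", voltages[-1] / resistances)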
We train classification models for battery replacement based on the data that were
obtained in the previous step. We anticipate that the model will produce high-quality
predictions on both the training data and the testing data that have not yet been
observed.
3 Prediction Methods
preferable: in this study, we investigate a supervised approach to PdM, given the overwhelming adoption of preventive maintenance practices in industries and, therefore, the availability of acceptable data sets [10].
For PdM problems, regression algorithms are generally used when predicting the
remaining useful life of a process/equipment, whilst classification algorithms are used
when we aim to differentiate between healthy and unhealthy conditions of the system
being monitored, in our case a battery. Because of their accuracy, efficiency, and ease
of implementation, we chose the random forest and gradient boosting decision tree
classification algorithms.
The random forest (RF) is a machine learning technique used for solving problems
related to classification and regression. It employs ensemble learning, a method used
for resolving complex problems by integrating many classifiers. The RF algorithm
uses numerous decision trees to determine the outcome, which is dependent on the
predictions of the decision trees. It anticipates by averaging the results of different
trees. As the number of trees increases, the precision of the output improves.
There are two tactics in RF that make it superior to other typical classification and
regression trees: random variable selection at the split steps and bootstrap aggrega-
tion, often known as bagging. Bootstrap aggregation builds a series of new data sets
by sampling evenly with replacement from the original data set and then fitting the
models with them. A predictor with a variance lower than other typical classification
and regression trees can be obtained by averaging over all models.
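To make the classification step concrete, a hedged scikit-learn sketch is given below; the synthetic feature matrix, labels, and hyperparameters are assumptions, not the data or settings used in this study.

# Sketch of a random forest replacement classifier over the 14 expanded features
# The synthetic data and hyperparameters are illustrative assumptions
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 14))      # one row of 14 features per battery snapshot
y = rng.integers(0, 2, size=500)    # 1 = needs replacement (EoI), 0 = healthy

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))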
Survival analysis is a set of statistical methods that answers questions like “how long
before a specific event occurs.” It is also known as “time to event” analysis. This
method was mainly established by medical researchers who were more interested in
determining the predicted longevity of patients in different cohorts; however, it can
also be applied in this case for predictive maintenance.
With techniques such as Kaplan Meier Estimate or The Cox Proportional Hazard
Model, a survival curve estimate is obtained; predictions are drawn by calculating
the probability of survival beyond a specific time t.
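As an illustrative sketch of this survival view (using the lifelines library as one possible implementation; the durations and replacement flags below are invented), a Kaplan-Meier estimate could be obtained as follows.

# Sketch of a Kaplan-Meier survival estimate for battery lifetimes
# Durations (days in service) and event flags (1 = replaced) are invented values
from lifelines import KaplanMeierFitter

durations = [320, 410, 500, 610, 720, 800, 950, 990, 1100, 1200]
replaced = [1, 1, 0, 1, 1, 0, 1, 0, 1, 0]  # 0 = still in service (censored)

kmf = KaplanMeierFitter()
kmf.fit(durations, event_observed=replaced)

# Estimated probability that a battery survives beyond 700 days
print(kmf.predict(700))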
A flowchart showing the deployment of the algorithm to an electrical system is
shown in Fig. 2.
4 Conclusion
The purpose of this study is to address the problem of anticipating the breakdown of lead-acid battery systems. Machine learning algorithms (random forest and gradient boosting decision trees) and survival analysis are used to solve the challenge of determining a battery maintenance policy based on historical data. The data consist primarily of
sensor readings accumulated throughout the course of the battery’s life. To effectively
utilise the power of big data, we used a feature expansion technique on our collected
data. More frequent and regular readouts will improve model prediction performance,
which is expected given the increased amount of data provided to the predictive
model. This work has a lot of practical applications. The model is self-contained
and requires no additional effort. As long as the fundamental characteristics of the
battery are identified, the model can be easily applied to batteries manufactured by any
manufacturer. The results of our method should outperform the present maintenance
policies considerably.
References
5. Yang J, Hu C, Wang H, Yang K, Liu JB, Yan H (2017) Review on the research of failure mode
and mechanism for lead–acid batteries. Int J Energy Res 41:336–352. https://doi.org/10.1002/
er.3613
6. Voronov S, Krysander M, Frisk E (2020) Predictive maintenance of lead-acid batteries with
sparse vehicle operational data
7. Gomez-Parra M et al (2009) Implementation of a new predictive maintenance methodology for
batteries. application to railway operations. In: IEEE vehicle power and propulsion conference,
pp 1236–1243. https://doi.org/10.1109/VPPC.2009.5289709.
8. Tang JX, Du JH, Lin Y, Jia QS (2020) Predictive maintenance of VRLA batteries in UPS
towards reliable data centers.
9. Wuest T, Weimer D, Irgens C, Thoben KD (2016) Machine learning in manufacturing:
advantages, challenges, and applications. Prod Manuf Res 4
10. Susto GA, Schirru A, Pampuri S, McLoone S, Beghi A (2015) Machine learning for predictive
maintenance: a multiple classifier approach. IEEE Trans Industr Inf 11(3):812–820. https://
doi.org/10.1109/TII.2014.2349359
11. Breiman L, Friedman J, Olshen R, Stone C (1984) Classification and regression trees. Taylor and Francis
Cloud-Aided IoT for Monitoring Health
Care
1 Introduction
The Internet of Things (IoT) is without doubt one of the most exciting subjects for industry, the private and public sectors, and the research community. While the conventional Internet enables communication between various constrained devices and people, IoT connects a wide range of associated "things" into a far-reaching network of interrelated computing intelligence without human intervention. The adoption of IoT and the advancement of wireless communication technologies allow a patient's condition to be streamed to caregivers in real time [1, 2]. Besides, many available sensors and mobile devices can measure specific human physiological parameters, for example heart rate (HR), respiration rate (RR), and blood pressure (BP), through a single touch. Although IoT is still at an early stage of development, organizations and enterprises have quickly embraced its power in their existing systems and have seen improvements in production as well as in customer experience [3].
The rise of mobile devices, artificial intelligence [6], and cloud computing provides a firm foundation for the growth of IoT in the healthcare sector to transform every part of human life. Because of the combination of IoT and cloud computing in the healthcare sector, health professionals can provide faster, more efficient, and better healthcare services, which in turn leads to a better patient experience. Consequently, it brings better healthcare services, better
patient experience, and less paperwork for health professionals. In particular, IoT-based technologies have recently become popular for non-intrusive monitoring of patient health status, where various medical devices, sensors, and diagnostic devices can be viewed as smart devices or objects forming a core part of the IoT. Even though IoT can be applied in many clinical applications, proper management of the large amount of monitored data requires storing it on cloud servers to eliminate paper-based work.
Indeed, there are a few limitations associated with core IoT devices, for example restricted memory, power supply, and processing capabilities, that adversely affect network performance. Also, they are usually centered on a single client-specific application. This forces every healthcare provider to deploy its own customized monitoring network, which restricts sharing of the physical sensors with other organizations, especially those that do not share the same security requirements [4]. As these customized networks require independent services for their resources, such as communication and networking, the cost increases proportionally. Moreover, integrating IoT technology into health care brings several challenges, including data storage, data management, exchange of data between devices, security and privacy, and unified, ubiquitous access. One potential solution that can address these challenges is cloud computing technology. Figure 1 shows a typical healthcare framework that combines IoT and cloud computing to provide the ability to access shared clinical data and common infrastructure pervasively and transparently, offering on-demand services over the network and performing tasks that address evolving needs.
IoT-based technologies connecting clinical monitoring devices through mobile phones to cloud platforms have recently become prominent for non-intrusive monitoring of patient health status [5, 6]. Such a platform can reduce the cost of health care, as services can be shared by various end users. Additionally, applying an IoT–WBAN platform in a health monitoring system improves system utilization by sharing information on that platform, thereby enhancing coordination and operations [7].
Recently, IoT and cloud computing technologies have played a vital role in remote patient monitoring applications, because the connected devices let healthcare providers and physicians observe patients remotely. This trend leads to fewer hospital admissions, more convenient services, and reduced operating costs. The primary components of patient monitoring are different kinds of sensors and wearable devices. They help healthcare professionals observe and analyze patients' vital organs and
symptoms without requiring their physical presence. With appropriate components in the patient monitoring structure, it becomes an early warning system for potential clinical symptoms that could be dangerous to the patient if left untreated.
2.2 Telemedicine
Growing demand from a new generation of informed users has pushed for rapid adoption of telemedicine because of its convenience, efficiency, and intelligent features. Telemedicine enables the remote delivery of healthcare services, so patients can be treated remotely using telecommunications technology. Breakthroughs in technology and healthcare have significantly improved its usability, making it an essential part of remote patient monitoring. Recently, thanks to the development of IoT and cloud computing, telemedicine technology will see even more improvements that support communication between doctors and patients across space and time.
3.1 Bluetooth
NFC allows you to track where individuals are and who has done what. Clinical staff can know, in real time, where a patient is, when the nurse last visited, or what treatment a doctor just administered. NFC tags and NFC-enabled wristbands can replace the conventional armbands worn by patients and can be updated with real-time information, for example when a medication was last given or which procedure should be performed.
When NFC tags are added to a medicine's packaging or labeling, tapping the tag with a smartphone or tablet allows you to confirm the prescription's authenticity, see details about dosage, or read about side effects and drug interactions. The tag can also provide access to web links to get more information, request a refill, or contact a medical professional. These are just some of the ways NFC can enhance health care.
The physical measures are associated with the sensors in IoT and virtual machines
in the cloud. Some of the measures are discussed below.
The memory of IoT devices is small, and most of a device's memory is used to store an embedded operating system. Thus, systems built on IoT computing devices have limited memory available for executing complex security protocols.
Almost all IoT computing devices have low-power processors; the processor must perform various tasks, including managing, sensing, analyzing, saving, and communicating, with a limited power source. Consequently, the processor capacity available for carrying out security procedures is a challenging issue.
Most IoT devices have a low battery capacity. Thus, a mechanism forces them to automatically enter a power-saving mode to conserve energy while sensors are idle. This makes it difficult for IoT devices to run security protocols continuously.
4.4 Scalability
There is a sharp rise in the number of computing devices in an IoT network. Consequently, it is challenging to find the most appropriate security algorithm for the growing number of devices in an IoT-based healthcare organization.
The smart city framework is implemented through collaboration among government, public, and private organizations. Although the smart city concept is gaining attention these days, no single existing city fulfils all the requirements of a smart city [11]. Some of the smart technologies relevant to smart cities include energy, buildings, mobility, governance and education, and health care. Smart city applications use ideas from the areas of artificial intelligence, embedded computing, machine learning, cloud computing, heterogeneous networks, and biometrics. Likewise, they use various components, including sensors, RFIDs, computing, and networking objects, to maximize the use of resources in different applications. Managing the various services in a smart city requires a complex organizational infrastructure. Smart city applications continuously monitor and record residents' private data, so it is essential to properly secure this information. Potential threats to smart city infrastructure include eavesdropping, theft, denial of service, failure of hardware or software, manufacturing bugs, insufficient testing, and natural disasters.
6 Conclusion
This paper is useful for readers who are interested in learning about the various aspects of IoT and cloud computing in health care. It offers a complete IoT and cloud computing framework for healthcare that supports applications using the IoT and cloud computing backbone and provides a platform to facilitate the transmission of clinical information between medical devices and remote servers or cloud computing platforms. Healthcare organizations need innovative and cost-effective techniques to help healthcare providers find better ways to handle growing volumes of patient data. Electronic Health Records (EHR) combined with cloud computing frameworks offer an answer to the "big data" challenge, as the cloud accommodates storage resources and eases the process of sharing patient information between healthcare providers. We then surveyed existing research and development efforts in the healthcare industry by component, application, and end user, and finally described significant achievements that demonstrate the effectiveness of integrating IoT and cloud computing in health care.
References
1. Abidi B, Jilbab A, Haziti ME (2017) Wireless sensor networks in biomedical: wireless body
area networks. In: Europe and MENA cooperation advances in information and communication
technologies. Springer, Berlin/Heidelberg, Germany, pp 321–329
2. Xu Q, Ren P, Song H, Du Q (2016) Security enhancement for IoT communications exposed to
eavesdroppers with uncertain locations. IEEE Access 4:2840–2853
3. Scuotto V, Ferraris A, Bresciani S (2016) Internet of things: applications and challenges in
smart cities: a case study of IBM smart city projects. Bus Process Manage J 22:357–367
4. Stergiou C, Psannis KE, Kim BG, Gupta B (2018) Secure integration of IoT and cloud
computing. Future Gener Comput Syst 78:964–975
5. Truong HL, Dustdar S (2015) Principles for engineering IoT cloud systems. IEEE Cloud
Comput 2:68–76
6. Minh DL, Sadeghi-Niaraki A, Huy HD, Min K, Moon H (2018) Deep learning approach for
short-term stock trends prediction based on two-stream gated recurrent unit network. IEEE
Access 6:55392–55404
7. Jiang Y, Cui S, Xia T, Sun T, Tan H, Yu F, Su Y, Wu S, Wang D, Zhu N (2020) Real-time
monitoring of heavy metals in healthcare via twistable and washable smartsensors. Anal Chem
92(21):14536–14541. https://doi.org/10.1021/acs.analchem.0c02723
8. Srimathi B, Ananthkumar T (2020) Li-Fi based automated patient healthcare monitoring system. Indian J Public Health Res Dev 11(2):393–398
9. Ganiga R, Pai RM, Manohara Pai MM, Sinha RK, Mowla S (2020) Integrating NFC and IoT
to provide healthcare services in cloud-based EHR system. In: Sengodan T, Murugappan M,
Misra S (eds) Advances in electrical and computer technologies. Lecture Notes in Electrical
Engineering, vol 672. Springer, Singapore. https://doi.org/10.1007/978-981-15-5558-9_33
10. Sengupta S, Bhunia SS (2020) Secure data management in cloudlet assisted IoT enabled e-
health framework in smart City. IEEE Sens J 20(16):9581–9588. https://doi.org/10.1109/JSEN.
2020.2988723
11. Birje MN, Hanji SS (2020) Internet of things based distributed healthcare systems: a review. J
Data Inf Manage 2:149–165. https://doi.org/10.1007/s42488-020-00027-x
12. Hussain S, Mahmud U, Yang S (2020) Care-talk: An IoT-enabled cloud-assisted smart fleet
maintenance system. IEEE Internet Things J. https://doi.org/10.1109/JIOT.2020.2986342
Energy-Efficient Dynamic Source
Routing in Wireless Sensor Networks
1 Introduction
the routing process effectively so far, ensuring reliable communication, but these algorithms have a narrow focus when it comes to energy handling. The design of an optimal routing protocol that mitigates high energy consumption among nodes therefore suits the current need for effective network establishment [3, 4].
Routing protocols are classified as hierarchical or flat, and flat protocols are further classified as proactive or reactive. Since choosing an apt routing protocol is the fundamental concern of this manuscript, the design aspects of reactive routing protocols such as AODV and DSR and of the proactive protocol DSDV have been analyzed and compared in terms of performance, ultimately to arrive at an optimal routing scenario for energy management between nodes. The prime focus here is on sensor nodes and effective power management through modification of the existing DSR routing protocol so as to enhance the network lifetime [5–7]. This chapter therefore introduces a modification of the existing DSR routing algorithm that mitigates high energy consumption among sensor nodes through a residual-energy measuring mechanism: the modified protocol routes data through nodes with high residual energy levels and gives up paths containing the lowest-energy nodes, saving power and improving the network lifetime. The manuscript covers prior related work in Chapter 2, the routing protocol classification and the residual-energy measuring mechanism in Chapter 3, performance aspects with respect to various parameters in Chapter 4, followed by results and discussion in Chapter 5, and a conclusion at the end.
2 Related Work
WSNs are mainly classified based on whether proactive or reactive protocols are used. Proactive, or table-driven, routing protocols require each node to attempt to maintain consistent, up-to-date routing information and to share it with the neighboring nodes in the network [1–3].
In this research work, we have made the necessary changes in the network so that the routing protocol forwards routes using either distance-vector or link-state algorithms, which contain the next-hop address and, in turn, the destination address [4]. Routing protocols can further be categorized as reactive, energy-aware, on-demand protocols, which establish routes through a route discovery process and then maintain them. From the source node, routing is established based on route discovery: the source forwards route requests, and once the route requests are answered, a connection to the destination node is established. The destination then communicates back by sending a route reply to the source node through the neighboring or intermediate nodes, and the route gets updated [8, 9].
In a routing protocol, the information from the surroundings is collected and recorded
by the sensor nodes and sent to different stations and end-users. A large portion of
the past work on routing in wireless networks involved tracking and keeping up the
correct path to the destination.
During the transfer of data, the nodes select the optimal path. The distributed
algorithm executed by the sensor node will produce a typical routing table in order
to minimize resource utilization. The selection of an optimal routing algorithm is
the obvious factor for the sensors in order to perform well; both the reactive and
proactive routing protocols serve this requirement for effective communication.
In the case of reactive, on-demand routing, the maintenance performed by the routing protocols is nominal and depends on the movement of the system's hubs. Route discovery is performed only when a source hub requires it, which in turn sets up a connection to send and receive packets. In these protocols, route discovery and route maintenance are the key strategies. Based on the requests received, the data is sent from the source to the nearest hubs, and the requests are then forwarded to the nodes adjacent to them; this process is known as route discovery.
Once the target node receives the route request, a route-reply packet is returned through the neighbors to the source hub. In the route maintenance process, if the established route suddenly fails, another route is established. As time passes, each node learns the routing paths. Dynamic source routing (DSR) and ad-hoc on-demand distance vector (AODV) are well-known reactive routing protocols. Their disadvantages are that extreme flooding leads to network congestion and that a high latency is required to find a route.
Hybrid routing combines both reactive and proactive behaviors and has the potential to offer higher scalability. Routing is primarily fixed with some table-driven prospective routes, and additional demand is served by nodes activated through on-demand flooding. The zone routing protocol (ZRP) is one such hybrid routing protocol. Its disadvantages are that both the number of additional hubs activated and the response to traffic demand depend on the traffic gradient volume.
This proactive routing protocol was developed based on the Bellman–Ford algorithm for wireless networks [2], where each hub maintains a table to find the shortest path from it to every other node.
Each node frequently updates its destination path information because of the random topology. Neighboring hubs exchange routing table data, and every hub updates the new routing data. Data is cached if a node cannot discover its destination.
At that point, data packets are held until the capture report arrives from the destination. In this routing protocol, a maximum buffer size is available in memory to hold those data packets until the routing data is received. Low latency and loop-free paths are some of the advantages of this
Energy-efficient DSR has been proposed mainly for MANET applications to reduce energy consumption by using energy-efficiency route metrics. Routes are first established based on the minimum hop-count metric and then refined based on minimum-energy routes, so that the network lifetime is improved by the algorithm or methodology used in the proposed work.
In the routing process, we have a sender node and a destination node. When a sender node wants to communicate with a destination node, it decides which route to select based on its route cache, and the best of the existing routes is chosen. If no route exists, the route discovery process is initiated to establish one. Thus, if a route is available in the existing routing table, it is used; otherwise, a route has to be discovered. Route discovery is performed by broadcasting a route request packet over the network.
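As a purely illustrative sketch (not the authors' NS-2 implementation), the Python snippet below mimics the selection step just described: cached routes are consulted first, the route whose weakest node has the highest residual energy is preferred, and route discovery is triggered only when no cached route exists. The data structures and names (route_cache, residual_energy, discover_route) are assumptions made for illustration.

```python
# Illustrative sketch of energy-aware route selection from a DSR-style route
# cache. All names (route_cache, residual_energy, discover_route) are hypothetical.

def bottleneck_energy(route, residual_energy):
    """Residual energy of the weakest node on the route."""
    return min(residual_energy[node] for node in route)

def select_route(route_cache, src, dst, residual_energy, discover_route):
    # Routes already known between src and dst.
    candidates = [r for r in route_cache if r[0] == src and r[-1] == dst]
    if not candidates:
        # No cached route: fall back to DSR route discovery (broadcast RREQ).
        return discover_route(src, dst)
    # Prefer the route whose minimum residual energy is highest,
    # giving up paths that pass through nearly depleted nodes.
    return max(candidates, key=lambda r: bottleneck_energy(r, residual_energy))

# Example usage with a toy cache and energy table.
cache = [["S", "A", "B", "D"], ["S", "C", "D"]]
energy = {"S": 5.0, "A": 0.4, "B": 3.0, "C": 2.5, "D": 5.0}
print(select_route(cache, "S", "D", energy, lambda s, d: None))  # ['S', 'C', 'D']
```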
The amount of energy remaining in a node is represented by the energy model. At the start, the node has an initial energy value, and when a packet is transmitted or received, the corresponding transmit and receive power levels are consumed. As in most routing protocols aimed at energy efficiency [3–5], a power-efficient routing metric in the routing table reduces the consumption of power in the WSN. Thus, power efficiency can be built into the routing protocol used to transfer data packets.
The energy depends on the power consumed in the active, sleep and idle states:

E = E_A + E_S + E_I = P_active · T_active + P_sleep · T_sleep + P_idle · T_idle    (5)

where P_active, P_sleep and P_idle represent the power consumption in the active, sleep and idle states, and T_active, T_sleep and T_idle represent the time spent by the transceiver in the corresponding states. These values may vary based on the number of bits to be transmitted and the bandwidth of the channel.
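For concreteness, a small numeric illustration of Eq. (5) is given below; the power levels and state durations are made-up example values, not measurements from the paper.

```python
# Energy consumed by a node as in Eq. (5): E = E_A + E_S + E_I,
# with E_state = P_state * T_state. Values below are illustrative only.
P_active, P_sleep, P_idle = 0.5, 0.005, 0.05   # watts (assumed)
T_active, T_sleep, T_idle = 12.0, 40.0, 8.0    # seconds spent in each state

E = P_active * T_active + P_sleep * T_sleep + P_idle * T_idle
print(f"Total energy consumed: {E:.2f} J")     # 0.5*12 + 0.005*40 + 0.05*8 = 6.6 J
```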
4.2 Throughput
It is also referred to as the packet delivery ratio. It is the ratio of the total number
of packets received at the destination to the total number of packets sent from the
source node as in Eq. (7).
Throughput = (TotalPacketsReceived × 8) / (LastPacketReceived − FirstPacketReceived)    (7)
The average time taken by packets to reach the destination, including all delays such as queuing delay, interference delay, and route discovery delay, is also referred to as path optimality or end-to-end delay (E2ED), as shown in Eq. (8).
Jitter is defined as the variability over time of the packet latency across the network. If a network has constant latency, it has no jitter. For a network with N packets, where i = 1, …, N and N is greater than 2, the jitter can be expressed as in Eq. (9).
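The sketch below shows, under assumed trace data, how these metrics could be computed from per-packet send/receive timestamps; since Eq. (9) itself is not reproduced here, the jitter line uses the common mean-of-consecutive-latency-differences formulation rather than the paper's exact expression.

```python
# Sketch of computing throughput, end-to-end delay and jitter from a
# per-packet trace. Each record holds (send_time, recv_time, size_bytes);
# the trace itself is a made-up example.
trace = [(0.00, 0.020, 512), (0.05, 0.072, 512), (0.10, 0.119, 512)]

recv_times = [r for _, r, _ in trace]
bits_received = sum(size * 8 for _, _, size in trace)

# Throughput as in Eq. (7): bits received over the receive interval.
throughput = bits_received / (max(recv_times) - min(recv_times))

# Average end-to-end delay (Eq. 8): mean of per-packet latencies.
delays = [r - s for s, r, _ in trace]
e2e_delay = sum(delays) / len(delays)

# Jitter: variability of latency between consecutive packets.
jitter = sum(abs(delays[i] - delays[i - 1]) for i in range(1, len(delays))) / (len(delays) - 1)

print(throughput, e2e_delay, jitter)
```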
Energy is consumed according to the mode of operation of the corresponding node. It is further observed that idle and sleeping nodes also consume power, which can degrade the simulation results in this scenario. Hence, the idle and sleep modes need to be accounted for in the simulation environment.
The simulations for this protocol were carried out in Network Simulator-2. The results were compared with the existing protocols, and it was found that the proposed algorithm outperforms them with better efficiency. The simulation parameters are shown in Table 1.
The simulation parameters have been analyzed for different node speeds of 10 m/s, 15 m/s and 20 m/s for the throughput, end-to-end delay and jitter parameters. It is observed that the proposed EEDR protocol outperforms existing state-of-the-art routing protocols such as AODV, ZRP and DSR, and the simulation results obtained for the parameters listed in Table 1 are discussed in Figs. 1, 2, 3, 4, 5, 6, 7, 8 and 9.
6 Conclusion
The research work carried out discussed various protocols, and a comparative analysis has been presented in the manuscript; the throughput, end-to-end delay and jitter
Table 1 Parameters of simulation
i. Channel type – Wireless channel
ii. Number of nodes – 20 nodes
iii. Topology size – 600 × 600
iv. Packet size – 512 bytes
v. Traffic type – Constant bit rate (CBR)
vi. Antenna model – Omni antenna
vii. Mobility model – Random mobility model
viii. Parameters – Temperature, BP, HR
ix. Routing protocol – DSDV, AODV, DSR
x. MAC – DSDV, AODV, DSR
xi. Simulation tool – NS-2.35
were analyzed in this article for maximum speeds of 10 m/s, 15 m/s and 20 m/s. The proposed EEDR protocol outperforms existing state-of-the-art routing protocols such as AODV, ZRP and DSR with respect to the communication and computation of the devices end to end. Finally, a comparative analysis was performed. As we are aware, energy efficiency and quality of service are vital in low-radio frequencies. The capacity of ad-hoc wireless networks is exhibited. Finally, we calculated and evaluated the routing protocols with various speeds and node
References
1. Ali A et al (2020) Adaptive bitrate video transmission over cognitive radio networks using
cross layer routing approach. IEEE Trans Cognitive Commun Netw 6(3):935–945. https://doi.
org/10.1109/TCCN.2020.2990673
2. Ramly M, Abdullah NF, Nordin R (2021) Cross-layer design and performance analysis for
ultra-reliable factory of the future based on 5G mobile networks. IEEE Access 9:68161–68175.
https://doi.org/10.1109/ACCESS.2021.3078165
3. Salameh HAB, Bani Irshaid M, Al Ajlouni M, Aloqaily M (2021) Energy-efficient cross-layer
spectrum sharing in CR green IoT networks. IEEE Trans Green Commun Netw 5(3):1091–
1100. https://doi.org/10.1109/TGCN.2021.3076695
4. Guo H, Wu R, Qi B, Liu Z (2021) Lifespan-balance-based energy-efficient routing for recharge-
able wireless sensor networks. IEEE Sens J 21(24):28131–28142. https://doi.org/10.1109/
JSEN.2021.3124922
5. Sangaiah K et al (2021) Energy-aware geographic routing for real-time workforce monitoring
in industrial informatics. IEEE Internet Things J 8(12):9753–9762. https://doi.org/10.1109/
JIOT.2021.3056419
6. Xu H, Huang L, Qiao C, Zhang Y, Sun Q (2012) Bandwidth-power aware cooperative multipath
routing for wireless multimedia sensor networks. IEEE Trans Wireless Commun 11(4):1532–
1543. https://doi.org/10.1109/TWC.2012.020812.111265
7. Khan A et al (2021) EH-IRSP: energy harvesting based intelligent relay selection protocol.
IEEE Access 9:64189–64199. https://doi.org/10.1109/ACCESS.2020.3044700
8. Bolla DR, Shivashankar (2017) An efficient protocol for reducing channel interference and
access delay in CRNs. In: 2017 2nd IEEE international conference on recent trends in elec-
tronics, information & communication technology (RTEICT), pp 2247–2251. https://doi.org/
10.1109/RTEICT.2017.8257000
9. Shankar S, Jijesh J, Bolla DR, Penna M, Sruthi PV, Gowthami A (2020) Early detection of
flood monitoring and alerting system to save human lives. In: 2020 International conference
on recent trends on electronics, information, communication & technology (RTEICT), pp
353–357. https://doi.org/10.1109/RTEICT49044.2020.9315556
10. Bolla DR, Jijesh JJ, Palle SS, Penna M, Keshavamurthy, Shivashankar (2020) An IoT based
smart E-fuel stations using ESP-32. In: 2020 International conference on recent trends on
electronics, information, communication & technology (RTEICT), pp 333–336. https://doi.
org/10.1109/RTEICT49044.2020.9315676
11. Jambli MN, Shuhaimi WBWM, Lenando H, Abdullah J, Suhaili SM (2014) Performance eval-
uation of AODV in MASNETs: study on different simulators. IEEE international symposium
on robotics and manufacturing automation (ROMA) 2014:131–135. https://doi.org/10.1109/
ROMA.2014.7295875
12. Wazid M, Das AK, Kumar N, Alazab M (2021) Designing authenticated key management
scheme in 6G-enabled network in a box deployed for industrial applications. IEEE Trans Ind
Inf 17(10):7174–7184. https://doi.org/10.1109/TII.2020.3020303
13. Zhao J, Wang Y, Lu H, Li Z, Ma X (2021) Interference-based QoS and capacity analysis of
VANETs for safety applications. IEEE Trans Veh Technol 70(3):2448–2464. https://doi.org/
10.1109/TVT.2021.3059740
14. Jabbari A, Mohasefi JB (2022) A secure and LoRaWAN compatible user authentication protocol
for critical applications in the IoT environment. IEEE Trans Ind Inf 18(1):56–65. https://doi.
org/10.1109/TII.2021.3075440
Impact on Squeeze Film Lubrication
on Long Cylinder and Infinite Plane
Surface Subject
to Magnetohydrodynamics and Couple
Stress Lubrication
1 Introduction
Thrust bearings are a particular type of rotary bearing that permits rotation between parts and is intended to carry an axial load. They are commonly used in automotive (for example, in modern cars that use helical gears), marine and aerospace applications. To improve the life and usefulness of such equipment, the optimal performance of these moving parts is of greatest importance. The recognition that applying magnetic and electric fields enhances the load-supporting capacity of liquid metal bearings led to the development of magnetohydrodynamic lubrication. Liquid metals have highly conducting properties and have become an area of interest recently. Many magnetohydrodynamic lubrication problems have been analysed over the past few years [1–3].
Earlier studies of Newtonian fluids do not consider the size of fluid particles and are not an acceptable engineering approach for the analysis of fluids with microstructure additives. Hence, on the basis of the proposed microcontinuum theories [4, 5] and Stokes' [6] microcontinuum theory, the couple stress fluid model has been used by several authors to study hydrodynamic lubrication [7–11].
The combined effect of couple stresses and magnetohydrodynamics has been analysed by many authors [12–14]. They all found that the combined effect of MHD and couple stresses provides a promising increase in the bearing characteristics.
In squeeze film bearings, lubricated joints and injection moulding systems, the squeeze flow between a cylinder and a plane is also important. The couple stress effects on the squeeze film characteristics between a cylinder and a plane surface have been studied by Jaw-Ren Lin et al. [15]. Further study is motivated by the fact that it is not yet known how the combined effect of a transverse magnetic field and a couple stress fluid affects the cylinder–plane system.
2 Mathematical Formulation
μ ∂²u/∂z² − η ∂⁴u/∂z⁴ − σB₀²u = ∂p/∂x    (1)

∂p/∂z = 0    (2)

∂u/∂x + ∂w/∂z = 0    (3)

u = 0, w = −V, ∂²u/∂z² = 0    (4)

u = 0, w = 0, ∂²u/∂z² = 0    (5)
Solving the momentum equation (1) subject to the conditions (4) and (5), we get

u = (1/(σB₀²)) (∂p/∂x) [ (A²/(A² − B²)) cosh(B(2z − h)/2l)/cosh(Bh/2l) − (B²/(A² − B²)) cosh(A(2z − h)/2l)/cosh(Ah/2l) − 1 ]    (6)

where [12]

A = { [1 + (1 − 4σB₀²l²/μ)^(1/2)] / 2 }^(1/2),   B = { [1 − (1 − 4σB₀²l²/μ)^(1/2)] / 2 }^(1/2)

In the film region, x ≪ R, and hence the film thickness is approximated by h = h_m + x²/(2R).
Substituting expression (6) into the continuity equation (3) and using the boundary conditions (4) and (5), we derive the modified Reynolds equation in the form

∂/∂x { (12 h_m0²/M₀²) (∂p/∂x) [ (2lA²/(B(A² − B²))) tanh(Bh/2l) − (2lB²/(A(A² − B²))) tanh(Ah/2l) − h ] } = 12Vμ    (7)

Introducing the non-dimensional quantities

x* = x/R,  P* = P h_m0²/(μRV),  l* = l/h_m0,  h_m* = h_m/h_m0,  β = h_m0/R,  h* = h/h_m0    (8)
where

g(h*, l*, M₀) = (12l*²/M₀²) [ (2A*/(B*(A*² − B*²))) tanh(B*h*/2l*) − (2B*/(A*(A*² − B*²))) tanh(A*h*/2l*) − h*/l* ]    (10)

and dp*/dx* = 0 when x* = 0    (11)

Integrating expression (9) and applying the boundary condition (11), we get the non-dimensional pressure

p* = − ∫_{x*}^{1} 12x*/(β g(h*, l*, M₀)) dx*    (12)

T* = ∫_{h_m*}^{1} [ ∫_{x*=−1}^{x*=1} ( − ∫_{x*}^{1} 12x*/(β g(h*, l*, M₀)) dx* ) dx* ] dh_m*    (14)
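As an aside that is not part of the paper, the non-dimensional pressure of Eq. (12) can be evaluated numerically once g(h*, l*, M₀) from Eq. (10) is coded. The sketch below uses simple trapezoidal quadrature, a placeholder g_func, and a film shape h* = h_m* + x*²/(2β), which is my reading of combining h = h_m + x²/(2R) with the variables of Eq. (8).

```python
import numpy as np

def pressure_profile(x_star, h_m_star, l_star, M0, beta, g_func, n=400):
    """Evaluate p*(x*) of Eq. (12) by trapezoidal quadrature.

    g_func(h_star, l_star, M0) stands in for the expression of Eq. (10).
    The film shape h* = h_m* + x*^2/(2*beta) is an assumption derived from
    h = h_m + x^2/(2R) and the non-dimensional variables of Eq. (8).
    """
    xs = np.linspace(x_star, 1.0, n)
    h_star = h_m_star + xs ** 2 / (2.0 * beta)
    integrand = 12.0 * xs / (beta * g_func(h_star, l_star, M0))
    dx = xs[1] - xs[0]
    # Minus sign as in Eq. (12).
    return -np.sum(0.5 * (integrand[:-1] + integrand[1:]) * dx)

# Toy call with a dummy g (NOT Eq. (10)), only to show the interface:
print(pressure_profile(0.0, 0.8, 0.2, 2.0, 0.04, lambda h, l, M: -(h ** 3)))
```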
The paper describes the characteristics of bearings under squeeze film lubrication with the combined effects of magnetohydrodynamics and couple stresses in a cylinder–plane system. The magnetohydrodynamic effect is described by the Hartmann number M₀, and the couple stress parameter l* characterizes the couple stresses.
Figure 2 shows the variation of p* against x* with β = 0.04, l* = 0.2, h_m* = 0.8. The dotted line characterizes the non-magnetic case (M₀ = 0), whereas the solid lines denote the magnetic case M₀ (2–6). It is seen in the graph that with the
In Fig. 5, the load W* is plotted against the minimum film thickness h_m* for several values of the Hartmann number M₀, fixing l* = 0.2, β = 0.04, h_m* = 0.8. M₀ = 0 represents the non-magnetic case, whereas M₀ (2–6) represents the magnetic case. It is observed that as the value of M₀ increases, W* also increases. Figure 6 illustrates the variation in W* for increasing values of l*. It is evident that as l* increases, W* also increases.
Fig. 5 Variation of non-dimensional load carrying capacity W ∗ with h ∗m for different values of M0
with l ∗ = 0.2, β = 0.04
4 Conclusions
Fig. 6 Variation of non-dimensional load carrying capacity W ∗ with h ∗m for different values of l ∗
with M0 = 0.2, β = 0.04
Fig. 7 Variation of non-dimensional load carrying capacity T with h ∗m for different values of M0
with l ∗ = 0.2, β = 0.04
Fig. 8 Variation of non-dimensional load carrying capacity T ∗ with h ∗m for different values of l ∗
with M0 = 0.2, β = 0.04
References
1. Hughes WF, Elco RA (1962) MHD lubrication flow between parallel rotating disks. J Fluid
Mech 13:21–32
2. Kuzma DC, Maki ER, Donnely RJ (1964) The MHD squeeze film. J Fluid Mech 19:395–400
3. Hamza EA (1988) Magnetohydrodynamic squeeze film. J Tribol 110(2):375–377
4. Ariman T, Turk MA, Sylvester ND (1973) Microcontinuum fluid mechanics—a review. Int J
Eng Sci 11(8):905–930
5. Ariman T, Turk MA, Sylvester ND (1974) Applications of microcontinuum fluid mechanics.
Int J Eng Sci 12(4):273–293
6. Stokes VK (1966) Couple stresses in fluids. Phys Fluids 9(9):1709–1715
7. Lin JR (2000) Squeeze film characteristics between a sphere and a flat plate: couple stress fluid
model. Comput Struct 75:73–80
8. Bujurke NM, Jayaraman G (1982) The influence of couplestresses in squeeze films. Int J Mech
Sci 24:369–376
9. Ramanaiha G (1979) Squeeze films between finite plates lubricated by fluids with couplestress.
Wear 54:315–320
10. Sinha P, Singh C (1981) Couplestresses in the lubrication of rolling contact bearings considering
cavitation. Wear 67:85–91
11. Naduvinamani NB, Hiremath PS, Gurubasavaraj G (2005) Effect of surface roughness on the
couple-stress squeeze film between a sphere and a flat plate. Tribology Int 3
12. Naduvinamani NB, Fathima ST, Hanumagauda BN (2011) Magneto-hydrodynamic couple
stress squeeze film lubrication of circular stepped plates. Proc Mech Eng Part I J Eng Tribology
225:1–9
13. Naduvinamani NB, Rajashekar M (2011) MHD Couplestress squeeze-film characteristics
between a sphere and a plane surface tribology—materials. Surf Interfaces 5:94–99
14. Naganagowda HB (2016) Effect of magnetohydrodynamics and couple stress on steady and
dynamic characteristics of plane slider bearing. Tribology Online 11(1):40–49
15. Lin J-R, Liao W-H, Hung C-R (2004) The effects of couple stresses in the squeeze film
characteristics between a cylinder and a plane surface. J Mar Sci Technol 12(2):119–123
Collapse Detection Using Fusion
of Sensor
1 Introduction
S. A. Pattar
L and T Technology Services, Bangalore, India
A. C. Ramachandra (B) · N. Rajesh
Nitte Meenakshi Institute of Technology, Bangalore 560064, India
e-mail: [email protected]
C. R. Prashanth
Dr Ambedkar Institute of Technology, Bangalore 560056, India
a higher level of accuracy, we [2, 3] have selected the heart rate sensor and use a multi-dimensional fusion of physiological and kinematic parameters. The heart rate sensor also helps in terms of cost and space compared with other physiological sensors used in hospitals.
The source data are classified into three categories: fall, non-fall and near-fall, with different variations.
2 Block Diagram
As shown in the block diagram of Fig. 1, the dataset is taken from the GitHub website, and the heart rate and accelerometer data are taken separately. The heart rate sensor and accelerometer data are combined to create the datasheet that serves as input to the training dataset. These training data are raw; there is no proper alignment, and they contain missing values [4, 5], zero values and other defects. In order to get proper output, the data should be pre-processed, which is done in the pre-processing block. After this, the algorithm that best fits the model in terms of high accuracy, high efficiency and low mean square error should be selected. In machine learning, there are many different algorithms; we select the algorithm that fits the model properly with minimum error, high efficiency and high accuracy and train the model. The lower blocks are the heart rate sensor and accelerometer data from the real world, which are given to the trained model. In the trained-model block, the data taken from the test data of the pre-processing block and from the real-time block are used to determine whether the person has fallen or not.
Figure 2 shows the basic working of the model (fall detector). Starting from the first block, the input data are taken from the source, which is the GitHub website. These data are raw, and we pre-process them by adding missing values, removing rows with null values, etc. After pre-processing of the input data, there may be overfitting from the training dataset to the test dataset; in order to remove this overfitting, we perform k-fold cross-validation. We then find the best algorithm for the model in order to minimize the error and increase efficiency. An if condition is then used: if the condition is true, it is a fall, and if it is false, it is a non-fall, and the output is fed back to the dataset so it is not repeated in the next iteration. After detecting fall and non-fall, the performance is evaluated by calculating the sensitivity, specificity, accuracy, recall, precision and F1-score, and the algorithm ends (Fig. 3).
3 Proposed Model
The fall data contain different activities of 21 people of different genders and ages. The dataset was collected from GitHub. The participants are both men and women. It is
collected from an accelerometer, heart rate sensor and gyroscope sensor and divided into three types (6 falls, 9 non-falls and 4 near-falls). The dataset has 12 parameters collected from the participants: w, x, y and z, which are the quaternions of the gyroscope; Ax, Ay and Az, which are the axes of the accelerometer signal (g); d_roll, d_pitch and d_yaw, which are the angular velocities of the gyroscope; and the heart PPG sensor (Fig. 4).
Pre-processing of the input data is used to check for null values and categorical attributes, i.e., whether any null values or categorical values are present in the given input data. For the categorical attributes, a list of categorical columns [6] is created from an empty list with the help of a for loop that separates out the object data types from the given data. Categorical columns that are not necessary for the model are simply removed from the dataset using the drop instruction. Null values in a particular column are found with the help of Boolean masks, and more accurate replacement values are filled in with the help of a pivot table (Fig. 5a, b).
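A minimal pandas sketch of this pre-processing step is shown below; the column names and values are invented for illustration and do not reflect the exact schema of the GitHub dataset, and missing values are filled with column means rather than the pivot-table approach used in the paper.

```python
import pandas as pd
import numpy as np

# Illustrative pre-processing sketch; column names and values are made up.
df = pd.DataFrame({
    "action":     ["fall", "non-fall", "near-fall", "fall"],
    "heart_rate": [110.0, np.nan, 95.0, 120.0],
    "Ax":         [0.1, 0.0, np.nan, 0.9],
    "subject_id": ["p1", "p2", "p3", "p4"],   # a categorical column we drop
})

# Collect object (categorical) columns and drop the ones not needed for the model.
categorical_cols = [c for c in df.columns if df[c].dtype == "object"]
df = df.drop(columns=[c for c in categorical_cols if c != "action"])

# Locate null values, then fill them (here with the column mean).
print(df.isnull().sum())
df = df.fillna(df.mean(numeric_only=True))
print(df)
```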
Input data visualization is done with the package called Seaborn. It is a Python data visualization library built on top of matplotlib that provides [7] a high-level interface for informative statistical graphics. Figure 6a shows a particular column taken as input: the bar graph shows the count of each value in the pre-processed input, and the curve represents the normalized distribution of that column's values. In Fig. 6b, a countplot of the action attribute is shown, where the x-axis represents the different types of action present in the dataset and the y-axis represents the number of actions present in the input dataset.
Label encoding assigns a distinct numeric value to each category of an object-type attribute for better performance. For label encoding, the encoder is imported from sklearn, and the action column is converted into a numerical value based on the specific action. Without label encoding, prediction may be affected; to get a better solution for the given model, the categorical attribute (action) is converted with a label encoder for model improvement, so the model can process it as a numerical value.
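A minimal sketch of the label-encoding step with scikit-learn is shown below; the action values are placeholders for the dataset's actual fall/non-fall/near-fall classes.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Minimal illustration of label-encoding the "action" attribute
# (values are made up; the real dataset has fall / non-fall / near-fall classes).
df = pd.DataFrame({"action": ["fall", "non-fall", "near-fall", "fall"]})
le = LabelEncoder()
df["action_encoded"] = le.fit_transform(df["action"])
print(dict(zip(le.classes_, le.transform(le.classes_))))  # label <-> code mapping
```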
The correlation matrix measures the degree of relatedness of variables. For a pair of variables, correlation analysis yields a numerical value that represents the degree of relatedness of the pair. The correlation coefficient is denoted as r and is shown in the figure as colors ranging from blue to orange, which indicate the relatedness of the variables. If r is near 1, the variables are directly related; if r is near −1, the variables are inversely proportional; and if r is 0, there is no relation between the variables (Fig. 7).
Cross-validation is used to remove overfitting of the given data. The input dataset is split into train and test data for cross-validation. If the training data are not aware of the pattern of the test data, the real-world accuracy will be reduced. To get more accuracy, we use k-fold cross-validation. For example, take 21210 samples in 10 folds, where k = 10. In each of the 10 iterations, 1/10 of the data is taken as the test dataset and the remaining data as the training dataset: the second iteration takes the second tenth as test data and the remaining as training data, and this process is repeated up to the k-th (10th) iteration. The average of the 10 iteration scores is then taken.
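The snippet below sketches 10-fold cross-validation with scikit-learn under the setup described above; the feature matrix and labels are random placeholders, so the scores it prints are meaningless except as a demonstration of the mechanics.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# 10-fold cross-validation sketch: each fold in turn serves as the test set
# while the remaining nine folds train the model, and the scores are averaged.
X = np.random.rand(21210, 12)       # placeholder for the 12 sensor attributes
y = np.random.randint(0, 3, 21210)  # placeholder for the encoded action labels

kf = KFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(DecisionTreeRegressor(), X, y, cv=kf,
                         scoring="neg_mean_squared_error")
print("mean MSE over 10 folds:", -scores.mean())
```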
In machine learning, there are different types of algorithms for different models, and every algorithm has its own advantages and disadvantages. To select the algorithm that suits the model best, we calculate the mean square error and perform cross-validation for every algorithm [8–11]. A small mean square error means that the proposed model is near the fitting line, and cross-validation of the model is used to remove overfitting of the data.
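A sketch of this model-comparison step is given below, assuming scikit-learn and random placeholder data; the MSE values printed by this snippet are therefore not the values reported in the paper.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Comparing candidate regressors by mean squared error on a held-out split.
# X and y are placeholders; the paper's MSE values come from the real dataset.
X = np.random.rand(2000, 12)
y = np.random.rand(2000)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "Linear regression": LinearRegression(),
    "Ridge": Ridge(),
    "Lasso": Lasso(),
    "Decision tree": DecisionTreeRegressor(),
    "Random forest": RandomForestRegressor(n_estimators=100),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    mse = mean_squared_error(y_te, model.predict(X_te))
    print(f"{name}: MSE = {mse:.5f}")
```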
Figure 8 shows the graph obtained from the linear regression model. This statistical model attempts to show the relation between the input variable (x) and the output variable (y) with a linear equation. It is taken from the sklearn linear model. The mean square error of linear regression is 0.1132.
Ridge regression is based on the same principle as linear regression and also comes under the category of regularization (L2 regression). It is used when there are several attributes. Figure 9 is the graph of ridge regression, taken from the sklearn linear model. The mean square error of the ridge regressor is 0.12911.
Least absolute shrinkage and selection operator (Lasso) regression applies a regularization parameter multiplied by the sum of the absolute coefficient values. Figure 10 is the graph of Lasso regression, taken from the sklearn linear model. The mean square error of the Lasso regressor is 0.1532.
A decision tree is a graphical representation of all the possible solutions to a decision based on certain conditions. It starts with a root and then branches off into a number of decisions according to the conditions of the tree. It begins by adding a root node for the tree; all nodes receive a list of input rows, and the root receives the entire training set. Each node then asks a true/false question about one of the features, and in response to the question, it splits the dataset into two different subsets; these subsets become the input to the child nodes. Figure 11 is the graph of decision tree regression, taken from sklearn. The mean square error of the decision tree regressor is 0.0.
Random forests are made up of decision trees. The disadvantage of a single decision tree is that it does not perform well on real datasets; it tends to overfit, meaning it performs well on the training dataset but not on the test dataset. The decision tree has high variance and low bias (on datasets that are not used as training data). To overcome this problem, decision trees are used in a different form called a random forest. Random forest uses the ensemble learning method, in which the prediction is based on the combined results of various individual models. It takes the entire dataset and
creates subsets of the data. The size of the data remains the same, and all the subsets have the same number of rows, i.e., subsets are taken at random with replacement. Figure 12 is the graph of random forest regression, taken from sklearn. The mean square error of the random forest regressor is 0.00013 (Table 1).
The table above lists five different algorithms and the mean square error of each. Linear regression is the simplest algorithm; it works on the basis of a linear equation, its MSE is higher, and the accuracy of the model is low. The ridge algorithm is used for multidimensional data in model selection; its MSE is higher compared
to linear regression, and its accuracy is lower when time management is considered. Lasso is another algorithm, an embedded shrinkage method that performs both variable selection and regularization at the same time; it has a higher MSE but good accuracy. Decision tree regression has a very low MSE compared to the other algorithms and very high accuracy. The random forest regressor also has a small MSE. Random forests are made up of decision trees, whose disadvantage is that a single tree does not perform well on real data and tends to overfit, performing well on the training dataset but not on the test dataset, with high variance and low bias; the random forest form is used to overcome this problem.
3.8 Evaluation

• Specificity tells what percentage of patients without a fall were correctly identified.

Specificity = TrueNegative / (TrueNegative + FalsePositive)

• Accuracy is calculated as the total number of correct predictions (true positives and true negatives) divided by the total number of samples in the dataset.

• Recall is calculated as the percentage of actual positive cases that are correctly identified by the model.

Recall = TruePositive / (TruePositive + FalseNegative)

• Precision is the percentage of predicted positive cases that are actually positive.

Precision = TruePositive / (TruePositive + FalsePositive)

• F1-score is calculated as a balance between precision and recall, taking both values into account; it indicates how accurately the model predicts both.

F1-score = (2 × Recall × Precision) / (Recall + Precision)
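The metrics above can be computed directly from confusion-matrix counts, as in the sketch below; the counts used are illustrative, not the paper's results.

```python
# Computing the evaluation metrics from confusion-matrix counts.
# The counts below are made-up example values.
tp, tn, fp, fn = 180, 160, 12, 8

sensitivity = tp / (tp + fn)            # a.k.a. recall
specificity = tn / (tn + fp)
accuracy    = (tp + tn) / (tp + tn + fp + fn)
precision   = tp / (tp + fp)
f1_score    = 2 * sensitivity * precision / (sensitivity + precision)

print(sensitivity, specificity, accuracy, precision, f1_score)
```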
4 Result
The proposed model is used to predict fall and non-fall events, which is helpful for people who may fall suddenly without any notice. Prediction is done on real-time data as well as on the already built-in data.
In Fig. 13, the predictions are made on the already built-in test dataset, and the predicted values are stored as a separate data file so that the actual test data and the predicted values can be compared.
In Fig. 14, the prediction is made on real-time test data. To create the real-time test data, data values are entered manually, and the action is then predicted.
5 Conclusion
We have discussed fall detection using the fusion of a heart rate sensor and an accelerometer, which acts as both accelerometer and gyroscope, with the help of machine learning algorithms. The machine learning algorithm fits the model according to the dataset. Our results are compared with an existing method from the literature survey, i.e., cluster-based fall detection, which detects falls using only an unsupervised clustering algorithm and whose accuracy and efficiency are lower. In order to increase the accuracy, sensitivity and efficiency, we used the same dataset but with different machine learning algorithms. Two algorithms give low mean square error and high efficiency: the decision tree regressor and the random forest regressor. The decision tree regressor has the lowest MSE, but it is less accurate on real-time data. In order to avoid this disadvantage, we used the random forest regressor, which gives low MSE, high accuracy and a high F1-score.
References
1. Khojasteh SB, Villar JR, Chira C, González VM, de la Cal E (2018) Improving fall detection
using an on-wrist wearable accelerometer. Sensors (Basel, Switzerland)
2. Lee M-S, Lim J-G, Park K-R, Kwon D-S (2017) Unsupervised clustering for abnormality
detection based on the tri-axial accelerometer. ICCAS-SICE, 2017
3. Lee J-S, Tseng H-H (2019) Development of an enhanced threshold-based fall detection system
using smartphones with built-in accelerometers. IEEE Sens J
4. Bourke AK, Van de Ven PW, Chaya AE, OLaighin GM, Nelson J (2018) Testing of a long-term
fall detection system incorporated into a custom vest for the elderly. Eng Med Biol Soc
5. Youngkong P, Panpanyatep W (2021) A novel double pressure sensors-based monitoring and
alarming system for fall detection. In: Second international symposium on instrumentation,
control, artificial intelligence, and robotics, 2021
6. Chen Y, Du R, Luo K, Xiao Y (2021) Fall detection system based on real-time pose estimation
and SVM. In: IEEE 2nd international conference on big data, artificial intelligence and internet
of things engineering, 2021
1 Introduction
and intended recipient, suspects the existence of the message, a form of security through obscurity. Audio steganography is the idea of hiding information by concealing it in another medium, such as an audio file. In this paper, we mainly discuss various kinds of audio steganography techniques and their benefits and drawbacks.
Audio steganography deals with a strategy to hide a secret message in an audio file. Moreover, audio steganography can be used for covert watermarking, i.e., hiding ownership or copyright information in the audio, which can be verified later to establish ownership rights.
Audio steganography conceals the very presence of a message, so that if successful it attracts no suspicion at all. Using audio steganography, information can be hidden in audio files of mp3 or wav format and thus stored or transmitted without grabbing the attention of any third party who is looking for sensitive data, ensuring data privacy by taking advantage of the limitations of the human auditory system.
The main objective of this paper is to utilize the technique of audio steganography to embed any kind of information, such as a txt, pdf, jpeg, or png file, into a cover audio file, which results in a stego-object of mp3/wav format, depending on the type of information embedded into it. The user is given a key to extract the information from the stego-object.
2 Literature Survey
Hmood et al. [3] note that since secure information transfers are growing day by day, steganography has become very relevant, and new techniques have been developed. Steganography is a technique in which the information of interest is hidden inside other information so that the carrier information does not change substantially and continues to appear the same as the original. Their work proposes a new approach to concealing an encrypted smartphone image in an audio file.
Jian et al. [6] report that their application was successfully developed and reached its projected objectives; it allows users to embed and extract their message in wav audio, which offers confidentiality in communication. However, one constraint is that the application supports only documents to be embedded in wav audio. Therefore, additional file formats like video and documents may be added to the application as a carrier medium. This mechanism can be further improved through detailed research on file formats.
Chandrakar et al. [7] state that their steganographic application has been designed with emphasis on enhancing the capacity and security of message transmission. This user-friendly application maintains the three major aspects of the user's message privacy by covertly hiding the message. Messages are randomly mixed before hiding, which adds another advantage in terms of security. Use of the least significant bit (LSB) algorithm adds the advantage of operating with any audio
file format. Audio quality is kept intact without any deterioration, even in cases where two or more audio files are used to reduce the suspicion of any secret communication.
Timothy et al. [8] observe that there are a variety of proven ways to apply steganography to hide information within audio data. In their work, an audio steganography system for MP3 and MP4 using discrete cosine transform (DCT) and spread spectrum techniques was developed. It was shown through implementation and subjective experimentation that the developed audio steganography system supports the MP3 and MP4 digital audio formats. The system can embed a secret message of up to 500 kb; it can embed a text of 250 kb relative to the digital audio length or size with no distortion and retains the same size after embedding text into it. The work developed a robust system that helps secure and share a large amount of sensitive information without arousing suspicion. The method is therefore recommended for security agencies and other organizations that consider data security to be of utmost priority, and it could help send covert battlefield information via an innocuous cover audio signal.
Indrayani et al. [9] show that steganography on WAV-format audio using multiple levels of LSB manipulation, ranging from LSB + 1 to LSB + 6, is successful. The higher the LSB level used, the larger the secret data that can be embedded, but the resulting stego-object will have higher noise. PSNR and spectrograms are used to compare various sizes of audio files and measure the noise level.
User(s) Registration
The user has to register if not registered already
Input: User name, distinctive Id, face data
Output: Successful registration message is displayed.
System Login
Using credentials login to system.
Wrong use of credentials is handled by error handling.
Input: User credentials.
Output: Successful message for valid credentials.
Using Audio file to embed Secret file.
The user has to select the file of any format which has to be hidden.
A smaller size cover audio file is chosen by the user.
A relatively larger audio file is chosen by the user again.
The secret file is embedded with the smaller audio file first and again embedded
in the larger (second) audio file.
4 Design
Two audio files are used to embed any kind of secret file. Any type of secret file can
be hidden in the cover audio file. Once the secret file is embedded, the recovery key
and stego-object which has the secret file is returned to the user. This stego-object
can be saved in a hard disk or over the cloud. This can also be used in communication
between two individuals. In order to extract the secret file, recovery key and stego-
object are required. Once the file is extracted, it can be returned with no damage to
the file (Fig. 1).
5 Implementation
5.1 Registration
The user needs to register an account, which is stored in a local database using SQLite3, and only a registered user can use the program.
A secret file of any format, be it a document (pdf, txt, docx) or an image (jpg, jpeg, png), has to be chosen by the user as the file to be hidden. As cover files, two audio files of wav format have to be chosen. To ensure that the resulting stego-object is noise-free, the first audio should be smaller, and the second audio should be larger than the first.
Prepare: The smaller and larger files are prepared for the embedding of the secret file in this step. To compute the number of LSBs required to correctly embed the secret file into the audio files, values such as the channel number, wave width and wave frames are obtained and processed. The prepare function is invoked each time an audio file is embedded.
After the prepare procedure is completed, the secret file, which has now been turned into a binary array, is embedded into the prepared audio file using the LSB approach. The binary values of the secret file are written into the least significant bits of the audio samples, allowing them to be successfully hidden.
After a successful embedding of the secret file, the user is given a base64 recovery key that is used to retrieve the secret file from the resulting stego-object.
5.6 Extracting
When the user wants to extract the secret data, he has to log in to the software and select the stego-object and recovery key; using the same processing and LSB method used for embedding, the secret file is retrieved (Fig. 2).
Registration:
The program asks the user for credentials such as username, password and email id and stores them in a local database called data.db using SQLite3. The function reg() performs the abovementioned process (Fig. 3).
Secret file to binary conversion:
Since we are using LSB encoding to hide the data, the entire secret file is converted into a binary array. This is done using Python's built-in open() function in binary read mode (Fig. 4).
Embedding:
Since the audio file contains a waveform, a lot of processing is involved. In the processing phase, the width of the wave, the frames and the required number of LSBs are calculated, and this is repeated for both audio files. Once the processing is complete, the binary array from the secret file is embedded into the LSBs of the waveform, and a recovery key is generated using the base64 function (Fig. 5).
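A minimal, single-cover sketch of this LSB step using Python's wave module is shown below. The actual tool prepares two cover files, computes how many LSBs per sample are needed, and issues a base64 recovery key; here the file names are placeholders and exactly one LSB per audio byte is used.

```python
import wave

# Minimal single-file LSB embedding sketch; "secret.pdf" and "cover.wav"
# are placeholder file names, not the project's actual inputs.
secret_bits = []
with open("secret.pdf", "rb") as f:            # secret file -> list of bits, MSB first
    for byte in f.read():
        secret_bits.extend((byte >> i) & 1 for i in range(7, -1, -1))

with wave.open("cover.wav", "rb") as cover:
    params = cover.getparams()
    frames = bytearray(cover.readframes(cover.getnframes()))

if len(secret_bits) > len(frames):
    raise ValueError("cover audio too small for the secret file")

# Overwrite the least significant bit of each audio byte with a secret bit.
for i, bit in enumerate(secret_bits):
    frames[i] = (frames[i] & 0xFE) | bit

with wave.open("stego.wav", "wb") as stego:
    stego.setparams(params)
    stego.writeframes(bytes(frames))
```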
Extraction:
When the user wants to extract the secret data, he has to log in to the software and select the stego-object and recovery key; using the same processing and LSB method used for embedding, the secret file is retrieved (Fig. 6).
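A companion sketch of the extraction step is given below; in the real tool the length and placement of the hidden data would come from the recovery key, whereas here n_bytes is simply assumed.

```python
import wave

# Recover the first n_bytes of the secret file from the stego wav by reading
# back the LSBs written during embedding. n_bytes is an assumption here;
# the real application derives it from the recovery key.
n_bytes = 1024

with wave.open("stego.wav", "rb") as stego:
    frames = stego.readframes(stego.getnframes())

bits = [frames[i] & 1 for i in range(min(n_bytes * 8, len(frames)))]
recovered = bytearray()
for i in range(0, len(bits), 8):
    value = 0
    for bit in bits[i:i + 8]:                  # reassemble bytes, MSB first
        value = (value << 1) | bit
    recovered.append(value)

with open("recovered.bin", "wb") as out:
    out.write(bytes(recovered))
```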
The system uses two audio files to ensure that the resulting stego-object has minimum noise. To check whether we have achieved our objective, the resulting stego-object is verified with two metrics: PSNR and the spectrogram. For PSNR, the minimum value a stego audio file must have to be called robust is 30 dB, and the application was 90% successful in obtaining a value of 30 dB or more. When employing steganography, a spectrogram is used to compare the frequencies of two audio samples within a file; the system shows only small differences in the resulting graphs in about 95% of the tests.
We use PSNR as a metric to check the noise level by contrasting the original audio with the stego-object. The PSNR value is obtained by comparing the signal strength before and after the steganographic procedure. A high PSNR value indicates good sound quality, whereas a low PSNR value indicates deteriorated sound quality with a lot of noise. A sound quality rating of at least 30 dB is considered good [13]. The PSNR formula is written in Eq. 1:

PSNR = 10 log₁₀ ( Σᵢ₌₁ᵐ x₁² / Σᵢ₌₁ᵐ (x₁ − x₀)² )    (1)
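A small helper following the form of Eq. (1) is sketched below, where x1 is taken to denote the original samples and x0 the stego samples (my reading of the equation); the arrays passed in the example are made up.

```python
import numpy as np

def psnr(original, stego):
    """PSNR between cover and stego samples, following the form of Eq. (1):
    10 * log10( sum(x1^2) / sum((x1 - x0)^2) )."""
    x1 = np.asarray(original, dtype=np.float64)
    x0 = np.asarray(stego, dtype=np.float64)
    noise = np.sum((x1 - x0) ** 2)
    if noise == 0:
        return float("inf")                    # identical signals
    return 10.0 * np.log10(np.sum(x1 ** 2) / noise)

# Toy usage with made-up sample arrays:
print(psnr([100, -50, 75, 60], [100, -50, 74, 60]))
```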
7 Conclusion
The system is successful in achieving its objective of hiding multiple file formats in an audio file. It is robust and secure in the sense that the extracted secret file is intact without any corruption, and both audio files carry minimum to no noise while they are embedded with secret files. While we can successfully embed files into the wav format, we have a few issues with mp3 embedding: mp3 is a heavily compressed audio format, which brings issues such as size imbalance. These issues can be addressed in the future.
References
1. Artz D (2001) Digital steganography: hiding data within data: Internet Computing. IEEE
5(3):75–80. https://doi.org/10.1109/4236.935180
2. Amin MM, Salleh M, Ibrahim S, Katmin MR, Shamsuddin MZI (2003) Information hiding
using steganography. In: 4th National conference of telecommunication technology. https://
doi.org/10.1109/NCTT.2003.1188294
3. Hmood DN, Khudhiar KA, Altaei MS (2012) A new steganographic method for embedded
image in audio file. Int J Comput Sci Secur (IJCSS) 6(2):135–141
4. Morkel T, Eloff JH, Olivier MS (2005) An overview of image steganography. Proceedings of
the ISSA
5. Chung YY, Xu FF, Choy F (2006) Development of video watermarking for MPEG2 video. In:
TENCON 2006—2006 IEEE region 10 conference. https://doi.org/10.1109/TENCON.2006.
343843
6. Jian CT, Wen CC, Rahman NH, Hamid IR (2017) Audio steganography with embedded text.
In: IOP conference series: materials science and engineering. Vol 226. https://doi.org/10.1088/
1757-899X/226/1/012084
7. Chandrakar P, Choudhary M, Badgaiyan C (2013) Enhancement in security of lsb-based audio
steganography using multiple files. Int J Comput Appl 73(7) (0975-8887). https://doi.org/10.
5120/12754-9705
8. Timothy AO, Adebayo A, Junior GA (2020) Embedding text in audio steganography system
using advanced encryption standard, text compression, and spread spectrum techniques in Mp3
and Mp4 file formats. Int J Comput Appl 177:46–51. https://doi.org/10.5120/ijca2020919914
9. Indrayani R (2020) Modified LSB on audio steganography using WAV format. In: 3rd Inter-
national conference on information and communications technology, pp 466–470. https://doi.
org/10.1109/ICOIACT50329.2020.9332132
10. Python wave library: https://docs.python.org/3/library/wave.html
11. Python base64 library: https://docs.python.org/3/library/base64.html
Run-time Control Flow Model Extraction
of Java Applications
1 Introduction
amount of code coverage in these tools is low; a lot of code in the application part is left undetected. We have created a bytecode instrumentation tool based on the DiSL framework in order to extract the run-time execution trace of applications. The application code, along with its libraries, can be instrumented, and the execution trace can be collected using the tool. This trace can be used to construct the run-time model of the application, on which the properties of interest can be verified.
2 Literature Review
scheme based upon regular expressions to compactly represent long sequences and
an algorithm for computing these labels in the representations.
In [11], the author describes how JIVE helps check the consistency of Java run-time behaviour against design-time specifications.
In [12], the author proposes a technique that provides a clear and concise picture of
the history of program execution with respect to entities of interest to a programmer.
It presents the technique along with experimental results from summarizing several
different program executions in order to illustrate the benefit of the approach.
In [13], the author describes Adrenalin RV, a run-time verification tool for android.
Adrenalin RV overcomes the limited bytecode coverage issue found in many run-
time verification tools. Adrenalin RV is based on DiSL which is a dynamic program
analysis framework. Adrenalin RV uses load-time weaving to intercept every class
that is loaded by the VM during the load time, which removes the problem of static
weaving which only intercepts the APK classes. This allows Adrenalin RV to monitor
all classes (including zygote).
In [14], the authors build a new run-time verification framework to perform RV on multiple processes on the Android platform. The framework is built upon Adrenalin RV, which uses DiSL, a dynamic program analysis framework. In this framework, the event order is recorded by extending libraries such as Binder. The framework also extends regular expressions (RE) so that an RE can span multiple processes.
In [15], the author presents an updated framework for run-time verification of multi-process Android applications. This framework extends Android's standard built-in IPC, which is overridden in the Binder library, and deploys a shared-memory service for interaction between two processes. It is built on top of the multi-process framework and Adrenalin RV by the same author.
3 Architecture
The architecture is depicted in Fig. 1. The application can be analysed for run-time control flow model extraction either from its source code or after compiling it into a JAR. By modifying the DiSL properties, the tool runs according to the form used (source code or JAR file).
A custom DiSL code for the application is stored along with the application in
the file system.
The code along with application or source code is executed in the JVM envi-
ronment, and run-time trace extraction takes place. The trace contains the order of
execution of methods during the run time. The method trace obtained is then used in
an analysis model which will produce the state diagrams of the method executions.
Fig. 1 Architecture
4 Implementation
desired format for the project, i.e. CSV. We used the FileWriter class to write the traced data into the output file.
Once the coding part was done, we started instrumenting the target files using the DiSL framework in the JVM. The input to DiSL should be two JAR files: one containing the target classes and the other containing the instrumentation DiSL classes. During execution, whenever the target classes are called by the JVM for execution, they are instrumented using the instrumentation class. The instrumentation process is performed by the DiSL framework. While the target application is being executed, the trace data is written to the CSV file.
Once the CSV has been derived, we use the JIVE tool to build the state diagram. JIVE is a plug-in extension of the Eclipse IDE which allows us to build the state diagram from a CSV in the required format, so we modify the format of the CSV to match the requirements of the JIVE tool. We then use the trace file to build our state diagram with JIVE.
4.1 DiSL
For instrumentation of Java applications, DiSL has been chosen due to the wide
coverage of the application DiSL provides for bytecode instrumentation.
Many dynamic analysis tools for programs written in managed languages such as
Java rely on bytecode instrumentation. Bytecode instrumentation covers more areas
than traditional instrumentation techniques. As shown in Fig. 2, the instrumentation
covers the scope of the class mentioned at the ‘Before’ tag. Tool development is often
tedious because of the use of low-level bytecode manipulation libraries. While aspect-
oriented programming (AOP) offers high-level abstractions to concisely express
certain dynamic analyses, the join point model of mainstream AOP languages such
as AspectJ is not well suited for many analysis tasks, and the code generated by
weavers in support of certain language features incurs high overhead.
DiSL contains the following features which are being implemented: instrumen-
tations, snippets, markers, control of snippet order, synthetic local variables, thread-
local variables, static context information and dynamic context information. Based
on these features, custom DiSL code is created which can retrieve the trace file.
The instrumentation file accesses the information related to the method being executed and stores the relevant information in CSV format. The trace consists of all the methods invoked during the program run.
The trace file, which contains information similar to Fig. 3, consists of the thread in which the method was invoked, a serial number, the corresponding class's source file, the status of the method (entry, exit or called) and the context and target method. From the trace file, we can generate state diagrams.
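Because the exact CSV layout is specific to our instrumentation, the following is only a hedged Python sketch of the analysis step: it assumes rows of the form (thread, index, source file, status, method) as described above and collects the method-call transitions observed in each thread, from which a state diagram can be drawn. The column order and file name are illustrative assumptions, not the tool's actual code.

```python
import csv
from collections import defaultdict

def method_transitions(trace_path):
    """Collect (previous_method -> next_method) transitions per thread
    from a trace CSV with rows: thread, index, source_file, status, method."""
    last_method = {}                  # thread -> last method seen
    transitions = defaultdict(set)    # thread -> {(from, to), ...}
    with open(trace_path, newline="") as f:
        for thread, _idx, _src, status, method in csv.reader(f):
            if status.strip().lower() not in ("entry", "called"):
                continue              # only method entries/calls start a new state
            prev = last_method.get(thread)
            if prev is not None:
                transitions[thread].add((prev, method))
            last_method[thread] = method
    return transitions

# Illustrative usage with an assumed trace file name.
for thread, edges in method_transitions("method_trace.csv").items():
    print(thread, sorted(edges))
```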
A perceptron aids in the classification of input data: it receives several inputs and returns an output when a certain threshold is reached. This can then be used to determine the target class of a sample in the sense of supervised learning and classification.
Single-layer perceptron Java application [8] uses the single-layer perceptron
neural network for binary classification of data. This application is implemented
using Java Swing. The DiSL instrumentation implemented for this Java application required a larger number of instrumentation methods to achieve maximum code coverage. A portion of the instrumentation code is shown in Fig. 6.
After implementation, we successfully obtained the trace of a Java executable program. We have also obtained the state diagram for the resulting trace. We then filtered the trace to obtain a higher-level state diagram.
We have selected perceptron, a Java application that produces a single-layer
perceptron model of any data set given, as the testing application for instrumen-
tation. We have attached the trace files, state diagrams, statistics of the trace file and
other required screenshots.
We have trained the single-layer perceptron using the training data set we created. Training is driven by parameters such as the learning rate, threshold, maximum number of convergence iterations and initial weight range. Based on the test data we have chosen, we obtained the final threshold, synaptic weights, training recognition rates and testing recognition rates as the final results.
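The application itself is written in Java Swing; purely for illustration, the following is a minimal Python sketch of single-layer perceptron training driven by the same kinds of parameters mentioned above (learning rate, threshold, maximum number of epochs and initial weight range). The function name, defaults and toy data are assumptions, not the application's code.

```python
import numpy as np

def train_perceptron(X, y, learning_rate=0.1, threshold=0.0,
                     max_epochs=100, init_weight_range=0.5, seed=0):
    """Train a single-layer perceptron for binary classification (y in {0, 1})."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(-init_weight_range, init_weight_range, X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > threshold else 0
            update = learning_rate * (target - pred)
            if update != 0.0:
                w += update * xi      # adjust weights toward the misclassified sample
                b += update
                errors += 1
        if errors == 0:               # converged: every training sample classified correctly
            break
    return w, b

# Tiny illustrative data set (logical AND).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print("weights:", w, "bias:", b)
```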
Figure 7 shows the trace generated for the dining philosophers problem execution,
and each row represents a different stage of the program. The trace files contain
information such as thread name, index, file name and action performed at that
particular stage with method name included. Based on the trace, there are three
different states: thinking, hungry and eating, and we can determine from the trace whether each philosopher was thinking, hungry or eating.
We have also obtained the models like field state diagram and method call state
diagrams shown in Figs. 8 and 9 using the trace files we have generated. In the
field state diagram, the terms T, H and E stand for thinking, hungry and eating. The
diagram represents the situation of each philosopher in different stages of time. Field
state diagram mentions the value of the state whether the philosophers are thinking,
hungry or eating. If you consider the first state, there are five Ts, representing five threads that are all thinking. The other states similarly represent the different stages of the process.
The image shown in Fig. 10 is the method trace obtained by executing the single-layer perceptron Java application.
Each row represents a different stage of the program. The trace files contain
information such as thread name, index of the state, file name and action performed
at that particular stage with method name included. This trace file is helpful in
obtaining a method state diagram of the application.
Using the trace file generated, we have obtained the method call state diagram, shown
in Fig. 11, for the perceptron. The obtained state diagram contained methods related
to the UI application. So to obtain a state diagram for only the core aspects of the
application, the trace file obtained is modified in the instrumentation code to ignore
these UI methods and correspondingly a new state diagram is obtained as shown in
Fig. 12. In the method call state diagram, it can be noted that the method invocation
order can be obtained and used for verification purposes. The single-layer perceptron
application consists of methods such as loadFile, resetFrame, resetData, startTrain,
trainPerceptron, RandomNumber and TestPerceptron.
In this article, we used a different technique for run-time control flow model extrac-
tion of Java applications with the help of the DiSL framework. We have success-
fully generated the state diagrams and method call diagrams from the method trace
obtained. We have extracted the trace for the selected Java applications, and we have
created corresponding models for the extracted traces. Furthermore, instrumenta-
tion for a classical Java problem has been carried out with positive results. Finally,
we provided a case study, where the technique was implemented and the result was
obtained and analysed.
This research was done with Java applications, and the tool can currently be used to extract the required control flow models from Java applications. Many real-world embedded systems also use Java applications; although such applications are not widespread at present, they will play a big role in the future, and the same technique can be used to extract models for many real-world Java applications containing various methods and classes. The bytecode instrumentation can be improved in the future to support events such as field value changes.
References
1 Introduction
In the recent era, authentication is mandatory for security purposes in all domains, e.g., robotics, surveillance, law enforcement and interactive game applications. Authentication is the process of recognizing a person's identity, and this can be done using a biometric system. It uses human features such as finger structures, facial characteristics, etc. Using these features, a biometric system compares them with existing data and then identifies a person. There are plenty of biometric systems on the market, such as iris recognition, fingerprint recognition and face detection. These biometric systems are used to identify criminals easily in many sectors.
The face is a complex multi-layered structure and needs powerful computing procedures for recognition. The face is our essential and first focus of attention in social life, playing a significant part in a person's identity. We can recognize the many faces learned throughout our lifetime and identify a face at a glance even years later [1]. There may be variations in faces because of ageing and distraction. To identify a given person, we need something more reliable for verification or identification. To suit this requirement, there are devices that use fingerprints and the voice of a person; such a biometric system uses automatic methods to identify persons accurately. The characteristics are measurable and distinctive. This is a typical
situation in which the level of security provided is measured by the amount of money a fraudster must spend to gain unauthorized access [2].
There are a few strategies for detecting faces. Feature-based approaches use local facial features, the nose, mouth and eyes, and the structural relationships among them. These strategies are considered robust against illumination changes, occlusions and viewpoint changes. Nonetheless, good-quality pictures are essential, and the computational techniques are costly. Another methodology is appearance-based techniques, where face detection is viewed as a two-class pattern recognition problem [3]. The classification depends on features computed from pixel values in the search window. Several feature types are in use; for example, a Haar-like classifier is built by statistical learning over a large set of samples [4].
Face detection is computationally costly, and it is a challenging task for hybrid [CPU-GPU] platforms [5]. Graphics processing units (GPUs) in hybrid [CPU-GPU] platforms offer enormous parallel computing resources, as well as high floating-point performance and high memory bandwidth. With the advent of the compute unified device architecture (CUDA) [6] and the Open Source Computer Vision library (OpenCV), these resources have become available for conventional computing [7]. Unlike parallel distribution [8] on CPU-based HPC infrastructures (clusters) [9], lightweight computation on a GPU in a hybrid [CPU-GPU] platform achieves parallelism by queuing and dispatching an enormous number of lightweight kernel instances (also called threads, work-items or microkernels). Typically, each of these handles the calculation for one result element, implying that one processing cycle is realized with up to a huge number of microkernels. Nevertheless, as GPUs are specialized processors, they can only be used to accelerate calculations that fit the GPU design.
The rest of the paper is structured as follows. Section 2 explains the related work on face detection. Section 3 describes the hybrid [CPU-GPU] infrastructure used for real-time face detection. Section 4 presents the face detection strategy, and Sect. 5 describes the experimental results. We briefly conclude in Sect. 6.
2 Related Work
work accelerated the face detection algorithm by using accelerators such as a GPU with OpenCL. The main idea was to improve the performance while preserving functional equivalence. Overall, from this implementation we can see that the GPU delivers faster execution.
Sharma et al. [10] surveyed two algorithms for detecting people in night-vision videos. The proposed hotspot algorithm uses the black-body radiation theory, and the background subtraction algorithm uses the difference image obtained from the input image and a generated background image. A comparative analysis of the experiments performed on these approaches is presented.
Viola et al. [11] describe how to process images very fast using a machine learning approach to visual object detection. The study makes three key contributions. The first is the integral image, which allows the features used by the detector to be computed quickly. The second is an AdaBoost-based learning algorithm, which selects a few features from a larger set very efficiently. The third contribution is a cascade of classifiers, which gradually narrows the image regions most likely to contain a face.
Lescano et al. [12] utilized GPUs in the training phase to reduce the time needed to process training data and compared their performance with other works. Outstanding results were obtained by using the CUDA framework, reaching adequate times for the training phase.
Mutneja et al. [13] explored three main contributions in their work. First, by using Haar features based on the Viola–Jones framework, the authors achieved a noteworthy acceleration by parallelizing detection on the GPU. Second, through AdaBoost analysis they developed more inventive and productive methods for choosing classifiers for the task of face recognition, which can additionally be generalized to object detection. Third, they implemented parallelization of a modified version of the Viola–Jones face detection algorithm in combination with skin-tone filtering to reduce the search space. A considerable reduction in search time was achieved by using skin-tone filtering together with the Viola–Jones algorithm: a 54.31% reduction in GPU time versus CPU time was achieved by the proposed parallelized algorithm at an image resolution of 640 * 480.
Fig. 1 Framework of the hybrid [CPU + GPU] cluster environment, with each node having GPUs of different computing capabilities
of the CPUs and GPUs computation capacity. The GPUs' performance changes with different workloads, while the CPU's performance is moderately stable for the load running on it. In the hybrid [CPU + GPU] architecture, a PCIe bus is used to interconnect the CPUs and GPUs. Every CPU has numerous cores with their respective caches, and a GPU consists of several streaming multiprocessors (SMs). In this architecture, the CPUs are responsible for allocating tasks, initiating computation, controlling the GPUs and, finally, fetching the computed results from the GPUs [14].
Once the compute-intensive real-time face detection application is launched on the hybrid [CPU-GPU] infrastructure, we use the pinned-memory technique with a single MPI process. A single MPI process per node reduces the inter-node communication overhead and improves memory bandwidth. The MPI process first transfers the computed workload to the respective nodes in the heterogeneous cluster; on each node, the workload is then distributed dynamically between the CPU and the GPUs based on their computing capabilities, speed and hardware specification.
The MPI process spawns as many OpenMP threads as there are CPU cores within each node. Only the main thread cooperates with the GPU, while the other threads perform the numerical computations in parallel. On the GPU, kernels are launched in parallel and processed very rapidly using the grid dimensions, the number of blocks and the number of threads. Lastly, the results of the computed kernels are transferred back to CPU memory using the MPI process, as shown in Fig. 2.
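The actual system is implemented with MPI, OpenMP and CUDA; purely as a rough, hedged illustration of the distribution idea (one MPI process per node, workload split by capability), a Python/mpi4py sketch might look like the following. The capability weights, frame count and all names are assumptions, not the paper's code.

```python
# Run with, e.g.:  mpiexec -n 3 python distribute_frames.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

N_FRAMES = 240                       # assumed size of the video workload
CAPABILITY = [3.0, 2.0, 1.0]         # assumed relative CPU+GPU speed per node type

if rank == 0:
    weights = [CAPABILITY[i % len(CAPABILITY)] for i in range(size)]
    total = sum(weights)
    # Split frame indices proportionally to each node's assumed capability.
    chunks, start = [], 0
    for i, w in enumerate(weights):
        end = N_FRAMES if i == size - 1 else start + int(round(N_FRAMES * w / total))
        chunks.append(list(range(start, end)))
        start = end
else:
    chunks = None

my_frames = comm.scatter(chunks, root=0)    # each node receives its share of frames
# ...each node would then split my_frames again between its CPU threads and its GPU...
results = [f"frame {i} processed by node {rank}" for i in my_frames]
all_results = comm.gather(results, root=0)  # results are collected back on the root
if rank == 0:
    print(sum(len(r) for r in all_results), "frames processed in total")
```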
Only CUDA-based GPU face detection has various issues, such as:
1. Data communication overhead between the CPU and GPU and vice versa
2. It only supports NVIDIA's GPUs
3. Each application stage depends on the previous stage, so the efficiency gain will not be noticed on a CUDA-only GPU
4. Due to performance overhead, CUDA does not support exception handling, but it compensates by using thousands of threads.
To address these issues, in this research work we considered a hybrid [CPU-GPU] infrastructure and implemented the Viola–Jones face detector on the CPU with OpenCV and on the GPU with CUDA, which improved the performance. The implementation was optimized in different ways to accelerate its speed.
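For reference, a minimal Python/OpenCV sketch of Viola–Jones detection with a Haar cascade classifier on the CPU is shown below; it is only a hedged illustration of the cascade classifier, not the parallel hybrid implementation evaluated in this paper. The camera index and detection parameters are assumptions; the cascade file is the frontal-face model shipped with OpenCV.

```python
import cv2

# Load the pre-trained frontal-face Haar cascade shipped with OpenCV.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade_path)

cap = cv2.VideoCapture(0)            # assumed camera index; the paper uses a 24 fps live feed
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # scaleFactor and minNeighbors are typical values, not the tuned parameters of this work.
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("faces", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```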
Even in the recent era, many domains still use the Viola–Jones algorithm-based real-time face detection system to detect faces. A face detection algorithm should have two important qualities: accuracy and speed. Three ingredients work together to enable fast and accurate detection: integral image feature computation, AdaBoost feature selection and loading of the cascade classifier for efficient computational resource allocation.
1. Integral Image feature Computation
The integral image is defined as a summation over the pixel values of the original image. The value at any location (x, y) of the integral image is the sum of the image's pixels above and to the left of location (x, y). The figure below illustrates integral image generation. The integral image uses simple rectangular sums to derive an intermediate representation of an image. An image is an array of pixels, with (x, y) denoting the pixel at that location. So, if the original image is A(x, y) and the integral image is AI[x, y], then the integral image is computed as outlined in Fig. 3 (Figs. 4 and 5).
$$ii(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y')$$
The 2D wavefront computation is perhaps the most effective way to reduce the spatial data dependence created by integral images, as shown in Fig. 6. We have identified some issues with this algorithm when deployed on the hybrid [CPU-GPU] infrastructure, as it involves sparse non-linear memory accesses. Therefore, an efficient strategy is to implement the integral image algorithm as separate row and column computations.
Hence, the steps of the computation are as follows (see the sketch after the equations below):
1. Scan the rows of the input image
2. Transpose the computed result
3. Scan the rows of the transposed result
4. Transpose again to produce the final result.
$$I(x, y) = \sum_{(i,j)=(0,0)}^{(x,y)} F(i, j)$$
$$I(x, y) = \sum_{(i,j)=(0,0)}^{(x,y)} F(i, j) = \sum_{i=0}^{x} \sum_{j=0}^{y} F(i, j)$$
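As a concrete illustration of the row-and-column formulation above, the following NumPy sketch computes the integral image with two row scans and two transposes. It is a sequential stand-in for the GPU scan kernels, not the CUDA implementation described in this paper.

```python
import numpy as np

def integral_image(img):
    """Integral image via row scans and transposes:
    scan rows, transpose, scan rows again, transpose back."""
    step1 = np.cumsum(img, axis=1)     # prefix sums along each row
    step2 = step1.T                    # transpose the intermediate result
    step3 = np.cumsum(step2, axis=1)   # row scan of the transposed result
    return step3.T                     # transpose back: I(x, y) sums the whole rectangle

# Quick check against the double-sum definition on a small test image.
F = np.arange(12, dtype=np.int64).reshape(3, 4)
I = integral_image(F)
assert I[2, 3] == F.sum()              # bottom-right entry equals the total sum
print(I)
```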
Each thread loads several elements per iteration from its row into memory and performs an exclusive scan with carry propagation. The carry value is initialized to zero at the start, but after the first iteration it is updated with the sum of all the elements scanned so far. Finally, the threads move the result into global memory. The scan is performed in 2 phases:
Experiments were conducted to validate the accelerated real-time face detection using a cascade classifier on the hybrid [CPU-GPU] HPC infrastructure; the experiments were conducted on 2 different clusters:
Node type 1 (2 nodes, Dell Precision T3610): Intel Xeon E5-1600 CPU, 8 cores/socket, 3.70 GHz, 128 GB RAM, PCIe 3.0×, Fedora 24, 100 Mbps Ethernet, MPICH2-1.2, GCC version 4.4.7; two NVIDIA Quadro K2000 GPUs, 227 cores, 2000 MHz, 2 GB, PCIe 3.0×, nvcc version 5.0.
Node type 2 (3 nodes, Dell Precision R5500): Intel Xeon 5600 CPU, 4 cores/socket, 2.40 GHz, 192 GB RAM, PCIe 3.0×, Fedora 24, 100 Mbps Ethernet, MPICH2-1.2, GCC version 4.4.7; two NVIDIA Quadro 2000 GPUs, 192 cores, 1250 MHz, 1 GB, PCIe 3.0×, nvcc version 5.0.
Node type 3 (1 node, PowerEdge R720 [server]): Intel Xeon E5-2620 CPU, 24 cores/socket, 2 GHz, 768 GB RAM, PCIe 3.0×, Fedora 24, 100 Mbps Ethernet, MPICH2-1.2, GCC version 4.4.7; NVIDIA Tesla M2075 GPU, 448 cores, 1.15 GHz, 6 GB, PCIe 3.0×, nvcc version 5.0.
The experiment conducted comprised two main phases: the training phase and the testing phase. In the training phase, we fed the computer several human and non-human face images. In the testing phase, we executed our code on the CPU-based HPC infrastructure and on the hybrid [CPU-GPU]-based infrastructure and formulated our results. We also tried testing our face detection code on animal faces and computed the results; the snapshots are listed in Figs. 9 and 10.
The execution on the hybrid [CPU-GPU]-based infrastructure utilizes the OpenCV + CUDA-based cascade classifiers with roughly 2430 Haar-like classifiers for frontal faces. Tests were conducted on a live video feed of 24 fps from the camera, with some of the detection outcomes displayed in Figs. 9 and 10.
The proposed real-time face detection was tested on the hybrid [CPU-GPU] HPC infrastructure for its accelerated performance (Figs. 11, 12 and 13).
A comparative study between the CPU-based HPC infrastructure and the proposed hybrid [CPU-GPU] HPC infrastructure acceleration in fps is listed in Table 1. The greatest FPS achieved by our proposed implementation is 37.9871 frames per second on the hybrid [OpenCV + CUDA] HPC infrastructure, in comparison with the CPU-based HPC infrastructure's 2.71 frames per second.
Figure 14 shows the FPS performance comparison of the proposed hybrid [CPU-
GPU] infrastructure against CPU-based HPC infrastructure for varying resolutions,
where X-axis shows the different resolutions and Y-axis shows the frames per second.
During the experiments on hybrid [CPU-GPU] infrastructure for lower resolutions
Table 1 Performance comparison between CPU-HPC infrastructure and hybrid HPC infrastructure
Columns: Resolution | Frames per second (FPS): CPU-based HPC infrastructure, hybrid [CPU-GPU] HPC infrastructure, speedup | Average detection time (µs): CPU-based HPC infrastructure, hybrid [CPU-GPU] HPC infrastructure, speedup
640 × 480 1.70 43.78 25.75294118 0.956236 0.024378 0.025494
720 × 480 1.73 42.67 24.66473988 0.995612 0.027878 0.028001
800 × 600 1.75 40.32 23.04 1.061782 0.787878 0.742034
1024 × 768 1.77 39.00 22.03389831 1.218787 0.987999 0.810641
1152 × 864 1.97 38.98 19.78680203 1.549898 1.029922 0.66451
1280 × 768 2.13 38.75 18.19248826 1.679899 1.217668 0.724846
1366 × 768 2.36 38.71 16.40254237 1.787897 1.565466 0.875591
1440 × 900 2.54 38.54 15.17322835 1.876755 1.564543 0.833643
1680 × 1050 2.59 38.21 14.75289575 1.929870 1.678799 0.869903
1920 × 1200 2.67 38.01 14.23595506 1.967576 1.769789 0.899477
2048 × 1536 2.71 37.98 14.01476015 1.887672 1.178923 0.624538
640 × 480, we observed a speedup of 25.75 over the CPU-based HPC infrastructure. Similarly, for the higher resolution of 2048 × 1536, we observed a speedup of 14.014 over the CPU-based HPC infrastructure.
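The speedup figures quoted here can be reproduced directly from the FPS values in Table 1; a trivial sketch (values copied from the table) is:

```python
# FPS values taken from Table 1 for the lowest and highest tested resolutions.
measurements = {
    "640 x 480":   {"cpu_fps": 1.70, "hybrid_fps": 43.78},
    "2048 x 1536": {"cpu_fps": 2.71, "hybrid_fps": 37.98},
}
for resolution, m in measurements.items():
    speedup = m["hybrid_fps"] / m["cpu_fps"]
    print(f"{resolution}: speedup = {speedup:.2f}x")
# Prints roughly 25.75x and 14.01x, matching the speedup column of Table 1.
```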
Figure 15 compares the real-time face detection time on the proposed hybrid [CPU-GPU] infrastructure against the CPU-based HPC infrastructure for varying resolutions, where the X-axis shows the different resolutions and the Y-axis shows the average detection time. During the experiments on the hybrid [CPU-GPU] infrastructure for the lower resolution of 640 × 480, we observed a detection-time speedup of 0.025494 when compared against the CPU-based HPC infrastructure. Similarly, for the higher resolution of 2048 × 1536, we noticed a detection-time speedup of 0.624538.
Fig. 14 Performance comparison between CPU-HPC and hybrid HPC in real time (Y-axis: frames per second [FPS], 0–50; X-axis: resolutions; series: FPS for CPU HPC infrastructure and FPS for hybrid [CPU-GPU] HPC infrastructure)
Fig. 15 FPS performance comparison of the proposed hybrid [CPU-GPU] infrastructure against
CPU-based HPC infrastructure
6 Conclusion
Fig. 16 Comparison of real-time face detection time on the proposed hybrid [CPU-GPU] infrastructure against the CPU-based HPC infrastructure (Y-axis: detection time (µs), 0–2.5; X-axis: resolution; series: CPU-based HPC infrastructure and hybrid [CPU-GPU] infrastructure)
per second and average detection time up to 0.64 times computational speed up
based on varying resolutions from 640 × 480 to 2048 × 1536 when compared
against CPU-based HPC infrastructure.
References
1. Jain V, Patel D (2016) A GPU based implementation of the robust face detection system.
Procedia Comput Sci 87:156–163
2. Jussi M (2013) GPU accelerated face detection. Department of Computer Science and Engineering, University of Oulu, Oulu, Finland, ebook. http://urn.fi/URN:NBN:fi:oulu-201303181103
3. Mohanty A, Suda N, Kim M, Vrudhula S, Seo JS, Cao Y (2016) High-performance face
detection with CPU-FPGA acceleration. IEEE Int Symp Circ Syst (ISCAS) 2016:117–120
4. Daschoudhary RN, Tripathy R (2014) Real-time face detection and tracking using Haar classi-
fier. In: Proceedings of SARC-IRF international conference, 12th Apr 2014, New Delhi, India,
ISBN: 978-93-84209-03-2
5. Kurniawan B, Adji TB, Setiawan NA (2015) Analisis Perbandingan Komputasi GPU dengan
CUDA dan Komputasi CPU untuk Image dan video processing. In: Seminar Nasional Aplikasi
Teknologi Informasi (SNATI), vol 1, no 1
6. Wai AWY, Tahir SM, Chang YC (2015) GPU acceleration of real-time Viola-Jones face detec-
tion. In: 2015 IEEE international conference on control system, computing and engineering
(ICCSCE), pp 183–188
7. Chandrashekhar BN, Sanjay HA, Deepashree KL, Ranjith N (2018) Implementation of image
inpainting using OpenCV and CUDA on CPU-GPU environment. In: International conference
advances in computing, control & telecommunication technologies (ACT-2018). Academic
Press Publishers
8. Kane SN, Mishra A, Gaur A (2014) International conference on recent trends in physics (ICRTP
2014). J Phys Conf Ser 534(1):11001
9. Patterson DA, Hennessy JL (2020) Computer organization and design MIPS edition: the
hardware/software interface. Morgan Kaufmann, Burlington
10. Sharma S, Agrawal R, Srivastava S, Singh D (2017) Review of human detection techniques in
night vision, pp 2216–2220. https://doi.org/10.1109/WiSPNET.2017.8300153
11. Viola P, Jones M (2001) Rapid object detection using a boosted cascade of simple features. In:
Proceedings of the 2001 IEEE computer society conference on computer vision and pattern
recognition, CVPR 2001, vol 1, p I-I
12. Lescano GE, Santana Mansilla P, Costaguta R (2017) Analysis of a GPU implementation of
Viola-Jones’ algorithm for features selection. J Comput Sci Technol 17(1):68–73
13. Jia H, Zhang Y, Wang W, Xu J (2012) Accelerating viola-jones face detection algorithm
on GPUs. In: 2012 IEEE 14th international conference on high-performance computing and
communication & 2012 IEEE 9th international conference on embedded software and systems,
pp 396–403
14. Chandrashekhar BN, Sanjay HA (2019) Performance framework for HPC applications on
homogeneous computing platform. Int J Image Gr Signal Process (IJIGSP) 11(8):28–39. https://
doi.org/10.5815/ijigsp.2019.08.03
15. Marineau R (2018) Parallel implementation of facial detection using graphics processing units
16. Putro MD, Adji TB, Winduratna B (2012) Sistem Deteksi Wajah dengan Menggunakan Metode
Viola-Jones
17. Patel R, Vajani I (2015) Face detection on a parallel platform using CUDA technology. Natl J
Syst Inf Technol 8(1):17
18. Rahmad C, Asmara RA, Putra DRH, Dharma I, Darmono H, Muhiqqin I (2020) Comparison
of Viola-Jones Haar cascade classifier and histogram of oriented gradients (HOG) for face
detection. IOP Conf Ser: Mater Sci Eng 732(1):12038
MedArch—Medical Archive
and Analytical Solution
1 Introduction
Though the whole world is moving toward digitization in every possible field, medicine and health care are no exception. Even though most government hospitals still follow the traditional protocol of pen-and-paper collection of medical data, hospitals in cities, mostly private hospitals, are adopting what is called an electronic health record (EHR) system. In the course of collecting information for this project, we tried two to three open-source EHR tools ourselves. An EHR is very good at doing what we used to do with pen and paper, only somewhat more efficiently and smartly. But since we are dealing with medical science, our target is not just to save paper; our target must be to use this huge amount of data in the most efficient way possible. The biggest problem in today's world of medical science is not the scarcity of resources: science and technology have provided a large amount of digital machinery that supports more efficient medical procedures. The problem is the lack of knowledge and the lack of communication. Scientists do not communicate directly with patients, doctors barely communicate with other doctors, researchers are mostly privately sponsored and do not communicate with the government, and so on. Technology advances day by day, giving us new ways to share, collect and use information, but the focus on business and profit maintains this barrier in communication, which otherwise would have helped us solve mysterious sets of conditions that exist in our world but remain unreported or unidentified. Also, every time a person moves from one location to another, sees a different doctor, or arrives as an emergency case, the hospital or the doctor has to gather all the information from the start, which can be a very tedious task. Also, not everyone in the world is capable
2 Problem Statement
In this project, we are going to focus on the use of technologies in data analytics in
the field of Medical Science. It may seem to be a pile of data of billions of people,
but if we can see the bigger picture, it will have a huge impact on Medical Science
and the future of mankind. We have talked to many friends pursuing medical science
and tried to understand their approach and found out a few major problems.
1. An ill-informed, uneducated, or unconscious patient is not a reliable source of
medical history; hence, it affects a doctor’s approach. For example, a patient was
admitted from the emergency room because of acute respiratory failure. He could
not verbalize his medical history, because all of his focus and efforts were upon
getting enough air. In terms of his medical history, his medical team was flying
blind.
The patient was a 48-year-old truck driver from out of town. No family was at
the bedside, and no medical records were available. What medications was he on?
What allergies did he have? Given all of the unknowns, the patient received the
generic, one size fits all treatment for acute hypoxic respiratory failure. Over the
next several hours, his heart rate gradually increased into the 130s. Most likely, he
was developing sepsis from an acute infection which was the cause of his initial
breathing difficulties. In response, he was given antibiotics and intravenous fluids,
again the generic treatment for what was the most likely cause of his condition.
But it turns out his symptoms were due to something else entirely that would not
have been missed if old medical records had been available. The patient just had a
severe exacerbation of his chronic obstructive pulmonary disease. He did not have
sepsis at all. His increased heart rate was due to beta-blocker withdrawal from
not getting his routine nightly dose of metoprolol. He was eventually discharged
from the hospital in good condition, back at his baseline. His hospitalization,
however, was prolonged by a full day, and he received unnecessary antibiotics
all because nobody knew he was on a beta-blocker. His old medical records were
in Lucknow, locked up safe, and secure at his home. We, however, were in Delhi.
While his medical records were secure in Lucknow, they were not useful.
2. Not only is there bad communication between the patient and the doctors, but also
the medical records maintained by any particular hospital remain confined to itself
resulting in a lack of network between hospitals and hence bad communication
and sharing of resources between them. Before the treatment of any disease, every
patient is a live human experiment of that disease in a particular geography, sub-
population specific genetic makeup, etc., but due to lack of connectivity between
hospitals, we lost all of these important data.
3 Proposed System
We are going to make use of a NoSQL database for storing patients' medical records, since most of the records will be unstructured. The idea is to make the medical record as objective as possible: rather than storing PDFs and PPTs, the data will be broken down and stored as objective fields. This objective data makes each and every medical detail atomic; hence, it will be easy to manage and process. This database will be connected to two different applications to separate the two types of views.
We are going to develop two applications:
1. A doctor’s interface/perspective.
2. A web-application for patients.
This application will be for reading and updating the patient’s medical records in the
database.
• The hospitals, pathological laboratories, and clinics will be provided with the soft-
ware where they first have to register the doctors working there by providing their
license details. After this process, the doctors will be provided login credentials
for authentication purposes.
• Only registered doctors will be authorized to access the registration form. When a patient visits for the very first time, the doctor will ask the patient for his/her government ID and register the patient on the portal. We will match the patient's credentials with the government database.
• Once the patient is registered, the doctor will have the responsibility to fill out the
basic details like the patient’s weight, height, age, blood group, etc. Doctors can
also add the patient's past medical records if they are available. These details
will be added to our centralized database.
• The patient will be assigned a collection in the database which can be accessed
by the doctors under the following rights:
• Read Access: All the valid licensed doctors will have this access; the doctor will
provide his/her login credentials, then he will be authenticated and will be directed
to the portal where he can give the patient id to check his/her medical history.
• Update Access: This is a special access right which will be used whenever a doctor
has to update the patient’s medical records, and for this purpose, the doctors will
require consent from the patient which can be done using the OTP authorization
process. It is more of a handshake rather than an authentication which means that
the doctor authenticates the patient and the patient authenticates the doctor.
These rights help in preserving confidentiality.
Confidentiality: Only the authorized doctor and the patient will have read access rights.
• Each and every read or update transaction’s details such as its timestamp and
the details of the person accessing it will be stored in our database for security
purposes.
• Now since all these records will be stored in a digital database, it will be available
to all the registered hospitals and corresponding registered doctors.
• When the patient makes the next visit to any registered hospital, the doctor
assigned to the patient can access and consequently update the patient’s record
(Fig. 1).
• It is also necessary that the patient can keep track of his/her medical records; for
this purpose, we will create a web application which will be directly connected
to the centralized database.
• The patients can authorize themselves by giving their identification number and
the OTP sent to their registered mobile number and finally can check their medical
record but won’t be able to update it.
It will be the doctor’s responsibility to make the patient aware of the analysis
program and based on their decision, on getting consent, a copy of the relevant data
will be pushed for analytics.
Anonymity: of the patient will be preserved which means that the identity of
the person whose medical record is chosen for the analysis process will never be
disclosed.
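Purely as a hedged illustration of the access rights described above (read access for registered doctors, update access only after the OTP handshake with the patient, and a logged audit trail), a minimal Python sketch might look like the following; all class names, fields and the OTP helper are assumptions, not the implemented system.

```python
import secrets

class AccessDenied(Exception):
    pass

class MedArchStore:
    """Toy in-memory stand-in for the centralized NoSQL patient-record store."""
    def __init__(self):
        self.records = {}      # patient_id -> list of atomic medical entries
        self.doctors = set()   # licensed, registered doctor ids
        self.audit_log = []    # (action, doctor_id, patient_id) for every transaction
        self._otp = {}         # patient_id -> one-time password sent to the patient

    def register_doctor(self, doctor_id):
        self.doctors.add(doctor_id)

    def send_otp(self, patient_id):
        # In the real system the OTP would go to the patient's registered mobile number.
        self._otp[patient_id] = secrets.token_hex(3)
        return self._otp[patient_id]

    def read(self, doctor_id, patient_id):
        if doctor_id not in self.doctors:
            raise AccessDenied("only registered doctors have read access")
        self.audit_log.append(("read", doctor_id, patient_id))
        return self.records.get(patient_id, [])

    def update(self, doctor_id, patient_id, otp, entry):
        if doctor_id not in self.doctors:
            raise AccessDenied("only registered doctors may update records")
        if self._otp.pop(patient_id, None) != otp:
            raise AccessDenied("patient consent (OTP handshake) not confirmed")
        self.records.setdefault(patient_id, []).append(entry)
        self.audit_log.append(("update", doctor_id, patient_id))

# Illustrative flow: register, handshake, update, read.
store = MedArchStore()
store.register_doctor("DOC-001")
otp = store.send_otp("PAT-42")                       # the patient shares this with the doctor
store.update("DOC-001", "PAT-42", otp, {"blood_group": "B+"})
print(store.read("DOC-001", "PAT-42"), len(store.audit_log), "logged transactions")
```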
Cloud
As for the problem of storage and accessibility, according to our research for this project, we think the use of the cloud would be best. The cloud is a tried and tested technology that guarantees 24 × 7 access to data from anywhere using any device. This way, any doctor can access any patient's data at any time, given that the patient has given consent to do so. This can further be ensured through multi-factor authentication and authorization, with the patient and doctor unique IDs being logged separately for every transaction.
4 Implementation
The doctor logs in using his credentials and creates a session. The doctor then gets three options: register a patient, create a case sheet, and view old case sheets. During the initial phase of implementation, the doctor can register a patient in the system, which generates a unique ID for the patient; the patient can later set a password for his account at the first login attempt through OTP verification (Figs. 3, 4, 5 and 6).
This data can only be viewed by doctors and patients through their respective portals. The data is encrypted using the patient's ID as the key; the screenshot in Fig. 7 depicts this view.
Finally, once the data is stored and consent is given by the patient, some specific attributes are extracted and transferred to another server where different analyses are performed. For example, we plotted age versus cataract and observed that cataract mostly occurs in older people, though there are exceptions (the data below is dummy data created from our domain knowledge) (Figs. 8 and 9).
Similarly, we did a demographic analysis finding the cities that are worst hit by
diseases coded as 0–6.
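As an illustration of this kind of analysis, a minimal pandas sketch that tabulates cataract cases by age band and ranks cities by disease code is shown below; the column names and records are purely hypothetical dummy data, in the same spirit as the plots above.

```python
import pandas as pd

# Hypothetical extract pushed to the analytics server after patient consent.
df = pd.DataFrame({
    "age":     [72, 65, 80, 34, 58, 70, 29, 61],
    "city":    ["Delhi", "Lucknow", "Delhi", "Bengaluru", "Lucknow", "Delhi", "Bengaluru", "Delhi"],
    "disease": [0, 2, 0, 5, 0, 1, 3, 0],   # diseases coded 0-6; here 0 is assumed to be cataract
})

# Age versus cataract: count cataract cases per age band.
cataract = df[df["disease"] == 0]
by_age = cataract.groupby(pd.cut(cataract["age"], bins=[0, 40, 60, 100]), observed=False).size()
print(by_age)

# Demographic analysis: which cities are worst hit per disease code.
by_city = df.groupby(["disease", "city"]).size().sort_values(ascending=False)
print(by_city.head())
```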
With these small examples, we have showcased how a system like this, implemented at least in government-run hospitals, can change the picture of the healthcare status of our country, which at the current moment is poor and lags far behind in healthcare infrastructure.
5 Conclusion
Returning to the example we put forth earlier in the problem statement, with such large datasets accumulating in one place, diseases like acute idiopathic pancreatitis will be highlighted, and we might find the unknown reasons for their occurrence. Analyses like these will not only help in increasing the existing knowledge but will also enhance the doctor's approach while dealing with such cases. This system can help in increasing the efficiency as well as the standard of diagnosis in our country.
While treating the above disease using traditional methods, since the strength of the network between hospitals is poor, the information or knowledge gained by the doctors of Lucknow is unknown to the doctors working in Bengaluru and vice versa. Hence, doctors on a large scale are unable to share their experience and knowledge with each other; therefore, every time such a new case is recorded, they have to follow their own algorithm rather than following a hidden algorithm that has a higher success rate and is used by other doctors at other places around the country.
Pneumonia Prediction Using Deep
Learning
1 Introduction
Pneumonia is an inflammatory disorder of the lung that mainly affects the tiny air sacs known as alveoli. Symptoms typically include a mixture of productive or dry cough, chest pain, fever and breathing difficulties, with variable severity. Pneumonia is generally caused by viral or bacterial infection and is less frequently caused by other microorganisms, certain medicines and conditions such as autoimmune diseases. Risk factors include other lung diseases such as cystic fibrosis, chronic obstructive pulmonary disease (COPD) and asthma, diabetes, heart failure, a history of smoking, a poor ability to cough (such as following a stroke) and a weak immune system. The size of the lungs changes as a human being grows. As the disease is observed in all age ranges, it is important to know the structure of the lungs at all ages.
The lungs of a healthy person consist of firm tissue evenly present on the outer layer, and the colour of the lungs is even over the entire surface. A report released in India Today stated that nearly 1.7 million children will die due to pneumonia by 2030 in India.
A convolutional neural network (CNN) in deep learning is a category of deep neural networks most frequently used for visual image analysis. CNNs are regularized variants of multilayer perceptrons. A multilayer perceptron generally refers to a fully connected network, that is, each neuron in one layer is connected to all neurons in the next layer. The full connectivity of these networks makes them susceptible to overfitting. Typical forms of regularization include adding some form of weight magnitude penalty to the loss function (Fig. 1).
2 Related Work
Nogues et al. [1] address three significant but previously understudied factors in using deep convolutional neural networks for computer-aided detection problems. They investigated and assessed various CNN architectures, comprising 5 thousand to 160 million parameters and differing in the number of layers, and evaluated the impact of dataset size and spatial image context on performance. Li et al. [2] propose a technique based on a 1D convolutional neural network to classify ECG signals. In addition to the input layer and the output layer, the proposed CNN model consists of five layers, that is, two convolution layers, two downsampling layers and a fully connected layer, extracting the effective features from the original data and automatically classifying them.
Albawi et al. [3] note that CNN performance on machine learning problems is outstanding. In particular, the results achieved in applications dealing with image data, such as the largest image classification datasets, computer vision and natural language processing, are remarkable. In their paper, they clarify and describe all the CNN-related aspects and significant problems and how they operate, and also specify the parameters that affect the effectiveness of a CNN.
Jmour et al. [4] discuss a training strategy for a traffic sign classification scheme using convolutional neural networks (CNNs). The article also provides the results of a preliminary evaluation of using this CNN to learn features and lists the assignment of RGB-D pictures. To determine a suitable architecture, the transfer learning method known as fine tuning is investigated, reusing layers trained on the ImageNet dataset to provide an alternative for the four-class classification of a fresh data set. Bhandare et al.
the processing robustness of the features. It is usually placed between two convolution stages. The size of the feature maps is determined by the stride of the kernels in the pooling layer. Average pooling and max pooling are the typical pooling operations.
Powell et al. [13] propose a new and robust machine learning model based on a convolutional neural network to classify single cells automatically as either infected or uninfected in thin blood smears on standard microscope slides. The average accuracy of their 16-layer CNN model is 97.37% in a tenfold cross-validation based on 27,578 single-cell pictures; a transfer learning model achieves only 91.99% on the same images. Antin et al. [14] train a model using the dataset set out below to assist doctors in making chest X-ray pneumonia diagnoses. Diagnosing pneumonia from a chest X-ray alone is a challenging job requiring understanding of disease pathology and human anatomy. From inspection of the dataset, it is evident that it poses a difficult problem: the ribs often obscure regions of concern, and other illnesses appear visually comparable to pneumonia in the dataset [15, 16].
Yu et al. [17] propose an algorithm that combines unsupervised features from the saliency map with supervised features from convolutional neural networks, which are fed to an SVM to automatically distinguish high-quality retinal images from poor-quality ones. On a large retinal image dataset, they demonstrate the superior performance of the proposed algorithm, which achieves greater accuracy than other methods. Krismono et al. [18] note that classification of retinal eye images is an interesting computer vision problem with broad medical applications. For example, understanding retinal images is very important for ophthalmologists to evaluate eye diseases such as glaucoma and hypertension; if left untreated, these diseases can result in visual impairment and blindness. Leave-one-out cross-validation is used in these studies.
Liu et al. [19] used the 16-layer and 19-layer Oxford models; the even deeper GoogleNet has confirmed that depth is the most critical factor leading to better D-CNN results. However, nearly all very deep convolutional neural networks were trained on the gigantic ImageNet dataset; in reality, we lack labelled data, and there is only one ImageNet dataset in the world. Sandova et al. [20] introduce a new two-stage approach to classifying images with the aim of improving the accuracy of style classification. The suggested strategy splits the input picture into five patches in the first stage and uses a deep convolutional neural network for individual training and classification of each patch. Pasupa et al. [21] note that the network size is a crucial consideration, and owing to the number of stacked layers, feature concentrations can be enriched. If the VGG 16 design is extended in depth, this can lead to a vanishing-gradient problem that contributes to a greater training error [22, 23]. The ResNet-50 model, on the other hand, is deeper than the VGG 16 model, but it has an identity function that can preserve the gradient, resulting in a more accurate model.
3 Proposed Model
Pneumonia is a lung disease of which 880,000 children under the age of 2 died in the year 2016. A report released in India Today stated that nearly 1.7 million children will die due to pneumonia by 2030 in India. Here, we help doctors by making their work of predicting the disease through X-ray images easier. We collect X-ray images and extract features from them to perform a correct diagnosis. We pass these images into a CNN model and train the model with a huge amount of data so that it can predict. By building an automatic prediction system, we can make predictions easily and reduce time constraints: instead of spending a doctor's time on identification of the disease, we can use this system. Doctors can then check a greater number of patients, and patients do not have to wait as long. The sooner a patient gets treatment, the faster the recovery, and in this way we can help more people to live. The proposed flow diagram is shown in Fig. 2.
This PPDL model is divided into two parts, Level 1 and Level 2, to enable two-step verification. In the first part, we train the model and check its accuracy on the new test dataset, whereas in the second part, we predict the disease when given a new image.
The model consists of the subsequent steps as shown below:
The above model consists of subsequent steps as shown below:
Step 1: Here, we collect the dataset from online resources. The dataset consists of chest X-ray images of both pneumonia patients and healthy people. The dataset is further divided into training and testing sets.
Step 2: The images are converted into grayscale. Then, image resizing is applied to each image, followed by reshaping.
Step 3: We are using CNN architectures to perform classification of images. There
are different architectures like AlexNet, VGGNet, ResNet, GoogleNet and so on.
Step 4: In this step, we use two architectures, VGGNet and AlexNet. VGGNet is made of 16 layers, and AlexNet of 9 layers.
Step 5: Training images—Images are passed through the model, and as they pass, features are extracted. The deeper the model, the more features are extracted. The process learns each image and collects information about it in a progressively deeper fashion. Training is similar for both architectures.
Step 6: Testing images—This step is used to check the accuracy of the model. Once training is completed, we pass the other set of images that were not used for training. If the test error rate decreases and the accuracy increases, the model is considered to be working well; otherwise, we have to work more on the model to obtain a better approximation. Testing is similar for both architectures.
Step 7: Prediction of images—In this step, we pass a new image to check the capacity of the model to distinguish between a normal image and a pneumonia-affected image. We use a confusion matrix on the predicted output of the test images so that we can clearly see how many images are classified correctly and how many are misclassified.
Step 8: At this step, we compare the predicted outputs obtained from both architectures so that
we can decide which architecture works better.
Step 9: In this last step, we perform the analysis by calculating the final accuracy of both
architectures and conclude which architecture shows the best results for our dataset.
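A minimal sketch of Steps 1 and 2, assuming the Pillow and NumPy libraries and placeholder file paths, is shown below; it only illustrates the grayscale conversion, resizing and reshaping described above and is not the authors' code. The networks described later take a 150 × 150 × 3 input, so in practice the grayscale channel may be replicated three times; that detail is not specified here.

```python
import numpy as np
from PIL import Image

IMG_SIZE = 150  # matches the 150 x 150 input size used by the networks below

def load_xray(path):
    """Step 2: grayscale conversion, resizing and reshaping of one X-ray image."""
    img = Image.open(path).convert("L")               # convert to grayscale
    img = img.resize((IMG_SIZE, IMG_SIZE))            # resize
    arr = np.asarray(img, dtype=np.float32) / 255.0   # scale pixel values to [0, 1]
    return arr.reshape(IMG_SIZE, IMG_SIZE, 1)         # reshape for the CNN input

# Example (placeholder paths): x_train = np.stack([load_xray(p) for p in train_paths])
```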
The dataset is retrieved from the Kaggle data repository [24]; it consists of 5840 images, as in
Fig. 3, of which 5216 images are taken for training and 624 images for testing.
1. Normal image—In a normal chest image, the thin outer lining of the chest is visible. The
denser chest tissue does not let the X-rays pass through it, while the healthy, air-filled lung
region appears darker.
2. Pneumonia image—In a pneumonia image, the outer lining of the chest is sometimes not clearly
visible. The affected regions of the chest appear lighter compared to the healthy regions.
In our work, we use the VGG 16 and AlexNet architectures. VGG 16 and AlexNet are convolutional
neural networks trained on more than one million images from the ImageNet database.
The network is 16 layers deep and can classify pictures into 2 classes. It improves on AlexNet by
replacing large kernel-sized filters with consecutive 3 × 3 kernel-sized filters. VGG 16 is trained
with a very large
number of images. The depth of the filters increases as we move deeper. Here, we have used only
3 × 3 filters, as we follow the VGG 16-layer architecture. Figure 4 shows the outer architecture
of VGGNet with 16 layers. The main architecture consists of:
• Convolution layer
• Pooling layer
• Fully connected layer
Convolution layer: Pneumonia images are given as input to this layer; the image size is taken as
150 × 150 × 3, where 150 × 150 is the width and height and 3 is the depth of the image.
Convolutional neural networks operate on volumes: each layer takes a volume of activations and
performs a set of operations to produce a new volume of activations. In our case, the depth is 3,
which means the model has 3 channels, namely the RGB channels. A 3 × 3 filter is used at the
convolution layer. We consider only 3 × 3 filters because we are using the VGG 16-layer
architecture. According to this architecture, using 2 × 2 filters does not give good efficiency,
while larger sizes such as 5 × 5 and 7 × 7 introduce many more parameters. Instead of a single
7 × 7 filter, we can apply 3 × 3 filters in succession: the parameters needed for three 3 × 3
filters are fewer than for one 7 × 7 filter, and the network also becomes deeper, with a ReLU
activation after each layer.
These filters convolve around the image and extend through the full depth of the input volume.
They slide over the entire spatial extent of the input volume and compute a dot product at each
position. After the whole spatial extent of the input volume has been covered, the responses
obtained form the activation maps. The number of activation maps depends on the number of filters
used. In this work, we use sixteen 3 × 3 filters in the first layer. These activation maps are
then given as input to the next layer. This process continues through the following layers, and
as we go deeper, the number of filters increases accordingly, as shown in Fig. 5.
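The parameter argument above can be checked with a small calculation; the channel count below is an arbitrary illustrative value, and biases are ignored.

```python
C = 64                           # illustrative number of input/output channels
three_3x3 = 3 * (3 * 3 * C * C)  # three stacked 3 x 3 convolutions -> 110,592 weights
one_7x7 = 7 * 7 * C * C          # one 7 x 7 convolution            -> 200,704 weights
print(three_3x3, one_7x7)        # the stacked 3 x 3 layers use roughly 45% fewer weights
```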
Pooling layer: This layer makes the ConvNet less sensitive to small shifts in feature position, in
that the output of the pooling layer stays the same even when a feature moves slightly. There are
different approaches to pooling, but the most used is max pooling. To perform max pooling, imagine
a window sliding over the feature map: as the window passes over the map, we keep the greatest
value inside the window and discard the rest. We use the max pooling method with a 2 × 2 window,
as shown in Fig. 5. Dropout is widely used to regularize deep neural networks, although applying
dropout to fully connected layers is essentially different from applying it to convolutional
layers; here the dropout rate is set to 0.4 (Fig. 6).
Fully connected layer: The fully connected layer has at least three parts: an input layer, a
hidden layer, and an output layer. The input layer is the previous layer's output, which is
simply a vector of features with 4096 channels. The neurons in the output layer correspond to
each of the classes that the ConvNet is looking for. As with the connection between the input and
the hidden layer, the output layer takes the values and their corresponding weights from the
hidden layer, applies a function, and produces the result. Here, in the output layer, the 4096
channels are reduced to 2 channels, as we classify into only 2 classes. There are two categories
under consideration, normal and affected images, as shown in Fig. 7. A Softmax activation
function is used at the last layer.
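A minimal Keras sketch of a VGG-style network with the settings described above (150 × 150 × 3 input, stacked 3 × 3 convolutions with growing filter counts, 2 × 2 max pooling, dropout of 0.4, a 4096-unit dense layer and a 2-way Softmax, trained with RMSprop and binary cross-entropy) is given below. The number of blocks and the filter counts are assumptions for illustration; this is not the full 16-layer network used in this work.

```python
from tensorflow.keras import Input, Model, layers

def build_vgg_like(input_shape=(150, 150, 3)):
    inputs = Input(shape=input_shape)
    x = inputs
    # Stacked 3 x 3 convolutions; the filter depth grows as we go deeper
    for filters in (16, 32, 64, 128):                 # illustrative filter counts
        x = layers.Conv2D(filters, (3, 3), padding="same", activation="relu")(x)
        x = layers.Conv2D(filters, (3, 3), padding="same", activation="relu")(x)
        x = layers.MaxPooling2D((2, 2))(x)            # 2 x 2 max pooling
    x = layers.Flatten()(x)
    x = layers.Dense(4096, activation="relu")(x)      # 4096-channel fully connected layer
    x = layers.Dropout(0.4)(x)                        # dropout rate of 0.4
    outputs = layers.Dense(2, activation="softmax")(x)  # 2 classes: normal / pneumonia
    model = Model(inputs, outputs)
    # Loss, activations and optimizer follow the description in the text; binary
    # cross-entropy here assumes one-hot labels over the two classes.
    model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```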
The network is 9 layers deep and can classify pictures into 2 classes. This architecture has no
single fixed kernel size as in VGGNet; it is a simpler architecture. The convolution layers are
not repeated many times, and it has 4 dense layers, which help in classifying the images
accurately. Figure 8 shows the outer architecture of AlexNet with 9 layers. The main architecture
consists of:
• Convolution layer
• Pooling layer
• Fully connected layer
Convolution layer: Pneumonia images are given as input to this layer; the image size is taken as
150 × 150 × 3, where 150 × 150 is the width and height and 3 is the depth of the image.
Convolutional neural networks operate on volumes: each layer takes a volume of activations and
performs a set of operations to produce a new volume of activations. In our case, the depth is 3,
which means the model has 3 channels, namely the RGB channels. Here, we use one 1 × 1 filter, one
11 × 11 filter and three 3 × 3 filters, and the optimizer used is the Adam optimizer.
These filters convolve around the image and extend through the full depth of the input volume.
They slide over the entire spatial extent of the input volume and compute a dot product at each
position. After the whole spatial extent of the input volume has been covered, the responses
obtained form the activation maps. The number of activation maps depends on the number of filters
used. In this work, we use three 3 × 3 filters in the third stage, an 11 × 11 filter in the second
and a 1 × 1 filter in the first layer. These activation maps are then given as input to the next
layer (Fig. 9).
Pooling layer: The pooling layer also contributes to the ConvNet's robustness to where a feature
is located in the image. In particular, the pooling layer makes the ConvNet less sensitive to
small shifts in feature position; it gives the ConvNet the property of translational invariance,
in that the output of the pooling layer remains the same even if a feature moves slightly.
Pooling also reduces the size of the feature map, streamlining computation in subsequent stages.
Here, we use only 2 max pooling layers overall, each with a 2 × 2 kernel; these layers are placed
after the first and third convolution layers, respectively (Fig. 10).
Fully connected layer: The fully connected layer is the layer where the final "decision" is made.
At this layer, the ConvNet returns the probability that a particular kind of object is present in
an image. The convolutional neural structures discussed so far implement what is commonly
referred to as supervised learning. The input layer is the previous layer's output, which is
simply a vector of features with 4096 channels, followed by 1000 channels. The neurons in the
output layer correspond to each of the classes that the ConvNet is looking for. As with the
connection between the input and the hidden layer, the output layer takes the values and their
corresponding weights from the hidden layer, applies a function, and produces the result. Here,
in the output layer, the 4096 channels are reduced to 1000 channels and then to 2 channels, as we
classify into only 2 classes. There are two categories under consideration, normal and affected
images. A Softmax activation function is used at the last layer.
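Similarly, a minimal Keras sketch of the 9-layer AlexNet-style network described above (one 1 × 1 and one 11 × 11 convolution, three 3 × 3 convolutions, 2 × 2 max pooling after the first and third convolutions, four dense layers of 4096, 4096, 1000 and 2 units, dropout of 0.5 and the Adam optimizer) is shown below; the filter counts are illustrative assumptions, not the authors' exact configuration.

```python
from tensorflow.keras import Input, Model, layers

def build_alexnet_like(input_shape=(150, 150, 3)):
    inputs = Input(shape=input_shape)
    # Five convolution layers: 1 x 1, 11 x 11 and three 3 x 3 (illustrative filter
    # counts), with 2 x 2 max pooling after the first and third convolution layers.
    x = layers.Conv2D(32, (1, 1), activation="relu")(inputs)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(64, (11, 11), padding="same", activation="relu")(x)
    x = layers.Conv2D(96, (3, 3), padding="same", activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(96, (3, 3), padding="same", activation="relu")(x)
    x = layers.Conv2D(96, (3, 3), padding="same", activation="relu")(x)
    x = layers.Flatten()(x)
    # Four dense layers: 4096 -> 4096 -> 1000 -> 2, with a dropout of 0.5
    x = layers.Dense(4096, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    x = layers.Dense(4096, activation="relu")(x)
    x = layers.Dense(1000, activation="relu")(x)
    outputs = layers.Dense(2, activation="softmax")(x)
    model = Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```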
There are several differences between AlexNet and VGGNet. To begin with, both architectures were
designed for the ImageNet dataset; AlexNet was introduced in 2012, and VGGNet was introduced in
2014. The VGGNet architecture has 16 layers, and AlexNet has 9 layers. The kernel size is fixed
for VGGNet, i.e., 3 × 3, whereas it varies in AlexNet and can be 1 × 1, 3 × 3, 5 × 5, 7 × 7, or
11 × 11. The loss function and activations remain the same: we take binary cross-entropy as the
loss function and ReLU and Softmax as the activation functions. We use RMSprop as the optimizer
in VGGNet and Adam in AlexNet. When we compare the networks in terms of layers in Table 1, we can
say that VGGNet is a deeper network than AlexNet. As we perform the classification in the fully
connected layer, we use the Softmax activation function there. Padding is required to retain the
same image size up to the last layer; in this work, we use padding only for VGGNet, where we
retain the same image size. The dropout layer is used to eliminate unwanted neurons, such as
neurons that are not active, faulty neurons, or neurons that are effectively empty. The dropout
rate for VGG is 0.4 and for AlexNet is 0.5.
conv layers with 348 filters each. In the third set, we use 3 conv layers with 256 filters each
and one maxpool layer, while AlexNet has one conv layer with 256 filters plus two dense layers
with 4096 neurons each. The fourth set in VGG has 3 conv layers with 512 filters each and one
maxpool layer, whereas AlexNet has its last 2 dense layers: the third dense layer has 1000
neurons, and in the fourth dense layer these 1000 neurons are reduced to 2, into which the images
are finally classified. In the fifth set, VGG has 3 conv layers with 512 filters and one maxpool
layer; the last, sixth set has 2 dense layers with 4096 neurons and a third dense layer with 2
units to classify the images into two classes. In total, the VGGNet architecture is composed of
13 conv layers, 5 maxpool layers, and 3 dense layers, whereas AlexNet has 5 conv layers, 2
maxpool layers, and 4 dense layers (Table 2).
4 Result Analysis
as healthy images. The remaining 92 images are wrongly classified as false negatives, which is a
high-risk outcome and should be taken care of. The other 5 images are also misclassified.
Figure 16 represents the comparison graph plotted between AlexNet and VGGNet, where the
parameters are training accuracy, testing accuracy, training loss, testing loss, time, memory,
recall, and precision. From the graph, we observe that the accuracy obtained by VGGNet is higher
than that of AlexNet. The time taken and memory used are higher for VGGNet, as it is the deeper
network.
5 Conclusion
The work carried out is very useful for society, as a large share of the population faces this
problem; this method can be used to reduce time and complications, and it also helps people take
precautions for their daily health.
Table 5 Comparison between AlexNet and VGGNet using confusion matrix

Confusion matrix    AlexNet    VGGNet
True negative       145        144
False positive      6          3
False negative      89         90
True positive       384        387
Precision           98.46%     99.23%
Recall              81.12%     81.13%
testing by reducing the loss rate to 0.1084. The precision and recall obtained for this model are
98.46% and 81.12%, respectively. After analysing the models, we can say that the VGG 16
architecture works better than AlexNet.
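For reference, the precision and recall values in Table 5 follow the standard definitions, precision = TP/(TP + FP) and recall = TP/(TP + FN); the small helper below (not part of the authors' code) reproduces, for example, the VGGNet entries.

```python
def precision_recall(tp, fp, fn):
    # precision = TP / (TP + FP), recall = TP / (TP + FN)
    return tp / (tp + fp), tp / (tp + fn)

# VGGNet values from Table 5: TP = 387, FP = 3, FN = 90
print(precision_recall(387, 3, 90))   # -> (0.9923..., 0.8113...), i.e. 99.23% and 81.13%
```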
For future work, we can improve the model by increasing the batch size and the number of epochs.
We can also use other CNN models besides VGG 16 and AlexNet to achieve higher accuracy. By
training the model multiple times, we can achieve good accuracy.
Acknowledgements The authors express their sincere gratitude to Prof. N. R. Shetty, Advisor,
and Dr. H. C. Nagaraj, Principal, Nitte Meenakshi Institute of Technology, for giving constant
encouragement and support to carry out research at NMIT.
The authors extend their thanks to the Vision Group on Science and Technology (VGST), Government
of Karnataka, for recognizing our research and providing the financial support to set up the
infrastructure required to carry out the research.
References
1. Isabella Nogues J, Summers RM (2016) Deep convolutional neural networks for computer-
aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans
Med Imag 35(5):1285–1298
2. Li D, Zhang J, Zhang Q, Wei X (2017) Classification of ECG signals based on 1D convolution
neural network. In: 19th international conference on e-health networking, vol 5, pp 105–112
3. Albawi S, Mohammed TA (2018) Understanding of a convolutional neural network. In:
International conference on engineering and technology. ISBN:978-1-5386-1948-3
4. Jmour N, Zayen S, Abdelkrim A (2018) Convolutional neural networks for image classification.
In: International conference on advanced systems and electric technologies. ISBN:978-1-5386-
4449-2/18
1 Introduction
The Internet of Things (IoT) is a network infrastructure that allows multiple Internet-
connected devices to be installed anywhere, from human bodies to the most remote
parts of the globe, with more than 20 billion networked items expected by 2020 [1].
Smart grid systems are one of numerous areas of Industrial Internet of Things
(IIoT) that have the potential to increase energy delivery dependability, flexibility,
and quality [2]. However, as the system grows in size (e.g. as the number of customers grows),
challenges such as reducing latency and enhancing quality of service (QoS) arise [3]. As a
result, there have been attempts to overcome these difficulties using edge computing, such as
using electric vehicle charging stations as edge computing devices to make real-time
decision-making easier and thereby improve the provisioned QoS and eco-friendliness in
latency-sensitive applications [4, 5].
As a non-centralised security mechanism, blockchain technology may be a viable
solution for addressing high creditability and high-security concerns in IoT. The
blockchain provides a secure, distributed, and autonomous framework that allows
IoT devices, nodes, processes, and systems to securely connect with one another and
sign transactions without the need for a third-party to process or verify transactions.
Asymmetric authentication can be used by IoT devices to authenticate each other.
The blockchain is a distributed ledger that maintains transactions that are made up
of blocks of transactions that are cryptographically linked.
By 2023, the global market for blockchain in energy is predicted to grow from
USD 180.3 million in 2017 to over USD 5000 million. Blockchain is being used
by businesses to manage data and track financial transactions and relationships. It
also provides a safe avenue for corporations to manage their data. Due to their great
significance, technologies such as blockchain are gaining appeal among corporations
and other organisations in today’s society. Operational expenses, capital expenditure,
risk management, and security are all areas where blockchain can have a substan-
tial impact. Increased automation, as well as data integrity and security, is projected
to help the global economy grow. In a blockchain, ledgers are distributed across a network of
computers, leaving little room for hackers. The system is fully transparent, with all users able
to see the transactions and modifications made on public blockchains, provided access is granted.
As a result, numerous sectors are experimenting
with blockchain. The energy business can benefit greatly from blockchain, which
provides new, tamper-proof techniques for authentication, authorisation, and data
transmission.
For the past few years, there have been reports of attempts to hack into electrical grids in the
United States; therefore, smart grid security is vital. As a result, blockchain technology could
be the key to meeting the security requirements for better interconnectivity, data interchange,
and permission control. Automation and remote access are key components of smart grids. These, in
turn, bring with them security problems, which we are only now beginning to address.
Because blockchain includes identity security, achieved through public–private key encryption
with key access, anyone attempting to gain access to a system must have their credentials checked
and be authenticated before doing anything on the network. If key access codes are kept safe and
secure, blockchain is the technology that can ensure the safety of electricity grids. Because of
enhanced data collection, intelligent electricity metres help both energy providers and end
customers. However, if these smart metres are not properly secured, hackers might gain access to
important customer information on a wide scale. By functioning as a decentralised transaction
log, blockchain can minimise security gaps and establish the conditions for peer-to-peer trading,
in which local energy trading becomes possible without depending solely on big energy suppliers.
As a result, a new decentralised security system that can meet basic security requirements,
including secrecy, integrity, and authentication, must be proposed. Apart from that, the system
must address the drawbacks of a centralised architecture.
Hence, the focus of this paper is on building a decentralised model for smart grid applications.
Physically unclonable functions are promising hardware security primitives for IoT devices. The
advantages of physically unclonable functions, together with the decentralised and distributed
nature of blockchain technology, are used in our proposed framework to construct a security
framework for IoT devices in smart grid infrastructure.
The outline of the paper is as follows: Sect. 2 reviews related work on blockchain and PUF
technology for building security solutions in the IoT environment. Section 3 describes the system
model and architecture of the framework. Section 4 outlines the implementation of the system,
Sect. 5 discusses the outcomes of the work, and finally, Sect. 6 concludes the work carried out
in the paper.
2 Related Work
The volume of data created by IoT devices is continually increasing as the IoT
business grows and the number of connected devices grows. However, IoT security
and privacy issues have arisen as a result of its rapid growth. A slew of recent
research has focused on blockchain and its usage in IoT security and privacy, as
well as an alternate solution for IoT device identification and authorisation. The
Internet of Things (IoT) is a rapidly evolving technology that consists of disruptive
networked smart gadgets that are connected via the Internet without the need for
human intervention to exchange sensor-based data. IoT devices are low-capacity
devices (nodes) with a variety of problems, including processing, connectivity, and
most importantly, security [6]. As the first gateway to the network, IoT devices require
authentication and authorisation, which is one of the most important security criteria
[6]. To create secure communication, these independent, networked nodes must first
authenticate each other. Mutual authentication is an effective method for ensuring
trust identity and safe communications by authenticating the identity of Internet-
connected communicators prior to future interactions and avoiding the transmission
of critical information over an open channel [7].
A number of security solutions for smart grids and edge computing systems have
been developed in recent years. Tsai and Lo, for example, proposed an anonymous
key distribution mechanism based on identity-based signature and encryption [8]
to construct secure communication sessions. In the paper [9], He et al. provided
a novel key agreement and authentication mechanism which has lower computa-
tion and communication costs than [8]. However, it was later pointed out that the
protocol is subject to ephemeral secret key leaking and does not ensure the privacy of
smart metre credentials, hence, an improved authenticated key agreement protocol
[10] was introduced. In [11], Jia et al. proposed and explicitly verified the secu-
rity of an efficient identity-based anonymous authentication mechanism for mobile
edge computing. The protocol, on the other hand, does not take into account key
management of communication participants.
The potential use of decentralised blockchain technology to address IoT secu-
rity issues was investigated in the article [12]. Using genetic algorithms and particle
swarm optimisation, a self-clustering approach for IoT networks is proposed, which
clusters the network into K-unknown clusters and improves the network lifetime.
To verify the proposed system, the model uses the open source hyperledger fabric
blockchain platform. Wang et al. presented a blockchain-based mutual authenti-
cation and key agreement protocol for smart grid edge computing devices in [13].
Specifically, the protocol can allow efficient conditional anonymity and key manage-
ment without the use of other sophisticated cryptographic primitives by lever-
aging blockchain. A mutual authentication-based key agreement protocol has been
designed in the paper [3]. The developed protocol takes advantage of FHMQV, ECC,
and the one-way hash function to provide a mutual authentication mechanism that
is provably secure. In the above works, blockchain is used to implement completely
decentralised security solutions in IoT systems. Permissioned blockchain, such as
hyperledger fabric, has a lot of potential as an infrastructure for IoT security, credit
management, and other things. In [14], EC-ElGamal-based transaction encryption and enhanced
SHA-384-based block hashing are used to build a lightweight, scalable blockchain for better
acceptance in blockchain-based IoT applications.
Traditional cryptographic security methods are out of reach for many embedded
systems and IoT applications due to a lack of resources. It is necessary to use
lightweight security primitives. PUF is another option for generating low-cost keys.
In restricted IoT applications, PUFs paired with other factors can give a solid authen-
tication system. Gope and Sikdar [15] propose a two-factor authentication strategy for Internet
of Things (IoT) devices that addresses privacy and resource constraints. PUFs are one of the
authentication factors in this method; the second factor is a password or a shared secret key.
An authentication and key exchange system based on PUFs, keyed hashes, and identity-based
encryption (IBE) has been created in [16].
The protocol removes the need for the verifier to store the PUF’s challenge answer
database and the need for a security method to keep it secret. However, the protocol
requires resource optimisation for encrypting frames, and side channel vulnerabili-
ties must be investigated. In [17], PUF-based key-sharing approach is presented, in
which the same shared key can be created physically for all devices, allowing it to be
used in a lightweight key-sharing protocol for IoT devices. In all these works, PUF
emerges as an efficient mechanism to implement security in IoT ecosystem.
In our proposed framework, the benefits of physically unclonable functions and
decentralised and distributed nature of blockchain technology are used to implement
security framework for IoT devices in smart grid infrastructure.
3 System Model
BC network [13]. The ES can also connect to a distant cloud to perform additional
data analysis or long-term data storage.
UE: The user equipment (UE) is usually a smart metre. These devices report power consumption and
associated data to an IES. Each UE connects to the nearest intermediate edge server.
PBC: The permissioned blockchain (PBC) provides a decentralised and distributed database for
storing end-user authentication information. In a resource-constrained IoT environment, a
permissioned blockchain is more efficient. The system is implemented on a permissioned
blockchain, hyperledger fabric.
4 Implementation
and IoT nodes easier. The IES sends various requests to the smart contract in order to perform
transactions in the blockchain network and to authenticate the UE.
Initialisation of a blockchain.
To generate a blockchain, RA prepares a genesis file that includes the necessary
settings. The RA then chooses a few trustworthy partners and launches the blockchain
using a specific consensus process. The RA can join an existing blockchain system
(e.g. hyperledger fabric) directly for simplicity.
Registration of a new device.
UE and ESP initially use a secure medium to execute the registration process for the new device.
If the device is not already registered, the ESP generates authentication credentials for the
device and stores them in the smart contract of the permissioned blockchain. During registration,
the UE generates challenge-response pairs (CRPs) and shares them with the service provider (ESP).
The ESP stores these CRPs of the device in the blockchain network. The steps are summarised as
follows.
PUF-based devices produce unique challenge-response pairs. The IoT devices are assumed to be
PUF-capable in this scenario. Each device creates a challenge-response pair (CRP) that is
exchanged with the server's challenge-response pair. Initially, the CRPs are exchanged between a
device and the server: both select a random challenge and generate responses using their PUFs.
The device produces a challenge-response pair (Cd, Rd) and sends it to the server. In the same
way, the server creates a challenge-response pair and sends it to the device. Finally, the tuple
<Cd, Cs, Rs> is saved by the device in its memory, while the server stores the tuple <Cs, Cd, Rd>
in the smart contract. This data is used for authentication and to set up a communication session
(Fig. 2).
Authentication and Session Establishment
The UE requests authentication with the nearest IES. The IES retrieves information about the
device from the blockchain network and sends a challenge to the requesting device.
The device has saved a tuple <Cd, Cs, Rs> containing a CRP of the server and the challenge from
the CRP it shared with the server during the initial CRP exchange phase. Based on this
information, the device computes the PUF output for the challenge Cd on the fly. After that, the
device computes a key K = h(Rd, Rs, Cs) and selects a random integer Rn1. Following these
calculations, a request message with the parameters K and Rn1 is sent to the server by the
device.
When the IES receives the request message, it computes the PUF output Rs for the challenge Cs,
i.e. Rs = PUF(Cs). Using Rs together with Cs and Rd from the saved tuple <Cs, Cd, Rd>, it computes
the value h(Rd, Rs, Cs) and checks it against K. The device request is approved if it is valid;
otherwise, it is rejected. The server then selects a random number Rn2 and calculates
L = h(Rd, Rs, Cd). Finally, the server responds with a message containing L and Rn2.
Simultaneously, the IES computes SK = (Rs ⊕ Rn2) ⊕ (Rs ⊕ Rn1).
After receiving the server's answer, the device verifies the value L using its stored data. The
server response is accepted by the device if it matches. The session key
SK = (Rs ⊕ Rn2) ⊕ (Rs ⊕ Rn1) is then computed by the device. After a successful session key
exchange between the IES and the UE, the communication is initiated.
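A compact, purely illustrative sketch of the registration and authentication exchange described above is given below. It is not the authors' implementation: the PUF is simulated with a keyed hash so the code can run, h(...) is taken as a hash over the concatenated values, and the blockchain/smart-contract storage is replaced by in-memory tuples.

```python
import hashlib, os

def h(*parts: bytes) -> bytes:
    """Stand-in for h(...): hash of the concatenated inputs."""
    return hashlib.sha256(b"".join(parts)).digest()

def puf(device_secret: bytes, challenge: bytes) -> bytes:
    # A real PUF derives the response from device physics; a keyed hash is used
    # here only so that the sketch is runnable.
    return hashlib.sha256(device_secret + challenge).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# --- Registration: CRP exchange over a secure channel ----------------------
dev_secret, srv_secret = os.urandom(16), os.urandom(16)
Cd, Cs = os.urandom(16), os.urandom(16)            # challenges chosen by each side
Rd, Rs = puf(dev_secret, Cd), puf(srv_secret, Cs)
device_tuple = (Cd, Cs, Rs)                        # device stores <Cd, Cs, Rs>
server_tuple = (Cs, Cd, Rd)                        # smart contract stores <Cs, Cd, Rd>

# --- Authentication and session establishment ------------------------------
Rd_live = puf(dev_secret, Cd)                      # device recomputes its PUF output
K = h(Rd_live, Rs, Cs)
Rn1 = os.urandom(32)
# Device -> IES: (K, Rn1)

Rs_live = puf(srv_secret, Cs)                      # IES recomputes Rs = PUF(Cs)
assert K == h(Rd, Rs_live, Cs)                     # request approved only if K matches
Rn2 = os.urandom(32)
L = h(Rd, Rs_live, Cd)
SK_server = xor(xor(Rs_live, Rn2), xor(Rs_live, Rn1))
# IES -> Device: (L, Rn2)

assert L == h(Rd_live, Rs, Cd)                     # device verifies the server response
SK_device = xor(xor(Rs, Rn2), xor(Rs, Rn1))        # SK = (Rs XOR Rn2) XOR (Rs XOR Rn1)
assert SK_device == SK_server                      # both sides hold the same session key
```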
5 Discussion
The suggested solution efficiently combines the advantages of blockchain and PUF
technology to create a secure, lightweight framework for smart grid applications. Our
approach is more efficient in the resource-constrained IoT ecosystem, according to
the findings. The suggested strategy uses blockchain technology to allow systems to
control their devices and resources without relying on a centralised authority to create
trust relationships with unknown nodes. The computationally expensive elliptic-curve cryptosystem
(ECC) is used in all user-centric authentication schemes. Our proposed technique, on the other
hand, is based on computationally efficient symmetric-key primitives such as PUFs, which are
ideal for resource-constrained IoT devices. Permissioned networks also make good use of
blockchain, such as storing
data in its decentralised form. When we compared permissioned blockchains against
permissionless blockchains, we found that permissioned blockchains outperform
permissionless blockchains. The platform’s restricted number of nodes is the main
reason behind this. This reduces the number of superfluous computations required
to reach network consensus, resulting in improved overall performance. Combining
PUF technology with blockchain in IoT scenario proves to be efficient.
6 Conclusion
In smart grid architecture, the ability to provide private and secure communication
between end users, edge servers, and service providers is critical. A unique anonymous
authentication and authorisation technique with efficient key management was introduced in this
study. Unlike most existing protocols, the proposed protocol not
only provides fundamental security qualities but also accomplishes additional key
security properties. The protocol’s main feature is that it makes use of PUF and
blockchain technology to provide a more secure and simple solution to protect IoT
applications and other types of data. To test the proposed system, the model uses
hyperledger fabric, an open source blockchain technology. A framework for smart
devices is provided by the authentication and authorisation mechanism at lower
layers to communicate locally with intermediate edge servers, while a permissioned
blockchain implementation is explored for the upper layer communications.
References
1. Lyu L, Nandakumar K, Rubinstein BIP, Jin J, Bedo J, Palaniswami M (2018) PPFA: privacy
preserving fog-enabled aggregation in smart grid. IEEE Trans Indus Inform 14(8):3733–3744.
https://doi.org/10.1109/TII.2018.2803782
2. Wang K, Yu J, Yu Y, Qian Y, Zeng D, Guo S, Xiang Y, Wu J (2018) A survey on energy internet:
architecture, approach, and emerging technologies. IEEE Syst J 12(3):2403–2416
3. Garg S, Kaur K, Kaddoum G, Rodrigues JJPC, Guizani M (2020) Secure and lightweight
authentication scheme for smart metering infrastructure in smart grid. IEEE Trans Industr Inf
16(5):3548–3557. https://doi.org/10.1109/TII.2019.2944880
4. Sarkar S, Chatterjee S, Misra S (2018) Assessment of the suitability of fog computing in the
context of internet of things. IEEE Trans Cloud Comput 6(1):46–59. Available https://doi.org/
10.1109/TCC.2015.2485206
5. Kumar N, Zeadally S, Rodrigues JJPC (2016) Vehicular delay-tolerant networks for smart grid
data management using mobile edge computing. IEEE Commun Mag 54(10):60–66. Available
https://doi.org/10.1109/MCOM.2016.7588230
6. Zhang Z, Cho MCY, Wang C, Hsu C, Chen C, Shieh S (2014) IoT security: ongoing challenges
and research opportunities. In: 2014 IEEE 7th international conference on service-oriented
computing and applications, pp 230–234
7. Wu L, Wang J, Choo KR, He D (2019) Secure key agreement and key protection for mobile
device user authentication. IEEE Trans Inform Forens Secur 14(2):319–330. Available https://
doi.org/10.1109/TIFS.2018.2850299
8. Tsai J, Lo N (2016) Secure anonymous key distribution scheme for smart grid. IEEE Trans
Smart Grid 7(2):906–914
9. He D, Wang H, Khan MK, Wang L (2016) Lightweight anonymous key distribution scheme
for smart grid using elliptic curve cryptography. IET Commun 10(14):1795–1802
10. Odelu V, Das AK, Wazid M, Conti M (2018) Provably secure authenticated key agreement
scheme for smart grid. IEEE Trans Smart Grid 9(3):1900–1910
11. Jia X, He D, Kumar N, Choo K-KR (2019) A provably secure and efficient identity-based
anonymous authentication scheme for mobile edge computing. IEEE Syst J 14(1):560–571
12. Rashid MA, Pajooh HH (2019) A security framework for IoT authentication and authorization
based on blockchain technology. In: 18th IEEE international conference on trust, security
and privacy in computing and communications/13th IEEE international conference on big
1 Introduction
High power consumption, low speed, and density limitations beyond 10 nm have hindered the use of
CMOS technology in recent years. To address these issues, a group of experts developed an
alternative to this traditional CMOS technology, known as quantum dot cellular automata (QCA),
which is employed in high-speed applications. CMOS technology also works at the nanoscale, but
QCA uses quantum cells rather than transistors in its circuits, which gives quick results with
less power dissipation compared to CMOS. Quantum cells are the basic building blocks of a circuit
designed in QCA. These cells contain 4 quantum dots, as shown in Fig. 1 [1–4].
A quantum dot is a semiconductor particle that can emit light of a specific wavelength when
supplied with energy. These quantum dots are usually represented as a core–shell structure with
sizes of a few nanometres [5, 6].
The rest of the paper is organized as follows: Sect. 2 covers the basics of quantum dot cellular
automata (QCA), Sect. 3 describes quantum gates, and Sect. 4 presents the implementation and data
flow. Section 5 shows the simulation results in the QCADesigner tool, Sect. 6 presents the result
analysis, and finally, Sect. 7 concludes the paper.
The quantum cell acts as a wire through which the data flows; hence, it is also termed a QCA
wire. A 5-cell QCA wire is represented in Fig. 2. In this QCA wire, data transfer occurs from A
to B. The data flowing through the intermediate cells are
The most important concept in clocking is the flow of data from one clock cycle to another, which
is essential for understanding the working of reversible gates (Fig. 7).
Let us assume the time taken for data to flow from one cell to another under clock 0 is 1 ns. The
time taken for data to transfer from one cell to another across different clocks is 5 ns, because
changing from one clock to another creates a delay in the transfer of data.
There are four clock signals: clock 0, clock 1, clock 2, and clock 3. The delay of data can also
be seen from clock 0 to 1, 1 to 2, 2 to 3, and so on [10–12].
3 Quantum Gate
A quantum gate is a small circuit that operates on qubits, just as classical gates operate on 0's
and 1's. There are different types of quantum gates; one of them is the Feynman gate.
The Feynman gate is a quantum gate that uses the reversibility concept [13, 14]. This gate has
two inputs A, B and two outputs P, Q.
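In the standard definition of the Feynman (CNOT) gate, P = A and Q = A ⊕ B, which is what makes the mapping reversible; the short enumeration below (illustrative only, not part of the QCA design) confirms that the inputs can always be recovered from the outputs.

```python
def feynman(a, b):
    # P = A (pass-through), Q = A XOR B (controlled NOT)
    return a, a ^ b

for a in (0, 1):
    for b in (0, 1):
        p, q = feynman(a, b)
        assert feynman(p, q) == (a, b)   # applying the gate again restores (A, B)
        print(f"A={a} B={b} -> P={p} Q={q}")
```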
Figure 9 shows the design of the Feynman gate, which also resembles another quantum gate called
the CNOT gate. Figure 10 depicts the output waveform of the Feynman gate for the corresponding
input signal, which is similar to the output of an XOR gate [15].
Before moving on to designing the XOR gate, it is necessary to understand an important concept,
the NOT gate, which is depicted in Fig. 10.
The input to cell A is inverted and obtained at the output cell B by placing cell B at a 45°
angle to cell A, as shown in Fig. 10. Among the various designs of the NOT gate, this design is
more efficient and consumes less area.
The actual XOR gate representation [16] is shown in Fig. 11. The data flow is the key to
understanding the working of the XOR gate.
Consider input A in Fig. 12: when the data flows from input A to a normal cell of clock 0, it has
two paths to take, first to a clock 1 cell, represented by a purple cell, and second to a clock 0
cell, represented by a green cell.
Since we discussed time delay in the previous section, we know that the data will flow first to
the "clock 0" cell and then to the "clock 1" cell. The same applies to input B in Fig. 13.
Referring to Fig. 12, the cell "a" receives the data from both inputs A and B. Thus, cell "a"
acts as a device cell; similarly, the cell "b" shown in Fig. 13 also acts as a device cell.
The data from the device cell "a" flow in two directions, as shown in Fig. 14.
The "clock 1" cells have already received data from both inputs A and B, as shown in Fig. 15.
The data from the inputs are received first by the "clock 1" cell, so those data are processed
first, and the data from device cell "a" are processed in the next cycle. A similar process
happens at the other two ends, input "B" and input "−1".
If we look closely, the data from device cell "a" are the inverted inputs of A and B. So we can
say that inputs A and A' are processed alternately. This concept is applied to the rest of the
circuit.
The main part of the XOR circuit is its majority gate. In this majority gate, the input of "−1 or
1" decides whether it operates as an "AND" or an "OR" gate, as shown in Fig. 16. There are many
possible input combinations, but the only valid output is A'B + AB'; the rest are garbage outputs
that can lead to disturbances in the output, as in output Q in Fig. 17.
5 Simulation
The wave touching the maximum value represents the binary "1" state, and the wave touching the
minimum value represents the "0" state, as shown in Fig. 17.
Since the output is similar to that of the XOR gate, the output waveform of the Feynman gate can
easily be matched with the truth table of the XOR gate shown in Fig. 8.
The final processed data in the device cell can be examined by considering the device cell as an
output cell.
From Fig. 18, it can be observed that the device cell "a" stores the inverted data of both
inputs, i.e., if A = 0 and B = 0, then a = 1, which is indicated by a square wave.
Table 1 Result analysis of Feynman and XOR gate in terms of input and output cells, area, and
energy dissipation

Gate          No. of input cells  No. of output cells  No. of device cells  Area       Total no. of cells  Total energy dissipation  Average energy dissipation
Feynman gate  2                   2                    1                    0.02 µm2   15                  1.78E−2                   1.62E−3
XOR gate      2                   1                    1                    0.02 µm2   14                  1.5E−2                    1.36E−3
6 Result
From Table 1, it can be seen that the two gates (Feynman and XOR) have the same area but
different dissipation energies. According to the obtained statistics, the XOR gate is the more
suitable and efficient design, but the Feynman gate gains its advantage from its reversibility.
7 Conclusion
Many complicated circuits rely on quantum gates for their implementation. The XOR gate, like the
other digital gates, is an important gate in QCA. It is the only classical gate that is a replica
of two or more quantum gates. The XOR and Feynman gate circuits are developed using the
QCADesigner tool in this paper, and the exact working of the XOR gate, as well as the data flow
through each cell, is explained. The design is evaluated in terms of input and output cells, as
well as area and energy dissipation. In the future, more attention will be placed on how device
cells perform and how they process data. This will provide more information on how reversible
circuits and a few sophisticated circuits function.
References
1. Lent CS, Douglas Tougaw P, Porod W, Bernstein GH (1993) Quantum cellular automata.
Nanotechnology 4(1):49–57
2. Lent CS, Douglas Tougaw P (1997) A device architecture for computing with quantum dots.
Proc IEEE 85(4):541–557
3. Sheikhfaal S, Angizi S, Sarmadi S, Moaiyeri M, Sayedsalehi S (2015) Designing efficient QCA
logical circuits with power dissipation analysis. Microelectron J 46:462–471. https://doi.org/
10.1016/j.mejo.2015.03.016
4. Singh G, Sarin RK, Raj B (2016) A novel robust exclusive-OR function implementation in QCA
nanotechnology with energy dissipation analysis. J Comput Electron 15:455–465. https://doi.
org/10.1007/s10825-016-0804-7
5. Abdullah-Al-Shafi M, Shifatul M, Bahar AN (2015) A review on reversible logic gates and its
QCA implementation. Int J Comput Appl 128(2):27–34
6. Tripathi D, Wairya S (2021) A cost efficient QCA code converters for nano communication
applications. Int J Comput Digit Syst
7. Lent CS, Douglas Tougaw P (1993) Lines of interacting quantum-dot cells: a binary wire. J
Appl Phys 74(10):6227–6233
8. Lent CS, Douglas Tougaw P, Porod W (1994) Quantum cellular automata: the physics of
computing with arrays of quantum dot molecules. In: Proceedings workshop on physics and
computation, PhysComp’94, pp 5–13. IEEE
9. Majeed AH (2017) A novel design binary to gray converter with QCA nanotechnology. Int J
Adv Eng Res Dev 4(9)
10. Sridharan K, Pudi V (2015) Design of arithmetic circuits in quantum dot cellular automata
nanotechnology. Springer International Publishing, Cham
11. Goswami M, Mondal A, Mahalat MH, Sen B, Sikdar BK (2019) An efficient clocking scheme
for quantum-dot cellular automata. Int J Electron Lett 6:1
12. Liu W et al (2014) A first step toward cost functions for quantum-dot cellular automata designs.
IEEE Trans Nanotechnol 13(3):476–487
13. Bahar A, Waheed S, Habib A (2014) A novel presentation of reversible logic gates in quantum-
dot cellular automata (QCA). In: 2014 international conference on electrical engineering
and information communication technology (ICEEICT), pp 1–6
14. Bahar N, Waheed S, Hossain N, Saduzzaman M (2017) A novel 3-input XOR function imple-
mentation in quantum dot-cellular automata with energy dissipation analysis. Alex Eng J
56:1–9. https://doi.org/10.1016/j.aej.2017.01.022
15. Feynman RP (1985) Quantum mechanical computers. Opt News 11(2):11–20
16. Bahar AN et al (2018) A novel 3-input XOR function implementation in quantum dot-cellular
automata with energy dissipation analysis. Alexandria Eng J 57(2):729–738
17. Balakrishnan L, Godhavari T, Kesavan S (2015) Effective design of logic gates and circuit
using quantum cellular automata (QCA). In: 2015 international conference on advances in
computing, communications and informatics (ICACCI), 10 Aug 2015, pp 457–462. IEEE
18. QCADesigner 2.0. https://qcadesigner.software.informer.com/2.0/
VisionX—A Virtual Assistant for the Visually Impaired Using Deep Learning Models
1 Introduction
Vision is one of the most beautiful gifts to humans and plays an important role; it is crucial to
us and helps us in our daily life. However, according to the World Health Organization (WHO), in
2018 about 1.3 billion people suffered from vision problems globally. Amongst them, approximately
39 million people are blind, and roughly 246 million have mild visual limitations.
The figures presented by [1] and the World Health Organization show that around 1.3 million
people (1.96%) of the earth's total 7.7 billion people are visually impaired; consequently, there
is a greater need to address this challenge, and this is justified by South African records.
Because of visual impairment, people need to depend on others for their daily needs or sometimes
have to compromise because of the condition. However, with the ongoing technological and
industrial revolution and with Artificial Intelligence for automation, we can develop equipment
that can help the blind do their daily tasks without depending on others [2]. Around the globe,
there are 135 million visually impaired people, out of which 45 million are blind; visual
disability has a great impact on one's life, since such people cannot see anything and have to
depend on others for their daily tasks [3]. In today's modern high-tech world, the need for
independent living is important for visually impaired people. They may manage in their familiar
daily environment, but in strange and new environments they cannot get around easily without
manual aid [4].
Can you imagine having to take the help of others even for simple tasks? Some people cannot walk
without taking help from others [5] and have to depend on them for basic tasks as well. This
problem is growing rapidly, so researchers are developing new technologies to assist and help
such people. This
work aims at developing assistive glasses for helping these people, which take input from a
camera on the user's command, process it, and return the result to the user in the form of an
audio response.
2 Literature Survey
Even though there are apps for helping the visually impaired, they have not completely solved the
problem. Researchers therefore came up with the idea of using a camera integrated into glasses to
ease usability, but this alone was not enough either, so we wanted to integrate a voice assistant
that makes it easier for them to do jobs like reading, learning about their environment,
classifying objects, and identifying people. This helps the visually impaired do their daily jobs
without depending on others.
The authors in [6] proposed a system where the user captures images on a smartphone, which are
then processed by an OCR model that extracts text from the image; the extracted text is then
given out as speech using a text-to-speech module.
The system user interface consists of a login page and a registration page and also displays the
text extracted from the image [7]. There are many devices that help the visually impaired
perceive their environment using touch or sound, but text reading systems are still in
development. The authors developed an OCR system for supporting visually impaired people; the
system consists of an image scanning module that captures the image, a feature extraction module
that extracts features through binarization and segmentation (which splits different parts of the
image into segments before the relevant features are extracted), and finally a recognition system
to perceive the textual content within the pictures.
Considering the advancements in recent technologies, the authors developed an intelligent
assistant chatbot for e-book reading [8]. Here, REST APIs powered by the Google Cloud Vision
engine are used for analysing the different images. The images are then classified into various
classes such as landmark detection, logo detection, and explicit content detection. They used a
cloud speech API for speech-to-text.
Feature extraction is the main step in object detection and text detection [9, 10]. There are two
feature extraction methods, namely the scale-invariant feature transform (SIFT) and speeded-up
robust features (SURF), of which the former was designed to match images or objects across
different scenes [11, 12]. The paper aims at developing a system that restores a crucial function
of vision, which is perceiving the surrounding environment. The system performs object detection
using feature extraction based on the scale-invariant feature transform (SIFT). The main idea is
to generate a local gradient patch around each key point. Several levels of Gaussian-filtered
images are subtracted to construct the difference of Gaussian (DoG) images, and for further
calculation the extrema of the DoG are located for every octave.
The blind can receive information about their surroundings with the help of sound; the authors
implemented an algorithm for image recognition by sonar, in which edge detection is the main and
first step of the algorithm and is used for preprocessing [13].
To reduce the noise associated with images, a Gaussian blur is applied, which is equivalent to
convolving the photograph with a Gaussian function. Because the Fourier transform of a Gaussian
is another Gaussian, applying a Gaussian blur has the effect of reducing the image's
high-frequency components.
In order to extract text from a photo, it is important to understand the properties of the image
as well as the textual content in it [14, 15]. Text extraction from synthetic or document images
is less complicated than from scene images, because scene images can be affected by noise and
blur. Text detection techniques can be divided into edge-based, connected-component-based, and
texture-based techniques, where edge-based techniques use different masks such as Sobel, Roberts,
Prewitt, and Canny to detect edges in the photograph. Connected-component-based methods use graph
algorithms, where the text is considered as a combination of connected components.
In another work proposed by the authors, a smart walking stick was designed which uses a
Raspberry Pi 3B+ as the main microcontroller, a Global Positioning System (GPS) module, and
ultrasonic sensors [16, 17]. The GPS is used for navigation and directions to the user's
destination, while the ultrasonic sensors are used for locating obstacles in the environment via
sound waves. The output is given through a headset connected via Bluetooth.
3 Proposed Method
The proposed system helps the visually impaired by detecting the different objects and text
present in the surroundings. It helps the visually impaired read books as well as other text
present in images.
The architectural design of the system is shown in Fig. 1, where an Esp32 camera module is used
for capturing images, and the proposed system is based on a client–server model. In the proposed
method, the Esp32 is made the Bluetooth server and the mobile application is made the client.
Here, the images are taken from the camera module and passed on to the mobile application, where
the processing is done using the object recognition model and pytesseract for OCR, and the result
is given as speech.
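In the actual system the Flutter application produces the spoken output; purely as an illustration of this last step, an offline text-to-speech call in Python could look like the sketch below (pyttsx3 is an assumption for the sketch, not the library used in the mobile application).

```python
import pyttsx3

def speak(text):
    """Read the recognised text or detected object label aloud."""
    engine = pyttsx3.init()   # initialise the local text-to-speech engine
    engine.say(text)          # queue the utterance
    engine.runAndWait()       # block until speech has finished

speak("Detected a person in front of you")   # placeholder message
```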
3.1 Hardware
Esp32
Esp32 is a low-cost integrated system-on-chip (SoC) microcontroller board. It is an upgraded
version of the ESP8266 SoC and is shown in Fig. 2.
It has a dual-core microprocessor with integrated Bluetooth and Wi-Fi. The Esp32 board supports
code uploads to make the hardware work as expected and can be programmed using the Arduino IDE,
PlatformIO IDE, MicroPython, and other such IDEs. Esp32 has secure boot and flash encryption,
making it a bit more secure than its previous versions.
Esp32 Camera Module
The Esp32-CAM is a totally small digicam module which has the Esp32-S chip
which costs approximately $10 besides the OV2640 camera, and numerous GPIOs
to attach peripherals, it additionally functions a microSD card slot is very useful here
since we can store the required photographs concerned with the digicam or to keep
documents to serve to clients.
FTDI
The FTDI USB-to-TTL serial converter module is a universal asynchronous receiver-transmitter
(UART) board used for TTL serial communication. It is a breakout board for the FTDI FT232R chip
with a USB interface; it can use 3.3 or 5 V DC and has Tx/Rx and other breakout points. The block
diagram in Fig. 3 shows the FTDI module.
FTDI USB-to-TTL serial converter modules are used for general serial applications. They are
popularly used for communication to and from microcontroller development boards such as the
ESP-01s and Arduino micros, which do not have USB interfaces.
3.2 Dataset
Microsoft Common Objects in Context (COCO) is the dataset used in the object detection stage. The
first version of the MS COCO dataset was released in 2014. It includes 164 K images split into
training (83 K), validation (41 K) and test (41 K) sets. In 2015, an additional test set of 81 K
images was released, including all of the previous test images and 40 K new pictures. Based on
community feedback, in 2017 the training/validation split was changed from 83 K/41 K to
118 K/5 K. The new split uses the same images and annotations. The 2017 test set is a subset of
41 K images of the 2015 test set. Moreover, the 2017 release contains a new unannotated dataset
of 123 K images.
3.3 Methodology
3.3.1 Input
The input to the system is taken from the Esp32 and transferred over Bluetooth to the mobile
application. Here, we create a server on the Esp32 and a client on the mobile, along with an
interface to access the image files on request. One issue with Flutter and the Esp32 is that
Flutter receives images in the form of bytes, so we need to synchronize and keep delays while
loading the image so that we can store it in the directory structure.
Here, the Esp32 camera module acts as a Bluetooth server responsible for capturing images, which
are then passed to the mobile client, where the various models are run and the output is given
as speech.
Various models in the system are as follows.
MobileSSDNet
The MobileNet structure has two kinds of layers: depth-wise convolutions and point-wise
convolutions, which together form depth-wise separable convolutions, a form of factorized
convolution. This model factorizes a standard convolution into a depth-wise convolution followed
by a 1 × 1 point-wise convolution. In these neural nets, the depth-wise convolution applies a
single filter to every input channel, and the point-wise convolution then applies a 1 × 1
convolution to combine the outputs. The depth-wise separable convolution thus splits the
computation into two layers: a separate layer for filtering and a separate layer for combining.
This factorization has the effect of notably decreasing the computation and model size.
Unlike traditional convolutions, this approach splits the multiplication operation into
sub-operations such as 1 × n and n × 1 instead of n × n, thus reducing a significant amount of
computation (Fig. 4a). The main n × n operation is split into 1 × n and n × 1 operations, which
is cheaper. This change is not significant for a small dataset, but for a large dataset
containing more features and more rows it removes many costly computations. For example, for a
computation involving a 3 × 3 filter, the separable convolution splits it into 3 × 1 and 1 × 3
filters, thus reducing the computation from 9 operations to 6 operations.

Fig. 4 a Depth-wise convolution. b Point-wise convolution
The point-wise convolution in Fig. 4b is a type of convolution which uses a 1 × 1 kernel, a
kernel that iterates over every point. It has a depth equal to the number of channels of the
input. It is used along with a depth-wise convolution to form a depth-wise separable convolution.
Depth-wise separable convolutions can then be used as the layers of the MobileNet, which is in
turn used for object detection; since images have a large number of features, this reduces the
cost of computations.
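The saving can be seen directly by counting the parameters of a regular 3 × 3 convolution versus a depth-wise separable one in Keras; the input shape and filter count below are illustrative and are not taken from the MobileSSDNet model used in this work.

```python
from tensorflow.keras import Input, Model, layers

inputs = Input(shape=(150, 150, 3))
regular = Model(inputs, layers.Conv2D(64, (3, 3), padding="same")(inputs))
separable = Model(inputs, layers.SeparableConv2D(64, (3, 3), padding="same")(inputs))

print(regular.count_params())    # 3*3*3*64 weights + 64 biases            = 1,792
print(separable.count_params())  # 3*3*3 depth-wise + 1*1*3*64 + 64 biases = 283
```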
OCR
Optical character recognition (OCR) is the ability of a computer to detect the different
characters present in images or video scenes. Over the last few decades, researchers have
developed many techniques to perform this task. OCR is already in use in many sectors, such as
banking, for example for the automatic filling of forms. OCR involves several steps, because the
text is scattered across many segments of the image. Segmentation is required since the
characters are widely spread across an image, and it is necessary to group them into meaningful
chunks so that the detected characters form a meaningful text. In recent years, many companies
have provided OCR engines, and Tesseract by Google, accessed through the pytesseract wrapper, is
among the most efficient and accurate OCR engines available.
In other words, optical character recognition, or optical character reading, discovers the various characters present within a photograph: it is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photograph of a document, a scene photograph or subtitle text superimposed on an image.
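A minimal sketch of invoking Tesseract from Python through OpenCV and the pytesseract wrapper is shown below; the file name and the Otsu-threshold pre-processing are illustrative assumptions rather than the exact VisionX pipeline.

```python
# Minimal OCR sketch using OpenCV + pytesseract (image path is illustrative).
import cv2
import pytesseract

image = cv2.imread("captured_frame.jpg")            # frame received from the camera
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)      # grayscale simplifies segmentation
gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

# Tesseract groups the detected characters into words and lines internally.
text = pytesseract.image_to_string(gray)
print(text)
```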
4 Results
The proposed system uses the ESP32-CAM for capturing images and uses deep learning models and Tesseract for object detection and optical character recognition.
The model used in this research is MobileNet SSD, which reaches an accuracy of 97% on the training set and 88% on the test set when the Sigmoid and Tanh activation functions are used, as shown in Fig. 5. A comparison is also made with the Sigmoid and ReLU activation functions, as shown in Fig. 6.
5 Conclusion
The research is mainly focused on helping the visually impaired using deep learning algorithms and the Pytesseract OCR engine. The proposed system consists of an ESP32 camera module and a voice assistant that takes input in the form of images and gives output in the form of speech.
The system helps visually impaired users by identifying different objects in real time and by reading the text present in the captured images.
Analyzing the Performance of Novel Activation Functions on Deep Learning Architectures
1 Introduction
2 Activation Functions
Activation functions serve as the link between the data arriving at a layer and the neuron currently in use, and between intermediate results and the final output layer. Neuron activation is determined by computing the weighted sum of the inputs and then adding a bias to the total [7]; neurons carrying similar information are triggered together, and a set of rules governs when a neuron fires. The main purpose of activation functions is to introduce non-linearity into the system. Forward propagation passes the results of the activation functions to the next layer; if the output differs considerably from the true value, an error is computed and propagated back through the network (backpropagation). Consider the neural network shown in Fig. 1.
For linear activation functions, the above network layers can be written as

Layer 1:
Y(1) = c(1)[ W(1) · M + b ]

where
Y(1) is the output of layer 1,
W(1) denotes the weight matrix from the inputs to the hidden-layer neurons, i.e., w1, w2, w3 and w4,
M denotes the inputs i1 and i2,
b denotes the vectorized bias assigned to the hidden-layer neurons, i.e., b1 and b2, and
c(1) is the vectorized form of any linear function.

Similarly, for Layer 2:
Y(2) = c(2)[ W(2) · Y(1) + b(2) ]

Since the composition of linear maps is again linear, the stacked layers collapse to a single linear transform,

∴ Y(2) = W* · M + b*

for some effective weight matrix W* and bias b*, which is why non-linear activations are needed.
ReLU looks like a linear activation function, yet it allows the network to converge quickly, and because a derivative is available, backpropagation is attainable. Because a model that uses it is faster to train and generally produces better performance, it has become the default activation function for many types of neural networks. AReLU is the generalized version of ReLU used in this study. It has been shown analytically and empirically that this approximation to ReLU, designated AReLU, is continuous and differentiable at the “knee-point,” and that AReLU does not require significant parameter adjustment. Extensive experiments demonstrate that the insights derived from the mathematical theory are supported by performance metrics that outperform ReLU. AReLU is written as:
y = k · x^n
SBAF is the generalized version of the Sigmoid activation function in this study.
Furthermore, Banach space theory and contraction mapping have been used to
demonstrate that SBAF may be viewed as a solution to a first order differential
equation. This is analyzed by comparing system utilization metrics such as runtime,
memory, and CPU consumption. The function has acceptable analytical properties
and does not appear to have any local oscillation problems. Saha-Bora Activation
Function is formulated as:
y = 1 / (1 + k · x^α · (1 − x)^(1−α))
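For reference, the AReLU and SBAF formulas above can be written directly in NumPy; the values of k, n and α are illustrative assumptions (in practice they may be tuned or learned), and AReLU is taken here as k · x^n for positive inputs and 0 otherwise.

```python
# Minimal NumPy sketch of the AReLU and SBAF formulas given above.
import numpy as np

def arelu(x, k=0.5, n=1.1):
    # y = k * x^n for positive inputs, 0 otherwise (ReLU-like behaviour assumed).
    return np.where(x > 0, k * np.power(np.maximum(x, 0.0), n), 0.0)

def sbaf(x, k=0.98, alpha=0.5):
    # y = 1 / (1 + k * x^alpha * (1 - x)^(1 - alpha)), defined for x in (0, 1).
    return 1.0 / (1.0 + k * np.power(x, alpha) * np.power(1.0 - x, 1.0 - alpha))

x = np.linspace(0.01, 0.99, 5)
print(arelu(x), sbaf(x))
```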
The first thing that comes to mind on looking at the Swish function is that it looks quite a bit like ReLU, with one exception: its behaviour around 0 is not the same as ReLU. It is a smooth function: unlike ReLU, it does not abruptly change direction near x = 0 but bends gently, dipping slightly below zero before curving back upwards. As a result, unlike ReLU and the other two activation functions, it does not stay constant or move in only one direction and is therefore non-monotonic. According to the authors, it is this trait that distinguishes Swish from most other activation functions, which share that monotonicity. The Swish activation function is continuous at all points. It is defined as

f(x) = x · σ(βx)

where σ(x) is the sigmoid function and β can either be a constant defined prior to training or a parameter learned during training. The derivative of the Swish activation function is

f'(x) = β · f(x) + σ(βx) · (1 − β · f(x))
Here (for SELU, the scaled exponential linear unit), alpha and scale are constants with alpha ≈ 1.673 and scale ≈ 1.0507. The values of alpha and scale are chosen so that the mean and variance of the values supplied as input are maintained across all layers, provided the inputs are sufficiently large in number.
ELU, which stands for Exponential Linear Unit, is an activation function with behaviour comparable to ReLU but with a few differences. ELU enables robust training of deep networks, resulting in higher classification precision. In comparison with other activation functions, ELU includes a saturation regime for handling the negative section: when the unit is deactivated the activation saturates to a small negative value, which makes learning faster and more robust in the presence of noise. ELU does not suffer from vanishing or exploding gradients, and it is continuous and differentiable at all points.
2.7 Mish

f(x) = x · tanh(softplus(x)) = x · tanh(ln(1 + e^x))

Its derivative can be written as

f'(x) = (e^x · ω) / δ²

where ω = 4(x + 1) + 4e^(2x) + e^(3x) + e^x(4x + 6) and δ = 2e^x + e^(2x) + 2.
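The smooth activations discussed in this section can likewise be expressed in a few lines; β for Swish is kept as a fixed constant in this sketch, although it can also be trained.

```python
# Minimal NumPy sketch of the Swish and Mish formulas discussed above.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x, beta=1.0):
    # f(x) = x * sigmoid(beta * x); beta is fixed here, but it can also be learned.
    return x * sigmoid(beta * x)

def mish(x):
    # f(x) = x * tanh(softplus(x)) = x * tanh(ln(1 + e^x))
    softplus = np.log1p(np.exp(x))
    return x * np.tanh(softplus)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(swish(x), mish(x))
```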
3.1 CNN

CNN is the most successful model in the field of image processing. It has achieved excellent results in image classification, recognition, semantic segmentation and machine translation, and it can learn and extract image features automatically. CNNs have solved, or partially solved, the problems of low performance, scarcity of real images and hand-crafted segmentation that affect traditional machine learning methods. An important advantage of CNN models is that they can extract features without a separate segmentation step while still obtaining satisfactory performance; the features of an object are extracted directly from the original data. Kunihiko Fukushima introduced the Neocognitron in 1980, which inspired CNNs, and their subsequent development has made the technique progressively more productive and automated.
The first step is to feed the dataset into a CNN classifier to complete the classification task, using the different activation functions referenced in the review. Our CNN architecture includes two convolutional layers (each comprising one convolution operation and one max-pooling operation), one fully connected layer and one output layer.
The CNN architecture shown in Fig. 2 includes one input image, two convolution-pooling layers, one fully connected layer and one output layer. The convolutional filters are 5 × 5 and the pooling windows are 2 × 2. The first layer has three convolutional filters, the second layer has five convolutional filters, and the fully connected layer has 50 units. In all classification experiments we set the learning rate to 0.01 and the batch size to 100. The same architecture is run on the different datasets and activation functions; it is deliberately small so as to limit the computational complexity while retaining classification accuracy.
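Under the sizes stated above, the classifier can be sketched as follows; the 28 × 28 × 1 input shape, the ten output classes and the use of Adam at the stated learning rate are assumptions tied to the MNIST-style experiments, and the activation is left as a parameter so that the functions above can be swapped in.

```python
# Sketch of the described CNN: two conv+pool stages, one dense layer, one output layer.
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

def build_cnn(input_shape=(28, 28, 1), num_classes=10, activation="relu"):
    model = models.Sequential([
        layers.Conv2D(3, (5, 5), activation=activation, padding="same",
                      input_shape=input_shape),                # first stage: 3 filters of 5x5
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(5, (5, 5), activation=activation, padding="same"),  # second stage: 5 filters
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(50, activation=activation),               # fully connected layer, 50 units
        layers.Dense(num_classes, activation="softmax"),       # output layer
    ])
    model.compile(optimizer=optimizers.Adam(learning_rate=0.01),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

model = build_cnn()
model.summary()
```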
3.2 DenseNet
Densely Connected Convolutional Networks (DenseNet) have become a popular choice for many datasets. On analysis, plain CNNs show a weakness: because the route that information (and, in the other direction, the gradient) must travel from the initial input layers to the final output layer is long, signals can vanish before reaching the other side of the network. The Residual Network (ResNet) is a convolutional architecture that makes it possible to build networks with up to thousands of convolutional layers which outperform shallower networks. DenseNet is similar to ResNet but achieves higher precision. In other networks the output of the preceding layer is passed to the next layer through a sequence of operations, typically convolution, pooling, batch normalization and an activation function. DenseNet does not merge a layer's feature maps with the inputs of the following layers by addition but rather concatenates them, so the layer equation becomes

x_l = H_l([x_0, x_1, x_2, . . . , x_{l−1}])
A DenseNet is organized as a sequence of DenseBlocks, within which the number of feature maps changes but their spatial size remains constant; the layers between these blocks are called transition layers. Normalizing layer inputs by re-centering and re-scaling stabilizes the network, so the outputs of the preceding layers are normalized and then processed by 1 × 1 convolution and 2 × 2 pooling layers. Because the outputs of all previous layers are concatenated, the number of feature maps grows at every level. If each layer H_l produces m new feature maps each time, the number of feature maps at the kth layer is

m_k = m_0 + m × (k − 1), where m is the growth rate.
Each layer thus adds new feature maps to those produced by the layers before it. Transition blocks perform a 1 × 1 convolution followed by 2 × 2 pooling with a stride of 2, resulting in a 50% reduction in the number and size of the feature maps. As we progress through the network, the number of feature maps grows by 'm' at every layer. Within a dense layer, a 1 × 1 convolution with 128 filters is applied first, followed by a 3 × 3 convolution producing 32 feature maps; this value of 32 is the growth rate.
Finally, each layer in each DenseBlock performs the identical set of operations on its input and the resulting outputs are concatenated, so every layer contributes fresh information to the ones that come after it.
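A minimal sketch of one dense layer and one transition block with the sizes described above (growth rate 32, 1 × 1 bottleneck with 128 filters, 50% compression); the input shape and the number of layers per block are illustrative assumptions.

```python
# Sketch of a DenseNet-style dense layer and transition block (sizes as described above).
import tensorflow as tf
from tensorflow.keras import layers

def dense_layer(x, growth_rate=32, bottleneck_filters=128):
    y = layers.BatchNormalization()(x)
    y = layers.ReLU()(y)
    y = layers.Conv2D(bottleneck_filters, 1, padding="same")(y)   # 1x1 bottleneck convolution
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(growth_rate, 3, padding="same")(y)          # 3x3 conv producing m new maps
    return layers.Concatenate()([x, y])                           # concatenate, do not add

def transition_block(x):
    filters = x.shape[-1] // 2                                    # 50% compression
    y = layers.BatchNormalization()(x)
    y = layers.Conv2D(filters, 1, padding="same")(y)              # 1x1 convolution
    return layers.AveragePooling2D(2, strides=2)(y)               # 2x2 pooling, stride 2

inputs = tf.keras.Input(shape=(32, 32, 3))
x = layers.Conv2D(64, 3, padding="same")(inputs)
for _ in range(4):                                                # one small dense block
    x = dense_layer(x)
x = transition_block(x)
model = tf.keras.Model(inputs, x)
model.summary()
```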
4 Dataset
We have used a combination of computer-vision benchmark datasets in this paper, obtained from the UCI Machine Learning Repository. The experiments were run on an Intel Core i5-8250U processor (1.6–3.4 GHz Turbo Boost) with 8 GB RAM, a 1 TB hard drive and a 2 GB dedicated AMD graphics card. The following datasets were used:
Cifar 10: The CIFAR-10 Dataset, as the name implies, contains images from ten
distinct categories. There are 60,000 images in 10 distinct categories, including
Airplane, Automobile, Bird, Cat, Deer, Dog, Frog, Horse, Ship, and Truck.
The images are all 32 × 32 pixels in size. There are 50,000 train images and
10,000 test images in all.
Cifar 100: CIFAR 100 is identical to CIFAR 10, except it has 100 classes with 600
images in each class. There are 500 training images and 100 testing images in each
class. CIFAR 100 consists of 100 classes divided into 20 super classes. Each picture
has a “fine” label and a “coarse” label. Fine labels define the class to which they
belong, whereas coarse labels identify the superclass to which they belong.
MNIST: MNIST stands for Modified National Institute of Standards and Technology. This collection includes 60,000 small square 28 × 28 pixel grayscale images of handwritten single digits ranging from 0 to 9.
Fashion MNIST: Fashion MNIST is a dataset containing a training set of 60,000 examples and a test set of 10,000 examples, all of which are images of Zalando's articles. Each example is a 28 × 28 grayscale picture with a label from one of ten classes.
5 Methodology
The goal of this research is to evaluate the performance of bespoke activation functions on deep neural architectures such as CNN and DenseNet. Because these activation functions worked remarkably well on vanilla neural networks, it is interesting to examine the model's loss and accuracy when they are used with deeper architectures. The proposed system consists of the following steps (a minimal code sketch of this pipeline is given after the list):
• Step 1: Determine the datasets on which the model must be trained.
• Step 2: Create train-test data partitions.
• Step 3: Perform one hot encoding on train and test datasets.
• Step 4: Create a predictive model with the CNN and DenseNet architectures.
• Step 5: Test the model and tune hyper-parameters.
• Step 6: Use categorical CrossEntropy as loss function and Adam as optimizer.
• Step 7: Loss and accuracy values are plotted.
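Steps 2–7 can be sketched as below for one of the benchmark datasets; the placeholder build_model stands in for the CNN or DenseNet constructors described earlier, and MNIST is used purely as an example.

```python
# Sketch of the train/test split, one-hot encoding, compilation and training (Steps 2-7).
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

def build_model(num_classes=10):
    # Placeholder: substitute the CNN or DenseNet constructors sketched earlier.
    return tf.keras.Sequential([
        layers.Flatten(input_shape=(28, 28, 1)),
        layers.Dense(50, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])

# Steps 1-2: choose the dataset and create the train-test partitions.
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0
x_test = x_test[..., None].astype("float32") / 255.0

# Step 3: one-hot encode the labels.
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# Steps 4 and 6: build the model, use categorical cross-entropy and Adam.
model = build_model()
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Steps 5 and 7: train, tune, and keep the loss/accuracy curves for plotting.
history = model.fit(x_train, y_train, batch_size=100, epochs=10,
                    validation_data=(x_test, y_test))
```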
Model Architecture: The model essentially recreates a network that functions similarly to the neurons in our brain, and it is taught to make predictions from previously collected data. In this study we investigated the performance of various activation functions using deep neural networks such as CNN and DenseNet. Categorical cross-entropy, a robust and popular loss function used in most classification problems, was computed during each run. It suits the task at hand because each example can belong to exactly one of the possible categories, and the model has to determine which category that is.
6 Performance Analysis
All of the following activation functions have been evaluated on various networks
such as CNN and DenseNet, as well as on various datasets such as MNIST, Cifar10,
Fashion MNIST, and so on. We have a variety of activation functions, and it is
difficult to select one that is appropriate for all test situations. Many aspects come
into play, including whether or not it is differentiable, how quickly a neural network
with a particular activation function converges, how smooth it is, whether it fits the
constraints of the universal approximation theorem, and whether or not normalization
is retained.
As described above, we have run the different datasets on the CNN and DenseNet architectures with multiple activation functions. A few plots showing the performance of the ReLU and AReLU activation functions are given in Figs. 3, 4, 5 and 6.
Fig. 3 ReLU on CNN over the MNIST dataset (panels: accuracy, loss), with an accuracy of 0.9734 and a loss of 0.1025
Fig. 4 ReLU on DenseNet over the CIFAR10 dataset (panels: accuracy, loss), with an accuracy of 0.8083 and a loss of 0.9868
Fig. 5 AReLU showed an accuracy of up to 0.9162 and a loss of 0.2386 on the Fashion MNIST dataset
Fig. 6 AReLU tested on DenseNet over the CIFAR10 dataset, with accuracy up to 0.7769 and a loss of 1.24
7 Conclusion
References
1. Ginge G et al (2015) Mining massive databases for computation of scholastic indices: model
and quantify internationality and influence diffusion of peer-reviewed journals. In: Proceedings
of the 4th national conference of institute of scientometrics, SIoT, pp 1–26
2. Anisha RY et al (2017) Early prediction of LBW cases via minimum error rate classifier: a
statistical machine learning approach. In: IEEE international conference on smart computing
(SMARTCOMP), pp 1–6
3. Saha S et al (2016) DSRS: estimation and forecasting of journal influence in the science
and technology domain via a lightweight quantitative approach. Collnet J Sci Inf Manage
10(1):41–70
4. Safonova M et al (2021) Quantifying the classification of exoplanets: in search for the right
habitability metric. Euro Phys J Spec Top 230(10):2207–2220
5. Basak S et al (2020) CEESA meets machine learning: a constant elasticity earth similarity
approach to habitability and classification of exoplanets. Astron Comput 30:100335
6. Ravikiran M et al (2018) TeamDL at SemEval-2018 Task 8: cybersecurity text analysis using
convolutional neural network and conditional random fields. *SEMEVAL
7. Hebbar PA et al (2022) Theory, concepts, and applications of artificial neural networks. In: Applied soft computing. Taylor & Francis, p 24
8. Saha S, Mathur A, Bora K, Basak S, Agrawal S (2018) A new activation function for artificial
neural net based habitability classification. In: 2018 international conference on advances in
computing, communications and informatics (ICACCI), 2018, pp 1781–1786. https://doi.org/
10.1109/ICACCI.2018.8554460
9. Ramachandran P et al (2017) Swish: a self-gated activation function. Neural Evol Comput: n
pag. arXiv
10. Basak S, Mathur A, Theophilus AJ et al (2021) Habitability classification of exoplanets: a
machine learning insight. Eur Phys J Spec Top 230:2221–2251. https://doi.org/10.1140/epjs/
s11734-021-00203-z
11. Mohapatra R et al (2021) AdaSwarm: augmenting gradient-based optimizers in deep learning
with swarm intelligence. In: The IEEE transactions on emerging topics in computational
intelligence. https://doi.org/10.1109/TETCI.2021.3083428
12. Yedida R, Saha S (2021) Beginning with machine learning: a comprehensive primer. Euro Phys
J Spec Top 230:2363–2444. https://doi.org/10.1140/epjs/s11734-021-00209-7
13. Prashanth T et al (2021) LipGene: Lipschitz continuity guided adaptive learning rates for fast convergence on microarray expression data sets. IEEE/ACM Trans Comput Biol Bioinform. https://ieeexplore.ieee.org/document/9531348
14. Saha S et al (2021) DiffAct: a unifying framework for activation functions. In: International
joint conference on neural networks (IJCNN), pp 1–8
15. Mediratta I et al (2021) LipAReLU: AReLU networks aided by Lipchitz acceleration. In: 2021
international joint conference on neural networks (IJCNN), pp 1–8
16. Sarkar J et al (2014) An efficient use of principal component analysis in workload
characterization-a study. AASRI Procedia 8:68–74
17. Yedida R, Saha S (2019) A novel adaptive learning rate scheduler for deep neural networks.
ArXiv, abs/1902.07399
18. Makhija S et al (2019) Separating stars from quasars: machine learning investigation using
photometric data. Astron Comput 29:100313
19. Sridhar S et al (2020) Parsimonious computing: a minority training regime for effective predic-
tion in large microarray expression data sets. In: 2020 international joint conference on neural
networks (IJCNN), pp 1–8
20. Saha S et al (2018) A new activation function for artificial neural net based habitability classification. In: 2018 international conference on advances in computing, communications and informatics (ICACCI), pp 1781–1786
Hybrid Model for Stress Detection of a Person
1 Introduction
Stress is the body's response to any physical, mental or emotional pressure. Stress has negative effects on the human body and is classified into two types: acute stress and chronic stress. Acute stress generally lasts for a short interval of time with high intensity, whereas chronic stress affects a person over a long period. During periods of stress the body releases adrenaline and a hormone called cortisol, also known as the stress hormone. Adrenaline is responsible for the increase in blood pressure and heart rate, while cortisol is responsible for the increase in blood sugar levels. The stress response is usually self-limiting: when the body assesses that the threat has passed, hormone levels drop and blood pressure and heart rate return to baseline. But when stressors are always present the body constantly feels under attack, and long-term activation of the stress response with high cortisol levels can lead to diabetes, as the body is unable to keep producing high insulin levels continuously.
Long-term elevation of blood pressure is harmful and may lead to heart failure. Stress also leads to feelings such as anger, sadness, fear or depression and can push a person into a full mental illness such as BPD, which causes mood swings and unstable behavioural patterns in stressful situations such as travelling in public transport or coping with the pressure of stressful working environments. Oka, from the Department of Psychosomatic Medicine at Kyushu University, Japan, conducted a clinical study [1] in which
they mainly concentrated on how stress affects core body temperature. In their experiments on lab rats, whenever the rats were placed in unfamiliar environments or stressful situations they noticed a rise in body temperature, which they termed psychogenic fever, i.e., stress-related fever. In some highly stressed people the body temperature reached 41 °C. They concluded that psychological stress increases a person's body temperature. A public dataset was collected by Alessio Rossi; in this dataset [2] various physiological responses were gathered from 22 people who volunteered to provide data. The collection period was 24 h and the parameters included heart rate, sleep quality, physical activity and emotional status (anxiety status and stressful events). In this study most of the participants showed higher heart-rate variation during the intervals in which they reported a high workload, compared with non-working conditions.
Consistent stress due to workload has an adverse effect on a person's mental well-being. It also contributes to unstable behaviour and physical breakdown among people of various ages, and high workload-related stress increases fatigue and the chances of cancer. Many people fail to identify stress because it is very challenging to detect, and by the time they decide to seek medical help they may already be badly affected with noticeable ailments. To avoid this, stress needs to be detected at an early stage. Many methods [3, 4] use questionnaire-based surveys to detect the stress of an individual, which require the person to spend considerable time completing them. Some systems use only facial-emotion-based stress detection, which is not reliable [5–7]. These systems are therefore not suitable for a modern-day approach, and sometimes such surveys are not even effective in determining stress [8–11]. There is a need to develop a system that reliably predicts a person's stress by taking into account the parameters that are most likely to be affected in a stressful condition.
The aim is to build a hybrid system that monitors parameters which can be used to predict stress, namely the temperature, heartbeat and facial emotion of a person. The objective of the proposed system is therefore to collect and monitor the temperature and heartbeat of a person with the help of a microcontroller and to detect the person's facial emotion using a pre-trained CNN model; Table 1 lists the emotions used for training.
A total of 24,282 images from 5 emotion classes are used to train the model. Finally, by combining the outputs of the microcontroller and the CNN model, the system decides whether a person is stressed or not stressed, as shown in Fig. 1.
The overall architecture of the system is shown in Fig. 2. The hardware unit consists of a temperature sensor and a pulse sensor for monitoring the temperature and heartbeat of a person; both sensors are connected to a microcontroller board based on the ATmega328P. In the software unit, Python [12–15] is used to train a model capable of predicting stress based on emotion. The trained model runs in the software unit, which captures a frame from the video stream and, using the model together with the sensor data received from the microcontroller, predicts whether the person is stressed or not. OpenCV is used for processing the image captured by the camera.
Using the built-in camera an image is captured, and the captured image is processed with Python. Machine learning algorithms and computer vision are the strategies used by the monitoring framework for its computations and for real-time operation.
Haar [16] feature-based cascade classifiers are used for face detection [17, 18]; Fig. 3 shows the different stages of the face detection system.
The detected face acts as the region of interest for the CNN model, which detects the person's expression and decides whether the person is stressed or not. When the eyes are wide open the person is in fear; when the eyebrows shrink and the eyes are slightly closed the person is angry; when the head position is slightly downwards with closed lips, the person is sad. All these emotions are used to detect stress. Because facial emotion alone is not reliable for deciding whether a person is stressed, parameters such as the person's temperature and heart rate are also taken into account.
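A minimal sketch of the Haar-cascade detection and ROI hand-off described above; the cascade file is OpenCV's bundled frontal-face model, while emotion_model, the label list and the 48 × 48 grayscale input size are assumptions (the input size should match the trained emotion classifier).

```python
# Sketch of Haar-cascade face detection; the detected face becomes the ROI for the CNN.
import cv2
import numpy as np

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def predict_emotions(frame, emotion_model, labels, input_size=(48, 48)):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    emotions = []
    for (x, y, w, h) in faces:
        roi = cv2.resize(gray[y:y + h, x:x + w], input_size)   # face region of interest
        roi = roi.astype("float32") / 255.0
        roi = roi.reshape(1, *input_size, 1)                   # batch of one for the CNN
        probs = emotion_model.predict(roi)
        emotions.append(labels[int(np.argmax(probs))])         # e.g. angry, fear, sad, ...
    return emotions
```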
The temperature sensor used is the TMP36, a contact-type, low-voltage, precision centigrade sensor that produces an output voltage linearly proportional to temperature. Its typical accuracy is ±2 °C with an operating range of 2.7–5.5 V. It exploits the properties of diodes to measure small changes and outputs an analog voltage between 0 and 1.75 V DC. The TMP36 has an offset of 500 mV, corresponding to 0 °C, and every 10 mV change in voltage corresponds to a 1 °C change in temperature:
• Temperature (°C) = (mV − 500)/10
The pulse sensor operates between 3.3 and 5 V and contains an LED and a photodiode for detecting the heartbeat. The sensor has two surfaces: on the first surface the light-emitting diode and the light sensor are mounted, and on the second surface the circuit responsible for noise cancellation and amplification is connected.
Calculation of heart rate from the inter-beat interval (IBI):
The pulse sensor calculates the heart rate from the interval between two successive R–R peaks in the heart signal. The IBI can be expressed in milliseconds (ms) or in seconds (s). Since we need the heart rate per minute, we work in milliseconds per minute (60 × 1000 = 60,000 ms). For example, if the R–R IBI is 650 ms:
• Heart rate = 60,000/IBI
• Heart rate = 60,000/650
• Heart rate ≈ 92 beats/min.
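The two conversions above can be collected into two small helper functions (a minimal sketch of the stated formulas):

```python
# Sketch of the two sensor conversions described above.
def tmp36_to_celsius(millivolts):
    # TMP36: 500 mV offset corresponds to 0 degrees C, 10 mV per degree C.
    return (millivolts - 500) / 10.0

def heart_rate_from_ibi(ibi_ms):
    # Heart rate (beats/min) from the R-R inter-beat interval in milliseconds.
    return 60000.0 / ibi_ms

print(tmp36_to_celsius(750))            # 25.0 degrees C
print(round(heart_rate_from_ibi(650)))  # ~92 beats/min
```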
The microcontroller receives the data from the sensors. The camera captures a frame from the video stream, the face is detected using the Haar classifier, and the detected face acts as the region of interest (ROI). The CNN model predicts the facial emotion, while the label generated by the microcontroller is read in Python over the serial UART protocol. The final result is obtained by comparing the outputs of the microcontroller and the CNN model.
The hardware side and the software side of the system start simultaneously, as shown in Fig. 4. The sensors are attached to the index finger of the person.
On the hardware side, the sensors are connected to the microcontroller for monitoring the temperature and heartbeat of the person. The threshold values are set to 80 beats/min for the heartbeat and 38 °C for the temperature. Whenever both thresholds are exceeded, the system generates a label 'X', which is read by the software part over the UART protocol and stored in a data variable. On the software side, Anaconda and Python with OpenCV are used to capture the frame and resize the image as part of the pre-processing. The image is then given to the pre-trained model, which predicts whether the person is stressed or not stressed based on their emotion. If the label 'X' has been generated and stress is predicted by the CNN model, the output is given as stress detected.
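A minimal sketch of this decision logic using the pyserial library; the serial port name, the baud rate and the exact set of stress emotions are assumptions, not the authors' exact code.

```python
# Sketch of fusing the microcontroller label with the CNN emotion prediction.
import serial  # pyserial

STRESS_EMOTIONS = {"angry", "fear", "sad"}   # emotions treated as stress indicators

def is_stressed(emotion, port="/dev/ttyUSB0", baud=9600):
    # The microcontroller sends 'X' when temperature > 38 C and heart rate > 80 bpm.
    with serial.Serial(port, baud, timeout=1) as link:
        label = link.readline().decode(errors="ignore").strip()
    return label == "X" and emotion in STRESS_EMOTIONS

# Example: the CNN predicted "angry" and the sensors crossed their thresholds.
# print("Stress detected" if is_stressed("angry") else "No stress")
```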
4 Implementation
For the classification of emotion in an image we use a convolutional neural network. We implement the CNN on top of the pre-trained MobileNet [19] architecture using TensorFlow and OpenCV on the Python platform, as shown in Fig. 5.
MobileNet is a convolutional neural network that performs well on mobile devices. It is based on an inverted residual structure in which the residual connections are between the bottleneck layers, and the intermediate expansion layer uses lightweight depthwise convolutions to filter features as a source of non-linearity. As a whole, the MobileNet architecture contains an initial full convolution layer with 32 filters, followed by 19 residual bottleneck layers. The MobileNet layers are shown in Fig. 6.
The presented CNN learning methodology revolves around the Kaggle fer_2013 dataset, which contains images of different emotions separated into training and testing sets. The training set consists of 24,282 images and the testing set of 5937 images across 5 classes: angry, fear, happy, neutral and sad.
Steps in training:
• The dataset, which contains the various emotions, is imported from the system and is broken into two parts, i.e., training and validation (a sketch of the corresponding model setup follows).
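A minimal sketch of the transfer-learning setup described above; MobileNetV2 is used here because the inverted-residual description matches it, and the pooling and dense head sizes are illustrative assumptions.

```python
# Sketch of the MobileNet-based emotion classifier (5 classes) with transfer learning.
import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3),
                                          include_top=False, weights="imagenet")
base.trainable = False                      # keep the pre-trained features frozen

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dense(5, activation="softmax"),  # angry, fear, happy, neutral, sad
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```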
5 Results
From 8:00 am to 8:30 am, person 1 and person 2 were resting, and their body temperature and heartbeat were recorded. From 8:30 am to 11:00 am, person 1 and person 2 were working in a stressful environment, and the corresponding body temperature and heartbeat were recorded, as shown in Tables 2 and 3.
It is observed that the variation of body temperature and heart rate is small when a person is not working in the stressful environment, whereas during work the body temperature and heartbeat are comparatively higher than at rest, as shown in the graphs of Figs. 12 and 13.
In Fig. 14 we can clearly distinguish the period in which the person experiences stress due to the excess workload in the stressful environment. When the temperature exceeds 38 °C and the heartbeat exceeds 80 beats/min, the label 'X' is generated by the microcontroller. When, at the same time, the CNN model predicts one of the predefined stressful emotions, the system displays the output Stress Detected, as shown in Fig. 15.
• When the CNN model predicts the emotion Angry and the label 'X' is generated by the microcontroller, the frame shows that the person is stressed, together with the related emotion, as shown in Fig. 16.
• When the CNN model predicts the emotion Fear and the label 'X' is generated, the frame shows that the person is stressed, together with the related emotion.
• When the CNN model predicts the emotion Sad and the label 'X' is generated, the frame shows that the person is stressed, together with the related emotion.
• When the CNN model predicts the emotion Happy and the label 'X' is generated, the frame shows No-stress with the related emotion.
• When the CNN model predicts the emotion Neutral and the label 'X' is generated, the frame shows No-stress with the related emotion.
• When the CNN model predicts the emotion Angry but the label 'X' is not generated, the frame shows No-stress with the related emotion.
6 Conclusion
The proposed system is able to identify the emotion present in an image based on the training labels, and it correctly combines the results from the microcontroller and the CNN model to display whether a person is stressed or not. The validation accuracy is around 69%; adding more data to the dataset in the future will help the classifier produce more accurate results.
References
1. Oka T, Oka K, Hori T (2001) Mechanism and mediators of psychological stress-induced rise in core temperature. Psychosom Med 63(3)
2. Rossi A et al (2020) A public dataset of 24-h multi-levels psycho-physiological responses in
young healthy adults. Data 5(4):91
3. El-Samahy E et al (2015) A new computer control system for mental stress management
using fuzzy logic. In: 2015 IEEE international conference on evolving and adaptive intelligent
systems (EAIS). IEEE, 2015
1 Introduction
Globally, the coronavirus disease (COVID-19) has spread rapidly, making social
separation an increasingly important preventative measure. This paper focuses on
building a surveillance system that combines deep learning, OpenCV, and computer
vision to ensure pedestrian safety, avoid overcrowding, and maintain a safe distance
between pedestrians.
Large crowds at public locations may exacerbate the existing situation. Recently, countries around the world have been, and in some cases still are, under lockdown, forcing inhabitants to stay at home. Since restrictions eventually encourage people to return to public areas, religious sites and tourist locations, a technique for measuring social separation is useful worldwide in such conditions. By limiting contact between
infected and healthy individuals, or between populations with high transmission rates and populations with low transmission rates, social distancing reduces or disrupts COVID-19 transmission.
The rest of the paper is organized as follows. Section 2 presents the system architecture. Section 3 focuses on the design. Section 4 describes the implementation. Section 5 details the process and approaches. Section 6 lists a few test cases. Section 7 presents the results. Section 8 discusses the impact of the work and the future scope. Lastly, Sect. 9 draws conclusions.
2 System Architecture
The crowd behaviour is examined using three tasks in the proposed FOB (foot-over-bridge) congestion control system: object identification, tracking of the object itself and tracking of the object's movement; here, the object is a human head [1].
The Faster R-CNN architecture achieves the head-detection objective by feeding the input frame to a pre-trained CNN model, such as GoogleNet's Inception architecture [2]. The region proposal network (RPN) identifies the areas of the feature map that might contain objects by producing scores and bounding boxes for the object proposals. Based on the regions recommended by the RPN and the CNN's feature maps, a region-of-interest (RoI) pooling layer is then used to extract per-proposal feature maps. Finally, fully connected layers classify and refine the bounding boxes using the output feature maps. In crowd tracking, if the closeness criterion is not satisfied the item is treated as a new object; otherwise every subsequent frame shows the gradual movement of the same object, i.e., object tracking in crowds. This is illustrated in Fig. 1.
3 Design
The proposed system is built using Python 3, OpenCV and the Caffe framework. Based on a segmented-ROI flowchart, people can be detected for social distance and safety-violation alerts. The object detection framework is the most essential part of this study; its primary goal is to analyse collected footage for human detection and then analyse it for social distancing violations. Image processing techniques and the OpenCV library are used here. The object detection model is run within a deep learning framework [3], and the MobileNet SSD model was chosen because of its low execution time [4]. This is depicted in Fig. 2.
Parallelism through threads is exploited in this implementation: multithreading is used to reduce the time needed to process object detection for each frame, so the frame is displayed while the object detection is processed in parallel [5–7]. To determine the distance between
two bounding boxes, the following strategy is adopted: the centre point of each bounding box is used to measure the distance between a pair of bounding-box positions. The centre point is computed from the image moments of the bounding box, and the distance between the two centre points distinguishes a pair of distinct bounding-box positions [8].
Fig. 2 Detection of social distance and safety violations based on the flow chart segmented by the
region of interest
4 Implementation
4.1 Methodology
Deep learning and computer-vision algorithms are used together with OpenCV and the TensorFlow library. In the proposed framework, the focus is on recognizing individuals in video or image streams and deciding whether or not social distance is preserved [9, 10].
YOLOv3 is used to detect people within the video. The system calculates the gap between every pair of detected persons within the frame and then produces a list showing how many people are at high risk, at low risk and not in danger. To build the social distancing detector, deep learning and computer vision are used and the procedure below is followed (a code sketch of the distance check is given after the list):
• Identify all the people in the video stream using object detection [11], and calculate the pairwise distances between all of the individuals who are identified.
• Check whether any two individuals are closer than N pixels apart. For object detection YOLO is employed, and a bounding box is drawn around the detected objects (people).
• Calculate the distances between the boxes using the centroids of the boxes.
• The Euclidean distance is used to calculate this spacing. A box is coloured red if the person is unsafe, and green if the person is safe [12].
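A minimal sketch of the pairwise distance check; the centroids are assumed to come from the YOLOv3 person detections (one bounding-box centre per person), and the pixel threshold N is an illustrative value.

```python
# Sketch of the pairwise Euclidean distance check between detected people.
import numpy as np
from scipy.spatial.distance import cdist

MIN_DISTANCE = 50  # threshold N in pixels (illustrative value)

def find_violations(centroids):
    pts = np.array(centroids, dtype=float)
    if len(pts) < 2:
        return set()
    dists = cdist(pts, pts)                      # Euclidean distance between every pair
    violators = set()
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            if dists[i, j] < MIN_DISTANCE:       # closer than N pixels -> unsafe
                violators.update((i, j))
    return violators                             # indices to draw with a red bounding box

print(find_violations([(10, 10), (30, 20), (300, 300)]))  # {0, 1}
```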
You only look once (YOLO) is a real-time convolutional neural network that can be used to detect objects in real time. The technique employs a single neural network to process the entire image: it segments the image into regions and estimates bounding boxes and probabilities for each region.
A unique feature of YOLO is its ability to run in real time with excellent accuracy. To produce predictions, the method performs only one forward propagation through the neural network, so it "only looks at the image once"; it then outputs the recognized objects along with bounding boxes after non-maximum suppression.
In the software, a single neural network divides the picture into regions and predicts bounding boxes and a likelihood for each region; the bounding boxes are then weighted by the predicted likelihoods. With YOLO, a single CNN forecasts several bounding boxes and the corresponding class probabilities at the same time, and training on complete images optimizes detection efficiency.
This model has a few distinct advantages over other models when it comes to
detecting objects:
• Because YOLO sees the entire image during training and evaluation, it implicitly encodes contextual information about the classes in addition to their appearance.
• Because YOLO learns generalizable representations when trained on natural images, it outperforms even the most advanced detection algorithms currently available across a wide range of new situations.
The interrelation between deep learning, machine learning and artificial intelligence is depicted in the Venn diagram of Fig. 3. AI's main goal is to create a set of algorithms and approaches that can be used to tackle problems that people handle naturally and almost automatically but that computers struggle with.
A good example of such an AI challenge is interpreting and recognizing the contents of a picture, a job a person completes with little effort but which machines find quite difficult. Pattern recognition and learning from data are the two areas with which the machine-learning subfield is particularly concerned.
Deep learning belongs to the family of ANN algorithms, and in most situations the two terms can be used interchangeably. Deep learning has existed for more than sixty years under numerous names and incarnations, depending on research trends, available technology and data repositories. The emphasis on the basics of deep learning, including what makes a neural network "deep" and the idea of "hierarchical learning," has helped make deep learning one of the most common machine learning and computer vision approaches today.
Pixels are the fundamental elements of any image: every picture is made up of a set of pixels, and the pixel is the smallest unit of granularity. A pixel is commonly thought of as the "colour" or "intensity" of light at a specific position in the picture. If we view an image as a grid, each square of the grid contains a single pixel.
Figure 4 depicts an image with a resolution of 1000 × 750 pixels, which means it is 1000 pixels wide and 750 pixels high. An image can be described as a (multidimensional) matrix; in this case the matrix has 1000 columns (the width) and 750 rows (the height), and the total number of pixels in the picture is 1000 × 750 = 750,000.
There are two common ways to represent pixels: grayscale (single channel) or colour (multiple channels).
In a grayscale image each pixel is a scalar value between 0 and 255, where 0 represents "black" and 255 represents "white"; values closer to 0 are darker and values closer to 255 are brighter. A grayscale gradient with darker pixels on the left and lighter pixels on the right is shown in Fig. 5.
In the RGB colour space a pixel is made up of three values, one each for the red, green and blue components, instead of a single scalar value as in a grayscale picture. In the RGB colour model, determining a colour only requires specifying the amount of red, green and blue contained in a single pixel. Each red, green and blue channel can take values in the range [0, 255], for a total of 256 "shades" per channel, where 0 denotes no contribution and 255 denotes maximum contribution.
Pixel intensities are usually stored as 8-bit unsigned integers, since the range [0, 255] suffices for pixel values. In addition, mean subtraction or scaling is often applied to the image, which entails converting it to a floating-point data type.
An RGB tuple is formed by combining the three red, green and blue values; this tuple represents a colour in the RGB colour space. RGB is an additive colour space: as the amount of each colour is increased, the pixel brightens and approaches white.
In Fig. 6 the RGB colour space is shown on the left. Adding red and green produces yellow, and combining red and blue results in magenta/pink; white is made up of the three primary colours red, green and blue at full intensity.
Consider the colour "white": each of the red, green and blue buckets is completely filled, giving the tuple (255, 255, 255). Since black is the absence of colour, each of the buckets is emptied, giving (0, 0, 0). To make a pure red colour, only the red bucket is filled: (255, 0, 0). A cube is another typical representation of the RGB colour space (right-hand side of the same figure): because an RGB colour is a three-valued tuple with values ranging from 0 to 255, the space can be imagined as a cube of 256 × 256 × 256 = 16,777,216 different colours, depending on how much of each of the three colours a pixel contains.
Regarding the coordinate system for images, a picture is depicted as a grid of pixels. To appreciate this, consider the grid to be a sheet of graph paper: the origin of the image is the upper-left corner, (0, 0), on this graph paper, and the x and y values both increase as we move right and down. This "graph paper" representation is depicted in Fig. 7, where the letter "I" has been drawn on a scrap of graph paper. It can be observed that this is an 8 × 8 grid of 64 pixels. It is worth noting that indexing starts from zero rather than one, because Python is a zero-indexed programming language.
RGB pictures are represented in image processing software such as OpenCV and scikit-image as multidimensional NumPy arrays of shape (height, width, depth). This representation can confuse readers who are new to image processing libraries: since we usually think of a picture in terms of width first and height second, why does the height come before the width? The answer lies in matrix notation. When defining the dimensions of a matrix we write rows × columns: an image's height is given by its number of rows and its width by its number of columns, while the depth (the number of channels) is unaffected. Seeing a NumPy array's shape as (height, width, depth) can seem perplexing at first, but it makes intuitive sense in terms of how a matrix is constructed and annotated.
OpenCV stores the RGB channels in reverse order: although we tend to think in terms of RGB, OpenCV keeps pixel data in BGR order. There is a historical reason for this: BGR ordering was popular among camera manufacturers and software developers at the time, so the early developers of the OpenCV library adopted the BGR colour format. Simply put, the channel order was fixed for historical reasons and we have to respect that decision today; it is a minor point to remember while working with OpenCV, but an important one.
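A short sketch illustrating both points, the (height, width, depth) shape and the BGR channel order; the image path is a placeholder.

```python
# Sketch showing the (height, width, depth) shape and OpenCV's BGR channel order.
import cv2

image = cv2.imread("example.jpg")              # path is illustrative
h, w, d = image.shape                          # rows (height), columns (width), channels (depth)
print(h, w, d)

b, g, r = image[0, 0]                          # OpenCV stores each pixel as (B, G, R)
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)   # convert when an RGB ordering is needed
```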
A CNN's layers each apply a different group of filters, usually tens of thousands in total, and then aggregate the results before passing the output to the next layer; the values of these filters are learned automatically by the CNN during training. Convolving the input image with one of these filters (a kernel) yields an output image, and convolution (i.e., cross-correlation) itself is very simple. In the context of image categorization, a CNN might be able to find edges from raw pixel data in the first layer and then use these edges to recognize shapes (or "blobs") in the next layer.
These shapes can then be used to recognize higher-level elements such as faces, car parts and so on in the network's topmost layers, and the final layer of the CNN uses these higher-level features to make predictions about the image's contents. In deep learning, an (image) convolution is an element-by-element multiplication of two matrices followed by a sum: take two matrices with the same dimensions, multiply them element by element (note that this is not a dot product, but a simple element-wise multiplication), and then sum the resulting values.
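A minimal NumPy sketch of this element-wise multiply-and-sum at a single position; the 3 × 3 region and averaging kernel are illustrative.

```python
# Sketch of convolution as element-wise multiplication followed by a sum (cross-correlation).
import numpy as np

def convolve_at(region, kernel):
    # Both arguments must have the same shape; this is not a dot product.
    return float(np.sum(region * kernel))

region = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]], dtype=float)
kernel = np.ones((3, 3)) / 9.0           # 3x3 averaging kernel
print(convolve_at(region, kernel))       # 5.0, the mean of the region
```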
Layer Types in CNN
Convolutional neural networks are built using a variety of layers, but the ones you are most likely to see are:
• Convolutional (CONV)
• Activation (ACT or RELU, where the name of the actual activation function is used)
• Pooling (POOL)
• Fully connected (FC)
• Batch normalization (BN)
• Dropout (DO)
A CNN is constructed by stacking these layers in a specific order. To illustrate a CNN, we frequently utilize simple text diagrams, for example:
INPUT => CONV => RELU => FC => SOFTMAX [15].
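The same text diagram can be written as a small Keras model; the input size and filter count are illustrative assumptions.

```python
# Sketch of the text diagram INPUT => CONV => RELU => FC => SOFTMAX as a Keras model.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),            # INPUT (size is illustrative)
    layers.Conv2D(32, (3, 3), padding="same"),    # CONV
    layers.ReLU(),                                # RELU
    layers.Flatten(),
    layers.Dense(10),                             # FC
    layers.Softmax(),                             # SOFTMAX
])
model.summary()
```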
Table 3 Classification of person test case

Test case 3
Name of the test: Classification of person
Input: Camera video input forming a frame
Expected output: To classify a person or individual in the frame
Actual output: Individual classified in the frame
Result: Successful
Classification reports such as accuracy, recall and F-measure are used to assess the network's overall effectiveness.
6 Test Cases
The developed system was tested for correct operation and performance; a few of the test cases prepared for this are given in Tables 1, 2, 3 and 4.
7 Results
According to the findings, the distance tracking system had an accuracy in the range of 56.5–68% when tested on outdoor and demanding input videos, while indoor testing in a controlled setting yielded 100% accuracy; both are shown in Fig. 10. The safety-violation alert feature based on the segmented ROI showed superior accuracy, ranging from 95.8 to 100% for all input videos analysed.
The system was implemented using Python 3, OpenCV for the image processing techniques, and the Caffe object detection model framework. Several experiments were conducted to determine the efficacy of the system, and the findings were recorded. The MobileNet SSD Caffe model was employed as the essential algorithm for detecting people. The main surveillance video, a full scene set in a lounge room with the camera mounted high to obtain an overhead view, was used for tuning the programme.
Due to the recent rapid spread of coronavirus disease (COVID-19), social separation has become one of the most important prophylactic methods of avoiding physical contact. The project's main goal is to develop a surveillance system that employs OpenCV, computer vision and deep learning algorithms to track people and minimize overcrowding while keeping a safe space between them. As the existing system performs only up to a certain extent, i.e., individuals not maintaining social distancing can only be identified in the aggregate, it can be further enhanced by incorporating additional features and thereby extending the functionality provided. An application that detects the location coordinates of an individual in real time could be introduced; it would have to be installed by each individual on their mobile phone.
The application could then use GPS to extract the location of that individual within the required frame and, on violation of the set minimum-distance constraints, notify the person directly through the app.
9 Conclusion
Fig. 10 Result footage still from both indoor and outdoor setting
The emphasis should be on accepting and complying with the WHO's protections and regulations, with individuals having ultimate responsibility for themselves rather than the government. Because COVID-19 is spread by close contact with infected people, social distance is unquestionably the most important factor. An efficient solution is required to supervise huge crowds, and this is what our system aims to provide. By using cameras, authorities can keep track of human activities, control large gatherings and stop violations of the rules. People are marked with a green bounding box if they are keeping a safe distance, and with a red bounding box if they are not.
References
1. Punn NS, Agarwal S (2019) Crowd analysis for congestion control early warning system on
foot over bridge. In: 12th International conference on contemporary computing (IC3), IEEE
2. Ren S et al (2015) Faster r-cnn: Towards real-time object detection with region proposal
networks. Adv Neural Inf Process Syst 28:91–99
3. Agarwal A, Gupta S, Singh DK (2016) Review of optical flow technique for moving object
detection. In: 2nd International conference on contemporary computing and informatics (IC3I),
IEEE
4. Punn NS et al (2020) Monitoring COVID-19 social distancing with person detection and
tracking via fine-tuned YOLO v3 and deepsort techniques. arXiv preprint arXiv:2005.01385
5. Charan SS, Saini G (2018) Pedestrian detection system with a clear approach on raspberry Pi 3.
In: International conference on inventive research in computing applications (ICIRCA), IEEE
6. Ahamad AH, Zaini N, Latip MF (2020) Person detection for social distancing and safety violation alert based on segmented ROI. In: 10th IEEE international conference on control system, computing and engineering (ICCSCE), IEEE
7. Tarimo W, Sabra MM, Hendre S (2020) Real-time deep learning-based object detection
framework. In: IEEE symposium series on computational intelligence (SSCI), IEEE
8. Li K, Lu C (2020) A review of object detection techniques. In: 5th international conference on
electromechanical control technology and transportation (ICECTT), IEEE
9. Xu Y et al (2016) Background modeling methods in video analysis: a review and comparative
evaluation. CAAI Trans Intell Technol 1(1):43–60
10. Tsutsui H, Miura J, Shirai Y (2001) Optical flow-based person tracking by multiple cameras.
In: Conference documentation international conference on multisensor fusion and integration
for intelligent systems. MFI 2001, IEEE
11. Dollár P et al (2005) Behavior recognition via sparse spatio-temporal features. In: IEEE interna-
tional workshop on visual surveillance and performance evaluation of tracking and surveillance,
IEEE
12. Niyogi SA, Adelson EH (1994) Analyzing gait with spatiotemporal surfaces. In: Proceedings
of IEEE workshop on motion of non-rigid and articulated objects, IEEE
13. Piccardi M (2004) Background subtraction techniques: a review. In: IEEE international
conference on systems, man and cybernetics, vol 4. IEEE
14. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional
neural networks. Adv Neural Inf Process Syst 25:1097–1105
15. Musaev M, Khujayorov I, Ochilov M (2020) The use of neural networks to improve the
recognition accuracy of explosive and unvoiced phonemes in Uzbek language. In: Information
communication technologies conference (ICTC), IEEE (2020)
16. Brunetti A et al (2018) Computer vision and deep learning techniques for pedestrian detection
and tracking: a survey. Neurocomputing 300:17–33
17. Zhao Z-Q et al (2019) Object detection with deep learning: a review. IEEE Trans Neural Netw
Learn Syst 3212–3232
18. Ahmed Z, Iniyavan R (2019) Enhanced vulnerable pedestrian detection using deep learning.
In: International conference on communication and signal processing (ICCSP), IEEE (2019)
Autism Spectrum Disorder Prediction
Using Machine Learning
1 Introduction
neurons. Connections are made between neurons, and some brain parts are especially affected:
the cerebral cortex, basal ganglia, and amygdala (Fig. 2).
People are now taking better care of themselves; general health check-ups should be done every
six months or once a year. Mental health also needs care, as people are very busy these days:
life is hard, humans are subject to stress and pressure, and children might grow up alone, which
affects their growth [4]. We propose a model that allows people to take advantage of this: an
assessment based on their inputs that decides whether they are suffering from the disorder or
not. The assessment can be taken remotely at any time, which helps to diagnose the disorder and
offer therapy for it if necessary, and we can thus improve our quality of life. In the model we
designed, machine learning is used: the machine is trained and automatically learns to predict
for new inputs. Machine learning is developing rapidly; it allows us to know more about the world
and makes it easier for us to make informed decisions, and people increasingly use such technology.
It is easy to find a doctor these days, and all diagnoses are available; remote assessment lets
us offer the user a prediction of the disorder.
2 Related Works
Huang and colleagues discussed the identification of autism spectrum disorder [5]. As autism
spectrum disorder (ASD) is on the rise, it is crucial to recognize ASD in patients for early
intervention and effective treatment, particularly in childhood. Yaneva and colleagues worked to
detect high-functioning autism among adults [6], studying whether there are visual processing
differences between adults with and without high-functioning autism; eye tracking can be used for
diagnosing autism by capturing such differences. Yuan et al. analyzed the automatic identification
of high-risk autism spectrum disorder [1, 7, 8]; the symptoms of autism spectrum disorders (ASDs)
can be improved by early intervention, which is a great way to help the situation and motivates
the identification of ASD. Mostafa and colleagues presented the eigenvalues of brain networks to
diagnose [9] autism spectrum disorder, a neurodysfunction in which repetitive behaviours and
social instability in patients are two examples of symptoms. Reem Haweel and colleagues worked on
grading autism severity using a response-to-speech study; ASD is a neurodevelopmental disorder
associated with impairments of social and lingual skills.
In the ASD population, failure in language development can be variable and follow a wide range
of factors across the spectrum. Koirala et al. [10] conducted a virtual reality-based touch and
visual sensory processing assessment for adolescents with autism spectrum disorder; individuals
with autism spectrum disorder (ASD) experience sensory abnormalities, and this developmental
disorder affects children around the world. Del Coco et al. [11] studied the mechanisms of
stimulation of social interaction in autism spectrum disorder; it has been proven that information
and communication technologies can have a tremendous impact on children's social, communicative,
and language development. Munoz and co-workers presented software that supports the [12]
improvement of the theory of mind in children with autism spectrum disorder. Sangwan Lee et al.
explored the strategic and structural bases of autism spectrum disorders with deep learning;
clinical research is conducted using deep learning models [3] to aid in diagnosis, since autism
spectrum disorders (ASDs) are difficult to diagnose because of their complex psychiatric symptoms
and a general insufficiency of neurobiological evidence.
Akter et al. worked on machine learning-based models for early-stage detection of autism
spectrum disorders [4]. Muhammad Awais Bin Altaf and others analyzed an on-chip processor [5]
for chronic neurological disorders; CNDs are lifelong illnesses that cannot be eliminated, but
they can be treated, and preventive measures taken early can reduce the severity of these
problems. Wang et al. worked on identification of autism based upon SVM-RFE [7] in order to
improve the accuracy of classification of patients with autism based on the complete Autism
Brain Imaging Data Exchange dataset. They first applied the resting-state functional magnetic
resonance imaging data and calculated the functional connectivity, and they adopted support
vector machine recursive feature elimination. They also trained a stacked sparse auto-encoder
with two hidden layers for extracting the high-level latent and complex features from the 1000
selected features. Mingxia Liu et al. worked on the identification of autism spectrum disorder
using fMRI. They propose a multi-site [13] adaptation framework via low-rank representation
decomposition (MALRR) for functional MRI (fMRI) to identify ASDs. The goal is to identify a
common low-rank representation of data from multiple sites in order to reduce the differences
in data distributions.
We wanted to create a model that could predict the disorder for new inputs. For this, the
machine learning model must be trained, which requires a dataset. Next, we must label the data
in order for machine learning to take place, and then apply an algorithm so that the machine can
make the right decision and predict the output. All details are provided in Fig. 3.
A comma-separated values (CSV) file contains delimited text and uses a comma to separate values.
Each line in the file is a data record, and each record is composed of one or more fields
separated by commas [2]. The use of the comma to separate fields is the source of the name of
this file format. CSV files typically store tabular data in plain text, with each line having
the same number of fields.
Here, we need a dataset of individuals and their related symptoms, so we gathered one by
conducting a survey. We are looking at 1100 patients and 21 attributes, including whether any
family member has the disorder, whether the individual had jaundice, and where they live, since
the disorder has been related to genetic factors.
Data labelling is an essential part of data pre-processing in ML, especially for supervised
learning, where input and output data are labelled in order to create a learning base for future
data processing. Data labelling is also used when constructing ML algorithms to create an
autonomous model [4]. Labelling data is intended to aid the machine to learn, because not all
data is understandable by the machine; labelling the items allows the machine to make the correct
decision about a new set of data. A CSV file can be used to move data between programmes that do
not normally exchange data, making data exchange possible.
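As an illustration of how such a CSV dataset could be loaded and label-encoded in Python, a minimal sketch using pandas and scikit-learn is given below; the file name and the automatic selection of non-numerical columns are assumptions, not the exact procedure of this work.

```python
# Minimal sketch: load a survey CSV and label-encode its non-numerical columns.
# "autism_survey.csv" and the column contents are hypothetical placeholders.
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv("autism_survey.csv")           # each line of the file is one patient record

encoders = {}
for col in df.select_dtypes(include="object"):  # non-numerical columns only
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])  # e.g. "yes"/"no" -> 1/0

print(df.head())
```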
3.3 Classifier
Classification is the process by which data points are predicted to belong to a particular
class; classes are sometimes referred to as targets, labels, or categories. Predictive modelling
for classification is the task of approximating the mapping function f from input variables X to
discrete output variables y. Classification belongs to the category of supervised learning, where
the targets are provided along with the input data. There are [14] many applications of
classification in different domains, such as credit approval, medical diagnosis, and target
marketing. Although there are many classification algorithms, it is not possible to say in
general which one is superior; it all depends on the application and the nature of the dataset
[1]: if the classes can be separated linearly, for example, linear classifiers such as logistic
regression and Fisher's linear discriminant may outperform high-end models, and vice versa.
Prediction is the output of an algorithm that has been trained using a historical dataset and is
used to forecast the likelihood of a particular result. Machine learning algorithms look for
rules that allow them to determine the general characteristics of the elements in a group, with
the goal of applying what is learned to other elements.
We have labelled the data so that it is easy for the machine to understand. We now need to apply
a classifier, and there are many options available. In our project there are many symptoms,
called attributes, and we need to verify whether the attributes are present. Because it works on
the probability function, the Naive Bayes classifier is used. This is illustrated with 8
instances (patients) and 4 attributes (Table 1; Fig. 4).
Calculating the probability function for attributes A1 to A4:

A1    Yes    No
0     1/3    2/5
1     2/3    2/5

A2    Yes    No
0     1/3    2/5
1     2/3    3/5

A3    Yes    No
0     0      3/5
1     1      2/5

A4    Yes    No
0     0      3/5
1     2/3    2/5
New instance 1
A1 A2 A3 A4
0  1  0  1

• P(NEW|YES) = 0
• P(NEW|NO) = 0.036
• P(NEW|NO) > P(NEW|YES)

For new instance 1, the probability for "no" is greater than the probability for "yes"; hence,
we can say the user has no disorder.

New instance 2
A1 A2 A3 A4
1  1  1  1

• P(NEW|YES) = 0.111
• P(NEW|NO) = 0.025
• P(NEW|YES) > P(NEW|NO)

Similarly, for new instance 2 the probability for "yes" is greater, so the user is found to be
suffering from the disorder. This is how the prediction works.
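The worked example above can be reproduced with the short Python sketch below, which multiplies the conditional probabilities from the tables by the class priors; the priors P(Yes) = 3/8 and P(No) = 5/8 are an assumption (3 "yes" and 5 "no" patients among the 8 instances) that matches the reported values.

```python
# Minimal sketch of the Naive Bayes calculation behind the worked example above.
# Conditional probabilities come from the tables; the class priors 3/8 and 5/8
# are an assumption that reproduces the reported scores.
from fractions import Fraction as F

priors = {"Yes": F(3, 8), "No": F(5, 8)}           # assumed class priors
cond = {                                            # P(attribute value | class)
    "A1": {("0", "Yes"): F(1, 3), ("0", "No"): F(2, 5), ("1", "Yes"): F(2, 3), ("1", "No"): F(2, 5)},
    "A2": {("0", "Yes"): F(1, 3), ("0", "No"): F(2, 5), ("1", "Yes"): F(2, 3), ("1", "No"): F(3, 5)},
    "A3": {("0", "Yes"): F(0),    ("0", "No"): F(3, 5), ("1", "Yes"): F(1),    ("1", "No"): F(2, 5)},
    "A4": {("0", "Yes"): F(0),    ("0", "No"): F(3, 5), ("1", "Yes"): F(2, 3), ("1", "No"): F(2, 5)},
}

def score(instance):
    """Return P(class) * product of P(value | class) for each class."""
    out = {}
    for cls, prior in priors.items():
        p = prior
        for attr, value in instance.items():
            p *= cond[attr][(value, cls)]
        out[cls] = float(p)
    return out

print(score({"A1": "0", "A2": "1", "A3": "0", "A4": "1"}))  # {'Yes': 0.0, 'No': 0.036}
print(score({"A1": "1", "A2": "1", "A3": "1", "A4": "1"}))  # {'Yes': 0.111, 'No': ~0.024}
```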
The attributes we are considering from each individual are shown in Table 3. We use Tables 2
and 3 to help us formulate the assessment questions. How this works behind the scenes is
explained in detail in the implementation, where we also consider how the server and the user
will be interfaced. These attributes are important for the following reasons:
3. To find out which part of the world is most affected, the ethnicity and country are needed.
4. To learn whether jaundice in children born with it has any effect on autism spectrum
disorder.
5. To determine if a family member has this disorder, genetic (hereditary) factors are
considered.
6. If there is any significant change in the daily routine, a screening test will be performed.
4 Implementation
The implementation stage describes how we programme the backend. Here, we use the PyCharm IDE.
To work on this, firstly we need to create a new environment for the new project. Once the
environment is created, we install all the packages and libraries required for our project. If
everything is set, we can see a message in the terminal that the environment is created.
Once the environment is set, we import all the libraries and the dataset.
Once the dataset is imported, we need to label the data, and this is done using the
fit_transform() function. Labelling helps create the target labels and is also used to transform
non-numerical data. fit_transform() is applied to the training data so that we can scale the
training data and also learn the scaling parameters of that data; the parameters learned by the
model are then used to scale our test data.
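The fit-on-training-data, transform-on-test-data pattern described here can be sketched as follows; the synthetic feature matrix and the use of StandardScaler are illustrative assumptions rather than the exact pre-processing of this work.

```python
# Sketch of fitting on the training data and reusing the learned parameters on test data.
# The synthetic matrix stands in for the encoded patient attributes (assumption).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.random.randint(0, 2, size=(100, 10))      # hypothetical encoded attributes
y = np.random.randint(0, 2, size=100)            # hypothetical disorder labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)   # fit: learn scaling parameters from training data
X_test_scaled = scaler.transform(X_test)         # transform only: reuse those parameters on test data
```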
Once we have applied the machine learning algorithm, we need an interface between the user and
the server, so we need a framework which acts as this interface. For this, we are using the
Flask framework.
Figure 5 explains the interface between the user end and the server. The ML model in the server
space receives the input from Flask, which is entered by the user in the frontend. The ML model
works with the probability function and returns the predicted output back to the user space.
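A minimal Flask sketch of this interface is given below; the route, the A1_Score to A10_Score field names, and the predict_disorder() stand-in are placeholders, and the real system renders the HTML templates described below rather than plain strings.

```python
# Minimal Flask sketch of the user <-> server interface described above. The route,
# form field names, and predict_disorder() are placeholders for the real system.
from flask import Flask, request

app = Flask(__name__)

def predict_disorder(scores):
    """Stand-in for the trained classifier's prediction."""
    return "disorder likely" if sum(scores) >= 6 else "disorder unlikely"

@app.route("/assessment", methods=["GET", "POST"])
def assessment():
    if request.method == "POST":
        # The assessment form posts A1_Score ... A10_Score from the frontend.
        scores = [int(request.form.get(f"A{i}_Score", 0)) for i in range(1, 11)]
        return f"Prediction: {predict_disorder(scores)}"
    return "Assessment form would be rendered here."

if __name__ == "__main__":
    app.run(debug=True)
```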
The attributes we are using are named A1_Score to A10_Score. Next, we collect the details
required from the user in the user registration page. The login page then allows the user to log
in for the assessment using a Gmail id and password; a new user needs to register first and then
log in. There is also an about page which helps users to know about the disorder and the steps
to be taken to complete the assessment. Based on the prediction output, we suggest the required
therapy and diagnosis. The POST and GET methods transfer the data, and url_for redirects to the
templates designed for the Web pages, some of which are the home page, user page, login page,
about page, and suggestion page.
5 Results
In the results, we show a graph where we can analyze how many patients have the disorder
together with the highest number of attributes. Along with that, we show how the home page, about
page, and assessment pages look (Figs. 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 and 17).
(Bar chart: attribute count of each patient; x-axis: number of patients (0–25), y-axis: patient
attribute count (0–12).)
6 Conclusion
We have analyzed the data for adults, adolescents, and toddlers, and we conclude that the
classifier works best when we apply the Naive Bayes probability function, predicting the presence
of the symptoms or attributes by examining their frequency. The proposed model can accurately
predict the disorder so that it can be treated; early diagnosis is key for patients. Future
studies will be based on ethnicity, gender, and country, so that we can identify the most
affected areas and alert them to help save lives.
References
2. Haweel R et al (2021) A novel grading system for autism severity level using task-based
functional MRI: a response to speech study. IEEE Access
3. Ke F et al (2020) Exploring the structural and strategic bases of autism spectrum disorders with
deep learning. IEEE Access 8:153341–153352
4. Akter T et al (2019) Machine learning-based models for early stage detection of autism spectrum
disorders. IEEE Access 7:166509–166527
5. Aslam AR, Altaf MAB (2020) An on-chip processor for chronic neurological disorders assis-
tance using negative affectivity classification. IEEE Trans Biomed Circ Syst 14(4):838–851
6. Yaneva V et al (2020) Detecting high-functioning autism in adults using eye tracking and
machine learning. IEEE Trans Neural Syst Rehabil Eng 28(6):1254–1261
7. Wang C et al (2019) Identification of autism based on SVM-RFE and stacked sparse auto-
encoder. IEEE Access 7:118030–118036
8. Chén OY et al (2020) Building a machine-learning framework to remotely assess Parkinson’s
disease using smartphones. IEEE Trans Biomed Eng 67(12):3491–3500
9. Mostafa S, Tang L, Wu F-X (2019) Diagnosis of autism spectrum disorder based on eigenvalues
of brain networks. IEEE Access 7:128474–128486
10. Koirala A et al (2021) A preliminary exploration of virtual reality-based visual and touch
sensory processing assessment for adolescents with autism spectrum disorder. IEEE Trans
Neural Syst Rehabil Eng 29:619–628
11. Del Coco M et al (2017) Study of mechanisms of social interaction stimulation in autism
spectrum disorder by assisted humanoid robot. IEEE Trans Cogn Dev Syst 10(4):993–1004
12. Munoz R et al (2018) Developing a software that supports the improvement of the theory of
mind in children with autism spectrum disorder. IEEE Access 7:7948–7956
13. Wang M et al (2019) Identifying autism spectrum disorder with multi-site fMRI via low-rank
domain adaptation. IEEE Trans Med Imag 39(3):644–655
14. Tamilarasi FC, Shanmugarn J (2020) Evaluation of autism classification using machine learning
techniques. In: 2020 3rd international conference on smart systems and inventive technology
(ICSSIT). IEEE, 2020
15. Huang Z-A et al (2020) Identifying autism spectrum disorder from resting-state fMRI using
deep belief network. IEEE Trans Neural Netw Learn Syst 32(7):2847–2861
16. Zhao Z et al (2019) Applying machine learning to identify autism with restricted kinematic
features. IEEE Access 7:157614–157622
17. Banire B et al (2020) The effects of visual stimuli on attention in children with autism spectrum
disorder: an eye-tracking study. IEEE Access 8:225663–225674
18. Liang S et al (2021) Autism spectrum self-stimulatory behaviors classification using explain-
able temporal coherency deep features and SVM classifier. IEEE Access 9:34264–34275
Early Detection of Infection in Tomato
Plant and Recommend the Solution
A. C. Ramachandra, N. Rajesh, N. B. Megha, Apoorva Singh,
and C. R. Prashanth
1 Introduction
Agriculture has been the basis of human existence since its inception. India's main occupation
is agriculture, and India is second in terms of agricultural production, with a wide variety of
crops grown. Modern organic farming has brought more attention to quality and yield. As the
number of crops increases year on year, so do the diseases. Plant diseases can ruin agricultural
yields and are a serious problem for food safety. Climatic conditions are not in the control of
humans, and this is a major setback for farmers, leading to big losses. Due to uncontrolled
changes in climate, the agriculture sector is attacked by millions of pests. This should be
detected in the early stages, failing which there are chances of complete failure of the crop
yield. The symptoms can be seen in different parts of the plant, such as the leaves, stems,
lesions, and fruits. The leaf will show the symptoms by changing colour or showing spots. This
process will help in plant disease classification and detection, which leads to better quality
and higher plant productivity.
Traditional methods for diagnosing disease require extensive knowledge and experience in the
field. Manual observation and pathogen detection are the standard methods for diagnosing
disease; however, this can be costly and time-consuming. Farmers used to monitor their crops at
regular intervals; if they could not identify the disease symptoms, they would apply a certain
amount of pesticide or fertilizer, which may lead to a reduction in yield.
Failure to identify the disease can lead to incorrect fertilizer applications, which ultimately
harm both the plant and the soil. Farmers often resort to pesticides and
expensive methods to avoid these diseases. This approach also increases production
costs and causes major monetary losses to farmers. Effective disease management
begins with early detection.
To improve accuracy and reduce the shortcomings of traditional leaf disease detection, as well
as to take leaf position into account, we use image processing with a neural network. The
proposed approach improves the detection of tomato diseases and can even suggest treatments.
The aim is to develop a leaf recognition algorithm based on specific features extracted from
photographs. This introduces an approach where the plant disease is identified based on the
properties of its leaves, using techniques such as area computation, histogram equalization,
edge detection, and classification. The algorithm mainly relies on OpenCV resources.
The tomato plant is considered for the experimental study. Compared to other plants, the tomato
plant is quite sensitive and requires particular weather conditions to grow. As the price of
tomato fluctuates in our day-to-day life, it is very important to detect the diseases and reduce
the loss.
Table 1 shows the life span of the tomato plant stage by stage. It is very important to observe
or monitor the plant during the period of 30 to 40 days, because during this stage there is a
high chance of the plants getting infected. If we monitor the plants correctly during this
period, then we can reduce the loss of production due to infection.
To help farmers, a new method to identify tomato diseases is suggested. Our approach is able to
detect tomato diseases more accurately. For experimentation, we are using a total of 2511 images
of 5 disease classes to train the model. Finally, by comparing the outputs from the ResNet-50
and CNN models, the system gives better accuracy compared to existing methods. Sample images for
the different diseases are shown in Table 2.
2 Related Work
The rapid advancement of computers in the last few years has made vision and deep
learning possible. This has significantly increased image recognition’s flexibility
as well as accuracy. Deep learning is able to extract classifications in a better way
compared to other technology. Using deep learning, features can be extracted directly
without the need to use classifiers. Deep learning is an effective method of classifying
in many situations. It works very effectively at generalization, especially when it
comes to the extraction of complex and special features.
Aravinth et al. [1] introduced a method to identify brinjal leaf diseases such as Bacterial
Wilt, Cercospora Leaf Spot, and Collar Rot. An artificial neural network was used for
classification, the k-means clustering algorithm was used for segmentation, and texture features
were used for feature identification. Kamlapurkar proposed a system that can give more precise
results in the classification and identification of disease from an image of a leaf. They used
different stages [2] such as pre-processing, training, and identification, and used feature
extraction to classify images and diagnose disease. Zhou et al. restructured the residual dense
network to identify tomato leaf diseases. The hybrid deep learning model combines the best of
dense and deep residual networks, which can improve the accuracy of [3] calculations as well as
increase the flow of information. The model achieves a top-one average identification accuracy,
according to experimental results. Ding and colleagues used tomato leaves for their experiments.
They used [4] deep learning to extract disease features from the leaf surface, with ResNet-50 as
the base network model in the experiment. Subhajit Maity and colleagues proposed a simple method
to detect leaf diseases [5] by using images of leaves. This was done with image processing and
segmentation; for identification, they used Otsu's method and k-means clustering. Ding et al.
used an improved version of a pixel-wise [4] instance segmentation technique, the mask
region-based convolutional neural network, in order to detect cucumber fruits. This [6, 7]
research identifies the disease in four stages: image acquisition, image segmentation, feature
extraction, and classification. The extracted features include contrast, energy, homogeneity,
mean, standard deviation, and variance. Saxen and colleagues proposed an easy and quick face
alignment method for pre-processing. They also address the [8, 9] problem of estimating facial
attributes using RGB images on mobile devices, using MobileNetV2 and NASNet Mobile, two
lightweight CNN architectures.
The flow of the system is shown in Fig. 1. Firstly, we take raw images of five different
diseases with different sizes; pre-processing is done to make the sizes of the images the same.
The block diagram of the proposed model is shown in Fig. 1.
Tomatoes are one of the most widely grown agricultural crops and are grown extensively in both
north and south India. The experiment used 2511 images covering five prevalent tomato leaf
diseases, including Septoria leaf spot, bacterial spot, and leaf mold. The data was obtained
from Kaggle. Some examples can be seen in Fig. 2.
3.2 Pre-processing
The database is pre-processed, such as image reshaping and resizing. The test image
also undergoes similar processing. Pre-processing refers to the improvement of image
data in order to suppress unwanted distortions or enhance some important image
features for further processing. The resizing of image is shown in Fig. 3.
Fig. 2 Dataset
The dataset has five different plant diseases, and any image can be used as a test. CNN uses the
train dataset to train the model so it can identify the test images and determine the disease.
The CNN has many layers: dense, dropout, and activation, as well as Convolution2D and
MaxPooling2D. These layers are used to extract the features and classify the disease. Once
trained, the algorithm is able to detect the disease in a plant species. This is done by
comparing features from the test image and the training images: using the trained model, the
disease in the leaf of the test image can be detected.
The experiment involved classifying and sorting the training photos, which were then placed in
the folder corresponding to the disease category name. Comparing the modified ResNet-50 network
with the original ResNet-50 network, we used different activation functions and convolution
kernel sizes. For the classification of diseases in an image, we use a convolutional neural
network. Here, we implement the CNN using the pre-trained ResNet-50 [10] neural network
architecture, using TensorFlow and OpenCV on the Python platform.
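A minimal TensorFlow/Keras sketch of such a pre-trained ResNet-50 pipeline is shown below; the input size, the classification head, and the hyperparameters are illustrative assumptions, not the exact configuration used in this work.

```python
# Sketch of a transfer-learning classifier built on a pre-trained ResNet-50,
# assuming 224x224 RGB leaf images and five disease classes; hyperparameters
# are illustrative, not this paper's exact configuration.
import tensorflow as tf

NUM_CLASSES = 5

base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                       input_shape=(224, 224, 3))
base.trainable = False                       # keep the pre-trained convolutional features

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Training data would come from the 2511-image dataset, e.g. via
# tf.keras.utils.image_dataset_from_directory("dataset/", image_size=(224, 224)).
```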
4 Result Analysis
The experiments are done using the available dataset, which has images of all five diseases
considered. The pre-processed images are shown in Table 3. The values are for all five diseases
with five models and are noted against the number of iterations; as the number of iterations
increases, the diseases are identified more specifically. The values obtained are plotted using
a bar chart, and it is evident that the infection is identified specifically only after more
iterations, as shown in Fig. 4.
The weights obtained by both techniques are compared to calculate the efficiency of the proposed
model. The efficiency for each infection, for each model, is shown in Table 4. These values are
plotted as shown in Fig. 5. It is observed that between the fifth and sixth iterations, the
infection is identified specifically, and hence the healthy-leaf value comes down.
5 Conclusion
The proposed model is evaluated using two methods to extract features, identify, and classify.
The experimentation has been done using the existing dataset and a sample dataset created from
our own images. The proposed model is able to perform better compared to existing models because
of the dual-model application; the results obtained are shown in Table 4, where the accuracy is
calculated for the different diseases.
References
1. Anand R, Veni S, Aravinth J (2016) An Application of image processing techniques for detec-
tion of diseases on Brinjal leaves using k-means clustering method. In: 2016 5th international
conference on recent trends in information technology, 2016
2. Kamlapurkar SR (2016) Detection of plant leaf disease using image processing approach. Int
J Sci Res Publ 6(2)
3. Zhou C et al (2021) Tomato leaf disease identification by restructured deep residual dense
network. IEEE Access 9
4. Ding J et al (2020) A tomato leaf diseases classification method based on deep learning. In:
2020 Chinese control and decision conference (CCDC). IEEE, 2020
5. Maity S, Sarkar S, Vinaba Tapadar A, Dutta A, Biswas S, Nayek S, Saha P (2018) Fault area
detection in leaf diseases using k-means clustering. In: 2018 2nd international conference on
trends in electronics and informatics (ICOEI). IEEE, 2018
6. Kumar V, Arora H, Sisodia J (2020) ResNet-based approach for detection and classifica-
tion of plant leaf diseases. In: 2020 international conference on electronics and sustainable
communication systems (ICESC). IEEE, 2020
7. Saxen F, Werner P, Handrich S, Othman E, Dinges L, Al-Hamadi A (2019) Face attribute
detection with MobileNetV2 and NasNet-mobile. In: 11th international symposium on image
and signal processing and analysis (ISPA), 2019
8. Ren S, He K, Girshick R, Sun J (2016) Faster R-CNN: towards real-time object detection with
region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6) (2016)
9. Yin X et al (2020) Enhanced faster-RCNN algorithm for object detection in aerial images. In:
2020 IEEE 9th joint international information technology and artificial intelligence conference
(ITAIC). vol 9. IEEE, 2020
10. Jiang P et al (2019) Real-time detection of apple leaf diseases using deep learning approach
based on improved convolutional neural networks. IEEE Access 7:59069–59080
Design and System Level Simulation
of a MEMS Differential Capacitive
Accelerometer
1 Introduction
Accelerometers are electromechanical devices which are used to measure the acceler-
ation of a moving system. A wide range of applications requires acceleration measurement.
These inertial sensors are also useful in applications where motion sensing
such as vibration detection, shock and tilt is required [1]. As the MEMS accelerom-
eters are small in size, consume low power and offer high precision, they are widely
used in automobiles for airbag deployment, flight control and navigation, smart-
phones, etc. [2]. These advantages make them suitable for IoT-based applications
such as industrial automation, structural health monitoring, condition monitoring of
machines, medical applications and many more.
The basic mechanical structure of a MEMS accelerometer contains a proof mass
supported by a spring and attached to a dashpot [3]. The spring and the dashpot are
in turn connected to frame as shown in Fig. 1. When an external force is applied,
the proof mass gets displaced from its rest position and this displacement can be
measured by some electronic circuitry to know about the acceleration. Based on the
principle of sensing, the accelerometers are classified as piezoelectric, piezoresistive
and capacitive accelerometers.
where resonant frequency (ωres) = √(k/m)

Quality factor (Q) = √(km)/b    (4)
The magnitude of Eq. 4 can be plotted as in Fig. 2. It can be seen that for frequencies
much lower than the resonance frequency, the system responds with a linear propor-
tional trend of 1/K. Now considering that the force that will cause the displacement
is F = ma, it is easy to find out the relationship between x and the acceleration for
frequencies much lower than ωres :
x = ma/k = a/ωres²    (5)
2 Principle of Operation
The capacitive accelerometers basically comprise a pair of parallel plates, one fixed
and the other movable. The parallel plates are very commonly used as interdigital
comb fingers as shown in Fig. 3 [5] which can be moved in lateral (in-plane) direction
or in vertical (out-of-plane) direction. As the lateral dimensions of the plate can be
increased to a few tens of millimeters, they are used for sensing the acceleration in the lateral
direction, whereas the vertical dimensions of the plates are restricted to only a few
microns and hence are suitable for sensing the acceleration in the traverse direction.
From Fig. 3a, it is seen that 3 comb fingers (one of which is movable) form two
capacitors, C 1 and C 2 , and any force applied to the movable finger results in a change
in capacitance. This differential capacitance may be measured, and thus, the voltage
associated with it can be determined using Eq. 6. Out-of-plane forces can be caused
due to additional parasitic capacitances such as those between the fingers and the
body, as well as the asymmetry of the fringing fields, which can be reduced with
more complex designs.
Vs = Vout = ((C1 − C2)/(C1 + C2 + Cp)) Vm    (6)

C1 = ε0εr (H/d)(L + x)  and  C2 = ε0εr (H/d)(L − x)    (7)

Vout = ((C1 − C2)/(C1 + C2 + Cp)) V = (x/L) V    (8)
So, from the above equation it is seen that when the fingers are accelerated, the
output voltage is directly proportional to the displacement.
Now, from Hooke's law as applied to Fig. 3b, the displacement of the proof mass (x)
is given by [7]

x = F/k = (M/k) a    (9)

where F is the force applied to the mass of the system (M) by the external acceleration
(a) through the springs, whose stiffness is given by [8]

k = Etw³/(4L³)    (10)

Vout = (Ma/(kL)) V    (11)
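To make the chain from acceleration to output voltage concrete, the short Python sketch below evaluates Eqs. 9–11 together with the natural-frequency expression; all numerical parameter values are placeholder assumptions for illustration and do not correspond to the dimensions in Table 1.

```python
# Sketch evaluating the accelerometer relations above (Eqs. 9-11 and the natural
# frequency); every numerical value below is an illustrative assumption only.
import math

E = 170e9                         # Young's modulus of silicon, Pa (assumed)
t, w, L = 25e-6, 4e-6, 300e-6     # beam thickness, width, length, m (assumed)
M = 3.5e-8                        # proof mass, kg (assumed)
L_f = 100e-6                      # finger overlap length used in Eq. 11, m (assumed)
V = 1.0                           # modulation voltage, V (assumed)
a = 9.81                          # 1 g of applied acceleration, m/s^2

k = E * t * w**3 / (4 * L**3)           # Eq. 10: folded-beam stiffness
x = M * a / k                           # Eq. 9: proof-mass displacement
V_out = M * a * V / (k * L_f)           # Eq. 11: differential output voltage
f_n = math.sqrt(k / M) / (2 * math.pi)  # natural frequency from k and M

print(f"k = {k:.2f} N/m, x = {x*1e9:.1f} nm, V_out = {V_out*1e6:.1f} uV, f_n = {f_n:.0f} Hz")
```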
3 Accelerometer Structure
The two accelerometer structures are as shown in Fig. 4a, b. Model 1 is accelerated
in lateral direction, and Model 2 is accelerated in traverse direction.
Model 1: The accelerometer structure shown in Fig. 4a has a unique structure with
movable and fixed parts. The movable part consists of two proof masses which are
symmetrically suspended from a central anchor by a single folded beam on one side and
interdigitated sensing fingers on the other side, as shown in the figure. The two electrodes
on either side of the proof mass have the fixed fingers attached to them, as shown in the figure.
In response to the force applied, the two proof masses vibrate, and movable fingers
form the interdigitated finger capacitor pair C 1 and C 2 with the fixed fingers across
which the differential capacitance can be measured. This differential capacitance
varies linearly with the applied acceleration [10].
Model 2: The accelerometer structure shown in Fig. 4b has a micromechanical proof
mass which is placed at the central part of the accelerometer and acts as the sensing
element. This proof mass is suspended by four serpentine springs and will move with
respect to the moving frame of reference when an external acceleration is applied.
The displacement made by the proof mass gives the measure of the acceleration
applied to it. In the capacitive approach, the change in the capacitance between the
proof mass and the fixed electrodes is measured to determine the displacement. The
dimensions of both the models are kept same and are as shown in Table 1.
4 Analytical Modeling
The analytical calculations to find the natural frequency of Model 1 and Model 2 are
discussed in this section.
Model 1
The spring constant for the folded beam for Model 1 is found to be 10 N/m from
Eq. 10.
The total sensing mass of the Model 1 accelerometer is given by the expression in [10];
from this, the total sensing mass is found to be m = 3.53625 * 10−8 kg.
The natural frequency of the lateral structure is given by Eq. 13 and is found to
be 2.3 kHz
fn = (1/2π) √(k/m)    (13)
Model 2
The spring constant for Model 2 is calculated by Eq. 14 and is found to be
115.8 N/m [3].
k = 2 (π⁴/6) Etws³ / (2L1³ + 2L2³ + 2L3³ + 2L4³ + 2L5³)    (14)
m = ρV (15)
5 Simulation Study
Table 2 Comparison of frequencies of the 2 models

Resonant frequency    Analytical value in kHz    Simulated value in kHz
Model 1               2.3                        2.1
Model 2               8.03                       7.83
of the mass along this line as shown in Fig. 7a. For Model 2, analysis is made by
applying acceleration in traverse direction. The displacement in the y direction is
analyzed using a 3D cut line along the x-axis while keeping the y constant in the
center of the structure as shown in Fig. 7b.
It can be seen that the deviation is in the order of a few nano-meters for Models
1 and 2. Also, it is observed that the maximum deviation for Model 1 is 1 nm and is
Table 3 Effect of acceleration on the displacement of the models

Acceleration (g)    Model 1 displacement in µm    Model 2 displacement in µm
1                   18 * 10−6                     25 * 10−7
2                   35 * 10−6                     5 * 10−6
3                   50 * 10−6                     8 * 10−6
4                   7 * 10−5                      10 * 10−6
5                   9 * 10−5                      12 * 10−6
6                   10 * 10−5                     16 * 10−6
7                   12 * 10−5                     18 * 10−6
8                   14 * 10−5                     20 * 10−6
9                   16 * 10−5                     20 * 10−6
10                  18 * 10−5                     25 * 10−6
Fig. 7 a Displacement in x direction when passed through 3D cut line with x = constant. b
Displacement in y direction when passed through 3D cut line with y = constant
10 nm for Model 2. The stress analysis of the two structures is as shown in Fig. 8a, b.
It can be seen that maximum stress is accumulated on the springs at a point where it is
attached to the proof mass. Model 1 has 2 springs, each connected to one of the proof masses,
so Fig. 8a shows 2 maximum stress points. Figure 8b shows 4 points corresponding
to the maximum stress values which relate to the four springs in Model 2.
Capacitance: Electrostatic study is done in COMSOL to find the capacitance of the
two models which is shown in Fig. 9a, b. It is observed that the capacitances 1.31 pF
and 16.4 pF are obtained from Models 1 and 2, respectively.
To understand the resonant behavior of the two accelerometers, MATLAB simu-
lations are done. The maximum response and phase response of Model 1 and Model
2 as a function of applied external force are as shown in Fig. 10a, b, respectively. It
is seen that the natural frequency of the Model 1 is 2.6 kHz and 7.8 kHz for Model
2.
Fig. 10 a Phase plot for Model 1. b Phase plot for Model 2
6 Conclusion
The results from the COMSOL simulations are almost in line with the analytical
values calculated. The fundamental issue with these simulations is that the capaci-
tance remains nearly constant with variable acceleration; therefore, a variation of the
output voltage cannot be calculated in differential mode from the two capacitances
C 1 and C 2 . To get appreciable variations in capacitance, much larger displacements
would be required, perhaps several orders of magnitude, both for lateral and traverse
motion, and to obtain them, very strong accelerations may be required, which could
lead to an excessive accumulation of stress and possibly cause the structure to break;
as seen, springs are the points where the most stress accumulates, and these could
be the critical points.
It is worth noting that excessive accelerations are not only unnecessary for the
purposes for which this gadget is proposed, but they are also nearly impossible to
achieve. To use capacitances for position detection, the system’s displacement must
be greatly increased, which can be accomplished by expanding the entire volume.
But, increasing the volume without increasing the K of the springs, or perhaps without
increasing their size, could result in stiffening of the system springs. Furthermore,
as the springs are too small in comparison with the rest of the system, they may be
subjected to excessive stress, resulting in structural damage.
One more method that can be thought of is to reduce the distance between the fingers; however,
this is not possible, as a further reduction in the distance between the fingers may not be
acceptable for fabrication. When the results of both the models are
compared, it is observed that the natural frequency of the Model 1 is 2.1 kHz and that
of Model 2 is 7.8 kHz. Model 2 exhibits better displacement for various accelerations
as compared to Model 1.
Acknowledgements This work is a part of ISSS community chip activity. The authors are grateful
to Prof. Ananth Suresh, Chairman, ISSS, IISc Bangalore, for giving the opportunity to participate
in community chip activity supported by ISSS, IISc, Bangalore. The authors are thankful to Dr.
Habibuddin Shaik, Associate Professor, Physics Department, NMIT, Bangalore, for the support
extended in carrying out this work. The authors also thank Mrs. Nithya and Mrs. Stuthi, Centre
for Nano-Materials and MEMS, NMIT, Bangalore, for their timely support and the Department of
Electrical and Electronics Engineering and authorities of Nitte Meenakshi Institute of Technology,
Bangalore, for the continuous support and encouragement. The authors extend their sincere gratitude
to Visveswaraya Institute of Technology, Belagavi, for the opportunity and support.
References
1. Mukhiya R, Agarwal P et al (2019) Design, modelling and system level simulations of DRIE-
based MEMS differential capacitive accelerometer. Microsyst Technol. https://doi.org/10.1007/
s00542-018-04292-0
2. Vijayakumar S, Vijila G, Alagappan M, Gupta A (2011) Design and analysis of 3D capacitive
accelerometer for automotive applications. In: COMSOL conference, Bangalore
3. Puccioni G (2020) Design and analysis of a MEMS capacitive accelerometer. Sens Microsyst
J
4. Xie H, Sulouff RE (2008) Capacitive accelerometer. Compr Microsyst
5. Senturia SD (2001) Microsystem design. Kluwer Academic Publishers. ISBN 0-7923-7246-8
6. Kannan A (2008) Design and modeling of a MEMS-based accelerometer with pull in analysis
7. Padmanabhan Y (2017) MEMS based capacitive accelerometer for navigation, 20 Apr 2017.
https://doi.org/10.13140/RG.2.2.35625.49769
8. Singh P, Srivastava P, Chaudhary RK, Gupta P (2013) Effect of different proof mass supports
on accelerometer sensitivity, pp 896–900. https://doi.org/10.1109/ICEETS.2013.6533506
9. Sinha S, Shakya S, Mukhiya R, Gopal R, Pant BD (2014) Design and simulation of MEMS
differential capacitive accelerometer. In: Proceeding of ISSS international conference on smart
materials, structures and systems, Bangalore, India, 8–11 July 2014
10. Veena S, Rai N, Suresh HL, Nagaraja VS (2021) Design, modelling, and simulation analysis
of a single axis MEMS-based capacitive accelerometer. IJETT J
Auto-Load Shedding and Restoration
Using Microcontroller
1 Introduction
The modern electrical power system is itself a very complex technical system developed and
built by mankind. The main aim of such a service is to provide continuous and uninterruptible
supply to the consumer end. With civilization, power systems need to be expanded to meet the
growing needs. However, constructing new transmission lines or building new generation plants to
provide a reliable service is a difficult task, since installation of a plant takes roughly
three to four years and is a high-budget installation requiring huge capital. In this context, a
practical way to mitigate such issues is load shedding. Load shedding is a common practice that
takes place when the demand for power is more than the generated power. To make sure that the
system is stable and available during all conditions, under a main power station there are
several substations which handle power cuts for a certain duration to cope with this shortage in
electrical energy.
When power system blackouts occur due to abnormalities [1], they cause huge losses to the
customers as well as the utility. The disturbance spreads into larger areas and leads to complete
system failure and also unexpected risks [2]. Even the time for restoration of the networks
cannot be estimated. Hence, to overcome all these issues, the load shedding concept plays a
prominent role.
2 Background
Conventionally practiced load shedding techniques are very slow, and they do not calculate the
correct amount of load that needs to be shed. This causes unnecessary load shedding [3], which
decreases the efficiency of the system, as excessive loads are shed, causing inconvenience to
the customers.
Load shedding is classified into three types depending on the strategy involved in load shedding
and restoration [4]: (1) traditional, (2) semi-adaptive, and (3) adaptive. Traditional load
shedding, normally called conventional load shedding, is the most used among the three categories
because it is simple and does not require complicated relays. Under this scheme, a certain amount
of load is shed when the system frequency falls below a certain threshold level. In the
semi-adaptive scheme, the rate of change of frequency (ROCOF) in the system is evaluated when the
system frequency reaches the threshold [5], and based on the value of ROCOF, a certain amount of
load is shed. The adaptive scheme uses real-time data collected from phasor measurement units;
using these data, disintegration of the system into several islands, with generation tripping
and initiation of load shedding, is carried out in a controlled manner [6].
Whenever a heavy machine is switched on, a drop in frequency on the lines is common, and this
should not be considered as a pickup value for load shedding; instead, the rate of fall of
frequency over a given time lag, i.e., df/dt, should be used [7]. Blackout data from power
distribution networks reveal that voltage stability is also important for a power system,
otherwise undesirable disturbances would be caused. Both frequency and voltage are affected by
loading, and hence frequency and voltage are used as parameters for load shedding [8].
3 Basic Design
The prototype developed comprises the power supply module, the sensing unit, the dynamic load
controller, the programmable control switch, the distribution feeders, and an output display for
other types of faults related to the power line. The sensing unit serves as input to the dynamic
load controller. The available power is input to the system through the analog pin of the
microcontroller. The dynamic load controller contains the program that implements load shedding
based on the available power. The generation represents the available power in the transmission
line, and the load represents the load point. The programmable unit helps the relay switch to
transfer power automatically to the load. The system will help improve load shedding schedules,
accuracy, quality, and fairness. Figure 1 shows a block representation of the implemented
technique.
A sensor is basically a device which can sense or identify and react to certain types of
electrical or optical signals.
Vout = (Vs × R2)/(R1 + R2)    (1)
From Eq. 1, the output voltage can be calculated based on ohms law where
• V S = source voltage, in volts (V)
• R1 = resistance of the 1st resistor, in ohms (Ω )
• R2 = resistance of the 2nd resistor, in ohms (Ω )
• V out = output voltage, in volts (V).
Figure 3 shows the simulation circuit for automatic load shedding and restoration
using microcontrollers with underfrequency relay circuit, by using Proteus simulation
software.
The model is subdivided into:
• Frequency sensing unit (input to microcontroller)
• Load shedding block (controller actions are given from microcontroller).
It is a four-step load shedding scheme with 25% of the total load shed at each step, as shown
in Table 1. A conventional method of auto-load shedding using a microcontroller is very useful
as it provides accurate and error-free operation.
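A small Python sketch of this four-step, 25%-per-step underfrequency shedding decision is given below; apart from the 49.3 Hz initiation level mentioned later in this section, the frequency thresholds are illustrative assumptions, since the actual pickup values are those defined in Table 1.

```python
# Sketch of a four-step underfrequency load-shedding decision, shedding 25% of the
# total load per step. Thresholds below 49.3 Hz are assumed for illustration; the
# prototype's actual pickup values are those listed in Table 1.
STEPS = [          # (frequency threshold in Hz, cumulative load to shed in %)
    (49.3, 25),
    (49.0, 50),
    (48.7, 75),
    (48.4, 100),
]

def load_to_shed(frequency_hz):
    """Return the cumulative percentage of load to shed for a measured frequency."""
    shed = 0
    for threshold, cumulative in STEPS:
        if frequency_hz <= threshold:
            shed = cumulative
    return shed

for f in (50.0, 49.2, 48.9, 48.3):
    print(f"{f:.1f} Hz -> shed {load_to_shed(f)}% of load")
```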
The amount of load that needs to be shed is just that necessary to bring back the system
frequency (50 Hz); that is, the load to be shed is nearly equal to the calculated amount of
overload [10].
It is not necessary to restore the frequency exactly to 50 Hz; instead, it could be brought
above 49 Hz, and the remainder would be recovered by the generation through speed-governor
action. The frequency levels at which load shedding is initiated depend on various factors. In an
interconnected system, a frequency deviation of 0.2–0.3 Hz would indicate a severe disturbance in
the system. Hence, load shedding must be initiated when the system frequency decreases to
49.3 Hz. Load shedding must be coordinated with the operating equipment, since most of the
undesirable phenomena would take place at reduced frequency. An increase in load demand compared
to generation would cause the performance of the power plant outputs to reduce, driving the
system toward an unbalanced condition and further decreasing the efficiency of the plants.
Based on Eq. 2, the amount of load demand over generated power is calculated to initiate load
shedding as well as restoration. With an increase in loading on the feeder system, or if the
generated power itself has reduced, the supply and load demand diverge; hence, there would be a
decrease in voltage as well as frequency. Using these parameters, auto-load shedding as well as
restoration at the due time, shown in Fig. 6, is implemented as discussed in Sect. 1.
Fig. 6 Flowchart of load shedding and restoration with decreased generated power
6 Conclusion
In this paper, the design and construction of automatic load shedding and restoration in a
system are presented. This work presents an enhanced solution to the challenges of poor
scheduling of load shedding. The system integrates a sensor, microcontroller, LCD display, and
relay switch to automate the load shedding system. The proposed system provides information
about voltage and frequency. The automation of the load shedding system will help in ensuring
fairness, accuracy, and efficiency in scheduling, as human errors are eliminated. The operational
voltages of substations are generally 66/11 kV or 66/33 kV; hence, kVA-rated circuit breakers are
required to integrate this module. This packaged module can be installed at the substations with
the help of contactors and interposing relays.
7 Future Scope
Wide area measurement systems (WAMS) are a hot research topic in the power system domain, as
there is an increase in the use of advanced real-time measurement systems for the collection of
power system data (such as frequency, voltage, current, and generation capacity). This paper
presents a complete setup for the required operation, and in future, many other functions and
features, such as an IoT module, can be added for more reliable operation. Another significant
field would be WAMS-based load shedding. A wide area measurement system uses adaptive relaying
technology to achieve greater adaptability [6]. Underfrequency load shedding and restoration are
carried out using supervisory controls, where, at significant generation loss or load loss, the
load shedding scheme is implemented by the formation of islands.
References
1. Wu Y-K, Chang SM, Hu Y-L (2017) Literature review of power system blackout. In: 4th inter-
national conference on power and energy systems engineering, CPESE 2017, Berlin, Germany,
25–29 Sept 2017
2. Alhelou HH, Hamedani-Golshan ME, Njenda TC, Siano P (2019) A survey on power system
blackout and cascading events: research motivations and challenges, 20 Feb 2019
3. Hirodontis S, Li H, Crossley PA (2019) Load shedding in a distribution network. In:
International conference on sustainable power generation and supply, Nanjing, China
4. Shafiullah M, Alam MS, Hossain MI, Ahsan MQ (2014) Impact study of a generation rich island
and development of auto load shedding scheme to improve service reliability. In: International
conference on electrical engineering and information & communication technology (ICEEICT)
5. Ahsan MQ, Chowdhury AH, Ahmed SS, Bhuyan IH, Haque MA, Rahman H (2012) Technique
to develop auto load shedding and island scheme to prevent power system blackout. IEEE Trans
Power Syst 27(1)
6. Phadke AG, Thorp JS. Computer relaying for power systems, 2nd edn
7. Perumal N, Amran AC (2003) Automatic load shedding in power system. In: National
power and energy conference (PECon) 2003 proceedings, Bangi, Malaysia. IEEE
8. Joshi P. Load shedding algorithm using voltage and frequency data. https://tigerprints.clemson.
edu/all_theses/240
9. https://simple-circuit.com/arduino-frequency-meter-220-380v-ac/
10. Berdy J (1968) Load shedding—an application guide. General Electric Company, Electric
Utility Engineering Operation, Schenectady, NY
A Review on Design and Performance
Evaluation of Privacy Preservation
Techniques in Data Mining
1 Introduction
The world is moving towards digitization in all sectors like education, banking, voting, and
transportation [1]. There are possible chances of attacks in all fields, like denial of service,
man-in-the-middle attacks, malware, phishing, impersonation attacks, etc. All of these attacks
can be either active or passive. These attacks alter results and hence compromise security;
there are many protocols used to protect the privacy of users, known as data hiding. Users'
trust is very important in sectors like banking, payroll, and voting [1]. Users' trust is
questioned in many ways during hacking attacks, and there are many privacy preserving mechanisms
for data encryption.
Data mining is a technique used to rearrange large sets of data or patterns to extract useful
information [2]. The data is collected and assembled in warehouses on particular areas of
interest for efficient analysis in effective decision-making, cost reduction, etc. Performing
data mining to obtain necessary information without privacy leakage is an important task. Many
algorithms have been designed for preserving privacy through data mining.
Data mining is a key player in healthcare; the huge amount of data generated by healthcare is
transformed into a useful form for decision-making [3]. It also helps healthcare industries to
detect fraud and abuse, support customer relationship management decisions, evaluate the cost of
treatment, and identify the risk factors associated with diabetes [4]. Data mining in healthcare
uses EHR systems, which provide a view of a person's health stored at different locations,
connected to a central mining server. This central server preserves privacy and stores the
information from the different EHR systems by using classification and clustering.
The main goal of the DPPP is to maintain data confidentiality; there are three privacy
preservation techniques: randomization, anonymization, and encryption [5]. In randomization,
noise is added to the original data; the noise is large enough that the individual values of the
records can no longer be recovered [6]. In anonymization, a particular individual record may be
made indistinguishable within a group of records using generalization and aggregation techniques
such as k-anonymity [7]. Encryption is the process of encoding information; it involves the
conversion of the original information, referred to as plaintext, into an alternative form known
as ciphertext.
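As a simple illustration of the randomization technique, the sketch below adds zero-mean noise to a numeric attribute; the toy data and noise scale are arbitrary choices for demonstration, and a real deployment would calibrate the noise to the required privacy level.

```python
# Sketch of randomization-based privacy preservation: add zero-mean noise so that
# individual values are masked while aggregate statistics stay approximately intact.
import numpy as np

rng = np.random.default_rng(0)
salaries = np.array([42_000, 55_000, 61_000, 48_000, 70_000], dtype=float)  # toy data

noise_scale = 5_000                          # arbitrary; tuned to the privacy level needed
noisy = salaries + rng.laplace(loc=0.0, scale=noise_scale, size=salaries.shape)

print("original mean:", salaries.mean())
print("perturbed mean:", noisy.mean())       # close to the original mean on average
```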
Cyber threats are a major concern these days, and many techniques have been developed to
overcome them; using privacy preserving techniques, we can protect the data. Privacy preserving
methods include blockchain, authentication, and cryptography [7]. Cryptography is a technique
for providing secure communications that allows only the sender and intended recipient of a
message to view its contents. Here, data encryption is done using a secret key; after encryption,
both the encoded message and the secret key are sent to the recipient for the process of
decryption. Blockchain collects information as blocks that hold the information together. As
huge amounts of heterogeneous data are collected continuously through IoT, privacy preserving
mechanisms are used to protect users' privacy.
IoT provides interconnection among various heterogeneous devices [8]. Data is collected from
sensors about machines as well as human beings. Despite the many advantages of collecting data,
there are many third-party intruders who can mine the data and extract information. In IoT,
large amounts of data are continuously collected by different sensors; IoT sensors include
pressure sensors, humidity sensors, proximity sensors, accelerometers, level sensors, gyroscopes,
infrared sensors, gas sensors, temperature sensors, and optical sensors [9]. Data is classified
into three types depending on its structure: structured, unstructured, and semi-structured data.
Privacy preserving data mining is gaining popularity because it preserves the quality of data
without altering it, collecting information across various sources of data while preserving the
privacy of the data.
As the saying goes these days, "Data is the new gold"; the importance of data is
not unknown to anyone. To make data readable and interpretable, we deploy various
data mining techniques and, in this process, make our data vulnerable to various
privacy concerns. These privacy concerns may arise from lack of awareness, personal
embarrassment, or surveillance. To maintain the confidentiality, integrity and availability
of data, we need PPDM methods. PPDM methods make sure that the results
acquired after data mining are intact in terms of confidentiality and integrity.
The main objectives achieved in the proposed system in [1] are verified voters,
duplicate vote detection, protected voter details, online ballot integrity, vote storing
and verification, and end-to-end security. The basic idea here is that the user can log in
with the user details. The ECS server provides some values to poll the votes; the voting
server communicates with and verifies the details of the user on the ECS server and provides
the online ballot based on the ward candidate list.
Cube-structured data storage is used to store encrypted data in order to facilitate
data recovery from the cloud, which is used to check the integrity of the user. To secure
voting and results, verification relies on a user-differentiated system model. It has
five entities: Users, Trusted ECS Server, Trusted Vote Verification Server, Online
Voting Server and Cloud Voting Storage Server. The proposed system consists of
three phases: the first consists of saving user data and candidate data offline,
phase two is the voting phase, and phase three is the voting results announcement
phase.
Transformation using rotation data perturbation (RDP) involves rotating a data
point from one coordinate axis into different axes without affecting the
metric [4]. Geometrically perturbed data are produced from the actual data sets (Fig. 1).
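A minimal Python (NumPy) sketch of rotation-based perturbation: a pair of attributes is rotated by an angle, which preserves pairwise Euclidean distances (the metric) while hiding the original values. The attribute pairing, angle and records are illustrative assumptions, not the exact RDP procedure of [4].

import numpy as np

def rotate_pair(data: np.ndarray, theta: float) -> np.ndarray:
    """Rotate a 2-column data matrix by angle theta (radians).

    Because the rotation matrix is orthogonal, pairwise Euclidean
    distances between records are preserved exactly.
    """
    rotation = np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])
    return data @ rotation.T

# Hypothetical two-attribute records (e.g., weight and glucose level).
records = np.array([[70.0, 110.0],
                    [82.5,  95.0],
                    [64.2, 140.0]])

perturbed = rotate_pair(records, theta=np.pi / 3)

# The distance between the first two records is unchanged by the rotation.
d_before = np.linalg.norm(records[0] - records[1])
d_after = np.linalg.norm(perturbed[0] - perturbed[1])
assert np.isclose(d_before, d_after)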
The algorithm proposed in this paper starts from an original m × n matrix in which n
attributes are considered for perturbation. Singular value decomposition takes a
rectangular matrix of gene expression data (defined as A, where A is an n × p matrix) in
which the n rows represent the genes and the p columns represent the experimental
conditions. The SVD theorem states
$A_{n \times p} = U_{n \times n} S_{n \times p} V_{p \times p}^{T}$
where the columns of U are the left singular vectors (gene coefficient vectors); S
(of the same dimensions as A) is diagonal and holds the singular values (mode amplitudes);
and the rows of $V^{T}$ are the right singular vectors (expression level vectors). The
SVD represents an expansion of the original data in a coordinate system where the
covariance matrix is diagonal. The features having values smaller than
the threshold are set to zero.
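The SVD-based perturbation step described above can be sketched as follows. The data matrix and threshold are hypothetical, and zeroing singular values below the threshold before reconstructing the data is one common reading of the procedure, not necessarily the exact variant used in the cited work.

import numpy as np

def svd_perturb(A: np.ndarray, threshold: float) -> np.ndarray:
    """Return a perturbed copy of A in which singular values below
    `threshold` are set to zero (distorting the minor components)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_kept = np.where(s >= threshold, s, 0.0)
    return (U * s_kept) @ Vt   # equivalent to U @ diag(s_kept) @ Vt

# Hypothetical n x p data matrix (n records, p attributes).
A = np.array([[5.1, 3.5, 1.4],
              [4.9, 3.0, 1.4],
              [6.2, 3.4, 5.4],
              [5.9, 3.0, 5.1]])

A_perturbed = svd_perturb(A, threshold=1.0)
print(np.round(A_perturbed, 2))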
4 Literature Survey
(continued)
S. No. | [Reference] Year | Technique/Methodology/Algorithm | Performance | Dataset | Research gaps
5 | [14] 2019 | DeepChain | DeepChain guarantees data privacy for each participant and provides auditability for the whole training process, but the encryption step and adding data to the chain consume a considerable amount of time | Data collected from blockchain | Security should be redefined
6 | [3] 2018 | Medical agglomeration behaviors mining (MAMB) algorithm | MAMB has better scalability and is more efficient in running parameters than the Eclat and Apriori algorithms | Medical insurance industry | First a threshold needs to be determined, and then the fraud on a person's card can be identified
7 | [5] 2018 | Ant colony optimization, random rotation perturbation, K-means clustering | The proposed algorithm is better in utilization in combination with the K-means clustering algorithm; this method protects the privacy of individuals more accurately | Real-time healthcare | Different types of data hiding techniques can be used by choosing suitable QI attributes
8 | [7] 2018 | PPDM hybrid algorithm (cryptography and perturbation) | The data is converted into its respective ASCII values, perturbation techniques are applied, and then a cryptography technique is applied on the data | archive.ics.uci.edu (UCI Machine Learning Repository: Indian liver patient, balance scale, abalone, and bank marketing datasets) | Could also be extended to video and audio data
9 | [2] 2018 | Privacy preserving item-centric mining algorithm PP-UV-Eclat | Used in the Apache Spark environment to find frequent patterns | Dataset is not disclosed | PPDM mining can be improved, and the publishing step can be shifted from the post-processing step to an intermediate processing step
10 | [15] 2017 | Data mining tools which are efficient in detecting upcoding frauds | Efficient review | Healthcare | A semi-supervised method of learning would be highly appreciable in fraud detection
11 | [6] 2017 | Privacy preserving hybrid technique (suppression and perturbation) | Information loss is zero, execution time is minimum, and the privacy preserved is maximum | Database stored on MS Access | Does not throw light on the case of very huge datasets with high dimensionalities
12 | [16] 2017 | Multi-party privacy preserving data mining for vertically partitioned data | Accuracy is high, error rate is low, time consumption is high, and memory consumption is low | Student data | In this algorithm, the time consumption/complexity should be measured and kept low
Privacy and its preservation are very big concerns in data mining. There is now
an abundance of ways, algorithms, and techniques for privacy preservation, and this
paper throws light on some of them. Table 1 shows a comparison of various PPDM
techniques along with their methods, scenarios, advantages and limitations.
6 Conclusion
References
9. Beg S, Anjum A, Ahmad M, Hussain S, Ahmad G, Khan S, Choo K-KR (2021) A privacy-
preserving protocol for continuous and dynamic data collection in IoT enabled mobile app
recommendation system (MARS). J Netw Comput Appl 174:102874. ISSN 1084-8045. https://
doi.org/10.1016/j.jnca.2020.102874
10. Zhou Y, Tian Y, Liu F, Liu J, Zhu Y (2019) Privacy preserving distributed data mining based
on secure multi-party computation. In: 2019 IEEE 11th international conference on advanced
infocomm technology (ICAIT), pp 173–178. https://doi.org/10.1109/ICAIT.2019.8935900
11. Domadiya N, Rao UP (2021) Improving healthcare services using source anonymous scheme
with privacy preserving distributed healthcare data collection and mining. Computing 103:1–
23. https://doi.org/10.1007/s00607-020-00847-0
12. Upadhyay S, Sharma C, Sharma P, Bharadwaj P, Seeja KR (2018) Privacy preserving data
mining with 3-D rotation transformation. J King Saud Univ Comput Inf Sci 30(4):524–530.
https://doi.org/10.1016/j.jksuci.2016.11.009
13. Keshk M, Moustafa N, Sitnikova E, Turnbull B, Vatsalan D (2020) Privacy-preserving tech-
niques for protecting large-scale data of cyber-physical systems. In: 2020 16th international
conference on mobility, sensing and networking (MSN), pp 711–717. https://doi.org/10.1109/
MSN50589.2020.00121
14. Weng J, Weng J, Zhang J, Li M, Zhang Y, Luo W (2021) DeepChain: auditable and privacy-
preserving deep learning with blockchain-based incentive. IEEE Trans Dependable Secure
Comput 18(5):2438–2455. https://doi.org/10.1109/TDSC.2019.2952332
15. Sheshasayee A, Thomas SS (2017) Implementation of data mining techniques in upcoding
fraud detection in the monetary domains. In: 2017 international conference on innovative
mechanisms for industry applications (ICIMIA), pp 730–734. https://doi.org/10.1109/ICIMIA.
2017.7975561
16. Sharma S, Shukla D (2016) Efficient multi-party privacy preserving data mining for verti-
cally partitioned data. In: 2016 international conference on inventive computation technologies
(ICICT), pp 1–7. https://doi.org/10.1109/INVENTIVE.2016.7824852
Controller Area Network (CAN)-Based
Automatic Fog Light and Wiper
Controller Prototype for Automobiles
1 Introduction
2 Literature Survey
The literature survey provides a bird's-eye view of the research done in this area
so far.
Networks built using CAN and traditional vehicle circuit connections are
compared. It is proved that CAN has drastically changed the electrical wiring system
of the vehicle, which has a direct impact on reducing component costs and increasing
reliability [6].
Any general embedded system can employ a CAN bus. The CAN protocol defines
both hardware and software standards. The main advantage of designing with CAN
is that it is very easily implementable, in an almost plug-and-play fashion. It also
means that the designers need not understand every bit of the CAN protocol; they
can use the protocol to build an application by knowing the hardware interfaces and
output values [7].
Single Board Computers (SBC) are usually employed to implement the CAN bus
protocol. The Raspberry Pi can also be used for CAN implementation; it is
equipped with a high-performance processor, which helps in faster data
transmission and processing [8].
Semi-autonomous vehicles are built with an analog driver-vehicle interface. The
analog interface can be converted into a digital interface. The design has a data
acquisition system in which the Analog to Digital Converter (ADC) converts all data
into digital form. The digital data is then sent to the display. Low-speed CAN and
high-speed CAN were used according to the requirements [9].
Traditionally, CAN is a wired protocol. But as the world embraces artificial
intelligence-based systems, there is an ever increasing need to control automobile
systems remotely. Wireless CAN and IoT can be merged to realize remotely
controlled automobiles [10].
There is an ever increasing need to develop safer and more reliable systems in
automobiles, along with other entertainment technologies. Automatic control of the
windshield wiper speed also contributes to increasing the safety of the automobile.
There are various methods to start the wiper motion and adjust its speed automatically
[11].
Existing systems might have some limitations in their functionality. One such
problem is the blurriness for the driver caused by high-intensity headlights coming
from the opposite direction. Windshield haze is also a threat, because everything
in front of the driver appears smudged and can cause accidents. Fog light automation is
one of the main solutions for this, which does not exist in economy cars [12].
Increased safety means there would be a lot of parameters to monitor, like temperature,
gas leakage, glaring effect, current, fuel leakage, etc. All sensors can be
interfaced to Single Board Computers and to both master and slave ECUs, if there
are two. CAN efficiently supports the interface between the different ECUs which monitor
all these parameters [13].
In many cases, high-intensity headlights are problematic and can cause a lot
of glare to drivers coming from the opposite direction. Automobile drivers might not
decrease the intensity of their headlights according to the lighting on the road or
when other vehicles pass beside them. Automatic variation of the headlight intensity
can therefore be beneficial. This can be accomplished using an accelerometer sensor;
it has already been tested in the lab with a 97% success rate [14].
In a modern-day automobile, there exists a complex interconnection of sensors,
actuators, controllers, and physical cables along with other electronic or electrical
equipment. The widely used protocol for this connection is the CAN protocol, which
is wired and uses differential voltages on the physical cable. However, a new hybrid
communication protocol termed vehicular wireless CAN (ViCAN) has been introduced.
It is called "hybrid" since it combines both the wired and wireless versions of the
CAN protocol. The major advantage of this hybrid architecture is that it reduces the
overall complexity of the system and supports reliable node-to-node communication
[15].
The sensors used, the CAN frame format, the CAN transceiver, the CAN
controller and the software used to develop the application are briefly described here.
The Rain Detection Sensor SKU890228 works like a switch: if rain drops are
detected, the switch closes. Nickel-coated lines are mounted on the board in
the form of fringes. The resistance of the sensor decreases when rain drops fall on it
and increases whenever the sensor is dry. The Rain Detection Sensor is as shown in
Fig. 2.
CAN version 2.0B is implemented by the CAN controller MCP 2515. The data is
converted according to the CAN data frame format. The CAN controller will be inter-
faced with the microcontroller via the Serial Peripheral Interface (SPI) protocol. MCP
2551 is the CAN Transceiver, which converts data into physical voltage levels. Differ-
ential physical voltages are employed in the range of 2.5–3.5 V. This is placed between
the CAN controller and the physical bus. Figures 3 and 4 show CAN controller and
CAN Transceiver, respectively.
The tool used to develop this application is the Arduino IDE.
The CAN protocol defines a standard format of data which will be transmitted on
the CAN physical bus. CAN data format is as shown in Fig. 5.
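To make the frame format concrete, the sketch below models the main fields of a CAN data frame as a plain Python data structure. The field set follows the standard CAN specification in spirit, but the specific identifiers and payload layout used by the prototype are not given in the paper and are therefore hypothetical.

from dataclasses import dataclass, field
from typing import List

@dataclass
class CanDataFrame:
    """Simplified view of a CAN 2.0 data frame (bit stuffing, CRC and
    ACK handling are performed in hardware by the MCP2515 controller)."""
    arbitration_id: int          # 11-bit standard or 29-bit extended identifier
    extended: bool = False       # True -> CAN 2.0B extended frame
    data: List[int] = field(default_factory=list)  # 0..8 payload bytes

    @property
    def dlc(self) -> int:
        """Data length code: number of payload bytes (0-8)."""
        return len(self.data)

# Hypothetical frame carrying temperature and humidity readings
# from the slave ECU to the master ECU.
frame = CanDataFrame(arbitration_id=0x101, data=[28, 65])
print(f"id=0x{frame.arbitration_id:03X} dlc={frame.dlc} data={frame.data}")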
A prototype of the automatic fog light and wiper control system is as shown in Fig. 9.
Master and slave ECUs communicate with each other to perform specific func-
tions, namely displaying temperature and humidity values, turning ON or OFF a
DC motor which controls the wiper and turning ON or OFF Light Emitting Diodes
(LED) lights, corresponding to fog lights of the automobile. It is battery powered.
References
1. https://www.streetdirectory.com/travel_guide/56847/cars/the_modern_day_car_a_sophistica
ted_high_tech_gadget.html
2. https://www.bosch-mobility-solutions.com/en/solutions/control-units/eengine-control-unit/
3. https://www.cselectricalandelectronics.com/difference-between-lin-can-most-flexray/
4. https://www.embitel.com/blog/embedded-blog/what-is-can-protocol-stack-why-its-critical-
software-solution-for-ecu-communication
5. https://www.infineon.com/cms/en/product/transceivers/automotive-transceiver/automotive-
can-transceivers/?gclid=CjwKCAiAtouOBhA6EiwA2nLKH-206vOZ05EGfiPZmFBDQ_J
59OQW0ovExNxIxgwbxgVm7QAZzrj1jBoC9oEQAvD_BwE&gclsrc=aw.ds
6. Guo S (2011) The application of CAN-bus technology in the vehicle. In: 2011 international
conference on mechatronic science, electric engineering and computer (MEC), pp 755–758.
https://doi.org/10.1109/MEC.2011.6025574
7. Li X, Li M (2010) An embedded CAN-BUS communication module for measurement and
control system. https://doi.org/10.1109/ICEEE.2010.5661248
8. Salunkhe AA, Kamble PP, Jadhav R (2016) Design and implementation of CAN bus protocol
for monitoring vehicle parameters. In: 2016 IEEE international conference on recent trends in
electronics, information & communication technology (RTEICT), pp 301–304. https://doi.org/
10.1109/RTEICT.2016.7807831
9. Vijayalakshmi S (2013) Vehicle control system implementation using CAN protocol. Int J Adv
Res Electr Electron Instrum Eng 2(6). ISSN (Print): 2320-3765, ISSN (Online): 2278-8875
10. Chikhale SN (2018) Automobile design and implementation of CAN bus protocol—a review.
IJRDO J Electr Electron Eng 4(1):01–05. ISSN: 2456-6055. Retrieved from http://www.ijrdo.
org/index.php/eee/article/view/1528
11. Naresh P, Haribabu AV (2015) Automatic rain-sensing wiper system for 4-wheeler vehicles. J
Adv Eng Technol 3:1–5
12. Balaji RD (2020) A case study on automatic smart headlight system for accident avoidance.
IJCCI 2(1):70–77
13. Wagh PA, Pawar RR, Nalbalwar SL (2017) A review on automotive safety system using can
protocol. Int J Curr Eng Sci Res (IJCESR) 4(3). ISSN (PRINT): 2393-8374, (ONLINE): 2394-
0697
14. Muhammad F, Dwi Yanto D, Martiningsih W, Noverli V, Wiryadinata R (2020) Design of
automatic headlight system based on road contour and beam from other headlights. In: 2020
2nd international conference on industrial electrical and electronics (ICIEE), pp 112–115.
https://doi.org/10.1109/ICIEE49813.2020.9276906
15. Laifenfeld M, Philosof T (2014) Wireless controller area network for in-vehicle communica-
tion. In: 2014 IEEE 28th convention of electrical and electronics engineers in Israel, IEEEI
2014. https://doi.org/10.1109/EEEI.2014.7005751
Multivariate Long-Term Forecasting
of T1DM: A Hybrid Econometric
Model-Based Approach
1 Introduction
Type 1 diabetes mellitus (T1DM) patients' glucose variations are highly influenced
by a multitude of parameters, namely insulin dosage, diet, lifestyle, sleep quality,
stress, etc. This demands that an accurate blood glucose prediction model
be based on a multivariate analysis approach. A few univariate blood glucose
prediction algorithms have been studied in the literature [1–5]. These univariate time
series algorithms predict the blood glucose values by considering only the blood
glucose history. The univariate models are reliable, simple and fast, and they are more
suitable for short-term prediction [6]. As a univariate model is less comprehensive,
it is more judicious to use a multivariate model for long-term prediction of blood
glucose values [7]. In this paper, the author implements an econometric model-based
approach, where multivariate time series algorithms are used for long-term blood
glucose forecasting.
A survey of various machine learning algorithms involving multiple variables is
discussed below.
Jensen et al. [8] have used machine learning approaches, namely the autoregressive
integrated moving average (ARIMA) and support vector regression (SVR) models,
for hypoglycemia prediction. They claim that the SVR model outperformed clinical
diagnosis, predicting 23% of hypoglycemic events 30 min in advance.
Marling et al. [9] divided the available patient dataset into training and testing
data. The initial 7 days of data were used to train the model, and the last 3 days were used
for testing. Moving average (MA), simple exponential smoothing (SES) and support vector
machine (SVM) techniques were used. SVM outperformed the others, obtaining an RMSE of
18.0 mg/dL for 30 min prediction horizon (PH) and RMSE of 30.9 mg/dL for 60 min
PH. Neural network-based approaches for predicting glucose levels are employed in
[10–14].
Eren-Orukulu et al. [15] collected databases of two patients under hospitalized
and normal life conditions, measured by a CGM sensor. The time series model
employed both recursive identification and change detection methods to adapt dynamically
to variability and glycemic disturbances among and within patients. Prediction
performances were evaluated based on glucose prediction error and Clarke's error grid
analysis (C-EGA). Analysis using C-EGA resulted in accurate readings of 90% or
more.
Petridis et al. [16] built a probabilistic time series predictor using Bayesian
combined predictor (BCP). The BCP outperformed the conventional predictors.
In the next section, the dataset used in this study is discussed.
Long-term prediction time series algorithms are implemented in this study, on two
different datasets, namely:
. Dataset A: Librepro CGM sensor dataset (Courtesy: Jnana Sanjeevini Diabetes
Hospital and Medical Center, Bangalore).
. Dataset B: Ohio T1DM CGM sensor dataset (Courtesy: Ohio University. A DUA
was signed by NMIT, Bangalore, and Ohio University for using the dataset strictly
for academic research).
Table 1 Performance statistics of univariate algorithms on the T1DM Librepro CGM sensor dataset
A, for a prediction horizon of 15 min [7]
Algorithm | RMSE | MAE | MAPE
Moving average | 18.09 | 13.86 | 10.76
Linear regression | 35.52 | 27.84 | 25.68
ARIMA | 7.07 | 5.12 | 3.98
Novel ensemble method | 7.38 | 5.47 | 3.22
Holts AAN | 7.98 | 5.91 | 4.57
Holts MMN | 8.504 | 6.11 | 4.65
The Ohio T1DM dataset consists of eight weeks of data from six type 1 diabetes patients.
The patients' ages were in the range 40–60 years; two were male and
four were female. All the patients used insulin pump therapy (Medtronic 530G) with
continuous glucose monitoring (CGM) (Medtronic Enlite). A custom smartphone was
used to record the hyper/hypoglycemic data, and a Basis Peak fitness band was used to
report physiological data.
The dataset consisted of CGM blood glucose levels recorded at 5 min intervals,
self-monitored finger-stick blood glucose levels, insulin dosage,
meal intake with an estimate of the carbohydrate rating (self-reported), exercise time
(self-reported), quality and duration of sleep, work and stress, illness, and every 5 min
recording of heart rate, galvanic skin response (GSR), skin and air temperatures and
step count.
Each contributor had two XML files, one for training data and the other for test data;
these are tabulated with respect to their ID numbers (Table 2).
Each XML file contained the following data fields (Table 3).
The Ohio T1DM viewer [18] is a visualization tool that graphically displays the XML files
of the Ohio T1DM dataset. The bottom panel shows the CGM data, insulin and
self-reported life events. The panel at the top displays the Basis Peak fitness band data.
A snapshot of the Ohio T1DM viewer is shown in Fig. 2.
The Ohio T1DM dataset was originally available only to the participants of the 3rd
international workshop at IJCAI-ECAI 2018. Later, this dataset was made available to
all researchers in the field of health care upon signing a non-disclosure
data use agreement (DUA).
3 Dataset B Preparation
Table 3 XML data fields with 19 different attributes per contributor [18]
Data field: Details
Patient: Patient ID number and the insulin type
Glucose level: CGM data at 5 min intervals
Finger stick: Finger stick-based BG values
Basal: Basal insulin infusion rate
temp_basal: Temporary basal insulin rate
Bolus: Insulin delivered before a meal or during hyperglycemia
Meal: Patient's meal timing and carbohydrate rating
Sleep: Patient's sleep quality scaled from 1 to 3, with 3 as good
Work: Patient's physical exercise and exertion rating scaled from 1 to 10, with 10 as most active
Stressors: Time of stressful events
Hypo-event: Time of hypoglycemic events
Illness: Time of sudden illness
Exercise: Exercise intensity scaled from 1 to 10, along with duration in minutes
basis_heart_rate: Heart rate collected at 5 min intervals
basis_gsr: Galvanic skin response collected at 5 min intervals
basis_skin_temperature: Skin temperature, in degrees Fahrenheit, collected at 5 min intervals
basis_air_temperature: Air temperature, in degrees Fahrenheit, collected at 5 min intervals
basis_steps: Step count collected at 5 min intervals
basis_sleep: Basis band reports when the subject is asleep, along with a sleep quality estimate
Fig. 2 Snapshot of Ohio T1DM viewer for day-wise display of integrated data [18]
This involves
. Splitting the 24 h day into 7 time-flag buckets, followed by realigning the data
as per the bucket.
. Classification of the independent data variables into physiological and psychological
data.
In this step, the 24 h time slots are divided into 7 different time-flag buckets as
tabulated in Table 5.
This method is advantageous as it helps in the validation process. During validation,
the begin and end time slots of the test window are checked and mapped to the
appropriate time-flag bucket. Forecasting is done considering the previous history
of blood glucose values from the corresponding time-flag bucket of the training
dataset. This leads to more reliable and accurate prediction of blood glucose values.
For example, to predict the blood sugar level at 8:00 a.m., the previous day's blood
sugar levels from the same time slot will also be considered by the forecasting algorithms.
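A minimal pandas sketch of the bucket-assignment idea: each CGM timestamp is mapped to one of 7 time-flag buckets, and history for a forecast is drawn from the matching bucket. The bucket boundaries below are illustrative assumptions; the paper's actual boundaries are listed in Table 5, which is not reproduced here.

import pandas as pd

# Hypothetical bucket boundaries; the paper's actual 7 buckets are in Table 5.
BUCKET_EDGES = [0, 4, 8, 11, 14, 17, 21, 24]          # hours of the day
BUCKET_LABELS = [f"bucket_{i}" for i in range(1, 8)]   # 7 time-flag buckets

def assign_bucket(ts: pd.Timestamp) -> str:
    """Map a timestamp to its time-flag bucket based on the hour of day."""
    return pd.cut([ts.hour], bins=BUCKET_EDGES, labels=BUCKET_LABELS,
                  right=False, include_lowest=True)[0]

# Hypothetical CGM samples at 5-minute intervals.
cgm = pd.DataFrame({
    "ts": pd.date_range("2021-01-01 07:50", periods=6, freq="5min"),
    "glucose": [112, 115, 119, 124, 128, 131],
})
cgm["bucket"] = cgm["ts"].apply(assign_bucket)

# To forecast 8:00 a.m., pull history from the same bucket on previous days.
history = cgm[cgm["bucket"] == assign_bucket(pd.Timestamp("2021-01-02 08:00"))]
print(history)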
Artificial intelligence (AI) models have so far considered only the quantifiable
physiological parameters for diabetes monitoring and prediction. The human facet,
i.e., the patient's psychological data, is the least considered by AI, while
it can actually play a very important role in improving the prevention and care of
diabetes. Ignoring this aspect can lead to erroneous treatment being dispensed
to a patient [19]. Hence, in this paper, the author considers both the physiological and
psychological data of a patient.
Here, the 19 different attributes in the dataset are classified into physiological and
psychological variables. The classification is listed in Fig. 6.
4 Analytical Model
The model is now built using the segregated variables as prepared in the above section
and is trained using multivariate machine learning algorithms. The analytic approach
used by the model for blood glucose prediction is as shown in Fig. 7.
Fig. 7 Analytic approach for blood glucose prediction: the XML files are split into train and test sets, prepared into psychological, physiological and hybrid models, and used to train and validate ARIMAX, ANN and SVM models that produce the blood glucose predictions
Three types of analytical model are built for glucose prediction, namely:
. Psychological Model: This model predicts the blood glucose value based on time-
based BG values and psychological variables.
. Physiological Model: This model predicts the blood glucose value based on time-
based BG values and physiological variables.
. Hybrid Model: This model predicts the blood glucose value based on time-based
BG values and a combination of physiological and psychological parameters.
The process flow of the model implementation using machine learning algorithms
is as depicted in the block diagram shown in Fig. 8.
The multivariate time series machine learning models are trained on the three
types of analytical models discussed above and are then validated for performance
using performance metrics. An 80:20 ratio is used for dataset division into train and test
sets, respectively. The performance metrics used for evaluation of the multivariate
models are:
. Mean absolute percentage error (MAPE) (a minimal computation sketch follows this list)
. Clarke's error grid analysis (EGA).
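For reference, MAPE as reported in the tables of this paper can be computed as in the following small Python sketch; the readings and predictions are hypothetical.

import numpy as np

def mape(actual, predicted) -> float:
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100.0)

# Hypothetical CGM readings (mg/dL) and model predictions.
y_true = [110, 125, 140, 133]
y_pred = [108, 130, 135, 129]
print(f"MAPE = {mape(y_true, y_pred):.2f}%")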
Blood glucose data is often affected by insulin reaction, physical activity, carbohydrate
intake, stress and similar events, which we can refer to as the involvement of
multiple events. Hence, multiple parameters need to be analyzed for BG
prediction.
The multivariate time series models employed in this paper are:
. Autoregressive integrated moving average with explanatory variables (ARIMAX)
. Artificial neural network (ANN)
. Support vector machine (SVM).
The ACF plot in Fig. 11 decays quickly (exponentially) and falls below
the significance range (dotted blue line), indicating a stationary data series.
Hence, after differencing, the dependent variable is made stationary. In the next
section, deployment of the multivariate time series models on the stationary dataset
is discussed.
4.1.1 ARIMAX
Table 6 MAPE for ARIMAX implemented on the physiological, psychological and hybrid models
MAPE for PH | 30 min | 60 min | 2 h | 3 h | 1 day
ARIMAX physio | 30.25 | 41.75 | 43.40 | 32.04 | 32.16
ARIMAX psycho | 33.88 | 34.63 | 35.19 | 29.93 | 24.39
ARIMAX hybrid (80:20 weightage) | 11.12 | 15.29 | 23.41 | 29.14 | 22.73
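A minimal sketch of fitting an ARIMAX-style model with exogenous regressors using statsmodels' SARIMAX. The order, exogenous columns, synthetic data and split sizes are placeholders, not the configuration tuned in this work.

import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(0)
n = 300

# Hypothetical 5-minute CGM series with two exogenous inputs
# (e.g., insulin bolus and carbohydrate estimate).
exog = pd.DataFrame({
    "bolus": rng.random(n),
    "carbs": rng.random(n),
})
glucose = pd.Series(120 + np.cumsum(rng.normal(0, 1, n))
                    + 5 * exog["bolus"] + 8 * exog["carbs"])

train, test = slice(0, 240), slice(240, n)   # 80:20 split as in the paper
model = SARIMAX(glucose[train], exog=exog[train], order=(1, 1, 1))
fitted = model.fit(disp=False)

# Out-of-sample forecast; future exogenous values must be supplied.
forecast = fitted.forecast(steps=60, exog=exog[test])
print(forecast.head())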
. Zone E: This zone indicates the points that would confuse the treatment between
hypoglycemia and hyperglycemia.
EGA, which was developed in 1987 [23], is used here to quantify the ARIMAX
hybrid model. The Clarke error grid analysis plots for the hybrid model for PH of 30 min
and 24 h are shown in Figs. 13 and 14.
The error grid analysis percentages are given in Table 7.
Table 7 shows that for ARIMAX, 30.20% of the points lie in Zone D, which represents
potentially dangerous failures to detect hypoglycemia or hyperglycemia events.
4.1.2 ANN
Table 8 MAPE for ANN implemented on the psychological (psycho), physiological (physio) and
hybrid models
MAPE for PH | 30 min | 60 min | 2 h | 3 h | 1 day
ANN physio | 22.06 | 24.03 | 27.26 | 23.61 | 21.15
ANN psycho | 34.14 | 34.56 | 37.37 | 33.96 | 18.78
ANN hybrid (80:20 weightage) | 3.70 | 6.89 | 14.86 | 21.49 | 13.37
$y_t = w_0 + \sum_{j=1}^{q} w_j \, g\!\left(w_{0j} + \sum_{i=1}^{p} w_{ij}\, y_{t-i}\right) + \varepsilon_t$  (4)
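A minimal sketch of an ANN applied to multivariate inputs in the spirit of Eq. (4), using scikit-learn's MLPRegressor. The feature construction (lagged glucose plus physiological and psychological inputs), the synthetic data and the network size are illustrative assumptions rather than the architecture used here.

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
n = 500

# Hypothetical inputs: lagged glucose values plus exogenous variables
# (heart rate, sleep quality); the next glucose value is the target.
glucose = 120 + np.cumsum(rng.normal(0, 1, n))
heart_rate = 70 + rng.normal(0, 5, n)
sleep_quality = rng.integers(1, 4, n)

p = 6  # number of glucose lags fed to the network
lags = np.column_stack([glucose[i:n - p + i] for i in range(p)])
X = np.column_stack([lags, heart_rate[p:], sleep_quality[p:]])
y = glucose[p:]

split = int(0.8 * len(y))                      # 80:20 split as in the paper
model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                                   random_state=0))
model.fit(X[:split], y[:split])
print("test MAPE (%):",
      np.mean(np.abs((y[split:] - model.predict(X[split:])) / y[split:])) * 100)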
The capability of SVM to solve nonlinear problems makes it interesting and suitable
for time series forecasting [26, 27]. Considering a data training set
$T = \{(x_i, y_i)\}_{i=1}^{n}$ [28], the SVR estimation function can be written as
$f(x) = w^{T}\phi(x) + b$
where $w$ is the weight vector, $b$ is the bias and $\phi(x_i)$ is the high dimensional feature
space, which is linearly mapped from the input space x.
The number of support vectors used was 22,185.
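A minimal scikit-learn sketch of support vector regression with an RBF kernel on the same style of feature matrix. The kernel choice, hyperparameters and synthetic data are placeholders, not the settings behind the 22,185 support vectors reported above.

import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(2)
n = 400

# Hypothetical feature matrix: lagged glucose plus exogenous variables.
X = rng.normal(size=(n, 8))
y = 120 + 10 * X[:, 0] - 5 * X[:, 1] + rng.normal(0, 2, n)

split = int(0.8 * n)                                  # 80:20 split
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.5))
svr.fit(X[:split], y[:split])

print("number of support vectors:", svr.named_steps["svr"].support_.shape[0])
print("first test predictions:", np.round(svr.predict(X[split:][:3]), 1))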
Table 10 lists the MAPE for SVM implemented only on the hybrid model, with the weightage
set to 80:20, because SVM performed poorly on the physiological and psychological
models.
The Clarke error grid analysis plots for the hybrid model for PH of 30 min and 24 h are
shown in Figs. 17 and 18.
The error grid analysis percentages are given in Table 11.
Table 11 shows that for SVM, 25% of the points lie in Zone D, which represents
potentially dangerous failures to detect hypoglycemia or hyperglycemia events.
Table 12 compares the ARIMAX-, ANN- and SVM-based hybrid models
implemented.
Comparison Table 12 reveals that the ANN hybrid model yields the least MAPE
of 13.37 for a prediction horizon of 24 h, and Fig. 19 also shows that ANN follows
the trend of the actual data, indicating that it is the best long-term prediction model for
blood glucose prediction on the Ohio T1DM dataset B. Hence, the author recommends
ANN for multivariate long-term forecasting of blood glucose values. The
proposed hybrid model also outperforms the other two single models, paving the way
for further work on the hybrid permutation model hypothesis.
As seen in Fig. 20, the ANN hybrid model yielded better results for 1-day-ahead
prediction, with the scatter plot confined to the A, B and D zones, since it had more data
points to train on.
Fig. 20 1 day ahead prediction and EGA plot of hybrid ANN model
6 Conclusion
Time series forecasting requires data at regular intervals for accurate forecasting;
missing values cannot be left unfilled. Hence, interpolation played a major role in our data
preparation. Also, some of the self-reported data was converted to categorical. The
Acknowledgements I thank Jnana Sanjeevini Hospital, Bangalore, and Ohio University for
providing their datasets for this research work.
References
1. Frandes M, Timar B, Timar R et al (2017) Chaotic time series prediction for glucose dynamics
in type 1 diabetes mellitus using regime-switching models. Sci Rep 7. Article Number 6232.
https://doi.org/10.1038/s41598-017-06478-4
2. Bremer T, Gough DA (1999) Is blood glucose predictable from previous values? A solicitation
for data. Diabetes 48:445–451
3. Sparacino G, Zanderigo F, Maran A, Facchinetti A, Cobelli C (2007) Glucose concentration
can be predicted ahead in time from continuous glucose monitoring sensor time-series. IEEE
Trans Biomed Eng 54:931–937
4. El Youssef J, Castle J, Ward WK (2009) A review of closed-loop algorithms for glycemic
control in the treatment of type 1 diabetes. Algorithms 2:518–532
5. Gani A, Gribok A, Rajaraman S, Ward W, Reifman J (2009) Predicting subcutaneous glucose
concentration in humans: data-driven glucose modeling. IEEE Trans Biomed Eng 56:246–254
6. Pros and cons of using the univariate model of financial analysis. http://www.floridabankrupt
cyblog.com/pros-and-cons-of-using-the-univariate-model-of-financial-analysis/. Accessed 20
Dec 2018
7. Phadke R, Prasad V, Nagaraj HC (2019) Time series based short term T1DM prediction of
Librepro CGM sensor data: a novel ensemble method. Int J Eng Adv Technol (IJEAT) 8(6)
8. Jensen M, Cristensen TF, Tarnow L, Seto E, Johansen M, Hejlesen O (2013) Real time hypo-
glycemia detection from continuous glucose monitoring data of subjects with type 1 diabetes.
Diabetes Technol Ther 15(7)
9. Marling C, Wiley M, Buneseu R, Shudrook J, Schwartz F (2012) Emerging applications for
intelligent diabetes management. AI Mag 67
10. Polat K, Gunes S (2007) An expert system approach based on principal component analysis and
adaptive neuro-fuzzy inference system to diagnosis of diabetes disease. Digit Signal Process
702–710
11. Baghdadi G, Nasrabadi A (2007) Controlling blood levels in diabetics by neural network
predictor. Eng Med Biol Soc 3216–3219
12. Zecchin C, Facchinetti A, Sparacino G, Nicolao GD, Cobelli C (2012) Neural network incorpo-
rating meal information improves accuracy of short-time predictions of glucose concentration.
IEEE Trans Biomed Eng
13. Hamdi T et al (2017) Artificial neural network for blood glucose level prediction. In:
International conference on smart, monitored and controlled cities
14. Pappada S, Cameron BD, Rosman PM, Bourey RE, Papadimos TJ, Olorunto W, Borst MJ
(2011) Neural network based real time prediction of glucose in patients with insulin dependent
diabetes. Diabetes Technol Ther 135–141
15. Eren-Orukulu M, Cinar A, Quinn L, Smith D (2009) Estimation of future glucose concentrations
with subject specific recursive linear models. Diabetes Technol Ther 243–253
16. Petridis V, Kehagias A, Petrou L, Bakirtzis A, Kiartzis S, Panagiotou H, Masalaris N (2001) A
Bayesian multiple models combination method for time series prediction. J Intell Robot Syst
31(1–3):69–89
17. https://diatribe.org/abbott-freestyle-libre-pro-cgm-system-fda-approval. [Online]. Accessed 2
Mar 2016
18. Marling C, Bunescu C (2018) The OhioT1DM dataset for blood glucose level prediction. In:
IIIrd international workshop on knowledge discovery in healthcare data, Stockholm, Sweden,
13 July 2018
19. Phadke R, Prasad V, Nagaraj HC (2019) Precise humane diabetes management: synergy of
physiological and psychological data in AI based diabetes. Int J Sci Technol Res 8(11)
20. Pankratz A (1991) Forecasting with dynamic regression models. Wiley-Interscience
21. Khashei M, Bijari M, Ardali GAR (2009) Improvement of auto-regressive integrated moving
average models using fuzzy logic and artificial neural networks (ANNs). Neurocomputing
72(4–6):956–967
22. Adebiyi AA, Adewumi AO, Ayo CK (2014) Comparison of ARIMA and artificial neural
networks models for stock price prediction. J Appl Math 2014:7 pages. Article ID 614342
23. Clarke WL (2005) The original Clarke error grid analysis (EGA). Diabetes Technol Ther
7(5):776–779. https://doi.org/10.1089/dia.2005.7.776
24. Zhang G, Patuwo B, Hu MY (1998) Forecasting with artificial neural networks: the state of
the art. Int J Forecast 14(1):35–62
25. Vapnik VN (1995) The nature of statistical learning theory, 1st edn. Springer-Verlag, New York
26. Zbikowski K (2014) Using volume weighted support vector machines with walk forward testing
and feature selection for the purpose of creating stock trading strategy. Expert Syst Appl 42
27. Cao LJ, Tay EH (2001) Support vector with adaptive parameters in financial time series
forecasting. IEEE Trans Neural Netw 14:1506–1518
28. Ojemakinde BT (2006) Support vector regression for non-stationary time series. Master thesis,
University of Tennessee, Knoxville
29. Smith TF, Waterman MS (1981) Identification of common molecular subsequences. J Mol Biol
147:195–197. https://doi.org/10.1016/0022-2836(81)90087-5
30. May P, Ehrlich H-C, Steinke T (2006) ZIB structure prediction pipeline: composing a complex
biological workflow through web services. In: Nagel WE, Walter WV, Lehner W (eds) Euro-
Par 2006. LNCS, vol 4128. Springer, Heidelberg, pp 1148–1158. https://doi.org/10.1007/118
23285_121
31. Foster I, Kesselman C (1999) The grid: blueprint for a new computing infrastructure. Morgan
Kaufmann, San Francisco
32. Czajkowski K, Fitzgerald S, Foster I, Kesselman C (2001) Grid information services for
distributed resource sharing. In: 10th IEEE international symposium on high performance
distributed computing. IEEE Press, New York, pp 181–184. https://doi.org/10.1109/HPDC.
2001.945188
33. Foster I, Kesselman C, Nick J, Tuecke S (2002) The physiology of the grid: an open grid
services architecture for distributed systems integration. Technical report, Global Grid Forum
34. National Center for Biotechnology Information. http://www.ncbi.nlm.nih.gov
Master and Slave-Based Test-Bed
for Waste Collection and Disposal:
A Dissertation
1 Introduction
The process of automating things is exploited in almost every major area of life.
Making things automatic reduces the burden on humans. The cost and effort used in
manually controlled products is much higher than in automated systems. Considering
that effective waste management is one of the biggest problems
of modern times, it is absolutely necessary to address this problem [1]. A correct
waste management system is a must for a sanitary society in general and for the
world as a whole. Waste management includes the planning, financing, construction,
and operation of facilities for the collection, transport, recycling, and final disposal of
waste [2]. Currently, large cities around the world are in need of demanding solutions
for solid waste management (SWM), due to the growth of residential areas and the
economy. SWM is an expensive urban service that consumes around 20–50% of the
annual municipal budget in developing countries [3].
In the proposed system, there are two main participant systems, the master bin
and slave bin. The master bin is a trash collector and disposer that receives a signal
from the slave source garbage bin and begins its procedure after receiving it. The
garbage collector moves around a corridor or along predetermined pathways within
the building or neighborhood, stopping at designated slave bins to collect trash [4].
The lane that the master bin follows is marked in black so that infrared (IR)
sensors can readily pick it up. When the master bin fills up, a signal is sent out,
and the container proceeds to its dumpsite without stopping at any other collection
station. The collection and disposal of garbage information can be tracked through
a Web site, and messages are sent to the concerned authorities. This prototype
of the proposed system automates the collection and disposal mechanism of an area or
institutional premises. The system comprises mainly multiple slave bins and one
master bin as participants. It capitalizes on the demarcations made at the edges of roads
or corridors of premises to make the system functional for its objective. The multiple
slave bins of an area are connected to the master bin wirelessly using X-bee
in this prototype. Each slave bin continuously monitors the garbage pile-up level in
it. As garbage piles up in any of the slave bins, the slave bin communicates
this situation wirelessly to the master. The master bin in this system is a maneuverable
asset that, upon receiving the information from any source bin of an area, activates
its automatic collection and disposal mechanism [5, 6]. The master generally stays
put until an interrupt is received from a slave; the moment it
receives the interrupt, it starts maneuvering toward the slave bin. The appropriate slave
bins in the system are located by comparing radio frequency identification (RFID)
tags with a pre-stored database [7]. The master moves to the area of the particular
slave bin and executes the collection and disposal mechanism. The system thus comprises
multiple slaves connected wirelessly to a single master through X-bee, with movement
to the source location; the general closed-loop system that facilitates the master is shown
in Fig. 1.
In the present scenario, there are many instances of untimely collection of
garbage and its disposal, creating a garbage menace [8]. Garbage overflowing
in a locality/premises creates an esthetically unpleasant environment, encourages
epidemics, and makes garbage segregation a further challenge [7, 8]. The mix of wet
and dry waste makes segregation difficult, which in turn hampers the recycling and
reuse of garbage. The main objectives of the paper are to automate the monitoring of
garbage pile-up levels in localities/institutional premises via multiple slave bins; initiate
automatic collection and disposal mechanisms via the master and slave garbage bins and
keep the concerned parties informed; notify and alert authorities on violations of
segregation norms or other issues; keep a record of garbage pile-up and collection on a
server; and keep the surrounding area clean, hygienic, and healthy [9, 10].
2 Test-Bed Requisites
The prototype of the proposed system automates the collection and disposal mechanism
of an area or institutional premises. The system comprises mainly multiple slave bins
and one master bin as participants. The hardware required is an Arduino Mega microcontroller,
ultrasonic sensors, a GSM/GPRS module, X-bee, IR sensors, DC motors, a motor control
board, an RFID reader, servomotors, LEDs, and a buzzer; the software required is the
Arduino IDE and XCTU.
• Arduino Mega 2560 and Communication Network in X-bee: Arduino is an
open-source platform for creating electronic projects that combines several parts
and interfaces into a single board [10]. It comes with everything needed to get started
with the microcontroller and can be plugged into a computer with a USB connection
or powered by a battery or an AC-to-DC adapter. The integrated development environment
(IDE) is used to program the Arduino Mega 2560; it is the same for all boards
and works both online and offline. Almost all Arduino shields are also compatible
with the Mega. OEMs may employ X-bee RF modules, which share a universal footprint
that can be used on a variety of platforms, including point-to-multipoint, ZigBee/Mesh, and
2.4 GHz and 900 MHz solutions. OEMs that use the X-bee can switch out one
X-bee for another according to the application's requirements, reducing development
time, risk, and time-to-market. To be set up on a specific network or to
serve as a master or slave X-bee, the X-bees must be programmed using the XCTU
software.
• Ultrasonic Sensor (HCSR04) and IR Sensors: Ultrasonic sensors are devices
that use electrical–mechanical energy transformation to measure the distance of a
target object from the sensor [11]. An infrared sensor is electronic equipment that
emits and/or detects infrared radiation to perceive certain features of its surroundings.
Infrared sensors are also effective in detecting motion and measuring the
heat released by an object.
• RFID Reader: A radio frequency identification reader (RFID reader) is a device
used to collect information from an RFID tag, which is used to keep track
of individual objects. Radio waves are utilized to transfer data from the tag to a
reader [12, 13].
• Servomotor and SIM 900A: A servomotor is a spinning or linear actuator that
can regulate angular or linear position, velocity, and acceleration with pinpoint
accuracy. It is constructed from a suitable motor, a gear, and a position feedback
sensor. The servomotor’s job is to take a control signal that indicates the desired
output position of the servo shaft and drive its DC motor until the shaft rotates to
that point. The motor used here was chosen for its torque rating. The SIM900A is
a GSM/GPRS module which can be used to send/receive messages and make
calls [14, 15]. It can also connect to the Internet using GPRS. The SIM900 Quad-
band/SIM900A dual-band GSM/GPRS module includes a breakout board and a
basic system. It uses AT commands to interact with controllers (GSM 07.07,
07.05, and SIMCOM-enhanced AT commands). The software power-on and reset
functions are supported by this module.
• Motor Control Board (LM298) and DC Motor: LM298 is a motor control board
which controls the operation of DC motors. It has three ports: 5 V and 12 V power supply
and one ground. It takes input from four ports for two DC motors to be controlled
in either direction. The robot runs on wheels and has an arm which
rotates; this is made possible by DC motors. In order to attain linear motion of
the vehicle, DC motors are used. There are six DC motors fixed at the ends of the
robot for its movement, and three motors are required for the robotic
hand to work properly.
• Arduino Nano Overview and XCTU: The ATmega328-based Arduino Nano
is a compact, comprehensive, and breadboard-friendly board. It functions in a
similar way to the Arduino Duemilanove but in a different package. It lacks
a DC power jack and uses a Mini-B USB cable rather than a standard
one. XCTU is a free multi-platform application that lets developers use a graphical
user interface to interact with Digi RF modules. It makes it simple to set up, configure,
and test an X-bee module and includes new tools. Two unique features are the interface
that depicts the X-bee network graphically, along with the signal strength of each
connection, and the X-bee API frame builder, which intuitively facilitates the generation
and interpretation of API frames for X-bees in API mode. XCTU may be configured
on a variety of RF devices [16]. Frame generators, a frame interpreter, recovery, range
test, and firmware explorer are just a few examples of embedded tools that may be
utilized without an RF module (Table 1).
The master bin is the central system in this prototype, and its block diagram is shown
in Fig. 2. The master bin is mainly designed to reach the destination of the garbage source,
collect the waste, and dispose of it at another site. It is also programmed and
designed to raise alerts. The master bin comprises an MCU, which is an Arduino Mega in
this case. The Arduino Mega is interfaced with the line-following robot system, water
sensors, X-bee, a GSM/GPRS module, servomotors, a buzzer, and ultrasonic sensors; the
master bin follows the path and is fixed on top of the line-follower robot. This bin has an
Arduino interfaced with ultrasonic sensors to check the pile-up level, X-bee to receive the
wireless signal from a slave bin when the slave garbage source bin is full, and servomotors
for operating the collection and disposal mechanism automatically. The master bin receives
the information that a source garbage bin is full wirelessly through X-bee from the slave
(source) bin when it is filled with garbage up to the brim.
After the interrupt, the collection mechanism is initiated in the master bin. Once it
initiates its movement by activating the line-follower bot toward the slave bin, a message
is sent to the concerned authorities about the initiation of garbage collection from
the particular slave bin. A message is also sent to the public of the corresponding slave bin
area to refrain from adding more garbage until cleanup has taken place, and the frequency
of collection at different sites is uploaded to a Web page using the GSM/GPRS module;
thus, the garbage collection rate at different sites can be monitored and studied.
Ultrasonic sensors are incorporated in the bin to monitor its own garbage filling level and
give an appropriate alert and indication of filling up. A water sensor senses whether
any wet waste is present in the system; if present, it alerts the concerned people and
raises an alarm. An actuator mechanism is set up using a servomotor, which allows
control of the angular position and is used to open the lids of the bin for
collection and disposal. When the main bin arrives at the correct source bin to collect
the garbage, the upper lid operates with the actuation of the servomotor, and the garbage is
collected. The lower lid opens at the disposal point when the bin has to dispose of
the garbage.
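In the prototype, the collection statistics are pushed over GSM/GPRS using AT commands. For illustration only, the equivalent ThingSpeak channel update over HTTP can be sketched in Python as below; the write API key and the mapping of channel fields are hypothetical.

import requests

THINGSPEAK_WRITE_KEY = "XXXXXXXXXXXXXXXX"   # hypothetical channel write key

def log_collection_event(slave_bin_id: int, fill_percent: float) -> int:
    """Push one garbage-collection record to a ThingSpeak channel.

    field1/field2 are assumed channel fields for the bin ID and its
    fill level; the real channel layout may differ.
    """
    response = requests.get(
        "https://api.thingspeak.com/update",
        params={
            "api_key": THINGSPEAK_WRITE_KEY,
            "field1": slave_bin_id,
            "field2": round(fill_percent, 1),
        },
        timeout=10,
    )
    return response.status_code

# Example: the master bin just collected from slave bin 3, which was 92% full.
print(log_collection_event(slave_bin_id=3, fill_percent=92.0))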
The block diagram of the slave bin is shown in Fig. 3. The slave bins are the source
garbage-generating points. These are to be stationed at a strategic height and position
to facilitate automatic collection and disposal. The master arrives under the slave
bin, which is sensed by sensors incorporated in the slave, which then operates its disposal
mechanism. The slave bin consists of an array of LEDs whose colors range from red to
green. It is also interfaced with X-bee, ultrasonic sensors, and servomotors. The level of
garbage in the source bin is sensed by the ultrasonic sensor and is transferred serially to
the Arduino; when the sensed value falls below the critical value, information
is sent through X-bee to the master bin for it to initiate the automatic collection.
Bin pile-up levels are visually indicated by LEDs, which change from green to red as the
garbage pile-up level increases. Once the bin is filled, the buzzer and red LEDs are turned
on. As soon as the master arrives, the ultrasonic sensors fixed at the bottom of the slave
bin sense the change in distance, thus opening the bottom lid and pushing the garbage out
using the servomotor mechanism, as depicted in Figs. 4 and 5.
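The fill-level logic of the slave bin can be summarized by the following Python sketch (the actual firmware runs on the Arduino); the bin depth, thresholds and LED colour bands are illustrative assumptions.

BIN_DEPTH_CM = 40.0         # hypothetical depth of the slave bin
CRITICAL_FILL_PERCENT = 85  # above this, the master is summoned via X-bee

def fill_percent(distance_cm: float) -> float:
    """Convert the ultrasonic distance to the garbage surface into a
    fill percentage (smaller distance means a fuller bin)."""
    level = max(0.0, min(BIN_DEPTH_CM, BIN_DEPTH_CM - distance_cm))
    return 100.0 * level / BIN_DEPTH_CM

def led_colour(percent: float) -> str:
    """Map the fill level onto the green-to-red LED bands."""
    if percent < 50:
        return "green"
    if percent < CRITICAL_FILL_PERCENT:
        return "yellow"
    return "red"

for distance in (35.0, 18.0, 4.0):           # example ultrasonic readings (cm)
    pct = fill_percent(distance)
    summon_master = pct >= CRITICAL_FILL_PERCENT
    print(f"distance={distance:4.1f} cm fill={pct:5.1f}% "
          f"led={led_colour(pct):6s} summon_master={summon_master}")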
A robot that can follow a path is known as a line-follower robot, shown in Fig. 6. On
a white surface, the route may be seen as a black line (or vice versa). It is a combined
design based on mechanical, electrical, and computer engineering understanding.
A line follower is designed in order to carry the master bin on top of it. A line
follower has IR sensors which detect the intensity of light being reflected from the
ground; it is programmed to run on a black line on a white surface or any
dark line on a light surface. Depending upon the output of the IR sensors, the line-following
robot is coordinated to move in the appropriate direction.
The motors of this line-follower robot are driven using a motor driving circuit
which, depending upon the input from the Arduino, runs or stops the motors to steer
the robot in the appropriate direction.
4 Detailed Discussions
A servomotor is used in this prototype; it is fitted at the edge of the lids of both
the master and slave bins, and the degree of rotation of the motor is controlled to open
and close the lids of both the slave and the master bin. An ultrasonic sensor
is fitted at the base of the slave bin so that it detects the main bin which comes to
collect the garbage. After detecting the correct RFID tag, the master bin, i.e., the
line follower, stops; the upper lid of the master opens, and the bottom lid of the slave
opens. The internal mechanism of the slave bin facilitates the push of garbage from the slave
to the master. Figure 9 shows the experimental setup of the line-following robot with
attached sensors.
Figures 10, 11 and 12 depict garbage collection statistics for a three-month period
(from April 10, 2021 to July 10, 2021). The waste was disposed of after the bin was full.
Fig. 9 Experimental setup: a line following robot attached with sensors and b master following
the line to reach slave location
Fig. 10 Update of garbage collection data on server-ThingSpeak-channel status from 10th April
2021 to 10th May 2021
5 Conclusion
The automatic collection and disposal of garbage could be achieved through
interconnected slave and master garbage bins. This system could also reduce
the unawareness among authorities and the public regarding garbage bin piling,
collection, and disposal, as the system could update the statistics of collection
and disposal on the server. The system also kept the public and authorities informed and
apprised by sending them periodic messages. This system also kept a check on whether
the public was following segregation norms or not, hence maintaining the segregation of dry
and wet waste efficiently. Having such a system in place could reduce unhygienic
conditions and could bring down the epidemic level emanating from untimely
collection and non-segregation of waste. This system also facilitated resource
management in terms of labor management by making everything automated.
Fig. 11 Update of garbage collection data on server-ThingSpeak-channel status from 10th May
2021 to 10th June 2021
Fig. 12 Update of garbage collection data on server-ThingSpeak-channel status from 10th June
2021 to 10th July 2021
References
1. Arebey M et al (2009) Solid waste monitoring and management using RFID, GIS and GSM.
In: 2009 IEEE student conference on research and development (SCOReD). IEEE
2. Arampatzis T, Lygeros J, Manesis S (2005) A survey of applications of wireless sensors
and wireless sensor networks. In: Proceedings of the 2005 IEEE international symposium
on Mediterranean conference on control and automation intelligent control, 2005. IEEE
3. Longhi S, Marzioni D, Alidori E, Di Buo G, Prist M, Grisostomi M, Pirro M (2012) Solid
waste management architecture using wireless sensor network technology. In: Proceedings of
the 2012 5th international conference on new technologies, mobility and security (NTMS),
Istanbul, Turkey, 7–10 May 2012, pp 1–5
4. Narendra Kumar G, Swamy C, Nagadarshini KN (2014) Efficient garbage disposal management
in metropolitan cities using VANETs. J Clean Energy Technol 2:258–262
5. Saji RM, Gopakumar D, Kumar SH, Sayed KNM, Lakshmi S (2016) A survey on smart garbage
management in cities using IoT. Int J Eng Comput Sci 5:18749–18754
6. Younis O, Fahmy S (2004) HEED: a hybrid, energy-efficient, distributed clustering approach
for ad hoc sensor networks. IEEE Trans Mob Comput 3:366–379
7. Al-Khatib IA, Monou M, Abu Zahra ASF, Shaheen HQ, Kassinos D (2010) Solid waste char-
acterization, quantification and management practices in developing countries. A case study:
Nablus district, Palestine. J Environ Manage 91(5):1131–1138
8. Eisted R, Larsen A, Christensen T (2009) Collection, transfer and transport of waste: accounting
of greenhouse gases and global warming contribution. Waste Manage Res 27(8):738–745
9. Bhat VN (1996) A model for the optimal allocation of trucks for solid waste management.
Waste Manage Res 14(1):87–96
10. Hannan MA, Arebey M, Basri H (2010) Intelligent solid waste bin monitoring and management
system. Aust J Basic Appl Sci 4(10):5314–5319. ISSN 1991-8178
11. Ali ML, Alam M, Rahaman MANR (2012) RFID based e-monitoring system for municipal solid
waste management. In: 7th international conference on electrical and computer engineering,
Dec 2012
12. Singh T, Mahajan R, Bagai D (2016) Smart waste management using wireless sensor network.
Int J Innov Res Comput Commun Eng (IJIRCCE) 4(6)
13. Healy M, Newe T, Lewis E (2008) Wireless sensor node hardware: a review. In: 2008 IEEE
sensors, pp 621–624
14. Nithya L, Mahesh M (2016) A smart waste management and monitoring system using automatic
unloading robot. Int J Innov Res Comput Commun Eng (IJIRCCE) 4(12):20838–20845
15. Bashir A, Banday SA, Khan AR, Shafi M (2013) Concept, design and implementation of
automatic waste management system. Int J Recent Innov Trends Comput Commun (IJRITCC)
1(7):604–609
16. Al Mamun MA, Hannan MA, Hussain A, Basri H (2015) Integrated sensing systems and
algorithms for solid waste bin state management automation. IEEE Sens J 15(1):561–567