Advances in VLSI and Embedded Systems: Zuber Patel Shilpi Gupta Nithin Kumar Y. B. Editors
Zuber Patel
Shilpi Gupta
Nithin Kumar Y. B. Editors
Advances
in VLSI and
Embedded
Systems
Select Proceedings of AVES 2019
Lecture Notes in Electrical Engineering
Volume 676
Series Editors
Leopoldo Angrisani, Department of Electrical and Information Technologies Engineering, University of Napoli
Federico II, Naples, Italy
Marco Arteaga, Departament de Control y Robótica, Universidad Nacional Autónoma de México, Coyoacán,
Mexico
Bijaya Ketan Panigrahi, Electrical Engineering, Indian Institute of Technology Delhi, New Delhi, Delhi, India
Samarjit Chakraborty, Fakultät für Elektrotechnik und Informationstechnik, TU München, Munich, Germany
Jiming Chen, Zhejiang University, Hangzhou, Zhejiang, China
Shanben Chen, Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
Tan Kay Chen, Department of Electrical and Computer Engineering, National University of Singapore,
Singapore, Singapore
Rüdiger Dillmann, Humanoids and Intelligent Systems Laboratory, Karlsruhe Institute for Technology,
Karlsruhe, Germany
Haibin Duan, Beijing University of Aeronautics and Astronautics, Beijing, China
Gianluigi Ferrari, Università di Parma, Parma, Italy
Manuel Ferre, Centre for Automation and Robotics CAR (UPM-CSIC), Universidad Politécnica de Madrid,
Madrid, Spain
Sandra Hirche, Department of Electrical Engineering and Information Science, Technische Universität
München, Munich, Germany
Faryar Jabbari, Department of Mechanical and Aerospace Engineering, University of California, Irvine, CA,
USA
Limin Jia, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Alaa Khamis, German University in Egypt El Tagamoa El Khames, New Cairo City, Egypt
Torsten Kroeger, Stanford University, Stanford, CA, USA
Qilian Liang, Department of Electrical Engineering, University of Texas at Arlington, Arlington, TX, USA
Ferran Martín, Departament d’Enginyeria Electrònica, Universitat Autònoma de Barcelona, Bellaterra,
Barcelona, Spain
Tan Cher Ming, College of Engineering, Nanyang Technological University, Singapore, Singapore
Wolfgang Minker, Institute of Information Technology, University of Ulm, Ulm, Germany
Pradeep Misra, Department of Electrical Engineering, Wright State University, Dayton, OH, USA
Sebastian Möller, Quality and Usability Laboratory, TU Berlin, Berlin, Germany
Subhas Mukhopadhyay, School of Engineering & Advanced Technology, Massey University, Palmerston
North, Manawatu-Wanganui, New Zealand
Cun-Zheng Ning, Electrical Engineering, Arizona State University, Tempe, AZ, USA
Toyoaki Nishida, Graduate School of Informatics, Kyoto University, Kyoto, Japan
Federica Pascucci, Dipartimento di Ingegneria, Università degli Studi “Roma Tre”, Rome, Italy
Yong Qin, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China
Gan Woon Seng, School of Electrical & Electronic Engineering, Nanyang Technological University,
Singapore, Singapore
Joachim Speidel, Institute of Telecommunications, Universität Stuttgart, Stuttgart, Germany
Germano Veiga, Campus da FEUP, INESC Porto, Porto, Portugal
Haitao Wu, Academy of Opto-electronics, Chinese Academy of Sciences, Beijing, China
Junjie James Zhang, Charlotte, NC, USA
The book series Lecture Notes in Electrical Engineering (LNEE) publishes the latest
developments in Electrical Engineering—quickly, informally and in high quality.
While original research reported in proceedings and monographs has traditionally
formed the core of LNEE, we also encourage authors to submit books devoted to
supporting student education and professional training in the various fields and
application areas of electrical engineering. The series covers classical and emerging
topics concerning:
• Communication Engineering, Information Theory and Networks
• Electronics Engineering and Microelectronics
• Signal, Image and Speech Processing
• Wireless and Mobile Communication
• Circuits and Systems
• Energy Systems, Power Electronics and Electrical Machines
• Electro-optical Engineering
• Instrumentation Engineering
• Avionics Engineering
• Control Systems
• Internet-of-Things and Cybersecurity
• Biomedical Devices, MEMS and NEMS
For general information about this book series, comments or suggestions, please
contact [email protected].
To submit a proposal or request further information, please contact the
Publishing Editor in your country:
China
Jasmine Dou, Associate Editor ([email protected])
India, Japan, Rest of Asia
Swati Meherishi, Executive Editor ([email protected])
Southeast Asia, Australia, New Zealand
Ramesh Nath Premnath, Editor ([email protected])
USA, Canada:
Michael Luby, Senior Editor ([email protected])
All other Countries:
Leontina Di Cecco, Senior Editor ([email protected])
** Indexing: The books of this series are submitted to ISI Proceedings,
EI-Compendex, SCOPUS, MetaPress, Web of Science and SpringerLink **
Editors
Zuber Patel
Department of Electronics Engineering
S. V. National Institute of Technology
Surat, Gujarat, India

Shilpi Gupta
Department of Electronics Engineering
S. V. National Institute of Technology
Surat, Gujarat, India

Nithin Kumar Y. B.
National Institute of Technology Goa
Ponda, Goa, India
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Committees
Organizing Committee
Organizing Chairs
Anand D. Darji, SVNIT, Surat
Rasika Dhavse, SVNIT, Surat
Zuber Patel, SVNIT, Surat
Pinalkumar Engineer, SVNIT, Surat
Program Chairs
Virendra Singh, IIT Bombay, India
R. M. Patrikar, VNIT Nagpur, India
Vineet Sahula, MANIT Jaipur, India
Web Chairs
Seena V., IIST, India
Mahendra Sakre, IIT Ropar, India
Hospitality Chairs
A. H. Lalluwadia, SVNIT Surat, India
Abhilash Mandloi, SVNIT Surat, India
Mehul Patel, SVNIT Surat, India
Registration Chairs
Shweta Shah, SVNIT Surat, India
K. P. Upla, SVNIT Surat, India
P. K. Shah, SVNIT Surat, India
Sponsorship Chairs
Jignesh Sarvaiya, SVNIT Surat, India
P. N. Patel, SVNIT Surat, India
Manish Kumar, MMMUT Gorakhpur, India
Publication Chairs
Zuber Patel, SVNIT, Surat
Shilpi Gupta, SVNIT, Surat
Nithin Kumar Y. B., NIT Goa, India
Exhibition Chairs
N. B. Kanirkar, SVNIT Surat, India
G. Santra, SVNIT Surat, India
Abhishek Agrahari, Mentor Graphics, India
Advisory Committee
Pinalkumar Engineer
Message from General Chairs
Anand D. Darji
Zuber Patel
Rasika Dhavse
Pinalkumar Engineer
General Chairs, AVES 2019
Department of Electronics Engineering, SVNIT, Surat
Preface
Editorial Team
Surat, India Zuber Patel
Surat, India Shilpi Gupta
Ponda, India Nithin Kumar Y. B.
Acknowledgements
The journey of the conference started many months ago in the Department of
Electronics Engineering, SVNIT. The General Chairs, Mr. A. D. Darji,
Mr. Z. M. Patel, Mrs. R. N. Dhavse, and Mr. P. J. Engineer, initiated this task,
which was then carried forward by the various committees, who worked diligently.
Throughout this journey, whole-hearted support was received from the staff of the
Electronics Engineering Department of SVNIT and from various people of TEQIP-III.
We extend our special thanks to the authors for their noteworthy contributions.
Our esteemed sponsors are Springer, TEQIP-III, Cadre, and Optimized Solutions
Limited, without whose financial support this conference would not have taken
shape. We are very grateful to MMMUT Gorakhpur for joining hands with us in this
task. We received tremendous support from the TPC team and reviewers, all the
keynote speakers, invited-paper speakers, session chairs, and registered
participants. We express our deep gratitude to the SVNIT authorities, the Director,
the Registrar In-Charge, and the administrative staff, for providing the
infrastructural support and approvals for various arrangements on the Institute
premises.
Contents
Dr. Nithin Kumar Y. B. received the B.E. degree from Manipal Institute of
Technology, Manipal, India, the M.Tech. degree from the National Institute of
Technology, Surathkal, and the Ph.D. degree from the Indian Institute of Technology
Kharagpur, India. He is currently an Assistant Professor at the National Institute
of Technology Goa, India. He is a recipient of the Young Indian Researcher Award
from the Government of Italy. He has published many research papers in reputed
international journals and conferences and has delivered several technical
presentations at international conferences and universities in India and abroad.
He has also filed four patents in the relevant area. His current research interests
include analog and mixed-signal design.
Reusability and Scalability of an SoC
Testbench in Mixed-Signal
Verification—The Inevitable Necessity
1 Introduction
The ultimate goal of ASIC design is to construct a chip that meets highly precise
specifications. These predefined specifications have to be inspected during and after
the design flow. To acquire the highest possible level of certainty in the
functionality of the design, it is vitally important to use accurate and reliable
verification techniques.
In today's SoC designs, functional verification complexity rises exponentially as
hardware complexity doubles with time. Many novel, advanced,
B. C. Pal (B)
Einfochips, Kolkata, India
e-mail: [email protected]
and reliable methods have been introduced for digital verification. However, up to 70%
of design development time and resources are still dedicated to functional verification.
As SoCs have become multifunctional, more analog blocks are integrated on a single
chip, and Mixed-Signal verification difficulties have therefore become more visible
to verification teams. Moreover, at chip-level verification, once block-level
verification has been accomplished successfully, a technique is required through
which the analog and digital sub-blocks of a design are considered holistically.
The Universal Verification Methodology (UVM), by offering base class libraries,
brings much automation to the digital verification world. In addition, this
methodology is simulator-vendor independent: it is possible to create UVM components
and reuse them across projects. Taking advantage of the constrained-random stimulus
generation offered by UVM, engineering effort has shifted toward building automatic
checkers instead of writing directed tests. The typical approach in a UVM-based
verification flow is to define a test case and simulate the Design Under Test
(DUT) with constrained-random stimuli that push the design into corner cases.
Automatic checkers ensure the correctness of the design functionality for the
generated stimuli. Furthermore, coverage mechanisms are used to measure how much of
the DUT specification has been inspected and to indicate verification closure.
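As a brief illustrative sketch of this flow (the class and field names here are hypothetical, not taken from this work), a constrained-random UVM sequence item that biases stimulus toward corner cases might look as follows:

```systemverilog
// Hypothetical sketch of a constrained-random UVM sequence item;
// names are illustrative, not from this paper.
import uvm_pkg::*;
`include "uvm_macros.svh"

class bus_item extends uvm_sequence_item;
  rand bit [7:0]  addr;
  rand bit [31:0] data;

  // Bias randomization toward the address extremes, i.e. corner cases.
  constraint addr_corners_c {
    addr dist { 8'h00 := 2, [8'h01 : 8'hFE] :/ 6, 8'hFF := 2 };
  }

  `uvm_object_utils_begin(bus_item)
    `uvm_field_int(addr, UVM_ALL_ON)
    `uvm_field_int(data, UVM_ALL_ON)
  `uvm_object_utils_end

  function new(string name = "bus_item");
    super.new(name);
  endfunction
endclass
```

A test would randomize such items and leave it to automatic checkers and coverage, rather than directed tests, to judge the DUT's response.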
The main goal of this work is to utilize the facilities provided by UVM in
Mixed-Signal verification environments. The structure proposed here greatly improves
Mixed-Signal verification quality in terms of verification performance with minimal
human effort. It becomes possible to generate real-valued input streams and drive
the analog interfaces of the Design Under Test (DUT) using automated digital
verification techniques such as constrained-random stimulus generation. Moreover,
the analog transaction-level modeling approach employed here enhances the full-chip
verification procedure significantly.
In this work, a UVM-based Mixed-Signal verification environment is proposed.
Analog integrated circuits have been simulated with analog simulators such as SPICE
for decades. The Simulation Program with Integrated Circuit Emphasis (SPICE) is the
best-known tool for analog simulation, and simulating a design in SPICE before
manufacturing is a common task for designers to ensure its correctness. The tool was
first developed at the University of California at Berkeley as a class project about
40 years ago, and it has since become a worldwide standard circuit simulator. SPICE
covers a wide range of components, from simple passive elements to complicated
active devices such as MOSFETs: by writing a textual description of a component, its
SPICE simulation model is obtained and its behavior can be predicted. SPICE-like
simulators are suitable for detailed design analysis once the design is constructed
at the transistor level. However, even for a small portion of a design, a SPICE
simulation may take a week or more to complete. Fast-SPICE simulators developed in
recent years improve simulation speed by about 2–4 orders of magnitude, although
this speed is achieved at the expense of simulation accuracy.
In order to provide higher levels of productivity, it is of significant importance to
utilize various simulation tools during all steps of design development. For an analog
circuit designer to gain better productivity it is crucial to model the circuit in a higher
level of abstraction and verify its functionality. In this way designers can detect
functional defects before devoting time to the transistor-level design.
Moreover, in chip-level verification, when the design is checked in the context of
the whole system and the aim is rather to check system-level behavior or the
interconnectivity between blocks, the detailed results of SPICE models are not
required.
Today's design methodologies are pulled in two different directions. On the one
hand, it is necessary to inspect the electrical behavior of the design to avoid
physical defects and later product re-spins, and a detailed design representation is
necessary to verify it at this level. On the other hand, time-to-market pressure
forces designers to use abstract representations in order to cope with the
challenging task of complex Mixed-Signal circuit verification.
Today, between 70 and 80% of designs are Mixed-Signal designs, and Mixed-Signal
verification therefore takes an enormous amount of time to accomplish. To be able to
use digital-like simulators and thereby speed up the verification process, it is
crucial to consider the analog and digital portions of the design holistically. This
highlights the demand for novel strategies to verify an analog IP in the SoC
context.
The aim of this paper is to reuse and scale a UVM SoC-level test bench for
Mixed-Signal verification from project to project.
It has been reported that more than 50% of design re-spins at 65 nm and below are
the outcome of Mixed-Signal design issues. The consequent wasted time and high cost
of a product re-spin create the demand for new SoC verification strategies: it is no
longer possible to simply assume that the old strategies remain practical for
SoC-level Mixed-Signal verification. To give an overview of the older Mixed-Signal
SoC-level verification strategies, two of them are briefly described in this section.
Black Box Approach. One common method to verify a mixed-signal design with
dominant digital content is black-box verification. In this approach, a pre-verified
analog block is integrated into a bigger design via a highly abstracted model; in
many cases the abstract model contains only the interface description.
Simulation-speed degradation is thereby avoided, but verification quality degrades
instead. Today's Mixed-Signal SoCs are more complex not only in terms of added
analog blocks but also in terms of complicated interconnections and feedback loops
between the analog and digital portions of the design. It is no longer reliable to
replace the analog portions of the design with a black-box model and perform
SoC-level verification.
SoC-level verification of Mixed-Signal ICs has been receiving more attention in
recent years, and several approaches have been proposed by researchers to enhance
the verification process. Since digital verification teams were able to build highly
automated test benches, it was time to introduce more digital-like, and consequently
more automated, techniques for Analog and Mixed-Signal (AMS) circuit verification.
Although analog verification with SPICE is still the golden standard and cannot be
ignored, different levels of design abstraction are used to model an analog
subsystem in order to achieve better simulation speed. Compared with SPICE and
fast-SPICE simulation, the next level of abstraction, analog behavioral modeling
(with Verilog-AMS), improves simulation speed significantly. Furthermore, Real
Number Modeling (RNM) and pure digital models are further approaches that describe
the analog sub-circuit at higher levels of abstraction and therefore achieve higher
simulation performance. Another crucial factor in choosing an appropriate abstract
model is the effort required to build a simulation environment. For clarity, a
comparative chart is shown in Fig. 1.
The chart contrasts the mentioned approaches in terms of the effort required for
simulation setup (Fig. 1) and the performance trade-off (Fig. 2). As depicted,
although RNMs and pure digital models are less accurate, less effort is required to
build a simulation environment with them than with AMS models; this is their most
apparent advantage in full-chip verification. Compared with AMS models, this
approach requires less effort and provides higher performance when the verification
goal is removed from implementation details (accuracy).
Three HDL languages support RNM:
• VHDL (with the real type)
• SystemVerilog (with the real type)
• Verilog-AMS (with the wreal type).
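As a minimal sketch of the RNM idea in SystemVerilog (the module and parameter names are assumptions for illustration), an analog block such as an 8-bit DAC can be reduced to discrete-time real arithmetic that a digital solver evaluates directly, with no analog engine involved:

```systemverilog
// Hypothetical RNM sketch: an 8-bit DAC as a discrete-time real model.
// The output updates only on clock edges, so a digital solver suffices.
module dac8_rnm #(parameter real VREF = 1.8) (
  input  logic       clk,
  input  logic [7:0] code,
  output real        vout
);
  always_ff @(posedge clk)
    vout <= VREF * code / 256.0;  // plain real arithmetic
endmodule
```

The design choice is the usual RNM trade-off: circuit behavior is collapsed to a value sequence, losing electrical accuracy but gaining near-digital simulation speed.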
Simulation performance comparable to that of existing digital verification makes it
possible to apply highly automated digital techniques, such as verification
planning, random test generation, coverage, and assertions, in the analog
verification area by using real number models. This is the most sought-after
verification capability achievable in Analog and Mixed-Signal verification.
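To sketch how coverage can be applied to a real-valued RNM signal (module, signal, and bin names here are hypothetical), the real value can be binned through an integral expression that a covergroup accepts:

```systemverilog
// Hypothetical sketch: functional coverage on an RNM output by
// converting the real voltage to millivolts and binning the ranges.
module vout_cov (input real vout, input logic clk);
  covergroup vout_cg @(posedge clk);
    coverpoint int'(vout * 1000.0) {  // sample in millivolts
      bins low  = {[0    : 599 ]};
      bins mid  = {[600  : 1199]};
      bins high = {[1200 : 1800]};
    }
  endgroup
  vout_cg cg = new();
endmodule
```

The cast is needed because coverpoints sample integral expressions; the binning resolution (here 1 mV) is an assumption chosen for illustration.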
Three experimental simulations have been performed on the Master Bias block using
the schematic, the Verilog-AMS model, and the SV-RNM model in the Cadence platform.
The simulation details are captured in Table 1 as shown below.
A sample DUT composed of SPICE-netlist, Verilog-AMS, or RNM blocks together with RTL
blocks is presented in Fig. 3; in this case, the DUT has both digital and
real-valued pins. The existing approach to driving the real-valued pins of the DUT
is to hard-code stimuli in the test (the direct way). The proposed technique, in
contrast, drives the analog pins of the DUT using constrained-random stimulus
generation.
Fig. 3 A mixed-signal design with analog sub-circuits and digital RTL interfacing with each other
The proposed method of driving the analog pins of the sample DUT depicted in Fig. 3
leads to a new concept: the Analog Transaction. To clarify the transaction concept,
recall that a transaction is a data structure containing parameters. In a UVM-based
test bench, the data fields within a transaction are randomized and consumed by the
driver, which implements the protocol separately and wiggles the DUT pins.
To extend the concept of a transaction to the analog domain, a terminology
replacement is suggested: "protocol" in the digital world becomes "shape" in the
analog world. Analog signals can have different shapes, for example harmonic,
linear, cubic, or spline. To describe or generate these analog shapes, parameters
must be defined besides the name of the shape. This motivates using analog
transactions to generate analog waves with defined parameters, and consequently in a
TLM way [2]. In other words, in a UVM-based test bench, a data structure including
the parameters is passed to the driver, and the driver applies a specific numerical
algorithm to those parameters to generate the desired analog wave.
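As a sketch of what such an analog transaction could look like (all names and ranges below are illustrative assumptions, not the paper's actual class), the shape name takes the place of a digital protocol and the other fields parameterize it:

```systemverilog
// Hypothetical "analog transaction": the shape replaces the digital
// notion of a protocol; the remaining fields parameterize that shape.
import uvm_pkg::*;
`include "uvm_macros.svh"

typedef enum { RAMP, HARMONIC, SPLINE } shape_e;

class analog_txn extends uvm_sequence_item;
  rand shape_e      shape;
  // Classic SystemVerilog cannot randomize 'real' directly, so integer
  // fields are randomized and converted to real values on demand.
  rand int unsigned amp_mv;    // amplitude in millivolts
  rand int unsigned freq_hz;   // frequency, used by HARMONIC

  constraint legal_c {
    amp_mv  inside {[0 : 1800]};        // stay within a 1.8 V supply
    freq_hz inside {[1000 : 1000000]};
  }

  function real amplitude();
    return amp_mv / 1000.0;
  endfunction

  `uvm_object_utils(analog_txn)

  function new(string name = "analog_txn");
    super.new(name);
  endfunction
endclass
```

The driver receives such items from the sequencer and runs the numerical algorithm selected by the shape field to synthesize the wave at the DUT pins.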
be modeled using AMS languages or a Real Number Model (RNM), because SPICE-level
simulation is slow in comparison with digital simulation.
Constrained Random Stimulus. To explore different behaviors of the DUT, both
digital and analog random stimuli must be generated. Efficient randomization of the
design's analog inputs is a major requirement when adopting MDV in an analog
environment.
Self-checking. A self-checking test bench can determine whether or not the design
behaves as planned. In digital designs, this is carried out by monitoring the output
pins of the DUT and comparing them with the expected output in a scoreboard; this
process is more complicated when checking an analog design.
The goal of this work is to improve the existing verification methodology and test
bench structure for Mixed-Signal designs. To this end, a test bench structure is
proposed in which the analog interfaces of the DUT depicted in Fig. 3 are driven by
taking advantage of transaction-level modeling. The proposed test bench structure,
shown in Fig. 4, is responsible for generating random stimuli that shape various
analog waves and for sending them to the DUT's analog interface automatically.
The aim is to generate a variety of analog waves in a transaction-based manner, in
which the driver component receives transactions from the sequencer and drives the
DUT pins. The driver interprets the received transaction and eventually generates
the intended shape at the inputs of the DUT.
Moreover, the generated components in the proposed verification environment can
support and deliver transactions with various data structures. This enables the user to
analog begin
  if (enable == 1) begin
    // noise_val = (($random() % 1000) * 0.000001);
    // ramp_real = transition(ramp_val, 0.0, trise*1n, tfall*1n) + noise_val;
    ramp_real = transition(ramp_val, 0.0, trise*1n, tfall*1n);
`ifdef UVC_ELECTRICAL_DRIVERS
    V(o_ramp) <+ ramp_real;
`endif
  end
end
endmodule
The randomized spectrum of the signal is delivered by the UVM sequence item class,
and the ramp signal is obtained through the Verilog-AMS driver (either in the
"electrical" domain or in the "wreal" domain).
As the verification is simulation based, a simulator (either a Mixed-Signal
simulator or a digital solver) is mandatory. For this purpose, consider the Cadence
Mixed-Signal simulator (e.g., AMS Designer).
In a typical Mixed-Signal test bench, the Mixed-Signal DUT can be configured with
SPICE netlists and Verilog-AMS models, mixed and matched. This configuration can be
controlled from the AMS test bench configuration file according to the test
scenarios, as explained in the next section [3, 4].
In this paper, the goal is to describe a UVM-based Mixed-Signal test bench through
which the analog input pins of a sample DUT are driven with the same strategy as the
digital input pins. Since different verification methodologies are used to verify
digital and analog blocks, it is crucial to have a holistic view of both the digital
and the analog sub-blocks of the design at chip-level verification.
The verification approach followed here is Command Line Based or Batch
Mode [3, 4].
The various components of a Mixed-Signal test bench are shown in Fig. 5.
AMS Netlist Generation. AMS netlist generation is part of the integration process;
the netlist can be generated with the help of the "runams" script, which needs a
config view of the design. The following command shows how to generate a Verilog-AMS
netlist using a config view:
Program Code.
amsd {
  ie vsup=1.8 tr=1n instport=<port name including hierarchical path> mode=split
  ie vsup=1.2 tr=1n instport=<port name including hierarchical path>
}
simulator lang=spectre
global 0
tran1 tran stop=10m errpreset=moderate relref=alllocal
For user-friendly debugging, the simulation should be run with the debug switches
enabled in the simulation control file (e.g., spectre.scs). A typical example of
such options for the Spectre simulator is shown below.
Program Code.
Setting up Device Model Libraries. Any SPICE design requires the proper device
model libraries to be included with the design for correct functionality. Hence, the
correct process corner must be specified in the field known as "section". For a
typical-corner simulation, it is marked as "tt" in the section when including the
library path in the AMS configuration file, as shown below.
Program Code.
Note. The device model libraries may be included in the AMS configuration file
(e.g., amscf.scs).
Setting up the Test Bench Configuration File (e.g., amscf.scs). This
mixed-signal configuration file includes the following:
• spectre.scs
• the device model libraries
• an amsd block configuring connect rules using ie cards
• an amsd block configuring the SPICE subcircuit using portmap statements.
Note. The mixed-signal test bench configuration file (e.g., amscf.scs) must be
parsed with the “irun” or “xrun” file list.
AMS Assertion Checkers. These checkers are written in Verilog-AMS, and they check
the voltage levels of electrical signals. To determine a voltage level in the
electrical domain, a "cross" statement (Verilog-AMS) can be used, and the checker
can be bound to the electrical signal of the DUT.
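As a sketch of such a checker (the module name, threshold value, and message text are assumptions for illustration), a Verilog-AMS "cross" event can flag threshold crossings on an electrical node:

```verilog
// Hypothetical Verilog-AMS level checker for an electrical signal.
// 'cross' triggers when V(in) passes v_th in the given direction.
`include "disciplines.vams"

module vlevel_chk (in);
  input in;
  electrical in;
  parameter real v_th = 0.9;

  analog begin
    @(cross(V(in) - v_th, +1))
      $strobe("%m: signal rose above %g V at t=%g s", v_th, $abstime);
    @(cross(V(in) - v_th, -1))
      $strobe("%m: signal fell below %g V at t=%g s", v_th, $abstime);
  end
endmodule
```

An instance of this module would then be connected (or bound) to the DUT node under supervision.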
5 Conclusion
Based on the foregoing, the aim of this work is to build a test bench that is fully
reusable and scalable across projects whenever other shapes of analog wave are
required to drive a DUT.
To elaborate, a UVM_seq_item class is implemented to hold the parameters required to
generate a specific analog wave; in the described ramp example, these are the
ramp-signal parameters. The parameters within the UVM_seq_item class were defined to
deliver the signal's spectrum, and the sequence items are sent from the sequencer to
the driver.
The main focus of this work is to implement a UVM-based test bench in which the
transaction type and the driver class are implemented in a generic way. In this
manner, it is possible to generate various analog signal shapes, such as harmonic or
cubic spline, using the same structure. Many verification projects require
generating different shapes of analog waves; to obtain a fully flexible test bench
with this ability, it is necessary to define an unconstrained transaction-level
communication between the UVM components, from which the desired analog output wave
is eventually generated.
In a nutshell, a UVM SoC test bench can be leveraged for Mixed-Signal verification
as shown in Table 2.
Acknowledgments First, I would like to express my sincere gratitude to Mr. Gopal Bhavsar
for his continuous support of my work, his patience, motivation, and immense knowledge.
His guidance helped me throughout the work and the writing of this paper.
Table 2 Digital integration with different modeling standards for mixed-signal simulation

Approach            | Verilog-A                   | Verilog-AMS                            | VAMS wreal                 | SystemVerilog (2012)
Description         | Verilog lookalike for SPICE | Superset of Verilog-A and Verilog-D    | Verilog-AMS RNM subset     | Most recent approach for RNM
Evaluation          | Continuous                  | Flexible                               | Discrete time              | Discrete time
Digital integration | Via co-sims                 | True AMS, limited UVM/SV               | True AMS, limited UVM/SV   | Excellent integration
Modeling features   | True analog modeling        | True analog + mixed-signal interaction | Abstract signal chain only | Abstract signal chain but most permissive
Speed               | Slow                        | Better                                 | Close to digital           | Fast, close to digital

Note. It is not recommended to mix and match Verilog-AMS and SystemVerilog modeling.
A Verilog-AMS model of an analog block is always preferred over a SystemVerilog
model when interacting with a SPICE netlist.
Besides my advisor, I would like to thank the rest of my Analog Mixed-Signal team
members not only for their insightful comments and encouragement but also for the
hard questions, which motivated me to hone my technical know-how from various
perspectives.
My sincere thanks also go to Mr. Saurabh Desai, who provided me the opportunity to
join his team and to help open a new center at Kolkata for Analog and Mixed-Signal
work. Without his precious support it would not have been possible to write this paper.
I thank my fellow colleagues for the stimulating discussions and for all the fun we
have had in the last three years, and I also thank my friends at eInfochips. In
particular, I am grateful to Mr. Snehal A. Patel for giving me my first insight into
paper writing.
Last but not least, I would like to thank my family for supporting me spiritually
throughout the writing of this paper and in my life in general.
References
1 Introduction
the tissue after application of excitation [1]. Body composition analysis [2], the
study of cardiovascular abnormalities through impedance cardiography (ICG) [3],
impedance glottography (IG) [4], impedance pneumography (IPG) [5], urolithiasis [6],
the study of cell suspensions in cancer [7], observation after knee replacements,
and prevention of limb amputation in diabetic patients [1] are some of the
applications where bio-impedance measurement is employed.
Unlike X-ray and Computed Tomography (CT), bio-impedance measurement requires
neither radiation exposure nor the presence of trained personnel. It is also a
noninvasive procedure that involves no risks compared with blood tests or Pap smear
tests, and no complications such as general anaesthesia. Owing to these immense
advantages over other diagnostic methods, it is necessary to develop cost-effective,
user-friendly, indigenous, accurate, and application-specific measuring devices for
bio-impedance measurement. A bio-impedance measurement setup comprises a drive
module providing voltage or current excitation, a Tissue Under Test (TUT),
electrodes, and a sense module which measures the response of the tissue. A basic
block diagram of bio-impedance measurement is shown in Fig. 1. The Tissue Under Test
(TUT) is represented by a bio-impedance simulator during the testing phase of
bio-impedance devices.
Simulation is the artificial replication of a real-world process with sufficient
fidelity, providing immersion, feedback, reflection, and practice without the risks
involved in real-time measurement. The medical domain uses simulators for education
and training, for testing new drugs in pharmacology, and for validating newly
developed health-monitoring devices. Patient safety, the ethical sensitivities of
subjects who refuse to have newly developed devices used on them, and depleted
resources, such as the unavailability of a fully fledged open-source database, a
shortage of experimental animals, and a lack of skilled training personnel, urge the
use of simulators in these applications. The accuracy and safety of a biomedical
device must be ensured before it is used for real-time measurement on human
subjects [8]. Simulators which help to test and validate the developed bio-impedance
measurement devices are called bio-impedance simulators. Bio-impedance
simulator-based measurement yields a multitude of bio-impedance values for the
creation of a detailed open-source database.
Automated Simulator for the Validation of Bio-Impedance Devices
From the behaviour of tissues with respect to frequency, Cole and Cole proposed an
equivalent circuit of tissue consisting of resistive and reactive components. Different
combinations of resistances and capacitors have been suggested by researchers to represent
this equivalent circuit.
The Fricke-Morse model is simple and the most widely used, and it is adopted for the
present work too, as shown in Fig. 2. Ri is the intracellular resistance offered to the
current by proteins and ions within the cell as well as by the cell membrane; Re
is the extracellular resistance due to ions outside the cell; and Ci is the capacitance
of the membrane, which lets ions in selectively, just like a capacitor.
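The frequency behaviour of the Fricke-Morse model described above can be checked numerically; the following is a minimal Python sketch (the component values are illustrative, not taken from the paper).

```python
import cmath
import math

def fricke_morse_impedance(freq_hz, re_ohm, ri_ohm, ci_farad):
    """Complex impedance of the Fricke-Morse tissue model:
    extracellular resistance Re in parallel with (Ri in series with Ci)."""
    omega = 2 * math.pi * freq_hz
    z_branch = ri_ohm + 1 / (1j * omega * ci_farad)  # intracellular path
    return (re_ohm * z_branch) / (re_ohm + z_branch)

# At low frequency the membrane capacitance blocks current, so |Z| -> Re;
# at high frequency it acts as a short, so |Z| -> Re || Ri.
z_low = fricke_morse_impedance(0.1, 1000.0, 500.0, 100e-9)
z_high = fricke_morse_impedance(1e9, 1000.0, 500.0, 100e-9)
```

This reproduces the well-known dispersion behaviour that motivates using separate Re, Ri and Ci elements in the simulator.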
3.2 Design
(Figure: Fricke-Morse model with extracellular resistance Re.)
4 Automation
(Figure: AD8402 dual digital potentiometer pin connections for Re and Ri.)
Resolution = Maximum Resistance / 2^(Bit Resolution) (1)
= 1000 / 2^8 = 1000 / 256 (2)
≈ 3.9 Ω (3)
There can be a total of 256 values for each resistance. The AD8402 is controlled by the
Arduino through the Serial Peripheral Interface (SPI).
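As a sketch of this SPI control, the AD8402 datasheet specifies a 10-bit serial word (two address bits followed by eight data bits); the resistance-to-code mapping below is a simplification that neglects the wiper resistance.

```python
def resistance_to_code(target_ohm, full_scale_ohm=1000.0, steps=256):
    """Map a target resistance to the nearest 8-bit wiper code
    (wiper resistance neglected for simplicity)."""
    code = round(target_ohm / full_scale_ohm * (steps - 1))
    return max(0, min(steps - 1, code))

def ad8402_word(channel, code):
    """Assemble the 10-bit serial word (2 address bits + 8 data bits)
    clocked into the AD8402 over SPI, MSB first."""
    return ((channel & 0b11) << 8) | (code & 0xFF)

code = resistance_to_code(500.0)  # 500 ohm maps to mid-scale
word = ad8402_word(1, code)       # word for the second wiper channel
```

On the Arduino side the same 10-bit word would be shifted out with two SPI transfers while CS is held low.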
(Figure: realizations of the capacitor bank Ci switches: (a) push buttons, (b) MOSFETs, (c) a multiplexer.)
(Figure: simulator block diagram and hardware: an Arduino Uno controller driving the AD8402 digital potentiometer (Re, Ri) over SPI, a CD74HC4067 16-channel analog multiplexer selecting the capacitor bank (C0-C15), and a 16x2 LCD monitor on an I2C module.)
(Plot: digital potentiometer resistance Ri, Re (Ω) versus true value (Ω); a slope of 1 indicates accurate resistance values.)
Table 2 Comparison of theoretical and practical capacitances while using 16-channel analog multiplexer as switch
S3 S2 S1 S0 Ci (Theo.) (nF) Ci (Prac.) (nF) Error (%)
Default 10 12.3 23
0 0 0 0 20 24.04 20.2
0 0 0 1 47 49.2 4.680851
0 0 1 0 67 73.12 9.134328
0 0 1 1 110 116.09 5.536364
0 1 0 0 210 217.7 3.666667
0 1 0 1 310 320.07 3.248387
0 1 1 0 410 422.28 2.995122
0 1 1 1 510 525.22 2.984314
1 0 0 0 710 729.52 2.749296
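The error column of Table 2 is simply the relative deviation of the measured capacitance from the design value; a short Python check of a few rows (values in nF):

```python
def percent_error(theoretical, practical):
    """Relative error of the measured capacitance against the design value."""
    return (practical - theoretical) / theoretical * 100.0

# Spot-check three rows of Table 2: (Ci theoretical, Ci practical) in nF.
rows = [(47.0, 49.2), (210.0, 217.7), (710.0, 729.52)]
errors = [percent_error(t, p) for t, p in rows]
```

The computed values match the tabulated 4.680851%, 3.666667% and 2.749296%, confirming how the error column was derived.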
Fig. 8 Comparison of theoretical and practical values of capacitor, Ci incorporated in the simulator
Fig. 9 Experimental setup of validation of bio-impedance simulator using precise constant current
source
Table 3 Bio-impedance simulator acting as varying load for constant current source
Capacitance (nF) Current (µA) with varying Ri and Re
Min Max Avg
4.7 158.911 159.7885 159.5256
10 158.8551 159.5258 159.3171
22 158.0882 158.5427 158.3158
47 158.1555 158.5440 158.3146
60 158.1297 158.5621 157.3325
80 157.5245 158.1151 157.3085
100 158.3373 159.0958 158.5408
1000 158.157 159.5232 158.2635
developed simulator is experimentally measured as 125 mW. The system will be improved
to study the behaviour of tissues at different frequencies and will be modified to suit
bio-impedance spectroscopy-based disease diagnosis.
References
12. Dodde RE, Kruger GH, Shih AJ (2015) J Med Devices 9(2):11. https://doi.org/10.1115/1.
4029706
13. Muñoz-Huerta RF, Ortiz-Melendez ADJ, Guevara-Gonzalez RG, Torres-Pacheco I, Herrera-
Ruiz G, Contreras-Medina LM, Prado-Olivarez J, Ocampo-Velazquez RV (2014) Sensors
14(7):11492. https://doi.org/10.3390/s140711492
14. Gonzalo M, Martínez-Beamonte R, Palacios P, Marín J, Castiella T, Surra J, Burdío F,
Sousa R, Gemes A, Osada J, García-Gil A (2012) Transplantation proceedings, vol 44, no
6, p 1579. https://doi.org/10.1016/j.transproceed.2012.05.006, http://www.sciencedirect.com/
science/article/pii/S004113451200437X
15. Olmo A, Yufera A (2010) pp 178–182. https://doi.org/10.13140/2.1.4991.8402
16. Anand G, Lowe A, Al-Jumaily A (2016) J Electr Bioimpedance 7:20. https://doi.org/10.5617/
jeb.2657
17. Pandey VD, Pandey PC, Sarvaiya J (2008) IETE J Res 54: https://doi.org/10.1080/03772063.
2008.10876200
18. Sruthi S, Dhavse R, Sarvaiya JN (2019) Analog integrated circuits and signal processing.
https://doi.org/10.1007/s10470-019-01430-0
19. Bogonez-Franco P, Nescolarde L, Galvez-Monton C (2013) Physiol Meas 34(1):1. https://doi.
org/10.1088/0967-3334/34/1/1
20. Sanchez B, Bandarenka AS, Vandersteen G, Schoukens J (2013) Med Eng Phys 35
21. Analog Devices, Digital Potentiometer (2002). Rev. 3
22. Badger P (2008) Capacitance meter and rc time constants. https://www.arduino.cc/en/Tutorial/
CapacitanceMeter
Analysis of Memory-Based Real Fast
Fourier Transform Architectures
for Low-Area Applications
Abstract The Fast Fourier Transform (FFT) processor can be described as implementing
the most important numerical algorithm of our lifetime. In this paper, we present
different memory-based Real Fast Fourier Transform (RFFT) architectures for low-
area applications in which Processing Elements (PE) such as the High Radix Small Butterfly
(HRSB), the Urdhva Tiryakbhayam Butterfly (UTB) and a PE with vedic multiplier and
carry-lookahead units are utilized to reduce the Area, Delay and Area-Delay-Product
(ADP). The FFT processor is based on the Radix-2 Decimation-In-Frequency
(DIF) algorithm, and it also supports higher radix algorithms. The architectures are
implemented on various Xilinx Field Programmable Gate Array (FPGA) devices and
also on a 180 nm Application-Specific Integrated Circuit (ASIC) to compare
parameters such as Area, Delay and Power.
Keywords Fast Fourier Transform (FFT) · Real FFT · Field programmable gate
arrays (FPGA) · Application-specific integrated circuit (ASIC) · Urdhva
tiryakbhayam sutra
1 Introduction
The Fast Fourier Transform (FFT) is a basic operation in most digital signal and
image processing applications, and it is an efficient way to compute
the Discrete Fourier Transform (DFT). The FFT processor plays an important role in
modern digital communications such as digital video broadcasting (DVB), high-
speed digital subscriber lines, medical imaging and power line communications
R. Turaka (B)
Department of ECE, Velagapudi Ramakrishna Siddhartha Engineering College, Kanuru,
Vijayawada, A.P., India
e-mail: [email protected]
M. S. S. Ram
Department of ECE, RVR&JC College of Engineering, Guntur, A.P., India
e-mail: [email protected]
(PLC) [1]. Because of its efficient parallel form, the FFT is a benchmark in evaluating
the performance of digital signal processors. Area, speed, power consumption,
execution time and throughput are the key parameters of an FFT processor.
Different architectures, algorithms and methodologies have been extensively used
in the past; this forms the foundation for competent novel architectures. One
fundamental architectural issue is the type of hardware platform. FFT processors
are commonly implemented on platforms such as digital signal
processors (DSP), FPGAs and ASICs. Digital signal processors are a generalized form
of microprocessors. The primary reason most engineers choose an FPGA over a DSP is
the application's millions of instructions per second (MIPS) requirements,
and FPGAs also have inherent advantages such as reliability, flexibility and adaptability.
An ASIC (application-specific IC) is specifically designed for
one particular application. The difference between an FPGA and an ASIC is that an
ASIC, once embedded with a function, cannot be modified, whereas
FPGAs are highly flexible devices.
A standard FFT processing unit is composed of butterfly computation units,
memories and an address generator. Diverse architectures have been designed to enhance
the overall performance and reduce the complexity of FFT hardware units. The processing
element (PE) and the memory unit are the key units in an FFT processor. FFT
processors can be classified based on their attributes, architectures and the algorithms
they apply. Based on architecture, FFT processors are classified as memory-based,
pipelined and parallel structures with their subdivisions; combinations of
these architectures, such as parallel-pipelined architectures, are also possible. Pipelined
architectures use many butterfly units to attain high-speed operation at the cost of a
larger die area. In parallel FFT architectures, many butterfly units are used in a
parallel manner [2].
2 Methodology
Y(k) = Σ_{n=0}^{M−1} y(n) W_M^{nk},  0 ≤ k ≤ M − 1

where W_M = e^{−j2π/M} is the twiddle factor.
Today, most signals are real-valued, which is very important in real-time signal
processing. The complex conjugate symmetry property of the FFT of real-valued
signals can reduce both the memory and the arithmetic requirements; hence,
there is no need to calculate all of the FFT samples.
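This conjugate-symmetry property can be illustrated with a direct evaluation of the DFT definition above; the sample vector below is arbitrary.

```python
import cmath

def dft(y):
    """Direct M-point DFT: Y(k) = sum_n y(n) * W_M^(n*k), W_M = e^(-j*2*pi/M)."""
    m = len(y)
    w = cmath.exp(-2j * cmath.pi / m)
    return [sum(y[n] * w ** (n * k) for n in range(m)) for k in range(m)]

# For real-valued input, Y(M - k) = conj(Y(k)), so only about half the
# outputs need to be computed and stored -- the property an RFFT exploits.
y = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 1.5]
Y = dft(y)
symmetric = all(abs(Y[(len(y) - k) % len(y)] - Y[k].conjugate()) < 1e-9
                for k in range(len(y)))
```

A hardware RFFT stores only the independent half-spectrum, which is the memory saving argued for above.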
An in-place FFT architecture with one processing element (PE), shown in Fig. 4, can
process 4 samples at a time, and each memory module in the architecture can store
up to M/4 words of length P. There are four memory banks which are used to store
the intermediate values of the butterfly computations. Each processing element is
connected to the four memory banks through two sets of multiplexers, one of which
selects the input sequence while the other decides which memory bank stores the
intermediate values.
The processing element can read and write data to the same memory module
[4].
This is an example of the index mapping technique of the prime factor algorithm and the
common factor algorithm. The FFT processor architecture with a high radix butterfly
unit (HRSB) is shown in Fig. 5. It comprises index vector generator units for both
input and output, three memory modules, an address generator and an HRSB unit to
compute the butterfly operation with the help of prestored twiddle factor parameters.
Without any data conflicts, the input data is distributed to the memory modules by the
input index vector generator, and the processed data is reloaded into another memory
module by the output index vector generator.
crowning gem of all sutras, is the most universal and efficient method through which
the product of any two natural numbers can be produced [5].
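As an illustration of the sutra, the following Python sketch forms every "vertically and crosswise" column sum and then propagates carries in one pass; in the hardware UTB the column sums are produced by combinational partial-product logic rather than loops.

```python
def urdhva_multiply(x, y, base=10):
    """Urdhva Tiryakbhayam ("vertically and crosswise") multiplication:
    every column sum of cross products is formed, then carries propagate."""
    xd = [int(d) for d in str(x)][::-1]  # least-significant digit first
    yd = [int(d) for d in str(y)][::-1]
    cols = [0] * (len(xd) + len(yd) - 1)
    for i, a in enumerate(xd):           # crosswise partial products
        for j, b in enumerate(yd):
            cols[i + j] += a * b
    digits, carry = [], 0
    for c in cols:                       # single carry-propagation pass
        carry, d = divmod(c + carry, base)
        digits.append(d)
    while carry:
        carry, d = divmod(carry, base)
        digits.append(d)
    return sum(d * base ** k for k, d in enumerate(digits))
```

Because all column sums are independent, hardware can compute them concurrently, which is why the vedic multiplier shortens the critical path.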
The proposed RFFT architecture with dual-port RAM, in which the processing element
can read data from and write data into the same memory, is shown in Fig. 8.
The PE uses a vedic multiplier and a carry-lookahead adder to improve the area,
speed, delay and area-delay-product (ADP), and it is suited for moderate and high-
speed applications. The architecture consists of a control circuit providing the
controlling signals, a dual-port RAM (DRAM) used to perform both read and write
operations, and a PE that performs the butterfly computations of all the stages [7].
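A behavioural sketch of the carry-lookahead idea used in the PE, written here in Python for clarity: the loop unrolls the recurrence c[i+1] = g[i] OR (p[i] AND c[i]), whereas real CLA hardware evaluates these carries in parallel groups.

```python
def cla_add(a, b, width=8):
    """Carry-lookahead addition: carries derived from generate (g) and
    propagate (p) signals via c[i+1] = g[i] | (p[i] & c[i])."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]
    c = [0] * (width + 1)                # c[0] = 0, no carry-in
    for i in range(width):               # lookahead recurrence, unrolled
        c[i + 1] = g[i] | (p[i] & c[i])
    s = sum((p[i] ^ c[i]) << i for i in range(width))
    return s, c[width]                   # (sum, carry-out)
```

The g/p decomposition is what lets hardware compute all carries in O(log n) gate delays instead of rippling through every bit.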
The FFT architectures are implemented using the Verilog hardware description
language (HDL) on various FPGA and ASIC platforms.
We have analyzed FPGA performance for the Virtex-4 device using the Xilinx ISE 14.2
tool. ASIC performance was obtained from the Cadence RTL Compiler
tool in 180 nm technology. The DRAM-VM-CLA architecture uses only one dual-
port memory bank, whereas the other architectures use more than two memory banks,
as shown in Figs. 4, 5, 6 and 7. Due to this lower memory complexity, the
area is reduced in the DRAM-VM-CLA architecture. FPGA and ASIC perfor-
mance comparison tables for the various FFT architectures are shown in Tables 1 and 2,
respectively.
4 Conclusion
References
1. Jo BG, Sunwoo MH (2005) New Continuous-Flow Mixed-Radix (CFMR) FFT processor using
novel in-place strategy. IEEE Trans Circ Syst-1 52(5):911–919
2. Joshi SM (2015) FFT architectures: a review. Int J Comput Appl 116(7)
3. Baas BM (1999) A low-power, high-performance, 1024-point FFT processor. IEEE J Solid-State
Circ 34(3)
4. Ma Z-G, Yin X-B, Yu F (2015) A novel memory-based FFT architecture for real-valued signals
based on a Radix-2 decimation-in-frequency algorithm. IEEE Trans Circ Syst II 62(9):876–880
5. Xia K-F, Wu B, Xiong T (2017) A memory based FFT processor design with generalized efficient
conflict-free address scheme. IEEE Trans Very Large Scale Integr Syst 25(6)
6. Turaka R, Ram MSS (2018) Low area high speed LC-CSLA-RFFT architecture for Radix-2
decimation-in-frequency algorithm. J Adv Res Dyn Control Syst 10(09-Special Issue)
7. Turaka R, Ram MSS (2019) Low power VLSI implementation of real fast fourier transform
with DRAM-VM-CLA. Microprocess Microsyst 69:92–100
Optimization of MEMS-Based
Capacitive Sensor with High-k Dielectric
for Detection of Heavy Metal Ions
Abstract Nowadays, all countries worldwide are facing many environmental issues,
of which Heavy Metal Ion (HMI) contamination is extremely harmful and
hazardous to human health. Many countries face enormous challenges in
solving this HMI problem. Air and water pollution due to HMIs is a global issue
that must be solved as early as possible to meet today's air and water quality
demands. A portable device made using MEMS-based sensor technology is recommended
to detect multiple analytes simultaneously for environmental monitoring
applications. Accordingly, our main objective is to design and optimize a capacitive
MEMS-based sensor platform for sensing low concentrations of HMIs.
The proposed capacitive sensor is designed using MEMS-based technology,
which ultimately produces capacitance in the femtofarad (fF) range. It
is difficult and costly to measure capacitance in this fF range, and a dedicated
capacitance-to-digital converter (CDC) circuit is required to measure the capacitance
produced by MEMS-based sensors. Hence, we have used HfO2 as the High-k dielectric
layer to increase the sensor capacitance into the picofarad (pF) range so that the
commercially available CDC circuit AD7150 can be used. This mixed-dielectric approach
is beneficial for solving the measurement problem in capacitive sensors in low-cost
applications using commercially available embedded systems. In this work, we optimize
the microcantilever-based capacitive MEMS sensor with different thicknesses of
HfO2 as the High-k material using COMSOL Multiphysics 5.3 software. The FEA
analysis of the optimized capacitive sensor with HfO2 as the High-k dielectric shows
a maximum capacitance variation of 3.5 fF, compared to 0.3 fF without High-k, for
HMI masses between 1 and 1000 μg.
1 Introduction
Today many environmental problems have a significant effect on human life
all around the globe, of which HMI contamination has proven to be very harmful
according to the World Health Organization (WHO) and the Central Ground Water Board
(CGWB). According to reports, 15 million children under the age of five die each year
because of diseases caused by drinking water, of which HMI contamination
is the most hazardous and cannot be removed easily. Using any type of sensor in a
biosensor application requires a surface modification to capture the targeted
molecules, so a gold layer is deposited on the top surface of the sensor.
Chromium is used below the gold surface to improve the adhesion of gold to
polysilicon. The Au layer is used to immobilize the biolinker on the top surface of
the sensor to detect the required analytes. Air and water pollution due to HMIs is a
global problem that requires prime attention and needs to be solved as early as possible.
The proposed application assumes variation in HMI mass in the 1-1000 μg
per liter range as per the WHO data [1]. The proposed MEMS-based sensor utilizes
the capacitive effect to detect the targeted HMI biomolecules. This range of HMI
mass exerts a force in the range of 9.80665 × 10−9 N to 9.80665 × 10−6 N and a
pressure of 0.78453–784.53 Pa for the proposed design. We have also calculated the
number of particles that need to be detected in order to cross the WHO limit, depending
on the molar mass of the different HMIs. According to the average particle radii of HMIs,
we have set the width of the microcantilever beam to 250 μm.
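The force figures follow directly from F = mg; the loaded area of 1.25 × 10^-8 m² used below is inferred from the stated pressure range, not given explicitly in the text.

```python
G = 9.80665  # standard gravity, m/s^2

def hmi_mass_to_force(mass_ug):
    """Weight in newtons of a deposited HMI mass given in micrograms."""
    return mass_ug * 1e-9 * G  # 1 ug = 1e-9 kg

def force_to_pressure(force_n, area_m2=1.25e-8):
    """Pressure over the loaded area; area = 1.25e-8 m^2 is an assumption
    chosen because it reproduces the paper's 0.78453-784.53 Pa range."""
    return force_n / area_m2
```

With these definitions, 1 μg gives 9.80665 × 10^-9 N and 1000 μg gives the upper pressure bound of about 784.53 Pa quoted above.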
We have also investigated previous work in the field of HMI detection.
Microcantilevers functionalized with the metal-binding protein AgNt84-6 have been
demonstrated as suitable sensors for the detection of heavy metal ions like Hg2+ and
Zn2+ [2]. SAM (self-assembled monolayer)-modified microcantilevers used for the
detection of Ca2+ ions are presented in [3]. Arrays of microcantilever sensors
encapsulated in fluidic wells and fluidic channels are discussed in [4] and [5], respectively.
A Chitosan (CS)-graphene oxide (GO) Surface Plasmon Resonance (SPR) sensor
is explained in [6], while simple microcantilever-beam-based detection is given in
[7–9]. Since all of these methods use optical readout, they require heavy setups and
costly lab equipment. Optical readouts for microcantilever-based sensors also
have disadvantages in a microfluidic biosensor environment when the
refractive index of the liquid changes [10].
A portable system made of MEMS sensors capable of detecting multiple analytes
simultaneously in air and water is in high demand. HMI detection in the vapor
phase can be a solution for laboratory-based detection, but for a field instrument
using MEMS, the temperature cannot be raised beyond a specific limit. Hence,
microfluidic detection, which requires high sensitivity, is the only option [11, 12].
Capacitive detection has enormous potential to outperform other sensing techniques
in applications requiring high sensitivity, stability and low pressure. It offers
simple contact measurement and simple design principles, is compatible with CMOS
technology and is easily fabricated compared to other sensor types. This method does
not require a continuous bias current and is unaffected by
The relation between the capacitance (C) of the parallel plate capacitor and the applied
potential (V) is given by

C = Q / V (1)
where C = capacitance between two plates, Q = amount of charge stored, V =
applied potential.
Now, by substituting V = Ed and Q = εAE in Eq. (1), the parallel plate
capacitance (C) is given by

C = εA / d (2)
where A = area of capacitor plate, ε = permittivity and d = distance between two
plates.
Also, the capacitive sensor value with a mixed, bilayer (Air + High-k) dielectric is
theoretically given by

C = εA / (d − t(1 − 1/k)) (3)

where t is the thickness of the High-k layer and k its dielectric constant.
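Equations (2) and (3) can be evaluated numerically; the 3.9 μm air gap and k ≈ 25 for HfO2 below are assumptions chosen to reproduce the order of magnitude of the reported base capacitance, not values stated in the paper.

```python
EPS0 = 8.854e-12  # vacuum permittivity, F/m

def c_parallel(area_m2, gap_m):
    """Eq. (2): plain parallel-plate capacitance."""
    return EPS0 * area_m2 / gap_m

def c_bilayer(area_m2, gap_m, t_highk_m, k):
    """Eq. (3): gap partly filled by a High-k layer of thickness t;
    the layer shrinks the effective air gap by t*(1 - 1/k)."""
    return EPS0 * area_m2 / (gap_m - t_highk_m * (1.0 - 1.0 / k))

area = 250e-6 * 80e-6                           # L x W plate area
c_air = c_parallel(area, 3.9e-6)                # assumed 3.9 um gap
c_hfo2 = c_bilayer(area, 3.9e-6, 3.2e-6, 25.0)  # k ~ 25 assumed for HfO2
```

With the assumed 3.9 μm gap, c_air comes out near the 45.5 fF base capacitance used later in the appendix, and adding the High-k layer raises C substantially, which is the effect the design exploits.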
44 D. Rotake and A. D. Darji
2.1 Sensitivity
In the case of BioMEMS, sensor sensitivity is a very important parameter, and this value
should be maximized for sensor applications. The sensitivity (Sc) for the capacitive
sensor is calculated using the following formula:

Sc = (ΔC/C) / Pressure (Pa) (4)
The material properties used in the FEA simulation of the capacitive sensor using
COMSOL Multiphysics software are listed in Table 1.
FEA analysis of a polysilicon and silicon nitride based capacitive sensor has been
performed for different lengths and thicknesses of the microcantilever to optimize the
dimensions [11]. To improve the sensitivity of the proposed capacitive sensor, we
performed a sensitivity analysis with respect to different thicknesses of the High-k
material. We used HfO2 as the High-k material because of its compatibility
with MEMS devices and well-known foundry processes. The fabrication steps of the
proposed modified capacitive sensor with High-k are shown in Fig. 1.
Fig. 1 Fabrication steps of proposed modified capacitive sensor with High-k: a Silicon wafer <100>
b 1 μm SiO2 grown by thermal oxidation at 1000 °C c aluminum deposition, 200 nm d photoresist
(PR) coating, 1 μm e UV exposure through a mask to obtain the desired pattern f development of PR
g etching of the desired material h acetone dip for PR removal i deposition of HfO2, 3.2 μm j patterning
of HfO2 by repeating steps (d, e, f, g, h) k SiO2 sacrificial layer deposition, 0.3 μm l patterning of
polysilicon (0.475 μm) by repeating steps (d, e, f, g, h) m patterning of chrome (Cr = 10 nm) and
gold (Au = 15 nm) by repeating steps (d, e, f, g, h) n SiO2 sacrificial layer etching using Buffered
Hydrofluoric Acid (BHF) to release the microcantilever
FEA simulation of the polysilicon-based capacitive sensor without High-k has been
performed using COMSOL software. The analysis is carried out for different pressures
corresponding to HMI masses between 1 and 1000 μg as per the WHO data. The maximum
value of the change in capacitance for the finalized dimensions without the High-k dielectric is
0.3 fF for length (L) = 250 μm, width (W) = 80 μm and thickness (T) = 0.5 μm,
with a maximum displacement of 3.18 μm, for the polysilicon-based capacitive sensor
simulated in COMSOL Multiphysics [15], as shown in Fig. 2. The sensitivity of
this capacitive sensor without High-k, using Eq. (4), is found to be 8.39 μF/F/Pa (see
the appendix for the calculation).
Fig. 2 a Maximum displacement for 784.53 Pa, b pressure versus capacitance graph without High-k
dielectric, c, d variation of maximum displacement for 0.784 Pa and 784.53 Pa, respectively,
for an applied potential of 1.2 V
Incorporating a bilayer of air and High-k dielectric between the two plates of the
capacitor solves two existing problems: first, it avoids shorting between the two plates
of the capacitor, reducing the power consumption; second, it increases the
capacitance to a few hundred femtofarads, allowing the commercially available CDC
integrated circuit AD7150 (evaluation board with a 0-0.5 pF sensor input and a resolution
of 1 fF) to directly measure the capacitance in the digital domain for further
processing.
Fig. 3 a Maximum displacement for 784.53 Pa b pressure versus capacitance graph with High-k
dielectric of 0.3 μm
Fig. 4 a Maximum displacement for 784.53 Pa b pressure versus capacitance graph with High-k
dielectric of 1.3 μm
Fig. 5 a Maximum displacement for 784.53 Pa b pressure versus capacitance graph with High-k
dielectric of 2.5 μm
Fig. 6 a Maximum displacement for 784.53 Pa b pressure versus capacitance graph with High-k
dielectric of 3.2 μm
Fig. 7 Comparison of capacitance obtained using the analytical method and COMSOL simulation
with a maximum applied pressure of 784.53 Pa (see Appendix C for the theoretical calculation)
4 Conclusion
We have proposed an HfO2-based MEMS capacitive sensor for HMI detection in
air and water. Many of the cantilever-based designs studied in the literature require
costly lab instruments for characterization and processing to detect HMIs, while
today's world requires a portable, low-cost embedded system with fast and accurate
detection of contaminants. The proposed capacitive method overcomes these
problems with HfO2 as a High-k dielectric, which ultimately increases the base
capacitance of the proposed sensor to the picofarad range. This approach solves two existing
problems: first, it avoids shorting between the two plates of the capacitor, reducing the
power consumption; second, it increases the change in capacitance to a few
femtofarads, enabling use of the commercially available CDC integrated circuit AD7150
(evaluation board). The FEA analysis of the optimized capacitive sensor with HfO2
as the High-k dielectric shows a maximum capacitance variation of 3.5 fF, compared to
0.3 fF without High-k, for HMI masses between 1 and 1000 μg. The maximum value
of the change in capacitance for the optimized polysilicon-based capacitive sensor with
High-k dielectric thicknesses of 0.3 μm, 1.3 μm, 2.5 μm and 3.2 μm is 0.35 fF, 0.50 fF, 1.1 fF, and
3.5 fF, respectively. FEA analysis of the optimized capacitive MEMS sensor shows
nearly a ten-times improvement in the change in capacitance and also enables use of a
commercial CDC (embedded system) to reduce the system cost.
Acknowledgments The authors would like to thank the Ministry of Electronics and Information
Technology (MeitY), New Delhi for partial financial support through “Visvesvaraya Ph.D. Scheme
for Electronics and IT” and SMDP-C2SD project.
Appendix
Conversion Table
There are various HMIs, of which Cadmium (Cd2+), Mercury (Hg2+), Lead (Pb2+),
Arsenic (As) and Chromium (Cr2+) are hazardous; the WHO (World Health Organization)
has set limits of 3, 1, 10, 10 and 50 μg/L, respectively. Copper (Cu2+) has the largest
limit of 1000 μg/L, so from this we considered a mass variation of 1-1000 μg/L;
the conversion of mass to force is shown in Table 3.
• Here, we have selected an area (L = 250 μm and width = 80 μm) of 20000 (μm ×
μm).
• So, the number of particles that can sit in this area of the microcantilever
= 20000 (μm × μm) / (π × 150 × 150 (pm × pm)) = 2.83085 × 10^11.
• Hence, approximately we can detect mercury ions in the range of 1 pg to 1 ng.
• The sensitivity (Sc) for the capacitive sensor without High-k is calculated using the
formula:

Sc = (ΔC/C) / Pressure = (0.3 fF / 45.5 fF) / 784.53 Pa = 8.39 μF/F/Pa
• The sensitivity (Sc) for the capacitive sensor with High-k is calculated using the
formula:

Sc = (ΔC/C) / Pressure = (3.5 fF / 532 fF) / 784.53 Pa = 8.39 μF/F/Pa
• From the calculated sensitivity of the capacitive sensor with and without High-k,
it is clear that the change in capacitance is improved without affecting the sensitivity
of the sensor.
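The two sensitivity figures can be reproduced with Eq. (4) directly (small differences are rounding in the paper):

```python
def sensitivity(delta_c_ff, base_c_ff, pressure_pa):
    """Eq. (4): fractional capacitance change per pascal, in F/F/Pa."""
    return (delta_c_ff / base_c_ff) / pressure_pa

sc_without = sensitivity(0.3, 45.5, 784.53)  # no High-k layer
sc_with = sensitivity(3.5, 532.0, 784.53)    # 3.2 um HfO2 layer
```

Both evaluate to roughly 8.4 × 10^-6 F/F/Pa, confirming that the High-k layer boosts the absolute capacitance change while leaving the normalized sensitivity essentially unchanged.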
References
1 Introduction
Major areas of concern such as intelligent transportation systems (ITS) and intelligent
traffic management in smart cities are useful for economic growth and development,
improvement in road transportation and the use of information technology in road
infrastructure modification; this enables the smart city challenges in India to be met.
ITS applications and systems are implemented with a variety of domains like wireless
technology, Bluetooth, sensor-based systems, the internet of things, wireless sensor
networks, artificial intelligence, machine learning, image processing, etc.
In this paper, the domains used are the internet of things and wireless sensor
networks, to integrate the transportation system with information and mobile technology.
Modification of the current system would require the permission of government
agencies and the ministry of transportation, following laws and rules where applicable.
The intelligent transportation system has a wide range of applications which are useful
in public safety and public concern domains like smart parking or parking
guidance and information systems, intelligent traffic management systems, integrated
multi-modal transport, vehicle detection, accident notification systems, traffic
analysis, car navigation, automatic number plate recognition, traffic signals, toll
collection, bill payment, etc.
The accident notification system is one of the challenges on which researchers focus
in order to save lives. The major causes of accidents in urban areas or
construction areas are uncontrolled vehicles, drivers' ambiguous behavior,
roadside traffic, lack of rules, unavailability of traffic police, etc. In construction
areas, accidents occur both during the day and at night, so in order to avoid
accidents, workers should be notified in advance.
Accidents can be prevented if a vehicle crossing the prone area is detected and
its distance from the prone area is reported to a cloud service via the proposed
methodology. In the proposed system, the distance value is calculated using an
ultrasonic sensor configured with an Arduino Mega microcontroller. As soon as the value
is fed into the controller, it is sent to the cloud service via a NodeMCU device. To predict
the traffic, vehicle analysis helps to predict the number of vehicles crossing the
prone area considered. Accidents can also be prevented if workers working in the
prone area are warned in advance using the calculated distance.
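A sketch of the ultrasonic distance step, assuming an HC-SR04-style sensor whose echo pulse width equals the round-trip time of flight (the 200 cm alert threshold is illustrative, not taken from the paper):

```python
SPEED_OF_SOUND = 343.0  # m/s in air at ~20 C

def echo_to_distance_cm(echo_us):
    """Ultrasonic ranging: the echo pulse width (microseconds) is the
    round-trip time, so distance = t * v / 2, returned in centimeters."""
    return echo_us * 1e-6 * SPEED_OF_SOUND / 2.0 * 100.0

def vehicle_in_prone_area(echo_us, threshold_cm=200.0):
    """Flag a vehicle when it comes closer than the alert threshold."""
    return echo_to_distance_cm(echo_us) < threshold_cm
```

On the actual hardware, the Arduino would time the echo pin and forward the computed distance to the NodeMCU for upload to the cloud service.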
The accident notification system considered in this paper involves the vehicle detection
module of an intelligent transportation system application. Vehicle detection
in the accident alert module is one of the crucial tasks. In order to detect
vehicles in real time, experiments have been carried out to detect the nearest vehicle
crossing the construction prone area through real-time investigation.
Statistical Analysis of Vehicle Detection in the ITS Application … 57
Real-time vehicle detection gives the distance calculated from the prone area,
and the flow rate of vehicles helps to evaluate the traffic and avoid accidents. For
public safety, the intelligent transportation system plays an important role when there
are traffic issues or accidents at the roadside or in construction areas. In order to develop
and achieve the goals of ITS, there is vast scope for improvement in system modeling and
enhancement.
2 Literature Review
The literature review gives an idea of the current trends used in intelligent
transportation systems and their applications. The study has been carried out for vehicle
detection and its analysis in traffic evaluation and accident notification applications
in the field of intelligent transportation systems.
Road accidents and vehicle monitoring in the intelligent transportation system
are the main areas of interest. Using VANET and connected technology, vehicles have
been monitored [2]; an intelligent traffic monitoring system based on cloud computing
was considered as future scope [2].
Real-time vehicle detection and counting of vehicles using image processing can
also be achieved; data processed from video of detected vehicles has been compared
with existing algorithms [3].
Parking is one of the issues to be solved in intelligent transportation systems;
an online parking information system was developed which also performs vehicle
detection using a web application method [4].
Vehicle detection based on an anisotropic magneto-resistive sensor can be used in
traffic monitoring; the vehicle is detected with digital distance measurement for finding
its speed [5].
Vehicle detection in traffic evaluation and accident alert or accident notification
systems is one of the tasks to be performed; vehicle detection and vehicle speed have
been evaluated using an ultrasonic sensor and an Arduino microcontroller [7].
A prototype for vehicle monitoring, driver condition, fuel monitoring and location
finding was developed using a microcontroller, GPS and cloud services [8].
An internet of things based vehicle tracking system was studied to identify various
challenges for IoT-based solutions [9].
A traffic congestion system was developed to find the traffic density using the Intel
Galileo and an IR sensor; real-time monitoring of traffic was considered as a future
evaluation parameter [11].
DSRC (dedicated short-range communication) helps in sharing street lights and
roadside infrastructure with ISPs [12].
Different types of vehicle detectors for traffic monitoring, accident alert and ITS
applications have also been studied [15].
Traffic congestion, traffic management and vehicle queue management were
the focused issues in [17]; two models, i.e. TDMM (traffic density monitoring module)
and TMM (traffic management module), were proposed, which use the internet of
58 D. Vadhwani and D. Thakor
things. Real-time traffic monitoring based on the internet of things gives the
experimental results [17].
Traffic and real road monitoring has been addressed for single and multiple lanes:
vehicle detection in single and multiple lanes can be monitored in the implemented
model, and traffic acquisition based on a wireless sensor network for single-lane road
monitoring has been analyzed [21].
From the detailed literature study, several parameters were identified: vehicle speed,
vehicle distance, traffic condition, driver behavior, volume of traffic, time gap,
vehicle count, parking slot, fuel detection, street light sharing, infrastructure
sharing, etc.
Table 1 above shows the statistical analysis of vehicle detection, traffic and accidents
considering various parameters like traffic density, vehicle count, speed,
distance, time gap, accident rate, hazardous locations, etc. The literature study shows
the statistical methods implemented to find the vehicle count, the number of accidents
that occurred and traffic conditions.
In the study of the existing statistical methods, the Poisson, binomial and exponential
distributions, among others, were used for the statistical analysis of different
parameters. In order to avoid accidents at a construction site, there should be a
statistical evaluation of the vehicles crossing near that prone area.
Real-time traffic evaluation and real-time vehicle detection and notification
are the research gaps found in the above literature studies.
The statistical Poisson method [6, 10] is used for evaluating random counts, so it
was used here to evaluate the vehicle count. The existing systems were implemented
to detect vehicles or to evaluate traffic.
In this paper, we present a new system that finds the distance of vehicles crossing
near the prone area, in order to avoid accidents or crashes at the construction site
and to save the lives of the workers and vehicle drivers.
The main contributions of this paper are as follows:
• In the proposed method, the distance from the prone area is calculated using an
experimental set-up.
• This distance parameter is the random value of vehicle distance, which is used in
the statistical evaluation to find the number of vehicles crossing the prone area.
• The experimental set-up is portable.
• Internet of Things technology is used for real-time recording of the distance values.
To consider the parametric evaluation of vehicle detection for an accident
notification system in the construction area, the following section explains the
prototype evaluation of vehicle detection using the Internet of Things and cloud
services.
Statistical Analysis of Vehicle Detection in the ITS Application … 59
Table 1 Literature on statistical analysis for vehicle detection, traffic congestion and accidents

1. Probability approach for ranking high-accident locations [1]. Data sets: obtained from the traffic department in Riyadh (capital of Saudi Arabia). Statistical methods: Poisson and binomial distributions. Traffic parameters: accident rate, hazardous location. Remarks: the statistical approach estimates the probability of having a certain number of sites which can be classified as hazardous locations.
2. Distances between vehicles in traffic flow and the probability of collision with animals [6]. Data sets: an automatic detector (radar) was used to give the value for distance. Statistical methods: Poisson distribution of probabilities, negative-exponential model. Traffic parameters: volume of traffic, speed density, time gap, distance. Remarks: the simplified models have shown the road-barrier effect of certain traffic intensities on wildlife.
3. Mathematical study for traffic flow and traffic density in Kigali roads [10]. Data sets: collected from the national police of Rwanda in 2012, and from Kigali roads, specifically at roundabouts from Kigali Business Center (KBC) to Prince House. Statistical methods: Poisson distribution. Traffic parameters: traffic flow, density and speed. Remarks: traffic congestion and the number of accidents on Kigali roads are analyzed depending on the existing roads and the number of vehicles travelling in a particular place.
4. A low-cost IoT application for the urban traffic of vehicles, based on wireless sensors using GSM technology [13]. Data sets: collected using an Arduino UNO R3, a GSM/GPS module and a laser sensor. Statistical methods: Poisson and exponential distributions. Traffic parameters: vehicle count, density of traffic. Remarks: the system predicts alternatives to reduce traffic congestion, fuel waste and air pollution.
5. A semi-nonparametric Poisson regression model for analyzing motor vehicle crash data [19]. Data sets: collected from 1443 rural highway sections in the state of California, USA, from 1993 to 2002. Statistical methods: Poisson regression model (semi-nonparametric (SNP) distribution). Traffic parameters: vehicle count, density of traffic. Remarks: a random-parameter model can be developed and compared with the SNP model and the traditional NB model.
3 Proposed System
The proposed system for the notification of accidents in the construction area is
described in [18], and its prototype is implemented using an Arduino Mega 2560
microcontroller, two ultrasonic sensors and a NodeMCU Wi-Fi module. For evaluation
and analysis, the distance values generated by the sensors are recorded on the
ThingSpeak cloud service.
Figure 1 shows the proposed system used in the current prototype
implementation [18].
Fig. 1 Block diagram for proposed system for vehicle infrastructure, accident alert and mobile
application [18]
As proposed in [18], the system model alerts the workers in a construction or urban
area; the measurable distance is assumed to be 4 to 400 cm. When a vehicle is
detected near the prone area, i.e., at a distance of less than 4 cm, a warning is
sent to the workers using a buzzer; here an LED is considered as the output. The
LED output is HIGH when the distance is less than 4 cm and LOW when the distance
is more than 4 cm. The proposed system [18] is implemented by considering the first
scenario, as explained in the next sections.
Assumptions for Vehicle Detection and Analysis:
Consider the following assumptions for constructing the algorithm:
• Find the traffic zone or construction area where accidents occur frequently.
• Mount the proposed prototype in that zone.
• Install the mobile application on the user's or vehicle driver's mobile phone.
• Connect the sensors, microcontroller and web or mobile application through
Wi-Fi to the cloud service thingspeak.com.
• Assume that the proposed model may be installed externally in the prone area or
in the vehicle itself; this will be evaluated after configuration and experimentation.
• Assume a vehicle distance between 4 and 400 cm for the calculation.
• Availability of internet or Wi-Fi for communication.
Algorithm Steps for the System Model for Vehicle Detection (which can be installed
externally):
(For Vehicle Detection and Analysis)
The number of vehicles arriving in a given time interval is modeled by the Poisson
distribution:

P(n) = (λ^n · e^(−λ)) / n!

where
P(n) = probability of exactly 'n' vehicles arriving in the given time,
λ = average arrival rate (vehicles/unit time),
n = number of vehicles arriving in the specific time interval,
e = 2.71828 (constant) ('e' is the base of the natural logarithm system).
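The Poisson arrival probability described above can be sketched in a few lines of Python. This is a minimal illustration of the formula, not part of the authors' implementation:

```python
import math

def poisson_pmf(n, lam):
    """P(n) = (lam**n * e**(-lam)) / n!: probability of exactly n
    vehicle arrivals in an interval with average arrival rate lam."""
    return (lam ** n) * math.exp(-lam) / math.factorial(n)

# Example: with lam = 1.6 vehicles per unit time, the probability of
# observing exactly 2 arrivals in one interval:
p2 = poisson_pmf(2, 1.6)   # approximately 0.258
```

Summing `poisson_pmf(k, lam)` over all k gives 1, which is a quick sanity check on the formula.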
After the system was implemented, the distance values from the prone area were
recorded for 1, 2 and 3 h. All the distances from the prone area to the road were
recorded on the cloud service thingspeak.com. After the distances were measured
using the sensors, statistical analysis by the Poisson distribution method from
[6, 10] and [13] was performed on the recorded live vehicle-detection data to
predict vehicle detection, traffic conditions and accident occurrence.
4 Implementation
Fig. 2 Prototype for proposed system model with distances less than 4 cm (LED = HIGH)
Fig. 3 Prototype for proposed system model with one side distance less than 2 cm (LED = HIGH)
and another side distance is greater than 4 cm (LED = LOW)
The evaluation results show that when a vehicle crosses the prone area, its distance
is measured and the LED output responds according to the measured distance.
The measured distances are sent to the cloud service in order to warn the vehicle
drivers and the workers in the construction prone area. The distance helps to
analyze the amount of traffic in that area and the number of vehicles coming near
the prone area. These distances help in detecting the number of vehicles crossing
the prone area and the number of accidents occurring at the roadside or construction
area.
Fig. 4 Output when the vehicle or object is near, i.e., at a distance of less than
4 cm, and when the object is far, i.e., at a distance of 5 cm
The real-time data generated from the sensor and controller is sent to the cloud
service for monitoring the vehicle condition. The posted real-time values are
shown in Fig. 5.
For accident detection at the construction site, finding the vehicles crossing near
the prone area is one of the crucial tasks. When a vehicle is detected near the prone
area, the workers can be alerted using a notification device or an alarm. The proposed
system gives an effective way to find the distance of vehicles crossing near the prone
area. The number of vehicles that crossed near the prone area was computed, and the
number of vehicles passing per unit time was calculated using the Poisson distribution
method [14–20].
Here the prone area is the construction site where the workers are working, so for
the safety of the workers we need to evaluate the traffic and the vehicles crossing
near that area. To find the number of vehicles crossing near the prone area, the
distance is assumed to be 4 to 400 cm. If the vehicle distance is less than 4 cm
there is a chance of an accident; if the vehicle distance is greater than 4 cm
there may be no accident.
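The alert rule described here (LED HIGH below the 4 cm threshold, LOW otherwise) can be sketched as follows. The function and constant names are illustrative, not taken from the authors' firmware:

```python
NEAR_THRESHOLD_CM = 4    # vehicle closer than this: possible accident
MAX_RANGE_CM = 400       # upper bound of the assumed measurement range

def classify_distance(distance_cm):
    """Return the LED state for one ultrasonic distance reading."""
    if distance_cm < NEAR_THRESHOLD_CM:
        return "HIGH"    # vehicle dangerously near the prone area
    # Within range but at a safe distance, or out of range entirely:
    return "LOW"

# e.g. classify_distance(3) -> "HIGH", classify_distance(5) -> "LOW"
```

On the actual prototype this comparison would run on the microcontroller, with the reading simultaneously pushed to the cloud service for logging.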
From the real data sets collected, the vehicle count per recording period was
calculated:
Total number of vehicles arriving in one hour = 96.
Total number of vehicles arriving in two hours = 193.
Total number of vehicles arriving in three hours = 310.
For the Poisson distribution [14, 19],
E(n) = λ,
λ1 = 96 × 60/3600 = 1.6
The probability values of the vehicle arrivals calculated using the Poisson
distribution are shown in Table 2 below.
The graph of the probabilities is shown in Fig. 6.
The minimum number of people who died annually in the Indian construction sector
from 2008 to 2012 was 11,614. These estimates of the number of accidents are based
on available reliable information derived from the construction sector of the
National Capital Territory (NCT) of Delhi region using different sources [6] (Fig. 7).
Road accidents are random in time, so there should be a model to evaluate random
counts. In this paper, the Poisson distribution is proposed to compute the
probability and to model the number of vehicles arriving in the available time
duration. Since the Poisson distribution is used to describe random counts
[7, 10, 13], the probability of 'n' vehicles arriving in a given time interval, and
the probability that a certain number of vehicles arrive in a given time interval,
are computed using the Poisson distribution.
The probability of a number of vehicles arriving in unit time is computed using
the P(n) function of the Poisson distribution.
After the implementation, the results were recorded on the cloud service things-
peak.com. The prototype implementation calculates the distance using ultrasonic
sensors; the two sensors are used to measure distance values from 2 to 20 cm and
from 2 to 400 cm, respectively. After the distance values are recorded by the module,
the cloud service shows the live data recordings for 1, 2 and 3 h.
From the data collected for one hour, the vehicle count is 96 vehicles, so
λ1 = 96 × 60/3600 [19], i.e.,
λ1 = 1.6 vehicles per minute.
From the data collected for two hours, the vehicle count is 193 vehicles, so
λ2 = 193 × 60/3600 [19], i.e.,
λ2 = 3.21 vehicles per minute.
From the data collected for three hours, the vehicle count is 310 vehicles, so
λ3 = 310 × 60/3600 [19], i.e.,
λ3 = 5.16 vehicles per minute.
The hourly flow rate at the construction site is 96 vph using the proposed system
model, and the available data set shows that, out of the 96 vehicles in an hour,
29 vehicles had distances of less than 4 cm, so there is a chance of about 30%
that an accident may occur.
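The arrival rates and the 30% figure follow from simple arithmetic on the recorded counts; a quick check, with the counts taken from the text and the rate computed as count × 60/3600 following the paper's calculation (the paper truncates the results to 3.21 and 5.16):

```python
counts = {1: 96, 2: 193, 3: 310}   # vehicles recorded in 1, 2 and 3 hours

# lambda_i = count * 60 / 3600, in vehicles per minute
lams = {hours: c * 60 / 3600 for hours, c in counts.items()}

near_misses = 29                            # vehicles closer than 4 cm in one hour
accident_chance = near_misses / counts[1]   # about 0.30, i.e. roughly 30%
```

This makes explicit that the "30% chance" is simply the observed proportion of sub-threshold passes, used as an empirical accident-risk estimate.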
The work performed by the authors in [10] shows, through Poisson-distribution
analysis of accident data obtained from the national police, that the probability
of having more accidents on the roads specified in that study is very high.
The authors of [1] present a statistical approach to estimate the probability of
having a certain number of sites that can be classified as hazardous locations (a
location with at least k accidents in a defined period of time) within a certain region.
From our results, the hourly rate of vehicles crossing near the prone area was
calculated. The system also gives the distance values of the vehicles crossing near
the prone area, which are used to find the count of vehicles that may cause a
hazardous environment in the construction area. In our work, the distance parameter
was evaluated to check the number of vehicles coming near the prone area, which may
help decrease the accident rate at the construction site.
Public safety on the road and the safety of workers at construction sites in urban
areas are among the major application areas in the field of intelligent transportation
systems. The current system consists of vehicle detection for an accident alert or
accident notification system.
The Internet of Things is a research area in which things and devices can be
controlled through the web, i.e., things connected to the web. This idea helps
researchers to monitor real-time conditions. The current system is implemented
using the Internet of Things, a recent and widely studied research domain, to
monitor real-time scenarios for traffic prediction, vehicle detection and accident
alerts. The results generated from the statistics give a clear basis for evaluating
the distance of vehicles at the roadside or at the construction site in the prone
area. The real-time values give an analysis of vehicle detection at the construction
site, which will help to reduce accidents there. Finally, this will help public
safety and reduce deaths at the roadside.
To avoid road accidents at the construction site, two parameters must be computed:
the vehicles crossing near the prone area and the flow rate. In this paper, the
Poisson distribution was used to compute the flow rate of vehicles and the vehicle
arrivals in a given time interval. The results show that, even though the arrivals
are random, we can find the flow rate and evaluate the traffic at a particular time
interval. All the vehicles also have their distance values calculated using the
system model implemented in the prone area. The flow rate of vehicles is used to
evaluate the traffic at a given time interval. The number of vehicles crossing near
the prone area can give a prediction of accident occurrence: if the distance of a
vehicle is less than 4 cm there is a chance of an accident, and to avoid this the
LED output unit or an alarm notifies the workers at the construction site to be safe.
In the future, traffic-prediction algorithms will be used to reduce the traffic and
the accident rate at construction sites. Another future work is real-time data
evaluation for real-time traffic monitoring at the roadside using statistical
analysis, together with a comparative analysis of existing methods.
References
1. Al-Ghamdi AS (2000) Probability approach for ranking high-accident locations. In: Brebbia
CA, Sucharov LJ (eds), Urban transport VI. WIT Press. www.witpress.com. ISBN 1-85312-
823-6
2. Agrawal Y, Jain K, Karabasoglu O (2018) Smart vehicle monitoring and assistance using cloud
computing in vehicular Ad Hoc networks. Int J Trans Sci Technol 60–73. Publishing services
Elsevier B. V
3. Anandhalli M, Baligar V (2017) A novel approach in real-time vehicle detection and tracking
using Raspberry Pi. Faculty of Engineering, Alexandria University, Production and hosting by
Elsevier B.V
4. Bagenda D, Parulian C (2011) Online information of parking area using ultrasonic sensor
through Wifi data acquisition. In: ICon-ITSD IOP Publishing IOP conf. series: earth and
environmental science, vol 175
5. Daubaras A, Zilys M (2012) Vehicle detection based on magneto-resistive magnetic field sensor.
Electron Elect Eng 2(118). ISSN 1392–1215
6. Martolos J, Anděl P (2013) Distances between vehicles in traffic flow and the probability of
collision with animals. Transact Trans Sci 6(2)
7. Jain A, Thakrani A, Mukhija K, Anand N, Sharma D (2017) Arduino based ultrasonic radar
system using Matlab. Int J Res Appl Sci Eng Technol (IJRASET) 5(4). ISSN: 2321-9653
8. Josea D, Prasad S, Sridhar V (2015) Intelligent vehicle monitoring using global positioning
system and cloud computing. In: 2nd international symposium on big data and cloud computing
(ISBCC’15), Elsevier B.V
9. Kannimuthu S, Somesh C, Mahendhiran P, Bhanu D, Bhuvaneshwari K (2016) Certain investi-
gation on significance of internet of things (IoT) and big data in vehicle tracking system. Indian
J Sci Technol 9(39). https://doi.org/10.17485/ijst/2016/v9i39/87427
10. Idrissa K (2017) Mathematical study for traffic flow and traffic density in Kigali roads. In: World
Academy of Science, Engineering and Technology International Journal of Mathematical,
Computational, Physical, Electrical and Computer Engineering, vol 11, no 3
11. Lakshminarasimhan GV, Parthipan V, Mohammed Irfan A (2017) Traffic density detection and
signal automation using IoT. Int J Pure Appl Math 116(21):389–394
12. Ligo A, Peha J (2018) Cost-effectiveness of sharing roadside infrastructure for internet of
vehicle. In: IEEE transactions on intelligent transportation systems, IEEE
13. Nugra H, Abad A, Walter F, Galárraga F, Aules H, Villacís C, Toulkeridis T (2016) A low-
cost IoT application for the urban traffic of vehicles, based on wireless sensors using GSM
technology. In: IEEE/ACM 20th international symposium on distributed simulation and real
time applications, IEEE
14. Onoja A, Oluwadamilola A, Adewale L (2017) Embedded system based radio detection and
ranging (RADAR) system using arduino and ultra-sonic sensor. Am J Embed Syst Appl 5(1):7–
12
15. Parulekar G, Desai D, Gupta A (2009) Vehicle detect and monitor techniques for intelligent
transportation—a survey. IOSR J Mechan Civil Eng (IOSR-JMCE) 57–59
16. Prasetyo M, Latuconsina R, Purboyo T (2018) A proposed design of traffic congestion
prediction using ultrasonic sensors. Int J Appl Eng Res 13(1):434–441. ISSN 0973-4562
17. Sadhukhan P, Gazi F (2018) An IoT based intelligent traffic congestion control system for road
crossings. In: International conference on communication, computing and internet of things
(IC3IoT), IEEE
18. Vadhwani D, Buch S (2019) A novel approach for the ITS application to prevent accidents using
wireless sensor network, IoT and VANET. In: The 2019 third IEEE international conference
on electrical, computer and communication technologies IEEE ICECCT (2019)
19. Ye X, Wang K, Yajie Z, Lord D (2018) A semi-nonparametric poisson regression model for
analyzing motor vehicle crash data. Plos One. https://doi.org/10.1371/journal.pone.0197338
20. Youngtae J, Jinsup C, Inbum J (2014) Traffic information acquisition system with ultrasonic
sensors in wireless sensor networks. Int J Distribut Sensor Netw 961073:12. Hindawi Publishing
Corporation
21. Youngtae J, Jung I (2014) Analysis of vehicle detection with WSN-based ultrasonic sensors.
Sensors. www.mdpi.com/journal/sensors. ISSN 1424-8220
Qualitative and Quantitative Analysis
of Parallel-Prefix Adders
Abstract Binary adders are one of the most recurrent architectures in digital VLSI
design, and the choice of adder architecture can boost or bust the overall perfor-
mance of the design. Parallel-Prefix Adders are preferred over conventional adders
for higher wordlengths. In this paper, a comprehensive, qualitative and quantitative
analysis of popular Parallel-Prefix Adders for various wordlengths (N = 4, 8, 16
and 32) is presented. The adders are implemented using VHDL coding and Vivado
2016.2 HLx platform targeted for Basys3 board and compared on the basis of Device
Utilization, Speed and Power Consumption. Results indicate that the Kogge–Stone adder
is the fastest, with Fmax = 104.93 MHz, but is the most area- and power-inefficient,
consuming 133 LUTs and 23.756 W at 10 GHz. The Sklansky adder is the most power-
efficient, consuming 22.857 W at 10 GHz. The Brent–Kung adder is area-optimal,
consuming 62 LUTs.
1 Introduction
becomes customary for a VLSI designer to choose an architecture which offers high
speed, low power or both.
Digital hardware implementation of any signal-processing algorithm can be
mapped to three arithmetic computational blocks, viz., adders, multipliers and
shifters. Of these, adders are the most frequently occurring sub-circuit elements
in processors, ALUs, multipliers, dividers, etc. Beyond all doubt, the adder archi-
tecture can boost or bust the overall performance of the system. The performance of
traditional adders such as Ripple Carry adders, Carry Propagate adders, etc., deteri-
orates with increasing wordlength [3]. Carry Look-Ahead adders are viewed as the best
alternative, offering good speed and low power consumption beyond 32-bit
input wordlengths [4]. Variations of Carry Look-Ahead adders, collectively known
as Parallel-Prefix Adders, are potential candidates for the above-mentioned scenario.
A VLSI designer may fail to choose an apt adder if all these architectures are not
properly evaluated for the application under consideration.
In this regard, we present an exhaustive comparative analysis of various binary
adders. The most popular Parallel-Prefix Adders are the Sklansky Adder (SA),
Kogge–Stone Adder (KSA), Brent–Kung Adder (BKA), Han–Carlson Adder (HCA),
Ladner–Fischer Adder (LFA) and Knowles Adder (KA). We have thoroughly
reviewed the architectures of the aforementioned adders [5–11]. Moreover, we have
compared them on a single platform for various wordlengths so that a consensus can
be reached on the choice of adder architecture. This paper will aid any designer in
choosing an adder architecture based on the constraints of the design.
Section 2 gives a brief literature review of the Parallel-Prefix Adders, fol-
lowed by an analysis of Parallel-Prefix Adders with Boolean algebra in Sect. 3. The
experimental results obtained using the Xilinx Vivado 2016.2 HLx tool for the Basys3
board are presented in Sect. 4. Section 5 covers a detailed analysis and discussion
based on the results.
2 Literature Review
The gate-count explosion problem in adder structures was solved by Brent and Kung
[5], who proposed another regular adder structure for the computation of prefixes.
The Brent–Kung approach increases the logic depth of the adder and avoids long wires;
while still maintaining the computation-time complexity at O(2 log2 N − 1), it
achieves an area complexity of O(N log2 N). Han and Carlson [9] combined the
BKA and KSA structures and proposed an intermediate adder structure whose depth
grows in proportion to 2kN + N²/2k, where N is the number of bits and k is the
increase in depth of the hybrid adder structure. The area complexity of HCA is
O(N log2 N).
Ladner and Fischer [8] conceptualized that adders can be represented by a directed
acyclic oriented graph tree. Each node n of the graph tree represents the Boolean
product (yn = x1 x2 . . . xn, yn−1 = x1 x2 . . . xn−1, . . . , y1 = x1; n ∈ D). The
edges of the graph reflect the wiring on the integrated-circuit adder. The computation
time for parallel carry computation depends on the depth of the graph, while the
amount of hardware required for implementation is dictated by the number of nodes
in the graph. Ladner–Fischer adders [8] have a depth proportional to 2(log2 N)
and size < 4N for N-bit binary addition.
Knowles [6] established that KSA and LFA are the two extreme ends of minimal-depth
prefix adders and proposed a family of prefix adders which serve as a trade-
off in terms of speed versus power between the two extremes. Dimitrakopoulos and
Nikolos [4] proposed an adder architecture which offers reduced delay in comparison
with KA and also saves one logic level of implementation. Das and Khatri [14]
proposed an architecture which is significantly faster than the BKA, but with some
area overhead; at the same time, it offers marginally higher speed in comparison to
the KSA, but with much reduced area.
Roy et al. [15] proved that a prefix-adder architecture is most efficient when the
bit-width N is a power of 2 (size-optimality criterion) and the logic-depth level (L)
is log2 N. A detailed analysis of the impact of voltage scaling and process variation
on the efficiency of 32-bit Parallel-Prefix Adders is presented by Bahadori et al. [16].
3 Parallel-Prefix Adders
The initial preprocessing stage computes the generate signal (gi), the propagate
signal (pi) and the half-sum (di), i ∈ (0, N), defined as

gi = ai . bi (1)
pi = ai + bi (2)
di = ai ∧ bi (3)

where '.' denotes the AND operation, '+' indicates the OR operation and '∧' denotes
the XOR operation.
This stage computes the carry signals ci using the outputs gi and pi of the
preprocessing stage. A parallel-prefix circuit with inputs u1, u2 . . . uN of
wordlength N computes outputs v1, v2 . . . vN determined using an associative
operator ◦ such that v1 = u1, v2 = u2 ◦ u1, . . . , vN = uN ◦ uN−1 ◦ . . . ◦ u2 ◦ u1 [8].
The Carry Computation stage is modeled as a prefix problem by Han and Carlson [9],
which defines the association between the pairs of generate and propagate bits as

(gi, pi) ◦ (gi−1, pi−1) = (gi + pi . gi−1, pi . pi−1) (4)

where i ∈ (0, N), '.' denotes the AND operation and '+' denotes the OR operation.
Extending Eq. 4 results in a series of consecutive generate and propagate pairs
(g, p) produced from bits k, k−1, k−2, . . . , j, defined as

(Gk:j, Pk:j) = (gk, pk) ◦ (gk−1, pk−1) ◦ . . . ◦ (gj, pj) (5)

(Gk:j, Pk:j) = (Gk:i, Pk:i) ◦ (Gi−1:j, Pi−1:j) (6)

The operator ◦ can be represented as a node in a graph, and the pairs (Gi:j, Pi:j)
form the edges of the graph.
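The operator ◦ is associative, which is what allows the carries to be computed by a tree rather than a serial chain. A bit-level Python sketch (illustrative, not from the paper) makes this easy to verify exhaustively:

```python
def prefix_op(left, right):
    """(G, P) = (g_k, p_k) o (g_j, p_j) = (g_k + p_k.g_j, p_k.p_j)."""
    g_hi, p_hi = left
    g_lo, p_lo = right
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)

# Associativity check over all single-bit (g, p) combinations:
pairs = [(g, p) for g in (0, 1) for p in (0, 1)]
assert all(
    prefix_op(prefix_op(a, b), c) == prefix_op(a, prefix_op(b, c))
    for a in pairs for b in pairs for c in pairs
)
```

Because the operator is associative (though not commutative), any bracketing of the chain in Eq. 5 yields the same (G, P) pair, and the different prefix trees are exactly different bracketings.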
Equation 6 corresponds to a set of (G, P) pairs computed using combinational
circuits known as the Black cell and the Gray cell. The representation is shown in
Fig. 2. The buffers, indicated by the inverted-triangle notation, are used for data
synchronization at the outputs. At the end of this stage, the carry of the stage is
available as
ci = Gi:0 (7)
The various Parallel-Prefix Adders achieve a high speed of operation through variation
of the prefix-tree stage. In essence, the number of gray and black cells and their
arrangement (i.e., the depth of the graph and the interconnections between the cells)
dictate the speed of the design. The characteristic of Parallel-Prefix Adders is that
the carry-computation time does not increase linearly with the wordlength N,
because the carry bits are computed in parallel in the prefix-tree stage, hence the
name. At the end of the prefix carry-computation stage, the carry bits are available
at the output of the prefix tree and are fed to the post-processing stage.
76 S. Janwadkar and R. Dhavse
In the post-processing stage, the carry bits are XORed with the half-sum bits
produced in the preprocessing stage to produce the final sum bits. This stage computes
the sum bits S = sN−1 sN−2 sN−3 . . . s2 s1 s0 as [18]

si = di ∧ ci−1 (8)
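The three stages (preprocessing, prefix carry tree, post-processing) can be put together in a short behavioral model. The sketch below uses the Kogge–Stone recursive-doubling schedule as one example of a prefix tree; it is a model for checking the equations, not synthesizable code, and it follows the paper's OR-style propagate of Eq. 2 (the group-generate result is the same with an XOR propagate, since g_i = 1 whenever both input bits are 1):

```python
def kogge_stone_add(a, b, n=16):
    """Behavioral model of an n-bit parallel-prefix adder with a
    Kogge-Stone carry tree."""
    # Preprocessing (Eqs. 1-3): generate, propagate and half-sum bits.
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]    # g_i = a_i AND b_i
    p = [((a >> i) | (b >> i)) & 1 for i in range(n)]  # p_i = a_i OR b_i
    d = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # d_i = a_i XOR b_i

    # Prefix carry tree (Eqs. 4-6): recursive doubling; after
    # ceil(log2 n) levels, g[i] holds the group generate G_{i:0}.
    dist = 1
    while dist < n:
        g = [g[i] | (p[i] & g[i - dist]) if i >= dist else g[i]
             for i in range(n)]
        p = [p[i] & p[i - dist] if i >= dist else p[i]
             for i in range(n)]
        dist <<= 1

    # Carries (Eq. 7): c_i = G_{i:0} is the carry into bit i + 1.
    carry_in = [0] + g[:-1]

    # Post-processing (Eq. 8): s_i = d_i XOR c_{i-1}.
    return sum((d[i] ^ carry_in[i]) << i for i in range(n))
```

The other prefix adders (SA, BKA, HCA, LFA, KA) differ only in which (i, i − dist) pairs are combined at each level, i.e., in the shape of the tree, while the pre- and post-processing stages stay identical.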
The various Parallel-Prefix Adders, viz., SA [10], KSA [7], BKA [5], HCA [9],
LFA [8] and KA [6], differ in the architecture used in the prefix-tree stage, i.e.,
the depth of the adder and the amount of routing [12].
Various literature sources [4, 17, 19–21] describe the architectures of the Parallel-
Prefix Adders using the gray and black cells represented in Fig. 2. The 16-bit repre-
sentations of these adders are shown in Fig. 3.
board (Xilinx part number xc7a35tcpg236-1) by Digilent, Inc. Artix-7 family Field-
Programmable Gate Array (FPGA) devices are based on 28 nm high-K metal-gate
(HKMG) process technology [22]. This particular FPGA has 20,800 LUT elements,
41,600 flip-flops, 106 IOBs, 50 Block RAMs and 90 DSP units among other periph-
erals, and the FPGA core operates at a 1 V supply [23]. Functional verification (using
a test bench), synthesis and implementation were performed using the 'Xilinx Vivado
2016.2 HLx Edition'.
Each architecture has been compared on the parameters of device utilization
(reflecting area), timing (reflecting the maximum operating frequency of the device)
and power consumption. The analysis is carried out after the implementation stage,
i.e., after completion of routing, when the tool can report the exact number of logic
resources, routing resources, routing delays and the exact activity of the internal
nodes. Power analysis at this level provides the most accurate power estimation [24].
The device utilization has been captured in terms of the number of LUTs required to
implement the logic, the number of slices, the number of slice registers and the
number of IOBs. Of these, the number of logic LUTs and the number of slices are the
critical parameters for comparing the designs. The device-utilization summary, as
reported by the tool, is given in Table 1.
4.2 Timing
To compare the speed of the architectures, the comparison has been made on the total
delay of the critical path. The contribution of logic delay versus routing delay to
the total delay is also studied, in light of the fact that the architectures gain
speed either by varying the depth of the adder or by increasing the number of logic
elements performing the operation, which leads to wire congestion. The tool reports
the paths with maximum delay after the implementation stage; the logic delay, routing
delay and total delay of each path are reported [25]. Details are summarized in Table 2.
The other parameter of interest is the speed at which an architecture can work
under given constraints. Constraints were set for a minimum clock period of 10 ns and
a maximum rise-time delay of 0.1 ns at each of the inputs and outputs. Under these
constraints, the tool reports the Worst Negative Slack (WNS) and Worst Hold Slack
(WHS) [25]. WNS is the minimum of all the slacks, and hence is the slack of the
critical path [26].
WNS = min{slack(τ )} (9)
where τ is the set of timing endpoints, e.g., primary inputs and outputs to the flip-
flops.
[Table 2 Logic delay, route delay and total delay (in ns) of the critical path for each adder at N = 4, 8, 16 and 32; in the surviving fragment, the route delay grows from 1.860 ns at N = 4 to 4.8–6.4 ns at N = 32, and the route-delay share of the total delay grows from about 37% to 61–84%]
The maximum frequency at which the architecture can be operated under the given
constraints is then given by [27]

Fmax = 1 / (Actual Clock Period − WNS) (10)
WHS corresponds to the worst slack of all the timing paths for min-delay analysis
[25]. A positive or zero value of slack indicates that the design works properly at
the specified frequency; a negative value of slack is undesirable, implying that
constraints are violated [28]. The WNS and WHS slacks are summarized in Table 3.
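Equation 10 can be checked numerically. The helper below is illustrative only; the 10 ns clock constraint mirrors the one stated in the text, and the WNS value is an assumed example:

```python
def fmax_mhz(clock_period_ns, wns_ns):
    """Fmax = 1 / (actual clock period - WNS), returned in MHz."""
    return 1e3 / (clock_period_ns - wns_ns)

# With the 10 ns clock constraint, a WNS of about 0.47 ns corresponds
# to Fmax of roughly 104.9 MHz, consistent with the reported range.
fmax = fmax_mhz(10.0, 0.47)
```

This also shows why all the reported Fmax values cluster just above 100 MHz: with a 10 ns constraint, a few hundred picoseconds of positive slack moves Fmax by only a few MHz.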
The FPGA contains internal Phase-Locked Loops (PLLs) that can generate various
clock frequencies. Power analysis has been carried out considering a 10 GHz clock
frequency for switching-activity computations. A static switching probability of
0.5 and a toggle rate of 12.5% are chosen [24]. The tool reports power consumption
as an estimate of static power consumption (due to leakage), dynamic power consumption
(captured in terms of signal power, which depends on the switching activity across
signals in the design, and logic power, which depends on the switching activity in
the logic) and IO power [24, 29]. These values are reported in Table 4, as given by
the tool at a 10 GHz clock frequency. Under the clock constraints (minimum clock
period of 10 ns and maximum rise-time delay of 0.1 ns at each input and output), the
prefix adders work at a maximum frequency of 100–105 MHz. However, the Vivado 2016.2
tool reports power consumption at a 10 GHz clock frequency by default; therefore,
this frequency has been chosen for the power-consumption analysis.
5 Discussion
[Table 3 WNS, WHS and maximum operating frequency Fmax (in MHz) for each adder at N = 4, 8, 16 and 32; the surviving Fmax row ranges from 103.040 to 105.053 MHz]
[Figs. 4 and 5 FPGA device utilization (number of logic LUTs and number of slices) versus wordlength N for SA, KSA, BKA, LFA, HCA and KA]
implementations, respectively) and KA (51 and 131 logic LUTs for 16-bit and 32-bit implementations) shoot up, and they prove to be the most expensive. The explosion in gate count with the recursive-doubling hardware approach is clearly visible. BKA (30 and 62 logic LUTs for 16-bit and 32-bit implementations, respectively) is the most economical in terms of device utilization at high wordlengths, requiring less than half the resources of KSA and KA. BKA utilizes 89 slices against the 106 and 113 required for the KSA and KA implementations, respectively. Figures 4 and 5 graphically represent FPGA device utilization in terms of the number of logic LUTs and the number of slices, respectively.
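The recursive-doubling growth can be reproduced with a small behavioral model. The sketch below (an illustrative Python model, not the VHDL used in the paper) performs a Kogge-Stone addition via generate/propagate prefix merging and counts the prefix combine cells, which grow as N·log2 N − N + 1, whereas a Brent-Kung network needs only roughly 2N such cells:

```python
def kogge_stone_add(a: int, b: int, n: int = 16):
    """Kogge-Stone addition of two n-bit numbers, counting prefix operators."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]   # generate bits
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(n)]   # propagate bits
    p0 = p[:]                     # keep the original propagate for the sum
    ops = 0                       # number of (g, p) combine cells used
    d = 1
    while d < n:                  # log2(n) recursive-doubling levels
        ng, np_ = g[:], p[:]
        for i in range(d, n):     # n - d combine cells at this level
            ng[i] = g[i] | (p[i] & g[i - d])
            np_[i] = p[i] & p[i - d]
            ops += 1
        g, p = ng, np_
        d *= 2
    carries = [0] + g[:-1]        # carry into each bit position (cin = 0)
    s = sum((p0[i] ^ carries[i]) << i for i in range(n))
    return s, ops
```

For n = 16 this uses 49 combine cells and for n = 32 it uses 129, mirroring the LUT blow-up observed for KSA and KA at higher wordlengths.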
5.2 Timing
From Table 2, it is clearly evident that the total delay does not increase linearly with wordlength, unlike conventional adders such as the ripple-carry adder and the carry-propagate adder. It can also be inferred that, irrespective of the architecture style, at higher wordlengths the routing delay supersedes the logic
delay. For 32-bit adders, net delay accounts for about 60–80% of the total delay. At lower wordlengths (N = 4 and N = 8), the total delay for each of the architectures is reported to be the same. However, at higher wordlengths (N = 16 and N = 32), the total delay of BKA shoots up (9.541 ns for the 32-bit adder). The total delay is graphically represented in Fig. 6.
Considering the maximum frequency at which the parallel-prefix adders can be operated under the constraint of a 10 ns clock period, KSA and KA (104.932 MHz and 104.789 MHz, respectively, for the 32-bit adder implementations) are the obvious best choices, closely followed by LFA (104.822 MHz). The results are graphically represented in Fig. 7.
From Table 4, it can be concluded that, irrespective of the architecture, the static power consumption remains fixed for a given wordlength. This is justified, as static power
Qualitative and Quantitative Analysis of Parallel-Prefix Adders 85
[Figure residue: power consumption of parallel-prefix adders at f = 10 GHz (in W), versus wordlength N; legend SA, KSA, BKA, LFA, HCA, KA]
depends on leakage, and for a given wordlength N the leakages are similar. Figure 8 represents the static power dissipation.
However, for various Parallel-Prefix Adder architectures, dynamic power con-
sumption varies. Dynamic Power dissipation consists of signal power and logic
power components. Depending on the logic used in the architecture and the amount of
switching activity based on wiring, the dynamic power varies. Comparisons are made
on the basis of dynamic power consumption. KSA (Dynamic Power Consumption
= Signal Power (1.886 W) + Logic Power (0.786 W) = 2.672 W) and KA (Dynamic
Power Consumption = Signal Power (1.852 W) + Logic Power (0.805 W) = 2.657 W)
are power-hungry architectures at 32-bit wordlength. SA is the most power-efficient
(Dynamic Power Consumption = Signal Power (0.483 W) + Logic Power (0.186 W)
= 0.669 W for 16-bit and Dynamic Power Consumption = Signal Power (1.475 W)
+ Logic Power (0.405 W) = 1.88 W for 32-bit implementations) adder architecture.
The results are depicted graphically in Fig. 9.
IO power doesn’t vary much with the architecture style. The IO power consump-
tion by various architectures is represented in Fig. 10.
[Fig. 10: IO power consumption of parallel-prefix adders at f = 10 GHz (in W), versus wordlength N; legend SA, KSA, BKA, LFA, HCA, KA]
[Further figure residue: a second power chart at f = 10 GHz (in W) with the same legend and wordlength axis]
6 Conclusion
Acknowledgements The authors express their gratitude to the Special Manpower Development Program for Chips to System Design (SMDP-C2SD) project under the Ministry of Electronics and Information Technology, Government of India, for providing access to the tools required to conduct this research work.
References
1 Introduction
In Indian culture, the word Vedas means "the books of wisdom". The Vedas explain many aspects of life, including education and engineering. Vedic mathematics is also a part of the Vedas; it explores the methods used in mathematics for different applications in the form of Vedic sutras and upasutras. There are 16 sutras (formulae) and 13 subsutras (derived formulae) in Vedic mathematics, as expounded by Bharati Krishna Teertha Maharaj [1]. In applications such as amplifiers, increasing the brightness of an image, amplifying speech, etc., the operation performed is scaling by a number.
EDA tools such as Xilinx ISE and Altera Quartus II are powerful tools that analyze a design for different parameters such as area (in terms of the number of slices or LUTs), delay and power dissipation. Most researchers have implemented the multiplier on an FPGA with VHDL or Verilog. A technique for the implementation of general N × N multiplication is proposed in [2]; this general method applies to 4-, 8-, 16- and 32-bit multiplication. Vedic mathematics can be used for the decimal number system [3] and the binary number system [4], with ASIC implementations that improve multiplier performance. The Vedic multiplier has been successfully implemented for applications such as the discrete wavelet transform [5], block convolution [6], cryptography [7], and squaring and cubing [8], with improved propagation delay to obtain multiplication results quickly. There is very little literature on the Ekanyunena Purvena sutra of the Vedic multiplier, as it is a special case of multiplication. A multiplier using the Ekanyunena Purvena sutra has an integer multiplicand, and the multiplier is always 9 or an array of 9s (for example 9, 99, 999, etc.) [9].
This paper presents BCD and binary multipliers. The use of BCD numbers in display applications gives great simplification, because each BCD digit can be treated separately and maps directly onto one of a series of identical 7-segment displays. If the numeric quantity were stored and manipulated as pure binary, interfacing to such a display would require complex circuitry. Therefore, in cases where the calculations are relatively simple, working throughout with BCD can lead to a simpler overall system than converting to binary.
Example 1: 4 × 9
Step 1: LHS of the result: 4 − 1 = 3
Step 2: RHS of the result: 9 − 3 = 6
Step 3: Combined result: 4 × 9 = 36

Example 2: 34 × 99
Step 1: LHS of the result: 34 − 1 = 33
Step 2: RHS of the result: 99 − 33 = 66
Step 3: Combined result: 34 × 99 = 3366

Similar multiplication steps are followed for binary-coded decimal (BCD) numbers and binary numbers, in which the maximum digit is 1001 (decimal 9) and 1111 (decimal 15), respectively.
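The steps above can be checked with a short Python sketch (an illustrative model, not the VHDL implementation): the left part of the result is the multiplicand minus one, and the right part is the all-nines multiplier minus the left part:

```python
def ekanyunena(x: int, k: int) -> int:
    """Multiply x by a k-digit all-nines number (9, 99, 999, ...)
    using the Ekanyunena Purvena sutra: two subtractions, no multiply."""
    nines = 10 ** k - 1            # the multiplier: 9, 99, 999, ...
    left = x - 1                   # step 1: multiplicand minus one
    right = nines - left           # step 2: all-nines minus the left part
    return left * 10 ** k + right  # step 3: concatenate left and right
```

ekanyunena(4, 1) reproduces 36 and ekanyunena(34, 2) reproduces 3366, matching 4 × 9 and 34 × 99.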
The multiplier is designed with the Ekanyunena Purvena method for the BCD as well as the binary number system. The designed multiplier is implemented in Xilinx ISE with VHDL.
Let the multiplier be 1001 (decimal equivalent 9) and the multiplicand be any integer x; the multiplication is represented as

x × 1001 = x(10 − 1) = 10x − x (1)

The multiplication can be done with one multiplication by 10 and a subtraction of the multiplicand from the result. In the proposed method, the following approach is used for the implementation:

x × 1001 = 10x − 10 + 10 − x = 10(x − 1) + (10 − x) (2)
92 M. A. Sayyad and D. N. Kyatanavar
The first term gives the upper digit of the result and the second term gives the lower digit. The result can therefore be generated with only two subtractions. For the upper digit, subtract 1 from the multiplicand, which is implemented by adding 1111 to the multiplicand and neglecting the carry. To get the lower digit, binary subtraction is used. The scheme for the implementation is shown in Fig. 1.
Let the multiplier be 1111 (decimal equivalent 15) and the multiplicand be any integer x; the multiplication is represented as

x × 1111 = x(2^4 − 1) = 2^4·x − x (3)

The multiplication can be done with one multiplication by 2^4 and a subtraction of the multiplicand from the result. In the proposed method, the following approach is used for the implementation:

x × 1111 = 2^4·x − 2^4 + 2^4 − x = 2^4(x − 1) + (2^4 − x) (4)
The first term represents the upper 4 bits of the result and the second term represents the lower 4 bits. The result can be generated with only two subtractions. For the upper 4 bits, subtract 1 from the multiplicand, which is implemented by adding 1111 to the multiplicand and neglecting the carry. To get the lower
A Novel Method of Multiplication with Ekanyunena Purvena 93
4 bits, binary subtraction is used. The scheme for the implementation is shown in Fig. 2. For a 4 × 4 bit multiplier, the multiplication requires one 4-bit adder and one 2's complement circuit. The 4-bit adder uses 1 half adder and 3 full adders, whereas the 2's complement circuit consists of 4 half adders.
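The hardware steps for the binary case can be mirrored bit-for-bit in Python (a behavioral sketch under the paper's assumption that the multiplicand x is a nonzero n-bit integer; the function name is illustrative):

```python
def ekanyunena_binary(x: int, n: int = 4) -> int:
    """Multiply x by the all-ones multiplier 2**n - 1 (e.g. 1111 = 15)
    with two subtractions, as in the proposed hardware."""
    mask = (1 << n) - 1          # 1111...1, the multiplier itself
    upper = (x + mask) & mask    # add 1111 and neglect the carry: x - 1 mod 2**n
    lower = (-x) & mask          # two's complement of x: 2**n - x
    return (upper << n) | lower  # upper n bits, then lower n bits
```

For x = 5 and n = 4 this yields (4 << 4) | 11 = 75 = 5 × 15; the same two-subtraction structure scales to the 16-bit multipliers reported in Table 1.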
The proposed multiplier is implemented with Xilinx ISE in VHDL and synthesized using Xilinx XST. The synthesized circuits for the 16-bit multiplier are shown in Fig. 3a, b for the proposed BCD-number and binary-number multipliers, respectively. All the designed multipliers were simulated using the ISim simulator; the simulation result of the 16-bit proposed Ekanyunena Purvena multiplier with binary numbers is shown in Fig. 4.
Fig. 3 16 × 16 bit Ekanyunena Purvena multiplier BCD numbers and binary numbers
Synthesis of the proposed Ekanyunena Purvena multiplier was done using Xilinx ISE 14.7, and the report on the number of slices, the number of 4-input LUTs and the delay is presented in Table 1. For the calculation of power dissipation, the XPower Analyzer tool is used; the switching-activity data can be written at the time of simulation after place and route. This switching-activity data file is used by XPower Analyzer to calculate the total power. The power calculations help to optimize power during design. In our design, the simulation inputs are kept the same for the array and Vedic multipliers so that dynamic power affects the circuits equally. The total power dissipation of each multiplier is shown in Table 1.
Note: Table 1 contains the results of the 16-bit array multiplier and UT Vedic multiplier from [10], which is our own work.
5 Conclusion
This paper presents the design approach and architecture for multiplications in the BCD as well as the binary number system. The proposed architecture for multiplication of 16-bit BCD numbers is 62% and 47% more delay-efficient than the array multiplier and the Vedic UT multiplier, respectively. It is not more efficient in propagation delay than the architecture presented in [9], but the number of LUTs required is reduced by 11.86%. The proposed method for 16-bit binary multiplication is very efficient; it improves the propagation delay by 80% and 71% with respect to the array multiplier and the Vedic UT multiplier, respectively. The proposed method for 16-bit binary multiplication is 45.9% more delay-efficient and 30% more area-efficient (in LUTs) than the architecture proposed in [9]. This method is applicable to a special class of multiplication in which the multiplier is fixed and is the maximum number in that number system, and it is very useful for scaling by the maximum number.
References
1. Swami Bharati Krsna Tirtha (1965) Vedic mathematics. Motilal Banarsidass publishers, Delhi
2. Akhter S (2007) VHDL implementation of fast nxn multiplier based on vedic mathematic. In:
18th European Conference, IEEE circuit theory and design ECCTD 2007, pp 472–475
3. Saha P, Banerjee A, Dandapat A, Bhattacharyya P (2012) Design of high speed vedic multiplier
for decimal number system. In: Proceedings of 16th international symposium, VDAT 2012
Shibpur, India, pp 79–88
4. Jagannatha KB, Lakshmisagar HS, Bhaskar GR (2014) FPGA and ASIC implementation of
16-bit vedic multiplier using Urdhva Triyakbhyam Sutra. In: Proceedings of International
Conference, ICERECT 2012, Lecture Notes in Electrical Engineering, vol 248, Springer, India,
pp 31–38. https://doi.org/10.1007/978-81-322-1157-0_4
5. Tripathi S, Singh AK (2015) High speed and area efficient discrete wavelet transform using
vedic multiplier. In: International conference on computational intelligence and communication
networks (ICCICN 2015), pp 363–367
6. Hanumantharaju MC, Jayalaxmi H, Renuka RK, Ravishankar M (2007) A high speed block
convolution using ancient indian vedic mathematics, In: International conference on compu-
tational intelligence and multimedia applications (ICCIMA 2007), Sivakasi, Tamil Nadu, pp
169–173. https://doi.org/10.1109/ICCIMA.2007.332
7. Leonard Gibson Moses S, Thilagar M (2011) VHDL implementation of high performance
RC6 algorithm using ancient indian vedic mathematics. In: 3rd international conference IEEE
electronics computer technology (ICECT), vol 4, pp 140–143
8. Ramalatha M, Thanushkodi K, Dayalan D, Dharani P (2009) A novel time and energy effi-
cient cubing circuit using vedic mathematics for finite field arithmetic. In: 2009 International
conference on advances in recent technologies in communication and computing, Kottayam,
Kerala, pp 873–875. https://doi.org/10.1109/ARTCom.2009.227
9. Khan A, Das R (2015) Novel approach of multiplier design using ancient vedic mathematics.
Adv Intell Syst Comput. https://doi.org/10.1007/978-81-322-2247-7_28
10. Sayyad MA, Kyatanavar DN (2017) Optimization for addition of partial product in vedic multi-
plier. In: 2017 International conference on computing, communication, control and automation
(ICCUBEA), Pune, pp 1–4
FPGA Design of SAR Type ADC Based
Analog Input Module for Industrial
Applications
Abstract Programmable logic controllers (PLCs) are connected with analog and digital input and output modules to process physical variables that are to be maintained at desired values. Each module has a processor to process analog or digital signals. This paper designs an analog input module (AIM) using a field programmable gate array (FPGA). It uses a digital-to-analog converter (DAC), a comparator and an FPGA for a single-channel analog-to-digital converter (ADC). It becomes an analog input module (AIM) when a DAC and a comparator are added to the FPGA for every additional channel. The conversion time of a processor-based AIM is n·tc, where 'n' and 'tc' are the number of channels and the conversion time of the ADC, respectively. The conversion time of an 'n'-channel FPGA-based AIM is 'tc', as all the analog signals are processed concurrently. The design was verified using Multisim software, and the conversion time of the eight-channel ADC was found to be 0.13 ms.
1 Introduction
The concept of Industrial automation emerged from pneumatic system. After the
invention of electric system, relay logic enhanced industrial automation. Nowa-
days, modern industries use programmable logic controller (PLC) that reduces the
complexity of relay logic. PLC architecture has a processor, memory and input/output
interface. Program and data are stored in separate memory. The function of the PLC
is to execute the program available in memory. Ladder diagram (LD) representation
is the familiar way of programming the PLC. A typical LD will have many rungs.
PLC reads the rungs from left to right and top to bottom. Output of one rung can be
G. Dhanabalan (B)
Sri Vidya College of Engineering and Technology, Virudhunagar, India
e-mail: [email protected]
T. Murugan
Sethu Institute of Technology, Kariapatti, India
e-mail: [email protected]
also used as input(s) in other rung(s). Also, many rungs may share the same input(s). Each rung generates its output based on the satisfaction of its input conditions. Inputs to the rungs are accessed from the input modules attached to the PLC. Input modules acquire the field signals and store them in memory. Whenever the PLC executes the rungs, it refers to the memory of the input module to access the necessary input signals [1, 2]. The PLC executes the LD rung by rung and is hence sequential. The PLC stores the results of the execution of the LD in the memory of the output modules. A processor in the output module transfers these output signals to the field. Thus the PLC communicates with analog input/output or digital input/output modules to read or write data from or to the external world [3]. The number of rungs in the LD and the number of inputs and outputs used in the rungs impact the scan time of the PLC. The scan time of a PLC is the cumulative time of analog and digital signal scanning, execution of the ladder-logic program and storage of the outputs in memory [4]. The scan time varies according to the number of inputs and outputs connected to the PLC [5].
The efficiency of an automation system established through a PLC is based on the ability of the PLC to respond to change(s) in the process within a short period of time. The scan time of the PLC dictates the term "short period of time". Therefore, it is necessary to optimize the scan time so that the PLC can respond even when changes happen at the microsecond level. Reducing the number of rungs
is one way of reducing scan time. The total number of rungs in a ladder diagram
(LD) depends upon the logic. Logic optimization can help to reduce the number
of rungs. Reducing the number of digital and analog signals processed by the PLC
is another way of reducing the scan time. PLC processes digital signals through
digital input module (DIM). Processing speed of DIM is at satisfactory level as the
processor in the DIM acquires the digital signals as such. Therefore, modification in
DIM does not impact the reduction in scan time. Analog signals are processed by
analog input module (AIM). AIM acquires analog signals of physical parameters like
pressure, level, and temperature [6] and outputs them as digital signals. Normally,
eight to sixteen analog signals are connected with an AIM. It has a processor in its
architecture. Since the processor cannot read analog signal, it must be converted into
digital using analog to digital converter (ADC).
The processor in the AIM activates a multiplexer to choose a channel, enables the start-conversion pin of the analog-to-digital converter (ADC) and stores the digital output in memory after identifying the end-of-conversion signal from the ADC. It repeats the process for the remaining channels. If 1 µs is assumed as the time to convert one analog signal into digital, the conversion time for eight analog signals will be 8 µs. This indicates that the conversion time of an AIM depends on the total number of analog signals to be converted. The processor also consumes time to store the ADC output in memory. If 1 µs is the storage time for one analog signal, the total time required by the AIM to process eight analog signals is 16 µs. Thus the conversion time increases with the number of analog signals, which greatly affects the scan time of the PLC. An increase in scan time can otherwise be viewed as a drop in the quality of a product. Therefore, a reduction in the conversion time of the AIM will reduce the burden on the PLC. Also, the PLC is actually connected with more
FPGA Design of SAR Type ADC Based Analog … 99
than one AIM, and the time taken by the PLC to retrieve data from all the AIMs brings down its efficiency.
The AIM can otherwise be viewed as a multichannel ADC, as it processes eight to sixteen analog signals. An elaborate survey reveals that much attention has been given to the design and applications of multichannel ADCs. The use of a multichannel ADC in gamma-ray spectroscopic measurement is discussed in [7]. It proved that the multichannel ADC performs better in gamma-ray spectroscopic measurement by comparing its performance with that of a single-channel ADC. Nonlinearity, energy resolution and timing resolution were considered as the parameters for the performance analysis. The design of a multichannel ADC for medical applications is discussed in [8]. It used a field programmable gate array (FPGA) to establish a data-acquisition system. An FPGA-based motor controller has used an ADC in its design [9]. The FPGA reads the ADC output and initiates control action to ensure that the motor speed is always at the desired value. The algorithm for a PI controller was implemented
in the FPGA so that it acts as a PI controller. A case study on a neural-network-based electronic nose has also been discussed in [9], which used LabVIEW to establish a user interface. The design of an educational laboratory has used a multichannel ADC to acquire signals from various sensors [10]. The development of a prototype model for multichannel neuronal recording has used a multichannel ADC [11]. It has also used an
FPGA to transfer data to the host computer. One of the major observations from this survey is that the input range of the ADC is well below 5 V in almost all the works. Actually, the output of an industrial-standard transmitter, which generates a signal for a physical parameter like pressure, temperature, etc., is 0–5 V. The survey has also indicated that multichannel ADCs have used a microprocessor to perform the conversion.
This work has developed an FPGA-based AIM that reduces the total conversion time of the AIM. FPGA-based solutions to industrial problems are numerous [9, 12]. The FPGA performs simultaneous conversion of all the analog signals and stores the data in FPGA RAM in a single clock cycle [13]. Thus the conversion time for 'n' channels equals the conversion time for a single channel. This FPGA-based AIM can be used as a standalone system for an application; otherwise, it can be used along with the PLC.
The conventional AIM architecture has a multiplexer-type ADC controlled by the processor, as shown in Fig. 1. Whenever an analog signal is converted into digital, the processor reads the ADC output and stores it in random access memory (RAM) [14]. The method by which the analog signal must be converted into digital depends on the type of application. Many industrial ADCs employ the successive approximation register (SAR) type. This work has developed an FPGA-based AIM based on the working principle of the SAR-type ADC. In a SAR-type ADC [15], the most significant bit (MSB) of the SAR register is set and the remaining bits are reset. A digital-to-analog converter (DAC) converts the SAR content into an analog signal, and a comparator compares it with the analog signal
100 G. Dhanabalan and T. Murugan
that is to be converted into digital. The DAC output is connected to the non-inverting terminal of the comparator and the analog input signal is connected to the inverting terminal. If the output of the DAC is greater than the value of the analog input signal, the MSB of the SAR register is reset; otherwise, the SAR content is not disturbed. Next, the adjacent bit towards the least significant bit (LSB) is set and the above process is repeated until all the SAR bits have been set and compared. An 8-bit ADC consumes eight clock pulses to complete the conversion. The content of the SAR register at the eighth pulse is treated as the equivalent digital value of the analog input signal.
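The successive-approximation loop just described is a binary search; the following Python sketch is a behavioral model of an n-bit SAR conversion (the 5 V reference and the ideal DAC are illustrative assumptions, not the Multisim components used later):

```python
def sar_adc(vin: float, vref: float = 5.0, n: int = 8) -> int:
    """Behavioral model of an n-bit SAR ADC."""
    code = 0
    for bit in range(n - 1, -1, -1):    # MSB first, one bit per clock
        trial = code | (1 << bit)       # tentatively set the current bit
        vdac = trial * vref / (1 << n)  # ideal DAC output for the trial code
        if vdac <= vin:                 # comparator: keep the bit if DAC <= input
            code = trial                # retain it, otherwise leave it reset
    return code
```

sar_adc(2.5) returns 128 after eight trials, the MSB-only code for half of full scale.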
The conversion time of a single channel SAR type ADC is given in Eq. (1) as
tc = tclk × n (1)
where tc is the conversion time, tclk is the time duration of the clock pulse applied to the ADC and 'n' is the number of bits of the ADC output. Thus, the conversion time of an 'N'-channel ADC is

tcN = N × tc (2)

where tcN is the conversion time of 'N' channels. It is clear that the conversion time
of a single channel gets multiplied by the total number of channels. For example, the conversion time of an 8-channel or 16-channel ADC will be 8 µs or 16 µs, respectively, if the conversion time of a single-channel ADC is 1 µs.
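The scaling in Eqs. (1) and (2), and the advantage of concurrent conversion, can be stated as a one-line comparison (a hypothetical helper for illustration):

```python
def aim_conversion_time_us(n_channels: int, t_c_us: float,
                           concurrent: bool = False) -> float:
    """Sequential (multiplexed) AIM: tcN = N * tc, per Eq. (2).
    Concurrent (FPGA-based) AIM: tcN = tc, independent of channel count."""
    return t_c_us if concurrent else n_channels * t_c_us
```

With tc = 1 µs, eight multiplexed channels take 8 µs, whereas the FPGA-based AIM still takes 1 µs.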
The processor can deal the ADC output in the following way.
• Processor reads the ADC output and initiates conversion for the next channel.
(or)
• The processor reads the buffer register, which holds the digital output of the earlier sample, while the ADC performs conversion of the present sample. This leads to a pipelining concept, so that reading and conversion happen simultaneously.
The ADC in the AIM slows down the processor execution speed, as its conversion time is high compared to the processor clock time. Effective utilization of the processor
is possible only when the conversion time of the ADC matches the processor execution time. The rate at which a particular signal is accessed by the processor depends on the following time constraints.
• The time required by the processor to select an analog signal using multiplexer.
• Conversion time of ADC.
• The time required to store ADC output in RAM.
Under these constraints, the time required to sample the nth signal will be the time that meets the above constraints multiplied by 'n'. This makes the sampled signal a dead signal instead of a live one. A reduction in conversion time will reduce the sampling time, which in turn increases the number of samples per second. If the conversion time can be reduced to the processor clock pulse, then the sampling time will be the conversion time plus the time required to store the ADC output in memory. The sampling time can be further reduced if the ADC output is directly stored in the RAM.
The operating speed of a processor is always less than its clock speed. Multitasking can be achieved either by sharing the processor's clock time equally among all the tasks or by assigning one processor to each task. In the time-sharing concept, tasks are assigned on the basis of priority. This is not true multitasking, as only one task is executed at a time. Assigning one processor to each task is real multitasking, which is not possible in a practical scenario. An FPGA is a reconfigurable device that has configurable logic blocks (CLBs) in it [16]; a CLB is a combinational logic circuit that processes the set of inputs applied to it. A portion of the output is fed back to the CLB as input so that the logic of a sequential circuit can be realized. A digital circuit designed using a hardware description language (HDL) can be simulated, verified and converted into a bit file by synthesis software so that it can be downloaded into the FPGA. Concurrent execution of any combinational or sequential circuit by the FPGA is ensured by assigning individual FPGA hardware resources to it. Thus the delay time of a circuit is actually the propagation delay of the FPGA. This concurrent execution nature of the FPGA [5] is effectively utilized in the design of the AIM.
Block RAM and distributed RAM are distinct features of the FPGA. These memories are available on the chip itself; hence, data transfer happens at a faster rate. They also provide the feature of storing or accessing more than one data item at a time. The block RAM capacity of the SPARTAN-3 generation FPGA chip XC3S500E is 360 Kb, divided into 20 block RAMs; hence, the capacity of each block RAM is 18 Kb. All 20 block RAMs can be accessed in a single clock pulse, as the memory operation is synchronous. The latest FPGA technology is flexible enough that either the number of block RAMs or the memory size of each block RAM is reconfigurable. This work has proposed the design of the ADC by considering the following aspects.
102 G. Dhanabalan and T. Murugan
• Let the working principle of ADC be successive approximation register type. This
helps to design concurrent conversion of more than one analog signals into digital
when combined with FPGA.
• The design must provide better performance even with the lower configuration of
the devices.
• Consider the DAC and the comparator that are already available in the market.
• Design the comparator using operational amplifier.
The processor and the RAM of the conventional AIM are replaced by an FPGA in the proposed single-channel ADC, as in Fig. 2. Control logic implemented in the FPGA sets the MSB of the SAR register. The DAC converts the SAR content into an analog signal, which is treated as the reference signal for the purpose of comparison. A comparator designed using an operational amplifier compares the reference signal with the analog signal that is to be converted into digital. The reference signal and the analog signal are connected to the inverting and non-inverting inputs of the comparator, respectively. The FPGA reads the comparator output and decides whether to reset or retain the MSB of the SAR register. The MSB must be retained when the value of the reference voltage is less than that of the analog signal. Therefore, the FPGA retains the MSB when the comparator output is high and resets it when it is low.
If the input connections to the comparator are reversed, then the condition for retaining or resetting the MSB is also reversed. Thus the FPGA is flexible, as the condition of reversal can be modified in the hardware description language (HDL) code. Next, the FPGA sets the next most significant bit and the above process is repeated till the status of the LSB is decided. The content of the SAR register is then equivalent to the digital output for the analog signal applied at the non-inverting input of the comparator.
The latest FPGAs provide storage in the form of block or distributed RAM. The design was tested for an 8-bit ADC. As the SAR register is referred to eight times during conversion, the design requires eight clock pulses to complete the conversion, and the data in the SAR at the eighth clock pulse is the digital output. The SAR register is reset on the ninth pulse and the conversion starts again from the next pulse. The conversion time for the single-channel ADC is calculated from Eq. (3) as
tc = n × tset + tr + tpd (3)
where tc is the conversion time, 'n' is the number of bits of the ADC output, tset is the settling time of the DAC, tr is the response time of the comparator and tpd is the propagation delay of the FPGA logic. tset will be greater than tr and tpd, as the settling time of the operational amplifier used in the DAC is high.
Field programmable gate array analog input module (FPGA AIM). An industrial automation system deals with a large number of analog signals. Circuit complexity increases when the PLC has to acquire digital data directly from many ADCs. The total number of modules that can be attached to the PLC is based on the process conditions. The PLC communicates with these AIMs through a common communication bus system. In the AIM, a set of analog signals is grouped together to perform conversion for multiple analog signals. The digital data is transferred to the PLC through a suitable data-transfer technique. However, this work has focused on the conversion process only.
The concept of the FPGA-based AIM has been developed using multiple DACs and comparators, as shown in Fig. 3. Every channel is assigned a SAR register, a DAC and a comparator. This ensures that all the analog signals are simultaneously converted into digital. Thus the total number of DACs and comparators used in the design equals the total number of analog signals to be converted. However, only one FPGA handles the conversion for all the analog signals. Since the FPGA can directly communicate with all the DACs and comparators simultaneously, there is no need for a multiplexer.
As a set of DAC and comparator is used for each channel, the conversion time for
‘n’ channel is the conversion time of a single channel only.
The entire circuit was designed, simulated and verified using Multisim software. A general-purpose 8-bit DAC (DIG2ANACON8) was used to generate the analog signal for the applied 8-bit digital input. The LTC6240CS8 [17], a complementary metal-oxide-semiconductor (CMOS) operational amplifier, was used to design the comparator. The analog signal to be converted into digital was connected to the non-inverting input of the comparator, and the output of the DAC was connected to the inverting terminal. The operating frequency of the FPGA was set to 60 kHz. The normal procedure to download a digital design into an FPGA is to write an HDL description equivalent to the hardware circuit, simulate the program, verify that the HDL description generates
expected output, synthesize the program to convert the HDL description into a binary file, and download it into the FPGA. In the case of Multisim, the HDL description for the single-channel ADC is developed as hardware logic itself. Multisim has an option to convert this hardware logic into a VHSIC (Very High Speed Integrated Circuit) Hardware Description Language (VHDL) description. The FPGA was connected with the other necessary components to realize the single-channel ADC, as in Fig. 4.
The FPGA logic that performs the ADC conversion, developed using Multisim, is shown in Fig. 5. It uses a 4-bit synchronous counter, a 3-to-8 decoder, 2:1 multiplexers, and D-latches; their respective identifications (U1, U4, U6, U8) are also shown in the figure. Since the design is for an 8-bit ADC, eight multiplexers and eight D-latches were used.
The eight bits of the decoder output were connected as selector lines to the eight multiplexers. The D-latches receive their inputs from the multiplexers, and their outputs are connected to the SAR register. Each multiplexer plays a key role in setting or resetting a specific bit of the SAR register; its two inputs are the comparator output and logic '1'. Multiplexer 1 passes logic '1' to set the MSB of the SAR register when the counter value is one. At the next count value, multiplexer 1 passes the comparator output while multiplexer 2 passes logic '1' to the SAR register.
This process continues for eight clock pulses. The counter is reset once the count value reaches eight. Actually, a 3-bit synchronous counter would be sufficient
FPGA Design of SAR Type ADC Based Analog … 105
as the ADC specification is 8-bit, but this design uses a 4-bit counter to identify the end of the eighth clock pulse. The MSB of the counter is XORed with logic '1' to reset the counter. The complete circuit was simulated and verified to produce the expected output. The VHDL description equivalent to the hardware circuit designed in the FPGA is generated automatically when "Generate and save VHDL files" (shown in Fig. 6) is selected from the "Export to PLD" menu option.
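The set-then-test sequence described above can be sketched as an idealized behavioral model in Python (a software illustration only; `sar_convert` and its parameter names are chosen here, not part of the Multisim design):

```python
def sar_convert(vin, vref=5.0, n_bits=8):
    """Idealized SAR conversion: set each bit (the multiplexer routes
    logic '1'), then keep or clear it based on the comparator test."""
    code = 0
    for i in range(n_bits - 1, -1, -1):      # MSB first, one bit per clock
        code |= 1 << i                        # set the trial bit
        vdac = code * vref / (1 << n_bits)    # DAC output for the trial code
        if vdac > vin:                        # comparator: DAC above input
            code &= ~(1 << i)                 # clear the bit
    return code

print(sar_convert(4.0))  # → 204
```

For an ideal 0–5 V, 8-bit converter this gives code 204 for a 4.0 V input; the code reported from the Multisim simulation can differ slightly due to DAC and comparator non-idealities.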
The aim of this work is to develop an AIM capable of converting eight analog signals into digital. One FPGA, eight DACs, and eight comparators were used to design the eight-channel ADC. The FPGA hardware logic was replicated eight times to complete the eight-channel design, ensuring concurrent conversion on all eight channels. Figure 6 shows the FPGA logic for a single channel, which was converted into VHDL code and synthesized to identify the delay time and hardware resources
Fig. 6 FPGA hardware logic for single channel analog to digital converter
utilized by the FPGA. The eight-channel ADC was also synthesized to identify the delay time and hardware resources it utilizes.
Figure 7 shows the changes made by the FPGA to bring the comparator output to the equivalent of the analog signal. It also indicates the equivalent digital output for the analog signal on the eighth channel. The analog signal to be converted into digital on that channel was 4.0 V and its digital output was 110, 011, 011. The input range of the ADC is 0–5 V and the digital output is 8-bit; hence the ADC resolution is 5/256 V (≈19.5 mV).
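As a quick arithmetic check of the resolution figure:

```python
# One LSB of an ideal 8-bit converter with a 0-5 V input range.
vref = 5.0
n_bits = 8
lsb = vref / (2 ** n_bits)       # resolution = 5/256 V
print(f"{lsb * 1000:.2f} mV")    # → 19.53 mV
```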
An AIM designed using an FPGA can be used as a stand-alone system in small-scale industries. However, a data acquisition system (DAS) is necessary to monitor the field parameters [18]. Since monitoring is performed by non-specialist operators, the data acquisition software must be graphical and user friendly. DAS software installed on a computer monitors field parameters by communicating with the PLC, which acquires them through the AIM and the digital input module (DIM). Precision in monitoring time can be relaxed because the software does not produce any control action. Serial communication is suitable for transferring data from the FPGA to the computer [19]. The FPGA can transfer data to the computer and the PLC simultaneously.
In this work, the DAS was developed using LabVIEW software, which supports the concept of virtual instrumentation and helps to monitor the analog signals processed by the AIM. Since the work is done at the simulation level, the "Multisim Connectivity Toolkit" has been used; it provides a link between Multisim and LabVIEW, allowing LabVIEW to read specific parameters of a circuit developed in Multisim. Figure 8 shows the details for two
analog signals. The DAC output was changed by the FPGA at every clock pulse to bring it to the equivalent of the analog signal. The analog value at the eighth clock pulse can be used for monitoring or for establishing control actions.
The conversion time of the SAR-type ADC can be expressed as

t_con = n × t_set

where t_con is the conversion time, 'n' is the number of bits of the digital input, and t_set is the settling time. The number of bits is 8. The settling time is identified as the time required to settle within 1/2 LSB. The DAC used in this design was the DIG2ANACON8. A step signal was used as the test signal to identify the settling time, and the response of the DAC to rising and falling step signals was analyzed. The settling time of the DIG2ANACON8 DAC is calculated from Figs. 9 and 10. The rise time and fall time of the DAC were fixed at 10 ns.
The highest settling time is used to calculate the conversion time; from Figs. 9 and 10, it is 7.03 ns. Transient analysis of the comparator was performed to identify its maximum response time. The response was evaluated in two situations: in the first, the comparator output changes from low to high, while in the other the change is reversed. Figure 11 shows the transient response of the comparator under both conditions. The highest delay of the comparator was identified as 893.6 ns.
The delay time was also calculated by simulating the complete circuit. The simulation results confirmed that the circuit produces the expected output at 60 kHz. The hardware resources utilized by the single-channel and multichannel ADCs are shown in Table 1. Logic utilization was less than 10%, except for the LUT-FF pairs in the device XC6SLX100T-3FGG484. Since the design occupies less than 10% of the FPGA hardware, the remaining resources can be used to implement other useful logic, such as controllers.
The delay time incurred by the FPGA in performing the conversion for the eight-channel ADC was also analyzed from the synthesis results. The delay times of the corresponding FPGAs in Table 2 show that a higher-grade FPGA has a lower delay time.
The analysis based on Table 2 indicates a difference of only 1 ns between the delay time of the single-channel ADC and that of the eight-channel ADC. The basic intention of this design is to reduce conversion time, which in turn reduces the PLC scan time and thereby increases the efficiency of the PLC. The proposed design meets industrial requirements: field transmitters for physical parameters such as pressure, flow, temperature, and level generate a 4–20 mA output signal.
This current signal is converted into a 0–5 V voltage and applied to the AIM. The input range in the proposed design has also been set to 0–5 V; hence, the design is compared with industrial AIMs in Table 3.
The conversion time of an 8-bit SAR-type ADC is the sum of the settling times of the DAC and comparator and the delay time of the FPGA. The eight-channel ADC was found to produce the digital output at an operating frequency of 60 kHz. Therefore, the
conversion time of the eight-channel AIM is 0.13 ms, and it remains the same for an 'n'-channel ADC. The conversion time of the proposed design was compared with the AIMs of GE Fanuc, Yokogawa, and Allen Bradley [20–22]. From Table 3, it is clear that the proposed design has a lower conversion time than the other AIMs. In conventional AIMs, the conversion time depends on the number of channels; for example, the conversion time per channel in the model F3AD08-4R is 50 µs, so for 16 channels it is 16 × 50 µs, which is equivalent to 0.8 ms.
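The timing comparison above is simple arithmetic (values taken from the text; this is not a model of the hardware):

```python
# Parallel design: one 8-clock SAR conversion at 60 kHz serves every
# channel at once. A conventional multiplexed AIM scans channels
# sequentially (50 us per channel, as for the F3AD08-4R).
f_clk = 60e3                      # FPGA operating frequency, Hz
n_bits = 8
t_parallel = n_bits / f_clk       # same for any number of channels
t_seq_16ch = 16 * 50e-6           # 16 channels, 50 us each
print(f"{t_parallel * 1e3:.2f} ms vs {t_seq_16ch * 1e3:.1f} ms")  # → 0.13 ms vs 0.8 ms
```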
It should be noted that this design used a DAC with a comparatively high settling time even though DACs with lower settling times are available, and the comparator was built from an operational amplifier instead of a dedicated comparator IC. Table 3 thus demonstrates the better performance of the proposed design even with these modest DAC and comparator choices.
7 Conclusions
The conversion time of a SAR-type ADC depends on the settling time of the DAC and the response time of the comparator. It can be reduced further if a dedicated comparator circuit is used instead of an operational amplifier. This work has
not addressed compatibility between the PLC and the FPGA. A solution for FPGA networking has also not been discussed. These topics will be taken up in future work.
References
13. Tessier R, Betz V, Neto D, Egier A, Gopalsamy T (2007) Power-Efficient RAM mapping
algorithms for FPGA embedded memory blocks. IEEE Trans Comput Aided Design Integr
Circ Syst 26:278–289
14. Roth J, Darr M (2011) Data acquisition system for soil-tire interface stress measurement.
Comput Electron Agric 78:162–166
15. Chang H, Huang H (2013) Adaptive successive approximation ADC for biomedical acquisition
system. Microelectron J 44:729–735
16. Khedkar AA, Khade RH (2017) High speed FPGA-based data acquisition system. Microprocess
Microsyst 49:87–94
17. Linear Technology (2005) LTC6240 / LTC6241 / LTC6242 CMOS Op Amps. Available https://
www.linear.com
18. Zheng W, Liu R (2014) Design of FPGA based high-speed data acquisition and real-time data
processing system on J-TEXT tokamak. Fusion Eng Des 89:698–701
19. An Y, Chung K, Na D, Hwang YS (2013) Control and data acquisition system for versatile
experiment spherical torus at SNU. Fusion Eng Des 88:1204–1208
20. GE Fanuc automation (2002) Series 90™–30/20/Micro PLC CPU instruction set reference
manual. Available https://support.ge-ip.com
21. Yokogawa (2012) General specifications, FA-M3 analog input modules. Available https://www.
yokogawa.com
22. Rockwell automation (2013) Flex I/O input, output and input/output analog modules. Available
https://www.rockwellautomation.com
Need for Predictive Data Analytics
in Cold Chain Management
1 Introduction
The cold chain industry is one of the fast-growing business sectors in India. The
cold chain facilities will play an important role in India because of the issue of
food shortage and food security. The cold chain is a logistical chain of activities
involving packaging, storage, and distribution of perishable food products like fruits
and vegetables, milk, meat and poultry, flowers, and vaccines from production to
consumption, where the inventory is maintained in predetermined environmental
parameters [1]. Refrigeration forms an important and significant part of the food
and beverage retail market. It ensures the optimal preservation of perishable food.
Domestic refrigeration and commercial refrigeration are important elements of the
cold chain [2].
A smart cold chain management system monitors and controls temperature, vibra-
tion, light, and humidity of the perishable food during the cold chain. It also refers
2 Related Work
NABARD report [1] presents the existing structure of Cold Chain Management (CCM) and future requirements. It explains various components of CCM, the benefits of a well-connected CCM, and the challenges of in-transit temperature monitoring in reefer transportation. Nodali Ndraha et al. [2] have focused on the quality and safety of food in the cold chain, described temperature abuse in food cold chains, and provided various cold chain solutions; the challenges and future research directions in food cold chains are also discussed. Temperature abuse occurs frequently in the food cold chain because of poor practices by food cold chain operators, weak design of refrigeration equipment, and the locations of food packages in the storage container. It has been observed that recent technology applied in monitoring and controlling temperature makes a noteworthy contribution to the food cold chain, but appropriate data is not generated for predictive analytics. Fotis Stergiou et al. [11] have presented a review on effective management of the cold chain through control systems such as intelligent packaging, which includes Time Temperature Indicators (TTIs). Samuel Mercier et al. [3] have analyzed time–temperature conditions at each stage of the cold chain. The major weaknesses identified in the modern cold chain are precooling, various operations during transportation, and storage at retail and in domestic refrigerators.
Riccardo et al. [4] have discussed goals and strategies for the design of an IoT architecture for Food Supply Chain (FSC) operations and illustrated the potential benefits and opportunities of directly combining physical food systems with virtual computer-aided control systems. Christian C. et al. [6] have developed predictive neural network models based on information collected using RFID temperature sensors; spatial and temporal in-transit time–temperature monitoring is performed using neural networks, but other critical parameters are not considered for monitoring. Sjaak Wolfert et al. [7] have developed a conceptual framework covering the entire food supply chain, in which big data is used to provide predictive insights into farming operations, drive real-time operational decisions, and redesign business processes. Abel Avitesh et al. [9] have developed a system consisting of Arduino wireless sensor networks and the Xively sensor cloud, used to monitor temperature and humidity in cold chain logistics. Khanuja et al. [12] have proposed a framework for cold chain management using IoT combined with cloud computing, machine learning, and big data analytics to revolutionize the cold transport industry, monitoring, visualizing, tracking, and controlling various platform-dependent parameters with assured freshness and palpability (sensitivity). Halima et al. [13] have used machine learning algorithms such as neural networks, SVM, decision trees, linear regression, and random forests in the planning and transportation activities of the supply chain; the same concepts can be applied in cold chain management.
Hongmin Sun et al. [15] have designed a real-time monitoring system for the raw milk transport process based on GPS, GPRS, and RFID technologies; the system achieves whole-process monitoring of raw milk transport and temperature from livestock farm to processing plant, thus improving the quality and safety of dairy products and the efficiency of food administration and supervision departments. Similarly, Ronak et al. [5] have addressed the problem of analyzing data collected on dairy products to optimize supply chain management and maximize profit in the manufacturing of milk and other dairy items. M. Subburaj et al. [16] have discussed various issues involved in improving the operational efficiency of the dairy supply chain in Tamil Nadu, India. Elisabeth Ilie-Zudor et al. [17] have examined the challenges and potential of big data in heterogeneous business networks and related them to an implemented logistics solution.
C. N. Verdouw et al. [18] have implemented a system based on the concept of a virtual food supply chain on an IoT platform. On that basis, N. Indumathi et al. [19] have proposed an IoT-based system to determine the quality of milk by measuring its pH value. The system uses a routing technique to locate the nearest milk booth and also traces various activities throughout the supply chain in food processing environments.
Ning Wang et al. [20] have presented an overview of the recent development of wireless sensor technologies in the agriculture and food industry; the paper also discusses the significance of wireless sensors for achieving market growth. Atefe Zakeri et al. [24] have developed a system to detect the level of microorganisms in milk, which also provides early determination of milking cycle events using a proactive approach with high accuracy. Sichao Lu et al. [25] have proposed a system based on
118 S. D. Kale and S. C. Patil
artificial intelligence, cloud computing, and IoT to detect the temperature violation
and route deviation accurately. Chaug-Ing Hsu et al. [26] have considered various
features of perishable food for distribution and thus determined the vehicle count for
delivering perishable food, loads and departure times of vehicles, and the shortest
routing path. Maarten Hertog et al. [27] have developed a real-time system considering shelf life inventories to monitor supply chain conditions at different nodes. Bastian Lange et al. [28] have presented challenges in the existing structures of cold chains and suggested possible solutions to improve cold chain management.
From the study of previous work, it is found that the quality of food is affected
mostly during transportation and many critical internal and external factors are
responsible for this. Therefore, some mechanism is required to automate this process
for predicting the status of food during transportation.
• Packing and cooling fresh food products (immediately after harvest or collection).
• Food processing (i.e., ripening, chilling, or freezing of foods).
• Refrigerated or cold transportation.
• Cold storage (short or long term warehousing of chilled foods).
• Retail markets and foodservice outlets.
The following parameters are considered for the design of an efficient and effective
cold chain to provide the best conditions for inhibiting any undesirable changes.
• Temperature: Achieving and maintaining the perishable food at its specific lowest suitable temperature (the lowest safe temperature). If safe temperature levels for a food are not adhered to, the result is chilling or freezing injury and increased food losses.
• Humidity: Providing relative humidity in the storage environment that prevents water loss (with its corresponding loss in weight and quality of food) while also avoiding excess humidity. Excess humidity can also result from temperature fluctuations, causing water to condense on product surfaces (sweating). Free water on the product surface can create favorable growing conditions for bacteria and therefore promote deterioration. Humidity is primarily influenced by the applied cooling technology, the temperatures inside the storage space, and airflows.
• Atmospheric composition: The composition of the atmosphere within a storage space influences the rate at which metabolic processes, such as the ripening of fruit, progress. Metabolism depends on the available oxygen; therefore, a simple way to influence the atmospheric composition within storage is to control the airflow (i.e., limit or increase the influx of outside air). More advanced methods include modified atmosphere packaging [24].
• Coefficient of Performance (CoP): A measure of the energy efficiency of a refrigerating system, defined as the ratio between the refrigerating capacity and the power consumed by the system. It mainly depends on the working cycle, the evaporating/condensing temperature levels, and the type of refrigerant.
• Depth of Discharge (DoD): A measure of the degree to which a battery is discharged, defined as the capacity discharged expressed as a percentage of the battery's maximum capacity [27].
• Global Warming Potential (GWP): An index comparing the climate impact of a
greenhouse gas relative to emitting the same amount of carbon dioxide. The GWP
of carbon dioxide is standardized to 1. GWP includes the radiative efficiency, i.e.,
infrared-absorbing ability of the gas as well as the rate at which it decays from
the atmosphere.
• Refrigerant: A fluid used in a refrigerating system for heat transfer. The refrigerant absorbs heat at low temperature and pressure and rejects it at higher temperature and pressure, usually undergoing a phase change in the process [28].
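The CoP and DoD figures of merit defined above reduce to simple ratios; a small sketch with hypothetical numbers (none of these values come from the paper):

```python
# Coefficient of Performance: refrigerating capacity / power consumed.
refrigerating_capacity_w = 3500.0   # heat removed, W (assumed)
power_input_w = 1400.0              # electrical power consumed, W (assumed)
cop = refrigerating_capacity_w / power_input_w

# Depth of Discharge: capacity discharged as a % of maximum capacity.
battery_capacity_ah = 100.0         # battery maximum capacity, Ah (assumed)
discharged_ah = 35.0                # charge drawn so far, Ah (assumed)
dod_percent = discharged_ah / battery_capacity_ah * 100

print(cop, dod_percent)  # → 2.5 35.0
```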
Analytics algorithms are required to ensure cold chain integrity by converting raw data into actionable recommendations and warnings that can improve cold storage processes, guide business decisions, and prevent cold chain failures before they occur. This includes descriptive and predictive analytics [16].
• Descriptive: Descriptive analytics derives properties of the cold storage system being monitored, such as the typical temperature range of a specific perishable food, the set point of the system's thermostat, and the duty cycle of the system's compressor. This information tells the user whether their storage unit is properly configured to store a particular food. In most consumer refrigerators, thermostats offer only basic high and low settings that do not correspond to a specific temperature, which makes it very difficult to determine the set temperature of a simple refrigerator's thermostat.
• Predictive: Predictive analytics is increasingly important to cold chain management, making the process more accurate and reliable at reduced cost. To keep the cold chain safe, it is very important to predict the critical parameters in the cold chain, such as temperature, humidity, coefficient of performance (i.e., energy consumption), and depth of battery discharge, and then to act, for example by adjusting the temperature in transit or by diverting to the nearest cold storage (i.e., finding the shortest path to cold storage). Predictive analytics is used to detect problems in the cold chain before temperature abuse has occurred [29].
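The descriptive step can be illustrated with a small sketch over a synthetic temperature log (the function name, data, and the mean-as-set-point heuristic are illustrative assumptions, not methods from the paper):

```python
def describe(temps, compressor_on):
    """Derive basic storage-unit properties from logged samples:
    an estimated thermostat set point and the compressor duty cycle."""
    set_point = sum(temps) / len(temps)            # mean as set-point estimate
    duty_cycle = sum(compressor_on) / len(compressor_on)
    return set_point, duty_cycle

temps = [4.1, 3.9, 4.0, 4.2, 3.8, 4.0]   # logged temperatures, deg C
on = [1, 0, 1, 1, 0, 1]                  # compressor state per sample
sp, duty = describe(temps, on)
print(round(sp, 2), round(duty, 2))      # → 4.0 0.67
```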
The following problems generally occur:
• Poor Equipment Configuration:
The predictive model determines the basic properties of a cold storage system using information obtained from descriptive analytics. If the system's configuration is wrong, predictive analytics algorithms will flag the configuration problems and alert the system's users.
• Human Error:
Simple human errors are common causes of many cold chain infractions. During transportation, vendors and drivers frequently forget to shut refrigerator or freezer doors or to ensure that a good seal is established. Therefore, it is necessary to determine, from temperature trend data, when a cold storage system door has been left open and to alert the system's users via a web interface, text message, or audible alert before temperature bounds are violated.
• Equipment Failure:
Sometimes cold chain violations occur because of compressor breakdown or power failure. If a compressor failure occurs at night or on a holiday, it will be
more destructive. In that case, predictive analytics can be used to detect the power or compressor failure and alert the users through a message or an automated phone call. The predictive system can also estimate the time remaining, and the energy that will be consumed, before the cold storage system reaches a risky temperature. Thus the system can prevent spoilage of perishable food.
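The early-warning idea above can be sketched as a linear trend extrapolation (a toy model; the function name, sampling interval, and temperature bound are illustrative assumptions):

```python
def minutes_to_violation(samples, bound, dt_min=1.0):
    """Fit a least-squares line to recent temperature samples (taken
    dt_min minutes apart) and return the minutes until the trend
    reaches `bound`, or None if the temperature is not rising."""
    n = len(samples)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(samples) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, samples)) \
            / sum((x - x_mean) ** 2 for x in xs)
    if slope <= 0:
        return None                       # temperature stable or falling
    return (bound - samples[-1]) / slope * dt_min

# Door left open: climbing 0.5 deg C per sample toward an 8 deg C bound.
print(minutes_to_violation([4.0, 4.5, 5.0, 5.5], bound=8.0))  # → 5.0
```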
Predictive analytics and machine learning are closely associated, and predictive analytics is effectively used to evaluate and control cold chain risks. Generally, two types of predictive models are used: a classification model, which predicts class membership, and a regression model, which predicts a number. These models are used to perform data mining and statistical analysis, and built-in algorithms are available in predictive analytics software solutions [29].
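A minimal sketch of the two model families on synthetic cold chain data (pure-Python stand-ins written for this illustration, not the built-in algorithms of any particular product):

```python
def fit_regression(xs, ys):
    """Regression model: least-squares line predicting a number
    (here, remaining shelf life in days)."""
    n = len(xs)
    xm, ym = sum(xs) / n, sum(ys) / n
    b = sum((x - xm) * (y - ym) for x, y in zip(xs, ys)) \
        / sum((x - xm) ** 2 for x in xs)
    a = ym - b * xm
    return lambda x: a + b * x

def classify(temp_c, safe_max=5.0):
    """Classification model: predicts class membership
    ('safe' vs 'abused') relative to a temperature bound."""
    return "safe" if temp_c <= safe_max else "abused"

# Synthetic data: hours spent above 4 deg C vs. shelf life (days).
predict = fit_regression([0, 2, 4, 6], [10, 8, 6, 4])
print(predict(3), classify(6.5))  # → 7.0 abused
```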
Losses of perishable foods are most significant in developing countries, representing more than 400 million tons per year. These large post-harvest losses affect food quality. The typical cold chain involves three steps: harvesting to a warehouse, transportation, and distribution. The change in temperature occurs mostly during transportation because of internal and environmental parameters. No provision is found in previous work to predict these losses in transit; therefore, some mechanism is required to automate this process and predict the status of food with time–temperature management. Predictive analytics can be used to avoid the wastage of perishable items.
The loss of fresh food in storage in developing countries, according to the study of Kitinoja [22], is shown in Table 2.
5 Methodology
The quality of food is affected during transportation due to changes in the temperature of the food, its pH value, and CO2 emission.
No provision is found in previous work to predict these losses in transit; therefore, some mechanism is required to automate this process and predict the status of food with time–temperature management. Predictive analytics can be used to avoid the wastage of perishable items. The proposed system will give a prediction of the quality of food using predictive machine learning algorithms [30].
The following supervised machine learning algorithms are used for the prediction of food quality:
(i) Regression Analysis
(ii) Decision Tree
(iii) Support Vector Machines
(iv) Random Forest
Some critical factors shown in Fig. 1 are responsible for these changes.
The main aim of the proposed system is to maintain cold storage conditions
during transport. Transport vehicles are nothing but cold stores on wheels, and it is
Fig. 2 Case 1
Fig. 3 Case 2
Fig. 4 Case 3
very difficult to maintain the correct storage temperature. Generally, the following factors are responsible.
During loading and unloading of the vehicle, supplementary heat is introduced over the time required for the operation; this time must be kept as short as possible. The level of protection of the cargo during these operations is equally
important. Close contact between foods and the lateral walls, and solar irradiation on the external surface, cause an increase in temperature. Defrosting has a more severe effect on foods than in refrigerated stores (more restricted space for the coils, more humid air at the inlet, i.e., the internal dimensions of the cargo space) [38].
The essential component of any system to transport frozen food is an insulated container. The overall coefficient of heat transfer, which represents the insulating capacity of the equipment, is defined as

K = W / (S · T) (1)
W = 1.75 · K · S · T (2)
is assumed as 4 °C when the local travel starts from the depot. The total travel time is taken as 8 h, with the door opened 20 times for a duration of 5 min each.
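As a numeric illustration of Eq. (2), with hypothetical container values (K, S, and the temperature difference T below are assumptions, not figures from the paper):

```python
# Heat load through the insulated container walls per Eq. (2).
K = 0.5     # overall heat-transfer coefficient, W/(m^2*K) (assumed)
S = 130.0   # mean surface area of the container, m^2 (assumed)
T = 30.0    # inside/outside temperature difference, K (assumed)
W = 1.75 * K * S * T    # the 1.75 factor comes from Eq. (2)
print(W)  # → 3412.5
```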
7 Conclusion
Existing literature is not sufficient to determine perishable food quality. The focus here is on predictive data analytics for evaluating the quality of food after transportation, based on data mining techniques using machine learning algorithms. To this end, Random Forest and Regression Analysis algorithms have been implemented in Python to relate time–temperature data along the chain. It is observed that the Random Forest algorithm outperforms Regression Analysis. Vehicle characteristics are equally responsible for temperature abuse in the cold chain.
References
1. NABARD (National Bank for Agriculture and Rural Development) report: Cold Chain
Technologies, Transforming Food Supply Chain, May 2017
2. Ndraha N, Hsiao HI, Vlajic J, Yang MF (2018) Time-temperature abuse in the food cold chain: review of issues, challenges, and recommendations. Elsevier, review article
3. Mercier S, Villeneuve S, Mondor M, Uysal I (2017) Time-temperature management along the
food cold Chain: a review of recent developments. Comprehensive Rev Food Sci Food Saf, 16
4. Accorsi R, Bortolini M, Baruffaldi G (2017) Internet-of-Things paradigm in food supply
chain control and management, Elsevier publication International Conference on Flexible
Automation and Intelligent Manufacturing, FAIM2017, pp 27–30 June 2017, Modena, Italy
5. Chudasama R, Dobariya S, Patel K, Lopes H (2017) DAPS: Dairy analysis and prediction
system using technical indicators. IEEE publication, conference paper
6. Emenike CC, Van Eyk NP, Hoffman AJ (2016) Improving Cold Chain Logistics through RFID
temperature sensing and Predictive Modelling. IEEE International Conference on Intelligent
Transportation Systems, Nov
7. Wolfert S, Ge L, Verdouw C, Bogaardt MJ (2017) Big data in smart farming—a review. Elsevier
J Agricul Sys 153:69–80
8. Jeble S, Dubey R, Childe SJ, Papadopoulos T, Roubaud D, Prakash A (2018) Impact of big
data & predictive analytics capability on supply chain sustainability. Int J Log Manag, March
9. Chandra AA, Lee SR (2014) A method of WSN and sensor cloud system to monitor cold Chain
logistics as part of the IoT technology. Int J Multimedia Ubiquit Eng 9(10):145–152
10. Pant RR, Prakash G, Farooqui JA. A framework for traceability and transparency in the dairy
supply Chain networks. Elsevier Journal XVIII Annual International Conference of the Society
of Operations Management (SOM-14)
11. Stergiou F (2018) Effective management and control of the cold chain by application of
Time-Temperature Indicators (TTIs) in food packaging, Review article. J Food Clin Nutr 1(1)
February
12. Khanuja, G.S., Sharath, D.H., Nandyala, S., and Palaniyandi, B.: Cold Chain Manage-
ment Using Model-Based Design, Machine Learning Algorithms, and Data Analytics, SAE
Technical Paper 2018–01–1201, 2018.
13. Kawtar H (2017) Machine learning applications in supply chains: an emphasis on neural
network applications, IEEE
14. Brown DE, Abbasi A, Lau RYK. Predictive Analytics, 1541–1672/15/ © 2015 IEEE Intelligent
Systems
15. Sun H, Jiang G, Kong Q, Chen Z, Li X (2016) Design of real-time monitoring system on raw
milk transport process. Int J Multimedia Ubiquit Eng 11(4):335–342
16. Subburaj M, Babu TR, Subramanian RS (2015) A study on strengthening the operational efficiency of dairy supply chain in Tamil Nadu, India. XVIII Annual International Conference of the Society of Operations Management, Elsevier, Procedia—Social and Behavioral Sciences 189:285–29
17. Ilie-Zudor E, Kemeny Z, Buckingham C. Advanced predictive-analysis-based decision support
for collaborative logistics networks, Supply Chain Management, an International Journal
20:369–388
18. Verdouw CN, Wolfert J, Beulens AJM, Rialland A (2016) Virtualization of food supply chains
with the internet of things, Elsevier publication. J Food Eng 176:128–136
19. Indumathi N, Vijaykumar K (2018) Well-organized milk distribution monitoring system based
on Internet of Things (IoT). Int Res J Eng Technol (IRJET) 05(07), July, e-ISSN: 2395–0056
p-ISSN: 2395–0072
20. Wang N, Zhang N, Wang M (2006) Wireless sensors in agriculture and food industry-
recent development and future perspective. Computers and Electronics in Agriculture Elsevier
Publication, 50
21. Novaes AGN, Lima Jr OF, de Carvalho CC, Bez ET (2015) Thermal performance of refrigerated
vehicles in the distribution of perishable food. Pesquisa Operacional 35(2):251–284, ©2015
Brazilian Operations Research Society
22. Kitinoja L (2013) Use of cold chains for reducing food losses in developing countries, PEF
White Paper No. 13–03, The Postharvest Education Foundation (PEF) December
23. Cold Chain Development Centre, National Horticulture Board. Technical Standards and
Protocol for the Cold Chain in India, 2010
24. Zakeri A, Saberi M, Hussain OK, Chang E (2018) An early detection system for proactive management of raw milk quality: an Australian case study. IEEE Access 6, ©2018 IEEE
25. Lu S, Wang X (2016) Toward an intelligent solution for perishable food cold Chain
management, IEEE
26. Hsu C-I, Hung S-F, Li H-C (2007) Vehicle routing problem with time-windows for perishable
food delivery, Journal of Food Engineering, Elsevier
27. Maarten L. A. T. M. Hertog, Uysal I (2013) Shelf life modeling for first-expired-first-out
warehouse management, Philosophical Transactions of the Royal Society
28. Lange B, Priesemann C, Geiss M, Lambrecht A (2016) Promoting food security and safety via
cold Chains. International Zusammenarbeit (GIZ) GmbH, Eschborn, December
29. Shina SJ, Wooa J, Rachuria S (2014) Predictive analytics model for power consumption in
manufacturing, selection and peer-review under responsibility of the International Scientific
Committee of the 21st CIRP Conference on Life Cycle Engineering, Science Direct
30. Agrawal R (2018) Using Machine learning to transform supply chain management, White
paper, Tata Consultancy Services (TCS)
31. Bhardwaj A, Mor RS, Singh S, Dev M (2016) An investigation into the dynamics of supply chain
practices in dairy industry: a pilot study. Industrial Engineering and Operations Management
Detroit, International Conference
32. Zage D, Glass K, Colbaugh R (2013) Improving supply chain security using big data, 978–1–
4673–6213–9/13 ©2013 IEEE 254 ISI 2013
33. Mack M, Dittmer P, Veigt M, Kus M, Nehmiz U, Kreyenschmidt (2014) Quality tracing in
meat supply chains. Philosophical Transactions of the Royal Society
34. Smola A, Vishwanathan SVN (2008) Introduction to machine learning, eBook, University
Press
35. Shalev-Shwartz S, Ben-David S (2014) Understanding machine learning: from theory to
algorithms, eBook, Cambridge University Press
36. Dasgupta A, Nath A (2016) Classification of machine learning algorithms. Int J Innov Res Adv
Eng (IJIRAE) 3(03), March, ISSN: 2349–2763
37. Chakurkar P, Shikalgar S, Mukhopadhyay D (2017) An internet of things (IoT) based moni-
toring system for efficient milk distribution. 2017 International Conference on Advances in
Computing, Communication, and Control (ICAC3), Mumbai, pp 1–5
38. Ajay K (2014) Generation of shelf life equations of cauliflower. Int J Agricul Food Sci Technol
5:15–26
FPGA-Based Implementation of Artifact
Suppression and Feature Extraction
1 Introduction
An artifact can be defined as any erroneous component in the representation, perception,
or interpretation of a signal introduced by the measuring equipment, devices, and
methods involved. Artifacts are artificial products: structures or appearances that are
not natural but result from the measurement process.
This work aims at a low-complexity implementation of biomedical signal denoising
and its feature extraction module. With mobile instrumentation, an initial or primary
diagnosis can be made at somewhat lower accuracy. A biomedical signal such as the
electrocardiogram (ECG) has morphological features that can be predicted correctly
within acceptable approximations and inaccuracies.
The nature and properties of the noise seen in biomedical signals such as the ECG,
electroencephalogram (EEG), and electromyogram (EMG) vary across these signals.
Noise can be classified by the consistency of its occurrence and by its contributor.
– Physiological noise: Noise caused by the human body itself is called physiological
noise (artifacts). Physiological artifacts are changes in the signal of interest caused
by other physiological processes occurring in the body. The most common noises in
this category are eye activity, heart pumping and other myo-activity-related noise,
muscle tension measured by the EMG, and heart-related signals.
– Non-physiological noise: Noises such as powerline interference, noise due to
electrical equipment in the vicinity, and mechanically produced effects from
ventilators, circulatory pumps, etc., come under non-physiological noise.
Instrumentation noise is very difficult to remove beyond a certain extent, but it can
be reduced through careful circuit design and higher quality measuring equipment.
Powerline interference, baseline wander, and electromyographic noise are persistent
in nature. Variations in electrode-skin impedance and activities such as patient
movement and breathing cause baseline wander. Noises such as electrode pop or
contact noise, electrode motion artifact, and instrumentation noise occur in bursts.
Burst noise is typically modeled as white Gaussian noise (WGN) present on a subset
of leads for a small time duration. Low-frequency noise such as baseline wander and
fixed-frequency noise such as powerline interference are addressed by different
techniques when filtering them out.
Feature extraction transforms raw signals into more informative signatures or finger-
prints of a system. A feature of a signal is its meaningful content in the context of its
application. A large amount of data is converted to a simplified format while
maintaining sufficient accuracy. This reduces computation time, eases the task of
noise removal, and improves power efficiency. However, the choice of feature
extraction and noise removal algorithms has to be made based on the type of
biomedical signal, and each signal calls for different techniques.
Artifacts and noises present in biomedical signals degrade system performance and
hence need to be removed before feature extraction. Researchers have proposed many
preprocessing methods, but these algorithms are often highly complex, and some do
not give reasonable performance. Hence, the primary objective is to develop a method
that is simple to implement yet performs well enough for a medical expert to draw a
correct primary diagnosis.
Section 2 explains the proposed methodology. Section 3 comprises the simulation
results and implementation details, and compares the work presented in this paper
with previously published efforts in terms of improvement in signal-to-noise ratio
(SNR) and hardware savings.
2 Proposed Methodology
The biomedical signal undergoes a denoising step before moving toward feature
extraction. The desired accuracy of the biomedical signal is governed by the
application or purpose for which it is recorded. The method proposed here is divided
into two main stages: “denoising of the biomedical signal” and “feature extraction of
the biomedical signal”. Each type of biomedical signal has a different set of filters and
feature extraction blocks, as shown in Fig. 1.
The preprocessing block removes noise common to all types of biomedical signals,
such as baseline wander and powerline interference. Baseline wander is a
low-frequency noise that can be removed by a high-pass filter. Some researchers have
used a Hamming window to design the denoising filter; we prefer the windowing
method for its simplicity. Moving average filtering is also used to smooth signals such
as the ECG, since ECG morphology holds the key to determining ECG features. Finite
impulse response (FIR) filters can be realized fully parallel or partly serial based on
timing goals and the hardware resources available.
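The window-method high-pass design mentioned above can be sketched in a few lines. This is only an illustrative re-implementation, not the authors' filter: the 101-tap length, 4 Hz cutoff, and 100 Hz sampling rate are assumed demo values chosen so the short filter behaves well (a sub-1 Hz baseline-wander cutoff would need far more taps).

```python
import math

def highpass_fir_hamming(num_taps, cutoff_hz, fs_hz):
    """Linear-phase high-pass FIR by the window method: an ideal
    low-pass sinc is spectrally inverted and multiplied by a Hamming
    window.  Odd length keeps an integer group delay."""
    assert num_taps % 2 == 1
    m = num_taps // 2
    fc = cutoff_hz / fs_hz                      # normalized cutoff
    h = []
    for n in range(num_taps):
        k = n - m
        # ideal low-pass impulse response (truncated sinc)
        lp = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        hp = (1.0 if k == 0 else 0.0) - lp      # delta - lowpass -> highpass
        w = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))  # Hamming
        h.append(hp * w)
    return h

def fir_filter(h, x):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n-k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

# illustrative design: reject slow drift below 4 Hz at fs = 100 Hz
taps = highpass_fir_hamming(101, 4.0, 100.0)
```

The same two routines also describe the hardware trade-off in the text: a fully parallel realization instantiates one multiplier per tap of `h`, while a serial one reuses a single multiplier across the loop.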
FIR filters are easy to implement on hardware provided the filter coefficients are
readily available for computation. The FIR filters realized here use partly serial
topologies, since only limited resources are available. A fully parallel topology
consumes as many multipliers as the filter order, as shown in Fig. 2a: every tap of the
FIR filter has a dedicated multiplier. An FIR filter can be implemented in a variety of
serial architectures to obtain the desired speed/area trade-off. The two extreme
implementation styles, fully parallel and fully serial, are described in [1].
134 S. Kumbhar et al.
Although a fully serial filter occupies very little area, it requires a higher clock rate to
function. On the other hand, a fully parallel structure consumes more chip area but
offers better performance. A comparison between fully parallel and fully serial FIR
filters is given in the implementation section of this paper.
Moving average filters are low-pass filters, used in this methodology to smooth the
ECG data. The degree of smoothing is directly proportional to the window length of
the moving average filter [2, 3]. In digital signal processing, the moving average is
widely used mainly because its structure is easy to understand and use. Despite its
simplicity, the moving average filter is often selected for reducing random noise while
maintaining a sharp step response.
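The smoothing step above amounts to a running mean; a minimal causal sketch (window length 10 matches the implementation described later, but any length works):

```python
def moving_average(x, window=10):
    """Causal moving-average (low-pass) filter: each output sample is
    the mean of the most recent `window` input samples.  A longer
    window gives stronger smoothing, mirroring the window-length
    trade-off noted in the text."""
    out = []
    acc = 0.0
    for n, v in enumerate(x):
        acc += v
        if n >= window:
            acc -= x[n - window]            # drop the sample leaving the window
        out.append(acc / min(n + 1, window))
    return out
```

A running accumulator keeps the cost at one add, one subtract, and one divide per sample, which is why the hardware version needs only a single multiplier (see Table 2).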
A biomedical signal such as the ECG has the QRS complex as its major feature. The
QRS complex is a high-frequency component, while the P and T waves lie in a
low-frequency range. The discrete wavelet transform is used to decompose the signal
into low-frequency and high-frequency components: detail and approximation
coefficients are formed by decomposing the signal with a mother wavelet [4]. The
Haar wavelet has been chosen because it has minimal complexity and hardware
requirements, while the accuracy and reliability of the features extracted with it are
satisfactory and almost equivalent to those of more complex mother wavelets.
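One level of the Haar decomposition described above reduces to pairwise sums and differences; a minimal sketch (even-length input assumed for brevity):

```python
import math

def haar_dwt(x):
    """One level of the Haar discrete wavelet transform: the signal is
    split into approximation (low-frequency) and detail (high-frequency)
    coefficients.  The 1/sqrt(2) factor makes the transform orthonormal,
    so signal energy is preserved across the two bands."""
    s = math.sqrt(2.0)
    approx = [(x[2 * i] + x[2 * i + 1]) / s for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) / s for i in range(len(x) // 2)]
    return approx, detail
```

The sharp, high-frequency QRS energy concentrates in the detail coefficients, which is why R-peak locations can be read off near the largest |detail| values (cf. Tables 3 and 4).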
The EMG signal has time-domain and frequency-domain features. The EMG signal is
decomposed to the first scale to obtain an optimal approximation. The MATLAB
signal processing toolbox is used to calculate some of the features, such as total
power, peak power frequency, variance of the signal, mean of the signal, peak
amplitude, and total power spectral density spread (Fig. 3).
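A few of the features named above can be computed directly from their definitions. The paper uses the MATLAB toolbox; the sketch below is only an illustrative stand-in that estimates the spectrum with a naive direct DFT, so the function name and return keys are this example's own, not the toolbox API.

```python
import math

def emg_features(x, fs):
    """Time-domain features (mean, variance, peak amplitude) plus two
    frequency-domain features (total power, peak power frequency)
    from a one-sided power spectrum computed by a direct DFT."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    peak_amp = max(abs(v) for v in x)
    power = []
    for k in range(n // 2 + 1):                          # one-sided spectrum
        re = sum(x[i] * math.cos(2 * math.pi * k * i / n) for i in range(n))
        im = -sum(x[i] * math.sin(2 * math.pi * k * i / n) for i in range(n))
        power.append((re * re + im * im) / n)
    peak_power_freq = power.index(max(power[1:])) * fs / n   # skip the DC bin
    return {"mean": mean, "variance": var, "peak_amplitude": peak_amp,
            "total_power": sum(power), "peak_power_frequency": peak_power_freq}
```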
Haar wavelet decomposition of the EEG signal helps to analyze it in the frequency
domain. Wavelet decomposition has the inherent property of low-pass filtering [5, 6].
3 Simulation Results and Implementation Details
For performance evaluation, MATLAB simulations are carried out, and based on them
an appropriate wavelet transform is chosen for the RTL design. The RTL code targets
the “Spartan 3E starter board” (Spartan 3E family); the targeted device is
“xc3s500e-4-fg320.” The board has a 50 MHz crystal oscillator, which provides the
50 MHz system clock. RTL and MATLAB simulation results are presented in this
section. For simulation and analysis, one and two cycles of an ECG signal are used
from the database available at “www.physionet.org.”
The author in [2] implemented the filter algorithm with a length of 8, and [3] uses 3-,
4-, and 5-point averaging filters. The algorithms proposed in [2, 3] have been
implemented and compared. In this paper, the filter has been implemented with a
length of 10, which gives improved performance. The order of the FIR filter was
decided from MATLAB simulation results: results were simulated and analyzed for
different orders, and the optimum order turns out to be 10, which gives promising
results. The values of the FIR coefficients for the FPGA implementation were derived
from FDATool by providing the filter constraints (Figs. 4, 5, 6, 7, 8, 9, 10, and 11).
Denoising of the ECG is performed using a moving average filter and/or an FIR filter,
and the two filters are compared on the following terms. According to [7], an FIR
filter is suggested for ECG noise suppression. The moving average filter is useful for
signals with few variations, whereas the FIR filter is mostly used in biomedical
applications for noise suppression or cancellation.
Table 1 compares the FIR filter and the moving average filter based on SNR, and
Table 2 compares their hardware utilization. From Tables 1 and 2, it can be observed
that the FIR filter provides better performance in terms of SNR and power dissipation,
but it has higher resource utilization than the moving average filter.
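The SNR figures in Table 1 follow the usual power-ratio definition; a small sketch, assuming the clean reference signal is available so the residual noise can be isolated:

```python
import math

def snr_db(signal, noisy):
    """SNR in dB: ratio of signal power to residual-noise power,
    where the residual is (noisy - signal) sample by sample."""
    p_sig = sum(s * s for s in signal)
    p_noise = sum((n - s) ** 2 for s, n in zip(signal, noisy))
    return 10 * math.log10(p_sig / p_noise)
```

Evaluating this before and after filtering yields pairs like the 0.6577 dB → 19.4579 dB (FIR) and 0.6577 dB → 7.3891 dB (moving average) entries of Table 1.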
Table 1 Comparison between moving average filter and FIR filter based on signal-to-noise ratio (SNR)

Filter behavior: LPF                   | FIR filter (fully parallel, 10 coeff.) | Moving average filter (window size 10)
SNR initial (dB)                       | 0.6577                                 | 0.6577
SNR final (dB)                         | 19.4579                                | 7.3891
Total stop band power before filtering | 1X                                     | 1X
Total stop band power after filtering  | 0.01822X                               | 0.02962X
Table 2 Comparison between moving average filter and FIR filter based on resource utilization

Resource          | FIR filter (fully parallel, 10 coeff.) | Moving average filter (window size 10)
No. of slices     | 259                                    | 59
Slice flip-flops  | 193                                    | 73
4-input LUTs      | 454                                    | 86
Bonded IOBs       | 52                                     | 52
18×18 multipliers | 10                                     | 1
Table 3 Comparison between different mother wavelets based on R-peak location accuracy

R-peak locations in (sample index) | sym4      | Haar (after cascade filtering) | db1       | db2       | db4
Contaminated signal                | 635, 1379 | 635, 1379                      | 635, 1379 | 635, 1379 | 635, 1379
Wavelet transformed signal         | 640, 1384 | 636, 1380                      | 636, 1383 | 638, 1382 | 642, 1386
Table 4 Comparison between moving average filtered signal and Haar wavelet transformed signal based on R-peak location accuracy

R-peak locations in (sample index) | MA        | Haar
Contaminated signal                | 635, 1379 | 635, 1379
Wavelet transformed signal         | 640, 1385 | 636, 1380
4 Conclusion
The proposed algorithm primarily processes a biomedical signal to suppress noise and
extract features so that an initial diagnosis can be made. The Haar wavelet transform is
selected over other, more complex mother wavelets, at a small cost in accuracy, to
minimize the computational overhead; this makes the design suitable for real-time
first-level diagnosis. The moving average filter smooths biomedical signals such as the
ECG, which effectively helps to achieve better accuracy with a simple mother wavelet
such as Haar. A cascade of LPF, notch filter, and HPF is used to effectively denoise
biomedical signals such as the EEG, ECG, and EMG. Since similar kinds of noise are
present in almost all biomedical signals, the same circuitry can be used to remove
them. A significant percentage of slices is saved in the moving average filter
implementation compared with the 10-coefficient FIR filter for noise suppression.
References
1. Sundaram K, Marichamy, Pradeepa (2016) FPGA based filters for EEG preprocessing. In:
Second international conference on science technology engineering and management (ICON-
STEM), pp 572–576
2. Ueno J, Chiu HK, Lin CH, Lin YC, Huang LR, Gong CSA (2015) ECG noise thresholding
based on moving average. In: IEEE international conference on consumer electronics, Taiwan,
pp 188–189
3. Pandey V, Giri VK (2016) High frequency noise removal from ECG using moving average
filters. In: International conference on emerging trends in electrical electronics sustainable
energy systems (ICETEESES), pp 191–195
4. Singh Chaudhary M, Kapoor RK, Sharma AK (2014) Comparison between different wavelet
transforms and thresholding techniques for ECG denoising. In: International conference on
advances in engineering technology research (ICAETR-2014), pp 1–6
5. Vaneghi FM, Oladazimi M, Shiman F, Kordi A, Safari MJ, Ibrahim F (2012) A comparative
approach to ECG feature extraction methods. In: Third international conference on intelligent
systems modelling and simulation, pp 252–256
6. Saraswat S, Srivastava G, Shukla S (2016) Decomposition of ECG signals using discrete
wavelet transform for Wolff-Parkinson-White syndrome patients. In: International conference
on micro-electronics and telecommunication engineering (ICMETE), pp 361–365
7. Gonzalez-Fernandez R, Mulet-Cartaya M, Lopez-Cardona JD, Lopez-Rodriguez R (2015) A
mobile application for cardiac rhythm study. In: Computing in cardiology conference (CinC),
pp 393–396
8. Mironovova M, Bla J (2015) Fast Fourier transform for feature extraction and neural network
for classification of electrocardiogram signals. In: Fourth international conference on future
generation communication technology (FGCT), pp 1–6
9. Zou Y, Han J, Xuan S, Huang S, Weng X, Fang D, Zeng X (2015) An energy efficient design
for ECG recording and r-peak detection based on wavelet transform. IEEE Trans Circuits Syst
II Express Briefs 62(2):119–123
Test Time Reduction Using Power-Aware
Dynamic Clock Allocation to Scan
Vectors
Abstract As circuit size increases with technology scaling, the time required to test
circuits also increases. Scheduling of cores is a very effective technique for reducing
the test time of a system-on-chip (SoC) within a given power budget. Since frequency
is related to both power and test time, controlling the test clock frequency allows the
power consumption and the test time per core to be adjusted, yielding an optimal
solution to the test scheduling problem. In traditional power-aware test scheduling, a
fixed test clock frequency is applied to all the test vectors of a given core, whereas in
the proposed scheme a power-aware dynamic frequency allocation is made to each
individual scan vector of the core. A session-based test scheduling scheme is
implemented to reduce test time. Results show that an improvement of up to 58% over
the existing solution is achieved with session-based test scheduling for the benchmark
SoCs, with minor bit and area overhead.
1 Introduction
Research on SoC testing has produced many techniques, such as test scheduling
optimization and test architecture design and optimization.
SoC testing can be addressed in three ways: (1) test access mechanism (TAM) design
and optimization, (2) wrapper optimization, and (3) test scheduling.
The TAM carries test patterns from the test pattern source to the device under test and
carries the test response from the device under test to the test pattern sink [2]. The
core test wrapper forms the interface between the design under test (core) and its
environment: it interfaces the inputs and outputs of a core to the rest of the IC and to
the test access mechanism [2, 3].
Once an appropriate test transportation and translation system has been established,
the next big challenge for the SoC is the test schedule. It gives the order in which the
various cores are tested and ensures that there is no resource conflict among cores [3].
The power consumption of a core during test is usually higher than in the functional
mode of operation [4]. Testing cores in parallel reduces test time, but it also increases
switching activity, which leads to higher power consumption. A power-aware test
scheduling technique is therefore required for efficient power management.
Recently, test time and test power have been formulated as functions of the test clock
frequency to achieve test time minimization for a predefined power budget, with the
test clock frequency customized for each test session [5]. In this paper, a new idea is
presented: the test clock frequency of each test pattern in a core is customized in such
a way that the power dissipation does not exceed the predefined power budget.
The paper is organized as follows. Section 2 discusses prior work on SoC testing.
Section 3 describes the problem statement. The dynamic clock allocation technique
and the power-aware test scheduling technique are presented in Sects. 4 and 5.
Section 6 describes the hardware realization of the proposed approach. Results are
discussed in Sect. 7, and Sect. 8 concludes.
2 Prior Work
Power dissipation during test is always higher than in the normal mode of operation,
so test scheduling must be done in such a way that the given power budget is not
exceeded. Thus, different power-aware test scheduling techniques have been
developed for testing SoCs.
One such procedure was proposed in [6], based on a polynomial-time algorithm for
reducing SoC test time. The method minimizes test area overhead by merging the test
sessions of two different cores; cores are scheduled using a desirability matrix
calculated from the area penalties. With a similar approach, a technique proposed in
[7] states a polynomial-time algorithm based on precedence constraints, which force a
partial order among the cores in an SoC. In 2003, a technique proposed in [8] handled
precedence constraints along with the test conflicts that arise from unit testing with
multiple test sets, cross-core testing, sharing of TAMs, and hierarchical SoCs.
Conflicts and precedence constraints are handled through the order in which the tests
are applied. Another approach to handling structural conflicts is proposed in [9],
where a genetic algorithm is applied to the test application conflict graph to minimize
SoC test time.
A test scheduling technique for minimal energy consumption under power constraints
is proposed in [9]. This approach considers the switching activity occurring in the
overlapping regions of subcircuits and shows that energy is saved if test units share
common subcircuits or are executed in the same test session. Power-aware test
scheduling based on simulated annealing is presented in [9], where the testing scheme
is partitioned to minimize the number of idle slots. In [9], another approach observes
that conventional power-aware test scheduling does not generate thermally safe
solutions, and that wake-up and leakage power cannot be neglected in deep submicron
technology; a partition-based thermal- and power-aware test scheduling technique is
therefore proposed.
A genetic-algorithm-based heuristic for power-aware test scheduling was proposed in
[9]. Here, a wrapper configuration, represented as a rectangle whose height equals the
test time and whose width corresponds to a number of TAM channels, is used as the
core testing solution. Rectangles are placed using a best-fit heuristic in such a way
that test time is minimized.
Test scheduling with multiple clock domains was first discussed in [9]. Here, a virtual
TAM is used to bridge the frequency gap between each core and a given ATE, and
also to minimize the power consumption of the core during test.
Recently, many techniques have been proposed for multi-frequency SoC testing. In
[9], the author presents the idea of a multi-frequency test environment, one of the
biggest hurdles for SoC manufacturing capability; many interdependent entities such
as TAM design, multi-frequency interface configuration, bandwidth matching, and
power-aware test scheduling are incorporated into the test framework. In [9], a new
approach based on deterministic and pseudo-random test patterns is proposed to
minimize test application time in a multi-clock-domain test framework; a heuristic
decides which cores are tested concurrently and the order of the test pattern sequence
for each core. In another approach, heuristics for multi-frequency test scheduling of
sessionless systems are proposed in [9]: one for preemptive test scheduling and one
for non-preemptive test scheduling. In [9], the author proposes a new
multiplexer-based architecture to send test data to a core at different frequencies. In a
similar approach [5], the author proposes minimizing test time by varying the test
clock frequency; the traditional ILP model is modified by adding a variable frequency
parameter. In [9], the author proposes power-aware test scheduling by scaling the
supply voltage and the test clock frequency: the scaling is done per session for
session-based test scheduling, or dynamically for sessionless test scheduling. Exact
and heuristic algorithms are discussed for solving the optimization.
148 H. Parmar and U. Mehta
In [9], the author suggests raising the test clock frequency of a core to Pmax (the
maximum power budget of the SoC), which greatly optimizes test time. In a similar
approach [9], the author suggests bounding the test clock frequency to a certain range:
cores are tested with one of three clock frequencies fT/2, fT, and 2fT, where fT is the
normal test clock frequency, and test scheduling is then done.
In all the previously published methods, no one has addressed changing the test clock
frequency of individual test vectors within a core. In this paper, a new idea is
presented: the test clock frequency of the test vectors in a given core is customized to
minimize test time. The details are presented in the following sections.
3 Problem Statement
Let us consider:
Nc — number of cores in a given SoC.
L — length of the scan chain.
W — TAM width.
Sc — number of scan chains in a given core.
n — number of test patterns in a given core.
Pmax_avg — maximum average power of the core.
Pi — test power of test vector i.
Pmax — maximum power budget of the SoC.
ffunc — functional frequency of a core.
Determine an efficient algorithm for dynamically allocating the test clock frequency
to the different test vectors of a core, and schedule the cores for a given power budget
in such a way that the overall test time is minimized.
4 Dynamic Clock Allocation Technique
This section describes the methodology used to dynamically allocate the test clock
frequency to the test vectors of a core during the scan-in and scan-out operations [9].
Example 1 Let us assume that core C1 contains four test patterns, denoted T1, T2, T3,
and T4, with four output responses O1, O2, O3, and O4. Each test pattern has three
scan chains, which are 3 bits, 2 bits, and 3 bits long. Also assume that the switching
between the consecutive test vectors T1-T2-T3-T4 is (1, 7, 5, 7), stored in an array
‘Sw’. Here, Sw(2) is the switching between test vectors T1 and T2, Sw(3) between T2
and T3, and Sw(4) between T3 and T4. Sw(1) is calculated when the very first test
vector (T1) in the sequence is shifted in through the initial output response. Here, the
initial output response is all 1s or all 0s, depending on the last bit of the very first test
vector in the sequence, so as to reduce the switching.
Test Time Reduction Using Power-Aware Dynamic Clock Allocation to Scan Vectors 149
Now, let us find the maximum among all the switching numbers stored in the array
‘Sw’ and store it in a variable ‘SWmax’.
The maximum average power drawn by a single test vector is defined for each core in
the SoC; the power consumed by the other test vectors is not known. Here, we
consider the dynamic power, which is proportional to the switching activity of a test
vector.
Now, taking SWmax as a reference, the power of the remaining test vectors of the
core can be found.
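The proportional scaling just described can be sketched directly; the switching counts come from Example 1, while the Pmax_avg value of 14 units in the usage below is a hypothetical figure for illustration only.

```python
def vector_powers(sw, p_max_avg):
    """Estimate per-vector dynamic power from switching counts: the
    vector with the most switching (SWmax) is assumed to draw the
    core's maximum average power, and the rest scale linearly, since
    dynamic power is proportional to switching activity."""
    sw_max = max(sw)
    return [p_max_avg * s / sw_max for s in sw]

# Example 1 switching profile Sw = (1, 7, 5, 7), assumed Pmax_avg = 14
powers = vector_powers([1, 7, 5, 7], 14.0)
```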
However, for each core there exists a maximum functional frequency ffunc, which is
determined by power constraints (e.g., the maximum power limit) and structural
constraints (e.g., the critical path).
So, while selecting the test clock frequency, two points must be kept in mind:
(1) the allowed test clock frequency should not exceed the functional frequency ffunc
(the functional frequency is 3 to 4 times higher than the normal test clock frequency
fT [9]); (2) the power dissipation of the core should not exceed the given power
budget Pmax. Based on these criteria, Eq. 3 is modified by putting an upper limit on
the test clock frequency, as shown below:

fNTi = PFi · fT, ∀i, 1 ≤ i ≤ n    (5)
Scaling the test clock frequency of the individual test vectors in this way is the main
motive of the proposed method.
It is assumed here that setup and hold times are not violated during this operation. For
synchronization, the scan-in and scan-out processes are done at the same test clock
frequency; for example, in Example 1, scan-in of test vector T2 and scan-out of
response O1 are done at the same test clock frequency.
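The two limits on the power factor PFi can be sketched as below. The integer factors and the min() selection rule are this example's assumptions (integer factors suit a simple divided/multiplied clock); the 4× cap on ffunc/fT follows the 3-to-4-times range quoted in the text.

```python
def power_factors(powers, p_max, f_func_over_ft=4):
    """Per-vector power-factor selection under the two constraints in
    the text: (1) the scaled clock PF_i * fT must not exceed the
    functional frequency (assumed 4x the normal test clock here), and
    (2) scaled power PF_i * P_i must stay within the budget Pmax."""
    return [min(int(p_max // p), f_func_over_ft) for p in powers]

def scaled_frequencies(powers, p_max, f_t):
    """Eq. 5: fNT_i = PF_i * fT for every vector i."""
    return [pf * f_t for pf in power_factors(powers, p_max)]
```

With the Example 1 powers (2, 14, 10, 14) and Pmax = 14, the lightly switching first vector can be clocked 4× faster while the others stay at the base rate.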
5 Power-Aware Test Scheduling
The previous section discussed test time optimization for a single core. In an SoC,
there are multiple cores that need to be scheduled; ILP-based core scheduling for the
SoC is shown in this section.
In the traditional method, the test time of core i is calculated from the equation below,
where Si is the length of the scan-in vector, So the length of the scan-out vector, ti the
test time of core i, and n the total number of test patterns of the core:

ti = n + (n · max(Si, So) + min(Si, So)) / PFi    (6)

This equation states that all test patterns get the same customized test clock frequency.
For example, if the power factor is 2, then all the test vectors in the core will have a
test clock frequency two times higher.
But in the proposed method, instead of giving the same frequency to all test patterns
of the core, a variable frequency is given to the different test patterns to minimize test
time. Test time Eq. 6 is therefore modified as

ti = n + Σ (j=1 to n) max(Si, So)/PFj + min(Si, So)/PFL, ∀i, 1 ≤ i ≤ Nc    (7)

where PFj is the power factor assigned to test vector j and PFL that of the last vector.
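The per-core test time of the equation above (as reconstructed here) can be evaluated directly; the Si = So = 8 and power factors in the usage are illustrative values, not benchmark data.

```python
def core_test_time(si, so, pf):
    """Per-core test time: each of the n vectors takes one capture
    cycle plus max(Si, So) scan cycles sped up by that vector's power
    factor PF_j; the final response needs an extra min(Si, So)
    scan-out cycles at the last vector's factor PF_L."""
    n = len(pf)
    return n + sum(max(si, so) / f for f in pf) + min(si, so) / pf[-1]
```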
In session-based test scheduling, each core is assigned to one and only one session. To
represent this, a binary variable Xij is introduced, where i represents the core and j the
session:

Σ (j=1 to S) Xij = 1, ∀i, 1 ≤ i ≤ Nc    (8)

where S is the total number of test sessions and Nc the number of cores. Xij = 1 if
core i is assigned to session j, and Xij = 0 otherwise.
Since this is session-based test scheduling, the total power consumption of the cores
scheduled in a session must not be greater than Pmax:

Σ (i=1 to Nc) Pi · Xij ≤ Pmax, ∀j, 1 ≤ j ≤ S    (9)
After scheduling each core in a session, the maximum test time of the cores scheduled
in each session is stored in an array R:

Rj = max (1 ≤ i ≤ Nc) ti · Xij, ∀j, 1 ≤ j ≤ S    (10)

The total test time T of the SoC is the sum of the maximum test times of the cores
scheduled in each session:

T = Σ (j=1 to S) Rj    (11)

The objective is

Min T    (12)
If the session's power does not exceed Pmax, then update the test time and the
session's power, as shown in lines 10–12. If the session's total power exceeds Pmax,
then remove the recently added core and try a new core, as shown in lines 10 and
14–19. If no further cores can be accommodated in a session, then start a new session
and repeat the procedure from step 8. Finally, the maximum test times of all sessions
are added together, giving the final test time of the SoC. If further optimization is
possible, scheduling is repeated from step 4.
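A greedy sketch of the session-packing loop described above, not the exact algorithm or its ILP formulation: cores (given as (test_time, test_power) pairs, both hypothetical units) are taken in decreasing test-time order and placed into the first session whose power headroom admits them.

```python
def schedule_sessions(cores, p_max):
    """Session-based scheduling sketch: pack cores into sessions so
    each session's total power stays within Pmax; the SoC test time is
    the sum over sessions of the longest core test in that session
    (Eqs. 8-11)."""
    order = sorted(range(len(cores)), key=lambda i: -cores[i][0])
    sessions, loads = [], []
    for i in order:
        t, p = cores[i]
        for s, load in enumerate(loads):
            if load + p <= p_max:          # core fits in this session
                sessions[s].append(i)
                loads[s] += p
                break
        else:                              # no session has room: open a new one
            sessions.append([i])
            loads.append(p)
    total_time = sum(max(cores[i][0] for i in s) for s in sessions)
    return sessions, total_time
```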
6 Hardware Realization
The test clock generation technique is shown in Fig. 2. The automatic test equipment
(ATE) holds all the test data needed to test the SoC. A serial binary counter, three D
flip-flops, and AND gates, among other components, generate the scaled test clock.
There is a minor bit overhead in the test data, as shown in Table 2 for SoCs d695,
g1023, p22810, and p93791, respectively. In that table, the first column shows the
SoC name, the second column the number of cores in the SoC, and the third column
the number of bits needed to test the SoC. The total bits including the
frequency-factor selection bits are given in column 4, and the bit overhead and
percentage bit overhead are shown in columns 5 and 6, respectively.
7 Result Discussion
The test time for the technique presented in [9] is likewise not directly available, so
we implemented the same algorithm again in MATLAB; its result is shown in column
4. The test time with the proposed method is shown in column 5, and the percentage
reduction in test time in column 6. Experimental results show that up to a 58%
reduction in test time is achieved through session-based test scheduling, which shows
the effectiveness of this method.
8 Conclusion
References
1. Aikyo T (2000) Issues on SOC testing in DSM era. Design Automation Conference, Japan
2. Parmar H, Mehta U (2015) An improved algorithm for TAM optimization to reduce test application time in core based SoC. IEEE International WIE Conference on Electrical and Computer Engineering (WIECON-ECE)
3. Chakrabarty K, Iyengar V, Chandra A (2002) Test resource partitioning for system-on-a-chip. Frontiers in Electronic Testing
156 H. Parmar and U. Mehta
4. Girard P, Nicolici N, Wen X (2010) Power-aware testing and test strategies for low power devices. Springer
5. Vijay S, Agrawal V, Agrawal P (2012) Optimal power-constrained SoC test schedules with customizable clock rates. IEEE International SOC Conference
6. Ravikumar C, Verma A, Chandra G (1999) A polynomial-time algorithm for power constrained testing of core based systems. Eighth Asian Test Symposium
7. Iyengar V, Chakrabarty K (2002) System-on-a-chip test scheduling with precedence relationships, preemption, and power. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, pp 1088–1094
8. Pouget J, Larsson E, Peng Z (2003) SOC test time minimization under multiple constraints. 12th Asian Test Symposium, pp 312–317
9. Skarvada J (2006) Test scheduling for SOC under power constraints. IEEE Design and Diagnostics of Electronic Circuits, Czech Republic
10. Schuele T, Stroele A (2001) Test scheduling for minimal energy consumption under power constraints. 19th IEEE VLSI Test Symposium, CA, USA
11. Harmanani H, Hassan S (2006) On power-constrained system-on-chip test scheduling using precedence relationships. IEEE North-East Workshop on Circuits and Systems, Gatineau, QC, Canada
12. Yao C, Kewal S (2009) Partition based SoC test scheduling with thermal and power constraints under deep. IEEE Asian Test Symposium, Taichung, Taiwan
13. Giri C, Sarkar S, Chattopadhyay S (2007) A genetic algorithm based heuristic technique for power constrained test scheduling in core. IEEE International Conference on Very Large Scale Integration, Atlanta, GA, USA
14. Yoneda T, Masuda K, Fujiwara H (2006) Power-constrained test scheduling for multi-clock domain SoCs. IEEE Design, Automation and Test in Europe, Munich, Germany
15. Zhao D, Huang R, Yoneda T, Fujiwara H (2007) Power-aware multi-frequency heterogeneous SoC test framework design with floor-ceiling packing. IEEE International Symposium on Circuits and Systems, New Orleans, LA, USA
Abstract InxGa1−xAs channel materials with x > 0.53 have been extensively
researched for beyond-silicon CMOS (complementary metal oxide semiconductor)
logic applications, and InxGa1−xAs is now a promising non-Si n-channel material, enabling
continuous scaling of the supply voltage (Vdd) while providing a performance
improvement. Numerous studies suggest that spacers and raised source/drain regions
in silicon-based FinFETs improve device performance. FinFETs at sub-14 nm technology
nodes face severe short channel effects (SCEs), and the same is true for In0.53Ga0.47As-based
FinFETs. In this paper, we investigate the impact of a nitride spacer in raised
source/drain In0.53Ga0.47As FinFETs for improving the SCEs. An improvement of
10% is observed in subthreshold swing and 32% in drain-induced barrier lowering
when a nitride spacer is used in raised source/drain In0.53Ga0.47As FinFETs.
1 Introduction
potential of varying the percentage of indium composition (x) in InxGa1−xAs and
achieving a wide range of bandgap and effective mass [2]. The In0.53Ga0.47As FinFET has
been explored in several studies and has been successfully fabricated with various
techniques. The In0.53Ga0.47As-based FinFET for the sub-16 nm technology node is critical
due to several aspects such as lattice mismatch with the substrate and high-K oxides,
interface traps, and changes in the transport properties of the material beyond 10 nm thickness.
Such problems aggravate SCEs, but by incorporating various techniques
in the device, suppression of SCEs can be achieved. The raised S/D
technique has gained importance for future CMOS scaling, since Ion has to be kept as
high as possible while scaling Vdd. The raised S/D helps to reduce S/D resistance,
which is predominant for sub-22 nm technology nodes [3–7]. In this work, the SCEs
in In0.53Ga0.47As FinFETs at sub-14 nm technology nodes are analyzed with three different
structures, namely, traditional, raised source/drain (S/D), and raised S/D with spacers.
The work is divided into three sections: Sect. 2 describes the device structure and
simulation methodology, Sect. 3 exhibits the physical importance of the spacer in the raised
source/drain, and Sect. 4 presents the results and analysis of SCEs.
The digital performance of the InGaAs FinFET is evaluated for the sub-16 nm technology
node with a channel length of 14 nm. The detailed structure parameters are taken
from the 2013 ITRS (International Technology Roadmap for Semiconductors) [8].
Table 1 lists the dimensions of the structure for a channel length of 14 nm. The
InGaAs/HfO2 combination is taken as the oxide/semiconductor interface, and several
techniques have been successfully implemented to integrate HfO2 on InGaAs [9].
The nitride spacer is introduced across the uncovered fins of the raised S/D
device. The body is doped with beryllium at a concentration of 1 × 10^17 cm−3, while the
S/D is doped with silicon at 5 × 10^19 cm−3; both regions are uniformly doped
throughout the device. The gate is nickel, as it has low contact resistivity.
The structures of the In0.53Ga0.47As FinFET are shown in Fig. 1.
The device simulation has been performed using the Synopsys SDEVICE 3-D TCAD
tool [10]. The device physics models used for characterizing the 14 nm channel length
FinFETs were included by calibrating against a fabricated 50 nm channel length
In0.53Ga0.47As nFinFET [11]; the setup is similar to that adopted
in [12–14]. The doping-dependent Arora model and the Lombardi high-K model are
used for the mobility of inversion charge carriers. Recombination and generation
are captured by the Hurkx, Auger [15, 16], and Shockley–Read–Hall models.
The device operates under a high electric field at small bias voltages; this high
field saturates the carrier velocity, which is included using the high-field saturation
model. Parameters such as velocity saturation, mobility, and density of states for the
In0.53Ga0.47As material are extracted from
Impact of Spacers in Raised Source/Drain 14 nm Technology Node … 161
Fig. 1 FinFET structure. a Simple In0.53 Ga0.47 As FinFET. b Raised source/drain In0.53 Ga0.47 As
FinFET. c Raised source/drain In0.53 Ga0.47 As FinFET with nitride spacer
the experimental device of 50 nm channel length In0.53Ga0.47As nFinFET [11]. This
device is calibrated using the drift-diffusion model, and its Ids−Vgs and Ids−Vds
characteristics are shown in Fig. 2.
Aside from the device physics models used in calibrating the 50 nm channel length
device, additional models were included, for example, quantization and non-parabolicity
effects, for the sub-14 nm In0.53Ga0.47As nFinFETs to deal with quantum
confinement effects. The energy bandgap and the band structure of InGaAs, which
display non-parabolic behavior at various valleys, are likewise considered in the device.
At a channel thickness of 8.5 nm, quantum mechanical effects become significant, so the
fitting parameters of the quantum potential (namely, gamma and effective mass) are
162 J. Pathak and A. D. Darji
Fig. 2 Calibration of In0.53 Ga0.47 As nFinFET with L g = 50 nm. a Ids − Vgs transfer characteristics
during different drain bias. b Ids − Vds output characteristics during different gate bias [13, 14]
Fig. 3 Conduction band energy along the S/D in on and off state for both devices
Fig. 4 a Electric field profile along the S/D for both devices. b Variation of electrostatic potential
at the spacer interface and inside the channel (in on state)
helps the gate to control the channel. The bending helps inversion charge carriers to
pass through the channel in the on regime under gate control. So, apart from the carrier
density, the extra bending in the CBE across the channel region also helps to increase the
charge carriers in the on regime. The improvement in on-state current is the major
achievement when the spacers are introduced in the device. Figure 4b shows the
influence of the spacer on the electrostatic potential in the on and off states of the device. Thus,
through its impact on the conduction band energy, electrostatic potential, and electron
density at high and low drain bias, introducing the spacers improves the on-state current.
The off current is essential from a power perspective. In0.53Ga0.47As-FinFET
devices exhibit significant off currents. The current observed at
Vgs = 0 with Vd = Vdd is known as the leakage current. In short channel devices, the
source–drain potential strongly affects the band bending over a large segment of the
device. Thus, the subthreshold currents and the threshold voltages of short channel
devices change with the drain bias. The DIBL manifests in the behavior of the
threshold voltage in the saturation and linear regions, and is governed by
Eq. (1):

DIBL = δVth / δVds (1)
The subthreshold leakage current is another key parameter that determines the off-
state performance of a device and whether the chip's static power dissipation is within
tolerable limits. Ideally, the current drops to zero as soon as the gate bias falls below
Vth; in reality, drain conduction persists below Vth and is known as subthreshold
conduction. The subthreshold swing is governed by Eq. (2):

SS = dVgs / d(log Ids) (2)
Extraction of Vth for the device is carried out using the transconductance method
[20]. In this method, the threshold voltage is extracted using the ideal-case
assumption that Ids = 0 for Vgs < Vth, under which d²Ids/dVgs² becomes infinite
exactly at Vgs = Vth. This assumption is quite impractical in a real device, so Vth
is taken at the Vgs where d²Ids/dVgs² reaches its maximum.
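The criterion can be demonstrated numerically: on an idealized, hypothetical piecewise-linear Ids(Vgs) curve, the discrete second derivative peaks exactly at the kink, which is read off as Vth. A sketch, not the extraction code used in this work:

```python
def vth_transconductance_change(vgs, ids):
    """Pick Vth at the maximum of d2Ids/dVgs2 (transconductance-change
    method), using central differences on a uniform Vgs grid."""
    h = vgs[1] - vgs[0]
    d2 = [(ids[k + 1] - 2 * ids[k] + ids[k - 1]) / h ** 2
          for k in range(1, len(ids) - 1)]
    k_max = max(range(len(d2)), key=d2.__getitem__)
    return vgs[k_max + 1]          # shift index: d2[j] belongs to vgs[j + 1]

# idealized device: Ids = 0 below Vth = 0.3 V, rising linearly above it
vgs = [0.05 * k for k in range(13)]            # 0.0 ... 0.6 V
ids = [max(0.0, v - 0.3) for v in vgs]
print(vth_transconductance_change(vgs, ids))   # ~0.3
```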
Fig. 5 a Ids − Vgs characteristics and b Ioff current characteristics of all the In0.53 Ga0.47 As-FinFET
devices
the device. Figure 6 shows the Ion versus SS versus DIBL benchmark for InGaAs-OI
FinFETs with SS ranging from 60 to 200 mV/decade [21–24]. The reported work shows
significant improvement in Ion and SS compared with other InGaAs-OI FinFETs.
5 Conclusion
The raised S/D In0.53Ga0.47As FinFETs with spacers give the gate good control of the
channel through the fringing electric field arising in the gate to S/D region. The raised S/D
structure introduces an extra fringing field, which may give rise to parasitic capacitance
across different parts of the region. The extra fringing field helps to control the flow of
electrons in the off condition. The impact of the spacer is observed in the subthreshold
swing and DIBL of the device: SS and DIBL of the raised source/drain device with spacer
are the lowest among the devices, with SS as low as 103 mV/decade and DIBL
of 66 mV/V. Implementing InGaAs FinFETs with spacers is thus a promising option for
high-speed and low-power applications.
Acknowledgments The author would like to thank the Special Manpower Development Program
Phase-II (SMDP-II) for VLSI design sponsored by government of India, New Delhi.
References
15. Hurkx GAM, Klaassen DBM, Knuvers MPG (1992) IEEE Trans Electron Devices 39(2):331
16. Hurkx GAM, de Graaff HC, Kloosterman WJ, Knuvers MPG (1992) IEEE Trans Electron
Devices 39(9):2090
17. Tewari S, Biswas A, Mallik A (2016) IEEE Trans Electron Devices 63(6):2313
18. Koley K, Dutta A, Syamal B, Saha SK, Sarkar CK (2013) IEEE Trans Electron Devices 60(1):63
19. Sachid AB, Francis R, Baghini MS, Sharma DK, Bach KH, Mahnkopf R, Rao VR (2008) 2008
IEEE international electron devices meeting (IEEE, 2008), pp 1–4
20. Ortiz-Conde A, García-Sánchez FJ, Muci J, Barrios AT, Liou JJ, Ho CS (2013) Microelectron
Reliab 53(1):90
21. Djara V, Deshpande V, Uccelli E, Daix N, Caimi D, Rossel C, Sousa M, Siegwart H, Marchiori
C, Hartmann JM, Shiu K, Weng C, Krishnan M, Lofaro M, Steiner R, Sadana D, Lubyshev D,
Liu A, Czornomaz L, Fompeyrine J (2015) Proceedings of the symposium on VLSI technology
(IEEE, Kyoto, 2015), pp T176–T177
22. Sun X, D’Emic C, Cheng CW, Majumdar A et al (2017) Proceedings of the symposium on
VLSI technology (IEEE, Kyoto, 2017), pp T40–T41
23. Luc QH, Yang KS, Lin JW, Chang CC, Do HB, Huynh SH, Ha MTH, Nguyen TA, Lin YC,
Hu C et al (2018) IEEE Electron Device Lett 39(3):339
24. Vardi A, Lin J, Lu W, Zhao X et al (2016) Proceedings of the symposium on VLSI technology
(IEEE, Honolulu, 2016), pp 1–2
Impact of Multi-Metal Gate Stacks
on the Performance of β-Ga2 O3 MOS
Structure
Abstract Gallium oxide (Ga2O3), due to its ultra-wide bandgap (UWB), has become the
most promising semiconductor material for future-generation electronic applications.
The Ga2O3 semiconductor possesses excellent material properties, such as an ultra-
wide bandgap of 4.8 eV, the ability to withstand breakdown fields ranging from 5 to 9
MV/cm, and superior thermal and chemical stability. It also has an excellent
Baliga's figure-of-merit (B-FOM). However, the performance of β-Ga2O3 devices is
still limited by the lack of good ohmic contact materials. In this work,
we have studied the impact of multi-metal gate stack arrangements on the
performance of a β-Ga2O3 MOS structure. The performance parameters used in the
analysis are IDmax, IDmin, ION/IOFF, gm, and gd. It is observed that the Ti/Au metal
stack arrangement shows the best results among all the metal stack arrangements and is
hence found suitable for high-power RF applications with low losses.
1 Introduction
Silicon has reached its theoretical limitations for high-power wireless applications [1,
2]. To improve the performance of high-power electronic devices while minimizing
power losses, silicon needs to be replaced by other semiconductor
technologies. Recently reported wide bandgap (Eg) materials such as SiC, GaN, and
Ga2O3 surpass Si because they offer a higher breakdown
electric field (Ebr) [3]. Ga2O3, with its ultra-wide bandgap
(~5 eV), is found to be the most promising choice for efficient high-power electronic
applications due to its ability to withstand a breakdown electric field of 8 MV/cm [4,
5]. To analyze and suppress the losses in high-power devices, Baliga's figure-of-merit
(B-FOM) is used; the breakdown field, one of the key factors in this
FOM, plays a crucial role in suppressing losses [1].
In high-voltage applications, the blocking voltage is required to be very high, i.e.,
>600 V, and the Ga2O3 MOSFET has already made improvements in terms of Ec for such
applications by overcoming the theoretical limits imposed by other known materials
like GaN, SiC, Si, etc. [6]. The Ga2O3 MOSFET, with a high mobility of 100 cm2/Vs
[7], provides an opportunity to implement efficient power amplifiers combining the
advantages of optimized integration of power switches, RF switches, and power
amplifiers [8]. However, the performance of β-Ga2O3 devices is still limited
by the lack of good ohmic contact materials. Much research on the
different gate metal arrangements for Ga2O3 MOSFETs has already been carried out to
overcome the contact-metal-related issues [9–15]. Higashiwaki et al. investigated the
performance of a Ga2O3 MOSFET having a Ti(3 nm)/Pt(12 nm)/Au(280 nm) gate metal
stack of 2 μm length and obtained a low off-state leakage current in the range of a
few picoamperes with good switching performance (Ion-to-Ioff current ratio >10^10)
[10]. Green et al. studied the performance of a Ti(20 nm)/Au(480 nm) metal-stack-gated
Ga2O3 MOSFET and found that the designed device, with MOVPE-grown Ga2O3,
achieved a high field strength of about 3.8 MV/cm [11]. Feng et al. showed that the
use of a Ni(60 nm)/Au(120 nm) metal stack gate electrode in a Ga2O3 MOSFET
results in an Ion-to-Ioff current ratio >10^8 with a high breakdown voltage of 800 V [12].
In this work, we have studied the impact of using different multi-metal gate stacks
on the performance of a β-Ga2O3 MOS structure. The parameters for the different gate
metal stack arrangements are taken from experimentally reported Ga2O3 MOS
devices [10–12, 14]. The performance parameters used in the analysis are IDmax,
IDmin, ION/IOFF, gm, and gd.
The structure of the Ga2O3 MOS devices with different multi-metal gate stacks is
shown in Fig. 1.
The channel of the device is a uniformly doped (7 × 10^17 cm−3) 300 nm thick
n-type region grown over a semi-insulating single-crystal Ga2O3 substrate.
Multiple Si+ implantations are used to define the source and drain electrode regions as
150 nm shallow box profiles. These regions are uniformly doped n-type
(3 × 10^19 cm−3). The source and drain regions are 20 μm apart, and over
them a gate dielectric insulator of Al2O3 is grown, followed by deposition
of the different gate metal arrangements of 2 μm length on top. The gate
metal arrangements used in the study are Al(100 nm), Ti(20 nm)/Au(480 nm),
Ni(60 nm)/Au(120 nm), and Ti(3 nm)/Pt(12 nm)/Au(280 nm). All these data are
taken from the experimentally demonstrated Ga2O3 MOS devices [10–12, 14].
In Ga2O3, the density of states for the conduction band (Nc) is evaluated from the
electron effective mass (mn* = 0.28m0) and is taken as ~3.7 × 10^18 cm−3. A work
function of 5.23 eV is taken for the gate contact, with an electron affinity of 4 eV.
Impact of Multi-Metal Gate Stacks on the Performance … 171
Fig. 1 The structure of the Ga2 O3 MOS device with different multi-metal gate stacks
The capacitance versus voltage (C–V) characteristics of the Ga2O3 MOS capacitor
for the different gate electrode metal arrangements are shown in Fig. 2. The capacitance
values for the different electrode arrangements are measured at a frequency of 1 MHz.
All the plots are extracted from the ATLAS TCAD simulator [16].
In Fig. 2, it is observed that the difference in the capacitance values of Al, Ti/Au,
and Ti/Pt/Au is very small (see the inset graph), whereas the capacitance of Ni/Au
differs considerably from all other arrangements. This is because the Ti metal provides
better ohmic contact over the gate oxide (Al2O3) than the Ni metal.
Fig. 2 C–V characteristics (capacitance in pF/mm versus VG) of the Ga2O3 MOS capacitor for different gate electrode metal arrangements (Al, Ni/Au, Ti/Au, Ti/Pt/Au)
Fig. 3 Transfer characteristics (IDS in A/mm versus VGS) of the Ga2O3 MOSFET for different gate electrode metal arrangements at VDS = 25 V, in linear and log scale (Ion/Ioff > 10^10)
The transfer characteristics of the devices in both linear and log scale
are shown in Fig. 3. The plot shows that the Ti/Au metal electrode
arrangement performs best and offers good current driving
capability. An ION/IOFF ratio greater than 10^10 with an OFF-state leakage current of
a few picoamperes is obtained for all the devices.
Figure 4 shows the transconductance (gm). The gm, an essential
parameter for determining amplifier gain, is found to be highest for the Ti/Au
metal electrode arrangement, owing to its improved ON-state current. The
transconductance (gm) can be formulated using Eq. (1). A high gm is required to obtain
improved carrier transport efficiency and to implement efficient analog/RF amplifiers.
In analog circuits, a low gd is required to achieve high gain. Figure 5 shows the
characteristic of output conductance (gd) versus drain voltage (VDS) at VGS equal to
−4 V. The lowest gd is observed for the Ni/Au metal stack arrangement, which provides
an advantage for implementing high-gain amplifiers. Table 1 summarizes the performance
comparison of the obtained results for the different metal electrode arrangements in the
Ga2O3 MOSFET. It is observed that the Ti/Au metal stack arrangement shows the best
results among all the arrangements. It has the highest ON-state current, i.e., 13.1%, 20.7%,
Fig. 4 The transconductance behavior (gm in mS/mm versus VGS) of the Ga2O3 MOSFET for different gate electrode metal arrangements (Al, Ni/Au, Ti/Au, Ti/Pt/Au) at VDS = 25 V
Fig. 5 Output conductance (gd in mS/mm versus VDS) of the Ga2O3 MOSFET for different gate electrode metal arrangements at VGS = −4 V
and 13.3% higher than that of Al, Ni/Au, and Ti/Pt/Au, respectively. The value of the
OFF-state leakage current is also found to be lower: 96.12%, 96.12%, and
27% lower than that of Al, Ni/Au, and Ti/Pt/Au, respectively.
gm = ∂ID / ∂VGS (1)

gd = ∂ID / ∂VDS (2)
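Both small-signal parameters in Eqs. (1) and (2) are first derivatives of sweep data, so one central-difference helper covers gm (from an ID–VGS sweep) and gd (from an ID–VDS sweep). A sketch on synthetic, hypothetical data, not the simulator's own post-processing:

```python
def conductance(v, i):
    """Central-difference dI/dV on a uniform voltage grid: gives gm for
    an ID-VGS sweep (Eq. 1) and gd for an ID-VDS sweep (Eq. 2)."""
    h = v[1] - v[0]
    return [(i[k + 1] - i[k - 1]) / (2 * h) for k in range(1, len(i) - 1)]

# synthetic sweep with a constant slope of 2.0 mS/mm: ID = 2.0 * V
v = [0.1 * k for k in range(11)]
i = [2.0 * x for x in v]
print(conductance(v, i))   # every interior point evaluates to ~2.0
```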
4 Conclusions
The impact of multi-metal gate stacks on the performance of a β-Ga2O3 MOS structure
is studied. An Ion/Ioff of greater than 10^10 with an OFF-state leakage of a few pico-
amperes is obtained for all the devices. It is observed that the Ti/Au metal stack
arrangement shows the best results among all the metal stack arrangements and is hence
found suitable for high-power RF applications with low losses. It has the highest
value of ON-state current, i.e., 13.1%, 20.7%, and 13.3% higher than that of Al, Ni/Au,
and Ti/Pt/Au, respectively. The value of the OFF-state leakage current is also found to be
174 N. Yadava and R. K. Chauhan
lower: 96.12%, 96.12%, and 27% lower than that of Al, Ni/Au, and Ti/Pt/Au,
respectively. Finally, the performance of β-Ga2O3 devices can be further improved
by the choice of good ohmic contact materials with good adhesive properties.
References
1. Baliga BJ (1989) Power semiconductor device figure of merit for high-frequency applications.
IEEE Electron Device Lett 10:455–457
2. Johnson E (1965) Physical limitations on frequency and power parameters of transistors. In:
1958 IRE International Convention Record, vol 13, pp 27–34
3. Pearton SJ et al (2018) A review of Ga2 O3 materials, processing, and devices. Appl Phys Rev
5:11301
4. Higashiwaki M, Sasaki K, Kuramata A, Masui T, Yamakoshi S (2012) Gallium oxide (Ga2 O3 )
metal-semiconductor field-effect transistors on single-crystal β-Ga2O3 (010) substrates. Appl
Phys Lett 100:13504
5. Irmscher K, Galazka Z, Pietsch M, Uecker R, Fornari R (2011) Electrical properties of β-Ga2O3
single crystals grown by the Czochralski method. J Appl Phys 110:63720
6. Green AJ et al (2016) 3.8-MV/cm breakdown strength of MOVPE-grown Sn-doped β-
Ga2 O3 MOSFETs. IEEE Electron Device Lett 37:902–905
7. Sasaki K et al (2012) Device-quality β-Ga2 O3 epitaxial films fabricated by ozone molecular
beam epitaxy. Appl Phys Express 5:35502
8. Green AJ et al (2017) β-Ga2 O3 MOSFETs for radio frequency operation. IEEE Electron Device
Lett 38:790–793
9. Yadava N, Chauhan RK (2019) RF performance investigation of β-Ga2O3/graphene and
β-Ga2O3/black phosphorus heterostructure MOSFETs. ECS J Solid State Sci Technol
10. Higashiwaki M et al (2013) Depletion-mode Ga2 O3 metal-oxide-semiconductor field-effect
transistors on β-Ga2 O3 (010) substrates and temperature dependence of their device character-
istics. Appl Phys Lett 103:123511
11. Green AJ et al (2016) 3.8-MV/cm breakdown strength of MOVPE-grown Sn-doped β-Ga2O3
MOSFETs. IEEE Electron Device Lett. https://doi.org/10.1109/LED.2016.2568139
12. Feng Z et al (2019) A 800 V β-Ga2O3 metal–oxide–semiconductor field-effect transistor
with high-power figure of merit of over 86.3 MW cm−2. Physica Status Solidi. https://doi.
org/10.1002/pssa.201900421
13. Sasaki K, Higashiwaki M, Kuramata A, Masui T, Yamakoshi S (2013) Si-Ion implantation
doping in β-Ga2O3and its application to fabrication of low-resistance ohmic contacts. Appl
Phys Express 6:86502
14. Oslo UO. FYS2210. University of Oslo, Viktor Bobal
15. Mastro MA et al (2017) Perspective—opportunities and future directions for Ga2 O3 . ECS J
Solid State Sci Technol 6:P356–P359
16. ATLAS user’s manual (2014) SILVACO Int, Santa Clara, CA
Improved VLSI Architecture
of Dual-CLCG for Pseudo-Random Bit
Generator
Abstract In this paper, an improved VLSI architecture of the dual coupled linear congru-
ential generator (CLCG) for pseudorandom bit generation is proposed to enhance
the timing performance with minimum area overhead. To improve its performance,
a split structure of the adder circuit rather than a single adder is used in the linear
congruential generator (LCG). Its effect on the design of the dual-CLCG is observed by
computing essential performance parameters such as the maximum frequency
of bit generation, minimum clock period, initial clock latency, output-to-
output latency, and area complexity. The proposed architecture is synthesized for 8-,
16-, and 32-bit word lengths on a Spartan-3E XC500E FPGA using the Xilinx ISE
design suite.
1 Introduction
The linear congruential generator (LCG) and the linear feedback shift register (LFSR)
are the most common low-complexity pseudorandom bit generators. The problem
with these random bit sequences is that they are insecure due to their linear structure and
fail NIST randomness tests [8, 9]. The Blum Blum Shub (BBS) generator, one of the
secure and unpredictable key generators, is presented in [10, 11]. It is highly secure due
to the large prime factorization problem, but its VLSI architecture
is challenging to design because it must handle and compute large prime integers. Therefore,
designing an efficient architecture for pseudorandom bit generation that produces
random bits at a uniform clock rate with minimum output-to-output latency
and minimum area overhead is a major challenge. Two optimized LCG architectures, Ranq1
and Ran, designed with XOR shifting and a multiply-with-carry generator, are proposed
in [12]. Most such designs require high clock latency, a non-uniform clock
rate of bit generation, and large hardware complexity [13, 14]. To overcome these
drawbacks, the coupling of LCGs, i.e., the CLCG, was proposed in [15] and [16]. In [17],
a coupling of two CLCG architectures, denoted CLCG-1 and CLCG-2, is proposed;
each architecture generates a separate output with different seed values. The
coupling makes this architecture more secure than a single architecture. Despite the
improvement in the secure generation of pseudorandom bit sequences, the CLCG
algorithm fails five major NIST statistical tests [18]. In particular, it fails the discrete
Fourier transform (DFT) test, which investigates periodic bit patterns, revealing
a weak generator. It generates a one-bit random output only when the
inequality condition is met, so it cannot generate a pseudorandom bit at
every iteration. In [19], a new hardware architecture of the existing dual-
CLCG algorithm was proposed that generates pseudorandom bits at a uniform clock
rate.
In this paper, we design an improved VLSI architecture of the dual-CLCG for pseu-
dorandom bit generation to enhance the timing performance with minimum area over-
head. To improve its performance, we design a split structure of the adder circuit
rather than the single adder used in the linear congruential generator (LCG). There-
after, its FPGA synthesis has been done using a Spartan-3E XC500E FPGA. The
remainder of the paper is organized as follows: Section 2 presents the proposed
architecture of the dual-CLCG for random bit generation. Section 3 presents the perfor-
mance comparison between the proposed architecture and previously reported work [19]
in terms of maximum frequency of bit generation, minimum clock period,
initial clock latency, output-to-output latency, and area complexity (in terms of total
flip-flops, number of slices, and LUTs). Finally, Section 4 concludes the work.
The dual-CLCG method was proposed by Katti et al. [18]. It is a dual coupling
of four linear congruential generators (LCGs) and is defined by the following recurrence
equations:
Improved VLSI Architecture of Dual-CLCG … 177
Here b1, b2, b3, b4 are constant parameters, and x0, y0, p0, and q0 are the initial
seeds of the above recurrence equations for pseudorandom bit generation. Here r is
a positive integer with 1 < 2^r < 2^n. The final output variable zi is the random output
sequence; it is evaluated from Eq. (5) in every iteration.
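The recurrences and the output rule of Eq. (5) can be modeled behaviorally. In the sketch below, the multiplier 2^r is realized as a left shift; the comparator convention zi = (xi+1 > yi+1) XOR (pi+1 > qi+1) and the shift amounts r1–r4 = 1 are assumptions made for illustration, while the seed and constant values are those used later in the 8-bit simulation.

```python
def dual_clcg(seeds, consts, shifts, n, steps):
    """Behavioral model of the dual-CLCG bit generator.

    Four LCGs x, y, p, q with multiplier 2^r (a left shift in hardware):
        x[i+1] = (2^r1 * x[i] + b1) mod 2^n   (likewise for y, p, q)
    Output bit (comparator convention assumed):
        z[i] = (x[i+1] > y[i+1]) XOR (p[i+1] > q[i+1])
    """
    x, y, p, q = seeds
    b1, b2, b3, b4 = consts
    r1, r2, r3, r4 = shifts
    mask = (1 << n) - 1                 # mod 2^n is an n-bit truncation
    bits = []
    for _ in range(steps):
        x = ((x << r1) + b1) & mask
        y = ((y << r2) + b2) & mask
        p = ((p << r3) + b3) & mask
        q = ((q << r4) + b4) & mask
        bits.append(int(x > y) ^ int(p > q))
    return bits

# 8-bit example with the simulation's seed/constant values, r = 1 assumed
bits = dual_clcg(seeds=(1, 2, 14, 3), consts=(1, 3, 141, 79),
                 shifts=(1, 1, 1, 1), n=8, steps=16)
print(bits)
```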
circuit to improve the timing performance and area complexity, according
to Eqs. (6)–(9).
Fig. 1 Proposed architecture of the improved dual-CLCG for the pseudorandom bit generator (four n-bit registers updated via split adders, with outputs Sx1/Sx2, Sy1/Sy2, Sp1/Sp2, Sq1/Sq2 partitioned at bits r1–r4 into LSB/MSB parts, feeding two comparators that produce Zi)
The improved structure of the dual-CLCG method for bit sizes of 8, 16, and 32
bits is designed in Verilog HDL. Its simulation and synthesis have been done
using the Xilinx ISE design suite; synthesis targets the commercially available
Spartan-3E XC500E FPGA. Performance parameters of the improved dual-CLCG
architecture for pseudorandom bit generation, such as the maximum frequency of
bit generation, minimum clock period, initial clock latency, output-to-output latency,
and area complexity (in terms of total flip-flops, number of slices, and LUTs), are
evaluated and compared with the previously reported dual-CLCG [19], as shown in
Table 1. Figure 2 shows the simulation waveform from a Verilog test bench for the
8-bit word length of the proposed architecture. For this simulation, we take the
constants b1 = 1, b2 = 3, b3 = 141, and b4 = 79 and initialize x0, y0, p0, q0 to
1, 2, 14, 3, respectively. Left shifting of
[Fig. 2 waveform: Clock, X[7:0], Y[7:0], P[7:0], Q[7:0], Bi, Ci and Zi]
4 Conclusion
[Figure: comparison of the proposed architecture with the previous literature [19] versus word size (bits): a area complexity (utilized area), b maximum frequency of bit generation, c minimum possible time-period of the clock signal, plotting Time (TCLK) for [20] and for the proposed design]
182 M. D. Gupta and R. K. Chauhan
References
1 Introduction
P = \{P_1, P_2, P_3, \ldots, P_n\}   (1)

\gcd(P_i, P_j) = 1 \quad \forall\, i, j,\ i \neq j

X \xrightarrow{\text{RNS}} (X_1, X_2, X_3, \ldots, X_n)   (2)
To design an RNS system with balanced word lengths, the most often used moduli set is \{2^n - 1, 2^n, 2^n + 1\}. Out of these, the modulo 2^n + 1 channel is the biggest hindrance, as it requires (n + 1) bits for its representation while the other two moduli use only n bits. Thus, the design of a modulo 2^n + 1 multiplier requires more logical effort than the other modulo multipliers because it operates on (n + 1)-bit operands. To address this, a variety of techniques have been proposed in the literature [6–16].
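As a concrete illustration of why this moduli set is convenient, the sketch below converts an integer to and from the RNS \{2^n - 1, 2^n, 2^n + 1\} using the Chinese Remainder Theorem; plain Python, with n = 4 chosen only for readability.

```python
def to_rns(x, moduli):
    """Represent integer x by its residues modulo each element of the set."""
    return tuple(x % m for m in moduli)

def from_rns(residues, moduli):
    """Reconstruct x from residues via the Chinese Remainder Theorem."""
    M = 1
    for m in moduli:
        M *= m
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)   # modular inverse of Mi mod m
    return x % M

n = 4
moduli = (2**n - 1, 2**n, 2**n + 1)    # the balanced set {2^n-1, 2^n, 2^n+1}
x = 1234
r = to_rns(x, moduli)
print(r, from_rns(r, moduli))          # residues, then the recovered value
```

The dynamic range here is 15 × 16 × 17 = 4080, so any x below that is recovered exactly; the pairwise coprimality required by Eq. (2) is what makes the inverse in `from_rns` exist.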
Depending on the representation of the multiplier (B) and multiplicand (A), most of the techniques fall into the following three categories:
1. Both in weighted representation [6–9]
2. Both in diminished-1 representation [10–15]
3. One in diminished-1 and another in weighted representation [16].
The multiplier proposed in [6] was based on the category-1 form, except that 2^n is represented as an all-zeros vector. A modulo 2^n + 1 multiplier based on a binary multiplier was described in [7]. The modulo 2^n + 1 multiplier using the weighted-binary representation suggested in [8] exploits the redundancy in the representation: the partial product matrix was divided into four groups and simplified using the fact that the bits of only one group can differ from 0. Thus, the partial product matrix was
Low Power Radix-8 Modulo 2n + 1 Multiplier … 185
reduced to an n × n bit array. In [9], the correction terms of [8] were further simplified and represented as a single additional partial product. However, in comparison to [10–13, 15], the design proposed for diminished-1 operands in [17] is the most prevalent among modulo 2^n + 1 multiplier designs. Based on the encoding scheme used, diminished-1 multipliers can be categorised into (a) non-encoded multipliers [10, 11] and (b) radix-4 Booth encoded multipliers [12–14]. In [12], the number of Modulo reduced Partial Products (MPPs) was reduced to n/2 + 1 and the length of each partial product was n + 1 bits. However, the summation of partial products was performed in an integer Carry Save Adder (CSA) array that outputs S and C vectors of length n + log_2 n bits, which needed to be reduced further; the modulo reduction entailed two additional CSAs in series with the CSA array. This technique for the Modulo reduced Partial Product Accumulator (MPPA) has been superseded by the use of the regular Complemented End-Around Carry-Carry Save Adder (CEAC-CSA) tree in [13, 14]. In [13], a unified architecture for the modulo 2^n + 1 multiplier using diminished-1 and weighted-binary number representations was proposed.
Radix-8 Booth encoded modulo 2^n + 1 multipliers with area-power efficiency were proposed in [18, 19]. In [18], the weighted method was used for the representation of both operands. That design has a problem related to the bias constant: the aggregate bias due to the hard multiple and the partial products is composed of a multiplier-dependent dynamic bias and a multiplier-independent static bias. The bias was realized by hardwiring the outputs of the Booth encoder and constant 1s at the appropriate bit positions with at most two levels of AND gates. The total number of modulo reduced partial products, including the imposed bias, was equal to n/3 + 6. That architecture was implemented using n/3 + 1 Booth Encoders (BE), n(n/3 + 1) Booth Selectors (BS), n(n/3 + 4) Full Adders (FA) and a parallel-prefix 2^n + 1 adder. In [19], the diminished-1 method was used to represent both operands, and a method generating partial products with similar properties was used as in [18]; thus, the design of the Booth selector was similar to [18]. A Modified Booth Encoder (MBE) was proposed to perform the Booth encoding, but that algorithm is not capable of handling a zero input. The architecture was implemented using n/3 + 1 MBEs, n(n/3 + 1) BSs, [(n + 1)(n/3) − (n − 1)] FAs, n − n/4 Half Adders (HA), three 2-input logic gates and a diminished-1 2^n + 1 adder. The main contributions of the proposed algorithm are
1. The number of modulo 2^n + 1 reduced partial products, including the bias, was reduced to n/3 + 2. In addition, there is a constant '1' that was handled with the help of three 2-input gates.
2. A new Hard Multiple Generator (HMG) was proposed using a modified structure of the parallel-prefix operator. The proposed design has a limitation related to the generation of the carry signals: the odd carry signals were used to generate the sum bits at both even and odd positions.
186 N. K. Kabra and Z. M. Patel
The rest of the paper is organised as follows: Sect. 2 gives the mathematical expression of the proposed modulo 2^n + 1 algorithm. The proposed architecture of each block, namely the Modified Booth Encoder* (MBE*), Modified Booth Selector* (MBS*), the partial product accumulation stage and finally the modulo 2^n + 1 adder, is presented in Sect. 3. The implementation analysis and results verifying the effectiveness of the proposed architecture are presented in Sect. 4. Finally, the paper is concluded in Sect. 5.
P_D = AB + B - B = (A + 1)B - B   (3)

B = 2^6 b_6 + 2^5 b_5 + 2^4 b_4 + 2^3 b_3 + 2^2 b_2 + 2^1 b_1 + 2^0 b_0
  = -2^8 b_8 + 2^8 b_8 - 2^7 b_7 + 2^7 b_7 + 2^6 b_6 + 2^5 b_5 + 2^4 b_4 + 2^3 b_3 + 2^2 b_2 + 2^1 b_1 + 2^0 b_0
  = 2^8 b_8 - 2^7 b_7 + 2^6(-4b_8 + 2b_7 + b_6 + b_5) + 2^3(-4b_5 + 2b_4 + b_3 + b_2) + 2^0(-4b_2 + 2b_1 + b_0)   (4)
|B|_{2^n+1} = [(b_0 + b_n) + 2(b_1 + b_n) - 4(b_2 + b_n)] + \Big|\sum_{i=1}^{\lfloor n/3 \rfloor} (b_{3i-1} + b_{3i} + 2b_{3i+1} - 4b_{3i+2})\,2^{3i}\Big|_{2^n+1}   (5)
Now, after applying the Boolean identities of the OR gate with respect to binary addition, Eq. (5) can be written as
|B|_{2^n+1} = [(b_0 \vee b_n) + 2(b_1 \vee b_n) - 4(b_2 \vee b_n)] + \Big|\sum_{i=1}^{\lfloor n/3 \rfloor} (b_{3i-1} + b_{3i} + 2b_{3i+1} - 4b_{3i+2})\,2^{3i}\Big|_{2^n+1}
  = [b^*_{-1} + b^*_0 + 2b^*_1 - 4b^*_2] + \Big|\sum_{i=1}^{\lfloor n/3 \rfloor} (b_{3i-1} + b_{3i} + 2b_{3i+1} - 4b_{3i+2})\,2^{3i}\Big|_{2^n+1}

|B|_{2^n+1} = \Big|\sum_{i=0}^{\lfloor n/3 \rfloor} (b^*_{3i-1} + b^*_{3i} + 2b^*_{3i+1} - 4b^*_{3i+2})\,2^{3i}\Big|_{2^n+1}   (6)

|B|_{2^n+1} = \Big|\sum_{i=0}^{\lfloor n/3 \rfloor} b^{*MB}_i\,2^{3i}\Big|_{2^n+1}   (7)

where, for i = 0, b^{*MB}_0 = b^*_{-1} + b^*_0 + 2b^*_1 - 4b^*_2; for 1 ≤ i ≤ \lfloor n/3 \rfloor, b^{*MB}_i = b_{3i-1} + b_{3i} + 2b_{3i+1} - 4b_{3i+2}; along with b^*_{-1} = 0, b^*_0 = b_0 \vee b_n, b^*_1 = b_1 \vee b_n and b^*_2 = b_2 \vee b_n.
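For background, the sketch below checks the plain (non-modulo) radix-8 Booth recoding that the starred digits above specialize. The OR-folding of b_n into the i = 0 digit is deliberately omitted, so this only verifies the generic digit formula d_i = b_{3i-1} + b_{3i} + 2b_{3i+1} - 4b_{3i+2}, not the modulo 2^n + 1 correction.

```python
def radix8_booth_digits(B, m):
    """Radix-8 Booth digits d_i in {-4,...,4} of an m-bit unsigned B:
    d_i = b_{3i-1} + b_{3i} + 2*b_{3i+1} - 4*b_{3i+2}, with b_{-1} = 0
    and zero extension above bit m-1."""
    bit = lambda k: (B >> k) & 1 if k >= 0 else 0
    ndig = (m + 3) // 3               # overlapping groups covering every bit
    return [bit(3 * i - 1) + bit(3 * i) + 2 * bit(3 * i + 1) - 4 * bit(3 * i + 2)
            for i in range(ndig)]

# the recoded digits reproduce B exactly: sum of d_i * 8^i
for B in range(2 ** 8):
    digits = radix8_booth_digits(B, 8)
    assert sum(d * 8 ** i for i, d in enumerate(digits)) == B
print("radix-8 recoding verified for all 8-bit values")
```

Because each digit lies in {-4, ..., 4}, only n/3 + 1 partial products (plus the hard multiple 3A) are needed, which is the source of the area savings claimed above.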
B = 2^7 b_7 + 2^6 b_6 + 2^5 b_5 + 2^4 b_4 + 2^3 b_3 + 2^2 b_2 + 2^1 b_1 + 2^0 b_0
  = -2^8 b_8 + 2^8 b_8 + 2^7 b_7 + 2^6 b_6 + 2^5 b_5 + 2^4 b_4 + 2^3 b_3 + 2^2 b_2 + 2^1 b_1 + 2^0 b_0
  = 2^8 b_8 + 2^6(-4b_8 + 2b_7 + b_6 + b_5) + 2^3(-4b_5 + 2b_4 + b_3 + b_2) + 2^0(-4b_2 + 2b_1 + b_0)   (8)
|B|_{2^n+1} = [(b_0 \vee b_n) + 2(b_1 \oplus b_n) - 4(b_1 \cdot b_n + b_2)] + \Big|\sum_{i=1}^{\lfloor n/3 \rfloor} (b_{3i-1} + b_{3i} + 2b_{3i+1} - 4b_{3i+2})\,2^{3i}\Big|_{2^n+1}
  = [b^*_{-1} + b^*_0 + 2b^*_1 - 4b^*_2] + \Big|\sum_{i=1}^{\lfloor n/3 \rfloor} (b_{3i-1} + b_{3i} + 2b_{3i+1} - 4b_{3i+2})\,2^{3i}\Big|_{2^n+1}

|B|_{2^n+1} = \Big|\sum_{i=0}^{\lfloor n/3 \rfloor} (b^*_{3i-1} + b^*_{3i} + 2b^*_{3i+1} - 4b^*_{3i+2})\,2^{3i}\Big|_{2^n+1}   (9)
|B|_{2^n+1} = \Big|\sum_{i=0}^{\lfloor n/3 \rfloor} b^{*MB}_i\,2^{3i}\Big|_{2^n+1}   (10)
where, for i = 0, b^{*MB}_0 = b^*_{-1} + b^*_0 + 2b^*_1 - 4b^*_2; for 1 ≤ i ≤ \lfloor n/3 \rfloor, b^{*MB}_i = b_{3i-1} + b_{3i} + 2b_{3i+1} - 4b_{3i+2}; along with b^*_{-1} = 0, b^*_0 = b_0 \vee b_n, b^*_1 = b_1 \oplus b_n and b^*_2 = b_2 \vee b_1 \cdot b_n.
Now, from Eqs. (6), (7), (9) and (10), the multiplier B can be Booth encoded as

|B|_{2^n+1} = \Big|\sum_{i=0}^{\lfloor n/3 \rfloor} b^{*MB}_i\,2^{3i}\Big|_{2^n+1}   (11)

where, for i = 0, b^{*MB}_0 = b^*_{-1} + b^*_0 + 2b^*_1 - 4b^*_2, and for 1 ≤ i ≤ \lfloor n/3 \rfloor, b^{*MB}_i = b_{3i-1} + b_{3i} + 2b_{3i+1} - 4b_{3i+2}.
|P_D|_{2^n+1} = \Big|(A + 1)\,\Big|\sum_{i=0}^{\lfloor n/3 \rfloor} b^{*MB}_i\,2^{3i}\Big|_{2^n+1} - B\Big|_{2^n+1}
  = \Big|\sum_{i=0}^{\lfloor n/3 \rfloor} b^{*MB}_i\,2^{3i}(A + 1) - B\Big|_{2^n+1}

|P_D|_{2^n+1} = \Big|\sum_{i=0}^{\lfloor n/3 \rfloor} PP_i - B\Big|_{2^n+1}   (12)

where PP_i = |b^{*MB}_i\,2^{3i}(A + 1)|_{2^n+1} = |b^{*MB}_i\,2^{3i}A + b^{*MB}_i\,2^{3i}|_{2^n+1}.
Table 1 Partial product generation of the proposed modulo 2^n + 1 multiplier using the modified weighted method

a_n | b_{3i+2} b_{3i+1} b_{3i} b_{3i-1} | PP_i | CR_i
0 | 0 0 0 0 | 1...1 ((n-3i) ones) 0...0 (3i zeros) | 2^{3i} + 1
0 | 0 0 0 1 | a_{n-1-3i} ... a_0 \bar{a}_{n-1} ... \bar{a}_{n-3i} | +1
0 | 0 0 1 0 | a_{n-1-3i} ... a_0 \bar{a}_{n-1} ... \bar{a}_{n-3i} | +1
0 | 0 0 1 1 | a_{n-1-(3i+1)} ... a_0 \bar{a}_{n-1} ... \bar{a}_{n-(3i+1)} | +1
0 | 0 1 0 0 | a_{n-1-(3i+1)} ... a_0 \bar{a}_{n-1} ... \bar{a}_{n-(3i+1)} | +1
0 | 0 1 0 1 | h_{n-1-3i} ... h_0 \bar{h}_{n-1} ... \bar{h}_{n-3i} | +1
0 | 0 1 1 0 | h_{n-1-3i} ... h_0 \bar{h}_{n-1} ... \bar{h}_{n-3i} | +1
0 | 0 1 1 1 | a_{n-1-(3i+2)} ... a_0 \bar{a}_{n-1} ... \bar{a}_{n-(3i+2)} | +1
0 | 1 0 0 0 | a_{n-1-(3i+2)} ... a_0 \bar{a}_{n-1} ... \bar{a}_{n-(3i+2)} | +1
0 | 1 0 0 1 | h_{n-1-3i} ... h_0 \bar{h}_{n-1} ... \bar{h}_{n-3i} | +1
0 | 1 0 1 0 | h_{n-1-3i} ... h_0 \bar{h}_{n-1} ... \bar{h}_{n-3i} | +1
0 | 1 0 1 1 | a_{n-1-(3i+1)} ... a_0 \bar{a}_{n-1} ... \bar{a}_{n-(3i+1)} | +1
0 | 1 1 0 0 | a_{n-1-(3i+1)} ... a_0 \bar{a}_{n-1} ... \bar{a}_{n-(3i+1)} | +1
0 | 1 1 0 1 | a_{n-1-3i} ... a_0 \bar{a}_{n-1} ... \bar{a}_{n-3i} | +1
0 | 1 1 1 0 | a_{n-1-3i} ... a_0 \bar{a}_{n-1} ... \bar{a}_{n-3i} | +1
0 | 1 1 1 1 | 1...1 ((n-3i) ones) 0...0 (3i zeros) | 2^{3i} + 1
1 | X X X X | 1...1 ((n-3i) ones) 0...0 (3i zeros) | 2^{3i} + 1
constant term equal to 1, except for '0000' and '1111', i.e. for the scaling factor '0', which introduces a constant term equal to 2^{3i} + 1. Thus, the sum of all correction terms derived from the partial products can be given by
CR = \lfloor n/3 \rfloor + C_n   (13)

where C_n = \sum_{i=0}^{\lfloor n/3 \rfloor} C_{3i}\,2^{3i}.
After putting the value of Eq. (13) into Eq. (12),

|P_D|_{2^n+1} = \Big|\sum_{i=0}^{\lfloor n/3 \rfloor - 1} PP_i + \lfloor n/3 \rfloor + C_n - B\Big|_{2^n+1}
  = \Big|\sum_{i=0}^{\lfloor n/3 \rfloor - 1} PP_i + \lfloor n/3 \rfloor + C_n - b_n 2^n - B_{n-1:0}\Big|_{2^n+1}

|P_D|_{2^n+1} = \Big|\sum_{i=0}^{\lfloor n/3 \rfloor - 1} PP_i + \lfloor n/3 \rfloor + C_n + b_n - B_{n-1:0}\Big|_{2^n+1}
|P_D|_{2^n+1} = \Big|\sum_{i=0}^{\lfloor n/3 \rfloor - 1} PP_i + \lfloor n/3 \rfloor + C_n + b_n + \bar{B}_{n-1:0} + 2\Big|_{2^n+1}
  = \Big|\sum_{i=0}^{\lfloor n/3 \rfloor - 1} PP_i + \lfloor n/3 \rfloor + C_n + b_n + 1 + 1 + \bar{B}_{n-1:0}\Big|_{2^n+1}
  = \Big|\sum_{i=0}^{\lfloor n/3 \rfloor - 1} PP_i + \lfloor n/3 \rfloor + C^*_n + 1 + \bar{B}_{n-1:0}\Big|_{2^n+1}

|P_D|_{2^n+1} = |S + C + 1|_{2^n+1}   (14)
Eq. (14) represents the final equation of the proposed algorithm, where S + C = \Big|\sum_{i=0}^{\lfloor n/3 \rfloor - 1} PP_i + \lfloor n/3 \rfloor + C^*_n + \bar{B}_{n-1:0}\Big|_{2^n+1} and C^*_n = 1 + b_n + C_n.
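The whole derivation rests on the decomposition P_D = AB = (A + 1)B - B introduced in Eq. (3); a brute-force check of that identity modulo 2^n + 1 (here for n = 8) is trivial:

```python
n = 8
M = 2**n + 1
for A in range(M):
    for B in range(M):
        # Eq. (3): P_D = AB + B - B = (A + 1)B - B
        assert ((A + 1) * B - B) % M == (A * B) % M
print("(A + 1)B - B == AB holds for all operands mod", M)
```

The point of the rewrite is architectural: (A + 1) is exactly the increment that a diminished-1 datapath already carries implicitly, so the multiplier never needs a separate +1 adder on the A input.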
there are 3i MBSC* blocks, and (n − 3i) MBS* blocks will be used. Truth Table 3 shows the bit-wise partial product generation for the different Booth encoded bits. Figure 3a, b shows the block diagram and logical representation of the MBS* block, while Fig. 4a, b shows the block diagram and logical representation of the MBSC* block, respectively.
This section describes the generation of the correction term (C_{3i}) required when any one of the conditions a_n = 0, b_n = 0 or b^{*MB}_i = 0 occurs. From Table 1, the correction term for the proposed algorithm will be of the form 0c_{i+3}00c_i 00c_6 00c_3 00c_0, where c_i ∈ {0, 1}. Truth Table 4 shows the correction term of the 0th partial product, and it can be seen that C_i is equal to 1 only when all four outputs of the Booth encoder, HG, TH, TW and ON, are equal to 0. Thus, the CTG is implemented by a 4-input NOR gate, as shown in Fig. 5a. Figure 5b shows the correction term generator for the 8-bit modulo 2^n + 1 multiplier.
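The CTG logic described above reduces to a single gate; a one-line behavioral model (signal names HG, TH, TW and ON as in Truth Table 4):

```python
def ctg(hg, th, tw, on):
    """Correction-term generator bit: a 4-input NOR of the Booth
    encoder outputs, 1 only when HG, TH, TW and ON are all 0."""
    return int(not (hg or th or tw or on))

assert ctg(0, 0, 0, 0) == 1   # Booth digit is 0 -> correction needed
assert ctg(0, 1, 0, 0) == 0   # any active encoder output suppresses it
```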
Fig. 6 Proposed architecture of partial product generator for modified weighted method
Fig. 7 Proposed architecture of partial product accumulator for modified weighted method
All partial products are accumulated with the help of the CEAC-CSA tree, while constant values are naturally added in, and the accumulated partial products are reduced to two n-bit operands. The CSA tree is usually constructed with Full Adders (FAs); a stage that takes the input correction term can be constructed using Half Adders (HAs) and FAs, as shown in Fig. 7. Finally, a diminished-1 modulo 2^n + 1 adder accepts the two n-bit inputs and generates the final product in weighted representation. The 8-bit modulo 2^8 + 1 multiplier is shown in Figs. 6 and 7.
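The 3:2 compression that the CSA tree performs can be sketched as follows; this is a plain carry-save stage (the inverted end-around wrapping of the carry-out that makes it a CEAC-CSA is noted in a comment but not modeled):

```python
def csa(x, y, z):
    """One 3:2 carry-save stage: x + y + z == s + c, where s is the
    bitwise XOR (sum word) and c the majority carries shifted left.
    (A CEAC-CSA additionally wraps the carry-out end-around, which
    this plain sketch does not model.)"""
    s = x ^ y ^ z
    c = ((x & y) | (y & z) | (x & z)) << 1
    return s, c

# reduce four operands to two, then one final carry-propagate add
ops = [23, 42, 7, 99]
s, c = csa(ops[0], ops[1], ops[2])
s, c = csa(s, c, ops[3])
assert s + c == sum(ops)
print(s, c, s + c)
```

Each stage has constant delay regardless of word length, which is why the tree reduces all partial products to the two operands (S, C) consumed by the final modulo 2^n + 1 adder.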
5 Conclusion
References
1. Taylor FJ (1984) Residue arithmetic a tutorial with examples. IEEE Comput 17(5):50–62.
https://doi.org/10.1109/MC.1984.1659138
2. Watson RW, Hastings CW (1966) Self-checked computation using residue arithmetic. Proc
IEEE 54(12):1920–1931. https://doi.org/10.1109/PROC.1966.5275
3. Conway R, Nelson J (2004) Improved RNS FIR filter architectures. IEEE Trans Circuits Syst
II Express Briefs 51(1):26–28. https://doi.org/10.1109/TCSII.2003.821524
4. Wang W, Swamy MNS, Ahmad MO (2004) RNS application for digital image processing.
In: 4th IEEE international workshop on system-on-chip for real-time applications, pp 77–80.
https://doi.org/10.1109/IWSOC.2004.1319854
5. Yen SM, Kim S, Lim S, Moon SJ (2003) RSA speedup with Chinese remainder theorem immune against hardware fault cryptanalysis. IEEE Trans Comput 52(4):461–472. https://doi.org/10.1109/TC.2003.1190587
Abstract The floating-point (FP) addition is the most frequently used FP operation. Here we use the single-electron transistor (SET) for floating-point addition. This research aims to implement a 32-bit binary floating-point adder compliant with the IEEE 754 standard using SETs. In floating-point arithmetic, FP addition is the most difficult operation, and it incurs more delay and more power consumption. Here we compare SET-based and CMOS-based (16 nm) floating-point adders; the SET-based FP adder consumes far less power and has far less delay. For simulation and verification, Cadence Virtuoso is used. According to our results, the SET-based FP adder has a 79.70% improvement in power and is 97.67% faster than the CMOS-based FP adder.
1 Introduction
the measurements on integrated circuits and semiconductor chips [2]. For this reason, many problems, such as short-channel effects, costly lithography, and ultrathin-gate leakage, take place. As Moore's law drives technology into the nanoscale regime, new nano-electronic solutions will be needed to overcome the physical and economic barriers of current technologies. Among present devices such as the TFET, NEMS, SET, CNTFET, and graphene FET, the SET is the most promising device that can be hybridized with CMOS technology. The SET has many features, such as small size, low power consumption, and high-speed capability [3]; it is a potential nano-device that can be designed and fabricated to build high-performance logic circuits. The floating-point (FP) adder is a complex unit that needs high memory and processing power, so in this paper we exploit the SET to perform FP addition and compare the result with a nano-CMOS circuit.
The SET consists of a source, drain, gate, back gate, island, and two tunneling junctions. Figure 1a shows the SET symbol. Here VDS = bias voltage, CD and CS = tunneling junction capacitances, CG1 and CG2 = gate capacitances, RD and RS = tunneling
Fig. 1 a Single-electron transistor. b The IDS versus VGS characteristic of the SET. c The IDS
versus VDS characteristic of the SET. d The IDS versus VGS characteristic of the n-SET and p-SET
Design of Prominent Single-Precision 32-Bit Floating-Point … 203
junction resistances, and VG1 and VG2 = gate voltages. Detailed information about SET operation and modeling is available in the literature [4, 5]. This work is based on a metallic SET fabricated using a back-end-of-line (BEOL) fabrication process with a low thermal budget [6]. The SET basically works on two principles, Coulomb blockade and quantum mechanical tunneling; these two phenomena are responsible for the SET working as a switching device. It works as a p-switch or an n-switch when the SET's back gate potential VG2 is adjusted; this combination is known as the n-SET and p-SET, replicating the n-MOS and p-MOS. It is observed that in the Coulomb blockade region the current is nearly zero, so the SET can be viewed as a simple gate-controlled switch: the switch is 'OFF' when the Coulomb blockade region exists, otherwise it is 'ON'. SET and CMOS have complementary properties [6].
Figure 1b shows the drain current characteristics at constant VDS, the drain-to-source voltage. For small values of VDS, electrons cannot rise above the single-electron charging energy and cannot tunnel onto the island, so current cannot flow in the device. This is called the Coulomb blockade effect, and it leads to the I–V characteristics shown in Fig. 1c. From the SET characteristics we can conclude that the SET can be used in a complementary fashion, as an n-SET and a p-SET. Figure 1d shows the IDS versus VGS curves of the n-SET and p-SET.
The SET parameters are defined in the work presented by Parekh et al. [6]. The SET characteristics are based on the MIB model (Mahapatra–Ionescu–Banerjee) [7]. To compare with bulk 16-nm CMOS, we use the 16-nm BSIM predictive model [8] and implement it in the analog design environment of Cadence Virtuoso [9]. Here the junction capacitances are CD = CS = CJ; the control gate capacitances CG1 = CG2 = CG; the tuning gate capacitances CB1 = CB2 = CB; the total capacitance is C (C = 2CJ + CG + CB); and the tunnel junction resistances are RD1 = RS1 = RD2 = RS2 = Rt. The parameters are: CG1 = 0.045 aF, CG2 = 0.050 aF, CD = 0.030 aF, CS = 0.030 aF, Rt = 1 MΩ, and T = 300 K; the 16-nm CMOS parameters are defined by the 16-nm BSIM predictive model [8, 10].
Binary Value = (-1)^{Sign bit} × 2^{(Exponent - 127)} × (1.Mantissa)   (1)
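Eq. (1) can be checked directly by unpacking the bit fields of a binary32 value in Python (a sketch using the standard struct module):

```python
import struct

def fp32_fields(x):
    """Split a Python float (rounded to binary32) into IEEE 754 fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    # value = (-1)^sign * 2^(exponent-127) * (1 + mantissa/2^23), per Eq. (1)
    value = (-1) ** sign * 2.0 ** (exponent - 127) * (1 + mantissa / 2 ** 23)
    return sign, exponent, mantissa, value

print(fp32_fields(0.7))
```

For 0.7 the biased exponent comes out as 126 (i.e. 2^-1), matching the `01111110` exponent field used in the worked example later in this paper.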
204 A. Sharma et al.
32-Bit Floating-Point Addition
The most frequently used FP operation is addition. It accounts for nearly 50% of the scientific operations in digital signal processors, math co-processors, arithmetic units in embedded processors, and many data-processing units. Every part of these processors requires very high numerical stability and precision and is consequently FP based. In most hardware designs the FP unit is one of the most significant custom blocks, desirable because of its accuracy, ease of use, and robustness to the quantization errors introduced in the design. Compared to FP multiplication, FP addition is the more complicated and most troublesome task: an FP adder comprises many sub-operations and has variable latency, and a lot of work has been reported on improving the overall latency of FP adders. In 1985, IEEE issued the 754 standard for binary floating-point arithmetic [12]. In this work we have designed and simulated a 32-bit FP adder; a detailed explanation of the algorithm, architecture, sub-operations, and simulation results follows in the subsequent sections (Fig. 3).
This section investigates the FP addition algorithm. As part of the proposed algorithm, the FP architecture modules are designed and simulated in the Cadence Virtuoso environment, including their function, capacity, and use. The proposed algorithm for floating-point addition is shown below [13], with the following steps.
Step 7: If there is no overflow or underflow, then the next step is to round off so the number fits the assumed format.
Step 8: After rounding off, we again check whether the sum is normalized. If not, go to step 4 and normalize the sum.
Step 9: If normalized, the FP addition process is complete.
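The align/add/normalize flow of the steps above can be sketched on integer significands as follows. This toy model assumes (sign, exponent, 24-bit significand with the hidden 1 made explicit) operands and truncation rounding, and ignores IEEE 754 corner cases (infinities, NaNs, subnormals):

```python
def fp_add(sa, ea, ma, sb, eb, mb, p=24):
    """Toy FP addition on (sign, exponent, p-bit significand with the
    hidden 1 made explicit): align -> add/subtract -> normalize."""
    # keep the operand with the larger magnitude first
    if (ea, ma) < (eb, mb):
        sa, ea, ma, sb, eb, mb = sb, eb, mb, sa, ea, ma
    # align the smaller operand's significand (truncation rounding)
    mb >>= (ea - eb)
    # add or subtract significands depending on the signs
    m = ma + mb if sa == sb else ma - mb
    s, e = sa, ea
    # normalize back into [2^(p-1), 2^p), as in steps 4-9
    while m >= (1 << p):
        m >>= 1
        e += 1
    while m and m < (1 << (p - 1)):
        m <<= 1
        e -= 1
    return s, e, m

# 1.5 * 2^1 + 1.0 * 2^0 = 4.0, i.e. 1.0 * 2^2
print(fp_add(0, 1, 0b11 << 22, 0, 0, 1 << 23))
```

For the inputs above the result is (0, 2, 1 << 23), i.e. significand 1.0 at exponent 2, showing the single-position right shift that the post-add normalization step performs.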
In this work the SET-based FP adder has been compared with a 16-nm CMOS FP adder. For simulation, the 16-nm CMOS PTM model has been integrated and implemented in the Spectre circuit simulator and the Virtuoso analog design environment of Cadence [8, 10]. The SET parameters are defined in the work by Parekh et al. [6]. The SET is a metallic SET that uses a Ti/TiOx interface at the junctions, fabricated on a TEOS layer by the nanodamascene process; this SET can operate at room temperature (300 K). The parameters are: back gate capacitance CB = 50 zF, junction capacitances CD and CS = 30 zF, gate capacitance CG = 45 zF, tunnel junction resistance RT = 1 MΩ, high voltage = 800 mV, low voltage = 0 V, gain = 1, and T = 300 K [6]. With these data we have simulated the SET-based FP adder on the following example.
Example: As an example we take the two numbers 0.6 and 0.1; their FP representations are shown below.
Step 3: Normalize the result (make the 'hidden bit' a 1):
0 01111110 01100110011001100110100
5 Performance Evaluation
Table 1 Performance analysis of the FP adder

Parameter  | CMOS (16 nm) | SET  | Improvement (%)
Delay (ps) | 904.40       | 21   | 97.67
Power (µW) | 16.81        | 3.41 | 79.70
Addends = 0.6
Addends = 0.1
Sum = 0.7
6 Conclusion
References
1 Introduction
The low-density parity-check (LDPC) code is popular due to its performance near channel capacity. It was invented by R. Gallager in his Ph.D. thesis at MIT in 1963 [8], but was forgotten due to the unavailability of software and the implementation complexity of the code. It was rediscovered by D. J. C. MacKay and Neal in the mid-'90s at Cambridge University [12]. It was represented graphically by R. M. Tanner in 1981 using a bipartite graph, also known as a Tanner graph [14]. LDPC codes are classified into two categories: (1) randomly constructed LDPC codes and (2) structured LDPC codes. Randomly constructed codes perform better than structured ones, but their implementation complexity is higher [10]. Structured codes are widely
D. J. Patel (B)
Dr. S. & S. S. Ghandhy Government Engineering College, Surat, India
e-mail: [email protected]
P. Engineer (B)
S. V. National Institute of Technology, Surat, India
e-mail: [email protected]
N. S. Bhatt (B)
C. K. Pithawala College of Engineering and Technology, Surat, India
e-mail: [email protected]
used for short and moderate code lengths, while randomly constructed codes are used for long codes, such as in the satellite communication standard DVB-S2. The LDPC code has been adopted in various wireless standards, such as IEEE 802.11n (Wi-Fi) and IEEE 802.16e (WiMAX), and is also used in different storage-media applications.
An LDPC code is constructed from a sparse parity-check matrix H, which contains a small number of nonzero entries; because of this, the computational complexity of an LDPC decoder is low. Different methods of code construction [11, 13, 16, 17] are used to construct the parity-check matrix H of a QC-LDPC code. These methods give different bit error rate (BER) performance for different numbers of iterations. All these methods are limited in girth due to their structured constructions. BER performance in the waterfall region can be improved by density evolution, while the error floor can be improved by avoiding short cycles (cycles of four). A good-performing code can efficiently perform image communication with a lower signal-to-noise ratio (SNR) and lower bit error rate (BER), and a higher peak signal-to-noise ratio (PSNR), because an improvement in BER improves PSNR [5]. The code construction suggested using the Sridara–Fuja–Tanner (SFT) structure and quadratic congruences gives a BER of 10^-5 at approximately 4.5 dB SNR for 20 iterations [16]. The construction suggested in [11] gives a BER of 10^-5 at an SNR of 3 dB, while the code construction suggested in [13] gives a BER of 10^-5 for 35 iterations at 4.5 dB SNR.
The parity-check matrix H obtained using these kinds of constructions is used to derive the generator matrix, either by the linear-time encoding method or by the Gauss–Jordan elimination method [10]; the generator matrix is used to encode the data. Linear-time encoding keeps the parity-check matrix sparse [10]. The computational complexity of reducing the parity-check matrix H into systematic form using Gauss–Jordan elimination is O(N^3), and the complexity of the encoder is O(N^2), where N is the code length [15]. In [15], other methods were suggested, such as encoding based on the structure of the parity-check matrix and encoding by erasure decoding, which are methods used for block codes; these methods encode the message with linear complexity. Decoding can be done using either hard-decision or soft-decision decoding; the accuracy of soft-decision decoding is higher than that of hard-decision decoding. The sum–product, min-sum and offset min-sum algorithms are examples of soft-decision decoding. To reduce the computational complexity, the proposed decoder uses the min-sum iterative message-passing decoding algorithm; there is a coding-gain difference of 0.27–1.03 dB between the sum–product and min-sum decoding algorithms [2]. According to [15], the decoding complexity is O(N), which is less than that of the encoder. According to [4], the BER performance of the SPA is improved by 1.8 dB compared to the Bit-Flipping Algorithm (BFA) of hard-decision decoding. A decoder design may be of bit-serial, partially parallel or fully parallel type [3]. The bit-serial design gives lower throughput with the use of fewer resources; the partially parallel design gives throughput higher than bit-serial but lower than fully parallel; and the fully parallel design gives the highest throughput at the cost of more hardware resources [3]. The bit-serial and partially parallel designs require intermediate memory blocks. The proposed decoder design is based on the fully parallel approach. In [1], a partially parallel design was implemented on a Virtex-7
Image Communication Using Quasi-Cyclic … 213
FPGA for the code length (1152, 2304), obtaining a throughput of 300 Mbps for 20 iterations and 400 Mbps for 15 iterations. That design was implemented using BRAM, due to which the throughput was reduced while fewer slice LUTs were consumed. In [1], a quantization of (6, 4) was used, which is more than in our proposed design for the code length 1152. In [6], a partially parallel design based on the offset belief-propagation algorithm was implemented for three different rates, giving a throughput of 100 Mbps with 12 iterations. A fully parallel LDPC decoder was designed using (4, 1) quantization, achieving a throughput of 16.9 Gbps at an SNR value of 3.5 dB [3].
The proposed work constructs a structured LDPC code, known as a quasi-cyclic LDPC code, using isolated shifted identity matrices. The LDPC code is constructed for half rate, and it is a (3, 6)-regular LDPC code, where 3 indicates the column weight and 6 indicates the row weight of the parity-check matrix H. The minimum distance increases linearly with code length for a column weight of 3; for codes with column weight less than 3, the minimum distance increases only logarithmically [8]. A larger minimum distance increases the error-correcting capability of the decoder, according to Hamming. The proposed work is done for a QC-LDPC code of length 1152.
The parity-check matrix H constructed using the proposed method is as follows
H = [ I1 I2 I3 I4 I5 I6
      I6 I5 I4 I3 I2 I1
      I2 I3 I4 I5 I6 I7 ]   (1)
where I1 to I7 are circularly shifted identity matrices of size p whose rows are right-shifted: the first row is right-shifted by '1' with respect to the last row, and the second row is right-shifted with respect to the first. The minimum distance of the above constructed code is (J + 1)! for a (J, L)-type regular code [7]. The rate of the code is 1 − wc/wr, where wc = column weight and wr = row weight; accordingly, the rate of our code is 1/2.
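A small plain-Python sketch of this construction, with illustrative shift amounts for I1 to I7 (the actual shifts of the proposed code are not specified here) and p = 192 so that n = 6p = 1152, confirms the (3, 6)-regularity and the rate:

```python
def shifted_identity(p, s):
    """p x p identity matrix with each row circularly right-shifted by s."""
    return [[1 if c == (r + s) % p else 0 for c in range(p)] for r in range(p)]

def hstack(blocks):
    """Concatenate equal-height matrices side by side."""
    return [sum((b[r] for b in blocks), []) for r in range(len(blocks[0]))]

p = 192                                  # 6 column blocks -> n = 6 * 192 = 1152
I = {k: shifted_identity(p, k) for k in range(1, 8)}   # illustrative shifts for I1..I7
H = (hstack([I[1], I[2], I[3], I[4], I[5], I[6]])
     + hstack([I[6], I[5], I[4], I[3], I[2], I[1]])
     + hstack([I[2], I[3], I[4], I[5], I[6], I[7]]))
col_w = [sum(row[c] for row in H) for c in range(6 * p)]
row_w = [sum(row) for row in H]
assert set(col_w) == {3} and set(row_w) == {6}         # (3, 6)-regular
print(len(H), len(H[0]), 1 - 3 / 6)                    # rate = 1 - wc/wr = 1/2
```

Because every block is a permutation matrix, each column receives exactly one 1 per block row and each row exactly one 1 per block column, which is what makes the regularity automatic.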
2.1 Encoding

H = [I | P]   (2)

where I is the identity matrix of size (n − k) × (n − k), n is the code length and k is the message length. P is the matrix of parity bits, which is transposed and concatenated with the identity matrix of size k × k to form the generator matrix G, represented by

G = [P^T | I]   (3)
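A toy GF(2) illustration of Eqs. (2) and (3), with an invented 3 × 3 parity part P, shows that G H^T = 0, so every encoded word satisfies all parity checks:

```python
def gf2_matmul(A, B):
    """Matrix product over GF(2)."""
    return [[sum(a & b for a, b in zip(row, col)) % 2 for col in zip(*B)]
            for row in A]

# tiny (n, k) = (6, 3) example; the entries of P are illustrative
P = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]                                       # (n - k) x k parity part
I3 = [[1 if i == j else 0 for j in range(3)] for i in range(3)]
H = [I3[r] + P[r] for r in range(3)]                  # H = [I | P]
PT = [list(col) for col in zip(*P)]
G = [PT[r] + I3[r] for r in range(3)]                 # G = [P^T | I]
HT = [list(col) for col in zip(*H)]
assert all(v == 0 for row in gf2_matmul(G, HT) for v in row)   # G H^T = 0
msg = [1, 0, 1]
codeword = gf2_matmul([msg], G)[0]   # systematic: parity bits, then message
print(codeword)
```

The codeword here is [1, 1, 0, 1, 0, 1]; its last k bits are the message itself, which is the convenience of the systematic form produced by Gauss-Jordan elimination.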
2.2 Decoding
A hard decision is taken at the end of the given iterations, or for the exit-condition test, as given in (8):

c_i = 0 if y_{fi} = +1
c_i = 1 if y_{fi} = −1   (8)
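A compact software model of min-sum message passing, ending with the hard decision of Eq. (8), can be written as follows; the H and LLR values at the bottom are a toy example, not the proposed 1152-bit code:

```python
def minsum_decode(H, llr, iters=10):
    """Min-sum iterative message-passing decoding (software sketch).
    H: parity-check matrix as a list of 0/1 rows; llr: channel LLRs,
    positive meaning bit 0, matching the hard decision of Eq. (8)."""
    m, n = len(H), len(H[0])
    rows = [[j for j in range(n) if H[i][j]] for i in range(m)]
    msg = {(i, j): 0.0 for i in range(m) for j in rows[i]}   # check -> variable
    hard = [0 if l >= 0 else 1 for l in llr]
    for _ in range(iters):
        # variable -> check: channel LLR plus all other check messages
        v2c = {(i, j): llr[j] + sum(msg[(k, j)] for k in range(m)
                                    if H[k][j] and k != i)
               for i in range(m) for j in rows[i]}
        # check -> variable: product of signs times minimum magnitude
        for i in range(m):
            for j in rows[i]:
                others = [v2c[(i, k)] for k in rows[i] if k != j]
                sign = -1 if sum(v < 0 for v in others) % 2 else 1
                msg[(i, j)] = sign * min(abs(v) for v in others)
        total = [llr[j] + sum(msg[(i, j)] for i in range(m) if H[i][j])
                 for j in range(n)]
        hard = [0 if t >= 0 else 1 for t in total]           # Eq. (8)
        if all(sum(hard[j] for j in r) % 2 == 0 for r in rows):
            break                                            # all checks satisfied
    return hard

H = [[1, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 1, 1],
     [0, 0, 1, 1, 0, 1]]
llr = [2.0, 2.0, 2.0, -0.5, 2.0, 2.0]   # all-zero codeword, bit 3 received wrong
print(minsum_decode(H, llr))
```

For this toy input the single weakly received bit is corrected back to the all-zero codeword in one iteration; the min/sign operations are exactly what replaces the sum-product tanh computation in hardware.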
The general diagram of the LDPC decoder is shown in Fig. 1. It mainly consists of an input network, an output network, a control unit, and a permutation network of variable nodes and check nodes. It is a fully parallel design in which all the variable nodes and check nodes are mapped. The heart of this decoder is the control unit, which controls all the blocks on a specific count synchronized with the system clock. Figure 1 shows that the encoded message, in the form of 4-bit LLRs, comes through the AWGN channel and is given to the input network; the decoding process then starts, in the form of variable-node and check-node processing, for the specific number of iterations given to the LDPC decoder. The decoded output comes through the output network. The proposed decoder takes 96 simultaneous inputs at a time; within 12 clock cycles, all 1152 inputs are fed into the input network of the decoder. After the decoding process is over, 96 simultaneous outputs at a time come out of the LDPC decoder through the output network. Due to the fully parallel architecture, the variable-node and check-node processes are each completed in one clock cycle, in an alternating manner, so one iteration requires two clock cycles. The throughput of the proposed decoder is approximately 2 Gbps due to the fully parallel design, but the resource requirement is higher. The throughput is calculated as follows:
Throughput = (N × r × f) / (itr × θ)   (9)
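With illustrative numbers (the 1152-bit code at rate 1/2, an assumed 70 MHz clock, 10 iterations, and the two clock cycles per iteration described above, taking θ as cycles per iteration), Eq. (9) indeed lands near the quoted 2 Gbps:

```python
def throughput_bps(N, r, f_hz, itr, theta):
    """Eq. (9): decoded information throughput.
    N: code length, r: code rate, f_hz: clock frequency,
    itr: iterations, theta: clock cycles per iteration (assumed meaning)."""
    return N * r * f_hz / (itr * theta)

# illustrative operating point; the clock frequency is an assumption
print(throughput_bps(1152, 0.5, 70e6, 10, 2) / 1e9, "Gbps")
```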
The proposed work was done using MATLAB R2014a and Xilinx ISE Design Suite 14.2. The flow of image communication using the proposed design is shown in Fig. 2. The image is read and converted into binary data, which is then encoded using the Gauss–Jordan elimination method in MATLAB R2014a. After encoding, the data is modulated with BPSK and passed through the AWGN channel, generating text files for various SNR values. The noisy data is sent to the FPGA through a UART and decoded using the min-sum iterative message-passing decoding algorithm in hardware. The decoded data is then sent back to the PC through the UART and reconstructed by converting the binary data back into an image in MATLAB R2014a. The proposed LDPC decoder reconstructs the image at 4 dB similar to the original image, which can be observed from the peak signal-to-noise ratio (PSNR) value of the reconstructed image. The quality of the decoded image at different BER values is also analyzed by computing the PSNR with respect to the original image. The PSNR is calculated as follows [5]:
PSNR (dB) = 10 \log_{10}\!\left( \dfrac{P_{max}^2}{MSE} \right)   (10)

MSE = \dfrac{1}{x \times y} \sum_{i=1}^{x} \sum_{j=1}^{y} (A_{ij} - B_{ij})^2   (11)

where MSE is the mean-square error, P_max is the maximum pixel value in the image, A is the pixel value of the original image, B is the pixel value of the reconstructed image, x is the height of the image in pixels, and y is the width of the image in pixels [5].
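Eqs. (10) and (11) translate directly into a few lines of Python (grayscale images as 2-D lists; P_max = 255 is assumed for 8-bit pixels):

```python
import math

def psnr(original, reconstructed, p_max=255):
    """PSNR per Eqs. (10)-(11) for two equal-size grayscale images."""
    x, y = len(original), len(original[0])
    mse = sum((a - b) ** 2
              for ra, rb in zip(original, reconstructed)
              for a, b in zip(ra, rb)) / (x * y)
    if mse == 0:
        return float("inf")          # identical images
    return 10 * math.log10(p_max ** 2 / mse)

A = [[52, 55], [61, 59]]
B = [[52, 54], [61, 59]]             # one pixel off by 1 -> MSE = 0.25
print(round(psnr(A, B), 2))
```

A single off-by-one pixel in a 2 x 2 image already gives a PSNR above 54 dB, which is why the roughly 40 dB values reported below indicate a reconstruction visually close to the original.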
The PSNR values for different SNRs can be observed from Fig. 3, and the plot of BER versus PSNR from Fig. 4. We can observe that the PSNR increases as the BER improves. We obtained the maximum PSNR, around 40 dB, for the first image shown in Table 1.
218 D. J. Patel et al.
PSNR at different BER: The proposed decoder gives BER and PSNR performance comparable to the decoder suggested in [5], as can be observed from Table 1.
PSNR at different SNR of the reconstructed image: The proposed decoder gives better PSNR performance at lower SNR values compared to the designs suggested in [9, 18], as can be observed from Figs. 5, 6, and 7. The proposed decoder reconstructs the image at an SNR value of 4 dB, while the design suggested in [9] reconstructs the image only at an SNR value of 12 dB; in [18] the image was reconstructed at an SNR value of 4 dB but still has distortion. The proposed decoder performs image communication efficiently compared to the designs suggested in [9, 18]. The reconstructed images can be observed from the comparison shown in Table 2.
Proposed: 4 dB
[5]: 3.5 dB
[9]: 12 dB
[18]: 4 dB
[18]: 4 dB
5 Conclusion
Proposed decoder efficiently reconstructs the image with the good quality compared
to design suggested in [9, 18]. It also gives a comparable performance of BER to
the design suggested in [5]. Due to the use of min-sum decoding algorithm, the
computational complexity of decoder is less, and there is not any alternate solution
of a min-sum decoder which is less complex. Due to fully parallel architecture, the
proposed decoder communicates an image at a high data rate.
Acknowledgments The authors would like to thank the VLSI Laboratory of the Electronics Engineering Department, Sardar Vallabhbhai National Institute of Technology (SVNIT), Surat, for providing resources for this work.
References
1. Amaricai A, Boncalo O, Mot I (2015) Memory efficient FPGA implementation for flooded
LDPC decoder. In: 2015 23rd telecommunications forum Telfor (TELFOR), pp 500–503
2. Anastasopoulos A (2001) A comparison between the sum-product and the min-sum iterative
detection algorithms based on density evolution. In: IEEE global telecommunications confer-
ence GLOBECOM’01 (Cat. No.01CH37270), vol 2, pp 1021–1025
3. Balatsoukas-Stimming A, Dollas A (2012) FPGA-based design and implementation of a multi-
GBPS LDPC decoder. In: 22nd international conference on field programmable logic and
applications (FPL), pp 262–269
4. Chandrasetty V, Aziz S (2011) FPGA implementation of a LDPC decoder using a reduced
complexity message passing algorithm. JNW 6:36–45
5. Chandrasetty V, Aziz S (2014) Resource efficient LDPC decoders for multimedia communi-
cation. Integr VLSI J 48:213–220
6. Chen Y, Chen X, Zhao Y, Zhou C, Wang J (2010) Design and implementation of multi-mode
QC-LDPC decoder. In: 2010 IEEE 12th international conference on communication technol-
ogy, pp 1145–1148
7. Fossorier MPC (2004) Quasicyclic low-density parity-check codes from circulant permutation
matrices. IEEE Trans Inf Theory 50(8):1788–1793
8. Gallager R (1962) Low-density parity-check codes. IRE Trans Inform Theory 8(1):21–28
9. Jeng L-D, Chang Y-H, Lee TH (2014) Image transmission for low-density parity-check coded
system under PBNJ. In: The fourth international conference on digital information processing
and communications (ICDIPC2014), pp 101–106
10. Johnson SJ (2010) Introducing low-density parity-check codes
11. Kong L, Xiao Y (2008) Construction of good quasi-cyclic LDPC codes based on the row vectors
of generator matrix. In: IET 2nd international conference on wireless, mobile and multimedia
networks (ICWMMN 2008), pp 215–218
12. MacKay DJC, Neal RM (1997) Near Shannon limit performance of low density parity check
codes. Electron Lett 33(6):457–458
13. Malema G (2007) Low-density parity-check codes: construction and implementation
14. Tanner R (1981) A recursive approach to low complexity codes. IEEE Trans Inf Theory
27(5):533–547
15. Tanner RM, Sridhara D, Sridharan A, Fuja TE, Costello DJ (2004) LDPC block and convolu-
tional codes based on circulant matrices. IEEE Trans Inf Theory 50(12):2966–2984
16. Timakul S, Choomchuay S (2011) Construction of quasi-cyclic LDPC codes from SFT struc-
ture and cyclic shift. In: 2011 international symposium on intelligent signal processing and
communications systems (ISPACS), pp 1–4
17. Wang Y, Draper S, Yedidia J (2013) Hierarchical and high-girth QC-LDPC codes. IEEE Trans
Inform Theory 59:4553–4583
18. Zhang Y, Li X, Yang H (2016) Unequal error protection in image transmission based on LDPC
codes. Int J Signal Process Image Process Pattern Recognit 9:1–10
Smart Soldier Health Monitoring System
Incorporating Embedded Electronics
1 Introduction
The military forces of a country play an important role in the external as well as the
internal security of the country. The same is attained by activities such as patrolling,
counter-insurgency operations, surgical strikes, and so on. The modus operandi
of these activities is deputing a group of soldiers with a well-structured chain of
command who operate as per the instructions of the designated leader. During every
operation, there is a controlling base station which monitors the overall activities
of many such troops and keeps a track of them primarily through communication
with RF devices such as walkie-talkie and communication sets. Early retrieval of
information regarding the activities happening in the area of the operation plays an
important role in decision-making and planning for further actions. In practice,
when a soldier does not respond over the reporting medium, such as a walkie-talkie
or communication set, or fails to provide the situational report within the stipulated
time, a casualty is assumed. This procedure is timeworn and antiquated.
During patrolling, the group leader generally subdivides the group into a subgroup
of few soldiers, and the area to be patrolled is divided and assigned to each subgroup.
A fixed time interval is decided by the group leader, and the subgroups submit the
situational report (called SITREP) at that interval. The same is relayed to the
commander at the base station. In the case of action or attack
during patrolling, there is no communication between the subgroups. The group
leader does not know the location of the subgroup until they physically come and
make the SITREP. Due to the heavy time lag in obtaining information about any
kind of casualty, the group leader cannot swiftly take effective decisions such as
providing backup cover or sending additional manpower to the affected area, thus
diminishing the overall fighting efficacy of the troop. The proposed model
aims at the design of a prototype for a smart soldier health monitoring system. It
incorporates embedded systems and a body area network to automatically
provide information on the occurrence of any casualty in a troop. Incorporating
wearable electronics with wireless networks would help improve the
fighting efficiency of the troops with minimal casualties.
2 Literature Review
The functioning of a group of soldiers in the battlefield and tactical scenario can
be augmented by monitoring the activities and physiological parameters. The same
requires precise information about the soldier’s activities, health, and position [1].
Recognition of human activities plays an essential role in such requirements and
is carried out in two ways, namely through external sensing and through wearable
sensors; in a tactical scenario the latter is the essential requirement. A system
that monitors personnel conditions includes a variety of sensors that can be placed on
a soldier for measuring physiological and activity parameters.
The sensors upon detection of the signals communicate with a soldier unit which
can process the information. Location parameters can be obtained through a global
positioning unit through which the leaders can effectively locate the soldiers and take
necessary action in a tactical scenario [2]. Further, the same also helps in providing
the medical help to track and treat any of the casualties in the battle. Along with
that, additional sensors can provide information which may assist both the group
leader as well as the medicos in understanding the area of deployment and take
necessary measures before deployment to the area. The primary parameter which
needs to be measured in this scenario is the motion parameter of the soldier which
determines the condition of the soldier. An inertial measurement unit is best suited
for this purpose. In addition to accuracy and precision, this
application requires high shock survivability and post-shock stability. The MEMS-
based sensors can be effectively used for motion detection as the size, accuracy,
and precision match the requirements for parameter measurement of a soldier. For
measuring the activities of a soldier, any MEMS-based gyroscope which can measure
6–9 degrees of freedom is preferable. To cover the same, three gyroscopes and three
accelerometers are essential [3].
Kozlovszky et al. in their work on IMU-based human motion tracking [4]
described the usage of an inertial movement unit-based wearable rehabilitation move-
ment tracking device which can be used for human motion tracking. Software using
a multisensor fusion solution, where the data is taken from multi-axis accelerometer
and gyroscope devices, has been implemented for detecting the motion parameters.
The second parameter which is essential for
the appraisal of the condition of soldiers in the battlefield or forward area is their
physiological condition. The efficacy of combat medics can be greatly improved
by increasing the speed and precision in which the physiological information is
gathered from the soldiers who are wounded [5]. For monitoring the health of a
soldier, multiple physiological parameters can aid in multiple diagnoses and help in
determining the most effective forms of the treatment. The pulse rate of a soldier is
calculated through measurement of heart rate and arterial oxyhemoglobin saturation
(SpO2) [6]. Further, heart rate variability plays an important role in identifying
critically injured patients and is a potential predictor of mortality. The task force of
the European Society of Cardiology and The North American Society of Pacing
and Electrophysiology has proposed a study bringing out the various applications of
heart rate variability of a human being [7]. The study brought out various measures
of determining the heart rate variability, analysis, and evaluation methods. With this
study, it is clear that the pulse rate measurement can be highly effective in measuring
the physiological condition of the soldier. At locations where the atmospheric condi-
tions have a deep impact on the soldiers, ECG monitoring of the soldier is preferred
for detailed health monitoring.
Measurement of the atmospheric conditions in which the soldier is operating is
also essential to enable the commander in control make necessary changes on the
backup of troops as well as the medical help being deployed. In a field manual of US
army, various weather parameters affecting the soldier in a battlefield were brought
out, with the primary ones being temperature, humidity, and altitude in which the
troops are operating. The measured data of the atmospheric parameters at the area
of operation when transmitted to the control station gives a clear picture of the
scenario and necessary corrective actions can be taken by the commander on the
medical backup being deployed as well. Measurement of air quality in the area of
operation would be extremely helpful for the tactical decision-makers in making
effective decisions on further deploying the troops or providing additional backup.
Furthermore, the data of the measured air quality would also assist in prospective
health monitoring. Thus, the essential sensors and components required for making
226 K. Teja et al.
a smart soldier system have been defined. The next prerequisite for the smart soldier
system is the transmission of measured values and signals as per the algorithm defined
for various conditions of the soldier. The transmission module should consume less
power and should be adaptable to various electronic warfare techniques, such as
frequency agility and frequency hopping. Low power consumption is essential
as it helps conserve the portable power supply, which is generally a battery.
Sensor connectivity plays an important role in the system. As the system operates
in a tactical environment, the accuracy, precision, and connectivity of the sensors
are paramount, so that even a minute change in activity or conditions is detected
immediately [8]. The sensor connectivity can be either wired or wireless; wireless
connectivity of sensors in the form of a body area network is best suited for tactical
scenarios [8].
A wireless network of sensors on a human body is called a wireless body area
network (WBAN). A WBAN is a system capable of transmitting a soldier's positional
as well as physiological information to a central control unit for storage and tactical
data analysis. The primary aim of a WBAN is to simplify and enhance the speed,
accuracy, and reliability of communication among sensors and actuators within, on,
or in the immediate proximity of a human body. Incor-
porating WBAN would enable continuous monitoring of physiological and environ-
mental attributes. In the case of detection of any abnormality, the data collected by
the sensors can be sent to a gateway [9]. The activities, condition, and environment
affecting the soldier on the battlefield can be effectively monitored through wireless
body area network. This can be carried out by integrating various sensors, GPS, and
wireless networking combined with an aggregation device for communication with
other soldiers and centralized monitoring [10]. WBAN, in the proposed system uses
star topology in which several sensor nodes are mounted on various locations of the
human body (soldier). They collect data related to various parameters and send them
to the coordinator node. The coordinator node is responsible for conveying the data
after aggregation to a central location for further analysis and planning [11].
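The star-topology aggregation described above can be sketched in a few lines. The sensor node and coordinator classes below are hypothetical stand-ins for the WBAN nodes and coordinator, not the actual node firmware:

```python
class SensorNode:
    """One WBAN node: a named sensor with a read() callback."""
    def __init__(self, name, read_fn):
        self.name = name
        self.read_fn = read_fn

    def read(self):
        return self.read_fn()

class Coordinator:
    """Star-topology hub: polls every node and aggregates one packet
    for onward transmission to the central control station."""
    def __init__(self, nodes):
        self.nodes = nodes

    def aggregate(self, soldier_id):
        packet = {"id": soldier_id}
        for node in self.nodes:
            packet[node.name] = node.read()
        return packet

# Hypothetical readings standing in for real sensor drivers
nodes = [SensorNode("pulse", lambda: 72),
         SensorNode("temperature", lambda: 36.6),
         SensorNode("gps", lambda: (21.17, 72.83))]
print(Coordinator(nodes).aggregate("S-01"))
```

The single aggregated packet per soldier is what keeps the central station's analysis and planning simple: it sees one record per node per polling cycle rather than a stream from every individual sensor.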
The work done in this area has so far had several constraints, which are addressed in
the presented work. One previously published paper presented a system which
used a GSM module to transfer the sensor data to the control station [12]. However,
the system given in the presented paper uses a LoRa module, which is less power-
consuming than GSM. Furthermore, GSM signals are not available in many places,
such as national borders and mountain regions. Another work used a K-means
clustering algorithm to determine the condition of the soldier [13]. The use of a
machine learning algorithm requires more computation power and more time for
analyzing the data. The system presented in this paper determines the condition of
the soldier from the real-time data read from multiple sensors, which keeps the
determining algorithm simple. One more published paper used a WiFi module to
transmit the data to the base station [14]. The range of a WiFi module is far too
small for the system to work properly, and its power consumption is higher than
that of the LoRa module. Thus, improvements in power consumption and range
have been carried out in the presented paper. Moreover, the system also uses a
motion sensor,
which is the key element of the algorithm; along with it, various other sensors are
used to obtain a detailed analysis of the soldier's health in real time.
The block diagram in Fig. 1 has been conceptualized using sensors for measurement
of casualty, physiological health monitoring, geographical area indication, and detec-
tion of environmental condition. The processing circuitry and transmission section
form the other part of the system.
The body movement detection sensor is either an inertial measurement unit (IMU)
or a gyroscopic accelerometer with suitable degrees of freedom. It would be trans-
mitting signals as per the movement of the human body. When the soldier does not
move due to a casualty or injury, this sensor would not transmit any signal to the
processing circuitry. The non-receipt of the signal from the sensor can be treated as
an indication of a casualty. The vibration detection sensor indicates whether a soldier
has sustained a bullet wound: if shot, he would experience a vibration wave above a
particular threshold intensity as a result of the gunshot. The same can be established
on receiving a signal from the shock and vibration sensor. Information about the
environment and of surrounding areas like the light intensity, temperature, humidity,
and direction is provided by the respective sensors. In case the soldier intends to
transmit a code during the operation, this can be done via the keypad provided. If the
GPS coordinates and identification codes of all the soldiers of a particular group are
transmitted to the base station, it can be considered an indication of a biological
warfare attack.
The data generated by the sensors as per the conditions are fed to the processing
circuitry. The inputs from the GPS chip of the particular node transmitting would be
multiplexed with the identity code of the soldier. The processed signals are converted
into RF signals of the suitable frequency of IEEE communication standards. Thus,
the generated RF signal is a multiplexed signal of GPS coordinates as well as the
soldier identity. The system would also consist of a receiver along with an alarm and
indicator. It would first alert the other nodes in case of any casualty and indicate the
call sign of the soldier who is shot or hurt. The details of the casualty along with the coordinates
will also be transmitted to the base station for any further actions. Figure 2 depicts
the flowchart of the working of the proposed model.
The overall goal of the system is to determine the physical health of the soldier and
the environmental conditions of the soldier's surroundings. An ECG sensor, an
accelerometer, and an SpO2 sensor are used to determine the physical health of the
soldier. Along with that, a temperature sensor, humidity sensor, air quality sensor,
and GPS sensor are used to determine the environmental conditions of the
surroundings and the location of the soldier. Finally, a LoRa module is used to
communicate with the base station. A sensor which can simultaneously measure
motion parameters such as body movement and
the sensors are selected based on the requirements of the accuracy. Figure 3 depicts
the circuit diagram of the system.
The development, modeling, and analysis of the system are split into two categories,
namely the algorithm and the results. The components are first programmed as per
the designed algorithm. The sensors have been individually interfaced and the
results have been analyzed. A graphical user interface has been created using the
concept of threading in Python. The user interface has been designed in the Qt
Designer editor, and the real-time data of the system is displayed in the GUI module
created there. The condition of the soldier along with the parameter values of the
sensors is displayed in the GUI, which is assumed to be located at the central control
station.
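The threading idea behind the GUI, a background thread reading sensors while the display loop stays responsive, can be sketched with the standard library alone (the Qt widgets of the actual GUI are omitted; the queue-based hand-off and the sample values are assumptions of this sketch):

```python
import queue
import threading

readings = queue.Queue()
SENTINEL = None  # marks the end of the reading stream

def sensor_worker(samples):
    """Background thread: pushes sensor readings to the display thread."""
    for s in samples:
        readings.put(s)
    readings.put(SENTINEL)

# Hypothetical readings standing in for live sensor data
samples = [{"pulse": 72, "temp": 36.6}, {"pulse": 70, "temp": 36.7}]
t = threading.Thread(target=sensor_worker, args=(samples,))
t.start()

# "GUI" loop: consume readings without blocking sensor acquisition
shown = []
while True:
    item = readings.get()
    if item is SENTINEL:
        break
    shown.append(item)
t.join()
print(shown)
```

In the real system, the consumer side would update Qt widgets instead of appending to a list, but the producer/consumer split is the same.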
5.1 Algorithm
The steps involved in the development of the system are as in the algorithm shown
in Fig. 4. In the first step, each of the sensors is initialized with its respective
requirements. Thereafter, three messages are defined: the soldier is healthy, the
soldier is wounded, and the soldier is dead. The sensors of the system are set up,
and a loop runs which iteratively takes the readings from the sensors. After
measuring the readings, the required calculation is done to determine the condition
of the soldier using the principal logic. The data is then transmitted to the LoRa
module, which in turn is received at the control station, and the data is displayed
on the GUI. In addition, Table 1 shows the ranges of all the parameters used to
determine the condition of any particular soldier.
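The principal logic, using the threshold ranges of Table 1, can be sketched as a simple rule-based classifier (Python; the threshold values come from Table 1, while the function name, field layout, and the 10 s "significant time" value are illustrative assumptions):

```python
def classify_soldier(acceleration, movement, pulse_rate, zero_duration_s,
                     significant_time_s=10):
    """Rule-based condition check following Table 1.
    acceleration/movement: per-axis readings from the MPU 6050;
    pulse_rate: BPM from the MAX 30105;
    zero_duration_s: how long all motion values have read zero."""
    all_motion_zero = all(v == 0 for v in acceleration + movement)
    if all_motion_zero and pulse_rate == 0 and zero_duration_s >= significant_time_s:
        return "soldier is dead"
    if any(v == 0 for v in acceleration + movement) or pulse_rate < 45:
        return "soldier requires medical help"
    if 45 <= pulse_rate <= 120:
        return "soldier is healthy"
    return "soldier requires medical help"  # out-of-range pulse treated as abnormal

print(classify_soldier([1.2, 0.8, 9.8], [1, 1, 1], 72, 0))  # healthy case
print(classify_soldier([0, 0, 0], [0, 0, 0], 0, 30))        # dead case
```

Because the rules are simple threshold comparisons, the check runs on every loop iteration with negligible cost, which is the advantage the paper claims over the K-means approach of [13].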
5.2 Results
Table 1 The condition of the soldier and the corresponding range of parameters

Condition: Soldier is healthy
  Simulation: measurement under normal conditions of motion and pulse rate
  Motion sensor (MPU 6050): Acceleration X/Y/Z: 1–16; Pitch/Roll/Yaw: 1–255; Movement X/Y/Z: 00/01; Angle X/Y/Z: 0–360°
  Pulse sensor (MAX 30105): pulse rate between 45 and 120 (IR value from 10500 to 10600)
  ECG sensor (AD 8232): heart rate between 45 and 120, resulting in a PR interval from 0.12 to 0.20 s and a QRS interval from 0.06 to 0.1 s

Condition: Soldier requires medical help
  Simulation: condition is simulated by showing no motion in the body or a low pulse rate through controlled respiration
  Motion sensor (MPU 6050): soldier assumed to be injured if any of the acceleration and movement values read zero
  Pulse sensor (MAX 30105): pulse rate less than 45 (IR value less than 10500)
  ECG sensor (AD 8232): heart rate less than 45; PR interval less than 0.1147 s; QRS interval less than 0.0585 s

Condition: Soldier is dead
  Simulation: condition is simulated with no motion in the body and controlled respiration
  Motion sensor (MPU 6050): soldier assumed to be dead if all values read zero for a significant amount of time
  Pulse sensor (MAX 30105): pulse rate measured to be zero
  ECG sensor (AD 8232): heart rate equal to 0, resulting in no difference in the period of the PR and QRST intervals
6 Conclusion
Fig. 5 a Output when soldier is healthy, b Output when soldier is healthy, c Output when soldier
is wounded
References
1. Lara OD, Labrador MA (2012) A survey on human activity recognition using wearable
sensors. IEEE Commun Surv Tutorials
2. Jacobson et al (2001) System for remote monitoring of personnel. United States patent number
US 6198394 B
3. Habibi S, Cooper SJ, Stauffer J-M, Dutoit B (2008) Gun hard inertial measurement unit based
on MEMS capacitive accelerometer and rate sensor. In: 2008 IEEE/ION position, location and
navigation symposium, IEEE, pp 232–237
4. Nguyen KD, Chen I-M, Luo Z, Yeo SH, Duh BLR (2010) A wearable sensing system
for tracking and monitoring of functional arm movement. IEEE/ASME Trans Mechatron
16(2):213–220
5. Johnston W, Mendelson Y (2005) Extracting heart rate variability from a wearable reflectance
pulse oximeter. In: Proceedings of the IEEE 31st annual northeast bioengineering conference,
2005, IEEE, pp 157–158
6. Johnston WS, Mendelson Y (2004) Extracting breathing rate information from a wearable
reflectance pulse oximeter sensor. In: The 26th annual international conference of the IEEE
engineering in medicine and biology society, vol 2, IEEE, pp 5388–5391
7. Task Force of the European Society of Cardiology and the North American Society of Pacing
and Electrophysiology. Heart rate variability: standards of measurement, physiological
interpretation, and clinical use
8. Montgomery RR, Anderson YL (2016) Battlefield medical network: biosensors in a tactical
environment. Naval Postgraduate School, Monterey, United States
9. Movassaghi S, Abolhasan M, Lipman J, Smith D, Jamalipour A (2014) Wireless body area
networks: a survey. IEEE Commun Surv Tutorials 16(3)
10. Movassaghi S, Abolhasan M, Lipman J, Smith D, Jamalipour A (2014) Wireless body area
networks: a survey. IEEE Commun Surv Tutorials 16(3):1658–1686
11. Arai H (2016) Smart suit for body area network. IEEE Asia Pacific Conference on Applied
Electromagnetics
12. Shweta et al (2015) Soldier tracking and health monitoring systems. International Journal of
Soft Computing and Artificial Intelligence 3(1)
13. Aashay et al (2018) IoT-based healthcare monitoring system for war soldiers using machine
learning. In: International conference on robotics and smart manufacturing (RoSMa2018)
14. Pawar P, Desai A (2018) Soldier position tracking and health monitoring system: a review.
International Journal of Innovative Research in Computer and Communication Engineering
6(3)
15. PhysioNet, the research resource for complex physiological signals. https://archive.physionet.org/
A 3–7 GHz CMOS Power Amplifier
Design for Ultra-Wide-Band Applications
1 Introduction
[6, 7], and mixers [8] in a UWB transceiver. While designing a PA on the UWB
transmitter side, it must meet some stringent requirements, such as achieving
broadband input and output matching, high power gain, reasonable efficiency,
and low power consumption, since the PA is the power-hungry block on the UWB
transmitter side.
In the literature, several implementations of wideband amplifiers have been
reported, such as the resistive shunt feedback topology [9], the RLC matching
topology [10–13], and the distributed amplifier [14]. Among them, distributed
amplifiers can provide good matching and linearity over a wide band of frequencies,
but they dissipate more power and occupy more chip area. The RLC matching
technique can provide both wideband operation and low power, but it needs a number
of reactive elements to form the wideband response and thus occupies a large chip
area, and its layout design is also complicated. In contrast, the resistive shunt
feedback technique provides not only wideband matching at both the input and
output ports but also the added advantage of stability, while occupying less chip
area [9]. The proposed PA thus uses resistive shunt feedback in its second stage.
The UWB LNAs reported previously [15, 16] proposed the Current-Reused (CR)
technique to enhance the gain of an amplifier at the upper end of the desired band-
width. The same CR technique has been incorporated in the UWB PA design in the
present work to achieve the required gain over the 3–7 GHz bandwidth. Further,
the PA designs reported previously could achieve a minimum chip area of only about
1 mm2, which needs to be reduced further to cut the cost of the overall transmitter
design. Phase linearity is another important performance parameter that needs
special attention in PA designs. The PA designs reported so far could achieve a
lowest gain ripple of about ±0.6 dB [17] within the entire bandwidth, and the ripple
in the desired operating frequency band should be reduced below ±0.6 dB in a UWB
PA design. There is a trade-off between efficiency, linearity, and the low gain ripple
requirement in designing a UWB PA.
For the proposed PA design, this trade-off is resolved, with moderate efficiency,
good linearity, and low gain ripple all fulfilled simultaneously, without using any
special techniques for gain flatness or linearity. In [15, 16], the principle of
Current-Reused (CR) operation is used for gain flatness and matching, while stagger
tuning is used in [18] for the same purpose. For the proposed PA design, instead of
using the special techniques of [15, 16, 18], effort has been put into designing
optimum R, L, and C components to achieve all the above parameters simultaneously,
which makes the proposed design simpler than previously reported UWB PA
designs [17–20]. As a result, the proposed PA design exhibits good linearity, a low
gain ripple of about ±0.3 dB, and moderate efficiency simultaneously.
2 Circuit Design
As a novelty, the proposed PA design exhibits the simplest design topology that
fulfills the requirements of low gain ripple, good linearity, and small group delay
variation simultaneously. Murad et al. [17, 18], as depicted in Fig. 1a, used a
Current-Reused (CR) cascoded Common Source (CS) structure as the first stage to
boost the gain, since the input signal gets amplified twice with this technique [18].
An inter-stage inductor is also used to further extend the broadband operation toward
the upper frequency of 7 GHz. Figure 1b shows the proposed 3–7 GHz UWB PA,
which uses fewer lumped components for simplicity; those components have been
carefully designed and optimized to achieve performance improvement over [17, 18]
in low gain ripple, good linearity, and efficiency simultaneously. The proposed
power amplifier employs a cascode topology with inductive shunt peaking in the
first stage and a Common Source (CS) stage with resistive feedback as the second
stage. The feedback resistance R3 has been designed carefully to obtain resistive
shunt feedback, wideband output matching, and biasing of transistor M3. At the input
stage, resistive shunt feedback through R1 is used to achieve good input impedance
matching over the required bandwidth. This feedback resistance can provide excellent
input matching when a small value of R1 is selected; however, the gain of the amplifier
then drops due to the significant signal fed back through this path. A large value of
R1 provides good gain but reduces the effective feedback.
Assuming a current of 10 mA to be drawn by transistor M1, the calculated size
for NMOS transistor M1 is approximately 105 µm under saturation:

ID1 = (1/2) μn Cox (W1/L) (VGS1 − VT)^2    (1)
Fig. 1 a Schematic of the 3–7 GHz CMOS UWB PA proposed by [18] (area is 0.88 mm2 ). The
components shown dotted in (a) are eliminated from the proposed PA design. b Modified schematic
for 3–7 GHz CMOS UWB PA proposed in the present work (area is 0.75 mm2 )
238 V. P. Bhale and U. Dalal
Table 1 Component values and device sizes for the proposed UWB PA and the PA reported in [18]

Proposed UWB PA: L: 320 pH to 4.12 nH; C: 360 fF to 2.112 pF; R: 560 Ω to 10.4 kΩ; M1 (W/L): 105 µm/0.18 µm; M2 (W/L): 105 µm/0.18 µm; M3 (W/L): 105 µm/0.18 µm
PA proposed in [18]: L: 0.4 pH to 2.2 nH; C: 0.4 pF to 1.6 pF; R: 0.9 kΩ to 2.4 kΩ; M1 (W/L): 512 µm/0.18 µm; M2 (W/L): 96 µm/0.18 µm; M3 (W/L): 128 µm/0.18 µm
where μn = 327.4 cm2/Vs, Cox = 8.42 × 10−3 pF/µm2, and the threshold voltage
VT = 0.45 V for a typical 0.18 µm silicon CMOS process. In general, a large transistor
size for M1 is needed to provide high gain and output power at high frequency. The
sizes of transistors M2 and M3 have been designed with the same mathematical
approach. Hence, in this design, an optimum value of the feedback resistance R1 is
chosen to meet the input matching and gain flatness over the frequency band of
interest. Capacitor C1 is a DC block, and inductors L1 and L2 together with capacitor
C2 form part of the input T-matching network. Transistors M1, M2, and M3 operate
in the saturation region, and transistor M1 is biased in class AB to achieve sufficient
linearity and efficiency with a size of 105 µm/0.18 µm. The shunt inductive peaking
with Ld1 and Ld2 in the proposed PA is needed to compensate for the bandwidth
limitation and to increase the gain flatness while keeping group delay variations small.
As an improvement over the design proposed by Murad et al. [17, 18], more effort
has been put into the careful design of the RF components Ld1 and Ld2, the feedback
resistances R1 and R3, and the input T-matching network, to achieve more robustness
and performance improvement compared to the previously reported power amplifier
designs [17, 18]. As a result, the proposed PA design exhibits good linearity, the
lowest gain ripple of only about ±0.3 dB, low power, and a small chip area, all
simultaneously. Capacitor C3 is added to improve the broadband 50 Ω output
impedance matching over the entire bandwidth, and Ld2 is tuned with capacitor C4
for output matching.
Table 1 shows the mathematically obtained optimized values for the proposed PA
design and the circuit parameters reported by Murad et al. [18].
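The sizing in Eq. (1) can be checked numerically. The sketch below solves the square-law equation for W1 given ID1 = 10 mA and the process constants quoted above, with the overdrive voltage VGS1 − VT ≈ 0.35 V taken as an assumption (it is not stated in the text):

```python
def nmos_width_um(i_d, l_um=0.18, mu_n_cm2=327.4, cox_pf_um2=8.42e-3,
                  v_ov=0.35):
    """Solve I_D = 0.5 * mu_n * Cox * (W/L) * V_ov**2 for W (in um).
    Units are converted to SI before solving."""
    mu_n = mu_n_cm2 * 1e-4            # cm^2/Vs -> m^2/Vs
    cox = cox_pf_um2 * 1e-12 / 1e-12  # pF/um^2 -> F/m^2 (same numeric value)
    k = mu_n * cox                    # process transconductance, A/V^2
    w_over_l = 2 * i_d / (k * v_ov ** 2)
    return w_over_l * l_um

print(round(nmos_width_um(10e-3), 1))  # close to the 105 um quoted in the text
```

With these constants the result lands within a few percent of the 105 µm width stated in the paper, which supports the assumed overdrive of roughly 0.35 V.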
3 Simulation Results
The circuit design of the UWB PA is simulated using the Cadence Spectre RF
simulator in the UMC 0.18 µm CMOS process. Figure 2a shows the effect of
optimizing the value of the peaking inductor to achieve a low gain ripple. In the
figure, the lowest gain ripple is obtained by optimizing the value of Ld1 within the
range of 3–4.4 nH. The optimum Ld1 is chosen to be 3.4 nH for low ripple over
3–7 GHz, while the size of transistor M1 is designed for a low Noise Figure (NF).
Fig. 2 a Simulation results for effect of Ld1 on Gain. b Simulation results for effect of size of M1
on NF
In Fig. 2b, the lowest NF is obtained by optimizing the width of M1 from 50 to
200 µm; the optimum size is chosen to be 100 µm for low NF. The pre- and
post-layout simulation results for gain are in good agreement with each other, with
excellent gain flatness of 19 ± 0.3 dB, as can be seen in Fig. 3. In the present work,
gain flatness is achieved without using special gain flatness improvement techniques
such as Current Reuse (CR) and stagger tuning, as mentioned in [17–19].
Figure 4 shows the simulated comparison of the gain obtained by the proposed PA
design and the previously reported PA [17] with the CR technique for the 3.1–4.8
GHz band. The proposed UWB PA shows a very low gain ripple of ±0.3 dB for the
3–7 GHz band. The simulated pre-layout and post-layout input and output return
losses, S11 and S22, are plotted in Fig. 5.
Fig. 3 Simulated gain versus frequency (pre-layout and post-layout) for proposed 3–7 GHz UWB
PA
Fig. 4 Comparison of
simulated gain performance
for proposed 3–7 GHz PA
and previously reported PA
The return loss at the input and output is below −7.5 dB and −6 dB, respectively,
within the required bandwidth. It is also observed from Fig. 5 that the matching
performance degrades at the upper end of the desired band (7 GHz), so that good
output matching is obtained only over 3–5.5 GHz. This may be due to the trade-offs
involved between excellent gain flatness and perfect matching for wideband
applications. The input and output matching can be further improved using off-chip
matching components during implementation, if required.
Figure 7 shows the effect of optimizing the value of Ld1, within the range of
3–4.4 nH, on phase linearity. Good phase linearity is achieved, i.e., the group
delay variation is within ±200 ps across the whole band. The DC supply voltage
directly affects the Power Added Efficiency (PAE) of the PA, as can be seen from
Fig. 6a. It is observed that as VDD is increased through 1, 1.5, and 2 V, the
efficiency improves over the 3–7 GHz band. For the UMC 0.18-µm CMOS process,
an optimum VDD of 1.8 V has been chosen, and accordingly a PAE of 29% is
obtained at −5 dBm input power, as shown in Fig. 6b.
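The PAE figures quoted above follow the standard definition PAE = (Pout − Pin)/PDC. A minimal sketch of this calculation, using illustrative numbers rather than the simulated values of this design:

```python
def dbm_to_mw(p_dbm: float) -> float:
    """Convert power in dBm to milliwatts."""
    return 10 ** (p_dbm / 10)

def pae_percent(p_out_dbm: float, p_in_dbm: float, p_dc_mw: float) -> float:
    """Power-added efficiency (Pout - Pin) / Pdc, in percent."""
    return 100 * (dbm_to_mw(p_out_dbm) - dbm_to_mw(p_in_dbm)) / p_dc_mw

# Illustrative values only: 10 dBm output, 0 dBm input, 30 mW DC consumption
print(pae_percent(10, 0, 30))  # → 30.0
```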
The pre- and post-layout simulation results for the Noise Figure (NF) are in good
agreement with each other, as can be seen in Fig. 8. Good linearity is observed from
Fig. 9a, b, with input and output 1-dB compression points of −3.495 dBm and 8.2
dBm, respectively, at 5 GHz. The input and output third-order intermodulation point
obtained is 17.339 dBm, from Fig. 9c. Stability analysis shows that Kf > 1 and Bf <
1 across the 3–7 GHz band, indicating that the designed PA is unconditionally
stable, as shown in Fig. 10. The layout of the proposed PA is symmetric, as
shown in Fig. 11. All signal paths are routed in top-level metal 6, while
metal 1 is used for the ground plane. DC pads are used for biasing and supply
voltages. RF MIM capacitors and spiral inductors are used owing to their low losses.
The spacing between two metal wires, and between metal 6 and a component, is kept
at 0.8 µm.
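The unconditional-stability criterion used above (Kf > 1 together with a determinant condition) can be checked directly from two-port S-parameters via the Rollett stability factor. A sketch with made-up S-parameter values, not the simulated data of this design:

```python
def rollett_k(s11: complex, s12: complex, s21: complex, s22: complex):
    """Rollett stability factor K and determinant magnitude |Delta|.

    A two-port is unconditionally stable when K > 1 and |Delta| < 1.
    """
    delta = s11 * s22 - s12 * s21
    k = (1 - abs(s11) ** 2 - abs(s22) ** 2 + abs(delta) ** 2) / (2 * abs(s12 * s21))
    return k, abs(delta)

# Made-up S-parameters for illustration only
k, mag_delta = rollett_k(0.1, 0.05, 3.0, 0.1)
print(k > 1 and mag_delta < 1)  # → True (unconditionally stable)
```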
The dimension of via’s is 0.28 µm × 0. 28 µm. All pin contacts are made with
metal 6. For inductor contact with other components an additional metal plate of
dimension 16.1 µm × 3. 5 µm is used. Table 2 shows the simulated summary and
performance comparison with other literature. As can be seen that the proposed UWB
PA has obtained very low gain ripple of about ±0.3 dB, average group delay, good
input and output matching, unconditional stability, and small chip area as compared
to other works reported on 3–7 GHz band. However, this PA suffers from more power
consumption to achieve wide bandwidth and good linearity simultaneously.
Fig. 6 a Variation of power added efficiency (PAE) with respect to supply voltage (VDD ) = 1, 1.6
and 2 V. b PAE for proposed PA design
4 Conclusion
For the designed 3–7 GHz UWB power amplifier, the pre-layout and post-layout
simulation results are found to be in good agreement with each other. By employing
two power amplifier stages, resistive shunt feedback, and inductive shunt peaking,
the proposed design achieves a 20% reduction in gain ripple and a 6.37% reduction in
layout area, while simultaneously achieving good linearity, small group delay
variation, and unconditional stability compared with previously reported power
amplifier designs, as shown in Tables 1 and 2, across the entire band of interest.
Fig. 9 Linearity analysis for proposed 3–7 GHz PA. a Input referred 1-dB compression point.
b Output referred 1-dB compression point. c Third-order intermodulation point
Fig. 11 Layout of proposed low gain ripple 3–7 GHz UWB PA including RF pads (size: 0.88 mm
× 0.852 mm = 0.749 mm²)
Table 2 Comparison of wideband CMOS power amplifier performances: published and present
work
Parameter | [17] | [18] | [19] | [20] | Proposed PA design
Bandwidth (GHz) | 3.1–4.8 | 3–7 | 3.1–10.6 | 3.1–10.6 | 3–7
Technology (µm) | 0.18 | 0.18 | 0.18 | 0.18 | 0.18
Supply voltage VDD (V) | 1 | 1.8 | 1.8 | 1.8 | 1.8
Input matching S11 (dB) | <−5 | <−6 | <−10 | <−10 | <−7.5
Output matching S22 (dB) | <−5 | <−7 | <−10 | <−14 | <−6
Gain (dB) | 18.4 ± 1 | 14.5 ± 0.5 | 10.46 ± 0.8 | 11.48 ± 0.6 | 19 ± 0.3
Input P1dB (dBm) | −10.6 @ 4 GHz | N/A | N/A | N/A | −3.5 @ 5 GHz
Output P1dB (dBm) | 6.0 @ 4 GHz | 7 | 5.6 | 5 | 8.2 @ 5 GHz
PAE (%) | 18 @ 4 GHz | N/A | N/A | N/A | 29 @ 5 GHz
Power (mW) | 22 | 24 | 84 | 100 | 28
Group delay (ps) | N/A | ±178.5 | ± | ±85.8 | ±200
Stability | N/A | | | | Unconditionally stable
Area (mm²) | 0.97 | 0.88 | 1.76 | 0.69 | 0.749
Remarks | Cascode with CR technique and additional CS stage to achieve high power gain | Two-stage cascode CR PA | Cascode CS with stagger tuning for broadband | Three-stage cascode CS | Two-stage cascoded PA with CS as second stage
CR—current reused, CS—common source, N/A—not mentioned
References
1. Makarov DG, Krizhanovskii VV, Shu C, Krizhanovskii VG (2008) CMOS 0.18-µm integrated
power amplifier for UWB systems. In: 4th international conference on ultra wide band and
ultra short impulse signals, Sevastopol, Ukraine, Sept 2008, pp 153–155
2. Blaakmeer SC, Klumperink EAM, Leenaerts DMW, Nauta B (2007) An inductorless wide-
band balun-LNA in 65 nm CMOS with balanced output. In: 33rd European solid state circuit
conference, Munich, 11–13 Sept 2007, pp 364–367
3. Blaakmeer SC, Klumperink EAM, Leenaerts DMW, Nauta B (2007) Wideband balun-LNA
with simultaneous output balancing, noise-cancelling and distortion-canceling. In: IEEE
international solid state circuit conference, pp 1341–1350
4. Tian MS, Mikklensen JH, Larsen OJT (2011) Design and implementation of a 1–5 GHz low
noise amplifier in 0.18 µm CMOS. Analog Integr Circuits Signal Process 67(1):41–48
5. Duan J-H, Han X, Li S (2009) A wideband CMOS LNA for 3–5 GHz UWB systems. In: IEEE
conference
6. Yoo S, Kim JJ, Choi J (2013) A 2–8 GHz wideband dually frequency-tuned ring VCO with a
scalable KVCO . IEEE Microwave Wirel Compon Lett 1–2
7. Kuan-Chung Lu, Wang F-K, Horng T-S (2013) Ultralow phase noise and wideband CMOS
VCO using symmetrical body-bias PMOS varactors. IEEE Microwave Wirel Compon Lett
23(2):90–92
8. Yoon J, Kim H, Park C, Yang J, Song H, Lee S, Kim B (2008) A new RF CMOS Gilbert mixer
with improved noise figure and linearity. IEEE Trans Microwave Theory Tech 56(3)
9. Kim CW, Kahn MS, Anh PT, Kim HT, Lee SG (2005) An ultra wideband CMOS low noise
amplifier for 3–5 GHz UWB system. IEEE J Solid State Circuits 40(2):544–547
10. Jose S, Lee H-J, Ha D, Choi SS (2005) A low-power CMOS power amplifier for ultra-wideband
(UWB) applications. In: Proceedings of the IEEE international symposium on circuits and
systems conference, pp 5111–5114
11. Han CH, Zhi WW, Gin KM (2005) A low power CMOS full-band UWB power amplifier using
wideband RLC matching method. In: Proceedings of the IEEE electron devices and solid-state
circuit conference, pp 223–236
12. Lu C, Pham A-V, Shaw M (2006) A CMOS power amplifier for full-band UWB transmitters.
In: Proceedings of the California, IEEE RFIC symposium, pp 397–400
13. Wang R-L, Su Y-K, Liu C-H (2006) 3–5 GHz cascoded UWB power amplifier. In: Proceedings
of the IEEE Asia Pacific conference on circuits and systems, pp 367–369
14. Grewing C, Winterberg K, Waasen SV (2004) Fully integrated distributed power amplifier in
CMOS technology, optimized for UWB transmitter. In: IEEE RF IC symposium, Aug 2004,
pp 87–88
15. Huang ZD, Lin ZM, Lai HC (2007) A high gain low noise amplifier with current-reused
technique for UWB applications. In: IEEE conference on electron devices and solid-state
circuits, Tainan, 20–22 Dec 2007, pp 977–980
16. Lin Y-J, Hsu SSH, Jin J-D, Chan CY (2007) A 3.1–10.6 GHz ultra-wideband CMOS low noise
amplifier with current-reused technique. IEEE Microwave Wirel Compon Lett 17(3): 232–234
17. Murad SAZ, Pokharel RK, Kanaya H, Yoshida K (2009) A 3.1–4.8 GHz CMOS power
amplifier using current reused technique. In: IEEE 5th international conference on wireless
communications, networking and mobile computing (WiCOM), 24–26 Sept 2009, Beijing,
China
18. Murad SAZ, Pokharel RK, Galal AIA, Sapawi R, Kanaya H, Yoshida K (2010) An excellent
gain flatness 3.0–7.0 GHz CMOS PA for UWB applications. IEEE Microwave Wirel Compon
Lett 20(9)
19. Chen C-Z, Lee J-H, Chen C-C, Lin Y-S (2007) An excellent phase-linearity 3.1-10.6 GHz
CMOS UWB LNA uses standard 0.18 µm CMOS technology. IEEE Microwave Wirel Compon
Lett 17(3):232–234
20. Sapawi R, Pokharel RK, Murad SAZ, Anand A, Koirala N, Yoshida K (2012) Low group delay
3.1–10.6 GHz CMOS power amplifier for UWB applications. IEEE Microwave Wirel Compon
Lett 22(1)
An Approach to Detect and Prevent
Distributed Denial of Service Attacks
Using Blockchain Technology in Cloud
Environment
1 Introduction
and the resources. A DDoS attack on the cloud is executed by creating malicious
traffic that exhausts resources through a large group of agents called a botnet, as
discussed in [1]. In a cloud environment, a DDoS attack can also significantly degrade
the working ability of a cloud service by disrupting the virtual servers.
As per the author of [2], blockchain is a revolutionary technology that nowadays
has significant applications in the financial sector, healthcare and government. It has
become one of the most promising technologies for cyber security. A blockchain
stores data in a distributed manner, as a continuously growing list of records, known
as blocks, that are secured against unauthorized alteration. These blocks contain
cryptographically hashed data, as discussed in [3]. Each hashed block depends on the
previous block, which prevents tampering with data in the blockchain. Blockchain
provides protection of machines and resources, and availability and reliability of
networks and systems.
A smart contract is “a mathematical function written in source code and executed
by computers, used to integrate the tamper-proof mechanism of blockchain,” as
discussed in [4]. A smart contract defines rules and agreements as a traditional
contract does, and it also automatically enforces them. Developers can write a smart
contract containing whatever instructions they need. Smart contracts allow trusted
transactions and agreements to be executed among different, unidentified parties
without the need for a central authority, legal system or external enforcement
mechanism. As blockchain has become popular, smart contracts have received
increasing attention. Owing to these promising features, blockchain can ensure
enhanced security and privacy in many domains, such as IoT, cloud and data mining,
as discussed in [5].
There are many techniques that can be adopted to detect and stop DDoS attacks
in the cloud environment, such as classification techniques, encryption techniques,
machine-learning-based techniques and entropy-based systems, as discussed in [6].
Cloud computing needs more secure and efficient methods to detect DDoS attacks
and protect cloud resources from them. Therefore, this work proposes a blockchain
technology-based system that detects and prevents DDoS attacks in a cloud
environment.
This paper is organized as follows: In Sect. 2, a detailed summary of recent
research on detecting and stopping DDoS attacks on the cloud is given. In Sect. 3,
we show the general scenario of a DDoS attack in the cloud. Sect. 4 explains the
proposed work with blockchain technology, and we conclude with future directions
in Sect. 5.
2 Related Works
In recent times, many researchers have been working on the detection and preven-
tion of DDoS attacks on the cloud. Early in the history of this growing attack
mechanism, floods of internet control message protocol (ICMP) packets that
congested the network for legitimate users were identified as DDoS attacks, as
discussed in [7]. A consortium of network researchers first reported a DDoS
attack in June 1998.
In early work on DDoS detection and prevention in the cloud, the research in [7]
proposed a fuzzy logic-based mechanism for detecting DDoS attacks. The system
uses three parameters, the entropy of the source IP, the port address and the packet
arrival rate, and analyzes the behavior of incoming packets with fuzzy IF–THEN
rules over these parameters. It is a cost-effective, reliable and simple method for
the cloud system.
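The entropy-of-source-IP feature used in [7] is typically the Shannon entropy of the source-address distribution within a traffic window; a flood dominated by a few sources skews the distribution and lowers the entropy. A minimal illustration with hypothetical traffic windows (the exact feature extraction in [7] may differ):

```python
from collections import Counter
from math import log2

def source_ip_entropy(packet_sources):
    """Shannon entropy of the source-IP distribution in one traffic window."""
    counts = Counter(packet_sources)
    total = len(packet_sources)
    return -sum((c / total) * log2(c / total) for c in counts.values())

normal = ["10.0.0.%d" % i for i in range(16)]   # 16 distinct sources
flood = ["10.0.0.1"] * 15 + ["10.0.0.2"]        # one source dominates
print(source_ip_entropy(normal))                 # → 4.0
print(source_ip_entropy(flood) < source_ip_entropy(normal))  # → True
```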
As discussed in [8], the authors proposed a system that uses a practical dynamic
resource allocation mechanism to counter DDoS attacks. They use an intrusion
prevention system (IPS) to monitor incoming packets, and the system automatically
and dynamically allocates extra resources from the available cloud resources. The
IPS works simultaneously to filter out attack packets while a host cloud server is
under DDoS attack. When the volume of attack packets subsides, the additional
resources are released back to the available cloud resource pool.
A machine-learning-based DDoS attack detection system, discussed in [9], is
proposed to prevent attacks on the source side in the cloud. The authors analyze
the statistical features of DDoS attacks. They developed a proof-of-concept model,
tested it in a real cloud, and evaluated and compared nine different supervised and
unsupervised machine learning algorithms.
To detect and stop DDoS TCP flood attacks in the public cloud, a new classifier
system, CS_DDoS, is proposed in [10]. It aims to secure stored records by classifying
the incoming packets and making a decision on the basis of the classification results.
In the detection phase, four different classifiers are used to determine whether a
packet is regular or irregular. In the prevention phase, malicious packets are denied
access to the cloud service and their source IPs are added to a blacklist table.
To detect and prevent DDoS attacks, an Updated Snort (Usnort) model is proposed
in [11]. It is based on the Snort tool and detects DDoS in a distributed virtual
environment. Usnort sends an alert message about the intrusion, with all relevant data
associated with the attack, to the network admin, who drops the connection with that
IP for a certain time. This system helps mitigate the effect of DDoS by detecting the
attack at an early stage.
Classification of connecting clients by TCP/IP packet-header features, according
to their OS, is proposed in [12]. It is used to identify the original source of an
incoming packet during a spoofed DDoS attack. The authors deployed an open-
source Xen Cloud Platform (XCP) test-bed and determine the original source of an
attacker by matching the final TTL values from active and passive OS detection.
250 V. Patel et al.
A method combining HTTP GET flood detection with MapReduce processing for
quick attack detection is proposed in [13]. This method performs signature-based
detection with few errors and in less time, and the proposed pattern detection system
is shown to outperform SNORT detection.
An entropy-based system discussed in [14] provides multilevel detection. The
system handles massive packet flows with a multithreaded IDS approach in the
cloud and passes the observed alerts to a monitoring service, which notifies the
cloud user that their system is under attack.
A detailed review of DDoS attacks is given in [15], which classifies them into two
types: bandwidth-based and resource-based attacks. It also surveys detection
techniques, such as anomaly detection, NOMAD, packet selection and filtering with
congestion, D-WARD and MULTOPS, as well as misuse detection, and prevention
techniques such as path-based distributed packet filtering, ingress and egress
filtering and secure overlay services.
The research in [16] states that two main research avenues need to be followed to
counter the attacks. The first addresses intrusions that compromise VMs to launch a
DDoS attack against a target external to the cloud. The second develops defense
tools for the more traditional cloud disruption, where the target of the attack is the
cloud, or part of the cloud, itself. These tools apply to deployments constructed on
a Eucalyptus cloud system.
The method proposed in [17] provides security for web services against DDoS
attacks. This approach protects against HTTP flooding, oversized XML, coercive
parsing, oversized encryption and WS-Addressing spoofing vulnerabilities at the
application layer. An outlier detection technique can be used to effectively filter
out malicious requests.
The honeypot system proposed in [18] enables cloud servers to detect unknown
attacks and prevent DDoS attacks. As prevention, the honeypot network blocks an
attacker's entire IP address once its home IP address is detected. The work also
contrasts traditional systems with the proposed system.
Many techniques have been proposed to detect and avoid DDoS attacks in a cloud
environment, but most of them provide limited accuracy and slow down system
performance. Cloud computing therefore needs an efficient DDoS detection and
prevention mechanism that can detect and prevent attacks accurately.
Figure 1 shows the scenario of a DDoS TCP flood attack, in which an attacker
sends constant requests to the service provider; this can degrade cloud server
performance within a short time and even stop the services completely. Here, the
attack is performed using three attackers and one genuine user. In
this scenario, the genuine user accesses the cloud service to execute a Python
program. A TCP flood attack was performed as the DDoS attack, carried out by
manually sending constant requests to the victim server.
First, the genuine user's job is executed alone and takes 16.66 s to produce the
result, as shown in Fig. 2. An attack was then launched by sending constant requests,
performing a DDoS TCP flood attack on the victim server. As shown in Fig. 2,
under attack the genuine user got the result in 52.89 s, that is, over three times as
long.
4 Experimental Setup
For the implementation scenario, the Linode service discussed in [19] is used to
work in a cloud that provides platform-as-a-service, with a Linux operating system,
4 GB RAM, 2 CPU cores and 80 GB storage. The experiments are performed in the
Python language, with the PuTTY terminal emulator used to connect to the cloud,
while MongoDB is used for data storage in the cloud environment. The packet
analyzer Tcpdump is used to collect incoming packets on the network.
5 Proposed Work
In this section, the proposed security mechanism to detect and stop DDoS attacks
in a cloud environment is described. The proposed system consists of two phases:
(i) Detection Phase
(ii) Prevention Phase.
• The detection system fetches the incoming packets (client requests) within a
timeframe, for example, 1 min.
• The received packets are processed to find whether their sources are blacklisted
as attackers. For this, the system uses a blockchain-based blacklist table, which
contains blacklisted IPs.
• If the source of a packet is already among the blacklisted IPs, the detection phase
sends the packet directly to the prevention phase without further processing.
• If the source of a packet is not blacklisted, the incoming packet is passed through
the smart contract to check whether it is normal or abnormal.
• The smart contract contains rules used to classify abnormal packets, for example,
a predefined threshold.
• If one user makes requests more frequently than the predefined threshold, it is
flagged as an attacker by the system. The threshold can be manually adjusted by
the system admin.
• If a packet is classified as normal, the detection phase sends it to the cloud
service provider. Otherwise, the detection phase updates the blacklist table and
sends the packet to the prevention phase.
• All arriving requests are stored into blocks. A new block is stored in the
blockchain using the consensus algorithm known as proof of work.
• Blocks are hashed to form the chain, and each block is appended to the chain
sequentially.
• The smart contract is also executed as part of the blockchain.
• When a packet reaches the prevention phase, the phase first alerts the system
administrator.
• Then, it adds the attacking source address to the blacklist table, if it is not
already there.
• Finally, the packet is dropped.
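The detection flow in the steps above can be sketched as follows. The names, data shapes and threshold value are illustrative assumptions, not part of the implemented system:

```python
from collections import Counter

THRESHOLD = 10      # max requests per source per timeframe (illustrative)
blacklist = set()   # stands in for the blockchain-backed blacklist table

def detect(window_packets):
    """Classify each source in a timeframe window as forwarded or dropped."""
    decisions = {}
    counts = Counter(pkt["src"] for pkt in window_packets)
    for src, n in counts.items():
        if src in blacklist:            # known attacker: straight to prevention
            decisions[src] = "drop"
        elif n > THRESHOLD:             # smart-contract rule: too many requests
            blacklist.add(src)          # update blacklist, alert admin, drop
            decisions[src] = "drop"
        else:
            decisions[src] = "forward"  # normal traffic goes to the provider
    return decisions

window = [{"src": "1.2.3.4"}] * 25 + [{"src": "5.6.7.8"}] * 3
print(detect(window))  # → {'1.2.3.4': 'drop', '5.6.7.8': 'forward'}
```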
As described earlier, the proposed system can be used to detect and stop DDoS
attacks in a cloud environment using blockchain technology. To study DDoS attacks
in this setting, a blockchain-based cloud is first created. Figure 4 shows three
implementation scenarios in the blockchain-based cloud.
First, the detection system fetches the incoming packets within a time bound, for
example 1 min. All requests are stored in an IP table. If the source of the incoming
packets is blacklisted as an attacker, they are sent directly to the prevention system.
If the source of a packet is not blacklisted, the incoming packet is passed through the
smart contract to decide whether it is normal or abnormal. The smart contract
defines rules such as a predefined threshold; for example, within 1 min the threshold
value is defined as 10. If the incoming requests exceed this predefined limit, the
source is treated as an attacking IP.
The proof of work consensus algorithm of blockchain is used to generate a block
that contains data such as the current hash key, the previous hash key and a nonce.
As a block is generated, it is hashed with the SHA-256 algorithm and added to the
blockchain.
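A minimal sketch of this block-generation step: SHA-256 proof of work with a toy difficulty target. The difficulty and block fields here are assumptions for illustration; the paper does not specify them:

```python
import hashlib
import json

DIFFICULTY = "00"  # toy target: the hash must start with two zero hex digits

def mine_block(prev_hash: str, data: dict) -> dict:
    """Find a nonce so that SHA-256(prev_hash, data, nonce) meets the target."""
    nonce = 0
    while True:
        payload = json.dumps({"prev": prev_hash, "data": data, "nonce": nonce},
                             sort_keys=True).encode()
        digest = hashlib.sha256(payload).hexdigest()
        if digest.startswith(DIFFICULTY):
            return {"prev": prev_hash, "data": data, "nonce": nonce, "hash": digest}
        nonce += 1

genesis = mine_block("0" * 64, {"requests": ["5.6.7.8"]})
print(genesis["hash"].startswith("00"))  # → True
```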
As the blockchain is created, attacking IPs are also identified using the smart
contract. Here the smart contract performs Level 1 detection of the DDoS attack
before traffic enters the blockchain-based cloud. If the smart contract considers an
incoming IP to be an attacking IP, the system sends its packets to the prevention
system. If the incoming request is considered to come from a genuine user, that user
is given a portion of the blockchain and the services. As attacking IPs are detected
by the smart contract, the prevention phase blocks them.
If a genuine user tries to perform a DDoS attack after becoming part of the
blockchain, Level 2 detection is performed using the smart contract. The smart
contract filters out the attacker using the predefined-threshold rule within a
timeframe; for example, if any genuine user makes more than 30 requests in 2 min,
the user is considered an attacker by the smart contract.
In this scenario, if any genuine user tries to alter the data of a block or to increase
the threshold value, the blockchain does not allow the data to be altered because of
its tamper-proof mechanism. If any user tries to alter a block, the recomputed hash
key differs from the one already stored in the database. The mismatch reveals that
the blockchain has been altered, and the tampered block is rejected because of the
tamper-proof mechanism.
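The tamper check described above amounts to recomputing each block's hash and comparing it with the stored value, along with the prev-hash links. A hypothetical sketch (field names are assumptions):

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """SHA-256 over the block contents, excluding the stored hash itself."""
    body = {k: v for k, v in block.items() if k != "hash"}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def chain_is_intact(chain: list) -> bool:
    """Verify stored hashes and the prev-hash links between consecutive blocks."""
    for i, block in enumerate(chain):
        if block_hash(block) != block["hash"]:
            return False                          # block contents were altered
        if i > 0 and block["prev"] != chain[i - 1]["hash"]:
            return False                          # chain link is broken
    return True

b0 = {"prev": "0" * 64, "data": {"threshold": 10}}
b0["hash"] = block_hash(b0)
b1 = {"prev": b0["hash"], "data": {"requests": 3}}
b1["hash"] = block_hash(b1)
chain = [b0, b1]
print(chain_is_intact(chain))    # → True
b0["data"]["threshold"] = 1000   # a user tries to raise the threshold
print(chain_is_intact(chain))    # → False
```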
7 Performance Analysis
In this section, the performance of the proposed system is evaluated. The accuracy
represents the rate of correctly identified attackers and genuine users in the network.
The accuracy of the proposed system is measured by the following equation:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

where
True positives (TP) are abnormal packets correctly identified as abnormal.
True negatives (TN) are normal packets correctly identified as normal.
False positives (FP) are normal packets incorrectly identified as abnormal.
False negatives (FN) are abnormal packets incorrectly identified as normal.
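As a worked example of the accuracy formula (the counts here are illustrative, not the paper's measured results):

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of packets classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

# Illustrative counts: 46 attacks and 46 normal packets classified correctly,
# 4 false alarms and 4 missed attacks, out of 100 packets
print(accuracy(46, 46, 4, 4))  # → 0.92
```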
Table 1 shows that the proposed system is more accurate, but it requires more
execution time to identify the DDoS attack, because generating the blockchain
takes additional time.
Table 2 shows the evaluation of the CS_DDoS system against the proposed method
with respect to the numbers of genuine users and of attacks performed on the cloud
environment. CS_DDoS is a classifier system used in the cloud environment to detect
and prevent DDoS attacks. The results show that for small numbers of attackers
relative to genuine users, both systems provide 100% accuracy. As the number of
attackers increases (e.g. 6, 8), the accuracy of the CS_DDoS system degrades, while
the proposed system maintains the highest accuracy. When the number of attackers is
further increased (e.g. 10, 12), the proposed system attains higher accuracy than the
CS_DDoS system, at 92 and 93.75%, respectively.
DDoS attacks are quickly becoming the most prevalent type of cyber threat, growing
rapidly in recent years in both number and volume. Many detection techniques are
used against DDoS attacks on the cloud, but these techniques, including CS_DDoS,
have disadvantages: attacks are sometimes not properly detected, and they can slow
down system performance or stop the services completely. The proposed system
shows that a DDoS attack can be properly identified by the detection phase and
blocked by the prevention phase, and that blockchain provides a tamper-proof
mechanism.
The system can be tested and analyzed with a real server and clients on a large
network, and can be evaluated with different threshold values for performance
analysis. The proposed system can also be optimized in terms of execution time.
References
1. Singh N, Hans A, Kumar K, Birdi MPS (2015) Comprehensive study of various techniques for
detecting DDoS attacks in cloud environment. Int J Grid Distrib Comput 8(3):119–126
2. Tama BA, Kweka BJ, Park Y, Rhee K-H (2017) A critical review of blockchain and its current
applications. In: 2017 international conference on electrical engineering and computer science
(ICECOS). IEEE, pp 109–113
3. Zheng Z, Xie S, Dai H, Chen X, Wang H (2017) An overview of blockchain technology:
architecture, consensus, and future trends. In: 2017 IEEE international congress on big data
(BigData congress). IEEE, pp 557–564
4. Cheng J, Lee N, Chi C, Chen Y (2018) Blockchain and smart contract for digital certificate. In:
2018 IEEE international conference on applied system invention (ICASI), Chiba, pp 1046–1051
5. Miraz, MH, Ali M (2018) Applications of blockchain technology beyond cryptocurrency
6. Devi BSK, Subbulakshmi T (2017) DDoS attack detection and mitigation techniques in cloud
computing environment. In: 2017 international conference on intelligent sustainable systems
(ICISS), Palladam, pp 512–517
7. Mondal HS, Hasan MT, Hossain MB, Rahaman ME, Hasan R (2017) Enhancing secure cloud
computing environment by detecting DDoS attack using fuzzy logic. In: 2017 3rd international
conference on electrical information and communication technology (EICT), Khulna, pp 1–4
8. Yu S, Tian Y, Guo S, Wu DO (2014) Can we beat DDoS attacks in clouds? IEEE Trans Parallel
Distrib Syst 25(9):2245–2254
9. He Z, Zhang T, Lee RB (2017) Machine learning based DDoS attack detection from source side
in cloud. In: 2017 IEEE 4th international conference on cyber security and cloud computing
(CSCloud), New York, NY, pp 114–120
10. Sahi A, Lai D, Li Y, Diykh M (2017) An efficient DDoS TCP flood attack detection and
prevention system in a cloud environment. IEEE Access 5:6036–6048
11. Khadka B, Withana C, Alsadoon A, Elchouemi A (2015) Distributed denial of service attack on
cloud: detection and prevention. In: 2015 international conference and workshop on computing
and communication (IEMCON). Vancouver, BC, pp 1–6
12. Osanaiye OA, Dlodlo M (2015) TCP/IP header classification for detecting spoofed DDoS attack
in Cloud environment. In: IEEE EUROCON 2015—International conference on computer as
a tool (EUROCON), Salamanca, pp 1–6
13. Choi J, Choi C, Ko B, Kim P (2014) A method of DDoS attack detection using HTTP packet
pattern and rule engine in cloud computing environment. Soft Comput 18(9):1697–1703
14. Navaz AS, Sangeetha V, Prabhadevi C (2014) Entropy based anomaly detection system to
prevent DDoS attacks in cloud. Soft Comput 18(9):1697–1703
15. Deshmukh RV, Devadkar K (2015) Understanding DDoS attack and Its effect in cloud envi-
ronment. In: 2015 international conference on advances in computing, communication and
control, pp 203–210
16. Carlin A, Hammoudeh M, Aldabbas O (2015) Defence for distributed denial of service attacks
in cloud computing. Procedia Comput Sci 73:490–497
17. Vissers T, Somasundaram TS, Pieters L, Govindarajan K, Hellinckx P (2014) DDoS defense
system for web services in a cloud environment. Future Generat Comput Syst 37(2014):37–45
18. Manoja I, Sk NS, Rani DR (2017) Prevention of DDoS attacks in cloud environment. In: 2017
international conference on big data analytics and computational intelligence (ICBDAC),
Chirala, pp 235–239
19. Linode Service. https://www.linode.com/
Variability Analysis of On-Chip
Interconnect System Using Prospective
Neural Network
Abstract Interconnects are densely laid out as multiple layers in ICs and play a
dominant role in determining system performance. At nanoscale dimensions, vari-
ability and reliability of interconnects become dominant concerns. Variability
analysis can be accomplished using several statistical and mathematical tech-
niques, such as Monte-Carlo, parametric, process corner, ANOVA and rank table
analyses. However, these conventional techniques are becoming obsolete and defi-
cient for determining the severe variability effects in sophisticated, densely packed
and enormously large on-chip interconnects. Moreover, traditional techniques are
tremendously computationally expensive. Hence, in the present paper, a prospec-
tive neural network-based back-propagation technique is employed to capture the
effect of the highly variable parameters present in on-chip interconnects. A back-
propagation neural network (BPNN) is used along with the Levenberg–Marquardt
(LM) algorithm to adjust the weights of the hidden layer in order to minimize error.
The proposed model is efficient and versatile in that it predicts the reliability of
circuit performance from input parameters such as interconnect resistance,
inductance and capacitance. The performance parameter considered for the
developed model is the mean square error. The obtained results exhibit high
accuracy and adaptability. The accuracy of the proposed model is assessed through
regression, error and histogram plots.
1 Introduction
With the magnificent advancements in VLSI technology, the demand for high-end
electronic appliances has increased manifolds. These technology developments have
been achievable due to continuous scaling of technology. Stepping with the minia-
turization of technology, the device performance increases. However, at the same
time, the on-chip interconnects become more dense and huge at lower technology
nodes and consequently proven to be a major bottleneck for IC designs [1]. At scaled
technology nodes, interconnects are the major governing parameter in determining
the efficiency of electronic devices.
At lower technology nodes, non-ideal effects such as variability also aggravate substantially. Variability refers to the change in system output responses due to deviations in certain input parameters. It can be induced during any of the manufacturing steps, such as fabrication and packaging. As technology advances toward the sub-micron regime, interconnects dominate the model behavior and become more prone to process variations in such a scaled-down environment. Such variations can produce outputs quite contrary to the expected results. Therefore,
circuit performance is hampered by variability effects and it becomes a necessity to
consider them in order to construct a robust model.
Interconnect variability can occur in any of its physical parameters such as spacing
between interconnects (s), thickness of wire (t), interlayer dielectric thickness (h)
and width of the wire (w). These defects arise due to imperfections in the fabrication process, thereby significantly affecting the interconnect parasitics, namely resistance, inductance and capacitance [2]. These in turn affect circuit performance parameters such as delay and power dissipation. To date, the majority of such variability analyses have been performed using traditional techniques, such as process corner, parametric sensitivity, Monte-Carlo, Taguchi, ANOVA and rank table [3, 4]. In process corner analysis, a typical region (formed by corners) is defined within
which it is expected that the system can withstand noise and works well. The corners
are basically formed based on the combination of fast and slow transistors, namely
NMOS and PMOS. These corners enclose an imaginary area, and it is expected that
the circuit performance is executed correctly within this area. Parametric analysis
studies the effect of variation in each process parameter individually over circuit
performance, whereas Monte-Carlo method investigates the cumulative effect of all
parameters simultaneously over circuit performance. Taguchi and ANOVA methods
indicate the influence of each factor on the system in terms of percentage, whereas
rank table determines the impact of parameters by assigning ranks to each factor.
The higher rank or percentage implies stronger parameter influence on the system
performance. Monte-Carlo has been a widely used variability analysis technique. Hence,
Variability Analysis of On-Chip Interconnect System … 261
in the present paper, Monte-Carlo analysis has been incorporated to create the initial dataset for the on-chip neural network-based model.
In order to design a system, the desired output performance parameter values must be determined. This task cannot rely solely on traditional methods, as the designer may have to iterate many times to re-optimize the circuit for a new performance requirement. To increase the performance efficiency of a system, prediction of the output parameters is essential. This can be attained by developing an initial training dataset. The development of these datasets, and subsequently the system prediction, is based on regression models that can be developed using machine learning and neural network (NN) techniques [5–7]. Neural networks have been employed as regression and classification algorithms in various fields, such as speech processing, image processing, control systems and more, as an optimization technique [5]. They mimic the biological functioning of neurons. The algorithm has two main aspects: training and testing. Training
is used so as to make the system learn, while testing is used to check the accuracy of
the designed structure. In order to minimize the error, back-propagation techniques
are used as training algorithm [5–7]. Various training algorithms are used in neural
network such as gradient descent, conjugate gradient, quasi-Newton, Levenberg–
Marquardt (LM) and so on. These algorithms are classified on the basis of speed and
memory usage. This paper incorporates LM algorithm while generating weights for
NN model [8]. Back-propagation neural network (BPNN) is a powerful predictive and optimization technique that can prospectively be used in interconnect performance analysis. BPNN replaces the manual, time-consuming process used in traditional variability analysis methods and speeds up optimal system design.
The present paper attempts to bridge the gap between VLSI design and the neural network modeling approach. A robust approach that integrates a fast optimization technique for predicting performance under variable parasitics is presented.
Fig. 1 Block diagram representation of sequential steps for NN-based model development (build NN structure → training → readjust weights until minimum MSE → testing)
The first step involves creation of the initial dataset. This is performed using Monte-Carlo analysis, a statistical technique for assessing the impact of variation of multiple parameters on circuit performance. In this method, multiple iterations are performed to evaluate the response for random sets of inputs varied within a specified range.
The system chosen in the current paper is a driver-interconnect-load (DIL) where a
CMOS inverter is considered as driver and a capacitor as load [1]. This is shown in
Fig. 3.
Fig. 3 Driver-interconnect-load (DIL) system
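The dataset-creation step above can be sketched in a few lines of Python; this is a minimal stand-in for the HSPICE Monte-Carlo flow, with illustrative nominal parasitic values and a first-order Elmore delay in place of the transient simulation (both are assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Nominal lumped parasitics of the DIL line (illustrative values only,
# not the paper's extracted 32 nm data)
R0, L0, C0 = 120.0, 1.7e-9, 40e-15  # ohm, henry, farad

def delay_estimate(R, C):
    # First-order stand-in for the HSPICE transient simulation:
    # 0.69*R*C Elmore delay of a single lumped RC section
    return 0.69 * R * C

N = 1000  # Monte-Carlo iterations, as in the paper
# Vary R, L and C simultaneously within a +/-10% deviation
R = R0 * rng.uniform(0.9, 1.1, N)
L = L0 * rng.uniform(0.9, 1.1, N)
C = C0 * rng.uniform(0.9, 1.1, N)

X = np.column_stack([R, L, C])   # NN inputs
y = delay_estimate(R, C)         # target performance parameter
```

Each row of X with its target in y forms one training sample for the NN-based model.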
The second step involves building the NN-based model for training. To imple-
ment this, it is necessary to be acquainted with the NN approach. Algorithms based
on neural network (NN) are inspired by biological neural network. These are basi-
cally mathematical interpretation of human neuron model that attempts to emulate
the associative and parallel pattern of the human brain. Neural network-based algo-
rithms are being widely researched presently and employed for various regression
and classification problems. The structure of any neural network consists of input,
hidden and output layers. Each layer consists of a set of neurons whose number is defined by the user. The multilayer perceptron (MLP) is the most common representation of a neural network, as depicted in Fig. 4a.
The middle layer, which is the hidden layer, transforms the weighted linear combination of its inputs into a nonlinear function by means of an activation function [9]. Based on the weight, or strength, of each connection between nodes, the data is passed on to the next hidden layer or to the output layer. The input layer consists of as many neurons as there are input variables; the input nodes accept the independent input dataset. The number of hidden layers can be more than one, depending upon the complexity of the problem. A detailed structure corresponding to a single neuron of the hidden layer (of Fig. 4a) is shown in Fig. 4b. In the figure, all inputs are connected to each node of the hidden layer with some strength known as the weight. A bias is also added in order to incorporate shifting.

Fig. 4 a Neural network architecture. b Detailed architecture corresponding to a node of the hidden layer

264 A. Misra et al.
Consider a function such that
Y = f (X ) (1)
where X is the independent variable given to the input layer and Y is the dependent
variable which is obtained through the output layer. X and Y contain sets of values arranged in the form of arrays. Even if the exact function that computes Y from X is unknown, the neural network approach has the ability to map and form the relationship by adjusting the weights of its neurons. It first learns from the provided training dataset and then tests the system using the test dataset. Usually the entire available dataset is divided into 70% training data and 30% testing data.
In Fig. 4b, consider the output from a node of the hidden layer to be Z_i; then it can be defined as

Z_i = Σ_{j=1}^{N} W_j X_j + b    (2)
where W and b are defined as weight corresponding to each input and bias given
to a particular node, respectively. The hidden-layer node index is denoted by i, ranging from 1 to L, while j indexes the inputs, ranging from 1 to N. The output (Z_i) is then passed through the activation function 'A'. The main objective of the transfer (activation) function is to
linear, sigmoid, Gaussian and piecewise linear. The output of the hidden layer is
passed through activation function to get a normalized output. In this paper, sigmoid
function is used as an activation function. It is defined as follows:
σ(Z) = 1 / (1 + e^{−Z})    (3)
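Eqs. (2) and (3) together define one hidden-layer node; a minimal sketch in Python (NumPy), with weights and bias as placeholder arguments:

```python
import numpy as np

def sigmoid(z):
    # Eq. (3): sigma(Z) = 1 / (1 + exp(-Z)), maps any real Z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def hidden_node(x, w, b):
    # Eq. (2): Z_i = sum_j W_j * X_j + b, followed by the activation A
    z = np.dot(w, x) + b
    return sigmoid(z)
```

With zero weights and zero bias the node outputs sigmoid(0) = 0.5, the midpoint of the activation range.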
The sigmoid function maps the output into the range between 0 and 1.
The difference between the predicted and actual values is used to compute the mean square error (MSE). The MSE is propagated backwards so that each layer readjusts its weights until the minimum error is obtained. This method is hence known as the back-propagation algorithm [7]. There are many NN back-propagation algorithms that
have been proposed by different researchers [10]. Levenberg–Marquardt (LM) algo-
rithm is one of the best NNBP algorithms as it considerably lowers computational
time [8]. The LM algorithm is stable and faster than other algorithms when it comes to weight updating. The weight update using the LM algorithm is defined as

W_(k+1) = W_k − (J^T J + μI)^(−1) J^T e    (4)

where W is the weight vector of the NN, J is the Jacobian matrix, I is the identity matrix, e is the error vector and μ is the damping factor. The computational complexity of the LM algorithm
depends on the Jacobian matrix. Thus, the complexity of the proposed system depends
on the number of input, hidden, output layers and on the number of neurons in the
hidden layer. This complexity applies only to the training phase, since the testing phase uses the network already created by the LM algorithm.
The training process is executed until the minimum MSE is achieved. Once the desired MSE is obtained, the training process is complete and the last step of model formation is executed. The final step comprises the testing process: the reserved testing dataset is provided as input and the output is analyzed using the developed NN-based model.
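One LM weight update of the form above can be sketched as follows; mu is the damping factor that blends gradient descent (large mu) with Gauss–Newton (small mu). This is a generic sketch, not the paper's MATLAB nntraintool implementation:

```python
import numpy as np

def lm_step(w, J, e, mu=1e-3):
    # W_new = W - (J^T J + mu*I)^(-1) J^T e
    # J is the Jacobian of the residuals e with respect to the weights w
    H = J.T @ J + mu * np.eye(w.size)   # damped approximate Hessian
    return w - np.linalg.solve(H, J.T @ e)
```

For a linear least-squares residual e = Jw − y, a single step with a small mu lands essentially on the least-squares solution, which is why LM converges so quickly near the optimum.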
This section presents the results that are obtained using the developed model that
is based on NN approach. The precision of the developed model is also compared
with the traditional analytical model. The NN-based model is developed for 32 nm
technology. The on-chip interconnect dimensions are defined in Table 1 [11, 12].
The neural network is trained using dataset that is generated from Monte-Carlo
simulations. These have been performed using HSPICE. Interconnect parasitics,
namely resistance, inductance and capacitance, are varied. The output performance
parameters are delay and power dissipated. The interconnect length considered is
0.5 mm. The Monte-Carlo simulations have been performed for 1000 iterations.
The process parameters are varied with deviation of ±10% simultaneously. Further
training, testing and creation of neural network structure have been implemented
Table 1 Interconnect dimensions for 32 nm technology

Interconnect dimension      Value
Width (w)                   48 nm
Space (s)                   48 nm
Thickness (t)               144 nm
Height (h)                  110.4 nm
Aspect ratio                3
Vdd                         0.9 V
Length                      0.5 mm
Dielectric constant (ε)     2.25
in MATLAB with the help of nntraintool. Training dataset consists of 70% of the
total dataset. Further, the remaining dataset is divided into validation set and testing
dataset, each with 15% of the total dataset. Input layer is connected to hidden layer.
The number of hidden layers and of neurons in them depends on the complexity of the problem. Since a feed-forward neural network has no prior knowledge of the weights or hidden layers, a back-propagation algorithm is used to compute the weights. These weights are iteratively updated to fit the performance parameter. For this problem, two hidden layers have been employed, with 10 and 8 neurons, respectively. The LM training algorithm is
used with 1000 epochs (iterations) and mean square error as a performance parameter.
The activation function used is sigmoid in order to introduce nonlinearity to the
hidden layer.
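The configuration described above (three parasitic inputs, hidden layers of 10 and 8 sigmoid neurons, outputs for delay and power) can be sketched as a forward pass; a linear output layer is an assumption here, since the paper does not state the output activation:

```python
import numpy as np

rng = np.random.default_rng(1)

sizes = [3, 10, 8, 2]  # inputs (R, L, C) -> 10 -> 8 -> outputs (delay, power)
weights = [rng.standard_normal((m, n)) * 0.1 for n, m in zip(sizes, sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    # Sigmoid activations on the two hidden layers; linear output (assumed)
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = sigmoid(W @ a + b)
    return weights[-1] @ a + biases[-1]
```

Training then amounts to iteratively adjusting `weights` and `biases` with the LM update until the MSE target is reached.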
Regression curves are plotted and mean square error has been computed as depicted
in Fig. 5. Regression is a statistical modeling technique used for setting up the
relationship between input and output variables. It also defines the level of closeness
of the developed model with initial dataset. Figure 5a–c indicates regression plots for
output data with the target values. The correlation factor ‘R’ indicates the linearity
between the actual obtained and expected output values. The closer the value of R to
1, the greater is the linear relationship between targets and output. The fitting of the
dataset and predicted output is plotted using regression plots. From Fig. 5, it can be
seen that the best fit line for the clustered dataset lies on the target line. This shows
that the developed NN-based model performs very well.
To compute the accuracy and confirm the efficiency of the developed NN-based model, the mean square error is computed. Figure 6 shows the mean square error, which in this case is low, indicating that the network behaves very closely to the analytical model. It can be observed that the best performance, i.e., the least error, is obtained after 32 epochs. Hence, the system stabilizes and adjusts its parameters for best performance in 32 iterations.
In order to analyze each sample’s accuracy in the total data set, error histogram
is plotted as shown in Fig. 7. The error histogram curve indicates that around 700
training samples, 150 validation samples and 150 testing samples have error near to
0. Therefore, these results confirm that the NN model closely matches the SPICE simulation, so the NN can be employed as an alternative prediction method.
Fig. 5 a Training regression plot. b Testing regression plot. c Regression plot for entire data
[Testing flow: identifying the varying parameters → providing inputs to the constructed NN with saved weights → output]

Fig. 9 a Monte-Carlo and NN comparison plot based on power dissipation. b Monte-Carlo and NN comparison plot based on delay
wire. It is seen from the figure that the NN-based model agrees very closely with the Monte-Carlo analysis. Hence, there is good agreement between the proposed model and the existing approach.
4 Conclusion
This paper presents a novel and effective method for variability analysis of on-chip interconnect systems using a neural network approach. The approach follows a systematic process comprising creation of the initial dataset, then training and building of the NN-based model, followed by testing of the model. The developed neural network-based model is computationally efficient. The validity and accuracy of the developed model are ensured using regression plots and analysis of the mean square error. Further, variability analysis has been performed using the NN-based
model and has also been validated using the conventional Monte-Carlo method. The NN-based model is found to be 9.2 times faster than the conventional Monte-Carlo method. Since the neural network requires no prior knowledge, it can learn from arbitrary datasets and is therefore not limited to a particular interconnect dimension. The proposed model can also handle high-dimensional datasets with unknown nonlinear interrelationships. The neural network-based model is versatile and dynamic, and can be effectively incorporated to implement, analyze and design efficient circuits and systems.
References
1. Lin Q, Wu HF, Jia GQ (2018) Review of the global trend of interconnect reliability for integrated
circuit. Circuits Syst 9:9–21
2. Scheffer L (2006) An overview of on-chip interconnect variation. In: ACM international
workshop on system-level interconnect prediction, pp 27–28
3. Agrawal R, Chandel R, Dhiman R (2016) Variability analysis of stochastic parameters on the
electrical performance of on-chip current-mode interconnect system. IETE J Res (Taylor and
Francis) 63(2):268–280
4. Agrawal Y, Parekh R, Chandel R (2018) Performance analysis of current-mode interconnect
system in presence of process, voltage, and temperature variations. In: Garg A, Bhoi A, Sanjee-
vikumar P, Kamani K (eds) Advances in power systems and energy management. Lecture notes
in electrical engineering, vol 436. Springer, Singapore
5. Tu TY, Chao PC-P (2018) Continuous blood pressure measurement based on a neural network
scheme applied with a cuffless sensor. Springer, Berlin Heidelberg
6. Trinchero R, Manfredi P, Stievano IS, Canavero FG (2018) Machine learning for the
performance assessment of high-speed links. IEEE Trans Electromagn Compat
7. Zhang L, Wang F, Sun T et al (2018) A constrained optimization method based on BP neural
network. Neural Comput Appl 29
8. Lourakis MIA (2005) A brief description of the Levenberg–Marquardt algorithm implemented
by Levmar. Proc Found Res Technol 1–6
9. Sibi P, Bordes A, Jones SA, Siddarth P (2013) Analysis of different activation functions using
back propagation neural network. J Theor Appl Inf Technol 47(3)
10. Saduf MAW (2013) Comparative study of back propagation learning algorithms for neural
networks. Int J Adv Res Comput Sci Softw Eng 3(12)
11. Predictive Technology Models. https://ptm.asu.edu (2016)
12. International Technology Roadmap for Semiconductors. https://public.itrs.net (2013)
Prospective Incorporation of Booster
in Carbon Interconnects for High-Speed
Integrated Circuits
Abstract VLSI technology has grown remarkably over the years, made feasible by continuous technology scaling. Miniaturization and the demand for high-speed applications have resulted in dense, compact packing of on-chip interconnects and devices. Copper has been widely used as the on-chip interconnect material in VLSI chips. However, copper is constrained by grain and surface boundary scattering effects at scaled technology nodes, which limit its utility and further incorporation in future integrated circuit (IC) designs. Consequently, carbon-based on-chip interconnects have been investigated and proven to be one of the effective alternatives to conventional copper interconnects. The present work explores and investigates carbon nanomaterials that can be used for high-speed VLSI interconnects. Another important issue with on-chip interconnects is that their parasitics increase, and hence output performance degrades, as the wiring length over the IC increases. This can be alleviated by inserting boosters between on-chip interconnect segments. The booster insertion technique for carbon interconnects has been only marginally explored to date and is therefore taken up in this work. This paper investigates various system characteristics, such as delay, power dissipation, crosstalk, signal integrity and eye diagrams, for different interconnect materials, namely copper, carbon-based CNTs and GNRs. It is observed that carbon interconnects outperform conventional copper interconnects in terms of delay, power, crosstalk and high-speed operation.
1 Introduction
Progressive advancements in analog, digital and mixed signal systems and their real-
ization in integrated circuits (ICs) have created need for high-speed requirements,
and necessity to attain more functionality over same silicon chip area. This requires
scaling of technology that leads to compact integration of millions of transistors and
interconnections. In the early era of VLSI development, when the technology dimen-
sions were large, the performance of IC was dominantly determined by transistors.
However, as technology miniaturized, on-chip interconnects have become a more vital factor and now dictate the system performance of nano-chips [1–10].
Hence, performance analysis of on-chip interconnects to enhance efficiency of ICs
has become very essential. Incorporation of advanced integration techniques, like
3D and multicore ICs, to attain high speed and enhanced performance leads to reli-
ability concerns like thermal reliability, crosstalk, electromigration and scattering
effects [2, 3, 11]. As the devices are miniaturized below submicron technologies,
new nanomaterials are explored and introduced to meet high-performance requirements. The International Technology Roadmap for Semiconductors (ITRS) noted as early as its 1994 report the necessity of incorporating new on-chip conducting materials to attain the projected overall technology requirements [1]. From 2001 onwards, the ITRS emphasized prospective new materials and highlighted the problems existing in conventional copper on-chip interconnects: it is reported that copper resistivity increases at scaled technology nodes due to the downscaling of the physical dimensions of interconnects. Carbon nanomaterials, which can aptly be used in high-speed on-chip interconnects, are briefly discussed in the subsequent sections.
discussed in subsequent sections.
The vital system performance is majorly evaluated and analyzed using propa-
gation delay and power dissipation. Both propagation delay and power dissipation
are function of on-chip interconnect length. The increase in the length of intercon-
nect has deteriorating effects on these output performance parameters. The global
on-chip interconnects refer to top layer of on-chip interconnect system and are of
longest length in IC designs. These global interconnect provides power, ground and
control signals to entire chip modules. The length of global interconnect increases in
proportion with the die size and is prime evildoer for propagation delay and signal
degradation in miniaturized ICs [2, 3, 10, 12]. Booster insertion is one of the efficient
techniques, to mitigate the signal integrity issues in long interconnects [10, 12, 13].
In this technique, boosters (buffers) are placed along long wires, segmenting them into smaller sections. Since propagation delay increases with interconnect length, dividing the line into smaller sections is an efficient scheme to limit system
latency due to interconnects.
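The benefit of segmenting a long line follows from the quadratic dependence of distributed RC delay on length; a minimal sketch, where the per-buffer delay t_buf and the 0.38 RC delay coefficient are generic textbook assumptions rather than values from this paper:

```python
def wire_delay(r, c, length):
    # Distributed RC delay grows quadratically with length:
    # t ~ 0.38 * r * c * l^2, with r and c per unit length
    return 0.38 * r * c * length ** 2

def buffered_delay(r, c, length, k, t_buf):
    # k buffered sections: the wire term becomes 0.38*r*c*l^2/k,
    # at the cost of k buffer delays
    return k * wire_delay(r, c, length / k) + k * t_buf
```

With a negligible buffer delay, four sections cut the wire delay by 4x; a real design balances k against the buffers' own delay and power, which is exactly the trade-off examined later in this paper.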
Prospective Incorporation of Booster in Carbon Interconnects … 275
2 Carbon Nanomaterials
A GNR comprising a single graphene sheet is referred to as a single-layer GNR (SLGNR). Likewise, when multiple layers of GNR are stacked one above the
other maintaining minimum distance of van der Waal’s gap (≈0.34 nm) between
the intermediate layers, it is called multilayer GNR (MLGNR). MLGNRs are the most preferred for on-chip interconnect applications, because MLGNR possesses better current conduction and lower impedance than the SLGNR structure [5–7]. The
geometric representation of MLGNR interconnect is demonstrated in Fig. 2, where N
represents the number of graphene layers. These are separated by van der Waal’s gap
(δ). Width (w) and thickness (t) of GNR correspond to dimension of interconnects
[4–8].
The total number of layers (N_layer) in MLGNR is computed as [4–7]

N_layer = 1 + Integer(t/δ)    (1)
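Eq. (1) in code, with the van der Waal's gap δ ≈ 0.34 nm given above:

```python
def n_layers(t_nm, delta_nm=0.34):
    # Eq. (1): N_layer = 1 + Integer(t / delta)
    return 1 + int(t_nm / delta_nm)
```

For example, a 10 nm thick ribbon yields 1 + int(10/0.34) = 30 layers.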
Like GNR structures, CNTs are also categorized into the single-wall carbon nanotube (SWCNT), which is a single tubular structure, and the multiwall carbon nanotube (MWCNT), which has multiple concentric shells. In addition, substantial research has considered CNTs in bundle form. Depending on the type of CNTs packed in the bundle, these structures are named accordingly: a bundle consisting only of SWCNTs is called an SWCNT bundle, while a bundle of MWCNTs is termed an MWCNT bundle. Researchers later observed
[Fig. 2 Geometric structure of an MLGNR interconnect with graphene layers 1 … N separated by the van der Waal's gap]
that during development of CNTs using chemical vapor deposition process (CVD),
carbon nanotubes are formed in a bundle structure comprising both SWCNTs and
MWCNTs. This structure is referred to as mixed CNT bundle (MCB). The structures
of SWCNT, MWCNT and mixed CNT bundle are shown in Fig. 3.
Figure 3 also depicts the spatial arrangement of SWCNTs, MWCNTs and mixed CNTs packed in a bundle structure. The centers of the CNTs in a bundle are arranged in a triangular lattice; this placement gives the highest packing density and accommodates the maximum number of CNTs in the rectangular area. For a CNT bundle of given cross-sectional area, the total number of CNTs is a critical factor in computing the equivalent circuit parasitics. The total number of CNTs (SWCNTs/MWCNTs) in a bundle is determined as in [10, 11, 14, 16, 21, 22].
[Fig. 3 Cross-sectional views of a SWCNT bundle, b MWCNT bundle and c mixed CNT bundle, each of width W_B and height H_B]
In this section, electrical modeling, its model description and parasitic extraction of
conventional copper and advanced carbon (MLGNR and CNT bundles) based inter-
connects have been performed. The interconnect parasitics comprise resistance (R),
inductance (L) and capacitance (C). The interconnect is driven by a driver and terminated by a load. The driver-interconnect-load (DIL) model is considered in the present work and is shown in Fig. 4.
The driver and load are realized using CMOS inverter. The interconnect length
can be represented by small lumped sections. Each of these lumped sections can be
defined by RΔz, LΔz and CΔz, where Δz defines the incremental small distance.
This is shown in Fig. 4.
[Fig. 4 DIL model: CMOS buffer driver, distributed interconnect line represented by lumped RΔz, LΔz, CΔz sections from z = 0 to z = l, and CMOS buffer load; for carbon interconnects the line comprises contact resistances R_lump/2 and p.u.l. elements r_ESCΔz, l_kΔz, l_eΔz, c_qΔz and c_eΔz]
where

R_q = h / (2 e² N_ch)    (5)

The per unit length (p.u.l.) scattering resistance (r_ESC) primarily depends on the mean free path (λ_mfp) and can be expressed as

r_ESC = R_q / (N_layer · λ_mfp)    (6)
The per unit length (p.u.l.) capacitances and inductances are obtained as

l_k = l_k0 / (2 N_ch · N_layer); where l_k0 = h / (2 e² v_F)    (7)

c_q,n = 2 c_q0 N_ch · N_layer; where c_q0 = 2 e² / (h v_F)    (8)

l_e,n = μ_0 d / w,  c_e = ε_0 w / d    (9)
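Eqs. (5)–(8) can be sketched directly in code; h and e are physical constants, and the graphene Fermi velocity v_F ≈ 8 × 10^5 m/s is a typical assumed value, since the extracted text does not state it:

```python
h = 6.626e-34    # Planck constant (J s)
e = 1.602e-19    # electron charge (C)
vF = 8.0e5       # graphene Fermi velocity (m/s), typical assumed value

def r_q(n_ch):
    # Eq. (5): quantum resistance R_q = h / (2 e^2 N_ch)
    return h / (2 * e**2 * n_ch)

def r_esc(n_ch, n_layer, lam_mfp):
    # Eq. (6): p.u.l. scattering resistance r_ESC = R_q / (N_layer * lambda_mfp)
    return r_q(n_ch) / (n_layer * lam_mfp)

def l_k(n_ch, n_layer):
    # Eq. (7): p.u.l. kinetic inductance l_k = l_k0 / (2 N_ch N_layer),
    # with l_k0 = h / (2 e^2 vF)
    lk0 = h / (2 * e**2 * vF)
    return lk0 / (2 * n_ch * n_layer)

def c_q(n_ch, n_layer):
    # Eq. (8): p.u.l. quantum capacitance c_q = 2 c_q0 N_ch N_layer,
    # with c_q0 = 2 e^2 / (h vF)
    cq0 = 2 * e**2 / (h * vF)
    return 2 * cq0 * n_ch * n_layer
```

For a single channel, R_q reduces to the familiar h/2e² ≈ 12.9 kΩ quantum of resistance.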
where

λ_mfp ≈ 10³ D_i  and  N_total = Σ_{i=1}^{N_CNT} N_CH    (11)

C_ESC = 2πε / cosh⁻¹((h_t + D)/D) ≈ 50 aF    (12)

C'_q0 = 4 e² / (h · v_F) ≈ 193.7 pF  and  C'_q = C'_q0 · N_total    (13)

C'_ESC = C'_q ∥ C_ESC = (C'_q · C_ESC) / (C'_q + C_ESC)    (14)
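Eqs. (12) and (14) can be sketched as follows; `math.acosh` implements cosh⁻¹, and the geometry arguments are illustrative placeholders:

```python
import math

def c_esc(eps, h_t, D):
    # Eq. (12): electrostatic capacitance per unit length,
    # C_ESC = 2*pi*eps / acosh((h_t + D) / D)
    return 2 * math.pi * eps / math.acosh((h_t + D) / D)

def c_series(cq, cesc):
    # Eq. (14): series combination of the quantum and electrostatic
    # capacitances, C' = cq*cesc / (cq + cesc)
    return cq * cesc / (cq + cesc)
```

The series combination is always dominated by the smaller of the two capacitances, which is why the ~50 aF electrostatic term usually governs the effective value.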
MWCNTs and mixed CNT bundles are of greater importance due to their metallic property and reliability; their modeling differs in the number of shells each CNT has. The number of conducting channels is a critical parameter that determines interconnect performance and is estimated on the basis of how many shells each CNT possesses; it is given as
N_ch(D_i) = { K_1 T D_i + K_2,  when D_i > d_t/T
              2/3,              when D_i < d_t/T }    (16)
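Eq. (16) in code; the constants K₁ ≈ 3.87 × 10⁻⁴ nm⁻¹K⁻¹, K₂ ≈ 0.2 and d_t ≈ 1300 nm·K are assumed literature-style values, since the extracted text does not state them:

```python
def n_channels(D_nm, T=300.0, K1=3.87e-4, K2=0.2, d_t=1300.0):
    # Eq. (16): average number of conducting channels of a shell of
    # diameter D. Small shells (D < d_t/T) contribute 2/3 of a channel.
    if D_nm > d_t / T:
        return K1 * T * D_nm + K2
    return 2.0 / 3.0
```

At room temperature the crossover diameter d_t/T is about 4.3 nm, so thin SWCNT shells sit on the constant 2/3 branch while large MWCNT shells gain channels linearly with diameter.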
C_e,ESC = [2π ε_0 ε_r / cosh⁻¹((D_shell + 2·h_t)/D_shell)] × N_x    (19)
L_k,ESC = L'_k0 / (2 · N_total)    (21)
282 T. Pathade et al.
L_e,ESC = (1/N_x) · (μ_0/2π) · ln((D_shell + 2·h_t)/D_shell)    (22)
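Eqs. (19), (21) and (22) can be sketched as below; the shell diameter D_shell, height h_t and row count N_x are illustrative inputs:

```python
import math

MU0 = 4e-7 * math.pi      # vacuum permeability (H/m)
EPS0 = 8.854e-12          # vacuum permittivity (F/m)

def c_e_esc(eps_r, d_shell, h_t, n_x):
    # Eq. (19): C_e,ESC = 2*pi*eps0*eps_r / acosh((D_shell + 2 h_t)/D_shell) * N_x
    return 2 * math.pi * EPS0 * eps_r / math.acosh((d_shell + 2 * h_t) / d_shell) * n_x

def l_k_esc(lk0, n_total):
    # Eq. (21): L_k,ESC = L_k0' / (2 N_total)
    return lk0 / (2 * n_total)

def l_e_esc(d_shell, h_t, n_x):
    # Eq. (22): L_e,ESC = (1/N_x) * (mu0 / (2*pi)) * ln((D_shell + 2 h_t)/D_shell)
    return (1.0 / n_x) * (MU0 / (2 * math.pi)) * math.log((d_shell + 2 * h_t) / d_shell)
```

Both the electrostatic capacitance and the external inductance scale with N_x (in opposite directions), reflecting N_x parallel CNT columns in the bundle.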
Fig. 6 Variation in propagation delay with varying interconnect length for different interconnect
materials. (Notations 1–5 on x-axis of the figure represent as follows: 1. MWCNTB, 2. SWCNTB,
3. MCB, 4. MLGNR, 5. Cu)
Fig. 7 Delay (left) and power dissipation (right) for varying interconnect materials, with and without buffer insertion
the interconnect materials. However, since CMOS buffers draw a significant amount of energy from the power supply, they are also power-hungry. This work therefore also points future researchers in this area toward efficient boosters that consume less power. The figures show that MWCNTB and SWCNTB interconnects outperform all the other interconnect materials. Owing to their fabrication infeasibility, however, the MCB interconnect is considered the optimal practical material, since the MCB possesses more conduction channels than SWCNT and MWCNT bundles. Hence, MCB is proposed as a prospective interconnect material in the current work.
Another critical issue in interconnect performance is coupling phenomenon, and
it is modeled using the following crosstalk model. Figure 8 shows crosstalk model
for three interconnect lines. Since interconnect lines are parallel to each other, they
exhibit coupled capacitances and mutual inductances as shown in the figure. There
are basically two types of crosstalk analysis. The first is functional crosstalk, in which the victim line is quiescent while the other lines are switching. Owing to the switching activity of the aggressor lines, noise is induced on the victim line, as depicted in the following simulation results; the victim line signal exhibits overshoot/undershoot. Crosstalk analysis is performed for 5 mm of interconnect length for all interconnect structures.
It is seen from Fig. 9 that SWCNT and MWCNT bundles perform better than the other on-chip interconnect structures. Since their practical realization is difficult, however, the mixed CNT bundle is a competitive alternative. Other emerging critical issues in high-speed nano-interconnects are noise immunity and clock jitter. Hence, the present work also briefly illustrates
Fig. 9 a Crosstalk-induced delay and b crosstalk-induced power dissipation for different interconnect materials (SWCNTB, MWCNTB, MCB, MLGNR, Cu) and varying spacing (5, 10, 20, 32 nm) between capacitively and inductively coupled interconnect lines
signal integrity analysis in high-speed interconnects. The clock jitter and signal-to-
noise ratio are the measured entities and are computed using eye diagram, as shown
in Figs. 10 and 11.
An eye diagram is a powerful tool for understanding signal impairments in the
high-speed digital channel system, verifying transmitter output compliance, and
revealing the amplitude and time distortion elements that degrade the BER, for diagnostic purposes [23]. By sampling a high-speed digital signal over many bit periods, an eye diagram superimposes the high and low levels of the signal and the corresponding transitions. The eye diagram analysis is performed for different interconnect materials, viz., copper, MLGNR, SWCNTB,
Fig. 10 Eye diagram representation and measurement entities (eye amplitude, eye height, eye opening) for 1 Gbps data rate operation of MWCNTB interconnect
Fig. 11 a Rise time jitter, b fall time jitter representations for 1 Gbps data rate operation of
MWCNTB interconnect
Table 2 Eye diagram measurement of parameters for different interconnect topologies at data rate of 1 Gbps

Parameter           Copper      MLGNR       SWCNTB      MWCNTB      MCB
Eye amplitude (V)   0.899       0.901       0.951       0.947       0.829
Eye height (dB)     −0.462      −0.476      −0.219      −0.237      −1.439
Eye width (ns)      9.875E−10   9.942E−10   9.831E−10   9.46E−10    9.81E−10
Eye SNR             7.77E−12    9.012E19    9.509E19    9.47E19     13.159
Eye jitter (rms)    3.999E−11   2.238E−12   4.263E−12   1.445E−11   4.375E−12
MWCNTB and MCB. Their performance is shown in Table 2. Figure 10 depicts the
eye diagram of MWCNTB interconnect.
The eye diagram is an aid for obtaining information about signal amplitude and time distortions; parameters relevant to amplitude, eye height, rise and fall jitter, and eye masks are computed through this research. These parameters are measured and listed in Table 2.
Eye amplitude is measured as the difference between the one and zero levels of the digital signal, i.e., the difference between the mean values of the levels indicated by the yellow (red) lines in Fig. 10. Eye height is the eye amplitude reduced by the amplitude noise at the two levels, commonly computed as (μ1 − 3σ1) − (μ0 + 3σ0). Measurements of eye amplitude and eye height are shown in Fig. 10. The eye amplitude and eye height are vital amplitude terms, since their magnitudes determine whether the received data bit is read as a "0" or a "1". Figure 10 also depicts the eye opening, which reflects the accuracy of the receiver system of the on-chip interconnects. A higher eye opening factor signifies a lower probability of wrongly identifying logical '1' and '0'.
The signal-to-noise ratio (SNR) of the eye is a metric of the amplitude distortion and is computed as the ratio of the desired signal level to the level of background noise together with other existing distortions. Higher SNR values are desirable for good signal transmission. SNR is defined as

SNR = (μ1 − μ0) / (σ1 + σ0)

where μ1 and μ0 are the mean one and zero levels and σ1 and σ0 are their standard deviations.
Timing distortions are measured in terms of jitter. To estimate jitter, the time variations of the rising and falling edges of the eye diagram at the crossing point are captured, as shown in Fig. 11. These fluctuations can be random and/or deterministic. RMS jitter is defined as the standard deviation of the crossing-time histogram (indicated by pink lines in Fig. 10). All eye diagram measurements are performed at a high data rate of 1 Gbps, and the simulations are carried out for a 1 mm interconnect length in the ADS tool. The values tabulated in Table 2 show that copper fails to sustain high-speed operation, while CNT bundles are the frontrunners in high-speed operational metrics.
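The eye metrics above can be sketched numerically. The following Python snippet is an illustrative sketch, not the authors' ADS workflow: it computes the eye amplitude, the eye SNR = (μ1 − μ0)/(σ1 + σ0) and the RMS jitter from hypothetical sampled one/zero levels and edge-crossing times.

```python
import statistics

def eye_metrics(one_levels, zero_levels, crossing_times):
    """Estimate basic eye-diagram metrics from sampled data.

    one_levels, zero_levels: sampled high/low voltages (V)
    crossing_times: edge-crossing instants at the eye crossing point (s)
    """
    mu1, mu0 = statistics.mean(one_levels), statistics.mean(zero_levels)
    s1, s0 = statistics.stdev(one_levels), statistics.stdev(zero_levels)
    eye_amplitude = mu1 - mu0                      # difference of mean levels
    snr = eye_amplitude / (s1 + s0)                # amplitude-distortion metric
    rms_jitter = statistics.stdev(crossing_times)  # std. dev. of the crossing histogram
    return eye_amplitude, snr, rms_jitter
```

For example, one-level samples clustered near 0.9 V and zero-level samples near 0 V, each with about 10 mV of spread, give an eye amplitude of about 0.9 V and an SNR of about 45.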
5 Conclusion
The present work investigates a prospective booster technique for different interconnect materials. The on-chip interconnect structures considered are copper and the carbon-based SWCNT bundle, MWCNT bundle, mixed CNT bundle and MLGNR. To assess the performance of different interconnect materials before switching to futuristic interconnects, several signal integrity and reliability analyses have been performed. The analysis shows that with an increase in interconnect length in the DIL system, the performance of the interconnect system degrades. To mitigate signal degradation in long interconnects, the repeater insertion technique has been proposed and implemented for all interconnect structures. It is observed that up to 20% of the delay can be reduced by incorporating carbon interconnects using CMOS buffer insertion; similarly, nearly 30% delay reduction is possible for the copper interconnect. Crosstalk is a function of the pitch (spacing) between interconnect lines and is one of the main sources of induced delay and power dissipation in the system. Hence, crosstalk analysis is performed for various values of spacing between interconnects. It is observed that by using carbon nano-interconnects, reductions in crosstalk-induced delay and power consumption of up to 50 and 25%, respectively, can be attained. It is inferred from the various analyses that bundled CNTs lead in performance; however, due to the feasibility issues of fabricating SWCNTB and MWCNTB, MCB is proposed as a potential on-chip interconnect structure for high performance. Analysis shows that approximately 20% speed improvement and approximately 60% lower power dissipation can be achieved if carbon interconnects are used in place of copper. Furthermore, the signal integrity of on-chip interconnects is assessed using eye diagram and jitter analyses, from which the signal-to-noise ratio (SNR) is computed. It is found that graphene interconnects possess a higher SNR than their copper counterparts. From the different analyses performed in this work, it is envisaged that the booster insertion technique, along with prospective graphene-based mixed CNT bundle interconnects, is a prominent solution for attaining high performance in on-chip interconnects for high-speed IC designs.
References
1 Introduction
Owing to small size and low power consumption, single-electron transistors (SETs)
are becoming an attractive alternative in the post-CMOS era of semiconductor tech-
nology [1–4]. They operate on the principles of Coulomb blockade and quantum
mechanical tunneling (QMT), depicting the on and the off states of the device,
respectively [5–7]. SETs have proved their potential to be a promising candidate
for non-volatile memory as well as logic applications [2, 3, 8]. SET-MOS hybridization is also a major field of interest for logic development. It is envisaged that SETs can outperform CMOS in the nanometer regime owing to virtues like ultra-low power consumption and scalability [2, 3, 9]. Over the last two decades, many researchers have worked to overcome two fundamental issues of SETs, viz., (a) mass production and (b) room temperature operation. In recent years, with advances in technology, many CMOS-compatible processes have been demonstrated for the fabrication of SETs [2,
4, 10–12]. They include nanodamascene processes [4], patterning with reactive ion
etching and electromigration, chemical mechanical polishing-based fabrication [10],
self-assembly and self-alignment of quantum dots [11], even optical lithography of
silicon nanowire [12]. Thus, prospects of integrating SETs in a CMOS process are
bright. However, room temperature operation is still a matter of experimentation.
Coulomb blockade is inherently a low energy phenomenon [6, 7]. Hence, it cannot
be detected at ambient temperatures easily. So, to observe the SET operation in true
sense, the device needs to be engineered carefully. The material and dimensions of the island play a crucial role in the tunneling of electrons between source and drain. Hence, the effect of the island on the SET drain current, along with CB and QMT, must be observed. The structure of the island should also be compatible with CMOS fabrication processes. In this paper, we present the rudiments of SET design followed by a detailed analytical approach. We then simulate the device, demonstrating thorough island engineering. Results show the occurrence of both Coulomb blockade and quantum mechanical tunneling in the optimized device. Finally, we present validation of our device with the MIB model [3].
As shown in Fig. 1, a SET consists of two tunnel junctions on source and drain sides
sandwiching a quantum dot. Every junction offers a barrier, popularly known as the Coulomb blockade, to the flow of electrons. These junctions or barriers ensure weak
coupling of source and drain with the quantum island. An electron from source,
after overcoming Coulomb blockade energy level, enters the island by the process
of tunneling. Simultaneously, an electron leaves the island and tunnels to the drain,
causing current to flow. Thus, to ensure the charge transfer, one needs to supply
Coulomb energy to the island. This energy can be supplied by an external voltage
source. It is given by [5–7],
Island Engineering of Single-Electron Transistor for Room Temperature Operation 291
Fig. 1 Schematic of a SET: an island between source and drain, coupled through tunnel junctions
E_C = e^2 / (2C)    (1)
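Equation (1) can be evaluated directly. The short Python sketch below compares the resulting charging energy with the thermal energy k_B*T at 300 K; the 0.1 aF island capacitance is an assumed illustrative value, not a device parameter from this work.

```python
E = 1.602e-19    # elementary charge (C)
KB = 1.381e-23   # Boltzmann constant (J/K)

def charging_energy(c_total):
    """Coulomb charging energy E_C = e^2 / (2C) in joules, Eq. (1)."""
    return E**2 / (2 * c_total)

# Assumed island capacitance of 0.1 aF for illustration:
ec = charging_energy(1e-19)   # ~1.28e-19 J (~0.8 eV)
ratio = ec / (KB * 300)       # ~31, i.e., E_C >> k_B*T, so the blockade
                              # remains observable at room temperature
```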
engineering, one can opt for island engineering to achieve acceptable I ON and I OFF
values. It is comparatively easy, simple to comprehend and process-friendly.
Figure 2a, b shows the structure and MIB model of a single-gate SET [3]. The source, drain and gate are made of n-polysilicon (phosphorus, 1e19 cm−3), whereas the substrate is p-silicon (boron, 1e15 cm−3). The tunnel junction capacitances on the source and drain sides are CTD = CTS = CJ, the control gate capacitance is CG, and the tunnel junction resistances are RTD = RTS = RT = 1 MΩ. The total device capacitance is defined as C = 2CJ + CG. SET design parameters compatible with the BSIM predictive technology model (PTM) [14] and 22 nm node CMOS model operation at T = 300 K are chosen. Consequently, the supply voltage is chosen to be 0.8 V. For a robust
reliable design of SET logic, which can operate with the least possible error at room
temperature, the charging energy E_C must be as large as possible compared with the thermal energy k_B T.
Fig. 2 a Single-gate SET structure, b equivalent MIB model where tunnel barriers are represented by parallel RC combinations (RTS || CTS and RTD || CTD), gate dielectric as gate capacitance CG, conductive island as a black dot, and two bias supplies, VDS and VGS, between drain-source and gate-source, respectively [3]
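The design rule above (E_C large compared with the thermal energy, with C = 2C_J + C_G) can be turned into a quick feasibility check. The sketch below is illustrative only; the 10 k_B T margin and the junction/gate capacitance values are assumptions, not values reported in this work.

```python
E, KB, T = 1.602e-19, 1.381e-23, 300.0

# Requiring E_C = e^2/(2C) >= 10*k_B*T bounds the total capacitance:
c_max = E**2 / (20 * KB * T)          # ~3.1e-19 F (about 0.31 aF) at 300 K

def total_capacitance(c_j, c_g):
    """Total SET capacitance C = 2*C_J + C_G."""
    return 2 * c_j + c_g

# Assumed junction and gate capacitances for illustration:
c = total_capacitance(c_j=0.1e-18, c_g=0.05e-18)   # 0.25 aF
room_temp_ok = c <= c_max                          # satisfied for this choice
```

Any capacitance pair that keeps C below c_max leaves a comfortable blockade margin at 300 K under this assumed criterion.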
3 Island Engineering
Fig. 4 VDS–IDS characteristics of single-electron devices for a aluminum island and b copper island for different island dimensions (TJ = 4 nm; Lisland = 17 nm and 32 nm)
The designed devices exhibit blockade and tunneling, as shown in Fig. 4a, b for the aluminum and copper islands, respectively.
The devices clearly show the mechanisms of Coulomb blockade and quantum mechanical tunneling. In the limiting case of small structures, the structure resistivity becomes proportional to ρ0 * λ for any given fixed dimension [17], where ρ0 and λ are the bulk resistivity and the mean free path for electron-phonon scattering, respectively. Thus, the metal with the lowest product ρ0 * λ is expected to exhibit the highest conductivity. This product is 6.70e−16 Ω m2 for copper and 5.01e−16 Ω m2 for aluminum [17]. Hence, the current capacity of the aluminum devices is higher than that of the copper devices. The Coulomb voltage and SET capacitances extracted from the above profiles for all the devices are listed in Table 3.
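The size-effect argument above reduces to comparing the bulk products ρ0·λ. A minimal sketch using the two values quoted in the text:

```python
# rho0 * lambda products (ohm·m^2) quoted in the text [17]; at nanoscale
# dimensions, the metal with the smallest product conducts best.
products = {
    "copper": 6.70e-16,
    "aluminum": 5.01e-16,
}
best = min(products, key=products.get)  # aluminum, consistent with the higher
                                        # current capacity of the aluminum devices
```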
A first observation is that copper island devices offer better confinement than their aluminum counterparts. Small islands have a higher Coulomb voltage, which is attributed to increased confinement and hence a higher blockade. It is also observed that the extracted values match more closely as the device capacitance becomes smaller. The effect of island dimension and material can be seen in Fig. 5.
So, it is found that a smaller island leads to a higher Coulomb blockade and a lower drain current.
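The extraction of island capacitance from the I-V profiles can be sketched with the simple relation C ≈ e/V_C; the exact prefactor depends on junction symmetry and bias conditions, so this is an illustrative approximation, and the blockade voltages used are hypothetical.

```python
E = 1.602e-19  # elementary charge (C)

def capacitance_from_vc(v_c):
    """Island capacitance estimated from a measured Coulomb voltage, C ~ e/V_C."""
    return E / v_c

# Hypothetical blockade voltages read off two I-V profiles:
c_small_island = capacitance_from_vc(0.8)  # higher V_C -> smaller capacitance
c_big_island = capacitance_from_vc(0.4)    # lower V_C -> larger capacitance
```

This reproduces the trend noted in the text: the smaller island, with the higher Coulomb voltage, yields the smaller extracted capacitance.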
Fig. 5 I–V (IDS versus VDS) profiles for variation in materials and dimensions of island and junction: a aluminum island variation for TJ = 3 nm (Lisland = 19 and 34 nm), b aluminum island variation for TJ = 4 nm (Lisland = 17 and 32 nm), c copper island variation for TJ = 3 nm (Lisland = 19 and 34 nm), d copper island variation for TJ = 4 nm (Lisland = 17 and 32 nm)
296 R. Shah et al.
Fig. 6 Output characteristics of SET using TCAD, showing tunneling, Coulomb blockade and tunneling regions over VDS from −2.0 to 2.0 V
4 Validation
Validation of our work was done by implementing the SET in Cadence with the help of the MIB model. Parameters extracted from the TCAD simulations were used to implement the MIB model [3]. The model setup and various results are shown in Fig. 7a–d.
The clear-cut manifestation of Coulomb blockade, tunneling and Coulomb oscillations indicates successful implementation of the designed device. The effect of gate voltage can easily be seen in both the output and the transfer characteristics. A comparison of our technique with peer work from other groups is presented in Table 4.
Fig. 7 a Electric field approaching the island, b simulated electric field of SET, c and d transfer and output characteristics of SET using MIB

Table 4 Comparison of proposed SET with other peer work with respect to various electrical and physical parameters

Reference       Technique                       Pros                              Cons
Dubuc [4, 18]   BEOL processing                 CMOS BEOL process compatibility,  Requires 3D integration, more background
                                                high current drive                charge due to metallic structure
Joshi [10, 19]  Island material variation       Simple, CMOS fabrication          No room temperature operation
                                                processing
Hajjam [13]     Tunnel barrier optimization     Accurate                          Time consuming, no CMOS technology
                                                                                  compatibility
Proposed work   Island engineering (choosing    Simple, accurate                  Depends on resolution of nanoscale
                proper island material and                                        fabrication processes
                dimension)

5 Conclusion
The proposed SET design is compatible with the BSIM predictive technology model (PTM) and the 22 nm node CMOS model. The device can operate at a bias voltage of 0.8 V and can be used in logic as well as memory applications. The proposed technique is simple and easily realizable compared with the other techniques reported.
References
18. Beaumont A, Dubuc C, Beauvais J, Drouin D (2009) Room temperature single-electron transistor featuring gate-enhanced on-state current. IEEE Electron Device Lett 30:766–768. https://doi.org/10.1109/led.2009.2021493
19. Lee Y-C, Joshi V, Orlov AO, Snider GL (2010) Si single electron transistor fabricated by
chemical mechanical polishing. J Vac Sci Technol B Nanotechnol Microelectron Mater Process
Meas Phenom 28(6). https://doi.org/10.1116/1.3498748