Lung Disease Detection Using X Rays: Under The Mentorship of
Submitted by:
(101611041) RABJOT SINGH
(101611040) PRATEEK GARG
(101603107) GYANDEEP DIGRA
(101783032) PARSHANT JINDAL
This project aims to develop a fully automated system for diagnosis of lung diseases using chest x-rays. The
platform will enable its users and professional diagnostic centres to upload their chest radiographs (x-rays)
and get accurate predictions based on those. Chest radiography has important clinical value in the diagnosis
of diseases. Thus the automatic detection of chest disease based on chest radiography has become a hot topic
in medical imaging research.
This project has two main parts:
1. Backend Server - The server will cater to requests from medical diagnosis labs and from individual users
who do not have access to professional consultation for medical diagnosis. The server should be capable of
generating a report whenever a chest radiograph is uploaded and of providing accurate results.
2. Mobile App - This mobile application will serve as a client to the backend server. It will be the primary
point of interaction with the users. Users can install this application on their devices and use it to view
reports.
Together, these components will act as a report generation tool for chest radiographs that ensures
accurate results.
Apart from the above components, there will be a machine learning algorithm that will learn from the
new x-rays being uploaded, thus continuously improving its accuracy.
DECLARATION
We hereby declare that the design principles and working prototype model of the project entitled ‘LUNG
DISEASE DETECTION USING X RAYS’ are an authentic record of our own work carried out in the
Computer Science and Engineering Department, TIET, Patiala, under the guidance of Dr. Ashutosh Aggarwal
during the 6th semester (2019).
ACKNOWLEDGEMENT
We would like to express our thanks to our mentor Dr. Ashutosh Aggarwal. He has been of great help in our
venture, and an indispensable resource of technical knowledge. He is truly an amazing mentor to have.
We are also thankful to Dr Maninder Singh, Head, Computer Science and Engineering Department, entire
faculty and staff of Computer Science and Engineering Department, and also our friends who devoted their
valuable time and helped us in all possible ways towards successful completion of this project. We thank all
those who have contributed either directly or indirectly towards this project.
Lastly, we would also like to thank our families for their unyielding love and encouragement. They always
wanted the best for us and we admire their determination and sacrifice.
TABLE OF CONTENTS
ABSTRACT
DECLARATION
ACKNOWLEDGEMENT
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
CHAPTER 1- INTRODUCTION
1.1.3 GOAL
1.1.4 SOLUTION
2.2 STANDARDS
2.3.1 Introduction
2.3.1.1 Purpose
5.2.1 DATA
5.3.1.2 Test Strategy
APPENDIX A: REFERENCES
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
ML Machine Learning
CNN Convolutional Neural Network
CHAPTER 1 - INTRODUCTION
A large number of diseases that affect the worldwide population are lung-related. Therefore, research in the
field of Pulmonology has great importance in public health studies and focuses mainly on Infiltration,
Atelectasis, Cardiomegaly, Effusion, Mass, Nodule, Pneumonia, Pneumothorax.
The World Health Organisation (WHO) estimates that there are 300 million people who suffer from asthma,
and that this disease causes around 250 thousand deaths per year worldwide (Campos and Lemos, 2009). In
addition, WHO estimates that 210 million people have chronic obstructive pulmonary disease (COPD). The
disease caused the death of over 300 thousand people in 2005 (GOLD COPD, 2008). Recent studies reveal that
COPD is present in the 20 to 45 year-old age bracket, although it is characterised as an over-50-year-old disease.
Accordingly, WHO estimates that the number of deaths due to COPD will increase 30% by
2015, and that by 2030 COPD will be the third leading cause of death worldwide (World…, 2014).
For the public health system, early and correct diagnosis of any pulmonary disease is essential for
timely treatment and for preventing further deaths. From a clinical standpoint, diagnostic aid tools and
systems are of great importance for the specialist and hence for people’s health.
X-ray images of the lungs capture the ribcage region, where a large number of structures are located, such
as blood vessels, arteries, respiratory vessels, pulmonary pleura and parenchyma, each with its own specific
information. Thus, for pulmonary disease analysis and diagnosis, it is necessary to segment lung structures.
Segmentation is an essential step in imaging systems for accurate lung disease
diagnosis, since it delimits lung structures in X-ray images. Indeed, image processing techniques can help
computer-aided diagnosis if the lung region is accurately obtained.
Following the segmentation process, an automatic procedure is applied to detect possible diseases in lung
X-ray images in order to guide the radiologist's diagnosis. Some studies have yielded promising disease
detection results. Trindade (2009) uses texture descriptors extracted from the gray-level
co-occurrence matrix (GLCM) (Haralick et al., 1973) to describe three disease patterns (nodule, emphysema
and frosted glass) and a normal one. Shimo et al. (2010) also employ GLCM texture descriptors to determine
whether the lungs are healthy. Furthermore, some papers address the detection of specific diseases,
such as nodules (Ayres et al., 2010; Silva and Oliveira, 2010) and emphysema (Felix et al., 2007, 2011).
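To make the GLCM texture descriptors mentioned above more concrete, here is a minimal pure-Python sketch. It is not the implementation used in the cited studies. The function names `glcm` and `texture_features`, the one-pixel horizontal offset, the symmetric counting and the choice of three descriptors (contrast, energy, homogeneity) are all assumptions made for illustration.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalised, symmetric gray-level co-occurrence matrix.

    image  : list of rows holding integer gray levels in [0, levels)
    dx, dy : pixel offset between the two members of each pair (assumed)
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("image is too small for the requested offset")
    return [[value / total for value in row] for row in counts]


def texture_features(p):
    """Three common Haralick-style descriptors of a normalised GLCM."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(p[i][j] * (i - j) ** 2 for i, j in cells),
        "energy": sum(p[i][j] ** 2 for i, j in cells),
        "homogeneity": sum(p[i][j] / (1 + (i - j) ** 2) for i, j in cells),
    }


# Illustrative 4x4 patch with four gray levels (hypothetical data).
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
features = texture_features(glcm(patch, levels=4))
```

In a classifier of the kind described above, a feature vector like `features` would be computed per lung region and passed to a separate classification step.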
1.1 PROJECT OVERVIEW
1. Backend Server - The server will cater to requests from medical diagnosis labs and from individual
users who do not have access to professional consultation for medical diagnosis. The server should be
capable of generating a report whenever a chest radiograph is uploaded and of providing accurate results.
2. Mobile App - This mobile application will serve as a client to the backend server. It will be the
primary point of interaction with the users. Users can install this application on their devices and
use it to view reports.
Together, these components will act as a report generation tool for chest radiographs that
ensures accurate results.
Apart from the above components, there will be a machine learning algorithm that will learn from
the new x-rays being uploaded, thus continuously improving its accuracy.
The lack of trained radiologists makes it very difficult to interpret chest x-rays accurately and to
obtain accurate predictions from them.
1.1.3 GOAL
Accuracy in detection of disease in lungs.
1.1.4 SOLUTION
Predict the disease early so that proper treatment can take place. To that end, we design a fully automated
system for the diagnosis of lung diseases using chest x-rays. The platform will enable its users and
professionals to get accurate predictions based on chest x-rays.
1.2 NEED ANALYSIS
The need of our project is dire and can be defined and explained under the following headings:
1. Advantages of chest X-Rays include their low cost and easy operation. Even in underdeveloped areas,
machines are very affordable. Chest radiographs are widely used in the detection and diagnosis of
lung diseases and contain a large amount of information about a patient’s health. However, correctly
interpreting the information is always a major challenge.
2. Overlapping of the tissue structure and lack of well trained radiologists make it very difficult to
provide accurate interpretations of the chest X-rays.
3. Advancing the NIH Director’s global health initiative by making a significant impact in the
development and application of low cost disease detection technologies in resource-challenged
regions.
4. Developing screening technology for lung diseases, a major global health challenge identified by the
WHO as the second leading cause of mortality from infectious disease. HIV and TB co-infections
result in treatment complications and spread of the disease.
5. Advancing the science in image analysis for automatically detecting pulmonary diseases from digital
CXR images.
6. Instead of going to a medical professional for consultation from the report, the users can easily get
accurate results even in areas where there is no professional help and based on those results they can
get the required medical help.
Until now, research on this topic has mostly been limited to one specific disease at a time. Here we detect
more than one disease in a single pass with one model, which yields multiple detections
with improved accuracy thanks to the availability of a large dataset.
The major problem is that even expert doctors are not always able to identify a patient's condition from the
x-ray alone, because the relevant findings are not clearly visible, so an emerging problem may be overlooked.
The current scope of the project is to improve that visibility by applying image processing techniques, then
compare the image with the available dataset and generate results that can be further verified by experts.
This also saves time and reduces the chance of human error.
S. No.   Assumptions
1        Availability: Clear chest radiographs should be available in digital form (e.g. JPEG).
3        Correct Labelling: The chest X-rays are correctly labelled.
This project exploits the convergence of imaging research and system development at the NLM and NIH
policy objectives in global health. The following are project objectives :
1. Advance the state-of-the-art in automated CXR image analysis. Automatically detect presence of
pulmonary diseases including TB and other relevant disease in digital CXRs, leading to suitable
discrimination for screening, as well as compute a measure of confidence in its determination.
2. Develop deployable screening software such that it can aid field clinical officers in decision making at
the point of care, and for radiologists to organise their workload.
3. Recognising the severity of lung diseases and the shortage of radiological services in western Kenya,
deploy developed software on to a self-powered mobile X-ray truck that AMPATH uses in rural areas.
Their staff take chest x-rays of the population and employ the NLM-developed software to screen for the
presence of lung diseases and other diseases.
1.7 METHODOLOGY USED
The system architecture is designed as a set of cascaded modules, with the flexibility to implement alternate
image analysis pathways followed by late stage decision fusion. As currently developed, every image is
analysed for automatic lung region localisation. Image features are extracted from within the localised lung
boundary, leading to 2-class normal/abnormal decision for the input CXR image. We are also studying
alternate techniques for detecting abnormalities in the CXR without localising the exact lung boundary. The
method also uses edge detection to find spurious contours that could be indicative of disease. Initial results
suggest that the method is fast and quite powerful in detecting certain kinds of pathologies.
Figure 1. An overview of our lung segmentation algorithm: (Stage-I) finding the similar lung CXRs from an atlas; (Stage-II) warping selected
images to patient CXR; and, (Stage-III) lung boundary detection using a graph-cuts optimization approach.
http://ceb.nlm.nih.gov/repos/chestImages.php
Stage 1:
First we use a content-based image retrieval (CBIR) method to identify a small set of similar appearing
CXR images from an expert-annotated set, that we shall call the “atlas set” hereafter. Horizontal and
vertical projection profiles are computed for all CXR images in the atlas set. Then, we measure the
similarity of each projection profile between the atlas set and the patient chest X-ray using the average
Bhattacharyya coefficient.
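The Stage 1 similarity measure can be sketched as follows. This is an illustrative pure-Python version, not the NLM implementation. It assumes that both images have the same size and non-negative intensities, that each profile is normalised to sum to 1 before comparison, and that the "average" is the mean of the row-profile and column-profile coefficients. The function names are hypothetical.

```python
import math


def projection_profiles(image):
    """Return (row sums, column sums) of a 2D intensity image."""
    row_profile = [sum(row) for row in image]
    column_profile = [sum(column) for column in zip(*image)]
    return row_profile, column_profile


def normalise(profile):
    """Scale a profile so that it sums to 1, like a discrete distribution."""
    total = sum(profile)
    if total <= 0:
        raise ValueError("profile must contain positive intensity")
    return [value / total for value in profile]


def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two discrete distributions (1.0 = identical)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def profile_similarity(patient, atlas_image):
    """Average Bhattacharyya coefficient over the two projection profiles."""
    patient_rows, patient_cols = projection_profiles(patient)
    atlas_rows, atlas_cols = projection_profiles(atlas_image)
    rows_score = bhattacharyya(normalise(patient_rows), normalise(atlas_rows))
    cols_score = bhattacharyya(normalise(patient_cols), normalise(atlas_cols))
    return 0.5 * (rows_score + cols_score)


def most_similar(patient, atlas_set, k):
    """Indices of the k atlas images most similar to the patient image."""
    scores = [profile_similarity(patient, atlas) for atlas in atlas_set]
    return sorted(range(len(atlas_set)), key=lambda i: scores[i], reverse=True)[:k]
```

Calling `most_similar(patient, atlas_set, k)` would then return the small set of atlas images that is passed on to the registration stage.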
Stage 2:
In order to create the lung model, we register the selected set of CXR images that have similar appearance
but may have different lung outlines. The transformation mapping is done using the SIFT-flow algorithm.
The algorithm first models the local gradient information of the observed image using scale invariant
feature transforms (SIFT). Next, a minimization algorithm calculates the SIFT-flow, the transformation
mapping between each selected atlas image and the patient image. The mapping is used to register and
warp the selected atlas CXRs, making them geometrically aligned to the patient image. The lung model
for the patient X-ray is then composed as the mean of the warped lung masks from the registered atlas
images. The model is a probabilistic shape prior in which each pixel value is the probability of the pixel
being part of the lung field in the patient image.
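The last step of Stage 2, averaging the warped masks into a probabilistic shape prior, is simple enough to show directly. The sketch below assumes the SIFT-flow registration has already been done and that each warped mask is a binary 2D list of the same size. The function name `lung_shape_prior` and the sample masks are hypothetical.

```python
def lung_shape_prior(warped_masks):
    """Pixel-wise mean of binary lung masks.

    Each output value is the fraction of registered atlas masks that mark
    the pixel as lung, read as the probability of that pixel being lung.
    """
    if not warped_masks:
        raise ValueError("at least one warped mask is required")
    count = len(warped_masks)
    rows, cols = len(warped_masks[0]), len(warped_masks[0][0])
    return [
        [sum(mask[r][c] for mask in warped_masks) / count for c in range(cols)]
        for r in range(rows)
    ]


# Three tiny warped masks (hypothetical data): 1 = lung, 0 = background.
masks = [
    [[0, 1, 1], [0, 1, 0]],
    [[0, 1, 0], [0, 1, 0]],
    [[1, 1, 0], [0, 1, 0]],
]
prior = lung_shape_prior(masks)
```

Pixels on which all atlas masks agree get probability 0 or 1. Pixels near the lung boundary, where the masks disagree, get intermediate values.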
Stage 3:
As a refining stage, we perform image segmentation using the graph cut algorithm and model the
segmentation process with an objective function. The max-flow/min-cut algorithm minimizes the
objective function to find a global minimum that corresponds to foreground (within-lung) and background
(outside-lung) labeling of the pixels.
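The text does not give the objective function itself. As a hedged illustration, binary graph-cut segmentation usually minimizes an energy of the following general form. The symbols, the weight \lambda and the link between the data term and the Stage 2 shape prior are assumptions, not details taken from the method described here.

```latex
% f_p \in \{0, 1\} labels pixel p as background (0) or lung (1).
% \mathcal{P} is the set of pixels, \mathcal{N} the set of neighbouring pixel pairs.
E(f) \;=\; \sum_{p \in \mathcal{P}} D_p(f_p)
      \;+\; \lambda \sum_{(p,q) \in \mathcal{N}} V_{pq}(f_p, f_q)
% D_p    : data term, the cost of giving label f_p to pixel p
%          (for example, derived from intensity and from the shape prior).
% V_{pq} : smoothness term, penalising different labels on neighbouring
%          pixels, typically less so across strong intensity edges.
```

When the pairwise term satisfies the usual submodularity condition, the labeling that minimizes E(f) corresponds to a minimum cut of a graph built from these terms, which is why a max-flow/min-cut solver returns a global minimum.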
Evaluation:
A radiologist manually generated gold standard segmentations for the atlas chest X-ray images. The process
was aided by an interactive boundary marking tool [39], developed in prior NLM research and reported to
the Board. The radiologist then corrected these outlines using FireFly [40], a web-based labeling tool
developed at the University of Missouri. The method was evaluated on three datasets (JSRT, Montgomery,
and India), described previously in Section 3. We also compared the system performance with
systems in the literature. The Jaccard index (which measures overlap agreement) averaged
95.4% on the public JSRT database, which bests all prior published results. Similar
results of 94.1% and 91.7% on the Montgomery and India datasets, respectively, demonstrate the method's
robustness to image variety.
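For reference, the Jaccard index used in this evaluation is the size of the intersection of the predicted and gold standard lung masks divided by the size of their union. A minimal sketch follows, assuming both masks are binary 2D lists of equal size. The function name and the sample masks are illustrative.

```python
def jaccard_index(predicted, reference):
    """Overlap agreement |A ∩ B| / |A ∪ B| between two binary masks."""
    intersection = 0
    union = 0
    for predicted_row, reference_row in zip(predicted, reference):
        for p, r in zip(predicted_row, reference_row):
            if p and r:
                intersection += 1
            if p or r:
                union += 1
    # Two empty masks agree perfectly by convention (an assumption).
    return intersection / union if union else 1.0


predicted_mask = [[0, 1, 1], [0, 1, 1]]
gold_standard = [[0, 1, 1], [0, 0, 1]]
score = jaccard_index(predicted_mask, gold_standard)  # 3 shared / 4 total = 0.75
```

A dataset-level figure like those reported above would be the mean of this score over all test images.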
● A well trained, accurate model for prediction of lung diseases using chest X-rays.
● Aid the professionals in early and speedy classification of X-rays.
● Providing accurate results where trained medical professionals are not available.
Mobile App - This mobile application will be the primary point of interaction with the users. Users can
install this application on their devices and use it to view reports.
The model we are designing will generate results for Infiltration, Atelectasis, Cardiomegaly, Effusion,
Mass, Nodule, Pneumonia and Pneumothorax with good accuracy, whereas existing models focus
on only one of these diseases. The app will generate a report that is directly
available to the user, so there are no long waiting queues.
CHAPTER 2: REQUIREMENT ANALYSIS
Chest X-rays produce images of your heart, lungs, blood vessels, airways, and the bones of your chest
and spine. Chest X-rays can also reveal fluid in or around your lungs or air surrounding a lung. As the
most common examination tool in medical practice, chest radiography has important clinical value in
the diagnosis of disease. Thus, the automatic detection of chest disease based on chest radiography has
become one of the hot topics in medical imaging research. Our project focuses on computer-aided
detection (CAD) systems technology applied in chest radiography.
If you go to your doctor or the emergency room with chest pain, a chest injury or shortness of breath,
you will typically get a chest X-ray. The image helps your doctor determine whether you have heart
problems, a collapsed lung, pneumonia, broken ribs, emphysema, cancer or any of several other
conditions.
The chest X-ray is a common way to diagnose disease. But it can also be used to tell whether a certain
treatment is working. Some people have a series of chest X-rays done over time, to track whether a
health problem is getting better or worse.
The Dubai Health Authority (DHA) on April 17, 2018 announced the preliminary results of a chest X-ray
artificial intelligence (AI) algorithm deployed across DHA medical fitness centers (MFCs). The
collaboration is the first validation of Agfa Healthcare’s Augmented Intelligence (AI) in the United
Arab Emirates (UAE). The partners began reviewing the use of artificial intelligence-enabled
workflows in radiology with Agfa more than two years ago. Upon completion of phase one of onsite
validation in early January 2018, and on analysis of preliminary data, the algorithm was able to correctly
identify lung diseases in chest X-rays approximately 90 percent of the time. Phase two results in
March 2018 showed sensitivity further improved to 95 percent.
The paper presents several common chest X-ray datasets and briefly introduces general image
preprocessing procedures, such as contrast enhancement and segmentation, and bone suppression
techniques that are applied to chest radiography. Then, the CAD system in the detection of specific
disease (pulmonary nodules, tuberculosis, and interstitial lung diseases) and multiple diseases is
described, focusing on the basic principles of the algorithm, the data used in the study, the evaluation
measures, and the results. Finally, the paper summarizes the CAD system in chest radiography based
on artificial intelligence and discusses the existing problems and trends.
Another study experiments with a set of deep learning methods for the multi-label classification of the
ChestX-ray14 dataset and reports results comparable to the state of the art. The authors compare
cross-entropy and pairwise error loss for multi-label classification of the dataset. Further,
they implement a cascade network that improves upon the performance of deep learning models while
modeling label dependencies. In summary, that work provides optimistic results for the
automatic diagnosis of thoracic diseases. However, future work related to disease localization and
improvement of classification performance is in progress.
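To illustrate what multi-label classification with a cross-entropy loss means in this setting, here is a small sketch using only the Python standard library. It is not the surveyed authors' code. It assumes the network emits one raw score (logit) per disease, uses the eight disease labels targeted by this project rather than all fourteen ChestX-ray14 labels, and picks a 0.5 decision threshold arbitrarily.

```python
import math

# The eight findings this project targets; each can be present or absent independently.
DISEASES = [
    "Infiltration", "Atelectasis", "Cardiomegaly", "Effusion",
    "Mass", "Nodule", "Pneumonia", "Pneumothorax",
]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def multilabel_cross_entropy(logits, targets, eps=1e-12):
    """Mean binary cross-entropy over independent per-disease outputs."""
    loss = 0.0
    for z, t in zip(logits, targets):
        p = min(max(sigmoid(z), eps), 1.0 - eps)  # clamp to avoid log(0)
        loss -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return loss / len(logits)


def predicted_labels(logits, threshold=0.5):
    """Names of all diseases whose predicted probability reaches the threshold."""
    return [name for name, z in zip(DISEASES, logits) if sigmoid(z) >= threshold]
```

Unlike a softmax classifier, each output is thresholded on its own, so a single radiograph can be reported with several findings at once, or with none.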
They discussed several state-of-the-art models and novel approaches for detecting, classifying, and
analysing various abnormalities involving the chest. The biggest impediment to achieving superhuman
level performance seems to come from the lack of large, high-quality datasets. However, the future
looks bright — with larger, better-annotated datasets and innovative models targeted towards working
with medical images, it is plausible that deep learning will bring phenomenal improvement to the
efficiency of radiologists’ workflow and quality of radiological diagnoses worldwide.
2.1.4 The Problem That Has Been Identified
A CAD system usually has four steps: preprocessing, extracting regions of interest (ROIs),
extracting ROI features, and classifying disease according to those features. In
preprocessing and ROI extraction, enhancement and segmentation techniques are very
important, and there are many ways to highlight lesions and suppress noise. For segmentation,
deformable models and deep learning methods perform best, while rule-based methods
perform poorly and are often used together with other methods to improve segmentation
performance. Bone suppression techniques are used less frequently in the literature, but
removing the ribs and clavicles that obscure lung abnormalities can improve system performance. In
terms of feature extraction, the features extracted by traditional machine learning algorithms include
geometric features, texture features, and shape features, which are usually processed to reduce
dimensionality because of feature redundancy. However, hand-crafted features can introduce errors that
affect classification performance, and they are gradually being replaced by deep learning methods. In terms of
classifier selection, support vector machines and random forests may perform better among traditional
algorithms, but given the excellent performance of deep learning in image classification,
deep learning methods have gradually become the mainstream.
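As one concrete example of the enhancement step in preprocessing, the sketch below applies global histogram equalization to a grayscale image. The text does not say which enhancement method the project uses, so this technique is swapped in purely for illustration. The function name is hypothetical, and the image is assumed to be a non-empty 2D list of integers in [0, levels).

```python
def equalize_histogram(image, levels=256):
    """Spread the gray levels of an image over the full intensity range."""
    # Histogram of gray levels.
    histogram = [0] * levels
    for row in image:
        for value in row:
            histogram[value] += 1

    # Cumulative distribution of gray levels.
    cdf, running = [], 0
    for count in histogram:
        running += count
        cdf.append(running)

    total = cdf[-1]
    cdf_min = min(c for c in cdf if c > 0)
    if total == cdf_min:
        # Single gray level: nothing to stretch, return a copy.
        return [list(row) for row in image]

    # Look-up table mapping each old gray level to its equalized value.
    scale = (levels - 1) / (total - cdf_min)
    lookup = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lookup[value] for value in row] for row in image]


# A low-contrast 2x2 patch (hypothetical data) is stretched to the full range.
enhanced = equalize_histogram([[52, 55], [61, 59]])
```

In a full pipeline the enhanced image would then go through ROI extraction and feature extraction, or directly into a deep learning model.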
Following are the tools that have been surveyed for both software and hardware components:
Hardware:
1. Servers to host the application.
Software:
1. Python
2. TensorFlow/Keras
3. Google Colab
4. Kaggle
5. Flask
2.3 SOFTWARE REQUIREMENTS SPECIFICATION
2.3.1 Introduction
2.3.1.1 Purpose:
“One hospital in Boston has 126 radiologists. Liberia has two.”
Frankly, even if these two radiologists have the speed of the Flash, the mental faculties of
Einstein, and no need for “amenities” like sleep and a social life, the burden of chest diseases
would prove too much to bear. Around 18 people die from lung cancer per hour in the United
States alone, and that number would be significantly higher were it not for the routine screening
of patients and early detection of nodules. Deep learning may help automatically discover chest
diseases at the level of experts, providing the two Liberian radiologists with some respite and
potentially saving countless lives worldwide.
The intended audience for this product is doctors, as it will assist them in classifying, sorting and
pre-processing X-rays so that they can focus on the cases which require their
attention and screen out the normal ones. Once we have a stable and accurate model that can
correctly predict results independently, without requiring any human intervention, it can be
used to screen X-rays autonomously and generate automated templates, giving much faster
results and reports.
● Workflow automation for radiologists - Ability to focus on suspected chest X-rays faster
instead of manual searching.
● Improved turn-around times - Expand the scope of the chest X-ray screening program to add
more volume and capacity.
The product has wide applications in detecting various diseases and plays a vital role
as a second opinion for medical experts. In addition, CAD algorithms reduce the workload
of medical experts by reviewing many CXRs quickly.
● Improved results through faster screening and the ability to handle large volumes.
This section describes the way users interact with the system.
The system can run on any desktop with a web browser and an internet connection.
2.3.4 Other Non-functional Requirements
This section will deal with some non-functional requirements of our project.
Availability
Response Time
The product should be able to reply to a query within a given amount of time.
Processing speed
The processing speed of the app should be high, as it needs to process large volumes of data for optimal
functionality.
Users will have a unique ID and password which they can use to log in to the app.
Users' personal data will not be shared with any other company or person.
Item                               Price
1. Hosting the WebApp on Cloud     Rs. 3000
Total Cost                         Rs. 3000
2.5 RISK ANALYSIS
Project Risk
If any member gets sick or is not able to do his part of work for some
will increase.
Product Risk
If the model does not give accurate results, it may produce an incorrect diagnosis, which can have fatal
consequences.
CHAPTER 3 – METHODOLOGY ADOPTED
3.4 TOOLS AND TECHNOLOGIES USED
CHAPTER 4 - DESIGN SPECIFICATION
4.2 DESIGN LEVEL DIAGRAMS
USE CASE DIAGRAM
Actor: User, Medical Diagnostic Lab
Preconditions: User wants a report based on their chest X-ray.
Postconditions:
● Success end condition
1. Report will be generated based on the file uploaded.
2. User will be alerted in case of abnormalities.
● Failure end condition
1. Report not generated.
2. Inaccurate report generated.
Normal Scenario:
1. User will log into the application.
2. User will upload the file.
3. Pre-processed image will be fed to the neural network.
4. The neural network makes predictions and a report is generated based on the predictions.
Alternative Flow:
● First
1. System determines user is logged on.
2. Return to normal scenario step 2.
● Second
1. User logs out.
2. Return to normal scenario step 1.
● Third
1. User does not have an account.
2. User creates an account.
3. System confirms account creation.
4. Return to normal scenario step 1.
Extensions: If there is an abnormality in the report, the user will get a recommendation of doctors.
Special Requirements:
● Performance
1. The device shall display the report within 5 minutes.
● User Interface
1. The application shall display all outputs in the English language.
2. User-friendly interface.
ACTIVITY DIAGRAM
CLASS DIAGRAM
4.3 USER INTERFACE DIAGRAMS
Figure 10
Figure 11
4.4 SNAPSHOTS OF WORKING PROTOTYPE MODEL
Figure 12
CHAPTER 5: CONCLUSIONS AND FUTURE DIRECTIONS
5.2 CONCLUSION
In a developing country like India, it is difficult for the government to maintain regular surveillance of
road conditions, and small potholes therefore sometimes result in major accidents, occasionally costing
lives. That is where our project comes in: with it we can mark not only the potholes
present on the road but also the road conditions, protecting the user from major accidents and
alerting the government about these potholes, so that quick action can be taken to
repair them. We look forward to further improving and expanding our project in the coming period.
5.4 Future Work Plan