Internship Report 1128


HEALTHBOT

AN INTERNSHIP REPORT

Submitted by

M Z MUHAMMAD JAABIR
(RRN: 190071601128)

in partial fulfillment for the award of the degree of

BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING

DECEMBER 2022
BONAFIDE CERTIFICATE

Certified that this project report “HEALTHBOT” is the bonafide work of “M Z
MUHAMMAD JAABIR (RRN: 190071601128)” who carried out the project
work under my supervision. Certified further, that to the best of our knowledge
the work reported herein does not form part of any other project report or
dissertation on the basis of which a degree or award was conferred on an
earlier occasion on this or any other candidate.

SIGNATURE
Dr. S.P. VALLI
Assistant Professor (Sel. Gr) /
CSE
Department of Computer Science
and Engineering
B.S. Abdur Rahman Crescent
Institute of Science and
Technology,
Vandalur, Chennai – 600 048

Internship certificate

VIVA VOCE EXAMINATION

The viva voce examination of the project work titled “HEALTHBOT”,
submitted by M Z MUHAMMAD JAABIR (RRN: 190071601128) is held on
……………………………….

INTERNAL EXAMINER

ACKNOWLEDGEMENT

I sincerely express my heartfelt gratitude to Dr. A. PEER MOHAMED, Vice
Chancellor, B.S. Abdur Rahman Crescent Institute of Science and
Technology, for providing us an environment to carry out our course
successfully.

I sincerely thank Dr. N. RAJA HUSSAIN, Registrar, for furnishing every
essential facility for doing our project.

I thank Dr. SHARMILA SANKAR, Dean, School of Computer, Information
and Mathematical Sciences, for her motivation and support.

I thank Dr. E. SYED MOHAMED, Professor and Head, Department of
Computer Science and Engineering, for providing strong oversight of vision,
strategic direction and valuable suggestions.

I am obliged to our class advisor Dr. S.P. VALLI, Assistant Professor (Sel. Gr),
Department of Computer Science and Engineering, for her professional
guidance and encouragement throughout the internship period.

I thank Mrs. SRI KAMANI, Technical Manager and Mentor, Object
Automation Software Solutions Pvt. Ltd., for her guidance, keen interest,
expert counselling and uplifting ideas throughout the project; without her
guidance this project would not have been a successful one. I also learned
communication techniques while working on this project.

I thank all the Faculty members and the System Staff of the Department of
Computer Science and Engineering for their valuable support and assistance
at various stages of project development.

M Z MUHAMMAD JAABIR

TABLE OF CONTENTS

CHAPTER NO. TITLE

LIST OF FIGURES
1 Company Overview
1.1 Overview
1.2 Website
1.3 Logo
2 Role and Responsibility
2.1 Role
2.2 Responsibility
3 Project Overview
3.1 Overview
3.2 Scope and Considerations
4 Theoretical Analysis
4.1 Technology Stack
5 Steps to Build the Application
5.1 Research
5.2 Key Features
5.3 UI/UX Designing
6 Source Code
6.1 Result
7 References
8 Conclusion

LIST OF FIGURES

FIGURE NO. TITLE

1 Company Logo
2 Pipeline Flowchart
3 Text Cleaning Process Flow
4 UI Design
5 Landing Page
6 Using the Yes or No buttons
7 Adding additional symptoms
8 Final list of top predictions

CHAPTER 1
COMPANY OVERVIEW

1.1 OVERVIEW
OBJECT AUTOMATION SYSTEM SOLUTIONS PRIVATE LIMITED is a
private company incorporated on 12 January 2018. It is classified as a
non-government company and is registered with the Registrar of Companies,
Chennai. Its authorized share capital is Rs. 1,000,000 and its paid-up capital
is Rs. 100,000.
The company is managed by two able and efficient directors:
• Bragathalakshmi Bharani
• Sandhya Palani

Object Automation System Solutions Pvt. Ltd. is a technology solutions
company specializing in designing and developing Artificial Intelligence based
hardware and software solutions to solve business challenges.
The company creates cutting-edge business solutions using data and AI and
provides training in them to enhance client experiences. Utilizing a platform
of cloud and on-premises infrastructure, it develops solutions that are
productive for end users and their partners.
It has partnerships with industry leaders such as IBM, Microsoft, DELL,
HPE and OpenPower, and with many other enterprises and organizations,
including collaborations with universities for research and development.

1.2 WEBSITE
Object-Automation (https://object-automation.com/)

1.3 LOGO

Fig 1: Company Logo

CHAPTER 2
ROLE AND RESPONSIBILITY
2.1 ROLE
My role at the organization was “AI and Data Science Intern”. I worked
on a project called “HEALTHBOT” for about 7 months, from May to
November.

2.2 RESPONSIBILITY
During the internship period I gained knowledge of various platforms and
programming languages. The target was to deliver a working
model of the chatbot system using tools and frameworks such as
Neo4j, spaCy and AIML. To meet that objective, the internship
required extensive preliminary knowledge of core Python and
intermediate knowledge of machine learning, deep learning and NLP
before analysing the actual requirements of the system. The study was
required not only to understand the subject but also to find solutions to
the existing problems. Implementing the findings from the study was
another, bigger challenge.
The study of chatbots drew on various sources such as
research papers, blogs and videos. Other major activities carried out during
the internship were an extensive study of the current online platform,
presentations of study analysis and practical implementations, and most
importantly the team discussions to analyse customer change requests.
The regular meetings and discussions with mentors, the project manager
and team members helped me to widen my knowledge of the
existing system and the problem background. Machine learning model
development, which comes under software development, is one of the major
services of Object Automation Software Solutions Pvt. Ltd.
Overall, my responsibility was primarily focused on:
• Knowledge graph implementation
• Chatbot model development and its UI
• Data cleaning

CHAPTER 3
PROJECT OVERVIEW

3.1 OVERVIEW
Our team’s project “HEALTHBOT” involved creating a type-II
chatbot that identifies a patient’s disease using a knowledge graph as
its database. A type-II chatbot is a rule-based chatbot built on predefined
conversational paths, in which users get predefined question and
answer options. If a user asks a question outside these predefined
paths, the bot cannot answer it.
This is why it is important to design a rule-based chatbot in a versatile way.
Different media formats such as images, GIFs, audio and video can help if
necessary. A personal approach (which requires asking for the user’s name
first) and different response options also improve the experience. Rule-based
bots are quick and inexpensive to design and implement. However, they
require user cooperation in the input to help answer the questions.
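
A predefined conversational path of this kind can be sketched as a small lookup table. The snippet below is only an illustration under stated assumptions: the node names, prompts and the `next_state` helper are hypothetical and are not taken from the HEALTHBOT code.

```python
# Hypothetical conversation tree: each node holds a prompt and the allowed answers.
PATHS = {
    "start": {"prompt": "Do you have a fever?",
              "options": {"yes": "fever", "no": "no_fever"}},
    "fever": {"prompt": "Do you also have a cough?",
              "options": {"yes": "end", "no": "end"}},
    "no_fever": {"prompt": "Do you have a headache?",
                 "options": {"yes": "end", "no": "end"}},
    "end": {"prompt": "Thank you. Your answers have been recorded.",
            "options": {}},
}


def next_state(state: str, answer: str) -> str:
    """Follow a predefined answer; stay on the same node for anything else."""
    options = PATHS[state]["options"]
    return options.get(answer.strip().lower(), state)
```

An answer outside the predefined options leaves the bot on the same question, which mirrors the limitation described above.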
The chatbot that we developed primarily uses Natural
Language Processing (NLP) and a graph-based Database Management
System (DBMS) called a knowledge graph. In short, our chatbot takes the
user’s symptoms as input and diagnoses the user with a disease by
traversing the knowledge graph.
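
As a rough illustration of diagnosing by traversal, the sketch below models the knowledge graph as an in-memory dictionary and narrows the candidate diseases one symptom at a time. The sample symptom-to-disease edges and the `shortlist_diseases` function are assumptions made for this example; the project itself stores its graph in Neo4j.

```python
# Hypothetical symptom -> disease edges standing in for the Neo4j knowledge graph.
KNOWLEDGE_GRAPH = {
    "fever": {"influenza", "malaria", "common cold"},
    "cough": {"influenza", "common cold", "bronchitis"},
    "chills": {"influenza", "malaria"},
}


def shortlist_diseases(symptoms):
    """Intersect the disease sets linked to each known symptom, in order."""
    candidates = None
    for symptom in symptoms:
        linked = KNOWLEDGE_GRAPH.get(symptom)
        if linked is None:
            continue  # symptom not present in the graph: ignore it
        candidates = set(linked) if candidates is None else candidates & linked
    return sorted(candidates) if candidates else []
```

Each additional symptom can only shrink the shortlist, so a more detailed description from the user leads to fewer candidate diseases.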

3.2 SCOPE AND CONSIDERATIONS


The initial scope of the project was to develop a basic chat interface
that helps patients diagnose their diseases based on the symptoms they
are experiencing. We had to collect data and clean it so that it could be
stored in a knowledge graph. We chose the knowledge graph
because it is best suited to our project’s needs.
The project was initially Command Line Interface (CLI) based and was
later changed to a web-based application. The back-end
connects to the knowledge graph hosted on AuraDB, Neo4j’s graph
database service. The front-end is a webpage hosted locally
using Flask. With the help of Flask and some custom code, we were able to
integrate the back-end and the front-end.
Future considerations for our project include creating a mobile app
interface with an improved User Interface/User Experience and improved
Natural Language Processing models for identifying the
relationship between the user’s input and the data in the knowledge graph.
This way, we would be able to improve the interaction between the chatbot
and the user. Our final goal is to make sure that our users require less
interaction.

CHAPTER 4
THEORETICAL ANALYSIS

4.1 TECHNOLOGY STACK

• Flask
Flask is a micro web framework written in Python. It is classified
as a microframework because it does not require particular tools
or libraries. It has a small, easy-to-extend core and does not
include an ORM (Object-Relational Mapper) or similar features,
but it does provide features such as URL routing and a template
engine.

• Neo4j
Neo4j is a popular open-source graph database developed in
Java. It is highly scalable and schema-free (NoSQL).

• spaCy
spaCy is a free, open-source library for Natural Language
Processing in Python. It features named entity recognition (NER),
part-of-speech (POS) tagging, dependency parsing, word vectors
and more. It is written in Python and Cython.

• scispaCy
scispaCy is a Python package containing spaCy models for
processing biomedical, scientific or clinical text. We use the
“en_core_sci_lg” model as it is a full spaCy pipeline for
biomedical data with a larger vocabulary and 600k word vectors.

• Custom-made SpellChecker
This module was custom-made by our team to spell check the
user’s input. It depends on the spaCy and NLTK modules to
clean the text and to find the misspelled words, respectively.

• Other libraries/modules, including:

o AIML
AIML stands for Artificial Intelligence Markup Language.
AIML is an XML-based markup language meant for
creating artificial intelligence applications. AIML makes it
possible to create human interfaces while keeping the
implementation simple to program, easy to understand and
highly maintainable.
o pandas
pandas is a software library written for the Python
programming language for data manipulation and analysis.
In particular, it offers data structures and operations for
manipulating numerical tables and time series. It is widely
used in Data Science and Machine Learning projects.
o NumPy
Another library popular among Data Scientists is NumPy.
NumPy is a library for the Python programming language,
adding support for large, multi-dimensional arrays and
matrices, along with a large collection of high-level
mathematical functions to operate on these arrays.
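
To make the AIML entry above more concrete, the sketch below shows what a single AIML category (a pattern with its response template) looks like and reads it with Python's standard XML parser. The greeting pattern, the reply text and the `load_categories` helper are hypothetical, and a real AIML interpreter does considerably more than this lookup.

```python
import xml.etree.ElementTree as ET

# Hypothetical AIML fragment: one category maps an input pattern to a reply.
AIML_SNIPPET = """
<aiml version="1.0.1">
  <category>
    <pattern>HELLO</pattern>
    <template>Hello! Which symptoms are you experiencing?</template>
  </category>
</aiml>
"""


def load_categories(aiml_text: str) -> dict:
    """Return a {pattern: template} mapping for every category in the text."""
    root = ET.fromstring(aiml_text)
    return {
        category.findtext("pattern").strip(): category.findtext("template").strip()
        for category in root.iter("category")
    }


categories = load_categories(AIML_SNIPPET)
```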

CHAPTER 5
STEPS TO BUILD THE APPLICATION

5.1 RESEARCH
Initially, we looked at various ways to implement the base
idea of the project using different technologies. After delving into the research,
we identified the modules and applications required for our project. Once that
was done, we created a pipeline for building the initial, basic version of the
chatbot.

Fig 2: Pipeline Flowchart

5.2 KEY FEATURES

• Clean text

Fig 3: Text Cleaning Process Flow

• Check spelling
The SpellChecker requires a corpus, from which it computes
the word frequency of every distinct word in the vocabulary. It
first tokenizes the input text and then iterates through the tokens
one by one to check their spelling against the given corpus. For a
misspelled word, it computes the minimum edit distance to each
word in the vocabulary list using dynamic programming, which
makes the computation efficient. Finally, based on word
probability, the misspelled word is replaced by the candidate that
has the highest word frequency in the given corpus.
• Detect symptoms
We send the clean text to a pre-trained Named Entity Recognition
(NER) model based on scispaCy, which detects the symptoms
and returns them in a list.
• Disease detection by traversing the knowledge graph
We iterate through the list of symptoms one by one. Each time,
the current symptom is combined with the previous symptoms to
query a subgraph, which shortlists the detected diseases.
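
The spelling step above can be sketched in a few lines. This is a simplified stand-in rather than the project's SpellChecker: it assumes a substitution cost of 1, uses a small made-up corpus, and picks the closest word by edit distance with ties broken by corpus frequency. The function names are chosen for this example only.

```python
from collections import Counter


def min_edit_distance(source: str, target: str) -> int:
    """Levenshtein distance computed with a dynamic-programming table."""
    rows, cols = len(source), len(target)
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows + 1):
        table[i][0] = i  # delete every character of source
    for j in range(cols + 1):
        table[0][j] = j  # insert every character of target
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            substitution = table[i - 1][j - 1] + (source[i - 1] != target[j - 1])
            table[i][j] = min(substitution, table[i - 1][j] + 1, table[i][j - 1] + 1)
    return table[rows][cols]


def correct(word: str, frequencies: Counter) -> str:
    """Return the word itself if known, else the closest and most frequent word."""
    if word in frequencies or not frequencies:
        return word
    distances = {vocab: min_edit_distance(word, vocab) for vocab in frequencies}
    best = min(distances.values())
    closest = [vocab for vocab, d in distances.items() if d == best]
    return max(closest, key=lambda vocab: frequencies[vocab])


# Made-up corpus used only to obtain word frequencies for the example.
corpus = "fever cough fever headache cough fever nausea".split()
frequencies = Counter(corpus)
```

With this corpus, `correct("fevr", frequencies)` resolves to `"fever"`, which is one edit away from the misspelled input.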

5.3 UI/UX DESIGNING


The base code and the server are run by Flask. Additional HTML and
CSS code, along with some Materialize styles, are used to style our
webpage.
Materialize is a CSS framework based on Material Design, which is
Google's open-source design system.
As the application is based on Flask, a base HTML template is created where
the initial title, pre-made CSS links and scripts are defined. The index
HTML template then contains the code that displays the main content of our
application. Here, we use a div to create a card and multiple divs inside it to
create the header for the card.

Fig 4: UI Design
Fig. 4 shows the User Interface of the application.
The application also uses buttons with scripts linked to them, so that
they perform an action when clicked. For example, the Submit
button sends the user input to the back-end.

CHAPTER 6
SOURCE CODE
Front-End
Base.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Baymax - Healthbot</title>
<!-- Compiled and minified CSS -->
<link
rel="stylesheet"
href="https://cdnjs.cloudflare.com/ajax/libs/materialize/1.0.0/css/materialize.min.css"
/>
<link
rel="stylesheet"
href="/static/css/normalize.css"
/>
<link rel="stylesheet" href="{{ url_for('static', filename='css/index.css') }}" />
<link href="https://fonts.googleapis.com/icon?family=Material+Icons" rel="stylesheet" />
<!-- Compiled and minified JavaScript -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/materialize/1.0.0/js/materialize.min.js"></script>
</head>
<body>
{% block content %} {% endblock %}
<!-- <script>

window.history.pushState({"html":response.html,"pageTitle":response.pageTitle},"", "");
</script> -->
</body>
</html>

Index.html
{% extends 'base.html' %} {% block content %}

<div class="container">
<div class="row">
<div class="col s10 m12">
<div class="card">
<div class="row">
<div class="chatbox-header">
<h4>HealthBot</h4>
</div>
</div>
<div class="card-content ct_box" id="cb">
<div class="row">
{% for i in history %}
<div class="row">
<h6 class="{{ i.class_ }}">{{ i.message }}</h6>
</div>
{% if i.end_convo %}
<div class="row">
<h6 class="{{ i.class_ }}">
You probably have any of the following disease(s)
</h6>
</div>
{% for d in i.diseases %}
<ul class="collection">
<div class="row">
<li class="collection-item col s6">{{d[0]}}</li>
</div>
</ul>
{% endfor %} {% endif %} {% endfor %}
</div>
</div>
<div class="card-action">
<form class="form" method="post" id="form">
<div class="row">
<div class="input-field col s12">
{% if history[-1]['end_convo'] %}
<div class="row">
<a
class="waves-effect waves-light btn"
id="end_convo"
onclick="document.getElementById('reset').value='reset';document.getElementById('form').submit();"
>RESET CONVERSATION</a
>
<input type="text" name="user_input" id="reset" value=""
style="display: none;" />
</div>
{% elif not history[-1]['yes_no'] %}
<textarea
id="user_input"
class=""
data-length="120"
name="user_input"
></textarea>
<div class="input-field col s12">
<button
class="btn waves-effect waves-light col s12"
type="submit"
name="action"
id="submit_btn"
>
Submit
<i class="material-icons right">send</i>
</button>
</div>
{% else %}
<div class="row">
<a
class="waves-effect waves-light btn"
id="yes"
onclick="document.getElementById('yes_or_no').value='yes';document.getElementById('form').submit();"
>YES</a
>
<a
class="waves-effect waves-light btn"
id="no"
onclick="document.getElementById('yes_or_no').value='no';document.getElementById('form').submit();"
>NO</a
>
<input type="text" name="user_input" id="yes_or_no" value="no" />
</div>
{% endif %}
</div>
</div>
</form>
</div>
</div>
</div>
</div>
</div>

<script>
// Helpers that set the answer field; the YES/NO links above set it inline.
function y_btn() {
  const text_ = document.getElementById("yes_or_no");
  text_.value = "yes";
}
function n_btn() {
  const text_ = document.getElementById("yes_or_no");
  text_.value = "no";
}

const input = document.getElementById("user_input");

// The textarea is only rendered while free-text input is expected,
// so skip the key handler when it is absent.
if (input) {
  // Execute a function when the user presses a key on the keyboard
  input.addEventListener("keypress", function (event) {
    // If the user presses the "Enter" key on the keyboard
    if (event.key === "Enter") {
      // Cancel the default action, if needed
      event.preventDefault();
      // Trigger the button element with a click
      document.getElementById("submit_btn").click();
    }
  });
}

// Scroll the chat box down to the latest message
const chatHistory = document.getElementById("cb");
chatHistory.scrollTop = chatHistory.scrollHeight;
</script>

{% endblock content %}

Spellchecker.py
from nltk.corpus import words
import spacy
from collections import Counter
from time import time
from string import punctuation
import re
import json

class Dict:
    """Thin wrapper around the NLTK English word list."""

    def __init__(self):
        self.dictionary = set(words.words())

    def find(self, word : str) -> bool:
        return word in self.dictionary

    def add(self, word : str) -> None:
        self.dictionary.add(word)

    def delete(self, word : str) -> bool:
        if self.find(word):
            self.dictionary.remove(word)
            return True
        return False

class BuildVocab(Dict):
    """Generates dictionary words that are one edit away from a given word."""

    def __init__(self):
        super().__init__()
        self.alphabets = [chr(i) for i in range(97, 97+26)]  # 'a' to 'z'

    def get_vocab(self, word : str) -> set:
        word = word.lower()
        n = len(word)
        vocabs = (self.insert(word, n) + self.remove(word, n)
                  + self.switch(word, n) + self.replace(word, n))
        return set(vocabs)

    def insert(self, word : str, n : int) -> list:
        # Try every letter at positions 1..n (position 0 is not tried)
        vocabs = []
        for i in range(1, n + 1):
            lword = list(word)
            lword.insert(i, '')
            for a in self.alphabets:
                lword[i] = a
                nword = ''.join(lword)
                if self.find(nword) and nword != word:
                    vocabs.append(nword)

        return vocabs

    def remove(self, word : str, n : int) -> list:
        vocabs = []
        for i in range(n):
            lword = list(word)
            # Delete by index; list.remove() would drop the first matching letter instead
            del lword[i]
            nword = ''.join(lword)
            if self.find(nword) and nword != word:
                vocabs.append(nword)

        return vocabs

    def switch(self, word : str, n : int) -> list:
        # Swap each pair of adjacent letters, then swap them back
        vocabs = []
        lword = list(word)
        for i in range(n - 1):
            lword[i], lword[i + 1] = lword[i + 1], lword[i]
            nword = ''.join(lword)
            if self.find(nword) and nword != word:
                vocabs.append(nword)
            lword[i], lword[i + 1] = lword[i + 1], lword[i]

        return vocabs

    def replace(self, word : str, n : int) -> list:
        # Replace each letter in turn with every letter of the alphabet
        vocabs = []
        for i in range(n):
            lword = list(word)
            for a in self.alphabets:
                lword[i] = a
                nword = ''.join(lword)
                if self.find(nword) and nword != word:
                    vocabs.append(nword)

        return vocabs

    def delete_if_not_found(self, vocabs : list) -> list:
        # Keep only the words that exist in the dictionary
        corpus = []
        for word in vocabs:
            if self.find(word):
                corpus.append(word)

        return corpus

    def add_to_dictionary(self, words : list) -> None:
        for w in words:
            self.add(w)
        print(f'Added {len(words)} new words to the dictionary ...')

class SpellChecker:
    def __init__(self, update_word_probs : bool = False, corpus : list = None,
                 ignore_punct : bool = True, verbose : bool = False):

        st = time()

        self.ignore_punct = ignore_punct
        self.vocabBuilder = BuildVocab()
        self.nlp = spacy.load("en_core_web_md")
        # Cache for the most recently built edit-distance table
        self.cache = {
            'table' : None,
            'r' : 0,
            'c' : 0
        }
        self.word_probabilities = {}
        self.total_words = 0
        self.update_word_probs = update_word_probs
        if update_word_probs:
            assert corpus, "corpus should not be none if update word probs is set to [TRUE]"
            self.total_words = self.update_word_probabilities(corpus)

        ed = time()
        if verbose:
            print()
            print(f'loaded the model in : {round(ed - st, 2)} sec(s) ...')

    def check(self, text : str) -> tuple:
        new_text = [] # strings
        changed = [] # bools
        tokens = self.clean_(text, return_tokens = True)
        for doc in tokens:
            word = doc.text
            if not self.vocabBuilder.find(word):
                vocab = self.vocabBuilder.get_vocab(word)
                self.total_words = self.update_word_probabilities(vocab)
                # have to edit this in the main script file ( compute the word
                # count probs for all the symptoms )
                top_suggestion_words = self.get_top_suggestions(word, vocab, self.total_words)
                if len(top_suggestion_words) > 0:
                    # Start from the most probable word, then prefer any closer one
                    top_word = [top_suggestion_words[0],
                                self.min_edit_distance(word, top_suggestion_words[0])[-1]]
                    for target in top_suggestion_words[1:]:
                        _, distance = self.min_edit_distance(word, target)
                        if distance < top_word[1]:
                            top_word = [target, distance]
                    new_text.append(top_word[0])
                    changed.append(True)
                else:
                    # No suggestion found: keep the word unchanged
                    new_text.append(word)
                    changed.append(False)
            else:
                new_text.append(word)
                changed.append(False)
        return new_text, changed

    def get_top_suggestions(self, word, vocabs, total_words, top_n = 15):
        # Rank the candidate words by corpus probability, highest first
        suggestions = []
        for vocab in vocabs:
            probability = self.compute_word_probabilty(self.word_probabilities[vocab], total_words)
            suggestions.append([vocab, probability])
        suggestions.sort(key = lambda x : x[1], reverse = True)
        return [vocab for vocab, _ in suggestions[:top_n]]

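    def min_edit_distance(self, source : str, target : str):
        r = len(source)
        c = len(target)
        table = self.build_matrix(r, c)
        for row in range(1, r + 1):
            for col in range(1, c + 1):
                s, t = source[row - 1], target[col - 1]
                # Substitution cost of 1 is an assumption (standard Levenshtein distance)
                rep_ = table[row - 1][col - 1] + (0 if s == t else 1)
                id_ = table[row - 1][col] + 1  # delete a character from source
                di_ = table[row][col - 1] + 1  # insert a character of target
                table[row][col] = min(rep_, id_, di_)

        return table, table[r][c]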

    def build_matrix(self, r, c):
        # Reuse the cached table when the dimensions match; the first row and
        # column never change and the inner cells are overwritten by the caller
        if (r == self.cache['r']) and (c == self.cache['c']):
            return self.cache['table']

        mat = [[0] * (c + 1) for _ in range(r + 1)]
        mat[0] = [i for i in range(c + 1)]
        for i in range(1, r + 1):
            mat[i][0] = i
        self.cache['table'] = mat
        self.cache['r'] = r
        self.cache['c'] = c
        return mat

    def get_word_count(self, text):
        # Count each word and return the counts with the total number of words
        c = Counter(text)
        n = sum(c.values())
        return dict(c), n

    def clean_(self, text, return_tokens = False):
        # text = text.lower()
        # text = re.sub(r'[.,]', ' ', text)
        # text = re.sub(r'[{}]'.format(punctuation), '', text)
        if return_tokens:
            # Tokenize with spaCy and drop whitespace-only tokens
            text = [i for i in self.nlp(text) if i.text.strip() != '']
        return text

    def compute_word_probabilty(self, word_frequency, total_words):
        return word_frequency / total_words

    def update_word_probabilities(self, words : list) -> int:
        if self.update_word_probs:
            # Add the counts of dictionary words to the running totals
            n = self.total_words
            for word in words:
                if self.vocabBuilder.find(word):
                    self.word_probabilities[word] = self.word_probabilities.get(word, 0) + 1
                    # Count first occurrences too, so the total matches the counts
                    n += 1
        else:
            # Recompute the counts from scratch for the given words
            self.word_probabilities, n = self.get_word_count(words)

        return n
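
The candidate generation performed by BuildVocab can be tried without NLTK or spaCy. The sketch below is an independent, simplified version that assumes a small hand-written dictionary in place of the NLTK word list; the `edit_candidates` name and the sample words are hypothetical.

```python
import string


def edit_candidates(word: str, dictionary: set) -> set:
    """Return dictionary words exactly one insert, remove, switch or replace away."""
    word = word.lower()
    letters = string.ascii_lowercase
    # Every way of splitting the word into a left and a right part
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    inserts = {left + ch + right for left, right in splits for ch in letters}
    removes = {left + right[1:] for left, right in splits if right}
    switches = {left + right[1] + right[0] + right[2:]
                for left, right in splits if len(right) > 1}
    replaces = {left + ch + right[1:]
                for left, right in splits if right for ch in letters}
    candidates = inserts | removes | switches | replaces
    return {c for c in candidates if c in dictionary and c != word}


# Hand-written dictionary used only for this example.
sample_dictionary = {"fever", "fewer", "ever", "never", "cough"}
```

Unlike the `insert` method in the listing, this sketch also tries inserting a letter before the first character.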
6.1 RESULT

Fig 5: Landing Page

Fig 6: Using the Yes or No buttons

Fig 7: Adding additional symptoms

Fig 8: Final list of top predictions

CHAPTER 7
REFERENCES

https://materializecss.com/
https://flask.palletsprojects.com/en/2.2.x/
https://neo4j.com/
https://spacy.io/
https://allenai.github.io/scispacy/
https://pandas.pydata.org/
https://numpy.org/

CHAPTER 8
CONCLUSION

Chatbots or smart assistants with artificial intelligence are dramatically
changing businesses. A wide range of chatbot-building platforms is
available for various sectors, such as e-commerce, retail, banking,
leisure, travel, healthcare, and so on.
They can reach out to a large audience on messaging apps and be more
effective than humans. They may develop into a capable information-gathering
tool in the near future.
With the help of different tools and technologies such as the knowledge graph
and various Natural Language Processing methods, we have successfully
created a chatbot that helps patients to identify their disease based on the
symptoms they are currently experiencing.
