Internship Report 1128
AN INTERNSHIP REPORT
Submitted by
M Z MUHAMMAD JAABIR
(RRN: 190071601128)
BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING
DECEMBER 2022
BONAFIDE CERTIFICATE
SIGNATURE
Dr. S.P. VALLI
Assistant Professor (Sel. Gr) /
CSE
Department of Computer Science
and Engineering
B.S. Abdur Rahman Crescent
Institute of Science and
Technology,
Vandalur, Chennai – 600 048
Internship certificate
VIVA VOCE EXAMINATION
……………………………….
INTERNAL EXAMINER
ACKNOWLEDGEMENT
I am obliged to our class advisor, Dr. S.P. VALLI, Assistant Professor (Sel. Gr),
Department of Computer Science and Engineering, for her professional
guidance and encouragement throughout the internship period.
I thank all the Faculty members and the System Staff of the Department of
Computer Science and Engineering for their valuable support and assistance
at various stages of project development.
M Z MUHAMMAD JAABIR
TABLE OF CONTENTS
LIST OF FIGURES
1 Company Overview
1.1 Overview
1.2 Website
1.3 Logo
2 Role and Responsibility
2.1 Role
2.2 Responsibility
3 Project Overview
3.1 Overview
3.2 Scope and Considerations
4 Theoretical Analysis
4.1 Technology Stack
5 Steps to Build the Application
5.1 Research
5.2 Key Features
5.3 UI/UX Designing
6 Source Code
6.1 Result
7 References
8 Conclusion
LIST OF FIGURES
1 Company Logo
2 Pipeline Flowchart
4 UI Design
5 Landing Page
CHAPTER 1
COMPANY OVERVIEW
1.1 OVERVIEW
OBJECT AUTOMATION SYSTEM SOLUTIONS PRIVATE LIMITED is a
private company incorporated on 12 January 2018. It is classified as a
non-government company and is registered with the Registrar of Companies,
Chennai. Its authorized share capital is Rs. 1,000,000 and its paid-up
capital is Rs. 100,000.
The company is managed by two able and efficient directors:
Bragathalakshmi Bharani
Sandhya Palani
1.2 WEBSITE
Object-Automation (https://object-automation.com/)
1.3 LOGO
CHAPTER 2
ROLE AND RESPONSIBILITY
2.1 ROLE
My role at the organization was “AI and Data Science Intern”. I worked
on a project called “HEALTHBOT” for about 7 months, from May to
November.
2.2 RESPONSIBILITY
During the internship, I gained knowledge of various platforms and
programming languages. The target was to deliver a working model of the
chatbot system using tools and frameworks such as Neo4j, spaCy and AIML.
Meeting that objective required solid knowledge of core Python and
intermediate knowledge of machine learning, deep learning and NLP before
analysing the actual requirements of the system. The study was needed not
only to understand the subject but also to find solutions to the existing
problems; implementing the findings from the study was an even bigger
challenge.
I studied chatbots from various sources such as research papers, blogs
and videos. Other major activities carried out during the internship were
an extensive study of the current online platform, presentations of study
analysis and practical implementations, and, most importantly, team
discussions to analyse customer change requests.
The regular meetings and discussions with mentors, the project manager
and team members helped me widen my knowledge of the existing system and
the problem background. Machine learning model development, which comes
under software development, is one of the major services of Object
Automation System Solutions Pvt. Ltd.
Overall, my responsibility was primarily focused on:
Knowledge graph implementation
Chatbot model development and its UI
Data cleaning
CHAPTER 3
PROJECT OVERVIEW
3.1 OVERVIEW
Our team’s project “HEALTHBOT” involved creating a type-II chatbot that
identifies a patient’s disease using a knowledge graph as its database.
A type-II chatbot is a rule-based chatbot built on predefined
conversational paths, in which users are given predefined question and
answer options. If a user asks a question outside these predefined paths,
the bot cannot answer it.
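The idea of predefined conversational paths can be illustrated with a minimal sketch. This is not the HEALTHBOT implementation: the `PATHS` table, the state names, the prompts and the `step` function are all hypothetical and exist only to show how a rule-based bot stays on fixed paths and falls back when the input is outside them.

```python
from typing import Tuple

# Hypothetical conversation tree (illustrative only): each state has a prompt
# and a fixed set of accepted answers, each leading to the next state.
PATHS = {
    "start": {
        "prompt": "Do you have a fever?",
        "options": {"yes": "cough", "no": "end_no_match"},
    },
    "cough": {
        "prompt": "Do you also have a cough?",
        "options": {"yes": "end_see_doctor", "no": "end_no_match"},
    },
    "end_see_doctor": {"prompt": "Please consult a doctor.", "options": {}},
    "end_no_match": {"prompt": "No matching path found.", "options": {}},
}

FALLBACK = "Sorry, I can only answer with one of: {choices}"


def step(state: str, user_text: str) -> Tuple[str, str]:
    """Return (next_state, bot_reply) for one user turn."""
    options = PATHS[state]["options"]
    answer = user_text.strip().lower()
    if answer not in options:
        # Input outside the predefined paths: stay in the same state.
        return state, FALLBACK.format(choices=", ".join(options))
    next_state = options[answer]
    return next_state, PATHS[next_state]["prompt"]
```

Calling `step("start", "Yes")` moves the conversation to the next question, while any free-form question leaves the state unchanged and returns the fallback message.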
This is why it is important to design a rule-based chatbot in a versatile
way. Different media formats such as images, GIFs, audio and video can help
where necessary. A personal approach (which requires asking for the user’s
name first) and varied response options also improve the experience.
Rule-based bots are quick to design and implement, and inexpensive.
However, they require the user’s cooperation in the input to help answer
the questions.
The chatbot that we developed primarily uses Natural Language Processing
(NLP) and a graph-based Database Management System (DBMS) that stores a
Knowledge Graph. In short, our chatbot takes the user’s symptoms as input
and diagnoses the user with a disease by traversing the knowledge graph.
using Flask. With the help of Flask and some custom code, we were able to
integrate the back-end and the front-end.
Future considerations for our project include creating a mobile app
interface with an improved User Interface/User Experience, and improved
Natural Language Processing models for identifying the relationship
between the user’s input and the data in the knowledge graph. This way, we
would be able to improve the interaction between the chatbot and the user.
Our final goal is to make sure that our users need as little interaction
as possible.
CHAPTER 4
THEORETICAL ANALYSIS
Neo4j
Neo4j is a popular open-source graph database developed in Java. It is
highly scalable and schema-free (NoSQL).
spaCy
spaCy is a free, open-source library for Natural Language Processing
in Python. It features Named Entity Recognition (NER), Part-of-Speech
(POS) tagging, dependency parsing, word vectors and more. It is written
in Python and Cython.
scispaCy
scispaCy is a Python package containing spaCy models for processing
biomedical, scientific or clinical text. We use the “en_core_sci_lg”
model as it is a full spaCy pipeline for biomedical data with a larger
vocabulary and 600k word vectors.
Custom-made SpellChecker
This module was custom-made by our team to spell check the user’s
input. It depends on the spaCy and nltk modules to clean the text and
to find the misspelled words, respectively.
CHAPTER 5
STEPS TO BUILD THE APPLICATION
5.1 RESEARCH
Initially, we looked at the various ways to implement the base idea of
the project using different technologies. After delving into the research,
we identified the modules and applications required for our project. Once
that was done, we created a pipeline for building the initial, basic
version of the chatbot.
Fig 3: Text Cleaning Process Flow
Check spelling
The SpellChecker requires a corpus. From the corpus, it gets the
word frequency of every distinct vocabulary word. It first tokenizes
the input text and then iterates through the tokens one by one,
checking their spelling against the given corpus. For a misspelled
word, it computes the minimum edit distance to the words in the
vocabulary using dynamic programming, so that the distance is
obtained efficiently. Finally, the misspelled word is replaced by the
candidate with the highest word probability, that is, the highest
word frequency in the given corpus.
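The spell-checking step can be sketched as follows. This is a simplified illustration, not our SpellChecker module: the function names (`edit_distance`, `build_frequencies`, `correct`, `correct_text`), whitespace tokenization and the `max_distance` threshold of 1 are assumptions made for the example.

```python
from collections import Counter
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # delete ch_a
                            curr[j - 1] + 1,      # insert ch_b
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def build_frequencies(corpus: Iterable[str]) -> Counter:
    """Count how often each lowercase word occurs in the corpus lines."""
    return Counter(word.lower() for line in corpus for word in line.split())


def correct(word: str, freqs: Counter, max_distance: int = 1) -> str:
    """Replace an unknown word by the most frequent close vocabulary word."""
    word = word.lower()
    if word in freqs:
        return word  # already a known word
    # Assumed candidate rule: vocabulary words within max_distance edits.
    candidates = [(count, vocab) for vocab, count in freqs.items()
                  if edit_distance(word, vocab) <= max_distance]
    if not candidates:
        return word  # nothing close enough, leave the token unchanged
    return max(candidates)[1]  # highest frequency wins


def correct_text(text: str, freqs: Counter) -> str:
    """Tokenize on whitespace and correct each token in turn."""
    return " ".join(correct(token, freqs) for token in text.split())
```

With a corpus such as `["fever and headache", "fever with cough", "high fever"]`, the input `"Fevr and hedache"` is corrected to `"fever and headache"`. Ranking candidates by frequency alone follows the description above; a production checker would probably also weigh the edit distance itself.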
Detect symptoms
We send the clean text to a pre-trained Named Entity Recognition
(NER) model based on scispaCy, which detects the symptoms and
returns them as a list.
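A minimal sketch of this step is shown below, assuming spaCy and the “en_core_sci_lg” model are installed. The helper names `load_pipeline` and `detect_symptoms` are hypothetical. To our understanding, the general scispaCy models tag spans with a generic entity label rather than a dedicated symptom label, so treating every detected entity as a symptom candidate is an assumption of this example.

```python
from typing import Callable, List


def load_pipeline(model_name: str = "en_core_sci_lg"):
    """Load a spaCy pipeline (needs spaCy and the scispaCy model installed)."""
    import spacy  # imported lazily so detect_symptoms can be used without it
    return spacy.load(model_name)


def detect_symptoms(nlp: Callable, text: str) -> List[str]:
    """Return the distinct entity mentions found in text, in order of appearance."""
    doc = nlp(text)
    seen = set()
    symptoms = []
    for ent in doc.ents:  # doc.ents holds the spans found by the NER component
        name = ent.text.strip().lower()
        if name and name not in seen:
            seen.add(name)
            symptoms.append(name)
    return symptoms
```

A typical call would be `detect_symptoms(load_pipeline(), clean_text)`, where `clean_text` stands for the output of the spell-checking step.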
Disease detection by traversing the knowledge graph
We iterate over the list of symptoms one by one. Each time, the
current symptom is combined with the previous symptoms to fetch
the matching subgraph, which shortlists the detected diseases.
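The shortlisting logic can be sketched with an in-memory dictionary standing in for the Neo4j knowledge graph. The `SYMPTOM_TO_DISEASES` table and the `shortlist` function are hypothetical, and the entries are toy data used only to show how each additional symptom narrows the candidate set.

```python
from typing import Dict, Iterable, List, Optional, Set

# Toy stand-in for the knowledge graph (illustrative data only):
# each symptom node points to the disease nodes it is linked to.
SYMPTOM_TO_DISEASES: Dict[str, Set[str]] = {
    "fever": {"influenza", "malaria", "common cold"},
    "headache": {"influenza", "malaria", "migraine"},
    "chills": {"malaria"},
}


def shortlist(symptoms: Iterable[str],
              graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Return the candidate diseases remaining after each symptom is added."""
    candidates: Optional[Set[str]] = None
    steps: List[List[str]] = []
    for symptom in symptoms:
        linked = graph.get(symptom, set())
        # Combining the current symptom with the previous ones keeps only
        # the diseases connected to every symptom seen so far.
        candidates = set(linked) if candidates is None else candidates & linked
        steps.append(sorted(candidates))
    return steps
```

For the symptoms `["fever", "headache", "chills"]` the candidate list shrinks from three diseases to two and then to one. In the real system the same narrowing would be done by querying the graph database rather than intersecting Python sets.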
Here, we use a div to create a card and multiple divs inside it to
create the header for the card.
Fig 4: UI Design
In Fig. 4, we can see the User Interface of the application. The
application also uses buttons with scripts linked to them that perform an
action when clicked. For example, the Submit button sends the user input
to the back-end.
CHAPTER 6
SOURCE CODE
Front-End
Base.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Baymax - Healthbot</title>
<!-- Compiled and minified CSS -->
<link
rel="stylesheet"
href="https://cdnjs.cloudflare.com/ajax/libs/materialize/1.0.0/css/materialize.min.css"
/>
<link
rel="stylesheet"
href="/static/css/normalize.css"
/>
<link rel="stylesheet" href="{{ url_for('static', filename='css/index.css') }}" />
<link href="https://fonts.googleapis.com/icon?family=Material+Icons" rel="stylesheet" />
<!-- Compiled and minified JavaScript -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/materialize/1.0.0/js/materialize.min.js"></script>
</head>
<body>
{% block content %} {% endblock %}
<!-- <script>
window.history.pushState({"html":response.html,"pageTitle":response.pageTitle},"", "");
</script> -->
</body>
</html>
Index.html
{% extends 'base.html' %} {% block content %}
<div class="container">
<div class="row">
<div class="col s10 m12">
<div class="card">
<div class="row">
<div class="chatbox-header">
<h4>HealthBot</h4>
</div>
</div>
<div class="card-content ct_box" id="cb">
<div class="row">
{% for i in history %}
<div class="row">
<h6 class="{{ i.class_ }}">{{ i.message }}</h6>
</div>
{% if i.end_convo %}
<div class="row">
<h6 class="{{ i.class_ }}">
You probably have any of the following disease(s)
</h6>
</div>
{% for d in i.diseases %}
<ul class="collection">
<div class="row">
<li class="collection-item col s6">{{d[0]}}</li>
</div>
</ul>
{% endfor %} {% endif %} {% endfor %}
</div>
</div>
<div class="card-action">
<form class="form" method="post" id="form">
<div class="row">
<div class="input-field col s12">
{% if history[-1]['end_convo'] %}
<div class="row">
<a
class="waves-effect waves-light btn"
id="end_convo"
onclick="document.getElementById('reset').value='reset';document.getElementById('form').submit();"
>RESET CONVERSATION</a
>
<input type="text" name="user_input" id="reset" value=""
style="display: none;" />
</div>
{% elif not history[-1]['yes_no'] %}
<textarea
id="user_input"
class=""
data-length="120"
name="user_input"
></textarea>
<div class="input-field col s12">
<button
class="btn waves-effect waves-light col s12"
type="submit"
name="action"
id="submit_btn"
>
Submit
<i class="material-icons right">send</i>
</button>
</div>
{% else %}
<div class="row">
<a
class="waves-effect waves-light btn"
id="yes"
onclick="document.getElementById('yes_or_no').value='yes';document.getElementById('form').submit();"
>YES</a
>
<a
class="waves-effect waves-light btn"
id="no"
onclick="document.getElementById('yes_or_no').value='no';document.getElementById('form').submit();"
>NO</a
>
<script>
function y_btn() {
// Local variables are used here: assigning to a bare "y_btn" would
// overwrite this function itself.
const btn = document.getElementById("yes");
const text_ = document.getElementById("yes_or_no");
text_.value = "yes";
// var form = document.getElementById("form");
// console.log(text_)
// btn.addEventListener("click", function () {
// form.submit();
// });
}
function n_btn() {
const btn = document.getElementById("no");
const text_ = document.getElementById("yes_or_no");
text_.value = "no";
console.log(text_)
}
{% endblock content %}
Spellchecker.py
from nltk.corpus import words
import spacy
from collections import Counter
from time import time
from string import punctuation
import re
import json


class Dict:
    def __init__(self):
        # English word list from NLTK (the "words" corpus must be downloaded).
        self.dictionary = set(words.words())
class BuildVocab(Dict):
    def __init__(self):
        super().__init__()
        # Lowercase letters 'a' to 'z'.
        self.alphabets = [chr(i) for i in range(97, 97+26)]
return vocabs
return vocabs
return vocabs
return vocabs
return corpus
    def add_to_dictionary(self, words : list) -> None:
        for i, w in enumerate(words):
            self.add(w)
        print(f'Added {i + 1} new words to the dictionary ...')
class SpellChecker:
    def __init__(self, update_word_probs : bool = False, corpus : list = [],
                 ignore_punct = True, verbose : bool = False):
        st = time()
        self.ignonre_punct = ignore_punct
        self.vocabBuilder = BuildVocab()
        self.nlp = spacy.load("en_core_web_md")
        # Cache for the edit-distance table and its dimensions.
        self.cache = {
            'table' : None,
            'r' : 0,
            'c' : 0
        }
        self.word_probabilities = {}
        self.total_words = 0
        self.update_word_probs = update_word_probs
        if update_word_probs:
            assert corpus, "corpus should not be none if update word probs is set to [TRUE]"
            self.total_words = self.update_word_probabilities(corpus)
        ed = time()
        if verbose:
            print()
            print(f'loaded the model in : {round(ed - st, 2)} sec(s) ...')
print(vocab, total_words, self.compute_word_probabilty(self.word_probabilities[vocab], total_words))
suggestions.append([vocab, self.compute_word_probabilty(self.word_probabilities[vocab], total_words)])
print(suggestions, word)
if len(suggestions) > 0:
    return list(zip(sorted(suggestions, key = lambda x : x[1], reverse = True)[:top_n]))[0]
return suggestions
return n
6.1 RESULT
Fig 7: Adding additional symptoms
CHAPTER 7
REFERENCES
https://materializecss.com/
https://flask.palletsprojects.com/en/2.2.x/
https://neo4j.com/
https://spacy.io/
https://allenai.github.io/scispacy/
https://pandas.pydata.org/
https://numpy.org/
CHAPTER 8
CONCLUSION