Sign Language To Text Converter
Project Course
BACHELOR OF ENGINEERING
Co-Supervisor: Er. Gauri Shankar (E11266)
ABSTRACT
Considering the barriers faced by members of the speech-impaired community, we aim to introduce a
tool that will bridge the communication gap and aid better interaction. In situations where a
vocal individual is unskilled in sign language, a sign interpreter is inevitably needed in
order to establish an exchange of ideas. We propose a system that enables a two-way
conversation between the speech-impaired and other vocal individuals. In this paper we present a
prototype that operates in two phases. In the first phase, the sign language gestures are fed
into the system in real time through the computer vision capabilities of the device. These gestures are
then recognized with the help of our deep neural network, while hand detection is sharpened
with an edge detection algorithm, and the result is interpreted in text as well as audio format. The second
phase converts audio into text and then displays the relevant hand gestures for
it. The system is capable of recognizing over 300 words of Indian Sign
Language.
Keywords - Speech Recognition, Sign Language, Natural Language Processing, Computer Vision
TABLE OF CONTENTS
Sr no. Topic
1 Feature/characteristics identification
2 Constraints Identification
3 Analysis of features and finalization subject to constraints
4 Design selection
Feature/characteristics identification
1. Objectives:
This project aims at identifying alphabets and sentences in sign language from the
corresponding gestures, and vice versa. Gesture recognition and sign language recognition
have been well-researched topics for ASL but have rarely been explored for their Indian
counterpart. We aim to tackle this problem, but instead of using high-end hardware such as
gloves or a Kinect for gesture recognition, we aim at recognition from images, obtained,
say, from a webcam, and then use computer vision and machine learning techniques
for extracting relevant features and performing the subsequent classification.
2. Single entity:
The project is ML based, and we as a team prefer to work together to implement its
components, such as NLP (Natural Language Processing), root-word extraction, and the dataset. We
worked together as a team to bring this project to fruition as a single entity. Separate
jobs were assigned equally to different people, but everyone's effort to work
as a team to complete a single project is recognized.
3. Life Span:
The total time span of the project, i.e., the Sign Language Converter, is 3 months. The project is
divided into two modules, each of which takes 1.5 months to complete. The first module
is the creation of the dataset, which is self-created. The second module is the coding part. The two
modules together cover the entire life span of the project.
4. Life cycle:
The 1st phase of the project was project planning; the structure and the timeline of the
project were created, and the tasks were evenly divided within the team.
In the 2nd phase the project definition was created, which included the objectives and the scope of
the project.
In the 3rd phase the design constraints were identified, and the framework of the design was
built.
The 4th phase is the building of the project, which includes developing the various modules within
their specific timelines while keeping the project well within its constraints.
The 5th phase will be the testing of the project before deployment, to check for
expected bugs and to find out whether the code and programming work according to the
project constraints.
The last phase is deployment: after the project has been verified and tested, it will be
deployed.
5. Team Spirit:
We as a team have had team spirit from the beginning, from project selection to learning
to implementation. We put our faith in each other to complete the task at hand, and we
aided each other as needed. When working on a project as part of a group, team spirit, trust,
and mutual support are essential.
6. Uncertainty:
Some uncertainty remains: there might be corner cases that were missed during the
programming and testing phases, which could produce undesirable output or some
unexpected behaviour.
7. Directions:
The project is well within the constraints as per the requirements, the input of our mentor is
being taken good care of, and all the resources required for the project are available according
to the requirements of each phase. Appropriate timing and an equal division of work are
maintained.
8. Uniqueness:
Many projects in this field deal mainly with the conversion of speech
into sign language, and moreover they target ASL (American Sign Language). Our
project is based on ISL (Indian Sign Language), and it handles the conversion of speech into
sign language as well as the conversion of sign language into speech, so as to make the
communication two-way.
9. Flexibility:
It can run on all platforms and is open to any changes that may be required in the
future.
10. Sub-Contracting:
The development of the project is done purely on open-source platforms. Some generic
modules' code is taken from publicly accessible code, with proper credit and references.
CONSTRAINTS IDENTIFICATION
1. Time: The project's completion, or final due date for deliverables, is expected to be in May
2022, with both modules completed under the constraint of 1.5 months each.
2. Scope: Speech-to-sign-language translation is a necessity in the modern era of online
communication for hearing-impaired people. It will bridge the communication gap between
hearing and hearing-impaired people.
Our proposed system is designed to overcome the troubles faced by Indian deaf people.
It translates each word that is received as input into sign language, based on
Indian Sign Language.
3. Risk: The project is well defined, so the risk is low, though there might be some delay in
updating the data. The technological aspect of running a project is a complex deliverable because
there is a high turnover of new and advanced technologies. A project may stall or terminate
if critical operations and core processes, such as production or procurement, are poorly
implemented.
B. Speech Recognition
Live speech is received as input from the microphone of our system. This is
done using the Python package PyAudio, which is used to record audio on a
variety of platforms. The audio thus received is converted into text using the
Google Speech Recognition API, which converts audio to text with the help of
neural network models. Given an audio file as input, the received audio is
translated into text by this recognizer. For lengthier audio files, the audio is
divided into smaller chunks on the basis of the occurrence of silence; the chunks
are then passed to the recognizer to be converted into text efficiently.
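As a concrete illustration, the following is a minimal sketch of this phase, assuming the SpeechRecognition package (which drives PyAudio under the hood) and pydub for silence-based chunking; the function names recognize_live and recognize_long_file are ours, not part of the project's code.

```python
# Minimal sketch: live capture plus silence-based chunking for long files.
# Assumes the SpeechRecognition (speech_recognition) and pydub packages.
import io

import speech_recognition as sr
from pydub import AudioSegment
from pydub.silence import split_on_silence


def recognize_live() -> str:
    """Listen on the default microphone (via PyAudio) and return the text."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        audio = recognizer.listen(source)
    # Free Google Speech Recognition endpoint; raises UnknownValueError
    # when the speech cannot be understood.
    return recognizer.recognize_google(audio)


def recognize_long_file(path: str) -> str:
    """Split a lengthy WAV file on silence and recognize each chunk."""
    recognizer = sr.Recognizer()
    sound = AudioSegment.from_wav(path)
    chunks = split_on_silence(sound, min_silence_len=500,
                              silence_thresh=sound.dBFS - 14)
    words = []
    for chunk in chunks:
        buf = io.BytesIO()
        chunk.export(buf, format="wav")
        buf.seek(0)
        with sr.AudioFile(buf) as source:
            audio = recognizer.record(source)
        try:
            words.append(recognizer.recognize_google(audio))
        except sr.UnknownValueError:
            pass  # skip chunks that cannot be transcribed
    return " ".join(words)
```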
The system removes the morphological and inflectional endings of English words.
It uses the Porter stemming algorithm to strip commonly occurring suffixes
and find the root word, or stem. For example, the Porter stemming algorithm
reduces the words "connects", "connected", and "connection" to the root word
"connect". Because of this stemming, we can reduce the time taken for
searching the sign language for a given word.
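A minimal sketch of this step with NLTK's PorterStemmer; the outputs shown in the comments are what the algorithm actually produces.

```python
# Stemming with NLTK's implementation of the Porter algorithm.
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["connects", "connected", "connecting", "connection"]:
    print(word, "->", stemmer.stem(word))
# connects -> connect
# connected -> connect
# connecting -> connect
# connection -> connect
```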
If the word is found in the local system, the system passes the path of the video
sequence to the OpenCV module to play the video. It shows the video sequence
frame by frame.
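A minimal sketch of this playback step with OpenCV, assuming the path points at a locally stored sign clip:

```python
# Play a stored sign-language clip frame by frame with OpenCV.
import cv2

def play_sign_video(path: str) -> None:
    cap = cv2.VideoCapture(path)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:          # end of the video sequence
            break
        cv2.imshow("Sign", frame)
        if cv2.waitKey(30) & 0xFF == ord("q"):  # ~30 ms per frame; 'q' quits
            break
    cap.release()
    cv2.destroyAllWindows()
```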
If the word is not found in the local system, the system searches for it in
a sign language repository named "Indian Sign Language Portal". It looks for
the video link on the Indian Sign Language Portal by web scraping and plays the
corresponding sign language video sequence.
Web scraping is the process of extracting content from a website. Here it is
achieved using the BeautifulSoup module, which helps to get, search, navigate,
or modify data from HTML files by using parsers.
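A hedged sketch of this fallback lookup follows; the portal URL and the assumption that the result page carries a <video> tag are illustrative only, since the portal's real markup is not described here.

```python
# Look up a word on a sign-language portal and return a video link.
# The URL pattern and the <video> tag are assumptions for illustration.
from typing import Optional

import requests
from bs4 import BeautifulSoup

def find_sign_video_link(word: str) -> Optional[str]:
    url = f"https://example-isl-portal.org/search?q={word}"  # hypothetical
    page = requests.get(url, timeout=10)
    soup = BeautifulSoup(page.text, "html.parser")  # parse the HTML
    video = soup.find("video")                      # navigate to the video tag
    if video and video.get("src"):
        return video["src"]
    return None
```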
When we speak the sentence "Hello, what is your name" as input through the microphone,
the text is processed and converted to "Hello, what your name" to give a faster conversion by
removing the filler words; a sketch of this step follows the figures below. The output then pops up
each video in the sequence: Fig. 4(a), Fig. 4(b), Fig. 4(c), and Fig. 4(d) show the output signs for
the given sentence "Hello, what is your name".
Fig. 4(a): Output sign (Hello)
Fig. 4(b): Output sign (What)
Fig. 4(c): Output sign (your)
Fig. 4(d): Output sign (name)
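The exact filler-word list the system uses is not given, so the small stop list below is an assumption; this minimal sketch reproduces the "Hello, what is your name" to "Hello, what your name" conversion described above.

```python
# Drop filler words before looking up signs; the stop list is illustrative.
FILLER_WORDS = {"is", "am", "are", "the", "a", "an", "to"}

def drop_fillers(sentence: str) -> str:
    kept = [w for w in sentence.split()
            if w.lower().strip(",.?!") not in FILLER_WORDS]
    return " ".join(kept)

print(drop_fillers("Hello, what is your name"))  # Hello, what your name
```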
DESIGN SELECTION
PROCESSING
The project demonstrates the many areas where sign language can be used.
With this system, there is an opportunity to use it in many places,
such as schools, doctors' offices, colleges, universities, airports, social service
agencies, community service agencies, and courts; briefly, almost everywhere.
Speech processing is the field that works on speech signals and their
processing. The signals are usually processed in a digital representation,
although the signals themselves are analog. Speech processing is concerned with
gathering, storing, manipulating, and transferring speech signals. It is faster to
communicate by voice than by text; therefore, translating voice to
images allows hearing people to communicate with people with
hearing disorders. Once the user presses the button to record speech, the
computer's microphone starts to listen, and after capturing the voice with the
help of CMU Sphinx, it derives the meaning as text. Then, in Java, the text is
matched with the proper .gif image, so that the other user will understand.
One of the developed image-processing sensors is Microsoft's Kinect sensor [4-7, 14]. In what can
be called the second part of the project, motion capture is the part where the Kinect sensor is
used. Once the user presses the button to record motion, the Kinect sensor starts to capture motions;
but to begin recording the sign motions, the user performs a specific starting motion, which is shown
in Figure 12.
Figure 12. "Starting Motion"
After the "Starting Motion", Kinect captures the motions and converts them to text. On the
computer, this text is converted to voice, and then the other user can hear the meaning of the sign.
A flow chart of the sign language converter program is given in the accompanying figure.
The text is then pre-processed using NLP (Natural Language Processing). A machine can only
understand binary language (i.e., 0 and 1), so how can it understand our language? NLP was
introduced to make machines understand human language. Natural Language Processing is the
ability of a machine to process the text it is given and structure it: it understands the meaning
of the words said and produces output accordingly. Text preprocessing consists of three things,
tokenization, normalization, and noise removal, as shown in Fig. 6. Natural Language Processing
is a mixture of artificial intelligence and computational linguistics; NLP-based devices such as
Cortana and Siri extract information from spoken input in the same way.
It is not an easy task for a machine to understand our language, but with the help of NLP
it becomes possible. How it works in our project is as follows. We give audio as input to
the machine, and the machine records that audio input. The machine then translates the audio
into text and displays it on the screen. The NLP system parses the text into components,
understands the context of the conversation and the intention of the speaker, and the machine
decides which command to execute based on the results of NLP. In essence, NLP is the
process of creating algorithms that translate text into words and label them based on the
position and function of the words in the sentence. Human language is thus converted
meaningfully into a numerical form, which allows computers to understand the nuances
implicitly encoded in our language.
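A minimal sketch of the three preprocessing steps named above with NLTK; the regular expression used for noise removal is our assumption.

```python
# Tokenization, normalization, and noise removal with NLTK.
import re

import nltk

nltk.download("punkt", quiet=True)  # tokenizer models, first run only

def preprocess(text: str) -> list:
    text = re.sub(r"[^A-Za-z\s]", " ", text)  # noise removal: drop punctuation/digits
    tokens = nltk.word_tokenize(text)         # tokenization
    return [t.lower() for t in tokens]        # normalization: lowercasing

print(preprocess("Hello, what is your name?"))
# ['hello', 'what', 'is', 'your', 'name']
```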
Finally, dictionary-based machine translation is performed. When you speak "How Are You" as
input into the microphone, the output pops up as the corresponding signs, letter by letter.
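A hedged sketch of this final lookup step; the image paths are hypothetical placeholders standing in for the stored ISL letter signs.

```python
# Map each letter of the recognized text to a stored ISL sign image.
# The "signs/letters/..." paths are hypothetical placeholders.
SIGN_DICTIONARY = {ch: f"signs/letters/{ch}.jpg"
                   for ch in "abcdefghijklmnopqrstuvwxyz"}

def translate_to_signs(sentence: str) -> list:
    """Return the sign images for a sentence, letter by letter."""
    return [SIGN_DICTIONARY[ch] for ch in sentence.lower()
            if ch in SIGN_DICTIONARY]

print(translate_to_signs("How Are You")[:3])
# ['signs/letters/h.jpg', 'signs/letters/o.jpg', 'signs/letters/w.jpg']
```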