Sign Language To Text Converter


Sign Language Detection

Project Phase III Report


“Sign Language Converter”

Submitted for the requirement of

Project course



Submitted to: Er. Gauri Shankar

Project Teacher (Supervisor):
Er. Sumit Malhotra (E7822)

Submitted by:
Abhishek Sharma (20BCS9162)
Anant Tripathi (20BCS9167)
Aditya Raj (20BCS9168)
Abhishek Pandey (20BCS9188)

Co-Supervisor Signature
Er. Gauri Shankar (E11266)


June 2022



Considering the barriers faced by members of the speech-impaired community, we aim to introduce a
tool that bridges the communication gap and supports better interaction. When a speaking
individual does not know sign language, a sign-language interpreter is needed for the two
parties to exchange ideas. We propose a system that enables two-way
conversation between speech-impaired people and speaking individuals. In this paper we present a
prototype that works in two phases. In the first phase, sign-language gestures are fed
into the system in real time through the computer vision capabilities of the device. These gestures are
recognized by our deep neural network, with hand detection sharpened by an edge detection
algorithm, and the result is output as both text and audio. The second
phase converts audio into text and then displays the relevant hand gestures for
it. The system is capable of recognizing over 300 words gestured in Indian Sign Language.

Keywords - Speech Recognition, Sign Language, Natural Language Processing, Computer Vision


Sr. No.   Topic
1         Feature/characteristics identification
2         Constraints identification
3         Analysis of features and finalization subject to constraints
4         Design selection


Feature/characteristics identification
1. Objectives:
This project aims to identify alphabets and sentences in sign language from the
corresponding gestures, and vice versa. Gesture and sign language recognition is a
well-researched topic for ASL but has rarely been explored for its Indian
counterpart. We aim to tackle this problem, but instead of using high-end technology such as
gloves or Kinect for gesture recognition, we aim to recognize gestures from images. The images
can be obtained from, say, a webcam; computer vision and ML techniques are then used
to extract relevant features and classify them.
2. Single entity:
The project is ML-based, and we as a team preferred to work together to implement its
components, such as NLP (Natural Language Processing), root words, and the dataset. We
worked together to bring this project to fruition as a single entity. Regardless of
background, separate tasks were assigned equally to different members, and everyone's effort
to work as a team toward a single project is recognized.
3. Life Span:
The total time span of the project, Sign Language Converter, is 3 months. The project is
divided into two modules, each taking 1.5 months to complete. The first module
is the creation of the dataset, which is self-created. The second module is the coding part. Both
modules are then reviewed, tested, and evaluated.

4. Life cycle:

The 1st phase of the project was project planning: the structure and the timeline of the
project were created, and the tasks were divided evenly among the team.

In the 2nd phase, the project definition was created, including the objectives, the scope,
and the purpose of the project.

In the 3rd phase, design constraints were defined, and the framework of the design was



The 4th phase is the building of the project, which includes various modules with specific
timelines while keeping the project well within the constraints.

The 5th phase is the testing of the project before deployment, to check for
bugs and to find out whether the code works according to the
project constraints.

The last phase is deployment: after the project has been verified and tested, it will be
deployed and maintained.

5. Team Spirit:
We as a team have had team spirit from the beginning, from project selection through learning
to implementation. We put our faith in each other to complete the task at hand, and we
helped each other as needed. When working on a project as part of a group, team spirit, trust,
and passion are all important factors.

6. Risk and Uncertainty:

The project Sign Language Detection is well defined, but some risk and
uncertainty remain. Corner cases may have been missed during
programming and testing, which could produce undesirable output or
errors; these need to be reported and corrected as soon as possible.

7. Directions:

The project is well within the constraints set by the requirements. The input of our mentor is
taken into account, and all the resources required for the project are available according
to the requirements of each phase. The developers and programmers give the work
appropriate time and divide it equally.


8. Uniqueness:

Many projects in this field deal mainly with the conversion of speech
into sign language, and moreover they target ASL (American Sign Language). Our
project targets ISL (Indian Sign Language); it converts speech into
sign language and also converts sign language into speech, making communication
easy for both parties in a conversation.

9. Flexibility:

It can run on all platforms, is easy to modify, and is open to all changes that may be required in

The project is developed purely on open-source platforms. Code for some generic
modules is taken from publicly accessible sources, with proper credit and references
given for each piece of code.


Constraints Identification

There are three major constraints in project management to consider:

1. Time: The project's completion, or the final due date for deliverables, is expected to be in May
2022. Each of the two modules is constrained to 1.5 months,
so the project takes 3 months to complete.

2. Scope: Speech-to-sign-language translation is a necessity for hearing-impaired people in the
modern era of online communication. It will bridge the communication gap between
hearing and hearing-impaired people.
Our proposed system is designed to overcome the difficulties faced by deaf people in India.
It translates each word received as input into sign language,
based on Indian Sign Language.


3. Risk: The project is well defined, so the risk is low, although there might be some delay in
updating the data. The technological aspect of running a project is complex because
new and advanced technologies emerge at a high rate. A project may stall or terminate
if critical operations and core processes, such as production or procurement, are
poorly implemented.


Analysis of features and finalization subject to constraints

This sign language detection project has the following features:
A. Forms of Input
Our project is intended to accept input in multiple formats. The input can take the following forms:
• Text input
• Live speech input

Fig. 1: Front end of the System


B. Speech Recognition
The live speech is received as input from the microphone of our system. This is
done using the Python package PyAudio, which is used to
record audio on a variety of platforms. The recorded audio is converted into text
using the Google Speech Recognizer API, which converts audio to text
using neural network models. When the input is given as an audio file,
the audio is likewise translated into text by the Google Speech Recognizer. For
lengthier audio files, the audio is divided into smaller chunks wherever
silence occurs. The chunks are then passed to the Google Speech Recognizer
one by one so that they can be converted into text efficiently.
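
The chunking step described above can be sketched in plain Python. This is only an illustration under stated assumptions: the audio is taken to be a list of integer amplitude samples, the threshold and minimum pause length are made-up values, and `recognize` is a hypothetical stand-in for the call to the Google Speech Recognizer, which is not reproduced here.

```python
from typing import Callable, List, Sequence


def split_on_silence(
    samples: Sequence[int],
    silence_threshold: int = 500,  # assumed amplitude below which a sample is "quiet"
    min_silence_len: int = 3,      # assumed number of quiet samples that ends a chunk
) -> List[List[int]]:
    """Split samples into chunks separated by long enough runs of quiet samples."""
    chunks: List[List[int]] = []
    current: List[int] = []
    pending: List[int] = []  # quiet samples seen since the last loud sample
    for s in samples:
        if abs(s) < silence_threshold:
            pending.append(s)
            # A long pause closes the chunk collected so far.
            if len(pending) >= min_silence_len and current:
                chunks.append(current)
                current = []
        else:
            # Short pauses stay inside the chunk; long ones are dropped.
            if current and len(pending) < min_silence_len:
                current.extend(pending)
            pending = []
            current.append(s)
    if current:
        chunks.append(current)
    return chunks


def transcribe_chunks(
    chunks: List[List[int]],
    recognize: Callable[[List[int]], str],  # hypothetical recognizer callback
) -> str:
    """Send each chunk to the recognizer and join the partial transcripts."""
    return " ".join(recognize(chunk) for chunk in chunks)
```

In a real pipeline the samples would come from the recorded audio, and the callback would wrap the speech-to-text service.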

Fig. 2: Block diagram of Speech to Text Conversion


Fig. 3: Speech Input

C. Porter Stemming Algorithm

The Porter stemming algorithm provides a basic approach to conflation that works
well in practice. Natural Language Processing (NLP) helps the computer to understand
human natural language, and Porter stemming is one of its
techniques. It is a well-known stemming algorithm proposed in 1980. The Porter
stemmer is known for its speed and simplicity, and it is mainly used in data mining
and information retrieval. It gives good results with a low error rate
on English text.

The system removes the morphological and inflexional endings of English words.
It uses the Porter stemming algorithm to remove commonly used suffixes
and find the root word. For example,
the words “agrees”, “agreeable”, and “agreement” share
the root “agree”, and stemming aims to map such variants to a common form. Because of
this stemming, we can reduce the time taken to
search for the sign language video of a given word.
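
To make the idea of suffix stripping concrete, here is a deliberately simplified sketch. It is not the Porter algorithm: the real algorithm applies several ordered rule steps with conditions on the remaining stem, and its stems are not always dictionary words. The suffix list and the minimum stem length below are illustrative assumptions only.

```python
# Illustrative suffix list, longest first. NOT the Porter rule set.
SUFFIXES = ("able", "ment", "ing", "ed", "s")


def simple_stem(word: str, min_stem_len: int = 3) -> str:
    """Strip the first matching suffix if enough of the word remains."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


for w in ("agrees", "agreeable", "agreement"):
    print(w, "->", simple_stem(w))
```

With this toy rule set all three example words collapse to the same key, so a single lookup in the sign video store would serve all of them.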

D. Text to Sign Language

The system iterates through every word in the processed text sentence
received from the previous step and searches for the corresponding sign language video
sequence in the local system. If the word is found, the system shows the output as a
video sequence using the OpenCV module in Python.
OpenCV is an open-source library mainly used for image processing, video
analysis, and other functionality related to computer vision. The system passes
the path of the video sequence to the OpenCV module, which plays the video
frame by frame.
If the word is not found in the local system, the system searches for the word in
a sign language repository named “Indian Sign Language Portal”. The system looks up
the video link in the Indian Sign Language Portal by web scraping and plays the
corresponding sign language video sequence.
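
The lookup order described above (local store first, portal second) might look roughly like the following. The directory layout, the `.mp4` file naming, and the `fetch_from_portal` callback are all assumptions made for this sketch; playback with OpenCV is left out.

```python
from pathlib import Path
from typing import Callable, List, Optional


def resolve_sign_videos(
    words: List[str],
    video_dir: Path,
    fetch_from_portal: Callable[[str], Optional[str]],  # hypothetical fallback
) -> List[Optional[str]]:
    """Return one video location per word: local path, portal link, or None."""
    locations: List[Optional[str]] = []
    for word in words:
        local = video_dir / f"{word.lower()}.mp4"  # assumed naming scheme
        if local.is_file():
            locations.append(str(local))
        else:
            # Not stored locally: ask the online repository instead.
            locations.append(fetch_from_portal(word))
    return locations
```

Keeping the fallback behind a callback means the local lookup can be tested without any network access.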
Web scraping is the process of extracting content from a website.
Here it is achieved using the BeautifulSoup module. The BeautifulSoup module in
Python helps to search, navigate, and modify the data in HTML files by
using parsers.
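
As a rough illustration of extracting a video link from a page, the sketch below uses Python's standard `html.parser` module in place of BeautifulSoup so that it runs without extra packages. The page structure (a `<video>` or `<source>` tag carrying a `src` attribute) is an assumption; the actual markup of the Indian Sign Language Portal is not described in this report.

```python
from html.parser import HTMLParser
from typing import List, Optional


class VideoLinkParser(HTMLParser):
    """Collect the src attribute of every <video> and <source> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("video", "source"):
            src = dict(attrs).get("src")
            if src:
                self.links.append(src)


def first_video_link(html: str) -> Optional[str]:
    """Return the first video link found in the page, or None."""
    parser = VideoLinkParser()
    parser.feed(html)
    return parser.links[0] if parser.links else None


page = '<video controls><source src="/videos/hello.mp4" type="video/mp4"></video>'
print(first_video_link(page))
```

The same traversal can be written with BeautifulSoup; the standard-library parser is used here only to keep the example self-contained.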

Fig. 4: Block diagram for showing the sign language

When we speak the sentence “Hello, what is your name” as input through the microphone,
the text is processed and converted to “Hello, what your name”; removing the filler
words gives faster conversion. The output then plays each video in sequence.
Fig. 4(a), Fig. 4(b), Fig. 4(c), and Fig. 4(d) show the output signs for the given sentence “Hello,
what is your name”.
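
The filler-word removal in this example can be sketched as follows. The set of filler words is an assumption chosen for illustration; the report does not list the words the system actually drops.

```python
import re
from typing import List

# Assumed filler words; the real list used by the system is not given.
FILLER_WORDS = {"is", "am", "are", "the", "a", "an", "to"}


def remove_fillers(sentence: str) -> List[str]:
    """Lowercase the sentence, split it into words, and drop filler words."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [t for t in tokens if t not in FILLER_WORDS]


print(remove_fillers("Hello, what is your name"))
```

Each remaining word then maps to one sign video, which matches the four output signs in the example.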


Fig. 4(a): Output sign (Hello)
Fig. 4(b): Output sign (What)
Fig. 4(c): Output sign (your)
Fig. 4(d): Output sign (name)




Main Program Flow Chart

The project opens up many areas in which sign language can be used.
A system of this type could be used in places
such as schools, doctors' offices, colleges, universities, airports, social services
agencies, community service agencies, and courts; in short, almost everywhere.

One of its most important contributions is demonstrating that such a tool can
help sign language users and others communicate with each other. Sign
language can then be used wherever it is needed, and the system could reach
various local areas. Future work is to develop a mobile application
of the system that enables everyone to speak with deaf people.

Voice Recognition Procedure:

Speech processing is the field that deals with speech signals and
their processing. The signals are usually processed in a digital
representation, although the original signals are analog. Speech processing is
concerned with gathering, storing, manipulating, and transferring speech signals. It is faster to
communicate by voice than by text, so translating voice into
images lets hearing people communicate with people who have
hearing disorders. Once the user presses the button to record speech, the
computer's microphone starts to listen; after the voice is captured,
CMU Sphinx converts it to text. Then, in Java, the text is
matched with the proper .gif image so that the other user can understand.
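
The remark that analog speech is processed in a digital representation can be illustrated with a small sampling and quantization sketch. A pure sine tone stands in for the microphone signal, and the sample rate and bit depth are assumed values, not settings taken from the system described here.

```python
import math
from typing import List


def sample_and_quantize(
    freq_hz: float,
    sample_rate: int,
    duration_s: float,
    bits: int = 16,  # assumed bit depth
) -> List[int]:
    """Sample a sine tone and round each sample to a signed integer."""
    max_amp = 2 ** (bits - 1) - 1  # largest positive value, 32767 for 16 bits
    n = int(sample_rate * duration_s)
    return [
        round(max_amp * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n)
    ]


samples = sample_and_quantize(freq_hz=1000.0, sample_rate=8000, duration_s=0.5)
print(len(samples), min(samples), max(samples))
```

A list of integers like this is the kind of data a speech recognizer ultimately receives from the recording step.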

Figure. The diagram of the Voice Recognition Procedure

Motion Capture Procedure

In this procedure, image processing is really important. Image processing has become common in
everyday life recently, and it seems likely to play an even larger role in the future.

Figure. Motion Capture Diagram

One of the sensors developed for image processing is Microsoft's Kinect sensor [4-7, 14]. Motion
capture, which can be called the second part of the project, is the part where the Kinect sensor is
used. Once the user presses the button to record motion, the Kinect sensor starts to capture motions,
but it begins recording sign motions only after a specific motion, which is shown in Figure 12.
After the “Starting Motion”, Kinect captures the motions and
converts them to text. On the computer, this text is converted to voice so that the other
user can hear the meaning of the sign. The flow chart of the sign language converter program is given
in Figure


Figure 12. “Starting Motion”
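
The gating behaviour described for the motion capture procedure, where nothing is recorded until the starting motion is seen, can be sketched as a small state machine. The input is assumed to be a stream of already recognized motion labels, and the label `START_MOTION` is a hypothetical name for the starting motion; the sensor interface itself is not modelled.

```python
from typing import Iterable, List

START = "START_MOTION"  # hypothetical label for the starting motion


def signs_after_start(labels: Iterable[str], start_label: str = START) -> List[str]:
    """Ignore motions until the starting motion appears, then collect the rest."""
    recording = False
    captured: List[str] = []
    for label in labels:
        if not recording:
            # Stay idle until the starting motion is recognized.
            recording = label == start_label
            continue
        captured.append(label)
    return captured


print(signs_after_start(["wave", "START_MOTION", "hello", "name"]))
```

Motions made before the starting motion are discarded, so incidental movement is not translated.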

The text is then pre-processed using NLP (Natural Language Processing). A
machine can only understand binary language (i.e., 0 and 1), so how can it understand our
language? NLP was introduced to make the machine understand human language.
Natural Language Processing is the ability of a machine to process spoken or written text
and structure it: it works out the meaning of the words and produces
output accordingly. Text preprocessing consists of three steps: tokenization, normalization, and
noise removal, as shown in Fig. 6. Natural Language Processing is a mixture of
artificial intelligence and computational linguistics. What matters most here is how it works in our
project: after audio input is given, NLP is used to interpret the
language and return the required information, much as assistants such as
Cortana and Siri do.

Figure. Text pre processing
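
The three preprocessing stages named above can be sketched as separate functions chained together. The exact rules (whitespace tokenization, lowercasing, keeping only the letters a to z) are assumptions chosen to keep the example short; the report does not specify them.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Tokenization: split the text on whitespace."""
    return text.split()


def normalize(tokens: List[str]) -> List[str]:
    """Normalization: bring every token to lowercase."""
    return [t.lower() for t in tokens]


def remove_noise(tokens: List[str]) -> List[str]:
    """Noise removal: strip non-letter characters and drop empty tokens."""
    cleaned = (re.sub(r"[^a-z]", "", t) for t in tokens)
    return [t for t in cleaned if t]


def preprocess(text: str) -> List[str]:
    return remove_noise(normalize(tokenize(text)))


print(preprocess("Hello!!  What is your NAME? 123"))
```

Keeping the stages separate makes it easy to swap in a different rule for any one of them.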

It is not an easy task for a machine to understand our language, but with the help of NLP
it becomes possible. It works as follows:
• We give audio as input to the machine.
• The machine records that audio input.
• The machine translates the audio into text and displays it on the screen.
• The NLP system parses the text into components and works out the context of the
conversation and the intention of the person.
• The machine decides which command to execute, based on the results of NLP.
NLP is the process of creating algorithms that break text into words and label them based on their
position and function in the sentence. Human language is thereby converted
meaningfully into a numerical form, which allows computers to capture the nuances
implicitly encoded in our language.
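
One very simple way to turn words into a numerical form, as mentioned above, is to give every distinct word an integer index. This is only an illustrative encoding under that assumption; the report does not say which numerical representation the system uses.

```python
from typing import Dict, List


def build_vocabulary(sentences: List[List[str]]) -> Dict[str, int]:
    """Assign each distinct word the next free integer, in order of appearance."""
    vocab: Dict[str, int] = {}
    for sentence in sentences:
        for word in sentence:
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(sentence: List[str], vocab: Dict[str, int], unknown: int = -1) -> List[int]:
    """Replace each word by its index; unseen words get the `unknown` value."""
    return [vocab.get(word, unknown) for word in sentence]


vocab = build_vocabulary([["hello", "what", "your", "name"], ["what", "name"]])
print(encode(["what", "your", "sign"], vocab))
```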


Finally, dictionary-based machine translation is performed. When you speak “How Are You” as
input into the microphone, the following output pops up as separate letters.
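
A dictionary-based translation step with letter-by-letter output could be sketched as below. It assumes that a word missing from the sign dictionary is spelled out one letter at a time, which is one reading of the "separate letters" output described above; the dictionary contents and file names are hypothetical.

```python
from typing import Dict, List


def translate(sentence: str, sign_dictionary: Dict[str, str]) -> List[str]:
    """Map each word to its sign entry, or spell it out letter by letter."""
    output: List[str] = []
    for word in sentence.lower().split():
        if word in sign_dictionary:
            output.append(sign_dictionary[word])
        else:
            # Assumed fallback: one output item per letter of the word.
            output.extend(ch.upper() for ch in word if ch.isalpha())
    return output


print(translate("How Are You", {"you": "you.mp4"}))
```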
