Voice Assistant
On
VOICE ASSISTANT
Bachelor of Computer Applications (Artificial Intelligence)
In
Computer Science and Engineering
It is hereby declared that the work presented in the seminar report titled
“VOICE ASSISTANT”, in partial fulfillment of the award of Bachelor of Computer
Applications (Artificial Intelligence) and submitted to the Department of Computer Science and
Engineering of Vivekananda Global University, Jaipur, is an authentic record of work carried out
under the supervision and valuable guidance of Ms. Sonal Saxena, Assistant Professor, Dept.
of Computer Science & Engineering.
The matter presented in the report embodies the results of the studies carried out by the
student and has not been submitted for the award of any other degree at this or any other
institute.
The completion of this project report gives me much pleasure. I would like to express
my gratitude to Asst. Prof. Sonal Saxena for her guidance on the project
throughout numerous consultations. I would also like to extend my deepest gratitude to all
those who have directly and indirectly guided me in writing this project report.
Many people, especially my classmates and friends, have made valuable
comments and suggestions on this proposal, which inspired me to improve my project.
I thank everyone who helped, directly or indirectly, to complete this project
report.
The author
Table of Contents
5 Applications
6 Source Code
6.1 main.py
6.2 Selenium_Web.py
6.3 News.py
6.4 YT_auto.py
7 Future Scope
8 Conclusion
9 Reference and Bibliography
VIRTUAL ASSISTANT
1. INTRODUCTION
In today’s era almost all tasks are digitalized. We have smartphones in our hands, and that is
nothing less than having the world at our fingertips. These days we are not even using our fingers.
We just speak the task and it is done. There exist systems where we can say, “Text Dad, ‘I’ll
be late today,’” and the text is sent. That is the task of a virtual assistant. It also supports
specialized tasks such as booking a flight, or finding the cheapest book online across various
e-commerce sites and then providing an interface to place the order, which helps automate search,
discovery and online order operations.
Virtual assistants are software programs that help ease your day-to-day tasks, such
as showing the weather report, creating reminders, making shopping lists etc. They can take
commands via text (online chatbots) or by voice. Voice-based intelligent assistants need an
invoking word, or wake word, to activate the listener, followed by the command. There are many
virtual assistants, such as Apple’s Siri, Amazon’s Alexa and Microsoft’s Cortana. For this
project, the wake word chosen is JIA.
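The wake-word step can be sketched on the recognized text alone. The following is a minimal illustration, assuming the speech has already been converted to a string; the function name extract_command and the punctuation handling are assumptions made for this sketch, not part of JIA's actual source.

```python
from typing import Optional

WAKE_WORD = "jia"  # wake word named in the report; the parsing rules below are assumed


def extract_command(transcript: str, wake_word: str = WAKE_WORD) -> Optional[str]:
    """Return the command that follows the wake word, or None if it is absent."""
    words = transcript.lower().strip().split()
    if not words:
        return None
    # Tolerate punctuation attached to the wake word, e.g. "Jia, play music".
    first = words[0].strip(",.!?")
    if first != wake_word:
        return None
    return " ".join(words[1:])


print(extract_command("Jia, what is the weather"))
print(extract_command("what is the weather"))
```

Anything spoken without the wake word returns None, so the listener can simply ignore it.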
Voice search is increasingly competing with text search. Web searches conducted via mobile
devices have only just overtaken those carried out using a computer, and analysts are
already predicting that 50% of searches will be via voice by 2020. Virtual assistants are turning
out to be smarter than ever. An intelligent assistant can make email work for the user by
detecting intent, picking out important information, automating processes and delivering personalized
responses.
This project was started on the premise that there is a sufficient amount of openly
available data and information on the web that can be utilized to build a virtual assistant
capable of making intelligent decisions for routine user activities.
1.1 BACKGROUND
There already exist a number of desktop virtual assistants. A few examples of virtual
assistants currently available in the market are discussed in this section, along with the tasks
they support and their drawbacks.
SIRI is personal assistant software that interacts with the user through a voice interface,
recognizes commands and acts on them. It learns to adapt to the user’s speech and thus improves
voice recognition over time. It also tries to converse with the user when it does not understand the
user’s request.
It integrates with the calendar, contacts and music library applications on the device, as well
as with the device’s GPS and camera. It uses location, temporal, social and task-based
contexts to personalize the agent’s behavior to the user at a given point in
time.
Supported Tasks
Drawback
SIRI does not maintain a knowledge database of its own; its understanding comes
from the information captured in domain models and data models.
ReQall
ReQall is personal assistant software that runs on smartphones running the Apple iOS or
Google Android operating systems. It helps the user recall notes as well as tasks within a
location and time context. It records user inputs and converts them into commands, and
monitors the current stack of user tasks to proactively suggest actions while considering any
changes in the environment. It also presents information based on the context of the user, and
filters information for the user based on its learned understanding of the priority of that
information.
Supported Tasks
• Reminders
• Email
• Calendar, Google Calendar
• Outlook
• Evernote
• Facebook, LinkedIn
• News Feeds
Drawback
It takes some time to enter all of the to-do items; you could spend more time
putting the entries in than actually doing the revision.
1.2 OBJECTIVES
Virtual assistants can save you a tremendous amount of time. We spend hours on online research
and then on writing the report in our own words. JIA can do that for you. Provide a
topic for research and continue with your tasks while JIA does the research. Another difficult
task is remembering test dates, birthdays or anniversaries. It comes as a surprise when you
enter the class and realize there is a class test today. Just tell JIA about your tests ahead of
time and she will remind you well in advance so you can prepare.
One of the main advantages of voice search is its speed. In fact, voice is reputed
to be about four times faster than a written search: whereas we can write about 40 words per minute,
we are capable of speaking around 150 in the same period of time. In this respect, the
ability of personal assistants to accurately recognize spoken words is a prerequisite for their
adoption by consumers.
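The "four times faster" figure follows directly from the two rates quoted above. A small worked calculation, using only those two numbers (the helper name seconds_to_enter and the 10-word example query are illustrative assumptions):

```python
TYPING_WPM = 40     # words per minute when writing, as quoted above
SPEAKING_WPM = 150  # words per minute when speaking, as quoted above


def seconds_to_enter(words: int, words_per_minute: float) -> float:
    """Time needed to enter a query of the given length at the given rate."""
    return words / words_per_minute * 60


speedup = SPEAKING_WPM / TYPING_WPM
print(f"speaking is {speedup:.2f} times faster than writing")
print(f"10-word query: {seconds_to_enter(10, TYPING_WPM):.0f} s typed, "
      f"{seconds_to_enter(10, SPEAKING_WPM):.0f} s spoken")
```

The ratio is 3.75, which rounds to the commonly quoted factor of four.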
1.3 PURPOSE, SCOPE AND APPLICABILITY
Purpose
The purpose of a virtual assistant is to be capable of voice interaction, music playback,
making to-do lists, setting alarms, streaming podcasts, playing audiobooks, and providing
weather, traffic, sports, and other real-time information, such as news. Virtual assistants
enable users to speak natural-language voice commands in order to operate the device and its
apps.
Scope
Voice assistants will continue to offer more individualized experiences as they get
better at differentiating between voices. However, it is not just developers who need to address
the complexity of developing for voice; brands also need to understand the capabilities of
each device and integration, and whether it makes sense for their specific brand. They will also need
to focus on maintaining a consistent user experience in the coming years as
complexity becomes more of a concern. This is because voice assistants lack a visual
interface: users simply cannot see or touch a voice interface.
Applicability
The mass adoption of artificial intelligence in users’ everyday lives is also fueling the
shift towards voice. The growing number of IoT devices, such as smart thermostats and speakers, is
giving voice assistants more utility in a connected user’s life. Smart speakers are currently the
most common way voice is being used. Many industry experts even predict that nearly every
application will integrate voice technology in some way in the next 5 years.
The use of virtual assistants can also enhance the Internet of Things (IoT) ecosystem.
Twenty years from now, Microsoft and its competitors may be offering personal digital
assistants that provide the services of a full-time employee, something usually reserved for the
rich and famous.
2. SURVEY OF TECHNOLOGY
Python
Python is a general-purpose language whose use cannot
be limited to any one activity. Its growing popularity has carried it into some of the
most popular and complex fields, such as Artificial Intelligence (AI), Machine Learning (ML),
natural language processing and data science. Python has libraries for every need of
this project. For JIA, the libraries used are SpeechRecognition to recognize voice, pyttsx3 for text to
speech, selenium for web automation etc.
Python is reasonably efficient. Efficiency is usually not a problem for small examples.
If your Python code is not efficient enough, a general procedure to improve it is to find out
what is taking most of the time and implement just that part more efficiently in some lower-level
language. This will result in much less programming and more efficient code (because you
will have more time to optimize) than writing everything in a low-level language.
pyttsx3
pyttsx3 is a Python text-to-speech library. It is a cross-platform Python wrapper for
text-to-speech synthesis, supporting common text-to-speech engines on Mac
OS X, Windows, and Linux. It works with both Python 2.x and 3.x. Its main advantage
is that it works offline.
Speech Recognition
SpeechRecognition is a library for performing speech recognition, with support for several engines
and APIs, online and offline. It supports APIs such as the Google Cloud Speech API, IBM Speech to
Text and Microsoft Bing Voice Recognition.
Selenium Webdriver
Selenium WebDriver is an automated testing framework used for the validation of websites
(and web applications). It supports popular programming languages such as Python, C#, Java,
Ruby, and more.
Selenium WebDriver was introduced in Selenium v2. As Selenium WebDriver communicates
with a web browser using its corresponding browser driver, it does not require a component
like Selenium RC Server (as in Selenium RC).
Requests
The requests module allows you to send HTTP requests using Python.
The HTTP request returns a Response Object with all the response data (content, encoding,
status, etc).
The requests library is the de facto standard for making HTTP requests in Python. It abstracts
the complexities of making requests behind a beautiful, simple API so that you can focus on
interacting with services and consuming data in your application.
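The request and response cycle can be illustrated without network access. The sketch below builds the query URL with the standard library and decodes a hand-written sample payload; with requests, the equivalent would be requests.get(url, params=params) followed by .json() on the response. The endpoint is the one that appears in News.py; the key value DEMO_KEY, the sample payload and the helper names are placeholders assumed for illustration.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://newsapi.org/v2/everything"  # endpoint that appears in News.py


def build_url(query: str, api_key: str) -> str:
    """Encode the query parameters into the request URL."""
    return BASE_URL + "?" + urlencode({"q": query, "apiKey": api_key})


def top_titles(body: str, limit: int = 3) -> list:
    """Decode a JSON response body and return up to `limit` article titles."""
    data = json.loads(body)
    return [article["title"] for article in data.get("articles", [])[:limit]]


# Hand-written stand-in for a response body (not real API output).
SAMPLE_BODY = '{"articles": [{"title": "First headline"}, {"title": "Second headline"}]}'

url = build_url("voice assistant", "DEMO_KEY")
titles = top_titles(SAMPLE_BODY)
print(url)
print(titles)
```

Passing parameters as a dictionary rather than concatenating strings keeps spaces and special characters in the query correctly encoded.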
3. REQUIREMENT AND ANALYSIS
Usually, a user needs to manually manage multiple applications to complete one
task. For example, a user trying to make a travel plan needs to check the airport codes of
nearby airports and then check travel sites for tickets between combinations of airports to
reach the destination. There is a need for a system that can manage such tasks effortlessly.
We already have multiple virtual assistants, but we hardly use them. Many
people have issues with voice recognition. These systems can understand English phrases,
but they fail to recognize them in our accent; our pronunciation is quite distinct from the
accents they handle well. Also, they are easier to use on mobile devices than on desktop systems.
There is a need for a virtual assistant that can understand English spoken in an Indian accent
and work on a desktop system.
When a virtual assistant is not able to answer questions accurately, it is because it lacks
the proper context or does not understand the intent of the question. Its ability to answer
questions relevantly comes only with rigorous optimization, involving both humans and
machine learning. Continuous, solid quality control also helps manage
the risk of the virtual assistant learning undesired behaviors. Such systems require a large amount of
information to be fed in order to work efficiently.
A virtual assistant should be able to model complex task dependencies and use these
models to recommend optimized plans for the user. It needs to be tested for finding optimum
paths when a task has multiple sub-tasks and each sub-task can have its own sub-tasks. In such
a case there can be multiple possible paths, and it should be able to consider user
preferences, other active tasks and priorities in order to recommend a particular plan.
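One simple way to model tasks with nested sub-tasks is a tree whose siblings are ordered by priority. The sketch below is only an illustration under assumptions: the Task fields, the "higher number is more urgent" convention and the plan function are hypothetical, and a real planner would also weigh user preferences and other active tasks.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    name: str
    priority: int = 0  # higher value = more urgent (assumed convention)
    subtasks: List["Task"] = field(default_factory=list)


def plan(task: Task) -> List[str]:
    """Return an execution order: sub-tasks before their parent,
    siblings in descending order of priority."""
    order: List[str] = []
    for sub in sorted(task.subtasks, key=lambda t: -t.priority):
        order.extend(plan(sub))
    order.append(task.name)
    return order


trip = Task("book trip", subtasks=[
    Task("compare fares", priority=1),
    Task("find airports", priority=2,
         subtasks=[Task("look up airport codes", priority=1)]),
])
print(plan(trip))
```

For the travel example this yields the airport lookup first, then the airport search, then the fare comparison, and finally the booking itself.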
3.2 REQUIREMENT SPECIFICATION
Personal assistant software is required to act as an interface to the digital world by
understanding user requests or commands and then translating them into actions or
recommendations based on the agent’s understanding of the world.
JIA focuses on relieving the user of entering text by using voice as the primary
means of input. The agent applies voice recognition algorithms to this input and records
it. It then uses this input to call one of the personal information management
applications, such as a task list or calendar, to record a new entry, or to search for it on search
engines like Google, Bing or Yahoo. The focus is on capturing the user input through voice,
recognizing the input and then executing the task if the agent understands it. The software
takes this input in natural language, which makes it easier for users to state what they
want done.
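The choice between a personal information management application and a web search can be sketched as a keyword lookup with a search fallback. The keyword table and the route function below are hypothetical, chosen only to illustrate the idea; the report does not specify how JIA makes this decision.

```python
from urllib.parse import quote_plus

# Hypothetical keyword table: which words send a command to which application.
ROUTES = {
    "remind": "task list",
    "todo": "task list",
    "meeting": "calendar",
    "appointment": "calendar",
}


def route(command: str):
    """Return (target, payload) for a recognized command string."""
    text = command.lower()
    for keyword, target in ROUTES.items():
        if keyword in text:
            return target, command
    # Nothing matched: fall back to a web search for the spoken phrase.
    return "search", "https://www.google.com/search?q=" + quote_plus(command)


print(route("Remind me to buy milk"))
print(route("capital of France"))
```

Keeping the table separate from the routing logic means new applications can be added without touching the function.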
Voice recognition software enables hands-free use of applications and lets users
query or command the agent through a voice interface. This gives users access to the
agent while performing other tasks and thus enhances the value of the system itself. JIA also has
ubiquitous connectivity through a Wi-Fi or LAN connection, enabling distributed applications
that can leverage other APIs exposed on the web without the need to store them locally.
Feasibility Study
A feasibility study helps determine whether or not to proceed with
a project. It is essential to evaluate the cost and
benefit of the proposed system. Five types of feasibility are taken into consideration.
1. Technical feasibility: This covers the technologies needed for the project, both
hardware and software. For a virtual assistant, the user must have a microphone to convey
their message and a speaker to listen when the system speaks. These are very cheap nowadays
and most people already possess them. Besides, the system needs an internet connection;
while using JIA, make sure you have a steady internet connection. This is also not an
issue in an era where almost every home or office has Wi-Fi.
2. Operational feasibility: This is the ease and simplicity of operation of the proposed system.
The system does not require any special skill set to operate. In fact, it is
designed to be used by almost everyone. Kids who do not yet know how to write can read
out problems to the system and get answers.
3. Economical feasibility: Here, we find the total cost and benefit of the proposed
system over the current system. For this project, the main cost is the documentation cost. The user
would also have to pay for a microphone and speakers. Again, these are cheap and readily
available. As far as maintenance is concerned, JIA will not cost much.
4. Organizational feasibility: This shows the management and organizational structure
of the project. This project is not built by a team; the management tasks are all
carried out by a single person. That avoids management issues and
increases the feasibility of the project.
5. Cultural feasibility: This deals with the compatibility of the project with its cultural
environment. The virtual assistant is built in accordance with the general culture. The
project is named JIA so as to represent Indian culture without undermining local
beliefs.
This project is technically feasible, with no hardware requirements beyond a microphone and
speakers. It is also simple in operation and does not incur training or repair costs. Overall, the
feasibility study reveals that the goals of the proposed system are achievable. The decision is
taken to proceed with the project.
3.3 HARDWARE AND SOFTWARE REQUIREMENTS
Hardware:
Software:
• Windows 7(32-bit) or above.
• Python 2.7 or later
• Chrome Driver
• Selenium Web Automation
• Any Python Code Editor
4. SYSTEM DESIGN
4.1 ER DIAGRAM
The ER diagram shows the entities and their relationships for a virtual assistant system.
A user of the system can have keys and values, which can be used to store any
information about the user. Say, for the key “name” the value can be “Jim”. The user might
like to keep some keys secure; for those, the user can enable a lock and set a password (voice clip).
A single user can ask multiple questions. Each question is given an ID by which it is
recognized, along with the query and its corresponding answer. A user can also have any
number of tasks. Each task has its own unique ID and a status, i.e. its current state. A
task also has a priority value and a category indicating whether it is a parent task or a child task
of an older task.
4.2 ACTIVITY DIAGRAM
Initially, the system is in idle mode. When it receives a wake-up call, it begins execution.
The received command is identified as either a question or a task to be performed, and the
specific action is taken accordingly. After the question has been answered or the task has been
performed, the system waits for another command. This loop continues until it receives the quit
command. At that moment, it goes back to sleep.
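The idle, awake and quit states described above can be sketched as a small loop over already-recognized phrases. This is an illustrative sketch only: the function name run_assistant, the rule that a phrase starting with a question word counts as a question, and the log format are all assumptions.

```python
QUESTION_WORDS = ("what", "who", "when", "where", "why", "how")  # assumed heuristic


def run_assistant(heard, wake_word="jia"):
    """Process an iterable of recognized phrases and return a log of actions."""
    log = []
    awake = False
    for phrase in heard:
        text = phrase.lower().strip()
        if not awake:
            if text == wake_word:      # the wake-up call ends idle mode
                awake = True
                log.append("awake")
            continue                   # anything else is ignored while idle
        if text == "quit":             # the quit command sends it back to sleep
            awake = False
            log.append("sleep")
        elif text.startswith(QUESTION_WORDS):
            log.append("answer: " + text)
        else:
            log.append("task: " + text)
    return log


print(run_assistant(["hello", "jia", "what is python", "set an alarm", "quit", "play music"]))
```

Phrases heard before the wake word or after quit produce no action, which matches the idle state in the diagram.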
4.3 CLASS DIAGRAM
The User class has two attributes: the command that it sends as audio and the response it
receives, which is also audio. It performs functions to listen to the user command, interpret it and
then reply or send back a response accordingly. The Question class holds the command in string form,
as interpreted by the Interpret class. It passes the command to the general, about or search function
based on its identification.
The Task class also holds the interpreted command in string format. It has various functions
such as reminder, note, mimic, research and reader.
4.4 USE CASE DIAGRAM
In this project there is only one user. The user issues a command to the system. The system
then interprets it and fetches the answer. The response is sent back to the user.
4.5 SEQUENCE DIAGRAM
The sequence diagram shows how the answer to a question asked by the user is fetched
from the internet. The audio query is interpreted and sent to the web scraper. The web scraper
searches for and finds the answer. It is then sent back to the speaker, which speaks the answer to
the user.
4.5.2 Sequence diagram for Task Execution
The user sends a command to the virtual assistant in audio form. The command is passed to
the interpreter, which identifies what the user has asked and directs it to the task executor. If the
task is missing some information, the virtual assistant asks the user for it. The received
information is passed to the task, which is then accomplished. After execution, feedback is sent
back to the user.
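The "ask the user back" step can be sketched as filling in whichever required fields are still missing before the task runs. In the sketch below, the REQUIRED table, the field names and the ask callback are hypothetical; in a real assistant, ask would speak the prompt and listen for the reply rather than read from a dictionary.

```python
REQUIRED = {"reminder": ["what", "when"]}  # hypothetical: fields each task kind needs


def execute_task(kind, info, ask):
    """Fill in missing fields by calling ask(field), then return feedback text."""
    details = dict(info)
    for name in REQUIRED.get(kind, []):
        if not details.get(name):
            details[name] = ask(name)  # the assistant asks the user for the missing field
    summary = ", ".join(f"{key}={details[key]}" for key in sorted(details))
    return f"{kind} done ({summary})"


# Stand-in for the user's spoken replies.
answers = {"when": "9 am"}
feedback = execute_task("reminder", {"what": "class test"}, ask=lambda name: answers[name])
print(feedback)
```

Only the missing field ("when") triggers a question; the field supplied in the original command is used as given.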
4.6 DATA FLOW DIAGRAM
4.6.3 DFD Level 2
Data Flow in Kid Zone
4.7 COMPONENT DIAGRAM
The main component here is the Virtual Assistant. It provides two specific services:
executing a task or answering a question.
4.8 DATA DICTIONARY
User
Field      Data type
Key        Text
Value      Text
Lock       Boolean
Password   Text

Question
Field      Data type   Constraint
Qid        Integer     PRIMARY KEY
Query      Text
Answer     Text

Task
Field      Data type   Constraint / allowed values
Tid        Integer     PRIMARY KEY
Status     Text        Active/Waiting/Stopped
Level      Text        Parent/Sub
Priority   Integer
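The data dictionary maps naturally onto three database tables. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the choice of SQLite, the lower-case column names and the CHECK constraints for the allowed values are assumptions made for illustration, since the report does not name a storage engine.

```python
import sqlite3

SCHEMA = """
CREATE TABLE user (
    "key"    TEXT,
    value    TEXT,
    lock     INTEGER,   -- Boolean stored as 0 or 1
    password TEXT
);
CREATE TABLE question (
    qid     INTEGER PRIMARY KEY,
    "query" TEXT,
    answer  TEXT
);
CREATE TABLE task (
    tid      INTEGER PRIMARY KEY,
    status   TEXT CHECK (status IN ('Active', 'Waiting', 'Stopped')),
    level    TEXT CHECK (level IN ('Parent', 'Sub')),
    priority INTEGER
);
"""

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript(SCHEMA)
conn.execute("INSERT INTO user VALUES (?, ?, ?, ?)", ("name", "Jim", 0, None))
conn.execute("INSERT INTO task (status, level, priority) VALUES (?, ?, ?)",
             ("Active", "Parent", 1))
rows = conn.execute('SELECT "key", value FROM user').fetchall()
print(rows)
```

The CHECK constraints reject any status or level outside the values listed in the dictionary.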
4.9 TEST CASE DESIGN
• Test Case 1
Test Title: Response Time
Test ID: T1
Test Objective: To make sure that the system’s response time is acceptable.
Description:
Time is critical in a voice-based system. Because inputs are spoken rather than typed,
the system must also reply within a moment. The user must get an instant response to the
query made.
• Test Case 2
Test Title: Accuracy
Test ID: T2
Test Objective: To ensure that the answers retrieved by the system are accurate as per the gathered data.
Description:
A virtual assistant system is mainly used to get precise answers to the questions asked.
Getting an answer in a moment is of no use if the answer is not correct. Accuracy is of utmost
importance in a virtual assistant system.
• Test Case 3
Test Title: Approximation
Test ID: T3
Test Objective: To ensure that the system returns an approximate value where an exact value is not practical.
Description:
There are times when a mathematical calculation calls for an approximate value. For
example, if someone asks for the value of pi, the system must respond with an approximate value and
not attempt the exact value. Getting the exact value in such cases is undesirable.
Note: A few more test cases might be included, and these test cases are subject to change
with the final software development.
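The three test cases could be automated along the following lines. This is a sketch under stated assumptions: fake_assistant is a stand-in with canned answers rather than the real system, the two-second budget is a hypothetical threshold, and the check function names are illustrative.

```python
import math
import time


def fake_assistant(query):
    """Stand-in for the system under test (hypothetical canned answers)."""
    answers = {
        "capital of india": "New Delhi",
        "value of pi": f"{math.pi:.2f}",  # approximate on purpose
    }
    return answers.get(query.lower(), "sorry, I do not know")


MAX_SECONDS = 2.0  # hypothetical response-time budget for T1


def check_response_time():
    """T1: the reply must arrive within the time budget."""
    start = time.perf_counter()
    fake_assistant("capital of India")
    return time.perf_counter() - start < MAX_SECONDS


def check_accuracy():
    """T2: every answer must match the expected one."""
    expected = {"capital of India": "New Delhi"}
    return all(fake_assistant(q) == a for q, a in expected.items())


def check_approximation():
    """T3: pi is reported to two decimal places, close to the true value."""
    reply = fake_assistant("value of pi")
    return reply == "3.14" and abs(float(reply) - math.pi) < 0.01


results = {check.__name__: check()
           for check in (check_response_time, check_accuracy, check_approximation)}
print(results)
```

Replacing fake_assistant with a call into the real system would turn the same three checks into an end-to-end test.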
5. APPLICATIONS
Voice-enabled devices provide a wide variety of services, ranging from simple commands such as
providing information about the weather of a place, general information from Wikipedia or a
movie rating from IMDB, to setting an alarm or reminder, creating a to-do list and adding items to
the shopping list so that we do not forget them when we go shopping. They can also read books to
the user or play music from a streaming service, depending on the device provider or user
preference, and play videos from YouTube or other streaming services.
According to a recent study, voice assistants are also being used to assist public interactions with
the government, and the workload on humans decreases by 30% when voice
assistants are used in call centers.
6. SOURCE CODE
6.1 main.py
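import pyttsx3 as p
import speech_recognition as sr
import randfacts as rf
from Selenium_Web import *
from YT_auto import *
from News import *

# Configure the text-to-speech engine.
engine = p.init()
engine.setProperty('rate', 180)              # speaking rate in words per minute
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[0].id)    # use the first installed voice

def speak(text):
    """Speak the given text aloud and block until it has finished."""
    engine.say(text)
    engine.runAndWait()

r = sr.Recognizer()
# Assumption: the listing does not show how text2 is captured; one way to
# obtain it with the SpeechRecognition library is sketched here.
with sr.Microphone() as source:
    r.adjust_for_ambient_noise(source)
    audio = r.listen(source)
text2 = r.recognize_google(audio).lower()

if "information" in text2:
    speak("you need information related to which topic?")
elif "news" in text2:
    print("sure sir, Now i will read news for you")
    speak("sure sir, Now i will read news for you")
    arr = news()
    for headline in arr:
        print(headline)
        speak(headline)
elif "fact" in text2:    # also matches "facts"
    x = rf.get_fact()
    print(x)
    speak("did you know that, " + x)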
6.2 Selenium_Web.py
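from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

class infow():
    """Looks up a topic on Wikipedia in a Chrome window."""
    def __init__(self):
        # Raw string keeps the backslash in the Windows path literal.
        # Service and By are the Selenium 4 forms of executable_path
        # and find_element_by_xpath.
        self.driver = webdriver.Chrome(
            service=Service(r'D:\chromedriver_win32/chromedriver.exe'))

    def get_info(self, query):
        self.query = query
        self.driver.get(url="https://www.wikipedia.org")
        # Type the query into the search box and submit the form.
        search = self.driver.find_element(By.XPATH, '//*[@id="searchInput"]')
        search.click()
        search.send_keys(query)
        enter = self.driver.find_element(By.XPATH, '//*[@id="search-form"]/fieldset/button')
        enter.click()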
6.3 News.py
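import requests
from ss import key    # API key kept in a separate module

api_address = "https://newsapi.org/v2/everything?q=keyword&apiKey=" + key

def news():
    """Return the titles of the first three articles as numbered sentences."""
    json_data = requests.get(api_address).json()
    ar = []    # built per call so repeated calls do not accumulate duplicates
    for i in range(3):
        ar.append("number " + str(i + 1) + ", " + json_data["articles"][i]["title"] + ".")
    return ar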
6.4 YT_auto.py
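from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

class music():
    """Plays the first YouTube search result for a query."""
    def __init__(self):
        self.driver = webdriver.Chrome(
            service=Service(r'D:\chromedriver_win32/chromedriver.exe'))

    # Assumption: the listing omits the method header that supplies query;
    # "play" is a placeholder name.
    def play(self, query):
        self.driver.get(url="https://www.youtube.com/results?search_query=" + query)
        video = self.driver.find_element(By.XPATH, '//*[@id="title-wrapper"]')
        video.click()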
7. FUTURE SCOPE
Voice assistants are one of many ‘smart’ devices making their way into our lives that make up
the Internet of Things. These are devices connected to wireless networks that can
communicate with other devices within our homes.
In 2017, there were an estimated 27 billion Internet of Things devices, and this is expected to
grow by 12% every year to reach more than 125 billion devices by 2030. It is clear that big tech
companies are investing heavily in these devices, so it will be interesting to consider how
they will have developed by 2030.
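The growth figures can be checked with a compound-growth calculation. The sketch below uses only the numbers quoted above (27 billion devices in 2017, 12% yearly growth, 125 billion by 2030); the variable names are illustrative.

```python
START_YEAR, START_DEVICES = 2017, 27e9     # figures quoted above
TARGET_YEAR, TARGET_DEVICES = 2030, 125e9
QUOTED_RATE = 0.12

years = TARGET_YEAR - START_YEAR

# Devices in 2030 if the 2017 figure grows by 12% each year.
projected = START_DEVICES * (1 + QUOTED_RATE) ** years

# Yearly growth rate that would be needed to reach 125 billion exactly.
implied_rate = (TARGET_DEVICES / START_DEVICES) ** (1 / years) - 1

print(f"projected at 12%: {projected / 1e9:.1f} billion")
print(f"rate implied by 125 billion: {implied_rate:.1%}")
```

Compounding 12% over 13 years gives roughly 118 billion devices, and reaching 125 billion would need about 12.5% a year, so the two quoted figures agree to within rounding of the growth rate.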
7.1 WHAT TASKS WILL VOICE ASSISTANTS BE ABLE TO DO IN THE FUTURE?
The ambitions of some companies show what voice assistants could successfully do in the
future. As an example, Google has developed Google Duplex, which is a technology that can
conduct natural conversations and carry out “real world” tasks over the phone.
Users can ask questions or give commands through Google Assistant and this can then use
Google Duplex to carry out specific tasks such as making a restaurant booking or scheduling
a hair appointment.
7.2 WHAT ROLES WILL BE AFFECTED BY VOICE ASSISTANTS?
Using Google Duplex as an example, roles that heavily require phone calling may be open to
disruption. Google Duplex is seen as beneficial to businesses by being able to arrange
appointments, gather information from different sources and address accessibility and
language barriers.
Therefore, roles such as a PA could be changed with improvements in voice assistant
technology. However, they will not necessarily be made redundant, as day to day interactions
can be complex for a computer and there would need to be a human operator to defer to from
time to time in order for a difficult task to be completed. It could be that a PA turns into a VA
manager.
8. CONCLUSION
Voice-controlled devices use Natural Language Processing to process the language spoken
by a human, understand the query and respond with
the result. For the device to understand, Artificial Intelligence needs to be
integrated with it so that it can work in a smart way, control IoT
applications and devices, and respond to queries by searching the web for results
and processing them. It is designed to minimize human effort and to control the device with just
the human voice. The device can also be designed to interact with other intelligent
voice-controlled devices such as IoT applications and devices, fetch the weather report of a city
from the Internet, send an email to a client, add events to the calendar, etc. The accuracy of the
devices can be increased using machine learning, by categorizing queries into particular
result sets and using them in further queries. The accuracy of such devices has increased
considerably over the last decade. The devices can also be designed to accept commands in
more than one language and respond in the language used by the user. The device
can also be designed to help visually impaired people.
REFERENCE AND BIBLIOGRAPHY
• Websites referred
▪ www.stackoverflow.com
▪ www.pythonprogramming.net
▪ www.codecademy.com
▪ www.tutorialspoint.com
▪ www.google.co.in
• Books referred
▪ Python Programming - Kiran Gurbani
▪ Learning Python - Mark Lutz
• Documents referred
▪ Designing Personal Assistant Software for Task Management using Semantic
Web Technologies and Knowledge Databases
- Purushotham Botla
▪ Python code for Artificial Intelligence: Foundations of Computational Agents
- David L. Poole and Alan K. Mackworth
THANK YOU