PROJECT REPORT
ON
“Real Time Prediction of Taxi Demand using Recurrent Neural Networks”
2020-2021
This is to certify that the project titled “Real Time Prediction of Taxi Demand using
Recurrent Neural Networks” is the bona fide work carried out by Rishabh Khandelwal, Shahid
Ali, Nishant Pahwa, Sarthak Chaturvedi students of B.Tech. (ECE) of Jaipur Engineering
College And Research Centre, Jaipur affiliated to Rajasthan Technical University, Kota,
Rajasthan (India) during the academic year 2020-21, in partial fulfilment of the requirements for
the award of the degree of Bachelor of Technology (Electronics and Communication
Engineering), and that the project has not previously formed the basis for the award of any other
degree, diploma, fellowship or similar title.
DR. S. K. SINGH
(Assistant Prof.)
Department of ECE
ACKNOWLEDGEMENT
We thank the Head of the Department, Dr. Sandeep Vyas, for his constant
suggestions, support and encouragement towards the completion of the project
with perfection.
We express our heartfelt thanks to our project in-charge Mr. Vikas
Sharma and our project mentor Mr. S. K. Singh, Assistant Professor,
Department of Electronics and Communication Engineering, for their
sustained encouragement, constructive criticism and support throughout this
project work.
We thank all the staff members of the Department of Electronics
and Communication, who gave many suggestions from time to time that made
our project work better and well finished.
We also thank our parents and all our friends for their moral support
and guidance in finishing our project.
TABLE OF CONTENTS
CHAPTER No. TITLE
LIST OF FIGURES
1 INTRODUCTION
1.1 About the Project
2 SYSTEM ANALYSIS
ABSTRACT
Predicting taxi demand throughout a city can help to organize the taxi fleet and minimize the
wait-time for passengers and drivers. In this paper, we propose a sequence learning model
that can predict future taxi requests in each area of a city based on the recent demand and
other relevant information. Remembering information from the past is critical here, since
taxi requests in the future are correlated with information about actions that happened in the
past. For example, someone who requests a taxi to a shopping centre may also request a taxi
to return home after a few hours.
We use one of the best sequence learning methods, Long Short-Term Memory (LSTM), which has a gating
mechanism to store the relevant information for future use. We evaluate our method on a
data set of taxi requests in New York City by dividing the city into small areas and predicting
the demand in each area. We show that this approach outperforms other prediction methods,
such as feed-forward neural networks. In addition, we show how adding other relevant
information, such as weather, time, and drop-offs affects the results.
CHAPTER 1
INTRODUCTION
Aim:
To predict high-demand pickup locations for taxi services based on their previous
history.
Synopsis:
The service industry has been booming for the last couple of years and is expected to grow in
the near future. A core part of the business is serving the customer, and effectively utilizing
the resources at hand is the key factor. Businesses are using advanced technology to achieve
this.
CHAPTER 2
SYSTEM ANALYSIS
2.1 EXISTING SYSTEM
TAXI drivers need to decide where to wait for passengers in order to pick up someone
as soon as possible. Passengers also prefer to quickly find a taxi whenever they are ready for
pickup. The control centre of the taxi service decides which busy areas to concentrate on.
Sometimes taxis are scattered across a larger area, missing time-dependent busy areas such as
airports, business districts, schools and train stations.
Problem Statement:
Managing the taxi fleet across crowded areas.
Effective utilization of resources to reduce waiting time for passengers.
Serving more customers in a short time by organizing the availability of taxis.
2.2 PROPOSED SYSTEM
Effective taxi dispatching can help both drivers and passengers to minimize
the wait-time to find each other. Drivers do not have enough information about where
passengers and other taxis are and intend to go. Therefore, a taxi centre can organize the taxi
fleet and efficiently distribute it according to the demand from the entire city. To build
such a taxi centre, an intelligent system that can predict the future demand throughout the city
is required. Our system uses GPS location and other properties of the taxi trip, such as the
drop-off point and pickup point, to predict the future demand. A Recurrent Neural Network (RNN)
based model is trained with the given historical data. This model is used to predict the demand
in different areas of the city.
CHAPTER 3
REQUIREMENT SPECIFICATIONS
3.1 INTRODUCTION
TAXI drivers need to decide where to wait for passengers in order to pick up someone as soon
as possible. Passengers also prefer to quickly find a taxi whenever they are ready for pickup.
Effective taxi dispatching can help both drivers and passengers to minimize the wait-time to
find each other. Drivers do not have enough information about where passengers and other
taxis are and intend to go. Therefore, a taxi centre can organize the taxi fleet and efficiently
distribute them according to the demand from the entire city. This taxi centre is especially
needed in the future, where self-driving taxis need to decide where to wait and pick up
passengers. To build such a taxi centre, an intelligent system that can predict the future
demand throughout the city is required. Predicting taxi demand is challenging because it is
correlated with many pieces of underlying information. One of the most relevant sources of
information is historical taxi trips.
Thanks to Global Positioning System (GPS) technology, taxi trip information can be
collected from GPS-enabled taxis. Analysing this data shows that there are repetitive patterns
in the data that can help to predict the demand in a particular area at a specific time. Several
previous studies have shown that it is possible to learn from past taxi data. In this paper, we
propose a real-time method for predicting taxi demands in different areas of a city. We divide
a big city into smaller areas and aggregate the number of taxi requests in each area during a
small time period (e.g. 20 minutes). In this way, past taxi data becomes a data sequence of the
number of taxi requests in each area. Then, we train a Long Short-Term Memory (LSTM)
recurrent neural network (RNN) with this sequential data. The network input is the current
taxi demand and other relevant information, while the output is the demand in the next time-
step.
The reason we use an LSTM recurrent neural network is that it can be trained to store all
the relevant information in a sequence to predict particular outcomes in the future. In addition,
taxi demand prediction is a time series forecasting problem in which an intelligent sequence
analysis model is required. LSTMs are the state-of-the-art sequence learning models that are
widely used in many applications such as unsegmented handwriting generation and natural
language processing. LSTM is capable of learning long-term dependencies by utilizing
gating mechanisms to store information. Therefore, it can for instance remember how
many people have requested taxis to attend a concert and, after a couple of hours, use this
information to predict that the same number of people will request taxis from the concert
location to different areas.
However, predicting real-valued numbers is tricky because many times simply learning the
average of the values in the dataset does not give a valid solution. It will also confuse the
LSTM in the next time-step, since the network has not seen the average before. Therefore, we
add a Mixture Density Network (MDN) on top of the LSTM. In this way, instead of directly
predicting a demand value, we output a mixture distribution of the demand. A sample can be
drawn from this probability distribution and treated as the predicted taxi demand.
The remainder of this paper is organized as follows. Section II introduces related
works on prediction applications using past taxi data and sequential learning applications of
LSTMs. Section III shows how we encode the huge number of GPS records and gives a brief
explanation of recurrent neural networks. Section IV describes the proposed sequence learning
model, as well as the training and testing procedures. In Section V, we show the
performance metrics and present the experiment results. Lastly, in Section VI we conclude
the paper.
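To make the encoding step concrete, here is a minimal sketch (not the paper's exact pipeline) of how raw pickup records could be aggregated into a per-area demand sequence with 20-minute time steps; the pickup_datetime, pickup_lat and pickup_lon column names and the grid edges are assumptions for illustration:

import numpy as np
import pandas as pd

def encode_demand(trips, lat_bins, lon_bins, freq='20min'):
    # Aggregate raw pickup records into a (time steps x areas) demand matrix.
    # Assign each pickup to a rectangular grid cell (an "area").
    row = np.digitize(trips['pickup_lat'], lat_bins)
    col = np.digitize(trips['pickup_lon'], lon_bins)
    trips = trips.assign(area=row * (len(lon_bins) + 1) + col)
    # Count requests per area within each 20-minute window.
    counts = (trips
              .groupby([pd.Grouper(key='pickup_datetime', freq=freq), 'area'])
              .size()
              .unstack(fill_value=0))
    return counts  # each row is one time step, each column one area

Each row of the resulting matrix is one element of the sequence that is later fed to the LSTM.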
Python is a programming language that lets you work quickly and integrate systems
more efficiently.
Why Python?
Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc).
Python has a simple syntax similar to the English language.
Python has syntax that allows developers to write programs with fewer lines than
some other programming languages.
Python runs on an interpreter system, meaning that code can be executed as soon as it
is written. This means that prototyping can be very quick.
Python can be treated in a procedural way, an object-oriented way or a functional
way.
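For instance, the same small computation can be written in all three styles; a toy sketch:

# Procedural style: a plain function operating on its arguments.
def total(values):
    s = 0
    for v in values:
        s += v
    return s

# Object-oriented style: state and behaviour bundled in a class.
class Accumulator:
    def __init__(self):
        self.sum = 0

    def add(self, v):
        self.sum += v

acc = Accumulator()
for v in [1, 2, 3]:
    acc.add(v)

# Functional style: higher-order built-ins and no mutable state.
print(total([1, 2, 3]), acc.sum, sum(map(int, ['1', '2', '3'])))  # 6 6 6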
Good to know
The most recent major version of Python is Python 3, which we use in this
project. However, Python 2, although not being updated with anything other than
security updates, is still quite popular.
Python 2.0 was released in 2000, and the 2.x versions were the prevalent releases until
December 2008. At that time, the development team made the decision to release
version 3.0, which contained a few relatively small but significant changes that were
not backward compatible with the 2.x versions. Python 2 and 3 are very similar, and
some features of Python 3 have been backported to Python 2. But in general, they
remain not quite compatible.
Both Python 2 and 3 have continued to be maintained and developed, with periodic
release updates for both. As of this writing, the most recent versions available are
2.7.15 and 3.6.5. However, an official End Of Life date of January 1, 2020 has been
established for Python 2, after which time it will no longer be maintained.
Python is still maintained by a core development team, and Guido van Rossum is
still in charge, having been given the title of BDFL (Benevolent Dictator For Life) by
the Python community. The name Python, by the way, derives not from the snake, but
from the British comedy troupe Monty Python's Flying Circus, of which Guido was,
and presumably still is, a fan. It is common to find references to Monty Python
sketches and movies scattered throughout the Python documentation.
It is possible to write Python in an Integrated Development Environment, such as
Thonny, PyCharm, NetBeans or Eclipse, which are particularly useful when
managing larger collections of Python files.
Python was designed for readability, and has some similarities to the English
language with influence from mathematics.
Python uses new lines to complete a command, as opposed to other programming
languages which often use semicolons or parentheses.
Python relies on indentation, using whitespace, to define scope, such as the scope of
loops, functions and classes. Other programming languages often use curly brackets
for this purpose.
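A short example of both conventions: newlines terminate statements, and indentation alone delimits the bodies of the function and the loop:

def count_down(n):
    # The indented block below is the function body.
    while n > 0:
        print(n)   # one level deeper: inside the loop
        n -= 1
    print('done')  # back at function level: runs after the loop ends

count_down(3)  # prints 3, 2, 1, done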
Many languages are compiled, meaning the source code you create needs to be
translated into machine code, the language of your computer’s processor, before it can
be run. Programs written in an interpreted language are passed straight to an
interpreter that runs them directly.
This makes for a quicker development cycle because you just type in your code and
run it, without the intermediate compilation step.
One potential downside to interpreted languages is execution speed. Programs that are
compiled into the native language of the computer processor tend to run more quickly
than interpreted programs. For some applications that are particularly computationally
intensive, like graphics processing or intense number crunching, this can be limiting.
In practice, however, for most programs, the difference in execution speed is
measured in milliseconds, or seconds at most, and not appreciably noticeable to a
human user. The expediency of coding in an interpreted language is typically worth it
for most applications.
For all its syntactical simplicity, Python supports most constructs that would be
expected in a very high-level language, including complex dynamic data types,
structured and functional programming, and object-oriented programming.
Additionally, a very extensive library of classes and functions is available that
provides capability well beyond what is built into the language, such as database
manipulation or GUI programming.
Python accomplishes what many programming languages don’t: the language itself is
simply designed, but it is very versatile in terms of what you can accomplish with it.
Machine learning tasks are classified into several broad categories. In supervised learning, the
algorithm builds a mathematical model from a set of data that contains both the inputs and the
desired outputs. For example, if the task were determining whether an image contained a
certain object, the training data for a supervised learning algorithm would include images
with and without that object (the input), and each image would have a label (the output)
designating whether it contained the object. In special cases, the input may be only partially
available, or restricted to special feedback. Semi-supervised algorithms develop mathematical models
from incomplete training data, where a portion of the sample inputs do not have labels.
In unsupervised learning, the algorithm builds a mathematical model from a set of data that
contains only inputs and no desired output labels. Unsupervised learning algorithms are used
to find structure in the data, like grouping or clustering of data points. Unsupervised learning
can discover patterns in the data, and can group the inputs into categories, as in feature
learning. Dimensionality reduction is the process of reducing the number of "features", or
inputs, in a set of data.
Active learning algorithms access the desired outputs (training labels) for a limited set of
inputs based on a budget, and optimize the choice of inputs for which they will acquire training
labels. When used interactively, these inputs can be presented to a human user for
labeling. Reinforcement learning algorithms are given feedback in the form of positive or
labeling. Reinforcement learning algorithms are given feedback in the form of positive or
negative reinforcement in a dynamic environment and are used in autonomous vehicles or in
learning to play a game against a human opponent. Other specialized algorithms in machine
learning include topic modeling, where the computer program is given a set of natural
language documents and finds other documents that cover similar topics. Machine learning
algorithms can be used to find the unobservable probability density function in density
estimation problems. Meta learning algorithms learn their own inductive bias based on
previous experience. In developmental robotics, robot learning algorithms generate their own
sequences of learning experiences, also known as a curriculum, to cumulatively acquire new
skills through self-guided exploration and social interaction with humans. These robots use
guidance mechanisms such as active learning, maturation, motor synergies, and imitation.
The types of machine learning algorithms differ in their approach, the type of data they input
and output, and the type of task or problem that they are intended to solve.
Supervised learning:
Supervised learning algorithms build a mathematical model of a set of data that contains both
the inputs and the desired outputs. The data is known as training data, and consists of a set of
training examples. Each training example has one or more inputs and the desired output, also
known as a supervisory signal. In the mathematical model, each training example is
represented by an array or vector, sometimes called a feature vector, and the training data is
represented by a matrix. Through iterative optimization of an objective function, supervised
learning algorithms learn a function that can be used to predict the output associated with
new inputs. An optimal function will allow the algorithm to correctly determine the output
for inputs that were not a part of the training data. An algorithm that improves the accuracy of
its outputs or predictions over time is said to have learned to perform that task.
Supervised learning algorithms include classification and regression. Classification
algorithms are used when the outputs are restricted to a limited set of values, and regression
algorithms are used when the outputs may have any numerical value within a
range. Similarity learning is an area of supervised machine learning closely related to
regression and classification, but the goal is to learn from examples using a similarity
function that measures how similar or related two objects are. It has applications
in ranking, recommendation systems, visual identity tracking, face verification, and speaker
verification.
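As a minimal illustration (a sketch using scikit-learn, which this report does not otherwise rely on), the snippet below fits a regressor and a classifier on tiny hand-made datasets:

from sklearn.linear_model import LinearRegression, LogisticRegression

X = [[1], [2], [3], [4]]  # feature vectors, one input each

# Regression: the output can be any numerical value (here y = 2x).
reg = LinearRegression().fit(X, [2, 4, 6, 8])
print(reg.predict([[5]]))  # approximately [10.]

# Classification: the output is restricted to a limited set of labels.
clf = LogisticRegression().fit(X, [0, 0, 1, 1])
print(clf.predict([[1.5], [3.5]]))  # approximately [0 1]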
In the case of semi-supervised learning algorithms, some of the training examples are missing
training labels, but they can nevertheless be used to improve the quality of a model.
In weakly supervised learning, the training labels are noisy, limited, or imprecise; however,
these labels are often cheaper to obtain, resulting in larger effective training sets.
Unsupervised learning:
Unsupervised learning algorithms take a set of data that contains only inputs, and find
structure in the data, like grouping or clustering of data points. The algorithms, therefore,
learn from test data that has not been labeled, classified or categorized. Instead of responding
to feedback, unsupervised learning algorithms identify commonalities in the data and react
based on the presence or absence of such commonalities in each new piece of data. A central
application of unsupervised learning is in the field of density estimation in statistics, though
unsupervised learning encompasses other domains involving summarizing and explaining
data features.
Cluster analysis is the assignment of a set of observations into subsets (called clusters) so that
observations within the same cluster are similar according to one or more pre-designated
criteria, while observations drawn from different clusters are dissimilar. Different clustering
techniques make different assumptions on the structure of the data, often defined by
some similarity metric and evaluated, for example, by internal compactness, or the similarity
between members of the same cluster, and separation, the difference between clusters. Other
methods are based on estimated density and graph connectivity.
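A small clustering sketch with scikit-learn's k-means, grouping unlabeled 2-D points into two clusters:

import numpy as np
from sklearn.cluster import KMeans

# Two visually obvious groups of unlabeled points.
points = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                   [8.0, 8.0], [8.1, 7.9], [7.8, 8.2]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)           # e.g. [0 0 0 1 1 1]
print(kmeans.cluster_centers_)  # one centre near (1, 1), one near (8, 8)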
Semi-supervised learning:
Semi-supervised learning falls between unsupervised learning (without any labeled training
data) and supervised learning (with completely labeled training data). Many machine-
learning researchers have found that unlabeled data, when used in conjunction with a small
amount of labeled data, can produce a considerable improvement in learning accuracy.
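One concrete realization of this idea is self-training, where a base classifier's confident predictions on unlabeled points are recycled as labels; a sketch with scikit-learn's SelfTrainingClassifier (unlabeled points are marked with -1):

import numpy as np
from sklearn.svm import SVC
from sklearn.semi_supervised import SelfTrainingClassifier

X = np.array([[1.0], [1.1], [4.0], [4.2], [1.05], [3.9], [4.1], [0.95]])
y = np.array([0, 0, 1, 1, -1, -1, -1, -1])  # -1 marks unlabeled examples

model = SelfTrainingClassifier(SVC(probability=True)).fit(X, y)
print(model.predict([[1.02], [4.05]]))  # expected: [0 1]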
Introduction to RNN:
Humans do not start their thinking from scratch every second; we understand each word of a
sentence based on our understanding of the previous words. Traditional neural networks can't
do this, and it seems like a major shortcoming. For example, imagine you want to classify what
kind of event is happening at every point in a movie. It's unclear how a traditional neural
network could use its reasoning about previous events in the film to inform later ones.
Recurrent neural networks address this issue. They are networks with loops in them,
allowing information to persist.
This chain-like nature reveals that recurrent neural networks are intimately related to
sequences and lists. They're the natural neural network architecture to use for such data.
And they certainly are used! In the last few years, there has been incredible success
applying RNNs to a variety of problems: speech recognition, language modeling, translation,
image captioning... the list goes on. And they really are pretty amazing.
Essential to these successes is the use of “LSTMs,” a very special kind of recurrent
neural network which works, for many tasks, much better than the standard version. Almost all
exciting results based on recurrent neural networks are achieved with them. It's these LSTMs
that this chapter will explore.
One of the appeals of RNNs is the idea that they might be able to connect previous
information to the present task, such as using previous video frames to inform the
understanding of the present frame. If RNNs could do this, they'd be extremely useful. But can
they? It depends.
Sometimes, we only need to look at recent information to perform the present task. For
example, consider a language model trying to predict the next word based on the previous ones.
If we are trying to predict the last word in “the clouds are in the sky,” we don’t need any further
context – it’s pretty obvious the next word is going to be sky. In such cases, where the gap
between the relevant information and the place that it’s needed is small, RNNs can learn to use
the past information.
But there are also cases where we need more context. Consider trying to predict the last
word in the text “I grew up in France… I speak fluent French.” Recent information suggests that
the next word is probably the name of a language, but if we want to narrow down which
language, we need the context of France, from further back. It’s entirely possible for the gap
between the relevant information and the point where it is needed to become very large.
Unfortunately, as that gap grows, RNNs become unable to learn to connect the information.
LSTM Networks
Long Short Term Memory networks – usually just called “LSTMs” – are a special kind
of RNN, capable of learning long-term dependencies. They were introduced
by Hochreiter & Schmidhuber (1997), and were refined and popularized by many people in
following work. They work tremendously well on a large variety of problems, and are now
widely used.
LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering
information for long periods of time is practically their default behavior, not something they
struggle to learn!
All recurrent neural networks have the form of a chain of repeating modules of neural network.
In standard RNNs, this repeating module will have a very simple structure, such as a single tanh
layer.
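In code, the repeating module of a standard RNN is just this single tanh layer applied at every time step; a sketch in NumPy with random placeholder weights:

import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # One repeating module: the new hidden state is the tanh of a linear
    # map of the previous hidden state and the current input.
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
W_xh = rng.normal(size=(input_size, hidden_size))
W_hh = rng.normal(size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

h = np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):  # a length-5 input sequence
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)     # the same module is reused
print(h)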
LSTMs also have this chain like structure, but the repeating module has a different
structure. Instead of having a single neural network layer, there are four, interacting in a very
special way.
Don’t worry about the details of what’s going on. We’ll walk through the LSTM
diagram step by step later. For now, let’s just try to get comfortable with the notation we’ll be
using.
In the above diagram, each line carries an entire vector, from the output of one node to
the inputs of others. The pink circles represent pointwise operations, like vector addition, while
the yellow boxes are learned neural network layers. Lines merging denote concatenation, while
a line forking denotes its content being copied and the copies going to different locations.
The key to LSTMs is the cell state, the horizontal line running through the top of the
diagram.
The cell state is kind of like a conveyor belt. It runs straight down the entire chain, with
only some minor linear interactions. It’s very easy for information to just flow along it
unchanged.
The LSTM does have the ability to remove or add information to the cell state, carefully
regulated by structures called gates.
Gates are a way to optionally let information through. They are composed of a
sigmoid neural net layer and a pointwise multiplication operation.
The sigmoid layer outputs numbers between zero and one, describing how much of each
component should be let through. A value of zero means “let nothing through,” while a value of
one means “let everything through!”
An LSTM has three of these gates, to protect and control the cell state.
The first step in our LSTM is to decide what information we’re going to throw away
from the cell state. This decision is made by a sigmoid layer called the “forget gate layer.” It
looks at \(h_{t-1}\) and \(x_t\), and outputs a number between \(0\) and \(1\) for each number in
the cell state \(C_{t-1}\). A \(1\) represents “completely keep this” while a \(0\) represents
“completely get rid of this.”
Let’s go back to our example of a language model trying to predict the next word based
on all the previous ones. In such a problem, the cell state might include the gender of the present
subject, so that the correct pronouns can be used. When we see a new subject, we want to forget
the gender of the old subject.
The next step is to decide what new information we’re going to store in the cell state.
This has two parts. First, a sigmoid layer called the “input gate layer” decides which values we’ll
update. Next, a tanh layer creates a vector of new candidate values, \(\tilde{C}_t\), that could be
added to the state. In the next step, we’ll combine these two to create an update to the state.
In the example of our language model, we’d want to add the gender of the new subject to
the cell state, to replace the old one we’re forgetting.
It’s now time to update the old cell state, \(C_{t-1}\), into the new cell state \(C_t\). The
previous steps already decided what to do; we just need to actually do it.
We multiply the old state by \(f_t\), forgetting the things we decided to forget earlier.
Then we add \(i_t*\tilde{C}_t\). These are the new candidate values, scaled by how much we
decided to update each state value.
In the case of the language model, this is where we’d actually drop the information about
the old subject’s gender and add the new information, as we decided in the previous steps.
Finally, we need to decide what we’re going to output. This output will be based on our
cell state, but will be a filtered version. First, we run a sigmoid layer which decides what parts of
the cell state we’re going to output. Then, we put the cell state through \(\tanh\) (to push the
values to be between \(-1\) and \(1\)) and multiply it by the output of the sigmoid gate, so that
we only output the parts we decided to.
For the language model example, since it just saw a subject, it might want to output
information relevant to a verb, in case that’s what is coming next. For example, it might output
whether the subject is singular or plural, so that we know what form a verb should be conjugated
into if that’s what follows next.
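Putting the three gates together, one LSTM step can be sketched in NumPy as follows (the weights are random placeholders; a trained network would learn them):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b, n_hid):
    # Concatenate h_{t-1} and x_t, as in the walkthrough above, and compute
    # all four layers in one matrix multiplication.
    z = np.concatenate([h_prev, x_t]) @ W + b
    f = sigmoid(z[:n_hid])                      # forget gate f_t
    i = sigmoid(z[n_hid:2 * n_hid])             # input gate i_t
    c_tilde = np.tanh(z[2 * n_hid:3 * n_hid])   # candidate values C~_t
    o = sigmoid(z[3 * n_hid:])                  # output gate o_t
    c_t = f * c_prev + i * c_tilde              # update the cell state
    h_t = o * np.tanh(c_t)                      # filtered output
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.normal(size=(n_hid + n_in, 4 * n_hid))
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(rng.normal(size=n_in), h, c, W, b, n_hid)
print(h, c)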
One popular LSTM variant adds “peephole connections,” which let the gate layers
look at the cell state. The corresponding diagram adds peepholes to all the gates, but many
papers will give some peepholes and not others.
Another variation is to use coupled forget and input gates. Instead of separately deciding
what to forget and what we should add new information to, we make those decisions together.
We only forget when we’re going to input something in its place. We only input new values to
the state when we forget something older.
A slightly more dramatic variation on the LSTM is the Gated Recurrent Unit, or GRU,
introduced by Cho, et al. (2014). It combines the forget and input gates into a single “update
gate.” It also merges the cell state and hidden state, and makes some other changes. The
resulting model is simpler than standard LSTM models, and has been growing increasingly
popular.
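For reference, and in the same notation as above, the GRU computes an update gate \(z_t\), a reset gate \(r_t\), a candidate state \(\tilde{h}_t\) and the new hidden state \(h_t\):
\[
\begin{aligned}
z_t &= \sigma(W_z \cdot [h_{t-1}, x_t]) \\
r_t &= \sigma(W_r \cdot [h_{t-1}, x_t]) \\
\tilde{h}_t &= \tanh(W \cdot [r_t * h_{t-1}, x_t]) \\
h_t &= (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t
\end{aligned}
\]
Because \(z_t\) both erases old state and admits new state, the GRU has fewer parameters than an LSTM of the same hidden size.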
These are only a few of the most notable LSTM variants. There are lots of others, like
Depth Gated RNNs by Yao, et al. (2015). There are also completely different approaches to
tackling long-term dependencies, like Clockwork RNNs by Koutnik, et al. (2014).
Which of these variants is best? Do the differences matter? Greff, et al. (2015) do a nice
comparison of popular variants, finding that they're all about the same. Jozefowicz, et al.
(2015) tested more than ten thousand RNN architectures, finding some that worked better than
LSTMs on certain tasks.
Conclusion
Earlier, I mentioned the remarkable results people are achieving with RNNs. Essentially
all of these are achieved using LSTMs. They really work a lot better for most tasks!
Written down as a set of equations, LSTMs look pretty intimidating. Hopefully, walking
through them step by step in this chapter has made them a bit more approachable.
LSTMs were a big step in what we can accomplish with RNNs. It's natural to wonder: is
there another big step? A common opinion among researchers is: “Yes! There is a next step and
it's attention!” The idea is to let every step of an RNN pick information to look at from some
larger collection of information. For example, if you are using an RNN to create a caption
describing an image, it might pick a part of the image to look at for every word it outputs. In
fact, Xu, et al. (2015) do exactly this – it might be a fun starting point if you want to explore
attention! There have been a number of really exciting results using attention, and it seems like
a lot more are around the corner...
CHAPTER 4
The application on the client side controls and communicates with the following three main
general components:
an embedded browser in charge of the navigation and of accessing the web service;
Server Tier: the server side contains the main parts of the functionality of the
proposed architecture. The components at this tier are the following.
1. The software may be safety-critical. If so, there are issues associated with its integrity
level
2. The software may not be safety-critical although it forms part of a safety-critical system.
3. If a system must be of a high integrity level and if the software is shown to be of that
integrity level, then the hardware must be at least of the same integrity level.
4. There is little point in producing 'perfect' code in some language if hardware and system
5. If a computer system is to run software of a high integrity level then that system should
same environment.
CHAPTER 5
Fig. 5.1: Graph Display
Collaboration diagrams require use cases, system operation contracts and the domain model to
already exist. The collaboration diagram illustrates messages being sent between classes and
objects.
5.6 DATA FLOW DIAGRAM:
A Data Flow Diagram (DFD) is a graphical representation of the “flow” of data through an
information system, modeling its process aspects. It is a preliminary step used to create an
overview of the system, which can later be elaborated. DFDs can also be used for the
visualization of data processing.
Level 1:
Level 2:
Level 3:
CHAPTER 6
6.1 MODULES
Dataset pre-processing
Recurrent Neural Network
Prediction, result presentation
6.2 MODULE EXPLANATION:
A graph is plotted for the future prediction for the next time slot and the areas expected to be
crowded. This machine learning model predicts the future demand areas in a city based on the
neural network, and drivers are directed to wait in the areas that the system identifies as
demand areas.
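A minimal sketch of how such a module could be wired together with Keras (the demand_history.npy file, the window length and the network sizes are illustrative assumptions, not the report's exact configuration):

import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# history: (time steps x areas) demand matrix from the pre-processing module.
history = np.load('demand_history.npy')  # hypothetical saved matrix
window = 12                              # 12 x 20 min = 4 hours of context

# Build (window -> next step) training pairs from the sequence.
X = np.stack([history[i:i + window] for i in range(len(history) - window)])
y = history[window:]

model = Sequential([LSTM(64, input_shape=(window, history.shape[1])),
                    Dense(history.shape[1])])
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=10, batch_size=32)

# Predict the next time slot and plot the expected demand per area.
next_demand = model.predict(history[-window:][np.newaxis])[0]
plt.bar(range(len(next_demand)), next_demand)
plt.xlabel('Area')
plt.ylabel('Predicted requests')
plt.show()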
CHAPTER 7
CODING AND TESTING
7.1 CODING
Once the design aspect of the system is finalized, the system enters the coding and testing
phase. The coding phase brings the actual system into action by converting the design of the
system into code in a given programming language. Therefore, a good coding style has to
be adopted so that, whenever changes are required, they can easily be incorporated into the
system.
Coding standards improve the consistency and appearance of the program. They make the
code easier to read, understand and maintain. This phase of the system actually implements
the blueprint developed during the design phase. The coding specification should be in such a
way that any programmer must be able to understand the code and can bring about changes
whenever felt necessary. Some of the conventions followed are:
Naming conventions
Value conventions
Variable names should be self-descriptive; one should be able to get the meaning and scope
of a variable from its name. The conventions are adopted for easy understanding of the
intended message by the user.
Class names
Class names are problem-domain equivalents; they begin with a capital letter and use mixed
case, with the first letters of subsequent words in uppercase and the rest of the letters in
lowercase.
Program statements are to be properly aligned to facilitate easy understanding, and comments
are included to minimize the number of surprises that could occur when going through the
code.
To keep interaction with the user consistent, a specific format has been adopted for
displaying messages to the user.
Testing is a vital part of the entire development and maintenance process. The goal of testing
during this phase is to verify that the specification has been accurately and completely
incorporated into the design, as well as to ensure the correctness of the design itself. For
example, the design must not have any logic faults; if a fault in the design is detected before
coding commences, the cost of fixing it will be considerably lower than if it is found later.
Testing is one of the important steps in the software development phase. Testing checks
for errors; as a whole, testing the project involves the following test cases:
Static analysis is used to investigate the structural properties of the source code.
Dynamic testing is used to investigate the behaviour of the source code by executing
the software. Unit testing focuses on the smallest unit of the software design, i.e. the
module. The white-box testing techniques were heavily employed for unit testing.
Test cases are selected for which the expected results are known, as well as boundary values
and special values.
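For instance, the feature-encoding helpers in the code listing later in this chapter (such as hour_format) lend themselves to exactly this kind of test; a minimal pytest sketch, assuming the listing is saved as a module named app:

import numpy as np
from app import hour_format  # hypothetical module name for the listing below

def test_hour_format_boundaries():
    # Known expected results at the bucket boundaries: hour 6 is the last
    # "night" hour, hour 7 the first "morning" hour, and so on.
    assert np.array_equal(hour_format(6), [1, 0, 0, 0])
    assert np.array_equal(hour_format(7), [0, 1, 0, 0])
    assert np.array_equal(hour_format(12), [0, 1, 0, 0])
    assert np.array_equal(hour_format(13), [0, 0, 1, 0])
    assert np.array_equal(hour_format(23), [0, 0, 0, 1])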
Performance Test
Stress Test
Structure Test
7.4.3 PERFORMANCE TEST
It determines the amount of execution time spent in various parts of the unit, program
throughput, response time and device utilization by the program unit.
Stress tests are those tests designed to intentionally break the unit. A great deal can be
learned about the strengths and limitations of a program by examining the manner in which a
program unit breaks.
Structure tests are concerned with exercising the internal logic of a program and
traversing particular execution paths. The white-box test strategy was employed to ensure that
the test cases guarantee that all independent paths within a module are exercised at least once,
that all loops execute at their boundaries and within their operational bounds, and that end-of-
file conditions, I/O errors, buffer problems and textual errors in output information are
handled.
Integration testing is a systematic technique for constructing the program structure while at
the same time conducting tests to uncover errors associated with interfacing; i.e., integration
testing is the complete testing of the set of modules which make up the product. The objective
is to take unit-tested modules and build a program structure. The tester should identify critical
modules, and critical modules should be tested as early as possible. One approach is to wait
until all the units have passed testing, and then combine and test them; this approach evolved
from unstructured testing of small programs. Another strategy is to construct the product in
increments of tested units: a small set of modules is integrated together and tested, to which
another module is added and tested in combination, and so on. The advantage of this approach
is that interface discrepancies can be easily found and corrected.
The major error that was faced during the project was a linking error: when all the
modules were combined, the links with all the support files were not set properly. We then
checked the interconnections and the links. Errors are localized to the new module and its
interconnections after the modules complete unit testing. Testing is completed when the last
module is integrated and tested.
A good test case is one that has a high probability of finding an as-yet-undiscovered error,
and a successful test is one that uncovers such an error. System testing is the stage of
implementation which is aimed at ensuring that the system works accurately and efficiently
as expected before live operation commences. It verifies that the whole set of programs hangs
together. System testing consists of several key activities and steps for running the program,
string testing and system testing, and is important in adopting a successful new system. This
is the last chance to detect and correct errors before the system is installed for user
acceptance testing.
The software testing process commences once the program is created and the
documentation and related data structures are designed. Software testing is essential for
correcting errors; otherwise the program or the project is not said to be complete. Software
testing is a critical element of software quality assurance and represents the ultimate review
of specification, design and coding. Testing is the process of executing the program with the
intent of finding errors. A good test case design is one that has a probability of finding an
as-yet-undiscovered error, and a successful test is one that uncovers such an error. Any
engineering product can be tested in one of two ways:
Black-box testing: knowing the specific functions that a product has been designed to
perform, tests can be conducted that demonstrate each function is fully operational while at
the same time searching for errors in each function.
White-box testing: a test case design method that uses the control structure of the procedural
design to derive test cases. Basis path testing is a white-box testing technique.
White-box testing verifies that “all gears mesh”, that is, that the internal operation performs
according to specification. Techniques used in test case design include:
Cyclomatic complexity
Equivalence partitioning
Boundary value analysis
Comparison testing
Software testing is an activity that can be planned in advance and conducted systematically.
For this reason, a template for software testing, a set of steps into which we can place specific
test case design techniques, should be defined. Testing begins at the module level and works
“outward” toward the integration of the entire system. Both the developer of the software and
an independent test group conduct testing.
Individual modules, which are highly prone to interface errors, should not be assumed to
work instantly when we put them together. The problem, of course, is “putting them
together”: interfacing. There may be chances of data being lost across an interface, and one
module's sub-functions, when combined, may not produce the desired result.
A syntax error is an error in a program statement that violates one or more rules of the
language in which it is written. An improperly defined field dimension or omitted keywords
are common syntax errors. These errors are shown through error messages generated by the
computer. A logic error, on the other hand, deals with incorrect data fields, out-of-range items
and invalid combinations. Since the compiler will not detect logical errors, the programmer
must examine the output.
Condition testing exercises the logical conditions contained in a module. The possible types
of elements in a condition include a Boolean operator, a Boolean variable, a relational
operator or an arithmetic expression. The condition testing method focuses on testing each
condition in the program; its purpose is to detect not only errors in the conditions of a
program but also other errors in the program.
Security testing attempts to verify that the protection mechanisms built into a system will,
in fact, protect it from improper penetration. The system security must be tested for
invulnerability from frontal attack and must also be tested for invulnerability from rear
attack. During security testing, the tester plays the role of an individual who desires to
penetrate the system.
Validation testing is performed at the completion of the project with the help of the user, by
negotiating to establish a method for resolving deficiencies. Thus the proposed system under
consideration has been tested by using validation testing and found to be working
satisfactorily. Though there were deficiencies in the system, they were not catastrophic.
User acceptance of the system is a key factor for the success of any system. The system under
consideration was tested for user acceptance by constantly keeping in touch with prospective
users of the system at the time of development and making changes whenever required.
# Dash application for the demand-prediction dashboard. The input widgets,
# the callback bodies and the model lookup below are illustrative stubs,
# since the original listing omits them.
import dash
import dash_core_components as dcc
import dash_html_components as html
import numpy as np
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar as calendar

app = dash.Dash(__name__)

app.layout = html.Div(
    [
        html.Div(
            [
                html.H1(
                    'Estimating Taxi Demand at Delhi Airport',
                    className='ten columns', style={'font-weight': 'bold'}
                ),
            ],
            className='row', style={'margin-bottom': '20px', 'color': '#000000'}
        ),
        html.Div(
            [
                # Input controls referenced by the callbacks below
                # (hypothetical reconstruction of the omitted widgets).
                dcc.DatePickerSingle(id='my-date-picker-single'),
                dcc.Input(id='precipitation', type='number', value=0),
                dcc.Dropdown(
                    id='weather', multi=True, value=['clear'],
                    options=[{'label': w, 'value': w} for w in
                             ['clear', 'clouds', 'fog', 'rain',
                              'snow', 'thunderstorm']]),
                dcc.Input(id='temp', type='number', value=25),
                dcc.Input(id='hour', type='number', value=9),
                html.Div(id='Prediction'),
                dcc.Graph(id='predict_graph'),
            ]
        ),
    ]
)

@app.callback(
    dash.dependencies.Output('Prediction', 'children'),
    [dash.dependencies.Input('my-date-picker-single', 'date'),
     dash.dependencies.Input('precipitation', 'value'),
     dash.dependencies.Input('weather', 'value'),
     dash.dependencies.Input('temp', 'value'),
     dash.dependencies.Input('hour', 'value')])
def show_prediction(date, precipitation, weather, temp, hour):
    # Stub: encode the inputs with the helpers below, feed them to the
    # trained LSTM model and return the predicted demand as text.
    return 'Predicted demand: ...'

@app.callback(
    dash.dependencies.Output('predict_graph', 'figure'),
    [dash.dependencies.Input('my-date-picker-single', 'date'),
     dash.dependencies.Input('precipitation', 'value'),
     dash.dependencies.Input('weather', 'value'),
     dash.dependencies.Input('temp', 'value'),
     dash.dependencies.Input('hour', 'value')])
def update_graph(date, precipitation, weather, temp, hour):
    # Stub: return a figure of the predicted demand over the selected day.
    return {'data': [], 'layout': {}}

def day_format(day):
    # One-hot encode the day of the week (0-6).
    days = np.zeros(7)
    days[day] = 1
    return days

def month_format(month):
    # One-hot encode the month (1-12).
    months = np.zeros(12)
    months[month - 1] = 1
    return months

def weather_format(weather):
    # Multi-hot encode the selected weather conditions.
    array = np.zeros(6)
    if 'clear' in weather:
        array[0] = 1
    if 'clouds' in weather:
        array[1] = 1
    if 'fog' in weather:
        array[2] = 1
    if 'rain' in weather:
        array[3] = 1
    if 'snow' in weather:
        array[4] = 1
    if 'thunderstorm' in weather:
        array[5] = 1
    return array

def hour_format(hour):
    # Bucket the hour of day into night / morning / afternoon / evening.
    hours = np.zeros(4)
    if hour < 7:
        hours[0] = 1
    elif hour < 13:
        hours[1] = 1
    elif hour < 19:
        hours[2] = 1
    else:
        hours[3] = 1
    return hours

def is_holiday(date):
    # Check the date against the US federal holiday calendar over 2014-2020.
    dr = pd.date_range(pd.Timestamp('2014-01-01'), pd.Timestamp('2020-12-31'))
    cal = calendar()
    holidays = cal.holidays(start=dr.min(), end=dr.max())
    return pd.Timestamp(date) in holidays

if __name__ == '__main__':
    app.server.run(debug=True, threaded=True)
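With Dash and its dependencies installed, the dashboard can be started with python app.py, assuming the listing above is saved as app.py. Since app.server is the underlying Flask instance, the page is then served locally at Flask's default address (http://127.0.0.1:5000/), where the date, weather, temperature and hour controls drive the two callbacks.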
Screenshots:
LSTM Module: No. of Weeks by No. of Days