Batch 9
Bachelor of Technology
in
COMPUTER SCIENCE & ENGINEERING
Submitted by
2016 -2020
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
CERTIFICATE
This is to certify that the Project entitled “IMPLEMENTATION OF DEEP LEARNING
ALGORITHM WITH PERCEPTRON USING TENSORFLOW LIBRARY” is being submitted
by
N.G. CHAITANYA REDDY - 164E1A0504
KORRAKUTI MURALI - 164E1A0522
A. MONISHA - 164E1A0521
P. JUVERIA AJUM - 164E1A0511
in partial fulfillment of the requirements for the award of BACHELOR OF
TECHNOLOGY in COMPUTER SCIENCE & ENGINEERING to JNTUA,
Ananthapuramu. This project work or part thereof has not been submitted to any
other University or Institute for the award of any degree.
We would also like to extend our gratitude to Dr. D. William Albert, Head of the
Computer Science & Engineering Department for his encouragement and for
providing the facilities to carry out the work in a successful manner.
We are thankful to Dr. M. Janardhana Raju, Principal for his encouragement and
support.
We wish to express our sincere thanks to Dr. K. Indiraveni, Vice-Chairman, and Dr.
K. Ashok Raju, Chairman of Siddharth Group of Institutions, Puttur, for providing
ample facilities to complete the project work.
We would also like to thank all the faculty and staff of the Computer Science &
Engineering Department, for helping us to complete the project work.
Very importantly, we would like to place on record our profound indebtedness to our
parents and families for their substantial moral support and encouragement
throughout our studies.
CONTENTS
DEDICATIONS i
QUOTATIONS ii
LIST OF FIGURES iii
ABSTRACT v
1. INTRODUCTION
2. SYSTEM ANALYSIS
2.1 Problem Statement 23
2.2 Problem Description 23
2.3 Existing System 23
2.4 Disadvantages of Existing System 24
2.5 Proposed System 24
2.6 Advantages of Proposed System 24
4. LITERATURE SURVEY 27
5. SYSTEM DESIGN
5.1 System Architecture 31
5.2 Modules 32
5.2.1 Data Understanding 32
5.2.2 Exploratory Data Analysis 32
5.2.3 TensorFlow 33
5.2.4 Machine Learning Models 33
5.2.5 Deep Learning Models 33
5.2.6 Validation 34
5.3 Algorithm 34
5.4 Introduction to UML 38
5.5 UML diagrams 39
6. SYSTEM IMPLEMENTATION
6.1 Software Description 46
6.1.1 Introduction to Google Colaboratory 46
6.1.2 Introduction to Jupyter Notebook 49
6.2 Python Libraries for Data Science 55
6.4 Sample code 58
7. SYSTEM TESTING
7.1 Validation 62
7.2 Types of Validation 62
7.2.1 Validation 63
7.2.2 Leave One Out Cross Validation 63
7.2.3 KFold Cross Validation 63
7.3 Cross Validating 65
8. RESULTS
8.1 Execution procedure 67
8.2 Screenshots 67
10. BIBLIOGRAPHY
10.1 References 75
10.2 Websites 75
10.3 Text Books 75
QUOTATIONS
- ALBERT EINSTEIN
LIST OF FIGURES
5.3(a) Perceptron 34
7.3(a) Validation 65
8.2.1-8.2.11 Screenshots 67
ABSTRACT
In recent years, Deep Learning, Machine Learning, and Artificial Intelligence have become highly
focused concepts of data science. Deep learning has achieved success in the fields of Computer
Vision, Speech and Audio Processing, and Natural Language Processing. It has a strong learning
ability that can improve the utilization of datasets for feature extraction compared to conventional
Machine Learning algorithms. The perceptron is the fundamental building block for creating a
deep Neural Network; it analyzes unsupervised data, making it a valuable tool for data analytics.
A key task of this project is to develop and analyze a Deep Learning algorithm and compare it
with conventional Machine Learning algorithms. The model starts with deep learning using the
perceptron and shows how to apply it to various problems involving non-separable data. The main
part of this work is to make the perceptron learning algorithm well behaved with non-separable
training datasets when compared with conventional Machine Learning algorithms.
Implementation of Deep Learning Algorithm with Perceptron using TensorFlow
Library
1. INTRODUCTION
Data science is a "concept to unify statistics, data analysis, machine learning and
their related methods" in order to "understand and analyze actual phenomena" with
data.[3] It employs techniques and theories drawn from many fields within the context
of mathematics, statistics, computer science, and information science. Turing
award winner Jim Gray imagined data science as a "fourth paradigm" of science
(empirical, theoretical, computational and now data-driven) and asserted that "everything
about science is changing because of the impact of information technology" and the data
deluge. In 2015, the American Statistical Association identified database management,
statistics and machine learning, and distributed and parallel systems as the three emerging
foundational professional communities.
Historical overview
The term "data science" has appeared in various contexts over the past thirty years
but did not become an established term until recently. In an early usage, it was used as a
substitute for computer science by Peter Naur in 1960. Naur later introduced the term
"datalogy". In 1974, Naur published Concise Survey of Computer Methods, which freely
used the term data science in its survey of the contemporary data processing methods that
are used in a wide range of applications.
The modern definition of "data science" was first sketched during the second
Japanese-French statistics symposium organized at the University of Montpellier
II (France) in 1992. The attendees acknowledged the emergence of a new discipline with a
focus on data of all origins and sizes.
In November 1997, C.F. Jeff Wu gave the inaugural lecture entitled "Statistics =
Data Science?" for his appointment to the H. C. Carver Professorship at the University of
Michigan. In this lecture, he characterized statistical work as a trilogy of data collection,
data modeling and analysis, and decision making. In his conclusion, he initiated the
modern, non-computer science, usage of the term "data science" and advocated that
statistics be renamed data science and statisticians data scientists. Later, he presented his
lecture entitled "Statistics = Data Science?" as the first of his 1998 P.C. Mahalanobis
Memorial Lectures. These lectures honor Prasanta Chandra Mahalanobis, an Indian
scientist and statistician and founder of the Indian Statistical Institute.
In April 2002, the International Council for Science (ICSU): Committee on Data
for Science and Technology (CODATA) started the Data Science Journal, a publication
focused on issues such as the description of data systems, their publication on the internet,
applications, and legal issues.
Around 2007, Turing award winner Jim Gray envisioned "data-driven science" as
a "fourth paradigm" of science that uses the computational analysis of large data as primary
scientific method and "to have a world in which all of the science literature is online, all of
the science data is online, and they interoperate with each other."
In the 2012 Harvard Business Review article "Data Scientist: The Sexiest Job of
the 21st Century", DJ Patil claims to have coined this term in 2008 with Jeff
Hammerbacher to define their jobs at LinkedIn and Facebook, respectively. He asserts that
a data scientist is "a new breed", and that a "shortage of data scientists is becoming a serious
constraint in some sectors” but describes a much more business-oriented role.
In 2013, the IEEE Task Force on Data Science and Advanced Analytics was
launched. In 2013, the first "European Conference on Data Analysis (ECDA)" was
organized in Luxembourg, establishing the European Association for Data Science
(EuADS). The first international conference: IEEE International Conference on Data
Science and Advanced Analytics was launched in 2014.
Data science can be used to help minimize costs, discover new markets, and make
better decisions. It is difficult to pinpoint a single, specific example that makes one
understand data science, since it mainly works in confluence with other variables.
There are many algorithms and methods that data experts apply to help managers
and directors improve their companies' bottom line and strategic positioning. Data
science techniques are excellent at spotting anomalies, optimizing constraint problems,
and predicting and targeting.
For example, data scientists can analyse data on the food habits of people in a
certain area and help identify the location and type of food joint that would be successful
there. They can also predict how much a particular food joint is likely to earn.
Obtaining Data
Before delving into any research, we need to collect or obtain our data. We first
need to identify our reasons for collecting data and decide what our sources will be. They
can be internal or external: there may be an existing database that has to be re-analyzed,
a new database may need to be created, or both may be used. Here the Database
Management System comes into play.
SQL databases are the more conventionally used kind; however, NoSQL has become more
popular in recent years.
Process
• Going through both SQL and NoSQL databases and identifying the one you need
to use
• Querying these databases
• Retrieving unstructured data in the form of videos, audios, texts, documents,
etc.
• Storing the information
Cleaning Data
Data cleaning or Data Scrubbing is the technique of detecting and correcting (or
removing) corrupt or inaccurate pieces of information from a record set, table, or
database. It helps in figuring out incomplete, incorrect, inaccurate, or irrelevant parts of
the data and then replacing, modifying, or deleting the dirty or coarse data.
Data Visualization
Once your data is preprocessed, you have the material to work with. Though
the data is clean, it is still raw, and it will be hard to convey the hidden meaning behind
those numbers and statistics to the end client. Data Visualization is like a visual
interpretation of the raw data with the help of statistical graphics, plots, information
graphics and other tools.
To convey ideas effectively, both aesthetic form and functionality need to go hand
in hand, providing insights into a rather sparse and complex data set by communicating
its key-aspects in a more intuitive way.
Data Modelling
Data modelling is used to define and analyze data requirements; it is the framework of
the relations within the database. A data model can be thought of as a flowchart
that illustrates the relationships between data. Models are simply general rules in a
statistical sense. There are quite a few strategies for data modelling, including:
Logical Data Modeling – illustrates the specific entities, attributes, and relationships
involved in a business function, and serves as the basis for the creation of the physical
data model.
Physical Data Modeling – represents an application- and database-specific
implementation of a logical data model. One ought to have this particular skill-set
to go about this step.
Interpreting Data
All the above steps were restricted to your personal boundary; data interpretation,
however, makes your work public. It is arguably the toughest of all the steps above
and also the culmination point.
Machine Learning
Any technology user today has benefitted from machine learning. Facial
recognition technology allows social media platforms to help users tag and share photos of
friends. Optical character recognition (OCR) technology converts images of text into
movable type. Recommendation engines, powered by machine learning, suggest what
movies or television shows to watch next based on user preferences. Self-driving cars that
rely on machine learning to navigate may soon be available to consumers.
In machine learning, tasks are generally classified into broad categories. These
categories are based on how learning is received or how feedback on the learning is given
to the system developed.
Two of the most widely adopted machine learning methods are supervised
learning which trains algorithms based on example input and output data that is labeled by
humans, and unsupervised learning which provides the algorithm with no labeled data in
order to allow it to find structure within its input data. Let’s explore these methods in more
detail.
Supervised Learning
In supervised learning, the computer is provided with example inputs that are
labeled with their desired outputs. The purpose of this method is for the algorithm to be
able to “learn” by comparing its actual output with the “taught” outputs to find errors and
modify the model accordingly.
For example, with supervised learning, an algorithm may be fed data with images
of sharks labeled as fish and images of oceans labeled as water. By being trained on this
data, the supervised learning algorithm should be able to later identify unlabeled shark
images as fish and unlabeled ocean images as water. A common use case of supervised
learning is to use historical data to predict statistically likely future events. It may use
historical stock market information to anticipate upcoming fluctuations or be employed to
filter out spam emails. In supervised learning, tagged photos of dogs can be used as input
data to classify untagged photos of dogs.
Unsupervised Learning
Unsupervised learning is commonly used for transactional data. You may have a
large dataset of customers and their purchases, but as a human you will likely not be able
to make sense of what similar attributes can be drawn from customer profiles and their
types of purchases. With this data fed into an unsupervised learning algorithm, it may be
determined that women of a certain age range who buy unscented soaps are likely to be
pregnant, and a marketing campaign related to pregnancy and baby products can therefore
be targeted to this group.
Without being told a “correct” answer, unsupervised learning methods can look at
complex data that is more expansive and seemingly unrelated in order to organize it in
potentially meaningful ways. Unsupervised learning is often used for anomaly detection
including for fraudulent credit card purchases, and recommender systems that recommend
what products to buy next. In unsupervised learning, untagged photos of dogs can be used
as input data for the algorithm to find likenesses and classify dog photos together.
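To make the idea concrete, the short sketch below (with made-up customer data and an arbitrary choice of two clusters) shows how a clustering algorithm such as K-Means from scikit-learn can group unlabeled records:

from sklearn.cluster import KMeans
import numpy as np

# Toy, made-up data: each row is a customer described by [age, monthly spend]
customers = np.array([[22, 150], [25, 180], [47, 620], [52, 700], [30, 200], [49, 650]])

# Ask K-Means to find two groups; no labels are supplied
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print(kmeans.labels_)           # cluster assignment for each customer
print(kmeans.cluster_centers_)  # the "profile" of each discovered group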
Approaches
1. Linear Regression
2. Logistic Regression
3. Decision Tree
4. SVM
5. Naive Bayes
6. kNN
7. K-Means
8. Random Forest
9. Dimensionality Reduction Algorithms
10. Gradient Boosting algorithms
1. GBM
2. XGBoost
3. LightGBM
4. CatBoost
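Most of the algorithms listed above are available in scikit-learn behind a uniform fit/predict interface. The following sketch, which uses the bundled Iris dataset purely for illustration, trains two of the listed approaches and compares their test accuracy:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=7)

# Train two of the listed approaches and compare accuracy on held-out data
for model in (LogisticRegression(max_iter=1000), RandomForestClassifier(n_estimators=100)):
    model.fit(x_train, y_train)
    print(type(model).__name__, model.score(x_test, y_test))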
Deep Learning
Deep learning attempts to imitate how the human brain can process light and sound
stimuli into vision and hearing. A deep learning architecture is inspired by biological neural
networks and consists of multiple layers of an artificial neural network, typically run on
specialized hardware such as GPUs. Deep learning uses a cascade of nonlinear processing unit layers in
order to extract or transform features (or representations) of the data. The output of one
layer serves as the input of the successive layer. In deep learning, algorithms can be either
supervised and serve to classify data, or unsupervised and perform pattern analysis.
Computer vision and speech recognition have both realized significant advances
from deep learning approaches. IBM Watson is a well-known example of a system that
leverages deep learning.
The Google Brain Team project and deep learning software like TensorFlow have
given further traction to the development of deep learning techniques. Such techniques are
based on mathematical functions and parameters for achieving the desired output.
If you need more flexibility, eager execution allows for immediate iteration and
intuitive debugging. For large ML training tasks, use the Distribution Strategy API for
distributed training on different hardware configurations without changing the model
definition.
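As a minimal sketch of that idea (assuming TensorFlow 2.x with the tf.keras API; the layer sizes here are arbitrary), the model is simply built inside a strategy scope and trained with the usual fit call:

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # uses all visible GPUs, or the CPU if none
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu', input_shape=(60,)),
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# model.fit(x_train, y_train, epochs=10)  # the training call itself is unchanged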
Colaboratory is a free Jupyter notebook environment that requires no setup and runs
entirely in the cloud.
With Colaboratory you can write and execute code, save and share your analyses,
and access powerful computing resources, all for free from your browser.
Introduction
Components
The Jupyter Notebook combines three components:
• The notebook web application: An interactive web application for writing and
running code interactively and authoring notebook documents.
• Kernels: Separate processes started by the notebook web application that run users’
code in each language and return output back to the notebook web application. The
kernel also handles things like computations for interactive widgets, tab completion
and introspection.
• Python(https://github.com/ipython/ipython)
• Julia (https://github.com/JuliaLang/IJulia.jl)
• R (https://github.com/IRkernel/IRkernel)
• Ruby (https://github.com/minrk/iruby)
Each of these kernels communicates with the notebook web application and web browser
using a JSON over ZeroMQ/WebSockets message protocol that is described here. Most
users don’t need to know about these details, but it helps to understand that “kernels run
code.”
Models are created based on past results gathered from training data; the model studies
past observations to make precise predictions. Conventional machine-learning
techniques were restricted in their abilities for decades. A Machine Learning algorithm is
trained using training datasets to create a model. When new input data is introduced
to the Machine Learning Algorithm (MLA), it makes predictions based on the model.
NEURAL NETWORK
A neural network is made up of neurons connected to each other; at the same time,
each connection of our neural network is associated with a weight that dictates the
importance of this relationship in the neuron when multiplied by the input value. Each
neuron has an activation function that defines the output of the neuron. The activation
function is used to introduce non-linearity in the modeling capabilities of the network. We
have several options for activation functions, which are presented later in this chapter. Training our
neural network, that is, learning the values of our parameters (weights and biases), is the
most essential part of Deep Learning, and we can see this learning process in a neural network
as an iterative process of "going and returning" through the layers of neurons. The "going" is a
forward propagation of the information and the "return" is a backpropagation of the
information.
f(x) = w·x + b (1)
where w is the weight vector, x is the input vector, b is the bias, and f(x) is the transfer
function that produces the final output in eq. (1). The bias enables us to shift the decision
line so that it can best separate the input into two classes.
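A small NumPy illustration of eq. (1), with arbitrary example values for the weights, bias, and input:

import numpy as np

w = np.array([0.4, -0.2, 0.7])   # example weight vector
b = 1.0                          # example bias
x = np.array([1.0, 3.0, 0.5])    # one input vector

net = np.dot(w, x) + b           # f(x) = w . x + b
label = 1 if net >= 0 else -1    # threshold/transfer function decides the class
print(net, label)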
2.6 Advantages
• No need for feature engineering.
• Best results with unstructured data.
• No need for labeling of data.
• Efficient at delivering high-quality results.
Technical Feasibility
This study is carried out to check the technical feasibility, that is, the technical
requirements of the system. Any system developed must not place a high demand on the
available technical resources; otherwise, high demands would be placed on the client.
The developed system must have modest requirements, as only minimal or no changes
should be required for implementing this system.
Social Feasibility
This aspect of the study checks the level of acceptance of the system by the user.
This includes the process of training the user to use the system efficiently. The user must
not feel threatened by the system, instead must accept it as a necessity. The level of
acceptance by the users solely depends on the methods that are employed to educate the
user about the system and to make him familiar with it. His level of confidence must be
raised so that he is also able to make some constructive criticism, which is welcomed, as
he is the final user of the system.
Cross-validation is a technique in which we train our model using the subset of the
dataset and then evaluate using the complementary subset of the dataset.
5.3 Algorithm
In the perceptron algorithm, we simply multiply the inputs by weights and add a bias,
but we do this in one layer only.
In the Multilayer Perceptron, there can be more than one linear layer (combinations
of neurons). If we take the simple example of a three-layer network, the first layer will be
the input layer, the last will be the output layer, and the middle layer will be called the
hidden layer. We feed our input data into the input layer and take the output from the
output layer. We can increase the number of hidden layers as much as we want to make
the model more complex according to our task.
A Feed Forward Network is the most typical neural network model. Its goal is to
approximate some function f(·). Given, for example, a classifier y = f*(x) that maps an
input x to an output class y, the MLP finds the best approximation to that classifier by
defining a mapping y = f(x; θ) and learning the best parameters θ for it. MLP networks
are composed of many functions that are chained together. A network with three functions
or layers would form f(x) = f^(3)(f^(2)(f^(1)(x))). Each of these layers is composed of units
that perform an affine transformation of a linear sum of inputs.
A popular strategy is to initialize the weights to random values and refine them
iteratively to obtain a lower loss. This refinement is achieved by moving in the direction
defined by the gradient of the loss function, and it is important to set a learning rate
defining the amount by which the algorithm moves in every iteration.
Activation function
Activation functions, also known as non-linearities, describe the input-output relations
in a non-linear way. This gives the model the power to be more flexible in describing
arbitrary relations.
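For illustration, a few commonly used activation functions can be written in NumPy as follows (these particular functions are only examples; the perceptron in this project uses a simple sign/step function):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes values into (0, 1)

def relu(z):
    return np.maximum(0.0, z)         # keeps positives, zeroes out negatives

def tanh(z):
    return np.tanh(z)                 # squashes values into (-1, 1)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), relu(z), tanh(z))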
The training of the model involves three steps:
1. Forward pass
2. Loss calculation
3. Backward pass
1. Forward Pass
In this step of training the model, we simply pass the input to the model, multiply by the
weights and add the bias at every layer, and find the calculated output of the model.
2. Loss Calculation
When we pass a data instance (or one example), we will get some output from the
model, which is called the predicted output (pred_out), and we have the label that comes
with the data, which is called the actual output. The loss is then calculated from the
difference between the predicted and actual outputs.
3. Backward Pass
After calculating the loss, we backpropagate the loss and update the weights of the
model using the gradient. This is the main step in the training of the model. In this step,
the weights adjust according to the gradient flowing in that direction.
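These three steps can be sketched for a single linear neuron with a squared-error loss; this is only a toy, hand-written example, since the project itself relies on TensorFlow and scikit-learn to perform these updates automatically:

import numpy as np

x = np.array([1.0, 2.0]); target = 1.0     # one training example (made up)
w = np.zeros(2); b = 0.0; lr = 0.1         # initial weights, bias, learning rate

for _ in range(20):
    pred_out = np.dot(w, x) + b            # 1. forward pass
    loss = 0.5 * (pred_out - target) ** 2  # 2. loss calculation
    grad = pred_out - target               # 3. backward pass: dLoss/dpred_out
    w -= lr * grad * x                     # update weights along the gradient
    b -= lr * grad
print(loss, w, b)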
Advantages
• To represent complete systems (instead of only the software portion) using
object-oriented concepts.
• To establish an explicit coupling between concepts and executable code.
• To consider the scaling factors that are inherent to complex and critical systems.
As UML describes real-time systems, it is very important to make a conceptual
model and then proceed gradually. The conceptual model of UML can be mastered by
learning the following three major elements:
• Things
• Relationships
• Diagrams
5.5 UML Diagrams
UML diagrams are the ultimate output of the entire discussion. All the elements and
relationships are used to make a complete UML diagram, and the diagram represents a
system.
The visual effect of the UML diagram is the most important part of the entire process. All
the other elements are used to make it a complete one.
UML includes the following nine diagrams; the details are described below.
• Class diagram
• Object diagram
• Use case diagram
• Sequence diagram
• Collaboration diagram
• Activity diagram
• State chart diagram
• Deployment diagram
• Component diagram
1. Use case diagrams: represent the functions of a system from the user's point of view.
2. Class diagrams: represent the static structure in terms of classes and relationships.
[Class diagram: a User interacts with Data Understanding (Database, Exploratory Data Analysis; operations such as Variable Identification, Multi-Variate Analysis, Importance of Variables, Normalization of Variables, Upload Data, View Dataset, Evaluate Model, Compare Model), Modeling (Machine Learning and Deep Learning algorithms; Train Model, Test Model, Select Algorithm), Algorithms (Libraries; Select Algorithm), ML Classification (Logistic Regression, Linear Discriminant Analysis, Decision Tree, Random Forest, KNeighborsClassifier, GaussianNB, SVM, AdaBoostClassifier, GradientBoostingClassifier, ExtraTreesClassifier), DL Classification (Multi-Layer Perceptron Classifier), and Model Evaluation (Cross Validation, K-Fold Validation).]
[Use case diagram: the User interacts with the system (Google Colaboratory execution environment) through use cases such as Authentication & Authorization, Select Dataset, Load Dataset, Analyse Dataset, Split Dataset (Training & Testing), Select Algorithm, Train Dataset, Validate Dataset, Cross Validate, Result, and Compare Accuracy.]
[Sequence diagram: numbered interactions between the User and the system, including 1: User Login, 2: Accept/Reject, 5: Load Dataset, and 7: Split Dataset, each flow ending in Stop.]
[Activity diagram: the User logs in and is authenticated; a valid login proceeds to import libraries, validate the dataset, and perform cross validation before stopping, while an invalid login returns to authentication. A further diagram shows deployment on Google Colaboratory with a Python Notebook.]
When you log in and connect to the Colaboratory runtime, you will get your own
virtual machine with a K80 GPU (not always entirely yours, read below) and a Jupyter
notebook environment. You can use it to your heart's content for up to 12 hours. Or until
you close your browser window. Be warned, sometimes the runtime can disconnect
randomly.
If you're lucky and you get access to the full card you'll see something like this:
If you see RAM usage, that means that you are sharing the card with someone. Since by
default TensorFlow allocates almost 100% of the GPU's memory, this is a problem. To get
a new instance and a new chance to get your own GPU, run the following command in a
cell:
!kill -9 -1
import tensorflow as tf
with tf.device('/gpu:0'):
    model.fit(X_train, y_train, epochs=10, batch_size=128, verbose=1)
Although the first method described below seems more involved, I think it's the best one
as I have had problems with the second method in the past.
import os
from google.colab import drive
drive.mount('/gdrive')
To download data you can also use the files library provided by Google.
You can also upload your own files using the same library.
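A small sketch of that helper (the file name below is just a placeholder): files.upload() opens a browser upload dialog and returns the uploaded contents, while files.download() sends a file from the runtime back to your machine.

from google.colab import files

uploaded = files.upload()            # opens a file picker; returns {filename: bytes}
for name in uploaded:
    print(name, len(uploaded[name]), 'bytes')

files.download('results.csv')        # placeholder name; sends the file to the browser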
# loaded_model is assumed to be a Keras model restored earlier in the notebook
# (for example from saved weights/architecture); it must be re-compiled before use.
model = loaded_model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
• In-browser editing for code, with automatic syntax highlighting, indentation, and tab
completion/introspection.
• The ability to execute code from the browser, with the results of computations attached
to the code which generated them.
Notebook documents
Notebook documents contain the inputs and outputs of an interactive session as well
as additional text that accompanies the code but is not meant for execution. In this way,
notebook files can serve as a complete computational record of a session, interleaving
executable code with explanatory text, mathematics, and rich representations of resulting
objects. These documents are internally JSON files and are saved with
the .ipynb extension. Since JSON is a plain text format, they can be version-controlled
and shared with colleagues. When you run a notebook, the code is executed by your own
notebook server. Jupyter doesn't send your data anywhere else, and as it's open source,
other people can check that we're being honest about this.
Notebook name: The name displayed at the top of the page, next to the Jupyter logo,
reflects the name of the .ipynb file. Clicking on the notebook name brings up a dialog
which allows you to rename it. Thus, renaming a notebook from “Untitled0” to “My first
notebook” in the browser renames the Untitled0.ipynb file to My first notebook.ipynb.
Toolbar: The tool bar gives a quick way of performing the most-used operations within
the notebook, by clicking on an icon.
Code cell: the default type of cell; read on for an explanation of cells.
Markdown cells
You can document the computational process in a literate way, alternating
descriptive text with code, using rich text. In IPython this is accomplished by marking up
text with the Markdown language. The corresponding cells are called Markdown cells. The
Markdown language provides a simple way to perform this text markup, that is, to specify
which parts of the text should be emphasized (italics), bold, form lists, etc.
If you want to provide structure for your document, you can use markdown headings.
Markdown headings consist of 1 to 6 hash # signs # followed by a space and the title of
your section. The markdown heading will be converted to a clickable link for a section of
the notebook. It is also used as a hint when exporting to other document formats, like PDF.
Within Markdown cells, you can also include mathematics in a straightforward way, using
standard LaTeX notation: $...$ for inline mathematics and $$...$$ for
displayed mathematics. When the Markdown cell is executed, the LaTeX portions are
automatically rendered in the HTML output as equations with high-quality typography.
This is made possible by MathJax, which supports a large subset of LaTeX functionality.
New LaTeX macros may be defined using standard methods, such as \newcommand, by
placing them anywhere between math delimiters in a Markdown cell. These definitions
are then available throughout the rest of the IPython session.
Raw cells
Raw cells provide a place in which you can write output directly. Raw cells are not
evaluated by the notebook. When passed through nbconvert, raw cells arrive in the
destination format unmodified. For example, you can type full LaTeX into a raw cell,
which will only be rendered by LaTeX after conversion by nbconvert.
Keyboard shortcuts
All actions in the notebook can be performed with the mouse, but keyboard
shortcuts are also available for the most common ones. The essential shortcuts to remember
are the following:
Shift-Enter: run cell
Execute the current cell, show any output, and jump to the next cell below. If Shift-
Enter is invoked on the last cell, it makes a new cell below. This is equivalent to
clicking the Cell | Run menu item, or the Play button in the toolbar.
Esc: Command mode
In command mode, you can navigate around the notebook using keyboard
shortcuts.
Enter: Edit mode
In edit mode, you can edit text in cells.
Plotting
One major feature of the Jupyter notebook is the ability to display plots that are the
output of running code cells. The IPython kernel is designed to work seamlessly with
the matplotlib plotting library to provide this functionality. Specific plotting library
integration is a feature of the kernel.
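A minimal plotting example for a notebook code cell (assuming Matplotlib is installed, as it is on Colab and most Jupyter setups); the resulting figure appears inline directly below the cell:

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)
plt.plot(x, np.sin(x), label='sin(x)')
plt.xlabel('x'); plt.ylabel('sin(x)')
plt.legend()
plt.show()   # in a notebook, the plot appears as the cell's output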
Inside the Python interpreter, the help() function pulls up documentation strings for
various modules, functions, and methods. These doc strings are similar to Java's
javadoc. The dir() function tells you what the attributes of an object are. Below are
some ways to call help() and dir() from the interpreter:
help(len) — help string for the built-in len() function; note that it's "len" not "len()",
which is a call to the function, which we don't want
help(sys) — help string for the sys module (must do an import sys first)
help(sys.exit) — help string for the exit() function in the sys module
help('xyz'.split) — help string for the split() method for string objects. You can
call help() with that object itself or an example of that object, plus its attribute. For
example, calling help('xyz'.split) is the same as calling help(str.split).
help(list.append) — help string for the append() method for list objects
SciPy
This useful library includes modules for linear algebra, integration, optimization,
and statistics. Its main functionality was built upon NumPy, so its arrays make use of this
library. SciPy works great for all kinds of scientific programming projects (science,
mathematics, and engineering). It offers efficient numerical routines such as numerical
integration, optimization, and many others.
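A small, hedged illustration of those routines (the integrand and the function being minimized are chosen only for the example):

from scipy import integrate, optimize
import numpy as np

area, err = integrate.quad(np.sin, 0, np.pi)               # numerical integration of sin on [0, pi]
result = optimize.minimize_scalar(lambda t: (t - 3) ** 2)   # one-dimensional minimization
print(area, result.x)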
Pandas
Pandas is a library created to help developers work with "labeled" and "relational"
data intuitively. It's based on two main data structures: "Series" (one-dimensional, like a
list of items) and "Data Frames" (two-dimensional, like a table with multiple columns).
Pandas allows converting data structures to DataFrame objects, handling missing data,
adding and deleting columns from a DataFrame, imputing missing values, and plotting
data with histograms or box plots. It's a must-have for data wrangling, manipulation,
and visualization.
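A brief sketch of the two structures and the missing-data handling mentioned above (the column names are invented for the example):

import pandas as pd
import numpy as np

s = pd.Series([1, 3, 5])                                                 # one-dimensional Series
df = pd.DataFrame({'age': [22, 47, np.nan], 'spend': [150, 620, 200]})   # two-dimensional DataFrame

df['segment'] = ['a', 'b', 'a']                   # add a column
df = df.drop(columns=['segment'])                 # delete it again
df['age'] = df['age'].fillna(df['age'].mean())    # impute the missing value
print(df.describe())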
Keras
Keras is a great library for building neural networks and modeling. It's very
straightforward to use and provides developers with a good degree of extensibility. The
library takes advantage of other packages, (Theano or TensorFlow) as its backends.
Moreover, Microsoft integrated CNTK (Microsoft Cognitive Toolkit) to serve as another
backend. It's a great pick if you want to experiment quickly using compact systems – the
minimalist approach to design really pays off!
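A compact sketch of the kind of model Keras makes easy to build (layer sizes are arbitrary; with TensorFlow as the backend this is the tf.keras API):

from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(32, activation='relu', input_shape=(60,)),  # hidden layer
    keras.layers.Dense(1, activation='sigmoid'),                   # binary output
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
# model.fit(x_train, y_train, epochs=10, batch_size=16)  # once training data is ready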
SciKit-Learn
This is an industry-standard for data science projects based in Python. Scikits is a
group of packages in the SciPy Stack that were created for specific functionalities – for
example, image processing. Scikit-learn uses the math operations of SciPy to expose a
concise interface to the most common machine learning algorithms.
Tensor-Flow
TensorFlow is a popular Python framework for machine learning and deep learning,
which was developed at Google Brain. It is the best tool for tasks like object identification,
Pattern recognition, speech recognition, and many others. It helps in working with artificial
neural networks that need to handle multiple data sets.
2. Data Visualization
Matplotlib
This is a standard data science library that helps to generate data visualizations such
as two-dimensional diagrams and graphs (histograms, scatterplots, non-Cartesian
coordinates graphs). Matplotlib is one of those plotting libraries that are really useful in
data science projects: it provides an object-oriented API for embedding plots into
applications. It is thanks to this library that Python can compete with scientific tools like
MATLAB or Mathematica. However, developers need to write more code than usual when
using this library to generate advanced visualizations. Note that popular plotting
libraries work seamlessly with Matplotlib.
Seaborn
Seaborn is based on Matplotlib and serves as a useful Python machine learning tool
for visualizing statistical models – heatmaps and other types of visualizations that
summarize data and depict overall distributions.
Plotly
This is a web-based tool for data visualization that offers many useful out-of-the-box
graphics – you can find them on the Plot.ly website. The library works very well in
interactive web applications. Its creators are busy expanding the library with new graphics
and features for supporting multiple linked views, animation, and crosstalk integration.
6.4 Sample code
import os
import numpy as np
import pandas as pd
from google.colab import drive
drive.mount('/content/gdrive')
df = pd.read_csv("/content/sonar.all-data.csv")
import tensorflow as tf
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
df.head()
# Encode the text class label (last column) as a number
le = preprocessing.LabelEncoder()
for i in range(len(df.columns) - 1, len(df.columns)):
    df.iloc[:, i] = le.fit_transform(df.iloc[:, i]).astype(float)
df.head()
seed = 7
np.random.seed(seed)
# Features are all columns except the last; the last column is the encoded label
x = df.iloc[:, :-1]
y = df.iloc[:, -1]
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20)
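# NOTE (editor's sketch): the comparison plot below assumes that `results` and
# `names` were produced by a cross-validation loop over candidate models. The
# exact model list used in the original notebook may differ; this is one
# plausible version of that loop.
import matplotlib.pyplot as plt
from sklearn.model_selection import cross_val_score, KFold
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

results, names = [], []
for name, candidate in [('LR', LogisticRegression(max_iter=1000)),
                        ('CART', DecisionTreeClassifier()),
                        ('SVM', SVC())]:
    cv_scores = cross_val_score(candidate, x_train, y_train,
                                cv=KFold(n_splits=10, shuffle=True, random_state=seed))
    results.append(cv_scores)
    names.append(name)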
# Compare Algorithms
fig = plt.figure(figsize = (12,12))
fig.suptitle('Algorithm Comparison')
ax = fig.add_subplot(111)
plt.boxplot(results)
ax.set_xticklabels(names)
plt.show()
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
# Finalize Model
# prepare the model: refit the best performer from the comparison above on the
# training split (LogisticRegression here is only a placeholder for that model)
model = LogisticRegression(max_iter=1000)
model.fit(x_train, y_train)
class Perceptron(object):
    def __init__(self, eta=0.01, n_iter=200):
        self.n_iter = n_iter
        self.eta = eta

    def fit(self, x, y, chooseWeightVector, x_test, y_test):
        if chooseWeightVector == 1:
            self.w_ = np.random.rand(1 + x.shape[1])
        else:
            self.w_ = np.zeros(1 + x.shape[1])
            self.w_[0] = 1
        self.errors_ = []
        self.accuracies_ = []
        for _ in range(self.n_iter):
            # zip: make an iterator that aggregates elements from each of the iterables
            for xi, target in zip(x, y):
                # w <- w + α(y - f(x))x, or alternately
                # w <- w + α(t - o)x
                # where the prediction is o = sign(w · x)
                o = self.predict(xi)
                update = self.eta * (target - o)
                self.w_[1:] += update * xi
                self.w_[0] += update
            # calc_error (defined elsewhere in the notebook) evaluates the current
            # weights on the held-out test set and appends to self.accuracies_
            self.calc_error(x_test, y_test)

    def net_input(self, x):
        # sum(wi * xi), i.e. w · x + b
        return np.dot(x, self.w_[1:]) + self.w_[0]

    def predict(self, x):
        # sign(net)
        return np.where(self.net_input(x) >= 0.0, 1, -1)

# p_null is a Perceptron instance and i the learning rate from the surrounding
# experiment loop (not shown in this excerpt)
print('Mean accuracy on test set with {} epochs and learning rate={}: {}'.format(
    p_null.n_iter, i, sum(p_null.accuracies_) / len(p_null.accuracies_)))
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPClassifier

kf = KFold(n_splits=10, shuffle=True)
rf_reg = MLPClassifier(solver='lbfgs', hidden_layer_sizes=(40, 40), max_iter=200, random_state=None)
splits = kf.split(x)   # use the feature columns only; create the fold generator once
scores = []
for i in range(10):
    result = next(splits, None)
    x_train = x.iloc[result[0]]
    x_test = x.iloc[result[1]]
    y_train = y.iloc[result[0]]
    y_test = y.iloc[result[1]]
    model = rf_reg.fit(x_train, y_train)
    predictions = rf_reg.predict(x_test)
    scores.append(model.score(x_test, y_test))
print('Scores from each Iteration: ', scores)
print('Average K-Fold Score :', np.mean(scores))
7.1. Validation
Once we are done with training our model, we just cannot assume that it is going
to work well on data that it has not seen before. In other words, we cannot be sure that
the model will have the desired accuracy and variance in production environment. We
need assurance of the accuracy of the predictions that our model is putting out. For this,
we need to validate our model. This process of deciding whether the numerical results
quantifying hypothesized relationships between variables, are acceptable as descriptions
of the data, is known as validation.
In machine learning, we cannot simply fit the model on the training data and claim
that it will work accurately on real data. For this, we must make sure that our model has
learned the correct patterns from the data and is not picking up too much noise. For
this purpose, we use the cross-validation technique.
Cross-validation is a technique in which we train our model using the subset of the dataset
and then evaluate using the complementary subset of the dataset.
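As a compact illustration (reusing x_train and y_train from the sample code in Chapter 6 and an arbitrary classifier; the project's own notebook uses the explicit K-Fold loop shown there):

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

# Split the training data into 5 folds; train on 4 and evaluate on the held-out
# fold, repeating so every fold is used for evaluation exactly once
scores = cross_val_score(LogisticRegression(max_iter=1000), x_train, y_train, cv=5)
print(scores)
print(scores.mean(), scores.std())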
Score
Classification report
• Obtaining data
• Cleaning or Scrubbing data
• Visualizing and Exploring data will allow us to find patterns and trends
• Modelling data will give us our predictive power as a data magician
• Interpreting data
8.2 Screenshots
8.2.1. Google Colab Jupyter Notebook
Description
The above figure represents how to mount Google Drive to the working environment to
access data.
8.2.3. Reading Data
Description
The above figure represents the splitting of data into training data and testing data to build
a model.
Description
The above figure represents the accuracy score of all Deep Learning classification
algorithms using the Multi-Layer Perceptron Classifier algorithm.
8.2.10. Deep Learning classification Report
10.1 References
1. Yann LeCun, Yoshua Bengio & Geoffrey Hinton, "Deep Learning", Nature, Vol. 521,
p. 436, 28 May 2015.
2. R. Sathya, Annamma Abraham, “Comparison of Supervised and Unsupervised
Learning Algorithms for Pattern Classification”, (IJARAI)
International Journal of Advanced Research in Artificial Intelligence, Vol. 2, No.
2, 2013.
3. Kai-Yeung Siu, Amir Dembo, Thomas Kailath. “On the Perceptron Learning
Algorithm on Data with High Precision”, Journal Of Computer And System
Sciences 48, 347-356 (1994).
4. Tang, J., Deng, C. and Huang, G.B., 2016. Extreme learning machine for
multilayer perceptron. IEEE transactions on neural networks and learning
systems, 27(4), pp.809-821.
5. Yang Xin, Lingshuang Kong, Zhi Liu, Yuling Chen, Yanmiao Li, Hongliang Zhu,
Mingcheng Gao, Haixia Hou, and Chunhua Wang, "Machine Learning and Deep
Learning Methods for Cybersecurity", IEEE Access, Vol. 6, pp. 35365–35381,
15 May 2018.
6. Raiko, T., Valpola, H. and LeCun, Y., 2012, March. Deep learning made easier
by linear transformations in perceptrons. In Artificial intelligence and statistics
(pp. 924-932).
10.2 Websites
1. http://wikipedia.org/
2. https://archive.ics.uci.edu/ml/index.php
3. https://towardsdatascience.com/how-do-artificial-neural-networks-learn-
773e46399fc7
4. https://colab.research.google.com/notebooks/intro.ipynb#recent=true
10.3 Textbooks
Abstract—In recent years, Deep Learning, Machine Learning,
and Artificial Intelligence are highly focused concepts of data
science. Deep learning has achieved success in the field of
Computer Vision, Speech and Audio Processing, and Natural
Language Processing. It has the strong learning ability that can
improve utilization of datasets for the feature extraction
compared to traditional Machine Learning Algorithm.
Perceptron is the essential building block for creating a deep
Neural Network. The perceptron model is the more general
computational model. It analyzes the unsupervised data, making
it a valuable tool for data analytics. A key task of this paper is to
develop and analyze learning algorithm. It begins with deep
learning with perceptron and how to apply it using TensorFlow
to solve various issues. The main part of this paper is to make
perceptron learning algorithm well behaved with non-separable
training datasets. This type of algorithm is suitable for Machine
Learning, Deep Learning, Pattern Recognition, and Connectionist
Expert System.
Fig. 1. Machine Learning Model
C. Semi-Supervised Learning
It uses a large dataset of unlabeled data and can reduce the
efforts of labeling data while achieving high accuracy [2].
The rest of this paper is organized as follows: Section II is about the Neural Network,
Section III is about how to implement the Neural Network with perceptron learning, and
Section IV concludes the paper.
II. NEURAL NETWORK
The Neural Network is divided into two main broad streams as shown in Fig. 2.
Linear Separable problem: as shown in Fig. 2(a), the datasets are classified into two basic
categories or sets using a single line. Non-Linear Separable problem: as shown in Fig. 2(b),
the datasets contain multiple categories or sets and require a non-linear boundary to
separate them into their respective sets.
For three inputs (Input 1, Input 2, Bias), three weight values are defined, one for each
input. For these inputs, a tensor variable of 3*1 vectors for the weights is initialized with
random values. In TensorFlow, variables update the learning procedure by changing their
weights.
To calculate the error between the perceptron output and the desired output, TensorFlow
minimizes the error by providing optimizers that gradually change each variable (weight
and bias) in successive iterations. The perceptron is trained to update the weight and bias
values in successive iterations in order to minimize the error or loss. An accuracy of
83.84% is achieved during this process. A graph in Fig. 5 shows the reduced cost or error
over successive epochs, where Cost vs Number of Epochs is plotted.
C. Activation Functions
The activation function is applied to the perceptron's output. The activation function given
below performs linear classification on the AND-gate linear perceptron using eq. (2) and
(3) and the input dataset:
z = x1*w1 + x2*w2 + ... + xn*wn (2)
y = f(z) (3)
REFERENCES
[1] O.R. Zaiane, "Web usage mining for better web-based learning environment", in Proc. of Conference on Advanced Technology for Education, Banff, AB, June 2001, pp. 60–64.
[2] Yang Xin, Lingshuang Kong, Zhi Liu, Yuling Chen, Yanmiao Li, Hongliang Zhu, Mingcheng Gao, Haixia Hou, and Chunhua Wang, "Machine Learning and Deep Learning Methods for Cybersecurity", IEEE Access, Vol. 6, pp. 35365–35381, 15 May 2018.
[3] Yann LeCun, Yoshua Bengio and Geoffrey Hinton, "Deep Learning", Nature, Vol. 521, p. 436, 28 May 2015.
[4] R. Sathya and Annamma Abraham, "Comparison of Supervised and Unsupervised Learning Algorithms for Pattern Classification", International Journal of Advanced Research in Artificial Intelligence (IJARAI), Vol. 2, No. 2, 2013.
[5] Kai-Yeung Siu, Amir Dembo and Thomas Kailath, "On the Perceptron Learning Algorithm on Data with High Precision", Journal of Computer and System Sciences, 48, pp. 347–356, 1994.
[6] Nicolò Cesa-Bianchi, Alex Conconi and Claudio Gentile, "A Second-Order Perceptron Algorithm", SIAM J. Comput., Vol. 34, No. 3, pp. 640–668.
[7] Tang, J., Deng, C. and Huang, G.B., 2016. Extreme learning machine for multilayer perceptron. IEEE Transactions on Neural Networks and Learning Systems, 27(4), pp. 809–821.
[8] Raiko, T., Valpola, H. and LeCun, Y., 2012. Deep learning made easier by linear transformations in perceptrons. In Artificial Intelligence and Statistics, pp. 924–932.
[9] Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D. and Riedmiller, M., 2013. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.
[10] Zhu, Z., Luo, P., Wang, X. and Tang, X., 2014. Multi-view perceptron: a deep model for learning face identity and view representations. In Advances in Neural Information Processing Systems, pp. 217–225.