Python for R Users
By
Chandan Routray
As a part of internship at
www.decisionstats.com
Basic Commands
Dec 2014 Copyright www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Function | R | Python
Download and install a package | install.packages('name') | pip install name (run in a shell, not inside Python)
Load a package | library('name') | import name as other_name
Check the working directory | getwd() | import os; os.getcwd()
Set the working directory | setwd() | os.chdir()
List files in a directory | dir() | os.listdir()
List all objects | ls() | globals()
Remove an object | rm('name') | del name
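The Python column of the table above can be tried as one short script. This is a minimal sketch assuming a standard Python 3 interpreter; the variable name `scratch` is hypothetical, and `pip install` is left out because it runs in a shell rather than inside Python.

```python
import os

# Check the working directory (R: getwd())
start_dir = os.getcwd()

# Set the working directory (R: setwd()); changing to the current
# directory is a harmless no-op used here only for illustration
os.chdir(start_dir)

# List files in a directory (R: dir())
files = os.listdir(start_dir)

# Create an object, check that it is listed, then remove it (R: ls(), rm())
scratch = 42  # hypothetical name, for illustration only
listed_before = "scratch" in globals()
del scratch
listed_after = "scratch" in globals()

print(start_dir, len(files), listed_before, listed_after)
```

Note that `del` takes a bare name, not a quoted string.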
Data Frame Creation
Creating a data frame "df" of dimension 6x4 (6 rows and 4 columns) containing random numbers

R

A <- matrix(runif(24, 0, 1), nrow = 6, ncol = 4)
df <- data.frame(A)

Here,
• runif generates 24 random numbers uniformly distributed between 0 and 1
• matrix creates a matrix from those random numbers; nrow and ncol set the number of rows and columns of the matrix
• data.frame converts the matrix to a data frame

Python (using the pandas package*)

import numpy as np
import pandas as pd
A = np.random.rand(6, 4)
df = pd.DataFrame(A)

Here,
• np.random.rand generates a matrix of 6 rows and 4 columns of uniform random numbers between 0 and 1, matching runif; this function is part of the numpy** library (np.random.randn would instead draw from a standard normal distribution)
• pd.DataFrame converts the matrix into a data frame

*To install the pandas library visit: http://pandas.pydata.org/; to import it type: import pandas as pd
**To import the numpy library type: import numpy as np
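A repeatable version of the Python example can be sketched with a seeded random generator. The seed value and the use of `np.random.default_rng` are assumptions made for illustration; the slide itself calls the `np.random` functions directly.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is repeatable (the seed is arbitrary)
rng = np.random.default_rng(0)

# 6x4 matrix of uniform random numbers in [0, 1), like runif(24, 0, 1)
A = rng.random((6, 4))
df = pd.DataFrame(A)

print(df.shape)  # (6, 4)
print(df.min().min() >= 0.0, df.max().max() < 1.0)  # True True
```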
Data Frame: Inspecting and Viewing Data
Getting the names of the rows and columns of data frame "df"

R:
rownames(df) returns the names of the rows
colnames(df) returns the names of the columns

Python (using the pandas package*):
df.index returns the names of the rows
df.columns returns the names of the columns

Seeing the top and bottom "x" rows of data frame "df"

R:
head(df, x) returns the top x rows of the data frame
tail(df, x) returns the bottom x rows of the data frame

Python:
df.head(x) returns the top x rows of the data frame
df.tail(x) returns the bottom x rows of the data frame

Getting the dimensions of data frame "df"

R: dim(df) returns the dimensions in this format: rows, columns
Python: df.shape returns the dimensions in this format: (rows, columns)

Length of data frame "df"

R: length(df) returns the number of columns in the data frame
Python: len(df.columns) returns the number of columns in the data frame; len(df) returns the number of rows
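The inspection calls above can be checked on a small frame with known contents. This is a sketch assuming a 6x4 frame built from `np.arange`, which is a stand-in for the random data used on the slides.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24).reshape(6, 4))

row_names = list(df.index)     # R: rownames(df)
col_names = list(df.columns)   # R: colnames(df)
top = df.head(2)               # R: head(df, 2)
bottom = df.tail(2)            # R: tail(df, 2)
n_rows, n_cols = df.shape      # R: dim(df)

print(row_names)        # [0, 1, 2, 3, 4, 5]
print(col_names)        # [0, 1, 2, 3]
print(n_rows, n_cols)   # 6 4
print(len(df))          # 6: the number of rows
print(len(df.columns))  # 4: matches R's length(df)
```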
Getting a quick summary (mean, standard deviation, etc.) of the data in data frame "df"

R:
summary(df) returns the minimum, first quartile, median, mean, third quartile and maximum of each column

Python (using the pandas package*):
df.describe() returns the count, mean, standard deviation, minimum, 25%, 50%, 75% and maximum of each column

Setting the row names and column names of data frame "df"

R:
rownames(df) = c("A", "B", "C", "D", "E", "F") sets the row names to A, B, C, D, E and F
colnames(df) = c("P", "Q", "R", "S") sets the column names to P, Q, R and S

Python:
df.index = ["A", "B", "C", "D", "E", "F"] sets the row names to A, B, C, D, E and F
df.columns = ["P", "Q", "R", "S"] sets the column names to P, Q, R and S
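Renaming and summarising can be combined in a few lines. The sketch below assumes a 6x4 frame filled with the numbers 0 to 23 so that the summary values are easy to verify by hand.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24, dtype=float).reshape(6, 4))

# Set row and column names (R: rownames(df) <- ..., colnames(df) <- ...)
df.index = ["A", "B", "C", "D", "E", "F"]
df.columns = ["P", "Q", "R", "S"]

# Summary statistics per column (R: summary(df))
summary = df.describe()

print(list(summary.index))
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
print(summary.loc["mean", "P"])  # 10.0
```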
Data Frame: Sorting Data
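Sorting the data in data frame "df" by column "P"

R: df[order(df$P),]
Python (using the pandas package*): df.sort_values('P')

df.sort(['P']) worked in older pandas releases but has since been removed; sort_values is its replacement.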
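A small sketch of sorting in both directions, assuming a three-row frame with hypothetical columns `P` and `Q`:

```python
import pandas as pd

df = pd.DataFrame({"P": [3, 1, 2], "Q": ["c", "a", "b"]})

# Ascending sort by column P (R: df[order(df$P), ])
by_p = df.sort_values("P")

# Descending sort (R: df[order(-df$P), ])
by_p_desc = df.sort_values("P", ascending=False)

print(by_p["Q"].tolist())       # ['a', 'b', 'c']
print(by_p_desc["Q"].tolist())  # ['c', 'b', 'a']
```

Like the R expression, `sort_values` returns a sorted copy and leaves `df` itself unchanged.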
Data Frame: Data Selection
Slicing the rows of a data frame from row no. "x" to row no. "y" (including rows x and y)

R: df[x:y,]
Python (using the pandas package*): df[x-1:y]
Python starts counting from 0 and excludes the end of a slice, so R's rows x to y are positions x-1 to y-1.

Selecting the columns named "X", "Y", etc. of data frame "df"

R:
myvars <- c("X", "Y")
newdata <- df[myvars]
Python: df.loc[:, ['X', 'Y']]

Selecting the data from row no. "x" to "y" and column no. "a" to "b"

R: df[x:y, a:b]
Python: df.iloc[x-1:y, a-1:b]

Selecting the element at row no. "x" and column no. "y"

R: df[x, y]
Python: df.iat[x-1, y-1]
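The index shift between R and Python is easiest to see with concrete numbers. In this sketch the values of `x`, `y`, `a` and `b` and the column names are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24).reshape(6, 4), columns=["X", "Y", "Z", "W"])

x, y = 2, 4  # 1-based row numbers, as an R user would write them
a, b = 1, 2  # 1-based column numbers

rows = df[x - 1:y]                 # R: df[x:y, ]
cols = df.loc[:, ["X", "Y"]]       # R: df[c("X", "Y")]
block = df.iloc[x - 1:y, a - 1:b]  # R: df[x:y, a:b]
cell = df.iat[x - 1, y - 1]        # R: df[x, y]

print(rows.shape)   # (3, 4)
print(block.shape)  # (3, 2)
print(cell)         # 7
```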
Using a single column's values to select data (column name "A")

R: subset(df, A > 0)
This selects all the rows in which the value in column A is greater than 0.

Python (using the pandas package*): df[df.A > 0]
This does the same as the R function.
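The selection works through a boolean mask, which can be inspected on its own. A minimal sketch with made-up values:

```python
import pandas as pd

df = pd.DataFrame({"A": [-1.0, 0.5, 2.0, -3.0], "B": [10, 20, 30, 40]})

# Boolean mask: True where column A is greater than 0
mask = df["A"] > 0

# Keep only the rows where the mask is True (R: subset(df, A > 0))
positive = df[mask]

print(mask.tolist())           # [False, True, True, False]
print(positive["B"].tolist())  # [20, 30]
```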
Mathematical Functions
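Function | R | Python (import the math and numpy libraries)
Sum | sum(x) | math.fsum(x)
Square root | sqrt(x) | math.sqrt(x)
Standard deviation | sd(x) | numpy.std(x, ddof=1)
Log | log(x) | math.log(x) or math.log(x, base)
Mean | mean(x) | numpy.mean(x)
Median | median(x) | numpy.median(x)

R's sd divides by n-1, whereas numpy.std divides by n unless ddof=1 is passed. math.sqrt and math.log accept a single number; for arrays use numpy.sqrt and numpy.log.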
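The difference between the two standard deviation conventions shows up clearly on a small sample. The data below are made up so that the population standard deviation comes out to exactly 2.

```python
import math

import numpy as np

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

total = math.fsum(x)           # R: sum(x)
mean = np.mean(x)              # R: mean(x)
median = np.median(x)          # R: median(x)
sd_population = np.std(x)      # divides by n
sd_sample = np.std(x, ddof=1)  # divides by n - 1, like R's sd(x)

print(total, mean, median)  # 40.0 5.0 4.5
print(sd_population)        # 2.0
print(sd_sample > sd_population)  # True
```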
Data Manipulation
Function | R | Python (import the math and numpy libraries)
Convert a character variable to a numeric variable | as.numeric(x) | For a single value: int(x), float(x). For lists, vectors, etc.: list(map(int, x)), list(map(float, x))
Convert a factor/numeric variable to a character variable | paste(x) | For a single value: str(x). For lists, vectors, etc.: list(map(str, x))
Check for a missing value in an object | is.na(x) | math.isnan(x)
Delete missing values from an object | na.omit(list) | cleaned_list = [x for x in values if str(x) != 'nan']
Calculate the number of characters in a character value | nchar(x) | len(x)

long(x) exists only in Python 2; in Python 3, int covers it, and map returns an iterator, hence the list(...) wrapper.
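The conversions and missing-value handling above can be sketched as follows. Note one substitution: the filter uses `math.isnan` rather than the string comparison from the table, which is an assumption that every element is a float.

```python
import math

raw = ["1", "2", "3"]

as_int = list(map(int, raw))      # R: as.numeric(x)
as_float = list(map(float, raw))
as_text = list(map(str, as_int))  # R: paste(x)

values = [1.5, float("nan"), 3.0]

# math.isnan checks a single float (R: is.na(x))
missing = [math.isnan(v) for v in values]

# Drop missing values (R: na.omit(x))
cleaned = [v for v in values if not math.isnan(v)]

print(as_int, as_float, as_text)  # [1, 2, 3] [1.0, 2.0, 3.0] ['1', '2', '3']
print(missing)                    # [False, True, False]
print(cleaned)                    # [1.5, 3.0]
print(len("hello"))               # 5 (R: nchar("hello"))
```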
Date & Time Manipulation
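Function | R (import the lubridate library) | Python (import the datetime library)
Getting the date and time at an instant | Sys.time() | datetime.datetime.now()

Parsing or formatting the date and time in the format YYYY MM DD HH:MM:SS

R:
d <- Sys.time()
d_format <- ymd_hms(d)

Python:
d = datetime.datetime.now()
format = "%Y %m %d %H:%M:%S"
d_format = d.strftime(format)

ymd_hms parses a date-time, while strftime formats one as text. %m gives the numeric month; %b would give the abbreviated month name instead.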
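Formatting and parsing are inverse operations in the datetime library, which a round trip makes visible. The timestamp below is an arbitrary fixed value chosen so that the output is repeatable.

```python
import datetime

fmt = "%Y %m %d %H:%M:%S"

# A fixed timestamp (arbitrary) so the output is repeatable
d = datetime.datetime(2014, 12, 1, 9, 30, 0)

# Formatting: datetime -> text
text = d.strftime(fmt)

# Parsing: text -> datetime (closer to what lubridate's ymd_hms does)
parsed = datetime.datetime.strptime(text, fmt)

print(text)         # 2014 12 01 09:30:00
print(parsed == d)  # True
```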
Data Visualization
Function | R | Python (import the matplotlib library**)
Scatter plot of variable1 vs variable2 | plot(variable1, variable2) | plt.scatter(variable1, variable2); plt.show()
Boxplot for Var | boxplot(Var) | plt.boxplot(Var); plt.show()
Histogram for Var | hist(Var) | plt.hist(Var); plt.show()
Pie chart for Var | pie(Var) | plt.pie(Var); plt.show()

**To import the matplotlib library type: import matplotlib.pyplot as plt
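The four plot types in the table can be drawn together in one figure. This sketch makes several assumptions not found on the slides: it uses the non-interactive Agg backend, draws through Axes methods rather than the `plt.*` functions, uses made-up data, and renders to an in-memory buffer instead of calling `plt.show()`.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

variable1 = [1, 2, 3, 4, 5]
variable2 = [2, 4, 5, 4, 6]

# One figure with four panels, one per plot type in the table
fig, axes = plt.subplots(2, 2, figsize=(8, 6))
axes[0, 0].scatter(variable1, variable2)  # R: plot(variable1, variable2)
axes[0, 1].boxplot(variable2)             # R: boxplot(Var)
axes[1, 0].hist(variable2)                # R: hist(Var)
axes[1, 1].pie(variable2)                 # R: pie(Var)

# Render to an in-memory PNG instead of opening a window
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)

print(buffer.getbuffer().nbytes > 0)  # True
```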
[Figures omitted: R and Python examples of a scatter plot, box plot, histogram, line plot, bar chart, bubble chart and pie chart]
Thank You
For feedback contact
DecisionStats.com
Coming up
● Data Mining in Python and R (see draft slides afterwards)
Machine Learning: SVM on Iris Dataset
*To know more about the svm function in R visit: http://cran.r-project.org/web/packages/e1071/
**To install the sklearn library visit: http://scikit-learn.org/; to know more about sklearn svm visit: http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html

R (using the svm* function)

library(e1071)
data(iris)
trainset <- iris[1:149,]
testset <- iris[150,]
svm.model <- svm(Species ~ ., data = trainset, cost = 100, gamma = 1, type = 'C-classification')
svm.pred <- predict(svm.model, testset[-5])
svm.pred

Output: virginica

Python (using the sklearn** library)

# Loading the library
from sklearn import svm
# Importing the datasets module
from sklearn import datasets
# Creating the SVM classifier
clf = svm.SVC()
# Loading the dataset
iris = datasets.load_iris()
# Constructing the training data (all rows except the last)
X, y = iris.data[:-1], iris.target[:-1]
# Fitting the SVM
clf.fit(X, y)
# Testing the model on the held-out last row (kept two-dimensional with [-1:])
print(clf.predict(iris.data[-1:]))

Output: 2, corresponds to virginica
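Current scikit-learn releases expect `predict` to receive a two-dimensional array, even for a single sample, and return a numeric label rather than a species name. The sketch below shows the shape difference and the lookup through `target_names`; it keeps the default SVC settings used on the slide.

```python
from sklearn import datasets, svm

iris = datasets.load_iris()
X, y = iris.data[:-1], iris.target[:-1]

clf = svm.SVC()
clf.fit(X, y)

last_1d = iris.data[-1]   # shape (4,): a bare feature vector
last_2d = iris.data[-1:]  # shape (1, 4): one sample with four features

# predict returns an array with one label per sample
label = clf.predict(last_2d)[0]
# Map the numeric label back to the species name
species = iris.target_names[label]

print(last_1d.shape, last_2d.shape)  # (4,) (1, 4)
print(label, species)
```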
Linear Regression: Iris Dataset
*To know more about the lm function in R visit: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/lm.html
**To know more about sklearn linear regression visit: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html

R (using the lm* function)

data(iris)
total_size <- dim(iris)[1]
num_target <- c(rep(0, total_size))
# Recode the species as numbers: setosa = 0, versicolor = 1, virginica = 2
for (i in 1:length(num_target)) {
  if (iris$Species[i] == 'setosa') {num_target[i] <- 0}
  else if (iris$Species[i] == 'versicolor') {num_target[i] <- 1}
  else {num_target[i] <- 2}
}
iris$Species <- num_target
train_set <- iris[1:149,]
test_set <- iris[150,]
fit <- lm(Species ~ 0 + Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = train_set)
coefficients(fit)
predict.lm(fit, test_set)

Output: 1.64

Python (using the sklearn** library)

from sklearn import linear_model
from sklearn import datasets
iris = datasets.load_iris()
regr = linear_model.LinearRegression()
X, y = iris.data[:-1], iris.target[:-1]
regr.fit(X, y)
print(regr.coef_)
print(regr.predict(iris.data[-1:]))

Output: 1.65

The 0 + in the R formula drops the intercept, whereas LinearRegression fits one by default, so the two models are not identical.
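The R formula starts with `0 +`, which removes the intercept, while `LinearRegression` estimates one unless told otherwise. A sketch of both variants side by side, assuming that `fit_intercept=False` is the closest scikit-learn counterpart to the R formula:

```python
from sklearn import datasets, linear_model

iris = datasets.load_iris()
X, y = iris.data[:-1], iris.target[:-1]
last = iris.data[-1:]

# Default: an intercept is estimated alongside the four coefficients
with_intercept = linear_model.LinearRegression()
with_intercept.fit(X, y)

# fit_intercept=False forces the fit through the origin, like `0 +` in lm
no_intercept = linear_model.LinearRegression(fit_intercept=False)
no_intercept.fit(X, y)

print(with_intercept.intercept_, with_intercept.predict(last))
print(no_intercept.intercept_, no_intercept.predict(last))
```

The two predictions for the held-out row will generally differ slightly, which is one plausible reason the R and Python outputs on the slide do not match exactly.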
Random forest: Iris Dataset
*To know more about the randomForest package in R visit: http://cran.r-project.org/web/packages/randomForest/
**To know more about sklearn random forest visit: http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html

R (using the randomForest* package)

library(randomForest)
data(iris)
total_size <- dim(iris)[1]
num_target <- c(rep(0, total_size))
# Recode the species as numbers: setosa = 0, versicolor = 1, virginica = 2
for (i in 1:length(num_target)) {
  if (iris$Species[i] == 'setosa') {num_target[i] <- 0}
  else if (iris$Species[i] == 'versicolor') {num_target[i] <- 1}
  else {num_target[i] <- 2}
}
iris$Species <- num_target
train_set <- iris[1:149,]
test_set <- iris[150,]
iris.rf <- randomForest(Species ~ ., data = train_set, ntree = 100, importance = TRUE, proximity = TRUE)
print(iris.rf)
predict(iris.rf, test_set[-5], predict.all = TRUE)

Output: 1.845

Python (using the sklearn** library)

from sklearn import ensemble
from sklearn import datasets
clf = ensemble.RandomForestClassifier(n_estimators=100, max_depth=10)
iris = datasets.load_iris()
X, y = iris.data[:-1], iris.target[:-1]
clf.fit(X, y)
print(clf.predict(iris.data[-1:]))

Output: 2

Because Species has been recoded as a number, randomForest fits a regression forest here and returns a fractional value, while RandomForestClassifier returns a class label.
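The R code trains on a numeric target, so it behaves like a regression forest, while the Python code trains a classifier. The sketch below runs both kinds of forest in scikit-learn to show the difference in output type; `RandomForestRegressor` and the `random_state` values are additions for illustration and are not on the slide.

```python
from sklearn import datasets, ensemble

iris = datasets.load_iris()
X, y = iris.data[:-1], iris.target[:-1]
last = iris.data[-1:]

# Classification: predicts one of the class labels 0, 1 or 2
clf = ensemble.RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
class_label = clf.predict(last)[0]

# Regression on the numeric labels: predicts an average, possibly fractional
reg = ensemble.RandomForestRegressor(n_estimators=100, random_state=0)
reg.fit(X, y)
numeric_value = reg.predict(last)[0]

print(class_label, numeric_value)
```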
Decision Tree: Iris Dataset
*To know more about the rpart package in R visit: http://cran.r-project.org/web/packages/rpart/
**To know more about sklearn decision tree visit: http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html

R (using the rpart* package)

library(rpart)
data(iris)
sub <- c(1:149)
fit <- rpart(Species ~ ., data = iris, subset = sub)
fit
predict(fit, iris[-sub,], type = "class")

Output: virginica

Python (using the sklearn** library)

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier(random_state=0)
iris = load_iris()
X, y = iris.data[:-1], iris.target[:-1]
clf.fit(X, y)
print(clf.predict(iris.data[-1:]))

Output: 2, corresponds to virginica
Gaussian Naive Bayes: Iris Dataset
*To know more about the e1071 package in R visit: http://cran.r-project.org/web/packages/e1071/
**To know more about sklearn Naive Bayes visit: http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html

R (using the e1071* package)

library(e1071)
data(iris)
trainset <- iris[1:149,]
testset <- iris[150,]
classifier <- naiveBayes(trainset[,1:4], trainset[,5])
predict(classifier, testset[,-5])

Output: virginica

Python (using the sklearn** library)

from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
clf = GaussianNB()
iris = load_iris()
X, y = iris.data[:-1], iris.target[:-1]
clf.fit(X, y)
print(clf.predict(iris.data[-1:]))

Output: 2, corresponds to virginica
K Nearest Neighbours: Iris Dataset
*To know more about the kknn package in R visit:
**To know more about sklearn k nearest neighbours visit: http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html

R (using the kknn* package)

library(kknn)
data(iris)
trainset <- iris[1:149,]
testset <- iris[150,]
iris.kknn <- kknn(Species ~ ., trainset, testset, distance = 1, kernel = "triangular")
summary(iris.kknn)
fit <- fitted(iris.kknn)
fit

Output: virginica

Python (using the sklearn** library)

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier()
iris = load_iris()
X, y = iris.data[:-1], iris.target[:-1]
knn.fit(X, y)
print(knn.predict(iris.data[-1:]))

Output: 2, corresponds to virginica
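Each example above tests its model on a single flower, which says little about overall accuracy. As a rough sketch of a broader check, the classifiers from the last three slides can be scored on a larger held-out set. The use of `train_test_split`, the 20% test size, stratification and the seed are all assumptions added here, not part of the slides.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# Hold out 20% of the rows, keeping the species proportions equal
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0, stratify=iris.target
)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "naive Bayes": GaussianNB(),
    "k nearest neighbours": KNeighborsClassifier(),
}

# Fit each model and record its accuracy on the held-out rows
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```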
Thank You
For feedback please let us know at
ohri2007@gmail.com
StatsCommunications
 
report (maam dona subject).pptxhsgwiswhs
report (maam dona subject).pptxhsgwiswhsreport (maam dona subject).pptxhsgwiswhs
report (maam dona subject).pptxhsgwiswhs
AngelPinedaTaguinod
 
Time series analysis & forecasting day 2.pptx
Time series analysis & forecasting day 2.pptxTime series analysis & forecasting day 2.pptx
Time series analysis & forecasting day 2.pptx
AsmaaMahmoud89
 
Get Started with FukreyGame Today!......
Get Started with FukreyGame Today!......Get Started with FukreyGame Today!......
Get Started with FukreyGame Today!......
liononline785
 
Hootsuite Social Trends 2025 Report_en.pdf
Hootsuite Social Trends 2025 Report_en.pdfHootsuite Social Trends 2025 Report_en.pdf
Hootsuite Social Trends 2025 Report_en.pdf
lionardoadityabagask
 
Introduction to MedDRA hgjuyh mnhvnj mbv hvj jhgjgjgjg
Introduction to MedDRA hgjuyh mnhvnj mbv hvj jhgjgjgjgIntroduction to MedDRA hgjuyh mnhvnj mbv hvj jhgjgjgjg
Introduction to MedDRA hgjuyh mnhvnj mbv hvj jhgjgjgjg
MichaelTuffourAmirik
 
Mixed Methods Research.pptx education 201
Mixed Methods Research.pptx education 201Mixed Methods Research.pptx education 201
Mixed Methods Research.pptx education 201
GraceSolaa1
 
最新版澳洲西澳大利亚大学毕业证(UWA毕业证书)原版定制
最新版澳洲西澳大利亚大学毕业证(UWA毕业证书)原版定制最新版澳洲西澳大利亚大学毕业证(UWA毕业证书)原版定制
最新版澳洲西澳大利亚大学毕业证(UWA毕业证书)原版定制
Taqyea
 
End to End Process Analysis - Cox Communications
End to End Process Analysis - Cox CommunicationsEnd to End Process Analysis - Cox Communications
End to End Process Analysis - Cox Communications
Process mining Evangelist
 
Important JavaScript Concepts Every Developer Must Know
Important JavaScript Concepts Every Developer Must KnowImportant JavaScript Concepts Every Developer Must Know
Important JavaScript Concepts Every Developer Must Know
yashikanigam1
 
Dr. Robert Krug - Expert In Artificial Intelligence
Dr. Robert Krug - Expert In Artificial IntelligenceDr. Robert Krug - Expert In Artificial Intelligence
Dr. Robert Krug - Expert In Artificial Intelligence
Dr. Robert Krug
 
Digital Disruption Use Case_Music Industry_for students.pdf
Digital Disruption Use Case_Music Industry_for students.pdfDigital Disruption Use Case_Music Industry_for students.pdf
Digital Disruption Use Case_Music Industry_for students.pdf
ProsenjitMitra9
 
Feature Engineering for Electronic Health Record Systems
Feature Engineering for Electronic Health Record SystemsFeature Engineering for Electronic Health Record Systems
Feature Engineering for Electronic Health Record Systems
Process mining Evangelist
 
Introduction to Python_for_machine_learning.pdf
Introduction to Python_for_machine_learning.pdfIntroduction to Python_for_machine_learning.pdf
Introduction to Python_for_machine_learning.pdf
goldenflower34
 
Dynamics 365 Business Rules Dynamics Dynamics
Dynamics 365 Business Rules Dynamics DynamicsDynamics 365 Business Rules Dynamics Dynamics
Dynamics 365 Business Rules Dynamics Dynamics
heyoubro69
 
Red Hat Openshift Training - openshift (1).pptx
Red Hat Openshift Training - openshift (1).pptxRed Hat Openshift Training - openshift (1).pptx
Red Hat Openshift Training - openshift (1).pptx
ssuserf60686
 
presentacion.slideshare.informáticaJuridica..pptx
presentacion.slideshare.informáticaJuridica..pptxpresentacion.slideshare.informáticaJuridica..pptx
presentacion.slideshare.informáticaJuridica..pptx
GersonVillatoro4
 
web-roadmap developer file information..
web-roadmap developer file information..web-roadmap developer file information..
web-roadmap developer file information..
pandeyarush01
 
390713553-Introduction-to-Apportionment-and-Voting.pptx
390713553-Introduction-to-Apportionment-and-Voting.pptx390713553-Introduction-to-Apportionment-and-Voting.pptx
390713553-Introduction-to-Apportionment-and-Voting.pptx
KhimJDAbordo
 
Publication-launch-How-is-Life-for-Children-in-the-Digital-Age-15-May-2025.pdf
Publication-launch-How-is-Life-for-Children-in-the-Digital-Age-15-May-2025.pdfPublication-launch-How-is-Life-for-Children-in-the-Digital-Age-15-May-2025.pdf
Publication-launch-How-is-Life-for-Children-in-the-Digital-Age-15-May-2025.pdf
StatsCommunications
 
report (maam dona subject).pptxhsgwiswhs
report (maam dona subject).pptxhsgwiswhsreport (maam dona subject).pptxhsgwiswhs
report (maam dona subject).pptxhsgwiswhs
AngelPinedaTaguinod
 
Time series analysis & forecasting day 2.pptx
Time series analysis & forecasting day 2.pptxTime series analysis & forecasting day 2.pptx
Time series analysis & forecasting day 2.pptx
AsmaaMahmoud89
 
Get Started with FukreyGame Today!......
Get Started with FukreyGame Today!......Get Started with FukreyGame Today!......
Get Started with FukreyGame Today!......
liononline785
 

Python for R Users

  • 1. Python for R Users By Chandan Routray As a part of internship at www.decisionstats.com
  • 2. Basic Commands (Function: R | Python). Downloading and installing a package: install.packages('name') | pip install name (run from the shell, not inside Python). Loading a package: library('name') | import name as other_name. Checking the working directory: getwd() | import os; os.getcwd(). Setting the working directory: setwd() | os.chdir(). Listing files in a directory: dir() | os.listdir(). Listing all objects: ls() | globals(). Removing an object: rm('name') | del name. Dec 2014 Copyright www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
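The Python column of the basic commands can be tried out with a short sketch like the one below. It runs inside a throwaway temporary directory; the file name example.txt and the variable scratch are made up for illustration only.

```python
import os
import tempfile

start_dir = os.getcwd()               # like getwd() in R

with tempfile.TemporaryDirectory() as tmp_dir:
    os.chdir(tmp_dir)                 # like setwd(tmp_dir)
    open("example.txt", "w").close()  # create an empty, hypothetical file
    files = os.listdir()              # like dir(); lists the current directory
    os.chdir(start_dir)               # go back before the directory is deleted

scratch = [1, 2, 3]
del scratch                           # like rm(scratch); the name no longer exists

print(files)
```

Note that del is a statement that takes a name, not a function that takes a string.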
  • 3. Data Frame Creation (R | Python, using the pandas package*). Creating a data frame “df” of dimension 6x4 (6 rows and 4 columns) containing random numbers. R: A <- matrix(runif(24,0,1),nrow=6,ncol=4); df <- data.frame(A). Here, runif generates 24 random numbers between 0 and 1; matrix builds a matrix from those numbers, with nrow and ncol setting the number of rows and columns; data.frame converts the matrix to a data frame. Python: import numpy as np; import pandas as pd; A=np.random.randn(6,4); df=pd.DataFrame(A). Here, np.random.randn generates a matrix of 6 rows and 4 columns and is part of the numpy** library; it draws from the standard normal distribution, so np.random.rand(6,4) is the closer match to runif's uniform numbers between 0 and 1. pd.DataFrame converts the matrix into a data frame. *To install the pandas library visit: http://pandas.pydata.org/; to import it type: import pandas as pd. **To import the numpy library type: import numpy as np.
  • 4. Data Frame Creation R Python
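R's runif draws uniform numbers, whereas numpy's randn draws from a standard normal distribution, so the two data frames on the creation slide are not drawn from the same distribution. A minimal sketch of both variants follows, assuming a reasonably recent numpy with the Generator API; the seed is arbitrary and only there for reproducibility.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility only

# Uniform draws on [0, 1): the closest match to R's runif(24, 0, 1)
uniform_df = pd.DataFrame(rng.random((6, 4)))

# Standard normal draws: the distribution np.random.randn(6, 4) samples from
normal_df = pd.DataFrame(rng.standard_normal((6, 4)))

print(uniform_df.shape)
print(normal_df.shape)
```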
  • 5. Data Frame: Inspecting and Viewing Data (R | Python, using the pandas package*). Getting the names of the rows and columns of data frame “df”: rownames(df) returns the names of the rows and colnames(df) returns the names of the columns | df.index returns the names of the rows and df.columns returns the names of the columns. Seeing the top and bottom “x” rows of the data frame “df”: head(df,x) returns the top x rows and tail(df,x) returns the bottom x rows | df.head(x) returns the top x rows and df.tail(x) returns the bottom x rows. Getting the dimensions of data frame “df”: dim(df) returns them in the format rows, columns | df.shape returns them in the format (rows, columns). Length of data frame “df”: length(df) returns the number of columns | len(df) returns the number of rows; use len(df.columns) for the number of columns.
  • 6. Data Frame: Inspecting and Viewing Data R Python
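One point worth checking by hand is what "length" means on each side: R's length(df) counts columns, while Python's len(df) counts rows. The sketch below illustrates the pandas inspection calls on a placeholder 6x4 frame; the values themselves are arbitrary.

```python
import numpy as np
import pandas as pd

# Placeholder data: 6 rows and 4 columns holding the numbers 0..23
df = pd.DataFrame(np.arange(24).reshape(6, 4))

n_rows, n_cols = df.shape   # like dim(df) in R
top = df.head(2)            # like head(df, 2)
bottom = df.tail(2)         # like tail(df, 2)

print(df.shape)             # (rows, columns)
print(len(df))              # number of rows
print(len(df.columns))      # number of columns, like length(df) in R
print(list(df.index))       # row labels, like rownames(df)
```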
  • 7. Data Frame: Inspecting and Viewing Data (R | Python, using the pandas package*). Getting a quick summary (mean, standard deviation, etc.) of the data in the data frame “df”: summary(df) returns the minimum, first quartile, median, mean, third quartile and maximum | df.describe() returns the count, mean, standard deviation, minimum, 25%, 50%, 75% and maximum. Setting the row names and column names of the data frame “df”: rownames(df)=c("A","B","C","D","E","F") sets the row names to A, B, C, D, E and F; colnames(df)=c("P","Q","R","S") sets the column names to P, Q, R and S | df.index=["A","B","C","D","E","F"] sets the row names to A, B, C, D, E and F; df.columns=["P","Q","R","S"] sets the column names to P, Q, R and S.
  • 8. Data Frame: Inspecting and Viewing Data R Python
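Naming rows and columns and then summarising the frame might look like the sketch below. The data is a placeholder grid of the numbers 0 to 23, chosen so the summary values are easy to verify by hand.

```python
import numpy as np
import pandas as pd

# Placeholder data: column P holds 0, 4, 8, 12, 16, 20
df = pd.DataFrame(np.arange(24, dtype=float).reshape(6, 4))

df.index = ["A", "B", "C", "D", "E", "F"]   # like rownames(df) <- c(...)
df.columns = ["P", "Q", "R", "S"]           # like colnames(df) <- c(...)

summary = df.describe()                     # rough counterpart of summary(df)
print(summary.index.tolist())
print(summary.loc["mean", "P"])
```

The summary is itself a data frame, so individual statistics can be read with .loc as shown.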
  • 9. Data Frame: Sorting Data (R | Python, using the pandas package*). Sorting the data in the data frame “df” by the column named “P”: df[order(df$P),] | df.sort_values('P') (df.sort(['P']) in the pandas releases current in 2014; later releases removed it).
  • 10. Data Frame: Sorting Data R Python
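In current pandas, sorting by a column goes through sort_values; the older DataFrame.sort method is no longer available. A small sketch with made-up values:

```python
import pandas as pd

# Hypothetical three-row frame, deliberately out of order
df = pd.DataFrame({"P": [3, 1, 2], "Q": ["c", "a", "b"]})

ascending = df.sort_values("P")                    # like df[order(df$P), ] in R
descending = df.sort_values("P", ascending=False)  # like df[order(-df$P), ]

print(ascending["Q"].tolist())
print(descending["P"].tolist())
```

sort_values returns a new data frame and leaves df itself unchanged, which matches the behaviour of the R expression.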
  • 11. Data Frame: Data Selection (R | Python, using the pandas package*). Slicing the rows of a data frame from row no. “x” to row no. “y” (including rows x and y): df[x:y,] | df[x-1:y] (Python starts counting from 0 and excludes the end of a slice). Selecting the columns named “X”, “Y” etc. of a data frame “df”: myvars <- c("X","Y"); newdata <- df[myvars] | df.loc[:,['X','Y']]. Selecting the data from row no. “x” to “y” and column no. “a” to “b”: df[x:y,a:b] | df.iloc[x-1:y,a-1:b]. Selecting the element at row no. “x” and column no. “y”: df[x,y] | df.iat[x-1,y-1].
  • 12. Data Frame: Data Selection R Python
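The shift from R's 1-based, inclusive ranges to Python's 0-based, end-exclusive slices is easiest to see with concrete numbers. In the sketch below the frame and the values of x, y, a and b are arbitrary placeholders.

```python
import numpy as np
import pandas as pd

# Placeholder data: the cell in 0-based row i, column j holds 4*i + j
df = pd.DataFrame(np.arange(24).reshape(6, 4), columns=["P", "Q", "R", "S"])

x, y = 2, 4   # 1-based row numbers, as an R user would write them
a, b = 1, 2   # 1-based column numbers

rows = df[x - 1:y]                 # R: df[x:y, ]
cols = df.loc[:, ["P", "Q"]]       # R: df[c("P", "Q")]
block = df.iloc[x - 1:y, a - 1:b]  # R: df[x:y, a:b]
cell = df.iat[x - 1, a - 1]        # R: df[x, a]

print(block)
print(cell)
```

Subtracting 1 from the start converts to 0-based indexing, and leaving the end unchanged compensates for Python excluding the final position.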
  • 13. Data Frame: Data Selection (R | Python, using the pandas package*). Using a single column's values to select data, column name “A”: subset(df,A>0) selects all the rows in which the value in column A is greater than 0 | df[df.A > 0] does the same as the R function.
  • 14. Mathematical Functions (Function: R | Python, after importing the math and numpy libraries). Sum: sum(x) | math.fsum(x). Square root: sqrt(x) | math.sqrt(x). Standard deviation: sd(x) | numpy.std(x) (sd divides by n-1, while numpy.std divides by n by default; numpy.std(x, ddof=1) matches sd). Log: log(x) | math.log(x[,base]). Mean: mean(x) | numpy.mean(x). Median: median(x) | numpy.median(x).
  • 15. Mathematical Functions R Python
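The standard deviation is the one entry in the mathematical functions table where the defaults differ: R's sd uses the sample formula (dividing by n-1), while numpy.std divides by n unless ddof=1 is passed. A short sketch on a made-up vector:

```python
import math
import numpy as np

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up sample

total = math.fsum(x)            # like sum(x)
mean = np.mean(x)               # like mean(x)
median = np.median(x)           # like median(x)
pop_sd = np.std(x)              # divides by n (population formula)
sample_sd = np.std(x, ddof=1)   # divides by n - 1, as R's sd(x) does

print(total, mean, median)
print(pop_sd, sample_sd)
```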
  • 16. Data Manipulation Functions (R | Python, after importing the math and numpy libraries). Converting a character variable to a numeric variable: as.numeric(x) | for a single value: int(x), float(x) (and long(x) in Python 2); for lists, vectors etc.: map(int,x), map(float,x) (wrap in list(...) in Python 3, where map returns an iterator). Converting a factor/numeric variable to a character variable: paste(x) | for a single value: str(x); for lists, vectors etc.: map(str,x). Checking for a missing value in an object: is.na(x) | math.isnan(x). Deleting missing values from an object: na.omit(list) | cleanedList = [x for x in list if str(x) != 'nan']. Calculating the number of characters in a character value: nchar(x) | len(x).
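In Python 3 the conversions above can be sketched as follows. The input strings are invented for illustration, and math.isnan is used for the missing-value filter instead of comparing against the string 'nan'.

```python
import math

raw = ["1.5", "2", "nan", "4"]          # hypothetical character data

numbers = list(map(float, raw))         # like as.numeric(raw)
cleaned = [v for v in numbers if not math.isnan(v)]  # like na.omit(numbers)
labels = list(map(str, cleaned))        # like paste(cleaned)
n_chars = len("hello")                  # like nchar("hello")

print(cleaned)
print(labels)
print(n_chars)
```

Wrapping map in list(...) is needed because map returns a lazy iterator in Python 3.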
  • 17. Date & Time Manipulation Functions (R, after importing the lubridate library | Python, after importing the datetime library). Getting the date and time at an instant: Sys.time() | datetime.datetime.now(). Parsing date and time in the format YYYY MM DD HH:MM:SS: d <- Sys.time(); d_format <- ymd_hms(d) | d=datetime.datetime.now(); format="%Y %m %d %H:%M:%S"; d_format=d.strftime(format).
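On the Python side, strftime turns a datetime into text and strptime parses text back into a datetime. The sketch below uses a fixed, made-up timestamp rather than the current time so the result is reproducible; %m is the numeric month, whereas %b would give an abbreviated month name.

```python
import datetime

fmt = "%Y %m %d %H:%M:%S"                            # YYYY MM DD HH:MM:SS

moment = datetime.datetime(2014, 12, 1, 9, 30, 0)    # hypothetical timestamp
text = moment.strftime(fmt)                          # datetime -> string
parsed = datetime.datetime.strptime(text, fmt)       # string -> datetime

print(text)
print(parsed == moment)
```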
  • 18. Data Visualization Functions (R | Python, after importing the matplotlib library**). Scatter plot of variable1 vs variable2: plot(variable1,variable2) | plt.scatter(variable1,variable2); plt.show(). Boxplot for Var: boxplot(Var) | plt.boxplot(Var); plt.show(). Histogram for Var: hist(Var) | plt.hist(Var); plt.show(). Pie chart for Var: pie(Var) | from pylab import *; pie(Var); show(). **To import the matplotlib library type: import matplotlib.pyplot as plt.
  • 19. Data Visualization: Scatter Plot R Python
  • 20. Data Visualization: Box Plot R Python
  • 21. Data Visualization: Histogram R Python
  • 22. Data Visualization: Line Plot R Python
  • 23. Data Visualization: Bubble R Python
  • 24. Data Visualization: Bar R Python
  • 25. Data Visualization: Pie Chart R Python
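The four chart types from the visualization table can be drawn in one figure, as in the sketch below. It is only an illustration: the data is randomly generated, the pie shares and labels are invented, and the figure is rendered to an in-memory PNG with the non-interactive Agg backend instead of calling plt.show(), so it also runs on a machine without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for on-screen plots
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)        # arbitrary seed
variable1 = rng.random(50)            # made-up data
variable2 = rng.random(50)
shares = [40, 30, 20, 10]             # made-up pie slices

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
axes[0, 0].scatter(variable1, variable2)   # like plot(variable1, variable2)
axes[0, 0].set_title("Scatter plot")
axes[0, 1].boxplot(variable1)              # like boxplot(Var)
axes[0, 1].set_title("Box plot")
axes[1, 0].hist(variable1, bins=10)        # like hist(Var)
axes[1, 0].set_title("Histogram")
axes[1, 1].pie(shares, labels=["a", "b", "c", "d"])  # like pie(Var)
axes[1, 1].set_title("Pie chart")
fig.tight_layout()

buffer = io.BytesIO()
fig.savefig(buffer, format="png")     # render to memory instead of a window
plt.close(fig)
```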
  • 26. Thank You For feedback contact DecisionStats.com
  • 27. Coming up: Data Mining in Python and R (see draft slides afterwards)
  • 28. Machine Learning: SVM on Iris Dataset. R (using the svm* function): library(e1071); data(iris); trainset <- iris[1:149,]; testset <- iris[150,]; svm.model <- svm(Species ~ ., data = trainset, cost = 100, gamma = 1, type = 'C-classification'); svm.pred <- predict(svm.model, testset[-5]); svm.pred. Output: Virginica. Python (using the sklearn** library): from sklearn import svm (loading the library); from sklearn import datasets (importing the datasets module); clf = svm.SVC() (calling SVM); iris = datasets.load_iris() (loading the data); X, y = iris.data[:-1], iris.target[:-1] (constructing the training data); clf.fit(X, y) (fitting SVM); print(clf.predict(iris.data[-1:])) (testing the model on the held-out row, passed as a 2-D slice). Output: 2, which corresponds to Virginica. *To know more about the svm function in R visit: http://cran.r-project.org/web/packages/e1071/ **To install the sklearn library visit: http://scikit-learn.org/; to know more about sklearn svm visit: http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
  • 29. Linear Regression: Iris Dataset. R (using the lm* function): data(iris); total_size <- dim(iris)[1]; num_target <- c(rep(0,total_size)); for (i in 1:length(num_target)){ if(iris$Species[i]=='setosa'){num_target[i] <- 0} else if(iris$Species[i]=='versicolor'){num_target[i] <- 1} else{num_target[i] <- 2} }; iris$Species <- num_target; train_set <- iris[1:149,]; test_set <- iris[150,]; fit <- lm(Species ~ 0+Sepal.Length+Sepal.Width+Petal.Length+Petal.Width, data=train_set); coefficients(fit); predict.lm(fit,test_set). Output: 1.64. Python (using the sklearn** library): from sklearn import linear_model; from sklearn import datasets; iris = datasets.load_iris(); regr = linear_model.LinearRegression(); X, y = iris.data[:-1], iris.target[:-1]; regr.fit(X, y); print(regr.coef_); print(regr.predict(iris.data[-1:])). Output: 1.65. *To know more about the lm function in R visit: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/lm.html **To know more about sklearn linear regression visit: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html
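The R formula on the linear regression slide starts with 0+, which drops the intercept, whereas sklearn's LinearRegression fits an intercept by default. The sketch below fits both variants so the two settings can be compared; it is an illustration only and does not claim to reproduce the outputs printed on the slide. The variable names are chosen here for readability.

```python
from sklearn import datasets, linear_model

iris = datasets.load_iris()
X, y = iris.data[:-1], iris.target[:-1]   # first 149 rows for training
last_row = iris.data[-1:]                 # held-out row as a 2-D slice

# Default sklearn behaviour: an intercept term is estimated
with_intercept = linear_model.LinearRegression().fit(X, y)

# Closer counterpart of the R formula Species ~ 0 + ...
no_intercept = linear_model.LinearRegression(fit_intercept=False).fit(X, y)

pred_with = float(with_intercept.predict(last_row)[0])
pred_without = float(no_intercept.predict(last_row)[0])

print(round(pred_with, 2), round(pred_without, 2))
```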
  • 30. Random Forest: Iris Dataset. R (using the randomForest* package): library(randomForest); data(iris); total_size <- dim(iris)[1]; num_target <- c(rep(0,total_size)); for (i in 1:length(num_target)){ if(iris$Species[i]=='setosa'){num_target[i] <- 0} else if(iris$Species[i]=='versicolor'){num_target[i] <- 1} else{num_target[i] <- 2} }; iris$Species <- num_target; train_set <- iris[1:149,]; test_set <- iris[150,]; iris.rf <- randomForest(Species ~ ., data=train_set, ntree=100, importance=TRUE, proximity=TRUE); print(iris.rf); predict(iris.rf, test_set[-5], predict.all=TRUE). Output: 1.845. Python (using the sklearn** library): from sklearn import ensemble; from sklearn import datasets; clf = ensemble.RandomForestClassifier(n_estimators=100, max_depth=10); iris = datasets.load_iris(); X, y = iris.data[:-1], iris.target[:-1]; clf.fit(X, y); print(clf.predict(iris.data[-1:])). Output: 2. *To know more about the randomForest package in R visit: http://cran.r-project.org/web/packages/randomForest/ **To know more about sklearn random forest visit: http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
  • 31. Decision Tree: Iris Dataset. R (using the rpart* package): library(rpart); data(iris); sub <- c(1:149); fit <- rpart(Species ~ ., data = iris, subset = sub); fit; predict(fit, iris[-sub,], type = "class"). Output: Virginica. Python (using the sklearn** library): from sklearn.datasets import load_iris; from sklearn.tree import DecisionTreeClassifier; clf = DecisionTreeClassifier(random_state=0); iris = load_iris(); X, y = iris.data[:-1], iris.target[:-1]; clf.fit(X, y); print(clf.predict(iris.data[-1:])). Output: 2, which corresponds to virginica. *To know more about the rpart package in R visit: http://cran.r-project.org/web/packages/rpart/ **To know more about sklearn decision tree visit: http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
  • 32. Gaussian Naive Bayes: Iris Dataset. R (using the e1071* package): library(e1071); data(iris); trainset <- iris[1:149,]; testset <- iris[150,]; classifier <- naiveBayes(trainset[,1:4], trainset[,5]); predict(classifier, testset[,-5]). Output: Virginica. Python (using the sklearn** library): from sklearn.datasets import load_iris; from sklearn.naive_bayes import GaussianNB; clf = GaussianNB(); iris = load_iris(); X, y = iris.data[:-1], iris.target[:-1]; clf.fit(X, y); print(clf.predict(iris.data[-1:])). Output: 2, which corresponds to virginica. *To know more about the e1071 package in R visit: http://cran.r-project.org/web/packages/e1071/ **To know more about sklearn Naive Bayes visit: http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html
  • 33. K Nearest Neighbours: Iris Dataset. R (using the kknn* package): library(kknn); data(iris); trainset <- iris[1:149,]; testset <- iris[150,]; iris.kknn <- kknn(Species~., trainset, testset, distance = 1, kernel = "triangular"); summary(iris.kknn); fit <- fitted(iris.kknn); fit. Output: Virginica. Python (using the sklearn** library): from sklearn.datasets import load_iris; from sklearn.neighbors import KNeighborsClassifier; knn = KNeighborsClassifier(); iris = load_iris(); X, y = iris.data[:-1], iris.target[:-1]; knn.fit(X, y); print(knn.predict(iris.data[-1:])). Output: 2, which corresponds to virginica. *To know more about the kknn package in R visit: **To know more about sklearn k nearest neighbours visit: http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html
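All of the classification slides follow the same pattern: train on the first 149 iris rows and predict the class of the last one. A Python 3 sketch of that shared pattern is given below. It assumes a recent scikit-learn, where predict expects a 2-D array (hence iris.data[-1:] rather than iris.data[-1]); the random_state values and the dictionary of models are added here for illustration, and the predicted labels are printed rather than asserted because they can depend on library version and defaults.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X, y = iris.data[:-1], iris.target[:-1]   # first 149 rows for training
last_row = iris.data[-1:]                 # held-out row, shape (1, 4)

# One entry per classifier shown on the slides
models = {
    "SVM": SVC(),
    "Random forest": RandomForestClassifier(n_estimators=100, max_depth=10, random_state=0),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Gaussian naive Bayes": GaussianNB(),
    "K nearest neighbours": KNeighborsClassifier(),
}

predictions = {}
for name, model in models.items():
    label = int(model.fit(X, y).predict(last_row)[0])   # numeric class: 0, 1 or 2
    predictions[name] = str(iris.target_names[label])   # map to the species name
    print(name, label, predictions[name])
```

Looking the numeric label up in iris.target_names gives the species name directly, which is the mapping the slides describe as "2, corresponds to virginica".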