Python
Sky
20160708 Python as a Data Processing Platform, Sapporo
•
• Python 2000
(**)
• db tech showcase MongoDB
•
• FB: Ryuji Tamagawa
• Twitter : tamagawa_ryuji
2015
2016
• Python
• Python
• Python
•
• Python
• NumPy, SciPy, matplotlib, Pandas
• Python
• scikit-learn
• TensorFlow
• Python tools: IPython, Jupyter Notebook, Spyder, Visual Studio
• Python
• Python
• Pandas
• Spark - PySpark DataFrame API
• matplotlib
Part 1 : Python
Python
•
• Google
Guido Google
Google 1
•
NumPy, SciPy, matplotlib → Pandas
•
•
-2000
Linux
-2010 Web Trac
Google
Python
•
•
•
•
→
•
Python
•
• pyODBC
• Web WSGI
Python
• 2.x 3.x 32bit 64bit
64bit
• 2.x
• 3.x
3
• 2.x
3.x
• Ruby?
• R?
• Java?
• Scala?
Python
• Python implementations: the standard 'CPython', PyPy (JIT), Jython (JVM), IronPython (.NET)
• CPython
• CPython 2
• C
• processing
PySpark
Python
• Python
• 1 Linux Mac OS Python
Python Mac
• Python pip 3.x Python 2.7.9 2.x
Python pip Linux Python
pip yum apt
• Python Anaconda Python
conda
• python 2016

http://qiita.com/y__sama/items/5b62d31cb7e6ed50f02c
NumPy, SciPy, matplotlib, Pandas
•
• NumPy SciPy
• Pandas
Pandas Pandas NumPy
• Anaconda Python
Python
•
scikit-learn http://scikit-learn.org/stable/
Python
• TensorFlow 

Python
Python


IPython
Jupyter, …
IDE
Spyder, Rodeo
Visual Studio, PyCharm, PyDev
•
• GUI IDLE
•
OK
• IPython
•
•
• Anaconda
• pip


• Jupyter Notebook
• Python
• IPython Notebook
Python
• Apache Zeppelin http://zeppelin.apache.org
IDE
• R RStudio
• IDE
•
• 2 Spyder Rodeo
•
Spyder
•
• Visual Studio
• Eclipse PyDev
• PyCharm
•
Part 2 :
Python
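1, 1.2, 1000000L                   # numbers; the L (long) suffix is Python 2 only
'abc', u' '                        # strings; the u prefix marks a unicode string (Python 2)
[1, 2, 3, 'foo', 'bar', 'foo']     # list
(1, 2, 3, 'foo', 'bar', 'foo')     # tuple
{'k1': 'value1', 'k2': 'value2'}   # dict
set([1, 2, 3, 'foo', 'bar'])       # set; set() takes a single iterable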
•
•
• split
s = 'foo, bar, baz'
items = s.split(',')
print(items[0])       # foo
print(items[-1])      # ' baz' (the leading space is kept)
print(items[0][-2:])  # oo
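• list comprehension
• dictionary comprehension
• lambda with map, reduce, filter
sList = ['foo', 'bar', 'baz']
# list comprehension
lList = [len(s) for s in sList]
# the same with map and a lambda (list() is needed on Python 3, where map is lazy)
lList = list(map(lambda s: len(s), sList))
# dictionary comprehension
lDict = {s: len(s) for s in sList}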
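The slide names reduce and filter alongside map but only shows map. A minimal sketch of all three follows; the word list and variable names are made up for illustration and are not from the slides.

```python
from functools import reduce  # a builtin on Python 2, lives in functools on Python 3

words = ['foo', 'bar', 'baz', 'spam']  # sample data, not from the slides

# map: apply a function to every element
lengths = list(map(lambda s: len(s), words))

# filter: keep only the elements for which the function returns True
b_words = list(filter(lambda s: s.startswith('b'), words))

# reduce: fold the list into a single value, here the total length
total = reduce(lambda acc, n: acc + n, lengths, 0)

print(lengths)
print(b_words)
print(total)
```

In everyday code a comprehension or the builtin sum() is usually easier to read than map, filter, or reduce with a lambda.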
Pandas
• Pandas
•
matplotlib / seaborn
• NumPy
SciPy
Python
• Pandas + matplotlib
OK Pandas NumPy
NumPy / SciPy
Pandas
• Pandas
DataFrame
• R
• RDB
2
• index Series Columns
Columns
Series Series SeriesIndex
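As a rough illustration of how index, columns, and Series relate, the sketch below builds a small DataFrame and pulls one column out of it. The sample values are borrowed from the table that appears later in this deck; the variable names are assumptions.

```python
import pandas as pd

# rows are labelled by the index, columns by their names
df = pd.DataFrame(
    {'value': [43, 42, 40], 'color': ['red', 'pink', 'green']},
    index=['sapporo', 'osaka', 'matsumoto'],
    columns=['value', 'color'],
)

col = df['value']                  # selecting one column returns a Series
print(type(col).__name__)
print(list(df.columns))
print(col.index.equals(df.index))  # the Series shares the DataFrame's index
print(df.loc['osaka', 'color'])    # look up one cell by row and column label
```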
Pandas I/O
• CSV JSON RDB Excel
• column
• RDB
•
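import pandas as pd
# readers are functions of the pandas module and return a DataFrame
df = pd.read_csv(<filename>)
df = pd.read_json(<filename>)
# writers are methods of the DataFrame
df.to_csv(<filename>)
df.to_excel(<filename>)
# copy the DataFrame to the clipboard
df.to_clipboard()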
• http://sinhrks.hatenablog.com/entry/2015/01/28/073327
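A self-contained sketch of the read/write pattern, using in-memory buffers instead of files so it runs anywhere. The CSV text reuses the sample data from the table later in this deck; the buffer approach and the variable names are assumptions for illustration.

```python
import io
import pandas as pd

csv_text = u"id,value,color\nsapporo,43,red\nosaka,42,pink\nmatsumoto,40,green\n"

# read_csv accepts a path or any file-like object and returns a DataFrame
df = pd.read_csv(io.StringIO(csv_text))

# called without a path, to_csv and to_json return the output as a string
csv_out = df.to_csv(index=False)
json_out = df.to_json(orient='records')

# read the JSON back in to check that the data survives the round trip
df_back = pd.read_json(io.StringIO(json_out))
print(df_back)
```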
0 1
import pandas as pd
# divide each value by the column total
df['nValue'] = df['value'] / sum(df['value'])
id         value  color
sapporo    43     red
osaka      42     pink
matsumoto  40     green

id         value  color  nValue
sapporo    43     red    0.344
osaka      42     pink   0.336
matsumoto  40     green  0.32
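The normalization above can be reproduced end to end as follows. The DataFrame is built by hand from the table values; Series.sum() is used in place of the builtin sum(), which gives the same result here.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'id': ['sapporo', 'osaka', 'matsumoto'],
        'value': [43, 42, 40],
        'color': ['red', 'pink', 'green'],
    },
    columns=['id', 'value', 'color'],
)

# divide every value by the column total; the operation is applied element-wise
df['nValue'] = df['value'] / df['value'].sum()
print(df)
```

The new column sums to 1, so each nValue is that row's share of the total.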
Python
Spark - PySpark DataFrame API
•
Python
• Spark PySpark
findSpark
Spark
• Python Spark API
DataFrame API
• Spark Pandas
Spark
PySpark
[Figure: a driver and four Spark nodes]
matplotlib / seaborn
•
• Python NumPy / Pandas
• Jupyter Notebook
Spyder
Questions?
