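I'm Ariga (id:chezou) from the Membership Business Department. Over the past year I have been styling myself as the in-house "Jupyter evangelist" and promoting Jupyter notebook within the company. Partly thanks to a hands-on session I ran the other day, a fair number of machines in the office now have a Python environment set up :)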
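What is Jupyter notebook?
In a word, it is a very handy REPL*1 that runs in the browser. Seeing is believing, so let's take a look.
As you can see, you can write code interactively, much as you would in pry, Ruby's interactive environment. As explained below, Jupyter notebook is very good at recording, sharing, and reproducing work, and it is especially effective when charts and tables are involved.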
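What is good about Jupyter notebook
You can modify and re-run earlier code
The input areas, called cells, hold either Markdown or code. Code cells are executed with Shift+Enter, and you can edit and re-run them as many times as you like, which makes it very easy to tweak a parameter a little at a time and run again. Whenever you want to save, Ctrl+S does it*2, so keeping both the code and its results for later is easier than with an ordinary console.
I have also used it for more than a year at the "Perfect Ruby" reading group of kawasaki.rb, the study group I organize, and I find it very handy because there is no need to record the session history by hand.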
You can draw graphs while writing code, without switching screens
Jupyter notebook can embed the rendered output of the code you write. Bar charts, line charts, box plots, and most other common chart types are available. Images are stored base64-encoded, so they are saved inside the notebook itself.
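To make the base64 point concrete, here is a minimal sketch of how binary image data can travel inside a text-only file. The 8 bytes used are just the standard PNG file signature standing in for a real chart, and the `output` dictionary is modelled on a notebook `display_data` output; treat the exact shape as an illustration rather than a full description of the file format.

```python
import base64
import json

# Stand-in for a rendered chart: the 8-byte signature every PNG file starts with.
png_bytes = b"\x89PNG\r\n\x1a\n"

# Base64 turns arbitrary bytes into plain ASCII text.
encoded = base64.b64encode(png_bytes).decode("ascii")

# Illustrative output entry, modelled on a notebook "display_data" output.
output = {
    "output_type": "display_data",
    "data": {"image/png": encoded},
    "metadata": {},
}

# Because the image is now text, the whole entry serializes as JSON.
serialized = json.dumps(output)

# Decoding the stored string gives back the original bytes.
restored = base64.b64decode(json.loads(serialized)["data"]["image/png"])
print(encoded)
print(restored == png_bytes)
```

Because the image ends up as an ordinary string, the chart needs no separate image file alongside the notebook.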
Notebooks with graphs are easy to share
A saved notebook is easy to share. The notebook itself is stored as JSON, and if you put it in a GitHub repository or a gist it is rendered as is, images such as graphs included. If you use GitHub Enterprise, nbviewer also lets you share a notebook by URL.
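Since a notebook is plain JSON, it can be inspected with nothing but the standard library. The sketch below builds a tiny hand-written notebook (two cells, with a placeholder string where a real base64 image would be) and summarizes it. The helper name `summarize_notebook` and the sample cell contents are made up for illustration, and real notebooks carry more metadata than shown here.

```python
import json

# A tiny hand-written notebook in the version 4 layout (illustrative only).
notebook_json = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Ad hoc analysis"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 1,
            "source": ["df.boxplot(column='balance', by='marital')"],
            "outputs": [
                {
                    "output_type": "display_data",
                    "metadata": {},
                    # Placeholder for a base64-encoded PNG.
                    "data": {"image/png": "iVBORw0KGgo="},
                }
            ],
        },
    ],
})


def summarize_notebook(text):
    """Return (cell_type, number_of_png_outputs) for each cell."""
    notebook = json.loads(text)
    summary = []
    for cell in notebook["cells"]:
        # Markdown cells have no "outputs" key, so default to an empty list.
        images = sum(
            1 for out in cell.get("outputs", [])
            if "image/png" in out.get("data", {})
        )
        summary.append((cell["cell_type"], images))
    return summary


print(summarize_notebook(notebook_json))
```

Code, prose, and rendered charts all live in that one file, which is why a viewer can display a shared notebook without re-running anything.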
Many languages are available as execution environments
Jupyter notebook started out as IPython notebook, a tool for Python; the kernel was separated out around version 3.0, and the project was subsequently renamed Jupyter. As a result, installing the kernel for a given language lets Ruby, Julia, R, Spark, and many others run on Jupyter.*3
Jupyter notebook as a scratchpad for SQL
When you improve a service or release a new feature, you typically run ad hoc SQL against the logs accumulated in Treasure Data, Redshift, or BigQuery before building a dashboard.
At Cookpad we use TD and Redshift, and our ad hoc analysis used to go as follows.
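1. Run a query against TD/Redshift from a console or SQL client
2. Save the result as CSV
3. Paste it into a Google spreadsheet and chart it
4. If the chart is no good, go back to step 1
5. Once a good graph comes out, share it
Going back and forth between steps 2 and 3 over and over is quite tedious.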
With Jupyter notebook, you can:
- Run queries and draw graphs in Jupyter notebook, iterating by trial and error
- Once you get a good result, share the notebook
so the whole query-and-chart loop happens in one step.
This owes a great deal to a library called pandas. pandas handles the tabular data structure DataFrame and graph drawing seamlessly.*4 The data frame originated in the R language, and pandas has made it even more convenient.
With pandas-td and redshift-sqlalchemy, connecting to TD, Redshift, and the like is easy. BigQuery also appears to be supported experimentally.*5
Here is the notebook working with Redshift data that also appeared in the animated GIF earlier. As example data it uses the Bank data from the UCI Machine Learning Repository. It draws, among other things, a box plot of deposits grouped by marital status and scatter plots of age against deposits for each education level.
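The query-then-plot loop described here can be sketched roughly as follows. This is not the notebook from the post: an in-memory SQLite database stands in for Redshift so the example runs anywhere, the five rows are made up, and the column names (`age`, `marital`, `education`, `balance`) are assumed to follow the Bank dataset. The plotting helper `plot_bank` is hypothetical and needs matplotlib, so it is defined but only meant to be called from a notebook cell.

```python
import sqlite3

import pandas as pd

# Stand-in for Redshift: an in-memory SQLite table with made-up rows
# shaped like the Bank data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bank (age INTEGER, marital TEXT, education TEXT, balance INTEGER)"
)
conn.executemany(
    "INSERT INTO bank VALUES (?, ?, ?, ?)",
    [
        (30, "married", "tertiary", 1500),
        (45, "married", "secondary", 300),
        (25, "single", "tertiary", 800),
        (52, "divorced", "primary", -100),
        (38, "single", "secondary", 1200),
    ],
)

# One call runs the query and returns the result as a DataFrame.
df = pd.read_sql_query(
    "SELECT age, marital, education, balance FROM bank", conn
)

# A numeric view of what the box plot shows: median balance per marital status.
median_balance = df.groupby("marital")["balance"].median()
print(median_balance)


def plot_bank(frame):
    """Draw the two kinds of chart mentioned above (requires matplotlib)."""
    # Box plot of balance grouped by marital status.
    frame.boxplot(column="balance", by="marital")
    # One age-vs-balance scatter plot per education level.
    for level, group in frame.groupby("education"):
        group.plot.scatter(x="age", y="balance", title=str(level))
```

In a notebook, editing the SQL string and re-running the cell replaces the export-to-CSV and paste-into-spreadsheet steps of the old workflow.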
Jupyter Tips
Setting up the environment
If you are not used to Python, I recommend setting up with Miniconda because it is easy.*6 I like to build my own environment with pyenv and miniconda. The Treasure Data blog is also a good guide to installation.
If you are used to Python, set things up however you prefer. Asking around the company, many of the people who are strong in Python seem to combine pyenv with pyenv-virtualenvwrapper.
For Redshift and TD, adding the following packages makes things convenient. BigQuery is supported experimentally by pandas itself.
- Redshift
  - redshift_sqlalchemy
  - ipython-sql
- TD
  - pandas-td
Incidentally, when you run a query, pandas-td prints the Web console URL and the job's progress, which is very handy.
Treasure Data has also supported us generously; for example, a feature request I sent was handled in a little over an hour. Thanks to that request, it is now possible to fetch a job's results after the fact.
Handling passwords
Do not embed database passwords and the like directly in a notebook; supply them through environment variables or a similar mechanism. This prevents you from accidentally sharing a password when you share the notebook.
At Cookpad we use envchain to manage them as environment variables.
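As a rough illustration of keeping credentials out of the notebook, the sketch below assembles a connection URL from environment variables. The variable names (`REDSHIFT_USER` and so on), the helper `redshift_url_from_env`, and the `redshift+psycopg2` scheme are assumptions made for this example; check the documentation of the SQLAlchemy dialect you actually install for the exact URL form.

```python
import os
from urllib.parse import quote_plus


def redshift_url_from_env(env=None):
    """Build a SQLAlchemy-style URL from environment variables.

    The variable names are hypothetical; nothing secret is written
    into the notebook itself.
    """
    env = os.environ if env is None else env
    user = quote_plus(env["REDSHIFT_USER"])
    # Percent-encode the password so characters such as "@" or "/" are safe.
    password = quote_plus(env["REDSHIFT_PASSWORD"])
    host = env["REDSHIFT_HOST"]
    port = env.get("REDSHIFT_PORT", "5439")
    database = env["REDSHIFT_DB"]
    return "redshift+psycopg2://{}:{}@{}:{}/{}".format(
        user, password, host, port, database
    )


# Demonstration with a fake environment instead of real credentials.
fake_env = {
    "REDSHIFT_USER": "analyst",
    "REDSHIFT_PASSWORD": "p@ss/word",
    "REDSHIFT_HOST": "example.invalid",
    "REDSHIFT_DB": "logs",
}
url = redshift_url_from_env(fake_env)
print(url)
```

If the notebook server is launched through envchain, the variables are injected into the process environment at startup, so a shared notebook contains only the variable names and never the values.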
Managing the notebook directory with git
My personal recommendation is to create a directory called ~/notebooks, start Jupyter notebook there, and manage it with git.
That way you can accumulate notes for yourself and push them to the repository periodically, and when you find a result you can share it right away.
Closing
Today is day 1 of RubyKaigi, and I wrote this introduction because it would be a waste for Jupyter notebook to be used only among Pythonistas. pandas in particular does not have a very Python-like notation, so I hope Rubyists will give it a try too.
In Ruby, nyaplot, daru, and similar libraries let you draw graphs in a notebook, but compared with pandas there still seems to be a lot of room for improvement. I look forward to graph drawing and matrix computation maturing, and to scientific computing flourishing in Ruby as well.
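*1: An interactive execution environment
*2: There is also an autosave feature, so even a sudden kernel panic is nothing to worry about
*3: Incidentally, the "Ju" in Jupyter is the Ju of Julia
*4: In the words of @lchin, if Ruby's killer app is Rails, Python's killer app is pandas.
*5: Unfortunately, the author has never used BigQuery together with pandas
*6: Installing numpy, scipy, and the like can be quite painful when it goes wrong, but miniconda takes care of all of it, which makes it good for beginners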