- Workflow Engines Meetup #1 - connpass
- 2017/03/09 Workflow Engines Meetup #1 #wfemeetup - Togetter summary
For anyone working on big data platforms, the "workflow engine" is a concern of major importance. Tools and products in this area are plentiful, from OSS to commercial offerings, and it seems that everyone is, to a greater or lesser degree, struggling with their own choice of tool while building up know-how. When a study meetup on exactly this topic was announced, I signed up right away and attended on the day.
Intro
The meetup opened with @Mahito explaining the scope and purpose of the event and sharing a working definition of "workflow engine".
Session contents
The presentations, one per workflow engine, were as follows.
Digdag: Features of Digdag and Quick Start
- Speaker: Sadayuki Furuhashi (@frsyuki)
Furuhashi presented remotely from Mountain View, California. It was apparently the middle of the night in the US at the time, so thank you for presenting at such a late hour.
My notes from the talk follow.
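What is workload automation?
- Automating all kinds of manual work
  - Any kind of data analysis: data loading, ETL, JOINs, aggregation, report generation, notification
  - Sending email: fetching the address list, narrowing down the recipients, generating the body, sending, and the completion notice
  - Data integration between systems
  - Managing servers, databases, and network equipment; automating provisioning
  - Automating tests and deployments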
æ±ããããæ©è½ã¯å¤ç¨®å¤æ§ã
ã¯ã¼ã¯ããã¼èªååå¨ãã®è£½åã«ã¤ãã¦ï¼å¤æ©æ°ã®èª¿ã¹ãéãã ã¨OSSï¼æå製åå ±ã«ä»¥ä¸ã®æ§ãªã©ã¤ã³ãããã§å¤æ°åå¨ã
- OSS
- æå製å
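Classification by how workflows are defined
- Programming-language type: Luigi, Airflow
  - Pros: you can write anything; complex logic is no obstacle; easy to version with git
  - Cons: both reading and writing require understanding the code; hard to get a bird's-eye view of the whole workflow definition
- GUI type: Rundeck, Jenkins
  - Pros: simple workflows are easy to assemble; anyone can develop and manage them
  - Cons: complex logic such as loops is painful to write; version control is hard; reproducibility is low (deploying the same workflow to another environment is difficult)
- Definition file + script type: Makefile, Azkaban
  - Pros: can be versioned with git; reasonably easy to read and write
  - Cons: both reading and writing require understanding the scripts; complex logic becomes cumbersome to describe; many constraints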
Enter Digdag: definition file + operators + grouping
- Pros:
  - Easy to read and write
  - Tasks can be grouped, so the behavior can be seen at a glance
  - No programming is needed for common processing
  - Scripts can be written for special processing
  - Comes with an admin UI for checking execution status
  - Can be versioned with git
- Cons:
A live demo of Digdag followed, with commentary on the source code.
He also picked out several major use cases and walked through example definitions showing how each one would be written in Digdag. For the list of available operators, see the Digdag documentation.
Summary / reference information
Jenkins: Jenkins 2.0 Pipeline & Blue Ocean
- Speaker: Akihiko Horiuchi (@hico_horiuchi)
This session focused on Pipeline and Blue Ocean, two plugins supported from Jenkins 2.0 onward. The slides have already been published, so I will omit my notes here.
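Luigi: A story about using Luigi
- Speaker: takacy (@bwtakacy)
@bwtakacy works at Recruit Marketing Partners, where Luigi is used on the data analysis platform for Study Sapuri (スタディサプリ). An architecture diagram was shown.
Daily processing (a sense of scale)
- More than 50 tables are transferred with Embulk
- More than 30 Hive queries are run with Luigi
- More than 10 Presto queries are run with TD Workflow (Digdag)
- More than 10 scheduled queries also exist on TD
- More than 20 reports are delivered across multiple departments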
Luigi overview
- Controls jobs that combine multiple batch processes
- Specializes in resolving dependencies between processes and scheduling them
- Ensures atomicity of processing
- Everything is written in Python
- Platform-independent, so many kinds of processing can be described in one place
  - Hadoop, Hive, Pig, Spark
  - MySQL, PostgreSQL, SQLAlchemy
  - Treasure Data, BigQuery, Redshift
  - SSH, FTP, and so on
- Developed at Spotify and then open-sourced
- The name comes from "the world's second most famous plumber"
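What is out of scope for Luigi
- Not suited to real-time processing or long-running continuous processing
- Distributed execution of processing is not supported
- It cannot start jobs on a schedule or by trigger: on the Study Sapuri data platform, Jenkins launches Luigi on a schedule
- Scalability is not a goal: a few thousand batches are fine, but tens of thousands are out of reach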
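Luigi terminology
- Task: the unit that does the actual work; a building block of the workflow. A Task can declare other Tasks that must have run beforehand
- Target: the information that shows a Task finished successfully, such as a file on local disk, HDFS, an RDB, or S3
- Parameter: a variable that can be passed to a Task as an argument, for example the date in a daily batch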
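The three terms above can be sketched in plain Python. This is a minimal illustration that uses only the standard library and is not Luigi's actual API: the `Task` base class, the `build` helper, and the `Extract` and `Report` tasks are hypothetical names made up for this sketch. It shows a task being skipped once its target exists, and upstream tasks running before the task that requires them.

```python
from __future__ import annotations

import tempfile
from datetime import date
from pathlib import Path


class Task:
    """Hypothetical stand-in for a workflow task (not Luigi's real class)."""

    def __init__(self, out_dir: Path, day: date):
        self.out_dir = out_dir
        self.day = day  # the "Parameter": here, the date of a daily batch

    def requires(self) -> list[Task]:
        """Tasks that must have completed before this one runs."""
        return []

    def target(self) -> Path:
        """The "Target": a file whose existence marks this task as done."""
        name = f"{type(self).__name__}-{self.day.isoformat()}.txt"
        return self.out_dir / name

    def complete(self) -> bool:
        return self.target().exists()

    def run(self) -> None:
        raise NotImplementedError


class Extract(Task):
    def run(self) -> None:
        self.target().write_text("raw rows\n")


class Report(Task):
    def requires(self) -> list[Task]:
        return [Extract(self.out_dir, self.day)]

    def run(self) -> None:
        raw = self.requires()[0].target().read_text()
        self.target().write_text(f"report built from: {raw}")


def build(task: Task, log: list[str]) -> None:
    """Run upstream tasks first; skip any task whose target already exists."""
    if task.complete():
        return
    for dep in task.requires():
        build(dep, log)
    task.run()
    log.append(type(task).__name__)


out_dir = Path(tempfile.mkdtemp())
day = date(2017, 3, 9)

first_run: list[str] = []
build(Report(out_dir, day), first_run)

second_run: list[str] = []
build(Report(out_dir, day), second_run)  # nothing to do: targets exist

print(first_run, second_run)
```

Because completion is decided by the target rather than by in-memory state, rerunning the same date is a no-op, while a different date would produce new targets and run again.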
A simple example:
- Usage example at Study Sapuri:
Installation and execution
- Install with:
pip install luigi
- Run a workflow:
luigi --module foo examples.Foo --local-scheduler
  - The Python script being launched must be on sys.path
  - --local-scheduler: starts a scheduler for each command invocation; not needed if a scheduler process is already running independently
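Trouble spots with Luigi
- CASE 1: wanting to know a task's execution time
  - The console log is reasonably verbose, but it does not show the execution time of each task...
  - Solution: use the PROCESSING_TIME event to output it.
- CASE 2: parallel execution and the command's return value
  - By default the luigi command returns 0 even when a task fails, so error handling based on the return value is not possible.
  - Solution: configure retcode in the configuration file and have it loaded when the luigi command starts. A return value can be set for each kind of failure.
- luigi.cfg
  - When the luigi command runs, configuration is read from the following locations; entries further down take higher priority
    - /etc/luigi/client.cfg
    - luigi.cfg (in the current directory)
    - The location given by the environment variable LUIGI_CONFIG_PATH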
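The "later location wins" rule and the retcode setting can be illustrated with the standard library's `configparser`. This is only a sketch of the layering idea, not Luigi's own loader: the three temporary files stand in for `/etc/luigi/client.cfg`, `./luigi.cfg`, and the file named by `LUIGI_CONFIG_PATH`, and the numeric values are arbitrary. The key names `task_failed` and `scheduling_error` are how I recall Luigi's `[retcode]` section, so check them against the Luigi documentation before relying on them.

```python
import configparser
import tempfile
from pathlib import Path

tmp = Path(tempfile.mkdtemp())

# Stand-ins for the three locations, listed from lowest to highest priority.
system_cfg = tmp / "client.cfg"    # plays the role of /etc/luigi/client.cfg
local_cfg = tmp / "luigi.cfg"      # plays the role of ./luigi.cfg
env_cfg = tmp / "from_env.cfg"     # plays the role of $LUIGI_CONFIG_PATH

system_cfg.write_text("[retcode]\ntask_failed = 1\nscheduling_error = 1\n")
local_cfg.write_text("[retcode]\ntask_failed = 30\n")
env_cfg.write_text("[retcode]\nscheduling_error = 35\n")

parser = configparser.ConfigParser()
# Files are read in order; a key set in a later file replaces the earlier value.
parser.read([str(system_cfg), str(local_cfg), str(env_cfg)])

retcodes = {name: parser.getint("retcode", name) for name in parser["retcode"]}
print(retcodes)
```

With non-zero codes configured per failure type, a wrapper such as a Jenkins job can branch on the exit status of the luigi command instead of always seeing 0.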
Luigi gotchas
- 1. It is possible to end up with a workflow that bypasses the retcode setting
- 2. Return values go wrong under parallel execution: this one was resolved by filing a bug report
Choosing between Digdag and Luigi
- Luigi is heavyweight for describing simple workflows.
  - The following are being migrated to Treasure Workflow
    - Workflows whose query dependencies are simple
    - Workflows whose tasks are TD queries only
- Luigi is used when a workflow integrates with other systems, such as Embulk or FTP
- Luigi is used where the dependencies between tasks become complex
- The GUI is better in Digdag/Treasure Workflow. Luigi's GUI is honestly not very good...
Azkaban: Azkaban in my use case
- Speaker: wyukawa (@wyukawa)
Azkaban is a tool built at LinkedIn to resolve dependencies between Hadoop jobs. According to the speaker it is not modern, but it is a Java product that is kind to old-timers. The slides for this session have also been published, so I will omit my notes.
Airflow: Introduction to Apache Airflow (incubating)
- Speaker: Kengo Seki (@sekikn39)
Background to his interest in Apache Airflow
- He uses Apache Oozie as the workflow engine on several projects
- Oozie has a solid track record, is stable, and integrates well with Hadoop
- On the other hand it has many constraints, so he started looking for a product that could replace Oozie
- He showed a list of the constraints and the things he dislikes. That slide is said to be packed with @sekikn39's grudges (lol)
- He then showed how Apache Airflow fares on each of those points, which is what got him interested in Apache Airflow
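Apache Airflow overview and features
- Developed at Airbnb; donated to the Apache Software Foundation in March 2016
- As of March 2017, 81 companies have officially declared that they use it
- The current stable release is the 1.7 series, with 1.8.0 in development. 1.8.0 will presumably be released once all of the issues tracked for it are resolved
- Install with:
pip install airflow
The terminology was explained while walking through the UI screens.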
Examples of existing Operators
- Reference: Concepts - Airflow Documentation
- Command execution:
  - BashOperator, DockerOperator, SimpleHttpOperator, PythonOperator
- SQL execution:
  - HiveOperator, JdbcOperator, MsSqlOperator, MySqlOperator, OracleOperator
  - PigOperator, PostgresOperator, SqliteOperator
- Data transfer:
  - HiveToDruidTransfer, HiveToMySqlTransfer, MsSqlToHiveTransfer
  - MySqlToHiveTransfer, PrestoToMySqlTransfer, RedshiftToS3Transfer
  - S3FileTransformOperator, S3ToHiveTransfer
- Notification:
  - EmailOperator, SimpleHttpOperator, SlackAPIPostOperator
- You can also define your own Operators.
Other concepts and features
- Connection: manages connection information for various data stores
- Hook: provides methods that use a Connection to access a data store and to load or dump data
- Pools: manage the degree of task parallelism
- Queue: an external queueing system such as Celery can be used as the job queue
- Branching: implements conditional branches within a DAG
- SLA: notifies the administrator by email about tasks that did not succeed within a set time
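To make the DAG and Branching ideas concrete, here is a small sketch built on the standard library's `graphlib` (Python 3.9 or later). It does not use Airflow and does not reflect Airflow's API: the task names, the `choose_report` function, and the `plan` helper are all hypothetical. It orders tasks so that every upstream task comes first, then drops the branch that was not chosen.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks. Each key maps to the upstream tasks it depends on.
dag = {
    "format_sales": set(),
    "aggregate_sales": {"format_sales"},
    "choose_report": {"aggregate_sales"},
    "weekday_report": {"choose_report"},
    "weekend_report": {"choose_report"},
}

BRANCHES = {"weekday_report", "weekend_report"}


def choose_report(is_weekend: bool) -> str:
    """Branching step: return the one downstream task that should run."""
    return "weekend_report" if is_weekend else "weekday_report"


def plan(is_weekend: bool) -> list[str]:
    """Return the tasks to run, upstream first, without the skipped branch."""
    skipped = BRANCHES - {choose_report(is_weekend)}
    order = TopologicalSorter(dag).static_order()
    return [task for task in order if task not in skipped]


print(plan(is_weekend=False))
print(plan(is_weekend=True))
```

A real scheduler would also track task state, retries, and parallelism limits (the role Pools play above); the sketch covers only ordering and branch selection.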
Live demo: a demo built around formatting, aggregating, and analyzing the sales data of a fictional pizza shop.
Requests for Apache Airflow
- Features Airflow lacks
  - Support for HA configurations
  - Better operability; in particular, being able to register, update, and delete DAGs from the web UI
  - Support for time zones other than UTC
  - Stopping and resuming processing at any point in a workflow from the web UI (most Japanese-made job schedulers have this)
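Airflow Meetup Tokyo?
- When he met the Airflow PMC chair, the chair suggested holding a meetup in Tokyo as well
  - On the US West Coast they are apparently held once every two to three months
  - The chair also offered to introduce speakers, provided that teleconferencing and English are acceptable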
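Closing
The meetup ended with a rough outline of plans for the next event. The venue and date are still undecided, but Apache Oozie, StackStorm, Triglav, Rukawa, and others have apparently already been put forward as candidate topics.
Summary
There were five presentations on workflow engines at roughly 20 minutes each, and I thought every one of them was substantial enough to give a good grasp of what the tool does. I also got to hear about several products I am personally interested in, so I came away with plenty to refer back to. I am very much looking forward to the second meetup!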