Reading the Source Code of Embulk, an OSS Parallel Data Transfer Tool (Part 1)
Hello.
Now that KinesisSpout has more or less wrapped up, on to the next topic.
The other day the "Data Transfer Middleware Study Group" was held, and the bulk data loading tool Embulk was released there.
There has never really been a go-to OSS tool for bulk data loading. To load bulk data into HDFS, for example, we would use the hadoop command and the like.
Being able to do that with a dedicated tool is very welcome.
A few people have already written up their experiences using it, so I'm going to look at how it is actually built.
...Besides, until plugins can be written in Java, running it and reading the source seems to be the only option anyway.
1. Embulk's module structure
Looking at Embulk's GitHub repository, it consists of the following three modules.
- embulk-cli
- embulk-core
- embulk-standards
Looking at what each module contains, they seem to break down roughly as follows.
embulk-cli
A module that holds only the Main class, which is what gets called when Embulk is launched with the java command.
All it does is prepend "classpath:embulk/command/embulk.rb" to the startup arguments and invoke JRuby.
...I started reading the source expecting Java, and right away it dives into Ruby!!
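To make the embulk-cli hand-off described above concrete, here is a minimal Java sketch of what it amounts to: the boot script path is placed in front of the user's own arguments. The class and method names are hypothetical, and the sketch deliberately stops short of calling JRuby (that would need the JRuby jar on the classpath); it only builds and prints the argument array that would be handed over.

```java
import java.util.Arrays;

final class EmbulkCliSketch {
    // Script path that embulk-cli's Main is described as prepending.
    static final String BOOT_SCRIPT = "classpath:embulk/command/embulk.rb";

    // Builds the argument array for JRuby: the boot script first,
    // followed by the user's original arguments unchanged.
    static String[] buildJRubyArgs(String[] userArgs) {
        String[] jrubyArgs = new String[userArgs.length + 1];
        jrubyArgs[0] = BOOT_SCRIPT;
        System.arraycopy(userArgs, 0, jrubyArgs, 1, userArgs.length);
        return jrubyArgs;
    }

    public static void main(String[] args) {
        String[] jrubyArgs = buildJRubyArgs(args);
        // Assumption: the real Main would now pass jrubyArgs to JRuby's
        // entry point; this sketch only prints them.
        System.out.println(Arrays.toString(jrubyArgs));
    }
}
```

With Java 11+ single-file launch, `java EmbulkCliSketch.java run config.yml` prints `[classpath:embulk/command/embulk.rb, run, config.yml]`.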
embulk-core
Embulk's core module; it holds the functionality for loading modules and for executing them.
embulk-standards
A module that provides basic functionality and load processing, such as the following:
- CSV file formatter, parser, and tokenizer
- GZip file encoding/decoding
- Local file output/input
- NullOutput
- Output to S3
- Standard output
2. Startup flow
Since it turns out the part that controls the overall flow is written in JRuby, the approach I used for KinesisSpout, following the Java code, probably won't work here...
So let's trace the startup processing on the JRuby side.
(My Ruby knowledge only goes as far as knowing the syntax, so if I've gotten something wrong, I'd appreciate a gentle correction.)
Note that the Ruby code is located under the lib directory.
At startup, "embulk/command/embulk.rb" is invoked via JRuby, so that is where we start.
embulk.rb
- Obtains the Gem install path from the environment variable "EMBULK_BUNDLE_PATH" or from the startup arguments.
- Calls Embulk#run (embulk_run.rb), passing the Gem install path.
  - If no path was specified, it is set in embulk_run.
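The path resolution in embulk.rb can be sketched in Java as below. This is an illustration under assumptions, not a port of the Ruby code: the post does not say which startup argument carries the path, so the `--bundle` flag here is hypothetical, and so is the rule that an explicit argument wins over `EMBULK_BUNDLE_PATH`. Returning `null` stands for "not specified", the case that embulk_run then handles.

```java
import java.util.Map;

final class BundlePathSketch {
    // Hypothetical flag name; the post only says the path may come from
    // the startup arguments, not which option carries it.
    static final String BUNDLE_FLAG = "--bundle";

    // Returns the Gem install path, preferring an explicit argument over
    // the EMBULK_BUNDLE_PATH environment variable (assumed precedence).
    // null means "not specified"; embulk_run is then expected to set it.
    static String resolveBundlePath(String[] args, Map<String, String> env) {
        for (int i = 0; i + 1 < args.length; i++) {
            if (BUNDLE_FLAG.equals(args[i])) {
                return args[i + 1];
            }
        }
        String fromEnv = env.get("EMBULK_BUNDLE_PATH");
        return (fromEnv == null || fromEnv.isEmpty()) ? null : fromEnv;
    }

    public static void main(String[] args) {
        System.out.println(resolveBundlePath(args, System.getenv()));
    }
}
```

Passing the environment in as a `Map` rather than reading `System.getenv()` inside the method keeps the lookup easy to exercise with fake values.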
embulk_run.rb
- Takes the first startup argument that does not start with "-" and treats it as the "subcommand".
- If there is no subcommand, a usage error is raised at this point.
- If the subcommand is one of the following, content is added to the usage message shown later, depending on the specified arguments.
  - bundle/run/preview/guess/example (no usage message addition)
- If the subcommand is none of the above but one of the following, the corresponding process is launched.
  - gem (launches Ruby's gem command), exec (launches a process directly with the given arguments)
- Checks the format of the remaining options; if the check fails, a usage error is raised.
- Branches depending on the subcommand, as follows:
  - bundle
    - Copies the bundle directory, installs bundler, then starts Bundler::CLI (details some other time)
  - example
    - Writes an example to the specified path
  - Anything else
    - Converts the arguments to JSON and passes them to the constructor of org.embulk.command.Runner
    - Runs the main method with the subcommand and the config file path as arguments
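The subcommand detection and branching in embulk_run.rb can be summarized with the following Java sketch. It returns a short label for each branch instead of doing the real work, skips the option format checks, and assumes (the post does not say) that a subcommand outside both lists is a usage error. `toJson` is a hypothetical stand-in for the step that converts the options to JSON for the `org.embulk.command.Runner` constructor; it escapes only quotes and backslashes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class SubcommandSketch {
    // Subcommands the post lists as extending the usage message.
    static final Set<String> MAIN_COMMANDS =
            Set.of("bundle", "run", "preview", "guess", "example");
    // Subcommands that launch another process right away.
    static final Set<String> LAUNCH_COMMANDS = Set.of("gem", "exec");

    // The subcommand is the first argument that does not start with "-".
    static String findSubcommand(List<String> args) {
        for (String arg : args) {
            if (!arg.startsWith("-")) {
                return arg;
            }
        }
        return null;
    }

    // Returns a label naming the branch taken; the real script does the work.
    static String dispatch(List<String> args) {
        String sub = findSubcommand(args);
        if (sub == null) {
            return "usage error";
        }
        if (!MAIN_COMMANDS.contains(sub)) {
            // Assumption: anything outside both lists is a usage error.
            return LAUNCH_COMMANDS.contains(sub) ? "launch " + sub : "usage error";
        }
        // Option format checks would run here; omitted in this sketch.
        switch (sub) {
            case "bundle":
                return "copy bundle dir, install bundler, start Bundler::CLI";
            case "example":
                return "write example to the given path";
            default:
                return "hand off to org.embulk.command.Runner";
        }
    }

    // Hypothetical JSON conversion; keys are sorted for stable output.
    static String toJson(Map<String, String> options) {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, String> e : new TreeMap<>(options).entrySet()) {
            if (sb.length() > 1) {
                sb.append(',');
            }
            sb.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
        }
        return sb.append('}').toString();
    }

    // Escapes only backslashes and double quotes (enough for this sketch).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(dispatch(Arrays.asList(args)));
    }
}
```

Keeping the subcommand lists in two sets mirrors the two-step check in the description: first the "main" subcommands, then the ones that immediately launch another process.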
...So the flow seems to be: the CLI launches JRuby, which resolves Gems and the classpath,
and then control returns to Java code via org.embulk.command.Runner, which starts up the tool proper.
It's a bit of a relief that we get back to Java early on.
Next time, I'll check the structure of the Java code and then trace the flow on the Java side.