Parsing Hatena Blog articles with Embulk and loading them into Elasticsearch
I kept saying that Embulk looked good but had never actually tried it. It came up in conversation around me the other day, so I gave it a go as a breather during work. This is on a Mac.
Summary first
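- Everything is split into fine-grained plugins (input / parser / execute / output), so for common formats you can get most of the way just by combining them
  - Absurdly easy
- You can build in separate steps: first write the input side and dump it to stdout, then write the output (or the other way around), so it is simply easy to put together
  - Sample input data can also be generated automatically, so I think writing the output first would have been easy too (I did not do that this time)
- There is an embulk gem command for handling plugins, which is quietly convenient
- Unrelated, but brew is great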
Installing Embulk
This is an experiment on my local Mac, so it goes like this:
% brew cask install java
% brew install embulk
Fetching the RSS and printing it to STDOUT
Doing everything at once gets confusing, so for now I just fetch the RSS and write it to stdout.
These are the references I used:
- http://qiita.com/takumakanari/items/8f6efe9c115411f25547
- https://github.com/takumakanari/embulk-parser-xml
Installing the plugins
% embulk gem install embulk-input-http
% embulk gem install embulk-parser-xml
Creating config.yml
Use http as the input and xml as the parser.
in:
  type: http
  url: http://yudoufu.hatenablog.jp/rss
  params: ~
  parser:
    type: xml
    root: rss/channel/item
    schema:
      - { name: title, type: string }
      - { name: link, type: string }
      - { name: pubDate, type: string }
  method: get
out:
  type: stdout
- For now, write to standard output
- As for the schema, if description is included as well ...
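To make the parser settings more concrete, here is a rough Python sketch of what root: rss/channel/item plus the three schema columns amount to. It is only an illustration, not how embulk-parser-xml is implemented. The sample feed, SCHEMA, and parse_items are hypothetical names made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; in the real run the http input plugin fetches it.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>example blog</title>
    <item>
      <title>first post</title>
      <link>http://example.com/entry/1</link>
      <description>body text</description>
      <pubDate>Wed, 03 Jun 2015 23:51:31 +0900</pubDate>
    </item>
    <item>
      <title>second post</title>
      <link>http://example.com/entry/2</link>
      <description>body text</description>
      <pubDate>Wed, 03 Jun 2015 23:23:47 +0900</pubDate>
    </item>
  </channel>
</rss>
"""

# Same column names as the schema in config.yml (all treated as strings).
SCHEMA = ["title", "link", "pubDate"]


def parse_items(xml_text, columns=SCHEMA):
    """Return one dict per <item>, keeping only the listed columns."""
    root = ET.fromstring(xml_text)  # the root element is <rss>
    rows = []
    # "channel/item" relative to <rss> plays the role of root: rss/channel/item
    for item in root.findall("channel/item"):
        rows.append({name: item.findtext(name) for name in columns})
    return rows


rows = parse_items(SAMPLE_RSS)
for row in rows:
    # Comma-joined output, similar in spirit to the stdout output plugin
    print(",".join(row[name] for name in SCHEMA))
```

Elements that are not listed in the schema (description here) are simply not picked up, which keeps the stdout output short.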
Running it
% embulk run config.yml
2015-06-09 16:12:06.611 +0900: Embulk v0.6.5
2015-06-09 16:12:08.400 +0900 [INFO] (transaction): {done: 0 / 1, running: 0}
2015-06-09 16:12:08.681 +0900 [INFO] (task-0000): GET "http://yudoufu.hatenablog.jp/rss"
Norikra meetup #2 に参加してきた,http://yudoufu.hatenablog.jp/entry/2015/06/03/235131,Wed, 03 Jun 2015 23:51:31 +0900
AWS Summit 2015 - Day2 行ってきたまとめ,http://yudoufu.hatenablog.jp/entry/2015/06/03/232347,Wed, 03 Jun 2015 23:23:47 +0900
AWS Summit 2015 - Day2 - 新サービス解説セッション EFS と ML,http://yudoufu.hatenablog.jp/entry/2015/06/03/223904,Wed, 03 Jun 2015 22:39:04 +0900
AWS Summit 2015 - Day2 - クラウドを活用したIoT/M2Mソリューション,http://yudoufu.hatenablog.jp/entry/2015/06/03/223701,Wed, 03 Jun 2015 22:37:01 +0900
AWS Summit 2015 - Day2 - AWS セキュアデザイン(IAM) Deep Dive,http://yudoufu.hatenablog.jp/entry/2015/06/03/223637,Wed, 03 Jun 2015 22:36:37 +0900
AWS Summit 2015 - Day2 - AWS System Operation Deep Dive,http://yudoufu.hatenablog.jp/entry/2015/06/03/223605,Wed, 03 Jun 2015 22:36:05 +0900
AWS Summit 2015 - Day2 - ネットワークDeep Dive,http://yudoufu.hatenablog.jp/entry/2015/06/03/223523,Wed, 03 Jun 2015 22:35:23 +0900
2015-06-09 16:12:09.064 +0900 [INFO] (transaction): {done: 1 / 1, running: 0}
2015-06-09 16:12:09.079 +0900 [INFO] (main): Committed.
2015-06-09 16:12:09.080 +0900 [INFO] (main): Next config diff: {"in":{},"out":{}}
- For now, the data I was after came out
Loading into Elasticsearch
I wrote this part with the following article and README as references:
- http://swfz.hatenablog.com/entry/2015/04/25/184339
- https://github.com/muga/embulk-output-elasticsearch
Installing Elasticsearch
- https://www.elastic.co/guide/en/elasticsearch/guide/current/_installing_elasticsearch.html
  - Apparently you just need to unzip it
- This is only an experiment, so I installed it with brew
% brew install elasticsearch
- When installed with brew, cluster_name is set to elasticsearch_username, so change it to something suitable if needed
% vi /usr/local/opt/elasticsearch/config/elasticsearch.yml
cluster.name: elasticsearch_yudoufu
% elasticsearch --config=/usr/local/opt/elasticsearch/config/elasticsearch.yml
- With this, a node comes up at http://127.0.0.1:9200/
  - 9300 is the port for inter-node communication, and the connection seems to go through that one
Installing the Embulk plugin
% embulk gem install embulk-output-elasticsearch
Writing the output section of config.yml
- Modify the out section
  - There is no particular need to create the index in advance, so just make cluster_name and nodes match the Elasticsearch side
out:
  type: elasticsearch
  cluster_name: elasticsearch_yudoufu
  nodes:
    - { host: 127.0.0.1, port: 9300 }
  index: embulk_yudoufulog_rss
  index_type: embulk
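For intuition about what ends up in the index, the sketch below builds the equivalent index actions in the newline-delimited format of Elasticsearch's HTTP _bulk endpoint. This is an assumption for illustration only: the output plugin connects over port 9300, not HTTP, and build_bulk_body and sample_rows are hypothetical names that do not come from the plugin.

```python
import json


def build_bulk_body(rows, index, doc_type):
    """Build an NDJSON body in the format of the HTTP _bulk endpoint."""
    lines = []
    for row in rows:
        # Action line: index this document and let Elasticsearch assign the _id.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Source line: the row itself, one JSON object per line.
        lines.append(json.dumps(row, ensure_ascii=False))
    # The bulk format expects a newline after the last line as well.
    return "\n".join(lines) + "\n"


# Hypothetical rows shaped like the parser output (title, link, pubDate).
sample_rows = [
    {"title": "first post", "link": "http://example.com/entry/1",
     "pubDate": "Wed, 03 Jun 2015 23:51:31 +0900"},
    {"title": "second post", "link": "http://example.com/entry/2",
     "pubDate": "Wed, 03 Jun 2015 23:23:47 +0900"},
]

body = build_bulk_body(sample_rows, "embulk_yudoufulog_rss", "embulk")
print(body, end="")
```

The index and index_type values from the out section map to _index and _type on each action line, and no _id is given, so Elasticsearch generates one per document.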
Running it
% embulk run config.yml
2015-06-09 17:05:01.632 +0900: Embulk v0.6.5
2015-06-09 17:05:03.559 +0900 [INFO] (transaction): [Dominus] loaded [], sites []
2015-06-09 17:05:04.391 +0900 [INFO] (transaction): {done: 0 / 1, running: 0}
2015-06-09 17:05:04.400 +0900 [INFO] (task-0000): [Siena Blaze] loaded [], sites []
2015-06-09 17:05:04.686 +0900 [INFO] (task-0000): GET "http://yudoufu.hatenablog.jp/rss"
2015-06-09 17:05:05.137 +0900 [INFO] (task-0000): Execute 7 bulk actions
2015-06-09 17:05:05.735 +0900 [INFO] (elasticsearch[Siena Blaze][transport_client_worker][T#5]{New I/O worker #22}): 7 bulk actions succeeded
2015-06-09 17:05:05.745 +0900 [INFO] (transaction): {done: 1 / 1, running: 0}
2015-06-09 17:05:05.761 +0900 [INFO] (main): Committed.
2015-06-09 17:05:05.762 +0900 [INFO] (main): Next config diff: {"in":{},"out":{}}
Checking the data
- Confirmed that the data went in!! \(^o^)/
% curl -X GET http://127.0.0.1:9200/embulk_yudoufulog_rss/embulk/_search\?q\=AWS\&pretty
{
  "took" : 10,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 6,
    "max_score" : 0.26516503,
    "hits" : [ {
      "_index" : "embulk_yudoufulog_rss",
      "_type" : "embulk",
      "_id" : "AU3XWfGno1wqzcGPiVrl",
      "_score" : 0.26516503,
      "_source":{"title":"AWS Summit 2015 - Day2 - AWS System Operation Deep Dive","link":"http://yudoufu.hatenablog.jp/entry/2015/06/03/223605","pubDate":"Wed, 03 Jun 2015 22:36:05 +0900"}
    }, {
      "_index" : "embulk_yudoufulog_rss",
      "_type" : "embulk",
      "_id" : "AU3XWfGno1wqzcGPiVrh",
      "_score" : 0.111475274,
      "_source":{"title":"AWS Summit 2015 - Day2 行ってきたまとめ","link":"http://yudoufu.hatenablog.jp/entry/2015/06/03/232347","pubDate":"Wed, 03 Jun 2015 23:23:47 +0900"}
    }, {
      "_index" : "embulk_yudoufulog_rss",
      "_type" : "embulk",
      "_id" : "AU3XWfGno1wqzcGPiVrm",
      "_score" : 0.111475274,
      "_source":{"title":"AWS Summit 2015 - Day2 - ネットワークDeep Dive","link":"http://yudoufu.hatenablog.jp/entry/2015/06/03/223523","pubDate":"Wed, 03 Jun 2015 22:35:23 +0900"}
    }, {
      "_index" : "embulk_yudoufulog_rss",
      "_type" : "embulk",
      "_id" : "AU3XWfGno1wqzcGPiVrk",
      "_score" : 0.081366636,
      "_source":{"title":"AWS Summit 2015 - Day2 - AWS セキュアデザイン(IAM) Deep Dive","link":"http://yudoufu.hatenablog.jp/entry/2015/06/03/223637","pubDate":"Wed, 03 Jun 2015 22:36:37 +0900"}
    }, {
      "_index" : "embulk_yudoufulog_rss",
      "_type" : "embulk",
      "_id" : "AU3XWfGno1wqzcGPiVrj",
      "_score" : 0.057534903,
      "_source":{"title":"AWS Summit 2015 - Day2 - クラウドを活用したIoT/M2Mソリューション","link":"http://yudoufu.hatenablog.jp/entry/2015/06/03/223701","pubDate":"Wed, 03 Jun 2015 22:37:01 +0900"}
    }, {
      "_index" : "embulk_yudoufulog_rss",
      "_type" : "embulk",
      "_id" : "AU3XWfGno1wqzcGPiVri",
      "_score" : 0.057534903,
      "_source":{"title":"AWS Summit 2015 - Day2 - 新サービス解説セッション EFS と ML","link":"http://yudoufu.hatenablog.jp/entry/2015/06/03/223904","pubDate":"Wed, 03 Jun 2015 22:39:04 +0900"}
    } ]
  }
}
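If you would rather check the result from a script than read the raw JSON, something like the following Python sketch could work. It builds the same _search URL and pulls the score and title out of a response. SAMPLE_RESPONSE is a trimmed copy of the response above (only the first two hits kept), and search_url and summarize_hits are hypothetical helper names. The sketch does not contact Elasticsearch itself.

```python
import json
from urllib.parse import urlencode


def search_url(host, index, doc_type, query):
    """Build a URI-search URL like the one passed to curl."""
    params = urlencode({"q": query, "pretty": "true"})
    return "http://%s/%s/%s/_search?%s" % (host, index, doc_type, params)


def summarize_hits(response_text):
    """Return (total, [(score, title), ...]) from a search response body."""
    hits = json.loads(response_text)["hits"]
    return hits["total"], [(h["_score"], h["_source"]["title"]) for h in hits["hits"]]


# Trimmed sample: "total" is kept from the real response, but only two hits are listed.
SAMPLE_RESPONSE = """
{
  "hits": {
    "total": 6,
    "max_score": 0.26516503,
    "hits": [
      {"_score": 0.26516503,
       "_source": {"title": "AWS Summit 2015 - Day2 - AWS System Operation Deep Dive"}},
      {"_score": 0.111475274,
       "_source": {"title": "AWS Summit 2015 - Day2 行ってきたまとめ"}}
    ]
  }
}
"""

url = search_url("127.0.0.1:9200", "embulk_yudoufulog_rss", "embulk", "AWS")
total, titles = summarize_hits(SAMPLE_RESPONSE)
print(url)
print(total)
for score, title in titles:
    print(score, title)
```

Of the 7 documents loaded, 6 match q=AWS, which fits the feed: the Norikra meetup entry is the only title without AWS in it.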
The final config.yml
Putting it all together here as a memo for later.
in:
  type: http
  url: http://yudoufu.hatenablog.jp/rss
  params: ~
  parser:
    type: xml
    root: rss/channel/item
    schema:
      - { name: title, type: string }
      - { name: link, type: string }
      - { name: pubDate, type: string }
  method: get
out:
  type: elasticsearch
  cluster_name: elasticsearch_yudoufu
  nodes:
    - { host: 127.0.0.1, port: 9300 }
  index: embulk_yudoufulog_rss
  index_type: embulk