Treasure Data
A little while ago, when I wrote about logs (http://d.hatena.ne.jp/naoya/20130219/1361262854), I said I would write about Treasure Data in more detail at a later date, so here it is.
I suspect many people have been hearing the name Treasure Data (hereafter sometimes TD) here and there lately: that it is "a cloud service for big data", that it is "a Silicon Valley venture founded by Japanese engineers", that Yahoo! founder Jerry Yang invested in it, or that it has something to do with Fluentd.
But what kind of service Treasure Data actually is, what features it has, and in what situations it gets used may still not be very well known, or so it seems. Today I'd like to introduce a bit of that.
The service Treasure Data provides
To simplify really drastically, TD is "a cloud service where you keep shipping logs from your own servers and it stores them for you; when you throw SQL at it, it runs the query massively in parallel with MapReduce and returns just the results."
I also use TD personally. For example, I keep sending TD the logs of amazlet, a web application I built quite a long time ago: access logs with a little extra information added. Let's ask the TD servers for their status with the td command installed on OS X.
% td tables nginx
+----------+--------+------+---------+--------+---------------------------+--------+
| Database | Table  | Type | Count   | Size   | Last import               | Schema |
+----------+--------+------+---------+--------+---------------------------+--------+
| nginx    | access | log  | 2649812 | 0.1 GB | 2013-03-22 17:01:57 +0900 |        |
+----------+--------+------+---------+--------+---------------------------+--------+
Well, it's not a very large data set, but even so the site has some tens of thousands of users a month.
Now, from the logs sent so far, let's compute which Amazon products were introduced through amazlet over roughly the last month.
% td query -w -d nginx "select v['asin'] as asin, count(1) as cnt from access group by v['asin'] order by cnt desc limit 100"
The td command sends a SQL(-ish) query. Then:
Job 2131709 is queued.
Use 'td job:show 2131709' to show the status.
queued...
  started at 2013-03-22T08:07:49Z
  Hive history file=/mnt/hive/tmp/1624/hive_job_log__1111533064.txt
  Total MapReduce jobs = 2
  Launching Job 1 out of 2
  Number of reduce tasks not specified. Defaulting to jobconf value of: 12
  In order to change the average load for a reducer (in bytes):
    set hive.exec.reducers.bytes.per.reducer=<number>
  In order to limit the maximum number of reducers:
    set hive.exec.reducers.max=<number>
  In order to set a constant number of reducers:
    set mapred.reduce.tasks=<number>
  Starting Job = job_201301150013_218289, Tracking URL = …
  2013-03-22 08:08:18,702 Stage-1 map = 0%, reduce = 0%
  2013-03-22 08:08:26,779 Stage-1 map = 26%, reduce = 0%
  2013-03-22 08:08:29,814 Stage-1 map = 41%, reduce = 0%
  2013-03-22 08:08:32,858 Stage-1 map = 58%, reduce = 0%
  2013-03-22 08:08:35,907 Stage-1 map = 72%, reduce = 0%
  2013-03-22 08:08:38,935 Stage-1 map = 83%, reduce = 0%
Like this, a MapReduce computation starts on the other side of the network and the processing runs. The first time I ran it I felt a bit of a thrill: just sending a command from my own OS X machine runs MapReduce over the internet!
The results come back straight to standard output.
| B00BHAF688 | 307 | ← JoJo (PlayStation 3)
| B00BHO0FK8 | 274 | ← Evangelion Q Blu-ray
| B009GSX0A4 | 147 | ← Senran Kagura (PS Vita)
| B00A64CFIK | 136 | ← Hatsune Miku (PlayStation 3)
| B00APVDHLI | 134 | ← JoJo (PlayStation 3)
| B0095D6I86 | 128 | ← Metal Gear Rising (PlayStation 3)
| B00BIYSEFA | 123 | ← Shin Megami Tensei IV (Nintendo 3DS)
| B00AHA5OCC | 113 | ← SOUL SACRIFICE (PlayStation 3)
| B00BIYSF7C | 112 | ← Summon Night 5
Everything after the arrows is my own annotation. amazlet is often used for introducing game software and the like, and it turned out that over the last month or so the PS3 JoJo game and the Blu-ray of the Evangelion movie were popular. The query here is fairly simple, but even if you joined in various other data and ran a much more complex query, it is MapReduce underneath, so I/O and CPU resources are designed to scale linearly.
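To make the query above a little more concrete, here is a minimal local sketch in Python of what a `group by` with `count(1)`, `order by cnt desc` and `limit` computes, phrased as a map step and a reduce step. The sample records and the names `map_phase`, `reduce_phase` and `top_asins` are made up for illustration; this is not how TD or Hive implement it internally, and unlike the real query the sketch simply skips records without an `asin`.

```python
from collections import Counter
from itertools import chain


def map_phase(records):
    """Emit one (asin, 1) pair for every record that carries an 'asin' field."""
    for rec in records:
        asin = rec.get("asin")
        if asin is not None:
            yield asin, 1


def reduce_phase(pairs):
    """Sum the emitted counts per key, as a reducer would."""
    counts = Counter()
    for key, n in pairs:
        counts[key] += n
    return counts


def top_asins(partitions, limit=100):
    """Map each partition independently, then reduce and keep the top `limit`."""
    mapped = chain.from_iterable(map_phase(p) for p in partitions)
    return reduce_phase(mapped).most_common(limit)


# Hypothetical sample data: two partitions of log records.
partitions = [
    [{"asin": "B00BHAF688"}, {"asin": "B00BHO0FK8"}, {"path": "/"}],
    [{"asin": "B00BHAF688"}, {"asin": "B00BHAF688"}, {"asin": "B00BHO0FK8"}],
]
result = top_asins(partitions, limit=2)
print(result)
```

Because each partition is mapped on its own, the map step is the part that can be spread over many machines, which is where the scaling comes from.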
In this example the data sent is still only a few hundred MB at most, so there's no need to rely on Treasure Data; MySQL or MongoDB would handle it fine. But that isn't the point. Needless to say, the point is that even when the data reaches hundreds of GB or TB the operations stay exactly the same; in other words, it is scalable.
As the words Hadoop and Hive popping up in the job progress output suggest, TD is built using Hadoop as the MapReduce execution platform and Hive, which lets you manipulate data on Hadoop with a SQL-like language (HiveQL). I'll touch on this again later, but it isn't merely Hadoop + Hive hosting: the part that accepts data, the storage that holds it, the scheduler that distributes multi-tenant jobs, and the various APIs that return results were developed by TD themselves, and the whole is integrated to provide the use case described above: "ship logs in as fast as you like and run MapReduce with SQL whenever you want."
That is the service called Treasure Data... or so it goes.
What is it actually used for?
Granted that you can do SQL-style analysis at large scale, the next question is probably: in what situations is it actually used?
Many people have probably heard that log analysis plays an important role in recent web services, social games first among them. That is exactly where a solution like TD gets used. Advertising, too, has lately seen technical innovation and come to process quite large data, and TD is used there as well. According to a presentation by someone on the inside, Cookpad and MobFox are the well-known cases. It isn't in the slides, but I hear adoption is fairly far along among domestic social game developers.
So why has it come to this in the first place? Let's dig into that a little.
Data analysis for web systems used to mean almost exclusively web server access logs: Apache's access_log. As for access logs, these days it may be more common to use something like Google Analytics from the start rather than compute PV and UU from logs. Either way, you are essentially analyzing only what can be obtained from the HTTP request, so the information you get is the same.
Access log analysis tells you how much traffic each URL got, or how many UU there were per day or month. But nothing beyond that. Individual attributes such as the user's sex and age, transaction IDs, the product they were about to buy: data the web server doesn't know about isn't included, so even if you want to look further you can't.
"If you can't get detailed data, make it so you can": have the application logic emit logs that tie all of that together. Naturally everyone thinks of that.
I'm not quite sure what code makes a good example, but:
# Product purchase screen
post '/purchase' => sub {
    my $self = shift;
    my $item = My::Item->find(…);
    my $user = My::User->purchase( $item );

    # Emit an interaction log
    $self->logger->emit(
        user_hash => $user->hash,
        age       => $user->age,
        sex       => $user->sex,
        session   => $user->session_id,
        item      => $item->id,
        …
    );

    $self->render;
};
Like this, for a given event you write out as a log the data that access logs alone can't capture. If you take logs like this all over the place and analyze them afterwards, analyses become possible such as "some percentage of users drop off on the screen before purchase; what attributes did those users have?" or "how much did conversion differ between users who visit several times a month and those who don't?" Decisions based on this data let you take measures with far higher confidence than improving the site blindly. It is also used for evaluating A/B tests.
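As a rough illustration of the kind of analysis such event logs allow, the following Python sketch computes a conversion rate per user segment from a handful of records. The `sex` and `session` fields follow the logging example above; the `event` field, its values `confirm_view` and `purchase`, and the function name are assumptions invented for this sketch.

```python
from collections import defaultdict


def conversion_by_segment(events, segment_key):
    """Return {segment: share of sessions that viewed the pre-purchase
    screen and then went on to purchase}."""
    viewed = defaultdict(set)
    purchased = defaultdict(set)
    for ev in events:
        seg = ev.get(segment_key, "unknown")
        if ev["event"] == "confirm_view":
            viewed[seg].add(ev["session"])
        elif ev["event"] == "purchase":
            purchased[seg].add(ev["session"])
    return {
        seg: len(purchased[seg] & sessions) / len(sessions)
        for seg, sessions in viewed.items()
    }


# Hypothetical sample events.
events = [
    {"event": "confirm_view", "session": "s1", "sex": "f"},
    {"event": "purchase", "session": "s1", "sex": "f"},
    {"event": "confirm_view", "session": "s2", "sex": "f"},
    {"event": "confirm_view", "session": "s3", "sex": "m"},
    {"event": "purchase", "session": "s3", "sex": "m"},
]
rates = conversion_by_segment(events, "sex")
print(rates)
```

The same function works for any attribute that was logged (`age`, for instance), which is exactly what plain access logs cannot offer.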
...Being able to analyze is nice, but nothing is that easy, and naturally various troublesome problems come up, all the more so as the site grows.
- How do you collect these logs?
- Where do you store data this big?
- How do you compute over data this big?
- How do you handle changes to the log format?
- How do you look at the results?
Logs written locally have to be sent somehow to storage for analysis. Being event logs, you want to collect them as close to real time as possible. And supposing you store what was sent, where on earth do you keep storing data that grows by hundreds of GB a day? MySQL? MongoDB? Hmm. We managed to send it, but the data is so big the aggregation batch doesn't finish within a day! We can compute it, but we have to ask an engineer every time, and the engineers are too busy to do it! We want to add a new attribute to the logs! Are we really going to alter table a MySQL table this huge at this point? Ah, a disk broke! The network bandwidth is saturated!
Argh.
...and that sort of thing can easily happen. Even if you don't want particularly sophisticated analysis, building and keeping this kind of system running is hard work, and you could say that is exactly why everyone settled for access-log-level analysis. Putting in the effort to do it properly and enabling decisions based on data analysis is the approach Zynga became famous for, and it is now becoming common.
In a world like web services, which tend to start out as small-scale B2C, data analysis is a field that heated up rapidly in the last few years. In enterprise systems, though (I don't know this area well myself), the set of systems that store, analyze and display the large volumes of data sent from core integrated systems such as ERP and SCM is called a DWH (data warehouse), and apparently it has been realized with all sorts of enterprise-y hardware and software.
The other day a friend who works on enterprise core business systems said: "The core batch has to finish within a day, but there's so much data that getting it to finish is a struggle. We've bought a pretty high-performance commercial product for it. Eventually I'd like to do it with a distributed system like Hadoop."
"I think the use of big data can be divided into three stages. The first stage targets web log data, analyzed mainly by dot-com companies. The focus used to be on transaction data, but with the arrival of big data it became possible to analyze interaction data, which is one level more detailed than transactions: that is, the contents of the data being exchanged.
The second stage moves the target of analysis to social media: text written on Facebook, Twitter, blogs and so on. We are currently in this second stage."
As the CTO of Teradata, that well-known DWH company, says, the big trend is that we've moved a step beyond plain access data to analyzing and using "interaction data", and behind it is big data (the evolution of hardware and the emergence of software technology surrounding it). That is where things stand in recent years.
Some companies with strong development capabilities had their own engineers work hard, and companies with money teamed up with the usual vendors, to build such systems. But just as AWS and virtualization technology commoditized large-scale infrastructure, people attempting to commoditize data analysis systems through the cloud have recently appeared. Treasure Data is one of those ambitious ventures.
Treasure Data's architecture (roughly)
More concretely, how does Treasure Data address the problems of large-scale log analysis listed earlier?
The presentation by @repeatedly at the recent JAWS DAYS 2013 covers this in detail.
Log collection is done with Fluentd, an OSS project developed with Treasure Data as its sponsor. More precisely, you use td-agent (https://github.com/treasure-data/td-agent), a packaging of the OSS Fluentd made convenient for Treasure Data. Just enter your API key and do a little input/output configuration and you can send data to TD. Fluentd itself is built to be very scalable, and in terms of track record it is (as is well known) used in LINE's backend among others. Thanks to its pluggable architecture it also handles inputs other than logs and a wide variety of formats, converting them into JSON, a flexible form that is robust to change. Logger libraries for sending logs to Fluentd from inside a program are available as well, with implementations for various languages.
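A Fluentd event is conceptually a tag, a timestamp, and a record. The following is a minimal sketch, using only Python's standard library and no real Fluentd client, of why JSON records cope with a new attribute being added later. The tag `td.nginx.access`, the field names, and the helper `make_event` are assumptions chosen for illustration.

```python
import json
import time


def make_event(tag, record, timestamp=None):
    """Build a [tag, time, record] triple representing one log event."""
    if timestamp is None:
        timestamp = int(time.time())
    return [tag, timestamp, record]


# An older record, and a newer one that gained an 'age' attribute.
old = make_event("td.nginx.access", {"asin": "B00BHAF688", "status": 200}, 1363939317)
new = make_event("td.nginx.access", {"asin": "B00BHO0FK8", "status": 200, "age": 30}, 1363939318)

# Serialize each event as one JSON line.
lines = [json.dumps(event, sort_keys=True) for event in (old, new)]

# A consumer reads both shapes; the missing attribute is simply absent,
# so no schema migration is needed when a field is added.
ages = [json.loads(line)[2].get("age") for line in lines]
print(ages)
```

Compare this with the `alter table` worry raised earlier: here the older record stays valid as it is.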
For the data store, a columnar storage developed by TD itself is used. Although it is "developed in-house", it is apparently software built on top of S3: on S3, which claims 99.999999999% durability and 99.99% availability, it implements a column-oriented interface in order to overcome weaknesses of HDFS (plus several other purposes). Being S3, it scales with ever-growing data; being columnar, it is efficient because no unnecessary I/O occurs when you want to process only specific data.
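The I/O argument for a columnar layout can be shown with a toy in-memory sketch in Python. This only illustrates the idea of pivoting rows into columns; it says nothing about how TD's storage is actually implemented, and the sample rows and function names are invented.

```python
def to_columns(rows):
    """Pivot row records into {column: [values...]}; missing fields become None."""
    names = sorted({key for row in rows for key in row})
    return {name: [row.get(name) for row in rows] for name in names}


def count_values(column):
    """Count occurrences of each value in a single column."""
    counts = {}
    for value in column:
        counts[value] = counts.get(value, 0) + 1
    return counts


# Hypothetical access-log rows.
rows = [
    {"asin": "B00BHAF688", "path": "/item/1", "ua": "Mozilla/5.0"},
    {"asin": "B00BHO0FK8", "path": "/item/2", "ua": "Mozilla/5.0"},
    {"asin": "B00BHAF688", "path": "/item/1", "ua": "curl/7.0"},
]
columns = to_columns(rows)

# Aggregating by asin touches only the 'asin' column; 'path' and 'ua'
# never need to be read at all.
asin_counts = count_values(columns["asin"])
print(asin_counts)
```

With a row-oriented layout the same aggregation would have to read every field of every row, which is the extra I/O the paragraph above refers to.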
The compute platform for scaling over large data is Hive + Hadoop, with Hadoop adapted to the columnar storage above.
As for how you receive results, this is one of TD's selling points: you can receive them in various forms such as a Web API, MySQL, or S3. That way you can feed them into your own graphing tool (in this field these are called BI, Business Intelligence) and track over time the various metrics computed on TD.
...By building this whole chain from data collection to output, TD takes over the developer's pain, and by offering it as an all-in-one cloud service it solves the problems in question.
Treasure Data vs ...
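The move toward Bigdata as a Service is not something only Treasure Data is pursuing; there are various competitors. The ones especially likely to be compared are the data analysis services offered as part of AWS by Amazon, which is also TD's backend. More specifically, services such as EMR and Redshift are the ones that correspond to this.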
EMR, as the name says, is AWS's MapReduce service, with an option to use Hive; it can read data stored in S3 and run arbitrary MapReduce processing. Redshift is nothing less than a DWH itself.
So what does Treasure Data offer in comparison? If you watch the presentation mentioned above, the Q&A at the end of the video was about exactly this difference.
The answer was: if you simply "use" EMR or Redshift they look the same, but in practice the big differentiators are that Treasure Data takes care of everything from collection to output in an integrated way, and that it is made developer-friendly with a rich set of APIs. At the risk of being misunderstood, you could see it as something like "Heroku relative to raw AWS". "Operating a DWH is a pain anyway, so we'll take care of it!" is the stance of this kind of solution. In the sense that it differentiates by focusing on the part customers most want done for them, Treasure Data's strategy is quite on target... or rather, it seems to be designed so that people like me feel that way.
How TD feels to use
What TD really solves is large-scale log analysis, but using it at small scale as I do is no problem at all. In fact TD has that in mind too.
As I wrote in http://d.hatena.ne.jp/naoya/20130219/1361262854, it is easy to use in that you just keep sending ever-growing logs to TD, and you get the peace of mind of not having to manage them. As mentioned, it sells itself on being developer-friendly: no fiddly configuration is needed, and whenever you think "I'd like to do such-and-such" there is generally an API for it. The API is RESTful with a simple architecture, so the learning cost is low.
Incidentally, I use Fluentd to copy the output in two directions: raw logs go straight to S3 for archival, and a version with various data joined on for analysis is sent to TD.
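In Fluentd itself this kind of fan-out is done in configuration rather than in code. Purely to illustrate the data flow, here is a small Python sketch in which two in-memory lists stand in for S3 and TD; the lookup table `CATEGORY_BY_ASIN` and all function names are hypothetical.

```python
# Hypothetical lookup used to enrich records before analysis.
CATEGORY_BY_ASIN = {"B00BHAF688": "game"}

archive = []    # stands in for raw logs kept on S3
analytics = []  # stands in for the table on TD


def to_archive(record):
    """Keep the record exactly as it arrived."""
    archive.append(record)


def to_analytics(record):
    """Attach extra data for analysis, then store the enriched record."""
    record["category"] = CATEGORY_BY_ASIN.get(record.get("asin"), "unknown")
    analytics.append(record)


def copy_to_sinks(record, sinks):
    """Hand an independent copy of the record to every sink."""
    for sink in sinks:
        sink(dict(record))


copy_to_sinks({"asin": "B00BHAF688", "status": 200}, [to_archive, to_analytics])
print(archive, analytics)
```

Giving each sink its own copy matters: the enrichment done for the analytics side must not leak into the raw archive.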
How it looks from where you stand
I think how Treasure Data looks now and in the future depends on the viewpoint, that is, on the size of the system you work on.
To enterprise people and large-scale web shops, it is probably perceived as a replacement for an existing DWH or an in-house log analysis system, which is close to the user profile Treasure Data has in mind. In fact Cookpad had been running Hadoop themselves, but the operating cost was too high, so they moved to Treasure Data. As for the enterprise domain, as mentioned earlier, the imaginable use is replacing the parts where a large-scale database was needed for batch processing. The enterprise domain has several issues, such as the problem of entrusting data to the cloud, so I expect it won't penetrate smoothly, but since advanced users have already started adopting it, a surprisingly optimistic future may be waiting.
On the other hand, and this is partly my own fantasy: for startups, small-scale developers, or individuals, just as AWS appeared to commoditize systems combining virtual servers, load balancers and so on, I hope TD or its competitors will commoditize the data analysis platform (DWH) mentioned above. The cloud is in a sense a tool that empowers small teams such as individuals and startups. The success of Instagram, which handled 10 million users with a company of just four people, showed the world that this is the kind of service AWS is. It would be great if TD similarly became a new weapon for small teams taking on giants, and it would be fun if it did.
...So I wrote a lot and got rather heated along the way, but that was my introduction to Treasure Data.
As for why I push Treasure Data so much: honestly, I won't hide that a good part of it is a personal wish to support CTO @kzk_mover, whom I know. But when I actually used it, it neatly solved a problem I had, it is technically solid, and above all it might realize that personal fantasy of mine, so I can't help expecting a lot.
If this article gets widely read and TD grows and succeeds (!) some day thanks to it, I'll have them treat me to premium sushi. On that note, I'll wrap up.