BigQuery and Google's Big Data Stack 2.0
The other day a group of us got together for a reading group on the book "BigQuery Analytics". As the title suggests, it is an English-language book about Google BigQuery.
I recently started using BigQuery at work, but I jumped in without looking much into why it was developed or how it is architected, so, belatedly, I decided to catch up on that side of things.
In the first session we mainly read through the part of the book that amounts to an overview. Even that alone was interesting enough that I thought I would write a little about it on this blog.
BigQuery itself is interesting, but what really caught my attention was this framing: if the Google infrastructure described in the book "Google を支える技術" is "Big Data Stack 1.0", then BigQuery is built on top of Big Data Stack 2.0, and inside Google they have apparently long since moved (or are moving?) to Big Data Stack 3.0. (That said, nothing about 3.0 has been published, in papers or anywhere else, so hints like that are all there is to go on.)
Google BigQuery
Google BigQuery is, in a word, "a cloud service that can analyze enormous data with SQL in a few seconds".
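For example, you keep shipping logs to it every day with something like fluentd, and when you throw SQL at the accumulated data, be it hundreds of GB, several TB, or even several PB, you get a proper result back. It is what people call a big data platform. Behind the scenes the processing is of course distributed, handled by Google's huge fleet of computers and disks and its fast network.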
If you want to know more, see Googleの虎の子「BigQuery」をFluentdユーザーが使わない理由がなくなった理由 #gcpja - Qiita.
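Why it can process large data in seconds is precisely the main subject of BigQuery Analytics, but roughly:
- SQL is a language that can be highly parallelized (because it basically processes rows one after another)
- Unlike an RDBMS, it does not build indexes; it basically does a full scan
- To distribute the I/O of that full scan, a huge number of HDDs are attached to a huge number of servers and clustered (a scale-out approach, of course)
- Data is stored in columnar format, distributed across that cluster, and when a SQL query arrives it is parallelized so that the I/O is spread out
That is the general idea. As for the details... well, that is the real subject of the book, and understanding it is planned for future reading-group sessions, he says, dodging the question.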
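To make the points above a little more concrete, here is a minimal sketch in Python. It is not how BigQuery is implemented. It only imitates the idea on one machine: rows are split into shards, each shard keeps its data column by column, every shard is fully scanned without any index, and the partial results are combined. The log fields, the shard count, and the use of a thread pool are all assumptions made up for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy access log; the field names are invented for this example.
ROWS = [
    {"path": "/", "status": 200, "bytes": 512},
    {"path": "/about", "status": 200, "bytes": 2048},
    {"path": "/missing", "status": 404, "bytes": 128},
    {"path": "/", "status": 200, "bytes": 512},
    {"path": "/broken", "status": 500, "bytes": 64},
    {"path": "/", "status": 404, "bytes": 128},
]


def to_columnar_shards(rows, num_shards):
    """Spread rows over shards and store each shard column by column."""
    shards = []
    for i in range(num_shards):
        part = rows[i::num_shards]
        columns = {name: [row[name] for row in part] for name in rows[0]}
        shards.append(columns)
    return shards


def scan_shard(shard):
    """Full scan of one shard, reading only the two columns the query needs.

    Roughly: SELECT SUM(bytes) FROM logs WHERE status = 200
    """
    return sum(b for s, b in zip(shard["status"], shard["bytes"]) if s == 200)


def parallel_sum_bytes(shards):
    """Scan all shards concurrently, then add up the partial sums."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        return sum(pool.map(scan_shard, shards))


shards = to_columnar_shards(ROWS, num_shards=3)
total = parallel_sum_bytes(shards)
print(total)
```

The "path" column is never touched by the query, which is the benefit of the columnar layout, and each shard can be scanned independently, which is what lets the I/O be spread over many disks.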
Because of this architecture, BigQuery's query response time does not grow in proportion to data size. For example, a query that took 3 sec on 100GB of data may still finish in about 5 sec when the target grows to 1TB.
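A back-of-the-envelope model shows why this can happen. The sketch below assumes that a query costs a fixed overhead plus a full scan split evenly across many disks. The overhead, the per-disk throughput, and the disk count are hypothetical numbers chosen for illustration, not measurements of BigQuery, and the model does not try to reproduce the 3 sec and 5 sec figures above.

```python
def query_seconds(data_gb, disks, overhead_s=2.0, mb_per_s_per_disk=100):
    """Estimate query time: fixed overhead plus a full scan shared by all disks.

    All default values are made-up assumptions for this sketch.
    """
    scan_s = (data_gb * 1000) / (disks * mb_per_s_per_disk)
    return overhead_s + scan_s


small = query_seconds(100, disks=5000)    # 100 GB spread over 5000 disks
large = query_seconds(1000, disks=5000)   # 1 TB spread over the same disks
single = query_seconds(100, disks=1)      # 100 GB on one lonely disk

print(f"100 GB, 5000 disks: {small:.1f} s")
print(f"1 TB,   5000 disks: {large:.1f} s")
print(f"100 GB, 1 disk:     {single:.1f} s")
```

Under these assumptions, ten times the data costs well under ten times the wall-clock time, because the fixed overhead dominates once the scan is spread thinly enough. The same 100 GB on a single disk would take far longer.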
Where Google BigQuery sits as a product / the rise of MPP
Products that "run SQL in parallel across a distributed system and analyze big data in seconds" are not limited to BigQuery.
On the OSS side there are Presto, which was developed at Facebook and then open-sourced, and Cloudera Impala. Used as add-ons to a Hadoop cluster, they let you run SQL in parallel over the data accumulated in that cluster. Among cloud services, Amazon Redshift appears to be in the same category (I am not familiar with it, and its characteristics differ quite a bit). And TreasureData, which I introduced on this blog before, started as a service built on Hive + Hadoop but has since added Presto and upgraded itself into the same category.
tagomoris's slides from Hadoop Conference Japan 2014, Batch processing and Stream processing by SQL, cover this area in detail.
To summarize the material briefly:
SQL-based big data analysis platforms fall broadly into three categories:
- Large Batch
- Short Batch
- Stream Processing
Their characteristics are:
- Large Batch: can batch-process huge data reliably, but the per-execution overhead is large (so it is not suited to tweaking and re-running queries little by little, i.e. ad hoc queries)
- Short Batch: gives up some of Large Batch's stability and scale, but the per-execution overhead is only a few seconds (so it is suited to ad hoc queries)
- Stream Processing: processes data flowing through a stream in real time; it is not batch
Representative implementations of each include:
- Large Batch: Hive + Hadoop
- Short Batch: Presto / Impala etc.
- Stream Processing: Twitter Storm / Norikra
Implementations in the Short Batch category have lately come to be called MPP (Massively Parallel Processing) query engines, and they are the hottest topic in the big data world right now. At least, that was my impression from attending Hadoop Conference.
Google's BigQuery is based on Dremel, a query engine originally developed inside Google. Dremel is exactly this kind of MPP query engine, and my current understanding is that BigQuery is Dremel plus storage, offered as a public service. (Because the clusters behind BigQuery are so enormous, it also covers Large Batch territory even though it is classified as Short Batch.)
Seen from the OSS side, I think the flow went something like this: Hadoop + Hive took off as a way to analyze big data with SQL, but MapReduce has a large task start-up overhead and is not suited to ad hoc analysis, so Presto and Impala appeared to cover that gap.
Big Data Stack 2.0
Which brings us back to the Big Data Stack 2.0 from the beginning. BigQuery is apparently built on top of Google's Big Data Stack 2.0. According to the book, anyway.
Big Data Stack 1.0 refers to GFS, MapReduce, Bigtable and so on, that is, the technologies covered in "Google を支える技術". The papers on MapReduce and the like came out around 2004. Ten years on, as far as I can tell from the book, Google's internals have evolved to Big Data Stack 2.0. (And there are glimpses suggesting that, as of now, they are already reaching 3.0.)
The main components of Big Data Stack 2.0 are listed below. The descriptions are quoted from the summary text hakobera prepared for the reading group.
- Colossus
  - Successor to GFS (a distributed file system?). Details unpublished.
- Megastore
  - A NoSQL DB that uses the Paxos algorithm to provide consistent reads and writes across multiple datacenters. Built on top of Bigtable.
- Spanner
  - Megastore, plus the ability to attach a regional constraint to data (which datacenter it belongs to).
- FlumeJava
  - A framework that makes it easy to write MapReduce pipelines.
- Dremel
  - Distributed SQL query engine. The core architecture of BigQuery. Independent of the storage layer.
These implementations are apparently mostly built on top of Big Data Stack 1.0. And, again quoting hakobera's text, the two can be (crudely) summed up like this:
- Big Data Stack 1.0 is a set of technologies for treating a datacenter as a single server
- Big Data Stack 2.0 is a set of technologies for treating multiple datacenters as a single server
In other words, Google's Big Data Stack 2.0 is said to abstract datacenters spread around the world so that, from a programmer's point of view, they can be treated as a single server.
As you probably know, just as Hadoop, HDFS, HBase and others were OSS that took Big Data Stack 1.0 as their reference, OSS implementations that appear to be modeled on Big Data Stack 2.0 have naturally emerged as well: Apache Crunch, Presto, Impala and so on. That is the current state of things. The pattern so far has been that by the time Google publishes an outline in a paper, Google itself has already moved on to the next generation internally, so the guess is that Google is probably already on Big Data Stack 3.0.
My own exploration of this field had stopped around the time of "Google を支える技術" and early Hadoop, that is, the Big Data Stack 1.0 era. Seeing where things have got to in the meantime, I naturally came away thinking, once again, that Google is a scary company, and I also found it a really interesting read. For people who have followed the field all along this may be old news, but I got carried away by how interesting it was and wrote it up.
Well, it is somebody else's company I am talking about. What are you, naoya, a Google shill?
At YAPC::Asia on 8/30 I hope to present a summary after reading a bit further into BigQuery Analytics, along with use cases and impressions from actually using BigQuery in production.
I wrote this in a hurry, so there are probably more typos and sloppy explanations than usual. Please bear with me.
Aside
By the way, looking at all this I was about to shout that a "Google を支える技術 II" is long overdue, but then I heard a voice from above telling me to do something about "サーバー/インフラを支える技術" first. To my editor: sorry I am taking so long to write the agenda...
Addendum
I gave a talk about BigQuery at #gcpja. The slides are published below.