This post summarizes the Current 2024 session "Change Data Capture & Kafka: How Slack Transitioned to CDC with Debezium & Kafka Connect".
I did my best to capture the content accurately by ear, but if you find any mistakes, please point them out.
- What is Current?
- Change Data Capture & Kafka: How Slack Transitioned to CDC with Debezium & Kafka Connect
- Speakers
- Session summary
What is Current?
Current is a conference hosted by Confluent for exchanging knowledge about stream processing, with Kafka and Flink at its core.
Event page
Recordings of each session can be watched on the official site.
Change Data Capture & Kafka: How Slack Transitioned to CDC with Debezium & Kafka Connect
The session link is below.
Speakers
- Joseph Thaidigsman
- Tom Thornton
- Both are software engineers at Slack
Session summary
This session covers replication from a relational database to an analytics platform.
There are two common approaches to replication: batch and CDC.
The themes are how to migrate from batch to CDC, how to operate CDC at scale, and which OSS tools help with that.
Why Slack decided to move from batch to CDC
- Cost
  - Even a small number of updates caused the entire data set to be rewritten
  - For example, there is a table that holds Slack messages, and every time a message is sent to a channel the whole message history ends up being rewritten
  - As a result, compute costs become enormous
  - (Author's note: this is a common data lake problem. They presumably partition the data in some way, so they are probably not literally rewriting every past message, but it is still very heavy.)
- Technical debt
  - They were using Apache Sqoop, which was retired as an OSS project in 2021, and operating it had become a burden
  - (Author's note: Apache Sqoop is a tool for efficiently moving data between data stores such as an RDBMS and Apache Hadoop. The project was retired in 2021.)
- Latency
  - The batch ingestion usually took 24 to 48 hours or more, and because each batch run was expensive, the latency until data became usable in the analytics platform was large
  - For example, when A/B testing a new feature, it took more than two days after release before the results were visible
So Slack moved to a CDC-based architecture.
The source relational database is Vitess. Vitess is a system for horizontally scaling and clustering MySQL.
Changes made in Vitess are read by Debezium, which runs as a Kafka Connect connector, and sent to Kafka. Debezium is an OSS CDC platform.
The topics sent to Kafka use Avro schemas and are consumed by the Iceberg Sink Connector.
The Iceberg Sink Connector writes the database changes flowing through Kafka to Amazon S3 as Iceberg tables. The Iceberg catalog is Hive Metastore.
Spark on Amazon EMR processes the data written to S3 and writes it back to S3, again as Iceberg.
Architecture from Vitess to Debezium and Kafka
The first half of the talk explains the architecture from the relational database, Vitess, through Debezium to Kafka.
Walkthrough of the architecture from Vitess to Debezium and Kafka (read this alongside the slide image)
- Vitess
  - Vitess is made up of logical databases called Keyspaces
  - A Keyspace can be partitioned into multiple Shards to scale horizontally
  - A Shard consists of a Tablet process and a MySQL process
  - Each Tablet is assigned a Tablet Type such as "Primary" or "Replica"
  - VTGate is a lightweight proxy server that routes queries to the correct Shard and merges query results across Shards
  - Through VTGate's VStream API, a client can subscribe to the stream of the MySQL Binlogs that make up the Shards
- Debezium
  - Debezium, running inside a Kafka Connect cluster, consumes the VStream
  - The Debezium connector defines Tasks that describe how to read the data, and Workers execute them
- Kafka
  - Debezium writes the CRUD information to Kafka as a Data Changes Topic
  - In addition to the Data Changes Topic, Debezium also writes an Offset Topic, which records how far through the Binlog it has progressed
  - If Debezium crashes, after restarting it reads the Offset Topic, learns how far it had progressed through the Binlog, and resumes the VStream from there
Slack runs all of this at large scale (see below).
Challenges in operating Debezium
Building the system described above raised several challenges, which Slack solved by customizing the OSS. Those improvements were contributed back upstream so that everyone can use them.
Fault tolerance for snapshots
The first challenge was fault tolerance for table snapshots. Slack needed a way to obtain the complete contents of a table even when the Binlog no longer holds every change. For example, if the record for user ID 5 and channel ID 6 should be loaded into a Kafka topic but is no longer present in the Binlog, a snapshot is required.
The VStream API provides a VStream Copy feature for this purpose: it copies every row from a table and sends the rows to the VStream client.
However, using it from Debezium caused a problem. When Debezium saves an offset it stores the Keyspace, the Shard, and the Global Transaction ID (GTID), but nothing that indicates how far the VStream Copy has progressed. If Debezium crashes and restarts, it cannot resume the snapshot. In other words, the snapshot is not fault tolerant.
To fix this, Slack made Debezium store in the Offset Topic the primary key of the last row that was successfully sent during the snapshot. If Debezium crashes, it reads this field, passes it to VStream, and the VStream Copy resumes from that point. This made table snapshots fault tolerant.
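The resume logic can be illustrated with a small in-memory sketch. This is not Debezium or Vitess code: `copy_rows`, `run_snapshot`, and the `last_pk` offset field are hypothetical names, and the "crash" is simulated. It only shows why persisting the last sent primary key is enough to continue a copy.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Row = Tuple[int, str]  # (primary key, payload); a simplified row shape


def copy_rows(table: List[Row], after_pk: Optional[int]) -> Iterator[Row]:
    """Stand-in for a table copy: yield rows in primary-key order after `after_pk`."""
    for pk, payload in sorted(table):
        if after_pk is None or pk > after_pk:
            yield pk, payload


def run_snapshot(table: List[Row], offsets: Dict[str, int], sink: List[Row],
                 crash_after: Optional[int] = None) -> bool:
    """Send rows to `sink`, recording the last sent primary key in `offsets`.

    Returns True when the copy finished, False when the simulated crash hit.
    """
    sent = 0
    for pk, payload in copy_rows(table, offsets.get("last_pk")):
        if crash_after is not None and sent == crash_after:
            return False  # simulated crash before this row is sent
        sink.append((pk, payload))
        offsets["last_pk"] = pk  # hypothetical offset field
        sent += 1
    return True


table = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
offsets: Dict[str, int] = {}
sink: List[Row] = []
run_snapshot(table, offsets, sink, crash_after=2)  # crashes after two rows
run_snapshot(table, offsets, sink)                 # resumes after primary key 2
```

After the second call, `sink` holds each row exactly once, because the restart skips everything up to the stored key.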
Scaling Debezium
As the number of Vitess transactions grows, the Debezium connector has to scale with it, but Debezium's Vitess connector supported only a single-task mode.
If Vitess has three Shards, the Offset Topic holds a GTID for each Shard.
Slack therefore modified the Debezium Vitess connector to support a multi-task mode.
Shards are distributed across tasks in round-robin fashion, each task processes one Shard, and each task manages its own offsets. This made transparent scaling possible.
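A minimal sketch of round-robin shard assignment, assuming shards and tasks are identified by simple names and indexes. `assign_shards` is a hypothetical helper, not the connector's actual implementation.

```python
from typing import Dict, List


def assign_shards(shards: List[str], num_tasks: int) -> Dict[int, List[str]]:
    """Distribute shards over tasks in round-robin order."""
    if num_tasks < 1:
        raise ValueError("num_tasks must be at least 1")
    assignment: Dict[int, List[str]] = {task: [] for task in range(num_tasks)}
    for index, shard in enumerate(shards):
        assignment[index % num_tasks].append(shard)
    return assignment


# With as many tasks as shards, each task owns exactly one shard
# and can track that shard's GTID offset independently.
one_shard_per_task = assign_shards(["s1", "s2", "s3"], 3)
# With fewer tasks, some tasks own more than one shard.
two_tasks = assign_shards(["s1", "s2", "s3"], 2)
```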
Handling changes to the number of Kafka partitions
There are times when you want to increase the number of Kafka partitions to scale Kafka's performance.
That is when the ordering of the data changes topic becomes a problem.
For example, suppose Lisa joins the project zoo channel. The user ID is 5 and the channel ID is 6, and because she joined the channel this is recorded as a create operation.
Now suppose the number of partitions is increased to 2, and after that Lisa leaves the channel.
A delete operation then occurs for the same primary key. As a result, records for the same primary key (user ID and channel ID) end up in different partitions.
The partition is determined by the hash of the message key modulo the number of partitions, so changing the number of partitions can assign a new partition to the same key.
Consequently, the consumer cannot guarantee the order of the two records for that primary key and can no longer tell which record is the latest. The MySQL Binlog timestamp has only one-second precision, which does not help in an environment like Slack's with 600,000 writes per second.
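The effect of changing the partition count can be seen in a few lines. Kafka's default partitioner hashes the serialized key with murmur2; the sketch below uses a toy hash (`toy_hash`, an assumption made only so the arithmetic is easy to follow) to show the same modulo behaviour.

```python
def toy_hash(user_id: int, channel_id: int) -> int:
    """Toy stand-in for the key hash; Kafka itself uses murmur2 on the key bytes."""
    return user_id * 31 + channel_id


def partition_for(user_id: int, channel_id: int, num_partitions: int) -> int:
    """Partition = hash(key) mod number of partitions."""
    return toy_hash(user_id, channel_id) % num_partitions


# Lisa's membership row: user ID 5, channel ID 6 -> toy hash 161.
create_partition = partition_for(5, 6, num_partitions=1)  # 161 % 1 = 0
delete_partition = partition_for(5, 6, num_partitions=2)  # 161 % 2 = 1
```

The create and the delete for the same primary key land in different partitions, so consumption order alone no longer says which one is newer.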
Two simple countermeasures come to mind, but neither is ideal.
- Delete the entire contents of the topic, take a new snapshot, and stream it into the topic
- Switch every consumer over to a new topic with the increased number of partitions
Both of these require a large amount of manual work, downtime, and coordination between teams.
Slack therefore built a mechanism that orders messages without depending on the order in which they are consumed. In other words, for any two messages it is possible to decide which one is newer.
To achieve this, three fields were added.
- tx_order
  - Used to order two records within the same transaction.
  - VStream sends a begin event, a series of changes, and a commit event, so a counter can be kept and assigned to each change.
  - A larger tx_order means a later change within that transaction.
- tx_rank
  - Used to order two records from different transactions. It is a monotonically increasing integer where a larger value indicates a newer record, and it can be derived from the GTID.
  - A GTID consists of hosts and ranges of Binlog positions. The upper bound of each range is a monotonically increasing integer, so summing those upper bounds preserves the same property.
  - In the slide example, the first record has only one host, so the sum is 3. For the second record the sum of 3 and 5 is 8, so the record with the larger value is the newer one.
- epoch
  - Used when the monotonic property of the rank is lost. This happens when a later host set is not a superset of the earlier host set, for example after an administrative operation that resets the GTID and the binary log.
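The rank derivation above can be sketched as a small parser. The `host:start-end` layout, the optional `MySQL56/` prefix, and the host names `host1` and `host2` are assumptions for illustration; the function simply sums the upper bound of each host's range, matching the 3 and 3 + 5 = 8 example from the slide.

```python
def tx_rank(gtid_set: str) -> int:
    """Sum the upper bounds of every host's position range in a GTID set.

    Assumed input shape: "host1:1-3,host2:1-5", optionally prefixed
    with a flavor such as "MySQL56/".
    """
    if "/" in gtid_set:
        gtid_set = gtid_set.split("/", 1)[1]  # drop the flavor prefix
    total = 0
    for host_part in gtid_set.split(","):
        host_part = host_part.strip()
        if not host_part:
            continue
        _host, *intervals = host_part.split(":")
        last_interval = intervals[-1]          # e.g. "1-3" or a single "5"
        total += int(last_interval.split("-")[-1])
    return total


first_record_rank = tx_rank("host1:1-3")             # single host
second_record_rank = tx_rank("host1:1-3,host2:1-5")  # two hosts
```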
With these new fields in place, consider scaling out the partitions again.
After scaling out to two partitions, Lisa leaves the channel again and the delete operation is recorded in partition 2.
The ordering is established with the following steps:
- Compare epoch: if the epochs are equal, the monotonic property of tx_rank still holds
- Compare tx_rank: the record with the higher tx_rank is the newer one
- If tx_rank is equal, compare tx_order to decide which is the latest
With this approach, scaling out Kafka no longer requires manual work, cross-team coordination, or downtime.
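The comparison steps amount to a lexicographic comparison of (epoch, tx_rank, tx_order). A minimal sketch, assuming a higher epoch always wins and using a hypothetical `ChangeEvent` record:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ChangeEvent:
    op: str        # "create", "update" or "delete"
    epoch: int
    tx_rank: int
    tx_order: int

    def sort_key(self) -> Tuple[int, int, int]:
        # Python compares tuples element by element, which mirrors the steps:
        # epoch first, then tx_rank, then tx_order.
        return (self.epoch, self.tx_rank, self.tx_order)


def newer(a: ChangeEvent, b: ChangeEvent) -> ChangeEvent:
    """Return whichever event is newer, regardless of arrival order."""
    return a if a.sort_key() >= b.sort_key() else b


create = ChangeEvent("create", epoch=0, tx_rank=3, tx_order=0)
delete = ChangeEvent("delete", epoch=0, tx_rank=8, tx_order=0)
latest = newer(delete, create)  # same answer as newer(create, delete)
```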
Handling Vitess resharding
As the load on a Vitess Shard grows, there are times when the number of Shards needs to scale.
In the slide example there is a Shard s1, and the topic contains a create record showing that Lisa joined a channel. The record indicates that the Shard is s1.
Now suppose this Shard is resharded into s1.1 and s1.2. The problem appears afterwards, when Lisa leaves the channel and a delete record is produced.
The new Shard s1.2 uses the default epoch value of 0, so the later record ends up with a lower epoch value, which breaks the ordering algorithm described earlier.
To solve this, Slack built a mechanism that establishes Shard lineage.
s1.2 covers a subset of the key range that s1 covered, so s1 can be regarded as the parent of s1.2. The epoch of a new Shard is therefore set to the maximum epoch across all of its parent Shards plus 1.
In the example above, the maximum across all parent Shards is 1, so the new epoch becomes 2. The later record now has the higher epoch value, and the ordering algorithm works again. This is how transparent scaling is achieved.
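The lineage rule is a one-liner; the sketch below also shows how the comparison breaks with the default epoch and recovers with the derived one. `epoch_for_new_shard` is a hypothetical helper, and the rank values are made up for illustration.

```python
from typing import Iterable


def epoch_for_new_shard(parent_epochs: Iterable[int]) -> int:
    """New shard epoch = max epoch over all parent shards + 1."""
    return max(parent_epochs) + 1


# Ordering keys are (epoch, tx_rank, tx_order); ranks here are illustrative.
create_on_s1 = (1, 8, 0)                 # written by parent shard s1, epoch 1

delete_with_default_epoch = (0, 2, 0)    # new shard s1.2 starting at epoch 0
broken = delete_with_default_epoch > create_on_s1  # the later delete looks older

new_epoch = epoch_for_new_shard([1])     # parents of s1.2: only s1
delete_with_lineage_epoch = (new_epoch, 2, 0)
fixed = delete_with_lineage_epoch > create_on_s1   # the delete is newest again
```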
Architecture from Kafka to data lake ingestion and serving
From here on, the talk explains the architecture that consumes from Kafka, ingests the data into the data lake, and serves it to users.
On the consumer side of the CDC platform, it covers the techniques used to support tables with more than one trillion records and the migration away from the legacy batch pipeline.
The architecture Slack finally adopted was inspired by Ryan Blue's article "The CDC merge pattern".
(Author's note: Ryan Blue is one of the original creators of Apache Iceberg.)
As a premise, the CDC implementation keeps three related tables in the data lake.
- CDC table
  - An append-only Iceberg table that contains the raw change log events
- Mirror table
  - An Iceberg table intended to reflect the source table one to one
- DS partition table
  - A Hive-style Parquet table partitioned by date. Each date partition contains a complete snapshot of the source table
The overall architecture: once records pass through Kafka, they are processed by the Tabular Iceberg connector running on a Kafka Connect cluster on k8s.
(Author's note: the Tabular Iceberg connector has since been merged into Iceberg itself.)
Avro records read from Kafka are deserialized using the schemas stored in the Apicurio schema registry and then persisted to the append-only CDC table on S3.
From there, Airflow launches Spark jobs on EMR that incrementally consume the latest updates from the CDC table and merge them into the mirror table.
Further downstream, a daily job reads from both the CDC table and the mirror table and produces the daily DS partition in the DS partition table.
Finally, other teams run their own jobs that consume these DS partitions to build fact tables and metrics tables.
Details of the CDC table: there is a variable number of tasks that can be increased per Kafka topic, which lets the pipeline scale up. There is also a singleton coordinator that handles commits to Iceberg. The main features Slack relies on are exactly-once semantics, schema evolution, and Hive Catalog support.
As a concrete example, consider Lisa joining a channel. A record representing the join is produced to Kafka in Avro via Debezium. The Iceberg connector deserializes the record using the schema in the schema registry, and it is finally appended to the CDC channel members table.
This CDC table is an append-only Iceberg table partitioned by hour using the source transaction timestamp.
How CDC records are merged into the mirror table:
The mirror table is a one-to-one mirror of the source Vitess table. To build this table and keep it in sync with the upstream Vitess table, a Spark job runs on EMR at a fixed frequency and updates the mirror table from the CDC records.
The job uses Spark's incremental read to read only newly appended events that have not been processed yet. In this example, the incremental read selects only the row that was added when Lisa joined the channel.
After reading these records, the job updates the mirror table with Spark's MERGE INTO. In this case, Lisa's new row is inserted.
The Vitess ordering fields introduced earlier (tx_rank, tx_order, epoch) are used to decide which record to keep when there are multiple records for the same primary key.
An important point is that an incremental read in Iceberg requires the snapshot ID from which to start reading.
The merge job starts from a particular snapshot of the CDC table, advances to another snapshot, and stores those start and end boundaries in the metadata of the new mirror table snapshot it produces. On the next run it reads this metadata to learn where to start.
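The bookkeeping can be simulated in plain Python. This is only a sketch of the idea, not Spark or Iceberg code: `merge_increment` and `last_cdc_snapshot_id` are hypothetical names, and snapshot IDs are assumed to be increasing integers so that "everything after the stored ID" is easy to express (real Iceberg snapshot IDs are not ordered like this and are followed through snapshot lineage).

```python
from typing import Dict, List, Tuple

Key = Tuple[int, int]  # (user_id, channel_id)


def merge_increment(mirror: Dict[Key, dict], metadata: Dict[str, int],
                    cdc_snapshots: List[dict]) -> None:
    """Merge CDC events newer than the stored snapshot ID into the mirror."""
    start = metadata.get("last_cdc_snapshot_id", 0)
    new_snapshots = [s for s in cdc_snapshots if s["snapshot_id"] > start]
    if not new_snapshots:
        return  # nothing appended since the previous run

    # Keep only the newest event per primary key, using the ordering fields.
    latest: Dict[Key, dict] = {}
    for snapshot in new_snapshots:
        for event in snapshot["events"]:
            key = (event["user_id"], event["channel_id"])
            rank = (event["epoch"], event["tx_rank"], event["tx_order"])
            if key not in latest or rank > latest[key]["rank"]:
                latest[key] = {"op": event["op"], "rank": rank}

    # Apply the surviving events: deletes remove the row, others upsert it.
    for key, event in latest.items():
        if event["op"] == "delete":
            mirror.pop(key, None)
        else:
            mirror[key] = {"user_id": key[0], "channel_id": key[1]}

    # Remember the end boundary so the next run starts after it.
    metadata["last_cdc_snapshot_id"] = max(s["snapshot_id"] for s in new_snapshots)


mirror: Dict[Key, dict] = {}
metadata: Dict[str, int] = {}
cdc = [{"snapshot_id": 1, "events": [
    {"op": "create", "user_id": 5, "channel_id": 6, "epoch": 0, "tx_rank": 3, "tx_order": 0}]}]
merge_increment(mirror, metadata, cdc)  # Lisa's row is inserted
```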
The mechanism described so far works well for tables that are not very large.
However, for large tables with more than 10 billion rows, a "write amplification" problem appears. Write amplification refers to a small amount of input data triggering a large amount of processing.
There are two basic countermeasures.
- Run the job less frequently
  - Because of the nature of write amplification, merging 10 minutes of records and merging a full day of records do not differ much in processing time. So when there is no strict latency requirement, the merge frequency can be lowered considerably.
- Merge-on-Read
  - Use Iceberg's MoR
These help a great deal, but they are still not enough for truly huge tables with a trillion rows or more. For those, Slack uses a technique called "bucket merge".
Bucket merge is a merge strategy used to reduce shuffle on huge tables, and it relies on Storage Partitioned Join.
Storage Partitioned Join is an approach that joins the partitions of the two tables directly. For it to work, the tables on both sides of the join must have the same partitioning strategy.
That way, each join key maps to the same partition in the left and right tables. In Slack's CDC pipeline the CDC table is joined to the mirror table on the primary key, so the mirror table is bucket-partitioned on a hash of the primary key.
(Author's note: how Storage Partitioned Join works and what it achieves is explained in detail in the paper "Petabyte-Scale Row-Level Operations in Data Lakehouses".)
The figure below adds the new steps of a merge job that uses bucket merge.
Previously the job did an incremental read from the CDC table and merged the records directly into the mirror table. With bucket merge, the incrementally read CDC records are first stored in a temporary Iceberg table, and that table is bucket-partitioned in the same way as the mirror table.
In the figure's example a new "percolator_bucket_key" column is added. It is computed as the hash of the user ID and channel ID modulo the chosen number of buckets.
After writing this table, five Spark settings are used to enable Storage Partitioned Join, and then the usual merge SQL is executed to merge the CDC records.
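A small in-memory sketch of why matching bucket keys remove the need for a shuffle. The hash function and the bucket count of 4 are assumptions (the talk does not specify them), and `bucketed_join` is a hypothetical helper; the point is only that a CDC row can be matched by looking at a single mirror bucket.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

NUM_BUCKETS = 4  # hypothetical bucket count


def bucket_key(user_id: int, channel_id: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Toy version of percolator_bucket_key: hash(primary key) mod bucket count."""
    return (user_id * 31 + channel_id) % num_buckets


def bucketize(rows: List[dict]) -> Dict[int, List[dict]]:
    """Group rows by bucket key, as both tables are laid out in storage."""
    buckets: Dict[int, List[dict]] = defaultdict(list)
    for row in rows:
        buckets[bucket_key(row["user_id"], row["channel_id"])].append(row)
    return buckets


def bucketed_join(cdc_rows: List[dict], mirror_rows: List[dict]) -> List[Tuple[dict, dict]]:
    """Join bucket by bucket; rows never need to move between buckets."""
    cdc_buckets, mirror_buckets = bucketize(cdc_rows), bucketize(mirror_rows)
    matches = []
    for bucket, cdc_part in cdc_buckets.items():
        index = {(m["user_id"], m["channel_id"]): m for m in mirror_buckets.get(bucket, [])}
        for c in cdc_part:
            m = index.get((c["user_id"], c["channel_id"]))
            if m is not None:
                matches.append((c, m))
    return matches


cdc_rows = [{"user_id": 5, "channel_id": 6, "op": "delete"}]
mirror_rows = [{"user_id": 5, "channel_id": 6}, {"user_id": 5, "channel_id": 7}]
matches = bucketed_join(cdc_rows, mirror_rows)
```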
To show the concrete effect, the talk presented the impact on the messages table, the largest table at Slack.
Below is a screenshot of the merge job before bucket merge was implemented. The exchange step block outlined in red represents the shuffle and is the job's bottleneck.
952 billion shuffle records were written, and the job failed without ever working.
After introducing bucket merge, the exchange step disappeared entirely. The huge shuffle that wrecked the job's performance was gone, and a full day of messages could be merged in about one hour.
Migration challenges
Like many organizations, Slack struggles when adopting new technology at scale. Getting downstream teams to adopt new data is very hard.
In this migration, the challenge was that downstream jobs had been built on the guarantees the old system provided, namely the daily replication cycle.
The old system guaranteed that every record up to UTC midnight was present in each daily snapshot. In the CDC world, however, the merge job simply merges whatever CDC records are available at the time it runs, so there is no guarantee that a job executed at UTC midnight has reliably inserted every record from the source host database up to UTC midnight.
Slack therefore needed a way for downstream teams to access the data written by the new pipeline easily, without a large migration on their side. This is where the DS partition table comes in.
The DS partition table is an existing table that is consumed by Spark jobs downstream of the legacy batch-based data replication system.
Each DS partition contains a complete snapshot as of UTC 0:00 and is used for purposes such as idempotency and time-series analysis.
For idempotency: if, for example, a metric definition changes and the metrics team wants to recompute the past year of metrics, the table provides a view of the data across that time range.
For time-series analysis: for example, it lets you observe the trend in daily active users over time.
To avoid affecting existing users, Slack guaranteed that the semantics of this table would essentially not change, and behind the scenes switched its data source from the legacy system to the new CDC pipeline.
As a result, downstream users do not need to notice that anything changed, and in some cases their dependencies are satisfied more than a day earlier.
To achieve this goal, Slack created the "Percolator DS partition table job".
It produces a single daily partition with the same guarantees as the old system, and uses Iceberg's time travel feature to achieve idempotency and consistency.
Suppose Lisa joins the channel with channel ID 7 one afternoon. She joins just before UTC midnight, then changes her mind and leaves the channel a few minutes after the date changes.
To stay consistent with the old system, the daily DS partition must contain a record showing that she was in channel 7. In this example, however, the records are so close together that they may land in the same snapshot of the CDC table, and so the mirror table snapshot may not contain this record.
So the job time-travels the mirror table to the latest snapshot before UTC 0:00, and then persists this time-traveled mirror table as a temporary table on S3.
Next it performs an incremental read of the CDC table. The read starts from the snapshot ID stored in the mirror table snapshot metadata of the snapshot it just time-traveled to.
Unlike the regular merge job, this incremental read is also filtered by the source transaction timestamp, so that only records from before UTC 0:00 are selected.
It then merges this subset of CDC records into the temporary, time-traveled version of the mirror table to produce a consistent snapshot as of UTC 0:00.
Finally, it inserts the rows from this temporary table into a new partition of the DS partition table.
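The steps above can be walked through with an in-memory sketch of the Lisa example. Everything here is illustrative: `build_ds_partition`, the dictionary layout, the dates, and the integer snapshot IDs are assumptions, and the real job uses Iceberg time travel and Spark rather than Python sets.

```python
from datetime import datetime, timezone


def build_ds_partition(mirror_snapshots, cdc_events, cutoff):
    """Return the set of primary keys present in the source as of `cutoff`."""
    # 1. "Time travel": latest mirror snapshot committed before the cutoff.
    candidates = [s for s in mirror_snapshots if s["committed_at"] < cutoff]
    base = max(candidates, key=lambda s: s["committed_at"])
    table = set(base["rows"])  # temporary copy; the mirror itself is untouched

    # 2. Incremental CDC read starting after the snapshot ID stored in the
    #    mirror metadata, also filtered by source transaction timestamp.
    pending = [
        e for e in cdc_events
        if e["cdc_snapshot_id"] > base["last_cdc_snapshot_id"] and e["tx_time"] < cutoff
    ]

    # 3. Merge the filtered events in (epoch, tx_rank, tx_order) order.
    for e in sorted(pending, key=lambda e: (e["epoch"], e["tx_rank"], e["tx_order"])):
        if e["op"] == "delete":
            table.discard(e["key"])
        else:
            table.add(e["key"])
    return table


def utc(day, hour, minute):
    return datetime(2024, 9, day, hour, minute, tzinfo=timezone.utc)  # hypothetical dates


midnight = utc(18, 0, 0)
mirror_snapshots = [
    {"committed_at": utc(17, 23, 30), "last_cdc_snapshot_id": 10, "rows": {(5, 6)}},
    # The next merge saw the join and the leave together, so channel 7 never appears.
    {"committed_at": utc(18, 0, 30), "last_cdc_snapshot_id": 11, "rows": {(5, 6)}},
]
cdc_events = [
    {"cdc_snapshot_id": 11, "key": (5, 7), "op": "create", "tx_time": utc(17, 23, 59),
     "epoch": 0, "tx_rank": 20, "tx_order": 0},
    {"cdc_snapshot_id": 11, "key": (5, 7), "op": "delete", "tx_time": utc(18, 0, 3),
     "epoch": 0, "tx_rank": 21, "tx_order": 0},
]
ds_partition = build_ds_partition(mirror_snapshots, cdc_events, midnight)
```

The resulting partition contains Lisa's membership in channel 7, because the leave event after midnight is filtered out even though it sits in the same CDC snapshot as the join.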
For downstream users, then, nothing changes except that they may notice the data arriving about a day earlier.
To guarantee the completeness of the data served to downstream users, that is, that all of a given day's data is contained in the daily DS partition, Slack uses a technique called "Binlog Watermarking".
The background to Binlog Watermarking is that, to be sure no records arrive late after the job has run, a deterministic signal was needed to show that all records for the day had been processed.
The MySQL Binlog imposes a global order on all transactions across all tables within a Shard, and Debezium processes these Binlog events sequentially.
This means the presence of a single record in Kafka can be used to infer that the records from every other table in that Shard have already been processed.
To make this work, you need to know in advance that a write will happen at a specific time (0:00 in this case). Then you can wait for that record to appear and infer that everything else has already been processed.
To implement this, Slack introduced a heartbeat every second. A row is written to a special heartbeat table in every Shard of every Vitess Keyspace, and these writes are included in the Binlog. In the slide example, the record of Lisa joining channel 7 is sandwiched between two of these heartbeat events.
Debezium processes these events sequentially and produces them to Kafka. They flow through the system and are inserted into the corresponding tables in the data lake. Meanwhile, Airflow orchestrates sensor tasks that continuously poll the heartbeat table, waiting for the UTC 0:00 heartbeat to arrive.
Once it arrives, it can be inferred that Debezium has already processed the records of every other table in that Shard. As a final step, the lag of the table (sink) connector is checked to confirm that the consumer has caught up to the UTC 0:00 records, and then the job can start with the guarantee that the data is available.
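The readiness check a sensor performs can be sketched as a pure function. `ready_for_daily_job`, its parameters, and the shard names are hypothetical; the sketch assumes the caller has already looked up the latest heartbeat per shard in the data lake and whether the sink connector has caught up.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def ready_for_daily_job(latest_heartbeat_by_shard: Dict[str, Optional[datetime]],
                        cutoff: datetime,
                        sink_caught_up: bool) -> bool:
    """True once every shard has a heartbeat at or after `cutoff` and the sink has no lag.

    Because the Binlog orders all writes within a shard, a heartbeat at or after
    the cutoff implies every earlier write in that shard was already processed.
    """
    if not latest_heartbeat_by_shard:
        return False  # no shards reported yet, so nothing can be inferred
    all_shards_past_cutoff = all(
        heartbeat is not None and heartbeat >= cutoff
        for heartbeat in latest_heartbeat_by_shard.values()
    )
    return all_shards_past_cutoff and sink_caught_up


cutoff = datetime(2024, 9, 18, 0, 0, 0, tzinfo=timezone.utc)  # hypothetical date
heartbeats = {
    "s1": datetime(2024, 9, 18, 0, 0, 1, tzinfo=timezone.utc),
    "s2": datetime(2024, 9, 17, 23, 59, 59, tzinfo=timezone.utc),  # still behind
}
can_start = ready_for_daily_job(heartbeats, cutoff, sink_caught_up=True)
```

A sensor would keep re-evaluating this until it returns True, which in this example happens once shard `s2` reports a heartbeat at or after midnight.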
Results and lessons learned
Compared with the previous backup-and-restore-based batch replication system, the new pipeline achieved a large cost reduction.
Replicating the messages table alone used to cost several thousand dollars a day, and this was cut to roughly 10% of that.
Another point worth noting is that the new pipeline has more knobs that can be tuned to requirements.
You can spend more to achieve low latency, or save more when low latency is not needed.
For example, if a product team runs an A/B test and wants to see the results quickly, the merge job can be run more frequently.
Messages, on the other hand, are used mainly for offline search index building, so running once a day is enough, which saves a lot of money.
If your organization spends a lot on batch data replication, this should have a significant effect on the bottom line.
On performance, because there is no longer any need to restore a backup and reprocess a table's entire history every day, the average time to update a data lake table dropped dramatically.
For a table of ordinary size, what used to take 12 to 48 hours on average now takes about 5 minutes.
The slide shows the slowest scenario: the actual ingestion of messages took 26.7 hours in the old system and a little over 1 hour 7 minutes in the new one. In other words, roughly a 26x speedup, or about 4% of the previous replication time.
There were two main lessons.
The first is about complexity.
The new system is clearly more complex than the previous batch data replication. The main reason is that the automatic self-correction you get with the batch approach is lost. In the batch world, every run reprocesses the complete table, so if something goes wrong you just rerun the job.
With CDC it is not that simple. If a row is missed or a row is incorrect, there is no easy recovery path. It took a long time to reach the current state, where the pipeline can run for months without diverging from upstream. If your organization is considering a CDC approach, investing in a robust auditing and data quality checking framework is strongly recommended.
The second lesson is about the orchestration burden of managing maintenance tasks for Iceberg tables.
Operating Iceberg requires running various maintenance tasks such as snapshot lifecycle management, compaction, and orphan file removal. None of them is difficult on its own, but they are an extra burden for table owners. There is literally more to orchestrate, and on a cognitive level there is more for other teams to think about, which can make them hesitant to adopt Iceberg.
The speaker described his own experience: when he first started building this, he tested copy-on-write on the channels_members table, merging once an hour. A table of a few terabytes quickly ballooned to about 650 terabytes of S3 storage. At that point he realized, "Oops, I needed to be running snapshot_expire", and was able to bring the table back to a normal size. This is a real risk for other teams working with Iceberg. Unless you understand what has to be done to maintain an Iceberg table, the data volume can easily balloon and you can run into other problems.
For this reason, Slack is discussing internally whether to build something like an "Iceberg management service". It could become a way to centrally manage the maintenance tasks for every Iceberg table in the data lake across all teams. It has not been implemented yet but is under consideration. If your organization is considering adopting Iceberg, think carefully about how you will manage all of the Iceberg maintenance tasks so that data users can adopt the technology easily.