Upgrade to Pro — share decks privately, control downloads, hide ads and more …

モンスターストライクを支えるデータ分析基盤と準リアルタイム集計

 モンスターストライクを支えるデータ分析基盤と準リアルタイム集計

AWS Summit 2018 Tokyo にて

Ojima Hikaru

June 01, 2018
Tweet

More Decks by Ojima Hikaru

Other Decks in Technology

Transcript

  1. #" • \ Z-/7 6.<[ @ojima-h • QCF? SWGL*

    &DTYP • KYH 5+X)= • KYHU@C9KYHAB>MAG325 FGJR • VD5 ;(8.:,'X %715KYHOVY • BIIYT5! • $EYNG40< 5
  2. XFLAG STUDIO • $%& % % • !%&" •

    #& • !%&" • !% • & • • FC • " #% • XFLAG PARK • XFLAG STORE SHIBUYA • etc… • Coming soon…
  3. • EMR • m4.2xlarge x 20 core

    • Master / Core • • Redshift • ds2.8xlarge x 3 (48TB)
  4. #/%,"+0( "$'& $#. +0(!

    EMR ! Hive -*)!
  5. Hive Metastore •EMR % … • $#"$!

    • Spark SQL Redshift Spectrum
  6. 3 4 Glue Data Catalog • +/,)2!(-$&2$"' • Hive Metastore

    • 02.2&2$ "2, Glue Data Catalog 3*% 1(DB #")% …4
  7. ORC • Hive "$& • ! •

    % $# • # • ACID transaction Complex Data Type $
  8. • Application Log •

    API Application Log (1TB/day)
  9. )&-/ • API Log ! API ,0+$(./#" 3 •

    ,0+$(./ 211 • ,0+$(./'%* • API Log )&-/
  10. Sort • API Log URI

    • ORC index • API
  11. Sort • ORC • API Log

    URI • INSERT OVERWRITE api_log SELECT … FROM … DISTRIBUTE BY RAND() SORT BY uri • PPD • hive.optimize.index.filter: true • hive.optimize.ppd: true • hive.optimize.ppd.storage: true
  12. • EMR Task 1

    Task 2 Task 3 Task 4
  13. • Task 1 Task 2 Task

    3 Task 4 • •
  14. '-,+/0*0.# • )(&"! !)(&% • ! )(&"$% •

    ! or " • Luigi, Airflow, Digdag
  15. BI$*( • ECS • !%'"*#&)

    • Docker image • Task & Service CloudFormation • ALB CloudWatch Logs
  16. SELECT … FROM (SELECT * FROM (SELECT user_id, game_id, stage_id

    FROM game_log WHERE CONCAT_WS('-',y,m,d) = '{{DATE}}') AS a JOIN (SELECT NVL(host_game_id, game_id) AS host_game_id, COUNT(*) AS players_num FROM game_log WHERE CONCAT_WS('-',y,m,d) = '{{DATE}}’ GROUP BY NVL(host_game_id, game_id)) AS b ON a.game_id = b.host_game_id WHERE players_num > 1) …
  17. )'9 (85 user_id 2;&6= ID game_id >

    *=570=)ID host_game_id 4:.2;&*,/3,/ game_id 4:.2;& 3,/#!-<2;& game_id … +=1= $%"
  18. #!1 "0- user_id *3 .5 ID game_id 6$5-/)5#ID host_game_id

    ,2'*3 $%(+%( game_id ,2'*3 +%(&4*3 game_id is_multi ,2'*3 TRUE
  19. SELECT … FROM (SELECT * FROM (SELECT user_id, game_id, stage_id

    FROM game_log WHERE CONCAT_WS('-',y,m,d) = '{{DATE}}') AS a JOIN (SELECT NVL(host_game_id, game_id) AS host_game_id, COUNT(*) AS players_num FROM game_log WHERE CONCAT_WS('-',y,m,d) = '{{DATE}}’ GROUP BY NVL(host_game_id, game_id)) AS b ON a.game_id = b.host_game_id WHERE players_num > 1) … SELECT … FROM game_log WHERE CONCAT_WS('-',y,m,d) = '{{DATE}}’ AND is_multi …
  20. $"#% • !$ •

    ⇒ Dimensional Modeling
  21. Dimensional Modeling • Fact Dimension (= Star Schema) •

    Fact … "'591! %8*:2 • Dimension … "'591!% • 0:."#& • (76#47)" • ,-/6" ⇒ +:3-" • Dimension & % $& Fact Dimension Dimension Dimension Dimension
  22. %.*- &.# Dimensional Modeling •

    ").'$+ • • ! ,(.'-
  23. )0& #-+*/0 )0&' ("1Data-QA2 • )0&%0$

    • Login API vs. users.last_login_time • • NULL!., • COUNT(*) vs. COUNT(col) UnitTest
  24. 3.-*/ ! EMR 2)' → Hive metastore " Hive

    )' → # A<;2) → 9DCBGI $ 7H9@6,(?I:1+540' → BI=IF % 87E) → >ICF2 & ?I:2 → ?I:2
  25. ;A01B04=*-! ,<A9BA& $%# >(?1*: & " 1.

    2. /B3)@AGET,<A9BA 3. ;A04BINGO 4. 6'*4>B. 7=A5+28
  26. GET : 25,000 records /

    s DynamoDB: 1RCU / 1WCU
  27. 15% EH?@BINGO • https://www.monster-strike.com/promotion/winter2017/bingo.html • FI>I BH= :' • 4BH=<IA:"

    ' • 5/;CH@53& 3BH= : ' • BH=:!/2(8FI>I:$/& 3)60' • BH=DIG+.92*6&3,8#7-
  28. $ 25-/BINGO • Redshift ) • 05+$4,.*)SQL •

    05+163('%# $ BINGO ) • 1~10 &" )!
  29. >1?72=(#$ • (-"$ * • AWS )4@;60+/ .&%(%. •

    S3 , Kinesis 0 .&% 569=(0.&' >1?72=01:<83(.&%!
  30. !