Upgrade to Pro — share decks privately, control downloads, hide ads and more …

モンスターストライクを支えるデータ分析基盤と準リアルタイム集計

 モンスターストライクを支えるデータ分析基盤と準リアルタイム集計

AWS Summit 2018 Tokyo にて

07f2123825aa4d86d69ba6ad18f305e1?s=128

Ojima Hikaru

June 01, 2018
Tweet

Transcript

  1.       

  2. #" • \  Z-/7 6.<[ @ojima-h • QCF? SWGL*

    &DTYP • KYH 5+X)= • KYHU@C9KYHAB>MAG325 FGJR • VD5 ;(8.:,'X %715KYHOVY • BIIYT5! • $EYNG40< 5
  3.    4,500 

  4. XFLAG STUDIO • $%& % % • !%&"  •

     #& • !%&"    • !%  • & •    • FC • " #% • XFLAG PARK • XFLAG STORE SHIBUYA • etc… • Coming soon…
  5.     200+

  6.   1 PB @s3 2 TB / day •

     • DB   •  
  7.  • EMR • m4.2xlarge x 20 core  

    • Master / Core  •    • Redshift • ds2.8xlarge x 3 (48TB)
  8.  KPI  !"! "     

  9. 

  10.  ~ 2014 11 DAUKPI !"    

  11.   

  12.  EMR    Hive    

     
  13.   #/%,"+0( "$'&   $#.  +0(! 

    EMR !    Hive    -*)! 
  14.  EMR   Hive     

  15.  EMR  16 • EMR  • Hive Metastore

     
  16.  EMR   → Hive Metastore 

  17.  Hive Metastore  18 • Hive Metastore RDS 

    • Metastore server 
  18.  Hive Metastore  •EMR % … • $#"$! 

      • Spark SQL  Redshift Spectrum 
  19. 3 4 Glue Data Catalog • +/,)2!(-$&2$"' • Hive Metastore

     • 02.2&2$ "2,  Glue Data Catalog   3*% 1(DB  #")% …4
  20.  EMR   Hive     

  21.   Hive  Why Hive? • SQL • Hive

    Metastore •  Hive   
  22.  Hive   STEP1. ORC   STEP2. 

     STEP3. 
  23.  Hive   STEP1. ORC   STEP2. 

     STEP3. 
  24. ORC • Hive  "$& • !  • 

    % $# • # • ACID transaction  Complex Data Type $
  25. ORC •  ORC   •   

  26.  • TEZ Engine • Cost Based Optimization • Vectorization

    
  27.  Hive   STEP1. ORC   STEP2. 

     STEP3. 
  28.   •  Application Log    •

     API   Application Log  (1TB/day)
  29.    Dynamic Partition • Dynamic Partition • API

    Log / Error Log / Custom Log  
  30.  • API Log   API   

  31. )&-/ • API Log !  API ,0+$(./#" 3 •

    ,0+$(./ 211 • ,0+$(./'%* • API Log )&-/ 
  32.    Sort • API Log  URI 

     • ORC  index  •   API   
  33.    Sort • ORC  • API Log

     URI  • INSERT OVERWRITE api_log SELECT … FROM … DISTRIBUTE BY RAND() SORT BY uri • PPD  • hive.optimize.index.filter: true • hive.optimize.ppd: true • hive.optimize.ppd.storage: true
  34.  Hive   STEP1. ORC  STEP2.  

    STEP3. 
  35.  •   EMR    Task 1

    Task 2 Task 3 Task 4
  36.  •    Task 1 Task 2 Task

    3 Task 4  •   • 
  37. Hive  15 → 

  38.  EMR   Hive     

  39.  %$#   Hive & •  • "!

     • "!  
  40.     → 

  41. '-,+/0*0.# • )(&"! !)(&% •  !  )(&"$% •

    ! or " • Luigi, Airflow, Digdag  
  42.   

  43.  ,54389)  • ',54389197) • .-+' &%"!$#20/& • 20/'6*26

    &%( !&%"!
  44.  EMR  Hive    

  45.   !  !

  46.        

  47.  $"!%  • #!%   SSH  

  48.      → BI

  49.  BI  • Zeppelin • metabase • re:dash •

     
  50.   BI$*(  • ECS  • !%'"*#&) 

    • Docker image   • Task & Service  CloudFormation • ALB  CloudWatch Logs 
  51.   !    !

  52. SELECT … FROM (SELECT * FROM (SELECT user_id, game_id, stage_id

    FROM game_log WHERE CONCAT_WS('-',y,m,d) = '{{DATE}}') AS a JOIN (SELECT NVL(host_game_id, game_id) AS host_game_id, COUNT(*) AS players_num FROM game_log WHERE CONCAT_WS('-',y,m,d) = '{{DATE}}’ GROUP BY NVL(host_game_id, game_id)) AS b ON a.game_id = b.host_game_id WHERE players_num > 1) …    
  53.   )'9  (85 user_id 2;&6= ID game_id >

    *=570=)ID host_game_id 4:.2;&*,/3,/ game_id 4:.2;& 3,/#!-<2;&   game_id … +=1=  $%" 
  54.    → 

  55.  #!1 "0- user_id *3 .5 ID game_id 6$5-/)5#ID host_game_id

    ,2'*3 $%(+%( game_id ,2'*3 +%(&4*3   game_id is_multi ,2'*3  TRUE
  56. SELECT … FROM (SELECT * FROM (SELECT user_id, game_id, stage_id

    FROM game_log WHERE CONCAT_WS('-',y,m,d) = '{{DATE}}') AS a JOIN (SELECT NVL(host_game_id, game_id) AS host_game_id, COUNT(*) AS players_num FROM game_log WHERE CONCAT_WS('-',y,m,d) = '{{DATE}}’ GROUP BY NVL(host_game_id, game_id)) AS b ON a.game_id = b.host_game_id WHERE players_num > 1) … SELECT … FROM game_log WHERE CONCAT_WS('-',y,m,d) = '{{DATE}}’ AND is_multi …
  57.  $"#% • !$   •   

    ⇒ Dimensional Modeling
  58. Dimensional Modeling • Fact  Dimension (= Star Schema) •

    Fact … "'591! %8*:2 • Dimension … "'591!%  • 0:."#& • (76#47)" • ,-/6" ⇒ +:3-" • Dimension & % $& Fact Dimension Dimension Dimension Dimension
  59.    %.*-  &.# Dimensional Modeling  •

     ").'$+ •    • ! ,(.'-
  60.     !  !

  61.  $ •$ • " #  • SQL 

    ! • Hive   • 
  62.     →  

  63.  )0&  #-+*/0  )0&' ("1Data-QA2 • )0&%0$ 

    • Login API vs. users.last_login_time •     • NULL!., • COUNT(*) vs. COUNT(col) UnitTest 
  64. 3.-*/ ! EMR 2)' → Hive metastore  " Hive

    )' →  # A<;2)  → 9DCBGI $ 7H9@6,(?I:1+540' → BI=IF  % 87E) → >ICF2  & ?I:2 → ?I:2 
  65. 

  66.   KPI    

  67.   KPI    

  68.   

  69. ;A01B04=*-! ,<A9BA& $%#  >(?1*: & " 1.  

    2. /B3)@AGET,<A9BA 3. ;A04BINGO 4. 6'*4>B. 7=A5+28 
  70. None
  71. ,4$  • https://www.monster-strike.com/promotion/12shi/#yosou • YouTube 210A3 4 9)+79 

    • 3./% 4#9& '*-?@=@0!9 (& • ' )87504"%6:249<;>3 
  72.    25,000 records / s Kinesis 1 shard

    EMR c3.2xlarge x 2
  73. None
  74.  #8)67GET /7,87 • https://www.monster-strike.com/promotion/kiwami2017/unkyoku_ice.html • "8. /2!( #8)67&!38. !8-7+5'7*

    • 08$8+5'7*!8-734(. ,58 %17 
  75.    GET   : 25,000 records /

    s DynamoDB: 1RCU / 1WCU
  76. None
  77. 15% EH?@BINGO • https://www.monster-strike.com/promotion/winter2017/bingo.html • FI>I BH= :' • 4BH=<IA:"

    ' • 5/;CH@53& 3BH= : ' •  BH=:!/2(8FI>I:$/& 3)60' • BH=DIG+.92*6&3,8#7-
  78. $ 25-/BINGO • Redshift )  • 05+$4,.*)SQL  •

    05+163('%# $ BINGO ) • 1~10 &" )! 
  79. None
  80. ' ;-/9DG2 <CF:08= • ;-/9DG2(7@?3GA • H '&/>F9BG6G#74., • 'CF1F2,

    5/9& •  'E2,JOIN"+! )Lambda * Flink % $#(
  81.    •  Kinesis Firehose Redshift  •

    Redshift  JOIN 
  82. >1?72=(#$ • (-"$ *  • AWS )4@;60+/ .&%(%. •

    S3 , Kinesis 0 .&% 569=(0.&' >1?72=01:<83(.&%!
  83. 

  84.  • %#    • %"!$ • ML

    / AI  
  85. !