The Evolution of Data Analysis Platforms and How to Build a Data Lake

Battle Conference U30 #2018

Ojima Hikaru

April 21, 2018

Transcript

  1. … • … XFLAG … • … • … • … • … • GitHub: ojima-h
  2. Data Lake … • … • … the schema is applied when the data is read (Schema on Read) • Data Lake … DWH …
  3. Hive … • … Hadoop … • SQL over data stored on S3 or HDFS … • … ORC …
  4. Hive Metastore … • … data on S3/HDFS … SQL … • … • … • … • …
  5. Hive Metastore … • EMR … Hive Metastore … • … • EMR … 30 …
  6. Hive Metastore … • Hive Metastore … MySQL … • Hive Metastore (HCatalog) server … • EMR … 5 …
  7. Hive Metastore … • … • … • … ORC … • …
  8. Hive Metastore … • Hive Metastore … S3 … • S3 …
  9. Hive Metastore … • … • … • … Hive DB … • Hive … S3 … • Hive …
    Hive object ⇒ S3 location:
    • Database: s3://BUCKET/warehouse/SERVICE.db/
    • Table: s3://BUCKET/warehouse/SERVICE.db/TABLE/
    • Partition: s3://BUCKET/warehouse/SERVICE.db/TABLE/y=YYYY/m=MM/d=DD/
  10. Hive Metastore …
    1. … Hive Metastore …
    2. …
    3. Hive Metastore …
  11. Hive Metastore …
    1. … Hive Metastore …
    2. …
    3. Hive Metastore …
    4. …
  12. Hive Metastore …
    1. … Hive Metastore …
    2. …
    3. Hive Metastore …
    4. …
  13. Hive Metastore … • Hive … Redshift … • Redshift … COPY … csv+gzip • Hive … ORC • Redshift: csv+gzip / Hive: ORC … ⇒ Redshift Spectrum
  14. Redshift Spectrum … • Lets Redshift query data on S3 … • … • Hive Metastore … Hive …
    -- IAM_ROLE is required; all quoted values are placeholders
    CREATE EXTERNAL SCHEMA schema_name
    FROM HIVE METASTORE
    DATABASE 'database_name'
    URI 'hive_metastore_uri'
    IAM_ROLE 'iam_role_arn';
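  15. Hive Metastore … • Redshift … Hive …
    -- <Redshift table> and <Hive table> are placeholders; the Hive table is
    -- referenced through the external schema (schema_name.table)
    INSERT INTO <Redshift table>
    SELECT … FROM <Hive table>
    WHERE y=YYYY AND m=MM AND d=DD;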
  16. Hive Metastore … • Redshift Spectrum … Hive Metastore … • Spark SQL • Presto • Athena • Flink
  17. … • Hive Metastore … • Hive Metastore … Data Lake … Hive Metastore … Data Lake …
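Slide 2 of the transcript characterizes the data lake by "Schema on Read": data is stored as-is and a schema is applied only when it is read. The following is a minimal sketch of that idea, assuming raw JSON-lines events; the event fields and `SCHEMA` are hypothetical illustrations, not taken from the talk.

```python
import json

# Raw events are kept exactly as they arrived: nothing is validated on write.
RAW_LINES = [
    '{"user_id": "42", "event": "login", "ts": "2018-04-21T10:00:00"}',
    '{"user_id": "7", "event": "purchase", "amount": "1200"}',
]

# Hypothetical reader-side schema: column name -> function that casts the raw value.
SCHEMA = {"user_id": int, "event": str, "amount": int}


def read_with_schema(lines, schema):
    """Apply `schema` while reading raw JSON lines (schema on read).

    Fields not named in the schema are ignored; schema columns missing from a
    record come back as None; present values are cast to the declared type.
    """
    rows = []
    for line in lines:
        record = json.loads(line)
        row = {}
        for column, cast in schema.items():
            value = record.get(column)
            row[column] = None if value is None else cast(value)
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in read_with_schema(RAW_LINES, SCHEMA):
        print(row)
```

Changing `SCHEMA` (for example, adding `ts`) changes what readers see without rewriting any stored data. This is the trade-off against a DWH, where the schema is fixed when data is loaded.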
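Slide 4 of the transcript presents the Hive Metastore as what connects data on S3/HDFS to SQL. A Hive Metastore keeps per-table metadata (columns, storage location, file format) that SQL engines look up before reading files. The sketch below is a toy in-memory model of that lookup, offered only to show the data involved. `ToyMetastore`, `TableMeta` and the `service.events` table are hypothetical, and this is not the real Hive Metastore Thrift API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableMeta:
    """Simplified per-table metadata, as a metastore might record it (hypothetical model)."""
    columns: tuple      # (name, type) pairs, e.g. (("user_id", "bigint"),)
    location: str       # URI of the directory holding the data files (S3 or HDFS)
    file_format: str    # e.g. "ORC"


class ToyMetastore:
    """In-memory stand-in for a metastore; it does not implement the Hive Thrift API."""

    def __init__(self):
        self._tables = {}

    def create_table(self, database, table, meta):
        # Tables are keyed by their qualified name, "database.table".
        self._tables[f"{database}.{table}"] = meta

    def lookup(self, qualified_name):
        """What a SQL engine needs before scanning: schema, location and format."""
        try:
            return self._tables[qualified_name]
        except KeyError:
            raise LookupError(f"table not found: {qualified_name}") from None


if __name__ == "__main__":
    ms = ToyMetastore()
    ms.create_table(
        "service", "events",
        TableMeta(
            columns=(("user_id", "bigint"), ("event", "string")),
            location="s3://BUCKET/warehouse/service.db/events/",
            file_format="ORC",
        ),
    )
    meta = ms.lookup("service.events")
    print(meta.location, meta.file_format)
```

Because the files themselves carry no table definition, every engine that shares the same catalog sees the same tables.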
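Slide 9 of the transcript maps Hive databases, tables and partitions onto S3 prefixes under `s3://BUCKET/warehouse/`. The following is a small sketch of that naming convention as Python helpers. It assumes `MM` and `DD` are zero-padded two-digit values and that the partition columns are exactly `y`, `m`, `d`. The function names are hypothetical.

```python
import re

WAREHOUSE = "s3://BUCKET/warehouse"  # bucket placeholder as written on the slide


def database_location(service):
    return f"{WAREHOUSE}/{service}.db/"


def table_location(service, table):
    return f"{database_location(service)}{table}/"


def partition_location(service, table, y, m, d):
    # Hive-style key=value directories; month and day zero-padded (assumption)
    return f"{table_location(service, table)}y={y:04d}/m={m:02d}/d={d:02d}/"


_PARTITION_RE = re.compile(r"/y=(\d{4})/m=(\d{2})/d=(\d{2})/")


def parse_partition(path):
    """Recover (y, m, d) from a partition path, or None if it has no y/m/d segment."""
    match = _PARTITION_RE.search(path)
    return tuple(int(g) for g in match.groups()) if match else None


if __name__ == "__main__":
    path = partition_location("service", "events", 2018, 4, 21)
    print(path, parse_partition(path))
```

Keeping the layout derivable from the database, table and partition names means a partition's location can be computed rather than looked up, which makes registering new partitions mechanical.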
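Slide 13 of the transcript contrasts the csv+gzip files loaded with Redshift's `COPY` against Hive's ORC files. The following is a minimal sketch of producing and checking a csv+gzip payload with Python's standard library. The function names are hypothetical, and the S3 upload and the `COPY` options themselves are out of scope.

```python
import csv
import gzip
import io


def to_csv_gzip(rows):
    """Serialize rows to CSV and gzip-compress the result (in memory)."""
    text = io.StringIO(newline="")
    csv.writer(text).writerows(rows)
    return gzip.compress(text.getvalue().encode("utf-8"))


def from_csv_gzip(blob):
    """Inverse of to_csv_gzip, useful for checking a file before it is loaded."""
    text = gzip.decompress(blob).decode("utf-8")
    return list(csv.reader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    blob = to_csv_gzip([["42", "login"], ["7", "purchase, with comma"]])
    print(len(blob), from_csv_gzip(blob))
```

For large exports, `gzip.open(path, "wt", encoding="utf-8", newline="")` streams to a file instead of holding everything in memory. Either way, the data ends up as a second copy in a different format from Hive's ORC files.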
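The `INSERT INTO … SELECT … WHERE y=YYYY AND m=MM AND d=DD` statement on slide 15 of the transcript filters on the partition columns. This means only the matching partition directory has to be read. The following is a minimal sketch of that partition pruning. It assumes the partitions are known as a mapping from integer `(y, m, d)` tuples to locations, as a metastore would record them. It models the idea only, not Redshift Spectrum's actual planner, and `prune_partitions` is a hypothetical name.

```python
def prune_partitions(partitions, y=None, m=None, d=None):
    """Return locations of partitions matching the given y/m/d values.

    `partitions` maps (y, m, d) tuples to storage locations. A value left as
    None matches anything, so prune_partitions(p, y=2018, m=4) selects a month.
    """
    wanted = (y, m, d)
    return [
        location
        for values, location in sorted(partitions.items())
        if all(w is None or w == v for w, v in zip(wanted, values))
    ]


if __name__ == "__main__":
    base = "s3://BUCKET/warehouse/SERVICE.db/TABLE"
    partitions = {
        (2018, 4, 20): f"{base}/y=2018/m=04/d=20/",
        (2018, 4, 21): f"{base}/y=2018/m=04/d=21/",
        (2018, 5, 1): f"{base}/y=2018/m=05/d=01/",
    }
    # WHERE y=2018 AND m=4 AND d=21: only one directory needs to be read.
    print(prune_partitions(partitions, y=2018, m=4, d=21))
```

Without a predicate on the partition columns every partition qualifies, so a daily load that omits the `WHERE` clause would rescan the whole table.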