The Evolution of Data Analytics Platforms and How to Build a Data Lake
Ojima Hikaru
April 21, 2018
Battle Conference U30 #2018
Transcript
XFLAG
GitHub: ojima-h 2
4 DAU, KPI
5
6 • 2TB/day 30 → 1000
7 • 5 → 100
8 S3
9 S3
10 Redshift
11
12 Data Lake Architecture
Data Lake • Schema on Read • Data Lake, DWH 13
Data Lake 14 Hive Metastore
Hive Metastore 15
Hive • Hadoop • SQL, S3 / HDFS • ORC 16
Hive Metastore • S3/HDFS, SQL 17
Hive Metastore • EMR, Hive Metastore • EMR 30 18
Hive Metastore • Hive Metastore MySQL
• Hive Metastore (HCatalog) server • EMR 5 19
Hive Metastore S3 20
Hive Metastore • ORC 21
Hive Metastore • Hive Metastore, S3 22
Hive Metastore • Hive DB • Hive, S3 23
Hive        S3
Database    s3://BUCKET/warehouse/SERVICE.db/
Table       s3://BUCKET/warehouse/SERVICE.db/TABLE/
Partition   s3://BUCKET/warehouse/SERVICE.db/TABLE/y=YYYY/m=MM/d=DD/
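The Database / Table / Partition to S3 mapping on this slide can be sketched as a few path-building helpers. This is only an illustration of the naming rule, not the speaker's tooling: the function names are hypothetical, and zero-padded month and day values are assumed from the MM and DD placeholders.

```python
from datetime import date


def database_location(bucket: str, service: str) -> str:
    """S3 prefix for a Hive database (one database per service)."""
    return f"s3://{bucket}/warehouse/{service}.db/"


def table_location(bucket: str, service: str, table: str) -> str:
    """S3 prefix for a Hive table inside the service database."""
    return f"{database_location(bucket, service)}{table}/"


def partition_location(bucket: str, service: str, table: str, day: date) -> str:
    """S3 prefix for one daily partition, using y=/m=/d= partition keys.

    Assumption: month and day are zero-padded to two digits (MM, DD).
    """
    return (
        f"{table_location(bucket, service, table)}"
        f"y={day.year:04d}/m={day.month:02d}/d={day.day:02d}/"
    )
```

For example, partition_location("example-bucket", "game", "login", date(2018, 4, 21)) yields a prefix under s3://example-bucket/warehouse/game.db/login/ for that day; the bucket, service, and table names here are made up.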
Hive Metastore 24
Hive Metastore 1. Hive Metastore
25
Hive Metastore 1. Hive Metastore
2. 26
Hive Metastore 1. Hive Metastore
2. 3. Hive Metastore 27
Hive Metastore 1. Hive Metastore
2. 3. Hive Metastore 4. 28
Hive Metastore 1. Hive Metastore
2. 3. Hive Metastore 4. 29
Hive Metastore 30
Hive Metastore • Hive, Redshift • Redshift COPY: csv+gzip • Hive: ORC • Redshift csv+gzip, Hive ORC ⇒ Redshift Spectrum 31
Redshift Spectrum • Redshift, S3 • Hive Metastore, Hive 32
CREATE EXTERNAL SCHEMA schema_name
FROM HIVE METASTORE
DATABASE 'database_name'
URI 'hive_metastore_uri';
Hive Metastore • Redshift, Hive 33
INSERT INTO REDSHIFT_TABLE
SELECT … FROM HIVE_TABLE
WHERE y=YYYY AND m=MM AND d=DD;
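A daily load following the INSERT ... SELECT pattern on this slide could be rendered per date as below. This is a rough sketch under assumptions: the function name, the explicit column list, and the table names are hypothetical. The slide elides the SELECT list, and it does not say whether the y, m, d partition columns are numeric or string, so the unquoted literals simply mirror the slide and would need quoting if the columns are declared as strings.

```python
from datetime import date
from typing import Sequence


def daily_load_sql(
    redshift_table: str,
    hive_table: str,
    columns: Sequence[str],
    day: date,
) -> str:
    """Render an INSERT ... SELECT that copies one daily partition.

    hive_table is expected to be reachable from Redshift through an
    external schema (hypothetical name supplied by the caller).
    """
    cols = ", ".join(columns)
    return (
        f"INSERT INTO {redshift_table} ({cols})\n"
        f"SELECT {cols}\n"
        f"FROM {hive_table}\n"
        # Partition filter in the same y/m/d form as the slide.
        f"WHERE y={day.year:04d} AND m={day.month:02d} AND d={day.day:02d};"
    )
```

Filtering on the partition keys keeps each run scoped to a single day's partition, which matches the y=YYYY/m=MM/d=DD layout used for the S3 paths.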
Hive Metastore • Redshift Spectrum
Hive Metastore • Spark SQL • Presto • Athena • Flink 34
Hive Metastore Hive Metastore S3 Hive, Redshift Spectrum, Spark 35
36
• Hive Metastore • Hive Metastore, Data Lake • Hive Metastore, Data Lake 37