Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
データ分析基盤の変遷とデータレイクの作り方
Search
Ojima Hikaru
April 21, 2018
Technology
1
1.9k
データ分析基盤の変遷とデータレイクの作り方
Battle Conference U30 #2018
Ojima Hikaru
April 21, 2018
Tweet
Share
More Decks by Ojima Hikaru
See All by Ojima Hikaru
Podのオートスケーリングに苦戦し続けている話
ojima_h
1
310
ディメンショナルモデリングのすすめ
ojima_h
7
4.7k
モンスターストライクを支えるデータ分析基盤と準リアルタイム集計
ojima_h
6
5.7k
Other Decks in Technology
See All in Technology
Delta airlines®️ USA Contact Numbers: Complete 2025 Support Guide
airtravelguide
0
340
SmartNewsにおける 1000+ノード規模 K8s基盤 でのコスト最適化 – Spot・Gravitonの大規模導入への挑戦
vsanna2
0
140
Getting to Know Your Legacy (System) with AI-Driven Software Archeology (WeAreDevelopers World Congress 2025)
feststelltaste
1
130
KubeCon + CloudNativeCon Japan 2025 Recap
ren510dev
1
390
20250707-AI活用の個人差を埋めるチームづくり
shnjtk
4
3.9k
マーケットプレイス版Oracle WebCenter Content For OCI
oracle4engineer
PRO
3
960
React開発にStorybookとCopilotを導入して、爆速でUIを編集・確認する方法
yu_kod
1
280
How Do I Contact HP Printer Support? [Full 2025 Guide for U.S. Businesses]
harrry1211
0
120
PO初心者が考えた ”POらしさ”
nb_rady
0
210
Glacierだからってコストあきらめてない? / JAWS Meet Glacier Cost
taishin
1
160
AWS認定を取る中で感じたこと
siromi
1
190
赤煉瓦倉庫勉強会「Databricksを選んだ理由と、絶賛真っ只中のデータ基盤移行体験記」
ivry_presentationmaterials
2
370
Featured
See All Featured
Thoughts on Productivity
jonyablonski
69
4.7k
Music & Morning Musume
bryan
46
6.6k
The Pragmatic Product Professional
lauravandoore
35
6.7k
How To Stay Up To Date on Web Technology
chriscoyier
790
250k
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
34
5.9k
Chrome DevTools: State of the Union 2024 - Debugging React & Beyond
addyosmani
7
740
Automating Front-end Workflow
addyosmani
1370
200k
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
3.9k
Building Better People: How to give real-time feedback that sticks.
wjessup
367
19k
Principles of Awesome APIs and How to Build Them.
keavy
126
17k
Java REST API Framework Comparison - PWX 2021
mraible
31
8.7k
Practical Orchestrator
shlominoach
189
11k
Transcript
L FG A
• S')1 0(6T • L>A9 XFLAG CDB=
!?NRK • GRD /%Q$7 • GRDO:>3GRD;<8H;C-,/ ACFM • P?/5#2(4&"Q 1+/GRDJPR • BIERN/ • @RIC. *6 / • GitHub: ojima-h 2
4 DAUKPI !
5
6 • • 2TB/day
30 → 1000
7 • 5
→ 100
− 8 S3
− 9 S3
− 10 Redshift
− 11
12 Data Lake Architecture
Data Lake " • -4,&$#!-4,+.' • -4,&% "%,(13*+)40&% !
(Schema on Read) • Data Lake -4,& DWH 24/$ $% 13
Data Lake 14 Hive Metastore
Hive Metastore 15
Hive " • Hadoop%(47-:.69!; • SQL ,*7&$S3 # HDFS !1:/
#1:/ & • ORC !3')83+:502& 16
Hive Metastore • S3/HDFS * "-SQL /1,&(.&0 (.&%)! •
,&(.& • * "- • * "-*#.+') • (.&%$.+ • 17
Hive Metastore • EMR ! Hive Metastore
! • • EMR 30 18
Hive Metastore • Hive Metastore MySQL
• Hive Metastore (HCatalog) server • EMR 5 19
Hive Metastore S3 20
Hive Metastore • ' • '"%
• 'ORC • '!&' ' !'#$$ 21
Hive Metastore • Hive Metastore S3 "
S3" !" 22
Hive Metastore * • "+$%- :>:>(*+ • 8C6*/,# •
3C;4' Hive DB / • Hive ).!% S3&*8C6/ • Hive &.( 8C6)-*@C@/ 23 3C;4 D=A49B<019?C2BBE 8C6579 8C6 Hive Database Table Partition S3 s3://BUCKET/warehouse/SERVICE.db/ s3://BUCKET/warehouse/SERVICE.db/TABLE/ s3://BUCKET/warehouse/SERVICE.db/TABLE/y=YYYY/m=MM/d=DD/
Hive Metastore • %)" &'&'%)" • &$#
! ( 24
Hive Metastore 1. Hive Metastore
25
Hive Metastore 1. Hive Metastore
2. 26
Hive Metastore 1. Hive Metastore
2. 3. Hive Metastore 27
Hive Metastore 1. Hive Metastore
2. 3. Hive Metastore 4. 28
Hive Metastore ! 1. ),(! $ Hive Metastore # 2.
),($'*, 3. Hive Metastore ! $ 4. ),($ &%+ $ "),($ 29
Hive Metastore 30
Hive Metastore • Hive Redshift "%!$%# • Redshift
COPY "%! csv+gzip • Hive "%! ORC • Redshift csv+gzip Hive ORC ⇒ Redshift Spectrum 31
Redshift Spectrum • Redshift S3(#$+ &%*" • ',)+
Hive Metastore ! Hive ',)+" 32 CREATE EXTERNAL SCHEMA schema_name FROM HIVE METASTORE DATABASE 'database_name’ URI 'hive_metastore_uri’;
Hive Metastore • Redshift Hive 33 INSERT
INTO ‘Redshift ’ SELECT … FROM ‘Hive ’ WHERE y=YYYY AND m=MM AND d=DD;
Hive Metastore • Redshift Spectrum
Hive Metastore • Spark SQL • Presto • Athena • Flink 34
Hive Metastore Hive Metastore S3 Hive,
Redshift Spectrum , Spark 35
36
($) • Hive Metastore '25103-$251.4/4& • Hive Metastore , $"
Data Lake , !$# 251&*251&%+$#! Hive Metastore , +$# Data Lake , "$#(!6 37
None