Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
データ分析基盤の変遷とデータレイクの作り方
Search
Ojima Hikaru
April 21, 2018
Technology
1
1.8k
データ分析基盤の変遷とデータレイクの作り方
Battle Conference U30 #2018
Ojima Hikaru
April 21, 2018
Tweet
Share
More Decks by Ojima Hikaru
See All by Ojima Hikaru
Podのオートスケーリングに苦戦し続けている話
ojima_h
1
270
ディメンショナルモデリングのすすめ
ojima_h
7
4.5k
モンスターストライクを支えるデータ分析基盤と準リアルタイム集計
ojima_h
6
5.6k
Other Decks in Technology
See All in Technology
新しいスケーリング則と学習理論
taiji_suzuki
9
3.8k
Zero Data Loss Autonomous Recovery Service サービス概要
oracle4engineer
PRO
1
5k
Evolving Architecture
rainerhahnekamp
3
240
30分でわかる「リスクから学ぶKubernetesコンテナセキュリティ」/30min-k8s-container-sec
mochizuki875
3
400
OCI技術資料 : ファイル・ストレージ 概要
ocise
3
12k
東京Ruby会議12 Ruby と Rust と私 / Tokyo RubyKaigi 12 Ruby, Rust and me
eagletmt
1
270
12 Days of OpenAIから読み解く、生成AI 2025年のトレンド
shunsukeono_am
0
1.1k
Fearsome File Formats
ange
0
580
Azureの開発で辛いところ
re3turn
0
220
商品レコメンドでのexplicit negative feedbackの活用
alpicola
1
210
カップ麺の待ち時間(3分)でわかるPartyRockアップデート
ryutakondo
0
110
#TRG24 / David Cuartielles / Post Open Source
tarugoconf
0
500
Featured
See All Featured
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
28
4.5k
It's Worth the Effort
3n
183
28k
YesSQL, Process and Tooling at Scale
rocio
170
14k
Product Roadmaps are Hard
iamctodd
PRO
50
11k
How to train your dragon (web standard)
notwaldorf
89
5.8k
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
160
15k
A better future with KSS
kneath
238
17k
Automating Front-end Workflow
addyosmani
1366
200k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
330
21k
Refactoring Trust on Your Teams (GOTO; Chicago 2020)
rmw
33
2.7k
[RailsConf 2023] Rails as a piece of cake
palkan
53
5.1k
The Web Performance Landscape in 2024 [PerfNow 2024]
tammyeverts
3
350
Transcript
L FG A
• S')1 0(6T • L>A9 XFLAG CDB=
!?NRK • GRD /%Q$7 • GRDO:>3GRD;<8H;C-,/ ACFM • P?/5#2(4&"Q 1+/GRDJPR • BIERN/ • @RIC. *6 / • GitHub: ojima-h 2
4 DAUKPI !
5
6 • • 2TB/day
30 → 1000
7 • 5
→ 100
− 8 S3
− 9 S3
− 10 Redshift
− 11
12 Data Lake Architecture
Data Lake " • -4,&$#!-4,+.' • -4,&% "%,(13*+)40&% !
(Schema on Read) • Data Lake -4,& DWH 24/$ $% 13
Data Lake 14 Hive Metastore
Hive Metastore 15
Hive " • Hadoop%(47-:.69!; • SQL ,*7&$S3 # HDFS !1:/
#1:/ & • ORC !3')83+:502& 16
Hive Metastore • S3/HDFS * "-SQL /1,&(.&0 (.&%)! •
,&(.& • * "- • * "-*#.+') • (.&%$.+ • 17
Hive Metastore • EMR ! Hive Metastore
! • • EMR 30 18
Hive Metastore • Hive Metastore MySQL
• Hive Metastore (HCatalog) server • EMR 5 19
Hive Metastore S3 20
Hive Metastore • ' • '"%
• 'ORC • '!&' ' !'#$$ 21
Hive Metastore • Hive Metastore S3 "
S3" !" 22
Hive Metastore * • "+$%- :>:>(*+ • 8C6*/,# •
3C;4' Hive DB / • Hive ).!% S3&*8C6/ • Hive &.( 8C6)-*@C@/ 23 3C;4 D=A49B<019?C2BBE 8C6579 8C6 Hive Database Table Partition S3 s3://BUCKET/warehouse/SERVICE.db/ s3://BUCKET/warehouse/SERVICE.db/TABLE/ s3://BUCKET/warehouse/SERVICE.db/TABLE/y=YYYY/m=MM/d=DD/
Hive Metastore • %)" &'&'%)" • &$#
! ( 24
Hive Metastore 1. Hive Metastore
25
Hive Metastore 1. Hive Metastore
2. 26
Hive Metastore 1. Hive Metastore
2. 3. Hive Metastore 27
Hive Metastore 1. Hive Metastore
2. 3. Hive Metastore 4. 28
Hive Metastore ! 1. ),(! $ Hive Metastore # 2.
),($'*, 3. Hive Metastore ! $ 4. ),($ &%+ $ "),($ 29
Hive Metastore 30
Hive Metastore • Hive Redshift "%!$%# • Redshift
COPY "%! csv+gzip • Hive "%! ORC • Redshift csv+gzip Hive ORC ⇒ Redshift Spectrum 31
Redshift Spectrum • Redshift S3(#$+ &%*" • ',)+
Hive Metastore ! Hive ',)+" 32 CREATE EXTERNAL SCHEMA schema_name FROM HIVE METASTORE DATABASE 'database_name’ URI 'hive_metastore_uri’;
Hive Metastore • Redshift Hive 33 INSERT
INTO ‘Redshift ’ SELECT … FROM ‘Hive ’ WHERE y=YYYY AND m=MM AND d=DD;
Hive Metastore • Redshift Spectrum
Hive Metastore • Spark SQL • Presto • Athena • Flink 34
Hive Metastore Hive Metastore S3 Hive,
Redshift Spectrum , Spark 35
36
($) • Hive Metastore '25103-$251.4/4& • Hive Metastore , $"
Data Lake , !$# 251&*251&%+$#! Hive Metastore , +$# Data Lake , "$#(!6 37
None