Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
データ分析基盤の変遷とデータレイクの作り方
Search
Ojima Hikaru
April 21, 2018
Technology
1
1.9k
データ分析基盤の変遷とデータレイクの作り方
Battle Conference U30 #2018
Ojima Hikaru
April 21, 2018
Tweet
Share
More Decks by Ojima Hikaru
See All by Ojima Hikaru
Podのオートスケーリングに苦戦し続けている話
ojima_h
1
320
ディメンショナルモデリングのすすめ
ojima_h
7
4.7k
モンスターストライクを支えるデータ分析基盤と準リアルタイム集計
ojima_h
6
5.7k
Other Decks in Technology
See All in Technology
american aa airlines®️ USA Contact Numbers: Complete 2025 Support Guide
aaguide
0
500
対話型音声AIアプリケーションの信頼性向上の取り組み
ivry_presentationmaterials
3
1.1k
Rethinking Incident Response: Context-Aware AI in Practice
rrreeeyyy
2
940
CDK Vibe Coding Fes
tomoki10
1
630
AI Ready API ─ AI時代に求められるAPI設計とは?/ AI-Ready API - Designing MCP and APIs in the AI Era
yokawasa
8
2.3k
Four Keysから始める信頼性の改善 - SRE NEXT 2025
ozakikota
0
420
今だから言えるセキュリティLT_Wordpress5.7.2未満を一斉アップデートせよ
cuebic9bic
2
170
20250718_ITSurf_“Bet AI”を支える文化とコストマネジメント
helosshi
0
100
AI エージェントと考え直すデータ基盤
na0
20
7.9k
マルチプロダクト環境におけるSREの役割 / SRE NEXT 2025 lunch session
sugamasao
1
730
Talk to Someone At Delta Airlines™️ USA Contact Numbers
travelcarecenter
0
160
ロールが細分化された組織でSREは何をするか?
tgidgd
1
420
Featured
See All Featured
Designing Experiences People Love
moore
142
24k
The Straight Up "How To Draw Better" Workshop
denniskardys
235
140k
Typedesign – Prime Four
hannesfritz
42
2.7k
The MySQL Ecosystem @ GitHub 2015
samlambert
251
13k
The World Runs on Bad Software
bkeepers
PRO
70
11k
Navigating Team Friction
lara
187
15k
The Cost Of JavaScript in 2023
addyosmani
51
8.6k
We Have a Design System, Now What?
morganepeng
53
7.7k
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
PRO
181
54k
GraphQLの誤解/rethinking-graphql
sonatard
71
11k
Code Review Best Practice
trishagee
69
19k
How STYLIGHT went responsive
nonsquared
100
5.6k
Transcript
L FG A
• S')1 0(6T • L>A9 XFLAG CDB=
!?NRK • GRD /%Q$7 • GRDO:>3GRD;<8H;C-,/ ACFM • P?/5#2(4&"Q 1+/GRDJPR • BIERN/ • @RIC. *6 / • GitHub: ojima-h 2
4 DAUKPI !
5
6 • • 2TB/day
30 → 1000
7 • 5
→ 100
− 8 S3
− 9 S3
− 10 Redshift
− 11
12 Data Lake Architecture
Data Lake " • -4,&$#!-4,+.' • -4,&% "%,(13*+)40&% !
(Schema on Read) • Data Lake -4,& DWH 24/$ $% 13
Data Lake 14 Hive Metastore
Hive Metastore 15
Hive " • Hadoop%(47-:.69!; • SQL ,*7&$S3 # HDFS !1:/
#1:/ & • ORC !3')83+:502& 16
Hive Metastore • S3/HDFS * "-SQL /1,&(.&0 (.&%)! •
,&(.& • * "- • * "-*#.+') • (.&%$.+ • 17
Hive Metastore • EMR ! Hive Metastore
! • • EMR 30 18
Hive Metastore • Hive Metastore MySQL
• Hive Metastore (HCatalog) server • EMR 5 19
Hive Metastore S3 20
Hive Metastore • ' • '"%
• 'ORC • '!&' ' !'#$$ 21
Hive Metastore • Hive Metastore S3 "
S3" !" 22
Hive Metastore * • "+$%- :>:>(*+ • 8C6*/,# •
3C;4' Hive DB / • Hive ).!% S3&*8C6/ • Hive &.( 8C6)-*@C@/ 23 3C;4 D=A49B<019?C2BBE 8C6579 8C6 Hive Database Table Partition S3 s3://BUCKET/warehouse/SERVICE.db/ s3://BUCKET/warehouse/SERVICE.db/TABLE/ s3://BUCKET/warehouse/SERVICE.db/TABLE/y=YYYY/m=MM/d=DD/
Hive Metastore • %)" &'&'%)" • &$#
! ( 24
Hive Metastore 1. Hive Metastore
25
Hive Metastore 1. Hive Metastore
2. 26
Hive Metastore 1. Hive Metastore
2. 3. Hive Metastore 27
Hive Metastore 1. Hive Metastore
2. 3. Hive Metastore 4. 28
Hive Metastore ! 1. ),(! $ Hive Metastore # 2.
),($'*, 3. Hive Metastore ! $ 4. ),($ &%+ $ "),($ 29
Hive Metastore 30
Hive Metastore • Hive Redshift "%!$%# • Redshift
COPY "%! csv+gzip • Hive "%! ORC • Redshift csv+gzip Hive ORC ⇒ Redshift Spectrum 31
Redshift Spectrum • Redshift S3(#$+ &%*" • ',)+
Hive Metastore ! Hive ',)+" 32 CREATE EXTERNAL SCHEMA schema_name FROM HIVE METASTORE DATABASE 'database_name’ URI 'hive_metastore_uri’;
Hive Metastore • Redshift Hive 33 INSERT
INTO ‘Redshift ’ SELECT … FROM ‘Hive ’ WHERE y=YYYY AND m=MM AND d=DD;
Hive Metastore • Redshift Spectrum
Hive Metastore • Spark SQL • Presto • Athena • Flink 34
Hive Metastore Hive Metastore S3 Hive,
Redshift Spectrum , Spark 35
36
($) • Hive Metastore '25103-$251.4/4& • Hive Metastore , $"
Data Lake , !$# 251&*251&%+$#! Hive Metastore , +$# Data Lake , "$#(!6 37
None