Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
データ分析基盤の変遷とデータレイクの作り方
Search
Ojima Hikaru
April 21, 2018
Technology
1
1.7k
データ分析基盤の変遷とデータレイクの作り方
Battle Conference U30 #2018
Ojima Hikaru
April 21, 2018
Tweet
Share
More Decks by Ojima Hikaru
See All by Ojima Hikaru
ディメンショナルモデリングのすすめ
ojima_h
7
4.1k
モンスターストライクを支えるデータ分析基盤と準リアルタイム集計
ojima_h
6
5.4k
Other Decks in Technology
See All in Technology
長期運用プロジェクトでのMySQLからTiDB移行の検証
colopl
2
670
[PlatformCon 24] Platform Orchestrators: The Missing Middle of Internal Developer Platforms?
danielbryantuk
1
180
Postman v10リリース後を振り返る
nagix
0
130
コンパウンドスタートアップのためのスケーラブルでセキュアなInfrastructure as Codeパイプラインを考える / Scalable and Secure Infrastructure as Code Pipeline for a Compound Startup
yuyatakeyama
3
2.5k
巨大なテーブルのテーブル定義を無停止で安全に誰でも変更できるようにする / Table-definitions-for-huge-tables-can-be-modified-by-anyone-safely-and-non-disruptively
freee
1
740
4年前、あるじゃん老害エンジニアLT合戦に登壇、米国西海岸コンピュータ歴史博物館体験記の続編
toshi_atsumi
0
200
スタートアップの技術顧問を3年間続けて発生した事と気付き
biwakonbu
0
160
The CloudCompare project by Dr. Daniel Girardeau-Montaut
kentaitakura
0
510
レガシーをぶっ壊せ。AEONで始めるDevRelの話 / Qiita Night 2024-2-22
aeonpeople
3
150
Google Cloud の AI を支える裏側のインフラを垣間見る!
maroon1st
0
190
GraphQL 成熟度モデルの紹介と、プロダクトに当てはめた事例 / GraphQL maturity model
mh4gf
4
110
Databricks におけるデータエンジニアリング
databricksjapan
0
380
Featured
See All Featured
How To Stay Up To Date on Web Technology
chriscoyier
782
250k
Adopting Sorbet at Scale
ufuk
67
8.6k
Code Review Best Practice
trishagee
54
15k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
220
21k
The Invisible Customer
myddelton
114
12k
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
60
14k
Optimizing for Happiness
mojombo
370
69k
Fontdeck: Realign not Redesign
paulrobertlloyd
76
4.9k
Build The Right Thing And Hit Your Dates
maggiecrowley
23
2k
The Brand Is Dead. Long Live the Brand.
mthomps
48
28k
Into the Great Unknown - MozCon
thekraken
10
980
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
154
14k
Transcript
L FG A
• S')1 0(6T • L>A9 XFLAG CDB=
!?NRK • GRD /%Q$7 • GRDO:>3GRD;<8H;C-,/ ACFM • P?/5#2(4&"Q 1+/GRDJPR • BIERN/ • @RIC. *6 / • GitHub: ojima-h 2
4 DAUKPI !
5
6 • • 2TB/day
30 → 1000
7 • 5
→ 100
− 8 S3
− 9 S3
− 10 Redshift
− 11
12 Data Lake Architecture
Data Lake " • -4,&$#!-4,+.' • -4,&% "%,(13*+)40&% !
(Schema on Read) • Data Lake -4,& DWH 24/$ $% 13
Data Lake 14 Hive Metastore
Hive Metastore 15
Hive " • Hadoop%(47-:.69!; • SQL ,*7&$S3 # HDFS !1:/
#1:/ & • ORC !3')83+:502& 16
Hive Metastore • S3/HDFS * "-SQL /1,&(.&0 (.&%)! •
,&(.& • * "- • * "-*#.+') • (.&%$.+ • 17
Hive Metastore • EMR ! Hive Metastore
! • • EMR 30 18
Hive Metastore • Hive Metastore MySQL
• Hive Metastore (HCatalog) server • EMR 5 19
Hive Metastore S3 20
Hive Metastore • ' • '"%
• 'ORC • '!&' ' !'#$$ 21
Hive Metastore • Hive Metastore S3 "
S3" !" 22
Hive Metastore * • "+$%- :>:>(*+ • 8C6*/,# •
3C;4' Hive DB / • Hive ).!% S3&*8C6/ • Hive &.( 8C6)-*@C@/ 23 3C;4 D=A49B<019?C2BBE 8C6579 8C6 Hive Database Table Partition S3 s3://BUCKET/warehouse/SERVICE.db/ s3://BUCKET/warehouse/SERVICE.db/TABLE/ s3://BUCKET/warehouse/SERVICE.db/TABLE/y=YYYY/m=MM/d=DD/
Hive Metastore • %)" &'&'%)" • &$#
! ( 24
Hive Metastore 1. Hive Metastore
25
Hive Metastore 1. Hive Metastore
2. 26
Hive Metastore 1. Hive Metastore
2. 3. Hive Metastore 27
Hive Metastore 1. Hive Metastore
2. 3. Hive Metastore 4. 28
Hive Metastore ! 1. ),(! $ Hive Metastore # 2.
),($'*, 3. Hive Metastore ! $ 4. ),($ &%+ $ "),($ 29
Hive Metastore 30
Hive Metastore • Hive Redshift "%!$%# • Redshift
COPY "%! csv+gzip • Hive "%! ORC • Redshift csv+gzip Hive ORC ⇒ Redshift Spectrum 31
Redshift Spectrum • Redshift S3(#$+ &%*" • ',)+
Hive Metastore ! Hive ',)+" 32 CREATE EXTERNAL SCHEMA schema_name FROM HIVE METASTORE DATABASE 'database_name’ URI 'hive_metastore_uri’;
Hive Metastore • Redshift Hive 33 INSERT
INTO ‘Redshift ’ SELECT … FROM ‘Hive ’ WHERE y=YYYY AND m=MM AND d=DD;
Hive Metastore • Redshift Spectrum
Hive Metastore • Spark SQL • Presto • Athena • Flink 34
Hive Metastore Hive Metastore S3 Hive,
Redshift Spectrum , Spark 35
36
($) • Hive Metastore '25103-$251.4/4& • Hive Metastore , $"
Data Lake , !$# 251&*251&%+$#! Hive Metastore , +$# Data Lake , "$#(!6 37
None