Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
データ分析基盤の変遷とデータレイクの作り方
Search
Ojima Hikaru
April 21, 2018
Technology
2
1.9k
データ分析基盤の変遷とデータレイクの作り方
Battle Conference U30 #2018
Ojima Hikaru
April 21, 2018
Tweet
Share
More Decks by Ojima Hikaru
See All by Ojima Hikaru
家族の思い出を形にする 〜 1秒動画の生成を支えるインフラアーキテクチャ
ojima_h
3
1.8k
Railsの限界を超えろ!「家族アルバム みてね」の画像・動画の大規模アップロードを支えるアーキテクチャの変遷
ojima_h
5
760
Podのオートスケーリングに苦戦し続けている話
ojima_h
1
340
ディメンショナルモデリングのすすめ
ojima_h
8
4.7k
モンスターストライクを支えるデータ分析基盤と準リアルタイム集計
ojima_h
7
5.7k
Other Decks in Technology
See All in Technology
多野優介
tanoyusuke
1
420
Oracle Base Database Service 技術詳細
oracle4engineer
PRO
11
77k
"複雑なデータ処理 × 静的サイト" を両立させる、楽をするRails運用 / A low-effort Rails workflow that combines “Complex Data Processing × Static Sites”
hogelog
3
1.9k
stupid jj tricks
indirect
0
7.9k
Trust as Infrastructure
bcantrill
0
330
【新卒研修資料】LLM・生成AI研修 / Large Language Model・Generative AI
brainpadpr
23
17k
From Prompt to Product @ How to Web 2025, Bucharest, Romania
janwerner
0
120
Shirankedo NOCで見えてきたeduroam/OpenRoaming運用ノウハウと課題 - BAKUCHIKU BANBAN #2
marokiki
0
140
それでも私はContextに値を詰めたい | Go Conference 2025 / go conference 2025 fill context
budougumi0617
4
1.2k
実装で解き明かす並行処理の歴史
zozotech
PRO
1
320
生成AI_その前_に_マルチクラウド時代の信頼できるデータを支えるSnowflakeメタデータ活用術.pdf
cm_mikami
0
110
20250929_QaaS_vol20
mura_shin
0
110
Featured
See All Featured
Rebuilding a faster, lazier Slack
samanthasiow
84
9.2k
Fantastic passwords and where to find them - at NoRuKo
philnash
52
3.4k
Product Roadmaps are Hard
iamctodd
PRO
54
11k
BBQ
matthewcrist
89
9.8k
Designing Dashboards & Data Visualisations in Web Apps
destraynor
231
53k
The Pragmatic Product Professional
lauravandoore
36
6.9k
[RailsConf 2023] Rails as a piece of cake
palkan
57
5.9k
Raft: Consensus for Rubyists
vanstee
139
7.1k
How GitHub (no longer) Works
holman
315
140k
The Cult of Friendly URLs
andyhume
79
6.6k
A Modern Web Designer's Workflow
chriscoyier
697
190k
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
9
580
Transcript
L FG A
• S')1 0(6T • L>A9 XFLAG CDB=
!?NRK • GRD /%Q$7 • GRDO:>3GRD;<8H;C-,/ ACFM • P?/5#2(4&"Q 1+/GRDJPR • BIERN/ • @RIC. *6 / • GitHub: ojima-h 2
4 DAUKPI !
5
6 • • 2TB/day
30 → 1000
7 • 5
→ 100
− 8 S3
− 9 S3
− 10 Redshift
− 11
12 Data Lake Architecture
Data Lake " • -4,&$#!-4,+.' • -4,&% "%,(13*+)40&% !
(Schema on Read) • Data Lake -4,& DWH 24/$ $% 13
Data Lake 14 Hive Metastore
Hive Metastore 15
Hive " • Hadoop%(47-:.69!; • SQL ,*7&$S3 # HDFS !1:/
#1:/ & • ORC !3')83+:502& 16
Hive Metastore • S3/HDFS * "-SQL /1,&(.&0 (.&%)! •
,&(.& • * "- • * "-*#.+') • (.&%$.+ • 17
Hive Metastore • EMR ! Hive Metastore
! • • EMR 30 18
Hive Metastore • Hive Metastore MySQL
• Hive Metastore (HCatalog) server • EMR 5 19
Hive Metastore S3 20
Hive Metastore • ' • '"%
• 'ORC • '!&' ' !'#$$ 21
Hive Metastore • Hive Metastore S3 "
S3" !" 22
Hive Metastore * • "+$%- :>:>(*+ • 8C6*/,# •
3C;4' Hive DB / • Hive ).!% S3&*8C6/ • Hive &.( 8C6)-*@C@/ 23 3C;4 D=A49B<019?C2BBE 8C6579 8C6 Hive Database Table Partition S3 s3://BUCKET/warehouse/SERVICE.db/ s3://BUCKET/warehouse/SERVICE.db/TABLE/ s3://BUCKET/warehouse/SERVICE.db/TABLE/y=YYYY/m=MM/d=DD/
Hive Metastore • %)" &'&'%)" • &$#
! ( 24
Hive Metastore 1. Hive Metastore
25
Hive Metastore 1. Hive Metastore
2. 26
Hive Metastore 1. Hive Metastore
2. 3. Hive Metastore 27
Hive Metastore 1. Hive Metastore
2. 3. Hive Metastore 4. 28
Hive Metastore ! 1. ),(! $ Hive Metastore # 2.
),($'*, 3. Hive Metastore ! $ 4. ),($ &%+ $ "),($ 29
Hive Metastore 30
Hive Metastore • Hive Redshift "%!$%# • Redshift
COPY "%! csv+gzip • Hive "%! ORC • Redshift csv+gzip Hive ORC ⇒ Redshift Spectrum 31
Redshift Spectrum • Redshift S3(#$+ &%*" • ',)+
Hive Metastore ! Hive ',)+" 32 CREATE EXTERNAL SCHEMA schema_name FROM HIVE METASTORE DATABASE 'database_name’ URI 'hive_metastore_uri’;
Hive Metastore • Redshift Hive 33 INSERT
INTO ‘Redshift ’ SELECT … FROM ‘Hive ’ WHERE y=YYYY AND m=MM AND d=DD;
Hive Metastore • Redshift Spectrum
Hive Metastore • Spark SQL • Presto • Athena • Flink 34
Hive Metastore Hive Metastore S3 Hive,
Redshift Spectrum , Spark 35
36
($) • Hive Metastore '25103-$251.4/4& • Hive Metastore , $"
Data Lake , !$# 251&*251&%+$#! Hive Metastore , +$# Data Lake , "$#(!6 37
None