Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Intro to Parquet (June 2015)
Search
Sam Bessalah
April 06, 2016
Technology
0
320
Intro to Parquet (June 2015)
Sam Bessalah
April 06, 2016
Tweet
Share
More Decks by Sam Bessalah
See All by Sam Bessalah
Streaming Platforms
samklr
0
370
High Performance RPC with Finagle
samklr
1
220
Dotscale 2015 Lightning - Distributed Systems Research
samklr
1
820
Datageeks_27-05.pdf
samklr
0
72
Big data and Machine learning APIs
samklr
4
280
Scalable Machine Learning
samklr
2
260
mesos.devoxx.2014
samklr
2
290
Algebird : Abstract Algebra for Big Data Analytics.
samklr
9
3k
Algebra for analytics
samklr
1
300
Other Decks in Technology
See All in Technology
Agent Skill 是什麼?對軟體產業帶來的變化
appleboy
0
230
Astro Islandsの 内部実装を 「日本で一番わかりやすく」 ざっくり解説!
knj
1
280
スケールアップ企業でQA組織が機能し続けるための組織設計と仕組み〜ボトムアップとトップダウンを両輪としたアプローチ〜
qa
0
270
Kiro Meetup #7 Kiro アップデート (2025/12/15〜2026/3/20)
katzueno
2
250
AI時代のオンプレ-クラウドキャリアチェンジ考
yuu0w0yuu
0
230
モジュラモノリス導入から4年間の総括:アーキテクチャと組織の相互作用について / Architecture and Organizational Interaction
nazonohito51
6
2.9k
GitHub Copilot CLI で Azure Portal to Bicep
tsubakimoto_s
0
190
LLMに何を任せ、何を任せないか
cap120
10
5.4k
【PHPerKaigi2026】OpenTelemetry SDKを使ってPHPでAPMを自作する
fendo181
1
190
「コントロールの三分法」で考える「コト」への向き合い方 / phperkaigi2026
blue_goheimochi
0
150
形式手法特論:SMT ソルバで解く認可ポリシの静的解析 #kernelvm / Kernel VM Study Tsukuba No3
ytaka23
1
790
韓非子に学ぶAI活用術
tomfook
2
550
Featured
See All Featured
Building a Modern Day E-commerce SEO Strategy
aleyda
45
9k
Navigating the moral maze — ethical principles for Al-driven product design
skipperchong
2
300
SEOcharity - Dark patterns in SEO and UX: How to avoid them and build a more ethical web
sarafernandez
0
150
We Have a Design System, Now What?
morganepeng
55
8k
Facilitating Awesome Meetings
lara
57
6.8k
Ten Tips & Tricks for a 🌱 transition
stuffmc
0
91
Designing for Timeless Needs
cassininazir
0
170
Money Talks: Using Revenue to Get Sh*t Done
nikkihalliwell
0
190
Building Better People: How to give real-time feedback that sticks.
wjessup
370
20k
The Cult of Friendly URLs
andyhume
79
6.8k
How to Think Like a Performance Engineer
csswizardry
28
2.5k
Utilizing Notion as your number one productivity tool
mfonobong
4
270
Transcript
Sam BESSALAH @samklr http://parquet.apache.org
Typical Data workflow
Typical Data workflow
Typical Data workflow
Typical Data workflow
Multiple Data Format
Big Data Data Format Zoo - Sequence Files
these formats provide
None
Binary, columnar storage format for big data analytics workloads, inspired
by the Google Dremel Paper. - Language independent - Processing framework independent - Formally specified - More than a columnar storage : Dynamic partionning, automatic predicate and projections push down - Awesome performance
Columnar Storage 101
Columnar Storage 101
Columnar Storage 101
Columnar Storage 101 Advantages : - Limits I/O to the
data only needed - Big Space savings, better compression, and faster and low overhead encodings - Enables vectorized engine
Columnar Storage 101
None
Parquet Model
Example Parquet Schema
None
None
Definition and Repetition Levels Definition Level : Stores the level
for which the field is null Repetition Level : Store levels when new lists are starting in column values.
None
None
None
None
None
None
Numbers Example: Appnexus 2 MM Logs of Ads impressions 270
TB of Log Data in Protobuf on HDFS http://techblog.appnexus.com/blog/2015/03/31/parquet-columnar-storage-for-hadoop-data/
simple bench with HIVE
None
None
Disk Space usage on HDFS with 128 MB blocks
None
None
None
None
None
None
Slides shamelessly cloned from Julien Le Dem(@J_) , Lead of
the Apache Parquet Project
BACKUP SLIDES
None
None
None
None
None
None
None
None
None
None
None
None