Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Intro to Parquet (June 2015)
Search
Sam Bessalah
April 06, 2016
Technology
0
290
Intro to Parquet (June 2015)
Sam Bessalah
April 06, 2016
Tweet
Share
More Decks by Sam Bessalah
See All by Sam Bessalah
Streaming Platforms
samklr
0
350
High Performance RPC with Finagle
samklr
1
190
Dotscale 2015 Lightning - Distributed Systems Research
samklr
1
800
Datageeks_27-05.pdf
samklr
0
53
Big data and Machine learning APIs
samklr
4
270
Scalable Machine Learning
samklr
2
240
mesos.devoxx.2014
samklr
2
260
Algebird : Abstract Algebra for Big Data Analytics.
samklr
9
2.9k
Algebra for analytics
samklr
1
290
Other Decks in Technology
See All in Technology
現地速報!Microsoft Ignite 2025 M365 Copilotアップデートレポート
kasada
1
1.2k
セマンティックHTMLによる アクセシビリティ品質向上の基礎
zozotech
PRO
0
110
マーケットプレイス版Oracle WebCenter Content For OCI
oracle4engineer
PRO
4
1.4k
バクラクの AI-BPO を支える AI エージェント 〜とそれを支える Bet AI Guild〜
tomoaki25
2
780
[CV勉強会@関東 ICCV2025 読み会] World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model (Zheng+, ICCV 2025)
abemii
0
230
「データ無い! 腹立つ! 推論する!」から 「データ無い! 腹立つ! データを作る」へ チームでデータを作り、育てられるようにするまで / How can we create, use, and maintain data ourselves?
moznion
8
4.5k
[mercari GEARS 2025] Building Foundation for Mercari’s Global Expansion
mercari
PRO
1
140
Service Monitoring Platformについて
lycorptech_jp
PRO
0
290
技術広報のOKRで生み出す 開発組織への価値 〜 カンファレンス協賛を通して育む学びの文化 〜 / Creating Value for Development Organisations Through Technical Communications OKRs — Nurturing a Culture of Learning Through Conference Sponsorship —
pauli
5
440
なぜThrottleではなくDebounceだったのか? 700並列リクエストと戦うサーバーサイド実装のすべて
yoshiori
13
4.7k
What's the recommended Flutter architecture
aakira
3
2k
FFMとJVMの実装から学ぶJavaのインテグリティ
kazumura
0
130
Featured
See All Featured
The Pragmatic Product Professional
lauravandoore
36
7k
Building a Modern Day E-commerce SEO Strategy
aleyda
45
8.1k
4 Signs Your Business is Dying
shpigford
186
22k
Designing for humans not robots
tammielis
254
26k
A Tale of Four Properties
chriscoyier
162
23k
How To Stay Up To Date on Web Technology
chriscoyier
791
250k
Side Projects
sachag
455
43k
Designing Experiences People Love
moore
142
24k
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
48
9.8k
Improving Core Web Vitals using Speculation Rules API
sergeychernyshev
21
1.3k
Testing 201, or: Great Expectations
jmmastey
46
7.8k
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
31
2.6k
Transcript
Sam BESSALAH @samklr http://parquet.apache.org
Typical Data workflow
Typical Data workflow
Typical Data workflow
Typical Data workflow
Multiple Data Format
Big Data Data Format Zoo - Sequence Files
these formats provide
None
Binary, columnar storage format for big data analytics workloads, inspired
by the Google Dremel Paper. - Language independent - Processing framework independent - Formally specified - More than a columnar storage : Dynamic partionning, automatic predicate and projections push down - Awesome performance
Columnar Storage 101
Columnar Storage 101
Columnar Storage 101
Columnar Storage 101 Advantages : - Limits I/O to the
data only needed - Big Space savings, better compression, and faster and low overhead encodings - Enables vectorized engine
Columnar Storage 101
None
Parquet Model
Example Parquet Schema
None
None
Definition and Repetition Levels Definition Level : Stores the level
for which the field is null Repetition Level : Store levels when new lists are starting in column values.
None
None
None
None
None
None
Numbers Example: Appnexus 2 MM Logs of Ads impressions 270
TB of Log Data in Protobuf on HDFS http://techblog.appnexus.com/blog/2015/03/31/parquet-columnar-storage-for-hadoop-data/
simple bench with HIVE
None
None
Disk Space usage on HDFS with 128 MB blocks
None
None
None
None
None
None
Slides shamelessly cloned from Julien Le Dem(@J_) , Lead of
the Apache Parquet Project
BACKUP SLIDES
None
None
None
None
None
None
None
None
None
None
None
None