Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Intro to Parquet (June 2015)
Search
Sam Bessalah
April 06, 2016
Technology
0
280
Intro to Parquet (June 2015)
Sam Bessalah
April 06, 2016
Tweet
Share
More Decks by Sam Bessalah
See All by Sam Bessalah
Streaming Platforms
samklr
0
350
High Performance RPC with Finagle
samklr
1
180
Dotscale 2015 Lightning - Distributed Systems Research
samklr
1
790
Datageeks_27-05.pdf
samklr
0
53
Big data and Machine learning APIs
samklr
4
260
Scalable Machine Learning
samklr
2
230
mesos.devoxx.2014
samklr
2
250
Algebird : Abstract Algebra for Big Data Analytics.
samklr
9
2.9k
Algebra for analytics
samklr
1
280
Other Decks in Technology
See All in Technology
CDKコード品質UP!ナイスな自作コンストラクタを作るための便利インターフェース
harukasakihara
2
240
Introduction to Bill One Development Engineer
sansan33
PRO
0
270
QuickSight SPICE の効果的な運用戦略~S3 + Athena 構成での実践ノウハウ~/quicksight-spice-s3-athena-best-practices
emiki
0
290
本当にわかりやすいAIエージェント入門
segavvy
5
3k
スタックチャン家庭用アシスタントへの道
kanekoh
0
130
公開初日に Gemini CLI を試した話や FFmpeg と組み合わせてみた話など / Gemini CLI 初学者勉強会(#AI道場)
you
PRO
0
1.4k
助けて! XからWaylandに移行しないと新しいGNOMEが使えなくなっちゃう 2025-07-12
nobutomurata
2
200
AI時代にも変わらぬ価値を発揮したい: インフラ・クラウドを切り口にユーザー価値と非機能要件に向き合ってエンジニアとしての地力を培う
netmarkjp
0
150
60以上のプロダクトを持つ組織における開発者体験向上への取り組み - チームAPIとBackstageで構築する組織の可視化基盤 - / sre next 2025 Efforts to Improve Developer Experience in an Organization with Over 60 Products
vtryo
3
2k
20250708オープンエンドな探索と知識発見
sakana_ai
PRO
4
1.1k
サービスを止めるな! DDoS攻撃へのスマートな備えと最前線の事例
coconala_engineer
1
190
クラウド開発の舞台裏とSRE文化の醸成 / SRE NEXT 2025 Lunch Session
kazeburo
1
620
Featured
See All Featured
Agile that works and the tools we love
rasmusluckow
329
21k
Writing Fast Ruby
sferik
628
62k
Become a Pro
speakerdeck
PRO
29
5.4k
Bootstrapping a Software Product
garrettdimon
PRO
307
110k
ReactJS: Keep Simple. Everything can be a component!
pedronauck
667
120k
[RailsConf 2023] Rails as a piece of cake
palkan
55
5.7k
VelocityConf: Rendering Performance Case Studies
addyosmani
332
24k
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
8
340
Typedesign – Prime Four
hannesfritz
42
2.7k
Music & Morning Musume
bryan
46
6.7k
Six Lessons from altMBA
skipperchong
28
3.9k
Raft: Consensus for Rubyists
vanstee
140
7k
Transcript
Sam BESSALAH @samklr http://parquet.apache.org
Typical Data workflow
Typical Data workflow
Typical Data workflow
Typical Data workflow
Multiple Data Format
Big Data Data Format Zoo - Sequence Files
these formats provide
None
Binary, columnar storage format for big data analytics workloads, inspired
by the Google Dremel Paper. - Language independent - Processing framework independent - Formally specified - More than a columnar storage : Dynamic partionning, automatic predicate and projections push down - Awesome performance
Columnar Storage 101
Columnar Storage 101
Columnar Storage 101
Columnar Storage 101 Advantages : - Limits I/O to the
data only needed - Big Space savings, better compression, and faster and low overhead encodings - Enables vectorized engine
Columnar Storage 101
None
Parquet Model
Example Parquet Schema
None
None
Definition and Repetition Levels Definition Level : Stores the level
for which the field is null Repetition Level : Store levels when new lists are starting in column values.
None
None
None
None
None
None
Numbers Example: Appnexus 2 MM Logs of Ads impressions 270
TB of Log Data in Protobuf on HDFS http://techblog.appnexus.com/blog/2015/03/31/parquet-columnar-storage-for-hadoop-data/
simple bench with HIVE
None
None
Disk Space usage on HDFS with 128 MB blocks
None
None
None
None
None
None
Slides shamelessly cloned from Julien Le Dem(@J_) , Lead of
the Apache Parquet Project
BACKUP SLIDES
None
None
None
None
None
None
None
None
None
None
None
None