Intro to Parquet (June 2015)
Sam Bessalah
April 06, 2016
Transcript
Sam BESSALAH @samklr http://parquet.apache.org
Typical Data workflow
Multiple Data Formats
Big Data Data Format Zoo - Sequence Files
these formats provide
Binary, columnar storage format for big data analytics workloads, inspired by the Google Dremel paper.
- Language independent
- Processing framework independent
- Formally specified
- More than columnar storage: dynamic partitioning, automatic predicate and projection pushdown
- Awesome performance
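The predicate and projection pushdown mentioned above can be illustrated with a toy model. This is only a sketch in plain Python, not the Parquet API: `RowGroup`, `scan`, and the column names are hypothetical, and the min/max statistics are computed on the fly here, whereas a real Parquet file stores them in its metadata.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Toy stand-in for a row group: one list of values per column."""
    columns: dict  # column name -> list of values

    def stats(self, name):
        # Assumption: stats are recomputed here; Parquet reads them from metadata.
        values = self.columns[name]
        return min(values), max(values)


def scan(row_groups, projection, column, lower_bound):
    """Return projected rows where `column` >= lower_bound.

    Predicate pushdown: skip a row group when its max for `column`
    is below the bound, without looking at its values.
    Projection pushdown: only read the columns named in `projection`
    (plus the filter column itself).
    """
    rows = []
    skipped = 0
    for group in row_groups:
        _, max_value = group.stats(column)
        if max_value < lower_bound:
            skipped += 1
            continue
        filter_values = group.columns[column]
        selected = [group.columns[name] for name in projection]
        for i, value in enumerate(filter_values):
            if value >= lower_bound:
                rows.append(tuple(col[i] for col in selected))
    return rows, skipped


groups = [
    RowGroup({"user": ["a", "b"], "clicks": [1, 3], "country": ["FR", "US"]}),
    RowGroup({"user": ["c", "d"], "clicks": [10, 42], "country": ["DE", "FR"]}),
]
rows, skipped = scan(groups, projection=["user"], column="clicks", lower_bound=10)
print(rows, skipped)
```

The first row group is skipped entirely because its largest `clicks` value is below the bound, and the `country` column is never read.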
Columnar Storage 101
Advantages:
- Limits I/O to only the data needed
- Big space savings: better compression, plus faster, low-overhead encodings
- Enables vectorized execution engines
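A minimal sketch of why a columnar layout helps, using plain Python lists. The sample rows and the helper names `to_columns` and `run_length_encode` are made up for illustration, and run-length encoding stands in here for the family of encodings a columnar format can apply.

```python
from itertools import groupby

# Hypothetical row-oriented data: (day, country, clicks).
rows = [
    ("2015-06-01", "FR", 3),
    ("2015-06-01", "FR", 5),
    ("2015-06-01", "US", 2),
    ("2015-06-02", "US", 7),
]


def to_columns(rows, names):
    """Pivot a list of row tuples into one list of values per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


columns = to_columns(rows, ["day", "country", "clicks"])

# Limited I/O: an aggregate over one column touches only that column.
total_clicks = sum(columns["clicks"])

# Better compression: values of the same column sit together and repeat.
encoded_country = run_length_encode(columns["country"])
encoded_day = run_length_encode(columns["day"])

print(total_clicks, encoded_country, encoded_day)
```

Summing `clicks` never reads `day` or `country`, and the repeated values in each column shrink to a few pairs once they are stored next to each other.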
Parquet Model
Example Parquet Schema
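Definition and Repetition Levels
Definition level: records how many optional or repeated fields along a column's path are actually defined, i.e. the level at which the field becomes null.
Repetition level: records at which repeated field in the path a new list starts in the column values.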
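The two levels can be worked through on a small example. The sketch below assumes a hypothetical schema with a repeated group `links` that contains one optional field `url`, so the column `links.url` has a maximum repetition level of 1 and a maximum definition level of 2. The function names are illustrative, and the triples keep `None` for readability, whereas Parquet stores null positions only in the levels.

```python
def shred(records):
    """Flatten column links.url into (repetition, definition, value) triples.

    Each record is a list of links; each link is a dict {"url": int or None}.
    """
    triples = []
    for links in records:
        if not links:
            # Empty list: nothing on the path is defined.
            triples.append((0, 0, None))
            continue
        for index, link in enumerate(links):
            # 0 starts a new record, 1 continues the current links list.
            repetition = 0 if index == 0 else 1
            url = link.get("url")
            # 2: links and url defined; 1: links defined but url is null.
            definition = 2 if url is not None else 1
            triples.append((repetition, definition, url))
    return triples


def assemble(triples):
    """Rebuild the records from the triples, using only the levels for structure."""
    records = []
    for repetition, definition, value in triples:
        if repetition == 0:
            records.append([])
        if definition >= 1:
            records[-1].append({"url": value if definition == 2 else None})
    return records


records = [
    [{"url": 10}, {"url": None}, {"url": 30}],
    [],
    [{"url": 40}],
]
triples = shred(records)
print(triples)
```

Running `assemble(triples)` gives back the original records, which is the point of the encoding: the nested structure is recoverable from a flat column plus two small integers per value.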
Numbers
Example: AppNexus
- 2 MM logs of ad impressions
- 270 TB of log data in Protobuf on HDFS
http://techblog.appnexus.com/blog/2015/03/31/parquet-columnar-storage-for-hadoop-data/
Simple benchmark with Hive
Disk Space usage on HDFS with 128 MB blocks
Slides shamelessly cloned from Julien Le Dem (@J_), lead of the Apache Parquet project
BACKUP SLIDES