Lock in $30 Savings on PRO—Offer Ends Soon! ⏳
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Intro to Parquet (June 2015)
Search
Sam Bessalah
April 06, 2016
Technology
0
300
Intro to Parquet (June 2015)
Sam Bessalah
April 06, 2016
Tweet
Share
More Decks by Sam Bessalah
See All by Sam Bessalah
Streaming Platforms
samklr
0
360
High Performance RPC with Finagle
samklr
1
200
Dotscale 2015 Lightning - Distributed Systems Research
samklr
1
810
Datageeks_27-05.pdf
samklr
0
57
Big data and Machine learning APIs
samklr
4
270
Scalable Machine Learning
samklr
2
240
mesos.devoxx.2014
samklr
2
270
Algebird : Abstract Algebra for Big Data Analytics.
samklr
9
2.9k
Algebra for analytics
samklr
1
300
Other Decks in Technology
See All in Technology
30分であなたをOmniのファンにしてみせます~分析画面のクリック操作をそのままコード化できるAI-ReadyなBIツール~
sagara
0
150
コンテキスト情報を活用し個社最適化されたAI Agentを実現する4つのポイント
kworkdev
PRO
0
1.3k
MapKitとオープンデータで実現する地図情報の拡張と可視化
zozotech
PRO
1
140
新 Security HubがついにGA!仕組みや料金を深堀り #AWSreInvent #regrowth / AWS Security Hub Advanced GA
masahirokawahara
1
2k
AWS Security Agentの紹介/introducing-aws-security-agent
tomoki10
0
250
モダンデータスタック (MDS) の話とデータ分析が起こすビジネス変革
sutotakeshi
0
500
re:Invent2025 3つの Frontier Agents を紹介 / introducing-3-frontier-agents
tomoki10
0
120
初めてのDatabricks AI/BI Genie
taka_aki
0
160
[デモです] NotebookLM で作ったスライドの例
kongmingstrap
0
150
SSO方式とJumpアカウント方式の比較と設計方針
yuobayashi
7
680
re:Inventで気になったサービスを10分でいけるところまでお話しします
yama3133
1
120
生成AI時代におけるグローバル戦略思考
taka_aki
0
190
Featured
See All Featured
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
10
730
The Cult of Friendly URLs
andyhume
79
6.7k
A better future with KSS
kneath
240
18k
For a Future-Friendly Web
brad_frost
180
10k
Raft: Consensus for Rubyists
vanstee
141
7.2k
How to train your dragon (web standard)
notwaldorf
97
6.4k
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
32
1.8k
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
128
54k
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
4.1k
It's Worth the Effort
3n
187
29k
Optimizing for Happiness
mojombo
379
70k
10 Git Anti Patterns You Should be Aware of
lemiorhan
PRO
659
61k
Transcript
Sam BESSALAH @samklr http://parquet.apache.org
Typical Data workflow
Typical Data workflow
Typical Data workflow
Typical Data workflow
Multiple Data Format
Big Data Data Format Zoo - Sequence Files
these formats provide
None
Binary, columnar storage format for big data analytics workloads, inspired
by the Google Dremel Paper. - Language independent - Processing framework independent - Formally specified - More than a columnar storage : Dynamic partionning, automatic predicate and projections push down - Awesome performance
Columnar Storage 101
Columnar Storage 101
Columnar Storage 101
Columnar Storage 101 Advantages : - Limits I/O to the
data only needed - Big Space savings, better compression, and faster and low overhead encodings - Enables vectorized engine
Columnar Storage 101
None
Parquet Model
Example Parquet Schema
None
None
Definition and Repetition Levels Definition Level : Stores the level
for which the field is null Repetition Level : Store levels when new lists are starting in column values.
None
None
None
None
None
None
Numbers Example: Appnexus 2 MM Logs of Ads impressions 270
TB of Log Data in Protobuf on HDFS http://techblog.appnexus.com/blog/2015/03/31/parquet-columnar-storage-for-hadoop-data/
simple bench with HIVE
None
None
Disk Space usage on HDFS with 128 MB blocks
None
None
None
None
None
None
Slides shamelessly cloned from Julien Le Dem(@J_) , Lead of
the Apache Parquet Project
BACKUP SLIDES
None
None
None
None
None
None
None
None
None
None
None
None