Intro to Parquet (June 2015)
Sam Bessalah
April 06, 2016
Transcript
Sam BESSALAH @samklr http://parquet.apache.org
Typical Data workflow
Multiple Data Formats
Big Data Data Format Zoo - Sequence Files
these formats provide
Binary, columnar storage format for big data analytics workloads, inspired by the Google Dremel paper.
- Language independent
- Processing framework independent
- Formally specified
- More than columnar storage: dynamic partitioning, automatic predicate and projection pushdown
- Awesome performance
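Projection and predicate pushdown can be illustrated with a small toy model. This is a minimal sketch in plain Python, not the Parquet implementation or API: the names `write_row_group`, `scan`, and `FILE`, and the sample data, are hypothetical. It assumes each row group keeps per-column min/max statistics, so a reader can skip a whole row group that cannot match the filter and touch only the requested columns.

```python
def write_row_group(columns):
    """Store column chunks plus min/max statistics computed at write time."""
    stats = {name: (min(vals), max(vals)) for name, vals in columns.items()}
    return {"columns": columns, "stats": stats}


def scan(row_groups, projection, filter_column, lower_bound):
    """Return projected rows where filter_column >= lower_bound.

    Also returns how many row groups were actually read.
    """
    rows, groups_read = [], 0
    for group in row_groups:
        _, col_max = group["stats"][filter_column]
        if col_max < lower_bound:
            continue  # predicate pushdown: no row in this group can match
        groups_read += 1
        filter_values = group["columns"][filter_column]
        # Projection pushdown: only the requested columns are touched.
        projected = [group["columns"][name] for name in projection]
        for i, value in enumerate(filter_values):
            if value >= lower_bound:
                rows.append(tuple(col[i] for col in projected))
    return rows, groups_read


# Hypothetical sample data: two row groups of three rows each.
FILE = [
    write_row_group({"user_id": [1, 2, 3],
                     "country": ["FR", "FR", "US"],
                     "clicks": [4, 0, 7]}),
    write_row_group({"user_id": [10, 11, 12],
                     "country": ["DE", "US", "FR"],
                     "clicks": [1, 9, 2]}),
]

rows, groups_read = scan(FILE, ["country"], "user_id", 10)
print(rows, groups_read)
```

With this data, the first row group is skipped from its statistics alone, and the `clicks` column is never read.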
Columnar Storage 101
Columnar Storage 101: Advantages
- Limits I/O to only the data needed
- Big space savings: better compression, and faster, low-overhead encodings
- Enables vectorized execution engines
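The compression advantage can be seen on a toy example. The sketch below is only an illustration with made-up data, not Parquet's actual encoders: `to_columns` and `run_length_encode` are hypothetical helpers. It pivots a few rows into columns and compares run-length encoding on the row-wise and column-wise layouts.

```python
from itertools import chain, groupby

# Hypothetical log rows: (country, event_type)
rows = [
    ("FR", "click"), ("FR", "click"), ("FR", "view"),
    ("US", "view"), ("US", "view"), ("US", "view"),
]


def to_columns(rows):
    """Pivot a list of row tuples into one list per column."""
    return [list(col) for col in zip(*rows)]


def run_length_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


# Row-wise layout: values of different columns are interleaved.
row_layout = list(chain.from_iterable(rows))
# Columnar layout: values of the same column sit next to each other.
country, event = to_columns(rows)

print(len(run_length_encode(row_layout)))  # runs in the row-wise layout
print(run_length_encode(country))          # runs in the country column
print(run_length_encode(event))            # runs in the event column
```

Because similar values are adjacent in the columnar layout, the two columns collapse into four runs in total, while the interleaved row layout does not compress at all under this encoding.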
Parquet Model
Example Parquet Schema
Definition and Repetition Levels
Definition level: stores the level at which the field is null, i.e. how many optional or repeated fields along the column's path are actually defined.
Repetition level: stores the level at which a new list starts in the column values.
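As a rough illustration of the two levels, here is a minimal sketch for a single column `a.b`, assuming a hypothetical schema in which `a` is an optional group and `b` is a repeated integer (so the maximum definition level is 2 and the maximum repetition level is 1). The function `shred` and the sample records are made up for this example and handle only this one schema.

```python
def shred(records):
    """Return (repetition_level, definition_level, value) triples for column a.b.

    Assumed schema: `a` is an optional group, `b` is a repeated int inside it.
    """
    triples = []
    for record in records:
        a = record.get("a")
        if a is None:
            # Nothing on the path is defined.
            triples.append((0, 0, None))
            continue
        b = a.get("b") or []
        if not b:
            # `a` is defined, but the list `b` is empty.
            triples.append((0, 1, None))
            continue
        for i, value in enumerate(b):
            # Repetition level 0 starts a new record; 1 continues the list b.
            triples.append((0 if i == 0 else 1, 2, value))
    return triples


records = [
    {"a": {"b": [1, 2]}},
    {"a": {"b": []}},
    {"a": None},
]
for triple in shred(records):
    print(triple)
```

Null and empty values take no space in the value stream; only their levels are stored, which is enough to rebuild the original nesting.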
Numbers Example: AppNexus
- 2 MM logs of ad impressions
- 270 TB of log data in Protobuf on HDFS
http://techblog.appnexus.com/blog/2015/03/31/parquet-columnar-storage-for-hadoop-data/
Simple benchmark with Hive
Disk Space usage on HDFS with 128 MB blocks
Slides shamelessly cloned from Julien Le Dem (@J_), lead of the Apache Parquet project
BACKUP SLIDES