Intro to Parquet (June 2015)
Sam Bessalah
April 06, 2016
Transcript
Sam BESSALAH @samklr http://parquet.apache.org
Typical Data workflow
Multiple Data Formats
Big Data Data Format Zoo - Sequence Files
these formats provide
Binary, columnar storage format for big data analytics workloads, inspired by the Google Dremel paper.
- Language independent
- Processing framework independent
- Formally specified
- More than columnar storage: dynamic partitioning, automatic predicate and projection pushdown
- Awesome performance
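To make predicate and projection pushdown concrete, here is a minimal pure-Python sketch. It is not the Parquet API: `ColumnChunk`, `make_row_group`, and `scan` are hypothetical names, and the sample data is invented. It only assumes that each group of rows keeps min/max statistics per column, so a reader can skip groups that cannot match a filter and read only the requested columns.

```python
from dataclasses import dataclass


@dataclass
class ColumnChunk:
    """One column's values for a group of rows, plus min/max statistics."""
    values: list
    min_value: object
    max_value: object


def make_row_group(rows, columns):
    """Pivot a list of dict rows into per-column chunks with statistics."""
    group = {}
    for name in columns:
        values = [row[name] for row in rows]
        group[name] = ColumnChunk(values, min(values), max(values))
    return group


def scan(row_groups, projection, filter_column, lower_bound):
    """Return the projected columns of rows where filter_column >= lower_bound."""
    results = []
    groups_read = 0
    for group in row_groups:
        stats = group[filter_column]
        # Predicate pushdown: skip the whole group if its max rules out a match.
        if stats.max_value < lower_bound:
            continue
        groups_read += 1
        for i, value in enumerate(stats.values):
            if value >= lower_bound:
                # Projection pushdown: only the requested columns are touched.
                results.append({name: group[name].values[i] for name in projection})
    return results, groups_read


# Invented sample data: 8 rows split into two groups of 4.
rows = [{"user": f"u{i}", "clicks": i, "country": "FR"} for i in range(8)]
row_groups = [
    make_row_group(rows[start:start + 4], ["user", "clicks", "country"])
    for start in (0, 4)
]

matches, groups_read = scan(row_groups, ["user"], "clicks", 6)
print(matches)      # [{'user': 'u6'}, {'user': 'u7'}]
print(groups_read)  # 1
```

The first group is never scanned because its largest `clicks` value is 3, and the `country` column is never read at all.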
Columnar Storage 101
Columnar Storage 101: Advantages
- Limits I/O to only the data needed
- Big space savings: better compression, and faster, low-overhead encodings
- Enables vectorized execution engines
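The compression advantage can be illustrated with a small sketch. The rows below are invented, and run-length encoding is used here only as one simple example of an encoding that benefits from a columnar layout; it is not presented as Parquet's exact on-disk scheme.

```python
from itertools import groupby

# Invented sample rows: (date, country, clicks).
rows = [
    ("2015-06-01", "FR", 3),
    ("2015-06-01", "FR", 7),
    ("2015-06-01", "US", 1),
    ("2015-06-02", "US", 4),
]

# Row layout: values from different columns are interleaved.
row_layout = [value for row in rows for value in row]

# Column layout: each column's values are stored contiguously.
columns = [list(column) for column in zip(*rows)]


def run_length_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


encoded_dates = run_length_encode(columns[0])
encoded_rows = run_length_encode(row_layout)

print(encoded_dates)      # [('2015-06-01', 3), ('2015-06-02', 1)]
print(len(encoded_rows))  # 12, no runs to collapse in the row layout
```

Reading only the date column also means touching 4 values instead of all 12, which is the I/O saving from the first bullet.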
Parquet Model
Example Parquet Schema
Definition and Repetition Levels
- Definition level: records the nesting level at which a field is null, i.e. how many fields in the column's path are actually defined.
- Repetition level: records the level at which a new list starts in the column values.
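As a minimal sketch of how these levels are assigned, assume a hypothetical two-level schema that is not taken from the slides: a repeated group `level1` containing a repeated string `level2`, so each record is a list of lists of strings. The function name `shred_nested_lists` is also an assumption. For this schema the maximum definition level is 2, and the repetition level is 0 for a new record, 1 for a new inner list, and 2 for a further value in the same inner list.

```python
def shred_nested_lists(records):
    """Flatten list-of-list records into (value, repetition, definition) triples.

    Assumed schema (hypothetical):
        repeated group level1 { repeated string level2; }
    """
    triples = []
    for record in records:
        if not record:
            # level1 is empty: nothing on the path is defined.
            triples.append((None, 0, 0))
            continue
        for i, inner in enumerate(record):
            # The first inner list starts a new record (0); later ones
            # start a new list at level1 (1).
            start_level = 0 if i == 0 else 1
            if not inner:
                # level1 is defined but level2 has no values.
                triples.append((None, start_level, 1))
                continue
            for j, value in enumerate(inner):
                # Further values in the same inner list repeat at level2 (2).
                repetition = start_level if j == 0 else 2
                triples.append((value, repetition, 2))
    return triples


records = [
    [["a", "b", "c"], ["d", "e", "f", "g"]],
    [["h"], ["i", "j"]],
]
levels = shred_nested_lists(records)
for triple in levels:
    print(triple)
```

The repetition levels for the ten values come out as 0, 2, 2, 1, 2, 2, 2, 0, 1, 2, which is enough to rebuild both records from the flat column. A definition level below 2 marks a null and tells the reader at which level the data stops.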
Numbers
Example: AppNexus
- 2 MM logs of ad impressions
- 270 TB of log data in Protobuf on HDFS
http://techblog.appnexus.com/blog/2015/03/31/parquet-columnar-storage-for-hadoop-data/
Simple benchmark with Hive
Disk Space usage on HDFS with 128 MB blocks
Slides shamelessly cloned from Julien Le Dem (@J_), lead of the Apache Parquet project