Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
MapReduce and Columnar DB's
Search
samant
April 02, 2014
Programming
0
1.4k
MapReduce and Columnar DB's
samant
April 02, 2014
Tweet
Share
More Decks by samant
See All by samant
Introduction to Firebase (May contain some pieces of AppGyver and Polymer)
samant
0
1.2k
Why Ruby on Rails? - Feweb - September 2014
samant
1
1.3k
Beloved JS - JavaScript…. What else?
samant
1
1.7k
WTF: OOCSS like a boss !
samant
7
1.5k
WTF: Document DBs
samant
0
2k
WTF: Rails App Templates
samant
2
2.8k
Other Decks in Programming
See All in Programming
採用事例の少ないSvelteを選んだ理由と それを正解にするためにやっていること
oekazuma
2
1k
「とりあえず動く」コードはよい、「読みやすい」コードはもっとよい / Code that 'just works' is good, but code that is 'readable' is even better.
mkmk884
3
440
これが俺の”自分戦略” プロセスを楽しんでいこう! - Developers CAREER Boost 2024
niftycorp
PRO
0
190
情報漏洩させないための設計
kubotak
2
250
競技プログラミングへのお誘い@阪大BOOSTセミナー
kotamanegi
0
360
[JAWS-UG横浜 #76] イケてるアップデートを宇宙いち早く紹介するよ!
maroon1st
0
460
テストコードのガイドライン 〜作成から運用まで〜
riku929hr
5
610
【re:Growth 2024】 Aurora DSQL をちゃんと話します!
maroon1st
0
780
Scalaから始めるOpenFeature入門 / Scalaわいわい勉強会 #4
arthur1
1
330
useSyncExternalStoreを使いまくる
ssssota
6
1.1k
menu基盤チームによるGoogle Cloudの活用事例~Application Integration, Cloud Tasks編~
yoshifumi_ishikura
0
110
暇に任せてProxmoxコンソール 作ってみました
karugamo
2
720
Featured
See All Featured
Building a Scalable Design System with Sketch
lauravandoore
460
33k
Imperfection Machines: The Place of Print at Facebook
scottboms
266
13k
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
251
21k
Understanding Cognitive Biases in Performance Measurement
bluesmoon
26
1.5k
GraphQLの誤解/rethinking-graphql
sonatard
67
10k
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
48
2.2k
Chrome DevTools: State of the Union 2024 - Debugging React & Beyond
addyosmani
2
170
Adopting Sorbet at Scale
ufuk
73
9.1k
Thoughts on Productivity
jonyablonski
67
4.4k
Mobile First: as difficult as doing things right
swwweet
222
9k
Sharpening the Axe: The Primacy of Toolmaking
bcantrill
38
1.9k
Reflections from 52 weeks, 52 projects
jeffersonlam
347
20k
Transcript
MapReduce and Columnar DB’s Amant Stéphane @stephamant
Summary • MapReduce • Columnar DB’s • Practical Use Case
MapReduce
MapReduce - Definition • One of Google’s greatest contributions to
computer science • MapReduce is an algorithmic framework for executing jobs in parallel over several nodes
MapReduce
MapReduce
MapReduce - Major Implementation • Almost always based on Hadoop
- a Framework for the storage and processing of large scaled and distributed data supported by Apache • Itself inspired by Google BigTable Project
Columnar DB’s
Columnar DB’s - Definition Columnar databases are so named because
the important aspect of their design is that data from a given column is stored together. (By contrast, a row-oriented database keeps information about a row together.) In column-oriented databases, adding columns is quite inexpensive.
Columnar DB’s - Definition
Columnar DB’s - Definition
Columnar DB’s - Queries get ‘t1′, ‘r1′, {COLUMN => ‘c1′}
get ‘t1′, ‘r1′, {COLUMN => ['c1', 'c2', 'c3']} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMERANGE => [ts1, ts2], VERSIONS => 4} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1, VERSIONS => 4}
Columnar DB’s - Major Implementation • Cassandra • Hypertable •
HBase
Columnar DBs - Supporting Companies • Facebook • Yahoo •
Ebay • Twitter • Amazon • Google • ...
Columnar DB’s - Pro’s • Horizontal scalability (replication and partitioning)
• Versioning is trivial • No real storage cost for null values • Used mainly for Big Data / data mining / Business Intelligence analysis
Columnar DB’s - Con’s • Complexity (Installation, infrastructure and usage)
• Design your schema based on how you plan to query the data • Some operations are really time expensive
Practical Use Case
Facebook Messaging Index Table Keyword #1 Keyword #2 Keyword #3
Keyword #... User ID #1 User ID #2 User ID #... Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id
References Seven Databases in Seven Weeks: A Guide to Modern
Databases and the NoSQL Movement by Eric Redmond and Jim R. Wilson
Thank you