Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
MapReduce and Columnar DB's
Search
samant
April 02, 2014
Programming
0
1.4k
MapReduce and Columnar DB's
samant
April 02, 2014
Tweet
Share
More Decks by samant
See All by samant
Introduction to Firebase (May contain some pieces of AppGyver and Polymer)
samant
0
1.2k
Why Ruby on Rails? - Feweb - September 2014
samant
1
1.3k
Beloved JS - JavaScript…. What else?
samant
1
1.7k
WTF: OOCSS like a boss !
samant
7
1.5k
WTF: Document DBs
samant
0
2k
WTF: Rails App Templates
samant
2
2.8k
Other Decks in Programming
See All in Programming
TypeScriptでライブラリとの依存を限定的にする方法
tutinoko
3
690
CSC509 Lecture 11
javiergs
PRO
0
180
CSC509 Lecture 12
javiergs
PRO
0
160
どうして僕の作ったクラスが手続き型と言われなきゃいけないんですか
akikogoto
1
120
Jakarta EE meets AI
ivargrimstad
0
130
OnlineTestConf: Test Automation Friend or Foe
maaretp
0
110
色々なIaCツールを実際に触って比較してみる
iriikeita
0
330
macOS でできる リアルタイム動画像処理
biacco42
9
2.4k
Micro Frontends Unmasked Opportunities, Challenges, Alternatives
manfredsteyer
PRO
0
110
RubyLSPのマルチバイト文字対応
notfounds
0
120
ふかぼれ!CSSセレクターモジュール / Fukabore! CSS Selectors Module
petamoriken
0
150
Jakarta EE meets AI
ivargrimstad
0
600
Featured
See All Featured
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
280
13k
JavaScript: Past, Present, and Future - NDC Porto 2020
reverentgeek
47
5k
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
250
21k
We Have a Design System, Now What?
morganepeng
50
7.2k
Designing for Performance
lara
604
68k
Gamification - CAS2011
davidbonilla
80
5k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
246
1.3M
Reflections from 52 weeks, 52 projects
jeffersonlam
346
20k
It's Worth the Effort
3n
183
27k
YesSQL, Process and Tooling at Scale
rocio
169
14k
Side Projects
sachag
452
42k
Intergalactic Javascript Robots from Outer Space
tanoku
269
27k
Transcript
MapReduce and Columnar DB’s Amant Stéphane @stephamant
Summary • MapReduce • Columnar DB’s • Practical Use Case
MapReduce
MapReduce - Definition • One of Google’s greatest contributions to
computer science • MapReduce is an algorithmic framework for executing jobs in parallel over several nodes
MapReduce
MapReduce
MapReduce - Major Implementation • Almost always based on Hadoop
- a Framework for the storage and processing of large scaled and distributed data supported by Apache • Itself inspired by Google BigTable Project
Columnar DB’s
Columnar DB’s - Definition Columnar databases are so named because
the important aspect of their design is that data from a given column is stored together. (By contrast, a row-oriented database keeps information about a row together.) In column-oriented databases, adding columns is quite inexpensive.
Columnar DB’s - Definition
Columnar DB’s - Definition
Columnar DB’s - Queries get ‘t1′, ‘r1′, {COLUMN => ‘c1′}
get ‘t1′, ‘r1′, {COLUMN => ['c1', 'c2', 'c3']} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMERANGE => [ts1, ts2], VERSIONS => 4} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1, VERSIONS => 4}
Columnar DB’s - Major Implementation • Cassandra • Hypertable •
HBase
Columnar DBs - Supporting Companies • Facebook • Yahoo •
Ebay • Twitter • Amazon • Google • ...
Columnar DB’s - Pro’s • Horizontal scalability (replication and partitioning)
• Versioning is trivial • No real storage cost for null values • Used mainly for Big Data / data mining / Business Intelligence analysis
Columnar DB’s - Con’s • Complexity (Installation, infrastructure and usage)
• Design your schema based on how you plan to query the data • Some operations are really time expensive
Practical Use Case
Facebook Messaging Index Table Keyword #1 Keyword #2 Keyword #3
Keyword #... User ID #1 User ID #2 User ID #... Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id
References Seven Databases in Seven Weeks: A Guide to Modern
Databases and the NoSQL Movement by Eric Redmond and Jim R. Wilson
Thank you