Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
MapReduce and Columnar DB's
Search
samant
April 02, 2014
Programming
0
1.4k
MapReduce and Columnar DB's
samant
April 02, 2014
Tweet
Share
More Decks by samant
See All by samant
Introduction to Firebase (May contain some pieces of AppGyver and Polymer)
samant
0
1.2k
Why Ruby on Rails? - Feweb - September 2014
samant
1
1.4k
Beloved JS - JavaScript…. What else?
samant
1
1.7k
WTF: OOCSS like a boss !
samant
7
1.5k
WTF: Document DBs
samant
0
2k
WTF: Rails App Templates
samant
2
2.8k
Other Decks in Programming
See All in Programming
MCPで実現するAIエージェント駆動のNext.jsアプリデバッグ手法
nyatinte
6
840
モバイルアプリからWebへの横展開を加速した話_Claude_Code_実践術.pdf
kazuyasakamoto
0
260
CSC305 Summer Lecture 12
javiergs
PRO
0
110
実践 Dev Containers × Claude Code
touyu
1
240
開発チーム・開発組織の設計改善スキルの向上
masuda220
PRO
13
7.5k
Introduction to Git & GitHub
latte72
0
120
『リコリス・リコイル』に学ぶ!! 〜キャリア戦略における計画的偶発性理論と変わる勇気の重要性〜
wanko_it
1
590
技術的負債で信頼性が限界だったWordPress運用をShifterで完全復活させた話
rvirus0817
1
2.1k
TROCCO×dbtで実現する人にもAIにもやさしいデータ基盤
nealle
0
330
新世界の理解
koriym
0
140
The state patternの実践 個人開発で培ったpractice集
miyanokomiya
0
150
「リーダーは意思決定する人」って本当?~ 学びを現場で活かす、リーダー4ヶ月目の試行錯誤 ~
marina1017
0
240
Featured
See All Featured
Practical Tips for Bootstrapping Information Extraction Pipelines
honnibal
PRO
23
1.4k
Into the Great Unknown - MozCon
thekraken
40
2k
Music & Morning Musume
bryan
46
6.7k
Learning to Love Humans: Emotional Interface Design
aarron
273
40k
Automating Front-end Workflow
addyosmani
1370
200k
4 Signs Your Business is Dying
shpigford
184
22k
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
110
20k
Docker and Python
trallard
45
3.5k
YesSQL, Process and Tooling at Scale
rocio
173
14k
Statistics for Hackers
jakevdp
799
220k
For a Future-Friendly Web
brad_frost
179
9.9k
Building a Modern Day E-commerce SEO Strategy
aleyda
43
7.5k
Transcript
MapReduce and Columnar DB’s Amant Stéphane @stephamant
Summary • MapReduce • Columnar DB’s • Practical Use Case
MapReduce
MapReduce - Definition • One of Google’s greatest contributions to
computer science • MapReduce is an algorithmic framework for executing jobs in parallel over several nodes
MapReduce
MapReduce
MapReduce - Major Implementation • Almost always based on Hadoop
- a Framework for the storage and processing of large scaled and distributed data supported by Apache • Itself inspired by Google BigTable Project
Columnar DB’s
Columnar DB’s - Definition Columnar databases are so named because
the important aspect of their design is that data from a given column is stored together. (By contrast, a row-oriented database keeps information about a row together.) In column-oriented databases, adding columns is quite inexpensive.
Columnar DB’s - Definition
Columnar DB’s - Definition
Columnar DB’s - Queries get ‘t1′, ‘r1′, {COLUMN => ‘c1′}
get ‘t1′, ‘r1′, {COLUMN => ['c1', 'c2', 'c3']} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMERANGE => [ts1, ts2], VERSIONS => 4} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1, VERSIONS => 4}
Columnar DB’s - Major Implementation • Cassandra • Hypertable •
HBase
Columnar DBs - Supporting Companies • Facebook • Yahoo •
Ebay • Twitter • Amazon • Google • ...
Columnar DB’s - Pro’s • Horizontal scalability (replication and partitioning)
• Versioning is trivial • No real storage cost for null values • Used mainly for Big Data / data mining / Business Intelligence analysis
Columnar DB’s - Con’s • Complexity (Installation, infrastructure and usage)
• Design your schema based on how you plan to query the data • Some operations are really time expensive
Practical Use Case
Facebook Messaging Index Table Keyword #1 Keyword #2 Keyword #3
Keyword #... User ID #1 User ID #2 User ID #... Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id
References Seven Databases in Seven Weeks: A Guide to Modern
Databases and the NoSQL Movement by Eric Redmond and Jim R. Wilson
Thank you