Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
MapReduce and Columnar DB's
Search
samant
April 02, 2014
Programming
0
1.4k
MapReduce and Columnar DB's
samant
April 02, 2014
Tweet
Share
More Decks by samant
See All by samant
Introduction to Firebase (May contain some pieces of AppGyver and Polymer)
samant
0
1.2k
Why Ruby on Rails? - Feweb - September 2014
samant
1
1.4k
Beloved JS - JavaScript…. What else?
samant
1
1.7k
WTF: OOCSS like a boss !
samant
7
1.5k
WTF: Document DBs
samant
0
2k
WTF: Rails App Templates
samant
2
2.8k
Other Decks in Programming
See All in Programming
Foundation Modelsを実装日本語学習アプリを作ってみた!
hypebeans
0
110
The Past, Present, and Future of Enterprise Java
ivargrimstad
0
360
Devoxx BE - Local Development in the AI Era
kdubois
0
130
Software Architecture
hschwentner
6
2.3k
そのpreloadは必要?見過ごされたpreloadが技術的負債として爆発した日
mugitti9
2
3.4k
Writing Better Go: Lessons from 10 Code Reviews
konradreiche
0
1.2k
バッチ処理を「状態の記録」から「事実の記録」へ
panda728
PRO
0
160
10年もののAPIサーバーにおけるCI/CDの改善の奮闘
mbook
0
820
組込みだけじゃない!TinyGo で始める無料クラウド開発入門
otakakot
0
260
スマホから Youtube Shortsを見られないようにする
lemolatoon
27
32k
Building, Deploying, and Monitoring Ruby Web Applications with Falcon (Kaigi on Rails 2025)
ioquatix
4
2.1k
ソフトウェア設計の実践的な考え方
masuda220
PRO
4
580
Featured
See All Featured
Large-scale JavaScript Application Architecture
addyosmani
514
110k
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
4k
Fashionably flexible responsive web design (full day workshop)
malarkey
407
66k
Why You Should Never Use an ORM
jnunemaker
PRO
59
9.6k
Product Roadmaps are Hard
iamctodd
PRO
54
11k
Building Adaptive Systems
keathley
44
2.8k
Thoughts on Productivity
jonyablonski
70
4.9k
Sharpening the Axe: The Primacy of Toolmaking
bcantrill
45
2.5k
Code Review Best Practice
trishagee
72
19k
Helping Users Find Their Own Way: Creating Modern Search Experiences
danielanewman
30
2.9k
JavaScript: Past, Present, and Future - NDC Porto 2020
reverentgeek
52
5.6k
Git: the NoSQL Database
bkeepers
PRO
431
66k
Transcript
MapReduce and Columnar DB’s Amant Stéphane @stephamant
Summary • MapReduce • Columnar DB’s • Practical Use Case
MapReduce
MapReduce - Definition • One of Google’s greatest contributions to
computer science • MapReduce is an algorithmic framework for executing jobs in parallel over several nodes
MapReduce
MapReduce
MapReduce - Major Implementation • Almost always based on Hadoop
- a Framework for the storage and processing of large scaled and distributed data supported by Apache • Itself inspired by Google BigTable Project
Columnar DB’s
Columnar DB’s - Definition Columnar databases are so named because
the important aspect of their design is that data from a given column is stored together. (By contrast, a row-oriented database keeps information about a row together.) In column-oriented databases, adding columns is quite inexpensive.
Columnar DB’s - Definition
Columnar DB’s - Definition
Columnar DB’s - Queries get ‘t1′, ‘r1′, {COLUMN => ‘c1′}
get ‘t1′, ‘r1′, {COLUMN => ['c1', 'c2', 'c3']} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMERANGE => [ts1, ts2], VERSIONS => 4} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1, VERSIONS => 4}
Columnar DB’s - Major Implementation • Cassandra • Hypertable •
HBase
Columnar DBs - Supporting Companies • Facebook • Yahoo •
Ebay • Twitter • Amazon • Google • ...
Columnar DB’s - Pro’s • Horizontal scalability (replication and partitioning)
• Versioning is trivial • No real storage cost for null values • Used mainly for Big Data / data mining / Business Intelligence analysis
Columnar DB’s - Con’s • Complexity (Installation, infrastructure and usage)
• Design your schema based on how you plan to query the data • Some operations are really time expensive
Practical Use Case
Facebook Messaging Index Table Keyword #1 Keyword #2 Keyword #3
Keyword #... User ID #1 User ID #2 User ID #... Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id
References Seven Databases in Seven Weeks: A Guide to Modern
Databases and the NoSQL Movement by Eric Redmond and Jim R. Wilson
Thank you