Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
MapReduce and Columnar DB's
Search
samant
April 02, 2014
Programming
0
1.4k
MapReduce and Columnar DB's
samant
April 02, 2014
Tweet
Share
More Decks by samant
See All by samant
Introduction to Firebase (May contain some pieces of AppGyver and Polymer)
samant
0
1.2k
Why Ruby on Rails? - Feweb - September 2014
samant
1
1.4k
Beloved JS - JavaScript…. What else?
samant
1
1.7k
WTF: OOCSS like a boss !
samant
7
1.5k
WTF: Document DBs
samant
0
2k
WTF: Rails App Templates
samant
2
2.8k
Other Decks in Programming
See All in Programming
クラシルを支える技術と組織
rakutek
0
190
Catch Up: Go Style Guide Update
andpad
0
180
After go func(): Goroutines Through a Beginner’s Eye
97vaibhav
0
230
株式会社 Sun terras カンパニーデック
sunterras
0
230
アメ車でサンノゼを走ってきたよ!
s_shimotori
0
140
NetworkXとGNNで学ぶグラフデータ分析入門〜複雑な関係性を解き明かすPythonの力〜
mhrtech
3
1k
ソフトウェア設計の実践的な考え方
masuda220
PRO
3
490
Pythonスレッドとは結局何なのか? CPython実装から見るNoGIL時代の変化
curekoshimizu
4
1.3k
AIで開発生産性を上げる個人とチームの取り組み
taniigo
0
130
Pull-Requestの内容を1クリックで動作確認可能にするワークフロー
natmark
2
460
プロダクト開発をAI 1stに変革する〜SaaS is dead時代で生き残るために〜 / AI 1st Product Development
kobakei
0
490
なぜGoのジェネリクスはこの形なのか? Featherweight Goが明かす設計の核心
ryotaros
7
1k
Featured
See All Featured
The Straight Up "How To Draw Better" Workshop
denniskardys
237
140k
How to train your dragon (web standard)
notwaldorf
96
6.3k
JavaScript: Past, Present, and Future - NDC Porto 2020
reverentgeek
52
5.6k
Code Review Best Practice
trishagee
72
19k
Bash Introduction
62gerente
615
210k
A better future with KSS
kneath
239
17k
Fantastic passwords and where to find them - at NoRuKo
philnash
52
3.4k
Java REST API Framework Comparison - PWX 2021
mraible
33
8.8k
Designing for humans not robots
tammielis
254
25k
Designing Dashboards & Data Visualisations in Web Apps
destraynor
231
53k
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
46
7.6k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
12
1.2k
Transcript
MapReduce and Columnar DB’s Amant Stéphane @stephamant
Summary • MapReduce • Columnar DB’s • Practical Use Case
MapReduce
MapReduce - Definition • One of Google’s greatest contributions to
computer science • MapReduce is an algorithmic framework for executing jobs in parallel over several nodes
MapReduce
MapReduce
MapReduce - Major Implementation • Almost always based on Hadoop
- a Framework for the storage and processing of large scaled and distributed data supported by Apache • Itself inspired by Google BigTable Project
Columnar DB’s
Columnar DB’s - Definition Columnar databases are so named because
the important aspect of their design is that data from a given column is stored together. (By contrast, a row-oriented database keeps information about a row together.) In column-oriented databases, adding columns is quite inexpensive.
Columnar DB’s - Definition
Columnar DB’s - Definition
Columnar DB’s - Queries get ‘t1′, ‘r1′, {COLUMN => ‘c1′}
get ‘t1′, ‘r1′, {COLUMN => ['c1', 'c2', 'c3']} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMERANGE => [ts1, ts2], VERSIONS => 4} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1, VERSIONS => 4}
Columnar DB’s - Major Implementation • Cassandra • Hypertable •
HBase
Columnar DBs - Supporting Companies • Facebook • Yahoo •
Ebay • Twitter • Amazon • Google • ...
Columnar DB’s - Pro’s • Horizontal scalability (replication and partitioning)
• Versioning is trivial • No real storage cost for null values • Used mainly for Big Data / data mining / Business Intelligence analysis
Columnar DB’s - Con’s • Complexity (Installation, infrastructure and usage)
• Design your schema based on how you plan to query the data • Some operations are really time expensive
Practical Use Case
Facebook Messaging Index Table Keyword #1 Keyword #2 Keyword #3
Keyword #... User ID #1 User ID #2 User ID #... Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id
References Seven Databases in Seven Weeks: A Guide to Modern
Databases and the NoSQL Movement by Eric Redmond and Jim R. Wilson
Thank you