Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
MapReduce and Columnar DB's
Search
samant
April 02, 2014
Programming
1.5k
0
Share
Embed
Copy iframe code
Copy JS code
Copy link
Start on current slide
MapReduce and Columnar DB's
samant
April 02, 2014
More Decks by samant
See All by samant
Introduction to Firebase (May contain some pieces of AppGyver and Polymer)
samant
0
1.3k
Why Ruby on Rails? - Feweb - September 2014
samant
1
1.4k
Beloved JS - JavaScript…. What else?
samant
1
1.7k
WTF: OOCSS like a boss !
samant
7
1.5k
WTF: Document DBs
samant
0
2k
WTF: Rails App Templates
samant
2
2.8k
Other Decks in Programming
See All in Programming
Developing with AI Agents — Codex, Claude Code & Cowork Practical Guide
x5gtrn
PRO
0
1.3k
Skillsは効率化、Agentsは"自分の拡張"——Builder時代のエージェント編成(CC Night 2026)
wemra
1
160
Javaの型とAI時代に型が大事な理由 / java types and type in AI era
kishida
2
150
不変条件と整合性境界—ビジネスが決める設計判断と実現パターン / Invariants and Consistency Boundaries
nrslib
14
5.8k
IBM Bobを活用したレガシーアプリの最新化
oniak3ibm
PRO
1
210
AI 輔助遺留系統現代化的經驗分享
jame2408
1
990
AIで効率化できた業務・日常
ochtum
0
140
PHPで使える日時の表現と、その知り方 #frontend_phpcon_do
o0h
PRO
0
260
AI 時代のソフトウェア設計の学び方
masuda220
PRO
29
13k
[2026年度第1回ORセミナー] 計画最適化ベンチャーと競技プログラミング人材
terryu16
0
270
Signal Forms: Details & Live Coding @enterJS 2026 in Mannheim
manfredsteyer
PRO
0
190
気圧・高度・GPSを記録&可視化するアプリ「Koudo」を作った話
hjmkth
1
320
Featured
See All Featured
Stop Working from a Prison Cell
hatefulcrawdad
274
21k
Intergalactic Javascript Robots from Outer Space
tanoku
273
27k
Discover your Explorer Soul
emna__ayadi
2
1.1k
Design and Strategy: How to Deal with People Who Don’t "Get" Design
morganepeng
133
19k
Into the Great Unknown - MozCon
thekraken
41
2.6k
Agile that works and the tools we love
rasmusluckow
331
22k
Designing for Performance
lara
611
70k
State of Search Keynote: SEO is Dead Long Live SEO
ryanjones
0
210
JAMstack: Web Apps at Ludicrous Speed - All Things Open 2022
reverentgeek
1
480
Hiding What from Whom? A Critical Review of the History of Programming languages for Music
tomoyanonymous
2
870
Between Models and Reality
mayunak
4
350
No one is an island. Learnings from fostering a developers community.
thoeni
21
3.8k
Transcript
MapReduce and Columnar DB’s Amant Stéphane @stephamant
Summary • MapReduce • Columnar DB’s • Practical Use Case
MapReduce
MapReduce - Definition • One of Google’s greatest contributions to
computer science • MapReduce is an algorithmic framework for executing jobs in parallel over several nodes
MapReduce
MapReduce
MapReduce - Major Implementation • Almost always based on Hadoop
- a Framework for the storage and processing of large scaled and distributed data supported by Apache • Itself inspired by Google BigTable Project
Columnar DB’s
Columnar DB’s - Definition Columnar databases are so named because
the important aspect of their design is that data from a given column is stored together. (By contrast, a row-oriented database keeps information about a row together.) In column-oriented databases, adding columns is quite inexpensive.
Columnar DB’s - Definition
Columnar DB’s - Definition
Columnar DB’s - Queries get ‘t1′, ‘r1′, {COLUMN => ‘c1′}
get ‘t1′, ‘r1′, {COLUMN => ['c1', 'c2', 'c3']} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMERANGE => [ts1, ts2], VERSIONS => 4} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1, VERSIONS => 4}
Columnar DB’s - Major Implementation • Cassandra • Hypertable •
HBase
Columnar DBs - Supporting Companies • Facebook • Yahoo •
Ebay • Twitter • Amazon • Google • ...
Columnar DB’s - Pro’s • Horizontal scalability (replication and partitioning)
• Versioning is trivial • No real storage cost for null values • Used mainly for Big Data / data mining / Business Intelligence analysis
Columnar DB’s - Con’s • Complexity (Installation, infrastructure and usage)
• Design your schema based on how you plan to query the data • Some operations are really time expensive
Practical Use Case
Facebook Messaging Index Table Keyword #1 Keyword #2 Keyword #3
Keyword #... User ID #1 User ID #2 User ID #... Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id
References Seven Databases in Seven Weeks: A Guide to Modern
Databases and the NoSQL Movement by Eric Redmond and Jim R. Wilson
Thank you