Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
MapReduce and Columnar DB's
Search
samant
April 02, 2014
Programming
0
1.4k
MapReduce and Columnar DB's
samant
April 02, 2014
Tweet
Share
More Decks by samant
See All by samant
Introduction to Firebase (May contain some pieces of AppGyver and Polymer)
samant
0
1.3k
Why Ruby on Rails? - Feweb - September 2014
samant
1
1.4k
Beloved JS - JavaScript…. What else?
samant
1
1.7k
WTF: OOCSS like a boss !
samant
7
1.5k
WTF: Document DBs
samant
0
2k
WTF: Rails App Templates
samant
2
2.8k
Other Decks in Programming
See All in Programming
Swift ConcurrencyでよりSwiftyに
yuukiw00w
0
270
GC言語のWasm化とComponent Modelサポートの実践と課題 - Scalaの場合
tanishiking
0
110
「抽象に依存せよ」が分からなかった新卒1年目の私が Goのインターフェースと和解するまで
kurogenki
0
120
エンジニアの「手元の自動化」を加速するn8n 2026.02.27
symy2co
0
160
AWS Infrastructure as Code の新機能 2025 総まとめ 〜SA 4人による怒涛のデモ祭り〜
konokenj
10
3.4k
Claude Code Skill入門
mayahoney
0
380
受け入れテスト駆動開発(ATDD)×AI駆動開発 AI時代のATDDの取り組み方を考える
kztakasaki
2
580
Docコメントで始める簡単ガードレール
keisukeikeda
1
120
SourceGeneratorのマーカー属性問題について
htkym
0
200
AIコードレビューの導入・運用と AI駆動開発における「AI4QA」の取り組みについて
hagevvashi
0
470
技術検証結果の整理と解析をAIに任せよう!
keisukeikeda
0
120
RubyとGoでゼロから作る証券システム: 高信頼性が求められるシステムのコードの外側にある設計と運用のリアル
free_world21
0
280
Featured
See All Featured
Claude Code のすすめ
schroneko
67
220k
The SEO Collaboration Effect
kristinabergwall1
0
390
Kristin Tynski - Automating Marketing Tasks With AI
techseoconnect
PRO
0
190
Code Review Best Practice
trishagee
74
20k
Public Speaking Without Barfing On Your Shoes - THAT 2023
reverentgeek
1
340
Ethics towards AI in product and experience design
skipperchong
2
220
Deep Space Network (abreviated)
tonyrice
0
90
コードの90%をAIが書く世界で何が待っているのか / What awaits us in a world where 90% of the code is written by AI
rkaga
60
42k
Money Talks: Using Revenue to Get Sh*t Done
nikkihalliwell
0
180
The innovator’s Mindset - Leading Through an Era of Exponential Change - McGill University 2025
jdejongh
PRO
1
120
jQuery: Nuts, Bolts and Bling
dougneiner
65
8.4k
The Organizational Zoo: Understanding Human Behavior Agility Through Metaphoric Constructive Conversations (based on the works of Arthur Shelley, Ph.D)
kimpetersen
PRO
0
270
Transcript
MapReduce and Columnar DB’s Amant Stéphane @stephamant
Summary • MapReduce • Columnar DB’s • Practical Use Case
MapReduce
MapReduce - Definition • One of Google’s greatest contributions to
computer science • MapReduce is an algorithmic framework for executing jobs in parallel over several nodes
MapReduce
MapReduce
MapReduce - Major Implementation • Almost always based on Hadoop
- a Framework for the storage and processing of large scaled and distributed data supported by Apache • Itself inspired by Google BigTable Project
Columnar DB’s
Columnar DB’s - Definition Columnar databases are so named because
the important aspect of their design is that data from a given column is stored together. (By contrast, a row-oriented database keeps information about a row together.) In column-oriented databases, adding columns is quite inexpensive.
Columnar DB’s - Definition
Columnar DB’s - Definition
Columnar DB’s - Queries get ‘t1′, ‘r1′, {COLUMN => ‘c1′}
get ‘t1′, ‘r1′, {COLUMN => ['c1', 'c2', 'c3']} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMERANGE => [ts1, ts2], VERSIONS => 4} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1, VERSIONS => 4}
Columnar DB’s - Major Implementation • Cassandra • Hypertable •
HBase
Columnar DBs - Supporting Companies • Facebook • Yahoo •
Ebay • Twitter • Amazon • Google • ...
Columnar DB’s - Pro’s • Horizontal scalability (replication and partitioning)
• Versioning is trivial • No real storage cost for null values • Used mainly for Big Data / data mining / Business Intelligence analysis
Columnar DB’s - Con’s • Complexity (Installation, infrastructure and usage)
• Design your schema based on how you plan to query the data • Some operations are really time expensive
Practical Use Case
Facebook Messaging Index Table Keyword #1 Keyword #2 Keyword #3
Keyword #... User ID #1 User ID #2 User ID #... Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id
References Seven Databases in Seven Weeks: A Guide to Modern
Databases and the NoSQL Movement by Eric Redmond and Jim R. Wilson
Thank you