Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
MapReduce and Columnar DB's
Search
samant
April 02, 2014
Programming
0
1.4k
MapReduce and Columnar DB's
samant
April 02, 2014
Tweet
Share
More Decks by samant
See All by samant
Introduction to Firebase (May contain some pieces of AppGyver and Polymer)
samant
0
1.2k
Why Ruby on Rails? - Feweb - September 2014
samant
1
1.3k
Beloved JS - JavaScript…. What else?
samant
1
1.7k
WTF: OOCSS like a boss !
samant
7
1.5k
WTF: Document DBs
samant
0
2k
WTF: Rails App Templates
samant
2
2.8k
Other Decks in Programming
See All in Programming
iOS 26にアップデートすると実機でのHot Reloadができない?
umigishiaoi
0
130
Deep Dive into ~/.claude/projects
hiragram
14
2.6k
AI時代のソフトウェア開発を考える(2025/07版) / Agentic Software Engineering Findy 2025-07 Edition
twada
PRO
93
31k
Advanced Micro Frontends: Multi Version/ Framework Scenarios @WAD 2025, Berlin
manfredsteyer
PRO
0
280
코딩 에이전트 체크리스트: Claude Code ver.
nacyot
0
690
なぜ適用するか、移行して理解するClean Architecture 〜構造を超えて設計を継承する〜 / Why Apply, Migrate and Understand Clean Architecture - Inherit Design Beyond Structure
seike460
PRO
3
780
10 Costly Database Performance Mistakes (And How To Fix Them)
andyatkinson
0
440
“いい感じ“な定量評価を求めて - Four Keysとアウトカムの間の探求 -
nealle
2
11k
ふつうの技術スタックでアート作品を作ってみる
akira888
1
900
PHPで始める振る舞い駆動開発(Behaviour-Driven Development)
ohmori_yusuke
2
400
Startups on Rails in Past, Present and Future–Irina Nazarova, RailsConf 2025
irinanazarova
0
140
なんとなくわかった気になるブロックテーマ入門/contents.nagoya 2025 6.28
chiilog
1
280
Featured
See All Featured
The MySQL Ecosystem @ GitHub 2015
samlambert
251
13k
Fireside Chat
paigeccino
37
3.5k
GraphQLの誤解/rethinking-graphql
sonatard
71
11k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
138
34k
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
31
1.3k
Raft: Consensus for Rubyists
vanstee
140
7k
Docker and Python
trallard
44
3.5k
Being A Developer After 40
akosma
90
590k
Documentation Writing (for coders)
carmenintech
72
4.9k
10 Git Anti Patterns You Should be Aware of
lemiorhan
PRO
656
60k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
229
22k
Designing Experiences People Love
moore
142
24k
Transcript
MapReduce and Columnar DB’s Amant Stéphane @stephamant
Summary • MapReduce • Columnar DB’s • Practical Use Case
MapReduce
MapReduce - Definition • One of Google’s greatest contributions to
computer science • MapReduce is an algorithmic framework for executing jobs in parallel over several nodes
MapReduce
MapReduce
MapReduce - Major Implementation • Almost always based on Hadoop
- a Framework for the storage and processing of large scaled and distributed data supported by Apache • Itself inspired by Google BigTable Project
Columnar DB’s
Columnar DB’s - Definition Columnar databases are so named because
the important aspect of their design is that data from a given column is stored together. (By contrast, a row-oriented database keeps information about a row together.) In column-oriented databases, adding columns is quite inexpensive.
Columnar DB’s - Definition
Columnar DB’s - Definition
Columnar DB’s - Queries get ‘t1′, ‘r1′, {COLUMN => ‘c1′}
get ‘t1′, ‘r1′, {COLUMN => ['c1', 'c2', 'c3']} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMERANGE => [ts1, ts2], VERSIONS => 4} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1, VERSIONS => 4}
Columnar DB’s - Major Implementation • Cassandra • Hypertable •
HBase
Columnar DBs - Supporting Companies • Facebook • Yahoo •
Ebay • Twitter • Amazon • Google • ...
Columnar DB’s - Pro’s • Horizontal scalability (replication and partitioning)
• Versioning is trivial • No real storage cost for null values • Used mainly for Big Data / data mining / Business Intelligence analysis
Columnar DB’s - Con’s • Complexity (Installation, infrastructure and usage)
• Design your schema based on how you plan to query the data • Some operations are really time expensive
Practical Use Case
Facebook Messaging Index Table Keyword #1 Keyword #2 Keyword #3
Keyword #... User ID #1 User ID #2 User ID #... Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id
References Seven Databases in Seven Weeks: A Guide to Modern
Databases and the NoSQL Movement by Eric Redmond and Jim R. Wilson
Thank you