Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
MapReduce and Columnar DB's
Search
Sponsored
·
Ship Features Fearlessly
Turn features on and off without deploys. Used by thousands of Ruby developers.
→
samant
April 02, 2014
Programming
1.5k
0
Share
MapReduce and Columnar DB's
samant
April 02, 2014
More Decks by samant
See All by samant
Introduction to Firebase (May contain some pieces of AppGyver and Polymer)
samant
0
1.3k
Why Ruby on Rails? - Feweb - September 2014
samant
1
1.4k
Beloved JS - JavaScript…. What else?
samant
1
1.7k
WTF: OOCSS like a boss !
samant
7
1.5k
WTF: Document DBs
samant
0
2k
WTF: Rails App Templates
samant
2
2.8k
Other Decks in Programming
See All in Programming
GoogleCloudとterraform完全に理解した
terisuke
1
200
Kingdom of the Machine
yui_knk
2
1.5k
サプライチェーン攻撃対策「層を重ねて落ちない壁」を10日間で組み上げた話 #TechLeadConf2026
kashewnuts
1
280
要はバランスからの卒業 #yumemi_grow
kajitack
0
160
Skillは並べた。動かなかった。契約で繋いだ。— 65個のSkillから、自走する開発サイクルへ
junholee
0
590
mruby on C#: From VM Implementation to Game Scripting (RubyKaigi 2026)
hadashia
2
1.8k
When benchmarks go bad - what I learned from measuring performance wrong
hollycummins
0
390
Agentic UI in the Frontend: Architectures with Open Standards @JAX 2026 in Mainz
manfredsteyer
PRO
0
110
AI時代になぜ書くのか
mutsumix
0
400
AI時代だからこそ「Bloc」を採用する価値があるのかもしれない
takuroabe
0
180
Augmenting AI with the Power of Jakarta EE
ivargrimstad
0
450
いつか誰かが、と思っていた フロントエンド刷新5年間の実践知
kiichisugihara
1
280
Featured
See All Featured
The Invisible Side of Design
smashingmag
302
52k
Navigating Weather and Climate Data
rabernat
0
190
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
31
10k
Bioeconomy Workshop: Dr. Julius Ecuru, Opportunities for a Bioeconomy in West Africa
akademiya2063
PRO
1
110
Navigating the moral maze — ethical principles for Al-driven product design
skipperchong
2
360
Measuring & Analyzing Core Web Vitals
bluesmoon
9
820
Sam Torres - BigQuery for SEOs
techseoconnect
PRO
0
260
Leo the Paperboy
mayatellez
7
1.8k
Effective software design: The role of men in debugging patriarchy in IT @ Voxxed Days AMS
baasie
0
350
What does AI have to do with Human Rights?
axbom
PRO
1
2.1k
Fireside Chat
paigeccino
42
3.9k
SEOcharity - Dark patterns in SEO and UX: How to avoid them and build a more ethical web
sarafernandez
0
180
Transcript
MapReduce and Columnar DB’s Amant Stéphane @stephamant
Summary • MapReduce • Columnar DB’s • Practical Use Case
MapReduce
MapReduce - Definition • One of Google’s greatest contributions to
computer science • MapReduce is an algorithmic framework for executing jobs in parallel over several nodes
MapReduce
MapReduce
MapReduce - Major Implementation • Almost always based on Hadoop
- a Framework for the storage and processing of large scaled and distributed data supported by Apache • Itself inspired by Google BigTable Project
Columnar DB’s
Columnar DB’s - Definition Columnar databases are so named because
the important aspect of their design is that data from a given column is stored together. (By contrast, a row-oriented database keeps information about a row together.) In column-oriented databases, adding columns is quite inexpensive.
Columnar DB’s - Definition
Columnar DB’s - Definition
Columnar DB’s - Queries get ‘t1′, ‘r1′, {COLUMN => ‘c1′}
get ‘t1′, ‘r1′, {COLUMN => ['c1', 'c2', 'c3']} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMERANGE => [ts1, ts2], VERSIONS => 4} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1, VERSIONS => 4}
Columnar DB’s - Major Implementation • Cassandra • Hypertable •
HBase
Columnar DBs - Supporting Companies • Facebook • Yahoo •
Ebay • Twitter • Amazon • Google • ...
Columnar DB’s - Pro’s • Horizontal scalability (replication and partitioning)
• Versioning is trivial • No real storage cost for null values • Used mainly for Big Data / data mining / Business Intelligence analysis
Columnar DB’s - Con’s • Complexity (Installation, infrastructure and usage)
• Design your schema based on how you plan to query the data • Some operations are really time expensive
Practical Use Case
Facebook Messaging Index Table Keyword #1 Keyword #2 Keyword #3
Keyword #... User ID #1 User ID #2 User ID #... Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id
References Seven Databases in Seven Weeks: A Guide to Modern
Databases and the NoSQL Movement by Eric Redmond and Jim R. Wilson
Thank you