Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
MapReduce and Columnar DB's
Search
Sponsored
·
SiteGround - Reliable hosting with speed, security, and support you can count on.
→
samant
April 02, 2014
Programming
1.4k
0
Share
MapReduce and Columnar DB's
samant
April 02, 2014
More Decks by samant
See All by samant
Introduction to Firebase (May contain some pieces of AppGyver and Polymer)
samant
0
1.3k
Why Ruby on Rails? - Feweb - September 2014
samant
1
1.4k
Beloved JS - JavaScript…. What else?
samant
1
1.7k
WTF: OOCSS like a boss !
samant
7
1.5k
WTF: Document DBs
samant
0
2k
WTF: Rails App Templates
samant
2
2.8k
Other Decks in Programming
See All in Programming
Agentic AI: Evolution oder Revolution
mobilelarson
PRO
0
230
ポーリング処理廃止によるイベント駆動アーキテクチャへの移行
seitarof
3
1.3k
生成 AI 時代のスナップショットテストってやつを見せてあげますよ(α版)
ojun9
0
330
Smarter Angular mit Transformers.js & Prompt API
christianliebel
PRO
1
110
へんな働き方
yusukebe
6
2.9k
AI 開発合宿を通して得た学び
niftycorp
PRO
0
190
AI活用のコスパを最大化する方法
ochtum
0
370
Java 21/25 Virtual Threads 소개
debop
0
320
Redox OS でのネームスペース管理と chroot の実現
isanethen
0
500
夢の無限スパゲッティ製造機 -実装篇- #phpstudy
o0h
PRO
0
190
野球解説AI Agentを開発してみた - 2026/02/27 LayerX社内LT会資料
shinyorke
PRO
0
390
20260315 AWSなんもわからん🥲
chiilog
2
180
Featured
See All Featured
GitHub's CSS Performance
jonrohan
1032
470k
Effective software design: The role of men in debugging patriarchy in IT @ Voxxed Days AMS
baasie
0
280
How STYLIGHT went responsive
nonsquared
100
6k
Navigating the Design Leadership Dip - Product Design Week Design Leaders+ Conference 2024
apolaine
0
260
How to Talk to Developers About Accessibility
jct
2
170
Avoiding the “Bad Training, Faster” Trap in the Age of AI
tmiket
0
110
How To Stay Up To Date on Web Technology
chriscoyier
790
250k
RailsConf 2023
tenderlove
30
1.4k
The World Runs on Bad Software
bkeepers
PRO
72
12k
Gemini Prompt Engineering: Practical Techniques for Tangible AI Outcomes
mfonobong
2
350
Designing for Performance
lara
611
70k
DBのスキルで生き残る技術 - AI時代におけるテーブル設計の勘所
soudai
PRO
64
53k
Transcript
MapReduce and Columnar DB’s Amant Stéphane @stephamant
Summary • MapReduce • Columnar DB’s • Practical Use Case
MapReduce
MapReduce - Definition • One of Google’s greatest contributions to
computer science • MapReduce is an algorithmic framework for executing jobs in parallel over several nodes
MapReduce
MapReduce
MapReduce - Major Implementation • Almost always based on Hadoop
- a Framework for the storage and processing of large scaled and distributed data supported by Apache • Itself inspired by Google BigTable Project
Columnar DB’s
Columnar DB’s - Definition Columnar databases are so named because
the important aspect of their design is that data from a given column is stored together. (By contrast, a row-oriented database keeps information about a row together.) In column-oriented databases, adding columns is quite inexpensive.
Columnar DB’s - Definition
Columnar DB’s - Definition
Columnar DB’s - Queries get ‘t1′, ‘r1′, {COLUMN => ‘c1′}
get ‘t1′, ‘r1′, {COLUMN => ['c1', 'c2', 'c3']} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMERANGE => [ts1, ts2], VERSIONS => 4} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1, VERSIONS => 4}
Columnar DB’s - Major Implementation • Cassandra • Hypertable •
HBase
Columnar DBs - Supporting Companies • Facebook • Yahoo •
Ebay • Twitter • Amazon • Google • ...
Columnar DB’s - Pro’s • Horizontal scalability (replication and partitioning)
• Versioning is trivial • No real storage cost for null values • Used mainly for Big Data / data mining / Business Intelligence analysis
Columnar DB’s - Con’s • Complexity (Installation, infrastructure and usage)
• Design your schema based on how you plan to query the data • Some operations are really time expensive
Practical Use Case
Facebook Messaging Index Table Keyword #1 Keyword #2 Keyword #3
Keyword #... User ID #1 User ID #2 User ID #... Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id
References Seven Databases in Seven Weeks: A Guide to Modern
Databases and the NoSQL Movement by Eric Redmond and Jim R. Wilson
Thank you