Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
MapReduce and Columnar DB's
Search
Sponsored
·
SiteGround - Reliable hosting with speed, security, and support you can count on.
→
samant
April 02, 2014
Programming
0
1.4k
MapReduce and Columnar DB's
samant
April 02, 2014
Tweet
Share
More Decks by samant
See All by samant
Introduction to Firebase (May contain some pieces of AppGyver and Polymer)
samant
0
1.3k
Why Ruby on Rails? - Feweb - September 2014
samant
1
1.4k
Beloved JS - JavaScript…. What else?
samant
1
1.7k
WTF: OOCSS like a boss !
samant
7
1.5k
WTF: Document DBs
samant
0
2k
WTF: Rails App Templates
samant
2
2.8k
Other Decks in Programming
See All in Programming
AI Agent の開発と運用を支える Durable Execution #AgentsInProd
izumin5210
7
2.3k
SourceGeneratorのススメ
htkym
0
190
AI時代のキャリアプラン「技術の引力」からの脱出と「問い」へのいざない / tech-gravity
minodriven
21
7.2k
Implementation Patterns
denyspoltorak
0
280
Oxlintはいいぞ
yug1224
5
1.3k
15年続くIoTサービスのSREエンジニアが挑む分散トレーシング導入
melonps
2
200
今こそ知るべき耐量子計算機暗号(PQC)入門 / PQC: What You Need to Know Now
mackey0225
3
370
Data-Centric Kaggle
isax1015
2
770
izumin5210のプロポーザルのネタ探し #tskaigi_msup
izumin5210
1
110
インターン生でもAuth0で認証基盤刷新が出来るのか
taku271
0
190
組織で育むオブザーバビリティ
ryota_hnk
0
170
疑似コードによるプロンプト記述、どのくらい正確に実行される?
kokuyouwind
0
380
Featured
See All Featured
Pawsitive SEO: Lessons from My Dog (and Many Mistakes) on Thriving as a Consultant in the Age of AI
davidcarrasco
0
64
Navigating the moral maze — ethical principles for Al-driven product design
skipperchong
2
240
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
37
6.3k
Building Applications with DynamoDB
mza
96
6.9k
Raft: Consensus for Rubyists
vanstee
141
7.3k
The MySQL Ecosystem @ GitHub 2015
samlambert
251
13k
The State of eCommerce SEO: How to Win in Today's Products SERPs - #SEOweek
aleyda
2
9.5k
Scaling GitHub
holman
464
140k
Max Prin - Stacking Signals: How International SEO Comes Together (And Falls Apart)
techseoconnect
PRO
0
84
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
234
17k
How to Grow Your eCommerce with AI & Automation
katarinadahlin
PRO
0
110
Paper Plane (Part 1)
katiecoart
PRO
0
4.2k
Transcript
MapReduce and Columnar DB’s Amant Stéphane @stephamant
Summary • MapReduce • Columnar DB’s • Practical Use Case
MapReduce
MapReduce - Definition • One of Google’s greatest contributions to
computer science • MapReduce is an algorithmic framework for executing jobs in parallel over several nodes
MapReduce
MapReduce
MapReduce - Major Implementation • Almost always based on Hadoop
- a Framework for the storage and processing of large scaled and distributed data supported by Apache • Itself inspired by Google BigTable Project
Columnar DB’s
Columnar DB’s - Definition Columnar databases are so named because
the important aspect of their design is that data from a given column is stored together. (By contrast, a row-oriented database keeps information about a row together.) In column-oriented databases, adding columns is quite inexpensive.
Columnar DB’s - Definition
Columnar DB’s - Definition
Columnar DB’s - Queries get ‘t1′, ‘r1′, {COLUMN => ‘c1′}
get ‘t1′, ‘r1′, {COLUMN => ['c1', 'c2', 'c3']} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMERANGE => [ts1, ts2], VERSIONS => 4} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1, VERSIONS => 4}
Columnar DB’s - Major Implementation • Cassandra • Hypertable •
HBase
Columnar DBs - Supporting Companies • Facebook • Yahoo •
Ebay • Twitter • Amazon • Google • ...
Columnar DB’s - Pro’s • Horizontal scalability (replication and partitioning)
• Versioning is trivial • No real storage cost for null values • Used mainly for Big Data / data mining / Business Intelligence analysis
Columnar DB’s - Con’s • Complexity (Installation, infrastructure and usage)
• Design your schema based on how you plan to query the data • Some operations are really time expensive
Practical Use Case
Facebook Messaging Index Table Keyword #1 Keyword #2 Keyword #3
Keyword #... User ID #1 User ID #2 User ID #... Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id
References Seven Databases in Seven Weeks: A Guide to Modern
Databases and the NoSQL Movement by Eric Redmond and Jim R. Wilson
Thank you