Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
MapReduce and Columnar DB's
Search
samant
April 02, 2014
Programming
1.5k
0
Share
MapReduce and Columnar DB's
samant
April 02, 2014
More Decks by samant
See All by samant
Introduction to Firebase (May contain some pieces of AppGyver and Polymer)
samant
0
1.3k
Why Ruby on Rails? - Feweb - September 2014
samant
1
1.4k
Beloved JS - JavaScript…. What else?
samant
1
1.7k
WTF: OOCSS like a boss !
samant
7
1.5k
WTF: Document DBs
samant
0
2k
WTF: Rails App Templates
samant
2
2.8k
Other Decks in Programming
See All in Programming
The Arts and Crafts of Work in the AI Era — Toward Mastery in Software Development
kuranuki
1
710
JavaDoc 再入門
nagise
0
260
AI時代のUIはどこへ行く?その2!
yusukebe
19
6.5k
IBM Bobを活用したレガシーアプリの最新化
oniak3ibm
PRO
1
160
LLM本来の能力を解き放つサンドボックス技術とAI民主化への適用
yukukotani
3
2.3k
CLIであることを活かしたGitHub Copilot CLI活用術 / GitHub Copilot CLI Pro Tips & Tricks
nao_mk2
1
1.2k
LLM Plugin for Node-REDの利用方法と開発について
404background
0
160
タクシーアプリ『GO』の バックエンド開発のおける AI利活用と若者のすべて
pyama86
3
1.8k
AIチームを指揮するOSS「TAKT」活用術 / How to Use “TAKT,” an OSS Tool for Orchestrating AI Teams
nrslib
6
800
dRuby over BLE
makicamel
2
300
作って学ぶ、 JSX (TSX) ランタイムの基本
syumai
7
1.5k
Migrations : C'est une question d'hygiène !
vinceamstoutz
0
3.2k
Featured
See All Featured
Organizational Design Perspectives: An Ontology of Organizational Design Elements
kimpetersen
PRO
1
720
Chrome DevTools: State of the Union 2024 - Debugging React & Beyond
addyosmani
10
1.2k
Darren the Foodie - Storyboard
khoart
PRO
3
3.4k
Rails Girls Zürich Keynote
gr2m
96
14k
10 Git Anti Patterns You Should be Aware of
lemiorhan
PRO
659
62k
Effective software design: The role of men in debugging patriarchy in IT @ Voxxed Days AMS
baasie
0
380
The AI Revolution Will Not Be Monopolized: How open-source beats economies of scale, even for LLMs
inesmontani
PRO
3
3.5k
The B2B funnel & how to create a winning content strategy
katarinadahlin
PRO
1
380
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
508
140k
How to train your dragon (web standard)
notwaldorf
97
6.7k
How to Grow Your eCommerce with AI & Automation
katarinadahlin
PRO
1
200
Jess Joyce - The Pitfalls of Following Frameworks
techseoconnect
PRO
1
160
Transcript
MapReduce and Columnar DB’s Amant Stéphane @stephamant
Summary • MapReduce • Columnar DB’s • Practical Use Case
MapReduce
MapReduce - Definition • One of Google’s greatest contributions to
computer science • MapReduce is an algorithmic framework for executing jobs in parallel over several nodes
MapReduce
MapReduce
MapReduce - Major Implementation • Almost always based on Hadoop
- a Framework for the storage and processing of large scaled and distributed data supported by Apache • Itself inspired by Google BigTable Project
Columnar DB’s
Columnar DB’s - Definition Columnar databases are so named because
the important aspect of their design is that data from a given column is stored together. (By contrast, a row-oriented database keeps information about a row together.) In column-oriented databases, adding columns is quite inexpensive.
Columnar DB’s - Definition
Columnar DB’s - Definition
Columnar DB’s - Queries get ‘t1′, ‘r1′, {COLUMN => ‘c1′}
get ‘t1′, ‘r1′, {COLUMN => ['c1', 'c2', 'c3']} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMERANGE => [ts1, ts2], VERSIONS => 4} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1, VERSIONS => 4}
Columnar DB’s - Major Implementation • Cassandra • Hypertable •
HBase
Columnar DBs - Supporting Companies • Facebook • Yahoo •
Ebay • Twitter • Amazon • Google • ...
Columnar DB’s - Pro’s • Horizontal scalability (replication and partitioning)
• Versioning is trivial • No real storage cost for null values • Used mainly for Big Data / data mining / Business Intelligence analysis
Columnar DB’s - Con’s • Complexity (Installation, infrastructure and usage)
• Design your schema based on how you plan to query the data • Some operations are really time expensive
Practical Use Case
Facebook Messaging Index Table Keyword #1 Keyword #2 Keyword #3
Keyword #... User ID #1 User ID #2 User ID #... Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id
References Seven Databases in Seven Weeks: A Guide to Modern
Databases and the NoSQL Movement by Eric Redmond and Jim R. Wilson
Thank you