MapReduce and Columnar DB's
samant
April 02, 2014
Transcript
MapReduce and Columnar DBs
Amant Stéphane (@stephamant)
Summary
• MapReduce
• Columnar DBs
• Practical Use Case
MapReduce
MapReduce - Definition
• One of Google’s greatest contributions to computer science
• MapReduce is an algorithmic framework for executing jobs in parallel over several nodes
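
The map, group, and reduce steps can be sketched on a single machine as a word count. This is a minimal illustration only: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and not part of Hadoop or any other framework, and a real cluster would run the map and reduce calls on different nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key, as a framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each map call is independent of the others, which is what makes
    # the work easy to spread over several nodes.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["the quick brown fox", "the lazy dog", "The fox"])
```

Here counts maps each distinct lowercase word to the number of times it appears across the three documents.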
MapReduce - Major Implementation
• Almost always based on Hadoop, an Apache-supported framework for the storage and processing of large-scale, distributed data
• Hadoop itself was inspired by Google’s MapReduce and Google File System papers (BigTable inspired HBase)
Columnar DBs
Columnar DBs - Definition
Columnar databases are so named because the important aspect of their design is that data from a given column is stored together. (By contrast, a row-oriented database keeps information about a row together.) In column-oriented databases, adding columns is quite inexpensive.
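
The difference between the two layouts can be sketched in memory. This is a simplified illustration, not how any particular database stores data on disk; the sample records and the helper names (to_columns, add_column) are hypothetical.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "name": "alice", "age": 30},
    {"id": 2, "name": "bob", "age": 25},
    {"id": 3, "name": "carol", "age": 41},
]


def to_columns(records):
    """Pivot row-oriented records into one list per column."""
    columns = {}
    for index, record in enumerate(records):
        for name, value in record.items():
            # A column first seen at this row is padded with None for earlier rows.
            columns.setdefault(name, [None] * index).append(value)
        for values in columns.values():
            # Columns missing from this record get a None placeholder.
            if len(values) <= index:
                values.append(None)
    return columns


def add_column(columns, name, default=None):
    """Adding a column creates one new list and leaves the existing ones untouched."""
    row_count = len(next(iter(columns.values()), []))
    columns[name] = [default] * row_count
    return columns


columns = to_columns(rows)
# Reading one column touches only that column's values.
average_age = sum(columns["age"]) / len(columns["age"])
add_column(columns, "email")
```

Computing average_age reads a single list, whereas the row layout would need to visit every record; add_column shows why adding a column does not require rewriting existing data in this model.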
Columnar DBs - Queries
get 't1', 'r1', {COLUMN => 'c1'}
get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
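
What these queries select can be sketched with a toy model in which each cell holds several timestamped versions. This is not the HBase API: the get function below and its parameters are hypothetical stand-ins for the COLUMN, TIMESTAMP, TIMERANGE and VERSIONS options, and the half-open time range is an assumption of the model.

```python
def get(table, row, column, timestamp=None, timerange=None, versions=1):
    """Return up to `versions` (timestamp, value) pairs for one cell, newest first."""
    cell = table.get(row, {}).get(column, {})
    matches = sorted(cell.items(), reverse=True)
    if timestamp is not None:
        # Keep only the version written at exactly this timestamp.
        matches = [(ts, value) for ts, value in matches if ts == timestamp]
    if timerange is not None:
        start, end = timerange
        # Assumption of this toy model: start is inclusive, end is exclusive.
        matches = [(ts, value) for ts, value in matches if start <= ts < end]
    return matches[:versions]


# table -> row key -> column -> {timestamp: value}
t1 = {"r1": {"c1": {100: "v1", 200: "v2", 300: "v3"}}}

latest = get(t1, "r1", "c1")
at_ts = get(t1, "r1", "c1", timestamp=200)
in_range = get(t1, "r1", "c1", timerange=(100, 300), versions=4)
```

With no options the newest version is returned; a timestamp pins one version, and a time range combined with versions returns several versions, newest first.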
Columnar DBs - Major Implementations
• Cassandra
• Hypertable
• HBase
Columnar DBs - Supporting Companies
• Facebook
• Yahoo
• eBay
• Twitter
• Amazon
• Google
• ...
Columnar DBs - Pros
• Horizontal scalability (replication and partitioning)
• Versioning is trivial
• No real storage cost for null values
• Used mainly for Big Data / data mining / Business Intelligence analysis
Columnar DBs - Cons
• Complexity (installation, infrastructure and usage)
• The schema must be designed around how you plan to query the data
• Some operations are very time-consuming
Practical Use Case
Facebook Messaging Index Table

             | Keyword #1           | Keyword #2           | Keyword #3           | Keyword #...
User ID #1   | Timestamp Message_id | Timestamp Message_id | Timestamp Message_id | Timestamp Message_id
User ID #2   | Timestamp Message_id | Timestamp Message_id | Timestamp Message_id | Timestamp Message_id
User ID #... | Timestamp Message_id | Timestamp Message_id | Timestamp Message_id | Timestamp Message_id
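
The layout of this index table (one row per user, one column per keyword, one timestamped message id per version) can be sketched as nested dictionaries. This is an illustration of the table shape only, not Facebook's implementation; the helper names index_message and search and the sample messages are hypothetical.

```python
import re
from collections import defaultdict

# index[user_id][keyword][timestamp] = message_id
index = defaultdict(lambda: defaultdict(dict))


def index_message(user_id, message_id, timestamp, text):
    """Record the message id under every distinct keyword in the text."""
    for keyword in set(re.findall(r"\w+", text.lower())):
        index[user_id][keyword][timestamp] = message_id


def search(user_id, keyword):
    """Return the ids of a user's messages containing the keyword, newest first."""
    versions = index.get(user_id, {}).get(keyword.lower(), {})
    return [versions[ts] for ts in sorted(versions, reverse=True)]


index_message("user1", "m1", 1000, "Lunch tomorrow?")
index_message("user1", "m2", 2000, "lunch at noon")
hits = search("user1", "lunch")
```

A search reads a single row and a single column, which reflects the advice to design the schema around the queries you plan to run.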
References
Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement, by Eric Redmond and Jim R. Wilson
Thank you