$30 off During Our Annual Pro Sale. View Details »
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
MapReduce and Columnar DB's
Search
samant
April 02, 2014
Programming
0
1.4k
MapReduce and Columnar DB's
samant
April 02, 2014
Tweet
Share
More Decks by samant
See All by samant
Introduction to Firebase (May contain some pieces of AppGyver and Polymer)
samant
0
1.3k
Why Ruby on Rails? - Feweb - September 2014
samant
1
1.4k
Beloved JS - JavaScript…. What else?
samant
1
1.7k
WTF: OOCSS like a boss !
samant
7
1.5k
WTF: Document DBs
samant
0
2k
WTF: Rails App Templates
samant
2
2.8k
Other Decks in Programming
See All in Programming
React Native New Architecture 移行実践報告
taminif
1
150
エディターってAIで操作できるんだぜ
kis9a
0
730
AIコーディングエージェント(Gemini)
kondai24
0
220
How Software Deployment tools have changed in the past 20 years
geshan
0
29k
LLMで複雑な検索条件アセットから脱却する!! 生成的検索インタフェースの設計論
po3rin
3
720
Cell-Based Architecture
larchanjo
0
120
令和最新版Android Studioで化石デバイス向けアプリを作る
arkw
0
400
Cap'n Webについて
yusukebe
0
130
TypeScriptで設計する 堅牢さとUXを両立した非同期ワークフローの実現
moeka__c
6
3k
TestingOsaka6_Ozono
o3
0
150
なあ兄弟、 余白の意味を考えてから UI実装してくれ!
ktcryomm
11
11k
手が足りない!兼業データエンジニアに必要だったアーキテクチャと立ち回り
zinkosuke
0
690
Featured
See All Featured
Building Adaptive Systems
keathley
44
2.9k
GraphQLの誤解/rethinking-graphql
sonatard
73
11k
Reflections from 52 weeks, 52 projects
jeffersonlam
355
21k
Optimising Largest Contentful Paint
csswizardry
37
3.5k
Creating an realtime collaboration tool: Agile Flush - .NET Oxford
marcduiker
35
2.3k
Producing Creativity
orderedlist
PRO
348
40k
Practical Tips for Bootstrapping Information Extraction Pipelines
honnibal
25
1.6k
Faster Mobile Websites
deanohume
310
31k
Optimizing for Happiness
mojombo
379
70k
The Power of CSS Pseudo Elements
geoffreycrofte
80
6.1k
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
359
30k
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
128
54k
Transcript
MapReduce and Columnar DB’s Amant Stéphane @stephamant
Summary • MapReduce • Columnar DB’s • Practical Use Case
MapReduce
MapReduce - Definition • One of Google’s greatest contributions to
computer science • MapReduce is an algorithmic framework for executing jobs in parallel over several nodes
MapReduce
MapReduce
MapReduce - Major Implementation • Almost always based on Hadoop
- a Framework for the storage and processing of large scaled and distributed data supported by Apache • Itself inspired by Google BigTable Project
Columnar DB’s
Columnar DB’s - Definition Columnar databases are so named because
the important aspect of their design is that data from a given column is stored together. (By contrast, a row-oriented database keeps information about a row together.) In column-oriented databases, adding columns is quite inexpensive.
Columnar DB’s - Definition
Columnar DB’s - Definition
Columnar DB’s - Queries get ‘t1′, ‘r1′, {COLUMN => ‘c1′}
get ‘t1′, ‘r1′, {COLUMN => ['c1', 'c2', 'c3']} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMERANGE => [ts1, ts2], VERSIONS => 4} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1, VERSIONS => 4}
Columnar DB’s - Major Implementation • Cassandra • Hypertable •
HBase
Columnar DBs - Supporting Companies • Facebook • Yahoo •
Ebay • Twitter • Amazon • Google • ...
Columnar DB’s - Pro’s • Horizontal scalability (replication and partitioning)
• Versioning is trivial • No real storage cost for null values • Used mainly for Big Data / data mining / Business Intelligence analysis
Columnar DB’s - Con’s • Complexity (Installation, infrastructure and usage)
• Design your schema based on how you plan to query the data • Some operations are really time expensive
Practical Use Case
Facebook Messaging Index Table Keyword #1 Keyword #2 Keyword #3
Keyword #... User ID #1 User ID #2 User ID #... Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id
References Seven Databases in Seven Weeks: A Guide to Modern
Databases and the NoSQL Movement by Eric Redmond and Jim R. Wilson
Thank you