Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
MapReduce and Columnar DB's
Search
samant
April 02, 2014
Programming
0
1.4k
MapReduce and Columnar DB's
samant
April 02, 2014
Tweet
Share
More Decks by samant
See All by samant
Introduction to Firebase (May contain some pieces of AppGyver and Polymer)
samant
0
1.2k
Why Ruby on Rails? - Feweb - September 2014
samant
1
1.3k
Beloved JS - JavaScript…. What else?
samant
1
1.7k
WTF: OOCSS like a boss !
samant
7
1.5k
WTF: Document DBs
samant
0
2k
WTF: Rails App Templates
samant
2
2.8k
Other Decks in Programming
See All in Programming
QA環境で誰でも自由自在に現在時刻を操って検証できるようにした話
kalibora
1
140
Findy Team+ Awardを受賞したかった!ベストプラクティス応募内容をふりかえり、開発生産性向上もふりかえる / Findy Team Plus Award BestPractice and DPE Retrospective 2024
honyanya
0
140
オニオンアーキテクチャを使って、 Unityと.NETでコードを共有する
soi013
0
370
GitHub CopilotでTypeScriptの コード生成するワザップ
starfish719
26
6k
ドメインイベント増えすぎ問題
h0r15h0
2
570
Package Traits
ikesyo
1
210
令和7年版 あなたが使ってよいフロントエンド機能とは
mugi_uno
10
5.2k
ISUCON14感想戦で85万点まで頑張ってみた
ponyo877
1
590
情報漏洩させないための設計
kubotak
5
1.3k
.NETでOBS Studio操作してみたけど…… / Operating OBS Studio by .NET
skasweb
0
120
chibiccをCILに移植した結果 (NGK2025S版)
kekyo
PRO
0
130
Асинхронность неизбежна: как мы проектировали сервис уведомлений
lamodatech
0
1.3k
Featured
See All Featured
Into the Great Unknown - MozCon
thekraken
34
1.6k
JavaScript: Past, Present, and Future - NDC Porto 2020
reverentgeek
47
5.1k
Testing 201, or: Great Expectations
jmmastey
41
7.2k
Faster Mobile Websites
deanohume
305
30k
Designing Dashboards & Data Visualisations in Web Apps
destraynor
230
52k
[RailsConf 2023] Rails as a piece of cake
palkan
53
5.1k
Building Flexible Design Systems
yeseniaperezcruz
328
38k
Navigating Team Friction
lara
183
15k
How to train your dragon (web standard)
notwaldorf
89
5.8k
Thoughts on Productivity
jonyablonski
68
4.4k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
248
1.3M
Fantastic passwords and where to find them - at NoRuKo
philnash
50
2.9k
Transcript
MapReduce and Columnar DB’s Amant Stéphane @stephamant
Summary • MapReduce • Columnar DB’s • Practical Use Case
MapReduce
MapReduce - Definition • One of Google’s greatest contributions to
computer science • MapReduce is an algorithmic framework for executing jobs in parallel over several nodes
MapReduce
MapReduce
MapReduce - Major Implementation • Almost always based on Hadoop
- a Framework for the storage and processing of large scaled and distributed data supported by Apache • Itself inspired by Google BigTable Project
Columnar DB’s
Columnar DB’s - Definition Columnar databases are so named because
the important aspect of their design is that data from a given column is stored together. (By contrast, a row-oriented database keeps information about a row together.) In column-oriented databases, adding columns is quite inexpensive.
Columnar DB’s - Definition
Columnar DB’s - Definition
Columnar DB’s - Queries get ‘t1′, ‘r1′, {COLUMN => ‘c1′}
get ‘t1′, ‘r1′, {COLUMN => ['c1', 'c2', 'c3']} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMERANGE => [ts1, ts2], VERSIONS => 4} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1, VERSIONS => 4}
Columnar DB’s - Major Implementation • Cassandra • Hypertable •
HBase
Columnar DBs - Supporting Companies • Facebook • Yahoo •
Ebay • Twitter • Amazon • Google • ...
Columnar DB’s - Pro’s • Horizontal scalability (replication and partitioning)
• Versioning is trivial • No real storage cost for null values • Used mainly for Big Data / data mining / Business Intelligence analysis
Columnar DB’s - Con’s • Complexity (Installation, infrastructure and usage)
• Design your schema based on how you plan to query the data • Some operations are really time expensive
Practical Use Case
Facebook Messaging Index Table Keyword #1 Keyword #2 Keyword #3
Keyword #... User ID #1 User ID #2 User ID #... Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id
References Seven Databases in Seven Weeks: A Guide to Modern
Databases and the NoSQL Movement by Eric Redmond and Jim R. Wilson
Thank you