MongoDB for Analytics
John Nunemaker
November 13, 2012
Presented at MongoChicago on November 13, 2012.
Transcript
GitHub
John Nunemaker, MongoChicago 2012, November 12, 2012
MongoDB for Analytics
A loving conversation with @jnunemaker
Background How hernias can be good for you
1 month of evenings and weekends
18 months since public launch
10-15 million page views per day
2.7 billion page views to date
13 tiny servers: 2 web, 6 app, 3 db, 2 queue
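The traffic figures above translate into an average request rate with a quick calculation. This sketch assumes traffic is spread evenly over 24 hours and evenly across the 6 app servers. Real traffic is not, so peak rates will be higher than these averages.

```ruby
# Back-of-envelope average request rate from daily page views.
# Assumes an even spread over the day; peaks will be higher.
SECONDS_PER_DAY = 24 * 60 * 60 # 86_400

def avg_per_second(per_day)
  per_day.fdiv(SECONDS_PER_DAY)
end

low  = avg_per_second(10_000_000) # about 115.7 req/s
high = avg_per_second(15_000_000) # about 173.6 req/s

# Even split across the 6 app servers listed on the slide (assumption).
app_servers = 6
puts format("cluster: %.1f-%.1f req/s", low, high)
puts format("per app server: %.1f-%.1f req/s", low / app_servers, high / app_servers)
```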
[Charts: requests/sec, ops/sec, CPU %, lock %]
Implementation How we do what we do
Doing It (mostly) Live No aggregate querying
    get('/track.gif') do
      track_service.record(...)
      TrackGif
    end
    class TrackService
      # Serialize the hit and push it onto the queue; no MongoDB work here.
      def record(attrs)
        message = MessagePack.pack(attrs)
        @client.set(@queue, message)
      end
    end
    class TrackProcessor
      # Worker loop: pull queued hits forever.
      def run
        loop { process }
      end

      def process
        record @client.get(@queue)
      end

      def record(message)
        attrs = MessagePack.unpack(message)
        Hit.record(attrs)
      end
    end
http://bit.ly/rt-kestrel
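The enqueue/dequeue split between `TrackService` and `TrackProcessor` can be exercised without Kestrel or MessagePack. This sketch rests on a few assumptions. `InMemoryQueueClient` is a hypothetical stand-in that mimics only the `set`/`get` calls used on the slides. JSON from the standard library replaces MessagePack. A bounded `drain` replaces the infinite `loop { process }` so the example terminates.

```ruby
require "json"

# Hypothetical stand-in for the Kestrel client used on the slides.
# set enqueues; get dequeues and returns nil when the queue is empty
# (assumed empty-queue behavior, chosen so the worker can stop).
class InMemoryQueueClient
  def initialize
    @queues = Hash.new { |h, k| h[k] = [] }
  end

  def set(queue, message)
    @queues[queue].push(message)
    true
  end

  def get(queue)
    @queues[queue].shift
  end
end

# Web side: serialize and enqueue. JSON stands in for MessagePack.
class TrackService
  def initialize(client, queue)
    @client = client
    @queue = queue
  end

  def record(attrs)
    @client.set(@queue, JSON.generate(attrs))
  end
end

# Worker side: dequeue one message and hand the attributes to a recorder.
class TrackProcessor
  def initialize(client, queue, recorder)
    @client = client
    @queue = queue
    @recorder = recorder
  end

  # Processes one message; returns false when the queue was empty.
  def process
    message = @client.get(@queue)
    return false if message.nil?
    @recorder.call(JSON.parse(message))
    true
  end

  # Bounded alternative to `loop { process }` for demos and tests.
  def drain
    count = 0
    count += 1 while process
    count
  end
end

client = InMemoryQueueClient.new
recorded = []
service = TrackService.new(client, "hits")
processor = TrackProcessor.new(client, "hits", ->(attrs) { recorded << attrs })

service.record({ "sid" => "abc", "screenx" => 1440 })
service.record({ "sid" => "abc", "screenx" => 1280 })
processor.drain # processes both queued hits
```

The point of the split is that the request path only pays for a serialize and a queue write. The MongoDB updates happen later in the worker.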
    class Hit
      # Fan one hit out to every stats collection.
      def record
        site.atomic_update(site_updates)
        Resolution.record(self)
        Technology.record(self)
        Location.record(self)
        Referrer.record(self)
        Content.record(self)
        Search.record(self)
        Notification.record(self)
        View.record(self)
      end
    end
    class Resolution
      def record(hit)
        query = {'_id' => "..."}
        # Counters keyed by hit.screenx (sx), hit.browserx (bx), hit.browsery (by).
        update = {'$inc' => {}}
        update['$inc']["sx.#{hit.screenx}"] = 1
        update['$inc']["bx.#{hit.browserx}"] = 1
        update['$inc']["by.#{hit.browsery}"] = 1
        # Collection is chosen per time frame from the hit's date.
        collection(hit.created_on)
          .update(query, update, :upsert => true)
      end
    end
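To make the write path concrete, here is a pure-Ruby model of what an upserted `$inc` like the one in `Resolution#record` does on the server. MongoDB's `$inc` creates missing fields at the increment value and creates embedded documents for dotted paths. With upsert, a missing document is created from the query's `_id`. This sketch rests on a few assumptions: a plain Hash plays the collection, `upsert_inc` is a hypothetical helper rather than driver API, and the `_id` value is invented because the slide elides it.

```ruby
# In-memory model of update(query, {'$inc' => ...}, :upsert => true).
# Not the Mongo driver API: a Hash keyed by _id plays the collection.
def upsert_inc(collection, query, update)
  id = query.fetch("_id")
  # Upsert: create the document from the query if it does not exist yet.
  doc = collection[id] ||= { "_id" => id }
  update.fetch("$inc").each do |path, amount|
    # "sx.1440" means field "1440" inside embedded document "sx".
    *parents, leaf = path.split(".")
    target = parents.reduce(doc) { |node, key| node[key] ||= {} }
    # $inc on a missing field behaves as if it started at 0.
    target[leaf] = target.fetch(leaf, 0) + amount
  end
  doc
end

resolutions = {}
query = { "_id" => "site1:2012-11-13" } # hypothetical _id; the slide elides it
upsert_inc(resolutions, query,
           { "$inc" => { "sx.1440" => 1, "bx.1280" => 1, "by.768" => 1 } })
upsert_inc(resolutions, query,
           { "$inc" => { "sx.1440" => 1, "bx.1024" => 1, "by.768" => 1 } })
resolutions["site1:2012-11-13"]["sx"] # sx.1440 is now 2
```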
Pros: Space, RAM, Reads, Live
Cons: Writes, Constraints, More Forethought, No raw data
http://bit.ly/rt-counters http://bit.ly/rt-counters2
Time Frame Minute, hour, month, day, year, forever?
# of Variations One document vs many
Single Document Per Time Frame
    {
      "t" => 336381,
      "u" => 158951,
      "2011" => {
        "02" => {
          "18" => { "t" => 9, "u" => 6 }
        }
      }
    }
    {
      '$inc' => {
        't' => 1,
        'u' => 1,
        '2011.02.18.t' => 1,
        '2011.02.18.u' => 1,
      }
    }
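An update like this one is easy to generate from a timestamp. Zero-padded `%m` and `%d` in dot notation produce exactly the nested `"2011" => { "02" => { "18" => ... } }` shape of the document above. `daily_inc` is a hypothetical helper. The `unique` flag is an assumption about when `u` is bumped, since the slide shows only an update that increments both counters.

```ruby
# Builds the $inc for one hit: all-time totals plus a per-day bucket.
# The unique: flag is an assumption (presumably "u" counts unique visitors).
def daily_inc(time, unique:)
  day = time.strftime("%Y.%m.%d") # zero-padded, e.g. "2011.02.18"
  inc = { "t" => 1, "#{day}.t" => 1 }
  if unique
    inc["u"] = 1
    inc["#{day}.u"] = 1
  end
  { "$inc" => inc }
end

daily_inc(Time.utc(2011, 2, 18), unique: true)
```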
Single Document For all ranges in time frame
    {
      "_id" => "...:10",
      "bx" => { "320" => 85, "480" => 318, "800" => 1938, "1024" => 5033,
                "1280" => 6288, "1440" => 2323, "1600" => 3817, "2000" => 137 },
      "by" => { "480" => 2205, "600" => 7359, "768" => 4515, "900" => 3833,
                "1024" => 2026 },
      "sx" => { "320" => 191, "480" => 179, "800" => 195, "1024" => 1059,
                "1280" => 5861, "1440" => 3533, "1600" => 7675, "2000" => 1279 }
    }
    { '$inc' => { 'sx.1440' => 1, 'bx.1280' => 1, 'by.768' => 1 } }
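Reads are the payoff for the single-document layout. One fetch returns every bucket, so a report is a sort plus a division. This sketch uses the `bx` (browser width) counts from the example document. `distribution` is a hypothetical helper and is not part of the deck.

```ruby
# Turns one embedded counter hash into [bucket, percent] pairs,
# most common bucket first, percentages rounded to one decimal.
def distribution(buckets)
  total = buckets.values.sum
  buckets
    .sort_by { |_width, count| -count }
    .map { |width, count| [width, (100.0 * count / total).round(1)] }
end

bx = { "320" => 85, "480" => 318, "800" => 1938, "1024" => 5033,
       "1280" => 6288, "1440" => 2323, "1600" => 3817, "2000" => 137 }
distribution(bx).first # => ["1280", 31.5]
```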
Many Documents Search terms, content, referrers...
    [
      { "_id" => "<oid>:<hash>", "t" => "ruby class variables",
        "sid" => BSON::ObjectId('<oid>'), "v" => 352 },
      { "_id" => "<oid>:<hash>", "t" => "ruby unless",
        "sid" => BSON::ObjectId('<oid>'), "v" => 347 },
    ]
Writes {'_id' => "#{sid}:#{hash}"}
Reads [['sid', 1], ['v', -1]]
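The many-documents pattern keys writes on site plus a hash of the term and reads with a compound index on `sid` ascending, `v` descending. This in-memory sketch rests on a few assumptions. The slide does not say which hash is used, so MD5 of the term is chosen here to give a fixed-length `_id`. A Hash stands in for the collection. The helpers `term_id`, `record_term`, and `top_terms` are hypothetical.

```ruby
require "digest/md5"

# One document per (site, term). MD5 of the term is an assumption.
def term_id(sid, term)
  "#{sid}:#{Digest::MD5.hexdigest(term)}"
end

# In-memory stand-in for an upsert with {'$inc' => {'v' => 1}}.
def record_term(store, sid, term)
  doc = store[term_id(sid, term)] ||= { "sid" => sid, "t" => term, "v" => 0 }
  doc["v"] += 1
end

# Equivalent of find by sid, sort v descending, limit. In MongoDB the
# compound index [['sid', 1], ['v', -1]] serves this without a sort step.
def top_terms(store, sid, limit)
  store.values
       .select { |doc| doc["sid"] == sid }
       .sort_by { |doc| -doc["v"] }
       .first(limit)
       .map { |doc| [doc["t"], doc["v"]] }
end

store = {}
3.times { record_term(store, "site1", "ruby unless") }
5.times { record_term(store, "site1", "ruby class variables") }
record_term(store, "site2", "ruby unless")
top_terms(store, "site1", 2) # => [["ruby class variables", 5], ["ruby unless", 3]]
```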
Growth Don’t say shard, don’t say shard...
Partition Hot Data Currently using collections for time frames
[ "content.2011.7", "content.2011.8", "content.2011.9", "content.2011.10", "content.2011.11", "content.2011.12", "content.2012.1", "content.2012.2", "content.2012.3", "content.2012.4", ]
[ "resolutions.2011", "resolutions.2012", ]
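The collection names above follow a simple pattern: monthly buckets for content with unpadded month numbers, and yearly buckets for resolutions. `collection_name` and `month_collections` are hypothetical helpers that reproduce those names. The second one lists the collections a multi-month report would have to read. Because hits are written by date, new writes presumably land only in the newest collections, which keeps the hot data small.

```ruby
require "date"

# Reproduces the slide's names: "content.2012.1", "resolutions.2012".
def collection_name(prefix, date, granularity)
  case granularity
  when :month then "#{prefix}.#{date.year}.#{date.month}"
  when :year  then "#{prefix}.#{date.year}"
  else raise ArgumentError, "unknown granularity: #{granularity}"
  end
end

# Monthly collections touched by a report from `from` through `to`.
def month_collections(prefix, from, to)
  names = []
  d = Date.new(from.year, from.month, 1)
  while d <= to
    names << collection_name(prefix, d, :month)
    d = d >> 1 # advance one calendar month
  end
  names
end

collection_name("content", Date.new(2012, 1, 15), :month)    # => "content.2012.1"
collection_name("resolutions", Date.new(2012, 1, 15), :year) # => "resolutions.2012"
```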
Move: BigintMove, MakeYouWannaMove, DaMove, SmoothMove, NightMove, DanceMove
Bigger, Faster Server More CPU, RAM, Disk Space
[Diagram: Users, Sites, Content, Referrers, Terms, Engines, Resolutions, Locations]
Partition by Function Spread writes across a few servers
Users Sites Content Referrers Terms Engines Resolutions Locations
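Partition by function can be as simple as a fixed map from collection to server. In this sketch the host names and the grouping of collections onto three servers are invented for illustration, because the slide lists only the collections. The base name is taken before the first dot, so time-partitioned names such as `content.2012.1` route with their family. `host_for` is a hypothetical helper.

```ruby
# Hypothetical function-to-server map; hosts and grouping are invented.
FUNCTION_MAP = {
  "db1.example.com" => %w[users sites],
  "db2.example.com" => %w[content referrers terms engines],
  "db3.example.com" => %w[resolutions locations],
}.freeze

# Invert to collection => host for direct lookup.
HOST_FOR = FUNCTION_MAP.each_with_object({}) { |(host, names), map|
  names.each { |name| map[name] = host }
}.freeze

def host_for(collection)
  base = collection.split(".").first # "content.2012.1" -> "content"
  HOST_FOR.fetch(base) { raise KeyError, "no server for #{collection}" }
end

host_for("content.2012.1")   # => "db2.example.com"
host_for("resolutions.2012") # => "db3.example.com"
```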
Partition by Server: spread writes across a ton of servers, way down the road, not worried yet
GitHub
Thank you!
John Nunemaker, MongoChicago 2012, November 12, 2012
@jnunemaker