Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
MongoDB for Analytics
Search
John Nunemaker
PRO
May 04, 2012
Programming
2.3k
21
Share
Embed
Copy iframe code
Copy JS code
Copy link
Start on current slide
MongoDB for Analytics
Presented at MongoSF on May 4th, 2012.
John Nunemaker
PRO
May 04, 2012
More Decks by John Nunemaker
See All by John Nunemaker
Remote First: Building Distributed Teams that Win
jnunemaker
PRO
1
160
AI: The stuff that nobody shows you
jnunemaker
PRO
8
720
Atom
jnunemaker
PRO
10
5.1k
MongoDB for Analytics
jnunemaker
PRO
11
1.1k
Addicted to Stable
jnunemaker
PRO
32
2.9k
MongoDB for Analytics
jnunemaker
PRO
16
30k
Why You Should Never Use an ORM
jnunemaker
PRO
61
9.9k
Why NoSQL?
jnunemaker
PRO
10
1k
Don't Repeat Yourself, Repeat Others
jnunemaker
PRO
7
3.5k
Other Decks in Programming
See All in Programming
技術記事、 専門家としてのプログラマ、 言語化
mizchi
13
6.1k
不変条件と整合性境界—ビジネスが決める設計判断と実現パターン / Invariants and Consistency Boundaries
nrslib
13
5.4k
そのテスト、説明できますか?~LWテスト戦略FW~のご紹介
nakahara
0
150
IBM Bobを活用したレガシーアプリの最新化
oniak3ibm
PRO
1
200
Strategic Design in the Frontend: Moduliths & Micro Frontends @DDDEurope
manfredsteyer
PRO
0
110
Make SRE Operations Easier with Azure SRE Agent
kkamegawa
0
6.6k
LLMによるContent Moderationの本番運用の裏側と品質担保への挑戦
suikabar
3
700
AI 時代のソフトウェア設計の学び方
masuda220
PRO
29
13k
軽量Java基盤の設計 DIコンテナに頼らない、長期保守と1秒起動の実現 JJUG CCC 2026 Spring
macha64
0
540
New "Type" system on PicoRuby
pocke
1
960
ユニットテストの先へ:テスト技法で要求・仕様を整理するJava開発実践 / Beyond_Unit_Testing_Practical_Java_Development_Techniques_for_Organizing_Requirements_and_Specifications
shimashima35
0
410
作って学ぶ、 JSX (TSX) ランタイムの基本
syumai
7
1.6k
Featured
See All Featured
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
31
2.8k
Jamie Indigo - Trashchat’s Guide to Black Boxes: Technical SEO Tactics for LLMs
techseoconnect
PRO
0
170
The Invisible Side of Design
smashingmag
302
52k
B2B Lead Gen: Tactics, Traps & Triumph
marketingsoph
0
150
Design of three-dimensional binary manipulators for pick-and-place task avoiding obstacles (IECON2024)
konakalab
0
460
Leading Effective Engineering Teams in the AI Era
addyosmani
9
2.1k
Building Flexible Design Systems
yeseniaperezcruz
330
40k
Done Done
chrislema
186
16k
Why Our Code Smells
bkeepers
PRO
340
58k
VelocityConf: Rendering Performance Case Studies
addyosmani
333
25k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
333
22k
Testing 201, or: Great Expectations
jmmastey
46
8.2k
Transcript
GitHub John Nunemaker MongoSF 2012 May 4, 2012 MongoDB for
Analytics A loving conversation with @jnunemaker
None
Background How hernias can be good for you
None
None
1 month Of evenings and weekends
1 year Since public launch
13 tiny servers 2 web, 6 app, 3 db, 2
queue
7-8 Million Page views per day
None
None
None
None
Implementation Imma show you how we do what we do
baby
Doing It (mostly) Live No aggregate querying
None
None
get('/track.gif') do track_service.record(...) TrackGif end
class TrackService def record(attrs) message = MessagePack.pack(attrs) @client.set(@queue, message) end
end
class TrackProcessor def run loop { process } end def
process record @client.get(@queue) end def record(message) attrs = MessagePack.unpack(message) Hit.record(attrs) end end
http://bit.ly/rt-kestrel
class Hit def record site.atomic_update(site_updates) Resolution.record(self) Technology.record(self) Location.record(self) Referrer.record(self) Content.record(self)
Search.record(self) Notification.record(self) View.record(self) end end
class Resolution def record(hit) query = {'_id' => "..."} update
= {'$inc' => {}} update['$inc']["sx.#{hit.screenx}"] = 1 update['$inc']["bx.#{hit.browserx}"] = 1 update['$inc']["by.#{hit.browsery}"] = 1 collection(hit.created_on) .update(query, update, :upsert => true) end end end
Pros
Pros Space
Pros Space RAM
Pros Space RAM Reads
Pros Space RAM Reads Live
Cons
Cons Writes
Cons Writes Constraints
Cons Writes Constraints More Forethought
Cons Writes Constraints More Forethought No raw data
http://bit.ly/rt-counters http://bit.ly/rt-counters2
Time Frame Minute, hour, month, day, year, forever?
# of Variations One document vs many
Single Document Per Time Frame
None
{ "t" => 336381, "u" => 158951, "2011" => {
"02" => { "18" => { "t" => 9, "u" => 6 } } } }
{ '$inc' => { 't' => 1, 'u' => 1,
'2011.02.18.t' => 1, '2011.02.18.u' => 1, } }
Single Document For all ranges in time frame
None
{ "_id" =>"...:10", "bx" => { "320" => 85, "480"
=> 318, "800" => 1938, "1024" => 5033, "1280" => 6288, "1440" => 2323, "1600" => 3817, "2000" => 137 }, "by" => { "480" => 2205, "600" => 7359,
"600" => 7359, "768" => 4515, "900" => 3833, "1024"
=> 2026 }, "sx" => { "320" => 191, "480" => 179, "800" => 195, "1024" => 1059, "1280" => 5861, "1440" => 3533, "1600" => 7675, "2000" => 1279 } }
{ '$inc' => { 'sx.1440' => 1, 'bx.1280' => 1,
'by.768' => 1, } }
Many Documents Search terms, content, referrers...
None
[ { "_id" => "<oid>:<hash>", "t" => "ruby class variables",
"sid" => BSON::ObjectId('<oid>'), "v" => 352 }, { "_id" => "<oid>:<hash>", "t" => "ruby unless", "sid" => BSON::ObjectId('<oid>'), "v" => 347 }, ]
Writes {'_id' => "#{sid}:#{hash}"}
Reads [['sid', 1], ['v', -1]]
Growth Don’t say shard, don’t say shard...
Partition Hot Data Currently using collections for time frames
Bigger, Faster Server More CPU, RAM, Disk Space
Users Sites Content Referrers Terms Engines Resolutions Locations Users Sites
Content Referrers Terms Engines Resolutions Locations
Partition by Function Spread writes across a few servers
Users Sites Content Referrers Terms Engines Resolutions Locations
Partition by Server Spread writes across a ton of servers,
way down the road, not worried yet
GitHub Thank you!
[email protected]
John Nunemaker MongoSF 2012 May 4,
2012 @jnunemaker