Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
MongoDB for Analytics
Search
John Nunemaker
PRO
May 04, 2012
Programming
21
2.2k
MongoDB for Analytics
Presented at MongoSF on May 4th, 2012.
John Nunemaker
PRO
May 04, 2012
Tweet
Share
More Decks by John Nunemaker
See All by John Nunemaker
Atom
jnunemaker
PRO
8
3.3k
MongoDB for Analytics
jnunemaker
PRO
9
730
Addicted to Stable
jnunemaker
PRO
32
2.2k
MongoDB for Analytics
jnunemaker
PRO
16
29k
Why You Should Never Use an ORM
jnunemaker
PRO
51
8.9k
Why NoSQL?
jnunemaker
PRO
10
850
Don't Repeat Yourself, Repeat Others
jnunemaker
PRO
7
3.2k
I Have No Talent
jnunemaker
PRO
14
880
Why MongoDB Is Awesome
jnunemaker
PRO
18
4.3k
Other Decks in Programming
See All in Programming
Activities at Cairo Library
cairolibrary720
0
1.2k
Clean Architecture by TypeScript & NestJS
ryounasso
0
150
HMSコンペ 11th Solution (team : kansai-kaggler)
t88
1
680
CSC307 Lecture 05
javiergs
PRO
0
210
企業向け生成AIアプリの 開発から得られた知見
takaakikakei
0
310
Architectures with Lightweight Stores: New Rules and Options
manfredsteyer
PRO
0
100
Xcode 16のPreviewModifierと@Previewableを活用した効率的なプレビュー方法の考察
ojun9
2
160
社内 LT 会を発足し、アウトプット文化を醸成させるために考えたこと・やったこと / Starting internal LT meetings and fostering an output culture
mackey0225
3
120
スクラムマスターって孤独じゃないですか?
yoshitaroyoyo
1
140
20240706_CDKConf
takuyay0ne
0
1.2k
From Spring Boot 2 to Spring Boot 3 with Java 22 and Jakarta EE
ivargrimstad
0
1.9k
Modern Angular: Renovation for Your Applications
manfredsteyer
PRO
0
140
Featured
See All Featured
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
502
140k
The Invisible Customer
myddelton
117
13k
Practical Orchestrator
shlominoach
185
10k
Product Roadmaps are Hard
iamctodd
PRO
48
10k
Why Our Code Smells
bkeepers
PRO
332
56k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
245
1.2M
How GitHub (no longer) Works
holman
305
140k
RailsConf 2023
tenderlove
16
720
Building an army of robots
kneath
301
42k
Done Done
chrislema
179
15k
Design by the Numbers
sachag
277
18k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
325
21k
Transcript
GitHub John Nunemaker MongoSF 2012 May 4, 2012 MongoDB for
Analytics A loving conversation with @jnunemaker
None
Background How hernias can be good for you
None
None
1 month Of evenings and weekends
1 year Since public launch
13 tiny servers 2 web, 6 app, 3 db, 2
queue
7-8 Million Page views per day
None
None
None
None
Implementation Imma show you how we do what we do
baby
Doing It (mostly) Live No aggregate querying
None
None
get('/track.gif') do track_service.record(...) TrackGif end
class TrackService def record(attrs) message = MessagePack.pack(attrs) @client.set(@queue, message) end
end
class TrackProcessor def run loop { process } end def
process record @client.get(@queue) end def record(message) attrs = MessagePack.unpack(message) Hit.record(attrs) end end
http://bit.ly/rt-kestrel
class Hit def record site.atomic_update(site_updates) Resolution.record(self) Technology.record(self) Location.record(self) Referrer.record(self) Content.record(self)
Search.record(self) Notification.record(self) View.record(self) end end
class Resolution def record(hit) query = {'_id' => "..."} update
= {'$inc' => {}} update['$inc']["sx.#{hit.screenx}"] = 1 update['$inc']["bx.#{hit.browserx}"] = 1 update['$inc']["by.#{hit.browsery}"] = 1 collection(hit.created_on) .update(query, update, :upsert => true) end end end
Pros
Pros Space
Pros Space RAM
Pros Space RAM Reads
Pros Space RAM Reads Live
Cons
Cons Writes
Cons Writes Constraints
Cons Writes Constraints More Forethought
Cons Writes Constraints More Forethought No raw data
http://bit.ly/rt-counters http://bit.ly/rt-counters2
Time Frame Minute, hour, month, day, year, forever?
# of Variations One document vs many
Single Document Per Time Frame
None
{ "t" => 336381, "u" => 158951, "2011" => {
"02" => { "18" => { "t" => 9, "u" => 6 } } } }
{ '$inc' => { 't' => 1, 'u' => 1,
'2011.02.18.t' => 1, '2011.02.18.u' => 1, } }
Single Document For all ranges in time frame
None
{ "_id" =>"...:10", "bx" => { "320" => 85, "480"
=> 318, "800" => 1938, "1024" => 5033, "1280" => 6288, "1440" => 2323, "1600" => 3817, "2000" => 137 }, "by" => { "480" => 2205, "600" => 7359,
"600" => 7359, "768" => 4515, "900" => 3833, "1024"
=> 2026 }, "sx" => { "320" => 191, "480" => 179, "800" => 195, "1024" => 1059, "1280" => 5861, "1440" => 3533, "1600" => 7675, "2000" => 1279 } }
{ '$inc' => { 'sx.1440' => 1, 'bx.1280' => 1,
'by.768' => 1, } }
Many Documents Search terms, content, referrers...
None
[ { "_id" => "<oid>:<hash>", "t" => "ruby class variables",
"sid" => BSON::ObjectId('<oid>'), "v" => 352 }, { "_id" => "<oid>:<hash>", "t" => "ruby unless", "sid" => BSON::ObjectId('<oid>'), "v" => 347 }, ]
Writes {'_id' => "#{sid}:#{hash}"}
Reads [['sid', 1], ['v', -1]]
Growth Don’t say shard, don’t say shard...
Partition Hot Data Currently using collections for time frames
Bigger, Faster Server More CPU, RAM, Disk Space
Users Sites Content Referrers Terms Engines Resolutions Locations Users Sites
Content Referrers Terms Engines Resolutions Locations
Partition by Function Spread writes across a few servers
Users Sites Content Referrers Terms Engines Resolutions Locations
Partition by Server Spread writes across a ton of servers,
way down the road, not worried yet
GitHub Thank you!
[email protected]
John Nunemaker MongoSF 2012 May 4,
2012 @jnunemaker