MongoDB for Analytics
John Nunemaker
May 04, 2012
Presented at MongoSF on May 4th, 2012.
Transcript
MongoDB for Analytics: A loving conversation with @jnunemaker (John Nunemaker, GitHub, MongoSF 2012, May 4, 2012)
Background: How hernias can be good for you
1 month of evenings and weekends
1 year since public launch
13 tiny servers: 2 web, 6 app, 3 db, 2 queue
7-8 million page views per day
Implementation: Imma show you how we do what we do baby
Doing It (mostly) Live: No aggregate querying
get('/track.gif') do
  track_service.record(...)
  TrackGif
end
class TrackService
  def record(attrs)
    message = MessagePack.pack(attrs)
    @client.set(@queue, message)
  end
end
class TrackProcessor
  def run
    loop { process }
  end

  def process
    record @client.get(@queue)
  end

  def record(message)
    attrs = MessagePack.unpack(message)
    Hit.record(attrs)
  end
end
http://bit.ly/rt-kestrel
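The track-then-process flow on the slides above can be tried end to end without any external services. What follows is a minimal sketch under stated assumptions, not the production code: a plain in-memory array stands in for the Kestrel queue, JSON stands in for MessagePack, and `InMemoryClient`, `drain`, and the `hits` array are hypothetical names introduced only for illustration.

```ruby
require 'json'

# Hypothetical stand-in for the queue client: set enqueues, get dequeues.
class InMemoryClient
  def initialize
    @queues = Hash.new { |hash, name| hash[name] = [] }
  end

  def set(queue, message)
    @queues[queue].push(message)
  end

  # Returns nil when the queue is empty.
  def get(queue)
    @queues[queue].shift
  end
end

# Web side: serialize the hit and enqueue it; no other work in the request.
class TrackService
  def initialize(client, queue)
    @client = client
    @queue = queue
  end

  def record(attrs)
    # JSON is an assumption made for portability; the slides use MessagePack.
    @client.set(@queue, JSON.generate(attrs))
  end
end

# Worker side: dequeue, deserialize, and hand off to a recorder.
class TrackProcessor
  attr_reader :hits

  def initialize(client, queue)
    @client = client
    @queue = queue
    @hits = [] # stands in for Hit.record(attrs)
  end

  # Processes one message; returns false when the queue was empty.
  def process
    message = @client.get(@queue)
    return false if message.nil?

    @hits << JSON.parse(message)
    true
  end

  # Unlike the endless loop on the slide, this stops once the queue is empty.
  def drain
    loop { break unless process }
  end
end

client = InMemoryClient.new
TrackService.new(client, 'hits').record('site' => 'example', 'screenx' => 1440)
processor = TrackProcessor.new(client, 'hits')
processor.drain
```

The point of the split is that the web request only pays for serialization and an enqueue; every counter update happens later in the worker.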
class Hit
  def record
    site.atomic_update(site_updates)
    Resolution.record(self)
    Technology.record(self)
    Location.record(self)
    Referrer.record(self)
    Content.record(self)
    Search.record(self)
    Notification.record(self)
    View.record(self)
  end
end
class Resolution
  def record(hit)
    query = {'_id' => "..."}
    # One increment per dimension: screen width, browser width, browser height.
    update = {'$inc' => {}}
    update['$inc']["sx.#{hit.screenx}"] = 1
    update['$inc']["bx.#{hit.browserx}"] = 1
    update['$inc']["by.#{hit.browsery}"] = 1
    # Upsert creates the document on the first hit for this time frame.
    collection(hit.created_on)
      .update(query, update, :upsert => true)
  end
end
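To see what the upsert with `$inc` does to a document over several hits, the update can be simulated against a plain Hash. This is only a sketch: `upsert_inc` is a hypothetical helper that mimics the dotted-path increment in memory, the `Hit` struct is a stand-in for the real model, and the `_id` value `'site-1:10'` is made up because the slide elides the real key.

```ruby
# Hypothetical hit value object; the field names mirror the slide.
Hit = Struct.new(:screenx, :browserx, :browsery)

# Builds the $inc document from a hit, as on the slide.
def resolution_update(hit)
  {
    '$inc' => {
      "sx.#{hit.screenx}" => 1,
      "bx.#{hit.browserx}" => 1,
      "by.#{hit.browsery}" => 1
    }
  }
end

# Simulates update(query, update, :upsert => true) against a Hash keyed
# by _id: the document is created on first write, and each dotted path
# is incremented, creating intermediate hashes as needed.
def upsert_inc(store, id, update)
  doc = (store[id] ||= { '_id' => id })
  update.fetch('$inc').each do |path, amount|
    *parents, leaf = path.split('.')
    target = parents.reduce(doc) { |node, key| node[key] ||= {} }
    target[leaf] = target.fetch(leaf, 0) + amount
  end
  doc
end

store = {}
upsert_inc(store, 'site-1:10', resolution_update(Hit.new(1440, 1280, 768)))
upsert_inc(store, 'site-1:10', resolution_update(Hit.new(1440, 1024, 768)))
```

After the two calls the store holds a single document whose `sx`, `bx`, and `by` sub-hashes carry the running counts, which is the shape the later "Single Document" slides show.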
Pros: Space, RAM, Reads, Live
Cons: Writes, Constraints, More Forethought, No raw data
http://bit.ly/rt-counters http://bit.ly/rt-counters2
Time Frame: Minute, hour, month, day, year, forever?
# of Variations: One document vs many
Single Document: Per Time Frame
{
  "t" => 336381,
  "u" => 158951,
  "2011" => {
    "02" => {
      "18" => { "t" => 9, "u" => 6 }
    }
  }
}
{
  '$inc' => {
    't' => 1,
    'u' => 1,
    '2011.02.18.t' => 1,
    '2011.02.18.u' => 1,
  }
}
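The dotted keys in the update above can be derived from the date of the hit. Here is a minimal sketch, assuming that `t` and `u` stand for total and unique counts (the slides do not spell this out); `time_frame_inc` and its `unique:` flag are hypothetical, and how uniqueness is decided is outside the sketch.

```ruby
require 'date'

# Builds the $inc document for one hit on the given date.
# `unique` is a hypothetical flag saying whether this visitor is new.
def time_frame_inc(date, unique:)
  day = date.strftime('%Y.%m.%d') # dotted path into the nested document
  inc = { 't' => 1, "#{day}.t" => 1 }
  if unique
    inc['u'] = 1
    inc["#{day}.u"] = 1
  end
  { '$inc' => inc }
end

update = time_frame_inc(Date.new(2011, 2, 18), unique: true)
repeat = time_frame_inc(Date.new(2011, 2, 18), unique: false)
```

A single update therefore bumps both the running totals and the per-day counters, so neither has to be recomputed at read time.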
Single Document: For all ranges in time frame
{
  "_id" => "...:10",
  "bx" => {
    "320" => 85, "480" => 318, "800" => 1938, "1024" => 5033,
    "1280" => 6288, "1440" => 2323, "1600" => 3817, "2000" => 137
  },
  "by" => {
    "480" => 2205, "600" => 7359, "768" => 4515, "900" => 3833,
    "1024" => 2026
  },
  "sx" => {
    "320" => 191, "480" => 179, "800" => 195, "1024" => 1059,
    "1280" => 5861, "1440" => 3533, "1600" => 7675, "2000" => 1279
  }
}
{
  '$inc' => {
    'sx.1440' => 1,
    'bx.1280' => 1,
    'by.768' => 1,
  }
}
Many Documents: Search terms, content, referrers...
[
  {
    "_id" => "<oid>:<hash>",
    "t" => "ruby class variables",
    "sid" => BSON::ObjectId('<oid>'),
    "v" => 352
  },
  {
    "_id" => "<oid>:<hash>",
    "t" => "ruby unless",
    "sid" => BSON::ObjectId('<oid>'),
    "v" => 347
  },
]
Writes: {'_id' => "#{sid}:#{hash}"}
Reads: [['sid', 1], ['v', -1]]
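The write and read patterns for the many-documents case can be sketched in memory. This is an illustration under assumptions rather than the actual implementation: MD5 is assumed as the term hash (the slides do not name one), a Hash keyed by `_id` stands in for the collection, and `term_id`, `record_term`, and `top_terms` are hypothetical helpers. The read is taken to be "filter by `sid`, order by `v` descending", which is what the `[['sid', 1], ['v', -1]]` spec suggests.

```ruby
require 'digest'

# Composite _id: site id plus a hash of the term, so every write can
# address its document directly. MD5 is an assumption.
def term_id(sid, term)
  "#{sid}:#{Digest::MD5.hexdigest(term)}"
end

# Simulates an upsert that increments "v" for the site/term pair.
def record_term(store, sid, term)
  id = term_id(sid, term)
  doc = (store[id] ||= { '_id' => id, 't' => term, 'sid' => sid, 'v' => 0 })
  doc['v'] += 1
  doc
end

# Simulates the read: one site's terms, highest counts first.
def top_terms(store, sid, limit)
  store.values
       .select { |doc| doc['sid'] == sid }
       .sort_by { |doc| -doc['v'] }
       .first(limit)
end

store = {}
3.times { record_term(store, 'site-1', 'ruby unless') }
5.times { record_term(store, 'site-1', 'ruby class variables') }
record_term(store, 'site-2', 'ruby unless')
top = top_terms(store, 'site-1', 2)
```

Because the `_id` is derived from the site and the term, a write never needs a lookup first; the same key always lands on the same document.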
Growth: Don’t say shard, don’t say shard...
Partition Hot Data: Currently using collections for time frames
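One way to picture collections per time frame is to derive the collection name from the date of the hit. The sketch below is hypothetical throughout: the slides only say that collections are used for time frames, so the monthly granularity, the `base.YYYY.MM` naming scheme, and the helper names are all assumptions.

```ruby
require 'date'

# Returns a per-month collection name, e.g. "resolutions.2012.05".
# The naming scheme is an assumption made for illustration.
def collection_name(base, date)
  "#{base}.#{date.strftime('%Y.%m')}"
end

# Lists the collections a read spanning several months would touch.
def collections_for(base, from, to)
  names = []
  cursor = Date.new(from.year, from.month, 1)
  while cursor <= to
    names << collection_name(base, cursor)
    cursor = cursor.next_month
  end
  names
end

names = collections_for('resolutions', Date.new(2012, 3, 15), Date.new(2012, 5, 4))
```

With a scheme like this, current writes touch only the newest collection, so older time frames can fall out of RAM without hurting write performance.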
Bigger, Faster Server: More CPU, RAM, Disk Space
Users, Sites, Content, Referrers, Terms, Engines, Resolutions, Locations
Partition by Function: Spread writes across a few servers
Users, Sites, Content, Referrers, Terms, Engines, Resolutions, Locations
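Partitioning by function amounts to a lookup from collection to server. The sketch below shows one possible shape for that lookup; the host names and the grouping of collections are invented for illustration and are not taken from the slides.

```ruby
# Hypothetical assignment of collections to servers by function.
# Host names and grouping are illustrative only.
SERVERS_BY_FUNCTION = {
  'db-core.example'    => %w[users sites],
  'db-content.example' => %w[content referrers terms engines],
  'db-visitor.example' => %w[resolutions locations]
}.freeze

# Inverts the table once so a lookup by collection name is a Hash fetch.
SERVER_FOR = SERVERS_BY_FUNCTION.each_with_object({}) do |(host, names), map|
  names.each { |name| map[name] = host }
end.freeze

def server_for(collection)
  SERVER_FOR.fetch(collection) # raises KeyError for unknown collections
end
```

Since each hit fans out into writes against several collections, spreading those collections over a few servers spreads the write load without changing any document's shape.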
Partition by Server: Spread writes across a ton of servers, way down the road, not worried yet
GitHub Thank you!
[email protected]
John Nunemaker, MongoSF 2012, May 4, 2012, @jnunemaker