Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
MongoDB for Analytics
Search
Sponsored
·
Ship Features Fearlessly
Turn features on and off without deploys. Used by thousands of Ruby developers.
→
John Nunemaker
PRO
October 18, 2011
Programming
30k
16
Share
Embed
Copy iframe code
Copy JS code
Copy link
Start on current slide
MongoDB for Analytics
Presented at Mongo Chicago 2011.
John Nunemaker
PRO
October 18, 2011
More Decks by John Nunemaker
See All by John Nunemaker
Remote First: Building Distributed Teams that Win
jnunemaker
PRO
1
170
AI: The stuff that nobody shows you
jnunemaker
PRO
8
740
Atom
jnunemaker
PRO
10
5.1k
MongoDB for Analytics
jnunemaker
PRO
11
1.1k
Addicted to Stable
jnunemaker
PRO
32
2.9k
MongoDB for Analytics
jnunemaker
PRO
21
2.3k
Why You Should Never Use an ORM
jnunemaker
PRO
61
9.9k
Why NoSQL?
jnunemaker
PRO
10
1k
Don't Repeat Yourself, Repeat Others
jnunemaker
PRO
7
3.5k
Other Decks in Programming
See All in Programming
Agentic UI
manfredsteyer
PRO
0
200
LLM本来の能力を解き放つサンドボックス技術とAI民主化への適用
yukukotani
3
4.6k
[2026年度第1回ORセミナー] 計画最適化ベンチャーと競技プログラミング人材
terryu16
0
270
Honoでのサプライチェーン侵害対策 〜 3つのライブラリに学ぶ
yusukebe
7
1.4k
TSKaigi Night Talks 2026_TypeScriptでサプライチェーンの整合性を型に閉じ込める
geekplus_tech
0
410
Creating Composable Callables in Contemporary C++
rollbear
0
170
Language Server 使ってる? 〜VSCode と Zed の場合〜 / Are you using a Language Server? ~For VS Code and Zed~
handlename
0
810
A2UI という光を覗いてみる
satohjohn
1
160
Snowflake Summitでの新機能 CoCo / CoWork / snowflake-summit-2026-overall-what-new-coco
tatsuhiro
1
190
LLMによるContent Moderationの本番運用の裏側と品質担保への挑戦
suikabar
3
780
Datadog × OpenTelemetry 入門と実践のあいだ
kn_to_maxpno
1
180
なぜ型を書くのか? TSKaigi2026で改めて考える #tskaigi_smarthr
kajitack
0
160
Featured
See All Featured
SEO for Brand Visibility & Recognition
aleyda
0
4.6k
Organizational Design Perspectives: An Ontology of Organizational Design Elements
kimpetersen
PRO
1
750
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
49
3.5k
Evolution of real-time – Irina Nazarova, EuRuKo, 2024
irinanazarova
9
1.4k
Building AI with AI
inesmontani
PRO
1
1.1k
DevOps and Value Stream Thinking: Enabling flow, efficiency and business value
helenjbeal
1
250
Exploring the relationship between traditional SERPs and Gen AI search
raygrieselhuber
PRO
2
4k
Leading Effective Engineering Teams in the AI Era
addyosmani
9
2.1k
Noah Learner - AI + Me: how we built a GSC Bulk Export data pipeline
techseoconnect
PRO
0
200
Primal Persuasion: How to Engage the Brain for Learning That Lasts
tmiket
0
380
Utilizing Notion as your number one productivity tool
mfonobong
4
330
The Language of Interfaces
destraynor
162
27k
Transcript
Ordered List John Nunemaker MongoChi 2011 October 18, 2011 MongoDB
for Analytics A loving conversation with @jnunemaker
Background As presented through interpretive dance
None
None
None
~1 month Of evenings and weekends
~4 dog years Since public launch
~6 tiny servers 2 web, 2 app, 2 db
~1-2 Million Page views per day
None
None
Implementation Imma show you how we do what we do
baby
Doing It Live No aggregate querying
get('/track.gif') do Hit.record(...) TrackGif end
class Hit def record site.atomic_update(site_updates) Resolution.record(self) Technology.record(self) Location.record(self) Referrer.record(self) Content.record(self)
Search.record(self) Notification.record(self) View.record(self) end end
class Resolution def record(hit) query = {'_id' => "..."} update
= {'$inc' => {}} update['$inc']["sx.#{hit.screenx}"] = 1 update['$inc']["bx.#{hit.browserx}"] = 1 update['$inc']["by.#{hit.browsery}"] = 1 collection(hit.created_on) .update(query, update, :upsert => true) end end end
Pros
Pros Space
Pros Space RAM
Pros Space RAM Reads
Pros Space RAM Reads Live
Cons
Cons Writes
Cons Writes Constraints
Cons Writes Constraints More Forethought
Cons Writes Constraints More Forethought No raw data
Time Frame Minute, hour, month, day, year, forever?
# of Variations One document vs many
Single Document Per Time Frame
None
{ "t" => 336381, "u" => 158951, "2011" => {
"02" => { "18" => { "t" => 9, "u" => 6 } } } }
{ '$inc' => { 't' => 1, 'u' => 1,
'2011.02.18.t' => 1, '2011.02.18.u' => 1, } }
Single Document For all ranges in time frame
None
{ "_id" =>"...:10", "bx" => { "320" => 85, "480"
=> 318, "800" => 1938, "1024" => 5033, "1280" => 6288, "1440" => 2323, "1600" => 3817, "2000" => 137 }, "by" => { "480" => 2205, "600" => 7359,
"600" => 7359, "768" => 4515, "900" => 3833, "1024"
=> 2026 }, "sx" => { "320" => 191, "480" => 179, "800" => 195, "1024" => 1059, "1280" => 5861, "1440" => 3533, "1600" => 7675, "2000" => 1279 } }
{ '$inc' => { 'sx.1440' => 1, 'bx.1280' => 1,
'by.768' => 1, } }
Many Documents Search terms, content, referrers...
None
[ { "_id" => "<oid>:<hash>", "t" => "ruby class variables",
"sid" => BSON::ObjectId('<oid>'), "v" => 352 }, { "_id" => "<oid>:<hash>", "t" => "ruby unless", "sid" => BSON::ObjectId('<oid>'), "v" => 347 }, ]
Writes {'_id' => "#{site_id}:#{hash}"}
Reads [['sid', 1], ['v', -1]]
Growth The best laid plans of mice and men
Partition Hot Data Currently using collections for time frames
Bigger, Faster Server More CPU, RAM, Disk Space
Users Sites Content Referrers Terms Engines Resolutions Locations Users Sites
Content Referrers Terms Engines Resolutions Locations
Partition by Function Spread writes across a few servers
Users Sites Content Referrers Terms Engines Resolutions Locations
Partition by Server Spread writes across a ton of servers,
way down the road, not worried yet
Ordered List Thank you!
[email protected]
John Nunemaker MongoChi 2011 October
18, 2011 @jnunemaker