Schema Design at Scale

MongoDB: Schema Design at Scale
Rick Copeland, @rick446, http://arborian.com
Friday, September 14, 12

Who am I?
• Now a consultant/trainer, but formerly...
• Software engineer at SourceForge
• Author of Essential SQLAlchemy
• Author of MongoDB with Python and Ming
• Primarily code in Python

The Inspiration
• MongoDB Monitoring Service (MMS)
• Free to all MongoDB users
• Minute-by-minute stats on all your servers
• Hardware cost is important, so use it efficiently (remember, it’s a free service!)

Our Experiment
• Similar to MMS but not identical
• Collection of 100 metrics, each with per-minute values
• “Simulation time” is 300x real time
• Run on 2 AWS small instances:
  • one MongoDB server (2.0.2)
  • one “load generator”

Load Generator
• Increment each metric as many times as possible during the course of a simulated minute
• Record the number of updates per second
• Occasionally call getLastError to prevent disconnects

Schema v1
• One document per metric (per server) per day
• Per-hour and per-minute statistics stored as embedded documents

    {
        _id: "20101010/metric-1",
        metadata: {
            date: ISODate("2010-10-10T00:00:00Z"),
            metric: "metric-1" },
        daily: 5468426,
        hourly: {
            "00": 227850,
            "01": 210231,
            ...
            "23": 20457 },
        minute: {
            "0000": 3612,
            "0001": 3241,
            ...
            "1439": 2819 }
    }

Update v1
• Use $inc to update fields in place
• Use upsert to create the document if it’s missing
• Easy, correct, seems like a good idea....

    increment = { daily: 1 }
    increment['hourly.' + hour] = 1
    increment['minute.' + minute] = 1
    db.stats.update(
        { _id: id, metadata: metadata },
        { $inc: increment },
        true) // upsert
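
A minimal sketch of how the `_id` and the `$inc` document for one event could be derived from a timestamp. The helper names `pad` and `buildV1Update` are hypothetical, and the zero-padded key formats are assumptions read off the Schema v1 example rather than code from the talk.

```javascript
// Hypothetical helpers, not from the talk. Key formats follow the Schema v1
// example: "00".."23" for hours, "0000".."1439" for minute-of-day.
function pad(n, width) {
  return String(n).padStart(width, "0");
}

function buildV1Update(metric, date) {
  const day =
    pad(date.getUTCFullYear(), 4) +
    pad(date.getUTCMonth() + 1, 2) +
    pad(date.getUTCDate(), 2);
  const hour = pad(date.getUTCHours(), 2);
  const minuteOfDay = pad(date.getUTCHours() * 60 + date.getUTCMinutes(), 4);

  // One $inc touches the daily total, the hour bucket and the minute bucket.
  const increment = { daily: 1 };
  increment["hourly." + hour] = 1;
  increment["minute." + minuteOfDay] = 1;

  return { id: day + "/" + metric, increment: increment };
}

// Last minute of 10 October 2010 (UTC).
const example = buildV1Update(
  "metric-1",
  new Date(Date.UTC(2010, 9, 10, 23, 59))
);
console.log(example.id, example.increment);
```

The returned `increment` object is what would be passed as `{ $inc: increment }` in the update call on the slide.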

Performance of v1
(chart; annotations: “Experiment startup”, “OUCH!”)

Problems with v1
• The document movement problem
• The midnight problem
• The end-of-the-day problem
• The historical query problem

Document movement problem
• MongoDB in-place updates are fast
• ... except when they’re not in place
• MongoDB adaptively pads documents
• ... but it’s better to know your doc size ahead of time

Midnight problem
• Upserts are convenient, but what’s our key?
  • date/metric
• At midnight, you get a huge spike in inserts

Fixing the document movement problem
• Preallocate documents with zeros
• Crontab (?)
• NO! (makes the midnight problem even worse)

    db.stats.update(
        { _id: id, metadata: metadata },
        { $inc: {
            daily: 0,
            "hourly.0": 0,
            "hourly.1": 0,
            ...
            "minute.0": 0,
            "minute.1": 0,
            ... } },
        true) // upsert
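
The zero-filled `$inc` document elided with "..." above could be generated rather than typed out. The sketch below is illustrative only: `buildPreallocation` is a hypothetical name, and the zero-padded keys are an assumption taken from the Schema v1 example. Because `$inc` on a missing field sets it to the given amount, upserting this document creates every field at zero, and applying it to an existing document leaves the counts unchanged.

```javascript
// Hypothetical helper: builds the full zero-valued $inc document for one day.
// Assumes Schema v1 keys: "00".."23" hours and "0000".."1439" minutes.
function buildPreallocation() {
  const zeros = { daily: 0 };
  for (let h = 0; h < 24; h++) {
    zeros["hourly." + String(h).padStart(2, "0")] = 0;
  }
  for (let m = 0; m < 24 * 60; m++) {
    zeros["minute." + String(m).padStart(4, "0")] = 0;
  }
  return zeros;
}

const prealloc = buildPreallocation();
// 1 daily field + 24 hourly fields + 1440 minute fields
console.log(Object.keys(prealloc).length);
```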

Fixing the midnight problem
• Could schedule preallocation for different metrics, staggered through the day
• Observation: Preallocation isn’t required for correct operation
• Let’s just preallocate tomorrow’s docs randomly as new stats are inserted (with low probability).
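
One way the random preallocation idea might look in a load generator, as a sketch under stated assumptions: `makeRecorder`, its parameters, and the in-memory `done` set (which skips metrics already preallocated by this process) are all hypothetical, and the `update` and `preallocate` callbacks stand in for the real database calls.

```javascript
// Hypothetical sketch: on each stat update, preallocate tomorrow's document
// for that metric with a small probability. Names are illustrative only.
function makeRecorder(probability, random, update, preallocate) {
  const done = new Set(); // assumed local memo of documents already created
  return function record(metric, today, tomorrow) {
    update(metric, today);
    const key = tomorrow + "/" + metric;
    if (!done.has(key) && random() < probability) {
      preallocate(metric, tomorrow);
      done.add(key);
    }
  };
}

// Demo with a deterministic stand-in for Math.random.
const rolls = [0.9, 0.5, 0.001, 0.0001];
let next = 0;
const calls = { update: 0, preallocate: 0 };
const record = makeRecorder(
  0.01,
  () => rolls[next++ % rolls.length],
  () => { calls.update += 1; },
  () => { calls.preallocate += 1; }
);
for (let i = 0; i < 4; i++) {
  record("metric-1", "20101010", "20101011");
}
console.log(calls);
```

Since preallocation is not required for correctness, a metric that never wins the draw simply falls back to the ordinary upsert at midnight.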

Performance with Preallocation
• Well, it’s better
• Still have decreasing performance through the day... WTF?
(chart; annotation: “Experiment startup”)

Problems with v1
• The document movement problem
• The midnight problem
• The end-of-the-day problem
• The historical query problem

End-of-day problem
• BSON stores documents as an association list
• MongoDB must check each key for a match
• Load increases significantly at the end of the day (MongoDB must scan 1439 keys to find the right minute!)
(diagram: key/value pairs in stored order: “0000” Value, “0001” Value, ... “1439” Value)
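
A toy model of the lookup cost described above, not MongoDB's actual implementation: the document is represented as an ordered array of key/value pairs and searched front to back. The function name `keysSkippedBeforeMatch` is hypothetical.

```javascript
// Illustrative model only: an association list searched linearly.
// Returns how many keys are passed over before the wanted key is found.
function keysSkippedBeforeMatch(pairs, wanted) {
  for (let i = 0; i < pairs.length; i++) {
    if (pairs[i][0] === wanted) {
      return i;
    }
  }
  return -1; // key not present
}

// Flat "minute" subdocument with keys "0000".."1439".
const minutes = [];
for (let m = 0; m < 1440; m++) {
  minutes.push([String(m).padStart(4, "0"), 0]);
}

const firstMinuteCost = keysSkippedBeforeMatch(minutes, "0000");
const lastMinuteCost = keysSkippedBeforeMatch(minutes, "1439");
console.log(firstMinuteCost, lastMinuteCost);
```

In this model the cost of an update grows with the minute of the day, which matches the downward performance trend seen in the experiment.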

Fixing the end-of-day problem
• Split up our ‘minute’ property by hour
• Better worst-case keys scanned:
  • Old: 1439
  • New: 82

    {
        _id: "20101010/metric-1",
        metadata: {
            date: ISODate("2010-10-10T00:00:00Z"),
            metric: "metric-1" },
        daily: 5468426,
        hourly: {
            "0": 227850,
            "1": 210231,
            ...
            "23": 20457 },
        minute: {
            "00": {
                "0000": 3612,
                "0001": 3241,
                ... },
            ...,
            "23": { ..., "1439": 2819 } }
    }
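
The figure of 82 follows from scanning at most 23 hour keys and then at most 59 minute keys within that hour. A short sketch of that arithmetic, together with the dotted field path an update would use; the helper names are hypothetical and the key formats are assumed from the example document.

```javascript
// Hypothetical helpers. The inner keys are minute-of-day ("0000".."1439"),
// grouped under a two-digit hour key, as in the example document.
function hierarchicalMinuteField(hour, minuteOfHour) {
  const hh = String(hour).padStart(2, "0");
  const mmmm = String(hour * 60 + minuteOfHour).padStart(4, "0");
  return "minute." + hh + "." + mmmm;
}

// Keys passed over before a match: earlier hour keys, then earlier minute
// keys inside the matching hour.
function keysSkipped(hour, minuteOfHour) {
  return hour + minuteOfHour;
}

const worstField = hierarchicalMinuteField(23, 59);
const worstCost = keysSkipped(23, 59);
console.log(worstField, worstCost);
```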

“Hierarchical minutes” Performance
(chart)

Performance Comparison
(chart)

Performance Comparison (2.2)
(chart)

Historical Query Problem
• Intra-day queries are great
• What about “performance year to date”?
• Now you’re hitting a lot of “cold” documents and causing page faults

Fixing the historical query problem
• Store multiple levels of granularity in different collections
• 2 updates rather than 1, but historical queries are much faster
• Preallocate along with daily docs (only infrequently upserted)

    {
        _id: "201010/metric-1",
        metadata: {
            date: ISODate("2010-10-01T00:00:00Z"),
            metric: "metric-1" },
        daily: {
            "0": 5468426,
            "1": ...,
            ...
            "31": ... }
    }
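
A sketch of the "2 updates rather than 1" idea: each event produces one update for the per-day document and one for the per-month document. Everything named here is an assumption for illustration: the function `buildUpdates`, the collection name `stats.monthly`, and the use of the day of the month as the key under `daily`. Only `stats.daily` appears in the talk.

```javascript
// Hypothetical sketch: derive both updates for one event.
function buildUpdates(metric, date) {
  const yyyy = String(date.getUTCFullYear());
  const mm = String(date.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(date.getUTCDate()).padStart(2, "0");

  // Per-day document: hourly and minute increments omitted for brevity.
  const dailyInc = { daily: 1 };

  // Per-month document: one counter per day of the month (assumed key).
  const monthlyInc = {};
  monthlyInc["daily." + date.getUTCDate()] = 1;

  return [
    { collection: "stats.daily", id: yyyy + mm + dd + "/" + metric, inc: dailyInc },
    { collection: "stats.monthly", id: yyyy + mm + "/" + metric, inc: monthlyInc },
  ];
}

const updates = buildUpdates("metric-1", new Date(Date.UTC(2010, 9, 10, 12, 0)));
console.log(updates);
```

A year-to-date chart would then read at most twelve monthly documents per metric instead of hundreds of daily ones.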

Queries
• Updates are by _id, so no extra index is needed there
• Chart queries are by metadata
• Your range/sort should be last in the compound index

    db.stats.daily.find(
        { "metadata.date": { $gte: dt1, $lte: dt2 },
          "metadata.metric": "metric-1" },
        { "metadata.date": 1, "hourly": 1 }
    ).sort({ "metadata.date": 1 })

    db.stats.daily.ensureIndex({
        'metadata.metric': 1,
        'metadata.date': 1 })
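
Why the range/sort field goes last can be seen in a toy model of a compound index as a sorted array of entries. This is an illustration, not how MongoDB stores indexes; dates are simplified to small integers and all names are hypothetical.

```javascript
// Toy model: 3 metrics x 5 days of index entries.
const entries = [];
for (const metric of ["metric-1", "metric-2", "metric-3"]) {
  for (let day = 1; day <= 5; day++) {
    entries.push({ metric: metric, date: day });
  }
}

function compareStrings(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Index ordered by (metric, date): equality field first, range field last.
const byMetricThenDate = entries.slice().sort(
  (a, b) => compareStrings(a.metric, b.metric) || a.date - b.date
);
// Index ordered by (date, metric): range field first.
const byDateThenMetric = entries.slice().sort(
  (a, b) => a.date - b.date || compareStrings(a.metric, b.metric)
);

// Positions in the index that satisfy: metric == m AND from <= date <= to.
function matchingPositions(index, metric, from, to) {
  const positions = [];
  index.forEach((e, i) => {
    if (e.metric === metric && e.date >= from && e.date <= to) {
      positions.push(i);
    }
  });
  return positions;
}

const contiguous = matchingPositions(byMetricThenDate, "metric-2", 2, 4);
const scattered = matchingPositions(byDateThenMetric, "metric-2", 2, 4);
console.log(contiguous, scattered);
```

With the equality field first, the matches form one contiguous run that is already in date order; with the range field first, they are interleaved with entries for other metrics.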

Conclusion
• Monitor your performance. Watch out for spikes.
• Preallocate to prevent document copying
• Pay attention to the number of keys in your documents (hierarchy can help)
• Make sure your index is optimized for your sorts

Questions?
MongoDB Monitoring Service: http://www.10gen.com/mongodb-monitoring-service
Rick Copeland, @rick446, http://arborian.com
MongoDB Consulting & Training
