Slide 1

MongoDB: Schema Design at Scale
Rick Copeland
@rick446
http://arborian.com
Friday, September 14, 12

Slide 2

Who am I?
• Now a consultant/trainer, but formerly...
• Software engineer at SourceForge
• Author of Essential SQLAlchemy
• Author of MongoDB with Python and Ming
• Primarily code Python

Slide 3

The Inspiration
• MongoDB Monitoring Service (MMS)
• Free to all MongoDB users
• Minute-by-minute stats on all your servers
• Hardware cost is important, so use it efficiently (remember, it’s a free service!)

Slide 4

Our Experiment
• Similar to MMS but not identical
• Collection of 100 metrics, each with per-minute values
• “Simulation time” is 300x real time
• Run on two AWS small instances:
  • one MongoDB server (2.0.2)
  • one “load generator”

Slide 5

Load Generator
• Increment each metric as many times as possible during the course of a simulated minute
• Record the number of updates per second
• Occasionally call getLastError to prevent disconnects
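
At 300x, one simulated minute lasts 60 s / 300 = 200 ms of real time. A full simulated day of 1,440 minutes therefore passes in 288 s (4.8 minutes). The sketch below shows one way a load generator could map wall-clock time onto simulated minutes. `SPEEDUP`, `simulatedTime` and the other names are hypothetical and do not come from the experiment's code.

```javascript
// Hypothetical simulated clock for a load generator running at 300x real time.
const SPEEDUP = 300;                          // "simulation time" is 300x real time
const MS_PER_SIM_MINUTE = 60000 / SPEEDUP;    // 200 ms of real time per simulated minute
const MINUTES_PER_DAY = 1440;

// Map real elapsed milliseconds to a simulated (day, minute-of-day) pair.
function simulatedTime(realElapsedMs) {
  const simMinutes = Math.floor(realElapsedMs / MS_PER_SIM_MINUTE);
  return {
    day: Math.floor(simMinutes / MINUTES_PER_DAY),
    minute: simMinutes % MINUTES_PER_DAY,
  };
}

// Real seconds needed to simulate one full day.
function realSecondsPerSimDay() {
  return (MINUTES_PER_DAY * MS_PER_SIM_MINUTE) / 1000;
}

console.log(MS_PER_SIM_MINUTE);             // 200
console.log(realSecondsPerSimDay());        // 288
console.log(simulatedTime(288000 + 450));   // { day: 1, minute: 2 }
```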

Slide 6

Schema v1
• One document per metric (per server) per day
• Per-hour and per-minute statistics stored as embedded documents

    {
      _id: "20101010/metric-1",
      metadata: { date: ISODate("2010-10-10T00:00:00Z"), metric: "metric-1" },
      daily: 5468426,
      hourly: { "00": 227850, "01": 210231, ..., "23": 20457 },
      minute: { "0000": 3612, "0001": 3241, ..., "1439": 2819 }
    }
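
The sample document implies three key formats:
• an `_id` of `<YYYYMMDD>/<metric>`
• two-digit hour keys
• four-digit minute-of-day keys (`"0000"` to `"1439"`)

The sketch below shows helpers that derive these keys from a timestamp. It assumes all times are UTC, and the helper names are hypothetical.

```javascript
// Hypothetical helpers deriving schema v1 keys from a timestamp (assumes UTC).
function pad(n, width) {
  return String(n).padStart(width, "0");
}

// _id is "<YYYYMMDD>/<metric>": one document per metric per day.
function docId(date, metric) {
  const ymd = date.getUTCFullYear() + pad(date.getUTCMonth() + 1, 2) + pad(date.getUTCDate(), 2);
  return ymd + "/" + metric;
}

// metadata.date is midnight UTC of the same day.
function dayStart(date) {
  return new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
}

// "hourly" keys are "00".."23"; "minute" keys are minute-of-day "0000".."1439".
function hourKey(date) {
  return pad(date.getUTCHours(), 2);
}
function minuteKey(date) {
  return pad(date.getUTCHours() * 60 + date.getUTCMinutes(), 4);
}

const t = new Date(Date.UTC(2010, 9, 10, 23, 59));   // 2010-10-10 23:59 UTC
console.log(docId(t, "metric-1"));        // 20101010/metric-1
console.log(dayStart(t).toISOString());   // 2010-10-10T00:00:00.000Z
console.log(hourKey(t), minuteKey(t));    // 23 1439
```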

Slide 7

Update v1
• Use $inc to update fields in place
• Use upsert to create the document if it’s missing
• Easy, correct, seems like a good idea...

    // hour ("00".."23") and minute ("0000".."1439") are keys for the current time
    increment = { daily: 1 }
    increment['hourly.' + hour] = 1
    increment['minute.' + minute] = 1
    db.stats.update(
        { _id: id, metadata: metadata },
        { $inc: increment },
        true)  // upsert
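
To see why the v1 update grows the document, the sketch below builds the `$inc` spec in plain JavaScript. It pairs that with a simplified in-memory stand-in for how `$inc` treats dotted paths: a missing field is created and set to the increment value. `buildIncrement` and `applyInc` are hypothetical helpers for illustration, not MongoDB code.

```javascript
// Build the v1 $inc spec: one field per granularity, addressed with dot notation.
function buildIncrement(hour, minute) {
  const increment = { daily: 1 };
  increment["hourly." + hour] = 1;
  increment["minute." + minute] = 1;
  return increment;
}

// Simplified in-memory stand-in for $inc on dotted paths (NOT MongoDB's code):
// missing fields and subdocuments are created, as $inc does on the server.
function applyInc(doc, inc) {
  for (const [path, delta] of Object.entries(inc)) {
    const parts = path.split(".");
    let target = doc;
    for (const p of parts.slice(0, -1)) {
      if (target[p] === undefined) target[p] = {};
      target = target[p];
    }
    const leaf = parts[parts.length - 1];
    target[leaf] = (target[leaf] || 0) + delta;
  }
  return doc;
}

// Simplified: a real upsert would also copy metadata from the query.
const doc = { _id: "20101010/metric-1" };
applyInc(doc, buildIncrement("00", "0000"));
applyInc(doc, buildIncrement("00", "0001"));
console.log(JSON.stringify(doc));
// {"_id":"20101010/metric-1","daily":2,"hourly":{"00":2},"minute":{"0000":1,"0001":1}}
```

Every new minute adds another field under `minute`, so the document keeps growing all day.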

Slide 8

Performance of v1

Slide 9

Performance of v1
[Chart annotation: “Experiment startup”]

Slide 10

Performance of v1
[Chart annotations: “Experiment startup”, “OUCH!”]

Slide 11

Problems with v1
• The document movement problem
• The midnight problem
• The end-of-the-day problem
• The historical query problem

Slide 12

Document movement problem
• MongoDB in-place updates are fast
• ... except when they’re not in place: a document that outgrows its allocated space must be moved to a new location on disk
• MongoDB adaptively pads documents
• ... but it’s better to know your document size ahead of time
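
How much does a v1 document grow? A rough estimate for the `minute` subdocument follows from the BSON layout. A document is a 4-byte length, then its elements, then a trailing NUL byte. Each element is a type byte, the key as a C string, and the value.

The sketch assumes every counter has the same numeric type; the mongo shell stores plain JavaScript numbers as doubles. It ignores the other fields and the rest of the document. The function name is hypothetical.

```javascript
// Approximate BSON size of the v1 "minute" subdocument after n minutes are written:
//   document = int32 length (4) + elements + terminating 0x00 (1)
//   element  = type byte (1) + key as C string (key bytes + 1) + value
// Assumption: every counter is stored as the same numeric type.
const VALUE_BYTES = { int32: 4, int64: 8, double: 8 };

function minuteSubdocBytes(nMinutes, valueType) {
  const keyBytes = 4 + 1;                               // "0000".."1439" plus NUL
  const elementBytes = 1 + keyBytes + VALUE_BYTES[valueType];
  return 4 + nMinutes * elementBytes + 1;
}

for (const n of [1, 60, 720, 1440]) {
  console.log(n, minuteSubdocBytes(n, "int32"), minuteSubdocBytes(n, "double"));
}
// 1 15 19
// 60 605 845
// 720 7205 10085
// 1440 14405 20165
```

A document starts small after its first upsert. By the end of the day its `minute` subdocument alone is roughly 14 to 20 KB, a growth that adaptive padding can only guess at.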

Slide 13

Midnight problem
• Upserts are convenient, but what’s our key?
• date/metric
• At midnight the date changes for every metric at once, so you get a huge spike in inserts
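
The `_id` combines date and metric, so the first update of each metric on a new day is an insert. Every metric crosses midnight at the same moment. A toy simulation that counts inserts per simulated minute makes the spike visible; it is hypothetical, not the author's load generator.

```javascript
// Toy simulation of the midnight problem (hypothetical, not the author's load generator):
// each simulated minute, every metric is upserted into its "<day>/<metric>" document.
// Count how many of those upserts had to insert a brand-new document.
function insertsPerMinute(nMetrics, nDays) {
  const existing = new Set();
  const counts = [];                       // counts[i] = inserts during minute i of the run
  for (let day = 0; day < nDays; day++) {
    for (let minute = 0; minute < 1440; minute++) {
      let inserts = 0;
      for (let m = 0; m < nMetrics; m++) {
        const id = day + "/metric-" + m;
        if (!existing.has(id)) {           // upsert on a missing _id => insert
          existing.add(id);
          inserts++;
        }
      }
      counts.push(inserts);
    }
  }
  return counts;
}

const counts = insertsPerMinute(100, 2);
console.log(counts[0], counts[1], counts[1439], counts[1440]);   // 100 0 0 100
```

All 100 inserts per day land in the first minute after midnight, and every other minute sees none.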

Slide 14

Fixing the document movement problem
• Preallocate documents with zeros
• Crontab (?)
• NO! (makes the midnight problem even worse)

    // create the full document up front, with every counter set to zero
    db.stats.update(
        { _id: id, metadata: metadata },
        { $inc: {
            daily: 0,
            'hourly.00': 0, 'hourly.01': 0, ...
            'minute.0000': 0, 'minute.0001': 0, ... } },
        true)  // upsert
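
Preallocation can reuse the upsert path. An `$inc` of zero leaves an existing document unchanged; on a missing document it creates every counter up front. The sketch below builds that zero-filled spec. It assumes the v1 key formats (hours `"00"` to `"23"`, minutes `"0000"` to `"1439"`), and the function name is hypothetical.

```javascript
// Build the "all zeros" $inc spec used to preallocate a full v1 document.
// Key formats follow the v1 sample document: hours "00".."23", minutes "0000".."1439".
function preallocationIncrement() {
  const inc = { daily: 0 };
  for (let h = 0; h < 24; h++) {
    inc["hourly." + String(h).padStart(2, "0")] = 0;
  }
  for (let m = 0; m < 1440; m++) {
    inc["minute." + String(m).padStart(4, "0")] = 0;
  }
  return inc;
}

const inc = preallocationIncrement();
console.log(Object.keys(inc).length);   // 1465 (1 daily + 24 hourly + 1440 minute)
// In the mongo shell this object would be passed as { $inc: inc } with upsert = true.
```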

Slide 15

Fixing the midnight problem

Slide 16

Fixing the midnight problem
• Could schedule preallocation for different metrics, staggered through the day

Slide 17

Fixing the midnight problem
• Could schedule preallocation for different metrics, staggered through the day
• Observation: Preallocation isn’t required for correct operation

Slide 18

Fixing the midnight problem
• Could schedule preallocation for different metrics, staggered through the day
• Observation: Preallocation isn’t required for correct operation
• Let’s just preallocate tomorrow’s docs randomly as new stats are inserted (with low probability).
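
Random preallocation works because the zero-filled upsert is idempotent. Issuing it several times costs a little work but never corrupts counts. Missing it only means that document is created and grown by ordinary upserts, as in v1.

The arithmetic below uses illustrative numbers that are assumptions, not figures from the talk: a preallocation probability `p = 0.001` and 10,000 updates per metric per day. The function names are hypothetical.

```javascript
// Hypothetical sketch: piggyback preallocation of tomorrow's document on normal updates.
// With probability p per update, also issue the zero-filled upsert for tomorrow.
function shouldPreallocate(p, rng = Math.random) {
  return rng() < p;
}

// Probability that a metric receiving n updates in a day is never preallocated
// (each update independently triggers preallocation with probability p).
function missProbability(p, n) {
  return Math.pow(1 - p, n);
}

// Expected number of (idempotent) preallocation upserts per metric per day.
function expectedPreallocations(p, n) {
  return n * p;
}

console.log(missProbability(0.001, 10000).toExponential(2));   // 4.52e-5
console.log(expectedPreallocations(0.001, 10000));              // 10
```

Under these assumed numbers, each metric's next-day document is almost always preallocated, at the cost of about ten extra upserts per day. Those upserts are spread randomly through the day instead of bunched at midnight.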

Slide 19

Performance with Preallocation
[Chart annotation: “Experiment startup”]

Slide 20

Performance with Preallocation
• Well, it’s better
[Chart annotation: “Experiment startup”]

Slide 21

Performance with Preallocation
• Well, it’s better
• Still have decreasing performance through the day... WTF?
[Chart annotation: “Experiment startup”]

Slide 23

Problems with v1
• The document movement problem
• The midnight problem
• The end-of-the-day problem
• The historical query problem

Slide 24

End-of-day problem
• BSON stores documents as an association list (an ordered list of key/value pairs)
• MongoDB must check each key in turn for a match
• Load increases significantly at the end of the day (MongoDB must scan 1439 keys to find the right minute!)
[Diagram: the minute subdocument as a list of pairs: “0000” → value, “0001” → value, …, “1439” → value]
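
The linear-search argument can be modelled directly. Treat the `minute` subdocument as an ordered list of key/value pairs and count how many keys are passed over before the match. This models the slide's reasoning, not MongoDB's update code, and `keysSkipped` is a hypothetical name.

```javascript
// Model of a BSON document as an ordered list of [key, value] pairs, searched linearly.
// This mirrors the slide's argument; it is not MongoDB's actual update code.
function keysSkipped(pairs, key) {
  for (let i = 0; i < pairs.length; i++) {
    if (pairs[i][0] === key) return i;   // i non-matching keys were passed over
  }
  return -1;                             // key not present
}

// The v1 "minute" subdocument: "0000".."1439" in insertion order.
const minutes = [];
for (let m = 0; m < 1440; m++) minutes.push([String(m).padStart(4, "0"), 0]);

console.log(keysSkipped(minutes, "0000"));   // 0
console.log(keysSkipped(minutes, "0720"));   // 720
console.log(keysSkipped(minutes, "1439"));   // 1439
```

The cost of each update grows linearly with the minute of the day, which matches the steady decline in performance through the day.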

Slide 25

Fixing the end-of-day problem
• Split up our ‘minute’ property by hour
• Better worst-case keys scanned:
  • Old: 1439
  • New: 82

    {
      _id: "20101010/metric-1",
      metadata: { date: ISODate("2010-10-10T00:00:00Z"), metric: "metric-1" },
      daily: 5468426,
      hourly: { "00": 227850, "01": 210231, ..., "23": 20457 },
      minute: {
        "00": { "0000": 3612, "0001": 3241, ... },
        ...,
        "23": { ..., "1439": 2819 }
      }
    }
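
The same counting applies to the hierarchical layout. A write to minute `m` first scans the 24 hour keys, then the 60 minute keys inside that hour. Counting skipped keys as in the flat case, the worst case is 23 + 59 = 82, which matches the slide. The helper names are hypothetical; the path format follows the sample document.

```javascript
// Hierarchical minutes: the $inc path is "minute.<HH>.<MMMM>" (hour, then minute-of-day).
function minutePath(minuteOfDay) {
  const hour = Math.floor(minuteOfDay / 60);
  return "minute." + String(hour).padStart(2, "0") + "." + String(minuteOfDay).padStart(4, "0");
}

// Same linear-scan model as the flat case: skipped hour keys + skipped minute keys.
function keysSkippedHierarchical(minuteOfDay) {
  const hour = Math.floor(minuteOfDay / 60);
  const minuteInHour = minuteOfDay % 60;
  return hour + minuteInHour;
}

// Flat v1 layout: every earlier minute key is skipped.
function keysSkippedFlat(minuteOfDay) {
  return minuteOfDay;
}

console.log(minutePath(1439));                                        // minute.23.1439
console.log(keysSkippedFlat(1439), keysSkippedHierarchical(1439));   // 1439 82
```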

Slide 26

“Hierarchical minutes” Performance

Slide 27

Performance Comparison

Slide 28

Performance Comparison (2.2)

Slide 29

Historical Query Problem
• Intra-day queries are great
• What about “performance year to date”?
• Now you’re hitting a lot of “cold” documents and causing page faults

Slide 30

Fixing the historical query problem
• Store multiple levels of granularity in different collections
• Two updates rather than one, but historical queries are much faster
• Preallocate along with daily docs (only infrequently upserted)

    {
      _id: "201010/metric-1",
      metadata: { date: ISODate("2010-10-01T00:00:00Z"), metric: "metric-1" },
      daily: { "1": 5468426, "2": ..., ..., "31": ... }
    }
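
With a monthly collection added, each stat becomes two `$inc` updates. One goes to the daily document (now using the hierarchical minute path) and one to the month's document.

The sketch below builds both specs, assuming UTC timestamps and unpadded day-of-month keys as in the sample. The slide does not name the collections, so the code returns hypothetical `_id`/spec pairs rather than calling any driver.

```javascript
// Hypothetical: the two $inc specs issued per stat when keeping both a daily
// and a monthly document. Collection names are not given on the slide.
function pad(n, w) {
  return String(n).padStart(w, "0");
}

function updatesFor(date, metric) {
  const y = date.getUTCFullYear();
  const mo = pad(date.getUTCMonth() + 1, 2);
  const d = date.getUTCDate();
  const hh = pad(date.getUTCHours(), 2);
  const mmmm = pad(date.getUTCHours() * 60 + date.getUTCMinutes(), 4);

  // Daily document: hierarchical minutes, keyed "<YYYYMMDD>/<metric>".
  const daily = { daily: 1 };
  daily["hourly." + hh] = 1;
  daily["minute." + hh + "." + mmmm] = 1;

  // Monthly document: one counter per day of month, keyed "<YYYYMM>/<metric>".
  const monthly = {};
  monthly["daily." + d] = 1;

  return [
    { _id: y + mo + pad(d, 2) + "/" + metric, inc: daily },
    { _id: y + mo + "/" + metric, inc: monthly },
  ];
}

const [day, month] = updatesFor(new Date(Date.UTC(2010, 9, 10, 12, 30)), "metric-1");
console.log(JSON.stringify(day));
// {"_id":"20101010/metric-1","inc":{"daily":1,"hourly.12":1,"minute.12.0750":1}}
console.log(JSON.stringify(month));
// {"_id":"201010/metric-1","inc":{"daily.10":1}}
```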

Slide 31

Queries
• Updates are by _id, so no index is needed there
• Chart queries are by metadata
• Your range/sort should be last in the compound index

    db.stats.daily.find(
        { "metadata.date": { $gte: dt1, $lte: dt2 },
          "metadata.metric": "metric-1" },
        { "metadata.date": 1, "hourly": 1 })
      .sort({ "metadata.date": 1 })
    db.stats.daily.ensureIndex({ 'metadata.metric': 1, 'metadata.date': 1 })
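
Why put the range field last? A toy model treats a compound index as a sorted list of key tuples, with dates simplified to day numbers. With `metric` first, the entries matching one metric and a date range are adjacent and already in date order. With `date` first, they are scattered among other metrics' entries. This illustrates the ordering principle only; it is not how MongoDB stores its B-tree indexes, and the helper names are hypothetical.

```javascript
// Toy model of a compound index as a sorted array of key tuples (not MongoDB internals).
function buildIndex(docs, fields) {
  return docs
    .map((d) => fields.map((f) => d[f]))
    .sort((a, b) => {
      for (let i = 0; i < a.length; i++) {
        if (a[i] < b[i]) return -1;
        if (a[i] > b[i]) return 1;
      }
      return 0;
    });
}

// Positions in the index of entries with metric === m and d1 <= date <= d2.
function matchingPositions(index, fields, m, d1, d2) {
  const mi = fields.indexOf("metric");
  const di = fields.indexOf("date");
  const out = [];
  index.forEach((k, pos) => {
    if (k[mi] === m && k[di] >= d1 && k[di] <= d2) out.push(pos);
  });
  return out;
}

const docs = [];
for (const metric of ["metric-1", "metric-2", "metric-3"]) {
  for (let day = 1; day <= 5; day++) docs.push({ metric, date: day });
}

const good = buildIndex(docs, ["metric", "date"]);   // equality field first, range last
const bad = buildIndex(docs, ["date", "metric"]);    // range field first
console.log(matchingPositions(good, ["metric", "date"], "metric-2", 2, 4));   // [ 6, 7, 8 ]
console.log(matchingPositions(bad, ["date", "metric"], "metric-2", 2, 4));    // [ 4, 7, 10 ]
```

With the equality field first, the query reads one contiguous slice that is already sorted by date. With the range field first, it must walk the whole date range and skip other metrics along the way.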

Slide 32

Conclusion
• Monitor your performance. Watch out for spikes.
• Preallocate to prevent document copying
• Pay attention to the number of keys in your documents (hierarchy can help)
• Make sure your index is optimized for your sorts

Slide 33

Questions?
MongoDB Monitoring Service
http://www.10gen.com/mongodb-monitoring-service
Rick Copeland
@rick446
http://arborian.com
MongoDB Consulting & Training