• Software engineer at SourceForge
• Author of Essential SQLAlchemy
• Author of MongoDB with Python and Ming
• Primarily code Python
Friday, September 14, 12
all MongoDB users
• Minute-by-minute stats on all your servers
• Hardware cost is important, use it efficiently (remember it’s a free service!)
Collection of 100 metrics, each with per-minute values
• “Simulation time” is 300x real time
• Run on 2x AWS small instances
• one MongoDB server (2.0.2)
• one “load generator”
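To make the 300x figure concrete, here is the conversion as plain arithmetic. The constant and function names are illustrative only and do not come from the talk.

```python
# Simulation runs 300x faster than real time (figure taken from the slide).
SPEEDUP = 300


def real_seconds(simulated_seconds: float) -> float:
    """Wall-clock seconds needed to simulate the given span of time."""
    return simulated_seconds / SPEEDUP


print(real_seconds(60))            # one simulated minute -> 0.2
print(real_seconds(24 * 60 * 60))  # one simulated day    -> 288.0
```

So a simulated minute lasts about a fifth of a second, and a full simulated day takes a little under five real minutes.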
possible during the course of a simulated minute
• Record number of updates per second
• Occasionally call getLastError to prevent disconnects
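A rough sketch of what such a load loop might look like. Everything here is an assumption for illustration: `do_update` stands in for one stat update, `sync` stands in for a getLastError round trip, and `sync_every` is an arbitrary value. No real driver is involved, so the loop can be run and timed on its own.

```python
import time


def run_simulated_minute(do_update, sync, real_budget=0.2,
                         sync_every=1000, clock=time.monotonic):
    """Issue updates until real_budget seconds of real time have passed.

    Returns (updates, updates_per_second).
    """
    start = clock()
    updates = 0
    while clock() - start < real_budget:
        do_update()                    # hypothetical: one stat update
        updates += 1
        if updates % sync_every == 0:
            sync()                     # hypothetical: getLastError stand-in
    elapsed = clock() - start
    return updates, updates / elapsed


if __name__ == "__main__":
    counter = {"n": 0}

    def fake_update():
        counter["n"] += 1

    total, rate = run_simulated_minute(fake_update, sync=lambda: None)
    print(f"{total} updates, {rate:.0f} updates/sec")
```

The default `real_budget` of 0.2 seconds corresponds to one simulated minute at 300x. The clock is injectable so the loop can be exercised deterministically.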
... except when they’re not in place
• MongoDB adaptively pads documents
• ... but it’s better to know your doc size ahead of time
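One way to know the document size ahead of time is to insert the document at its full size, with every slot zeroed, so that later updates overwrite values rather than grow the document. The sketch below builds such a document as a plain dict. The `metadata` fields and the four-digit minute keys follow names that appear elsewhere in these slides; the `minute` field name is an assumption.

```python
from datetime import datetime


def preallocated_doc(metric: str, day: datetime) -> dict:
    """Build a full-size, zero-filled stats document for one metric and day."""
    return {
        "metadata": {"date": day, "metric": metric},
        # Assumed field name: one zeroed slot per minute, "0000" .. "1439".
        "minute": {f"{m:04d}": 0 for m in range(24 * 60)},
    }


doc = preallocated_doc("metric-1", datetime(2012, 9, 14))
print(len(doc["minute"]))  # 1440
```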
metrics, staggered through the day
• Observation: Preallocation isn’t required for correct operation
• Let’s just preallocate tomorrow’s docs randomly as new stats are inserted (with low probability).
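The random preallocation idea can be sketched as follows. A plain dict stands in for the collection, and the default probability of 1/1440 is an arbitrary illustrative value, not a figure from the talk.

```python
import random


def maybe_preallocate(store, metric, tomorrow, probability=1 / 1440, rng=random):
    """With low probability, create tomorrow's zero-filled doc if it is missing.

    `store` is a dict standing in for the collection (assumption).
    Returns True only when a new document was created.
    """
    if rng.random() >= probability:
        return False                   # most calls do nothing
    key = (metric, tomorrow)
    if key in store:
        return False                   # already preallocated earlier
    store[key] = {f"{m:04d}": 0 for m in range(24 * 60)}
    return True


store = {}
# Called once per recorded stat; over a day of inserts, the chance that
# tomorrow's document never gets created becomes small.
for _ in range(10_000):
    maybe_preallocate(store, "metric-1", "2012-09-15")
```

Because preallocation is not needed for correctness, a metric that misses its random preallocation still works; its first update of the day simply pays the cost of creating the document.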
• MongoDB must check each key for a match
• Load increases significantly at the end of the day (MongoDB must scan 1439 keys to find the right minute!)
[Figure: flat document keyed by minute: “0000” → Value, “0001” → Value, … “1439” → Value]
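A simplified counting model shows why the end of the day is expensive with a flat layout, and what nesting minutes under hours would save. It assumes keys are checked in order, as the slide describes; the function names and the hour/minute nesting are illustrative.

```python
def keys_skipped_flat(minute_of_day: int) -> int:
    """Keys checked before the target in a flat 0..1439 layout."""
    return minute_of_day


def keys_skipped_hierarchical(minute_of_day: int) -> int:
    """Keys checked before the target when minutes are nested under hours."""
    hour, minute = divmod(minute_of_day, 60)
    return hour + minute               # skip earlier hours, then earlier minutes


last = 24 * 60 - 1                     # 23:59, the last minute of the day
print(keys_skipped_flat(last))         # 1439
print(keys_skipped_hierarchical(last)) # 82
```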
there
• Chart queries are by metadata
• Your range/sort should be last in the compound index

    db.stats.daily.find(
        { "metadata.date": { $gte: dt1, $lte: dt2 },
          "metadata.metric": "metric-1" },
        { "metadata.date": 1, "hourly": 1 }
    ).sort({ "metadata.date": 1 })

    db.stats.daily.ensureIndex({ 'metadata.metric': 1, 'metadata.date': 1 })
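The same query and index can be written as Python data structures in the shape PyMongo expects. This sketch only builds the arguments and does not connect to a server; `chart_query` and `INDEX_KEYS` are illustrative names.

```python
from datetime import datetime

ASCENDING = 1  # same value as pymongo.ASCENDING

# Equality field first, range/sort field last.
INDEX_KEYS = [("metadata.metric", ASCENDING), ("metadata.date", ASCENDING)]


def chart_query(metric: str, dt1: datetime, dt2: datetime):
    """Return (filter, projection, sort) for a chart over [dt1, dt2]."""
    filter_ = {
        "metadata.metric": metric,
        "metadata.date": {"$gte": dt1, "$lte": dt2},
    }
    projection = {"metadata.date": 1, "hourly": 1}
    sort = [("metadata.date", ASCENDING)]
    return filter_, projection, sort


filter_, projection, sort = chart_query(
    "metric-1", datetime(2012, 9, 1), datetime(2012, 9, 14)
)
```

With a PyMongo collection object, these would typically be passed as `collection.find(filter_, projection, sort=sort)` and `collection.create_index(INDEX_KEYS)`. The index ends with the same field the query ranges over and sorts by.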
• Preallocate to prevent document copying
• Pay attention to the number of keys in your documents (hierarchy can help)
• Make sure your index is optimized for your sorts