MongoDB - Operations Best Practices

Presented at MongoDC 2012
A short interactive session working through three problem scenarios, the possible diagnosis methods for each, and root cause analysis.

Mike Fiedler

June 27, 2012

Transcript

  1. Scenario 1
     •  Fire, it is on fire!
     •  Users notice response time takes 1-3 sec
     •  App logs show timeouts
     •  Server logs show socket exceptions
  2. Scenario 1 - Diagnostics
     •  Logs
     •  Understanding the timeouts
     •  Client read timeout set
     •  Connection closed/discarded
     •  Symptom not cause
     •  Server connection exceptions
     •  Match timing of client timeouts
     •  Symptom not cause
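Matching the timing of client timeouts against server connection exceptions could be sketched roughly as follows. The event times and the 5-second window are hypothetical values chosen only for illustration; real logs would first need to be parsed into timestamps.

```javascript
// Hypothetical event times in milliseconds; real values would come from parsed logs.
const clientTimeouts = [1000, 61000, 125000];
const serverExceptions = [1200, 30000, 124500];

// Pair each client timeout with the server exceptions seen within windowMs of it.
function correlate(clientTimes, serverTimes, windowMs) {
  return clientTimes.map((t) => ({
    clientTime: t,
    serverMatches: serverTimes.filter((s) => Math.abs(s - t) <= windowMs),
  }));
}

for (const p of correlate(clientTimeouts, serverExceptions, 5000)) {
  console.log(p.clientTime, "->", p.serverMatches.join(",") || "no match");
}
```

A close match in timing only shows that the two symptoms travel together; as the slide notes, neither is the cause.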
  3. Scenario 1 - Takeaways
     •  Monitor Logs
     •  Alert, escalate
     •  Correlate
     •  Disk
     •  Monitor
     •  Moved to RAID (10)
     •  Instrument/Monitor App
     •  Know your application and application (write) characteristics
  4. Scenario 2
     •  Alerts warn that server is running hot
     •  Random (small) slowdowns
     •  Increased traffic/queries
  5. Scenario 2 - Diagnostics
     •  Turn on DB Profiling
     •  Look at logs
     •  Identify query patterns
     •  taking longest
     •  highest frequency
     •  Run: <query>.explain()
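Once profiling data is collected, finding the patterns that take longest and run most often is a small grouping exercise. The sketch below works on simplified, hypothetical entries (a `pattern` label and a `millis` duration); real profiler documents carry more fields and would need to be reduced to a pattern first.

```javascript
// Simplified, hypothetical profiler entries: a query pattern label and its duration.
const entries = [
  { pattern: "find by user", millis: 120 },
  { pattern: "find by user", millis: 95 },
  { pattern: "recent items sorted", millis: 400 },
  { pattern: "find by user", millis: 110 },
];

// Group entries by pattern, tracking run count, total time, and worst-case time.
function summarize(profileEntries) {
  const byPattern = new Map();
  for (const e of profileEntries) {
    const s = byPattern.get(e.pattern) ||
      { pattern: e.pattern, count: 0, totalMillis: 0, maxMillis: 0 };
    s.count += 1;
    s.totalMillis += e.millis;
    s.maxMillis = Math.max(s.maxMillis, e.millis);
    byPattern.set(e.pattern, s);
  }
  return [...byPattern.values()];
}

const summary = summarize(entries);
const slowest = [...summary].sort((a, b) => b.maxMillis - a.maxMillis)[0];
const mostFrequent = [...summary].sort((a, b) => b.count - a.count)[0];
console.log("slowest:", slowest.pattern, "| most frequent:", mostFrequent.pattern);
```

The patterns that surface at the top of either ranking are the natural candidates for `.explain()`.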
  6. Scenario 2 - Explain
     db.s2.find({...}).sort({...}).explain()
     {
       "cursor" : "BtreeCursor ABC",
       "nscanned" : 160677,
       "nscannedObjects" : 12015,
       "n" : 55,
       "millis" : 99,
       "scanAndOrder" : true,
       "indexBounds" : {...}
     }
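One way to read this output: the query examined 160677 index entries to return 55 documents, and `scanAndOrder: true` means the results were sorted in memory instead of being read in index order. The sketch below turns that reading into a small check, using the numbers from the slide. The threshold of 10 entries per returned document is an arbitrary assumption, not a MongoDB rule.

```javascript
// Values copied from the explain() output on the slide.
const plan = {
  cursor: "BtreeCursor ABC",
  nscanned: 160677,
  nscannedObjects: 12015,
  n: 55,
  millis: 99,
  scanAndOrder: true,
};

// Flag plans that scan far more entries than they return or that sort in memory.
// maxRatio is an assumed threshold chosen for illustration.
function assessPlan(p, maxRatio = 10) {
  const findings = [];
  const scanRatio = p.n > 0 ? p.nscanned / p.n : Infinity;
  if (scanRatio > maxRatio) {
    findings.push("scanned about " + Math.round(scanRatio) + " index entries per returned document");
  }
  if (p.scanAndOrder) {
    findings.push("results were sorted in memory (scanAndOrder)");
  }
  return { scanRatio, findings };
}

console.log(assessPlan(plan).findings.join("; "));
```

For the slide's numbers the ratio comes out at roughly 2921 entries scanned per document returned.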
  7. Scenario 2 - Diagnostics
     •  Create a compound index
     •  Used for criteria and sort
     •  Reduced CPU dramatically
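The slide does not show the actual index, so the following is only a sketch of how a key specification covering both the criteria and the sort might be assembled. The field names `status` and `created` are hypothetical, and the ordering used here (criteria fields first, then sort fields) is a common convention that the deck itself does not state.

```javascript
// Build an index key spec: criteria fields first (ascending), then the sort fields.
function buildCompoundIndexKeys(criteriaFields, sortSpec) {
  const keys = {};
  for (const field of criteriaFields) {
    keys[field] = 1;
  }
  for (const [field, direction] of Object.entries(sortSpec)) {
    if (!(field in keys)) {
      keys[field] = direction;
    }
  }
  return keys;
}

// Hypothetical query: filter on "status", sort by "created" descending.
const indexKeys = buildCompoundIndexKeys(["status"], { created: -1 });
console.log(JSON.stringify(indexKeys));
```

In the mongo shell of that period, an object like this would be passed to `db.s2.ensureIndex(...)`; later releases name the same operation `createIndex`.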
  8. Scenario 2 - Takeaways
     •  Performance test/analyze system behavior
     •  Load test before deployment
     •  Alert on abnormal states
     •  Increased CPU may be a sign of poorly indexed data
     •  Perform a rolling upgrade for indexes
  9. Scenario 3 - Diagnostics
     •  iostat

     Device:  rrqm/s  wrqm/s  r/s   w/s   rsec/s  wsec/s  avgrq-sz  avgqu-sz  await     svctm    %util
     sdp      0.00    0.00    0.50  0.00  27.86   0.00    56.00     149.58    20320.00  2010.00  100.00
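The striking part of this sample is the combination: the device is 100% utilized and requests wait about 20 seconds (`await` is reported in milliseconds), yet it completes only half a read per second. A minimal sketch of pulling those fields out of the two lines follows. The saturation rule at the end is an assumed heuristic for illustration, not an iostat feature.

```javascript
// Header and data row as shown on the slide.
const header = "Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util";
const row = "sdp 0.00 0.00 0.50 0.00 27.86 0.00 56.00 149.58 20320.00 2010.00 100.00";

// Map each column name to its numeric value for one device row.
function parseIostatRow(headerLine, rowLine) {
  const names = headerLine.trim().split(/\s+/).slice(1); // drop the "Device:" label
  const [device, ...values] = rowLine.trim().split(/\s+/);
  const stats = { device };
  names.forEach((name, i) => {
    stats[name] = parseFloat(values[i]);
  });
  return stats;
}

const stats = parseIostatRow(header, row);
// Assumed heuristic: fully busy device that completes less than one request per second.
const busyButIdle = stats["%util"] >= 99 && stats["r/s"] + stats["w/s"] < 1;
console.log(stats.device, "await(ms):", stats["await"], "busy with little throughput:", busyButIdle);
```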
  10. Scenario 3 - More Diagnostics
      $ blockdev --report

      RO  RA    SSZ  BSZ   StartSec  Size           Device
      rw  8096  512  4048  0         1099494850560  /dev/sdp

      •  Huge read-ahead of 4MB
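The 4MB figure follows from the RA column, which `blockdev` reports in 512-byte sectors. A short worked calculation using the value from the slide:

```javascript
// blockdev reports read-ahead (RA) as a count of 512-byte sectors.
const SECTOR_BYTES = 512;

function readAheadBytes(raSectors) {
  return raSectors * SECTOR_BYTES;
}

function toMiB(bytes) {
  return bytes / (1024 * 1024);
}

const raBytes = readAheadBytes(8096); // RA value from the slide
console.log(raBytes + " bytes = " + toMiB(raBytes).toFixed(2) + " MiB");
```

So 8096 sectors is 4,145,152 bytes, just under 4 MiB, which is large when the workload mostly needs small random reads.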
  11. Scenario 3 - Takeaways
      •  Pay attention to disk configurations
      •  Load testing would have found this early
      •  MongoDB depends on the OS a lot
      •  Connect the dots from disproportionate effects
      •  Using blockdev, be aware of layering!
  12. Best Practices Covered - 1
      •  System provisioning
      •  Configuration
      •  Performance
      •  Capacity
      •  Logs
      •  Review
      •  Alert
      •  Rotate and collect (per cluster)
  13. Best Practices Covered - 2
      •  Query/Index Analysis
      •  Instrument app code, generate metrics
      •  Run .explain()
      •  Database Profiler
      •  Plan/test rollouts
      •  Rolling upgrade for Replica Sets
      •  Generate indexes on Secondaries first
      •  Use DNS, not IPs