Geo & capped collections with MongoDB

Russell Smith

April 20, 2012

Transcript

  1. Geo & capped collections with MongoDB - use cases & performance in the real world
    Russell Smith
    Friday, 20 April 12
  2. /usr/bin/whoami
    • Russell Smith
    • Consultant for UKD1 Limited
    • Specialising in helping companies going through rapid growth
    • Help with code, architecture, infrastructure, devops, sysops, capacity planning, etc.
    • <3 gearman, mongodb, neo4j, mysql, kohana, riaksearch, php, debian
  3. What are capped collections?
    • A special option for collections which allows you to:
      • Limit the size of a collection
      • Limit the number of documents in a collection
  4. Why is this useful?
    • Logging
    • News feeds
    • Caching
  5. Restrictions
    • Can’t delete objects (you can drop the collection though)
    • Can’t update objects (well... not 100% true - you can, but only if the size doesn’t change)
  6. Neat things
    • Objects are returned in insertion order
    • Don’t have to worry about things growing out of control
    • No more cronjobs to delete old data
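The behaviour described on slides 3 to 6 can be modelled in a few lines of plain JavaScript. This is only an in-memory sketch, not MongoDB code: `CappedModel` and its methods are hypothetical names, and the model caps by document count alone, whereas a real capped collection is primarily capped by its size in bytes.

```javascript
// In-memory model of capped-collection behaviour (illustration only, NOT MongoDB).
// Assumption: we cap by document count; MongoDB caps by bytes, optionally by count.
function CappedModel(maxDocs) {
  this.maxDocs = maxDocs;
  this.docs = [];
}

// Inserting past the cap discards the oldest document, so no cleanup job is needed.
CappedModel.prototype.insert = function (doc) {
  this.docs.push(doc);
  if (this.docs.length > this.maxDocs) {
    this.docs.shift();
  }
};

// Documents come back in insertion order (a copy, so callers cannot mutate the store).
CappedModel.prototype.find = function () {
  return this.docs.slice();
};
```

Inserting five documents into a model capped at three leaves only the last three, oldest first, which is the property that makes capped collections a natural fit for logs and feeds.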
  7. Real world example
    • SongFor.com: Activity feeds for the front of the site are using capped collections
    • StrongHold Kingdoms (MMO): Logging of events from syslog / internal events are stored in a capped collection (Graylog2 & a custom dashboard system)
    • Lioness: cache + log of data, used to cut lag & reduce API access requests to Facebook
    • Findlunch.in: Caching of page data
  8. How to set up...
    • db.createCollection("mycoll", {capped:true, size:100000}) (size is in bytes)
    • db.createCollection("mycoll", {capped:true, size:100000, max:100}); (limit to 100 documents - this is slower as it adds overhead)
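The two calls above could be wrapped in small helpers, sketched here under some assumptions: `createCapped` and `latest` are hypothetical names, and `db` is whatever database handle the mongo shell (or a compatible stand-in) provides. The second helper reads newest-first by sorting on `$natural` in reverse, which suits an activity feed or log viewer.

```javascript
// Hypothetical helper: create a capped collection. sizeBytes is required by
// MongoDB; maxDocs is optional and adds the document-count limit from the slide.
function createCapped(db, name, sizeBytes, maxDocs) {
  var options = { capped: true, size: sizeBytes };
  if (maxDocs !== undefined) {
    options.max = maxDocs;
  }
  return db.createCollection(name, options);
}

// Hypothetical helper: return a cursor over the n most recently inserted
// documents, using reverse natural (insertion) order.
function latest(db, name, n) {
  return db.getCollection(name).find().sort({ $natural: -1 }).limit(n);
}

// Mongo shell usage (needs a running server):
//   createCapped(db, "mycoll", 100000, 100);
//   latest(db, "mycoll", 10).forEach(printjson);
```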
  9. Benchmarking
    • Scripts available on GitHub
    • Testing server: MacBook 5.1, Core 2 2.4 GHz, 4 GB RAM, Vertex 2
  10. Results: capped

  11. Results: capped & limited

  12. Real world
    • Log data from

  13. What is geospatial indexing?
    • Allows you to easily find things close to a point - great for location-based services
    • Insert a document as normal, but with some co-ordinates...
    • ensureIndex({location_field: '2d'});
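The steps on this slide might look roughly like the sketch below. The field name `location_field` comes from the slide; everything else (`makePlace`, `geoIndexSpec`, `nearQuery`, the `places` collection, the `[longitude, latitude]` ordering) is an assumption made for illustration. The helpers only build plain objects, so they can be inspected without a server.

```javascript
var GEO_FIELD = "location_field"; // field name taken from the slide

// Build a document "as normal, but with some co-ordinates".
// Assumption: coordinates are stored as a [longitude, latitude] pair.
function makePlace(name, lng, lat) {
  var doc = { name: name };
  doc[GEO_FIELD] = [lng, lat];
  return doc;
}

// Index specification for a 2d geospatial index on the field.
function geoIndexSpec() {
  var spec = {};
  spec[GEO_FIELD] = "2d";
  return spec;
}

// Query for documents near a point, optionally bounded by maxDistance
// (for a flat 2d query this is in the same units as the coordinates).
function nearQuery(lng, lat, maxDistance) {
  var clause = { $near: [lng, lat] };
  if (maxDistance !== undefined) {
    clause.$maxDistance = maxDistance;
  }
  var query = {};
  query[GEO_FIELD] = clause;
  return query;
}

// Mongo shell usage (needs a running server; "places" is a made-up name):
//   db.places.ensureIndex(geoIndexSpec());
//   db.places.insert(makePlace("Some cafe", -0.1276, 51.5072));
//   db.places.find(nearQuery(-0.1276, 51.5072)).limit(5);
```

Newer MongoDB releases use `createIndex` in place of `ensureIndex`; the index specification itself is the same.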
  14. Other magic things
    • You can search $within a $box, or around a $center point (circle)... which is handy
    • Results are sorted by distance from the query point
    • geoNear (command) will return distance, average distance, etc.
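As a rough sketch of the query shapes named on this slide, the helpers below build the `$within`/`$box`, `$within`/`$center` and `geoNear` documents as plain objects. The helper names, the `places` collection and the sample coordinates are assumptions for illustration; the operators are the ones available in the MongoDB versions the talk covers.

```javascript
// Query for points inside a rectangle given its bottom-left and top-right corners.
function withinBox(field, bottomLeft, topRight) {
  var query = {};
  query[field] = { $within: { $box: [bottomLeft, topRight] } };
  return query;
}

// Query for points inside a circle; radius is in the same units as the coordinates.
function withinCircle(field, center, radius) {
  var query = {};
  query[field] = { $within: { $center: [center, radius] } };
  return query;
}

// Command document for geoNear, asking for the num closest documents.
function geoNearCommand(collectionName, point, num) {
  return { geoNear: collectionName, near: point, num: num };
}

// Mongo shell usage (needs a running server; "places" is a made-up name):
//   db.places.find(withinBox("location_field", [-0.2, 51.4], [0.0, 51.6]));
//   db.places.find(withinCircle("location_field", [-0.1276, 51.5072], 0.05));
//   var res = db.runCommand(geoNearCommand("places", [-0.1276, 51.5072], 10));
//   // res.results[i].dis is the distance of each hit; res.stats.avgDistance the average.
```

Later MongoDB releases replaced `$within` with `$geoWithin` and eventually removed the `geoNear` command, so this sketch applies to the era of the talk.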
  15. Versions & support
    • > v1.3.3 for the initial version, which is pretty old
    • >= 1.7.0 added spherical support (more in a minute)
    • < 1.7.2 doesn’t support geo in sharded collections. It still has issues, see: http://jira.mongodb.org/browse/SERVER-926
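As background to the spherical support mentioned above: spherical queries on plain coordinate pairs express distances in radians, so values usually need converting to and from kilometres. The sketch below shows that conversion plus a haversine calculation for sanity-checking results. The function names are hypothetical, the Earth radius is an approximate mean value, and this is not a claim about how MongoDB computes distances internally.

```javascript
var EARTH_RADIUS_KM = 6371; // approximate mean Earth radius (assumption)

function toRadians(degrees) {
  return degrees * Math.PI / 180;
}

// Haversine central angle, in radians, between two [longitude, latitude]
// points given in degrees.
function centralAngle(a, b) {
  var dLat = toRadians(b[1] - a[1]);
  var dLng = toRadians(b[0] - a[0]);
  var h = Math.pow(Math.sin(dLat / 2), 2) +
          Math.cos(toRadians(a[1])) * Math.cos(toRadians(b[1])) *
          Math.pow(Math.sin(dLng / 2), 2);
  return 2 * Math.atan2(Math.sqrt(h), Math.sqrt(1 - h));
}

// Convert between radians on the sphere and kilometres on the ground.
function radiansToKm(radians) {
  return radians * EARTH_RADIUS_KM;
}

function kmToRadians(km) {
  return km / EARTH_RADIUS_KM;
}
```

For example, a 5 km search radius would be passed to a spherical query as `kmToRadians(5)`, and a distance returned in radians would be turned back into kilometres with `radiansToKm`.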
  16. And the point is?
    • Really really easy to find documents around a specific point
    • You can get distances from that point as well
    • Lookups are fast and accurate
    • This is a pain to do in MySQL
  17. Example
    • FindLunch.in lets people find lunch close to where they are.
    • Geocode lookup of ~2 million postcodes from the UK
    • All London tube stations & most boroughs are also listed
    • Average query time Xms over 20k of test documents
  18. Benchmarking
    • Machine is the same as for capped collections
    • Code will be on GitHub - minus the postcode data, which isn’t open source
  19. Results

  20. Summary
    • Capped collections - super fast, great for certain things
    • Geo - again, really fast and handy for location-based services
  21. Questions?
    • Any questions?