Geo & capped collections with MongoDB

Russell Smith

April 20, 2012

Transcript

  1. Geo & capped collections with
    MongoDB - use cases &
    performance in the real world
    Russell Smith
    Friday, 20 April 12

  2. /usr/bin/whoami
    • Russell Smith
    • Consultant for UKD1 Limited
    • Specialising in helping companies going through rapid growth
    • Help with code, architecture, infrastructure, devops, sysops, capacity
    planning, etc.
    • <3 Gearman, MongoDB, Neo4j, MySQL, Kohana, Riak Search, PHP, Debian

  3. What are capped collections?
    • A special option for collections that lets you:
      • Limit the size of a collection (in bytes)
      • Limit the number of documents in a collection
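
To make the two limits concrete, here is a minimal in-memory sketch of how a capped collection behaves: documents stay in insertion order, and the oldest ones are evicted once either the byte limit (size) or the optional document limit (max) is exceeded. The CappedCollection class is hypothetical and is not MongoDB's implementation; document size is approximated by the byte length of the document's JSON, not its real BSON size.

```javascript
// Hypothetical in-memory model of capped-collection semantics (not MongoDB's code).
// Assumption: a document's size is the UTF-8 byte length of its JSON form.
class CappedCollection {
  constructor(size, max = Infinity) {
    this.size = size; // byte limit (required by MongoDB)
    this.max = max; // optional document-count limit
    this.docs = []; // kept in insertion (natural) order
    this.bytes = 0;
  }

  static sizeOf(doc) {
    return Buffer.byteLength(JSON.stringify(doc), "utf8");
  }

  insert(doc) {
    const docBytes = CappedCollection.sizeOf(doc);
    if (docBytes > this.size) {
      throw new Error("document is larger than the whole collection");
    }
    this.docs.push(doc);
    this.bytes += docBytes;
    // Evict the oldest documents until both limits hold again.
    while (this.bytes > this.size || this.docs.length > this.max) {
      const oldest = this.docs.shift();
      this.bytes -= CappedCollection.sizeOf(oldest);
    }
  }

  // Reads come back in insertion order.
  find() {
    return [...this.docs];
  }
}

const feed = new CappedCollection(100000, 3);
for (let i = 1; i <= 5; i++) feed.insert({ n: i });
console.log(feed.find().map((d) => d.n)); // [ 3, 4, 5 ]
```

Nothing deletes old entries explicitly: each insert is what pushes the oldest documents out, which is why no separate cleanup job is needed.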

  4. Why is this useful?
    • Logging
    • News feeds
    • Caching

  5. Restrictions
    • Can’t delete objects (you can drop the collection, though)
    • Can’t update objects (not 100% true: an update works as long as the
    document’s size doesn’t change)
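
The update restriction can be expressed as a simple check, sketched below under the rule as the slide states it: an in-place update is accepted only when the document's size is unchanged. canUpdateInPlace is a hypothetical helper, and the JSON byte length again stands in for the real BSON size.

```javascript
// Hypothetical check mirroring the slide's rule for capped collections:
// an update is only allowed if the document's size stays the same.
// Assumption: JSON byte length approximates BSON size.
function sizeOf(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

function canUpdateInPlace(oldDoc, newDoc) {
  return sizeOf(oldDoc) === sizeOf(newDoc);
}

const entry = { status: "open" };
console.log(canUpdateInPlace(entry, { status: "done" })); // true (same size)
console.log(canUpdateInPlace(entry, { status: "closed" })); // false (document grows)
```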

  6. Neat things
    • Objects are returned in insertion order
    • Don’t have to worry about things growing out of control
    • No more cron jobs to delete old data

  7. Real world example
    • SongFor.com: activity feeds on the front of the site use capped
    collections
    • Stronghold Kingdoms (MMO): events from syslog and internal events are
    logged to a capped collection (Graylog2 and a custom dashboard system)
    • Lioness: cache and log of data, used to cut lag and reduce API
    requests to Facebook
    • FindLunch.in: caching of page data

  8. How to set up...
    • db.createCollection("mycoll", {capped: true, size: 100000});
    (size is in bytes and is required, even when max is set)
    • db.createCollection("mycoll", {capped: true, size: 100000, max: 100});
    (also limits it to 100 documents; this is slower, as it adds overhead)

  9. Benchmarking
    • Scripts available on GitHub
    • Testing server: MacBook5,1, Core 2 2.4 GHz, 4 GB RAM, Vertex 2 SSD

  10. Results: capped

  11. Results: capped & limited

  12. Real world
    • Log data from

  13. What is Geospatial indexing?
    • Makes it easy to find things close to a point: great for location-based
    services
    • Insert a document as normal, but with some coordinates...
    • ensureIndex({location_field: '2d'});
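
Conceptually, a $near query on a '2d' index returns documents ordered by flat (Euclidean) distance from the query point. The sketch below reproduces that ordering with a brute-force scan; a real 2d index uses a geohash-based B-tree rather than scanning every document. The near function and the sample places are hypothetical, and the 100-result default limit is an assumption modelled on legacy $near behaviour.

```javascript
// Brute-force stand-in for a $near query on a '2d' index (hypothetical helper).
// Distance is flat Euclidean distance in coordinate units, nearest first.
// Assumption: a default limit of 100 results, as with legacy $near.
function near(docs, point, limit = 100) {
  const [x0, y0] = point;
  return docs
    .map((doc) => {
      const [x, y] = doc.location_field;
      return { doc, dis: Math.hypot(x - x0, y - y0) };
    })
    .sort((a, b) => a.dis - b.dis)
    .slice(0, limit);
}

// Hypothetical sample documents using the field name from the slide.
const places = [
  { name: "A", location_field: [3, 4] },
  { name: "B", location_field: [1, 1] },
  { name: "C", location_field: [10, 0] },
];
console.log(near(places, [0, 0]).map((r) => `${r.doc.name}:${r.dis}`));
// [ 'B:1.4142135623730951', 'A:5', 'C:10' ]
```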

  14. Other magic things
    • You can search $within a $box, or within a circle around a $center
    point... which is handy
    • $near results are sorted by distance from the query point
    • The geoNear command also returns each result’s distance, plus stats
    such as the average distance
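
The $box and $center shapes, and the kind of summary geoNear reports, can be sketched as plain functions. $box takes [[x1, y1], [x2, y2]] (lower-left and upper-right corners) and $center takes [[x, y], radius]. The helper names are hypothetical, counting boundary points as inside is an assumption, and the results/dis/obj/stats fields only loosely mirror the command's output.

```javascript
// Hypothetical helpers loosely mirroring flat '2d' $within shapes.
// Assumption: points on the boundary count as inside.
function inBox([x, y], [[x1, y1], [x2, y2]]) {
  return x >= x1 && x <= x2 && y >= y1 && y <= y2;
}

function inCenter([x, y], [[cx, cy], radius]) {
  return Math.hypot(x - cx, y - cy) <= radius;
}

// geoNear-style result: each point with its distance, nearest first, plus
// summary stats. Field names echo the command's output; the shape is a sketch.
function geoNearSketch(points, [x0, y0]) {
  const results = points
    .map((obj) => ({ dis: Math.hypot(obj[0] - x0, obj[1] - y0), obj }))
    .sort((a, b) => a.dis - b.dis);
  const total = results.reduce((sum, r) => sum + r.dis, 0);
  return {
    results,
    stats: {
      avgDistance: results.length ? total / results.length : 0,
      maxDistance: results.length ? results[results.length - 1].dis : 0,
    },
  };
}

const pts = [[0, 3], [3, 4], [6, 8]];
console.log(pts.filter((p) => inBox(p, [[0, 0], [5, 5]]))); // [ [ 0, 3 ], [ 3, 4 ] ]
console.log(pts.filter((p) => inCenter(p, [[0, 0], 3]))); // [ [ 0, 3 ] ]
console.log(geoNearSketch(pts, [0, 0]).stats); // { avgDistance: 6, maxDistance: 10 }
```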

  15. Versions & support
    • >= 1.3.3 for the initial version, which is pretty old
    • >= 1.7.0 added spherical support (more in a minute)
    • < 1.7.2 doesn’t support geo on sharded collections, and there are still
    open issues; see: http://jira.mongodb.org/browse/SERVER-926
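
Spherical support means distances are measured along the Earth's surface rather than on a flat plane. The sketch below computes a great-circle distance in radians (the unit spherical queries such as $nearSphere work in) and converts it to kilometres. The haversine formula is chosen here for illustration and is not necessarily what MongoDB uses internally; the 6371 km mean Earth radius is an assumption, and points are [longitude, latitude] in degrees, the order MongoDB expects for spherical queries.

```javascript
// Great-circle distance via the haversine formula (chosen for illustration).
// Points are [longitude, latitude] in degrees; the result is in radians.
const EARTH_RADIUS_KM = 6371; // assumed mean Earth radius

function toRadians(deg) {
  return (deg * Math.PI) / 180;
}

function sphericalDistanceRadians([lng1, lat1], [lng2, lat2]) {
  const dLat = toRadians(lat2 - lat1);
  const dLng = toRadians(lng2 - lng1);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLng / 2) ** 2;
  // Clamp guards against rounding pushing h slightly above 1.
  return 2 * Math.asin(Math.min(1, Math.sqrt(h)));
}

// A quarter of the way round the equator: about pi / 2 radians.
const radians = sphericalDistanceRadians([0, 0], [90, 0]);
console.log((radians * EARTH_RADIUS_KM).toFixed(1)); // 10007.5
```

Going the other way, a search radius in kilometres becomes radians by dividing by the same assumed radius (1 km is 1 / 6371 radians).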

  16. and the point is?
    • Really, really easy to find documents around a specific point
    • You can get distances from that point as well
    • Lookups are fast and accurate
    • This is a pain to do in MySQL

  17. Example
    • FindLunch.in lets people find lunch close to where they are.
    • Geocode lookup of ~2 million postcodes from the UK
    • All London tube stations & most boroughs are also listed
    • Average query time: Xms over 20k test documents

  18. Benchmarking
    • Machine is the same as for capped collections
    • Code will be on GitHub, minus the postcode data, which isn’t open source

  19. Results

  20. Summary
    • Capped collections: super fast, great for logging, news feeds and caching
    • Geo: again, really fast and handy for location-based services

  21. Questions?
    • Any questions?