Slide 1

Geo & capped collections with MongoDB - use cases & performance in the real world
Russell Smith
Friday, 20 April 12

Slide 2

/usr/bin/whoami
• Russell Smith
• Consultant for UKD1 Limited
• Specialising in helping companies going through rapid growth
• Help with code, architecture, infrastructure, devops, sysops, capacity planning, etc.
• <3 gearman, mongodb, neo4j, mysql, kohana, riaksearch, php, debian

Slide 3

What are capped collections?
• A special option for collections that lets you:
  • Limit the size of a collection (in bytes)
  • Limit the number of documents in a collection
• Once a limit is reached, the oldest documents are overwritten to make room for new ones
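To make the two limits concrete, here is a minimal in-memory sketch of how a capped collection behaves once it fills up. It is a simulation, not MongoDB code: the `CappedBuffer` class is hypothetical, and document size is approximated by the character length of the JSON encoding (MongoDB uses BSON sizes and preallocates the space up front).

```javascript
// Simulation of capped-collection eviction; not the MongoDB API.
// Assumption: a document's size is the character length of its JSON
// encoding (MongoDB really uses BSON sizes and preallocates the space).
class CappedBuffer {
  constructor({ size, max = Infinity }) {
    this.size = size; // total size budget for the collection
    this.max = max; // optional cap on the number of documents
    this.docs = []; // kept in insertion order, oldest first
    this.used = 0; // size currently taken by stored documents
  }

  static sizeOf(doc) {
    return JSON.stringify(doc).length;
  }

  insert(doc) {
    const docSize = CappedBuffer.sizeOf(doc);
    if (docSize > this.size) {
      throw new Error("document is larger than the whole collection");
    }
    this.docs.push(doc);
    this.used += docSize;
    // Drop the oldest documents until both limits hold again.
    while (this.used > this.size || this.docs.length > this.max) {
      const oldest = this.docs.shift();
      this.used -= CappedBuffer.sizeOf(oldest);
    }
  }

  find() {
    return [...this.docs]; // natural (insertion) order
  }
}

// Count limit: only the 3 newest of 5 documents survive.
const byCount = new CappedBuffer({ size: 1000, max: 3 });
for (let i = 1; i <= 5; i++) byCount.insert({ n: i });
const kept = byCount.find().map((d) => d.n); // [3, 4, 5]

// Size limit: each {"n":i} is 7 characters, so a 20-character budget holds 2.
const bySize = new CappedBuffer({ size: 20 });
for (let i = 1; i <= 3; i++) bySize.insert({ n: i });
const keptBySize = bySize.find().map((d) => d.n); // [2, 3]
```

Whichever limit is hit first triggers eviction, which is why the sketch checks both in the same loop.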

Slide 4

Why is this useful?
• Logging
• News feeds
• Caching

Slide 5

Restrictions
• Can’t delete individual objects (you can drop the whole collection, though)
• Can’t update objects (well... not 100% true: you can, but only if the update doesn’t change the object’s size)
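A minimal sketch of the update rule above, under a simplified model: the hypothetical `cappedUpdate` helper measures size as the character length of the JSON encoding rather than BSON bytes, and there is deliberately no delete operation at all.

```javascript
// Model of the "update only if the size doesn't change" rule.
// Assumption: size = character length of the JSON encoding; MongoDB
// compares the document's stored (BSON) size instead.
function sizeOf(doc) {
  return JSON.stringify(doc).length;
}

// Apply `changes` to `oldDoc`, refusing any update that alters its size,
// as a capped collection would. There is intentionally no delete.
function cappedUpdate(oldDoc, changes) {
  const updated = { ...oldDoc, ...changes };
  if (sizeOf(updated) !== sizeOf(oldDoc)) {
    throw new Error("update would change the document's size");
  }
  return updated;
}

// Same-length value: allowed.
const flipped = cappedUpdate({ status: "new" }, { status: "old" });

// Longer value: rejected.
let rejected = false;
try {
  cappedUpdate({ status: "new" }, { status: "archived" });
} catch (err) {
  rejected = true;
}
```

In practice this means fixed-width values (same-length status strings, numbers that stay the same width) are the safe things to update in place.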

Slide 6

Neat things
• Objects are returned in insertion order
• No need to worry about things growing out of control
• No more cronjobs to delete old data
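Insertion order and "no cronjobs" both follow from capped collections working much like a circular buffer: space is allocated once and each new write lands on the oldest slot. The hypothetical `RingFeed` class below sketches that idea for a newest-first activity feed; it illustrates the behaviour, not how MongoDB lays out data on disk. Against a real capped collection, a newest-first read is usually written in the mongo shell as `db.mycoll.find().sort({$natural: -1}).limit(10)`.

```javascript
// Fixed-capacity ring buffer: storage is allocated once and new writes
// overwrite the oldest slot, so nothing ever needs cleaning up.
class RingFeed {
  constructor(capacity) {
    this.slots = new Array(capacity);
    this.next = 0; // index of the slot the next write goes to
    this.count = 0; // number of slots filled so far
  }

  push(item) {
    this.slots[this.next] = item;
    this.next = (this.next + 1) % this.slots.length;
    this.count = Math.min(this.count + 1, this.slots.length);
  }

  // Oldest to newest, i.e. insertion order.
  inOrder() {
    const cap = this.slots.length;
    const start = (this.next - this.count + cap) % cap;
    const out = [];
    for (let i = 0; i < this.count; i++) out.push(this.slots[(start + i) % cap]);
    return out;
  }

  // Newest first, like the front page of an activity feed.
  latest(n) {
    return this.inOrder().reverse().slice(0, n);
  }
}

const feed = new RingFeed(3);
["a", "b", "c", "d"].forEach((x) => feed.push(x));
const oldestFirst = feed.inOrder(); // ["b", "c", "d"]
const newestTwo = feed.latest(2); // ["d", "c"]
```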

Slide 7

Real world examples
• SongFor.com: activity feeds on the front of the site use capped collections
• StrongHold Kingdoms (MMO): events from syslog and internal events are logged to a capped collection (Graylog2 & a custom dashboard system)
• Lioness: cache + log of data, used to cut lag & reduce API requests to Facebook
• FindLunch.in: caching of page data

Slide 8

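How to set up...
• db.createCollection("mycoll", {capped:true, size:100000}); (size is in bytes)
• db.createCollection("mycoll", {capped:true, size:100000, max:100}); (also limits the collection to 100 documents; size is still required, and whichever limit is hit first triggers removal of the oldest documents. This is slower, as it adds overhead)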

Slide 9

Benchmarking
• Scripts available on GitHub
• Test machine: MacBook 5,1 (Core 2 Duo 2.4 GHz), 4 GB RAM, OCZ Vertex 2 SSD

Slide 10

Results: capped

Slide 11

Results: capped & limited

Slide 12

Real world
• Log data from

Slide 13

What is geospatial indexing?
• Makes it easy to find things close to a point: great for location-based services
• Insert a document as normal, but with some co-ordinates (an [x, y] pair, e.g. [longitude, latitude])...
• ...then index that field: ensureIndex({location_field: '2d'});
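A 2d index exists so MongoDB can answer "what is closest to this point?" without scanning every document. The brute-force sketch below computes the same answer the slow way, using the flat (planar) distance a 2d index works with; the sample documents and the `near` helper are made up for illustration. In the mongo shell, the indexed equivalent would look roughly like `db.mycoll.find({location_field: {$near: [0, 0]}}).limit(2)`.

```javascript
// Brute-force stand-in for a $near query on a '2d' index (flat geometry).
// Documents store co-ordinates as an [x, y] pair in location_field;
// the names and co-ordinates below are hypothetical sample data.
const places = [
  { name: "cafe", location_field: [1, 1] },
  { name: "deli", location_field: [5, 5] },
  { name: "bakery", location_field: [2, 0] },
];

// Euclidean distance between two [x, y] points.
function planarDistance([x1, y1], [x2, y2]) {
  return Math.hypot(x2 - x1, y2 - y1);
}

// Return up to `limit` documents nearest to `point`, closest first.
function near(docs, field, point, limit) {
  return docs
    .map((doc) => ({ doc, dist: planarDistance(doc[field], point) }))
    .sort((a, b) => a.dist - b.dist)
    .slice(0, limit)
    .map(({ doc }) => doc);
}

const nearest = near(places, "location_field", [0, 0], 2).map((d) => d.name);
// nearest: ["cafe", "bakery"]
```

The index's job is to avoid exactly this full scan and sort; the answer it returns is the same.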

Slide 14

Other magic things
• You can search $within a $box, or around a $center point (a circle)... which is handy
• $near results are sorted by distance from the query point ($within results are not sorted)
• The geoNear command returns the distance for each result, plus stats such as the average distance
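The following flat-geometry sketch shows what the two $within shapes match and what a geoNear-style answer contains. It is modelled loosely on the old geoNear command's output (a `results` array of `{dis, obj}` entries and a `stats` object with `avgDistance` and `maxDistance`); the helper functions are hypothetical and the real command reports more fields.

```javascript
// $box: [[x1, y1], [x2, y2]] = bottom-left and top-right corners.
function inBox([x, y], [[x1, y1], [x2, y2]]) {
  return x >= x1 && x <= x2 && y >= y1 && y <= y2;
}

// $center: [[cx, cy], radius] = a circle on a flat plane.
function inCenter([x, y], [[cx, cy], radius]) {
  return Math.hypot(x - cx, y - cy) <= radius;
}

// Hypothetical geoNear-style helper: nearest `num` points with their
// distances, closest first, plus summary stats.
function geoNearSketch(points, [px, py], num) {
  const results = points
    .map((p) => ({ dis: Math.hypot(p[0] - px, p[1] - py), obj: p }))
    .sort((a, b) => a.dis - b.dis)
    .slice(0, num);
  const total = results.reduce((sum, r) => sum + r.dis, 0);
  return {
    results,
    stats: {
      avgDistance: results.length ? total / results.length : 0,
      maxDistance: results.length ? results[results.length - 1].dis : 0,
    },
  };
}

const pts = [[0, 0], [3, 4], [10, 10]];
const boxed = pts.filter((p) => inBox(p, [[-1, -1], [5, 5]])); // [[0, 0], [3, 4]]
const circled = pts.filter((p) => inCenter(p, [[0, 0], 4])); // [[0, 0]]
const nearby = geoNearSketch(pts, [0, 0], 2); // distances 0 and 5, avgDistance 2.5
```

Note the filters return points in their stored order, while the geoNear-style helper sorts by distance, mirroring the difference between $within and $near.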

Slide 15

Versions & support
• > v1.3.3 for the initial version, which is pretty old
• >= 1.7.0 added spherical support (more in a minute)
• < 1.7.2 doesn’t support geo on sharded collections, and there are still open issues; see: http://jira.mongodb.org/browse/SERVER-926
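Spherical support matters because a degree of longitude covers less ground the further you are from the equator, so flat distances on [longitude, latitude] pairs get distorted. The sketch below uses the haversine formula (a standard great-circle formula, not necessarily what MongoDB uses internally) and returns the angle in radians, the unit MongoDB's spherical queries ($nearSphere, $centerSphere, geoNear with spherical: true) work in. `EARTH_RADIUS_KM` is an approximate mean radius chosen for this example.

```javascript
// Great-circle distance via the haversine formula, returned as the
// central angle in radians; multiply by the Earth's radius for a distance.
const EARTH_RADIUS_KM = 6371; // approximate mean radius (assumption)

const toRad = (deg) => (deg * Math.PI) / 180;

// Points are [longitude, latitude] in degrees, longitude first.
function haversineRadians([lon1, lat1], [lon2, lat2]) {
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  // Clamp to 1 to guard against floating-point overshoot before asin.
  return 2 * Math.asin(Math.min(1, Math.sqrt(a)));
}

// A 1 km search radius expressed in radians, e.g. for $centerSphere.
const oneKmInRadians = 1 / EARTH_RADIUS_KM;

// A quarter of the way round the equator is pi/2 radians.
const quarter = haversineRadians([0, 0], [90, 0]);
const quarterKm = quarter * EARTH_RADIUS_KM; // about 10,007 km
```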

Slide 16

and the point is?
• Really really easy to find documents around a specific point
• You can get distances from that point as well
• Lookups are fast and accurate
• This is a pain to do in MySQL

Slide 17

Example
• FindLunch.in lets people find lunch close to where they are
• Geocoded lookup of ~2 million UK postcodes
• All London tube stations & most boroughs are also listed
• Average query time Xms over 20k test documents

Slide 18

Benchmarking
• Same machine as for the capped collection benchmarks
• Code will be on GitHub, minus the postcode data, which isn’t open source

Slide 19

Results

Slide 20

Summary
• Capped collections: super fast, great for logging, news feeds and caching
• Geo: again, really fast and handy for location-based services

Slide 21

Questions?
• Any questions?