Real-time Location Based Social Discovery using MongoDB

The slides from my MongoSV 2012 presentation

Fredrik Björk

December 04, 2012

Transcript

  1. What is Banjo?
    • The most powerful location-based mobile technology that brings you the moments you would otherwise miss
    • Aggregates geotagged posts from Facebook, Twitter, Instagram and Foursquare in real time
  2. (No text on this slide)
  3. Stats
    • Launched June 2011
    • 3 million users
    • Social graph of 400 million profiles
    • 50 billion connections
    • ~200 geo posts created per second
  4. Why MongoDB?
    • Developer friendly
    • Easy to maintain and scale
    • Automatic failover
    • Rapid prototyping of features
    • Good fit for consuming, storing and presenting JSON data
    • Geospatial features out of the box
  5. Infrastructure
    • ~160 EC2 instances (75% MongoDB, 25% Redis)
    • SSD drives for low latency
    • App servers (Sinatra & Rails) hosted on Heroku
    • Mongos with authentication running on dedicated servers
  6. Geotagged posts
    • Consumed as JSON from social network APIs: streaming, polling & real-time callbacks
    • Exposed via REST APIs as JSON to the Banjo iOS and Android apps
  7. • _id is composed of provider (Facebook: 1, Twitter: 2, etc.) and post id for uniqueness
      > db.posts.find({ _id: "2:262989592561606656" })
      {
        _id: "2:262989592561606656",
        username: "fbjork",
        text: "Will give a presentation at #MongoSV on how we use @MongoDB for real-time location based social discovery at @Banjo http://www.10gen.com/events/mongosv",
        ...
      }
      https://twitter.com/fbjork/status/262989592561606656
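The composite _id described on this slide could be built and parsed along these lines. This is a minimal Ruby sketch, not Banjo's actual code: the provider codes for Facebook and Twitter come from the slide, while the names `PROVIDER_CODES`, `post_key` and `parse_post_key` are hypothetical. Codes for other providers are left out because the slide does not state them.

```ruby
# Provider codes as given on the slide; other providers are omitted
# because their codes are not stated there.
PROVIDER_CODES = { facebook: 1, twitter: 2 }.freeze

# Build a composite _id of the form "<provider code>:<post id>".
# (Hypothetical helper for illustration.)
def post_key(provider, post_id)
  code = PROVIDER_CODES.fetch(provider) # raises KeyError for unknown providers
  "#{code}:#{post_id}"
end

# Split a composite _id back into its provider symbol and native post id.
def parse_post_key(key)
  code, post_id = key.split(':', 2)
  [PROVIDER_CODES.key(Integer(code)), post_id]
end

puts post_key(:twitter, 262989592561606656) # prints 2:262989592561606656
```

Prefixing the provider code keeps ids unique even if two networks happen to issue the same numeric post id.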
  8. • Coordinates are stored inside an array with latitude, longitude
      {
        _id: "2:262989592561606656",
        username: "fbjork",
        text: "Will give a presentation at #MongoSV on how we use @MongoDB for real-time location based social discovery at @Banjo http://www.10gen.com/events/mongosv",
        coordinates: [37.784234, -122.438212],
        ...
      }
  9. • Friends are stored inside an array
      {
        _id: "2:262989592561606656",
        username: "fbjork",
        text: "Will give a presentation at #MongoSV on how we use @MongoDB for real-time location based social discovery at @Banjo http://www.10gen.com/events/mongosv",
        coordinates: [37.784234, -122.438212],
        friend_ids: [8816792, 10324882, 2006261, ...]
      }
  10. (No text on this slide)
  11. Find nearby posts in Miami:
      > db.posts.find({ coordinates: { $near: [25.792627, -80.226142] } })
      { _id: "2:809438082", coordinates: [25.792610, -80.226100], username: "Rebecca_Boorsma", text: "I love Miami!", ... }
      { _id: "2:1234567", coordinates: [25.781324, -80.431423], username: "foo", text: "Another day, another dollar", ... }
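Conceptually, a `$near` query on a 2d index returns documents ordered from nearest to farthest from the query point. The Ruby sketch below imitates that ordering in memory, using flat (planar) distance on the two sample posts from the slide. It is an illustration only and says nothing about how MongoDB evaluates the query internally; the method name `nearest_first` is hypothetical.

```ruby
# Sample posts copied from the slide, deliberately listed farthest first.
POSTS = [
  { _id: '2:1234567',   coordinates: [25.781324, -80.431423], username: 'foo' },
  { _id: '2:809438082', coordinates: [25.792610, -80.226100], username: 'Rebecca_Boorsma' }
].freeze

# Order posts by planar distance to a [latitude, longitude] point.
def nearest_first(posts, point)
  posts.sort_by do |post|
    lat, lng = post[:coordinates]
    Math.hypot(lat - point[0], lng - point[1])
  end
end

nearest_first(POSTS, [25.792627, -80.226142]).each { |post| puts post[:_id] }
# prints 2:809438082, then 2:1234567
```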
  12. (No text on this slide)
  13. Find friend posts globally:
      > db.posts.find({ friend_ids: { $in: [2006261] } })
      { _id: "2:10248172", username: "fbjork", friend_ids: [8816792, 10324882, 2006261, ...], ... }
  14. Find friend posts in a location:
      > db.posts.find({ coordinates: { $near: [25.792627, -80.226142] }, friend_ids: { $in: [2006261] } })
      { _id: "2:10248172", username: "fbjork", friend_ids: [8816792, 10324882, 2006261, ...], ... }
  15. Compound geo indexes
    • Create a compound index on coordinates and friend_ids:
      > db.posts.ensureIndex({ coordinates: "2d", friend_ids: 1 })
  16. • Fails for compound indexes with large arrays
    • Geospatial indexes have a size limit of 1000 bytes
      > db.posts.ensureIndex({ coordinates: "2d", friend_ids: 1 })
      Error: Key too large to index
  17. Geospatial query performance
    • Do we need a compound index at all?
    • Geospatial index is usually restrictive enough
    • Problem: Array traversal (using $in) is CPU hungry for large arrays
    • Solution: Pre-sharded array fields
  18. Pre-sharded array fields
    • When dealing with large arrays, e.g. @BarackObama follower ids
    • Partition fields using pre-sharding
    • shard = Hash(key) MOD shard_count
    • Keep array sizes in the low hundreds
  19. { friends_0: [1000, 1002, 1006], friends_1: [1004], friends_2: [1001, 1003, 1005] }
      # shard_example.rb
      require 'zlib'
      SHARDS = 3
      friend_ids = [1000, 1001, 1002, 1003, 1004, 1005, 1006]
      # Print the shard number for each friend id
      friend_ids.each { |f| puts Zlib.crc32(f.to_s) % SHARDS }
      # Output shown on the slide (one value per line): 0 2 0 2 1 2 0
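The script on this slide only prints shard numbers. A small extension, sketched below, groups the ids into the `friends_<n>` fields that would be stored on the post document. It uses the same CRC32-modulo scheme as the slide; the method names `shard_for` and `shard_friend_ids` are hypothetical and not taken from the talk.

```ruby
require 'zlib'

SHARDS = 3

# shard = Hash(key) MOD shard_count, with CRC32 as the hash (as on the slide).
def shard_for(friend_id, shards = SHARDS)
  Zlib.crc32(friend_id.to_s) % shards
end

# Group friend ids into friends_<n> array fields, one field per shard.
def shard_friend_ids(friend_ids, shards = SHARDS)
  friend_ids.group_by { |id| "friends_#{shard_for(id, shards)}" }
end

fields = shard_friend_ids([1000, 1001, 1002, 1003, 1004, 1005, 1006])
fields.sort.each { |name, ids| puts "#{name}: #{ids.inspect}" }
```

Because the shard is derived from the id alone, a reader can later recompute which single field to search without scanning the others.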
  20. Find friend posts using pre-sharding of the friend arrays:
      > db.posts.find({ coordinates: { $near: [25.792627, -80.226142] }, friends_0: { $in: [1000] } })
      { friends_0: [1000, 1002, 1006], friends_1: [1004], friends_2: [1001, 1003, 1005] }
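On the read side, the field to query has to be derived from the friend id with the same hash that was used when writing. The Ruby sketch below builds such a query document as a plain hash, assuming the CRC32-modulo scheme and three shards from the earlier slide. The names `friends_field` and `nearby_friend_posts_query` are hypothetical, and the hash is not sent to a database here.

```ruby
require 'zlib'

SHARDS = 3

# Name of the pre-sharded field that would hold a given friend id.
def friends_field(friend_id, shards = SHARDS)
  "friends_#{Zlib.crc32(friend_id.to_s) % shards}"
end

# Build a query document for posts near a point that list friend_id.
# Only the one shard field that can contain the id is searched.
def nearby_friend_posts_query(point, friend_id)
  {
    'coordinates' => { '$near' => point },
    friends_field(friend_id) => { '$in' => [friend_id] }
  }
end

p nearby_friend_posts_query([25.792627, -80.226142], 1000)
```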
  21. Capped collections
    • Good fit for storing a feed of posts for a period of time
    • Eliminates need to expire old posts
    • Documents can't grow
    • Documents can't be deleted
    • Resizing collections is painful
    • Can't be sharded
  22. TTL collections
    • We switched to TTL collections with MongoDB 2.2
    • Deleting and growing documents is now possible
    • Easier to change expiration times
    • Can be sharded (not by geo)