Designing Algorithms that Scale Horizontally with MongoDB - Luke Ehresman, CopperEgg

mongodb
November 28, 2011

MongoDallas2011

MongoDB is great for storing immense amounts of data, but writing algorithms to process that data in real time can be difficult. This talk will discuss designing algorithms that can be scaled out horizontally, use MongoDB as a data store, run in parallel, and do not lose atomicity. We will walk through a sample algorithm, see it implemented, and view a live demo.

Transcript

  1. Resilient and redundant (replication). Scales horizontally (sharding). [Diagram: five shards, each drawn as M S S.] Monday, November 28, 2011
  2. Resilient and redundant (replication). Scales horizontally (sharding). Parallel processing. [Diagram: the application reaches the five shards via mongos.]
  4. [Diagram: the five shards.]
      docs = collection.find(...);
      foreach (doc in docs) {
        do_something(doc);
        collection.update(doc);
      }
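
The loop on slide 4 is fine for a single process, but it is worth seeing what can happen once two copies run side by side. The following is a minimal sketch, not MongoDB code: `collection` is a plain Python list standing in for a collection, and `find_unprocessed`, `do_something`, and the `processed_by` field are hypothetical names introduced only for this illustration. It forces the interleaving in which both workers run their find before either writes anything back.

```python
# In-memory stand-in for a collection; this is not a MongoDB API.
collection = [{"_id": i, "processed_by": []} for i in range(5)]


def find_unprocessed(coll):
    """Emulates collection.find(...): return documents nobody has processed."""
    return [doc for doc in coll if not doc["processed_by"]]


def do_something(doc, worker):
    """Hypothetical unit of work: record which worker handled the document."""
    doc["processed_by"].append(worker)


# Both workers query before either one has updated anything, so both
# receive the full list of unprocessed documents.
docs_a = find_unprocessed(collection)
docs_b = find_unprocessed(collection)

for doc in docs_a:
    do_something(doc, "worker-a")
for doc in docs_b:
    do_something(doc, "worker-b")

# Every document was handled twice.
duplicated = [doc["_id"] for doc in collection if len(doc["processed_by"]) > 1]
print(duplicated)
```

Nothing in the find-then-update loop tells a second worker that a document is already taken, which is the gap the reservation step later in the talk is meant to close.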
  5. [Diagram: the five shards with three workers.]
  6. [Diagram: the five shards with twelve workers.]
  12. Use a simple algorithm when: not much data; data is easily partitioned (e.g. users); one-time or infrequent execution. Use a more complex algorithm when: lots and lots and lots of data; data is not easily partitioned (e.g. a queue); daemon, real-time processing.
  17. Challenges: concurrency, atomicity, race conditions, contention. [Diagram: workers running copies of the algorithm against the five shards.]
  20. Atomic Actions in MongoDB. Per-document atomic actions; no cross-document transactions. Multiple operations on one document atomically: increment a value, add to a set, etc. 1. update 2. find_and_modify. Lots of update operations available: http://www.mongodb.org/display/DOCS/Updating
  23. Atomic Actions in MongoDB
      Before:
        { _id: 'l', count: 3, total_ages: 76, users: ['ldoe', 'lsmith', 'ljones'] }
      Update (the final true enables upsert):
        db.test.update(
          {_id: 'l'},
          { $inc: {count: 1, total_ages: 31}, $addToSet: {users: 'lehresman'} },
          true
        );
      After:
        { _id: 'l', count: 4, total_ages: 107, users: ['ldoe', 'lsmith', 'ljones', 'lehresman'] }
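
To make the before/after arithmetic on this slide easy to check, here is a small sketch that emulates what the two modifiers do to one document. It is plain Python, not a driver call: `apply_update` is a hypothetical helper written for this illustration, and it handles only `$inc` and `$addToSet`. In MongoDB the server applies both modifiers to the single document in one atomic step; the sketch reproduces the result, not the atomicity.

```python
import copy


def apply_update(doc, update):
    """Return a copy of doc with $inc and $addToSet applied (emulation only)."""
    result = copy.deepcopy(doc)
    # $inc adds to a numeric field, treating a missing field as 0.
    for field, amount in update.get("$inc", {}).items():
        result[field] = result.get(field, 0) + amount
    # $addToSet appends to an array only if the value is not already present.
    for field, value in update.get("$addToSet", {}).items():
        values = result.setdefault(field, [])
        if value not in values:
            values.append(value)
    return result


before = {"_id": "l", "count": 3, "total_ages": 76,
          "users": ["ldoe", "lsmith", "ljones"]}
update = {"$inc": {"count": 1, "total_ages": 31},
          "$addToSet": {"users": "lehresman"}}

after = apply_update(before, update)
print(after)

# Adding the same user a second time leaves the array unchanged.
again = apply_update(after, {"$addToSet": {"users": "lehresman"}})
print(again["users"] == after["users"])
```

The second call shows why `$addToSet` suits retried work: repeating it does not create a duplicate entry.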
  24. Atomic Actions in MongoDB. Extending the previous example, we could not update the 'l' document and the 'm' document in one atomic action.
  26. Requirements: small, self-contained units of work; self-reliant, no communication between nodes; ephemeral/fault tolerant; horizontally scalable.
  35. Algorithm Overview. [Diagram sequence, slides 27-35: a collection holding items A through F; Worker 1 and Worker 2 take items A and B; an X marks a worker and a watchdog appears; further workers join.]
  40. 4 Main Components: (1) find and reserve item; (2) work on item and save processed data; (3) release reservation; (4) remove stale reservations (watchdog thread). Questions so far? Enough talk already... Show me the code!!
  41. Demo in Pseudocode
      worker
        while true {
          doc = reserve_document()
          if !doc {
            sleep 1
          } else {
            do_work(doc)
            save_work(doc)
            release_reservation(doc)
          }
        }
  43. Demo in Pseudocode
      reserve_document
        doc = db.find_and_modify(
          query: {reserved_at: 0, finished: false},
          update: {$set: {reserved_at: now}}
        )
        return doc
      release_reservation(doc)
        db.update(
          query: {_id: doc._id},
          update: {$set: {reserved_at: 0}}
        )
  44. Demo in Pseudocode
      watchdog
        while true {
          release_stale_reservations()
          sleep 5
        }
      release_stale_reservations
        db.update(
          query: {reserved_at: {$lt: now-30, $gt: 0}},
          update: {$set: {reserved_at: 0}},
          multi: true
        )
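
The worker, reservation, and watchdog pseudocode can be exercised without a database. The sketch below is an in-memory approximation, not the demo code from the talk: `InMemoryQueue` is a hypothetical class, and its lock stands in for the per-document atomicity that find_and_modify provides in MongoDB. It also assumes that `save_work` is what marks a document finished, which the slides do not spell out, and the worker stops when nothing is left instead of sleeping and retrying.

```python
import threading
import time


class InMemoryQueue:
    """Stand-in for a collection; the lock emulates per-document atomicity."""

    def __init__(self, ids):
        self._lock = threading.Lock()
        self.docs = [{"_id": i, "reserved_at": 0, "finished": False} for i in ids]

    def reserve_document(self, now):
        # Emulates find_and_modify: matching and updating happen as one step.
        with self._lock:
            for doc in self.docs:
                if doc["reserved_at"] == 0 and not doc["finished"]:
                    doc["reserved_at"] = now
                    return doc
        return None

    def save_work(self, doc):
        # Assumption: saving the result is what marks the document finished.
        with self._lock:
            doc["finished"] = True

    def release_reservation(self, doc):
        with self._lock:
            doc["reserved_at"] = 0

    def release_stale_reservations(self, now, max_age=30):
        # Emulates the watchdog's multi-document update.
        with self._lock:
            for doc in self.docs:
                if 0 < doc["reserved_at"] < now - max_age:
                    doc["reserved_at"] = 0


def worker(queue, handled):
    while True:
        doc = queue.reserve_document(time.time())
        if doc is None:
            return  # the slide's worker sleeps and retries; this sketch stops
        handled.append(doc["_id"])  # stands in for do_work(doc)
        queue.save_work(doc)
        queue.release_reservation(doc)


# Four workers drain 100 documents; each document is handled exactly once.
queue = InMemoryQueue(range(100))
handled = []
threads = [threading.Thread(target=worker, args=(queue, handled)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(handled) == list(range(100)))

# A worker reserves a document and then dies without releasing it.
stale = InMemoryQueue(["x"])
stale.reserve_document(now=1000)
stale.release_stale_reservations(now=1020)  # 20 s old: left alone
still_reserved = stale.docs[0]["reserved_at"]
stale.release_stale_reservations(now=1031)  # 31 s old: released
released = stale.docs[0]["reserved_at"]
print(still_reserved, released)
```

Because the reservation is the only coordination point, workers never talk to each other, and a crashed worker costs at most the watchdog's timeout before its document becomes available again.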
  45. Variation. To cycle through a list repeatedly, set a sleep_until timestamp when you remove a reservation. When creating a reservation, ignore any documents with a sleep_until in the future.
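
A rough single-threaded sketch of this variation follows, again using plain Python dictionaries in place of a collection. The `reserve` and `release` helpers and the 60-second `interval` are hypothetical choices made for the example. In MongoDB the sleep_until condition would presumably become part of the find_and_modify query so that the check and the reservation stay one atomic step; that is an assumption, not something the slide shows.

```python
def reserve(docs, now):
    """Reserve the first unreserved document whose sleep_until has passed."""
    for doc in docs:
        if doc["reserved_at"] == 0 and doc["sleep_until"] <= now:
            doc["reserved_at"] = now
            return doc
    return None


def release(doc, now, interval):
    """Release the reservation and put the document to sleep for a while."""
    doc["reserved_at"] = 0
    doc["sleep_until"] = now + interval


docs = [{"_id": "a", "reserved_at": 0, "sleep_until": 0},
        {"_id": "b", "reserved_at": 0, "sleep_until": 0}]

# Simulated clock ticks, in seconds.
order = []
for now in (100, 101, 102, 160, 161):
    doc = reserve(docs, now)
    order.append(doc["_id"] if doc else None)
    if doc:
        release(doc, now, interval=60)

print(order)  # ['a', 'b', None, 'a', 'b']
```

At tick 102 both documents are still sleeping, so nothing is reserved; each one comes back into rotation once its own sleep_until has passed.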
  46. Thanks to: Eric Anderson for help with this presentation and demo. Images: roflrazzi.com for the Christian Bale image; bill barber on flickr for the bucket image; http://blog.forgingfire.com for the smoke image; theplanetdotcom on flickr for the data center image; http://fortunescookie50.blogspot.com for the grimacing lady image; Cartoon Network via kidzworld.com for the super dog image. Demo code can be found at: http://github.com/copperegg/mongo_scaling_demo