MongoDB for Analytics

Presented at MongoChicago on November 13, 2012.

John Nunemaker

November 13, 2012

Transcript

  1. MongoDB for Analytics: a loving conversation with @jnunemaker. John Nunemaker, GitHub. MongoChicago 2012, November 12, 2012.
  2. Background: how hernias can be good for you

  5. 1 month of evenings and weekends

  6. 18 months since public launch

  7. 10-15 million page views per day

  8. 2.7 billion page views to date

  9. 13 tiny servers: 2 web, 6 app, 3 db, 2 queue
  10. requests/sec

  11. ops/sec

  12. cpu %

  13. lock %

  14. Implementation: how we do what we do

  15. Doing it (mostly) live: no aggregate querying

  18. get('/track.gif') do
        track_service.record(...)
        TrackGif
      end

  19. class TrackService
        def record(attrs)
          message = MessagePack.pack(attrs)
          @client.set(@queue, message)
        end
      end
  20. class TrackProcessor
        def run
          loop { process }
        end
        def process
          message = @client.get(@queue)
          record(message) if message # get returns nil when the queue is empty
        end
        def record(message)
          attrs = MessagePack.unpack(message)
          Hit.record(attrs)
        end
      end
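The request path only enqueues; a separate worker does the database writes. Below is a minimal, runnable sketch of that producer/consumer split, assuming stdlib stand-ins rather than the real pieces: `FakeKestrel` is a hypothetical in-memory queue with Kestrel's memcache-style `set`/`get`, and `JSON` replaces MessagePack. The `drain` method is also an assumption, used so the example terminates instead of looping forever.

```ruby
require 'json'

# Hypothetical stand-in for a Kestrel client: memcache-style `set` enqueues
# and `get` dequeues, returning nil when the queue is empty.
class FakeKestrel
  def initialize
    @queues = Hash.new { |hash, key| hash[key] = [] }
  end

  def set(queue, message)
    @queues[queue] << message
  end

  def get(queue)
    @queues[queue].shift
  end
end

# Web side: serialize the hit and enqueue it; no database work per request.
class TrackService
  def initialize(client, queue)
    @client = client
    @queue = queue
  end

  def record(attrs)
    @client.set(@queue, JSON.generate(attrs)) # JSON stands in for MessagePack
  end
end

# Worker side: pull messages off the queue and hand each hit to a recorder.
class TrackProcessor
  def initialize(client, queue, recorder)
    @client = client
    @queue = queue
    @recorder = recorder
  end

  # Processes until the queue is empty and returns how many hits it handled.
  def drain
    count = 0
    while (message = @client.get(@queue))
      @recorder.call(JSON.parse(message))
      count += 1
    end
    count
  end
end

client = FakeKestrel.new
recorded = []
service = TrackService.new(client, 'hits')
service.record('sid' => 'abc', 'path' => '/')
service.record('sid' => 'abc', 'path' => '/about')
processed = TrackProcessor.new(client, 'hits', ->(attrs) { recorded << attrs }).drain
```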
  21. http://bit.ly/rt-kestrel

  22. class Hit
        def record
          site.atomic_update(site_updates)
          Resolution.record(self)
          Technology.record(self)
          Location.record(self)
          Referrer.record(self)
          Content.record(self)
          Search.record(self)
          Notification.record(self)
          View.record(self)
        end
      end
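The worker calls `Hit.record(attrs)` on the class, while the `Hit` slide defines an instance method. A minimal sketch of one way to bridge the two, assuming a class method that builds a hit and fans it out to a list of report modules. `Hit.recorders`, `ScreenReport`, and `ReferrerReport` are hypothetical, the toy reports count in memory instead of issuing `$inc` updates, and the `site.atomic_update` step is left out.

```ruby
class Hit
  attr_reader :sid, :screenx, :referrer

  # Hypothetical registry of reports that every hit is fanned out to.
  def self.recorders
    @recorders ||= []
  end

  # Entry point used by the queue worker: Hit.record(attrs).
  def self.record(attrs)
    new(attrs).record
  end

  def initialize(attrs)
    @sid = attrs['sid']
    @screenx = attrs['screenx']
    @referrer = attrs['referrer']
  end

  # Each report applies its own small update for this one hit.
  def record
    self.class.recorders.each { |recorder| recorder.record(self) }
    self
  end
end

# Toy reports that count in memory.
module ScreenReport
  COUNTS = Hash.new(0)

  def self.record(hit)
    COUNTS[hit.screenx] += 1
  end
end

module ReferrerReport
  COUNTS = Hash.new(0)

  def self.record(hit)
    COUNTS[hit.referrer || 'direct'] += 1
  end
end

Hit.recorders.push(ScreenReport, ReferrerReport)
Hit.record('sid' => 'abc', 'screenx' => 1440, 'referrer' => 'google.com')
Hit.record('sid' => 'abc', 'screenx' => 1440)
```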
  23. class Resolution
        def self.record(hit) # called as Resolution.record(self) from Hit
          query = {'_id' => "..."}
          update = {'$inc' => {}}
          update['$inc']["sx.#{hit.screenx}"] = 1
          update['$inc']["bx.#{hit.browserx}"] = 1
          update['$inc']["by.#{hit.browsery}"] = 1
          collection(hit.created_on)
            .update(query, update, :upsert => true)
        end
      end
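MongoDB's `$inc` treats dotted keys as paths into embedded documents, and with `:upsert => true` it creates the document and any missing counters on the first hit. The sketch below imitates only that behavior on plain Ruby hashes, so the update built by `Resolution.record` can be checked without a server. `apply_inc`, `resolution_update`, and the `'site:10'` id are hypothetical; this is not the driver API.

```ruby
# Apply a {'$inc' => {...}} update to an in-memory "collection" (a Hash of
# _id => document), creating the document when it is missing (upsert).
def apply_inc(store, id, update)
  doc = store[id] ||= { '_id' => id }
  update.fetch('$inc').each do |path, amount|
    *parents, leaf = path.split('.')
    # Walk, creating as needed, one embedded document per dotted segment.
    target = parents.reduce(doc) { |node, key| node[key] ||= {} }
    target[leaf] = (target[leaf] || 0) + amount
  end
  doc
end

# Same update shape that Resolution.record builds for one hit.
def resolution_update(screenx, browserx, browsery)
  { '$inc' => {
    "sx.#{screenx}" => 1,
    "bx.#{browserx}" => 1,
    "by.#{browsery}" => 1
  } }
end

store = {}
apply_inc(store, 'site:10', resolution_update(1440, 1280, 768))
apply_inc(store, 'site:10', resolution_update(1440, 1024, 768))
```

Two hits touch one document, and counters for values never seen before simply appear.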
  28. Pros: space, RAM, reads, live

  33. Cons: writes, constraints, more forethought, no raw data

  34. http://bit.ly/rt-counters http://bit.ly/rt-counters2

  35. Time frame: minute, hour, day, month, year, forever?

  36. Number of variations: one document vs. many

  37. Single document per time frame

  39. {
        "t" => 336381,
        "u" => 158951,
        "2011" => {
          "02" => {
            "18" => { "t" => 9, "u" => 6 }
          }
        }
      }
  40. {
        '$inc' => {
          't' => 1,
          'u' => 1,
          '2011.02.18.t' => 1,
          '2011.02.18.u' => 1
        }
      }
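In the single-document layout, one `$inc` bumps both the running totals and the nested per-day counters, and a daily chart is read back by walking the nested keys. A sketch assuming `t` means views and `u` means unique visitors (the slides do not spell this out), so `u` is incremented only for unique hits; `day_inc` and `day_counts` are hypothetical helpers.

```ruby
require 'date'

# Build the $inc for one hit. Assumption: 't' counts views and 'u' counts
# unique visitors, so the 'u' counters move only for unique hits.
def day_inc(date, unique:)
  day = date.strftime('%Y.%m.%d') # e.g. "2011.02.18", as on the slide
  inc = { 't' => 1, "#{day}.t" => 1 }
  if unique
    inc['u'] = 1
    inc["#{day}.u"] = 1
  end
  { '$inc' => inc }
end

# Read one day's counters out of the nested document (zeros if absent).
def day_counts(doc, date)
  year, month, day = date.strftime('%Y %m %d').split
  doc.dig(year, month, day) || { 't' => 0, 'u' => 0 }
end

doc = {
  't' => 336381, 'u' => 158951,
  '2011' => { '02' => { '18' => { 't' => 9, 'u' => 6 } } }
}
update = day_inc(Date.new(2011, 2, 18), unique: true)
```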
  41. Single document for all ranges in a time frame

  43-44. {
        "_id" => "...:10",
        "bx" => {
          "320" => 85, "480" => 318, "800" => 1938, "1024" => 5033,
          "1280" => 6288, "1440" => 2323, "1600" => 3817, "2000" => 137
        },
        "by" => {
          "480" => 2205, "600" => 7359, "768" => 4515, "900" => 3833,
          "1024" => 2026
        },
        "sx" => {
          "320" => 191, "480" => 179, "800" => 195, "1024" => 1059,
          "1280" => 5861, "1440" => 3533, "1600" => 7675, "2000" => 1279
        }
      }
  45. {
        '$inc' => {
          'sx.1440' => 1,
          'bx.1280' => 1,
          'by.768' => 1
        }
      }
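The slides show bucketed keys such as `sx.1440` and `by.768` but not how raw pixel sizes become those keys. A hypothetical sketch, assuming each size is counted under the largest boundary that does not exceed it (anything smaller than the first boundary goes into the first), with boundaries copied from the sample document. `WIDTHS`, `HEIGHTS`, `bucket`, and `resolution_inc` are illustrative names only.

```ruby
# Boundaries taken from the keys in the sample resolutions document.
WIDTHS  = [320, 480, 800, 1024, 1280, 1440, 1600, 2000].freeze
HEIGHTS = [480, 600, 768, 900, 1024].freeze

# Assumption: round down to the nearest boundary; clamp small values up.
def bucket(value, boundaries)
  (boundaries.select { |b| b <= value }.max || boundaries.first).to_s
end

# Build the $inc for one hit, one counter per dimension.
def resolution_inc(screenx, browserx, browsery)
  { '$inc' => {
    "sx.#{bucket(screenx, WIDTHS)}" => 1,
    "bx.#{bucket(browserx, WIDTHS)}" => 1,
    "by.#{bucket(browsery, HEIGHTS)}" => 1
  } }
end
```

Because the key space is fixed and small, every possible counter fits in one document per time frame.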
  46. Many documents: search terms, content, referrers...

  48. [
        {
          "_id" => "<oid>:<hash>",
          "t" => "ruby class variables",
          "sid" => BSON::ObjectId('<oid>'),
          "v" => 352
        },
        {
          "_id" => "<oid>:<hash>",
          "t" => "ruby unless",
          "sid" => BSON::ObjectId('<oid>'),
          "v" => 347
        }
      ]
  49. Writes: keyed by {'_id' => "#{sid}:#{hash}"}

  50. Reads: index on [['sid', 1], ['v', -1]]
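For open-ended values like search terms, each term gets its own small document. A sketch of both paths on an in-memory hash, assuming an MD5 hex digest as the `<hash>` part of the `_id` (the slides do not say which hash is used). Against MongoDB the write would be an upsert with `'$inc' => {'v' => 1}` on that `_id`, and the read would be a query on `sid` sorted by `v` descending, which the `[['sid', 1], ['v', -1]]` index serves. `term_id`, `record_term`, and `top_terms` are hypothetical.

```ruby
require 'digest/md5'

# One document per (site, term); the _id shape follows the slide, and the
# MD5 digest of the term is an assumed choice of hash.
def term_id(sid, term)
  "#{sid}:#{Digest::MD5.hexdigest(term)}"
end

# Write path: upsert by _id, then increment the view counter.
def record_term(store, sid, term)
  doc = store[term_id(sid, term)] ||= { 't' => term, 'sid' => sid, 'v' => 0 }
  doc['v'] += 1
  doc
end

# Read path: one site's terms, most-viewed first (what the sid/v index serves).
def top_terms(store, sid, limit)
  store.values
       .select { |doc| doc['sid'] == sid }
       .sort_by { |doc| -doc['v'] }
       .first(limit)
       .map { |doc| doc['t'] }
end

store = {}
3.times { record_term(store, 'site1', 'ruby class variables') }
2.times { record_term(store, 'site1', 'ruby unless') }
record_term(store, 'site2', 'ruby unless')
```

Building the `_id` from the site and term makes the write idempotent in shape: the same term always lands on the same document, so no lookup is needed before the increment.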

  51. Growth: don’t say shard, don’t say shard...

  52. Partition hot data: currently using collections for time frames

  53. [
        "content.2011.7", "content.2011.8", "content.2011.9",
        "content.2011.10", "content.2011.11", "content.2011.12",
        "content.2012.1", "content.2012.2", "content.2012.3",
        "content.2012.4"
      ]
  54. ["resolutions.2011", "resolutions.2012"]
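Partitioning by time frame comes down to picking a collection name from a date, as `collection(hit.created_on)` does in `Resolution.record`. A sketch with hypothetical helpers that reproduce the names shown above: monthly collections for content, yearly ones for resolutions, plus a helper listing the monthly collections a date range touches.

```ruby
require 'date'

# Monthly content collections, no zero padding (as in "content.2011.7").
def content_collection(date)
  "content.#{date.year}.#{date.month}"
end

# Yearly resolutions collections.
def resolutions_collection(date)
  "resolutions.#{date.year}"
end

# Every monthly content collection a date range touches, oldest first.
def content_collections_between(from, to)
  names = []
  month = Date.new(from.year, from.month, 1)
  while month <= to
    names << content_collection(month)
    month >>= 1 # Date#>> advances by whole months
  end
  names
end
```

A report spanning a few months queries only those collections, and old months stay out of the hot working set.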

  61. Move: BigintMove, MakeYouWannaMove, DaMove, SmoothMove, NightMove, DanceMove

  62. Bigger, faster server: more CPU, RAM, disk space

  63. [Diagram: Users, Sites, Content, Referrers, Terms, Engines, Resolutions, Locations, shown twice]

  64. Partition by function: spread writes across a few servers

  65. [Diagram: Users, Sites, Content, Referrers, Terms, Engines, Resolutions, Locations]
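Partitioning by function means each group of collections lives on its own server, so the writes for different reports land on different machines. A hypothetical routing sketch: the host names and the grouping of collections are invented for illustration; only the collection names come from the slide.

```ruby
# Hypothetical routing table: which server holds which collections.
ROUTES = {
  'db1.example' => %w[users sites],
  'db2.example' => %w[content referrers terms engines],
  'db3.example' => %w[resolutions locations]
}.freeze

# Map a collection name (e.g. "content.2012.4") to its server by prefix.
def host_for(collection)
  base = collection.split('.').first
  pair = ROUTES.find { |_host, names| names.include?(base) }
  raise ArgumentError, "no server for #{collection}" unless pair
  pair.first
end
```

Time-partitioned names such as `content.2012.4` route by their prefix, so the two partitioning schemes compose.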

  66. Partition by server: spread writes across a ton of servers. Way down the road; not worried yet.
  67. Thank you! John Nunemaker, GitHub. john@github.com, @jnunemaker. MongoChicago 2012, November 12, 2012.