MongoDB for Analytics

Presented at Mongo Chicago 2011.

John Nunemaker

October 18, 2011

Transcript

  1. MongoDB for Analytics: A loving conversation with @jnunemaker. John Nunemaker, Ordered List. MongoChi 2011, October 18, 2011.
  2. Background: As presented through interpretive dance

  3. None
  4. None
  5. None
  6. ~1 month of evenings and weekends
  7. ~4 dog years since public launch
  8. ~6 tiny servers: 2 web, 2 app, 2 db
  9. ~1-2 million page views per day

  10. None
  11. None
  12. Implementation: Imma show you how we do what we do baby
  13. Doing It Live: No aggregate querying
  14. get('/track.gif') do
        Hit.record(...)
        TrackGif
      end
  15. class Hit
        def record
          site.atomic_update(site_updates)
          Resolution.record(self)
          Technology.record(self)
          Location.record(self)
          Referrer.record(self)
          Content.record(self)
          Search.record(self)
          Notification.record(self)
          View.record(self)
        end
      end
  16. class Resolution
        def record(hit)
          query  = {'_id' => "..."}
          update = {'$inc' => {}}
          update['$inc']["sx.#{hit.screenx}"]  = 1
          update['$inc']["bx.#{hit.browserx}"] = 1
          update['$inc']["by.#{hit.browsery}"] = 1
          collection(hit.created_on)
            .update(query, update, :upsert => true)
        end
      end
      end
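The Resolution slide builds one `$inc` modifier per hit and sends it as an upsert. A minimal sketch of just the document-building step, runnable without a database, might look like the following. The `Hit` struct and the `resolution_update` helper are hypothetical stand-ins and are not from the talk.

```ruby
# Hypothetical stand-in for the deck's Hit object; only the three
# fields used on the Resolution slide are modelled here.
Hit = Struct.new(:screenx, :browserx, :browsery)

# Build the $inc modifier for one hit. Each dotted key names a counter
# nested under "sx", "bx" or "by" in the stored document.
def resolution_update(hit)
  inc = {
    "sx.#{hit.screenx}"  => 1,
    "bx.#{hit.browserx}" => 1,
    "by.#{hit.browsery}" => 1
  }
  { '$inc' => inc }
end

update = resolution_update(Hit.new(1440, 1280, 768))
```

Because the modifier only increments counters, the same document can be sent for every hit with `:upsert => true`, and the first hit of a period creates the document.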
  17. Pros
  18. Pros: Space
  19. Pros: Space, RAM
  20. Pros: Space, RAM, Reads
  21. Pros: Space, RAM, Reads, Live
  22. Cons
  23. Cons: Writes
  24. Cons: Writes, Constraints
  25. Cons: Writes, Constraints, More Forethought
  26. Cons: Writes, Constraints, More Forethought, No raw data

  27. Time Frame: Minute, hour, month, day, year, forever?
  28. # of Variations: One document vs many
  29. Single Document: Per Time Frame

  30. None
  31. {
        "t" => 336381,
        "u" => 158951,
        "2011" => {
          "02" => {
            "18" => { "t" => 9, "u" => 6 }
          }
        }
      }
  32. {
        '$inc' => {
          't' => 1,
          'u' => 1,
          '2011.02.18.t' => 1,
          '2011.02.18.u' => 1,
        }
      }
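The two slides above pair a nested per-day document with the dotted-key `$inc` that maintains it. As a rough illustration, the sketch below builds those keys from a timestamp and imitates in plain Ruby how `$inc` creates missing nested levels. It assumes `t` means total views and `u` means unique views, which the slides do not state; `view_increments` and `apply_inc` are hypothetical helpers, not part of any driver.

```ruby
# Build the $inc modifier for one view. Assumption: 't' counts every
# view and 'u' counts only unique ones.
def view_increments(time, unique:)
  day = time.strftime('%Y.%m.%d')
  inc = { 't' => 1, "#{day}.t" => 1 }
  if unique
    inc['u'] = 1
    inc["#{day}.u"] = 1
  end
  { '$inc' => inc }
end

# In-memory imitation of $inc on dotted paths: missing intermediate
# documents are created and missing counters start from zero.
def apply_inc(doc, update)
  update.fetch('$inc').each do |path, amount|
    *parents, leaf = path.split('.')
    target = parents.inject(doc) { |node, key| node[key] ||= {} }
    target[leaf] = target.fetch(leaf, 0) + amount
  end
  doc
end

doc = {}
apply_inc(doc, view_increments(Time.utc(2011, 2, 18), unique: true))
apply_inc(doc, view_increments(Time.utc(2011, 2, 18), unique: false))
```

After the two calls, `doc` holds running totals at the top level and the same counters nested under year, month and day, which is the shape shown on slide 31.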
  33. Single Document: For all ranges in time frame

  34. None
  35. {
        "_id" => "...:10",
        "bx" => {
          "320" => 85, "480" => 318, "800" => 1938, "1024" => 5033,
          "1280" => 6288, "1440" => 2323, "1600" => 3817, "2000" => 137
        },
        "by" => {
          "480" => 2205, "600" => 7359,
  36.     "600" => 7359, "768" => 4515, "900" => 3833, "1024" => 2026
        },
        "sx" => {
          "320" => 191, "480" => 179, "800" => 195, "1024" => 1059,
          "1280" => 5861, "1440" => 3533, "1600" => 7675, "2000" => 1279
        }
      }
  37. {
        '$inc' => {
          'sx.1440' => 1,
          'bx.1280' => 1,
          'by.768' => 1,
        }
      }
  38. Many Documents: Search terms, content, referrers...

  39. None
  40. [
        {
          "_id" => "<oid>:<hash>",
          "t"   => "ruby class variables",
          "sid" => BSON::ObjectId('<oid>'),
          "v"   => 352
        },
        {
          "_id" => "<oid>:<hash>",
          "t"   => "ruby unless",
          "sid" => BSON::ObjectId('<oid>'),
          "v"   => 347
        },
      ]
  41. Writes: {'_id' => "#{site_id}:#{hash}"}
  42. Reads: [['sid', 1], ['v', -1]]
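For the many-documents layout, writes address a document by a composite `_id` and reads go by site, ordered by view count. The sketch below imitates both sides with an in-memory hash. MD5 is an assumption, since the slides do not say which hash is used, and `term_id`, `record_term` and `top_terms` are hypothetical names.

```ruby
require 'digest/md5'

# _id is "<site id>:<hash of the term>". MD5 is an assumption here;
# the slides only show "#{site_id}:#{hash}".
def term_id(site_id, term)
  "#{site_id}:#{Digest::MD5.hexdigest(term)}"
end

# In-memory stand-in for an upsert with $inc on 'v': create the
# document on first sight of a term, then bump its counter.
def record_term(store, site_id, term)
  id  = term_id(site_id, term)
  doc = (store[id] ||= { '_id' => id, 'sid' => site_id, 't' => term, 'v' => 0 })
  doc['v'] += 1
  doc
end

# Mirrors a read for one site ordered by 'v' descending, the access
# pattern that [['sid', 1], ['v', -1]] appears to describe.
def top_terms(store, site_id, limit = 10)
  store.values
       .select { |doc| doc['sid'] == site_id }
       .sort_by { |doc| -doc['v'] }
       .first(limit)
end

store = {}
3.times { record_term(store, 'site-1', 'ruby unless') }
5.times { record_term(store, 'site-1', 'ruby class variables') }
record_term(store, 'site-2', 'ruby unless')
top = top_terms(store, 'site-1')
```

Deriving the `_id` from the site and the term means a write never has to look the document up first: the key can be computed from the hit alone.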

  43. Growth: The best laid plans of mice and men
  44. Partition Hot Data: Currently using collections for time frames
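One way to read "collections for time frames", together with the earlier `collection(hit.created_on)` call, is that the collection name is derived from the hit's date. The sketch below assumes one collection per month and a `base.YYYY.MM` naming scheme. Both are guesses for illustration, and `collection_name` and `cold_collections` are hypothetical helpers.

```ruby
require 'date'

# Hypothetical naming scheme: one collection per stat type per month,
# e.g. "resolutions.2011.10".
def collection_name(base, created_on)
  "#{base}.#{created_on.strftime('%Y.%m')}"
end

# Collections older than the cutoff month no longer receive writes, so
# they are candidates for archiving. Zero-padded names sort by date.
def cold_collections(names, base, cutoff)
  limit = collection_name(base, cutoff)
  names.select { |name| name.start_with?("#{base}.") && name < limit }
end

names = %w[resolutions.2011.08 resolutions.2011.09 resolutions.2011.10 terms.2011.08]
hot   = collection_name('resolutions', Date.new(2011, 10, 18))
cold  = cold_collections(names, 'resolutions', Date.new(2011, 10, 1))
```

With this layout only the current period's collection takes writes, so the hot working set stays small while older collections can sit on disk.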

  45. Bigger, Faster Server: More CPU, RAM, Disk Space

  46. Users, Sites, Content, Referrers, Terms, Engines, Resolutions, Locations; Users, Sites, Content, Referrers, Terms, Engines, Resolutions, Locations
  47. Partition by Function: Spread writes across a few servers
  48. Users, Sites, Content, Referrers, Terms, Engines, Resolutions, Locations
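Partitioning by function amounts to a lookup from collection to server. The sketch below shows one possible shape for that lookup. The host names and the way the eight collections are grouped are invented for illustration and do not come from the talk.

```ruby
# Hypothetical host names; the grouping of collections is also an
# assumption made for this sketch.
SERVERS_BY_FUNCTION = {
  'db-accounts' => %w[users sites],
  'db-content'  => %w[content referrers terms engines],
  'db-visitors' => %w[resolutions locations]
}.freeze

# Invert the grouping into a collection => server lookup table.
SERVER_FOR = SERVERS_BY_FUNCTION.each_with_object({}) do |(server, names), map|
  names.each { |name| map[name] = server }
end.freeze

# Fail loudly for collections that have not been assigned a server.
def server_for(collection_name)
  SERVER_FOR.fetch(collection_name) do
    raise ArgumentError, "no server configured for #{collection_name}"
  end
end

terms_server = server_for('terms')
```

Each recorder class would then pick its connection through `server_for`, so moving a collection to another machine is a one-line change to the table.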

  49. Partition by Server: Spread writes across a ton of servers, way down the road, not worried yet
  50. Thank you! John Nunemaker, Ordered List. john@orderedlist.com, @jnunemaker. MongoChi 2011, October 18, 2011.