
MongoDB for Analytics

Presented at MongoSF on May 4th, 2012.

John Nunemaker

May 04, 2012

Transcript

  1. MongoDB for Analytics: A loving conversation with @jnunemaker. John Nunemaker, GitHub. MongoSF 2012, May 4, 2012
  2. None
  3. Background How hernias can be good for you

  4. None
  5. None
  6. 1 month Of evenings and weekends

  7. 1 year Since public launch

  8. 13 tiny servers 2 web, 6 app, 3 db, 2 queue
  9. 7-8 Million Page views per day

  10. None
  11. None
  12. None
  13. None
  14. Implementation Imma show you how we do what we do baby
  15. Doing It (mostly) Live No aggregate querying

  16. None
  17. None
  18. get('/track.gif') do
        track_service.record(...)
        TrackGif
      end

  19. class TrackService
        def record(attrs)
          message = MessagePack.pack(attrs)
          @client.set(@queue, message)
        end
      end
  20. class TrackProcessor
        def run
          loop { process }
        end
        def process
          record @client.get(@queue)
        end
        def record(message)
          attrs = MessagePack.unpack(message)
          Hit.record(attrs)
        end
      end
  21. http://bit.ly/rt-kestrel
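Slides 18 to 20 describe a pipeline in which the web request only serializes the hit and enqueues it, while a separate worker dequeues and records it. The following is a minimal sketch of that flow, not the production code. `MemoryQueueClient` is a hypothetical in-memory stand-in for the queue client (it only mimics the `set` and `get` calls shown on the slides), JSON from the standard library stands in for MessagePack, and a lambda stands in for `Hit.record`. The worker drains the queue rather than looping forever so the example terminates.

```ruby
require 'json'

# Hypothetical in-memory stand-in for the queue client on the slides.
class MemoryQueueClient
  def initialize
    @queues = Hash.new { |hash, name| hash[name] = [] }
  end

  # Append a message to the named queue.
  def set(queue, message)
    @queues[queue] << message
    true
  end

  # Remove and return the oldest message, or nil when the queue is empty.
  def get(queue)
    @queues[queue].shift
  end
end

class TrackService
  def initialize(client, queue)
    @client = client
    @queue = queue
  end

  # Serialize the hit and enqueue it; no database work in the request.
  def record(attrs)
    @client.set(@queue, JSON.generate(attrs)) # slides use MessagePack.pack
  end
end

class TrackProcessor
  def initialize(client, queue, sink)
    @client = client
    @queue = queue
    @sink = sink # stand-in for Hit.record
  end

  # Process one message; returns false when the queue was empty.
  def process
    message = @client.get(@queue)
    return false if message.nil?

    @sink.call(JSON.parse(message)) # slides use MessagePack.unpack
    true
  end

  # Drain the queue instead of looping forever, so the sketch terminates.
  def drain
    nil while process
  end
end

client = MemoryQueueClient.new
recorded = []
service = TrackService.new(client, 'hits')
processor = TrackProcessor.new(client, 'hits', ->(attrs) { recorded << attrs })

service.record('site_id' => 'abc', 'screenx' => 1440)
service.record('site_id' => 'abc', 'screenx' => 1280)
processor.drain
```

After `drain`, `recorded` holds both hits in arrival order. The point of the split is that a slow or unavailable database delays the worker, not the tracking request.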

  22. class Hit
        def record
          site.atomic_update(site_updates)
          Resolution.record(self)
          Technology.record(self)
          Location.record(self)
          Referrer.record(self)
          Content.record(self)
          Search.record(self)
          Notification.record(self)
          View.record(self)
        end
      end
  23. class Resolution
        def record(hit)
          query = {'_id' => "..."}
          update = {'$inc' => {}}
          update['$inc']["sx.#{hit.screenx}"] = 1
          update['$inc']["bx.#{hit.browserx}"] = 1
          update['$inc']["by.#{hit.browsery}"] = 1
          collection(hit.created_on)
            .update(query, update, :upsert => true)
        end
      end
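Slide 23 relies on two MongoDB behaviors: `$inc` with dotted keys increments a field nested inside the document, and an upsert creates the document if it does not exist yet. The sketch below imitates both on a plain Ruby Hash so the effect can be seen without a database. `apply_inc`, `upsert`, and the `'site-1:10'` id are hypothetical names for illustration; this is a simplified model, not MongoDB's implementation.

```ruby
# Apply a {'$inc' => {...}} update to a plain Hash, creating nested
# hashes for dotted keys such as "sx.1440" (a simplified model of $inc).
def apply_inc(doc, update)
  update.fetch('$inc').each do |path, amount|
    *parents, leaf = path.split('.')
    target = parents.reduce(doc) { |node, key| node[key] ||= {} }
    target[leaf] = target.fetch(leaf, 0) + amount
  end
  doc
end

# In-memory stand-in for a collection: the first write creates the
# document, every later write increments the existing one.
def upsert(store, id, update)
  store[id] ||= { '_id' => id }
  apply_inc(store[id], update)
end

store = {}
wide = { '$inc' => { 'sx.1440' => 1, 'bx.1280' => 1, 'by.768' => 1 } }
narrow = { '$inc' => { 'sx.1280' => 1, 'bx.1280' => 1, 'by.768' => 1 } }

upsert(store, 'site-1:10', wide)
upsert(store, 'site-1:10', wide)
upsert(store, 'site-1:10', narrow)

doc = store['site-1:10']
```

Three hits produce a single document whose `sx`, `bx`, and `by` sub-hashes hold one counter per observed size, which is the same shape as the stored resolution document shown later in the deck.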
  24. Pros

  25. Pros Space

  26. Pros Space RAM

  27. Pros Space RAM Reads

  28. Pros Space RAM Reads Live

  29. Cons

  30. Cons Writes

  31. Cons Writes Constraints

  32. Cons Writes Constraints More Forethought

  33. Cons Writes Constraints More Forethought No raw data

  34. http://bit.ly/rt-counters http://bit.ly/rt-counters2

  35. Time Frame Minute, hour, month, day, year, forever?

  36. # of Variations One document vs many

  37. Single Document Per Time Frame

  38. None
  39. {
        "t" => 336381,
        "u" => 158951,
        "2011" => {
          "02" => {
            "18" => { "t" => 9, "u" => 6 }
          }
        }
      }
  40. {
        '$inc' => {
          't' => 1,
          'u' => 1,
          '2011.02.18.t' => 1,
          '2011.02.18.u' => 1,
        }
      }
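Slides 39 and 40 show one document that holds running totals plus per-day counters nested by year, month, and day, all updated with a single `$inc`. The sketch below builds such an update from a date and reads one day back out of a stored document. It assumes `t` means total views and `u` means unique visitors, which the slides do not state; `time_frame_update` and `counts_for` are hypothetical helper names.

```ruby
require 'date'

# Build the $inc update for one page view on the given date.
# `unique` says whether the view also counts as a unique visitor
# (assumed meaning of the "u" counter).
def time_frame_update(date, unique: false)
  day = date.strftime('%Y.%m.%d') # e.g. "2011.02.18"
  inc = { 't' => 1, "#{day}.t" => 1 }
  if unique
    inc['u'] = 1
    inc["#{day}.u"] = 1
  end
  { '$inc' => inc }
end

# Read one day's counters out of a stored document; {} if absent.
def counts_for(doc, date)
  doc.dig(date.strftime('%Y'), date.strftime('%m'), date.strftime('%d')) || {}
end

update = time_frame_update(Date.new(2011, 2, 18), unique: true)

doc = {
  't' => 336381,
  'u' => 158951,
  '2011' => { '02' => { '18' => { 't' => 9, 'u' => 6 } } }
}
day = counts_for(doc, Date.new(2011, 2, 18))
missing = counts_for(doc, Date.new(2011, 3, 1))
```

With `unique: true` the generated update has the same four keys as slide 40. Reads for a day need no aggregation: they are a single document fetch followed by a nested lookup.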
  41. Single Document For all ranges in time frame

  42. None
  43. {
        "_id" => "...:10",
        "bx" => {
          "320" => 85, "480" => 318, "800" => 1938, "1024" => 5033,
          "1280" => 6288, "1440" => 2323, "1600" => 3817, "2000" => 137
        },
        "by" => {
          "480" => 2205, "600" => 7359,
  44.     "600" => 7359, "768" => 4515, "900" => 3833, "1024" => 2026
        },
        "sx" => {
          "320" => 191, "480" => 179, "800" => 195, "1024" => 1059,
          "1280" => 5861, "1440" => 3533, "1600" => 7675, "2000" => 1279
        }
      }
  45. {
        '$inc' => {
          'sx.1440' => 1,
          'bx.1280' => 1,
          'by.768' => 1,
        }
      }
  46. Many Documents Search terms, content, referrers...

  47. None
  48. [
        {
          "_id" => "<oid>:<hash>",
          "t" => "ruby class variables",
          "sid" => BSON::ObjectId('<oid>'),
          "v" => 352
        },
        {
          "_id" => "<oid>:<hash>",
          "t" => "ruby unless",
          "sid" => BSON::ObjectId('<oid>'),
          "v" => 347
        },
      ]
  49. Writes {'_id' => "#{sid}:#{hash}"}

  50. Reads [['sid', 1], ['v', -1]]
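Slides 46 to 50 cover the many-documents case: each search term gets its own document, writes address it by a composite `_id`, and reads use an index on site id ascending and views descending. The sketch below imitates that on an in-memory Hash. MD5 is an assumption (the slide only says `<hash>`), site ids are plain strings rather than `BSON::ObjectId`, and `term_id`, `record_term`, and `top_terms` are hypothetical names.

```ruby
require 'digest'

# Build the per-term document id in the "#{sid}:#{hash}" form.
# MD5 is an assumption; the slide does not name the hash function.
def term_id(site_id, term)
  "#{site_id}:#{Digest::MD5.hexdigest(term)}"
end

# Increment the view counter for a term, creating the document
# the first time the term is seen (an upsert by _id).
def record_term(store, site_id, term)
  id = term_id(site_id, term)
  doc = (store[id] ||= { '_id' => id, 'sid' => site_id, 't' => term, 'v' => 0 })
  doc['v'] += 1
  doc
end

# Emulate a read served by the [['sid', 1], ['v', -1]] index:
# filter by site, order by views descending, take the top few.
def top_terms(store, site_id, limit)
  store.values
       .select { |doc| doc['sid'] == site_id }
       .sort_by { |doc| -doc['v'] }
       .first(limit)
end

store = {}
3.times { record_term(store, 'site-1', 'ruby class variables') }
2.times { record_term(store, 'site-1', 'ruby unless') }
record_term(store, 'site-2', 'ruby unless')

top = top_terms(store, 'site-1', 2).map { |doc| [doc['t'], doc['v']] }
```

Because the `_id` is derived from the site and the term, a write never needs a lookup first: the same term for the same site always lands on the same document.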

  51. Growth Don’t say shard, don’t say shard...

  52. Partition Hot Data Currently using collections for time frames
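Slide 52 says hot data is partitioned by using separate collections per time frame, which matches the `collection(hit.created_on)` call on slide 23. One way this could look is a helper that derives the collection name from the hit's date. The `base.YYYY.MM` naming scheme and the monthly granularity below are assumptions for illustration; the slides do not show the real scheme.

```ruby
require 'date'

# Derive a collection name from a date. The "base.YYYY.MM" scheme
# is hypothetical; the talk does not show the actual naming.
def collection_name(base, date)
  "#{base}.#{date.strftime('%Y.%m')}"
end

# Group dates by the collection that would receive their writes.
def partition(base, dates)
  dates.group_by { |date| collection_name(base, date) }
end

dates = [Date.new(2011, 2, 18), Date.new(2011, 2, 19), Date.new(2011, 3, 1)]
groups = partition('resolutions', dates)
```

Under this scheme only the current month's collection takes writes, so its working set stays small while older collections go cold.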

  53. Bigger, Faster Server More CPU, RAM, Disk Space

  54. Users Sites Content Referrers Terms Engines Resolutions Locations Users Sites Content Referrers Terms Engines Resolutions Locations
  55. Partition by Function Spread writes across a few servers

  56. Users Sites Content Referrers Terms Engines Resolutions Locations

  57. Partition by Server Spread writes across a ton of servers, way down the road, not worried yet
  58. GitHub Thank you! john@github.com John Nunemaker MongoSF 2012 May 4, 2012 @jnunemaker