MongoDB for Analytics

Presented at Mongo Chicago 2011.

John Nunemaker

October 18, 2011

Transcript

  1. Ordered List
    John Nunemaker
    MongoChi 2011
    October 18, 2011
    MongoDB for Analytics
    A loving conversation with @jnunemaker

  2. Background
    As presented through interpretive dance

  6. ~1 month
    Of evenings and weekends

  7. ~4 dog years
    Since public launch

  8. ~6 tiny servers
    2 web, 2 app, 2 db

  9. ~1-2 Million
    Page views per day

  12. Implementation
    Imma show you how we do what we do baby

  13. Doing It Live
    No aggregate querying

  14. get('/track.gif') do
      Hit.record(...)
      TrackGif
    end

  15. class Hit
      def record
        site.atomic_update(site_updates)
        Resolution.record(self)
        Technology.record(self)
        Location.record(self)
        Referrer.record(self)
        Content.record(self)
        Search.record(self)
        Notification.record(self)
        View.record(self)
      end
    end

  16. class Resolution
      def self.record(hit)
        query = {'_id' => "..."}
        update = {'$inc' => {}}
        update['$inc']["sx.#{hit.screenx}"] = 1
        update['$inc']["bx.#{hit.browserx}"] = 1
        update['$inc']["by.#{hit.browsery}"] = 1
        collection(hit.created_on)
          .update(query, update, :upsert => true)
      end
    end
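
A minimal sketch of the update-building step on this slide, pulled out as a plain function so it can be run without a database. The `Hit` struct here is a hypothetical stand-in for the talk's hit object and models only the three attributes the slide uses.

```ruby
# Hypothetical stand-in for the talk's hit object; only the three
# attributes used on the slide are modelled.
Hit = Struct.new(:screenx, :browserx, :browsery)

# Build the '$inc' update document for one hit: one counter each for
# screen width (sx), browser width (bx) and browser height (by).
def resolution_update(hit)
  update = { '$inc' => {} }
  update['$inc']["sx.#{hit.screenx}"]  = 1
  update['$inc']["bx.#{hit.browserx}"] = 1
  update['$inc']["by.#{hit.browsery}"] = 1
  update
end

p resolution_update(Hit.new(1440, 1280, 768))
```

Each hit therefore costs one upsert per report, and no aggregation is needed at read time.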

  21. Pros
    Space
    RAM
    Reads
    Live

  26. Cons
    Writes
    Constraints
    More Forethought
    No raw data

  27. Time Frame
    Minute, hour, month, day, year, forever?

  28. # of Variations
    One document vs many

  29. Single Document
    Per Time Frame

  31. {
      "t" => 336381,
      "u" => 158951,
      "2011" => {
        "02" => {
          "18" => {
            "t" => 9,
            "u" => 6
          }
        }
      }
    }

  32. {
      '$inc' => {
        't' => 1,
        'u' => 1,
        '2011.02.18.t' => 1,
        '2011.02.18.u' => 1,
      }
    }
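
A rough sketch of how an update like this one might be derived from a date. It assumes `t` counts total views and `u` counts unique views, which the slides suggest but do not state; the `unique:` flag is hypothetical.

```ruby
require 'date'

# Build the '$inc' document for one page view on the given day.
# Assumption: 't' is total views and 'u' is unique views; `unique`
# is a hypothetical flag for "first view from this visitor".
def time_frame_update(date, unique:)
  day_key = date.strftime('%Y.%m.%d') # e.g. "2011.02.18"
  inc = { 't' => 1, "#{day_key}.t" => 1 }
  if unique
    inc['u'] = 1
    inc["#{day_key}.u"] = 1
  end
  { '$inc' => inc }
end

p time_frame_update(Date.new(2011, 2, 18), unique: true)
```

The all-time counters and the per-day counters live in the same document, so a single write keeps both current.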

  33. Single Document
    For all ranges in time frame

  35. {
      "_id" => "...:10",
      "bx" => {
        "320" => 85,
        "480" => 318,
        "800" => 1938,
        "1024" => 5033,
        "1280" => 6288,
        "1440" => 2323,
        "1600" => 3817,
        "2000" => 137
      },
      "by" => {
        "480" => 2205,
        "600" => 7359,

  36. "600" => 7359,
        "768" => 4515,
        "900" => 3833,
        "1024" => 2026
      },
      "sx" => {
        "320" => 191,
        "480" => 179,
        "800" => 195,
        "1024" => 1059,
        "1280" => 5861,
        "1440" => 3533,
        "1600" => 7675,
        "2000" => 1279
      }
    }

  37. {
      '$inc' => {
        'sx.1440' => 1,
        'bx.1280' => 1,
        'by.768' => 1,
      }
    }
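
To make the effect of a dotted-key `$inc` with upsert concrete, here is an in-memory imitation on a plain Hash. It is only an illustration of the semantics (missing sub-documents and counters spring into existence), not driver code, and `apply_inc` is a hypothetical helper name.

```ruby
# Apply a '$inc' update to a plain Hash the way an upsert would:
# missing sub-documents and counters are created on the fly.
# This imitates the behaviour in memory; it does not talk to MongoDB.
def apply_inc(doc, update)
  update.fetch('$inc').each do |path, amount|
    *parents, leaf = path.split('.')
    # Walk (and create, if needed) each nested hash along the path.
    target = parents.reduce(doc) { |node, key| node[key] ||= {} }
    target[leaf] = target.fetch(leaf, 0) + amount
  end
  doc
end

doc = {}
update = { '$inc' => { 'sx.1440' => 1, 'bx.1280' => 1, 'by.768' => 1 } }
2.times { apply_inc(doc, update) }
p doc
```

Repeating the same update only bumps the counters, which is how a document like the one on the previous slides accumulates.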

  38. Many Documents
    Search terms, content, referrers...

  40. [
    {
    "_id" => ":",
    "t" => "ruby class variables",
    "sid" => BSON::ObjectId(''),
    "v" => 352
    },
    {
    "_id" => ":",
    "t" => "ruby unless",
    "sid" => BSON::ObjectId(''),
    "v" => 347
    },
    ]

  41. Writes
    {'_id' => "#{site_id}:#{hash}"}
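
One way the composite `_id` could be built, as a sketch. The talk does not say how `hash` is computed, so the MD5 hex digest of the term is an assumption, as are the `term_upsert` helper and the idea that each view increments `v` and sets `t` and `sid`.

```ruby
require 'digest/md5'

# Compose the "#{site_id}:#{hash}" key from the slide.
# Assumption: `hash` is an MD5 hex digest of the search term.
def term_id(site_id, term)
  "#{site_id}:#{Digest::MD5.hexdigest(term)}"
end

# Hypothetical query/update pair for one search hit. The field names
# t, sid and v come from the slide; what each write does is assumed.
def term_upsert(site_id, term)
  query  = { '_id' => term_id(site_id, term) }
  update = { '$inc' => { 'v' => 1 },
             '$set' => { 't' => term, 'sid' => site_id } }
  [query, update]
end

query, update = term_upsert('site-a', 'ruby unless')
p query
p update
```

Because the `_id` is deterministic, every write for the same site and term lands on the same document without a lookup first.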

  42. Reads
    [['sid', 1], ['v', -1]]
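
The pair list reads like a compound index on `sid` ascending and `v` descending, which fits a "top terms for one site" read. The sketch below computes that result in memory; the `top_terms` helper and the string site ids are hypothetical stand-ins (the slides use `BSON::ObjectId` values).

```ruby
# Top terms for one site: filter on 'sid', then order by 'v' descending.
# An index on [['sid', 1], ['v', -1]] would let a database return this
# in index order; here the same result is computed in memory.
def top_terms(docs, site_id, limit = 10)
  docs.select { |d| d['sid'] == site_id }
      .sort_by { |d| -d['v'] }
      .first(limit)
end

docs = [
  { 't' => 'ruby unless',          'sid' => 'site-a', 'v' => 347 },
  { 't' => 'ruby class variables', 'sid' => 'site-a', 'v' => 352 },
  { 't' => 'unrelated term',       'sid' => 'site-b', 'v' => 999 } # made-up row
]

top_terms(docs, 'site-a').each { |d| puts "#{d['v']}  #{d['t']}" }
```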

  43. Growth
    The best laid plans of mice and men

  44. Partition Hot Data
    Currently using collections for time frames
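
A small sketch of picking a collection by time frame, in the spirit of the `collection(hit.created_on)` call shown earlier. The monthly granularity and the `base.YYYY.MM` naming scheme are assumptions; the talk does not give the actual names.

```ruby
require 'date'

# Name the collection that holds counters for the month of `date`.
# Assumption: one collection per month, named "<base>.<YYYY>.<MM>".
def collection_name_for(base, date)
  "#{base}.#{date.strftime('%Y.%m')}"
end

puts collection_name_for('resolutions', Date.new(2011, 10, 18))
```

Writes then concentrate on the current period's collection, while older collections go cold.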

  45. Bigger, Faster Server
    More CPU, RAM, Disk Space

  46. Users
    Sites
    Content
    Referrers
    Terms
    Engines
    Resolutions
    Locations

  47. Partition by Function
    Spread writes across a few servers
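
Partitioning by function could be as simple as a lookup from collection to server. The sketch below is purely illustrative: the grouping and the host names are invented, and only the collection names come from the slides.

```ruby
# Hypothetical assignment of collections to servers. The collection
# names come from the slides; the grouping and hosts are made up.
SERVER_FOR = {
  'users'       => 'db1.example.com',
  'sites'       => 'db1.example.com',
  'content'     => 'db2.example.com',
  'referrers'   => 'db2.example.com',
  'terms'       => 'db3.example.com',
  'engines'     => 'db3.example.com',
  'resolutions' => 'db4.example.com',
  'locations'   => 'db4.example.com'
}.freeze

# Look up which server owns a collection; raises KeyError for an
# unknown name so a typo cannot silently write to the wrong place.
def server_for(collection_name)
  SERVER_FOR.fetch(collection_name)
end

puts server_for('terms')
```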

  48. Users
    Sites
    Content
    Referrers
    Terms
    Engines
    Resolutions
    Locations

  49. Partition by Server
    Spread writes across a ton of servers,
    way down the road, not worried yet

  50. Ordered List
    Thank you!
    [email protected]
    John Nunemaker
    MongoChi 2011
    October 18, 2011
    @jnunemaker
