MongoNYC 2012: Real Time Data Analytics

mongodb
June 05, 2012

MongoNYC 2012: Real Time Data Analytics, Scott Hernandez, 10gen. There are many ways to build dashboards and reports with detailed aggregations. In this talk we will outline one of them: pre-aggregating data as it is collected in order to provide real-time views. You may be familiar with options that process data after it has been collected, like map/reduce or the new aggregation framework; we will investigate some of their downsides, which need to be addressed when defining the use case for counters and pre-aggregation with MongoDB.
Come learn how to build these types of reports and see how easy it can be.

Transcript

  1. Framework • Know your metrics/counter • Prepared reports • Calculate during write • Fast queries • Always up to date • Record time-series collections
  2. Roads not traveled • Map/Reduce • Reprocess raw data • Now possible to do partial reduce • Aggregation Framework (aggregate in 2.2) • Also reprocess data on operation (initial release) • Optimizations to come • More costly during reads
  3. Processing • Event received • Split into many updates w/$inc • Aggregate • Input Field(s) • Time periods (hourly, monthly, annual) • Defined Metrics
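The processing step on slide 3 can be sketched roughly as follows: one incoming event is split into one upsert per metric, each carrying a `$inc`. This is only an illustration, not the speaker's code. The `METRICS` list, the `getPath` and `buildUpdates` helpers, the `stats_hourly.<field>` collection naming, and the use of the last path segment as the query key are all assumptions; only the hour-within-day period is covered, and the event time is passed in as a `Date` because parsing the sample's `created_at` string is left out.

```javascript
// Hypothetical metric list: dotted paths into the event document,
// borrowed from the "Define Metrics" slide.
const METRICS = ["actor", "repository.language", "type"];

// Resolve a dotted path such as "repository.language" on a nested object.
function getPath(doc, path) {
  return path.split(".").reduce((v, k) => (v == null ? undefined : v[k]), doc);
}

// Turn one event into one upsert spec per metric that the event carries.
function buildUpdates(event, when) {
  // Day bucket at midnight UTC, like the "p" value in the stats sample.
  const day = new Date(Date.UTC(when.getUTCFullYear(), when.getUTCMonth(), when.getUTCDate()));
  const hour = when.getUTCHours();
  const updates = [];
  for (const metric of METRICS) {
    const value = getPath(event, metric);
    if (value === undefined) continue; // event has no value for this metric
    const field = metric.split(".").pop(); // assumed key name in the stats doc
    updates.push({
      collection: "stats_hourly." + field, // assumed naming scheme
      query: { p: day, [field]: value },
      update: { $inc: { ["h." + hour + ".c"]: 1, "t.c": 1 } },
      upsert: true,
    });
  }
  return updates;
}
```

In the mongo shell, each spec could then be sent with something like `db.getCollection(u.collection).update(u.query, u.update, { upsert: true })`; monthly or annual periods would add further specs with coarser buckets.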
  4. Example Data: github > db.events.findOne() { "repository" : { "url" : "https://github.com/vidageek/games", ... "open_issues" : 25, "watchers" : 6, "pushed_at" : "2012/03/10 08:34:00 -0800", "language" : "Java" }, "actor_attributes" : {...}, "created_at" : "2012/03/11 15:20:24 -0700", "public" : true, "actor" : "juliano", "payload" : {...}, "url" : "https://github.com/...", "type" : "CommitCommentEvent" }
  5. Define Metrics • "actor" • "repository.name" • "repository.language" • "type": PushEvent, IssuesEvent, WatchEvent, GistEvent • "payload.ref": refs/heads/improved_history, refs/heads/master, refs/heads/signs
  6. Stats > db.stats_hourly.types.find({"_id.type":"GistEvent"}) { "_id" : { "p" : ISODate("2012-05-21T00:00:00Z"), "type" : "GistEvent" }, "hour" : { "2" : { "count" : 65 }, "3" : { "count" : 2 }, "7" : { "count" : 130 }, "8" : { "count" : 5 } }, "total" : { "count" : 202 } }
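Because a stats document like the one on slide 6 only stores hours that saw events, a graph needs the gaps filled in. A minimal sketch of that read side, assuming the document shape shown on the slide; `hourlySeries` is a hypothetical helper name, and `new Date(...)` stands in for the shell's `ISODate(...)` so the snippet runs outside the shell.

```javascript
// Expand the sparse "hour" sub-document into a dense 24-slot array,
// using 0 for hours with no recorded events.
function hourlySeries(statsDoc) {
  const series = new Array(24).fill(0);
  for (const [hour, bucket] of Object.entries(statsDoc.hour || {})) {
    series[Number(hour)] = bucket.count;
  }
  return series;
}

// The document from the slide, written as a plain object.
const sample = {
  _id: { p: new Date("2012-05-21T00:00:00Z"), type: "GistEvent" },
  hour: { "2": { count: 65 }, "3": { count: 2 }, "7": { count: 130 }, "8": { count: 5 } },
  total: { count: 202 },
};

const series = hourlySeries(sample);
// The per-hour counts add up to the stored daily total.
const sum = series.reduce((a, b) => a + b, 0);
```

One read of one document yields a whole day's graph, which is the "fast queries" property the deck is after.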
  7. Updates Increment Query: { "p" : Date(…), "actor" : "neoplastic" } Update: { "$inc" : { "h.21.c" : 1, "t.c" : 1 } } Upsert: true
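To see what the upsert on slide 7 does to a document over time, here is a rough in-memory model of `$inc` with dotted paths. It is a simplification for illustration, not the server's implementation: `applyInc` is a hypothetical helper, and the `p` date is left out of the query because the slide elides its value.

```javascript
// Apply a { $inc: {...} } update to a plain object, creating missing
// embedded documents and fields along each dotted path.
function applyInc(doc, update) {
  for (const [path, amount] of Object.entries(update.$inc)) {
    const keys = path.split(".");
    let node = doc;
    for (const key of keys.slice(0, -1)) {
      if (node[key] === undefined) node[key] = {};
      node = node[key];
    }
    const last = keys[keys.length - 1];
    node[last] = (node[last] || 0) + amount; // absent fields start from 0
  }
  return doc;
}

const query = { actor: "neoplastic" };
const update = { $inc: { "h.21.c": 1, "t.c": 1 } };

// With upsert, a miss creates the document from the query's equality
// fields and then applies the update; later events only increment.
let doc = applyInc({ ...query }, update); // first event in hour 21
doc = applyInc(doc, update);              // second event in hour 21
// doc is now { actor: "neoplastic", h: { "21": { c: 2 } }, t: { c: 2 } }
```

The same update works whether or not the document or the hour bucket already exists, so the writer never has to read before it writes.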
  8. The Whys • Multiple data points per document • Documents hold many timed points • Good for graphs by time, or types • Nested for improved performance
  9. © Copyright 2010 10gen Inc. drivers at mongodb.org REST ActionScript3 C# and .NET Clojure ColdFusion Delphi Erlang F# Go: gomongo Groovy Haskell Javascript Lua C C# C++ Erlang Haskell Java Javascript Perl PHP Python Ruby node.js Objective C PHP PowerShell Blog post Python Ruby Scala Scheme (PLT) Smalltalk: Dolphin Smalltalk Community Supported mongodb.org Supported
  10. @mongodb © Copyright 2010 10gen Inc. conferences, appearances, and meetups http://www.10gen.com/events http://bit.ly/mongofb Facebook | Twitter | LinkedIn http://linkd.in/joinmongo download at mongodb.org support, training, and this talk brought to you by