
MongoNYC 2012: Real Time Data Analytics

mongodb
June 05, 2012

MongoNYC 2012: Real Time Data Analytics, Scott Hernandez, 10gen. There are many ways to build dashboards and reports with detailed aggregations. In this talk we will outline one approach: pre-aggregating data as it is collected in order to provide real-time views. You may be familiar with options that process data after it has been collected, such as map/reduce or the new aggregation framework; we will look at some of their downsides, which need to be addressed when defining the use case for counters and pre-aggregation with MongoDB.
Come learn how to build these types of reports and see how easy it can be.

Transcript

  1. Framework

    • Know your metrics/counters • Prepared reports • Calculate during write • Fast queries • Always up to date • Record time-series collections
  2. Roads not traveled

    • Map/Reduce: reprocesses raw data; now possible to do a partial reduce • Aggregation Framework (aggregate in 2.2): also reprocesses data on each operation (initial release); optimizations to come • More costly during reads
  3. Processing

    • Event received • Split into many updates with $inc • Aggregate: input field(s), time periods (hourly, monthly, annual), defined metrics
  4. Example Data: GitHub

    > db.events.findOne() { "repository" : { "url" : "https://github.com/vidageek/games", ... "open_issues" : 25, "watchers" : 6, "pushed_at" : "2012/03/10 08:34:00 -0800", "language" : "Java" }, "actor_attributes" : {...}, "created_at" : "2012/03/11 15:20:24 -0700", "public" : true, "actor" : "juliano", "payload" : {...}, "url" : "https://github.com/...", "type" : "CommitCommentEvent" }
  5. Define Metrics

    • "actor" • "repository.name" • "repository.language" • "type" (e.g. PushEvent, IssuesEvent, WatchEvent, GistEvent) • "payload.ref" (e.g. refs/heads/improved_history, refs/heads/master, refs/heads/signs)
  6. Stats

    > db.stats_hourly.types.find({"_id.type":"GistEvent"}) { "_id" : { "p" : ISODate("2012-05-21T00:00:00Z"), "type" : "GistEvent" }, "hour" : { "2" : { "count" : 65 }, "3" : { "count" : 2 }, "7" : { "count" : 130 }, "8" : { "count" : 5 } }, "total" : { "count" : 202 } }
  7. Updates: Increment

    • Query: { "p" : Date(…), "actor" : "neoplastic" } • Update: { "$inc" : { "h.21.c" : 1, "t.c" : 1 } } • Upsert: true
  8. The Whys

    • Multiple data points per document • Documents hold many timed points • Good for graphs by time or by type • Nested for improved performance
  9. Drivers at mongodb.org

    • mongodb.org Supported: C, C#, C++, Erlang, Haskell, Java, Javascript, Perl, PHP, Python, Ruby • Community Supported: REST, ActionScript3, C# and .NET, Clojure, ColdFusion, Delphi, Erlang, F#, Go: gomongo, Groovy, Haskell, Javascript, Lua, node.js, Objective C, PHP, PowerShell (blog post), Python, Ruby, Scala, Scheme (PLT), Smalltalk: Dolphin Smalltalk • © Copyright 2010 10gen Inc.
  10. @mongodb

    • Conferences, appearances, and meetups: http://www.10gen.com/events • Facebook: http://bit.ly/mongofb | Twitter: @mongodb | LinkedIn: http://linkd.in/joinmongo • Download at mongodb.org • Support, training, and this talk brought to you by 10gen
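
The "Roads not traveled" slide (slide 2) notes that map/reduce and the aggregation framework reprocess raw data and are more costly during reads. As a rough illustration only, here is an in-memory stand-in for that read-time approach (plain JavaScript for Node.js, not MongoDB's actual map/reduce or `aggregate`): every report request walks all raw events again. The event shape and the `countsByTypeAndHour` name are hypothetical.

```javascript
// Hypothetical raw events, cut down to the two fields this report needs.
const events = [
  { type: "GistEvent", created: new Date("2012-05-21T02:10:00Z") },
  { type: "GistEvent", created: new Date("2012-05-21T02:45:00Z") },
  { type: "PushEvent", created: new Date("2012-05-21T03:05:00Z") },
];

// Read-time aggregation: the cost grows with the number of raw events,
// and the whole scan is repeated every time the report is requested.
function countsByTypeAndHour(rawEvents) {
  const counts = {};
  for (const e of rawEvents) {
    const key = e.type + "@" + e.created.getUTCHours();
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

console.log(JSON.stringify(countsByTypeAndHour(events)));
// {"GistEvent@2":2,"PushEvent@3":1}
```

Pre-aggregation moves this counting to write time, so a report becomes a lookup of documents whose counts are already summed.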
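
The "Processing" slide (slide 3) describes splitting each received event into many `$inc` updates, one per metric and time period. A minimal sketch of that fan-out in plain JavaScript (Node.js), assuming the event's metric values have already been extracted into a flat object and its time converted to a `Date`. The collection names (loosely modelled on the deck's `stats_hourly.types`), the `hour`/`day`/`month`/`total` field names, and `splitEvent` itself are hypothetical.

```javascript
// Time periods from the slide. Each maps an event time to the start of the
// document it lands in ("p") and to the slot incremented inside that document.
const PERIODS = [
  { name: "hourly", // one document per UTC day, one slot per hour
    start: (t) => new Date(Date.UTC(t.getUTCFullYear(), t.getUTCMonth(), t.getUTCDate())),
    slot: (t) => "hour." + t.getUTCHours() },
  { name: "monthly", // one document per month, one slot per day
    start: (t) => new Date(Date.UTC(t.getUTCFullYear(), t.getUTCMonth(), 1)),
    slot: (t) => "day." + t.getUTCDate() },
  { name: "annual", // one document per year, one slot per month
    start: (t) => new Date(Date.UTC(t.getUTCFullYear(), 0, 1)),
    slot: (t) => "month." + (t.getUTCMonth() + 1) },
];

// Split one event into an upsert per (metric, period) pair.
// `metrics` maps a short metric name to its value, e.g. { type: "GistEvent" }.
function splitEvent(metrics, when) {
  const updates = [];
  for (const [name, value] of Object.entries(metrics)) {
    for (const period of PERIODS) {
      updates.push({
        collection: "stats_" + period.name + "." + name,
        query: { _id: { p: period.start(when), [name]: value } },
        update: { $inc: { [period.slot(when) + ".count"]: 1, "total.count": 1 } },
        upsert: true,
      });
    }
  }
  return updates;
}

const updates = splitEvent(
  { actor: "juliano", language: "Java", type: "CommitCommentEvent" },
  new Date("2012-03-11T22:20:24Z")
);
console.log(updates.length); // 9
console.log(updates[0].collection, JSON.stringify(updates[0].update));
// stats_hourly.actor {"$inc":{"hour.22.count":1,"total.count":1}}
```

Each entry could then be sent as an upsert, for example with `db.getCollection(u.collection).update(u.query, u.update, true)` in the legacy mongo shell, where the third argument is the upsert flag. Because upsert creates a missing document and `$inc` starts a missing counter at the increment amount, no read is needed before the write.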
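
The sample event on the "Example Data" slide (slide 4) stores `created_at` as a string with a UTC offset, so it has to become a point in time before it can be bucketed by hour. A minimal sketch in plain JavaScript (Node.js), assuming every timestamp has exactly this `YYYY/MM/DD hh:mm:ss +hhmm` shape and that buckets are aligned to UTC days, which matches the `ISODate("2012-05-21T00:00:00Z")` period start on the Stats slide. `parseCreatedAt` and `hourlyBucket` are hypothetical helpers.

```javascript
// Parse "YYYY/MM/DD hh:mm:ss +hhmm" (the sample's format) into a Date.
function parseCreatedAt(s) {
  const m = /^(\d{4})\/(\d{2})\/(\d{2}) (\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})$/.exec(s);
  if (!m) throw new Error("unrecognized timestamp: " + s);
  const [, yr, mo, dy, hh, mi, ss, sign, offH, offM] = m;
  // Read the wall-clock fields as if they were UTC, then remove the offset:
  // local time = UTC + offset, so UTC = local time - offset.
  const wallClockMs = Date.UTC(+yr, +mo - 1, +dy, +hh, +mi, +ss);
  const offsetMin = (sign === "-" ? -1 : 1) * (+offH * 60 + +offM);
  return new Date(wallClockMs - offsetMin * 60000);
}

// Hourly bucket: UTC midnight as the period start "p", plus the hour slot.
function hourlyBucket(date) {
  return {
    p: new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate())),
    hour: date.getUTCHours(),
  };
}

const created = parseCreatedAt("2012/03/11 15:20:24 -0700");
console.log(created.toISOString()); // 2012-03-11T22:20:24.000Z
const b = hourlyBucket(created);
console.log(b.p.toISOString(), b.hour); // 2012-03-11T00:00:00.000Z 22
```

Choosing UTC days keeps every writer agreeing on which document an event belongs to, at the cost of reports that show hours in UTC rather than the actor's local time.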
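
The "Define Metrics" slide (slide 5) lists the event fields to count, several of them nested (`repository.language`, `payload.ref`). A sketch in plain JavaScript (Node.js) that pulls those values out of one event by dotted path; the short metric names, `getPath`, `extractMetrics`, and the sample PushEvent are hypothetical. Short names keep dots out of the keys used later in `_id` and `$inc` paths, since in an update a dotted name is read as a path into nested documents.

```javascript
// Hypothetical short names for the slide's metrics, mapped to dotted paths.
const METRIC_PATHS = {
  actor: "actor",
  repo: "repository.name",
  language: "repository.language",
  type: "type",
  ref: "payload.ref",
};

// Follow a dotted path ("repository.language") through nested objects.
function getPath(obj, path) {
  return path.split(".").reduce((node, key) => (node == null ? undefined : node[key]), obj);
}

// Collect the metric values present in one event; absent fields are skipped.
function extractMetrics(event) {
  const out = {};
  for (const [name, path] of Object.entries(METRIC_PATHS)) {
    const value = getPath(event, path);
    if (value !== undefined && value !== null) out[name] = value;
  }
  return out;
}

const sample = {
  actor: "juliano",
  repository: { language: "Java" },
  type: "PushEvent",
  payload: { ref: "refs/heads/master" },
};
console.log(JSON.stringify(extractMetrics(sample)));
// {"actor":"juliano","language":"Java","type":"PushEvent","ref":"refs/heads/master"}
```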
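
A document like the one on the "Stats" slide (slide 6) already holds a day's worth of points, so drawing an hourly graph takes one read plus a little reshaping. A sketch in plain JavaScript (Node.js) that expands the sparse `hour` sub-document into 24 values (hours with no events become 0) and checks them against `total.count`. The numbers are the slide's; `ISODate` is swapped for a JavaScript `Date`, and `hourlySeries` is a hypothetical helper.

```javascript
// The GistEvent document from the slide, with ISODate replaced by Date.
const doc = {
  _id: { p: new Date("2012-05-21T00:00:00Z"), type: "GistEvent" },
  hour: { "2": { count: 65 }, "3": { count: 2 }, "7": { count: 130 }, "8": { count: 5 } },
  total: { count: 202 },
};

// Expand the sparse "hour" sub-document into 24 points; missing hours are 0.
function hourlySeries(statsDoc) {
  const series = new Array(24).fill(0);
  for (const [h, v] of Object.entries(statsDoc.hour || {})) {
    series[Number(h)] = v.count;
  }
  return series;
}

const series = hourlySeries(doc);
const sum = series.reduce((a, b) => a + b, 0);
console.log(series[7], sum, sum === doc.total.count); // 130 202 true
```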
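
The update on the "Updates: Increment" slide (slide 7) relies on two server behaviours: with upsert enabled, a missing document is created from the query's equality fields, and `$inc` on a dotted path such as `"h.21.c"` creates any missing nested documents and starts the counter at the increment amount. Below is a toy in-memory emulation of just those two behaviours in plain JavaScript (Node.js). It is not MongoDB's full update semantics (it ignores `_id` generation, other operators, and type errors). `applyIncUpsert` and the period value `"2012-05-21"` are hypothetical, since the slide elides the date.

```javascript
// Toy stand-in for one collection: JSON-encoded query -> document.
// Queries must list their fields in the same order to hit the same document.
function applyIncUpsert(store, query, inc) {
  const key = JSON.stringify(query);
  let doc = store.get(key);
  if (doc === undefined) {
    // Upsert: seed a new document from the query's equality fields.
    doc = JSON.parse(key);
    store.set(key, doc);
  }
  for (const [path, amount] of Object.entries(inc)) {
    const parts = path.split(".");
    let node = doc;
    // Walk the dotted path, creating nested documents that do not exist yet.
    for (const part of parts.slice(0, -1)) {
      if (node[part] === undefined) node[part] = {};
      node = node[part];
    }
    const leaf = parts[parts.length - 1];
    // A missing counter behaves as 0, so the first $inc sets it to `amount`.
    node[leaf] = (node[leaf] || 0) + amount;
  }
  return doc;
}

const store = new Map();
const query = { p: "2012-05-21", actor: "neoplastic" }; // hypothetical period value
const inc = { "h.21.c": 1, "t.c": 1 };
applyIncUpsert(store, query, inc);
const doc = applyIncUpsert(store, query, inc);
console.log(JSON.stringify(doc));
// {"p":"2012-05-21","actor":"neoplastic","h":{"21":{"c":2}},"t":{"c":2}}
```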
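
One commonly cited reason behind "Nested for improved performance" on "The Whys" slide (slide 8): the server locates a field inside a BSON document by walking its fields in order, so reaching a late slot in a document with hundreds of slots means passing over everything before it, while nesting lets the walk skip whole groups. The arithmetic below assumes a hypothetical monthly document with one slot per hour (744 for a 31-day month) in which every slot already exists in time order; `flatSkips` and `nestedSkips` are illustrative names, and this models field lookup only.

```javascript
// Fields passed over before reaching the slot for (day, hour), assuming
// every slot exists and fields are stored in time order.

// Flat layout: one top-level key per hour of the month, "0" .. "743".
function flatSkips(day, hour) {
  return (day - 1) * 24 + hour;
}

// Nested layout: keys "1" .. "31" for days, each holding "0" .. "23" for hours.
// Skip the earlier days at the top level, then the earlier hours inside the day.
function nestedSkips(day, hour) {
  return (day - 1) + hour;
}

console.log(flatSkips(31, 23), nestedSkips(31, 23)); // 743 53
console.log(flatSkips(1, 5), nestedSkips(1, 5)); // 5 5
```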