Slide 1

Slide 1 text

Real Time Data Analytics
Pre-aggregation with counters
© Copyright 2010 10gen Inc.

Slide 2

Slide 2 text

Goals
• Dashboards
• (Known) Reports
• Real-time numbers

Slide 3

Slide 3 text

Framework
• Know your metrics/counters
• Prepared reports
• Calculate during write
• Fast queries
• Always up to date
• Record time-series collections

Slide 4

Slide 4 text

Dashboard

Slide 5

Slide 5 text

Roads not traveled
• Map/Reduce
• Reprocess raw data
• Now possible to do partial reduce
• Aggregation Framework (aggregate in 2.2)
• Also reprocess data on operation (initial release)
• Optimizations to come
• More costly during reads

Slide 6

Slide 6 text

Not Appropriate For
• Ad-hoc aggregations (unknown metrics)
• One-off reports
• Possibly complex calculations

Slide 7

Slide 7 text

Processing
• Event received
• Split into many updates w/$inc
• Aggregate
• Input Field(s)
• Time periods (hourly, monthly, annual)
• Defined Metrics
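The "split into many updates" step can be sketched as follows. This is a minimal illustration, not the speaker's code: the helper names `get_path` and `split_event` are hypothetical, the metric paths are the ones the deck lists under Define Metrics, and the sample event uses only fields visible in the deck's example document.

```python
# Metric paths as listed on the "Define Metrics" slide.
METRIC_PATHS = ["actor", "repository.name", "repository.language", "type", "payload.ref"]

def get_path(doc, dotted):
    """Follow a dotted path such as 'repository.language' through nested dicts."""
    node = doc
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None  # path is absent from this event
        node = node[key]
    return node

def split_event(event):
    """Return one (metric path, value) pair per counter the event should increment."""
    pairs = []
    for path in METRIC_PATHS:
        value = get_path(event, path)
        # Skip metrics the event does not carry, and non-scalar values.
        if value is not None and not isinstance(value, dict):
            pairs.append((path, value))
    return pairs

# Hypothetical trimmed event, built from fields shown in the deck's example.
event = {
    "actor": "juliano",
    "type": "CommitCommentEvent",
    "repository": {"language": "Java"},
    "payload": {},
}
pairs = split_event(event)
```

Each pair would then become one `$inc` update per time period, so a single event fans out into several small writes.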

Slide 8

Slide 8 text

Example Data: github

> db.events.findOne()
{
    "repository" : {
        "url" : "https://github.com/vidageek/games",
        ...
        "open_issues" : 25,
        "watchers" : 6,
        "pushed_at" : "2012/03/10 08:34:00 -0800",
        "language" : "Java"
    },
    "actor_attributes" : {...},
    "created_at" : "2012/03/11 15:20:24 -0700",
    "public" : true,
    "actor" : "juliano",
    "payload" : {...},
    "url" : "https://github.com/...",
    "type" : "CommitCommentEvent"
}

Slide 9

Slide 9 text

Define Metrics
• “actor”
• “repository.name”
• “repository.language”
• “type”: PushEvent, IssuesEvent, WatchEvent, GistEvent
• “payload.ref”: refs/heads/improved_history, refs/heads/master, refs/heads/signs

Slide 10

Slide 10 text

Aggregations
TimePeriod, type #
TimePeriod, author #
TimePeriod, project #

Slide 11

Slide 11 text

Stats Collections
stats_[hourly/daily/monthly].actors
stats_[hourly/daily/monthly].projects
stats_[hourly/daily/monthly].langs
stats_[hourly/daily/monthly].types
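One way to route an event timestamp to these collections is sketched below. The layout is an assumption, not something the deck states: hourly stats keep one document per day with a slot per hour, daily stats one document per month with a slot per day, and monthly stats one document per year with a slot per month. Normalizing to UTC is also assumed, and `stats_targets` is a hypothetical helper name.

```python
from datetime import datetime, timezone

def stats_targets(created_at, metric):
    """Map a timestamp to {collection name: (period start, slot)} for one metric."""
    # Parse the deck's "YYYY/MM/DD HH:MM:SS +ZZZZ" format and normalize to UTC (assumed).
    ts = datetime.strptime(created_at, "%Y/%m/%d %H:%M:%S %z").astimezone(timezone.utc)
    return {
        # Assumed: one document per day, one slot per hour.
        f"stats_hourly.{metric}": (datetime(ts.year, ts.month, ts.day), ts.hour),
        # Assumed: one document per month, one slot per day.
        f"stats_daily.{metric}": (datetime(ts.year, ts.month, 1), ts.day),
        # Assumed: one document per year, one slot per month.
        f"stats_monthly.{metric}": (datetime(ts.year, 1, 1), ts.month),
    }

targets = stats_targets("2012/03/11 15:20:24 -0700", "types")
```

The period start identifies the document and the slot identifies the nested counter inside it, so each collection receives exactly one increment per event and metric.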

Slide 12

Slide 12 text

Stats

> db.stats_hourly.types.find({"_id.type":"GistEvent"})
{
    "_id" : {
        "p" : ISODate("2012-05-21T00:00:00Z"),
        "type" : "GistEvent"
    },
    "hour" : {
        "2" : { "count" : 65 },
        "3" : { "count" : 2 },
        "7" : { "count" : 130 },
        "8" : { "count" : 5 }
    },
    "total" : { "count" : 202 }
}

Slide 13

Slide 13 text

Updates
Increment
Query: { "p" : Date(…), "actor" : "neoplastic" }
Update: { "$inc" : { "h.21.c" : 1, "t.c" : 1 } }
Upsert: true
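The effect of this upsert can be illustrated without a database. The sketch below is an in-memory stand-in that handles only `$inc` on dotted paths; it is not MongoDB's implementation, `build_update` and `apply_upsert` are hypothetical names, and the period value is a placeholder because the slide elides the date.

```python
def build_update(hour):
    """Build the slide's $inc document: one hourly slot plus the running total."""
    return {"$inc": {f"h.{hour}.c": 1, "t.c": 1}}

def apply_upsert(store, query, update):
    """Tiny in-memory stand-in for an update with upsert=True; supports $inc only."""
    key = tuple(sorted(query.items()))
    # Upsert: create the document from the query fields if none exists yet.
    doc = store.setdefault(key, dict(query))
    for path, amount in update["$inc"].items():
        *parents, leaf = path.split(".")
        node = doc
        for part in parents:
            node = node.setdefault(part, {})  # create nested levels on demand
        node[leaf] = node.get(leaf, 0) + amount  # a missing counter starts at 0
    return doc

store = {}
# Placeholder period value; the slide shows only Date(…).
query = {"p": "2012-05-21", "actor": "neoplastic"}
for hour in (21, 21, 22):
    apply_upsert(store, query, build_update(hour))
doc = store[tuple(sorted(query.items()))]
```

The first event creates the document and every later event only increments counters, so the writer never has to check whether the document already exists.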

Slide 14

Slide 14 text

Query/Graphing
• Select by grouping (by date, by type/value)
• Documents hold many data points
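Because one document holds a full day of data points, turning it into a graph series is a single pass over its nested counters. The sketch below assumes the document shape shown on the Stats slide, with `ISODate` replaced by a Python `datetime`; `hourly_series` is a hypothetical helper name.

```python
from datetime import datetime

def hourly_series(doc, field="count"):
    """Return 24 values, one per hour of the day, using 0 for hours with no events."""
    hours = doc.get("hour", {})
    return [hours.get(str(h), {}).get(field, 0) for h in range(24)]

# Document copied from the Stats slide.
stats_doc = {
    "_id": {"p": datetime(2012, 5, 21), "type": "GistEvent"},
    "hour": {
        "2": {"count": 65},
        "3": {"count": 2},
        "7": {"count": 130},
        "8": {"count": 5},
    },
    "total": {"count": 202},
}
series = hourly_series(stats_doc)
```

Filling empty hours with zero keeps the x-axis continuous, and the stored total can serve as a cheap consistency check against the sum of the slots.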

Slide 15

Slide 15 text

The Whys
• Multiple data points per document
• Documents hold many timed points
• Good for graphs by time, or types
• Nested for improved performance

Slide 16

Slide 16 text

Questions

Slide 17

Slide 17 text

try at try.mongodb.org

Slide 18

Slide 18 text

drivers at mongodb.org
mongodb.org Supported: C, C#, C++, Erlang, Haskell, Java, Javascript, Perl, PHP, Python, Ruby
Community Supported: REST, ActionScript3, C# and .NET, Clojure, ColdFusion, Delphi, Erlang, F#, Go: gomongo, Groovy, Haskell, Javascript, Lua, node.js, Objective C, PHP, PowerShell: Blog post, Python, Ruby, Scala, Scheme (PLT), Smalltalk: Dolphin Smalltalk

Slide 19

Slide 19 text

@mongodb
conferences, appearances, and meetups
http://www.10gen.com/events
http://bit.ly/mongofb
Facebook | Twitter | LinkedIn
http://linkd.in/joinmongo
download at mongodb.org
support, training, and this talk brought to you by