mongodb
May 07, 2012

Saving Time with MongoDB - Harris Reynolds, Developer, Nimble Labs

Talk Description: Using MongoDB as your backend database can save you time during both development and runtime, and who doesn't want more TIME?! This talk examines how MongoDB speeds up development for several modern use cases, including JSON storage, dynamic schemas, and removing the need for object-relational mapping technology. It also discusses building general-purpose APIs on top of the standard Mongo driver APIs to speed up common tasks, and how MongoDB can improve application performance by handling large amounts of data efficiently.

Transcript

  1. App #1: Visualizing Random Data

    • Random CSV data: NFL football stats, H-1B visa data, census data
    • Random MongoDB instances (MongoHQ, OpenShift, etc.)

    Monday, May 7, 12
  2. CSV data [yawn]

    ID, last name, first name, year, team, position, G, GS, COMP, ATT, PassYD, PassTD, INT, rush, rushYD, rushTD, rec, recYD, recTD
    MariDa00,Marino,Dan,1983,mia,qb,11,9,173,296,2210,20,6,28,45,2,0,0,0
    MariDa00,Marino,Dan,1984,mia,qb,16,16,362,564,5084,48,17,28,-7,0,0,0,0
    MariDa00,Marino,Dan,1985,mia,qb,16,16,336,567,4137,30,21,26,-24,0,0,0,0
    MariDa00,Marino,Dan,1986,mia,qb,16,16,378,623,4746,44,23,12,-3,0,0,0,0
    MariDa00,Marino,Dan,1987,mia,qb,12,12,263,444,3245,26,13,12,-5,1,0,0,0
    MariDa00,Marino,Dan,1988,mia,qb,16,16,354,606,4434,28,23,20,-17,0,0,0,0
    MariDa00,Marino,Dan,1989,mia,qb,16,16,308,550,3997,24,22,14,-7,2,0,0,0
    MariDa00,Marino,Dan,1990,mia,qb,16,16,306,531,3563,21,11,16,29,0,0,0,0
    MariDa00,Marino,Dan,1991,mia,qb,16,16,318,549,3970,25,13,27,32,1,0,0,0
    MariDa00,Marino,Dan,1992,mia,qb,16,16,330,554,4116,24,16,20,66,0,0,0,0
    MariDa00,Marino,Dan,1993,mia,qb,5,5,91,150,1218,8,3,9,-4,1,0,0,0
    MariDa00,Marino,Dan,1994,mia,qb,16,16,385,615,4453,30,17,22,-6,1,0,0,0
    MariDa00,Marino,Dan,1995,mia,qb,14,14,309,482,3668,24,15,11,14,0,0,0,0
    MariDa00,Marino,Dan,1996,mia,qb,13,13,221,373,2795,17,9,11,-3,0,0,0,0
    MariDa00,Marino,Dan,1997,mia,qb,16,16,319,548,3780,16,11,18,-14,0,0,0,0
    MariDa00,Marino,Dan,1998,mia,qb,16,16,310,537,3497,23,15,21,-3,1,0,0,0
    MariDa00,Marino,Dan,1999,mia,qb,11,11,204,369,2448,12,17,6,-6,0,0,0,0
  3. CSV to MongoDB

    The CSV header row supplies the fields; each data row becomes a document.

    ID, last name, first name, year, team, position, G, GS, COMP, ATT, PassYD, PassTD, INT, rush, rushYD, rushTD, rec, recYD, recTD
    MariDa00,Marino,Dan,1983,mia,qb,11,9,173,296,2210,20,6,28,45,2,0,0,0
    MariDa00,Marino,Dan,1984,mia,qb,16,16,362,564,5084,48,17,28,-7,0,0,0,0
    MariDa00,Marino,Dan,1985,mia,qb,16,16,336,567,4137,30,21,26,-24,0,0,0,0

    The first row, stored as a document:

        { "_id" : { "$oid" : "4f86fcb00364ce3958c9bcad"} ,
          "ID" : "MariDa00" , "last name" : "Marino" , "first name" : "Dan" ,
          "year" : 1983.0 , "team" : "mia" , "position" : "qb" ,
          "G" : 11.0 , "GS" : 9.0 , "COMP" : 173.0 , "ATT" : 296.0 ,
          "PassYD" : 2210.0 , "PassTD" : 20.0 , "INT" : 6.0 ,
          "rush" : 28.0 , "rushYD" : 45.0 , "rushTD" : 2.0 ,
          "rec" : 0.0 , "recYD" : 0.0 , "recTD" : 0.0}
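
The header-to-fields, row-to-document mapping on this slide can be sketched in plain Java. This is a minimal illustration, not the speaker's importer: the class `CsvRowToDocument` and its methods are hypothetical names, a `Map` stands in for a MongoDB document, and the naive comma split assumes no quoted or embedded commas. Numeric cells are parsed as doubles, which matches the `1983.0`-style values shown in the stored document.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: pairs a CSV header with one data row to build a document-like map. */
final class CsvRowToDocument {

    /** Header cells become field names; row cells become values (numbers where possible). */
    static Map<String, Object> toDocument(String headerLine, String rowLine) {
        String[] fields = headerLine.split(",");
        String[] values = rowLine.split(",");
        if (fields.length != values.length) {
            throw new IllegalArgumentException(
                "Expected " + fields.length + " values but found " + values.length);
        }
        Map<String, Object> doc = new LinkedHashMap<>(); // keeps the column order
        for (int i = 0; i < fields.length; i++) {
            doc.put(fields[i].trim(), parseValue(values[i].trim()));
        }
        return doc;
    }

    /** Assumption: anything that parses as a double is stored as a number, otherwise as a string. */
    static Object parseValue(String raw) {
        try {
            return Double.valueOf(raw);
        } catch (NumberFormatException e) {
            return raw;
        }
    }

    public static void main(String[] args) {
        String header = "ID, last name, first name, year, team, position, G, GS, COMP, ATT, "
            + "PassYD, PassTD, INT, rush, rushYD, rushTD, rec, recYD, recTD";
        String row = "MariDa00,Marino,Dan,1983,mia,qb,11,9,173,296,2210,20,6,28,45,2,0,0,0";
        System.out.println(toDocument(header, row));
    }
}
```

With the real driver, the resulting map could be wrapped in a `BasicDBObject` before insertion; that step is left out so the sketch runs without any dependencies.
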
  4. Queries: sum of PassYD

        DBObject qbCondition = new BasicDBObject("position", "qb");
        DBObject columnFields = new BasicDBObject("last name", true);
        columnFields.put("first name", true);
        List passLeaderResults = aggregator.sum(columnFields, "PassYD", qbCondition, sortDescending, 10);

    Top 10 Passing leaders

        name                 yards
        ---------------------
        Brett Favre          61655.0
        Dan Marino           61361.0
        John Elway           51475.0
        Warren Moon          49325.0
        Fran Tarkenton       47003.0
        Vinny Testaverde     46233.0
        Drew Bledsoe         44611.0
        Dan Fouts            43040.0
        Peyton Manning       41626.0
        Joe Montana          40551.0
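
To make the shape of that `sum` call concrete, here is a rough in-memory sketch of "filter, group, sum, sort descending, limit". It is not the aggregator's implementation, which the slides do not show; `GroupedSumSketch`, `sumBy`, and `season` are hypothetical names, and plain maps stand in for documents. The sample rows are the first three Marino seasons from the CSV slide.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for a grouped-sum query. */
final class GroupedSumSketch {

    static List<Map.Entry<String, Double>> sumBy(
            List<Map<String, Object>> docs, List<String> groupFields, String aggField,
            String condField, Object condValue, int limit) {
        Map<String, Double> totals = new LinkedHashMap<>();
        for (Map<String, Object> doc : docs) {
            if (!condValue.equals(doc.get(condField))) continue;  // apply the condition
            Object raw = doc.get(aggField);
            if (!(raw instanceof Number)) continue;               // skip missing or non-numeric values
            StringBuilder key = new StringBuilder();              // group key, e.g. "Dan Marino"
            for (String field : groupFields) {
                if (key.length() > 0) key.append(' ');
                key.append(doc.get(field));
            }
            totals.merge(key.toString(), ((Number) raw).doubleValue(), Double::sum);
        }
        List<Map.Entry<String, Double>> sorted = new ArrayList<>(totals.entrySet());
        sorted.sort(Map.Entry.<String, Double>comparingByValue().reversed()); // largest sum first
        return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
    }

    /** One season for Dan Marino, using values from the CSV slide. */
    static Map<String, Object> season(int year, double passYd) {
        return Map.<String, Object>of("first name", "Dan", "last name", "Marino",
            "position", "qb", "year", year, "PassYD", passYd);
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = List.of(
            season(1983, 2210.0), season(1984, 5084.0), season(1985, 4137.0));
        // The three seasons sum to 11431.0 passing yards.
        System.out.println(
            sumBy(docs, List.of("first name", "last name"), "PassYD", "position", "qb", 10));
    }
}
```
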
  5. Queries: sum of INTs

        DBObject qbCondition = new BasicDBObject("position", "qb");
        DBObject columnFields = new BasicDBObject("last name", true);
        columnFields.put("first name", true);
        List passLeaderResults = aggregator.sum(columnFields, "INT", qbCondition, sortDescending, 10);

    Top 10 Interception leaders

        name                 interceptions
        ---------------------
        Brett Favre          288.0
        George Blanda        277.0
        John Hadl            268.0
        Vinny Testaverde     267.0
        Fran Tarkenton       266.0
        Norm Snead           257.0
        Johnny Unitas        253.0
        Dan Marino           252.0
        Jim Hart             247.0
        Bobby Layne          243.0
  6. General-purpose APIs

    • What was that aggregator object?
    • General API for sum, avg, standard deviation
    • Aggregation should be easy!
    • https://github.com/harrisreynolds/mongo-aggregation-java
  7. Method signature for standard deviation:

        public List stddev( DBObject groupByMap,
                            String aggField,
                            DBObject condition )

    Ask questions like: What is the standard deviation of salaries (aggregation field) grouped by companies for employees with the title ‘programmer’? (Hopefully the std deviation is low if you are underpaid or high if you are overpaid!)
  8. “Advanced” functions: Standard Deviation with map reduce

    • Java code that generates JS map function:

        StringBuffer mapFunction = new StringBuffer("function() { ");
        mapFunction.append("emit(" + fieldList + "{" + aggField + ": this." + aggField + "});");
        mapFunction.append("}" );

    • Writing a program within a program!
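
As a self-contained illustration of "Java that writes JavaScript", the sketch below builds a complete map function string from a list of group-by fields and an aggregation field. It is not the library's code: `MapFunctionSketch` and `buildMapFunction` are hypothetical names, and wrapping the emitted number as `{value: ...}` is an assumption chosen because the reduce function in this talk reads `value.value`. Bracket notation is used so field names containing spaces, such as "last name", still produce valid JavaScript; quote characters inside field names are not escaped here.

```java
import java.util.List;

/** Hypothetical generator for a MongoDB map-reduce map function. */
final class MapFunctionSketch {

    static String buildMapFunction(List<String> groupFields, String aggField) {
        // Build the emit key, e.g. {"team": this["team"]}
        StringBuilder key = new StringBuilder("{");
        for (int i = 0; i < groupFields.size(); i++) {
            if (i > 0) key.append(", ");
            String field = groupFields.get(i);
            key.append('"').append(field).append("\": this[\"").append(field).append("\"]");
        }
        key.append("}");
        // Assumption: the emitted value is wrapped as {value: <number>}.
        return "function() { emit(" + key + ", {value: this[\"" + aggField + "\"]}); }";
    }

    public static void main(String[] args) {
        System.out.println(buildMapFunction(List.of("last name", "first name"), "PassYD"));
    }
}
```
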
  9. Reduce function in JS

    Don’t be afraid of Javascript! :-)

        function reduce(key, values) {
            var result = {count: 0, oldSum: 0, newSum: 0, oldMean: 0, newMean: 0};
            values.forEach(function(value) {
                if( !isNaN(value.value) ) {
                    result.count++;
                    if(result.count==1){
                        result.oldMean = value.value;
                        result.newMean = value.value;
                        result.oldSum = 0.0;
                    } else {
                        result.newMean = result.oldMean + (value.value - result.oldMean)/result.count;
                        result.newSum = result.oldSum + (value.value - result.oldMean)*(value.value - result.newMean);
                        result.oldMean = result.newMean;
                        result.oldSum = result.newSum;
                    }
                }
            });
            return result;
        }
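
The running-mean and running-sum updates in that reduce function follow the pattern usually called Welford's online algorithm. A minimal Java sketch of the same recurrence is below; `WelfordSketch` and its method names are hypothetical, not part of the speaker's library. The `merge` method is an addition beyond the slide: MongoDB's map-reduce may call reduce again on partial results for the same key, so a way to combine two partial accumulators (the standard pairwise formula) is shown as one possible approach.

```java
/** Hypothetical accumulator implementing Welford's online variance recurrence. */
final class WelfordSketch {
    private long count;
    private double mean;
    private double sumSq; // running sum of squared deviations from the current mean

    /** Same update as the slide: the new mean first, then the squared-deviation sum. */
    void add(double x) {
        if (Double.isNaN(x)) return;     // mirrors the isNaN guard in the JS version
        count++;
        double delta = x - mean;         // x - oldMean
        mean += delta / count;           // newMean
        sumSq += delta * (x - mean);     // (x - oldMean) * (x - newMean)
    }

    /** Combines another partial accumulator into this one (pairwise merge). */
    void merge(WelfordSketch other) {
        if (other.count == 0) return;
        long total = count + other.count;
        double delta = other.mean - mean;
        sumSq += other.sumSq + delta * delta * ((double) count * other.count / total);
        mean += delta * other.count / total;
        count = total;
    }

    double populationStdDev() {
        return count == 0 ? Double.NaN : Math.sqrt(sumSq / count);
    }

    double sampleStdDev() {
        return count < 2 ? Double.NaN : Math.sqrt(sumSq / (count - 1));
    }

    public static void main(String[] args) {
        WelfordSketch acc = new WelfordSketch();
        for (double x : new double[] {2, 4, 4, 4, 5, 5, 7, 9}) acc.add(x);
        System.out.println(acc.populationStdDev()); // mean is 5, squared deviations sum to 32
    }
}
```

The single-pass form avoids storing all values and is less prone to cancellation error than computing the sum of squares minus the squared sum.
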
  10. Experiment!

    Sum of rushYD by team / sum of rushes by team = yards per carry by team

        List teamYdsPerCarryResults = aggregator.divideTwoSums(
            "team", "rushYD", "rush", rbCondition, "yardsPerCarry", sortDescending, 10 );

    Top 10 Teams by Yards per Carry

        team    yards/carry
        ---------------------
        ram     4.296806751197998
        jax     4.21252408477842
        sfo     4.186842694115015
        cle     4.183267135146145
        dal     4.1301798090437645
        rav     4.127333056822895
        rai     4.120062474808545
        sdg     4.113904326598452
        cin     4.111124583207032
        nyy     4.092783505154639
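
The "divide two sums" idea can be sketched in plain Java as follows. This is only an illustration of the arithmetic, not the `divideTwoSums` implementation, which the slides do not show; `RatioOfSumsSketch`, `divideSums`, and `row` are hypothetical names, and maps stand in for documents. The sample rows reuse the rush and rushYD values from the first three Marino seasons on the CSV slide.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory version of "sum(numerator) / sum(denominator)" per group. */
final class RatioOfSumsSketch {

    static Map<String, Double> divideSums(List<Map<String, Object>> docs, String groupField,
                                          String numeratorField, String denominatorField) {
        // Per group: sums[0] accumulates the numerator, sums[1] the denominator.
        Map<String, double[]> sums = new LinkedHashMap<>();
        for (Map<String, Object> doc : docs) {
            Object num = doc.get(numeratorField);
            Object den = doc.get(denominatorField);
            if (!(num instanceof Number) || !(den instanceof Number)) continue;
            double[] s = sums.computeIfAbsent(String.valueOf(doc.get(groupField)), k -> new double[2]);
            s[0] += ((Number) num).doubleValue();
            s[1] += ((Number) den).doubleValue();
        }
        Map<String, Double> ratios = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> e : sums.entrySet()) {
            double[] s = e.getValue();
            if (s[1] != 0.0) ratios.put(e.getKey(), s[0] / s[1]); // skip groups with a zero denominator
        }
        return ratios;
    }

    static Map<String, Object> row(String team, double rush, double rushYd) {
        return Map.<String, Object>of("team", team, "rush", rush, "rushYD", rushYd);
    }

    public static void main(String[] args) {
        // Rush attempts and rushing yards from the 1983-1985 rows: 82 carries for 14 yards in total.
        List<Map<String, Object>> docs = List.of(
            row("mia", 28, 45), row("mia", 28, -7), row("mia", 26, -24));
        System.out.println(divideSums(docs, "team", "rushYD", "rush"));
    }
}
```

Dividing the two sums weights every carry equally, which differs from averaging each row's own yards-per-carry figure.
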
  11. Salient Points

    • Mongo is great for semi-structured data storage.
    • Is this possible with an RDBMS? Yes, but it is painful (column names, data types, etc.). Avoid pain when possible!
    • General-purpose APIs make your life easier. Make your life easier when possible.
    • Go ahead and learn JS if you haven’t already.
  12. App #2: Social Media Data

    • Service for archiving tweets (consumer)
    • Service for finding customers via Twitter and LinkedIn (business)
  13. Simple User Models

    • Twitter User as JSON document:

        {
          "_id" : ObjectId("4f73b297e4b05be5f313394b"),
          "id" : 756354,
          "twitterName" : "Harris Reynolds",
          "screenName" : "harrisreynolds",
          "location" : "Birmingham, AL",
          "description" : "startups, software, hacker, health nut, founder of http://nimblelabs.com, trying to be a better person",
          "url" : "http://harrisreynolds.net",
          "followerCount" : 248,
          "statusCount" : 1226
        }

    • Java to store this user:

        Twitter userTwitter = ...
        User twitterUser = userTwitter.verifyCredentials();
        Gson gson = new Gson();
        String userJson = gson.toJson(twitterUser);
        DBObject userDocument = (DBObject)JSON.parse(userJson);
        userCollection.insert(userDocument);
  14. Dynamic User Models

    • Decorate models with additional fields, on the fly!
    • Combine Twitter User objects with LinkedIn objects

        Twitter userTwitter = ...
        User twitterUser = userTwitter.verifyCredentials();
        Gson gson = new Gson();
        String userJson = gson.toJson(twitterUser);
        DBObject userDocument = (DBObject)JSON.parse(userJson);

        // Add a field that is not part of the Twitter model
        userDocument.put( "accountType", "free" );

        LinkedInApiClient client = ...
        Person linkedInProfile = client.getProfileById( profileId );
        String linkedInJson = gson.toJson(linkedInProfile);
        DBObject linkedInDocument = (DBObject)JSON.parse(linkedInJson);

        // Copy every LinkedIn field into the user document (putAll, since put takes a key and a value)
        userDocument.putAll( linkedInDocument.toMap() );
        userCollection.insert(userDocument);
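
One thing to watch when combining two models this way is that fields with the same name collide. The sketch below uses plain maps to show the flat merge from the slide next to a nested alternative. It is an illustration under assumptions: `MergeDocumentsSketch`, `merge`, and `nest` are hypothetical names, and the LinkedIn-side field names ("id", "headline") are made up for the example rather than taken from the LinkedIn API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helpers showing two ways to combine document-like maps. */
final class MergeDocumentsSketch {

    /** Flat merge: fields from extra overwrite same-named fields from base. */
    static Map<String, Object> merge(Map<String, Object> base, Map<String, Object> extra) {
        Map<String, Object> merged = new LinkedHashMap<>(base);
        merged.putAll(extra);
        return merged;
    }

    /** Nested merge: extra is stored under one key, so no base field is overwritten. */
    static Map<String, Object> nest(Map<String, Object> base, String key, Map<String, Object> extra) {
        Map<String, Object> merged = new LinkedHashMap<>(base);
        merged.put(key, new LinkedHashMap<>(extra));
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> twitter = new LinkedHashMap<>();
        twitter.put("id", 756354);
        twitter.put("screenName", "harrisreynolds");
        twitter.put("accountType", "free");          // decorated field, as on the slide

        Map<String, Object> linkedIn = new LinkedHashMap<>();
        linkedIn.put("id", "hypothetical-id");       // made-up value for illustration
        linkedIn.put("headline", "hypothetical headline");

        System.out.println(merge(twitter, linkedIn));            // "id" now holds the LinkedIn value
        System.out.println(nest(twitter, "linkedIn", linkedIn)); // both ids are preserved
    }
}
```

Either shape can be stored without a schema change, so the choice mostly depends on how the documents will be queried later.
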
  15. Salient Points

    • MongoDB’s schema-less design saves you time. Save your time.
    • Blessed are the flexible, for they shall not be broken. MongoDB is flexible; with great flexibility comes great power.