
MongoDB DC 2012: Taming Social Media with MongoDB

mongodb
June 26, 2012

Danny Holloway, The Human Geo
This talk covers the basics of extracting and storing data with the Twitter API, then focuses on the geospatial indexing capabilities of MongoDB for performing analysis and creating interesting visualizations.


Transcript

  1. Taming Social Media with MongoDB
     Danny Holloway, danny@thehumangeo.com
     June 26, 2012

  2. Overview
     • Introduction
     • Social Media Challenges
     • MongoDB Setup
     • Collecting Tweets
     • Querying Tweets
     • Accessing the Data
     • Finding Most Active Tweeter
     • Lessons Learned
     • Building an Interface
     • Demo

  3. Introduction
     • Built a tool to collect tweets over Australia and interact with them on a map
     • Working at HumanGeo
       – Building tools and services for geospatial analysis of Big Data
       – Using MongoDB for horizontally scalable storage and geospatial analysis

  4. Social Media Challenges
     • No control over data
       – “Consumers of Tweets should tolerate the addition of new fields and variance in ordering of fields with ease.” - Twitter
     • High volume
       – ~17k tweets in a day, or 6.2M per year, with exact coordinates in Australia
       – Record high of >25k tweets per second, or >788B per year, around the world - Twitter

  5. MongoDB Setup
     • Create database
     • Create capped collections
     • Create indexes
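
The setup steps on this slide could look roughly like the sketch below. The slide's own code is not in the transcript, so everything here is an assumption: the function name `setup_tweet_store`, the collection name, the size cap, and the indexed fields (`coordinates.coordinates` and `user.screen_name`, following the layout of Twitter's tweet JSON). It expects a PyMongo `Database` object; MongoDB creates the database itself lazily on first write.

```python
def setup_tweet_store(db, name="tweets", max_bytes=512 * 1024 * 1024):
    """Create a capped collection plus the indexes used by later queries.

    `db` is assumed to be a pymongo Database; names and sizes are illustrative.
    """
    # A capped collection has a fixed size and discards the oldest
    # documents once it is full.
    if name not in db.list_collection_names():
        db.create_collection(name, capped=True, size=max_bytes)
    tweets = db[name]
    # Legacy "2d" geospatial index on the [longitude, latitude] pair.
    tweets.create_index([("coordinates.coordinates", "2d")])
    # Ascending index for per-user lookups.
    tweets.create_index([("user.screen_name", 1)])
    return tweets
```

With a running server, a call might be `setup_tweet_store(MongoClient()["social"])`, where the database name `social` is also hypothetical.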

  6. Collecting Tweets
     • Using tweetstream to collect tweets over Australia from the statuses/filter endpoint
     • Insert results into collections
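
A minimal sketch of the collection loop, under stated assumptions: the stream is modeled as any iterable of tweet dicts (standing in for the tweetstream client, whose code is not shown in the transcript), the collection exposes PyMongo's `insert_one`, and only tweets with exact coordinates are kept. The `locations` value for statuses/filter is a comma-separated bounding box with the south-west corner first, each corner given as longitude, latitude. Both function names are hypothetical.

```python
def locations_param(sw_lon, sw_lat, ne_lon, ne_lat):
    """Format a bounding box as 'sw_lon,sw_lat,ne_lon,ne_lat'."""
    return ",".join(str(v) for v in (sw_lon, sw_lat, ne_lon, ne_lat))


def store_geotagged(stream, collection):
    """Insert every tweet that carries exact coordinates; return the count."""
    stored = 0
    for tweet in stream:
        # Tweets without a geotag have "coordinates" set to null or missing.
        if tweet.get("coordinates"):
            collection.insert_one(tweet)
            stored += 1
    return stored
```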

  7. Collecting Tweets (cont)
     • Augment results for better queries
       – Twitter provides date strings like "Wed Jun 13 23:17:58 +0000 2012"
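
One way to augment a tweet is to parse the date string into a real datetime before inserting, so that range queries and sorting work on dates rather than text. The sketch below assumes the string sits in the `created_at` field and stores the parsed value under a hypothetical `created_at_dt` field; the slide's actual approach is not shown.

```python
from datetime import datetime

# Format of Twitter's date strings, e.g. "Wed Jun 13 23:17:58 +0000 2012".
TWITTER_DATE_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_twitter_date(text):
    """Parse a Twitter date string into a timezone-aware datetime."""
    return datetime.strptime(text, TWITTER_DATE_FORMAT)


def augment(tweet):
    """Return a copy of the tweet with a parsed date field added."""
    enriched = dict(tweet)
    enriched["created_at_dt"] = parse_twitter_date(tweet["created_at"])
    return enriched
```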

  8. Querying Tweets
     • Get all of the latest tweets
     • Get all the tweets from a user
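
The two queries named on this slide might be written as below. This is a sketch, not the speaker's code: it assumes a PyMongo collection, relies on a capped collection preserving insertion order (so reverse natural order yields the newest documents first), and assumes the author's handle lives in `user.screen_name`.

```python
def latest_tweets(collection, limit=100):
    """Newest tweets first, using reverse insertion ("natural") order."""
    return list(collection.find().sort("$natural", -1).limit(limit))


def tweets_by_user(collection, screen_name):
    """All stored tweets posted by one user."""
    return list(collection.find({"user.screen_name": screen_name}))
```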

  9. Querying Tweets (cont)
     • Get tweets near a point
     • Get tweets within a bounding box
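
The geospatial queries could be expressed as filter documents like the ones below, which would then be passed to a collection's `find`. The field name `coordinates.coordinates` and the helper names are assumptions. `$geoWithin` is the current operator name; MongoDB releases from the time of the talk spelled it `$within`.

```python
GEO_FIELD = "coordinates.coordinates"  # assumed [longitude, latitude] pair


def near_query(lon, lat):
    """Filter for tweets sorted by distance from a point (needs a 2d index)."""
    return {GEO_FIELD: {"$near": [lon, lat]}}


def box_query(sw_lon, sw_lat, ne_lon, ne_lat):
    """Filter for tweets inside a box given by its lower-left and upper-right corners."""
    return {GEO_FIELD: {"$geoWithin": {"$box": [[sw_lon, sw_lat], [ne_lon, ne_lat]]}}}
```

For example, `collection.find(near_query(151.2, -33.9)).limit(20)` would ask for the 20 tweets closest to a point in Sydney.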

  10. Accessing the Data
      • Using Bottle to create a RESTful API

  11. Finding Most Active Tweeter
      • Calculate tweet count for each user and return tweets for that user
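
The counting step can be illustrated in memory as follows. This is a stand-in rather than the method used in the talk, which the transcript does not show; on the server side the same count could be done with a group or aggregation query. It assumes each tweet carries the author's handle in `user.screen_name`.

```python
from collections import Counter


def most_active_tweeter(tweets):
    """Return (screen_name, that user's tweets) for the user with the most tweets."""
    tweets = list(tweets)
    if not tweets:
        return None, []
    # Count tweets per user, then pick the user with the highest count.
    counts = Counter(t["user"]["screen_name"] for t in tweets)
    top_user, _ = counts.most_common(1)[0]
    return top_user, [t for t in tweets if t["user"]["screen_name"] == top_user]
```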

  12. Lessons Learned
      • Use Longitude, Latitude ordering for coordinates
      • Default index value range is exclusive of upper bound
      • Twitter has bugs too
      • Making your own maps isn’t hard (it can take some time)
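
The first two lessons can be sketched as a small normalization step applied before inserting a point. It assumes the default bounds of a legacy 2d index, which accept values from -180 inclusive up to but not including 180, so a longitude of exactly 180 is wrapped to -180. The helper names are hypothetical.

```python
def wrap_longitude(lon):
    """Map any longitude into the half-open range [-180, 180)."""
    return ((lon + 180) % 360) - 180


def to_lon_lat(lat, lon):
    """Convert a (latitude, longitude) pair to the [longitude, latitude] order."""
    if not -90 <= lat <= 90:
        raise ValueError("latitude out of range: %r" % (lat,))
    return [wrap_longitude(lon), lat]
```
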
  13. Building an Interface
      • Dust JavaScript templating library
      • Leaflet JavaScript interactive map library
      • jQuery JavaScript library
      • TileStream map tile server