Slide 1

Slide 1 text

Taming Social Media with MongoDB
Danny Holloway
[email protected]
June 26, 2012

Slide 2

Slide 2 text

Overview
•  Introduction
•  Social Media Challenges
•  MongoDB Setup
•  Collecting Tweets
•  Querying Tweets
•  Accessing the Data
•  Finding Most Active Tweeter
•  Lessons Learned
•  Building an Interface
•  Demo

Slide 3

Slide 3 text

Introduction
•  Built a tool to collect tweets over Australia and interact with them on a map
•  Working at HumanGeo
   – Building tools and services for geospatial analysis of Big Data
   – Using MongoDB for horizontally scalable storage and geospatial analysis

Slide 4

Slide 4 text

Social Media Challenges
•  No control over data
   – "Consumers of Tweets should tolerate the addition of new fields and variance in ordering of fields with ease." - Twitter
•  High Volume
   – ~17k tweets in a day, or 6.2M per year, with exact coordinates in Australia
   – Record high of >25k tweets per second, or >788B per year, around the world - Twitter
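One way to follow Twitter's advice is to read only the fields you need and default everything else. The sketch below is illustrative, not the talk's code. The function name `extract_fields` and the choice of fields are assumptions.

```python
def extract_fields(tweet):
    """Pull out the fields we rely on, ignoring anything unexpected."""
    # "user" and "coordinates" may be missing or null, so fall back to {}.
    user = tweet.get("user") or {}
    geo = tweet.get("coordinates") or {}
    return {
        "id": tweet.get("id"),
        "text": tweet.get("text", ""),
        "screen_name": user.get("screen_name"),
        "lon_lat": geo.get("coordinates"),  # [longitude, latitude] or None
    }
```

Because nothing here depends on key order or on the absence of extra keys, new fields added by Twitter pass through without breaking the consumer.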

Slide 5

Slide 5 text

MongoDB Setup
•  Create database
•  Create capped collections
•  Create indexes
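The three setup steps might look like the following with a current PyMongo `Database` object. This is a sketch under assumptions: the collection name `tweets`, the 1 GiB cap, and the indexed fields are illustrative and do not come from the slides. MongoDB creates the database itself lazily on first write, so there is no explicit "create database" call.

```python
# Assumed names: a "tweets" capped collection with a legacy 2d index on
# the tweet's [longitude, latitude] pair, plus an index on the author.
CAPPED_SIZE_BYTES = 1024 ** 3  # 1 GiB cap; an arbitrary illustrative value


def setup_database(db, name="tweets"):
    """Create the capped collection and its indexes on a pymongo Database."""
    if name not in db.list_collection_names():
        db.create_collection(name, capped=True, size=CAPPED_SIZE_BYTES)
    tweets = db[name]
    tweets.create_index([("coordinates.coordinates", "2d")])
    tweets.create_index([("user.screen_name", 1)])
    return tweets


# Usage, assuming pymongo is installed and a server runs on localhost:
#   from pymongo import MongoClient
#   tweets = setup_database(MongoClient()["social"])
```

A capped collection keeps insertion order and discards the oldest documents once the size limit is reached, which suits a continuous tweet feed.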

Slide 6

Slide 6 text

Collecting Tweets
•  Using tweetstream to collect tweets over Australia from statuses/filter endpoint
•  Insert results into collections
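The collection loop can be sketched independently of the streaming client. The example assumes `stream` is any iterable of tweet dicts (for instance, a tweetstream filter stream) and `collection` is a PyMongo collection. The bounding-box values for Australia are rough illustrative numbers, and both function names are hypothetical.

```python
# Approximate bounding box for Australia as (sw_lon, sw_lat, ne_lon, ne_lat),
# the order the statuses/filter "locations" parameter expects. The values
# are rough illustrative assumptions.
AUSTRALIA_BBOX = (112.9, -43.7, 153.7, -10.6)


def locations_param(bbox):
    """Format a bounding box as the comma-separated locations string."""
    return ",".join(str(v) for v in bbox)


def store_tweets(stream, collection):
    """Insert each tweet dict from an iterable stream; return the count."""
    stored = 0
    for tweet in stream:
        if "text" not in tweet:  # skip non-tweet messages such as deletes
            continue
        collection.insert_one(tweet)
        stored += 1
    return stored
```

Keeping the insert loop separate from the stream setup makes it easy to replay saved tweets through the same code path.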

Slide 7

Slide 7 text

Collecting Tweets (cont)
•  Augment results for better queries
   – Twitter provides date strings like "Wed Jun 13 23:17:58 +0000 2012"
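A date string in that format sorts and filters poorly, so one plausible augmentation is to store a parsed datetime next to it. This is a minimal sketch. The field name `created_at_dt` and the function name are assumptions, not taken from the slides.

```python
from datetime import datetime

# Matches strings such as "Wed Jun 13 23:17:58 +0000 2012".
TWITTER_DATE_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def augment_tweet(tweet):
    """Return a copy of the tweet with a parsed created_at_dt field added."""
    augmented = dict(tweet)
    augmented["created_at_dt"] = datetime.strptime(
        tweet["created_at"], TWITTER_DATE_FORMAT
    )
    return augmented
```

With a real datetime stored, range queries such as "tweets from the last hour" become simple comparisons on `created_at_dt`.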

Slide 8

Slide 8 text

Querying Tweets
•  Get all of the latest tweets
•  Get all the tweets from a user
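Both queries can be sketched with PyMongo as follows. This is an illustration under assumptions: the tweets live in a capped collection (so `$natural` order is insertion order), the author's handle is stored at `user.screen_name`, and the function names and default limit are hypothetical.

```python
def latest_tweets(collection, limit=100):
    """Newest tweets first, using the capped collection's insertion order."""
    return list(collection.find().sort("$natural", -1).limit(limit))


def tweets_by_user(collection, screen_name):
    """All stored tweets written by one user."""
    return list(collection.find({"user.screen_name": screen_name}))
```

Sorting on `$natural` in reverse avoids the need for a separate index on the creation date when only the most recent documents are wanted.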

Slide 9

Slide 9 text

Querying Tweets (cont)
•  Get tweets near a point
•  Get tweets within a bounding box
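The two geospatial queries can be expressed as filter documents for `collection.find(...)`. The sketch assumes a 2d index on `coordinates.coordinates`, where each tweet stores `[longitude, latitude]`. It uses `$geoWithin`, the current name for the bounding-box operator, whereas servers from 2012 used the older `$within`. Function names are hypothetical.

```python
GEO_FIELD = "coordinates.coordinates"  # [longitude, latitude] in each tweet


def near_query(lon, lat):
    """Filter for tweets ordered by distance from a point (needs a 2d index)."""
    return {GEO_FIELD: {"$near": [lon, lat]}}


def box_query(sw_lon, sw_lat, ne_lon, ne_lat):
    """Filter for tweets inside a box given by its SW and NE corners."""
    box = [[sw_lon, sw_lat], [ne_lon, ne_lat]]
    return {GEO_FIELD: {"$geoWithin": {"$box": box}}}


# Usage, assuming `tweets` is a pymongo collection:
#   tweets.find(near_query(151.21, -33.87)).limit(20)
#   tweets.find(box_query(150.5, -34.2, 151.5, -33.5))
```

Note that every coordinate pair is written longitude first, matching how Twitter stores the `coordinates` field.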

Slide 10

Slide 10 text

Accessing the Data
•  Using Bottle to create a RESTful API
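A minimal Bottle endpoint over the tweet collection might look like this. It is a sketch, not the talk's API. The route `/tweets/user/<screen_name>`, the helper names, and the choice to stringify non-JSON values (ObjectIds, datetimes) are all assumptions.

```python
import json


def tweets_to_json(tweets):
    """Serialize tweet documents, stringifying ObjectIds and datetimes."""
    return json.dumps(list(tweets), default=str)


def make_app(collection):
    """Build a Bottle app exposing one read-only endpoint over the tweets."""
    # Imported lazily so the serializer above works without Bottle installed.
    from bottle import Bottle, response

    app = Bottle()

    @app.get("/tweets/user/<screen_name>")
    def user_tweets(screen_name):
        response.content_type = "application/json"
        cursor = collection.find({"user.screen_name": screen_name})
        return tweets_to_json(cursor)

    return app


# Usage, assuming `tweets` is a pymongo collection:
#   make_app(tweets).run(host="localhost", port=8080)
```

Returning a JSON string and setting the content type explicitly is needed here because Bottle only auto-serializes dicts, not lists.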

Slide 11

Slide 11 text

Finding Most Active Tweeter
•  Calculate tweet count for each user and return tweets for that user
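The slide does not say how the count was computed, so the sketch below shows two possible approaches: an aggregation pipeline that could be passed to `collection.aggregate(...)`, and an equivalent in-memory version. The field `user.screen_name` and both function names are assumptions.

```python
from collections import Counter


def most_active_pipeline():
    """Aggregation stages: count tweets per user and keep the top one."""
    return [
        {"$group": {"_id": "$user.screen_name", "count": {"$sum": 1}}},
        {"$sort": {"count": -1}},
        {"$limit": 1},
    ]


def most_active_tweets(tweets):
    """In-memory equivalent: return (screen_name, that user's tweets)."""
    tweets = list(tweets)
    counts = Counter(t["user"]["screen_name"] for t in tweets)
    if not counts:
        return None, []
    top_user, _ = counts.most_common(1)[0]
    return top_user, [t for t in tweets if t["user"]["screen_name"] == top_user]
```

Once the top user is known, their tweets can also be fetched from MongoDB with a plain filter on `user.screen_name`.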

Slide 12

Slide 12 text

Lessons Learned
•  Use Longitude, Latitude ordering for coordinates
•  Default index value range is exclusive of upper bound
•  Twitter has bugs too
•  Making your own maps isn’t hard (it can take some time)
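The first two lessons can be illustrated with a pair of small helpers. This sketch assumes the default 2d index bounds of -180 (inclusive) to 180 (exclusive), which apply to both values in the pair. The function names are hypothetical.

```python
def to_lon_lat(lat, lon):
    """Reorder a (latitude, longitude) pair into [longitude, latitude]."""
    return [lon, lat]


def in_default_2d_bounds(lon, lat, lower=-180.0, upper=180.0):
    """True if both values fit the default 2d index range [lower, upper)."""
    return all(lower <= v < upper for v in (lon, lat))
```

For example, a point at longitude exactly 180 falls outside the default range, so checking it before inserting avoids an index error.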

Slide 13

Slide 13 text

Building an Interface
•  Dust JavaScript templating library
•  Leaflet JavaScript interactive map library
•  jQuery JavaScript library
•  TileStream map tile server