Data Mini-Project
Nitin Jain
Pavan Kumar
Geoffrey Jacoby
Abhishek Nandakumar
Slide 2
Slide 2 text
No content
Slide 3
Slide 3 text
151.78240 35.45195 68.79268 79.56177
Slide 4
Slide 4 text
p-value ~ 1.7e−18
Slide 5
Slide 5 text
ordinary people
Hollywood stars
Slide 6
Slide 6 text
“I want”
Slide 7
Slide 7 text
San Francisco:
Things for Christmas
Something (describe it)
Someone (to do something)
Ice Cream
Love
Memphis:
Things for Christmas
Money
Pizza
Someone (to do something)
(Some kind of) House
Slide 8
Slide 8 text
words common to SF + Memphis
p-value = 0.127
Difference not statistically significant.
Slide 9
Slide 9 text
all unique words
p-value = 3.17e−07
Difference statistically significant.
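The slides report p-values for the San Francisco vs. Memphis comparison without naming the test. A minimal sketch of one way such a comparison could be run, assuming a permutation test on a chi-square homogeneity statistic over word counts. The function names and the toy token lists are hypothetical, not the project's data or method.

```python
import random
from collections import Counter

def chi_square_stat(counts_a, counts_b):
    """Chi-square homogeneity statistic for two word-count tables."""
    words = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    grand = total_a + total_b
    stat = 0.0
    for w in words:
        row = counts_a.get(w, 0) + counts_b.get(w, 0)
        for observed, col_total in ((counts_a.get(w, 0), total_a),
                                    (counts_b.get(w, 0), total_b)):
            expected = row * col_total / grand
            stat += (observed - expected) ** 2 / expected
    return stat

def permutation_p_value(tokens_a, tokens_b, n_perm=2000, seed=0):
    """Share of label shuffles whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = chi_square_stat(Counter(tokens_a), Counter(tokens_b))
    pooled = list(tokens_a) + list(tokens_b)
    n_a = len(tokens_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = chi_square_stat(Counter(pooled[:n_a]), Counter(pooled[n_a:]))
        if stat >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical toy data: one token per "I want <word>" tweet.
city_a = ["love"] * 30 + ["pizza"] * 5
city_b = ["love"] * 5 + ["pizza"] * 30
p_value = permutation_p_value(city_a, city_b)
```

A permutation test is used here only because it needs no external libraries. A standard chi-square test would be an equally reasonable reading of the slides.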
Slide 10
Slide 10 text
Potential confounds
Wants for Christmas.
Words occurring once.
Context changes meaning:
I seriously think I want to hire someone to brush my hair for me after I get out of the shower. DAMN. So many tangles. #lazy
I want someone to fall in love with the way I laugh and fall in love with my smile.
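To make the "words occurring once" and "context changes meaning" confounds concrete, here is a rough sketch of how the text after "I want" might be tokenized and counted, with single-occurrence words dropped. The regular expressions, function names, and the `min_count` threshold are assumptions for illustration. The slides do not describe the actual preprocessing.

```python
import re
from collections import Counter

# Hypothetical patterns: capture everything after the first "I want".
WANT_RE = re.compile(r"\bI want\b(.*)", re.IGNORECASE | re.DOTALL)
WORD_RE = re.compile(r"[a-z']+")

def wanted_words(tweet):
    """Lower-cased words that follow the first 'I want' in a tweet."""
    match = WANT_RE.search(tweet)
    if match is None:
        return []
    return WORD_RE.findall(match.group(1).lower())

def word_counts(tweets, min_count=2):
    """Count words across tweets, keeping those seen at least min_count times."""
    counts = Counter()
    for tweet in tweets:
        counts.update(wanted_words(tweet))
    return Counter({w: c for w, c in counts.items() if c >= min_count})

# The two example tweets quoted on the slide.
tweets = [
    "I seriously think I want to hire someone to brush my hair for me "
    "after I get out of the shower. DAMN. So many tangles. #lazy",
    "I want someone to fall in love with the way I laugh and fall in love "
    "with my smile.",
]
counts = word_counts(tweets)
```

Both tweets contribute "someone" to the counts even though one is about hiring help and the other about romance. A bag-of-words count cannot separate the two meanings.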
Slide 11
Slide 11 text
Follower Count
Following Count
Listed (count)
Tweet count
URLs in 100 most recent tweets
Gender
118 celebrities
Slide 12
Slide 12 text
No content
Slide 13
Slide 13 text
Outliers
Slide 14
Slide 14 text
Politicians
Barack Obama            40,185,454
Arnold Schwarzenegger    2,925,511
Al Gore                  2,701,171
Senator John McCain      1,849,471
Sarah Palin                998,646
Hillary Clinton            945,905
John Boehner               585,267
Joe Biden                  519,641
Slide 15
Slide 15 text
Authors
Tim O’Reilly       1,736,824
Deepak Chopra      1,694,682
Guy Kawasaki       1,395,960
John C. Maxwell      869,970
Beth Kanter          403,402
Slide 16
Slide 16 text
Sports
Kaka                  17,452,018
Shaquille O'Neal       7,843,305
Lance Armstrong        3,918,367
Rubens Barrichello     2,020,439
Mark Sanchez             832,289
Slide 17
Slide 17 text
No content
Slide 18
Slide 18 text
No content
Slide 19
Slide 19 text
118 → 68 celebrities
Linear model found several variables significant.
Slide 20
Slide 20 text
Adjusted R²: 0.5444
p-values
Number of Following: 0.00892
Number of Tweets: 0.02422
Public List Appearance: 2.39e−10
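As a reminder of what the adjusted R² figure measures, the sketch below applies the standard adjustment for sample size and number of predictors, plus its inverse. The sample size 68 comes from the previous slide. Using 3 predictors is an assumption based on the three p-values shown, since the slides do not give the full model specification.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept excluded)."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def r2_from_adjusted(adj_r2, n, p):
    """Invert the adjustment to recover the plain R^2."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1.0 - (1.0 - adj_r2) * (n - p - 1) / (n - 1)

# Assumed values: n = 68 celebrities, p = 3 predictors (hypothetical).
plain_r2 = r2_from_adjusted(0.5444, n=68, p=3)
```

The adjustment penalizes extra predictors, so the plain R² is always at least as large as the adjusted value.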
Slide 21
Slide 21 text
Issues
Margin of error above 10%
Sampling bias
Missed features:
Time since joining Twitter
Rise in popularity because of recent event
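The slides do not say how the margin of error was computed. One common back-of-the-envelope version, sketched here as an assumption, is the worst-case 95% margin for a proportion, which depends only on sample size. The function name and defaults are illustrative.

```python
import math

def margin_of_error(n, z=1.96, proportion=0.5):
    """Normal-approximation margin of error for a sample proportion.

    proportion=0.5 gives the worst case; z=1.96 corresponds to 95% confidence.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(proportion * (1.0 - proportion) / n)

moe_68 = margin_of_error(68)    # sample left after filtering
moe_118 = margin_of_error(118)  # original sample
```

Under these assumptions the margin is about 0.119 for 68 celebrities and about 0.090 for the original 118, which would be consistent with a margin above 10% after the sample shrank.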