Slide 1

Slide 1 text

Data Mini-Project
Nitin Jain, Pavan Kumar, Geoffrey Jacoby, Abhishek Nandakumar

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

151.78240 35.45195 68.79268 79.56177

Slide 4

Slide 4 text

p-value ~ 1.7e−18

Slide 5

Slide 5 text

Ordinary people vs. Hollywood stars

Slide 6

Slide 6 text

“I want”

Slide 7

Slide 7 text

San Francisco: Things for Christmas; Something (describe it); Someone (to do something); Ice Cream; Love
Memphis: Things for Christmas; Money; Pizza; Someone (to do something); (Some kind of) House
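The deck does not say how the top wants per city were tallied. A minimal sketch, assuming each tweet is plain text and that the "want" is approximated by the first word after "I want" (skipping a few filler words), could count them like this. The pattern `WANT_RE` and the function `top_wants` are hypothetical illustrations, not the project's method; grouping into categories such as "Someone (to do something)" would need manual or rule-based labeling on top of this.

```python
import re
from collections import Counter

# Hypothetical pattern: the first word after "I want", optionally skipping
# one filler word ("a", "an", "the", "some", "to"). Illustrative only.
WANT_RE = re.compile(r"\bi want (?:(?:a|an|the|some|to)\s+)?([a-z']+)", re.IGNORECASE)


def top_wants(tweets, n=5):
    """Return the n most common words following "I want" across the tweets."""
    counts = Counter()
    for text in tweets:
        for match in WANT_RE.finditer(text):
            counts[match.group(1).lower()] += 1
    # Ties keep first-seen order, as Counter.most_common documents.
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical tweets, for illustration only.
    sample = ["I want pizza so bad", "i want a house", "I want pizza", "I want to sleep"]
    print(top_wants(sample, 3))
```

Running the same function separately over each city's tweets yields one ranked list per city.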

Slide 8

Slide 8 text

Words common to SF and Memphis: p-value = 0.127. Difference not statistically significant.

Slide 9

Slide 9 text

All unique words: p-value = 3.17e−07. Difference statistically significant.
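The deck reports p-values for the two word comparisons without naming the test. One plausible choice is a chi-squared test of homogeneity on a 2 × k table of word counts (one row per city, one column per word). The sketch below assumes that test; `chi2_sf` and `chi2_homogeneity` are hypothetical names. The p-value is the chi-squared upper tail, computed in pure Python as the regularized upper incomplete gamma function Q(df/2, x/2) via its standard series and continued-fraction expansions.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """P(X >= x) for X ~ chi-squared(df), i.e. the regularized upper gamma Q(df/2, x/2)."""
    if df < 1:
        raise ValueError("df must be at least 1")
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    prefactor = math.exp(-z + a * math.log(z) - math.lgamma(a))
    if z < a + 1:
        # Series for the lower regularized gamma P(a, z); the tail is 1 - P.
        ap = a
        term = total = 1.0 / a
        for _ in range(10_000):
            ap += 1
            term *= z / ap
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return max(0.0, 1.0 - total * prefactor)
    # Continued fraction for Q(a, z), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = z + 1 - a
    c = 1 / tiny
    d = 1 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        d = d if abs(d) >= tiny else tiny
        c = b + an / c
        c = c if abs(c) >= tiny else tiny
        d = 1 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < 1e-14:
            break
    return prefactor * h


def chi2_homogeneity(counts_a, counts_b, vocabulary):
    """Chi-squared test that two word-count Counters share one distribution over `vocabulary`.

    Returns (statistic, degrees_of_freedom, p_value).
    """
    rows = [[counts_a.get(w, 0) for w in vocabulary],
            [counts_b.get(w, 0) for w in vocabulary]]
    row_totals = [sum(r) for r in rows]
    col_totals = [x + y for x, y in zip(*rows)]
    grand = sum(row_totals)
    if grand == 0:
        raise ValueError("no counts in the table")
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    df = len(vocabulary) - 1  # (2 rows - 1) * (k columns - 1)
    return stat, df, chi2_sf(stat, df)


if __name__ == "__main__":
    # Hypothetical word counts, for illustration only.
    sf = Counter({"christmas": 12, "pizza": 3, "love": 5, "icecream": 4})
    memphis = Counter({"christmas": 15, "pizza": 6, "money": 7, "house": 4})
    common = sorted(set(sf) & set(memphis))       # "words common to SF + Memphis"
    everything = sorted(set(sf) | set(memphis))   # "all unique words"
    print(chi2_homogeneity(sf, memphis, common))
    print(chi2_homogeneity(sf, memphis, everything))
```

Passing the intersection of the two vocabularies mirrors the "common words" comparison, and the union (missing words counted as zero) mirrors "all unique words". Where SciPy is available, `scipy.stats.chi2_contingency` performs the same test, though by default it applies Yates' continuity correction when the table has one degree of freedom.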

Slide 10

Slide 10 text

Potential confounds
- Wants for Christmas
- Words occurring once
- Context changes meaning:
  - “I seriously think I want to hire someone to brush my hair for me after I get out of the shower. DAMN. So many tangles. #lazy”
  - “I want someone to fall in love with the way I laugh and fall in love with my smile.”
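Two of the confounds above, seasonal Christmas wants and words occurring once, can be reduced before testing by filtering the word counts. Rare words matter for a chi-squared comparison in particular, because columns with small expected counts make its approximation unreliable. A minimal sketch, assuming per-city `Counter` objects; `drop_rare_words`, its `min_total` threshold and the `exclude` set are hypothetical choices, not the project's procedure:

```python
from collections import Counter


def drop_rare_words(counts_a, counts_b, min_total=2, exclude=frozenset()):
    """Keep words whose combined count across both cities is at least min_total,
    and drop any word listed in `exclude` (e.g. Christmas-related terms)."""
    combined = counts_a + counts_b
    keep = {w for w, c in combined.items() if c >= min_total and w not in exclude}
    filtered_a = Counter({w: c for w, c in counts_a.items() if w in keep})
    filtered_b = Counter({w: c for w, c in counts_b.items() if w in keep})
    return filtered_a, filtered_b
```

Context-dependent meaning, the third confound, cannot be fixed by counting alone; the two example tweets use "someone" in very different senses.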

Slide 11

Slide 11 text

Features collected (118 celebrities)
- Follower count
- Following count
- Listed count
- Tweet count
- URLs in 100 most recent tweets
- Gender
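Most of these features are single counts on a user profile, but "URLs in 100 most recent tweets" has to be derived from tweet text. A minimal sketch, assuming the tweets arrive as strings ordered newest first and that a URL is anything starting with `http://` or `https://`; the pattern and the function `urls_in_recent_tweets` are hypothetical, and the deck does not say whether it counted URLs or tweets containing URLs (this version counts URLs):

```python
import re

# Hypothetical, deliberately loose URL pattern.
URL_RE = re.compile(r"https?://\S+")


def urls_in_recent_tweets(tweets, limit=100):
    """Count URLs across the `limit` most recent tweets (assumes newest first)."""
    return sum(len(URL_RE.findall(text)) for text in tweets[:limit])
```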

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

Outliers

Slide 14

Slide 14 text

Politicians (follower counts)
Barack Obama: 40,185,454
Arnold Schwarzenegger: 2,925,511
Al Gore: 2,701,171
Senator John McCain: 1,849,471
Sarah Palin: 998,646
Hillary Clinton: 945,905
John Boehner: 585,267
Joe Biden: 519,641

Slide 15

Slide 15 text

Authors (follower counts)
Tim O’Reilly: 1,736,824
Deepak Chopra: 1,694,682
Guy Kawasaki: 1,395,960
John C. Maxwell: 869,970
Beth Kanter: 403,402

Slide 16

Slide 16 text

Sports (follower counts)
Kaká: 17,452,018
Shaquille O’Neal: 7,843,305
Lance Armstrong: 3,918,367
Rubens Barrichello: 2,020,439
Mark Sanchez: 832,289
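The deck lists the outliers by category but does not say what rule flagged them. One common rule is Tukey's fences: flag values below Q1 − 1.5·IQR or above Q3 + 1.5·IQR. The sketch below assumes that rule and uses Python's `statistics.quantiles` (default "exclusive" method) for the quartiles; `iqr_outliers` is a hypothetical name. Follower counts are heavily skewed, so applying the rule to log-transformed counts may be more sensible.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]
```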

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

118 → 68 celebrities
The linear model found the predictors significant.
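The deck does not say what software fitted the linear model. To show what such a fit computes, here is a minimal pure-Python sketch of ordinary least squares with an intercept, solved through the normal equations (XᵀX)β = Xᵀy. It assumes follower count was the response and the other counts were predictors, which the deck implies but does not state. `ols_fit` and `r_squared` are hypothetical names, and the sketch returns coefficients only; the per-variable p-values on the next slide would additionally require coefficient standard errors.

```python
def ols_fit(X, y):
    """Least-squares coefficients [intercept, b1, ..., bp] via the normal equations.

    X is a list of rows (one row of predictor values per observation); y is the response.
    """
    rows = [[1.0] + [float(v) for v in row] for row in X]
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if aug[col][col] == 0:
            raise ValueError("predictors are collinear")
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    beta = [0.0] * k
    for i in reversed(range(k)):
        known = sum(aug[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (aug[i][k] - known) / aug[i][i]
    return beta


def r_squared(X, y, beta):
    """Coefficient of determination of the fitted model on (X, y)."""
    preds = [beta[0] + sum(b * v for b, v in zip(beta[1:], row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot
```

Solving the normal equations directly is adequate for a handful of predictors, but it is numerically weaker than the QR-based solvers that statistics packages typically use, especially with predictors on very different scales such as follower and tweet counts.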

Slide 20

Slide 20 text

Adjusted R²: 0.5444
p-values:
- Following count: 0.00892
- Tweet count: 0.02422
- Public list appearances: 2.39e−10
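Adjusted R² penalizes plain R² for the number of predictors: adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p the number of predictors, not counting the intercept. A direct translation follows; `adjusted_r2` is a hypothetical name.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted in p)."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)
```

For instance, an R² of 0.60 with n = 68 and p = 3 gives 1 − 0.4 × 67/64 = 0.58125. These inputs are illustrative: the deck does not report the unadjusted R² or how many predictors the final model contained.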

Slide 21

Slide 21 text

Issues
- Margin of error above 10%
- Sampling bias
- Missed features:
  - Time since joining Twitter
  - Rise in popularity because of a recent event
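The deck does not say how the margin-of-error figure was derived. For a proportion estimated from n observations, the standard normal-approximation margin at 95% confidence is z·√(p(1 − p)/n) with z = 1.96. The sketch below assumes that formula; `margin_of_error` and `sample_size_for` are hypothetical names. With the worst case p = 0.5 and n = 68 (the final sample size), it gives about 11.9%, which is consistent with "above 10%", although the deck does not state that this is how its figure was obtained.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of a normal-approximation confidence interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)


def sample_size_for(margin, p=0.5, z=1.96):
    """Smallest n whose margin_of_error(n, p, z) does not exceed `margin`."""
    return math.ceil((z / margin) ** 2 * p * (1 - p))
```

Under the same assumptions, `sample_size_for(0.10)` returns 97, so roughly 97 or more celebrities would be needed to bring the worst-case margin to 10% or below.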