Slide 1

Slide 1 text

Understanding PCA Using Stack Overflow Data
Julia Silge
https://juliasilge.com/

Slide 2

Slide 2 text

Hello! I am Julia Silge
Data Scientist, Stack Overflow
@juliasilge

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

Silge, J.D., Gebhardt, K., Bergmann, M., & Richstone, D. 2005, AJ, 130, 406

Slide 5

Slide 5 text

@chrisalbon

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

Data science at Stack Overflow

Slide 8

Slide 8 text

Technology Relationships
I spend lots of time thinking about how technologies are related to each other

Slide 9

Slide 9 text

What kinds of tags do I visit?

Tag         Percent
r           63.1%
regex       12.1%
ggplot2      9.7%
git          6.0%
dataframe    4.2%

Slide 10

Slide 10 text

“Some people, when confronted with a problem, think, "I know, I'll use regular expressions." Now they have two problems.”
Jamie Zawinski

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

@chrisalbon

Slide 16

Slide 16 text

User tag visits

     AccountId  Tag             Value
 1     6461130  sass            0.00244
 2     1044010  tsql            0.00179
 3      405410  qt              0.00156
 4     3224070  http-headers    0.00306
 5    10525200  asp.net-mvc     0.00403
 6     6349580  amazon-s3       0.00123
 7     6114210  cookies         0.00373
 8     7397910  arrays          0.0237
 9    10997890  user-interface  0.00920
10     1721510  fonts           0.00181
11     9553740  sql-server      0.172
12     3249020  frontend        0.00113
13    10361710  concurrency     0.000377
14     2251000  select          0.000527
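The long table above has one row per user and tag, while PCA needs one row per user and one column per tag. The sketch below shows one possible way to make that reshaping concrete. It uses `Matrix::sparseMatrix` rather than the `tidytext::cast_sparse` call that appears in the talk's own code, and the data frame `user_tag_visits` with its account IDs and values is hypothetical toy data, not Stack Overflow data.

```r
library(Matrix)  # recommended package that ships with R

# Hypothetical toy data in the same long shape as the slide's table:
# one row per (user, tag) pair with that tag's share of the user's visits.
user_tag_visits <- data.frame(
  AccountId = c(1, 1, 2, 2, 3),
  Tag       = c("r", "ggplot2", "r", "regex", "git"),
  Value     = c(0.6, 0.4, 0.7, 0.3, 1.0),
  stringsAsFactors = FALSE
)

# Map users and tags to integer row and column indices.
users <- factor(user_tag_visits$AccountId)
tags  <- factor(user_tag_visits$Tag)

# Build a users-by-tags sparse matrix; pairs that never occur are implicit zeros.
sparse_tag_matrix <- sparseMatrix(
  i = as.integer(users),
  j = as.integer(tags),
  x = user_tag_visits$Value,
  dims = c(nlevels(users), nlevels(tags)),
  dimnames = list(levels(users), levels(tags))
)

print(sparse_tag_matrix)
```

Most users visit only a handful of tags, so a sparse representation of this kind stores just the nonzero entries instead of a full users-by-tags grid.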

Slide 17

Slide 17 text

User tag visits

library(dplyr)   # %>% and bind_cols()
library(broom)   # tidy()

# Cast the long (AccountId, Tag, Percent) table to a sparse users-by-tags matrix
sparse_tag_matrix <- user_tag_counts %>%
    tidytext::cast_sparse(AccountId, Tag, Percent)

# Center and scale each tag column
tags_scaled <- scale(sparse_tag_matrix)

# Truncated PCA: compute only the first 64 principal components
tags_pca <- irlba::prcomp_irlba(tags_scaled, n = 64)

# Pair each tag with its loadings on the components
tidied_pca <- bind_cols(Tag = colnames(tags_scaled),
                        tidy(tags_pca$rotation))
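To see what the scale, PCA, and loadings steps produce, here is a minimal sketch on a small dense matrix. It swaps in base R's `prcomp` for `irlba::prcomp_irlba`, which is reasonable only because the toy data is tiny. The tag names, the two latent "interests", and all values are assumptions made up for illustration.

```r
set.seed(42)

# Hypothetical toy data: 200 users and 4 tags, driven by two latent interests.
n_users <- 200
stats_interest <- runif(n_users)
web_interest   <- runif(n_users)

tag_matrix <- cbind(
  r          = stats_interest + rnorm(n_users, sd = 0.05),
  ggplot2    = stats_interest + rnorm(n_users, sd = 0.05),
  javascript = web_interest   + rnorm(n_users, sd = 0.05),
  css        = web_interest   + rnorm(n_users, sd = 0.05)
)

# Center and scale each tag column, then run a full PCA.
tags_scaled <- scale(tag_matrix)
tags_pca    <- prcomp(tags_scaled)

# Loadings: one row per tag, one column per principal component.
loadings <- data.frame(Tag = rownames(tags_pca$rotation), tags_pca$rotation)
print(loadings)

# Share of total variance explained by each component.
variance_explained <- tags_pca$sdev^2 / sum(tags_pca$sdev^2)
print(round(variance_explained, 3))
```

In this toy setup, tags that share a latent interest load together on the same components, which is the kind of technology relationship the talk looks for in real tag traffic. On a matrix with many thousands of tags, a truncated method such as `prcomp_irlba` avoids computing every component.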

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

Using R at Stack Overflow

Slide 21

Slide 21 text

Thanks!
Find me at @juliasilge and https://juliasilge.com/
Thanks to Nick Larsen, Kevin Montrose, Jason Punyon, and David Robinson