Understanding Principal Component Analysis Using Stack Overflow Data

Julia Silge
February 03, 2018

February 2018 talk at rstudio::conf

Transcript

  1. Technology Relationships: I spend lots of time thinking about how technologies are related to each other.
  2. What kinds of tags do I visit?

    Tag         Percent
    r           63.1%
    regex       12.1%
    ggplot2      9.7%
    git          6.0%
    dataframe    4.2%
  3. Some people, when confronted with a problem, think, "I know, I'll use regular expressions." Now they have two problems. (Jamie Zawinski)
  4. User tag visits

       AccountId Tag                Value
           <int> <chr>              <dbl>
     1   6461130 sass            0.00244
     2   1044010 tsql            0.00179
     3    405410 qt              0.00156
     4   3224070 http-headers    0.00306
     5  10525200 asp.net-mvc     0.00403
     6   6349580 amazon-s3       0.00123
     7   6114210 cookies         0.00373
     8   7397910 arrays          0.0237
     9  10997890 user-interface  0.00920
    10   1721510 fonts           0.00181
    11   9553740 sql-server      0.172
    12   3249020 frontend        0.00113
    13  10361710 concurrency     0.000377
    14   2251000 select          0.000527
  5. User tag visits

    sparse_tag_matrix <- user_tag_counts %>%
        tidytext::cast_sparse(AccountId, Tag, Percent)

    tags_scaled <- scale(sparse_tag_matrix)

    tags_pca <- irlba::prcomp_irlba(tags_scaled, n = 64)

    tidied_pca <- bind_cols(Tag = colnames(tags_scaled),
                            tidy(tags_pca$rotation))
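
The pipeline on the last slide (cast to a users-by-tags matrix, scale, run PCA, pull out the loadings per tag) can be tried on a tiny example. What follows is a minimal sketch under stated assumptions, not the code from the talk: the `toy_visits` data are invented for illustration, and base R's `tapply()` and `stats::prcomp()` stand in for `tidytext::cast_sparse()` and `irlba::prcomp_irlba()` so that it runs without extra packages.

```r
# Toy long-format data: one row per (user, tag) pair.
# The account IDs and percentages are invented for illustration only.
toy_visits <- data.frame(
  AccountId = rep(1:6, each = 3),
  Tag       = rep(c("r", "ggplot2", "sql-server"), times = 6),
  Percent   = c(0.60, 0.30, 0.10,
                0.55, 0.35, 0.10,
                0.50, 0.40, 0.10,
                0.10, 0.05, 0.85,
                0.15, 0.05, 0.80,
                0.05, 0.10, 0.85)
)

# Cast to a dense users-by-tags matrix (the slide builds a sparse one).
toy_matrix <- tapply(toy_visits$Percent,
                     toy_visits[c("AccountId", "Tag")],
                     sum)

# Center and scale each tag column, then run PCA.
# prcomp() is an assumption here, swapped in for irlba::prcomp_irlba().
toy_scaled <- scale(toy_matrix)
toy_pca    <- prcomp(toy_scaled)

# One row per tag, one column per principal component.
loadings <- data.frame(Tag = rownames(toy_pca$rotation),
                       toy_pca$rotation,
                       row.names = NULL)

# Share of total variance captured by each component.
variance_explained <- toy_pca$sdev^2 / sum(toy_pca$sdev^2)

print(loadings)
print(round(variance_explained, 3))
```

In this toy data, users who visit `r` also tend to visit `ggplot2` and rarely visit `sql-server`, so the first component should give `r` and `ggplot2` loadings of the same sign and `sql-server` the opposite sign. On real data with many thousands of tags, a truncated method such as `prcomp_irlba()` is the more practical choice because it computes only the requested number of components.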