Understanding Principal Component Analysis Using Stack Overflow Data

Julia Silge
February 03, 2018

February 2018 talk at rstudio::conf

Transcript

  1. Understanding PCA Using Stack Overflow Data Julia Silge https://juliasilge.com/

  2. Hello! I am Julia Silge, Data Scientist, Stack Overflow (@juliasilge)

  3. None
  4. Silge, J.D., Gebhardt, K., Bergmann, M., & Richstone, D. 2005, AJ, 130, 406
  5. @chrisalbon

  6. None
  7. Data science at Stack Overflow

  8. Technology Relationships

    I spend lots of time thinking about how technologies are related to each other.
  9. What kinds of tags do I visit?

    Tag        Percent
    r          63.1%
    regex      12.1%
    ggplot2     9.7%
    git         6.0%
    dataframe   4.2%
  10. Some people, when confronted with a problem, think, "I know, I'll use regular expressions." Now they have two problems. (Jamie Zawinski)
  11. None
  12. None
  13. None
  14. None
  15. @chrisalbon

  16. User tag visits

        AccountId  Tag             Value
        <int>      <chr>           <dbl>
         6461130   sass            0.00244
         1044010   tsql            0.00179
          405410   qt              0.00156
         3224070   http-headers    0.00306
        10525200   asp.net-mvc     0.00403
         6349580   amazon-s3       0.00123
         6114210   cookies         0.00373
         7397910   arrays          0.0237
        10997890   user-interface  0.00920
         1721510   fonts           0.00181
         9553740   sql-server      0.172
         3249020   frontend        0.00113
        10361710   concurrency     0.000377
         2251000   select          0.000527
  17. User tag visits

        sparse_tag_matrix <- user_tag_counts %>%
          tidytext::cast_sparse(AccountId, Tag, Percent)

        tags_scaled <- scale(sparse_tag_matrix)

        tags_pca <- irlba::prcomp_irlba(tags_scaled, n = 64)

        tidied_pca <- bind_cols(Tag = colnames(tags_scaled),
                                tidy(tags_pca$rotation))
  18. None
  19. None
  20. Using R at Stack Overflow

  21. Thanks! Find me at @juliasilge and https://juliasilge.com/

    Thanks to Nick Larsen, Kevin Montrose, Jason Punyon, and David Robinson
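
The pipeline on slide 17 (cast the long user/tag table to a user-by-tag matrix, scale it, run PCA, tidy the loadings) can be tried on a toy data set. What follows is a minimal sketch in base R, not the code from the talk: `xtabs()` stands in for `tidytext::cast_sparse()`, `prcomp()` stands in for `irlba::prcomp_irlba()`, and the four users and their percentages are invented purely for illustration.

```r
# Hypothetical stand-in for user_tag_counts: one row per (user, tag) pair with
# the share of that user's traffic going to the tag. All values are made up.
user_tag_counts <- data.frame(
  AccountId = c(1, 1, 2, 2, 3, 3, 4, 4),
  Tag       = c("r", "ggplot2", "r", "ggplot2",
                "javascript", "css", "javascript", "css"),
  Percent   = c(0.6, 0.4, 0.7, 0.3, 0.5, 0.5, 0.8, 0.2)
)

# Cast long data to a dense users x tags matrix; missing pairs become 0.
# (The slide uses a sparse matrix, which matters at Stack Overflow scale.)
tab <- xtabs(Percent ~ AccountId + Tag, data = user_tag_counts)
tag_matrix <- matrix(tab, nrow = nrow(tab), dimnames = dimnames(tab))

# Center each tag column and scale it to unit variance.
tags_scaled <- scale(tag_matrix)

# Full PCA with base R; irlba::prcomp_irlba computes only the first n components.
tags_pca <- prcomp(tags_scaled)

# One row per tag, one column per principal component (the loadings).
tidied_pca <- data.frame(
  Tag = colnames(tags_scaled),
  tags_pca$rotation,
  row.names = NULL
)

print(tidied_pca)
```

On this toy input the first component should separate the r/ggplot2 pair from the javascript/css pair, because the two pairs are visited by disjoint groups of users. The sign of a component is arbitrary, so only the contrast between the groups is meaningful.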