Identifying Experts in Software Libraries and Frameworks among GitHub Users (MSR 2019)

Software development increasingly depends on libraries and frameworks to increase productivity and reduce time-to-market. Despite this, we still lack techniques to assess developers' expertise in widely popular libraries and frameworks. In this paper, we evaluate the performance of unsupervised (clustering-based) and supervised machine learning classifiers (Random Forest and SVM) in identifying experts in three popular JavaScript libraries: facebook/react, mongodb/node-mongodb, and socketio/socket.io. First, we collect 13 features about developers' activity on GitHub projects, including commits on source code files that depend on these libraries. We also build a ground truth with the expertise of 575 developers on the studied libraries, as self-reported by them in a survey. Based on our findings, we document the challenges of using machine learning classifiers to predict expertise in software libraries from features extracted from GitHub. Then, we propose a method to identify library experts based on clustering feature data from GitHub; by triangulating the results of this method with information available on LinkedIn profiles, we show that it is able to recommend dozens of GitHub users with evidence of being experts in the studied JavaScript libraries. We also provide a public dataset with the expertise of 575 developers on the studied libraries.
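The clustering step mentioned in the abstract can be illustrated with a small sketch. The study setup slide names k-means with 3 and 5 clusters; the code below is a minimal pure-Python version of Lloyd's k-means algorithm, not the authors' pipeline. The feature values, the choice of two features (commits and projectsImport), and the hand-picked initial centroids are all hypothetical and serve only to make the example deterministic.

```python
from math import dist
from statistics import mean


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and update steps until stable."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                updated.append(tuple(mean(axis) for axis in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Hypothetical toy data: (commits, projectsImport) for eight developers.
developers = [(2, 1), (3, 1), (4, 1), (40, 3), (45, 4), (50, 3), (200, 9), (220, 10)]

# Hand-picked initial centroids (an assumption, chosen for reproducibility).
labels, centroids = kmeans(developers, centroids=[(2, 1), (40, 3), (200, 9)])
print(labels)
print(centroids)
```

In practice the features would likely need scaling before clustering, since raw counts such as commits can dominate the distance; that step is omitted here for brevity.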

ASERG, DCC, UFMG

May 27, 2019

Transcript

  1. Identifying Experts in Software Libraries and Frameworks among GitHub Users

    MSR 2019
    João Eduardo Montandon, Federal University of Minas Gerais, joao.montandon@dcc.ufmg.br
    Luciana Lourdes Silva, Federal Institute of Minas Gerais, luciana.lourdes.silva@ifmg.edu.br
    Marco Tulio Valente, Federal University of Minas Gerais, mtov@dcc.ufmg.br
  2. Software development increasingly depends on third-party components

  3. Job offers require knowledge of libraries/frameworks*

    * Stack Overflow Jobs on July 2nd, 2018.
  4. 4

  5. The problem with checking commit logs... Fine-grained information!!

    • Significant manual effort
    • Specialist is required
    • Difficult to scale
  6. Our proposal: ML & Clustering (ReactJS: Novice, Intermediate, Expert)

  7. "Could you please rank your expertise on [target library] in a scale from 1 (novice) to 5 (expert)?"

    Library     Candidates   Answers
    ReactJS     2,185        418 (19%)
    MongoDB     454          68 (15%)
    socket.io   608          89 (15%)
  8. Selected Features

    Volume (amount of changes): codeChurn, commits, imports
    Frequency (time interval between changes): daysSinceFirstImport, daysSinceLastImport, avgDaysCommitsImportLibrary
    Breadth (usage in different projects): projects, projectsImport
  9. RQ.1: How accurate are ML classifiers in identifying library experts?

    RQ.2: Which features best distinguish library experts?
  10. Study Setup

    Ground truth
    RQ.1: SMOTE, classification with 3 & 5 classes
    RQ.2: k selection, k-means with 3 & 5 clusters
  11. RQ.1: How accurate are ML classifiers in identifying library experts?
  12. ReactJS

    Metric   Random Forest   SVM    Baseline
    Kappa    0.09            0.03   0.00
    AUC      0.56            0.51   0.50
    F1       0.36            0.29   0.25
  13. ReactJS

    Metric   Random Forest   SVM    Baseline
    Kappa    0.09            0.03   0.00
    AUC      0.56            0.51   0.50
    F1       0.36            0.29   0.25

    TL;DR: ML did not perform well
  14. RQ.2: Which features best distinguish library experts?

  15. Experts Cluster

    Library     % Novices   % Intermediate   % Experts
    ReactJS     3           23               74
    MongoDB     12          24               65
    socket.io   0           25               75
  18. Experts Cluster (proportion of developers)

    Library     Novices   Intermediate   Experts
    ReactJS     0.03      0.23           0.74
    MongoDB     0.12      0.24           0.65
    socket.io   0.00      0.25           0.75

    TL;DR: We found clusters with experts
  19. LinkedIn Triangulation

    We analyzed the LinkedIn profiles of developers who did not answer our survey
    ... refer to ReactJS
  20. Takeaways

    ML classifiers did not perform well in identifying expert developers...
    ... But we successfully distinguished clusters with experts
  21. 21

  22. Thank You!!

    João Eduardo Montandon, Luciana Lourdes Silva, Marco Tulio Valente
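
The Kappa values reported on the RQ.1 slides (0.09, 0.03, and 0.00) are easier to interpret with the definition at hand: Cohen's kappa compares observed agreement with the agreement expected by chance, so a score near zero means the classifier is doing little better than guessing. The sketch below is a minimal pure-Python implementation for illustration only. The labels are hypothetical, and the always-predict-the-majority classifier is an assumption used to show how a kappa of zero arises; the slides do not say how the baseline was built.

```python
from collections import Counter


def cohen_kappa(actual, predicted):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    n = len(actual)
    # Observed agreement: fraction of items where both label sequences match.
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    # Chance agreement: product of the marginal label frequencies, summed per class.
    actual_freq = Counter(actual)
    predicted_freq = Counter(predicted)
    expected = sum(actual_freq[c] * predicted_freq[c] for c in actual_freq) / (n * n)
    if expected == 1:
        return 0.0  # degenerate case: both sequences use a single identical label
    return (observed - expected) / (1 - expected)


# Hypothetical self-reported expertise for eight developers.
actual = ["novice", "novice", "intermediate", "intermediate",
          "expert", "expert", "expert", "expert"]

# Hypothetical classifier that always predicts the majority class.
majority = ["expert"] * len(actual)

print(cohen_kappa(actual, majority))  # accuracy is 0.5, yet kappa is 0.0
print(cohen_kappa(actual, actual))    # perfect agreement
```

The first call shows why accuracy alone can mislead on imbalanced classes: half the predictions are correct, but all of that agreement is attributable to chance.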