Identifying Experts in Software Libraries and Frameworks among GitHub Users (MSR 2019)

Software development increasingly depends on libraries and frameworks to increase productivity and reduce time-to-market. Despite this fact, we still lack techniques to assess developers' expertise in widely popular libraries and frameworks. In this paper, we evaluate the performance of unsupervised (based on clustering) and supervised machine learning classifiers (Random Forest and SVM) to identify experts in three popular JavaScript libraries: facebook/react, mongodb/node-mongodb, and socketio/socket.io. First, we collect 13 features about developers' activity on GitHub projects, including commits on source code files that depend on these libraries. We also build a ground truth including the expertise of 575 developers on the studied libraries, as self-reported by them in a survey. Based on our findings, we document the challenges of using machine learning classifiers to predict expertise in software libraries, using features extracted from GitHub. Then, we propose a method to identify library experts based on clustering feature data from GitHub; by triangulating the results of this method with information available on LinkedIn profiles, we show that it is able to recommend dozens of GitHub users with evidence of being experts in the studied JavaScript libraries. We also provide a public dataset with the expertise of 575 developers on the studied libraries.

ASERG, DCC, UFMG

May 27, 2019

Transcript

  1. 1.

    Identifying Experts in Software Libraries and Frameworks among GitHub Users

    João Eduardo Montandon, Federal University of Minas Gerais, joao.montandon@dcc.ufmg.br
    Luciana Lourdes Silva, Federal Institute of Minas Gerais, luciana.lourdes.silva@ifmg.edu.br
    Marco Tulio Valente, Federal University of Minas Gerais, mtov@dcc.ufmg.br
    MSR 2019
  3. 5.

    The problem with checking commit logs...

    Fine-grained information!!
    • Significant manual effort
    • Specialist is required
    • Difficult to scale
  4. 7.

    "Could you please rank your expertise on [target library] in a scale from 1 (novice) to 5 (expert)?"

    Library    Candidates  Answers
    ReactJS    2,185       418 (19%)
    MongoDB    454         68 (15%)
    socket.io  608         89 (15%)
  5. 8.

    Selected Features

    Volume (amount of changes): codeChurn, commits, imports
    Frequency (time interval between changes): daysSinceFirstImport, daysSinceLastImport, avgDaysCommitsImportLibrary
    Breadth (usage in different projects): projects, projectsImport
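The eight feature names on this slide can be illustrated with a small sketch. The `Commit` record and the exact definition of each feature below are assumptions made for illustration (the slide only gives names and groups), so this should be read as one plausible reading, not as the authors' extraction code.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Commit:
    """Hypothetical record of one commit by a developer."""
    project: str
    day: date
    lines_changed: int       # lines added plus lines deleted
    imports_library: bool    # True if the commit adds an import of the target library


def extract_features(commits: List[Commit], today: date) -> Dict[str, Optional[float]]:
    """Compute the slide's features under assumed definitions."""
    # Days on which the developer committed an import of the library, oldest first.
    lib_days = sorted(c.day for c in commits if c.imports_library)
    # Gaps (in days) between consecutive importing commits.
    gaps = [(b - a).days for a, b in zip(lib_days, lib_days[1:])]
    return {
        # Volume
        "codeChurn": sum(c.lines_changed for c in commits),
        "commits": len(commits),
        "imports": len(lib_days),
        # Frequency (None when there are not enough importing commits)
        "daysSinceFirstImport": (today - lib_days[0]).days if lib_days else None,
        "daysSinceLastImport": (today - lib_days[-1]).days if lib_days else None,
        "avgDaysCommitsImportLibrary": sum(gaps) / len(gaps) if gaps else None,
        # Breadth
        "projects": len({c.project for c in commits}),
        "projectsImport": len({c.project for c in commits if c.imports_library}),
    }
```

Returning `None` for undefined frequency features is a design choice of this sketch; a real pipeline would need to decide how to impute or drop such rows before training.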
  6. 9.

    RQ.1: How accurate are ML classifiers in identifying library experts?
    RQ.2: Which features best distinguish library experts?
  7. 10.

    Study Setup

    Ground-Truth
    RQ.1: SMOTE; 3 & 5 classes; classification
    RQ.2: k selection; 3 & 5 clusters; k-means
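To make the k-means step of the setup concrete, here is a minimal pure-Python sketch of the standard Lloyd iteration. It is not the study's code: the function name, the random initialization, and the fixed seed are assumptions, and in practice features on different scales (for example codeChurn versus projects) would normally be normalized first.

```python
import math
import random


def kmeans(points, k, iterations=50, seed=0):
    """Cluster tuples of numbers into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Assumed initialization: pick k distinct input points as starting centroids.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # converged: assignments can no longer change
        centroids = updated
    return labels, centroids
```

Running it with `k=3` and `k=5` on developer feature vectors would mirror the "3 & 5 clusters" setting named on the slide.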
  8. 12.

    ReactJS

    Metric  Random Forest  SVM   Baseline
    Kappa   0.09           0.03  0.00
    AUC     0.56           0.51  0.50
    F1      0.36           0.29  0.25
  9. 13.

    ReactJS

    Metric  Random Forest  SVM   Baseline
    Kappa   0.09           0.03  0.00
    AUC     0.56           0.51  0.50
    F1      0.36           0.29  0.25

    TL;DR: ML did not perform well
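For readers unfamiliar with the Kappa row, Cohen's kappa measures agreement between predicted and true labels after discounting the agreement expected by chance. The sketch below implements the standard formula in plain Python; the function name and the handling of the degenerate case are choices of this sketch, and it is not claimed to be the tooling used in the study.

```python
from collections import Counter


def cohen_kappa(truth, predicted):
    """Cohen's kappa: (observed - expected) / (1 - expected)."""
    n = len(truth)
    # Observed agreement: fraction of items where both labelings match.
    observed = sum(t == p for t, p in zip(truth, predicted)) / n
    truth_counts = Counter(truth)
    pred_counts = Counter(predicted)
    # Chance agreement: probability that two independent labelers with
    # these label frequencies pick the same class.
    expected = sum(truth_counts[c] * pred_counts[c] for c in truth_counts) / (n * n)
    if expected == 1.0:
        return 0.0  # kappa is undefined here; 0.0 is this sketch's convention
    return (observed - expected) / (1 - expected)
```

A predictor that always outputs the same class agrees with the truth only by chance, so its kappa is 0; values such as 0.09 and 0.03 are therefore only slightly better than chance-level agreement.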
  10. 15.

    Experts Cluster

    Library    % Novices  % Intermediate  % Experts
    ReactJS    3          23              74
    MongoDB    12         24              65
    socket.io  0          25              75
  13. 18.

    Experts Cluster

    Library    Novices  Intermediate  Experts
    ReactJS    0.03     0.23          0.74
    MongoDB    0.12     0.24          0.65
    socket.io  0.00     0.25          0.75

    TL;DR: We found clusters with experts
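A table like the one above can be produced by cross-tabulating cluster assignments against self-reported survey scores. The sketch below shows one way to do this. The binning of the 1 to 5 scale into novice (1 to 2), intermediate (3), and expert (4 to 5) is an assumption of this sketch, as are the function names.

```python
from collections import Counter

LEVELS = ("novice", "intermediate", "expert")


def expertise_level(score):
    """Map a 1-5 self-reported score to a level (assumed binning)."""
    if score <= 2:
        return "novice"
    if score == 3:
        return "intermediate"
    return "expert"


def cluster_composition(cluster_ids, scores):
    """For each cluster, return the fraction of members at each level."""
    per_cluster = {}
    for cid, score in zip(cluster_ids, scores):
        per_cluster.setdefault(cid, Counter())[expertise_level(score)] += 1
    return {
        cid: {level: counts[level] / sum(counts.values()) for level in LEVELS}
        for cid, counts in per_cluster.items()
    }
```

A cluster whose expert fraction is clearly dominant would be a candidate "experts cluster" in the sense used on the slide.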
  14. 19.

    LinkedIn Triangulation

    We analyzed LinkedIn profiles of developers who did not answer our survey

    ... refer to ReactJS
  15. 20.

    Takeaways

    ML classifiers did not perform well in identifying expert developers...
    ... But we successfully distinguished clusters with experts