An exploration of the pull-based software development model

Talk at the main track of ICSE 2014

Georgios Gousios

June 04, 2014

Transcript

  1. An exploration of the pull-based software development model
    Georgios Gousios, Martin Pinzger, Arie van Deursen
    TU Delft, Netherlands
    Twitter: @gousiosg
  2. A new way of Distributed Software Development
    Pull requests:
    • Keep everything in one place
    • Facilitate tool integration and automation
    • Provide contextual information
  3. Research questions
    1. How widespread is pull-based development?
    2. What does the life cycle of a pull request look like?
    3. What factors affect the
      • decision to merge?
      • time it takes to decide on a merge?
    4. Why are some pull requests not merged?
  4. 1. Do repositories use PRs?
    • 68%: single-developer repositories
    • Of the multi-developer repositories:
      • 45% use the pull request model
      • 55% use the shared repository model
  5. 1. Popularity: per project
    • median = 2
    • 95% percentile = 21
    • Many projects with > 10,000
      • Ruby on Rails
      • Homebrew
    [Figure: histogram. x-axis: number of pull requests (log); y-axis: number of projects]
  6. Pull request sample
    Sample projects with:
    • > 200 pull requests
    • Test suite
    • Ruby, Python, Java, Scala
    • At least one commit from a non-core member
    • Frameworks / applications (not doc)
    297 projects, 168,000 PRs
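The selection criteria on slide 6 can be read as a simple filter over project metadata. The sketch below is only an illustration of that reading: the `Project` record, its field names, and the example projects are all hypothetical, and it is not the tooling used in the study.

```python
from dataclasses import dataclass

# Languages named on the slide.
SAMPLE_LANGUAGES = {"Ruby", "Python", "Java", "Scala"}


@dataclass
class Project:
    # All field names are hypothetical; the slide does not describe a schema.
    name: str
    language: str
    pull_requests: int
    has_test_suite: bool
    non_core_commits: int
    is_documentation: bool


def in_sample(p: Project) -> bool:
    """Return True if the project meets every criterion listed on the slide."""
    return (
        p.pull_requests > 200
        and p.has_test_suite
        and p.language in SAMPLE_LANGUAGES
        and p.non_core_commits >= 1
        and not p.is_documentation
    )


# Made-up projects, for illustration only.
projects = [
    Project("framework-a", "Ruby", 850, True, 12, False),
    Project("docs-b", "Python", 400, True, 5, True),   # documentation only
    Project("tiny-c", "Java", 150, True, 3, False),    # too few pull requests
    Project("app-d", "Scala", 201, True, 1, False),
]
selected = [p.name for p in projects if in_sample(p)]
print(selected)
```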
  7. 2. Lifecycle: Merges
    Overall: ~84% of pull requests merged.
    • ~70% through GitHub web UI
    Alternative approaches:
    • Local git merge, then push
    • Cherry-picking
    • Squash / rebase
    • Apply patch locally, then push
    Developed heuristics to detect alternative merges
    [Figure: pie chart. 84% merged, 16% not merged]
  8. 2. Lifecycle: Size metrics
    Metric            median   80%
    # Commits         1        3
    # Files           2        7
    # Lines changed   20       168
    [Figure: histogram. x-axis: number of files changed by the pull request (log); y-axis: number of pull requests]
    [Figure: histogram. x-axis: lines of code changed in pull request (log); y-axis: number of pull requests]
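As a rough illustration of how the two columns on slide 8 could be computed, the sketch below takes the median and a nearest-rank percentile of a list of per-pull-request sizes. Reading the "80%" column as the 80th percentile is an assumption, the nearest-rank method is one of several percentile definitions, and the `lines_changed` values are invented for the example rather than taken from the study.

```python
from statistics import median


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    ordered = sorted(values)
    # Ceiling of pct * n / 100 using integer arithmetic only.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


# Invented sizes (lines changed per pull request), for illustration only.
lines_changed = [3, 8, 12, 20, 20, 20, 40, 168, 900, 5000]

print("median:", median(lines_changed))
print("80th percentile:", percentile_nearest_rank(lines_changed, 80))
```

The gap between the two numbers in the example mirrors the long tail visible in the histograms: a few very large pull requests barely move the median.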
  9. 2. Lifecycle: Discussions
    • Brief
      • # comments: 80% < 4
      • # participants: 80% < 3
    • # comments: weak correlation
      • with time to merge
      • with time to close
  10. 2. Lifecycle: Code Review
    • 12% of pull requests have code comments
    • Review does not affect the probability to merge
    • Slows down acceptance by an order of magnitude
      • median: 5h vs 50h
    • Usually, mostly on large projects
  11. 3. Factors affecting merge decision / time
    • Determine factors
    • Determine importance
    [Figure: pie chart "Decision to merge". 84% merged, 16% not merged]
    [Figure: pie chart "Time to merge". Shares: 34%, 35%, 31%; legend: Hour, Day, > Day]
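The "Time to merge" chart on slide 11 groups pull requests into three buckets. A minimal sketch of such a bucketing follows; the function name, the inclusive boundaries at exactly one hour and one day, and the example timestamps are assumptions, since the slide only names the buckets.

```python
from datetime import datetime, timedelta


def merge_time_bucket(opened: datetime, merged: datetime) -> str:
    """Classify a merged pull request by how long the merge took."""
    elapsed = merged - opened
    if elapsed <= timedelta(hours=1):   # boundary choice is an assumption
        return "Hour"
    if elapsed <= timedelta(days=1):    # boundary choice is an assumption
        return "Day"
    return "> Day"


# Invented timestamps, for illustration only.
opened = datetime(2014, 6, 4, 9, 0)
delays = (timedelta(minutes=20), timedelta(hours=5), timedelta(hours=50))
buckets = [merge_time_bucket(opened, opened + d) for d in delays]
print(buckets)
```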
  12. 3. Determining factors
    • Pull request factors: size, hotness
    • Project factors: test coverage, size, openness
    • Developer factors: reputation, track record
    • 40 features
  13. 3. Determining importance
    • Used 3 well-known algorithms with no tuning
      • Random forests
      • Logistic regression
      • Naive Bayes
    • Pick the one that performs best
    • Calculate relative factor importance for prediction
  14. 3. Factors influencing merge decision
    Hotness of project area
    Project size
    Project test coverage
    Changeset size
    ...
  15. 3. Factors influencing merge time
    Developer track record
    Project size
    Project test coverage
    Project openness
    ...
  16. 4. Why are some pull requests rejected?
    [Figure: pie chart. Shares as extracted: 15%, 19%, 2%, 13%, 9%, 16%, 27%]
    Categories: Done by others better; Project heading elsewhere; Incorrect process; Incorrect implementation; Tests failed; PR was merged; We have no idea
  17. 4. Why are some pull requests rejected?
    [Figure: same pie chart as slide 16, with the callout "Task articulation"]
  18. 4. Why are some pull requests rejected?
    [Figure: same pie chart as slide 16, with the callout "Submitter’s fault"]
  19. Recommendations
    • Contributors
      • Keep it short; keep it hot
      • Figure out what others are doing
      • Make thyself known to the project community
    • Core team
      • Invest in test suite
      • Clarify project direction
  20. Research
    • Understanding pull request process at the project level
    • Developer’s information needs
    • Recommendation tools
    • Quality analytics