An exploration of the pull-based software development model

Talk at the main track of ICSE 2014

Georgios Gousios

June 04, 2014

Transcript

  1. An exploration of the pull-based software development model
    Georgios Gousios, Martin Pinzger, Arie van Deursen
    TU Delft, Netherlands
    Twitter: @gousiosg
  2. None
  3.    fork-edit pull request merge

  4. None
  5. None
  6. None
  7. None
  8. None
  9. None
  10. A new way of Distributed Software Development
    Pull requests:
    • Keep everything in one place
    • Facilitate tool integration and automation
    • Provide contextual information
  11. Research questions
    1. How widespread is pull-based development?
    2. What does the life cycle of a pull request look like?
    3. What factors affect the
      • decision to merge?
      • time it takes to decide on a merge?
    4. Why are some pull requests not merged?
  12. GHTorrent.org: (almost) all data from the GitHub REST API since Feb 2012, in raw and queryable form
  13. 1. Do repositories use PRs?
    • 68%: single-developer repositories
    • Of the multi-developer repositories:
      • 45% use the pull request model
      • 55% use the shared repository model
  14. 1. Popularity: per project
    • median = 2
    • 95th percentile = 21
    • Many projects with > 10,000
      • Ruby on Rails
      • Homebrew
    [Histogram; axes: Number of pull requests (log), Number of projects]
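The per-project figures on slide 14 are a median and a 95th percentile over pull request counts. A minimal sketch of how such summary statistics can be computed, assuming a plain list of counts; the numbers below are made up for illustration and are not GHTorrent data:

```python
import statistics


def summarize(pr_counts):
    """Return (median, 95th percentile) of pull request counts per project."""
    median = statistics.median(pr_counts)
    # quantiles(n=100) returns the 99 cut points between percentiles;
    # index 94 is the 95th percentile.
    p95 = statistics.quantiles(pr_counts, n=100, method="inclusive")[94]
    return median, p95


# Hypothetical counts for ten projects: a single very active project
# pulls the 95th percentile far above the median.
counts = [1, 1, 2, 2, 2, 3, 4, 8, 21, 300]
median, p95 = summarize(counts)
```

With a long-tailed distribution like this one, the median stays small while the upper percentile is dominated by a few very active projects, which is why the slide reports both.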
  15. Pull request sample
    Sample projects with:
    • > 200 pull requests
    • Test suite
    • Ruby, Python, Java, Scala
    • At least one commit from a non-core member
    • Frameworks / applications (not doc)
    297 projects, 168,000 PRs
  16. 2. Lifecycle: Merges
    Overall: ~84% of pull requests merged.
    • ~70% through GitHub web UI
    Alternative approaches:
    • Local git merge, then push
    • Cherry-picking
    • Squash / rebase
    • Apply patch locally, then push
    Developed heuristics to detect alternative merges
    [Pie chart: 84% merged, 16% not merged]
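Slide 16 mentions heuristics for detecting merges made outside the GitHub web UI but does not spell them out. The sketch below shows what two such heuristics might look like. The rules, the keyword list, and the function name `merged_outside_ui` are assumptions for illustration, not the heuristics used in the study:

```python
import re

# Hypothetical pattern: a commit message that closes or merges a pull
# request, e.g. "fixes #12" or "closes GH-12". The keyword list is an assumption.
CLOSE_PATTERN = re.compile(
    r"\b(?:merge[sd]?|close[sd]?|fix(?:e[sd])?)\s+(?:gh-|#)(\d+)",
    re.IGNORECASE,
)


def merged_outside_ui(pr_number, pr_commit_shas, main_branch_commits):
    """Guess whether a PR not merged through the web UI was merged anyway.

    main_branch_commits maps commit SHA -> commit message for the target branch.
    """
    # Heuristic 1: a PR commit shows up unchanged in the main branch
    # (a local git merge followed by a push keeps the SHAs).
    if any(sha in main_branch_commits for sha in pr_commit_shas):
        return True
    # Heuristic 2: a main-branch commit message refers to the PR number
    # (covers squashed, rebased, or cherry-picked changes whose SHAs differ).
    for message in main_branch_commits.values():
        if any(int(n) == pr_number for n in CLOSE_PATTERN.findall(message)):
            return True
    return False


# Hypothetical main-branch history.
main = {
    "aaa111": "Add feature",
    "bbb222": "Refactor parser, closes #42",
    "ccc333": "Fix typo",
}
```

Here `merged_outside_ui(42, ["zzz999"], main)` matches on the commit message alone, while a pull request whose commits and number never appear in the history is reported as not merged.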
  17. 2. Lifecycle: Time to merge
    [Chart: shares 34%, 35%, 31%; categories Hour, Day, > Day]
  18. 2. Lifecycle: Size metrics
                        median   80%
    # Commits              1       3
    # Files                2       7
    # Lines changed       20     168
    [Histograms: Number of files changed by the pull request (log) and Lines of code changed in pull request (log), each against Number of pull requests]
  19. 2. Lifecycle: Discussions
    • Brief
      • # comments: 80% < 4
      • # participants: 80% < 3
    • # comments: weak correlation
      • with time to merge
      • with time to close
  20. 2. Lifecycle: Code Review
    • 12% of pull requests have code comments
    • Review does not affect the probability to merge
    • Slows down acceptance by an order of magnitude
      • median: 5h vs 50h
    • Usually, mostly on large projects
  21. 3. Factors affecting merge decision / time
    • Determine factors
    • Determine importance
    [Charts: Decision to merge (84% merged, 16% not merged); Time to merge (shares 34%, 35%, 31%; categories Hour, Day, > Day)]
  22. 3. Determining factors
    • Pull request factors: size, hotness
    • Project factors: test coverage, size, openness
    • Developer factors: reputation, track record
    • 40 features
  23. 3. Determining importance
    • Used 3 well-known algorithms with no tuning
      • Random forests
      • Logistic regression
      • Naive Bayes
    • Pick the one that performs best
    • Calculate relative factor importance for prediction
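The procedure on slide 23 (evaluate several classifiers, keep the best, then rank factors by importance) can be sketched without any library. The two toy "models" and the data below are hypothetical stand-ins, not the algorithms or the 40 features from the talk. Importance is measured here by permutation importance (the accuracy lost when one feature column is shuffled), which is an assumption about one reasonable way to do it:

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the right label."""
    hits = sum(model(row) == label for row, label in zip(rows, labels))
    return hits / len(rows)


def pick_best(models, rows, labels):
    """Return the name of the model with the highest accuracy."""
    return max(models, key=lambda name: accuracy(models[name], rows, labels))


def permutation_importance(model, rows, labels, n_features, seed=0):
    """Accuracy drop per feature when that feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        rng.shuffle(column)
        shuffled = [
            row[:j] + (value,) + row[j + 1:]
            for row, value in zip(rows, column)
        ]
        drops.append(baseline - accuracy(model, shuffled, labels))
    return drops


# Hypothetical data: (changed_lines, has_tests) -> was the PR merged?
rows = [(10, 1), (500, 1), (20, 0), (15, 1), (800, 0), (30, 1), (900, 1), (5, 0)]
labels = [True, False, True, True, False, True, False, True]

# Hypothetical stand-ins for trained classifiers.
models = {
    "small_change": lambda row: row[0] < 100,
    "has_tests": lambda row: row[1] == 1,
}

best = pick_best(models, rows, labels)
importance = permutation_importance(models[best], rows, labels, n_features=2)
```

A feature the chosen model never consults, such as `has_tests` for the `small_change` rule, gets an importance of zero, because shuffling it cannot change any prediction.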
  24. 3. Factors influencing merge decision
    • Hotness of project area
    • Project size
    • Project test coverage
    • Changeset size
    • ...
  25. 3. Factors influencing merge time
    • Developer track record
    • Project size
    • Project test coverage
    • Project openness
    • ...
  26. 4. Why are some pull requests rejected?
    [Pie chart. Shares: 15%, 19%, 2%, 13%, 9%, 16%, 27%. Categories: Done by others better; Project heading elsewhere; Incorrect process; Incorrect implementation; Tests failed; PR was merged; We have no idea]
  27. 4. Why are some pull requests rejected?
    [Pie chart as on slide 26]
    Task articulation
  28. 4. Why are some pull requests rejected?
    [Pie chart as on slide 26]
    Submitter’s fault
  29. Pull Requests are (almost) as popular as shared repositories
  30. Pull Requests are small, merged in < 1 day, and briefly discussed
  31. Pull Requests are merged when they affect a hot project area
  32. Pull Requests are processed fast when the project has a test suite
  33. Pull Requests are processed fast when the contributor has a good track record
  34. Pull Requests are rejected mostly due to insufficient task articulation

  35. Recommendations
    • Contributors
      • Keep it short; keep it hot
      • Figure out what others are doing
      • Make thyself known to the project community
    • Core team
      • Invest in test suite
      • Clarify project direction
  36. Research
    • Understanding pull request process at the project level
    • Developer’s information needs
    • Recommendation tools
    • Quality analytics
  37. None
  38. Twitter: @gousiosg • GitHub: gousiosg/pullreqs