Slide 1

Slide 1 text

An exploration of the pull-based software development model
Georgios Gousios, Martin Pinzger, Arie van Deursen
TU Delft, Netherlands
Twitter: @gousiosg

Slide 3

Slide 3 text

[Diagram: fork, edit, pull request, merge]

Slide 10

Slide 10 text

A new way of Distributed Software Development
Pull requests:
• Keep everything in one place
• Facilitate tool integration and automation
• Provide contextual information

Slide 11

Slide 11 text

Research questions
1. How widespread is pull-based development?
2. What does the life cycle of a pull request look like?
3. What factors affect
   • the decision to merge?
   • the time it takes to decide on a merge?
4. Why are some pull requests not merged?

Slide 12

Slide 12 text

GHTorrent.org: (almost) all data from the GitHub REST API since Feb 2012, in raw and queryable form

Slide 13

Slide 13 text

1. Do repositories use PRs?
• 68%: single-developer repositories
• Of the multi-developer repositories:
  • 45% use the pull request model
  • 55% use the shared repository model

Slide 14

Slide 14 text

1. Popularity: per project
• median = 2
• 95th percentile = 21
• Many projects with > 10,000
  • Ruby on Rails
  • Homebrew
[Histogram: number of projects by number of pull requests (log scale)]
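The per-project summary statistics above can be computed from a plain list of pull request counts. The following is a minimal sketch using only the Python standard library; the function name and the sample counts are hypothetical and are not taken from the GHTorrent dataset.

```python
import statistics

def summarize_pr_counts(counts):
    """Return (median, 95th percentile) of pull requests per project."""
    median = statistics.median(counts)
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    # method="inclusive" treats the list as the whole population, so the
    # result never exceeds the largest observed count.
    p95 = statistics.quantiles(counts, n=20, method="inclusive")[-1]
    return median, p95

# Hypothetical, heavily skewed sample: most projects have few pull requests.
sample_counts = [1, 1, 1, 2, 2, 2, 3, 4, 6, 9, 15, 40, 1200]
median, p95 = summarize_pr_counts(sample_counts)
print(f"median={median}, 95th percentile={p95:.1f}")
```

With a distribution this skewed, the median says far more about the typical project than the mean would, which is presumably why the slide reports the median and a high percentile.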

Slide 15

Slide 15 text

Pull request sample
Sample projects with:
• > 200 pull requests
• Test suite
• Ruby, Python, Java, Scala
• At least one commit from a non-core member
• Frameworks / applications (not documentation)
Result: 297 projects, 168,000 PRs

Slide 16

Slide 16 text

2. Lifecycle: Merges
Overall: ~84% of pull requests merged.
• ~70% through GitHub web UI
Alternative approaches:
• Local git merge, then push
• Cherry-picking
• Squash / rebase
• Apply patch locally, then push
Developed heuristics to detect alternative merges
[Pie chart: 84% merged, 16% not merged]
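The slide does not spell out the heuristics used to detect merges made outside the GitHub web UI. As an illustration only, here is a sketch of two plausible checks: whether any pull request commit shows up on the main branch, and whether a main-branch commit message closes the pull request by number. The data shapes, field names, and regular expression are assumptions, not the authors' implementation.

```python
import re

# Hypothetical pattern: a commit message that closes a pull request by number,
# e.g. "Fixes #42" or "merged #42".
CLOSING_KEYWORD = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|merge[sd]?)\s+#(\d+)", re.IGNORECASE
)

def classify_merge(pr, main_commits):
    """Guess how a pull request was merged, or return None if no sign of a merge.

    pr: dict with 'number' (int), 'merged_via_ui' (bool), 'commit_shas' (list of str).
    main_commits: dict mapping commit SHA -> commit message on the main branch.
    """
    if pr["merged_via_ui"]:
        return "github-ui"
    # A local `git merge` followed by a push keeps the original commit SHAs.
    if any(sha in main_commits for sha in pr["commit_shas"]):
        return "commits-in-main"
    # Cherry-picks, squashes, and rebases rewrite SHAs, so fall back to messages.
    for message in main_commits.values():
        for match in CLOSING_KEYWORD.finditer(message):
            if int(match.group(1)) == pr["number"]:
                return "closing-keyword"
    return None

main = {"a1b2": "Add retry logic (closes #42)", "c3d4": "Bump version"}
squashed_pr = {"number": 42, "merged_via_ui": False, "commit_shas": ["ffff"]}
print(classify_merge(squashed_pr, main))
```

Checks like these can misclassify in both directions, for example when a commit message mentions a pull request without merging it, so any real use would need validation against manually inspected cases.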

Slide 17

Slide 17 text

2. Lifecycle: Time to merge
[Pie chart with slices 34%, 35%, 31%; legend: Hour, Day, > Day]
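A breakdown like the one on this slide can be produced by bucketing the elapsed time between opening and merging each pull request. The sketch below assumes the bucket boundaries are one hour and one day, as the legend suggests; the function names and sample durations are hypothetical.

```python
from collections import Counter
from datetime import timedelta

BUCKETS = ("hour", "day", "> day")

def merge_time_bucket(elapsed):
    """Map the time between opening and merging a pull request to a bucket."""
    if elapsed <= timedelta(hours=1):
        return "hour"
    if elapsed <= timedelta(days=1):
        return "day"
    return "> day"

def bucket_shares(durations):
    """Return the share of pull requests in each bucket, as fractions of 1."""
    counts = Counter(merge_time_bucket(d) for d in durations)
    return {bucket: counts[bucket] / len(durations) for bucket in BUCKETS}

# Hypothetical merge times for four pull requests.
durations = [
    timedelta(minutes=10),
    timedelta(minutes=59),
    timedelta(hours=5),
    timedelta(days=3),
]
print(bucket_shares(durations))
```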

Slide 18

Slide 18 text

2. Lifecycle: Size metrics

                   median   80th percentile
# Commits          1        3
# Files            2        7
# Lines changed    20       168

[Histogram: number of pull requests by number of files changed (log scale)]
[Histogram: number of pull requests by lines of code changed (log scale)]

Slide 19

Slide 19 text

2. Lifecycle: Discussions
• Brief
  • # comments: 80% < 4
  • # participants: 80% < 3
• # comments: weak correlation
  • with time to merge
  • with time to close

Slide 20

Slide 20 text

2. Lifecycle: Code Review
• 12% of pull requests have code comments
• Review does not affect the probability of a merge
• Slows down acceptance by an order of magnitude
  • median: 5h vs 50h
• Mostly on large projects

Slide 21

Slide 21 text

3. Factors affecting merge decision / time
• Determine factors
• Determine importance
Decision to merge: [pie chart: 84% merged, 16% not merged]
Time to merge: [pie chart with slices 34%, 35%, 31%; legend: Hour, Day, > Day]

Slide 22

Slide 22 text

3. Determining factors
• Pull request factors: size, hotness
• Project factors: test coverage, size, openness
• Developer factors: reputation, track record
• 40 features

Slide 23

Slide 23 text

3. Determining importance
• Used 3 well-known algorithms with no tuning
  • Random forests
  • Logistic regression
  • Naive Bayes
• Pick the one that performs best
• Calculate relative factor importance for prediction
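The last step, relative factor importance, can be illustrated with permutation importance: shuffle one factor at a time and measure how much prediction accuracy drops. This is a minimal standard-library sketch under stated assumptions. The toy predictor, the two factors, and the data are hypothetical, and the slides do not state which importance measure the authors used.

```python
import random

def accuracy(predict, rows, labels):
    """Fraction of rows for which the predictor returns the true label."""
    hits = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return hits / len(rows)

def permutation_importance(predict, rows, labels, seed=0):
    """Return, per factor, the accuracy lost when that factor is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        rng.shuffle(column)  # break the link between factor j and the label
        shuffled = [row[:j] + (value,) + row[j + 1:]
                    for row, value in zip(rows, column)]
        importances.append(baseline - accuracy(predict, shuffled, labels))
    return importances

# Hypothetical data: each row is (touches_hot_area, lines_changed);
# in this toy set a pull request is merged exactly when it touches a hot area.
rows = [(i % 2, (i * 7) % 50) for i in range(40)]
labels = [row[0] == 1 for row in rows]

def toy_predict(row):
    """Stand-in for a trained model: predicts a merge for hot-area changes."""
    return row[0] == 1

hot_area_importance, size_importance = permutation_importance(toy_predict, rows, labels)
print(hot_area_importance, size_importance)
```

In a real analysis the predictor would be the best-performing trained model, and the accuracy would be measured on held-out data rather than on the data used for training.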

Slide 24

Slide 24 text

3. Factors influencing merge decision
• Hotness of project area
• Project size
• Project test coverage
• Changeset size
• ...

Slide 25

Slide 25 text

3. Factors influencing merge time
• Developer track record
• Project size
• Project test coverage
• Project openness
• ...

Slide 26

Slide 26 text

4. Why are some pull requests rejected?
[Pie chart; slice values as extracted: 15%, 19%, 2%, 13%, 9%, 16%, 27%; slice labels as extracted: Done by others better, Project heading elsewhere, Incorrect process, Incorrect implementation, Tests failed, PR was merged, We have no idea]

Slide 27

Slide 27 text

4. Why are some pull requests rejected?
[Same pie chart as on slide 26, with highlighted group: Task articulation]

Slide 28

Slide 28 text

4. Why are some pull requests rejected?
[Same pie chart as on slide 26, with highlighted group: Submitter's fault]

Slide 29

Slide 29 text

Pull Requests are (almost) as popular as shared repositories

Slide 30

Slide 30 text

Pull Requests are small; merged in < 1 day; are briefly discussed

Slide 31

Slide 31 text

Pull Requests are merged when they affect a hot project area

Slide 32

Slide 32 text

Pull Requests are processed fast when the project has a test suite

Slide 33

Slide 33 text

Pull Requests are processed fast when the contributor has a good track record

Slide 34

Slide 34 text

Pull Requests are rejected mostly due to insufficient task articulation

Slide 35

Slide 35 text

Recommendations
• Contributors
  • Keep it short; keep it hot
  • Figure out what others are doing
  • Make thyself known to the project community
• Core team
  • Invest in a test suite
  • Clarify project direction

Slide 36

Slide 36 text

Research
• Understanding the pull request process at the project level
• Developers' information needs
• Recommendation tools
• Quality analytics

Slide 38

Slide 38 text

Twitter: @gousiosg
GitHub: gousiosg/pullreqs