An exploration of the pull-based software development model
Georgios Gousios, Martin Pinzger, Arie van Deursen
TU Delft, Netherlands
Twitter: @gousiosg
Slide 3
fork-edit pull request merge
Slide 10
A new way of Distributed Software Development
Pull requests:
• Keep everything in one place
• Facilitate tool integration and automation
• Provide contextual information
Slide 11
Research questions
1. How widespread is pull-based development?
2. What does the life cycle of a pull request look like?
3. What factors affect the
• decision to merge?
• time it takes to decide on a merge?
4. Why are some pull requests not merged?
Slide 12
GHTorrent.org
(Almost) all data from the GitHub REST API since Feb 2012, in raw and queryable form
Slide 13
1. Do repositories use PRs?
• 68%: single-developer repositories
• Of the multi-developer repositories:
• 45% use the pull request model
• 55% use the shared repository model
Slide 14
1. Popularity: per project
• median = 2
• 95th percentile = 21
• Many projects with > 10,000 pull requests
• Ruby on Rails
• Homebrew
[Figure: histogram of the number of pull requests per project (log scale) against the number of projects]
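The per-project median and 95th percentile on this slide are simple order statistics. A minimal sketch of how such figures could be computed follows; the input list is hypothetical illustration data, not GHTorrent data, and the nearest-rank percentile definition is an assumption, since the slides do not say which definition was used.

```python
import math
import statistics


def percentile(values, p):
    """Nearest-rank percentile: the smallest value that has at least
    p percent of the data at or below it (assumed definition)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical pull request counts, one entry per project.
prs_per_project = [1, 1, 1, 2, 2, 2, 3, 4, 6, 9, 15, 40, 300, 12000]

median_prs = statistics.median(prs_per_project)
p95_prs = percentile(prs_per_project, 95)
print(f"median = {median_prs}, 95th percentile = {p95_prs}")
```

A long right tail, as in the histogram, pulls the 95th percentile far above the median, which is why both numbers are reported.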
Slide 15
Pull request sample
Sample projects with:
• > 200 pull requests
• Test suite
• Ruby, Python, Java, Scala
• At least one commit from a non-core member
• Frameworks / applications (not doc)
297 projects
168,000 PRs
Slide 16
2. Lifecycle: Merges
Overall: ~84% of pull requests merged.
• ~70% through GitHub web UI
Alternative approaches:
• Local git merge, then push
• Cherry-picking
• Squash / rebase
• Apply patch locally, then push
Developed heuristics to detect alternative merges
[Pie chart: 84% merged, 16% not merged]
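The slide does not list the heuristics themselves. The sketch below shows two plausible checks, both of which are assumptions rather than the authors' actual rules: a pull request commit that appears unchanged on the main branch (local merge, then push), and a main-branch commit message that closes the pull request with a GitHub closing keyword (useful when cherry-picking, squashing, or rebasing rewrites the commit SHAs). The function name and input shapes are hypothetical.

```python
import re

# GitHub closing keywords followed by a pull request number, e.g. "Fixes #42".
CLOSING_KEYWORD = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)", re.IGNORECASE
)


def merged_outside_ui(pr_number, pr_commit_shas, main_branch_commits):
    """Return the name of the heuristic that fired, or None.

    main_branch_commits maps commit SHA -> commit message (hypothetical input).
    """
    # Heuristic A (assumed): a PR commit exists as-is on the main branch.
    if any(sha in main_branch_commits for sha in pr_commit_shas):
        return "commit_in_main"
    # Heuristic B (assumed): a main-branch commit message closes this PR.
    for message in main_branch_commits.values():
        for match in CLOSING_KEYWORD.finditer(message):
            if int(match.group(1)) == pr_number:
                return "closed_by_commit_message"
    return None


main_branch = {"a1": "Add parser", "b2": "Tidy up docs. Fixes #7"}
print(merged_outside_ui(3, ["a1", "zz"], main_branch))
print(merged_outside_ui(7, ["c3"], main_branch))
```

Checks like these can produce false positives, for example a commit that mentions a pull request number without merging its changes, so results would need manual spot checks.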
Slide 17
2. Lifecycle: Time to merge
[Pie chart: time to merge, with slices of 34%, 35%, and 31% across the categories Hour, Day, and > Day]
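A breakdown like the one in this chart can be produced by bucketing each merged pull request by the time between opening and merging. The sketch below assumes the boundaries are exactly one hour and one day, inclusive; the slide only names the buckets, and the timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def merge_time_bucket(opened_at, merged_at):
    """Classify a merged pull request by how long the merge took."""
    elapsed = merged_at - opened_at
    if elapsed <= timedelta(hours=1):
        return "hour"
    if elapsed <= timedelta(days=1):
        return "day"
    return "> day"


# Hypothetical (opened, merged) timestamp pairs.
pull_requests = [
    (datetime(2013, 5, 1, 9, 0), datetime(2013, 5, 1, 9, 40)),
    (datetime(2013, 5, 1, 9, 0), datetime(2013, 5, 1, 17, 0)),
    (datetime(2013, 5, 1, 9, 0), datetime(2013, 5, 4, 9, 0)),
]

buckets = Counter(merge_time_bucket(o, m) for o, m in pull_requests)
shares = {name: count / len(pull_requests) for name, count in buckets.items()}
print(shares)
```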
Slide 18
2. Lifecycle: Size metrics
                    median    80th percentile
# Commits              1            3
# Files                2            7
# Lines changed       20          168
[Figure: histogram of the number of files changed by the pull request (log scale) against the number of pull requests]
[Figure: histogram of the lines of code changed in the pull request (log scale) against the number of pull requests]
Slide 19
2. Lifecycle: Discussions
• Brief
• # comments: 80% < 4
• # participants: 80% < 3
• # comments: weak correlation with time to merge and with time to close
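The slide reports a weak correlation without naming the coefficient. As one possibility, the sketch below computes Spearman's rank correlation between comment counts and time to merge; the choice of Spearman is an assumption, and the sample values are hypothetical.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical data: comments per pull request and hours until merge.
num_comments = [0, 1, 1, 2, 3, 5, 8]
hours_to_merge = [2.0, 30.0, 1.5, 4.0, 60.0, 3.0, 20.0]
print(round(spearman(num_comments, hours_to_merge), 3))
```

A rank-based coefficient suits this data because both comment counts and merge times are heavily skewed.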
Slide 20
2. Lifecycle: Code Review
• 12% of pull requests have code comments
• Review does not affect the probability of a merge
• Slows down acceptance by an order of magnitude
• median: 5h vs 50h
• Mostly on large projects
Slide 21
3. Factors affecting merge decision / time
• Determine factors
• Determine importance
[Pie chart: decision to merge, 84% merged and 16% not merged]
[Pie chart: time to merge, with slices of 34%, 35%, and 31% across the categories Hour, Day, and > Day]
Slide 22
3. Determining factors
• Pull request factors: size, hotness
• Project factors: test coverage, size, openness
• Developer factors: reputation, track record
• 40 features
Slide 23
3. Determining importance
• Used 3 well-known algorithms with no tuning
• Random forests
• Logistic regression
• Naive Bayes
• Pick the one that performs best
• Calculate relative factor importance for prediction
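The slide does not say how relative factor importance was calculated. One common model-agnostic option is permutation importance: shuffle one feature column at a time and measure how much prediction accuracy drops. The sketch below assumes that technique; the `predict` function is a hypothetical stand-in for the best-performing trained classifier, and the data is synthetic.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which the prediction equals the label."""
    hits = sum(predict(row) == label for row, label in zip(rows, labels))
    return hits / len(rows)


def permutation_importance(predict, rows, labels, seed=0):
    """Accuracy drop per feature when that feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for col in range(len(rows[0])):
        shuffled = [row[col] for row in rows]
        rng.shuffle(shuffled)
        permuted = [
            row[:col] + [value] + row[col + 1:]
            for row, value in zip(rows, shuffled)
        ]
        importances.append(baseline - accuracy(predict, permuted, labels))
    return importances


def predict(row):
    # Hypothetical stand-in for a trained model: it only looks at feature 0.
    return row[0] > 0.5


# Synthetic data: feature 0 determines the label, feature 1 is irrelevant.
n = 200
rows = [[i / n, (i * 7 % n) / n] for i in range(n)]
labels = [row[0] > 0.5 for row in rows]

importances = permutation_importance(predict, rows, labels)
print(importances)
```

Features whose shuffling leaves accuracy unchanged contribute nothing to the prediction, which gives a relative ranking of factors like the ones on the following slides.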
Slide 24
3. Factors influencing merge decision
Hotness of project area
Project size
Project test coverage
Changeset size
...
Slide 25
3. Factors influencing merge time
Developer track record
Project size
Project test coverage
Project openness
...
Slide 26
4. Why are some pull requests rejected?
• Done by others better
• Project heading elsewhere
• Incorrect process
• Incorrect implementation
• Tests failed
• PR was merged
• We have no idea
[Pie chart: share of each reason, with slices of 15%, 19%, 2%, 13%, 9%, 16%, and 27%]
Slide 27
4. Why are some pull requests rejected?
Task articulation
Slide 28
4. Why are some pull requests rejected?
Submitter’s fault
Slide 29
Pull requests are (almost) as popular as shared repositories
Slide 30
Pull requests are small, merged in < 1 day, and briefly discussed
Slide 31
Pull requests are merged when they affect a hot project area
Slide 32
Pull requests are processed fast when the project has a test suite
Slide 33
Pull requests are processed fast when the contributor has a good track record
Slide 34
Pull requests are rejected mostly due to insufficient task articulation
Slide 35
Recommendations
• Contributors
  • Keep it short; keep it hot
  • Figure out what others are doing
  • Make thyself known to the project community
• Core team
  • Invest in a test suite
  • Clarify project direction
Slide 36
Research
• Understanding the pull request process at the project level
• Developers’ information needs
• Recommendation tools
• Quality analytics