
Analysis of the Xen code review process: An example of software development analytics

Talk delivered at OSCON 2016 in Austin, based on an analysis of the code review process of the Xen community.

Bitergia

May 18, 2016


Transcript

  1. Lars Kurth: Community Manager, Xen Project; Chairman, Xen Project Advisory Board (lars_kurth)
     Daniel Izquierdo: CDO, Bitergia (dizquierdo)
  2. Lars: was a contributor to many projects; community guy for the Xen Project; working for Citrix; member of the group that develops XenServer; Chairman of the Xen Project Advisory Board.
     The Xen Project: develops the Xen Project Hypervisor; a Linux Foundation Collaborative Project; used by the biggest cloud providers (AWS, …); lots of commercial Xen variants.
  3. Daniel: contributor to other dashboards (OpenStack Activity Board and Wikimedia Dashboard); co-founder of Bitergia, focused on open analytics and open source tools to analyse open source projects; developer of the MetricsGrimoire and VizGrimoire analysis toolchains.
     Currently working on: the Xen Project Dashboard; OpenStack and OPNFV quarterly reports; gender diversity analysis.
  4. [Chart: developer mailing list traffic, 2008-2015]
     Growth of development activity: >100% in 5 years.
     2015, getting a sense of scale: 240 developers from 95 orgs, 12K commits; 50 core developers from 18 orgs, 4K commits in core hypervisor repos.
  5. [Chart: developer mailing list traffic, 2008-2015]
     Complaints: maintainers complaining about workload and inefficiencies; contributors complaining about not getting their changes in fast enough.
  6. [Chart: developer mailing list traffic, 2008-2015]
     Wrote some basic data-gathering scripts to see whether we had a problem.
  7. [Chart: developer mailing list traffic, 2008-2015]
     Surveys and conversations to identify the root cause.
  8. "We had an influx of new contributors who write lower-quality code than is required today."
  9. "No, we argue more than in the past, which slows everything down and scares away people."
  10. Timeline, 2014-2016:
      Governance: created training for newcomers; visited contributors in Asia several times. Mostly failed.
      Governance retry: architecture/design reviews, to avoid late disagreements on architecture/design.
      Seek help: use statistical analysis.
  11. [Diagram: code review on the xen-devel@ mailing list]
      A patch series (overview plus Patch 1 … Patch n) is posted to the xen-devel@ mailing list and revised as v1, v2, v3, … vm. Reviewers send review comments; maintainers signal agreement with Acked-by flags.
  12. [Diagram: from mailing list to Git]
      Once all necessary Acked-by flags are in place, a committer applies the series (Patch 1 … Patch n) from the xen-devel@ thread to the Git tree.
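The workflow on the two slides above maps onto a small data model. Here is a minimal sketch in Python, with hypothetical names (the actual MetricsGrimoire-based tooling defines its own schema):

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical entities mirroring the slides: a patch series lives in a
# mail thread on xen-devel@, is re-posted as revisions v1..vm, and each
# patch collects Acked-by flags from maintainers until a committer
# applies the series to the Git tree.

@dataclass
class Patch:
    subject: str                  # e.g. "[PATCH v3 2/5] xen/arm: ..."
    acked_by: list = field(default_factory=list)   # maintainer Acked-by flags

@dataclass
class SeriesRevision:
    version: int                  # 1, 2, ... m
    posted: datetime
    patches: list = field(default_factory=list)    # Patch objects

@dataclass
class PatchSeries:
    thread_id: str                # Message-ID of the cover letter
    revisions: list = field(default_factory=list)  # SeriesRevision objects
    commit_ids: list = field(default_factory=list) # filled once applied to Git

    def ready_to_commit(self):
        """True when every patch in the latest revision carries an ACK."""
        if not self.revisions:
            return False
        latest = max(self.revisions, key=lambda r: r.version)
        return bool(latest.patches) and all(p.acked_by for p in latest.patches)
```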
  13. [Diagram: plumbing for the code review process notebook, reporting time to review, time to merge, time to commit, main devs, main orgs, …]
  14. Data model: what is the actual workflow in the community? What data is relevant? No off-the-shelf solution; lots of bits and pieces; had to develop integrations from scratch. (Photo credit: NASA)
  15. Patch series = (mail) thread; patch = root message.
      Cases where each patch is a new thread (e.g. not using git send-email).
      Cross-posted messages (e.g. from LKML).
      Versions, [PATCH vX Y/Z]: not always regular, need heuristics / regular expressions; missing versions (e.g. a series whose first posted version is v5).
      Patch number (Y of Z), [PATCH vX Y/Z]: not always regular, need heuristics / regular expressions.
      Matching (mail) threads and commits: issues with commit timestamps; some patches share the same subject line.
      (A sketch of these heuristics follows below.)
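As an illustration of those subject-line heuristics, here is a minimal Python sketch; the pattern and function are illustrative assumptions, not the project's actual code, and real xen-devel traffic (RFC tags, resends, missing fields) needs more fallbacks:

```python
import re

# Illustrative pattern for "[PATCH vX Y/Z]" subject tags. Version and
# position are optional because plain "[PATCH] ..." postings are common.
SUBJECT_RE = re.compile(
    r"\[\s*(?:RFC\s+)?PATCH"                 # "[PATCH" or "[RFC PATCH"
    r"(?:\s+v(?P<version>\d+))?"             # optional version tag "vX"
    r"(?:\s+(?P<num>\d+)/(?P<total>\d+))?"   # optional position "Y/Z"
    r"[^\]]*\]\s*(?P<summary>.*)",           # rest of tag, then summary
    re.IGNORECASE,
)

def parse_subject(subject):
    """Extract version, position and summary from a patch e-mail subject."""
    m = SUBJECT_RE.search(subject)
    if m is None:
        return None  # fall back to other heuristics (threading, diffstat, ...)
    return {
        "version": int(m.group("version") or 1),  # missing vX usually means v1
        "num": int(m.group("num") or 1),
        "total": int(m.group("total") or 1),
        # The normalised summary is what can be matched against commit
        # subject lines to tie mail threads to Git commits.
        "summary": m.group("summary").strip(),
    }

print(parse_subject("Re: [PATCH v3 2/5] xen/arm: fix p2m mapping"))
# -> {'version': 3, 'num': 2, 'total': 5, 'summary': 'xen/arm: fix p2m mapping'}
```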
  16. Large patches take significantly longer to review: series of up to 6-7 patches take X days; series 3 times larger take 10 x X days. Complexity increases code review time: the bigger, the worse.
      Solution? Break patches into smaller series? Not always possible for complex features (you can't review the code in isolation), and code contributions are more complex and touch more areas of code.
  17. How can the results be true? Only 60% of code reviews are included in the study. Issue: cross-posting of Linux/QEMU/… patches to xen-devel@; we have not yet implemented matching against the different non-Xen repositories.
  18. But it appears that some of the changes we made clearly had a positive effect.
  19. Alright, let's improve the tools and see whether we can use them to improve how we work and encourage people to do more reviews!
  20. Community use cases (encourage desired behavior):
      UC 1: identify top reviewers (people and orgs).
      UC 2: identify imbalances between reviewers and contributors (freeloading); a sketch of this metric follows this list.
      UC 3: identify post-ACK comments (an indirect measure of "unnecessary conflict").
      Performance use cases (spot issues early):
      UC 4: identify delays due to a large number of revisions (quality, conflict, communication).
      UC 5: identify delays due to large patch series (complexity, coordination).
      Backlog use cases (optimize process and focus):
      UC 6: merged and not merged (did something get missed?).
      UC 7: identify nearly completed patch series (focus by % ACKed).
      UC 8: identify hot/warm/tepid/cold/freezing/dead patch series (focus by activity).
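To make UC 2 concrete, here is a rough sketch of a reviewer/contributor balance metric; the function and its inputs are hypothetical, and the real dashboard derives such counts from its own mail and Git data:

```python
def review_balance(reviews_by_org, commits_by_org):
    """Ratio of reviews given to patches contributed, per organisation.

    Values well below 1.0 flag potential freeloading: an org that lands
    many patches but rarely reviews other people's work.
    """
    orgs = set(reviews_by_org) | set(commits_by_org)
    return {
        org: reviews_by_org.get(org, 0) / max(commits_by_org.get(org, 0), 1)
        for org in orgs
    }

# Made-up numbers for illustration only:
balance = review_balance(
    {"OrgA": 120, "OrgB": 10},   # review comments / ACKs sent
    {"OrgA": 100, "OrgB": 90},   # patches contributed
)
# OrgA: 1.2 (reviews more than it contributes); OrgB: ~0.11 (potential freeloader)
```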
  21. Plumbing: the code review process dashboard. Evolutionary data; tables and charts; drill-down actions; several panels; customizable; …
  22. Detect noise in the merge process: evolution of comments and post-ACK comments; top developers and orgs.
  23. Versions (iterations) by patch series: time to merge per number of versions. More than 9 versions: up to 10x slower.
  24. Patches (complexity) by patch series: time to merge per number of patches. More than 16 patches: up to 10x slower.
  25. What was merged, what was not? Links back to the individual patch series; direct links to the review e-mail thread via marc.info.
  26. ACK info: full details of the % of ACKed patches in a series.
      Intention:
      • Allow reviewers to focus on nearly complete series → get patches completed more quickly.
      • Can be used effectively with advanced search queries.
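A minimal sketch of how such a percentage could be computed, assuming a hypothetical per-patch 'acked_by' field (the dashboard derives the real figure from its mail data):

```python
def acked_fraction(patches):
    """Share of patches in a series revision that carry at least one ACK.

    patches: one dict per patch, each with an 'acked_by' list of maintainers.
    """
    if not patches:
        return 0.0
    return sum(1 for p in patches if p.get("acked_by")) / len(patches)

series = [
    {"acked_by": ["maintainer-a"]},
    {"acked_by": []},
    {"acked_by": ["maintainer-b"]},
]
print(f"{acked_fraction(series):.0%} ACKed")   # "67% ACKed": nearly complete
```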
  27. Predefined time buckets, with time and release filters: identify hot reviews (lots of recent activity) and freezing/dead reviews (stale code reviews).
      Intention:
      • Hot and warm reviews are those with lots of activity in people's inboxes → allow us and contributors to easily spot reviews that got forgotten.
      • Release filters help manage what will be in a release.
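A sketch of the bucketing idea, with made-up day thresholds (the dashboard ships its own predefined buckets):

```python
from datetime import datetime, timezone

# Illustrative thresholds only; the real dashboard defines its own buckets.
BUCKETS = [          # (upper bound in days since last activity, label)
    (7, "Hot"),
    (30, "Warm"),
    (90, "Tepid"),
    (180, "Cold"),
    (365, "Freezing"),
]

def activity_bucket(last_activity, now=None):
    """Classify a review thread by days since its last message.

    Both datetimes should be timezone-aware for a correct difference.
    """
    now = now or datetime.now(timezone.utc)
    age_days = (now - last_activity).days
    for limit, label in BUCKETS:
        if age_days <= limit:
            return label
    return "Dead"    # nothing in over a year: candidate for closing
```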
  28. Improve accuracy of e-mail to Git matching: boundary cases (e.g. patchwork for threads, etc.); cross-posting of Linux/QEMU/… patches to xen-devel@.
      Improve usefulness: provide data that fits into the workflow (e.g. git repos, commit IDs, …); focus the data: are the dashboards too noisy?
      Iterate, iterate, iterate: we can easily add new views and panels required by groups of users.
      Make the review process more tool-friendly? Minor changes are OK; significant ones won't fly.
  29. Required active engagement of a community member: about 1 month of effort over a 6-month period. Data analysis is no silver bullet: not everyone believes the data, and perception is not always a good friend. We learned lots about the review process; in fact, we may make changes. A good starting point for other communities using an e-mail-based review workflow.
  30. Slides: slideshare.net/xen_com_mgr
      Xen Project: xenproject.org
      Bitergia: bitergia.com
      Dashboard: tinyurl.com/xenproject-dashboard
      Documentation: tinyurl.com/xenproject-dashdocs
      Code: tinyurl.com/xenproject-dashcode
      Contribute: tinyurl.com/xenproject-contribute