
Software Development Analytics Workshops: A Case Study. Part 2

Bitergia

May 16, 2016


Transcript

  1. Do you know your community? Mozilla Hispano Work Week 2013:

    https://blog.mozilla.org/blog/2010/11/09/happy-6th-birthday-firefox/
  2. Do you know your community? OpenStack Summit 2013, Hong Kong

    http://redhatstackblog.redhat.com/tag/openstack-summit-hong-kong/
  3. Onions and Communities: Communities are like onions, with several layers.

    Let’s peel, cut and cook it! http://en.wikipedia.org/wiki/File:Mixed_onions.jpg
  4. Community of People First layer: users - They use the

    software - They have questions - They use mailing lists, forums, Q&A sites
  5. Community of People Second layer: advanced users - Use the

    software - Help in the debugging process - File bug reports - They even send some patches to the ticketing system
  6. Community of People Third layer: occasional contributors - Use the

    software - Help in the debugging process - Contribute in the development process - They feel comfortable using IRC, Git and others
  7. Community of People Fourth layer: regular and core contributors -

    Use the software - Help in the debugging process - Contribute a lot in the development process - They take advantage of the whole infrastructure: Gerrit, Jenkins, and others.
  8. Community of People So the onion consists of people: 1.

    Users 2. Advanced users 3. Developers 4. And core developers And a lot of different roles and skills: translators, designers, developers, maintainers...
  9. Infrastructure Growth based on the infrastructure 1. Source code repos:

    Git, Mercurial, SVN 2. Discussions: Mailing lists, IRC, Askbot, Slack 3. Review process: Gerrit, issue tracking systems. And this helps to define the dev. process!
  10. Analytics from Scratch - Data Sources - Others working on

    similar issues - Development of tools/scripts - Retrieval/Integration/Aggregation of information - Cleaning, massaging and understanding the data - Enrichment process
  11. Data Retrieval NoSQL, SQL, GraphDB, TimeSeriesDB, Others... Examples: -

    MetricsGrimoire (SQL) - GrimoireLab (NoSQL) - PatchWork (API) - Others...
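
As an illustration of the retrieval step, a minimal sketch using Perceval, GrimoireLab's data-fetching component. The repository URL and local path are placeholders, and the module path follows current GrimoireLab releases rather than anything shown in the deck:

    from perceval.backends.core.git import Git

    # Fetch raw commit items from a Git repository (placeholder URL and path)
    repo = Git(uri='https://github.com/chaoss/grimoirelab-perceval.git',
               gitpath='/tmp/perceval.git')

    for item in repo.fetch():
        commit = item['data']
        print(commit['commit'], commit['Author'], commit['AuthorDate'])
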
  12. Data Cleaning Format issues: Message-ID in mbox; dates in Git,

    mbox, or Gerrit. Oddities in the data: commits or emails in the ’70s (Unix time 0); errors in data sources (migration of Gerrit projects); missing fields (Gerrit username without name or email); CVS -> SVN -> Git migrations.
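
A minimal sketch of such cleaning rules, assuming commits arrive as plain dictionaries (field names are invented for the example): implausibly old dates, such as Unix-time-0 commits, are flagged rather than trusted, and so are authors with no identity at all.

    SUSPICIOUS_BEFORE = 315532800  # 1980-01-01 in Unix time

    def clean_commit(commit):
        """Flag suspicious values instead of trusting them (illustrative rules)."""
        ts = commit.get('author_ts')  # hypothetical field: Unix timestamp
        # Commits "in the '70s" usually mean a date that defaulted to
        # Unix time 0, not real activity.
        if ts is None or ts < SUSPICIOUS_BEFORE:
            commit['suspicious_date'] = True
        # Gerrit accounts sometimes come without a name or an email.
        if not commit.get('author_name') and not commit.get('author_email'):
            commit['unknown_author'] = True
        return commit
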
  13. Data Massage Filter/Group/Aggregate, as similarly done in SQL: filter by

    date, filter by terms, apply functions (sum, percentiles, difference…). Enrich the data with extra info: affiliation per developer, time to close a bug, review time, gender, demographics. Studies: demographics, predictability, risk analysis, neutrality, bottlenecks, mentorship.
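
To illustrate the filter/group/aggregate step, a self-contained pandas sketch over a hypothetical bug table (all column names and values are invented for the example):

    import pandas as pd

    # Hypothetical table: one row per closed bug
    bugs = pd.DataFrame({
        'author': ['ana', 'bob', 'ana', 'eve'],
        'opened': pd.to_datetime(['2016-01-02', '2016-01-05',
                                  '2016-02-01', '2016-02-10']),
        'closed': pd.to_datetime(['2016-01-20', '2016-03-01',
                                  '2016-02-15', '2016-02-12']),
    })

    # Enrich: time to close, in days
    bugs['time_to_close'] = (bugs['closed'] - bugs['opened']).dt.days

    # Filter by date, then group and aggregate
    recent = bugs[bugs['closed'] >= '2016-02-01']
    print(recent.groupby('author')['time_to_close'].median())
    print(recent['time_to_close'].quantile([0.5, 0.95]))
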
  14. Hackathons It's great, we have a hackathon! • Organizations spend

    money on hackathons • And hackathons usually have a goal: ◦ Clean the ticketing system ◦ Attract and help new developers ◦ Advance specific functionalities ◦ Create, innovate How can we measure all of this?
  15. Initial Situation - "...trend on 'Persistent concerns about code review'"

    according to a survey - "...more than 1200 changesets waiting for review and the age of open changesets is increasing…" - "Cleaning up Gerrit alone will not solve this, but will help raising the awareness..." Extra info at Ticket T88531
  16. Goal "Overall aim: Reduce queue length and median age of

    unreviewed changesets with a focus on volunteer contributions”
  17. Goal (defining metrics) Total number of changesets waiting for a

    reviewer action... - and Unknown affiliation - and Independent affiliation - Unknown affiliation and open in the last three months - Independent affiliation and open in the last three months Before and After!
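
A sketch of how these counts could be computed from a changeset export, assuming a table with one row per changeset waiting for a reviewer and hypothetical 'affiliation' and 'opened' columns:

    import pandas as pd

    # Hypothetical export: one row per changeset waiting for a reviewer
    waiting = pd.DataFrame({
        'affiliation': ['Unknown', 'Independent', 'Wikimedia', 'Unknown'],
        'opened': pd.to_datetime(['2015-01-10', '2015-04-02',
                                  '2015-05-20', '2015-05-30']),
    })

    now = pd.Timestamp('2015-06-01')
    recent = waiting[waiting['opened'] >= now - pd.Timedelta(days=90)]

    for aff in ('Unknown', 'Independent'):
        print(aff,
              (waiting['affiliation'] == aff).sum(),   # total
              (recent['affiliation'] == aff).sum())    # last three months
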
  18. Data Sources Total number of changesets waiting for a reviewer

    action... - and Unknown affiliation: 628 (was 751, ~ -17%) - and Independent affiliation: 124 (was 159, ~ -23.1%) - Unknown affiliation and open in the last three months: 261 (was 335, ~ -22%) - Independent affiliation and open in the last three months: 53 (was 71, ~ -25.4%)
  19. Conclusions Goals partially completed - All WMF developer teams using

    Gerrit participated, although with different degrees of engagement. - The queue of changesets without any review was reduced by 18% (total) and 25% (last 3 months).
  20. Conclusions BUT - 752 changesets were still unreviewed (was 910),

    314 from the past 3 months (was 406). - We committed to the goal of 100% without a calculator; we are still happy about the 18%-24%.
  21. Context Issue: - Lots of new developers - Growth of

    the development activity - Decay in the code review process - Surveys to developers to understand this - Several hypotheses - They needed to control the process
  22. Context Issue: they needed to understand how the code review

    process took place https://www.flickr.com/photos/cantchangerandy/3050880952
  23. Detailing First Metrics Time Analysis: - Time to merge -

    Time to commit - Time to re-work - Cycle time - Time to first review ‘Complexity’ Analysis: - Versions per patch series - Merged patches: ‘touched’ files and lines - Comments per patch - Patches per patch series
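
A minimal sketch of the time analysis, assuming per-patch-series event timestamps are already available (column names invented for the example):

    import pandas as pd

    # Hypothetical timeline of one patch series' review events
    series = pd.DataFrame({
        'sent':         pd.to_datetime(['2016-03-01 10:00']),
        'first_review': pd.to_datetime(['2016-03-02 09:30']),
        'merged':       pd.to_datetime(['2016-03-10 18:00']),
    })

    series['time_to_first_review'] = series['first_review'] - series['sent']
    series['time_to_merge'] = series['merged'] - series['sent']
    print(series[['time_to_first_review', 'time_to_merge']])
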
  24. Architecture Original Data Sources -> Mining Tools (CVSAnalY, Mailing List Stats)

    -> Info Enrichment (Python scripts, Pandas, Jupyter Notebooks) -> Viz (ElasticSearch + Kibana)
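
For illustration, the last hop of this pipeline could look like the sketch below, using the official Elasticsearch Python client's generic index call; the index name and document fields are made up, not taken from the deck:

    from elasticsearch import Elasticsearch

    # Push one enriched review document into an index Kibana can chart
    es = Elasticsearch(['http://localhost:9200'])
    doc = {
        'patch_series': 'scheduler-refactor',  # hypothetical fields
        'versions': 3,
        'time_to_merge_days': 12.5,
    }
    es.index(index='code-review', body=doc)
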
  25. Parsing Process PATCH: keyword Each Thread (patch series) =

    Code Review Process 1 or more patches per patch series
  26. Parsing Process Email subject contains some info: - [PATCH vX

    x/y] Subject ‘Subject’ links to the specific commit in Git if merged But: - Hard to parse subjects (infinite options) - ‘Subjects’ tend to be slightly different when committing changes
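
A sketch of parsing such subjects, assuming the conventional '[PATCH vX x/y] Subject' shape; as the slide warns, real subjects vary far beyond what one regex covers:

    import re

    # Matches e.g. "[PATCH v2 1/3] Refactor scheduler"; version and
    # position in the series are both optional.
    PATCH_RE = re.compile(
        r'\[PATCH'
        r'(?:\s+v(?P<version>\d+))?'            # optional version
        r'(?:\s+(?P<num>\d+)/(?P<total>\d+))?'  # optional x/y position
        r'\]\s*(?P<subject>.+)')

    m = PATCH_RE.match('[PATCH v2 1/3] Refactor scheduler')
    if m:
        print(m.group('version'), m.group('num'),
              m.group('total'), m.group('subject'))
    # -> 2 1 3 Refactor scheduler
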
  27. Use Cases Community Use Cases: - Use Case 1: Identification

    of Top Reviewers - Use Case 2: Identification of imbalances between reviewing and contribution - Use Case 3: Identification of post-ACK comments
    Performance Use Cases: - Use Case 4: Identification of delays in the code review process due to versions - Use Case 5: Identification of delays in the code review process due to large patch series
    Backlog Use Cases: - Use Case 6: Identification of merged and non-merged patch series - Use Case 7: Identification of nearly completed patch series - Use Case 8: Identification of Hot/Warm/Tepid/Cold/Freezing/Dead patch series
  28. Context Issue: - No numbers about women in OpenStack -

    They need numbers to make decisions
  29. Git Activity and Population Women activity (all of the history):

    ~ 10.5% of the population ( ~ 570 developers ) ~ 6.8% of the activity ( >= 16k commits )
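
A quick arithmetic sanity check of the totals these shares imply, assuming the percentages are fractions of the whole history:

    # ~570 women developers are ~10.5% of the population
    print(round(570 / 0.105))     # ~5429 developers in total

    # >=16k commits by women are ~6.8% of the activity
    print(round(16_000 / 0.068))  # ~235k commits in total
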
  30. Git WOO (Women of OpenStack) Type of Contribution • Where do WOO contributions

    go? • Unfiltered order: Infra, Nova, Neutron, Doc, QA • Lots of activity in Doc, Infra, Neutron, Nova and Horizon
  31. Git WOO Evolution • March 2015: Extra activity in Ironic

    • June 2015: Extra activity in Doc and Puppet OpenStack • August 2015: Extra activity in Infra and Doc • November 2015: Extra activity in Doc and OpenStack Client
  32. Git WOO Evolution • March 2015: Extra activity in Ironic

    • June 2015: Extra activity in Doc and Puppet OpenStack • August 2015: Extra activity in Infra and Doc • November 2015: Extra activity in Doc and OpenStack Client
  33. Gerrit Reviews • ~ 1 Million reviews • ~ 400k

    ‘+2’ reviews • ~ 11k ‘-2’ reviews • ~ 325k ‘+1’ reviews • ~ 207k ‘-1’ reviews
  34. Gerrit Reviews Evolution by WOO Continuous increase Big Jump during

    the last year (compared to the general trend)