
Software Development Analytics Workshops: A Case Study. Part 2

Bitergia

May 16, 2016


Transcript

  1. Do you know your community? Mozilla Hispano Work Week 2013:

    https://blog.mozilla.org/blog/2010/11/09/happy-6th-birthday-firefox/
  2. Do you know your community? OpenStack Summit 2013, Hong Kong

    http://redhatstackblog.redhat.com/tag/openstack-summit-hong-kong/
  3. Onions and Communities: Communities are like onions, with several layers.

    Let’s peel, cut and cook it! http://en.wikipedia.org/wiki/File:Mixed_onions.jpg
  4. Community of People First layer: users - They use the

    software - They have questions - They use mailing lists, forums, Q&A sites
  5. Community of People Second layer: advanced users - Use the

    software - Help in the debugging process - File bug reports - They even send some patches to the ticketing system
  6. Community of People Third layer: occasional contributors - Use the

    software - Help in the debugging process - Contribute in the development process - They feel comfortable using IRC, Git and others
  7. Community of People Fourth layer: regular and core contributors -

    Use the software - Help in the debugging process - Contribute a lot in the development process - They take advantage of the whole infrastructure: Gerrit, Jenkins, and others.
  8. Community of People So the onion consists of people: 1.

    Users 2. Advanced users 3. Developers 4. And core developers And a lot of different roles and skills: translators, designers, developers, maintainers...
  9. Infrastructure Growth based on the infrastructure 1. Source code repos:

    Git, Mercurial, SVN 2. Discussions: Mailing lists, IRC, Askbot, Slack 3. Review process: Gerrit, issue tracking systems. And this helps to define the dev. process!
  10. Analytics from Scratch - Data Sources - Others working on

    similar issues - Development of tools/scripts - Retrieval/Integration/Aggregation of information - Cleaning, massaging and understanding the data - Enrichment process
  11. Data Retrieval NoSQL, SQL, GraphDB, TimeSeriesDB, Others... Examples: -

    MetricsGrimoire (SQL) - GrimoireLab (NoSQL) - PatchWork (API) - Others...
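
As an illustration of the retrieval step, a minimal sketch using Perceval, GrimoireLab's data-fetching component. The repository URL and local path are placeholders, and the module path follows current GrimoireLab releases rather than anything shown in the deck:

    from perceval.backends.core.git import Git

    # Fetch raw commit items from a Git repository (placeholder URL and path)
    repo = Git(uri='https://github.com/chaoss/grimoirelab-perceval.git',
               gitpath='/tmp/perceval.git')

    for item in repo.fetch():
        commit = item['data']
        print(commit['commit'], commit['Author'], commit['AuthorDate'])
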
  12. Data Cleaning Format issues: Message-ID in mbox; dates in Git,

    mbox, or Gerrit. Oddities in the data: commits or emails in the ’70s (Unix time 0); errors in data sources (migration of Gerrit projects); missing fields (Gerrit username without name or email); CVS -> SVN -> Git migrations.
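
A minimal sketch of such cleaning rules, assuming commits arrive as plain dictionaries (field names are invented for the example): implausibly old dates, such as Unix-time-0 commits, are flagged rather than trusted, and so are authors with no identity at all.

    SUSPICIOUS_BEFORE = 315532800  # 1980-01-01 in Unix time

    def clean_commit(commit):
        """Flag suspicious values instead of trusting them (illustrative rules)."""
        ts = commit.get('author_ts')  # hypothetical field: Unix timestamp
        # Commits "in the '70s" usually mean a date that defaulted to
        # Unix time 0, not real activity.
        if ts is None or ts < SUSPICIOUS_BEFORE:
            commit['suspicious_date'] = True
        # Gerrit accounts sometimes come without a name or an email.
        if not commit.get('author_name') and not commit.get('author_email'):
            commit['unknown_author'] = True
        return commit
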
  13. Data Massage Filter/Group/Aggregate, as similarly done in SQL: filter by

    date, filter by terms, apply functions (sum, percentiles, difference…). Enrich the data with extra info: affiliation per developer, time to close a bug, review time, gender, demographics. Studies: demographics, predictability, risk analysis, neutrality, bottlenecks, mentorship.
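
To illustrate the filter/group/aggregate step, a self-contained pandas sketch over a hypothetical bug table (all column names and values are invented for the example):

    import pandas as pd

    # Hypothetical table: one row per closed bug
    bugs = pd.DataFrame({
        'author': ['ana', 'bob', 'ana', 'eve'],
        'opened': pd.to_datetime(['2016-01-02', '2016-01-05',
                                  '2016-02-01', '2016-02-10']),
        'closed': pd.to_datetime(['2016-01-20', '2016-03-01',
                                  '2016-02-15', '2016-02-12']),
    })

    # Enrich: time to close, in days
    bugs['time_to_close'] = (bugs['closed'] - bugs['opened']).dt.days

    # Filter by date, then group and aggregate
    recent = bugs[bugs['closed'] >= '2016-02-01']
    print(recent.groupby('author')['time_to_close'].median())
    print(recent['time_to_close'].quantile([0.5, 0.95]))
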
  14. Hackathons It's great, we have a hackathon! • Organizations spend

    money on hackathons • And hackathons usually have a goal: ◦ Clean the ticketing system ◦ Attract and help new developers ◦ Advance specific functionalities ◦ Create, innovate How can we measure all of this?
  15. Initial Situation - "...trend on 'Persistent concerns about code review'"

    according to a survey - "...more than 1200 changesets waiting for review and the age of open changesets is increasing…" - "Cleaning up Gerrit alone will not solve this, but will help raising the awareness..." Extra info at Ticket T88531
  16. Goal "Overall aim: Reduce queue length and median age of

    unreviewed changesets with a focus on volunteer contributions”
  17. Goal (defining metrics) Total number of changesets waiting for a

    reviewer action... - and Unknown affiliation - and Independent affiliation - Unknown affiliation and open in the last three months - Independent affiliation and open in the last three months Before and After!
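
A sketch of how these counts could be computed from a changeset export, assuming a table with one row per changeset waiting for a reviewer and hypothetical 'affiliation' and 'opened' columns:

    import pandas as pd

    # Hypothetical export: one row per changeset waiting for a reviewer
    waiting = pd.DataFrame({
        'affiliation': ['Unknown', 'Independent', 'Wikimedia', 'Unknown'],
        'opened': pd.to_datetime(['2015-01-10', '2015-04-02',
                                  '2015-05-20', '2015-05-30']),
    })

    now = pd.Timestamp('2015-06-01')
    recent = waiting[waiting['opened'] >= now - pd.Timedelta(days=90)]

    for aff in ('Unknown', 'Independent'):
        print(aff,
              (waiting['affiliation'] == aff).sum(),   # total
              (recent['affiliation'] == aff).sum())    # last three months
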
  18. Data Sources Total number of changesets waiting for a reviewer

    action... - and Unknown affiliation: 628 (was 751, ~ -17%) - and Independent affiliation: 124 (was 159, ~ -23.1%) - Unknown affiliation and open in the last three months: 261 (was 335, ~ -22%) - Independent affiliation and open in the last three months: 53 (was 71, ~ -25.4%)
  19. Conclusions Goals partially completed - All WMF developer teams using

    Gerrit participated, although with different degrees of engagement. - The queue of changesets without any review was reduced by 18% (total) and 25% (last 3 months).
  20. Conclusions BUT - 752 changesets were still unreviewed (was 910),

    314 from the past 3 months (was 406). - We committed to the goal of 100% without a calculator; we are still happy about the 18%-24%.
  21. Context Issue: - Lots of new developers - Growth of

    the development activity - Decay in the code review process - Surveys to developers to understand this - Several hypotheses - They needed to control the process
  22. Context Issue: they needed to understand how the code review

    process took place https://www.flickr.com/photos/cantchangerandy/3050880952
  23. Detailing First Metrics Time Analysis: - Time to merge -

    Time to commit - Time to re-work - Cycle time - Time to first review ‘Complexity’ Analysis: - Versions per patch series - Merged patches: ‘touched’ files and lines - Comments per patch - Patches per patch series
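
A minimal sketch of the time analysis, assuming per-patch-series event timestamps are already available (column names invented for the example):

    import pandas as pd

    # Hypothetical timeline of one patch series' review events
    series = pd.DataFrame({
        'sent':         pd.to_datetime(['2016-03-01 10:00']),
        'first_review': pd.to_datetime(['2016-03-02 09:30']),
        'merged':       pd.to_datetime(['2016-03-10 18:00']),
    })

    series['time_to_first_review'] = series['first_review'] - series['sent']
    series['time_to_merge'] = series['merged'] - series['sent']
    print(series[['time_to_first_review', 'time_to_merge']])
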
  24. Architecture Original Data Sources -> Mining Tools (CVSAnalY, Mailing List Stats)

    -> Info Enrichment (Python scripts, Pandas, Jupyter Notebooks) -> Viz (ElasticSearch + Kibana)
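
For illustration, the last hop of this pipeline could look like the sketch below, using the official Elasticsearch Python client's generic index call; the index name and document fields are made up, not taken from the deck:

    from elasticsearch import Elasticsearch

    # Push one enriched review document into an index Kibana can chart
    es = Elasticsearch(['http://localhost:9200'])
    doc = {
        'patch_series': 'scheduler-refactor',  # hypothetical fields
        'versions': 3,
        'time_to_merge_days': 12.5,
    }
    es.index(index='code-review', body=doc)
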
  25. Parsing Process PATCH: keyword Each Thread (patch series) =

    Code Review Process 1 or more patches per patch series
  26. Parsing Process Email subject contains some info: - [PATCH vX

    x/y] Subject ‘Subject’ links to the specific commit in Git if merged But: - Hard to parse subjects (infinite options) - ‘Subjects’ tend to be slightly different when committing changes
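
A sketch of parsing such subjects, assuming the conventional '[PATCH vX x/y] Subject' shape; as the slide warns, real subjects vary far beyond what one regex covers:

    import re

    # Matches e.g. "[PATCH v2 1/3] Refactor scheduler"; version and
    # position in the series are both optional.
    PATCH_RE = re.compile(
        r'\[PATCH'
        r'(?:\s+v(?P<version>\d+))?'            # optional version
        r'(?:\s+(?P<num>\d+)/(?P<total>\d+))?'  # optional x/y position
        r'\]\s*(?P<subject>.+)')

    m = PATCH_RE.match('[PATCH v2 1/3] Refactor scheduler')
    if m:
        print(m.group('version'), m.group('num'),
              m.group('total'), m.group('subject'))
    # -> 2 1 3 Refactor scheduler
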
  27. Use Cases Community Use Cases: - Use Case 1: Identification

    of Top Reviewers - Use Case 2: Identification of imbalances between reviewing and contribution - Use Case 3: Identification of post-ACK comments
    Performance Use Cases: - Use Case 4: Identification of delays in the code review process due to versions - Use Case 5: Identification of delays in the code review process due to large patch series
    Backlog Use Cases: - Use Case 6: Identification of merged and non-merged patch series - Use Case 7: Identification of nearly completed patch series - Use Case 8: Identification of Hot/Warm/Tepid/Cold/Freezing/Dead patch series
  28. Context Issue: - No numbers about women in OpenStack -

    They need numbers to make decisions
  29. Git Activity and Population Women activity (all of the history):

    ~ 10.5% of the population ( ~ 570 developers ) ~ 6.8% of the activity ( >= 16k commits )
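
A quick arithmetic sanity check of the totals these shares imply, assuming the percentages are fractions of the whole history:

    # ~570 women developers are ~10.5% of the population
    print(round(570 / 0.105))     # ~5429 developers in total

    # >=16k commits by women are ~6.8% of the activity
    print(round(16_000 / 0.068))  # ~235k commits in total
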
  30. Git WOO (Women of OpenStack) Type of Contribution • Where do WOO contributions

    go? • Unfiltered order: Infra, Nova, Neutron, Doc, QA • Lots of activity in Doc, Infra, Neutron, Nova and Horizon
  31. Git WOO Evolution • March 2015: Extra activity in Ironic

    • June 2015: Extra activity in Doc and Puppet OpenStack • August 2015: Extra activity in Infra and Doc • November 2015: Extra activity in Doc and OpenStack Client
  32. Git WOO Evolution • March 2015: Extra activity in Ironic

    • June 2015: Extra activity in Doc and Puppet OpenStack • August 2015: Extra activity in Infra and Doc • November 2015: Extra activity in Doc and OpenStack Client
  33. Gerrit Reviews • ~ 1 Million reviews • ~ 400k

    ‘+2’ reviews • ~ 11k ‘-2’ reviews • ~ 325k ‘+1’ reviews • ~ 207k ‘-1’ reviews
  34. Gerrit Reviews Evolution by WOO Continuous increase Big Jump during

    the last year (compared to the general trend)