
Software Development Analytics Workshops: Case Study. Part 2

Bitergia

May 16, 2016

Transcript

  1. Software Development Analytics Daniel Izquierdo Cortázar @dizquierdo dizquierdo at bitergia

    dot com
  2. Outline Introduction Analytics from Scratch Women in OpenStack Xen Code

    Review Process
  3. Introduction Community of People Infrastructure

  4. Do you know your community? Mozilla Hispano Work Week 2013:

    https://blog.mozilla.org/blog/2010/11/09/happy-6th-birthday-firefox/
  5. Do you know your community? Mozilla Summit 2010 https://blog.mozilla.org/blog/2010/11/09/happy-6th-birthday-firefox/

  6. Do you know your community? OpenStack Summit 2013, Hong Kong

    http://redhatstackblog.redhat.com/tag/openstack-summit-hong-kong/
  7. Do you know your community? And other communities, such as

    a Rolling Stones concert
  8. Onions and Communities Communities are like onions Several layers Let’s

    peel, cut and cook it! http://en.wikipedia.org/wiki/File:Mixed_onions.jpg
  9. Community of People First layer: users - They use the

    software - They have questions - They use mailing lists, forums, Q&A sites
  10. Community of People Second layer: advanced users - Use the

    software - Help in the debugging process - File bug reports - They even send some patches to the ticketing system
  11. Community of People Third layer: occasional contributors - Use the

    software - Help in the debugging process - Contribute in the development process - They feel comfortable using IRC, Git and others
  12. Community of People Fourth layer: regular and core contributors -

    Use the software - Help in the debugging process - Contribute a lot in the development process - They take advantage of the whole infrastructure: Gerrit, Jenkins, and others.
  13. Community of People So the onion consists of people: 1.

    Users 2. Advanced users 3. Developers 4. And core developers And a lot of different roles and skills: translators, designers, developers, maintainers...
  14. Infrastructure Growth based on the infrastructure 1. Source code repos:

    Git, Mercurial, SVN 2. Discussions: Mailing lists, IRC, Askbot, Slack 3. Review process: Gerrit, issue tracking systems. And this helps to define the development process!
  15. Analytics from Scratch First Steps Visualization

  16. Analytics from Scratch - Data Sources - Others working on

    similar issues - Development of tools/scripts - Retrieval/Integration/Aggregation of information - Cleaning, massaging and understanding the data - Enrichment process
  17. Data Sources

  18. Data Retrieval NoSQL SQL GraphDB TimeSeriesDB Others... Examples: -

    MetricsGrimoire (SQL) - GrimoireLab (NoSQL) - PatchWork (API) - Others...
  19. Data Cleaning https://www.flickr.com/photos/christianhaugen/3437050979/in/photostream/

  20. Data Cleaning Format Issues Message-ID in mbox Dates in Git,

    mbox, or Gerrit Oddities in the data Commits or emails in the 70’s (Unix time 0) Errors in data sources (migration of Gerrit projects) Missing fields Gerrit username without name or email. CVS -> SVN -> Git migrations
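The cleaning oddities listed on the slide (epoch-era dates, missing identity fields) are easy to guard against in code. This is a minimal sketch; the record layout (`author_date`, `name`, `email`) is an assumption, not the schema of any particular tool.

```python
from datetime import datetime, timezone

def is_suspicious_date(ts: int) -> bool:
    """Flag timestamps at or near the Unix epoch (commits/emails 'in the 70s')."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return dt.year < 1980  # dates before 1980 are almost certainly broken

def clean_record(record: dict) -> dict:
    """Null out epoch-era dates and empty identity fields instead of guessing.

    `record` is a hypothetical raw item with 'author_date' (Unix seconds),
    'name' and 'email' keys, as one might export from Git or Gerrit.
    """
    cleaned = dict(record)
    if is_suspicious_date(cleaned.get("author_date", 0)):
        cleaned["author_date"] = None  # leave the gap; do not invent a date
    for field in ("name", "email"):
        if not cleaned.get(field):
            cleaned[field] = None  # e.g. Gerrit usernames without name/email
    return cleaned
```

Marking bad values as `None` (rather than dropping records) keeps the activity counts intact while excluding the records from time-based analyses.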
  21. Data Massage https://www.flickr.com/photos/valentinap/129820098/

  22. Data Massage Filter/Group/Aggregate, as done in SQL: filter by

    date, filter by terms, apply functions (sum, percentiles, difference…). Enrich the data with extra info: affiliation per developer, time to close a bug, review time, gender, demographics. Studies: demographics, predictability, risk analysis, neutrality, bottlenecks, mentorship
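The filter/group/aggregate pipeline from the slide can be sketched with plain Python (no pandas required). The bug records, field names, and dates below are made up for illustration; "time to close" is the enrichment field the slide mentions.

```python
from datetime import datetime
from statistics import median

# Hypothetical bug records: opened/closed dates plus the reporter's affiliation.
bugs = [
    {"opened": "2016-01-10", "closed": "2016-01-15", "affiliation": "Acme"},
    {"opened": "2016-02-01", "closed": "2016-02-20", "affiliation": "Acme"},
    {"opened": "2016-02-05", "closed": "2016-02-06", "affiliation": "Unknown"},
]

def days_to_close(bug):
    """Enrichment step: derive 'time to close' from the two date fields."""
    opened = datetime.strptime(bug["opened"], "%Y-%m-%d")
    closed = datetime.strptime(bug["closed"], "%Y-%m-%d")
    return (closed - opened).days

# Filter by a term, then aggregate, much as one would in SQL or pandas.
acme = [days_to_close(b) for b in bugs if b["affiliation"] == "Acme"]
median_close = median(acme)  # median of [5, 19] days
```

In practice the same operations would run as pandas `groupby`/`agg` calls or Elasticsearch aggregations; the stdlib version just makes the steps explicit.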
  23. Data Visualization

  24. Hackathons ROI

  25. Hackathons By Pierre-Selim Huard, CC BY 4.0, via Wikimedia Commons

  26. Hackathons It's great, we have a hackathon! • Organizations spend

    money on hackathons • And hackathons usually have a goal: ◦ Clean the ticketing system ◦ Attract and help new developers ◦ Advance specific functionalities ◦ Create, innovate How can we measure all of this?
  27. Gerrit Clean Up Day in Wikimedia

  28. Initial Situation - "...trend on "Persistent concerns about code review"

    according to a survey - "...more than 1200 changesets waiting for review and the age of open changesets is increasing…” - "Cleaning up Gerrit alone will not solve this, but will help raising the awareness..." Extra info at Ticket T88531
  29. Goal "Overall aim: Reduce queue length and median age of

    unreviewed changesets with a focus on volunteer contributions”
  30. Goal (defining metrics) Total number of changesets waiting for a

    reviewer action... - and Unknown affiliation - and Independent affiliation - Unknown affiliation and open in the last three months - Independent affiliation and open in the last three months Before and After!
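The four metrics defined on the slide boil down to counting open changesets by affiliation and age. A minimal sketch, assuming each changeset is a dict with `affiliation` and `opened` fields (illustrative names, not Gerrit's actual API schema):

```python
from datetime import datetime, timedelta

def waiting_counts(changesets, now, window_days=90):
    """Count open changesets awaiting reviewer action, split by affiliation
    and by whether they were opened in the last `window_days` days."""
    recent = now - timedelta(days=window_days)
    counts = {"Unknown": 0, "Independent": 0,
              "Unknown_recent": 0, "Independent_recent": 0}
    for cs in changesets:
        aff = cs["affiliation"]
        if aff in ("Unknown", "Independent"):
            counts[aff] += 1
            if cs["opened"] >= recent:
                counts[aff + "_recent"] += 1
    return counts
```

Running the same function on snapshots taken before and after the clean-up day gives the "Before and After!" comparison directly.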
  31. Data Sources

  32. Data Sources Total number of changesets waiting for a reviewer

    action... - and Unknown affiliation 628 (was 751 ~ -17%) - and Independent affiliation 124 (was 159 ~ -23.1%) - Unknown affiliation and open in the last three months 261 (was 335 ~ -22%) - Independent affiliation and open in the last three months 53 (was 71 ~ -25.4%)
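The percentage deltas on the slide are plain relative changes between the two snapshots; the helper below shows the calculation, using one of the slide's own before/after pairs:

```python
def pct_change(before: int, after: int) -> float:
    """Relative change between a 'before' and an 'after' count, in percent."""
    return (after - before) / before * 100

# e.g. recently opened unknown-affiliation changesets: 335 -> 261,
# which is roughly the -22% reported on the slide
drop = pct_change(335, 261)
```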
  33. Conclusions Goals partially completed - All WMF developer teams using

    Gerrit participated, although with different degrees of engagement. - The queue of changesets without any review was reduced by 18% (total) and 25% (last 3 months).
  34. Conclusions BUT - 752 changesets were still unreviewed (was 910),

    314 from the past 3 months (was 406). - We committed to the goal of 100% without a calculator; we are still happy with the 18%-24%.
  35. The Xen Code Review Process

  36. Context Issue: - Lots of new developers - Growth of

    the development activity - Decay in the code review process - Surveys to developers to understand this - Several hypotheses - They needed to control the process
  37. Context Issue: they needed to understand how the code review

    process took place https://www.flickr.com/photos/cantchangerandy/3050880952
  38. Detailing First Metrics Time Analysis: - Time to merge -

    Time to commit - Time to re-work - Cycle time - Time to first review ‘Complexity’ Analysis: - Versions per patch series - Merged patches: ‘touched’ files and lines - Comments per patch - Patches per patch series
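Each of the time metrics on the slide is a difference between two event timestamps on a patch series. A sketch under assumed field names (`sent`, `first_review`, `merged`, `committed` are illustrative, not Xen's actual schema):

```python
from datetime import datetime

def review_times(events):
    """Derive the slide's time metrics from a patch series' event timestamps.

    `events` maps event names to datetimes; all times are measured from the
    moment the series was first sent to the mailing list.
    """
    sent = events["sent"]
    return {
        "time_to_first_review": (events["first_review"] - sent).days,
        "time_to_merge": (events["merged"] - sent).days,
        "time_to_commit": (events["committed"] - sent).days,
    }
```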
  39. Architecture Original Data Sources Mining Tools CVSAnalY Mailing List Stats

    Info Enrich. Python Script Pandas Jupyter Notebooks Viz ElasticSearch + Kibana
  40. Parsing Process PATCH: keyword Each Thread (patch series) =

    Code Review Process 1 or more patches per patch series
  41. Parsing Process Email subject contains some info: - [PATCH vX

    x/y ] Subject ‘Subject’ links to the specific commit in Git if merged But: - Hard to parse subjects (infinite options) - ‘Subjects’ tend to be slightly different when committing changes
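A best-effort parser for subjects of the `[PATCH vX x/y] Subject` form could look like this; as the slide warns, real subjects have "infinite options", so this regex is a sketch, not the talk's actual implementation:

```python
import re

# Loose pattern for subjects like "[PATCH v3 2/7] xen: fix foo".
SUBJECT_RE = re.compile(
    r"\[PATCH(?:\s+v(?P<version>\d+))?(?:\s+(?P<num>\d+)/(?P<total>\d+))?\]"
    r"\s*(?P<subject>.*)",
    re.IGNORECASE,
)

def parse_subject(subject: str):
    """Extract version, patch number, series size and title from an email
    subject, defaulting to a single-patch v1 when parts are missing."""
    m = SUBJECT_RE.search(subject)
    if m is None:
        return None  # not a patch email
    return {
        "version": int(m.group("version")) if m.group("version") else 1,
        "num": int(m.group("num")) if m.group("num") else 1,
        "total": int(m.group("total")) if m.group("total") else 1,
        "subject": m.group("subject").strip(),
    }
```

The parsed title is what would then be matched (fuzzily, since subjects drift when committed) against commit messages in Git to link a review thread to its merged commit.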
  42. First Results

  43. First Results Disclaimer: 50% of commits found in the mailing

    list
  44. Use Cases Community Use Cases: - Use Case 1: Identification

    of Top Reviewers - Use Case 2: Identification of Imbalances between reviewing and contribution - Use Case 3: Identification of Post-ACK comments Performance Use Cases: - Use Case 4: Identification of delays in the code review process due to versions - Use Case 5: Identification of delays in the code review process due to large PatchSeries Backlog Use Cases: - Use Case 6: Identification of Merged and non-Merged PatchSeries - Use Case 7: Identification of nearly completed PatchSeries - Use Case 8: Identification of Hot/Warm/Tepid/Cold/Freezing/Dead PatchSeries
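Use Case 8 amounts to bucketing patch series by how long ago they last saw activity. The talk does not give the exact cut-offs, so the day thresholds below are assumptions chosen only to illustrate the idea:

```python
from datetime import datetime

# Illustrative bucket boundaries in days since last activity (assumed values).
BUCKETS = [(7, "Hot"), (30, "Warm"), (90, "Tepid"),
           (180, "Cold"), (365, "Freezing")]

def classify_series(last_activity: datetime, now: datetime) -> str:
    """Bucket a patch series as Hot/Warm/Tepid/Cold/Freezing/Dead by the
    age of its most recent activity (email, review comment, new version)."""
    age = (now - last_activity).days
    for limit, label in BUCKETS:
        if age <= limit:
            return label
    return "Dead"
```

Grouping the backlog this way makes it easy to see which series are still worth a reviewer ping and which are effectively abandoned.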
  45. Use Cases: Top Reviewers

  46. Use Cases: Imbalances

  47. Use Cases: Post ACK Comments

  48. Use Cases: Code Review Delays

  49. Use Cases: Code Review Delays

  50. Use Cases: Code Review Delays

  51. Use Cases: Merged PatchSeries

  52. Use Cases: Nearly Completed

  53. Use Cases: Timeframes

  54. Gender-diversity In OpenStack Use Case Definition Dashboard

  55. Context Issue: - No numbers about women in OpenStack -

    They need numbers to make decisions
  56. Git Activity and Population Women's activity (whole history):

    ~ 10.5% of the population ( ~ 570 developers ) ~ 6.8% of the activity ( >=16k commits )
  57. Git WOO Type of Contribution • Where do WOO contributions

    go? • Unfiltered order: Infra, Nova, Neutron, Doc, QA • Lots of activity in Doc, Infra, Neutron, Nova and Horizon
  58. Git WOO Evolution • March 2015: Extra activity in Ironic

    • June 2015: Extra activity in Doc and Puppet OpenStack • August 2015: Extra activity in Infra and Doc • November 2015: Extra activity in Doc and OpenStack Client
  59. Git WOO Evolution • March 2015: Extra activity in Ironic

    • June 2015: Extra activity in Doc and Puppet OpenStack • August 2015: Extra activity in Infra and Doc • November 2015: Extra activity in Doc and OpenStack Client
  60. Gerrit Reviews • ~ 1 Million reviews • ~ 400k

    ‘+2’ reviews • ~ 11k ‘-2’ reviews • ~ 325k ‘+1’ reviews • ~ 207k ‘-1’ reviews
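Tallies like those on the slide come from counting the vote value attached to each review record. A sketch, assuming each record carries a numeric `score` field (the field name is illustrative, not necessarily Gerrit's REST output):

```python
from collections import Counter

def tally_scores(reviews):
    """Count Gerrit votes (+2 / +1 / -1 / -2) across a list of review records."""
    return Counter(r["score"] for r in reviews)

# Toy input; the real analysis runs over ~1 million reviews.
votes = tally_scores([{"score": 2}, {"score": 2}, {"score": -1}, {"score": 1}])
```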
  61. Gerrit Reviews Evolution by WOO Continuous increase Big Jump during

    the last year (compared to the general trend)
  62. Dashboard Demo...

  63. Open Discussion