Gender-diversity analysis of the technical contributions in OpenStack

Presented at OpenStack Summit in Austin, Texas.

April 28, 2016


  1. Gender-diversity analysis of technical contributions. Daniel Izquierdo Cortázar (@dizquierdo), dizquierdo at bitergia dot com. OpenStack Summit, Austin 2016
  2. Outline: Introduction; First Steps; Some numbers and method; Conclusions
  3. Introduction: A bit about me; Why this analysis; What we have so far
  4. /me: CDO at Bitergia, the software development analytics company. Involved in the OpenStack Activity Board. Involved in the OpenStack Quarterly Reports. Disclaimer: not involved in the WOO working group; this is my own analysis and interest, and I may have missed some stuff...
  5. Why this study: Diversity matters. I attended some WOO (Women of OpenStack) talks in Tokyo. There are no numbers about technical contributions (AFAIK). How is this evolving? Is gender diversity increasing? In the end, this is all about transparency and improvement.
  6. What we have so far. OpenStack-related resources:
    - LinkedIn WOO: 600 members, 137 discussions
    - WOO mailing list: 380 emails, 140 threads, 90 participants
    - WOO wiki
  7. What we have so far. Others of interest:
    FOSS Survey in 2013:
    - http://floss2013.libresoft.es/results.en.html
    - 11% of the survey respondents were women
    The Industry Gender Gap by the World Economic Forum:
    - Women: 5% of CEOs, 21% of mid-level roles, 32% of junior roles
  8. Some companies: Pinterest, engineering-focused employees. https://blog.pinterest.com/en/our-plan-more-diverse-pinterest
  9. Some companies: Google, tech-focused employees. http://www.google.com/diversity/
  10. Some companies: Facebook, tech-focused employees. http://newsroom.fb.com/news/2015/06/driving-diversity-at-facebook/
  11. Some companies: Dropbox, all employees. https://blogs.dropbox.com/dropbox/2014/11/strengthening-dropbox-through-diversity/

  12. Summary. Conclusions are not representative, but:
    - Women represent around 30% to 40% of the workforce in tech companies.
    - And between 10% and 20% when focusing on tech teams.
    - What about OpenStack?
  13. First Steps

  14. Some Definitions. Technical contributions: commit, upload, Gerrit vote. Other potential metrics: diversity by company, fairness in code review among organizations and genders, transparency in the process. Available but sensitive info: affiliation, countries, time to review.
  15. First Steps: Names databases; Genderize.io; Manual analysis; Focus on main

  16. Architecture: Original Data Sources; Mining Tools (CVSAnalY, Bicho, SortingHat); Info Enrich. (Genderize.io, Pandas, Jupyter Notebooks, Manual work); Viz (ElasticSearch + Kibana)
  17. Architecture: Original Data Sources
    • Git and Gerrit repos based on yaml at Governance
    • ~ 370k commits
    • ~ 250k changesets
    • ~ 840k patchset uploads
    • ~ 1,124k patch code reviews
  18. Architecture: Mining Tools (CVSAnalY, Bicho, SortingHat)
    • CVSAnalY and Bicho databases publicly available
    • Activity Board (http://activity.openstack.org/dash/browser/data/db/)
    • SortingHat db available upon request (sensitive info)
    • Bonus: now migrating to GrimoireLab
  19. Architecture: Info Enrich. (Genderize.io, Pandas, Jupyter Notebooks, Manual work)
    • Genderize.io: name database
    • Pandas: data analysis library
    • Jupyter Notebook: web app for data analysis
    • Manual work:
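The enrichment step described on this slide can be sketched roughly as follows. This is a minimal illustration, not the pipeline used in the talk: it uses only the Python standard library instead of Pandas, the response fields (`gender`, `probability`) reflect the public Genderize.io API as I understand it, and `first_name`, `classify` and the `min_probability` cut-off are hypothetical names chosen for the example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public endpoint of the name database (assumed; check the service docs).
GENDERIZE_URL = "https://api.genderize.io/"


def first_name(author):
    """Take the first token of a commit author string as the given name."""
    tokens = author.strip().split()
    return tokens[0].lower() if tokens else ""


def lookup(name):
    """Query Genderize.io for one name (needs network access)."""
    with urlopen(GENDERIZE_URL + "?" + urlencode({"name": name})) as resp:
        return json.load(resp)


def classify(response, min_probability=0.9):
    """Map an API response to 'female', 'male' or 'unknown'.

    min_probability is a hypothetical cut-off: anything below it is
    left as 'unknown' for the manual-work step.
    """
    gender = response.get("gender")
    probability = response.get("probability") or 0.0
    if gender is None or probability < min_probability:
        return "unknown"
    return gender
```

Usage would look like `classify(lookup(first_name("Jane Doe")))`. Taking the first token as the given name is a simplification that fails for naming orders where the family name comes first, which is one reason a manual pass is still needed.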
  20. Architecture: Viz (ElasticSearch + Kibana)
    • ElasticSearch: schemaless db
    • Kibana: works great with ES
    • This tandem helps a lot to verify info
    • Drill-down capabilities
    • Extra info available (but not displayed)
  21. Validation: manual work. Check main contributors by hand. Be sure the WOO Wiki contributors are correct. Asian names are hard to check ( u_u ) (help needed!). Others...
  22. Some numbers: Git Contributions; Gerrit Reviews; Demographics

  23. Git Overview
    • Aggregated historical data
    • Repos based on the Governance yaml file
  24. Git Activity and Population. Women activity (all of the history):
    ~ 10.5% of the population ( ~ 570 developers )
    ~ 6.8% of the activity ( >= 16k commits )
  25. Git Activity and Population. Women activity (last year):
    ~ 11% of the population ( ~ 340 active developers )
    ~ 9% of the activity ( >= 6k commits )
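The two figures on these slides are different ratios: one counts developers, the other counts commits. A small sketch of how both shares could be derived from a list of commits, assuming each commit already carries an inferred gender. The sample data and the `shares` helper are hypothetical and only illustrate the arithmetic; they are not the talk's dataset.

```python
from collections import Counter

# Hypothetical sample: (author id, inferred gender) for each commit.
commits = [
    ("dev1", "female"),
    ("dev2", "male"),
    ("dev3", "male"), ("dev3", "male"), ("dev3", "male"),
    ("dev4", "male"), ("dev4", "male"), ("dev4", "male"),
]


def shares(commits, gender="female"):
    """Return (population share, activity share) for one gender, in percent."""
    authors = dict(commits)                    # author id -> gender
    population = Counter(authors.values())     # developers per gender
    activity = Counter(g for _, g in commits)  # commits per gender
    pop_share = 100.0 * population[gender] / len(authors)
    act_share = 100.0 * activity[gender] / len(commits)
    return pop_share, act_share


pop_share, act_share = shares(commits)
print(f"{pop_share:.1f}% of the population, {act_share:.1f}% of the activity")
```

In this toy sample one developer out of four is a woman (25.0% of the population) but she authored one commit out of eight (12.5% of the activity), the same kind of gap between population and activity that the slides report.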
  26. Git WOO Main Projects
    • Where do WOO contributions go?
    • Unfiltered order: Infra, Nova, Neutron, Doc, QA
    • Lots of activity in Doc, Infra, Neutron, Nova and Horizon
  27. Git WOO Type of Contribution
    • Where do WOO contributions go?
    • Unfiltered order: Infra, Nova, Neutron, Doc, QA
    • Lots of activity in Doc, Infra, Neutron, Nova and Horizon
  28. Git WOO Evolution
    • Similar trend to the overall evolution
    • But slightly better during the last year
    • Peaks during March, June, August and November 2015 (any clue?)
  29. Git WOO Evolution (peaks)
    • March 2015: Extra activity in Ironic
    • June 2015: Extra activity in Doc and Puppet OpenStack
    • August 2015: Extra activity in Infra and Doc
    • November 2015: Extra activity in Doc and OpenStack Client
  30. Gerrit Overview
    • As an example, the aggregated history of the project
    • Repos based on the Governance yaml file
  31. Gerrit Reviews
    • ~ 1 million reviews
    • ~ 400k ‘+2’ reviews
    • ~ 11k ‘-2’ reviews
    • ~ 325k ‘+1’ reviews
    • ~ 207k ‘-1’ reviews
  32. Gerrit Reviews Evolution by WOO: Continuous increase. Big jump during the last year (compared to the general trend).
  33. Gerrit Reviews Evolution by WOO: And that jump is even higher when checking ‘+2’ reviews. Up to 3 times the ‘+2’ activity from 2014 to 2015. (This behaviour does not follow the general trend.)
  34. Demographics: Attraction of female developers to the community. Peak in 2015 Q3 with 62 developers. [chart measures the first contribution by each developer, grouped by quarter]
  35. Demographics: Female developers leaving the community. [active developer = at least one commit during the last year] [chart measures the last contribution by each developer, grouped by quarter]
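The two demographic charts (first contribution and last contribution per developer, grouped by quarter) can be approximated with a sketch like the one below. It assumes a flat list of (developer, commit date) pairs and treats a developer as having left when their last commit is more than a year old, following the slide's definition of an active developer. The sample data, the function names and the `inactive_days` parameter are hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical sample: (developer id, commit date).
commits = [
    ("dev1", date(2014, 2, 10)), ("dev1", date(2014, 8, 3)),
    ("dev2", date(2015, 7, 21)),
    ("dev3", date(2015, 9, 30)), ("dev3", date(2016, 1, 15)),
]


def quarter(day):
    """Label a date with its calendar quarter, e.g. '2015 Q3'."""
    return f"{day.year} Q{(day.month - 1) // 3 + 1}"


def first_and_last(commits):
    """Return {developer: (first commit date, last commit date)}."""
    spans = {}
    for dev, day in commits:
        first, last = spans.get(dev, (day, day))
        spans[dev] = (min(first, day), max(last, day))
    return spans


def attracted_and_left(commits, today, inactive_days=365):
    """Count developers by quarter of first commit and of last commit.

    Only developers with no commit in the last `inactive_days` days
    are counted as having left.
    """
    spans = first_and_last(commits)
    attracted = Counter(quarter(first) for first, _ in spans.values())
    left = Counter(
        quarter(last)
        for _, last in spans.values()
        if (today - last).days > inactive_days
    )
    return attracted, left
```

With `today = date(2016, 4, 28)`, the sample yields two developers attracted in 2015 Q3 and one in 2014 Q1, and one developer who left in 2014 Q3.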
  36. Demographics: extra bonus. When did the developers contributing during the last quarter join the community? And who are they? Working for? Working at?
  37. Demographics: extra bonus. And the other way around: how good are we at retaining developers that entered in 2013 Q1? (And who are they? Working for? Working at?) [19 attracted in 2013 Q1. 6 left in that quarter. 7 are still contributing. Another 6 left in other periods]
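As a worked example of the cohort figures on this slide, the breakdown of the 19 developers attracted in 2013 Q1 can be turned into rates. The counts come from the slide; the dictionary keys and the `breakdown` helper are illustrative names only.

```python
# Counts for the 2013 Q1 cohort, as given on the slide.
ATTRACTED = 19
cohort = {
    "left_same_quarter": 6,
    "still_contributing": 7,
    "left_in_other_periods": 6,
}


def breakdown(cohort, attracted):
    """Return each group as a percentage of the cohort, rounded to 0.1."""
    if sum(cohort.values()) != attracted:
        raise ValueError("groups do not add up to the cohort size")
    return {k: round(100 * v / attracted, 1) for k, v in cohort.items()}


rates = breakdown(cohort, ATTRACTED)
print(rates)
```

The three groups add up to the full cohort (6 + 7 + 6 = 19), so roughly 37% of that cohort was still contributing at the time of the analysis and about 32% left within the quarter in which they arrived.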
  38. Analysis: Is Outreachy helping to reduce the gender gap?

  39. Outreachy: “Outreachy helps people from groups underrepresented in free and open source software get involved”. Is Outreachy helping to decrease the gender gap in OpenStack? How is the community performing at retaining these developers? And how is the overall performance of the community at retaining developers?
  40. Outreachy: Studied 4 periods. 2 devs. still contributing (commits). Better retention. More attracted women (mostly paid by orgs.).
  41. Outreachy

  42. Outreachy: More women are attracted and retained (also in relative numbers) thanks to organizations in OpenStack. Even so, some numbers from tech companies show a higher % of women. Is it worth investing other resources in companies to kindly let them know about this? What about exploring high-school-focused actions (prior to degree studies)? Disclaimer: just some ideas!
  43. Conclusions: Answer to First Questions; Data to Make Decisions; Open

  44. Some Answers: Continuous increase of activity and population (up to 11%). Outstanding increase in core review contributions. Most of the women come as new orgs. join the Foundation. Tooling is useful to have numbers, compare and make decisions.
  45. Conclusions: Room for improvement of the dataset. This provides some initial numbers about the current status. Hopefully useful for the WOO working group and the project.
  46. Open Paths. How this may help with the challenges detailed by the WOO:
    - Close to 550 female developers (more than 200 with 100% probability)
    - Talk to them, send an email, let them participate, have meetings, ask for mentorships
    - Detection of new women entering the community: say hello!
    https://wiki.openstack.org/wiki/Women_of_OpenStack
  47. Further Work. Sensitive info: dashboard still private. Extra analysis: fairness in time to merge, % of women by company, Outreachy follow-ups, quarterly reports, updated data, ROI of specific policies, and others. This [hopefully] helps to get a better picture. Looking for sponsors!