Gender-diversity analysis of technical contributions in OpenStack (Barcelona edition)

OpenStack Summit, Barcelona 2016.
Daniel Izquierdo.



October 26, 2016


  1. Gender-diversity analysis of technical contributions. Daniel Izquierdo Cortázar, @dizquierdo, dizquierdo at bitergia dot com. https://speakerdeck.com/bitergia. OpenStack Summit, Barcelona 2016
  2. Outline: Introduction; First Steps; Some numbers and method; Conclusions

  3. Introduction: A bit about me; Why this analysis; What we have so far
  4. /me: CDO at Bitergia, the software development analytics company. Lately involved in understanding gender diversity in some OSS communities. Involved in the OPNFV dashboard (opnfv.biterg.io). Disclaimer: not involved in any working group; this is my own analysis and interest, and I may have missed some things.
  5. Why this study: Diversity matters. I attended some Women of OpenStack talks at the OpenStack Summits in Tokyo and Austin. Produced some numbers that gained some attention: OpenStack and Linux Kernel. In the end this is all about transparency and improvement. Update the numbers.
  6. What we have so far. FOSS Survey in 2013 (http://floss2013.libresoft.es/results.en.html): 11% of the respondents were women. The Industry Gender Gap, by the World Economic Forum: women hold 5% of CEO positions, 21% of mid-level roles and 32% of junior roles.
  7. Some companies: Pinterest. Engineering-focused employees. https://blog.pinterest.com/en/our-plan-more-diverse-pinterest

  8. Some companies Google Tech focused employees. http://www.google.com/diversity/

  9. Some companies: Facebook. Tech-focused employees. http://newsroom.fb.com/news/2015/06/driving-diversity-at-facebook/

  10. Some companies: Dropbox. All employees. https://blogs.dropbox.com/dropbox/2014/11/strengthening-dropbox-through-diversity/

  11. OpenStack (Austin) numbers. Women activity (all of the history): ~10.5% of the population (~570 developers); ~6.8% of the activity (>=16k commits)
  12. OpenStack (Austin) numbers. Women activity (last year): ~11% of the population (~340 active developers); ~9% of the activity (>=6k commits)
  13. Linux Kernel numbers. Women activity (since 2005): ~5.2% of the activity (>31K commits); ~8% of the population (~1.15K developers)
  14. Linux Kernel numbers. Women activity (last year): ~6.8% of the activity (~4k commits); ~9.9% of the population (~330 active developers)
  15. Summary. The conclusions are not representative, but: - Women represent around 30% to 40% of the workforce in tech companies. - Between 10% and 20% when the focus is on tech teams. - In OpenStack, women are about 11% of the population. - In the Linux Kernel, women are about 10% of the population.
  16. First Steps

  17. Some Definitions. Contributions: commit, patchset, code review, email. Other potential metrics: diversity by company, fairness in the code review among organizations and genders, transparency in the process. Available but sensitive info: affiliation, countries, time to review.
  18. First Steps: Names databases; Genderize.io; Manual analysis; Focus on main

  19. Architecture: Original Data Sources; Mining Tools (Perceval); Info Enrich. (Genderize.io, Pandas, Manual work); Viz (ElasticSearch + Kibana)
  20. Architecture: Original Data Sources • Git and mailing lists • ~600K commits (359K w/o merges and deb) • ~150K emails • ~300K changesets (w/o deb) • ~1M patchsets (w/o deb) • ~1.5M code reviews (w/o deb)
  21. Architecture: Mining Tools (Perceval) • Produces JSON documents from the usual data sources in OSS • Part of the GrimoireLab toolchain • grimoirelab.github.io
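As a rough illustration of what working with such JSON documents can look like, the sketch below extracts an author name and email from a Perceval-style Git item. The item layout (a `data` dictionary whose `Author` field has the form `Name <email>`) is an assumption about the Git backend output, the sample document is made up, and `split_author` and `first_name` are hypothetical helpers, not part of Perceval.

```python
import json
import re

# Hypothetical sample shaped like a Perceval Git item (layout is an assumption).
SAMPLE_ITEM = json.loads("""
{
  "backend_name": "Git",
  "data": {
    "Author": "Jane Doe <jane@example.com>",
    "AuthorDate": "Wed Oct 26 10:00:00 2016 +0200",
    "commit": "abc123"
  }
}
""")

# Matches 'Name <email>' and captures both parts.
AUTHOR_RE = re.compile(r"^\s*(?P<name>.*?)\s*<(?P<email>[^>]*)>\s*$")


def split_author(author_field):
    """Split 'Name <email>' into (name, email); email is None if absent."""
    match = AUTHOR_RE.match(author_field)
    if match is None:
        return author_field.strip(), None
    return match.group("name"), match.group("email")


def first_name(full_name):
    """Return the first whitespace-separated token, or '' for empty names."""
    parts = full_name.split()
    return parts[0] if parts else ""


name, email = split_author(SAMPLE_ITEM["data"]["Author"])
print(name, email, first_name(name))
```

The first name is the piece a name database would later be queried with; everything else stays attached to the item for drill-down.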
  22. Architecture: Info Enrich. (Genderize.io, Pandas, Manual work) • Genderize.io: name database • Pandas: data analysis lib. • Ceres library (dicortazar/ceres @ github) • Manual work:
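The enrichment step can be pictured with a minimal sketch like the one below. It does not call Genderize.io; a small in-memory dictionary (`NAME_DB`, invented for this example) stands in for the name database, and the 0.9 probability threshold is an assumption, not a value taken from the talk. Names that are missing or ambiguous are labelled `unknown`, which is where the manual work would come in.

```python
# Hypothetical stand-in for a name database: first name -> (gender, probability).
NAME_DB = {
    "maria": ("female", 0.98),
    "daniel": ("male", 0.99),
    "alex": ("male", 0.60),
}

MIN_PROBABILITY = 0.9  # assumed cut-off; below it the guess is not trusted


def guess_gender(full_name, name_db=NAME_DB, min_probability=MIN_PROBABILITY):
    """Return 'female', 'male' or 'unknown' based on the first name only."""
    tokens = full_name.split()
    if not tokens:
        return "unknown"
    entry = name_db.get(tokens[0].lower())
    if entry is None:
        return "unknown"
    gender, probability = entry
    return gender if probability >= min_probability else "unknown"


authors = ["Maria Lopez", "Daniel Smith", "Alex Kim", "Xyz"]
labelled = {author: guess_gender(author) for author in authors}

# Authors the lookup could not settle are queued for checking by hand.
needs_manual_review = [a for a, g in labelled.items() if g == "unknown"]
print(labelled)
print(needs_manual_review)
```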
  23. Architecture: Viz (ElasticSearch + Kibana) • ElasticSearch: schemaless db • Kibana: works great with ES • This tandem helps a lot to verify info • Drill-down capabilities • Extra info available (but not displayed)
  24. Validation: manual work. Check main contributors by hand. Asian names are hard to check ( u_u ) (help needed!). Mailing lists provide data in an unexpected format.
  25. Some numbers: Git Contributions; Gerrit Contributions; Mailing Lists Contributions

  26. Git Overview • Aggregated historical data • Yaml file

  27. Git Activity and Population. Women activity (all history): 27,162 commits (7.35% of the activity); 839 developers (10.63% of the population)
  28. Git Activity and Population. Women activity (last year): 7,748 commits (8.58% of the activity); 422 developers (11.53% of the population)
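The two percentages reported on these slides measure different things: share of activity counts commits, while share of population counts distinct developers. A minimal sketch of that distinction, using made-up toy data and a hypothetical `shares` helper (the talk's own pipeline used Pandas; plain Python is used here only to keep the example self-contained):

```python
# Toy data: one (author, gender) row per commit. All values are invented.
commits = [
    ("ana", "female"), ("ana", "female"),
    ("bob", "male"), ("bob", "male"), ("bob", "male"),
    ("cy", "male"),
    ("dee", "female"),
    ("eli", "male"), ("eli", "male"), ("eli", "male"),
]


def shares(rows, gender="female"):
    """Return (activity %, population %) for the given gender label."""
    if not rows:
        return 0.0, 0.0
    # Activity: fraction of commits authored by the group.
    group_commits = sum(1 for _, g in rows if g == gender)
    # Population: fraction of distinct authors belonging to the group.
    authors = {author: g for author, g in rows}
    group_authors = sum(1 for g in authors.values() if g == gender)
    activity_pct = 100.0 * group_commits / len(rows)
    population_pct = 100.0 * group_authors / len(authors)
    return activity_pct, population_pct


activity, population = shares(commits)
print(f"{activity:.2f}% of the activity, {population:.2f}% of the population")
```

When the population share is higher than the activity share, as in the slides, the group's members are on average contributing fewer commits each than the rest of the community.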
  29. Git Activity Women Evolution • There are jumps at the beginning of the year • Stable during the last year
  30. Git Authors Women Evolution • Interesting pattern: each year is kind of a block of developers
  31. Mailing Lists Overview • Openstack-dev • Language-oriented mailing lists

  32. Mailing List WOO (Women of OpenStack) Activity • 14K emails, 9% of the activity • 672 WOO participants, 9.07% of the population • Similar numbers to the last year
  33. Code Reviews Overview • Projects not found in the yaml file were ignored • Package-deb project also ignored
  34. Gerrit WOO Activity • 28,503 changesets sent, 9.4% of the activity • 812 women sending changesets, 11.87% of the population • 9.56% of the activity and 13% of the population during the last year. (Chart: women sending changesets)
  35. Analysis Comparison with the status in Austin (6 months ago)

  36. October-April-October (Git): October 2015 - April 2016; April 2016 - October 2016
  37. October-April-October: October 2015 - April 2016; April 2016 - October 2016. (Chart: women reviewers)
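A comparison between two six-month windows like these can be sketched as follows. The exact window boundaries (first day of the month, half-open intervals) are assumptions, since the slides only name the months; the rows are invented and `activity_share` is a hypothetical helper.

```python
from datetime import date


def activity_share(rows, start, end, gender="female"):
    """Percentage of contributions by `gender` with start <= day < end.

    Returns None when the window contains no contributions at all.
    """
    in_window = [g for day, g in rows if start <= day < end]
    if not in_window:
        return None
    return 100.0 * sum(1 for g in in_window if g == gender) / len(in_window)


# Toy data: one (date, gender) row per contribution. All values are invented.
rows = [
    (date(2015, 11, 3), "female"),
    (date(2015, 12, 1), "male"),
    (date(2016, 1, 15), "male"),
    (date(2016, 3, 20), "male"),
    (date(2016, 5, 2), "female"),
    (date(2016, 6, 9), "female"),
    (date(2016, 8, 30), "male"),
    (date(2016, 9, 12), "male"),
]

first = activity_share(rows, date(2015, 10, 1), date(2016, 4, 1))
second = activity_share(rows, date(2016, 4, 1), date(2016, 10, 1))

# Difference in percentage points between the two windows.
change_in_points = second - first
print(first, second, change_in_points)
```

Comparing shares rather than raw counts keeps the result meaningful when the overall volume of contributions changes between windows.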
  38. Conclusions: Answer to First Questions; Data to Make Decisions; Open

  39. Some Answers • Similar activity in Git: increase in the number of repositories • Lower WOO activity as core reviewers (~ -9%) ◦ On the other hand, activity has increased (~6%)
  40. Conclusions. Room for improvement of the dataset. This provides some initial numbers about the current status. Hopefully useful for the Foundation.
  41. Open Questions from Last Talk. Question: Is there a specific action for helping you with the data correctness or the name identification? Suggestion: integrate the OpenStack ID with Gerrit and with the Foundation members directory, which holds specific information related to gender. Video: https://www.youtube.com/watch?v=TQIQCT-Aqpo
  42. Open Questions from Last Talk. Comment: the reason why the documentation project is doing so well is that it has great inclusive leaders. Comment: another interesting point is 'retention': how to bring them on board and keep them contributing. Video: https://www.youtube.com/watch?v=TQIQCT-Aqpo
  43. Open Questions from Last Talk. Suggestion: work on relative numbers and not so much on the net numbers. As projects come and go, it would be interesting to work at this level. Comment: working at the high-school level; has work been done in the USA/Europe? People are willing to help along this line. Suggestion: address people outside of the gender binary. Video: https://www.youtube.com/watch?v=TQIQCT-Aqpo
  44. Further Work. Sensitive info: dashboard still private. Extra analysis: time-to-merge fairness, percentage of women per company, Outreachy follow-ups, quarterly reports, updated data, ROI of specific policies, and others. This [hopefully] helps to have a better picture. Analysis of other minorities could be done.