Bitergia
October 26, 2016

Gender-diversity analysis of technical contributions in OpenStack (Barcelona edition)

OpenStack Summit, Barcelona 2016.
Daniel Izquierdo.



Transcript

  1. Gender-diversity analysis of technical contributions

    Daniel Izquierdo Cortázar
    @dizquierdo
    dizquierdo at bitergia dot com
    https://speakerdeck.com/bitergia
    OpenStack Summit, Barcelona 2016
  2. /me

    CDO at Bitergia, the software development analytics company
    Lately involved in understanding the gender diversity in some OSS communities
    Involved in the OPNFV dashboard (opnfv.biterg.io)
    Disclaimer: not involved in any working group, own analysis and interest, I may have missed some stuff...
  3. Why this study

    Diversity matters
    I attended some (Women of OpenStack) talks at the OpenStack Summit (Tokyo and Austin)
    Produced some numbers that gained some attention: OpenStack and Linux Kernel
    In the end this is all about transparency and improvement
    Update the numbers
  4. What we have so far

    FOSS Survey in 2013:
    - http://floss2013.libresoft.es/results.en.html
    - 11% of the respondents were women
    The Industry Gender Gap by the World Economic Forum.
    - 5% of CEOs, 21% of mid-level roles, 32% of junior roles
  5. OpenStack (Austin) numbers

    Women activity (all of the history):
    ~10.5% of the population (~570 developers)
    ~6.8% of the activity (>=16k commits)
  6. OpenStack (Austin) numbers

    Women activity (last year):
    ~11% of the population (~340 active developers)
    ~9% of the activity (>=6k commits)
  7. Linux Kernel Numbers

    Women activity (since 2005):
    ~5.2% of the activity (>31K commits)
    ~8% of the population (~1.15K developers)
  8. Linux Kernel Numbers

    Women activity (last year):
    ~6.8% of the activity (~4k commits)
    ~9.9% of the population (~330 active developers)
  9. Summary

    Conclusions not representative, but:
    - Women represent around 30% to 40% of the workforce in tech companies.
    - And between 10% and 20% if focused on tech teams.
    - OpenStack shows 11% of the population
    - Linux Kernel shows 10% of the population
  10. Some Definitions

    Contributions: commit, patchset, code review, email
    Other potential metrics: diversity by company, fairness in the code review among organizations and genders, transparency in the process
    Available but sensitive info: affiliation, countries, time to review
  11. Architecture: Original Data Sources

    • Git and mailing lists
    • ~600K commits (359K w/o merges and deb)
    • ~150K emails
    • ~300K changesets (w/o deb)
    • ~1M patchsets (w/o deb)
    • ~1.5M code reviews (w/o deb)
  12. Architecture: Mining Tools (Perceval)

    • Produces JSON documents from the usual data sources in OSS
    • Part of the GrimoireLab toolchain
    • grimoirelab.github.io
  13. Architecture: Info Enrichment (Genderize.io, Pandas, manual work)

    • Genderize.io: name database
    • Pandas: data analysis lib.
    • Ceres library (dicortazar/ceres @ github)
    • Manual work:
  14. Architecture: Visualization (ElasticSearch + Kibana)

    • ElasticSearch: Schemaless db
    • Kibana: works great with ES
    • This tandem helps a lot to verify info
    • Drill down capabilities
    • Extra info available (but not displayed)
  15. Validation: manual work

    Check main contributors by hand
    Asian names hard to check ( u_u ) (help needed!)
    Mailing lists provide data in an unexpected format
  16. Git Activity and Population

    Women activity (all history):
    27,162 commits (7.35% of the activity)
    839 developers (10.63% of the population)
  17. Git Activity and Population

    Women activity (last year):
    7,748 commits (8.58% of the activity)
    422 developers (11.53% of the population)
  18. Git Activity Women Evolution

    • There are jumps at the beginning of the year
    • Stable during the last year
  19. Mailing List WOO Activity

    • 14K emails, 9% of the activity
    • 672 WOO participants, 9.07% of the population
    • Similar numbers to the last year
  20. Code Reviews Overview

    • Projects not found in the yaml file were ignored
    • Package-deb project also ignored
  21. Gerrit WOO Activity

    • 28,503 changesets sent, 9.4% of the activity
    • 812 women sending changesets, 11.87% of the population
    • 9.56% of the activity and 13% of the population during the last year
    Women sending changesets
  22. Some Answers

    • Similar activity in Git: increase in the number of repositories
    • WOO lower activity as core reviewers (~ -9%)
      ◦ Activity has increased on the other hand (~ 6%)
  23. Conclusions

    Room for improvement of the dataset
    This provides some initial numbers about the current status
    Hopefully useful for the Foundation
  24. Open Questions from Last Talk

    Question: Is there a specific action for helping you with the data correctness or the name identification?
    Suggestion: integrate the OpenStack ID with Gerrit; the Foundation members directory has specific information related to gender
    Video: https://www.youtube.com/watch?v=TQIQCT-Aqpo
  25. Open Questions from Last Talk

    Comment: the reason why the documentation project is doing so well is that it has great inclusive leaders
    Comment: Another interesting point is 'retention': how to bring them on board and keep them contributing
    Video: https://www.youtube.com/watch?v=TQIQCT-Aqpo
  26. Open Questions from Last Talk

    Suggestion: work on relative numbers rather than on net numbers. As projects come and go, it would be interesting to work at this level.
    Comment: working at the level of high school, works done in the USA/Europe? People are willing to help with this line.
    Suggestion: address people outside of the gender binary
    Video: https://www.youtube.com/watch?v=TQIQCT-Aqpo
  27. Further Work

    Sensitive info: dashboard still private
    Extra analysis: time-to-merge fairness, % of women by company, Outreachy follow-ups, quarterly reports, updated data, ROI of specific policies and others.
    This [hopefully] helps to have a better picture
    Analysis of other minorities could be done
  28. Gender-diversity analysis of technical contributions

    Daniel Izquierdo Cortázar
    @dizquierdo
    dizquierdo at bitergia dot com
    https://speakerdeck.com/bitergia
    OpenStack Summit, Barcelona 2016
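
The architecture slides describe mining commits into JSON documents, enriching them with a name database, and reporting women's share of the population and of the activity. The following is only a minimal sketch of that flow, not the pipeline used for the talk. It assumes commit documents with an `Author` field of the form "Name <email>" under a `data` key (the sample documents are invented), it replaces Genderize.io with a small hypothetical in-memory dictionary, and it uses plain Python rather than Pandas. Counting unidentified names in the denominators is also an assumption; the slides do not say how those were handled.

```python
from collections import Counter
from email.utils import parseaddr

# Invented sample documents; the {"data": {"Author": ...}} shape is an assumption.
COMMITS = [
    {"data": {"commit": "a1", "Author": "Alice Example <alice@example.com>"}},
    {"data": {"commit": "b2", "Author": "Bob Example <bob@example.com>"}},
    {"data": {"commit": "c3", "Author": "Alice Example <alice@example.com>"}},
    {"data": {"commit": "d4", "Author": "Carol Example <carol@example.com>"}},
    {"data": {"commit": "e5", "Author": "Bob Example <bob@example.com>"}},
    {"data": {"commit": "f6", "Author": "Bob Example <bob@example.com>"}},
    {"data": {"commit": "g7", "Author": "Dana Example <dana@example.com>"}},
]

# Hypothetical stand-in for a name database such as Genderize.io.
NAME_TO_GENDER = {"alice": "female", "bob": "male", "carol": "female"}


def first_name(author):
    """Return the lowercased first token of the display name in 'Name <email>'."""
    name, _email = parseaddr(author)
    parts = name.split()
    return parts[0].lower() if parts else ""


def gender_shares(commits, lookup, gender="female"):
    """Return (population %, activity %) for authors mapped to `gender`.

    Authors whose first name is missing from `lookup` are treated as
    unknown and still count in both denominators (an assumption).
    """
    if not commits:
        return 0.0, 0.0
    commits_by_author = Counter(c["data"]["Author"] for c in commits)
    matched = [
        author for author in commits_by_author
        if lookup.get(first_name(author), "unknown") == gender
    ]
    population = 100 * len(matched) / len(commits_by_author)
    activity = 100 * sum(commits_by_author[a] for a in matched) / len(commits)
    return population, activity


if __name__ == "__main__":
    population, activity = gender_shares(COMMITS, NAME_TO_GENDER)
    print(round(population, 2), round(activity, 2))
```

On the sample data, two of the four authors are mapped to "female" and they wrote three of the seven commits, so the script prints `50.0 42.86`. The same gap between population share and activity share is what the Git slides report at scale. A first-name lookup leaves some authors unidentified, which is the case the manual validation slide addresses.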