
Gender-diversity analysis of technical contributions in OpenStack (Barcelona edition)

OpenStack Summit, Barcelona 2016.
Daniel Izquierdo.

Bitergia

October 26, 2016

Transcript

  1. Gender-diversity analysis of technical contributions. Daniel Izquierdo Cortázar (@dizquierdo, dizquierdo at bitergia dot com, https://speakerdeck.com/bitergia), OpenStack Summit, Barcelona 2016
  2. Outline: Introduction, First Steps, Some numbers and method, Conclusions
  3. Introduction: A bit about me, Why this analysis, What we have so far
  4. /me: CDO at Bitergia, the software development analytics company. Lately involved in understanding gender diversity in some OSS communities. Involved in the OPNFV dashboard (opnfv.biterg.io). Disclaimer: I am not involved in any working group; this is my own analysis and interest, so I may have missed some things.
  5. Why this study: Diversity matters. I attended some Women of OpenStack talks at the OpenStack Summits in Tokyo and Austin. I produced some numbers that gained attention (OpenStack and the Linux Kernel). In the end, this is all about transparency and improvement. Update the numbers.
  6. What we have so far: FOSS Survey in 2013 (http://floss2013.libresoft.es/results.en.html): 11% of respondents were women. The Industry Gender Gap, by the World Economic Forum: women represent 5% of CEOs, 21% of mid-level roles, and 32% of junior roles.
  7. Some companies: Pinterest, engineering-focused employees. https://blog.pinterest.com/en/our-plan-more-diverse-pinterest
  8. Some companies: Google, tech-focused employees. http://www.google.com/diversity/
  9. Some companies: Facebook, tech-focused employees. http://newsroom.fb.com/news/2015/06/driving-diversity-at-facebook/
  10. Some companies: Dropbox, all employees. https://blogs.dropbox.com/dropbox/2014/11/strengthening-dropbox-through-diversity/
  11. OpenStack numbers (Austin): women's activity over the whole history: ~10.5% of the population (~570 developers) and ~6.8% of the activity (≥16k commits)
  12. OpenStack numbers (Austin): women's activity in the last year: ~11% of the population (~340 active developers) and ~9% of the activity (≥6k commits)
  13. Linux Kernel numbers: women's activity since 2005: ~5.2% of the activity (>31K commits) and ~8% of the population (~1.15K developers)
  14. Linux Kernel numbers: women's activity in the last year: ~6.8% of the activity (~4k commits) and ~9.9% of the population (~330 active developers)
  15. Summary: the conclusions are not representative, but: - Women represent around 30% to 40% of the workforce in tech companies. - That drops to between 10% and 20% in tech-focused teams. - OpenStack shows 11% of the population. - The Linux Kernel shows 10% of the population.
  16. First Steps
  17. Some Definitions: Contributions: commits, patchsets, code reviews, emails. Other potential metrics: diversity by company, fairness in code review across organizations and genders, transparency in the process. Available but sensitive info: affiliation, countries, time to review.
  18. First Steps: name databases (Genderize.io), manual analysis, focus on main developers
  19. Architecture: Original Data Sources → Mining Tools (Perceval) → Info Enrichment (Genderize.io, Pandas, manual work) → Viz (ElasticSearch + Kibana)
  20. Architecture, Original Data Sources: • Git and mailing lists • ~600K commits (359K without merges and the package-deb project) • ~150K emails • ~300K changesets (without package-deb) • ~1M patchsets (without package-deb) • ~1.5M code reviews (without package-deb)
  21. Architecture, Mining Tools: Perceval • Produces JSON documents from the usual data sources in OSS • Part of the GrimoireLab toolchain • grimoirelab.github.io
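
Slide 21 says Perceval emits JSON documents, and slide 20 counts commits with and without merges. The sketch below shows how such a filtering step might look. It is a minimal, hedged example: it assumes Perceval-style Git items in which `item["data"]["Author"]` holds `"Name <email>"` and merge commits carry a `"Merge"` key. That field layout is an assumption to verify against the Perceval version in use, and the sample items are hypothetical.

```python
import re
from typing import Iterable, Iterator

# Matches "Full Name <email@host>" as found in Git author headers.
AUTHOR_RE = re.compile(r"^(?P<name>.*?)\s*<(?P<email>[^>]*)>$")


def non_merge_authors(items: Iterable[dict]) -> Iterator[tuple[str, str]]:
    """Yield (name, email) for every non-merge commit.

    Assumption: each item looks like a Perceval Git item, i.e. a dict whose
    "data" dict holds "Author" and, for merge commits, a "Merge" key.
    """
    for item in items:
        data = item["data"]
        if "Merge" in data:  # skip merges, as in the "without merges" count
            continue
        match = AUTHOR_RE.match(data["Author"])
        if match:
            yield match.group("name"), match.group("email")


# Hypothetical items shaped like Perceval output.
sample = [
    {"data": {"Author": "Jane Doe <jane@example.org>", "commit": "a1"}},
    {"data": {"Author": "John Roe <john@example.org>", "commit": "b2",
              "Merge": "a1 c3"}},
]
print(list(non_merge_authors(sample)))
```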
  22. Architecture, Info Enrichment: Genderize.io, Pandas, manual work • Genderize.io: name database • Pandas: data analysis library • Ceres library (dicortazar/ceres @ github) • Manual work:
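
Slide 22 uses Genderize.io as a name database, and slide 24 sends doubtful cases to manual review. The sketch below classifies names from Genderize.io-style responses with fields `name`, `gender` and `probability`. Both the 0.9 cut-off and the naive first-name rule are hypothetical. `fetch_gender` shows the HTTP call, but the examples run offline on hand-written responses.

```python
import json
import urllib.parse
import urllib.request

GENDERIZE_URL = "https://api.genderize.io"
MIN_PROBABILITY = 0.9  # hypothetical cut-off; tune it against manual checks


def first_name(full_name: str) -> str:
    """Naive first-name extraction: the first whitespace-separated token.

    This breaks for family-name-first orderings, one reason some names
    are hard to check automatically.
    """
    parts = full_name.strip().split()
    return parts[0].lower() if parts else ""


def fetch_gender(name: str) -> dict:
    """Query Genderize.io for one name (network call, not used below)."""
    query = urllib.parse.urlencode({"name": name})
    with urllib.request.urlopen(f"{GENDERIZE_URL}?{query}") as resp:
        return json.load(resp)


def classify(response: dict) -> str:
    """Map a Genderize.io-style response to 'female', 'male' or 'manual'.

    Names with no answer or a low probability go to manual review.
    """
    gender = response.get("gender")
    probability = response.get("probability", 0.0)
    if gender is None or probability < MIN_PROBABILITY:
        return "manual"
    return gender


# Offline examples with hand-written (hypothetical) responses.
print(first_name("Jane Doe"))
print(classify({"name": "jane", "gender": "female", "probability": 0.98}))
print(classify({"name": "alex", "gender": "male", "probability": 0.6}))
print(classify({"name": "xyz", "gender": None, "probability": 0.0}))
```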
  23. Architecture, Viz: Elasticsearch + Kibana • Elasticsearch: schemaless database • Kibana: works great with ES • This tandem helps a lot to verify info • Drill-down capabilities • Extra info available (but not displayed)
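
Slide 23 loads the enriched data into Elasticsearch for Kibana. The sketch below serializes enriched documents into the newline-delimited body that the Elasticsearch `_bulk` endpoint accepts: one `index` action line, then the document line. The index name and document fields are hypothetical.

```python
import json
from typing import Iterable


def to_bulk_ndjson(docs: Iterable[dict], index: str) -> str:
    """Serialize documents into an Elasticsearch _bulk request body.

    Each document becomes two lines: an "index" action naming the target
    index and id, then the document itself. The body must end with a newline.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["id"]}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


# Hypothetical enriched commit documents.
enriched = [
    {"id": "a1", "author": "Jane Doe", "gender": "female", "project": "nova"},
    {"id": "b2", "author": "John Roe", "gender": "male", "project": "swift"},
]
body = to_bulk_ndjson(enriched, index="gender_commits")
print(body)
```

The body would be sent as a POST to `/_bulk` with the `application/x-ndjson` content type.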
  24. Validation, manual work: check main contributors by hand. Asian names are hard to check (u_u) (help needed!). Mailing lists provide data in an unexpected format.
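
Slides 18 and 24 focus the manual check on the main contributors. The sketch below gives one way to pick them: the smallest set of top authors that together cover a given share of commits. The 80% coverage default is a hypothetical choice, not the deck's.

```python
from collections import Counter


def main_contributors(commit_authors: list[str], coverage: float = 0.8) -> list[str]:
    """Return the top authors that together account for at least
    `coverage` of all commits (the 0.8 default is hypothetical)."""
    counts = Counter(commit_authors)
    total = sum(counts.values())
    selected: list[str] = []
    covered = 0
    for author, n in counts.most_common():
        if covered >= coverage * total:
            break  # enough activity is already covered
        selected.append(author)
        covered += n
    return selected


# Hypothetical commit log: one author name per commit.
log = ["ana"] * 50 + ["bo"] * 30 + ["cy"] * 15 + ["di"] * 5
print(main_contributors(log))
```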
  25. Some numbers: Git contributions, Gerrit contributions, mailing list contributions
  26. Git Overview • Aggregated historical data • YAML file
  27. Git Activity and Population: women's activity (all history): 27,162 commits (7.35% of the activity), 839 developers (10.63% of the population)
  28. Git Activity and Population: women's activity (last year): 7,748 commits (8.58% of the activity), 422 developers (11.53% of the population)
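
Slides 27 and 28 report two different ratios: the share of activity (commits) and the share of population (distinct authors). The sketch below makes that distinction explicit on hypothetical data. One assumption is flagged in the code: authors whose gender could not be resolved stay in both denominators.

```python
def women_shares(commits: list[tuple[str, str]]) -> tuple[float, float]:
    """Return (share of activity, share of population) for women, in %.

    `commits` holds one (author, gender) pair per commit. Assumption:
    authors with unresolved gender stay in both denominators.
    """
    if not commits:
        return 0.0, 0.0
    women_commits = sum(1 for _, gender in commits if gender == "female")
    authors = {author: gender for author, gender in commits}
    women_authors = sum(1 for gender in authors.values() if gender == "female")
    activity = 100 * women_commits / len(commits)
    population = 100 * women_authors / len(authors)
    return activity, population


# Hypothetical data: 10 commits by 3 authors.
sample = [("ana", "female")] * 3 + [("bo", "male")] * 5 + [("cy", "unknown")] * 2
print(women_shares(sample))
```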
  29. Git Activity, Women Evolution • There are jumps at the beginning of the year • Stable during the last year
  30. Git Authors, Women Evolution • Interesting pattern: each year behaves like a block of developers
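
Slide 30 observes that each year behaves like a block of developers. A cohort table is one hedged way to inspect that pattern: group authors by the year of their first commit and count how many are active in each later year. The sketch assumes the input has already been filtered to women authors; the data is hypothetical.

```python
from collections import defaultdict


def cohort_table(commits: list[tuple[str, int]]) -> dict[int, dict[int, int]]:
    """Count active authors per year, split by the year of their first commit.

    `commits` holds one (author, commit_year) pair per commit. Returns
    {cohort_year: {active_year: number_of_authors}}.
    """
    first_year: dict[str, int] = {}
    active: set[tuple[str, int]] = set()
    for author, year in commits:
        first_year[author] = min(year, first_year.get(author, year))
        active.add((author, year))
    table = defaultdict(lambda: defaultdict(int))
    for author, year in active:
        table[first_year[author]][year] += 1
    return {cohort: dict(sorted(row.items())) for cohort, row in sorted(table.items())}


# Hypothetical (author, year) pairs.
sample = [("ana", 2014), ("ana", 2015), ("bo", 2015), ("bo", 2016), ("cy", 2016)]
print(cohort_table(sample))
```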
  31. Mailing Lists Overview • openstack-dev • Language-oriented mailing lists
  32. Mailing List WOO (Women of OpenStack) Activity • 14K emails, 9% of the activity • 672 WOO participants, 9.07% of the population • Similar numbers to last year
  33. Code Reviews Overview • Projects not found in the YAML file were ignored • The package-deb project was also ignored
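
Slide 33 keeps only changesets from projects listed in the YAML file and drops the package-deb project. The sketch below shows that filter. Every concrete value in it is hypothetical: the project set stands in for the YAML contents, the `deb-` naming rule is a placeholder to replace with the real repository names, and the changeset dicts only mimic Gerrit's `project` field.

```python
# Hypothetical stand-in for the project list read from the YAML file.
KNOWN_PROJECTS = {"openstack/nova", "openstack/swift", "openstack/deb-nova"}


def is_package_deb(project: str) -> bool:
    """Hypothetical rule: treat repositories named 'deb-*' as package-deb."""
    return project.rsplit("/", 1)[-1].startswith("deb-")


def keep_change(change: dict, known: set[str] = KNOWN_PROJECTS) -> bool:
    """Keep a changeset only if its project is listed and is not package-deb."""
    project = change.get("project", "")
    return project in known and not is_package_deb(project)


changes = [
    {"number": 1, "project": "openstack/nova"},
    {"number": 2, "project": "openstack/deb-nova"},  # dropped: package-deb
    {"number": 3, "project": "openstack/unlisted"},  # dropped: not in YAML
]
kept = [c["number"] for c in changes if keep_change(c)]
print(kept)  # [1]
```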
  34. Gerrit WOO Activity • 28,503 changesets sent, 9.4% of the activity • 812 women sending changesets, 11.87% of the population • 9.56% of the activity and 13% of the population during the last year [Chart: Women sending changesets]
  35. Analysis: comparison with the status in Austin (6 months ago)
  36. October-April-October (Git) [Charts: October 2015 to April 2016; April 2016 to October 2016]
  37. October-April-October, women reviewers [Charts: October 2015 to April 2016; April 2016 to October 2016]
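
Slides 36 and 37 compare two six-month windows. The sketch below computes women's share of events, such as reviews, per window. It assumes half-open windows starting on the first day of each month, since the slides give only month names; the events are hypothetical.

```python
from datetime import date

# Assumed window boundaries; the slides only name the months.
WINDOWS = {
    "Oct 2015 - Apr 2016": (date(2015, 10, 1), date(2016, 4, 1)),
    "Apr 2016 - Oct 2016": (date(2016, 4, 1), date(2016, 10, 1)),
}


def share_by_window(events: list[tuple[date, str]],
                    windows: dict = WINDOWS) -> dict[str, float]:
    """Women's share (%) of events in each half-open [start, end) window."""
    result = {}
    for label, (start, end) in windows.items():
        in_window = [gender for day, gender in events if start <= day < end]
        women = sum(1 for gender in in_window if gender == "female")
        result[label] = 100 * women / len(in_window) if in_window else 0.0
    return result


# Hypothetical review events: (date, reviewer gender).
events = [
    (date(2015, 11, 3), "female"),
    (date(2016, 1, 10), "male"),
    (date(2016, 5, 2), "female"),
    (date(2016, 6, 1), "male"),
    (date(2016, 7, 9), "male"),
]
print(share_by_window(events))
```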
  38. Conclusions: Answers to First Questions, Data to Make Decisions, Open Paths
  39. Some Answers • Similar activity in Git, with an increase in the number of repositories • Lower WOO activity as core reviewers (~ -9%) ◦ On the other hand, activity has increased (~ 6%)
  40. Conclusions: there is room to improve the dataset. This provides some initial numbers about the current status, hopefully useful for the Foundation.
  41. Open Questions from the Last Talk: Question: Is there a specific action that would help you with data correctness or name identification? Suggestion: integrate the OpenStack ID with Gerrit; the Foundation members directory contains specific gender-related information. Video: https://www.youtube.com/watch?v=TQIQCT-Aqpo
  42. Open Questions from the Last Talk: Comment: the documentation project is doing so well because it has great, inclusive leaders. Comment: another interesting point is retention: how to bring them on board and keep them contributing. Video: https://www.youtube.com/watch?v=TQIQCT-Aqpo
  43. Open Questions from the Last Talk: Suggestion: work with relative numbers rather than absolute ones; since projects come and go, it would be interesting to work at that level. Comment: what about working at the high-school level, as is done in the USA/Europe? People are willing to help along this line. Suggestion: address people outside the gender binary. Video: https://www.youtube.com/watch?v=TQIQCT-Aqpo
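
Slide 43 suggests working with relative rather than absolute numbers, since projects come and go. The sketch below computes women's share of contributions per project, so each project is measured against its own total. The input shape and data are hypothetical.

```python
from collections import Counter


def share_per_project(contribs: list[tuple[str, str]]) -> dict[str, float]:
    """Women's share (%) of contributions within each project.

    `contribs` holds one (project, gender) pair per contribution.
    """
    totals = Counter(project for project, _ in contribs)
    women = Counter(project for project, gender in contribs if gender == "female")
    return {project: 100 * women[project] / n for project, n in totals.items()}


# Hypothetical contributions.
sample = [
    ("docs", "female"), ("docs", "male"),
    ("nova", "male"), ("nova", "male"), ("nova", "female"), ("nova", "male"),
]
print(share_per_project(sample))
```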
  44. Further Work: Sensitive info: the dashboard is still private. Extra analysis: time-to-merge fairness, percentage of women by company, Outreachy follow-ups, quarterly reports, updated data, ROI of specific policies, and others. This [hopefully] helps to build a better picture. Analysis of other minorities could also be done.