Gender Diversity Analysis in the Python Interpreter

Bitergia
September 24, 2017

PyConES 2017, Cáceres.
Daniel Izquierdo Cortázar

Transcript

  1. Gender-diversity analysis of technical contributions (in the Python Interpreter)
    Daniel Izquierdo Cortázar, @dizquierdo, dizquierdo at bitergia dot com
    https://speakerdeck.com/bitergia
    PyConES, Cáceres 2017
  2. Outline
    • Introduction
    • First Steps
    • Some numbers and method
    • Conclusions

  3. Introduction
    • A bit about me
    • Why this analysis
    • What we have so far
  4. /me
    • CDO at Bitergia, the software development analytics company
    • Lately involved in understanding gender diversity in some OSS communities (OpenStack gender report)
    • Involved in some analytics dashboards: OPNFV, Wikimedia, Eclipse...
    • Disclaimer: not involved in any Python working group; this is my own analysis and interest, and I may have missed some things
  5. None
  6. Motivation
    • Diversity matters
    • I felt a lack of a quantitative approach in the WOO (Women of OpenStack) related talks in Tokyo and Austin
    • Produced some numbers that gained some attention
    • In the end this is all about transparency and improvement
    • We need data to make decisions
    • Diversity is a challenge in any OSS project
  7. What Diversity
    Python diversity statement: “some of these attributes include (but are not limited to): age, culture, ethnicity, gender identity or expression, national origin, physical or mental difference, politics, race, religion, sex, sexual orientation, socio-economic status, and subculture. We welcome people regardless of the values of these or other attributes.”
  8. Context
    • FOSS Survey in 2013 (http://floss2013.libresoft.es/results.en.html): 11% of the respondents were women
    • The Industry Gender Gap by the World Economic Forum, share of women: 5% of CEOs, 21% of mid-level roles, 32% of junior roles
  9. Context
    • https://blog.pinterest.com/en/our-plan-more-diverse-pinterest
    • http://www.google.com/diversity/
    • http://newsroom.fb.com/news/2015/06/driving-diversity-at-facebook/
    • https://blogs.dropbox.com/dropbox/2014/11/strengthening-dropbox-through-diversity/

  10. Context
    • 11% [330 devs]
    • 10% [340 devs]
    • 8% [71 devs]
    More data at speakerdeck.com/bitergia
  11. Summary
    Conclusions not representative, but:
    • Women represent around 30% to 40% of the workforce in tech companies
    • And between 10% and 20% when focusing on tech teams
    • OpenStack: 11% of the population
    • Linux Kernel: 10% of the population
    • Hadoop ecosystem: around 8%
    • What about some projects in the Python ecosystem?
  12. First Steps

  13. Definitions
    • Contribution: commit (no merges, no bots)
    • Other potential metrics: diversity by company, fairness in the code review among organizations and genders, transparency in the process
    • Available but sensitive info: affiliation, countries, time to review
    • Focus on the Python Interpreter
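The definition of a contribution above (a commit that is neither a merge nor authored by a bot) can be sketched as a simple filter. This is only an illustration: the dictionary keys (`commit`, `Author`, `parents`) and the bot heuristic are assumptions, since the slides do not say how merges and bots were detected.

```python
import re

# Hypothetical heuristic: treat any author whose name or email contains the
# standalone word "bot" as automated. The talk does not list the bot accounts.
BOT_PATTERN = re.compile(r"\bbot\b", re.IGNORECASE)


def is_merge(commit):
    """A merge commit has more than one parent."""
    return len(commit.get("parents", [])) > 1


def is_bot(commit):
    """Apply the (assumed) bot heuristic to the author string."""
    return bool(BOT_PATTERN.search(commit.get("Author", "")))


def contributions(commits):
    """Keep only commits that count as contributions: no merges, no bots."""
    return [c for c in commits if not is_merge(c) and not is_bot(c)]


# Made-up sample data, for illustration only.
sample = [
    {"commit": "a1", "Author": "Ada Example <ada@example.org>", "parents": ["00"]},
    {"commit": "b2", "Author": "Bob Example <bob@example.org>", "parents": ["a1", "00"]},
    {"commit": "c3", "Author": "release-bot <bot@example.org>", "parents": ["b2"]},
]
kept = contributions(sample)
print([c["commit"] for c in kept])
```

A name-based heuristic like this will miss some bots and may flag some people, so a manually reviewed list of accounts is likely to be more reliable.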
  14. Arch.
    • Original Data Sources
    • Mining Tools: Perceval
    • Info Enrich.: Genderize.io, Pandas, manual work
    • Viz: ElasticSearch + Kibana
  15. Arch. Original Data Sources
    • Git
    • ~50 repositories
    • > 130K commits
    • > 1.1K developers
    • GitHub Org: github.com/python
  16. Arch. Mining Tools: Perceval
    • Produces JSON documents from the usual data sources in OSS
    • Part of the GrimoireLab toolchain
    • https://grimoirelab.github.io
    • Founder of the CHAOSS group (https://chaoss.community)
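As a rough illustration of working with the JSON documents Perceval produces for Git, the sketch below flattens one commit item into the few fields a gender analysis needs. The item layout (a `data` object with `commit`, `Author` and `AuthorDate`) and the date format are assumptions based on Perceval's Git backend and may differ between versions; the sample item is made up, and `to_row` is a hypothetical helper.

```python
from datetime import datetime
from email.utils import parseaddr


def to_row(item):
    """Flatten one Perceval-style Git item into a small analysis record."""
    data = item["data"]
    # "Ada Example <ada@example.org>" -> ("Ada Example", "ada@example.org")
    name, email = parseaddr(data["Author"])
    # Assumed layout: Git's default date format, e.g. "Tue Aug 14 14:30:13 2012 -0300".
    authored = datetime.strptime(data["AuthorDate"], "%a %b %d %H:%M:%S %Y %z")
    return {
        "hash": data["commit"],
        "name": name,
        "email": email,
        "year": authored.year,
    }


# Made-up item shaped like a Perceval Git document.
item = {
    "data": {
        "commit": "a1b2c3",
        "Author": "Ada Example <ada@example.org>",
        "AuthorDate": "Tue Aug 14 14:30:13 2012 -0300",
    }
}
row = to_row(item)
print(row)
```

Keeping the year in each record makes it straightforward to later separate "all history" from "last year" figures.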
  17. Arch. Info Enrich.: Genderize.io, Pandas, manual work
    • Genderize.io: name database
    • Pandas: data analysis lib.
    • Ceres library (dicortazar/ceres @ github)
    • Manual work:
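The enrichment step can be sketched as follows: take the first name of each author, look it up in Genderize.io, and accept the answer only when the service is reasonably sure, leaving the rest for manual work. The response fields (`gender`, `probability`, `count`) follow the public Genderize.io API; the thresholds and helper names are assumptions for illustration, not values used in the talk. No network call is made here; the sample response is hand-written.

```python
from urllib.parse import urlencode

API_URL = "https://api.genderize.io/"


def first_name(full_name):
    """Return the lowercased first token of a full name ('' if empty)."""
    parts = full_name.strip().split()
    return parts[0].lower() if parts else ""


def lookup_url(full_name):
    """Build the query URL for one author name."""
    return API_URL + "?" + urlencode({"name": first_name(full_name)})


def classify(response, min_probability=0.9, min_count=10):
    """Return 'female', 'male', or None when the answer is too uncertain.

    min_probability and min_count are assumed thresholds; None means the
    name should go to manual review.
    """
    gender = response.get("gender")
    if gender is None:
        return None
    if float(response.get("probability", 0)) < min_probability:
        return None
    if int(response.get("count", 0)) < min_count:
        return None
    return gender


# Hand-written response in the shape the API returns.
sample_response = {"name": "ada", "gender": "female", "probability": 0.98, "count": 500}
print(lookup_url("Ada Example"), classify(sample_response))
```

Name-based inference is only an approximation, which fits the talk's own caveats about manual work and about gender diversity not being binary.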
  18. Arch. Viz: ElasticSearch + Kibana
    • ElasticSearch: schemaless db
    • Kibana: works great with ES
    • This tandem helps a lot to verify info
    • Drill-down capabilities
    • Extra info available (but not displayed)
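To give an idea of how enriched commits could reach ElasticSearch for exploration in Kibana, the sketch below builds a request body in the newline-delimited format the ElasticSearch bulk API expects (one action line followed by one document line per record). The index name `python_commits` and the record fields are hypothetical; the slides do not describe the actual index layout, and sending the payload is left out.

```python
import json


def bulk_payload(rows, index="python_commits"):
    """Serialize records as NDJSON for the ElasticSearch _bulk endpoint."""
    lines = []
    for row in rows:
        # Action line: index this document, using the commit hash as its id.
        lines.append(json.dumps({"index": {"_index": index, "_id": row["hash"]}}))
        # Source line: the document itself.
        lines.append(json.dumps(row))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# Made-up records, for illustration only.
rows = [
    {"hash": "a1", "name": "Ada Example", "gender": "female", "year": 2012},
    {"hash": "b2", "name": "Bob Example", "gender": "male", "year": 2013},
]
payload = bulk_payload(rows)
print(payload)
```

Using the commit hash as the document id means re-running the load overwrites documents instead of duplicating them.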
  19. Some numbers Git Contributions

  20. Overview • Aggregated historical data

  21. Activity / Population
    Women activity (all history):
    • > 1.5K commits (1.2% of the activity)
    • 50 developers (4% of the population)
  22. Activity / Population
    Women activity (last year):
    • ~350 commits (4.3% of the activity)
    • 23 developers (3.75% of the population)
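The two ratios on these slides are a share of commits (activity) and a share of developers (population). As a small worked check, the sketch below derives the totals implied by the quoted counts and percentages. The slides give rounded figures ("~350", "> 1.5K"), so the results are approximations, not numbers reported in the talk.

```python
def share(part, total):
    """Percentage that `part` represents of `total`."""
    return 100 * part / total


def implied_total(part, share_percent):
    """Total implied by a count and the percentage it is said to represent."""
    return part / (share_percent / 100)


# Last year: ~350 commits are 4.3% of the activity, 23 developers are 3.75%.
last_year_commits = implied_total(350, 4.3)
last_year_developers = implied_total(23, 3.75)

# All history: 50 developers are 4% of the population.
all_time_developers = implied_total(50, 4)

print(round(last_year_commits), round(last_year_developers), round(all_time_developers))
```

The all-history result (about 1,250 developers) is consistent with the "> 1.1K developers" quoted for the data sources.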
  23. Women Activity
    • Real activity by women starts in 2012 and has been growing since
  24. The most diverse repos
    • Interesting to look for the best practices and learn from those
    • This may be biased by external factors I’m not aware of (e.g. version control system migrations…)
    • Vs. all contributors: Cpython, Typeshed, Mypy, Peps, Devguide, Planet, Pythondotorg, Asyncio, Psf-chef, Typing
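One way to rank repositories by diversity is the share of women among each repository's distinct contributors. The sketch below is an assumption about how such a ranking could be computed; the slides do not state the exact metric, and the repository names and rows are made up.

```python
from collections import defaultdict


def women_share_by_repo(rows):
    """Rank repos by the percentage of distinct contributors who are women.

    `rows` is an iterable of (repo, author, gender) tuples, one per commit.
    """
    everyone = defaultdict(set)
    women = defaultdict(set)
    for repo, author, gender in rows:
        everyone[repo].add(author)
        if gender == "female":
            women[repo].add(author)
    shares = {
        repo: 100 * len(women[repo]) / len(devs)
        for repo, devs in everyone.items()
    }
    # Highest share first.
    return sorted(shares.items(), key=lambda pair: pair[1], reverse=True)


# Made-up commits, for illustration only.
rows = [
    ("repo-a", "ada", "female"),
    ("repo-a", "bob", "male"),
    ("repo-a", "carl", "male"),
    ("repo-a", "ada", "female"),
    ("repo-b", "dana", "female"),
    ("repo-b", "eve", "male"),
]
ranking = women_share_by_repo(rows)
print(ranking)
```

Small repositories can reach a high share with a single contributor, so a ranking like this is best read together with the absolute number of developers.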
  25. Social Network • All community

  26. Social Network • Women

  27. Conclusions
    • Comparison
    • Data to Make Decisions
    • Open Paths

  28. Summary
    Women population: 11%, 10%, 8%, 4% [CPython]

  29. And now?
    • Forget about the numbers!
    • Clear issue in the industry
    • Glass ceiling of 10%
    • Diversity & Inclusion is a challenge [Permanent & Updated]
    • Data can help to be aware and lead a change
    • Data and tech. are just a tool to achieve our goals
  30. Some Steps
    • Outreachy
    • Specific tracks about diversity: Diversity empowerment summit (LA, Prague)
    • Working groups: Women of OpenStack, PyLadies, Django Girls, Women in Open Source
    • What about data? OpenStack Gender-Diversity Report (Intel & Bitergia): goo.gl/8H9qr7
  31. How to?
    • Cross Foundations Initiative
    • Can we learn from others?
    Recommendations from OpenStack:
    • Policies impact study
    • Collaboration with PTLs
    • Bring women to key positions
    • Keep supporting the WOO group
    • Enforce the CoC
    • Documentation, onboarding process and mentoring as a baseline for any project
  32. Open Questions
    • Are people from the PSF, ASF or OpenStack talking together?
    • Other initiatives within the PSF?
    • Companies are the ones bringing a big % of women
  33. Decisions based on data!

  34. Further Work
    • Sensitive info: the dashboard is private
    • Extra analysis: time-to-merge fairness, percentage of women per company, previous Outreachy follow-ups, quarterly reports, updated data, ROI of specific policies and others
    • This [hopefully] helps to have a better picture
    • Analysis of other minorities could be done
    • Gender diversity is not binary
  35. Conclusion
    • Room for improvement of the dataset
    • This provides some initial numbers about the current status
    • Hopefully useful for the Python ecosystem
    • I’d love to learn about Python initiatives
  36. Gender-diversity analysis of technical contributions (in the Python Interpreter)
    Daniel Izquierdo Cortázar, @dizquierdo, dizquierdo at bitergia dot com
    https://speakerdeck.com/bitergia
    PyConES, Cáceres 2017