Gender Diversity Analysis in the Python Interpreter

PyConES 2017, Cáceres.

Daniel Izquierdo Cortazar

September 24, 2017

Transcript

  1. Gender-diversity analysis of technical contributions (In the Python Interpreter)

    Daniel Izquierdo Cortázar
    @dizquierdo
    dizquierdo at bitergia dot com
    https://speakerdeck.com/bitergia
    PyConES, Cáceres 2017
  2. /me

    CDO at Bitergia, the software development analytics company
    Lately involved in understanding gender diversity in some OSS communities (OpenStack gender report)
    Involved in some analytics dashboards: OPNFV, Wikimedia, Eclipse...
    Disclaimer: I am not involved in any Python working group; this is my own analysis and interest, and I may have missed some things...
  3. Motivation

    Diversity matters
    I felt a lack of a quantitative approach in the WOO (Women of OpenStack) related talks (Tokyo and Austin)
    Produced some numbers that gained some attention
    In the end this is all about transparency and improvement
    We need data to make decisions
    Diversity is a challenge in any OSS project
  4. What Diversity

    Python diversity statement: “some of these attributes include (but are not limited to): age, culture, ethnicity, gender identity or expression, national origin, physical or mental difference, politics, race, religion, sex, sexual orientation, socio-economic status, and subculture. We welcome people regardless of the values of these or other attributes.”
  5. Context

    FOSS Survey in 2013:
    - http://floss2013.libresoft.es/results.en.html
    - 11% of the survey respondents were women
    The Industry Gender Gap by the World Economic Forum:
    - Women hold 5% of CEO roles, 21% of mid-level roles and 32% of junior roles
  6. Context

    11% [330 devs]
    10% [340 devs]
    8% [71 devs]
    More data at speakerdeck.com/bitergia
  7. Summary

    Conclusions are not representative, but:
    - Women represent around 30% to 40% of the workforce in tech companies.
    - And between 10% and 20% when focusing on tech teams.
    - OpenStack: 11% of the population
    - Linux Kernel: 10% of the population
    - Hadoop ecosystem: around 8%
    - What about some projects in the Python ecosystem?
  8. Definitions

    Contribution: commit (no merges, no bots)
    Other potential metrics: diversity by company, fairness in the code review among organizations and genders, transparency in the process
    Available but sensitive info: affiliation, countries, time to review
    Focus on the Python Interpreter
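
The "commit (no merges, no bots)" definition above could be sketched roughly as follows. This is not the talk's actual filter: the record fields (`sha`, `author`, `parents`) and the bot-name pattern are assumptions made for illustration, and a real project would curate its own bot list.

```python
import re

# Assumed bot markers (hypothetical): "[bot]" suffixes or a standalone word "bot".
BOT_PATTERN = re.compile(r"\[bot\]|\bbot\b", re.IGNORECASE)


def is_merge(commit):
    """A merge commit is one with more than one parent."""
    return len(commit["parents"]) > 1


def is_bot(commit):
    """Flag commits whose author name matches the assumed bot pattern."""
    return bool(BOT_PATTERN.search(commit["author"]))


def contributions(commits):
    """Keep only commits that count as contributions: no merges, no bots."""
    return [c for c in commits if not is_merge(c) and not is_bot(c)]


if __name__ == "__main__":
    sample = [
        {"sha": "a", "author": "Jane Doe", "parents": ["p1"]},
        {"sha": "b", "author": "Jane Doe", "parents": ["p1", "p2"]},  # merge
        {"sha": "c", "author": "Release Bot", "parents": ["p1"]},     # bot
    ]
    print([c["sha"] for c in contributions(sample)])
```

Name matching alone will miss bots with ordinary-looking names, so some manual review of the author list is still likely to be needed.
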
  9. Arch. Original Data Sources

    • Git
    • ~50 repositories:
    • > 130K commits
    • > 1.1K developers
    • GitHub Org: (github.com/python)
  10. Arch. Mining Tools

    Perceval
    • Produces JSON documents from the usual data sources in OSS
    • Part of the GrimoireLab toolchain
    • https://grimoirelab.github.io
    • Founder of the CHAOSS group (https://chaoss.community)
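
As a rough illustration of the mining step, the sketch below uses Perceval's Git backend to walk a repository and yield one record per commit. The output field names (`sha`, `name`, `email`, `parents`) are assumptions chosen for this example, not the talk's schema. Perceval is imported lazily so the small parsing helper can be tried without it installed.

```python
from email.utils import parseaddr


def split_author(author_field):
    """Split a git-style 'Name <email>' string into (name, email)."""
    return parseaddr(author_field)


def fetch_authors(repo_url, local_path):
    """Yield one small record per commit of the repository at repo_url.

    local_path is where Perceval keeps its local clone of the repository.
    """
    # Lazy import: split_author() stays usable without Perceval installed.
    from perceval.backends.core.git import Git

    repo = Git(uri=repo_url, gitpath=local_path)
    for item in repo.fetch():
        data = item["data"]  # the parsed commit as a JSON-like dict
        name, email = split_author(data["Author"])
        yield {
            "sha": data["commit"],
            "name": name,
            "email": email,
            "parents": data["parents"],
        }


if __name__ == "__main__":
    print(split_author("Jane Doe <jane@example.com>"))
```

Author strings with unusual punctuation may not parse cleanly with `parseaddr`, so the results are worth spot-checking.
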
  11. Arch. Info Enrich.

    Genderize.io, Pandas, manual work
    • Genderize.io: name database
    • Pandas: data analysis lib.
    • Ceres library (dicortazar/ceres @ github)
    • Manual work:
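
A minimal sketch of the enrichment idea, assuming the public Genderize.io endpoint, which returns a JSON object with `name`, `gender`, `probability` and `count`. The 0.9 threshold and the `unknown` label are assumptions for illustration. Low-confidence names would presumably fall to the manual work mentioned above. The talk used Pandas and the Ceres library, while this example sticks to the standard library.

```python
import json
import urllib.parse
import urllib.request

GENDERIZE_URL = "https://api.genderize.io"


def first_name(full_name):
    """Take the first token of a full name, lowercased, as the lookup key."""
    parts = full_name.strip().split()
    return parts[0].lower() if parts else ""


def classify(response, min_probability=0.9):
    """Accept the service's guess only above an (assumed) confidence threshold."""
    gender = response.get("gender")
    if gender is None or response.get("probability", 0) < min_probability:
        return "unknown"  # candidates for manual review
    return gender


def query_genderize(name):
    """Ask Genderize.io about a single first name (needs network access)."""
    url = GENDERIZE_URL + "?" + urllib.parse.urlencode({"name": name})
    with urllib.request.urlopen(url, timeout=10) as reply:
        return json.load(reply)


if __name__ == "__main__":
    fake_reply = {"name": "jane", "gender": "female", "probability": 0.98, "count": 100}
    print(first_name("Jane Doe"), classify(fake_reply))
```

Name-based inference only produces a binary guess, so it is at best a rough proxy for how people actually identify.
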
  12. Arch. Viz

    Elasticsearch + Kibana
    • Elasticsearch: schemaless db
    • Kibana: works great with ES
    • This tandem helps a lot to verify info
    • Drill down capabilities
    • Extra info available (but not displayed)
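
To give an idea of how enriched commits might reach Elasticsearch, here is a sketch that builds a request body in the newline-delimited format the Elasticsearch bulk API expects: one action line, then one document line. The index name `python-gender` and the document fields are hypothetical. Actually sending the body to a cluster is left out.

```python
import json


def bulk_body(commits, index="python-gender"):
    """Build an NDJSON bulk payload: an 'index' action line plus the document."""
    lines = []
    for commit in commits:
        # Using the commit hash as _id makes re-indexing overwrite, not duplicate.
        lines.append(json.dumps({"index": {"_index": index, "_id": commit["sha"]}}))
        lines.append(json.dumps(commit))
    # The bulk API requires every line, including the last, to end with a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    docs = [{"sha": "a", "repo": "cpython", "gender": "female"}]
    print(bulk_body(docs), end="")
```

Since Elasticsearch can infer a mapping from the documents, extra fields can be indexed now and only shown in Kibana later if needed.
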
  13. Activity / Population

    Women activity (all history):
    > 1.5K commits (1.2% of the activity)
    50 developers (4% of the population)
  14. Activity / Population

    Women activity (last year):
    ~350 commits (4.3% of the activity)
    23 developers (3.75% of the population)
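
The two percentages on these slides measure different things: activity is a share of commits, population is a share of distinct developers. A small sketch of that computation, assuming each commit record carries an `email` and an inferred `gender` field (both names are hypothetical) and using the email as the developer identity:

```python
def shares(commits, group="female"):
    """Return the group's share of commits (activity) and of authors (population)."""
    if not commits:
        raise ValueError("no commits to analyse")
    # One entry per distinct author; the email is the assumed identity key.
    authors = {c["email"]: c["gender"] for c in commits}
    group_commits = sum(1 for c in commits if c["gender"] == group)
    group_authors = sum(1 for g in authors.values() if g == group)
    return {
        "activity_pct": 100 * group_commits / len(commits),
        "population_pct": 100 * group_authors / len(authors),
    }


if __name__ == "__main__":
    toy = (
        [{"email": "a@example.com", "gender": "female"}] * 1
        + [{"email": "b@example.com", "gender": "male"}] * 4
        + [{"email": "c@example.com", "gender": "male"}] * 3
        + [{"email": "d@example.com", "gender": "male"}] * 2
    )
    print(shares(toy))
```

Reading the last-year figures backwards, and assuming the percentages are exact, 23 developers at 3.75% implies roughly 610 active developers, and ~350 commits at 4.3% implies roughly 8,100 commits in that year. Developers who commit under several emails would be counted more than once unless their identities are merged first.
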
  15. The most Diverse repos

    • Interesting to look for the best practices and learn from those
    • This may be biased by external factors I’m not aware of (e.g. version control system migrations…)
    VS All Contributors: Cpython Typeshed Mypy Peps Devguide Planet Pythondotorg Asyncio Psf-chef Typing
  16. And now?

    Forget about the numbers!
    Clear issue in the industry
    Glass ceiling of 10%
    Diversity & Inclusion is a challenge
    [Permanent & Updated] Data can help to be aware and lead a change
    Data and tech. are just a tool to achieve our goals
  17. Some Steps

    Outreachy
    Specific tracks about diversity:
    • Diversity empowerment summit (LA, Prague)
    Working groups: Women of OpenStack, PyLadies, Django Girls, Women in Open Source
    What about data?
    • OpenStack Gender-Diversity Report (Intel & Bitergia)
      ◦ goo.gl/8H9qr7
  18. How to?

    Cross Foundations Initiative
    Can we learn from others?
    Recommendations from OpenStack
    • Policies impact study
    • Collaboration with PTLs
    • Bring women to key positions
    • Keep supporting the WOO group
    • Enforce the CoC
    • Documentation, onboarding process and mentoring as a baseline for any project
  19. Open Questions

    Are people from the PSF, ASF or OpenStack talking together?
    Other initiatives within the PSF?
    Companies are the ones bringing a big % of women
  20. Further Work

    Sensitive info: the dashboard is private
    Extra analysis: time-to-merge fairness, % of women per company, follow-ups on previous Outreachy participants, quarterly reports, updated data, ROI of specific policies, and others
    This [hopefully] helps to have a better picture
    Analyses of other minorities could be done
    Gender diversity is not binary
  21. Conclusion

    Room for improvement of the dataset
    This provides some initial numbers about the current status
    Hopefully useful for the Python ecosystem
    I’d love to learn about Python initiatives