Gender Diversity Analysis in OSS Projects

PyData Meetup, Madrid

Bitergia

June 29, 2017

Transcript

  1. Gender Diversity Analysis in OSS Projects. PyData Meetup, 29th June 2017, Madrid. Daniel Izquierdo, CDO. [email protected] @dizquierdo speakerdeck.com/bitergia
  2. Intro. Tweet => 13% of people attending the OpenStack Summit were women. Tweet => How many of them are actually contributing to the source code?
  3. How To. Goal-Question-Metric approach • Contextualize tech gender-diversity groups • Data sources available for the analysis • Tooling • Results and Further Work
  4. Context. Governance -> Goals <- Questions <- Metrics. Goal: Increase gender diversity in the OpenStack Foundation
  5. Context. FOSS Survey in 2013: - 11% of the respondents were women. The Industry Gender Gap by the World Economic Forum: - 5% for CEOs, 21% for mid-level roles, 32% for junior roles
  6. Tooling. Original Data Sources -> Mining Tools: Perceval @ GrimoireLab -> Info Enrich.: Genderize.io, Ceres/Pandas, Jupyter Notebooks, Manual work -> Viz: ElasticSearch + Kibana
  7. Results. Original Data Sources • Git and Gerrit repos based on the YAML at Governance • ~1M commits • ~500K changesets • ~1.5M patchset uploads • ~1.8M code reviews on patches
  8. Results. Mining Tools: Perceval • At grimoirelab.github.io • Parses APIs, logs, etc. and produces JSON documents • Those are later stored in ElasticSearch
  9. Results. Info Enrich.: Genderize.io, Pandas, Jupyter Notebooks, Manual work • Genderize.io: name database • Ceres: data analysis lib. to work with Perceval • Jupyter Notebook: web app for data analysis • Manual work:
  10. Results. Viz: ElasticSearch + Kibana • ElasticSearch: schemaless DB • Kibana: works great with ES • This tandem helps a lot to verify info • Drill-down capabilities
  11. OpenStack (Austin). Women activity (last year): ~11% of the population (~340 active developers), ~9% of the activity (>=6k commits)
  12. Linux Kernel. Women activity (last year): ~6.8% of the activity (~4k commits), ~9.9% of the population (~330 active developers)
  13. Hadoop. Women activity (last year): ~2K commits (6.5% of the activity), 71 developers (8.5% of the population)
  14. Users. It’s important to understand your potential users! C-level? Middle management? Developers? Community? This study aims to understand the current situation and to look for best practices.
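
The "population" and "activity" figures on the results slides (11 to 13) are two different ratios: the share of distinct developers and the share of commits. The sketch below illustrates that distinction only. It is not the code used for the talk: the record layout, the `gender_shares` helper and the sample data are all hypothetical, and the `gender` field is assumed to have been filled in already by a name lookup (such as Genderize.io) plus manual review.

```python
from collections import Counter

# Hypothetical enriched commit records. In the pipeline described in the
# slides, the "gender" field would come from a name lookup plus manual work.
commits = [
    {"author": "Alice Smith", "gender": "female"},
    {"author": "Bob Jones", "gender": "male"},
    {"author": "Bob Jones", "gender": "male"},
    {"author": "Carol Diaz", "gender": "female"},
    {"author": "Dan Lee", "gender": "male"},
    {"author": "Dan Lee", "gender": "male"},
    {"author": "Dan Lee", "gender": "male"},
    {"author": "Erin Kim", "gender": "unknown"},
]


def gender_shares(records, gender="female"):
    """Return (population_share, activity_share) in percent for one gender."""
    if not records:
        return 0.0, 0.0
    # Activity: every commit counts once.
    activity = Counter(r["gender"] for r in records)
    # Population: every distinct author counts once, however many commits.
    authors = {r["author"]: r["gender"] for r in records}
    population = Counter(authors.values())
    population_share = 100.0 * population[gender] / len(authors)
    activity_share = 100.0 * activity[gender] / len(records)
    return population_share, activity_share


pop, act = gender_shares(commits)
print(f"{pop:.1f}% of the population, {act:.1f}% of the activity")
```

Authors whose gender could not be determined stay in both denominators here; whether to exclude them is a design choice the slides do not spell out.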