Gender-diversity analysis of technical contributions in OpenStack (Barcelona edition)

OpenStack Summit, Barcelona 2016.
Daniel Izquierdo.

Bitergia

October 26, 2016

Transcript

  1. Gender-diversity analysis
    of technical contributions
    Daniel Izquierdo Cortázar
    @dizquierdo
    dizquierdo at bitergia dot com
    https://speakerdeck.com/bitergia
    OpenStack Summit, Barcelona 2016

  2. Outline
    Introduction
    First Steps
    Some numbers and method
    Conclusions

  3. Introduction
    A bit about me
    Why this analysis
    What we have so far

  4. /me
    CDO at Bitergia, the software development analytics company
    Lately involved in understanding gender diversity in some OSS communities
    Involved in the OPNFV dashboard (opnfv.biterg.io)
    Disclaimer: not involved in any working group; this is my own analysis and
    interest, and I may have missed some things...

  5. Why this study
    Diversity matters
    I attended some Women of OpenStack talks at the
    OpenStack Summits in Tokyo and Austin
    Produced some numbers that gained some attention:
    OpenStack and Linux Kernel
    In the end this is all about transparency and improvement
    Time to update the numbers

  6. What we have so far
    FOSS Survey in 2013:
    - http://floss2013.libresoft.es/results.en.html
    - 11% of the survey respondents were women
    The Industry Gender Gap, by the World Economic Forum:
    - Women: 5% of CEOs, 21% of mid-level roles, 32% of junior roles

  7. Some companies
    Pinterest: engineering-focused employees.
    https://blog.pinterest.com/en/our-plan-more-diverse-pinterest

  8. Some companies
    Google: tech-focused employees.
    http://www.google.com/diversity/

  9. Some companies
    Facebook: tech-focused employees.
    http://newsroom.fb.com/news/2015/06/driving-diversity-at-facebook/

  10. Some companies
    Dropbox: all employees.
    https://blogs.dropbox.com/dropbox/2014/11/strengthening-dropbox-through-diversity/

  11. OpenStack (Austin) numbers
    Women activity (all of the history):
    ~ 10.5% of the population ( ~ 570 developers )
    ~ 6.8% of the activity ( >= 16K commits )

  12. OpenStack (Austin) numbers
    Women activity (last year):
    ~ 11% of the population ( ~ 340 active developers )
    ~ 9% of the activity ( >=6k commits )

  13. Linux Kernel Numbers
    Women activity (since 2005):
    ~ 5.2% of the activity ( > 31K commits )
    ~ 8% of the population ( ~ 1.15K developers )

  14. Linux Kernel Numbers
    Women activity (last year):
    ~ 6.8% of the activity ( ~ 4k commits )
    ~ 9.9% of the population ( ~ 330 active developers )

  15. Summary
    Conclusions are not representative, but:
    - Women represent around 30% to 40% of the workforce in
    tech companies.
    - And between 10% and 20% when focusing on tech teams.
    - In OpenStack, women are about 11% of the population
    - In the Linux Kernel, women are about 10% of the population

  16. First Steps

  17. Some Definitions
    Contributions: commit, patchset, code review, email
    Other potential metrics: diversity by company, fairness in the
    code review among organizations and genders, transparency
    in the process
    Available but sensitive info: affiliation, countries, time to
    review

  18. First Steps
    Names databases
    Genderize.io
    Manual analysis
    Focus on main developers
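
The name-based step above can be sketched roughly as follows. This is only an illustration, not the code used for the talk: it assumes the public Genderize.io endpoint `https://api.genderize.io` with a `name` query parameter and a JSON reply carrying `gender` and `probability` fields. The helper names, the first-name heuristic and the 0.8 threshold are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://api.genderize.io"  # assumed public endpoint


def first_name(full_name):
    """Naive heuristic: take the first whitespace-separated token."""
    parts = full_name.strip().split()
    return parts[0].lower() if parts else ""


def build_query(name):
    """Build the request URL for a single first name."""
    return API_URL + "?" + urlencode({"name": name})


def classify(reply, threshold=0.8):
    """Map an API reply to 'female', 'male' or 'unknown'.

    Replies below the (hypothetical) probability threshold stay
    'unknown' so that they can be checked by hand later.
    """
    gender = reply.get("gender")
    probability = reply.get("probability") or 0.0
    if gender in ("female", "male") and probability >= threshold:
        return gender
    return "unknown"


def guess_gender(full_name, fetch=None):
    """Guess a gender from a full name; `fetch` can be swapped for testing."""
    if fetch is None:
        def fetch(url):
            with urlopen(url) as response:
                return json.loads(response.read().decode("utf-8"))
    name = first_name(full_name)
    if not name:
        return "unknown"
    return classify(fetch(build_query(name)))
```

Anything that comes back as `unknown` would then go to the manual analysis step, starting with the main developers.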

  19. Architecture
    Original data sources
    Mining tools: Perceval
    Info enrichment: Genderize.io, Pandas, manual work
    Visualization: ElasticSearch + Kibana

  20. Architecture
    Original data sources
    ● Git and mailing lists
    ● ~ 600K commits (359K w/o merges and deb)
    ● ~ 150K emails
    ● ~ 300K changesets (w/o deb)
    ● ~ 1M patchsets (w/o deb)
    ● ~ 1.5M code reviews (w/o deb)

  21. Architecture
    Mining tools: Perceval
    ● Produces JSON documents from the usual data sources in OSS
    ● Part of the GrimoireLab toolchain
    ● grimoirelab.github.io
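
As a rough illustration of working with such JSON documents, the sketch below reads one document per line and pulls out the commit author. The document shape is an assumption for this example (a `data` object with an `Author` field in `Name <email>` form and a `commit` hash); the function name and sample records are hypothetical.

```python
import json
from email.utils import parseaddr


def authors_from_items(lines):
    """Yield (name, email, commit_hash) from JSON lines, one item per line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        item = json.loads(line)
        data = item.get("data", {})
        # parseaddr splits "Name <email>" into its two parts.
        name, email = parseaddr(data.get("Author", ""))
        yield name, email, data.get("commit")


# Made-up sample documents, only to show the expected shape.
sample = [
    '{"data": {"Author": "Jane Doe <jane@example.com>", "commit": "abc123"}}',
    '{"data": {"Author": "John Roe <john@example.com>", "commit": "def456"}}',
]
rows = list(authors_from_items(sample))
```

The extracted names are what the enrichment step would later try to match against a name database.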

  22. Architecture
    Info enrichment: Genderize.io, Pandas, manual work
    ● Genderize.io: name database
    ● Pandas: data analysis library
    ● Ceres library (dicortazar/ceres on GitHub)
    ● Manual work
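
Once each commit carries a gender label, the two headline metrics used in this talk (share of the population and share of the activity) come down to simple counting. The sketch below uses only the Python standard library rather than Pandas, to stay self-contained; the record layout and function name are hypothetical.

```python
from collections import Counter


def gender_shares(commits, target="female"):
    """Return (population_share, activity_share) in percent for `target`.

    `commits` is an iterable of (author_id, gender) pairs, one per commit.
    """
    commits = list(commits)
    if not commits:
        return 0.0, 0.0
    # Activity: every commit counts once.
    activity = Counter(gender for _, gender in commits)
    # Population: every distinct author counts once.
    authors = {author: gender for author, gender in commits}
    population = Counter(authors.values())
    population_share = 100.0 * population[target] / len(authors)
    activity_share = 100.0 * activity[target] / len(commits)
    return population_share, activity_share


# Made-up records: 4 authors, 10 commits.
records = (
    [("a1", "female")] * 2
    + [("a2", "male")] * 5
    + [("a3", "male")]
    + [("a4", "unknown")] * 2
)
population_share, activity_share = gender_shares(records)
```

In this sketch, authors labelled `unknown` stay in the denominator; excluding them would raise both shares, so that choice should be stated alongside any reported number.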

  23. Architecture
    Visualization: ElasticSearch + Kibana
    ● ElasticSearch: schemaless database
    ● Kibana: works great with ElasticSearch
    ● This tandem helps a lot to verify info
    ● Drill-down capabilities
    ● Extra info available (but not displayed)
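
To give an idea of how enriched documents could be loaded for this kind of drill-down, the sketch below builds a newline-delimited JSON payload in the format accepted by the ElasticSearch bulk endpoint (an action line followed by the document itself). The index name, field names and helper are hypothetical, and the payload is only built here, not sent.

```python
import json


def bulk_payload(index, docs, id_field="commit"):
    """Serialize documents as newline-delimited JSON for a bulk upload."""
    lines = []
    for doc in docs:
        # Action line: index this document under a stable id.
        action = {"index": {"_index": index, "_id": doc[id_field]}}
        lines.append(json.dumps(action, sort_keys=True))
        # Source line: the enriched document itself.
        lines.append(json.dumps(doc, sort_keys=True))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


docs = [
    {"commit": "abc123", "author": "Jane Doe", "gender": "female"},
    {"commit": "def456", "author": "John Roe", "gender": "male"},
]
payload = bulk_payload("git-gender", docs)
```

Using the commit hash as the document id means that re-running the upload overwrites documents instead of duplicating them.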

  24. Validation: manual work
    Check main contributors by hand
    Asian names are hard to check ( u_u ) (help needed!)
    Mailing lists provide names in unexpected formats
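
One way to organize the manual check is to list the most active contributors first, together with whatever the automated step guessed for them. The sketch below is a hypothetical helper, not the procedure used for the talk; the cut-off `top_n` is an arbitrary assumption.

```python
from collections import Counter


def manual_review_queue(commit_authors, genders, top_n=100):
    """List the most active authors with their current gender guess.

    `commit_authors` has one entry per commit; `genders` maps an author
    to a guess. Authors without a guess are listed as 'unknown'.
    """
    counts = Counter(commit_authors)
    queue = []
    for author, n_commits in counts.most_common(top_n):
        queue.append((author, n_commits, genders.get(author, "unknown")))
    return queue


# Made-up data, only to show the shape of the result.
authors = ["ann"] * 3 + ["bob"] * 2 + ["li"]
guesses = {"ann": "female", "bob": "male"}
queue = manual_review_queue(authors, guesses, top_n=3)
```

Reviewing from the top of this list covers most of the activity with the fewest manual checks.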

  25. Some numbers
    Git Contributions
    Gerrit Contributions
    Mailing Lists Contributions

  26. Git Overview
    ● Aggregated historical data
    ● Yaml file

  27. Git Activity and Population
    Women activity (all history):
    27,162 commits (7.35% of the activity)
    839 developers (10.63% of the population)

  28. Git Activity and Population
    Women activity (last year):
    7,748 commits (8.58% of the activity)
    422 developers (11.53% of the population)
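
As a quick sanity check, each count and percentage pair on the Git slides implies an overall total (total = part / share). The sketch below only does that arithmetic on the figures quoted above; the derived totals are back-of-the-envelope estimates affected by rounding, not numbers reported in the talk.

```python
def implied_total(part, share_percent):
    """Total implied by `part` being `share_percent` percent of it."""
    if share_percent <= 0:
        raise ValueError("share_percent must be positive")
    return part / (share_percent / 100.0)


# (women's count, women's share in percent), as quoted on the slides.
figures = {
    "commits, all history": (27162, 7.35),
    "developers, all history": (839, 10.63),
    "commits, last year": (7748, 8.58),
    "developers, last year": (422, 11.53),
}
totals = {
    label: implied_total(part, share)
    for label, (part, share) in figures.items()
}

for label, total in totals.items():
    print(f"{label}: ~{total:,.0f}")
```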

  29. Git Activity Women Evolution
    ● There are jumps at the beginning of the year
    ● Stable during the last year

  30. Git Authors Women Evolution
    ● Interesting pattern: each year is kind of a block of developers

  31. Mailing Lists Overview
    ● openstack-dev
    ● Language-oriented mailing lists

  32. Mailing List WOO Activity
    ● 14K emails, 9% of the activity
    ● 672 WOO participants, 9.07% of the population
    ● Similar numbers to the last year

  33. Code Reviews Overview
    ● Projects not found in the yaml file were ignored
    ● Package-deb project also ignored
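
The filtering rule above amounts to keeping a review only when its project is on the known list and not on an ignore list. The sketch below shows that rule on plain Python data; the record layout and all project names are placeholders, and loading the list from the yaml file is left out.

```python
def filter_reviews(reviews, known_projects, ignored_projects=()):
    """Keep reviews whose project is known and not explicitly ignored."""
    known = set(known_projects)
    ignored = set(ignored_projects)
    return [
        review for review in reviews
        if review["project"] in known and review["project"] not in ignored
    ]


# Placeholder names, not real project identifiers.
reviews = [
    {"id": 1, "project": "project-a"},
    {"id": 2, "project": "project-b"},
    {"id": 3, "project": "not-in-yaml"},
]
kept = filter_reviews(
    reviews,
    known_projects=["project-a", "project-b"],
    ignored_projects=["project-b"],
)
```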

  34. Gerrit WOO Activity
    ● 28,503 changesets sent, 9.4% of the activity
    ● 812 women sending changesets, 11.87% of the population
    ● 9.56% of the activity and 13% of the population during the last year
    Women sending changesets

  35. Analysis Comparison with the status in
    Austin (6 months ago)

  36. October-April-October (Git)
    October 2015 - April 2016
    April 2016 - October 2016

  37. October-April-October
    October 2015 - April 2016
    April 2016 - October 2016
    Women reviewers

  38. Conclusions
    Answer to First Questions
    Data to Make Decisions
    Open Paths

  39. Some Answers
    ● Similar activity in Git: increase in the number of repositories
    ● WOO: lower activity as core reviewers (~ -9%)
    ○ On the other hand, activity has increased (~ 6%)

  40. Conclusions
    Room for improvement of the dataset
    This provides some initial numbers about the current status
    Hopefully useful for the Foundation

  41. Open Questions from Last Talk
    Question: Is there a specific action that would help you with
    data correctness or name identification?
    Suggestion: integrate the OpenStack ID with Gerrit; the
    Foundation members directory has specific information
    related to gender
    Video: https://www.youtube.com/watch?v=TQIQCT-Aqpo

  42. Open Questions from Last Talk
    Comment: the reason why the documentation project is doing
    so great is because they have great inclusive leaders
    Comment: Another interesting point is 'retention': how to
    bring them on board and keep them contributing
    Video: https://www.youtube.com/watch?v=TQIQCT-Aqpo

  43. Open Questions from Last Talk
    Suggestion: work on relative numbers rather than on
    the net numbers. As projects come and go, it would be
    interesting to work at this level.
    Comment: what about working at the high school level, as is done in
    the USA/Europe? People are willing to help with this line.
    Suggestion: address people outside of the gender binary
    Video: https://www.youtube.com/watch?v=TQIQCT-Aqpo

  44. Further Work
    Sensitive info: dashboard still private
    Extra analysis: fairness in time to merge, percentage of women per
    company, Outreachy follow-ups, quarterly reports, updated data,
    ROI of specific policies, and others.
    This [hopefully] helps to have a better picture
    Analysis of other minorities could be done

  45. Gender-diversity analysis
    of technical contributions
    Daniel Izquierdo Cortázar
    @dizquierdo
    dizquierdo at bitergia dot com
    https://speakerdeck.com/bitergia
    OpenStack Summit, Barcelona 2016
