
Open Development Analytics (reduced version)


Talk at the Metrics Session of the Open Source Summit Paris 2016. November 16th 2016, Paris (France).

Jesus M. Gonzalez-Barahona

November 15, 2016



Transcript

  1. Open Development Analytics
    A Step Towards More Project Transparency
    (Reduced version)
    Jesus M. Gonzalez-Barahona
    [email protected] @jgbarah http://speakerdeck.com/jgbarah
    Bitergia / LibreSoft (URJC)
    Open Source Summit
    Paris (France), November 16th 2016
    Jesus Gonzalez-Barahona (Bitergia) Open Development Analytics Paris, Nov 2016 1 / 54


  2. Open

  3. Software development
    http://xkcd.com/844/

  4. Analytics
    https://en.wikipedia.org/wiki/Charles_Joseph_Minard

  5. Open Development Analytics

  6. Structure of the presentation
    1 A bit of context
    2 Transparency and governance
    3 Open development analytics
    4 How are changes being reviewed?
    5 Dependency
    6 Dealing with issues?
    7 Diversity
    8 The end

  7. A bit of context

  8. Me and my two hats
    Uni Rey Juan Carlos:
    LibreSoft research team
    Understanding free, open source software
    Data analytics approach
    Bitergia:
    From research to the real world
    Understanding software development
    Data analytics approach
    http://gsyc.es/~jgb

  9. The company
    The software development analytics company
    dashboards
    reports
    consultancy
    ...
    http://bitergia.com

  10. Transparency and governance

  11. Who drives open software development?

  12. Who drives open software development
    A community
    Persons (and organizations) with
    common goals
    different interests

  13. Working together
    Self-awareness
    Governance
    Transparency

  14. Self-awareness
    Open development communities
    need to be self-aware
    data is the source for awareness...
    when it can be used for “sensing”
    The same applies
    to any open organization

  15. Governance
    “Establishment of policies, and continuous
    monitoring of their proper implementation, by the
    members of the governing body of an
    organization. It includes the mechanisms required
    to balance the powers of the members (with the
    associated accountability), and their primary duty
    of enhancing the prosperity and viability of the
    organization.”
    http://businessdictionary.com

  16. Governance
    “Establishment of policies, and continuous
    monitoring of their proper implementation, by
    the members of the governing body of an
    organization. It includes the mechanisms required
    to balance the powers of the members (with the
    associated accountability), and their primary
    duty of enhancing the prosperity and viability of
    the organization.”
    http://businessdictionary.com

  17. Transparency
    It comes in two flavors
    Transparency to the community
    (fairness)
    Transparency to third parties
    (trust)
    Which for open organizations are kind of the same

  18. Transparency
    Example of rationale (OpenStack):
    “OpenStack favors disclosure and transparency to
    promote sharing and collaboration within the
    OpenStack community”
    https://www.openstack.org/legal/transparency-policy/

  19. Transparency: showing the data is not enough

  20. Open development analytics

  21. A new dimension of openness
    When we develop in the open
    we produce a great deal of data
    about how we develop
    “Show me the development data”
    as a step beyond
    “show me the code”

  22. From open development to open development analytics
    Information about code, community, development
    for open development projects
    can be retrieved, organized, analyzed
    Let’s publish analytics results & data
    Open Development Analytics:
    A new standard for transparency

  23. Open development analytics
    Who may benefit?
    Developers
    Project managers
    Community managers
    Evaluators
    ...
    Anyone interested in the health of the project

  24. Who may benefit?
    Slide used by Jim Zemlin at LF Collab 2016

  25. Some areas of interest
    Performance (understanding activity)
    Company participation (beyond copyright
    notices)
    Transparency (available information)
    Auditing (certify participation, experience, etc.)
    Profiling (key people, companies)
    Neutrality (fair treatment)

  26. How are changes being
    reviewed?

  27. Some reviewers are more equal than others
    http://blog.bitergia.com/2015/12/30/some-developers-are-more-equal-than-others/

  28. Neutrality?
    [Plot. x axis: Number of accepted reviews (250 to 4000); y axis: Iterations per accepted review (median), 0 to 3]

  29. Dependency

  30. Apache Pony Factor
    In the words of Daniel Gruno:
    We [the ASF] created a term we have coined
    “Pony Factor” (because ASF is full of ponies, or
    people who think they are ponies). Pony Factor
    (PF) shows the diversity of a project in terms of
    the division of labor among committers in a
    project.
    Pony Factor is determined as:
    “The lowest number of committers whose
    total contribution constitutes the majority of
    the codebase”
    https://ke4qqq.wordpress.com/2015/02/08/pony-factor-math/
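
The definition above can be sketched in a few lines of Python. This is only an illustration, under two assumptions that are not in the slide: contribution is measured as number of commits, and "majority" means strictly more than half. The function name and the sample authors are hypothetical.

```python
from collections import Counter


def pony_factor(commit_authors):
    """Smallest number of committers whose commits exceed half of all commits.

    Assumption: contribution is measured in commits, one list entry per commit.
    """
    counts = Counter(commit_authors)
    total = sum(counts.values())
    covered = 0
    # Walk committers from most to least active, accumulating their commits.
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if 2 * covered > total:
            return rank
    return 0  # empty history


# Hypothetical commit log: one author name per commit.
authors = ["ana"] * 5 + ["bob"] * 3 + ["eve"] * 2 + ["dan"]
print(pony_factor(authors))
```

A low value suggests the project depends on very few people; a high value suggests the work is spread more widely.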

  31. Bitergia Elephant Factor

  32. Bitergia Elephant Factor
    Projects can benefit from powerful collaborations
    from companies (elephants). The elephant factor
    shows the diversity of a project in terms of the
    division of labor among companies (by means of
    developers affiliated with them).
    Elephant factor is determined as:
    “The lowest number of companies whose
    total contribution (in commits by their
    employees) constitutes the majority of the
    commits”
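
A minimal Python sketch of this definition follows. It assumes an author-to-company mapping is already available (in practice, building that mapping is the hard part), that authors without a known affiliation count as their own one-person entity, and that "majority" means strictly more than half. All names and data are hypothetical.

```python
from collections import Counter


def elephant_factor(commit_authors, affiliation):
    """Smallest number of companies whose employees made over half the commits.

    Assumption: authors missing from `affiliation` count as their own entity.
    """
    per_company = Counter(affiliation.get(a, a) for a in commit_authors)
    total = sum(per_company.values())
    covered = 0
    # Walk companies from most to least active, accumulating their commits.
    for rank, (_, n) in enumerate(per_company.most_common(), start=1):
        covered += n
        if 2 * covered > total:
            return rank
    return 0  # empty history


# Hypothetical data: one author name per commit, plus known affiliations.
authors = ["ana"] * 5 + ["bob"] * 3 + ["eve"] * 2 + ["dan"]
companies = {"ana": "Acme", "bob": "Acme", "eve": "Globex"}
print(elephant_factor(authors, companies))
```

In this sample a single company accounts for 8 of the 11 commits, so the factor is 1 even though four people contribute.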

  33. Code “owned”
    “The land belongs
    to its workers”
    Emiliano Zapata

  34. Code “owned”
    The code changes over time. The current version is
    “owned” by the people who produced it.
    The code “belongs” to those who wrote it.
    Zapata factor (work in progress):
    “The lowest number of developers for whom
    the total number of lines of code they “own”
    (were last touched by them) constitutes the
    majority of the lines of code”
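
One possible way to approximate this, sketched in Python: count the `author` header lines in the output of `git blame --line-porcelain`, which repeats the commit information for every line of a file, and then find the smallest group of authors owning more than half of the lines. This is not necessarily how the figures in the talk were produced; the helper names and the trimmed sample input are hypothetical.

```python
from collections import Counter


def lines_owned(blame_text):
    """Count lines per author in `git blame --line-porcelain` output."""
    owned = Counter()
    for line in blame_text.splitlines():
        # Header lines look like "author Ana"; file content lines start with a tab,
        # and other headers ("author-mail", "author-time") use a hyphen.
        if line.startswith("author "):
            owned[line[len("author "):]] += 1
    return owned


def zapata_factor(owned):
    """Smallest number of developers owning more than half of the lines."""
    total = sum(owned.values())
    covered = 0
    for rank, (_, n) in enumerate(owned.most_common(), start=1):
        covered += n
        if 2 * covered > total:
            return rank
    return 0  # no lines at all


def fake_blame(authors):
    # Hypothetical, heavily trimmed stand-in for real blame output.
    return "\n".join(
        f"0000000 {i} {i} 1\nauthor {a}\n\tcode line {i}"
        for i, a in enumerate(authors, start=1)
    )


sample = fake_blame(["Ana", "Ana", "Bob", "Bob", "Eve"])
print(zapata_factor(lines_owned(sample)))
```

For a whole repository, the per-file counters would be summed before computing the factor.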

  35. Diversity: Code “owned”
    [Linux kernel, July 2016, Zapata factor: 200]

  36. Code “owned”
    The code “belongs” to companies who employ
    developers changing it.
    United Fruit factor (work in progress):
    “The lowest number of companies for whom
    the total number of lines of code they “own”
    (were last touched by their employees)
    constitutes the majority of the lines of code”

  37. Pony / elephant factors for some projects
    Project        Pony Factor   Elephant Factor   Commits (excl. bots)
    OpenNebula     4             1                 12K
    Eucalyptus     5             1                 25K
    CloudStack     14            1                 42K
    OpenStack      >100          6                 126K
    CloudFoundry   41            1                 60K
    OpenShift      10            1                 15K
    Docker         15            1                 18K
    Kubernetes     12            1                 7K
    [July 2015]

  38. Dealing with issues?

  39. Issues may be processed not as intended
    Policy (or recommendations) may mandate transitions
    but are they real?
    Time to close when same company reporting / fixing?
    Time to close for external bug reports?
    Time to close depending on who reports?
    Who opens tickets that nobody cares about?
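
As a rough illustration of the "time to close depending on who reports" question, the Python sketch below compares the median days to close for issues reported from inside and outside one organization. The issue records, field names and the organization name are all hypothetical; real trackers expose this data in their own formats.

```python
from datetime import date
from statistics import median


def median_days_to_close(issues, core_org):
    """Median days to close, split by whether the reporter works for `core_org`."""
    groups = {"internal": [], "external": []}
    for issue in issues:
        if issue["closed"] is None:
            continue  # still open: no time to close yet
        opened = date.fromisoformat(issue["opened"])
        closed = date.fromisoformat(issue["closed"])
        key = "internal" if issue["reporter_org"] == core_org else "external"
        groups[key].append((closed - opened).days)
    return {key: median(days) for key, days in groups.items() if days}


# Hypothetical issue records.
issues = [
    {"reporter_org": "Acme", "opened": "2016-01-01", "closed": "2016-01-03"},
    {"reporter_org": "Acme", "opened": "2016-02-01", "closed": "2016-02-05"},
    {"reporter_org": "Globex", "opened": "2016-01-01", "closed": "2016-01-11"},
    {"reporter_org": "Initech", "opened": "2016-03-01", "closed": "2016-03-21"},
    {"reporter_org": None, "opened": "2016-04-01", "closed": "2016-05-01"},
    {"reporter_org": None, "opened": "2016-06-01", "closed": None},
]
print(median_days_to_close(issues, "Acme"))
```

A large gap between the two medians does not prove unfair treatment by itself, but it is the kind of signal worth investigating.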

  40. E.g.: The “mandated” changes of state

  41. The real changes of state
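
The comparison between mandated and real changes of state can be sketched as follows: count every state transition observed in the issue histories and report those that the policy does not list. The state names and the set of allowed transitions are hypothetical, chosen only to make the example concrete.

```python
from collections import Counter

# Hypothetical policy: the only transitions the project "mandates".
ALLOWED = {
    ("NEW", "ASSIGNED"),
    ("ASSIGNED", "RESOLVED"),
    ("RESOLVED", "VERIFIED"),
    ("VERIFIED", "CLOSED"),
}


def transition_counts(histories):
    """Count each (from_state, to_state) pair seen in the issue histories."""
    counts = Counter()
    for states in histories:
        counts.update(zip(states, states[1:]))
    return counts


def unmandated(counts, allowed):
    """Transitions that happened in practice but are not in the policy."""
    return {pair: n for pair, n in counts.items() if pair not in allowed}


# Hypothetical histories: the ordered list of states each issue went through.
histories = [
    ["NEW", "ASSIGNED", "RESOLVED", "VERIFIED", "CLOSED"],
    ["NEW", "RESOLVED", "CLOSED"],
    ["NEW", "CLOSED"],
]
print(unmandated(transition_counts(histories), ALLOWED))
```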

  42. Diversity

  43. Geography
    Geographical diversity is difficult to assess
    Companies can keep detailed records, but open
    communities are different
    Fortunately, some tools leave traces...
    This allows for better knowledge
    ...and better tracking of initiatives
    Example: policies to enlarge the number of developers
    in XXX region

  44. Geography: time zones in git records
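
Git stores the author's UTC offset with every commit, so a first approximation to this analysis is to count commits per offset. The Python sketch below assumes the dates come from `git log --format=%ad --date=iso`, which prints lines such as `2016-11-16 10:23:45 +0100`; the sample dates are made up. Offsets are only a rough hint of location: they shift with daylight saving time and reflect how the author's machine is configured.

```python
import re
from collections import Counter

# A UTC offset such as "+0100" or "-0500" at the end of the date string.
OFFSET = re.compile(r"[+-]\d{4}$")


def commits_by_utc_offset(author_dates):
    """Count commits per UTC offset found at the end of each date string."""
    counts = Counter()
    for stamp in author_dates:
        match = OFFSET.search(stamp.strip())
        if match:
            counts[match.group(0)] += 1
    return counts


# Hypothetical author dates, as printed by git with --date=iso.
dates = [
    "2016-11-16 10:23:45 +0100",
    "2016-11-15 18:02:10 +0100",
    "2016-11-15 09:40:00 -0500",
    "2016-11-14 23:59:59 +0900",
]
print(commits_by_utc_offset(dates))
```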

  45. Geography: GitHub profiles

  46. Gender: Analyzing by name
    Current situation of gender imbalance in OpenStack
    Gender   Developers   Commits   Commits/devel
    Female   750          14,647    19.5
    Male     4,632        207,112   44.7
    Only names classified with more than 80% certainty.
    [Work in progress, preliminary results]
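
The last column of the table is simply commits divided by developers. The short Python check below reproduces it from the other two columns and also derives each group's share of developers and commits. Those shares are not in the slide, and they cover only the developers whose names could be classified.

```python
# Figures copied from the table above: (developers, commits).
rows = {"Female": (750, 14_647), "Male": (4_632, 207_112)}


def commits_per_developer(developers, commits):
    """Average commits per developer, rounded as in the table."""
    return round(commits / developers, 1)


total_devs = sum(devs for devs, _ in rows.values())
total_commits = sum(commits for _, commits in rows.values())

for gender, (devs, commits) in rows.items():
    print(
        gender,
        commits_per_developer(devs, commits),
        f"{devs / total_devs:.1%} of classified developers",
        f"{commits / total_commits:.1%} of their commits",
    )
```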

  47. Gender: Analyzing by name
    Commits by women: 6.8% (4 Kcommits)
    Women: 9.9% (330 developers)
    Linux kernel, Nov 2015 – Oct 2016

  48. The end

  49. Open Development Analytics Live: OPNFV dashboard
    http://opnfv.biterg.io

  50. Summary
    Open Development Analytics
    A step forward in project
    transparency
    http://grimoirelab.github.io
    http://speakerdeck.com/jgbarah

  51. A moment for a commercial: Join us at MSR 2017!!
    http://2017.msrconf.org
    14th International
    Conference on
    Mining Software
    Repositories
    Co-located with ICSE
    Buenos Aires, Argentina
    Save the dates:
    May 20-21 2017
    Start the conversation!!!
    #msr17

  52. License
    © 2016 Bitergia
    Some rights reserved.
    This presentation is distributed under the
    “Attribution-ShareAlike 3.0” license, by Creative Commons,
    available at
    http://creativecommons.org/licenses/by-sa/3.0/

  53. Credits (1)
    “Man With Two Hats”
    Statue by Henk Visch, located in Ottawa, Canada
    Picture by Lezumbalaberenjena in Wikimedia Commons
    License: Public domain
    https://commons.wikimedia.org/wiki/File:Man_With_Two_Hats_Ottawa_Statue_by_lezumbalaberenjena.jpg
    “Napoleon’s Russian campaign of 1812”
    Original by Charles Minard
    License: Public domain
    https://en.wikipedia.org/wiki/Charles_Joseph_Minard#/media/File:Minard.png
    “Aged Come In We’re Open”
    Picture by Czarina Alegre in Flickr
    License: Creative Commons Attribution 2.0
    https://flic.kr/p/fjGamh

  54. Credits (2)
    “Good code”
    Comic by Randall Munroe, XKCD 844
    License: Creative Commons Attribution-NonCommercial 2.5
    http://xkcd.com/844/
    “Crowd at FOSDEM 2008”
    Picture by Jesús Corrius in Flickr
    License: Creative Commons Attribution 2.0
    http://www.flickr.com/photos/jcorrius/2302302707/
    “Elephant”
    Picture by ajoheyho
    License: Creative Commons Public Domain
    https://pixabay.com/en/elephant-african-bush-elephant-114543/
    “Emiliano Zapata”
    License: Public Domain