Metrics to Characterize a Software Development Community

Invited talk at the 12th International Conference on Open Source Systems (OSS). This is a practical talk, based on the contents of our workshop on software development analytics.

Transcript

  1. Metrics to Characterize a Software Development Community
    Jesus M. Gonzalez-Barahona
    @jgbarah
    Bitergia / LibreSoft (URJC)
    http://speakerdeck.com/jgbarah/
    12th International Conference on Open Source Systems (OSS)
    Gothenburg (Sweden), May 30th 2016
    Jesus Gonzalez-Barahona (Bitergia) Metrics for a Software Development Community OSS 2016 1 / 54

  2. Structure of the presentation
    1 A bit of context
    2 Dealing with dynamic complexity
    3 Sources of information
    4 Activity / size
    5 Performance
    6 Demographics
    7 Diversity
    8 Final remarks

  3. A bit of context

  4. Me and my two hats
    Uni Rey Juan Carlos:
    LibreSoft research team
    Understanding free, open source software
    Data analytics approach
    Bitergia:
    From research to the real world
    Understanding software development
    Data analytics approach
    http://gsyc.es/~jgb

  5. The company
    The software development analytics company
    dashboards
    reports
    consultancy
    ...
    http://bitergia.com

  6. The book
    Evaluating FOSS Projects:
    Work in progress
    Free / open book
    Fork and play!
    https://jgbarah.gitbooks.io/evaluating-foss-projects/

  7. Recommendations
    Open your laptop
    Download the slides (they have links)
    Visit Cauldron.io and produce your own dashboard
    Play with the dashboards
    Understand the interpretations behind the numbers
    http://cauldron.io
    Code: OSS16

  8. Preview: The Cauldron
    http://cauldron.io/dashboards/elastic

  9. Dealing with dynamic complexity

  10. Communities may be large and complex

  11. Projects may be large and complex... and dynamic
    It’s difficult to...
    ...track what’s happening
    ...understand why it’s happening
    ...react quickly
    ...evaluate results of reaction
    If data is available
    analytics may come to the rescue

  12. A continuous process
    Figure out your interest
    Find out available data
    Define key parameters
    Monitor, understand, detect deviations
    Act to correct, improve
    Track results
    Measure → Monitor → Act

  13. A continuous process (example)
    Case: company-led development community
    Interest: activity
    Data: changes to code, tickets
    Parameters: commits, tickets closed
    Monitoring: charts, numbers
    Observation: numbers declining
    Action: allocate more developer effort
    Track results...
    Measure → Monitor → Act
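
    The "numbers declining" observation in this example can be automated in many ways. The sketch below is one simple possibility, not the method used in the talk: it compares the mean of the most recent periods with the mean of the periods just before them. The function name, the window size and the sample figures are all hypothetical.

```python
def is_declining(series, window=3):
    """Return True if the mean of the last `window` values is lower
    than the mean of the `window` values immediately before them."""
    if len(series) < 2 * window:
        return False  # not enough history to compare two windows
    recent = series[-window:]
    previous = series[-2 * window:-window]
    return sum(recent) / window < sum(previous) / window


# Hypothetical commits per month for a project (oldest first).
commits_per_month = [120, 130, 125, 110, 95, 90]
declining = is_declining(commits_per_month)
```

    A check like this only flags a deviation; deciding whether to act (for example, allocating more developer effort) still requires understanding why the numbers changed.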

  14. Sources of information

  15. Repositories, repositories, repositories

  16. Source code management
    Centralized or client/server: CVS, Subversion
    Decentralized: git, Mercurial, Bazaar, etc.
    Today: most of them are accessible through git...
    but the information is not always what it appears to be
    (e.g., branches in Subversion and git)
    Can be integrated with other tools:
    Gerrit, GitHub, etc.

  17. Issue tracking
    Many different systems:
    Bugzilla
    Jira
    GitHub issues
    Phabricator
    RedMine
    Trac ...
    Each with a different model, data, operations...

  18. Code review
    More and more projects using it
    Usually: pre-merge peer review of changes
    Different methods:
    Mailing lists (eg: Linux)
    Gerrit (eg: OpenStack)
    GitHub pull requests (eg: ElasticSearch)
    or even Jira, Bugzilla...
    Usually, references to tickets and commits
    Much of the control on the software lies here

  19. Asynchronous communication
    Mailing lists:
    Mailing list systems (Mailman)
    Google Groups
    Mailing list archivers (Gmane)
    Forums: too many to mention
    Question/Answer sites: StackOverflow, Askbot
    Information is always archived

  20. Synchronous communication
    Systems:
    Traditionally: IRC
    Nowadays: Slack & many others
    Not always text-based (e.g., videoconferences)
    Notes:
    In many cases, lack of archives
    Privacy concerns: considered informal communication
    Difficult to track identities

  21. The many communities
    Development community: all repositories
    Contributing community: issue tracking, async
    communication
    User community: async communication, ...
    Ecosystem community: difficult to track
    Software may include beacons: tracking usage

  22. Activity / size

  23. Activity / size
    Many different aspects of activity:
    committing patches:
    source code management system
    reporting, commenting or fixing bugs:
    issue tracking system
    submitting patches or reviewing them:
    code review system
    sending messages:
    async or sync communication systems

  24. Activity / size (most common cases)
    Parameters reflecting activity for a certain period.
    People active for a certain period.
    Evolution of any of them.
    Trends for any of them.
    Difficult to compare between projects
    Interesting to compare inside project
    (different subprojects, different time frames)
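
    As an illustration of the first two items (activity and active people per period), here is a minimal sketch that counts commits and distinct authors per month. The input format and the sample data are hypothetical; in practice the pairs would come from a source code management system.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample data: (author, commit date) pairs.
commits = [
    ("alice", date(2016, 1, 5)),
    ("bob", date(2016, 1, 20)),
    ("alice", date(2016, 2, 3)),
    ("alice", date(2016, 2, 17)),
    ("carol", date(2016, 3, 9)),
]


def activity_by_month(commits):
    """Return {(year, month): (commit count, distinct author count)}."""
    counts = defaultdict(int)
    authors = defaultdict(set)
    for author, day in commits:
        period = (day.year, day.month)
        counts[period] += 1
        authors[period].add(author)
    return {p: (counts[p], len(authors[p])) for p in sorted(counts)}


monthly = activity_by_month(commits)
```

    The evolution and trends mentioned above are then just this dictionary read in chronological order.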

  25. Activity / size (many facets)
    http://cauldron.io/dashboards/elastic

  26. Activity / size (many facets)
    http://s.bitergia.com/db-fosdem16

  27. Performance

  28. Backlog (evolution over time)
    Example: backlog of open issues.
    http://cauldron.io/dashboards/elastic
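
    One way to compute such a backlog, sketched under the assumption that each issue is reduced to an opening date and an optional closing date (the data below is made up): count, for each snapshot date, the issues already opened and not yet closed.

```python
from datetime import date

# Hypothetical issues: (opened, closed) dates; closed is None while still open.
issues = [
    (date(2016, 1, 10), date(2016, 2, 1)),
    (date(2016, 1, 15), None),
    (date(2016, 2, 20), date(2016, 3, 5)),
    (date(2016, 2, 25), None),
]


def backlog_on(issues, day):
    """Count issues opened on or before `day` and not closed by then."""
    return sum(
        1
        for opened, closed in issues
        if opened <= day and (closed is None or closed > day)
    )


# Backlog at the end of each month gives its evolution over time.
snapshots = [date(2016, 1, 31), date(2016, 2, 29), date(2016, 3, 31)]
backlog = [backlog_on(issues, day) for day in snapshots]
```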

  29. Efficiency
    Example: closed / opened tickets per quarter
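
    A minimal sketch of this ratio, assuming each ticket is an (opened, closed) pair of dates and that only quarters with at least one opened ticket are reported. The helper names and sample tickets are hypothetical.

```python
from collections import Counter
from datetime import date


def quarter(day):
    """Map a date to a (year, quarter) pair, with quarter in 1..4."""
    return (day.year, (day.month - 1) // 3 + 1)


def efficiency_by_quarter(tickets):
    """Return {(year, quarter): closed / opened} for quarters with opened tickets."""
    opened = Counter(quarter(o) for o, _ in tickets)
    closed = Counter(quarter(c) for _, c in tickets if c is not None)
    return {q: closed[q] / opened[q] for q in sorted(opened)}


# Hypothetical tickets: (opened, closed) dates; closed is None while still open.
tickets = [
    (date(2015, 11, 2), date(2016, 1, 20)),
    (date(2016, 1, 10), date(2016, 2, 1)),
    (date(2016, 2, 15), date(2016, 5, 3)),
    (date(2016, 4, 5), None),
    (date(2016, 5, 9), None),
]

efficiency = efficiency_by_quarter(tickets)
```

    A value below 1 means the backlog grew during that quarter; a value above 1 means it shrank.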

  30. Tickets

  31. Code review (time to merge)

  32. Code review (time to merge, metrics)

  33. Code review (time to merge, evolution)

  34. Code review (number of versions per review)

  35. Demographics

  36. The many identities of anyone
    The repository level.
    The class of repository level.
    The project level.
    The global level.

  37. Demographics: The aging chart
    Attraction Retention
    Newcomers Expertise

  38. Diversity

  39. Diversity: geographical information (time zones)
    http://cauldron.io/dashboards/elastic
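
    Time zone information is usually present in commit metadata. As a rough sketch (not necessarily how the dashboard computes it), the code below tallies commits by UTC offset, assuming author dates are available as ISO 8601 strings such as those printed by `git log --format=%aI`. The sample dates are made up.

```python
from collections import Counter
from datetime import datetime

# Hypothetical author dates in strict ISO 8601 format.
author_dates = [
    "2016-05-30T10:15:00+02:00",
    "2016-05-30T09:00:00-07:00",
    "2016-05-29T22:40:00+02:00",
    "2016-05-28T14:05:00+05:30",
]


def commits_by_utc_offset(iso_dates):
    """Count commits per UTC offset, expressed in hours (e.g. 2.0, -7.0, 5.5)."""
    counts = Counter()
    for text in iso_dates:
        offset = datetime.fromisoformat(text).utcoffset()
        counts[offset.total_seconds() / 3600] += 1
    return counts


by_offset = commits_by_utc_offset(author_dates)
```

    An offset only approximates location: it depends on how each developer's machine is configured, and it shifts with daylight saving time.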

  40. Diversity: geographical information (GitHub profiles)

  41. Diversity: affiliation
    http://s.bitergia.com/db-fosdem16

  42. Diversity: Apache Pony Factor
    In the words of Daniel Gruno:
    We [the ASF] created a term we have coined
    “Pony Factor” (because ASF is full of ponies, or
    people who think they are ponies). Pony Factor
    (PF) shows the diversity of a project in terms of
    the division of labor among committers in a
    project.
    Pony Factor is determined as:
    “The lowest number of committers whose
    total contribution constitutes the majority of
    the codebase”
    https://ke4qqq.wordpress.com/2015/02/08/pony-factor-math/
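
    The definition above can be sketched in a few lines. This version assumes that contribution is measured as commits per committer and that "majority" means strictly more than half of the total. The function name and the sample numbers are hypothetical.

```python
def pony_factor(commits_by_author):
    """Smallest number of committers whose commits add up to
    strictly more than half of all commits (0 if there are none)."""
    total = sum(commits_by_author.values())
    covered = 0
    # Take committers from most to least active until the majority is covered.
    ranked = sorted(commits_by_author.values(), reverse=True)
    for rank, commits in enumerate(ranked, start=1):
        covered += commits
        if covered * 2 > total:
            return rank
    return 0


# Hypothetical commit counts per committer.
sample = {"alice": 50, "bob": 30, "carol": 15, "dave": 5}
pf = pony_factor(sample)
```

    Here alice alone accounts for exactly half of the commits, which is not a majority, so two committers are needed.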

  43. Diversity: Bitergia Elephant Factor

  44. Diversity: Bitergia Elephant Factor
    Projects can benefit from powerful collaborations
    from companies (elephants). The elephant factor
    shows the diversity of a project in terms of the
    division of labor among companies (by means of
    developers affiliated with them).
    Elephant factor is determined as:
    “The lowest number of companies whose
    total contribution (in commits by their
    employees) constitutes the majority of the
    commits”
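
    A minimal sketch of this definition, assuming a mapping from each developer to a company is already available. Grouping unaffiliated developers under a single "Unknown" label is a simplification made here, not part of the definition; all names and numbers are hypothetical.

```python
from collections import Counter


def elephant_factor(commits_by_author, company_of):
    """Smallest number of companies whose employees' commits add up to
    strictly more than half of all commits (0 if there are none)."""
    by_company = Counter()
    for author, commits in commits_by_author.items():
        # Assumption: developers with no known affiliation share one bucket.
        by_company[company_of.get(author, "Unknown")] += commits
    total = sum(by_company.values())
    covered = 0
    for rank, (_, commits) in enumerate(by_company.most_common(), start=1):
        covered += commits
        if covered * 2 > total:
            return rank
    return 0


# Hypothetical commit counts and affiliations.
commits_by_author = {"alice": 50, "bob": 30, "carol": 15, "dave": 5}
company_of = {"alice": "Acme", "bob": "Acme", "carol": "Initech"}
ef = elephant_factor(commits_by_author, company_of)
```

    The result depends heavily on the quality of the affiliation data, which usually has to be curated by hand.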

  45. Diversity: some projects
    Project       Pony Factor  Elephant Factor  Commits (excl. bots)
    OpenNebula    4            1                12K
    Eucalyptus    5            1                25K
    CloudStack    14           1                42K
    OpenStack     >100         6                126K
    CloudFoundry  41           1                60K
    OpenShift     10           1                15K
    Docker        15           1                18K
    Kubernetes    12           1                7K

  46. Diversity: Code “owned”
    “The land belongs
    to its workers”
    Emiliano Zapata

  47. Diversity: Code “owned”
    The code changes over time. The current version is
    “owned” by the people who produced it.
    The code “belongs” to those who wrote it.
    Zapata factor (work in progress):
    “The lowest number of developers for whom
    the total number of lines of code they “own”
    (were last touched by them) constitutes the
    majority of the lines of code”
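
    Since the slide marks this metric as work in progress, the following is only a sketch of the stated definition. It assumes the input is a list with the last author of every line in the current version (the kind of information `git blame --line-porcelain` reports per line), and that "majority" means strictly more than half. The sample data is made up.

```python
from collections import Counter


def zapata_factor(last_author_per_line):
    """Smallest number of developers who together "own" (last touched)
    strictly more than half of the lines (0 if there are no lines)."""
    owned = Counter(last_author_per_line)
    total = sum(owned.values())
    covered = 0
    for rank, (_, lines) in enumerate(owned.most_common(), start=1):
        covered += lines
        if covered * 2 > total:
            return rank
    return 0


# Hypothetical blame result: one entry per line of code.
blame = ["alice"] * 4 + ["bob"] * 3 + ["carol"] * 3
zf = zapata_factor(blame)
```

    Unlike a commit-based count, this only looks at the code that survives in the current version, so early contributors whose code has been rewritten do not count.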

  48. Diversity: Code “owned”
    The code “belongs” to companies who employ
    developers changing it.
    United Fruit factor (work in progress):
    “The lowest number of companies for whom
    the total number of lines of code they “own”
    (were last touched by their employees)
    constitutes the majority of the lines of code”

  49. Final remarks

  50. Characterizing a community
    Activity / size
    Performance
    Demography
    Diversity

  51. Room for improvement
    Many other aspects... explore your own
    Refine what is important
    Explore new ways of making data useful
    Tell interesting stories based on data
    Visualization is very important
    Higher-order metrics
    Simplify results, make them meaningful
    Can we characterize many aspects with
    a small set of metrics?

  52. Summary
    You cannot improve
    what you cannot measure
    Fortunately, you can measure a lot of things...

  53. A moment for a commercial: Join us at MSR 2017!!
    http://icse2017.gatech.edu
    http://2017.msrconf.org
    (Coming soon!)
    14th International
    Conference on
    Mining Software
    Repositories
    Co-located with ICSE
    Buenos Aires, Argentina
    Save the dates:
    May 20-21 2017
    Start the conversation!!!
    #msr17

  54. © 2016 Bitergia
    Some rights reserved. This presentation is distributed under the
    “Attribution-ShareAlike 3.0” license, by Creative Commons, available at
    http://creativecommons.org/licenses/by-sa/3.0/

  55. Credits (1)
    “Man With Two Hats”
    Statue by Henk Visch, located in Ottawa, Canada
    Picture by Lezumbalaberenjena in Wikimedia Commons
    License: Public domain
    https://commons.wikimedia.org/wiki/File:Man_With_Two_Hats_Ottawa_Statue_by_lezumbalaberenjena.jpg
    “Crowd at FOSDEM 2008”
    by Jesús Corrius
    License: CC Attribution 2.0
    http://www.flickr.com/photos/jcorrius/2302302707/
    “Emiliano Zapata”
    License: Public Domain