Metrics to Characterize a Software Development Community

Invited talk at the 12th International Conference on Open Source Systems (OSS). This is a practical talk, based on the contents of our workshop on software development analytics.

Transcript

  1. Metrics to Characterize a Software Development Community
    Jesus M. Gonzalez-Barahona
    jgb@bitergia.com @jgbarah
    Bitergia / LibreSoft (URJC)
    http://speakerdeck.com/jgbarah/
    12th International Conference on Open Source Systems (OSS), Gothenburg (Sweden), May 30th 2016
    Jesus Gonzalez-Barahona (Bitergia) Metrics for a Software Development Community OSS 2016 1 / 54
  2. Structure of the presentation
    1. A bit of context
    2. Dealing with dynamic complexity
    3. Sources of information
    4. Activity / size
    5. Performance
    6. Demographics
    7. Diversity
    8. Final remarks
  3. A bit of context
  4. Me and my two hats
    Uni Rey Juan Carlos: LibreSoft research team. Understanding free, open source software. Data analytics approach.
    Bitergia: from research to the real world. Understanding software development. Data analytics approach.
    http://gsyc.es/~jgb
  5. The company
    The software development analytics company: dashboards, reports, consultancy...
    http://bitergia.com
  6. The book
    Evaluating FOSS Projects. Work in progress. Free / open book: fork and play!
    https://jgbarah.gitbooks.io/evaluating-foss-projects/
  7. Recommendations
    Open your laptop. Download the slides (they have links).
    Visit Cauldron.io and produce your own dashboard (http://cauldron.io, code: OSS16).
    Play with the dashboards. Understand the interpretations behind the numbers.
  8. Preview: The Cauldron http://cauldron.io/dashboards/elastic
  9. Dealing with dynamic complexity
  10. Communities may be large and complex
  11. Projects may be large and complex... and dynamic
    It’s difficult to track what’s happening, understand why it’s happening, react quickly, and evaluate the results of the reaction.
    If data is available, analytics may come to the rescue.
  12. A continuous process
    Figure out your interest. Find out available data. Define key parameters.
    Monitor, understand, detect deviations. Act to correct and improve. Track results.
    Measure → Monitor → Act
  13. A continuous process (example)
    Case: company-led development community.
    Interest: activity.
    Data: changes to code, tickets.
    Parameters: commits, tickets closed.
    Monitoring: charts, numbers.
    Observation: numbers declining.
    Action: allocate more developer effort.
    Track results...
    Measure → Monitor → Act
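
The “numbers declining” observation in this example can be made mechanical. The sketch below is a minimal, hypothetical check, assuming activity has already been aggregated per period (for instance, commits per month): it compares the mean of the most recent `window` periods with the mean of the `window` periods just before them. The function name, the window size and the 0.8 threshold are illustrative assumptions, not part of the talk.

```python
from statistics import mean


def is_declining(series, window=3, threshold=0.8):
    """Return True if the mean of the last `window` periods is below
    `threshold` times the mean of the `window` periods before them.

    `window` and `threshold` are illustrative assumptions.
    """
    if len(series) < 2 * window:
        raise ValueError("need at least 2 * window periods of data")
    recent = mean(series[-window:])
    previous = mean(series[-2 * window:-window])
    return recent < threshold * previous


# Hypothetical monthly commit counts for a company-led project.
commits_per_month = [120, 115, 130, 90, 80, 70]
print(is_declining(commits_per_month))  # True: 80 is below 0.8 * 121.7
```

A relative threshold keeps the check meaningful for projects of very different sizes, in line with the talk’s point that comparisons make most sense inside a project.
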
  14. Sources of information
  15. Repositories, repositories, repositories
  16. Source code management
    Centralized or client/server: CVS, Subversion. Decentralized: git, Mercurial, Bazaar, etc.
    Today most of them are accessible through git... but the information is not always what it appears to be (e.g., branches in Subversion and git).
    Can be integrated with other tools: Gerrit, GitHub, etc.
  17. Issue tracking
    Many different systems: Bugzilla, Jira, GitHub issues, Phabricator, Redmine, Trac...
    Each with a different model, data, operations...
  18. Code review
    More and more projects are using it. Usually: peer review, pre-merge change review.
    Different methods: mailing lists (e.g., Linux), Gerrit (e.g., OpenStack), GitHub pull requests (e.g., Elasticsearch), or even Jira, Bugzilla...
    Usually with references to tickets and commits.
    Much of the control over the software lies here.
  19. Asynchronous communication
    Mailing lists: mailing list systems (Mailman), Google Groups, mailing list archivers (Gmane).
    Forums: too many to mention.
    Question/answer sites: StackOverflow, Askbot.
    Information is always archived.
  20. Synchronous communication
    Systems: traditionally IRC; nowadays Slack and many others. Not always text-based (e.g., videoconferences).
    Notes: in many cases, lack of archives. Privacy concerns: considered informal communication. Difficult to track identities.
  21. The many communities
    Development community: all repositories.
    Contributing community: issue tracking, async communication.
    User community: async communication, ...
    Ecosystem community: difficult to track. Software may include beacons for tracking usage.
  22. Activity / size
  23. Activity / size
    Many different aspects of activity:
    committing patches: source code management system;
    reporting, commenting on, or fixing bugs: issue tracking system;
    submitting patches or reviewing them: code review system;
    sending messages: async or sync communication systems.
  24. Activity / size (most common cases)
    Parameters reflecting activity for a certain period. People active for a certain period. Evolution of any of them. Trends for any of them.
    Difficult to compare between projects. Interesting to compare within a project (different subprojects, different time frames).
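
As an illustration of “parameters reflecting activity” and “people active” for a period, here is a minimal sketch that counts commits and distinct authors per month. It assumes commits are available as (author, ISO 8601 date) pairs, for instance built from `git log --pretty=format:'%an|%aI'`; the function name and the input format are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def activity_per_month(commits):
    """Count commits and distinct active authors per calendar month.

    `commits` is an iterable of (author, ISO 8601 date string) pairs
    (a hypothetical input format). The month is taken from the date as
    recorded, i.e. in the committer's own local time.
    """
    n_commits = defaultdict(int)
    authors = defaultdict(set)
    for author, date in commits:
        month = datetime.fromisoformat(date).strftime("%Y-%m")
        n_commits[month] += 1
        authors[month].add(author)
    return {m: (n_commits[m], len(authors[m])) for m in sorted(n_commits)}


commits = [
    ("alice", "2016-03-02T10:00:00+01:00"),
    ("bob", "2016-03-15T18:30:00+01:00"),
    ("alice", "2016-03-20T09:00:00+01:00"),
    ("carol", "2016-04-01T11:00:00+02:00"),
]
for month, (n, people) in activity_per_month(commits).items():
    print(month, n, people)
# 2016-03 3 2
# 2016-04 1 1
```

Plotting both series over time gives the “evolution” view; comparing consecutive periods gives the “trend”.
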
  25. Activity / size (many facets) http://cauldron.io/dashboards/elastic
  26. Activity / size (many facets) http://s.bitergia.com/db-fosdem16
  27. Performance
  28. Backlog (evolution over time)
    Example: backlog of open issues. http://cauldron.io/dashboards/elastic
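
The backlog chart can be reproduced from ticket open and close dates. A minimal sketch, assuming each issue is given as an (opened, closed) pair of dates with closed set to None for issues still open (a hypothetical input format), samples the number of open issues every `step_days` days.

```python
from datetime import date, timedelta


def backlog(issues, start, end, step_days=7):
    """Number of issues open on each sampled day between start and end.

    An issue counts as open on day d if it was opened on or before d
    and was not closed on or before d.
    """
    result = []
    day = start
    while day <= end:
        open_count = sum(
            1 for opened, closed in issues
            if opened <= day and (closed is None or closed > day)
        )
        result.append((day, open_count))
        day += timedelta(days=step_days)
    return result


issues = [
    (date(2016, 1, 1), date(2016, 1, 10)),
    (date(2016, 1, 5), None),
    (date(2016, 1, 12), date(2016, 1, 20)),
]
for day, n in backlog(issues, date(2016, 1, 1), date(2016, 1, 22)):
    print(day, n)
# 2016-01-01 1
# 2016-01-08 2
# 2016-01-15 2
# 2016-01-22 1
```
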
  29. Efficiency
    Example: closed / opened tickets per quarter.
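
The efficiency ratio can be computed from the same kind of ticket data. This sketch, again assuming (opened, closed) date pairs with closed set to None for open tickets, counts tickets opened and closed in each calendar quarter and divides the two; the helper names are hypothetical. A ratio above 1 means more tickets were closed than opened, so the backlog shrank during that quarter.

```python
from collections import Counter
from datetime import date


def quarter(d):
    """Label such as '2016-Q2' for a date."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def efficiency_per_quarter(issues):
    """Ratio of tickets closed to tickets opened in each quarter.

    Quarters with closures but no openings get float('inf').
    """
    opened = Counter(quarter(o) for o, _ in issues)
    closed = Counter(quarter(c) for _, c in issues if c is not None)
    return {
        q: closed[q] / opened[q] if opened[q] else float("inf")
        for q in sorted(set(opened) | set(closed))
    }


issues = [
    (date(2016, 1, 5), date(2016, 2, 1)),   # opened Q1, closed Q1
    (date(2016, 2, 10), date(2016, 4, 3)),  # opened Q1, closed Q2
    (date(2016, 3, 1), None),               # opened Q1, still open
    (date(2016, 4, 20), date(2016, 5, 2)),  # opened Q2, closed Q2
]
print({q: round(r, 2) for q, r in efficiency_per_quarter(issues).items()})
# {'2016-Q1': 0.33, '2016-Q2': 2.0}
```
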
  30. Tickets
  31. Code review (time to merge)
  32. Code review (time to merge, metrics)
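
For time to merge, a sketch assuming each review is described by its submission and merge timestamps (an illustrative format, not the schema of any particular tool) computes the delay in days for merged reviews. The median is printed alongside the mean because review times tend to be skewed: in the example, one review that waits ten days pulls the mean well above the median.

```python
from datetime import datetime
from statistics import mean, median


def time_to_merge_days(reviews):
    """Days from submission to merge for each merged review.

    `reviews` is a list of (submitted, merged) ISO 8601 strings; merged
    is None for reviews that were abandoned or are still open, and those
    are skipped.
    """
    return [
        (datetime.fromisoformat(m) - datetime.fromisoformat(s)).total_seconds() / 86400
        for s, m in reviews
        if m is not None
    ]


reviews = [
    ("2016-05-01T10:00:00", "2016-05-02T10:00:00"),
    ("2016-05-03T09:00:00", "2016-05-03T21:00:00"),
    ("2016-05-04T08:00:00", "2016-05-14T08:00:00"),
    ("2016-05-05T08:00:00", None),
]
days = time_to_merge_days(reviews)
print(days)                   # [1.0, 0.5, 10.0]
print(median(days))           # 1.0
print(round(mean(days), 2))   # 3.83
```
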
  33. Code review (time to merge, evolution)
  34. Code review (number of versions per review)
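
In Gerrit each new version of a change is uploaded as a new patch set, so the number of versions per review can be read from patch set numbers. A minimal sketch, assuming (review_id, patchset_number) pairs as input (a hypothetical format) and patch sets numbered from 1 without gaps, takes the highest number per review and then builds the distribution: how many reviews needed 1, 2, 3... versions.

```python
from collections import Counter


def versions_per_review(patchsets):
    """Number of versions uploaded for each review.

    Assumes patch sets are numbered 1, 2, 3... without gaps, so the
    highest number seen equals the number of versions.
    """
    latest = {}
    for review, number in patchsets:
        latest[review] = max(number, latest.get(review, 0))
    return latest


patchsets = [(101, 1), (101, 2), (102, 1), (103, 1), (103, 2), (103, 3), (104, 1)]
per_review = versions_per_review(patchsets)
print(per_review)  # {101: 2, 102: 1, 103: 3, 104: 1}
# Distribution as (versions, number of reviews) pairs.
print(sorted(Counter(per_review.values()).items()))  # [(1, 2), (2, 1), (3, 1)]
```
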
  35. Demographics
  36. The many identities of anyone
    The repository level. The class-of-repository level. The project level. The global level.
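
Merging identities is a prerequisite for counting people rather than accounts. The sketch below illustrates the idea at the project level with a deliberately naive rule, which is an assumption and not the method used in the talk: two identities are merged if they share an email address or a name (case-insensitive). Because merging is transitive, a union-find structure groups them. Name matching can wrongly merge different people who share a name, so in practice automatic matching is usually complemented with manual review.

```python
def merge_identities(identities):
    """Group identities (source, name, email) into unique persons.

    Naive heuristic (an assumption): identities sharing a lowercased
    email or a lowercased name belong to the same person.
    """
    parent = list(range(len(identities)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    first_seen = {}
    for i, (_, name, email) in enumerate(identities):
        for key in (("email", email.lower()), ("name", name.lower())):
            if not key[1]:
                continue  # skip missing values so they do not merge everyone
            if key in first_seen:
                union(i, first_seen[key])
            else:
                first_seen[key] = i

    groups = {}
    for i, identity in enumerate(identities):
        groups.setdefault(find(i), []).append(identity)
    return list(groups.values())


identities = [
    ("git", "Jane Doe", "jane@example.com"),
    ("jira", "jdoe", "Jane@Example.com"),
    ("mailing list", "jdoe", "jdoe@home.example.org"),
    ("git", "John Roe", "john@example.com"),
]
for person in merge_identities(identities):
    print([source for source, _, _ in person])
# ['git', 'jira', 'mailing list']
# ['git']
```
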
  37. Demographics: the aging chart
    Attraction. Retention. Newcomers. Expertise.
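
One plausible way to compute the data behind an aging chart, sketched under stated assumptions: each developer is summarized by the dates of their first and last activity; developers are grouped by age (time since first activity) into 180-day buckets; “attracted” counts everyone in a bucket, and “retained” counts those whose last activity falls within 90 days of the snapshot date. The bucket size, activity window and function name are illustrative choices, not necessarily those used by the dashboards in the talk.

```python
from collections import defaultdict
from datetime import date


def aging_chart(activity, snapshot, bucket_days=180, active_days=90):
    """Return (age bucket, attracted, retained) triples, youngest first.

    `activity` maps each developer to a (first, last) activity date pair.
    Bucket b covers ages from b * bucket_days to (b + 1) * bucket_days.
    """
    attracted = defaultdict(int)
    retained = defaultdict(int)
    for first, last in activity.values():
        bucket = (snapshot - first).days // bucket_days
        attracted[bucket] += 1
        if (snapshot - last).days <= active_days:
            retained[bucket] += 1
    return [(b, attracted[b], retained[b]) for b in sorted(attracted)]


activity = {
    "alice": (date(2014, 2, 1), date(2016, 5, 20)),
    "bob": (date(2014, 3, 1), date(2015, 1, 10)),
    "carol": (date(2016, 1, 15), date(2016, 5, 25)),
    "dave": (date(2016, 2, 1), date(2016, 2, 3)),
}
for bucket, joined, still_active in aging_chart(activity, date(2016, 5, 30)):
    print(f"age {bucket * 180}-{(bucket + 1) * 180} days: "
          f"{joined} attracted, {still_active} retained")
# age 0-180 days: 2 attracted, 1 retained
# age 720-900 days: 2 attracted, 1 retained
```

Young buckets with low retention point to newcomers who do not stay; old buckets with high retention point to a core of experienced developers.
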
  38. Diversity
  39. Diversity: geographical information (time zones) http://cauldron.io/dashboards/elastic
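
Commit dates in git carry the author’s UTC offset, which gives a rough hint of geographical distribution. A minimal sketch, assuming dates in ISO 8601 with an explicit offset (as printed by `git log --date=iso-strict`), counts commits per offset in hours; the function name is hypothetical. Daylight saving time shifts offsets by an hour, and many regions share one, so this is only a proxy for location.

```python
from collections import Counter
from datetime import datetime


def commits_per_utc_offset(dates):
    """Count commits by the UTC offset (in hours) recorded in their date.

    Every date must include an explicit offset such as '+02:00'.
    """
    offsets = Counter()
    for d in dates:
        offset = datetime.fromisoformat(d).utcoffset()
        offsets[offset.total_seconds() / 3600] += 1
    return dict(sorted(offsets.items()))


dates = [
    "2016-05-30T10:00:00+02:00",
    "2016-05-30T11:00:00+02:00",
    "2016-05-29T09:00:00-07:00",
    "2016-05-29T18:00:00+05:30",
]
print(commits_per_utc_offset(dates))  # {-7.0: 1, 2.0: 2, 5.5: 1}
```
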
  40. Diversity: geographical information (GitHub profiles)
  41. Diversity: affiliation http://s.bitergia.com/db-fosdem16
  42. Diversity: Apache Pony Factor
    In the words of Daniel Gruno: We [the ASF] created a term we have coined “Pony Factor” (because ASF is full of ponies, or people who think they are ponies). Pony Factor (PF) shows the diversity of a project in terms of the division of labor among committers in a project. Pony Factor is determined as: “The lowest number of committers whose total contribution constitutes the majority of the codebase”
    https://ke4qqq.wordpress.com/2015/02/08/pony-factor-math/
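
The Pony Factor definition translates directly into a greedy computation: sort committers by contribution, largest first, and count how many are needed before their cumulative share exceeds half of the total. Taking the largest contributors first guarantees the count is the lowest possible. This sketch uses commit counts as the measure of contribution, which is an assumption; the function name is hypothetical.

```python
def pony_factor(contributions):
    """Lowest number of committers whose contributions add up to a strict
    majority (more than half) of the total.

    `contributions` maps committer -> number of commits (assumed measure).
    """
    total = sum(contributions.values())
    cumulative = 0
    # Largest contributors first: this yields the smallest possible count.
    for n, count in enumerate(sorted(contributions.values(), reverse=True), start=1):
        cumulative += count
        if 2 * cumulative > total:
            return n
    return 0  # no commits at all


print(pony_factor({"ann": 50, "ben": 30, "cy": 15, "di": 5}))  # 2
print(pony_factor({"ann": 60, "ben": 30, "cy": 10}))  # 1
```
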
  43. Diversity: Bitergia Elephant Factor
  44. Diversity: Bitergia Elephant Factor
    Projects can benefit from powerful collaborations with companies (elephants). The elephant factor shows the diversity of a project in terms of the division of labor among companies (by means of developers affiliated with them). Elephant factor is determined as: “The lowest number of companies whose total contribution (in commits by their employees) constitutes the majority of the commits”
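
The Elephant Factor is the same minimal-majority computation applied to companies instead of individuals. A minimal sketch, assuming a list with the author of each commit and a hypothetical author-to-company mapping: commits are first aggregated per company, then companies are taken largest first until they hold a strict majority of the commits. Authors without a known affiliation are pooled under "Unknown", a simplifying assumption.

```python
from collections import Counter


def elephant_factor(commit_authors, affiliation):
    """Lowest number of companies whose employees' commits add up to a
    strict majority of all commits.

    `commit_authors` lists the author of each commit; `affiliation` maps
    an author to a company (unknown authors go to "Unknown").
    """
    per_company = Counter(affiliation.get(a, "Unknown") for a in commit_authors)
    total = sum(per_company.values())
    cumulative = 0
    for n, (_, count) in enumerate(per_company.most_common(), start=1):
        cumulative += count
        if 2 * cumulative > total:
            return n
    return 0  # no commits at all


authors = ["ann"] * 40 + ["ben"] * 25 + ["cy"] * 20 + ["di"] * 15
affiliation = {"ann": "Acme", "ben": "Globex", "cy": "Acme", "di": "Initech"}
print(elephant_factor(authors, affiliation))  # 1 (Acme has 60 of 100 commits)
```
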
  45. Diversity: some projects
    Project        Pony Factor   Elephant Factor   Commits (excl. bots)
    OpenNebula     4             1                 12K
    Eucalyptus     5             1                 25K
    CloudStack     14            1                 42K
    OpenStack      >100          6                 126K
    CloudFoundry   41            1                 60K
    OpenShift      10            1                 15K
    Docker         15            1                 18K
    Kubernetes     12            1                 7K
  46. Diversity: code “owned”
    “The land belongs to its workers” (Emiliano Zapata)
  47. Diversity: code “owned”
    The code changes over time. The current version is “owned” by the people who produced it: the code “belongs” to those who wrote it.
    Zapata factor (work in progress): “The lowest number of developers for whom the total number of lines of code they “own” (were last touched by them) constitutes the majority of the lines of code”
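
Ownership in the Zapata factor sense (lines last touched by someone) is what `git blame` reports. With `--line-porcelain`, git emits a full header for every line of the file, including an `author` entry, so counting those entries gives the lines owned by each author. The sketch below parses that output (abbreviated here: real headers contain more fields) and applies the majority rule; summing the counts over all files of the current version would give a project-level figure. Since the metric is described as work in progress, treat this as one possible reading of the definition; the function names are hypothetical.

```python
from collections import Counter


def lines_owned(blame_porcelain):
    """Count lines per author in `git blame --line-porcelain` output."""
    owned = Counter()
    for line in blame_porcelain.splitlines():
        # Header lines look like "author <name>"; source lines start with
        # a TAB, so file content can never be mistaken for a header.
        if line.startswith("author "):
            owned[line[len("author "):]] += 1
    return owned


def zapata_factor(owned):
    """Lowest number of developers who own a strict majority of the lines.

    `owned` is a Counter mapping author -> lines owned.
    """
    total = sum(owned.values())
    cumulative = 0
    for n, (_, count) in enumerate(owned.most_common(), start=1):
        cumulative += count
        if 2 * cumulative > total:
            return n
    return 0  # empty input


# Abbreviated porcelain output for a 3-line file. On a real repository it
# could be obtained with, e.g.:
#   subprocess.run(["git", "blame", "--line-porcelain", path],
#                  capture_output=True, text=True).stdout
sample = (
    "aaaa 1 1 1\nauthor Ann\nauthor-mail <ann@example.com>\n\tint main(void) {\n"
    "bbbb 2 2 1\nauthor Ben\nauthor-mail <ben@example.com>\n\t    return 0;\n"
    "aaaa 3 3 1\nauthor Ann\nauthor-mail <ann@example.com>\n\t}\n"
)
owned = lines_owned(sample)
print(owned)                 # Counter({'Ann': 2, 'Ben': 1})
print(zapata_factor(owned))  # 1
```
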
  48. Diversity: code “owned”
    The code “belongs” to the companies that employ the developers changing it.
    United Fruit factor (work in progress): “The lowest number of companies for whom the total number of lines of code they “own” (were last touched by their employees) constitutes the majority of the lines of code”
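
The United Fruit factor aggregates line ownership per company. A sketch, assuming per-author line counts (for instance computed from `git blame --line-porcelain`) and a hypothetical author-to-company mapping; unaffiliated authors are pooled under "Unknown", which is an assumption. The example also shows why the strict majority matters: a company owning exactly half of the lines does not reach it on its own.

```python
from collections import Counter


def united_fruit_factor(lines_per_author, affiliation):
    """Lowest number of companies whose employees last touched a strict
    majority of the current lines of code.

    `lines_per_author` maps author -> lines owned; `affiliation` maps
    author -> company.
    """
    per_company = Counter()
    for author, lines in lines_per_author.items():
        per_company[affiliation.get(author, "Unknown")] += lines
    total = sum(per_company.values())
    cumulative = 0
    for n, (_, lines) in enumerate(per_company.most_common(), start=1):
        cumulative += lines
        if 2 * cumulative > total:
            return n
    return 0  # no lines at all


owned = {"ann": 3000, "ben": 2500, "cy": 2000, "di": 2500}
affiliation = {"ann": "Acme", "ben": "Globex", "cy": "Initech", "di": "Globex"}
# Globex owns exactly 5000 of 10000 lines: half, but not a majority.
print(united_fruit_factor(owned, affiliation))  # 2
```
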
  49. Final remarks
  50. Characterizing a community
    Activity / size. Performance. Demography. Diversity.
  51. Room for improvement
    Many other aspects... explore your own. Refine what is important.
    Explore new ways of making data useful. Tell interesting stories based on data. Visualization is very important.
    Higher-order metrics: simplify results, make them meaningful. Can we characterize many aspects with a small set of metrics?
  52. Summary
    You cannot improve what you cannot measure.
    Fortunately, you can measure a lot of things...
  53. A moment for a commercial: join us at MSR 2017!
    14th International Conference on Mining Software Repositories, co-located with ICSE, Buenos Aires, Argentina.
    Save the dates: May 20-21, 2017.
    http://icse2017.gatech.edu, http://2017.msrconf.org (coming soon!)
    Start the conversation! #msr17
  54. © 2016 Bitergia. Some rights reserved. This presentation is distributed under the “Attribution-ShareAlike 3.0” license by Creative Commons, available at http://creativecommons.org/licenses/by-sa/3.0/
  55. Credits (1)
    “Man With Two Hats” statue by Henk Visch, located in Ottawa, Canada. Picture by Lezumbalaberenjena in Wikimedia Commons. License: public domain. https://commons.wikimedia.org/wiki/File:Man_With_Two_Hats_Ottawa_Statue_by_lezumbalaberenjena.jpg
    “Crowd at FOSDEM 2008” by Jesús Corrius. License: CC Attribution 2.0. http://www.flickr.com/photos/jcorrius/2302302707/
    “Emiliano Zapata”. License: public domain.