
How open development data benefits your project and your community

Software development analytics can help in the complex task of gaining knowledge about the performance and inner life of free / open source software development projects and their communities. Providing this information publicly is a further step in project transparency, which helps not only the project itself but also any party interested in it.

[ Presentation at Linux Tag 2014: http://www.linuxtag.org/2014/en/program/talk-details/?eventid=1554 ]

Understanding the inner life of free / open source software projects is of fundamental importance to developers, users and decision makers. Gaining this knowledge is a specialized, time-consuming and error-prone task. Fortunately, the process can be improved by the availability of data and the use of tools for analysis and visualization. This leads to a new step in project transparency: the availability of open development data in formats and ways adapted to the most common uses.
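As a minimal illustration of how accessible this data is, the commit history of a git repository can be exported with `git log --pretty=format:'%H|%ae|%at'` (commit hash, author email, author date as a Unix timestamp) and summarized in a few lines of Python. The sketch below assumes that export format; the function names are hypothetical and not part of any Grimoire tool, and counting by email is only a rough proxy for developer identity.

```python
from collections import Counter
from datetime import datetime, timezone

def parse_git_log(text):
    """Parse lines of 'hash|email|unix_timestamp' (assumed format) into records."""
    commits = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        commit_hash, email, ts = line.split("|")
        commits.append({
            "hash": commit_hash,
            "email": email.lower(),  # normalize case so aliases match
            "date": datetime.fromtimestamp(int(ts), tz=timezone.utc),
        })
    return commits

def commits_per_author(commits):
    """Count commits per author email (a crude stand-in for identity)."""
    return Counter(c["email"] for c in commits)

sample = """a1b2|alice@example.org|1388534400
c3d4|bob@example.com|1388620800
e5f6|Alice@example.org|1388707200"""

print(commits_per_author(parse_git_log(sample)).most_common())
```

In a real checkout, the input text could come from `subprocess.run(["git", "log", "--pretty=format:%H|%ae|%at"], capture_output=True, text=True, check=True).stdout`.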

Software development analytics can help to gain knowledge of a project by highlighting its interesting aspects, tracking relevant patterns, and assisting in the early identification of problems and the detection of trends. It can be used to study the structure of a community and its likely evolution, to detect bottlenecks in a code review process, to evaluate the impact of policies intended to improve bug fixing, to understand company participation in large projects, or to assist in due diligence when free / open source software is an important asset. Having this data publicly available allows any interested party to perform their own analysis, thus generating trust through transparency.
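The community management slide in the transcript groups such parameters into activity (raw volume, participants), reliability (reaction times, pending issues) and sustainability (growth rate). The sketch below shows how some of them could be computed from ticket data, for instance to evaluate a bug-fixing policy before and after it is adopted. The `Ticket` record and its fields are assumptions for illustration, not the data model of any particular tracker.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from statistics import median
from typing import Optional

@dataclass
class Ticket:
    opened: datetime
    first_response: Optional[datetime]  # None if nobody has answered yet
    closed: Optional[datetime]          # None if the ticket is still open
    participants: set = field(default_factory=set)

def activity(tickets):
    """Activity: raw volume (tickets) and number of distinct participants."""
    people = set().union(*(t.participants for t in tickets))
    return {"tickets": len(tickets), "participants": len(people)}

def reliability(tickets):
    """Reliability: median reaction time in hours and pending (open) tickets."""
    reactions = [(t.first_response - t.opened) / timedelta(hours=1)
                 for t in tickets if t.first_response is not None]
    return {
        "median_reaction_hours": median(reactions) if reactions else None,
        "pending": sum(1 for t in tickets if t.closed is None),
    }

def growth_rate(previous, current):
    """Sustainability: relative change of a count between two periods."""
    return (current - previous) / previous if previous else None

t0 = datetime(2014, 1, 1)
tickets = [
    Ticket(t0, t0 + timedelta(hours=2), t0 + timedelta(days=1), {"alice", "bob"}),
    Ticket(t0, t0 + timedelta(hours=6), None, {"bob", "carol"}),
    Ticket(t0, None, None, {"dave"}),
]
print(activity(tickets), reliability(tickets), growth_rate(40, 50))
```

The median is used for reaction times because a few very old tickets would otherwise dominate a mean.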

The talk will show how the open development nature of most free / open source software projects produces a wealth of data that can be retrieved, analyzed and visualized. These processes can be assisted by specific tools such as software development dashboards. Using real-world cases, it will show how these visualizations can be useful for developers, community managers, software integrators, technology forecasters, and in general for any stakeholder interested in the state and evolution of OSS projects. The talk will also present some examples of software development dashboards for real free / open source software projects, and lessons learned from them.

The presentation will be based on the Grimoire technology, which will be used as an excuse to discuss software development analytics and open development data in general.
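The MetricsGrimoire / vizGrimoire split described in the transcript (retrieval on one side, analysis and visualization on the other) suggests a pipeline with a database in between. The sketch below mimics that separation with Python's built-in `sqlite3` module; the table layout and function names are hypothetical and do not reproduce the schemas used by the actual Grimoire tools.

```python
import sqlite3

def store_commits(conn, commits):
    """Retrieval stage: store (hash, author, month) rows; duplicates are ignored."""
    conn.execute("CREATE TABLE IF NOT EXISTS commits ("
                 "hash TEXT PRIMARY KEY, author TEXT, month TEXT)")
    conn.executemany("INSERT OR IGNORE INTO commits VALUES (?, ?, ?)", commits)
    conn.commit()

def monthly_activity(conn):
    """Analysis stage: commits and distinct authors per month, ready to chart."""
    rows = conn.execute(
        "SELECT month, COUNT(*), COUNT(DISTINCT author) "
        "FROM commits GROUP BY month ORDER BY month")
    return [{"month": m, "commits": c, "authors": a} for m, c, a in rows]

conn = sqlite3.connect(":memory:")
store_commits(conn, [("a1", "alice", "2014-01"), ("b2", "bob", "2014-01"),
                     ("c3", "alice", "2014-02"), ("a1", "alice", "2014-01")])
print(monthly_activity(conn))
```

Keeping retrieval and analysis apart means the raw data can be published as-is, while each dashboard panel is just another query over it.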

Transcript

  1. How open development data benefits your project and your community

    Jesus M. Gonzalez-Barahona [email protected] @jgbarah Bitergia / LibreSoft (URJC) http://bit.ly/open-sw-analytics Linux Tag 2014 Berlin (Germany), May 8th 2014 Jesus Gonzalez-Barahona (Bitergia) Open development data Linux Tag 2014 1 / 33
  2. © 2012-2014 Bitergia. Some rights reserved.

    This presentation is distributed under the “Attribution-ShareAlike 3.0” license, by Creative Commons, available at http://creativecommons.org/licenses/by-sa/3.0/
  3. Structure of the presentation

    (1) Measuring free / open source software development; (2) Why open development analytics?; (3) Areas of interest; (4) Tools.
  4. A successful development model?

    Free (open source) software has proven to be a great success ...but there are many details to be understood ...and (a lot of) interest in understanding them ...but there is room for improvement ...and (a lot of) interest in improving.
  5. A successful development model? (2)

    There are *lots* of development models. Common characteristics for many of them: community-based development; intensive use of tools and processes for coordination; open development models (as opposed to in-house, hidden models).
  6. The importance of the community

    [Crowd at FOSDEM 2008, by Jesús Corrius, CC Attribution 2.0] http://www.flickr.com/photos/jcorrius/2302302707/
  7. The importance of the community (2)

    Persons (and organizations) with different interests and common goals. Need for coordination and common decision making. Availability of data as a tool: transparency to the community (fairness); transparency to third parties (trust).
  8. Diversity of tools, processes (2)

    Despite the diversity, a large fraction of projects use tools and services from a small set (git / svn / hg; Bugzilla / Jira / GitHub tickets; Gerrit; Mailman / Gmane; ...) and similar processes (bug-fixing coordination using tickets; pre-merge code review; general discussion in mailing lists; ...). Collection and analysis of data is possible, and publication of data makes sense.
  9. From open development to open development analytics

    Information about the code, community and development of open development projects can be retrieved, organized and analyzed. Let’s publish analytics results & data. Open Development Analytics: a new standard for transparency.
  10. Open development analytics: who is interested?

    Large & small free software communities ...and thousands of large & small companies, public administrations and foundations participating in them or depending on their software. [Who can afford not to be interested? It is a key strategic need for many actors.]
  11. Open development analytics: why?

    Free software produced with open development models is more and more important for IT users, producers and integrators. It is different and complex, yet transparent: many details are public, and it can be improved.
  12. Some areas of interest

    Performance (understanding activity); company participation (beyond copyright notices); transparency (available information); auditing (certifying participation, experience, etc.); profiling (key people, companies); neutrality (fair treatment).
  13. Areas of interest: community management

    Issues and their parameters:
    Activity: raw volume, participants, ...
    Reliability: reaction times, pending issues, ...
    Sustainability: growth rate, structure, ...
  14. Areas of interest: community management

    [Puppet committers community: attraction / retention]
  15. Areas of interest: community management

    [Linux kernel: age of developers per cohort] http://blog.bitergia.com/2013/02/01/demographics-of-linux-kernel-developers-how-old-are-they/
  16. Areas of interest: company participation

    [Main companies in OpenStack Havana (partial view)] http://activity.openstack.org/dash/releases/
  17. Areas of interest: company participation

    [IBM participation in OpenStack Havana (partial view)] http://activity.openstack.org/dash/releases/company.html?company=IBM
  18. Areas of interest: transparency

    Development communities: companies and developers working together. Policies, procedures, tools, source code... and development data. Do they really provide enough data to enable assessment? Analysis of all repositories (data sources)... ...and associated information (e.g., affiliation).
  19. Areas of interest: auditing

    [OpenStack top contributors (December 2013)]
  20. Areas of interest: neutrality

    [WebKit code review data per company (2012): chart of iterations per accepted review (median) against number of accepted reviews]
  21. Tools: Grimoire system

    MetricsGrimoire: free software for retrieving data from repositories. vizGrimoire: free software for analyzing and visualizing data. Grimoire Dashboard: many panels, with different views of the project (charts, summaries, statistical analysis). Commercially supported by Bitergia.
  22. Summarizing

    Let’s go one step further: Open Development Analytics.
  23. Relationship with EU-funded R&D projects

    Markos: license analyzer; new tools for software development analysis; production of linked open data. PROSE: software development analytics to track results of R&D projects; Open Source Projects Europe forge with development analytics facilities. http://www.markosproject.eu/ http://www.ict-prose.eu/ https://opensourceprojects.eu/