
Do you want to measure your project?

by Jesus M. Gonzalez-Barahona

Published February 3, 2013 in Programming

Lightning talk at FOSDEM.

FLOSS (free, libre, open source software) projects are usually developed in the open. A lot of information about their inner life is available in their development repositories: source code management (aka version control) systems, issue tracking (aka bug reporting) systems, mailing lists, etc. This information can be organized and analyzed to understand how a project is performing, which processes its developers follow, and, more generally, how it is evolving.

The quantitative analytics that can be obtained from these repositories also allow direct tracking of several parameters that characterize specific aspects of software development. The impact of changes in project policies or practices can therefore be evaluated quantitatively, and observed in retrospect.

There are some websites that offer some of these analyses, and some tools that support software development are also starting to offer functionality along these lines. But a more complete, holistic, flexible and customizable option is available: a set of free software tools that can extract information and meta-information from the most widely used kinds of software development repositories, store it in a database, and produce data and visualizations from it. This talk will present it: the MetricsGrimoire toolset and its friends.
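To make the extract, store and analyze approach concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `commits` table, its columns and the sample rows are hypothetical simplifications, not the actual MetricsGrimoire database schema. The query counts commits per author e-mail domain, a rough proxy for which organizations contribute to a project:

```python
import sqlite3

# Hypothetical, simplified schema: one row per commit. This is NOT the
# actual MetricsGrimoire schema, just an illustration of the approach.
SCHEMA = """
CREATE TABLE commits (
    rev          TEXT PRIMARY KEY,
    author_email TEXT NOT NULL,
    committed_on TEXT NOT NULL  -- ISO 8601 date
)
"""

# Made-up sample rows, standing in for what an extraction tool would
# have retrieved from a git or Subversion repository.
SAMPLE = [
    ("a1", "alice@example.com", "2012-11-02"),
    ("b2", "bob@example.org", "2012-11-15"),
    ("c3", "alice@example.com", "2012-12-01"),
    ("d4", "carol@example.com", "2013-01-20"),
]


def load(conn, rows):
    """Create the hypothetical table and store the extracted rows."""
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO commits VALUES (?, ?, ?)", rows)


def commits_per_domain(conn):
    """Count commits per author e-mail domain, most active first."""
    query = """
        SELECT substr(author_email, instr(author_email, '@') + 1) AS domain,
               COUNT(*) AS n
        FROM commits
        GROUP BY domain
        ORDER BY n DESC, domain
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, SAMPLE)
    for domain, n in commits_per_domain(conn):
        print(domain, n)
```

Keeping the raw data in a database means new questions only require new queries, not a new pass over the repositories.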

The MetricsGrimoire project provides a set of tools that can be used to analyze many kinds of software development repositories, from git or Subversion to Bugzilla, Jira, and the SourceForge, GitHub and Launchpad issue trackers. Related tools support different kinds of analysis and visualization of the retrieved data. Since all the tools are free software, the only limit on the kinds of analysis and visualization is imagination.

The talk will provide a detailed technical view of the tools (written mainly in Python), how they can be used and extended, and how we are using them to produce detailed analyses of several free software projects.

The main tools that will be introduced are:

- CVSAnalY, which currently supports CVS, SVN and Git, with Bazaar and Mercurial on the roadmap.
- Bicho, currently supporting Bugzilla and the Google Code, GitHub, Jira, Launchpad, and Allura trackers.
- MailingListStats, currently supporting files in mbox format and web-accessible Mailman archives.
- VizGrimoire, a set of R scripts and JavaScript code to analyze and visualize the databases produced by the tools above.
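As a rough illustration of the kind of processing involved in mailing list analysis (this is not MailingListStats' actual code), Python's standard `mailbox` module can read an mbox archive directly. The function name `messages_per_sender` is hypothetical; it counts messages per sender address, lower-casing addresses so that trivially different spellings are merged:

```python
import mailbox
from collections import Counter
from email.utils import parseaddr


def messages_per_sender(mbox_path):
    """Count messages per sender address in an mbox archive (hypothetical helper)."""
    counts = Counter()
    # create=False: fail instead of silently creating an empty mailbox.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            # parseaddr splits "Alice <alice@example.com>" into name and address.
            _name, address = parseaddr(message.get("From", ""))
            if address:
                counts[address.lower()] += 1
    finally:
        box.close()
    return counts
```

A real tool would also have to deal with identity merging (one person posting from several addresses), which simple lower-casing does not solve.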

With all these tools working together, automatic and semi-automatic analysis of software development projects is possible, at least to a certain extent. Developers can tailor and adapt the tools to their specific needs, track the parameters they are interested in, and analyze whichever aspects of their pet projects they care about.

Some questions that could be answered by using the tools together are:

- How has the time-to-fix for bug reports evolved over the whole history of a project?
- Which companies are contributing to a project, and to what extent?
- How do technical decisions affect the attraction of new developers, the time-to-attend for bug reports, or the time to review changes to code?
- How can the evolution of a project be visualized dynamically?
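For the first question above, a minimal sketch could group bug reports by the year in which they were closed and compute the median time-to-fix for each year. The `BUGS` records are hypothetical (opened, closed) timestamp pairs, standing in for what an issue tracker database would provide; grouping by closing year rather than opening year is a choice made for this example:

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

# Hypothetical bug records: (opened, closed) timestamps as ISO 8601 strings.
BUGS = [
    ("2011-02-01T10:00:00", "2011-02-11T10:00:00"),  # 10 days
    ("2011-06-01T00:00:00", "2011-06-05T00:00:00"),  # 4 days
    ("2012-03-01T00:00:00", "2012-03-03T12:00:00"),  # 2.5 days
    ("2012-09-10T00:00:00", "2012-09-11T00:00:00"),  # 1 day
    ("2012-11-20T00:00:00", "2012-11-26T00:00:00"),  # 6 days
]


def median_time_to_fix_by_year(bugs):
    """Median time-to-fix in days, grouped by the year the bug was closed."""
    by_year = defaultdict(list)
    for opened, closed in bugs:
        start = datetime.fromisoformat(opened)
        end = datetime.fromisoformat(closed)
        days = (end - start).total_seconds() / 86400
        by_year[end.year].append(days)
    return {year: median(values) for year, values in sorted(by_year.items())}


if __name__ == "__main__":
    for year, days in median_time_to_fix_by_year(BUGS).items():
        print(f"{year}: {days:.1f} days")
```

With this toy data the result is `{2011: 7.0, 2012: 2.5}`; plotting such a series over a project's whole history shows how its time-to-fix has evolved.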

The talk will show that it is in fact easy to answer these questions, and will go into the details of how to answer them, with plenty of practical examples from the analysis of real projects. It will also offer some insight into how to work on extensions and complementary tools.