Do you want to measure your project?

Lightning talk at FOSDEM.

FLOSS (free, libre, open source software) projects are usually developed in the open. A lot of information about their inner life is available in their development repositories: source code management (aka version control), issue tracking (aka bug reporting) systems, mailing lists, etc. This information can be organized and analyzed, and used to understand how the project is performing, which processes its developers are using, and in general how it is evolving.

The quantitative analytics that can be obtained from these repositories also allow direct tracking of several parameters that characterize specific aspects of software development. The impact of changes in project policies or uses can therefore be evaluated quantitatively, and observed in retrospect.

Some websites support some of these analyses, and some tools for supporting software development are also starting to offer functionality along these lines. But a more complete, holistic, flexible and customizable option is available: a set of free software tools that can extract information and metainformation from the most widely used kinds of software development repositories, store it in a database, and produce data and visualizations from it. This talk will present it: the MetricsGrimoire toolset and its friends.

The MetricsGrimoire project provides a set of tools that can be used to analyze many kinds of software development repositories, from git or Subversion to Bugzilla, Jira or the SourceForge, GitHub and Launchpad issue trackers. Some related tools allow for different kinds of analysis and visualization of the retrieved data. Since all the tools are free software, the only limit on the kinds of analysis and visualization is the imagination.

The talk will provide a detailed technical view of the tools (written mainly in Python), how they can be used and extended, and how we're using them to produce detailed analyses of several free software projects.

The main tools that will be introduced are:

- CVSAnalY, which currently supports CVS, SVN and Git, while Bazaar and Mercurial are on the roadmap.
- Bicho, currently supporting Bugzilla and the Google Code, GitHub, Jira, Launchpad, and Allura trackers.
- MailingListStats, currently supporting files in mbox format and web-accessible Mailman archives.
- VizGrimoire, a set of R scripts and JavaScript code to analyze and visualize the databases produced by the tools above.

With all these tools working together, automatic and semi-automatic analysis of software development projects is possible, at least to a certain extent. Developers can tailor and adapt them to suit their specific needs, track the parameters they are interested in, and analyze the specific aspects of their pet projects that they may want.

Some questions that could be answered by this combined use of the tools are:

- How has the time-to-fix for bug reports evolved over the whole history of a project?
- Which companies are contributing to a project, and to what extent?
- How do technical decisions affect the attraction of new developers, time-to-attend for bug reports, or time to review changes to code?
- How can a dynamic visualization of the evolution of a project be produced?

The talk will show that it is in fact easy to get answers to these questions, and will go into the details of how to answer them, with plenty of practical examples from the analysis of real projects. Some insight into how to work on extensions and complementary tools will also be provided.
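
As a rough illustration of the time-to-fix question, the sketch below computes time-to-close quartiles from a small in-memory SQLite database. The `issues` table and its `opened_at` and `closed_at` columns are assumptions made for this example; they are not the schema that Bicho actually produces, and the sample rows are invented.

```python
import sqlite3
from datetime import datetime
from statistics import quantiles


def time_to_close_quartiles(conn):
    """Return the three quartile cut points of time-to-close, in minutes."""
    rows = conn.execute(
        "SELECT opened_at, closed_at FROM issues WHERE closed_at IS NOT NULL"
    ).fetchall()
    minutes = [
        (datetime.fromisoformat(closed) - datetime.fromisoformat(opened)).total_seconds() / 60
        for opened, closed in rows
    ]
    # n=4 splits the data into four groups, giving three cut points.
    return quantiles(minutes, n=4, method="inclusive")


# Hypothetical table layout and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE issues (id INTEGER PRIMARY KEY, opened_at TEXT, closed_at TEXT)"
)
conn.executemany(
    "INSERT INTO issues (opened_at, closed_at) VALUES (?, ?)",
    [
        ("2012-01-01T00:00:00", "2012-01-01T01:00:00"),  # closed after 60 minutes
        ("2012-01-02T00:00:00", "2012-01-02T02:00:00"),  # 120 minutes
        ("2012-01-03T00:00:00", "2012-01-03T03:00:00"),  # 180 minutes
        ("2012-01-04T00:00:00", "2012-01-04T04:00:00"),  # 240 minutes
        ("2012-01-05T00:00:00", "2012-01-05T05:00:00"),  # 300 minutes
        ("2012-01-06T00:00:00", None),                   # still open, ignored
    ],
)

q25, q50, q75 = time_to_close_quartiles(conn)
print(q25, q50, q75)
```

Running the same function over the issues opened in each year would give the kind of evolution over time that the question asks about.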

Jesus M. Gonzalez-Barahona

February 03, 2013

Transcript

  1. Do you want to measure your project?
    An introduction to MetricsGrimoire and vizGrimoire
    Jesus M. Gonzalez-Barahona
    [email protected]
    http://identi.ca/jgbarah http://twitter.com/jgbarah
    Bitergia
    GSyC/LibreSoft (Universidad Rey Juan Carlos)
    FOSDEM, Brussels, February 3rd, 2013
    Jesus Gonzalez-Barahona (Bitergia) Do you want to measure your project? FOSDEM 2013 1 / 24


  2. © 2012, 2013 Bitergia
    Some rights reserved. This presentation is distributed under the
    “Attribution-ShareAlike 3.0” license, by Creative Commons, available at
    http://creativecommons.org/licenses/by-sa/3.0/

  3. Measuring, measuring, measuring
    Information about code, community, development
    for free / open source software projects
    can usually be retrieved, organized, analyzed
    Let’s do it!

  4. Data has to be extracted, mined
    Data lives in repositories
    usually not designed to release it easily:
    tools are needed to retrieve and extract
    Data includes many complexities and details
    tools are needed to assist in mining, analysis

  5. The MetricsGrimoire approach
    Set of tools specialized in retrieving information from
    different kinds of repositories. Among them:
    CVSAnalY: source code management
    (CVS, Subversion, git, etc.)
    Bicho: issue tracking systems
    (Bugzilla, Jira, SourceForge, Allura, Launchpad,
    Google Code, etc.)
    MLStats: mailing lists
    (mbox files, Mailman archives, etc.)
    Store all the information in SQL databases
    Analyze free software with free software!
    http://metricsgrimoire.github.com

  6. MetricsGrimoire: CVSAnalY
    Browses an SCM repo producing a database with:
    All metainformation (commit records, etc.)
    Metrics for each release of each file
    Produces some tables suitable for specific analysis
    Multiple SCMs: CVS, svn, git (Bazaar partially)
    Whole history in the database, it’s possible to rebuild
    the files tree for any revision
    Support for tags & branches
    Extensions system, incremental capabilities
    Multiple database system support (MySQL and
    SQLite)

  7. MetricsGrimoire: Bicho
    Parsing issue tracking systems
    Results stored in a MySQL database
    Information about each issue (ticket), and its
    modifications
    Currently supported:
    SourceForge (HTML parsing)
    BugZilla: GNOME, KDE, others
    Jira, Google Code, Allura, Launchpad (API)
    Incremental

  8. MetricsGrimoire: MailingListStats
    Parses mbox information (RFC 822)
    Deals with Mailman archives
    Stores results (headers, body) in a MySQL database:
    Sender, CCs, etc.
    Time / Date
    Subject
    ...
    Incremental
    Multiple projects stored in a single database
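
The header extraction step described on this slide can be sketched with Python's standard `mailbox` module. This is not MailingListStats' own code: the `read_headers` helper, the record layout, and the sample messages are all assumptions made for illustration, and the database storage step is left out.

```python
import mailbox
import os
import tempfile
from email.utils import parseaddr, parsedate_to_datetime

# Two invented messages in mbox format (each starts with a "From " line).
SAMPLE_MBOX = (
    "From alice@example.org Mon Jan  7 10:00:00 2013\n"
    "From: Alice <alice@example.org>\n"
    "To: dev@lists.example.org\n"
    "Subject: Release plan\n"
    "Date: Mon, 07 Jan 2013 10:00:00 +0000\n"
    "Message-ID: <1@example.org>\n"
    "\n"
    "First message body.\n"
    "\n"
    "From bob@example.org Tue Jan  8 11:30:00 2013\n"
    "From: Bob <bob@example.org>\n"
    "To: dev@lists.example.org\n"
    "Subject: Re: Release plan\n"
    "Date: Tue, 08 Jan 2013 11:30:00 +0000\n"
    "Message-ID: <2@example.org>\n"
    "\n"
    "Second message body.\n"
)


def read_headers(path):
    """Return one dict per message with sender address, date and subject."""
    records = []
    box = mailbox.mbox(path)
    try:
        for message in box:
            _, sender = parseaddr(message["From"])
            records.append(
                {
                    "sender": sender,
                    "date": parsedate_to_datetime(message["Date"]),
                    "subject": message["Subject"],
                }
            )
    finally:
        box.close()
    return records


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.mbox")
    with open(path, "w", newline="\n") as handle:
        handle.write(SAMPLE_MBOX)
    records = read_headers(path)

for record in records:
    print(record["date"].isoformat(), record["sender"], record["subject"])
```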

  9. vizGrimoire: Milking the databases
    Once information is retrieved, and in suitable format for
    querying:
    it can be queried directly in the database
    it can be analyzed from R
    it can be filtered, manually inspected, improved
    it can be combined, cross-analyzed
    it can be visualized
    Set of tools to simplify & automate all of this
    https://vizgrimoire.github.com
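
Querying the database directly might look like the sketch below, which counts commits per month and per author email domain (a crude proxy for affiliation). The `commits` table and its columns are hypothetical and deliberately simplified; they are not the tables CVSAnalY creates, and SQLite in memory stands in for a real database.

```python
import sqlite3

# Hypothetical, simplified table; invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE commits (author_email TEXT, committed_at TEXT)")
conn.executemany(
    "INSERT INTO commits VALUES (?, ?)",
    [
        ("ann@acme.example", "2012-11-03T09:15:00"),
        ("ann@acme.example", "2012-11-20T17:40:00"),
        ("bo@university.example", "2012-12-01T08:00:00"),
        ("cy@acme.example", "2012-12-15T12:30:00"),
    ],
)

# Activity over time: number of commits in each calendar month.
commits_per_month = conn.execute(
    "SELECT strftime('%Y-%m', committed_at) AS month, COUNT(*) "
    "FROM commits GROUP BY month ORDER BY month"
).fetchall()

# Rough affiliation view: group commits by the domain of the author email.
commits_per_domain = conn.execute(
    "SELECT substr(author_email, instr(author_email, '@') + 1) AS domain, COUNT(*) "
    "FROM commits GROUP BY domain ORDER BY COUNT(*) DESC, domain"
).fetchall()

print(commits_per_month)
print(commits_per_domain)
```

Email domains are only a first approximation of company affiliation, since developers often commit from personal addresses.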

  10. vizGrimoireR: statistics, charts
    R package specialized in managing MetricsGrimoire
    information
    Connects directly to the database and:
    gets the information from it
    filters & massages it
    does statistical analysis on it
    produces charts and WebGL 3D graphs
    produces JSON files to export to other tools
    ...and lets you unleash all the potential of R
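
The idea of exporting aggregated data as JSON for other tools can be sketched as follows. vizGrimoireR itself is an R package; this example uses Python only for illustration, and the `monthly_series` helper and the JSON layout are assumptions, not the format that vizGrimoireJS expects.

```python
import json
from collections import Counter


def monthly_series(dates):
    """Aggregate ISO dates ("YYYY-MM-DD") into parallel month/count lists."""
    counts = Counter(date[:7] for date in dates)  # keep only "YYYY-MM"
    months = sorted(counts)
    return {"month": months, "commits": [counts[m] for m in months]}


# Invented commit dates, for illustration only.
series = monthly_series(["2012-11-03", "2012-11-20", "2012-12-01"])
payload = json.dumps(series, indent=2, sort_keys=True)
print(payload)
```

Keeping the export step separate from the analysis means any charting tool that reads JSON can consume the same files.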

  11. vizGrimoireJS: visualization
    JavaScript library producing visualizations
    Retrieves JSON files and produces:
    live charts: evolution, pies, bars, etc.
    tables and text
    comparative charts
    soon to support replacement in screen
    soon to support links to information in forge
    Integration with HTML5 applications

  12. (Simple) analytics of a project
    [Create databases cvsanalydb bichodb mlstatsdb]
    cvsanaly2 -u user -p XXX -d cvsanalydb \
    --extensions=Months git_repo
    bicho -d 1 --db-user-out=user --db-password-out=XXX \
    --db-database-out=bichodb github github_url
    mlstats --db-user user --db-password XXX \
    --db-name mlstatsdb http://mailman/url
    git clone [email protected]:VizGrimoire/VizGrimoireJS.git
    cd VizGrimoireJS/browser/data/json
    [Fill in project-info-milestone0.json]
    R --vanilla --args cvsanalydb user XXX path/scm-milestone0.R
    R --vanilla --args cvsanalydb user XXX path/its-milestone0.R
    R --vanilla --args cvsanalydb user XXX path/mls-milestone0.R
    Now, export vizGrimoireJS via HTTP

  13. *Grimoire: git analysis

  14. Zentyal: Mailing lists (Developers, Users)

  15. OpenStack: Opening / closing tickets
    Folsom release cycle, 2012
    http://blog.bitergia.com/2012/09/27/
    how-the-new-release-of-openstack-was-built/

  16. KDevelop: time-to-close tickets (quantiles)
    [Chart: time-to-close quantiles over time, 2000 to 2012: 0.99 (black) / 0.95 (green) / 0.5 (red) / 0.25 (blue); time in minutes, log 10 scale]
    http://blog.bitergia.com/2012/08/07/updated-data-about-kdevelop/

  17. Liferay: time-to-close tickets (quantiles)
    http://blog.bitergia.com/2012/10/25/
    preview-of-the-analysis-of-liferay/

  18. Linux kernel: demographic pyramids
    http://blog.bitergia.com/2013/02/01/
    demographics-of-linux-kernel-developers-how-old-are-they/

  19. Example: towards a dashboard
    http://blog.bitergia.com/2012/09/27/
    how-the-new-release-of-openstack-was-built/

  20. Example: integration with the Alert project
    The project:
    mining the repositories of a project...
    ...to provide useful information to developers
    E.g.: which bug reports could be of interest to me
    E.g.: tickets similar to a given ticket (likely dups)
    *Grimoire:
    MetricsGrimoire used to retrieve information from
    repos
    vizGrimoire used to provide a user interface based on
    charts
    http://alert-project.eu

  21. In summary...
    Development repositories have a wealth of information
    We all can do our own analysis
    Free software to analyze free software development
    Let’s define common formats to interface to different
    tools
    We can incrementally develop a powerful platform
    What would you like to know about your pet project?

  22. Bitergia: a spin-off
    Started operations in July 2012
    Builds on the experience of LibreSoft R&D group
    Offering professional products and services
    Focused on:
    Metrics about software development
    (including community metrics)
    Specialized support for development forges
    (including metrics for projects)
    http://bitergia.com

  23. Credits
    Thanks go to...
    Many LibreSoft developers who developed MetricsGrimoire
    The (small) community maintaining MetricsGrimoire
    Some Bitergia developers producing vizGrimoire
    The (future) community maintaining vizGrimoire
    The many free software developers that produced all the software on
    which these tools rely
    The many free software developers that produced all the software that
    gives us projects to analyze
    http://libresoft.es
    http://bitergia.com

  24. This is the end, my friend
    Have you learned something
    useful?
    [I would love to know what interested you the most]
    [...and the least]
    Final note:
    You can use *Grimoire, contribute to *Grimoire