Measuring free software development (presentation at IRILL)

Most free / open source software projects have publicly available repositories with many details about their activity. All changes are recorded in the source code management system (Subversion, git, Mercurial, Bazaar, etc.), along with who made the change and when, and other metainformation. Issue tracking (ticketing) systems (Bugzilla, Jira, etc.) record not only bug reports and how they are fixed, but also feature requests, design discussions, etc. Mailing lists, IRC logs or forums carry information about discussions, user support, etc. All of this information, when properly analyzed, can be used to track how a project is performing, and to detect interesting patterns and potential or real problems early.

The talk presents how all of this can be done with the tools in the MetricsGrimoire toolset, including some examples for specific projects. The talk was delivered at IRILL, http://irill.org

More info: http://blog.bitergia.com/2012/10/11/measuring-free-software-development-presentation-at-irill

Jesus M. Gonzalez-Barahona

October 11, 2012


Transcript

  1. Measuring free software development

    Jesus M. Gonzalez-Barahona
    http://identi.ca/jgbarah http://twitter.com/jgbarah
    Bitergia
    GSyC/LibreSoft, Universidad Rey Juan Carlos
    IRILL (Paris, France), October 11th 2012
    Jesus Gonzalez-Barahona (Bitergia) Measuring free software development IRILL 2012 1 / 28
  2. © 2012 Bitergia. Some rights reserved.

    This presentation is distributed under the “Attribution-ShareAlike 3.0” license, by Creative Commons, available at http://creativecommons.org/licenses/by-sa/3.0/
    Blog post about this presentation (including link to slides): http://wp.me/p2cQGW-4A
    http://blog.bitergia.com/2012/10/11/measuring-free-software-development-presentation-at-irill/
  3. Free software is (in many cases) special

    Source code available
    Open development model (usually)
    Many details about the internals of the development process
    Intense use of tools for coordination
    Lots of information is tracked, and available
    Developer and user communities are important: sustainability, pooling of resources, innovation
  4. Measuring, measuring, measuring

    Information about code, community, development can be retrieved, organized, analyzed
  5. Who benefits

    Quantitative, objective data: facts, not opinions
    Specific questions can be answered
    Even simple analysis may help stakeholders:
      Developers: Understanding, improving development processes
      Users, integrators: Long-term sustainability, evolution, reaction to issues
      Investors: Attraction of external resources, growth rate
  6. But data has to be extracted, mined

    Data lives in repositories not always designed to release all their data easily: tools are needed to retrieve and extract it
    Data includes many complexities and details: tools are needed to assist in its mining, analysis
  7. The Metrics Grimoire approach

    Set of tools specialized in retrieving information from different kinds of repositories. Among them:
      CVSAnalY: source code management (CVS, Subversion, git, etc.)
      Bicho: issue tracking systems (Bugzilla, Jira, SourceForge, Allura, Launchpad, Google Code, etc.)
      MLStats: mailing lists (mbox files, Mailman archives, etc.)
    Store all the information in SQL databases with similar structure
    http://metricsgrimoire.github.com
    https://github.com/MetricsGrimoire
  8. MetricsGrimoire: CVSAnalY

    Browses an SCM repository producing a database with:
      All metainformation (commit records, etc.)
      Metrics for each release of each file
    Also produces some tables suitable for specific analysis
    Multiple SCMs: CVS, svn, git (Bazaar partially)
    Whole history in the database: the file tree can be rebuilt for any revision
    Tags and branches support
    Option to save the log to a file while parsing
    Extensions system, incremental capabilities
    Multiple database system support (MySQL and SQLite)
  9. MetricsGrimoire: CVSAnalY extensions

    Extension: a “plugin” for CVSAnalY
    Adds information to the database, based on the information in the database and maybe the repository
    Usually: new tables for specific studies
    Simple example: commits per month per committer
    Extensions add one or more tables to the database but they never modify the existing ones
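The "commits per month per committer" example can be sketched as a derived table built from existing data. This is a minimal sketch, not CVSAnalY code: the `commits` table, its columns, and the `commits_month_committer` table name are hypothetical and simplified, and the real CVSAnalY schema may differ. SQLite is used only because the slides list it as a supported backend.

```python
import sqlite3


def commits_per_month(conn):
    """Return (month, committer, commits) rows, ordered by month then committer."""
    # Extension idea: derive a NEW table and leave the original tables untouched.
    # Table and column names here are hypothetical, not the CVSAnalY schema.
    conn.execute("DROP TABLE IF EXISTS commits_month_committer")
    conn.execute(
        """
        CREATE TABLE commits_month_committer AS
        SELECT strftime('%Y-%m', date) AS month,
               committer,
               COUNT(*) AS commits
        FROM commits
        GROUP BY strftime('%Y-%m', date), committer
        """
    )
    return conn.execute(
        "SELECT month, committer, commits FROM commits_month_committer "
        "ORDER BY month, committer"
    ).fetchall()


def demo():
    """Build a tiny in-memory database with made-up commits and summarize it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE commits (rev TEXT, committer TEXT, date TEXT)")
    conn.executemany(
        "INSERT INTO commits VALUES (?, ?, ?)",
        [
            ("r1", "alice", "2012-09-03 10:00:00"),
            ("r2", "alice", "2012-09-17 12:30:00"),
            ("r3", "bob", "2012-09-20 09:15:00"),
            ("r4", "bob", "2012-10-01 18:45:00"),
        ],
    )
    return commits_per_month(conn)
```

Calling `demo()` returns one row per month and committer; the source `commits` table is only read, never altered.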
  10. MetricsGrimoire: CVSAnalY extensions

    Some examples:
      FileTypes: adds a table containing information about the type of every file in the database (code, documentation, i18n, etc.)
      Metrics: analyzes every revision of every file, calculating metrics like SLOC and complexity metrics (McCabe, Halstead). It currently supports metrics for C/C++, Python, Java and Ada.
      CommitsLOC: adds a new table with information about the total lines added/removed for every commit
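To make the CommitsLOC idea concrete, the sketch below totals lines added and removed per commit. It is an illustration only, not how the extension is implemented: it assumes text shaped like the output of `git log --numstat --format=commit:%H`, and the `commit:` prefix and the sample data are choices made for this example.

```python
def parse_numstat(log_text):
    """Map commit hash -> (lines_added, lines_removed).

    Expects a 'commit:<hash>' line per commit, followed by
    'added<TAB>removed<TAB>path' lines for the files it touched.
    """
    totals = {}
    current = None
    for line in log_text.splitlines():
        if line.startswith("commit:"):
            current = line[len("commit:"):].strip()
            totals[current] = (0, 0)
        elif current is not None and line.count("\t") >= 2:
            added, removed, _path = line.split("\t", 2)
            # Binary files are reported as "-" and carry no line counts.
            if added.isdigit() and removed.isdigit():
                a, r = totals[current]
                totals[current] = (a + int(added), r + int(removed))
    return totals


# Made-up sample in the assumed format.
SAMPLE = (
    "commit:abc123\n"
    "\n"
    "10\t2\tsrc/main.c\n"
    "3\t0\tREADME\n"
    "commit:def456\n"
    "\n"
    "-\t-\tlogo.png\n"
    "1\t4\tsrc/util.c\n"
)
```

`parse_numstat(SAMPLE)` yields one `(added, removed)` pair per commit, which is the shape of data a per-commit table would hold.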
  11. MetricsGrimoire: Bicho

    Parsing issue tracking systems
    Results stored in a MySQL database
    Information about each issue (ticket), and its modifications
    Currently it supports:
      SourceForge (HTML parsing)
      Bugzilla: GNOME, KDE, others
      Jira, Google Code, Allura, Launchpad (API)
    Incremental
  12. MetricsGrimoire: MailingListStats

    Parses mbox information (RFC 822)
    Deals with Mailman archives
    Stores results (headers, body) in a MySQL database:
      Sender, CCs, etc.
      Time / Date
      Subject
      ...
    Incremental
    Can store multiple projects in a single database
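Reading the headers listed above out of an mbox file can be sketched with Python's standard `mailbox` module. This is not MailingListStats code: the function name, the returned dictionary keys, and the sample messages are assumptions for illustration, and storing the rows in a database is left out.

```python
import mailbox
import os
import tempfile
from email.utils import parseaddr, parsedate_to_datetime


def read_headers(mbox_path):
    """Return one dict per message: sender address, ISO-formatted date, subject."""
    rows = []
    box = mailbox.mbox(mbox_path)
    try:
        for msg in box:
            date_header = msg.get("Date")
            rows.append(
                {
                    "sender": parseaddr(msg.get("From", ""))[1],
                    "date": (
                        parsedate_to_datetime(date_header).isoformat()
                        if date_header
                        else None
                    ),
                    "subject": msg.get("Subject", ""),
                }
            )
    finally:
        box.close()
    return rows


# Two made-up messages in mbox format ("From " lines separate messages).
SAMPLE = (
    "From alice@example.org Thu Oct 11 10:00:00 2012\n"
    "From: Alice <alice@example.org>\n"
    "Date: Thu, 11 Oct 2012 10:00:00 +0200\n"
    "Subject: Release plan\n"
    "\n"
    "First message body.\n"
    "\n"
    "From bob@example.org Thu Oct 11 11:30:00 2012\n"
    "From: Bob <bob@example.org>\n"
    "Date: Thu, 11 Oct 2012 11:30:00 +0200\n"
    "Subject: Re: Release plan\n"
    "\n"
    "Second message body.\n"
)


def demo():
    """Write the sample to a temporary mbox file and parse it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "list.mbox")
        with open(path, "w", newline="\n") as handle:
            handle.write(SAMPLE)
        return read_headers(path)
```

Each returned dict corresponds to one row that a tool of this kind could insert into its messages table.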
  13. Milking the databases

    Once information is retrieved, and in a suitable format for querying:
      it can be queried directly in the database
      it can be analyzed from R
      it can be filtered, manually inspected, improved
      it can be combined, cross-analyzed
      it can be visualized
    We’re building tools to simplify all of this: vizGrimoire
    https://github.com/VizGrimoire
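As an illustration of "combined, cross-analyzed", the sketch below joins source code and mailing list data on a shared identity. Everything in it is assumed for the example: the `commits` and `messages` tables, matching people by email address, and the sample rows. Real databases produced by the tools would need their own schema and identity matching.

```python
import sqlite3


def committers_who_post(conn):
    """Return (email, commits, posts) for identities present in both sources."""
    # Hypothetical tables: one row per commit and one row per mailing list post.
    return conn.execute(
        """
        SELECT c.email, c.commits, m.posts
        FROM (SELECT email, COUNT(*) AS commits FROM commits GROUP BY email) AS c
        JOIN (SELECT email, COUNT(*) AS posts FROM messages GROUP BY email) AS m
          ON m.email = c.email
        ORDER BY c.email
        """
    ).fetchall()


def demo():
    """Load made-up commits and posts into memory and cross them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE commits (rev TEXT, email TEXT)")
    conn.execute("CREATE TABLE messages (msg_id TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO commits VALUES (?, ?)",
        [
            ("r1", "alice@example.org"),
            ("r2", "alice@example.org"),
            ("r3", "bob@example.org"),
        ],
    )
    conn.executemany(
        "INSERT INTO messages VALUES (?, ?)",
        [
            ("m1", "alice@example.org"),
            ("m2", "alice@example.org"),
            ("m3", "alice@example.org"),
            ("m4", "carol@example.org"),
        ],
    )
    return committers_who_post(conn)
```

Only people who appear in both tables are returned, so `demo()` reports alice but neither bob (no posts) nor carol (no commits).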
  14. Now, some examples

    Some examples from real projects
  15. Zentyal (basic analysis)

    Source code management repositories:
      git: git://git.zentyal.org/zentyal
      From: 2005-06-27 To: 2012-09-10
    Mailing lists:
      Development, Users, Announcements
      http://lists.zentyal.com/cgi-bin/mailman/listinfo/
      From: 2010-09-01 To: 2012-09-30
    http://blog.bitergia.com/2012/10/03/basic-analysis-of-zentyal/
  16. glibc: commits & committers per month

    [Figure: two panels, commits per month and committers per month, plotted against time (axis ticks from 1995 to 2010)]
  17. OpenStack: Opening / closing tickets

    Folsom release cycle, 2012
    http://blog.bitergia.com/2012/09/27/how-the-new-release-of-openstack-was-built/
  18. OpenStack: Who is developing it?

    Core projects / all projects (Folsom release cycle, 2012)
  19. KDevelop: how is it closing tickets?

    [Figure: time in minutes (log 10 scale) against time, 2000 to 2012. Series: 0.99 (black) / 0.95 (green) / 0.5 (red) / 0.25 (blue)]
    http://blog.bitergia.com/2012/08/07/updated-data-about-kdevelop/
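One possible reading of the KDevelop plot is that the 0.99 / 0.95 / 0.5 / 0.25 series are quantiles of the time taken to close tickets; the slide does not say so explicitly, so treat that as an assumption. Under that reading, a summary for one time window could be sketched as follows, using a nearest-rank quantile (one of several common definitions) and made-up closing times.

```python
import math


def quantile(sorted_values, q):
    """Nearest-rank quantile of an already sorted, non-empty list (0 < q <= 1)."""
    rank = max(1, math.ceil(q * len(sorted_values)))
    return sorted_values[rank - 1]


def closing_time_summary(minutes_to_close, qs=(0.25, 0.5, 0.95, 0.99)):
    """Map each quantile to log10 of the closing time in minutes."""
    ordered = sorted(minutes_to_close)
    return {q: math.log10(quantile(ordered, q)) for q in qs}


# Made-up closing times, in minutes, for tickets closed in one window.
SAMPLE_MINUTES = [10000, 10, 1000, 100]
```

Computing `closing_time_summary` once per month or year would give one point per series, which is how a plot of this kind could be assembled.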
  20. Preview: the history of OpenOffice.org / LibreOffice

    [Very preliminary, as found in the LibreOffice repository, 2000-2012]
    [Figure: plot with axes Commits, Committer and Month]
    [Contributions of more than 1,000 commits trimmed]
  21. All of this can be integrated...

    Dashboards
    Forges
    IDEs
    Support systems
    ...
    A new generation of tracking systems for software development? Integrated with software forges?
  22. Example: Alert project

    Get all events relevant to software development: commits, changes to tickets, posts, changes to wikis, etc.
    Store, organize, structure, enrich the information they provide: semantic database, extraction of relevant information and relationships, annotation with complementary information
    Notify, inform, extend available information: notifications, specialized searches, recommendations, detection of patterns
    http://alert-project.eu
  23. In summary

    FLOSS development repositories have a wealth of information
    Their analysis is potentially interesting to any stakeholder
    Getting the data out of the repository is not that difficult...
    ...but analysis may be
    We’re interested in deep analysis
    We’re interested in working with developers, managers, users
    What would you like to know about your pet project?
  24. Bitergia: a spin-off

    Started operations in July 2012
    Builds on the experience of the LibreSoft R&D group
    Offering professional products and services
    Focused on:
      Metrics about software development (including community metrics)
      Specialized support for development forges (including metrics for projects)
    http://bitergia.com
  25. This is the end

    Have you learned something useful?
    [I would love to know what interested you the most]
    [...and the least]