Measuring free software development (presentation at IRILL)

Most free / open source software projects have publicly available repositories with many details about their activity. All changes are recorded in the source code management system (Subversion, git, Mercurial, Bazaar, etc.), along with who made each change and when, and other metainformation. Issue tracking (ticketing) systems (Bugzilla, Jira, etc.) record not only bug reports and how they are fixed, but also feature requests, design discussions, etc. Mailing lists, IRC logs and forums carry information about discussions, user support, etc. All of this information, when properly analyzed, can be used to track how a project is performing, and to detect interesting patterns and potential or real problems early.

The talk presents how all of this can be done with the tools in the MetricsGrimoire toolset, including some examples for specific projects. The talk was delivered at IRILL, http://irill.org

More info: http://blog.bitergia.com/2012/10/11/measuring-free-software-development-presentation-at-irill

Jesus M. Gonzalez-Barahona

October 11, 2012

Transcript

  1. Measuring free software development
    Jesus M. Gonzalez-Barahona
    [email protected]
    http://identi.ca/jgbarah http://twitter.com/jgbarah
    Bitergia
    GSyC/LibreSoft, Universidad Rey Juan Carlos
    IRILL (Paris, France), October 11th 2012
    Jesus Gonzalez-Barahona (Bitergia) Measuring free software development IRILL 2012 1 / 28


  2. © 2012 Bitergia
    Some rights reserved. This presentation is distributed under the
    “Attribution-ShareAlike 3.0” license, by Creative Commons, available at
    http://creativecommons.org/licenses/by-sa/3.0/
    Blog post about this presentation (including link to slides)
    http://wp.me/p2cQGW-4A
    http://blog.bitergia.com/2012/10/11/measuring-free-software-development-presentation-at-irill/

  3. Free software is (in many cases) special
    Source code available
    Open development model (usually)
    Many details about the internals of the development
    process
    Intense use of tools for coordination
    Lots of information is tracked, and available
    Developers & users communities are important
    sustainability
    pooling of resources
    innovation

  4. Measuring, measuring, measuring
    Information about code,
    community, development
    can be retrieved, organized,
    analyzed

  5. Who benefits
    Quantitative, objective data: facts, not opinions
    Specific questions can be answered
    Even simple analysis may help stakeholders:
    Developers:
    Understanding, improving development processes
    Users, integrators:
    Long-term sustainability, evolution, reaction to issues
    Investors:
    Attraction of external resources, growth rate

  6. But data has to be extracted, mined
    Data lives in repositories not always designed to release all
    their data easily:
    tools are needed to retrieve and extract it
    Data includes many complexities and details
    tools are needed to assist in its mining, analysis

  7. The Metrics Grimoire approach
    Set of tools specialized in retrieving information from
    different kinds of repositories. Among them:
    CVSAnalY: source code management (CVS,
    Subversion, git, etc.)
    Bicho: issue tracking systems (Bugzilla, Jira,
    SourceForge, Allura, Launchpad, Google Code, etc.)
    MLStats: mailing lists (mbox files, Mailman archives,
    etc.)
    Store all the information in SQL databases with similar
    structure
    http://metricsgrimoire.github.com
    https://github.com/MetricsGrimoire

  8. MetricsGrimoire: CVSAnalY
    Browses an SCM repository producing a database
    with:
    All metainformation (commit records, etc.)
    Metrics for each release of each file
    Also produces some tables suitable for specific analysis
    Multiple SCMs: CVS, svn, git (Bazaar partially)
    Whole history in the database, it’s possible to rebuild
    the files tree for any revision
    Tags and branches support
    Option to save the log to a file while parsing
    Extensions system, incremental capabilities
    Multiple database system support (MySQL and
    SQLite)

  9. MetricsGrimoire: CVSAnalY extensions
    Extension: a “plugin” for CVSAnalY
    Add information to the database, based on the
    information in the database and maybe the repository
    Usually: new tables for specific studies
    Simple example: commits per month per committer
    Extensions add one or more tables to the database but
    they never modify the existing ones
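A minimal sketch of the "commits per month per committer" idea, not CVSAnalY's actual extension code. It assumes a hypothetical `commits` table (columns `committer`, `commit_date`) in an in-memory SQLite database; the real CVSAnalY schema differs. As in the extension model described above, the result goes into a new table and the existing one is left untouched.

```python
import sqlite3


def commits_per_month_per_committer(conn):
    """Create a derived table; the hypothetical `commits` table is only read."""
    conn.execute(
        """
        CREATE TABLE commits_per_month AS
        SELECT committer,
               strftime('%Y-%m', commit_date) AS month,
               COUNT(*) AS commits
        FROM commits
        GROUP BY committer, month
        """
    )
    return conn.execute(
        "SELECT committer, month, commits FROM commits_per_month "
        "ORDER BY committer, month"
    ).fetchall()


# Toy data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE commits ("
    "id INTEGER PRIMARY KEY, committer TEXT, commit_date TEXT)"
)
conn.executemany(
    "INSERT INTO commits (committer, commit_date) VALUES (?, ?)",
    [
        ("alice", "2012-08-03 10:15:00"),
        ("alice", "2012-08-21 17:40:00"),
        ("alice", "2012-09-02 09:00:00"),
        ("bob", "2012-09-14 12:30:00"),
    ],
)

rows = commits_per_month_per_committer(conn)
for committer, month, commits in rows:
    print(committer, month, commits)
```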

  10. MetricsGrimoire: CVSAnalY extensions
    Some examples:
    FileTypes: adds a table containing information about
    the type of every file in the database (code,
    documentation, i18n, etc.)
    Metrics: analyzes every revision of every file
    calculating metrics like SLOC and complexity metrics
    (McCabe, Halstead). It currently supports metrics for
    C/C++, Python, Java and Ada.
    CommitsLOC: adds a new table with information
    about the total lines added/removed for every commit
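To illustrate the kind of data CommitsLOC records, here is a rough sketch that totals lines added and removed per commit. It is not the extension's implementation: it assumes text shaped like the output of `git log --numstat --format=%H` (a commit id line followed by tab-separated `added`, `removed`, `path` lines), and the sample input and the function name `lines_per_commit` are made up for this example.

```python
def lines_per_commit(numstat_text):
    """Return {commit_id: (lines_added, lines_removed)} from numstat-style text."""
    totals = {}
    current = None
    for line in numstat_text.splitlines():
        if not line.strip():
            continue  # blank separator lines
        parts = line.split("\t")
        if len(parts) == 3:
            if current is None:
                continue  # numstat line before any commit id: ignore
            added, removed, _path = parts
            if added == "-" or removed == "-":
                continue  # git reports "-" for binary files
            prev_added, prev_removed = totals[current]
            totals[current] = (prev_added + int(added), prev_removed + int(removed))
        else:
            # Any other non-blank line is taken to be a commit id.
            current = line.strip()
            totals[current] = (0, 0)
    return totals


# Invented sample; the commit ids are placeholders, not real hashes.
SAMPLE = (
    "c1\n"
    "\n"
    "10\t2\tsrc/main.c\n"
    "3\t0\tREADME\n"
    "c2\n"
    "\n"
    "-\t-\tlogo.png\n"
    "0\t7\tsrc/old.c\n"
)

totals = lines_per_commit(SAMPLE)
for commit_id, (added, removed) in totals.items():
    print(commit_id, added, removed)
```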

  11. MetricsGrimoire: Bicho
    Parsing issue tracking systems
    Results stored in a MySQL database
    Information about each issue (ticket), and its
    modifications
    Currently it supports:
    SourceForge (HTML parsing)
    Bugzilla: GNOME, KDE, others
    Jira, Google Code, Allura, Launchpad (API)
    Incremental

  12. MetricsGrimoire: MailingListStats
    Parses mbox information (RFC 822)
    Deals with Mailman archives
    Stores results (headers, body) in a MySQL database:
    Sender, CCs, etc.
    Time / Date
    Subject
    ...
    Incremental
    Can store multiple projects in a single database
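The steps on this slide can be approximated with Python's standard library. The sketch below is not MailingListStats itself: it uses SQLite instead of MySQL, and the `messages` table, the `load_mbox` function and the sample messages are assumptions made for illustration. Keying rows on project plus Message-ID is one simple way to get incremental loads and several projects in one database.

```python
import mailbox
import os
import sqlite3
import tempfile
from email.utils import parseaddr, parsedate_to_datetime

# Two invented messages in mbox format.
SAMPLE_MBOX = (
    "From alice@example.org Thu Oct 11 09:00:00 2012\n"
    "From: Alice <alice@example.org>\n"
    "To: dev@example.org\n"
    "Subject: Release plan\n"
    "Date: Thu, 11 Oct 2012 09:00:00 +0200\n"
    "Message-ID: <1@example.org>\n"
    "\n"
    "Proposal for the next release.\n"
    "\n"
    "From bob@example.org Thu Oct 11 10:30:00 2012\n"
    "From: Bob <bob@example.org>\n"
    "To: dev@example.org\n"
    "Subject: Re: Release plan\n"
    "Date: Thu, 11 Oct 2012 10:30:00 +0200\n"
    "Message-ID: <2@example.org>\n"
    "\n"
    "Sounds good.\n"
)


def load_mbox(path, conn, project):
    """Store sender, date and subject of each message; re-runs add nothing new."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "project TEXT, message_id TEXT, sender TEXT, sent_at TEXT, subject TEXT, "
        "PRIMARY KEY (project, message_id))"
    )
    box = mailbox.mbox(path)
    try:
        for msg in box:
            date_header = msg.get("Date")
            sent_at = (
                parsedate_to_datetime(date_header).isoformat() if date_header else None
            )
            conn.execute(
                "INSERT OR IGNORE INTO messages VALUES (?, ?, ?, ?, ?)",
                (
                    project,
                    msg.get("Message-ID"),
                    parseaddr(msg.get("From", ""))[1],
                    sent_at,
                    msg.get("Subject"),
                ),
            )
    finally:
        box.close()
    conn.commit()


conn = sqlite3.connect(":memory:")
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dev.mbox")
    with open(path, "w", newline="\n") as handle:
        handle.write(SAMPLE_MBOX)
    load_mbox(path, conn, "demo")
    load_mbox(path, conn, "demo")  # second pass: already-seen messages are ignored

rows = conn.execute(
    "SELECT sender, subject FROM messages ORDER BY sent_at"
).fetchall()
print(rows)
```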

  13. Milking the databases
    Once information is retrieved, and in a suitable format for
    querying:
    it can be queried directly in the database
    it can be analyzed from R
    it can be filtered, manually inspected, improved
    it can be combined, cross-analyzed
    it can be visualized
    We’re building tools to simplify all of this: vizGrimoire
    https://github.com/VizGrimoire

  14. Now, some examples
    Some examples from real projects

  15. Zentyal (basic analysis)
    Source code management repositories:
    git: git://git.zentyal.org/zentyal
    From: 2005-06-27
    To: 2012-09-10
    Mailing lists:
    Development
    Users
    Announcements
    http://lists.zentyal.com/cgi-bin/mailman/listinfo/
    From: 2010-09-01
    To: 2012-09-30
    http://blog.bitergia.com/2012/10/03/basic-analysis-of-zentyal/

  16. Zentyal: Git repository (parameters per month)

  17. Zentyal: Mailing lists (Developers, Users)

  18. glibc: commits & committers per month
    [Figure: two time-series plots, commits per month and committers per month, 1995 to 2010]

  19. OpenStack: Opening / closing tickets
    Folsom release cycle, 2012
    http://blog.bitergia.com/2012/09/27/how-the-new-release-of-openstack-was-built/

  20. OpenStack: Who is developing it?
    Core projects / all projects (Folsom release cycle, 2012)

  21. KDevelop: how is it closing tickets?
    [Figure: time to close tickets over time (2000 to 2012), in minutes on a log 10 scale; quantiles 0.99 (black), 0.95 (green), 0.5 (red), 0.25 (blue)]
    http://blog.bitergia.com/2012/08/07/updated-data-about-kdevelop/
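The quantities plotted for KDevelop are quantiles of the time to close tickets. The sketch below shows one way such quantiles could be computed per year; it is not the analysis behind the slide (which, per the talk, can be done from R). The ticket data is invented, and the linear interpolation between order statistics is a choice made for this example.

```python
import math
from collections import defaultdict


def quantile(sorted_values, p):
    """Quantile by linear interpolation between neighbouring order statistics."""
    pos = p * (len(sorted_values) - 1)
    lower = math.floor(pos)
    upper = math.ceil(pos)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def quantiles_by_year(tickets, probs=(0.25, 0.5, 0.95, 0.99)):
    """tickets: iterable of (year_opened, minutes_to_close) pairs."""
    by_year = defaultdict(list)
    for year, minutes in tickets:
        by_year[year].append(minutes)
    return {
        year: {p: quantile(sorted(values), p) for p in probs}
        for year, values in by_year.items()
    }


# Invented data: (year the ticket was opened, minutes until it was closed).
tickets = [(2010, 10), (2010, 100), (2010, 1000), (2011, 60), (2011, 600)]

result = quantiles_by_year(tickets)
for year in sorted(result):
    for p, minutes in result[year].items():
        # log10 of the minutes, matching the scale used in the figure
        print(year, p, round(math.log10(minutes), 2))
```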

  22. Preview: the history of OpenOffice.org / LibreOffice
    [Very preliminary, as found in the LibreOffice repository, 2000-2012]
    [Figure: plot with axes Commits, Committer and Month]
    [Contributions of more than 1,000 commits trimmed]

  23. All of this can be integrated...
    Dashboards
    Forges
    IDEs
    Support systems
    ...
    A new generation of tracking
    systems for software development?
    Integrated with software forges?

  24. Example: towards a dashboard
    http://blog.bitergia.com/2012/09/27/how-the-new-release-of-openstack-was-built/

  25. Example: Alert project
    Get all events relevant to software development
    commits, changes to tickets, posts, changes to wikis, etc.
    store, organize, structure, enrich the information they
    provide
    semantic database, extraction of relevant information and
    relationships, annotation with complementary information
    notify, inform, extend available information
    notifications, specialized searches, recommendations,
    detection of patterns
    http://alert-project.eu

  26. In summary
    FLOSS development repositories have a wealth of
    information
    Their analysis is potentially interesting to any
    stakeholder
    Getting the data out of the repository is not that
    difficult...
    ...but analysis may be
    We’re interested in deep analysis
    We’re interested in working with developers,
    managers, users
    What would you like to know about your pet project?

  27. Bitergia: a spin-off
    Started operations in July 2012
    Builds on the experience of LibreSoft R&D group
    Offering professional products and services
    Focused on:
    Metrics about software development
    (including community metrics)
    Specialized support for development forges
    (including metrics for projects)
    http://bitergia.com

  28. This is the end
    Have you learned something
    useful?
    [I would love to know what interested you the most]
    [...and the least]