Software development analytics for the masses

Talk at Open World Forum 2014 (Paris, France, Oct 31st 2014).

Software development analytics can help in the complex task of gaining knowledge on the performance and inner life of free / open source software development projects and their communities. The talk presents some examples of how real projects can be analyzed, and which lessons can be learned from this analysis.

Understanding the inner life of projects is of fundamental importance to developers, users and decision makers. But gaining this knowledge is a specialized, time-consuming and error-prone task.

Software development analytics helps by highlighting interesting aspects of the analyzed projects, tracking relevant patterns, and assisting in the early identification of problems and detection of trends. It can be used to study the structure of a community and its likely evolution, to detect bottlenecks in a code review process, to evaluate the impact of policies trying to improve bug fixing, to understand company participation in large projects, or to assist in due diligence when OSS is an important asset.

Last but not least, the talk shows how free / open source software can be used for this whole analytics process.
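As an illustration of the code review use case mentioned above, here is a minimal sketch of how time to review could be summarized per quarter. It assumes each review is reduced to a pair of (opened, closed) timestamps; the function names and the sample data are hypothetical, and this is not necessarily how the tools presented in the talk compute these figures. Reporting both mean and median is useful because a mean far above the median points to a long tail of slow reviews.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, median


def quarter_label(ts):
    """Return a label such as '14-Q3' for a timestamp."""
    return f"{ts.year % 100:02d}-Q{(ts.month - 1) // 3 + 1}"


def time_to_review_by_quarter(reviews):
    """Group reviews by the quarter in which they were closed.

    reviews: iterable of (opened, closed) datetime pairs (assumed input shape).
    Returns {quarter: (mean_days, median_days)}.
    """
    days = defaultdict(list)
    for opened, closed in reviews:
        elapsed = (closed - opened).total_seconds() / 86400
        days[quarter_label(closed)].append(elapsed)
    return {q: (mean(v), median(v)) for q, v in sorted(days.items())}


# Hypothetical sample data, for illustration only.
reviews = [
    (datetime(2014, 7, 1), datetime(2014, 7, 2)),    # 1 day
    (datetime(2014, 7, 10), datetime(2014, 7, 13)),  # 3 days
    (datetime(2014, 8, 1), datetime(2014, 8, 21)),   # 20 days
    (datetime(2014, 10, 1), datetime(2014, 10, 5)),  # 4 days
]

stats = time_to_review_by_quarter(reviews)
for quarter, (avg, med) in stats.items():
    print(f"{quarter}: mean={avg:.2f} days, median={med:.2f} days")
```

In the sample, the single 20-day review pulls the third-quarter mean well above the median, which is the kind of signal that would prompt a closer look at slow reviews.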

Jesus M. Gonzalez-Barahona

October 30, 2014

Transcript

  1. Software development analytics for the masses Jesus M. Gonzalez-Barahona and

    Daniel Izquierdo {jgb,dizquierdo}@bitergia.com @jgbarah @dizquierdo Bitergia / LibreSoft (URJC) http://bit.ly/sw-analytics-masses Open World Forum 2014 Paris (France), October 31st 2014 Gonzalez-Barahona & Izquierdo (Bitergia) Software development analytics... OWF 2014 1 / 40
  2. © 2012-2014 Bitergia. Some rights reserved. This presentation is distributed

    under the “Attribution-ShareAlike 3.0” license, by Creative Commons, available at http://creativecommons.org/licenses/by-sa/3.0/
  3. Structure of the presentation 1 About me 2 Measuring free

    / open source software development 3 Why open development analytics? 4 Areas of interest & examples 5 Tools: Grimoire 6 Final remarks
  4. My Uni, my company Uni Rey Juan Carlos: LibreSoft research

    team Understanding free, open source software development Data analytics approach Bitergia: From research to the real world The software development analytics company Dashboards, reports, consultancy... http://gsyc.es/~jgb http://bitergia.com
  5. A successful development model? Free (open source) software has shown

    to be a great success ...but there are many details to be understood ...and (a lot of) interest in understanding ...but there is room for improvement ...and (a lot of) interest in improving
  6. A successful development model? (2) There are *lots* of development

    models Common characteristics for many of them: Community-based development Intensive use of tools, processes for coordination Open development models (as opposed to in-house, hidden models)
  7. The importance of the community [Crowd at FOSDEM 2008, by

    Jesús Corrius, CC Attribution 2.0] http://www.flickr.com/photos/jcorrius/2302302707/
  8. The importance of the community (2) Persons (and organizations) with

    different interests, common goals Need for coordination, common decision making Availability of data as a tool: Transparency to the community (fairness) Transparency to third parties (trust)
  9. Diversity of tools, processes (2) Despite diversity, a large fraction

    of projects: Use tools & services from a small set git / svn / hg Bugzilla / Jira / GitHub tickets Gerrit Mailman / Gmane ... use similar processes: bug fixing coordination using tickets pre-merge code review general discussion in mailing lists ... Collection and analysis of data is possible Publication of data makes sense
  10. From open development to open development analytics Information about code,

    community, development for open development projects can be retrieved, organized, analyzed Let’s publish analytics results & data Open Development Analytics: A new standard for transparency
  11. Open development analytics Who is interested? Large & small free

    software communities ...and thousands of large & small companies, public administrations, foundations participating in them, depending on their software [Who can afford not to be interested? It is a key strategic need for many actors]
  12. Open development analytics Why? Free software produced with open development

    models is more and more important for IT users, producers, integrators It is different & complex, yet transparent, many details are public, and it can be improved
  13. Some areas of interest Performance (understanding activity) Company participation (beyond

    copyright notices) Transparency (available information) Auditing (certify participation, experience, etc.) Profiling (key people, companies) Neutrality (fair treatment)
  14. Example: community management. Dimensions and their metrics:

    Activity: raw volume, participants, ...; Reliability: reaction times, pending issues, ...; Sustainability: growth rate, structure, ...
  15. Example: community management [Puppet developers community: attraction / retention (Oct

    2014)] http://bitergia.dev.puppetlabs.com/browser/demographics.html http://radar.oreilly.com/2014/10/measure-your-open-source-communitys-age-to-keep-it-healthy.html
  16. Example: engineering

    [Figure: OpenStack core, main code review parameters per quarter (12-Q4 to 14-Q3). Four panels, each showing mean and median: time to review (days), patchsets per changeset, time waiting for the reviewer, time waiting for the submitter.]
  17. Example: auditing company participation (1) [Main companies in OpenStack Havana

    (partial view)] http://activity.openstack.org/dash/releases/
  18. Example: auditing company participation (2) [Main companies in Eclipse (Oct

    2009 - Oct 2014)] http://dashboard.eclipse.org/scm-companies-summary.html
  19. Example: auditing company participation (3) [IBM participation in OpenStack Havana

    (partial view)] http://activity.openstack.org/dash/releases/company.html?company=IBM
  20. Example: auditing geographic diversity

    [Figure: Eclipse mailing lists authors per time zone (2003, 2013). Two panels; axes: time zone (tz) and number of authors.]
  21. Example: auditing geographic diversity (2)

    [Figure: Eclipse mailing lists and SCM authors per time zone (2013). Two panels; axes: time zone (tz) and number of authors.]
  22. Example: transparency Development communities: companies and developers working together Policies,

    procedures, tools, source code... and development data Do they really provide enough data to enable assessment? Analysis of all repositories (data sources)... ...and associated information (e.g. affiliation)
  23. Example: auditing [OpenStack top contributors (December 2013)]
  24. Example: neutrality

    [Figure: WebKit code review data per company (2012). Axes: number of accepted reviews and iterations per accepted review (median).]
  25. Tools: Grimoire system MetricsGrimoire: Free software for retrieving data from

    repositories vizGrimoire (GrimoireLib, vizGrimoireJS): Free software for analyzing, visualizing data Grimoire Dashboard: Many panels, different views of the project (charts, summaries, statistical analysis) Commercially supported by Bitergia http://metricsgrimoire.github.io http://vizgrimoire.github.io
  26. Summarizing Let’s go one step further: Open Development Analytics
  27. Relationship with EU-funded R&D projects Markos: License analyzer New tools

    for software development analysis Production of linked open data PROSE: Software development analytics to track results of R&D projects Open Source Projects Europe forge: development analytics facilities http://www.markosproject.eu/ http://www.ict-prose.eu/ https://opensourceprojects.eu/
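The geographic diversity examples in the transcript count authors per time zone. A minimal sketch of that kind of count follows, assuming the input is a list of (author, Date header) pairs taken from mailing list messages. The function name, the input shape and the sample addresses are hypothetical, and the Grimoire tools may compute this differently. Note that the offset in a Date header reflects the sender's mail client settings, so it is only a rough proxy for location.

```python
from collections import defaultdict
from email.utils import parsedate_to_datetime


def authors_per_timezone(messages):
    """Count distinct authors per UTC offset.

    messages: iterable of (author, rfc2822_date_string) pairs (assumed shape).
    Returns {utc_offset_in_hours: number_of_distinct_authors}.
    """
    authors = defaultdict(set)
    for author, date_header in messages:
        offset = parsedate_to_datetime(date_header).utcoffset()
        if offset is None:
            # No usable offset in the header (for example "-0000"): skip it.
            continue
        authors[offset.total_seconds() / 3600].add(author)
    return {tz: len(names) for tz, names in sorted(authors.items())}


# Hypothetical sample data, for illustration only.
messages = [
    ("alice@example.org", "Fri, 31 Oct 2014 10:00:00 +0100"),
    ("alice@example.org", "Fri, 31 Oct 2014 12:30:00 +0100"),
    ("bob@example.org", "Thu, 30 Oct 2014 18:00:00 -0500"),
    ("carol@example.org", "Fri, 31 Oct 2014 09:15:00 +0100"),
]

counts = authors_per_timezone(messages)
print(counts)
```

Authors are collected in sets so that someone who posts many messages from the same time zone is counted once, which matches a count of authors rather than a count of messages.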