Validating Metric Thresholds with Developers: an Early Result (ICSME 2015, ERA Track, presented by A. Bergel)

Thresholds are essential for promoting source code metrics as an effective instrument to control the internal quality of software applications. However, little is known about the relation between software quality as identified by metric thresholds and as perceived by real developers. In this paper, we report the first results of a study designed to validate a technique that extracts relative metric thresholds from benchmark data. We use this technique to extract thresholds from a benchmark of 79 Pharo/Smalltalk applications, which are validated with five experts and 25 developers. Our preliminary results indicate that good quality applications—as cited by experts—respect metric thresholds. In contrast, we observed that noncompliant applications are not largely viewed as requiring more effort to maintain than other applications.
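The technique referred to above, introduced by Oliveira et al. (CSMR-WCRE 2014), expresses a relative threshold as "p% of the entities should have M ≤ k", where M is a source code metric, k is an upper limit, and p is the minimal percentage of entities in each system that must respect that limit. The Python sketch below illustrates only the core idea under simplifying assumptions: p is fixed in advance, and k is taken as the smallest limit respected by at least `min_compliance` percent of the benchmark systems (the 90% default is an assumption). It omits the additional criteria the original technique uses to choose p and k, and all function names are hypothetical.

```python
import math
from collections.abc import Mapping, Sequence


def nearest_rank_percentile(values: Sequence[float], p: float) -> float:
    """Smallest v in `values` such that at least p% of the values are <= v.

    Nearest-rank method; assumes 0 < p <= 100 and a non-empty `values`.
    """
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def respects(values: Sequence[float], p: float, k: float) -> bool:
    """True if at least p% of the entities have a metric value <= k."""
    within = sum(1 for v in values if v <= k)
    return 100 * within >= p * len(values)


def extract_upper_limit(
    corpus: Mapping[str, Sequence[float]], p: float, min_compliance: float = 90.0
) -> float:
    """Smallest k such that at least `min_compliance`% of the corpus systems
    respect "p% of the entities should have M <= k".

    A system respects the threshold exactly when k is at least its own
    p-th nearest-rank percentile, so the result is a percentile of percentiles.
    Empty systems are skipped.
    """
    per_system = [nearest_rank_percentile(vals, p) for vals in corpus.values() if vals]
    return nearest_rank_percentile(per_system, min_compliance)


if __name__ == "__main__":
    # Hypothetical corpus: one metric value per entity, per system.
    corpus = {
        "A": [1, 2, 3, 4],
        "B": [2, 4, 6, 8],
        "C": [10, 20, 30, 40],
    }
    k = extract_upper_limit(corpus, p=75, min_compliance=60)
    print(k)  # 6
    print([name for name, vals in corpus.items() if respects(vals, 75, k)])  # ['A', 'B']
```

Reducing each system to its own percentile keeps the sketch short: a system complies with a candidate k if and only if k is at least that percentile, so scanning candidate limits one by one is unnecessary.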

ASERG, DCC, UFMG

October 05, 2015

Transcript

  1. Validating Metric Thresholds with Developers: An Early Result

    Paloma Oliveira (@palomaifmg), Marco Tulio Valente (@mtov), Alexandre Bergel (@alexbergel), Alexander Serebrenik (@aserebrenik)
    Federal Institute of Minas Gerais / University of Chile / APPLIED SOFTWARE ENGINEERING RESEARCH GROUP
  4. Coming back to software metrics

    ▪ It is difficult to give meaning to software metric values
    ▪ Establishing credible thresholds is essential
  5. Relative Thresholds (Oliveira et al., CSMR-WCRE 2014)

    ▪ Must be followed by most source code entities
    ▪ Example:
  6. Relative Thresholds (Oliveira et al., CSMR-WCRE 2014)

    ▪ p: minimal percentage of entities in each system that must respect the upper limit
    ▪ M: source code metric
    ▪ k: upper limit
    ▪ Relative thresholds are extracted from a corpus
  7. This paper

    ▪ Evaluate relative thresholds against developers
    ▪ Step 1: define the corpus
    ▪ Step 2: compute the thresholds
    ▪ Step 3: ask 5 experts to indicate well-written and poorly-written applications
    ▪ Step 4: contrast the results of Steps 2 and 3
    ▪ Step 5: ask 25 maintainers of noncompliant applications how they perceive the maintenance effort
  8. Relative Thresholds for Pharo

    ▪ Corpus: 79 Pharo applications
    ▪ Metrics: NOA, NOM, FAN-OUT, and WMC

    | Metric  | p (%) | k  |
    |---------|-------|----|
    | NOA     | 75    | 5  |
    | NOM     | 75    | 29 |
    | FAN-OUT | 80    | 9  |
    | WMC     | 75    | 46 |
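  9. Noncompliant systems

    Number of metrics (among NOA, NOM, FAN-OUT, and WMC) for which each system is marked noncompliant:

    | System       | Metrics marked |
    |--------------|----------------|
    | Collections  | 2              |
    | CommandShell | 3              |
    | Files        | 2              |
    | Graphics     | 4 (all)        |
    | Kernel       | 2              |
    | Manifest     | 3              |
    | Morphic      | 2              |
    | Shout        | 4 (all)        |
    | Tool         | 4 (all)        |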
  10. RQ #1: Well-written applications

    ▪ Do applications perceived as well-written by experts respect the derived relative thresholds?
    ▪ Well-written applications respect the relative thresholds, with the exception of FAN-OUT

    | System        | Description                     | Voted by  |
    |---------------|---------------------------------|-----------|
    | PetitParser   | Parser framework                | Expert #1 |
    | PharoLauncher | Platform to manage Pharo images | Expert #2 |
    | Pillar        | Markup language and tools       | Expert #2 |
    | Roassal       | Visualization engine            | Expert #3 |
    | Seaside       | Web framework                   | Expert #4 |
    | SystemLogger  | Log framework                   | Expert #5 |
    | Zinc          | HTTP framework                  | Expert #5 |
  11. RQ #1: Well-written applications

    ▪ High FAN-OUT: presence of extensive inheritance hierarchies with many instances of overridden methods
  12. RQ #2: Poorly-written applications

    ▪ Morphic does not respect the relative thresholds: “Morphic is an old system and there is no test and sparse documentation”. (Expert #2)
    ▪ Metacello respects the relative thresholds; it was cited as poorly-written due to the complexity of its domain

    | System    | Description                   | Voted by          |
    |-----------|-------------------------------|-------------------|
    | Metacello | Versioning system             | Expert #4         |
    | Morphic   | Graphical interface framework | Experts #2 and #4 |
  13. RQ #3: Noncompliant applications

    ▪ Four (out of nine) noncompliant applications are considered harder to maintain by their maintainers
    ▪ “Graphics is a sum of patches over patches without a clear direction on design, with tons of duplicates and several design errors/conflicts. So is a pain to introduce any change there.” (Graphics maintainer)
    ▪ Noncompliant applications are not largely viewed as requiring more effort to maintain than other applications
  14. Conclusion

    ▪ Well-designed applications respect the thresholds
    ▪ Developers usually have difficulty pointing out poorly-designed applications
    ▪ Noncompliant applications are not largely viewed as requiring more effort to maintain
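The Pharo thresholds on slide 8 can be applied mechanically to decide for which metrics a system is noncompliant, as summarized on slide 9. The Python sketch below is a hypothetical illustration, not the study's tooling: it assumes each system is given as per-class metric values (the input format, `PHARO_THRESHOLDS`, and `violated_thresholds` are names introduced here) and reads each table row as "p% of the classes should have M ≤ k".

```python
from collections.abc import Mapping, Sequence

# Relative thresholds for Pharo (slide 8): metric -> (p, k), read as
# "p% of the classes should have metric <= k".
PHARO_THRESHOLDS: dict[str, tuple[int, int]] = {
    "NOA": (75, 5),
    "NOM": (75, 29),
    "FAN-OUT": (80, 9),
    "WMC": (75, 46),
}


def violated_thresholds(class_metrics: Mapping[str, Sequence[float]]) -> list[str]:
    """Return the metrics whose relative threshold the system does not respect.

    `class_metrics` maps each metric name to one value per class (hypothetical
    input format). Comparing 100 * within with p * n keeps the p% test in
    integer arithmetic, avoiding rounding issues.
    """
    violated = []
    for metric, (p, k) in PHARO_THRESHOLDS.items():
        values = class_metrics[metric]
        within = sum(1 for v in values if v <= k)
        if 100 * within < p * len(values):
            violated.append(metric)
    return violated


if __name__ == "__main__":
    # Hypothetical system with four classes (made-up values, not study data).
    system = {
        "NOA": [2, 3, 4, 9],        # 3 of 4 classes <= 5  -> 75%, respects p = 75
        "NOM": [10, 20, 35, 40],    # 2 of 4 classes <= 29 -> 50%, violates p = 75
        "FAN-OUT": [1, 2, 3, 12],   # 3 of 4 classes <= 9  -> 75%, violates p = 80
        "WMC": [10, 20, 30, 40],    # 4 of 4 classes <= 46 -> respects p = 75
    }
    print(violated_thresholds(system))  # ['NOM', 'FAN-OUT']
```

Under this reading, a system is noncompliant whenever the returned list is non-empty; Graphics, Shout, and Tool, for example, are marked for all four metrics on slide 9.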