Extracting Relative Thresholds for Source Code Metrics (CSMR-WCRE 2014)


Establishing credible thresholds is a central challenge for promoting source code metrics as an effective instrument to control the internal quality of software systems. To address this challenge, we propose the concept of relative thresholds for evaluating metrics data following heavy-tailed distributions. The proposed thresholds are relative because they assume that metric thresholds should be followed by most source code entities, but that it is also natural to have a number of entities in the “long-tail” that do not follow the defined limits. In the paper, we describe an empirical method for extracting relative thresholds from real systems. We also report a study on applying this method in a corpus with 106 systems. Based on the results of this study, we argue that the proposed thresholds express a balance between real and idealized design practices.


ASERG, DCC, UFMG

February 06, 2014

Transcript

  1. 1.

    Extracting Relative Thresholds for Source Code Metrics
    Paloma Oliveira, Marco Túlio Valente, Fernando P. Lima
    Applied Software Engineering Research Group
  2. 2.

    Motivation
    Source code metrics: Cohesion, Complexity, Size, Coupling.
    Metrics are rarely used to measure internal quality. It is essential to establish credible thresholds.
    CSMR-WCRE-2014
  3. 3.

    Our idea: Relative Thresholds
    Thresholds followed by most code entities.
    Example: we accept a tail (100% - p%). For NOA, the tail is 20%.
  4. 4.

    Relative Thresholds
    Format: p% of the entities should have M ≤ k, where
    M: a source code metric
    p: minimal % of entities in each system
    k: upper limit
  5. 5.

    Extracting Relative Thresholds
    p and k characterize a relative threshold. To calculate p and k we need:
    Corpus: a set of systems
    Two constants: MIN (real design rules) and TAIL (idealized design rules)
  6. 6.

    Extracting Relative Thresholds
    MIN: real design rules. Relative thresholds should be followed by at least MIN% of the systems in the Corpus.
  7. 7.

    Extracting Relative Thresholds
    TAIL: idealized design rules. The tail starts at the TAIL-th percentile of the values of a given metric; it contains the classes with very high values.
  8. 8.

    An Example
    Corpus: 106 systems
    MIN: 90% of the systems
    TAIL: 90th percentile
  9. 10.

    ComplianceRate Function
    Returns the % of systems in the corpus that follow the relative threshold defined by the pair [p, k].
  10. 11.

    ComplianceRate – Example #1 – NOA
    ComplianceRate[85, 17] = 100%
    Maximal ComplianceRate (100%), but it relies on a high value for k (17 attributes).
  11. 12.

    ComplianceRate – Example #2 – NOA
    ComplianceRate[90, 8] = 50%
    Smaller ComplianceRate (half of the systems), but it uses an acceptable value for k (8 attributes).
  12. 13.

    Penalization Functions
    Both examples are penalized:
    High value for k. Example #1: ComplianceRate[85, 17] = 100%
    Small value for ComplianceRate. Example #2: ComplianceRate[90, 8] = 50%
  13. 15.

    Penalty1 – Example #1 – NOA
    ComplianceRate[85, 17] = 100%, MIN = 90%
    Penalty1[85, 17] = 0, because the ComplianceRate is not below MIN.
  14. 16.

    Penalty1 – Example #2 – NOA
    ComplianceRate[90, 8] = 50%, MIN = 90%
    Penalty1[90, 8] = (90 - 50) / 90 ≈ 0.4
  15. 17.

    Penalty2 Function
    Penalty2 formalizes idealized design rules.
    TAIL[S]: the TAIL-th percentile of the values of M in a system S
    TailMedian: the median of the TAIL[S] values over the systems in the corpus
    It penalizes ComplianceRate when k > TailMedian.
  16. 18.

    Penalty2 – Example #1 – NOA
    TailMedian = 9, ComplianceRate[85, 17] = 100%
    k = 17 > TailMedian
    Penalty2[17] = (17 - 9) / 9 ≈ 0.9
  17. 19.

    Penalty2 – Example #2 – NOA
    TailMedian = 9, ComplianceRate[90, 8] = 50%
    k = 8 < TailMedian
    Penalty2[8] = 0
  18. 20.

    ComplianceRatePenalty
    ComplianceRatePenalty is a sum of penalties. The Relative Threshold is the one with the lowest ComplianceRatePenalty.
  19. 21.

    Empirical Method – ComplianceRatePenalty
    ComplianceRatePenalty[p, k] = penalty1 + penalty2
    ComplianceRatePenalty[85, 17] = 0 + 0.9 = 0.9
    ComplianceRatePenalty[90, 8] = 0.4 + 0 = 0.4
    The relative threshold is defined by the lowest value calculated for ComplianceRatePenalty.
  20. 22.

    ComplianceRatePenalty – Example – NOA
    ComplianceRatePenalty = 0 in five cases: [75,7] [75,8] [75,9] [80,8] [80,9]
    Tiebreaker criteria: the highest p and the lowest k. The highest p leaves [80,8] and [80,9]; the lowest k then selects [80,8].
  21. 25.

    Case Study
    Includes:
    1. Relative thresholds extraction for seven metrics
    2. Relative thresholds extraction for a subcorpus
    3. Historical analysis
  22. 26.

    Metrics & Systems
    Metrics: NOM, LOC, FAN-OUT, RFC, WMC, PUBA/NOA, LCOM
    Systems: Qualitas Corpus - 106 systems
  23. 28.

    Subcorpus: Metrics & Systems
    Metrics: NOM, LOC, FAN-OUT, RFC, WMC, PUBA/NOA, LCOM
    Systems: Qualitas Corpus - 26 Tools
  24. 30.

    Historical Analysis: Metrics & Systems
    Metrics: NOM, FAN-OUT, WMC, PUBA/NOA
    Systems: Previous versions - COMETS
  25. 32.
  26. 33.

    Conclusion
    Relative Thresholds: thresholds should be valid for most entities, but not for all entities.
    Case Study: Qualitas Corpus. The extracted thresholds represent a balance between real and idealized design rules.
  27. 34.

    Future Work
    New metrics and new corpus; different contexts and programming languages.
    New studies on Relative Thresholds (RT):
    Can we use RT to measure technical debt?
    What is the impact of not following the RTs?
    Are outliers really different from non-outliers?
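
The extraction method walked through in the transcript (ComplianceRate, Penalty1, Penalty2, ComplianceRatePenalty, and the tiebreaker) can be sketched in a few lines of Python. This is only an illustrative reading of the slides, not the authors' implementation. All function names are hypothetical. The nearest-rank percentile, the choice of candidate k values (every metric value observed in the corpus), and the toy corpus are assumptions made for the example, and the sketch assumes TailMedian is greater than zero.

```python
from statistics import median


def percentile(values, q):
    """Nearest-rank q-th percentile (assumed definition; the slides do not fix one)."""
    ordered = sorted(values)
    rank = max(1, -(-q * len(ordered) // 100))  # ceil(q * n / 100) with integers
    return ordered[rank - 1]


def follows(system, p, k):
    """True if at least p% of the entities in one system have a metric value <= k."""
    within = sum(1 for value in system if value <= k)
    return 100 * within >= p * len(system)


def compliance_rate(corpus, p, k):
    """Percentage of systems in the corpus that follow the threshold [p, k]."""
    return 100 * sum(follows(system, p, k) for system in corpus) / len(corpus)


def penalty1(rate, min_rate):
    """Penalize compliance rates below MIN (real design rules)."""
    return (min_rate - rate) / min_rate if rate < min_rate else 0.0


def penalty2(k, tail_median):
    """Penalize upper limits above TailMedian (idealized design rules)."""
    return (k - tail_median) / tail_median if k > tail_median else 0.0


def extract_relative_threshold(corpus, candidate_ps, min_rate=90, tail=90):
    """Return the pair (p, k) with the lowest ComplianceRatePenalty."""
    # TailMedian: median over the systems of each system's TAIL-th percentile.
    tail_median = median(percentile(system, tail) for system in corpus)
    # Assumption: candidate k values are the metric values observed in the corpus.
    candidate_ks = sorted({value for system in corpus for value in system})

    def key(pair):
        p, k = pair
        rate = compliance_rate(corpus, p, k)
        total = penalty1(rate, min_rate) + penalty2(k, tail_median)
        return (total, -p, k)  # tiebreaker: highest p, then lowest k

    return min(((p, k) for p in candidate_ps for k in candidate_ks), key=key)


# Toy corpus (invented data): NOA values per class for two small systems.
toy_corpus = [
    [1, 1, 2, 2, 3, 3, 4, 5, 8, 20],
    [1, 2, 2, 3, 3, 4, 4, 6, 9, 30],
]
best = extract_relative_threshold(toy_corpus, candidate_ps=(75, 80, 85, 90))
print(best)  # prints (80, 6)
```

On the toy data, four pairs reach a penalty of zero ([75,6], [75,8], [80,6], [80,8]), and the tiebreaker picks the highest p and then the lowest k. The two penalty functions also reproduce the NOA figures from the slides: `penalty1(50, 90)` is about 0.44 and `penalty2(17, 9)` is about 0.89.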