Extracting Relative Thresholds for Source Code Metrics (CSMR-WCRE 2014)

Establishing credible thresholds is a central challenge for promoting source code metrics as an effective instrument to control the internal quality of software systems. To address this challenge, we propose the concept of relative thresholds for evaluating metrics data following heavy-tailed distributions. The proposed thresholds are relative because they assume that metric thresholds should be followed by most source code entities, but that it is also natural to have a number of entities in the "long tail" that do not follow the defined limits. In the paper, we describe an empirical method for extracting relative thresholds from real systems. We also report a study applying this method to a corpus of 106 systems. Based on the results of this study, we argue that the proposed thresholds express a balance between real and idealized design practices.

ASERG, DCC, UFMG

February 06, 2014

Transcript

  1. Extracting Relative Thresholds for Source Code Metrics
    Paloma Oliveira, Marco Túlio Valente, Fernando P. Lima
    Applied Software Engineering Research Group

  2. Motivation: Source code metrics
    Metric categories: size, coupling, cohesion, complexity
    Metrics are rarely used to measure the internal quality of software systems.
    It is essential to establish credible thresholds.
    CSMR-WCRE-2014

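  3. Our idea: Relative Thresholds
    Thresholds followed by most, but not necessarily all, code entities
    Example (NOA, number of attributes): p% of the classes should have NOA ≤ k
    We accept a tail of (100% - p%) of the classes, e.g., 20% when p = 80%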

  4. Relative Thresholds
    Format: p% of the entities should have M ≤ k, where
    M: a source code metric
    p: minimal % of entities in each system
    k: upper limit
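
A minimal Python sketch of this format for a single system. It assumes a system is given as a plain list with one metric value per class; the function name `follows_threshold` is hypothetical and not taken from the paper. Integer arithmetic is used so that a system sitting exactly at p% is not misjudged by floating-point rounding.

```python
def follows_threshold(values: list[int], p: int, k: int) -> bool:
    """Return True if at least p% of the entities have metric value <= k.

    `values` holds one metric value (e.g. NOA) per class of a single system.
    Hypothetical helper, written for illustration only.
    """
    if not values:
        raise ValueError("a system must have at least one entity")
    within = sum(1 for v in values if v <= k)
    # within / len(values) >= p / 100, compared with integers to avoid float error
    return within * 100 >= p * len(values)


# Hypothetical system with 10 classes; two of them have more than 8 attributes.
noa = [0, 1, 2, 2, 3, 5, 6, 8, 12, 25]
print(follows_threshold(noa, p=80, k=8))  # True: 8 of 10 classes (80%) have NOA <= 8
print(follows_threshold(noa, p=90, k=8))  # False: only 80% have NOA <= 8
```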

  5. Extracting Relative Thresholds
    p and k characterize a relative threshold.
    To calculate p and k we need:
    Corpus: a set of systems
    Two constants:
    MIN: real design rules
    TAIL: idealized design rules

  6. Extracting Relative Thresholds
    MIN: real design rules
    Relative thresholds should be followed by at least MIN% of the systems in the corpus.

  7. Extracting Relative Thresholds
    TAIL: idealized design rules
    The tail starts at the TAIL-th percentile of the values of a given metric;
    it contains the classes with very high values.
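
Where the tail starts in one system can be sketched in Python as follows. The slides do not state which percentile definition is used, so this assumes the nearest-rank definition (one of several common choices); `tail_start` is a hypothetical name.

```python
def tail_start(values: list[int], tail: int) -> int:
    """Nearest-rank TAIL-th percentile of a system's metric values.

    Assumption: nearest-rank percentile; the original method may use
    a different (e.g. interpolating) definition.
    """
    if not values or not 0 < tail <= 100:
        raise ValueError("need at least one value and 0 < tail <= 100")
    ordered = sorted(values)
    # Integer ceil(tail * n / 100): the 1-based rank of the percentile value.
    rank = -(-tail * len(ordered) // 100)
    return ordered[rank - 1]


noa = [0, 1, 2, 2, 3, 5, 6, 8, 12, 25]
print(tail_start(noa, 90))  # 12
```

A different percentile definition may give a slightly different cut point, especially for systems with few classes.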

  8. An Example
    Corpus: 106 systems
    MIN: 90% of the systems
    TAIL: 90th percentile

  9. Empirical Method
    Functions to calculate the parameters p and k

  10. ComplianceRate Function
    Returns the % of systems in the corpus that follow the relative threshold defined by the pair [p, k]
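
A sketch of ComplianceRate in Python, assuming the corpus is a dictionary that maps a (hypothetical) system name to the list of per-class metric values; the function name and the data below are made up for illustration.

```python
def compliance_rate(corpus: dict[str, list[int]], p: int, k: int) -> float:
    """Percentage of systems in which at least p% of the entities have M <= k."""
    compliant = 0
    for values in corpus.values():
        within = sum(1 for v in values if v <= k)
        if within * 100 >= p * len(values):  # this system follows [p, k]
            compliant += 1
    return 100 * compliant / len(corpus)


# Hypothetical corpus: system name -> NOA value of each class.
corpus = {
    "A": [1, 2, 3, 4, 20],
    "B": [1, 1, 2, 2, 2],
    "C": [10, 12, 15, 18, 30],
    "D": [3, 3, 5, 5, 5],
}
print(compliance_rate(corpus, p=80, k=4))  # 50.0: only A and B follow [80, 4]
```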

  11. ComplianceRate: Example #1 (NOA)
    ComplianceRate [85, 17] = 100%
    Maximal ComplianceRate (100%)
    But relies on a high value for k (17 attributes)

  12. ComplianceRate: Example #2 (NOA)
    ComplianceRate [90, 8] = 50%
    Smaller ComplianceRate (half of the systems)
    But uses an acceptable value for k (8 attributes)

  13. Penalization Functions
    Both examples are penalized:
    Example #1: ComplianceRate[85, 17] = 100%, but with a high value for k
    Example #2: ComplianceRate[90, 8] = 50%, a small value for ComplianceRate

  14. Penalty1 Function
    Penalizes ComplianceRate < MIN%
    Penalty1 formalizes real design rules

  15. Penalty1: Example #1 (NOA)
    ComplianceRate [85, 17] = 100%
    MIN = 90%
    Penalty1[85, 17] = 0 (ComplianceRate is not below MIN)

  16. Penalty1: Example #2 (NOA)
    ComplianceRate [90, 8] = 50%
    MIN = 90%
    Penalty1[90, 8] = (90 - 50) / 90 ≈ 0.4
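
The two Penalty1 examples are consistent with the penalty being the relative shortfall of ComplianceRate below MIN%, and zero once MIN% is reached. A Python sketch under that reading (the formula is inferred from the worked examples, not quoted from the paper; the function name is hypothetical):

```python
def penalty1(compliance_rate: float, min_pct: float) -> float:
    """Relative shortfall of ComplianceRate below MIN%; zero when MIN% is met."""
    if compliance_rate < min_pct:
        return (min_pct - compliance_rate) / min_pct
    return 0.0


print(penalty1(100, 90))           # 0.0, Example #1 ([85, 17])
print(round(penalty1(50, 90), 2))  # 0.44, Example #2 ([90, 8])
```

The exact value for Example #2 is 40/90 ≈ 0.44; the slide rounds it to one decimal place.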

  17. Penalty2 Function
    Penalty2 formalizes idealized design rules.
    TAIL[S]: TAIL-th percentile of the values of M in a system S
    TailMedian: the median of TAIL[S] over all systems S in the corpus
    Penalizes ComplianceRate when k > TailMedian

  18. Penalty2: Example #1 (NOA)
    TailMedian = 9
    ComplianceRate [85, 17] = 100%
    k = 17 > TailMedian
    Penalty2[17] = (17 - 9) / 9 ≈ 0.9

  19. Penalty2: Example #2 (NOA)
    TailMedian = 9
    ComplianceRate [90, 8] = 50%
    k = 8 < TailMedian
    Penalty2[8] = 0
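
Likewise, the Penalty2 examples suggest the penalty is the relative excess of k over TailMedian. The Python sketch below also computes TailMedian from a corpus; it keeps the nearest-rank percentile assumption, and all function names and data are hypothetical.

```python
import statistics


def tail_start(values: list[int], tail: int) -> int:
    """Nearest-rank TAIL-th percentile (assumed definition) of one system."""
    ordered = sorted(values)
    rank = -(-tail * len(ordered) // 100)  # integer ceil(tail * n / 100)
    return ordered[rank - 1]


def tail_median(corpus: dict[str, list[int]], tail: int) -> float:
    """Median, over all systems S in the corpus, of TAIL[S]."""
    return statistics.median([tail_start(v, tail) for v in corpus.values()])


def penalty2(k: int, median: float) -> float:
    """Relative excess of k over TailMedian; zero when k <= TailMedian."""
    if k > median:
        return (k - median) / median
    return 0.0


corpus = {"A": list(range(1, 11)), "B": list(range(1, 21)), "C": list(range(1, 31))}
print(tail_median(corpus, 90))    # 18: median of TAIL[S] = 9, 18, 27
print(round(penalty2(17, 9), 2))  # 0.89, Example #1
print(penalty2(8, 9))             # 0.0, Example #2
```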

  20. ComplianceRatePenalty
    ComplianceRatePenalty is the sum of the two penalties (Penalty1 + Penalty2).
    The relative threshold is the [p, k] pair with the lowest ComplianceRatePenalty.

  21. Empirical Method: ComplianceRatePenalty
    ComplianceRatePenalty[p, k] = Penalty1[p, k] + Penalty2[k]
    ComplianceRatePenalty[85, 17] = 0 + 0.9 = 0.9
    ComplianceRatePenalty[90, 8] = 0.4 + 0 = 0.4
    The relative threshold is defined by the lowest value calculated for ComplianceRatePenalty; of these two pairs, [90, 8] is preferred.
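
The arithmetic on this slide can be checked with a short Python function that adds the two penalties, given an already computed ComplianceRate and TailMedian. It is a hypothetical helper that uses the penalty formulas inferred from the previous examples.

```python
def compliance_rate_penalty(rate: float, k: int, min_pct: float, tail_median: float) -> float:
    """ComplianceRatePenalty[p, k] = Penalty1[p, k] + Penalty2[k]."""
    pen1 = (min_pct - rate) / min_pct if rate < min_pct else 0.0
    pen2 = (k - tail_median) / tail_median if k > tail_median else 0.0
    return pen1 + pen2


# NOA examples from the slides: MIN = 90%, TailMedian = 9.
print(round(compliance_rate_penalty(100, 17, 90, 9), 2))  # 0.89 for [85, 17]
print(round(compliance_rate_penalty(50, 8, 90, 9), 2))    # 0.44 for [90, 8]
```

Unrounded, the penalties are 8/9 ≈ 0.89 and 4/9 ≈ 0.44; the slides show one decimal place, and the ranking of the two pairs is the same either way.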

  22. ComplianceRatePenalty: Example (NOA)
    ComplianceRatePenalty = 0 in five cases:
    [75, 7] [75, 8] [75, 9] [80, 8] [80, 9]
    Tiebreaker criteria: the highest p, then the lowest k
    Pairs with the highest p: [80, 8] [80, 9]
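
An end-to-end Python sketch of the search: it evaluates ComplianceRatePenalty for every candidate pair and applies the tiebreaker (highest p, then lowest k) through the sort key. The candidate lists `ps` and `ks`, the nearest-rank percentile, the corpus layout, the function names, and the toy data are all assumptions for illustration; the slides do not specify which values of p and k are searched.

```python
import statistics


def tail_start(values: list[int], tail: int) -> int:
    """Nearest-rank TAIL-th percentile (assumed definition) of one system."""
    ordered = sorted(values)
    rank = -(-tail * len(ordered) // 100)  # integer ceil(tail * n / 100)
    return ordered[rank - 1]


def compliance_rate(corpus: dict[str, list[int]], p: int, k: int) -> float:
    """Percentage of systems in which at least p% of the entities have M <= k."""
    compliant = sum(
        1
        for values in corpus.values()
        if sum(1 for v in values if v <= k) * 100 >= p * len(values)
    )
    return 100 * compliant / len(corpus)


def extract_relative_threshold(
    corpus: dict[str, list[int]],
    ps: list[int],
    ks: list[int],
    min_pct: int = 90,
    tail: int = 90,
) -> tuple[int, int]:
    """Return the [p, k] with the lowest ComplianceRatePenalty.

    Ties are broken by the highest p, then by the lowest k.
    """
    median = statistics.median([tail_start(v, tail) for v in corpus.values()])

    def penalty(pk: tuple[int, int]) -> float:
        p, k = pk
        rate = compliance_rate(corpus, p, k)
        pen1 = (min_pct - rate) / min_pct if rate < min_pct else 0.0
        pen2 = (k - median) / median if k > median else 0.0
        return pen1 + pen2

    candidates = [(p, k) for p in ps for k in ks]
    # Sort key: lowest penalty first, then highest p (-p), then lowest k.
    return min(candidates, key=lambda pk: (penalty(pk), -pk[0], pk[1]))


# Hypothetical corpus: nine similar systems plus one with larger classes.
corpus = {f"S{i}": list(range(1, 11)) for i in range(9)}
corpus["Outlier"] = list(range(5, 15))
print(extract_relative_threshold(corpus, ps=[75, 80], ks=list(range(1, 21))))  # (80, 8)
```

With p limited to 75 and 80, four pairs reach zero penalty on this toy corpus and the tiebreaker picks [80, 8], mirroring the shape of the NOA example; adding p = 85 and 90 to the grid selects [90, 9] instead.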

  23. Empirical Method
    Relative threshold for the NOA metric: [p, k] = [80, 8]
    (80% of the classes should have NOA ≤ 8)

  24. Case Study

  25. Case Study
    The case study includes:
    1. Extraction of relative thresholds for seven metrics
    2. Extraction of relative thresholds for a subcorpus
    3. Historical analysis

  26. Metrics & Systems
    Metrics: NOM, LOC, FAN-OUT, RFC, WMC, PUBA/NOA, LCOM
    Systems: Qualitas Corpus, 106 systems

  27. Extracted Relative Thresholds
    [Table: extracted relative thresholds; * marks systems that do not follow the thresholds]

  28. Subcorpus: Metrics & Systems
    Metrics: NOM, LOC, FAN-OUT, RFC, WMC, PUBA/NOA, LCOM
    Systems: Qualitas Corpus, 26 tools

  29. Subcorpus: Extracted Relative Thresholds
    The thresholds rely on relatively high values for k.

  30. Historical Analysis: Metrics & Systems
    Metrics: NOM, FAN-OUT, WMC, PUBA/NOA
    Systems: previous versions, from COMETS

  31. Historical Analysis: Dataset
    COMETS: Code Metrics Time Series Dataset
    [Figure: an outlier is marked]

  32. Historical Analysis
    Across the extracted versions, the systems did not change their status (following or not following the thresholds).

  33. Conclusion
    Relative Thresholds
    Thresholds should be valid for most entities,
    but not for all entities
    Case Study: Qualitas Corpus
    The extracted thresholds represent a balance between real and idealized design rules.

  34. Future Work
    New metrics and new corpora
    Different contexts and programming languages
    New studies on Relative Thresholds (RT)
    Can we use RT to measure technical debt?
    What is the impact of not following the RTs?
    Are outliers really different from non-outliers?

  35. Thank you!
    Applied Software Engineering Research Group
    [email protected]
    http://aserg.labsoft.dcc.ufmg.br