Does Feature Scattering Follow Power-Law Distributions? (FOSD 2014)

Feature scattering has long been said to be an undesirable characteristic in source code. Since scattered features introduce extensions across the code base, their maintenance requires analyzing and changing different locations in code, possibly causing ripple effects. Despite this fact, scattering often occurs in practice, either due to limitations in existing programming languages (e.g., imposition of a dominant decomposition) or time-pressure issues. In the latter case, scattering provides a simple way to support new capabilities, avoiding the upfront investment of creating modules and interfaces (when possible). Hence, we argue that scattering is not necessarily bad, provided it is kept within certain limits, or thresholds. Extracting thresholds, however, is not a trivial task. For instance, research shows that some source code-metric distributions are heavy-tailed, usually following power-law models. In the face of heavy-tailed distributions, reporting metrics in terms of averages and standard deviations is unreliable, although commonly done. Thus, prior to extracting reliable thresholds for feature scattering, one must understand the shape of the feature-scattering distribution. In this direction, we analyze the scattering degree of five C-pre-processor-based software families and verify whether their empirical cumulative feature-scattering distributions follow power laws. Our results show that feature scattering in the studied subject systems has characteristics of heavy-tailed distributions, with a good fit to power laws. Hence, we raise awareness that feature scattering thresholds based on central measures may not be reliable in practice.

ASERG, DCC, UFMG

September 14, 2014

Transcript

  1. 1.

    Does Feature Scattering Follow Power-Law Distributions? An Investigation of Five Pre-Processor-Based Systems

    Leonardo Passos (University of Waterloo, Canada), Rodrigo Queiroz (Federal University of Minas Gerais, Brazil), Marco Tulio Valente (Federal University of Minas Gerais, Brazil), Sven Apel (University of Passau, Germany), Krzysztof Czarnecki (University of Waterloo, Canada)

    6th International Workshop on Feature-Oriented Software Development
  2. 4.

    4

  3. 11.

    Empirical thresholds already exist for certain software metrics (e.g., McCabe complexity of a method ≤ 14). Alves et al., ICSM’10: Deriving Metric Thresholds from Benchmark Data
  4. 19.

    Power-laws have also been shown to describe the behaviour of different code metrics in OO systems
  5. 21.

    21

  6. 22.

    Shape of a power-law

    [Figure: frequency plotted against metric value. Labels: typical values; heavy-tail (not typical values); threshold t.]

    Central tendency statistics (e.g., mean) are not meaningful. The probability of having values > t, however, is not negligible.
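The two claims on this slide can be made concrete with the standard continuous power-law model. This is only a sketch: the symbols \alpha (exponent) and x_{\min} (lower bound of the power-law region) are assumptions introduced here, not notation taken from the slides.

```latex
% Continuous power-law density, valid for x >= x_min and alpha > 1
p(x) = \frac{\alpha - 1}{x_{\min}} \left( \frac{x}{x_{\min}} \right)^{-\alpha}

% Tail probability: decays polynomially in t, so large values stay plausible
P(X > t) = \left( \frac{t}{x_{\min}} \right)^{-(\alpha - 1)}, \qquad t \ge x_{\min}

% Mean: finite only when alpha > 2, infinite otherwise
\mathbb{E}[X] =
  \begin{cases}
    \dfrac{\alpha - 1}{\alpha - 2}\, x_{\min} & \text{if } \alpha > 2 \\[1ex]
    \infty & \text{if } 1 < \alpha \le 2
  \end{cases}
```

Under this model the variance is infinite whenever \alpha \le 3, which is one reason averages and standard deviations can be misleading summaries of heavy-tailed metrics.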
  7. 25.

    Small: ~ 20K – 40K SLOC, ~ 100 – 180 features
    Medium: ~ 220K – 1500K SLOC, ~ 2K features
    Large: ~ 12000K SLOC, ~ 12K features
  8. 28.

    Feature: Macro names referenced in at least one ifdef annotation

    ifdef annotation: #if, #ifdef, #ifndef, #elif
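The two definitions above could be turned into a small extraction script along the following lines. This is a minimal sketch, not the authors' tooling: the function name `scattering_degrees` is hypothetical, and counting "number of ifdef annotations that reference a macro" as its scattering degree is an assumption made here for illustration. Line continuations and multi-line comments are ignored.

```python
import re
from collections import Counter

# Matches the four annotation kinds named on the slide: #if, #ifdef, #ifndef, #elif.
DIRECTIVE = re.compile(r'^\s*#\s*(if|ifdef|ifndef|elif)\b(.*)$')
# Identifiers only; the leading \b keeps us from matching inside literals like 0x10.
IDENT = re.compile(r'\b[A-Za-z_]\w*')
# Single-line /* ... */ and // comments trailing a directive.
COMMENT = re.compile(r'/\*.*?\*/|//.*')

def scattering_degrees(source):
    """Map each macro name to the number of ifdef annotations referencing it.

    Assumption: this count is used as the feature's scattering degree.
    """
    degrees = Counter()
    for line in source.splitlines():
        match = DIRECTIVE.match(line)
        if not match:
            continue
        expression = COMMENT.sub('', match.group(2))
        # A macro mentioned twice in one annotation still counts once.
        names = set(IDENT.findall(expression)) - {'defined'}
        degrees.update(names)
    return degrees

if __name__ == '__main__':
    example = """
#ifdef CONFIG_A
#if defined(CONFIG_A) && defined(CONFIG_B)
#endif
#endif
#ifndef CONFIG_B /* fallback */
#endif
"""
    for name, degree in sorted(scattering_degrees(example).items()):
        print(name, degree)
```

The resulting per-feature counts are the values whose distribution would then be tested against a power-law.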
  9. 31.

    How to find whether the observed scattering follows a power-law?

    Clauset et al., Society for Industrial and Applied Mathematics Journal’09: Power-law Distributions in Empirical Data
  10. 32.

    (1) Infer a power-law describing the collected scattering values
    (2) Perform a statistical test to check whether the power-law is a plausible model (p-value > 0.1)
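The two steps above can be sketched in plain Python. This is a simplified illustration, not the procedure used in the study: it assumes a continuous power-law, takes the lower bound `xmin` as given instead of estimating it, and uses a fully parametric bootstrap with a Kolmogorov-Smirnov distance. All function names are hypothetical.

```python
import math
import random

def fit_alpha(xs, xmin):
    """Step 1: maximum-likelihood exponent for values >= xmin (continuous case)."""
    tail = [x for x in xs if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)

def ks_distance(xs, xmin, alpha):
    """Largest gap between the empirical CDF and the fitted power-law CDF."""
    tail = sorted(x for x in xs if x >= xmin)
    n = len(tail)
    distance = 0.0
    for i, x in enumerate(tail):
        model = 1.0 - (x / xmin) ** (1.0 - alpha)
        distance = max(distance, abs(model - i / n), abs(model - (i + 1) / n))
    return distance

def sample_power_law(n, xmin, alpha, rng):
    """Draw n values from the fitted model by inverse-transform sampling."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

def plausibility_p_value(xs, xmin, n_boot=200, seed=0):
    """Step 2: fraction of synthetic data sets that fit worse than the real data."""
    rng = random.Random(seed)
    alpha = fit_alpha(xs, xmin)
    observed = ks_distance(xs, xmin, alpha)
    n = sum(1 for x in xs if x >= xmin)
    worse = 0
    for _ in range(n_boot):
        synthetic = sample_power_law(n, xmin, alpha, rng)
        if ks_distance(synthetic, xmin, fit_alpha(synthetic, xmin)) >= observed:
            worse += 1
    return alpha, worse / n_boot

if __name__ == '__main__':
    rng = random.Random(7)
    data = sample_power_law(1000, 1.0, 2.5, rng)  # synthetic stand-in for scattering values
    alpha, p = plausibility_p_value(data, xmin=1.0)
    print(f"alpha = {alpha:.2f}, p-value = {p:.2f}")
    print("power-law plausible" if p > 0.1 else "power-law rejected")
```

A small p-value means the real data sit farther from the fitted model than most synthetic samples do, so the power-law is rejected; with the slide's criterion, the model is kept as plausible when the p-value exceeds 0.1. Real scattering values are integers, so a discrete fit and an estimated `xmin` would be needed for a faithful analysis.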
  11. 33.

    33

  12. 34.

    34

  13. 36.

    36

  14. 37.

    In summary: Feature scattering is concentrated in specific features, skewing the distribution. We raise awareness that feature scattering thresholds must not rely on central measures. We provide preliminary evidence that feature scattering follows power-laws.