Does Feature Scattering Follow Power-Law Distributions? (FOSD 2014)


Feature scattering has long been said to be an undesirable characteristic of source code. Since scattered features introduce extensions across the code base, their maintenance requires analyzing and changing different locations in code, possibly causing ripple effects. Despite this, scattering often occurs in practice, either due to limitations in existing programming languages (e.g., imposition of a dominant decomposition) or due to time pressure. In the latter case, scattering provides a simple way to support new capabilities, avoiding the upfront investment of creating modules and interfaces (when possible). Hence, we argue that scattering is not necessarily bad, provided it is kept within certain limits, or thresholds. Extracting thresholds, however, is not a trivial task. For instance, research shows that some source-code metric distributions are heavy-tailed, usually following power-law models. In the face of heavy-tailed distributions, reporting metrics in terms of averages and standard deviations is unreliable, although it is commonly done. Thus, prior to extracting reliable thresholds for feature scattering, one must understand the shape of the feature-scattering distribution. In this direction, we analyze the scattering degree of five C-pre-processor-based software families and verify whether their empirical cumulative feature-scattering distributions follow power laws. Our results show that feature scattering in the studied subject systems has characteristics of heavy-tailed distributions, with a good fit to power laws. Hence, we raise awareness that feature-scattering thresholds based on central measures may not be reliable in practice.
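As a rough illustration of what checking a sample against a power law can involve, the Python sketch below estimates the exponent by maximum likelihood and measures the Kolmogorov-Smirnov distance between the data and the fitted model. It is a simplified sketch under stated assumptions, not the procedure used in the study: the function names are hypothetical, the lower cut-off `xmin` is assumed to be known, the estimator is the continuous-case one (so it is only an approximation for integer scattering degrees), and the input is synthetic data rather than the study's measurements. A full goodness-of-fit test would additionally compare the distance against distances obtained from synthetic data sets to derive a p-value, which is omitted here.

```python
import math
import random


def fit_power_law_alpha(values, xmin):
    """Maximum-likelihood estimate of alpha for p(x) ~ x^(-alpha), x >= xmin.

    Continuous-case estimator; an approximation for integer-valued data.
    """
    tail = [x for x in values if x >= xmin]
    if not tail:
        raise ValueError("no observations at or above xmin")
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0:
        raise ValueError("all tail values equal xmin; alpha is undefined")
    return 1.0 + len(tail) / log_sum


def ks_distance(values, xmin, alpha):
    """Largest gap between the empirical CDF and the fitted CDF on the tail."""
    tail = sorted(x for x in values if x >= xmin)
    n = len(tail)
    worst = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        # Compare with the empirical CDF just before and just after x.
        worst = max(worst, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return worst


def sample_power_law(n, xmin, alpha, rng):
    """Draw n synthetic values by inverse-transform sampling (demo data only)."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


# Synthetic data with a known exponent, used only to exercise the functions.
rng = random.Random(42)
synthetic = sample_power_law(20000, xmin=1.0, alpha=2.5, rng=rng)
alpha_hat = fit_power_law_alpha(synthetic, xmin=1.0)
distance = ks_distance(synthetic, xmin=1.0, alpha=alpha_hat)
print(f"alpha = {alpha_hat:.2f}, KS distance = {distance:.4f}")
```

A small distance only says the fitted curve tracks the sample closely; on its own it does not establish that a power law is a plausible model.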


ASERG, DCC, UFMG

September 14, 2014

Transcript

  1. Does Feature Scattering Follow Power-Law Distributions? An Investigation of Five Pre-Processor-Based Systems. Leonardo Passos (University of Waterloo, Canada), Rodrigo Queiroz (Federal University of Minas Gerais, Brazil), Marco Tulio Valente (Federal University of Minas Gerais, Brazil), Sven Apel (University of Passau, Germany), Krzysztof Czarnecki (University of Waterloo, Canada). 6th International Workshop on Feature-Oriented Software Development
  2. Feature scattering is often stated as an undesirable characteristic in code
  3. Hinders parallel development


  5. Leads to code tangling, making code harder to understand

  6. Feature scattering is often found in practice

  7. Allows new features to be incorporated quickly

  8. No modules, no interfaces, and no design patterns

  9. Allows overcoming modularity limitations in existing programming languages (not every feature can be modular)
  10. Scattering is not necessarily bad, if kept within limits (thresholds)
  11. Empirical thresholds already exist for certain software metrics… Alves et al., ICSM’10: Deriving Metric Thresholds from Benchmark Data (e.g., McCabe complexity of a method ≤ 14)
  12. … but no thresholds exist for feature scattering

  13. First step towards reliable scattering thresholds: understand the scattering distribution

  14. Does feature scattering follow power-law distributions?

  15. Why power-laws?

  16. One kind of heavy-tailed distribution describing many phenomena…

  17. Populations in different cities

  18. Earthquake intensity

  19. Power-laws have also been shown to describe the behaviour of different code metrics in OO systems
  20. (SCAM’03) (OOPSLA’2006) (TSE’2007) (TOSEM’08)


  22. Shape of a power-law (figure: frequency plotted against metric value; typical values at the head, non-typical values in the heavy tail beyond a value t). Central tendency statistics (e.g., mean) are not meaningful. The probability of having values > t, however, is not negligible
  23. Knowing that scattering fits power-law distributions affects how one extracts practical thresholds (a typical limit)
  24. We verify feature scattering and adherence to power-laws in five open-source software families…
  25. Small: ~20K–40K SLOC, ~100–180 features. Medium: ~220K–1500K SLOC, ~2K features. Large: ~12000K SLOC, ~12K features
  26. In all subjects, variability is captured by ifdef macro directives
  27. Feature example extracted from Linux (drivers/mmc/core/core.c)

  28. Feature: macro name referenced in at least one ifdef annotation. ifdef annotation: #if, #ifdef, #ifndef, #elif
  29. Scattering degree of a feature: number of ifdef annotations in which the feature appears
  30. http://rodrigoqueiroz.bitbucket.org/fosd2014.html

  31. How to find whether the observed scattering follows a power-law? Clauset et al., Society for Industrial and Applied Mathematics Journal’09: Power-law Distributions in Empirical Data
  32. (1) Infer a power-law describing the collected scattering values. (2) Perform a statistical test to check whether the power-law is a plausible model (p-value > 0.1)

  35. From the data, we verify that…


  37. In summary: Feature scattering is concentrated in specific features, skewing the distribution. We raise awareness that feature scattering thresholds must not rely on central measures. We provide preliminary evidence that feature scattering follows power-laws
  38. Future work: investigate a larger corpus of systems; extract thresholds for feature scattering
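The definitions on slides 28 and 29 (a feature is a macro name referenced in at least one ifdef annotation; its scattering degree is the number of annotations in which it appears) can be sketched in Python as follows. This is a minimal illustration, not the tooling used in the study: the function name, the regular expressions, and the `CONFIG_` prefix filter are assumptions, and the sample input uses made-up macro names. It also ignores line continuations and comments on directive lines, which a real extractor would need to handle.

```python
import re
from collections import Counter

# Directives counted as ifdef annotations, following the slide's definition.
DIRECTIVE = re.compile(r"^\s*#\s*(if|ifdef|ifndef|elif)\b(.*)$")
IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def scattering_degrees(source, is_feature=lambda name: name.startswith("CONFIG_")):
    """Count, per feature, the number of ifdef annotations that reference it.

    `is_feature` decides which macro names count as features; the default
    CONFIG_ prefix is an assumption made for this sketch.
    """
    degrees = Counter()
    for line in source.splitlines():
        match = DIRECTIVE.match(line)
        if not match:
            continue
        # A feature counts once per annotation, even if named twice in it.
        names = set(IDENTIFIER.findall(match.group(2)))
        names.discard("defined")  # preprocessor operator, not a macro name
        degrees.update(name for name in names if is_feature(name))
    return degrees


# Made-up input; CONFIG_FOO and CONFIG_BAR are hypothetical feature names.
sample = """\
#ifdef CONFIG_FOO
int foo;
#endif
#if defined(CONFIG_FOO) && !defined(CONFIG_BAR)
int foo_only;
#elif defined(CONFIG_BAR)
int bar;
#endif
#ifndef CONFIG_BAR
int no_bar;
#endif
"""
degrees = scattering_degrees(sample)
print(degrees.most_common())
```

In the sample, `CONFIG_FOO` appears in two annotations and `CONFIG_BAR` in three, so their scattering degrees are 2 and 3. Running the function over every file of a system and summing the counters would give one scattering degree per feature.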