Does Feature Scattering Follow Power-Law Distributions? (FOSD 2014)

ASERG, DCC, UFMG
September 14, 2014

Feature scattering has long been considered an undesirable characteristic of source code. Since scattered features introduce extensions across the code base, their maintenance requires analyzing and changing different locations in code, possibly causing ripple effects. Despite this, scattering often occurs in practice, either due to limitations of existing programming languages (e.g., the imposition of a dominant decomposition) or due to time pressure. In the latter case, scattering provides a simple way to support new capabilities, avoiding the upfront investment of creating modules and interfaces (when possible). Hence, we argue that scattering is not necessarily bad, provided it is kept within certain limits, or thresholds. Extracting thresholds, however, is not a trivial task. For instance, research shows that some source-code metric distributions are heavy-tailed, usually following power-law models. For heavy-tailed distributions, reporting metrics in terms of averages and standard deviations is unreliable, although it is commonly done. Thus, before extracting reliable thresholds for feature scattering, one must understand the shape of the feature-scattering distribution. In this direction, we analyze the scattering degree of five C-preprocessor-based software families and verify whether their empirical cumulative feature-scattering distributions follow power laws. Our results show that feature scattering in the studied subject systems has characteristics of heavy-tailed distributions, with a good fit to power laws. Hence, we raise awareness that feature-scattering thresholds based on central measures may not be reliable in practice.

Transcript

  1. Does Feature Scattering Follow
    Power-Law Distributions?
    Leonardo Passos
    University of Waterloo
    Canada
    Rodrigo Queiroz
    Federal University of Minas Gerais
    Brazil
    Marco Tulio Valente
    Federal University of Minas Gerais
    Brazil
    Sven Apel
    University of Passau
    Germany
    Krzysztof Czarnecki
    University of Waterloo
    Canada
    6th International Workshop on Feature-Oriented Software Development
    An Investigation of Five Pre-Processor-Based Systems

  2. Feature scattering is often described as
    an undesirable characteristic of code

  3. Hinders parallel development

  4. (Image-only slide)

  5. Leads to code tangling, making code harder to understand

  6. Feature scattering is often found
    in practice

  7. Allows new features to be incorporated quickly

  8. No modules, no interfaces, and no design patterns

  9. Allows overcoming modularity limitations in existing
    programming languages
    (not every feature can be modular)

  10. Scattering is not necessarily bad,
    if kept within limits (thresholds)

  11. Empirical thresholds already exist
    for certain software metrics…
    Alves et al., ICSM’10: Deriving Metric Thresholds from Benchmark Data
    (e.g., McCabe complexity of a method ≤ 14)

  12. … but no thresholds exist
    for feature scattering

  13. First step towards reliable scattering thresholds:
    understand the scattering distribution

  14. Does feature scattering follow
    power-law distributions?

  15. Why power laws?

  16. One kind of heavy-tailed distribution
    describing many phenomena…

  17. Populations in different cities

  18. Earthquake intensity

  19. Power laws have also been shown to describe
    the behaviour of different code metrics
    in OO systems

  20. (SCAM’03)
    (OOPSLA’06)
    (TSE’07)
    (TOSEM’08)

  21. (Image-only slide)

  22. Shape of a power law (frequency vs. metric value)
    Typical values make up the bulk, but a heavy tail of non-typical values remains
    Central-tendency statistics (e.g., the mean) are not meaningful
    The probability of having values greater than a tail value t, however, is not negligible
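
The point of slide 22 can be made concrete with a small Python sketch that compares the tail probability of a continuous power law with that of an exponential distribution having the same mean. The parameters (x_min = 1, alpha = 2.5, t = 30) and the helper names are illustrative assumptions, not values from the study. For a continuous power law with density proportional to x^-alpha above x_min, P(X > t) = (t / x_min)^-(alpha - 1); its mean is finite only for alpha > 2 and its variance only for alpha > 3, which is why averages and standard deviations summarize it poorly.

```python
import math


def power_law_tail(t: float, x_min: float, alpha: float) -> float:
    """P(X > t) for a continuous power law with density ~ x^-alpha on [x_min, inf)."""
    if t <= x_min:
        return 1.0
    return (t / x_min) ** (-(alpha - 1.0))


def power_law_mean(x_min: float, alpha: float) -> float:
    """Mean of the same power law; it is infinite when alpha <= 2."""
    if alpha <= 2.0:
        return math.inf
    return x_min * (alpha - 1.0) / (alpha - 2.0)


def exponential_tail(t: float, mean: float) -> float:
    """P(X > t) for an exponential distribution with the given mean."""
    return math.exp(-t / mean)


if __name__ == "__main__":
    x_min, alpha, t = 1.0, 2.5, 30.0  # illustrative values only (hypothetical)
    mean = power_law_mean(x_min, alpha)  # 3.0 for these parameters
    print(f"mean = {mean}")
    print(f"power law   P(X > {t:g}) = {power_law_tail(t, x_min, alpha):.2e}")
    print(f"exponential P(X > {t:g}) = {exponential_tail(t, mean):.2e}")
```

With these parameters both distributions have mean 3, yet the power law assigns roughly 0.6% probability to values above 30, about two orders of magnitude more than the exponential (about 0.005%).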

  23. Knowing that scattering fits power-law
    distributions affects how one extracts
    practical thresholds (a typical limit)
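
As a rough illustration of why the distribution's shape matters when extracting thresholds, the Python sketch below draws a synthetic power-law sample (hypothetical data, not the study's) via inverse-transform sampling and compares candidate thresholds: the mean, the mean plus one standard deviation, the median, and the 90th percentile. The helper names and parameters are assumptions for illustration; this is not the method of Alves et al. or of the authors.

```python
import random
import statistics


def sample_power_law(n: int, x_min: float, alpha: float, seed: int = 0) -> list[float]:
    """Inverse-transform sampling from a continuous power law (requires alpha > 1)."""
    rng = random.Random(seed)
    # 1 - U lies in (0, 1], so the negative power below never divides by zero.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def candidate_thresholds(values: list[float]) -> dict[str, float]:
    """Central-tendency-based vs. quantile-based threshold candidates."""
    mean = statistics.fmean(values)
    deciles = statistics.quantiles(values, n=10)  # deciles[8] is the 90th percentile
    return {
        "mean": mean,
        "mean + sd": mean + statistics.stdev(values),
        "median": statistics.median(values),
        "p90": deciles[8],
    }


if __name__ == "__main__":
    # Same hypothetical distribution, different random samples.
    for seed in (1, 2, 3):
        data = sample_power_law(10_000, x_min=1.0, alpha=2.5, seed=seed)
        summary = candidate_thresholds(data)
        print(seed, {name: round(value, 2) for name, value in summary.items()})
```

Across seeds, the median and the 90th percentile should typically stay close to each other's previous values, while the mean-plus-standard-deviation candidate tends to move around, since the sample variance of a power law with alpha ≤ 3 does not converge.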

  24. We verify feature scattering and adherence to
    power laws in five open-source
    software families…

  25. Subject systems by size
    Size     SLOC              Features
    Small    ~20K – 40K        ~100 – 180
    Medium   ~220K – 1500K     ~2K
    Large    ~12000K           ~12K

  26. In all subjects, variability is captured by
    ifdef preprocessor directives

  27. Example extracted from Linux (drivers/mmc/core/core.c)
    (code excerpt with a highlighted feature)

  28. Feature: a macro name referenced
    in at least one ifdef annotation
    ifdef annotation: an #if, #ifdef, #ifndef, or #elif directive

  29. Scattering degree of a feature:
    the number of ifdef annotations in which the feature appears
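
The definitions of slides 28 and 29 can be sketched in Python as follows. This is a simplified, regex-based approximation written for illustration, not the authors' tooling: it treats every identifier in an #if, #ifdef, #ifndef, or #elif condition (other than the `defined` operator) as a feature, counts each feature at most once per annotation, and ignores line continuations and multi-line comments. The function name and the sample macros (CONFIG_FOO, CONFIG_BAR) are hypothetical.

```python
import re
from collections import Counter

# An "ifdef annotation": #if, #ifdef, #ifndef or #elif (not #else, #endif, #include).
DIRECTIVE = re.compile(r"^\s*#\s*(?:if|ifdef|ifndef|elif)\b(.*)$")
# Identifiers; the leading \b keeps literals such as 0x10 or 10UL from matching.
IDENTIFIER = re.compile(r"\b[A-Za-z_]\w*")
# Single-line C comments that may trail a directive.
COMMENT = re.compile(r"/\*.*?\*/|//.*")


def scattering_degrees(source: str) -> Counter:
    """Map each feature (macro name) to the number of ifdef annotations referencing it."""
    degrees: Counter = Counter()
    for line in source.splitlines():
        match = DIRECTIVE.match(line)
        if match is None:
            continue
        condition = COMMENT.sub(" ", match.group(1))
        # A set, so a feature named twice in one condition still counts once.
        features = {name for name in IDENTIFIER.findall(condition) if name != "defined"}
        degrees.update(features)
    return degrees


if __name__ == "__main__":
    snippet = """
#ifdef CONFIG_FOO
#if defined(CONFIG_BAR) && CONFIG_FOO
#elif CONFIG_BAR /* debug only */
#endif
#if 0
"""
    print(scattering_degrees(snippet))  # Counter({'CONFIG_FOO': 2, 'CONFIG_BAR': 2})
```

Running this over every source file of a system and merging the counters would yield one scattering degree per feature, i.e., the kind of distribution the study analyzes.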

  30. http://rodrigoqueiroz.bitbucket.org/fosd2014.html

  31. How can we determine whether the observed
    scattering follows a power law?
    Clauset et al., SIAM Review’09:
    Power-Law Distributions in Empirical Data

  32. (1) Fit a power law to the collected scattering values
    (2) Perform a statistical (goodness-of-fit) test to
    check whether the power law is a
    plausible model (p-value > 0.1)
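
A heavily simplified Python sketch of these two steps, in the spirit of Clauset et al., is shown below under stated assumptions: it treats scattering degrees as continuous positive values (Clauset et al. also describe a discrete estimator, which suits integer counts better), uses the continuous maximum-likelihood estimate of alpha for each candidate x_min, keeps the x_min that minimizes the Kolmogorov-Smirnov (KS) distance, and estimates the p-value with a semi-parametric bootstrap. All function names and defaults (min_tail, reps) are hypothetical; for real analyses, an established implementation of the method is preferable.

```python
import math
import random


def fit_alpha(tail: list[float], x_min: float) -> float:
    """Continuous MLE: alpha = 1 + n / sum(ln(x / x_min)) over values x >= x_min > 0."""
    s = sum(math.log(x / x_min) for x in tail)
    return 1.0 + len(tail) / s if s > 0 else math.inf


def ks_distance(tail: list[float], x_min: float, alpha: float) -> float:
    """Max gap between the empirical CDF of the tail and the fitted power-law CDF."""
    xs = sorted(tail)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        model = 1.0 - (x / x_min) ** (1.0 - alpha)  # P(X <= x) under the fit
        d = max(d, abs(i / n - model), abs((i + 1) / n - model))
    return d


def fit_power_law(data: list[float], min_tail: int = 10):
    """Step (1): return (x_min, alpha, KS distance) with the smallest KS distance, or None."""
    xs = sorted(data)
    best = None
    for i, x_min in enumerate(xs):
        if i > 0 and xs[i - 1] == x_min:
            continue  # try each distinct value once
        tail = xs[i:]
        if len(tail) < min_tail:
            break
        alpha = fit_alpha(tail, x_min)
        if math.isinf(alpha):
            continue
        d = ks_distance(tail, x_min, alpha)
        if best is None or d < best[2]:
            best = (x_min, alpha, d)
    return best


def p_value(data: list[float], reps: int = 1000, seed: int = 0) -> float:
    """Step (2): share of bootstrap fits whose KS distance is >= the observed one."""
    observed = fit_power_law(data)
    if observed is None:
        raise ValueError("not enough data to fit a power law")
    x_min, alpha, d_obs = observed
    body = [x for x in data if x < x_min]
    p_tail = (len(data) - len(body)) / len(data)
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        synthetic = [
            # With probability p_tail draw from the fitted power law, else resample the body.
            x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            if rng.random() < p_tail
            else rng.choice(body)
            for _ in range(len(data))
        ]
        refit = fit_power_law(synthetic)
        if refit is not None and refit[2] >= d_obs:
            hits += 1
    return hits / reps
```

Given a list of per-feature scattering degrees `values`, `fit_power_law(values)` returns the fitted parameters, and, following the slide's criterion, `p_value(values) > 0.1` would indicate that the power law is a plausible model. The sketch favors readability over speed, so large inputs with many bootstrap repetitions will be slow.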

  33. (Image-only slide)

  34. (Image-only slide)

  35. From the data, we verify that…

  36. (Image-only slide)

  37. In summary
    Feature scattering is concentrated in specific features, skewing the distribution
    We raise awareness that feature-scattering thresholds must not rely on central measures
    We provide preliminary evidence that feature scattering follows power laws

  38. Future work
    Investigate a larger corpus of systems
    Extract thresholds for feature scattering