Does Feature Scattering Follow Power-Law Distributions? (FOSD 2014)

Does Feature Scattering Follow Power-Law Distributions? An Investigation of Five Pre-Processor-Based Systems
Leonardo Passos, University of Waterloo, Canada
Rodrigo Queiroz, Federal University of Minas Gerais, Brazil
Marco Tulio Valente, Federal University of Minas Gerais, Brazil
Sven Apel, University of Passau, Germany
Krzysztof Czarnecki, University of Waterloo, Canada
6th International Workshop on Feature-Oriented Software Development

Feature scattering is often described as an undesirable characteristic of code

Hinders parallel development

Leads to code tangling, making code harder to understand

Feature scattering is often found in practice

Allows new features to be incorporated quickly

Requires no modules, no interfaces, and no design patterns

Allows overcoming modularity limitations in existing programming languages (not every feature can be modular)

Scattering is not necessarily bad, if kept within limits (thresholds)

Empirical thresholds already exist for certain software metrics…
Alves et al., ICSM’10: Deriving Metric Thresholds from Benchmark Data (e.g., McCabe complexity of a method ≤ 14)

… but no thresholds exist for feature scattering

First step towards reliable scattering thresholds: understand the scattering distribution

Does feature scattering follow power-law distributions?

Why power-laws?

One kind of heavy-tailed distribution describing many phenomena…

Populations in different cities

Earthquake intensity

Power-laws have also been shown to describe the behaviour of different code metrics in OO systems

(SCAM’03) (OOPSLA’06) (TSE’07) (TOSEM’08)

Shape of a power-law
[Figure: frequency plotted against metric value; typical values sit at the head of the curve, non-typical values in the heavy tail; t marks a value on the metric axis]
Central tendency statistics (e.g., mean) are not meaningful
The probability of having values > t, however, is not negligible
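
The two statements on this slide can be made concrete with the standard continuous power-law form. This is a textbook sketch, not a formula taken from the slides: the exponent \alpha and the lower bound x_{\min} are symbols assumed here, while t is the value marked on the slide.

```latex
% Continuous power law with exponent \alpha > 1 and lower bound x_{\min} > 0
p(x) = \frac{\alpha - 1}{x_{\min}} \left( \frac{x}{x_{\min}} \right)^{-\alpha}, \qquad x \ge x_{\min}

% Tail probability: decays polynomially in t, so large values remain plausible
P(X > t) = \left( \frac{t}{x_{\min}} \right)^{-(\alpha - 1)}, \qquad t \ge x_{\min}

% Mean: finite only when \alpha > 2, and even then dominated by the tail
E[X] = \frac{\alpha - 1}{\alpha - 2} \, x_{\min} \quad (\alpha > 2), \qquad
E[X] = \infty \quad (1 < \alpha \le 2)
```

For example, with \alpha = 2.5 and x_{\min} = 1, the probability of exceeding t = 100 is 100^{-1.5} = 0.001, which is small but far larger than an exponentially decaying tail would give.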

Knowing that scattering fits power-law distributions affects how one extracts practical thresholds (a typical limit)
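
To illustrate why the distribution matters for thresholds, the sketch below compares the mean with a percentile on a small, skewed sample. The data are hypothetical (not taken from the study), and the nearest-rank percentile is only one possible way to pick a limit; the slides do not prescribe a specific extraction method.

```python
import math
import statistics

def nearest_rank_percentile(values, percent):
    """Return the smallest value such that at least `percent` % of values are <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical scattering degrees: most features are barely scattered,
# one feature is scattered across many annotations.
hypothetical_degrees = [1, 1, 1, 1, 2, 2, 2, 3, 4, 183]

mean_value = statistics.mean(hypothetical_degrees)
median_value = statistics.median(hypothetical_degrees)
p90_value = nearest_rank_percentile(hypothetical_degrees, 90)

# Count how many features fall below the mean to show how unrepresentative it is.
below_mean = sum(1 for d in hypothetical_degrees if d < mean_value)

print("mean:", mean_value)
print("median:", median_value)
print("90th percentile:", p90_value)
print("features below the mean:", below_mean, "of", len(hypothetical_degrees))
```

In this sample a single heavily scattered feature pulls the mean far above what nearly every feature exhibits, whereas the percentile stays close to the bulk of the data.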

We examine feature scattering and its adherence to power-laws in five open-source software families…

Small: ~ 20K – 40K SLOC, ~ 100 – 180 features
Medium: ~ 220K – 1500K SLOC, ~ 2K features
Large: ~ 12000K SLOC, ~ 12K features

In all subjects, variability is captured by ifdef preprocessor directives

[Code excerpt extracted from Linux (drivers/mmc/core/core.c), with a feature highlighted]

Feature: a macro name referenced in at least one ifdef annotation
ifdef annotation: #if, #ifdef, #ifndef, #elif

Scattering degree of a feature: the number of ifdef annotations in which the feature appears
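
The definition above can be sketched as a small counting routine. This is an illustrative approximation, not the tooling used in the study: the function name `scattering_degrees` and the `is_feature` predicate (here assuming features carry the Linux-style `CONFIG_` prefix) are hypothetical, and the regular expressions ignore line continuations and comments on directive lines.

```python
import re
from collections import Counter

# The four annotation kinds named on the slide: #if, #ifdef, #ifndef, #elif.
ANNOTATION = re.compile(r'^[ \t]*#[ \t]*(ifdef|ifndef|if|elif)\b(.*)$', re.MULTILINE)
IDENTIFIER = re.compile(r'\b[A-Za-z_]\w*\b')

def scattering_degrees(source, is_feature=lambda name: name.startswith("CONFIG_")):
    """Count, per feature macro, the number of ifdef annotations that mention it."""
    degrees = Counter()
    for match in ANNOTATION.finditer(source):
        # A feature mentioned twice in one annotation still counts once for it.
        names = set(IDENTIFIER.findall(match.group(2)))
        names.discard("defined")  # the preprocessor operator, not a macro name
        for name in names:
            if is_feature(name):
                degrees[name] += 1
    return degrees
```

Calling `scattering_degrees(open(path).read())` on each source file and summing the resulting counters would give one scattering degree per feature for a whole code base, under the assumptions stated above.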

http://rodrigoqueiroz.bitbucket.org/fosd2014.html

How to find whether the observed scattering follows a power-law?
Clauset et al., Society for Industrial and Applied Mathematics (SIAM) Review’09: Power-law Distributions in Empirical Data

(1) Infer a power-law describing the collected scattering values
(2) Perform a statistical test to check whether the power-law is a plausible model (p-value > 0.1)
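
The two steps above can be sketched as follows. This is a simplified illustration and not the procedure used in the study: it assumes a continuous power law, takes the lower bound `xmin` as given instead of estimating it, and approximates the p-value with a plain bootstrap of Kolmogorov-Smirnov distances. All function names are hypothetical.

```python
import math
import random

def fit_alpha(values, xmin):
    """Step 1: maximum-likelihood exponent for the tail x >= xmin."""
    tail = [x for x in values if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0:
        raise ValueError("need at least one tail value greater than xmin")
    return 1.0 + len(tail) / log_sum, tail

def ks_distance(tail, xmin, alpha):
    """Largest gap between the empirical CDF of the tail and the fitted CDF."""
    ordered = sorted(tail)
    n = len(ordered)
    worst = 0.0
    for i, x in enumerate(ordered):
        model = 1.0 - (x / xmin) ** (1.0 - alpha)
        # Compare against the empirical CDF just before and just after x.
        worst = max(worst, abs(model - i / n), abs(model - (i + 1) / n))
    return worst

def sample_power_law(n, xmin, alpha, rng):
    """Inverse-transform sampling from the continuous power law."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

def plausibility_p_value(values, xmin, rounds=200, seed=0):
    """Step 2: fraction of synthetic samples that fit no better than the data."""
    rng = random.Random(seed)
    alpha, tail = fit_alpha(values, xmin)
    observed = ks_distance(tail, xmin, alpha)
    at_least_as_bad = 0
    for _ in range(rounds):
        synthetic = sample_power_law(len(tail), xmin, alpha, rng)
        syn_alpha, syn_tail = fit_alpha(synthetic, xmin)
        if ks_distance(syn_tail, xmin, syn_alpha) >= observed:
            at_least_as_bad += 1
    return alpha, at_least_as_bad / rounds
```

With a list of scattering degrees `degrees`, `plausibility_p_value(degrees, xmin=1)` returns the fitted exponent and a p-value; following the slide, a p-value above 0.1 would mean the power-law cannot be ruled out as a model. Since scattering degrees are integers, a discrete fit would be more faithful than this continuous approximation.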

From the data, we verify that…

In summary
Feature scattering is concentrated in specific features, skewing the distribution
We raise awareness that feature scattering thresholds must not rely on central measures
We provide preliminary evidence that feature scattering follows power-laws

Future work
Investigate a larger corpus of systems
Extract thresholds for feature scattering
