Feature Scattering in the Large: A Longitudinal Study of Linux Kernel Device Drivers (Modularity 2015) - Best Paper Award (slides and presentation by L. Passos)

Feature code is often scattered across wide parts of the code base. Scattering is not necessarily bad if used with care; in fact, systems with highly scattered features have evolved successfully over years. Among other benefits, feature scattering allows developers to circumvent limitations in programming languages and system design. Still, little is known about the characteristics governing scattering, the factors that influence it, and its practical limits in the evolution of large and long-lived systems. We address this issue with a longitudinal case study of feature scattering in the Linux kernel. We quantitatively and qualitatively analyze almost eight years of its development history, focusing on the scattering of device-driver features. Among other findings, we show that, while scattered features are regularly added, their proportion is lower than that of non-scattered ones, indicating that the kernel architecture allows most features to be integrated in a modular manner. The median scattering degree of features is constant and low, but the scattering-degree distribution is heavily skewed. Thus, the arithmetic mean is not a reliable threshold for monitoring the evolution of feature scattering. When investigating influencing factors, we find that platform-driver features are 2.5 times more likely to be scattered across architectural (subsystem) boundaries than non-platform ones. Their use illustrates a maintenance-performance trade-off in creating architectures such as that of Linux kernel device drivers.
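The remark about the arithmetic mean can be illustrated with a minimal sketch. The scattering degrees below are made-up values chosen only to mimic a heavily skewed distribution; they are not data from the study.

```python
from statistics import mean, median

# Hypothetical scattering degrees: most features are low, a few are extreme.
# These values are invented for illustration and do not come from the study.
sd_values = [2, 2, 2, 3, 3, 4, 4, 5, 6, 8, 40, 377]

sd_mean = mean(sd_values)
sd_median = median(sd_values)

# Share of features whose scattering degree lies below the mean.
below_mean = sum(1 for sd in sd_values if sd < sd_mean) / len(sd_values)

print(f"median = {sd_median}, mean = {sd_mean:.1f}, below mean = {below_mean:.0%}")
```

In this toy sample a couple of extreme values pull the mean far above what most features exhibit, so a mean-based threshold would say little about the typical feature, while the median stays close to it.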

ASERG, DCC, UFMG

March 18, 2015

Transcript

  1. Feature Scattering in the Large: A Longitudinal Study of Linux Kernel Device Drivers. Modularity’15 Research Track. Leonardo Passos (lpassos@gsd.uwaterloo.ca, University of Waterloo, Canada); Krzysztof Czarnecki (kczarnec@gsd.uwaterloo.ca, University of Waterloo, Canada); Thorsten Berger (jpadillagaeta@gsd.uwaterloo.ca, University of Waterloo, Canada); Sven Apel (apel@uni-passau.de, University of Passau, Germany); Jesús Padilla (jpadillagaeta@gsd.uwaterloo.ca, University of Waterloo, Canada); Marco Tulio Valente (mtov@dcc.ufmg.br, Federal University of Minas Gerais, Brazil)
  2. Feature = configuration option. Feature CONFIG_ACPI is scattered across the IA-64 CPU code
  3. Hinders parallel development
  4. [Slide without text]
  5. Leads to code tangling, negatively affecting code understanding
  6. Nonetheless, feature scattering is a popular mechanism to support new features
  7. Quick solution understood by all developers
  8. No modules, no interfaces, no design patterns, etc.
  9. Allows overcoming modularity limitations in existing programming languages (not every feature can be modular)
  10. Many large & long-lived software systems have shown that it is possible to continuously evolve in the face of feature scattering (axTLS, Coreboot, SeaBIOS, FreeBSD)
  11. However, no empirical study has investigated feature scattering in the evolution of large and long-lived systems
  12. Such studies are key to creating a general theory on how to effectively manage feature scattering
  13. Scattering: how could a theory help? "Scattering is harmful" vs. "Scattering is not necessarily bad (easy & cheap solution)"
  14. Many empirical works have to be performed before devising such a theory
  15. “A journey of a thousand miles must begin with the first step”
  16. Starting point: the Linux kernel
  17. > 13,000 features; feature-oriented system; continuously evolving
  18. r = 0.996
  19. Scope: device-driver features
  20. In our analyses, we consider scattering of features in terms of referring ifdefs
  21. Scattering degree (SD) of a feature f: number of ifdefs referring to f
  22. ifdefs: #ifdef, #ifndef, #elif, #if
  23. #ifdef CONFIG_ACPI || CONFIG_PM; #ifndef CONFIG_ACPI; #if defined(CONFIG_ACPI) ... A feature f is scattered if SD(f) ≥ 2
  24. From the kernel evolution history, some limits clearly emerge...
  25. % of scattered features is nearly constant (~ 18%)
  26. Local vs global scattering
  27. A feature is locally scattered when its referring ifdefs are restricted to files in the driver subsystem only
  28. A feature is globally scattered when there is at least one referring ifdef in a file outside the driver subsystem
  29. % of globally scattered features is increasing, but ≤ 43% at all times; stabilization (~ 43%)
  30. What about the scattering degree of features?
  31. [Slide without text]
  32. For 50% (median) of scattered-driver features, SD ≤ 4
  33. For 75%, SD ≤ 8
  34. Non-outlier features: 8 < SD ≤ 55
  35. Outliers: 35 ≤ SD ≤ 377
  36. There appear to be different groups in Linux, with different SD-limits
  37. Group 1 (low SD): SD ≤ 4; 50% of scattered-driver features
  38. Group 2 (medium SD): 5 ≤ SD ≤ 8; 25% of scattered-driver features
  39. Group 3 (high SD): SD > 8. Non-outliers: ~ 22.5%; outliers: ~ 2.5% (max SD = 377)
  40. … a single SD-limit controlling all features does not seem to apply
  41. In summary, …
  42. % of scattered-driver features ~ 18%; % of globally scattered-driver features ≤ 43%; SD is not defined by a single absolute value, although most features (75%) have SD ≤ 8
  43. 75% of scattered-driver features have SD ≤ 8; no more than 25% of features have SD > 8 (relative limit)
  44. What about possible factors influencing the observed scattering?
  45. Platform-driver features: features whose drivers support devices that cannot be discovered by the CPU. Infrastructure-driver features: abstractions in the OS domain (e.g., ACPI)
  46. The analysis of a random sample shows statistically significant results: platform-driver features are 2.5x more likely to be globally scattered than non-platform ones
  47. In the sample, global scattering of platform-driver features occurs mostly in the arch subsystem. Tight relationship between platform-driver features and CPU-dependent code (hard to modularize)
  48. In general, there is no relationship between infrastructure-driver features and global/local scattering
  49. In general, there is no relationship between being a platform-driver or infrastructure-driver feature and scattering degree
  50. There is, however, a relationship between extreme scattering and infrastructure-related driver features
  51. Extreme scattering: 9/15 are infrastructure
  52. Wrapping up…
  53. In the Linux kernel: most driver features are not scattered (~ 82%). C-language modularity constructs + the kernel plugin-based architecture are “good enough”
  54. In the Linux kernel: when the existing solutions are not good enough, developers scatter features in code
  55. In the Linux kernel: scattering seems to respect some limits (consciously enforced?)
  56. Next steps
  57. Conduct interviews: are the observed limits consciously enforced in practice? If so, how, and how were they set up?
  58. If not, why do they occur? Do they indirectly stem from some development practice or process?
  59. Investigate whether the limits we found also apply to other systems (ongoing collaborative work)
  60. Thanks for listening :) http://lpassos.bitbucket.org/modularity15/
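
The definitions on slides 20 to 23 (scattering degree) and 27 to 28 (local vs global scattering) can be sketched roughly as follows. This is not the authors' tooling; it is a minimal illustration under stated assumptions: features are recognized by the `CONFIG_` prefix, a directive naming the same feature twice counts once, the `drivers/` path prefix stands in for the driver subsystem, and the file names and contents are toy examples.

```python
import re
from collections import Counter

# Conditional-compilation directives counted as "ifdefs" (slide 22).
DIRECTIVE = re.compile(r"^\s*#\s*(ifdef|ifndef|elif|if)\b(.*)$")
# Assumption: feature names follow the CONFIG_ prefix convention.
FEATURE = re.compile(r"\bCONFIG_\w+")


def scattering_degrees(files):
    """Map each feature to (SD, set of files containing a referring ifdef).

    `files` maps a path to its source text.
    """
    sd = Counter()
    locations = {}
    for path, text in files.items():
        for line in text.splitlines():
            match = DIRECTIVE.match(line)
            if not match:
                continue
            # Count each feature at most once per directive.
            for feature in set(FEATURE.findall(match.group(2))):
                sd[feature] += 1
                locations.setdefault(feature, set()).add(path)
    return {f: (sd[f], locations[f]) for f in sd}


def classify(sd, paths, driver_prefix="drivers/"):
    """Label a feature using the rules on slides 23, 27 and 28."""
    if sd < 2:
        return "not scattered"
    if all(p.startswith(driver_prefix) for p in paths):
        return "locally scattered"
    return "globally scattered"


# Hypothetical toy input; paths and contents are invented for illustration.
files = {
    "drivers/example/bus.c": "#ifdef CONFIG_ACPI\nint x;\n#endif\n",
    "arch/example/setup.c": "#if defined(CONFIG_ACPI) || defined(CONFIG_PM)\n#endif\n",
    "drivers/example/foo.c": "#ifdef CONFIG_FOO\n#endif\n#ifndef CONFIG_FOO\n#endif\n",
}

for feature, (sd, paths) in sorted(scattering_degrees(files).items()):
    print(feature, sd, classify(sd, paths))
```

In the toy input, CONFIG_ACPI is referenced once inside and once outside the assumed driver subsystem, CONFIG_FOO twice inside it, and CONFIG_PM only once, so the three features fall into the three different categories.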