Slide 1

Slide 1 text

Extending Maintainability Analysis Beyond Code Smells
Tushar Sharma
Supervisor: Prof. Diomidis Spinellis
Funded by SENECA project under Marie-Skłodowska Curie Actions Innovative Training Networks ITN-EID. Grant agreement number 642954.
May 2, 2019

Slide 2

Slide 2 text

A sincere thank you to my teachers

Slide 3

Slide 3 text

Software Engineering in Enterprise Cloud Applications (SENECA)
• Product quality: hosted by SIG and TU Delft, 3 PhD students
• Process quality: hosted by URJC and Bitergia, 3 PhD students
• Operation quality: hosted by AUEB and Singular Logic, 3 PhD students including me
38 publications and counting…

Slide 4

Slide 4 text

Overview
Introduction (4), Experiments and results (24), Conclusions (8)
• Context
• Problem statement
• Goals
• Theoretical background
• Method
• Experiments
• Results
• Implications
• Contributions
• Publications
• Future work

Slide 5

Slide 5 text

Context
Code Smells
"…certain structures in the code that suggest (sometimes they scream for) the possibility of refactoring." - Kent Beck
Smells' characteristics
• Indicator
• Poor solution
• Violates best practices
• Impacts quality
• Recurrence

Slide 6

Slide 6 text

Importance of the research topic
Impact
• Artifact
  • Maintainability
  • Effort/cost
  • Reliability
  • Change proneness
  • Testability
  • Performance
• Processes
• People
  • Morale and motivation
  • Productivity
Types
• Code smells
• Implementation smells
• Architecture smells
• Design smells
• Test smells
• Performance smells
• Configuration smells
• Database smells
• Models smells
• Web smells
• …

Slide 7

Slide 7 text


Slide 8

Slide 8 text

Motivation
Empirical studies on code smells
• Plenty, but they lack scale and breadth
Smell detection
• Common issues
• Is deep learning an answer?
Extending maintainability analysis
• for subdomains of software

Slide 9

Slide 9 text

Research goals
1. Maintainability analysis for configuration code
   • A method to propose a catalog of configuration smells, detect them, and investigate intra- and inter-category relationships.
2. Maintainability analysis for database schema smells
   • A mechanism to collate, evaluate, and detect database smells
   • A method to investigate code quality of embedded SQL statements and understand their impact on database and production code properties.
3. Maintainability analysis for production code
   • Smells at different granularities
   • Honoring scale and breadth
4. Smell detection using deep learning
   • Feasibility to detect smells without extensive feature engineering
   • Explore transfer-learning applicability

Slide 10

Slide 10 text

Maintainability analysis on configuration code
• Research questions
• Taxonomy of configuration smells
• Tools (Puppeteer and PuppetLint)
• Download 4,621 Puppet repositories
• Detected configuration smells
• Analyze smells to provide answers to the research questions

Slide 11

Slide 11 text

Maintainability analysis on configuration code
Repositories                      4,621
Puppet files                    142,662
Class declarations              132,323
Define declarations              39,263
File resources                  117,286
Package resources                49,841
Service resources                18,737
Exec declarations                43,468
Lines of code (Puppet only)   8,948,611
Curated a catalog of configuration smells
• 13 implementation
• 11 design
Puppeteer: a tool to detect configuration smells

Slide 12

Slide 12 text

Maintainability analysis on configuration code
Research questions
1. What is the distribution of maintainability smells in configuration code?
2. What is the relationship between the occurrence of design configuration smells and implementation configuration smells?
3. Is the principle of coexistence applicable to smells in configuration projects?
4. Does smell density depend on the size of the configuration project?
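Smell density, which the last research question relies on, is not defined on the slide. A minimal sketch, assuming density means the number of detected smells per thousand lines of code; the function name and the figures in the usage note are hypothetical.

```python
def smell_density(smell_count: int, lines_of_code: int) -> float:
    """Return smells per 1,000 lines of code (assumed definition of density)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    if smell_count < 0:
        raise ValueError("smell_count must not be negative")
    # Normalizing by size lets small and large projects be compared directly.
    return smell_count * 1000 / lines_of_code
```

Under this assumption, a hypothetical project with 50 smells in 10,000 lines has a density of 5 smells per KLOC, and the research question asks whether that figure changes systematically as the line count grows.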

Slide 13

Slide 13 text

Maintainability analysis on configuration code
Results
• High correlation between design and implementation configuration smells (⍴ = 0.66)
• Intra-category correlation analysis: whenever a duplicate entity smell is found, it is likely to find other smells from the same category
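The kind of correlation reported here can be sketched as a rank correlation over per-repository smell counts. This is an illustration only, assuming ⍴ denotes Spearman's rank correlation; the function names are hypothetical and the code is not taken from the study's tooling.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one sample is constant")
    return cov / sqrt(vx * vy)
```

Each list would hold one count per repository, for example design smells in `xs` and implementation smells in `ys`. Working on ranks rather than raw counts keeps a few very large repositories from dominating the result.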

Slide 14

Slide 14 text

MSR 2016, Austin

Slide 15

Slide 15 text

Maintainability analysis on database code
• Research questions
• Catalog of database schema smells
• Developers' survey
• DbDeo: SQL statement extractor and database smell detector
• 2,568 open-source and 357 industrial repositories
• Analyze code
• Detected database schema smells
• Analyze smells to provide answers to the research questions

Slide 16

Slide 16 text

[Figure: developers' survey responses (number of respondents, axis 0 to 50) for each database schema smell: CA, AL, SK, MC, MD, PA, MA, CT, VA, IA, GT, MN, OA]
Response categories
• Don't know
• Recommended practice
• Neither a smell nor a recommended practice
• Database schema smell
• Both a smell and a recommended practice depending on the context
Annotations: clearly marked smells; more context-sensitive

Slide 17

Slide 17 text

Maintainability analysis on database code
Research questions
1. What are the occurrence patterns of database smells?
2. Does the size of the project or the database play a role in smell density?
3. Does the nature of code (type of the application, or usage of ORM frameworks) affect the smell density?
4. What is the degree of co-occurrence among database smells?

Slide 18

Slide 18 text

Maintainability analysis on database code
Attributes                            Industry           OSS
Repositories (initial)                     840        16,057
Repositories with SQL statements           357         2,568
Files                                2,559,984     3,297,932
LOC                                220,489,273   409,155,497
SELECT                                  51,652        74,096
CREATE TABLE                            18,907        50,682
INSERT                                  74,416        66,830
UPDATE                                  10,454        29,002
CREATE INDEX                             7,152        10,798

Slide 19

Slide 19 text

Maintainability analysis on database code
ORM (Object-Relational Mapping) frameworks
• 19 well-known frameworks identified
[Figure: average smell density (axis 0 to 3), open-source vs. industrial, for projects using ORM (681, 238) and the rest of the projects (1887, 199)]
The difference is not statistically significant! Thus, ORM frameworks do not bring immunity from database smells.

Slide 20

Slide 20 text

ICSE 2018, Gothenburg

Slide 21

Slide 21 text

Maintainability analysis for production code
• Research questions
• C# repositories
• Detected smells
• Analyze smells

Slide 22

Slide 22 text

Maintainability analysis for production code
Research questions
1. What is the distribution of implementation, design, and architecture smells in C# code?
2. Do the detected smell instances belonging to different granularities correlate?
3. Is the principle of coexistence applicable to smells in C# projects?
4. Does smell density depend on the size of the C# repository?
5. Are architecture smells collocated with design smells?
6. Can the refactoring of design smells lead to fewer architecture smells?

Slide 23

Slide 23 text

Maintainability analysis for production code
Results
Repositories        3,209
Components         75,205
Types             724,854
Methods         3,739,387
LOC (C#)       83,135,679
Most frequently occurring smells
• Architecture: Cyclic dependency
• Design: Cyclically-dependent modularization
• Implementation: Magic number

Slide 24

Slide 24 text

Maintainability analysis for production code
Results - Correlation
[Figure: scatter plot of design smells against architecture smells]
• Very strong correlation between architecture and design smell occurrences: ⍴ = 0.86 (p-value < 2.2e−16)
• Poor correlation between individual pairs of smells

Slide 25

Slide 25 text

Maintainability analysis for production code
Results - Collocation
A design smell instance D and an architecture smell instance A are considered to be "collocated" if a class reported by the instance D participates in the A instance.
• Proposed a mechanism to infer participating classes for an architecture smell
• Compute contingency matrix and phi-coefficient
Very selective collocation
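The contingency matrix and phi-coefficient step can be sketched as follows. This is an illustration under stated assumptions: each class is described by two flags (it has a given design smell, it participates in a given architecture smell), and the function and parameter names are hypothetical rather than taken from the thesis tooling.

```python
from math import sqrt


def contingency(all_classes, design_smelly, arch_participants):
    """Count classes in each cell of the 2x2 contingency matrix.

    Returns (both, only_design, only_arch, neither).
    """
    design_smelly = set(design_smelly)
    arch_participants = set(arch_participants)
    both = only_design = only_arch = neither = 0
    for cls in all_classes:
        in_d, in_a = cls in design_smelly, cls in arch_participants
        if in_d and in_a:
            both += 1
        elif in_d:
            only_design += 1
        elif in_a:
            only_arch += 1
        else:
            neither += 1
    return both, only_design, only_arch, neither


def phi_coefficient(both, only_design, only_arch, neither):
    """Phi for a 2x2 table: +1 full collocation, 0 none, -1 mutual exclusion."""
    numerator = both * neither - only_design * only_arch
    denominator = sqrt(
        (both + only_design) * (only_arch + neither)
        * (both + only_arch) * (only_design + neither)
    )
    # Phi is undefined when a row or column is empty; report 0.0 in that case.
    return numerator / denominator if denominator else 0.0
```

Computing one coefficient per pair of design smell and architecture smell would show which pairs actually collocate, which is one way to read the "very selective collocation" remark.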

Slide 26

Slide 26 text

Maintainability analysis for production code
Results: Impact of design smell refactoring on architecture smells
• Refactored state of the project required after the design smells are refactored
• A mapping of potential influence of design smell refactoring on architecture smells
• Simulated the refactored state
Some architecture smells disappear, but a large number of architecture smells remain after design smell refactoring. Therefore, refactoring smells is important at all granularities.

Slide 27

Slide 27 text

ESEM 2017, Toronto, Canada
Under review: JSS (submitted Apr 2018)

Slide 28

Slide 28 text

Detecting smells using deep learning

Slide 29

Slide 29 text

Detecting smells using deep learning
Model architectures
[Figure: Inputs, Embedding layer, LSTM layer, Dropout layer (0.2), Dense layer (1, sigmoid), Output; annotation: repeat this set of hidden units]

Slide 30

Slide 30 text

Detecting smells using deep learning
Research questions
RQ1: Would it be possible to use deep learning methods to detect code smells?
RQ2: Is transfer-learning feasible in the context of detecting smells?
Transfer-learning refers to the technique where a learning algorithm exploits the commonalities between different learning tasks to enable knowledge transfer across the tasks.

Slide 31

Slide 31 text

Detecting smells using deep learning
Results: RQ1
F1 per smell and model
Smell   CNN-1D   CNN-2D   RNN
CM      0.38     0.41     0.31
ECB     0.04     0.02     0.22
MN      0.29     0.35     0.68
MA      0.09     0.06     0.02
Deep learning models can detect smells without extensive feature engineering, though their performance is smell-specific.
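For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below uses the standard definition; the function name and the counts in the usage note are hypothetical and are not results from the thesis.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall for the positive (smelly) class."""
    if true_positives == 0:
        # No correct detections: precision and recall are both 0 (or undefined).
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)
```

A hypothetical detector that flags 10 samples as smelly, 8 of them correctly, while missing 2 real smells has precision 0.8 and recall 0.8, so its F1 is 0.8. Because F1 ignores true negatives, it is a common choice when smelly samples are rare relative to clean ones.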

Slide 32

Slide 32 text

Detecting smells using deep learning
Results: RQ2
F1 per smell and model (DL vs. TL)
Smell   Model    DL     TL
CM      CNN-1D   0.38   0.51
CM      CNN-2D   0.41   0.57
CM      RNN      0.31   0.42
ECB     CNN-1D   0.04   0.08
ECB     CNN-2D   0.02   0.06
ECB     RNN      0.22   0.16
MN      CNN-1D   0.29   0.42
MN      CNN-2D   0.35   0.48
MN      RNN      0.68   0.92
MA      CNN-1D   0.09   0.02
MA      CNN-2D   0.06   0.00
MA      RNN      0.02   0.00

Slide 33

Slide 33 text

TOSEM

Slide 34

Slide 34 text

Key implications
• Code quality practices in the IaC paradigm
  • Apart from production code, subdomains of software systems such as configuration code may also suffer from code quality issues and hence must be treated similarly to production code
  • IDEs, configuration languages, and external tools may help avoid many of the configuration smells
• Database smells: it's not black and white
  • Developers perceive smells in context, and hence not all smells are equally serious quality problems
• ORM and database schema quality
  • Though Object-Relational Mapping frameworks make programming with databases easier, they do not bring immunity from database smells

Slide 35

Slide 35 text

Key implications
• Granularity matters
  • Smells occur at all granularities and they are correlated
  • Development teams must identify and refactor smells at all granularities
  • Even if all design smells are refactored, a significant number of architecture smells remain
• Attention on code quality when size increases
  • More attention is required on code quality (specifically, at architecture granularity) as the size of the software grows

Slide 36

Slide 36 text

Key implications
• Deep learning opens new possibilities
  • We demonstrate the feasibility of applying deep learning to detect smells without extensive feature engineering
  • It may motivate researchers and developers to explore this direction and build on it
• Make once, use repeatedly
  • The demonstrated feasibility of transfer-learning opens the way to reusing existing smell detection tools for other programming languages
• No silver bullet
  • We observed that there is no universally optimal model; the performance of the models depends highly on the smell at hand

Slide 37

Slide 37 text

Contributions of the thesis
Research
• Methods
• Findings
• Datasets

Slide 38

Slide 38 text

Contributions of the thesis
Practice
• Tools
  • Designite: http://www.designite-tools.com
  • DesigniteJava: https://github.com/tushartushar/DesigniteJava
  • Puppeteer: https://github.com/tushartushar/Puppeteer
  • DbDeo: https://github.com/tushartushar/dbdeo
• Smell catalog
  • Documented smells: 263
• Leading the way for a new category of tools based on transfer-learning

Slide 39

Slide 39 text

Future work
• Making code smell detection tools more effective
  • Exploiting the power of deep learning
• Automated refactoring support for architecture smells
  • Scaling refactoring to large scale, potentially by using or developing bottom-up refactoring support
• Software data analytics
  • Software development generates data representing different aspects that can be combined for better analysis and actionable insights

Slide 40

Slide 40 text

Publications based on the thesis
1. Tushar Sharma, Marios Fragkoulis, and Diomidis Spinellis. 2016. "Does your configuration code smell?" In Proceedings of the 13th International Conference on Mining Software Repositories (MSR '16). 189-200.
2. Tushar Sharma and Diomidis Spinellis. 2018. "A survey on software smells." Journal of Systems and Software, Volume 138, 158-173. ISSN 0164-1212.
3. Tushar Sharma, Marios Fragkoulis, and Diomidis Spinellis. 2017. "House of Cards: Code Smells in Open-Source C# Repositories." In 2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), Toronto, ON. 424-429.
4. Tushar Sharma, Marios Fragkoulis, Stamatia Rizou, Magiel Bruntink, and Diomidis Spinellis. 2018. "Smelly relations: measuring and understanding database schema quality." In Proceedings of the 40th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP '18). 55-64.
5. Tushar Sharma. 2018. "Detecting and managing code smells: research and practice." In Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings (ICSE '18). ACM, New York, NY, USA, 546-547.

Slide 41

Slide 41 text

Publications based on the thesis
6. Tushar Sharma. "How Deep Is the Mud: Fathoming Architecture Technical Debt Using Designite." To appear in International Conference on Technical Debt (TechDebt '19), Tools track. Submitted.
7. Tushar Sharma, Paramvir Singh, and Diomidis Spinellis. "An Empirical Investigation on the Relationship between Design and Architecture Smells." Under review in Journal of Systems and Software (JSS). Apr 2018.
8. Tushar Sharma, Vasiliki Efstathiou, Panos Louridas, and Diomidis Spinellis. "On the Feasibility of Transfer-learning Code Smells using Deep Learning." April 2019. Eprint available at: https://arxiv.org/abs/1904.03031 (to be submitted to TOSEM).

Slide 42

Slide 42 text

Tushar Sharma
http://www.tusharma.in