Slide 1

Slide 1 text

Stemming Architectural Decay in Software Systems
Nenad Medvidović
Computer Science Department, Viterbi School of Engineering
University of Southern California, Los Angeles, CA, USA
http://csse.usc.edu/~neno/

Slide 2

Slide 2 text

Part I Software Architectures Real and Imagined

Slide 3

Slide 3 text

A Bit of Terminology
• A software system’s architecture is the set of principal design decisions about the system
  – Software architecture is the blueprint for a software system’s construction and evolution
• Design decisions encompass every facet of the system under development
  – Structure
  – Behavior
  – Interaction
  – Non-functional properties
  – Deployment
  – …

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

Temporal Aspect of Architecture
• Design decisions are made and unmade over a system’s lifetime
  – At time t a system has only one architecture
• Prescriptive architecture (PA) captures design decisions made prior to system construction
  – as-designed
• Descriptive architecture (DA) describes how the system has been built
  – as-implemented

Slide 6

Slide 6 text

How Many Systems Start off: iRODS – Prescriptive Architecture
…and End up: iRODS – Descriptive Architecture

Slide 7

Slide 7 text

What Happened?
• Software decay
  – Drift – introduction of design decisions into a system that are not encompassed or implied by its architectural design
  – Erosion – introduction of design decisions into a system that violate its architectural design
• Decaying systems begin to “smell”
  – More on this later…
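The difference between drift and erosion can be illustrated with a minimal sketch. It assumes, purely for illustration, that the prescriptive architecture is encoded as a set of allowed component dependencies plus a set of explicitly forbidden ones, and that the descriptive architecture is the set of dependencies found in the implementation. The function name, the encoding, and the component names are hypothetical and do not come from the talk.

```python
def classify_decay(allowed, forbidden, implemented):
    """Split implemented dependencies into conforming, drift, and erosion.

    Each argument is a set of (source, target) component pairs. Encoding a
    prescriptive architecture this way is an assumption made for this sketch.
    """
    conforming = implemented & allowed
    erosion = implemented & forbidden          # violates the design
    drift = implemented - allowed - forbidden  # not covered by the design
    return conforming, drift, erosion


# Hypothetical three-layer design: UI -> Logic -> Storage.
allowed = {("UI", "Logic"), ("Logic", "Storage")}
forbidden = {("UI", "Storage")}  # the UI must not bypass the logic layer
implemented = {
    ("UI", "Logic"),
    ("Logic", "Storage"),
    ("UI", "Storage"),   # violates the design
    ("Logic", "Cache"),  # never mentioned in the design
}

conforming, drift, erosion = classify_decay(allowed, forbidden, implemented)
print("drift:", sorted(drift))
print("erosion:", sorted(erosion))
```

In this toy input, the dependency on the undocumented Cache component counts as drift, while the UI reaching directly into Storage counts as erosion.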

Slide 8

Slide 8 text

Decay in Real Systems? Linux – Prescriptive Architecture Linux – Descriptive Architecture

Slide 9

Slide 9 text

Top-Level Architecture – Another View

Slide 10

Slide 10 text

Another Example Hadoop Distributed File System – Prescriptive Architecture HDFS – Descriptive Architecture

Slide 11

Slide 11 text

Hadoop – HDFS + MapReduce

Slide 12

Slide 12 text

Hadoop – Complete Architecture

Slide 13

Slide 13 text

Hadoop – Complete Architecture, Another View

Slide 14

Slide 14 text

One More Example Bash Prescriptive Architecture Bash Descriptive Architecture

Slide 15

Slide 15 text

Part II Decay

Slide 16

Slide 16 text

Can We “Smell” Decay?
• Yes, both in the design and code
• Software smell
  – Commonly made design or implementation decision
  – Negatively impacts your system’s lifecycle properties
  – It is not a bug – it doesn’t break your system
• Our goal is to discover architectural design smells automatically
• Inspired by
  – Refactoring: Improving the Design of Existing Code by Martin Fowler

Slide 17

Slide 17 text

A Catalogue of Architectural Smells
• Brick Concern Overload
• Brick Use Overload
• Brick Dependency Cycle
• Unused Interface
• Ambiguous Interface
• Duplicate Component Functionality
• Scattered Functionality
• Component Envy
• Connector Envy
• Connector Chain
• Extraneous Adjacent Connector
• …
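Of the smells in this catalogue, the Brick Dependency Cycle is the most straightforward to detect automatically. The sketch below is one possible approach, not the detector used in the talk: it assumes the recovered architecture is available as a dependency graph between bricks and reports every strongly connected component (found with Tarjan's algorithm) that contains a cycle. The brick names are hypothetical.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm. graph maps each node to an iterable of successors."""
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = [0]

    def visit(node):
        index[node] = lowlink[node] = counter[0]
        counter[0] += 1
        stack.append(node)
        on_stack.add(node)
        for succ in graph.get(node, ()):
            if succ not in index:
                visit(succ)
                lowlink[node] = min(lowlink[node], lowlink[succ])
            elif succ in on_stack:
                lowlink[node] = min(lowlink[node], index[succ])
        if lowlink[node] == index[node]:
            # node is the root of a component: pop its members off the stack.
            component = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == node:
                    break
            components.append(component)

    for node in list(graph):
        if node not in index:
            visit(node)
    return components


def dependency_cycles(graph):
    """Return groups of bricks that depend on each other cyclically."""
    cycles = []
    for component in strongly_connected_components(graph):
        if len(component) > 1:
            cycles.append(component)
        else:
            (only,) = component
            if only in graph.get(only, ()):  # self-dependency
                cycles.append(component)
    return cycles


# Hypothetical brick-level dependency graph.
graph = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}
cycles = dependency_cycles(graph)
print(cycles)
```

Here A, B, and C form a cycle, while D merely depends on the cycle without being part of it. The recursive formulation is adequate for small graphs; a large recovered architecture would call for an iterative variant.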

Slide 18

Slide 18 text

Examples of Smells from Real Systems

Slide 19

Slide 19 text

Linux Architecture

Slide 20

Slide 20 text

Linux – Memory Manager Subsystem

Slide 21

Slide 21 text

Bash Architecture

Slide 22

Slide 22 text

Bash – Job Control Component

Slide 23

Slide 23 text

Bash – Commands Component

Slide 24

Slide 24 text

Hadoop – Complete Architecture

Slide 25

Slide 25 text

Hadoop – Dependency Cycle

Slide 26

Slide 26 text

Hadoop – Component Use Overload

Slide 27

Slide 27 text

Hadoop – Brick Concern Overload Value Aggregator

Slide 28

Slide 28 text

Hadoop – *Envy InterDataNode Protocol

Slide 29

Slide 29 text

Part III Recovery (and then refactoring)

Slide 30

Slide 30 text

Fable of Two Systems Prescriptive Architecture Descriptive Architecture

Slide 31

Slide 31 text

News Flash! Prescriptive Architecture Descriptive Architecture

Slide 32

Slide 32 text

How Many Systems Start off: iRODS – Prescriptive Architecture
…and End up: iRODS – Descriptive Architecture

Slide 33

Slide 33 text

Smells Impact Real Systems
• Smelly files are more issue prone
• Smelly files tend to be more change prone

Slide 34

Slide 34 text

How Do We Know?
• Architecture recovery
  – The process of determining a system’s architecture from its implementation-level artifacts
  – Source code, executable files, Java .class files, …
• Difficult in practice
  – Size of code bases
  – Irrelevant details
  – Misleading details
  – Missing information
  – Hard to objectively assess existing techniques
• Still, automated solutions are available

Slide 35

Slide 35 text

What Are These Solutions You Speak of?
• ACDC – Algorithm for Comprehension-Driven Clustering
  – Structural pattern-based clustering
• ARC – Architecture Recovery Using Concerns
  – Concern-based hierarchical clustering based on similarity measure
• Bunch-NAHC & Bunch-SAHC
  – Hill-climbing algorithm for maximizing Modularization Quality
• LIMBO – scaLable InforMation BOttleneck
  – Probabilistic hierarchical clustering
• WCA-UE & WCA-UENM – Weighted Combined Algorithm
  – Dependency-based hierarchical clustering
• ZBR – Zone-Based Recovery
  – Hierarchical clustering based on textual information
• PKG – Implementation Package Structure
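To make the Bunch entry more concrete, the sketch below computes one commonly cited formulation of Modularization Quality (often called TurboMQ in the Bunch literature): each cluster contributes a cluster factor of 2μ / (2μ + ε), where μ counts its internal dependencies and ε counts dependencies crossing its boundary in either direction. Whether the studies behind this talk used exactly this variant is an assumption, and the function and entity names are hypothetical.

```python
from collections import defaultdict


def modularization_quality(edges, cluster_of):
    """Sum of per-cluster factors 2*mu / (2*mu + eps) (TurboMQ-style).

    edges: iterable of (source, target) entity dependencies.
    cluster_of: dict mapping each entity to its cluster label.
    """
    intra = defaultdict(int)  # mu: dependencies inside a cluster
    inter = defaultdict(int)  # eps: dependencies crossing a cluster boundary
    for src, dst in edges:
        a, b = cluster_of[src], cluster_of[dst]
        if a == b:
            intra[a] += 1
        else:
            inter[a] += 1
            inter[b] += 1

    mq = 0.0
    for cluster in set(cluster_of.values()):
        mu = intra[cluster]
        if mu > 0:  # a cluster with no internal edges contributes 0
            mq += 2 * mu / (2 * mu + inter[cluster])
    return mq


# Hypothetical dependencies between four code entities.
edges = [("a", "b"), ("b", "a"), ("c", "d"), ("b", "c")]
cohesive = {"a": 0, "b": 0, "c": 1, "d": 1}
scattered = {"a": 0, "c": 0, "b": 1, "d": 1}

mq_cohesive = modularization_quality(edges, cohesive)
mq_scattered = modularization_quality(edges, scattered)
print(mq_cohesive, mq_scattered)
```

A hill-climbing search would repeatedly move entities between clusters and keep the moves that raise this score, which is why the cohesive grouping above rates higher than the scattered one.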

Slide 36

Slide 36 text

Do They Really Work? Bash from ACDC Bash from Bunch Bash from ZBR Bash “Ground Truth”

Slide 37

Slide 37 text

A More In-Depth Study
• Eight architectures of six open-source systems
• Previously obtained ground-truths for each

System      Version  Domain           Language  Size (SLOC)  Components
ArchStudio  4        IDE              Java      280K         54
Bash        1.14.4   OS Shell         C         70K          25
Hadoop      0.19.0   Data Processing  Java      200K         68
Linux-C     2.0.27   OS               C         750K         7
Linux-D     2.0.27   OS               C         750K         120
Mozilla-C   1.3      Browser          C/C++     4M           10
Mozilla-D   1.3      Browser          C/C++     4M           233
OODT        0.2      Data Management  Java      180K         217

Slide 38

Slide 38 text

Proximity to Ground Truth

Slide 39

Slide 39 text

Cluster Comparison

Slide 40

Slide 40 text

Part IV Understanding Architecture as a “Big Data” problem

Slide 41

Slide 41 text

Software Architecture ARCADE Architecture Recovery, Change, and Decay Evaluator Source Code Issue Repository Recovery Techniques Issue Extractor Issues Architectures Architectural Smell Detector Architectural- Smell Instances Change Metrics Calculator Decay Metrics Calculator Change Metrics Decay Metrics Relation Analyzer Correlation Data

Slide 42

Slide 42 text

Empirical Study of Change and Decay
1. In what ways do architectures change?
2. When and how do architectures decay?
3. What is the relationship between architectural smells and implementation issues?

Slide 43

Slide 43 text

Several Subject Systems

System      Application Domain      Versions  Time         MSLOC
ActiveMQ    Message Broker          20        8/04-12/05   3.4
Cassandra   Distributed DBMS        127       9/09-9/13    22.0
Chukwa      Data Monitoring         7         5/09-2/14    2.2
Hadoop      Data Processing         63        4/06-8/13    30.0
Ivy         Dependency Manager      20        12/07-2/14   0.4
JackRabbit  Content Repository      97        8/04-2/14    34.0
Jena        Semantic Web Framework  7         6/12-9/13    2.7
JSPWiki     Wiki Engine             54        10/07-3/14   1.2
Log4j       Logging                 41        01/01-06/14  2.4
Lucene      Search Engine           21        12/10-1/14   5.1
Mina        Network Framework       40        11/06-11/12  2.3
PDFBox      PDF Library             17        2/08-3/14    2.7
Struts      Web Apps                36        3/00-2/14    6.7
Xerces      XML Library             22        3/03-11/09   2.3

Slide 44

Slide 44 text

A Few Background Bits
• Versioning Scheme
  – major.minor.patch release
  – Example sequence: 1.5.3, 1.6.0, 1.6.1, 2.0.0
• Change metrics
  – MojoFM
  – a2a
  – c2c
• Decay metrics
  – # structural dependencies
  – Change proneness
  – Coupling and cohesion
  – Smell density and coverage
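Two of these background items lend themselves to a short sketch. The first function classifies the step between two versions under the major.minor.patch scheme, using the version numbers shown on this slide. The second computes a cluster-to-cluster (c2c) overlap as the share of entities two clusters have in common relative to the larger cluster, in percent. That c2c formula is stated here as an assumption about how the metric is usually defined, not as a definition taken from the slides, and the file names are hypothetical.

```python
def change_level(old, new):
    """Return which part of a major.minor.patch version changed first."""
    old_parts = [int(p) for p in old.split(".")]
    new_parts = [int(p) for p in new.split(".")]
    for name, o, n in zip(("major", "minor", "patch"), old_parts, new_parts):
        if o != n:
            return name
    return "none"


def c2c(cluster_a, cluster_b):
    """Percentage overlap between two clusters (sets of entity names).

    Assumed definition: |A & B| / max(|A|, |B|) * 100.
    """
    largest = max(len(cluster_a), len(cluster_b))
    if largest == 0:
        return 0.0  # convention chosen here for two empty clusters
    return 100.0 * len(cluster_a & cluster_b) / largest


versions = ["1.5.3", "1.6.0", "1.6.1", "2.0.0"]
levels = [change_level(a, b) for a, b in zip(versions, versions[1:])]
print(levels)

# Hypothetical cluster contents in two consecutive versions.
before = {"Parser.java", "Lexer.java", "Ast.java"}
after = {"Parser.java", "Lexer.java", "Ast.java", "Visitor.java"}
print(c2c(before, after))
```

Grouping similarity values by the kind of version step is what makes a comparison such as "major versus minor versus patch" possible.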

Slide 45

Slide 45 text

Recovery Techniques Used
• PKG – package structure recovery
• ACDC* – algorithm for comprehension-driven clustering
• ARC** – architecture recovery using concerns

* V. Tzerpos et al., ACDC: an algorithm for comprehension-driven clustering, In Working Conference on Reverse Engineering (WCRE), 2000
** J. Garcia et al., Enhancing architectural recovery using concerns, In International Conference on Automated Software Engineering (ASE), 2011
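PKG is the simplest of the three techniques to picture. The slides do not describe how it is implemented, so the following is only a guess at the idea: treat each implementation package (here, the directory of a source file) as one architectural component. The function name, the optional depth parameter, and the file paths are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def recover_by_package(source_files, depth=None):
    """Group source files into clusters named after their package.

    source_files: iterable of '/'-separated relative paths.
    depth: if given, only the first `depth` package segments are used,
           which merges sub-packages into coarser components.
    """
    clusters = defaultdict(set)
    for path in source_files:
        package_parts = PurePosixPath(path).parent.parts
        if depth is not None:
            package_parts = package_parts[:depth]
        clusters[".".join(package_parts)].add(path)
    return dict(clusters)


# Hypothetical source tree.
files = [
    "org/example/io/Reader.java",
    "org/example/io/Writer.java",
    "org/example/net/Client.java",
]

fine = recover_by_package(files)
coarse = recover_by_package(files, depth=2)
print(sorted(fine))
print(sorted(coarse))
```

The depth parameter shows the main design choice in such a view: how far down the package hierarchy a "component" is taken to extend.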

Slide 46

Slide 46 text

How Architectures Change
[Figure: average a2a values (architecture similarity, in percent) between versions of Ivy, Lucene, and JSPWiki for the ACDC, ARC, and PKG views, grouped by Major, MinMajor, Minor, and Patch version changes]
• Value unit is percentage
• Lower numbers mean more change
• “Reversed” architecture changes
• On average, architecture changes range from 15-25%
• Changes differ between different views
• Major < MinMajor < Minor < Patch

Slide 47

Slide 47 text

System vs. Component Level
• Architecture changes occur within components even when the system’s overall architectural structure remains relatively stable
[Figure: architectural similarity between minor versions of “Ivy”]
• ARC view: architecture changes more than 80% within components

Slide 48

Slide 48 text

RQ3 – When Significant Change Occurs
• Dramatic architecture change can occur across minor versions
[Figure: minimum a2a values (architecture similarity, in percent) between minor versions for the ACDC, ARC, and PKG views; annotation: architecture changes > 50%]

Slide 49

Slide 49 text

At What Point Does Change Become Decay? Apache Chukwa 0.3.0 Apache Chukwa 0.4.0

Slide 50

Slide 50 text

Architectural Decay
[Figure: Cassandra’s architectures recovered using ARC; x-axis: versions v1 to v123; series: co, lo, spf, dc]

Slide 51

Slide 51 text

Smells Impact Real Systems
• Smelly files are more issue prone
• Smelly files tend to be more change prone
• Smelly files are continuously involved in many issues

Slide 52

Slide 52 text

On-Going Work
• Identify, understand, and catalogue smells
• Identify patterns indicating decay
• Collect and document ground-truths
• Improve architectural recovery
  – Better code analysis
  – Dynamic system aspects
• Study correlation/causality of architectural change & decay with
  – implementation issues
  – code decay
  – refactoring
  – self-adaptive behavior

Slide 53

Slide 53 text

Acknowledgments
• Supporters
  – NSF
  – Bosch RTC
  – NASA/JPL
  – Infosys
  – Northrop Grumman
  – Huawei
• Project participants and collaborators
  – Pooyan Behnamghader, USC
  – Yuanfang Cai, Drexel U.
  – Eric Dashofy, Aerospace Corp.
  – Chris Douglas, Microsoft
  – Eder Figueroa, USC
  – Alessandro Garcia, PUC-Rio
  – Joshua Garcia, UC Irvine
  – Muzzammil Imam, USC
  – Igor Ivkovic, U. of Waterloo
  – Ivo Krka, Google
  – Duc Le, USC
  – Daniel Link, USC
  – Isela Macia, PUC-Rio
  – Chris Mattmann, NASA/JPL
  – Daniel Popescu, Google
  – Arman Shahbazian, USC
  – Yixue Zhao, USC