Slide 1

Slide 1 text

Will this clone be short-lived? Towards understanding of the characteristics of short-lived clones
Journal-First Presentation
Ahmed Hassan, Weiyi Shang, Patanamon (Pick) Thongtanunam
patanamon.thongtanunam@unimelb.edu.au
@patanamon

Slide 2

Slide 2 text

Code clone: a group of nearly identical code fragments (e.g., Monopoly/IncomeTaxSquare.java and Monopoly/GoToJailSquare.java)
To lower maintenance effort, such clones should be refactored to reduce code repetition
“2,241 refactoring instances were detected in 285 GitHub projects” [Silva et al., 2016]
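To make "nearly identical" concrete, the sketch below treats two fragments as clones when their token sequences match after identifiers and literals are normalized (roughly what the clone literature calls a Type-2 clone). This is a simplified illustration under stated assumptions, not how iClones or the study detects clones; the tokenizer regex, the keyword subset, and the names `normalize` and `is_clone_pair` are hypothetical.

```python
import re

# Simplified tokenizer (an assumption for illustration): string literals,
# identifiers/keywords, integer literals, or any single non-space symbol.
TOKEN_RE = re.compile(r'"[^"]*"|[A-Za-z_]\w*|\d+|\S')

# Small, illustrative subset of Java keywords that are kept as-is.
KEYWORDS = {"class", "else", "extends", "if", "int", "new", "private", "public", "return", "void"}


def normalize(fragment: str) -> list[str]:
    """Map identifiers to ID and literals to LIT so renamed copies compare equal."""
    normalized = []
    for tok in TOKEN_RE.findall(fragment):
        if tok.startswith('"') or tok.isdigit():
            normalized.append("LIT")
        elif (tok[0].isalpha() or tok[0] == "_") and tok not in KEYWORDS:
            normalized.append("ID")
        else:
            normalized.append(tok)
    return normalized


def is_clone_pair(a: str, b: str) -> bool:
    """Treat two fragments as clones if their normalized token sequences are identical."""
    return normalize(a) == normalize(b)


print(is_clone_pair("int total = price * 2;", "int sum = cost * 3;"))    # True
print(is_clone_pair("int total = price * 2;", "int total = price + 2;"))  # False
```

Renaming variables or changing constants keeps a pair cloned, while changing an operator breaks it. Real detectors also tolerate small insertions and deletions (near-miss clones), which exact sequence comparison does not.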

Slide 3

Slide 3 text

Refactoring all clones may not be worthwhile
[Figure: timeline of released versions with clones that lived for 2, 2, and 10 versions]

Slide 4

Slide 4 text

Refactoring all clones may not be worthwhile
[Figure: timeline of released versions with clones that lived for 2, 2, and 10 versions]
75% and 36% of volatile clones in the carol and dnsjava systems, respectively, lived for a short duration [Kim et al., 2005]

Slide 5

Slide 5 text

Refactoring all clones may not be worthwhile
[Figure: timeline of released versions with clones that lived for 2, 2, and 10 versions]
75% and 36% of volatile clones in the carol and dnsjava systems, respectively, lived for a short duration [Kim et al., 2005]
Many long-lived clones cannot be removed using standard refactoring techniques [Kim et al., 2005]

Slide 6

Slide 6 text

Refactoring all clones may not be worthwhile
[Figure: timeline of released versions with clones that lived for 2, 2, and 10 versions]
Determining the life expectancy of clones in advance may help developers manage clones

Slide 7

Slide 7 text

Understanding clone genealogies and their life expectancy
Six studied systems, including Apache Pig (history length / number of releases): 17 years / 22 releases; 10 years / 35 releases; 13 years / 66 releases; 14 years / 36 releases; 8 years / 15 releases; 11 years / 43 releases
(PQ1) How long do clones live in a software system?
(PQ2) How were short-lived and long-lived clones changed throughout their lifetime?

Slide 8

Slide 8 text

Identifying clone life expectancy

Slide 9

Slide 9 text

Identifying clone life expectancy
Code repository

Slide 10

Slide 10 text

Identifying clone life expectancy
1. From the code repository, extract sequentially developed versions using git commands (e.g., v2.16.0, v2.17.0 to v2.17.4, v2.18.0 to v2.18.3)
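The slide does not say which git commands were used. As a minimal sketch, assuming release tags follow a `v<major>.<minor>.<patch>` pattern (for example, as printed by `git tag --list`), the versions can be ordered numerically as below; the function name `order_release_tags` is hypothetical.

```python
import re

# Assumed release-tag format: "v<major>.<minor>.<patch>", e.g. "v2.17.1".
TAG_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def order_release_tags(tags: list[str]) -> list[str]:
    """Keep tags that look like releases and order them by numeric version."""
    releases = []
    for raw in tags:
        tag = raw.strip()
        match = TAG_RE.match(tag)
        if match:
            version = tuple(int(part) for part in match.groups())
            releases.append((version, tag))
    # Numeric tuples avoid the string-sort pitfall where "v2.10.0" < "v2.9.0".
    return [tag for _, tag in sorted(releases)]


tags = ["v2.17.1", "v2.17.2", "v2.17.3", "v2.16.0", "v2.17.0",
        "v2.18.0", "v2.18.1", "v2.18.2", "v2.18.3", "v2.17.4"]
print(order_release_tags(tags))
```

Purely numeric ordering flattens parallel maintenance branches (v2.17.4 may have been released after v2.18.0), so a pipeline that needs truly sequential development history would likely have to follow commit ancestry instead of tag names.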

Slide 11

Slide 11 text

Identifying clone life expectancy
1. From the code repository, extract sequentially developed versions using git commands (e.g., v2.16.0, v2.17.0 to v2.17.4, v2.18.0 to v2.18.3)
2. Extract clone genealogies using iClones [Göde and Koschke, 2009] (e.g., genealogies that last 2 and 6 versions)
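A clone genealogy follows one clone group across versions, and its lifetime can be counted in versions. The sketch below assumes a hypothetical genealogy extractor has already assigned each clone group a stable ID per version; it is not iClones' output format, and `genealogy_lifetimes` is an illustrative name.

```python
def genealogy_lifetimes(versions: list[str], groups_by_version: dict[str, set[str]]) -> dict[str, int]:
    """Number of versions each clone group spans, from first to last appearance (inclusive).

    `versions` must be in development order; `groups_by_version` maps a version to
    the IDs of clone groups detected in it (hypothetical extractor output).
    Groups still alive in the last version are right-censored: their true
    lifetime may be longer than the value returned here.
    """
    first_seen: dict[str, int] = {}
    last_seen: dict[str, int] = {}
    for index, version in enumerate(versions):
        for group in groups_by_version.get(version, set()):
            first_seen.setdefault(group, index)
            last_seen[group] = index
    return {group: last_seen[group] - first_seen[group] + 1 for group in first_seen}


versions = ["v1", "v2", "v3", "v4", "v5", "v6"]
detected = {
    "v1": {"A", "B"},
    "v2": {"A", "B"},
    "v3": {"B"},
    "v4": {"B"},
    "v5": {"B"},
    "v6": {"B"},
}
print(genealogy_lifetimes(versions, detected))  # group A lives 2 versions, group B lives 6
```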

Slide 12

Slide 12 text

Identifying clone life expectancy
1. From the code repository, extract sequentially developed versions using git commands (e.g., v2.16.0, v2.17.0 to v2.17.4, v2.18.0 to v2.18.3)
2. Extract clone genealogies using iClones [Göde and Koschke, 2009] (e.g., genealogies that last 2 and 6 versions)
3. Identify short-lived and long-lived clones using a clustering technique
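The slides do not name the clustering technique, and the later figure suggests several lifetime clusters per system that were then grouped into short-lived and long-lived. As a deliberately simpler stand-in, the sketch below runs one-dimensional k-means with k = 2 on genealogy lifetimes; the function `split_short_long` and the example lifetimes are hypothetical.

```python
def split_short_long(lifetimes: list[int], max_iterations: int = 100) -> tuple[float, float, list[str]]:
    """Two-cluster 1-D k-means: returns (short centroid, long centroid, labels).

    Ties between the two centroids are assigned to the short-lived cluster.
    """
    if not lifetimes:
        raise ValueError("need at least one lifetime")
    # Start the centroids at the shortest and longest observed lifetimes.
    short_c, long_c = float(min(lifetimes)), float(max(lifetimes))
    for _ in range(max_iterations):
        short = [v for v in lifetimes if abs(v - short_c) <= abs(v - long_c)]
        long_ = [v for v in lifetimes if abs(v - short_c) > abs(v - long_c)]
        new_short = sum(short) / len(short)
        new_long = sum(long_) / len(long_) if long_ else long_c
        if (new_short, new_long) == (short_c, long_c):
            break  # assignments are stable
        short_c, long_c = new_short, new_long
    labels = ["short-lived" if abs(v - short_c) <= abs(v - long_c) else "long-lived" for v in lifetimes]
    return short_c, long_c, labels


short_c, long_c, labels = split_short_long([1, 2, 2, 1, 3, 12, 14, 2, 13, 1])
print(round(short_c, 2), long_c, labels)
```

A fixed k = 2 split is easy to reason about but sensitive to outliers; clustering into more groups first and then merging them, as the figure hints, can give a more stable boundary.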

Slide 13

Slide 13 text

30% (Maven) to 87% (Jackrabbit) of clones are short-lived
[Figure: for each system (Maven, Pig, Tomcat, Ant, Camel, Jackrabbit), number of clones (y-axis) vs. number of versions a clone lived (x-axis), with lifetime clusters grouped into short-lived and long-lived]

Slide 14

Slide 14 text

30% (Maven) to 87% (Jackrabbit) of clones are short-lived
[Figure: for each system (Maven, Pig, Tomcat, Ant, Camel, Jackrabbit), number of clones (y-axis) vs. number of versions a clone lived (x-axis), with the short-lived clusters highlighted]

Slide 15

Slide 15 text

30% (Maven) to 87% (Jackrabbit) of clones are short-lived
[Figure: for each system (Maven, Pig, Tomcat, Ant, Camel, Jackrabbit), number of clones (y-axis) vs. number of versions a clone lived (x-axis), with the short-lived and long-lived clusters highlighted]

Slide 16

Slide 16 text

30% (Maven) to 87% (Jackrabbit) of clones are short-lived
[Figure: for each system (Maven, Pig, Tomcat, Ant, Camel, Jackrabbit), number of clones (y-axis) vs. number of versions a clone lived (x-axis), with lifetime clusters grouped into short-lived and long-lived]
The life expectancy of short-lived clones accounts for less than 17% of all studied releases

Slide 17

Slide 17 text

Consistent changes appear in long-lived clones more often than in short-lived clones

Slide 18

Slide 18 text

Clones are consistently changed
Consistent changes appear in long-lived clones more often than in short-lived clones

Slide 19

Slide 19 text

% of clone genealogies that include consistent change patterns
System       Short-lived   Long-lived
Ant          19%           35%
Camel        9%            25%
Jackrabbit   14%           25%
Maven        15%           26%
Pig          16%           37%
Tomcat       10%           31%
Consistent changes appear in long-lived clones more often than in short-lived clones
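The percentages above count genealogies that include at least one consistent change. The sketch below assumes a simplified, hypothetical reading of "consistent change": in some version transition, every fragment of the clone group receives the same change. The `Transition` record and both function names are illustrative, not the study's data model.

```python
from dataclasses import dataclass


@dataclass
class Transition:
    """Hypothetical record of how a clone group changed between two consecutive versions."""
    group_size: int            # fragments in the clone group
    consistently_changed: int  # fragments that received the same change


def has_consistent_change(genealogy: list[Transition]) -> bool:
    """True if, in some transition, every fragment of the group was changed the same way."""
    return any(t.group_size >= 2 and t.consistently_changed == t.group_size for t in genealogy)


def percent_with_consistent_change(genealogies: list[list[Transition]]) -> float:
    """Share (in %) of genealogies that include at least one consistent change."""
    if not genealogies:
        return 0.0
    hits = sum(1 for g in genealogies if has_consistent_change(g))
    return 100.0 * hits / len(genealogies)


genealogies = [
    [Transition(2, 2)],                    # both fragments changed together
    [Transition(2, 1)],                    # only one fragment changed
    [Transition(3, 0), Transition(3, 1)],  # never all three
    [],                                    # no changes at all
]
print(percent_with_consistent_change(genealogies))  # 25.0
```

Computing this separately for the short-lived and long-lived genealogies of each system yields one row of the table.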

Slide 20

Slide 20 text

The maintenance effort associated with short-lived clones is smaller than that associated with long-lived clones
[Chart: % of clone genealogies that include consistent change patterns per system, as on Slide 19]
Consistent changes appear in long-lived clones more often than in short-lived clones

Slide 22

Slide 22 text

Understanding clone genealogies and their life expectancy
(PQ1) How long do clones live in a software system?
Finding: Many clones lived in the studied systems for a short duration
(PQ2) How were short-lived and long-lived clones changed throughout their lifetime?
Finding: The maintenance effort for short-lived clones is smaller than that for long-lived clones
It is important to determine in advance whether a clone will be short-lived or long-lived in order to manage clones more efficiently

Slide 23

Slide 23 text

Building a classifier to determine the life expectancy of a newly introduced clone
Each clone is described, at the time it was introduced into the source code, by 38 metrics:
Process metrics (e.g., churn, #developers)
Product metrics (e.g., #lines of code)
Clone metrics (e.g., #clone siblings)
These metrics are used to train a random forest classifier
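The key constraint is that features describe a clone only at its introduction, while the label comes from how long it later survived. The sketch below builds training rows with just the four example metrics named on the slide; the `IntroducedClone` record, `to_training_data`, the `short_max_lifetime` cut-off, and the data values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IntroducedClone:
    """Hypothetical snapshot of a clone at the version where it was introduced.

    Only the four example metrics named on the slide are shown; the study uses 38.
    """
    churn: int           # process metric
    num_developers: int  # process metric
    lines_of_code: int   # product metric
    num_siblings: int    # clone metric
    lifetime: int        # versions the clone survived; known only in hindsight


def to_training_data(clones: list[IntroducedClone], short_max_lifetime: int) -> tuple[list[list[int]], list[int]]:
    """Build feature rows (X) and labels (y: 1 = short-lived, 0 = long-lived).

    Features use only information available at introduction time; the lifetime
    is used solely to derive the label. `short_max_lifetime` is a hypothetical
    cut-off, e.g. the upper bound of the short-lived cluster.
    """
    X = [[c.churn, c.num_developers, c.lines_of_code, c.num_siblings] for c in clones]
    y = [1 if c.lifetime <= short_max_lifetime else 0 for c in clones]
    return X, y


clones = [IntroducedClone(120, 3, 15, 1, lifetime=2), IntroducedClone(10, 1, 40, 4, lifetime=12)]
X, y = to_training_data(clones, short_max_lifetime=5)
print(X, y)
```

The resulting `X` and `y` could then be passed to a random forest implementation such as scikit-learn's `RandomForestClassifier` via `fit(X, y)`.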

Slide 24

Slide 24 text

Towards understanding of the characteristics of short-lived clones
(RQ1) How well can we determine whether an introduced clone will be short-lived?
(RQ2) What are the most influential metrics for determining clone life expectancy?
A random forest classifier uses process metrics (e.g., churn, #developers), product metrics (e.g., #lines of code), and clone metrics (e.g., #clone siblings) to predict whether a clone will be short-lived or long-lived

Slide 25

Slide 25 text

Towards understanding of the characteristics of short-lived clones
(RQ1) How well can we determine whether an introduced clone will be short-lived?
Finding: Our random forest classifiers achieve an average AUC of 0.63 to 0.92
(RQ2) What are the most influential metrics for determining clone life expectancy?
Finding (process metrics): Clones that are introduced along with a large amount of churn in their methods are more likely to be short-lived
Our classifiers and metrics can be used to determine whether a newly introduced clone will be short-lived
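To interpret the reported AUC range: 0.5 corresponds to random guessing and 1.0 to a perfect ranking. The sketch below computes AUC from its rank-based definition, assuming short-lived is the positive class (label 1); it does not reproduce the study's results, and the scores are made up.

```python
def auc(scores: list[float], labels: list[int]) -> float:
    """Area under the ROC curve via its rank interpretation.

    Probability that a randomly chosen positive (1 = short-lived, by assumption)
    gets a higher score than a randomly chosen negative; ties count as half.
    O(P * N) pairwise comparisons, fine for illustration.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


print(auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

Because AUC depends only on how scores rank the clones, it does not require choosing a probability threshold for "short-lived".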

Slide 27

Slide 27 text

[Figure: number of clones vs. number of versions lived for Maven, Pig, Tomcat, Ant, Camel, and Jackrabbit]
Short-lived clones: fewer consistent changes
Long-lived clones: more consistent changes

Slide 28

Slide 28 text

[Figure: number of clones vs. number of versions lived for Maven, Pig, Tomcat, Ant, Camel, and Jackrabbit]
Short-lived clones: fewer consistent changes
Long-lived clones: more consistent changes
Building a classifier to determine the life expectancy of clones from clone, product, and process metrics

Slide 29

Slide 29 text

Towards understanding of the characteristics of short-lived clones
(RQ1) How well can we determine whether an introduced clone will be short-lived?
Finding: Our random forest classifiers achieve an average AUC of 0.63 to 0.92
(RQ2) What are the most influential metrics for determining clone life expectancy?
Finding (process metrics): Clones that are introduced along with a large amount of churn in their methods are more likely to be short-lived
Our classifiers and metrics can be used to determine whether a newly introduced clone will be short-lived
[Figure: number of clones vs. number of versions lived for Maven, Pig, Tomcat, Ant, Camel, and Jackrabbit]
Short-lived clones: fewer consistent changes
Long-lived clones: more consistent changes
Building a classifier to determine the life expectancy of clones from clone, product, and process metrics

Slide 30

Slide 30 text

Towards understanding of the characteristics of short-lived clones
(RQ1) How well can we determine whether an introduced clone will be short-lived?
Finding: Our random forest classifiers achieve an average AUC of 0.63 to 0.92
(RQ2) What are the most influential metrics for determining clone life expectancy?
Finding (process metrics): Clones that are introduced along with a large amount of churn in their methods are more likely to be short-lived
Our classifiers and metrics can be used to determine whether a newly introduced clone will be short-lived
[Figure: number of clones vs. number of versions lived for Maven, Pig, Tomcat, Ant, Camel, and Jackrabbit]
Short-lived clones: fewer consistent changes
Long-lived clones: more consistent changes
Building a classifier to determine the life expectancy of clones from clone, product, and process metrics
Our classifiers and insights can help teams plan the most effective use of their clone management resources
patanamon.thongtanunam@unimelb.edu.au
@patanamon
http://patanamon.com