Slide 1

Code Review at Speed: How can we use data to help developers do code review faster?
Patanamon (Pick) Thongtanunam
ARC DECRA Fellow & Lecturer, School of Computing and Information Systems (CIS)
[email protected] | @patanamon | http://patanamon.com

Slide 2

A general view of continuous integration: create tasks → write code → build & test code → integrate → release/deploy.

Code Review: a QA practice that manually examines a new code change ("A bug is here…").
- Improves the overall quality of software systems [Thongtanunam et al. 2015, McIntosh et al. 2016]
- Increases team awareness, transfers knowledge & shares code ownership [Bacchelli and Bird 2013, Thongtanunam et al. 2016, Sadowski et al. 2018]

Slide 3

Code Review: a QA practice that manually examines a new code change.
In a collaborative code review tool:
(1) An author uploads the code change
(2) The author invites reviewers
(3) Reviewers examine the change: if it is rejected, a revision is required or the change is abandoned; if it is accepted, it proceeds to testing
(4) Automated testing: if the tests fail, a revision is required; if they pass, the approved change is integrated into the software system

Slide 4

A large number of new code changes can pose challenges to performing effective code reviews.
100-1,000 reviews were performed per month, and each review took 1 day on average [Rigby and Bird, 2013].
[Chart: #reviews/month in the studied systems, roughly ~400 to ~600 [Thongtanunam and Hassan, 2020]]

Slide 5

A large number of new code changes can pose challenges to performing effective code reviews.
Challenges:
- Non-responding invited reviewers [Ruangwan et al., 2019]

Slide 6

Reviewers may not respond to the review invitation.
[The code review workflow diagram from Slide 3.]

Slide 7

16%-66% of the studied code changes have at least one invited reviewer who did not respond to the invitation.
[Chart: %non-responding invited reviewers in a patch]
The more reviewers that were invited, the higher the chance of having a non-responding reviewer.

Slide 8

Investigating the factors that can be associated with the participation decision.
13 studied metrics across three dimensions: Experience & Activeness, Past Collaboration, and Workload.
We use a non-linear logistic regression model, RespondInvitation ~ x1 + x2 + … + xn, to analyze the relationship with the likelihood of responding to the invitation.
5 significant factors:
- Code Ownership (%commits authored)
- Reviewing Experience (%patches reviewed)
- Review Participation Rate (%invitations accepted)
- Familiarity with the Patch Author (co-reviewing frequency)
- Remaining Reviews (#pending review requests)
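As a rough illustration, a minimal sketch of fitting such a non-linear logistic regression in Python (the study's actual tooling may differ; the dataset and column names here are hypothetical):

```python
# A minimal sketch of a non-linear logistic regression of the form
# RespondInvitation ~ x1 + ... + xn; column names are illustrative,
# not the study's exact metric names.
import pandas as pd
import statsmodels.formula.api as smf

invitations = pd.read_csv("review_invitations.csv")  # hypothetical dataset

# cr(x, df=3) adds a natural cubic spline so each metric may relate
# non-linearly to the likelihood of responding to the invitation.
model = smf.logit(
    "responded ~ cr(code_ownership, df=3)"
    " + cr(reviewing_experience, df=3)"
    " + cr(participation_rate, df=3)"
    " + cr(author_familiarity, df=3)"
    " + cr(pending_review_requests, df=3)",
    data=invitations,
).fit()
print(model.summary())  # coefficient signs give the association direction
```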

Slide 9

A large number of new code changes can pose challenges to performing effective code reviews.
Challenges:
- Non-responding invited reviewers [Ruangwan et al., 2019]
(Possible) Solutions:
- Workload-aware reviewer recommendation [Al-Zubaidi et al., 2020]

Slide 10

WLRRec: Workload-aware Reviewer Recommendation, a multi-objective evolutionary search (NSGA-II).
Given a new code change, WLRRec measures reviewer metrics (Experience & Activeness, Past Collaboration, Workload) and searches for reviewers that optimize two objectives:
- Obj 1: Maximize the chance of participating in a review
- Obj 2: Minimize the skewness of the workload
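To make the search concrete, here is a skeleton of a two-objective NSGA-II run over candidate reviewer sets using pymoo; the encoding and fitness functions are simplified stand-ins for WLRRec's, and the per-reviewer numbers are made up:

```python
# A sketch of a two-objective reviewer search with NSGA-II (pymoo);
# not the WLRRec implementation. All per-reviewer values are hypothetical.
import numpy as np
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.core.problem import ElementwiseProblem
from pymoo.operators.crossover.pntx import TwoPointCrossover
from pymoo.operators.mutation.bitflip import BitflipMutation
from pymoo.operators.sampling.rnd import BinaryRandomSampling
from pymoo.optimize import minimize

participation = np.array([0.9, 0.4, 0.7, 0.2])  # chance each reviewer participates
workload = np.array([5.0, 1.0, 1.0, 2.0])       # pending review requests

class ReviewerSelection(ElementwiseProblem):
    def __init__(self):
        super().__init__(n_var=4, n_obj=2, xl=0, xu=1)  # one bit per reviewer

    def _evaluate(self, x, out, *args, **kwargs):
        selected = np.asarray(x, dtype=bool)
        # Obj 1: maximize participation chance (negated, since pymoo minimizes).
        f1 = -participation[selected].sum()
        # Obj 2: normalized Shannon entropy of the workload after assigning
        # this review to the selected reviewers (lower = more even; cf. Slide 13).
        w = workload + selected
        p = w / w.sum()
        f2 = (p * np.log2(p)).sum() / np.log2(w.size)
        out["F"] = [f1, f2]

algorithm = NSGA2(pop_size=20, sampling=BinaryRandomSampling(),
                  crossover=TwoPointCrossover(), mutation=BitflipMutation())
result = minimize(ReviewerSelection(), algorithm, ("n_gen", 50))
print(result.X, result.F)  # Pareto-optimal reviewer sets and objective values
```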

Slide 11

WLRRec uses 4+1 key reviewer metrics:
- Experience & Activeness: Code Ownership (%commits authored), Reviewing Experience (%patches reviewed), Review Participation Rate (%invitations accepted)
- Past Collaboration: Familiarity with the Patch Author (co-reviewing frequency)
- Workload: Remaining Reviews (#pending review requests)
Fitness function for Obj 1: a weighted summation, to identify reviewers with maximum experience, activeness, and past collaboration.
Fitness function for Obj 2: Shannon's entropy, to identify reviewers with a minimally skewed workload.

Slide 12

WLRRec identifies reviewers with maximum experience, activeness, and past collaboration (Obj 1).
Example fitness function for Obj 1: each candidate reviewer (e.g., Pick, Hoa, Kla, Aditya) gets a weighted sum of their Code Ownership (CO), Reviewing Experience (RE), Review Participation Rate (RP), and Familiarity with the Patch Author (FP) scores. For a solution candidate containing Pick and Kla, the Objective-1 score is Score_Pick + Score_Kla.
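A tiny numeric illustration of this weighted summation (all weights and metric values below are made up):

```python
# Illustrative Obj-1 fitness: weighted sum of per-reviewer metrics.
# All weights and metric values are hypothetical.
weights = {"CO": 0.25, "RE": 0.25, "RP": 0.25, "FP": 0.25}
reviewers = {
    "Pick": {"CO": 0.6, "RE": 0.8, "RP": 0.9, "FP": 0.5},
    "Kla":  {"CO": 0.3, "RE": 0.4, "RP": 0.7, "FP": 0.2},
}

def obj1_score(candidate):
    """Sum of weighted metric scores over the reviewers in the candidate."""
    return sum(weights[m] * reviewers[r][m] for r in candidate for m in weights)

print(round(obj1_score(["Pick", "Kla"]), 2))  # Score_Pick + Score_Kla = 0.7 + 0.4 = 1.1
```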

Slide 13

WLRRec identifies reviewers with a minimally skewed workload (Obj 2).
Example fitness function for Obj 2: given each reviewer's #pending review requests, a solution candidate's Objective-2 score is the normalized Shannon's entropy of the total workload, e.g.,

$\frac{1}{\log_2 4}\left(\frac{5}{10}\log_2\frac{5}{10} + 2 \cdot \frac{1}{10}\log_2\frac{1}{10} + \frac{2}{10}\log_2\frac{2}{10}\right) = -0.81$

The lower the score, the less skewed the workload (i.e., the better the distribution of workload).
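A quick check of this entropy score in Python, using the workload proportions from the slide's example:

```python
# Verifying the Obj-2 score above: normalized Shannon's entropy with
# 4 reviewers and workload proportions 5/10, 1/10, 1/10, and 2/10.
from math import log2

proportions = [5 / 10, 1 / 10, 1 / 10, 2 / 10]
score = sum(p * log2(p) for p in proportions) / log2(len(proportions))
print(round(score, 2))  # -0.81: lower (more negative) = more evenly distributed
```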

Slide 14

Our WLRRec outperforms the single-objective approaches.
[Charts: %gain of WLRRec over GA-Obj1 and over GA-Obj2 in precision, recall, and F-measure]
- WLRRec achieves 88%-142% higher precision and 111%-178% higher recall than GA-Obj1.
- WLRRec achieves 55%-101% higher precision and 96%-138% higher recall than GA-Obj2.
Considering multiple objectives at the same time allows us to better find reviewers.

Slide 15

A large number of new code changes can pose challenges to performing effective code reviews.
Challenges:
- Non-responding invited reviewers [Ruangwan et al., 2019]
- Suboptimal reviewing [Thongtanunam and Hassan, 2020; Chouchen et al., 2021]
(Possible) Solutions:
- Workload-aware reviewer recommendation [Al-Zubaidi et al., 2020]

Slide 16

Reviewers may have subconscious biases due to the visible information in a code review tool.
Code review tools often provide a transparent environment.
[The code review workflow diagram from Slide 3.]

Slide 17

Reviewers may have subconscious biases due to the visible information in a code review tool (e.g., "Ahmed usually writes good code").

Slide 18

Investigating the signals of visible information that are associated with the review decision of a reviewer.
8 studied metrics across two dimensions: Relationship Status and Prior Feedback.
Confounding factors: code change characteristics, e.g., #added lines.
We use a mixed-effects logistic regression model, IsPositiveVote ~ x1 + x2 + … + xn + (1 | ReviewerId), to analyze the relationship with the likelihood of giving a positive vote.
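A minimal sketch of fitting such a mixed-effects logistic regression in Python (the original analysis may have used different tooling; the dataset and column names are hypothetical):

```python
# A sketch of a mixed-effects logistic regression with a random intercept
# per reviewer: IsPositiveVote ~ x1 + ... + xn + (1 | ReviewerId).
# Column names are illustrative, not the study's exact metrics.
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

votes = pd.read_csv("review_votes.csv")  # hypothetical dataset

model = BinomialBayesMixedGLM.from_formula(
    # Fixed effects: visible signals plus confounding patch characteristics.
    "is_positive_vote ~ pct_reviewed_past_patches"
    " + pct_prior_positive_votes + pct_prior_comments + added_lines",
    # Variance component: a random intercept for each reviewer.
    {"reviewer": "0 + C(reviewer_id)"},
    votes,
)
result = model.fit_vb()  # variational Bayes estimation
print(result.summary())
```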

Slide 19

In addition to patch characteristics, other visible information is associated with the review decision.
Explanatory power (log-likelihood ratio test): Relationship Status, Prior Feedback, and Patch Characteristics.
Association directions:
- Higher %reviewed past patches for the patch author → more likely to give a positive vote
- Higher %prior positive votes → more likely to give a positive vote
- Lower %prior comments → more likely to give a positive vote
Visible information has a stronger association with the review decision than patch characteristics.

Slide 20

Other suboptimal reviewing practices also exist in the contemporary code review process [Chouchen et al., 2021].
To identify anti-patterns in code reviews, the authors manually examined the code reviews of 100 code changes. Prevalence of the anti-patterns:
- Low review participation: 32%
- Confused reviewers: 21%
- Divergent reviewers: 20%
- Shallow review: 14%
- Toxic review: 5%

Slide 21

A large number of new code changes can pose challenges to performing effective code reviews.
Challenges:
- Non-responding invited reviewers [Ruangwan et al., 2019]
- Suboptimal reviewing [Thongtanunam and Hassan, 2020; Chouchen et al., 2021]
(Possible) Solutions:
- Workload-aware reviewer recommendation [Al-Zubaidi et al., 2020]
- Line-level defect prediction [Wattanakriengkrai et al., 2020]

Slide 22

We find that as little as 1%-3% of the lines of code in a file are actually defective after release.

Studied systems:

System    %Defective files   %Defective lines in defective files (at the median)
ActiveMQ  2-7%               2%
Camel     2-8%               2%
Derby     6-28%              2%
Groovy    2-4%               2%
HBase     7-11%              1%
Hive      6-19%              2%
JRuby     2-13%              2%
Lucene    2-8%               3%
Wicket    2-16%              3%

**Defective lines are the source code lines that will be changed by bug-fixing commits to fix post-release defects.
Only 1%-3% of the lines of code in a file are actually defective; predicting defective lines would potentially save reviewer effort on inspecting code.

Slide 23

Line-DP: predicting defective lines using a model-agnostic technique (LIME).
Pipeline: files of interest → a file-level defect prediction model → defect-prone files → a model-agnostic technique (LIME) → defect-prone lines.
Example:

if (closure != null) {
    Object oldCurrent = current;
    setClosure(closure, node);
    closure.call();
    current = oldCurrent;
}

LIME assigns each token a score, e.g., oldCurrent 0.8, current 0.1, node -0.3, closure -0.7; tokens with a LIME score > 0 are deemed defective and those with a score < 0 clean. Tokens are ranked based on their LIME scores and mapped back to lines to identify the defect-prone lines.
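A toy sketch of the LIME step in Python (the corpus, labels, and file-level model here are hypothetical stand-ins, not the authors' exact Line-DP implementation):

```python
# A toy sketch: explain a file-level defect model with LIME and rank tokens.
# The corpus, labels, and model here are hypothetical stand-ins.
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

files = [  # toy "files" represented as token streams
    "if closure != null Object oldCurrent current setClosure closure node",
    "int x = 0 return x",
]
labels = [1, 0]  # 1 = defective, 0 = clean

# File-level defect prediction model: bag-of-tokens -> defective / clean.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(files, labels)

explainer = LimeTextExplainer(class_names=["clean", "defective"])
explanation = explainer.explain_instance(files[0], model.predict_proba,
                                         num_features=10)

# Tokens with positive weight push toward "defective"; ranking them and
# mapping them back to the lines that contain them yields defect-prone lines.
for token, weight in sorted(explanation.as_list(), key=lambda t: -t[1]):
    print(f"{token}: {weight:+.2f}")
```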

Slide 24

Our Line-DP achieves an overall predictive accuracy better than the baseline approaches (static analysis tools, n-gram).

Metric (direction)                    Line-DP      Line-level baselines
Recall (higher is better)             0.61-0.62    0.01-0.51
MCC (higher is better)                0.04-0.05    -0.01-0.03
False Alarm (lower is better)         0.47-0.48    0.01-0.54
Distance to Heaven (lower is better)  0.43-0.44    0.52-0.70

Distance to Heaven is the root mean square of (1 − recall) and the false alarm value, i.e., the normalized distance to the ideal point of perfect recall with no false alarms. Our Line-DP can effectively identify defective lines while requiring a smaller amount of reviewing effort.
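To see how the Distance-to-Heaven values follow from the recall and false alarm rows (this standard formula reproduces the ~0.43 figures in the table):

```python
# Distance to Heaven: distance to the ideal point (recall = 1, false alarm = 0),
# i.e., the root mean square of (1 - recall) and the false alarm value.
from math import sqrt

def distance_to_heaven(recall, false_alarm):
    return sqrt(((1 - recall) ** 2 + false_alarm ** 2) / 2)

print(round(distance_to_heaven(0.61, 0.47), 2))  # 0.43, matching the Line-DP row
print(round(distance_to_heaven(0.62, 0.48), 2))  # 0.43 as well
```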

Slide 25

A large number of new code changes can pose challenges to performing effective code reviews.
Challenges:
- Non-responding invited reviewers [Ruangwan et al., 2019]
- Suboptimal reviewing [Thongtanunam and Hassan, 2020; Chouchen et al., 2021]
(Possible) Solutions:
- Workload-aware reviewer recommendation [Al-Zubaidi et al., 2020]
- Line-level defect prediction [Wattanakriengkrai et al., 2020]
Our techniques and empirical findings should help teams speed up the process and save effort, while maintaining the quality of reviews.

Slide 26

Code Review at Speed: How can we use data to help developers do code review faster?

References:
- Non-responding invited reviewers: S. Ruangwan, P. Thongtanunam, A. Ihara, K. Matsumoto. "The Impact of Human Factors on the Participation Decision of Reviewers in Modern Code Review." EMSE Journal, 2019.
- Workload-aware reviewer recommendation: W. Al-Zubaidi, P. Thongtanunam, H. K. Dam, C. Tantithamthavorn, A. Ghose. "Workload-Aware Reviewer Recommendation using a Multi-objective Search-Based Approach." PROMISE 2020.
- Suboptimal reviewing: P. Thongtanunam, A. E. Hassan. "Review Dynamics and Their Impact on Software Quality." TSE 2020. M. Chouchen, A. Ouni, R. Kula, D. Wang, P. Thongtanunam, M. Mkaouer, K. Matsumoto. "Anti-patterns in Modern Code Review: Symptoms and Prevalence." SANER 2021.
- Line-level defect prediction: S. Wattanakriengkrai, P. Thongtanunam, C. Tantithamthavorn, H. Hata, K. Matsumoto. "Predicting Defective Lines Using a Model-Agnostic Technique." TSE 2020.

http://patanamon.com