Slide 1

Slide 1 text

Workload-Aware Reviewer Recommendation using a Multi-objective Search-Based Approach
Wisam Haitham Abbood Al-Zubaidi, Patanamon (Pick) Thongtanunam, Hoa Khanh Dam, Chakkrit (Kla) Tantithamthavorn, Aditya Ghose
@patanamon

Slide 2

Slide 2 text

Code Review: A method to improve the overall quality of a patch through manual examination
Diagram: Author

Slide 3

Slide 3 text

Code Review: A method to improve the overall quality of a patch through manual examination
Diagram: Author, a patch, a code review tool (e.g., Gerrit)

Slide 4

Slide 4 text

Code Review: A method to improve the overall quality of a patch through manual examination
Diagram: Author, a patch, a code review tool (e.g., Gerrit), Reviewers
Reviewer comment, identifying a defect: "Shouldn't console.log() call the toString() method (where appropriate) on objects?"
Reviewer comment, suggesting a solution: I think it's better to do var s = "{}"; console.log(s)

Slide 5

Slide 5 text

Code Review: A method to improve the overall quality of a patch through manual examination
Effective code review requires active participation [Balachandran ICSE2013; Rigby and Storey ICSE2011]
Diagram: Author, a patch, a code review tool (e.g., Gerrit), Reviewers
Reviewer comment, identifying a defect: "Shouldn't console.log() call the toString() method (where appropriate) on objects?"
Reviewer comment, suggesting a solution: I think it's better to do var s = "{}"; console.log(s)

Slide 6

Slide 6 text

Code Review: A method to improve the overall quality of a patch through manual examination
Effective code review requires active participation [Balachandran ICSE2013; Rigby and Storey ICSE2011]
A patch tends to be less defective when it is reviewed and discussed extensively by many reviewers [Thongtanunam et al. MSR2015; Kononenko et al. ICSME2015]
Diagram: Author, a patch, a code review tool (e.g., Gerrit), Reviewers
Reviewer comment, identifying a defect: "Shouldn't console.log() call the toString() method (where appropriate) on objects?"
Reviewer comment, suggesting a solution: I think it's better to do var s = "{}"; console.log(s)

Slide 7

Slide 7 text

Code Review: A method to improve the overall quality of a patch through manual examination
Effective code review requires active participation [Balachandran ICSE2013; Rigby and Storey ICSE2011]
A patch tends to be less defective when it is reviewed and discussed extensively by many reviewers [Thongtanunam et al. MSR2015; Kononenko et al. ICSME2015]
Finding suitable reviewers is not a trivial task [Thongtanunam et al. SANER2015]
Diagram: Author, a patch, a code review tool (e.g., Gerrit), Reviewers
Reviewer comment, identifying a defect: "Shouldn't console.log() call the toString() method (where appropriate) on objects?"
Reviewer comment, suggesting a solution: I think it's better to do var s = "{}"; console.log(s)

Slide 8

Slide 8 text

Several Reviewer Recommendation Approaches Have Been Developed to Improve the Code Review Process

Slide 9

Slide 9 text

Several Reviewer Recommendation Approaches Have Been Developed to Improve the Code Review Process
Expertise/Experience-based approaches: finding reviewers who have reviewed many similar patches in the past [Balachandran ICSE2013; Thongtanunam et al. SANER2015; Zanjani et al. TSE2016; Xia et al. ICSME2016]

Slide 10

Slide 10 text

Several Reviewer Recommendation Approaches Have Been Developed to Improve the Code Review Process
Expertise/Experience-based approaches: finding reviewers who have reviewed many similar patches in the past [Balachandran ICSE2013; Thongtanunam et al. SANER2015; Zanjani et al. TSE2016; Xia et al. ICSME2016]
Experience + past collaboration approaches: finding reviewers who have often worked with the author in the past [Yu et al. ICSME2014; Ouni et al. IST2017]

Slide 11

Slide 11 text

Several Reviewer Recommendation Approaches Have Been Developed to Improve the Code Review Process
Expertise/Experience-based approaches: finding reviewers who have reviewed many similar patches in the past [Balachandran ICSE2013; Thongtanunam et al. SANER2015; Zanjani et al. TSE2016; Xia et al. ICSME2016]
Experience + past collaboration approaches: finding reviewers who have often worked with the author in the past [Yu et al. ICSME2014; Ouni et al. IST2017]
(!) Requesting only experts or active reviewers for a review could potentially burden them

Slide 12

Slide 12 text

Several Reviewer Recommendation Approaches Have Been Developed to Improve the Code Review Process
Expertise/Experience-based approaches: finding reviewers who have reviewed many similar patches in the past [Balachandran ICSE2013; Thongtanunam et al. SANER2015; Zanjani et al. TSE2016; Xia et al. ICSME2016]
Experience + past collaboration approaches: finding reviewers who have often worked with the author in the past [Yu et al. ICSME2014; Ouni et al. IST2017]
(!) Requesting only experts or active reviewers for a review could potentially burden them
Invited reviewers often consider their workload when accepting new invitations [Ruangwan et al. EMSE2019]

Slide 13

Slide 13 text

Several Reviewer Recommendation Approaches Have Been Developed to Improve the Code Review Process
Expertise/Experience-based approaches: finding reviewers who have reviewed many similar patches in the past [Balachandran ICSE2013; Thongtanunam et al. SANER2015; Zanjani et al. TSE2016; Xia et al. ICSME2016]
Experience + past collaboration approaches: finding reviewers who have often worked with the author in the past [Yu et al. ICSME2014; Ouni et al. IST2017]
(!) Requesting only experts or active reviewers for a review could potentially burden them
Invited reviewers often consider their workload when accepting new invitations [Ruangwan et al. EMSE2019]
At Google, review tasks are assigned in a round-robin manner [Sadowski et al. ICSE2018]

Slide 14

Slide 14 text

WLRRec: Workload-aware Reviewer Recommendation

Slide 15

Slide 15 text

WLRRec: Workload-aware Reviewer Recommendation
Diagram: A new patch

Slide 16

Slide 16 text

WLRRec: Workload-aware Reviewer Recommendation
Diagram: A new patch → Measure reviewer metrics

Slide 17

Slide 17 text

WLRRec: Workload-aware Reviewer Recommendation
Diagram: A new patch → Measure reviewer metrics → A multi-objective evolutionary search (NSGA-II)

Slide 18

Slide 18 text

WLRRec: Workload-aware Reviewer Recommendation
Diagram: A new patch → Measure reviewer metrics → A multi-objective evolutionary search (NSGA-II)
Experience & Activeness, Past Collaboration → Obj 1: Maximize the chance of participating in a review
Workload → Obj 2: Minimize the skewness of the workload

Slide 20

Slide 20 text

WLRRec Uses 4+1 Key Reviewer Metrics
Metric groups: Experience & Activeness; Past Collaboration; Workload

Slide 21

Slide 21 text

WLRRec Uses 4+1 Key Reviewer Metrics
Experience & Activeness: Code Ownership (%commits authored); Reviewing Experience (%patches reviewed); Review Participation Rate (%invitations accepted)
Past Collaboration
Workload

Slide 22

Slide 22 text

WLRRec Uses 4+1 Key Reviewer Metrics
Experience & Activeness: Code Ownership (%commits authored); Reviewing Experience (%patches reviewed); Review Participation Rate (%invitations accepted)
Past Collaboration: Familiarity with the Patch Author (co-reviewing frequency)
Workload

Slide 23

Slide 23 text

WLRRec Uses 4+1 Key Reviewer Metrics
Experience & Activeness: Code Ownership (%commits authored); Reviewing Experience (%patches reviewed); Review Participation Rate (%invitations accepted)
Past Collaboration: Familiarity with the Patch Author (co-reviewing frequency)
Workload: Remaining Reviews (#pending review requests)

Slide 24

Slide 24 text

WLRRec Uses 4+1 Key Reviewer Metrics
Experience & Activeness: Code Ownership (%commits authored); Reviewing Experience (%patches reviewed); Review Participation Rate (%invitations accepted)
Past Collaboration: Familiarity with the Patch Author (co-reviewing frequency)
Workload: Remaining Reviews (#pending review requests)
Fitness function for Obj 1: weighted summation, to identify reviewers with maximum experience, activeness, and past collaboration
Fitness function for Obj 2: Shannon's entropy, to identify reviewers with a minimally skewed workload
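The five metrics could be derived from review history roughly as follows. This is a minimal sketch, not WLRRec's implementation: the `ReviewerHistory` fields, the denominators (for example, total commits or reviewed patches in the relevant part of the code base), and normalizing co-reviewing frequency by the author's total reviews are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class ReviewerHistory:
    """Hypothetical per-reviewer counts; how they are scoped is an assumption."""
    commits_authored: int         # commits this reviewer authored
    patches_reviewed: int         # patches this reviewer reviewed
    invitations_received: int     # review invitations received
    invitations_accepted: int     # review invitations accepted
    co_reviews_with_author: int   # reviews shared with the new patch's author
    pending_review_requests: int  # review requests not yet completed


def ratio(part: int, whole: int) -> float:
    """Safe division: returns 0.0 when there is no history to divide by."""
    return part / whole if whole else 0.0


def reviewer_metrics(h: ReviewerHistory, total_commits: int,
                     total_patches: int, total_author_reviews: int) -> dict:
    """Return the 4+1 metrics; the denominators are illustrative assumptions."""
    return {
        "code_ownership": ratio(h.commits_authored, total_commits),
        "reviewing_experience": ratio(h.patches_reviewed, total_patches),
        "review_participation_rate": ratio(h.invitations_accepted, h.invitations_received),
        "familiarity_with_author": ratio(h.co_reviews_with_author, total_author_reviews),
        # Workload metric is kept as a raw count of pending requests.
        "remaining_reviews": h.pending_review_requests,
    }


# Usage with made-up numbers.
example = reviewer_metrics(ReviewerHistory(3, 10, 8, 6, 4, 2),
                           total_commits=12, total_patches=40, total_author_reviews=16)
```

The first four values feed Obj 1, while the pending-request count is the only input to the workload objective, which is why the slide speaks of "4+1" metrics.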

Slide 25

Slide 25 text

WLRRec identifies reviewers with maximum experience, activeness, and past collaboration (Obj 1)
Example: fitness function for Obj 1

Slide 26

Slide 26 text

WLRRec identifies reviewers with maximum experience, activeness, and past collaboration (Obj 1)
Example: fitness function for Obj 1
Code Ownership: CO_Pick, CO_Hoa, CO_Kla, CO_Aditya
Rev. Experience: RE_Pick, RE_Hoa, RE_Kla, RE_Aditya
Rev. Participation: RP_Pick, RP_Hoa, RP_Kla, RP_Aditya
Fam. w/ Patch Author: FP_Pick, FP_Hoa, FP_Kla, FP_Aditya

Slide 27

Slide 27 text

WLRRec identifies reviewers with maximum experience, activeness, and past collaboration (Obj 1)
Example: fitness function for Obj 1
Code Ownership: CO_Pick, CO_Hoa, CO_Kla, CO_Aditya
Rev. Experience: RE_Pick, RE_Hoa, RE_Kla, RE_Aditya
Rev. Participation: RP_Pick, RP_Hoa, RP_Kla, RP_Aditya
Fam. w/ Patch Author: FP_Pick, FP_Hoa, FP_Kla, FP_Aditya
Weighted Sum: Score_Pick, Score_Hoa, Score_Kla, Score_Aditya

Slide 28

Slide 28 text

WLRRec identifies reviewers with maximum experience, activeness, and past collaboration (Obj 1)
Example: fitness function for Obj 1
Code Ownership: CO_Pick, CO_Hoa, CO_Kla, CO_Aditya
Rev. Experience: RE_Pick, RE_Hoa, RE_Kla, RE_Aditya
Rev. Participation: RP_Pick, RP_Hoa, RP_Kla, RP_Aditya
Fam. w/ Patch Author: FP_Pick, FP_Hoa, FP_Kla, FP_Aditya
Weighted Sum: Score_Pick, Score_Hoa, Score_Kla, Score_Aditya
Solution candidate

Slide 29

Slide 29 text

WLRRec identifies reviewers with maximum experience, activeness, and past collaboration (Obj 1)
Example: fitness function for Obj 1
Code Ownership: CO_Pick, CO_Hoa, CO_Kla, CO_Aditya
Rev. Experience: RE_Pick, RE_Hoa, RE_Kla, RE_Aditya
Rev. Participation: RP_Pick, RP_Hoa, RP_Kla, RP_Aditya
Fam. w/ Patch Author: FP_Pick, FP_Hoa, FP_Kla, FP_Aditya
Weighted Sum: Score_Pick, Score_Hoa, Score_Kla, Score_Aditya
Solution candidate: Pick and Kla
Objective 1 score = Score_Pick + Score_Kla
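The Obj 1 fitness shown above can be sketched as a per-reviewer weighted sum, added up over the reviewers in a solution candidate. The weights are not given on the slides, so the equal weights here are a hypothetical placeholder, as are the metric values for Pick, Hoa, Kla, and Aditya.

```python
# Hypothetical equal weights; the slides do not state the weights WLRRec uses.
DEFAULT_WEIGHTS = {
    "code_ownership": 0.25,
    "reviewing_experience": 0.25,
    "review_participation_rate": 0.25,
    "familiarity_with_author": 0.25,
}


def weighted_score(metrics: dict, weights: dict = DEFAULT_WEIGHTS) -> float:
    """Weighted sum of one reviewer's Obj 1 metrics (Score_r on the slide)."""
    return sum(weight * metrics[name] for name, weight in weights.items())


def objective1(candidate, metrics_by_reviewer: dict,
               weights: dict = DEFAULT_WEIGHTS) -> float:
    """Obj 1 fitness: total weighted score of the reviewers in the candidate."""
    return sum(weighted_score(metrics_by_reviewer[r], weights) for r in candidate)


def _metrics(co, re, rp, fp):
    """Helper to build a metric dict from CO, RE, RP, FP values."""
    return {"code_ownership": co, "reviewing_experience": re,
            "review_participation_rate": rp, "familiarity_with_author": fp}


# Made-up metric values for the four reviewers on the slide.
metrics_by_reviewer = {
    "Pick": _metrics(0.5, 0.5, 0.5, 0.5),
    "Hoa": _metrics(0.25, 0.5, 0.25, 0.0),
    "Kla": _metrics(0.25, 0.25, 0.75, 0.25),
    "Aditya": _metrics(0.0, 0.25, 0.5, 0.25),
}

# Candidate {Pick, Kla}: Score_Pick (0.5) + Score_Kla (0.375).
obj1_value = objective1(("Pick", "Kla"), metrics_by_reviewer)
```

A plain weighted sum keeps the objective cheap to evaluate, which matters because an evolutionary search scores many candidates per generation.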

Slide 30

Slide 30 text

WLRRec identifies reviewers with a minimally skewed workload (Obj 2)
Example: fitness function for Obj 2

Slide 31

Slide 31 text

WLRRec identifies reviewers with a minimally skewed workload (Obj 2)
Example: fitness function for Obj 2
#Pending review requests per reviewer

Slide 32

Slide 32 text

WLRRec identifies reviewers with a minimally skewed workload (Obj 2)
Example: fitness function for Obj 2
#Pending review requests per reviewer
Solution candidate

Slide 33

Slide 33 text

WLRRec identifies reviewers with a minimally skewed workload (Obj 2)
Example: fitness function for Obj 2
#Pending review requests per reviewer
Solution candidate
Total workload

Slide 34

Slide 34 text

WLRRec identifies reviewers with a minimally skewed workload (Obj 2)
Example: fitness function for Obj 2
#Pending review requests per reviewer
Solution candidate
Total workload
Objective 2 score (Shannon's entropy):
$$\frac{1}{\log_2 4}\left(\frac{5}{10}\log_2\frac{5}{10} + 2\cdot\frac{1}{10}\log_2\frac{1}{10} + \frac{2}{10}\log_2\frac{2}{10}\right) \approx -0.81$$

Slide 35

Slide 35 text

WLRRec identifies reviewers with a minimally skewed workload (Obj 2)
Example: fitness function for Obj 2
#Pending review requests per reviewer
Solution candidate
Total workload
Objective 2 score (Shannon's entropy):
$$\frac{1}{\log_2 4}\left(\frac{5}{10}\log_2\frac{5}{10} + 2\cdot\frac{1}{10}\log_2\frac{1}{10} + \frac{2}{10}\log_2\frac{2}{10}\right) \approx -0.81$$
The lower the score, the less skewed the workload (i.e., the better the distribution of workload)
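The Obj 2 arithmetic above can be sketched as follows, following the slide's formula: the sum of p·log2(p) over reviewers, divided by log2 of the number of reviewers. That is minus the normalized Shannon entropy, so scores fall in [-1, 0] and a lower score means a more even workload. How "total workload" is formed is an assumption here: the hypothetical `total_workload` helper adds one review (the new patch) to each selected reviewer's pending requests.

```python
import math


def total_workload(pending: dict, candidate) -> list:
    """Hypothetical: assigning the new patch adds one review per selected reviewer."""
    return [count + (1 if name in candidate else 0) for name, count in pending.items()]


def objective2(workloads) -> float:
    """Normalized sum of p*log2(p) over reviewers (minus the normalized entropy).

    Ranges from -1 (perfectly even workload) to 0 (all work on one reviewer).
    """
    n = len(workloads)
    total = sum(workloads)
    if n < 2 or total == 0:
        return 0.0  # assumption: entropy normalization is undefined here
    acc = 0.0
    for w in workloads:
        if w > 0:  # convention: 0 * log2(0) contributes 0
            p = w / total
            acc += p * math.log2(p)
    return acc / math.log2(n)


# Made-up pending requests; the candidate selects Pick and Kla.
pending = {"Pick": 4, "Hoa": 1, "Kla": 1, "Aditya": 2}
obj2_value = objective2(total_workload(pending, {"Pick", "Kla"}))
```

Because the search minimizes this score, candidates that route the new patch to lightly loaded reviewers are preferred, which is the workload-balancing behaviour the slide describes.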

Slide 36

Slide 36 text

WLRRec selects the solution that is closest to the reference point
Solutions: S1, S2, S3, S4

Slide 37

Slide 37 text

WLRRec selects the solution that is closest to the reference point
Pareto-optimal solutions (sets of selected reviewers) generated by NSGA-II: S1, S2, S3, S4
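"Pareto-optimal" here means that no other candidate is at least as good on both objectives and strictly better on one. The sketch below shows only that non-domination test, with Obj 1 maximized and Obj 2 minimized; it is not NSGA-II itself, which additionally sorts the population into successive fronts, uses crowding distance, and applies selection, crossover, and mutation. The objective values are made up.

```python
def dominates(a, b) -> bool:
    """True if solution a Pareto-dominates b.

    Each solution is an (obj1, obj2) pair: obj1 is maximized, obj2 is minimized.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(solutions):
    """Keep the solutions that no other solution dominates (the first front)."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


# Hypothetical (obj1, obj2) scores for four candidate reviewer sets.
candidates = [(0.9, -0.5), (0.7, -0.9), (0.6, -0.6), (0.4, -1.0)]
front = pareto_front(candidates)  # (0.6, -0.6) is dominated by (0.7, -0.9)
```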

Slide 38

Slide 38 text

WLRRec selects the solution that is closest to the reference point
Pareto-optimal solutions (sets of selected reviewers) generated by NSGA-II: S1, S2, S3, S4
Figure (The Knee Point Approach): S1 to S4 plotted against Objective 1 (maximize the chance of participating in a review) and Objective 2 (minimize the skewness of the workload distribution), with the distance Dist(Si) from each solution to a reference point

Slide 39

Slide 39 text

WLRRec selects the solution that is closest to the reference point
Pareto-optimal solutions (sets of selected reviewers) generated by NSGA-II: S1, S2, S3, S4
Figure (The Knee Point Approach): S1 to S4 plotted against Objective 1 (maximize the chance of participating in a review) and Objective 2 (minimize the skewness of the workload distribution), with the distance Dist(Si) from each solution to a reference point
Measure the distance between each solution and the reference point

Slide 40

Slide 40 text

WLRRec selects the solution that is closest to the reference point
Pareto-optimal solutions (sets of selected reviewers) generated by NSGA-II: S1, S2, S3, S4
Figure (The Knee Point Approach): S1 to S4 plotted against Objective 1 (maximize the chance of participating in a review) and Objective 2 (minimize the skewness of the workload distribution), with the distance Dist(Si) from each solution to a reference point
Measure the distance between each solution and the reference point
Select S3, as it has the shortest distance
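The knee-point selection can be sketched as "pick the front member with the smallest distance to a reference point". The slides do not define the reference point or the distance, so three choices below are assumptions: objectives are min-max normalized over the front, the reference point is the ideal point (best Obj 1, best Obj 2), and distance is Euclidean.

```python
import math


def select_knee_point(front):
    """Return the (obj1, obj2) solution closest to an assumed ideal point.

    obj1 is maximized and obj2 is minimized; both are min-max normalized.
    """
    obj1 = [s[0] for s in front]
    obj2 = [s[1] for s in front]
    span1 = (max(obj1) - min(obj1)) or 1.0  # avoid division by zero
    span2 = (max(obj2) - min(obj2)) or 1.0

    def distance(s):
        # Normalized gaps to the best obj1 and the best obj2 on the front.
        gap1 = (max(obj1) - s[0]) / span1
        gap2 = (s[1] - min(obj2)) / span2
        return math.hypot(gap1, gap2)

    return min(front, key=distance)


# Hypothetical Pareto front; the middle solution balances both objectives.
chosen = select_knee_point([(0.9, -0.5), (0.7, -0.9), (0.4, -1.0)])
```

The two extreme solutions each sit at normalized distance 1 from the ideal point, while (0.7, -0.9) is about 0.45 away, so the balanced trade-off wins, mirroring how S3 is chosen on the slide.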

Slide 41

Slide 41 text

How well can our WLRRec (a multi-objective approach) recommend reviewers for a newly submitted patch?

Slide 42

Slide 42 text

How well can our WLRRec (a multi-objective approach) recommend reviewers for a newly submitted patch?
Datasets

Slide 43

Slide 43 text

How well can our WLRRec (a multi-objective approach) recommend reviewers for a newly submitted patch?
Datasets: 36K patches and 2K reviewers; 65K patches and 1.2K reviewers; 108K patches and 3.7K reviewers; 19K patches and 410 reviewers

Slide 44

Slide 44 text

How well can our WLRRec (a multi-objective approach) recommend reviewers for a newly submitted patch?
Datasets: 36K patches and 2K reviewers; 65K patches and 1.2K reviewers; 108K patches and 3.7K reviewers; 19K patches and 410 reviewers
Investigation

Slide 45

Slide 45 text

How well can our WLRRec (a multi-objective approach) recommend reviewers for a newly submitted patch?
Datasets: 36K patches and 2K reviewers; 65K patches and 1.2K reviewers; 108K patches and 3.7K reviewers; 19K patches and 410 reviewers
Investigation:
Single-objective vs. multi-objective: Genetic Algorithm (GA) optimizing Obj 1 (maximize the chance of participating in a review); GA optimizing Obj 2 (minimize the skewed workload)

Slide 46

Slide 46 text

How well can our WLRRec (a multi-objective approach) recommend reviewers for a newly submitted patch?
Datasets: 36K patches and 2K reviewers; 65K patches and 1.2K reviewers; 108K patches and 3.7K reviewers; 19K patches and 410 reviewers
Investigation:
Single-objective vs. multi-objective: Genetic Algorithm (GA) optimizing Obj 1 (maximize the chance of participating in a review); GA optimizing Obj 2 (minimize the skewed workload)
NSGA-II vs. other multi-objective algorithms: Multi-Objective Cellular Genetic Algorithm (MOCell); Strength Pareto Evolutionary Algorithm (SPEA2)

Slide 47

Slide 47 text

How well can our WLRRec (a multi-objective approach) recommend reviewers for a newly submitted patch?
Datasets: 36K patches and 2K reviewers; 65K patches and 1.2K reviewers; 108K patches and 3.7K reviewers; 19K patches and 410 reviewers
Investigation:
Single-objective vs. multi-objective: Genetic Algorithm (GA) optimizing Obj 1 (maximize the chance of participating in a review); GA optimizing Obj 2 (minimize the skewed workload)
NSGA-II vs. other multi-objective algorithms: Multi-Objective Cellular Genetic Algorithm (MOCell); Strength Pareto Evolutionary Algorithm (SPEA2)
Performance measures

Slide 48

Slide 48 text

How well can our WLRRec (a multi-objective approach) recommend reviewers for a newly submitted patch?
Datasets: 36K patches and 2K reviewers; 65K patches and 1.2K reviewers; 108K patches and 3.7K reviewers; 19K patches and 410 reviewers
Investigation:
Single-objective vs. multi-objective: Genetic Algorithm (GA) optimizing Obj 1 (maximize the chance of participating in a review); GA optimizing Obj 2 (minimize the skewed workload)
NSGA-II vs. other multi-objective algorithms: Multi-Objective Cellular Genetic Algorithm (MOCell); Strength Pareto Evolutionary Algorithm (SPEA2)
Performance measures: Precision, Recall, F-Measure, Hypervolume

Slide 49

Slide 49 text

How well can our WLRRec (a multi-objective approach) recommend reviewers for a newly submitted patch?
Datasets: 36K patches and 2K reviewers; 65K patches and 1.2K reviewers; 108K patches and 3.7K reviewers; 19K patches and 410 reviewers
Investigation:
Single-objective vs. multi-objective: Genetic Algorithm (GA) optimizing Obj 1 (maximize the chance of participating in a review); GA optimizing Obj 2 (minimize the skewed workload)
NSGA-II vs. other multi-objective algorithms: Multi-Objective Cellular Genetic Algorithm (MOCell); Strength Pareto Evolutionary Algorithm (SPEA2)
Performance measures: Precision, Recall, F-Measure, Hypervolume
$$\%Gain = \frac{WLRRec_{pm} - Y_{pm}}{Y_{pm}}$$
where pm is a performance measure and Y is an alternative approach

Slide 50

Slide 50 text

Our WLRRec outperforms the single-objective approaches
Charts: %Gain of WLRRec vs. GA-Obj1 and %Gain of WLRRec vs. GA-Obj2, for Precision, Recall, and F-Measure

Slide 51

Slide 51 text

Our WLRRec outperforms the single-objective approaches
Charts: %Gain of WLRRec vs. GA-Obj1 and %Gain of WLRRec vs. GA-Obj2, for Precision, Recall, and F-Measure
WLRRec achieves 88%-142% higher precision and 111%-178% higher recall than GA-Obj1
WLRRec achieves 55%-101% higher precision and 96%-138% higher recall than GA-Obj2

Slide 52

Slide 52 text

Our WLRRec outperforms the single-objective approaches
Charts: %Gain of WLRRec vs. GA-Obj1 and %Gain of WLRRec vs. GA-Obj2, for Precision, Recall, and F-Measure
Considering multiple objectives at the same time allows us to find reviewers more effectively

Slide 53

Slide 53 text

Our WLRRec with NSGA-II is better than the other two multi-objective approaches
Charts: %Gain of WLRRec with NSGA-II vs. MOCell and vs. SPEA2, for Precision, Recall, F-Measure, and Hypervolume

Slide 54

Slide 54 text

Our WLRRec with NSGA-II is better than the other two multi-objective approaches
Charts: %Gain of WLRRec with NSGA-II vs. MOCell and vs. SPEA2, for Precision, Recall, F-Measure, and Hypervolume
WLRRec achieves 31%-95% higher F-measure and 21%-31% higher hypervolume than MOCell

Slide 55

Slide 55 text

Our WLRRec with NSGA-II is better than the other two multi-objective approaches
Charts: %Gain of WLRRec with NSGA-II vs. MOCell and vs. SPEA2, for Precision, Recall, F-Measure, and Hypervolume
WLRRec achieves 31%-95% higher F-measure and 21%-31% higher hypervolume than MOCell
WLRRec achieves 19%-95% higher F-measure and 29%-47% higher hypervolume than SPEA2

Slide 56

Slide 56 text

Our WLRRec with NSGA-II is better than the other two multi-objective approaches
Charts: %Gain of WLRRec with NSGA-II vs. MOCell and vs. SPEA2, for Precision, Recall, F-Measure, and Hypervolume
The NSGA-II algorithm leveraged by our WLRRec is an appropriate multi-objective approach for finding solutions in this problem domain

Slide 57

Slide 57 text


Slide 58

Slide 58 text

Several Reviewer Recommendation Approaches Have Been Developed to Improve the Code Review Process
Expertise/Experience-based approaches: finding reviewers who have reviewed many similar patches in the past [Balachandran ICSE2013; Thongtanunam et al. SANER2015; Zanjani et al. TSE2016; Xia et al. ICSME2016]
Experience + past collaboration approaches: finding reviewers who have often worked with the author in the past [Yu et al. ICSME2014; Ouni et al. IST2017]
(!) Requesting only experts or active reviewers for a review could potentially burden them
Invited reviewers often consider their workload when accepting new invitations [Ruangwan et al. EMSE2019]
At Google, review tasks are assigned in a round-robin manner [Sadowski et al. ICSE2018]

Slide 59

Slide 59 text

Several Reviewer Recommendation Approaches Have Been Developed to Improve the Code Review Process
Expertise/Experience-based approaches: finding reviewers who have reviewed many similar patches in the past [Balachandran ICSE2013; Thongtanunam et al. SANER2015; Zanjani et al. TSE2016; Xia et al. ICSME2016]
Experience + past collaboration approaches: finding reviewers who have often worked with the author in the past [Yu et al. ICSME2014; Ouni et al. IST2017]
(!) Requesting only experts or active reviewers for a review could potentially burden them
Invited reviewers often consider their workload when accepting new invitations [Ruangwan et al. EMSE2019]
At Google, review tasks are assigned in a round-robin manner [Sadowski et al. ICSE2018]
WLRRec: Workload-aware Reviewer Recommendation
Diagram: A new patch → NSGA-II
Experience & Activeness, Past Collaboration → Obj 1: Maximize the chance of participating in a review
Workload → Obj 2: Minimize the skewness of the reviewing workload distribution

Slide 60

Slide 60 text

Several Reviewer Recommendation Approaches Have Been Developed to Improve the Code Review Process
Expertise/Experience-based approaches: finding reviewers who have reviewed many similar patches in the past [Balachandran ICSE2013; Thongtanunam et al. SANER2015; Zanjani et al. TSE2016; Xia et al. ICSME2016]
Experience + past collaboration approaches: finding reviewers who have often worked with the author in the past [Yu et al. ICSME2014; Ouni et al. IST2017]
(!) Requesting only experts or active reviewers for a review could potentially burden them
Invited reviewers often consider their workload when accepting new invitations [Ruangwan et al. EMSE2019]
At Google, review tasks are assigned in a round-robin manner [Sadowski et al. ICSE2018]
WLRRec: Workload-aware Reviewer Recommendation
Diagram: A new patch → NSGA-II
Experience & Activeness, Past Collaboration → Obj 1: Maximize the chance of participating in a review
Workload → Obj 2: Minimize the skewness of the reviewing workload distribution
Our WLRRec outperforms the single-objective approaches
Charts: %Gain of WLRRec vs. GA-Obj1 and %Gain of WLRRec vs. GA-Obj2, for Precision, Recall, and F-Measure
WLRRec achieves 88%-142% higher precision and 111%-178% higher recall than GA-Obj1
WLRRec achieves 55%-101% higher precision and 96%-138% higher recall than GA-Obj2
Our WLRRec with NSGA-II is better than the other two multi-objective approaches
Charts: %Gain of WLRRec with NSGA-II vs. MOCell and vs. SPEA2, for Precision, Recall, F-Measure, and Hypervolume
NSGA-II achieves 31%-95% higher F-measure and 21%-31% higher hypervolume than MOCell
NSGA-II achieves 19%-95% higher F-measure and 29%-47% higher hypervolume than SPEA2
Our WLRRec outperforms the four alternative approaches

Slide 61

Slide 61 text

Several Reviewer Recommendation Approaches Have Been Developed to Improve the Code Review Process
Expertise/Experience-based approaches: finding reviewers who have reviewed many similar patches in the past [Balachandran ICSE2013; Thongtanunam et al. SANER2015; Zanjani et al. TSE2016; Xia et al. ICSME2016]
Experience + past collaboration approaches: finding reviewers who have often worked with the author in the past [Yu et al. ICSME2014; Ouni et al. IST2017]
(!) Requesting only experts or active reviewers for a review could potentially burden them
Invited reviewers often consider their workload when accepting new invitations [Ruangwan et al. EMSE2019]
At Google, review tasks are assigned in a round-robin manner [Sadowski et al. ICSE2018]
WLRRec: Workload-aware Reviewer Recommendation
Diagram: A new patch → NSGA-II
Experience & Activeness, Past Collaboration → Obj 1: Maximize the chance of participating in a review
Workload → Obj 2: Minimize the skewness of the reviewing workload distribution
Our WLRRec outperforms the single-objective approaches
Charts: %Gain of WLRRec vs. GA-Obj1 and %Gain of WLRRec vs. GA-Obj2, for Precision, Recall, and F-Measure
WLRRec achieves 88%-142% higher precision and 111%-178% higher recall than GA-Obj1
WLRRec achieves 55%-101% higher precision and 96%-138% higher recall than GA-Obj2
Our WLRRec with NSGA-II is better than the other two multi-objective approaches
Charts: %Gain of WLRRec with NSGA-II vs. MOCell and vs. SPEA2, for Precision, Recall, F-Measure, and Hypervolume
NSGA-II achieves 31%-95% higher F-measure and 21%-31% higher hypervolume than MOCell
NSGA-II achieves 19%-95% higher F-measure and 29%-47% higher hypervolume than SPEA2
Our WLRRec outperforms the four alternative approaches
Our work highlights the potential of leveraging multi-objective algorithms that consider review workload and other important information to find reviewers
@patanamon http://patanamon.com