Workload-Aware Reviewer Recommendation using a Multi-objective Search-Based Approach
This is a paper presented at PROMISE2020. This paper proposes a new technique to recommend reviewers based on reviewer experience, activeness, past collaboration, and reviewing workload. The presentation video is here: https://youtu.be/zG4y_eXQXXU
Shouldn't console.log() call the toString() method (where appropriate) on objects? Identifying a defect I think it’s better to do var s = "{}" console.log(s) Suggesting a solution Author Reviewer Reviewer Code Review: A method to improve the overall quality of a patch through manual examination A code review tool (Ex. Gerrit) A patch 2
Effective code review requires active participation
[Balachandran ICSE2013; Rigby and Storey ICSE2011] Shouldn't console.log() call the toString() method (where appropriate) on objects? Identifying a defect I think it’s better to do var s = "{}" console.log(s) Suggesting a solution Author Reviewer Reviewer Code Review: A method to improve the overall quality of a patch through manual examination A code review tool (Ex. Gerrit) A patch 2
Effective code review requires active participation
[Balachandran ICSE2013; Rigby and Storey ICSE2011] A patch tends to be less defective when it was reviewed and discussed extensively by many reviewers
[Thongtanunam et al MSR2015; Kononenko et al. ICSME2015] Shouldn't console.log() call the toString() method (where appropriate) on objects? Identifying a defect I think it’s better to do var s = "{}" console.log(s) Suggesting a solution Author Reviewer Reviewer Code Review: A method to improve the overall quality of a patch through manual examination A code review tool (Ex. Gerrit) A patch 2
Effective code review requires active participation
[Balachandran ICSE2013; Rigby and Storey ICSE2011] A patch tends to be less defective when it was reviewed and discussed extensively by many reviewers
[Thongtanunam et al MSR2015; Kononenko et al. ICSME2015] Shouldn't console.log() call the toString() method (where appropriate) on objects? Identifying a defect I think it’s better to do var s = "{}" console.log(s) Suggesting a solution Author Reviewer Reviewer Code Review: A method to improve the overall quality of a patch through manual examination A code review tool (Ex. Gerrit) A patch Finding suitable reviewers is not a trivial task
Several Reviewer Recommendation Approaches have been Developed to Improve Code Review Process Expertise/Experience-based Approaches Finding reviewers who review many similar patches in the past
[Balachandran ICSE2013, Thongtanunam et al SANER2015, Zanjani et al TSE2016, Xia et al ICSME2016] 3
Several Reviewer Recommendation Approaches have been Developed to Improve Code Review Process Expertise/Experience-based Approaches Finding reviewers who review many similar patches in the past
[Balachandran ICSE2013, Thongtanunam et al SANER2015, Zanjani et al TSE2016, Xia et al ICSME2016] Exp. + Past Collaboration Approaches Finding reviewers who often work with the author in the past
Several Reviewer Recommendation Approaches have been Developed to Improve Code Review Process Expertise/Experience-based Approaches Finding reviewers who review many similar patches in the past
[Balachandran ICSE2013, Thongtanunam et al SANER2015, Zanjani et al TSE2016, Xia et al ICSME2016] Exp. + Past Collaboration Approaches Finding reviewers who often work with the author in the past
[Yu et al ICSME2014, Ouni et al IST2017] ! Requesting only experts or active reviewers for a review could potentially burden them 3
Several Reviewer Recommendation Approaches have been Developed to Improve Code Review Process Expertise/Experience-based Approaches Finding reviewers who review many similar patches in the past
[Balachandran ICSE2013, Thongtanunam et al SANER2015, Zanjani et al TSE2016, Xia et al ICSME2016] Exp. + Past Collaboration Approaches Finding reviewers who often work with the author in the past
[Yu et al ICSME2014, Ouni et al IST2017] ! Requesting only experts or active reviewers for a review could potentially burden them Invited reviewers often consider their workload when accepting new invitations
Several Reviewer Recommendation Approaches have been Developed to Improve Code Review Process Expertise/Experience-based Approaches Finding reviewers who review many similar patches in the past
[Balachandran ICSE2013, Thongtanunam et al SANER2015, Zanjani et al TSE2016, Xia et al ICSME2016] Exp. + Past Collaboration Approaches Finding reviewers who often work with the author in the past
[Yu et al ICSME2014, Ouni et al IST2017] ! Requesting only experts or active reviewers for a review could potentially burden them Invited reviewers often consider their workload when accepting new invitations
[Ruangwan et al EMSE 2019] At Google, review tasks are assigned in a round-robin manner
WLRRec: Workload-aware Reviewer Recommendation A multi-objective evolutionary search (NSGA-II) A new patch Experience & Activeness Past Collaboration Obj 1: Maximize the chance of participating a review Workload Obj 2: Mimize the Skewness of the Workload 4 Measure Reviewer Metrics
WLRRec: Workload-aware Reviewer Recommendation A multi-objective evolutionary search (NSGA-II) A new patch Experience & Activeness Past Collaboration Obj 1: Maximize the chance of participating a review Workload Obj 2: Mimize the Skewness of the Workload 4 Measure Reviewer Metrics
%Invitations Accepted Familiarity with the Patch Author
Co-reviewing Freq. Remaining Reviews
#Pending Review Requests Fitness func. for Obj 1: Weighted Summation Identify reviewers with maximum experience, activeness and past collaboration Fitness func. for Obj 2: Shanon’s Entropy Identify reviewers with minimal skewed workload 6
Pareto optimal solutions of selected reviewers generated by NSGA-II WLRRec selects the solution that is closet to the reference point S1 S2 S3 S4 S1 S2 S3 Objective 1: Maximize chance of participating a review
S4 Objective 2: Minimize skewness of the workload distribution Reference point Dist(S4 ) Dist(S 3) Dist(S2) Dist(S2) The Knee Point Approach 9
Pareto optimal solutions of selected reviewers generated by NSGA-II WLRRec selects the solution that is closet to the reference point S1 S2 S3 S4 S1 S2 S3 Objective 1: Maximize chance of participating a review
S4 Objective 2: Minimize skewness of the workload distribution Reference point Dist(S4 ) Dist(S 3) Dist(S2) Dist(S2) Measure the distance between the solution and the reference point The Knee Point Approach 9
Pareto optimal solutions of selected reviewers generated by NSGA-II WLRRec selects the solution that is closet to the reference point S1 S2 S3 S4 S1 S2 S3 Objective 1: Maximize chance of participating a review
S4 Objective 2: Minimize skewness of the workload distribution Reference point Dist(S4 ) Dist(S 3) Dist(S2) Dist(S2) Measure the distance between the solution and the reference point The Knee Point Approach Select S3 as it has the closest distance 9
Our WLRRec outperforms the single-objective approaches 0% 45% 90% 135% 180% Precision Recall F1 0% 35% 70% 105% 140% Precision Recall F1 %Gain WLRRec vs GA-Obj1 Precision Recall F-Measure Precision Recall F-Measure %Gain WLRRec vs GA-Obj2 Considering multiple objectives at the same time allows us to better find reviewers 11
Our WLRRec with NSGA-II is better than other two multi-objective approaches 0% 25% 50% 75% 100% Precision Recall F1 HV 0% 25% 50% 75% 100% Precision Recall F1 HV %Gain WLRRec with NSGA-II vs MOCell Precision Recall F-Measure %Gain WLRRec with NSGA-II vs SPEA2 Hypervolume Precision Recall F-Measure Hypervolume 12
Our WLRRec with NSGA-II is better than other two multi-objective approaches 0% 25% 50% 75% 100% Precision Recall F1 HV 0% 25% 50% 75% 100% Precision Recall F1 HV %Gain WLRRec with NSGA-II vs MOCell Precision Recall F-Measure WLRRec achieves 31%-95% higher F-measure,
21%-31% higher hypervolume than MOCell %Gain WLRRec with NSGA-II vs SPEA2 Hypervolume Precision Recall F-Measure Hypervolume 12
Our WLRRec with NSGA-II is better than other two multi-objective approaches 0% 25% 50% 75% 100% Precision Recall F1 HV 0% 25% 50% 75% 100% Precision Recall F1 HV %Gain WLRRec with NSGA-II vs MOCell Precision Recall F-Measure WLRRec achieves 31%-95% higher F-measure,
21%-31% higher hypervolume than MOCell WLRRec achieves 19%-95% higher F-measure,
29%-47% higher hypervolume than SPEA2 %Gain WLRRec with NSGA-II vs SPEA2 Hypervolume Precision Recall F-Measure Hypervolume 12
Our WLRRec with NSGA-II is better than other two multi-objective approaches 0% 25% 50% 75% 100% Precision Recall F1 HV 0% 25% 50% 75% 100% Precision Recall F1 HV %Gain WLRRec with NSGA-II vs MOCell Precision Recall F-Measure The NSGA-II algorithm leveraged by our WLRRec is an appropriate multi-objective approach to find solutions in this problem domain %Gain WLRRec with NSGA-II vs SPEA2 Hypervolume Precision Recall F-Measure Hypervolume 12
13 Several Reviewer Recommendation Approaches have been Developed to Improve Code Review Process Expertise/Experience-based Approaches Finding reviewers who review many similar patches in the past
[Balachandran ICSE2013, Thongtanunam et al SANER2015, Zanjani et al TSE2016, Xia et al ICSME2016] Exp. + Past Collaboration Approaches Finding reviewers who often work with the author in the past
[Yu et al ICSME2014, Ouni et al IST2017] ! Requesting only experts or active reviewers for a review could potentially burden them Invited reviewers often consider their workload when accepting new invitations
[Ruangwan et al EMSE 2019] At Google, review tasks are assigned in a round-robin manner
13 Several Reviewer Recommendation Approaches have been Developed to Improve Code Review Process Expertise/Experience-based Approaches Finding reviewers who review many similar patches in the past
[Balachandran ICSE2013, Thongtanunam et al SANER2015, Zanjani et al TSE2016, Xia et al ICSME2016] Exp. + Past Collaboration Approaches Finding reviewers who often work with the author in the past
[Yu et al ICSME2014, Ouni et al IST2017] ! Requesting only experts or active reviewers for a review could potentially burden them Invited reviewers often consider their workload when accepting new invitations
[Ruangwan et al EMSE 2019] At Google, review tasks are assigned in a round-robin manner
[Sadowski et al. ICSE 2018] WLRRec: Workload-aware Reviewer Recommendation NSGA-II A new patch Experience & Activeness Past Collaboration Obj 1: Maximize the chance of participating a review Workload Obj 2: Mimize the Skewness of the Reviewing Workload Distribution
13 Several Reviewer Recommendation Approaches have been Developed to Improve Code Review Process Expertise/Experience-based Approaches Finding reviewers who review many similar patches in the past
[Balachandran ICSE2013, Thongtanunam et al SANER2015, Zanjani et al TSE2016, Xia et al ICSME2016] Exp. + Past Collaboration Approaches Finding reviewers who often work with the author in the past
[Yu et al ICSME2014, Ouni et al IST2017] ! Requesting only experts or active reviewers for a review could potentially burden them Invited reviewers often consider their workload when accepting new invitations
[Ruangwan et al EMSE 2019] At Google, review tasks are assigned in a round-robin manner
[Sadowski et al. ICSE 2018] WLRRec: Workload-aware Reviewer Recommendation NSGA-II A new patch Experience & Activeness Past Collaboration Obj 1: Maximize the chance of participating a review Workload Obj 2: Mimize the Skewness of the Reviewing Workload Distribution Our WLRRec outperforms the single-objective approaches 0% 45% 90% 135% 180% Precision Recall F1 0% 35% 70% 105% 140% Precision Recall F1 %Gain WLRRec vs GA-Obj1 Precision Recall F-Measure Precision Recall F-Measure %Gain WLRRec vs GA-Obj2 WLRRec is 88%-142% higher precision,
111%-178% higher recall than GA-Obj1 WLRRec is 55%-101% higher precision,
96%-138% higher recall than GA-Obj2 Our WLRRec with NSGA-II is better than other two multi-objective approaches 0% 25% 50% 75% 100% Precision Recall F1 HV 0% 25% 50% 75% 100% Precision Recall F1 HV %Gain WLRRec with NSGA-II vs MOCell Precision Recall F-Measure NSGA-II is 31%-95% higher F-measure, NSGA-II is 19%-95% higher F-measure,
%Gain WLRRec with NSGA-II vs SPEA2 Hypervolume Precision Recall F-Measure Hypervolume Our WLRRec with NSGA-II is better than other two multi-objective approaches 0% 25% 50% 75% 100% Precision Recall F1 HV 0% 25% 50% 75% 100% Precision Recall F1 HV %Gain WLRRec with NSGA-II vs MOCell Precision Recall F-Measure NSGA-II is 31%-95% higher F-measure,
21%-31% higher hypervolume than MOCell NSGA-II is 19%-95% higher F-measure,
29%-47% higher hypervolume than SPEA2 %Gain WLRRec with NSGA-II vs SPEA2 Hypervolume Precision Recall F-Measure Hypervolume Our WLRRec outperforms the four alternative approaches
13 Several Reviewer Recommendation Approaches have been Developed to Improve Code Review Process Expertise/Experience-based Approaches Finding reviewers who review many similar patches in the past
[Balachandran ICSE2013, Thongtanunam et al SANER2015, Zanjani et al TSE2016, Xia et al ICSME2016] Exp. + Past Collaboration Approaches Finding reviewers who often work with the author in the past
[Yu et al ICSME2014, Ouni et al IST2017] ! Requesting only experts or active reviewers for a review could potentially burden them Invited reviewers often consider their workload when accepting new invitations
[Ruangwan et al EMSE 2019] At Google, review tasks are assigned in a round-robin manner
[Sadowski et al. ICSE 2018] WLRRec: Workload-aware Reviewer Recommendation NSGA-II A new patch Experience & Activeness Past Collaboration Obj 1: Maximize the chance of participating a review Workload Obj 2: Mimize the Skewness of the Reviewing Workload Distribution Our WLRRec outperforms the single-objective approaches 0% 45% 90% 135% 180% Precision Recall F1 0% 35% 70% 105% 140% Precision Recall F1 %Gain WLRRec vs GA-Obj1 Precision Recall F-Measure Precision Recall F-Measure %Gain WLRRec vs GA-Obj2 WLRRec is 88%-142% higher precision,
111%-178% higher recall than GA-Obj1 WLRRec is 55%-101% higher precision,
96%-138% higher recall than GA-Obj2 Our WLRRec with NSGA-II is better than other two multi-objective approaches 0% 25% 50% 75% 100% Precision Recall F1 HV 0% 25% 50% 75% 100% Precision Recall F1 HV %Gain WLRRec with NSGA-II vs MOCell Precision Recall F-Measure NSGA-II is 31%-95% higher F-measure, NSGA-II is 19%-95% higher F-measure,
%Gain WLRRec with NSGA-II vs SPEA2 Hypervolume Precision Recall F-Measure Hypervolume Our WLRRec with NSGA-II is better than other two multi-objective approaches 0% 25% 50% 75% 100% Precision Recall F1 HV 0% 25% 50% 75% 100% Precision Recall F1 HV %Gain WLRRec with NSGA-II vs MOCell Precision Recall F-Measure NSGA-II is 31%-95% higher F-measure,
21%-31% higher hypervolume than MOCell NSGA-II is 19%-95% higher F-measure,
29%-47% higher hypervolume than SPEA2 %Gain WLRRec with NSGA-II vs SPEA2 Hypervolume Precision Recall F-Measure Hypervolume Our work highlights the potential of leveraging the multi-objective algorithm that consider review workload and other important information to find reviewers [email protected] @patanamon http://patanamon.com Our WLRRec outperforms the four alternative approaches