
Code review at speed: How can we use data to help developers do code review faster?


Abstract: Code review has become a mandatory practice in the software engineering workflow of many software organizations. While code review has been shown to provide numerous benefits to software teams, it is also considered expensive: because it is manual and human-intensive, it can delay the software development workflow. In this talk, I will summarize some of our work, and that of others, on how we can help developers review effectively and on the remaining challenges in code review that need to be addressed.

The talk was given at the MSR2021 Mini-keynote https://2021.msrconf.org/track/msr-2021-keynotes
The short video presentation is available at https://youtu.be/K9BfndLKDyY


Transcript

  1. Patanamon (Pick) Thongtanunam
    [email protected] @patanamon
    ARC DECRA & Lecturer

    at School of Computing and
    Information Systems (CIS)
    http://patanamon.com
    Code Review at Speed: How can we use data
    to help developers do code review faster?
    1


  2. Create
    tasks
    Write code
    Build & test
    code
    Integrate
    Release/
    Deploy
    A General View of Continuous Integration
    Code Review: A QA practice that manually
    examines a new code change
    Code Review
    A bug is
    here…
    Improve the overall quality of
    software systems

    [Thongtanunam et al., 2015; McIntosh et al., 2016]
    Increase team awareness, transfer
    knowledge & share code ownership

    [Bacchelli and Bird, 2013; Thongtanunam et al., 2016; Sadowski et al., 2018]
    2


  3. Code Review: A QA practice that manually
    examines a new code change
    An author
    A code
    change
    (1) Uploading
    the change
    (2) Inviting
    reviewers
    Reviewers
    (3) Examining
    the change
    (4) Automated
    testing
    The approved change is
    integrated into the
    software system
    Rejected
    Accepted
    Fail
    Changes are abandoned
    A revision is required
    Pass
    A collaborative code review tool
    3


  4. A large number of new code changes can pose
    challenges to perform effective code reviews
    100 - 1,000 reviews were
    performed in a month, and each
    review took 1 day on average

    [Rigby and Bird, 2013]
    [Bar chart] #Reviews/month: ~600, ~400, ~550
    [Thongtanunam and Hassan, 2020]
    4


  5. A large number of new code changes can pose
    challenges to perform effective code reviews
    Non-responding invited reviewers [Ruangwan et al., 2019]
    Challenges:
    5


  6. Reviewers may not respond to the review
    invitation
    An author
    A code
    change
    (1) Uploading
    the change
    (2) Inviting
    reviewers
    Reviewers
    (3) Examining
    the change
    (4) Automated
    testing
    The approved change is
    integrated into the
    software system
    Rejected
    Accepted
    Fail
    Changes are abandoned
    A revision is required
    Pass
    A collaborative code review tool
    6


  7. 16% - 66% of the studied code changes have at least one
    invited reviewer who did not respond to the invitation
    %Non-responding invited
    reviewers in a patch
    The more reviewers were invited, the higher
    the chance of having a non-responding reviewer
    7


  8. Investigating the factors that can be associated
    with the participation decision
    Experience &
    Activeness
    Past
    Collaboration
    Workload
    13 studied metrics
    RespondInvitation ∼ x1 + x2 + … + xn
    Use a non-linear logistic regression model
    Analyze the relationship with the
    likelihood of responding to the invitation
    Code Ownership

    %Commits authored

    Reviewing Experience

    %Patches reviewed

    Review Participation Rate

    %Invitations Accepted
    Familiarity with the
    Patch Author

    Co-reviewing Freq.
    Remaining Reviews

    #Pending Review
    Requests
    5 Significant factors:
    8
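As a concrete (toy) illustration of the modelling step on this slide, the sketch below fits a plain logistic regression by gradient descent, relating one reviewer metric to whether an invitation was responded to. The data, the single feature, and the optimizer are illustrative assumptions; the study itself fits a non-linear logistic regression over 13 metrics.

```python
import math

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Minimal logistic regression via stochastic gradient ascent on the
    log-likelihood. Returns [intercept, w1, ..., wn]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted P(respond)
            err = yi - p
            w[0] += lr * err
            for j, xj in enumerate(xi):
                w[j + 1] += lr * err * xj
    return w

# Toy data: one metric (e.g. past Review Participation Rate) and whether
# the reviewer responded to the invitation (1) or not (0)
X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
y = [0, 0, 0, 1, 1, 1]
w = fit_logistic(X, y)
# A positive coefficient means a higher participation rate is associated
# with a higher likelihood of responding to the invitation
```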


  9. A large number of new code changes can pose
    challenges to perform effective code reviews
    Non-responding invited reviewers [Ruangwan et al., 2019]
    Workload-aware reviewer recommendation [Al-Zubaidi et al., 2020]
    Challenges: (Possible) Solutions:
    9


  10. WLRRec:
    Workload-aware Reviewer Recommendation
    A multi-objective
    evolutionary search
    (NSGA-II)
    Experience &
    Activeness
    Past
    Collaboration
    Obj 1: Maximize the chance of
    participating in a review
    Workload
    Obj 2: Minimize the
    skewness of the workload
    Measure Reviewer
    Metrics
    A new code change
    10
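The core mechanic of a multi-objective evolutionary search such as NSGA-II is retaining candidate solutions that no other candidate beats on both objectives at once. Below is a minimal, self-contained sketch of that non-dominated filter, with made-up (participation chance, workload skew) scores for four hypothetical candidate reviewer sets; it is an illustration of the idea, not the paper's implementation.

```python
def pareto_front(solutions):
    """Keep only non-dominated solutions. Each solution is a tuple
    (obj1, obj2): obj1 = chance of participating in a review (maximize),
    obj2 = skewness of the workload (minimize)."""
    def dominates(a, b):
        # a dominates b if it is no worse on both objectives
        # and strictly better on at least one
        return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o != s)]

# Hypothetical scores for four candidate reviewer sets
candidates = [(0.9, 0.5), (0.7, 0.2), (0.8, 0.4), (0.6, 0.6)]
front = pareto_front(candidates)
# (0.6, 0.6) is dominated by (0.9, 0.5); the other three trade off
# participation chance against workload skew, so all three survive
```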


  11. WLRRec Uses 4+1 Key Reviewer Metrics
    Experience &
    Activeness
    Past
    Collaboration
    Workload
    Code Ownership

    %Commits authored

    Reviewing Experience

    %Patches reviewed

    Review Participation Rate

    %Invitations Accepted
    Familiarity with the
    Patch Author

    Co-reviewing Freq.
    Remaining Reviews

    #Pending Review
    Requests
    Fitness func. for Obj 1:
    Weighted Summation
    Identify reviewers with maximum experience,
    activeness and past collaboration
    Fitness func. for Obj 2:
    Shannon's Entropy
    Identify reviewers with a minimally
    skewed workload
    11


  12. WLRRec identifies reviewers with maximum
    experience, activeness, and past collaboration (Obj 1)
    Example
    Fitness func. for Obj. 1
    Code Ownership COPick COHoa COKla COAditya
    Rev. Experience REPick REHoa REKla REAditya
    Rev. Participate RPPick RPHoa RPKla RPAditya
    Fam. w/ Patch
    Author
    FPPick FPHoa FPKla FPAditya
    Weighted Sum ScorePick ScoreHoa ScoreKla ScoreAditya
    Solution Candidate
    Objective 1 score ScorePick + ScoreKla
    12
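The Objective 1 fitness in the example above is a weighted sum of each selected reviewer's metrics, summed over the candidate solution. The sketch below uses equal weights and hypothetical metric values, since the actual weighting is not shown on the slide.

```python
def obj1_score(solution, metrics, weights):
    """Objective 1 fitness sketch: weighted sum of the experience,
    activeness, and past-collaboration metrics of each reviewer in a
    candidate solution, summed over the solution."""
    def reviewer_score(reviewer):
        return sum(weights[m] * metrics[reviewer][m] for m in weights)
    return sum(reviewer_score(r) for r in solution)

# Hypothetical metric values for two candidate reviewers
weights = {"code_ownership": 0.25, "review_experience": 0.25,
           "participation_rate": 0.25, "author_familiarity": 0.25}
metrics = {
    "Pick": {"code_ownership": 0.40, "review_experience": 0.30,
             "participation_rate": 0.90, "author_familiarity": 0.50},
    "Kla":  {"code_ownership": 0.10, "review_experience": 0.60,
             "participation_rate": 0.70, "author_familiarity": 0.20},
}
# Objective 1 score of the candidate solution {Pick, Kla}:
# Score_Pick + Score_Kla = 0.525 + 0.400
score = obj1_score(["Pick", "Kla"], metrics, weights)
```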


    #Pending Review Requests
    Solution Candidate
    Total Workload
    Objective 2 score (Shannon's entropy)
    WLRRec identifies reviewers with a minimally
    skewed workload (Obj 2)
    Example
    Fitness func. for Obj. 2:
    (1 / log2 4) * ( 5/10 * log2(5/10) + 2 * ( 1/10 * log2(1/10) ) + 2/10 * log2(2/10) ) ≈ -0.81
    The lower the score, the less skewed the
    workload (the better the distribution of workload)
    13
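The worked example above can be reproduced with a short entropy calculation. The function below is a sketch: the normalization by log2 of the number of reviewers and the explicit total_workload parameter are assumptions chosen to match the slide's numbers, not the paper's exact implementation.

```python
import math

def workload_skew_score(pending_reviews, total_workload=None):
    """Objective 2 fitness sketch: Shannon's entropy of each candidate
    reviewer's share of the total pending-review workload, normalized by
    log2(#reviewers). Lower (more negative) = less skewed workload."""
    total = total_workload if total_workload is not None else sum(pending_reviews)
    n = len(pending_reviews)
    # p * log2(p) for each reviewer's share of the total workload
    terms = [(w / total) * math.log2(w / total) for w in pending_reviews if w > 0]
    return sum(terms) / math.log2(n)

# Slide's example: 4 reviewers with 5, 1, 1, and 2 pending review
# requests out of a total workload of 10
score = workload_skew_score([5, 1, 1, 2], total_workload=10)  # ≈ -0.81
```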


  14. Our WLRRec outperforms the single-objective
    approaches
    [Bar charts: %Gain of WLRRec vs. GA-Obj1 and vs. GA-Obj2 in precision, recall, and F-measure]
    WLRRec achieves 88%-142% higher precision and
    111%-178% higher recall than GA-Obj1
    WLRRec achieves 55%-101% higher precision and
    96%-138% higher recall than GA-Obj2
    Considering multiple objectives at the same
    time allows us to better find reviewers
    14


  15. A large number of new code changes can pose
    challenges to perform effective code reviews
    Non-responding invited reviewers [Ruangwan et al., 2019]
    Workload-aware reviewer recommendation [Al-Zubaidi et al., 2020]
    Suboptimal reviewing [Thongtanunam and Hassan, 2020; Chouchen et al., 2021]
    Challenges: (Possible) Solutions:
    15


  16. Reviewers may have subconscious biases due to
    the visible information in a code review tool
    An author
    A code
    change
    (1) Uploading
    the change
    (2) Inviting
    reviewers
    Reviewers
    (3) Examining
    the change
    (4) Automated
    testing
    The approved change is
    integrated into the
    software system
    Rejected
    Accepted
    Fail
    Changes are abandoned
    A revision is required
    Pass
    A collaborative code review tool
    Code review tools often provide a
    transparent environment
    16


  17. Reviewers may have subconscious biases due to
    the visible information in a code review tool
    Ahmed usually
    writes good code
    17


  18. Investigating the signals of visible information that
    are associated with the review decision of a reviewer
    Analyze the relationship with the
    likelihood of giving a positive vote
    Use a mixed-effects logistic regression model
    IsPositiveVote ∼ x1 + x2 + ….. + xn
    + (1 | ReviewerId )
    8 Studied metrics
    Relationship Status Prior Feedback
    Confounding factors:
    Code change characteristics,
    e.g., #Added Lines
    18


  19. In addition to patch characteristics, other visible
    information is associated with the review decision
    Relationship Status
    Prior Feedback
    Patch
    Characteristics
    Explanatory Power
    (Log-likelihood ratio test)
    Association Direction:
    Higher %Reviewed past patches for the patch author → more likely to give a positive vote
    Higher %Prior positive votes → more likely to give a positive vote
    Lower %Prior comments → more likely to give a positive vote
    Visible information has a stronger association with
    the review decision than patch characteristics
    19


  20. Other suboptimal reviewing practices also exist in
    the contemporary code review process
    [Chouchen et al., 2021]
    Identify anti-patterns in code reviews:
    manually examine code reviews of 100 code changes
    Confused reviewers: 21%
    Divergent reviewers: 20%
    Shallow review: 14%
    Toxic review: 5%
    Low review participation: 32%
    20


  21. A large number of new code changes can pose
    challenges to perform effective code reviews
    Non-responding invited reviewers [Ruangwan et al., 2019]
    Workload-aware reviewer recommendation [Al-Zubaidi et al., 2020]
    Suboptimal reviewing [Thongtanunam and Hassan, 2020; Chouchen et al., 2021]
    Line-level defect prediction [Wattanakriengkrai et al., 2020]
    Challenges: (Possible) Solutions:
    21


  22. We find that as little as 1%-3% of the lines of code in a file
    are actually defective after release
    Studied systems: Activemq, Camel, Derby, Groovy, Hbase, Hive, Jruby, Lucene, Wicket
    %Defective files: 2-7%, 2-8%, 6-28%, 2-4%, 7-11%, 6-19%, 2-13%, 2-8%, 2-16%
    %Defective lines in defective files (at the median): 2%, 2%, 2%, 2%, 1%, 2%, 2%, 3%, 3%
    **Defective lines are the source code lines that will be changed by bug-fixing commits to fix post-release defects
    Only 1%-3% of the lines of code in a file are
    actually defective
    Predicting defective lines would potentially
    save reviewer effort on inspecting code
    22


  23. if(closure != null){
    Object oldCurrent = current;
    setClosure(closure, node);
    closure.call();
    current = oldCurrent;
    }
    Identified defect-prone lines
    oldCurrent
    current
    node
    closure
    Defective
    (LIME Score >0)
    Clean
    (LIME score < 0)
    0.8
    0.1
    -0.3
    -0.7
    Ranking tokens based on
    LIME scores
    Mapping
    tokens
    to lines
    if(closure != null){
    Object oldCurrent = current;
    setClosure(closure, node);
    closure.call();
    current = oldCurrent;
    }
    A model-agnostic
    technique (LIME)
    Line-DP: Predicting defective lines using a model-
    agnostic technique (LIME)
    A file-level defect
    prediction model
    Files of
    Interest
    Defect-prone
    files
    A model-agnostic
    technique (LIME)
    Defect-prone
    lines
    23
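The token-to-line mapping step of Line-DP sketched above can be illustrated as follows. The token scores are the LIME-style values shown on the slide (positive = the token pushes the file-level model toward "defective"); they are illustrative, not real LIME output, and the ranking function is a sketch of the idea rather than the paper's implementation.

```python
import re

def rank_defect_prone_lines(lines, token_scores):
    """Score each line by summing the positive LIME-style scores of its
    tokens, then rank lines from most to least defect-prone."""
    ranked = []
    for i, line in enumerate(lines):
        score = 0.0
        for token in re.findall(r"\w+", line):
            s = token_scores.get(token, 0.0)
            if s > 0:  # only tokens that signal "defective" contribute
                score += s
        ranked.append((i, score))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

lines = [
    "if(closure != null){",
    "Object oldCurrent = current;",
    "setClosure(closure, node);",
    "closure.call();",
    "current = oldCurrent;",
]
# Illustrative token scores from the slide (not real LIME output)
token_scores = {"oldCurrent": 0.8, "current": 0.1, "node": -0.3, "closure": -0.7}
ranking = rank_defect_prone_lines(lines, token_scores)
# The two lines using oldCurrent and current rank as most defect-prone
```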


  24. Our approach achieves an overall predictive accuracy
    better than baseline approaches
    Line-DP (our approach) vs. line-level baseline approaches (static analysis tools, N-gram):
    Recall (the higher the better): 0.61 – 0.62 vs. 0.01 – 0.51
    MCC (the higher the better): 0.04 – 0.05 vs. -0.01 – 0.03
    False Alarm (the lower the better): 0.47 – 0.48 vs. 0.01 – 0.54
    Distance to Heaven, the root mean square of (1 − recall) and the false alarm rate
    (the lower the better): 0.43 – 0.44 vs. 0.52 – 0.70
    Our Line-DP achieves an overall predictive
    accuracy better than the baseline approaches
    Our Line-DP can effectively identify defective lines
    while requiring a smaller amount of reviewing effort
    24
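Distance-to-heaven, used in the comparison above, can be computed directly. The definition below is an assumption (the root-mean-square distance from the ideal classifier with recall = 1 and false alarm = 0), chosen because it reproduces the reported range from the reported recall and false alarm values.

```python
import math

def distance_to_heaven(recall, false_alarm):
    """Distance from the ideal classifier ("heaven": recall = 1,
    false alarm = 0): the root mean square of (1 - recall) and the
    false alarm rate. Lower is better."""
    return math.sqrt(((1 - recall) ** 2 + false_alarm ** 2) / 2)

# Line-DP's reported lower bounds: recall ~0.61, false alarm ~0.47
d2h = distance_to_heaven(0.61, 0.47)  # ~0.43, matching the table
```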


  25. A large number of new code changes can pose
    challenges to perform effective code reviews
    Non-responding invited reviewers [Ruangwan et al., 2019]
    Workload-aware reviewer recommendation [Al-Zubaidi et al., 2020]
    Suboptimal reviewing [Thongtanunam and Hassan, 2020; Chouchen et al., 2021]
    Line-level defect prediction [Wattanakriengkrai et al., 2020]
    Challenges: (Possible) Solutions:
    Our techniques and empirical findings should help teams speed up the
    process and save effort, while maintaining the quality of reviews
    25


  26. Code Review at Speed: How can we use data to
    help developers do code review faster?
    Non-responding invited reviewers [Ruangwan et al., 2019]
    Workload-aware reviewer recommendation [Al-Zubaidi et al., 2020]
    Suboptimal reviewing [Thongtanunam and Hassan, 2020; Chouchen et al., 2021]
    Line-level defect prediction [Wattanakriengkrai et al., 2020]
    The Impact of Human Factors on the Participation Decision of Reviewers in Modern Code Review
    S. Ruangwan, P. Thongtanunam, A. Ihara, K. Matsumoto. Journal of EMSE, 2019.
    Workload-Aware Reviewer Recommendation using a Multi-objective Search-Based Approach
    W. Al-Zubaidi, P. Thongtanunam, H. K. Dam, C. Tantithamthavorn, A. Ghose. PROMISE 2020.
    Review Dynamics and Their Impact on Software Quality
    P. Thongtanunam and A. E. Hassan. TSE, 2020.
    Anti-patterns in Modern Code Review: Symptoms and Prevalence
    M. Chouchen, A. Ouni, R. Kula, D. Wang, P. Thongtanunam, M. Mkaouer, K. Matsumoto. SANER 2021.
    Predicting Defective Lines Using a Model-Agnostic Technique
    S. Wattanakriengkrai, P. Thongtanunam, C. Tantithamthavorn, H. Hata, K. Matsumoto. TSE, 2020.
    http://patanamon.com
    26
