
Using Large-scale Heterogeneous Graph Representation Learning for Code Review Recommendations at Microsoft

Presentation at ICSE SEIP 2023, the Software Engineering in Practice (SEIP) track of the ACM/IEEE International Conference on Software Engineering (ICSE). Melbourne, May 18, 2023.

Authors: Jiyang Zhang, Chandra Maddila, Ram Bairi, Christian Bird, Ujjwal Raizada, Apoorva Agrawal, Yamini Jhawar, Kim Herzig, Arie van Deursen

Code review is an integral part of any mature software development process, and identifying the best reviewer for a code change is a well-accepted problem within the software engineering community. Selecting a reviewer who lacks expertise and understanding can slow development or result in more defects.

To date, most reviewer recommendation systems rely primarily on historical file change and review information; those who changed or reviewed a file in the past are the best positioned to review in the future. We posit that while these approaches are able to identify and suggest qualified reviewers, they may be blind to reviewers who have the needed expertise and have simply never interacted with the changed files before. Fortunately, at Microsoft, we have a wealth of work artifacts across many repositories that can yield valuable information about our developers.

To address the aforementioned problem, we present CORAL, a novel approach to reviewer recommendation that leverages a socio-technical graph built from the rich set of entities (developers, repositories, files, pull requests (PRs), work items, etc.) and their relationships in modern source code management systems. We employ a graph convolutional neural network on this graph and train it on two and a half years of history on 332 repositories within Microsoft. We show that CORAL is able to model the manual history of reviewer selection remarkably well. Further, based on an extensive user study, we demonstrate that this approach identifies relevant and qualified reviewers who traditional reviewer recommenders miss, and that these developers desire to be included in the review process. Finally, we find that "classical" reviewer recommendation systems perform better on smaller (in terms of developers) software projects while CORAL excels on larger projects, suggesting that there is "no one model to rule them all."

Paper on Arxiv: https://arxiv.org/abs/2202.02385
Session at ICSE SEIP: https://conf.researchr.org/details/icse-2023/icse-2023-SEIP/24/Using-Large-scale-Heterogeneous-Graph-Representation-Learning-for-Code-Review-Recomme

Arie van Deursen

May 18, 2023

Transcript

  1. Using Large-scale Heterogeneous Graph Representation Learning for Code Review Recommendations at Microsoft

    Jiyang Zhang (University of Texas at Austin, USA), Chandra Maddila (Microsoft Research -> Meta, USA), Ram Bairi (Microsoft Research, India), Christian Bird (Microsoft Research, USA), Ujjwal Raizada (Microsoft Research, India), Apoorva Agrawal (Microsoft Research, India), Yamini Jhawar (Microsoft Research, India), Kim Herzig (Microsoft, USA), Arie van Deursen (Delft University of Technology, The Netherlands)
  2. Reviewer Recommendation

    • Given a pull request, get better reviewers, faster, to ship better code, faster
    • Hard at scale: large teams, large (mono) repos, people moving around
    • State of the art: heuristics based on (earlier) authorship and reviewership
      • 😢 Does not consider semantic information (pull request title, descriptions, linked tasks, etc.)
      • 😢 Cold-start problem
      • 😢 Insufficient diversity when picking reviewers based on reviewership alone
  3. (No transcribed text on this slide.)

  4. The Nalanda Graph at Microsoft

    (Figure: schema of the socio-technical graph.) Node types: pull request, user, work item, repository, file, iteration, comment. Edge types: contains, creates, reviews, linked, replyTo, reportsTo, addsIteration, has, comments. Nodes carry attributes such as id, RepoId, Title, Status, FilePath, IsEditedALot, IsConcurrentlyEditedInLastNMonths, CreatedDate, and PublishedDate.
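As an illustration of the kind of heterogeneous schema this slide depicts, here is a minimal sketch using networkx; the node and edge types follow the figure, but the attribute values and the library choice are illustrative, not the Nalanda implementation.

```python
import networkx as nx

# Sketch of a heterogeneous socio-technical graph: typed nodes and
# typed edges, mirroring the node/edge kinds in the Nalanda figure.
G = nx.MultiDiGraph()

# Typed nodes with a few of the attributes shown on the slide (values invented).
G.add_node("repo:1", type="repository", Name="contoso", ProjectName="core")
G.add_node("file:1", type="file", FilePath="src/app.cs", IsEditedALot=True)
G.add_node("pr:42", type="pullrequest", Title="Fix login regression", Status="active")
G.add_node("user:alice", type="user")
G.add_node("wi:7", type="workitem", Title="Login fails on retry", Status="open")

# Typed edges mirroring the relationships in the figure.
G.add_edge("repo:1", "file:1", type="contains")
G.add_edge("user:alice", "pr:42", type="creates")
G.add_edge("pr:42", "file:1", type="changes")
G.add_edge("pr:42", "wi:7", type="linked")
G.add_edge("user:alice", "pr:42", type="reviews")
```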
  5. Problem Formulation: Link Prediction

    (Figure: example graph with nodes User 1, User 2, File A, File B, Pull Request 1, and Pull Request 2, connected by "creates", "changes", and "reviews" edges; the task is to predict which missing "reviews" edges to Pull Request 2 should exist.)
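A minimal sketch of this formulation, assuming each node already has a learned embedding: a candidate "reviews" edge is scored by the inner product of the user and pull request embeddings (function and variable names are hypothetical).

```python
import numpy as np

# Link-prediction sketch: score a candidate "reviews" edge between a user
# and a pull request by the inner product of their node embeddings.
def reviews_score(user_emb: np.ndarray, pr_emb: np.ndarray) -> float:
    return float(user_emb @ pr_emb)
```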
  6. Training CORAL: A Two-Layer Graph Convolutional Network

    • At each layer, each node gathers feature information from itself and its neighbours and aggregates it into a representation
    • During training, nodes that are connected in the actual graph are made semantically similar (large inner product)
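A self-contained PyTorch sketch of the idea on this slide: two graph-convolution layers that aggregate neighbour features, trained so that connected node pairs get a large inner product. The architecture, aggregation, and loss are generic illustrations, not CORAL's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLayerGCN(nn.Module):
    """Generic two-layer graph convolution (illustrative, not CORAL's exact model)."""
    def __init__(self, in_dim: int, hid_dim: int, out_dim: int):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hid_dim)
        self.lin2 = nn.Linear(hid_dim, out_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # adj: row-normalized adjacency with self-loops, shape (N, N).
        # Each layer aggregates neighbour features (adj @ ...) and transforms.
        h = torch.relu(self.lin1(adj @ x))
        return self.lin2(adj @ h)

def edge_loss(z: torch.Tensor, pos_edges: torch.Tensor, neg_edges: torch.Tensor) -> torch.Tensor:
    # Push embeddings of actually-connected nodes together (large inner
    # product) and randomly sampled negative pairs apart.
    pos = (z[pos_edges[0]] * z[pos_edges[1]]).sum(dim=-1)
    neg = (z[neg_edges[0]] * z[neg_edges[1]]).sum(dim=-1)
    return -(F.logsigmoid(pos).mean() + F.logsigmoid(-neg).mean())
```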
  7. CORAL’s Inductive Inference for Recommendations

    • Given a new pull request, plug a new node into the graph
    • Connect edges to its files, authors, and words
    • Obtain the pull request node embedding by passing it through the two Graph Convolutional Network layers
    • User nodes with the highest inner products with the pull request node are recommended by the model
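A heavily simplified NumPy sketch of this inductive step, assuming mean aggregation and a single weight matrix per layer (the weights stand in for the trained network's parameters; all names are illustrative):

```python
import numpy as np

def embed_new_pr(neighbor_feats: np.ndarray, W1: np.ndarray, W2: np.ndarray) -> np.ndarray:
    # Layer 1: aggregate raw features of the new PR's neighbours
    # (its files, author, and words) and transform, with a ReLU.
    h = np.maximum(W1 @ neighbor_feats.mean(axis=0), 0.0)
    # Layer 2: transform again (the full model would also aggregate
    # layer-1 representations of the two-hop neighbourhood).
    return W2 @ h

def recommend_reviewers(user_embs: np.ndarray, pr_emb: np.ndarray, k: int = 3):
    # Recommend the users whose embeddings have the highest inner
    # product with the new pull request's embedding.
    return np.argsort(-(user_embs @ pr_emb))[:k]
```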
  8. Dataset for Training & Evaluation

    • Graph:
      • File: 2.8M
      • Pull request: 1.3M
      • Text: 1.1M
      • User: 48.5K
      • Work item: 540K
    • Training dataset:
      • 7M <pull request, reviewer> pairs
      • 700M pairs randomly sampled from the graph
    • Testing dataset (not in training):
      • 250K <pull request, reviewer> pairs
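A sketch of how such training pairs could be assembled, under this sketch's assumption (not necessarily the paper's exact procedure) that the 700M randomly sampled pairs serve as negative examples:

```python
import random

def sample_negative_pairs(prs: list, users: list, positive_pairs: set, n: int) -> list:
    # Randomly pair pull requests with users, skipping known positive
    # <pull request, reviewer> pairs, to obtain negative training examples.
    negatives = []
    while len(negatives) < n:
        pair = (random.choice(prs), random.choice(users))
        if pair not in positive_pairs:
            negatives.append(pair)
    return negatives
```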
  9. How well does CORAL model reviewing history?

    • 250K historic pull requests not in the training data
    • See what CORAL would have predicted
    • Top-k accuracy: at least one correct reviewer recommended in the top k
    • Mean Reciprocal Rank: is the correct recommendation at the top of the list?
    • In 73% of cases, the top 3 contains a correct reviewer
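For concreteness, the two evaluation metrics named on this slide, computed in plain Python (function and variable names are illustrative):

```python
def top_k_accuracy(recommendations: list, actual_reviewers: list, k: int = 3) -> float:
    # Fraction of pull requests where at least one actual reviewer
    # appears among the top-k recommendations.
    hits = sum(any(u in actual for u in recs[:k])
               for recs, actual in zip(recommendations, actual_reviewers))
    return hits / len(recommendations)

def mean_reciprocal_rank(recommendations: list, actual_reviewers: list) -> float:
    # Average over pull requests of 1 / rank of the first correct reviewer
    # (0 when no correct reviewer is recommended at all).
    total = 0.0
    for recs, actual in zip(recommendations, actual_reviewers):
        rank = next((i + 1 for i, u in enumerate(recs) if u in actual), None)
        total += 1.0 / rank if rank else 0.0
    return total / len(recommendations)
```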
  10. Ablation: Are All CORAL Features Needed?

    • The graph structure alone performs poorly
    • Words and files both contribute individually
    • Files alone already get a long way
    • Their combination yields the best performance
  11. How does CORAL compare to a rule-based model?

    • The rule-based model is currently in production at Microsoft
    • Zanjani, Kagdi, Bird: “Automatically recommending peer reviewers in modern code review”, TSE 2015
    • It models expertise based on author interactions with files, with time decay
    • Two datasets of 500 pull requests each, drawn from differently sized repositories:
      • 220 from large repos (> 100 devs)
      • 200 from medium repos (25 < devs < 100)
      • 80 from small repos (devs < 25)
    • Devs were asked about the relevance (irrelevant / would like to be informed / will act) of pull requests they were not involved in
  12. CORAL vs Rule-Based Accuracy

    • Accuracy in terms of actual interactions (change status, add comment)
    • Accuracy in terms of devs saying a recommendation is relevant
    • No single clear winner: “no model to rule them all”
    • More training data is available for large repos
    • The social graph is less relevant for small repos

    Repo size | Rule-based model | CORAL
    Large     | 0.19             | 0.37
    Medium    | 0.31             | 0.36
    Small     | 0.35             | 0.23
  13. What do Users Think (I)?

    • “I am lead of this area and would like to review these kinds of PRs, which are likely fixing some regressions.”
    • “This is a PR worked on by my sister team. We have a dependency on them. So, I’d love to review this PR. I was not added when the PR was created. I would have loved to be added when it was active.”
    • “Yes! This PR needs a careful review. I'd love to spend time on this PR.”
  14. What do Users Think (II)?

    • “No longer relevant because this is a repo my team transferred in 2020 to another team.”
    • “I am a PM, so this PR is not relevant to me.”
    • “Not relevant since I no longer work on the team that manages this service.”
  15. Conclusion

    • Explored combining the social graph and semantic information for recommending reviewers
    • Conducted both offline (historic) analysis and online (asking devs) analysis of impact
    • Offline accuracy of 73% in the top 3
    • Online recommendations appreciated by devs (67%)
    • Works better than rule-based recommendations for larger repos
    • Future work: decay, node- and edge-specific features, effect of hyperparameters, applicability to open source, …
  16. (Closing slide: repeats the title and author list from slide 1.)