API CODE RECOMMENDATION USING STATISTICAL LEARNING FROM FINE-GRAINED CHANGES

Lily Mast, Eli Rademacher, Tien Nguyen, Danny Dig, Anh Nguyen, Michael Hilton, Mihai Codoban, Hoan Nguyen

STATE OF THE PRACTICE

System.out.

  append(char c)                               PrintStream
  append(CharSequence csq)                     PrintStream
  append(CharSequence csq, int start, … )      PrintStream
  checkError()                                 boolean
  close()                                      void
  flush()                                      void
  format(Locale l, String format,…)            PrintStream
  format(String format, Object… args)          PrintStream
  print(boolean b)                             void
  print(char c)                                void
  print(char[] s)                              void
  print(double d)                              void
  print(float f)                               void
  print(int i)                                 void
  print(long l)                                void
  print(Object obj)                            void
  print(String s)                              void
  printf(Locale l, String format, …)           PrintStream
  printf(String format, Object… args)          PrintStream
  println()                                    void
  println(boolean x)                           void
  println(char x)                              void
  println(char[] x)                            void
  println(double x)                            void
  println(float x)                             void
  println(int x)                               void
  println(long x)                              void
  println(Object x)                            void
  println(String x)                            void
  write(byte[] buf, int off, int len)          void
  write(int b)                                 void

(PREVIOUS) STATE OF THE ART : LEARN FROM STATIC CONTEXT
    Text t;
    t = new Text();
    t.setText("hello world");

Recommend: setText(…) because it often co-occurs with: new Text()

Frequently co-occurring terms: [Bruch et al. FSE 09]
+ Order of terms: [Reiss ICSE 09]
+ Program dependencies: [Nguyen ICSE 12]
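The co-occurrence idea on this slide can be illustrated with a minimal sketch. It is not the method of any of the cited works: the class name `CooccurrenceRecommender`, the pairwise counting scheme, and the toy training data are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: recommend the API call that most often co-occurs
// with the calls already present in the static context.
final class CooccurrenceRecommender {
    // counts.get(a).get(b) = number of training snippets containing both a and b
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    // Record every ordered pair of distinct calls seen together in one snippet.
    void train(Set<String> snippet) {
        for (String a : snippet) {
            for (String b : snippet) {
                if (!a.equals(b)) {
                    counts.computeIfAbsent(a, key -> new HashMap<>()).merge(b, 1, Integer::sum);
                }
            }
        }
    }

    // Rank calls not yet in the context by summed co-occurrence, highest first.
    List<String> recommend(Set<String> context, int k) {
        Map<String, Integer> score = new HashMap<>();
        for (String c : context) {
            Map<String, Integer> row = counts.getOrDefault(c, Map.of());
            for (Map.Entry<String, Integer> e : row.entrySet()) {
                if (!context.contains(e.getKey())) {
                    score.merge(e.getKey(), e.getValue(), Integer::sum);
                }
            }
        }
        List<String> ranked = new ArrayList<>(score.keySet());
        // Ties are broken alphabetically so the ranking is deterministic.
        ranked.sort((x, y) -> {
            int diff = score.get(y) - score.get(x);
            return diff != 0 ? diff : x.compareTo(y);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(k, ranked.size())));
    }

    public static void main(String[] args) {
        CooccurrenceRecommender r = new CooccurrenceRecommender();
        r.train(Set.of("new Text()", "setText"));   // toy data, not from the talk
        r.train(Set.of("new Text()", "setText"));
        r.train(Set.of("new Text()", "getText"));
        System.out.println(r.recommend(Set.of("new Text()"), 1));
    }
}
```

With this toy data, `setText` co-occurs with `new Text()` more often than `getText` does, so it is ranked first.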

CHALLENGE: IRRELEVANT TOKENS CLOSEST TO REC POINT

    + Set<TaskResult> results = new HashSet<>();
      for (Task t: tasks) {
        t.execute();
    +   results._
      }

NATURALNESS OF CODE CHANGES

Our key insight: changes are regular
Changes co-occur together
We leverage dynamic context provided by regular changes

KEY ADVANCES

Mining Source
  Previous: Static Code Context (code snapshots)
  Ours:     Static Code Context (code snapshots) + Dynamic Code Context (code changes)

Recommendation Approach
  Previous: Association Mining
  Ours:     Association Mining + Statistical Inference of Changes

    + Set<TaskResult> results = new HashSet<>();
      for (Task t: tasks) {
        t.execute();
    +   results.add(t.getResults());
      }

    + Set<TaskResult> results = new HashSet<>();
      for (Task t: tasks) {
        t.execute();
    +   results._
      }

    + Set<TaskResult> results = new HashSet<>();
    + results.add(t.getResults());

APIREC

LEARNING REGULAR CHANGES USING STATISTICAL APPROACH

    + Set<TaskResult> results = new HashSet<>();
      for (Task t: tasks) {
        t.execute();
    +   results.add(t.getResults());
      }

BASIS OF CONSENSUS

    Set<TaskResult> results = new HashSet<>();
    for (Task t: tasks) {
      t.execute();
      results.add(t.getResults());
    }

    Set<Results> r = new HashSet<>();
    for (int i=0; i<arr.size(); i++) {
      t.execute();
      r.add(t.getResults());
    }

    HashSet<Tasks> h = new HashSet<>();
    for (Task t: tasks) {
      t.execute();
      h.add(t.getResults());
    }

    Set<TaskResult> col = new HashSet<>();
    for (int i=0; i<arr.size(); i++) {
      arr.run();
      col.add(arr.getResult());
    }

    Set<TaskResult> results = new HashSet<>();
    for (Task t: tasks) {
      t.execute();
      results.add(t.getResults());
    }

    Set<TaskResult> results = new HashSet<>();
    while (tasks.length > 0) {
      tasks[i].calculate();
      results.add(t.getResults());
      i++;
    }

CHANGE INFERENCE MODEL

$$\mathrm{Score}(c, (DC, SC)) = w_{SC} \times \mathrm{Score}(c, SC) + w_{DC} \times \mathrm{Score}(c, DC)$$

c = current change (<operation kind>, <AST node type>, <label>)
Score(c, SC) = impact of the Static Context on predicting c
Score(c, DC) = impact of the Dynamic Context on predicting c
w_SC = weight of impact of context
w_DC = weight of impact of change
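The weighted combination in the change inference model can be sketched as below. This is a minimal illustration under stated assumptions, not the APIREC implementation: the slides do not say how Score(c,SC), Score(c,DC), or the weights are computed, so the sketch takes them as given numbers. The names `ChangeInferenceSketch`, `Change`, and `Candidate`, and all numeric values, are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ChangeInferenceSketch {
    // A change c as the triple (<operation kind>, <AST node type>, <label>).
    static final class Change {
        final String operationKind;
        final String astNodeType;
        final String label;

        Change(String operationKind, String astNodeType, String label) {
            this.operationKind = operationKind;
            this.astNodeType = astNodeType;
            this.label = label;
        }
    }

    // A candidate change with its two component scores, assumed precomputed.
    static final class Candidate {
        final Change change;
        final double scoreSC; // Score(c,SC): impact of the static context
        final double scoreDC; // Score(c,DC): impact of the dynamic context

        Candidate(Change change, double scoreSC, double scoreDC) {
            this.change = change;
            this.scoreSC = scoreSC;
            this.scoreDC = scoreDC;
        }
    }

    // Score(c,(DC,SC)) = wSC * Score(c,SC) + wDC * Score(c,DC)
    static double combinedScore(Candidate c, double wSC, double wDC) {
        return wSC * c.scoreSC + wDC * c.scoreDC;
    }

    // Return the candidates ordered by combined score, highest first.
    static List<Candidate> rank(List<Candidate> candidates, double wSC, double wDC) {
        List<Candidate> ranked = new ArrayList<>(candidates);
        ranked.sort((a, b) -> Double.compare(combinedScore(b, wSC, wDC),
                                             combinedScore(a, wSC, wDC)));
        return ranked;
    }

    public static void main(String[] args) {
        // Illustrative scores and weights only; none of these come from the talk.
        List<Candidate> candidates = new ArrayList<>();
        candidates.add(new Candidate(new Change("add", "MethodInvocation", "clear"), 0.5, 0.1));
        candidates.add(new Candidate(new Change("add", "MethodInvocation", "add"), 0.2, 0.9));
        for (Candidate c : rank(candidates, 0.5, 0.5)) {
            System.out.println(c.change.label + " " + combinedScore(c, 0.5, 0.5));
        }
    }
}
```

In this toy example the candidate with the stronger dynamic-context score ranks first even though its static-context score is lower, which is the situation the earlier challenge slide describes.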

RESEARCH QUESTIONS:

➤ RQ1 Accuracy: How often does APIREC recommend the correct API?
➤ RQ2 Sensitivity: How does context size impact APIREC's accuracy?
➤ RQ3 Running Time: What is the running time of APIREC?

CORPORA

                            Large Corpus    Community Corpus
  Projects                  50              8
  Total Source Files        48,699          8,561
  Total Commits             113,103         18,233
  Total AST nodes changed   43,538,386      4,487,479

SIMULATE USER BY RE-PLAYING CHANGES FROM COMMITS

Starting code:

    for (Task t: tasks) {
      t.execute();
    }

Changes from the commit are replayed one at a time as the next change: Set, TaskResult, results, =, new, HashSet<>();, results., add()

    Set<TaskResult> results = new HashSet<>();
    results.

At the recommendation point after "results.", the recommender returns a ranked list of five suggestions (positions 1 to 5), which is compared with the actual next change, add:

    add(), clear, size, remove, contains     add is ranked first: Top-1
    remove, add(), clear, size, contains     add is within the first five: Top-5
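The Top-1 and Top-5 bookkeeping in the replay can be sketched as follows. This is a minimal illustration, assuming each recommendation point yields a ranked list of labels and one actual label; the class `TopKAccuracy` and its method names are hypothetical, and method labels are written without parentheses for comparison.

```java
import java.util.List;

final class TopKAccuracy {
    // True when the actual next API call appears among the first k recommendations.
    static boolean isHit(List<String> ranked, String actual, int k) {
        int limit = Math.min(k, ranked.size());
        return ranked.subList(0, limit).contains(actual);
    }

    // Fraction of replayed recommendation points that are Top-k hits.
    static double accuracy(List<List<String>> rankedLists, List<String> actuals, int k) {
        if (rankedLists.size() != actuals.size()) {
            throw new IllegalArgumentException("one ranked list is needed per actual change");
        }
        if (actuals.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        for (int i = 0; i < actuals.size(); i++) {
            if (isHit(rankedLists.get(i), actuals.get(i), k)) {
                hits++;
            }
        }
        return (double) hits / actuals.size();
    }

    public static void main(String[] args) {
        // The two ranked lists shown on the slide, both with actual change "add".
        List<List<String>> ranked = List.of(
                List.of("add", "clear", "size", "remove", "contains"),
                List.of("remove", "add", "clear", "size", "contains"));
        List<String> actuals = List.of("add", "add");
        System.out.println("Top-1: " + accuracy(ranked, actuals, 1));
        System.out.println("Top-5: " + accuracy(ranked, actuals, 5));
    }
}
```

Over the two example lists, only the first is a Top-1 hit, while both are Top-5 hits.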

EVALUATION SETUP

Community Edition: APIREC trained on the Large Corpus, tested with the Community Corpus.
Project Edition: APIREC trained on the first 90% of commits of a single project, tested on the remaining 10% of commits.
User Edition: APIREC trained on the first 90% of commits of a single user in a project, tested on the remaining 10% of commits.
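The chronological 90%/10% split used by the Project and User editions can be sketched like this. It is an illustration only: the slides do not say how the boundary is rounded, so rounding down is an assumption, and `ChronologicalSplit` with its method names is hypothetical. The input list is assumed to be ordered oldest commit first.

```java
import java.util.List;

final class ChronologicalSplit {
    // Number of commits used for training: the first 90%, rounded down (assumption).
    static int trainCount(int totalCommits) {
        return (int) ((long) totalCommits * 9 / 10);
    }

    // The oldest 90% of commits, used for training.
    static <T> List<T> trainingPart(List<T> commitsOldestFirst) {
        return commitsOldestFirst.subList(0, trainCount(commitsOldestFirst.size()));
    }

    // The newest 10% of commits, used for testing.
    static <T> List<T> testPart(List<T> commitsOldestFirst) {
        int size = commitsOldestFirst.size();
        return commitsOldestFirst.subList(trainCount(size), size);
    }

    public static void main(String[] args) {
        // Placeholder commit identifiers, not real data.
        List<String> commits = List.of("c1", "c2", "c3", "c4", "c5",
                                       "c6", "c7", "c8", "c9", "c10");
        System.out.println("train: " + trainingPart(commits));
        System.out.println("test:  " + testPart(commits));
    }
}
```

Splitting by commit order rather than at random keeps the test commits strictly later than the training commits, which matches the "first 90%, remaining 10%" wording.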

ACCURACY: RELATED WORK

APIREC is more accurate than previous work

[Bar chart: percentage of cases (0% to 80%) in which the correct answer is in the Top-X suggestions, for Top-1, Top-5, and Top-10. Series: sequence based, set based, graph based, APIRec. Bar labels as extracted: 78%, 74%, 55%; 77%, 64%, 29%; 69%, 61%, 26%; 40%, 34%, 22%.]

STATIC CONTEXT PROVIDES CONSTANT IMPACT

[Chart: accuracy (0% to 80%) against number of tokens in static context (1, 5, 10, 15, 20, 30, 40). Series: Code Context - Top 1, Code Context - Top 5.]

DYNAMIC CONTEXT SIZE PROVIDES INCREASING IMPACT

[Chart: accuracy (0% to 80%) against number of tokens in dynamic context (1, 5, 10, 15, 20, 30, 40). Series: Change Context - Top 1, Change Context - Top 5.]

ACCURACY: DIFFERENT EDITIONS

[Bar chart: accuracy (0% to 80%) for Top 1, Top 5, and Top 10. Series: User (trained on 1 user in 1 project), Project (trained on one project), Community (trained on 50 projects). Bar labels as extracted: 78%, 74%, 55%; 41%, 34%, 17%; 59%, 53%, 29%.]

IMPLICATIONS

Changes are valuable
Using fine-grained changes can provide significant improvement over previous, change-agnostic approaches.

Changes are personal
A developer's history predicts future changes better than the entire project's changes.

Changes are untapped
Fine-grained changes provide a wealth of data that is currently underused.
