API CODE RECOMMENDATION USING STATISTICAL LEARNING FROM FINE-GRAINED CHANGES

Michael Hilton
November 16, 2016

Slides from the FSE 2016 talk on API code recommendation

Transcript

  1. API CODE RECOMMENDATION USING STATISTICAL LEARNING FROM FINE-GRAINED CHANGES

    Anh Nguyen, Michael Hilton, Mihai Codoban, Hoan Nguyen, Lily Mast, Eli Rademacher, Tien Nguyen, Danny Dig
  5. STATE OF THE PRACTICE

    Completion list shown after typing System.out. (member, then return type):

    append(char c) PrintStream
    append(CharSequence csq) PrintStream
    append(CharSequence csq, int start, … ) PrintStream
    checkError() boolean
    close() void
    flush() void
    format(Locale l, String format,…) PrintStream
    format(String format, Object… args) PrintStream
    print(boolean b) void
    print(char c) void
    print(char[] s) void
    print(double d) void
    print(float f) void
    print(int i) void
    print(long l) void
    print(Object obj) void
    print(String s) void
    printf(Locale l, String format, …) PrintStream
    printf(String format, Object… args) PrintStream
    println() void
    println(boolean x) void
    println(char x) void
    println(char[] x) void
    println(double x) void
    println(float x) void
    println(int x) void
    println(long x) void
    println(Object x) void
    println(String x) void
    write(byte[] buf, int off, int len) void
    write(int b) void
  10. (PREVIOUS) STATE OF THE ART: LEARN FROM STATIC CONTEXT

    Text t;
    t = new Text();
    t.setText("hello world");

    Recommend: setText(…) because it often co-occurs with: new Text()

    Frequently co-occurring terms: [Bruch et al. FSE 09]
    + Order of terms: [Reiss ICSE 09]
    + Program dependencies: [Nguyen ICSE 12]
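The co-occurrence idea on this slide can be illustrated with a small sketch. It is only an illustration of ranking by co-occurrence counts, not the algorithm of any of the cited papers; the class `CoOccurrenceRecommender` and its methods are hypothetical names introduced here, and each training snippet is assumed to be a list of distinct API call names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of any cited tool: counts how often API calls
// appear together in the same snippet and ranks candidates by that count.
class CoOccurrenceRecommender {
    // counts.get(a).get(b) = number of training snippets containing both a and b
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    // Record one training snippet, given as the distinct API calls it contains.
    void train(List<String> snippet) {
        for (String a : snippet) {
            for (String b : snippet) {
                if (!a.equals(b)) {
                    counts.computeIfAbsent(a, k -> new HashMap<>()).merge(b, 1, Integer::sum);
                }
            }
        }
    }

    // Rank calls not yet in the context by how often they co-occurred with it.
    List<String> recommend(List<String> context) {
        Map<String, Integer> scores = new HashMap<>();
        for (String seen : context) {
            Map<String, Integer> row = counts.getOrDefault(seen, Map.of());
            for (Map.Entry<String, Integer> e : row.entrySet()) {
                if (!context.contains(e.getKey())) {
                    scores.merge(e.getKey(), e.getValue(), Integer::sum);
                }
            }
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        // Highest count first; ties are broken alphabetically for determinism.
        ranked.sort((x, y) -> {
            int byScore = Integer.compare(scores.get(y), scores.get(x));
            return byScore != 0 ? byScore : x.compareTo(y);
        });
        return ranked;
    }

    public static void main(String[] args) {
        CoOccurrenceRecommender r = new CoOccurrenceRecommender();
        r.train(List.of("new Text()", "setText"));
        r.train(List.of("new Text()", "setText"));
        r.train(List.of("new Text()", "getText"));
        // setText co-occurred with new Text() more often, so it ranks first.
        System.out.println(r.recommend(List.of("new Text()")));
    }
}
```

Because the ranking depends only on what is present in the snapshot, a sketch like this has no notion of which tokens were just added, which is the gap the following slides address.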
  13. CHALLENGE: IRRELEVANT TOKENS CLOSEST TO RECOMMENDATION POINT

    + Set<TaskResult> results = new HashSet<>();
      for (Task t: tasks) {
          t.execute();
    +     results._
      }
  14. NATURALNESS OF CODE CHANGES

    Our key insight: changes are regular
    Changes co-occur
    We leverage the dynamic context provided by regular changes
  15. KEY ADVANCES

    Approach                  Previous                               Ours
    Mining Source             Static Code Context (code snapshots)   Static Code Context (code snapshots) + Dynamic Code Context (code changes)
    Recommendation Approach   Association Mining                     Association Mining + Statistical Inference of Changes

    + Set<TaskResult> results = new HashSet<>();
      for (Task t: tasks) {
          t.execute();
    +     results.add(t.getResults());
      }

    + Set<TaskResult> results = new HashSet<>();
      for (Task t: tasks) {
          t.execute();
    +     results._
      }

    + Set<TaskResult> results = new HashSet<>();
    + results.add(t.getResults());
  16. APIREC

  19. LEARNING REGULAR CHANGES USING STATISTICAL APPROACH

    + Set<TaskResult> results = new HashSet<>();
      for (Task t: tasks) {
          t.execute();
    +     results.add(t.getResults());
      }
  20. BASIS OF CONSENSUS

    Set<TaskResult> results = new HashSet<>();
    for (Task t: tasks) {
        t.execute();
        results.add(t.getResults());
    }

    Set<Results> r = new HashSet<>();
    for (int i=0; i<arr.size(); i++) {
        t.execute();
        r.add(t.getResults());
    }

    HashSet<Tasks> h = new HashSet<>();
    for (Task t: tasks) {
        t.execute();
        h.add(t.getResults());
    }

    Set<TaskResult> col = new HashSet<>();
    for (int i=0; i<arr.size(); i++) {
        arr.run();
        col.add(arr.getResult());
    }

    Set<TaskResult> results = new HashSet<>();
    for (Task t: tasks) {
        t.execute();
        results.add(t.getResults());
    }

    Set<TaskResult> results = new HashSet<>();
    while (tasks.length > 0) {
        tasks[i].calculate();
        results.add(t.getResults());
        i++;
    }
  26. CHANGE INFERENCE MODEL

    $\mathrm{Score}(c,(DC,SC)) = w_{SC} \times \mathrm{Score}(c,SC) + w_{DC} \times \mathrm{Score}(c,DC)$

    c = current change (<operation kind>, <AST node type>, <label>)
    Score(c,SC) = impact of Static Context on predicting c
    Score(c,DC) = impact of Dynamic Context on predicting c
    $w_{SC}$ = weight of impact of context
    $w_{DC}$ = weight of impact of change
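The weighted sum on this slide can be written out as a minimal sketch. The slides give only the formula and the shape of a change triple, so everything else here is an assumption made for illustration: the class and field names, the example triple values, and the numeric scores and weights. How Score(c,SC) and Score(c,DC) are themselves computed is not shown in the deck and is not modelled here.

```java
// Hypothetical types for illustration; the slides give only the formula.
class ChangeInferenceSketch {
    // A change as described on the slide: (operation kind, AST node type, label).
    static final class Change {
        final String operationKind;
        final String astNodeType;
        final String label;

        Change(String operationKind, String astNodeType, String label) {
            this.operationKind = operationKind;
            this.astNodeType = astNodeType;
            this.label = label;
        }
    }

    // Score(c,(DC,SC)) = wSC * Score(c,SC) + wDC * Score(c,DC)
    static double combinedScore(double scoreSC, double scoreDC, double wSC, double wDC) {
        return wSC * scoreSC + wDC * scoreDC;
    }

    public static void main(String[] args) {
        // Example triple with made-up values for the three components.
        Change c = new Change("add", "MethodInvocation", "add");
        // Made-up scores and weights, chosen only to exercise the formula.
        double score = combinedScore(0.2, 0.8, 0.25, 0.75);
        System.out.println(c.label + " scores " + score);
    }
}
```

With a larger weight on the dynamic context, a candidate that fits the recent changes can outrank one that merely sits near the recommendation point in the snapshot.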
  30. RESEARCH QUESTIONS:

    ➤ RQ1 Accuracy: How often does APIREC recommend the correct API?
    ➤ RQ2 Sensitivity: How does context size impact APIREC’s accuracy?
    ➤ RQ3 Running Time: What is the running time of APIREC?
  31. CORPORA

                              Large Corpus   Community Corpus
    Projects                  50             8
    Total Source Files        48,699         8,561
    Total Commits             113,103        18,233
    Total AST nodes changed   43,538,386     4,487,479
  44. SIMULATE USER BY RE-PLAYING CHANGES FROM COMMITS

    Starting code:

    for (Task t: tasks) {
        t.execute();
    }

    Next Change, replayed one token at a time:
    Set, TaskResult, results, =, new, HashSet<>();, results., add()

    Code at the recommendation point:

    Set<TaskResult> results = new HashSet<>();
    results.
    for (Task t: tasks) {
        t.execute();
    }

    Recommendation for the next change add():
    Top-1: 1. add  2. clear  3. size  4. remove  5. contains
    Top-5: 1. remove  2. clear  3. size  4. contains  5. add
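The Top-1 and Top-5 labels on this slide correspond to a top-k hit rate, which can be sketched as follows. The class `TopKAccuracy` and its method are hypothetical names introduced here, and the two rankings in `main` are the two example lists from the slide, both with `add` as the actual next change.

```java
import java.util.List;

// Hypothetical helper: fraction of replayed changes whose actual next API
// appears among the first k ranked suggestions.
class TopKAccuracy {
    static double accuracy(List<List<String>> rankedSuggestions, List<String> expected, int k) {
        if (rankedSuggestions.size() != expected.size()) {
            throw new IllegalArgumentException("one ranked list is needed per expected answer");
        }
        if (expected.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        for (int i = 0; i < expected.size(); i++) {
            List<String> ranked = rankedSuggestions.get(i);
            // Only the first k suggestions count toward a hit.
            List<String> topK = ranked.subList(0, Math.min(k, ranked.size()));
            if (topK.contains(expected.get(i))) {
                hits++;
            }
        }
        return (double) hits / expected.size();
    }

    public static void main(String[] args) {
        List<List<String>> ranked = List.of(
                List.of("add", "clear", "size", "remove", "contains"),
                List.of("remove", "clear", "size", "contains", "add"));
        List<String> actual = List.of("add", "add");
        System.out.println("Top-1: " + accuracy(ranked, actual, 1)); // prints "Top-1: 0.5"
        System.out.println("Top-5: " + accuracy(ranked, actual, 5)); // prints "Top-5: 1.0"
    }
}
```

The first ranking counts as both a Top-1 and a Top-5 hit, while the second counts only as a Top-5 hit, which is why Top-5 accuracy can never be lower than Top-1 accuracy.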
  48. EVALUATION SETUP

    Community Edition: APIREC trained on the Large Corpus, tested on the Community Corpus.
    Project Edition: APIREC trained on the first 90% of commits of a single project, tested on the remaining 10% of commits.
    User Edition: APIREC trained on the first 90% of commits of a single user in a single project, tested on the remaining 10% of commits.
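The 90%/10% split used by the Project and User Editions is chronological: earlier commits train the model and later ones test it. A minimal sketch, assuming the commits are already ordered oldest first; the class `CommitSplit` and the rounding choice (integer division, rounding down) are assumptions made here, since the slides do not say how fractional counts are handled.

```java
import java.util.List;

// Hypothetical helper: chronological train/test split over a commit history.
class CommitSplit {
    // Index of the first test commit when trainPercent of commits train the model.
    static int cutIndex(int commitCount, int trainPercent) {
        if (trainPercent < 0 || trainPercent > 100) {
            throw new IllegalArgumentException("trainPercent must be between 0 and 100");
        }
        // Integer arithmetic avoids floating-point rounding; rounds down.
        return (int) ((long) commitCount * trainPercent / 100);
    }

    // The earliest commits, used for training.
    static <T> List<T> train(List<T> commitsOldestFirst, int trainPercent) {
        return commitsOldestFirst.subList(0, cutIndex(commitsOldestFirst.size(), trainPercent));
    }

    // The remaining, most recent commits, used for testing.
    static <T> List<T> test(List<T> commitsOldestFirst, int trainPercent) {
        int cut = cutIndex(commitsOldestFirst.size(), trainPercent);
        return commitsOldestFirst.subList(cut, commitsOldestFirst.size());
    }

    public static void main(String[] args) {
        List<String> commits = List.of("c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8", "c9", "c10");
        System.out.println("train: " + train(commits, 90));
        System.out.println("test:  " + test(commits, 90));
    }
}
```

Splitting by time rather than at random keeps later changes out of the training data, so the model is tested only on commits it could not have seen.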
  50. ACCURACY: RELATED WORK

    APIREC is more accurate than previous work

    [Bar chart: how often the correct answer is in the Top-X suggestions (Top-1, Top-5, Top-10; axis 0% to 80%). Series: sequence based, set based, graph based, APIRec. Bar labels in extraction order: 78%, 74%, 55%, 77%, 64%, 29%, 69%, 61%, 26%, 40%, 34%, 22%.]
  51. STATIC CONTEXT PROVIDES CONSTANT IMPACT

    [Chart: y-axis 0% to 80%; x-axis number of tokens in static context (1, 5, 10, 15, 20, 30, 40). Series: Code Context - Top 1, Code Context - Top 5.]

  52. DYNAMIC CONTEXT SIZE PROVIDES INCREASING IMPACT

    [Chart: y-axis 0% to 80%; x-axis number of tokens in dynamic context (1, 5, 10, 15, 20, 30, 40). Series: Change Context - Top 1, Change Context - Top 5.]
  53. ACCURACY: DIFFERENT EDITIONS

    [Bar chart: Top 1, Top 5, Top 10 (axis 0% to 80%). Series: User (trained on 1 user in 1 project), Project (trained on one project), Community (trained on 50 projects). Bar labels in extraction order: 78%, 74%, 55%, 41%, 34%, 17%, 59%, 53%, 29%.]
  57. IMPLICATIONS

    Changes are valuable
    Using fine-grained changes can provide significant improvement over previous, change-agnostic approaches.

    Changes are personal
    A developer's history predicts future changes better than the entire project's changes.

    Changes are untapped
    Fine-grained changes provide a wealth of data that is currently underused.