API-Based Code Search

Rikito Taniguchi

March 25, 2021

Transcript

  1. What do you do if…?

    If you have to use an unfamiliar API method, what do you do? e.g. “What function should I pass to Array.prototype.sort in JS?” • Read the API documentation • Visit related forums and Q&A sites (e.g. Stack Overflow, discussions) • Try to find API code examples
  2. What do you do if…?

    If you have to use an unfamiliar API method, what do you do? e.g. “What function should I pass to Array.prototype.sort in JS?” • Read the API documentation • Visit related forums and Q&A sites (e.g. Stack Overflow, discussions) • Try to find API code examples (today’s topic)
  3. Agenda

    • Background and Motivation • How Developers Search for Code [Sadowski et al ’15] • What is API-Based Code Search • Previous Research: • MAPO [Zhong et al ’09] • MUSE [Moreno et al ’15] • Future Work
  5. How do developers learn API usage?

    • Developers frequently need to use unfamiliar APIs (methods). • e.g. FileReader.read in the JDK library • How do developers learn how to use those APIs? Developer: “How do I use the FileReader.read method?”
  6. Visit Q&A Forums

    e.g. Stack Overflow, GitHub issues. Problem: for unpopular APIs, it’s difficult to find an answer (e.g. when searching for code examples of the Scala 3 compiler API…).
  7. Visit Q&A Forums

    Problems • The answer might not be up-to-date. • e.g. The answer is for a previous version of the API… • The answer could be too specific to the question.
  8. Read API documentation

    • APIs are not necessarily well documented. • It is hard to grasp the usage from a natural-language explanation alone. https://docs.oracle.com/javase/8/docs/api/java/util/ArrayList.html
  9. Need more code examples!!!

    Code examples play a significant role in complementing developers’ understanding of API usage. https://docs.ruby-lang.org/en/2.0.0/Array.html
  10. However…

    • For authors of large libraries (like the JDK), it’s time-consuming to write and maintain API examples. • Code examples are not necessarily available. In that case, we need to search for code examples. In fact, research shows that developers frequently search for code examples.
  11. How Developers Search for Code [Sadowski et al ’15]

    • An empirical study performed at Google. • Research Questions • RQ1: Why do programmers search? • and more …
  12. How Developers Search for Code [Sadowski et al ’15]

    • (Probably) Google’s internal code search engine was used. • Participants: 27 software developers at Google • Duration: 2 weeks • Before each search, participants answered “What question are you trying to answer?” in a free-form response. • 259 answers were gathered. • The answers were categorized to analyze the purpose of each search.
  13. How Developers Search for Code [Sadowski et al ’15]

    On average, 5 search sessions and 12 queries per workday. • 33.5%: searching for code examples. • 26%: exploring or reading code • …
  14. How Developers Search for Code [Sadowski et al ’15]

    Developers frequently search for code examples! However…
  15. Problem with text-based search engines

    • Since they ignore semantic information, they match a lot of irrelevant code snippets. • Because they rank results with NLP-style scoring (e.g. TF-IDF), unimportant results tend to rank near the top. We need a better code example search engine.
  16. Agenda

    • Background and Motivation • How Developers Search for Code [Sadowski et al ’15] • What is API-Based Code Search • Previous Research: • MAPO [Zhong et al ’09] • MUSE [Moreno et al ’15] • Future Work
  17. API-Based Code Search

    An API-based code search system returns code examples in response to an API search query. Developer: “How do I use java.io.FileReader.read?” API-Based Code Search System: “Here are code examples for java.io.FileReader.read.”
  18. Requirements for API-Based Code Search

    • It should return distinct code examples. • Otherwise, developers need to browse many search results. • The resulting code examples should be simple. • So developers can easily understand the API usage. • Typical or “interesting” examples should be in the top results. • Niche use cases tend to be uninteresting.
  19. Should return distinct code examples

    Duplicated results make developers browse tons of examples. System: “Here are code examples for java.io.FileReader.read.” Developer: “These examples are all doing the same thing…” (keeps browsing down)
  20. Should return distinct code examples

    Distinct examples enable developers to find the interesting example quickly. System: “Here are code examples for java.io.FileReader.read.” Developer: “THIS IS IT!!”
  21. The code examples should be simple

    System: “Here is a code example for java.io.FileReader.read.” Developer: “So… where should I read…?” (Most of the example is irrelevant to java.io.FileReader.read.)
  22. Typical or “interesting” examples should be at the top

    Otherwise, developers have to browse a lot. System: “Here are code examples for java.io.FileReader.read.” Developer: “These examples are not what I want…” (scrolls down to reach the desired answer)
  23. Challenges of API-Based Code Search

    • It should return code examples that are as distinct as possible. • -> How to cluster the “duplicated” code examples? • The resulting code examples should be simple. • -> How to choose a concise example from each cluster? • Typical or “interesting” examples should be in the top results. • -> How to sort and rank the code examples?
  24. Agenda

    • Background and Motivation • How Developers Search for Code [Sadowski et al ’15] • What is API-Based Code Search • Previous Research: • MAPO [Zhong et al ’09] • MUSE [Moreno et al ’15] • Future Work
  25. Research on API-Based Code Search

    • MAPO: Mining and Recommending API Usage Patterns [Zhong et al ’09] • Zhong, Hao, et al. "MAPO: Mining and recommending API usage patterns." European Conference on Object-Oriented Programming. Springer, Berlin, Heidelberg, 2009. • How Can I Use This Method? [Moreno et al ’15] • Moreno, Laura, et al. "How can I use this method?" Proceedings of the 37th International Conference on Software Engineering - Volume 1 (ICSE '15). IEEE Press, 880–890, 2015.
  26. MAPO: Mining and recommending API usage patterns

    MAPO is an Eclipse plugin for Java. • Input: API • e.g. some.pkg.class.method • Output: API usage patterns, each described by a Method Invocation Sequence. Screenshot: found patterns of org.eclipse.jface.action.IContributionManager.appendToGroup, listed as Method Invocation Sequences; clicking one shows its details.
  27. MAPO: Mining and recommending API usage patterns

    Screenshot labels: Editor; List of Patterns; Details of the selected pattern.
  28. MAPO architecture

    • Code Analyzer • Retrieves source code and analyzes it into API call sequences. • API usage miner • Clusters the API call sequences and ranks the patterns. • Recommender • Provides code examples to a developer.
  29. MAPO / Code Analyzer

    • MAPO uses Eclipse’s JDT compiler to analyze source code. • For each method, MAPO extracts the sequence of third-party API method calls. Example sequence: • m1 • m2 • @new SomeClass • m3 • chain
  30. MAPO / Code Analyzer

    A more realistic example from the paper: org.eclipse.gef.ui.parts.GraphicalEditor#getGraphicalViewer • @new org.eclipse.gef.editparts.ScalableRootEditPart • @org.eclipse.gef.ui.parts.GraphicalEditor#getGraphicalViewer • @org.eclipse.gef.EditPartViewer#setRootEditPart
  31. Dealing with conditional statements

    If branch: retrieve all possible MISs. Loop: execute once or not at all. • <i1, i2, i4> (then, while) • <i1, i3, i4> (else, while) • <i1, i2> (then, don’t execute loop) • <i1, i3> (else, don’t execute loop)
  32. Dealing with conditional statements

    Limit the number of possibilities (otherwise, a method with many conditionals generates many MISs and ranks higher). Greedily pick from the longer MISs until the picked MISs cover all method invocations. Candidates: • <i1, i2, i4> • <i1, i3, i4> • <i1, i2> • <i1, i3> Picking <i1, i2, i4> and <i1, i3, i4> covers all method invocations: i1, i2, i3, i4.
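
The path enumeration and greedy selection described on the two slides above can be sketched as follows. This is a minimal illustration, not MAPO's implementation: the tuple-based statement encoding and the names `enumerate_mis` and `greedy_cover` are assumptions made for this example, whereas MAPO itself works on the AST produced by Eclipse JDT.

```python
def enumerate_mis(stmts):
    """Enumerate method invocation sequences (MISs) for a tiny statement tree.

    Hypothetical encoding, one tuple per statement:
      ("call", name)                  -> a single method invocation
      ("if", then_stmts, else_stmts)  -> take either branch
      ("loop", body_stmts)            -> run the body once or not at all
    """
    seqs = [[]]
    for stmt in stmts:
        kind = stmt[0]
        if kind == "call":
            options = [[stmt[1]]]
        elif kind == "if":
            options = enumerate_mis(stmt[1]) + enumerate_mis(stmt[2])
        elif kind == "loop":
            options = enumerate_mis(stmt[1]) + [[]]
        else:
            raise ValueError(f"unknown statement kind: {kind}")
        # Append every alternative to every sequence built so far.
        seqs = [s + o for s in seqs for o in options]
    return seqs


def greedy_cover(seqs):
    """Pick MISs longest-first until every invocation is covered."""
    all_calls = set().union(*seqs)
    covered, picked = set(), []
    for seq in sorted(seqs, key=len, reverse=True):
        if covered == all_calls:
            break
        if not set(seq) <= covered:  # skip MISs that add nothing new
            picked.append(seq)
            covered |= set(seq)
    return picked


# i1; if (...) i2 else i3; while (...) i4
method_body = [
    ("call", "i1"),
    ("if", [("call", "i2")], [("call", "i3")]),
    ("loop", [("call", "i4")]),
]
all_mis = enumerate_mis(method_body)
selected = greedy_cover(all_mis)
print(selected)  # [['i1', 'i2', 'i4'], ['i1', 'i3', 'i4']]
```

Sorting by length first is what keeps the number of selected MISs small: the two long sequences already cover i1 to i4, so the shorter ones are never picked.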
  33. MAPO / API Usage Miner / clustering

    To cluster Method Invocation Sequences (MISs), the API Usage Miner calculates the similarity between each pair of MISs. The similarity score is the average of two heuristic scores: • Method and class name • Called API methods
  34. MAPO / API Usage Miner / clustering

    Calculate the (normalized) Levenshtein distance between • Class names • “DEditorActionContributor” vs “RubyEditorActionContributor” • Method names • “contributeToMenu” vs “contributeToMenu”
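
The name-based heuristic can be sketched with a plain dynamic-programming edit distance. Normalizing by the longer name and reporting one minus that ratio as the similarity are assumptions for this sketch; the slide only states that a normalized Levenshtein distance is used.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """1 - distance / max length (the normalization is an assumption)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


print(levenshtein("DEditorActionContributor", "RubyEditorActionContributor"))  # 4
print(name_similarity("contributeToMenu", "contributeToMenu"))  # 1.0
```

The two class names differ only in their prefix ("D" vs "Ruby"), so their distance is small relative to their length and the pair scores as highly similar.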
  35. MAPO / API Usage Miner / clustering

    Similarity based on Method Invocation Sequences • s1, s2: Method Invocation Sequences. • # of API calls: number of API calls. • I1, I2: sets of API method calls in s1 and s2
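
The slide's formula is not reproduced in this transcript. One plausible reading of the symbol definitions, assumed here and not confirmed by the source, is an overlap ratio between the two sets of called API methods (Jaccard similarity). The combined score then averages it with the name-based score, as slide 33 states. The function names are hypothetical.

```python
def call_set_similarity(s1, s2):
    """Overlap of called API methods: |I1 & I2| / |I1 | I2| (assumed formula)."""
    i1, i2 = set(s1), set(s2)
    union = i1 | i2
    if not union:
        return 1.0  # assumption: two empty sequences count as identical
    return len(i1 & i2) / len(union)


def mis_similarity(name_score: float, call_score: float) -> float:
    """Average of the two heuristic scores, as described on slide 33."""
    return (name_score + call_score) / 2


print(call_set_similarity(["a", "b", "c"], ["b", "c", "d"]))  # 0.5
```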
  36. MAPO / API Usage Miner / pattern miner

    For each cluster, MAPO mines the most frequent subsequences using sequential pattern mining. Example sequences in a cluster: • <m1, m2, m3, m4, m5> • <m1, m2, m3, xxx, yyy> • <m1, m2, xxx, m4, m5> • <m1, m2, m3, xxx, m5> The subsequence <m1, m2, m3> appears frequently, so it is reported as a pattern.
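
To make "frequent subsequence" concrete, here is a brute-force sketch that counts, for every (not necessarily contiguous) subsequence, how many sequences of a cluster contain it. It is exponential in the sequence length and only meant for toy inputs; a real sequential pattern miner avoids this enumeration. The function name and the `min_support` parameter are assumptions for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_subsequences(sequences, min_support):
    """Return {subsequence: support} for subsequences found in >= min_support sequences."""
    support = Counter()
    for seq in sequences:
        seen = set()  # count each subsequence at most once per sequence
        for length in range(1, len(seq) + 1):
            for idxs in combinations(range(len(seq)), length):
                seen.add(tuple(seq[i] for i in idxs))
        support.update(seen)
    return {sub: n for sub, n in support.items() if n >= min_support}


cluster = [
    ["m1", "m2", "m3", "m4", "m5"],
    ["m1", "m2", "m3", "xxx", "yyy"],
    ["m1", "m2", "xxx", "m4", "m5"],
    ["m1", "m2", "m3", "xxx", "m5"],
]
patterns = frequent_subsequences(cluster, min_support=3)
print(patterns[("m1", "m2", "m3")])  # 3
```

With a support threshold of 3 out of 4 sequences, <m1, m2, m3> qualifies, while <m1, m2, m4> (present in only two sequences) does not.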
  37. MAPO / Recommender

    Calculate the similarity between the code at hand and the API usage patterns. Examples with higher similarity rank higher.
  38. MAPO / empirical study

    • 6 tasks using GEF (Eclipse Graphical Editing Framework), with test cases. The later tasks are more complicated. • Participants: 6 (all grad students). • 2 groups: • one does the tasks with MAPO • the other with Google Code Search and Strathcona (a code recommendation system).
  39. MAPO / empirical study

    • Each group does the tasks in a fixed time. • Count the number of failed tests (if the program doesn’t build, count it as 1 failed test). Result: MAPO looks effective for complicated tasks (fewer bugs).
  40. MAPO / conclusion

    • MAPO returns Method Invocation Sequences as common usage patterns. • Uses the JDT compiler to extract MISs. • Clusters them by similarity heuristics (based on class and method names, and called APIs). • Identifies common patterns using sequential pattern mining. • Shows examples considering the context.
  41. Drawbacks of MAPO

    MAPO relies on the whole MIS of a method, so irrelevant parts can act as noise (e.g. code irrelevant to java.io.FileReader.read).
  42. Drawbacks of MAPO

    • Returns MISs instead of concrete code examples • Room for improvement in the clustering algorithm • MAPO uses some heuristics…
  43. Agenda

    • Background and Motivation • How Developers Search for Code [Sadowski et al ’15] • What is API-Based Code Search • Previous Research: • MAPO [Zhong et al ’09] • MUSE [Moreno et al ’15] • Future Work
  44. MUSE (Method USage Examples) [Moreno et al ’15]

    • Input: API • Output: Code examples • (whereas MAPO returns Method Invocation Sequences) https://raw.githubusercontent.com/lmorenoc/icse15-muse-appendix/master/commons-io-2.4/examples/writeStringToFile_29.html
  45. MUSE architecture overview

    Basically the same as MAPO. Diagram labels: Code Repository, Code Analyzer, Clustering Engine, Code Clusters, Program Slices, Pattern selection, Code Examples, Developer.
  46. Differences from MAPO

    Same architecture diagram, annotated with the differences: • Instead of MISs, extract program slices • Code clone detection • Use simplicity instead of sequential pattern mining • Code examples, not MISs
  47. Program Slicing

    MUSE applies static backward slicing to extract only the part of a code snippet that is relevant to the API call of interest. Example, where method(x, y) is the interesting method:
      x := 1;
      y := x + 1;
      foo();
      bar();
      method(x, y);
      uninteresting(y);
    The backward slice from method(x, y) keeps only x := 1; y := x + 1; and method(x, y);
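
A backward slice over straight-line code can be sketched by walking upward from the interesting call and keeping only the statements that define a variable still needed. This is a simplified illustration, not MUSE's slicer: it assumes each statement comes with hand-written sets of defined and used variables, and it ignores control flow and side effects. The name `backward_slice` and the tuple layout are hypothetical.

```python
def backward_slice(stmts, criterion):
    """Data-dependence backward slice for straight-line code.

    stmts: list of (text, defined_vars, used_vars)
    criterion: index of the statement of interest
    """
    needed = set(stmts[criterion][2])  # variables the criterion reads
    keep = {criterion}
    for i in range(criterion - 1, -1, -1):
        _, defs, uses = stmts[i]
        if needed & set(defs):
            keep.add(i)
            needed -= set(defs)  # this definition satisfies the need
            needed |= set(uses)  # but its own inputs are now needed
    return [stmts[i][0] for i in sorted(keep)]


program = [
    ("x := 1;", {"x"}, set()),
    ("y := x + 1;", {"y"}, {"x"}),
    ("foo();", set(), set()),
    ("bar();", set(), set()),
    ("method(x, y);", set(), {"x", "y"}),
    ("uninteresting(y);", set(), {"y"}),
]
print(backward_slice(program, 4))
# ['x := 1;', 'y := x + 1;', 'method(x, y);']
```

Statements after the criterion, such as uninteresting(y), never enter a backward slice, and foo() and bar() are dropped because nothing the call reads depends on them.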
  48. Clustering

    MUSE utilizes a code clone detection technique for clustering (whereas MAPO uses similarity heuristics).
  49. Code clone detection

    MUSE uses Simian, a text-based code similarity analyzer, for clustering. Simian, Similarity Analyzer https://www.harukizaemon.com/simian/ The following two snippets are not similar by simple string comparison:
      if (a >= b) {
        c = d + b; // comment1
        d = d + 1;
      } else {
        c = d - a; // comment2
      }

      if (m >= n) { // comment1
        y = x + n; // comment2
        x = x + 1;
      } else {
        y = x - m; // comment3
      }
  50. Code clone detection

    Simian can detect syntactically identical code clones while allowing variation in identifiers, comments, etc. (called type-2 clones). Simian, Similarity Analyzer https://www.harukizaemon.com/simian/ After ignoring comments and renaming identifiers, the first snippet matches the second:
      if (m >= n) {
        y = x + n; // comment1
        x = x + 1;
      } else {
        y = x - m; // comment2
      }

      if (m >= n) { // comment1
        y = x + n; // comment2
        x = x + 1;
      } else {
        y = x - m; // comment3
      }
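
The "ignore comments, rename identifiers" idea can be sketched as a token-level normalization. This is not how Simian is implemented; it is a small illustration under assumptions: only `//` line comments are handled, the keyword list is a tiny hypothetical subset, and identifiers are renamed consistently in order of first appearance.

```python
import re

KEYWORDS = {"if", "else", "while", "for", "return"}  # illustrative subset only
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|[^\sA-Za-z_\d]")


def normalize(code: str):
    """Strip // comments and rename identifiers to id0, id1, ... by first use."""
    code = re.sub(r"//[^\n]*", "", code)
    names = {}
    tokens = []
    for tok in TOKEN.findall(code):
        if re.match(r"[A-Za-z_]", tok) and tok not in KEYWORDS:
            tok = names.setdefault(tok, f"id{len(names)}")
        tokens.append(tok)
    return tokens


snippet_a = """if (a >= b) {
  c = d + b; // comment1
  d = d + 1;
} else {
  c = d - a; // comment2
}"""

snippet_b = """if (m >= n) { // comment1
  y = x + n; // comment2
  x = x + 1;
} else {
  y = x - m; // comment3
}"""

print(snippet_a == snippet_b)                        # False
print(normalize(snippet_a) == normalize(snippet_b))  # True
```

Because the renaming is consistent (every occurrence of a name maps to the same placeholder), a snippet that uses a different variable in the else branch would not be reported as a clone.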
  51. Rank example clusters

    Intuition: a cluster with more examples coming from different methods ranks higher (e.g. a cluster with 3 examples ranks above a cluster with 1 example).
  52. Readability and reusability metric

    MUSE picks one representative for each cluster (because MUSE shows only one example per cluster): the most readable and reusable code.
  53. Readability metric

    Learning a metric for code readability [Buse et al ’09] • Supervised classification model. • Vectorize a method by • Lines, avg. # of identifiers, indentation • # of branches, loops, assignments, etc. Pipeline: Vectorize -> ML model -> score (e.g. 0.8)
  54. Reusability metric

    Intuition: more stdlib objects -> more reusable • ex_i: the i-th code example • #JavaObjectTypes: number of objects from JDK libraries • #ObjectTypes: number of (any) objects
  55. Selection Score

    The selection score of an example is the average of • Readability metric • Reusability metric. Based on this score, MUSE picks the highest-scoring example from each cluster.
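
The selection step can be sketched as below. The reusability ratio #JavaObjectTypes / #ObjectTypes is an assumed reading of the symbols on slide 54 (the formula itself is not in this transcript), and the readability value is taken as a given number standing in for the output of the learned model. The `Example` class and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Example:
    code: str
    readability: float      # stand-in for the learned model's score, in [0, 1]
    java_object_types: int  # objects whose types come from JDK libraries
    object_types: int       # all objects used in the example


def reusability(ex: Example) -> float:
    """Assumed formula: share of objects that come from the JDK."""
    if ex.object_types == 0:
        return 1.0  # assumption: nothing project-specific to depend on
    return ex.java_object_types / ex.object_types


def selection_score(ex: Example) -> float:
    """Average of readability and reusability, as stated on slide 55."""
    return (ex.readability + reusability(ex)) / 2


def pick_representative(cluster):
    """One example per cluster: the one with the highest selection score."""
    return max(cluster, key=selection_score)


cluster = [
    Example("example A", readability=0.8, java_object_types=2, object_types=4),
    Example("example B", readability=0.6, java_object_types=3, object_types=3),
]
print(pick_representative(cluster).code)  # example B
```

Example B wins despite its lower readability because every object it uses is assumed to come from the JDK, which lifts its average above that of example A.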
  56. MUSE architecture

    Diagram labels: Code Repository, Code Analyzer, Clustering Engine, Code Clusters, Program Slices, Pattern miner, Code Examples, Developer. Developer: “Give me a code example for FileUtils.writeStringToFile”
  57. MUSE / experiment

    RQ: Do MUSE’s examples help developers complete their programming tasks? • 12 industrial developers (5 years of experience on average) • Two kinds of tasks were assigned, 60 minutes each • Participants had to use designated third-party libraries.
  58. MUSE / experiment

    Assigned tasks • T1: Retrieve PDF metadata for a given input URL. • T2: Open a CSV file, filter rows by a specific value, then write the filtered data to a designated file. Each task has subtasks (e.g. fetch a PDF from the internet, open a CSV file, …). The completeness of each subtask is evaluated by (human) code review.
  59. MUSE / experiment

    T1-NCE: Solve T1 using any resources available on the Internet (e.g. Stack Overflow, API documentation), except MUSE-generated code examples. T1-CE: Solve T1 using any resources available on the Internet, plus MUSE-generated code examples.
  60. Agenda

    • Background and Motivation • How Developers Search for Code [Sadowski et al ’15] • What is API-Based Code Search • Previous Research: • MAPO [Zhong et al ’09] • MUSE [Moreno et al ’15] • Future Work
  61. Room for improvement in MAPO and MUSE

    • Unnecessary complexity of code examples. • Code examples need a description of the returned value. • Large search scope, huge index size (for MUSE).
  62. Unnecessary complexity of code examples

    In a code example for BufferedReader.read, only part of the code is the interesting part for BufferedReader.read; the rest is unnecessary but can’t be removed by slicing.
  63. Need returned value description in code example

    MAPO and MUSE give code examples, but the examples lack the result of execution. (“What is the value of n in the end???”)
  64. Large search scope, huge indexes

    MUSE needs to create code slices and clusters for each API call across the whole search scope. A single code snippet can generate 3 API usage examples: • FileReader.read • StringBuffer.append • InputStreamReader.read In the experiment, MUSE fixed the queries and the search scope…
  65. What can we do?

    (Idea) Extract code examples from the unit tests of the method. • Unit tests tend to focus on the usage of a specific method. • They contain example inputs and outputs of the method. • This limits the search scope and the index size.
  66. Thank you!

    • Background and Motivation • How Developers Search for Code [Sadowski et al ’15] • What is API-Based Code Search • Previous Research: • MAPO [Zhong et al ’09] • MUSE [Moreno et al ’15] • Future Work
  67. Related research areas

    • Automatic code summarization • Shido, Yusuke, et al. "Automatic source code summarization with extended tree-LSTM." 2019 International Joint Conference on Neural Networks (IJCNN). IEEE, 2019. • Code snippet recommendation • Luan, Sifei, et al. "Aroma: Code recommendation via structural code search." Proceedings of the ACM on Programming Languages 3.OOPSLA (2019): 1-28. • Text-based code search with deep learning techniques • Husain, Hamel, et al. "CodeSearchNet challenge: Evaluating the state of semantic code search." arXiv preprint arXiv:1909.09436 (2019).