Slide 1

Slide 1 text

API-Based Code Search
Rikito Taniguchi
March 19, 2021

Slide 2

Slide 2 text

What do you do if…?
If you have to use an unfamiliar API method, what do you do?
e.g. “What function should I pass to Array.prototype.sort in JS?”
• Read the API documentation
• Visit related forums and Q&A sites (e.g. Stack Overflow, discussions)
• Try to find an API code example

Slide 3

Slide 3 text

What do you do if…?
If you have to use an unfamiliar API method, what do you do?
e.g. “What function should I pass to Array.prototype.sort in JS?”
• Read the API documentation
• Visit related forums and Q&A sites (e.g. Stack Overflow, discussions)
• Try to find an API code example (today’s topic)

Slide 4

Slide 4 text

Agenda
• Background and Motivation
  • How Developers Search for Code [Sadowski et al. ’15]
• What is API-Based Code Search
• Previous Research:
  • MAPO [Zhong et al. ’09]
  • MUSE [Moreno et al. ’15]
• Future Work

Slide 5

Slide 5 text

Agenda
• Background and Motivation
  • How Developers Search for Code [Sadowski et al. ’15]
• What is API-Based Code Search
• Previous Research:
  • MAPO [Zhong et al. ’09]
  • MUSE [Moreno et al. ’15]
• Future Work

Slide 6

Slide 6 text

How do developers learn API usage?
• Developers frequently need to use unfamiliar APIs (methods).
  • e.g. FileReader.read in the JDK library
• How do developers learn how to use those APIs?
Developer: “How do I use the FileReader.read method?”

Slide 7

Slide 7 text

Visit Q&A Forums
e.g. Stack Overflow, GitHub issues
Problem: For unpopular APIs, it’s difficult to find an answer.
Example: searching for a code example of the Scala 3 compiler API…

Slide 8

Slide 8 text

Visit Q&A Forums
Problems
• The answer might not be up to date.
  • e.g. the answer is for a previous version of the API…
• The answer could be too specific to the original question.

Slide 9

Slide 9 text

Read API documentation
• APIs are not necessarily well documented.
• It is hard to grasp the usage from a natural-language explanation alone.
https://docs.oracle.com/javase/8/docs/api/java/util/ArrayList.html

Slide 10

Slide 10 text

Need more code examples!!!
Code examples play a significant role in complementing developers’ understanding of API usage.
https://docs.ruby-lang.org/en/2.0.0/Array.html

Slide 11

Slide 11 text

However…
• For authors of large libraries (like the JDK), it’s time-consuming to write and maintain API examples.
• Code examples are not necessarily available.
In that case, we need to search for code examples.
In fact, research shows that developers frequently search for code examples.

Slide 12

Slide 12 text

How Developers Search for Code [Sadowski et al. ’15]
• An empirical study performed at Google.
• Research Questions
  • RQ1: Why do programmers search?
  • and more …

Slide 13

Slide 13 text

How Developers Search for Code [Sadowski et al. ’15]
• (Probably) Google’s internal code search engine was used.
• Participants: 27 software developers at Google
• Duration: 2 weeks
• Before each search, participants answered “What question are you trying to answer?” as a free-form response.
• 259 answers were gathered.
• The answers were categorized to analyze the purpose of each search.

Slide 14

Slide 14 text

How Developers Search for Code [Sadowski et al. ’15]
On average, 5 search sessions and 12 queries per workday.
• 33.5% of searches look for a code example.
• 26% are for exploring or reading code.
• …

Slide 15

Slide 15 text

How Developers Search for Code [Sadowski et al ’15] Developers frequently search for code examples! However…

Slide 16

Slide 16 text

Problem with text-based search engines
Searching for FileReader.read on GitHub Search (filtered to Java code)

Slide 17

Slide 17 text

Problem with text-based search engines
• Since a text-based engine ignores semantic information, it matches many irrelevant code snippets.
• Because it ranks results with natural-language techniques (e.g. TF-IDF), unimportant results tend to rank near the top.
We need a better search engine for code examples.

Slide 18

Slide 18 text

Agenda
• Background and Motivation
  • How Developers Search for Code [Sadowski et al. ’15]
• What is API-Based Code Search
• Previous Research:
  • MAPO [Zhong et al. ’09]
  • MUSE [Moreno et al. ’15]
• Future Work

Slide 19

Slide 19 text

API-Based Code Search
An API-based code search system returns code examples in response to an API search query.
Developer: “How do I use java.io.FileReader.read?”
API-Based Code Search System: “Here are code examples for java.io.FileReader.read.”

Slide 20

Slide 20 text

Requirements for API-Based Code Search
• It should return distinct code examples.
  • Otherwise, developers need to browse many search results.
• The resulting code examples should be simple.
  • So developers can easily understand the API usage.
• Typical or “interesting” examples should be in the top results.
  • Niche use cases tend to be uninteresting.

Slide 21

Slide 21 text

Should return distinct code examples
Duplicated results make developers browse tons of examples…
API-Based Code Search System: “Here are code examples for java.io.FileReader.read.”
Developer: “These examples are all doing the same thing…” (browsing down…)

Slide 22

Slide 22 text

Should return distinct code examples
Distinct examples enable developers to find the interesting example quickly.
API-Based Code Search System: “Here are code examples for java.io.FileReader.read.”
Developer: “THIS IS IT!!”

Slide 23

Slide 23 text

The code examples should be simple
API-Based Code Search System: “Here is a code example for java.io.FileReader.read.”
(Most of the returned example is irrelevant to java.io.FileReader.read.)
Developer: “So… where should I read…?”

Slide 24

Slide 24 text

Typical or “interesting” examples should be at the top
Otherwise, developers have to browse a lot…
API-Based Code Search System: “Here are code examples for java.io.FileReader.read.”
Developer: “These examples are not what I want…” (scrolling down…)
(The answer the developer wants appears far down the list.)

Slide 25

Slide 25 text

Challenges of API-Based Code Search
• It should return code examples that are as distinct as possible.
  • -> How to cluster the “duplicated” code examples?
• The resulting code examples should be simple.
  • -> How to choose a concise example from each cluster?
• Typical or “interesting” examples should be in the top results.
  • -> How to sort and rank the code examples?

Slide 26

Slide 26 text

Agenda
• Background and Motivation
  • How Developers Search for Code [Sadowski et al. ’15]
• What is API-Based Code Search
• Previous Research:
  • MAPO [Zhong et al. ’09]
  • MUSE [Moreno et al. ’15]
• Future Work

Slide 27

Slide 27 text

API-Based Code Search Research
• MAPO: Mining and Recommending API Usage Patterns [Zhong et al. ’09]
  • Zhong, Hao, et al. "MAPO: Mining and recommending API usage patterns." European Conference on Object-Oriented Programming. Springer, Berlin, Heidelberg, 2009.
• How Can I Use This Method? [Moreno et al. ’15]
  • Moreno, Laura, et al. "How can I use this method?" In Proceedings of the 37th International Conference on Software Engineering - Volume 1 (ICSE '15). IEEE Press, 880–890, 2015.

Slide 28

Slide 28 text

MAPO: Mining and recommending API usage patterns
• Input: an API
  • e.g. some.pkg.class.method
• Output: API usage patterns, each described as a Method Invocation Sequence.
MAPO is an Eclipse plugin for Java.
Screenshot: patterns found for org.eclipse.jface.action.IContributionManager.appendToGroup. Each entry is a Method Invocation Sequence; clicking it shows the details.

Slide 29

Slide 29 text

MAPO: Mining and recommending API usage patterns
Screenshot regions: Editor; List of Patterns; Details of the selected pattern

Slide 30

Slide 30 text

MAPO architecture
• Code Analyzer
  • Retrieves code and analyzes it into API call sequences.
• API Usage Miner
  • Clusters the API call sequences and ranks the patterns.
• Recommender
  • Provides code examples to a developer.

Slide 31

Slide 31 text

MAPO / Code Analyzer
• MAPO uses Eclipse’s JDT compiler to analyze source code.
• For each method, MAPO extracts the sequence of third-party API method calls.
Example of an extracted sequence:
• m1
• m2
• @new SomeClass
• m3
• chain
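Note: the extraction step can be illustrated with a rough analogue. The sketch below is not MAPO's implementation (MAPO works on Java through JDT); it uses Python's standard `ast` module on a made-up Python function, and the names `SOURCE`, `example` and `call_sequence` are hypothetical. Unlike MAPO, it does not filter for third-party APIs.

```python
import ast

# Hypothetical input: a Python function standing in for an analyzed Java method.
SOURCE = '''
def example(api):
    api.m1()
    api.m2()
    obj = SomeClass()
    obj.m3().chain()
'''


def _simple_name(func: ast.expr) -> str:
    """Return the short name of the callee (method or function name)."""
    if isinstance(func, ast.Attribute):
        return func.attr
    if isinstance(func, ast.Name):
        return func.id
    return ast.unparse(func)


def call_sequence(source: str, func_name: str) -> list[str]:
    """List the calls made inside `func_name`, in approximate evaluation order."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            calls = [n for n in ast.walk(node) if isinstance(n, ast.Call)]
            # A call ends after its receiver and arguments, so sorting by end
            # position puts inner calls before the calls that wrap them.
            calls.sort(key=lambda c: (c.end_lineno, c.end_col_offset))
            return [_simple_name(c.func) for c in calls]
    raise ValueError(f"function {func_name!r} not found")


print(call_sequence(SOURCE, "example"))
```

Sorting by end position is a shortcut that only approximates evaluation order; a real analyzer would walk the tree in evaluation order and resolve each callee to its declaring type.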

Slide 32

Slide 32 text

MAPO / Code Analyzer
A more realistic example from the paper:
org.eclipse.gef.ui.parts.GraphicalEditor#getGraphicalViewer
• @new org.eclipse.gef.editparts.ScalableRootEditPart
• @org.eclipse.gef.ui.parts.GraphicalEditor#getGraphicalViewer
• @org.eclipse.gef.EditPartViewer#setRootEditPart

Slide 33

Slide 33 text

Dealing with conditional statements
If branch: retrieve all possible MISs.
Loop: execute once or not at all.
Paths in the example:
• (then, while)
• (else, while)
• (don’t execute loop)
• (don’t execute loop)
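Note: a minimal sketch of the path enumeration described on this slide. The nested-tuple representation of a method body and the name `enumerate_mis` are assumptions made for illustration; they are not taken from MAPO.

```python
def enumerate_mis(body):
    """Return every method invocation sequence (MIS) that `body` can produce.

    `body` is a list whose items are either a call name (str),
    ("if", then_body, else_body), or ("loop", loop_body).
    An if statement contributes both branches; a loop runs once or not at all.
    """
    sequences = [()]
    for stmt in body:
        if isinstance(stmt, str):
            options = [(stmt,)]
        elif stmt[0] == "if":
            options = enumerate_mis(stmt[1]) + enumerate_mis(stmt[2])
        elif stmt[0] == "loop":
            options = [()] + enumerate_mis(stmt[1])
        else:
            raise ValueError(f"unknown statement kind: {stmt[0]!r}")
        sequences = [prefix + opt for prefix in sequences for opt in options]
    # Drop duplicates while keeping the first-seen order.
    return list(dict.fromkeys(sequences))


# Hypothetical method body: one if/else followed by one loop.
BODY = ["open", ("if", ["a"], ["b"]), ("loop", ["read"]), "close"]

for mis in enumerate_mis(BODY):
    print(mis)
```

One if/else and one loop give four sequences (then or else, each with or without the loop body), which is why the count grows quickly for methods with many conditionals.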

Slide 34

Slide 34 text

Dealing with conditional statements
Limit the number of possibilities. (Otherwise, a method with many conditionals generates many MISs and ranks higher.)
Greedily pick from the longer MISs until the picked MISs cover all of the method’s invocations (in the example: i1, i2, i3, i4).
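Note: a small sketch of the greedy selection, under the assumption that a sequence is skipped when it adds no uncovered invocation (the slide does not say how such sequences are handled). The function name `pick_covering_mis` and the candidate data are hypothetical.

```python
def pick_covering_mis(candidates):
    """Pick sequences, longest first, until every invocation is covered."""
    all_calls = set().union(*candidates)
    covered = set()
    picked = []
    for seq in sorted(candidates, key=len, reverse=True):
        if covered >= all_calls:
            break  # everything is covered; stop early
        if not set(seq) <= covered:  # assumption: skip sequences adding nothing new
            picked.append(seq)
            covered |= set(seq)
    return picked


# Hypothetical candidate MISs over the invocations i1..i4.
CANDIDATES = [
    ("i1", "i2"),
    ("i1", "i3", "i4"),
    ("i1",),
    ("i1", "i2", "i4"),
]

print(pick_covering_mis(CANDIDATES))
```

Here two of the four candidates already cover i1 to i4, so the remaining shorter sequences are dropped.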

Slide 35

Slide 35 text

MAPO / API Usage Miner / clustering
To cluster Method Invocation Sequences (MISs), the API Usage Miner calculates the similarity between each pair of MISs.
The similarity score is the average of two heuristic scores:
• Method and class name
• Called API methods

Slide 36

Slide 36 text

MAPO / API Usage Miner / clustering
Calculate the (normalized) Levenshtein distance between
• Class names
  • “DEditorActionContributor” vs “RubyEditorActionContributor”
• Method names
  • “contributeToMenu” vs “contributeToMenu”
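Note: a minimal sketch of a name-similarity score built on Levenshtein distance. Normalizing by the longer string's length is an assumption; the slide only says “(Normalized)”. The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """1.0 for identical names, approaching 0.0 for unrelated ones."""
    if not a and not b:
        return 1.0
    # Assumption: normalize by the length of the longer name.
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


print(name_similarity("DEditorActionContributor", "RubyEditorActionContributor"))
print(name_similarity("contributeToMenu", "contributeToMenu"))
```

The two class names from the slide differ only in their prefix, so they score high; the identical method names score exactly 1.0.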

Slide 37

Slide 37 text

MAPO / API Usage Miner / clustering
Similarity based on Method Invocation Sequences
• s1, s2: Method Invocation Sequences
• # of API calls: the number of API calls
• I1, I2: the sets of API method calls in s1 and s2
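Note: the formula itself is not reproduced in this text, so the sketch below assumes a Jaccard-style ratio (shared API calls over all API calls in I1 and I2), which fits the symbols listed above. Treat it as an illustration, not as MAPO's exact definition; `call_set_similarity` is a hypothetical name.

```python
def call_set_similarity(s1, s2) -> float:
    """Ratio of API calls shared by both sequences to API calls in either one."""
    i1, i2 = set(s1), set(s2)
    if not i1 and not i2:
        return 1.0  # assumption: two empty sequences count as identical
    return len(i1 & i2) / len(i1 | i2)


# Two hypothetical sequences sharing m2 and m3 out of four distinct calls.
print(call_set_similarity(["m1", "m2", "m3"], ["m2", "m3", "m4"]))
```

Under the same assumption, the overall MIS similarity would be the average of this score and the name-based score.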

Slide 38

Slide 38 text

MAPO / API Usage Miner / pattern miner
For each cluster, MAPO mines the most frequent subsequences using sequential pattern mining.
Example sequences in a cluster:
• m1, m2, m3, m4, m5
• m1, m2, m3, xxx, yyy
• m1, m2, xxx, m4, m5
• m1, m2, m3, xxx, m5
Subsequences that appear frequently become the patterns.
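Note: a brute-force sketch of frequent-subsequence counting. It stands in for a real sequential pattern miner and is only practical for tiny inputs. The grouping of the slide's items into four sequences of five calls is an assumption, as are the names `frequent_subsequences` and `CLUSTER`.

```python
from collections import Counter
from itertools import combinations


def frequent_subsequences(sequences, min_support):
    """Map each (not necessarily contiguous) subsequence to its support.

    Support is the number of input sequences that contain the subsequence;
    only subsequences with support >= min_support are returned.
    """
    support = Counter()
    for seq in sequences:
        seen = set()
        for length in range(1, len(seq) + 1):
            for idx in combinations(range(len(seq)), length):
                seen.add(tuple(seq[i] for i in idx))
        support.update(seen)  # count each subsequence once per sequence
    return {p: c for p, c in support.items() if c >= min_support}


CLUSTER = [
    ["m1", "m2", "m3", "m4", "m5"],
    ["m1", "m2", "m3", "xxx", "yyy"],
    ["m1", "m2", "xxx", "m4", "m5"],
    ["m1", "m2", "m3", "xxx", "m5"],
]

patterns = frequent_subsequences(CLUSTER, min_support=3)
longest = max(len(p) for p in patterns)
print(sorted(p for p in patterns if len(p) == longest))
```

With a support threshold of 3, patterns such as (m1, m2, m3) and (m1, m2, m5) survive, while the full five-call sequence does not.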

Slide 39

Slide 39 text

MAPO / Recommender
Calculate the similarity between the code at hand and each API usage pattern.
Examples with higher similarity rank higher.

Slide 40

Slide 40 text

MAPO / empirical study
• 6 tasks using GEF (Eclipse Graphical Editing Framework), with test cases. The later tasks are more complicated.
• Participants: 6 (all grad students).
• 2 groups:
  • one does the tasks with MAPO
  • the other uses Google Code Search and Strathcona (a code recommendation system).

Slide 41

Slide 41 text

MAPO / empirical study
• Each group does the tasks within a fixed time.
• Count the number of failed tests (if the program doesn’t build, it counts as 1 failed test).
MAPO looks effective for complicated tasks (fewer bugs).

Slide 42

Slide 42 text

MAPO / conclusion
• MAPO returns Method Invocation Sequences (MISs) as common usage patterns.
• It uses the JDT compiler to extract MISs.
• It clusters the MISs with similarity heuristics (based on class and method names, and called APIs).
• It identifies common patterns using sequential pattern mining.
• It shows examples that take the developer’s context into account.

Slide 43

Slide 43 text

Drawbacks of MAPO
MAPO relies on the whole MIS of a method, so the irrelevant parts can become noise.
(In the example, part of the sequence is irrelevant to java.io.FileReader.read.)

Slide 44

Slide 44 text

Drawbacks of MAPO
• Returns MISs instead of concrete code examples.
• There is room for improvement in the clustering algorithm.
  • MAPO uses some heuristics…

Slide 45

Slide 45 text

Agenda
• Background and Motivation
  • How Developers Search for Code [Sadowski et al. ’15]
• What is API-Based Code Search
• Previous Research:
  • MAPO [Zhong et al. ’09]
  • MUSE [Moreno et al. ’15]
• Future Work

Slide 46

Slide 46 text

MUSE (Method USage Examples) [Moreno et al. ’15]
• Input: an API
• Output: code examples
  • (whereas MAPO returns Method Invocation Sequences)
https://raw.githubusercontent.com/lmorenoc/icse15-muse-appendix/master/commons-io-2.4/examples/writeStringToFile_29.html

Slide 47

Slide 47 text

MUSE architecture overview
Basically the same as MAPO.
Diagram components: Code Repository, Code Analyzer, Clustering Engine, Code Clusters, Program Slices, Pattern selection, Code Examples, Developer

Slide 48

Slide 48 text

Difference from MAPO
• Code Analyzer: extracts program slices instead of MISs.
• Clustering Engine: uses code clone detection.
• Pattern selection: uses simplicity instead of sequential pattern mining.
• Output: code examples, not MISs.

Slide 49

Slide 49 text

Program Slicing
MUSE applies static backward slicing to extract only the part of a code snippet that is relevant to the API call of interest.
Example (method is the interesting method; the slice keeps only the statements it depends on):
  x := 1;
  y := x + 1;
  foo();
  bar();
  method(x, y);
  uninteresting(y);
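Note: a toy backward slice over the slide's straight-line example. It tracks data dependencies only and ignores control flow, so it is far simpler than the slicing MUSE performs. The (text, defs, uses) encoding and the name `backward_slice` are assumptions for illustration.

```python
# Each statement: (source text, variables it defines, variables it uses).
STATEMENTS = [
    ("x := 1", {"x"}, set()),
    ("y := x + 1", {"y"}, {"x"}),
    ("foo()", set(), set()),
    ("bar()", set(), set()),
    ("method(x, y)", set(), {"x", "y"}),
    ("uninteresting(y)", set(), {"y"}),
]


def backward_slice(statements, criterion_index):
    """Keep the criterion statement and every earlier statement it depends on."""
    needed = set(statements[criterion_index][2])  # variables still unexplained
    kept = [criterion_index]
    for i in range(criterion_index - 1, -1, -1):
        _, defs, uses = statements[i]
        if defs & needed:
            kept.append(i)
            # This definition explains some variables but needs its own inputs.
            needed = (needed - defs) | uses
    return [statements[i][0] for i in sorted(kept)]


print(backward_slice(STATEMENTS, criterion_index=4))
```

The calls to foo and bar, and the later uninteresting call, fall outside the slice because method(x, y) does not depend on them.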

Slide 50

Slide 50 text

Clustering
MUSE uses a code clone detection technique for clustering (whereas MAPO uses similarity heuristics).

Slide 51

Slide 51 text

Code clone detection
MUSE uses Simian, a text-based code similarity analyzer, for clustering.
Simian, Similarity Analyzer https://www.harukizaemon.com/simian/
The following two snippets are not similar under a simple string comparison:
  if (a >= b) {
    c = d + b; // comment1
    d = d + 1;
  } else {
    c = d - a; // comment2
  }

  if (m >= n) { // comment1
    y = x + n; // comment2
    x = x + 1;
  } else {
    y = x - m; // comment3
  }

Slide 52

Slide 52 text

Code clone detection
Simian can detect syntactically identical code clones while allowing variation in identifiers, comments, etc. (so-called type-2 clones).
Simian, Similarity Analyzer https://www.harukizaemon.com/simian/
After ignoring comments and renaming identifiers, the two snippets match:
  if (m >= n) {
    y = x + n; // comment1
    x = x + 1;
  } else {
    y = x - m; // comment2
  }

  if (m >= n) { // comment1
    y = x + n; // comment2
    x = x + 1;
  } else {
    y = x - m; // comment3
  }
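Note: a minimal sketch of type-2 clone comparison, using the two snippets from the previous slide. This is not how Simian is implemented; it only illustrates the idea of dropping comments and renaming identifiers consistently before comparing. The tokenizer, the keyword list and the name `normalize` are assumptions.

```python
import re

KEYWORDS = {"if", "else", "while", "for", "return"}
# Identifiers, integer literals, a few two-character operators, then any symbol.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|>=|<=|==|!=|\S")


def normalize(code: str) -> list[str]:
    """Strip // comments and rename identifiers to id0, id1, ... in first-use order."""
    code = re.sub(r"//[^\n]*", "", code)
    names: dict[str, str] = {}
    tokens = []
    for tok in TOKEN.findall(code):
        if re.match(r"[A-Za-z_]", tok) and tok not in KEYWORDS:
            tok = names.setdefault(tok, f"id{len(names)}")
        tokens.append(tok)
    return tokens


SNIPPET_A = """
if (a >= b) {
  c = d + b; // comment1
  d = d + 1;
} else {
  c = d - a; // comment2
}
"""

SNIPPET_B = """
if (m >= n) { // comment1
  y = x + n; // comment2
  x = x + 1;
} else {
  y = x - m; // comment3
}
"""

print(normalize(SNIPPET_A) == normalize(SNIPPET_B))
```

Renaming in first-use order keeps the comparison strict: two snippets match only if their identifiers are used in the same pattern, not merely in the same positions.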

Slide 53

Slide 53 text

Rank example clusters
Intuition: the more examples from different methods a cluster contains, the higher it ranks.
(In the diagram, a cluster with 3 examples ranks above a cluster with 1 example.)
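Note: a small sketch of this ranking intuition, assuming each example records the method it was extracted from and that several examples from the same method count once. The data layout, the method names and `rank_clusters` are hypothetical.

```python
def rank_clusters(clusters):
    """Order cluster ids by how many distinct source methods contributed examples."""
    def distinct_methods(cluster_id):
        return len({method for method, _snippet in clusters[cluster_id]})

    return sorted(clusters, key=distinct_methods, reverse=True)


# Hypothetical clusters: lists of (source method, code snippet) pairs.
CLUSTERS = {
    "B": [("Qux.write", "...")],
    "A": [("Foo.save", "..."), ("Bar.dump", "..."), ("Baz.export", "...")],
    "C": [("Foo.save", "..."), ("Foo.save", "...")],
}

print(rank_clusters(CLUSTERS))
```

Cluster A comes first because its three examples come from three different methods; cluster C has two examples but only one source method, so it ties with B.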

Slide 54

Slide 54 text

Readability and reusability metric
MUSE picks one representative for each cluster, because it shows only one example per cluster.
It picks the most readable and reusable code.

Slide 55

Slide 55 text

Readability metric
Learning a metric for code readability [Buse et al. ’09]
• Supervised classification model.
• A method is vectorized using
  • Lines, avg of identifiers, indent
  • # of branches, loops, assignments, etc.
Diagram: method -> vectorize -> ML model -> score (e.g. 0.8)

Slide 56

Slide 56 text

Reusability metric
Intuition: the more standard-library objects an example uses, the more reusable it is.
• ex_i: the i-th code example
• #JavaObjectTypes: the number of object types from the JDK libraries
• #ObjectTypes: the number of object types of any kind

Slide 57

Slide 57 text

Selection Score
The selection score of an example is the average of
• the readability metric
• the reusability metric
For each cluster, MUSE picks the example with the highest score.
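Note: a minimal sketch of representative selection. The reusability formula is not reproduced in this text; the ratio #JavaObjectTypes / #ObjectTypes used below is an assumption based on the symbols on the previous slide. The readability values are made-up stand-ins for the learned model's output, and all names are hypothetical.

```python
def reusability(jdk_types: int, all_types: int) -> float:
    """Assumed form: share of object types that come from the JDK libraries."""
    if all_types == 0:
        return 1.0  # assumption: an example with no object types is fully reusable
    return jdk_types / all_types


def selection_score(readability: float, reusability_score: float) -> float:
    """Average of the two metrics, as stated on the slide."""
    return (readability + reusability_score) / 2


def pick_representative(cluster):
    """Return the example with the highest selection score."""
    return max(
        cluster,
        key=lambda ex: selection_score(
            ex["readability"], reusability(ex["jdk_types"], ex["all_types"])
        ),
    )


# Hypothetical cluster with precomputed readability scores.
CLUSTER = [
    {"name": "ex1", "readability": 0.8, "jdk_types": 2, "all_types": 4},
    {"name": "ex2", "readability": 0.6, "jdk_types": 3, "all_types": 3},
]

print(pick_representative(CLUSTER)["name"])
```

ex2 wins even though it is less readable, because it uses only JDK types; averaging lets either metric compensate for the other.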

Slide 58

Slide 58 text

MUSE architecture
Developer: “Give me a code example for FileUtils.writeStringToFile.”
Diagram components: Code Repository, Code Analyzer, Clustering Engine, Code Clusters, Program Slices, Pattern miner, Code Examples, Developer

Slide 59

Slide 59 text

MUSE / experiment
RQ: Do MUSE’s examples help developers complete their programming tasks?
• 12 industrial developers (5 years of experience on average)
• Two kinds of tasks were assigned, with 60 minutes for each.
• Participants had to use designated third-party libraries.

Slide 60

Slide 60 text

MUSE / experiment
Assigned tasks
• T1: Retrieve PDF metadata for a given input URL.
• T2: Open a CSV file, filter its rows by a specific value, then write the filtered data to a designated file.
Each task has subtasks (e.g. fetch a PDF from the internet, open a CSV file, …).
The completeness of each subtask is evaluated by (human) code review.

Slide 61

Slide 61 text

MUSE / experiment
• T1-NCE: Solve T1 using any resources available on the Internet (e.g. Stack Overflow, API documentation), except MUSE-generated code examples.
• T1-CE: Solve T1 using any resources available on the Internet, plus MUSE-generated code examples.

Slide 62

Slide 62 text

MUSE / experiment
• Average completeness in the NCE group: 53%
• Average completeness in the CE group: 73%

Slide 63

Slide 63 text

Agenda
• Background and Motivation
  • How Developers Search for Code [Sadowski et al. ’15]
• What is API-Based Code Search
• Previous Research:
  • MAPO [Zhong et al. ’09]
  • MUSE [Moreno et al. ’15]
• Future Work

Slide 64

Slide 64 text

Room for improvement in MAPO and MUSE
• Code examples are unnecessarily complex.
• Code examples need a description of the returned value.
• Large search scope and huge index size (for MUSE).

Slide 65

Slide 65 text

Unnecessary complexity of code examples
Code example for BufferedReader.read:
• One part is the interesting use of BufferedReader.read.
• The rest is unnecessary, but slicing can’t remove it.

Slide 66

Slide 66 text

Need returned value description in code example
MAPO and MUSE give code examples, but the examples lack the result of execution.
Developer: “What value is in n at the end???”

Slide 67

Slide 67 text

Large search scope, huge indexes
MUSE needs to create code slices and clusters for each API call across the whole search scope.
A single code snippet generates 3 API usage examples:
• FileReader.read
• StringBuffer.append
• InputStreamReader.read
In the experiment, MUSE fixes the queries and the search scope…

Slide 68

Slide 68 text

What can we do?
(Idea) Extract code examples from the unit tests of the method.
• Unit tests tend to focus on the usage of a specific method.
• They contain example inputs and outputs of the method.
• They limit the search scope and the index size.

Slide 69

Slide 69 text

Thank you!
• Background and Motivation
  • How Developers Search for Code [Sadowski et al. ’15]
• What is API-Based Code Search
• Previous Research:
  • MAPO [Zhong et al. ’09]
  • MUSE [Moreno et al. ’15]
• Future Work

Slide 70

Slide 70 text

Related research areas
• Automatic code summarization
  • Shido, Yusuke, et al. "Automatic source code summarization with extended tree-LSTM." 2019 International Joint Conference on Neural Networks (IJCNN). IEEE, 2019.
• Code snippet recommendation
  • Luan, Sifei, et al. "Aroma: Code recommendation via structural code search." Proceedings of the ACM on Programming Languages 3.OOPSLA (2019): 1-28.
• Text-based code search with deep learning techniques
  • Husain, Hamel, et al. "CodeSearchNet challenge: Evaluating the state of semantic code search." arXiv preprint arXiv:1909.09436 (2019).