an unfamiliar API method, what do you do? e.g. “What function should I pass to Array.prototype.sort in JS?”
• Read the API documentation
• Visit related forums and Q&A sites (e.g. Stack Overflow, discussions)
• Try to find API code examples (today’s topic)
use an unfamiliar API (method).
• e.g. FileReader.read in the JDK library
• How do developers learn how to use those APIs?
Developer: “How do I use the FileReader.read method?”
to write and maintain the API examples.
• Code examples are not necessarily available. In that case, we need to search for code examples.
In fact, research shows that developers frequently search for code examples.
(Probably) Google’s internal code search engine was used.
• Participants: 27 software developers at Google
• Duration: 2 weeks
• Before each search, participants answered “What question are you trying to answer?” in a free-form response.
• 259 answers were gathered.
• The answers were categorized to analyze the purpose of each search.
care about semantic information, it matches a lot of irrelevant code snippets.
• Because it ranks results with text-based NLP scoring (e.g. TF-IDF), unimportant results tend to rank near the top.
We need a better code example search engine.
examples, in response to the API search query.
Developer: “How do I use java.io.FileReader.read?”
API-Based Code Search System: “Here are code examples for java.io.FileReader.read.”
code examples.
  • Otherwise, developers need to browse many search results.
• The resulting code examples should be simple.
  • So developers can easily understand the API usage.
• Typical or “interesting” examples should be in the top results.
  • Niche use cases tend to be uninteresting.
code examples as much as possible.
  • -> How to cluster the “duplicated” code examples?
• The resulting code examples should be simple.
  • -> How to choose the concise example from each cluster?
• Typical or “interesting” examples should be in the top results.
  • -> How to sort and rank the code examples?
Usage Patterns [Zhong et al ’09]
  • Zhong, Hao, et al. "MAPO: Mining and recommending API usage patterns." European Conference on Object-Oriented Programming. Springer, Berlin, Heidelberg, 2009.
• How Can I Use This Method? [Moreno et al ’15]
  • Moreno, Laura, et al. "How can I use this method?" Proceedings of the 37th International Conference on Software Engineering - Volume 1 (ICSE '15). IEEE Press, 2015, 880–890.
• e.g. some.pkg.class.method
• Output: API usage patterns, each described as a Method Invocation Sequence.
MAPO is an Eclipse plugin for Java.
[Screenshot: found patterns of org.eclipse.jface.action.IContributionManager.appendToGroup, each shown as a Method Invocation Sequence; clicking a pattern shows its details.]
• <i1, i3, i4> (else branch, loop executed)
• <i1, i2> (loop not executed)
• <i1, i3> (loop not executed)
If branch: retrieve all possible MISs.
Loop: execute the body once or not at all.
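The branch and loop rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple-based program representation is hypothetical (MAPO itself analyzes Java source), and the example program shape (a call `i1`, then an if/else calling `i2` or `i3`, then a loop calling `i4`) is inferred from the sequences listed on the slide.

```python
from itertools import product

def enumerate_mis(block):
    """Return every method invocation sequence (as a tuple) for a block.

    A block is a list of nodes in a hypothetical toy representation:
      ("call", name)                  a single API call
      ("if", then_block, else_block)  both branches are explored
      ("while", body_block)           the body runs once or not at all
    """
    sequences = [()]
    for node in block:
        kind = node[0]
        if kind == "call":
            options = [(node[1],)]
        elif kind == "if":
            # If branch: collect the sequences of both alternatives.
            options = enumerate_mis(node[1]) + enumerate_mis(node[2])
        elif kind == "while":
            # Loop: either skip the body or execute it exactly once.
            options = [()] + enumerate_mis(node[1])
        else:
            raise ValueError(f"unknown node kind: {kind}")
        sequences = [prefix + opt for prefix, opt in product(sequences, options)]
    # Drop duplicates while keeping a stable order.
    return list(dict.fromkeys(sequences))

# Assumed example program: i1; if (...) i2 else i3; while (...) i4
program = [
    ("call", "i1"),
    ("if", [("call", "i2")], [("call", "i3")]),
    ("while", [("call", "i4")]),
]
all_mis = enumerate_mis(program)
```

For this assumed program the function yields four sequences, one per combination of branch choice and loop choice.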
Invocation Sequence (MIS), the API Usage Miner calculates the similarity between each pair of MISs. The similarity score is the average of two heuristic scores:
• Method and class name
• Called API methods
distance between
• Class names
  • “DEditorActionContributor” vs “RubyEditorActionContributor”
• Method names
  • “contributeToMenu” vs “contributeToMenu”
Method Invocation Sequences
• s1, s2: Method Invocation Sequences.
• # of API calls: the number of API calls.
• I1, I2: the sets of API method calls in s1 and s2, respectively.
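The exact formulas did not survive on these slides, so the following is only a sketch of how two such heuristics could be averaged. It makes two loud assumptions: the name score uses Python's `difflib.SequenceMatcher` ratio as a stand-in for the name distance, and the called-API score is taken to be the Jaccard overlap of the sets I1 and I2. The dictionary layout and the call lists in the example are hypothetical.

```python
from difflib import SequenceMatcher

def name_similarity(a, b):
    """Stand-in string similarity in [0, 1] (assumption, not MAPO's formula)."""
    return SequenceMatcher(None, a, b).ratio()

def called_api_similarity(calls1, calls2):
    """Assumed Jaccard overlap of the API call sets I1 and I2."""
    i1, i2 = set(calls1), set(calls2)
    if not i1 and not i2:
        return 1.0
    return len(i1 & i2) / len(i1 | i2)

def mis_similarity(m1, m2):
    """Average of the name heuristic and the called-API heuristic."""
    name_score = (
        name_similarity(m1["class"], m2["class"])
        + name_similarity(m1["method"], m2["method"])
    ) / 2
    api_score = called_api_similarity(m1["calls"], m2["calls"])
    return (name_score + api_score) / 2

# Class and method names come from the slide; the call lists are made up.
a = {
    "class": "DEditorActionContributor",
    "method": "contributeToMenu",
    "calls": ["IContributionManager.appendToGroup", "hypothetical.Api.first"],
}
b = {
    "class": "RubyEditorActionContributor",
    "method": "contributeToMenu",
    "calls": ["IContributionManager.appendToGroup", "hypothetical.Api.second"],
}
score = mis_similarity(a, b)
```

Both heuristics are normalized to the range 0 to 1 here so that their average is also in that range, which makes a clustering threshold easy to pick.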
Graphical Editing Framework), with test cases.
• Participants: 6 (all grad students).
• 2 groups:
  • One group does the tasks with MAPO.
  • The other uses Google Code Search and Strathcona (a code recommendation system).
The latter tasks are more complicated.
in a fixed time.
• Count the number of failed tests (if the program doesn’t build, count it as 1 failed test).
Result: MAPO looks effective for complicated tasks (fewer bugs).
common usage pattern.
• Uses the JDT compiler to extract MISs.
• Clusters patterns by similarity heuristics (based on class and method names and called APIs).
• Identifies common patterns using sequential pattern mining.
• Shows examples considering the context.
Clusters, Program Slices, Pattern selection, Code Examples, Developer
[Pipeline diagram; callouts on the diagram:]
• Instead of MISs, extract program slices.
• Code clone detection.
• Use simplicity instead of sequential pattern mining.
• Code examples, not MISs.
relevant (to the interesting API call) part of the code snippet.
Example (method is the interesting method):
    x := 1;
    y := x + 1;
    foo();
    bar();
    method(x, y);
    uninteresting(y);
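The idea of keeping only the statements relevant to the interesting call can be illustrated with a simplified backward slice. This is a sketch under assumptions, not MUSE's implementation: it handles straight-line code only, follows data dependencies only, and expects each statement to be given with hand-written sets of defined and used variables (a hypothetical input format).

```python
def backward_slice(statements, target_index):
    """Return the statements the target depends on, plus the target itself.

    statements: list of (text, defs, uses), where defs and uses are sets of
    variable names. Only data dependencies in straight-line code are tracked.
    """
    needed = set(statements[target_index][2])
    keep = {target_index}
    # Walk backwards and keep each statement that defines a needed variable.
    for i in range(target_index - 1, -1, -1):
        _, defs, uses = statements[i]
        if defs & needed:
            keep.add(i)
            needed -= defs   # this definition satisfies the need
            needed |= uses   # but its own inputs are now needed
    return [statements[i][0] for i in sorted(keep)]

snippet = [
    ("x := 1;", {"x"}, set()),
    ("y := x + 1;", {"y"}, {"x"}),
    ("foo();", set(), set()),
    ("bar();", set(), set()),
    ("method(x, y);", set(), {"x", "y"}),
    ("uninteresting(y);", set(), {"y"}),
]
sliced = backward_slice(snippet, target_index=4)
```

For the slide's snippet, the slice keeps the two assignments and the call to `method`, and drops `foo()`, `bar()`, and `uninteresting(y)`.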
for clustering.
Simian, Similarity Analyzer: https://www.harukizaemon.com/simian/

    if (a >= b) {
      c = d + b; // comment1
      d = d + 1;
    } else {
      c = d - a; // comment2
    }

    if (m >= n) { // comment1
      y = x + n; // comment2
      x = x + 1;
    } else {
      y = x - m; // comment3
    }

These two snippets are not similar by simple string comparison.
allowing variation of identifiers, comments, etc. (called a type-2 clone)

    if (m >= n) {
      y = x + n; // comment1
      x = x + 1;
    } else {
      y = d - a; // comment2
    }

    if (m >= n) { // comment1
      y = x + n; // comment2
      x = x + 1;
    } else {
      y = x - m; // comment3
    }

• Ignore comments
• Rename identifiers
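The two steps named above (ignore comments, rename identifiers) can be sketched as a small normalizer. This is not how Simian works internally; it is a minimal illustration of type-2 clone matching, assuming a tiny C-like token set and a hand-picked keyword list.

```python
import re

# Assumed keyword list; a real tool would use the language's full grammar.
KEYWORDS = {"if", "else", "while", "for", "return"}
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|>=|<=|==|!=|\S")

def normalize(code):
    """Strip comments and rename identifiers by order of first appearance."""
    code = re.sub(r"//[^\n]*", "", code)               # line comments
    code = re.sub(r"/\*.*?\*/", "", code, flags=re.S)  # block comments
    mapping = {}
    out = []
    for tok in TOKEN.findall(code):
        if re.match(r"[A-Za-z_]", tok) and tok not in KEYWORDS:
            # The same identifier always maps to the same placeholder.
            tok = mapping.setdefault(tok, f"ID{len(mapping)}")
        out.append(tok)
    return " ".join(out)

snippet_a = """
if (a >= b) {
  c = d + b; // comment1
  d = d + 1;
} else {
  c = d - a; // comment2
}
"""

snippet_b = """
if (m >= n) { // comment1
  y = x + n; // comment2
  x = x + 1;
} else {
  y = x - m; // comment3
}
"""

is_type2_clone = normalize(snippet_a) == normalize(snippet_b)
```

Renaming by order of first appearance keeps the mapping consistent, so two snippets only match when their identifiers are used in the same positions throughout.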
cluster (because MUSE shows only one example for each cluster).
Pick the most readable and reusable code.
[Pipeline diagram: Code Repository, Code Analyzer, Clustering Engine, Code Clusters, Program Slices, Pattern selection, Code Examples, Developer]
al ’09]
• Supervised classification model.
• Vectorize each method by
  • Number of lines, average number of identifiers, indentation
  • # of branches, loops, assignments, etc.
[Diagram: method -> vectorize -> ML model -> score (e.g. 0.8)]
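The vectorization step could look roughly like the sketch below. It only covers feature extraction; the supervised model that turns the vector into a readability score is omitted. The regex-based counting, the keyword list, and the feature names are assumptions for illustration, not the feature set of the cited model.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")
# Assumed keyword list so that keywords are not counted as identifiers.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "new"}

def readability_features(code):
    """Turn a code snippet into a small dictionary of surface features."""
    lines = [ln for ln in code.splitlines() if ln.strip()]
    n = len(lines)
    idents = [
        len([t for t in IDENT.findall(ln) if t not in KEYWORDS]) for ln in lines
    ]
    indents = [len(ln) - len(ln.lstrip(" ")) for ln in lines]
    return {
        "lines": n,
        "avg_identifiers": sum(idents) / n if n else 0.0,
        "max_indent": max(indents, default=0),
        "branches": len(re.findall(r"\bif\b", code)),
        "loops": len(re.findall(r"\b(?:for|while)\b", code)),
        # '=' that is not part of ==, !=, <= or >=
        "assignments": len(re.findall(r"(?<![=!<>])=(?!=)", code)),
    }

sample = """
int total = 0;
for (int i = 0; i < n; i++) {
    if (xs[i] > 0) {
        total = total + xs[i];
    }
}
"""
features = readability_features(sample)
```

A trained classifier would take such a vector as input and output a score, which can then be used to pick one example per cluster.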
complete their programming tasks?
• 12 industrial developers (5 years of experience on average)
• Two kinds of tasks were assigned, 60 minutes each.
• Participants had to use designated third-party libraries.
for the given input URL.
• T2: Open a CSV file, filter rows by a specific value, then write the filtered data to the designated file.
Each task has subtasks (e.g. fetch a PDF from the internet, open a CSV file …).
Subtask completeness is evaluated by (human) code review.
on the Internet (e.g. Stack Overflow, API documentation), except MUSE-generated code examples.
T1-CE: Solve T1 using any resources available on the Internet and MUSE-generated code examples.
slices and clusters for each API call from the whole search scope.
One code snippet generates 3 API usage examples:
• FileReader.read
• StringBuffer.append
• InputStreamReader.read
In the experiment, MUSE fixes the query and search scope…
tests (of the method).
• Unit tests tend to focus on the usage of a specific method.
• They contain example inputs and outputs of the method.
• This limits the search scope and the index size.
for Code [Caitlin et al ’15]
• What is API-Based Code Search
• Previous Research:
  • MAPO [Zhong et al ’09]
  • MUSE [Moreno et al ’15]
• Future Work
et al. "Automatic source code summarization with extended Tree-LSTM." 2019 International Joint Conference on Neural Networks (IJCNN). IEEE, 2019.
• Code snippet recommendation
  • Luan, Sifei, et al. "Aroma: Code recommendation via structural code search." Proceedings of the ACM on Programming Languages 3.OOPSLA (2019): 1-28.
• Text-based code search with deep learning techniques
  • Husain, Hamel, et al. "CodeSearchNet challenge: Evaluating the state of semantic code search." arXiv preprint arXiv:1909.09436 (2019).