tasks
• SE data repositories (Qualitas Corpus, PROMISE, ...)
• Code databases (PGA, GHTorrent, GH Archive, ...)
• Software data in the wild (GitHub, Gerrit, Jira, ...)
  ◦ lack of appropriate mining tools
  ◦ proper data mining is hard
    ▪ especially applied to VCS
development, maintenance, correction, …
• One of the first research tasks to solve in SE
  ◦ a survey with 250+ methods published in 2007
• Types of models
  ◦ expert judgement
  ◦ parametric models (Use Cases, Function Points, COCOMO I/II, …)
  ◦ non-parametric models (estimation-via-analogy)
• Various learners
  ◦ linear regression, Bayesian networks, GAs, NNs, DTs, HMMs, association rules, …
• Challenges
  ◦ factors that affect effort and productivity are not well understood
  ◦ lack of decent historical and production data
  ◦ the need to adjust models to the local environment
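As an illustration of the parametric family, here is a minimal sketch of the basic COCOMO (COCOMO I) equations, Effort = a * KLOC^b person-months and Schedule = c * Effort^d months. The coefficients are the commonly cited values from Boehm's 1981 basic model; the names `COEFFICIENTS`, `Estimate` and `basic_cocomo` are hypothetical and exist only for this example. A real estimate would also need calibration to local data, as the challenges above note.

```python
from dataclasses import dataclass

# Basic COCOMO coefficients (a, b, c, d) per project class (Boehm, 1981).
COEFFICIENTS = {
    "organic": (2.4, 1.05, 2.5, 0.38),
    "semi-detached": (3.0, 1.12, 2.5, 0.35),
    "embedded": (3.6, 1.20, 2.5, 0.32),
}


@dataclass
class Estimate:
    effort_pm: float        # effort in person-months
    schedule_months: float  # development time in calendar months


def basic_cocomo(kloc: float, project_class: str = "organic") -> Estimate:
    """Estimate effort and schedule from size in thousands of lines of code."""
    a, b, c, d = COEFFICIENTS[project_class]
    effort = a * kloc ** b
    schedule = c * effort ** d
    return Estimate(effort, schedule)


if __name__ == "__main__":
    for cls in COEFFICIENTS:
        est = basic_cocomo(10, cls)
        print(f"{cls}: {est.effort_pm:.1f} PM, {est.schedule_months:.1f} months")
```

The model has a single size input, which is exactly why it needs local adjustment: everything else that drives effort is folded into four constants.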
negative and positive inputs
• Generate candidate patches
  ◦ modification of only one statement
• Rank candidates
  ◦ program value features and modification features
  ◦ probabilistic model of correct code
• Validate candidates
  ◦ test suite as an oracle
  ◦ passed test == fixed bug?
Prophet
Fan Long and Martin Rinard. Automatic Patch Generation by Learning Correct Code (POPL’16)
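The generate, rank, validate loop can be sketched on a toy example. This is not Prophet's implementation: the candidate generator only swaps one comparison operator, and the ranking uses plain textual similarity as a stand-in for the learned probabilistic model. `BUGGY_SOURCE`, `TESTS` and all function names are hypothetical.

```python
import difflib
import re

BUGGY_SOURCE = """
def is_adult(age):
    return age > 18
"""

# Test suite used as the oracle: (input, expected output).
TESTS = [(17, False), (18, True), (30, True)]

OPERATORS = ["<", "<=", ">", ">=", "==", "!="]


def generate_candidates(source):
    """Yield variants that change one comparison operator in one statement."""
    lines = source.splitlines()
    for i, line in enumerate(lines):
        for match in re.finditer(r"<=|>=|==|!=|<|>", line):
            for op in OPERATORS:
                if op != match.group():
                    patched = line[:match.start()] + op + line[match.end():]
                    yield "\n".join(lines[:i] + [patched] + lines[i + 1:])


def score(candidate, original):
    """Stand-in ranking: prefer candidates textually closest to the original."""
    return difflib.SequenceMatcher(None, candidate, original).ratio()


def passes_tests(source):
    """Run the test suite against the function defined by `source`."""
    namespace = {}
    try:
        exec(source, namespace)
        return all(namespace["is_adult"](x) == want for x, want in TESTS)
    except Exception:
        return False


def repair(source):
    """Return the highest-ranked candidate that passes all tests, or None."""
    ranked = sorted(generate_candidates(source),
                    key=lambda c: score(c, source), reverse=True)
    for candidate in ranked:
        if passes_tests(candidate):
            return candidate
    return None


if __name__ == "__main__":
    print(repair(BUGGY_SOURCE))
```

Note that the accepted patch is only as trustworthy as the test suite: a candidate that passes every test may still be wrong, which is the question the slide raises.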
and test data generation
  ◦ fault localization
  ◦ code repair
  ◦ test prioritization
  ◦ finding relevant tests
  ◦ estimation of testing efforts
  ◦ replacement of test suites
• Data
  ◦ execution traces, logs
  ◦ coverage information
  ◦ failure data: where and why
• https://testsigma.com, https://eggplant.io/, ...
method
  ◦ language-agnostic, requires no source code access
• Rewards based on duration, previous last execution time and failure history
  ◦ are either zero or positive
• Effective prioritization strategy is discovered after ~60 CI cycles
Spieker et al. Reinforcement Learning for Automatic Test Case Prioritization and Selection in Continuous Integration (ISSTA’17)
code smells in code, automatic suggestion of refactoring opportunities
• Features
  ◦ structural information of the source code (mostly software metrics)
  ◦ patterns for code smells
• Various learners
• Most of the tools are standalone applications or Eclipse plugins
• Challenges
  ◦ computational and memory complexity
  ◦ ambiguous evaluation metrics
  ◦ low agreement between different detectors
  ◦ datasets
  ◦ design patterns
object-oriented architecture and automatic recommendation of appropriate refactorings that optimize code structure
  ◦ clustering ensemble of 3 existing approaches
  ◦ path-based representations + SVM model (work in progress)
• ArchitectureReloaded plugin for IntelliJ IDEA
  ◦ https://plugins.jetbrains.com/plugin/10411-architecturereloaded
• A dataset generator and a dataset for evaluation of Move Method refactoring recommendation approaches
Bryksin et al. Automatic Recommendation of Move Method Refactorings Using Clustering Ensembles (IWoR’18)
Novozhilov et al. Evaluation of Move Method refactorings recommendation algorithms: are we doing it right? (IWoR’19)
Kurbatova et al. Recommendation of Move Method Refactoring Using Path-Based Representation of Code (IWoR’20)
• Duplicates in source code, documentation, …
  ◦ detection of duplicated knowledge
• 4 types of code clones
• All kinds of embeddings and learners involved
• Language-specific and language-agnostic algorithms
• Challenges
  ◦ computational time
  ◦ semantic clones
Chanchal Roy and James Cordy. A Survey on Software Clone Detection Research (2007)
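In the taxonomy commonly used in clone detection surveys, Type-1 clones differ only in whitespace, layout and comments, Type-2 clones additionally differ in identifiers and literals, Type-3 clones have added or removed statements, and Type-4 clones are semantically similar but syntactically different. The sketch below covers only the first two types by comparing normalized token streams of Python snippets. It is a language-specific toy built on the standard `tokenize` module, and it deliberately ignores indentation structure; `normalized_tokens` and `is_type2_clone` are hypothetical names.

```python
import io
import keyword
import tokenize

# Token kinds that carry only layout or comments and are dropped.
IGNORED = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def normalized_tokens(source):
    """Return the token stream with identifiers and literals abstracted."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in IGNORED:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            result.append("ID")    # any identifier
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            result.append("LIT")   # any numeric or string literal
        else:
            result.append(tok.string)
    return result


def is_type2_clone(a, b):
    """True if the snippets match up to layout, comments, names and literals."""
    return normalized_tokens(a) == normalized_tokens(b)


if __name__ == "__main__":
    first = "def area(w, h):\n    return w * h  # rectangle\n"
    second = "def size(width, height):\n    return width * height\n"
    third = "def area(w, h):\n    return w + h\n"
    print(is_type2_clone(first, second))
    print(is_type2_clone(first, third))
```

Semantic (Type-4) clones are out of reach for this kind of token comparison, which is why they appear among the challenges.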
the context observed in training
  ◦ full-line and snippet-based completion
• Extracting context
  ◦ rich structural features (e.g. types)
  ◦ recurring patterns in source code based on text mining techniques
  ◦ all in between
• Various learners
  ◦ mostly deep learning models
• Challenges
  ◦ performance and memory limitations
  ◦ synthetic datasets and evaluation approaches
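The "recurring patterns" end of the spectrum can be illustrated with a bigram model over code tokens: count which token follows which in a training corpus, then suggest the most frequent successors. This is a deliberately simple stand-in, not one of the deep learning models mentioned above, and the corpus and function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(token_sequences):
    """Count, for every token, how often each other token follows it."""
    counts = defaultdict(Counter)
    for tokens in token_sequences:
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def complete(counts, prev_token, k=3):
    """Suggest up to k most frequent next tokens after prev_token."""
    followers = counts.get(prev_token, Counter())
    return [token for token, _ in followers.most_common(k)]


if __name__ == "__main__":
    corpus = [
        "for i in range ( n ) :".split(),
        "for x in items :".split(),
        "for i in range ( 10 ) :".split(),
    ]
    model = train_bigrams(corpus)
    print(complete(model, "in"))
    print(complete(model, "for"))
```

A one-token context like this cannot use types or scope, which is where the richer structural features come in.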
on clustering code changes (bug fixes)
  ◦ fixes as edit scripts between ASTs of incorrect and correct submissions
• Prototype implementation for an introductory Java MOOC
  ◦ ~1M submissions for 34 tasks (5–21 LOC each)
  ◦ currently being integrated into Stepik.org
Lobanov et al. Automatic Classification of Error Types in Solutions to Programming Assignments at Online Learning Platform (AIED’19)
than 1.5 million unique files
• Several experiments with different features and anomaly detectors
• Analysis of both source code and bytecode
• Detected 30 types of code anomalies; ~60 reported unique anomalies of 10 types were used by the Kotlin compiler team
Bryksin et al. Using Large-Scale Anomaly Detection on Code to Improve Kotlin Compiler (MSR’20)
a search over some kind of space of programs
• Program repair, automatic programming
• Deductive synthesis, transformation-based synthesis
• Inductive synthesis
  ◦ input-output examples, natural language, partial programs, grammar, assertions
• Challenges
  ◦ search space
  ◦ user intent
  ◦ search technique
10. Program synthesis
Gulwani et al. Program Synthesis (Foundations and Trends in Programming Languages, Vol. 4, No. 1-2, 2017)
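Inductive synthesis from input-output examples can be sketched as a bottom-up enumerative search over a tiny expression grammar. The grammar (`x`, `1`, `2`, and the operators `+`, `-`, `*`), the pruning by observed outputs, and the name `synthesize` are all assumptions made for this example; practical synthesizers use far larger spaces and smarter search.

```python
import itertools

# Binary operators of the toy grammar: (symbol, implementation).
OPS = [
    ("+", lambda a, b: a + b),
    ("-", lambda a, b: a - b),
    ("*", lambda a, b: a * b),
]


def synthesize(examples, max_rounds=3):
    """Search for an expression over x that matches all (input, output) pairs.

    Expressions are grouped by the outputs they produce on the example
    inputs; only the first expression found for each output tuple is kept,
    which prunes programs that behave identically on the examples.
    """
    inputs = [x for x, _ in examples]
    target = tuple(y for _, y in examples)

    seen = {}  # output tuple -> expression text
    seen.setdefault(tuple(inputs), "x")
    for constant in (1, 2):
        seen.setdefault(tuple(constant for _ in inputs), str(constant))

    for _ in range(max_rounds):
        if target in seen:
            return seen[target]
        current = list(seen.items())
        # Combine every pair of known expressions with every operator.
        for (va, ta), (vb, tb) in itertools.product(current, repeat=2):
            for symbol, fn in OPS:
                values = tuple(fn(a, b) for a, b in zip(va, vb))
                seen.setdefault(values, f"({ta} {symbol} {tb})")
    return seen.get(target)


if __name__ == "__main__":
    # Examples drawn from f(x) = x * x + 1.
    print(synthesize([(0, 1), (1, 2), (2, 5), (3, 10)]))
```

The three challenges on the slide all show up even here: the search space grows quickly with each round, the examples only partially capture user intent (several expressions may fit them), and the search technique decides which fitting expression is returned.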
names
  ◦ aiming at generation of API-heavy code
• Sketches for program representation
• Bayesian encoder-decoder technique
• Combinatorial concretization
  ◦ random walk-based technique
• IntelliJ IDEA plugin
  ◦ implementations for Java STDLib and Android SDK
  ◦ https://plugins.jetbrains.com/plugin/10729-bsl-code-synthesizer
Murali et al. Neural sketch learning for conditional program generation (ICLR’17)
Vladislav Tankov and Timofey Bryksin. Data-based code synthesis in IntelliJ IDEA (SEIM’18)
code snippets
  ◦ creating documentation, suggesting better function names, commit messages, etc.
• Approaches
  ◦ rule/template-based text generation
  ◦ models adopted from the IR and NLP domains
  ◦ deep learning models from the NMT field
    ▪ code2seq (Alon et al., 2019)
• CoNaLa: The Code/Natural Language Challenge
  ◦ https://conala-corpus.github.io
• End-to-end pipeline support
• Rich visualisation tools
• Model evolution
• Debugging, testing and interpretability
• Integration of ML pipelines into production
Amershi et al. Software Engineering for Machine Learning: A Case Study (ICSE’19)
• Datasets & mining tools
• Reproducibility
• Extensibility
• Interpretability
• Evaluation metrics
• ML for the sake of ML
• Immaturity for the real world
• Gap between academia and industry
https://d30womf5coomej.cloudfront.net/sa/2c/25ac5102-aa66-4edd-887d-6babe41a20e3.png