Finding bugs
    function setTimeout(callBack, delay) {...};
    browserSingleton.startPoller(100, function(delay, fn) { setTimeout(delay, fn); });
• Name that denotes a function: callBack, fn
• Order of application: setTimeout(callBack, delay) is declared, but setTimeout(delay, fn) is called

    function setTimeout(callBack: a -> b, delay: int) {…};
    browserSingleton.startPoller(100, function(delay, fn) { setTimeout(delay, fn); });
The compiler can only help if humans add semantic information

The Naturalness hypothesis
“Software is a form of human communication; software corpora have similar statistical properties to natural language corpora; and these properties can be exploited to build better software engineering tools.”
Hindle et al. On the naturalness of software. ICSE 2012

Naturalness showcased
Hindle et al. On the naturalness of software. ICSE 2012
• Code n-grams are less “surprising” to a language model than English n-grams
• We can train language models that predict the next token better in code than in English

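"Surprise" is usually measured as cross-entropy: the average number of bits a model needs per token. The sketch below is only an illustration of that measurement, not the setup used by Hindle et al. It assumes a bigram model with add-one smoothing, and the toy token sequences are hypothetical.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Count unigrams and bigrams in a list of tokens."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def cross_entropy(tokens, unigrams, bigrams):
    """Average bits per token under an add-one smoothed bigram model."""
    vocab_size = len(unigrams) + 1  # one extra slot for unseen tokens
    pairs = list(zip(tokens, tokens[1:]))
    total_bits = 0.0
    for prev, cur in pairs:
        prob = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
        total_bits -= math.log2(prob)
    return total_bits / len(pairs)


# Hypothetical toy corpus: one repetitive code-like token stream.
train = "for i in range ( n ) : x = x + i".split() * 20
familiar = "for i in range ( n ) :".split()
shuffled = "range for : n in ) ( i".split()

unigrams, bigrams = train_bigram(train)
h_familiar = cross_entropy(familiar, unigrams, bigrams)
h_shuffled = cross_entropy(shuffled, unigrams, bigrams)
```

Lower cross-entropy means the sequence is less surprising to the model. Here the familiar token order scores far lower than the shuffled one.
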
Finding bugs
Pradel and Shen. DeepBugs: A Learning Approach to Name-based Bug Detection. OOPSLA 2018
Training models to distinguish correct from buggy code
[Figure: examples of buggy code and correct code, labeled Buggy and Correct]

Finding bugs
Pradel and Shen. DeepBugs: A Learning Approach to Name-based Bug Detection. OOPSLA 2018
How to produce buggy code?
• Swap function arguments
    foo(a, b) -> foo(b, a)
• Replace binary operators
    i <= length -> i % length
• Replace binary operand

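The first two transformations above can be sketched on Python code with the standard `ast` module. This is an assumption-laden illustration, not the DeepBugs implementation (which targets JavaScript). The class and function names (`SwapArguments`, `ReplaceOperator`, `seed_bug`, `make_training_pairs`) are hypothetical, and `ast.unparse` requires Python 3.9+.

```python
import ast


class SwapArguments(ast.NodeTransformer):
    """Swap the positional arguments of every two-argument call."""

    def visit_Call(self, node):
        self.generic_visit(node)
        if len(node.args) == 2 and not node.keywords:
            node.args = [node.args[1], node.args[0]]
        return node


class ReplaceOperator(ast.NodeTransformer):
    """Turn every single `<=` comparison into a `%` operation."""

    def visit_Compare(self, node):
        self.generic_visit(node)
        if len(node.ops) == 1 and isinstance(node.ops[0], ast.LtE):
            return ast.BinOp(left=node.left, op=ast.Mod(),
                             right=node.comparators[0])
        return node


def seed_bug(source, transformer):
    """Apply one transformation and return the mutated source text."""
    tree = transformer.visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


def make_training_pairs(source):
    """Label the original as correct and each changed variant as buggy."""
    normalized = ast.unparse(ast.parse(source))
    pairs = [(normalized, "correct")]
    for transformer in (SwapArguments(), ReplaceOperator()):
        mutated = seed_bug(source, transformer)
        if mutated != normalized:
            pairs.append((mutated, "buggy"))
    return pairs


swapped = seed_bug("foo(a, b)", SwapArguments())
replaced = seed_bug("i <= length", ReplaceOperator())
pairs = make_training_pairs("foo(a, b)")
```

Seeding bugs into existing code this way gives labeled buggy examples without manual annotation, under the assumption that most existing code is correct.
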
Predicting types
    def bigger_number(a: ???, b: ???) -> ???:
        if a > b:
            return a
        else:
            return b
Python 3.5+ code. Can you guess the types?
How can we automatically annotate JavaScript/Python code with types?

Predicting types
    def is_bigger(a: int, b: int) -> bool:
        """Returns True if a is bigger than b, else False"""
        return a > b
Learning from existing code annotations
Pipeline: embedding -> sequence learning -> concat (+) -> prediction

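Learning from existing annotations starts with collecting them. The sketch below shows one possible way to extract training records (names, docstring, annotated types) from already-annotated Python code with the standard `ast` module. The function name `extract_type_examples` and the record layout are assumptions for illustration, and the embedding and sequence-learning stages are not shown.

```python
import ast


def extract_type_examples(source):
    """Collect one record per function: name, docstring, annotated types."""
    examples = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            # Keep only parameters that carry an annotation.
            params = {
                arg.arg: ast.unparse(arg.annotation)  # Python 3.9+
                for arg in node.args.args
                if arg.annotation is not None
            }
            returns = ast.unparse(node.returns) if node.returns else None
            examples.append({
                "name": node.name,
                "docstring": ast.get_docstring(node),
                "params": params,
                "returns": returns,
            })
    return examples


SOURCE = '''
def is_bigger(a: int, b: int) -> bool:
    """Returns True if a is bigger than b, else False"""
    return a > b
'''

examples = extract_type_examples(SOURCE)
```

Each record pairs the natural-language signals (identifier names, docstring) with the types a model would be trained to predict.
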
Finding inconsistencies
Liu et al. Learning to Spot and Refactor Inconsistent Method Names. ICSE 2019
How to find inconsistent function/variable names?
Methods with similar names should have similar bodies

Finding inconsistencies
Liu et al. Learning to Spot and Refactor Inconsistent Method Names. ICSE 2019
1. Build vector embeddings of function names and function bodies
2. For each function body:
    1. Find functions close to it in vector space
    2. Check their respective name distance

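The two steps above can be sketched with toy vectors. This is not the method of Liu et al.; it is a minimal illustration that assumes the embeddings already exist, uses cosine distance, and flags a method when its name is far, on average, from the names of methods with nearby bodies. The vectors, the `radius` and `threshold` values, and the function names are all hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm


def find_inconsistent(methods, radius=0.1, threshold=0.5):
    """Flag names that disagree with the names of methods with close bodies."""
    flagged = []
    for name, (name_vec, body_vec) in methods.items():
        # Step 2.1: methods whose body vector lies within the radius.
        neighbours = [
            other for other, (_, other_body) in methods.items()
            if other != name and cosine_distance(body_vec, other_body) < radius
        ]
        if not neighbours:
            continue
        # Step 2.2: average distance between this name and the neighbours' names.
        mean_name_distance = sum(
            cosine_distance(name_vec, methods[other][0]) for other in neighbours
        ) / len(neighbours)
        if mean_name_distance > threshold:
            flagged.append(name)
    return flagged


# Hypothetical embeddings: name -> (name vector, body vector).
METHODS = {
    "add": ([1.0, 0.0], [1.0, 0.1]),
    "sum": ([0.9, 0.1], [1.0, 0.0]),
    "plus": ([0.95, 0.05], [0.9, 0.1]),
    "remove": ([0.0, 1.0], [1.0, 0.05]),  # body like "add", name unlike it
    "close": ([-1.0, 0.1], [0.0, 1.0]),
    "shutdown": ([-0.9, 0.2], [0.1, 1.0]),
}

flagged = find_inconsistent(METHODS)
```

In this toy data only `remove` is flagged: its body sits next to the addition methods, but its name vector does not.
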
Code summarization
Wan et al. Improving automatic source code summarization via deep reinforcement learning. ASE 2018
    def add(a, b): return a + b    ->  """Adds two numbers"""
    def ???(a, b): return a + b    ->  add

Code summarization
Wan et al. Improving automatic source code summarization via deep reinforcement learning. ASE 2018
[Figure labels: NL channel, Semantics channel]
• Use critic network to re-adjust model weights
• BLEU score of 0.35

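BLEU scores a generated summary by its n-gram overlap with a reference summary. The sketch below is a simplified sentence-level variant with a single reference and n-grams up to length 2, chosen here for brevity. Standard BLEU uses n-grams up to length 4 and is usually computed over a whole corpus, and this is not the evaluation script used by Wan et al. The example sentences are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=2):
    """Simplified BLEU: clipped n-gram precision times a brevity penalty."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if overlap == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    if len(candidate) >= len(reference):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(reference) / len(candidate))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return brevity * math.exp(sum(log_precisions) / max_n)


reference = "adds two numbers".split()
exact = bleu("adds two numbers".split(), reference)
partial = bleu("adds two integers".split(), reference)
unrelated = bleu("subtracts values".split(), reference)
```

A score of 1 means the candidate matches the reference exactly, and 0 means no overlap at some n-gram length.
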