
Learning Structural Edits via Incremental Tree Transformations

Breandan Considine
September 28, 2021

Transcript

  1. The computational challenges of learning programs

     - Programs are graphs, but even matching in graph grammars is NP-hard, and termination of graph rewriting is Turing-complete.
  2. - Source code is syntactically tree-like (i.e., describable by a context-free grammar).
  3. - But whole-tree synthesis is still extremely hard: the number of possible ASTs grows super-exponentially with depth, ~O(Ack(n, m)).
  4. - Even counting shallow trees requires galactic computation.
  5. Counting the number of binary trees up to height n

     All trees: S_{n+1} = (S_n + 2)^2 - 1. Distinct trees: T_n = S_n - S_{n-1}.

     n | S_n          | T_n
     1 | 3            | 3
     2 | 24           | 21
     3 | 675          | 651
     4 | 458328       | 457653
     5 | 210066388899 | 210065930571
  6. - Much more computationally feasible to learn incremental transformations, but how do we represent incremental edits?
  7. Previous work in incremental program learning

     - Tarlow et al. (2019), Dinella et al. (2020), and Brody et al. (2020) explore incremental editing of code sequences and tree-structured data.
     - Unlike these and other prior works, e.g., Yin et al. (2019), Panthaplackel et al. (2020b), and Hoang et al. (2020), Yao et al. (2021):
       - study intent, i.e., edits conditioned on a learned or provided specification
       - use a language-agnostic, type-safe DSL based on Wang et al. (1997)
       - perform all transformations entirely in the tree domain
       - allow edit trajectories to modify any part of the tree at any time
       - support new flexible operators such as subtree copying
       - propose an imitation learning (IL)-based training algorithm with a dynamic oracle
  8. How do we generate the gold edit sequences? A dynamic oracle! See Goldberg and Nivre (2012).
  9. References and Related Work

     - Wang et al. The Zephyr Abstract Syntax Description Language, 1997.
     - Yin et al. Learning to represent edits, 2018.
     - Panthaplackel et al. Learning to update natural language comments based on code changes, 2020.
     - Tarlow et al. Learning to fix build errors with graph2diff neural networks, 2019.
     - Dinella et al. Hoppity: Learning graph transformations to detect and fix bugs in programs, 2020.
     - Hoang et al. CC2Vec: Distributed representations of code changes, 2020.
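The tree-counting recurrence on slide 5 is easy to check mechanically. A short sketch (the function name is mine, not from the talk) that iterates S_{n+1} = (S_n + 2)^2 - 1 from S_1 = 3 and derives T_n = S_n - S_{n-1}; note the recurrence yields S_4 = 458328, which is the value consistent with T_4 = 457653:

```python
# Sketch verifying the recurrence from slide 5:
#   S_{n+1} = (S_n + 2)^2 - 1  counts all binary trees up to height n,
#   T_n = S_n - S_{n-1}        counts distinct trees of height exactly n.

def count_trees(n_max):
    """Return S[1..n_max] and T[1..n_max]; S_0 = 0 so T_1 = S_1."""
    S = [0, 3]  # S_0 = 0, S_1 = 3
    for _ in range(n_max - 1):
        S.append((S[-1] + 2) ** 2 - 1)
    T = [S[n] - S[n - 1] for n in range(1, n_max + 1)]
    return S[1:], T

S, T = count_trees(5)
print(S)  # [3, 24, 675, 458328, 210066388899]
print(T)  # [3, 21, 651, 457653, 210065930571]
```

The doubly-exponential growth of S_n is what makes whole-tree synthesis infeasible even at shallow depths, motivating the incremental approach.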
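Slide 6 asks how incremental edits might be represented. A minimal sketch, with all names hypothetical (not the DSL of Yao et al., 2021), of the general idea: each edit rewrites one node addressed by a path instead of regenerating the whole AST, and a copy operator reuses an existing subtree verbatim:

```python
# Minimal sketch of incremental tree edits (names hypothetical): each
# edit touches one location, and "copy" splices in an existing subtree.
from dataclasses import dataclass, field
from copy import deepcopy

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)

def apply_edit(root, path, op, arg=None):
    """Apply one edit at `path` (a list of child indices):
    'delete' removes the child there, 'add' inserts a new leaf labeled
    `arg`, 'copy' inserts a deep copy of the subtree `arg`."""
    node = root
    for i in path[:-1]:
        node = node.children[i]
    i = path[-1]
    if op == "delete":
        node.children.pop(i)
    elif op == "add":
        node.children.insert(i, Node(arg))
    elif op == "copy":
        node.children.insert(i, deepcopy(arg))
    return root

# Usage: build "x + x" by copying the existing 'x' subtree into slot 1.
tree = Node("+", [Node("x")])
tree = apply_edit(tree, [1], "copy", tree.children[0])
print([c.label for c in tree.children])  # ['x', 'x']
```

An edit trajectory is then just a sequence of such (path, op, arg) triples, which is the object a dynamic oracle (slide 8) must supply gold sequences for during training.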