
Natural Language to Code Generation: A Brief Survey

Pavel Braslavski
Associate Professor, Higher School of Economics
Researcher, JetBrains Research

Online Dev Meetup
23 April 2020

Video (RU): https://youtu.be/ZZJgtSyMfmU

Pavel will give a talk on generating program code from natural-language input. He will briefly survey various task settings, methods, and evaluation approaches, as well as the available datasets.
The talk is a must-see for anyone interested in modern methods and applications of natural language processing.

To learn more about Exactpro, visit our website https://exactpro.com/


Transcript

  1. Natural Language to Code:
    Brief Overview
    Pavel Braslavski
    23.04.2020


  2. About myself
    • Research/academia: JetBrains Research / Higher School of Economics (St. Petersburg) / Ural Federal University
    • Past industrial experience: Yandex/SKB Kontur
    • Recent research interests: question answering, fiction analysis,
    computational humor
    Homepage: http://kansas.ru/pb/

  3. Why NL2Code?
    • Applications
      • Code generation/search
      • Question answering
      • Instructing a robot
    • Interesting NLP task
      • Possibly a complete, executable meaning representation
    at the chair, move forward three steps past the sofa
    [from Yoav Artzi’s slides]


  4. SHRDLU by Terry Winograd (1968)
    https://www.youtube.com/watch?v=bo4RvYJYOzI


  5. Approaches
    • Bottom-up processing: Words → Syntax → Meaning
    • End-to-end
    • Hand-written rules/grammars
    • Annotated data → machine-learned models

  6. GeoQuery (Zelle and Mooney 1996)
    [Dragomir Radev]


  7. Compositional Semantics
    S  -> NP VP  {VP.Sem(NP.Sem)}   t
    VP -> V NP   {V.Sem(NP.Sem)}
    NP -> N      {N.Sem}            e
    V  -> likes  {λx,y likes(x,y)}
    N  -> Javier {Javier}           e
    N  -> pizza  {pizza}            e
    [Radev]


  8. Semantic Parsing
    • Associate a semantic expression with each node:

      S: likes(Javier, pizza)
        N: Javier                ("Javier")
        VP: λx likes(x, pizza)
          V: λx,y likes(x,y)     ("likes")
          N: pizza               ("pizza")

    [Radev]
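    The composition above can be sketched with Python lambdas (my own toy illustration, not code from the slides):

    ```python
    # Toy compositional semantics for "Javier likes pizza".
    # Lexical entries: names/nouns denote entities; the verb is a curried
    # two-place predicate that takes the object first, then the subject.
    javier = "Javier"
    pizza = "pizza"
    likes = lambda obj: lambda subj: f"likes({subj}, {obj})"

    # Rule VP -> V NP: apply the verb's semantics to the object NP.
    vp = likes(pizza)        # λx. likes(x, pizza)

    # Rule S -> NP VP: apply the VP's semantics to the subject NP.
    s = vp(javier)
    print(s)                 # likes(Javier, pizza)
    ```

    Each grammar rule contributes one function application, so the sentence's meaning falls out of the lexicon plus the parse.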


  9. Zettlemoyer and Collins (2005)
    [Dragomir Radev]


  10. Zettlemoyer and Collins (2005)
    [Dragomir Radev]


  11. seq2tree
    Li Dong and Mirella Lapata
    Language to Logical Form with Neural Attention
    2016
    (following slides by Dong & Lapata)
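    Roughly, seq2tree decodes a logical form hierarchically: the decoder emits a special nonterminal token, and each nonterminal is later expanded into its own token sequence conditioned on its parent. A structural toy sketch (the lookup table stands in for the neural decoder, and the logical form is made up):

    ```python
    # Toy hierarchical decoding in the spirit of seq2tree: a nonterminal
    # token <n> marks a subtree that a separate decoding step expands.
    NT = "<n>"

    # Stand-in for the learned decoder: token sequence emitted per subtree.
    fake_decoder = {
        0: ["answer", NT],
        1: ["largest", "$0", NT],
        2: ["state", "$0"],
    }

    def decode(seq_ids):
        """Expand the next subtree id, recursing on each <n> token."""
        sid = next(seq_ids)
        return [decode(seq_ids) if tok == NT else tok
                for tok in fake_decoder[sid]]

    tree = decode(iter([0, 1, 2]))
    # → ['answer', ['largest', '$0', ['state', '$0']]]
    ```

    The point of the hierarchy is that the generated output is a tree by construction, so well-formedness of the logical form does not depend on the decoder balancing brackets in a flat sequence.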

  12–17. [Figure slides from Dong & Lapata (2016)]

  18. Seq2SQL
    Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning
    Victor Zhong, Caiming Xiong, Richard Socher
    2017

  19. WikiSQL dataset
    • 80,654 examples: question + SQL query
    • 24,241 tables extracted from Wikipedia
    • Table → SQL query → crude question based on a template → human
    paraphrase via crowdsourcing
    • https://github.com/salesforce/WikiSQL
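    The "crude question based on a template" step could look like this (a hypothetical sketch; the template and field names are my own, not the dataset's):

    ```python
    # Hypothetical sketch of WikiSQL-style question templating: a sampled
    # single-table SQL query is verbalized crudely, and crowd workers
    # then paraphrase the result into a natural question.
    def crude_question(sel_col, cond_col, cond_val):
        return f"What is {sel_col} when {cond_col} is {cond_val}?"

    q = crude_question("Player", "Nationality", "Sweden")
    # → "What is Player when Nationality is Sweden?"
    ```

    The paraphrase stage is what turns such stilted templated strings into the natural-sounding questions in the released dataset.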

  20. Architecture
    [Zhong, Xiong, Socher, 2017]
    mixed objective function:
    L = L_agg + L_sel + L_whe
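    A minimal sketch of the mixed objective, assuming cross-entropy terms for the aggregation and SELECT-column predictions and a REINFORCE-style term for the WHERE clause (toy numbers; the real losses are differentiable tensors):

    ```python
    import math

    # Sketch of Seq2SQL's mixed objective L = L_agg + L_sel + L_whe.
    def cross_entropy(p_correct):
        return -math.log(p_correct)

    def reinforce_term(log_prob_query, reward):
        # Reward comes from executing the generated WHERE clause:
        # positive if the result matches the ground truth, negative
        # otherwise; the loss scales the log-probability by -reward.
        return -reward * log_prob_query

    l_agg = cross_entropy(0.9)                       # aggregation classifier
    l_sel = cross_entropy(0.8)                       # SELECT-column pointer
    l_whe = reinforce_term(math.log(0.5), reward=1)  # WHERE-clause decoder
    loss = l_agg + l_sel + l_whe                     # mixed objective
    ```

    Summing the three terms lets one backward pass train all three sub-tasks jointly.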


  21. Results
    ex – execution accuracy (the generated query returns the same execution result)
    lf – logical form accuracy (the generated query exactly matches the reference query)
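    A toy illustration of the difference between the two metrics, using an in-memory SQLite table (my own example, not from the paper):

    ```python
    import sqlite3

    # Logical-form accuracy compares query strings; execution accuracy
    # compares the results of actually running the queries.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE t (name TEXT, age INT)")
    db.executemany("INSERT INTO t VALUES (?, ?)", [("ann", 30), ("bob", 40)])

    gold = "SELECT name FROM t WHERE age > 35"
    pred = "SELECT name FROM t WHERE age >= 36"  # different string, same rows

    lf_match = (pred == gold)                               # logical form: no
    ex_match = (db.execute(pred).fetchall()
                == db.execute(gold).fetchall())             # execution: yes
    ```

    Since semantically equivalent queries can differ as strings, execution accuracy is typically higher than logical-form accuracy.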


  22. Coarse to Fine
    Li Dong and Mirella Lapata
    Coarse-to-Fine Decoding for Neural Semantic Parsing
    2018
    (following slides by Dong & Lapata)

  23. Task
    ATIS
    Request:
    Show me flights from Seattle to Boston next Monday
    SQL query:
    (SELECT DISTINCT flight.flight_id FROM flight WHERE (flight.from_airport IN (SELECT
    airport_service.airport_code FROM airport_service WHERE airport_service.city_code IN (SELECT
    city.city_code FROM city WHERE city.city_name = 'SEATTLE'))) AND (flight.to_airport IN (SELECT
    airport_service.airport_code FROM airport_service WHERE airport_service.city_code IN (SELECT
    city.city_code FROM city WHERE city.city_name = 'BOSTON'))) AND (flight.flight_days IN (SELECT
    days.days_code FROM days WHERE days.day_name IN (SELECT date_day.day_name FROM date_day
    WHERE date_day.year = 1993 AND date_day.month_number = 2 AND date_day.day_number = 8))));
    Database Result:
    31 flights available
    [Hemphill et al. 1990; Dahl et al. 1994]
    Alane Suhr, ACL2018 tutorial
    ~5,000 examples;
    database with 27 tables and ~160,000 entries


  24. DJANGO (2015)
    • Source code → pseudo-code
    • Py2En: 18,805 pairs; Py2Jp: 722 pairs
    • Evaluation: BLEU + manual (acceptability/understanding)
    • https://github.com/odashi/ase15-django-dataset

  25–31. [figure slides]

  32. [Figure from the NL2Bash paper]

  33. Spider (2018)
    • 200 DBs with multiple tables
    • 10,181 questions
    • 5,693 complex SQL queries
    • Tables → human questions + SQL queries → checking + paraphrasing
    • Several evaluation measures accounting for SQL structure
    • https://yale-lily.github.io/spider
    +SParC, CoSQL (2019)


  34. [figure slide]

  35. NL2Bash (2018)
    • Bash one-liners from the Web + NL descriptions
    • https://github.com/TellinaTool/nl2bash

  36. CoNaLa (2018)
    Intent + python code snippets from StackOverflow
    Manually annotated: 2,379 training/500 test
    Automatically mined: 598,237 intent/snippet pairs
    Evaluation: BLEU
    https://conala-corpus.github.io/

  37. StaQC (2018)
    • Stack Overflow Question-Code pairs
    https://github.com/LittleYUYU/StackOverflow-Question-Code-Dataset

  38–39. [figure slides]
  40. CodeSearchNet benchmark: https://app.wandb.ai/github/codesearchnet/benchmark

  41. Questions?

  42. BLEU
    • Bilingual Evaluation Understudy
    • The most widely used evaluation metric in machine translation
    • n-gram precision of the MT output + brevity penalty
    BLEU = BP · exp( ∑_{n=1}^{N} w_n · log p_n )
    BP = min(1, output_length / reference_length)
    BLEU-4 = ( ∏_{n=1}^{4} p_n )^{1/4}
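    The formula can be sketched as follows: a minimal single-reference implementation with no smoothing, using the simplified brevity penalty from the slide rather than the exponential penalty of the original paper.

    ```python
    import math
    from collections import Counter

    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    def bleu(hyp, ref, max_n=4):
        precisions = []
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            overlap = sum(min(c, r[g]) for g, c in h.items())  # clipped counts
            precisions.append(overlap / max(sum(h.values()), 1))
        if min(precisions) == 0:           # zero n-gram overlap => BLEU 0
            return 0.0
        bp = min(1.0, len(hyp) / len(ref))                 # brevity penalty
        return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

    ref = "show flights from seattle to boston".split()
    print(bleu(ref, ref))    # identical output scores 1.0
    ```

    Note the unsmoothed geometric mean: any n-gram order with zero overlap drives the whole score to zero, which is one reason BLEU is a blunt instrument for short code snippets.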