
Natural Language to Code Generation: A Brief Survey

Exactpro
April 23, 2020

Natural Language to Code Generation: A Brief Survey (RU)

Pavel Braslavski
Associate Professor, Higher School of Economics
Researcher, JetBrains Research

Online Dev Meetup
23 April 2020

Video (RU): https://youtu.be/ZZJgtSyMfmU

Pavel will give a talk on generating program code based on natural language text input. Pavel will briefly survey various task settings, methods, and evaluation, as well as available data.
The talk is a must for those interested in the modern methods and applications of natural language processing.

To learn more about Exactpro, visit our website https://exactpro.com/

Follow us on
LinkedIn https://www.linkedin.com/company/exactpro-systems-llc
Twitter https://twitter.com/exactpro
Facebook https://www.facebook.com/exactpro/
Instagram https://www.instagram.com/exactpro/

Subscribe to Exactpro YouTube channel https://www.youtube.com/c/exactprosystems

Transcript

  1. About myself • Research/academia: JetBrains Research / Higher School of Economics SPb / Ural Federal University • Past industrial experience: Yandex / SKB Kontur • Recent research interests: question answering, fiction analysis, computational humor • Homepage: http://kansas.ru/pb/ 2
  2. Why NL2Code? • Applications: code generation/search, question answering, instructing a robot • Interesting NLP task • Possibly complete executable meaning representation 3
     Example: "at the chair, move forward three steps past the sofa" [from Yoav Artzi's slides]
  3. Approaches • Bottom-up processing: Words → Syntax → Meaning • End-to-end • Hand-written rules/grammars • Annotated data → machine-learned models 5
  4. Compositional Semantics
     S -> NP VP {VP.Sem(NP.Sem)}      t
     VP -> V NP {V.Sem(NP.Sem)}       <e,t>
     NP -> N {N.Sem}                  e
     V -> likes {λx,y likes(x,y)}     <e,<e,t>>
     N -> Javier {Javier}             e
     N -> pizza {pizza}               e
     [Radev] 8
  5. Semantic Parsing • Associate a semantic expression with each node
     Javier likes pizza
     N: Javier   V: λx,y likes(x,y)   N: pizza
     VP: λx likes(x, pizza)
     S: likes(Javier, pizza)
     [Radev] 9
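The bottom-up derivation on this slide can be mimicked with plain Python lambdas. The toy lexicon below is an illustrative sketch (not part of the original slides): each word carries a meaning, and parent nodes are built by function application.

```python
# Toy compositional semantics, mirroring the grammar rules above.
N_javier = "Javier"                               # N -> Javier, type e
N_pizza = "pizza"                                 # N -> pizza, type e
V_likes = lambda y: lambda x: f"likes({x},{y})"   # V -> likes, type <e,<e,t>>

vp = V_likes(N_pizza)   # VP -> V NP: λx. likes(x, pizza)
s = vp(N_javier)        # S -> NP VP: likes(Javier, pizza)
print(s)                # likes(Javier,pizza)
```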
  6. seq2tree • Li Dong and Mirella Lapata. Language to Logical Form with Neural Attention. 2016 (following slides by Dong & Lapata) 12
  13. Seq2SQL • Seq2SQL: Generating Structured Queries from Natural Language Using Reinforcement Learning. Victor Zhong, Caiming Xiong, Richard Socher. 2017 20
  14. WikiSQL dataset • 80,654 examples: question + SQL query • 24,241 tables extracted from Wikipedia • Table → SQL query → crude question based on a template → human paraphrase via crowdsourcing • https://github.com/salesforce/WikiSQL 21
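The dataset-construction pipeline above can be sketched as follows; the template wording and column names are hypothetical, not the actual WikiSQL templates.

```python
# Hypothetical sketch of WikiSQL's crude-question step: a simple SQL
# query over a sampled table column is verbalized from a fixed template,
# and the resulting question is later paraphrased by crowd workers.
def crude_question(select_col, where_col, where_val):
    sql = f"SELECT {select_col} FROM table WHERE {where_col} = '{where_val}'"
    question = f"What is the {select_col} when the {where_col} is {where_val}?"
    return sql, question

sql, q = crude_question("player", "country", "Spain")
print(q)  # What is the player when the country is Spain?
```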
  15. Results • ex – execution accuracy (the same result) • lf – logical form accuracy (the same query) 23
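The difference between the two metrics can be illustrated with `sqlite3` from the standard library; the table and queries below are invented for the example. A predicted query can differ from the gold query as a string (lf mismatch) yet return the same rows (ex match).

```python
# lf = exact query-string match; ex = same execution result.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, country TEXT)")
conn.executemany("INSERT INTO players VALUES (?, ?)",
                 [("Ann", "US"), ("Bo", "US"), ("Che", "DE")])

gold = "SELECT name FROM players WHERE country = 'US'"
pred = "SELECT name FROM players WHERE country = 'US' ORDER BY name"

lf_match = (gold == pred)                                      # False: strings differ
ex_match = set(conn.execute(gold)) == set(conn.execute(pred))  # True: same rows
print(lf_match, ex_match)
```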
  16. Coarse to Fine • Li Dong and Mirella Lapata. Coarse-to-Fine Decoding for Neural Semantic Parsing. 2018 (following slides by Dong & Lapata) 24
  17. Task ATIS
     Request: Show me flights from Seattle to Boston next Monday
     SQL query:
     (SELECT DISTINCT flight.flight_id FROM flight
      WHERE (flight.from_airport IN
              (SELECT airport_service.airport_code FROM airport_service
               WHERE airport_service.city_code IN
                 (SELECT city.city_code FROM city WHERE city.city_name = 'SEATTLE')))
        AND (flight.to_airport IN
              (SELECT airport_service.airport_code FROM airport_service
               WHERE airport_service.city_code IN
                 (SELECT city.city_code FROM city WHERE city.city_name = 'BOSTON')))
        AND (flight.flight_days IN
              (SELECT days.days_code FROM days
               WHERE days.day_name IN
                 (SELECT date_day.day_name FROM date_day
                  WHERE date_day.year = 1993 AND date_day.month_number = 2
                    AND date_day.day_number = 8))));
     Database result: 31 flights available
     [Hemphill et al. 1990; Dahl et al. 1994] 25
     Alane Suhr, ACL 2018 tutorial
     ~5,000 examples; database with 27 tables and ~160,000 entries
  18. DJANGO (2015) • Source code → pseudo-code • Py2En: 18,805; Py2Jp: 722 • Evaluation: BLEU + manual (acceptability/understanding) • https://github.com/odashi/ase15-django-dataset 26
  26. Spider (2018) • 200 DBs with multiple tables • 10,181 questions • 5,693 complex SQL queries • Tables → human questions + SQL queries → checking + paraphrasing • Several evaluation measures accounting for SQL structure • https://yale-lily.github.io/spider 35 • +SParC, CoSQL (2019)
  28. NL2Bash (2018) • Bash one-liners from the Web + NL descriptions • https://github.com/TellinaTool/nl2bash 37
  29. CoNaLa (2018) • Intent + Python code snippets from Stack Overflow • Manually annotated: 2,379 training / 500 test • Automatically mined: 598,237 intent/snippet pairs • Evaluation: BLEU • https://conala-corpus.github.io/ 38
  32. BLEU • Bilingual Evaluation Understudy • Most widely used evaluation metric in machine translation • n-gram precision of the MT output + brevity penalty 44
     BLEU = BP · exp( Σ_{n=1..N} w_n · log p_n )
     BP = min(1, output_length / reference_length)
     BLEU-4 = ( p_1 · p_2 · p_3 · p_4 )^{1/4}
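The BLEU computation on this slide can be sketched in a few lines: clipped n-gram precision against a single reference, combined by a geometric mean and scaled by the brevity penalty. Real implementations (e.g. sacrebleu, NLTK) support multiple references and add smoothing.

```python
# Minimal BLEU sketch following the formulas above (uniform weights w_n = 1/N).
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum((cand & ref).values())        # clipped n-gram matches
        total = max(sum(cand.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0                                  # no smoothing in this sketch
    bp = min(1.0, len(candidate) / len(reference))  # brevity penalty
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

A perfect match scores 1.0; a candidate sharing no n-grams with the reference scores 0.0.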