SPECIFICATIONS ▸ Long-standing ideal: describe software behavior in natural language ▸ At every level: unit tests through top-level specifications and requirements ▸ Fundamentally hard! ▸ Linguistic ambiguity, open vocabulary (per domain/codebase) ▸ Natural language is complex enough (e.g., cross-serial dependencies) to require context sensitivity ▸ Previously tackled with template-based techniques, which have limits ▸ So we throw machine learning at it, right?
IN CURRENT ML TECHNIQUES ▸ Current ML (esp. DNN) techniques: ▸ Are non-modular ▸ Difficult to extend with new vocabulary ▸ Impossible to correct mistakes in these models ▸ Want to fix an individual word’s handling? Can’t. ▸ Are opaque ▸ Current NN-based NLP explanations highlight influential words, but don’t describe why
COMBINATORY CATEGORIAL GRAMMARS ▸ Linguists have spent decades building rich formalisms for modeling compositional semantics: how individual words’ meanings combine to build sentence meaning ▸ Categorial Grammars (CGs) are one such approach ▸ Compose word meanings in lambda calculi with “truth types” ▸ Extensive treatments of the grammar and semantics of English, German, Dutch, French, Japanese, and other NLs already conducted by expert linguists ▸ Including context-sensitive grammatical phenomena!
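COMBINATORY CATEGORIAL GRAMMARS
▸ Forward application (>): from Γ ⊢ X/Y ⇒ f and Δ ⊢ Y ⇒ a, conclude Γ, Δ ⊢ X ⇒ f a
▸ Backward application (<): from Γ ⊢ Y ⇒ a and Δ ⊢ X∖Y ⇒ f, conclude Γ, Δ ⊢ X ⇒ f a
▸ Example derivation of “3 is even”:
    3 ⊢ NP ⇒ 3   (Lex)
    is ⊢ (S∖NP)/ADJ ⇒ λp. λn. p n   (Lex)
    even ⊢ ADJ ⇒ λn. n % 2 = 0   (Lex)
    is even ⊢ S∖NP ⇒ λn. n % 2 = 0   (>)
    3 is even ⊢ S ⇒ 3 % 2 = 0   (<)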
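The two application rules and the “3 is even” derivation can be sketched in a few lines of Python. This is an illustrative sketch, not the prototype's implementation: the class names (`Atom`, `Slash`, `Item`) and function names are assumptions made here, and meanings are ordinary Python values and closures, so the sentence meaning is evaluated eagerly instead of being kept as the term 3 % 2 = 0.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass(frozen=True)
class Atom:
    name: str  # atomic category such as S, NP, ADJ


@dataclass(frozen=True)
class Slash:
    result: "Cat"  # X in X/Y or X\Y
    arg: "Cat"     # Y in X/Y or X\Y
    forward: bool  # True for X/Y (argument on the right), False for X\Y


Cat = Union[Atom, Slash]


@dataclass(frozen=True)
class Item:
    cat: Cat  # grammatical category
    sem: Any  # meaning: a value or a (curried) function


def forward_apply(left: Item, right: Item) -> Item:
    """Rule >: X/Y followed by Y yields X, with meaning f a."""
    c = left.cat
    if isinstance(c, Slash) and c.forward and c.arg == right.cat:
        return Item(c.result, left.sem(right.sem))
    raise TypeError("forward application does not apply")


def backward_apply(left: Item, right: Item) -> Item:
    """Rule <: Y followed by X\\Y yields X, with meaning f a."""
    c = right.cat
    if isinstance(c, Slash) and not c.forward and c.arg == left.cat:
        return Item(c.result, right.sem(left.sem))
    raise TypeError("backward application does not apply")


S, NP, ADJ = Atom("S"), Atom("NP"), Atom("ADJ")
VP = Slash(S, NP, forward=False)  # S\NP

# Lexicon entries from the slide.
three = Item(NP, 3)
is_ = Item(Slash(VP, ADJ, forward=True), lambda p: lambda n: p(n))
even = Item(ADJ, lambda n: n % 2 == 0)

is_even = forward_apply(is_, even)         # is even : S\NP
sentence = backward_apply(three, is_even)  # 3 is even : S
print(sentence.cat == S, sentence.sem)     # True False
```

Swapping the argument order, for example `forward_apply(even, is_)`, raises `TypeError`, which is how the categories rule out ungrammatical word orders in this sketch.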
▸ Property-based testing (PBT): a kind of parameterized unit test that uses guided random generation ▸ Combinators to compose & filter random generation ▸ Combinators to express desired properties ▸ Fundamentally expresses universally quantified logical claims ▸ forall(generated_inputs(), (λx. isGood(x))) ▸ Randomly probed based on generated inputs ▸ Can be used as a truth type for categorial grammar!
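To make the `forall(generated_inputs(), (λx. isGood(x)))` shape concrete, here is a minimal standard-library sketch of a generator with a `filter` combinator and a `forall` that probes a claim on random inputs. The names `Gen` and `forall`, the integer range, and the run count are assumptions chosen for illustration; real PBT libraries add shrinking, reporting, and smarter generation.

```python
import random
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class Gen(Generic[T]):
    """A random generator that can be narrowed with filter()."""

    def __init__(self, sample: Callable[[random.Random], T]):
        self.sample = sample

    def filter(self, keep: Callable[[T], bool]) -> "Gen[T]":
        def sample(rng: random.Random) -> T:
            while True:  # rejection sampling; assumes keep() accepts often enough
                x = self.sample(rng)
                if keep(x):
                    return x

        return Gen(sample)


def forall(gen: Gen[T], claim: Callable[[T], bool],
           runs: int = 200, seed: int = 0) -> Optional[T]:
    """Probe the claim on random inputs; return a counterexample or None."""
    rng = random.Random(seed)
    for _ in range(runs):
        x = gen.sample(rng)
        if not claim(x):
            return x
    return None


integers = Gen(lambda rng: rng.randint(-1000, 1000))
evens = integers.filter(lambda n: n % 2 == 0)

# Squares of even integers are even, so no counterexample is found.
print(forall(evens, lambda n: (n * n) % 2 == 0))  # None

# "Every integer is even" is false; the counterexample depends on the seed.
print(forall(integers, lambda n: n % 2 == 0))
```

A result of `None` only means that no counterexample turned up in the sampled inputs: the universally quantified claim is probed, not proved.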
VIA CATEGORIAL GRAMMAR
    every ⊢ ((S/(S∖NP))/CN[Gen])/ADJ ⇒ λP. λgen. λclaim. forall(gen.filter(P), claim)   (Lex)
    …
    every even integer ⊢ S/(S∖NP) ⇒ λclaim. forall(genint.filter(even), claim)
    …
    is odd ⊢ S∖NP ⇒ λn. n % 2 = 1   (>)
    every even integer is odd ⊢ S ⇒ forall(genint.filter(even), odd)   (>)
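The lexical meaning of “every” is a curried function, so the derivation above can be replayed directly with Python closures. This is a sketch under assumptions: `Gen`, `forall`, and `genint` are minimal stand-ins written here (with `forall` simply returning a boolean), not the prototype's code.

```python
import random


class Gen:
    """Minimal random generator with a filter combinator (illustrative only)."""

    def __init__(self, sample):
        self.sample = sample

    def filter(self, keep):
        def sample(rng):
            while True:  # rejection sampling
                x = self.sample(rng)
                if keep(x):
                    return x

        return Gen(sample)


def forall(gen, claim, runs=200, seed=0):
    """True if the claim holds on every sampled input."""
    rng = random.Random(seed)
    return all(claim(gen.sample(rng)) for _ in range(runs))


genint = Gen(lambda rng: rng.randint(-1000, 1000))  # meaning of "integer" as CN[Gen]
even = lambda n: n % 2 == 0                         # meaning of "even" as ADJ
odd = lambda n: n % 2 == 1                          # meaning of "is odd" as S\NP

# every : ((S/(S\NP))/CN[Gen])/ADJ
every = lambda P: lambda gen: lambda claim: forall(gen.filter(P), claim)

every_even_integer = every(even)(genint)  # S/(S\NP)
print(every_even_integer(odd))            # False: the sentence is a false claim
print(every_even_integer(even))           # True
```

The sentence “every even integer is odd” parses and translates correctly; the resulting test fails because the claim itself is false, which is the behavior a property-based test should have.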
PBT UNIT TESTS ▸ Built enough of a lexicon to tackle the examples from the PBT chapter of a draft textbook ▸ Mild adjustments to simplify English slightly (remove parentheticals) ▸ 3 sentences specifying grading ▸ 4 specifying FizzBuzz ▸ Parse into the logical form supported by NLTK, then transliterate into tests for the JavaScript fast-check PBT library
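The transliteration step can be pictured as a small printer from a logical form to fast-check source text. The sketch below is hypothetical: the nested-tuple representation and the function names are assumptions made for illustration and are not NLTK's logical-form classes or the prototype's translator. The emitted JavaScript uses only `fc.assert`, `fc.property`, `fc.integer()`, and `.filter(...)` from fast-check.

```python
def render_gen(gen):
    """Render a generator description as a fast-check arbitrary expression."""
    if gen[0] == "int":
        return "fc.integer()"
    if gen[0] == "filter":
        _, inner, pred = gen
        return f"{render_gen(inner)}.filter({pred})"
    raise ValueError(f"unsupported generator: {gen[0]}")


def to_fast_check(form):
    """Render a ("forall", generator, claim) tuple as a fast-check assertion."""
    tag, gen, claim = form
    if tag != "forall":
        raise ValueError(f"unsupported logical form: {tag}")
    return f"fc.assert(fc.property({render_gen(gen)}, {claim}));"


# Assumed encoding of "every even integer is odd"; predicates are kept as
# JavaScript arrow-function text. Oddness is written as `!== 0` because
# JavaScript's % yields -1 for negative odd numbers.
form = ("forall",
        ("filter", ("int",), "n => n % 2 === 0"),
        "n => n % 2 !== 0")

print(to_fast_check(form))
# fc.assert(fc.property(fc.integer().filter(n => n % 2 === 0), n => n % 2 !== 0));
```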
PBT: RESULTS ▸ To handle 29 words, 46 lexicon entries ▸ Separate entries for words with multiple grammatical roles ▸ 3 rules specific to programs tested: “passing” (x2) and “fizzbuzz” ▸ 3 FizzBuzz tests are under-specified ▸ Translated tests are correct interpretations of English ▸ Fail because they under-specify domain ▸ Likely intentional in textbook setting, but similar to unintentional cases ▸ Clarified versions parse & translate into passing tests
TEXTBOOKS ▸ Also dug up 19 more PBTs from 3 other textbooks covering PBT ▸ All can be parsed by existing grammar-only CCG parser (depccg), so feasible to extend prototype for them ▸ One book repeats the same under-specification of FizzBuzz
▸ Found one grammatical novelty ▸ Translating “every integer is an integer” to PBT: ▸ “integer” needs to be a generator once ▸ “integer” needs to be a predicate once ▸ Our grammar splits common nouns into CN[Gen] and CN[Chk] to distinguish ▸ Suggested by existing work on linguistics of mathematical text, indexing categories by mathematical constructs
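The CN[Gen]/CN[Chk] split amounts to giving “integer” two lexicon entries with different meanings, one that produces values and one that checks them. The sketch below illustrates this for “every integer is an integer”. The dictionary layout, the adjective-free entry for “every”, and the entry for “is an” are assumptions made for this example, not entries from the prototype's lexicon.

```python
import random


def forall(sample, claim, runs=200, seed=0):
    """True if the claim holds on every randomly sampled input."""
    rng = random.Random(seed)
    return all(claim(sample(rng)) for _ in range(runs))


# Two entries for the same word, keyed by (word, category).
lexicon = {
    ("integer", "CN[Gen]"): lambda rng: rng.randint(-1000, 1000),  # generator
    ("integer", "CN[Chk]"): lambda x: isinstance(x, int),          # predicate
}

# Hypothetical entries, simplified from the slides for this example:
every = lambda gen: lambda claim: forall(gen, claim)  # (S/(S\NP))/CN[Gen]
is_a = lambda check: lambda x: check(x)               # (S\NP)/CN[Chk]

subject = every(lexicon[("integer", "CN[Gen]")])   # "every integer"
predicate = is_a(lexicon[("integer", "CN[Chk]")])  # "is an integer"
print(subject(predicate))                          # True
```

With a single category for common nouns, the parser could not tell which of the two meanings to use in each position; indexing the category makes the choice part of the grammar.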
▸ Do we have to build the lexicon manually? ▸ In principle no: lexicon inference exists, but needs adaptation for any new grammatical types ▸ Can reuse grammatical types from existing grammar DBs like CCGBank ▸ Does this actually beat ML approaches? ▸ On at least one measure, yes: we manually repaired individual word meanings during hand-crafting ▸ Can this work for non-English languages? ▸ Yes: existing CCG grammars for German, Japanese, Dutch, Hindi, and more