ADVANTAGES OF CONSTITUENCY: COMPUTATIONAL PERSPECTIVES ON SAMOAN WORD PROSODY KRISTINE M. YU UNIVERSITY OF MASSACHUSETTS AMHERST DEPARTMENT OF LINGUISTICS THE 22ND CONFERENCE ON FORMAL GRAMMAR UNIVERSITY OF TOULOUSE, JULY 23, 2017
PHONOLOGISTS ARE OFTEN EXPLICIT ABOUT [OTHER ASSUMPTIONS]. BUT WE TEND TO HELP OURSELVES TO PROSODIC DOMAINS WITHOUT FURTHER COMMENT. Kie Zuraw, 2009 2 https://www.mcgill.ca/linguistics/files/linguistics/Handout_RevisedForMcGill.pdf
RESEARCH QUESTION 3 ▸ Theoretical phonology: referring to phonological constituents can capture generalizations in phonological patterns ▸ But: ▸ There are also alternative ways to capture the same generalizations ▸ Computational descriptions of phonological patterns have revealed strong structural universals without referring to constituents at all (Heinz 2009, 2010, et seq.) Do constituents make phonological grammars more succinct?
RESEARCH QUESTION: CASE STUDY 4 Do constituents make phonological grammars for Samoan word stress more succinct? ▸ One way to start to get a grip on whether we get explanatory advantage by assuming existence of constituents: succinctness ▸ Succinctness as a consequence of constituency has not been carefully explored computationally in phonology ▸ Case study: Comparison of succinctness of four grammar fragments generating Samoan stress patterns in monomorphemic words, with and without reference to feet as constituents Similar kinds of succinctness comparisons include: Chomsky (1965); Chomsky and Halle (1968); Meyer and Fischer (1971); Hartmanis (1980); Stabler (2013); Berwick (2015)
WHY CASE STUDIES? 5 ▸ Question here is if this particular phenomenon motivates prosodic constituents ▸ cf. case studies to examine where phonological and syntactic patterns fall in the Chomsky hierarchy
WHY WORK ON SAMOAN STRESS? 6 ▸ Recent, detailed phonological analysis using prosodic constituents based on a rich set of elicited stress patterns (Zuraw, Yu, and Orfitelli 2014) ▸ Connections to other computational modeling of grammar and parsing at the syntax-phonology interface in Samoan (Yu and Stabler to appear, Yu submitted)
WHAT IS A FOOT? 8 ▸ Rhythmic unit composed of one or more syllables ▸ Organizes syllables into a higher-order unit based on regular stress patterns ▸ Examples: ▸ Trochaic foot: Strong-Weak ˈbut.ter ▸ Iambic foot: Weak-Strong ba.ˈguette ▸ Foot is dominated by higher-order prosodic units all the way up to the level of the utterance
DEFINITION OF LANGUAGE “LITTLE SAMOAN” 9 ▸ Slightly simplified description of Samoan stress in monomorphs presented in Zuraw, Yu, and Orfitelli (2014) ▸ Language of strings of light (one vowel only) and heavy (two vowels/long vowel) syllables marked for primary, secondary, or no stress (“weak”) ▸ la(ˈvaː) ‘energized’ W[L](P[H]) ▸ (ˈmanu) ‘animal’ (P[L]W[L]) ▸ L: light syllable, H: heavy syllable ▸ W: weak, P: primary stress, S: secondary stress ▸ (…): foot
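The notation above can be read mechanically. The following is a minimal Python sketch (not the talk's xfst code) that turns a pattern such as W[L](P[H]) into a list of syllables, each tagged with stress, weight, and whether it sits inside a foot. The function name `parse_word` and the tuple layout are illustrative assumptions.

```python
import re
from typing import List, Tuple

# One syllable token: a stress mark (P, S, or W) followed by a weight in brackets (L or H).
SYLLABLE = re.compile(r"([PSW])\[([LH])\]")


def parse_word(pattern: str) -> List[Tuple[str, str, bool]]:
    """Return (stress, weight, in_foot) for each syllable, left to right.

    Hypothetical helper: it only reads the slide notation, it does not
    check whether the pattern is a legal Samoan stress pattern.
    """
    syllables = []
    in_foot = False
    pos = 0
    while pos < len(pattern):
        ch = pattern[pos]
        if ch == "(":
            if in_foot:
                raise ValueError("feet cannot be nested")
            in_foot = True
            pos += 1
        elif ch == ")":
            if not in_foot:
                raise ValueError("unmatched ')'")
            in_foot = False
            pos += 1
        else:
            match = SYLLABLE.match(pattern, pos)
            if match is None:
                raise ValueError(f"unexpected text at position {pos}")
            syllables.append((match.group(1), match.group(2), in_foot))
            pos = match.end()
    if in_foot:
        raise ValueError("unclosed foot")
    return syllables


# la(ˈvaː): a weak light syllable followed by a foot holding one primary-stressed heavy syllable.
print(parse_word("W[L](P[H])"))
# (ˈmanu): one foot holding a primary-stressed light and a weak light syllable.
print(parse_word("(P[L]W[L])"))
```

Keeping foot membership as a flag on each syllable makes it easy to state both syllable-only and foot-based checks over the same parsed form.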
RESEARCH QUESTION OPERATIONALIZATION 10 Do constituents make phonological grammars for Samoan word stress more succinct? Phonological grammars: - ‘Direct’: directly regulate surface patterns - ‘Karttunen OT’: Finite state implementation of optimality theory (constraint-based) Optimality Theory (OT): Prince and Smolensky (1993/2004); Karttunen OT implementation: Karttunen (1998)
OPTIMALITY THEORY 2: (OVER)CONSTRAIN 13
(ˌtemo)ka(ˈlasi) ‘democracy’ (S[L]W[L])W[L](P[L]W[L])
(1) “All syllables must be parsed into feet”
(2) “The beginning of the word must coincide with the beginning of a foot”

                      Constraint 1    Constraint 2
(ˌtemo)ka(ˈlasi)      *
te(ˌmoka)(ˈlasi)      *               *

▸ Both candidates violate at least one of the ordered constraints
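The violation marks in the tableau can be recomputed from the bracketed patterns. This is a small Python sketch, assuming that Constraint 1 assigns one mark per unfooted syllable and that Constraint 2 is categorical (at most one mark). The function names and the encoding of the losing candidate as W[L](S[L]W[L])(P[L]W[L]) are illustrative assumptions, not taken from the talk's code.

```python
import re

SYLLABLE = re.compile(r"[PSW]\[[LH]\]")   # one syllable, e.g. W[L]
FOOT = re.compile(r"\([^()]*\)")          # one foot with everything inside it


def unparsed_syllables(pattern: str) -> int:
    """Constraint 1 (assumed counting): one violation per syllable outside every foot."""
    outside_feet = FOOT.sub("", pattern)  # delete all footed material
    return len(SYLLABLE.findall(outside_feet))


def misaligned_left_edge(pattern: str) -> int:
    """Constraint 2 (assumed categorical): one violation if the word does not start with a foot."""
    return 0 if pattern.startswith("(") else 1


def violation_vector(pattern: str) -> tuple:
    """Violations in ranking order: (Constraint 1, Constraint 2)."""
    return (unparsed_syllables(pattern), misaligned_left_edge(pattern))


candidates = {
    "(ˌtemo)ka(ˈlasi)": "(S[L]W[L])W[L](P[L]W[L])",
    "te(ˌmoka)(ˈlasi)": "W[L](S[L]W[L])(P[L]W[L])",  # assumed encoding of the second candidate
}
for form, pattern in candidates.items():
    print(form, violation_vector(pattern))
```

Under these assumptions the first candidate gets one mark (for unfooted ka) and the second gets two (for unfooted te and for the misaligned left edge), matching the stars in the tableau.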
OPTIMALITY THEORY 3: EVALUATE 14

                      Constraint 1    Constraint 2
(ˌtemo)ka(ˈlasi)      *
te(ˌmoka)(ˈlasi)      *               *

EVALUATE: (ˌtemo)ka(ˈlasi) wins. Winner has fewer violations of higher-ranked constraints
▸ EVAL is not a finite state process! Number of states required for EVAL cannot be bounded (Eisner 1997, Karttunen 1998).
▸ Karttunen OT: finite state implementation of OT, maps underlying forms directly to surface forms rather than violation vectors, no EVAL. Equivalent expressive power if finite bound on number of violations.
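Classical EVAL amounts to a lexicographic comparison of violation vectors. A minimal Python sketch, assuming Constraint 1 outranks Constraint 2 (the column order of the tableau) and taking the violation counts straight from the tableau; the function name `eval_winners` is hypothetical.

```python
from typing import Dict, List, Tuple


def eval_winners(tableau: Dict[str, Tuple[int, ...]]) -> List[str]:
    """Return the candidates whose violation vector is lexicographically smallest.

    Each vector lists violations from the highest-ranked constraint to the
    lowest, so Python's built-in tuple comparison implements strict ranking.
    """
    best = min(tableau.values())
    return [candidate for candidate, marks in tableau.items() if marks == best]


tableau = {
    "(ˌtemo)ka(ˈlasi)": (1, 0),   # one mark for Constraint 1
    "te(ˌmoka)(ˈlasi)": (1, 1),   # one mark for each constraint
}
print(eval_winners(tableau))
```

The counts in the vectors are unbounded integers, which is the part that a finite state machine cannot represent in general; this sketch sidesteps the issue by working over an explicit, finite candidate list.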
2 X 2 COMPARISON OF GRAMMARS 16

                Syllables only                  Syllables and feet
Direct          Direct, syllables only          Direct, syll. and feet
Karttunen OT    Karttunen OT, syllables only    Karttunen OT, syll. and feet
DEFINING THE GRAMMARS IN xfst 17 ▸ We define the grammars in xfst, a language designed by linguists to make it natural to state morphophonological grammars ▸ Includes pre-defined operators and the capacity to define our own operators and units, which lets us write grammars at a very high level ▸ Compiles our high-level grammars to finite state transducers ▸ Provides a common formalism in which we can define all four grammars and measure grammar size in a controlled comparison (Beesley and Karttunen, 2003), https://web.stanford.edu/~laurik/fsmbook/home.html
DEFINING SUCCINCTNESS 19 Succinctness: the size of the grammar, i.e. the number of symbols it takes to write it down in xfst ▸ Special case of minimum description length (MDL) relativized to xfst notation ▸ Common xfst formalism for expressing the grammars ▸ Data same across comparisons: stress patterns up to 5 syllables ▸ I limit testing of empirical coverage to monomorphs of up to 5 syllables because there are no longer monomorphic words ▸ All grammars admit exactly the same set of stress patterns up to 5 syllables ▸ Size of encodings of sequences up to 5 syllables is the same, since the possibilities allowed by the grammars in that range are identical
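The claim that all grammars admit the same patterns up to 5 syllables can be checked by brute force, since the space is small. The Python sketch below shows the shape of such a check; the two acceptors are toy stand-ins (both simply require exactly one primary stress) and are not the grammars from the talk. All names here are hypothetical.

```python
import re
from itertools import product

# Six syllable types: stress (P, S, W) times weight (L, H), e.g. "PL" = primary-stressed light.
SYLLABLES = [stress + weight for stress in "PSW" for weight in "LH"]


def all_words(max_len: int = 5):
    """Yield every syllable sequence of length 1..max_len as a tuple."""
    for length in range(1, max_len + 1):
        yield from product(SYLLABLES, repeat=length)


def disagreements(accept_a, accept_b, max_len: int = 5):
    """Return the words on which two acceptors give different verdicts."""
    return [w for w in all_words(max_len) if accept_a(w) != accept_b(w)]


# Toy acceptor 1: count primary stresses directly.
def one_primary_counting(word) -> bool:
    return sum(syll[0] == "P" for syll in word) == 1


# Toy acceptor 2: the same language stated as a regular expression.
ONE_PRIMARY = re.compile(r"(?:[SW][LH])*P[LH](?:[SW][LH])*")


def one_primary_regex(word) -> bool:
    return ONE_PRIMARY.fullmatch("".join(word)) is not None


print(sum(1 for _ in all_words()))                                  # size of the search space
print(len(disagreements(one_primary_counting, one_primary_regex)))  # 0 means same language up to 5
```

An empty list of disagreements is what licenses comparing grammar sizes alone: if the grammars differed in coverage, the cost of encoding the data would differ too.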
OPERATIONALIZED RESEARCH QUESTION 20 Do constituents make phonological grammars for Samoan word stress more succinct? Does reference to feet reduce the number of symbols used in xfst, in defining direct approach and Karttunen OT grammars for stress patterns in Samoan monomorphs?
IMPLEMENTATION: ENFORCE SURFACE RESTRICTIONS 26
Step 2: Enforce surface restrictions
“The end of a word must coincide with the end of a primary-stressed foot.” (Edgemost-R)
define EdgemostR [ \P* PrimaryFoot ];
“A word may not contain a lapse.”
define No1Lapse ~[$[W2]];
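For readers unfamiliar with xfst operators, the two definitions can be approximated with ordinary string matching. In xfst, `\P` matches any single symbol other than P, `$X` matches any string containing X, and `~` is complement. The Python sketch below mirrors that logic over the bracketed notation. The slide does not show the definitions of PrimaryFoot or W2, so `PRIMARY_FOOT` and `HYPOTHETICAL_LAPSE` are stand-in assumptions.

```python
import re

# Assumption: a primary foot is a light-light trochee or a single heavy syllable,
# as in the examples (P[L]W[L]) and (P[H]). The real PrimaryFoot may differ.
PRIMARY_FOOT = r"(?:\(P\[L\]W\[L\]\)|\(P\[H\]\))"

# Analogue of [ \P* PrimaryFoot ]: nothing marked P before a word-final primary foot.
EDGEMOST_R = re.compile(r"[^P]*" + PRIMARY_FOOT)


def edgemost_r(pattern: str) -> bool:
    """True if the word ends in a primary foot and has no earlier primary stress."""
    return EDGEMOST_R.fullmatch(pattern) is not None


def not_containing(banned: str):
    """Analogue of ~[$X]: accept exactly the strings with no occurrence of X."""
    return lambda pattern: banned not in pattern


# Hypothetical stand-in for W2, whose definition is not given on the slide.
HYPOTHETICAL_LAPSE = "W[L]W[L]"
no_lapse = not_containing(HYPOTHETICAL_LAPSE)

print(edgemost_r("(S[L]W[L])W[L](P[L]W[L])"))  # primary foot is word-final
print(edgemost_r("(P[L]W[L])W[L]"))            # primary foot is not word-final
print(no_lapse("W[L]W[L](P[L]W[L])"))          # contains the stand-in lapse
```

Stating each restriction as a separate predicate parallels the slide's approach, where each `define` names one surface restriction that is later combined with the others.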
RESULTS: SUCCINCTNESS 27

Grammar size (symbols)    Syllables only    Syllables and feet
Direct                    145               141
Karttunen OT              >1000             306

‣ Direct accounts: feet don’t increase succinctness
‣ OT accounts: feet do increase succinctness
‣ OT syllable grammar shows blowup!
KARTTUNEN OT, SYLLABLES ONLY: ALIGN-X1-L FAMILY 29 Requires counting, doing arithmetic. Symbol count blows up to approximate infinitely many degrees of well-formedness!
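The source of the blowup can be illustrated outside xfst. In Karttunen's (1998) approach, a constraint that can be violated many times is approximated by a series of filters ("at most n violations", ..., "at most 0 violations"), each applied leniently: a filter is used only if it leaves some candidate standing. The Python sketch below imitates this over an explicit candidate list. The counter `unfooted_prefix_length` is a hypothetical stand-in for an alignment constraint, not the Align-X1-L definition from the talk.

```python
from typing import Callable, List


def lenient(candidates: List[str], keep: Callable[[str], bool]) -> List[str]:
    """Apply a filter only if at least one candidate survives it."""
    survivors = [c for c in candidates if keep(c)]
    return survivors if survivors else list(candidates)


def bounded_gradient(candidates: List[str],
                     count: Callable[[str], int],
                     max_bound: int) -> List[str]:
    """Approximate 'fewest violations wins' with filters for bounds max_bound..0.

    One filter is needed per distinguishable violation count, so the grammar
    grows with max_bound.
    """
    for k in range(max_bound, -1, -1):
        candidates = lenient(candidates, lambda c, k=k: count(c) <= k)
    return candidates


def unfooted_prefix_length(pattern: str) -> int:
    """Hypothetical alignment counter: syllables before the first foot."""
    return pattern.split("(")[0].count("[")


candidates = ["W[L](P[L]W[L])", "W[L]W[L](P[L]W[L])"]
print(bounded_gradient(candidates, unfooted_prefix_length, max_bound=2))  # bound is high enough
print(bounded_gradient(candidates, unfooted_prefix_length, max_bound=0))  # bound is too low
```

With a bound of 2 the filters single out the candidate with one violation; with a bound of 0 no filter applies and both candidates survive. Telling more degrees of violation apart therefore requires writing more filters, which is one way to see why the symbol count grows.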
DISCUSSION 30 ▸ Surprisingly, except for Karttunen syllable OT account, size of grammars in high-level xfst definition very similar. ▸ Direct foot: 141 symbols (31 states, 36 arcs) ▸ Direct syllable: 145 symbols (27 states, 34 arcs) ▸ Karttunen OT foot: 306 symbols (418 states, 535 arcs) ▸ Karttunen OT syllable: >1000 symbols (3460 states, 4680 arcs) ▸ Within the Karttunen OT formalism, reference to feet does make the grammar more succinct ▸ But within “direct” approach, reference to feet does not make grammar more succinct
DISCUSSION 32 ▸ Introduction of feet doesn’t make direct grammars more succinct. So why might we still want to use feet? ▸ Clearly more structure in the grammar referring to feet ▸ Without feet, require case-by-case stipulations in grammar ▸ Direct accounts more succinct than Karttunen OT accounts. So why use OT? ▸ Certain stress patterns are very difficult to describe under the usual constraints; the OT framework captures universals that phonologists have noticed ▸ Any regular stress pattern could be captured in the direct approach (unless we introduced further restrictions)
CONCLUSION 33 Do constituents make phonological grammars for Samoan word stress more succinct? ▸ Answer: not necessarily! Depends on grammar formalism. ▸ Here, preliminary “proof-of-concept” exploration ▸ Not clear that counting symbols is the right way to assess how well a grammar captures generalizations ▸ Not clear that results here would generalize to other phonological phenomena ▸ Shows a way we can study concrete, specific linguistic proposals and engage closely with linguistic data and practice while maintaining a rigorous approach
FOR MORE INFORMATION… 34 ▸ Paper and code ▸ https://github.com/krismyu/smo-constituency-feet ▸ Related work at my academic website ▸ www.krisyu.org Thank you very much!