Advantages of constituency: computational perspectives on Samoan word prosody

krisyu
July 27, 2017

Refereed talk presented at the 22nd Conference on Formal Grammar on July 23, 2017 at the University of Toulouse. http://fg.phil.hhu.de/2017/ Code available at: https://github.com/krismyu/smo-constituency-feet

Transcript

  1. ADVANTAGES OF CONSTITUENCY: COMPUTATIONAL PERSPECTIVES ON SAMOAN WORD PROSODY

    KRISTINE M. YU, UNIVERSITY OF MASSACHUSETTS AMHERST, DEPARTMENT OF LINGUISTICS
    THE 22ND CONFERENCE ON FORMAL GRAMMAR, UNIVERSITY OF TOULOUSE, JULY 23, 2017
  2. PHONOLOGISTS ARE OFTEN EXPLICIT ABOUT [OTHER ASSUMPTIONS]. BUT WE TEND TO HELP OURSELVES TO PROSODIC DOMAINS WITHOUT FURTHER COMMENT.

    Kie Zuraw, 2009
    https://www.mcgill.ca/linguistics/files/linguistics/Handout_RevisedForMcGill.pdf
  3. RESEARCH QUESTION

    Do constituents make phonological grammars more succinct?
    ▸ Theoretical phonology: referring to phonological constituents can capture generalizations in phonological patterns
    ▸ But:
      ▸ Also alternative ways to capture the same generalizations
      ▸ Computational descriptions of phonological patterns have revealed strong structural universals without referring to constituents at all (Heinz 2009, 2010, et seq.)
  4. RESEARCH QUESTION: CASE STUDY

    Do constituents make phonological grammars for Samoan word stress more succinct?
    ▸ One way to start to get a grip on whether we get explanatory advantage by assuming existence of constituents: succinctness
    ▸ Succinctness as a consequence of constituency has not been carefully explored computationally in phonology
    ▸ Case study: Comparison of succinctness of four grammar fragments generating Samoan stress patterns in monomorphemic words, with and without reference to feet as constituents
    Similar kinds of succinctness comparisons include: Chomsky (1965); Chomsky and Halle (1968); Meyer and Fischer (1971); Hartmanis (1980); Stabler (2013); Berwick (2015)
  5. WHY CASE STUDIES?

    ▸ Question here is if this particular phenomenon motivates prosodic constituents
    ▸ cf. case studies to examine where phonological and syntactic patterns fall in the Chomsky hierarchy
  6. WHY WORK ON SAMOAN STRESS?

    ▸ Recent, detailed phonological analysis using prosodic constituents based on a rich set of elicited stress patterns (Zuraw, Yu, and Orfitelli 2014)
    ▸ Connections to other computational modeling of grammar and parsing at the syntax-phonology interface in Samoan (Yu and Stabler to appear, Yu submitted)
  7. RESEARCH QUESTION OPERATIONALIZATION

    Do constituents make phonological grammars for Samoan word stress more succinct?
    Constituents: Feet
  8. WHAT IS A FOOT?

    ▸ Rhythmic unit composed of one or more syllables
    ▸ Organizes syllables into higher-order unit based on regular stress patterns
    ▸ Examples:
      ▸ Trochaic foot: Strong-Weak ˈbut.ter
      ▸ Iambic foot: Weak-Strong ba.ˈguette
    ▸ Foot dominated by higher-order prosodic units all the way up to level of utterance
  9. DEFINITION OF LANGUAGE “LITTLE SAMOAN”

    ▸ Slightly simplified description of Samoan stress in monomorphs presented in Zuraw, Yu, and Orfitelli (2014)
    ▸ Language of strings of light (one vowel only) and heavy (two vowels/long vowel) syllables marked for primary, secondary, or no stress (“weak”)
      ▸ la(ˈvaː) ‘energized’ W[L](P[H])
      ▸ (ˈmanu) ‘animal’ (P[L]W[L])
    ▸ L: light syllable, H: heavy syllable
    ▸ W: weak, P: primary stress, S: secondary stress
    ▸ (…): foot
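The “Little Samoan” notation can be read mechanically. The following is a minimal Python sketch (not the author's xfst code) that splits a marked-up string into (stress, weight) pairs; the names `SYLLABLE` and `parse_syllables` are hypothetical and chosen only for illustration.

```python
import re

# One syllable: a stress mark (P, S or W) followed by a bracketed weight (L or H).
SYLLABLE = re.compile(r"([PSW])\[([LH])\]")

def parse_syllables(word):
    """Return (stress, weight) pairs, ignoring foot parentheses."""
    bare = word.replace("(", "").replace(")", "")
    pairs = SYLLABLE.findall(bare)
    # Reject strings containing anything other than well-formed syllables.
    if "".join(f"{s}[{w}]" for s, w in pairs) != bare:
        raise ValueError(f"not a Little Samoan string: {word!r}")
    return pairs

print(parse_syllables("W[L](P[H])"))   # la(ˈvaː): [('W', 'L'), ('P', 'H')]
print(parse_syllables("(P[L]W[L])"))   # (ˈmanu):  [('P', 'L'), ('W', 'L')]
```

The sketch only checks the shape of each syllable; it does not check that parentheses are balanced or that the stress pattern is a licit one.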
  10. RESEARCH QUESTION OPERATIONALIZATION

    Do constituents make phonological grammars for Samoan word stress more succinct?
    Phonological grammars:
    - ‘Direct’: directly regulate surface patterns
    - ‘Karttunen OT’: Finite state implementation of Optimality Theory (constraint-based)
    Optimality Theory (OT): Prince and Smolensky (1993/2004); Karttunen OT implementation: Karttunen (1998)
  11. OPTIMALITY THEORY 1: (OVER)GENERATE

    Input: LL
    GENERATE: Add stress markup
    Output: P[L]P[L] P[L]W[L] P[L]S[L] W[L]P[L] W[L]W[L] W[L]S[L] S[L]P[L] S[L]W[L] S[L]S[L]
    ▸ L: light syllable, H: heavy syllable
    ▸ W: weak, P: primary stress, S: secondary stress
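The overgeneration step can be sketched as a Cartesian product over stress marks. This is an illustrative Python sketch rather than the xfst transducer used in the talk; `gen` and `STRESS_MARKS` are hypothetical names.

```python
from itertools import product

STRESS_MARKS = "PWS"  # primary, weak, secondary

def gen(weights):
    """Return every stress markup of a string of syllable weights such as 'LL'."""
    return [
        "".join(f"{mark}[{weight}]" for mark, weight in zip(marks, weights))
        for marks in product(STRESS_MARKS, repeat=len(weights))
    ]

candidates = gen("LL")
print(len(candidates))   # 9
print(candidates)
```

For an input of n syllables this yields 3^n candidates, which is why the later steps have to prune so aggressively.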
  12. OPTIMALITY THEORY 2: (OVER)CONSTRAIN

    Candidates: P[L]P[L] P[L]W[L] P[L]S[L] W[L]P[L] W[L]W[L] W[L]S[L] S[L]P[L] S[L]W[L] S[L]S[L]
    CONSTRAIN: “Don’t have adjacent stressed syllables”
    Candidates: P[L]P[L] P[L]W[L] P[L]S[L] W[L]P[L] W[L]W[L] W[L]S[L] S[L]P[L] S[L]W[L] S[L]S[L]
    ▸ L: light syllable, H: heavy syllable
    ▸ W: weak, P: primary stress, S: secondary stress
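As a rough illustration of the constraint “Don’t have adjacent stressed syllables”, the sketch below filters the nine candidates with a Python regular expression. Treating both P and S as “stressed” is an assumption read off the legend, and `CLASH` and `no_clash` are hypothetical names; which candidates the slide itself strikes out is not recoverable from the transcript.

```python
import re

# Two adjacent syllables that each carry primary (P) or secondary (S) stress.
CLASH = re.compile(r"[PS]\[[LH]\][PS]\[[LH]\]")

def no_clash(candidate):
    """True if the candidate has no adjacent stressed syllables."""
    return CLASH.search(candidate) is None

candidates = [
    "P[L]P[L]", "P[L]W[L]", "P[L]S[L]",
    "W[L]P[L]", "W[L]W[L]", "W[L]S[L]",
    "S[L]P[L]", "S[L]W[L]", "S[L]S[L]",
]
survivors = [c for c in candidates if no_clash(c)]
print(survivors)
# ['P[L]W[L]', 'W[L]P[L]', 'W[L]W[L]', 'W[L]S[L]', 'S[L]W[L]']
```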
  13. OPTIMALITY THEORY 2: (OVER)CONSTRAIN

    (ˌtemo)ka(ˈlasi) ‘democracy’ (S[L]W[L])W[L](P[L]W[L])
    (1) “All syllables must be parsed into feet”
    (2) “The beginning of the word must coincide with the beginning of a foot”

                          Constraint 1   Constraint 2
    (ˌtemo)ka(ˈlasi)      *
    te(ˌmoka)(ˈlasi)      *              *

    ▸ Both candidates violate at least one of the ordered constraints
  14. OPTIMALITY THEORY 3: EVALUATE

                          Constraint 1   Constraint 2
    (ˌtemo)ka(ˈlasi)      *
    te(ˌmoka)(ˈlasi)      *              *

    EVALUATE: (ˌtemo)ka(ˈlasi)
    Winner has fewer violations of higher-ranked constraints
    ▸ EVAL is not a finite state process! Number of states required for EVAL cannot be bounded (Eisner 1997, Karttunen 1998).
    ▸ Karttunen OT: finite state implementation of OT, maps underlying forms directly to surface forms rather than violation vectors, no EVAL. Equivalent expressive power if finite bound on number of violations.
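Classical EVAL can be sketched as a lexicographic comparison of violation vectors. The Python below is an illustration under stated assumptions, not the talk's implementation: Constraint 1 is counted as the number of unfooted syllables, Constraint 2 is treated as binary, and all function names are hypothetical.

```python
import re

FOOT = re.compile(r"\([^()]*\)")

def unparsed_syllables(word):
    """Constraint 1: count syllables that lie outside every foot."""
    return FOOT.sub("", word).count("[")

def misaligned_left(word):
    """Constraint 2 (assumed binary): 1 if the word does not begin with a foot."""
    return 0 if word.startswith("(") else 1

RANKED = [unparsed_syllables, misaligned_left]  # highest-ranked first

def violations(word):
    return tuple(constraint(word) for constraint in RANKED)

def eval_winner(candidates):
    # Tuples compare lexicographically, so min() prefers fewer violations
    # of the highest-ranked constraint and breaks ties further down.
    return min(candidates, key=violations)

candidates = [
    "(S[L]W[L])W[L](P[L]W[L])",   # (ˌtemo)ka(ˈlasi)
    "W[L](S[L]W[L])(P[L]W[L])",   # te(ˌmoka)(ˈlasi)
]
for c in candidates:
    print(c, violations(c))       # (1, 0) and (1, 1)
print("winner:", eval_winner(candidates))
```

Because the violation counts are unbounded integers, this comparison is exactly the part that has no finite state counterpart, which is what motivates the Karttunen OT alternative.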
  15. RESEARCH QUESTION OPERATIONALIZATION

    Do constituents make phonological grammars for Samoan word stress more succinct?
    Phonological grammars:
    - ‘Direct’: directly regulate surface patterns
    - ‘Karttunen OT’: Finite state implementation of Optimality Theory (constraint-based)
    Optimality Theory (OT): Prince and Smolensky (1993/2004); Karttunen OT implementation: Karttunen (1998)
  16. 2 X 2 COMPARISON OF GRAMMARS

                    Syllables only                 Syllables and feet
    Direct          Direct, syllables only         Direct, syll. and feet
    Karttunen OT    Karttunen OT, syllables only   Karttunen OT, syll. and feet
  17. DEFINING THE GRAMMARS IN xfst

    ▸ We define the grammars in xfst, a language designed by linguists to make it natural to state morphophonological grammars (Beesley and Karttunen, 2003)
    ▸ Includes pre-defined operators and capacity for definition of own operators and units, which allow us to write grammars at very high level
    ▸ Compiles our high-level grammars to finite state transducers
    ▸ Provides common formalism in which we can define all four grammars and measure grammar size in a controlled comparison
    (Beesley and Karttunen, 2003), https://web.stanford.edu/~laurik/fsmbook/home.html
  18. RESEARCH QUESTION

    Do constituents make phonological grammars for Samoan word stress more succinct?
  19. DEFINING SUCCINCTNESS

    Succinctness: the size of the grammar, i.e. the number of symbols it takes to write it down in xfst
    ▸ Special case of minimum description length (MDL) relativized to xfst notation
    ▸ Common xfst formalism for expressing the grammars
    ▸ Data same across comparisons: stress patterns up to 5 syllables
      ▸ I limit testing empirical coverage of stress patterns to monomorphs of up to 5 syllables because there are no words longer than that
      ▸ All grammars admit exactly same set of stress patterns up to 5 syllables
      ▸ Size of encodings of sequences up to 5 syllables is the same, since possibilities allowed by grammars in that range are identical
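To make “number of symbols” concrete, here is a small Python sketch that counts tokens in xfst-style definitions. The tokenization rule (a quoted string, an identifier, a number, or any other single non-space character is one symbol) is purely an assumption for illustration; the talk's own counting conventions are given in its appendix and are not reproduced in this transcript, so these counts should not be expected to match the reported figures.

```python
import re

# Assumed convention: a quoted string, an identifier, a number, or any other
# single non-space character each counts as one symbol.
TOKEN = re.compile(r'"[^"]*"|[A-Za-z_][A-Za-z0-9_]*|\d+|\S')

def count_symbols(definition):
    """Count symbols in one xfst-style definition under the assumed convention."""
    return len(TOKEN.findall(definition))

grammar = [
    'define Light [ "[" "L" "]" ];',
    'define WeakLight [ "W" Light ];',
    'define W2 [ WeakLight WeakLight ];',
]
for line in grammar:
    print(count_symbols(line), line)
print("total:", sum(count_symbols(line) for line in grammar))
```

Whatever convention is used, the comparison is only meaningful if the same one is applied to all four grammars.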
  20. OPERATIONALIZED RESEARCH QUESTION

    Do constituents make phonological grammars for Samoan word stress more succinct?
    Does reference to feet reduce the number of symbols used in xfst, in defining direct approach and Karttunen OT grammars for stress patterns in Samoan monomorphs?
  21. CODE IS AVAILABLE AT… https://github.com/krismyu/smo-constituency-feet

  22. IMPLEMENTATION: MARKUP STRESS

    Step 0a: Add stress markup (all 4 grammars)
    Input: . . . LL . . .
    Add stress markup
    Output: . . . P[L]W[L] . . .
  23. IMPLEMENTATION: MARKUP FEET

    Step 0b: Add footing markup (foot-based grammars only)
    Input: . . . P[L]W[L] . . .
    Add footing markup: “Wrap parentheses around any LL or H”
    Output: . . . (P[L]W[L]) . . .
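The footing step can be sketched as enumerating every way of wrapping an LL pair or a single H in parentheses. This Python sketch assumes that footing is optional at each position (so unfooted parses are also produced); whether the xfst rule is optional or obligatory is not stated on the slide. `footings` is a hypothetical name.

```python
import re

SYLLABLE = re.compile(r"[PSW]\[[LH]\]")

def footings(word):
    """Return every way of wrapping LL or H in parentheses (footing assumed optional)."""
    syllables = SYLLABLE.findall(word)

    def parse(i):
        if i == len(syllables):
            yield ""
            return
        syl = syllables[i]
        # Option 1: leave this syllable unfooted.
        for rest in parse(i + 1):
            yield syl + rest
        # Option 2: a heavy syllable forms a foot on its own.
        if syl.endswith("[H]"):
            for rest in parse(i + 1):
                yield f"({syl})" + rest
        # Option 3: two adjacent light syllables form a foot.
        elif i + 1 < len(syllables) and syllables[i + 1].endswith("[L]"):
            for rest in parse(i + 2):
                yield f"({syl}{syllables[i + 1]})" + rest

    return list(parse(0))

print(footings("P[L]W[L]"))   # ['P[L]W[L]', '(P[L]W[L])']
print(footings("W[L]P[H]"))   # ['W[L]P[H]', 'W[L](P[H])']
```

In this sketch the unwanted parses are left for later constraints to remove, mirroring the overgenerate-then-constrain design of the earlier steps.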
  24. IMPLEMENTATION: DEFINE FEET/SYLLABLES

    Step 1: Define foot/syllable types
    “A light syllable is the sequence [L]”
        define Light [ "[" "L" "]" ];
    “A weak light is a light immediately preceded by W.”
        define WeakLight [ "W" Light ];
    “A lapse is two adjacent weak lights.”
        define W2 [ WeakLight WeakLight ];
  25. IMPLEMENTATION: DEFINE FEET/SYLLABLES

    Step 1: Define foot/syllable types
    “A foot is a string of non-parentheses enclosed in parentheses.”
        define Foot [ "(" [ \[ "(" | ")" ] ]* ")" ];
    “A primary-stressed foot is a foot that contains P.”
        define PrimaryFoot [ Foot & $["P"] ];
  26. IMPLEMENTATION: ENFORCE SURFACE RESTRICTIONS

    Step 2: Enforce surface restrictions
    “The end of a word must coincide with the end of a primary-stressed foot.” (Edgemost-R)
        define EdgemostR [ \P* PrimaryFoot ];
    “A word may not contain a lapse”
        define No1Lapse ~[$[W2]];
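For readers without xfst at hand, the foot definitions and the two surface restrictions can be approximated with Python regular expressions. This is a loose analogue under the assumption that `\P*` means “any string containing no P”; it is not a translation checked against the author's compiled transducers, and the Python names are hypothetical.

```python
import re

FOOT = r"\([^()]*\)"                      # non-parentheses enclosed in parentheses
PRIMARY_FOOT = r"\([^()]*P[^()]*\)"       # a foot that contains P
EDGEMOST_R = re.compile(rf"[^P]*{PRIMARY_FOOT}")
LAPSE = "W[L]W[L]"                        # two adjacent weak lights

def is_foot(chunk):
    return re.fullmatch(FOOT, chunk) is not None

def edgemost_r(word):
    """The word ends in a primary-stressed foot, with no P before it."""
    return EDGEMOST_R.fullmatch(word) is not None

def no_lapse(word):
    """The word contains no two adjacent weak light syllables."""
    return LAPSE not in word

print(edgemost_r("(S[L]W[L])W[L](P[L]W[L])"))  # True
print(edgemost_r("(P[L]W[L])W[L]"))            # False
print(no_lapse("W[L]W[L]P[H]"))                # False
```

Each restriction here is a yes/no filter on surface strings, which is the sense in which the “direct” grammars regulate surface patterns.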
  27. RESULTS: SUCCINCTNESS

                    Syllables only   Syllables and feet
    Direct          145              141
    Karttunen OT    >1000            306

    ‣ Direct accounts: feet don’t increase succinctness
    ‣ OT accounts: feet do increase succinctness
    ‣ OT syllable grammar shows blowup!
  28. KARTTUNEN OT, SYLLABLES ONLY: LAPSE FAMILY

    Penalize 1 lapse: 10 symbols
        define No1Lapse ~[$[W2]];
    Penalize 2 lapses: 25 symbols
        define No2Lapse ~[$[Weak]^3] & ~[[$[W2]]^2];
    Penalize 3 lapses: 61 symbols
        define No3Lapse ~[$[Weak]^4] & ~[[$[W2]]^3]
            & ~[?* [Weak]^3 $[[Weak]^2]] & ~[$[[Weak]]^2 [Weak]^3 ?*];
    Penalize 4 lapses: 131 symbols
        define No4Lapse ~[$[Weak]^5] & ~[[$[W2]]^4]
            & ~[?* [Weak]^4 $[[Weak]^2]] & ~[$[[Weak]^2] [Weak]^4 ?*]
            & ~[[$[[Weak]^3]]^2] & ~[?* [Weak]^3 [$[W2]]^2]
            & ~[ [$[W2]]^2 [Weak]^3 ] & ~[$[W2] [Weak]^3 $[W2] ];
    Requires violation counting, doing arithmetic
    Finite system can’t make infinitely many degrees of well-formedness!
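The contrast behind the blowup can be sketched in Python: counting lapses takes one line when integers are available, whereas a bounded cascade needs a separate “fewer than k lapses” filter for every k it wants to distinguish. The sketch below assumes overlapping weak-weak pairs each count as a lapse (so three weak lights in a row contain two), and it applies the filters leniently in the spirit of Karttunen OT; the function names and the order of the cascade are assumptions.

```python
import re

# Zero-width lookahead so that overlapping weak-weak pairs are each counted.
LAPSE = re.compile(r"(?=W\[L\]W\[L\])")

def count_lapses(word):
    return len(LAPSE.findall(word))

def lenient(candidates, constraint):
    """Keep candidates satisfying the constraint, unless none would survive."""
    kept = [c for c in candidates if constraint(c)]
    return kept or candidates

def bounded_lapse_cascade(candidates, max_k):
    """Apply 'fewer than k lapses' leniently for k = 1 .. max_k."""
    for k in range(1, max_k + 1):
        candidates = lenient(candidates, lambda c, k=k: count_lapses(c) < k)
    return candidates

cands = ["W[L]W[L]W[L]P[H]", "W[L]W[L]S[L]P[H]"]   # 2 lapses vs. 1 lapse
print([count_lapses(c) for c in cands])            # [2, 1]
print(bounded_lapse_cascade(cands, max_k=4))       # ['W[L]W[L]S[L]P[H]']
print(bounded_lapse_cascade(cands, max_k=1))       # bound too low: both survive
```

With `max_k=1` the cascade cannot tell one lapse from two, which is the finite bound the slide refers to; raising the bound means writing (and paying symbols for) another constraint.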
  29. KARTTUNEN OT, SYLLABLES ONLY: ALIGN-X1-L FAMILY

    Requires counting, doing arithmetic
    Symbol count blows up to approximate infinitely many degrees of well-formedness!
  30. DISCUSSION

    ▸ Surprisingly, except for Karttunen syllable OT account, size of grammars in high-level xfst definition very similar.
      ▸ Direct foot: 141 symbols (31 states, 36 arcs)
      ▸ Direct syllable: 145 symbols (27 states, 34 arcs)
      ▸ Karttunen OT foot: 306 symbols (418 states, 535 arcs)
      ▸ Karttunen OT syllable: >1000 symbols (3460 states, 4680 arcs)
    ▸ Within the Karttunen OT formalism, reference to feet does make the grammar more succinct
    ▸ But within “direct” approach, reference to feet does not make grammar more succinct
  31. FINITE STATE TRANSDUCERS: FOOTED GRAMMARS

    [Figure: state diagrams of the finite state transducers compiled from the footed grammars]
  32. DISCUSSION

    ▸ Introduction of feet doesn’t make direct grammars more succinct. So why might we still want to use feet?
      ▸ Clearly more structure in the grammar referring to feet
      ▸ Without feet, require case-by-case stipulations in grammar
    ▸ Direct accounts more succinct than Karttunen OT accounts. So why use OT?
      ▸ Certain stress patterns very difficult to describe under usual constraints; OT framework captures universals that phonologists have noticed
      ▸ Any regular stress pattern could be captured in direct approach (unless we introduced further restrictions)
  33. CONCLUSION

    Do constituents make phonological grammars for Samoan word stress more succinct?
    ▸ Answer: not necessarily! Depends on grammar formalism.
    ▸ Here, preliminary “proof-of-concept” exploration
      ▸ Not clear that counting symbols right way to assess how well grammar is capturing generalizations
      ▸ Not clear that results here would generalize for other phonological phenomena
    ▸ Shows a way we can study concrete, specific linguistic proposals and engage closely with linguistic data and practice while maintaining a rigorous approach
  34. FOR MORE INFORMATION…

    ▸ Paper and code
      ▸ https://github.com/krismyu/smo-constituency-feet
    ▸ Related work at my academic website
      ▸ www.krisyu.org
    Merci beaucoup!
  35. APPENDIX: CONVENTIONS FOR SYMBOL COUNTING