Slide 1

Slide 1 text

Sound Change Mechanization
Some work-in-progress notes
Tiago Tresoldi
DLCE department meeting, MPI-SHH, Jena, 18/02/2020

Slide 2

Slide 2 text

What I will talk about
History of sound change mechanization
Formalized notation
Two different tasks
The AlteruPhono package
Current status and plans
What it would (hopefully) allow

Slide 3

Slide 3 text

Caveats
Segments as units composed of bundles of (distinctive) features
Not (necessarily) phonemes, but a “useful descriptive fiction”
Sound changes alone don’t explain all of history

Slide 4

Slide 4 text

Background
Sound change mechanization was proposed in the 1950s (Gleason 1959) and started in the 1960s and 1970s (Kay 1964, Hewson 1977)

Slide 5

Slide 5 text

Diachrony replaced by synchrony
Informed edit distance (Damerau, 1975)
Phonological distance (Wieling & Nerbonne, 2015)
Likelihood of correspondence (Bouchard-Côté et al., 2013)
Sequence alignment (List, 2014)
Ancestral State Reconstruction (Jäger & List, 2018)

Slide 6

Slide 6 text

(Milky Way from Pakistan’s Karakoram Range, Anne

Slide 7

Slide 7 text

(bitacoradegalileo.com)

Slide 8

Slide 8 text

Good and preferable solution for tasks such as cognate detection
Observation before inference
But algorithms that connect French là and Hawaiian laila (“there”), while missing German da and English there, unsettle linguists (List et al., 2017)

Slide 9

Slide 9 text

“All /e/ become /i/ when preceded by a consonant”
/tade/ turns into /tadi/
/pepe/ turns into /pipi/
/emu/ stays as /emu/
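A minimal sketch of how this prose rule could be mechanized, assuming one character per segment and a toy five-vowel inventory (both assumptions of this illustration, not part of the talk). The function name is hypothetical.

```python
import re

# Toy vowel inventory (assumption): every other character counts as a consonant.
VOWELS = "aeiou"

def raise_e_after_consonant(word):
    """Apply 'e > i / C _' to a word written one character per segment."""
    # The lookbehind requires a preceding character that is not a vowel,
    # so a word-initial /e/ is left untouched.
    return re.sub(rf"(?<=[^{VOWELS}])e", "i", word)

for form in ("tade", "pepe", "emu"):
    print(form, "->", raise_e_after_consonant(form))
```

The regular expression scans the original string, so all matching /e/ change at once instead of one change feeding the next.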

Slide 10

Slide 10 text

XKCD 1831, “Here to help”

Slide 11

Slide 11 text

Problem #1: Formal notation
The pattern A > B / C (“A turns into B in context C”) is but a blueprint:
Footnotes and comparative prose
Conventions (∅, #, etc.)
Ad hoc solutions (shifts, alternatives, etc.)
Implicitness (e.g. coronal plosives)
IPA issues (and “e” is not necessarily /e/!)
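To make the A > B / C blueprint concrete, here is a rough sketch that splits a rule string into source, target and context. It is only an illustration: the `Rule` class and `parse_rule` helper are hypothetical names, the regular expression stands in for a proper grammar, and none of the complications listed above (footnotes, implicit classes, IPA ambiguity) are handled.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    source: str
    target: str
    context: str  # empty string for an unconditional rule

# Accept the arrow variants that appear on these slides: "->", ">", "→", "➞".
RULE_PATTERN = re.compile(
    r"^\s*(?P<source>.+?)\s*(?:->|>|→|➞)\s*(?P<target>.+?)\s*"
    r"(?:/\s*(?P<context>.+?))?\s*$"
)

def parse_rule(text):
    """Split 'A > B / C' into its three parts (hypothetical helper)."""
    match = RULE_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a rule: {text!r}")
    return Rule(match["source"], match["target"], match["context"] or "")

print(parse_rule("e > i / C _"))
print(parse_rule("k V ➞ @2 / # _"))
print(parse_rule("b > p"))
```

Splitting the string is the easy part; deciding what `C`, `V` or `@2` mean is where a formalized notation is needed.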

Slide 12

Slide 12 text

Proto-Omotic to North Omotic
e → i / #l_{P,C[+voiced]}
e(:) → i(:) / #C[+sibilant]_{d,n,r}
{u,a,i} → ∅ / _% (when stressed)
Classical Arabic to Hadhrami Arabic
dˤ q → ðˤ ɡ

Slide 13

Slide 13 text

Problem #2: Typological research
Insufficient empirical probabilities
Database of sound changes
Growing body of supporting research (Blevins 2004, Kümmel 2007, Bybee 2015, Hruschka et al. 2015)
Case of Index Diachronica

Slide 14

Slide 14 text

Task #1: Forwards Reconstruction
Smith (1969), PIE → Russian
  /aŋgʷʰi/ (“worm, snake”) → уж /uʂ/ (“adder”)
Mignani (1971), Proto-Romance → Franco-Provençal
Burton-Hunter (1976), Latin → Old French
Eastlack (1977), Latin → Spanish
Bátori (1982), Proto-Uralic → Finnish/Hungarian

Slide 15

Slide 15 text

Hartman (2003), a de facto programming language
Generative phonology
Remarkably powerful
Notation very different from the usual

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

$1: A_Coloring «/aw/, /aj/ > /ow/, /ej/»
A: +syll (*) +low (*) -cons (*+1) -syll (*+1) +high (*+1)
1: -low (*)
2: back (*) = back (*+1)
3: round (*) = round (*+1)
END: A_Coloring

Slide 18

Slide 18 text

Task #2: Backwards Reconstruction
Hewson (1977) on Algonquian
  an average of 21 potential proto-forms handled per lexeme
Lowe & Mazaudon (1994), Oakes (2000), Kondrak et al. (2007)

Slide 19

Slide 19 text

Even a rule as simple as b > p, applied backwards to /papa/, yields four alternatives: /baba/, /bapa/, /paba/, /papa/
Rosenfelder’s SCA² on Portuguese “distrito” (cf. Sims-Williams, 2018)
distrito → districtus, distriptus, (dozen others), diiistericto, divivistriviviptus
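The four alternatives for /papa/ follow from the fact that every /p/ in the reflex may continue either an older /b/ or an older /p/. A minimal sketch of that enumeration, assuming one character per segment (the helper name is hypothetical):

```python
from itertools import product

def backward_merger(reflex, source, target):
    """List every proto-form compatible with an unconditional 'source > target'."""
    # Each occurrence of the target may go back to the source or to itself;
    # every other segment has a single possible origin.
    options = [(source, target) if seg == target else (seg,) for seg in reflex]
    return sorted("".join(combo) for combo in product(*options))

print(backward_merger("papa", "b", "p"))
# ['baba', 'bapa', 'paba', 'papa']
```

With n occurrences of the target the list has 2**n entries, and chaining several rules multiplies these counts, which is why backwards reconstruction quickly produces implausible candidates.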

Slide 20

Slide 20 text

AlteruPhono
Both a Python library and a web tool
Intended also for use without installation
Formalization of notation (database, CLTS)
PEG grammar
Forwards and backwards direction
https://github.com/tresoldi/alteruphono

Slide 21

Slide 21 text

Currently
Python library usable by programmers
800 test rules (“stress tests”), (B)IPA features
Real data (on-going)
  Proto-Algonquian to Shawnee (48 rules)
  Conversion of Hartman’s LS (1,800 forms)
  Toy dataset of PIE to RP English (10 words)
    */kʷetwṓr/ ➞ /fɔː/ (“four”)
    */h₂ḱowsyónom/ ➞ /hiə/ (“hear”)

Slide 22

Slide 22 text


Slide 23

Slide 23 text

Example #1, Simple rule
a ➞ e / _ #
Forwards: papa ➞ pape
Backwards: pape ➞ *papa, *pape

Slide 24

Slide 24 text

Example #2, Sound classes
b ➞ β / V _ V
Forwards: ibaba ➞ iβaβa
Backwards: iβaβa ➞ *ibaba, *iβaba, *ibaβa, *iβaβa
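One way to reproduce this example outside the tool is generate-and-test: enumerate the possible origins of each /β/ and keep the candidates that the forward rule maps back onto the attested form. This is only a sketch under the assumption of one character per segment and a toy vowel class; the function names are hypothetical and are not AlteruPhono's API.

```python
import re
from itertools import product

VOWELS = "aeiou"  # toy sound class standing in for "V" (assumption)

def lenite_forward(word):
    """Apply 'b > β / V _ V'."""
    return re.sub(rf"(?<=[{VOWELS}])b(?=[{VOWELS}])", "β", word)

def lenite_backward(reflex):
    """Return the candidate proto-forms that lenite_forward maps onto reflex."""
    options = [("b", "β") if seg == "β" else (seg,) for seg in reflex]
    candidates = ("".join(combo) for combo in product(*options))
    return [form for form in candidates if lenite_forward(form) == reflex]

print(lenite_forward("ibaba"))
print(lenite_backward("iβaβa"))
```

For this rule the filter removes nothing, so the result contains the same four forms listed above.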

Slide 25

Slide 25 text

Example #3, Back-references
k V ➞ @2 / # _
Forwards: kira ➞ ira
Backwards: ira ➞ *kira, *ira
Forwards: koke ➞ oke
Backwards: oke ➞ *koke, *oke
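The back-reference `@2` copies the second matched element (the vowel) into the output, which deletes the initial /k/. A rough equivalent with regular-expression groups, assuming one character per segment and a toy vowel class (hypothetical helper names, not the library's API):

```python
import re

VOWELS = "aeiou"  # toy class standing in for "V" (assumption)

def drop_initial_k(word):
    """Apply 'k V > @2 / # _': keep only the vowel of a word-initial kV."""
    # The slide counts k as element 1 and V as element 2; here the vowel
    # is the only capturing group, hence \1.
    return re.sub(rf"^k([{VOWELS}])", r"\1", word)

def restore_initial_k(reflex):
    """Candidate proto-forms: with and without a lost initial k."""
    candidates = ["k" + reflex, reflex]
    return [form for form in candidates if drop_initial_k(form) == reflex]

print(drop_initial_k("kira"), restore_initial_k("ira"))
print(drop_initial_k("koke"), restore_initial_k("oke"))
```

Only the word-initial /k/ is affected, so the medial /k/ of koke survives in oke.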

Slide 26

Slide 26 text

Example #4, Back-references with changes and alternatives
p|k ➞ @1[+voiced] / V _ V
Forwards: apakak ➞ abagak
Backwards: abagak ➞ *abagak, *apagak, *abakak, *apakak
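Here `@1[+voiced]` means "the matched segment, with its voicing feature switched on". A small sketch of that idea with segments as feature bundles; the four-segment feature table is a toy stand-in for a real feature system, and all names are hypothetical.

```python
import re

VOWELS = "aeiou"  # toy class standing in for "V" (assumption)

# Toy feature bundles (assumption); a real system would draw on a resource
# such as the (B)IPA features mentioned in the talk.
FEATURES = {
    "p": frozenset({"bilabial", "plosive", "voiceless"}),
    "b": frozenset({"bilabial", "plosive", "voiced"}),
    "k": frozenset({"velar", "plosive", "voiceless"}),
    "g": frozenset({"velar", "plosive", "voiced"}),
}
SEGMENT_BY_FEATURES = {bundle: seg for seg, bundle in FEATURES.items()}

def set_voiced(segment):
    """Return the segment whose bundle equals this one with voicing added."""
    bundle = (FEATURES[segment] - {"voiceless"}) | {"voiced"}
    return SEGMENT_BY_FEATURES[bundle]

def voice_between_vowels(word):
    """Apply 'p|k > @1[+voiced] / V _ V'."""
    pattern = rf"(?<=[{VOWELS}])[pk](?=[{VOWELS}])"
    return re.sub(pattern, lambda m: set_voiced(m.group(0)), word)

print(voice_between_vowels("apakak"))  # abagak
```

The final /k/ of apakak is not followed by a vowel, so it stays voiceless.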

Slide 27

Slide 27 text

Example #5, sets and mappings, alternatives
{a,e,u} ➞ {e,i,o} / r _ | _ r
Forwards: areru ➞ eriro
Backwards: eriro ➞ *eriro, *ariro, *erero, *eriru, *arero, *ariru, *ereru, *areru
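A sketch of the set mapping, in which each member of the first set maps to the member at the same position in the second, and the context holds if either alternative matches. The backward helper below enumerates origins without checking the context or re-applying the rule, a simplification that happens to be harmless here because every vowel of eriro stands next to an r. Names are hypothetical.

```python
import re
from itertools import product

# Positional mapping {a,e,u} > {e,i,o}
MAPPING = {"a": "e", "e": "i", "u": "o"}

def shift_vowels(word):
    """Apply '{a,e,u} > {e,i,o} / r _ | _ r'."""
    # Either alternative of the context may match: after r, or before r.
    pattern = r"(?<=r)[aeu]|[aeu](?=r)"
    return re.sub(pattern, lambda m: MAPPING[m.group(0)], word)

def candidate_sources(reflex):
    """Unfiltered enumeration: each mapped vowel may be old or unchanged."""
    reverse = {new: old for old, new in MAPPING.items()}
    options = [(seg, reverse[seg]) if seg in reverse else (seg,) for seg in reflex]
    return ["".join(combo) for combo in product(*options)]

print(shift_vowels("areru"))
print(candidate_sources("eriro"))
```

The unfiltered enumeration yields the same eight forms as the list above.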

Slide 28

Slide 28 text

Multitiers
Approach to tiers as extensions to alignments and features (List & Chacon, 2015; Tresoldi et al., 2018)
“initial /t/ becomes /n/ if there is a nasal consonant anywhere in the word”
tata ➞ tata
taɲa ➞ naɲa
tatatataɲatata ➞ natatataɲatata

Slide 29

Slide 29 text

Tier   Seg-1  Seg-2  Seg-3  Seg-4
sound  t      a      t      a

sound  t      a      ɲ      a

Slide 30

Slide 30 text

Tier           Seg-1  Seg-2  Seg-3  Seg-4
sound          t      a      t      a
nasal_in_word  False  False  False  False

sound          t      a      ɲ      a
nasal_in_word  True   True   True   True

t[nasal_in_word] > n / # _
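The tier idea can be sketched as a dictionary of parallel lists, one value per segment, so that a long-distance condition becomes a local property of the segment being changed. The nasal inventory and the function names are assumptions made for this illustration only.

```python
NASALS = {"m", "n", "ɲ", "ŋ"}  # toy nasal inventory (assumption)

def build_tiers(word):
    """Return a 'sound' tier plus a word-level 'nasal_in_word' tier."""
    sounds = list(word)
    has_nasal = any(seg in NASALS for seg in sounds)
    # The word-level property is copied onto every segment position.
    return {"sound": sounds, "nasal_in_word": [has_nasal] * len(sounds)}

def nasalize_initial_t(word):
    """Apply 't[nasal_in_word] > n / # _' using only position 0 of the tiers."""
    tiers = build_tiers(word)
    sounds = list(tiers["sound"])
    if sounds and sounds[0] == "t" and tiers["nasal_in_word"][0]:
        sounds[0] = "n"
    return "".join(sounds)

for form in ("tata", "taɲa", "tatatataɲatata"):
    print(form, "->", nasalize_initial_t(form))
```

Once the tier is built, the rule never has to look beyond the first position, however far away the nasal is.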

Slide 31

Slide 31 text

Roadmap
Get usable library and tool
Consolidate notation
Double implementation, JSON
Feature system agnostic (BIPA default)
Write paper for review and feedback
Bootstrapping for other projects on hold
  inference of sound changes from cognates
  catalog
  attenuate homoplasy from sounds as states

Slide 32

Slide 32 text

References
BÁTORI, I., 1982. “Computersimulation in der linguistischen Forschung (Maschinelle Verifizierung der rekonstruierten Lautformen anhand uralischen Materials)”, Ural-altaische Jahrbücher, Neue Folge, 2. 1–18.
BLEVINS, J., 2004. Evolutionary phonology: The emergence of sound patterns. Cambridge University Press.
BOUCHARD-CÔTÉ, A.; HALL, D.; GRIFFITHS, T. L. & KLEIN, D., 2013. “Automated reconstruction of ancient languages using probabilistic models of sound change”, Proceedings of the National Academy of Sciences 110(11). 4224–4229.
BURTON-HUNTER, S., 1976. “Romance etymology: A computerized model”, Computers and the Humanities 10. 217–220.
BYBEE, J., 2015. “Articulatory processing and frequency of sound change”, The Oxford Handbook of Historical Phonology, 467–484.
DAMERAU, F. J., 1975. “Mechanization of cognate recognition in comparative linguistics”, Linguistics: An International Journal 13(148). 5–29.
EASTLACK, C. L., 1977. “Iberochange: A program to simulate systematic sound change in Ibero-Romance”, Computers and the Humanities 11. 81–88.
GLEASON, H. A. Jr., 1959. “Counting and calculating for historical reconstruction”, Anthropological Linguistics 1(2). 22–32.

Slide 33

Slide 33 text

HARTMAN, L., 2003. “Phono (Version 4.0): Software for modeling regular historical sound change”, in Leonel Ruiz Miyares, Celia E. Alvarez Moreno & María Rosa Alvarez Silva (eds.), Actas: VIII Simposio Internacional de Comunicación Social, Santiago de Cuba, 20–24 de enero del 2003, Volume I. Santiago de Cuba: Centro de Lingüística Aplicada, Ministerio Ciencia, Santiago de Cuba. 606–609.
HEWSON, J., 1977. “Reconstructing prehistoric languages on the computer: The triumph of the electronic neogrammarian”, in A. Zampolli & N. Calzolari (eds.), Proceedings of the 5th Conference on Computational Linguistics, Pisa 1973, Volume I. Florence: Olschki. 263–273.
HRUSCHKA, D. J.; BRANFORD, S.; SMITH, E. D.; WILKINS, J.; MEADE, A.; PAGEL, M. & BHATTACHARYA, T., 2015. “Detecting Regular Sound Changes in Linguistics as Events of Concerted Evolution”, Current Biology 25. 1–9.
JÄGER, G. & LIST, J.-M., 2018. “Using ancestral state reconstruction methods for onomasiological reconstruction in multilingual word lists”, Language Dynamics and Change 8(1). 22–54.
KÜMMEL, M. J., 2007. Konsonantenwandel: Bausteine zu einer Typologie des Lautwandels und ihre Konsequenzen für die vergleichende Rekonstruktion. Wiesbaden: Reichert.
LIST, J.-M., 2014. Sequence comparison in historical linguistics. Düsseldorf: Düsseldorf University Press.
LIST, J.-M. & CHACON, T., 2015. “Towards a Cross-Linguistic Database for Historical Phonology? A proposal for a machine readable modeling of phonetic context”, Leiden.
LIST, J.-M.; GREENHILL, S. J. & GRAY, R., 2017. “The potential of automatic word comparison for historical linguistics”, PLOS ONE 12(1). 1–18.

Slide 34

Slide 34 text

LOWE, J. B. & MAZAUDON, M., 1994. “The reconstruction engine: A computer implementation of the comparative method”, Computational Linguistics 20(3). 381–417.
MIGNANI, R., 1971. “Review of Durham & Rogers 1969”, Computers and the Humanities 5(3). 191.
OAKES, M., 2000. “Computer estimation of vocabulary in a protolanguage from word lists in four daughter languages”, Journal of Quantitative Linguistics 7(3). 233–243.
KAY, M., 1964. The logic of cognate recognition in historical linguistics. Memorandum RM–4224–PR, prepared for United States Air Force Project Rand. Santa Monica, CA: The Rand Corporation.
KONDRAK, G.; BECK, D. & DILTS, P., 2007. “Creating a comparative dictionary of Totonac–Tepehua”, in John Nerbonne, T. Mark Ellison & Grzegorz Kondrak (eds.), Computing and historical phonology: Proceedings of the Ninth Meeting of the ACL Special Interest Group in Computational Morphology and Phonology. Prague: Association for Computational Linguistics. 134–141.
SIMS-WILLIAMS, P., 2018. “Mechanising historical phonology”, Transactions of the Philological Society 116(3). 555–573.
SMITH, R. N., 1969. “A computer simulation of phonological change”, ITL: Tijdschrift voor Toegepaste Linguistiek 5. 82–91.
TRESOLDI, T.; ANDERSON, C. & LIST, J.-M., 2018. “Modelling sound change with the help of multi-tiered sequence representations”, Poznań Linguistic Meeting, 2018-10-15.
WIELING, M. & NERBONNE, J., 2015. “Advances in dialectometry”, Annual Review of Linguistics 1. 243–264.

Slide 35

Slide 35 text

Thank you!
[email protected]