Sound Change Mechanization

Some work-in-progress notes
Tiago Tresoldi
DLCE department meeting, MPI-SHH, Jena, 18/02/2020

What I will talk about
- History of sound change mechanization
- Formalized notation
- Two different tasks
- The AlteruPhono package
- Current status and plans
- What it would (hopefully) allow

Caveats
- Segments as units composed of bundles of (distinctive) features
- Not (necessarily) phonemes, but a “useful descriptive fiction”
- Sound changes alone don’t explain all of history

Background
- Sound change mechanization proposed in the 1950s (Gleason 1959), started in the 1960s and 1970s (Kay 1964, Hewson 1977)

Diachrony replaced by synchrony
- Informed edit distance (Damerau, 1975)
- Phonological distance (Wieling & Nerbonne, 2015)
- Likelihood of correspondence (Bouchard-Côté et al., 2013)
- Sequence alignment (List, 2014)
- Ancestral State Reconstruction (Jäger & List, 2018)

[Image: Milky Way from Pakistan’s Karakoram Range, Anne …]

[Image: bitacoradegalileo.com]

- Good and preferable solution for tasks such as cognate detection
- Observation before inference
- But algorithms connecting French là and Hawaiian laila (“there”), while missing German da and English there, unsettle linguists (List et al., 2017)

“All /e/ become /i/ when preceded by a consonant”
- /tade/ turns into /tadi/
- /pepe/ turns into /pipi/
- /emu/ stays as /emu/
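The rule above can be tried out with a regular expression. This is only a minimal sketch: the consonant inventory is an assumption made for the example, words are treated as plain strings with one character per segment, and the function name `raise_e` is hypothetical. It is not how AlteruPhono parses or applies rules.

```python
import re

# Assumption: a toy consonant inventory, just large enough for the examples.
CONSONANTS = "ptkbdgmnszrl"

def raise_e(word):
    """Apply e > i / C _ to a word written with one character per segment."""
    # The lookbehind checks for a preceding consonant without consuming it,
    # so only the vowel itself is rewritten.
    return re.sub(rf"(?<=[{CONSONANTS}])e", "i", word)

for word in ("tade", "pepe", "emu"):
    print(word, "->", raise_e(word))
```

Even this toy version has to spell out what the prose leaves implicit: which segments count as consonants, and how a word is split into segments.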

[Image: XKCD 1831, “Here to help”]

Problem #1: Formal notation
The pattern of A > B / C (“A turns into B in context C”) is but a blueprint:
- Footnotes and comparative prose
- Conventions (∅, #, etc.)
- Ad hoc solutions (shifts, alternatives, etc.)
- Implicitness (e.g. coronal plosives)
- IPA issues (and “e” is not necessarily /e/!)

Proto-Omotic to North Omotic
  e → i / #l_{P,C[+voiced]}
  e(:) → i(:) / #C[+sibilant]_{d,n,r}
  {u,a,i} → ∅ / _% (when stressed)

Classical Arabic to Hadhrami Arabic
  dˤ q → ðˤ ɡ

Problem #2: Typological research
- Insufficient empirical probabilities
- Database of sound changes
- Growing body of supporting research (Blevins 2004, Kümmel 2007, Bybee 2015, Hruschka et al. 2015)
- Case of Index Diachronica

Task #1: Forwards Reconstruction
- Smith (1969), PIE → Russian
  /aŋgʷʰi/ (“worm, snake”) → уж /uʂ/ (“adder”)
- Mignani (1971), Proto-Romance → Franco-Provençal
- Burton-Hunter (1976), Latin → Old French
- Eastlack (1977), Latin → Spanish
- Bátori (1982), Proto-Uralic → Finnish/Hungarian

- Hartman (2003), a de facto programming language
- Generative phonology
- Remarkably powerful
- Notation very different from the usual

$1: A_Coloring «/aw/, /aj/ > /ow/, /ej/»
A: +syll (*) +low (*) -cons (*+1) -syll (*+1) +high (*+1)
1: -low (*)
2: back (*) = back (*+1)
3: round (*) = round (*+1)
END: A_Coloring

Task #2: Backwards Reconstruction
- Hewson (1977) on Algonquian
- On average, 21 potential proto-forms for each lexeme handled
- Lowe & Mazaudon (1994), Oakes (2000), Kondrak et al. (2007)

- Even a rule as simple as b > p, applied backwards to /papa/, yields four alternatives: /baba/, /bapa/, /paba/, /papa/
- Rosenfelder’s SCA² on Portuguese “distrito” (cf. Sims-Williams, 2018):
  distrito
  districtus
  distriptus
  (dozen others)
  diiistericto
  divivistriviviptus
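The four alternatives for /papa/ can be reproduced by brute-force enumeration. The sketch below assumes a context-free rule and one character per segment; the function name `backward` is made up for the illustration and is not taken from any of the tools mentioned.

```python
from itertools import product

def backward(word, source="b", target="p"):
    """List every proto-form that a context-free rule source > target maps to word."""
    # Each attested target segment is either the output of the rule (source)
    # or was already there before the change (target).
    options = [(source, target) if seg == target else (seg,) for seg in word]
    return ["".join(combo) for combo in product(*options)]

print(backward("papa"))  # ['baba', 'bapa', 'paba', 'papa']
```

With n affected segments the list has 2^n entries for a single rule, and chaining rules multiplies the candidates again, which is how forms such as the SCA² outputs above arise.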

AlteruPhono
- Both a Python library and a web tool
- Intended for usage also without installing
- Formalization of notation (database, CLTS)
- PEG grammar
- Forwards and backwards direction
- https://github.com/tresoldi/alteruphono

Currently
- Python library usable by programmers
- 800 test rules (“stress tests”), (B)IPA features
- Real data (on-going)
  - Proto-Algonquian to Shawnee (48 rules)
  - Conversion of Hartman’s LS (1,800 forms)
  - Toy dataset of PIE to RP English (10 words)
    */kʷetwṓr/ ➞ /fɔː/ (“four”)
    */h₂ḱowsyónom/ ➞ /hiə/ (“hear”)

Example #1, Simple rule
Rule: a ➞ e / _ #
Forwards: papa ➞ pape
Backwards, from pape: *papa, *pape

Example #2, Sound classes
Rule: b ➞ β / V _ V
Forwards: ibaba ➞ iβaβa
Backwards, from iβaβa: *ibaba, *iβaba, *ibaβa, *iβaβa
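A sound class such as V can be approximated with a character class, and backward candidates can be checked by re-applying the rule forwards. This is a rough sketch under stated assumptions: the vowel set is a toy stand-in for V, and `forward` and `backward` are hypothetical helpers, not AlteruPhono functions.

```python
import re
from itertools import product

VOWELS = "aeiou"  # assumption: toy vowel class standing in for V

def forward(word):
    """Apply b > β / V _ V."""
    # Lookarounds leave the vowels unconsumed, so one vowel can serve as the
    # right context of one match and the left context of the next.
    return re.sub(rf"(?<=[{VOWELS}])b(?=[{VOWELS}])", "β", word)

def backward(word):
    """Keep the candidate proto-forms that the forward rule maps onto word."""
    options = [("b", "β") if seg == "β" else (seg,) for seg in word]
    candidates = ("".join(combo) for combo in product(*options))
    return [cand for cand in candidates if forward(cand) == word]

print(forward("ibaba"))   # iβaβa
print(backward("iβaβa"))
```

Filtering by the forward rule discards candidates that could never have produced the attested form, for instance a proto-form that keeps an intervocalic b.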

Example #3, Back-references
Rule: k V ➞ @2 / # _
Forwards: kira ➞ ira; koke ➞ oke
Backwards, from ira: *kira, *ira
Backwards, from oke: *koke, *oke
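The back-reference @2 behaves much like a capture group in a regular expression. The following sketch assumes a toy vowel set and plain strings; the helper names are invented for the example and do not reflect the library's interface.

```python
import re

VOWELS = "aeiou"  # assumption: toy vowel class standing in for V

def forward(word):
    """Apply k V > @2 / # _ : drop word-initial k before a vowel."""
    # Group 1 plays the role of @2, the second element of the source pattern.
    return re.sub(rf"^k([{VOWELS}])", r"\1", word)

def backward(word):
    """Candidate proto-forms: with a lost initial k, or unchanged."""
    candidates = [word]
    if word and word[0] in VOWELS:
        candidates.insert(0, "k" + word)
    return candidates

print(forward("kira"), forward("koke"))  # ira oke
print(backward("ira"))                   # ['kira', 'ira']
```

Note that the medial k of koke is untouched, because the rule is anchored to the word boundary.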

Example #4, Back-references with changes and alternatives
Rule: p|k ➞ @1[+voiced] / V _ V
Forwards: apakak ➞ abagak
Backwards, from abagak: *abagak, *apagak, *abakak, *apakak
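A feature change such as [+voiced] needs a feature system behind it. The sketch below replaces that system with a small hand-written lookup table, which is purely an assumption for the example; a real implementation would derive the voiced counterpart from feature bundles (e.g. BIPA) rather than from a dictionary.

```python
import re

VOWELS = "aeiou"                          # assumption: toy vowel class for V
VOICED = {"p": "b", "t": "d", "k": "g"}   # assumption: toy stand-in for [+voiced]

def forward(word):
    """Apply p|k > @1[+voiced] / V _ V."""
    # The replacement function looks up the voiced counterpart of whichever
    # alternative (p or k) was matched, mirroring the @1 back-reference.
    return re.sub(
        rf"(?<=[{VOWELS}])[pk](?=[{VOWELS}])",
        lambda match: VOICED[match.group()],
        word,
    )

print(forward("apakak"))  # abagak
```

The final k stays voiceless because it is not followed by a vowel.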

Example #5, sets and mappings, alternatives
Rule: {a,e,u} ➞ {e,i,o} / r _ | _ r
Forwards: areru ➞ eriro
Backwards, from eriro: *eriro, *ariro, *erero, *eriru, *arero, *ariru, *ereru, *areru
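Sets with positional mappings can be sketched as a dictionary built from the two sets, and alternative contexts as a regular-expression alternation. This is an illustrative approximation on plain strings, with hypothetical names, not the library's own machinery.

```python
import re

# Positional mapping {a,e,u} > {e,i,o}: the n-th source maps to the n-th target.
MAPPING = dict(zip("aeu", "eio"))

def forward(word):
    """Apply {a,e,u} > {e,i,o} / r _ | _ r."""
    # Two alternative contexts: after r, or before r.
    pattern = r"(?<=r)[aeu]|[aeu](?=r)"
    return re.sub(pattern, lambda match: MAPPING[match.group()], word)

print(forward("areru"))  # eriro
```

Because each of the three vowels in eriro may or may not be the result of the change, the backward direction yields 2^3 = 8 candidates, as listed on the slide.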

Multitiers
- Approach to tiers as extensions to alignments and features (List & Chacon, 2015; Tresoldi et al., 2018)
- “initial /t/ becomes /n/ if there is a nasal consonant anywhere in the word”
  tata ➞ tata
  taɲa ➞ naɲa
  tatatataɲatata ➞ natatataɲatata

Tier   Seg-1  Seg-2  Seg-3  Seg-4
sound  t      a      t      a

sound  t      a      ɲ      a

Tier           Seg-1  Seg-2  Seg-3  Seg-4
sound          t      a      t      a
nasal_in_word  False  False  False  False

sound          t      a      ɲ      a
nasal_in_word  True   True   True   True

t[nasal_in_word] > n / # _
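The tier tables can be mimicked with one small record per segment. In this sketch the nasal inventory, the dictionary layout and the function names are all assumptions made for illustration; the tier name `nasal_in_word` is the only thing taken from the slide.

```python
NASALS = set("mnɲŋ")  # assumption: toy nasal inventory

def build_tiers(word):
    """Return one record per segment with a 'sound' and a 'nasal_in_word' tier."""
    has_nasal = any(seg in NASALS for seg in word)
    return [{"sound": seg, "nasal_in_word": has_nasal} for seg in word]

def apply_rule(word):
    """Apply t[nasal_in_word] > n / # _ ."""
    tiers = build_tiers(word)
    # Only the word-initial segment is inspected; the long-distance condition
    # is read off the extra tier instead of being scanned for in the context.
    if tiers and tiers[0]["sound"] == "t" and tiers[0]["nasal_in_word"]:
        tiers[0]["sound"] = "n"
    return "".join(seg["sound"] for seg in tiers)

for word in ("tata", "taɲa", "tatatataɲatata"):
    print(word, "->", apply_rule(word))
```

Copying the word-level property onto every segment keeps the rule itself strictly local, which is the point of the tier representation.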

Roadmap
- Get usable library and tool
  - Consolidate notation
  - Double implementation, JSON
  - Feature system agnostic (BIPA default)
- Write paper for review and feedback
- Bootstrapping for other projects on hold
  - inference of sound changes from cognates
  - catalog
  - attenuate homoplasy from sounds as states

References
BÁTORI, I., 1982. “Computersimulation in der linguistischen Forschung (Maschinelle Verifizierung der rekonstruierten Lautformen anhand uralischen Materials)”, Ural-altaische Jahrbücher, Neue Folge, 2. 1–18.
BLEVINS, J., 2004. Evolutionary phonology: The emergence of sound patterns. Cambridge University Press.
BOUCHARD-CÔTÉ, A.; HALL, D.; GRIFFITHS, T. L. & KLEIN, D., 2013. “Automated reconstruction of ancient languages using probabilistic models of sound change”, Proceedings of the National Academy of Sciences 110(11). 4224–4229.
BURTON-HUNTER, S., 1976. “Romance etymology: A computerized model”, Computers and the Humanities 10. 217–220.
BYBEE, J., 2015. “Articulatory processing and frequency of sound change”, The Oxford Handbook of Historical Phonology. 467–484.
DAMERAU, F. J., 1975. “Mechanization of cognate recognition in comparative linguistics”, Linguistics: An International Journal 13(148). 5–29.
EASTLACK, C. L., 1977. “Iberochange: A program to simulate systematic sound change in Ibero-Romance”, Computers and the Humanities 11. 81–88.
GLEASON, H. A. Jr., 1959. “Counting and calculating for historical reconstruction”, Anthropological Linguistics 1(2). 22–32.

HARTMAN, L., 2003. “Phono (Version 4.0): Software for modeling regular historical sound change”, in Leonel Ruiz Miyares, Celia E. Alvarez Moreno & María Rosa Alvarez Silva (eds.), Actas: VIII Simposio Internacional de Comunicación Social, Santiago de Cuba, 20–24 de enero del 2003, Volume I. Santiago de Cuba: Centro de Lingüística Aplicada, Ministerio Ciencia, Santiago de Cuba. 606–609.
HEWSON, J., 1977. “Reconstructing prehistoric languages on the computer: The triumph of the electronic neogrammarian”, in A. Zampolli & N. Calzolari (eds.), Proceedings of the 5th Conference on Computational Linguistics, Pisa 1973, Volume I. Florence: Olschki. 263–273.
HRUSCHKA, D. J.; BRANFORD, S.; SMITH, E. D.; WILKINS, J.; MEADE, A.; PAGEL, M. & BHATTACHARYA, T., 2015. “Detecting Regular Sound Changes in Linguistics as Events of Concerted Evolution”, Current Biology 25. 1–9.
JÄGER, G. & LIST, J.-M., 2018. “Using ancestral state reconstruction methods for onomasiological reconstruction in multilingual word lists”, Language Dynamics and Change 8(1). 22–54.
KÜMMEL, M. J., 2007. Konsonantenwandel: Bausteine zu einer Typologie des Lautwandels und ihre Konsequenzen für die vergleichende Rekonstruktion. Wiesbaden: Reichert.
LIST, J.-M., 2014. Sequence comparison in historical linguistics. Düsseldorf: Düsseldorf University Press.
LIST, J.-M. & CHACON, T., 2015. “Towards a Cross-Linguistic Database for Historical Phonology? A proposal for a machine readable modeling of phonetic context”, Leiden.
LIST, J.-M.; GREENHILL, S. J. & GRAY, R., 2017. “The potential of automatic word comparison for historical linguistics”, PLOS ONE 12(1). 1–18.

LOWE, J. B. & MAZAUDON, M., 1994. “The reconstruction engine: A computer implementation of the comparative method”, Computational Linguistics 20(3). 381–417.
MIGNANI, R., 1971. “Review of Durham & Rogers 1969”, Computers and the Humanities 5(3). 191.
OAKES, M., 2000. “Computer estimation of vocabulary in a protolanguage from word lists in four daughter languages”, Journal of Quantitative Linguistics 7(3). 233–243.
KAY, M., 1964. The logic of cognate recognition in historical linguistics. Memorandum RM–4224–PR, prepared for United States Air Force Project Rand. Santa Monica, CA: The Rand Corporation.
KONDRAK, G.; BECK, D. & DILTS, P., 2007. “Creating a comparative dictionary of Totonac–Tepehua”, in John Nerbonne, T. Mark Ellison & Grzegorz Kondrak (eds.), Computing and historical phonology: Proceedings of the Ninth Meeting of the ACL Special Interest Group in Computational Morphology and Phonology. Prague: Association for Computational Linguistics. 134–141.
SIMS-WILLIAMS, P., 2018. “Mechanising historical phonology”, Transactions of the Philological Society 116(3). 555–573.
SMITH, R. N., 1969. “A computer simulation of phonological change”, ITL: Tijdschrift voor Toegepaste Linguistiek 5. 82–91.
TRESOLDI, T.; ANDERSON, C. & LIST, J.-M., 2018. “Modelling sound change with the help of multi-tiered sequence representations”, Poznań Linguistic Meeting, 2018-10-15.
WIELING, M. & NERBONNE, J., 2015. “Advances in dialectometry”, Annual Review of Linguistics 1. 243–264.

Thank you!
