Automatic Inference of Sound Changes from Cognates

Talk given at ICHL24 in Canberra (Australia)

Tiago Tresoldi

July 04, 2019

Transcript

  1. 1.

    Automatic Inference of Sound Changes from Cognates

    Tiago Tresoldi
    Max Planck Institute for the Science of Human History (MPI-SHH, Jena)
    Computer-Assisted Language Comparison (CALC) Project
    July 4th, 2019

    Tresoldi, T. (MPI-SHH) Sound change inference July 4th, 2019 1 / 35
  2. 2.

    Contents

    1 Introduction
    2 Method
    3 Inference with proto-forms
    4 Inference without proto-forms
    5 Results
  3. 3.

    Background

    Automation of the comparative method has focused on tasks like sequence alignment, cognate detection, and tree inference.
    There is no generic tool for sound change inference.
    The problem is related to, but different from, the inference of ancestral states (Jäger 2018, List 2019), as a sound change is computationally a state machine.
    There have been attempts at applying and evaluating sound changes in forward (e.g., Hartman 2003) and backward reconstruction (e.g., Hewson 1973 and Kondrak 2009), as well as at phylogenetic analyses with the presence/absence of sound changes as characters.
    Difficulties arise from suprasegmental changes (e.g., nasalization), ancestor states not attested in reflexes (e.g., PIE laryngeals), and conditioning information missing in reflexes (e.g., Verner’s law).
  4. 4.

    Sound change inference

    Computationally complex problem: linear growth with inventory size, but exponential growth when chains of sound changes are involved.
    Lack of formalized prior probabilities: some changes are known to be common, but no catalog exists, nor a matrix of transition probabilities.
    Little supporting research, such as Blevins (2004), Kümmel (2007), and Bybee (2015), besides automated approaches such as Hruschka et al. (2015).

    Figure: Visualization of transition matrix from Hruschka et al. (2015).
  5. 5.

    Two related problems

    1. Sound change inference with proto-forms
    Given predecessor states (“proto-forms”) and one or more successor states (“reflexes”), where the states are sounds, infer the relations (“sound changes”) that best explain the observed correspondences.
    Ex.: A proto-language has */k/ and its descendants have either /k/ or /tʃ/. What explains that?

    2. Sound change inference without proto-forms
    Given states (“sounds”) in related items (“cognates”), infer the ancestral state (“proto-sound”) and the relations (“sound changes”) that best explain the correspondences according to typological and evolutionary assumptions.
    Ex.: Language A has /tʃ/ corresponding to Language B /k/. What explains that?
  6. 6.

    Method

    We are developing a method to model sequences with “multitiers” (parallel layers of information, such as features and classes), expanding on Chacon and List (2015) and Tresoldi et al. (2018).
    From sets of tiers provided as hypotheses, redundant information is pruned and only layers that result in information gain are kept.
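The pruning step can be illustrated with a minimal sketch. It assumes that "information gain" means the reduction in Shannon entropy of the reflex tier once a candidate tier is known. The toy rows, the tier names, and the functions `entropy` and `information_gain` are invented for illustration and are not taken from the library under development.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of labels."""
    counts = Counter(values)
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(rows, target, tier):
    """Entropy of `target` minus its expected entropy once `tier` is known."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[tier]].append(row[target])
    conditional = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - conditional


# Hypothetical aligned positions: a reflex plus two candidate tiers.
rows = [
    {"reflex": "ʃ", "next_cv": "C", "index": 1},
    {"reflex": "ʃ", "next_cv": "C", "index": 1},
    {"reflex": "z", "next_cv": "V", "index": 1},
    {"reflex": "z", "next_cv": "V", "index": 1},
]

gains = {tier: information_gain(rows, "reflex", tier) for tier in ("next_cv", "index")}
# Keep only tiers that actually reduce uncertainty about the reflex.
kept = [tier for tier, gain in gains.items() if gain > 1e-9]
print(gains, kept)
```

In this toy data the `index` tier is constant, so it carries no information about the reflex and is pruned, while `next_cv` is kept.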
  7. 7.

    Tiers - 1

    Figure: Two aligned sound sequences.
  8. 9.

    Tiers - 2

    Figure: Multitier system with English, Dutch, Swedish, and Sranan sound sequences, plus a positional Index tier.
  9. 10.

    Tiers - 3

    Figure: Multitier system with English, Dutch, Swedish, and Sranan sound sequences and sound classes, plus a positional Index tier.
  10. 11.

    Tiers - 4

    Figure: Multitier system with English, Dutch, Swedish, and Sranan sound sequences, sound classes, and following vowel status, plus a positional Index tier.
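As a rough illustration of what such a multitier system might look like as a data structure, the sketch below derives a positional index, a coarse CV sound class, and a "following vowel" flag from an alignment. The alignment, the vowel set, and the names `sound_class` and `build_multitier` are assumptions made for this example. They are not the data in the figures or the API of the library.

```python
VOWELS = set("aeiouɑæɛɪɔʊə")


def sound_class(sound):
    """Map a sound to a coarse CV class; '-' marks an alignment gap."""
    if sound == "-":
        return "-"
    return "V" if sound in VOWELS else "C"


def build_multitier(alignment):
    """Return a dict of parallel tiers, all of the same length."""
    length = len(next(iter(alignment.values())))
    tiers = {"index": list(range(1, length + 1))}
    for language, sounds in alignment.items():
        classes = [sound_class(s) for s in sounds]
        tiers[language] = list(sounds)
        tiers[f"{language}_CV"] = classes
        # True when the next position in the same sequence holds a vowel.
        tiers[f"{language}_next_is_V"] = [c == "V" for c in classes[1:]] + [False]
    return tiers


# Invented alignment of the word for "hand"; not the data shown on the slides.
alignment = {
    "English": ["h", "æ", "n", "d"],
    "Dutch": ["ɦ", "ɑ", "n", "t"],
    "Swedish": ["h", "a", "n", "d"],
}

tiers = build_multitier(alignment)
for name, values in tiers.items():
    print(f"{name:20} {values}")
```

Keeping every tier as a list of equal length means any column of the alignment can be read across all tiers at once, which is what makes conditioning on neighbouring information straightforward.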
  11. 13.

    Tiers - 5

    Figure: Relational multitier system with sounds and word indexes for English, German, and Dutch, plus a positional Index tier.
  12. 14.

    Example 1

    Problem
    Explain the development of initial (index=1) Proto-Germanic */s/ (PG=s) in German (G) in terms of the subsequent sound class in the “CV” model (PG-CV-R).

    ID  Index  PG  G  PG-CV-R  Count  Cov  Solve  Examples
    1   1      *s  ʃ  C        78     1.0  *      [19, 60]
    2   1      *s  z  V        13     1.0  *      [123, 156]

    Solution
    *s → ʃ / # _ [C]
    *s → z / # _ [V]
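The reasoning behind Example 1 can be sketched in a few lines: group the observed reflexes by the value of the conditioning tier and treat the problem as solved when every context yields exactly one reflex. This is only an illustration under that assumption. The function `infer_rules` is hypothetical, and the exact criterion behind the "Solve" column is not stated on the slide.

```python
from collections import defaultdict


def infer_rules(correspondences):
    """Group reflex counts by context; solved if each context has one reflex."""
    by_context = defaultdict(dict)
    for context, reflex, count in correspondences:
        by_context[context][reflex] = by_context[context].get(reflex, 0) + count
    solved = all(len(reflexes) == 1 for reflexes in by_context.values())
    return dict(by_context), solved


# (following CV class, German reflex, count) for initial Proto-Germanic *s,
# with the counts reported in Example 1.
example_1 = [("C", "ʃ", 78), ("V", "z", 13)]

contexts, solved = infer_rules(example_1)
for context, reflexes in contexts.items():
    for reflex in reflexes:
        print(f"*s → {reflex} / # _ [{context}]")
print("solved:", solved)
```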
  13. 15.

    Example 2

    Problem
    Explain the development of Proto-Germanic */p/ (PG=p) in German (G) in terms of the preceding (PG-D-L) or subsequent (PG-D-R) sound class in the “Dolgopolsky” model.

    ID  PG  G   PG-D-L  PG-D-R  Count  Cov   Solve  Examples
    1   *p  f   V       ∅       11     0.69  *      [297, 861]
    2   *p  f   R       ∅       5      0.31  *      [606, 794]
    3   *p  p   S       ∅       15     1     *      [75, 218]
    4   *p  pf  ∅       R       1      0.5          [2109]
    5   *p  pf  ∅       -       1      0.5          [602]

    Result (not solved)
    *p → f / [V,R] _
    *p → p / [S] _
    *p → pf / _ [R]
  14. 16.

    Example 3

    Problem
    Explain the development of Middle Chinese /d/ (MC=d) in Mandarin (M) in terms of the Middle Chinese tone (MT-T).

    ID  MC  M   MT-T  Count  Cov   Solve  Examples
    1   d   t   3     30     0.58         [110, 880]
    2   d   t   2     11     0.21         [1550, 1555]
    3   d   t   4     11     0.21         [4680, 4685]
    4   d   tʰ  1     45     0.96         [870, 875]
    5   d   tʰ  4     2      0.04         [6815, 7975]

    Result (not solved)
    d → t / [MT-T={2,3}]
    d → tʰ / [MT-T=1]
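The "Cov" column is not defined on the slide, but its values are consistent with each row's count divided by the total count for the same reflex. The sketch below works under that assumption and recomputes the column for Example 3. It also lists the tones that occur with more than one reflex, which is one way to see why the example is reported as not solved.

```python
from collections import Counter

# (Mandarin reflex, Middle Chinese tone, count), as listed in Example 3.
rows = [("t", 3, 30), ("t", 2, 11), ("t", 4, 11), ("tʰ", 1, 45), ("tʰ", 4, 2)]

# Total number of observations per reflex.
totals = Counter()
for reflex, _tone, count in rows:
    totals[reflex] += count

# Assumed definition: share of a reflex's occurrences found in each context.
coverage = {
    (reflex, tone): round(count / totals[reflex], 2) for reflex, tone, count in rows
}

# Tones that occur with more than one reflex cannot condition a regular rule.
reflexes_by_tone = {}
for reflex, tone, _count in rows:
    reflexes_by_tone.setdefault(tone, set()).add(reflex)
ambiguous = sorted(t for t, reflexes in reflexes_by_tone.items() if len(reflexes) > 1)

print(coverage)
print("ambiguous tones:", ambiguous)
```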
  15. 17.

    Inference without ancestors - 1

    Figure: A set of reflexes.
  16. 18.

    Inference without ancestors - 2

    Figure: Candidate A - Most likely solution without additional information.
  17. 19.

    Inference without ancestors - 3

    Figure: Candidate B - Most likely solution without penalties for polytomies.
  18. 20.

    Inference without ancestors - 4

    Figure: Candidate C - A solution with a single ancestor.
  19. 21.

    Inference without ancestors - 5

    Figure: Candidate D - A bad solution.
  20. 22.

    Cost and fitness

    Candidates are evaluated by two metrics:
    Cost of the tree topology
    Cost of the implied transitions

           /p/     /b/     /tʃ/    /ʃ/     /ə/
    /p/    0.55    0.21    0.21    0.03    ∼0.00
    /b/    0.31    0.61    0.08    0.08    ∼0.00
    /tʃ/   0.08    0.04    0.54    0.35    ∼0.00
    /ʃ/    0.09    0.09    0.33    0.48    ∼0.00
    /ə/    ∼0.00   ∼0.00   ∼0.00   0.17    0.83

    Table: A mock transition matrix for the previous examples.
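The slide does not say how the transition metric is computed. One simple possibility is to sum, over all parent-to-child edges of a candidate tree, the matrix entry for that transition. The sketch below does so. It also assumes that rows are source sounds and columns are target sounds, and it enters "∼0.00" as 0.0. The two candidate trees and the function `transition_fit` are invented for illustration.

```python
SOUNDS = ["p", "b", "tʃ", "ʃ", "ə"]

# Mock matrix from the slide; rows are assumed to be source sounds,
# columns target sounds, and "∼0.00" is entered as 0.0.
MATRIX = [
    [0.55, 0.21, 0.21, 0.03, 0.00],
    [0.31, 0.61, 0.08, 0.08, 0.00],
    [0.08, 0.04, 0.54, 0.35, 0.00],
    [0.09, 0.09, 0.33, 0.48, 0.00],
    [0.00, 0.00, 0.00, 0.17, 0.83],
]


def transition_fit(edges):
    """Sum the matrix entries of all parent → child transitions in a tree."""
    index = {sound: i for i, sound in enumerate(SOUNDS)}
    return sum(MATRIX[index[parent]][index[child]] for parent, child in edges)


# Invented candidates over the reflexes p, b, tʃ, ʃ, given as edge lists.
single_ancestor = [("p", "p"), ("p", "b"), ("p", "tʃ"), ("p", "ʃ")]
two_subgroups = [("p", "p"), ("p", "b"), ("p", "tʃ"), ("tʃ", "tʃ"), ("tʃ", "ʃ")]

print(round(transition_fit(single_ancestor), 2))
print(round(transition_fit(two_subgroups), 2))
```

A plain sum rewards trees with more edges, so a real scoring function would likely need normalization or a topology term to balance it. That may be the role of the separate tree-topology metric.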
  21. 23.

    Evaluation

    Evaluating the candidates above, with a penalty for polytomies:

    Tree         Transition fit.  Tree fit.  Fitness
    Candidate A  2.75             0.50       3.25
    Candidate B  2.20             0.33       2.53
    Candidate C  1.34             0.25       1.59
    Candidate D  0.07             0.25       0.32

    Table: Results of the mock evaluation of the tables above. The higher the fitness, the better.
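In every row of the table the fitness equals the transition fit plus the tree fit. The short sketch below takes that reading, which the slide does not state explicitly, recomputes the fitness column from the mock values, and ranks the candidates.

```python
# (transition fit, tree fit) per candidate, copied from the mock evaluation.
candidates = {
    "Candidate A": (2.75, 0.50),
    "Candidate B": (2.20, 0.33),
    "Candidate C": (1.34, 0.25),
    "Candidate D": (0.07, 0.25),
}

# Assumed combination rule: fitness is the sum of the two metrics.
fitness = {
    name: round(transition + tree, 2)
    for name, (transition, tree) in candidates.items()
}

# Higher fitness is better, so sort in descending order.
ranking = sorted(fitness, key=fitness.get, reverse=True)

for name in ranking:
    print(name, fitness[name])
```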
  22. 24.

    Implementation

    Inference from cognates is a case of metaheuristic optimization.
    In our first experiments, we built networks with sound changes as nodes and directed edges weighted by the cost of applying such rules, selecting potential paths from the k shortest paths (Yen 1971, Eppstein 1998).
    The current approach uses an informed evolutionary algorithm operating upon a prior fixed or relaxed tree.

    Figure: The 2006 NASA ST5 spacecraft antenna, designed with an evolutionary algorithm (Wikipedia).
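To make the idea of selecting from the k shortest paths concrete, here is a minimal sketch. It uses a plain best-first enumeration of loopless paths with a priority queue, which is simpler and less efficient than the algorithms of Yen (1971) or Eppstein (1998). For brevity the nodes of the toy graph are sounds rather than sound changes, and all costs are invented.

```python
import heapq
from itertools import count


def k_shortest_paths(graph, source, target, k):
    """Return up to k loopless paths from source to target, cheapest first."""
    tie = count()  # tie-breaker so the heap never has to compare paths
    queue = [(0.0, next(tie), [source])]
    found = []
    while queue and len(found) < k:
        cost, _, path = heapq.heappop(queue)
        node = path[-1]
        if node == target:
            found.append((cost, path))
            continue
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in path:  # keep paths loopless
                heapq.heappush(queue, (cost + weight, next(tie), path + [neighbor]))
    return found


# Invented costs between sounds; lower means a more plausible change.
graph = {
    "k": {"tʃ": 1.0, "c": 0.5, "ʃ": 4.0},
    "c": {"tʃ": 0.4},
    "tʃ": {"ʃ": 1.0},
}

paths = k_shortest_paths(graph, "k", "ʃ", k=3)
for cost, path in paths:
    print(round(cost, 2), " → ".join(path))
```

Because edge costs are non-negative, paths are popped in order of increasing cost, so the first k that reach the target are the k cheapest. The number of partial paths can grow exponentially, which fits the complexity concern raised earlier in the talk.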
  23. 25.

    Summary

    Multitiers are a work-in-progress method for sound change inference.
    A Python library integrated with lingpy is under development and will be published as beta this year.
    Hypotheses must be evaluated by human experts in a computer-assisted approach.
    Inference is not limited to “IPA” tiers, or to sound classes, and can be used for “prediction” or “construction”.
  24. 26.

    Wise words

    “What I cannot create, I do not understand.”
    “Know how to solve every problem that has been solved.”
    – Richard Feynman (1988?)
  25. 27.

    References I

    Blevins, J. Evolutionary phonology: The emergence of sound patterns. Cambridge University Press, 2004.
    Bybee, J. Articulatory processing and frequency of sound change. The Oxford handbook of historical phonology (2015), 467–484.
    Chacon, T., and List, J.-M. Improved computational models of sound change shed light on the history of the Tukanoan languages. Journal of Language Relationship 13, 3 (2015), 177–204.
    Damerau, F. Mechanization of cognate recognition in comparative linguistics. Linguistics 13, 148 (1975), 5–30.
  26. 28.

    References II

    Eppstein, D. Finding the k shortest paths. SIAM Journal on Computing 28, 2 (1998), 652–673.
    Garrett, A. Sound change. The Routledge handbook of historical linguistics (2015), 227–248.
    Gleason, H. A. Genetic relationship among languages. Structure of Language and its Mathematical Aspects (1961), 179–189.
    Guy, J. B. An algorithm for identifying cognates in bilingual word-lists and its applicability to machine translation. Journal of Quantitative Linguistics 1, 1 (1994), 35–42.
  27. 29.

    References III

    Hartman, L. Phono (version 4.0): Software for modeling regular historical sound change. In Actas: VIII Simposio Internacional de Comunicación Social, Santiago de Cuba, 20–24 de enero del 2003 (Santiago de Cuba, 2003), vol. 1, Centro de Lingüística Aplicada, Ministerio Ciencia, Santiago de Cuba, pp. 606–609.
    Hewson, J. Reconstructing prehistoric languages on the computer: The triumph of the electronic neogrammarian. In COLING 1973 Volume 1: Computational And Mathematical Linguistics: Proceedings of the International Conference on Computational Linguistics (1973), vol. 1.
  28. 30.

    References IV

    Hewson, J. A computer-generated dictionary of Proto-Algonquian. Canadian Museum of Civilization, Hull, Quebec, 1993.
    Hruschka, D. J., Branford, S., Smith, E. D., Wilkins, J., Meade, A., Pagel, M., and Bhattacharya, T. Detecting regular sound changes in linguistics as events of concerted evolution. Current Biology 25, 1 (2015), 1–9.
    Jäger, G. Computational historical linguistics. arXiv preprint arXiv:1805.08099 (2018).
    Karttunen, L., and Beesley, K. R. Two-level rule compiler. Xerox Corporation, Palo Alto Research Center, 1992.
  29. 31.

    References V

    Karttunen, L., Kaplan, R. M., and Zaenen, A. Two-level morphology with composition. In Proceedings of the 14th conference on Computational linguistics-Volume 1 (1992), Association for Computational Linguistics, pp. 141–148.
    Kondrak, G. Identification of cognates and recurrent sound correspondences in word lists. TAL 50, 2 (2009), 201–235.
    Kümmel, M. J. Konsonantenwandel: Bausteine zu einer Typologie des Lautwandels und ihre Konsequenzen für die vergleichende Rekonstruktion. Reichert, Wiesbaden, 2007.
  30. 32.

    References VI

    List, J.-M. Computer-Assisted Language Comparison: Reconciling computational and classical approaches in historical linguistics. Max Planck Institute for the Science of Human History, Jena, 2016.
    List, J.-M., Greenhill, S. J., and Gray, R. D. The potential of automatic word comparison for historical linguistics. PLoS ONE 12, 1 (2017), e0170046.
    Marsico, E., Flavier, S., Verkerk, A., and Moran, S. BDPROTO: A database of phonological inventories from ancient and reconstructed languages. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018) (2018).
  31. 33.

    References VII

    Rama, T., Borin, L., Mikros, G., and Macutek, J. Comparative evaluation of string similarity measures for automatic language classification. 2015.
    Ringe, D., Warnow, T., and Taylor, A. Indo-European and computational cladistics. Transactions of the Philological Society 100, 1 (2002), 59–129.
    Sims-Williams, P. Mechanising historical phonology. Transactions of the Philological Society (2018).
    Tresoldi, T., Anderson, C., and List, J.-M. Modelling sound change with the help of multi-tiered sequence representations. In Poznań Linguistic Meeting 2018 (Poznań, 2018), ?, p. ?
  32. 34.

    References VIII

    Yen, J. Y. Finding the k shortest loopless paths in a network. Management Science 17, 11 (1971), 712–716.