Planck Institute for the Science of Human History (MPI-SHH, Jena) Computer-Assisted Language Comparison (CALC) Project July 4th, 2019 Tresoldi, T. (MPI-SHH) Sound change inference July 4th, 2019 1 / 35
like sequence alignment, cognate detection, and tree inference. There is no generic tool for sound change inference. The problem is related to, but different from, the inference of ancestral states (Jäger 2018, List 2019), as a sound change is computationally a state machine. There have been attempts at applying and evaluating sound changes in forward reconstruction (e.g., Hartmann 2003) and backward reconstruction (e.g., Hewson 1973 and Kondrak 2009), as well as phylogenetic analyses that use the presence or absence of sound changes as characters. Difficulties arise from suprasegmental changes (e.g., nasalization), ancestral states not attested in reflexes (e.g., PIE laryngeals), and conditioning information missing in reflexes (e.g., Verner’s law).
size, but exponential growth when chains of sound changes are involved. Formalized prior probabilities are lacking: some changes are known to be common, but no catalog exists, nor a matrix of transition probabilities. Little supporting research is available, such as Blevins (2004), Kümmel (2007), and Bybee (2015), besides automated approaches such as Hruschka et al. (2015). Figure: Visualization of the transition matrix from Hruschka et al. (2015).
predecessors (“proto-forms”) and one or more successor (“reflexes”) states (“sounds”), infer the relations (“sound changes”) that best explain the observed correspondences. Ex.: A proto-language has */k/ and its descendants have either /k/ or /tʃ/; what explains that?

2. Sound change inference without proto-forms

Given states (“sounds”) in related items (“cognates”), infer the ancestral state (“proto-sound”) and the relations (“sound changes”) that best explain the correspondences according to typological and evolutionary assumptions. Ex.: Language A has /tʃ/ corresponding to /k/ in Language B; what explains that?
“multitiers” (parallel layers of information, such as features and classes), expanding on Chacon and List (2015) and Tresoldi et al. (2018). From sets of tiers provided as hypotheses, redundant information is pruned and only layers that result in information gain are kept.
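The pruning idea can be illustrated with a minimal sketch. It assumes that "information gain" means the reduction in Shannon entropy of the reflex once a tier is known; the toy data, the tier names, and the zero threshold are hypothetical and are not taken from the authors' implementation. Redundancy between tiers is not handled here.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(reflexes, tier):
    """Entropy of the reflexes minus their entropy conditioned on the tier."""
    groups = {}
    for reflex, value in zip(reflexes, tier):
        groups.setdefault(value, []).append(reflex)
    conditional = sum(
        len(group) / len(reflexes) * entropy(group) for group in groups.values()
    )
    return entropy(reflexes) - conditional


# Invented alignment sites: one reflex per site plus two candidate tiers.
reflexes = ["ʃ", "ʃ", "z", "z", "ʃ", "z"]
tiers = {
    "next_cv": ["C", "C", "V", "V", "C", "V"],  # class of the following sound
    "index": [1, 1, 1, 1, 1, 1],                # constant, hence uninformative
}

# Keep only the tiers that reduce uncertainty about the reflex.
kept = {}
for name, tier in tiers.items():
    gain = information_gain(reflexes, tier)
    if gain > 0:
        kept[name] = gain
```

In this toy case the constant `index` tier is dropped, while `next_cv` is kept because it fully predicts the reflex.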
and Sranan sequence sounds, sound classes, and following vowel status, plus a positional Index tier.
*/s/ (PG=s) in German (G) in terms of the subsequent sound class in the “CV” model (PG-CV-R).

ID  Index  PG  G  PG-CV-R  Count  Cov  Solve  Examples
1   1      *s  ʃ  C        78     1.0  *      [19, 60]
2   1      *s  z  V        13     1.0  *      [123, 156]

Solution:
*s → ʃ / # [C]
*s → z / # [V]
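Because a sound change behaves like a small state machine, the two rules of the solution can be applied forward to segmented forms. The sketch below is illustrative only: it reads the reflex as the postalveolar fricative ʃ, reads "#" as a word boundary before the *s (consistent with Index = 1), and uses a toy vowel inventory and hypothetical helper names.

```python
VOWELS = set("aeiouāēīōū")  # assumption: toy vowel inventory for the "CV" model


def cv_class(sound):
    """Map a segment to its class in a two-way consonant/vowel model."""
    return "V" if sound in VOWELS else "C"


def apply_initial_s_rules(segments):
    """Apply *s → ʃ / # [C] and *s → z / # [V] to a list of segments."""
    if len(segments) >= 2 and segments[0] == "s":
        reflex = "ʃ" if cv_class(segments[1]) == "C" else "z"
        return [reflex] + list(segments[1:])
    # No word-initial *s followed by another segment: nothing to change.
    return list(segments)


# Invented segment strings, used only to exercise both rules.
before_consonant = apply_initial_s_rules(["s", "t", "a", "i", "n"])
before_vowel = apply_initial_s_rules(["s", "u", "n"])
not_initial = apply_initial_s_rules(["h", "u", "s"])
```

Only word-initial *s is rewritten; the third form is returned unchanged because its *s is not in first position.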
in German (G) in terms of the preceding (PG-D-L) or subsequent (PG-D-R) sound class in the “Dolgopolsky” model.

ID  PG  G   PG-D-L  PG-D-R  Count  Cov   Solve  Examples
1   *p  f   V       ∅       11     0.69  *      [297, 861]
2   *p  f   R       ∅       5      0.31  *      [606, 794]
3   *p  p   S       ∅       15     1     *      [75, 218]
4   *p  pf  ∅       R       1      0.5          [2109]
5   *p  pf  ∅       -       1      0.5          [602]

Result (not solved):
*p → f / [V,R]
*p → p / [S]
*p → pf / [R]
in Mandarin (M) in terms of the Middle Chinese tone (MT-T).

ID  MC  M   MT-T  Count  Cov   Solve  Examples
1   d   t   3     30     0.58         [110, 880]
2   d   t   2     11     0.21         [1550, 1555]
3   d   t   4     11     0.21         [4680, 4685]
4   d   tʰ  1     45     0.96         [870, 875]
5   d   tʰ  4     2      0.04         [6815, 7975]

Result (not solved):
d → t / [MT-T={2,3}]
d → tʰ / [MT-T=1]
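The Cov column can be recomputed from the counts. The sketch below assumes that coverage is the share of a reflex's occurrences found in a given context, which reproduces the figures in the table. It also assumes that a context is unresolved when it maps to more than one reflex; this is one possible reading of "not solved", not a definition given in the source. Function names are hypothetical.

```python
from collections import Counter, defaultdict

# (Mandarin reflex, Middle Chinese tone, count), copied from the table above.
rows = [("t", 3, 30), ("t", 2, 11), ("t", 4, 11), ("tʰ", 1, 45), ("tʰ", 4, 2)]


def coverage(rows):
    """Share of each reflex's occurrences that falls in each context."""
    totals = Counter()
    for reflex, _, count in rows:
        totals[reflex] += count
    return [
        (reflex, context, round(count / totals[reflex], 2))
        for reflex, context, count in rows
    ]


def ambiguous_contexts(rows):
    """Contexts that co-occur with more than one reflex."""
    reflexes_by_context = defaultdict(set)
    for reflex, context, _ in rows:
        reflexes_by_context[context].add(reflex)
    return {c: r for c, r in reflexes_by_context.items() if len(r) > 1}


cov = coverage(rows)
unresolved = ambiguous_contexts(rows)
```

Under these assumptions, tone 4 is the only context attested with both reflexes, which matches its absence from the rules listed in the result.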
Tree         Transition fit.  Tree fit.  Fitness
Candidate A  2.75             0.50       3.25
Candidate B  2.20             0.33       2.53
Candidate C  1.34             0.25       1.59
Candidate D  0.07             0.25       0.32

Table: Results of the mock evaluation of the tables above. The higher the fitness, the better.
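In the table, each fitness value equals the sum of the two partial scores. The sketch below recomputes and ranks the candidates on that basis; plain unweighted addition is inferred from the figures rather than stated in the source, and the variable names are hypothetical.

```python
# (transition fit, tree fit) per candidate tree, copied from the table above.
scores = {
    "Candidate A": (2.75, 0.50),
    "Candidate B": (2.20, 0.33),
    "Candidate C": (1.34, 0.25),
    "Candidate D": (0.07, 0.25),
}

# Assumption: fitness is the unweighted sum of both partial scores.
fitness = {
    name: round(transition + tree, 2)
    for name, (transition, tree) in scores.items()
}

# Higher fitness is better, so sort in descending order.
ranking = sorted(fitness, key=fitness.get, reverse=True)
```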
In our first experiments, we built networks with sound changes as nodes and directed edges weighted by the cost of applying such rules, selecting potential paths from the k shortest paths (Yen 1971, Eppstein 1998). The current approach uses an informed evolutionary algorithm operating upon a prior fixed or relaxed tree. Figure: The 2007 NASA ST5 spacecraft antenna, designed with an evolutionary algorithm (Wikipedia).
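The path selection step of the first experiments can be sketched as follows. This is not Yen's or Eppstein's algorithm but a simpler best-first enumeration of loop-free paths, which returns the same k cheapest paths on small graphs with non-negative costs. As a simplification of the network described above, nodes here are sounds and weighted edges are changes; the graph and all costs are invented for illustration.

```python
import heapq


def k_cheapest_paths(graph, source, target, k):
    """Return up to k lowest-cost loop-free paths as (cost, path) tuples."""
    queue = [(0, [source])]  # partial paths ordered by accumulated cost
    found = []
    while queue and len(found) < k:
        cost, path = heapq.heappop(queue)
        node = path[-1]
        if node == target:
            found.append((cost, path))
            continue
        for neighbour, weight in graph.get(node, {}).items():
            if neighbour not in path:  # keep paths loop-free
                heapq.heappush(queue, (cost + weight, path + [neighbour]))
    return found


# Hypothetical costs for routes from /k/ to /tʃ/.
graph = {
    "k": {"tʃ": 5, "c": 1, "kʲ": 1},
    "kʲ": {"c": 2, "tʃ": 2},
    "c": {"tʃ": 1},
}

best_two = k_cheapest_paths(graph, "k", "tʃ", 2)
```

With these invented costs, the two cheapest routes pass through an intermediate sound rather than using the direct but expensive edge.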
A Python library integrated with lingpy is under development and will be published as beta this year. Hypotheses must be evaluated by human experts in a computer-assisted approach. Inference is not limited to “IPA” tiers, or to sound classes, and can be used for “prediction” or “construction”.
patterns. Cambridge University Press, 2004.
Bybee, J. Articulatory processing and frequency of sound change. The Oxford Handbook of Historical Phonology (2015), 467–484.
Chacon, T., and List, J.-M. Improved computational models of sound change shed light on the history of the Tukanoan languages. Journal of Language Relationship 13, 3 (2015), 177–204.
Damerau, F. Mechanization of cognate recognition in comparative linguistics. Linguistics 13, 148 (1975), 5–30.
Journal on Computing 28, 2 (1998), 652–673.
Garrett, A. Sound change. The Routledge Handbook of Historical Linguistics (2015), 227–248.
Gleason, H. A. Genetic relationship among languages. Structure of Language and its Mathematical Aspects (1961), 179–189.
Guy, J. B. An algorithm for identifying cognates in bilingual word-lists and its applicability to machine translation. Journal of Quantitative Linguistics 1, 1 (1994), 35–42.
regular historical sound change. In Actas: VIII Simposio Internacional de Comunicación Social, Santiago de Cuba, 20–24 de enero del 2003 (Santiago de Cuba, 2003), vol. 1, Centro de Lingüística Aplicada, Ministerio Ciencia, Santiago de Cuba, pp. 606–609.
Hewson, J. Reconstructing prehistoric languages on the computer: The triumph of the electronic neogrammarian. In COLING 1973 Volume 1: Computational And Mathematical Linguistics: Proceedings of the International Conference on Computational Linguistics (1973), vol. 1.
Museum of Civilization, Hull, Quebec, 1993.
Hruschka, D. J., Branford, S., Smith, E. D., Wilkins, J., Meade, A., Pagel, M., and Bhattacharya, T. Detecting regular sound changes in linguistics as events of concerted evolution. Current Biology 25, 1 (2015), 1–9.
Jäger, G. Computational historical linguistics. arXiv preprint arXiv:1805.08099 (2018).
Karttunen, L., and Beesley, K. R. Two-level rule compiler. Xerox Corporation, Palo Alto Research Center, 1992.
Two-level morphology with composition. In Proceedings of the 14th conference on Computational linguistics-Volume 1 (1992), Association for Computational Linguistics, pp. 141–148.
Kondrak, G. Identification of cognates and recurrent sound correspondences in word lists. TAL 50, 2 (2009), 201–235.
Kümmel, M. J. Konsonantenwandel: Bausteine zu einer Typologie des Lautwandels und ihre Konsequenzen für die vergleichende Rekonstruktion. Reichert, Wiesbaden, 2007.
classical approaches in historical linguistics. Max Planck Institute for the Science of Human History, Jena, 2016.
List, J.-M., Greenhill, S. J., and Gray, R. D. The potential of automatic word comparison for historical linguistics. PLoS ONE 12, 1 (2017), e0170046.
Marsico, E., Flavier, S., Verkerk, A., and Moran, S. BDPROTO: A database of phonological inventories from ancient and reconstructed languages. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018) (2018).
J. Comparative evaluation of string similarity measures for automatic language classification. 2015.
Ringe, D., Warnow, T., and Taylor, A. Indo-European and computational cladistics. Transactions of the Philological Society 100, 1 (2002), 59–129.
Sims-Williams, P. Mechanising historical phonology. Transactions of the Philological Society (2018).
Tresoldi, T., Anderson, C., and List, J.-M. Modelling sound change with the help of multi-tiered sequence representations. In Poznań Linguistic Meeting 2018 (Poznań, 2018), ?, p. ?