Automatic Inference of Sound Changes from Cognates

Talk given at ICHL24 in Canberra (Australia)

Tiago Tresoldi

July 04, 2019

Transcript

  1. Automatic Inference of Sound Changes from Cognates
    Tiago Tresoldi
    Max Planck Institute for the Science of Human History (MPI-SHH, Jena)
    Computer-Assisted Language Comparison (CALC) Project
    July 4th, 2019
    Tresoldi, T. (MPI-SHH) Sound change inference July 4th, 2019 1 / 35


  2. Contents
    1 Introduction
    2 Method
    3 Inference with proto-forms
    4 Inference without proto-forms
    5 Results

  3. Background
    Automation of the comparative method has focused on tasks such as
    sequence alignment, cognate detection, and tree inference.
    There is no generic tool for sound change inference.
    The problem is related to, but different from, the inference of
    ancestral states (Jäger 2018, List 2019), as a sound change is
    computationally a state machine.
    There are attempts at applying and evaluating sound changes in
    forward (e.g., Hartman 2003) and backward reconstruction (e.g.,
    Hewson 1973 and Kondrak 2009), as well as doing phylogenetic
    analyses with the presence/absence of sound changes as characters.
    Difficulties arise from suprasegmental changes (e.g., nasalization),
    ancestor states not attested in reflexes (e.g., PIE laryngeals), and
    conditioning information missing in reflexes (e.g., Verner’s law).

  4. Sound change inference
    Computationally complex problem.
    Linear growth with inventory size, but exponential growth when
    involving chains of sound changes.
    Lack of formalized prior probabilities.
    Some changes are known to be common, but no catalog exists, nor a
    matrix of transition probabilities.
    Little supporting research, such as Blevins (2004), Kümmel (2007),
    and Bybee (2015), besides automated approaches such as Hruschka et
    al. (2015).
    Figure: Visualization of transition matrix from Hruschka et al. (2015).

  5. Two related problems
    1. Sound change inference with proto-forms
    Given predecessor states (“proto-forms”) and one or more successor
    states (“reflexes”), where the states are sounds, infer the relations
    (“sound changes”) that best explain the observed correspondences.
    Ex.: A proto-language has */k/ and its descendants have either /k/
    or /tʃ/; what explains that?
    2. Sound change inference without proto-forms
    Given states (“sounds”) in related items (“cognates”), infer the ancestral
    state (“proto-sound”) and the relations (“sound changes”) that best
    explain the correspondences according to typological and evolutionary
    assumptions.
    Ex.: Language A has /tʃ/ corresponding to /k/ in Language B; what
    explains that?

  6. Method
    We are developing a method to model sequences with “multitiers”
    (parallel layers of information, such as features and classes),
    expanding on Chacon and List (2015) and Tresoldi et al. (2018).
    From sets of tiers provided as hypotheses, redundant information is
    pruned and only layers that result in information gain are kept.
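
    One way to read the pruning step is as a comparison of entropies: a tier is worth keeping if knowing its value reduces the uncertainty about the reflex. The sketch below is a minimal illustration under that assumption. The function names, the toy data, and the "keep if gain > 0" threshold are all invented here and are not the method's actual implementation.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(target, tier):
    """Reduction in entropy of `target` once the values of `tier` are known."""
    total = len(target)
    groups = {}
    for t, value in zip(target, tier):
        groups.setdefault(value, []).append(t)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(target) - conditional


# Toy data (invented for illustration): reflexes of one proto-sound,
# a tier that predicts the reflex, and a tier that carries no information.
reflex = ["ʃ", "ʃ", "ʃ", "z", "z", "ʃ"]
next_class = ["C", "C", "C", "V", "V", "C"]  # informative
constant = ["x", "x", "x", "x", "x", "x"]    # redundant

candidates = {"next_class": next_class, "constant": constant}
kept = [name for name, tier in candidates.items()
        if information_gain(reflex, tier) > 0]
print(kept)
```

    Here the constant tier is dropped because it leaves the entropy of the reflexes unchanged, while the sound-class tier accounts for all of it.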

  7. Tiers - 1
    Figure: Two aligned sound sequences.

  8. Edictor
    Figure: Screenshot of Edictor (http://edictor.digling.org/), from List
    (2018).

  9. Tiers - 2
    Figure: Multitier system with English, Dutch, Swedish, and Sranan sequence
    sounds, plus a positional Index tier.

  10. Tiers - 3
    Figure: Multitier system with English, Dutch, Swedish, and Sranan sequence
    sounds and sound classes, plus a positional Index tier.

  11. Tiers - 4
    Figure: Multitier system with English, Dutch, Swedish, and Sranan sequence
    sounds, sound classes, and following vowel status, plus a positional Index tier.
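
    As a rough illustration of how such tiers could be held in memory, the sketch below derives an index tier, a CV sound-class tier, and a following-vowel tier from a single sound sequence. The function `build_tiers`, the tier names, the vowel set, and the toy sequence are all assumptions made for this example. They are not taken from the figures or from the library under development, and the figures show one such set of tiers per language rather than a single sequence.

```python
VOWELS = set("aeiouəɪʊɛɔæ")  # small illustrative vowel set, not exhaustive


def build_tiers(segments):
    """Derive parallel tiers from one sequence of sound segments."""
    classes = ["V" if s in VOWELS else "C" for s in segments]
    # For each position: is the following segment a vowel? ("-" at the end)
    next_is_vowel = [
        "-" if i + 1 == len(segments) else str(classes[i + 1] == "V")
        for i in range(len(segments))
    ]
    return {
        "index": list(range(1, len(segments) + 1)),
        "sound": list(segments),
        "class": classes,
        "next_vowel": next_is_vowel,
    }


# Hypothetical toy sequence, used only to show the shape of the structure.
tiers = build_tiers(["s", "t", "o", "n"])
for name, values in tiers.items():
    print(f"{name:<10}", *values)
```

    Every tier has the same length as the sound sequence, so a position can be read across all tiers at once.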

  12. Wordlist

  13. Tiers - 5
    Figure: Relational multitier system with sounds and word indexes for English,
    German, and Dutch, plus a positional Index tier.

  14. Example 1
    Problem
    Explain the development of initial (index=1) Proto-Germanic */s/ (PG=s)
    in German (G) in terms of the subsequent sound class in the “CV” model
    (PG-CV-R).
    ID  Index  PG  G  PG-CV-R  Count  Cov  Solve  Examples
    1   1      *s  ʃ  C        78     1.0  *      [19, 60]
    2   1      *s  z  V        13     1.0  *      [123, 156]
    Solution
    *s → ʃ / # [C]
    *s → z / # [V]

  15. Example 2
    Problem
    Explain the development of Proto-Germanic */p/ (PG=p) in German (G) in
    terms of the preceding (PG-D-L) or subsequent (PG-D-R) sound class in
    the “Dolgopolsky” model.
    ID  PG  G   PG-D-L  PG-D-R  Count  Cov   Solve  Examples
    1   *p  f   V       ∅       11     0.69  *      [297, 861]
    2   *p  f   R       ∅       5      0.31  *      [606, 794]
    3   *p  p   S       ∅       15     1     *      [75, 218]
    4   *p  pf  ∅       R       1      0.5          [2109]
    5   *p  pf  ∅       -       1      0.5          [602]
    Result (not solved)
    *p → f / [V,R]
    *p → p / [S]
    *p → pf / [R]
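
    The Cov column is not defined on the slide, but its values are consistent with each row's count divided by the total count for the same reflex. Under that assumption, the column can be recomputed from the counts as follows. The variable names are illustrative only.

```python
from collections import defaultdict

# (reflex, left class, right class, count), transcribed from the table above;
# None stands in for the empty-set symbol.
rows = [
    ("f", "V", None, 11),
    ("f", "R", None, 5),
    ("p", "S", None, 15),
    ("pf", None, "R", 1),
    ("pf", None, "-", 1),
]

# Total number of examples per reflex.
totals = defaultdict(int)
for reflex, _left, _right, count in rows:
    totals[reflex] += count

# Share of each reflex's examples that falls in the row's context.
coverage = [round(count / totals[reflex], 2) for reflex, _l, _r, count in rows]
print(coverage)  # [0.69, 0.31, 1.0, 0.5, 0.5]
```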

  16. Example 3
    Problem
    Explain the development of Middle Chinese /d/ (MC=d) in Mandarin (M) in
    terms of the Middle Chinese tone (MT-T).
    ID  MC  M   MT-T  Count  Cov   Solve  Examples
    1   d   t   3     30     0.58         [110, 880]
    2   d   t   2     11     0.21         [1550, 1555]
    3   d   t   4     11     0.21         [4680, 4685]
    4   d   tʰ  1     45     0.96         [870, 875]
    5   d   tʰ  4     2      0.04         [6815, 7975]
    Result (not solved)
    d → t / [MT-T={2,3}]
    d → tʰ / [MT-T=1]
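
    One plausible reading of why this case is left unsolved is that tone 4 occurs with both reflexes. The sketch below groups the rows by tone and separates contexts that predict a single reflex from contexts with competing reflexes. It is an illustration under that assumption, with invented variable names, and not the method's own solver; the aspirated reflex is written tʰ.

```python
from collections import defaultdict

# (Mandarin reflex, Middle Chinese tone, count), transcribed from the table above.
rows = [("t", 3, 30), ("t", 2, 11), ("t", 4, 11), ("tʰ", 1, 45), ("tʰ", 4, 2)]

reflexes_by_tone = defaultdict(dict)
for reflex, tone, count in rows:
    reflexes_by_tone[tone][reflex] = count

rules = defaultdict(list)  # reflex -> tones that predict it unambiguously
conflicts = {}             # tone -> competing reflexes with their counts
for tone, reflexes in sorted(reflexes_by_tone.items()):
    if len(reflexes) == 1:
        (reflex,) = reflexes
        rules[reflex].append(tone)
    else:
        conflicts[tone] = reflexes

for reflex, tones in rules.items():
    print(f"d → {reflex} / [MT-T={{{','.join(map(str, tones))}}}]")
print("unresolved:", conflicts)
```

    The unambiguous tones reproduce the two rules in the result, and tone 4 is reported separately with the counts of both reflexes.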

  17. Inference without ancestors - 1
    Figure: A set of reflexes.

  18. Inference without ancestors - 2
    Figure: Candidate A - Most likely solution without additional information.

  19. Inference without ancestors - 3
    Figure: Candidate B - Most likely solution without penalties for polytomies.

  20. Inference without ancestors - 4
    Figure: Candidate C - A solution with a single ancestor.

  21. Inference without ancestors - 5
    Figure: Candidate D - A bad solution.

  22. Cost and fitness
    Candidates are evaluated by two metrics:
    Cost of the tree topology
    Cost of the implied transitions
            /p/     /b/     /tʃ/    /ʃ/     /ə/
    /p/     0.55    0.21    0.21    0.03    ~0.00
    /b/     0.31    0.61    0.08    0.08    ~0.00
    /tʃ/    0.08    0.04    0.54    0.35    ~0.00
    /ʃ/     0.09    0.09    0.33    0.48    ~0.00
    /ə/     ~0.00   ~0.00   ~0.00   0.17    0.83
    Table: A mock transition matrix for the previous examples.
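
    The slides do not state how the cost of the implied transitions is computed from such a matrix. As a minimal sketch, assume a candidate with a single ancestor is scored by the product of the transition probabilities from that ancestor to each reflex. The scoring function, the pair of reflexes, and the treatment of "~0.00" as 0.0 are assumptions made for this example.

```python
from math import prod

# Mock transition probabilities transcribed from the table above
# (rows: source sound, columns: target sound); "~0.00" is entered as 0.0.
SOUNDS = ["p", "b", "tʃ", "ʃ", "ə"]
MATRIX = [
    [0.55, 0.21, 0.21, 0.03, 0.00],
    [0.31, 0.61, 0.08, 0.08, 0.00],
    [0.08, 0.04, 0.54, 0.35, 0.00],
    [0.09, 0.09, 0.33, 0.48, 0.00],
    [0.00, 0.00, 0.00, 0.17, 0.83],
]
P = {src: dict(zip(SOUNDS, row)) for src, row in zip(SOUNDS, MATRIX)}


def single_ancestor_score(ancestor, reflexes):
    """Product of the transition probabilities from one ancestor to each reflex."""
    return prod(P[ancestor][r] for r in reflexes)


reflexes = ["tʃ", "ʃ"]  # hypothetical pair of reflexes
ranked = sorted(SOUNDS, key=lambda a: single_ancestor_score(a, reflexes),
                reverse=True)
for a in ranked:
    print(a, round(single_ancestor_score(a, reflexes), 4))
```

    With these mock values, /tʃ/ ranks first as the ancestor of the pair, followed by /ʃ/; a real evaluation would also have to score intermediate nodes and the tree topology.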

  23. Evaluation
    Evaluating the candidates above, with a penalty for polytomies:
    Tree         Transition fit  Tree fit  Fitness
    Candidate A  2.75            0.50      3.25
    Candidate B  2.20            0.33      2.53
    Candidate C  1.34            0.25      1.59
    Candidate D  0.07            0.25      0.32
    Table: Results of the mock evaluation of the tables above. The higher the fitness,
    the better.
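
    The fitness values in the table are consistent with a plain sum of the transition fit and the tree fit. Assuming that reading, the ranking of the mock candidates can be reproduced as below. The variable names are illustrative, and the actual method may weight the two components differently.

```python
# (transition fit, tree fit) per candidate, transcribed from the table above.
scores = {
    "Candidate A": (2.75, 0.50),
    "Candidate B": (2.20, 0.33),
    "Candidate C": (1.34, 0.25),
    "Candidate D": (0.07, 0.25),
}

# Assumed combination: fitness = transition fit + tree fit.
fitness = {name: round(transition + tree, 2)
           for name, (transition, tree) in scores.items()}
best = max(fitness, key=fitness.get)

print(fitness)
print("best:", best)
```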

  24. Implementation
    Inference from cognates is a case of metaheuristic optimization.
    In first experiments, we built networks with sound changes as nodes
    and directed edges weighted by the cost of applying such rules,
    selecting potential paths from the k shortest paths (Yen 1971,
    Eppstein 1998).
    The current approach uses an informed evolutionary algorithm
    operating upon a prior fixed or relaxed tree.
    Figure: The 2007 NASA ST5
    spacecraft antenna, designed with an
    evolutionary algorithm (Wikipedia).
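
    The path selection used in the first experiments can be pictured with a small sketch. For simplicity, the graph below uses sounds as nodes and changes as weighted edges, which differs from the network described on the slide. The search is a plain best-first enumeration of loopless paths, not Yen's or Eppstein's algorithm. The graph, its costs, and the function name are hypothetical.

```python
import heapq


def k_shortest_paths(graph, source, target, k):
    """Return up to k cheapest loopless paths as (cost, path) pairs.

    Best-first enumeration of simple paths; with non-negative edge costs,
    paths reach the target in order of increasing cost.
    """
    found = []
    queue = [(0.0, [source])]
    while queue and len(found) < k:
        cost, path = heapq.heappop(queue)
        node = path[-1]
        if node == target:
            found.append((cost, path))
            continue
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in path:  # keep paths loopless
                heapq.heappush(queue, (cost + weight, path + [neighbor]))
    return found


# Hypothetical costs for chaining changes from k to tʃ.
graph = {
    "k": {"c": 1.0, "tʃ": 3.0, "g": 2.0},
    "c": {"tʃ": 1.0},
    "g": {"dʒ": 1.0},
    "dʒ": {"tʃ": 2.0},
}
paths = k_shortest_paths(graph, "k", "tʃ", k=2)
for cost, path in paths:
    print(cost, " > ".join(path))
```

    This enumeration is exponential in the worst case, which is one reason dedicated algorithms such as those cited are preferred for larger networks.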

  25. Summary
    Multitiers are a work-in-progress method for sound change inference.
    A Python library integrated with lingpy is under development and will
    be published as beta this year.
    Hypotheses must be evaluated by human experts in a
    computer-assisted approach.
    Inference is not limited to “IPA” tiers, or to sound classes, and can be
    used for “prediction” or “construction”.

  26. Wise words
    “What I cannot create, I do not understand.”
    “Know how to solve every problem that has been solved.”
    – Richard Feynman (1988?)

  27. References I
    Blevins, J.
    Evolutionary phonology: The emergence of sound patterns.
    Cambridge University Press, 2004.
    Bybee, J.
    Articulatory processing and frequency of sound change.
    The Oxford handbook of historical phonology (2015), 467–484.
    Chacon, T., and List, J.-M.
    Improved computational models of sound change shed light on the
    history of the Tukanoan languages.
    Journal of Language Relationship 13.3 (2015), 177–204.
    Damerau, F.
    Mechanization of cognate recognition in comparative linguistics.
    Linguistics 13, 148 (1975), 5–30.

  28. References II
    Eppstein, D.
    Finding the k shortest paths.
    SIAM Journal on Computing 28, 2 (1998), 652–673.
    Garrett, A.
    Sound change.
    The Routledge handbook of historical linguistics (2015), 227–248.
    Gleason, H. A.
    Genetic relationship among languages.
    Structure of Language and its Mathematical Aspects (1961), 179–189.
    Guy, J. B.
    An algorithm for identifying cognates in bilingual word-lists and its
    applicability to machine translation.
    Journal of Quantitative Linguistics 1, 1 (1994), 35–42.

  29. References III
    Hartman, L.
    Phono (version 4.0): Software for modeling regular historical sound
    change.
    In Actas: VIII Simposio Internacional de Comunicación Social,
    Santiago de Cuba, 20–24 de enero del 2003 (Santiago de Cuba,
    2003), vol. 1, Centro de Lingüística Aplicada, Ministerio Ciencia,
    Santiago de Cuba, pp. 606–609.
    Hewson, J.
    Reconstructing prehistoric languages on the computer: The triumph of
    the electronic neogrammarian.
    In COLING 1973 Volume 1: Computational And Mathematical
    Linguistics: Proceedings of the International Conference on
    Computational Linguistics (1973), vol. 1.

  30. References IV
    Hewson, J.
    A computer-generated dictionary of Proto-Algonquian.
    Canadian Museum of Civilization, Hull, Quebec, 1993.
    Hruschka, D. J., Branford, S., Smith, E. D., Wilkins, J.,
    Meade, A., Pagel, M., and Bhattacharya, T.
    Detecting regular sound changes in linguistics as events of concerted
    evolution.
    Current Biology 25, 1 (2015), 1–9.
    Jäger, G.
    Computational historical linguistics.
    arXiv preprint arXiv:1805.08099 (2018).
    Karttunen, L., and Beesley, K. R.
    Two-level rule compiler.
    Xerox Corporation. Palo Alto Research Center, 1992.

  31. References V
    Karttunen, L., Kaplan, R. M., and Zaenen, A.
    Two-level morphology with composition.
    In Proceedings of the 14th conference on Computational
    linguistics-Volume 1 (1992), Association for Computational
    Linguistics, pp. 141–148.
    Kondrak, G.
    Identification of cognates and recurrent sound correspondences in
    word lists.
    TAL 50, 2 (2009), 201–235.
    Kümmel, M. J.
    Konsonantenwandel: Bausteine zu einer Typologie des Lautwandels
    und ihre Konsequenzen für die vergleichende Rekonstruktion.
    Reichert, Wiesbaden, 2007.

  32. References VI
    List, J.-M.
    Computer-Assisted Language Comparison: Reconciling computational
    and classical approaches in historical linguistics.
    Max Planck Institute for the Science of Human History, Jena, 2016.
    List, J.-M., Greenhill, S. J., and Gray, R. D.
    The potential of automatic word comparison for historical linguistics.
    PLoS ONE 12, 1 (2017), e0170046.
    Marsico, E., Flavier, S., Verkerk, A., and Moran, S.
    BDPROTO: A database of phonological inventories from ancient and
    reconstructed languages.
    In Proceedings of the Eleventh International Conference on Language
    Resources and Evaluation (LREC-2018) (2018).

  33. References VII
    Rama, T., Borin, L., Mikros, G., and Macutek, J.
    Comparative evaluation of string similarity measures for automatic
    language classification, 2015.
    Ringe, D., Warnow, T., and Taylor, A.
    Indo-European and computational cladistics.
    Transactions of the Philological Society 100, 1 (2002), 59–129.
    Sims-Williams, P.
    Mechanising historical phonology.
    Transactions of the Philological Society (2018).
    Tresoldi, T., Anderson, C., and List, J.-M.
    Modelling sound change with the help of multi-tiered sequence
    representations.
    In Poznań Linguistic Meeting 2018 (Poznań, 2018), ?, p. ?

  34. References VIII
    Yen, J. Y.
    Finding the k shortest loopless paths in a network.
    Management Science 17, 11 (1971), 712–716.
