Modelling sound change with the help of multi-tiered sequence representations

Tiago Tresoldi
September 15, 2018



  1. 1.

    Modelling sound change with the help of multi-tiered sequence representations

    Tiago Tresoldi, Cormac Anderson, and Johann-Mattis List
    Max-Planck-Institut für Menschheitsgeschichte (MPI-SHH, Jena)
    Poznań, September 15th 2018
  2. 2.

    Issues in computational historical phonology

    • Computational historical linguistics has been transferring and adapting models and methods from evolutionary biology
      ◦ Increasing availability of large digital corpora of cross-linguistic data
      ◦ Phylogenetic turn
    • Despite the advances, we have been dealing mostly with lexical and cognacy characters: phonetic and phonological tasks are still generally performed in the traditional way, without the assistance computers could give
      ◦ One of the main reasons for this is that, while it might seem straightforward to compare sound sequences with genetic ones, there are striking differences
      ◦ The analogies between linguistic and biological basic sequences (i.e., sequences of sounds and sequences of genetic bases) break down when we consider the underlying alphabets and the assumptions involved
      ◦ This is unfortunate, as we possess large amounts of language data transcribed as sequences of sounds
  3. 3.

    Properties of alphabetic transcriptions

    • Unlike genetic bases, phonological character sets ("alphabets") are language-specific and vary in number and detail
      ◦ As we render in a discrete way what is continuous, there is always some level of information loss
    • Phonetic and phonological transcriptions are idealised representations, at various levels of abstraction, of multidimensional and continuous information
      ◦ Not necessarily captured by a single vector of information
      ◦ Many phonological domains are not commensurate with a segment
    • None of the proposed solutions for dealing with the difficulty of modelling such sound sequences has become standard, and none is suitable for the computational treatment of three of the main tasks of historical phonology:
      ◦ Word generation/evaluation
      ◦ Rule inference
      ◦ Output prediction
  4. 4.

    Segmental sequences: phonological issues

    Besides the uncertainty about how many discrete units to consider for a given system (the problem of non-uniqueness), phonologies have a number of non-segmental properties relevant for sound change, e.g.:
    • sound changes frequently act on natural classes, not individual segments
      ◦ while class-defining features such as place, manner, voice, etc. are included in IPA graphemes, they are conflated (e.g. /b/ as a feature bundle “bilabial”, “stop”, “voiced”, possibly also with implicit negative information such as “cannot be a syllable nucleus”)
    • stress, tone, etc. are non-segmental and operate over a domain much larger than that of a segment, frequently determining which segments can occur, e.g. in unstressed position
    • melodic features also frequently operate over domains larger than a segment, e.g. vowel harmony, distance effects, etc.
    • this is explicitly recognised in, among others, Firthian phonology (prosodies)
  5. 5.

    Segmental sequences: other issues

    The idealisation of alphabetic transcription is also insufficient for machine representation in other respects, as it leaves no room for:
    • word frequency
    • register
    • part of speech
    • donor language and period of borrowing in the case of loanwords
    • contrasting information, such as cases where authors diverge on a related proto-form
    • combined information from different word forms, perhaps also from cognates, that might aid us in the identification of changes
  6. 6.

    Our proposal: tiers

    • We propose to use extensive annotation to deal with these issues: “tiers”
      ◦ these must be parallel, multilayered, and conceptually linked
      ◦ linguistically, this can begin with the annotation of alphabetic transcriptions with (the many) distinctive feature systems that researchers have been using for decades, thus also recreating natural classes, but need not be limited to this
      ◦ computationally, the layers are analogous to solutions used in stochastic methods such as Layered Markov Models
    • In our proposal, a potentially large number of "tiers" can each be expressed in relation to a given sequence (i.e., word)
      ◦ while the most obvious tiers are distinctive features, tiers for suprasegmental and extra-lexical information can accommodate all kinds of information, including the relationship between two or more words
      ◦ there is no need to discuss "context" in terms of subsequences, as each aligned position can hold all the necessary information (and algorithms can be used to identify which tiers are informative and which are redundant)
  7. 7.

    Tiers as annotation

    • Our proposal is inspired by linguistic annotation in general. Similar to the linguistic annotation of corpora, which provides an “added value” (Milà Garcia 2018: 271), our annotation framework, which represents one sound sequence as a supra-sequence consisting of multiple annotation layers, adds value to pure alphabetic transcriptions in order to overcome their well-known disadvantages. While these disadvantages can be easily handled in manual approaches, computational approaches require the annotations to be explicit. This is what our framework makes possible.
    • Our proposal has predecessors in historical linguistics. In Hoenigswald (1990) in particular, we find an annotation of accented versus unaccented initial stops in Germanic that is very similar to our idea of using complex annotations to increase the expressiveness of classical transcription.
  8. 8.

    An initial example: "cat"

    Tier name                        
    Grapheme                         c    a    t
    Phoneme                          k    æ    t
    Position                         1    2    3
    CV                               C    V    C
    Voicing                          0    1    0
    Sound class                      K    A    T
    Preceding sound class (SC -1)    ∅    K    A
    Following sound class (SC +1)    A    T    ∅
    ...                              ...  ...  ...
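The "cat" example can be sketched as a small data structure. This is only an illustration of the idea, not the library under development: the function names `build_tiers` and `column`, the tier labels, and the rule that the sound class "A" marks a vowel are all assumptions made for this sketch.

```python
# Hypothetical sketch of a multi-tiered representation for one word.
# Names and tier labels are illustrative assumptions, not an existing API.

VOWEL_CLASSES = {"A"}  # assumption: only sound class "A" counts as a vowel here
GAP = "∅"              # filler for the missing neighbour at a word edge


def build_tiers(graphemes, phonemes, sound_classes, voicing):
    """Return a dict mapping each tier name to a list aligned by position."""
    lengths = {len(graphemes), len(phonemes), len(sound_classes), len(voicing)}
    if len(lengths) != 1:
        raise ValueError("all tiers must be aligned (same length)")
    n = len(phonemes)
    return {
        "grapheme": list(graphemes),
        "phoneme": list(phonemes),
        "position": list(range(1, n + 1)),
        "cv": ["V" if sc in VOWEL_CLASSES else "C" for sc in sound_classes],
        "voicing": list(voicing),
        "sound_class": list(sound_classes),
        # Context tiers: each position carries its neighbours' sound classes.
        "sc-1": [GAP] + list(sound_classes[:-1]),
        "sc+1": list(sound_classes[1:]) + [GAP],
    }


def column(tiers, index):
    """Collect the values of all tiers at one aligned position (0-based)."""
    return {name: values[index] for name, values in tiers.items()}


cat = build_tiers(
    graphemes=["c", "a", "t"],
    phonemes=["k", "æ", "t"],
    sound_classes=["K", "A", "T"],
    voicing=[0, 1, 0],
)
```

With this layout, `column(cat, 1)` gathers everything known about the second position, including its left and right context, so no subsequence needs to be inspected separately.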
  9. 9.

    Correspondences

    The tiers are not limited to phonological information such as distinctive features. They can be used to encode other types of information, such as:
    • Grammatical properties (for example, when modelling processes that only apply to a given part of speech)
    • Statistical properties (such as word frequency, when modelling processes that might only apply under a certain threshold)
    • Historical and social properties (such as the value of a given tier in a cognate in another language or in a dialectal variety)
    • Linguistic disagreements (for dealing with and evaluating instances in which different authors give different accounts)
  10. 10.

    Task 1: word generation/evaluation

    • The generation of random words (e.g., in psycholinguistic experiments) or the evaluation of the naturalness of a random word (i.e., its statistical likelihood given a set of other observed languages) is usually carried out either by generative patterns or by Markov models
      ◦ Generative patterns tend to be repetitive, favouring the most frequent value of each aspect (syllable structure, sound distribution, etc.), with difficulties in modelling even basic phonotactics
      ◦ Markov models have a short attention span: they cannot use too large an n-gram window without overfitting, and they may fail for more complex patterns such as vowel harmony
    • Multitiers can potentially be used as a human-interpretable alternative to RNNs (Recurrent Neural Networks)
      ◦ computer-assisted, not computer-performed, studies
    • While word generation/evaluation primarily concerns the synchronic level, it is also necessary to evaluate the plausibility of language states, not just language processes
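The limitation of a short n-gram window can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the invented CVCV vocabulary with front/back vowel harmony, the add-one smoothing, and the function names. It is not the authors' model. A segment bigram model conditions only on the previous segment, while the second model conditions on a "previous vowel" tier.

```python
import math
from collections import Counter
from itertools import product

CONSONANTS = "tk"
VOWELS = "eiou"
HARMONIC_PAIRS = [("e", "i"), ("i", "e"), ("o", "u"), ("u", "o")]
ALPHABET_SIZE = len(CONSONANTS) + len(VOWELS)

# Invented training data: 16 CVCV words whose vowels agree in backness.
TRAINING = [c1 + v1 + c2 + v2
            for c1, c2 in product(CONSONANTS, repeat=2)
            for v1, v2 in HARMONIC_PAIRS]


def by_previous_segment(word):
    """Yield (context, segment) pairs as seen by a plain bigram model."""
    prev = "#"
    for seg in word:
        yield prev, seg
        prev = seg


def by_previous_vowel(word):
    """Yield (context, segment) pairs where the context is a vowel tier."""
    last_vowel = "#"
    for seg in word:
        yield last_vowel, seg
        if seg in VOWELS:
            last_vowel = seg


def train(words, contexts):
    """Count (context, segment) pairs and context totals."""
    pair_counts, ctx_counts = Counter(), Counter()
    for word in words:
        for ctx, seg in contexts(word):
            pair_counts[ctx, seg] += 1
            ctx_counts[ctx] += 1
    return pair_counts, ctx_counts


def log_likelihood(word, model, contexts, alpha=1.0):
    """Add-alpha smoothed log-likelihood of a word under a trained model."""
    pair_counts, ctx_counts = model
    total = 0.0
    for ctx, seg in contexts(word):
        p = (pair_counts[ctx, seg] + alpha) / (ctx_counts[ctx] + alpha * ALPHABET_SIZE)
        total += math.log(p)
    return total


bigram = train(TRAINING, by_previous_segment)
vowel_tier = train(TRAINING, by_previous_vowel)

# "teki" respects the harmony of the toy data, "teku" violates it.
bigram_gap = (log_likelihood("teki", bigram, by_previous_segment)
              - log_likelihood("teku", bigram, by_previous_segment))
tier_gap = (log_likelihood("teki", vowel_tier, by_previous_vowel)
            - log_likelihood("teku", vowel_tier, by_previous_vowel))
```

In this toy setting the bigram model cannot tell the harmonic word from the disharmonic one, because the two vowels are never adjacent, whereas the tier-conditioned model prefers the harmonic word. A real application would combine several tiers rather than replace one context with another.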
  11. 11.

    Task 2: Rule inference

    We do not know of any attempts to automatise or formalise the inference of sound change rules (sound laws) that account for the development of ancestral words into their descendant words when given only a set of ancestral forms (usually reconstructed) aligned with their descendant forms (usually in an attested language). So far, this task is done almost exclusively by hand, by experts.

    The particular problems of rule inference are manifold, and we do not need to list them all. We emphasise, however, that it would be highly desirable for historical linguistics to provide a formalised approach, since this would not only allow us to test different approaches against each other, but also to evaluate potential approaches.

    We ran experiments with Germanic and Chinese data, which are here briefly presented. They are intended to illustrate what multitiers are and what they can potentially do.
  12. 12.

    From Middle Chinese to Mandarin

    We ran similar experiments to investigate the tone development from Middle Chinese (MC) to Mandarin, using MC data from Newman and Raman (1999), kindly provided by the authors.

    The development from Middle Chinese to Mandarin shows a peculiar change in the voiced plosives (notably MC *b and *d), which have devoiced reflexes (p and t) as well as devoiced and aspirated reflexes (pʰ and tʰ) in Mandarin Chinese. We know from previous research that the reason for this lies in the Middle Chinese tones: tone 1 in MC triggers aspiration, while tones 2-4 show only devoicing.

    With our multi-tier approach, we can easily test this on Newman and Raman's dataset. In order to do so, we add a tone tier to the Middle Chinese words in the data starting with *b and *d and check the reflexes in Mandarin.
  13. 13.

    From Middle Chinese to Mandarin

    MCH    MCH-TONE    Mandarin    Frequency
    d      1           tʰ          45
    d      4           tʰ          2
    d      2           t           11
    d      3           t           30
    d      4           t           11
    b      1           pʰ          31
    b      3           pʰ          1
    b      4           pʰ          1
    b      2           p           15
    b      3           p           16
    b      4           p           4
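The check described above can be sketched by grouping the frequencies in the table by the tone tier. The counts are copied from the table; the function names and the simple "ends in ʰ" test for aspiration are assumptions made for this illustration.

```python
from collections import defaultdict

# (MC initial, MC tone, Mandarin reflex, frequency), copied from the table.
ROWS = [
    ("d", 1, "tʰ", 45), ("d", 4, "tʰ", 2), ("d", 2, "t", 11),
    ("d", 3, "t", 30), ("d", 4, "t", 11),
    ("b", 1, "pʰ", 31), ("b", 3, "pʰ", 1), ("b", 4, "pʰ", 1),
    ("b", 2, "p", 15), ("b", 3, "p", 16), ("b", 4, "p", 4),
]


def is_aspirated(reflex):
    """Assumption: aspiration is marked by a final superscript h."""
    return reflex.endswith("ʰ")


def aspiration_by_tone(rows):
    """Map each tone to (aspirated tokens, all tokens)."""
    aspirated = defaultdict(int)
    total = defaultdict(int)
    for _initial, tone, reflex, freq in rows:
        total[tone] += freq
        if is_aspirated(reflex):
            aspirated[tone] += freq
    return {tone: (aspirated[tone], total[tone]) for tone in sorted(total)}


def exceptions(rows, aspirating_tone=1):
    """Rows that contradict 'the aspirating tone aspirates, the others do not'."""
    return [row for row in rows
            if is_aspirated(row[2]) != (row[1] == aspirating_tone)]


summary = aspiration_by_tone(ROWS)
odd_rows = exceptions(ROWS)
```

On these counts, all 76 tokens with tone 1 are aspirated, while only 4 of the 91 tokens with tones 2-4 are, and `odd_rows` lists the three table rows that hold those 4 tokens.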
  14. 14.

    From Middle Chinese to Mandarin

    The pattern we describe here is by no means new or unknown to historical linguists, although it is rarely mentioned in the literature. We can find the pattern quickly with multi-tiered sequence representations, provided that we test for the correlation between tone and the voicing and devoicing patterns from Middle Chinese to Mandarin.

    Although this example only shows what is already known, our approach illustrates that we can use multi-tiers for quick tests on data that have not yet been analysed in this way (e.g., testing devoicing and tone development in other Southeast Asian language families). We could also search for exceptions in datasets, as we have done in this example.
  15. 15.

    Task 3: Word prediction

    Multi-tiers can also be used for word prediction, as the process is essentially the inverse of the rule inference of Task 2:
    • given a set of rules which manipulate a sequence, in cases where the reflex of a proto-form is missing we could automatically generate the expected reflex from the rules inferred from other words, checking for cases where the reflex was subject to a major semantic shift
      ◦ Historical linguists have been doing this for centuries, but multitiers would allow us to partially automate the task, providing computer assistance to researchers
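As a minimal sketch of this inverse step, rules can be read off as the most frequent reflex for each combination of tier values and then applied to an unattested form. The counts reuse the Middle Chinese table; the names `infer_rules` and `predict`, and the restriction to the initial consonant, are assumptions for illustration only.

```python
from collections import Counter, defaultdict

# (MC initial, MC tone, Mandarin reflex, frequency), from the Middle Chinese table.
OBSERVED = [
    ("d", 1, "tʰ", 45), ("d", 4, "tʰ", 2), ("d", 2, "t", 11),
    ("d", 3, "t", 30), ("d", 4, "t", 11),
    ("b", 1, "pʰ", 31), ("b", 3, "pʰ", 1), ("b", 4, "pʰ", 1),
    ("b", 2, "p", 15), ("b", 3, "p", 16), ("b", 4, "p", 4),
]


def infer_rules(rows):
    """Map each (initial, tone) context to its most frequent reflex."""
    counts = defaultdict(Counter)
    for initial, tone, reflex, freq in rows:
        counts[initial, tone][reflex] += freq
    return {ctx: reflexes.most_common(1)[0][0]
            for ctx, reflexes in counts.items()}


def predict(rules, initial, tone):
    """Return the expected reflex, or None if no rule covers the context."""
    return rules.get((initial, tone))


rules = infer_rules(OBSERVED)
```

Here `predict(rules, "d", 1)` returns the aspirated reflex and `predict(rules, "b", 4)` the plain one, while a context never seen in the data yields `None`, which would flag the form for manual inspection.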
  16. 16.

    Future work

    • We are developing the multitiers as an independent library for the Python programming language, with the intention of merging it with LingPy in the future.
    • The framework opens the possibility of annotating and testing different theories of phonological representation.
  17. 17.

    References

    Blevins, J. 2004. Evolutionary Phonology: The Emergence of Sound Patterns. Cambridge University Press.
    Bouchard-Côté, Alexandre, David Hall, Thomas L. Griffiths, and Dan Klein. 2013. “Automated Reconstruction of Ancient Languages Using Probabilistic Models of Sound Change.” Proceedings of the National Academy of Sciences of the United States of America 110 (11): 4224–9.
    Chomsky, Noam, and Morris Halle. 1968. The Sound Pattern of English. New York; Evanston; London: Harper & Row.
    Durbin, Richard, Sean R. Eddy, Anders Krogh, and Graeme Mitchison. (1998) 2002. Biological Sequence Analysis. Probabilistic Models of Proteins and Nucleic Acids. 7th ed. Cambridge: Cambridge University Press.
    Firth, John Rupert. 1948. “Sounds and Prosodies.” Transactions of the Philological Society 47 (1): 127–52.
    Harris, Zellig Sabbettai. 1963. Structural Linguistics. Chicago University Press.
    Hartman, Lee. 2003. “Phono (Version 4.0): Software for Modeling Regular Historical Sound Change.” Santiago de Cuba.
    Jakobson, Roman, C. Gunnar Fant, and Morris Halle. 1951. “Preliminaries to Speech Analysis: The Distinctive Features and Their Correlates.”
    Kluge and Seebold. 2002. Etymologisches Wörterbuch der deutschen Sprache. De Gruyter.
    Ladefoged, P., and I. Maddieson. 1996. The Sounds of the World’s Languages. Phonological Theory. Wiley.
    List, Johann-Mattis. 2014. Sequence Comparison in Historical Linguistics. Düsseldorf: Düsseldorf University Press.
    List, Johann-Mattis, and Thiago Chacon. 2015. “Towards a Cross-Linguistic Database for Historical Phonology? A Proposal for a Machine Readable Modeling of Phonetic Context.” Leiden.
    Mielke, Jeff. 2008. The Emergence of Distinctive Features. Oxford: Oxford University Press.
    Newman, J., and Anand V. Raman. 1999. Historical Chinese Phonology: A Compendium of Beijing and Cantonese Pronunciations of Characters and their Derivations from Middle Chinese. Newcastle and München: Lincom.
    Wheeler, W. C., and Peter M. Whiteley. 2015. “Historical Linguistics as a Sequence Optimization Problem: The Evolution and Biogeography of Uto-Aztecan Languages.” Cladistics 31 (2): 113–25.