Modelling sound change with
the help of multi-tiered sequence representations
Tiago Tresoldi, Cormac Anderson, and Johann-Mattis List
Max-Planck-Institut für Menschheitsgeschichte (MPI-SHH, Jena)
Poznań, September 15th 2018
Issues in computational historical phonology
● Computational historical linguistics has been transferring and adapting models
and methods from evolutionary biology
○ Increasing availability of large digital corpora of cross-linguistic data
○ Phylogenetic turn
● Despite the advances, we have been dealing mostly with lexical and cognacy
characters: phonetic and phonological tasks are still generally performed in the
traditional way, without the assistance computers could give
○ One of the main reasons for this is that, while it might seem straightforward to compare sound
sequences with genetic ones, there are striking differences
○ The analogies between linguistic and biological basic sequences (i.e., sequences of sounds and
sequences of genetic bases) break down when we consider the underlying alphabets and the
○ This is unfortunate, as we possess lots of language data transcribed in this manner
Properties of alphabetic transcriptions
● Unlike genetic bases, phonological character sets ("alphabets") are
language-specific and vary in number and detail
○ As we render in a discrete way what is continuous, there is always some level of information loss
● Phonetic and phonological transcriptions are idealised representations of various
levels of abstraction of multidimensional and continuous information
○ Not necessarily captured by a single vector of information
○ Many phonological domains are not commensurate to a segment
● None of the proposed solutions for dealing with the difficulty of modelling such
sound sequences has become standard, and none is suitable for the
computational treatment of three of the main tasks of historical phonology:
○ Word generation/evaluation
○ Rule inference
○ Output prediction
Segmental sequences: phonological issues
Besides the uncertainty in terms of how many discrete units to consider for a given
system (the problem of non-uniqueness), phonologies have a number of
non-segmental properties relevant for sound change, e.g.:
● sound changes frequently act on natural classes, not individual segments
○ while class-defining features such as localisation, manner, voice, etc. are included in IPA graphemes,
they are conflated (e.g. /b/ as a feature bundle “bilabial”, “stop”, “voiced”, possibly also implicit
negative information “cannot be a syllable nucleus”)
● stress, tone, etc. are non-segmental and operate over a domain much larger than
that of a segment, frequently determining what segments can occur, e.g. in
● melodic features also frequently operate over domains larger than a
segment, e.g. vowel harmony, distance effects, etc.
● this is explicitly recognised in, among others, Firthian phonology (prosodies)
Segmental sequences: other issues
The idealisation inherent in alphabetic transcription also falls short for machine
representation when other kinds of information matter:
● word frequency
● part of speech
● donor language and period of borrowing in the case of loanwords
● contrasting information, such as cases where authors diverge in terms of a related
● combined information from different word forms, perhaps also from cognates,
that might aid us in the identification of changes
Our proposal: tiers
● We propose to use extensive annotation to deal with these issues: “tiers”
○ these must be parallel, multilayered, and conceptually linked
○ linguistically, this can begin by involving annotation of alphabetic transcriptions with (the many)
distinctive feature systems that researchers have been using for decades, thus also recreating
natural classes, but need not be limited to this
○ computationally, the layers are analogous to solutions used in stochastic methods such as Layered
● In our proposal, a potentially large number of "tiers" can be expressed in their
relationship to a given sequence (i.e., word)
○ while the most obvious tiers are distinctive features, suprasegmental information and extra lexical
information can accommodate all kinds of information, including the relationship between two or
○ there is no need to discuss "context" in terms of subsequences, as each aligned position can hold all
the necessary information (and algorithms can be used to identify which tiers are informative and
which are redundant)
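One possible way to flag a redundant tier can be sketched in Python. This is only an illustration under our own assumptions: the function name `determines` and the toy tiers are hypothetical and not part of any existing library. A tier is treated as redundant with respect to another tier if its value is always predictable from the other tier's value at the same aligned position.

```python
from collections import defaultdict


def determines(source, target):
    """Return True if each value on `source` co-occurs with exactly one value on `target`."""
    seen = defaultdict(set)
    for s, t in zip(source, target):
        seen[s].add(t)
    return all(len(values) == 1 for values in seen.values())


# Toy aligned tiers (six positions, invented for illustration only).
phoneme = ["k", "æ", "t", "t", "æ", "k"]
cv = ["C", "V", "C", "C", "V", "C"]

# The CV tier adds nothing once the phoneme tier is known ...
cv_is_redundant = determines(phoneme, cv)
# ... but the phoneme tier cannot be recovered from the CV tier.
phoneme_is_redundant = determines(cv, phoneme)
```

On real data a strict test like this would probably need to be relaxed, for instance by counting co-occurrences and tolerating a small share of exceptions.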
Tiers as annotation
● Our proposal is inspired by linguistic annotation in general. Just as linguistic
annotation of corpora provides an “added value” (Milà Garcia 2018: 271),
our annotation framework, which represents one sound sequence as a
supra-sequence consisting of multiple annotation layers, adds value to pure
alphabetic transcriptions in order to overcome their well-known disadvantages.
While these disadvantages can be easily handled in manual approaches, for
computational approaches it is indispensable that the annotations are explicit.
This is what our framework makes possible.
● Our proposal has predecessors in historical linguistics, and especially in
Hoenigswald (1990) we can find an annotation of accented versus unaccented
initial stops in Germanic that is very similar to our idea of using complex
annotations to increase the expressiveness of classical transcription.
An initial example: "cat"
Grapheme                        c    a    t
Phoneme                         k    æ    t
Position                        1    2    3
CV                              C    V    C
Voicing                         0    1    0
Sound class                     K    A    T
Preceding sound class (SC -1)   ∅    K    A
Following sound class (SC +1)   A    T    ∅
...                             ...  ...  ...
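The table can be reproduced with a few lines of Python. This is a minimal sketch, not the interface of our library: the names `shift` and `build_tiers` and the dictionary keys are hypothetical, chosen for illustration. Each tier is a list with one value per aligned position, and the context tiers are simply shifted copies of the sound-class tier.

```python
def shift(tier, offset, pad="∅"):
    """Return a tier whose position i holds the value found at i + offset (or `pad`)."""
    n = len(tier)
    return [tier[i + offset] if 0 <= i + offset < n else pad for i in range(n)]


def build_tiers(graphemes, phonemes, cv, voiced, sound_classes):
    """Bundle parallel tiers for one word; all inputs must have the same length."""
    tiers = {
        "grapheme": list(graphemes),
        "phoneme": list(phonemes),
        "position": list(range(1, len(phonemes) + 1)),
        "cv": list(cv),
        "voiced": list(voiced),
        "sound_class": list(sound_classes),
    }
    # Context is stored as extra tiers, so no subsequence matching is needed later.
    tiers["sc_prev"] = shift(tiers["sound_class"], -1)
    tiers["sc_next"] = shift(tiers["sound_class"], +1)
    return tiers


cat = build_tiers("cat", ["k", "æ", "t"], "CVC", [0, 1, 0], "KAT")

# Column view: everything known about each aligned position.
columns = [{name: values[i] for name, values in cat.items()} for i in range(3)]
```

Storing context as ordinary tiers means that a rule or a statistical test only ever has to look at a single column.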
The tiers are not limited to phonological information such as distinctive features. They
can be used to encode other types of information, such as:
● Grammatical properties (for example, when modeling processes that only apply to
a given part-of-speech)
● Statistical properties (such as word frequency, when modeling processes that
might only apply under a certain threshold)
● Historical and social properties (such as the value of a given tier in a cognate in
another language or in a dialectal variety)
● Linguistic disagreements (for dealing with and evaluating instances in which
different authors give different accounts)
Task 1: word generation/evaluation
● The generation of random words (e.g., in psycholinguistic experiments) or the
evaluation of the naturalness of a random word (i.e., its statistical likelihood given
a set of other observed languages) is usually carried out either by generative
patterns or by Markov models
○ Generative patterns tend to be repetitive, favoring the most frequent value of each aspect (syllable
structure, sound distribution, etc.), with difficulties in modelling even basic phonotactics
○ Markov models have a short attention span and cannot use too large an n-gram window without
overfitting; they may also fail on more complex patterns such as vowel harmony
● Multitiers can potentially be used as a human-interpretable alternative to RNNs (Recurrent
Neural Networks)
○ computer-assisted, not computer-performed, studies
● While word generation/evaluation primarily regards the synchronic level, it is also
necessary to evaluate the plausibility of language states, not just language
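The vowel harmony problem can be illustrated with a small Python sketch. It makes several assumptions of our own: the word list is invented (a toy language with back vowels a, o, u and front vowels e, i), the helper names are hypothetical, and a simple attested/unattested bigram check stands in for a proper probabilistic Markov model. The point is only that a dependency which is non-local on the segment sequence becomes local on a dedicated vowel tier.

```python
VOWELS = set("aeiou")

# Invented toy lexicon: every word keeps to one vowel set (a/o/u or e/i).
TRAIN = ["kalan", "tolka", "pumat", "kelin", "tilke", "penit"]


def bigrams(seq):
    """Adjacent pairs of a sequence."""
    return list(zip(seq, seq[1:]))


def vowel_tier(word):
    """Project a word onto its vowels only."""
    return [s for s in word if s in VOWELS]


def attested(sequences):
    """Set of all bigrams observed in the training sequences."""
    return {bg for seq in sequences for bg in bigrams(seq)}


def all_attested(seq, inventory):
    """True if every bigram of `seq` was observed in training."""
    return all(bg in inventory for bg in bigrams(seq))


segment_inventory = attested(TRAIN)
vowel_inventory = attested(vowel_tier(w) for w in TRAIN)

# "kalin" mixes a back and a front vowel, yet every segment bigram is attested.
segment_ok = all_attested("kalin", segment_inventory)
vowel_ok = all_attested(vowel_tier("kalin"), vowel_inventory)
```

In this toy setting the segment-level check accepts the disharmonic form, while the same check on the vowel tier rejects it.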
Task 2: Rule inference
We do not know of any attempts to automatise or formalise the inference of sound
change rules (sound laws) that account for the development of ancestral words into their
descendant words when given only a set of ancestral forms (usually reconstructed)
aligned with their descendant forms (usually in an attested language). So far, this task has
been carried out almost exclusively by hand, by experts.
The particular problems of rule inference are manifold, and we do not need to list them
all. We emphasise, however, that it would be highly desirable for historical linguistics to
provide a formalised approach, since this would not only allow us to test different
approaches against each other, but also to evaluate potential approaches.
We ran experiments with Germanic and Chinese data, which are here briefly presented.
They are intended to illustrate what multitiers are and what they can potentially do.
From Middle Chinese to Mandarin
We ran similar experiments to investigate the tone development from Middle Chinese
to Mandarin, using MC data from Newman and Raman (1999), kindly provided by the
authors. The development from Middle Chinese to Mandarin has a peculiar change in
the voiced plosives (notably MC *b and *d), which have devoiced reflexes (p and t)
as well as devoiced and aspirated reflexes (pʰ and tʰ) in Mandarin.
We know from previous research that the reason for this lies in the Middle Chinese
tones (tone 1 in MC triggers aspiration, while the other tones 2-4 show only devoicing).
With our multi-tier approach, we can easily test this on Newman and Raman's dataset. In order to
do so, we add a tone tier to the Middle Chinese words in the data starting with *b and
*d and check the reflexes in Mandarin.
From Middle Chinese to Mandarin
MCH   MCH-TONE   Mandarin   Frequency
d     1          tʰ         45
d     4          tʰ         2
d     2          t          11
d     3          t          30
d     4          t          11
b     1          pʰ         31
b     3          pʰ         1
b     4          pʰ         1
b     2          p          15
b     3          p          16
b     4          p          4
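The counts in this table are all that is needed to read off a candidate rule and its exceptions. The Python sketch below is an illustration only; the function names are hypothetical and the rows are copied from the table. For each context (MC initial plus tone) it takes the most frequent reflex as the rule and lists every other reflex as an exception.

```python
from collections import defaultdict

# (MC initial, MC tone, Mandarin reflex, frequency), copied from the table.
ROWS = [
    ("d", 1, "tʰ", 45), ("d", 4, "tʰ", 2), ("d", 2, "t", 11),
    ("d", 3, "t", 30), ("d", 4, "t", 11),
    ("b", 1, "pʰ", 31), ("b", 3, "pʰ", 1), ("b", 4, "pʰ", 1),
    ("b", 2, "p", 15), ("b", 3, "p", 16), ("b", 4, "p", 4),
]


def reflexes_by_context(rows):
    """Group reflex counts by the (initial, tone) context."""
    table = defaultdict(dict)
    for initial, tone, reflex, freq in rows:
        counts = table[(initial, tone)]
        counts[reflex] = counts.get(reflex, 0) + freq
    return dict(table)


def majority_and_exceptions(table):
    """Return the majority reflex per context and the minority reflexes, if any."""
    rules, exceptions = {}, {}
    for context, counts in table.items():
        best = max(counts, key=counts.get)
        rules[context] = best
        minority = {r: n for r, n in counts.items() if r != best}
        if minority:
            exceptions[context] = minority
    return rules, exceptions


rules, exceptions = majority_and_exceptions(reflexes_by_context(ROWS))
```

With these rows, tone 1 yields the aspirated reflexes and tones 2 to 4 the plain ones, while four words (two with *d in tone 4, one each with *b in tones 3 and 4) surface as exceptions that merit a closer look.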
From Middle Chinese to Mandarin
The pattern we describe here is by no means new or unknown to historical linguists,
although it is rarely mentioned in the literature.
We can find the pattern quickly with multi-tiered sequence representations, provided that
we test for the correlation between tone and the voicing and devoicing patterns from
Middle Chinese to Mandarin.
Although this example only shows what is already known, our approach with
multi-tiered sequence representations illustrates that multi-tiers can be used
for quick tests on data that has not yet been analysed in this way (e.g.,
testing devoicing and tone development in other SEA language families). We could also
search for exceptions in datasets, as we have done in this example.
Task 3: Word prediction
Multi-tiers can also be used for word prediction, as the process is essentially the inverse
of the rule inference of Task 2:
● given a set of rules which manipulates a sequence, in cases where the reflex of a
proto-form is missing we could automatically generate the expected reflex from
the rules inferred from other words, checking for cases where the reflex was
subject to a major semantic shift
○ Historical linguists have been doing this for centuries, but multitiers would allow us to partially
automate the task, providing computer assistance to researchers
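A minimal Python sketch of such a prediction step follows. It rests on assumptions that we state explicitly: the rule table holds the majority outcomes from the Middle Chinese table in this talk, the function name `predict` is hypothetical, the input forms are toy sequences rather than actual reconstructions, and every segment not covered by a rule is simply copied, which no realistic model would do.

```python
# (MC initial, MC tone) -> Mandarin reflex; majority outcomes from the MC table.
RULES = {
    ("b", 1): "pʰ", ("b", 2): "p", ("b", 3): "p", ("b", 4): "p",
    ("d", 1): "tʰ", ("d", 2): "t", ("d", 3): "t", ("d", 4): "t",
}


def predict(segments, tone, rules):
    """Predict a reflex by looking up each position's context in the rule table.

    Only the initial position is rule-governed in this sketch; all other
    segments are copied unchanged (a simplification made for illustration).
    """
    out = []
    for position, segment in enumerate(segments, start=1):
        if position == 1 and (segment, tone) in rules:
            out.append(rules[(segment, tone)])
        else:
            out.append(segment)
    return out


# Toy inputs, not actual Middle Chinese reconstructions.
predicted_tone1 = predict(["d", "a", "n"], 1, RULES)
predicted_tone2 = predict(["b", "a", "d"], 2, RULES)
```

Comparing such predicted forms with attested ones is one way to find candidates for missing reflexes or for words that have undergone a semantic shift.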
We are developing the multitiers as an independent library for the Python programming
language, with the intention of merging it with LingPy in the future.
Possibility of annotating and testing different theories of phonological representation
Blevins, J. 2004. Evolutionary Phonology: The Emergence of Sound Patterns. Cambridge University Press.
Bouchard-Côté, Alexandre, David Hall, Thomas L. Griffiths, and Dan Klein. 2013. “Automated Reconstruction of Ancient Languages Using
Probabilistic Models of Sound Change.” Proceedings of the National Academy of Sciences of the United States of America 110 (11): 4224–9.
Chomsky, Noam, and Morris Halle. 1968. The Sound Pattern of English. New York; Evanston; London: Harper & Row.
Durbin, Richard, Sean R. Eddy, Anders Krogh, and Graeme Mitchison. (1998) 2002. Biological Sequence Analysis: Probabilistic Models of
Proteins and Nucleic Acids. 7th ed. Cambridge: Cambridge University Press.
Firth, John Rupert. 1948. “Sounds and Prosodies.” Transactions of the Philological Society 47 (1). Wiley Online Library: 127–52.
Harris, Zellig Sabbettai. 1963. “Structural Linguistics.” Chicago University Press.
Hartman, Lee. 2003. “Phono (Version 4.0): Software for Modeling Regular Historical Sound Change.” Santiago de Cuba.
Jakobson, Roman, C Gunnar Fant, and Morris Halle. 1951. “Preliminaries to Speech Analysis: The Distinctive Features and Their Correlates.”
Kluge, Friedrich, and Elmar Seebold. 2002. Etymologisches Wörterbuch der deutschen Sprache. De Gruyter.
Ladefoged, P., and I. Maddieson. 1996. The Sounds of the World’s Languages. Phonological Theory. Wiley.
List, Johann-Mattis. 2014. Sequence Comparison in Historical Linguistics. Düsseldorf: Düsseldorf University Press.
List, Johann-Mattis, and Thiago Chacon. 2015. “Towards a Cross-Linguistic Database for Historical Phonology? A Proposal for a Machine
Readable Modeling of Phonetic Context.” Leiden.
Mielke, Jeff, 2008. The Emergence of distinctive features. OUP Oxford.
Newman, J., and Anand V. Raman. 1999. Historical Chinese Phonology: A Compendium of Beijing and Cantonese Pronunciations of Characters
and their Derivations from Middle Chinese. Newcastle and München: Lincom.
Wheeler, W. C., and Peter M. Whiteley. 2015. “Historical Linguistics as a Sequence Optimization Problem: The Evolution and Biogeography of
Uto-Aztecan Languages.” Cladistics 31 (2): 113–25.