Handling Phonological and Etymological Relations in Computer-Based and Computer-Assisted Frameworks

Theoretical Aspects

Johann-Mattis List
DFG Research Fellow
Centre de Recherches Linguistiques sur l’Asie Orientale (EHESS)
Team “Adaptation, Integration, Reticulation, Evolution” (UPMC)
Paris, 2015-02-23

LANGUAGE MODELING 2.0

State of the Art

State of the Art: New Models

Gain-Loss Models (states: 1, 0) ===> Multistate Models (states: p, p͡f, f, v, h, ø)
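The contrast between the two model families can be made concrete with a small encoding sketch. It is only an illustration: the language names and their observed sounds are hypothetical, while the state inventory (p, p͡f, f, v, h, ø) is the one shown on the slide. A multistate model keeps a single character with six possible states, whereas a gain-loss model has to split the same information into six presence/absence characters.

```python
# State inventory taken from the slide; "ø" stands for a lost sound.
STATES = ["p", "p͡f", "f", "v", "h", "ø"]

# Hypothetical observations: one sound per (invented) language.
observed = {"Lang_A": "p", "Lang_B": "p͡f", "Lang_C": "f", "Lang_D": "ø"}


def multistate_encoding(observations, states):
    """One character per language: the index of the observed state."""
    return {lang: states.index(sound) for lang, sound in observations.items()}


def gain_loss_encoding(observations, states):
    """One binary (1/0) character per state and language."""
    return {
        lang: [1 if sound == state else 0 for state in states]
        for lang, sound in observations.items()
    }


multistate = multistate_encoding(observed, STATES)
binary = gain_loss_encoding(observed, STATES)

for lang in observed:
    print(lang, multistate[lang], binary[lang])
```

In the binary encoding the six columns are not independent (exactly one of them is 1 per language), which is one reason a multistate representation may be the more natural fit for sound states.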

State of the Art: Examples (2014)

Historical linguistics as a sequence optimization problem: the evolution and biogeography of Uto-Aztecan languages

Ward C. Wheeler (a, *) and Peter M. Whiteley (b)
(a) Division of Invertebrate Zoology, American Museum of Natural History, Central Park West @ 79th Street, New York, NY, 10024-5192, USA
(b) Division of Anthropology, American Museum of Natural History, Central Park West @ 79th Street, New York, NY, 10024-5192, USA
(*) Corresponding author
Accepted 18 March 2014
Cladistics (2014) 1–13, 10.1111/cla.12078. © The Willi Hennig Society 2014.

Abstract

Language origins and diversification are vital for mapping human history. Traditionally, the reconstruction of language trees has been based on cognate forms among related languages, with ancestral protolanguages inferred by individual investigators. Disagreement among competing authorities is typically extensive, without empirical grounds for resolving alternative hypotheses. Here, we apply analytical methods derived from DNA sequence optimization algorithms to Uto-Aztecan languages, treating words as sequences of sounds. Our analysis yields novel relationships and suggests a resolution to current conflicts about the Proto-Uto-Aztecan homeland. The techniques used for Uto-Aztecan are applicable to written and unwritten languages, and should enable more empirically robust hypotheses of language relationships, language histories, and linguistic evolution.

Introduction

How languages evolve has long been a central question for the human sciences. Linguistic elements may be transmitted horizontally (“borrowing”) among neighbouring languages, but most language transmission obviously occurs via lineal descent with modification. Linguistic and biological evolution are thus analogous in important respects; constructing trees of languages “genetically” related in families is well established (e.g. Greenhill et al., 2009). Recently, phylogenetic models have enhanced both methodology and hypothesis-testing for language ancestry (e.g. Forster and Renfrew, 2006). Approaches now engage archaeology, anthropology, genetics, and computational science, as well as historical linguistics itself.

Notwithstanding advances, disputes remain vigorous in both methods and results, including for well-studied language families such as Indo-European (see, for example, Forster and Renfrew, 2006; Campbell and Poser, 2008). Often, reconstructions are untestable, hence the vigour of disputation. The approach adopted here, by contrast, involves an inspectable set of procedures applied directly to empirical linguistic data. We use analytical methods derived from DNA sequence optimization algorithms, treating words as sequences of sounds. We demonstrate this with Uto-Aztecan (UA) languages of North and Middle America.

The basic approach articulated here is to remove the inferential overburden of hypothesized “proto-forms” (discussed below), and perform analysis solely using the observed sound content of words. In this way, the sequences of sounds that constitute all human languages form the empirical basis upon which language trees are built. To accomplish this, we have adapted techniques more usually applied to the analysis of DNA and protein sequence data, but are readily applied to sound sequences as well (as with other non-molecular sequence data; Schulmeister and Wheeler, 2004; Robillard et al., 2006). In moving from proto-forms to sound sequences, a transition occurs analogous to the advances forged in organismic systematic ...

Wheeler and Whiteley (2014), in summary:

Data: 32 Uto-Aztecan languages; Swadesh 100 concept lists; manually extracted cognate sets; IPA encoding, 148 unique symbols
Method: simultaneous sequence optimization and phylogenetic inference (ML framework); varying scoring functions for the matching of sound symbols
Output: phylogenies, (?) transition frequencies (?)
Software: POY 5.0 (Wheeler 2013): Phylogenetic Analysis of DNA and other data using dynamic homology

State of the Art: Examples (2015)

Current Biology 25, 1–9, January 5, 2015. © 2015 The Authors. http://dx.doi.org/10.1016/j.cub.2014.10.064

Article: Detecting Regular Sound Changes in Linguistics as Events of Concerted Evolution

Daniel J. Hruschka (1), Simon Branford (2), Eric D. Smith (3, 4), Jon Wilkins (3, 5), Andrew Meade (2), Mark Pagel (2, 3, *), and Tanmoy Bhattacharya (3, 6, *)
(1) School of Human Evolution and Social Change, Arizona State University, PO Box 872402, Tempe, AZ 85287-2402, USA
(2) School of Biological Sciences, University of Reading, Reading RG6 6BX, UK
(3) The Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA
(4) Krasnow Institute for Advanced Study, George Mason University, Mail Stop 2A1, 4400 University Drive, Fairfax, VA 22030, USA
(5) Ronin Institute, 127 Haddon Place, Montclair, NJ 07043, USA
(6) T-2, Los Alamos National Laboratory, Los Alamos, NM 87545, USA

Summary

Background: Concerted evolution is normally used to describe parallel changes at different sites in a genome, but it is also observed in languages where a specific phoneme changes to the same other phoneme in many words in the lexicon, a phenomenon known as regular sound change. We develop a general statistical model that can detect concerted changes in aligned sequence data and apply it to study regular sound changes in the Turkic language family.

Results: Linguistic evolution, unlike the genetic substitutional process, is dominated by events of concerted evolutionary change. Our model identified more than 70 historical events of regular sound change that occurred throughout the evolution of the Turkic language family, while simultaneously inferring a dated phylogenetic tree. Including regular sound changes yielded an approximately 4-fold improvement in the characterization of linguistic change over a simpler model of sporadic change, improved phylogenetic inference, and returned more reliable and plausible dates for events on the phylogenies. The historical timings of the concerted changes closely follow a Poisson process model, and the sound transition networks derived from our model mirror linguistic expectations.

Conclusions: We demonstrate that a model with no prior knowledge of complex concerted or regular changes can nevertheless infer the historical timings and genealogical placements of events of concerted change from the signals left in contemporary data. Our model can be applied wherever discrete elements (such as genes, words, cultural trends, technologies, or morphological traits) can change in parallel within an organism or other evolving group.

Introduction

Concerted evolutionary change is widespread in genetic systems, being implicated in the genome-wide control of repetitive elements [1–3], the evolution of gene families [2], and homogenization of Y chromosome sequences [4, 5] and as a means by which asexual organisms might escape the debilitating consequences of Muller’s ratchet [3]. It might arise from several mechanisms, including homologous recombination, that allow certain favorable elements to spread or damaging elements to be neutralized.

Linguists have long recognized concerted change that affects copies of the same sound (or phoneme) appearing in different words as a central feature of linguistic evolution [6]. A well-known example is the *p > f sound change in the Germanic languages wherein an older Indo-European p sound was replaced by an f sound, such as in *pater > father, or *pes, *pedis > foot (linguistic convention is to use the “>” symbol to indicate a transition from one sound to another, and here the * symbol denotes a reconstructed ancestral form). These multiple instances of one phoneme changing to the same other phoneme yield regular sound correspondences between pairs or groups of languages. Linguists have proposed several explanations for the regularity of changes grounded in a number of basic processes, including speech production, perception, and cognition [7–9].

Can events of concerted change be detected statistically in sequence data, and do they improve the characterization of evolution and the inference of evolutionary histories? Although previous researchers working in a linguistic setting have used the concept of regular changes to build algorithms for automatically inferring cognacy, to our knowledge the model we report here is the first probabilistic description of concerted change. This places concerted evolution in a statistical setting that allows for formal hypothesis testing about the nature and rates of concerted changes. For example, the question of how many parallel changes are required to be recognized as an instance of concerted change is naturally dealt with in our model: the statistical signature of concerted or regular change is that the multiple parallel events are more probable if treated as a single coordinated change than as a collection of independent changes (Box 1).

Usefully, the genetic and linguistic phenomena share fundamental properties relevant to their statistical characterization. Phonemes are the units of sound that make up words and distinguish one word from another, just as the four nucleotide bases (A, C, T, G) make up DNA gene sequences or the 20 amino acids make up protein sequences. The number of distinct sounds in a language varies greatly, but somewhere around 30–60 phonemes are commonly sufficient to describe the range of distinctive sounds in a language’s words [10]. Collections of words can therefore be thought of as providing phonemic “sequence information” that might be informative as to the history, rate, and patterns of concerted evolutionary change in language, and in a manner analogous to sequences of DNA.

Hruschka et al. (2015), in summary:

Data: 26 Turkic languages; 225 cognate sets from etymological dictionaries (Tower of Babel); manually compiled alignment analyses; ASCII encoding, 62 unique symbols
Method: Bayesian Markov chain Monte Carlo statistical model allowing for sporadic (irregular) and concerted (regular) changes along a phylogenetic tree that produces the alignments
Output: phylogenies, change rates
Software: Bayes Phylogenies
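The statistical signature quoted from Hruschka et al. (parallel changes being more probable as one coordinated event than as a collection of independent events) can be illustrated with a toy likelihood comparison. This is not the authors' model: the two-parameter setup, the function names, and the numbers are assumptions made purely for illustration.

```python
import math


def log_lik_independent(n_changed, n_sites, q):
    """Log-likelihood of one specific pattern when each site changes on its
    own with probability q (toy assumption)."""
    return n_changed * math.log(q) + (n_sites - n_changed) * math.log(1.0 - q)


def log_lik_with_concerted(n_changed, n_sites, q, r):
    """Log-likelihood when, with probability r, one coordinated event changes
    every site at once; otherwise the sites change independently."""
    fallback = math.log(1.0 - r) + log_lik_independent(n_changed, n_sites, q)
    if n_changed < n_sites:
        # A partial pattern cannot be produced by the coordinated event.
        return fallback
    event = math.log(r)
    top = max(event, fallback)
    # Log-sum-exp over the two mutually exclusive explanations.
    return top + math.log(math.exp(event - top) + math.exp(fallback - top))


# Hypothetical numbers: 10 words containing the sound, q = r = 0.05.
all_changed = (
    log_lik_independent(10, 10, 0.05),
    log_lik_with_concerted(10, 10, 0.05, 0.05),
)
one_changed = (
    log_lik_independent(1, 10, 0.05),
    log_lik_with_concerted(1, 10, 0.05, 0.05),
)
print("all 10 changed:", all_changed)
print("1 of 10 changed:", one_changed)
```

Under these assumed numbers, a change that shows up in all ten words is far better explained once a coordinated event is allowed, while an isolated change in a single word is slightly better explained by the purely independent model.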

State of the Art: Problems

Data Representation

[Figure: file ”UT-root.fas”, lines 1-20 (Wheeler et al. 2014)]
[Figure: file ”TurkicFull.txt”, lines 1-20 (Hruschka et al. 2015)]

State of the Art: Model Restrictions

approach               Wheeler et al. 2014    Hruschka et al. 2015
data size
  taxa                 32                     26
  cognate sets         100                    225
data structure
  phonetic strings     ✓                      ✓
  segmentation         ✓                      ✓
  cognate sets         ✓                      ✓
  alignments           inferred               prescribed
models
  sound change         transition graph       transition graph
  transition rates     prescribed             inferred
  unobserved states    ✗                      ✗
  context              ✗                      ✗

State of the Art: Model Restrictions (example)

[Figure: reflexes of the Indo-European word for "sun"]
Indo-European   ˈs o h₂ w l̩ / s h₂ u ˈe n -   "sun"
Latin           s oː l -                      "sun"
Latin           s oː l i k u l u s            "small sun"
French          s ɔ l ɛ j                     "sun"
Germanic        s u nː ɔ̃ː                     "sun"
German          z ɔ n ə                       "sun"

MODELING PHONOLOGICAL RELATIONS

hjärta   h j - ä r t a -
herz     h - e - r z - -
heart    h - e a r t - -
cordis   c - - o r d i s

Modeling Phonological Relations: Alignments
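As a minimal sketch of how such an alignment can be handled in code, the four "heart" words from the title slide can be stored as rows of segments and read column by column. The data structure and function names are illustrative assumptions, not the LingPy or EDICTOR API.

```python
# Alignment rows as shown on the slide; "-" marks a gap.
alignment = {
    "hjärta": "h j - ä r t a -".split(),
    "herz": "h - e - r z - -".split(),
    "heart": "h - e a r t - -".split(),
    "cordis": "c - - o r d i s".split(),
}


def columns(msa):
    """Return the alignment site by site (one tuple of segments per column)."""
    rows = list(msa.values())
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    return list(zip(*rows))


def ungapped(row):
    """Recover the unaligned word by dropping the gap symbols."""
    return "".join(segment for segment in row if segment != "-")


cols = columns(alignment)
print(cols[0])  # first site: the initial consonants
print(cols[4])  # fifth site: the shared r
print(ungapped(alignment["cordis"]))
```

Reading the alignment by column is what makes sound correspondences (such as h : h : h : c in the first site) directly accessible to later analyses.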

Modeling Phonological Relations: Problems

unalignable cognates
[Figure: segmented example forms: t u ŋ a b iː t ə -]
context
site dependencies

Modeling Phonological Relations: Solutions

LingPy and EDICTOR

LingPy (List et al. 2014, http://lingpy.org)
EDICTOR (List in prep., http://tsv.lingpy.org)

Unalignable Parts: implemented (LingPy and EDICTOR)
Phonetic Context: implemented (LingPy and EDICTOR)
Site Dependence: implementation pending

Modeling Phonological Relations: Summary

Problem                Subproblem                 LingPy   EDICTOR
unalignable cognates   linear relations           ✓        ✓
                       non-linear relations       ✗        ✗
context                prosodic context           ✓        ✓
                       user-defined context       (✓)      ✓
site dependencies      neighboring columns        ✗        (✓)
                       non-neighboring columns    ✗        ✗

PhonoBank? Yes! If we thoroughly collect aligned cognate sets, along with proto-forms and marked context (the datasets of Paul and Thiago), we will be able to extract the sound changes automatically and create our PhonoBank!
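The idea of reading sound changes off aligned proto-forms and reflexes can be sketched as follows. The word pairs are invented and hand-segmented for illustration only, and the function name and the context convention ("#" for a word boundary) are assumptions, not part of any existing PhonoBank tooling.

```python
from collections import Counter


def extract_changes(proto, reflex):
    """Collect (source, target, left, right) for every aligned site that differs.

    Both arguments are aligned lists of segments of equal length. The context
    is taken from the proto-form; "#" marks a word boundary.
    """
    if len(proto) != len(reflex):
        raise ValueError("proto-form and reflex must be aligned")
    changes = []
    for i, (source, target) in enumerate(zip(proto, reflex)):
        if source != target:
            left = proto[i - 1] if i > 0 else "#"
            right = proto[i + 1] if i + 1 < len(proto) else "#"
            changes.append((source, target, left, right))
    return changes


# Hypothetical aligned pairs (proto-form, reflex), invented for this sketch.
pairs = [
    ("p a t e r".split(), "f a d e r".split()),
    ("p i s k".split(), "f i s k".split()),
]

all_changes = [change for proto, reflex in pairs for change in extract_changes(proto, reflex)]
counts = Counter((source, target) for source, target, _, _ in all_changes)

for (source, target), n in counts.most_common():
    print(f"{source} > {target}: {n}")
```

With a large collection of aligned cognate sets, such counts (broken down by context) would be the raw material for a database of attested sound changes.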

MODELING ETYMOLOGICAL RELATIONS

[Figure: etymological network with three relation types: inherited from, borrowed from, derived from]
PIE *bhreu̯Hg̑- “to use”
PIE *bhruHg̑-ié- “to use” (present tense)
PGM *ƀrūkan- “to use”
OHG brūhhan “to use”
G brauchen “to use”
G Brauch “custom”
OHG fruht “profit, fruit”
G frugal “nourishing”
Fr fruit “profit, fruit”
Fr frugal “modest (food)”
Lt fruor, fruī “I enjoy”
Lt frūctus “profit”
Lt frux “fruit, grain”
Lt frugalis “bring profit”

Modeling Etymological Relations: Dimensions of Lexical Change

Dimensions of Lexical Change (Gévaudan 2007): semantic change, morphological change, stratic change

*kuppa- → Kopf: semantic change
Kopf → köpfen: morphological change
world cup → Weltcup: stratic change

Modeling Etymological Relations: Inference

[Figure: words for "word" in various languages: word, Wort, слово, cuvînt, palabra, mot, adottszó, slovo, verbum, focal, 词, parola, λόγος, शब्द, ord]

Modeling Etymological Relations: Semantic Shift and Colexification Networks

Key     Concept        Russian      German         ...
1.1     world          mir, svet    Welt           ...
1.21    earth, land    zemlja       Erde, Land     ...
1.212   ground, soil   počva        Erde, Boden    ...
1.420   tree           derevo       Baum           ...
1.430   wood           derevo       Wald           ...
...     ...            ...          ...            ...

[Figure: colexification network linking the concepts post/pole, staff/walking stick, doorpost/jamb, tree stump, mast, club, firewood, root, tree trunk, woods/forest, banana tree, tree, wood]

CLICS: Database of Cross-Linguistic Colexifications (List et al. 2014, http://clics.lingpy.org)
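How a colexification network can be derived from such a concept list is sketched below, using only the five rows shown in the table. The data layout and function name are assumptions for illustration and do not reproduce the actual CLICS code.

```python
from collections import defaultdict
from itertools import combinations

# Rows of the concept list from the slide: concept -> {language: [word forms]}.
concept_list = {
    "world": {"Russian": ["mir", "svet"], "German": ["Welt"]},
    "earth, land": {"Russian": ["zemlja"], "German": ["Erde", "Land"]},
    "ground, soil": {"Russian": ["počva"], "German": ["Erde", "Boden"]},
    "tree": {"Russian": ["derevo"], "German": ["Baum"]},
    "wood": {"Russian": ["derevo"], "German": ["Wald"]},
}


def colexifications(data):
    """Map each concept pair to the (language, form) items expressing both."""
    concepts_by_form = defaultdict(set)
    for concept, languages in data.items():
        for language, forms in languages.items():
            for form in forms:
                concepts_by_form[(language, form)].add(concept)
    edges = defaultdict(list)
    for item, concepts in concepts_by_form.items():
        for pair in combinations(sorted(concepts), 2):
            edges[pair].append(item)
    return dict(edges)


network = colexifications(concept_list)
for pair, evidence in network.items():
    print(pair, evidence)
```

Each concept pair becomes an edge of the network, and the number of languages attesting it can serve as the edge weight; here Russian derevo links "tree" and "wood", and German Erde links "earth, land" and "ground, soil".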

Modeling Etymological Relations: Semantic Shift and Concept Comparison

Concept ID   GLOSS    OCCS   Blust-2008-210   Dolgopolsky-1964-15     Dunn-2012-207
4118         WATER    3      water            water                   water
5440         TONGUE   3      tongue           tongue                  tongue
5450         I        3      I                first person marker     I
5456         YOU      3      you              second person marker    you
5490         WHO      3      who?             who/what                who

Slide annotation: "Concept ID", "arbitrarité" (French: arbitrariness)

CONCEPTICON: A resource for the linking of concept lists (List and Cysouw, forthcoming, http://concepticon.github.io)
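The linking shown in the table can be sketched as a simple mapping from list-specific glosses to shared concept IDs. The values are the ones on the slide; the data layout, the function name, and the reading of OCCS as "number of lists in which the concept occurs" are assumptions for illustration, not the Concepticon data format.

```python
# Concept list -> {local gloss: concept ID}, with values taken from the slide.
links = {
    "Blust-2008-210": {
        "water": 4118, "tongue": 5440, "I": 5450, "you": 5456, "who?": 5490,
    },
    "Dolgopolsky-1964-15": {
        "water": 4118, "tongue": 5440, "first person marker": 5450,
        "second person marker": 5456, "who/what": 5490,
    },
    "Dunn-2012-207": {
        "water": 4118, "tongue": 5440, "I": 5450, "you": 5456, "who": 5490,
    },
}


def glosses_by_concept(linked_lists):
    """Invert the mapping: concept ID -> {concept list: local gloss}."""
    table = {}
    for list_name, glosses in linked_lists.items():
        for gloss, concept_id in glosses.items():
            table.setdefault(concept_id, {})[list_name] = gloss
    return table


table = glosses_by_concept(links)
for concept_id, glosses in sorted(table.items()):
    # The number of lists per concept corresponds to the assumed OCCS column.
    print(concept_id, len(glosses), glosses)
```

Because the comparison runs over IDs rather than over glosses, differently worded entries such as "I" and "first person marker" end up in the same row.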

Modeling Etymological Relations: Language Contact and Minimal Lateral Networks

[Figure: Minimal Lateral Networks of Chinese dialects, showing inferred links among 40 dialect points (List et al. 2014)]

Dialect groups: Guānhuà, Xiàng, Mǐn, Yuè, Wú, Jìn, Kèjiā, Gàn, Huī

Dialect points: 1 Běijīng 北京, 2 Chángshā 长沙, 3 Chéngdū 成都, 4 Fùzhōu 福州, 5 Guǎngzhōu 广州, 6 Guìyáng 贵阳, 7 Harbin 哈尔滨, 8 Hǎikǒu 海口, 9 Hángzhōu 杭州, 10 Héfèi 合肥, 11 Hohhot 呼和浩特, 12 Jiàn'ōu 建瓯, 13 Jìnán 济南, 14 Kùnmíng 昆明, 15 Lánzhōu 兰州, 16 Měixiàn 梅县, 17 Nánchàng 南昌, 18 Nánjīng 南京, 19 Nánníng 南宁, 20 Píngyáo 平遥, 21 Qīngdǎo 青岛, 22 Shànghǎi 上海, 23 Shāntóu 汕头, 24 Shèxiàn 歙县, 25 Sùzhōu 苏州, 26 Táiběi 台北, 27 Tàiyuán 太原, 28 Táoyuán 桃园, 29 Tiānjìn 天津, 30 Tūnxī 屯溪, 31 Wénzhōu 温州, 32 Wǔhàn 武汉, 33 Ürümqi 乌鲁木齐, 34 Xiàmén 厦门, 35 Hongkong 香港, 36 Xiāngtàn 湘潭, 37 Xīníng 西宁, 38 Xī'ān 西安, 39 Yīnchuàn 银川, 40 Zhèngzhōu 郑州

[Figure: inferred evolution of "sun" across the 40 dialect points, with gain and loss events for the forms 太阳, 日头, 热头, 阳婆, 日 (List et al. submitted)]

[Figure: inferred evolution of "sun" in Shànghǎi, Hongkong, Táiběi, Nánjīng, Táoyuán, Běijīng, Měixiàn, Xiàmén, Fùzhōu, and Guǎngzhōu, with gain and loss events for the forms 太阳 and 日头 (List et al. submitted)]

Modeling Etymological Relations: Models

[Figure: events labelled LOSS, INNOVATION, INNOVATION, BORROWING]

Modeling Etymological Relations: New Models for Lexical Change

            "MOON"
German      m oː n t -
English     m uː n - -
Danish      m ɔː n - ə
Swedish     m oː n - e

            "MOON"        "SHINE"       "LIGHT"
Fúzhōu      ŋ u o ʔ ⁵     - - - - -     - - - - -
Měixiàn     ŋ i a t ⁵     - - - - -     k u o ŋ ⁴⁴
Guǎngzhōu   j - y t ²     l - œ ŋ ²²    - - - - -
Běijīng     - y ɛ - ⁵¹    l i ɑ ŋ -     - - - - -
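The Chinese part of the "moon" example shows words that are only partially cognate: all four share the MOON morpheme, but they differ in the second element. A minimal sketch of how this can be read off the alignment is given below. The column spans of the three blocks are an assumption read off the slide layout (five columns each), and the function names are hypothetical.

```python
# Assumed column spans of the three morpheme blocks (five columns each).
SLOTS = {"MOON": slice(0, 5), "SHINE": slice(5, 10), "LIGHT": slice(10, 15)}

# Alignment rows as shown on the slide; "-" marks a gap.
rows = {
    "Fúzhōu": "ŋ u o ʔ ⁵ - - - - - - - - - -".split(),
    "Měixiàn": "ŋ i a t ⁵ - - - - - k u o ŋ ⁴⁴".split(),
    "Guǎngzhōu": "j - y t ² l - œ ŋ ²² - - - - -".split(),
    "Běijīng": "- y ɛ - ⁵¹ l i ɑ ŋ - - - - - -".split(),
}


def morphemes(row):
    """Names of the blocks in which the word has at least one real segment."""
    return {
        name
        for name, span in SLOTS.items()
        if any(segment != "-" for segment in row[span])
    }


profile = {dialect: morphemes(row) for dialect, row in rows.items()}


def shared(dialect_a, dialect_b):
    """Morphemes that two dialect words have in common."""
    return profile[dialect_a] & profile[dialect_b]


for dialect, blocks in profile.items():
    print(dialect, sorted(blocks))
print(sorted(shared("Guǎngzhōu", "Běijīng")))
```

A binary cognate judgment would have to call these words either all cognate or not cognate at all; the per-morpheme profile keeps the information that, for example, Měixiàn and Běijīng agree only in the first element.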

Modeling Etymological Relations: New Models for Lexical Change

Transition priors for etymologically related words can be provided manually, automatically, or semi-automatically.
Transition priors can be defined simultaneously for semantic change and morphological change (including complex paradigms).
All we need in order to use the rich information inherent in complex etymological relations are software “beasts” that provide multistate models accepting n states, along with user-specified individual transition priors for each set of etymologically related words.
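What "n states with user-specified transition priors" could look like as an input is sketched below. This is an assumption-laden illustration, not the interface of any existing phylogenetic software: the state labels reuse the "moon" example, the prior weights are invented, and the weights are simply normalised into the probability of each target state given that a change occurs.

```python
def transition_matrix(states, priors=None, default=1.0):
    """Row-normalised transition probabilities between distinct states.

    `priors` maps (source, target) to a non-negative weight; pairs that are
    not listed receive the `default` weight.
    """
    priors = priors or {}
    matrix = {}
    for source in states:
        weights = {
            target: priors.get((source, target), default)
            for target in states
            if target != source
        }
        total = sum(weights.values())
        if total <= 0:
            raise ValueError(f"no admissible transition out of {source!r}")
        matrix[source] = {target: w / total for target, w in weights.items()}
    return matrix


# States for one set of etymologically related words (labels are illustrative).
states = ["MOON", "MOON+SHINE", "MOON+LIGHT"]

# Hypothetical expert priors: compounding with SHINE is favoured, and a direct
# switch from MOON+SHINE to MOON+LIGHT is ruled out.
priors = {
    ("MOON", "MOON+SHINE"): 3.0,
    ("MOON+SHINE", "MOON+LIGHT"): 0.0,
}

P = transition_matrix(states, priors)
for source, row in P.items():
    print(source, row)
```

The point of the design is that each set of related words carries its own small matrix, so linguistic knowledge about plausible and implausible transitions enters the model per cognate set instead of through one global rate.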

Modeling Etymological Relations: New Data for Lexical Change

Database of Lexical Change Patterns in Sino-Tibetan Languages
project of CRLAO (Paris, L. Sagart and G. Jacques) in collaboration with SOAS (London, N. Hill)
50+ doculects
250+ concepts
alignment analyses for alignable cognates
detailed annotation of morphological relations (“linguistic priors”)
cross-semantic search for cognate words
to be launched before autumn 2015

A Comparative Database of Tukanoan and Northwestern South American Languages
Thiago will soon present you with the details!

THANK YOU FOR LISTENING!
