Slide 1

Of Words, Waves, and Webs
Using bioinformatics to study the lateral component of language evolution
Johann-Mattis List
Forschungszentrum Deutscher Sprachatlas, Philipps-Universität Marburg
31 January 2014

Slide 2

Languages
语言, language, язык, språk (all meaning “language”)

Slide 3

Languages and Dialects
Norwegian, Swedish, and Danish are different languages.
Běijīng Chinese, Shànghǎi Chinese, and Hakka Chinese are dialects of the same language.

Slide 4

Languages and Dialects
Transcriptions of two sentences (1, 2) from the fable “The North Wind and the Sun”:
Beijing Chinese (1): iou²¹ i⁵⁵ xuei³⁵ pei²¹fəŋ⁵⁵ kən⁵⁵ tʰai⁵¹iaŋ¹¹ t͡ʂəŋ⁵⁵ ʦai⁵³ naɚ⁵¹ t͡ʂəŋ⁵⁵luən⁵¹
Hakka Chinese (1): iu³³ it⁵⁵ pai³³a¹¹ pet³³fuŋ³³ tʰuŋ¹¹ ɲit¹¹tʰeu¹¹ hɔk³³ e⁵³ au⁵⁵
Shanghai Chinese (1): ɦi²² tʰɑ̃⁵⁵ ʦɿ²¹ poʔ³foŋ⁴⁴ taʔ⁵ tʰa³³ɦiã⁴⁴ ʦəŋ³³ hɔ⁴⁴ ləʔ¹lə²³ʦa⁵³
Beijing Chinese (2): ʂei³⁵ də⁵⁵ pən³⁵ liŋ²¹ ta⁵¹
Hakka Chinese (2): man³³ ɲin¹¹ kʷɔ⁵⁵ vɔi⁵³
Shanghai Chinese (2): sa³³ ɲiŋ⁵⁵ ɦəʔ²¹ pəŋ³³ zɿ⁴⁴ du¹³
Norwegian (1): nuːɾɑʋinˑn̩ ɔ suːln̩ kɾɑŋlət ɔm
Swedish (1): nuːɖanvɪndən ɔ suːlən tv̥ɪstadə ən gɔŋ ɔm
Danish (1): noʌ̯ʌnvenˀn̩ ʌ soːl̩ˀn kʰʌm eŋg̊ɑŋ i sd̥ʁiðˀ ʌmˀ
Norwegian (2): ʋem ɑ dem sɱ̩ ʋɑː ɖɳ̩ stæɾ̥kəstə
Swedish (2): vɛm ɑv dɔm sɔm vɑ staɹkast
Danish (2): vɛmˀ a b̥m̩ d̥ vɑ d̥n̩ sd̥æʌ̯g̊əsd̥ə

Slide 5

Languages and Dialects
From the perspective of the lexicon and the sound system, the Chinese dialects are at least as diverse as the Scandinavian languages.

Slide 7

Language as a diasystem
Languages are complex aggregates of different linguistic systems which “coexist with and influence one another” (Coseriu 1973: 40, translated).
A linguistic diasystem needs a “roof language” (Goossens 1973: 11), a linguistic variety that serves as a standard for interdialectal communication.

Slide 8

Language as a diasystem
[Figure: Standard Language; Diatopic Varieties; Diastratic Varieties; Diaphasic Varieties]

Slide 12

Change
expected: Mandarin [ma₅₅po₂₁lou]
attested: Mandarin [wan₅₁paw₂₁lu₅₁]
explanation: Cantonese [maːn₂₂pow₃₅low₃₂]

Slide 13

Change
English | Cantonese | Mandarin
maːlboʁo | maːn22 pow35 low32 | wan51 paw21 lu51
proper name | “Road of 10,000 Treasures” | “Road of 10,000 Treasures”
Cantonese and Mandarin characters: 万宝路

Slide 14

Wind of Sound Change in China
燕燕于飛, 下上其音。 (yān yān yú fēi, xià shàng qí yīn)
The swallows go flying, falling and rising are their voices;
之子于歸, 遠送于南。 (zhī zǐ yú guī, yuǎn sòng yú nán)
This young lady goes to her new home, far I accompany her to the south.
瞻望弗及, 實勞我心。 (zhān wàng fú jí, shí láo wǒ xīn)
I gaze after her, can no longer see her, truly it grieves my heart.

Slide 16

Wind of Sound Change in China
燕燕于飛, 下上其音。 (yān yān yú pjɨj, xià shàng qí ʔjɨm)
The swallows go flying, falling and rising are their voices;
之子于歸, 遠送于南。 (zhī zǐ yú kʷjɨj, yuǎn sòng yú nɨm)
This young lady goes to her new home, far I accompany her to the south.
瞻望弗及, 實勞我心。 (zhān wàng fú jí, shí láo wǒ sjɨm)
I gaze after her, can no longer see her, truly it grieves my heart.

Slide 17

Modelling Language History

Slide 19

Dendrophilia
August Schleicher (1821-1868)
“These assumptions, which follow logically from the results of previous research, can best be illustrated by the image of a branching tree.” (Schleicher 1853: 787, translated)

Slide 20

Dendrophilia
[Figure: Schleicher (1853)]

Slide 22

Dendrophobia
Johannes Schmidt (1843-1901)
“No matter how one twists and turns, as long as one holds on to the view that the languages attested in historical times arose from the proto-language through multiple bifurcations, that is, as long as one assumes a family tree of the Indo-European languages, one will never succeed in explaining scientifically all the facts in question here.” (Schmidt 1872: 17, translated)

Slide 23

Dendrophobia
Johannes Schmidt (1843-1901)
“I would like to replace it [the tree] with the image of a wave that spreads out in concentric rings, which grow weaker and weaker with increasing distance from the center.” (Schmidt 1872: 27, translated)

Slide 24

Dendrophobia
[Figure: Schmidt (1875)]

Slide 25

Dendrophobia
[Figure: Meillet (1908), Hirt (1905), Bloomfield (1933), Bonfante (1931)]

Slide 32

Phylogenetic Networks
Trees are bad, because...
• they are difficult to reconstruct
• languages do not always split
• they are boring, since they only model the vertical aspects of language history
Waves are bad, because...
• nobody knows how to reconstruct them
• languages still diverge, even if not necessarily in split processes
• they are boring, since they only model the horizontal aspects of language history

Slide 34

Phylogenetic Networks
Hugo Schuchardt (1842-1927)
“We connect the branches and twigs of the tree with countless horizontal lines, and it ceases to be a tree.” (Schuchardt 1870 [1900]: 11, translated)

Slide 35

Phylogenetic Networks

Slide 37

Linguistics and Biology

Slide 38

The Quantitative Turn

Slide 40

The Quantitative Turn
• “Indo-European and computational cladistics” (Ringe, Warnow, and Taylor 2002)
• “Language-tree divergence times support the Anatolian theory of Indo-European origin” (Gray and Atkinson 2003)
• “Language classification by numbers” (McMahon and McMahon 2005)
• “Curious Parallels and Curious Connections: Phylogenetic Thinking in Biology and Historical Linguistics” (Atkinson and Gray 2005)
• “Automated classification of the world’s languages” (Brown et al. 2008)
• “Indo-European languages tree by Levenshtein distance” (Serva and Petroni 2008)
• “Networks uncover hidden lexical borrowing in Indo-European language evolution” (Nelson-Sathi et al. 2011)

Slide 42

Parallels
Parallels according to Pagel (2009)
Aspect | Species | Languages
unit of replication | gene | word
replication | asexual and sexual reproduction | learning
speciation | cladogenesis | language split
forces of change | natural selection and genetic drift | social selection and trends
differentiation | tree-like | tree-like

Slide 43

Parallels?

Slide 45

Differences
Differences (Geisler & List 2013)
Aspect | Species | Languages
domain | Popper’s World I | Popper’s World III
relation between form and function | mechanical | arbitrary
origin | monogenesis | unclear
sequence similarity | universal (independent of species) | language-specific
differentiation | tree-like | network-like
Most recent applications of bioinformatic methods in historical linguistics ignore these differences.

Slide 50

Differences: Alphabets
Species | Languages
universal | language-specific
limited | widely varying
constant | mutable
To identify homologous words in different languages, we have to identify not only corresponding segments but also the mappings between the alphabets. Phonetic alignment is thus similar to aligning two sequences drawn from two different alphabets!
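The “two alphabets” point can be made concrete with a standard global alignment (Needleman-Wunsch) whose scoring function is a cross-alphabet mapping instead of symbol identity. This is a minimal sketch, not the alignment method used in the talk: the `MAPPING` table, the vowel set, and the scores (+2, +1, -1, gap -1) are hypothetical choices for illustration.

```python
from typing import List, Tuple

GAP = "-"
GAP_SCORE = -1

# Hypothetical cross-alphabet mapping: German segment -> English segments
# it may correspond to (cf. the German-English cognates on these slides).
MAPPING = {"d": {"d", "θ"}, "n": {"n"}, "m": {"m"}, "ŋ": {"ŋ"}}

# Hypothetical vowel inventory covering both languages in the example.
VOWELS = {"ʏ", "ɪ", "ʊ", "ʌ", "ɔɐ", "ɔː"}


def score(g: str, e: str) -> int:
    """Score a German segment g against an English segment e."""
    if e in MAPPING.get(g, set()):
        return 2   # mapped correspondence, even between different symbols
    if g in VOWELS and e in VOWELS:
        return 1   # any vowel-vowel pair is mildly rewarded
    return -1      # everything else counts as a mismatch


def align(a: List[str], b: List[str]) -> Tuple[List[str], List[str], int]:
    """Needleman-Wunsch global alignment of two segment lists."""
    n, m = len(a), len(b)
    # dp[i][j] holds the best score for aligning a[:i] with b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * GAP_SCORE
    for j in range(1, m + 1):
        dp[0][j] = j * GAP_SCORE
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                dp[i - 1][j] + GAP_SCORE,
                dp[i][j - 1] + GAP_SCORE,
            )
    # Trace back from the last cell to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + GAP_SCORE:
            out_a.append(a[i - 1])
            out_b.append(GAP)
            i -= 1
        else:
            out_a.append(GAP)
            out_b.append(b[j - 1])
            j -= 1
    return out_a[::-1], out_b[::-1], dp[n][m]


if __name__ == "__main__":
    ger, eng, total = align(["d", "ɔɐ", "n"], ["θ", "ɔː", "n"])
    print(ger, eng, total)  # ['d', 'ɔɐ', 'n'] ['θ', 'ɔː', 'n'] 5
```

With a plain identity score, d and θ would count as a mismatch; the mapping is what lets the aligner treat them as corresponding segments. In a real setting the mapping itself is unknown and has to be inferred from the data.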

Slide 57

Differences: Alphabets
Cognate list and alignment (German | English):
dünn  d ʏ n   | thin   θ ɪ n
Ding  d ɪ ŋ   | thing  θ ɪ ŋ
dumm  d ʊ m   | dumb   d ʌ m
Dorn  d ɔɐ n  | thorn  θ ɔː n
Correspondence list:
GER | ENG | Freq.
d | θ | 3x
d | d | 1x
n | n | 2x
m | m | 1x
ŋ | ŋ | 1x

Slide 58

Differences: Alphabets
Cognate list and alignment (German | English):
dünn  d ʏ n   | thin   θ ɪ n
Ding  d ɪ ŋ   | thing  θ ɪ ŋ
Dorn  d ɔɐ n  | thorn  θ ɔː n
dumm  d ʊ m   | dumb   d ʌ m   (set apart from the regular d : θ pattern)
Correspondence list:
GER | ENG | Freq.
d | θ | 3x
n | n | 2x
ŋ | ŋ | 1x
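Building the correspondence list from aligned cognates is simple bookkeeping. The following minimal Python sketch uses the four word pairs on the slide. Ignoring vowels (as the slide's list does) and the `min_support` threshold for flagging pairs that fit no recurring correspondence are assumptions of this sketch, not a published procedure.

```python
from collections import Counter
from typing import Dict, List, Tuple

Alignment = Tuple[List[str], List[str]]

# Aligned German-English word pairs from the slide, one IPA segment per position.
ALIGNMENTS: Dict[Tuple[str, str], Alignment] = {
    ("dünn", "thin"): (["d", "ʏ", "n"], ["θ", "ɪ", "n"]),
    ("Ding", "thing"): (["d", "ɪ", "ŋ"], ["θ", "ɪ", "ŋ"]),
    ("dumm", "dumb"): (["d", "ʊ", "m"], ["d", "ʌ", "m"]),
    ("Dorn", "thorn"): (["d", "ɔɐ", "n"], ["θ", "ɔː", "n"]),
}

# Assumption: as in the correspondence list on the slide, vowels are ignored.
VOWELS = {"ʏ", "ɪ", "ʊ", "ʌ", "ɔɐ", "ɔː"}


def consonant_pairs(ger: List[str], eng: List[str]) -> List[Tuple[str, str]]:
    """Aligned (German, English) segment pairs in which neither side is a vowel."""
    return [(g, e) for g, e in zip(ger, eng) if g not in VOWELS and e not in VOWELS]


def count_correspondences(alignments: Dict[Tuple[str, str], Alignment]) -> Counter:
    """Count how often each consonant correspondence occurs in the whole list."""
    counts: Counter = Counter()
    for ger, eng in alignments.values():
        counts.update(consonant_pairs(ger, eng))
    return counts


def irregular_pairs(alignments, counts: Counter, min_support: int = 2):
    """Toy heuristic: flag word pairs none of whose correspondences
    recurs at least `min_support` times in the list."""
    return [
        words
        for words, (ger, eng) in alignments.items()
        if all(counts[p] < min_support for p in consonant_pairs(ger, eng))
    ]


if __name__ == "__main__":
    counts = count_correspondences(ALIGNMENTS)
    print(counts.most_common())
    print(irregular_pairs(ALIGNMENTS, counts))  # [('dumm', 'dumb')]
```

On this tiny list the heuristic singles out the same pair that is set apart above; with realistic data, recurrence thresholds would need far more care.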

Slide 64

Differences: Borrowing
Of the 1,000 most frequent Latin words (Stefenelli 1992),
• 67% were directly inherited in at least one of the descendant languages of Latin,
• 14% were directly inherited in all descendant languages,
• only 33% were completely lost,
• about 50% survive as borrowings from Latin in the descendant languages.
Saying that languages evolve in tree-like processes is like saying that penguins walk: it may be true, but it is only part of the whole interesting story.

Slide 65

Shifting the Paradigm

Slide 69

New Parallels
If we sequence 61 human genomes, we will find more or less the same collection of about 30,000 genes in each individual. But if we sequence 61 genomes of Escherichia coli (Lukjancenko et al. 2010),
• we find about 4,500 genes in each genome,
• we find 1,000 genes present in all genomes,
• we find about 18,000 different genes distributed among all genomes.
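The three E. coli figures are three summaries of a presence/absence (phyletic pattern) matrix: average genome size, core genome (gene families present in every genome), and pan-genome (families present in at least one). A minimal sketch with hypothetical toy data, not the data of Lukjancenko et al. (2010); reading each “genome” as a language and each gene family as a cognate set, the same function would summarize lexical data.

```python
from typing import Dict, Set


def genome_summary(genomes: Dict[str, Set[str]]) -> Dict[str, float]:
    """Mean genome size, core size, and pan size of presence/absence data.

    Expects at least one genome.
    """
    families = list(genomes.values())
    core = set.intersection(*families)  # families present in every genome
    pan = set.union(*families)          # families present in at least one genome
    mean_size = sum(len(f) for f in families) / len(families)
    return {"mean_size": mean_size, "core": len(core), "pan": len(pan)}


if __name__ == "__main__":
    # Hypothetical toy input: three genomes with overlapping gene families.
    toy = {
        "genome_A": {"g1", "g2", "g3"},
        "genome_B": {"g1", "g2", "g4"},
        "genome_C": {"g1", "g5", "g6"},
    }
    print(genome_summary(toy))  # {'mean_size': 3.0, 'core': 1, 'pan': 6}
```

A pan-genome much larger than the average genome, as in the E. coli numbers, is the signature of extensive gain and loss of families across lineages.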

Slide 71

New Parallels
Eukaryotic and Prokaryotic Evolution
Eukaryotic populations generate tree-like divergence structures over time, while genome evolution in prokaryotes generates both tree-like and net-like components.
Evolution and Language History
Given the borrowing rates in the descendant languages of Latin, it seems obvious that language history resembles prokaryotic evolution much more closely than eukaryotic evolution. When applying methods from bioinformatics to linguistic problems, it therefore seems more fruitful to use methods that explicitly deal with prokaryotic evolution.

Slide 72

Minimal Lateral Networks
Biological Workflow (Dagan and Martin 2007, Dagan et al. 2008)
1. collect phyletic pattern data (shared gene families) of the taxa under investigation
2. use gain-loss mapping techniques with different weighting models, allowing for different numbers of gain events, to analyze how the gene families evolved along a given reference tree
3. use ancestral genome sizes as an external criterion to determine the best weighting model
4. assume that all patterns for which the best model yields more than one gain event result from lateral gene transfer
5. reconstruct a minimal lateral network by connecting multiple gains of the same gene family by lateral edges

Slide 73

Minimal Lateral Networks
Linguistic Workflow (Nelson-Sathi et al. 2011, List et al. 2014)
1. collect phyletic pattern data (shared cognates) of the languages under investigation
2. use gain-loss mapping techniques with different weighting models, allowing for different numbers of gain events, to analyze how the cognates evolved along a given reference tree
3. use ancestral vocabulary size distributions as an external criterion to determine the best weighting model
4. allow for a substantial amount (5%) of parallel evolution
5. assume that all patterns for which the best model yields more than one gain event result from lateral transfer (borrowing)
6. reconstruct a minimal lateral network by connecting multiple gains of the same cognate by lateral edges
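Steps 5 and 6 can be sketched as follows, assuming the gain-loss mapping has already produced, for each cognate set, the list of tree nodes at which it was gained. The node labels and cognate-set IDs are hypothetical toy input. Connecting every pair of gain nodes is a simplification of this sketch; the published method may select connections differently, and the allowance for parallel evolution (step 4) is not modelled here.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def minimal_lateral_network(gains: Dict[str, List[str]]) -> Tuple[List[str], Counter]:
    """Flag patterns with more than one gain and link their gain nodes.

    `gains` maps a cognate-set ID to the tree nodes where a gain was inferred.
    Returns the flagged IDs and a Counter of undirected edge weights,
    keyed by alphabetically ordered node pairs.
    """
    flagged: List[str] = []
    edges: Counter = Counter()
    for cogid, nodes in gains.items():
        if len(nodes) > 1:  # more than one origin: candidate for borrowing
            flagged.append(cogid)
            # Simplification: connect every pair of gain nodes.
            for a, b in combinations(sorted(nodes), 2):
                edges[(a, b)] += 1
    return flagged, edges


if __name__ == "__main__":
    toy_gains = {  # hypothetical input, not data from the study
        "cog1": ["Romance", "English"],
        "cog2": ["Germanic"],
        "cog3": ["French", "English", "German"],
    }
    flagged, edges = minimal_lateral_network(toy_gains)
    print(flagged)  # ['cog1', 'cog3']
    print(dict(edges))
```

Edge weights accumulate over cognate sets, so a heavily weighted edge points to a recurrent contact channel rather than to a single borrowed word.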

Slide 75

Minimal Lateral Networks: Gain-Loss Mapping
[Figure: gain-loss mapping on a reference tree of Spanish, French, Italian, Danish, English, and German]
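A minimal sketch of weighted-parsimony gain-loss mapping (a Sankoff-style dynamic program over the two states absent/present) on the six languages of the figure. The tree shape, the presence pattern, and the weights are hypothetical, and this is not the LingPy implementation. It shows how the weighting model decides between one gain plus losses (inheritance) and several gains (candidates for borrowing); multifurcating nodes need no special treatment.

```python
from typing import Dict, List, Tuple, Union

# A tree is a leaf name or a tuple of subtrees (any number of children).
Tree = Union[str, Tuple["Tree", ...]]
Event = Tuple[str, Tuple[str, ...]]
INF = float("inf")


def leaves(node: Tree) -> Tuple[str, ...]:
    """Sorted leaf names below a node; used to label internal nodes."""
    if isinstance(node, str):
        return (node,)
    return tuple(sorted(name for child in node for name in leaves(child)))


def step(parent: int, child: int, gain: float, loss: float) -> float:
    """Cost of a change from parent state to child state (0 absent, 1 present)."""
    if parent == child:
        return 0.0
    return gain if child == 1 else loss


def costs(node: Tree, pattern: Dict[str, int], gain: float, loss: float) -> Tuple[float, float]:
    """Minimal subtree cost if `node` is in state 0 or in state 1."""
    if isinstance(node, str):
        return (INF, 0.0) if pattern[node] else (0.0, INF)
    total = [0.0, 0.0]
    for child in node:
        c = costs(child, pattern, gain, loss)
        for s in (0, 1):
            total[s] += min(c[t] + step(s, t, gain, loss) for t in (0, 1))
    return total[0], total[1]


def map_gains_losses(tree: Tree, pattern: Dict[str, int], gain: float, loss: float) -> List[Event]:
    """One most parsimonious list of (event, node) pairs for a presence pattern.

    A virtual ancestor in state 0 sits above the root, so presence at the
    root counts as one gain. Ties are broken towards state 0 (absence).
    Subtree costs are recomputed during traceback, which is fine for a sketch.
    """
    events: List[Event] = []

    def descend(node: Tree, state: int) -> None:
        if isinstance(node, str):
            return
        for child in node:
            c = costs(child, pattern, gain, loss)
            best = min((0, 1), key=lambda t: c[t] + step(state, t, gain, loss))
            if best != state:
                events.append(("gain" if best == 1 else "loss", leaves(child)))
            descend(child, best)

    root = costs(tree, pattern, gain, loss)
    root_state = min((0, 1), key=lambda t: root[t] + step(0, t, gain, loss))
    if root_state == 1:
        events.append(("gain", leaves(tree)))
    descend(tree, root_state)
    return events


# Hypothetical toy input: a cognate attested in the three Romance languages and in English.
TREE: Tree = (("Spanish", "French", "Italian"), ("Danish", "English", "German"))
PATTERN = {"Spanish": 1, "French": 1, "Italian": 1, "Danish": 0, "English": 1, "German": 0}

if __name__ == "__main__":
    for gain, loss in [(1.0, 1.0), (3.0, 1.0)]:
        events = map_gains_losses(TREE, PATTERN, gain, loss)
        n_gains = sum(1 for kind, _ in events if kind == "gain")
        print(f"gain={gain}, loss={loss}: {n_gains} gain(s)", events)
```

With equal weights, the cheapest scenario has two gains (one in the Romance subgroup, one in English), so the pattern would be flagged as a borrowing candidate. When a gain costs three times as much as a loss, a single gain at the root plus losses in Danish and German is cheaper. Choosing between such weighting models is exactly what the external criterion in the workflow is for.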

Slide 79

Application: Indo-European Data (List et al. 2014)
Data
• 40 Indo-European languages (taken from IELex, Dunn 2012)
• 1,190 cognate sets (207 semantic glosses)
• 105 cognate sets contain known borrowings
• traditional reference tree, reflecting a very broad consensus, taken from Ethnologue (Lewis and Fennig 2013)

Slide 80

Application: Indo-European Data (List et al. 2014)
Analysis
• bottom-up, parsimony-based approach to gain-loss mapping, using different weight ratios for gain and loss events
• the modified analysis allows for multifurcating (polytomic) reference trees
• a specific factor for parallel evolution was added to the evaluation procedure
• implemented as part of LingPy, a Python library for quantitative tasks in historical linguistics (http://lingpy.org, Version 2.2, List et al. 2013)

Slide 81

Application: Indo-European Data (List et al. 2014)
Results
• 76 of the 105 cognate sets containing known borrowings were correctly identified as borrowings
• 31% of all cognate sets could not be properly explained by the reference tree
• 17 out of 19 borrowings in English were correctly identified
• well-known contact situations among major groups and languages were correctly identified

Slide 82

Application: Indo-European Data (List et al. 2014)

Slide 83

Application: Chinese Dialects (List et al. forthcoming)
Data
• lexical data of 40 Chinese dialects (Hóu 2004)
• 1,056 cognate sets (180 semantic glosses)
• two traditional reference trees reflecting competing hypotheses, and two automatically generated reference trees (Neighbor-Joining and UPGMA)

Slide 84

Application: Chinese Dialects (List et al. forthcoming)
Analysis
• calculate minimal spatial networks by plotting the inferred lateral connections onto geographic maps

Slide 85

Application: Chinese Dialects (List et al. forthcoming)
Results
• between 48% (UPGMA) and 55% (Neighbor-Joining) of the characters (cognate sets) cannot be explained by the reference trees
• although it does not show the highest degree (weighted or unweighted) in the minimal lateral network, Běijīng Chinese shows the highest proportion of cognate sets suggestive of borrowing (40-42%): this reflects the important role of Běijīng Chinese as the current standard language for interdialectal communication and education in China
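The per-language proportion is a different measure from node degree in the network: it is computed for each variety over its own cognate sets. A minimal sketch of that bookkeeping under our reading of the measure, assuming each cognate set comes with the varieties in which it occurs and the number of gains inferred by the best model. The labels A, B, C and the data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def borrowing_proportions(patterns: Dict[str, Tuple[Iterable[str], int]]) -> Dict[str, float]:
    """For each variety, the share of its cognate sets whose best
    gain-loss scenario needs more than one gain."""
    total: Counter = Counter()
    flagged: Counter = Counter()
    for varieties, n_gains in patterns.values():
        for v in varieties:
            total[v] += 1
            if n_gains > 1:  # more than one origin: suggestive of borrowing
                flagged[v] += 1
    return {v: flagged[v] / total[v] for v in total}


if __name__ == "__main__":
    toy = {  # hypothetical: (varieties where present, inferred number of gains)
        "cog1": ({"A", "B"}, 2),
        "cog2": ({"C"}, 1),
        "cog3": ({"A", "C"}, 1),
    }
    print(borrowing_proportions(toy))
```

A variety can thus rank high on this proportion while having fewer lateral edges than others, because the proportion is normalized by the number of cognate sets it participates in.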

Slide 86

Application: Chinese Dialects (List et al. forthcoming)
[Figure: reference tree of the Chinese dialects (legend: inferred links)]

Slide 87

Application: Chinese Dialects (List et al. forthcoming)
[Figure: MLN analysis, no borrowing allowed (legend: inferred links)]

Slide 88

Application: Chinese Dialects (List et al. forthcoming)
[Figure: MLN analysis, best fit of borrowing and inheritance (legend: inferred links, weights 1, 4, 8)]

Slide 89

Application: Chinese Dialects (List et al. forthcoming)
[Figure: map of the inferred links between the 40 dialect locations, grouped as Guānhuà (Mandarin), Xiàng, Mǐn, Yuè, Wú, Jìn, Kèjiā (Hakka), Gàn, and Huī (legend: inferred links, weights 1, 7, 15)]
Map key: 1 Běijīng 北京; 2 Chángshā 长沙; 3 Chéngdū 成都; 4 Fùzhōu 福州; 5 Guǎngzhōu 广州; 6 Guìyáng 贵阳; 7 Harbin 哈尔滨; 8 Hǎikǒu 海口; 9 Hángzhōu 杭州; 10 Héfèi 合肥; 11 Hohhot 呼和浩特; 12 Jiàn'ōu 建瓯; 13 Jǐnán 济南; 14 Kùnmíng 昆明; 15 Lánzhōu 兰州; 16 Měixiàn 梅县; 17 Nánchàng 南昌; 18 Nánjīng 南京; 19 Nánníng 南宁; 20 Píngyáo 平遥; 21 Qīngdǎo 青岛; 22 Shànghǎi 上海; 23 Shāntóu 汕头; 24 Shèxiàn 歙县; 25 Sùzhōu 苏州; 26 Táiběi 台北; 27 Tàiyuán 太原; 28 Táoyuán 桃园; 29 Tiānjìn 天津; 30 Tūnxī 屯溪; 31 Wénzhōu 温州; 32 Wǔhàn 武汉; 33 Ürümqi 乌鲁木齐; 34 Xiàmén 厦门; 35 Hongkong 香港; 36 Xiāngtàn 湘潭; 37 Xīníng 西宁; 38 Xī'ān 西安; 39 Yínchuān 银川; 40 Zhèngzhōu 郑州

Slide 90

Application: Chinese Dialects (work in progress)
Item “sun”
[Figure: gain-loss mapping of the item “sun” on the reference tree of the 40 dialects; forms 太阳, 日头, 热头, 阳婆, 日 (legend: loss event, gain event)]

Slide 91

Application: Chinese Dialects (work in progress)
Item “sun”
[Figure: gain and loss events for the forms 太阳 and 日头 in Shànghǎi, Hongkong, Táiběi, Nánjīng, Táoyuán, Běijīng, Měixiàn, Xiàmén, Fùzhōu, and Guǎngzhōu]

Slide 94

Outlook

Slide 98

Outlook
• further test the MLN method on linguistic data
• increase the transparency of the results, in order to provide linguistic experts with a valid starting point for further, not necessarily automatic, research
• improve the capability of the models: similar to gene fusion in biology, complex processes of compounding regularly contribute to lexical change, and gain-loss models are not enough to deal with these cases of partial homology
Thank you for listening!