“These assumptions, which follow logically from the results of our research, can be best illustrated by the image of a branching tree.” (Schleicher 1853: 787)
“You can turn it as you want, but as long as you stick to the idea that the historically attested languages have been developing by multiple furcations of an ancestral language, that is, as long as you assume that there is a Stammbaum [family tree] of the Indo-European languages, you will never be able to explain all facts which have been assembled in a scientifically satisfying way.” (Schmidt 1872: 17, my translation)
“I want to replace [the tree] by the image of a wave that spreads out from the center in concentric circles becoming weaker and weaker the farther they get away from the center.” (Schmidt 1872: 27, my translation)
Trees are bad, because...
- they are difficult to reconstruct
- languages do not always split
- they are boring, since they only model the vertical aspects of language history
Waves are bad, because...
- nobody knows how to reconstruct them
- languages still diverge, even if not necessarily in split processes
- they are boring, since they only model the horizontal aspects of language history
to the Past
Sir Charles Lyell: The Geological Evidences of the Antiquity of Man with Remarks on Theories of the Origin of Species by Variation. London: John Murray, Albemarle Street, 1863.
If we knew nothing of the existence of Latin, if all historical documents previous to the fifteenth century had been lost, if tradition even was silent as to the former existence of a Roman empire, a mere comparison of the Italian, Spanish, Portuguese, French, Wallachian, and Rhaetian dialects would enable us to say that at some time there must have been a language, from which these six modern dialects derive their origin in common.
to the Past: Uniformitarianism (C. Lyell)
- Uniformity of Change: Laws of change are uniform. They have applied in the past as they apply now and will apply in the future, no matter at which place.
- Graduality of Change: Change proceeds gradually, not abruptly.
- Abductive Reasoning: We can infer past events and processes by investigating patterns observed in the present, which becomes the “key to the interpretation of some mystery in the archives of remote ages” (Lyell 1830: 165).
to the Past: Uniformitarianism (A. Schleicher)
Language change
- is a gradual process (Schleicher 1848: 25),
- is a law-like process (Schleicher 1848: 25),
- is a natural process which occurs in all languages (Schleicher 1848: 25),
- is a universal process which occurs at all times (Schleicher 1863[1873]: 10f),
- allows us to infer past processes and extinct languages by investigating the languages of the present (see Schleicher 1848: 25).
to the Past: Summary
It was not the direct exchange of ideas that led to the development of similar approaches in biology and linguistics, but the astonishing fact that scholars in both fields detected, at about the same time, striking parallels between the two disciplines, regarding both their theoretical foundations and the processes they were investigating.
And linguists were the first to draw trees!
to the Past: Summary
Timeline (List et al., in preparation):
1663 Schottel's Tabelle
1671 Stiernhielm's Lingua Nova
1689 Hicke's Affinitas
1755 De Buffon's Table
1774 Rühling's Tabula
ca. 1800 Gallet's Arbre
1809 Lamarck's Tableaux
1837 Darwin's Tree Sketch
1853 Schleicher's Stammbaum
1853 Čelakovský's Rodový Kmen
1859 Darwin's Origin
Turn
- “Indo-European and computational cladistics” (Ringe, Warnow and Taylor 2002)
- “Language-tree divergence times support the Anatolian theory of Indo-European origin” (Gray and Atkinson 2003)
- “Language classification by numbers” (McMahon and McMahon 2005)
- “Curious Parallels and Curious Connections: Phylogenetic Thinking in Biology and Historical Linguistics” (Atkinson and Gray 2005)
- “Automated classification of the world’s languages” (Brown et al. 2008)
- “Indo-European languages tree by Levenshtein distance” (Serva and Petroni 2008)
- “Networks uncover hidden lexical borrowing in Indo-European language evolution” (Nelson-Sathi et al. 2011)
Turn: Words as Genes
Basic Concept   German     ID    English   ID    Italian   ID    French   ID
HAND            Hand       1     hand      1     mano      2     main     2
BLOOD           Blut       3     blood     3     sangue    4     sang     4
HEAD            Kopf       5     head      6     testa     7     tête     7
TOOTH           Zahn       8     tooth     8     dente     8     dent     8
TO SLEEP        schlafen   9     sleep     9     dormir    10    dormir   10
TO SAY          sagen      11    say       11    dire      12    dire     12
...             ...        ...   ...       ...   ...       ...   ...      ...
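The step from the cognate table to binary characters can be sketched in Python. This is a minimal illustration, assuming that each cognate-set ID becomes one presence/absence character. `COGNATES` and `binary_matrix` are hypothetical names introduced here and are not part of any published tool.

```python
# Cognate-set IDs copied from the table above: concept -> {language: ID}.
COGNATES = {
    "HAND": {"German": 1, "English": 1, "Italian": 2, "French": 2},
    "BLOOD": {"German": 3, "English": 3, "Italian": 4, "French": 4},
    "HEAD": {"German": 5, "English": 6, "Italian": 7, "French": 7},
    "TOOTH": {"German": 8, "English": 8, "Italian": 8, "French": 8},
    "TO SLEEP": {"German": 9, "English": 9, "Italian": 10, "French": 10},
    "TO SAY": {"German": 11, "English": 11, "Italian": 12, "French": 12},
}


def binary_matrix(cognates):
    """Turn cognate-set IDs into presence/absence characters, one per ID.

    Assumes IDs are unique across concepts (as in the table) and codes a
    missing entry as absence (0); real analyses often use '?' instead.
    """
    languages = sorted({lang for row in cognates.values() for lang in row})
    ids = sorted({cid for row in cognates.values() for cid in row.values()})
    present = {
        lang: {row[lang] for row in cognates.values() if lang in row}
        for lang in languages
    }
    return {
        lang: "".join("1" if cid in present[lang] else "0" for cid in ids)
        for lang in languages
    }


if __name__ == "__main__":
    for language, characters in binary_matrix(COGNATES).items():
        print(f"{language:8} {characters}")
```

Each column of the resulting strings is one cognate set. This is the same kind of binary coding as characters A, B and C in the next table.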
Turn: Words as Genes
Char.   English   German   French   Italian
A       1         1        0        0
B       1         0        0        0
C       1         1        0        1
Tree: the ancestral node 001 splits into a node 101 (+A) and a node 001. Node 101 leads to English 111 (+B) and German 101. Node 001 leads to Italian 001 and French 000 (−C).
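One common way to score a tree against such binary characters is Fitch parsimony, which counts the minimal number of changes a tree requires. The slides do not say which method was used. The sketch below uses hypothetical `fitch` and `tree_length` functions to show two results:
- the tree with a Germanic and a Romance subgroup explains characters A, B and C with three changes, matching +A, +B and −C;
- an alternative grouping needs more changes.

```python
# Binary characters from the table above (1 = present, 0 = absent).
CHARACTERS = {
    "A": {"English": 1, "German": 1, "French": 0, "Italian": 0},
    "B": {"English": 1, "German": 0, "French": 0, "Italian": 0},
    "C": {"English": 1, "German": 1, "French": 0, "Italian": 1},
}


def fitch(tree, states):
    """Fitch parsimony for one character on a rooted binary tree.

    A tree is either a leaf name (str) or a pair (left, right).
    Returns the candidate state set at the root and the number of changes.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No shared state: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, characters):
    """Sum the minimal number of changes over all characters."""
    return sum(fitch(tree, states)[1] for states in characters.values())


if __name__ == "__main__":
    germanic_romance = (("English", "German"), ("French", "Italian"))
    mixed = (("English", "French"), ("German", "Italian"))
    print(tree_length(germanic_romance, CHARACTERS))  # 3
    print(tree_length(mixed, CHARACTERS))  # 4
```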
Turn: Sounds as Nucleobases
Edit distances between orthographic entries (G = German, E = English, I = Italian, F = French)
“HAND”    G   E   I   F
Hand      0   1   2   3
hand      1   0   2   3
mano      2   2   0   2
main      3   3   2   0

“BLOOD”   G   E   I   F
Blut      0   4   5   4
blood     4   0   6   5
sangue    5   6   0   2
sang      4   5   2   0
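The values in both tables can be reproduced with a plain, case-sensitive Levenshtein distance, so Hand and hand differ by one substitution. The slides do not name the exact measure. Treat the following as a sketch that happens to match the numbers, not as the original implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimal number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution or match
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    words = ["Blut", "blood", "sangue", "sang"]
    for w1 in words:
        print(w1.ljust(7), [levenshtein(w1, w2) for w2 in words])
```

Comparing raw spellings ignores that letters and sounds differ across orthographies. This is one reason why later steps work on phonetic segments instead.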
Parallels
Parallels between Species and Languages (Pagel 2009)
Aspect                species                                languages
unit of replication   gene                                   word
replication           asexual and sexual reproduction        learning
speciation            cladogenesis                           language split
forces of change      natural selection and genetic drift    social selection and trends
differentiation       tree-like                              tree-like
Parallels
Differences between Species and Languages (Geisler & List 2013)
Aspect                               Species                              Languages
domain                               Popper’s World I                     Popper’s World III
relation between form and function   mechanical                           arbitrary
origin                               monogenesis                          unclear
sequence similarity                  universal (independent of species)   language-specific
differentiation                      tree-like                            network-like
Meaning     Latin     Italian
‘FEATHER’   pluːma    pjuma
‘FLAT’      plaːnus   pjano
‘SQUARE’    plateːa   pjaʦːa
Rule: l > j

Meaning     Latin     Italian
‘TONGUE’    liŋgua    liŋgwa
‘MOON’      luːna     luna
‘SLOW’      lentus    lento
Rule: l > l

Combined: l > j / p _
It is not sounds that change, but sound systems (Bloomfield 1933)!
Sound change depends on the context in which the sounds occur!
Sound change largely follows irreversible patterns!
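Regular expressions make the difference between an unconditioned rule (l > j) and a context-sensitive rule (l > j / p _) explicit; a lookbehind expresses the left context. This is a toy sketch in which only the l > j change is modelled. Other differences between the Latin and Italian forms are left untouched: vowel length, endings and the affricate in pjaʦːa.

```python
import re

# Latin forms from the tables above.
LATIN = ["pluːma", "plaːnus", "plateːa", "liŋgua", "luːna", "lentus"]

# l > j: unconditioned, so every l changes.
UNCONDITIONED = r"l"
# l > j / p _: only an l directly after p changes; the lookbehind keeps the p itself.
CONDITIONED = r"(?<=p)l"


def apply_rule(word: str, pattern: str, replacement: str) -> str:
    """Apply a single sound-change rule, written as a regular expression, to a word."""
    return re.sub(pattern, replacement, word)


if __name__ == "__main__":
    for word in LATIN:
        print(word, apply_rule(word, UNCONDITIONED, "j"), apply_rule(word, CONDITIONED, "j"))
```

The unconditioned rule wrongly turns liŋgua into jiŋgua. The conditioned rule changes only the three forms with p before l.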
Cognate List and Alignment
German dünn   d ʏ n     English thin    θ ɪ n
German Ding   d ɪ ŋ     English thing   θ ɪ ŋ
German Dorn   d ɔɐ n    English thorn   θ ɔː n
German dumm   d ʊ m     English dumb    d ʌ m

Correspondence List
GER   ENG   Freq.
d     θ     3 ×
n     n     2 ×
ŋ     ŋ     1 ×
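Counting correspondences from aligned cognates amounts to tallying segment pairs column by column. The sketch assumes alignments given as lists of segments of equal length. It counts every column, vowels included, whereas the correspondence list above shows consonants only. Like the counts above, it leaves out dumm : dumb. `ALIGNMENTS` and `correspondences` are illustrative names.

```python
from collections import Counter

# Aligned cognate pairs from above, one list of segments per word.
# "ɔɐ" and "ɔː" are treated as single segments, as in the alignment.
ALIGNMENTS = [
    (["d", "ʏ", "n"], ["θ", "ɪ", "n"]),    # dünn : thin
    (["d", "ɪ", "ŋ"], ["θ", "ɪ", "ŋ"]),    # Ding : thing
    (["d", "ɔɐ", "n"], ["θ", "ɔː", "n"]),  # Dorn : thorn
]


def correspondences(alignments):
    """Count how often each German/English segment pair shares a column."""
    counts = Counter()
    for german, english in alignments:
        if len(german) != len(english):
            raise ValueError("aligned words must have the same length")
        counts.update(zip(german, english))
    return counts


if __name__ == "__main__":
    for (ger, eng), freq in correspondences(ALIGNMENTS).most_common():
        print(ger, eng, freq)
```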
A little experiment...
- data from 8 Bai dialects (Bai is a Sino-Tibetan language spoken in China; Allen 2007)
- cognate (homologous) parts in all words were aligned
- from the aligned sounds, a network was reconstructed, showing how frequently homologous sounds occur in the same column of an alignment
- the network was further clustered using Markov clustering (Dongen 2002) to detect community structure
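The two steps above can be sketched roughly in pure Python:
1. Sounds that occur in the same alignment column are linked, weighted by how often they co-occur.
2. The network is partitioned with a textbook version of Markov clustering: expansion by squaring the column-stochastic matrix, inflation by an elementwise power.

The self-loops, the parameter values and the toy data are assumptions. This is not the reference implementation of Markov clustering.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(alignments):
    """Weight each pair of distinct sounds by how often they share an alignment column.

    An alignment is a list of rows (one per dialect); each row is a list of
    segments of equal length. Gaps are written as "-" and ignored.
    """
    edges = Counter()
    for alignment in alignments:
        for column in zip(*alignment):
            sounds = sorted({segment for segment in column if segment != "-"})
            for a, b in combinations(sounds, 2):
                edges[(a, b)] += 1
    return edges


def markov_clustering(edges, inflation=2.0, max_iter=100, tol=1e-9):
    """Textbook Markov clustering on a weighted, undirected graph."""
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return []
    index = {node: i for i, node in enumerate(nodes)}
    size = len(nodes)
    matrix = [[0.0] * size for _ in range(size)]
    for (a, b), weight in edges.items():
        matrix[index[a]][index[b]] = matrix[index[b]][index[a]] = float(weight)
    for i in range(size):
        matrix[i][i] = 1.0  # self-loops, a common choice for MCL

    def normalize(m):
        # Make every column sum to 1 (column-stochastic matrix).
        sums = [sum(m[i][j] for i in range(size)) for j in range(size)]
        return [[m[i][j] / sums[j] for j in range(size)] for i in range(size)]

    matrix = normalize(matrix)
    for _ in range(max_iter):
        # Expansion: square the matrix (random walks of length two).
        expanded = [[sum(matrix[i][k] * matrix[k][j] for k in range(size))
                     for j in range(size)] for i in range(size)]
        # Inflation: raise entries to a power and renormalize the columns.
        inflated = normalize([[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - matrix[i][j])
                    for i in range(size) for j in range(size))
        matrix = inflated
        if delta < tol:
            break
    # Rows with a non-zero diagonal are attractors; their non-zero columns form a cluster.
    clusters = {frozenset(nodes[j] for j in range(size) if matrix[i][j] > tol)
                for i in range(size) if matrix[i][i] > tol}
    return sorted(tuple(sorted(cluster)) for cluster in clusters)


if __name__ == "__main__":
    # Hypothetical toy alignments: each has one column shared by three dialects.
    toy = [[["t"], ["d"], ["ʈ"]], [["s"], ["z"], ["ʂ"]]]
    print(markov_clustering(cooccurrence_network(toy)))
```

With real dialect data, the inflation parameter controls granularity: higher values tend to produce more and smaller clusters.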
[Figure: clustered network of corresponding sounds and tones in the Bai dialects]
- so far, we only explore the patterns; we do not yet analyse them
- as a next step, we need to start thinking about ways to infer potential directions of change
- we also need more rigorous ways to handle the context of change patterns, as context is one of the major factors conditioning sound change
- we only use monopartite networks in this exploration, which do not show which sound occurs in which language; for a deep analysis, we will need to include the languages in which the sounds occur
Semantic change plays a crucial role in language change. Although most linguists assume that it proceeds according to certain general patterns, we currently lack the empirical basis to pursue this question in depth. Normally, semantic change proceeds by cumulation and reduction.
Proto-Germanic *kuppa- (k u pː a) “vessel”: monosemy phase
Pre-German *kop (k ɔ p) “vessel”, “head”: polysemy phase, reached by cumulation
German Kopf (k ɔ p͡f) “head”: monosemy phase, reached by reduction
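Cumulation and reduction can be expressed as comparisons between sets of meanings: a form gains meanings during cumulation and loses them during reduction. The sketch below encodes the Kopf example this way. The function `classify` and the extra label "shift" (gains and losses in one step) are hypothetical additions for completeness.

```python
def classify(before: set, after: set) -> str:
    """Label one step of semantic change by comparing the sets of meanings of a form."""
    if before == after:
        return "no change"
    if before < after:
        return "cumulation"  # the form gains meanings
    if after < before:
        return "reduction"  # the form loses meanings
    return "shift"  # hypothetical label: meanings gained and lost in one step


# Stages of German Kopf, as in the example above.
STAGES = [
    ("Proto-Germanic *kuppa-", {"vessel"}),
    ("Pre-German *kop", {"vessel", "head"}),
    ("German Kopf", {"head"}),
]

if __name__ == "__main__":
    for (old_name, old), (new_name, new) in zip(STAGES, STAGES[1:]):
        print(f"{old_name} > {new_name}: {classify(old, new)}")
```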
Concept "money" is part of a cluster with the central concept "fishscale", with a total of 10 nodes: silver; leather; fishscale; bark; coin; fur; snail; skin, hide; money; shell.
49 links for "silver" and "money" (excerpt):
No.   Language                Family          Form
1.    Ignaciano               Arawakan        ne
2.    Aymara, Central         Aymaran         ḳulʸḳi
3.    Tsafiki                 Barbacoan       kaˈla
4.    Seselwa Creole French   Creole          larzan
5.    Miao, White             Hmong-Mien      nyiaj
6.    Breton                  Indo-European   arhant
7.    French                  Indo-European   argent
8.    Gaelic, Irish           Indo-European   airgead
9.    Welsh                   Indo-European   arian
10.   Cofán                   Isolate         koriΦĩʔdi
Concept "wheel" is part of a cluster with the central concept "leg", with a total of 11 nodes: sphere, ball; round; footprint; foot; calf of leg; circle; thigh; wheel; leg; hip; buttocks.
6 links for "foot" and "wheel":
No.   Language    Family         Form
1.    Cofán       Isolate        c̷ɨʔtʰe
2.    Puinave     Isolate        sim
3.    Yaminahua   Panoan         taɨ
4.    Wayampi     Tupi           pɨ
5.    Pumé        Unclassified   taɔ
6.    Ninam       Yanomam        mãhuk
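Links like those above rest on colexification: one form in one language expresses two concepts. The sketch below counts such links from a word list of (language, concept, form) triples. Counting each language once per concept pair is an assumption about how the link numbers are derived. The small word list only repeats forms shown in the two tables.

```python
from collections import defaultdict
from itertools import combinations


def colexifications(wordlist):
    """Count, for each pair of concepts, the languages that express both with one form.

    wordlist: iterable of (language, concept, form) triples.
    """
    concepts_by_form = defaultdict(set)
    for language, concept, form in wordlist:
        concepts_by_form[(language, form)].add(concept)
    languages_by_pair = defaultdict(set)
    for (language, _form), concepts in concepts_by_form.items():
        for pair in combinations(sorted(concepts), 2):
            languages_by_pair[pair].add(language)
    return {pair: len(langs) for pair, langs in languages_by_pair.items()}


# A few colexifications listed in the tables above.
WORDLIST = [
    ("French", "silver", "argent"), ("French", "money", "argent"),
    ("Welsh", "silver", "arian"), ("Welsh", "money", "arian"),
    ("Puinave", "foot", "sim"), ("Puinave", "wheel", "sim"),
]

if __name__ == "__main__":
    print(colexifications(WORDLIST))
```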
- so far, we use monopartite networks and rather simple community-detection algorithms for our modelling; as a result, we lose signal
- words do not change their meaning in isolation: semantic change is often interconnected, and the change of meaning in one word goes along with changes in other words
- bipartite networks seem to be a straightforward way to account for these interdependencies
- we only compare the meanings of words in isolation, but we know that the meaning of a word can be compositional, involving complex structures of denotation (compare “apple tree”, “grandfather”, etc.)
- by investigating partial colexifications (partial polysemy), we may gain new insights into the roads of perception and denotation
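A bipartite network with concepts on one side and morphemes on the other captures partial colexification directly. Projecting it onto the concepts links those whose forms share a morpheme. The sketch assumes forms already segmented into morphemes; here the English examples from the text are segmented by hand. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Morpheme-segmented English forms (segmentation supplied by hand for illustration).
FORMS = {
    "APPLE": ["apple"],
    "TREE": ["tree"],
    "APPLE TREE": ["apple", "tree"],
    "FATHER": ["father"],
    "GRANDFATHER": ["grand", "father"],
}


def bipartite_edges(forms):
    """Edges of a bipartite network linking each concept to the morphemes in its form."""
    return {(concept, morpheme) for concept, morphemes in forms.items() for morpheme in morphemes}


def partial_colexifications(forms):
    """Project the bipartite network onto concepts: link concepts that share a morpheme."""
    concepts_by_morpheme = defaultdict(set)
    for concept, morpheme in bipartite_edges(forms):
        concepts_by_morpheme[morpheme].add(concept)
    links = defaultdict(set)
    for morpheme, concepts in concepts_by_morpheme.items():
        for pair in combinations(sorted(concepts), 2):
            links[pair].add(morpheme)
    return dict(links)


if __name__ == "__main__":
    for pair, shared in sorted(partial_colexifications(FORMS).items()):
        print(pair, shared)
```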
One beer please! A beer for me! Beer? Please? You have beer? I'm thirsty, but I do not drink water; can you help me? I want the same as everybody else here.
We can think of many different ways to express a certain meaning, but although the potential is virtually unlimited, the roads of denotation, that is, the mechanisms by which words are formed from morphemes, follow certain recurring patterns across all languages. Comparing these patterns can give us important insights into human cognition.
On the other hand, the fact that words are often formed from smaller parts, be it by compounding existing words or by using specific morphemes to derive new words, makes it very difficult to identify homologous words automatically! What are the mechanisms by which the roads of denotation are created across the world's languages? How can we distinguish direct homologues (orthologues) from indirect ones (partial homologues, etc.) in phylogenetic models or homologue detection?
Indo-European: ˈsoh₂-wl̩- SUN, sh₂uˈen- SUN
Germanic: soːwel- SUN, sunːoː- SUN
German zɔnə SUN, Swedish suːl SUN
Romance: soːl- SUN, soːlikul- SMALL SUN
French solej SUN, Spanish sol SUN
The branches are labelled with one semantic shift and four morphological changes.
A: Indo-European *deh₃-, *deh₃-no-; Latin dare, dōnum, dōnāre; Italian dare, French donner
B: Indo-European *sóh₂-wl̩-, *sh₂én-; Germanic *sōwel-, *sunnō-; Latin sol, soliculus; Italian sole, French soleil, Swedish sol, German Sonne
Automatic Detection of Partial Cognates: The Problem
- in languages in which words are frequently created by compounding, the identification of homologous words is extremely difficult
- current phylogenetic models cannot handle partial homology, and as a result, very important signal is lost
- current methods for automatic homologue detection in linguistics also cannot handle partial homologues and show very low accuracy in languages where compounding is frequent (especially in the languages of South-East Asia)
German      m oː n t -
English     m uː n - -
Danish      m ɔː n - ə
Swedish     m oː n - e
            "MOON"

            "MOON"        "SHINE"       "LIGHT"
Fúzhōu      ŋ u o ʔ ⁵     - - - - -     - - - - -
Měixiàn     ŋ i a t ⁵     - - - - -     k u o ŋ ⁴⁴
Guǎngzhōu   j - y t ²     l - œ ŋ ²²    - - - - -
Běijīng     - y ɛ - ⁵¹    l i ɑ ŋ -     - - - - -
Automatic Detection of Partial Cognates: The Solution
- use sequence similarity networks to determine the similarity between the parts of the words in the data
- apply filters to reduce the edges in the similarity networks
- use a community detection algorithm to further partition the data into clusters
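The three steps can be illustrated with a deliberately simple sketch built on these assumptions:
- words are already split into morphemes;
- normalized edit distance serves as the similarity measure;
- a single filter applies: no links between morphemes of the same word;
- connected components stand in for a proper community detection algorithm.

The scoring, filters and clustering of the published method differ. The language names and forms in the usage example are hypothetical.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (char_a != char_b)))
        previous = current
    return previous[-1]


def partial_cognates(words, threshold=0.5):
    """Cluster the morphemes of words that express the same concept.

    words maps a (hypothetical) language name to its word, given as a list of morphemes.
    """
    nodes = [(lang, i) for lang in sorted(words) for i in range(len(words[lang]))]
    parent = {node: node for node in nodes}

    def find(node):
        # Union-find with path halving.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for (lang1, i1), (lang2, i2) in combinations(nodes, 2):
        if lang1 == lang2:
            continue  # filter: never link two morphemes of the same word
        a, b = words[lang1][i1], words[lang2][i2]
        # Similarity network: link morphemes whose normalized distance is small enough.
        if edit_distance(a, b) / max(len(a), len(b)) <= threshold:
            parent[find((lang1, i1))] = find((lang2, i2))

    # Partition: connected components, a simple stand-in for community detection.
    clusters = {}
    for node in nodes:
        clusters.setdefault(find(node), []).append(node)
    return sorted(clusters.values())


if __name__ == "__main__":
    toy = {"L1": ["tala", "mo"], "L2": ["tala", "ki"], "L3": ["tela", "mu"]}
    for cluster in partial_cognates(toy):
        print(cluster)
```

In the toy data, the first morphemes of all three words form one partial cognate set. The second morphemes of L1 and L3 form another, and L2's second morpheme stays alone.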
Automatic Detection of Partial Cognates: The Solution
- with the help of sequence similarity networks, we (List, Lopez, and Bapteste 2016) have created the first algorithm to detect partial cognates (homologues) in linguistic data
- our method clearly outperforms traditional methods, gaining more than 5% in accuracy on our test sets
- the algorithm is also very fast and can easily be applied to considerably large datasets
- with our new algorithm for partial cognate detection based on sequence similarity networks, we have opened the door to the fast creation of large datasets for language families in historical linguistics which could so far not be sufficiently analysed with phylogenetic methods
- unfortunately, however, we lack the phylogenetic models to analyse these data further (List 2016 shows that we need multi-state models in order to handle partial homology sufficiently)
- our knowledge of the underlying processes from an evolutionary perspective is also not very profound, and we need to find new ways to study the roads of denotation across the languages of the world
processes which have triggered the linguistic diversity we observe today. By reducing the investigation of language evolution to the search for phylogenetic trees, we deprive ourselves of an abundance of data which can offer new explanations for the development of individual language families, universal characteristics of language change, and even universal characteristics of human cognition. Whether evolutionary processes in biology and linguistics are indeed similar is difficult to tell. However, by carefully comparing the commonalities, we may find ways to successfully transfer and adapt methods across disciplines, and also to gain new insights into overarching processes of evolution.