and Biology
Johann-Mattis List
Department of Linguistic and Cultural Evolution
Max Planck Institute for the Science of Human History
Jena, 2018/03/14
“These assumptions, which follow logically from the results of our research, can be best illustrated by the image of a branching tree.” (Schleicher 1853: 787)
“You can turn it as you want, but as long as you stick to the idea that the historically attested languages have been developing by multiple furcations of an ancestral language, that is, as long as you assume that there is a Stammbaum [family tree] of the Indo-European languages, you will never be able to explain all facts which have been assembled in a scientifically satisfying way.” (Schmidt 1872: 17, my translation)
“I want to replace [the tree] by the image of a wave that spreads out from the center in concentric circles becoming weaker and weaker the farther they get away from the center.” (Schmidt 1872: 27, my translation)
Trees are bad, because...
- they are difficult to reconstruct
- languages do not always split
- they are boring, since they only model the vertical aspects of language history

Waves are bad, because...
- nobody knows how to reconstruct them
- languages still diverge, even if not necessarily in split processes
- they are boring, since they only model the horizontal aspects of language history
Schuchardt (1842–1927)
“We connect the branches and twigs of the tree with countless horizontal lines and it ceases to be a tree.” (Schuchardt 1870 [1900]: 11)
to the Past
The Geological Evidences of the Antiquity of Man, with Remarks on Theories of the Origin of Species by Variation. By Sir Charles Lyell. London: John Murray, Albemarle Street, 1863.
to the Past
If we knew nothing of the existence of Latin, - if all historical documents previous to the fifteenth century had been lost, - if tradition even was silent as to the former existence of a Roman empire, a mere comparison of the Italian, Spanish, Portuguese, French, Wallachian, and Rhaetian dialects would enable us to say that at some time there must have been a language, from which these six modern dialects derive their origin in common.
to the Past: Uniformitarianism (C. Lyell)
- Uniformity of Change: Laws of change are uniform. They have applied in the past as they apply now and will apply in the future, no matter at which place.
- Graduality of Change: Change proceeds gradually, not abruptly.
- Abductive Reasoning: We can infer past events and processes by investigating patterns observed in the present, which becomes the “key to the interpretation of some mystery in the archives of remote ages” (Lyell 1830: 165).
to the Past: Uniformitarianism (A. Schleicher)
Language change
- is a gradual process (Schleicher 1848: 25),
- is a law-like process (Schleicher 1848: 25),
- is a natural process which occurs in all languages (Schleicher 1848: 25),
- is a universal process which occurs at all times (Schleicher 1863[1873]: 10f),
- allows us to infer past processes and extinct languages by investigating the languages of the present (see Schleicher 1848: 25).
to the Past: Summary
It was not the direct exchange of ideas that led to the development of similar approaches in biology and linguistics, but the astonishing fact that scholars in both fields, at about the same time, detected striking parallels between the two disciplines, regarding both their theoretical foundations and the processes they were investigating.
And linguists were the first to draw trees!
Turn: Words as Genes
Basic Concept   German     ID   English   ID   Italian   ID   French   ID
HAND            Hand        1   hand       1   mano       2   main      2
BLOOD           Blut        3   blood      3   sangue     4   sang      4
HEAD            Kopf        5   head       6   testa      7   tête      7
TOOTH           Zahn        8   tooth      8   dente      8   dent      8
TO SLEEP        schlafen    9   sleep      9   dormir    10   dormir   10
TO SAY          sagen      11   say       11   dire      12   dire     12
...             ...        ...  ...       ...  ...       ...  ...      ...
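Treating words as genes amounts to recoding each cognate set as a presence/absence character, one binary column per cognate ID. A minimal sketch using the six concepts from the table; the names `COGNATES` and `binary_matrix` are illustrative and not taken from any particular tool.

```python
# Cognate IDs from the table above: concept -> {language: cognate-set ID}
COGNATES = {
    "HAND": {"German": 1, "English": 1, "Italian": 2, "French": 2},
    "BLOOD": {"German": 3, "English": 3, "Italian": 4, "French": 4},
    "HEAD": {"German": 5, "English": 6, "Italian": 7, "French": 7},
    "TOOTH": {"German": 8, "English": 8, "Italian": 8, "French": 8},
    "TO SLEEP": {"German": 9, "English": 9, "Italian": 10, "French": 10},
    "TO SAY": {"German": 11, "English": 11, "Italian": 12, "French": 12},
}


def binary_matrix(cognates):
    """Recode cognate IDs as one presence (1) / absence (0) character per cognate set."""
    languages = sorted({lang for row in cognates.values() for lang in row})
    ids = sorted({cid for row in cognates.values() for cid in row.values()})
    matrix = {lang: "" for lang in languages}
    for cid in ids:
        for lang in languages:
            present = any(row.get(lang) == cid for row in cognates.values())
            matrix[lang] += "1" if present else "0"
    return matrix


for language, characters in binary_matrix(COGNATES).items():
    print(f"{language:8} {characters}")
```

Character 8 (TOOTH) is 1 in all four languages, so it carries no information about subgrouping; only sets that split the languages do.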
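Turn: Words as Genes
Figure: tree ((English, German), (French, Italian)); root state 001, Romance node 001, Germanic node 101 (+A); leaves English 111 (+B), German 101, French 000 (−C), Italian 001.
Char.   English   German   French   Italian
A       1         1        0        0
B       1         0        0        0
C       1         1        0        1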
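The slide does not say how the ancestral states were obtained. One standard way to count the changes a tree needs for binary characters is Fitch parsimony; the sketch below applies it to characters A, B and C, writing trees as nested tuples (an assumption of this illustration, not the original analysis).

```python
# Character states from the table above: character -> {language: state}
CHARACTERS = {
    "A": {"English": 1, "German": 1, "French": 0, "Italian": 0},
    "B": {"English": 1, "German": 0, "French": 0, "Italian": 0},
    "C": {"English": 1, "German": 1, "French": 0, "Italian": 1},
}


def fitch(tree, states):
    """Fitch parsimony on a binary tree: (possible ancestral states, minimal changes)."""
    if isinstance(tree, str):  # a leaf: its observed state
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one extra change


def tree_length(tree, characters):
    """Total number of state changes summed over all characters."""
    return sum(fitch(tree, states)[1] for states in characters.values())


slide_tree = (("English", "German"), ("French", "Italian"))
other_tree = (("English", "French"), ("German", "Italian"))
print(tree_length(slide_tree, CHARACTERS))  # 3
print(tree_length(other_tree, CHARACTERS))  # 4
```

The three changes on the slide's tree match +A, +B and −C; grouping English with French would require four.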
Turn: Sounds as Nucleobases
Edit distances between orthographic entries (G = German, E = English, I = Italian, F = French):
“HAND”    G   E   I   F
Hand      0   1   2   3
hand      1   0   2   3
mano      2   2   0   2
main      3   3   2   0

“BLOOD”   G   E   I   F
Blut      0   4   5   4
blood     4   0   6   5
sangue    5   6   0   2
sang      4   5   2   0
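The values in both matrices match plain Levenshtein distances over the spelling, counted case-sensitively (hence Hand and hand differ by one). A minimal sketch that reproduces them:

```python
def edit_distance(a, b):
    """Levenshtein distance: minimal number of insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution or match
            ))
        previous = current
    return previous[-1]


for concept, words in {"HAND": ["Hand", "hand", "mano", "main"],
                       "BLOOD": ["Blut", "blood", "sangue", "sang"]}.items():
    print(concept)
    for w in words:
        print(f"{w:7}", *(edit_distance(w, v) for v in words))
```

Orthographic distance treats every mismatch alike; it cannot tell that Blut and blood are related while sang and blood are not.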
Parallels
Parallels between Species and Languages (Pagel 2009)
Aspect                Species                               Languages
unit of replication   gene                                  word
replication           asexual and sexual reproduction       learning
speciation            cladogenesis                          language split
forces of change      natural selection and genetic drift   social selection and trends
differentiation       tree-like                             tree-like
Parallels
Differences between Species and Languages (Geisler & List 2013)
Aspect                               Species                              Languages
domain                               Popper’s World I                     Popper’s World III
relation between form and function   mechanical                           arbitrary
origin                               monogenesis                          unclear
sequence similarity                  universal (independent of species)   language-specific
differentiation                      tree-like                            network-like
Sound Change
Meaning     Latin      Italian
‘FEATHER’   pluːma     pjuma
‘FLAT’      plaːnus    pjano
‘SQUARE’    plateːa    pjaʦːa
‘TONGUE’    liŋgua     liŋgwa
‘MOON’      luːna      luna
‘SLOW’      lentus     lento
First group: l > j. Second group: l > l. Both together: l > j / p _ (l becomes j after p).
- It is not sounds that change, but sound systems (Bloomfield 1933)!
- Sound change depends on the context in which the sounds occur!
- Sound change largely follows irreversible patterns!
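The difference between the unconditioned rule l > j and the conditioned rule l > j / p _ can be sketched with a regular-expression lookbehind. This only models the l; the other changes in the table (loss of vowel length, -us > -o, and so on) are deliberately left out, and `sound_change` is a hypothetical helper.

```python
import re


def sound_change(form, source, target, left=""):
    """Apply source > target, optionally only after the segment `left` (source > target / left _)."""
    context = f"(?<={re.escape(left)})" if left else ""
    return re.sub(context + re.escape(source), target, form)


latin = ["pluːma", "plaːnus", "plateːa", "liŋgua", "luːna", "lentus"]
for form in latin:
    unconditioned = sound_change(form, "l", "j")          # l > j
    conditioned = sound_change(form, "l", "j", left="p")  # l > j / p _
    print(form, unconditioned, conditioned)
```

The unconditioned rule wrongly yields *jiŋgua and *juːna, while the conditioned rule changes only the l after p.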
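Sound Change
Was ist das für ein Buchstabe? (What letter is that?)
Das ist ein P. (That is a P.)
Ich püsse euch alle, ganz besonders Averell, meinen Pleinen. (I kiss you all, especially Averell, my little one; with every k replaced by p)
Das reicht! (That's enough!)
Aber wohin gehen wir, wenn man uns wieder einfängt? (But where do we go if they catch us again?)
Plappe Pleiner! (for “Klappe, Kleiner!”, “Shut up, little one!”, again with p for k)
Liebe Kinder, heute habe ich Lucky Luke getroffen. Ich küsse euch, ganz besonders Averell, meinen Kleinen! Eure Ma Dalton (Dear children, today I met Lucky Luke. I kiss you, especially Averell, my little one! Your Ma Dalton)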
Liebe Pinder, heute habe ich Lucpy Lupe getroffen. Ich püsse euch, ganz besonders Averell, meinen Pleinen! Eure Ma Dalton (the same letter, with every k written as p)
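The joke behaves like a regular, unconditioned change: one letter is replaced everywhere, without exception. As a toy illustration of that regularity (an analogy only, not a model of real sound change), the whole letter can be converted with a single translation table:

```python
LETTER = ("Liebe Kinder, heute habe ich Lucky Luke getroffen. "
          "Ich küsse euch, ganz besonders Averell, meinen Kleinen! Eure Ma Dalton")

# One exceptionless correspondence: every k/K becomes p/P.
K_TO_P = str.maketrans({"k": "p", "K": "P"})

print(LETTER.translate(K_TO_P))
```

Because the substitution is exceptionless, a reader who knows the rule can undo it and recover the intended letter, which is exactly what the comparative method relies on.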
Homology
The term homology was coined by Richard Owen (1804–1892), who distinguished ‘homologues’, ‘the same organ in different animals under every variety of form and function’ (Owen 1843: 379), from ‘analogues’, an ‘organ in one animal which has the same function as another part or organ in a different animal’ (ibid.: 374).
Nowadays, it commonly denotes a ‘relationship of common descent between any entities, without further specification of the evolutionary scenario’ (Koonin 2005: 311).
With respect to specific scenarios of common descent, molecular biologists characterize relationships between homologous genes further by distinguishing between orthology, paralogy, and xenology.
Homology
Figure (List 2016), two panels:
A: Indo-European *deh₃-, *deh₃-no-; Latin dare, dōnum, dōnāre; Italian dare; French donner.
B: Indo-European *sóh₂-wl̩-, *sh₂én-; Germanic *sōwel-, *sunnō-; Latin sol, soliculus; Italian sole; French soleil; Swedish sol; German Sonne.
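Whether two words are homologous, and how directly, can be read off their derivational histories. The sketch encodes the standard etymologies behind panel A and the Romance part of panel B as parent links (an assumption of this illustration; the slide only lists the forms) and looks up the most recent shared ancestor.

```python
# Hypothetical encoding of derivational parent links for forms shown in the figure.
PARENT = {
    "Italian dare": "Latin dare",
    "Latin dare": "PIE *deh₃-",
    "French donner": "Latin dōnāre",
    "Latin dōnāre": "Latin dōnum",
    "Latin dōnum": "PIE *deh₃-no-",
    "PIE *deh₃-no-": "PIE *deh₃-",
    "Italian sole": "Latin sol",
    "French soleil": "Latin soliculus",
    "Latin soliculus": "Latin sol",
}


def lineage(form):
    """The form followed by all of its ancestors, most recent first."""
    chain = [form]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def shared_ancestor(a, b):
    """Most recent common ancestor of two forms, or None if PARENT links none."""
    ancestors_b = set(lineage(b))
    return next((x for x in lineage(a) if x in ancestors_b), None)


def steps_to(form, ancestor):
    """Number of derivational steps between a form and one of its ancestors."""
    return lineage(form).index(ancestor)


root = shared_ancestor("Italian dare", "French donner")
print(root, steps_to("Italian dare", root), steps_to("French donner", root))
print(shared_ancestor("Italian sole", "French soleil"))
```

Italian dare reaches *deh₃- in two steps, French donner only in four, through the derived stem *deh₃-no-: both are homologous, but not in the same direct way.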
Semantic Change
Semantic change plays a crucial role in language change. Although most linguists assume that it proceeds according to certain general patterns, we currently lack the empirical basis to pursue this question in depth. Normally, semantic change proceeds by cumulation (a form acquires an additional meaning) and reduction (a form loses one of its meanings).
Figure: semantic change of German Kopf (form / meaning)
Proto-Germanic *kuppa- [kupːa] “vessel” (monosemy phase)
→ cumulation →
Pre-German *kop [kɔp] “vessel”, “head” (polysemy phase)
→ reduction →
German Kopf [kɔp͡f] “head” (monosemy phase)
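If each stage of a word is represented as a form plus a set of meanings, cumulation and reduction become set differences between consecutive stages. A minimal sketch with the Kopf stages from the figure; `describe` and `phase` are hypothetical helpers.

```python
# (stage, form, meanings), as read from the Kopf figure
STAGES = [
    ("Proto-Germanic", "*kuppa-", {"vessel"}),
    ("Pre-German", "*kop", {"vessel", "head"}),
    ("German", "Kopf", {"head"}),
]


def phase(meanings):
    """A stage with one meaning is monosemous, with several polysemous."""
    return "monosemy" if len(meanings) == 1 else "polysemy"


def describe(stages):
    """Label each transition as cumulation (meaning gained) and/or reduction (meaning lost)."""
    steps = []
    for (_, _, before), (stage, form, after) in zip(stages, stages[1:]):
        kinds = []
        if after - before:
            kinds.append(("cumulation", sorted(after - before)))
        if before - after:
            kinds.append(("reduction", sorted(before - after)))
        steps.append((stage, form, phase(after), kinds))
    return steps


for step in describe(STAGES):
    print(step)
```

The form never jumps directly from “vessel” to “head”; it passes through a polysemous stage holding both meanings.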
crucial approach to interdisciplinary research is to adapt suitable methods from other disciplines to our needs, instead of adopting them unmodified without testing whether they suit historical linguistics at all.
Comparison: Problems
- since linguistic alphabets change, linguistic alignments need to infer both the mappings between the different alphabets and the alignment itself!
- the only workaround for this is to preparse the data: use an initial guess for alignments to infer mappings between the different alphabets for each language pair, and compare these against a random distribution drawn from permutation tests
- this workflow requires more time than a simple alignment of sequences, but luckily, our sequences are small!
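The permutation idea in the second point can be sketched as follows: count which segments stand opposite each other in words with the same meaning, then compare these counts with those obtained after shuffling one word list. This is a simplified illustration; the "initial guess" here is a naive position-by-position alignment, whereas full implementations align the words properly, and the word lists are invented toy data.

```python
import math
import random
from collections import Counter


def pair_counts(word_pairs):
    """Count segment pairs in the same position (a naive initial alignment)."""
    counts = Counter()
    for word_a, word_b in word_pairs:
        for seg_a, seg_b in zip(word_a, word_b):
            counts[(seg_a, seg_b)] += 1
    return counts


def correspondence_scores(words_a, words_b, n_perm=1000, seed=1, delta=0.01):
    """Log-ratio of attested vs. expected segment pairs; expected comes from shuffled lists."""
    attested = pair_counts(zip(words_a, words_b))
    rng = random.Random(seed)
    shuffled = Counter()
    for _ in range(n_perm):
        permuted = list(words_b)
        rng.shuffle(permuted)  # break the meaning-based pairing of the words
        shuffled.update(pair_counts(zip(words_a, permuted)))
    return {
        pair: math.log2((attested[pair] + delta) / (shuffled[pair] / n_perm + delta))
        for pair in set(attested) | set(shuffled)
    }


# Invented toy lists in which p regularly corresponds to f and k to g
scores = correspondence_scores(["pa", "ap", "ka", "ak"], ["fa", "af", "ga", "ag"])
print(round(scores[("p", "f")], 2), round(scores[("p", "g")], 2))
```

A positive score marks a correspondence that occurs more often than chance (p : f), a negative one a pairing that only arises by accident (p : g), even though the two alphabets share no symbols for these sounds.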
One beer please!
A beer for me!
Beer? Please?
You have beer?
I'm thirsty, but I do not drink water, can you help me?
I want the same as everybody else here.
We can think of many different ways to express a certain meaning, but although the potential is virtually unlimited, the roads of denotation, that is, the mechanisms by which words are formed from morphemes, follow certain recurring patterns across all languages. Comparing these patterns can give us important insights into human cognition.
On the other hand, the fact that words are often formed from smaller parts, be it by compounding existing words or by using specific morphemes to derive new words, makes it very difficult to identify homologous words automatically!
- What are the mechanisms by which the roads of denotation are created across the world's languages?
- How can we distinguish direct homologues (orthologues) from indirect ones (partial homologues, etc.) in phylogenetic models or homologue detection?
Automatic Detection of Partial Cognates: Problem
- in languages in which words are frequently created by compounding, the identification of homologous words is extremely difficult
- current phylogenetic models cannot handle partial homology, and as a result, very important signal is lost
- current methods for automatic homologue detection in linguistics also cannot handle partial homologues and show very low accuracy in languages where compounding is frequent (especially in the languages of South-East Asia)
Alignment of Germanic words for “MOON”:
German      m  oː  n  t  -
English     m  uː  n  -  -
Danish      m  ɔː  n  -  ə
Swedish     m  oː  n  -  e

Alignment of Chinese dialect words for “MOON”, by morpheme (“MOON” | “SHINE” | “LIGHT”):
Fúzhōu      ŋ u o ʔ ⁵    | - - - - -    | - - - - -
Měixiàn     ŋ i a t ⁵    | - - - - -    | k u o ŋ ⁴⁴
Guǎngzhōu   j - y t ²    | l - œ ŋ ²²   | - - - - -
Běijīng     - y ɛ - ⁵¹   | l i ɑ ŋ -    | - - - - -
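Why word-level cognate judgements lose signal can be shown by representing each Chinese word as a list of morpheme labels. The labels below are a reading of the alignment's column headers (an assumption of this sketch): a classic judgement that requires all parts to match links only Guǎngzhōu and Běijīng, while a partial judgement links all four words through the shared “MOON” morpheme.

```python
from itertools import combinations

# Morpheme-level cognate labels per variety, read off the alignment above.
WORDS = {
    "Fúzhōu": ["MOON"],
    "Měixiàn": ["MOON", "LIGHT"],
    "Guǎngzhōu": ["MOON", "SHINE"],
    "Běijīng": ["MOON", "SHINE"],
}


def full_cognates(a, b):
    """Word-level judgement: cognate only if all morphemes match."""
    return WORDS[a] == WORDS[b]


def partial_cognates(a, b):
    """Partial judgement: cognate if at least one morpheme is shared."""
    return bool(set(WORDS[a]) & set(WORDS[b]))


for a, b in combinations(WORDS, 2):
    print(a, b, full_cognates(a, b), partial_cognates(a, b))
```

Of the six word pairs, only one counts as cognate under the full criterion, while all six share at least one morpheme.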
Automatic Detection of Partial Cognates: The Solution
- use sequence similarity networks to determine the similarity between the parts of the words in the data
- apply filters to reduce the edges in the similarity networks
- use a community detection algorithm to further partition the data into clusters
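The three steps can be sketched on a tiny network of morphemes from the “MOON” alignment. The similarity scores are made up for illustration and not computed from data; the filter is a simple threshold; and weighted label propagation stands in for the community detection step, which is not necessarily the algorithm used in the original study.

```python
from collections import defaultdict

# Step 1 (assumed done): hypothetical similarity scores between morphemes.
EDGES = [
    ("ŋuoʔ", "ŋiat", 0.8), ("ŋuoʔ", "jyt", 0.6), ("ŋiat", "jyt", 0.7),
    ("jyt", "yɛ", 0.7), ("ŋiat", "yɛ", 0.6), ("ŋuoʔ", "yɛ", 0.5),
    ("lœŋ", "liɑŋ", 0.8),
    ("ŋuoʔ", "kuoŋ", 0.3), ("jyt", "lœŋ", 0.2),  # weak, spurious links
]


def cluster(edges, threshold=0.4, max_rounds=100):
    """Filter weak edges, then partition the network with weighted label propagation."""
    nodes = sorted({n for a, b, _ in edges for n in (a, b)})
    neighbours = defaultdict(dict)
    for a, b, weight in edges:
        if weight >= threshold:  # step 2: filter the network
            neighbours[a][b] = weight
            neighbours[b][a] = weight
    labels = {n: n for n in nodes}  # step 3: each node starts as its own community
    for _ in range(max_rounds):
        changed = False
        for node in nodes:
            if not neighbours[node]:
                continue
            votes = defaultdict(float)
            for other, weight in neighbours[node].items():
                votes[labels[other]] += weight
            best = max(votes.values())
            new_label = min(l for l, v in votes.items() if v == best)  # deterministic tie-break
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


communities = defaultdict(list)
for node, label in cluster(EDGES).items():
    communities[label].append(node)
print(sorted(communities.values()))
```

Filtering first matters: without the threshold, the weak links would let labels leak between the “MOON”, “SHINE” and “LIGHT” groups.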
Automatic Detection of Partial Cognates: Solution
- with the help of sequence similarity networks, we (List, Lopez, and Bapteste 2016) have created the first algorithm to detect partial cognates (homologues) in linguistic data
- our method clearly outperforms traditional methods, gaining more than 5% in accuracy on our test sets
- the algorithm is also very fast and can easily be applied to fairly large datasets
Concept “money” is part of a cluster with the central concept “fishscale” and a total of 10 nodes: silver; leather; fishscale; bark; coin; fur; snail; skin, hide; money; shell.
49 links for “silver” and “money” (10 shown):
No.  Language                Family          Form
1.   Ignaciano               Arawakan        ne
2.   Aymara, Central         Aymaran         ḳulʸḳi
3.   Tsafiki                 Barbacoan       kaˈla
4.   Seselwa Creole French   Creole          larzan
5.   Miao, White             Hmong-Mien      nyiaj
6.   Breton                  Indo-European   arhant
7.   French                  Indo-European   argent
8.   Gaelic, Irish           Indo-European   airgead
9.   Welsh                   Indo-European   arian
10.  Cofán                   Isolate         koriΦĩʔdi
Concept “wheel” is part of a cluster with the central concept “leg” and a total of 11 nodes: sphere, ball; round; footprint; foot; calf of leg; circle; thigh; wheel; leg; hip; buttocks.
6 links for “foot” and “wheel”:
No.  Language    Family         Form
1.   Cofán       Isolate        c̷ɨʔtʰe
2.   Puinave     Isolate        sim
3.   Yaminahua   Panoan         taɨ
4.   Wayampi     Tupi           pɨ
5.   Pumé        Unclassified   taɔ
6.   Ninam       Yanomam        mãhuk
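Networks of this kind weight an edge between two concepts by how many languages express both with the same word (colexification). A minimal sketch using only the rows listed in the two tables; since just 10 of the 49 “silver”/“money” links are shown, the weight computed here is 10, not 49.

```python
from collections import Counter
from itertools import combinations

# (language, form, concepts expressed by that form), copied from the two tables
ENTRIES = [
    ("Ignaciano", "ne", ("silver", "money")),
    ("Aymara, Central", "ḳulʸḳi", ("silver", "money")),
    ("Tsafiki", "kaˈla", ("silver", "money")),
    ("Seselwa Creole French", "larzan", ("silver", "money")),
    ("Miao, White", "nyiaj", ("silver", "money")),
    ("Breton", "arhant", ("silver", "money")),
    ("French", "argent", ("silver", "money")),
    ("Gaelic, Irish", "airgead", ("silver", "money")),
    ("Welsh", "arian", ("silver", "money")),
    ("Cofán", "koriΦĩʔdi", ("silver", "money")),
    ("Cofán", "c̷ɨʔtʰe", ("foot", "wheel")),
    ("Puinave", "sim", ("foot", "wheel")),
    ("Yaminahua", "taɨ", ("foot", "wheel")),
    ("Wayampi", "pɨ", ("foot", "wheel")),
    ("Pumé", "taɔ", ("foot", "wheel")),
    ("Ninam", "mãhuk", ("foot", "wheel")),
]


def colexification_weights(entries):
    """Edge weight = number of distinct languages expressing both concepts with one form."""
    languages = {}
    for language, _form, concepts in entries:
        for pair in combinations(sorted(set(concepts)), 2):
            languages.setdefault(pair, set()).add(language)
    return Counter({pair: len(langs) for pair, langs in languages.items()})


for (a, b), weight in colexification_weights(ENTRIES).most_common():
    print(f"{a} -- {b}: {weight}")
```

Counting languages rather than forms keeps one language from inflating an edge if it lists several colexifying words; weighting by families instead would further reduce the influence of closely related languages such as Breton, French, Irish and Welsh.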
- we need to be careful not to overstrain our analogies
- we can try to get inspiration from solutions proposed in other disciplines
- but we should never forget who we are: LINGUISTS AND PROUD!