Slide 1

Slide 1 text

Beyond Cognacy: Current Chances and Future Challenges of Automatic Cognate Detection in Historical Linguistics
Johann-Mattis List
Forschungszentrum Deutscher Sprachatlas, Philipps-University Marburg
2014-09-17

Slide 2

Slide 2 text

Cognate Detection
word, Wort, слово, cuvînt, palabra, mot, adottszó, slovo, verbum, focal, 词, parola, λόγος, शब्द, ord

Slide 3

Slide 3 text

Cognate Detection: Traditional Approaches

Slide 4

Slide 4 text

Cognate Detection: Traditional Approaches
The Comparative Method
- proof of relationship
- identification of cognates
- identification of sound correspondences
- reconstruction of proto-forms
- internal classification

Slide 6

Slide 6 text

Cognate Detection: Traditional Approaches
Cognate Detection

Cognate List and Alignment
German   dünn    d  ʏ   n
English  thin    θ  ɪ   n
German   Ding    d  ɪ   ŋ
English  thing   θ  ɪ   ŋ
German   dumm    d  ʊ   m
English  dumb    d  ʌ   m
German   Dorn    d  ɔɐ  n
English  thorn   θ  ɔː  n

Correspondence List
GER  ENG  Frequ.
d    θ    3 x
d    d    1 x
n    n    2 x
m    m    1 x
ŋ    ŋ    1 x
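
The step from the alignments to the correspondence list can be sketched in a few lines of Python. This is only an illustration under the assumption that the word pairs are already aligned segment by segment; the names `ALIGNED_PAIRS`, `CONSONANTS`, and `count_correspondences` are hypothetical and not taken from the talk or from any library.

```python
from collections import Counter

# Aligned cognate pairs from the slide: (German segments, English segments).
# Assumption: each pair is already aligned, one segment per column.
ALIGNED_PAIRS = [
    (["d", "ʏ", "n"], ["θ", "ɪ", "n"]),    # dünn / thin
    (["d", "ɪ", "ŋ"], ["θ", "ɪ", "ŋ"]),    # Ding / thing
    (["d", "ʊ", "m"], ["d", "ʌ", "m"]),    # dumm / dumb
    (["d", "ɔɐ", "n"], ["θ", "ɔː", "n"]),  # Dorn / thorn
]

# The slide lists consonant correspondences only, so vowels are filtered out
# when printing (hypothetical helper set for this example).
CONSONANTS = {"d", "θ", "n", "m", "ŋ"}


def count_correspondences(pairs):
    """Count how often each (German, English) segment pair shares a column."""
    counts = Counter()
    for german, english in pairs:
        if len(german) != len(english):
            raise ValueError("aligned sequences must have the same length")
        counts.update(zip(german, english))
    return counts


counts = count_correspondences(ALIGNED_PAIRS)
for (ger, eng), freq in counts.most_common():
    if ger in CONSONANTS:
        print(f"{ger}  {eng}  {freq} x")
```

Counting columns like this only works once the alignment is fixed, which is why the alignment precedes the correspondence list on the slide.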

Slide 13

Slide 13 text

Cognate Detection: Automatic Approaches
$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$

Slide 14

Slide 14 text

Cognate Detection: Automatic Approaches
Narrowing down the Task

Traditional Workflow
[Figure: DICTIONARIES, WORDLISTS (HAND [hænd], FOOT [fʊt], EARTH [ɜːrθ], TREE [triː], BARK [bɑːrk]), and HISTORICAL SCENARIOS (*dent-: dente, dɑ̃, dɛnte; *tanθ: tuːθ, t͡saːn)]

Slide 18

Slide 18 text

Cognate Detection: Automatic Approaches
Narrowing down the Task

Technical Workflow
RAW DATA → (Semantic Tagging) → WORDLIST DATA → (Tokenization) → TOKENS, MORPHEMES → (Cognate Detection) → COGNATE SETS → (Alignment Analysis) → SOUND CORRESPONDENCES → (Linguistic Reconstruction) → PROTO-FORMS

Slide 20

Slide 20 text

Cognate Detection: Automatic Approaches
Narrowing down the Task

Technical Workflow
INPUT: Multilingual wordlist
→ semantically tagged
→ phonetically transcribed
→ tokenized into phonemes
OUTPUT: Multilingual wordlist
→ identified cognate entries assigned to clusters
→ identified cognate entries multiply aligned
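
To make the "tokenized into phonemes" requirement concrete, here is a deliberately small Python sketch. It assumes that a segment is a base character plus any following combining marks, modifier letters (such as the length mark ː), and the second half of a tie-bar affricate. The function name `tokenize` and these rules are assumptions for illustration; a real tokenizer needs more cases (tones, brackets, preceding diacritics).

```python
import unicodedata

# Tie bars that join two symbols into one affricate, e.g. t͡s.
TIE_BARS = {"\u0361", "\u035c"}


def tokenize(ipa):
    """Split an IPA string into a list of segments (simplified rules)."""
    segments = []
    for char in ipa:
        category = unicodedata.category(char)
        joins_previous = bool(segments) and (
            category in ("Mn", "Lm")         # combining marks, modifiers like ː
            or segments[-1][-1] in TIE_BARS  # symbol right after a tie bar
        )
        if joins_previous:
            segments[-1] += char
        else:
            segments.append(char)
    return segments


for word in ["tuːθ", "t\u0361saːn", "dɑ\u0303"]:
    print(word, tokenize(word))
```

The point of the sketch is that tokenization is a separate, inspectable step: the later comparison operates on the resulting segment lists, not on raw character strings.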

Slide 21

Slide 21 text

Cognate Detection: Automatic Approaches
Algorithms

Basic Procedure for Multilingual Cognate Detection
WORDLIST DATA → (PAIRWISE COMPARISON) → PAIRWISE DISTANCES BETWEEN WORDS → (COGNATE CLUSTERING) → COGNATE SETS
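
The pairwise comparison step needs some distance between two words. As a stand-in, the sketch below uses the normalized edit distance (Levenshtein distance divided by the length of the longer word). This is a swapped-in, simple measure chosen for illustration; it is not claimed to be the measure behind the distances shown in the talk, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of segments."""
    previous = list(range(len(b) + 1))
    for i, seg_a in enumerate(a, start=1):
        current = [i]
        for j, seg_b in enumerate(b, start=1):
            cost = 0 if seg_a == seg_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def normalized_edit_distance(a, b):
    """Edit distance scaled to the range 0..1 by the longer sequence."""
    if not a and not b:
        return 0.0
    return edit_distance(a, b) / max(len(a), len(b))


def pairwise_distances(words):
    """Distances for all unordered pairs in a {label: segments} mapping."""
    labels = list(words)
    return {
        (x, y): normalized_edit_distance(words[x], words[y])
        for i, x in enumerate(labels)
        for y in labels[i + 1:]
    }


distances = pairwise_distances({"German": "frau", "Dutch": "frou", "Swedish": "kvina"})
for pair, value in distances.items():
    print(pair, round(value, 2))
```

Strings are used here for brevity; lists of tokenized segments work the same way, because the functions only rely on indexing and equality.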

Slide 25

Slide 25 text

Cognate Detection: Automatic Approaches
Algorithms

Cognate Clustering

Pairwise distances
                  Swedish  English  Danish  Norwegian  Dutch  German
                  kvinna   woman    kvinde  kvine      vrouw  Frau
Swedish    kvina  0.00     0.69     0.07    0.12       0.71   0.78
English    wumin  0.69     0.00     0.66    0.57       0.68   0.87
Danish     kveni  0.07     0.66     0.00    0.08       0.67   0.71
Norwegian  kwini  0.12     0.57     0.08    0.00       0.75   0.74
Dutch      frou   0.71     0.68     0.67    0.75       0.00   0.17
German     frau   0.78     0.87     0.71    0.74       0.17   0.00

[Figure: clustering tree with the leaves German Frau (frau), Dutch vrouw (vrou), English woman (wumin), Danish kvinde (kveni), Swedish kvinna (kvina), Norwegian kvine (kwini)]

Analysis
ID   Taxa       Word    Gloss  GlossID  IPA     CogID
...  ...        ...     ...    ...      ...     ...
21   German     Frau    woman  20       frau    1
22   Dutch      vrouw   woman  20       vrɑu    1
23   English    woman   woman  20       wʊmən   2
24   Danish     kvinde  woman  20       kvenə   3
25   Swedish    kvinna  woman  20       kviːna  3
26   Norwegian  kvine   woman  20       kʋinə   3
...  ...        ...     ...    ...      ...     ...
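
How a distance matrix turns into cognate sets can be illustrated with a small flat clustering routine. The sketch below uses average-linkage agglomerative clustering that stops merging at a threshold. Both the linkage choice and the threshold of 0.5 are assumptions made for this example, not settings reported in the talk; the matrix values are the ones shown on the slide.

```python
LABELS = ["Swedish", "English", "Danish", "Norwegian", "Dutch", "German"]
MATRIX = [
    [0.00, 0.69, 0.07, 0.12, 0.71, 0.78],
    [0.69, 0.00, 0.66, 0.57, 0.68, 0.87],
    [0.07, 0.66, 0.00, 0.08, 0.67, 0.71],
    [0.12, 0.57, 0.08, 0.00, 0.75, 0.74],
    [0.71, 0.68, 0.67, 0.75, 0.00, 0.17],
    [0.78, 0.87, 0.71, 0.74, 0.17, 0.00],
]


def flat_cluster(matrix, threshold):
    """Merge the closest clusters until the closest pair reaches the threshold."""
    clusters = [[i] for i in range(len(matrix))]

    def average_distance(a, b):
        return sum(matrix[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        distance, x, y = min(
            (average_distance(a, b), x, y)
            for x, a in enumerate(clusters)
            for y, b in enumerate(clusters)
            if x < y
        )
        if distance >= threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical threshold; in practice it has to be tuned on test data.
clusters = flat_cluster(MATRIX, threshold=0.5)
cognate_sets = [sorted(LABELS[i] for i in cluster) for cluster in clusters]
for cog_id, members in enumerate(cognate_sets, start=1):
    print(cog_id, members)
```

With these settings the words fall into three groups (Danish, Norwegian, Swedish; English; Dutch, German), the same partition as the CogID column on the slide, although the numbering of the groups differs.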

Slide 30

Slide 30 text

Cognate Detection: Automatic Approaches
Algorithms

LexStat Algorithm (List 2014)
[Figure: workflow with the labels INPUT, TOKENIZATION, PREPROCESSING, CORRESPONDENCE DETECTION USING PHONETIC ALIGNMENT, LOOP, ATTESTED DISTRIBUTION, EXPECTED DISTRIBUTION, LOG-ODDS, DISTANCE CALCULATION, COGNATE CLUSTERING, OUTPUT]
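
The labels ATTESTED DISTRIBUTION, EXPECTED DISTRIBUTION, and LOG-ODDS suggest a score that compares how often a sound pair is actually observed with how often it would be expected by chance. The following sketch shows the general idea of such a log-odds score only. The smoothing constant, the function name, and the example counts are hypothetical, and the exact formula used by LexStat is not given in the slides.

```python
import math


def log_odds(attested, expected, smoothing=0.5):
    """Base-2 log ratio of attested to expected counts of a sound pair.

    Positive: the pair occurs more often than chance would predict.
    Negative: it occurs less often. The smoothing term (an assumption of
    this sketch) avoids division by zero and log of zero.
    """
    return math.log2((attested + smoothing) / (expected + smoothing))


# Hypothetical counts: a pair attested 3 times in aligned words,
# but expected only 0.5 times in a randomized comparison.
print(round(log_odds(3, 0.5), 2))
# A pair that is attested exactly as often as expected scores 0.
print(log_odds(2, 2))
```

Scores of this kind reward recurring correspondences even between dissimilar sounds, which surface similarity measures cannot do.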

Slide 31

Slide 31 text

Cognate Detection: Problems

Slide 32

Slide 32 text

Cognate Detection: Problems
Applicability

Method                         Multilingual?  No additional requirements?  Freely Available?
Mackay & Kondrak 2005          ✗              ✓                            ✗
Bergsma & Kondrak 2007         ✓              ✓                            ✗
Turchin et al. 2010            ✓              ✓                            ✓
Berg-Kirkpatrick & Klein 2011  ✗              ✓                            ✗
Hauer & Kondrak 2011           ✓              ✓                            ✗
Steiner et al. 2011            ✓              ✓                            ✗
List 2012 & 2014               ✓              ✓                            ✓
Beinborn et al. 2013           ✗              ?                            ✗
Bouchard-Côté et al. 2013      ✓              ✗                            ✗
Rama 2013                      ✗              ✓                            ✗
Ciobanu & Dinu 2014            ✗              ✓                            ✗
…                              …              …                            …

Slide 37

Slide 37 text

Cognate Detection: Problems
Transparency

- Results are often only reported as evaluation scores.
- Examples for individual cognate judgments are rare.
- Supplementary data
  – is often lacking, or
  – not given in a human-readable form.
→ The results show a great lack of transparency.

Slide 43

Slide 43 text

Cognate Detection: Problems
Comparability

- Test sets (benchmarks) vary greatly.
- Often, only subsets of Dyen et al. (1992) are used.
→ It is difficult to compare the performance of the methods.

Slide 47

Slide 47 text

Cognate Detection: Problems
Accuracy

- Evaluation criteria are not very intuitive and vary greatly.
- It is difficult to communicate the results to traditional linguists.
→ Many linguists regard automatic cognate detection as
  – “impossible per se”, or
  – as useful as “rolling a dice”.

Slide 52

Slide 52 text

Chances

Slide 55

Slide 55 text

Chances: Applicability

PyPi, GitHub, SourceForge, GoogleCode, CPAN, CTAN, JSAN, PEAR, LaunchPad

It has never been easier to publish and maintain code.

Slide 57

Slide 57 text

Chances: Applicability
LingPy

What is LingPy?
- Python library for automatic tasks in historical linguistics
- project homepage: http://lingpy.org
- code base: https://github.com/lingpy/lingpy
- supports Python 2 and Python 3
- works on Mac, Linux, and (basically also) Windows
- current release: 2.3

What does LingPy offer?
- tokenization of phonetic sequences
- phonetic alignment analyses (List 2012a)
- automatic cognate detection (Turchin 2010, List 2012b)
- automatic borrowing detection (List et al. 2014)
- basic routines for the evaluation of automatic methods
- plotting routines for interactive visualizations

Slide 60

Slide 60 text

Chances: Transparency

Slide 61

Slide 61 text

Chances: Transparency
Interactive Presentation of Results

- Alignments offer a unique perspective on the results of cognate detection analyses.
- JavaScript and HTML5 offer unique ways for interactive data visualization.
- We are currently developing JavaScript tools that
  – visualize phonetic alignments of cognate sets, and
  – even allow the data to be edited online.

Slide 63

Slide 63 text

Chances: Comparability

Slide 64

Slide 64 text

Chances: Comparability
Benchmark Databases for Historical Linguistics

The first benchmark databases have been compiled and published:
- Benchmark Database of Phonetic Alignments (BDPA, List & Prokić 2014, http://alignments.lingpy.org)
- Benchmark Database for Cognate Detection (BDCD, presented in List 2014, http://sequencecomparison.github.io)
- Benchmark Database for Linguistic Reconstruction (BDLR, in preparation)

All data is
- given in phonetic transcriptions (IPA),
- tokenized into phonemic units,
- freely available for download, and
- directly usable in LingPy.

Slide 67

Slide 67 text

Chances: Accuracy

Slide 68

Slide 68 text

Chances: Accuracy
Performance of Cognate Detection Algorithms

B-Cubed F-Scores on BDCD Benchmark (List 2014)
[Figure: B-Cubed F-scores (axis from 60 to 95) of the methods Turchin, NED, SCA, and LexStat on the datasets Bai (Tibeto-Burman), Indo-European, Japanese and Ryukyu, Ob-Ugrian, Austronesian, and Sinitic (Chinese Dialects). Values highlighted in the plot: 75%, 93%, 92%, 81%, 89%, 81%.]
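
Since B-Cubed F-scores are the evaluation measure named here, a compact Python sketch may help to show what they count. For every word, precision is the share of its predicted cluster that is also in its gold-standard cognate set, and recall is the share of its gold-standard set that ended up in its predicted cluster; both are averaged over all words. The function name `bcubed` and the example labels are illustrative assumptions, not the evaluation code behind the reported scores.

```python
def bcubed(gold, predicted):
    """Return B-Cubed (precision, recall, F-score) for two cluster labelings.

    `gold` and `predicted` are equally long lists of cluster labels,
    one label per word.
    """
    if not gold or len(gold) != len(predicted):
        raise ValueError("labelings must be non-empty and equally long")
    n = len(gold)
    precision = recall = 0.0
    for i in range(n):
        same_predicted = {j for j in range(n) if predicted[j] == predicted[i]}
        same_gold = {j for j in range(n) if gold[j] == gold[i]}
        overlap = len(same_predicted & same_gold)
        precision += overlap / len(same_predicted)
        recall += overlap / len(same_gold)
    precision /= n
    recall /= n
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Gold standard: three cognate sets. Hypothetical prediction: everything lumped.
gold = [1, 1, 2, 3, 3, 3]
lumped = [1, 1, 1, 1, 1, 1]
print(bcubed(gold, gold))
print(bcubed(gold, lumped))
```

Lumping all words into one cluster keeps recall perfect but lowers precision, so the F-score penalizes both over- and under-clustering.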

Slide 72

Slide 72 text

Challenges

Slide 73

Slide 73 text

Challenges: Within Cognacy

We need to enhance our
- lexical databases (amount and quality of data),
- cognate detection algorithms (accessibility and performance), and
- ways to present the results (interactive visualizations).

Slide 78

Slide 78 text

Challenges: Beyond Cognacy

          "MOON"
German    m  oː  n  t  -
English   m  uː  n  -  -
Danish    m  ɔː  n  -  ə
Swedish   m  oː  n  -  e

            "MOON"          "SHINE"         "LIGHT"
Fúzhōu      ŋ  u  o  ʔ  ⁵   -  -  -  -  -   -  -  -  -  -
Měixiàn     ŋ  i  a  t  ⁵   -  -  -  -  -   k  u  o  ŋ  ⁴⁴
Guǎngzhōu   j  -  y  t  ²   l  -  œ  ŋ  ²²  -  -  -  -  -
Běijīng     -  y  ɛ  -  ⁵¹  l  i  ɑ  ŋ  -   -  -  -  -  -

[Figure: tree of Fúzhōu, Měixiàn, Guǎngzhōu, and Běijīng, annotated with INNOVATION, BORROWING, and LOSS events]

Slide 84

Slide 84 text

Challenges: Beyond Cognacy
Lexical Change

Three Dimensions of Lexical Change (Gévaudan 2007)
[Figure: three axes labeled SEMANTIC CHANGE, MORPHOLOGICAL CHANGE, and STRATIC CHANGE]

Relation                       Biolog. Term  Stratic     Morphological  Semantic
                                             continuity  continuity     continuity
traditional notion of cognacy  -             +           +/-            +/-
cognacy à la Swadesh           -             +           +/-            +
automatic cognate detection    -             +/-         +/-            +
direct cognate relation        orthology     +           +              +
oblique cognate relation       paralogy      +           -              +
etymological relation          homology      +/-         +/-            +/-
oblique etymological relation  xenology      -           +/-            +/-

Slide 86

Slide 86 text

Challenges: Beyond Cognacy
Inferring Lexical Change Scenarios

In order to go beyond cognacy, we need methods for
- borrowing detection (stratic aspect),
- partial cognate inference (morphological aspect), and
- cross-semantic cognate inference (semantic aspect).

Following the lead of evolutionary biology, these methods should be combined under a unified framework of tree reconciliation (Page & Cotton 2002) in historical linguistics.

Slide 92

Slide 92 text

Challenges: Beyond Cognacy
Tree Reconciliation

[Figure: trees over Fúzhōu, Měixiàn, Guǎngzhōu, and Běijīng, with events labeled LOSS, INNOVATION, and BORROWING]

General Workflow for the Inference of Lexical Change Scenarios
[Figure: workflow with the labels PHYLOGENETIC RECONSTRUCTION, COGNATE (=HOMOLOG) DETECTION, COGNATE TREE RECONCILIATION]

Slide 98

Slide 98 text

Conclusion

- Automatic cognate detection is still in its infancy, yet the child is constantly growing.
- Enhancing the applicability, transparency, comparability, and accuracy of cognate detection methods is a goal that can be achieved in the near future.
- The greatest challenge arises from the complexity of lexical change processes.
- More realistic approaches that go beyond cognacy should be able to handle variation along the stratic, the morphological, and the semantic dimension of lexical change.
- Evolutionary biology offers frameworks that could be employed to achieve these goals, yet it is not entirely clear whether and how this is possible.

Slide 104

Slide 104 text

Thank You for Listening!