Slide 1

Slide 1 text

Automatic Identification of Historically Related Words
Johann-Mattis List
DFG research fellow
Centre des recherches linguistiques sur l’Asie Orientale
Team Adaptation, Integration, Reticulation, Evolution
EHESS and UPMC, Paris
2015/05/20
1 / 30

Slide 2

Slide 2 text

Lexical Change 2 / 30

Slide 9

Slide 9 text

Lexical Change Dimensions Dimensions of Lexical Change
Indo-European   'soh₂-wl̩-, sh₂uˈen-     SUN
  Germanic      soːwel-, sunːoː-        SUN
    German      zɔnə                    SUN
    Swedish     suːl                    SUN
  Romance       soːl-                   SUN
                soːlikul-               SMALL SUN
    French      solej                   SUN
    Spanish     sol                     SUN
Edge labels: SEMANTIC SHIFT, MORPHOLOGICAL CHANGE (four times)
3 / 30

Slide 10

Slide 10 text

Lexical Change Dimensions Dimensions of Lexical Change arbre 4 / 30

Slide 11

Slide 11 text

Lexical Change Dimensions Dimensions of Lexical Change form "meaning" 4 / 30

Slide 14

Slide 14 text

Lexical Change Dimensions Dimensions of Lexical Change arbre MEANING FORM LANGUAGE 4 / 30

Slide 15

Slide 15 text

Lexical Change Dimensions Dimensions of Lexical Change FORM LANGUAGE MEANING arbre 4 / 30

Slide 16

Slide 16 text

Lexical Change Dimensions Dimensions of Lexical Change arbre MEANING FORM LANGUAGE MEANING FORM LANGUAGE 4 / 30

Slide 17

Slide 17 text

Lexical Change Dimensions Dimensions of Lexical Change
SEMANTIC CHANGE, MORPHOLOGICAL CHANGE, STRATIC CHANGE (Gévaudan 2007)
4 / 30

Slide 18

Slide 18 text

Lexical Change Relations Relations between Historically Related Words
Germanic    *tanθ-    'TOOTH'
  English   tooth     'TOOTH'
  German    Zahn      'TOOTH'
Direct Cognate Relation (Orthology)
5 / 30

Slide 19

Slide 19 text

Lexical Change Relations Relations between Historically Related Words
Germanic    *ga-burdi-   'BIRTH'
  English   birth        'BIRTH'
  German    Geburt       'BIRTH'
Indirect Cognate Relation (Paralogy)
5 / 30

Slide 20

Slide 20 text

Lexical Change Relations Relations between Historically Related Words
Germanic    *sæli-    'HAPPY'
  English   silly     'SILLY'
  German    selig     'BLESSED'
Indirect Cognate Relation (Paralogy)
5 / 30

Slide 21

Slide 21 text

Lexical Change Relations Relations between Historically Related Words
Indo-Europ.     *(s)ker-   'CUT OFF'
  Germanic      *skurt     'SHORT'
    English     short      'SHORT'
  Latin         curtus     'MUTILATED'
    German      kurz       'SHORT'
Indirect Etymological Relation (Xenology)
5 / 30

Slide 22

Slide 22 text

Lexical Change Relations Relations between Historically Related Words
Relations in Biology    Proposed Terminology for Linguistics
homology                etymological relation
orthology               direct cognate relation (a cognate relation)
paralogy                indirect cognate relation (a cognate relation)
xenology                indirect etymological relation
5 / 30

Slide 29

Slide 29 text

Lexical Change Sound Change Sound Change
Meaning      Latin      Italian
‘FEATHER’    pluːma     pjuma
‘FLAT’       plaːnus    pjano
‘SQUARE’     plateːa    pjaʦːa

Meaning      Latin      Italian
‘TONGUE’     liŋgua     liŋgwa
‘MOON’       luːna      luna
‘SLOW’       lentus     lento

l > j (FEATHER, FLAT, SQUARE)
l > l (TONGUE, MOON, SLOW)
l > j / p _
It is not sounds that change, but phonemes (alphabets!) (Bloomfield 1933)!
Sound change depends on the context in which the sounds occur!
6 / 30
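The context rule l > j / p _ can be sketched in a few lines of Python. This is only an illustration of the single rule on the slide: the function name `latin_l_to_italian` is hypothetical, and the other changes visible in the table (vowel length, endings) are deliberately not modeled.

```python
import re


def latin_l_to_italian(word: str) -> str:
    """Apply the context-dependent rule l > j / p _ to a transcription.

    Only an "l" that directly follows "p" is rewritten; every other
    "l" is left unchanged (l > l).
    """
    # The lookbehind (?<=p) checks the left context without consuming it.
    return re.sub(r"(?<=p)l", "j", word)


if __name__ == "__main__":
    for latin in ["pluːma", "plaːnus", "plateːa", "liŋgua", "luːna", "lentus"]:
        print(latin, "->", latin_l_to_italian(latin))
```

The same sound (l) thus maps to two different outputs depending on its left neighbor, which is why a plain one-to-one sound mapping is not enough.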

Slide 30

Slide 30 text

Lexical Change Sound Change Sound Change 7 / 30

Slide 36

Slide 36 text

Lexical Change Sound Change Sound Change
Cognate List      Alignment
German   dünn     d  ʏ   n
English  thin     θ  ɪ   n
German   Ding     d  ɪ   ŋ
English  thing    θ  ɪ   ŋ
German   dumm     d  ʊ   m
English  dumb     d  ʌ   m
German   Dorn     d  ɔɐ  n
English  thorn    θ  ɔː  n

Correspondence List
GER  ENG  Frequ.
d    θ    3 x
d    d    1 x
n    n    2 x
m    m    1 x
ŋ    ŋ    1 x
7 / 30

Slide 37

Slide 37 text

Lexical Change Sound Change Sound Change
Cognate List      Alignment
German   dünn     d  ʏ   n
English  thin     θ  ɪ   n
German   Ding     d  ɪ   ŋ
English  thing    θ  ɪ   ŋ
German   Dorn     d  ɔɐ  n
English  thorn    θ  ɔː  n
German   dumm     d  ʊ   m
English  dumb     d  ʌ   m

Correspondence List
GER  ENG  Frequ.
d    θ    3 x
n    n    2 x
ŋ    ŋ    1 x
7 / 30
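A correspondence list of this kind can be derived by counting aligned columns. The sketch below assumes the alignments are already given (copied from the slide) and counts only consonant columns, as the slide does; the names `count_correspondences` and `VOWELS` are illustrative and not taken from the talk.

```python
from collections import Counter

# Aligned German/English cognate pairs, one segment per column.
ALIGNMENTS = [
    (["d", "ʏ", "n"], ["θ", "ɪ", "n"]),    # dünn / thin
    (["d", "ɪ", "ŋ"], ["θ", "ɪ", "ŋ"]),    # Ding / thing
    (["d", "ɔɐ", "n"], ["θ", "ɔː", "n"]),  # Dorn / thorn
]

# Assumption: vowel columns are skipped, since the slide lists consonants only.
VOWELS = {"ʏ", "ɪ", "ɔɐ", "ɔː", "ʊ", "ʌ"}


def count_correspondences(alignments):
    """Count how often each (German, English) segment pair shares a column."""
    counts = Counter()
    for german, english in alignments:
        if len(german) != len(english):
            raise ValueError("aligned sequences must have equal length")
        for ger_seg, eng_seg in zip(german, english):
            if ger_seg in VOWELS or eng_seg in VOWELS:
                continue
            counts[(ger_seg, eng_seg)] += 1
    return counts


if __name__ == "__main__":
    for (ger_seg, eng_seg), freq in count_correspondences(ALIGNMENTS).most_common():
        print(ger_seg, eng_seg, f"{freq} x")
```

Note the circularity the talk points out: the counts depend on the alignments, and good alignments depend on knowing which correspondences are regular.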

Slide 38

Slide 38 text

Lexical Change Sound Change Sound Change
To identify cognate words, one needs a context-dependent mapping between two (or more) phoneme systems (alphabets)!
Technically, one needs to infer both the scoring function and the optimal alignment between multiple words at the same time!
7 / 30

Slide 39

Slide 39 text

Sequence Comparison 8 / 30

Slide 40

Slide 40 text

Sequence Comparison Alignment Analyses Alignment Analyses 9 / 30

Slide 44

Slide 44 text

Sequence Comparison Alignment Analyses Alignment Analyses: Alignment Modes
Mode         Alignment
global       T H E # C A T - F I S H # H U N T S
             T H E # C A T # F I S H - E - - - S
semiglobal   T H E # C A T - F I S H - - - H U N T S
             T H E # C A T # F I S H E S # - - - - -
local        T H E # C A T - F I S H   HUNTS
             T H E # C A T # F I S H   ES
diagonal     T H E # C A T - F I S H - # H U N T S
             T H E # C A T # F I S H E - - - - - S
secondary    T H E # C A T F I S H # H U N T - S
             T H E # C A T - - - - # F I S H E S
10 / 30
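As a point of reference for the global mode, here is a minimal sketch of the classic Needleman-Wunsch dynamic-programming algorithm. The scoring values (match 1, mismatch -1, gap -1) are placeholder assumptions and not the scoring function used in the talk; the other modes in the table change how end gaps and boundaries are treated and are not implemented here.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align sequences a and b; return (row_a, row_b, score)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for a[i-1] against b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # (mis)match
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return row_a[::-1], row_b[::-1], score[n][m]


if __name__ == "__main__":
    top, bottom, total = global_align(list("WOLDEMORT"), list("WALDEMAR"))
    print(" ".join(top))
    print(" ".join(bottom))
    print("score:", total)
```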

Slide 48

Slide 48 text

Sequence Comparison Alignment Analyses Alignment Analyses: Alignment Modes
Primary Alignment
Haikou     z  i  -   t   -   ³
Beijing    ʐ  ʅ  ⁵¹  tʰ  ou  ¹

Secondary Alignment
Haikou     z  i  t  ³   -   -   -
Beijing    ʐ  ʅ  -  ⁵¹  tʰ  ou  ¹
10 / 30

Slide 50

Slide 50 text

Sequence Comparison Multiple Alignment Analyses Multiple Alignment Analyses
W O L D E M O R T
W A L D E M A R -
V O L O D Y M Y R -
V - L A D I M I R -
11 / 30

Slide 51

Slide 51 text

Sequence Comparison Multiple Alignment Analyses Multiple Alignment Analyses
W O L - D E M O R T
W A L - D E M A R -
V O L O D Y M Y R -
V - L A D I M I R -
11 / 30

Slide 55

Slide 55 text

Sequence Comparison Sequences in Biology and Linguistics Sequences in Biology and Linguistics
Biology         Linguistics
• universal     • language-specific
• limited       • widely varying
• constant      • mutable
12 / 30

Slide 56

Slide 56 text

Sequence Modeling in Historical Linguistics 13 / 30

Slide 57

Slide 57 text

Sequence Modeling in Historical Linguistics Paradigmatic Aspects Paradigmatic Aspects 14 / 30

Slide 59

Slide 59 text

Sequence Modeling in Historical Linguistics Paradigmatic Aspects Paradigmatic Aspects
Sound Classes
Sounds which frequently occur in correspondence relations in genetically related languages can be clustered into classes (types), assuming that “phonetic correspondences inside a ‘type’ are more regular than those between different ‘types’” (Dolgopolsky 1986[1966]: 35).
k g p b ʧ ʤ f v t d ʃ ʒ θ ð s z
14 / 30

Slide 62

Slide 62 text

Sequence Modeling in Historical Linguistics Paradigmatic Aspects Paradigmatic Aspects
Sound Classes
Sounds which frequently occur in correspondence relations in genetically related languages can be clustered into classes (types), assuming that “phonetic correspondences inside a ‘type’ are more regular than those between different ‘types’” (Dolgopolsky 1986[1966]: 35).
K T P S
14 / 30
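Converting a transcription into sound classes amounts to a table lookup. The sketch below covers only the sixteen sounds shown on the slide, and the assignment of each sound to K, T, P, or S is a simplified assumption for illustration rather than the exact Dolgopolsky, SCA, or ASJP model; the names `SOUND_CLASSES` and `to_sound_classes` are hypothetical.

```python
# Assumed, simplified grouping of the sounds on the slide into four classes.
SOUND_CLASSES = {
    "K": ["k", "g", "ʧ", "ʤ"],
    "P": ["p", "b", "f", "v"],
    "T": ["t", "d", "θ", "ð"],
    "S": ["ʃ", "ʒ", "s", "z"],
}

# Invert the table: sound -> class symbol.
SOUND_TO_CLASS = {
    sound: label for label, sounds in SOUND_CLASSES.items() for sound in sounds
}


def to_sound_classes(segments, unknown="0"):
    """Map each segment to its class symbol; unlisted sounds get `unknown`."""
    return [SOUND_TO_CLASS.get(segment, unknown) for segment in segments]


if __name__ == "__main__":
    # German d and English θ fall into the same class, so the regular
    # d : θ correspondence looks like an identity at the class level.
    print(to_sound_classes(["d"]), to_sound_classes(["θ"]))
```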

Slide 63

Slide 63 text

Sequence Modeling in Historical Linguistics Syntagmatic Aspects Syntagmatic Aspects 15 / 30

Slide 66

Slide 66 text

Sequence Modeling in Historical Linguistics Syntagmatic Aspects Syntagmatic Aspects
Prosodic Strings
Sound change occurs more frequently in prosodically weak positions of sound sequences (Geisler 1992).
Based on the sonority profile of a sound sequence, we can distinguish different positions inside a string with respect to their prosodic context.
Prosodic context can be modeled as a prosodic string in which contexts are encoded by specific symbols.
[Figure: sonority profile of j a b ə l k a; axis label: sonority increases]
15 / 30

Slide 67

Slide 67 text

Sequence Modeling in Historical Linguistics Syntagmatic Aspects Syntagmatic Aspects
Prosodic Strings
Sound change occurs more frequently in prosodically weak positions of sound sequences (Geisler 1992).
Based on the sonority profile of a sound sequence, we can distinguish different positions inside a string with respect to their prosodic context.
Prosodic context can be modeled as a prosodic string in which contexts are encoded by specific symbols.
j a b ə l k a
↑ ↑ ↓ ↑
Legend: ↑ ascending, maximum, ↓ descending
15 / 30

Slide 68

Slide 68 text

Sequence Modeling in Historical Linguistics Syntagmatic Aspects Syntagmatic Aspects
Prosodic Strings
Sound change occurs more frequently in prosodically weak positions of sound sequences (Geisler 1992).
Based on the sonority profile of a sound sequence, we can distinguish different positions inside a string with respect to their prosodic context.
Prosodic context can be modeled as a prosodic string in which contexts are encoded by specific symbols.
j a b ə l k a
↑ ↑ ↓ ↑ o
Legend: strong, weak
15 / 30

Slide 69

Slide 69 text

Sequence Modeling in Historical Linguistics Syntagmatic Aspects Syntagmatic Aspects
Prosodic Strings
Sound change occurs more frequently in prosodically weak positions of sound sequences (Geisler 1992).
Based on the sonority profile of a sound sequence, we can distinguish different positions inside a string with respect to their prosodic context.
Prosodic context can be modeled as a prosodic string in which contexts are encoded by specific symbols.
j a b ə l k a
# v C v c C >
15 / 30
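One way to derive such a prosodic string from a sonority profile is sketched below. The rules are a simplified assumption chosen to reproduce the example (# v C v c C >) and are not claimed to be the procedure used in the talk: `#` marks the first segment, `>` a final vowel, `$` a final consonant, `v` any other vowel, `C` a consonant followed by higher sonority (ascending), and `c` a consonant followed by equal or lower sonority (descending). The function name and the vowel level of 7 are likewise assumptions.

```python
def prosodic_string(sonority, vowel_level=7):
    """Encode the prosodic context of each position as one symbol."""
    symbols = []
    last = len(sonority) - 1
    for i, value in enumerate(sonority):
        is_vowel = value >= vowel_level
        if i == 0:
            symbols.append("#")                    # word-initial position
        elif i == last:
            symbols.append(">" if is_vowel else "$")  # word-final position
        elif is_vowel:
            symbols.append("v")                    # sonority maximum
        elif sonority[i + 1] > value:
            symbols.append("C")                    # ascending: strong position
        else:
            symbols.append("c")                    # descending: weak position
    return "".join(symbols)


if __name__ == "__main__":
    # Sonority values for j a b ə l k a (j=6, vowels=7, b/k=1, l=5).
    profile = [6, 7, 1, 7, 5, 1, 7]
    print(prosodic_string(profile))
```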

Slide 70

Slide 70 text

Sequence Modeling in Historical Linguistics Multitiered Sequence Representation Multitiered Sequence Representation 16 / 30

Slide 71

Slide 71 text

Sequence Modeling in Historical Linguistics Multitiered Sequence Representation Multitiered Sequence Representation
External Representation
IPA                          j      a      b      ə      l      k      a

Internal Representation
Dolgopolsky Sound Classes    J      V      P      V      L      K      V
SCA Sound-Classes            J      A      P      E      L      K      A
ASJP Sound-Classes           y      a      b      I      l      k      a
Prosodic String              #      V      C      V      c      C      >
Trigrams                     #,j,a  j,a,b  a,b,ə  b,ə,l  ə,l,k  l,k,a  k,a,$
Sound-Class Trigrams         #,j,V  J,a,P  V,b,V  P,ə,L  V,l,K  L,k,V  K,a,$
Onset-Vowel-Offset           C,j    V,a    C,b    v,ə    c,l    C,k    >,a
Sonority Profile             6      7      1      7      5      1      7
Prosodic String              #      v      C      v      c      C      >
Relative Gap-Weight          2.0    1.5    1.5    1.3    1.1    1.5    0.7
...                          ...    ...    ...    ...    ...    ...    ...
16 / 30
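Of the tiers above, the plain trigram tier is the easiest to reproduce: each segment is paired with its left and right neighbor, using `#` and `$` as boundary symbols. A minimal sketch, with the hypothetical function name `trigrams`:

```python
def trigrams(segments, start="#", end="$"):
    """Return one context trigram per segment: left neighbor, segment, right neighbor."""
    padded = [start] + list(segments) + [end]
    return [",".join(padded[i:i + 3]) for i in range(len(padded) - 2)]


if __name__ == "__main__":
    print(" ".join(trigrams(["j", "a", "b", "ə", "l", "k", "a"])))
```

Because every tier has exactly one value per segment, tiers can be stacked and combined column by column, which is what makes the representation "multitiered".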

Slide 72

Slide 72 text

Sequence Modeling in Historical Linguistics Multitiered Sequence Representation Multitiered Sequence Representation 17 / 30

Slide 73

Slide 73 text

Sequence Modeling in Historical Linguistics Multitiered Sequence Representation Multitiered Sequence Representation
Cognate List       Alignment
German   Zunge     ʦ  ʊ   ŋ  ə
English  tongue    t  ʌ   ŋ  -
German   Zahn      ʦ  aː  n  -
English  tooth     t  ʊː  -  θ
German   heiß      h  ai  s
English  hot       h  ɔ   t
German   Fuß       f  uː  s
English  foot      f  ʊ   t

Correspondence List
GER  ENG  Frequ.
ʦ    t    2 x
s    t    2 x
h    h    1 x
f    f    1 x
n    -    1 x
…    …    …
17 / 30

Slide 75

Slide 75 text

Sequence Modeling in Historical Linguistics Multitiered Sequence Representation Multitiered Sequence Representation
Cognate List       Alignment
German   Zunge     C  U  N  E
English  tongue    T  A  N  -
German   Zahn      C  A  N  -
English  tooth     T  U  -  T
German   heiß      H  A  S
English  hot       H  O  T
German   Fuß       B  U  S
English  foot      B  U  T

Correspondence List
GER   ENG   Frequ.
C/#   T/#   2 x
S/$   T/$   2 x
H/$   H/#   1 x
B/$   B/#   1 x
N/c   -     1 x
…     …     …
17 / 30

Slide 76

Slide 76 text

Sequence Modeling in Historical Linguistics Multitiered Sequence Representation Multitiered Sequence Representation
Multitiered sequence representations (sound classes, prosodic strings, etc.) are of great use in automatic sequence comparison, since they guarantee the comparability of otherwise incomparable alphabets and make it possible to model phonetic contexts in a simple, universal, and objective way.
17 / 30

Slide 77

Slide 77 text

Automatic Sequence Comparison in Historical Linguistics 18 / 30

Slide 78

Slide 78 text

Automatic Sequence Comparison in Historical Linguistics Automatic Phonetic Alignment Automatic Phonetic Alignment
Sound-Class-Based Phonetic Alignment (SCA, List 2012ac & 2014)
• IPA as input format
• pairwise and multiple alignment
• global, local, semi-global, diagonal, and secondary alignment modes
• three different sound-class models (Dolgopolsky, SCA, ASJP)
• empirically and theoretically inferred scoring functions for the sound-class alphabets
• secondary alignment for the alignment of data containing word or morpheme boundaries (see List 2012c & 2014 for specifics)
• multitiered sequence representation (prosodic strings)
• procedure for the detection of swaps (metathesis) in multiple alignments (List 2012a)
19 / 30

Slide 79

Slide 79 text

Automatic Sequence Comparison in Historical Linguistics Automatic Phonetic Alignment Automatic Phonetic Alignment
INPUT SEQUENCES
  jabl̩ko  jabəlka  jabləkə  japkɔ
stage 1 SOUND-CLASS CONVERSION
  jabl̩ko  → JAPLKU
  jabəlka → JAPELKA
  jabləkə → JAPLEKE
  japkɔ   → JAPKU
stage 2 LIBRARY CREATION
  JAP-LKU    JAPL-KU    JAPLKU    JAPEL-KA    ...
  JAPELKA    JAPLEKE    JAP-KU    JAP-LEKE    ...
stage 3 DISTANCE CALCULATION
  JAPLKU     0.00  0.14  0.34  0.12
  JAPELKA    0.14  0.00  0.46  0.28
  JAPLEKE    0.34  0.46  0.00  0.44
  JAPKU      0.12  0.28  0.44  0.00
stage 4 CLUSTER ANALYSIS
  [Figure: guide tree with the leaves JAPLKU, JAPELKA, JAPLEKE, JAPKU]
stage 5 PROGRESSIVE ALIGNMENT
  J A P - L K U
  J A P E L K A
  JAPLEKE
  JAPKU
  MORE SEQUENCES? (yes / no)
stage 6 ITERATIVE REFINEMENT
  J A P - L - K U
  J A P E L - K A
  J A P - L E K E
  JAPKU
stage 7 SWAP CHECK
  J A P - L - K U
  J A P E L - K A
  J A P - L E K E
  J A P - - - K U
stage 8 IPA CONVERSION
  J A P … → j a b …
  J A P … → j a b …
  J A P … → j a b …
  J A P … → j a p …
OUTPUT MSA
  j a b - l̩ - k o
  j a b ə l - k a
  j a b - l ə k ə
  j a p - - - k ɔ
20 / 30

Slide 91

Slide 91 text

Automatic Sequence Comparison in Historical Linguistics Automatic Cognate Detection Automatic Cognate Detection
INPUT: Multilingual wordlist
→ semantically tagged
→ phonetically transcribed
→ tokenized into phonemes
OUTPUT: Multilingual wordlist
→ identified cognate entries assigned to clusters
→ identified cognate entries multiply aligned
21 / 30

Slide 92

Slide 92 text

Automatic Sequence Comparison in Historical Linguistics Automatic Cognate Detection Automatic Cognate Detection Basic Procedure for Multilingual Cognate Detection WORDLIST DATA 22 / 30

Slide 93

Slide 93 text

Automatic Sequence Comparison in Historical Linguistics: Automatic Cognate Detection

Basic Procedure for Multilingual Cognate Detection

WORDLIST DATA → (PAIRWISE COMPARISON) → PAIRWISE DISTANCES BETWEEN WORDS

22 / 30

Slide 94

Slide 94 text

Automatic Sequence Comparison in Historical Linguistics: Automatic Cognate Detection

Basic Procedure for Multilingual Cognate Detection

WORDLIST DATA → (PAIRWISE COMPARISON) → PAIRWISE DISTANCES BETWEEN WORDS → (COGNATE CLUSTERING) → COGNATE SETS

22 / 30

Slide 95

Slide 95 text

Automatic Sequence Comparison in Historical Linguistics: Automatic Cognate Detection

Cognate Clustering

Analysis
  ID   Taxa       Word    Gloss  GlossID  IPA
  ...  ...        ...     ...    ...      ...
  21   German     Frau    woman  20       frau
  22   Dutch      vrouw   woman  20       vrɑu
  23   English    woman   woman  20       wʊmən
  24   Danish     kvinde  woman  20       kvenə
  25   Swedish    kvinna  woman  20       kviːna
  26   Norwegian  kvine   woman  20       kʋinə
  ...  ...        ...     ...    ...      ...

22 / 30

Slide 96

Slide 96 text

Automatic Sequence Comparison in Historical Linguistics: Automatic Cognate Detection

Cognate Clustering

                    Swedish  English  Danish  Norwegian  Dutch  German
                    kvinna   woman    kvinde  kvine      vrouw  Frau
  Swedish    kvina  0.00     0.69     0.07    0.12       0.71   0.78
  English    wumin  0.69     0.00     0.66    0.57       0.68   0.87
  Danish     kveni  0.07     0.66     0.00    0.08       0.67   0.71
  Norwegian  kwini  0.12     0.57     0.08    0.00       0.75   0.74
  Dutch      frou   0.71     0.68     0.67    0.75       0.00   0.17
  German     frau   0.78     0.87     0.71    0.74       0.17   0.00

Analysis
  ID   Taxa       Word    Gloss  GlossID  IPA
  ...  ...        ...     ...    ...      ...
  21   German     Frau    woman  20       frau
  22   Dutch      vrouw   woman  20       vrɑu
  23   English    woman   woman  20       wʊmən
  24   Danish     kvinde  woman  20       kvenə
  25   Swedish    kvinna  woman  20       kviːna
  26   Norwegian  kvine   woman  20       kʋinə
  ...  ...        ...     ...    ...      ...

22 / 30

Slide 97

Slide 97 text

Automatic Sequence Comparison in Historical Linguistics: Automatic Cognate Detection

Cognate Clustering

                    Swedish  English  Danish  Norwegian  Dutch  German
                    kvinna   woman    kvinde  kvine      vrouw  Frau
  Swedish    kvina  0.00     0.69     0.07    0.12       0.71   0.78
  English    wumin  0.69     0.00     0.66    0.57       0.68   0.87
  Danish     kveni  0.07     0.66     0.00    0.08       0.67   0.71
  Norwegian  kwini  0.12     0.57     0.08    0.00       0.75   0.74
  Dutch      frou   0.71     0.68     0.67    0.75       0.00   0.17
  German     frau   0.78     0.87     0.71    0.74       0.17   0.00

  German     Frau    frau
  Dutch      vrouw   vrou
  English    woman   wumin
  Danish     kvinde  kveni
  Swedish    kvinna  kvina
  Norwegian  kvine   kwini

22 / 30

Slide 99

Slide 99 text

Automatic Sequence Comparison in Historical Linguistics: Automatic Cognate Detection

Cognate Clustering

  German     Frau    frau
  Dutch      vrouw   vrou
  English    woman   wumin
  Danish     kvinde  kveni
  Swedish    kvinna  kvina
  Norwegian  kvine   kwini

Analysis
  ID   Taxa       Word    Gloss  GlossID  IPA     CogID
  ...  ...        ...     ...    ...      ...     ...
  21   German     Frau    woman  20       frau    1
  22   Dutch      vrouw   woman  20       vrɑu    1
  23   English    woman   woman  20       wʊmən   2
  24   Danish     kvinde  woman  20       kvenə   3
  25   Swedish    kvinna  woman  20       kviːna  3
  26   Norwegian  kvine   woman  20       kʋinə   3
  ...  ...        ...     ...    ...      ...     ...

22 / 30
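
The step from pairwise distances to cognate IDs can be sketched as a flat clustering with a distance cutoff. The slides do not name the clustering method or the threshold, so this sketch swaps in single linkage with an assumed threshold of 0.5; the function name `flat_clusters` is hypothetical. The distances are the ones shown in the matrix for "woman".

```python
LANGS = ["Swedish", "English", "Danish", "Norwegian", "Dutch", "German"]
DIST = [
    [0.00, 0.69, 0.07, 0.12, 0.71, 0.78],
    [0.69, 0.00, 0.66, 0.57, 0.68, 0.87],
    [0.07, 0.66, 0.00, 0.08, 0.67, 0.71],
    [0.12, 0.57, 0.08, 0.00, 0.75, 0.74],
    [0.71, 0.68, 0.67, 0.75, 0.00, 0.17],
    [0.78, 0.87, 0.71, 0.74, 0.17, 0.00],
]

def flat_clusters(dist, threshold):
    """Single-linkage flat clustering: link every pair closer than the threshold."""
    parent = list(range(len(dist)))

    def find(i):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(dist)):
        for j in range(i + 1, len(dist)):
            if dist[i][j] < threshold:
                parent[find(i)] = find(j)

    # Number clusters in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids) + 1) for i in range(len(dist))]

cog_ids = flat_clusters(DIST, threshold=0.5)  # threshold is an assumption
partition = dict(zip(LANGS, cog_ids))
print(partition)
```

Under these assumptions the partition groups Swedish, Danish, and Norwegian together, Dutch with German, and leaves English on its own, which is the same grouping as the CogID column above. The ID numbers themselves are arbitrary labels and differ from the slide.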

Slide 100

Slide 100 text

Automatic Sequence Comparison in Historical Linguistics: Automatic Cognate Detection

LexStat Algorithm (List 2012b & 2014)

INPUT → TOKENIZATION → PREPROCESSING → CORRESPONDENCE DETECTION → DISTANCE CALCULATION → COGNATE CLUSTERING → OUTPUT

CORRESPONDENCE DETECTION (LOOP, USING PHONETIC ALIGNMENT): ATTESTED DISTRIBUTION, EXPECTED DISTRIBUTION, LOG-ODDS

22 / 30
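
The relation between the attested distribution, the expected distribution, and the log-odds can be sketched as follows. The exact formula LexStat uses is not given on the slide, so this is a generic log-odds calculation with add-one smoothing, and the counts are invented toy values, not data from the talk.

```python
import math

def log_odds(attested, expected, smoothing=1.0):
    """Return log2(attested share / expected share) for each segment pair."""
    pairs = set(attested) | set(expected)
    att_total = sum(attested.values()) + smoothing * len(pairs)
    exp_total = sum(expected.values()) + smoothing * len(pairs)
    scores = {}
    for pair in pairs:
        p_att = (attested.get(pair, 0) + smoothing) / att_total
        p_exp = (expected.get(pair, 0) + smoothing) / exp_total
        scores[pair] = math.log2(p_att / p_exp)
    return scores

# Invented toy counts of aligned segment pairs (hypothetical, for illustration):
# "attested" from alignments of real word pairs, "expected" from shuffled ones.
attested = {("d", "t"): 12, ("d", "d"): 2, ("t", "t"): 6}
expected = {("d", "t"): 4, ("d", "d"): 8, ("t", "t"): 8}
scores = log_odds(attested, expected)
for pair, value in sorted(scores.items()):
    print(pair, round(value, 2))
```

A pair that is aligned more often in real data than chance predicts gets a positive score, so a recurring correspondence between different sounds can score higher than an identity match.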

Slide 101

Slide 101 text

Automatic Sequence Comparison in Historical Linguistics: Implementation

LingPy: http://lingpy.org

http://sequencecomparison.github.io

23 / 30

Slide 102

Slide 102 text

Automatic Sequence Comparison in Historical Linguistics: Evaluation

Evaluation: SCA (List 2012a, c & 2014)

Gold Standard for Multiple Alignment Analyses
  750 multiple alignments (manually edited)
  50 089 words
  528 different languages and dialects
  8 language families
  encoded in IPA
  online at http://sequencecomparison.github.io

24 / 30

Slide 103

Slide 103 text

Automatic Sequence Comparison in Historical Linguistics: Evaluation

Evaluation: SCA (List 2012a, c & 2014)

[Figure: Performance of the Sound-Class Based Phonetic Alignment Algorithm (Multiple Alignments). Two panels, column score (axis ticks 84 to 91) and pair score (axis ticks 97 to 99), each comparing the modes Basic, Library, Iterate, and Lib-Iter for the sound-class models DOLGO, ASJP, and SCA.]

24 / 30

Slide 104

Slide 104 text

Automatic Sequence Comparison in Historical Linguistics: Evaluation

Evaluation: SCA (List 2012a, c & 2014)

[Figure: Performance of the Sound-Class Based Phonetic Alignment Algorithm (Multiple Alignments). Two panels, column score (axis ticks 84 to 91) and pair score (axis ticks 97 to 99), each comparing the modes Basic, Library, Iterate, and Lib-Iter for the sound-class models DOLGO, ASJP, and SCA. Highlighted values: 92% and 99%.]

24 / 30
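
The two measures in the figure can be sketched under the definitions that are common for alignment benchmarks: the column score is the share of reference columns reproduced exactly in the test alignment, and the pair score is the share of aligned segment pairs from the reference that are also aligned in the test. Whether the talk uses exactly these formulas is an assumption. The reference below is the multiple alignment from the earlier slide in sound classes; the test alignment is a hypothetical variant that misplaces one segment.

```python
from itertools import combinations

def columns(alignment):
    """Describe each column by the position of its segment in every row (None = gap)."""
    counters = [0] * len(alignment)
    cols = []
    for col in zip(*alignment):
        indexed = []
        for row, segment in enumerate(col):
            if segment == "-":
                indexed.append(None)
            else:
                indexed.append(counters[row])
                counters[row] += 1
        cols.append(tuple(indexed))
    return cols

def column_score(reference, test):
    """Share of reference columns that occur identically in the test alignment."""
    ref, tst = columns(reference), set(columns(test))
    return sum(col in tst for col in ref) / len(ref)

def segment_pairs(alignment):
    """Collect every pair of segments that share a column."""
    pairs = set()
    for col in columns(alignment):
        for (i, a), (j, b) in combinations(enumerate(col), 2):
            if a is not None and b is not None:
                pairs.add((i, a, j, b))
    return pairs

def pair_score(reference, test):
    """Share of reference segment pairs that are also aligned in the test alignment."""
    ref, tst = segment_pairs(reference), segment_pairs(test)
    return len(ref & tst) / len(ref)

reference = ["JAP-L-KU", "JAPEL-KA", "JAP-LEKE", "JAP---KU"]
test = ["JAP-L-KU", "JAPEL-KA", "JAP-LEKE", "JAP--K-U"]  # hypothetical error
print(column_score(reference, test), round(pair_score(reference, test), 3))
```

A single misplaced segment breaks two of the eight columns but only three of the 33 segment pairs, which shows why pair scores tend to sit well above column scores.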

Slide 105

Slide 105 text

Automatic Sequence Comparison in Historical Linguistics: Evaluation

Evaluation: SCA (List 2012a, c & 2014)

  Taxon       Alignment
  Dashi       t͡ʂ  -  ɯ  -  ²¹  p  -  e  ²¹  -   -  -
  Eryuan      -   -  -  -  -   p  -  i  ³¹  ʂ   e  ⁴²
  Gongxing    d͡ʐ  -  i  -  ¹²  b  -  i  ²¹  -   -  -
  Heqing      -   -  -  -  -   p  -  i  ³¹  sʰ  e  ⁴⁴
  Jianchuan   -   -  -  -  -   p  -  i  ³¹  -   -  -
  Jianxing    ʦ   -  ɯ  -  ³¹  p  -  e  ²¹  -   -  -
  Lanping     -   -  -  -  -   p  -  ĩ  ⁴²  s   e  ⁴⁴
  Luobenzhuo  ʥ   -  ỹ  -  ⁴²  -  -  -  -   -   -  -
  Mazhelong   ɕ   -  e  n  ⁵⁵  p  -  e  ²¹  -   -  -
  Qiliqiao    -   -  -  -  -   p  -  i  ³¹  s   e  ⁴⁴
  Tuoluo      d   j  ɯ  -  ²¹  b  -  i  ³⁵  -   -  -
  Yunlong     -   -  -  -  -   b  j  ɯ  ²¹  s   ɛ  ⁵⁵
  Zhoucheng   ʦ   -  ɯ  -  ⁰   p  -  e  ²¹  -   -  -

25 / 30

Slide 106

Slide 106 text

Automatic Sequence Comparison in Historical Linguistics: Evaluation

Evaluation: LexStat (List 2012b & 2014)

Gold Standard for Automatic Cognate Detection
  6 lexicostatistical datasets
  10 243 cognate sets
  95 different languages and dialects
  8 language families
  encoded in IPA
  online at http://sequencecomparison.github.io

26 / 30

Slide 107

Slide 107 text

Automatic Sequence Comparison in Historical Linguistics: Evaluation

Evaluation: LexStat (List 2012b & 2014)

[Figure: Performance of Different Cognate Detection Algorithms. Scores (axis ticks 60 to 95) for Turchin, NED, SCA, and LexStat on six datasets: Bai (Tibeto-Burman), Indo-European, Japanese and Ryukyu, Ob-Ugrian, Austronesian, and Sinitic (Chinese Dialects).]

26 / 30

Slide 108

Slide 108 text

Automatic Sequence Comparison in Historical Linguistics: Evaluation

Evaluation: LexStat (List 2012b & 2014)

[Figure: Performance of Different Cognate Detection Algorithms. Scores (axis ticks 60 to 95) for Turchin, NED, SCA, and LexStat on six datasets: Bai (Tibeto-Burman), Indo-European, Japanese and Ryukyu, Ob-Ugrian, Austronesian, and Sinitic (Chinese Dialects). Highlighted values: 75%, 93%, 92%, 81%, 89%, 81%.]

26 / 30

Slide 109

Slide 109 text

Automatic Sequence Comparison in Historical Linguistics: Evaluation

Evaluation: LexStat (List 2012b & 2014)

[Figure: Performance of Different Cognate Detection Algorithms. Scores (axis ticks 60 to 95) for Turchin, NED, SCA, and LexStat on six datasets: Bai (Tibeto-Burman), Indo-European, Japanese and Ryukyu, Ob-Ugrian, Austronesian, and Sinitic (Chinese Dialects). Highlighted values: 75% and 93%.]

26 / 30

Slide 110

Slide 110 text

Automatic Sequence Comparison in Historical Linguistics: Evaluation

Evaluation: LexStat (List 2012b & 2014)

Dataset by Kessler (2001), concept "dig" (30)

  Language  Word        IPA       Turchin  Levenshtein  LexStat
  Albanian  gërmon      gərmo     1        1            1
  English   digs        dɪg       2        2            2
  French    creuse      krøze     1        3            3
  German    gräbt       graːb     1        1            4
  Hawaiian  ‘eli        ʔeli      5        5            5
  Navajo    hahashgééd  hahageːd  6        6            6
  Turkish   kazıyor     kaz       7        3            7

27 / 30

Slide 111

Slide 111 text

Automatic Sequence Comparison in Historical Linguistics: Evaluation

Evaluation: LexStat (List 2012b & 2014)

Dataset by Kessler (2001), concept "mouth" (104)

  Language  Word    IPA   Turchin  Levenshtein  LexStat
  Albanian  gojë    goj   1        1            1
  English   mouth   mauθ  2        2            2
  French    bouche  buʃ   3        3            3
  German    Mund    mund  4        4            2
  Hawaiian  waha    waha  5        5            5
  Navajo    ’azéé’  zeːʔ  6        6            6
  Turkish   ağız    aɣz   7        7            7

27 / 30
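
The Levenshtein baseline can be sketched as a normalized edit distance. Dividing by the length of the longer word is one common normalization and is an assumption here, as is treating each Unicode character as one segment; the threshold used in the talk to turn distances into cognate sets is not given on the slides.

```python
def levenshtein(a, b):
    """Count the minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]

def normalized_edit_distance(a, b):
    """Scale the edit distance to the range 0..1 by the longer word's length."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0

# English and German "mouth" as transcribed on the slide.
distance = normalized_edit_distance("mauθ", "mund")
print(distance)
```

The two words share only two of four segments, so their distance is high even though they are cognate. This is consistent with the table, where the Levenshtein column keeps English and German apart and LexStat groups them.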

Slide 112

Slide 112 text

Concluding Remarks 28 / 30

Slide 113

Slide 113 text

Concluding Remarks

Techniques for automatic sequence comparison in historical linguistics have advanced greatly over the last decade. They are now at a stage where they can actively help linguists study dialectal variation or carry out initial analyses of understudied languages.

There is, however, still room for improvement. So far, we cannot properly handle the major processes of lexical change, such as semantic shift, morphological processes, or borrowing.

29 / 30

Slide 114

Slide 114 text

Thanks for Your Attention! 30 / 30