Using networks to infer sound correspondence patterns across multiple languages

by Johann-Mattis List

Slide 1

Slide 1 text

Using networks to infer sound-correspondence patterns across multiple languages

Johann-Mattis List
Research Group “Computer-Assisted Language Comparison”
Department of Linguistic and Cultural Evolution
Max Planck Institute for the Science of Human History
Jena, Germany
2017-10-24

Slide 2

Slide 2 text

Comparative Linguistics

Slide 3

Slide 3 text

"All languages change, as long as they exist." (August Schleicher 1863)

Indo-European   Germanic   Old English   English
p               f          f             f
ə               a          æ             ɑː
t               d          d             ð
eː              eː         e             ə
r               r          r             r

[Figure labels: walkman, iPod; Germanic, German, English]

Slide 12

Slide 12 text

Background

Slide 17

Slide 17 text

The Comparative Method

[Figure label: COMPARATIVE METHOD]

Slide 22

Slide 22 text

Computational Historical Linguistics

[Figure label: COMPUTATIONAL HISTORICAL LINGUISTICS]

Slide 27

Slide 27 text

Classical vs. Computational Language Comparison

COMPARATIVE METHOD     COMPUTATIONAL HISTORICAL LINGUISTICS
high accuracy          high efficiency
high flexibility       high consistency
lacks efficiency       lacks accuracy
lacks consistency      lacks flexibility

Slide 30

Slide 30 text

Computer-Assisted Language Comparison (CALC)

COMPARATIVE METHOD: accuracy, flexibility
COMPUTATIONAL HISTORICAL LINGUISTICS: consistency, efficiency

Slide 32

Slide 32 text

Historical Language Comparison

Slide 36

Slide 36 text

Alphabets in Biology and Linguistics

Biology        Linguistics
• universal    • language-specific
• limited      • widely varying
• constant     • mutable

Slide 37

Slide 37 text

Inferring Correspondences

Sound   Bola   Maru   Rangoon
d       Ø      Ø      d
t       t      t      t
tʰ      tʰ     tʰ     tʰ
tθ      Ø      Ø      tθ
ts      ts     ts     Ø
tsʰ     tsʰ    tsʰ    Ø
tʃ      tʃ     tʃ     Ø
tʃʰ     tʃʰ    tʃʰ    Ø
s       s      s      s
sʰ      Ø      Ø      sʰ
ɕ       Ø      Ø      ɕ
ʃ       ʃ      ʃ      Ø

Slide 45

Slide 45 text

Inferring Homologs

Cognate list and alignment:

Language   Concept   Alignment
Bola       six       kʰ j a u ʔ ⁵⁵
Maru       six       kʰ j a u k ⁵⁵
Bola       lip       k a̰ u ʔ ⁵⁵
Maru       lip       k a̰ u k ⁵⁵
Bola       man       j a u ʔ ³¹
Maru       man       j a u k ³¹

Correspondences:

Bola    Maru    Freq.
a(a̰)    a(a̰)    3 x
u       u       3 x
ʔ       k       3 x
j       j       2 x
k(ʰ)    k(ʰ)    2 x
⁵⁵      ⁵⁵      2 x
³¹      ³¹      1 x
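
The frequency table on this slide can be reproduced mechanically from the aligned word pairs. The following is a minimal Python sketch and not the author's implementation: the names `ALIGNMENTS` and `count_correspondences` are made up for illustration, and segments are compared as exact strings, so `a`/`a̰` and `k`/`kʰ` are counted separately instead of being merged as on the slide.

```python
from collections import Counter

# Aligned Bola/Maru cognate pairs, transcribed from the slide.
# Each value is (Bola segments, Maru segments); both lists have equal length.
ALIGNMENTS = {
    "six": (["kʰ", "j", "a", "u", "ʔ", "⁵⁵"], ["kʰ", "j", "a", "u", "k", "⁵⁵"]),
    "lip": (["k", "a̰", "u", "ʔ", "⁵⁵"], ["k", "a̰", "u", "k", "⁵⁵"]),
    "man": (["j", "a", "u", "ʔ", "³¹"], ["j", "a", "u", "k", "³¹"]),
}


def count_correspondences(alignments):
    """Count how often each (Bola, Maru) segment pair occupies the same column."""
    counts = Counter()
    for bola, maru in alignments.values():
        if len(bola) != len(maru):
            raise ValueError("aligned sequences must have equal length")
        counts.update(zip(bola, maru))
    return counts


counts = count_correspondences(ALIGNMENTS)
for (bola, maru), freq in counts.most_common():
    print(f"{bola}\t{maru}\t{freq} x")
```

Adding the counts for `a`/`a` and `a̰`/`a̰` gives the 3 x of the slide's `a(a̰)` row, and the same holds for `k(ʰ)` with 2 x.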

Slide 46

Slide 46 text

Inferring Patterns

Concept    Bola          Maru          Rangoon
"salt"     tʰ a ³⁵       tsʰ ɔ ³⁵      sʰ ɑ ⁵⁵
"tooth"    t u i ⁵⁵      ts ɔ i ³¹     tθ w a ⁵⁵
"sharp"    tʰ a ʔ ⁵⁵     tʰ ɔ ʔ ⁵⁵     tʰ ɛ ʔ ⁴
"wing"     t a u ŋ ⁵⁵    t a u ŋ ³¹    t ɑ u ∼ ²²

Patterns of the initial consonants (Bola : Maru : Rangoon):

"salt"     tʰ : tsʰ : sʰ
"tooth"    t : ts : tθ
"sharp"    tʰ : tʰ : tʰ
"wing"     t : t : t

Slide 54

Slide 54 text

Inferring Correspondence Patterns

Slide 55

Slide 55 text

Sound Correspondence Patterns (Clackson 2007: 37)

Columns: PIE, Hittite, Sanskrit, Avestan, Greek, Latin, Gothic, Old Church Slavonic, Lithuanian, Old Irish, Armenian, Tocharian

Each row lists the reflexes of one PIE sound in the order of the columns; a language may contribute more than one reflex.

*p     p p p f p p f b p p Ø h w Ø p
*b     b p b bβ b b p b b b p p
*bʰ    b p bʱ/bh bβ pʰ/ph f b b b b b b p
*t     t t t θ t t θ/þ d t t t tʼ j/y t tʃ/c
*d     d t d d ð d d t d d d t ts ʃ/ś
*dʰ    d t dʰ/dh h d ð tʰ/th f d b d d d d t t tʃ/c
...
*kʷ    kʷ/ku k c k c k p t kʷ/qu hʷ/hw g k tʃ/č k c kʼ tʃʼ/čʼ k ʃʲ/ś
*gʷ    kʷ/u g j g j g b d gʷ/gu u q g ʒ/ž z g b k k ś
*gʷʰ   kʷ/ku gʷ/gu gʱ/gh h g j pʰ/ph tʰ/th kʰ/kh f gʷ/gu u g b g ʒ/ž z g g g dʒ/ǰ k ʃʲ/ś

Slide 59

Slide 59 text

Sound Correspondence Patterns

• Correspondence patterns in linguistics are a way to encode mappings across several different alphabets.
• They are usually inferred manually, by inspecting “correspondence sets” (Clackson 2007: 29f) of words (i.e., cognate sets with recurring sounds).
• The main problem of correspondence pattern identification is the handling of missing data, since not all cognate sets will necessarily contain reflexes from each of the languages under investigation.

Slide 63

Slide 63 text

Inference of Correspondence Patterns

• The main idea of the correspondence pattern inference algorithm is to derive a graph from correspondence sets: each individual correspondence set (a site in an aligned cognate set) is a node, and links are drawn between compatible correspondence sets.
• Two correspondence sets are compatible if they have identical non-missing values for at least one language and no conflicting data for any of the languages.
• If two or more correspondence sets are compatible, we can impute missing values by combining them.

Slide 65

Slide 65 text

Compatibility of Alignment Sites

Cognate Set   L1   L2   L3   L4   L5   L6   L7   L8
“hand-1”      p    p    p    Ø    f    f    Ø    p
“foot-1”      p    p    p    p    f    f    p    p
⊠ compatible   □ incompatible

Cognate Set   L1   L2   L3   L4   L5   L6   L7   L8
“hand-1”      p    p    p    Ø    f    f    Ø    p
“leg-1”       p    p    f    pf   f    f    p    p
□ compatible   ⊠ incompatible
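
The compatibility criterion and the imputation step can be written down in a few lines. The sketch below is only an illustration under the definition given in the talk (at least one identical non-missing value, no conflicting value); the function names `compatible` and `merge` and the `MISSING` marker are assumptions introduced here, not names from the author's software.

```python
MISSING = "Ø"  # marker for a language without a reflex in the cognate set


def compatible(site_a, site_b):
    """True if the sites agree on at least one language and conflict on none."""
    if len(site_a) != len(site_b):
        raise ValueError("sites must cover the same languages")
    shared = 0
    for a, b in zip(site_a, site_b):
        if a == MISSING or b == MISSING:
            continue  # missing data can neither support nor contradict
        if a != b:
            return False  # one conflicting language is enough to reject
        shared += 1
    return shared > 0


def merge(site_a, site_b):
    """Impute missing values by combining two compatible sites."""
    if not compatible(site_a, site_b):
        raise ValueError("cannot merge incompatible sites")
    return [b if a == MISSING else a for a, b in zip(site_a, site_b)]


# Alignment sites for L1..L8, transcribed from the slide.
hand = ["p", "p", "p", "Ø", "f", "f", "Ø", "p"]
foot = ["p", "p", "p", "p", "f", "f", "p", "p"]
leg = ["p", "p", "f", "pf", "f", "f", "p", "p"]

print(compatible(hand, foot))  # True
print(compatible(hand, leg))   # False, L3 has p against f
print(merge(hand, foot))       # the gaps of "hand-1" in L4 and L7 are filled with p
```

Requiring at least one shared value keeps two sites that never overlap in any language from being linked by accident.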

Slide 69

Slide 69 text

Compatibility Graphs

• 8 Burmish languages (spoken in China and Myanmar, taken from Hill and List 2017)
• 240 concepts
• 855 partial cognate sets
• 728 cross-semantic partial cognate sets (covering one or more concepts)
• 218 valid cognate sets (residues in more than one language)

Slide 70

Slide 70 text

Compatibility Graphs

A "gums"
Achang    ʂ    -    u    a    ³¹
Rangoon   tθ   w    -    ɑ    ⁵⁵
Atsi      Ø    Ø    Ø    Ø    Ø
Bola      Ø    Ø    Ø    Ø    Ø
Maru      Ø    Ø    Ø    Ø    Ø

B "die"
Achang    Ø    Ø    Ø    Ø
Rangoon   tθ   e    -    ²²
Atsi      Ø    Ø    Ø    Ø
Bola      ʃ    ɿ    -    ⁵⁵
Maru      ʃ    i    k    ³¹

C "daughter"
Achang    Ø    Ø    Ø
Rangoon   tθ   ɑ    ⁵³
Atsi      Ø    Ø    Ø
Bola      Ø    Ø    Ø
Maru      ts   o    ⁰

Pairwise comparison of the initial sites:

          A    B      A    C      B    C
Achang    ʂ    Ø      ʂ    Ø      Ø    Ø
Rangoon   tθ   tθ     tθ   tθ     tθ   tθ
Atsi      Ø    Ø      Ø    Ø      Ø    Ø
Bola      Ø    ʃ      Ø    Ø      ʃ    Ø
Maru      Ø    ʃ      Ø    ts     ʃ    ts

Legend: identical, compatible, incompatible

[Figure: graph with the nodes A, B and C]
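
To make the step from sites to a graph concrete, the sketch below builds the compatibility graph for the three initial sites A, B and C of this slide. It is an illustration only: the adjacency-dictionary representation and the names `SITES`, `compatible` and `compatibility_graph` are assumptions chosen here, not the author's data structures.

```python
from itertools import combinations

MISSING = "Ø"

# Initial sites of the three cognate sets, in the row order of the slide:
# Achang, Rangoon, Atsi, Bola, Maru.
SITES = {
    "A": ["ʂ", "tθ", "Ø", "Ø", "Ø"],   # "gums"
    "B": ["Ø", "tθ", "Ø", "ʃ", "ʃ"],   # "die"
    "C": ["Ø", "tθ", "Ø", "Ø", "ts"],  # "daughter"
}


def compatible(site_a, site_b):
    """At least one identical non-missing value and no conflicting value."""
    pairs = [(a, b) for a, b in zip(site_a, site_b) if MISSING not in (a, b)]
    return bool(pairs) and all(a == b for a, b in pairs)


def compatibility_graph(sites):
    """Return an adjacency dict with an edge for every compatible pair of sites."""
    graph = {name: set() for name in sites}
    for a, b in combinations(sites, 2):
        if compatible(sites[a], sites[b]):
            graph[a].add(b)
            graph[b].add(a)
    return graph


graph = compatibility_graph(SITES)
for node in graph:
    print(node, sorted(graph[node]))
```

A is linked to both B and C through the shared Rangoon tθ, but B and C conflict in Maru (ʃ against ts). Compatibility is therefore not transitive, which is why connected components alone are not enough to group sites.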

Slide 71

Slide 71 text

Compatibility Graphs

[Figure: compatibility graph of the correspondence sets; each node is labelled with a sound, e.g. pʰ, tʰ, ŋ, tsʰ, tʃʰ, ʃ, s, ts, t, p, m, l, k, kʰ, n.]

Slide 73

Slide 73 text

Compatibility Graphs

[Figure: graph with the legend “good correspondence set” and “bad correspondence set”.]

Slide 74

Slide 74 text

Compatibility Graphs

Only fully compatible clusters (i.e., only cliques in our network of correspondence sets) can represent true sound correspondence patterns (if sound change is regular).

Slide 79

Slide 79 text

Correspondence Pattern Inference as Clique Cover Problem

• The clique cover problem (also called clique partitioning problem, see Bhasker 1991) is the inverse of the famous graph coloring problem (a clique cover of a graph is a coloring of its complement graph) and has been shown to be NP-hard.
• The goal is to split a graph into the smallest number of cliques such that each node belongs to exactly one clique.
• We assume (but cannot formally prove) that the minimal clique cover of our graph of compatible correspondence sets corresponds to the optimal set of sound correspondence patterns in our data.
• By applying an approximation algorithm to infer a near-optimal clique cover of our data of aligned cognate sets, we can infer the most frequently recurring correspondence patterns in our data.
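
Because the problem is NP-hard, a practical solution has to rely on a heuristic. The sketch below shows one simple greedy heuristic for partitioning a graph into cliques. It is not the approximation algorithm used in the talk, whose details are not given on the slides; the function names and the small example graph are assumptions for illustration.

```python
def greedy_clique_cover(graph):
    """Partition the nodes of an undirected graph into cliques, greedily.

    graph maps each node to the set of its neighbours. Nodes are visited in
    order of decreasing degree, and each node joins the first existing clique
    whose members are all its neighbours, or opens a new clique otherwise.
    The result is a valid clique partition, but not necessarily a minimal one.
    """
    cliques = []
    for node in sorted(graph, key=lambda n: len(graph[n]), reverse=True):
        for clique in cliques:
            if all(member in graph[node] for member in clique):
                clique.append(node)
                break
        else:
            cliques.append([node])
    return cliques


def is_clique(graph, nodes):
    """True if every pair of the given nodes is connected."""
    return all(b in graph[a] for i, a in enumerate(nodes) for b in nodes[i + 1:])


# Hypothetical toy graph: A is linked to B and to C, but B and C are not linked.
GRAPH = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}

cover = greedy_clique_cover(GRAPH)
print(cover)  # [['A', 'B'], ['C']]
```

The toy graph also shows the greediness: A could equally well be grouped with C, and the heuristic simply takes the first option it meets.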

Slide 80

Slide 80 text

Graph with Clique Cover

[Figure: the compatibility graph with its clique cover; each node is labelled with a sound.]

Slide 83

Slide 83 text

Graph with Clique Cover

[Figure: subgraph whose nodes are labelled tʃʰ, tsʰ and tʰ]

Clique  Cogn.  Concept     Achang  Atsi  Bola  Lashi  Maru  Old B.  Rang.  Xiand.
41      659    "goat"      Ø       Ø     Ø     Ø      tʃʰ   tsʰ     sʰ     Ø
41      672    "armpit"    Ø       Ø     tʃʰ   tʃʰ    tʃʰ   Ø       Ø      Ø
41      433    "rice"      tsʰ     tʃʰ   tʃʰ   tʃʰ    tʃʰ   Ø       sʰ     tsʰ

Clique  Cogn.  Concept     Achang  Atsi  Bola  Lashi  Maru  Old B.  Rang.  Xiand.
74      55     "ten"       Ø       Ø     tʰ    tsʰ    Ø     Ø       Ø      tsʰ
42      53     "ten"       tɕʰ     tsʰ   Ø     Ø      Ø     tsʰ     sʰ     Ø
42      421    "salt"      tɕʰ     tsʰ   tʰ    tsʰ    tsʰ   tsʰ     sʰ     cʰ
42      129    "twenty"    Ø       tsʰ   tʰ    tsʰ    tsʰ   tsʰ     sʰ     Ø
83      287    "hair"      Ø       tsʰ   tsʰ   tsʰ    tsʰ   tsʰ     sʰ     Ø

Clique  Cogn.  Concept     Achang  Atsi  Bola  Lashi  Maru  Old B.  Rang.  Xiand.
17      639    "above"     Ø       tʰ    tʰ    Ø      tʰ    tʰ      tʰ     Ø
17      472    "sing"      Ø       tʰ    tʰ    Ø      tʰ    Ø       Ø      Ø
17      61     "that"      tʰ      Ø     Ø     tʰ     tʰ    tʰ      tʰ     tʰ
17      323    "sharp"     tʰ      tʰ    tʰ    tʰ     tʰ    tʰ      tʰ     Ø
17      66     "there"     tʰ      Ø     tʰ    tʰ     tʰ    Ø       tʰ     tʰ
17      547    "firewood"  tʰ      tʰ    tʰ    tʰ     tʰ    tʰ      tʰ     tʰ
17      74     "thick"     Ø       tʰ    tʰ    tʰ     tʰ    Ø       tʰ     Ø

Slide 88

Slide 88 text

Summary on Results

• 104 initial consonant patterns (48 with more than one reflex, the rest highly irregular)
• many patterns correspond to well-known proto-sounds in the data
• some cliques are unintuitive

Slide 94

Slide 94 text

Summary on Problems

Irregular patterns:
• result in part from problems in cognate coding (homology assessment)
• result in part from sparseness of data
• result in part from the exactness of the algorithm

Unintuitive patterns:
• result in part from the greediness of the algorithm

Slide 99

Slide 99 text

Possible Improvements

• catch the greediness of the algorithm by adding a secondary check of cliques (calculate consensus, re-assign all alignment sites to all compatible consensus sequences, count the instances)
• allow for non-perfect cliques in which compatibility is allowed to deviate to a certain degree (e.g., one irregular cell)
• provide a more fine-grained checking of proposed cliques by counting columns suffering from missing data
• allow for a direct checking and correcting of patterns by the experts
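
The first two improvements can be illustrated with the rows of clique 42 from the "Graph with Clique Cover" tables. The sketch below computes a consensus pattern for a clique and counts the conflicting cells between a site and that pattern, so that sites with at most one irregular cell can be re-assigned. All function names and the `max_conflicts` threshold are assumptions for illustration; this is a proposal-level sketch, not an implemented part of the author's method.

```python
MISSING = "Ø"


def consensus(sites):
    """Combine mutually compatible sites into one pattern with fewer gaps."""
    pattern = [MISSING] * len(sites[0])
    for site in sites:
        for i, value in enumerate(site):
            if value == MISSING:
                continue
            if pattern[i] not in (MISSING, value):
                raise ValueError(f"sites conflict in column {i}")
            pattern[i] = value
    return pattern


def count_conflicts(site, pattern):
    """Number of languages in which site and pattern have different sounds."""
    return sum(1 for s, p in zip(site, pattern) if MISSING not in (s, p) and s != p)


def fits(site, pattern, max_conflicts=1):
    """Relaxed compatibility: some overlap and at most max_conflicts mismatches."""
    overlap = sum(1 for s, p in zip(site, pattern) if MISSING not in (s, p))
    return overlap > 0 and count_conflicts(site, pattern) <= max_conflicts


# Columns: Achang, Atsi, Bola, Lashi, Maru, Old B., Rang., Xiand.
CLIQUE_42 = [
    ["tɕʰ", "tsʰ", "Ø", "Ø", "Ø", "tsʰ", "sʰ", "Ø"],        # 53 "ten"
    ["tɕʰ", "tsʰ", "tʰ", "tsʰ", "tsʰ", "tsʰ", "sʰ", "cʰ"],  # 421 "salt"
    ["Ø", "tsʰ", "tʰ", "tsʰ", "tsʰ", "tsʰ", "sʰ", "Ø"],     # 129 "twenty"
]
TEN_55 = ["Ø", "Ø", "tʰ", "tsʰ", "Ø", "Ø", "Ø", "tsʰ"]            # clique 74
HAIR_287 = ["Ø", "tsʰ", "tsʰ", "tsʰ", "tsʰ", "tsʰ", "sʰ", "Ø"]    # clique 83

pattern = consensus(CLIQUE_42)
print(pattern)
print(count_conflicts(TEN_55, pattern))    # 1 (Xiand.: tsʰ against cʰ)
print(count_conflicts(HAIR_287, pattern))  # 1 (Bola: tsʰ against tʰ)
print(fits(TEN_55, pattern), fits(HAIR_287, pattern))
```

With a tolerance of one irregular cell, both singleton sites would fit the consensus of clique 42; whether such a merge is linguistically justified remains a matter for expert checking, as the last bullet point suggests.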

Slide 101

Slide 101 text

Outlook

Slide 105

Slide 105 text

Outlook

• The proposed inference of correspondence patterns is a first attempt to account for systemic aspects of sound change in a rigorous manner.
• In contrast to many approaches proposed so far, it does not require family trees in any form; networks are sufficient. The inferred patterns can nevertheless be used to study tree-like aspects of evolution (Chacon and List 2015).
• The algorithm is a good start, but it needs to be improved in several ways.

Slide 106

Slide 106 text

Thank you for your attention!