Using networks to infer sound correspondence patterns across multiple languages

Johann-Mattis List
Research Group “Computer-Assisted Language Comparison”
Department of Linguistic and Cultural Evolution
Max-Planck Institute for the Science of Human History, Jena, Germany
2017-10-24

Comparative Linguistics

"All languages change, as long as they exist." (August Schleicher 1863)

Indo-European | Germanic | Old English | English
p | f | f | f
ə | a | æ | ɑː
t | d | d | ð
eː | eː | e | ə
r | r | r | r

[Figure labels: "walkman", "iPod"; tree with Germanic, German, and English]


Background

The Comparative Method

Computational Historical Linguistics

Classical vs. Computational Language Comparison

Comparative method: high accuracy, high flexibility; lacks efficiency, lacks consistency
Computational historical linguistics: high efficiency, high consistency; lacks accuracy, lacks flexibility


Computer-Assisted Language Comparison (CALC)

[Figure: the comparative method and computational historical linguistics brought together under the labels accuracy, flexibility, consistency, and efficiency]


Historical Language Comparison

Alphabets in Biology and Linguistics

Biology | Linguistics
universal | language-specific
limited | widely varying
constant | mutable

Inferring Correspondences

Sound | Bola | Maru | Rangoon
d | Ø | Ø | d
t | t | t | t
tʰ | tʰ | tʰ | tʰ
tθ | Ø | Ø | tθ
ts | ts | ts | Ø
tsʰ | tsʰ | tsʰ | Ø
tʃ | tʃ | tʃ | Ø
tʃʰ | tʃʰ | tʃʰ | Ø
s | s | s | s
sʰ | Ø | Ø | sʰ
ɕ | Ø | Ø | ɕ
ʃ | ʃ | ʃ | Ø

Inferring Homologs

Cognate List and Alignment:

Language | Concept | Alignment
Bola | six | kʰ j a u ʔ ⁵⁵
Maru | six | kʰ j a u k ⁵⁵
Bola | lip | k a̰ u ʔ ⁵⁵
Maru | lip | k a̰ u k ⁵⁵
Bola | man | j a u ʔ ³¹
Maru | man | j a u k ³¹

Correspondences:

Bola | Maru | Freq.
a(a̰) | a(a̰) | 3 x
u | u | 3 x
ʔ | k | 3 x
j | j | 2 x
k(ʰ) | k(ʰ) | 2 x
⁵⁵ | ⁵⁵ | 2 x
³¹ | ³¹ | 1 x
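The frequency table on this slide can be reproduced by counting segment pairs column by column. The following is a minimal sketch, not the method used in the talk: the function name `count_correspondences` is hypothetical, and it assumes that forms of equal length are already aligned position by position. Unlike the slide, it does not group a with a̰ or k with kʰ, so those pairs are counted separately.

```python
from collections import Counter

# Segmented cognate pairs from the slide: (Bola form, Maru form).
cognates = {
    "six": ("kʰ j a u ʔ ⁵⁵", "kʰ j a u k ⁵⁵"),
    "lip": ("k a̰ u ʔ ⁵⁵", "k a̰ u k ⁵⁵"),
    "man": ("j a u ʔ ³¹", "j a u k ³¹"),
}


def count_correspondences(pairs):
    """Count how often each (Bola, Maru) segment pair occurs per column."""
    counts = Counter()
    for bola, maru in pairs.values():
        bola_segs, maru_segs = bola.split(), maru.split()
        # Assumption: equal length means the two forms are already aligned.
        if len(bola_segs) != len(maru_segs):
            raise ValueError("forms must be aligned to equal length")
        counts.update(zip(bola_segs, maru_segs))
    return counts


counts = count_correspondences(cognates)
for (bola_seg, maru_seg), freq in counts.most_common():
    print(f"{bola_seg}\t{maru_seg}\t{freq} x")
```

The recurring pair ʔ : k is the only non-identical correspondence in this small sample, which is what makes it informative.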

Inferring Patterns

Correspondence patterns of initial consonants across Bola, Maru, and Rangoon:

Concept | Bola | Maru | Rangoon | Pattern of initials
"salt" | tʰ a ³⁵ | tsʰ ɔ ³⁵ | sʰ ɑ ⁵⁵ | tʰ : tsʰ : sʰ
"tooth" | t u i ⁵⁵ | ts ɔ i ³¹ | tθ w a ⁵⁵ | t : ts : tθ
"sharp" | tʰ a ʔ ⁵⁵ | tʰ ɔ ʔ ⁵⁵ | tʰ ɛ ʔ ⁴ | tʰ : tʰ : tʰ
"wing" | t a u ŋ ⁵⁵ | t a u ŋ ³¹ | t ɑ u ∼ ²² | t : t : t

Inferring Correspondence Patterns

Sound Correspondence Patterns

PIE | Hittite | Sanskrit | Avestan | Greek | Latin | Gothic | Old Church Slavonic | Lithuanian | Old Irish | Armenian | Tocharian
*p | p | p | p, f | p | p | f, b | p | p | Ø | h, w, Ø | p
*b | b, p | b | b, β | b | b | p | b | b | b | p | p
*bʰ | b, p | bʱ/bh | b, β | pʰ/ph | f, b | b | b | b | b | b | p
*t | t | t | t, θ | t | t | θ/þ, d | t | t | t | tʼ, j/y | t, tʃ/c
*d | d, t | d | d, ð | d | d | t | d | d | d | t | ts, ʃ/ś
*dʰ | d, t | dʰ/dh, h | d, ð | tʰ/th | f, d, b | d | d | d | d | t | t, tʃ/c
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
*kʷ | kʷ/ku | k, c | k, c | k, p, t | kʷ/qu | hʷ/hw, g | k, tʃ/č | k | c | kʼ, tʃʼ/čʼ | k, ʃʲ/ś
*gʷ | kʷ/u | g, j | g, j | g, b, d | gʷ/gu, u | q | g, ʒ/ž, z | g | b | k | k, ś
*gʷʰ | kʷ/ku, gʷ/gu | gʱ/gh, h | g, j | pʰ/ph, tʰ/th, kʰ/kh | f, gʷ/gu, u | g, b | g, ʒ/ž, z | g | g | g, dʒ/ǰ | k, ʃʲ/ś

Clackson (2007: 37)

• correspondence patterns in linguistics are a way to encode mappings across several different alphabets
• they are usually inferred manually, by inspecting “correspondence sets” (Clackson 2007: 29f) of words (i.e., cognate sets with recurring sounds)
• the main problem of correspondence pattern identification is the handling of missing data, since not all cognate sets will necessarily contain reflexes from each of the languages under investigation

Inference of Correspondence Patterns

• the main idea of the correspondence pattern inference algorithm is to derive a graph from correspondence sets: each individual correspondence set (a site in an aligned cognate set) is a node, and links are drawn between compatible correspondence sets
• two correspondence sets are compatible if they have identical non-missing values for at least one language and no conflicting data for any of the languages
• if two or more correspondence sets are compatible, we can impute missing values by combining them
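The compatibility criterion and the imputation step can be written down directly. The following is a minimal sketch under stated assumptions: the function names `compatible` and `merge` are hypothetical, missing reflexes are encoded as "Ø" as on the slides, and the three sites are the "hand-1", "foot-1", and "leg-1" rows used in the talk's compatibility example.

```python
MISSING = "Ø"


def compatible(site_a, site_b):
    """True if the sites share at least one attested value and never conflict."""
    shared = 0
    for a, b in zip(site_a, site_b):
        if a == MISSING or b == MISSING:
            continue  # missing data can neither support nor contradict
        if a != b:
            return False  # conflicting reflexes in the same language
        shared += 1
    return shared > 0


def merge(site_a, site_b):
    """Impute missing values by combining two compatible sites."""
    if not compatible(site_a, site_b):
        raise ValueError("sites are not compatible")
    return [b if a == MISSING else a for a, b in zip(site_a, site_b)]


# Alignment sites over languages L1..L8, taken from the talk's example.
hand = ["p", "p", "p", "Ø", "f", "f", "Ø", "p"]
foot = ["p", "p", "p", "p", "f", "f", "p", "p"]
leg = ["p", "p", "f", "pf", "f", "f", "p", "p"]

print(compatible(hand, foot))  # no conflicts, several shared values
print(merge(hand, foot))       # gaps in "hand-1" filled from "foot-1"
print(compatible(hand, leg))   # L3 has p in "hand-1" but f in "leg-1"
```

Note that compatibility defined this way is symmetric but not transitive, which is why the later slides work with cliques and not with connected components.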

Compatibility of Alignment Sites

Cognate Set | L1 | L2 | L3 | L4 | L5 | L6 | L7 | L8
“hand-1” | p | p | p | Ø | f | f | Ø | p
“foot-1” | p | p | p | p | f | f | p | p
⊠ compatible □ incompatible

Cognate Set | L1 | L2 | L3 | L4 | L5 | L6 | L7 | L8
“hand-1” | p | p | p | Ø | f | f | Ø | p
“leg-1” | p | p | f | pf | f | f | p | p
□ compatible ⊠ incompatible

Compatibility Graphs

• 8 Burmish languages (spoken in China and Myanmar, taken from Hill and List 2017)
• 240 concepts
• 855 partial cognate sets
• 728 cross-semantic partial cognate sets (covering one or more concepts)
• 218 valid cognate sets (residues in more than one language)

Three aligned cognate sets:

Language | A "gums" | B "die" | C "daughter"
Achang | ʂ - u a ³¹ | Ø | Ø
Rangoon | tθ w - ɑ ⁵⁵ | tθ e - ²² | tθ ɑ ⁵³
Atsi | Ø | Ø | Ø
Bola | Ø | ʃ ɿ - ⁵⁵ | Ø
Maru | Ø | ʃ i k ³¹ | ts o ⁰

Initial sites compared pairwise (cells are marked as identical, compatible, or incompatible):

Language | A | B | C
Achang | ʂ | Ø | Ø
Rangoon | tθ | tθ | tθ
Atsi | Ø | Ø | Ø
Bola | Ø | ʃ | Ø
Maru | Ø | ʃ | ts

A and B are compatible, A and C are compatible, and B and C are incompatible (Maru has ʃ in B but ts in C).

[Figure: graph with the nodes A, B, and C]
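The pairwise comparison of the sites A, B, and C can be turned into a small compatibility graph. The following is a minimal sketch, assuming an adjacency dictionary as the graph representation; the function names `compatible` and `compatibility_graph` are hypothetical and not taken from the talk.

```python
from itertools import combinations

MISSING = "Ø"

# Initial-consonant sites from the slide.
# Language order: Achang, Rangoon, Atsi, Bola, Maru.
sites = {
    "A": ["ʂ", "tθ", "Ø", "Ø", "Ø"],   # "gums"
    "B": ["Ø", "tθ", "Ø", "ʃ", "ʃ"],   # "die"
    "C": ["Ø", "tθ", "Ø", "Ø", "ts"],  # "daughter"
}


def compatible(site_a, site_b):
    """At least one shared attested value and no conflicting values."""
    attested = [(a, b) for a, b in zip(site_a, site_b)
                if MISSING not in (a, b)]
    return bool(attested) and all(a == b for a, b in attested)


def compatibility_graph(sites):
    """Return an adjacency dict with an edge for every compatible pair."""
    graph = {name: set() for name in sites}
    for u, v in combinations(sites, 2):
        if compatible(sites[u], sites[v]):
            graph[u].add(v)
            graph[v].add(u)
    return graph


graph = compatibility_graph(sites)
for node in sorted(graph):
    print(node, sorted(graph[node]))
```

A is linked to both B and C, but B and C are not linked to each other, so the three sites cannot all belong to one pattern even though they form a connected component.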

[Figure: compatibility graph of correspondence sets in the Burmish data; nodes are labeled with sounds (pʰ, tʰ, ŋ, tsʰ, tʃʰ, ʃ, s, tʃ, x, t, ts, p, m, l, n, j, ɣ, k, kʰ, and others), and the legend distinguishes good correspondence sets from bad correspondence sets]

Only fully compatible clusters (i.e., only cliques in our network of correspondence sets) can represent true sound correspondence patterns (if sound change is regular).

Correspondence Pattern Inference as Clique Cover Problem

• The clique cover problem (also called clique partitioning problem, see Bhasker 1991) is the inverse of the famous graph coloring problem (a clique cover of a graph is a coloring of its complement) and has been shown to be NP-hard.
• The goal is to split a graph into the smallest number of cliques such that each node belongs to exactly one clique.
• We assume (but cannot formally prove) that the clique cover of our graph of compatible correspondence sets will correspond to the optimal set of sound correspondence patterns in our data.
• By applying an approximation algorithm to infer a near-optimal clique cover of our data of aligned cognate sets, we can infer the most frequently recurring correspondence patterns in our data.
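The slides do not say which approximation algorithm is used. The following is a minimal sketch of one simple possibility, a greedy clique partition, and should not be read as the talk's implementation: the function name `greedy_clique_cover`, the node ordering, and the toy graph are all assumptions made for illustration.

```python
def greedy_clique_cover(graph):
    """Greedily partition the nodes of an adjacency dict into cliques."""
    # Assumption: visit nodes with many compatible partners first;
    # ties are broken alphabetically to keep the result deterministic.
    order = sorted(graph, key=lambda node: (-len(graph[node]), node))
    cliques = []
    for node in order:
        for clique in cliques:
            # A node may join a clique only if it is linked to every member.
            if clique <= graph[node]:
                clique.add(node)
                break
        else:
            cliques.append({node})  # no existing clique fits: open a new one
    return cliques


# Hypothetical compatibility graph: a, b, c form a triangle,
# d is only compatible with c, and e is isolated.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
    "e": set(),
}

cover = greedy_clique_cover(graph)
print(sorted(sorted(clique) for clique in cover))
```

Because each node joins the first clique that accepts it, the result depends on the visiting order and is not guaranteed to be minimal.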

Graph with Clique Cover

[Figure: the compatibility graph partitioned into cliques; the highlighted nodes tʃʰ (3), tsʰ (5), and tʰ (7) correspond to the rows of the three tables below]

Clique | Cogn. | Concept | Achang | Atsi | Bola | Lashi | Maru | Old B. | Rang. | Xiand.
41 | 659 | "goat" | Ø | Ø | Ø | Ø | tʃʰ | tsʰ | sʰ | Ø
41 | 672 | "armpit" | Ø | Ø | tʃʰ | tʃʰ | tʃʰ | Ø | Ø | Ø
41 | 433 | "rice" | tsʰ | tʃʰ | tʃʰ | tʃʰ | tʃʰ | Ø | sʰ | tsʰ

Clique | Cogn. | Concept | Achang | Atsi | Bola | Lashi | Maru | Old B. | Rang. | Xiand.
74 | 55 | "ten" | Ø | Ø | tʰ | tsʰ | Ø | Ø | Ø | tsʰ
42 | 53 | "ten" | tɕʰ | tsʰ | Ø | Ø | Ø | tsʰ | sʰ | Ø
42 | 421 | "salt" | tɕʰ | tsʰ | tʰ | tsʰ | tsʰ | tsʰ | sʰ | cʰ
42 | 129 | "twenty" | Ø | tsʰ | tʰ | tsʰ | tsʰ | tsʰ | sʰ | Ø
83 | 287 | "hair" | Ø | tsʰ | tsʰ | tsʰ | tsʰ | tsʰ | sʰ | Ø

Clique | Cogn. | Concept | Achang | Atsi | Bola | Lashi | Maru | Old B. | Rang. | Xiand.
17 | 639 | "above" | Ø | tʰ | tʰ | Ø | tʰ | tʰ | tʰ | Ø
17 | 472 | "sing" | Ø | tʰ | tʰ | Ø | tʰ | Ø | Ø | Ø
17 | 61 | "that" | tʰ | Ø | Ø | tʰ | tʰ | tʰ | tʰ | tʰ
17 | 323 | "sharp" | tʰ | tʰ | tʰ | tʰ | tʰ | tʰ | tʰ | Ø
17 | 66 | "there" | tʰ | Ø | tʰ | tʰ | tʰ | Ø | tʰ | tʰ
17 | 547 | "firewood" | tʰ | tʰ | tʰ | tʰ | tʰ | tʰ | tʰ | tʰ
17 | 74 | "thick" | Ø | tʰ | tʰ | tʰ | tʰ | Ø | tʰ | Ø

Summary on Results

• 104 initial consonant patterns (48 with more than one reflex, the rest highly irregular)
• many patterns correspond to well-known proto-sounds in the data
• some cliques are unintuitive

Summary on Problems

Irregular patterns:
• result in part from problems in cognate coding (homology assessment)
• result in part from sparseness of data
• result in part from the exactness of the algorithm

Unintuitive patterns:
• result in part from the greediness of the algorithm

Possible Improvements

• counteract the greediness of the algorithm by adding a secondary check of cliques (calculate the consensus, re-assign all alignment sites to all compatible consensus sequences, count the instances)
• allow for non-perfect cliques in which compatibility is allowed to deviate to a certain degree (e.g., one irregular cell)
• provide a more fine-grained checking of proposed cliques by counting columns suffering from missing data
• allow for a direct checking and correcting of patterns by the experts
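The first proposed improvement (consensus, re-assignment, counting) can be sketched as follows. This is an illustration under stated assumptions, not an implementation from the talk: the function names `consensus` and `recount` are hypothetical, and the input uses the rows of cliques 42 and 83 from the clique tables shown in the talk.

```python
MISSING = "Ø"


def compatible(site_a, site_b):
    """At least one shared attested value and no conflicting values."""
    attested = [(a, b) for a, b in zip(site_a, site_b)
                if MISSING not in (a, b)]
    return bool(attested) and all(a == b for a, b in attested)


def consensus(sites):
    """Column-wise consensus of a clique: first attested value per language."""
    return [next((v for v in column if v != MISSING), MISSING)
            for column in zip(*sites)]


def recount(cliques):
    """Re-assign every site to all compatible consensus patterns and count."""
    patterns = {name: consensus(members) for name, members in cliques.items()}
    counts = dict.fromkeys(patterns, 0)
    for members in cliques.values():
        for site in members:
            for name, pattern in patterns.items():
                if compatible(site, pattern):
                    counts[name] += 1
    return patterns, counts


# Languages: Achang, Atsi, Bola, Lashi, Maru, Old B., Rang., Xiand.
cliques = {
    "42": [
        ["tɕʰ", "tsʰ", "Ø", "Ø", "Ø", "tsʰ", "sʰ", "Ø"],        # "ten" (53)
        ["tɕʰ", "tsʰ", "tʰ", "tsʰ", "tsʰ", "tsʰ", "sʰ", "cʰ"],  # "salt"
        ["Ø", "tsʰ", "tʰ", "tsʰ", "tsʰ", "tsʰ", "sʰ", "Ø"],     # "twenty"
    ],
    "83": [
        ["Ø", "tsʰ", "tsʰ", "tsʰ", "tsʰ", "tsʰ", "sʰ", "Ø"],    # "hair"
    ],
}

patterns, counts = recount(cliques)
print(counts)
```

In this sample, the site of "ten" (cognate set 53) is compatible with the consensus of both cliques, so a greedy partition has to place it in one of them more or less arbitrarily; counting all compatible assignments makes such cases visible.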


Outlook

• the proposed inference of correspondence patterns is a first attempt to account for systemic aspects of sound change in a rigorous manner
• in contrast to many approaches proposed so far, it does not require family trees in any form: networks are enough, but the patterns inferred can be used to study tree-like aspects of evolution (Chacon and List 2015)
• the algorithm is surely a good start, but it needs to be improved in several ways

Thank you for your attention!
