Using networks to infer sound correspondence patterns across multiple languages

Talk held at the Symposium on Networks and Evolution (Université Pierre et Marie Curie, 2017-10-24, Paris).

Johann-Mattis List

October 24, 2017

Transcript

  1. Using networks to infer sound correspondence patterns across multiple languages

    Johann-Mattis List, Research Group “Computer-Assisted Language Comparison”, Department of Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Jena, Germany. 2017-10-24
  2. Comparative Linguistics

    "All languages change, as long as they exist." (August Schleicher 1863)

    [Figure: one word traced segment by segment through four stages.]

    Indo-European   Germanic   Old English   English
    p               f          f             f
    ə               a          æ             ɑː
    t               d          d             ð
    eː              eː         e             ə
    r               r          r             r

    [Further figure labels: walkman, iPod; Germanic, German, English; L₁, L₂, L₃.]
  11. Comparative Linguistics / Computational Linguistics / Classical vs. Computational Language Comparison

    [Figure: the comparative method and computational historical linguistics are compared on four criteria: accuracy, flexibility, consistency, efficiency. Ratings shown: lacks efficiency, lacks consistency, lacks accuracy, lacks flexibility; high efficiency, high consistency, high flexibility, high accuracy. Further labels: LC, CA.]
  14. Comparative Linguistics / CALC / Computer-Assisted Language Comparison

    [Figure: the same comparison of the comparative method and computational historical linguistics (criteria: accuracy, flexibility, consistency, efficiency; each rated "lacks" or "high"). Further labels: LC, CA.]
  15. Historical Language Comparison / Sequences in Biology and Linguistics / Alphabets in Biology and Linguistics

    • biology: universal; linguistics: language-specific
    • biology: limited; linguistics: widely varying
    • biology: constant; linguistics: mutable
  18. Historical Language Comparison / Sound Correspondences / Inferring Correspondences

    Sound   Bola   Maru   Rangoon
    d       Ø      Ø      d
    t       t      t      t
    tʰ      tʰ     tʰ     tʰ
    tθ      Ø      Ø      tθ
    ts      ts     ts     Ø
    tsʰ     tsʰ    tsʰ    Ø
    tʃ      tʃ     tʃ     Ø
    tʃʰ     tʃʰ    tʃʰ    Ø
    s       s      s      s
    sʰ      Ø      Ø      sʰ
    ɕ       Ø      Ø      ɕ
    ʃ       ʃ      ʃ      Ø
  26. Historical Language Comparison / Homolog Detection / Inferring Homologs

    [Panels: Cognate List, Alignment, Correspondences]

    Language   Concept   Segments
    Bola       six       kʰ j a u ʔ ⁵⁵
    Maru       six       kʰ j a u k ⁵⁵
    Bola       lip       k a̰ u ʔ ⁵⁵
    Maru       lip       k a̰ u k ⁵⁵
    Bola       man       j a u ʔ ³¹
    Maru       man       j a u k ³¹

    Bola    Maru    Freq.
    a(a̰)    a(a̰)    3 x
    u       u       3 x
    ʔ       k       3 x
    j       j       2 x
    k(ʰ)    k(ʰ)    2 x
    ⁵⁵      ⁵⁵      2 x
    ³¹      ³¹      1 x
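The step from aligned cognates to a frequency table of correspondences can be sketched in Python. This is a minimal illustration, not the software used in the talk: the names `WORDS` and `count_correspondences` are assumptions, and the three word pairs are taken as already aligned, position by position. Unlike the slide, the sketch does not group a with a̰ or k with kʰ, so those pairs are counted separately.

```python
from collections import Counter

# Segmented forms copied from the slide: concept -> (Bola, Maru).
# Each pair is assumed to be aligned position by position.
WORDS = {
    "six": ("kʰ j a u ʔ ⁵⁵", "kʰ j a u k ⁵⁵"),
    "lip": ("k a̰ u ʔ ⁵⁵", "k a̰ u k ⁵⁵"),
    "man": ("j a u ʔ ³¹", "j a u k ³¹"),
}


def count_correspondences(words):
    """Count how often each (Bola, Maru) sound pair shares an alignment column."""
    counts = Counter()
    for bola, maru in words.values():
        bola_segments, maru_segments = bola.split(), maru.split()
        if len(bola_segments) != len(maru_segments):
            raise ValueError("forms must be aligned to equal length")
        counts.update(zip(bola_segments, maru_segments))
    return counts


counts = count_correspondences(WORDS)
for (bola, maru), freq in counts.most_common():
    print(bola, maru, freq)
```

The only pair in which the two languages differ, Bola ʔ against Maru k, recurs in all three cognate sets, which is what makes it a candidate for a regular correspondence.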
  27. Historical Language Comparison / Correspondence Patterns / Inferring Patterns

    [The sound inventory table for Bola, Maru, and Rangoon from slide 18 is shown again; one correspondence pattern at a time is highlighted and illustrated with a cognate set.]

    Concept   Bola         Maru         Rangoon      Initial sounds (Bola, Maru, Rangoon)
    "salt"    tʰ a ³⁵      tsʰ ɔ ³⁵     sʰ ɑ ⁵⁵      tʰ, tsʰ, sʰ
    "tooth"   t u i ⁵⁵     ts ɔ i ³¹    tθ w a ⁵⁵    t, ts, tθ
    "sharp"   tʰ a ʔ ⁵⁵    tʰ ɔ ʔ ⁵⁵    tʰ ɛ ʔ ⁴     tʰ, tʰ, tʰ
    "wing"    t a u ŋ ⁵⁵   t a u ŋ ³¹   t ɑ u ∼ ²²   t, t, t
  35. Inferring Correspondence Patterns / Sound Correspondence Patterns / Sound Correspondence Patterns

    Columns: PIE, Hittite, Sanskrit, Avestan, Greek, Latin, Gothic, Old Church Slavonic, Lithuanian, Old Irish, Armenian, Tocharian. Each row lists the reflexes in column order; a language can have more than one reflex.

    *p     p p p f p p f b p p Ø h w Ø p
    *b     b p b bβ b b p b b b p p
    *bʰ    b p bʱ/bh bβ pʰ/ph f b b b b b b p
    *t     t t t θ t t θ/þ d t t t tʼ j/y t tʃ/c
    *d     d t d d ð d d t d d d t ts ʃ/ś
    *dʰ    d t dʰ/dh h d ð tʰ/th f d b d d d d t t tʃ/c
    ...
    *kʷ    kʷ/ku k c k c k p t kʷ/qu hʷ/hw g k tʃ/č k c kʼ tʃʼ/čʼ k ʃʲ/ś
    *gʷ    kʷ/u g j g j g b d gʷ/gu u q g ʒ/ž z g b k k ś
    *gʷʰ   kʷ/ku gʷ/gu gʱ/gh h g j pʰ/ph tʰ/th kʰ/kh f gʷ/gu u g b g ʒ/ž z g g g dʒ/ǰ k ʃʲ/ś

    Source: Clackson (2007: 37)
  36. Inferring Correspondence Patterns / Sound Correspondence Patterns / Sound Correspondence Patterns

    • Correspondence patterns in linguistics are a way to encode mappings across several different alphabets.
    • They are usually inferred manually, by inspecting “correspondence sets” (Clackson 2007: 29f) of words (i.e., cognate sets with recurring sounds).
    • The main problem of correspondence pattern identification is the handling of missing data, since not all cognate sets will necessarily contain reflexes from each of the languages under investigation.
  39. Inferring Correspondence Patterns / Inference of Correspondence Patterns / Inference of Correspondence Patterns

    • The main idea of the correspondence pattern inference algorithm is to derive a graph from correspondence sets in which each individual correspondence set (a site in an aligned cognate set) is a node, and links are drawn between compatible correspondence sets.
    • Two correspondence sets are compatible if they have identical non-missing values for at least one language and no conflicting values for any language.
    • If two or more correspondence sets are compatible, we can impute missing values by combining them.
  42. Inferring Correspondence Patterns / Inference of Correspondence Patterns / Compatibility of Alignment Sites

    Cognate Set   L1   L2   L3   L4   L5   L6   L7   L8
    “hand-1”      p    p    p    Ø    f    f    Ø    p
    “foot-1”      p    p    p    p    f    f    p    p
    ⊠ compatible □ incompatible

    Cognate Set   L1   L2   L3   L4   L5   L6   L7   L8
    “hand-1”      p    p    p    Ø    f    f    Ø    p
    “leg-1”       p    p    f    pf   f    f    p    p
    □ compatible ⊠ incompatible
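The compatibility test and the imputation step can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation used in the talk: the names `MISSING`, `compatible`, and `impute` are chosen for illustration, and the data are the three alignment sites shown on the slide.

```python
MISSING = "Ø"  # symbol used on the slides for missing data

# Alignment sites from the slide, one value per language L1..L8.
hand = "p p p Ø f f Ø p".split()
foot = "p p p p f f p p".split()
leg = "p p f pf f f p p".split()


def compatible(site_a, site_b):
    """True if the sites agree in at least one language and conflict in none."""
    shared = 0
    for a, b in zip(site_a, site_b):
        if MISSING in (a, b):
            continue  # missing data can neither support nor contradict
        if a != b:
            return False  # conflicting values for the same language
        shared += 1
    return shared > 0


def impute(site_a, site_b):
    """Fill the gaps of site_a with the values of a compatible site_b."""
    if not compatible(site_a, site_b):
        raise ValueError("cannot combine incompatible sites")
    return [b if a == MISSING else a for a, b in zip(site_a, site_b)]


print(compatible(hand, foot))  # hand-1 and foot-1 never conflict
print(compatible(hand, leg))   # hand-1 and leg-1 conflict in L3 (p vs. f)
print(impute(hand, foot))      # L4 and L7 of hand-1 are filled from foot-1
```

Skipping every column that contains a missing value is what lets sparse cognate sets join a pattern at all; the price is that two very sparse sites can be compatible on the strength of a single shared language.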
  44. Inferring Correspondence Patterns / Inference of Correspondence Patterns / Compatibility Graphs

    • 8 Burmish languages (spoken in China and Myanmar, taken from Hill and List 2017)
    • 240 concepts
    • 855 partial cognate sets
    • 728 cross-semantic partial cognate sets (covering one or more concepts)
    • 218 valid cognate sets (residues in more than one language)
  48. Inferring Correspondence Patterns / Inference of Correspondence Patterns / Compatibility Graphs

    A "gums"
    Achang    ʂ    -    u    a    ³¹
    Rangoon   tθ   w    -    ɑ    ⁵⁵
    Atsi      Ø    Ø    Ø    Ø    Ø
    Bola      Ø    Ø    Ø    Ø    Ø
    Maru      Ø    Ø    Ø    Ø    Ø

    B "die"
    Achang    Ø    Ø    Ø    Ø
    Rangoon   tθ   e    -    ²²
    Atsi      Ø    Ø    Ø    Ø
    Bola      ʃ    ɿ    -    ⁵⁵
    Maru      ʃ    i    k    ³¹

    C "daughter"
    Achang    Ø    Ø    Ø
    Rangoon   tθ   ɑ    ⁵³
    Atsi      Ø    Ø    Ø
    Bola      Ø    Ø    Ø
    Maru      ts   o    ⁰

    Pairwise comparison of the initial sounds:

    Language   A    B      A    C      B    C
    Achang     ʂ    Ø      ʂ    Ø      Ø    Ø
    Rangoon    tθ   tθ     tθ   tθ     tθ   tθ
    Atsi       Ø    Ø      Ø    Ø      Ø    Ø
    Bola       Ø    ʃ      Ø    Ø      ʃ    Ø
    Maru       Ø    ʃ      Ø    ts     ʃ    ts

    Legend: identical, compatible, incompatible. [Figure: graph with the nodes A, B, C.]
  49. Inferring Correspondence Patterns / Inference of Correspondence Patterns / Compatibility Graphs

    [Figure: compatibility graph of the correspondence sets, shown in two layouts; nodes are labeled with sounds such as pʰ, tʰ, tsʰ, tʃʰ, ŋ, ʃ, s, ts, t, p, m, l, k, kʰ, x, n. A further view, with nodes labeled x, contrasts a good correspondence set with a bad correspondence set.]
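How a compatibility graph is built from alignment sites can be sketched with the A, B, C example ("gums", "die", "daughter"). This is an illustrative sketch, not the talk's implementation: `SITES`, `compatible`, and `compatibility_graph` are assumed names, and only the initial sound of each cognate set is used.

```python
from itertools import combinations

MISSING = "Ø"

# Initial sounds of the three cognate sets, in the language order
# Achang, Rangoon, Atsi, Bola, Maru.
SITES = {
    "A": ["ʂ", "tθ", "Ø", "Ø", "Ø"],   # "gums"
    "B": ["Ø", "tθ", "Ø", "ʃ", "ʃ"],   # "die"
    "C": ["Ø", "tθ", "Ø", "Ø", "ts"],  # "daughter"
}


def compatible(site_a, site_b):
    """True if the sites agree in at least one language and conflict in none."""
    shared = 0
    for a, b in zip(site_a, site_b):
        if MISSING in (a, b):
            continue
        if a != b:
            return False
        shared += 1
    return shared > 0


def compatibility_graph(sites):
    """Return an adjacency mapping that links every pair of compatible sites."""
    graph = {name: set() for name in sites}
    for a, b in combinations(sites, 2):
        if compatible(sites[a], sites[b]):
            graph[a].add(b)
            graph[b].add(a)
    return graph


graph = compatibility_graph(SITES)
print(graph)
```

A is linked to both B and C through the shared Rangoon tθ, but B and C stay unlinked because Maru has ʃ in one and ts in the other. Compatibility is therefore not transitive, which is why connected components are not enough to identify patterns.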
  52. Inferring Correspondence Patterns / Inference of Correspondence Patterns / Compatibility Graphs

    Only fully compatible clusters (i.e., only cliques in our network of correspondence sets) can represent true sound correspondence patterns (if sound change is regular).
  53. Inferring Correspondence Patterns / Inference of Correspondence Patterns / Correspondence Pattern Inference as Clique Cover Problem

    • The clique cover problem (also called clique partitioning problem, see Bhasker 1991) is the inverse of the famous graph coloring problem (a clique cover of a graph is a proper coloring of its complement graph) and has been shown to be NP-hard.
    • The goal of the problem is to split a graph into the smallest number of cliques such that each node belongs to exactly one clique.
    • We assume (but cannot formally prove) that the minimum clique cover of our graph of compatible correspondence sets corresponds to the optimal set of sound correspondence patterns in our data.
    • By applying an approximation algorithm to infer a near-optimal clique cover of our data of aligned cognate sets, we can infer the most frequently recurring correspondence patterns in our data.
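The talk does not say which approximation algorithm is used, so the following is only a generic greedy heuristic for partitioning a graph into cliques, given as a sketch. The function name `greedy_clique_cover` and the visiting order (fewest neighbours first) are assumptions made for illustration; the input is the small A, B, C graph from the compatibility example, written as an adjacency mapping.

```python
def greedy_clique_cover(graph):
    """Partition the nodes of an undirected graph into cliques, greedily.

    `graph` maps each node to the set of its neighbours. Nodes are visited
    from fewest to most neighbours (an assumed heuristic) and each node joins
    the first existing clique whose members are all its neighbours.
    """
    cliques = []
    for node in sorted(graph, key=lambda n: (len(graph[n]), n)):
        for clique in cliques:
            if clique <= graph[node]:  # every member is adjacent to node
                clique.add(node)
                break
        else:
            cliques.append({node})  # no fitting clique: open a new one
    return cliques


# A is compatible with B and with C, but B and C conflict.
GRAPH = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}

cover = greedy_clique_cover(GRAPH)
print(cover)
```

On this graph the heuristic places A together with B although A would fit C equally well. The result depends on the visiting order, which is the kind of greediness discussed later in the talk.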
  57. Inferring Correspondence Patterns / Inference of Correspondence Patterns / Graph with Clique Cover

    [Figure: the compatibility graph before and after computing the clique cover. Three highlighted clusters, with nodes labeled tʃʰ, tsʰ, and tʰ, correspond to the three tables below.]

    Clique   Cogn.   Concept      Achang   Atsi   Bola   Lashi   Maru   Old B.   Rang.   Xiand.
    41       659     "goat"       Ø        Ø      Ø      Ø       tʃʰ    tsʰ      sʰ      Ø
    41       672     "armpit"     Ø        Ø      tʃʰ    tʃʰ     tʃʰ    Ø        Ø       Ø
    41       433     "rice"       tsʰ      tʃʰ    tʃʰ    tʃʰ     tʃʰ    Ø        sʰ      tsʰ

    Clique   Cogn.   Concept      Achang   Atsi   Bola   Lashi   Maru   Old B.   Rang.   Xiand.
    74       55      "ten"        Ø        Ø      tʰ     tsʰ     Ø      Ø        Ø       tsʰ
    42       53      "ten"        tɕʰ      tsʰ    Ø      Ø       Ø      tsʰ      sʰ      Ø
    42       421     "salt"       tɕʰ      tsʰ    tʰ     tsʰ     tsʰ    tsʰ      sʰ      cʰ
    42       129     "twenty"     Ø        tsʰ    tʰ     tsʰ     tsʰ    tsʰ      sʰ      Ø
    83       287     "hair"       Ø        tsʰ    tsʰ    tsʰ     tsʰ    tsʰ      sʰ      Ø

    Clique   Cogn.   Concept      Achang   Atsi   Bola   Lashi   Maru   Old B.   Rang.   Xiand.
    17       639     "above"      Ø        tʰ     tʰ     Ø       tʰ     tʰ       tʰ      Ø
    17       472     "sing"       Ø        tʰ     tʰ     Ø       tʰ     Ø        Ø       Ø
    17       61      "that"       tʰ       Ø      Ø      tʰ      tʰ     tʰ       tʰ      tʰ
    17       323     "sharp"      tʰ       tʰ     tʰ     tʰ      tʰ     tʰ       tʰ      Ø
    17       66      "there"      tʰ       Ø      tʰ     tʰ      tʰ     Ø        tʰ      tʰ
    17       547     "firewood"   tʰ       tʰ     tʰ     tʰ      tʰ     tʰ       tʰ      tʰ
    17       74      "thick"      Ø        tʰ     tʰ     tʰ      tʰ     Ø        tʰ      Ø
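The tables suggest why "ten" (cognate set 55) and "hair" (287) end up outside clique 42. The sketch below merges the three members of clique 42 into one consensus pattern and then lists the languages in which another site contradicts it. The helper names `consensus` and `conflicts` are assumptions for illustration; the data are copied from the tables above.

```python
MISSING = "Ø"
LANGUAGES = ["Achang", "Atsi", "Bola", "Lashi", "Maru", "Old B.", "Rang.", "Xiand."]

# Members of clique 42, keyed by cognate set ID.
CLIQUE_42 = {
    53: "tɕʰ tsʰ Ø Ø Ø tsʰ sʰ Ø".split(),          # "ten"
    421: "tɕʰ tsʰ tʰ tsʰ tsʰ tsʰ sʰ cʰ".split(),   # "salt"
    129: "Ø tsʰ tʰ tsʰ tsʰ tsʰ sʰ Ø".split(),      # "twenty"
}
TEN_55 = "Ø Ø tʰ tsʰ Ø Ø Ø tsʰ".split()            # clique 74
HAIR_287 = "Ø tsʰ tsʰ tsʰ tsʰ tsʰ sʰ Ø".split()    # clique 83


def consensus(sites):
    """Merge mutually compatible sites, filling gaps from the other members."""
    pattern = [MISSING] * len(LANGUAGES)
    for site in sites:
        for i, sound in enumerate(site):
            if sound == MISSING:
                continue
            if pattern[i] not in (MISSING, sound):
                raise ValueError(f"conflicting values for {LANGUAGES[i]}")
            pattern[i] = sound
    return pattern


def conflicts(pattern, site):
    """Languages in which a site contradicts a consensus pattern."""
    return [
        LANGUAGES[i]
        for i, (p, s) in enumerate(zip(pattern, site))
        if MISSING not in (p, s) and p != s
    ]


pattern_42 = consensus(CLIQUE_42.values())
print(pattern_42)
print(conflicts(pattern_42, TEN_55))    # a single conflicting cell
print(conflicts(pattern_42, HAIR_287))  # a single conflicting cell
```

Each of the two outsiders is kept out by exactly one cell (Xiandao for "ten", Bola for "hair"), so a strict compatibility criterion splits what looks like one pattern into three cliques.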
  62. Inferring Correspondence Patterns / Inference of Correspondence Patterns / Summary on Results

    • 104 initial consonant patterns (48 with more than one reflex, the rest highly irregular)
    • many patterns correspond to well-known proto-sounds in the data
    • some cliques are unintuitive
  65. Inferring Correspondence Patterns / Inference of Correspondence Patterns / Summary on Problems

    Irregular patterns:
    • result in part from problems in cognate coding (homology assessment)
    • result in part from sparseness of data
    • result in part from the exactness of the algorithm

    Unintuitive patterns:
    • result in part from the greediness of the algorithm
  70. Inferring Correspondence Patterns / Inference of Correspondence Patterns / Possible Improvements

    • counter the greediness of the algorithm by adding a secondary check of cliques (calculate the consensus, re-assign all alignment sites to all compatible consensus sequences, count the instances)
    • allow for non-perfect cliques in which compatibility is allowed to deviate to a certain degree (e.g., one irregular cell)
    • provide a more fine-grained check of proposed cliques by counting columns suffering from missing data
    • allow experts to check and correct patterns directly
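The first improvement (consensus, re-assignment, counting) can be sketched as follows. This is one possible reading of the proposal, not an existing implementation: `consensus`, `recount`, and the four-language toy data in `CLIQUES` are hypothetical and chosen so that one site fits two patterns.

```python
MISSING = "Ø"


def compatible(site_a, site_b):
    """True if the sites agree in at least one language and conflict in none."""
    shared = 0
    for a, b in zip(site_a, site_b):
        if MISSING in (a, b):
            continue
        if a != b:
            return False
        shared += 1
    return shared > 0


def consensus(sites):
    """Merge the sites of one clique into a single pattern."""
    pattern = [MISSING] * len(sites[0])
    for site in sites:
        for i, sound in enumerate(site):
            if sound == MISSING:
                continue
            if pattern[i] not in (MISSING, sound):
                raise ValueError(f"conflict in column {i}")
            pattern[i] = sound
    return pattern


def recount(cliques):
    """Re-assign every site to all compatible consensus patterns and count."""
    all_sites = [site for members in cliques.values() for site in members]
    patterns = {name: consensus(members) for name, members in cliques.items()}
    counts = {
        name: sum(compatible(pattern, site) for site in all_sites)
        for name, pattern in patterns.items()
    }
    return patterns, counts


# Hypothetical greedy result: the site "p Ø p Ø" was put into clique 1,
# although it is also compatible with the members of clique 2.
CLIQUES = {
    1: ["p p Ø f".split(), "p Ø p f".split(), "p Ø p Ø".split()],
    2: ["p b Ø Ø".split(), "Ø b p v".split()],
}

patterns, counts = recount(CLIQUES)
print(patterns)
print(counts)
```

After the recount both patterns are supported by three sites, whereas the greedy partition reported three and two. Counting a site for every pattern it fits removes the dependence on the order in which the greedy pass happened to visit the sites.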
  74. Inferring Correspondence Patterns / Inference of Correspondence Patterns / Possible Improvements

    [The three clique tables from the “Graph with Clique Cover” slide (cliques 41; 74, 42, and 83; 17) are shown again, together with the three highlighted clusters labeled tʃʰ, tsʰ, and tʰ.]
  75. Inferring Correspondence Patterns / Inference of Correspondence Patterns / Outlook

    • The proposed inference of correspondence patterns is a first attempt to account for systemic aspects of sound change in a rigorous manner.
    • In contrast to many approaches proposed so far, it does not require family trees in any form; networks are enough. The inferred patterns can nevertheless be used to study tree-like aspects of evolution (Chacon and List 2015).
    • The algorithm is a good start, but it needs to be improved in several ways.