Computer-Assisted Approaches to Linguistic Reconstruction

Talk, held at the "Workshop on the regularity of sound change" (2017/07/20-21, Cologne, University of Cologne)

Johann-Mattis List

July 21, 2017

Transcript

  1. Computer-Assisted Approaches to Linguistic Reconstruction: A Case Study from the Burmish Languages

    Johann-Mattis List¹ and Nathan W. Hill², 2017-07-21
    ¹ Max Planck Institute for the Science of Human History, ² SOAS, University of London
  4. Etymology 2.0? Historical linguistics after the quantitative turn

    • Quantitative methods in historical linguistics have received much attention of late,
    • but only a few (if any) of the new methods have addressed long-standing problems of classical linguistics,
    • and as a result, many classical linguists are very sceptical of the new approaches.
  6. Etymology 3.0? Towards a “qualitative turn” in computational historical linguistics

    • Instead of blaming computers for our misery (no funding, institutes being shut down, etc.), we should start seeing computers as a chance to address the important questions which we have not solved in 200 years of research...
    • But we don’t need a framework in which computers do our work for us. Instead, we need a framework in which we tell computers to do some work for us, in order to render our research more explicit, more efficient, and more rigorous.
  8. Computer-Assisted Language Comparison

    CALC (MPI-SHH, Jena)
    • computational formalization of the classical methods for historical language comparison
    • establish a close collaboration between computational and classical historical linguistics by providing data in human- and machine-readable form
    ASIA (SOAS, London)
    • reconstruction of Proto-Burmish
  11. The Burmish Etymological Database (BED)

    Problem: current etymological accounts of Proto-Burmish have many problems (no lexical reconstruction, insufficient phonological reconstruction, unclear data, non-transparent methodology)
    Goal: compile an etymological database of Proto-Burmish
    Procedure: BED as litmus test for CALC:
    • make the Proto-Burmish Etymological Database project a first test for the CALC framework
    • use existing computational methods to pre-analyze the data
    • develop interfaces to allow for correction and inspection by the experts
  16. Where are you now with BED?

    ⊠ Develop computer-assisted workflows to create and curate data in human- and machine-readable form.
    ⊠ Develop a computer-assisted workflow for partial cognate detection and alignments.
    ⊟ Develop methods for automatic phonological reconstruction and workflows for the correction of the results by experts (→ THIS TALK).
    □ Develop methods for lexical reconstruction (first ideas, not shown in this talk).
    □ Write a data-driven etymological dictionary (table of contents is half-written).
  19. What is phonological reconstruction?

    • Phonological reconstruction is primarily understood as the reconstruction of the sound system of a language not reflected in written sources.
    • More specifically, however, we see phonological reconstruction as the task of reconstructing major patterns of sound change which allow us to reconstruct tentative proto-forms from cognate sets, regardless of whether those words were really present in the Ursprache or what those forms meant.
    • The task of lexical reconstruction follows phonological reconstruction in projecting full lexemes to the Ursprache, thereby also assessing their meaning, and whether it is reasonable to reconstruct them at all.
  23. Classical Workflow (Comparative Method)

    • assemble cognate sets and sound correspondences by comparing data on different languages
    • infer sound change processes (“sound laws”) from the inferred sound correspondence patterns
    • explain exceptions by:
      • refining inferred sound change processes (cf. “Verner’s law”)
      • borrowing (“substratum influence”)
      • analogy (leftovers)
  26. Computer-Based Automatic Approaches

    Problems of computer-based approaches:
    (a) they fail to model sound change as a systemic process (each column of an alignment is counted independently)
    (b) they fail to make use of linguistic knowledge on the directionality of sound change processes and have to rely on phylogenies
    (c) they fail to handle unattested sounds, as only sounds which are in the data can be reconstructed
  31. General Workflow

    Preliminary steps:
    • partial cognate detection and partial phonetic alignment (List et al. 2016) with manual refinement
    • preliminary identification of cross-semantic cognates based on partial colexifications (Hill and List forthcoming)
    Phonological reconstruction:
    • sound correspondence pattern identification (List et al. in prep.)
    • automatic reconstruction using weighted directed networks (→ this talk)
  32. Detailed Workflow: Preliminary Steps

    [Figure, panels A–D: partial cognate detection illustrated with word forms from Fúzhōu, Měixiàn, Wēnzhōu, and Běijīng, showing pairwise scores between the morphemes ŋiat⁵, kuoŋ⁴⁴, ŋuoʔ⁵, ȵy²¹, yɛ⁵¹, kuɔ³⁵, liɑŋ¹, and vai¹³.]
    Partial cognate detection: List, Lopez, and Bapteste (2016)
  33. Detailed Workflow: Preliminary Steps

    Morpheme Annotation (Hill and List forthc.); each form is followed by its morpheme glosses:
    • Atsi: 'mountain' pum⁵¹ (mountain); 'dog' kʰui²¹ (dog); 'thunder' mau²¹ mjiŋ⁵¹ (sky + thunder); 'wolf' vam⁵¹ kʰui²¹ mo⁵⁵ (bear + dog + m-suff.); 'bear (n.)' vam⁵¹ (bear)
    • Bola: 'mountain' pam⁵⁵ (mountain); 'dog' kʰui³⁵ (dog); 'thunder' mau³¹ mjaŋ⁵⁵ (sky + thunder); 'wolf' mjaŋ⁵⁵ kʰui³⁵ (thunder + dog); 'bear (n.)' vɛ⁵ ⁵⁵ (bear)
    • Lashi: 'mountain' pɔm³¹ (mountain); 'dog' kʰui⁵⁵ (dog); 'thunder' mou³³ kɔm³³ (sky + thunderB); 'wolf' wɔm³¹ kʰui⁵⁵ (bear + dog); 'bear (n.)' wɔm³¹ (bear)
    • Maru: 'mountain' pam³¹ (mountain); 'dog' lə¹ ³¹ kʰa³⁵ (? + dog); 'thunder' muk⁵⁵ kum³¹ (sky + thunderB); 'wolf' mjaŋ³¹ kʰa³⁵ (thunder + dog); 'bear (n.)' vɛ⁵ ³¹ (bear)
    • Achang: 'mountain' pum⁵⁵ (mountain); 'dog' xui³¹ (dog); 'thunder' mau³¹ ʐau³¹ (sky + thunderC); 'wolf' pum⁵⁵ xui³¹ (mountain + dog); 'bear (n.)' ɔm⁵⁵ (bear)
  36. Detailed Workflow: Sound Correspondence Pattern Inference

    Sound Correspondence Patterns and Phonological Reconstruction:
    • the most traditional way to reconstruct in the comparative-method framework is to infer patterns of regular sound correspondences across a set of languages and then assign proto-forms for each distinct pattern
    • correspondence patterns are usually inferred manually, by inspecting “correspondence sets” (Clackson 2007: 29f) of words (i.e., cognate sets with recurring sounds)
    • the main problem of correspondence pattern identification is the handling of missing data, since not all cognate sets will necessarily contain reflexes from each of the languages under investigation
  39. Detailed Workflow: Sound Correspondence Pattern Inference

    Graphs of Compatible Correspondence Sets:
    • the main idea for the correspondence pattern inference algorithm is to derive a graph from correspondence sets in which each individual correspondence set (a site in an aligned cognate set) is a node, and links between nodes are drawn between compatible correspondence sets
    • if two correspondence sets are compatible, this means that they have identical non-missing values for at least one language and no conflicting data for any of the languages
    • if two or more correspondence sets are compatible, we can impute missing values by combining them
  41. Detailed Workflow: Sound Correspondence Pattern Inference

    Example 1 (∅ marks a missing reflex): ⊠ compatible, □ incompatible
    Cognate Set: L1 L2 L3 L4 L5 L6 L7 L8
    “hand-1”: p p p ∅ f f ∅ p
    “foot-1”: p p p p f f p p

    Example 2: □ compatible, ⊠ incompatible
    Cognate Set: L1 L2 L3 L4 L5 L6 L7 L8
    “hand-1”: p p p ∅ f f ∅ p
    “leg-1”: p p f pf f f p p
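The compatibility criterion and the imputation of missing values can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `compatible` and `merge` are hypothetical, and missing reflexes are represented as `None`.

```python
from typing import List, Optional, Sequence

# Assumption for this sketch: None marks a language without a reflex.
Pattern = Sequence[Optional[str]]


def compatible(a: Pattern, b: Pattern) -> bool:
    """True if a and b agree on at least one attested language
    and conflict on none."""
    shared = 0
    for x, y in zip(a, b):
        if x is None or y is None:
            continue  # missing data never causes a conflict
        if x != y:
            return False  # conflicting sounds in the same language
        shared += 1
    return shared > 0


def merge(a: Pattern, b: Pattern) -> List[Optional[str]]:
    """Combine two compatible sets, imputing gaps in one from the other."""
    if not compatible(a, b):
        raise ValueError("cannot merge incompatible correspondence sets")
    return [x if x is not None else y for x, y in zip(a, b)]


# The three correspondence sets from the slide (languages L1 to L8).
hand = ["p", "p", "p", None, "f", "f", None, "p"]
foot = ["p", "p", "p", "p", "f", "f", "p", "p"]
leg = ["p", "p", "f", "pf", "f", "f", "p", "p"]

print(compatible(hand, foot))  # hand-1 and foot-1 never conflict
print(compatible(hand, leg))   # L3 has p in hand-1 but f in leg-1
print(merge(hand, foot))       # gaps in hand-1 filled from foot-1
```

Merging “hand-1” with “foot-1” fills the two gaps in L4 and L7, which is the imputation step described on the previous slide.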
  42. Detailed Workflow: Sound Correspondence Pattern Inference

    [Figure: network of correspondence sets; each node is labeled with the sounds of one correspondence set (e.g., s, k, x, ʃ, tʃ, ts, t, kʰ).]
  43. Detailed Workflow: Sound Correspondence Pattern Inference

    [Figure: network detail contrasting a good correspondence set with a bad correspondence set.]
  44. Detailed Workflow: Sound Correspondence Pattern Inference

    Only fully compatible clusters (i.e., only cliques in our network of correspondence sets) can represent true sound correspondence patterns (if sound change is regular).
  48. Detailed Workflow: Sound Correspondence Pattern Inference

    Sound Correspondence Pattern Inference as a Clique Cover Problem:
    • The clique cover problem (also called clique partitioning problem, see Bhasker 1991) is the counterpart of the famous graph coloring problem (covering a graph with cliques is equivalent to coloring its complement graph) and has been shown to be NP-hard.
    • The goal of the problem is to split a graph into the smallest number of cliques in which each node is represented by exactly one clique.
    • We assume (but we cannot formally prove it) that the clique cover of our graph of compatible correspondence sets will correspond to the optimal set of sound correspondence patterns in our data.
    • By applying an approximation algorithm to infer a near-optimal clique cover of our data of aligned cognate sets, we can infer the most frequently recurring correspondence patterns in our data.
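The talk does not say which approximation algorithm is used, so the following is only a simple greedy heuristic that illustrates the idea of partitioning correspondence sets into cliques. The function names are hypothetical, and the cognate set “arm-1” is invented for the example.

```python
from typing import Dict, List, Optional, Sequence

Pattern = Sequence[Optional[str]]  # None marks a missing reflex (assumption)


def compatible(a: Pattern, b: Pattern) -> bool:
    """At least one shared attested value and no conflicting value."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    return bool(pairs) and all(x == y for x, y in pairs)


def greedy_clique_partition(sets: Dict[str, Pattern]) -> List[List[str]]:
    """Assign every correspondence set to exactly one clique.

    A set joins the first clique in which it is compatible with ALL
    members; otherwise it opens a new clique. The result is a valid
    clique partition, but not necessarily the smallest possible one.
    """
    cliques: List[List[str]] = []
    for name, pattern in sets.items():
        for clique in cliques:
            if all(compatible(pattern, sets[member]) for member in clique):
                clique.append(name)
                break
        else:  # no existing clique accepts this set
            cliques.append([name])
    return cliques


data = {
    "hand-1": ["p", "p", "p", None, "f", "f", None, "p"],
    "foot-1": ["p", "p", "p", "p", "f", "f", "p", "p"],
    "leg-1": ["p", "p", "f", "pf", "f", "f", "p", "p"],
    "arm-1": [None, "p", "f", "pf", None, "f", "p", None],  # invented
}

for clique in greedy_clique_partition(data):
    print(clique)
```

On this toy input the heuristic yields two cliques, one grouping “hand-1” with “foot-1” and one grouping “leg-1” with “arm-1”. The outcome of a greedy pass depends on the order in which the sets are visited.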
  52. Detailed Workflow: Automatic Reconstruction

    We can do without trees!
    • Phonological reconstruction in the comparative-method framework usually starts from correspondence patterns.
    • Apart from very few exceptions, it does not require the knowledge of any specific phylogeny for the language family under investigation (at least not for most consonants).
    • What it requires, however, is to know the major sound change transitions, which have strong directional preferences for consonants (much less for vowels and tones).
    • We can use this knowledge as a proxy to select which of the sounds in a given correspondence pattern is the best candidate for the proto-sound.
  53. Detailed Workflow: Automatic Reconstruction

    [Figure: sound-change network whose nodes are the vowels, nasals, fricatives, liquids, stops, and affricates attested in the data.]
  60. Detailed Workflow: Automatic Reconstruction

    Automatic Reconstruction Strategy:
    1. extract the sub-graph from the sound-change graph for each distinct sound in a given correspondence pattern,
    2. search for a potential source in the sub-graph, i.e., a sound that has no ancestor,
    3. if
      • there is a source, select it as proto-form,
      • there are multiple sources, select all of them as candidate proto-forms,
      • the graph is disconnected or no source can be found (loops in the graph), select the most frequently recurring form as a potential proto-form (“majority rules”),
    4. label the “quality” of the respective proto-form, specifically marking correspondence patterns which occur only one time in the data,
    5. have the expert clean up the mess.
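Steps 1 to 4 of this strategy can be sketched as follows. This is a rough illustration under several assumptions, not the authors' code: the toy graph `SOUND_CHANGES` is invented, edge weights are ignored, the sub-graph is taken to be the induced sub-graph over the attested sounds, and a disconnected sub-graph is sent to the majority rule before the sources are counted.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Invented toy graph: an edge a -> b means "a may change into b".
SOUND_CHANGES: Dict[str, Set[str]] = {
    "p": {"f", "pf"},
    "pf": {"f"},
    "k": {"x"},
    "s": {"ʃ"},
    "ʃ": {"s"},  # s <-> ʃ forms a loop, so neither sound is a source
}


def is_weakly_connected(sub: Dict[str, Set[str]]) -> bool:
    """Check connectivity while ignoring edge direction."""
    undirected = {s: set(ts) for s, ts in sub.items()}
    for s, ts in sub.items():
        for t in ts:
            undirected[t].add(s)
    start = next(iter(undirected))
    seen, stack = {start}, [start]
    while stack:
        for nxt in undirected[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return len(seen) == len(undirected)


def reconstruct(pattern: Sequence[Optional[str]],
                graph: Dict[str, Set[str]],
                occurrences: int = 1) -> Tuple[List[str], str]:
    """Return candidate proto-sounds and a quality label."""
    attested = [s for s in pattern if s is not None]
    if not attested:
        return [], "no data"
    sounds = set(attested)
    # Step 1: induced sub-graph over the sounds of this pattern.
    sub = {s: {t for t in graph.get(s, ()) if t in sounds and t != s}
           for s in sounds}
    # Step 2: sources are sounds without an incoming edge.
    targets = set().union(*sub.values())
    sources = sorted(sounds - targets)
    # Step 3: source(s) if available, otherwise "majority rules".
    if sources and is_weakly_connected(sub):
        candidates, label = sources, "source"
    else:
        candidates = [Counter(attested).most_common(1)[0][0]]
        label = "majority"
    # Step 4: flag patterns that occur only once in the data.
    if occurrences == 1:
        label += ", singleton pattern"
    return candidates, label


print(reconstruct(["p", "p", "p", None, "f", "f", None, "p"],
                  SOUND_CHANGES, occurrences=5))
print(reconstruct(["s", "ʃ", "s", "s"], SOUND_CHANGES, occurrences=3))
print(reconstruct(["k", "p", "k"], SOUND_CHANGES, occurrences=1))
```

In the first call `p` is the only source and is selected. The second call hits the loop between `s` and `ʃ`, and the third a disconnected sub-graph, so both fall back to the most frequent sound. Step 5, the expert review, remains manual.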
  63. Detailed Workflow: Automatic Reconstruction

    Advantages of the Approach:
    (a) ⊠ systemic aspects of sound change are integrated into the correspondence pattern detection algorithm
    (b) ⊠ linguistic knowledge (even language-specific knowledge) is exhaustively used to construct the sound-change networks
    (c) □ unattested sounds need to be manually handled by assigning them to specific correspondence patterns
  64. General Findings

    Basic Statistics:
    • 8 languages
    • 240 concepts
    • 855 partial cognate sets
    • 728 cross-semantic partial cognate sets
    • 218 valid cognate sets (with more than two reflexes)
    • 104 initial consonant patterns (48 with more than one reflex, the rest highly irregular)
    • well-reconstructed proto-sounds:
      • stops and affricates: *k, *kʰ, *t, *tʰ, *tʃ, *tʃʰ, *ts, *tsʰ, p, pʰ
      • fricatives: s, ʃ, x
      • liquids and j: r, l, j
      • nasals: ŋ, n, m
  65. Specific Findings: “black” and “dark”

    • Old Burmese: black n a k -, dark ∅
    • Rangoon: black n ɛ ʔ ⁴, dark ∅
    • Achang: black l ɔ k ⁵⁵, dark ∅
    • Xiandao: black n ɔ ʔ ⁵⁵, dark ∅
    • Atsi: black n o ʔ ²¹, dark n o ʔ ²¹
    • Bola: black n a ʔ ³¹, dark n a ʔ ³¹
    • Lashi: black n ɔː ʔ ³¹, dark ∅
    • Maru: black n ɔ ʔ ³¹, dark n ɔ ʔ ³¹
    • Proto-Burmish: black n *a k ³¹, dark *n *a|*ṵ ʔ|*k ³¹

    [i] The discrepancy between the reconstructions for these two forms, which were regularly recognized as cognate in all languages, is due to the insufficient reconstruction by the prototype, which takes each correspondence set independently rather than summarizing all possible reflexes for accepted cross-semantic cognate sets.
  66. Specific Findings: “middle” and “outside”

    • Old Burmese: middle ∅, outside (“out-middle”) ∅
    • Rangoon: middle ∅, outside ∅
    • Achang: middle k u - ŋ ⁵⁵, outside ∅
    • Xiandao: middle k o - ŋ ⁵⁵, outside ∅
    • Atsi: middle k u - ŋ ²¹, outside ∅
    • Bola: middle k a u ŋ ³¹, outside k a u ŋ ³¹
    • Lashi: middle k u - ŋ ³¹, outside ∅
    • Maru: middle k a u ŋ ³⁵, outside k a u ŋ ³⁵
    • Proto-Burmish: middle *k u - *ŋ ⁵⁵, outside ∅

    [i] The algorithm refuses to reconstruct the morpheme “middle” in the word “outside”, as it only occurs two times. If the prototype, again, reconstructed only once per cross-semantic cognate set, the results would be the same.
  67. Specific Findings: “tree” and “wood”

    • Old Burmese: tree s a ts -, wood s a ts -
    • Rangoon: tree tθ i ʔ ⁴, wood tθ i ʔ ⁴
    • Achang: tree s a ŋ ³¹, wood ʂ ə k ⁵⁵
    • Xiandao: tree ʂ ɯ k ⁵⁵, wood ∅
    • Atsi: tree s i k ⁵⁵, wood s i k ⁵⁵
    • Bola: tree s a k ⁵⁵, wood s a k ⁵⁵
    • Lashi: tree s ə̰ k ⁵⁵, wood s ə̰ k ⁵⁵
    • Maru: tree s a̰ k ⁵⁵, wood s a̰ k ⁵⁵
    • Proto-Burmish: tree s a k ⁵⁵, wood *s ə̰ *k ⁵⁵

    [i] Apart from the vowel, which is marked as irregular in the reconstruction for “wood”, this reconstruction is regular and also easy to compare with other Sino-Tibetan reflexes. The reconstruction for “tree”, on the other hand, is irregular, due to the wrong cognate assignment for Achang.
  68. We are only beginning to explore the potential of sound correspondence pattern analysis as a backbone for automatic linguistic reconstruction.

    Even now, it is straightforward to manually annotate all different sound correspondence patterns which we could infer from the data. To test the full potential of the approach, we will have to drastically increase the number of lexical items in our data, but even in this state, the approach is promising, and can serve as a starting point for a classical phonological reconstruction analysis.