LingPy — A Python library for quantitative tasks in historical linguistics

Talk held at the workshop Typologica, July 19-20, Düsseldorf.

Johann-Mattis List

July 19, 2012

Transcript

  1. LingPy – A Python Library for Quantitative Tasks in Historical Linguistics

    Johann-Mattis List, Institut für Romanistik II, Heinrich Heine University Düsseldorf, 19 July 2012. Sections: Basic Ideas, Sequence Modelling, Phonetic Alignment, Automatic Cognate Detection.
  2. Introductory Remarks

    “Konservativ kommt nicht von Konserve!” (“The word konservativ does not come from Konserve!”) Angela Merkel, 10 April 2000
  5. Introductory Remarks

    konservativ
    Part of speech: adjective
    Frequency: moderate
    Meaning: “sticking to the traditional”
    Synonyms: “rückschrittlich”, “antiquiert”, “rückständig”, “unzeitgemäß”
    Pronunciation: kɔnzɛrvatiːf
    Origin: English conservative
  6. Introductory Remarks

    English conservative “antiquated”
    Middle Latin conservativus “protective”
    Latin conservare “keep unchanged”
    Proto-Romance *serwo- “shepherd”
    Proto-Indo-European *ser-u-o “guardian”
    Adamic Language ??? “?”
  14. Introductory Remarks

    Konserve
    Part of speech: noun
    Frequency: moderate
    Meaning: “preserve tin”
    Synonyms: “Büchse”, “Konservenbüchse”, “Konservendose”
    Pronunciation: kɔnzɛrvə
    Origin: Middle Latin conserva
  15. Introductory Remarks

    Middle Latin conserva “pickled fruits”
    Latin conservare “keep unchanged”
    Proto-Romance *serwo- “shepherd”
    Proto-Indo-European *ser-u-o “guardian”
    Adamic Language ??? “?”
  20. Introductory Remarks

    German konservativ “antiquated” and German Konserve “preserve tin” both go back to Latin conservare “keep unchanged”.
  21. Introductory Remarks

    “Konservativ kommt nicht von Konserve!” Q. E. D.
  23. Introductory Remarks

    What the *** is historical linguistics?
    Historical linguistics is a (scientific) discipline that does not specifically care about what words mean, but where they come from...
  24. Structure of the Talk

    1 Basic Ideas
    2 Sequence Modelling
    3 Phonetic Alignment
    4 Automatic Cognate Detection
  27. What is LingPy?

    LingPy is a Python library (see http://lingulist.de/lingpy) for automatic tasks in historical linguistics.
    The current release of LingPy (lingpy-1.0) provides methods for sequence modelling, pairwise and multiple sequence alignment (SCA, List 2012a), automatic cognate detection (LexStat, List 2012b), and plotting routines (see the online documentation for details).
    LingPy can be invoked from the Python shell or inside Python scripts (examples are given in the online documentation).
  32. Current Features

    • tokenize phonetic sequences
    • convert phonetic sequences into abstract sound classes
    • calculate basic prosodic characteristics of phonetic sequences
    • carry out pairwise alignment analyses of phonetic sequences
    • carry out multiple alignment analyses of phonetic sequences
    • automatically search for cognates and etymologically related words in multilingual word lists
  36. Basic Ideas

    • adapt common techniques for sequence comparison and phylogenetic reconstruction to the specific needs of historical linguistics (general idea)
    • construct realistic models of phonetic sequences by distinguishing external and internal representations (module lingpy.sequence)
    • compare sequences in a way that closely reflects common linguistic theory (module lingpy.compare)
    • compare languages in a way that closely reflects the basic methods of historical linguistics (module lingpy.lexstat)
  38. Excursus: Sequence Similarity

    Synchronic Sequence Similarity: Sequences are judged to be similar if the segments of the sequences are phonetically similar (‘phenotypic resemblance’, Lass 1997).
    Diachronic Sequence Similarity: Sequences are judged to be similar if the segments of the sequences correspond systematically (‘genotypic resemblance’, Lass 1997).
  41. Excursus: Sequence Similarity

    Synchronic Sequence Similarity
    • Greek mati ‘eye’ ≈ Malay mata ‘eye’
    • Greek θɛɔs ‘god’ ≈ Spanish diɔs ‘god’
    Diachronic Sequence Similarity
    • German ʦʰaːn ‘tooth’ ≈ English tʊːθ ‘tooth’
    • Spanish eʧo ‘fact’ ≈ French fɛ ‘fact’
  43. Paradigmatic Aspects

    Sound Classes: Sounds which often occur in correspondence relations in genetically related languages can be clustered into classes (types). It is assumed “that phonetic correspondences inside a ‘type’ are more regular than those between different ‘types’” (Dolgopolsky 1986: 35).
    [Figure: the sounds k g p b ʧ ʤ f v t d ʃ ʒ θ ð s z]
  46. Paradigmatic Aspects

    [Figure: the sounds grouped into the sound classes K, T, P, S]
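The conversion of sounds into sound classes can be sketched as a simple table lookup. The slide names the classes K, T, P and S, but the transcript does not preserve which sound belongs to which class, so the grouping below is an illustrative assumption, and `SOUND_CLASSES` and `to_classes` are hypothetical names rather than LingPy's API.

```python
# Illustrative grouping only: the class labels come from the slide,
# the assignment of each sound to a class is an assumption.
SOUND_CLASSES = {
    "K": ["k", "g", "ʧ", "ʤ"],
    "P": ["p", "b", "f", "v"],
    "T": ["t", "d", "θ", "ð"],
    "S": ["s", "z", "ʃ", "ʒ"],
}

# Invert the table so that each sound points at its class symbol.
SOUND_TO_CLASS = {
    sound: label
    for label, sounds in SOUND_CLASSES.items()
    for sound in sounds
}


def to_classes(tokens, unknown="?"):
    """Map a list of sound tokens to a string of class symbols."""
    return "".join(SOUND_TO_CLASS.get(token, unknown) for token in tokens)


print(to_classes(["t", "d", "θ"]))       # TTT
print(to_classes(["p", "f", "z", "k"]))  # PPSK
```

Sounds that differ on the surface collapse into the same symbol, which is what lets later comparison steps treat them as likely correspondences.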
  50. Syntagmatic Aspects

    • sound change occurs more frequently in prosodically weak positions of phonetic sequences (Geisler 1992)
    • given the sonority structure of a phonetic sequence, one can distinguish positions that differ regarding their prosodic context
    • prosodic context can be modelled by representing a sequence by a prosodic string, indicating the different prosodic contexts of each segment
    • based on the relative strength of all sites in a phonetic sequence, substitution scores and gap penalties can be modified when carrying out alignment analyses
  51. Syntagmatic Aspects

    Segments:  j  a  b  ə  l  k  a
    Profile:   ↑  △  ↑  △  ↓  ↑  △
    Legend: ↑ ascending, △ maximum, ↓ descending
  52. Syntagmatic Aspects

    Segments:  j  a  b  ə  l  k  a
    Profile:   ↑  △  ↑  △  ↓  ↑  △
    [Figure: the positions ranked on a scale from strong to weak]
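A minimal sketch of how such a profile could be derived from sonority values. It assumes that a position counts as a maximum when its sonority is at least as high as that of both neighbours, as ascending when the next segment is more sonorous, and as descending otherwise. The function name `prosodic_profile` and the sonority values for jabəlka are assumptions made for illustration. The values for kɔnzɛrvatiːf are the ones printed in the Examples slide of this talk.

```python
def prosodic_profile(sonority):
    """Label each position as ascending (↑), maximum (△) or descending (↓)."""
    labels = []
    for i, value in enumerate(sonority):
        # Missing neighbours at the word edges count as minimally sonorous.
        left = sonority[i - 1] if i > 0 else float("-inf")
        right = sonority[i + 1] if i + 1 < len(sonority) else float("-inf")
        if value >= left and value >= right:
            labels.append("△")  # local sonority peak
        elif right > value:
            labels.append("↑")  # sonority rises towards the next segment
        else:
            labels.append("↓")  # sonority falls after a peak
    return labels


# Assumed sonority values for j a b ə l k a (not given on the slide).
print(" ".join(prosodic_profile([6, 7, 1, 7, 5, 1, 7])))
# ↑ △ ↑ △ ↓ ↑ △

# Sonority values for k ɔ n z ɛ r v a t iː f as printed in the Examples slide.
print(" ".join(prosodic_profile([1, 7, 4, 3, 7, 5, 3, 7, 1, 7, 3])))
# ↑ △ ↓ ↑ △ ↓ ↑ △ ↑ △ ↓
```

This only reproduces the three-way distinction from the legend. The mapping from these contexts to relative strength, and from there to modified scores and gap penalties, is not covered here.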
  61. Examples

    >>> from lingpy import *
    >>> konservativ = Sequence("kɔnzɛrvatiːf")
    >>> print ' '.join(konservativ.tokens)
    k ɔ n z ɛ r v a t iː f
    >>> print konservativ.classes
    KUNSERBATIB
    >>> print ' '.join([str(i) for i in konservativ.sonar])
    1 7 4 3 7 5 3 7 1 7 3
    >>> print konservativ.prostring
    #VcCvcCvCv$
  62. Phonetic Alignment

    hjärta  h  j  -  ä  r  t  a  -
    herz    h  -  e  -  r  z  -  -
    heart   h  -  e  a  r  t  -  -
    cordis  c  -  -  o  r  d  i  s
  67. Excursus: Alignment Analyses

    Alignment Analyses: In alignment analyses, sequences are arranged in a matrix in such a way that corresponding elements occur in the same column, while empty cells resulting from non-corresponding elements are filled with gap symbols.

    t  ɔ   x  t  ə  r
    d  ɔː  -  t  ə  r
  70. Phonetic Alignment

    Sound-Class Based Alignment (SCA, List 2012a): Sound classes and alignment analyses can be easily combined by representing phonetic sequences internally as sound classes and comparing the sound classes with traditional alignment algorithms.

    INPUT: tɔxtər, dɔːtər
    TOKENIZATION: t, ɔ, x, t, ə, r; d, ɔː, t, ə, r
    CONVERSION: t ɔ x … → T O G …; d ɔː t … → T O T …
    ALIGNMENT: T O G T E R; T O - T E R
    CONVERSION: T O G … → t ɔ x …; T O - … → d ɔː - …
    OUTPUT: t ɔ x t ə r; d ɔː - t ə r
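The workflow above can be sketched in a few lines of plain Python. This is not LingPy's SCA implementation: it uses a textbook Needleman-Wunsch alignment with flat match, mismatch and gap scores, a class table that covers only the sounds of this one example, and hypothetical function names (`tokenize`, `align`, `sca_align`).

```python
# Hypothetical class table covering only the sounds in this example.
CLASSES = {"t": "T", "d": "T", "ɔ": "O", "ɔː": "O", "x": "G", "ə": "E", "r": "R"}


def tokenize(word):
    """Split a transcription into segments, attaching the length mark to its vowel."""
    tokens = []
    for char in word:
        if char == "ː" and tokens:
            tokens[-1] += char
        else:
            tokens.append(char)
    return tokens


def align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment of two class sequences.

    Returns index pairs (i, j); None marks a gap in that sequence.
    """
    def sub(i, j):
        return match if a[i] == b[j] else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i - 1, j - 1),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs, i, j = [], len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i - 1, j - 1):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    return pairs[::-1]


def sca_align(word_a, word_b):
    """Tokenize, convert to classes, align the classes, map back to the sounds."""
    tokens_a, tokens_b = tokenize(word_a), tokenize(word_b)
    classes_a = [CLASSES[t] for t in tokens_a]
    classes_b = [CLASSES[t] for t in tokens_b]
    pairs = align(classes_a, classes_b)
    row_a = [tokens_a[i] if i is not None else "-" for i, _ in pairs]
    row_b = [tokens_b[j] if j is not None else "-" for _, j in pairs]
    return " ".join(row_a), " ".join(row_b)


for row in sca_align("tɔxtər", "dɔːtər"):
    print(row)
# t ɔ x t ə r
# d ɔː - t ə r
```

Because t and d, and ɔ and ɔː, fall into the same class in this table, the alignment step sees them as matches. That is what places the gap opposite x.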
  77. Example

    >>> from lingpy import *
    >>> konservativ = "kɔnzɛrvatiːf"
    >>> Konserve = "kɔnzɛrvə"
    >>> pair = Pairwise(konservativ,Konserve)
    >>> pair.align()
    >>> pair.distance()
    >>> print pair
    k ɔ n z ɛ r v a t iː f
    k ɔ n z ɛ r v - - ə -
    0.3
  83. Excursus: Language-Specific Similarities

    Language-Specific Similarity Measure
    • recall the aforementioned distinction between synchronic and diachronic similarity
    • diachronic similarity is the basis of all traditional approaches to cognate detection
    • diachronic sequence similarity is determined on the basis of systematic sound correspondences as opposed to similarity based on surface resemblances of phonetic segments
    • the most crucial aspect of correspondence-based (diachronic) similarity is that it is language-specific

    Meaning    German         Dutch         English
    “tooth”    Zahn [ʦaːn]    tand [tɑnt]   tooth [tʊːθ]
    “ten”      zehn [ʦeːn]    tien [tiːn]   ten [tɛn]
    “tongue”   Zunge [ʦʊŋə]   tong [tɔŋ]    tongue [tʌŋ]
  84. Excursus: Language-Specific Similarities

    Meaning     Shanghai        Beijing        Guangzhou
    “nine”      [ʨiɤ³⁵]         [ʨiou²¹⁴]      [kɐu³⁵]
    “today”     [ʨiŋ⁵⁵ʦɔ²¹]     [ʨiɚ⁵⁵]        [kɐm⁵³jɐt²]
    “rooster”   [koŋ⁵⁵ʨi²¹]     [kuŋ⁵⁵ʨi⁵⁵]    [kɐi⁵⁵koŋ⁵⁵]
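To make the idea of language-specific correspondences concrete, the sketch below counts which initial sounds face each other in the German and English forms from the Germanic table. The tokenization of the words and the function name `initial_correspondences` are assumptions made for illustration. This is not how LexStat itself computes its scores.

```python
from collections import Counter

# German and English forms from the table, tokenized by hand (an assumption).
PAIRS = [
    (["ʦ", "aː", "n"], ["t", "ʊː", "θ"]),     # "tooth"
    (["ʦ", "eː", "n"], ["t", "ɛ", "n"]),      # "ten"
    (["ʦ", "ʊ", "ŋ", "ə"], ["t", "ʌ", "ŋ"]),  # "tongue"
]


def initial_correspondences(pairs):
    """Count how often each pair of word-initial sounds occurs together."""
    return Counter((first[0], second[0]) for first, second in pairs)


counts = initial_correspondences(PAIRS)
print(counts[("ʦ", "t")])  # 3
```

A pairing that recurs across many words of one language pair is evidence for a systematic correspondence. The same two sounds need not correspond in another language pair, which is why the measure has to be language-specific.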
  90. Automatic Cognate Detection

    LexStat (List 2012b)
    • LexStat is a method for the automatic detection of cognates in multilingual word lists
    • LexStat bases its cognate judgments on language-specific, diachronic similarity
    • LexStat combines the most recent approaches to sequence comparison in computer science and biology with a novel approach to sequence modelling
    • LexStat largely outperforms alternative approaches and yields 90% precision and 84% recall on very divergent language families, such as Indo-European and Austronesian
    • LexStat yields transparent decisions which can be directly examined by the researcher
  91. Automatic Cognate Detection

    ID  Items  German  English  Swedish
    1   hand   hant    hænd     hand
    2   woman  fraʊ    wʊmən    kvina
    3   know   kɛnən   nəʊ      çɛna
    3   know   vɪsən   -        veːta
    …   …      …       …        …
  92. Automatic Cognate Detection

    ID  Items  German  COG  English  COG  Swedish  COG
    1   hand   hant    1    hænd     1    hand     1
    2   woman  fraʊ    2    wʊmən    3    kvina    4
    3   know   kɛnən   5    nəʊ      5    çɛna     5
    3   know   vɪsən   6    -        0    veːta    6
    …   …      …       …    …        …    …        …
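The COG columns can be read as cluster labels: words that share a cognate ID are judged to be cognate. A minimal sketch of turning the rows of the table into cognate sets follows. It assumes that ID 0 marks a missing entry, as the English gap in the last row suggests. The row layout and the name `cognate_sets` are hypothetical and do not reflect LingPy's own data format.

```python
from collections import defaultdict

# (concept, language, form, cognate ID) copied from the table above.
ROWS = [
    ("hand", "German", "hant", 1),
    ("hand", "English", "hænd", 1),
    ("hand", "Swedish", "hand", 1),
    ("woman", "German", "fraʊ", 2),
    ("woman", "English", "wʊmən", 3),
    ("woman", "Swedish", "kvina", 4),
    ("know", "German", "kɛnən", 5),
    ("know", "English", "nəʊ", 5),
    ("know", "Swedish", "çɛna", 5),
    ("know", "German", "vɪsən", 6),
    ("know", "English", "-", 0),
    ("know", "Swedish", "veːta", 6),
]


def cognate_sets(rows):
    """Group (language, form) pairs by cognate ID, skipping ID 0 (assumed missing)."""
    sets = defaultdict(list)
    for _concept, language, form, cog_id in rows:
        if cog_id != 0:
            sets[cog_id].append((language, form))
    return dict(sets)


for cog_id, members in sorted(cognate_sets(ROWS).items()):
    print(cog_id, members)
```

In this reading, set 5 links German kɛnən, English nəʊ and Swedish çɛna, while the three words for "woman" each form a set of their own.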
  93. Special thanks to:

    • The German Federal Ministry of Education and Research (BMBF) for funding our research project.
    • Hans Geisler for his helpful, critical, and inspiring support.