LexStat: Automatic detection of cognates in multilingual wordlists

Talk held at the Joint Workshop of LINGVIS and UNCLH, April 23-24, Avignon.

Johann-Mattis List

April 23, 2012

Transcript

  1. LexStat: Automatic Detection of Cognates in
    Multilingual Wordlists
    Johann-Mattis List∗
    ∗Institute for Romance Languages and Literature
    Heinrich Heine University Düsseldorf
    April 24, 2012

  2. Structure of the Talk
    1 Keys to the Past
    2 Identification of Cognates
    3 LexStat
    4 Evaluation

  3. Keys to the Past

  4. Charles Lyell on Languages
    The Geological Evidences of The Antiquity of Man
    with Remarks on Theories of The Origin of Species by Variation
    By Sir Charles Lyell
    London: John Murray, Albemarle Street, 1863

    “If we knew nothing of the existence of Latin, - if all
    historical documents previous to the fifteenth century had
    been lost, - if tradition even was silent as to the former
    existence of a Roman empire, a mere comparison of the
    Italian, Spanish, Portuguese, French, Wallachian, and
    Rhaetian dialects would enable us to say that at some time
    there must have been a language, from which these six
    modern dialects derive their origin in common.”

  7. Uniformitarianism and Abduction
    Uniformitarianism
    “Universality of Change” – Change is independent of time
    and space
    “Graduality of Change” – Change is neither abrupt nor
    chaotic
    “Uniformity of Change” – Change is not heterogeneous
    Abduction
    Present Events or Patterns
    + Known Laws
    => Abduction of Historical Facts
    Similarities Between Languages
    + Language Change
    => Inference of Proto-Languages

  15. Identification of Cognates
    hjärta   h j - ä r t a -
    herz     h - e - r z - -
    heart    h - e a r t - -
    cordis   c - - o r d i s

  16. The Comparative Method
    Basic Procedure
    1 Compile an initial list of putative cognate sets.
    2 Extract an initial list of putative sets of sound
      correspondences from the initial cognate list.
    3 Refine the cognate list and the correspondence list by
      adding and deleting cognate sets from the cognate list,
      depending on whether they are consistent with the
      correspondence list or not, and
      adding and deleting correspondence sets from the
      correspondence list, depending on whether they are
      consistent with the cognate list or not.
    4 Finish when the results are satisfying enough.

  23. The Comparative Method
    Language-Specific Similarity Measure
    Sequence similarity is determined on the basis of
    systematic sound correspondences as opposed to similarity
    based on surface resemblances of phonetic segments.
    Lass (1997) calls this notion of similarity genotypic as
    opposed to a phenotypic notion of similarity.
    The most crucial aspect of correspondence-based similarity
    is that it is language-specific: Genotypic similarity is never
    defined in general terms but always with respect to the
    language systems which are being compared.

    Meaning    German         Dutch         English
    “tooth”    Zahn [ʦaːn]    tand [tɑnt]   tooth [tʊːθ]
    “ten”      zehn [ʦeːn]    tien [tiːn]   ten [tɛn]
    “tongue”   Zunge [ʦʊŋə]   tong [tɔŋ]    tongue [tʌŋ]

    Meaning     Shanghai       Beijing        Guangzhou
    “nine”      [ʨiɤ³⁵]        [ʨiou²¹⁴]      [kɐu³⁵]
    “today”     [ʨiŋ⁵⁵ʦɔ²¹]    [ʨiɚ⁵⁵]        [kɐm⁵³jɐt²]
    “rooster”   [koŋ⁵⁵ʨi²¹]    [kuŋ⁵⁵ʨi⁵⁵]    [kɐi⁵⁵koŋ⁵⁵]
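The idea that similarity rests on recurring correspondences rather than on surface resemblance can be illustrated with a small sketch. It counts how often word-initial segments line up in the German and English words of the first table. The hand-made tokenization and the helper name `initial_correspondences` are assumptions made for this illustration and are not part of LexStat.

```python
from collections import Counter

# Hand-tokenized (German, English) word pairs from the table above.
# The affricate [ʦ] is treated as a single segment (an assumption).
PAIRS = [
    (["ʦ", "aː", "n"], ["t", "ʊː", "θ"]),      # "tooth"
    (["ʦ", "eː", "n"], ["t", "ɛ", "n"]),       # "ten"
    (["ʦ", "ʊ", "ŋ", "ə"], ["t", "ʌ", "ŋ"]),   # "tongue"
]


def initial_correspondences(pairs):
    """Count pairs of word-initial segments across the given word pairs."""
    return Counter((left[0], right[0]) for left, right in pairs)


counts = initial_correspondences(PAIRS)
for (german, english), n in counts.items():
    print(f"German {german} : English {english} occurs {n} times")
```

A correspondence such as German ʦ : English t counts as evidence only because it recurs across several word pairs of these two languages; the same pair of sounds says nothing about other language pairs.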

  29. Automatic Approaches
    Alignment Analyses
    In alignment analyses, sequences are arranged in a matrix in
    such a way that corresponding elements occur in the same
    column, while empty cells resulting from non-corresponding
    elements are filled with gap symbols.
    t ɔ  x t ə r
    d ɔː - t ə r
    Cognate identification is usually based on a similarity or
    distance score (e.g., edit-distance) calculated from the
    number of matches and mismatches in the alignment.
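As a minimal sketch of such a distance score, the following computes a plain Levenshtein (edit) distance over tokenized segments and normalizes it by the length of the longer sequence. The function names and the normalization are illustrative choices, not the scoring used by any particular tool.

```python
def edit_distance(seq_a, seq_b):
    """Levenshtein distance between two sequences of phonetic segments."""
    prev = list(range(len(seq_b) + 1))
    for i, a in enumerate(seq_a, start=1):
        curr = [i]
        for j, b in enumerate(seq_b, start=1):
            cost = 0 if a == b else 1
            curr.append(min(
                prev[j] + 1,         # gap in seq_b
                curr[j - 1] + 1,     # gap in seq_a
                prev[j - 1] + cost,  # match or mismatch
            ))
        prev = curr
    return prev[-1]


def normalized_distance(seq_a, seq_b):
    """Edit distance divided by the length of the longer sequence."""
    longest = max(len(seq_a), len(seq_b))
    return edit_distance(seq_a, seq_b) / longest if longest else 0.0


german = ["t", "ɔ", "x", "t", "ə", "r"]
english = ["d", "ɔː", "t", "ə", "r"]
print(edit_distance(german, english), normalized_distance(german, english))
```

Because the comparison is purely segment identity, [t] versus [d] and [ɔ] versus [ɔː] count as full mismatches, which is one motivation for the sound-class approach.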

  36. Automatic Approaches
    Sound Classes
    Sounds which often occur in
    correspondence relations in
    genetically related languages
    can be clustered into classes
    (types). It is assumed “that
    phonetic correspondences
    inside a ‘type’ are more regular
    than those between different
    ‘types’” (Dolgopolsky 1986: 35).
    [Figure: sounds grouped into sound classes.
    K: k g ʧ ʤ; T: t d θ ð; P: p b f v; S: ʃ ʒ s z]
    Cognate identification is usually based on comparing the
    first two consonants of two words: If they match regarding
    their sound classes, the words are judged to be cognate,
    otherwise not.
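The first-two-consonants criterion can be sketched in a few lines. The class table below takes the K, T, P and S groups from the slide figure and adds a handful of further entries (x, m, n, r, l) as assumptions so that the example words can be processed; it is a toy subset, not Dolgopolsky's full model. Any segment missing from the table is treated as a vowel and skipped.

```python
# Toy sound-class table. K/T/P/S follow the slide figure; the entries for
# x, m, n, r and l are added here as assumptions for the example.
SOUND_CLASSES = {
    "k": "K", "g": "K", "ʧ": "K", "ʤ": "K", "x": "K",
    "t": "T", "d": "T", "θ": "T", "ð": "T",
    "p": "P", "b": "P", "f": "P", "v": "P",
    "s": "S", "z": "S", "ʃ": "S", "ʒ": "S",
    "m": "M", "n": "N", "r": "R", "l": "R",
}


def consonant_classes(tokens, limit=2):
    """Sound classes of the first `limit` consonants of a tokenized word."""
    classes = [SOUND_CLASSES[t] for t in tokens if t in SOUND_CLASSES]
    return classes[:limit]


def judged_cognate(tokens_a, tokens_b):
    """True if both words have two consonants whose classes match in order."""
    a = consonant_classes(tokens_a)
    b = consonant_classes(tokens_b)
    return len(a) == 2 and a == b


print(judged_cognate(["t", "iː", "n"], ["t", "ɛ", "n"]))
print(judged_cognate(["t", "ɔ", "x", "t", "ə", "r"],
                     ["d", "ɔː", "t", "ə", "r"]))
```

With this toy table, Dutch [tiːn] and English [tɛn] both reduce to T N and are accepted, while [tɔxtər] (T K) and [dɔːtər] (T T) are rejected, which shows how strict the criterion is when a consonant has been lost in one language.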

  43. Automatic Approaches
    Sound-Class-Based Alignment (SCA)
    Sound classes and alignment analyses can be easily combined
    by representing phonetic sequences internally as sound classes
    and comparing the sound classes with traditional alignment
    algorithms.
    INPUT
    tɔxtər
    dɔːtər
    TOKENIZATION
    t, ɔ, x, t, ə, r
    d, ɔː, t, ə, r
    CONVERSION
    t ɔ x … → T O G …
    d ɔː t … → T O T …
    ALIGNMENT
    T O G T E R
    T O - T E R
    CONVERSION
    T O G … → t ɔ x …
    T O - … → d ɔː - …
    OUTPUT
    t ɔ x t ə r
    d ɔː - t ə r
    Cognate identification may be based on a certain threshold
    and distance scores derived from the similarity scores
    yielded by the alignment algorithm.
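The pipeline in the figure (tokenize, convert to sound classes, align, convert back) can be sketched as follows. This is an illustration under stated assumptions rather than the SCA implementation: the class table covers only the segments of the example, the alignment is a textbook Needleman-Wunsch with arbitrary scores (match 1, mismatch -1, gap -1), and all names are hypothetical.

```python
# Sound classes for the example segments only, as shown in the figure.
CLASSES = {"t": "T", "d": "T", "ɔ": "O", "ɔː": "O",
           "x": "G", "ə": "E", "r": "R"}


def align(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment; returns (i, j) index pairs, None marking a gap."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal path.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    return pairs[::-1]


def sca_like_alignment(tokens_a, tokens_b):
    """Align on sound classes, then map the result back to the segments."""
    pairs = align([CLASSES[t] for t in tokens_a],
                  [CLASSES[t] for t in tokens_b])
    row_a = [tokens_a[i] if i is not None else "-" for i, _ in pairs]
    row_b = [tokens_b[j] if j is not None else "-" for _, j in pairs]
    return row_a, row_b


row_a, row_b = sca_like_alignment(["t", "ɔ", "x", "t", "ə", "r"],
                                  ["d", "ɔː", "t", "ə", "r"])
print(" ".join(row_a))
print(" ".join(row_b))
```

Because [t] and [d] share a class, as do [ɔ] and [ɔː], the only non-matching column is the gap opposite [x].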

  47. Traditional vs. Automatic Approaches
    Similarity
    Almost all current automatic approaches are based on a
    language-independent similarity measure, while the
    comparative method applies a language-specific one. Such
    automatic approaches will therefore yield the same scores for
    phenotypically identical sequences, regardless of the language
    systems they belong to.

  50. LexStat

  51. Working Procedure
    Sequence Input: sequences are read from specifically
      formatted input files
    1 Sequence Conversion: sequences are converted to sound
      classes and prosodic profiles
    2 Scoring-Scheme Creation: using a permutation method,
      language-specific scoring schemes are determined
    3 Distance Calculation: based on the language-specific
      scoring scheme, pairwise distances between sequences
      are calculated
    4 Sequence Clustering: sequences are clustered into cognate
      sets whose average distance is below a certain threshold
    Sequence Output: information regarding sequence clustering
      is written to file using a specific format
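The clustering step can be illustrated with a small flat, average-linkage clustering that stops merging once the closest pair of clusters reaches the threshold. This is a sketch, not LexStat's code: the function name is hypothetical, and the distance matrix for the five "know" words is invented for the example rather than computed by any scoring scheme.

```python
def flat_cluster(labels, dist, threshold):
    """Average-linkage clustering that stops at the given distance threshold."""
    clusters = [[i] for i in range(len(labels))]

    def average(c1, c2):
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average distance.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        if best[0] >= threshold:
            break
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [[labels[i] for i in c] for c in clusters]


words = ["kɛnən", "nəʊ", "çɛna", "vɪsən", "veːta"]
# Invented pairwise distances, for illustration only.
distances = [
    [0.0, 0.4, 0.2, 0.9, 0.9],
    [0.4, 0.0, 0.5, 0.9, 0.9],
    [0.2, 0.5, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.9, 0.0, 0.3],
    [0.9, 0.9, 0.9, 0.3, 0.0],
]
cognate_sets = flat_cluster(words, distances, threshold=0.6)
print(cognate_sets)
```

Unlike a fixed number of clusters, a threshold lets the number of cognate sets per meaning vary with the data, which suits wordlists where some meanings have one cognate set and others several.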

  58. Implementation
    LexStat is implemented as part of the LingPy Python
    library (see http://lingulist.de/lingpy) for automatic tasks
    in historical linguistics.
    The current release of LingPy (lingpy-1.0) provides
    methods for pairwise and multiple sequence alignment
    (SCA), automatic cognate detection (LexStat), and plotting
    routines (see the online documentation for details).
    LexStat can be invoked from the Python shell or inside
    Python scripts (examples are given in the online
    documentation).

  62. . .
    Keys to the Past
    . . . . . .
    Identification of Cognates
    . . . . . . .
    LexStat
    . . . . .
    Evaluation
    Input and Output
    ID Items German English Swedish
    1 hand hant hænd hand
    2 woman fraʊ wʊmən kvina
    3 know kɛnən nəʊ çɛna
    3 know vɪsən - veːta
    … … … … …
    16 / 28

    View Slide

  63. . .
    Keys to the Past
    . . . . . .
    Identification of Cognates
    . . . . . . .
    LexStat
    . . . . .
    Evaluation
    Input and Output
    ID Items German COG English COG Swedish COG
    1 hand hant 1 hænd 1 hand 1
    2 woman fraʊ 2 wʊmən 3 kvina 4
    3 know kɛnən 5 nəʊ 5 çɛna 5
    3 know vɪsən 6 - 0 veːta 6
    … … … … … … … …

  69.
    Internal Representation of Sequences
    Sound Classes and Prosodic Context
    All sequences are internally represented as sound classes,
    the default model being the one proposed in List
    (forthcoming).
    All sequences are also represented by prosodic strings
    which indicate the prosodic environment (initial, ascending,
    maximum, descending, final) of each phonetic segment
    (List 2012).
    The information regarding sound classes and prosodic
    context is combined, and each input sequence is further
    represented as a sequence of tuples, consisting of the
    sound class and the prosodic environment of the respective
    phonetic segment.
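A minimal sketch of the tuple representation described above. The sound-class table and the sonority ranks are toy values invented for illustration; they are not the model from List (forthcoming), and the rule used here to pick the prosodic environment is a simplifying assumption.

```python
# Toy sound-class table (hypothetical; NOT the model from List, forthcoming).
SOUND_CLASS = {"h": "H", "a": "V", "æ": "V", "n": "N", "t": "T", "d": "T"}

# Toy sonority ranks (hypothetical); a higher rank means a more sonorous class.
SONORITY = {"T": 1, "H": 2, "N": 3, "V": 6}

def to_tuples(segments):
    """Return one (sound class, prosodic environment) tuple per segment."""
    classes = [SOUND_CLASS[segment] for segment in segments]
    ranks = [SONORITY[cls] for cls in classes]
    last = len(classes) - 1
    result = []
    for i, cls in enumerate(classes):
        if i == 0:
            environment = "initial"
        elif i == last:
            environment = "final"
        elif cls == "V":
            environment = "maximum"      # vowels are treated as sonority peaks
        elif ranks[i] < ranks[i + 1]:
            environment = "ascending"    # sonority rises towards the next segment
        else:
            environment = "descending"   # sonority falls towards the next segment
        result.append((cls, environment))
    return result

german = to_tuples(list("hant"))
english = to_tuples(list("hænd"))
print(german)
```

In this toy model German [hant] and English [hænd] end up with identical tuple sequences, which is the point of abstracting away from the individual segments.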

  76.
    Scoring-Scheme Creation
    Attested Distribution
    • carry out global and pairwise alignment analyses of all sequence pairs occurring
      in the same semantic slot
    • store all corresponding segments that occur in sequences whose distance is
      beyond a certain threshold
    Creation of the Expected Distribution
    • shuffle the wordlists repeatedly and
    • carry out global and pairwise alignment analyses of all sequence pairs in
      the randomly shuffled wordlists
    • store all corresponding segments
    • average the results
    Calculation of Similarity Scores
    • Calculation of log-odds scores from the distributions.
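The three steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the LexStat implementation: position-wise pairing of segments stands in for the alignment analyses, the distance threshold on the attested side is left out, and the pseudo-count and the plain log2 ratio are assumptions (the slides do not give the exact formula). All function names are hypothetical.

```python
import math
import random
from collections import Counter

def count_pairs(words_a, words_b):
    """Count position-wise segment pairs (a stand-in for real alignment)."""
    counts = Counter()
    for word_a, word_b in zip(words_a, words_b):
        counts.update(zip(word_a, word_b))
    return counts

def expected_counts(words_a, words_b, runs=100, seed=0):
    """Average the pair counts over repeatedly shuffled word lists."""
    rng = random.Random(seed)
    shuffled = list(words_b)
    total = Counter()
    for _ in range(runs):
        rng.shuffle(shuffled)  # breaks the link between semantic slots
        total.update(count_pairs(words_a, shuffled))
    return {pair: count / runs for pair, count in total.items()}

def log_odds(attested, expected, pseudo=0.1):
    """log2 of attested over expected frequency, smoothed by a pseudo-count."""
    pairs = set(attested) | set(expected)
    return {
        pair: math.log2((attested.get(pair, 0) + pseudo)
                        / (expected.get(pair, 0.0) + pseudo))
        for pair in pairs
    }

# Toy data (invented): word i in both lists fills the same semantic slot.
lang_a = [list("ta"), list("to"), list("ti"), list("na")]
lang_b = [list("da"), list("do"), list("di"), list("na")]

attested = count_pairs(lang_a, lang_b)
expected = expected_counts(lang_a, lang_b)
scores = log_odds(attested, expected)
print(scores[("t", "d")] > 0)  # checks that the recurrent pair scores positively
```

A segment pair that recurs in the same semantic slots more often than in shuffled lists receives a positive score, while a pair that only turns up by chance receives a negative one.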

  77.
    Scoring-Scheme Creation
    English   German     Att.   Exp.   Score
    #[t,d]    #[t,d]     3.0    1.24   6.3
    #[t,d]    #[ʦ]       3.0    0.38   6.0
    #[t,d]    #[ʃ,s,z]   1.0    1.99   -1.5
    #[θ,ð]    #[t,d]     7.0    0.72   6.3
    #[θ,ð]    #[ʦ]       0.0    0.25   -1.5
    #[θ,ð]    #[s,z]     0.0    1.33   0.5
    [t,d]$    [t,d]$     21.0   8.86   6.3
    [t,d]$    [ʦ]$       3.0    1.62   3.9
    [t,d]$    [ʃ,s]$     6.0    5.30   1.5
    [θ,ð]$    [t,d]$     4.0    1.14   4.8
    [θ,ð]$    [ʦ]$       0.0    0.20   -1.5
    [θ,ð]$    [ʃ,s]$     0.0    0.80   0.5

  79.
    Scoring-Scheme Creation
              Initial          Final
    English   town  [taʊn]     hot   [hɔt]
    German    Zaun  [ʦaun]     heiß  [haɪs]
    English   thorn [θɔːn]     mouth [maʊθ]
    German    Dorn  [dɔrn]     Mund  [mʊnt]
    English   dale  [deɪl]     head  [hɛd]
    German    Tal   [taːl]     Hut   [huːt]

  81.
    Sequence Clustering
                     Ger.   Eng.   Dan.   Swe.   Dut.   Nor.
    Ger. [frau]      0.00   0.95   0.81   0.70   0.34   1.00
    Eng. [wʊmən]     0.95   0.00   0.78   0.90   0.80   0.80
    Dan. [kvenə]     0.81   0.78   0.00   0.17   0.96   0.13
    Swe. [kvinːa]    0.70   0.90   0.17   0.00   0.86   0.10
    Dut. [vrɑuʋ]     0.34   0.80   0.96   0.86   0.00   0.89
    Nor. [kʋinə]     1.00   0.80   0.13   0.10   0.89   0.00
    Clusters         1      2      3      3      1      3
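One way to get from the distance matrix to the cluster row is a flat, threshold-based agglomerative clustering. The sketch below uses average linkage and a cutoff of 0.6; both are assumptions made for illustration, since the slide names neither the algorithm nor the threshold. With these settings it arrives at the cluster assignment shown above.

```python
LANGUAGES = ["Ger.", "Eng.", "Dan.", "Swe.", "Dut.", "Nor."]

# Pairwise distances for 'woman', retyped from the slide.
MATRIX = [
    [0.00, 0.95, 0.81, 0.70, 0.34, 1.00],
    [0.95, 0.00, 0.78, 0.90, 0.80, 0.80],
    [0.81, 0.78, 0.00, 0.17, 0.96, 0.13],
    [0.70, 0.90, 0.17, 0.00, 0.86, 0.10],
    [0.34, 0.80, 0.96, 0.86, 0.00, 0.89],
    [1.00, 0.80, 0.13, 0.10, 0.89, 0.00],
]

def flat_cluster(matrix, threshold):
    """Merge the closest clusters (average linkage) until none is close enough."""
    clusters = [[i] for i in range(len(matrix))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dists = [matrix[i][j] for i in clusters[a] for j in clusters[b]]
                average = sum(dists) / len(dists)
                if best is None or average < best[0]:
                    best = (average, a, b)
        if best[0] > threshold:
            break  # the closest remaining pair is already too distant
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    # Number the clusters by the position of their first member.
    labels = [0] * len(matrix)
    for cluster_id, members in enumerate(sorted(clusters, key=min), start=1):
        for i in members:
            labels[i] = cluster_id
    return labels

labels = flat_cluster(MATRIX, threshold=0.6)
print(dict(zip(LANGUAGES, labels)))
```

Any cutoff between the German-Dutch distance (0.34) and the smallest average distance between the resulting groups (about 0.83) yields the same three clusters for this matrix.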

  82.
    Evaluation
    v o l - d e m o r t
    v - l a d i m i r -
    v a l - d e m a r -

  84.
    Gold Standard
    File   Family      Languages   Items   Entries   Source
    GER    Germanic    7           110     814       Starostin (2008)
    ROM    Romance     5           110     589       Starostin (2008)
    SLV    Slavic      4           110     454       Starostin (2008)
    PIE    Indo-Eur.   18          110     2057      Starostin (2008)
    OUG    Uralic      21          110     2055      Starostin (2008)
    BAI    Bai         9           110     1028      Wang (2006)
    SIN    Sinitic     9           180     1614      Hóu (2004)
    KSL    varia       8           200     1600      Kessler (2001)
    JAP    Japonic     10          200     1986      Shirō (1973)

  89.
    Evaluation Measures
    Set Comparison
    Precision, Recall, and F-Score are calculated by comparing the
    cognate sets proposed by the method with the cognate sets in
    the gold standard (see Bergsma & Kondrak 2007).
    Pair Comparison
    Pair comparison is based on a pairwise comparison of all
    decisions present in the test set and the gold standard.
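Both measures can be sketched for a single semantic slot as follows. The definitions used here are one plausible reading and are assumptions: a proposed cognate set counts as correct only if exactly the same set occurs in the gold standard (singletons included), and the pair score is the share of word pairs on which test and gold make the same cognate/non-cognate decision. The label lists are invented for illustration.

```python
from itertools import combinations

def to_sets(labels):
    """Turn a list of cognate IDs into a set of frozensets of word indices."""
    groups = {}
    for index, label in enumerate(labels):
        groups.setdefault(label, set()).add(index)
    return {frozenset(group) for group in groups.values()}

def set_scores(test, gold):
    """Precision, recall and F-score over exactly matching cognate sets."""
    test_sets, gold_sets = to_sets(test), to_sets(gold)
    shared = len(test_sets & gold_sets)
    precision = shared / len(test_sets)
    recall = shared / len(gold_sets)
    if shared == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

def pair_agreement(test, gold):
    """Share of word pairs with the same cognate/non-cognate decision."""
    pairs = list(combinations(range(len(gold)), 2))
    same = sum((test[i] == test[j]) == (gold[i] == gold[j]) for i, j in pairs)
    return same / len(pairs)

# Hypothetical example: six words, the test analysis misses one cognate link.
gold = [1, 2, 3, 3, 1, 3]
test = [1, 2, 3, 3, 4, 3]

precision, recall, f_score = set_scores(test, gold)
agreement = pair_agreement(test, gold)
print(precision, recall, f_score, agreement)
```

The example shows why the two views differ: a single missed link spoils a whole set in the set comparison, but changes only one of the fifteen pairwise decisions.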

  94.
    Tests
    • Sound Classes – matching sound classes without
      alignment (based on Turchin et al. 2010)
    • Simple Alignment – normalized edit distance (Levenshtein
      1966)
    • SCA – language-independent distance scores derived from
      sound-class-based alignment analyses (List 2012)
    • LexStat – language-specific distance scores
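The "Simple Alignment" baseline is the easiest of the four to write down. The sketch below computes the Levenshtein distance over segments and divides it by the length of the longer word; that normalization is a common choice and an assumption here, since the slide does not say which one was used.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of segments."""
    previous = list(range(len(b) + 1))
    for i, seg_a in enumerate(a, start=1):
        current = [i]
        for j, seg_b in enumerate(b, start=1):
            cost = 0 if seg_a == seg_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]

def normalized_edit_distance(a, b):
    """Edit distance divided by the length of the longer sequence."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0

ned = normalized_edit_distance(list("hant"), list("hænd"))
print(ned)
```

Because every mismatch costs the same, this measure treats [t] versus [d] exactly like [t] versus [m], which is the limitation the sound-class-based methods address.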

  96.
    General Results
    Score             LexStat   SCA    Simple Alm.   Sound Cl.
    Identical Pairs   0.85      0.82   0.76          0.74
    Precision         0.59      0.51   0.39          0.39
    Recall            0.68      0.57   0.47          0.55
    F-Score           0.63      0.55   0.42          0.46

  97.
    General Results
    [Figure: per-dataset scores (y-axis from 0.6 to 1.0) on SLV, KSL, GER, BAI, SIN, PIE, ROM, JAP, and OUG for LexStat, SCA, NED, and Turchin]

  102.
    Specific Results
    Pairwise decisions were extracted from the KSL dataset
    and compared with the Gold Standard.
    72 borrowings were explicitly marked along with their
    source by Kessler (2001).
    83 chance resemblances were determined automatically by
    taking non-cognate word pairs with an NED score less than
    0.6.
                          LexStat   SCA   Simple Alm.   Sound Cl.
    Borrowings            50%       61%   49%           53%
    Chance Resemblances   17%       42%   89%           31%

  103.
    What’s next?
    *deh3- ?

  104.
    Special thanks to:
    • The German Federal Ministry of Education and
      Research (BMBF) for funding our research project.
    • Hans Geisler for his helpful, critical, and inspiring support.
    • James Kilbury for all the time he spent on helping
      me to refine the manuscript.

  105.
    THANK YOU
    FOR LISTENING!