Computer-Assisted Language Comparison

Talk held at the CLT Seminar (Centre for Language Technology, University of Gothenburg)

Johann-Mattis List

October 09, 2014

Transcript

  1. Computer-Assisted Language Comparison
    Bridging the Gap between Traditional and Quantitative Approaches in
    Historical Linguistics
    Johann-Mattis List
    Forschungszentrum Deutscher Sprachatlas
    Philipps-University Marburg
    2014-10-09
    1 / 50

  2. Traditional Historical Linguistics
    2 / 50

  3. Traditional Historical Linguistics Characteristics
    Characteristics
    [Image text: FRANZ BOPP, VERY, VERY LONG TITLE]
    3 / 50

  4. Traditional Historical Linguistics Characteristics
    Research Object
    4 / 50

  5. Traditional Historical Linguistics Characteristics
    Research Object
    German ʦ aː n -
    * Proto-Germanic t a n d
    English t ʊː θ -
    ** Proto-Indo-European d o n t
    Italian d ɛ n t e
    * Proto-Romance d e n t
    French d ɑ̃ - -
    4 / 50

  7. Traditional Historical Linguistics Characteristics
    Research Object
    German ʦ aː n - -
    * Proto-Germanic t a n d
    English t ʊː - θ -
    ** Proto-Indo-European d o n t
    Italian d ɛ n t e
    * Proto-Romance d e n t
    French d ɑ̃ - - -
    4 / 50

  8. Traditional Historical Linguistics Characteristics
    Research Object
    German ʦ aː n - -
    Proto-Germanic t a n θ -
    English t ʊː - θ -
    ** Proto-Indo-European d o n t
    Italian d ɛ n t e
    Proto-Romance d e n t e
    French d ɑ̃ - - -
    4 / 50

  9. Traditional Historical Linguistics Characteristics
    Research Object
    German ʦ aː n -
    Proto-Germanic t a n θ -
    English t ʊː - θ
    ** Proto-Indo-European d o n t
    Italian d ɛ n t e
    Proto-Romance d e n t e
    French d ɑ̃ - -
    4 / 50

  10. Traditional Historical Linguistics Characteristics
    Research Object
    German ʦ aː n -
    Proto-Germanic t a n θ -
    English t ʊː - θ
    Proto-Indo-European d e n t -
    Italian d ɛ n t ə
    Proto-Romance d e n t e
    French d ɑ̃ - -
    4 / 50

  11. Traditional Historical Linguistics Characteristics
    Research Object
    German ʦ aː n -
    * Proto-Germanic t a n d
    English t ʊː - θ
    Proto-Indo-European d e n t
    Italian d ɛ n t ə
    * Proto-Romance d e n t
    French d ɑ̃ - -
    4 / 50

  12. Traditional Historical Linguistics Characteristics
    Research Object
    German ʦ aː n
    Proto-Germanic t a n θ
    English t ʊː θ
    Proto-Indo-European d e n t
    Italian d ɛ n t e
    Proto-Romance d e n t e
    French d ɑ̃
    4 / 50

  13. Traditional Historical Linguistics Characteristics
    Research Object
    History
    individual events (description)
    individual processes (description)
    general processes (modeling, analysis)
    Language History
    individual language states (description of sound system, grammar,
    lexicon)
    individual instances of language development (description of sound
    change patterns, grammaticalization, lexical change)
    general language development (modeling and analysis of sound
    change processes, grammaticalization, lexical change)
    5 / 50

  14. Traditional Historical Linguistics Characteristics
    Research Object
    Internal Language History (ontogenesis)
    etymology
    historical grammar
    historical phonology
    External Language History (phylogenesis)
    linguistic reconstruction
    proof of language relationship
    genetic classification
    General Tendencies in Language History
    processes and mechanisms of sound change
    grammaticalization
    lexical change
    6 / 50

  15. Traditional Historical Linguistics Characteristics
    Origins
    Uniformitarianism
    “universality of change” – change is independent of time and space
    “graduality of change” – change is neither abrupt nor chaotic
    “uniformity of change” – change is not heterogeneous, but uniform
    Founding Fathers
    Franz Bopp (1791–1867): language comparison (Bopp 1816)
    Rasmus Rask (1787–1832) and Jacob Grimm (1785–1863): sound
    law (Rask 1818, Grimm 1822)
    August Schleicher (1821–1868): family tree and linguistic
    reconstruction (Schleicher 1853 & 1861)
    7 / 50

  16. Traditional Historical Linguistics Achievements
    Achievements
    8 / 50

  17. Traditional Historical Linguistics Achievements
    Methods and Theories
    Comparative Method (Meillet 1925)
    Basic procedure for proving language relationship and reconstructing
    unattested ancestral language states, etymologies, and genetic
    classifications.
    Family Tree Model and Wave Theory (Schleicher 1853, Schmidt 1872)
    Two partially incompatible models to describe historical language
    relations.
    Regularity Hypothesis (Osthoff & Brugmann 1878)
    Fundamental working hypothesis that states that certain sound change
    processes proceed regularly (universally, gradually, and in a uniform
    manner).
    9 / 50

  18. Traditional Historical Linguistics Achievements
    Comparative Method
    proof of relationship
    identification of cognates
    identification of sound correspondences
    reconstruction of proto-forms
    internal classification
    10 / 50

  20. Traditional Historical Linguistics Achievements
    Comparative Method
    Cognate List Alignment Correspondence List
    German dünn d ʏ n GER ENG Frequ.
    d θ 3 x
    d d 1 x
    n n 1 x
    m m 1 x
    ŋ ŋ 1 x
    English thin θ ɪ n
    German Ding d ɪ ŋ
    English thing θ ɪ ŋ
    German dumm d ʊ m
    English dumb d ʌ m
    German Dorn d ɔɐ n
    English thorn d ɔː n
    10 / 50

  22. Traditional Historical Linguistics Achievements
    Comparative Method
    Cognate List Alignment Correspondence List
    German dünn d ʏ n GER ENG Frequ.
    d θ 2 x
    d d 1 x
    n n 1 x
    m m 1 x
    ŋ ŋ 1 x
    English thin θ ɪ n
    German Ding d ɪ ŋ
    English thing θ ɪ ŋ
    German dumm d ʊ m
    English dumb d ʌ m
    German Dorn d ɔɐ n
    English thorn d ɔː n
    10 / 50

  23. Traditional Historical Linguistics Achievements
    Comparative Method
    Cognate List Alignment Correspondence List
    German dünn d ʏ n GER ENG Frequ.
    d θ 2 x
    d d 1 x
    n n 1 x
    m m 1 x
    ŋ ŋ 1 x
    English thin θ ɪ n
    German Ding d ɪ ŋ
    English thing θ ɪ ŋ
    German dumm d ʊ m
    English dumb d ʌ m
    German Dorn d ɔɐ n
    English thorn θ ɔː n
    10 / 50

  24. Traditional Historical Linguistics Achievements
    Comparative Method
    Cognate List Alignment Correspondence List
    German dünn d ʏ n GER ENG Frequ.
    d θ 3 x
    d d 1 x ?
    n n 2 x
    m m 1 x
    ŋ ŋ 1 x
    English thin θ ɪ n
    German Ding d ɪ ŋ
    English thing θ ɪ ŋ
    German dumm d ʊ m
    English dumb d ʌ m
    German Dorn d ɔɐ n
    English thorn θ ɔː n
    10 / 50

  25. Traditional Historical Linguistics Achievements
    Comparative Method
    Cognate List and Alignment
    German   dünn    d  ʏ   n
    English  thin    θ  ɪ   n
    German   Ding    d  ɪ   ŋ
    English  thing   θ  ɪ   ŋ
    German   dumm    d  ʊ   m
    English  dumb    d  ʌ   m
    German   Dorn    d  ɔɐ  n
    English  thorn   θ  ɔː  n
    Correspondence List
    GER  ENG  Frequ.
    d    θ    3 x
    d    d    1 x
    n    n    2 x
    m    m    1 x
    ŋ    ŋ    1 x
    10 / 50
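
The step from alignments to a correspondence list can be sketched in a few lines of Python. This is only an illustration of the counting step, not part of the talk: the function name `count_correspondences` is hypothetical, and restricting the count to the first and last alignment columns (the consonants) is an assumption made so that the result matches the list on the slide.

```python
from collections import Counter

# Aligned cognate pairs (German, English), copied from the slide.
ALIGNED_PAIRS = [
    (["d", "ʏ", "n"], ["θ", "ɪ", "n"]),    # dünn / thin
    (["d", "ɪ", "ŋ"], ["θ", "ɪ", "ŋ"]),    # Ding / thing
    (["d", "ʊ", "m"], ["d", "ʌ", "m"]),    # dumm / dumb
    (["d", "ɔɐ", "n"], ["θ", "ɔː", "n"]),  # Dorn / thorn
]


def count_correspondences(pairs, columns=(0, 2)):
    """Count how often two sounds share an alignment column.

    Assumption: only the columns listed in `columns` are counted
    (here the initial and final consonants, as on the slide).
    """
    counts = Counter()
    for german, english in pairs:
        for col in columns:
            counts[(german[col], english[col])] += 1
    return counts


CORRESPONDENCES = count_correspondences(ALIGNED_PAIRS)
for (ger, eng), freq in CORRESPONDENCES.most_common():
    print(f"{ger} {eng} {freq} x")
```

The recurring pair d : θ stands out against the single d : d in dumm / dumb, which is the pair marked with a question mark in the build-up of the slide.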

  26. Traditional Historical Linguistics Achievements
    Insights
    Internal Language History
    Thanks to historical linguistics, the history of a considerable (but still
    small) number of languages has been thoroughly investigated.
    External Language History
    Thanks to historical linguistics, a considerable number of the world's
    languages have been genetically classified (although many questions
    remain unsolved or controversial).
    General Language History
    Some work on general processes of language history has been done,
    yet many questions still remain unsolved or are controversially debated.
    11 / 50

  27. Traditional Historical Linguistics Problems
    Problems
    12 / 50

  28. Traditional Historical Linguistics Problems
    Transparency
    Part of the process of “becoming” a competent
    Indo-Europeanist has always been recognized as coming to
    grasp “intuitively” concepts and types of changes in language
    so as to be able to pick and choose between alternative
    explanations for the history and development of specific
    features of the reconstructed language and its offspring.
    Schwink (1994)
    13 / 50

  29. Traditional Historical Linguistics Problems
    Applicability
    14 / 50

  30. Traditional Historical Linguistics Problems
    Applicability
    – 7,106 languages (Lewis & Fennig 2013)
    – 147 language families (ibid.)
    – 25,244,065 language pairs which could be compared
    14 / 50
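
The third figure is the number of unordered pairs that can be formed from the 7,106 languages. A short check in Python, using only the language count quoted on the slide (the variable names are illustrative):

```python
from math import comb

# Number of languages quoted on the slide (Lewis & Fennig 2013).
N_LANGUAGES = 7106

# Unordered pairs of distinct languages: n * (n - 1) / 2.
N_PAIRS = comb(N_LANGUAGES, 2)
print(f"{N_PAIRS:,}")
```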

  31. Traditional Historical Linguistics Problems
    Applicability
    The amount of digitally available data for the languages
    of the world is growing from day to day,
    while there are only a few historical linguists who
    are trained to carry out the comparison of these
    languages. It seems impossible to handle this
    task when relying only on the traditional,
    time-consuming manual procedures developed in
    traditional historical linguistics.
    14 / 50

  32. Traditional Historical Linguistics Problems
    Adequacy
    One time is never, two times is ever!
    (a mathematician friend on the treatment of probability in
    Indo-European linguistics)
    15 / 50

  33. Traditional Historical Linguistics Problems
    Summary
    Despite its achievements, traditional historical linguistics has some
    clear shortcomings, such as
    a lack of transparency in methodology,
    the “philological” form of knowledge representation, and
    the questionable validity of certain results.
    16 / 50

  34. Traditional Historical Linguistics Problems
    Example on “Philological Knowledge Representation”
    Frucht. Sf std. (9. Jh.), mhd. vruht, ahd. fruht, as. fruht.
    Entlehnt aus l. frūctus m. gleicher Bedeutung (zu l. fruī
    “genieße”). Das deutsche Wort ist Femininum geworden im
    Anschluß an die ti-Abstrakta wie Flucht² usw.
    Adjektive: fruchtig, fruchtbar; Verb: (be-)fruchten. Ebenso nndl.
    vrucht, ne. fruit, nfrz. fruit, nschw. frukt, nnorw. frukt; frugal.
    (Kluge and Seebold 2002)
    17 / 50

  35. Quantitative Historical Linguistics
    18 / 50

  36. Quantitative Historical Linguistics Characteristics
    Characteristics
    $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$
    19 / 50

  37. Quantitative Historical Linguistics Characteristics
    Characteristics
    “Indo-European and computational cladistics” (Ringe, Warnow and Taylor
    2002)
    “Language-tree divergence times support the Anatolian theory of
    Indo-European origin” (Gray and Atkinson 2003)
    “Language classification by numbers” (McMahon and McMahon 2005)
    “Curious Parallels and Curious Connections: Phylogenetic Thinking in
    Biology and Historical Linguistics” (Atkinson and Gray 2005)
    “Automated classification of the world’s languages” (Brown et al. 2008)
    “Computational Feature-Sensitive Reconstruction of Language
    Relationships: Developing the ALINE Distance for Comparative Historical
    Linguistic Reconstruction” (Downey et al. 2008)
    “Networks uncover hidden lexical borrowing in Indo-European language
    evolution” (Nelson-Sathi et al. 2011)
    “A pipeline for computational historical linguistics” (Steiner, Stadler, and
    Cysouw 2011)
    20 / 50

  38. Quantitative Historical Linguistics Characteristics
    Points of Interest and Goals
    phylogenetic reconstruction
    sequence comparison
    general questions of language development
    21 / 50

  39. Quantitative Historical Linguistics Characteristics
    Points of Interest and Goals
    phylogenetic reconstruction
    sequence comparison
    general questions of language development
    Goals
    If we cannot guarantee getting the same results from the same data
    considered by different linguists, we jeopardize the essential scientific
    criterion of repeatability. (McMahon & McMahon 2005)
    21 / 50

  40. Quantitative Historical Linguistics Characteristics
    Methods and Theories
    phylogenetic reconstruction (cf., among others, Gray & Atkinson
    2003, Ringe et al. 2002, Brown et al. 2008)
    phonetic alignment (cf., among others, Kondrak 2000, Prokić et al.
    2009, List 2012a)
    cognate detection (cf. Steiner et al. 2011, List 2012b)
    borrowing detection (cf. Nelson-Sathi et al. 2011, List et al. 2014a)
    22 / 50

  41. Quantitative Historical Linguistics Achievements
    Achievements
    23 / 50

  42. Quantitative Historical Linguistics Achievements
    New Perspectives
    external language history receives more attention than before
    “Indo-Euro-Centrism” is replaced by a more cross-linguistic
    paradigm
    new questions regarding general language history
    new proposals to model language history
    24 / 50

  43. Quantitative Historical Linguistics Achievements
    New Approaches
    empirical data becomes the center of interest
    probabilistic approaches replace “historical” approaches
    databases replace philological knowledge representation
    “informal” methods are formalized and automatized
    25 / 50

  44. Quantitative Historical Linguistics Achievements
    Examples
    26 / 50

  45. Quantitative Historical Linguistics Achievements
    Examples: Phonetic Alignment
    Alignment Analyses
    Alignment analyses display sequence similarities by representing
    multiple sequences as rows of a matrix in which common segments are
    placed in the same column. Alignments are a formal way to deal with
    general tasks of sequence comparison. Although never explicitly
    labeled or displayed, alignments are virtually present in all analyses in
    historical linguistics dealing with the comparison of sound sequences
    (words, morphemes).
    26 / 50

  46. Quantitative Historical Linguistics Achievements
    Examples: Phonetic Alignment
    Alignment Analyses
    Alignment analyses display sequence similarities by representing
    multiple sequences as rows of a matrix in which common segments are
    placed in the same column. Alignments are a formal way to deal with
    general tasks of sequence comparison. Although never explicitly
    labeled or displayed, alignments are virtually present in all analyses in
    historical linguistics dealing with the comparison of sound sequences
    (words, morphemes).
    t ɔ x t ə r
    d ɔː t ə r
    26 / 50

  48. Quantitative Historical Linguistics Achievements
    Examples: Phonetic Alignment
    Alignment Analyses
    Alignment analyses display sequence similarities by representing
    multiple sequences as rows of a matrix in which common segments are
    placed in the same column. Alignments are a formal way to deal with
    general tasks of sequence comparison. Although never explicitly
    labeled or displayed, alignments are virtually present in all analyses in
    historical linguistics dealing with the comparison of sound sequences
    (words, morphemes).
    t ɔ x t ə r
    d ɔː - t ə r
    26 / 50

  49. Quantitative Historical Linguistics Achievements
    Examples: Phonetic Alignment
    Sound Classes
    Sounds which frequently occur in
    correspondence relations in
    genetically related languages can
    be clustered into classes. It is
    thereby assumed that “phonetic
    correspondences inside a ‘type’ are
    more regular than those between
    different ‘types’” (Dolgopolsky
    1986[1964]: 35).
    27 / 50

  50. Quantitative Historical Linguistics Achievements
    Examples: Phonetic Alignment
    Sound Classes
    Sounds which frequently occur in
    correspondence relations in
    genetically related languages can
    be clustered into classes. It is
    thereby assumed that “phonetic
    correspondences inside a ‘type’ are
    more regular than those between
    different ‘types’” (Dolgopolsky
    1986[1964]: 35).
    k g p b
    ʧ ʤ f v
    t d ʃ ʒ
    θ ð s z
    27 / 50

  53. Quantitative Historical Linguistics Achievements
    Examples: Phonetic Alignment
    Sound Classes
    Sounds which frequently occur in
    correspondence relations in
    genetically related languages can
    be clustered into classes. It is
    thereby assumed that “phonetic
    correspondences inside a ‘type’ are
    more regular than those between
    different ‘types’” (Dolgopolsky
    1986[1964]: 35).
    K
    T
    P
    S
    27 / 50

  54. Quantitative Historical Linguistics Achievements
    Examples: Phonetic Alignment
    Sound-Class-Based Phonetic Alignment (SCA, List 2012a)
    Sound classes and alignment analyses can be combined. Sound
    sequences are internally represented as sound classes. Alignments are
    carried out using standard algorithms developed in evolutionary biology.
    28 / 50

  55. Quantitative Historical Linguistics Achievements
    Examples: Phonetic Alignment
    Sound-Class-Based Phonetic Alignment (SCA, List 2012a)
    Sound classes and alignment analyses can be combined. Sound
    sequences are internally represented as sound classes. Alignments are
    carried out using standard algorithms developed in evolutionary biology.
    INPUT
    tɔxtər
    dɔːtər
    TOKENIZATION
    t, ɔ, x, t, ə, r
    d, ɔː, t, ə, r
    CONVERSION
    t ɔ x … → T O G …
    d ɔː t … → T O T …
    ALIGNMENT
    T O G T E R
    T O - T E R
    CONVERSION
    T O G … → t ɔ x …
    T O - … → d ɔː - …
    OUTPUT
    t ɔ x t ə r
    d ɔː - t ə r
    28 / 50
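
The workflow on this slide (tokenized input, conversion to sound classes, alignment, back-conversion) can be imitated with a minimal Python sketch. It is not the SCA implementation from LingPy: the sound-class table covers only the segments shown on the slide, the alignment is a plain Needleman-Wunsch with hypothetical scores (match 1, mismatch -1, gap -1), and the names `SOUND_CLASSES`, `align`, and `sca_sketch` are invented for illustration.

```python
# Assumption: a tiny sound-class table limited to the slide's example.
SOUND_CLASSES = {
    "t": "T", "d": "T", "ɔ": "O", "ɔː": "O",
    "x": "G", "ə": "E", "r": "R",
}


def to_classes(tokens):
    """CONVERSION: replace each sound by its sound class."""
    return [SOUND_CLASSES[token] for token in tokens]


def align(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """ALIGNMENT: Needleman-Wunsch, returning (i, j) index pairs.

    None in a pair marks a gap in the respective sequence.
    """
    n, m = len(seq_a), len(seq_b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # (mis)match
                score[i - 1][j] + gap,      # gap in seq_b
                score[i][j - 1] + gap,      # gap in seq_a
            )
    # Trace back from the bottom-right cell to recover the alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                pairs.append((i - 1, j - 1))
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return pairs


def sca_sketch(tokens_a, tokens_b):
    """Align on sound classes, then convert back to the original sounds."""
    pairs = align(to_classes(tokens_a), to_classes(tokens_b))
    row_a = [tokens_a[i] if i is not None else "-" for i, _ in pairs]
    row_b = [tokens_b[j] if j is not None else "-" for _, j in pairs]
    return row_a, row_b


ROW_A, ROW_B = sca_sketch(
    ["t", "ɔ", "x", "t", "ə", "r"],
    ["d", "ɔː", "t", "ə", "r"],
)
print(" ".join(ROW_A))
print(" ".join(ROW_B))
```

Aligning the raw segments with the same scores would leave several equally good placements for the gap, since t : d and ɔ : ɔː count as mismatches. On the class level they count as matches, so the gap can only fall opposite x.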

  56. Quantitative Historical Linguistics Achievements
    Examples: Phonetic Alignment
    SCA reaches an accuracy of more than 90 % for multiple alignment
    analyses, using the conservative column scores as evaluation
    scores.
    SCA can be applied to almost all languages, including tone
    languages (clicks are not yet supported), provided the data is given
    in regular phonetic transcription.
    SCA models prosodic properties of sound sequences and scores
    sound segments differently, depending on their position in the
    sequence, thereby accounting for general theories of prosodic
    strength.
    SCA is integrated in LingPy (http://lingpy.org, List & Moran
    2013), an open source Python toolkit for quantitative tasks in
    historical linguistics, and has been successfully tested on all major
    platforms (Mac, Linux, Windows).
    29 / 50

  57. Quantitative Historical Linguistics Achievements
    Examples: Automatic Cognate Detection
    LexStat (List 2012, List 2014)
    LexStat is a method for automatic cognate detection in multilingual
    wordlists. It relies on sound-class-based sequence alignment (SCA)
    analyses as a proxy to infer language-specific sound similarities (similar
    to the notion of sound correspondences in historical linguistics). Using
    the automatically inferred sound similarities, LexStat partitions words
    into cognate sets.
    30 / 50

  58. Quantitative Historical Linguistics Achievements
    Examples: Automatic Cognate Detection
    Basic Procedure for Multilingual Cognate Detection
    WORDLIST DATA
    31 / 50

  59. Quantitative Historical Linguistics Achievements
    Examples: Automatic Cognate Detection
    Basic Procedure for Multilingual Cognate Detection
    WORDLIST DATA → (PAIRWISE COMPARISON) → PAIRWISE DISTANCES BETWEEN WORDS
    31 / 50

  60. Quantitative Historical Linguistics Achievements
    Examples: Automatic Cognate Detection
    Basic Procedure for Multilingual Cognate Detection
    WORDLIST DATA → (PAIRWISE COMPARISON) → PAIRWISE DISTANCES BETWEEN WORDS
    → (COGNATE CLUSTERING) → COGNATE SETS
    31 / 50

  61. Quantitative Historical Linguistics Achievements
    Examples: Automatic Cognate Detection
    Cognate Clustering
    Analysis
    ID Taxa Word Gloss GlossID IPA
    ... ... ... ... ... ...
    21 German Frau woman 20 frau
    22 Dutch vrouw woman 20 vrɑu
    23 English woman woman 20 wʊmən
    24 Danish kvinde woman 20 kvenə
    25 Swedish kvinna woman 20 kviːna
    26 Norwegian kvine woman 20 kʋinə
    ... ... ... ... ... ...
    31 / 50

  62. Quantitative Historical Linguistics Achievements
    Examples: Automatic Cognate Detection
    Cognate Clustering
                      Swedish  English  Danish  Norwegian  Dutch  German
                      kvinna   woman    kvinde  kvine      vrouw  Frau
    Swedish    kvina  0.00     0.69     0.07    0.12       0.71   0.78
    English    wumin  0.69     0.00     0.66    0.57       0.68   0.87
    Danish     kveni  0.07     0.66     0.00    0.08       0.67   0.71
    Norwegian  kwini  0.12     0.57     0.08    0.00       0.75   0.74
    Dutch      frou   0.71     0.68     0.67    0.75       0.00   0.17
    German     frau   0.78     0.87     0.71    0.74       0.17   0.00
    Analysis
    ID Taxa Word Gloss GlossID IPA
    ... ... ... ... ... ...
    21 German Frau woman 20 frau
    22 Dutch vrouw woman 20 vrɑu
    23 English woman woman 20 wʊmən
    24 Danish kvinde woman 20 kvenə
    25 Swedish kvinna woman 20 kviːna
    26 Norwegian kvine woman 20 kʋinə
    ... ... ... ... ... ...
    31 / 50

  63. Quantitative Historical Linguistics Achievements
    Examples: Automatic Cognate Detection
    Cognate Clustering
                      Swedish  English  Danish  Norwegian  Dutch  German
                      kvinna   woman    kvinde  kvine      vrouw  Frau
    Swedish    kvina  0.00     0.69     0.07    0.12       0.71   0.78
    English    wumin  0.69     0.00     0.66    0.57       0.68   0.87
    Danish     kveni  0.07     0.66     0.00    0.08       0.67   0.71
    Norwegian  kwini  0.12     0.57     0.08    0.00       0.75   0.74
    Dutch      frou   0.71     0.68     0.67    0.75       0.00   0.17
    German     frau   0.78     0.87     0.71    0.74       0.17   0.00
    German Frau frau
    Dutch vrouw vrou
    English woman wumin
    Danish kvinde kveni
    Swedish kvinna kvina
    Norwegian kvine kwini
    31 / 50

  65. Quantitative Historical Linguistics Achievements
    Examples: Automatic Cognate Detection
    Cognate Clustering
    German Frau frau
    Dutch vrouw vrou
    English woman wumin
    Danish kvinde kveni
    Swedish kvinna kvina
    Norwegian kvine kwini
    Analysis
    ID Taxa Word Gloss GlossID IPA CogID
    ... ... ... ... ... ... ...
    21 German Frau woman 20 frau 1
    22 Dutch vrouw woman 20 vrɑu 1
    23 English woman woman 20 wʊmən 2
    24 Danish kvinde woman 20 kvenə 3
    25 Swedish kvinna woman 20 kviːna 3
    26 Norwegian kvine woman 20 kʋinə 3
    ... ... ... ... ... ... ...
    31 / 50
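
The clustering step can be illustrated with the pairwise distances shown on the preceding slides. The sketch below is an assumption-laden stand-in, not the LexStat code: it uses average-linkage agglomerative clustering that stops at a threshold, and both the threshold of 0.5 and the function name `flat_clusters` are hypothetical choices made for this example.

```python
TAXA = ["Swedish", "English", "Danish", "Norwegian", "Dutch", "German"]

# Pairwise distances between the words for "woman", copied from the slides.
DISTANCES = [
    [0.00, 0.69, 0.07, 0.12, 0.71, 0.78],
    [0.69, 0.00, 0.66, 0.57, 0.68, 0.87],
    [0.07, 0.66, 0.00, 0.08, 0.67, 0.71],
    [0.12, 0.57, 0.08, 0.00, 0.75, 0.74],
    [0.71, 0.68, 0.67, 0.75, 0.00, 0.17],
    [0.78, 0.87, 0.71, 0.74, 0.17, 0.00],
]


def flat_clusters(matrix, threshold):
    """Merge the closest clusters until none is closer than `threshold`."""
    clusters = [[i] for i in range(len(matrix))]

    def linkage(a, b):
        # Average distance between all members of two clusters.
        return sum(matrix[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        dist, x, y = min(
            (linkage(a, b), x, y)
            for x, a in enumerate(clusters)
            for y, b in enumerate(clusters)
            if x < y
        )
        if dist > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Assumption: 0.5 is an illustrative threshold, not a value from the talk.
CLUSTERS = flat_clusters(DISTANCES, threshold=0.5)
COGNATE_SETS = sorted(sorted(TAXA[i] for i in c) for c in CLUSTERS)
for cog_id, members in enumerate(COGNATE_SETS, start=1):
    print(cog_id, ", ".join(members))
```

With these values the words fall into the same three groups as the CogID column on the slide, although the numbering of the groups differs.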

  66. Quantitative Historical Linguistics Achievements
    Examples: Automatic Cognate Detection
    LexStat Algorithm (List 2014)
    [Figure: flowchart. Labels: INPUT, TOKENIZATION, PREPROCESSING,
    CORRESPONDENCE DETECTION USING PHONETIC ALIGNMENT (LOOP),
    ATTESTED DISTRIBUTION, EXPECTED DISTRIBUTION, LOG-ODDS,
    DISTANCE CALCULATION, COGNATE CLUSTERING, OUTPUT]
    31 / 50

  67. Quantitative Historical Linguistics Achievements
    Examples: Automatic Cognate Detection
    B-Cubed F-Scores on BDCD Benchmark (List 2014)
    [Figure: chart of B-Cubed F-scores (axis from 60 to 95) for four methods
    (Turchin, NED, SCA, LexStat) on six datasets: Bai (Tibeto-Burman),
    Indo-European, Japanese and Ryukyu, Ob-Ugrian, Austronesian,
    Sinitic (Chinese Dialects)]
    32 / 50

  68. Quantitative Historical Linguistics Achievements
    Examples: Automatic Cognate Detection
    B-Cubed F-Scores on BDCD Benchmark (List 2014)
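    [Figure: chart of B-Cubed F-scores (axis from 60 to 95) for four methods
    (Turchin, NED, SCA, LexStat) on six datasets: Bai (Tibeto-Burman),
    Indo-European, Japanese and Ryukyu, Ob-Ugrian, Austronesian,
    Sinitic (Chinese Dialects). Highlighted values: 75%, 93%, 92%, 81%,
    89%, 81%]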
    32 / 50

  69. Quantitative Historical Linguistics Achievements
    Examples: Automatic Cognate Detection
    B-Cubed F-Scores on BDCD Benchmark (List 2014)
    [Figure: chart of B-Cubed F-scores (axis from 60 to 95) for four methods
    (Turchin, NED, SCA, LexStat) on six datasets: Bai (Tibeto-Burman),
    Indo-European, Japanese and Ryukyu, Ob-Ugrian, Austronesian,
    Sinitic (Chinese Dialects). Highlighted values: 75%, 93%]
    32 / 50
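
For readers unfamiliar with the evaluation measure, B-Cubed scores compare a predicted partition of words with a gold standard item by item. The following Python sketch computes them for a toy case. The gold and predicted cognate IDs are invented for illustration (they are not taken from the benchmark), and the function name `b_cubed` is hypothetical.

```python
def b_cubed(gold, predicted):
    """Return B-Cubed precision, recall, and F-score for two partitions.

    `gold` and `predicted` are equally long lists of cluster labels,
    one label per word.
    """
    n = len(gold)
    precision = recall = 0.0
    for i in range(n):
        same_pred = [j for j in range(n) if predicted[j] == predicted[i]]
        same_gold = [j for j in range(n) if gold[j] == gold[i]]
        both = [j for j in same_pred if gold[j] == gold[i]]
        precision += len(both) / len(same_pred)
        recall += len(both) / len(same_gold)
    precision /= n
    recall /= n
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy data (assumption): German, Dutch, English, Danish, Swedish, Norwegian.
GOLD = [1, 1, 2, 3, 3, 3]
# Hypothetical prediction that wrongly lumps English with Scandinavian.
PREDICTED = [1, 1, 3, 3, 3, 3]

PRECISION, RECALL, F_SCORE = b_cubed(GOLD, PREDICTED)
print(f"precision={PRECISION:.2f} recall={RECALL:.2f} f-score={F_SCORE:.2f}")
```

Lumping lowers precision (0.75 here) while recall stays at 1.0. Splitting a true cognate set would have the opposite effect.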

  70. Quantitative Historical Linguistics Achievements
    Examples: Phylogenetic Networks
    Hugo Schuchardt
    (1842-1927)
    33 / 50

  71. Quantitative Historical Linguistics Achievements
    Examples: Phylogenetic Networks
    Hugo Schuchardt
    (1842-1927)
    “We connect the branches and twigs
    of the tree with countless horizontal
    lines and it ceases to be a tree.”
    (Schuchardt 1870 [1900]: 11)
    33 / 50

  74. Quantitative Historical Linguistics Achievements
    Examples: Phylogenetic Networks
    Biological Workflow (Dagan & Martin 2007, Dagan et al. 2008)
    1 collect phyletic pattern data (shared gene families) of the taxa that shall
    be investigated
    2 use gain-loss mapping techniques with different weighting models,
    allowing for different amounts of gain events to analyze how the gene
    families evolved along a given reference tree
    3 use ancestral genome sizes as an external criterion to determine the
    best weighting model
    4 assume that all patterns for which the best model yields more than one
    gain event result from lateral gene transfer
    5 reconstruct a minimal lateral network (MLN) by connecting multiple
    gains for the same gene family by lateral edges
    34 / 50

  75. Quantitative Historical Linguistics Achievements
    Examples: Phylogenetic Networks
    Linguistic Workflow (Nelson-Sathi et al. 2011, List et al. 2014a)
    1 collect phyletic pattern data (shared cognates) of the languages that shall
    be investigated
    2 use gain-loss mapping techniques with different weighting models,
    allowing for different amounts of gain events, to analyze how the
    cognates evolved along a given reference tree
    3 use ancestral vocabulary size distributions as an external criterion to
    determine the best weighting model
    4 allow for a substantial amount (5%) of parallel evolution
    5 assume that all patterns for which the best model yields more than one
    gain event result from lateral transfer (borrowing)
    6 reconstruct a minimal lateral network by connecting multiple gains of
    the same cognate by lateral edges
    35 / 50
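
The last step of the workflow, connecting multiple gains of the same cognate by lateral edges, can be sketched as follows. Everything in the example is illustrative: the gain-loss mapping result in `INFERRED_GAINS` is invented, the function name `minimal_lateral_network` is hypothetical, and connecting all pairs of gain nodes when a cognate set has more than two gains is an assumption of this sketch rather than a detail given in the talk.

```python
from collections import Counter
from itertools import combinations

# Hypothetical output of the gain-loss mapping: for each cognate set,
# the nodes of the reference tree at which a gain event was inferred.
INFERRED_GAINS = {
    "cognate_1": ["Germanic"],            # single gain: no lateral edge
    "cognate_2": ["English", "French"],   # two gains: one lateral edge
    "cognate_3": ["English", "Romance"],
    "cognate_4": ["English", "French"],
}


def minimal_lateral_network(gains):
    """Connect the gain nodes of each cognate set by weighted edges.

    The weight of an edge is the number of cognate sets for which both
    of its nodes were inferred as gain events.
    """
    edges = Counter()
    for nodes in gains.values():
        for a, b in combinations(sorted(set(nodes)), 2):
            edges[(a, b)] += 1
    return edges


LATERAL_EDGES = minimal_lateral_network(INFERRED_GAINS)
for (a, b), weight in LATERAL_EDGES.most_common():
    print(f"{a} -- {b}: {weight}")
```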

  76. Quantitative Historical Linguistics Achievements
    Examples: Phylogenetic Networks
    36 / 50

  77. Quantitative Historical Linguistics Achievements
    Examples: Phylogenetic Networks
    [Figure: example network with the languages Spanish, French, Italian,
    Danish, English, and German]
    36 / 50

  81. Quantitative Historical Linguistics Achievements
    Examples: Phylogenetic Networks
    List et al. (2014b)
    [Figure: network of 40 Chinese dialect varieties: Lánzhōu, Fùzhōu,
    Xiāngtàn, Měixiàn, Hongkong, Wǔhàn, Běijīng, Kùnmíng, Hángzhōu,
    Xiàmén, Chéngdū, Sùzhōu, Shànghǎi, Táiběi, Zhèngzhōu, Shèxiàn,
    Nánjīng, Guìyáng, Wénzhōu, Nánníng, Tūnxī, Tiānjìn, Shāntóu, Xīníng,
    Qīngdǎo, Ürümqi, Píngyáo, Nánchàng, Tàiyuán, Chángshā, Hǎikǒu, Héfèi,
    Jiàn'ǒu, Yīnchuàn, Hohhot, Táoyuán, Xī'ān, Guǎngzhōu, Harbin, Jìnán.
    Legend: Inferred Links.]
    36 / 50

  83. Quantitative Historical Linguistics Achievements
    Examples: Phylogenetic Networks
    List et al. (2014b)
    [Figure: network of 40 Chinese dialect varieties (Lánzhōu, Fùzhōu,
    Xiāngtàn, …, Jìnán). Legend: Inferred Links, scale 1, 10, 20.]
    36 / 50

  84. Quantitative Historical Linguistics Achievements
    Examples: Phylogenetic Networks
    List et al. (2014b)
    [Figure: network of 40 Chinese dialect varieties (Lánzhōu, Fùzhōu,
    Xiāngtàn, …, Jìnán). Legend: Inferred Links, scale 1, 4, 8.]
    36 / 50

  85. Quantitative Historical Linguistics Problems
    Problems
    37 / 50

  93. Quantitative Historical Linguistics Problems
    Transparency
    Evaluation criteria for applied automatic methods are not
    very intuitive and vary greatly.
    Benchmark databases are rarely used; especially in
    phylogenetic approaches, eyeballing of phylogenetic trees is
    sold as proof of “valid approaches”.
    It is difficult to communicate the results to traditional linguists.
    → Many linguists regard automatic approaches as
    – not trustworthy and error-prone, or
    – “impossible per se”, or
    – as useful as “rolling a dice”.
    38 / 50

  94. Quantitative Historical Linguistics Problems
    Applicability
    39 / 50

  95. Quantitative Historical Linguistics Problems
    Applicability
    Method                          Multilingual?  No additional   Freely
                                                   requirements?   available?
    Mackay & Kondrak 2005           ✗              ✓               ✗
    Bergsma & Kondrak 2007          ✓              ✓               ✗
    Turchin et al. 2010             ✓              ✓               ✓
    Berg-Kirkpatrick & Klein 2011   ✗              ✓               ✗
    Hauer & Kondrak 2011            ✓              ✓               ✗
    Steiner et al. 2011             ✓              ✓               ✗
    List 2012 & List 2014           ✓              ✓               ✓
    Beinborn et al. 2013            ✗              ?               ✗
    Bouchard-Côté et al. 2013       ✓              ✗               ✗
    Rama 2013                       ✗              ✓               ✗
    Ciobanu & Dinu 2014             ✗              ✓               ✗
    …                               …              …               …
    39 / 50

  100. Quantitative Historical Linguistics Problems
    Accuracy
    Data Problems (Geisler & List forthcoming)
    Comparing two independently produced lexicostatistical datasets:
    Database           # Languages   # Concepts
    Dyen et al. 1997   95            200
    Tower of Babel     98            110
    Intersection       46            103
    Results
    up to 10 % difference in concept translations
    many undetected borrowings in both datasets
    up to 30 % difference in tree topologies for Bayesian analyses
    40 / 50

  101. Quantitative Historical Linguistics Problems
    Accuracy
    40 / 50

  102. Quantitative Historical Linguistics Problems
    Summary
    Many quantitative methods which are based on manually compiled
    datasets cannot cope with errors resulting from inconsistent data
    compilation. They are only as objective as the data being fed to
    them!
    Many quantitative approaches are insufficiently tested, and
    scholars are often content with results traditional linguists would
    never accept.
    Additionally, quantitative approaches are often presented in a way
    that makes it hard (not only for traditional linguists) to understand
    what they are based upon. Results are reported in a non-transparent
    way, supplementary data is often lacking, concrete examples are
    seldom provided, and source code (essential to check and replicate
    analyses) is missing in almost all recent publications.
    41 / 50

  103. Computer-Assisted
    Language Comparison
    42 / 50

  104. Computer-Assisted Language Comparison Bridging the Gap
    Bridging the Gap
    So far, the majority of computational approaches in historical
    linguistics largely disregards the actual needs of historical
    linguistics. Despite the frequent claims that the algorithms are
    intended to supplement traditional research, many of them are
    mere attempts to prove the power of modern machine learning
    approaches and completely disregard the achievements of
    traditional research in historical linguistics.
    43 / 50

  105. Computer-Assisted Language Comparison Bridging the Gap
    Bridging the Gap
    If we really want to make a difference with computational
    approaches and not simply seek to replace every expert who
    likes books with a computer or abacus, we need to work
    much, much harder on a real integration of computational
    and traditional approaches.
    43 / 50

  108. Computer-Assisted Language Comparison Bridging the Gap
    Bridging the Gap
    [Figure: Bayes' theorem, $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$,
    alongside a graphic labelled “FRANZ BOPP: VERY, VERY LONG TITLE”]
    Apart from “computational historical linguistics”, we need to
    establish a new discipline of “computer-aided historical
    linguistics”.
    Such a framework needs benchmarks and new standards to cope
    with general problems of quantitative approaches.
    However, such a framework will also need additional resources
    that help traditional approaches to leave the “realm of intuition”.
    43 / 50

  109. Computer-Assisted Language Comparison Examples
    Examples
    44 / 50

  110. Computer-Assisted Language Comparison Examples
    Benchmark Databases for Historical Linguistics
    45 / 50

  111. Computer-Assisted Language Comparison Examples
    Benchmark Databases for Historical Linguistics
    First benchmark databases have been compiled and published:
    Benchmark Database of Phonetic Alignments (BDPA, List & Prokić
    2014, http://alignments.lingpy.org)
    Benchmark Database for Cognate Detection (BDCD, presented in
    List 2014, http://sequencecomparison.github.io).
    Benchmark Database for Linguistic Reconstruction (BDLR, in
    preparation).
    45 / 50

  112. Computer-Assisted Language Comparison Examples
    Benchmark Databases for Historical Linguistics
    All data is
    given in phonetic transcriptions (IPA),
    tokenized into phonemic units,
    freely available for download, and
    can be directly used in LingPy.
    45 / 50
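
    How a benchmark with gold-standard cognate sets might be used to score
    an automatic method can be sketched as follows. The slides do not name
    an evaluation measure; B-cubed precision and recall are used here as
    one common choice for comparing clusterings, and the function name
    b_cubed as well as the toy "tooth" data are assumptions of this
    sketch. It does not use LingPy.

```python
def b_cubed(gold, test):
    """B-cubed precision, recall, and F-score for two clusterings.

    gold, test: dicts mapping each word to a cognate-set label.
    Both dicts must cover the same words.
    """
    def clusters(labels):
        sets = {}
        for word, label in labels.items():
            sets.setdefault(label, set()).add(word)
        return sets

    gold_sets, test_sets = clusters(gold), clusters(test)
    precision = recall = 0.0
    for word in gold:
        gold_cluster = gold_sets[gold[word]]
        test_cluster = test_sets[test[word]]
        overlap = len(gold_cluster & test_cluster)
        precision += overlap / len(test_cluster)
        recall += overlap / len(gold_cluster)
    precision /= len(gold)
    recall /= len(gold)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy data: the four words for "tooth" are all cognate in the gold
# standard, while a hypothetical method splits Germanic from Romance.
gold = {"German": 1, "English": 1, "Italian": 1, "French": 1}
test = {"German": 1, "English": 1, "Italian": 2, "French": 2}
print(b_cubed(gold, test))
```

    In this toy case the method makes no false cognate judgments
    (precision 1.0) but misses half of the true ones (recall 0.5).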

  113. Computer-Assisted Language Comparison Examples
    Visualizations and Interactive Applications
    Often, automatic approaches hide essential aspects of their
    analyses. These aspects are relevant not only for testing the power
    of methods, but also for getting the best out of the results.
    Aggregation of results is useful for publications, but we know that
    “every word has its own history”, and traditional research has
    always been concerned with this.
    Visualizing and reporting all detailed decisions and findings of
    automatic methods will not only increase their transparency, it may
    also help to convince traditional scholars that computational
    approaches may provide valuable insights.
    Apart from static visualizations, JavaScript and HTML5 offer unique
    ways for interactive data visualization and make it easy to produce,
    share, and explore what automatic methods have produced.
    So far, we have developed JavaScript prototype tools that
    – visualize phonetic alignments of cognate sets (JavaScript
    Cognate Viewer, JCOV,
    http://github.com/dighl/jcov/),
    – allow users to edit and refine alignments and cognate sets
    online (Etymological Dictionary Editor, EDICTOR,
    http://tsv.lingpy.org), and
    – visualize phylogenetic trees in geographic space
    (together with T. Mayer, Tree Explorer, TREX,
    http://github.com/dighl/TREX).
    46 / 50

  114. Computer-Assisted Language Comparison Challenges
    Challenges
    47 / 50

  117. Computer-Assisted Language Comparison Challenges
    Challenges
               "MOON"
    German     m  oː n  t  -
    English    m  uː n  -  -
    Danish     m  ɔː n  -  ə
    Swedish    m  oː n  -  e
               "MOON"          "SHINE"         "LIGHT"
    Fúzhōu     ŋ  u  o  ʔ  ⁵   -  -  -  -  -   -  -  -  -  -
    Měixiàn    ŋ  i  a  t  ⁵   -  -  -  -  -   k  u  o  ŋ  ⁴⁴
    Guǎngzhōu  j  -  y  t  ²   l  -  œ  ŋ  ²²  -  -  -  -  -
    Běijīng    -  y  ɛ  -  ⁵¹  l  i  ɑ  ŋ  -   -  -  -  -  -
    47 / 50
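
    The alignments above are matrices: every row has the same number of
    columns, and "-" marks a gap. A minimal Python sketch shows how such a
    block could be read and checked. The helper names parse_alignment,
    ungap, and columns are hypothetical, and the whitespace-separated
    layout is an assumption of this sketch, not the format of any
    particular tool.

```python
GERMANIC = """\
German  m oː n t -
English m uː n - -
Danish  m ɔː n - ə
Swedish m oː n - e
"""


def parse_alignment(text):
    """Read 'taxon segment segment ...' lines into {taxon: [segments]}."""
    rows = {}
    for line in text.strip().splitlines():
        taxon, *segments = line.split()
        rows[taxon] = segments
    # All rows of an alignment must have the same number of columns.
    if len({len(segments) for segments in rows.values()}) != 1:
        raise ValueError("rows of an alignment must have equal length")
    return rows


def ungap(segments):
    """Recover the unaligned sound sequence by dropping gap symbols."""
    return [segment for segment in segments if segment != "-"]


def columns(rows):
    """Return the alignment column by column (one tuple per column)."""
    return list(zip(*rows.values()))


rows = parse_alignment(GERMANIC)
print(ungap(rows["English"]))
print(columns(rows)[0])
```

    The Chinese rows would need one further step that this sketch does
    not take: splitting each word into its morphemes, because only some
    of them are cognate across the four varieties.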

  119. Computer-Assisted Language Comparison Challenges
    Challenges
    [Figure: diagram of the four varieties Fúzhōu, Měixiàn, Guǎngzhōu, and
    Běijīng, with events labelled INNOVATION (five times), BORROWING, and
    LOSS]
    47 / 50

  120. Computer-Assisted Language Comparison Challenges
    Challenges
    [Figure: three axes labelled SEMANTIC CHANGE, MORPHOLOGICAL CHANGE,
    and STRATIC CHANGE]
    Three Dimensions of Lexical Change (Gévaudan 2007)
    47 / 50

  121. Computer-Assisted Language Comparison Challenges
    Challenges
    In order to cope with the multiple dimensions of lexical change, we
    need new methods and models in historical linguistics, which
    explicitly deal with borrowing, partial cognacy, and semantic change.
    Following the lead of evolutionary biology, these methods could be
    combined under a unified framework of tree reconciliation (Page &
    Cotton 2002) in historical linguistics.
    48 / 50

  122. Computer-Assisted Language Comparison Challenges
    Challenges
    [Figure: two diagrams side by side, each with the varieties Fúzhōu,
    Měixiàn, Guǎngzhōu, and Běijīng]
    48 / 50

  124. Computer-Assisted Language Comparison Challenges
    Challenges
    [Figure: diagram with the varieties Fúzhōu, Měixiàn, Guǎngzhōu, and
    Běijīng]
    48 / 50

  126. Computer-Assisted Language Comparison Challenges
    Challenges
    [Figure: diagram with events labelled LOSS, INNOVATION (twice), and
    BORROWING]
    48 / 50

  127. Conclusion
    Conclusion
    49 / 50

  133. Conclusion
    Conclusion
    Automatic approaches are constantly gaining ground in historical
    linguistics.
    Nevertheless, the majority of the new approaches show a great
    lack of transparency and applicability.
    One reason for this is the gap between traditional and
    computational approaches, which are mostly applied independently
    of each other.
    In order to increase the interaction between traditional and
    computational historical linguists, we need a paradigm shift in
    historical linguistics.
    Computational linguists need to increase the transparency of their
    results, focusing on their detailed and interactive presentation
    instead of hiding behind numbers.
    Traditional linguists need to increase the transparency of their
    methods, focusing on formalizing their intuitions instead of hiding
    behind their “expert insights”.
    49 / 50

  134. Thank You for Listening!
    50 / 50
