Studying language contact within a computer-assisted framework

Johann-Mattis List
June 01, 2019

Talk held at the 64th Annual Conference of the International Linguistic Association (2019-05-30/2019-06-01, Universidad Nacional de San Martín, Buenos Aires).

Transcript

  1. Studying Language Contact within a
    Computer-Assisted Framework
    Johann-Mattis List
    Research Group “Computer-Assisted Language Comparison”
    Department of Linguistic and Cultural Evolution
    Max-Planck Institute for the Science of Human History
    Jena, Germany
    2019-06-01

  2. Introduction
    Introduction
    Language contact and lexical borrowing

  4. Introduction Language Contact and Language History
    Language History
    August Schleicher
    (1821-1868)
    “These assumptions, which follow logically from the results of our research, can be best illustrated by the image of a branching tree.” (Schleicher 1853: 787)

  5. Introduction Language Contact and Language History
    Language History
    Schleicher (1853)

  6. Introduction Language Contact and Language History
    Language Contact
    Johannes Schmidt
    (1843-1901)
    “I want to replace [the tree] by the image of a wave that spreads out from the center in concentric circles becoming weaker and weaker the farther they get away from the center.” (Schmidt 1872: 27, my translation)

  7. Introduction Language Contact and Language History
    Language Contact
    Schmidt (1875)

  9. Introduction Language Contact and Language History
    Language History and Language Contact
    Hugo Schuchardt
    (1842-1927)
    “We connect the branches and twigs of the tree with countless horizontal lines and it ceases to be a tree.” (Schuchardt 1870 [1900]: 11)

  10. Introduction Language Contact and Language History
    Language History and Language Contact

  12. Introduction Studying Language Contact
    Similarities between Languages
    similarities
    - coincidental: Grk. theós, Lat. deus ‘god’
    - non-coincidental
      - natural: Chi. māma, Ger. Mama ‘mother’
      - non-natural
        - genealogical: Eng. tooth, Ger. Zahn ‘tooth’
        - non-genealogical: Eng. Marlboro, Chi. wànbǎolù (proper name)
    List (2014, Düsseldorf: Düsseldorf University Press), List (forthcoming)

  13. Introduction Studying Language Contact
    Detecting Language Contact
    Types of evidence, each with an example:
    - direct: Cantonese [tai³³-iœŋ²¹] (Mandarin tàiyáng)
    - phylogeny-related: English mountain vs. French montagne, Spanish montaña
    - trait-related: German Damm vs. English dam
    - distribution-based: German Job, Joker, Junkie, Journal
    List (forthcoming)

  14. Introduction Studying Language Contact
    Detecting Language Contact
    - convenient shortcuts: treat lookalikes between Chinese and Hmong-Mien as borrowings from Chinese, for historical reasons (Ratliff 2010)
    - assume all vocabulary from a specific semantic field to be borrowed (e.g., religion, seafaring)

  15. Introduction Computational Historical Linguistics
    Computational Historical Linguistics
    - starting in the early 21st century with phylogenetic approaches (Gray and Atkinson 2003, Ringe et al. 2002)
    - accompanied by pioneering work on sequence comparison (Kondrak 2000)
    - later followed by a growing number of approaches to other topics (phylogenetic networks, Nakhleh et al. 2005; automatic cognate detection, Hauer and Kondrak 2011)
    - now a fully established subfield of historical linguistics

  16. Introduction Computational Historical Linguistics
    Computational Approaches to Language Contact
    Proposed solutions:
    - identify conflicts in the phylogeny and explain them by invoking borrowings (MLN approach, Nelson-Sathi et al. 2011, List et al. 2014)
    - search for similar words among unrelated languages (Mennecier et al. 2016)
    - tree reconciliation methods (Willems et al. 2016)
    - borrowability statistics (Sergey Yakhontov, as reported by Starostin 1990, Chén 1996, McMahon et al. 2005)

  17. Introduction Computational Historical Linguistics
    Computational Approaches to Language Contact
    Performance of proposed solutions:
    - approaches based on conflicts in the phylogeny tend to overestimate the amount of borrowing, since conflicts in phylogenies have multiple causes, not only borrowing (Morrison 2011)
    - sequence comparison on unrelated languages seems solid, but one needs to be careful with chance resemblances based on onomatopoetic words (mama, papa, etc.; Jakobson 1960, Blasi et al. 2016)
    - tree reconciliation methods are unrealistic if word trees are derived from simple edit distances
    - sublist approaches may be useful, but they require large collections of known borrowings, which we usually lack
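
For readers unfamiliar with the "simple edit distances" mentioned above, the following is a minimal sketch of a normalized Levenshtein distance between two word forms. The function names are illustrative and not taken from the talk or from any particular library, and the input is plain orthography rather than phonetic transcription.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or lists of segments)."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if x == y else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def normalized_edit_distance(a, b):
    """Edit distance divided by the length of the longer sequence (0.0 to 1.0)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


# Toy input only: lower-cased orthographic forms, not phonetic transcriptions.
print(normalized_edit_distance("damm", "dam"))
```

A distance of this kind treats every sound difference as equally costly, which is one reason why word trees built from it may be a poor basis for tree reconciliation.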

  18. Computer-Assisted Language Comparison
    Computer-Assisted
    Language Comparison

  19. Computer-Assisted Language Comparison Background
    Historical Linguistics in the Digital Age
    - the amount of data in linguistics is steadily increasing
    - our qualitative methods are reaching their practical limits
    - we need to take computational methods into account
    - but computational methods are not very accurate and may yield wrong results

  20. Computer-Assisted Language Comparison Project
    Project

  31. Computer-Assisted Language Comparison CALC
    CALC
    ERC Starting Grant (2017-2022)
    Host: MPI-SHH (Jena)
    Current team: 2 post-docs, 2 doctoral students, and myself
    Objectives go beyond historical linguistics and Sino-Tibetan (but they are our starting point)
    http://calc.digling.org

  32. Studying Language Contact with CALC
    Studying Borrowing
    with CALC

  33. Studying Language Contact with CALC Computer-Assisted Problem Solving
    Computer-Assisted Problem Solving
    1. identify the core class of your problem (modeling, inference, analysis)
    2. formalize the problem in a way that allows one to test it (specify data and techniques for evaluation)
    3. do not hesitate to define sub-problems, given that qualitative solutions are often holistic
    4. look at existing qualitative solutions
    5. search for inspiration in neighboring disciplines (graph theory, computer science, evolutionary biology) by looking for similar processes that could be addressed in an analogous or similar way
    6. accept a qualitative or semi-automatic solution for inference processes, but make sure that the results are annotated in a machine-readable way
    7. insist on transparent output (no black boxes) to allow for an immediate review of results by experts

  34. Studying Language Contact with CALC Computer-Assisted Problem Solving
    Modeling, Inference, and Analysis
    [Figure with the labels Modeling, Inference, and Analysis]

  35. Studying Language Contact with CALC Computer-Assisted Problem Solving
    Identify Core Problems (1-3)
    - borrowing is a process that happened at different stages in time, reflected in the form of borrowing or contact layers
    - identifying the source and target of borrowings is almost impossible without knowing the history of a given area
    - distinguishing borrowing from inheritance, chance, and typological patterns of denotation is also difficult for classical linguistics
    - contact areas may overlap

  36. Studying Language Contact with CALC Computer-Assisted Problem Solving
    Look at Existing Solutions (4-6)
    - recent borrowings can be detected with automatic sequence comparison approaches (Mennecier et al. 2016)
    - searching for borrowings in unrelated languages spoken in similar regions can control for inheritance (Mennecier et al. 2016)
    - highly advanced techniques for cognate detection are available by now (List 2014, List et al. 2017)
    - methods for clustering and partitioning are well advanced, but need to be applied correctly

  37. Studying Language Contact with CALC Computer-Assisted Problem Solving
    Insist on Transparent Output (7)
    - bring the data up to a high standard of phonetic transcription
    - use interactive applications to share the findings transparently
    - test and train rigorously on datasets from different languages of the world
    - prefer direct output (concrete items identifying a contact area) for initial studies

  38. Example from SEA Languages
    Example from
    South-East Asian Languages
    Burmish_Achang
    Baheng, East
    Baheng_West
    Bana
    Biao Min
    Sui_Banliang
    Sinitic_Changsha
    Sinitic_Chaozhou
    Sinitic_Chengdu
    Chuanqiandian
    Chuanqiandian_Central_Guizhou
    Chuanqiandian_Northeast_Yunnan
    Chuanqiandian_Southern_Guizhou
    Dongnu
    Sinitic_Guangzhou
    Bai_Jianchuan
    Jiongnai
    Kim_Mun
    Sinitic_Kunming
    Bai_Luobenzhuo
    Luobuohe_Eastern
    Luobuohe_Western
    Sinitic_Meixian
    Mien
    Sinitic_Nanchang
    Numao
    Nunu
    Sui_Pandong
    Qiandong_East
    Qiandong_North
    Qiandong_South
    Qiandong_West
    Sui_Sandong
    She
    Sinitic_Xi_an
    Xiangxi_East
    Xiangxi_West
    Bai_Xiangyun
    Sinitic_Yangjiang
    Yi_Dafang
    Yi_Mile
    Yi_Mojiang
    Yi_Nanhua
    Yi_Nanjian
    Yi_Xide
    Younuo
    Zao_Min

  39. Example from SEA Languages Data Preparation
    Language Data
    - 48 SEA languages from three different families (Sino-Tibetan, Hmong-Mien, Tai-Kadai), aggregated from four different sources (Beijing University 1964, Sun et al. 1991, Chen 2012, Castro 2015)
    - unified phonetic transcriptions following the Cross-Linguistic Transcription System framework (Anderson et al. forthcoming, https://clts.clld.org)
    - unification of elicitation glosses with the help of Concepticon (List et al. 2016, https://concepticon.clld.org)
    - data curation following the principles of the Cross-Linguistic Data Formats initiative (Forkel et al. 2018, https://cldf.clld.org)
    - first inspection of the data with the help of EDICTOR (List 2017, http://edictor.digling.org)

  40. Example from SEA Languages Borrowing Inference
    Borrowing Inference
    A. within-family cognate detection using LexStat as implemented in LingPy (List 2014, List et al. 2017)
    B. cross-family borrowing detection using a new feature-based, prosody-aware approach to pronunciation distance calculation and a flat clustering approach
    C. interactive analysis of inferences
    D. partition cognate sets into groups indicative of a contact zone
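
As a rough illustration of step B, the sketch below links word forms whose pairwise distance falls below a threshold and reads off the connected components as flat clusters; clusters spanning more than one family are then flagged as borrowing candidates. This is one common way to do flat clustering (single linkage), assumed here for illustration; the talk does not specify which clustering method is used. All names, distances, and the threshold are hypothetical toy values.

```python
from itertools import combinations


def flat_clusters(items, distance, threshold):
    """Single-linkage flat clustering via union-find over pairs below the threshold."""
    parent = {item: item for item in items}

    def find(item):
        # Follow parent links to the cluster representative (with path halving).
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    for a, b in combinations(items, 2):
        if distance(a, b) < threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for item in items:
        clusters.setdefault(find(item), []).append(item)
    return list(clusters.values())


def cross_family_clusters(clusters, family_of):
    """Keep only clusters whose members belong to more than one family."""
    return [c for c in clusters if len({family_of[x] for x in c}) > 1]


# Hypothetical toy data: four word forms for one concept, with made-up distances.
words = ["A1", "A2", "B1", "C1"]
family_of = {"A1": "FamA", "A2": "FamA", "B1": "FamB", "C1": "FamC"}
toy_distances = {frozenset(("A1", "A2")): 0.2, frozenset(("A1", "B1")): 0.3}


def toy_distance(a, b):
    # Pairs without an entry are treated as maximally distant.
    return toy_distances.get(frozenset((a, b)), 1.0)


clusters = flat_clusters(words, toy_distance, threshold=0.4)
candidates = cross_family_clusters(clusters, family_of)
print(candidates)
```

Single linkage can chain dissimilar words into one cluster through intermediate forms, so a real workflow would need to choose and evaluate the clustering method and threshold on annotated data.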

  41. Example from SEA Languages Borrowing Inference
    Borrowing Inference: Pronunciation Distance Calculation
    - pronunciation distance depends on prosody (with weak and strong positions in each word, see List 2014)
    - feature systems for huge numbers of sounds were lacking so far, but are available now with CLTS (Anderson et al. forthcoming)
    - alignment methods are well-developed and can be used to compare words beforehand (List 2014)
    The approach is work in progress; contact me for more information.
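
The three ingredients named above (prosodic weights, sound features, prior alignment) could be combined in many ways. The sketch below shows one possible combination and is explicitly not the author's method, which is described as work in progress: it assumes two already aligned words, a tiny invented feature table (not the CLTS feature system), and hand-picked position weights.

```python
# Hypothetical toy feature table; "-" is the gap symbol used in alignments.
FEATURES = {
    "p": {"voiceless", "bilabial", "stop"},
    "b": {"voiced", "bilabial", "stop"},
    "t": {"voiceless", "alveolar", "stop"},
    "a": {"open", "central", "vowel"},
    "i": {"close", "front", "vowel"},
    "-": set(),
}


def sound_distance(x, y):
    """One minus the Jaccard overlap of two feature sets; a gap shares no features."""
    if x == y:
        return 0.0
    fx, fy = FEATURES[x], FEATURES[y]
    return 1.0 - len(fx & fy) / len(fx | fy)


def weighted_distance(aligned_a, aligned_b, weights):
    """Weighted mean of per-column sound distances over an existing alignment."""
    if not (len(aligned_a) == len(aligned_b) == len(weights)):
        raise ValueError("alignment rows and weights must have equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    score = sum(w * sound_distance(x, y)
                for x, y, w in zip(aligned_a, aligned_b, weights))
    return score / total


# Assumed weights: the word-initial position counts twice as much as the vowel.
weights = [2.0, 1.0]
initial_differs = weighted_distance(["p", "a"], ["b", "a"], weights)
vowel_differs = weighted_distance(["p", "a"], ["p", "i"], weights)
print(initial_differs, vowel_differs)
```

With these assumed weights, a difference in the strong initial position contributes more to the overall distance than a larger featural difference in the weaker position.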

  42. Example from SEA Languages Data Analysis
    Data Analysis: Contact Areas

  43. Example from SEA Languages Data Analysis
    Data Analysis: Contact Areas
    [Figure: the language varieties listed on slide 38 plus Sinitic_Beijing, with subgroup legend: Burmish, Hmongic, Sinitic, Mienic, Sui, Bai, Nesu]

  44. Example from SEA Languages Data Analysis
    Data Analysis: Contact Areas
    - two major contact areas (Hmong-Mien and Sui, Sinitic/Bai and Hmong-Mien)
    - not all languages are under similar influence
    - inspection shows that most borrowings can be confirmed

  45. Example from SEA Languages Data Analysis
    Data Analysis: Contact Areas
    - partitioning cognate sets and their associated meanings based on their distribution across languages yields about 6 groups in which five or more concepts are consistently shared
    - the groups show different distributions and offer additional insights into the distribution of shared lexical traits
    - as some problems are not yet handled (missing data, specific coding errors), a manual analysis should ideally start from here
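
The partitioning step can be pictured with a deliberately simple sketch: cognate sets are grouped by the exact set of languages in which they occur, and only groups covering a minimum number of concepts are kept. Grouping by identical distribution is an assumption made here for illustration; the talk does not state the partitioning criterion, and a real analysis would have to tolerate missing data. Language and concept labels are placeholders.

```python
from collections import defaultdict


def group_by_distribution(cognate_sets, min_concepts=5):
    """Group concepts whose cognate sets occur in exactly the same languages.

    cognate_sets: iterable of (concept, languages) pairs.
    Returns a dict mapping each language profile to its sorted concepts,
    keeping only profiles shared by at least min_concepts concepts.
    """
    groups = defaultdict(set)
    for concept, languages in cognate_sets:
        groups[frozenset(languages)].add(concept)
    return {profile: sorted(concepts)
            for profile, concepts in groups.items()
            if len(concepts) >= min_concepts}


# Hypothetical toy input: placeholder concepts and languages, not the talk's data.
toy_sets = [
    ("concept_1", ["L1", "L2", "L3"]),
    ("concept_2", ["L3", "L2", "L1"]),
    ("concept_3", ["L1", "L4"]),
]
groups = group_by_distribution(toy_sets, min_concepts=2)
print(groups)
```

Each surviving group pairs a set of languages with the concepts they share, which corresponds to the kind of direct, inspectable output the workflow asks for.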

  46. Example from SEA Languages Data Analysis
    Data Analysis: Contact Areas
    Concepts: ASK (INQUIRE), BEAN, BIG, BIRD, CHICKEN, CRY, DAY (NOT NIGHT), DIE, DRINK, DUCK, EGG, FAECES (EXCREMENT), FAR, HORSE, HUNDRED, KILL, OLD (USED), ROPE, THIS
    [Figure: language varieties as listed on slide 38, with subgroup legend: Sui, Sinitic, Nesu, Bai, Mienic, Burmish, Hmongic]

  47. Example from SEA Languages Data Analysis
    Data Analysis: Contact Areas
    Concepts: BEAR, BITE, CHILI PEPPER, CHOOSE, FAST, HOE, PEAR, POOR, SALTY, WASH
    [Figure: language varieties as listed on slide 38, with subgroup legend: Sui, Sinitic, Nesu, Bai, Mienic, Burmish, Hmongic]

  48. Example from SEA Languages Data Analysis
    Data Analysis: Contact Areas
    Concepts: BE HUNGRY, FIREWOOD, HARD, JUMP, MOUTH, SOUP, THIN (SLIM), WELL
    [Figure: language varieties as listed on slide 38, with subgroup legend: Sui, Sinitic, Nesu, Bai, Mienic, Burmish, Hmongic]

  49. Example from SEA Languages Data Analysis
    Data Analysis: Contact Areas
    Concepts: ANT, CLAW, MONKEY, SPARROW, SWEET POTATO, YOUNGER BROTHER
    [Figure: language varieties as listed on slide 38, with subgroup legend: Sui, Sinitic, Nesu, Bai, Mienic, Burmish, Hmongic]

  50. Example from SEA Languages Data Analysis
    Data Analysis: Contact Areas
    Concepts: DRINK, FAST, NOSE, THICK, THUNDER
    [Figure: language varieties as listed on slide 38, with subgroup legend: Sui, Sinitic, Nesu, Bai, Mienic, Burmish, Hmongic]

  51. Example from SEA Languages Data Analysis
    Data Analysis: Contact Areas
    Concepts: GRASS, PEAR, RIDE, TIRED, WALK
    [Figure: language varieties as listed on slide 38, with subgroup legend: Sui, Sinitic, Nesu, Bai, Mienic, Burmish, Hmongic]

  52. Example from SEA Languages Data Analysis
    Data Analysis: Concept Statistics
    - by checking the purity of cognate sets with respect to the families across which they occur, we can derive rankings of concepts according to their relative borrowability in our dataset
    - borrowability is often thought of as a stable characteristic of concepts, also due to Swadesh’s doctrine of basic vocabulary, but it is clear that concepts evolve with culture, and terms for technical innovations may therefore be highly borrowable as long as they are new, but they would later not be borrowed again
    - therefore, all statistics on borrowability have to be taken with care, as they also reflect the history of a given region and not necessarily general patterns of language change
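
One way to make "purity" concrete is sketched below. It assumes that the purity of a cognate set is the share of its most frequent family among its members, and that a concept's purity is the mean over its cognate sets; both definitions are assumptions for illustration, since the talk does not give a formula. The rows are invented placeholders.

```python
from collections import Counter, defaultdict


def cognate_set_purity(families):
    """Share of the most frequent family among the members of one cognate set."""
    counts = Counter(families)
    return max(counts.values()) / len(families)


def concept_purity(rows):
    """Mean cognate-set purity per concept.

    rows: iterable of (concept, cognate_set_id, family) triples, one per word.
    """
    members = defaultdict(list)
    for concept, cogid, family in rows:
        members[(concept, cogid)].append(family)
    per_concept = defaultdict(list)
    for (concept, _), families in members.items():
        per_concept[concept].append(cognate_set_purity(families))
    return {c: sum(p) / len(p) for c, p in per_concept.items()}


# Hypothetical toy rows: placeholder concepts, cognate set IDs, and families.
toy_rows = [
    ("concept_a", 1, "FamX"), ("concept_a", 1, "FamY"),
    ("concept_a", 1, "FamZ"), ("concept_a", 1, "FamX"),
    ("concept_b", 2, "FamX"), ("concept_b", 2, "FamX"),
    ("concept_b", 3, "FamY"),
]
purity = concept_purity(toy_rows)
# Lowest purity first: these concepts show the most cross-family sharing.
ranking = sorted(purity, key=purity.get)
print(ranking)
```

Under these assumptions, a low purity marks a concept whose cognate sets frequently cross family boundaries, which is the signal read here as relative borrowability.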

  55. Example from SEA Languages Data Analysis
    Data Analysis: Concept Statistics
    - a weak (Spearman rank: -0.18, p<0.005) negative correlation between the purity of concepts with respect to potential borrowings and the borrowing statistics of the World Loanword Database (WOLD, Haspelmath and Tadmor 2008)
    - a weak (Spearman rank: 0.19, p<0.005) positive correlation with the WOLD project’s age score
    - the new ranks based on concept purity could be used to expand the limited scope of the WOLD project systematically
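
For reference, the Spearman rank correlation reported above is the Pearson correlation of the two rank vectors. The sketch below is a plain-Python version with average ranks for ties; it computes only the coefficient, not the p-value, and the function names are illustrative. The example input is arbitrary and unrelated to the purity or WOLD scores.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x ** 0.5 * var_y ** 0.5)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y):
        raise ValueError("inputs must have equal length")
    return pearson(average_ranks(x), average_ranks(y))


# Arbitrary toy scores for five items.
print(spearman([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```

In practice, an established statistics library would normally be used, since it also provides the significance test behind the reported p-values.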

  56. Outlook
    Outlook
    *deh₃- ?

  57. Outlook
    Outlook
    - enhance the accuracy of our contact inference workflow
    - apply the workflow to more language families (esp. South American languages)
    - work on the inference of more ancient borrowings
    - work on the inference of borrowing directions
    - enhance the interactive output
