Open problems in computational historical linguistics

Plenary talk, held at the 24th International Conference on Historical Linguistics (2019-07-01/05, Canberra, Australian National University).

Johann-Mattis List

July 02, 2019

Transcript

  1. Open Problems in Computational Historical Linguistics
    Johann-Mattis List
    Research Group “Computer-Assisted Language Comparison”
    Department of Linguistic and Cultural Evolution
    Max-Planck Institute for the Science of Human History
    Jena, Germany
    2019-07-02
    1 / 60

  2. Introduction
    Introduction
    2 / 60

  3. Introduction Problems
    Problems (we ignore)
    La Société n’admet aucune communication concernant, soit
    l’origine du langage, soit la création d’une langue universelle.
    (Statuts de la Société de Linguistique de Paris, 1866: III)
    The Society will not allow any work dealing with the origin of
    language or the creation of a universal language. (rules of the
    Paris Society of Linguistics from 1866, my transl.)
    3 / 60

  4. Introduction Problems
    Problems (we did not know about)
    The Proto-Sapiens grammar was so simple that the sporadic
    references in previous paragraphs have essentially described it. The
    prime importance of sound symbolism for the people of nature
    should be noted again before we further detail that the vowel “E”
    was felt as indicating the “yin” element, passivity, femininity etc.
    [...] (Papakitsos and Kenanidis 2018: 8)
    4 / 60

  5. Introduction Problems
    Problems (we forgot)
    Based on an analysis of the literature and a large scale
    crowd-sourcing experiment, we estimate that an average 20-year-old
    native speaker of American English knows 42,000 lemmas and 4,200
    non-transparent multiword expressions, derived from 11,100 word
    families. (Brysbaert et al. 2016: 1)
    5 / 60

  6. Introduction Hilbert Problems
    Hilbert Problems
    23 problems identified by the
    mathematician David Hilbert in
    1900 (Hilbert 1902)
    at least 10 problems have been
    solved by now
    some 7 problems have solutions
    accepted by some scientists
    6 / 60

  7. Introduction Hilbert Problems
    Hilpert Problems
    Martin Hilpert proposed a list of
    problems for linguistics in a talk
    in 2014
    Russell D. Gray further
    promoted the idea in a series of
    talks, where he emphasized we
    should ask more Hilb/pert
    questions in the field of diversity
    linguistics
    7 / 60

  9. Introduction Problems in CHL
    Problems in Computational Historical Linguistics
    The problems I want to discuss are
    “small” in comparison to big
    picture questions asked by
    Hilpert and Gray,
    “personal”, i.e., identified by
    myself, and not necessarily
    interesting to everybody,
    “solvable”, i.e., I guess they
    have a solution.
    I discuss them in the hope that they
    will help us to advance our research
    by forcing us to formalize our work.
    8 / 60

  10. Open Problems
    Open Problems
    in Computational Historical Linguistics
    9 / 60

  11. Open Problems Background
    Background: A Series of Blog Posts
    https://phylonetworks.blogspot.com
    10 / 60

  12. Open Problems Background
    Background: A Series of Blog Posts
    10 problems in total
    initial basic division into problems of inference, simulation, statistics,
    and typology
    problems will be discussed on a monthly basis throughout 2019
    first five problems were already discussed in February, March, April,
    May, and June
    11 / 60

  13. Open Problems Background
    Background: Modeling, Inference, and Analysis
    Modeling
    Inference
    Analysis
    12 / 60

  14. Open Problems Inference Problems
    Inference Problems
    Inference
    1 automated morpheme segmentation (blog in February 2019)
    2 automated borrowing detection (blog in March 2019)
    3 automated sound law induction (blog in April 2019)
    4 automated phonological reconstruction (blog in May 2019)
    13 / 60

  15. Open Problems Inference Problems
    Inference Problems
    Inference problems deal with something we want to find in
    linguistic data. Their common objective is to identify past and
    present processes and states that, according to our models,
    have occurred or existed at some point, or still occur and exist.
    14 / 60

  16. Open Problems Modeling Problems
    Modeling Problems
    Modeling
    5 simulation of lexical change (blog in June 2019)
    6 simulation of sound change
    7 proof of language relatedness
    15 / 60

  17. Open Problems Modeling Problems
    Modeling Problems
    The modeling problems deal with our knowledge about processes
    and how we account for the processes in a formal or
    mathematical way. Proof of language relatedness is a specific
    case, maybe not completely fitting into this category, but its
    key objective is to model chance resemblances, which is why it
    is basically also a modeling task and not a task of inference.
    16 / 60

  18. Open Problems Analysis Problems
    Analysis Problems
    Analysis
    8 typology of semantic change
    9 typology of semantic promiscuity
    10 typology of sound change
    17 / 60

  19. Open Problems Analysis Problems
    Analysis Problems
    The analysis problems deal with the bigger picture of the
    processes, and with the question of whether we can derive tendencies,
    rates, or frequencies from linguistic data. In order to achieve this,
    we need to infer the processes first, and this is the reason why
    these problems are discussed last.
    18 / 60

  20. Open Problems Analysis Problems
    Analysis Problems: Semantic Promiscuity
    List et al. (2016): Unity and disunity [...]. Biology Direct.
    19 / 60

  21. Open Problems Analysis Problems
    Analysis Problems: Semantic Promiscuity
    List (2018): Von Wortfamilien [...]. Von Wörtern und Bäumen.
    19 / 60

  22. Open Problems Analysis Problems
    Analysis Problems: Semantic Promiscuity
    In der Linguistik gibt es noch keinen richtigen Terminus für
    Wörter, die selbst Grundlage von vielen anderen Wörtern sind
    [...]. In Anlehnung an die Biologie, wo wir in den Proteindomänen
    ähnliche Phänomene vorfinden [...], könnten wir jedoch von
    promiskuitiven Konzepten sprechen [...]. (List 2018: Von
    Wortfamilien und promiskuitiven Wörtern)
    In linguistics, we lack a term for words that themselves serve as
    the basis for many other words [...]. Following biology, where
    we find similar phenomena with respect to protein domains [...],
    we could, however, speak of promiscuous concepts. (List 2018,
    my translation).
    19 / 60

  23. Problem Solving
    Problem Solving Strategies
    20 / 60

  24. Problem Solving CALC
    Computer-Assisted Language Comparison
    21 / 60

  34. Problem Solving CALC
    Computer-Assisted Language Comparison
    Funding: ERC Starting
    Grant (2017-2022)
    Host: MPI-SHH (Jena)
    Current team: 2
    post-docs, 2 docs, and
    myself
    Objectives: establish
    CALC framework for
    Sino-Tibetan and beyond
    http://calc.digling.org
    22 / 60

  35. Problem Solving Mind the Machines
    Mind the Machines (?)
    [...] it was at the 1985 workshop [...] that Fred Jelinek
    uttered the now immortal phrase “Every time we fire a
    phonetician/linguist, the performance of our system goes up”.
    (Moore 2005: 1)
    SkyNet
    Use AI to Dismiss Traditional Linguists
    23 / 60

  36. Problem Solving Mind the Machines
    Mind the Machines (!)
    Problems may have an exact solution.
    → Why search for an approximate one?
    Machine learning techniques are not apt for all tasks at hand.
    → We all need to leave our comfort zones!
    We do not only want to know what happened but why it happened!
    → Blackbox results are of no scientific value.
    Our data in historical linguistics is usually not big.
    → Big data solutions often do not work for small data.
    24 / 60

  37. Problem Solving Computer-Assisted Problem Solving
    Computer-Assisted Problem Solving
    A identify the core class of your problem (modeling, inference, analysis)
    B look at existing qualitative solutions
    C formalize the problem in a way that allows you to test it
    D qualitative solutions are often holistic, do not hesitate to specify
    sub-problems
    E search for inspiration in neighboring disciplines by looking for similar
    processes
    F accept a qualitative or semi-automatic solution for inference, but
    make sure the results are also machine-readable
    G insist on transparent output to allow experts to review the results
    25 / 60

  38. Possible Solutions
    Possible Solutions
    for the Inference Problems
    26 / 60

  39. Possible Solutions Morpheme Segmentation
    Automated Morpheme Segmentation: Task
    Given a list of fewer than 1000 words in phonetic transcription,
    already segmented into sounds, with concepts mapped to common
    concept lists (e.g., Concepticon), identify the morpheme
    boundaries in the data.
    List (2019): “Automatic morpheme segmentation (Open problems in
    computational diversity linguistics 1)”. GWPN 8.2.
    27 / 60

  40. Possible Solutions Morpheme Segmentation
    Automated Morpheme Segmentation: Current Solutions
    Most algorithms build on n-grams (recurring symbol sequences of
    arbitrary length).
    Assuming that n-grams representing meaning-building units should be
    distributed more frequently across the lexicon of a language, they
    assemble n-gram statistics from the data.
    With Morfessor, there is a popular family of algorithms available in
    form of a stable library (Creutz and Lagus 2005, Virpioja et al. 2013).
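As a rough illustration of the n-gram statistics described above (this is not the Morfessor algorithm), the following sketch counts in how many words of a small word list each n-gram of sounds occurs. The function name `ngram_word_counts`, the parameter `max_n`, and the toy word list are assumptions made for this example only.

```python
from collections import Counter


def ngram_word_counts(words, max_n=3):
    """Count, for every n-gram up to length max_n, how many words contain it."""
    counts = Counter()
    for segments in words:
        seen = set()  # each n-gram is counted at most once per word
        for n in range(1, max_n + 1):
            for start in range(len(segments) - n + 1):
                seen.add(tuple(segments[start:start + n]))
        counts.update(seen)
    return counts


# Toy word list (illustrative only), already segmented into sounds.
words = [w.split() for w in ["e r m a n o", "e r m a n a", "t i o", "t i a"]]
counts = ngram_word_counts(words)

# N-grams shared by several words are candidates for meaning-building units.
for ngram, freq in counts.most_common(5):
    print(" ".join(ngram), freq)
```

Counting each n-gram once per word measures how widely it is distributed across the lexicon rather than how often it is repeated inside a single word.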
    28 / 60

  41. Possible Solutions Morpheme Segmentation
    Automated Morpheme Segmentation: Performance
    List (2019): “Automatic morpheme segmentation” GWPN.
    29 / 60

  42. Possible Solutions Morpheme Segmentation
    Automated Morpheme Segmentation: Difficulty
    Ambiguity: morphemes are ambiguous, they are not only based on
    the form, but also on semantics.
    Fuzziness: morpheme boundaries are often fuzzy, even speakers may
    at times no longer understand the original morphology of their
    language.
    Task definition: morpheme boundaries depend on the task at hand,
    as morphological judgments can be based on different perspectives
    (historical perspective involving more than one language, speaker
    intuition, descriptive grammar).
    30 / 60

  43. Possible Solutions Morpheme Segmentation
    Automated Morpheme Segmentation: Qualitative Solutions
    Semantic evidence: humans take semantics into account (compare
    Spanish herman-o “brother” vs. herman-a “sister”).
    Language-specific evidence: humans know that morphological
    structure varies across languages (compare South-East Asian (SEA)
    languages vs. Indo-European languages) and adjust their strategies
    accordingly.
    Phonetic evidence: humans try to infer phonotactic rules for the
    languages they work with.
    Cross-linguistic evidence: humans make use of comparisons across
    related languages to search for morpheme boundaries.
    31 / 60

  44. Possible Solutions Morpheme Segmentation
    Automated Morpheme Segmentation: Suggestions
    To enhance our current methods, we need to
    A invest time to create datasets for testing and training,
    B employ semantic information (make use of new resources such as
    CLICS, Concepticon),
    C employ phonotactic information (make use of the prosody models in
    LingPy and new resources like CLTS),
    D employ cross-linguistic information (use sequence comparison
    techniques such as those implemented in LingPy), and (maybe)
    E give up the idea of a universal morpheme segmentation algorithm
    (rather proceed from linguistic areas or families).
    32 / 60

  45. Possible Solutions Morpheme Segmentation
    Automated Morpheme Segmentation: Current Work
    We pursue initial work on a Morpheme-Annotated Lexical Database
    (MOALD) in the CALC group,
    based on aggregation strategies used in the CLICS project (List et al.
    2018),
    building on the standardization efforts of the CLDF initiative (Forkel
    et al. 2018),
    using data from individual collaborations on computer-assisted
    language comparison (e.g., T. C. Chacon for Tukanoan languages, A.
    Hantgan for Dogon languages).
    N. E. Schweikhard, doctoral student in the CALC group, will present these initial ideas
    in a talk titled “Towards a Database of Morpheme-Annotated Wordlists” at the ICHL on
    Thursday.
    33 / 60

  46. Possible Solutions Borrowing Detection
    Automated Borrowing Detection: Task
    Given word lists of different languages, find out which words
    have been borrowed, and also determine the direction of
    borrowing.
    mountain
    mouse
    wifi
    List (2019): “Automatic borrowing detection (Open problems in computational
    diversity linguistics 2)”. GWPN 8.3.
    34 / 60

  47. Possible Solutions Borrowing Detection
    Automated Borrowing Detection: Current Solutions
    Some approaches make use of conflicts in the phylogeny, explaining
    them by invoking borrowings (MLN approach, Nelson-Sathi et al.
    2011, List et al. 2014).
    Some approaches search for similar words among unrelated languages
    (Mennecier et al. 2016).
    Tree reconciliation methods compute trees of individual words from
    different languages and then infer borrowing processes by comparing
    the individual word phylogenies with language phylogenies computed
    from all words together (Willems et al. 2016).
    Borrowability statistics (as proposed by Sergey Yakhontov, as
    reported by Starostin 1991, Chén 1996, or McMahon et al. 2005) can
    be used to compare commonalities across stable and less stable parts
    of vocabularies, assuming that commonalities in unstable parts can be
    attributed to borrowing.
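The second approach above, searching for similar words among unrelated languages, can be sketched with a normalized edit distance over sound segments. This is a minimal sketch under stated assumptions: the function names, the threshold of 0.4, and the two toy word lists are invented for illustration, and the plain edit distance is only a stand-in for a proper metric of phonetic similarity. Pairs flagged this way are candidates only, since chance resemblance still has to be excluded.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or segment lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance divided by the length of the longer sequence."""
    if not a and not b:
        return 0.0
    return edit_distance(a, b) / max(len(a), len(b))


def similar_pairs(wordlist_a, wordlist_b, threshold=0.4):
    """Return the concepts whose word forms are similar in both word lists."""
    shared = sorted(set(wordlist_a) & set(wordlist_b))
    return [c for c in shared
            if normalized_distance(wordlist_a[c], wordlist_b[c]) <= threshold]


# Invented toy data for two unrelated languages (concept -> sound segments).
lang_a = {"WIFI": "w i f i".split(), "MOUSE": "m a u s".split()}
lang_b = {"WIFI": "w a i f a i".split(), "MOUSE": "ʃ u".split()}

candidates = similar_pairs(lang_a, lang_b)
print(candidates)
```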
    35 / 60

  48. Possible Solutions Borrowing Detection
    Automated Borrowing Detection: Performance
    Conflicts in the phylogeny tend to overestimate the amount of
    borrowing, since there are multiple reasons for conflicts in
    phylogenies, not only borrowing (Morrison 2011).
    Sequence comparison on unrelated languages seems solid, but one
    needs to be careful with chance resemblances (mama, papa, etc.,
    Jakobson 1960, Blasi et al. 2016), and we need to improve our
    metrics for phonetic similarity.
    Tree reconciliation methods are unrealistic if word trees are derived
    from simple edit distances, as was done in the studies presented so
    far, and they also overestimate the amount of borrowing.
    Sublist approaches may be useful, but they require extensive accounts
    of known borrowings to derive the ranked lists, and it is not clear
    if borrowing rates are stable across times and places.
    36 / 60

  49. Possible Solutions Borrowing Detection
    Automated Borrowing Detection: Difficulty
    Lack of positive criteria: detecting borrowing presupposes
    excluding alternative explanations (inheritance, natural patterns, chance).
    Lack of unified criteria: there is no unified procedure for the
    identification of borrowings in the classical discipline.
    Difficulties in handling cumulative evidence: borrowing detection
    is much more based on multiple types of evidence (“consilience”,
    “cumulative evidence”) than other tasks in historical linguistics, and
    there is no straightforward way to weight the evidence.
    37 / 60

  50. Possible Solutions Borrowing Detection
    Automated Borrowing Detection: Qualitative Solutions
    Direct evidence: by comparing the same language across different
    times, we can easily see if a word has been borrowed (cf. Cantonese
    [tʰai33-iœŋ21] “sun” with Mandarin tàiyáng).
    Phylogeny-related conflicts: seemingly cognate words that cannot
    be readily explained with the phylogeny of the languages may often
    hint at borrowing (cf. English mountain and French montagne).
    Trait-related conflicts: when sound correspondences appear
    irregular, this is often a hint at borrowing (cf. German Damm vs.
    English dam).
    Distribution-related conflicts: specific sounds or words with
    specific phonotactics that occur only in specific semantic fields may
    point to borrowing (cf. German Joker, Job, Junkie, Journal).
    List (forthc.) “Automated methods [...]” Language and Linguistics Compass.
    38 / 60

  51. Possible Solutions Borrowing Detection
    Automated Borrowing Detection: Suggestions
    To enhance our current methods, we need to
    A increase cross-linguistic data in phonetic transcription, with consistent
    definition of meanings to search for similar words among unrelated
    languages,
    B test methods for automatic correspondence pattern recognition to
    search for trait-related conflicts (List 2019),
    C work on cross-linguistic datasets of known borrowed words to increase
    our knowledge of borrowability, and
    D investigate methods for concept-based stratification.
    39 / 60

  52. Possible Solutions Borrowing Detection
    Automated Borrowing Detection: Current Work
    In the CALC project, we are currently developing
    A cross-linguistic datasets to test borrowing relations in contact areas,
    based on high-quality data in phonetic transcription reflecting
    carefully selected concept lists,
    B new feature-based metrics of phonetic word similarity, based on the
    features developed for the CLTS project on Cross-Linguistic
    Transcription Systems (Anderson et al. 2019), and
    C methods for contact-zone detection based on a new method for
    cognate set partitioning.
    M.-S. Wu, doctoral student in the CALC group, will illustrate how we assemble data for
    South-East Asia in a talk titled “Studying language contact in South East Asia with help
    of computer-assisted approaches” at the ICHL on Tuesday.
    40 / 60

  53. Possible Solutions Borrowing Detection
    Automated Borrowing Detection: Current Work
    [Figure labels]
    Concepts: ASK (INQUIRE), BEAN, BIG, BIRD, CHICKEN, CRY, DAY (NOT NIGHT),
    DIE, DRINK, DUCK, EGG, FAECES (EXCREMENT), FAR, HORSE, HUNDRED, KILL,
    OLD (USED), ROPE, THIS
    Language varieties: Burmish_Achang; Baheng, East; Baheng_West; Bana;
    Biao Min; Sui_Banliang; Sinitic_Changsha; Sinitic_Chaozhou;
    Sinitic_Chengdu; Chuanqiandian; Chuanqiandian_Central_Guizhou;
    Chuanqiandian_Northeast_Yunnan; Chuanqiandian_Southern_Guizhou; Dongnu;
    Sinitic_Guangzhou; Bai_Jianchuan; Jiongnai; Kim_Mun; Sinitic_Kunming;
    Bai_Luobenzhuo; Luobuohe_Eastern; Luobuohe_Western; Sinitic_Meixian;
    Mien; Sinitic_Nanchang; Numao; Nunu; Sui_Pandong; Qiandong_East;
    Qiandong_North; Qiandong_South; Qiandong_West; Sui_Sandong; She;
    Xiangxi_East; Xiangxi_West; Bai_Xiangyun; Sinitic_Yangjiang; Yi_Dafang;
    Yi_Mile; Yi_Mojiang; Yi_Nanhua; Yi_Nanjian; Yi_Xide; Younuo; Zao_Min
    Group labels: Sui, Sinitic, Nesu, Bai, Mienic, Burmish, Hmongic
    List (in prep.): “Automated detection of lexical strata”.
    40 / 60

  54. Possible Solutions Borrowing Detection
    Automated Borrowing Detection: Current Work
    [Figure labels]
    Concepts (first group): BE HUNGRY, FIREWOOD, HARD, JUMP, MOUTH, SOUP,
    THIN (SLIM), WELL
    Concepts (second group): ASK (INQUIRE), BEAN, BIG, BIRD, CHICKEN, CRY,
    DAY (NOT NIGHT), DIE, DRINK, DUCK, EGG, FAECES (EXCREMENT), FAR, HORSE,
    HUNDRED, KILL, OLD (USED), ROPE, THIS
    Language varieties: the same 46 varieties as on the previous slide
    (Burmish_Achang to Zao_Min)
    Group labels: Sui, Sinitic, Nesu, Bai, Mienic, Burmish, Hmongic
    List (in prep.): “Automated detection of lexical strata”.
    40 / 60

  55. Possible Solutions Sound Law Induction
    Automated Sound Law Induction: Task
    Given a list of words in an ancestral language and their
    reflexes in a descendant language, identify the sound laws
    by which the ancestor can be converted into the descendant.
    *p > *pf / #_
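To make the notation concrete, here is a minimal sketch of how a law such as the one above (p becomes pf at the beginning of a word) could be applied to a segmented word. It only covers the forward direction; the open problem is the inverse, inducing such rules from data. The function `apply_rule`, its parameters, and the word form are hypothetical and chosen for illustration.

```python
def apply_rule(segments, source, target, preceding):
    """Replace `source` by `target` wherever the preceding segment is `preceding`.

    "#" stands for the word boundary. All contexts are read from the
    ancestral form, so the rule applies to every position simultaneously.
    """
    padded = ["#"] + list(segments)
    return [
        target if seg == source and prev == preceding else seg
        for prev, seg in zip(padded, padded[1:])
    ]


ancestor = "p a p a".split()  # invented ancestral form
descendant = apply_rule(ancestor, source="p", target="pf", preceding="#")
print(" ".join(descendant))  # pf a p a
```

Only the word-initial p changes, because the second p is preceded by a vowel rather than by the word boundary.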
    List (2019): “Automatic sound law induction (Open problems in
    computational diversity linguistics 3)”. GWPN 8.3.
    41 / 60

  56. Possible Solutions Sound Law Induction
    Automated Sound Law Induction: Current Solutions
    No direct studies dealing with this task are known to me, but studies
    covering similar tasks include
    simulation studies (see e.g., Ciobanu and Dinu 2018) for word
    prediction,
    manual tools to model sound change when providing sound laws
    (e.g., PHONO by Hartmann 2003), and
    correspondence-pattern based word prediction (List 2019, Bodt and
    List 2019).
    42 / 60

  57. Possible Solutions Sound Law Induction
    Automated Sound Law Induction: Difficulty
    Methods for induction: the induction of rules as a problem is
    usually not addressed in machine learning solutions.
    Distant phonological context: triggering context for sound change
    may be found at arbitrary distances from the target sound.
    Abstract phonological context: “abstract” contexts from
    suprasegmentals (e.g. tone and stress) can also condition sound
    change.
    Systematic aspects of sound change: sound change often affects
    groups of phonemes in a similar manner, i.e., it affects parts of the
    phonological system of a language.
    43 / 60

  58. Possible Solutions Sound Law Induction
    Automated Sound Law Induction: Qualitative Solutions
    Trial and error: there are no general strategies that scholars follow;
    instead, they seem to like to figure this out themselves, similar to
    people who like to solve Sudoku or chess puzzles.
    44 / 60

  59. Possible Solutions Sound Law Induction
    Automated Sound Law Induction: Suggestions
    We can address the problem of sound law induction (at least in part) with
    the help of techniques for multi-tiered sequence modeling (List 2014, List and
    Chacon 2015). The basic idea of these techniques is to represent words not
    only as consisting of a single sequence of sounds, but instead as some kind
    of partitura (musical score) in which each of the different phonological
    aspects of a word form is given its own voice. This technique then allows us
    to model all different possible conditioning contexts in separate layers
    (tiers), and to
    use heuristics to search for those tiers which actually condition a
    given sound change.
    45 / 60

  60. Possible Solutions Sound Law Induction
    Automated Sound Law Induction: Suggestions
    1 primero
    2 falsos
    1 tedioso
    45 / 60

  61. Possible Solutions Sound Law Induction
    Automated Sound Law Induction: Suggestions
    Orthography k i n d e r g a r t e n
    IPA         k ɪ n d ɐ g a ʁ t ə n
    Stress      2 2 2 0 0 0 1 1 1 0 0 0
    45 / 60

  62. Possible Solutions Sound Law Induction
    Automated Sound Law Induction: Suggestions
    IPA k ɪ n d ɐ g a ʁ t ə n
    Prec. CV # C V c C V C V c C V
    Preceding # k ɪ n d ɐ g a ʁ t ə
    Following ɪ n d ɐ g a ʁ t ə n $
    Foll. CV V c C V C V c C V C $
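The context tiers on this slide can be derived mechanically from the segment sequence. The sketch below is one possible way to do this, under stated assumptions: the function names and the small vowel inventory are hypothetical, and the CV classes are simplified to plain "C" and "V", so the lowercase "c" that the slide uses for some consonants is not reproduced.

```python
VOWELS = set("aeiouɪɐə")  # assumed, deliberately small vowel inventory


def cv_class(segment):
    """Map a segment to 'C' or 'V'; boundary symbols are returned unchanged."""
    if segment in ("#", "$"):
        return segment
    return "V" if segment in VOWELS else "C"


def build_tiers(segments):
    """Return the context tiers for a word given as a list of sounds."""
    segments = list(segments)
    preceding = ["#"] + segments[:-1]  # "#" marks the beginning of the word
    following = segments[1:] + ["$"]   # "$" marks the end of the word
    return {
        "segments": segments,
        "preceding": preceding,
        "following": following,
        "prec_cv": [cv_class(s) for s in preceding],
        "foll_cv": [cv_class(s) for s in following],
    }


tiers = build_tiers("k ɪ n d ɐ g a ʁ t ə n".split())
for name, values in tiers.items():
    print(f"{name:<10}", " ".join(values))
```

Every tier has the same length as the word, so each position in the word can be looked up in all tiers at once.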
    45 / 60

  63. Possible Solutions Sound Law Induction
    Automated Sound Law Induction: Suggestions
    Proto p p p p p p p p p p p
    Stress 2 2 2 2 2 1 0 1 0 1 0
    Prec. CV # # C C V V V # C # C
    Foll. CV C V c c V V V C V c V
    Descendant p p p p f f f h h h h
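One simple heuristic for finding the tiers that condition a change can be sketched with the data from this slide: try combinations of tiers, from small to large, and keep those under which every context maps to exactly one reflex. This brute-force search is an assumption made for illustration, not the method developed in the CALC project, and the names `is_consistent` and `smallest_conditioning_tiers` are hypothetical. Real data are noisier, so requiring perfect consistency would usually be too strict.

```python
from itertools import combinations

# Tiers and reflexes copied from the slide (all proto sounds are *p).
tiers = {
    "stress": "2 2 2 2 2 1 0 1 0 1 0".split(),
    "prec_cv": "# # C C V V V # C # C".split(),
    "foll_cv": "C V c c V V V C V c V".split(),
}
reflexes = "p p p p f f f h h h h".split()


def is_consistent(tier_names):
    """True if every context defined by the tiers maps to a single reflex."""
    seen = {}
    for i, reflex in enumerate(reflexes):
        context = tuple(tiers[name][i] for name in tier_names)
        if seen.setdefault(context, reflex) != reflex:
            return False
    return True


def smallest_conditioning_tiers():
    """Return the smallest tier combinations that fully predict the reflexes."""
    for size in range(1, len(tiers) + 1):
        hits = [combo for combo in combinations(tiers, size) if is_consistent(combo)]
        if hits:
            return hits
    return []


print(smallest_conditioning_tiers())
```

For the data on the slide, no single tier predicts the reflexes, but stress together with the preceding CV context does: after a vowel the reflex is f, and otherwise it is p under stress value 2 and h elsewhere.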
    45 / 60

  65. Possible Solutions Sound Law Induction
    Automated Sound Law Induction: Current work
    In the CALC project, we are currently developing
    A a Python library to work with multi-tiered sequence representations,
    B datasets for testing and training, and
    C methods and metrics for the evaluation of reconstruction systems
    based on multi-tiered sequence representations (see List forthc. for
    initial ideas in this regard).
    T. Tresoldi, post-doc in the CALC group, will further illustrate the usefulness of multi-
    tiered sequence representations in a talk titled “Automatic Induction of Sound Laws from
    Cognates” at the ICHL on Thursday.
    46 / 60

  66. Possible Solutions Phonological Reconstruction
    Automated Phonological Reconstruction: Task
    Given a set of alignments of strict cognate morphemes
    across a set of related languages, as well as the typical
    correspondence patterns by which the sounds in the
    languages correspond to each other, try to infer the
    hypothetical pronunciation of each morpheme in the
    proto-language.
    * ₂
    List (2019): “Automatic phonological reconstruction (Open problems in
    computational diversity linguistics 4)”. GWPN 8.4.
    47 / 60

  67. Possible Solutions Phonological Reconstruction
    Automated Phonological Reconstruction: Current Solutions
    Bouchard-Côté et al. (2013) use a framework that makes use of
    probabilistic string transducers. If the family tree of the languages is
    known, and cognate sets are defined as such, the method produces
    proto-form suggestions.
    In a forthcoming paper, Gerhard Jäger illustrates how classical
    methods for ancestral state reconstruction applied to aligned cognate
    sets could be used for phonological reconstruction and illustrates this
    for ASJP wordlists of the Romance languages (Jäger forthcoming).
    48 / 60

  68. Possible Solutions Phonological Reconstruction
    Automated Phonological Reconstruction: Performance
    The method by Bouchard-Côté et al. (2013) was only tested on
    Austronesian, and is not available, so it cannot be tested further
    without re-implementing it from scratch. The scores reported are high
    (error rates between 0.25 and 0.12), but Austronesian is not a
    challenging candidate for reconstruction.
    Jäger’s method produces a set of words that is slightly more similar
    to Latin than the baseline (words from Sardinian).
    None of the methods is capable of producing sounds that are not
    found in any of the descendant languages.
    Evaluation of the methods is carried out with the help of the edit
    distance, which is problematic, since the edit distance does not check
    for systematic similarities (List forthc.).
    List (forthc.): “Beyond edit distances”. Theoretical Linguistics.
    49 / 60

  69. Possible Solutions Phonological Reconstruction
    Automated Phonological Reconstruction: Difficulty
    Abstractness of reconstructions: scholars still disagree with respect
    to the question of how reconstruction should be best carried out, i.e.,
    if it should be abstract or realistic (so-called abstractionalist-realist
    debate, Lass 2017, Jakobson 1958).
    Evaluation of reconstruction systems: no measures to account for
    the predictive quality of a given reconstruction system exist.
    Unattested states: reconstructing what cannot be found in the
    data, as in the case of laryngeals in Indo-European (Saussure 1879),
    does not have a counterpart in biology (or is simply ignored).
    50 / 60

  70. Possible Solutions Phonological Reconstruction
    Automated Phonological Reconstruction: Qualitative Sol.
    Sound correspondence patterns: scholars determine the most
    frequent and salient sound correspondence patterns in their data and
    base the reconstructions on them (Anttila 1972, Meillet 1903).
    External evidence: scholars use external evidence where possible
    (e.g. from more distantly related languages).
    Internal reconstruction: scholars employ techniques of internal
    reconstruction where possible.
    Feature representations: scholars make use of feature
    representations of sounds to propose unobserved sounds.
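The first strategy above, determining the most frequent sound correspondence patterns, can be sketched by treating every column of an aligned cognate set as one pattern and counting how often each pattern recurs. This is a deliberately simplified sketch and not the algorithm by List (2019), which also has to cope with missing data; the function name and the toy alignments for three hypothetical languages are assumptions.

```python
from collections import Counter


def correspondence_patterns(alignments):
    """Count alignment columns across cognate sets.

    Each alignment is a list of rows, one row per language in a fixed
    order, where every row is a list of sounds and "-" marks a gap.
    """
    patterns = Counter()
    for alignment in alignments:
        if len({len(row) for row in alignment}) > 1:
            raise ValueError("all rows of an alignment must have the same length")
        for column in zip(*alignment):
            patterns[column] += 1
    return patterns


# Two invented cognate sets, aligned across three hypothetical languages.
alignments = [
    [["p", "a", "t"], ["f", "a", "t"], ["p", "a", "-"]],
    [["p", "i"], ["f", "i"], ["p", "i"]],
]
patterns = correspondence_patterns(alignments)

for pattern, freq in patterns.most_common():
    print(" : ".join(pattern), freq)
```

Patterns that recur across many cognate sets are the ones a reconstruction would normally be based on, whereas patterns attested only once call for closer inspection.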
    51 / 60

  71. Possible Solutions Phonological Reconstruction
    Automated Phonological Reconstruction: Qualitative Sol.
    p (L₂) > f / _i
    i (L₂) > Ø / _a
    51 / 60

  72. Possible Solutions Phonological Reconstruction
    Automated Phonological Reconstruction: Suggestions
    To enhance the current methods, we need to
    A create sufficient data for testing and training (different language
    families, different time depths),
    B develop measures to compare different reconstruction systems (gold
    standard and algorithmic solution or competing “home-made”
    systems) both with each other and with respect to their power to
    predict the data of the descendant languages,
    C embrace the possibility of semi-automated reconstruction (e.g. by
    computing correspondence patterns from alignment data, see List
    2019), and
    D investigate possibilities to take feature systems (as provided in
    Anderson et al. 2019) into account, in order to allow for the
    reconstruction of unobserved sounds.
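For point B, one simple way to score how well a reconstruction system predicts the descendant data is to compare predicted reflexes with attested ones. The sketch below uses the mean normalized edit distance as a stand-in measure; this choice is an assumption made for illustration and is not the measure proposed in the talk. The forms are invented.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of sound segments."""
    previous = list(range(len(b) + 1))
    for i, seg_a in enumerate(a, start=1):
        current = [i]
        for j, seg_b in enumerate(b, start=1):
            cost = 0 if seg_a == seg_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]

def mean_normalized_distance(predicted, attested):
    """Average edit distance, each divided by the longer form's length."""
    scores = []
    for p, a in zip(predicted, attested):
        longest = max(len(p), len(a))
        scores.append(edit_distance(p, a) / longest if longest else 0.0)
    return sum(scores) / len(scores)

# Invented example: two predicted reflexes against two attested forms.
predicted = [["f", "a"], ["f", "i", "a"]]
attested = [["f", "a"], ["f", "a"]]
print(mean_normalized_distance(predicted, attested))
```

A score of 0 means every form was predicted exactly; competing reconstruction systems could be ranked by this score on the same data.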
    52 / 60


  73. Possible Solutions Phonological Reconstruction
    Automated Phonological Reconstruction: Current Work
    In our work in the CALC project, we are currently
    A establishing linguistic reconstructions by collaborating with different
    researchers on specific subgroups,
    B testing semi-automatic methods for reconstruction, based on the
    algorithm for sound correspondence pattern detection by List (2019),
    C evaluating metrics for the comparison of reconstruction systems (List
    forthc.), and
    D testing multi-tier-based methods to test the predictive strength of
    different reconstruction systems.
    N. W. Hill (collaborator with CALC) will discuss the computer-assisted reconstruction of
    Proto-Burmish in his talk titled “Toward a computational implementation of the
    traditional comparative method” at the ICHL on Thursday.
    T. A. Bodt (collaborator with CALC) will present how semi-automated reconstruction
    methods can be further used in a talk titled “The predictive capacity of a
    computer-assisted framework of the comparative method” at the ICHL on Thursday.
    53 / 60


  74. Possible Solutions General Ideas
    General Ideas: Evaluation
    Problem:
    • lack of good benchmark datasets (especially gold standards, training data, and baselines)
    • lack of good evaluation measures
    Suggested solutions:
    • simulation methods (produce test data)
    • interfaces for data annotation (produce data and evaluate results)
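A minimal sketch of what a simulation that produces test data could look like, assuming a toy sound inventory, CV syllables, and unconditioned sound changes per descendant language. All names and parameters are hypothetical; realistic simulations would need conditioned changes, borrowing, and lexical replacement.

```python
import random

# Toy inventory, chosen only for illustration.
CONSONANTS = ["p", "t", "k", "m", "n", "s"]
VOWELS = ["a", "i", "u"]

def random_proto_form(rng, syllables=2):
    """Build a CVCV... proto-form as a list of segments."""
    form = []
    for _ in range(syllables):
        form.append(rng.choice(CONSONANTS))
        form.append(rng.choice(VOWELS))
    return form

def evolve(form, changes):
    """Apply unconditioned segment replacements (source -> target)."""
    return [changes.get(segment, segment) for segment in form]

def simulate(n_words, changes_by_language, seed=0):
    """Generate proto-forms and their reflexes in each descendant language."""
    rng = random.Random(seed)  # seeded so that test data is reproducible
    data = []
    for _ in range(n_words):
        proto = random_proto_form(rng)
        reflexes = {
            language: evolve(proto, changes)
            for language, changes in changes_by_language.items()
        }
        data.append({"proto": proto, "reflexes": reflexes})
    return data

data = simulate(5, {"A": {"p": "f"}, "B": {}})
for entry in data:
    print(entry["proto"], entry["reflexes"])
```

Because the proto-forms and the changes are known, data of this kind could serve as a gold standard against which a reconstruction method is scored.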
    54 / 60


  75. Possible Solutions General Ideas
    General Ideas: Standards
    • are needed to make linguistic data comparable
    • allow for a better integration of software and data
    • can also guarantee that data is available in both human- and machine-readable form
    Forkel et al. (2018): “Cross-Linguistic Data Formats [...]” Scientific Data.
    https://cldf.clld.org.
    55 / 60


  76. Possible Solutions General Ideas
    General Ideas: Standards
    Reference Catalogs: Glottolog (languages), Concepticon (concepts), CLTS (sounds)
    Validation Software:
    >>> from pycldf import Dataset
    >>> ds = Dataset.from_metadata('path')
    >>> ds.validate()
    >>> ds.stats()
    Spreadsheet Formats (CLDF):
    ID   CONCEPT   IPA      COGNACY
    1    hand      hant     1
    2    hand      hænd     1
    3    ruka      ruka     2
    4    rẽnka     rẽnka    2
    ...  ...       ...      ...
    Online Publication (CLLD)
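To illustrate why a machine-readable spreadsheet format helps, here is a minimal sketch that reads the small table from this slide and groups the forms by their cognate set identifier. It uses only Python's standard `csv` module on a tab-separated stand-in; it is not the CLDF specification and does not use pycldf.

```python
import csv
import io
from collections import defaultdict

# The table from the slide as tab-separated text (a simplified stand-in for CLDF).
TABLE = """ID\tCONCEPT\tIPA\tCOGNACY
1\thand\thant\t1
2\thand\thænd\t1
3\truka\truka\t2
4\trẽnka\trẽnka\t2
"""

def cognate_sets(text):
    """Group the IPA forms by the value in the COGNACY column."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    sets = defaultdict(list)
    for row in reader:
        sets[row["COGNACY"]].append(row["IPA"])
    return dict(sets)

print(cognate_sets(TABLE))
```

The same grouping step is what an alignment or reconstruction workflow would start from once cognate judgments are annotated.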
    55 / 60


  77. Possible Solutions General Ideas
    General Ideas: Standards
    55 / 60


  78. Possible Solutions General Ideas
    General Ideas: Interfaces
    • allow for a rapid annotation of data
    • guarantee that data is human- and machine-readable
    • allow for qualitative and quantitative research at the same time
    List (2017): “A web-based tool [...]” Proc. of the EACL System Demonstrations.
    https://edictor.digling.org.
    56 / 60


  79. Possible Solutions General Ideas
    General Ideas: Interfaces
    [Figure: EDICTOR (Etymological DICTionary ediTor) interface with a wordlist table; columns ID, DOCULECT, CONCEPT, SEGMENTS; rows 1 to 4 for the concept “moon” in Běijīng, Guǎngzhōu, Měixiàn, and Fúzhōu; SEGMENTS entries shown: N U O ?, wOld, yuE_5_1liaN_1]
    Conversion and Segmentation: yuE_5_1liaN_1 → yɛ⁵¹liɑŋ¹ → y ɛ ⁵¹ l i ɑ ŋ ¹
    Highlighting of Unrecognized Phonetic Symbols
    Functions: annotate data, analyze data, edit alignments
    http://edictor.digling.org
    List (2017)
    56 / 60
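The conversion and segmentation step shown on this slide can be sketched as a greedy longest-match lookup in a conversion table. The table below is a hypothetical profile written only to reproduce the slide's example (yuE_5_1liaN_1 to y ɛ ⁵¹ l i ɑ ŋ ¹); it is not EDICTOR's actual implementation. Symbols that the profile does not know are flagged with a leading `?`, in the spirit of the highlighting of unrecognized phonetic symbols.

```python
# Hypothetical conversion profile: input grapheme -> IPA segment.
PROFILE = {
    "yu": "y",
    "E": "ɛ",
    "_5_1": "⁵¹",
    "l": "l",
    "i": "i",
    "a": "ɑ",
    "N": "ŋ",
    "_1": "¹",
}

def segment(text, profile):
    """Convert and segment text by greedy longest match against the profile."""
    graphemes = sorted(profile, key=len, reverse=True)  # longest first
    segments = []
    position = 0
    while position < len(text):
        for grapheme in graphemes:
            if text.startswith(grapheme, position):
                segments.append(profile[grapheme])
                position += len(grapheme)
                break
        else:
            # No grapheme matched: flag the character as unrecognized.
            segments.append("?" + text[position])
            position += 1
    return segments

print(" ".join(segment("yuE_5_1liaN_1", PROFILE)))
print(" ".join(segment("wOld", PROFILE)))
```

Matching the longest grapheme first is what keeps `_5_1` from being split into `_5` and `_1`.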


  80. Outlook
    Outlook
    *deh3
    -
    ?
    57 / 60


  81. Outlook Measuring
    Measuring
    «Measure what is measurable, and make measurable what is not
    so.» (Galileo Galilei [quote apparently falsely attributed to Galilei,
    see Kleinert 2009])
    58 / 60


  82. Outlook Towards Big Data
    Towards Big Data
    CLICS: Database of Cross-Linguistic Colexifications
    http://clics.clld.org
    List et al. (2018)
    >1000 languages
    >1500 concepts
    59 / 60


  83. Outlook Towards Big Data
    Towards Big Data
    CLICS: Database of Cross-Linguistic Colexifications
    http://clics.clld.org
    List et al. (2018)
    [Figure: colexification network of concept labels. Largest connected component in CLICS²; clusters inferred with the Infomap Community Detection algorithm.]
    59 / 60


  84. Outlook Towards Big Data
    Towards Big Data
    CLICS: Database of Cross-Linguistic Colexifications
    http://clics.clld.org
    List et al. (2018)
    CARRY IN HAND
    CARRY UNDER ARM
    RULE
    ORDER
    SALT
    TAKE
    CHOOSE
    LEND SHARE
    BRING
    FORGET
    ACQUIT
    HAVE SEX
    HAND
    LIBERATE
    DIRTY
    GUEST
    ARM
    BETWEEN
    UPPER ARM
    MOLD
    TORCH OR LAMP
    OWN
    GAP (DISTANCE)
    DRIP (EMIT LIQUID)
    FINGERNAIL OR TOENAIL
    RIVER
    KISS
    RAIN (PRECIPITATION)
    WHEN
    SPOON
    SUCK
    ROUND
    LICK
    FINGERNAIL
    CLAW SOUP
    DRINK
    FORK
    PITCHFORK
    WATER
    SEA
    OPEN
    SMOKE (INHALE)
    LET GO OR SET FREE
    CAUSE
    DIRT
    FORKED BRANCH
    SEND
    LIP
    FORGIVE
    UNTIE
    ANCHOR
    EAT
    BITE
    BEVERAGE
    SWALLOW
    SAP
    URINE
    ANKLE
    FISHHOOK
    WHEEL
    WHERE
    LIFT
    CHIEFTAIN
    LOWER ARM
    CAUSE TO (LET)
    QUEEN
    GIVE
    ELBOW
    DONATE
    ELECTRICITY
    SKY
    STORM CLOUDS
    MUD
    SWAMP
    SMOKE (EXHAUST)
    FRESH
    SMOKE (EMIT SMOKE)
    STRANGER
    CEASE
    MOORLAND
    HOST
    GO UP (ASCEND)
    WEDDING
    CLIMB
    CLOUD
    PALM OF HAND
    FIVE
    MARRY
    RISE (MOVE UPWARDS)
    WRIST
    KING
    PRESIDENT
    FATHOM
    COLLARBONE
    RIDE
    SPACE (AVAILABLE)
    MASTER
    SHOULDER
    BROOM
    RAKE
    FLESH
    HOOK
    DRIBBLE
    SPIT
    TOE
    PAW
    OCEAN
    FINGER
    LAKE
    EDGE
    OBSCURE
    TOP
    NIGHT
    INCREASE
    WORLD
    UP
    DARKNESS
    BE
    GOD
    CALF OF LEG
    LEG
    SHIN
    FISH
    LOWER LEG
    WOMAN
    FEMALE (OF PERSON)
    FEMALE
    FEMALE (OF ANIMAL)
    LAGOON
    CORNER
    BORDER
    BESIDE
    FRINGE
    BOUNDARY
    WIFE
    COAST
    POINTED
    SHARP
    SHORE
    PLACE (POSITION)
    END (OF SPACE)
    EARTH (SOIL)
    BLACK
    STAND UP
    CHEW
    MEAL
    BREAKFAST
    HEEL
    FOOD
    DINNER (SUPPER)
    FOOT
    STAR
    SAND
    CLAY
    STAND
    SHOULDERBLADE
    CRAWL
    WAKE UP FOG
    FINISH
    DARK
    MALE ICE
    WAIST
    MARRIED MAN
    HIP
    DEEP
    LUNG
    FOAM
    REMAINS
    BLUE
    WAIT (FOR)
    LIFE
    LATE
    BE ALIVE
    AFTER
    TOWN
    BEHIND
    ASH
    FLOUR
    STATE (POLITICS)
    NEW
    UPPER BACK
    BOTTOM
    PASTURE
    THATCH
    BUTTOCKS
    MAN
    MALE (OF ANIMAL)
    MALE (OF PERSON)
    SIT DOWN
    TALL
    CROUCH
    EVENING
    AFTERNOON
    HIGH
    WEST
    GROW
    MAINLAND
    SIT
    LAND
    FLOOR
    AREA
    HALT (STOP)
    DUST
    REMAIN
    GROUND
    NATIVE COUNTRY
    DWELL (LIVE, RESIDE)
    COUNTRY
    HUSBAND
    BACK
    END (OF TIME)
    SPINE
    GRASS
    DEW
    MARRIED WOMAN
    ROOSTER
    INSECT
    FOWL
    BIRD
    ANIMAL
    HEN
    SHORT
    BABY
    CORN FIELD
    THIN
    SAGO PALM
    GARDEN
    SMALL
    THIN (OF SHAPE OF OBJECT)
    CLAN
    NARROW
    FAMILY
    YOUNG
    CITIZEN
    FINE OR THIN
    SHALLOW
    THIN (SLIM)
    GIRL
    RELATIVES
    YOUNG MAN
    FRIEND
    PARENTS
    CHILD (DESCENDANT)
    YOUNG WOMAN
    BOY
    NEIGHBOUR
    CHILD (YOUNG HUMAN)
    SON
    SIBLING
    BROTHER
    DESCENDANTS
    OLDER SIBLING
    DAUGHTER
    ALONE
    FENCE
    ONLY
    FEW
    TOWER
    SOME
    ONE
    YARD
    OUTSIDE
    FORTRESS
    NEVER
    PLAIN
    PEOPLE
    VALLEY
    DOWN
    FIELD
    LOW
    PERSON
    YOUNGER SIBLING
    YOUNGER SISTER
    OLDER BROTHER
    YOUNGER BROTHER
    COUSIN
    SISTER
    OLDER SISTER
    NEPHEW
    DAMP
    FLOWER
    MANY
    SMOOTH
    WIDE
    FLAT
    BLOOD
    WET
    BELOW OR UNDER
    DOWN OR BELOW
    GREY
    BREAD
    DOUGH
    RAW
    VILLAGE
    GREEN
    CROWD
    SOFT
    AT
    ALL
    SLIP
    UNRIPE
    VEIN
    BLOOD VESSEL
    ALWAYS
    TENDON
    ROOF
    ROOT
    INSIDE
    OR
    GENTLE
    OLD
    WITH
    ENOUGH
    OLD (AGED)
    FORMER
    AND
    ROOM
    HOME
    TENT
    HUT
    GARDEN-HOUSE
    WEAK
    DENSE
    MEN'S HOUSE
    OLD MAN
    LAZY
    STILL (CONTINUING)
    TIRED
    AGAIN
    MORE
    READY
    OLD WOMAN
    SOMETIMES
    IN
    HOUSE
    OFTEN
    YELLOW
    RED
    AFTERWARDS
    BIG
    GOLD
    YOLK
    HOUR
    SALTY
    PINCH
    KNEEL
    AGE
    RIPE
    THICK
    FULL
    STRAIGHT
    BE LATE
    LIGHT (RADIATION) ABOVE
    WORK (ACTIVITY)
    PRODUCE
    MAKE
    DAY (NOT NIGHT)
    HEAVEN
    WORK (LABOUR) BUILD
    FAR
    AT THAT TIME
    LONG
    WHITE
    LENGTH
    THEN
    MOUNTAIN OR HILL
    SEASON
    HAVE
    PRESS
    GET
    PICK UP
    HEAD
    HOLD
    EARN
    DO OR MAKE
    WEATHER
    FATHER
    STEPFATHER
    UNCLE
    FATHER-IN-LAW (OF MAN)
    FATHER'S BROTHER
    MOTHER'S BROTHER
    STEPMOTHER
    AUNT
    BEGINNING
    BEGIN
    FIRST
    FATHER'S SISTER
    MOTHER-IN-LAW (OF WOMAN)
    MOTHER'S SISTER
    MOTHER
    MOTHER-IN-LAW (OF MAN)
    PARENTS-IN-LAW
    GRANDDAUGHTER
    SON-IN-LAW (OF WOMAN)
    FATHER-IN-LAW (OF WOMAN)
    SON-IN-LAW (OF MAN)
    DAUGHTER-IN-LAW (OF WOMAN)
    CHILD-IN-LAW
    SIBLING'S CHILD
    NIECE
    GRANDFATHER
    DAUGHTER-IN-LAW (OF MAN)
    IN FRONT OF
    FORWARD
    GRANDSON
    GRANDCHILD
    GRANDMOTHER
    ANCESTORS
    GRANDPARENTS
    THING
    STREET
    MANNER
    ROAD
    PIECE
    PORT
    PATH OR ROAD
    PATH
    RIB
    BONE
    BAIT
    THIGH
    BAY
    FLESH OR MEAT MEAT FOOTPRINT
    SIDE
    PART
    SLICE
    WALL (OF HOUSE)
    MIDDLE
    NAVEL
    SNOW
    LAST (FINAL)
    HAY HALF
    NEAR
    CHICKEN
    BULL
    SNAKE
    WORM
    CATTLE
    LIVESTOCK
    CALF
    OX
    COW
    WHICH
    WHITHER (WHERE TO)
    WINE
    HOW
    CIRCLE
    RING
    BALL
    BRACELET
    HOW MUCH
    HOW MANY
    BEEHIVE
    GRAVE
    CAVE
    BEARD
    RAIN (RAINING)
    SPRING OR WELL
    MOUSTACHE
    STREAM
    GLUE
    ALCOHOL (FERMENTED DRINK)
    BEE
    BEER
    HONEY
    WHO WASP
    MEAD
    WHAT
    WHY
    CANDY
    LUNCH
    ITEM
    WARE
    CUSTOM
    LAW
    MIDDAY
    PIT (POTHOLE)
    HOLE
    FURROW
    DITCH
    LAIR
    JUDGMENT
    COURT
    ADJUDICATE
    CONDEMN
    CONVICT
    ACCUSE
    BLAME
    ANNOUNCE
    PREACH
    EXPLAIN
    SAY
    ASK (REQUEST)
    THROW
    BUDGE (ONESELF)
    SHOOT
    EMBERS
    UGLY
    CHOP
    CUT DOWN
    COLD (OF WEATHER)
    FIREWOOD
    GRASP
    LEAD (GUIDE)
    DISTANCE
    LIE DOWN
    CARRY ON HEAD
    PERMIT
    PUSH
    MOLAR TOOTH
    FRONT TOOTH (INCISOR)
    RIDGEPOLE
    BEAK
    COAT
    TOWEL
    HELMET
    SHIRT
    HEADBAND
    HEADGEAR
    RAG
    VEIL
    SOON
    TOGETHER
    IMMEDIATELY
    NEST
    NOW
    BED
    TODAY
    INSTANTLY
    SUDDENLY
    RUG
    WITHOUT
    PONCHO
    BLANKET
    CLOAK
    MAT
    BEFORE
    BOLT (MOVE IN HASTE)
    ROAR (OF SEA)
    FAST
    DASH (OF VEHICLE)
    EARLY
    YESTERDAY
    HURRY
    AT FIRST
    EMPTY
    NO
    DRY
    ZERO
    NOTHING
    NOT
    RESULT IN
    BE BORN
    HAPPEN
    PASS
    SUCCEED
    BECOME
    BRAVE
    CLOTH
    POWERFUL
    DARE
    LOUD
    GRASS-SKIRT
    DRESS
    CLOTHES
    SKIRT
    RIPEN
    SOLID
    PIERCE
    HARD
    BEGET
    ROUGH
    REFUSE
    FRY
    DRESS UP
    DENY
    CALM
    MORNING
    PEACE
    BE SILENT
    QUIET
    SWELL
    TOMORROW
    HEALTHY
    EXPENSIVE
    HAPPY
    ROAST OR FRY
    STRONG BAKE
    PRICE
    BOIL (SOMETHING)
    PUT ON
    COOKED
    SLOW
    FAITHFUL
    RIGHT
    LAST (ENDURE)
    FOR A LONG TIME
    DAWN
    BEAUTIFUL
    GOOD
    COOK (SOMETHING)
    YES
    CORRECT (RIGHT)
    BOIL (OF LIQUID)
    DO
    PUT
    BRIGHT
    CLEAN
    LIGHT (COLOR)
    LAY (VERB)
    SHINE
    SEAT (SOMEBODY)
    INNOCENT
    FORBID
    PREPARE
    CERTAIN
    TRUTH TRUE
    DEAR
    PRECIOUS
    WARM
    HEAT
    CONCEIVE
    SEW
    LOOM
    PLAIT
    LIGHT (IGNITE)
    BURN (SOMETHING) PREVENT
    HOLY
    GOOD-LOOKING
    ARSON
    BEND
    CHANGE (BECOME DIFFERENT)
    BURNING
    TWIST
    DEBT
    CROOKED
    ROLL
    SPIN
    HEAVY
    HOT
    WEAVE
    DIFFICULT
    FEVER
    PLAIT OR BRAID OR WEAVE
    PREGNANT
    OWE
    TWINKLE
    CLEAR
    BEND (SOMETHING)
    MORTAR CRUSHER
    PESTLE
    BITTER
    MILL MONTH SKULL
    MEASURE
    TRY
    COME BACK TIME
    MOON
    COUNT
    JOIN
    SQUEEZE
    PILE UP
    CLOCK
    BUY
    DRAW MILK
    DAY (24 HOURS)
    BETRAY
    GUARD
    PROTECT
    PAY
    KNEE
    KEEP
    SELL
    SUN
    BILL
    HELP
    LIE (MISLEAD)
    TRADE OR BARTER
    DECEIT
    PERJURY
    RESCUE
    CURE
    FOLD
    SIEVE
    PRESERVE
    TRANSLATE
    TURN (SOMETHING)
    TURN
    WRAP
    HERD (SOMETHING)
    WAGES
    DEFEND
    CHANGE
    RETURN HOME
    TIE UP (TETHER)
    TURN AROUND
    HANG
    KNIT
    WEIGH
    HANG UP
    GIVE BACK
    CONNECT
    COVER
    BUTTON
    BUNCH
    KNOT
    SHUT
    BUNDLE
    TIE
    NOOSE
    GILL
    EAR
    EARLOBE
    THINK
    FOLLOW
    JEWEL
    BE ABLE
    OBEY
    SUMMER
    FEEL (TACTUALLY)
    REMEMBER
    SUSPECT
    BELIEVE
    GUESS
    RECOGNIZE (SOMEBODY)
    SOUR
    SWEET
    SUGAR CANE
    BRACKISH
    SUGAR
    TASTY
    CALCULATE
    IMITATE
    CITRUS FRUIT
    TASTE (SOMETHING)
    READ
    COME
    PRECIPICE
    SEE
    STONE OR ROCK
    APPROACH
    TOUCH
    ARRIVE
    YEAR
    MEET
    GRIND
    FRAGRANT
    ROTTEN SMELL (STINK)
    SMELL (PERCEIVE)
    STINKING
    SNIFF
    PUS
    FEEL
    UNDERSTAND
    HEAR
    THINK (BELIEVE)
    LISTEN
    MOVE (AFFECT EMOTIONALLY)
    KNOW (SOMETHING)
    NOTICE (SOMETHING)
    WATCH
    LEARN
    REEF
    STUDY
    LOOK FOR
    LOOK
    NASAL MUCUS (SNOT)
    SPLASH
    PITY
    HIDE (CONCEAL)
    SHELF
    FLY (MOVE THROUGH AIR)
    REGRET
    NOSTRIL
    THIEF
    BOARD
    SINK (DESCEND)
    DECREASE
    CHEEK
    NOSE
    BROKEN
    LOSE
    EMERGE (APPEAR)
    ANXIETY
    BAD LUCK
    GOOD LUCK
    OMEN
    WRONG
    SLAB
    FOREHEAD
    EYE
    BAD
    EVIL
    TABLE
    INJURE
    DANGER
    SURPRISED
    HARVEST
    BERRY
    FEAR (FRIGHT)
    NUT FAULT
    MISTAKE
    BECOME SICK
    SEED
    MISS (A TARGET)
    GUILTY
    SWELLING
    BRUISE
    BLISTER
    BOIL (OF SKIN)
    SCAR
    CHOKE
    ENTER
    ACHE
    SICK
    DISEASE
    PAIN
    DAMAGE (INJURY)
    SEVERE
    GRIEF
    SAUSAGE
    BEAD
    STOMACH
    INTESTINES
    CHAIN
    SPLEEN
    NECKLACE
    WOMB
    LIVER
    BELLY
    MEANING
    GHOST
    POSTCARD
    HEART
    LEGENDARY CREATURE
    SHADE
    DEMON
    BRAIN MEMORY
    FIGHT
    LETTER
    THOUGHT
    MIND
    BOOK
    COLLAR INTENTION
    SPIRIT
    PURSUE
    LONG HAIR
    SPRINGTIME
    HAIR (HEAD)
    THINK (REFLECT)
    DOUBT
    AUTUMN
    ORNAMENT
    HOPE
    ARMY
    QUARREL
    BEAT
    SOLDIER
    KNOCK
    BATTLE
    NOISE
    REST
    NAPE (OF NECK)
    THROAT
    NECK
    IDEA
    IF
    BECAUSE
    SLEEP
    FOREST
    DRIP (FALL IN GLOBULES)
    STICK
    TREE
    WALKING STICK
    PLANT (VEGETATION)
    LIE (REST)
    DRAG
    ASK (INQUIRE)
    DIVIDE
    URGE (SOMEONE)
    STING
    BRANCH
    CAMPFIRE BORROW SEPARATE TOOTH
    MOUTH
    CANDLE
    FALL ASLEEP
    DRIVE (CATTLE)
    MATCH
    DRIVE
    RAFTER
    BEAM
    DOORPOST
    [Figure: Largest connected component in CLICS², with clusters inferred with the Infomap community detection algorithm (List et al., under review).]
    [Figure detail, cluster of speech-related concepts: TONGUE, TELL, ANNOUNCE, TALK, ADMIT, CHAT (WITH SOMEBODY), SAY, WORD, ANSWER, LANGUAGE, VOICE, SOUND OR NOISE, NOISE, PREACH, SPEECH, TONE (MUSIC), EXPLAIN, CONVERSATION, CONVEY (A MESSAGE), SPEAK]
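    As a rough illustration of what "largest connected component" means for a colexification network like the one in this figure, the following Python sketch builds a tiny network from invented data and extracts its components. The language names, word forms, and the helper names `colexification_graph` and `connected_components` are all hypothetical and are not taken from CLICS². The sketch covers only the connected-component step; the Infomap community detection named in the figure caption is not implemented here.

```python
from collections import defaultdict, deque
from itertools import combinations

# Hypothetical toy data: (language, word form, concept). Not taken from CLICS².
WORDS = [
    ("lang_a", "kata", "SAY"),
    ("lang_a", "kata", "SPEAK"),
    ("lang_a", "lim", "TONGUE"),
    ("lang_a", "lim", "LANGUAGE"),
    ("lang_b", "voz", "VOICE"),
    ("lang_b", "voz", "SOUND OR NOISE"),
    ("lang_b", "dil", "TONGUE"),
    ("lang_b", "dil", "LANGUAGE"),
    ("lang_b", "dil", "SPEECH"),
    ("lang_c", "par", "SPEAK"),
    ("lang_c", "par", "SPEECH"),
    ("lang_c", "ku", "TREE"),
    ("lang_c", "ku", "WOOD"),
]


def colexification_graph(words):
    """Link two concepts whenever one language uses the same form for both."""
    by_form = defaultdict(set)
    for language, form, concept in words:
        by_form[(language, form)].add(concept)
    # Map each concept pair to the set of languages attesting the link.
    edges = defaultdict(set)
    for (language, _form), concepts in by_form.items():
        for a, b in combinations(sorted(concepts), 2):
            edges[(a, b)].add(language)
    return edges


def connected_components(edges):
    """Return the components of the undirected graph, largest first."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, components = set(), []
    for start in neighbours:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        queue, component = deque([start]), {start}
        seen.add(start)
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node] - seen:
                seen.add(nxt)
                component.add(nxt)
                queue.append(nxt)
        components.append(component)
    return sorted(components, key=len, reverse=True)


EDGES = colexification_graph(WORDS)
COMPONENTS = connected_components(EDGES)
LARGEST = COMPONENTS[0]
```

    In this toy network the speech-related concepts end up in one component, while TREE and WOOD form a separate one. A community detection step would then subdivide a large component into clusters like the one listed in the figure.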
    59 / 60

  85. Outlook New Hypotheses
    From Problems to Solutions
    Formulating the open problems of our field is a first step towards
    solving them. In particular, searching for problems that may have
    been overlooked so far is itself a step towards a deeper
    understanding of our research and our research object.
    60 / 60

  86. Outlook New Hypotheses
    From Problems to Solutions
    Thanks for your attention!
    • Principal Investigator: Dr. Johann-Mattis List
    • Project Full Title: Computer-Assisted Language Comparison. Reconciling Computational and Classical Approaches in Historical Linguistics
    • Project Short Name: CALC
    • Project duration: 04/2017 to 03/2022
    • Host institution: Department of Linguistic and Cultural Evolution (MPI-SHH)
    Thanks to CALC associates, advisors, and critics: Cormac Anderson, Wolfgang Behr,
    Timotheus A. Bodt, Thiago Chacon, Michael Cysouw, Robert Forkel, Hans Geisler,
    Guido Grimm, Simon Greenhill, Russell Gray, Guillaume Jacques, Gerhard Jäger,
    Gereon Kaiping, Yunfan Lai, Nathan W. Hill, David Morrison, Justin Power, Taraka
    Rama, Christoph Rzymski, Laurent Sagart, Nathanael Schweikhard, George Starostin,
    Tiago Tresoldi, Mary Walworth, Søren Wichmann, Mei-Shin Wu
    60 / 60
