Representing concepts for the purpose of cross-linguistic language comparison

Keynote held at CARLA 2020, the second international workshop on "Concepts in Action: Representation, Learning, and Application" (2020-09-23, virtual conference, Bolzano, University of Osnabrück).

Johann-Mattis List

September 23, 2020

Transcript

  1. Representing concepts for the purpose of
    cross-linguistic language comparison
    Johann-Mattis List
    Research Group “Computer-Assisted Language Comparison”
    Department of Linguistic and Cultural Evolution
    Max Planck Institute for the Science of Human History
    Jena, Germany
    2020/09/23
    1 / 32

  2. Background
    2 / 32
    Comparative Linguistics

  3. Background
    2 / 32
    "All languages change, as long as they exist."
    (August Schleicher 1863)
    [Figure: sound changes from Indo-European via Germanic and Old English to English (e.g., p > f > f > f, t > d > d > ð), with German and English as Germanic languages; images of a walkman and an iPod]
    Comparative Linguistics

  7. Background Background
    Background on Language Comparison
    3 / 32
    [Figure: Rasmus Rask, "Undersøgelse om det gamle Nordiske Sprogs Oprindelse" (1818), and Jacob Grimm, "Deutsche Grammatik" (2nd edition, 1822), comparing Icelandic, Old Indian, Sanskrit, Old Greek, and Latin]

  8. Background Background
    Background on Language Comparison
    3 / 32
    [Figure: Rasmus Rask, "Undersøgelse om det gamle Nordiske Sprogs Oprindelse" (1818), and Jacob Grimm, "Deutsche Grammatik" (2nd edition, 1822), relating Icelandic, Old Indian, Sanskrit, Old Greek, and Latin as Indo-European languages]
    Method for Language Comparison
    • intensive language comparison
    • identify regularly recurring similarities
    → prove language relationship
    → reconstruct the development of language families

  10. Background Comparative Method
    The Comparative Method
    4 / 32

  13. Background Computational Linguistics
    Computational Historical Linguistics
    5 / 32
    problems of computational approaches
    → lack of flexibility
    → lack of accuracy
    → often rely on manually annotated data
    → produce results in a black-box fashion
    phonetic alignment (List 2012, 2014):
    Breton      d   -  ã   n  t  -
    Danish      d̥ʰ  -  a   n  -  -
    Dutch       t   -  ɑ   n  t  -
    English     t   -  uː  -  θ  -
    French      d   -  ã   -  -  -
    German      t͜s  -  aː  n  -  -
    Greek       ð   -  o̞   n  d  i
    Italian     d   -  ɛ   n  t  e
    Portuguese  d   -  ẽ   -  t  ɨ
    Spanish     d   j  e   n  t  e
    phylogenetic reconstruction:
    [Figure: phylogenetic tree of French, Greek_Mod, Portuguese, Italian, Spanish, Breton, Dutch, English, Danish, and German]

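The phonetic alignment shown on this slide can be illustrated with a minimal pairwise sketch. It uses the textbook Needleman-Wunsch algorithm with simple, assumed scores (match +1, mismatch -1, gap -1), not the sound-class-based scoring of LingPy's SCA approach, and aligns the Italian and Spanish words from the table.

```python
from typing import List, Tuple


def align(a: List[str], b: List[str], match: int = 1, mismatch: int = -1,
          gap: int = -1) -> Tuple[List[str], List[str], int]:
    """Global pairwise alignment of two segment lists (Needleman-Wunsch)."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        # Assumed scoring: identical segments match, all others mismatch.
        return match if x == y else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell, writing "-" for gaps.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return out_a[::-1], out_b[::-1], score[n][m]


italian = ["d", "ɛ", "n", "t", "e"]
spanish = ["d", "j", "e", "n", "t", "e"]
row_it, row_es, total = align(italian, spanish)
print(" ".join(row_it))  # d - ɛ n t e
print(" ".join(row_es))  # d j e n t e
print(total)             # 2
```

With these scores the sketch happens to reproduce the Italian and Spanish rows of the table; more distant words would need phonetically informed scores to align sensibly.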
  16. 6 / 32
    The CALC Project

  17. 6 / 32
    Language families like Sino-Tibetan present
    "almost unsurmountable obstacles".
    (Antoine Meillet 1925)
    historical language comparison
    insights
    → language change
    → human prehistory
    → triggers of diversity of life and culture
    challenges
    → increasing amounts of data
    → large and diverse language families
    obstacles
    → classical methods reach their limit
    → computational methods cannot replace experts' experience and intuition
    The CALC Project

  20. The CALC Project Starting Point
    Classical and Computer-Based Language Comparison
    7 / 32
    [Diagram: the COMPARATIVE METHOD lacks efficiency and consistency, while COMPUTATIONAL HISTORICAL LINGUISTICS lacks accuracy and flexibility]

  22. The CALC Project CALC
    Computer-Assisted Language Comparison
    8 / 32
    [Diagram: COMPARATIVE METHOD (lacks efficiency and consistency) and COMPUTATIONAL HISTORICAL LINGUISTICS (lacks accuracy and flexibility), brought together in Computer-Assisted Language Comparison]

  23. The CALC Project CALC
    Computer-Assisted Language Comparison
    8 / 32

  24. The CALC Project CALC
    Computer-Assisted Language Comparison
    9 / 32
    Funding: ERC Starting Grant (2017-2022)
    Host Institution: MPI-SHH (Jena)
    Team: 2 postdocs, 4 doctoral students (2 financed by the project, 2 financed externally), PI
    Goal: establish a framework for CALC and show how to apply it to the Sino-Tibetan language family.
    https://digling.org/calc/

  25. 10 / 32
    Basics

  31. Basics of CALC Software
    LingPy
    11 / 32
    SOFTWARE
    Python Library
    >>> from lingpy import Wordlist, Alignments
    >>> wl = Wordlist('tst')  # load a wordlist from file
    >>> wl.coverage()  # number of concepts covered per language
    >>> alm = Alignments('tst', ref='cogid')  # requires a cognate ID column
    >>> alm.align()  # align the words within each cognate set
    State of the Art
    ✓ over 85 publications based on the software
    ✓ multiple phonetic alignments (List 2014)
    ✓ automatic cognate detection (List et al. 2017)
    ✓ correspondence pattern identification (List 2019)
    High Accuracy
    - multiple phonetic alignments (List 2014): 98% (pair scores)
    - automatic cognate detection (List et al. 2017): 89% (B-Cubed scores)
    - phylogenetic reconstruction (Rama et al. 2018): 0.08 (generalized quartet distance)
    - correspondence pattern identification (List 2019): NP-hard (no human attempts)
    LingPy.org

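Automatic cognate detection, listed among LingPy's features on this slide, can be sketched in a strongly simplified form: words are compared by normalized edit distance and linked into clusters when the distance falls below a threshold. The threshold of 0.6 and the single-linkage clustering are assumptions for illustration only; LingPy's own methods (e.g., LexStat) are considerably more sophisticated. The four forms (hant, hænd, ruka, rẽnka) are taken from the example table on the CLDF slide of this talk.

```python
from itertools import combinations
from typing import Dict, List


def edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance between two segment lists."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution or match
        prev = cur
    return prev[-1]


def cluster_cognates(words: Dict[str, List[str]], threshold: float = 0.6) -> Dict[str, int]:
    """Assign cognate IDs by linking words whose normalized distance is below threshold."""
    parent = {key: key for key in words}

    def find(key: str) -> str:
        # Union-find root lookup with path halving.
        while parent[key] != key:
            parent[key] = parent[parent[key]]
            key = parent[key]
        return key

    for k1, k2 in combinations(words, 2):
        a, b = words[k1], words[k2]
        if edit_distance(a, b) / max(len(a), len(b)) < threshold:
            parent[find(k1)] = find(k2)  # single linkage: merge the two clusters

    ids: Dict[str, int] = {}
    result: Dict[str, int] = {}
    for key in words:
        root = find(key)
        ids.setdefault(root, len(ids) + 1)
        result[key] = ids[root]
    return result


words = {
    "1": ["h", "a", "n", "t"],
    "2": ["h", "æ", "n", "d"],
    "3": ["r", "u", "k", "a"],
    "4": ["r", "ẽ", "n", "k", "a"],
}
print(cluster_cognates(words))  # {'1': 1, '2': 1, '3': 2, '4': 2}
```

On this toy input the clusters match the COGNACY column of the example table; on real data, plain edit distance misses many cognates whose sounds have changed regularly.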
  32. Basics of CALC Interfaces
    EDICTOR
    12 / 32
    INTERFACES
    [Figure: EDICTOR table with the columns ID, DOCULECT, CONCEPT, SEGMENTS, listing "moon" in Běijīng (1), Guǎngzhōu (2), Měixiàn (3), and Fúzhōu (4)]
    Conversion and Segmentation: yuE_5_1liaN_1 → yɛ⁵¹liɑŋ¹ → y ɛ ⁵¹ l i ɑ ŋ ¹
    Highlighting of Unrecognized Phonetic Symbols
    • annotate data
    • analyze data
    • edit alignments
    Etymological DICTionary ediTor (List 2017)
    http://edictor.digling.org

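The segmentation step shown for yɛ⁵¹liɑŋ¹ can be approximated with a few simple rules. This is a sketch under stated assumptions, not EDICTOR's actual implementation: superscript tone digits form one segment, combining diacritics and a small, assumed set of modifier letters attach to the preceding sound, and a tie bar joins two symbols into one segment.

```python
import unicodedata
from typing import List

# Superscript digits used as tone numbers, e.g. ⁵¹ (assumed convention).
TONE_DIGITS = set("⁰¹²³⁴⁵⁶⁷⁸⁹")
# Assumed subset of modifier letters that belong to the preceding sound:
# length ː ˑ, aspiration ʰ, palatalization ʲ, labialization ʷ,
# velarization ˠ, pharyngealization ˤ.
MODIFIERS = set("\u02d0\u02d1\u02b0\u02b2\u02b7\u02e0\u02e4")
# Tie bars (combining double breve below, combining double inverted breve).
TIE_BARS = {"\u035c", "\u0361"}


def segment(ipa: str) -> List[str]:
    """Split an IPA string into sound segments (simplified sketch)."""
    segments: List[str] = []
    for char in ipa:
        if not segments:
            segments.append(char)
            continue
        prev = segments[-1]
        if char in TONE_DIGITS:
            if prev[-1] in TONE_DIGITS:
                segments[-1] += char  # extend a tone contour such as ⁵¹
            else:
                segments.append(char)  # start a new tone segment
        elif unicodedata.combining(char) or char in MODIFIERS:
            segments[-1] += char  # diacritic or modifier letter
        elif prev[-1] in TIE_BARS:
            segments[-1] += char  # second symbol of a tied affricate like t͜s
        else:
            segments.append(char)
    return segments


print(" ".join(segment("yɛ⁵¹liɑŋ¹")))  # y ɛ ⁵¹ l i ɑ ŋ ¹
```

Symbols outside these rules simply become their own segments, which is where a tool like EDICTOR would highlight them for manual checking.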
  33. Basics of CALC Data
    Data
    13 / 32
    DATA
    CLDF
    Validation Software (pypi.org/project/pycldf/)
    >>> from pycldf import Dataset
    >>> ds = Dataset.from_metadata('path')  # path to the JSON metadata file
    >>> ds.validate()
    >>> ds.stats()
    Spreadsheet Formats (w3.org/2013/csvw/)
    ID  CONCEPT  IPA    COGNACY
    1   hand     hant   1
    2   hand     hænd   1
    3   ruka     ruka   2
    4   rẽnka    rẽnka  2
    ... ...      ...    ...
    Online publication with CLLD
    Reference catalogs: Glottolog (languages), Concepticon (concepts), CLTS (speech sounds)
    Cross-Linguistic Data Formats Initiative (Forkel, List et al. 2018)
    cldf.clld.org

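The spreadsheet-style table on this slide can be read with Python's standard csv module alone. The sketch below assumes the four columns ID, CONCEPT, IPA, and COGNACY from the example, runs two simple consistency checks (unique IDs, integer cognate IDs) of the kind a validator might perform, and groups the forms into cognate sets. It is not pycldf, which validates complete CLDF datasets against their JSON metadata.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

# The example table from the slide, written as comma-separated text.
TABLE = """ID,CONCEPT,IPA,COGNACY
1,hand,hant,1
2,hand,hænd,1
3,ruka,ruka,2
4,rẽnka,rẽnka,2
"""


def read_cognate_sets(text: str) -> Dict[str, List[str]]:
    """Check the table for basic consistency and group IPA forms by cognate ID."""
    rows = list(csv.DictReader(io.StringIO(text)))
    ids = [row["ID"] for row in rows]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate IDs in table")
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        if not row["COGNACY"].isdigit():
            raise ValueError(f"row {row['ID']}: cognate ID must be an integer")
        groups[row["COGNACY"]].append(row["IPA"])
    return dict(groups)


print(read_cognate_sets(TABLE))  # {'1': ['hant', 'hænd'], '2': ['ruka', 'rẽnka']}
```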
  36. 14 / 32
    Examples

  37. 14 / 32
    [Diagram: Linking Concepts, Semantic Networks, and Integrating Concepts, with the labels Colexification, Comparison, Data, and Text]
    Examples

  38. Examples Linking Concepts
    Linking Concepts: Starting Point
    Over the past centuries, scholars have produced a large number of
    concept lists.
    A concept list is, in its simplest form, a list of concepts (e.g., I, you,
    he/she, dog, cat) that scholars find interesting for some linguistic,
    anthropological, or cognitive study.
    Since the work of Morris Swadesh, who proposed basic vocabulary as an
    important concept for historical linguistics, the compilation of concept
    lists has increased even further.
    For a very long time, scholars largely ignored the abundance of concept
    lists produced in different fields and never tried to compare them
    systematically.
    15 / 32

  39. Examples Linking Concepts
    Linking Concepts: Data and Analysis
    In 2016, we published the first version of the Concepticon project (List
    et al. 2016, https://concepticon.clld.org), the first attempt to
    link the numerous concept lists which have been compiled so far.
    We link concept lists by defining Concept Sets, that is, abstract
    concepts which are given a unique ID and a gloss (to ease elicitation)
    along with a definition and (potentially) additional metadata.
    All items of a given concept list are linked to the Concepticon
    Concept Sets where possible.
    By now, Concepticon has 3755 Concept Sets and links to 310
    different concept lists.
    16 / 32

  40. Examples Linking Concepts
    Linking Concepts: Data and Analysis
    We have regularly maintained and updated the Concepticon since
    2016.
    By now, we have a team of about 8-10 regular contributors.
    All concept lists added to the project are rigorously checked in a
    code-based review procedure, along with computational checks for
    internal consistency.
    New lists can be linked to the Concepticon automatically and later
    refined manually (this works for up to 10 different languages).
    Concepticon is the reference catalog for concepts and elicitation
    glosses underlying the Cross-Linguistic Data Formats initiative
    (Forkel et al. 2018, https://cldf.clld.org).
    17 / 32

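The linking procedure described on the last two slides (concept sets with an ID, a gloss, and a definition; automatic linking of new lists, followed by manual refinement) can be illustrated with a toy sketch. The catalog entries, IDs, definitions, and normalization rules below are hypothetical and do not reproduce Concepticon's data or its mapping algorithm; unmatched items come back as None so that they can be refined by hand.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ConceptSet:
    """A reference concept with a unique ID, an elicitation gloss, and a definition."""
    id: str
    gloss: str
    definition: str


# Hypothetical mini-catalog: IDs and definitions are invented for illustration
# and are NOT real Concepticon entries.
CATALOG = [
    ConceptSet("CS-1", "HAND", "The part of the arm below the wrist."),
    ConceptSet("CS-2", "DOG", "A domesticated carnivorous mammal."),
    ConceptSet("CS-3", "WALK", "To move on foot at a regular pace."),
]


def normalize(gloss: str) -> str:
    """Lowercase a gloss and strip leading articles and the infinitive marker 'to'."""
    words = gloss.lower().strip().split()
    while words and words[0] in {"the", "a", "an", "to"}:
        words = words[1:]
    return " ".join(words)


def link(concept_list: List[str], catalog: List[ConceptSet]) -> Dict[str, Optional[str]]:
    """Map each item of a concept list to a concept set ID, or None if unmatched."""
    index = {normalize(cs.gloss): cs.id for cs in catalog}
    return {item: index.get(normalize(item)) for item in concept_list}


print(link(["Hand", "the dog", "to walk", "navel"], CATALOG))
# {'Hand': 'CS-1', 'the dog': 'CS-2', 'to walk': 'CS-3', 'navel': None}
```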
  41. Examples Linking Concepts
    Linking Concepts: Results
    Concepticon is increasingly used by scholars who want to design their
    own questionnaires or surveys for collecting lexical data from the
    languages of the world.
    Concepticon is the core component that allowed for the relaunch of
    the CLICS database (see Semantic Networks, next example).
    The data is growing at a steady pace, and the procedures for
    error-checking and evaluation are constantly being refined.
    Our code-based data curation approach has proven very efficient
    for projects with long-term goals.
    Individual issues of defining concepts the way we do in Concepticon
    have been discussed in blog posts (e.g., List 2018).
    18 / 32

  42. Examples Linking Concepts
    Linking Concepts: Plans
    Version 2.4 is expected to bring another major extension of the
    Concepticon project with even more concept lists.
    We are working on integrating Concepticon with the NoRaRe database
    (last example in this talk).
    We are conducting initial experiments to enhance our automated mapping
    algorithm (also considering machine learning), which is needed to give
    projects that work with large amounts of data (e.g., NLP projects)
    access to Concepticon data.
    19 / 32

  43. Examples Semantic Networks
    Semantic Networks: Starting Point
    20 / 32
                forest        tree      wood      stem      branch   root
    French      fɔʀɛ, bwɑ     aʀbrə     bwɑ       tʀɔ       bʀɑʃ     ʀasin
    Russian     lʲes          dʲerɪva   dʲerɪva   stvɔl     vʲetvʲ   kɔrɪnʲ
    Croatian    ʃuma          staːblɔ   drvɔ      staːblɔ   graːna   kɔriɛn
    Yukaghir    aːnmonilʲe    saːl      saːl      tʃilge    tʃilge   waruluː
    Yaqui       dʒuja         dʒuja     kuta      naːwa     budʒa    naːwa
    [Figure: colexification network of these concepts, with edge weights 1, 1, 2, 1, 1, 1]
    Colexification: collective term for polysemy and homophony

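Colexifications like those in the table on this slide can be counted automatically. The sketch below takes the forms of the table (keeping both French forms for "forest") and counts, for each pair of concepts, how many languages express both with an identical form. Treating identical transcriptions within one language as colexification is a simplifying assumption; CLICS relies on a much larger, curated data pipeline.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List

CONCEPTS = ["forest", "tree", "wood", "stem", "branch", "root"]
# Word forms per language, in the order of CONCEPTS (from the table above).
DATA: Dict[str, List[List[str]]] = {
    "French":   [["fɔʀɛ", "bwɑ"], ["aʀbrə"], ["bwɑ"], ["tʀɔ"], ["bʀɑʃ"], ["ʀasin"]],
    "Russian":  [["lʲes"], ["dʲerɪva"], ["dʲerɪva"], ["stvɔl"], ["vʲetvʲ"], ["kɔrɪnʲ"]],
    "Croatian": [["ʃuma"], ["staːblɔ"], ["drvɔ"], ["staːblɔ"], ["graːna"], ["kɔriɛn"]],
    "Yukaghir": [["aːnmonilʲe"], ["saːl"], ["saːl"], ["tʃilge"], ["tʃilge"], ["waruluː"]],
    "Yaqui":    [["dʒuja"], ["dʒuja"], ["kuta"], ["naːwa"], ["budʒa"], ["naːwa"]],
}


def colexification_network(data: Dict[str, List[List[str]]],
                           concepts: List[str]) -> Counter:
    """Count, for each concept pair, the languages that express both with one form."""
    edges: Counter = Counter()
    for forms in data.values():
        # Collect the concepts expressed by each distinct form in this language.
        by_form = defaultdict(set)
        for concept, variants in zip(concepts, forms):
            for form in variants:
                by_form[form].add(concept)
        # Each language contributes at most once per concept pair.
        pairs = set()
        for group in by_form.values():
            pairs.update(combinations(sorted(group), 2))
        edges.update(pairs)
    return edges


for (c1, c2), weight in sorted(colexification_network(DATA, CONCEPTS).items()):
    print(f"{c1} -- {c2}: {weight}")
# branch -- stem: 1
# forest -- tree: 1
# forest -- wood: 1
# root -- stem: 1
# stem -- tree: 1
# tree -- wood: 2
```

The result is a weighted, undirected network with six edges; tree and wood are colexified in two languages (Russian and Yukaghir).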
  49. Examples Semantic Networks
    Semantic Networks: Data and Analysis
    21 / 32
    INTERFACES, SOFTWARE, DATA
    Database of Cross-Linguistic Colexifications (CLICS)
    https://clics.clld.org
    Interactive web application for browsing the data
    Test-based data lifting and curation (CLDF)

  50. Examples Semantic Networks
    Semantic Networks: Results
    22 / 32
    List, J.-M., A. Terhalle, and M. Urban (2013): Using network approaches
    to enhance the analysis of cross-linguistic polysemies. In: Proceedings of
    the 10th International Conference on Computational Semantics: Short
    Papers. Association for Computational Linguistics. 347-353.

  51. Examples Semantic Networks
    Semantic Networks: Results
    22 / 32
    Mayer, T., J.-M. List, A. Terhalle, and M. Urban (2014): An interactive
    visualization of cross-linguistic colexification patterns. In: Visualization as
    added value in the development, use and evaluation of Linguistic
    Resources. Workshop organized as part of the International Conference
    on Language Resources and Evaluation. 1-8.

  52. Examples Semantic Networks
    Semantic Networks: Results
    22 / 32
    List, J.-M., S. Greenhill, C. Anderson, T. Mayer, T. Tresoldi, and R.
    Forkel (2018): CLICS². An improved database of cross-linguistic
    colexifications assembling lexical data with help of cross-linguistic data
    formats. Linguistic Typology 22.2. 277-306.

  53. Examples Semantic Networks
    Semantic Networks: Results
    22 / 32
    Rzymski, C., T. Tresoldi, S. Greenhill, M. Wu, N. Schweikhard, M.
    Koptjevskaja-Tamm, V. Gast, T. Bodt, A. Hantgan, G. Kaiping, S. Chang,
    Y. Lai, N. Morozova, H. Arjava, N. Hübler, E. Koile, S. Pepper, M. Proos,
    B. Epps, I. Blanco, C. Hundt, S. Monakhov, K. Pianykh, S. Ramesh, R.
    Gray, R. Forkel, and J.-M. List (2020): The Database of Cross-Linguistic
    Colexifications, reproducible analysis of cross-linguistic polysemies.
    Scientific Data 7.13. 1-12.

  54. Examples Semantic Networks
    Semantic Networks: Results
    22 / 32
    CLICS¹ (2014)
    CLICS² (2018)
    CLICS³ (2020)

  55. Examples Semantic Networks
    Semantic Networks: Results
    22 / 32
    Jackson, J., J. Watts, T. Henry, J.-M. List, P. Mucha, R. Forkel, S.
    Greenhill, and K. Lindquist (2019): Emotion semantics show both cultural
    variation and universal structure. Science 366.6472. 1517-1522.

  56. Examples Semantic Networks
    Semantic Networks: Results
    22 / 32

  60. Examples Semantic Networks
    Semantic Networks: Plans
    Expanding colexification analyses to include partial colexifications and
    directed networks.
    Creating partial colexification data for testing and training.
    Conducting targeted colexification studies.
    23 / 32

  61. Examples Integrating Concepts
    Integrating Concepts: Starting Point
    24 / 32
    Tjuka et al. (under review): 10.31234/osf.io/tgw3z

  62. Examples Integrating Concepts
    Integrating Concepts: Starting Point
    There is a wealth of data about concepts produced by historical
    linguists, corpus linguists, computational linguists, and
    psycholinguists.
    These data are rarely integrated properly.
    If they were integrated with resources like the Concepticon, this
    would offer a wealth of new possibilities for our research.
    24 / 32

  63. Examples Integrating Concepts
    Integrating Concepts: Data and Analysis
    25 / 32
    Tjuka et al. (under review): 10.31234/osf.io/tgw3z

  65. Examples Integrating Concepts
    Integrating Concepts: Data and Analysis
    We apply our workflow for test-driven data curation to publicly
    available datasets which provide norms, ratings, or relations for
    concepts and words.
    We distinguish manually, semi-automatically, and automatically
    mapped resources (based on structure and size).
    We normalize the original data by tagging the columns, which makes
    them comparable across the different source datasets.
    25 / 32

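The normalization step (tagging the columns of each source dataset so that they become comparable) might look roughly like the following sketch. The column names, tags, and values are invented for illustration and are not taken from NoRaRe or pynorare; the sketch only shows the idea of mapping source-specific columns to shared tags with a fixed data type and dropping untagged columns.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical source rows with dataset-specific column names and invented values.
SOURCE_ROWS: List[Dict[str, str]] = [
    {"Word": "dog", "Conc.M": "5.0", "Freq": "120", "Notes": "x"},
    {"Word": "idea", "Conc.M": "1.5", "Freq": "80", "Notes": ""},
]

# Hypothetical mapping from source columns to normalized tags and value types.
COLUMN_TAGS: Dict[str, Tuple[str, Callable[[str], Any]]] = {
    "Word": ("ENGLISH", str),
    "Conc.M": ("CONCRETENESS_MEAN", float),
    "Freq": ("FREQUENCY", int),
}


def normalize_rows(rows: List[Dict[str, str]],
                   tags: Dict[str, Tuple[str, Callable[[str], Any]]]) -> List[Dict[str, Any]]:
    """Rename tagged columns, convert their values, and drop untagged columns."""
    normalized = []
    for row in rows:
        record: Dict[str, Any] = {}
        for column, value in row.items():
            if column in tags:
                tag, convert = tags[column]
                record[tag] = convert(value)
        normalized.append(record)
    return normalized


for record in normalize_rows(SOURCE_ROWS, COLUMN_TAGS):
    print(record)
# {'ENGLISH': 'dog', 'CONCRETENESS_MEAN': 5.0, 'FREQUENCY': 120}
# {'ENGLISH': 'idea', 'CONCRETENESS_MEAN': 1.5, 'FREQUENCY': 80}
```

Once every dataset is expressed in shared tags, linking the ENGLISH column to concept sets makes values from different sources directly comparable.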
  66. Examples Integrating Concepts
    Integrating Concepts: Results
    First version submitted and released (Tjuka, Forkel, and List, under
    review, https://digling.org/norare/).
    71 datasets from which 415 word and concept properties could be
    derived.
    The data curation workflow was successfully evaluated (building also
    on our experience with Concepticon).
    The pynorare software API greatly enhances the usability of the data
    and allows for quick comparisons, but the data can also easily be
    analyzed with R.
    26 / 32

  67. Examples Integrating Concepts
    Integrating Concepts: Plans
    Annika Tjuka (first author of the NoRaRe database) has started to carry
    out different tests of the norms, ratings, and relations in NoRaRe and
    will continue this work.
    Expanding the database, specifically by adding corpus data (e.g., for
    parallel Bible corpus studies) and data from NLP studies (word
    embeddings).
    Enhancing the concept mapping algorithms (experiments with
    Christoph Rzymski).
    Integrating NoRaRe with the Concepticon web presentation (with
    Robert Forkel).
    27 / 32

  68. 28 / 32
    Outlook

  69. Outlook Ongoing Projects
    Ongoing Projects
    Expanding CLICS as part of our lexibank initiative to lift and
    retro-standardize lexical data for the purpose of cross-linguistic
    comparison (see, among others, Forkel and List 2020: CLDFBench).
    Discussing further integration with psychological approaches to
    advance the analysis of language (Jackson et al., under review).
    Developing enhanced approaches to the annotation of colexifications in
    lexical datasets (with Roberto Zariquiey, based on work presented in
    Schweikhard and List 2020).
    29 / 32

  70. Outlook Planned Projects
    Planned Projects
    An extended study on the semantics of body parts from the
    perspective of linguistic diversity (work with Annika Tjuka and
    Damián Blasi).
    Semantics underlying terms for body and mind (work led by
    MacCormack and Jackson in collaboration with Watts and Henry).
    Creating enhanced, manually annotated datasets for the study of
    partial colexifications (work with Nathanael Schweikhard).
    30 / 32

  71. Outlook Possibilities
    Possibilities
    Based on work by Urban (2011), we can design an approach to detect
    partial colexifications in a cross-linguistic collection of lexical datasets.
    Contrary to Urban's claim, these networks reflect both metonymic and
    metaphorical relations among concepts across multiple languages.
    Pilot studies show promising results with respect to network
    structures.
    31 / 32

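As a simplifying assumption, a partial colexification can be operationalized as one word form being a prefix or suffix of another form in the same language. The sketch below implements only this containment test and returns directed edges from the concept with the shorter form to the concept with the longer form; it is not the approach of the study in preparation, and real data would require segmented transcriptions and morpheme annotation. The German words (standard orthography, lowercased) are used for illustration only.

```python
from typing import Dict, Set, Tuple


def partial_colexifications(forms: Dict[str, str], min_len: int = 3) -> Set[Tuple[str, str]]:
    """Return directed edges (source, target) where the form for source is a
    prefix or suffix of the longer form for target."""
    edges: Set[Tuple[str, str]] = set()
    for source, shorter in forms.items():
        if len(shorter) < min_len:
            continue  # very short forms match too often by chance (assumed cutoff)
        for target, longer in forms.items():
            if len(longer) > len(shorter) and (
                    longer.startswith(shorter) or longer.endswith(shorter)):
                edges.add((source, target))
    return edges


# German examples in lowercased standard orthography, for illustration only.
german = {
    "hand": "hand",
    "glove": "handschuh",
    "finger": "finger",
    "nail": "nagel",
    "fingernail": "fingernagel",
}
for source, target in sorted(partial_colexifications(german)):
    print(f"{source} -> {target}")
# finger -> fingernail
# hand -> glove
# nail -> fingernail
```

Unlike full colexifications, these edges are directed, which is one way to obtain the directed networks mentioned in the plans above.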
  72. Outlook Possibilities
    Possibilities
    31 / 32
    List (in preparation)

  74. 32 / 32
    Thanks to all who do research with our group and who have shared ideas, code, and
    data with us in the past: Cormac Anderson, Timotheus Bodt, Doug Cooper, Simon J.
    Greenhill, Russell D. Gray, Robert Forkel, Nathan W. Hill, Jessica K. Ivani,
    Yunfan Lai, Christoph Rzymski, Nathanael E. Schweikhard, Tiago Tresoldi,
    and Mei-Shin Wu.
    Many thanks to the European Research Council for supporting the project
    "Computer-Assisted Language Comparison" as part of the H2020 funding
    scheme in the form of an ERC Starting Grant (2017-2022).
    Thank You for Listening