Investigating Verb Derivation Patterns in Sino-Tibetan Languages within a Computer-Assisted Framework

Talk (together with Yunfan Lai) held at the workshop "Perspectives on low-resource language varieties" (2018-02-09, Saarland University, Saarbrücken)

Johann-Mattis List

February 09, 2018

Transcript

  1. Investigating Verb Derivation Patterns in Sino-Tibetan
    Languages within a Computer-Assisted Framework
    Yunfan Lai and Johann-Mattis List
    Research Group “Computer-Assisted Language Comparison”
    Department of Linguistic and Cultural Evolution
    Max Planck Institute for the Science of Human History
    Jena, Germany
    2018-02-09

  2. Historical Language Comparison

  6. Historical Language Comparison
    Classical vs. Computational Language Comparison

  9. Historical Language Comparison CALC
    Computer-Assisted Language Comparison

  17. Historical Language Comparison Standards, Software, and Tools
    Cross-Linguistic Data Formats (CLDF):
    - defines standards for data sharing
    - can be read and manipulated by different tools
    - http://cldf.clld.org

  18. Historical Language Comparison Standards, Software, and Tools
    Glottolog:
    - language identifiers
    - language coordinates
    - language classification
    - http://glottolog.org

  19. Historical Language Comparison Standards, Software, and Tools
    Concepticon:
    - concept identifiers
    - concept metadata
    - concept ontology
    - http://concepticon.clld.org

  20. Historical Language Comparison Standards, Software, and Tools
    Cross-Linguistic Transcription Systems (CLTS):
    - reference catalogs for sounds
    - links to transcription systems
    - links to transcription data
    - http://clts.clld.org

  21. Historical Language Comparison Standards, Software, and Tools
    LingPy:
    - Python software package
    - sequence comparison
    - cognate detection
    - language classification
    - http://lingpy.org

  22. Historical Language Comparison Standards, Software, and Tools
    EDICTOR:
    - manual data annotation
    - manual data analysis
    - web-based tool
    - http://edictor.digling.org

  23. Historical Language Comparison Standards, Software, and Tools
    Database of Cross-Linguistic Colexifications (CLICS):
    - provides an account of cross-linguistic polysemies
    - proxy for investigating semantic change
    - http://cldf.clld.org

  24. Historical Language Comparison Standards, Software, and Tools
    CLLD:
    - framework for data publication
    - homogeneous look-and-feel
    - well-known among linguists
    - http://clld.org

  25. The Story of Chinese “star”

  26. The story of Chinese *s-tsʰˤeŋ > seng > xīng ‘star’
    Just a couple of weeks ago, Laurent Sagart and Guillaume Jacques
    had a discussion on the Chinese word for “star”, which is
    reconstructed as *s-tsʰˤeŋ in Old Chinese by Baxter and Sagart (2014).

  32. The story of Chinese *s-tsʰˤeŋ > seng > xīng ‘star’
    Reflections
    When discussing etymologies involving nominal and verbal derivation, we
    often end up with vague semantic analyses and “educated guesses”, applied
    to largely understudied languages.
    All scholars would probably agree that stricter formalization could help
    to advance these discussions (which may easily go in circles) and to set
    up a boundary for our disagreements.

  34. The story of Chinese *s-tsʰˤeŋ > seng > xīng ‘star’
    Reflections
    Many attempts to formalize semantic change have been made, but they
    are not suited to investigating the questions at hand. It would be
    good if we had
    - large-scale samples of abstract and concrete patterns of derivational
      semantics, which are
    - stored in such a way that we can directly compare across multiple
      language families and retrieve general assessments of the plausibility
      and frequency of the patterns under discussion.

  37. The story of Chinese *s-tsʰˤeŋ > seng > xīng ‘star’
    Reflections
    What we find instead are
    - very detailed single-language accounts of derivation patterns, which are
    - usually not comparable across languages.
    Our dilemma is: if we go large-scale, our analyses are useless for single
    languages, but if we go small-scale, we lose comparability, as the patterns
    are too specific to one language.

  39. The story of Chinese *s-tsʰˤeŋ > seng > xīng ‘star’
    Reflections
    We can overcome the scaling problem by establishing comparable
    small-scale analyses, which
    - adhere to standards,
    - represent data in human- and machine-readable form, and
    - embrace the principle, in the spirit of the Zen of Python, that “simple
      things should be simple, complex things should be possible” (a maxim
      usually attributed to Alan Kay).

  40. Khroskyabs Causativisation

  41. Khroskyabs Causativisation
    The Khroskyabs Language
    Rgyalrongic, Sino-Tibetan
    Rngaba Prefecture, Sichuan Province
    Dialects: Wobzi, Siyuewu, etc.

  42. Khroskyabs Causativisation
    The Khroskyabs Language
    - complex phonology
      - ʁɴzbrɑ́ ‘to dare’
      - jzmbjə̂m ‘to let fly’
    - complex morphology
      - polysynthetic templatic morphology
      - hierarchical alignment
      - verbal derivation

  43. Khroskyabs Causativisation
    Khroskyabs Causative Constructions: An Overview
    Lai (2014, 2016)
    s-Causative: prefix s-
    v-Causative: prefix v-
    lexical causative
    suppletive pairs
    labile verbs
    anticausative pairs

  44. Khroskyabs Causativisation
    s-Causative
    Table: s-Causative and v-Causative
    | Base | Gloss | Causative | Gloss |
    |---|---|---|---|
    | qʰrɑ́ | to be big | s-qʰrɑ́ | to cause to be big |
    | kʰɑ̂ | to give | s-kʰɑ̂ | to cause to give |
    | rǽ | to write | s-rǽ | to cause to write |
    | tsʰû | to be boiled | v-tsʰû > f-tsʰû | to boil |

  45. Khroskyabs Causativisation
    Anticausative pairs
    Table: Anticausative pairs in Khroskyabs
    | Transitive | Gloss | Intransitive | Gloss |
    |---|---|---|---|
    | ftɕʰə̂ | to melt tr. | dʑə̂ | to melt intr. |
    | kʰlǽ | to perish | glǽ | to die out |
    | ntɕʰətɕʰɑ́v | to trip | ndʑədʑɑ́v | to tumble |
    | ntsʰɑ̂ɣ | to wear | dzɑ̂ɣ | to be there (attached) |
    | pʰrə̂ | to loosen | brə̂ | to become loose |
    | tɕʰǽv | to break tr. | dʑǽv | to break intr. |
    | tɕə̂rə | to tear | dʑə̂rə | to be torn intr. |

  46. Khroskyabs Causativisation
    Irregular cases
    Table: Irregular cases
    | Base | Gloss | Causative | Gloss |
    |---|---|---|---|
    | vzɑ́r | to be spicy | l-zɑ́v | to cause to be spicy |
    | jdʑə̂r | to mill | jdʑə̂-l | to cause to mill |
    | tʰê | to drink | s-tʰé | to cause to drink |
    | çtə̂ | to be short | s-tə́m | to shorten |

  48. Khroskyabs Causativisation
    What We Wish to Do...
    Use an onomasiological approach to guarantee comparability across
    languages, and establish a first list of causative concepts along with their
    source concepts:
    - BOILED vs. BOIL
    - TRIP vs. TUMBLE
    - PERISH vs. DIE OUT
    - SHORT vs. SHORTEN
    - ...
    We then investigate how these pairs are linked with each other in the
    target language, for example by
    - affixation (and what kind of affixation),
    - voicing alternations (frequent in Sino-Tibetan),
    - suppletion,
    - or something else.
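
As a rough illustration of the second step, the following Python sketch labels how two forms of a concept pair are formally linked. It is only a sketch under stated assumptions, not the procedure used in the project: the function name `link_type`, the convention that a hyphen marks an annotated morpheme boundary, and the small `VOICING` list (read off the anticausative table in these slides) are all hypothetical choices made for the example.

```python
import unicodedata

# Voiceless -> voiced onset correspondences read off the anticausative
# table in the slides; illustrative only, not a full inventory.
VOICING = [("tɕʰ", "dʑ"), ("tsʰ", "dz"), ("pʰ", "b"), ("kʰ", "g"), ("tɕ", "dʑ")]


def link_type(source, target):
    """Return a coarse label for how `target` is formally linked to `source`.

    Forms are plain strings; a hyphen marks an annotated morpheme boundary
    (an assumption of this sketch).
    """
    # Normalize so that equal-looking forms compare equal.
    source = unicodedata.normalize("NFD", source)
    target = unicodedata.normalize("NFD", target)
    # Affixation: the target is the source plus an annotated prefix.
    if "-" in target:
        prefix, _, rest = target.partition("-")
        if rest == source:
            return "prefixation (" + prefix + "-)"
    # Voicing alternation: only the onset differs (voiceless vs. voiced).
    for voiceless, voiced in VOICING:
        if source.startswith(voiceless) and target == voiced + source[len(voiceless):]:
            return "voicing alternation"
    # Suppletion and irregular cases cannot be told apart mechanically here.
    return "other"


if __name__ == "__main__":
    for pair in [("qʰrɑ́", "s-qʰrɑ́"), ("pʰrə̂", "brə̂"), ("tʰê", "s-tʰé")]:
        print(pair, link_type(*pair))
```

For the voicing check, the voiceless (transitive) form is passed as `source` and the voiced (intransitive) form as `target`. Everything that falls through to "other" still needs manual inspection.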

  49. Khroskyabs Causativisation
    What Tools to Use
    Our project and the DLCE at the MPI-SHH have already established many of
    the important tools or are currently working on their implementation. As of
    now, the most important tools for this study are:
    - Concepticon (List et al. 2016, our reference catalogue for meanings),
    - Glottolog (Hammarström et al. 2017, our reference catalogue for
      languages),
    - CLTS (List et al. in prep., our reference catalogue for sound segments),
    - CLDF (Forkel et al. in prep., our overarching standard for data
      exchange),
    - CLICS (List et al. 2014, our cross-linguistic approach for measuring
      semantic similarity),
    - EDICTOR (List 2017, our tool for data annotation and analysis).

  51. Khroskyabs Causativisation
    Annotation Examples
    Enhanced annotation is a major asset of the CALC project. The goal is to
    - provide data in human- and machine-readable form,
    - allow for comparison both across languages and within a given language,
    - embrace standards while also allowing for flexible and
      language-specific solutions,
    - support efficiency by providing a healthy mixture of scripts (in
      Python) and web-based tools (EDICTOR, in JavaScript) to assist the
      annotation process.
    Before we can annotate, however, we need to understand what we can
    annotate and how!
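
One possible reading of "human- and machine-readable" is a plain tab-separated word list that a person can open in a text editor and a script can parse. The sketch below assumes this reading. The column names (`ID`, `DOCULECT`, `CONCEPT`, `FORM`) and the helper names are illustrative and not taken from the CLDF specification; the four forms come from the tables in these slides.

```python
import csv
import io

# A tiny tab-separated word list; column names are illustrative assumptions.
TSV = (
    "ID\tDOCULECT\tCONCEPT\tFORM\n"
    "1\tKhroskyabs\tBOILED\ttsʰû\n"
    "2\tKhroskyabs\tBOIL\tf-tsʰû\n"
    "3\tKhroskyabs\tSHORT\tçtə̂\n"
    "4\tKhroskyabs\tSHORTEN\ts-tə́m\n"
)

# Source concept vs. causative concept, as in the onomasiological list.
PAIRS = [("BOILED", "BOIL"), ("SHORT", "SHORTEN")]


def read_wordlist(text):
    """Parse tab-separated text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def forms_for_pairs(rows, pairs):
    """Look up the forms attested for each (source, causative) concept pair."""
    by_concept = {row["CONCEPT"]: row["FORM"] for row in rows}
    return [(by_concept[src], by_concept[caus]) for src, caus in pairs]


if __name__ == "__main__":
    for source_form, causative_form in forms_for_pairs(read_wordlist(TSV), PAIRS):
        print(source_form, "->", causative_form)
```

The same file can be edited by hand and processed by scripts, which is the mixture of manual and automatic work described above.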

  52. Khroskyabs Causativisation
    Annotation Examples
    ROOT and STEM
    - qʰrɑ́ ‘to be big’ vs. s-qʰrɑ́ ‘to cause to be big’
      - ROOT: qʰrɑ́
      - STEM: qʰrɑ́ and s-qʰrɑ́
    - ftɕʰə̂ ‘to melt tr.’ vs. dʑə̂ ‘to melt intr.’ < [+VOICING] + tɕʰə̂
      - ROOT: tɕʰə̂
      - STEM: f-tɕʰə̂ and [+VOICING] + tɕʰə̂
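
The ROOT/STEM distinction above could be stored, for instance, as one record per stem that points back to its root. This is a minimal sketch, assuming a hypothetical `Stem` record and free-text `process` labels; neither is the annotation format actually used in EDICTOR or in the project.

```python
import unicodedata
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Stem:
    form: str     # surface form with annotated morpheme boundary
    gloss: str    # meaning as given on the slide
    root: str     # the ROOT the stem is assigned to
    process: str  # how the stem is built from the root (free-text label)


# The two examples from the slide; the `process` labels are assumptions.
STEMS = [
    Stem("qʰrɑ́", "to be big", "qʰrɑ́", "none"),
    Stem("s-qʰrɑ́", "to cause to be big", "qʰrɑ́", "prefix s-"),
    Stem("f-tɕʰə̂", "to melt tr.", "tɕʰə̂", "prefix f-"),
    Stem("dʑə̂", "to melt intr.", "tɕʰə̂", "[+VOICING]"),
]


def group_by_root(stems):
    """Collect the forms of all stems that share a root."""
    families = defaultdict(list)
    for stem in stems:
        # Normalize the key so that equal-looking roots end up together.
        families[unicodedata.normalize("NFC", stem.root)].append(stem.form)
    return dict(families)


if __name__ == "__main__":
    for root, forms in group_by_root(STEMS).items():
        print(root, forms)
```

Grouping by root keeps a voiced stem such as dʑə̂ next to its prefixed counterpart even though the two surface forms share no segment string.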

  53. Khroskyabs Causativisation
    Prefixation
    Simple prefixation of s- and v-

  54. Khroskyabs Causativisation
    Voicing Alternation
    non-aspirated voiceless as ROOT

  56. Khroskyabs Causativisation
    Irregular Cases
    tone alternation
    numbering ROOT
    detection of reduction
    metathesis
    We have some ideas about how to handle metathesis, but we are still
    discussing how to handle it best. Reduction is a harder case, as is tonal
    alternation. For the time being, we have decided to collect these examples
    and not to rush into a solution until we have found out more about these
    particular irregularities.

  59. Khroskyabs Causativisation
    What Can We Do Then?
    Thanks to the fact that our data is linked to our standards, we can
    - expand the comparison from one to many dialects of Khroskyabs,
    - use our questionnaires and annotation frameworks for other
      Sino-Tibetan languages (preliminary work on Kiranti with Guillaume
      Jacques has been carried out),
    - compare derivation patterns across unrelated languages and make
      typologists happy.

  60. Khroskyabs Causativisation
    Interactive Etymologies
    Our current annotation can be fitted directly into word derivation graphs
    (or partial colexification networks, cf. Hill and List 2017):
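
To give an idea of what such a word derivation graph might look like as data, here is a minimal Python sketch. It assumes a directed graph stored as an adjacency dictionary, with edges labelled by the derivation process. The edge list and the function names are hypothetical and only reuse forms from the ROOT and STEM examples in these slides; the sketch does not reproduce the networks of Hill and List (2017).

```python
from collections import defaultdict

# (source form, derivation process, derived form); labels are assumptions.
EDGES = [
    ("qʰrɑ́", "prefix s-", "s-qʰrɑ́"),
    ("tɕʰə̂", "prefix f-", "f-tɕʰə̂"),
    ("tɕʰə̂", "[+VOICING]", "dʑə̂"),
]


def build_graph(edges):
    """Turn labelled edges into an adjacency dictionary."""
    graph = defaultdict(list)
    for source, process, target in edges:
        graph[source].append((process, target))
    return dict(graph)


def derived_forms(graph, root):
    """Return all forms reachable from `root`, in order of discovery."""
    seen, stack = [], [root]
    while stack:
        node = stack.pop()
        for _process, target in graph.get(node, []):
            if target not in seen:
                seen.append(target)
                stack.append(target)
    return seen


if __name__ == "__main__":
    graph = build_graph(EDGES)
    for root in graph:
        print(root, "->", derived_forms(graph, root))
```

Because the edges carry process labels, the same structure can later be queried for how often a given process occurs across roots or languages.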

  63. Khroskyabs Causativisation
    Benefits
    Thanks to our adherence to standardized annotations, our approach will
    lead to improved
    - transparency (human- and machine-readable data),
    - efficiency (thanks to algorithms and annotation tools designed for the
      tasks at hand),
    - re-usability (in typological studies and historical language comparison).
    We are only just getting started, but many things are already in
    place, and we are keen to explore with you the possibilities, but also the
    disadvantages, of our preliminary ideas!

  64. Back to Our Chinese “star”
    We cannot solve the word’s history now, but if we follow up on our
    standardised annotation of linguistic data on the micro-level, we can
    harvest cross-linguistic data on the macro-level. If we expand the analyses
    of verbal derivation in Khroskyabs to more languages of the Sino-Tibetan
    family, we may be able to
    - substantiate the typological plausibility of hypotheses regarding
      Chinese “star”,
    - reliably reconstruct the meaning of its stem,
    - determine the function of the prefix, and
    - draw explicit pathways of semantic change.

  65. Back to Our Chinese “star”
    «Chaque mot a son histoire» (“Every word has its own history”). But many
    word histories are similar. If we start classifying them, what we learn
    may easily go beyond the history of the word for “star” in Chinese.

  66. Thank you for your attention!