Investigating Verb Derivation Patterns in Sino-Tibetan Languages within a Computer-Assisted Framework

Yunfan Lai and Johann-Mattis List
Research Group “Computer-Assisted Language Comparison”
Department of Linguistic and Cultural Evolution
Max-Planck Institute for the Science of Human History
Jena, Germany
2018-02-09
1 / 30

Historical Language Comparison 2 / 30

Historical Language Comparison Classical vs. Computational Language Comparison 3 / 30

Historical Language Comparison CALC Computer-Assisted Language Comparison 4 / 30

Historical Language Comparison Standards, Software, and Tools 5 / 30

Cross-Linguistic Data Formats (CLDF):
- defines standards for data sharing
- can be read and manipulated by different tools
- http://cldf.clld.org

Glottolog:
- language identifiers
- language coordinates
- language classification
- http://glottolog.org

Concepticon:
- concept identifiers
- concept metadata
- concept ontology
- http://concepticon.clld.org

Cross-Linguistic Transcription Systems (CLTS):
- reference catalogs for sounds
- links to transcription systems
- links to transcription data
- http://clts.clld.org

LingPy:
- Python software package
- sequence comparison
- cognate detection
- language classification
- http://lingpy.org

EDICTOR:
- manual data annotation
- manual data analysis
- web-based tool
- http://edictor.digling.org

Database of Cross-Linguistic Colexifications (CLICS):
- provides an account of cross-linguistic polysemies
- proxy for investigating semantic change
- http://cldf.clld.org

CLLD:
- framework for data publication
- homogeneous look-and-feel
- well-known among linguists
- http://clld.org

The Story of Chinese “star” 6 / 30

The story of Chinese *s-tsʰˤeŋ > seng > xīng ‘star’
Just a couple of weeks ago, Laurent Sagart and Guillaume Jacques had a discussion on the Chinese word for “star”, which is reconstructed as *s-tsʰˤeŋ in Old Chinese by Baxter and Sagart (2014). 7 / 30



Reflections When discussing etymologies involving nominal and verbal derivation, we often end up with vague semantic analyses and “educated guesses”, applied to largely understudied languages. All scholars would probably agree that stricter formalization could help to advance these discussions (which may easily go in circles) and to delimit our disagreements. 9 / 30


Reflections Many attempts to formalize semantic change have been made, but they are not suited to investigating the questions at hand. It would be good if we had large-scale samples of abstract and concrete patterns of derivational semantics, stored in such a way that we can compare them directly across multiple language families and retrieve general assessments of the plausibility and frequency of the patterns under discussion. 10 / 30


Reflections What we find instead are very detailed single-language accounts of derivation patterns, which are usually not comparable across languages. Our dilemma is: if we go large-scale, our analyses are useless for single languages, but if we go small-scale, we lose comparability, as the patterns are too specific to one language. 11 / 30


Reflections We can overcome the scaling problem by establishing comparable small-scale analyses, which
- adhere to standards,
- represent data in human- and machine-readable form, and
- embrace a principle in the spirit of the Zen of Python: “simple things should be simple, complex things should be possible”.
12 / 30

Khroskyabs Causativisation 13 / 30

Khroskyabs Causativisation The Khroskyabs Language
- Rgyalrongic, Sino-Tibetan
- Rngaba Prefecture, Sichuan Province
- Dialects: Wobzi, Siyuewu, etc.
14 / 30

Khroskyabs Causativisation The Khroskyabs Language
complex phonology
- ʁɴzbrɑ́ ‘to dare’
- jzmbjə̂m ‘to let fly’
complex morphology
- polysynthetic
- templatic morphology
- hierarchical alignment
- verbal derivation
15 / 30

Khroskyabs Causativisation Khroskyabs Causative Constructions: An Overview
Lai (2014, 2016)
- s-Causative: prefix s-
- v-Causative: prefix v-
- lexical causative
- suppletive pairs
- labile verbs
- anticausative pairs
16 / 30

Khroskyabs Causativisation s-Causative
Table: s-Causative and v-Causative
Base     Gloss           Causative           Gloss
qʰrɑ́     to be big       s-qʰrɑ́              to cause to be big
kʰɑ̂      to give         s-kʰɑ̂               to cause to give
rǽ       to write        s-rǽ                to cause to write
tsʰû     to be boiled    v-ftsʰû > f-tsʰû    to boil
17 / 30

Khroskyabs Causativisation Anticausative pairs
Table: Anticausative pairs in Khroskyabs
Transitive     Gloss            Intransitive    Gloss
ftɕʰə̂          to melt tr.      dʑə̂             to melt intr.
kʰlǽ           to perish        glǽ             to die out
ntɕʰətɕʰɑ́v     to trip          ndʑədʑɑ́v        to tumble
ntsʰɑ̂ɣ         to wear          dzɑ̂ɣ            to be there (attached)
pʰrə̂           to loosen        brə̂             to become loose
tɕʰǽv          to break tr.     dʑǽv            to break intr.
tɕə̂rə          to tear          dʑə̂rə           to be torn intr.
18 / 30

Khroskyabs Causativisation Irregular cases
Table: Irregular cases
Base      Gloss           Causative    Gloss
vzɑ́r      to be spicy     l-zɑ́v        to cause to be spicy
jdʑə̂r     to mill         jdʑə̂-l       to cause to mill
tʰê       to drink        s-tʰé        to cause to drink
çtə̂       to be short     s-tə́m        to shorten
19 / 30


Khroskyabs Causativisation What We Wish to Do...
Use an onomasiological approach to guarantee comparability across languages, and establish a first list of causative concepts along with their source concepts:
- BOILED vs. BOIL
- TRIP vs. TUMBLE
- PERISH vs. DIE OUT
- SHORT vs. SHORTEN
- ...
We then investigate how these pairs are linked with each other in the target language, for example by
- affixation (and what kind of affixation),
- voicing alternations (frequent in Sino-Tibetan),
- suppletion,
- or something else?
20 / 30
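The linking question can be sketched in Python. This is a minimal illustration, not our annotation workflow: the function names, the labels it returns, and the voiceless-to-voiced mapping are assumptions made for this example, and the forms are those from the Khroskyabs tables, with hyphens marking morpheme boundaries.

```python
import re
import unicodedata

# Assumed, illustrative mapping from voiceless onsets to voiced counterparts;
# not a full account of Khroskyabs phonology.
VOICED = {
    "tɕʰ": "dʑ", "tsʰ": "dz", "tɕ": "dʑ", "ts": "dz",
    "kʰ": "g", "pʰ": "b", "tʰ": "d", "k": "g", "p": "b", "t": "d",
}
# Longest keys first, so that "tɕʰ" is tried before "tɕ" or "t".
_VOICELESS = re.compile("|".join(sorted(VOICED, key=len, reverse=True)))


def voice(form):
    """Replace every voiceless segment listed in VOICED by its voiced counterpart."""
    return _VOICELESS.sub(lambda match: VOICED[match.group(0)], form)


def classify_link(form_a, form_b):
    """Name the formal link between two forms (hypothetical labels)."""
    a = unicodedata.normalize("NFC", form_a)
    b = unicodedata.normalize("NFC", form_b)
    # Prefixation: one form equals "prefix-" plus the other form.
    for base, derived in ((a, b), (b, a)):
        prefix, sep, rest = derived.partition("-")
        if sep and rest == base:
            return "prefixation ({}-)".format(prefix)
    # Voicing alternation: the forms differ only in the voicing of obstruents.
    if voice(a) == b or voice(b) == a:
        return "voicing alternation"
    return "other (suppletion or irregular)"


# Concept pairs with the forms given in the tables of this talk.
PAIRS = [
    ("BOILED", "tsʰû", "BOIL", "f-tsʰû"),
    ("TRIP", "ntɕʰətɕʰɑ́v", "TUMBLE", "ndʑədʑɑ́v"),
    ("PERISH", "kʰlǽ", "DIE OUT", "glǽ"),
    ("SHORT", "çtə̂", "SHORTEN", "s-tə́m"),
]

for concept_a, form_a, concept_b, form_b in PAIRS:
    print(concept_a, "vs.", concept_b, "->", classify_link(form_a, form_b))
```

Anything the two simple tests do not recognise falls into the last category and is left for manual annotation.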

Khroskyabs Causativisation What Tools to Use
Our project and the DLCE of MPI-SHH have already established many of the important tools or are currently working on their implementation. As of now, the most important tools for this study are:
- Concepticon (List et al. 2016), our reference catalogue for meanings,
- Glottolog (Hammarström et al. 2017), our reference catalogue for languages,
- CLTS (List et al. in prep.), our reference catalogue for sound segments,
- CLDF (Forkel et al. in prep.), our overarching standard for data exchange,
- CLICS (List et al. 2014), our cross-linguistic approach for measuring semantic similarity,
- EDICTOR (List 2017), our tool for data annotation and analysis.
21 / 30


Khroskyabs Causativisation Annotation Examples
Enhanced annotation is a major asset of the CALC project. The goal is to
- provide data in human- and machine-readable form,
- allow for comparison both across languages and inside a given language,
- embrace standards while also allowing for flexible and language-specific solutions,
- support efficiency by providing a healthy mix of scripts (in Python) and web-based tools (EDICTOR, in JavaScript) to assist the annotation process.
Before we can annotate, however, we need to understand what we can annotate and how!
22 / 30

Khroskyabs Causativisation Annotation Examples ROOT and STEM
qʰrɑ́ ‘to be big’ vs. s-qʰrɑ́ ‘cause to be big’
- ROOT: qʰrɑ́
- STEM: qʰrɑ́ and s-qʰrɑ́
ftɕʰə̂ ‘to melt tr.’ vs. dʑə̂ ‘to melt intr.’ < [+VOICING] + tɕʰə̂
- ROOT: tɕʰə̂
- STEM: f-tɕʰə̂ and [+VOICING] + tɕʰə̂
23 / 30
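One possible machine-readable rendering of the ROOT and STEM distinction is sketched below in Python. The class `Stem`, its field names, and the helper `stems_by_root` are hypothetical and chosen only for this illustration; they are not the format used in EDICTOR or CLDF. The forms are the two examples from this slide.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Stem:
    root: str       # the ROOT shared by all stems of one family
    operation: str  # "" for the bare root, a prefix such as "s-", or a feature
    form: str       # the segmented STEM
    gloss: str


# The two examples from the slide.
STEMS = [
    Stem("qʰrɑ́", "", "qʰrɑ́", "to be big"),
    Stem("qʰrɑ́", "s-", "s-qʰrɑ́", "cause to be big"),
    Stem("tɕʰə̂", "f-", "f-tɕʰə̂", "to melt tr."),
    Stem("tɕʰə̂", "[+VOICING]", "dʑə̂", "to melt intr."),
]


def stems_by_root(stems):
    """Group stems under their shared ROOT, keeping the input order."""
    grouped = defaultdict(list)
    for stem in stems:
        grouped[stem.root].append(stem)
    return dict(grouped)


for root, family in stems_by_root(STEMS).items():
    print(root, "->", [(stem.operation, stem.form) for stem in family])
```

Keeping the operation separate from the form means that a non-concatenative process such as [+VOICING] is recorded in the same slot as a prefix.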

Khroskyabs Causativisation Prefixation Simple prefixation of s- and v- 24 / 30

Khroskyabs Causativisation Voicing Alternation non-aspirated voiceless as ROOT 25 / 30


Khroskyabs Causativisation Irregular Cases tone alternation numbering ROOT detection of reduction metathesis
We have some ideas for handling metathesis, but we are still discussing how to handle it best. Reduction is a harder case, as is tonal alternation. For the time being, we have decided to collect these examples and not to rush to a solution until we have found out more about these particular irregularities. 26 / 30


Khroskyabs Causativisation What Can We Do Then?
Thanks to the fact that our data is linked to our standards, we can
- expand the comparison from one to many dialects of Khroskyabs,
- use our questionnaires and annotation frameworks for other Sino-Tibetan languages (preliminary work on Kiranti with Guillaume Jacques has been carried out),
- compare derivation patterns across unrelated languages and make typologists happy.
27 / 30
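Once annotations share the same concept pairs and mechanism labels, comparing pattern frequencies becomes a simple counting exercise. The Python sketch below assumes a hypothetical record layout (language, source gloss, derived gloss, mechanism) and uses only Khroskyabs rows from the tables in this talk; rows for further dialects or languages would be appended in the same shape.

```python
from collections import Counter, defaultdict

# (language, source gloss, derived gloss, mechanism): hypothetical layout,
# filled with rows from the Khroskyabs tables in this talk.
RECORDS = [
    ("Khroskyabs", "to be big", "to cause to be big", "prefix s-"),
    ("Khroskyabs", "to give", "to cause to give", "prefix s-"),
    ("Khroskyabs", "to write", "to cause to write", "prefix s-"),
    ("Khroskyabs", "to be boiled", "to boil", "prefix v-"),
    ("Khroskyabs", "to melt intr.", "to melt tr.", "voicing alternation"),
    ("Khroskyabs", "to become loose", "to loosen", "voicing alternation"),
]


def pattern_frequencies(records):
    """Count how often each mechanism is attested per language."""
    frequencies = defaultdict(Counter)
    for language, _source, _derived, mechanism in records:
        frequencies[language][mechanism] += 1
    return {language: dict(counts) for language, counts in frequencies.items()}


for language, counts in pattern_frequencies(RECORDS).items():
    print(language, counts)
```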

Khroskyabs Causativisation Interactive Etymologies Our current annotation fits directly into word derivation graphs (or partial colexification networks, cf. Hill and List 2017). 28 / 30
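A word derivation graph can be sketched as a labelled directed graph from ROOT to STEM. The Python example below is a minimal illustration with hypothetical function names and edge labels, using the ROOT and STEM examples from this talk; it is not the graph format of any of the tools mentioned.

```python
# (source form, derived form, label of the derivational operation)
EDGES = [
    ("qʰrɑ́", "s-qʰrɑ́", "prefix s-"),
    ("tɕʰə̂", "f-tɕʰə̂", "prefix f-"),
    ("tɕʰə̂", "dʑə̂", "[+VOICING]"),
]


def build_derivation_graph(edges):
    """Return an adjacency dict: form -> list of (derived form, label)."""
    graph = {}
    for source, target, label in edges:
        graph.setdefault(source, []).append((target, label))
        graph.setdefault(target, [])  # make sure leaf nodes are present too
    return graph


def derived_forms(graph, root):
    """List all forms reachable from root, in order of discovery."""
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        for target, _label in graph.get(node, []):
            if target != root and target not in found:
                found.append(target)
                stack.append(target)
    return found


GRAPH = build_derivation_graph(EDGES)
for form, links in GRAPH.items():
    for target, label in links:
        print(form, "--[" + label + "]-->", target)
```

Because edges carry the operation as a label, the same structure can hold prefixation, voicing alternation, and still unanalysed irregular cases side by side.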


Khroskyabs Causativisation Benefits
Thanks to our adherence to standardized annotations, our approach will lead to improved:
- transparency (human- and machine-readable data)
- efficiency (thanks to algorithms and annotation tools designed for the tasks at hand)
- re-usability (in typological studies and historical language comparison)
We are only just getting started, but many things are already in place, and we are keen to explore both the possibilities and the disadvantages of our preliminary ideas with you!
29 / 30

Back to Our Chinese “star”
We cannot solve the word’s history now, but if we follow up on our standardised annotation of linguistic data on the micro-level, we can harvest cross-linguistic data on the macro-level. If we expand the analyses of verbal derivation in Khroskyabs to more languages of the Sino-Tibetan family, we may be able to
- substantiate the typological plausibility of hypotheses regarding Chinese “star”,
- reliably reconstruct the meaning of its stem,
- determine the function of the prefix, and
- draw explicit pathways of semantic change.
30 / 30

Back to Our Chinese “star” «Chaque mot a son histoire» (“Every word has its own history”). But many word histories are similar. If we start classifying them, what we learn may easily go beyond the history of the word for “star” in Chinese. 30 / 30

Thank you for your attention! 30 / 30
