Slide 1

Slide 1 text

Investigating Verb Derivation Patterns in Sino-Tibetan Languages within a Computer-Assisted Framework
Yunfan Lai and Johann-Mattis List
Research Group “Computer-Assisted Language Comparison”
Department of Linguistic and Cultural Evolution
Max Planck Institute for the Science of Human History, Jena, Germany
2018-02-09

Slide 2

Slide 2 text

Historical Language Comparison

Slide 6

Slide 6 text

Historical Language Comparison
Classical vs. Computational Language Comparison

Slide 9

Slide 9 text

Historical Language Comparison
CALC (Computer-Assisted Language Comparison)

Slide 17

Slide 17 text

Historical Language Comparison
Standards, Software, and Tools
Cross-Linguistic Data Formats (CLDF):
- defines standards for data sharing
- can be read and manipulated by different tools
- http://cldf.clld.org

Slide 18

Slide 18 text

Glottolog:
- language identifiers
- language coordinates
- language classification
- http://glottolog.org

Slide 19

Slide 19 text

Concepticon:
- concept identifiers
- concept metadata
- concept ontology
- http://concepticon.clld.org

Slide 20

Slide 20 text

Cross-Linguistic Transcription Systems (CLTS):
- reference catalogs for sounds
- links to transcription systems
- links to transcription data
- http://clts.clld.org

Slide 21

Slide 21 text

LingPy:
- Python software package
- sequence comparison
- cognate detection
- language classification
- http://lingpy.org

Slide 22

Slide 22 text

EDICTOR:
- manual data annotation
- manual data analysis
- web-based tool
- http://edictor.digling.org

Slide 23

Slide 23 text

Database of Cross-Linguistic Colexifications (CLICS):
- provides an account of cross-linguistic polysemies
- proxy for investigating semantic change
- http://cldf.clld.org

Slide 24

Slide 24 text

CLLD:
- framework for data publication
- homogeneous look-and-feel
- well-known among linguists
- http://clld.org

Slide 25

Slide 25 text

The Story of Chinese “star”

Slide 26

Slide 26 text

The story of Chinese *s-tsʰˤeŋ > seng > xīng ‘star’
Just a couple of weeks ago, Laurent Sagart and Guillaume Jacques discussed the Chinese word for “star”, which Baxter and Sagart (2014) reconstruct as *s-tsʰˤeŋ in Old Chinese.

Slide 27

Slide 27 text

The story of Chinese *s-tsʰˤeŋ > seng > xīng ‘star’

Slide 32

Slide 32 text

Reflections
When discussing etymologies that involve nominal and verbal derivation, we often end up debating vague semantic analyses and “educated guesses” applied to largely understudied languages.
All scholars would probably agree that stricter formalization could advance these discussions (which easily go round in circles) by delimiting exactly where we disagree.

Slide 34

Slide 34 text

Reflections
Many attempts to formalize semantic change have been made, but they do not help us investigate the questions at hand.
It would be good if we had large-scale samples of abstract and concrete patterns of derivational semantics, stored in such a way that we can compare them directly across multiple language families and retrieve general assessments of the plausibility and frequency of the patterns under discussion.

Slide 37

Slide 37 text

Reflections
What we find instead are very detailed single-language accounts of derivation patterns, which are usually not comparable across languages.
Our dilemma: if we go large-scale, our analyses are useless for single languages; if we go small-scale, we lose comparability, because the patterns are too specific to one language.

Slide 39

Slide 39 text

Reflections
We can overcome the scaling problem by establishing comparable small-scale analyses, which
- adhere to standards,
- represent data in human- and machine-readable form, and
- embrace a maxim in the spirit of the Zen of Python (usually attributed to Alan Kay): “simple things should be simple, complex things should be possible”.

Slide 40

Slide 40 text

Khroskyabs Causativisation

Slide 41

Slide 41 text

The Khroskyabs Language
- Rgyalrongic, Sino-Tibetan
- Rngaba Prefecture, Sichuan Province
- Dialects: Wobzi, Siyuewu, etc.

Slide 42

Slide 42 text

The Khroskyabs Language
- complex phonology
  - ʁɴzbrɑ́ ‘to dare’
  - jzmbjə̂m ‘to let fly’
- complex morphology
  - polysynthetic
  - templatic morphology
  - hierarchical alignment
  - verbal derivation

Slide 43

Slide 43 text

Khroskyabs Causative Constructions: An Overview (Lai 2014, 2016)
- s-Causative: prefix s-
- v-Causative: prefix v-
- lexical causative
- suppletive pairs
- labile verbs
- anticausative pairs

Slide 44

Slide 44 text

s-Causative
Table: s-Causative and v-Causative
Base     Gloss           Causative           Gloss
qʰrɑ́     to be big       s-qʰrɑ́              to cause to be big
kʰɑ̂      to give         s-kʰɑ̂               to cause to give
rǽ       to write        s-rǽ                to cause to write
tsʰû     to be boiled    v-ftsʰû > f-tsʰû    to boil

Slide 45

Slide 45 text

Anticausative pairs
Table: Anticausative pairs in Khroskyabs
Transitive     Gloss            Intransitive    Gloss
ftɕʰə̂          to melt tr.      dʑə̂             to melt intr.
kʰlǽ           to perish        glǽ             to die out
ntɕʰətɕʰɑ́v     to trip          ndʑədʑɑ́v        to tumble
ntsʰɑ̂ɣ         to wear          dzɑ̂ɣ            to be there (attached)
pʰrə̂           to loosen        brə̂             to become loose
tɕʰǽv          to break tr.     dʑǽv            to break intr.
tɕə̂rə          to tear          dʑə̂rə           to be torn intr.
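The pairs in the table can be checked mechanically for a plain voicing alternation. The following is only a minimal sketch: the correspondence map `VOICED` and the helper names `voice` and `is_plain_voicing_pair` are assumptions made for illustration, read off the table above, and not a statement of Khroskyabs phonology or of the project's actual scripts.

```python
import unicodedata

# Assumed voiceless -> voiced correspondences, read off the table above.
# This map is illustrative only and certainly not exhaustive.
VOICED = {"tɕʰ": "dʑ", "tsʰ": "dz", "tɕ": "dʑ", "kʰ": "g", "pʰ": "b"}


def voice(form: str) -> str:
    """Replace every listed voiceless obstruent in `form` by its voiced partner."""
    form = unicodedata.normalize("NFC", form)
    # Try longer keys first so that "tɕʰ" is matched before "tɕ".
    for voiceless in sorted(VOICED, key=len, reverse=True):
        form = form.replace(voiceless, VOICED[voiceless])
    return form


def is_plain_voicing_pair(transitive: str, intransitive: str) -> bool:
    """True if voicing alone turns the transitive form into the intransitive one."""
    return voice(transitive) == unicodedata.normalize("NFC", intransitive)


# Transitive / intransitive forms as given in the table.
PAIRS = [
    ("ftɕʰə̂", "dʑə̂"),
    ("kʰlǽ", "glǽ"),
    ("ntɕʰətɕʰɑ́v", "ndʑədʑɑ́v"),
    ("ntsʰɑ̂ɣ", "dzɑ̂ɣ"),
    ("pʰrə̂", "brə̂"),
    ("tɕʰǽv", "dʑǽv"),
    ("tɕə̂rə", "dʑə̂rə"),
]

if __name__ == "__main__":
    for transitive, intransitive in PAIRS:
        print(transitive, intransitive, is_plain_voicing_pair(transitive, intransitive))
```

Pairs in which the transitive form carries additional initial material (ftɕʰə̂ / dʑə̂ and ntsʰɑ̂ɣ / dzɑ̂ɣ) are not explained by voicing alone under this sketch and would need a further annotation step.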

Slide 46

Slide 46 text

Irregular cases
Table: Irregular cases
Base      Gloss          Causative    Gloss
vzɑ́r      to be spicy    l-zɑ́v        to cause to be spicy
jdʑə̂r     to mill        jdʑə̂-l       to cause to mill
tʰê       to drink       s-tʰé        to cause to drink
çtə̂       to be short    s-tə́m        to shorten

Slide 48

Slide 48 text

What We Wish to Do...
Use an onomasiological approach to guarantee comparability across languages, and establish a first list of causative concepts along with their source concepts:
- BOILED vs. BOIL
- TRIP vs. TUMBLE
- PERISH vs. DIE OUT
- SHORT vs. SHORTEN
- ...
We then investigate how these pairs are linked with each other in the target language, for example by
- affixation (and what kind of affixation),
- voicing alternations (frequent in Sino-Tibetan),
- suppletion,
- or something else?
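A concept-pair list of this kind could be stored and tallied as sketched below. This is an illustration under assumptions: the class `ConceptPair`, its field names, and the `link` labels are hypothetical, the labels were assigned by hand for this example, and the concept labels are the glosses from the slide, not verified Concepticon identifiers. The forms are the Khroskyabs forms from the tables above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptPair:
    """One pair of concepts with the forms that express them (hypothetical layout)."""
    concept_a: str  # first concept label, in the order given on the slide
    concept_b: str  # second concept label
    form_a: str     # form expressing concept_a
    form_b: str     # form expressing concept_b
    link: str       # hand-assigned label for how the two forms are related


# Forms taken from the tables above; the `link` labels are assumptions.
PAIRS = [
    ConceptPair("BOILED", "BOIL", "tsʰû", "f-tsʰû", "affixation"),
    ConceptPair("TRIP", "TUMBLE", "ntɕʰətɕʰɑ́v", "ndʑədʑɑ́v", "voicing alternation"),
    ConceptPair("PERISH", "DIE OUT", "kʰlǽ", "glǽ", "voicing alternation"),
    # Listed among the irregular cases, hence not forced into a regular class.
    ConceptPair("SHORT", "SHORTEN", "çtə̂", "s-tə́m", "other"),
]


def link_frequencies(pairs):
    """Count how often each link type occurs in a list of concept pairs."""
    return Counter(pair.link for pair in pairs)


if __name__ == "__main__":
    for link, count in link_frequencies(PAIRS).most_common():
        print(link, count)
```

Because the concept labels rather than the forms serve as keys, the same list could in principle be filled in for another language and the resulting counts compared.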

Slide 49

Slide 49 text

What Tools to Use
Our project and the DLCE at MPI-SHH have already established many of the important tools or are currently working on their implementation. As of now, the most important tools for this study are:
- Concepticon (List et al. 2016), our reference catalogue for meanings,
- Glottolog (Hammarström et al. 2017), our reference catalogue for languages,
- CLTS (List et al. in prep.), our reference catalogue for sound segments,
- CLDF (Forkel et al. in prep.), our overarching standard for data exchange,
- CLICS (List et al. 2014), our cross-linguistic approach to measuring semantic similarity,
- EDICTOR (List 2017), our tool for data annotation and analysis.

Slide 51

Slide 51 text

Annotation Examples
Enhanced annotation is a major asset of the CALC project. The goal is to
- provide data in human- and machine-readable form,
- allow for comparison both across languages and within a given language,
- embrace standards while also allowing for flexible and language-specific solutions,
- support efficiency by providing a healthy mixture of scripts (in Python) and web-based tools (EDICTOR, in JavaScript) to assist the annotation process.
Before we can annotate, however, we need to understand what we can annotate and how!

Slide 52

Slide 52 text

Annotation Examples: ROOT and STEM
qʰrɑ́ ‘to be big’ vs. s-qʰrɑ́ ‘cause to be big’
- ROOT: qʰrɑ́
- STEM: qʰrɑ́ and s-qʰrɑ́
ftɕʰə̂ ‘to melt tr.’ vs. dʑə̂ ‘to melt intr.’ < [+VOICING] + tɕʰə̂
- ROOT: tɕʰə̂
- STEM: f-tɕʰə̂ and [+VOICING] + tɕʰə̂
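The ROOT/STEM distinction can be written down as a small data structure. The sketch below is hypothetical: the class `Stem`, its fields, and the helper `group_by_root` are names invented for illustration and do not reproduce the annotation format used in EDICTOR. It only shows that stems sharing a ROOT can be grouped once the derivational operation is stored separately from the root.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Roots from the two examples above, defined once and reused.
ROOT_BIG = "qʰrɑ́"
ROOT_MELT = "tɕʰə̂"


@dataclass(frozen=True)
class Stem:
    """A stem described as a root plus an optional derivational operation."""
    root: str
    gloss: str
    prefix: Optional[str] = None  # e.g. "s" or "f"
    voicing: bool = False         # True if [+VOICING] applies to the root

    def annotation(self) -> str:
        """Render the stem in the notation used on the slide."""
        body = f"{self.prefix}-{self.root}" if self.prefix else self.root
        return f"[+VOICING] + {body}" if self.voicing else body


STEMS = [
    Stem(ROOT_BIG, "to be big"),
    Stem(ROOT_BIG, "cause to be big", prefix="s"),
    Stem(ROOT_MELT, "to melt tr.", prefix="f"),
    Stem(ROOT_MELT, "to melt intr.", voicing=True),
]


def group_by_root(stems):
    """Map each root to the annotations of all stems built on it."""
    groups = defaultdict(list)
    for stem in stems:
        groups[stem.root].append(stem.annotation())
    return dict(groups)


if __name__ == "__main__":
    for root, annotations in group_by_root(STEMS).items():
        print("ROOT:", root, "STEM:", " and ".join(annotations))
```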

Slide 53

Slide 53 text

Prefixation
Simple prefixation of s- and v-

Slide 54

Slide 54 text

Voicing Alternation
non-aspirated voiceless as ROOT

Slide 56

Slide 56 text

Irregular Cases
- tone alternation
- numbering ROOT
- detection of reduction
- metathesis
We have some ideas of how to handle metathesis, but we are still discussing how to handle it best. Reduction is a harder case, as is tonal alternation. For the time being, we have decided to collect these examples and not to rush into a solution until we have found out more about these particular irregularities.

Slide 59

Slide 59 text

What Can We Do Then?
Thanks to the fact that our data is linked to our standards, we can
- expand the comparison from one to many dialects of Khroskyabs,
- use our questionnaires and annotation frameworks for other Sino-Tibetan languages (preliminary work on Kiranti with Guillaume Jacques has been carried out),
- compare derivation patterns across unrelated languages and make typologists happy.

Slide 60

Slide 60 text

Interactive Etymologies
Our current annotation can be fed directly into word derivation graphs (or partial colexification networks, cf. Hill and List 2017).
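How annotated derivations might be turned into such a graph can be sketched as follows. The triple layout `(root, stem, operation)` and the function names `build_graph` and `to_dot` are assumptions for illustration and not the project's implementation. The output uses the Graphviz DOT language, so the result could be rendered with any DOT viewer.

```python
from collections import defaultdict

# (root, derived stem, operation) triples, using forms from the slides above.
# The operation labels are illustrative assumptions.
DERIVATIONS = [
    ("qʰrɑ́", "s-qʰrɑ́", "prefix s-"),
    ("kʰɑ̂", "s-kʰɑ̂", "prefix s-"),
    ("tɕʰə̂", "f-tɕʰə̂", "prefix f-"),
    ("tɕʰə̂", "dʑə̂", "[+VOICING]"),
]


def build_graph(derivations):
    """Collect the outgoing derivation edges of every root."""
    graph = defaultdict(list)
    for root, stem, operation in derivations:
        graph[root].append((stem, operation))
    return dict(graph)


def to_dot(graph):
    """Serialize the graph as a Graphviz DOT digraph with labelled edges."""
    lines = ["digraph derivations {"]
    for root, edges in graph.items():
        for stem, operation in edges:
            lines.append(f'  "{root}" -> "{stem}" [label="{operation}"];')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot(build_graph(DERIVATIONS)))
```

A plain dictionary keeps the sketch free of dependencies. A graph library would be the more natural choice once the network has to be queried or laid out interactively.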

Slide 63

Slide 63 text

Benefits
Thanks to our adherence to standardized annotations, our approach will lead to improved
- transparency (human- and machine-readable data),
- efficiency (thanks to algorithms and annotation tools designed for the tasks at hand),
- re-usability (in typological studies and historical language comparison).
We are only just getting started, but many things are already in place, and we are keen to explore both the possibilities and the disadvantages of our preliminary ideas with you!

Slide 64

Slide 64 text

Back to Our Chinese “star”
We cannot resolve the word’s history now, but if we follow up on our standardised annotation of linguistic data on the micro-level, we can harvest cross-linguistic data on the macro-level. If we expand the analyses of verbal derivation in Khroskyabs to more languages of the Sino-Tibetan family, we may be able to
- substantiate the typological plausibility of hypotheses regarding Chinese “star”,
- reliably reconstruct the meaning of its stem,
- determine the function of the prefix, and
- draw explicit pathways of semantic change.

Slide 65

Slide 65 text

Back to Our Chinese “star”
«Chaque mot a son histoire» (“Every word has its own history”). But many word histories are similar. If we start classifying them, what we learn may easily go beyond the history of the word for “star” in Chinese.

Slide 66

Slide 66 text

Thank you for your attention!