Slide 1

Slide 1 text

Handling word formation in historical-comparative linguistics: Annotation and Analysis. Johann-Mattis List, DFG Research Fellow; Centre des recherches linguistiques sur l’Asie Orientale; Team Adaptation, Integration, Reticulation, Evolution; EHESS and UPMC, Paris. 2016/12/01 1 / 38

Slide 2

Slide 2 text

Preliminaries Preliminaries 2 / 38

Slide 3

Slide 3 text

Preliminaries The Word Formation Problem The Word Formation Problem Frucht, ferner fruchten, befruchten, Befruchtung, fruchtbar, fruchtig Frucht f. ‘der Fortpflanzung der eigenen Art dienendes Produkt einer Pflanze’, auch ‘ungeborenes Lebewesen’, übertragen ‘Ertrag’, ahd. fruht (9. Jh.), mhd. vruht, asächs. fruht, mnd. mnl. nl. vrucht beruhen auf einer frühen Entlehnung von gleichbed. lat. frūctus, abgeleitet vom Verb lat. fruī (frūctus sum) ‘genießen, Nutzen ziehen’ (verwandt mit brauchen, s. d.). Das Deminutiv Früchtchen hat die spezielle Bedeutung [...] German "Frucht" in Pfeifer (1993, also at http://dwds.de) 3 / 38

Slide 5

Slide 5 text

Preliminaries The Word Formation Problem The Word Formation Problem
[Figure: etymological network for German "Frucht" and related words, with three edge types: inherited from, borrowed from, derived from. Adapted from an illustration by Hans Geisler (University of Düsseldorf).]
PIE *bhreu̯Hg̑- “to use”
PIE *bhruHg̑-ié- “to use” (present tense)
PGM *ƀrūkan- “to use”
OHG brūhhan “to use”
G brauchen “to use”
G Brauch “custom”
OHG fruht “profit, fruit”
G frugal “modest (food)”
Fr fruit “profit, fruit”
Fr frugal “modest (food)”
Lt fruor, fruī “I enjoy”
Lt frūctus “profit”
Lt frux “fruit, grain”
Lt frugalis “bring profit”
German "Frucht" in Pfeifer (1993, also at http://dwds.de) 3 / 38

Slide 6

Slide 6 text

Preliminaries The Word Formation Problem The Word Formation Problem While etymological dictionaries provide us with very detailed scenarios of processes in language change, including processes of inheritance, contact, and word formation, they present this knowledge in prose, which resists quantification and also makes it very difficult to comprehend, especially for those who are not experts in the given language family. 3 / 38

Slide 7

Slide 7 text

Preliminaries The Word Formation Problem The Word Formation Problem
[Figure: words for SUN in four languages]
French solej SUN
Spanish sol SUN
German zɔnə SUN
Swedish suːl SUN
4 / 38

Slide 8

Slide 8 text

Preliminaries The Word Formation Problem The Word Formation Problem 'soh₂-wl̩- sh₂uˈen- SUN Indo-European 4 / 38

Slide 9

Slide 9 text

Preliminaries The Word Formation Problem The Word Formation Problem 'soh₂-wl̩- sh₂uˈen- SUN Indo-European soːwel- sunːoː- SUN Germanic 4 / 38

Slide 10

Slide 10 text

Preliminaries The Word Formation Problem The Word Formation Problem 'soh₂-wl̩- sh₂uˈen- SUN Indo-European soːwel- sunːoː- SUN Germanic zɔnə SUN German suːl SUN Swedish 4 / 38

Slide 11

Slide 11 text

Preliminaries The Word Formation Problem The Word Formation Problem 'soh₂-wl̩- sh₂uˈen- SUN Indo-European soːwel- sunːoː- SUN Germanic soːl- SUN Romance zɔnə SUN German suːl SUN Swedish 4 / 38

Slide 12

Slide 12 text

Preliminaries The Word Formation Problem The Word Formation Problem 'soh₂-wl̩- sh₂uˈen- SUN Indo-European soːwel- sunːoː- SUN Germanic soːl- SUN soːlikul- SMALL SUN Romance zɔnə SUN German suːl SUN Swedish 4 / 38

Slide 13

Slide 13 text

Preliminaries The Word Formation Problem The Word Formation Problem 'soh₂-wl̩- sh₂uˈen- SUN Indo-European soːwel- sunːoː- SUN Germanic soːl- SUN soːlikul- SMALL SUN Romance solej SUN French sol SUN Spanish zɔnə SUN German suːl SUN Swedish 4 / 38

Slide 14

Slide 14 text

Preliminaries The Word Formation Problem The Word Formation Problem
[Figure: tree of the words for SUN, with labelled edges]
Indo-European: 'soh₂-wl̩-, sh₂uˈen- SUN
Germanic: soːwel-, sunːoː- SUN
Romance: soːl- SUN, soːlikul- SMALL SUN
French: solej SUN
Spanish: sol SUN
German: zɔnə SUN
Swedish: suːl SUN
Edge labels: SEMANTIC SHIFT (once), MORPHOLOGICAL CHANGE (four times)
4 / 38

Slide 15

Slide 15 text

Preliminaries The Word Formation Problem The Word Formation Problem Computational phylogenetic approaches, on the other hand, usually ignore word formation, making it difficult to study language change in all its complexity. 4 / 38

Slide 16

Slide 16 text

Preliminaries The Word Formation Problem Reconciling Etymology with Computers Can we find a way to increase the consistency of classical etymological accounts and the complexity of computational approaches to handle word formation processes in comparative-historical linguistics in a more reliable and more transparent way? 5 / 38

Slide 17

Slide 17 text

Preliminaries Annotation and Analysis Annotation and Analysis
Annotation: The first step in linguistic reconstruction, during which we assemble our evidence and identify related words and morphemes in and outside languages.
Analysis: The second, crucial step of linguistic reconstruction, during which we present our hypotheses in the form of historical scenarios that explain how the relations that we annotated evolved into their current shape.
6 / 38

Slide 18

Slide 18 text

Preliminaries Annotation and Analysis Annotation and Analysis
[Figure: words for SUN in four languages]
French solej SUN
Spanish sol SUN
German zɔnə SUN
Swedish suːl SUN
7 / 38

Slide 22

Slide 22 text

Preliminaries Annotation and Analysis Annotation
Descriptive Annotation: The first step of annotation, in which we only state which relations hold between the entities we analyse and do not make specific assumptions regarding the historical processes from which they result.
Etymological Annotation: An interpretive step of annotation in which certain parts of the reconstruction are already carried out and certain basic processes are postulated.
8 / 38

Slide 23

Slide 23 text

Preliminaries Annotation and Analysis Analysis
Linguistic Reconstruction: Postulating a classical proto-form which is supposed to have fulfilled a certain function in the proto-language.
Etymological Analysis: A complete account of a given proto-form, providing the best explanations available for the development of the form in the descendant languages (including idiosyncratic processes).
9 / 38

Slide 24

Slide 24 text

Preliminaries Technical Aspects Data Representation
Multilingual Word List: A list of words from various languages, in which each row corresponds to one word form in one language, marked by a unique identifier in the first column, with additional information regarding the word (meaning, pronunciation, cognacy, etc.) placed in additional columns.
Word List Format: Given the very abstract and lightweight character of this word list structure, it can be stored in all computer formats which allow for this kind of data, such as CSV (comma-separated values, pure text file) or more specific spreadsheet formats (Excel, LibreOffice, GoogleSheets, etc.).
10 / 38

Slide 25

Slide 25 text

Preliminaries Technical Aspects Data Representation
ID  CONCEPT    ORTHOGRAPHY  VALUE      DOCULECT   COGNACY
1   hand       Hand         hant       German     1
2   hand       hand         hænd       English    1
3   hand       рука         ruka       Russian    2
4   hand       рука         ruka       Ukrainian  2
5   leg        Bein         bain       German     3
6   leg        leg          lɛg        English    4
7   leg        нога         noga       Russian    5
8   leg        нога         noha       Ukrainian  5
9   Woldemort  Waldemar     valdemar   German     6
10  Woldemort  Woldemort    wɔldemɔrt  English    6
11  Woldemort  Владимир     vladimir   Russian    6
12  Woldemort  Володимир    volodimir  Ukrainian  6
13  Harry      Harald       haralt     German     7
14  Harry      Harry        hæri       English    7
15  Harry      Гарри        gari       Russian    7
16  Harry      Гаррi        hari       Ukrainian  7
11 / 38
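
The word list structure described above can be read with a few lines of Python. The following is a minimal sketch, assuming the table is stored as tab-separated text with the column names shown on the slide; the function names `read_wordlist` and `cognate_sets` are illustrative and not part of EDICTOR or LingPy. Only the first eight rows are included.

```python
import csv
import io
from collections import defaultdict

# First eight rows of the word list, stored as tab-separated text.
WORDLIST = """ID\tCONCEPT\tORTHOGRAPHY\tVALUE\tDOCULECT\tCOGNACY
1\thand\tHand\thant\tGerman\t1
2\thand\thand\thænd\tEnglish\t1
3\thand\tрука\truka\tRussian\t2
4\thand\tрука\truka\tUkrainian\t2
5\tleg\tBein\tbain\tGerman\t3
6\tleg\tleg\tlɛg\tEnglish\t4
7\tleg\tнога\tnoga\tRussian\t5
8\tleg\tнога\tnoha\tUkrainian\t5
"""


def read_wordlist(text):
    """Map each numeric row ID to a dictionary of its column values."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {int(row["ID"]): row for row in reader}


def cognate_sets(wordlist):
    """Group (doculect, value) pairs by their COGNACY identifier."""
    sets = defaultdict(list)
    for row in wordlist.values():
        sets[int(row["COGNACY"])].append((row["DOCULECT"], row["VALUE"]))
    return dict(sets)


wordlist = read_wordlist(WORDLIST)
sets = cognate_sets(wordlist)
for cognate_id, members in sorted(sets.items()):
    print(cognate_id, members)
```

Keeping the numeric ID as the dictionary key mirrors the role of the first column as a unique identifier for each word form.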

Slide 26

Slide 26 text

Preliminaries Technical Aspects Data Representation 11 / 38

Slide 27

Slide 27 text

Preliminaries Technical Aspects Data Representation EDICTOR Version 0.1 (List 2016) http://edictor.digling.org 11 / 38

Slide 28

Slide 28 text

Preliminaries Technical Aspects Examples for Basic Fields
DOCULECT: The name of the language variety (EDICTOR prefers a simple name, without special characters and whitespace).
CONCEPT: The gloss for the concept expressed by the word (content arbitrary, as long as different concepts are given different glosses).
TOKENS: Machine-readable representation of phonetic transcription. Space-segmented, that is, phonetic/phonological units (“sounds”) are separated by spaces.
12 / 38
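
The two formal requirements above (simple doculect names, space-segmented tokens) are easy to check automatically. The following sketch assumes that "simple" means ASCII letters, digits, and underscores only; this reading, and the function names, are assumptions made for illustration rather than a rule stated by EDICTOR.

```python
import re


def is_simple_doculect(name):
    """True if the name consists only of ASCII letters, digits, or underscores
    (an assumed reading of 'no special characters and whitespace')."""
    return re.fullmatch(r"[A-Za-z0-9_]+", name) is not None


def parse_tokens(tokens):
    """Split a space-segmented transcription into its sound segments."""
    return tokens.split()


print(is_simple_doculect("German"))      # plain name
print(is_simple_doculect("Old English")) # contains whitespace
print(parse_tokens("h a n t"))
```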

Slide 29

Slide 29 text

Preliminaries Technical Aspects Relations
COGNACY: The identifier for cognate sets, by which words which have the same identifier are grouped into the same cognate set.
ALIGNMENT: The aligned representation of the word when being aligned with cognate words.
13 / 38

Slide 30

Slide 30 text

Preliminaries Technical Aspects Relations 14 / 38

Slide 31

Slide 31 text

Compounding Compounding 15 / 38

Slide 32

Slide 32 text

Compounding Preliminaries Handling Compounding in SEA Languages The following is joint work with Nathan Hill (SOAS, London). We developed the ideas for handling and analysing word formation in a project on the history of the Burmish language family. 16 / 38

Slide 33

Slide 33 text

Compounding Preliminaries Compounding in SEA Languages
German   m  oː  n  t  -
English  m  uː  n  -  -
Danish   m  ɔː  n  -  ə
Swedish  m  oː  n  -  e
17 / 38
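
The alignment of the four Germanic words above can be stored in the same space-segmented form as the tokens, with "-" marking a gap. The following is a small sketch of how such an ALIGNMENT column might be checked and read column by column; the helper names are hypothetical.

```python
GAP = "-"

# Aligned forms as shown on the slide, one string per doculect.
ALIGNMENTS = {
    "German": "m oː n t -",
    "English": "m uː n - -",
    "Danish": "m ɔː n - ə",
    "Swedish": "m oː n - e",
}


def parse_alignment(alignment):
    """Split an aligned string into its cells (segments and gaps)."""
    return alignment.split()


def is_valid(alignments):
    """All aligned words in one cognate set must have the same length."""
    lengths = {len(parse_alignment(a)) for a in alignments.values()}
    return len(lengths) == 1


def strip_gaps(alignment):
    """Recover the unaligned token sequence by dropping gap symbols."""
    return [cell for cell in parse_alignment(alignment) if cell != GAP]


def columns(alignments):
    """Return the alignment column by column (one tuple per site)."""
    rows = [parse_alignment(a) for a in alignments.values()]
    return list(zip(*rows))


print(is_valid(ALIGNMENTS))
print(strip_gaps(ALIGNMENTS["German"]))
for site in columns(ALIGNMENTS):
    print(site)
```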

Slide 34

Slide 34 text

Compounding Preliminaries Compounding in SEA Languages
German   m  oː  n  t  -
English  m  uː  n  -  -
Danish   m  ɔː  n  -  ə
Swedish  m  oː  n  -  e

Fúzhōu     ŋ  u  o  ʔ  ⁵   -  -  -  -  -   -  -  -  -  -
Měixiàn    ŋ  i  a  t  ⁵   -  -  -  -  -   k  u  o  ŋ  ⁴⁴
Guǎngzhōu  j  -  y  t  ²   l  -  œ  ŋ  ²²  -  -  -  -  -
Běijīng    -  y  ɛ  -  ⁵¹  l  i  ɑ  ŋ  -   -  -  -  -  -
17 / 38

Slide 35

Slide 35 text

Compounding Preliminaries Compounding in SEA Languages
[Figure: histogram of the number of morphemes per word (x-axis: 1 to 4; y-axis: relative frequency, 0.0 to 1.0), plotted for all words and for nouns, with the values 30% and 50% highlighted.]
Compounds in the basic vocabulary (Swadesh 1952) across 23 Chinese dialects (data by Hamed and Wang 2006)
17 / 38

Slide 36

Slide 36 text

Compounding Preliminaries Partial Cognacy ‘Cognacy is not a binary relation which is either present or not. Instead, we can distinguish different subtypes of cognacy, just as biologists can identify specific types of homology between genes.’ (List 2016: 133) 18 / 38

Slide 37

Slide 37 text

Compounding Annotation of Compounds Language-External Annotation
TOKENS (SEGMENTS): Our phonetic transcriptions are expanded by adding a layer of morphological segmentation in which different morphemes in the same word form are separated by a morpheme separation character (usually a +).
PARTIAL_COGNATES: Partial cognate relations are indicated with the help of partial cognate identifiers, which are separated by a space in the partial-cognate column.
PARTIAL_ALIGNMENTS: When aligning the words, we still write all alignments in only one column, but we align each morpheme in the word only for the partial cognate set to which it belongs.
19 / 38
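
The three columns above can be illustrated with the Chinese words for 'moon' from the earlier alignment. In this sketch, the morpheme boundaries follow that alignment, while the partial cognate identifiers (1, 2, 3) are numbers chosen here for illustration; the function names are likewise hypothetical.

```python
from collections import defaultdict

MORPHEME_SEPARATOR = "+"

# TOKENS with morpheme boundaries; PARTIAL_COGNATES holds one ID per morpheme.
ROWS = [
    {"DOCULECT": "Fúzhōu", "TOKENS": "ŋ u o ʔ ⁵", "PARTIAL_COGNATES": "1"},
    {"DOCULECT": "Měixiàn", "TOKENS": "ŋ i a t ⁵ + k u o ŋ ⁴⁴", "PARTIAL_COGNATES": "1 2"},
    {"DOCULECT": "Guǎngzhōu", "TOKENS": "j y t ² + l œ ŋ ²²", "PARTIAL_COGNATES": "1 3"},
    {"DOCULECT": "Běijīng", "TOKENS": "y ɛ ⁵¹ + l i ɑ ŋ", "PARTIAL_COGNATES": "1 3"},
]


def split_morphemes(tokens):
    """Split a space-segmented transcription at the morpheme separator."""
    morphemes, current = [], []
    for segment in tokens.split():
        if segment == MORPHEME_SEPARATOR:
            morphemes.append(current)
            current = []
        else:
            current.append(segment)
    morphemes.append(current)
    return morphemes


def partial_cognate_sets(rows):
    """Collect the morphemes of all words by partial cognate identifier."""
    sets = defaultdict(list)
    for row in rows:
        morphemes = split_morphemes(row["TOKENS"])
        ids = [int(i) for i in row["PARTIAL_COGNATES"].split()]
        if len(morphemes) != len(ids):
            raise ValueError("one identifier per morpheme expected: " + row["DOCULECT"])
        for morpheme, cognate_id in zip(morphemes, ids):
            sets[cognate_id].append((row["DOCULECT"], " ".join(morpheme)))
    return dict(sets)


partial_sets = partial_cognate_sets(ROWS)
for cognate_id, members in sorted(partial_sets.items()):
    print(cognate_id, members)
```

Each partial cognate set collected in this way contains only the morphemes that belong to it, which is the unit that a partial alignment would then operate on.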

Slide 38

Slide 38 text

Compounding Annotation of Compounds Language-External Annotation Partial cognates which are annotated in this way can be directly converted into the “normal” cognate sets we know from the quantitative phylogenetic analyses. Partial cognate annotation would be extremely tedious in spreadsheets or other formats. The EDICTOR tool, however, supports partial cognate annotation, and LingPy offers tools for partial cognate identification and alignment (List et al. 2016). 19 / 38
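
How partial cognates map onto "normal" cognate sets depends on the criterion chosen. The sketch below shows two possible conversions, called strict and loose here: strict groups words whose morphemes are all cognate in the same order, loose groups words that are connected by at least one shared morpheme. This is plain Python written for illustration, not LingPy's implementation, and the identifiers are the hypothetical ones for the 'moon' example.

```python
# Partial cognate identifiers per word (hypothetical IDs for 'moon').
PARTIAL = {
    "Fúzhōu": (1,),
    "Měixiàn": (1, 2),
    "Guǎngzhōu": (1, 3),
    "Běijīng": (1, 3),
}


def strict_cognates(partial):
    """Words are cognate only if their full ID sequences are identical."""
    labels = {}
    return {word: labels.setdefault(tuple(ids), len(labels) + 1)
            for word, ids in partial.items()}


def loose_cognates(partial):
    """Words are cognate if they are linked by any chain of shared morphemes."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge all partial IDs that occur together in one word.
    for ids in partial.values():
        for other in ids[1:]:
            parent[find(other)] = find(ids[0])

    labels = {}
    return {word: labels.setdefault(find(ids[0]), len(labels) + 1)
            for word, ids in partial.items()}


print(strict_cognates(PARTIAL))
print(loose_cognates(PARTIAL))
```

Under the strict reading the four words fall into three cognate sets, under the loose reading into one, which shows how much information the binary coding discards.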

Slide 39

Slide 39 text

Compounding Annotation of Compounds Language-External Annotation 20 / 38

Slide 45

Slide 45 text

Compounding Annotation of Compounds Language-Internal Annotation
MORPHEMES: We can annotate the morpheme-separated word forms further by using a very straightforward space-segmented format of “word-form-glossing”, in which we use the same identifier (a brief gloss for a concept) to gloss the structure of word forms. In this way, we can annotate the semantic structure of multi-morphemic compounds (compare Chinese shùpí 树皮 ‘bark’, glossed as “tree skin”), and at the same time annotate language-internal cognates (word families or “allofams”) in a transparent way.
21 / 38
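
Because the same gloss is reused wherever the same morpheme recurs, word families can be read off a MORPHEMES column mechanically. The sketch below uses the ‘bark’ example from the slide; the other two rows are hypothetical entries added only to show the grouping, and the function name is illustrative.

```python
from collections import defaultdict

# (CONCEPT, MORPHEMES) pairs for one language; only 'bark' is from the slide,
# the rows for 'tree' and 'skin' are hypothetical.
ROWS = [
    ("tree", "tree"),
    ("bark", "tree skin"),
    ("skin", "skin"),
]


def word_families(rows):
    """Map each morpheme gloss to the words (and positions) in which it recurs."""
    families = defaultdict(list)
    for concept, morphemes in rows:
        for position, gloss in enumerate(morphemes.split()):
            families[gloss].append((concept, position))
    # Keep only glosses that occur in more than one word form.
    return {gloss: members for gloss, members in families.items() if len(members) > 1}


families = word_families(ROWS)
for gloss, members in families.items():
    print(gloss, members)
```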

Slide 46

Slide 46 text

Compounding Annotation of Compounds Language-Internal Annotation Morpheme annotation using word-form-glossing is supported by the EDICTOR tool and will be supported in the form of partial colexification analyses in future versions of LingPy. 21 / 38

Slide 47

Slide 47 text

Compounding Annotation of Compounds Language-Internal Annotation 21 / 38

Slide 48

Slide 48 text

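Compounding Analysis of Compounds Ancestral State Reconstruction
[Figure: words for 'moon' in four Chinese dialects, with their morphemes linked to the characters 月, 光, and 亮 and plotted on trees over the dialects B, G, M, F]
Běijīng: yɛ⁵¹ liɑŋ¹
Guǎngzhōu: jyt² lœŋ²²
Měixiàn: ŋiat⁵ kuoŋ⁴⁴
Fúzhōu: ŋuoʔ⁵
22 / 38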

Slide 50

Slide 50 text

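Compounding Analysis of Compounds Ancestral State Reconstruction
[Figure: words for 'moon' in four Chinese dialects, with their morphemes linked to the characters 月, 光, and 亮 and plotted on trees over the dialects B, G, M, F; events marked on the trees: LOSS, INNOVATION (twice), BORROWING]
Běijīng: yɛ⁵¹ liɑŋ¹
Guǎngzhōu: jyt² lœŋ²²
Měixiàn: ŋiat⁵ kuoŋ⁴⁴
Fúzhōu: ŋuoʔ⁵
22 / 38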

Slide 51

Slide 51 text

Compounding Analysis of Compounds Ancestral State Reconstruction Once we have annotated the compounds in lexical data for partial cognacy, we can use various methods to infer how the data can best be explained by assuming different processes of change in compounding. Currently, only prototypes are available (see, e.g., List 2016), but evolutionary biology, with its framework of tree reconciliation methods, already offers a rich arsenal of methods which we can adapt to our linguistic needs in the future. Given that compounding is treated carelessly in most etymological dictionaries of SEA languages, the development of new quantitative analyses inspired by biological techniques may turn out to be very fruitful for our discipline. 22 / 38
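
To give a concrete idea of what inferring processes on a tree can look like, the sketch below applies Fitch parsimony to the presence or absence of each 'moon' morpheme. This is a much simpler standard technique than the tree reconciliation methods mentioned above, and it is not the prototype from List (2016). The tree topology is a hypothetical assumption made purely for illustration; the presence values follow the four dialect forms shown earlier.

```python
# Hypothetical binary tree over the four dialects (not a claimed phylogeny).
TREE = (("Běijīng", "Guǎngzhōu"), ("Měixiàn", "Fúzhōu"))

# 1 = the word for 'moon' contains the morpheme, 0 = it does not.
PRESENCE = {
    "月": {"Běijīng": 1, "Guǎngzhōu": 1, "Měixiàn": 1, "Fúzhōu": 1},
    "光": {"Běijīng": 0, "Guǎngzhōu": 0, "Měixiàn": 1, "Fúzhōu": 0},
    "亮": {"Běijīng": 1, "Guǎngzhōu": 1, "Měixiàn": 0, "Fúzhōu": 0},
}


def fitch(tree, states):
    """Bottom-up pass of Fitch parsimony on a binary tree.

    Returns the set of most parsimonious states at the root of `tree`
    and the minimal number of state changes needed below it.
    """
    if isinstance(tree, str):  # a leaf: its observed state, no changes
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:  # children agree on at least one state
        return common, left_cost + right_cost
    # children disagree: one change is needed on one of the branches
    return left_set | right_set, left_cost + right_cost + 1


results = {morpheme: fitch(TREE, states) for morpheme, states in PRESENCE.items()}
for morpheme, (root_states, changes) in results.items():
    print(morpheme, sorted(root_states), changes)
```

On this assumed tree, 月 is reconstructed as present at the root without any change, 光 as a single innovation, and 亮 remains ambiguous at the root, which is the kind of case where a distinction between loss, innovation, and borrowing would be needed.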

Slide 52

Slide 52 text

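Compounding Analysis of Compounds Ancestral State Reconstruction
[Figure: ancestral state reconstruction of the words for 'moon' on a tree of Chinese dialects]
Dialects (tree leaves): Fúzhōu, Táiběi, Xiàmén, Zhāngpíng, Guǎngzhōu, Měixiàn, Liánchéng, Wēnzhōu, Níngbō, Sūzhōu, Shànghǎi, Shànghǎi_B, Nánchāng, Ānyì, Chángshā, Shuāngfēng, Yàngshān, Wǔhàn, Níngxià, Chéngdū, Běijīng, Tàiyuán, Yúcì
Group labels: Mǐn, Hakka, Wú, Gàn, Xiāng, Guānhuà
Word forms in the legend: 月, 月娘, 月光佛, 月光, 月亮, 月明
Glosses in the legend: ‘MOON’, ‘MOON-MOTHER’, ‘MOON-LIGHT’, ‘MOON-LIGHT-SUFFIX’, ‘MOON-SHINE’, ‘MOON-BRIGHT’
List (2016, Journal of Language Evolution) 23 / 38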

Slide 63

Slide 63 text

Compounding Analysis of Compounds Pattern Analysis
[The Pattern column of the original table shows each pattern graphically and is not reproduced here.]
Freq.  Note
82     Mostly a-prefix followed by non-cognate items, e.g., Bola [a³¹ + thaʔ⁵⁵] vs. Achang [a³¹ + lum³¹] ‘above’.
58     Mostly one plain noun vs. one with an additional part, e.g., Bola [tʃa̱m⁵⁵] vs. Lashi [tsɔ̱m⁵⁵ + mou⁵⁵] {cloud + sky} ‘cloud’.
53     Mostly one plain noun vs. one with an additional part in front, often simply the a-prefix, e.g., Atsi [siŋ²¹] vs. Achang [a³¹ + ʂəŋ³¹] ‘liver’.
36     Common main noun, but different preceding noun, e.g., Bola [mjaŋ⁵⁵ + kʰui³⁵] {thunder + dog} vs. Lashi [wɔm³¹ + kʰui⁵⁵] {bear + dog} ‘wolf’.
34     The common part is prefixed in one word and suffixed in the other, e.g., Atsi [u²¹ + tsham⁵¹] {head + hair} vs. Rangoon [shɑ̃²² + pĩ²²] {hair + ?} ‘hair (of head)’.
77     Remaining: 30 unique patterns, 15 of which occur only once.
Partial Cognate Patterns in Burmish Languages (Hill & List, work in progress) 24 / 38
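
A pattern of the kind counted above can be derived from the partial cognate identifiers of two words by relabelling the identifiers in order of first appearance. The following is a sketch under that assumption; the identifiers in the example are invented for illustration and are not taken from the Burmish dataset, and the notation "A B | A C" is a convention chosen here.

```python
from collections import Counter
from string import ascii_uppercase


def pattern(ids_a, ids_b):
    """Describe how two words share morphemes, e.g. 'A B | A C'.

    Identifiers are replaced by letters in order of first appearance,
    so that word pairs with the same structure yield the same pattern.
    """
    letters = {}

    def label(identifier):
        if identifier not in letters:
            letters[identifier] = ascii_uppercase[len(letters)]
        return letters[identifier]

    left = " ".join(label(i) for i in ids_a)
    right = " ".join(label(i) for i in ids_b)
    return left + " | " + right


# Word pairs modelled on the examples in the table (IDs are hypothetical).
PAIRS = [
    ((1, 2), (1, 3)),  # Bola [a + thaʔ] vs. Achang [a + lum] 'above'
    ((4,), (4, 5)),    # Bola [tʃa̱m] vs. Lashi [tsɔ̱m + mou] 'cloud'
    ((6,), (1, 6)),    # Atsi [siŋ] vs. Achang [a + ʂəŋ] 'liver'
    ((7, 8), (9, 8)),  # Bola [mjaŋ + kʰui] vs. Lashi [wɔm + kʰui] 'wolf'
]

frequencies = Counter(pattern(a, b) for a, b in PAIRS)
for name, count in frequencies.items():
    print(name, count)
```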

Slide 64

Slide 64 text

Compounding Analysis of Compounds Pattern Analysis To our knowledge, partial cognate pattern analysis has not been carried out in linguistics so far. It may, however, turn out to provide us with important evidence on the directionality of certain patterns. Regardless of directionality, it may also be interesting to see to which degree language families in which compounding is frequent differ in their major patterns. 25 / 38

Slide 65

Slide 65 text

Derivation Derivation 26 / 38

Slide 66

Slide 66 text

Derivation Preliminaries Handling Derivation in Sino-Tibetan Languages The following is joint work with Guillaume Jacques (CRLAO, Paris), who provided a first explicit annotation of verbal derivation in Kiranti languages (an annotated version of Jacques forthcoming, A reconstruction of Proto-Kiranti verb roots, Folia Linguistica Historica). We developed the ideas for handling derivation, and preliminary ideas for analysing it, in several discussions during the last months. 27 / 38

Slide 67

Slide 67 text

Derivation Preliminaries Derivation in SEA Languages
[Figure: derivation tree with the nodes *rib, rab.rib, srib, sa.srib, grib, grib.ma, sgrib, ɴgrib]
Exemplary derivations of word forms from the Tibetan root *rib 'to be dark', as proposed by Jacques (2016, Une famille de mot en Tibétain, URL: https://panchr.hypotheses.org/1273), visualized as a derivation tree. The root itself is not attested, but it can be posited based on the semantic relations between the members of the word family.
28 / 38

Slide 68

Slide 68 text

Derivation Preliminaries Levels of Annotation
Hierarchical Aspect: The hierarchical aspect plays a much more crucial role in derivations than in plain transparent compounds. Instead of a simple morpheme segmentation, an annotation of derivations needs to account for this aspect.
Paradigmatic Aspect: While compounds are syntagmatic structures, derivations may further show paradigmatic variation. If derivation is expressed by a voicing contrast, as, for example, the anticausative in some Kiranti languages (Jacques forthcoming), this is a paradigmatic change which cannot be modeled with the help of alignments. A transparent annotation of derivations also needs to account for this.
29 / 38

Slide 69

Slide 69 text

Derivation Annotation of Derivation Language-Internal Annotation
Basic Principle of Annotation: As a general idea for annotation, we can model syntagmatic derivation in a similar way in which we propose to model compounding. As a rule, the machine-readable transcription of words is provided in morpheme-segmented form.
Hierarchical Annotation: In most cases, the hierarchical aspect can be modeled by annotating that one element attaches to another element. Bantawa sakt ‘to weed’, an applicative derivation of an intransitive verb, can thus be written as "s a k ← t", indicating that the applicative marker "t" attaches as a suffix to the main morpheme (Jacques forthcoming).
30 / 38
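
The arrow notation above can be parsed with very little code. The sketch below handles only the suffix case shown on the slide ("←" pointing from the affix to the element it attaches to); how prefixes would be marked is not specified in the source, so they are deliberately left out. The function name and the returned structure are illustrative assumptions.

```python
ATTACH_LEFT = "←"  # the element after the arrow attaches to what precedes it


def parse_derivation(tokens):
    """Split a transcription like 's a k ← t' into base and suffixes."""
    parts = [part.split() for part in tokens.split(ATTACH_LEFT)]
    return {
        "base": parts[0],
        "suffixes": parts[1:],
        # surface form without the hierarchical markers
        "surface": [segment for part in parts for segment in part],
    }


analysis = parse_derivation("s a k ← t")
print(analysis["base"], analysis["suffixes"], "".join(analysis["surface"]))
```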

Slide 70

Slide 70 text

Derivation Annotation of Derivation Language-Internal Annotation
Paradigmatic Annotation: Paradigmatic annotation requires two additional layers to allow for a sufficiently abstract handling. Each word form is linked to a root in a separate ROOT column, and to a stem in a separate STEM column. The syntagmatic derivations can be placed in a DERIVATION column, in which derivation is annotated in a form similar to the one we used for the word-form-glossing. The hierarchical relations underlying the derivations need to be handled independently of the word list and can be passed to the word list in the form of directed networks.
31 / 38

Slide 71

Slide 71 text

Derivation Annotation of Derivation Language-Internal Annotation Cognate Annotation Cognate annotation follows the partial cognate annotation principle, but word forms with paradigm variation in their stems are assigned different identifiers, reflecting the fact that they cannot be aligned. An additional cognate identifier is needed to make sure that different stems can be assigned to the same root. 31 / 38

Slide 72

Slide 72 text

Derivation Annotation of Derivation Language-External Annotation We can also use the language-internal annotation for language-external annotation, with the difference that our root and stem cells will now hold the proto-forms of the proposed proto-language. Problems we cannot yet sufficiently handle are those where the actual derivation cannot be traced back to the proto-language, as this would require introducing intermediate ancestral forms. More examples are needed to handle these cases. 32 / 38

Slide 73

Slide 73 text

Derivation Annotation of Derivation Language-External Annotation
DOCULECT  SEGMENTS  ROOT      STEM         DERIVATION
French    sol←ej    *soh₂wl-  *soh₂wl + ?  REKTUS DIM
Spanish   sol       *soh₂wl-  *soh₂wl      REKTUS
German    zɔnə      *soh₂wl-  *sh₂en       OBLIQUUS
Swedish   suːl      *soh₂wl-  *soh₂wl      REKTUS
Annotating words for ‘sun’ in four Indo-European languages. 32 / 38
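
With ROOT and STEM stored in separate columns, words that cannot be aligned can still be grouped under a common root. The sketch below takes the ROOT, STEM, and DERIVATION values from the table above and groups the doculects first by root and then by stem; the function name and the nested dictionary layout are assumptions made for illustration.

```python
from collections import defaultdict

# (DOCULECT, ROOT, STEM, DERIVATION) as given in the table for 'sun'.
ROWS = [
    ("French", "*soh₂wl-", "*soh₂wl + ?", "REKTUS DIM"),
    ("Spanish", "*soh₂wl-", "*soh₂wl", "REKTUS"),
    ("German", "*soh₂wl-", "*sh₂en", "OBLIQUUS"),
    ("Swedish", "*soh₂wl-", "*soh₂wl", "REKTUS"),
]


def stems_by_root(rows):
    """Group doculects by root and, within each root, by stem."""
    grouped = defaultdict(lambda: defaultdict(list))
    for doculect, root, stem, _derivation in rows:
        grouped[root][stem].append(doculect)
    return {root: dict(stems) for root, stems in grouped.items()}


grouped = stems_by_root(ROWS)
for root, stems in grouped.items():
    print(root)
    for stem, doculects in stems.items():
        print("  ", stem, doculects)
```

All four words end up under one root, while only the Spanish and Swedish forms share a stem and could therefore be aligned directly.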

Slide 74

Slide 74 text

Derivation Annotation of Derivation Language-External Annotation 32 / 38

Slide 77

Slide 77 text

Derivation Annotation of Derivation Language-External Annotation Given the derivation patterns as provided by G. Jacques, the data structure in the demo application could be generated automatically. The same applies to the stems, which were derived by applying the conversion rules proposed for each derivation process. Already in this preliminary form, the data provides a much larger degree of transparency than the classical etymological dictionaries, and further analyses can be easily carried out. 32 / 38

Slide 78

Slide 78 text

Derivation Analysis of Derivation Phylogenetic Reconstruction (?) Since the direction in compounding processes is really hard to estimate (and may well be highly language-specific or even show no preference at all), we need a reference phylogeny to get initial insights into the major processes. In derivation, however, we usually have a clear idea regarding the direction of the processes. Since directional processes are very useful for phylogenetic estimation, we suppose that a larger dataset annotated in this form may be very suitable for phylogenetic reconstruction methods. For this, however, more languages need to be added to G. Jacques’ Kiranti database. 33 / 38

Slide 79

Slide 79 text

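Derivation Analysis of Derivation Pattern Analysis
[Figure: network of derivational transitions between the categories causative, reflexive, deponent, applicative, anticausative, intransitive, and transitive; edge frequencies shown in the figure: 18, 4, 8, 4, 14, 2, 2, 70, 3, 9, 10, 9]
Frequency of derivational transitions in G. Jacques’ Kiranti data. 34 / 38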

Slide 80

Slide 80 text

Outlook Outlook 35 / 38

Slide 81

Slide 81 text

Outlook Challenges for the Future
Analogy: So far, we cannot handle analogy, but we need to handle it at some point, both in annotation and analysis.
Software and Tools: We need better heuristics to produce proper data and the tools to correct computational preprocessing within the frameworks for annotation which we define.
Complexity and Comparability: We surely do not yet cover everything that is possible in languages, be it in synchrony or diachrony, so we need to develop more and more examples to enhance our specification.
36 / 38

Slide 82

Slide 82 text

Outlook Linguists, and I myself, often complain about the insufficiency of the new biological methods applied to linguistic data. But if we, as linguists, fail to annotate our data in such a way that it can be easily compared across languages and applications, if we do not work harder to make our data transparent enough so that it can be read by humans and machines, we should not complain that computational linguists use lousy data and lousy algorithms. Better and sufficient algorithms will come, if only we manage to increase the comparability of our linguistic data. 37 / 38

Slide 83

Slide 83 text

Outlook Thank you for your attention! 38 / 38