Slide 1

Slide 1 text

Studying Language Contact within a Computer-Assisted Framework
Johann-Mattis List
Research Group “Computer-Assisted Language Comparison”
Department of Linguistic and Cultural Evolution
Max-Planck Institute for the Science of Human History, Jena, Germany
2019-06-01
1 / 32

Slide 2

Slide 2 text

Introduction: Language contact and lexical borrowing

Slide 4

Slide 4 text

Language Contact and Language History: Language History
August Schleicher (1821-1868): “These assumptions, which follow logically from the results of our research, can be best illustrated by the image of a branching tree.” (Schleicher 1853: 787)

Slide 5

Slide 5 text

Language History: Schleicher (1853)

Slide 6

Slide 6 text

Language Contact
Johannes Schmidt (1843-1901): “I want to replace [the tree] by the image of a wave that spreads out from the center in concentric circles becoming weaker and weaker the farther they get away from the center.” (Schmidt 1872: 27, my translation)

Slide 7

Slide 7 text

Language Contact: Schmidt (1875)

Slide 9

Slide 9 text

Language History and Language Contact
Hugo Schuchardt (1842-1927): “We connect the branches and twigs of the tree with countless horizontal lines and it ceases to be a tree.” (Schuchardt 1870 [1900]: 11)

Slide 10

Slide 10 text

Language History and Language Contact

Slide 12

Slide 12 text

Studying Language Contact: Similarities between Languages
similarities
  coincidental: Grk. theós, Lat. deus ‘god’
  non-coincidental
    natural: Chi. māma, Ger. Mama ‘mother’
    non-natural
      genealogical: Eng. tooth, Ger. Zahn ‘tooth’
      non-genealogical: Eng. Marlboro, Chi. wànbǎolù (proper name)
Source: List (2014, DUP: Düsseldorf), List (forthcoming)

Slide 13

Slide 13 text

Detecting Language Contact
Evidence            Example
direct              Cantonese [tʰai³³-iœŋ²¹] 太陽 (Mandarin tàiyáng)
phylogeny-related   English mountain vs. French montagne, Spanish montaña
trait-related       German Damm vs. English dam
distribution-based  German Job, Joker, Junkie, Journal
Source: List (forthcoming)

Slide 14

Slide 14 text

Detecting Language Contact
Convenient shortcuts:
- treat lookalikes between Chinese and Hmong-Mien as borrowings from Chinese, for historical reasons (Ratliff 2010)
- assume that all vocabulary from a specific semantic field is borrowed (e.g., religion, seafaring)

Slide 15

Slide 15 text

Computational Historical Linguistics
- started in the early 21st century with phylogenetic approaches (Gray and Atkinson 2003, Ringe et al. 2002)
- accompanied by pioneering work on sequence comparison (Kondrak 2000)
- later followed by a growing number of approaches to different topics (phylogenetic networks, Nakhleh et al. 2005; automatic cognate detection, Hauer and Kondrak 2011)
- now a fully established sub-field of historical linguistics

Slide 16

Slide 16 text

Computational Approaches to Language Contact
Proposed solutions:
- identify conflicts in the phylogeny and explain them by invoking borrowings (MLN approach, Nelson-Sathi et al. 2011, List et al. 2014)
- search for similar words among unrelated languages (Mennecier et al. 2016)
- tree reconciliation methods (Willems et al. 2016)
- borrowability statistics (Sergey Yakhontov, as reported by Starostin 1990, Chén 1996, McMahon et al. 2005)

Slide 17

Slide 17 text

Computational Approaches to Language Contact
Performance of proposed solutions:
- approaches based on conflicts in the phylogeny tend to overestimate the amount of borrowing, since conflicts in phylogenies have multiple causes, not only borrowing (Morrison 2011)
- sequence comparison on unrelated languages seems solid, but one needs to be careful with chance resemblances due to onomatopoetic words and the like (mama, papa, etc., Jakobson 1960, Blasi et al. 2016)
- tree reconciliation methods are unrealistic if word trees are derived from simple edit distances
- sublist approaches (borrowability statistics) may be useful, but they require large amounts of data on known borrowings, which we usually lack

Slide 18

Slide 18 text

Computer-Assisted Language Comparison

Slide 19

Slide 19 text

Background: Historical Linguistics in the Digital Age
- the amount of data in linguistics is steadily increasing
- our qualitative methods are reaching their practical limits
- we need to take computational methods into account
- but computational methods are not very accurate and may yield wrong results

Slide 20

Slide 20 text

Project

Slide 31

Slide 31 text

CALC
- ERC Starting Grant (2017-2022)
- Host: MPI-SHH (Jena)
- Current team: 2 post-docs, 2 doctoral students, and myself
- Objectives go beyond historical linguistics and Sino-Tibetan (but these are our starting point)
- http://calc.digling.org

Slide 32

Slide 32 text

Studying Language Contact with CALC: Studying Borrowing with CALC

Slide 33

Slide 33 text

Computer-Assisted Problem Solving
1. identify the core class of your problem (modeling, inference, analysis)
2. formalize the problem in a way that allows one to test it (specify data and techniques for evaluation)
3. do not hesitate to define sub-problems, given that qualitative solutions are often holistic
4. look at existing qualitative solutions
5. search for inspiration in neighboring disciplines (graph theory, computer science, evolutionary biology) by looking for similar processes that could be addressed in an analogous or similar way
6. accept a qualitative or semi-automatic solution for inference processes, but make sure that the results are annotated in a machine-readable way
7. insist on transparent output (no black boxes) to allow for an immediate review of results by experts

Slide 34

Slide 34 text

Modeling, Inference, and Analysis
[Figure: Modeling, Inference, Analysis; labels: 20 x, 10 x, 5 x, ?]

Slide 35

Slide 35 text

Identify Core Problems (1-3)
- borrowing is a process that happened at different stages in time, reflected in the form of borrowing or contact layers
- identifying the source and target of borrowings is almost impossible without knowing the history of a given area
- distinguishing borrowing from inheritance, chance, and typological patterns of denotation is also difficult for classical linguistics
- contact areas may overlap

Slide 36

Slide 36 text

Look at Existing Solutions (4-6)
- recent borrowings can be detected with automatic sequence comparison approaches (Mennecier et al. 2016)
- searching for borrowings in unrelated languages spoken in similar regions can control for inheritance (Mennecier et al. 2016)
- highly advanced techniques for cognate detection are now available (List 2014, List et al. 2017)
- methods for clustering and partitioning are well advanced, but need to be applied correctly
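To make the idea of sequence comparison across unrelated languages concrete, here is a minimal sketch. It assumes a plain normalized edit distance and an arbitrary threshold of 0.5. The language names, family labels, and word forms are invented toy data, and the function names are hypothetical. This is not the method of Mennecier et al. (2016), only an illustration of the general principle.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two sequences of sound segments."""
    previous = list(range(len(b) + 1))
    for i, seg_a in enumerate(a, start=1):
        current = [i]
        for j, seg_b in enumerate(b, start=1):
            cost = 0 if seg_a == seg_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_distance(a, b):
    """Edit distance divided by the length of the longer sequence."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def borrowing_candidates(words, families, threshold=0.5):
    """Return same-concept word pairs from different families that are similar.

    words: list of (language, concept, segments) tuples
    families: dict mapping each language to its family label
    threshold: assumed cut-off, not an empirically tuned value
    """
    hits = []
    for (l1, c1, s1), (l2, c2, s2) in combinations(words, 2):
        if c1 != c2 or families[l1] == families[l2]:
            continue  # only compare the same concept across families
        dist = normalized_distance(s1, s2)
        if dist <= threshold:
            hits.append((c1, l1, l2, round(dist, 2)))
    return hits


# Invented toy data: two languages from two unrelated families.
families = {"A1": "A", "B1": "B"}
words = [
    ("A1", "HORSE", ["m", "a"]),
    ("B1", "HORSE", ["m", "a"]),
    ("A1", "DOG", ["k", "u", "n"]),
    ("B1", "DOG", ["t", "a", "p"]),
]
print(borrowing_candidates(words, families))
```

Because the languages compared belong to different families, inheritance is ruled out as an explanation for a hit, but chance resemblance is not, so any candidate still needs expert review.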

Slide 37

Slide 37 text

Insist on Transparent Output (7)
- lift the data to a high standard of phonetic transcription
- use interactive applications to share the findings transparently
- test and train rigorously on datasets from different languages of the world
- prefer direct output (concrete items identifying a contact area) for initial studies

Slide 38

Slide 38 text

Example from South-East Asian (SEA) Languages
Language varieties shown: Burmish_Achang; Baheng, East; Baheng_West; Bana; Biao Min; Sui_Banliang; Sinitic_Changsha; Sinitic_Chaozhou; Sinitic_Chengdu; Chuanqiandian; Chuanqiandian_Central_Guizhou; Chuanqiandian_Northeast_Yunnan; Chuanqiandian_Southern_Guizhou; Dongnu; Sinitic_Guangzhou; Bai_Jianchuan; Jiongnai; Kim_Mun; Sinitic_Kunming; Bai_Luobenzhuo; Luobuohe_Eastern; Luobuohe_Western; Sinitic_Meixian; Mien; Sinitic_Nanchang; Numao; Nunu; Sui_Pandong; Qiandong_East; Qiandong_North; Qiandong_South; Qiandong_West; Sui_Sandong; She; Sinitic_Xi_an; Xiangxi_East; Xiangxi_West; Bai_Xiangyun; Sinitic_Yangjiang; Yi_Dafang; Yi_Mile; Yi_Mojiang; Yi_Nanhua; Yi_Nanjian; Yi_Xide; Younuo; Zao_Min

Slide 39

Slide 39 text

Data Preparation: Language Data
- 48 SEA languages from three different families (Sino-Tibetan, Hmong-Mien, Tai-Kadai), aggregated from four different sources (Beijing University 1964, Sun et al. 1991, Chen 2012, Castro 2015)
- unified phonetic transcriptions following the Cross-Linguistic Transcription Systems framework (Anderson et al. forthcoming, https://clts.clld.org)
- unification of elicitation glosses with the help of Concepticon (List et al. 2016, https://concepticon.clld.org)
- data curation following the principles of the Cross-Linguistic Data Formats initiative (Forkel et al. 2018, https://cldf.clld.org)
- first inspection of the data with the help of EDICTOR (List 2017, http://edictor.digling.org)
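As a rough illustration of what curated wordlist data of this kind can look like in practice, the sketch below reads a small CSV table into a per-concept dictionary. It assumes a forms table with the columns ID, Language_ID, Parameter_ID, Form, and Segments, with segments separated by spaces, which is how CLDF wordlists are commonly laid out. The rows are invented and the helper name `read_forms` is hypothetical. Real CLDF datasets ship with a metadata file that should be consulted for the actual column layout.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows; a real dataset would be read from a forms.csv file.
SAMPLE = """ID,Language_ID,Parameter_ID,Form,Segments
1,LangA,HORSE,ma,m a
2,LangB,HORSE,ma,m a
3,LangA,DOG,kun,k u n
"""


def read_forms(handle):
    """Group (language, segments) pairs by concept (Parameter_ID)."""
    by_concept = defaultdict(list)
    for row in csv.DictReader(handle):
        segments = row["Segments"].split()  # space-separated sound segments
        by_concept[row["Parameter_ID"]].append((row["Language_ID"], segments))
    return dict(by_concept)


forms = read_forms(io.StringIO(SAMPLE))
for concept, entries in forms.items():
    print(concept, entries)
```

Keeping the segmented transcription next to the language and concept identifiers is what allows later steps (alignment, distance calculation, clustering) to work on sounds rather than on raw orthographic strings.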

Slide 40

Slide 40 text

Borrowing Inference
A. within-family cognate detection using LexStat as implemented in LingPy (List 2014, List et al. 2017)
B. cross-family borrowing detection using a new feature-based, prosody-aware approach to pronunciation distance calculation, combined with a flat clustering approach
C. interactive analysis of inferences
D. partitioning of cognate sets into groups indicative of a contact zone
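The flat clustering step in the workflow above can be sketched as follows. This is a simplified stand-in under stated assumptions: the distance is one minus the similarity ratio from Python's `difflib` (not the feature-based, prosody-aware distance mentioned in the slides), the clustering is plain single-linkage with a fixed threshold, and all names and word forms are invented for illustration.

```python
from difflib import SequenceMatcher


def distance(a, b):
    """Stand-in pronunciation distance: 1 minus difflib's similarity ratio."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def flat_clusters(forms, threshold):
    """Single-linkage flat clustering over a list of segmented forms.

    Two forms end up in the same cluster if they are connected by a chain
    of pairs whose distance is below the threshold.
    """
    parent = list(range(len(forms)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(forms)):
        for j in range(i + 1, len(forms)):
            if distance(forms[i], forms[j]) < threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(forms)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


def cross_family_clusters(entries, threshold=0.5):
    """Return the languages of clusters that span more than one family.

    entries: list of (language, family, segments) tuples for ONE concept
    """
    forms = [segments for _, _, segments in entries]
    result = []
    for cluster in flat_clusters(forms, threshold):
        if len({entries[i][1] for i in cluster}) > 1:
            result.append([entries[i][0] for i in cluster])
    return result


# Invented toy entries for a single concept.
entries = [
    ("A1", "A", ["m", "a"]),
    ("B1", "B", ["m", "a"]),
    ("B2", "B", ["t", "a", "p"]),
]
print(cross_family_clusters(entries))
```

Clusters that span several families are only candidates for borrowing; in the workflow described here they would still go through interactive inspection before being accepted.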

Slide 41

Slide 41 text

Borrowing Inference: Pronunciation Distance Calculation
- pronunciation distance depends on prosody (with weak and strong positions in each word, see List 2014)
- feature systems covering large numbers of sounds were lacking so far, but are now available with CLTS (Anderson et al. forthcoming)
- alignment methods are well developed and can be used to compare words beforehand (List 2014)
The approach is work in progress; contact me for more information.
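Since the distance calculation itself is described as work in progress, the sketch below only illustrates the alignment step that can precede it. It is a textbook global alignment (Needleman-Wunsch) with assumed unit scores for match, mismatch, and gap. It does not use sound features or prosodic weights, and the function name `align` is hypothetical.

```python
def align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two segment sequences; return (row_a, row_b, score)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Score for pairing segment a[i-1] with segment b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one best alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return row_a[::-1], row_b[::-1], score[n][m]


# Inputs can be any sequences, e.g. lists of segmented sounds.
row_a, row_b, total = align(list("abc"), list("ac"))
print(" ".join(row_a))
print(" ".join(row_b))
print(total)
```

A prosody-aware variant could, for instance, replace the constant scores with position-dependent ones, but how exactly this is done in the approach presented here is not specified in the slides.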

Slide 42

Slide 42 text

Data Analysis: Contact Areas

Slide 43

Slide 43 text

Data Analysis: Contact Areas
[Figure: labels for the language varieties of the sample, including Sinitic_Beijing; legend: Burmish, Hmongic, Sinitic, Mienic, Sui, Bai, Nesu]

Slide 44

Slide 44 text

Data Analysis: Contact Areas
- two major contact areas (Hmong-Mien and Sui, Sinitic/Bai and Hmong-Mien)
- not all languages are under similar influence
- inspection shows that most borrowings can be confirmed

Slide 45

Slide 45 text

Data Analysis: Contact Areas
- partitioning cognate sets and their associated meanings based on their distribution across languages yields about 6 groups in which five or more concepts are consistently shared
- the groups show different distributions and offer additional insights into the distribution of shared lexical traits
- as some problems are not yet handled (missing data, specific coding errors), a manual analysis should ideally start from here
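One simple way to picture such a partitioning is sketched below. It assumes that cognate sets are grouped only when they occur in exactly the same set of languages, which is stricter than what a real analysis would need (missing data alone would break identical profiles). The function name, the toy language labels, and the toy cognate sets are all invented, and the partitioning method actually used for the results reported here may differ.

```python
from collections import defaultdict


def partition_by_distribution(cognate_sets, min_concepts=5):
    """Group concepts whose cognate sets share the same language profile.

    cognate_sets: iterable of (concept, languages) pairs, one per cognate set
    min_concepts: keep only groups with at least this many concepts
    """
    groups = defaultdict(set)
    for concept, languages in cognate_sets:
        groups[frozenset(languages)].add(concept)
    return {
        profile: sorted(concepts)
        for profile, concepts in groups.items()
        if len(concepts) >= min_concepts
    }


# Invented toy input: three cross-family cognate sets over languages L1-L3.
cognate_sets = [
    ("HORSE", {"L1", "L2"}),
    ("DUCK", {"L1", "L2"}),
    ("ROPE", {"L2", "L3"}),
]
groups = partition_by_distribution(cognate_sets, min_concepts=2)
for profile, concepts in groups.items():
    print(sorted(profile), concepts)
```

Each surviving group pairs a set of languages with the concepts they consistently share, which is the kind of direct, inspectable output that a manual analysis can start from.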

Slide 46

Slide 46 text

Data Analysis: Contact Areas
Concepts: ASK (INQUIRE), BEAN, BIG, BIRD, CHICKEN, CRY, DAY (NOT NIGHT), DIE, DRINK, DUCK, EGG, FAECES (EXCREMENT), FAR, HORSE, HUNDRED, KILL, OLD (USED), ROPE, THIS
[Figure: labels for the language varieties of the sample; legend: Sui, Sinitic, Nesu, Bai, Mienic, Burmish, Hmongic]

Slide 47

Slide 47 text

Data Analysis: Contact Areas
Concepts: BEAR, BITE, CHILI PEPPER, CHOOSE, FAST, HOE, PEAR, POOR, SALTY, WASH
[Figure: labels for the language varieties of the sample; legend: Sui, Sinitic, Nesu, Bai, Mienic, Burmish, Hmongic]

Slide 48

Slide 48 text

Data Analysis: Contact Areas
Concepts: BE HUNGRY, FIREWOOD, HARD, JUMP, MOUTH, SOUP, THIN (SLIM), WELL
[Figure: labels for the language varieties of the sample; legend: Sui, Sinitic, Nesu, Bai, Mienic, Burmish, Hmongic]

Slide 49

Slide 49 text

Data Analysis: Contact Areas
Concepts: ANT, CLAW, MONKEY, SPARROW, SWEET POTATO, YOUNGER BROTHER
[Figure: labels for the language varieties of the sample; legend: Sui, Sinitic, Nesu, Bai, Mienic, Burmish, Hmongic]

Slide 50

Slide 50 text

Data Analysis: Contact Areas
Concepts: DRINK, FAST, NOSE, THICK, THUNDER
[Figure: labels for the language varieties of the sample; legend: Sui, Sinitic, Nesu, Bai, Mienic, Burmish, Hmongic]

Slide 51

Slide 51 text

Data Analysis: Contact Areas
Concepts: GRASS, PEAR, RIDE, TIRED, WALK
[Figure: labels for the language varieties of the sample; legend: Sui, Sinitic, Nesu, Bai, Mienic, Burmish, Hmongic]

Slide 52

Slide 52 text

Data Analysis: Concept Statistics
- by checking the purity of cognate sets with respect to the families across which they occur, we can rank concepts according to their relative borrowability in our dataset
- borrowability is often thought of as a stable characteristic of concepts, partly due to Swadesh’s doctrine of basic vocabulary, but concepts evolve with culture: terms for technical innovations may be highly borrowable as long as they are new, but are unlikely to be borrowed again later
- therefore, all statistics on borrowability have to be taken with care, as they also reflect the history of a given region and not necessarily general patterns of language change
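The slides do not spell out how purity is computed, so the following sketch makes an explicit assumption: the purity of a cognate set is the share of its words that belong to its most frequent family, and the score of a concept is the mean purity of its cognate sets. Family labels, cognate set IDs, and rows are invented toy data, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def purity(families):
    """Share of the most frequent family among the words of a cognate set."""
    counts = Counter(families)
    return max(counts.values()) / len(families)


def concept_purity(rows):
    """Mean cognate-set purity per concept.

    rows: iterable of (concept, cognate_set_id, family), one per word
    """
    families_by_set = defaultdict(list)
    concept_of_set = {}
    for concept, cogid, family in rows:
        families_by_set[cogid].append(family)
        concept_of_set[cogid] = concept

    purities = defaultdict(list)
    for cogid, families in families_by_set.items():
        purities[concept_of_set[cogid]].append(purity(families))
    return {c: sum(v) / len(v) for c, v in purities.items()}


# Invented toy rows: one mixed cognate set for HORSE, two pure sets for HAND.
rows = [
    ("HORSE", 1, "F1"), ("HORSE", 1, "F1"), ("HORSE", 1, "F2"), ("HORSE", 1, "F3"),
    ("HAND", 2, "F1"), ("HAND", 2, "F1"),
    ("HAND", 3, "F2"),
]
scores = concept_purity(rows)
ranking = sorted(scores, key=scores.get)  # least pure (most mixed) first
print(scores)
print(ranking)
```

Under this definition, a low score means that the words for a concept cluster across family boundaries, which is what one would expect for concepts that were borrowed in the region under study.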

Slide 53

Slide 53 text

Data Analysis: Concept Statistics

Slide 55

Slide 55 text

Data Analysis: Concept Statistics
- a weak negative correlation (Spearman rank: -0.18, p<0.005) between the purity of concepts with respect to potential borrowings and the borrowing statistics of the World Loanword Database (WOLD, Haspelmath and Tadmor 2008)
- a weak positive correlation (Spearman rank: 0.19, p<0.005) with the WOLD project’s age score
- the new ranks based on concept purity could be used to expand the limited scope of the WOLD project systematically
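For readers who want to reproduce this kind of comparison on their own rankings, here is a minimal sketch of the Spearman rank correlation in plain Python. It computes the Pearson correlation of average ranks, which handles ties. It does not compute a p-value, and the numbers in the usage lines are toy values, not the purity or WOLD scores discussed here.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        rank = (start + end) / 2 + 1  # mean of the 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy values only: a perfectly reversed ordering.
print(spearman([1, 2, 3], [3, 2, 1]))
```

With real data, the two input lists would hold one score per concept from each source, restricted to the concepts that both sources cover.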

Slide 56

Slide 56 text

Outlook: *deh3 - ?

Slide 57

Slide 57 text

Outlook
- enhance the accuracy of our contact inference workflow
- apply it to more language families (esp. South American languages)
- work on the inference of more ancient borrowings
- work on the inference of borrowing directions
- enhance the interactive output

Slide 58

Slide 58 text

Outlook