
Open problems in computational historical linguistics

Plenary talk held at the 24th International Conference on Historical Linguistics (2019-07-01/05, Canberra, Australian National University).

Johann-Mattis List

July 02, 2019

Transcript

  1. Open Problems in Computational Historical Linguistics Johann-Mattis List Research Group

    “Computer-Assisted Language Comparison”, Department of Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Jena, Germany, 2019-07-02 1 / 60
  2. Introduction Problems Problems (we ignore) La Société n’admet aucune communication

    concernant, soit l’origine du langage, soit la création d’une langue universelle. (Statuts de la Société de Linguistique de Paris, 1866: III) The Society will not allow any work dealing with the origin of language or the creation of a universal language. (rules of the Paris Society of Linguistics from 1866, my transl.) 3 / 60
  3. Introduction Problems Problems (we did not know about) The Proto-Sapiens

    grammar was so simple that the sporadic references in previous paragraphs have essentially described it. The prime importance of sound symbolism for the people of nature should be noted again before we further detail that the vowel “E” was felt as indicating the “yin” element, passivity, femininity etc. [...] (Papakitsos and Kenanidis 2018: 8) 4 / 60
  4. Introduction Problems Problems (we forgot) Based on an analysis of

    the literature and a large scale crowdsourcing experiment, we estimate that an average 20-year-old native speaker of American English knows 42,000 lemmas and 4,200 non-transparent multiword expressions, derived from 11,100 word families. (Brysbaert et al. 2016: 1) 5 / 60
  5. Introduction Hilbert Problems Hilbert Problems 23 problems identified by the

    mathematician David Hilbert in 1900 (Hilbert 1902); at least 10 problems have been solved by now; some 7 problems have solutions accepted by some scientists. 6 / 60
  6. Introduction Hilbert Problems Hilpert Problems Martin Hilpert proposed a list

    of problems for linguistics in a talk in 2014. Russell D. Gray further promoted the idea in a series of talks, where he emphasized that we should ask more Hilb/pert questions in the field of diversity linguistics. 7 / 60
  8. Introduction Problems in CHL Problems in Computational Historical Linguistics

    The problems I want to discuss are “small” in comparison to the big-picture questions asked by Hilpert and Gray; “personal”, i.e., identified by myself and not necessarily interesting to everybody; and “solvable”, i.e., I guess they have a solution. I discuss them in the hope that they will help us to advance our research by forcing us to formalize our work. 8 / 60
  9. Open Problems Open Problems in Computational Historical Linguistics 9 / 60
  10. Open Problems Background Background: A Series of Blog Posts 10

    problems in total; initial basic division into problems of inference, simulation, statistics, and typology; problems will be discussed on a monthly basis throughout 2019; the first five problems were already discussed in February, March, April, May, and June. 11 / 60
  11. Open Problems Inference Problems Inference Problems Inference: (1) automated morpheme

    segmentation (blog in February 2019), (2) automated borrowing detection (blog in March 2019), (3) automated sound law induction (blog in April 2019), (4) automated phonological reconstruction (blog in May 2019). 13 / 60
  12. Open Problems Inference Problems Inference Problems Inference problems deal with

    something we want to find in linguistic data. Their common objective is to identify past and present processes and states of which we – due to our models – think that they have occurred or existed once, or still occur and exist. 14 / 60
  13. Open Problems Modeling Problems Modeling Problems Modeling: (5) simulation of

    lexical change (blog in June 2019), (6) simulation of sound change, (7) proof of language relatedness. 15 / 60
  14. Open Problems Modeling Problems Modeling Problems The modeling problems deal

    with our knowledge about processes and how we account for the processes in a formal or mathematical way. Proof of language relatedness is a specific case, maybe not completely fitting into this category, but its key objective is to model chance resemblances, which is why it is basically also a modeling task and not a task of inference. 16 / 60
  15. Open Problems Analysis Problems Analysis Problems Analysis: (8) typology of semantic change,

    (9) typology of semantic promiscuity, (10) typology of sound change. 17 / 60
  16. Open Problems Analysis Problems Analysis Problems The analysis problems deal

    with the bigger picture of the processes, and with the question of whether we can derive tendencies, rates, or frequencies from linguistic data. In order to achieve this, we need to infer the processes first, and this is the reason why these problems are discussed last. 18 / 60
  17. Open Problems Analysis Problems Analysis Problems: Semantic Promiscuity List et

    al. (2016): Unity and disunity [...]. Biology Direct. 19 / 60
  18. Open Problems Analysis Problems Analysis Problems: Semantic Promiscuity List (2018):

    Von Wortfamilien [...]. Von Wörtern und Bäumen. 19 / 60
  19. Open Problems Analysis Problems Analysis Problems: Semantic Promiscuity In der

    Linguistik gibt es noch keinen richtigen Terminus für Wörter, die selbst Grundlage von vielen anderen Wörtern sind [...]. In Anlehnung an die Biologie, wo wir in den Proteindomänen ähnliche Phänomene vorfinden [...], könnten wir jedoch von promiskuitiven Konzepten sprechen [...]. (List 2018: Von Wortfamilien und promiskuitiven Wörtern) In linguistics, we lack a term for words that serve themselves as the basis for many other words [...]. Following biology, where we find similar phenomena with respect to protein domains [...], we could, however, speak of promiscuous concepts. (List 2018, my translation). 19 / 60
  20. Problem Solving CALC Computer-Assisted Language Comparison

    Funding: ERC Starting Grant (2017-2022) Host: MPI-SHH (Jena) Current team: 2 post-docs, 2 docs, and myself Objectives: establish CALC framework for Sino-Tibetan and beyond http://calc.digling.org 22 / 60
  21. Problem Solving Mind the Machines Mind the Machines (?) [...]

    it was at the 1985 workshop [...] that Fred Jelinek uttered the now immortal phrase “Every time we fire a phonetician/linguist, the performance of our system goes up”. (Moore 2005: 1) SkyNet Use AI to Dismiss Traditional Linguists 23 / 60
  22. Problem Solving Mind the Machines Mind the Machines (!) Problems

    may have an exact solution. → Why search for an approximate one? Machine learning techniques are not apt for all tasks at hand. → We all need to leave our comfort zones! We do not only want to know what happened but why it happened! → Blackbox results are of no scientific value. Our data in historical linguistics is usually not big. → Big data solutions often do not work for small data. 24 / 60
  23. Problem Solving Computer-Assisted Problem Solving Computer-Assisted Problem Solving (A) identify

    the core class of your problem (modeling, inference, analysis); (B) look at existing qualitative solutions; (C) formalize the problem in a way that allows you to test it; (D) qualitative solutions are often holistic, so do not hesitate to specify sub-problems; (E) search for inspiration in neighboring disciplines by looking for similar processes; (F) accept a qualitative or semi-automatic solution for inference, but make sure the results are also machine-readable; (G) insist on transparent output to allow experts to review the results. 25 / 60
  24. Possible Solutions Possible Solutions for the Inference Problems 26 / 60
  25. Possible Solutions Morpheme Segmentation Automated Morpheme Segmentation: Task Given a

    list of fewer than 1000 words in phonetic transcription, readily segmented into sounds, with concepts mapped to common concept lists (e.g., Concepticon), identify the morpheme boundaries in the data. List (2019): “Automatic morpheme segmentation (Open problems in computational diversity linguistics 1)”. GWPN 8.2. 27 / 60
  26. Possible Solutions Morpheme Segmentation Automated Morpheme Segmentation: Current Solutions Most

    algorithms build on n-grams (recurring symbol sequences of arbitrary length). Assuming that n-grams representing meaning-building units should be distributed more frequently across the lexicon of a language, they assemble n-gram statistics from the data. With Morfessor, there is a popular family of algorithms available in the form of a stable library (Creutz and Lagus 2005, Virpioja et al. 2013). 28 / 60
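
The n-gram idea can be sketched as follows. This is a minimal illustration, not Morfessor's actual model: it only counts in how many words each recurring sequence of segments occurs. The function name `ngram_word_counts` and the toy word list (orthographic forms rather than phonetic transcriptions) are assumptions made for this example.

```python
from collections import Counter

def ngram_word_counts(words, max_n=3):
    """Count, for every n-gram of segments, the number of words containing it."""
    counts = Counter()
    for segments in words:
        seen = set()
        for n in range(1, max_n + 1):
            for i in range(len(segments) - n + 1):
                seen.add(tuple(segments[i:i + n]))
        counts.update(seen)  # each word contributes once per distinct n-gram
    return counts

# Hypothetical toy data: words already segmented into symbols.
words = [
    "h e r m a n o".split(),
    "h e r m a n a".split(),
    "a m i g o".split(),
    "a m i g a".split(),
]
counts = ngram_word_counts(words, max_n=6)

# Candidate meaning-building units: sequences longer than one symbol
# that recur in more than one word.
recurring = {gram: c for gram, c in counts.items() if c > 1 and len(gram) > 1}
```

Sequences such as `h e r m a n` and `a m i g` recur across words and would be candidates for morphemes, but form statistics alone cannot tell them apart from accidental recurrences, which is the ambiguity problem discussed on the following slides.
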
  27. Possible Solutions Morpheme Segmentation Automated Morpheme Segmentation: Difficulty Ambiguity: morphemes

    are ambiguous, they are not only based on the form, but also on semantics. Fuzziness: morpheme boundaries are often fuzzy, even speakers may at times no longer understand the original morphology of their language. Task definition: morpheme boundaries depend on the task at hand, as morphological judgments can be based on different perspectives (historical perspective involving more than one language, speaker intuition, descriptive grammar). 30 / 60
  28. Possible Solutions Morpheme Segmentation Automated Morpheme Segmentation: Qualitative Solutions Semantic

    evidence: humans take semantics into account (compare Spanish herman-o “brother” vs. herman-a “sister”). Language-specific evidence: humans know that morphological structure varies across languages (compare SEA languages vs. Indo-European languages) and adjust their strategies accordingly. Phonetic evidence: humans try to infer phonotactic rules for the languages they work with. Cross-linguistic evidence: humans make use of comparisons across related languages to search for morpheme boundaries. 31 / 60
  29. Possible Solutions Morpheme Segmentation Automated Morpheme Segmentation: Suggestions To enhance

    our current methods, we need to A invest time to create datasets for testing and training, B employ semantic information (make use of new resources such as CLICS, Concepticon), C employ phonotactic information (make use of the prosody models in LingPy and new resources like CLTS), D employ cross-linguistic information (use sequence comparison techniques as those implemented in LingPy), and (maybe) E give up the idea of a universal morpheme segmentation algorithm (rather proceed from linguistic areas or families). 32 / 60
  30. Possible Solutions Morpheme Segmentation Automated Morpheme Segmentation: Current Work We

    pursue initial work on a Morpheme-Annotated Lexical Database (MOALD) in the CALC group, based on aggregation strategies used in the CLICS project (List et al. 2018), building on the standardization efforts of the CLDF initiative (Forkel et al. 2018), using data from individual collaborations on computer-assisted language comparison (e.g., T. C. Chacon for Tukanoan languages, A. Hantgan for Dogon languages). N. E. Schweikhard, doctoral student in the CALC group, will present these initial ideas in a talk titled “Towards a Database of Morpheme-Annotated Wordlists” at the ICHL on Thursday. 33 / 60
  31. Possible Solutions Borrowing Detection Automated Borrowing Detection: Task Given word

    lists of different languages, find out which words have been borrowed, and also determine the direction of borrowing. Example words: mountain, mouse, wifi. List (2019): “Automatic borrowing detection (Open problems in computational diversity linguistics 2)”. GWPN 8.3. 34 / 60
  32. Possible Solutions Borrowing Detection Automated Borrowing Detection: Current Solutions Some

    approaches make use of conflicts in the phylogeny, explaining them by invoking borrowings (MLN approach, Nelson-Sathi et al. 2011, List et al. 2014). Some approaches search for similar words among unrelated languages (Mennecier et al. 2016). Tree reconciliation methods compute trees of individual words from different languages and then infer borrowing processes by comparing the individual word phylogenies with language phylogenies computed from all words together (Willems et al. 2016). Borrowability statistics (as proposed by Sergey Yakhontov, as reported by Starostin 1991, Chén 1996, or McMahon et al. 2005) can be used to compare commonalities across stable and less stable parts of vocabularies, assuming that commonalities in unstable parts can be attributed to borrowing. 35 / 60
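
The sublist idea in the last point can be sketched as follows. This is a simplified illustration under stated assumptions, not a reimplementation of any of the cited studies: the function name `sublist_shares`, the concept labels, and the match judgments are all hypothetical.

```python
def share(hits):
    """Proportion of True values in a list of booleans (0.0 for an empty list)."""
    return sum(hits) / len(hits) if hits else 0.0

def sublist_shares(matches, stable_concepts):
    """Return the share of matching words in the stable and the unstable sublist.

    matches maps each concept to True if two languages have similar words for it.
    """
    stable = [hit for concept, hit in matches.items() if concept in stable_concepts]
    unstable = [hit for concept, hit in matches.items() if concept not in stable_concepts]
    return share(stable), share(unstable)

# Hypothetical judgments for one language pair.
matches = {
    "HAND": False, "WATER": False, "EYE": True,       # assumed stable concepts
    "MOUNTAIN": True, "SOUP": True, "HORSE": True,    # assumed less stable concepts
}
stable_share, unstable_share = sublist_shares(matches, {"HAND", "WATER", "EYE"})

# More matches in the unstable part than in the stable part is read
# as a hint at borrowing rather than inheritance.
suggests_borrowing = unstable_share > stable_share
```
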
  33. Possible Solutions Borrowing Detection Automated Borrowing Detection: Performance Conflicts in

    the phylogeny tend to overestimate the amount of borrowing, since there are multiple reasons for conflicts in phylogenies, not only borrowing (Morrison 2011). Sequence comparison on unrelated languages seems solid, but one needs to be careful with chance resemblances (mama, papa, etc., Jakobson 1960, Blasi et al. 2016), and we need to improve our metrics for phonetic similarity. Tree reconciliation methods are unrealistic if word trees are derived from simple edit distances, as was done in the studies presented so far, and they also overestimate the amount of borrowing. Sublist approaches may be useful, but they require extensive records of known borrowings to derive the ranked lists, and it is not clear if borrowing rates are stable across times and places. 36 / 60
  34. Possible Solutions Borrowing Detection Automated Borrowing Detection: Difficulty Lack of

    positive criteria: detecting borrowing presupposes excluding alternative explanations (inheritance, natural patterns, chance). Lack of unified criteria: there is no unified procedure for the identification of borrowings in the classical discipline. Difficulties in handling cumulative evidence: borrowing detection relies much more on multiple types of evidence (“consilience”, “cumulative evidence”) than other tasks in historical linguistics, and there is no straightforward way to weight the evidence. 37 / 60
  35. Possible Solutions Borrowing Detection Automated Borrowing Detection: Qualitative Solutions Direct

    evidence: by comparing the same language across different times, we can easily see if a word has been borrowed (cf. Cantonese [tʰai33-iœŋ21] “sun” with Mandarin tàiyáng). Phylogeny-related conflicts: seemingly cognate words that cannot be readily explained with the phylogeny of the languages may often hint at borrowing (cf. English mountain and French montagne). Trait-related conflicts: when sound correspondences appear irregular, this is often a hint at borrowing (cf. German Damm vs. English dam). Distribution-related conflicts: specific sounds or words with specific phonotactics that occur only in specific semantic fields may point to borrowing (cf. German Joker, Job, Junkie, Journal). List (forthc.) “Automated methods [...]” Language and Linguistics Compass. 38 / 60
  36. Possible Solutions Borrowing Detection Automated Borrowing Detection: Suggestions To enhance

    our current methods, we need to A increase cross-linguistic data in phonetic transcription, with consistent definition of meanings to search for similar words among unrelated languages, B test methods for automatic correspondence pattern recognition to search for trait-related conflicts (List 2019), C work on cross-linguistic datasets of known borrowed words to increase our knowledge of borrowability, and D investigate methods for concept-based stratification. 39 / 60
  37. Possible Solutions Borrowing Detection Automated Borrowing Detection: Current Work In

    the CALC project, we currently develop A cross-linguistic datasets to test borrowing relations in contact areas, based on high-quality data in phonetic transcription reflecting carefully selected concept lists, B new feature-based metrics of phonetic word similarity, based on the features developed for the CLTS project on Cross-Linguistic Transcription Systems (Anderson et al. 2019), and C methods for contact-zone detection based on a new method for cognate set partitioning. M.-S. Wu, doctoral student in the CALC group, will illustrate how we assemble data for South-East Asia in a talk titled “Studying language contact in South East Asia with help of computer-assisted approaches” at the ICHL on Tuesday. 40 / 60
  38. Possible Solutions Borrowing Detection Automated Borrowing Detection: Current Work

    [Figure labelled with the concepts ASK (INQUIRE), BEAN, BIG, BIRD, CHICKEN, CRY, DAY (NOT NIGHT), DIE, DRINK, DUCK, EGG, FAECES (EXCREMENT), FAR, HORSE, HUNDRED, KILL, OLD (USED), ROPE, THIS and with language varieties from the groups Sui, Sinitic, Nesu, Bai, Mienic, Burmish, and Hmongic.] List (in prep.): “Automated detection of lexical strata”. 40 / 60
  39. Possible Solutions Borrowing Detection Automated Borrowing Detection: Current Work

    [Figure labelled with the same concepts and language varieties as on the previous slide, plus the concepts BE HUNGRY, FIREWOOD, HARD, JUMP, MOUTH, SOUP, THIN (SLIM), WELL.] List (in prep.): “Automated detection of lexical strata”. 40 / 60
  40. Possible Solutions Sound Law Induction Automated Sound Law Induction: Task

    Given a list of words in an ancestral language and their reflexes in a descendant language, identify the sound laws by which the ancestor can be converted into the descendant. *p > *pf / #_ List (2019): “Automatic sound law induction (Open problems in computational diversity linguistics 3)”. GWPN 8.3. 41 / 60
  41. Possible Solutions Sound Law Induction Automated Sound Law Induction: Current

    Solutions No direct studies dealing with this task are known to me, but studies covering similar tasks include simulation studies (see e.g., Ciobanu and Dinu 2018) for word prediction, manual tools to model sound change when providing sound laws (e.g., PHONO by Hartmann 2003), and correspondence-pattern based word prediction (List 2019, Bodt and List 2019). 42 / 60
  42. Possible Solutions Sound Law Induction Automated Sound Law Induction: Difficulty

    Methods for induction: the induction of rules as a problem is usually not addressed in machine learning solutions. Distant phonological context: triggering context for sound change may be found at arbitrary distances from the target sound. Abstract phonological context: “abstract” contexts from suprasegmentals (e.g. tone and stress) can also condition sound change. Systematic aspects of sound change: sound change often affects groups of phonemes in a similar manner, i.e., it affects parts of the phonological system of a language. 43 / 60
  43. Possible Solutions Sound Law Induction Automated Sound Law Induction: Qualitative

    Solutions Trial and error: there are no general strategies that scholars follow, instead, they seem to like to figure this out themselves, similar to people who like to solve Sudoku or Chess riddles. 44 / 60
  44. Possible Solutions Sound Law Induction Automated Sound Law Induction: Suggestions

    We can address the problem of sound law induction (at least in part) with the help of techniques for multi-tiered sequence modeling (List 2014, List and Chacon 2015). The basic idea of these techniques is to represent words not only as consisting of a single sequence of sounds, but instead as some kind of partitura (a musical score) in which each of the different phonological aspects of a word form is given its own voice. This technique then allows us to model all different possible conditioning contexts in separate layers (tiers), and to use heuristics to search for those tiers which actually condition a given sound change. 45 / 60
  45. Possible Solutions Sound Law Induction Automated Sound Law Induction: Suggestions

    Orthography: k i n d e r g a r t e n
    IPA:         k ɪ n d ɐ g a ʁ t ə n
    Stress:      2 2 2 0 0 0 1 1 1 0 0 0
    45 / 60
  46. Possible Solutions Sound Law Induction Automated Sound Law Induction: Suggestions

    IPA:       k ɪ n d ɐ g a ʁ t ə n
    Prec. CV:  # C V c C V C V c C V
    Preceding: # k ɪ n d ɐ g a ʁ t ə
    Following: ɪ n d ɐ g a ʁ t ə n $
    Foll. CV:  V c C V C V c C V C $
    45 / 60
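
The context tiers on this slide can be derived mechanically from the segmented word. The following is a minimal sketch, not the CALC library mentioned later in the talk: the function name `build_tiers`, the tier names, and the toy vowel inventory are assumptions, and the sketch uses only `C` and `V`, without the additional lowercase `c` class that appears on the slide.

```python
VOWELS = set("ɪɐaəeiouɛɔ")  # assumption: a toy vowel inventory for this example

def build_tiers(segments):
    """Return context tiers for a segmented word, one value per segment."""
    cv = ["V" if s in VOWELS else "C" for s in segments]
    return {
        "segments": list(segments),
        "preceding": ["#"] + list(segments[:-1]),  # '#' marks the word start
        "following": list(segments[1:]) + ["$"],   # '$' marks the word end
        "prec_cv": ["#"] + cv[:-1],
        "foll_cv": cv[1:] + ["$"],
    }

tiers = build_tiers("k ɪ n d ɐ g a ʁ t ə n".split())
for name, values in tiers.items():
    print(f"{name:10}", " ".join(values))
```

Each tier has the same length as the word, so position i in any tier describes the context of segment i. Further tiers (stress, tone, syllable position) could be added in the same way.
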
  47. Possible Solutions Sound Law Induction Automated Sound Law Induction: Suggestions

    Proto:      p p p p p p p p p p p
    Stress:     2 2 2 2 2 1 0 1 0 1 0
    Prec. CV:   # # C C V V V # C # C
    Foll. CV:   C V c c V V V C V c V
    Descendant: p p p p f f f h h h h
    45 / 60
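
One simple heuristic for finding the tiers that condition a change is to look for the smallest combination of tiers whose values fully predict the reflex. The sketch below is an illustration of that idea only, not the method developed in the CALC project. The function name `conditioning_tiers` and the tier names are assumptions; the tier values are copied from the slide as transcribed.

```python
from itertools import combinations

def conditioning_tiers(tiers, outcome):
    """Return the smallest tier combinations whose values determine the outcome."""
    names = sorted(tiers)
    for size in range(1, len(names) + 1):
        hits = []
        for combo in combinations(names, size):
            seen = {}
            consistent = True
            for i, reflex in enumerate(outcome):
                context = tuple(tiers[name][i] for name in combo)
                # A context that maps to two different reflexes rules the combo out.
                if seen.setdefault(context, reflex) != reflex:
                    consistent = False
                    break
            if consistent:
                hits.append(combo)
        if hits:
            return hits
    return []

tiers = {
    "stress": "2 2 2 2 2 1 0 1 0 1 0".split(),
    "prec_cv": "# # C C V V V # C # C".split(),
    "foll_cv": "C V c c V V V C V c V".split(),
}
descendant = "p p p p f f f h h h h".split()

result = conditioning_tiers(tiers, descendant)
print(result)
```

For these values no single tier predicts the reflex, while the combination of preceding context and stress does. With so few data points such a result can easily be an artifact of overfitting, so a real system would need larger datasets and a way to prefer linguistically plausible conditions.
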
  49. Possible Solutions Sound Law Induction Automated Sound Law Induction: Current

    work In the CALC project, we are currently developing A a Python library to work with multi-tiered sequence representations, B datasets for testing and training, and C methods and metrics for the evaluation of reconstruction systems based on multi-tiered sequence representations (see List forthc. for initial ideas in this regard). T. Tresoldi, post-doc in the CALC group, will further illustrate the usefulness of multi-tiered sequence representations in a talk titled “Automatic Induction of Sound Laws from Cognates” at the ICHL on Thursday. 46 / 60
  50. Possible Solutions Phonological Reconstruction Automated Phonological Reconstruction: Task Given a

    set of alignments of strict cognate morphemes across a set of related languages, as well as the typical correspondence patterns by which the sounds in the languages correspond to each other, try to infer the hypothetical pronunciation of each morpheme in the proto-language. List (2019): “Automatic phonological reconstruction (Open problems in computational diversity linguistics 4)”. GWPN 8.4. 47 / 60
  51. Possible Solutions Phonological Reconstruction Automated Phonological Reconstruction: Current Solutions Bouchard-Côté

    et al. (2013) use a framework that makes use of probabilistic string transducers. If the family tree of the languages is known, and cognate sets are defined as such, the method produces proto-form suggestions. In a forthcoming paper, Gerhard Jäger illustrates how classical methods for ancestral state reconstruction applied to aligned cognate sets could be used for phonological reconstruction and illustrates this for ASJP wordlists of the Romance languages (Jäger forthcoming). 48 / 60
  52. Possible Solutions Phonological Reconstruction Automated Phonological Reconstruction: Performance The method

    by Bouchard-Côté was only tested on Austronesian, and is not available, so it cannot be tested further without re-implementing from scratch. The scores reported are high (error rates between 0.25 and 0.12), but Austronesian is not a challenging candidate for reconstruction. Jäger’s method produces a set of words that is slightly more similar to Latin than the baseline (words from Sardinian). None of the methods is capable of producing sounds that are not found in any of the descendant languages. Evaluation of the methods is carried out with help of the edit distance, which is problematic, since the edit distance does not check for systematic similarities (List forthc.). List (forth.): “Beyond edit distances”. Theoretical Linguistics. 49 / 60
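
The problem with edit-distance evaluation can be made concrete with a small sketch. The code implements the standard Levenshtein distance on segment sequences and normalizes it by the length of the longer sequence; the "gold" forms and the two competing reconstruction systems are hypothetical and exist only to illustrate the point.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of segments."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # deletion
                current[j - 1] + 1,          # insertion
                previous[j - 1] + (x != y),  # substitution (0 if segments match)
            ))
        previous = current
    return previous[-1]

def normalized_edit_distance(a, b):
    """Edit distance divided by the length of the longer sequence."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0

def mean_error(system, gold):
    """Average normalized edit distance of a system against gold forms."""
    scores = [normalized_edit_distance(s.split(), g.split())
              for s, g in zip(system, gold)]
    return sum(scores) / len(scores)

# Hypothetical data: system A differs systematically (b for p),
# system B differs in unrelated, unsystematic ways.
gold = ["p a t a", "p i t u"]
system_a = ["b a t a", "b i t u"]
system_b = ["p a k a", "m i t u"]

error_a = mean_error(system_a, gold)
error_b = mean_error(system_b, gold)
```

Both systems receive the same error score, although system A could be converted into the gold standard by a single regular replacement, while system B could not. A measure that checks for systematic similarities would need to distinguish the two cases.
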
  53. Possible Solutions Phonological Reconstruction Automated Phonological Reconstruction: Difficulty Abstractness of

    reconstructions: scholars still disagree with respect to the question of how reconstruction should be best carried out, i.e., if it should be abstract or realistic (so-called abstractionalist-realist debate, Lass 2017, Jakobson 1958). Evaluation of reconstruction systems: no measures to account for the predictive quality of a given reconstruction system exist. Unattested states: reconstructing what cannot be found in the data, as in the case of laryngeals in Indo-European (Saussure 1879), does not have a counterpart in biology (or is simply ignored). 50 / 60
  54. Possible Solutions Phonological Reconstruction Automated Phonological Reconstruction: Qualitative Sol. Sound

    correspondence patterns: scholars determine the most frequent and salient sound correspondence patterns in their data and base the reconstructions on them (Anttila 1972, Meillet 1903). External evidence: scholars use external evidence where possible (e.g. from more distantly related languages). Internal reconstruction: scholars employ techniques of internal reconstruction where possible. Feature representations: scholars make use of feature representations of sounds to propose unobserved sounds. 51 / 60
  55. Possible Solutions Phonological Reconstruction Automated Phonological Reconstruction: Suggestions To enhance

    the current methods, we need to A create sufficient data for testing and training (different language families, different time depths), B develop measures to compare different reconstruction systems (gold standard and algorithmic solution or competing “home-made” systems) both with each other and with respect to their power to predict the data of the descendant languages, C embrace the possibility of semi-automated reconstruction (e.g. by computing correspondence patterns from alignment data, see List 2019), and D investigate possibilities to take feature systems (as provided in Anderson et al. 2019) into account, in order to allow for the reconstruction of unobserved sounds. 52 / 60
  56. Possible Solutions Phonological Reconstruction Automated Phonological Reconstruction: Current Work In

    our work in the CALC project, we are currently A establishing linguistic reconstructions by collaborating with different researchers on specific subgroups, B testing semi-automatic methods for reconstruction, based on the algorithm for sound correspondence pattern detection by List (2019), C evaluating metrics for the comparison of reconstruction systems (List forthc.), and D testing multi-tier-based methods to test the predictive strength of different reconstruction systems. N. W. Hill (collaborator with CALC) will discuss the computer-assisted reconstruction of Proto-Burmish in his talk titled “Toward a computational implementation of the traditional comparative method” at the ICHL on Thursday. T. A. Bodt (collaborator with CALC) will present how semi-automated reconstruction methods can be further used in a talk titled “The predictive capacity of a computer-assisted framework of the comparative method” at the ICHL on Thursday. 53 / 60
  57. Possible Solutions General Ideas General Ideas: Evaluation Problem: lack of

    good benchmark datasets (especially gold standards, training data, and baselines) and the lack of good evaluation measures Suggested solutions: simulation methods (produce test data) interfaces for data annotation (produce data and evaluate results) 54 / 60
  58. Possible Solutions General Ideas General Ideas: Standards are needed to

    make linguistic data comparable; allow for a better integration of software and data; can also guarantee that data is available in both human- and machine-readable form. Forkel et al. (2018): “Cross-Linguistic Data Formats [...]” Scientific Data. https://cldf.clld.org. 55 / 60
  59. Possible Solutions General Ideas General Ideas: Standards

    Reference Catalogs: Glottolog (languages), Concepticon (concepts), CLTS (sounds).
    Validation Software: >>> from pycldf import * >>> ds = Dataset('path') >>> ds.validate() >>> ds.statistics()
    CLDF, Spreadsheet Formats:
    ID  CONCEPT  IPA    COGNACY
    1   hand     hant   1
    2   hand     hænd   1
    3   ruka     ruka   2
    4   rẽnka    rẽnka  2
    ...
    Online Publication (CLLD) 55 / 60
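
To illustrate why a machine-readable spreadsheet format is useful, the following sketch groups the forms of the toy table on this slide by their cognate set identifier. It uses only the Python standard library and a simplified single-table layout; an actual CLDF dataset consists of several tables plus a metadata file and would normally be read with pycldf. The function name `cognate_sets` is an assumption.

```python
import csv
import io
from collections import defaultdict

# The toy table from the slide, written as comma-separated values.
TABLE = """ID,CONCEPT,IPA,COGNACY
1,hand,hant,1
2,hand,hænd,1
3,ruka,ruka,2
4,rẽnka,rẽnka,2
"""

def cognate_sets(handle):
    """Group the IPA forms of a wordlist by their COGNACY identifier."""
    sets = defaultdict(list)
    for row in csv.DictReader(handle):
        sets[row["COGNACY"]].append(row["IPA"])
    return dict(sets)

sets = cognate_sets(io.StringIO(TABLE))
for cognacy, forms in sets.items():
    print(cognacy, ", ".join(forms))
```

Because the same table can be opened in a spreadsheet editor, it stays readable for humans while remaining directly usable by software.
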
  60. Possible Solutions General Ideas General Ideas: Interfaces allow for a

    rapid annotation of data; guarantee that data is human- and machine-readable; allow for qualitative and quantitative research at the same time. List (2017): “A web-based tool [...]” Proc. of the EACL System Demonstrations. https://edictor.digling.org. 56 / 60
  61. Possible Solutions General Ideas General Ideas: Interfaces

    [Figure: EDICTOR (Etymological DICTionary ediTor, http://edictor.digling.org, List 2017), with the functions annotate data, analyze data, and edit alignments. Example wordlist with the columns ID, DOCULECT, CONCEPT, and SEGMENTS for the concept “moon” in Běijīng, Guǎngzhōu, Měixiàn, and Fúzhōu. Conversion and segmentation: yuE_5_1liaN_1, yɛ⁵¹liɑŋ¹, y ɛ ⁵¹ l i ɑ ŋ ¹. Highlighting of unrecognized phonetic symbols.] 56 / 60
  62. Outlook Measuring Measuring «Measure what is measurable, and make measurable

    what is not so.» (Galileo Galilei [quote apparently falsely attributed to Galilei, see Kleinert 2009]) 58 / 60
  63. Outlook Towards Big Data Towards Big Data CLICS: Database of

    Cross-Linguistic Colexifications http://clics.clld.org List et al. (2018) >1000 languages >1500 concepts 59 / 60
  64. Outlook Towards Big Data Towards Big Data CLICS: Database of

    Cross-Linguistic Colexifications http://clics.clld.org List et al. (2018) [Figure: CLICS visualization labelled with several hundred concept glosses, e.g. CARRY IN HAND, HAND, ARM, WATER, SEA, MOON, MONTH, SUN, DAY (24 HOURS).]
RECOGNIZE (SOMEBODY) SOUR SWEET SUGAR CANE BRACKISH SUGAR TASTY CALCULATE IMITATE CITRUS FRUIT TASTE (SOMETHING) READ COME PRECIPICE SEE STONE OR ROCK APPROACH TOUCH ARRIVE YEAR MEET GRIND FRAGRANT ROTTEN SMELL (STINK) SMELL (PERCEIVE) STINKING SNIFF PUS FEEL UNDERSTAND HEAR THINK (BELIEVE) LISTEN MOVE (AFFECT EMOTIONALLY) KNOW (SOMETHING) NOTICE (SOMETHING) WATCH LEARN REEF STUDY LOOK FOR LOOK NASAL MUCUS (SNOT) SPLASH PITY HIDE (CONCEAL) SHELF FLY (MOVE THROUGH AIR) REGRET NOSTRIL THIEF BOARD SINK (DESCEND) DECREASE CHEEK NOSE BROKEN LOSE EMERGE (APPEAR) ANXIETY BAD LUCK GOOD LUCK OMEN WRONG SLAB FOREHEAD EYE BAD EVIL TABLE INJURE DANGER SURPRISED HARVEST BERRY FEAR (FRIGHT) NUT FAULT MISTAKE BECOME SICK SEED MISS (A TARGET) GUILTY SWELLING BRUISE BLISTER BOIL (OF SKIN) SCAR CHOKE ENTER ACHE SICK DISEASE PAIN DAMAGE (INJURY) SEVERE GRIEF SAUSAGE BEAD STOMACH INTESTINES CHAIN SPLEEN NECKLACE WOMB LIVER BELLY MEANING GHOST POSTCARD HEART LEGENDARY CREATURE SHADE DEMON BRAIN MEMORY FIGHT LETTER THOUGHT MIND BOOK COLLAR INTENTION SPIRIT PURSUE LONG HAIR SPRINGTIME HAIR (HEAD) THINK (REFLECT) DOUBT AUTUMN ORNAMENT HOPE ARMY QUARREL BEAT SOLDIER KNOCK BATTLE NOISE REST NAPE (OF NECK) THROAT NECK IDEA IF BECAUSE SLEEP FOREST DRIP (FALL IN GLOBULES) STICK TREE WALKING STICK PLANT (VEGETATION) LIE (REST) DRAG ASK (INQUIRE) DIVIDE URGE (SOMEONE) STING BRANCH CAMPFIRE BORROW SEPARATE TOOTH MOUTH CANDLE FALL ASLEEP DRIVE (CATTLE) MATCH DRIVE RAFTER BEAM DOORPOST DREAM (SOMETHING) POST MAST TUMBLE (FALL DOWN) WALK TREE TRUNK LAND (DESCEND) TEAR (SHRED) SAW GO OUT FALL TEAR (OF EYE) GO DOWN (DESCEND) BODY TREE STUMP SHOW CARVE SPOIL (SOMEBODY OR SOMETHING) BREAK (CLEAVE) PLANT (SOMETHING) DESTROY WALK (TAKE A WALK) CHIN BREAK (DESTROY OR GET DESTROYED) CUT PICK SPLIT LEAVE PULL CLUB WOOD MOVE (ONESELF) HIRE PRAISE MIX KNEAD WIPE SNEEZE BOAST SCRATCH CLEAN (SOMETHING) HOARFROST WORSHIP COUGH SWEEP RUB SCRAPE CARCASS DIE (FROM ACCIDENT) DIE BATHE SWIM DEAD FLOAT LOVE STAB SAIL 
PEEL SPREAD OUT CRY COMMON COLD (DISEASE) FROST CORPSE SHRIEK JUMP SHOUT DIG WINTER NAME STREAM (FLOW CONTINUOUSLY) PLOUGH CULTIVATE PLAY VISIBLE SEEM STRETCH SOW SEEDS RETREAT INVITE MUSIC RUN COLD HOLLOW OUT CHARCOAL TONGUE STOVE CONVERSATION SKIN DIVORCE OVEN EARWAX COOKHOUSE TIP (OF TONGUE) AIR HUNT BORE CALL BY NAME BREATH STEP (VERB) SONG ATTACK WASH PROUD SIN DEFENDANT CRIME CHIME (ACTION) EGG TESTICLES BARLEY FRUIT VEGETABLES GRAIN MAIZE RICE WHEAT RUDDER RYE PADDLE SWAY SWING (MOVEMENT) SWING (SOMETHING) SHAKE ROW FREEZE JOG (SOMETHING) OAT SHIVER RINSE RING (MAKE SOUND) MAKE NOISE SOUND (OF INSTRUMENT OR VOICE) TINKLE HOE SHOVEL SPADE FLOW DANCE FLEE CALL DAMAGE SAME FACE SIMILAR DISAPPEAR ESCAPE PRAY GAME BURY CAPE CHAIR MOVE STEAL GROAN HOWL COLD (CHILL) JAW DROWN SINK (DISAPPEAR IN WATER) SET (HEAVENLY BODIES) DIVE WOUND POUND TALK BREATHE PROMISE SPEAK WIND VOICE FUR PUBIC HAIR SOUND OR NOISE STRIKE OR BEAT BARK SCALE KILL HAMMER TONE (MUSIC) WOOL EXTINGUISH MURDER HIT SPEECH CHAT (WITH SOMEBODY) WORD STORM THRESH LEATHER LIKE NEED (NOUN) FELT SKIN (OF FRUIT) PAPER OATH WANT SWEAR KICK SNAIL DEATH PULL OFF (SKIN) SHELL FIREPLACE PEN HAIR (BODY) LANGUAGE CONVEY (A MESSAGE) TELL LEAF (LEAFLIKE OBJECT) FEATHER POUR FLAME GO SING BEESWAX HELL GATHER CARRY SEIZE CATCH TRAP (CATCH) WING FIRE CARRY ON SHOULDER CAST MOW BOSS FIND FIN ADMIT TEACH LEAF SAILCLOTH HAIR ANSWER SAY FOOT CIRCLE GRAIN Largest connected component in CLICS² Clusters inferred with the Infomap Community Detection algorithm 59 / 60
  65. Outlook Towards Big Data Towards Big Data CLICS: Database of

    Cross-Linguistic Colexifications http://clics.clld.org List et al. (2018) [Figure: colexification network of concepts. Largest connected component in CLICS². Clusters inferred with the Infomap community detection algorithm, List et al. (u. rev.). Example cluster shown: TONGUE, TELL, ANNOUNCE, TALK, ADMIT, CHAT (WITH SOMEBODY), SAY, WORD, ANSWER, LANGUAGE, VOICE, SOUND OR NOISE, NOISE, PREACH, SPEECH, TONE (MUSIC), EXPLAIN, CONVERSATION, CONVEY (A MESSAGE), SPEAK.] 59 / 60
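
    The slide's caption mentions a colexification network and its largest connected component. As a rough illustration of what that means, the sketch below builds such a network from a tiny word list and extracts the largest component. It is not the CLICS² pipeline: the language names, forms, and function names are invented for this example, and the Infomap clustering step is left out.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy word list (language, concept, form); all forms are invented.
WORDLIST = [
    ("L1", "TREE", "aba"),
    ("L1", "WOOD", "aba"),
    ("L2", "TREE", "kiri"),
    ("L2", "WOOD", "kiri"),
    ("L2", "FOREST", "kiri"),
    ("L3", "MOON", "tulo"),
    ("L3", "MONTH", "tulo"),
    ("L3", "TREE", "sema"),
]


def colexification_graph(wordlist):
    """Map each concept pair to the number of languages that colexify it."""
    concepts_by_form = defaultdict(set)
    for language, concept, form in wordlist:
        concepts_by_form[(language, form)].add(concept)
    languages_by_pair = defaultdict(set)
    for (language, _form), concepts in concepts_by_form.items():
        # Two concepts are colexified when one form in one language covers both.
        for a, b in combinations(sorted(concepts), 2):
            languages_by_pair[frozenset((a, b))].add(language)
    return {pair: len(langs) for pair, langs in languages_by_pair.items()}


def largest_component(edges):
    """Return the node set of the largest connected component."""
    adjacency = defaultdict(set)
    for pair in edges:
        a, b = tuple(pair)
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, best = set(), set()
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first traversal collecting everything reachable from start.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


edges = colexification_graph(WORDLIST)
component = largest_component(edges)
```

    Edge weights here count languages rather than individual word forms, a simplification chosen for this sketch. Communities within the component would then be inferred by a separate clustering step.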
  66. Outlook New Hypotheses From Problems to Solutions Formulating open problems

    for our field is a first step towards their solution. In particular, searching for problems that may have been overlooked so far is a first step towards a deeper understanding of our research and our research object. 60 / 60
  67. Outlook New Hypotheses From Problems to Solutions Thanks for your

    attention!
    • Principal Investigator: Dr. Johann-Mattis List
    • Project Full Title: Computer-Assisted Language Comparison. Reconciling Computational and Classical Approaches in Historical Linguistics
    • Project Short Name: CALC
    • Project duration: 04/2017 to 03/2022
    • Host institution: Department of Linguistic and Cultural Evolution (MPI-SHH)
    [CALC logo: Computational Historical Linguistics, Comparative Method]
    Thanks to CALC associates, advisors, and critics: Cormac Anderson, Wolfgang Behr, Timotheus A. Bodt, Thiago Chacon, Michael Cysouw, Robert Forkel, Hans Geisler, Guido Grimm, Simon Greenhill, Russell Gray, Guillaume Jacques, Gerhard Jäger, Gereon Kaiping, Yunfan Lai, Nathan W. Hill, David Morrison, Justin Power, Taraka Rama, Christoph Rzymski, Laurent Sagart, Nathanael Schweikhard, George Starostin, Tiago Tresoldi, Mary Walworth, Søren Wichmann, Mei-Shin Wu 60 / 60