problems in their research. But they tend to overemphasize the complexity of their problems. As a result, they refuse to handle even those things that could easily be handled. Instead of “Yes, we can!”, linguists tend to say “Can we really?”
Frucht f. ‘product of a plant that serves the reproduction of its own species’, also ‘unborn living being’, figuratively ‘yield’; OHG fruht (9th c.), MHG vruht, Old Saxon fruht, Middle Low German, Middle Dutch, Dutch vrucht are based on an early borrowing of Latin frūctus, of the same meaning, derived from the Latin verb fruī (frūctus sum) ‘to enjoy, to draw benefit from’ (related to brauchen, q.v.). The diminutive Früchtchen has the special meaning [...]
German "Frucht" in Pfeifer (1993, also at http://dwds.de)
Figure: etymological network around German "Frucht". Edge types: inherited from, borrowed from, derived from.
Nodes:
- PIE *bhreu̯Hg̑- “to use”
- PIE *bhruHg̑-ié- “to use” (present tense)
- PGM *ƀrūkan- “to use”
- OHG brūhhan “to use”
- G brauchen “to use”
- G Brauch “custom”
- OHG fruht “profit, fruit”
- G frugal “modest (food)”
- Fr fruit “profit, fruit”
- Fr frugal “modest (food)”
- Lt fruor, fruī “I enjoy”
- Lt frūctus “profit”
- Lt frux “fruit, grain”
- Lt frugalis “bring profit”
Adapted from an illustration by Hans Geisler (University Düsseldorf)
form” (impossible to search efficiently)
- no standardized phonetic representations
- no standardized glosses for meanings
- no standardized names or abbreviations for languages and dialects
- no standardized representation of sound correspondences
- no standardized assignment of cognate sets and borrowings
- ...
on many points in historical linguistics, be it the number of laryngeals, the position of Baltic and Slavic, or whether a given word was borrowed or not. We know well that no two etymological dictionaries for the same language or language family are completely identical. Unfortunately, we lack a rigorous check of the degree to which experts actually agree or disagree in their judgments. We also lack evaluation methods that would help us show how well a given hypothesis (a reconstruction, a family tree, or an etymology) corresponds with our linguistic data.
- use CSV (comma-separated values) as a basic format for tabular data
- use JSON (key-value data format) for meta-data
- define how standard columns of the data (languages/doculects, concepts, transcriptions, grammatical features, etc.) should be treated
- provide an API that checks the consistency of datasets
- provide sample datasets that illustrate the data format
- provide applications which handle CLDF (for example, in automatic analyses)
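The combination of a CSV table, JSON meta-data, and a consistency check can be illustrated with a minimal sketch. It is not the CLDF API: the meta-data key `required_columns`, the function `check_dataset`, and the deliberately broken third row are assumptions made up for this illustration.

```python
import csv
import io
import json

# Hypothetical meta-data: the key "required_columns" is an assumption of this
# sketch, not part of any published specification.
METADATA_JSON = '{"required_columns": ["ID", "DOCULECT", "CONCEPT", "TRANSCRIPTION"]}'

# Small CSV table; the last row is deliberately broken (repeated ID, no form).
WORDLIST_CSV = """ID,DOCULECT,CONCEPT,TRANSCRIPTION
1,German,Woldemort,valdəmar
2,English,Woldemort,wɔldəmɔrt
2,Russian,Woldemort,
"""


def check_dataset(csv_text, metadata_text):
    """Return a list of human-readable problems found in the table."""
    required = json.loads(metadata_text)["required_columns"]
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [col for col in required if col not in (reader.fieldnames or [])]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    problems = []
    seen_ids = set()
    # Data rows start on line 2 because line 1 holds the header.
    for line_no, row in enumerate(reader, start=2):
        if row["ID"] in seen_ids:
            problems.append(f"line {line_no}: duplicate ID {row['ID']}")
        seen_ids.add(row["ID"])
        for col in required:
            if not (row[col] or "").strip():
                problems.append(f"line {line_no}: empty {col}")
    return problems


problems = check_dataset(WORDLIST_CSV, METADATA_JSON)
for problem in problems:
    print(problem)
```

Keeping the list of expected columns in the meta-data rather than in the code means the same check can be reused for datasets with different column sets.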
in CLDF
- define each row as a word
- indicate the language in which this word is spoken in one column
- indicate the meaning in another column
- provide information on the form in additional columns
ID   DOCULECT  CONCEPT    TRANSCRIPTION  ...
1    German    Woldemort  valdəmar       ...
2    English   Woldemort  wɔldəmɔrt      ...
3    Chinese   Woldemort  fu⁵¹ti⁵¹mɔ³⁵   ...
4    Russian   Woldemort  vladimir       ...
...  ...       ...        ...            ...
10   German    Harry      haralt         ...
11   English   Harry      hæri           ...
12   Russian   Harry      gali           ...
...  ...       ...        ...            ...
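One practical consequence of the one-word-per-row layout is that other views of the data are easy to derive. The following sketch regroups the rows visible in the table by concept. The function name `by_concept` is hypothetical, and the elided rows of the table are simply left out.

```python
from collections import defaultdict

# Rows copied from the visible part of the table: (ID, DOCULECT, CONCEPT, TRANSCRIPTION).
ROWS = [
    (1, "German", "Woldemort", "valdəmar"),
    (2, "English", "Woldemort", "wɔldəmɔrt"),
    (3, "Chinese", "Woldemort", "fu⁵¹ti⁵¹mɔ³⁵"),
    (4, "Russian", "Woldemort", "vladimir"),
    (10, "German", "Harry", "haralt"),
    (11, "English", "Harry", "hæri"),
    (12, "Russian", "Harry", "gali"),
]


def by_concept(rows):
    """Regroup one-word-per-row data as {concept: {doculect: transcription}}."""
    table = defaultdict(dict)
    for _word_id, doculect, concept, transcription in rows:
        table[concept][doculect] = transcription
    return {concept: dict(forms) for concept, forms in table.items()}


table = by_concept(ROWS)
for concept, forms in table.items():
    print(concept, forms)
```

Because each row carries its own language and meaning, adding a word never requires restructuring the table. A concept-by-language matrix, by contrast, needs a new column for every new doculect.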
al. 2016) simplifies the handling and the testing of datasets in CLDF format. With the help of the API and its extensions, scholars can test whether their data conforms to the format. With additional software, most of which is already available, one can also easily draw statistics from the data. Last but not least, tools which handle CLDF data can be used for automatic analysis (LingPy, List and Forkel 2016, http://lingpy.org) or for manual curation (EDICTOR, List 2017, http://edictor.digling.org).
labels in published concept lists (questionnaires) to concept sets
- link concept sets to meta-data
- define relations between concept sets
- never link one concept in a given list to more than one concept set (guarantees consistency)
- provide an API to check the consistency of the data and to query the data
- provide a web interface to browse through the data
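The rule that a concept in a given list is never linked to more than one concept set lends itself to an automatic check. The sketch below shows one way such a check might look. The list names, labels, concept-set names, and the function `find_violations` are all hypothetical and not taken from any published resource.

```python
from collections import defaultdict

# (concept list, label used in that list, concept set); all values hypothetical.
LINKS = [
    ("ListA", "hand", "HAND"),
    ("ListA", "arm", "ARM"),
    ("ListB", "hand/arm", "ARM OR HAND"),
    ("ListB", "foot", "FOOT"),
    ("ListB", "foot", "LEG"),  # breaks the rule: one label, two concept sets
]


def find_violations(links):
    """Return the labels of a list that are linked to more than one concept set."""
    sets_per_label = defaultdict(set)
    for list_name, label, concept_set in links:
        sets_per_label[(list_name, label)].add(concept_set)
    return {
        key: sorted(concept_sets)
        for key, concept_sets in sets_per_label.items()
        if len(concept_sets) > 1
    }


violations = find_violations(LINKS)
print(violations)
```

The check is keyed on the pair of list and label, so the same label may map to different concept sets in different lists without being flagged.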
language change. Although most linguists assume that it proceeds according to certain general patterns, we currently lack the empirical basis to pursue the question in depth. Normally, semantic change proceeds by cumulation and reduction.
Figure: semantic change by cumulation and reduction (dimensions: FORM, MEANING). Proto-Germanic *kuppa- [kupːa] “vessel” (monosemy phase); after CUMULATION, Pre-German *kop [kɔp] “vessel”, “head” (polysemy phase); after REDUCTION, a second monosemy phase.
Figure: network with the central concept "fishscale" with a total of 10 nodes (coloring: family).
Nodes: silver, leather, fishscale, bark, coin, fur, snail, "skin, hide", money, shell.

49 links for "silver" and "money" (first 10 shown):

No.  Language               Family         Form
1    Ignaciano              Arawakan       ne
2    Aymara, Central        Aymaran        ḳulʸḳi
3    Tsafiki                Barbacoan      kaˈla
4    Seselwa Creole French  Creole         larzan
5    Miao, White            Hmong-Mien     nyiaj
6    Breton                 Indo-European  arhant
7    French                 Indo-European  argent
8    Gaelic, Irish          Indo-European  airgead
9    Welsh                  Indo-European  arian
10   Cofán                  Isolate        koriΦĩʔdi
Figure: network with the central concept "leg" with a total of 11 nodes (coloring: geolocation).
Nodes: "sphere, ball", round, footprint, foot, calf of leg, circle, thigh, wheel, leg, hip, buttocks.

6 links for "foot" and "wheel":

No.  Language   Family        Form
1    Cofán      Isolate       c̷ɨʔtʰe
2    Puinave    Isolate       sim
3    Yaminahua  Panoan        taɨ
4    Wayampi    Tupi          pɨ
5    Pumé       Unclassified  taɔ
6    Ninam      Yanomam       mãhuk
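The links between concepts in these networks can be read as counts of languages that express two concepts with the same form, a situation commonly called colexification. That reading is an assumption of the following sketch, which uses three of the forms listed for "silver" and "money" plus English as a contrasting case. The function name `colexifications` is hypothetical, and the code is not the software behind the networks.

```python
from collections import defaultdict
from itertools import combinations

# (language, concept, form). French, Welsh and Breton forms are taken from the
# list of links for "silver" and "money"; English is added as a contrast.
WORDS = [
    ("French", "silver", "argent"), ("French", "money", "argent"),
    ("Welsh", "silver", "arian"), ("Welsh", "money", "arian"),
    ("Breton", "silver", "arhant"), ("Breton", "money", "arhant"),
    ("English", "silver", "silver"), ("English", "money", "money"),
]


def colexifications(words):
    """Map each concept pair to the languages that use one form for both."""
    concepts_by_form = defaultdict(set)
    for language, concept, form in words:
        concepts_by_form[(language, form)].add(concept)

    links = defaultdict(set)
    for (language, _form), concepts in concepts_by_form.items():
        # Every pair of concepts sharing a form in one language is one link.
        for first, second in combinations(sorted(concepts), 2):
            links[(first, second)].add(language)
    return {pair: sorted(languages) for pair, languages in links.items()}


links = colexifications(WORDS)
print(links)
```

The number of languages per concept pair could then serve as the line weight in a network view. A real analysis would also have to decide how to treat links that arise from borrowing or common inheritance, which this sketch ignores.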
the standards recommended by the IPA vary widely in linguistics. Experts on language families often have their own traditions, humans necessarily commit errors when transcribing data, technical confusions arise from the use of lookalike symbols which do not share the same code point, and scholars interpret the IPA differently. Furthermore, the IPA does not offer recommendations for all aspects of transcription: morphological annotation, for example, is not included and varies greatly among scholars.
Forkel, in prep.)
- define standards for phonetic representation
- provide meta-data for standardized sounds (feature matrices, etc.)
- provide an API that allows users to query the data and to check the consistency of transcriptions with regard to CLPA
- provide solutions for scholars to convert their data to CLPA
- develop standards for phonotactic and morphological annotation
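The problem of lookalike symbols with different code points can be made concrete with Python's standard `unicodedata` module. The sketch below is not the CLPA API. The two-entry table `LOOKALIKES` and the functions `check_transcription` and `normalize` are assumptions for illustration, covering only the ASCII letter g versus the IPA script g and the ASCII colon versus the IPA length mark.

```python
import unicodedata

# Hypothetical lookalike table: ASCII character -> intended IPA symbol.
LOOKALIKES = {
    "g": "\u0261",  # LATIN SMALL LETTER G -> LATIN SMALL LETTER SCRIPT G
    ":": "\u02d0",  # COLON -> MODIFIER LETTER TRIANGULAR COLON
}


def describe(char):
    """Return the code point and official Unicode name of a character."""
    return f"U+{ord(char):04X} {unicodedata.name(char, 'UNKNOWN')}"


def check_transcription(text):
    """List (position, symbol found, symbol expected) for each lookalike."""
    return [
        (position, describe(char), describe(LOOKALIKES[char]))
        for position, char in enumerate(text)
        if char in LOOKALIKES
    ]


def normalize(text):
    """Replace every known lookalike by the intended IPA symbol."""
    return "".join(LOOKALIKES.get(char, char) for char in text)


issues = check_transcription("gali")
print(issues)
print(normalize("kup:a"))
```

A fuller check would compare each segment against a complete inventory of standardized sounds rather than a short table of known confusions, and would report unknown symbols instead of silently passing them through.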