Handling word formation in historical language comparison
N. E. Schweikhard
Max Planck Institute for the Science of Human History
Department of Linguistic and Cultural Evolution
CALC Project
Feb 8, 2019

Compositionality
basic feature of human language
language consists of re-combinable elements: phonemes and morphemes
→ limited number of elements, unlimited number of expressions

Word Families
Word formation leads to families of related words:
  fish
  fish-er
  to fish
  fish-er-man
  shell-fish
  fish-ing
The direction of derivation is often ambiguous.

Synchrony vs. Diachrony
Relations between words can differ between language stages:
  Germanic: *fall-an ↔ *fall-jan
  Modern English: to fall ↔ to fell
  Indo-Eur.: *dʰréng-e- 'to drink' ↔ *dʰrong-éie- 'to make drink'
  Old English: drinc-an 'to drink' ← → drenċ-an 'to make drink'
  Modern English: to drink ← | → to drench
Historical linguistics is interested in diachronic relationships.
Word formation can best be described in synchronic terms.
→ How do we combine these perspectives?

Cognacy
Categories of cognacy:
  full cognates: Germanic *fiskas - German Fisch - English fish
  partial cognates: Latin piscator - English fisher
  borrowings: Latin piscatorius → English piscatory
Only by knowing the synchronic relationships can we determine the diachronic ones.
And we might be interested in historical synchronic stages of languages for their own sake.

Word Formation in Computational Linguistics
Computers can help linguists:
  handling large amounts of data
  finding patterns
  increasing transparency and retraceability
But computers lack human intuition
→ they need to be provided with exhaustive information

Problems of Computational Linguistics
Automatic Cognate Detection:
  standard method in historical comparative linguistics
  used to analyze large amounts of language data
LexStat:
  based on detecting regular sound correspondences: works in principle like the comparative method
  partial cognacy and context-dependent sound shifts can seriously hamper results
→ Solution: provide the computer with a framework of possible relations between words

A Computer-Assisted Framework of Cognacy
Must-haves:
  machine-readable: utilizing the benefits of computers
  human-readable: comfortable and easy to use
  standardized: facilitating collaboration and re-use of data and scripts
  exhaustive: both synchronic and diachronic relations
    word formation
    sound changes
    analogy
    borrowing

A Computer-Assisted Framework of Cognacy: Basics
simple TSV table format based on the CLDF standard
  one row for each word form
  one column for each type of annotation
cognate morphemes linked via cross-IDs
additional file for specifying word relations

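A minimal sketch of how such a table could be read, using only the Python standard library. The rows are taken from the fish example in these slides. The function names and the simplified full/partial test are illustrative assumptions, not part of the project's library, and real partial-cognate detection is more involved.

```python
import csv
import io

# Toy wordlist in the tab-separated layout described above.
# Column names and values follow the slides' fish example.
WORDLIST_TSV = (
    "ID\tDOCULECT\tCONCEPT\tFORM\tTOKENS\tCROSSIDS\n"
    "2\tEnglish\tfish\tfish\tf i ʃ\t1\n"
    "4\tEnglish\tfishing\tfishing\tf i ʃ + i ŋ\t1 2\n"
    "5\tLatin\tto fish\tpiscari\tp i s k + aː r iː\t1 3\n"
)


def morphemes_with_ids(row):
    """Pair each morpheme (segments between '+') with its cross-ID."""
    morphemes = [m.split() for m in row["TOKENS"].split("+")]
    cross_ids = [int(i) for i in row["CROSSIDS"].split()]
    if len(morphemes) != len(cross_ids):
        raise ValueError(f"row {row['ID']}: morpheme/ID count mismatch")
    return list(zip(morphemes, cross_ids))


def cognacy(pairs_a, pairs_b):
    """Simplified test (an assumption): compare the sets of cross-IDs."""
    ids_a = {i for _, i in pairs_a}
    ids_b = {i for _, i in pairs_b}
    if ids_a == ids_b:
        return "full"
    return "partial" if ids_a & ids_b else "none"


rows = list(csv.DictReader(io.StringIO(WORDLIST_TSV), delimiter="\t"))
annotated = {row["FORM"]: morphemes_with_ids(row) for row in rows}

for form, pairs in annotated.items():
    print(form, pairs)
print(cognacy(annotated["fishing"], annotated["piscari"]))
```

Keeping one cross-ID per morpheme is what lets fishing and piscari be linked through their shared first morpheme even though the words as wholes are not cognate.
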
Basic Examples

ID  DOCULECT       CONCEPT  FORM     TOKENS             CROSSIDS
1   Indo-European  fish     *pisḱos  p i s c + o s      1 0
2   English        fish     fish     f i ʃ              1
3   Latin          fish     piscis   p i s k + i s      1 0
4   English        fishing  fishing  f i ʃ + i ŋ        1 2
5   Latin          to fish  piscari  p i s k + aː r iː  1 3

Source  Target  Change
1       2       sound change
1       3       sound change
2       4       word formation
3       5       word formation

[Figure: relation graph with the nodes *pisḱos, piscis, fish, fishing, piscari]

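The relation table above can be read as a directed graph over word IDs. The following sketch, which assumes each target has exactly one source as in this example, traces a word back through its recorded changes and finds the shared source of two words. The function names are hypothetical.

```python
# Word forms by ID and the relation table from the slide
# (Source, Target, Change).
FORMS = {1: "*pisḱos", 2: "fish", 3: "piscis", 4: "fishing", 5: "piscari"}
RELATIONS = [
    (1, 2, "sound change"),
    (1, 3, "sound change"),
    (2, 4, "word formation"),
    (3, 5, "word formation"),
]

# Assumption: every target has a single source, so a dict is enough.
parent = {target: (source, change) for source, target, change in RELATIONS}


def history(word_id):
    """List (source form, change, target form) steps, oldest first."""
    steps = []
    while word_id in parent:
        source, change = parent[word_id]
        steps.append((FORMS[source], change, FORMS[word_id]))
        word_id = source
    return list(reversed(steps))


def ancestors(word_id):
    """Return the word ID followed by all of its sources, nearest first."""
    chain = [word_id]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]][0])
    return chain


def common_source(a, b):
    """Return the nearest ID from which both words descend, or None."""
    seen = set(ancestors(a))
    for candidate in ancestors(b):
        if candidate in seen:
            return candidate
    return None


for step in history(4):
    print(" -> ".join(step))
print(FORMS[common_source(4, 5)])
```

Storing relations separately from the wordlist keeps the data (forms and IDs) apart from the interpretation (which change links which forms).
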
Paradigmatic Processes: Root-IDs
Linking cognate morphemes that differ by internal word formation

ID  DOCULECT       CONCEPT        FORM         TOKENS              CROSSIDS  ROOTIDS
1   Indo-European  to drink       *dʰrénge-    dʰ r é n g + e      1 0       1 0
2   Indo-European  to make drink  *dʰrongéie-  dʰ r o n g + é i e  2 0       1 0
3   English        to drink       drink        d r ɪ ŋ k           1         1
4   English        to drench      drench       d r ɛ n tʃ          2         1

Source  Target  Change
1       2       causative
1       3       sound change
2       4       sound change

[Figure: relation graph with the nodes *dʰrénge-, *dʰrongéie-, drink, drench]

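A short sketch of how the two ID columns could be used together: cross-IDs link morphemes that are cognate as they stand, while root-IDs also link forms separated by internal word formation. Treating 0 as "no ID assigned" is an assumption read off the table, and the function names and relation labels are illustrative only.

```python
# Rows from the slide: (ID, DOCULECT, FORM, CROSSIDS, ROOTIDS).
ROWS = {
    "*dʰrénge-": (1, "Indo-European", "*dʰrénge-", [1, 0], [1, 0]),
    "*dʰrongéie-": (2, "Indo-European", "*dʰrongéie-", [2, 0], [1, 0]),
    "drink": (3, "English", "drink", [1], [1]),
    "drench": (4, "English", "drench", [2], [1]),
}


def share_id(ids_a, ids_b):
    """True if both lists share a non-zero ID (0 assumed to mean unassigned)."""
    return bool((set(ids_a) & set(ids_b)) - {0})


def relation(form_a, form_b):
    """Classify a word pair by cross-IDs first, then by root-IDs."""
    a, b = ROWS[form_a], ROWS[form_b]
    if share_id(a[3], b[3]):
        return "cognate morpheme"
    if share_id(a[4], b[4]):
        return "same root, different formation"
    return "unrelated"


print(relation("*dʰrénge-", "drink"))
print(relation("drink", "drench"))
```

With cross-IDs alone, drink and drench would look unrelated. The root-ID column records that both go back to the same root.
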
Visualization: Language Tree Reconciliation
[Figure, two panels: the word family mapped onto language stages]
Panel 1:
  Indo-European: dʰrég- 'moist', dʰrénge- 'to drink', dʰrongéie- 'to make drink'
  English: to drink, to drench
  German: trinken, tränken
  Lithuanian: drė́gti 'to become moist'
Panel 2:
  Indo-European: dʰrég- 'moist', dʰrénge- 'to drink'
  Germanic: drinkan 'to drink', drankjan 'to make drink'
  English: to drink, to drench
  German: trinken, tränken
  Lithuanian: drė́gti 'to become moist'

Visualization: Language Tree Reconciliation II
[Figure: the word family shown next to a language tree with the nodes English, German, Indo-European, Lithuanian, Germanic]
  Indo-European: dʰrég- 'moist', dʰrénge- 'to drink', dʰrongéie- 'to make drink'
  Germanic: drinkan 'to drink', drankjan 'to make drink'
  English: to drink, to drench
  German: trinken, tränken
  Lithuanian: drė́gti 'to become moist'

Visualization: Language Tree Reconciliation III
[Figure, two panels: the same word family under two different language trees]
Panel 1 (Germanic as intermediate stage):
  Indo-European: dʰrég- 'moist', dʰrénge- 'to drink', dʰrongéie- 'to make drink'
  Germanic: drinkan 'to drink', drankjan 'to make drink'
  English: to drink, to drench
  German: trinken, tränken
  Lithuanian: drė́gti 'to become moist'
Panel 2 ("Balto-Germanic" as intermediate stage):
  Indo-European: dʰrég- 'moist', dʰrénge- 'to drink', dʰrongéie- 'to make drink'
  "Balto-Germanic": dreg- 'to become moist', drenge- 'to drink', drangje- 'to make drink'
  German: trinken, tränken
  English: to drink, to drench
  Lithuanian: drė́gti 'to become moist'

Summary
Our Goals:
  Represent as many kinds of relations between words as possible
  Transparency of data vs. interpretation
  Python library of standard procedures in annotation
  Fully annotated example wordlists to be used for research
  Automatic visualization tools for data exploration and analysis

Thank you for your attention!
CALC members:
  Dr. Johann-Mattis List (Group leader)
  Dr. Yunfan Lai (Post-Doc)
  Dr. Tiago Tresoldi (Post-Doc)
  Mei-Shin Wu (Doctoral student)
  Nathanael E. Schweikhard (Doctoral student)
http://calc.digling.org/