Handling word formation in historical language comparison

Schweikhard
February 08, 2019

Talk, held at the JenLing Linguistics Workshop at the Internationales Centrum (Friedrich-Schiller-Universität, Jena, 2019/02/08).

Transcript

  1. Handling word formation in historical language comparison

    N. E. Schweikhard, Max Planck Institute for the Science of Human History, Department of Linguistic and Cultural Evolution, CALC Project. Feb 8, 2019
  2. Table of Contents

    1 Importance of Word Formation
    2 Word Formation in Computational Linguistics
    3 A Computer-Assisted Framework of Cognacy
  5. Compositionality

    Basic feature of human language
    Language consists of re-combinable elements: phonemes and morphemes
    → limited number of elements, unlimited number of expressions
  8. Types of Word Formation

    syntagmatic: shell-fish, fish-er
    paradigmatic: fish ↔ to fish, to fall ↔ to fell
  10. Word Families

    Word formations lead to families of related words: fish, fish-er, to fish, fish-er-man, shell-fish, fish-ing
    There is often ambiguity about the direction of derivation.
  15. Synchrony vs. Diachrony

    Relations between words can differ between language stages:
    Germanic: *fall-an ↔ *fall-jan
    Modern English: to fall ↔ to fell
    Indo-European: *dʰréng-e- 'to drink' ↔ *dʰrong-éie- 'to make drink'
    Old English: drinc-an 'to drink' ←→ drenċ-an 'to make drink'
    Modern English: to drink ←|→ to drench
    Historical linguistics is interested in diachronic relationships.
    Word formation can best be described in synchronic terms.
    → How do we combine these perspectives?
  18. Cognacy

    Categories of cognacy:
    full cognates: Germanic *fiskas - German Fisch - English fish
    partial cognates: Latin piscator - English fisher
    borrowings: Latin piscatorius → English piscatory
    Only by knowing the synchronic relationships can we determine the diachronic ones.
    And we might be interested in historical synchronic stages of languages for their own sake.
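
The difference between full and partial cognates can be checked mechanically once every morpheme carries a cognate-set ID, as in the cross-ID scheme that appears later in the talk. The Python sketch below is an illustration only. The function name `classify_cognacy`, the ID values, and the convention that 0 marks a morpheme without an assigned cognate set are assumptions made for this example, not part of the presented framework. Borrowings cannot be recognized from IDs alone and are left out.

```python
def classify_cognacy(ids_a, ids_b):
    """Compare two words given as lists of per-morpheme cognate-set IDs."""
    # Assumption: ID 0 means "no cognate set assigned" and is ignored.
    set_a = {i for i in ids_a if i != 0}
    set_b = {i for i in ids_b if i != 0}
    if not (set_a & set_b):
        return "not cognate"
    if set_a == set_b:
        return "full cognates"
    return "partial cognates"


# Hypothetical morpheme IDs: 1 = 'fish' root, 2 = English -er, 3 = Latin -ator
words = {
    "English fish": [1],
    "German Fisch": [1],
    "English fisher": [1, 2],
    "Latin piscator": [1, 3],
}

print(classify_cognacy(words["English fish"], words["German Fisch"]))
print(classify_cognacy(words["Latin piscator"], words["English fisher"]))
```

Comparing sets of IDs rather than whole words is what lets piscator and fisher count as related through their shared root even though their suffixes differ.
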
  20. Word Formation in Computational Linguistics

    Computers can help linguists:
    handling large amounts of data
    finding patterns
    increasing transparency and retraceability
    But computers lack human intuition → they need to be provided with exhaustive information
  24. Problems of Computational Linguistics

    Automatic Cognate Detection: standard method in historical comparative linguistics, used to analyze large amounts of language data
    LexStat: based on detecting regular sound correspondences; works in principle like the comparative method
    Partial cognacy and context-dependent sound shifts can seriously hamper results
    → Solution: provide the computer with a framework of possible relations between words
  28. A Computer-Assisted Framework of Cognacy

    Must-haves:
    machine-readable: utilizing the benefits of computers
    human-readable: comfortable and easy to use
    standardized: facilitating collaboration and re-use of data and scripts
    exhaustive: both synchronic and diachronic relations (word formation, sound changes, analogy, borrowing)
  32. A Computer-Assisted Framework of Cognacy: Basics

    simple TSV table format based on the CLDF standard
    one row for each word form
    one column for each type of annotation
    cognate morphemes linked via cross-IDs
    additional file for specifying word relations
  33. Basic Examples

    ID  DOCULECT       CONCEPT  FORM     TOKENS             CROSSIDS
    1   Indo-European  fish     *pisḱos  p i s c + o s      1 0
    2   English        fish     fish     f i ʃ              1
    3   Latin          fish     piscis   p i s k + i s      1 0
    4   English        fishing  fishing  f i ʃ + i ŋ        1 2
    5   Latin          to fish  piscari  p i s k + aː r iː  1 3

    [Graph of word relations with the nodes *pisḱos, piscis, fish, fishing, piscari]

    Source  Target  Change
    1       2       sound change
    1       3       sound change
    2       4       word formation
    3       5       word formation
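
The two tables on this slide can be read with a few lines of standard-library Python. What follows is a minimal sketch, not the project's own tooling: the helper names `read_tsv`, `morphemes`, and `describe` are hypothetical, and the data is simply the slide's content typed in as tab-separated text.

```python
import csv
import io

# Wordlist and relation tables from the slide, as tab-separated text.
WORDLIST_TSV = (
    "ID\tDOCULECT\tCONCEPT\tFORM\tTOKENS\tCROSSIDS\n"
    "1\tIndo-European\tfish\t*pisḱos\tp i s c + o s\t1 0\n"
    "2\tEnglish\tfish\tfish\tf i ʃ\t1\n"
    "3\tLatin\tfish\tpiscis\tp i s k + i s\t1 0\n"
    "4\tEnglish\tfishing\tfishing\tf i ʃ + i ŋ\t1 2\n"
    "5\tLatin\tto fish\tpiscari\tp i s k + aː r iː\t1 3\n"
)
RELATIONS_TSV = (
    "Source\tTarget\tChange\n"
    "1\t2\tsound change\n"
    "1\t3\tsound change\n"
    "2\t4\tword formation\n"
    "3\t5\tword formation\n"
)


def read_tsv(text):
    """Parse tab-separated text into a list of row dictionaries."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def morphemes(row):
    """Pair each morpheme (TOKENS split on '+') with its cross-ID."""
    segments = [m.strip() for m in row["TOKENS"].split("+")]
    ids = [int(i) for i in row["CROSSIDS"].split()]
    if len(segments) != len(ids):
        raise ValueError(f"row {row['ID']}: {len(segments)} morphemes, {len(ids)} IDs")
    return list(zip(segments, ids))


words = {row["ID"]: row for row in read_tsv(WORDLIST_TSV)}
relations = read_tsv(RELATIONS_TSV)


def describe(relation):
    """Render one relation row using the forms it links."""
    src = words[relation["Source"]]
    tgt = words[relation["Target"]]
    return f'{src["FORM"]} -> {tgt["FORM"]} ({relation["Change"]})'


for rel in relations:
    print(describe(rel))
```

Keeping the relations in a separate table, as the slide does, means a word form is stated once and can take part in any number of relations.
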
  34. Paradigmatic Processes: Root-IDs

    Linking cognate morphemes that differ by internal word formation

    ID  DOCULECT       CONCEPT        FORM         TOKENS              CROSSIDS  ROOTIDS
    1   Indo-European  to drink       *dʰrénge-    dʰ r é n g + e      1 0       1 0
    2   Indo-European  to make drink  *dʰrongéie-  dʰ r o n g + é i e  2 0       1 0
    3   English        to drink       drink        d r ɪ ŋ k           1         1
    4   English        to drench      drench       d r ɛ n tʃ          2         1

    [Graph of word relations with the nodes *dʰrénge-, *dʰrongéie-, drink, drench]

    Source  Target  Change
    1       2       causative
    1       3       sound change
    2       4       sound change
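
Grouping the slide's rows by each ID column shows what the extra ROOTIDS column adds. This is a hedged sketch: the helper `group_by` is hypothetical, and treating 0 as "no ID assigned" is an assumption read off the example data rather than a documented rule.

```python
from collections import defaultdict

# (FORM, CROSSIDS, ROOTIDS) taken from the slide's table.
rows = [
    ("*dʰrénge-", [1, 0], [1, 0]),
    ("*dʰrongéie-", [2, 0], [1, 0]),
    ("drink", [1], [1]),
    ("drench", [2], [1]),
]


def group_by(rows, column):
    """Map each non-zero ID in a column (1 = CROSSIDS, 2 = ROOTIDS) to its forms."""
    groups = defaultdict(list)
    for row in rows:
        for morpheme_id in row[column]:
            if morpheme_id != 0:  # assumption: 0 means no ID assigned
                groups[morpheme_id].append(row[0])
    return dict(groups)


by_cross = group_by(rows, 1)
by_root = group_by(rows, 2)

print(by_cross)
print(by_root)
```

Cross-IDs keep drink and drench in separate cognate sets, each with its own Indo-European source, while the shared root-ID places all four forms in one word family.
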
  36. Visualization: Language Tree Reconciliation

    [First tree diagram. Indo-European: dʰrég- 'moist', dʰrénge- 'to drink', dʰrongéie- 'to make drink'; English: to drink, to drench; German: trinken, tränken; Lithuanian: drė́gti 'to become moist']
    [Second tree diagram. Indo-European: dʰrég- 'moist', dʰrénge- 'to drink'; Germanic: drinkan 'to drink', drankjan 'to make drink'; English: to drink, to drench; German: trinken, tränken; Lithuanian: drė́gti 'to become moist']
  38. Visualization: Language Tree Reconciliation II

    [Word tree diagram. Indo-European: dʰrég- 'moist', dʰrénge- 'to drink', dʰrongéie- 'to make drink'; Germanic: drinkan 'to drink', drankjan 'to make drink'; English: to drink, to drench; German: trinken, tränken; Lithuanian: drė́gti 'to become moist']
    [Language tree diagram with the nodes English, German, Indo-European, Lithuanian, Germanic]
  40. Visualization: Language Tree Reconciliation III

    [First tree diagram. Indo-European: dʰrég- 'moist', dʰrénge- 'to drink', dʰrongéie- 'to make drink'; Germanic: drinkan 'to drink', drankjan 'to make drink'; English: to drink, to drench; German: trinken, tränken; Lithuanian: drė́gti 'to become moist']
    [Second tree diagram. Indo-European: dʰrég- 'moist', dʰrénge- 'to drink', dʰrongéie- 'to make drink'; "Balto-Germanic": dreg- 'to become moist', drenge- 'to drink', drangje- 'to make drink'; English: to drink, to drench; German: trinken, tränken; Lithuanian: drė́gti 'to become moist']
  41. Summary

    Our Goals:
    Represent as many kinds of relations between words as possible
    Transparency of data vs. interpretation
    Python library of standard procedures in annotation
    Fully annotated example wordlists to be used for research
    Automatic visualization tools for data exploration and analysis
  42. Thank you for your attention!

    CALC members:
    Dr. Johann-Mattis List (Group leader)
    Dr. Yunfan Lai (Post-Doc)
    Dr. Tiago Tresoldi (Post-Doc)
    Mei-Shin Wu (Doctoral student)
    Nathanael E. Schweikhard (Doctoral student)
    http://calc.digling.org/