Handling word formation in historical language comparison

Schweikhard
February 08, 2019

Talk, held at the JenLing Linguistics Workshop at the Internationales Centrum (Friedrich-Schiller-Universität, Jena, 2019/02/08).

Transcript

  1. Handling word formation in historical language comparison
    N. E. Schweikhard
    Max Planck Institute for the Science of Human History
    Department of Linguistic and Cultural Evolution
    CALC Project
    Feb 8, 2019

  2. Table of Contents
    1 Importance of Word Formation
    2 Word Formation in Computational Linguistics
    3 A Computer-Assisted Framework of Cognacy

  5. Compositionality
    basic feature of human language
    language consists of re-combinable elements:
    phonemes and morphemes
    → limited amount of elements, unlimited amount of expressions

  8. Types of Word Formation
    syntagmatic:
    shell-fish
    fish-er
    paradigmatic:
    fish ↔ to fish
    to fall ↔ to fell

  10. Word Families
    Word formations lead to families of related words:
    fish
    fish-er
    to fish
    fish-er-man
    shell-fish
    fish-ing
    There is often ambiguity about the direction of derivation.

  15. Synchrony vs. Diachrony
    Relations between words can differ between language stages:
    Germanic: *fall-an ↔ *fall-jan
    Modern English: to fall ↔ to fell
    Indo-Eur.: *dʰréng-e- ’to drink’ ↔ *dʰrong-éie- ’to make drink’
    Old English: drinc-an ’to drink’ ← → drenċ-an ’to make drink’
    Modern English: to drink ← | → to drench
    Historical linguistics is interested in diachronic relationships.
    Word formation can best be described in synchronic results.
    → How do we combine these perspectives?

  18. Cognacy
    Categories of cognacy
    full cognates: Germanic *fiskas - German Fisch - English fish
    partial cognates: Latin piscator - English fisher
    borrowings: Latin piscatorius → English piscatory
    Only by knowing the synchronic relationships can we
    determine the diachronic ones.
    And we might be interested in historical synchronic stages of
    languages for their own sake.

  20. Word Formation in Computational Linguistics
    Computers
    can help linguists:
    handling large amounts of data
    finding patterns
    increasing transparency and retraceability
    lack human intuition
    → need to be provided with exhaustive information

  24. Problems of Computational Linguistics
    Automatic Cognate Detection:
    standard method in historical comparative linguistics
    used to analyze large amounts of language data
    LexStat: based on detecting regular sound correspondences:
    works in principle like comparative method
    partial cognacy and context-dependent sound shifts can
    seriously hamper results
    → Solution:
    Provide the computer with a framework of possible relations between words
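
The effect of partial cognacy on whole-word comparison can be illustrated with a minimal sketch. It does not implement LexStat: a plain normalized edit distance stands in for the similarity score, and the helper names `edit_distance` and `normalized` are hypothetical. The segmented forms are the ones used in the example tables of this talk.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of sound segments."""
    previous = list(range(len(b) + 1))
    for i, seg_a in enumerate(a, start=1):
        current = [i]
        for j, seg_b in enumerate(b, start=1):
            cost = 0 if seg_a == seg_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized(a, b):
    """Edit distance divided by the length of the longer sequence."""
    return edit_distance(a, b) / max(len(a), len(b))


# Segmented forms; "+" marks a morpheme boundary.
fish = "f i ʃ".split()
fishing = "f i ʃ + i ŋ".split()

# Whole-word comparison (boundary marker removed): the suffix counts as difference.
whole = normalized(fish, [s for s in fishing if s != "+"])

# Morpheme-level comparison: only the first morpheme of "fishing" is compared.
first_morpheme = fishing[:fishing.index("+")]
partial = normalized(fish, first_morpheme)

print(whole, partial)
```

Under these assumptions the whole-word score penalizes the suffix even though the root is identical, which is the kind of distortion a morpheme-aware annotation is meant to avoid.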

  28. A Computer-Assisted Framework of Cognacy
    Must-haves:
    machine-readable:
    utilizing the benefits of computers
    human-readable:
    comfortable and easy to use
    standardized:
    facilitating collaboration and re-use of data and scripts
    exhaustive:
    both synchronic and diachronic relations
    word formation
    sound changes
    analogy
    borrowing

  32. A Computer-Assisted Framework of Cognacy: Basics
    simple TSV table format based on the CLDF standard
    one row for each word form
    one column for each type of annotation
    cognate morphemes linked via cross-IDs
    additional file for specifying word relations
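
A minimal sketch of how a file in this layout might be read, using only the Python standard library. The column names (`TOKENS`, `CROSSIDS`) follow the example tables in this talk; the function name `read_wordlist`, the added `MORPHEMES` key and the inline sample are assumptions made for illustration, not part of the project's own tooling.

```python
import csv
import io

# Hypothetical two-row sample in the described layout (tab-separated).
SAMPLE = (
    "ID\tDOCULECT\tCONCEPT\tFORM\tTOKENS\tCROSSIDS\n"
    "2\tEnglish\tfish\tfish\tf i ʃ\t1\n"
    "4\tEnglish\tfishing\tfishing\tf i ʃ + i ŋ\t1 2\n"
)


def read_wordlist(handle):
    """Read one row per word form and pair each morpheme with its cross-ID."""
    words = []
    for row in csv.DictReader(handle, delimiter="\t"):
        # "+" separates morphemes; spaces separate sound segments.
        morphemes = [m.split() for m in row["TOKENS"].split(" + ")]
        cross_ids = [int(x) for x in row["CROSSIDS"].split()]
        if len(morphemes) != len(cross_ids):
            raise ValueError(f"row {row['ID']}: morpheme/ID count mismatch")
        row["MORPHEMES"] = list(zip(cross_ids, morphemes))
        words.append(row)
    return words


words = read_wordlist(io.StringIO(SAMPLE))
for word in words:
    print(word["FORM"], word["MORPHEMES"])
```

Checking that every morpheme has exactly one cross-ID is a cheap consistency test that catches most annotation slips early.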

  33. Basic Examples
    ID  DOCULECT       CONCEPT  FORM     TOKENS             CROSSIDS
    1   Indo-European  fish     *pisḱos  p i s c + o s      1 0
    2   English        fish     fish     f i ʃ              1
    3   Latin          fish     piscis   p i s k + i s      1 0
    4   English        fishing  fishing  f i ʃ + i ŋ        1 2
    5   Latin          to fish  piscari  p i s k + aː r iː  1 3
    [Figure: graph linking *pisḱos, piscis, fish, fishing and piscari]
    Source  Target  Change
    1       2       sound change
    1       3       sound change
    2       4       word formation
    3       5       word formation
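
The cross-IDs in the table above are enough to separate full from partial cognates mechanically. The following is a sketch under one stated assumption: an ID of 0 is read as "morpheme not assigned to any cognate set" and is ignored. The function name `cognacy` is hypothetical.

```python
# Cross-IDs per word form, copied from the example table.
CROSSIDS = {
    "*pisḱos": [1, 0],
    "fish": [1],
    "piscis": [1, 0],
    "fishing": [1, 2],
    "piscari": [1, 3],
}


def cognacy(form_a, form_b, crossids=CROSSIDS):
    """Classify two forms as 'full', 'partial' or 'none'.

    Assumption: ID 0 marks a morpheme without a cognate set and is ignored.
    """
    ids_a = {i for i in crossids[form_a] if i != 0}
    ids_b = {i for i in crossids[form_b] if i != 0}
    if not ids_a & ids_b:
        return "none"
    # Identical ID sets: every linked morpheme has a counterpart.
    return "full" if ids_a == ids_b else "partial"


print(cognacy("fish", "piscis"))      # share the root, no other linked morphemes
print(cognacy("fishing", "piscari"))  # share the root, differ in the suffix
```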

  34. Paradigmatic Processes: Root-IDs
    Linking cognate morphemes that differ by internal word formation
    ID  DOCULECT       CONCEPT        FORM         TOKENS              CROSSIDS  ROOTIDS
    1   Indo-European  to drink       *dʰrénge-    dʰ r é n g + e      1 0       1 0
    2   Indo-European  to make drink  *dʰrongéie-  dʰ r o n g + é i e  2 0       1 0
    3   English        to drink       drink        d r ɪ ŋ k           1         1
    4   English        to drench      drench       d r ɛ n tʃ          2         1
    [Figure: graph linking *dʰrénge-, *dʰrongéie-, drink and drench]
    Source  Target  Change
    1       2       causative
    1       3       sound change
    2       4       sound change
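
A short sketch of how the root-IDs and the relations table above could be queried together. The data are copied from the example; the function names `root_families` and `history` are hypothetical, ID 0 is again assumed to mean "unassigned", and the sketch assumes each word has at most one source and that the relations contain no cycles.

```python
from collections import defaultdict

FORMS = {1: "*dʰrénge-", 2: "*dʰrongéie-", 3: "drink", 4: "drench"}
ROOTIDS = {1: [1, 0], 2: [1, 0], 3: [1], 4: [1]}
RELATIONS = [(1, 2, "causative"), (1, 3, "sound change"), (2, 4, "sound change")]


def root_families(rootids):
    """Group word IDs by shared root-ID (0 is treated as unassigned)."""
    families = defaultdict(list)
    for word_id, ids in rootids.items():
        for root in ids:
            if root != 0:
                families[root].append(word_id)
    return dict(families)


def history(word_id, relations=RELATIONS):
    """Follow the relations backwards from a word to its earliest source."""
    parent = {target: (source, change) for source, target, change in relations}
    steps = []
    while word_id in parent:
        source, change = parent[word_id]
        steps.append((FORMS[source], change, FORMS[word_id]))
        word_id = source
    return list(reversed(steps))


print(root_families(ROOTIDS))
for source, change, target in history(4):
    print(f"{source} -> {target} ({change})")
```

Keeping the relations in a separate table means the same word list can carry competing analyses without touching the forms themselves.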

  36. Visualization: Language Tree Reconciliation
    [Figure: two versions of a tree annotated with word forms]
    First tree:
    Indo-European: dʰrég- 'moist', dʰrénge- 'to drink', dʰrongéie- 'to make drink'
    English: to drink, to drench
    German: trinken, tränken
    Lithuanian: drė́gti 'to become moist'
    Second tree:
    Indo-European: dʰrég- 'moist', dʰrénge- 'to drink'
    Germanic: drinkan 'to drink', drankjan 'to make drink'
    English: to drink, to drench
    German: trinken, tränken
    Lithuanian: drė́gti 'to become moist'

  38. Visualization: Language Tree Reconciliation II
    [Figure: word forms per language, shown next to the language tree]
    Indo-European: dʰrég- 'moist', dʰrénge- 'to drink', dʰrongéie- 'to make drink'
    Germanic: drinkan 'to drink', drankjan 'to make drink'
    English: to drink, to drench
    German: trinken, tränken
    Lithuanian: drė́gti 'to become moist'
    Language tree nodes: Indo-European, Germanic, English, German, Lithuanian
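
One simple way to relate attested forms to a language tree is to ask for the most recent stage shared by all languages that attest a given formation. The sketch below is an illustration only and not necessarily the reconciliation procedure shown in the figures; the tree encoding `PARENT` and the function names are assumptions.

```python
# Language tree as child -> parent; Indo-European is the root.
PARENT = {
    "English": "Germanic",
    "German": "Germanic",
    "Germanic": "Indo-European",
    "Lithuanian": "Indo-European",
}


def lineage(language):
    """Return the language followed by all of its ancestors, up to the root."""
    path = [language]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def latest_common_stage(languages):
    """Most recent node of the tree shared by the lineages of all languages."""
    shared = set(lineage(languages[0]))
    for language in languages[1:]:
        shared &= set(lineage(language))
    # Lineages run from young to old, so the first shared node is the latest.
    return next(node for node in lineage(languages[0]) if node in shared)


# The causative is attested in English (drench) and German (tränken).
print(latest_common_stage(["English", "German"]))
# The root is also attested in Lithuanian (drė́gti).
print(latest_common_stage(["English", "German", "Lithuanian"]))
```

Whether a formation is then placed at that stage or at an earlier one remains a linguistic decision; the sketch only narrows down the candidates.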

  40. Visualization: Language Tree Reconciliation III
    [Figure: two versions of a tree annotated with word forms]
    First tree:
    Indo-European: dʰrég- 'moist', dʰrénge- 'to drink', dʰrongéie- 'to make drink'
    Germanic: drinkan 'to drink', drankjan 'to make drink'
    English: to drink, to drench
    German: trinken, tränken
    Lithuanian: drė́gti 'to become moist'
    Second tree:
    Indo-European: dʰrég- 'moist', dʰrénge- 'to drink', dʰrongéie- 'to make drink'
    "Balto-Germanic": dreg- 'to become moist', drenge- 'to drink', drangje- 'to make drink'
    English: to drink, to drench
    German: trinken, tränken
    Lithuanian: drė́gti 'to become moist'

  41. Summary
    Our Goals:
    Represent as many kinds of relations between words as
    possible
    Transparency of data vs. interpretation
    Python library of standard procedures in annotation
    Fully annotated example wordlists to be used for research
    Automatic visualization tools for data exploration and analysis

  42. Thank you for your attention!
    CALC members:
    Dr. Johann-Mattis List (Group leader)
    Dr. Yunfan Lai (Post-Doc)
    Dr. Tiago Tresoldi (Post-Doc)
    Mei-Shin Wu (Doctoral student)
    Nathanael E. Schweikhard (Doctoral student)
    http://calc.digling.org/