Computer-Assisted Language Comparison. Ideas, Tools, Applications

Talk held at the EVOLAEMP Project (2016-01-20, Eberhard Karls University, Tübingen).

Johann-Mattis List

January 20, 2016

Transcript

  1. Computer-Assisted Language Comparison
    Ideas, Tools, Applications
    Johann-Mattis List
    DFG research fellow
    Centre des recherches linguistiques sur l’Asie Orientale
    Team Adaptation, Integration, Reticulation, Evolution
    EHESS and UPMC, Paris
    2016-01-20
    1 / 32

  2. Background
    Background
    2 / 32

  3. Background The Comparative Method
    The Comparative Method
    3 / 32

  4. Background The Comparative Method
    The Comparative Method
    In linguistics, the comparative method is a technique for
    studying the development of languages by performing a
    feature-by-feature comparison of two or more languages with
    common descent from a shared ancestor, as opposed to the
    method of internal reconstruction, which analyses the internal
    development of a single language over time.
    Wikipedia s.v. "Comparative Method"
    The method of comparing languages to determine whether and
    how they have developed from a common ancestor. The items
    compared are lexical and grammatical units, and the aim is to
    discover correspondences relating sounds in two or more different
    languages, which are so numerous and so regular, across sets of
    units with similar meanings, that no other explanation is
    reasonable.
    Oxford Dictionary of Linguistics (Matthews 1997)
    The comparative method is both the earliest and the most important
    of the methods of reconstruction. Most of the major
    insights into the prehistory of languages have been gained by
    the applications of this method, and most reconstructions
    have been based on it.
    Fox (1995)
    The Comparative Method is the central tool in
    historical linguistics for historical
    reconstruction and also classifying languages.
    A classification done with the Comparative
    Method is called a genetic classification. The
    result is that languages are arranged in language
    family trees. This means that languages are
    classified according to their genealogical
    relationships and are interpreted as being in
    relation of child- or sisterhood to other
    languages. Such a way of classifying entities is
    called phylogenetic classification in biology; a
    classification by genealogical relationships.
    Fleischhauer (2009)
    The method of comparatistics today is generally known
    under the not very well-chosen term "comparative-historical
    method". It constitutes a huge complex of abstract and
    concrete procedures for the investigation of the history
    of related languages which genetically go back to some
    uniform tradition of the past.
    Klimov (1990), my translation
    → comparative linguistics, reconstruction
    Routledge Dictionary of Language and Linguistics
    (Bussmann 1996)
    3 / 32

  5. Background The Comparative Method
    The Comparative Method
    Scholar | Proof of Relationship | Study of Language History | External Reconstruction | Linguistic Reconstruction | Language Classification
    Anttila (1972) ✓ ✓
    Bußmann (2002) ✓
    Fleischhauer (2009) ✓
    Fox (1995) ✓
    Glück (2000) ✓
    Harrison (2003) ✓
    Hoenigswald (1960) ✓
    Jarceva (1990) ✓
    Klimov (1990) ✓ ✓
    Lehmann (1969) ✓
    Makaev (1977) ✓
    Matthews (1997) ✓
    Rankin (2003) ✓
    3 / 32

  6. Background The Comparative Method
    The Comparative Method
    Working Definition for the Comparative Method
    The comparative method is a collection of techniques that are
    commonly used by historical linguists in order to reconstruct
    the history of languages and language families.
    3 / 32

  7. Background Workflows
    Workflows
    4 / 32

  8. Background Workflows
    Workflows
    Workflow by Ross and Durie (1996)
    1. Determine on the strength of diagnostic evidence that a set of languages are
    genetically related, that is, that they constitute a ‘family’;
    2. Collect putative cognate sets for the family (both morphological paradigms and
    lexical items).
    3. Work out the sound correspondences from the cognate sets, putting ‘irregular’
    cognate sets on one side;
    4. Reconstruct the protolanguage of the family as follows:
    a. Reconstruct the protophonology from the sound correspondences worked out
    in (3), using conventional wisdom regarding the directions of sound changes.
    b. Reconstruct protomorphemes (both morphological paradigms and lexical
    items) from the cognate sets collected in (2), using the protophonology
    reconstructed in (4a).
    5. Establish innovations (phonological, lexical, semantic, morphological,
    morphosyntactic) shared by groups of languages within the family relative
    to the reconstructed protolanguage.
    6. Tabulate the innovations established in (5) to arrive at an internal classification
    of the family, a ‘family tree’.
    7. Construct an etymological dictionary, tracing borrowings, semantic change,
    and so forth, for the lexicon of the family (or of one language of the family).
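The cross-references in this list ("worked out in (3)", "collected in (2)", "established in (5)") make the workflow a chain of dependent steps. A minimal Python sketch encodes them as a dependency graph and derives a valid order with the standard-library `graphlib` module (Python 3.9+). The step names are our own shorthand, and placing step 7 after step 6 is an assumption, since the list does not say what the dictionary depends on:

```python
from graphlib import TopologicalSorter

# Each step maps to the steps whose results it uses, following the
# cross-references in the list. Assumptions: step 2 follows step 1,
# and step 7 (the dictionary) is placed after step 6.
DEPENDENCIES = {
    "2 collect cognate sets": {"1 prove relationship"},
    "3 work out sound correspondences": {"2 collect cognate sets"},
    "4a reconstruct protophonology": {"3 work out sound correspondences"},
    "4b reconstruct protomorphemes": {"2 collect cognate sets",
                                      "4a reconstruct protophonology"},
    "5 establish innovations": {"4a reconstruct protophonology",
                                "4b reconstruct protomorphemes"},
    "6 classify the family": {"5 establish innovations"},
    "7 etymological dictionary": {"6 classify the family"},
}


def workflow_order(dependencies):
    """Return the steps in an order in which every step follows its inputs."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step in workflow_order(DEPENDENCIES):
        print(step)
```

Here the order is unique because every step needs its predecessor. Adding "revise" loops, as in the simplified version of the workflow, would create cycles, which `TopologicalSorter` rejects with `graphlib.CycleError`; iteration would have to be modelled separately.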
    4 / 32

  9. Background Workflows
    Workflows
    [Figure: boxes for PROOF OF LANGUAGE RELATIONSHIP, COGNATE SET
    IDENTIFICATION, SOUND CORRESPONDENCE IDENTIFICATION, PHONOLOGICAL AND
    MORPHOLOGICAL RECONSTRUCTION, IDENTIFICATION OF INNOVATIONS,
    RECONSTRUCTION OF PHYLOGENIES, and PUBLISH ETYMOLOGICAL DICTIONARY]
    Tentative Visualization of the Workflow by Ross and Durie (1996: 6f)
    4 / 32

  10. Background Workflows
    Workflows
    [Figure: proof of relationship → identification of cognates →
    identification of sound correspondences → reconstruction of proto-forms
    → internal classification, with four “revise” arrows]
    Simplified Version of Ross and Durie’s Workflow (List 2014: 58)
    4 / 32

  11. Problems
    Problems
    5 / 32

  12. Problems Application
    Application
    6 / 32

  15. Problems Application
    Application
    [Figure: the workflow by Ross and Durie (1996), annotated
    “TIME CONSUMING...” and “TEDIOUS...”]
    6 / 32

  16. Problems Representation
    Representation
    7 / 32

  17. Problems Representation
    Representation
    Frucht, also fruchten, befruchten, Befruchtung,
    fruchtbar, fruchtig
    Frucht f. ‘product of a plant that serves the reproduction
    of its own kind’, also ‘unborn living being’, figuratively
    ‘yield’; OHG fruht (9th c.), MHG vruht, OS fruht, MLG, MDu.,
    Du. vrucht go back to an early borrowing of Lat. frūctus
    with the same meaning, derived from the verb Lat. fruī
    (frūctus sum) ‘to enjoy, to derive benefit’ (related to
    brauchen, q.v.). The diminutive Früchtchen has the special
    meaning
    [...]
    German "Frucht" in Pfeifer (1993, also at http://dwds.de)
    7 / 32

  19. Problems Representation
    Representation
    [Figure: etymological network with links labelled “inherited from”,
    “borrowed from”, and “derived from”, connecting PIE *bhreu̯Hg̑- “to use”,
    PIE *bhruHg̑-ié- “to use” (present tense), PGM *ƀrūkan- “to use”,
    OHG brūhhan “to use”, G brauchen “to use”, G Brauch “custom”,
    OHG fruht “profit, fruit”, G frugal “modest (food)”, Fr fruit
    “profit, fruit”, Fr frugal “modest (food)”, Lt fruor, fruī “I enjoy”,
    Lt frūctus “profit”, Lt frux “fruit, grain”, and Lt frugalis
    “bring profit”]
    Adapted from an Illustration by Hans Geisler (University of Düsseldorf)
    German "Frucht" in Pfeifer (1993, also at http://dwds.de)
    7 / 32

  20. Problems Representation
    Representation
    Entry for PIE *kʷetware in Tower of Babel (http://starling.rinet.ru)
    7 / 32

  21. Problems Representation
    Representation
    Insufficiencies of Data Representation
    data in “textual form” (impossible to search efficiently)
    no standardized phonetic representations
    no standardized glosses for meanings
    no standardized names or abbreviations for languages
    and dialects
    no standardized representation of sound
    correspondences
    no standardized assignment of cognate sets and
    borrowings
    ...
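To make the first point concrete: the relations stated in the entry for German "Frucht" can be stored as explicit links instead of running text, which makes them searchable. A minimal Python sketch, assuming a hypothetical three-field record (source, relation, target); the links are our paraphrase of that entry, and the G < MHG < OHG inheritance steps are read off the order of the forms it lists:

```python
from typing import NamedTuple


class Link(NamedTuple):
    """One etymological relation: `source` <relation> `target`."""
    source: str
    relation: str  # "inherited from", "borrowed from" or "derived from"
    target: str


# Relations paraphrased from the dictionary entry for German "Frucht".
LINKS = [
    Link("G Frucht", "inherited from", "MHG vruht"),
    Link("MHG vruht", "inherited from", "OHG fruht"),
    Link("OHG fruht", "borrowed from", "Lat frūctus"),
    Link("Lat frūctus", "derived from", "Lat fruī"),
]


def history(form, links):
    """Follow links back from `form`; return the (relation, ancestor) chain.

    Assumes each form has at most one outgoing link (a chain, not a network).
    """
    by_source = {link.source: link for link in links}
    chain = []
    while form in by_source:
        link = by_source[form]
        chain.append((link.relation, link.target))
        form = link.target
    return chain


def borrowings(links):
    """All (borrower, donor) pairs: a query that is hard to run on free text."""
    return [(link.source, link.target) for link in links
            if link.relation == "borrowed from"]


if __name__ == "__main__":
    for relation, ancestor in history("G Frucht", LINKS):
        print(relation, ancestor)
    print(borrowings(LINKS))  # [('OHG fruht', 'Lat frūctus')]
```

A real resource would also need standardized language names and glosses for each form, which is exactly what the list above says is missing.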
    8 / 32

  22. Problems Replication
    Replication
    9 / 32

  23. Problems Replication
    Replication
    Gloss | Blust | Pawley | Distance
    “day” | *qaco | *qaco | 0
    “to spit” | *qanusi | *qanusi | 0
    “person” | *taumataq | *tamwata | 3
    “to vomit” | *mumutaq | *mumuta | 1
    “name” | *ŋajan | *qajan | 1
    “snake” | *mwata | *mwata | 0
    “man” | *mwa ruqane | *taumwaqane | 5
    “four” | *pani | *pat | 2
    “one” | *sakai | *tasa | 3
    ... | ... | ... | ...
    Disagreement between experts on PO reconstructions (Bouchard-Côté et al. 2014) 9 / 32
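The Distance column is consistent with a plain character-level edit (Levenshtein) distance between the two reconstructions; the slide does not say how it was computed, so this is our reading. A minimal Python sketch recomputes it, dropping the space in *mwa ruqane (an assumption):

```python
def levenshtein(a, b):
    """Minimal number of single-character insertions, deletions and
    substitutions needed to turn string `a` into string `b`."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or keep
            ))
        previous = current
    return previous[-1]


def distance(form_a, form_b):
    """Edit distance between two reconstructions, ignoring spaces (assumption)."""
    return levenshtein(form_a.replace(" ", ""), form_b.replace(" ", ""))


# (gloss, Blust, Pawley, distance listed in the table)
ROWS = [
    ("day", "*qaco", "*qaco", 0),
    ("to spit", "*qanusi", "*qanusi", 0),
    ("person", "*taumataq", "*tamwata", 3),
    ("to vomit", "*mumutaq", "*mumuta", 1),
    ("name", "*ŋajan", "*qajan", 1),
    ("snake", "*mwata", "*mwata", 0),
    ("man", "*mwa ruqane", "*taumwaqane", 5),
    ("four", "*pani", "*pat", 2),
    ("one", "*sakai", "*tasa", 3),
]

if __name__ == "__main__":
    for gloss, blust, pawley, listed in ROWS:
        print(gloss, distance(blust, pawley), listed)
```

For all nine rows the recomputed value matches the listed one. Such a distance only measures how far two proposals diverge; it says nothing about which of them fits the data better.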

  24. Problems Replication
    Replication
    Reproducibility Problems in Historical Linguistics
    Scholars disagree on many points in historical linguistics, be
    it the number of laryngeals, the position of Baltic and Slavic,
    or whether a given word was borrowed or not.
    We know well that no two etymological dictionaries for the
    same language or language family are completely identical.
    Unfortunately, we lack a rigorous check of the degree to which
    experts actually agree or disagree in their judgments.
    We also lack evaluation methods that would show to which
    degree a given hypothesis (a reconstruction, a family tree, or
    an etymology) is consistent with our linguistic data.
    9 / 32

  25. A Computer-Assisted Framework for Language Comparison
    Towards a Computer-Assisted
    Framework for Language Comparison
    10 / 32

  26. A Computer-Assisted Framework for Language Comparison
    [Figure: Bayes' formula $P(A|B) = P(B|A)P(A)/P(B)$ on one side;
    FRANZ BOPP, with a VERY, VERY LONG TITLE, on the other]
    11 / 32

  27. A Computer-Assisted Framework for Language Comparison
    Human expert:
    PRO:
    - intuition
    - background knowledge
    - can juggle with multiple types of evidence
    CONTRA:
    - has to sleep and rest
    - does not like to count and do boring work
    - can overlook facts when doing boring work
    Computer:
    CONTRA:
    - no intuition
    - no background knowledge
    - can't juggle with multiple types of evidence
    PRO:
    - doesn't need to sleep
    - is very good at counting and boring work
    - doesn't make errors in boring work
    11 / 32

  29. A Computer-Assisted Framework for Language Comparison
    [Figure: the human expert and the computer with their pros and cons,
    as on the previous slide, now joined under one label:]
    COMPUTER-ASSISTED LANGUAGE COMPARISON
    11 / 32

  30. A Computer-Assisted Framework for Language Comparison Standards
    Standards
    12 / 32

  31. A Computer-Assisted Framework for Language Comparison Standards
    Standards: Concept Labeling
    12 / 32

  32. A Computer-Assisted Framework for Language Comparison Standards
    Standards: Concept Labeling
    Concept List | # Items | Concept Label | Concept ID
    Allen (2007) | 500 | animal oil; 动物油(脂肪) | GREASE (CONCEPTICON-ID: 323)
    Gregersen (1976) | 217 | fat-grease*fat-grease | GREASE (CONCEPTICON-ID: 323)
    Heggarty (2005) | 150 | fat (grease); grasa | GREASE (CONCEPTICON-ID: 323)
    Swadesh (1955) | 100 | fat (grease) | GREASE (CONCEPTICON-ID: 323)
    Alpher and Nash (1999) | 151 | fat, grease | GREASE (CONCEPTICON-ID: 323)
    Hale (1961) | 100 | fat, grease | GREASE (CONCEPTICON-ID: 323)
    O'Grady and Klokeid (1969) | 100 | fat, grease | GREASE (CONCEPTICON-ID: 323)
    Blust (2008) | 210 | fat/grease | GREASE (CONCEPTICON-ID: 323)
    Matisoff (1978) | 200 | fat/grease | GREASE (CONCEPTICON-ID: 323)
    Samarin (1969) | 218 | fat/grease | GREASE (CONCEPTICON-ID: 323)
    Dunn et al. (2012) | 207 | fat | GREASE (CONCEPTICON-ID: 323)
    Swadesh (1950) | 215 | fat | GREASE (CONCEPTICON-ID: 323)
    Zgraggen (1980) | 380 | fat | GREASE (CONCEPTICON-ID: 323)
    Jachontov (1991) | 100 | fat n. | GREASE (CONCEPTICON-ID: 323)
    Wiktionary (2003) | 207 | fat (noun) | GREASE (CONCEPTICON-ID: 323)
    Starostin (1991) | 110 | fat n.; жир | GREASE (CONCEPTICON-ID: 323)
    Teil-Dautrey et al. (2008) | 430 | fat, oil | GREASE (CONCEPTICON-ID: 323)
    Swadesh (1952) | 200 | fat (organic substance) | GREASE (CONCEPTICON-ID: 323)
    Shiro (1973) | 200 | grease (fat) | GREASE (CONCEPTICON-ID: 323)
    Samarin (1969) | 100 | grease; graisse; Fett; grasa | GREASE (CONCEPTICON-ID: 323)
    Wang (2006) | 200 | pig oil; 猪油 | GREASE (CONCEPTICON-ID: 323)
    Haspelmath and Tadmor (2009) | 1460 | the grease or fat | GREASE (CONCEPTICON-ID: 323)
    Concept labels for “GREASE” in 22 different concept lists (see List et al. 2015,
    online at http://concepticon.clld.org)
    12 / 32

  35. A Computer-Assisted Framework for Language Comparison Standards
    Standards: Concept Labeling
    The Concepticon (List, Cysouw, and Forkel, submitted) is available
    as an online application at
    http://concepticon.clld.org and as an online repository at
    http://github.com/clld/concepticon-data.
    The data currently comprises 128 concept lists in which more than
    10 000 concept labels are linked to about 2000 concept sets.
    Basic semantic relations (broader, narrower, etc.) are defined
    between similar concept sets.
    Concept sets are enriched by linking them to additional meta-data.
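Linking labels to concept sets is what makes concept lists comparable: the raw labels "fat (grease)" and "fat/grease" never match as strings, but both point to GREASE. A minimal Python sketch with five rows from the GREASE table; the dictionary layout is hypothetical and is not the file format of the concepticon-data repository:

```python
from collections import defaultdict

# (concept list, original label) -> Concepticon concept set.
# Five rows from the GREASE table; a full mapping would cover every
# label of every list.
CONCEPT_LINKS = {
    ("Swadesh (1955)", "fat (grease)"): "GREASE",
    ("Hale (1961)", "fat, grease"): "GREASE",
    ("Blust (2008)", "fat/grease"): "GREASE",
    ("Dunn et al. (2012)", "fat"): "GREASE",
    ("Wang (2006)", "pig oil; 猪油"): "GREASE",
}


def labels_of(concept_list, links):
    """Raw labels used by one concept list."""
    return {label for (source, label) in links if source == concept_list}


def concepts_of(concept_list, links):
    """Concept sets covered by one concept list."""
    return {concept for (source, _), concept in links.items()
            if source == concept_list}


def labels_by_concept(links):
    """Group all (concept list, label) pairs by the concept set they link to."""
    grouped = defaultdict(list)
    for key, concept in links.items():
        grouped[concept].append(key)
    return dict(grouped)


if __name__ == "__main__":
    a, b = "Swadesh (1955)", "Blust (2008)"
    print(labels_of(a, CONCEPT_LINKS) & labels_of(b, CONCEPT_LINKS))      # set()
    print(concepts_of(a, CONCEPT_LINKS) & concepts_of(b, CONCEPT_LINKS))  # {'GREASE'}
```

Comparing lists through concept sets rather than labels is also what allows the broader/narrower relations to be used later, for instance to match a list that only has "fat" against one that distinguishes several kinds of fat.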
    13 / 32

  36. A Computer-Assisted Framework for Language Comparison Standards
    Standards: Lexical Representation
    14 / 32

  37. A Computer-Assisted Framework for Language Comparison Standards
    Standards: Lexical Representation
    Dialect | Entry | IPA | Segments | Morphemes
    Beijing | 大油 | ta⁵¹ iou³⁵ | t a ⁵¹ i o u ³⁵ | t a ⁵¹ + i o u ³⁵
    Changsha | 油 | tɕy³³ iəu¹³ | tɕ y ³³ i ə u ¹³ | tɕ y ³³ + i ə u ¹³
    Chengdu | 猪油 | tsu⁴⁴iəu³¹ | ts u ⁴⁴ i ə u ³¹ | ts u ⁴⁴ + i ə u ³¹
    Fuzhou | 猪油 | ty⁴⁴iu⁵² | t y ⁴⁴ i u ⁵² | t y ⁴⁴ + i u ⁵²
    Guangzhou | 猪膏 | tʃy⁵⁵kou⁵³ | tʃ y ⁵⁵ k ou ⁵³ | tʃ y ⁵⁵ + k ou ⁵³
    Meixian | 油 | jiu¹² | j i u ¹² | j i u ¹²
    Nanchang | 油 | iu⁵⁵ | i u ⁵⁵ | i u ⁵⁵
    Taibei | 豬油 | ti⁴⁴ iu¹³ | t i ⁴⁴ i u ¹³ | t i ⁴⁴ + i u ¹³
    Wenzhou | 猪油 | tsei⁴⁴ ɦiau³¹ | ts e i ⁴⁴ ɦ i a u ³¹ | ts e i ⁴⁴ + ɦ i a u ³¹
    Xiamen | 油 | iu²⁴ | i u ²⁴ | i u ²⁴
    Lexical entries for “GREASE” (“pork fat”) in 10 Chinese dialect varieties
    (data taken from Wang and Hamed 2006)
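Because segments are separated by spaces and morphemes by "+", the last column can be checked mechanically. A minimal Python sketch, assuming that tones are written as superscript digits and that every morpheme ends in exactly one tone token (an assumption that fits these Chinese data, not languages in general); it flags, for instance, a tone split into two tokens ("j i u ¹ ²") or a boundary marker glued to a tone ("+⁴⁴"):

```python
SUPERSCRIPT_DIGITS = set("⁰¹²³⁴⁵⁶⁷⁸⁹")


def morphemes(segments):
    """Split a space-separated segment string into morphemes at '+' tokens."""
    result = [[]]
    for token in segments.split():
        if token == "+":
            result.append([])
        else:
            result[-1].append(token)
    return result


def is_tone(token):
    """A tone token consists only of superscript digits, e.g. '⁵¹'."""
    return all(char in SUPERSCRIPT_DIGITS for char in token)


def problems(segments):
    """Indices of morphemes that do not end in exactly one tone token,
    or that contain a '+' glued to another token."""
    found = []
    for index, morpheme in enumerate(morphemes(segments)):
        tone_count = sum(is_tone(token) for token in morpheme)
        if not morpheme or not is_tone(morpheme[-1]) or tone_count != 1:
            found.append(index)
        elif any("+" in token for token in morpheme):
            found.append(index)
    return found


if __name__ == "__main__":
    print(morphemes("t a ⁵¹ + i o u ³⁵"))  # [['t', 'a', '⁵¹'], ['i', 'o', 'u', '³⁵']]
    print(problems("t a ⁵¹ + i o u ³⁵"))   # []
    print(problems("j i u ¹ ²"))           # [0]
```

Checks of this kind are cheap for a computer and tedious for a human, which is the division of labour argued for earlier in the talk.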
    14 / 32

  41. A Computer-Assisted Framework for Language Comparison Standards
    Standards: Representation of Cognate Judgments
    15 / 32

  42. A Computer-Assisted Framework for Language Comparison Standards
    Standards: Representation of Cognate Judgments
    Language | Lexical Entry | Cognacy | Alignment
    Central Amis | simar | 2 | s i m a r
    Thao | lhimash | 2 | lh i m a sh
    Hanunóo | tabáʔ | 23 | t a b á ʔ
    Nias | tawõ | 23 | t a w õ -
    Mailu | mona | 1 | m o n a -
    Maloh | -iñak | 1 | - i ñ a k
    Tetum | mina | 1 | m i n a -
    Banggi | laːna | 24 | l aː n a -
    Berawan (Long Terawan) | ləməʔ | 24 | l ə m ə ʔ
    Iban | lemak | 24 | l e m a k
    Cognate judgments for “grease/fat” across 10 Austronesian languages
    (data taken from Greenhill et al. 2008, online at
    http://language.psy.auckland.ac.nz/austronesian/)
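Reading an alignment column by column yields candidate sound correspondences. A minimal Python sketch, assuming that rows sharing a number in the Cognacy column belong to the same cognate set and that "-" marks a gap:

```python
from collections import defaultdict

# (language, cognate set ID, alignment) from the table above.
ROWS = [
    ("Central Amis", 2, "s i m a r"),
    ("Thao", 2, "lh i m a sh"),
    ("Hanunóo", 23, "t a b á ʔ"),
    ("Nias", 23, "t a w õ -"),
    ("Mailu", 1, "m o n a -"),
    ("Maloh", 1, "- i ñ a k"),
    ("Tetum", 1, "m i n a -"),
    ("Banggi", 24, "l aː n a -"),
    ("Berawan (Long Terawan)", 24, "l ə m ə ʔ"),
    ("Iban", 24, "l e m a k"),
]


def cognate_sets(rows):
    """Group aligned words by cognate set ID: {id: {language: tokens}}."""
    sets = defaultdict(dict)
    for language, cognate_id, alignment in rows:
        sets[cognate_id][language] = alignment.split()
    return dict(sets)


def correspondences(aligned):
    """Read one cognate set column by column: each tuple lists the sounds
    that the languages show at one aligned position."""
    lengths = {len(tokens) for tokens in aligned.values()}
    if len(lengths) != 1:
        raise ValueError(f"alignment rows differ in length: {sorted(lengths)}")
    return list(zip(*aligned.values()))


if __name__ == "__main__":
    sets = cognate_sets(ROWS)
    for column in correspondences(sets[24]):
        print(column)
```

For set 24 (Banggi, Berawan, Iban) this gives the columns l/l/l, aː/ə/e, n/m/m, a/ə/a and -/ʔ/k. Columns that recur across many cognate sets are what a correspondence analysis would count; a single set proves nothing on its own.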
    15 / 32

  44. A Computer-Assisted Framework for Language Comparison Standards
    Jena Wordlist Standard
    16 / 32

  45. A Computer-Assisted Framework for Language Comparison Standards
    Jena Wordlist Standard
    JENA WORDLIST STANDARD
    The Jena Wordlist Standard is being developed by the NESCent-style working group
    “GlottoBank: Towards a Global Language Phylogeny” under the direction of Russell Gray
    16 / 32

  46. A Computer-Assisted Framework for Language Comparison Standards
    Jena Wordlist Standard
    DEFINE STANDARDS FOR
    - Wordlists
    - Cognate Sets
    - Alignments
    PROVIDE TOOLS FOR
    - Data Validation
    - Data Exchange
    - Data Enrichment
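Validation and exchange are easiest to picture with a plain tab-separated file. A minimal Python sketch using only the standard `csv` module; the column names are hypothetical (they are not quoted from the Jena Wordlist Standard), and the rule checked here (an alignment minus its gaps must equal the tokens) is just one example of a validation rule:

```python
import csv
import io

# Hypothetical column layout for a tab-separated wordlist.
COLUMNS = ["ID", "DOCULECT", "CONCEPT", "TOKENS", "COGID", "ALIGNMENT"]


def write_tsv(rows):
    """Serialize a list of dicts as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def read_and_validate(text):
    """Parse tab-separated text and collect problems instead of stopping
    at the first one."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    errors = []
    for row in rows:
        missing = [column for column in COLUMNS if not row.get(column)]
        if missing:
            errors.append(f"row {row.get('ID')}: missing {', '.join(missing)}")
            continue
        ungapped = [token for token in row["ALIGNMENT"].split() if token != "-"]
        if ungapped != row["TOKENS"].split():
            errors.append(f"row {row['ID']}: alignment does not match tokens")
    return rows, errors


SAMPLE = [
    {"ID": "1", "DOCULECT": "Banggi", "CONCEPT": "grease/fat",
     "TOKENS": "l aː n a", "COGID": "24", "ALIGNMENT": "l aː n a -"},
    # Deliberately broken: the final "k" is missing from the alignment.
    {"ID": "2", "DOCULECT": "Iban", "CONCEPT": "grease/fat",
     "TOKENS": "l e m a k", "COGID": "24", "ALIGNMENT": "l e m a"},
]

if __name__ == "__main__":
    rows, errors = read_and_validate(write_tsv(SAMPLE))
    print(errors)  # ['row 2: alignment does not match tokens']
```

Collecting all errors rather than failing on the first one suits a workflow in which a human editor fixes the data afterwards.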
    16 / 32

  47. A Computer-Assisted Framework for Language Comparison Standards
    Jena Wordlist Standard
    INTEGRATE EXISTING STANDARDS
    - Glottolog (http://glottolog.clld.org)
    - Phoible (http://phoible.clld.org)
    - Concepticon (http://concepticon.clld.org)
    16 / 32

  48. A Computer-Assisted Framework for Language Comparison Standards
    Jena Wordlist Standard
    PROVIDE TOOLS FOR EDITING AND ANALYSIS
    - LingPy (http://lingpy.org)
    - EDICTOR (http://tsv.lingpy.org)
    16 / 32

  49. A Computer-Assisted Framework for Language Comparison Standards
    Jena Wordlist Standard
    USE THE STANDARD TO BUILD NEW DATABASES
    - LexiBank: Cross-Linguistic Database of Lexical Cognate Sets
    - PhonoBank: Cross-Linguistic Database of Regular Sound Change Patterns
    16 / 32

  50. A Computer-Assisted Framework for Language Comparison Workflows
    Workflows
    17 / 32

  51. A Computer-Assisted Framework for Language Comparison Workflows
    Workflows
    [Figure: an iterative workflow in which the machine (Bayes' formula
    $P(A|B) = P(B|A)P(A)/P(B)$) PROVIDES AUTOMATIC ANALYSES and the
    scholar (FRANZ BOPP) REVISES AUTOMATIC ANALYSES. RAW DATA passes
    through Semantic Tagging (WORDLIST DATA), Segmentation (TOKENS,
    MORPHEMES), Cognate Detection (COGNATE SETS), Alignment Analysis
    (SOUND CORRESPONDENCES), Linguistic Reconstruction (PROTO-FORMS), and
    Phylogenetic Reconstruction (PHYLOGENIES); each stage is illustrated
    with the entries HAND [hænd], FOOT [fʊt], EARTH [ɜːrθ], TREE [triː],
    BARK [bɑːrk].]
    A possible computer-assisted, iterative workflow with automatic and manual components.
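The division of labour in this workflow (the machine proposes, the expert revises) can be sketched in a few lines of Python. The automatic step below is a deliberately crude, hypothetical heuristic (same first segment, same cognate set), not a real cognate detection method; the point is only that expert corrections override machine judgments and stay marked as such:

```python
def propose_cognates(words):
    """Hypothetical automatic step: words that start with the same segment
    receive the same cognate ID. A crude stand-in for cognate detection."""
    ids, proposal = {}, {}
    for language, tokens in words.items():
        first = tokens[0]
        if first not in ids:
            ids[first] = len(ids) + 1
        proposal[language] = ids[first]
    return proposal


def revise(proposal, corrections):
    """Expert step: override selected judgments and record who made each one."""
    revised = {}
    for language, cognate_id in proposal.items():
        if language in corrections:
            revised[language] = (corrections[language], "expert")
        else:
            revised[language] = (cognate_id, "machine")
    return revised


# Four "grease/fat" words from the Austronesian table, segmented.
WORDS = {
    "Central Amis": "s i m a r".split(),
    "Thao": "lh i m a sh".split(),
    "Tetum": "m i n a".split(),
    "Mailu": "m o n a".split(),
}

if __name__ == "__main__":
    proposal = propose_cognates(WORDS)       # Thao gets its own ID (2)
    revised = revise(proposal, {"Thao": 1})  # the expert joins it with Central Amis
    for language, judgment in revised.items():
        print(language, judgment)
```

Keeping the source of each judgment lets a later automatic run skip or flag expert-revised entries instead of silently overwriting them.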
    17 / 32

  52. A Computer-Assisted Framework for Language Comparison Workflows
    Workflows: Tools
    18 / 32

  53. A Computer-Assisted Framework for Language Comparison Workflows
    Workflows: Tools
    LingPy
    http://lingpy.org
    TSV EDICTOR
    http://tsv.lingpy.org
    18 / 32

  54. A Computer-Assisted Framework for Language Comparison Workflows
    Workflows: Tools
    LingPy and EDICTOR: Two tools for computer-assisted language comparison.
    LingPy (http://lingpy.org): Software Library for Automatic
    Tasks in Historical Linguistics
    - phonetic segmentation
    - phonetic alignment
    - cognate detection
    - ancestral state reconstruction
    - borrowing detection
    - phylogenetic reconstruction
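Phonetic alignment, one of the tasks listed, can be illustrated with the classic Needleman-Wunsch algorithm over segment lists. This is a minimal Python sketch with arbitrary identity scores (match 1, mismatch -1, gap -2); LingPy's own alignment methods are more elaborate (they work with sound classes rather than raw symbols), and nothing here uses LingPy's API:

```python
def align(a, b, match=1, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch) alignment of two segment lists.

    Returns (score, aligned_a, aligned_b); "-" marks a gap.
    """
    def pair(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j]: best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,                            # gap in b
                score[i][j - 1] + gap,                            # gap in a
            )
    # Trace back from the last cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], out_a[::-1], out_b[::-1]


if __name__ == "__main__":
    score, top, bottom = align("s i m a r".split(), "lh i m a sh".split())
    print(score)             # 1
    print(" ".join(top))     # s i m a r
    print(" ".join(bottom))  # lh i m a sh
```

Identity scoring treats "s" versus "lh" like any other mismatch; scoring by sound similarity is what lets specialized tools prefer linguistically plausible alignments when several have the same identity score.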
    18 / 32

  55. A Computer-Assisted Framework for Language Comparison Workflows
    Workflows: Tools
    LingPy and EDICTOR: Two tools for computer-assisted language comparison.
    EDICTOR (http://tsv.lingpy.org): Online Tool for
    Computer-Assisted Language Comparison
    - server- and client-based
    - data validation
    - phonetic segmentation
    - cognate set editor
    - alignment editor
    - correspondence evaluation
    18 / 32

  56. A Computer-Assisted Framework for Language Comparison Test Cases
    Test Cases: Tukano (with T. Chacon)
    Recent research in historical linguistics shows initial attempts to
    model language history no longer as a process of word gain and
    word loss, but as a process of sound changes across sets of
    cognate words (Wheeler and Whiteley 2015, Bouchard-Côté et al.
    2013, Hruschka et al. 2015, Jäger and List 2015).
    Classical linguists often base genetic classification on shared
    innovations in sound change, which make it possible to identify subgroups.
    The problem with shared innovations is the inherent circularity of the
    concept: valid innovations need to respect known tendencies of
    sound change, but highly frequent sound change patterns can
    often equally well be interpreted as parallel evolution.
    Computational approaches ignore salient features of sound
    change: context-dependency, system-dependency, and
    directionality. They also ignore that sound systems of ancestral
    languages do not necessarily resemble the alphabets of the
    contemporary languages.
    19 / 32

  57. A Computer-Assisted Framework for Language Comparison Test Cases
    Test Cases: Tukano (with T. Chacon)
    Chacon and List (submitted) address these problems by
    - assembling known sound changes, extracted from distinct phonetic
      contexts in the consonantal inventory of 21 Tukano languages, along
      with their ancestral forms in Proto-Tukano,
    - using a weighted, directed parsimony framework to model transitions
      between the multiple states of a character, each character
      corresponding to one proto-sound in a distinct context,
    - including states which are not attested in contemporary languages
      as “latent states”, and
    - using a genetic algorithm to infer the set of trees which minimizes
      the parsimony score.
    20 / 32
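The core of a weighted, directed parsimony analysis can be illustrated with the Sankoff algorithm run on an asymmetric cost matrix, so that a change from state x to state y may cost more than the reverse. A minimal sketch on one fixed tree, assuming hypothetical states and costs (`STATES`, `COST`, and the tree are invented for illustration; the tree search with a genetic algorithm is omitted):

```python
import math

# Hypothetical character states: three reflexes of one proto-sound in one context.
STATES = ["p", "f", "h"]

# Hypothetical directed penalties: COST[a][b] is the cost of a change a -> b.
# Lenition (p > f > h) is cheap; the reverse direction is expensive.
COST = {
    "p": {"p": 0, "f": 1, "h": 2},
    "f": {"p": 3, "f": 0, "h": 1},
    "h": {"p": 4, "f": 3, "h": 0},
}


def sankoff(tree):
    """Return {state: minimal cost of the subtree if its root has that state}.

    A tree is either a leaf (a state string) or a tuple of subtrees.
    """
    if isinstance(tree, str):
        # A leaf is fixed to its observed state; all other states are impossible.
        return {s: 0 if s == tree else math.inf for s in STATES}
    child_scores = [sankoff(child) for child in tree]
    # For each parent state, add the cheapest transition into every child.
    return {
        parent: sum(
            min(COST[parent][s] + scores[s] for s in STATES)
            for scores in child_scores
        )
        for parent in STATES
    }


root_scores = sankoff((("p", "f"), ("h", "h")))
print(root_scores)                            # {'p': 3, 'f': 4, 'h': 5}
print(min(root_scores, key=root_scores.get))  # p
```

With a symmetric cost of 1 for every change, all three root states would tie at a score of 2 on this tree; the directed costs break the tie in favor of p. Because inner nodes range over all of `STATES`, the same routine can also assign states that no leaf shows, which is how latent states enter such a model.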

  58. A Computer-Assisted Framework for Language Comparison Test Cases
    Test Cases: Tukano (with T. Chacon)
    21 / 32

  59. A Computer-Assisted Framework for Language Comparison Test Cases
    Test Cases: Tukano (with T. Chacon)
    The results show
    - that directional models largely outperform classical Sankoff
      parsimony,
    - that the directions of the proposed sound changes consistently
      place the root of the Tukano tree by splitting the family into an
      Eastern and a Western branch, and
    - that the consensus classification of the best-scoring trees
      convincingly reconciles previous proposals in the literature.
    22 / 32

  60. A Computer-Assisted Framework for Language Comparison Test Cases
    Test Cases: Chinese
    [Figure, panels A and B: gene tree for taxa A, B, C, D with duplication, speciation, and lateral transfer events, illustrating orthologs, paralogs, and xenologs]
    23 / 32

  61. A Computer-Assisted Framework for Language Comparison Test Cases
    Test Cases: Chinese
    Historical relations between words and their terminology
    Relation | Biology | Linguistics (classical) | Linguistics (proposed)
    common descent, direct | orthology | cognacy | direct cognate relation
    common descent, indirect | paralogy | oblique cognacy | indirect cognate relation
    involving lateral transfer | xenology | ? | indirect etymological relation
    Umbrella terms: homology (biology); cognate relation (cognacy) and etymological relation (proposed linguistic terms)
    23 / 32

  62. A Computer-Assisted Framework for Language Comparison Test Cases
    Test Cases: Chinese
    The linguistic terminology regarding historical relations between
    words lags behind the terminology used in evolutionary biology.
    23 / 32

  63. A Computer-Assisted Framework for Language Comparison Test Cases
    Test Cases: Chinese
    [Figure, panel A ('give'): Italian dare, French donner; Latin dare, dōnum, dōnāre; Indo-European *deh₃-, *deh₃-no-]
    [Figure, panel B ('sun'): Italian sole, French soleil, Swedish sol, German Sonne; Latin solis, soliculus; Germanic *sōwel-, *sunnō-; Indo-European *sóh₂-wl̩-, *sh₂én-]
    23 / 32

  64. A Computer-Assisted Framework for Language Comparison Test Cases
    Test Cases: Chinese
    Lexical change reveals complex patterns of which classical
    historical linguists are aware, but which they completely ignore
    in their terminology regarding historical relations between words.
    23 / 32

  65. A Computer-Assisted Framework for Language Comparison Test Cases
    Test Cases: Chinese
    German m oː n t -
    English m uː n - -
    Danish m ɔː n - ə
    Swedish m oː n - e
    23 / 32
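Alignments like this one place corresponding segments in one column and mark gaps with "-". A minimal sketch of how such an alignment can be computed with the Needleman-Wunsch algorithm, assuming a naive scoring scheme (match +1, mismatch -1, gap -1) rather than the more refined sound-class-based scoring used in LingPy; `align` is a hypothetical helper.

```python
def align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two segment lists with Needleman-Wunsch."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align two segments
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return out_a[::-1], out_b[::-1]


german, english = align(["m", "oː", "n", "t"], ["m", "uː", "n"])
print(" ".join(german))   # m oː n t
print(" ".join(english))  # m uː n -
```

Even this naive scheme reproduces the German/English row pair above; distinguishing plausible from implausible correspondences (for instance oː/uː versus oː/t) is what sound-class-based scoring adds.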

  66. A Computer-Assisted Framework for Language Comparison Test Cases
    Test Cases: Chinese
    German m oː n t -
    English m uː n - -
    Danish m ɔː n - ə
    Swedish m oː n - e
    Fúzhōu ŋ u o ʔ ⁵ - - - - - - - - - -
    Měixiàn ŋ i a t ⁵ - - - - - k u o ŋ ⁴⁴
    Guǎngzhōu j - y t ² l - œ ŋ ²² - - - - -
    Běijīng - y ɛ - ⁵¹ l i ɑ ŋ - - - - - -
    23 / 32

  67. A Computer-Assisted Framework for Language Comparison Test Cases
    Test Cases: Chinese
    German m oː n t -
    English m uː n - -
    Danish m ɔː n - ə
    Swedish m oː n - e
    Fúzhōu ŋ u o ʔ ⁵ - - - - - - - - - -
    Měixiàn ŋ i a t ⁵ - - - - - k u o ŋ ⁴⁴
    Guǎngzhōu j - y t ² l - œ ŋ ²² - - - - -
    Běijīng - y ɛ - ⁵¹ l i ɑ ŋ - - - - - -
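    [Column labels: "MOON" over the Germanic alignment; "MOON", "SHINE", and "LIGHT" over the three morpheme blocks of the Chinese alignment]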
    23 / 32
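The Chinese words in this alignment are compounds, so cognacy has to be judged morpheme by morpheme: all four share the element glossed "MOON", but only some share a second element. A minimal sketch of classifying word pairs as full or partial cognates, assuming each word is encoded as a list of morpheme-level labels read off the glosses above (`WORDS` and `relation` are hypothetical; real data would use numeric cognate IDs per morpheme).

```python
from itertools import combinations

# Illustrative morpheme-level labels for each word, following the glosses.
WORDS = {
    "Fúzhōu": ["MOON"],
    "Měixiàn": ["MOON", "LIGHT"],
    "Guǎngzhōu": ["MOON", "SHINE"],
    "Běijīng": ["MOON", "SHINE"],
}


def relation(word_a, word_b):
    """Classify two words as full, partial, or non-cognates by their morphemes."""
    if word_a == word_b:
        return "full"
    if set(word_a) & set(word_b):
        return "partial"
    return "none"


for (lang_a, word_a), (lang_b, word_b) in combinations(WORDS.items(), 2):
    print(lang_a, lang_b, relation(word_a, word_b))
```

Only Guǎngzhōu and Běijīng come out as full cognates; every other pair is linked by the shared "MOON" morpheme alone, which a single word-level cognate judgment cannot express.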

  68. A Computer-Assisted Framework for Language Comparison Test Cases
    Test Cases: Chinese
    Fúzhōu
    Měixiàn
    Guǎngzhōu
    Běijīng
    23 / 32

  69. A Computer-Assisted Framework for Language Comparison Test Cases
    Test Cases: Chinese
    [Figure: Fúzhōu, Měixiàn, Guǎngzhōu, and Běijīng, annotated with five INNOVATION events, one BORROWING event, and one LOSS event]
    23 / 32

  70. A Computer-Assisted Framework for Language Comparison Test Cases
    Test Cases: Chinese
    The classical gain-loss models for lexical change employed
    in computational historical linguistics are largely unrealistic
    when it comes to the modeling of complex historical relations,
    especially relations of indirect cognacy.
    24 / 32

  71. A Computer-Assisted Framework for Language Comparison Test Cases
    Test Cases: Chinese
    [Figure: character states B, A, C, AC, ABD, AB, A, D assigned to the nodes of a tree]
    24 / 32

  72. A Computer-Assisted Framework for Language Comparison Test Cases
    Test Cases: Chinese
    Instead of using gain-loss models, we should try to find ways
    to model lexical change within multi-state approaches which
    also include the directionality of change.
    24 / 32
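The difference between the two kinds of models shows up in how the data are coded. A minimal sketch, using hypothetical "MOON" forms for four dialects and treating each distinct form as one cognate set: a gain-loss model turns the forms into independent presence/absence characters, while a multi-state model keeps a single character whose states are the forms themselves. `FORMS`, `gain_loss_matrix`, and `multistate_character` are illustrative names.

```python
# Hypothetical 'MOON' forms, one per dialect.
FORMS = {"Fúzhōu": "月", "Měixiàn": "月光", "Guǎngzhōu": "月亮", "Běijīng": "月亮"}

# Distinct forms in order of first appearance: ['月', '月光', '月亮'].
STATES = list(dict.fromkeys(FORMS.values()))


def gain_loss_matrix(forms):
    """Binary coding: one presence/absence character per distinct form."""
    return {lang: [int(form == state) for state in STATES] for lang, form in forms.items()}


def multistate_character(forms):
    """Multi-state coding: one character whose state is the index of the form."""
    return {lang: STATES.index(form) for lang, form in forms.items()}


print(gain_loss_matrix(FORMS))
# {'Fúzhōu': [1, 0, 0], 'Měixiàn': [0, 1, 0], 'Guǎngzhōu': [0, 0, 1], 'Běijīng': [0, 0, 1]}
print(multistate_character(FORMS))
# {'Fúzhōu': 0, 'Měixiàn': 1, 'Guǎngzhōu': 2, 'Běijīng': 2}
```

In the binary coding the three characters evolve independently, so nothing records that 月光 and 月亮 both extend 月. The multi-state character keeps the forms together, which is what allows weighted, directed transitions between them.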

  73. A Computer-Assisted Framework for Language Comparison Test Cases
    Test Cases: Chinese
    List (submitted) illustrates how complex historical relations between
    words in Chinese dialects can be modeled by
    - employing a directed weighted parsimony framework,
    - modeling partial cognacy resulting from compounding as
      character-state transitions, and
    - computing weights between multiple character states with the help
      of a modified Hamming distance applied to the alignment of words
      segmented into morphemes, with insertions penalized more heavily
      than deletions.
    25 / 32

  74. A Computer-Assisted Framework for Language Comparison Test Cases
    Test Cases: Chinese
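    Transition penalties between the states 月, 月光, 月光佛, and 月亮:
    Asymmetric penalties
              月  月光  月光佛  月亮
    月        0   1     2      1
    月光      2   0     1      2
    月光佛    4   2     0      4
    月亮      2   2     3      0
    Symmetric penalties
              月  月光  月光佛  月亮
    月        0   1     2      1
    月光      1   0     1      2
    月光佛    2   1     0      3
    月亮      1   2     3      0
    Uniform penalties
              月  月光  月光佛  月亮
    月        0   1     1      1
    月光      1   0     1      1
    月光佛    1   1     0      1
    月亮      1   1     1      0
    Alignment-based scores (column by column):
    月 光 - / 月 光 佛: 0 + 0 + 1 = 1 (both scorings)
    月 亮 - / 月 光 佛: 0 + 2 + 2 = 4 (first scoring); 0 + 2 + 1 = 3 (second scoring)
    [Figure panels A, B, C: edge-wise transition penalties, labeled "Transition Penalty (SANKOFF)" and "Transition Penalty (DWST)"]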
    26 / 32
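The asymmetric matrix on this slide can be reproduced, reading each row as the target state and each column as the source, by a weighted edit distance over morphemes in which a deletion costs 1 and an insertion or a substitution costs 2. These weights are inferred from the matrix and are an assumption, and the edit distance stands in for the alignment-based modified Hamming distance described in the study; `directed_cost` is a hypothetical helper. A minimal sketch:

```python
def directed_cost(source, target, delete=1, insert=2, substitute=2):
    """Weighted edit distance from one morpheme sequence to another.

    Each character of the strings is treated as one morpheme. Because
    insertions cost more than deletions, the distance is asymmetric.
    """
    n, m = len(source), len(target)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * delete
    for j in range(1, m + 1):
        d[0][j] = j * insert
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            d[i][j] = min(
                d[i - 1][j - 1] + (0 if same else substitute),
                d[i - 1][j] + delete,  # drop source[i - 1]
                d[i][j - 1] + insert,  # add target[j - 1]
            )
    return d[n][m]


STATES = ["月", "月光", "月光佛", "月亮"]
for target in STATES:
    print(target, [directed_cost(source, target) for source in STATES])
# 月 [0, 1, 2, 1]
# 月光 [2, 0, 1, 2]
# 月光佛 [4, 2, 0, 4]
# 月亮 [2, 2, 3, 0]
```

Passing `insert=1` makes insertions and deletions equally expensive and yields the symmetric matrix on the same slide.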

  75. A Computer-Assisted Framework for Language Comparison Test Cases
    Test Cases: Chinese
    The model is tested by determining how well the approach accounts for
    ancestral state reconstruction (semantic reconstruction) on a dataset of
    24 Chinese dialects for which reference phylogenies were provided and
    ancestral states are known from Ancient Chinese texts. The results
    show that
    - binary state models perform worst (around 55% correct
      reconstructions),
    - Fitch parsimony applied to multi-state representations performs
      slightly better than binary models (around 60% correct
      reconstructions),
    - Sankoff parsimony performs much better, with scores around 75%,
      but depends strongly on the reference phylogeny, and
    - directed Sankoff parsimony outperforms all other approaches,
      reaching 82%.
    27 / 32
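Fitch parsimony, one of the approaches compared here, treats every change between two states as one step, regardless of direction. A minimal sketch of its bottom-up pass on a binary tree, with a hypothetical tree shape and hypothetical leaf states (`fitch` is an illustrative helper):

```python
def fitch(tree):
    """Bottom-up Fitch pass: return (candidate state set, number of changes).

    A tree is a leaf (a state string) or a pair (left, right).
    """
    if isinstance(tree, str):
        return {tree}, 0
    left_states, left_changes = fitch(tree[0])
    right_states, right_changes = fitch(tree[1])
    shared = left_states & right_states
    if shared:
        # The children agree on at least one state: no extra change needed.
        return shared, left_changes + right_changes
    # No overlap: keep every candidate and count one change.
    return left_states | right_states, left_changes + right_changes + 1


states, changes = fitch((("月", "月光"), ("月亮", "月亮")))
print(changes)  # 2
```

Here the candidate set at the root contains all three forms, so Fitch parsimony alone cannot tell which of them was ancestral; a cost matrix that distinguishes directions, as in the Sankoff variants, can prefer one of them.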

  76. A Computer-Assisted Framework for Language Comparison Test Cases
    Test Cases: Chinese
    [Figure: 'MOON' forms across Chinese dialects. Dialects: Fúzhōu, Táiběi, Xiàmén, Zhāngpíng, Guǎngzhōu, Měixiàn, Liánchéng, Wēnzhōu, Níngbō, Sūzhōu, Shànghǎi, Shànghǎi_B, Nánchāng, Ānyì, Chángshā, Shuāngfēng, Yàngshān, Wǔhàn, Níngxià, Chéngdū, Běijīng, Tàiyuán, Yúcì. Group labels: Mǐn, Hakka, Gàn, Xiāng, Guānhuà. Legend: 月 ‘MOON’, 月娘 ‘MOON-MOTHER’, 月光 ‘MOON-LIGHT’, 月光佛 ‘MOON-LIGHT-SUFFIX’, 月亮 ‘MOON-SHINE’, 月明 ‘MOON-BRIGHT’]
    28 / 32

  77. A Computer-Assisted Framework for Language Comparison Summary
    Summary
    The test cases mentioned above are not meant to end with the
    computational applications; instead, they are intended to serve as a
    starting point from which classical linguists can evaluate and improve
    on the findings. In the case of sound change processes, interactive
    applications help linguists identify classical “shared innovations”:
    instead of determining them manually, linguists can inspect the
    consequences of their hypotheses regarding subgrouping. In the case of
    complex relations between words, linguists can investigate the
    plausibility of phylogenetic models and compounding processes, thereby
    using parallel evolution as a proxy for the identification of lateral
    relations between the languages and gaining more insights into
    potential regularities of compounding in the history of Chinese.
    29 / 32

  78. Challenges
    Challenges
    30 / 32

  79. Challenges
    Challenges
    $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$
    FRANZ BOPP
    VERY, VERY LONG TITLE
    31 / 32

  80. Challenges
    Challenges
    Modeling of Morphological Change
    - morphological change is not systematic (as opposed to sound change)
    - morphological differences in cognate sets distort the alignments
    Modeling of Semantic Change
    - semantic shift is not systematic but shows general tendencies
    - we need to incorporate known tendencies into our analyses
    Modeling of Irregular Sound Change
    - irregular or sporadic sound change is problematic for reconstruction
    - we need to find ways to incorporate our uncertainty into our alignments
    31 / 32

  81. Concluding Remarks
    The current practice of data representation in
    historical linguistics not only makes it difficult
    to compare and test hypotheses proposed by classical
    linguists with those proposed by computational
    approaches, but also to reconcile the insights we
    can gain from the two approaches. If we want to gain
    new insights into the past of our languages, we need
    to find ways to integrate both the knowledge which
    experts have been accumulating over centuries and
    the new computational tools which help to organize,
    analyze and integrate this knowledge.
    32 / 32

  82. Thanks for Your Attention!
    32 / 32
