Do Roots Really Grow Trees?

Paper presented at the 43rd Annual Meeting of the Societas Linguistica Europaea (Vilnius).

Johann-Mattis List

September 03, 2010

Transcript

  1. Introduction
    Two Models of Language Evolution
    Testing the Models of Language Evolution
    Conclusion
    Do Roots Really Grow Trees?
    Quantitative Root-Based Approaches in Historical Linguistics
    Hans Geisler, Johann-Mattis List
    August 26, 2010

  2. Structure of the Talk
    Introduction
      Comparison and Reconstruction
      The Root Concept in Historical Linguistics
      Lexicostatistics vs. Root-Based Approaches
    Two Models of Language Evolution
      The Separation Base Method
      Etymostatistics
      Phylogenetic Reconstruction
      Comparison of the Models
    Testing the Models of Language Evolution
      Simulations of the Evolutionary Models
      Testing the Models on Real Data
    Conclusion
      Model-Internal Problems
      Models and Reality

  3. Introduction
    Comparison and Reconstruction
    The Root Concept in Historical Linguistics
    Lexicostatistics vs. Root-Based Approaches

  4. Comparison and Reconstruction
    Goal of Comparison
    One major goal of comparison in historical linguistics is to
    reconstruct the way genetically related languages evolved from
    a common ancestor language.
    Characters of Comparison
    The characters of comparison differ across approaches in
    historical linguistics. The leading question in character
    selection is always whether a specific sample of characters is
    meaningful for phylogenetic reconstruction.

  5. The Root Concept in Historical Linguistics
    [Figure: reflexes of the Indo-European root d(e)h3 in Latin and Romance. Labels on the Indo-European side: tis, tom, si, no, si, m.]
    Latin               Romance
    datum “given”       date “date” (French)
    dōnāre “present”    douna “give” (Provençal)
    dōnum “gift”        don “gift” (Spanish)
    dare “to give”      dar “give” (Portuguese)
    dōs “dowry”         dote “dowry” (Italian)

  6. Lexicostatistics vs. Root-Based Approaches
    Evolutionary model
      Lexicostatistics: replacement of words denoting basic concepts in semantic meaning slots
      Root-based approaches: gain and loss of roots
    Comparanda
      Lexicostatistics: words denoting the same basic concepts
      Root-based approaches: words which can be traced back to a single root (“word families”)
    Method of comparison
      Lexicostatistics: comparative method
      Root-based approaches: comparative method
    Characters
      Lexicostatistics: basic concepts
      Root-based approaches: roots (proto-forms)

  7. Lexicostatistics vs. Root-Based Approaches
    Concept  Italian  Romanian  Spanish  French  Latin
    BIRD     -        pasăre    pájaro   -       passer
             uccello  -         ave      oiseau  avis
    Table: The Lexicostatistical Analysis for the Concept BIRD

    Root    Meaning    Italian  Romanian  Spanish  French
    passer  “sparrow”  passero  pasăre    pájaro   passereau
    avis    “bird”     uccello  -         ave      oiseau
    Table: Root-Based Analysis for Latin passer “sparrow” and avis “bird”

  8. Lexicostatistics vs. Root-Based Approaches
    Apparent Advantages of Root-Based Approaches
    Root-based approaches do not depend on the basic vocabulary assumption.
    The dataset is not restricted to the realm of basic vocabulary.
    The use of roots (proto-forms) as primary characters of comparison comes closer to the framework of the comparative method.

  9. Two Models of Language Evolution
    The Separation Base Method (Holm 2000 & 2008)
    Etymostatistics (Starostin 2000[1989])
    Phylogenetic Reconstruction
    Comparison of the Models

  10. Evolutionary Model of the Separation Base Method
    [Figure: tree in which L1234 splits into L12 and L34, L12 into L1 and L2, and L34 into L3 and L4.]
    Legend: roots inherited from the common ancestor language; roots lost after the split from the ancestor language.

  11. Evolutionary Model of the Separation Base Method
    [Figure: tree with the leaves L1, L2, L3 and L4; scale mark 1.]

  12. Datasets for the Separation Base Method
    Language    Value     Coding
    Proto       *h2ent-   1
    Hittite     hant-     1
    Old Indian  ánti      1
    Avestan     -         0
    Armenian    -         0
    Greek       antí      1
    Slavic      -         0
    Baltic      ãnt-i     1
    Germanic    *anθ-ia   1
    Latin       ante      1
    Celtic      *antono   1
    Albanian    -         0
    Tokharian   ānt       1
    Table: Coding of data according to the Separation Base Method
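
    The coding in this table can be reproduced in a few lines of Python. This is a minimal sketch, not the authors' program: the dictionary layout and the function name `code_presence` are assumptions made for illustration, and `None` stands for the dash in the table.

```python
# Reflexes of the root *h2ent- as listed in the table.
# None stands for "-" (no reflex attested in that branch).
REFLEXES = {
    "Hittite": "hant-",
    "Old Indian": "ánti",
    "Avestan": None,
    "Armenian": None,
    "Greek": "antí",
    "Slavic": None,
    "Baltic": "ãnt-i",
    "Germanic": "*anθ-ia",
    "Latin": "ante",
    "Celtic": "*antono",
    "Albanian": None,
    "Tokharian": "ānt",
}


def code_presence(reflexes):
    """Map each language to 1 if the root has a reflex there, else 0."""
    return {lang: int(form is not None) for lang, form in reflexes.items()}


coding = code_presence(REFLEXES)
print(coding)
```

    Repeating this for every reconstructed root yields one binary vector per language, which is the input format assumed by the reconstruction methods discussed later in the talk.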

  13. Evolutionary Model of Etymostatistics
    [Figure: tree in which L1234 splits into L12 and L34, L12 into L1 and L2, and L34 into L3 and L4.]
    Legend: roots inherited from the common ancestor language; innovations at different stages of language evolution.

  14. Evolutionary Model of Etymostatistics
    [Figure: tree with the leaves L1, L2, L3 and L4; scale mark 1.]

  15. Datasets for Etymostatistics
    1. Take any text in a given language and select all non-borrowed lexical roots from it.
    2. Exclude all prefixes, suffixes and proper names, and count each root only once.
    3. For each root in this set, check with the help of etymological dictionaries whether it has a reflex in the other genetically related languages under investigation.
    4. Compute the similarity of the text language to each of the other languages as the percentage of roots that have a reflex in that language.
    5. Repeat the procedure for the other languages under investigation, changing the text language and selecting different texts.

  16. Datasets for Etymostatistics
    “Das kräftige Wirtschaftswachstum [...] [hat] die Stimmung der
    Verbraucher [...] weiter aufgehellt.” (Spiegel ONLINE, 2010/08/26)
    Translation: “The strong economic growth has further brightened the
    mood of the consumers.”
    Word                 Meaning            “Lemma”  Root        Reflex    Coding
    Das                  “that”             das      *þat        that      1
    kräftige             “strong”           Kraft    *kraftiz    craft     1
    Wirtschaftswachstum  “economic growth”  Wirt     *werđuz     -         0
    hat                  “has”              haben    *xaƀēnan    to have   1
    [die]                = das
    Stimmung             “mood”             Stimme   *stemnō     -         0
    [der]                = das
    Verbraucher          “consumer”         Brauch   *brūkanan   to brook  1
    weiter               “further”          weit     *wīđaz      wide      1
    aufgehellt           “brighten”         “hell”   OHG hellan  -         0
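
    Step 4 of the etymostatistic procedure reduces to a simple proportion. The sketch below tallies the coding column of the example sentence. The tuple layout and the function name `share_with_reflex` are illustrative assumptions, and the repeated articles ([die], [der]) are left out because each root is counted only once.

```python
# (word, coding) pairs from the example sentence; 1 = the root has a reflex.
CODED_ROOTS = [
    ("Das", 1),
    ("kräftige", 1),
    ("Wirtschaftswachstum", 0),
    ("hat", 1),
    ("Stimmung", 0),
    ("Verbraucher", 1),
    ("weiter", 1),
    ("aufgehellt", 0),
]


def share_with_reflex(coded_roots):
    """Percentage of roots that have a reflex in the compared language."""
    if not coded_roots:
        raise ValueError("no roots to compare")
    hits = sum(code for _, code in coded_roots)
    return 100.0 * hits / len(coded_roots)


print(share_with_reflex(CODED_ROOTS))  # 5 of 8 roots -> 62.5
```

    A single sentence is far too small a sample for any real comparison; the point is only to make the counting rule explicit.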

  17. Phylogenetic Reconstruction
    Distance-Based Methods
    Convert the binary data into distances and analyze them with
    the help of common clustering algorithms (e.g. Neighbor-Joining,
    cf. Saitou & Nei 1987; UPGMA, cf. Sokal & Michener 1958).
    Character-Based Methods
    Take the binary form of the data and analyze it with the help
    of specific algorithms which explain the distribution of
    characters according to certain evolutionary models (e.g.
    probabilistic models, cf. Ronquist 2003; parsimony models, cf.
    Camin & Sokal 1965).
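
    The distance-based route can be sketched in plain Python. The example below converts binary root vectors into Hamming distances and clusters them with a small UPGMA implementation. The four toy vectors and all function names are hypothetical; this illustrates the general technique and is not the software used for the talk.

```python
from itertools import combinations


def hamming_distance(a, b):
    """Share of characters (roots) on which two binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(vectors):
    """Cluster binary vectors with UPGMA and return a nested tuple tree."""
    sizes = {name: 1 for name in vectors}  # cluster -> number of leaves
    dist = {(a, b): hamming_distance(vectors[a], vectors[b])
            for a, b in combinations(vectors, 2)}
    while len(sizes) > 1:
        a, b = min(dist, key=dist.get)  # closest pair of clusters
        merged = (a, b)
        updated = {}
        for c in sizes:
            if c == a or c == b:
                continue
            d_ac = dist.get((a, c), dist.get((c, a)))
            d_bc = dist.get((b, c), dist.get((c, b)))
            # Size-weighted average keeps every leaf equally weighted.
            updated[(merged, c)] = (
                (sizes[a] * d_ac + sizes[b] * d_bc) / (sizes[a] + sizes[b]))
        # Drop all distances that involve the two merged clusters.
        dist = {pair: d for pair, d in dist.items()
                if a not in pair and b not in pair}
        dist.update(updated)
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes))


# Hypothetical presence/absence vectors for six roots in four languages.
vectors = {
    "L1": [1, 1, 1, 1, 0, 0],
    "L2": [1, 1, 1, 0, 0, 0],
    "L3": [0, 0, 1, 1, 1, 1],
    "L4": [0, 0, 1, 1, 1, 0],
}
tree = upgma(vectors)
print(tree)
```

    UPGMA assumes a constant rate of change along all branches; Neighbor-Joining drops that assumption, which is one reason both algorithms are commonly tried on the same data.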

  18. Comparison of the Models
    Evolutionary model
      Separation Base Method: root loss
      Etymostatistics: root loss and gain
    Data
      Separation Base Method: complete etymological dictionaries listing all reconstructable roots of a proto-language
      Etymostatistics: random samples of roots extracted from texts or word lists
    Reconstruction
      Separation Base Method: quasi-distances based on the assumption that the root reflexes in the descendant languages are hypergeometrically distributed
      Etymostatistics: uncorrected distances (percentages of common character states)
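
    The two reconstruction strategies can be contrasted numerically. If two languages independently retain n_a and n_b of N ancestral roots, the number of roots they share is hypergeometrically distributed with mean n_a * n_b / N, and solving for N gives a rough estimate of the root inventory at the time of separation. The sketch below sets this estimate next to an uncorrected percentage. The function names and the toy counts are assumptions for illustration, and the estimate is one reading of the hypergeometric assumption rather than a reproduction of Holm's exact procedure.

```python
def separation_estimate(n_a, n_b, shared):
    """Estimate N from E[shared] = n_a * n_b / N (hypergeometric mean)."""
    if shared <= 0:
        raise ValueError("at least one shared root is needed")
    return n_a * n_b / shared


def uncorrected_similarity(shared, n_compared):
    """Percentage of the compared roots that are shared, without correction."""
    if n_compared <= 0:
        raise ValueError("n_compared must be positive")
    return 100.0 * shared / n_compared


# Hypothetical counts: A retains 600 roots, B retains 500, 400 are shared.
print(separation_estimate(600, 500, 400))  # 750.0
print(uncorrected_similarity(400, 600))    # share of A's roots found in B
```

    The uncorrected percentage depends on which language serves as the reference (400 of 600 versus 400 of 500 here), whereas the estimate of N is symmetric in the two languages.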

  19. Testing the Methods
    Simulations of the Evolutionary Models
    Testing the Models on Real Data

  20. Simulations of the Evolutionary Models
    +++ short description of the programs +++

  21. Simulations of the Evolutionary Models
    Python Program for the Simulation of the Models
    The program starts with one language L.
    The language goes through successive generations of change.
    A generation of change is characterized by a possible split of the language into two descendant languages and a random amount of root loss (Separation Base Method) or root loss and root gain (Etymostatistics).
    The result is a certain number of descendant languages in the last generation of change and a specific distribution of roots among these languages.
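
    The description above can be turned into a short simulation. The following is a minimal sketch under stated assumptions, not the authors' program: the parameter names (`max_loss`, `max_gain`, `split_prob`), their default values and the naming scheme for descendant languages are all hypothetical.

```python
import random


def change(roots, rng, max_loss, max_gain, next_id):
    """Apply one generation of random root loss and (optionally) root gain."""
    n_lost = rng.randint(0, int(len(roots) * max_loss))
    kept = set(rng.sample(sorted(roots), len(roots) - n_lost))
    n_gained = rng.randint(0, int(len(roots) * max_gain))
    gained = set(range(next_id, next_id + n_gained))
    return kept | gained, next_id + n_gained


def simulate(generations=4, n_roots=1000, max_loss=0.2, max_gain=0.0,
             split_prob=0.8, seed=0):
    """Return {language name: set of root ids} after the last generation.

    max_gain=0.0 gives the loss-only model (Separation Base Method);
    max_gain>0 adds innovations (Etymostatistics).
    """
    rng = random.Random(seed)
    languages = {"L": set(range(n_roots))}
    next_id = n_roots  # ids >= n_roots mark roots gained after the start
    for _ in range(generations):
        next_generation = {}
        for name, roots in languages.items():
            if rng.random() < split_prob:
                names = [name + "0", name + "1"]  # split into two daughters
            else:
                names = [name]                    # no split in this generation
            for child in names:
                next_generation[child], next_id = change(
                    roots, rng, max_loss, max_gain, next_id)
        languages = next_generation
    return languages


for name, roots in simulate(generations=3).items():
    print(name, len(roots))
```

    Because the true splitting history is known, the root distributions produced this way can be passed to a reconstruction method and the resulting tree compared with the simulated one.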

  22. Simulations of the Evolutionary Models
    [Figure: simulated tree with the descendant languages L_0000, L_0001, L_0010, L_0011, L_1000, L_1001, L_1010 and L_1011; axis ticks from 200 to 1000.]

  23. Simulations of the Evolutionary Models
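    [Figure: simulated tree; each node is labelled with a language name and a number. Labels as extracted, one line per tree level:]
    L 1000
    L_0 977, L_0 884
    L_00 818, L_10 745
    L_000 665, L_001 682, L_100 714, L_101 567
    L_0000 516, L_0001 521, L_0010 434, L_0011 615, L_1000 330, L_1001 708, L_1001 501, L_1011 387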

  24. Simulations of the Evolutionary Models

  25. Testing the Separation Base Method
    +++ description of the test+++

  26. Testing the Separation Base Method
    +++ graphic/tree +++

  27. Testing the Separation Base Method
    +++ graphic/lexstat/stefenelli+++

  28. Testing the Separation Base Method
    +++ summarize the results +++

  29. Testing Etymostatistics
    +++ description of the test+++

  30. Testing Etymostatistics
    +++ graphic/results+++

  31. Testing Etymostatistics
    +++ summarize the results +++

  32. Conclusion
    Model-Internal Problems
    Models and Reality

  33. Model-Internal Problems
    +++ information loss in the models +++
    +++ more rigorous testing of the appropriate method for reconstruction +++

  34. Models and Reality
    +++ split as the key assumption
    +++ evolution is not always tree-like
    +++ datasets are problematic