Towards refined phylogenies of the Tukanoan languages: A computer-assisted approach

Talk by Thiago Costa Chacon, Tiago Tresoldi, and Johann-Mattis List, held as part of the workshop "Computer-assisted approaches to historical and typological language comparison" at the Annual Meeting of the Societas Linguistica Europaea (2019-08-21/24, Leipzig, University of Leipzig).

Johann-Mattis List

August 21, 2019

Transcript

  1. Towards refined phylogenies of the Tukanoan languages:
    A computer-assisted approach
    Thiago Costa Chacon
    Tiago Tresoldi
    Johann-Mattis List

  2. Objectives
    - Investigate the classification of the Tukanoan language
    family by means of an intensive comparison of
    classical, computational, and computer-assisted
    approaches
    - Show how we are lifting the data by Huber and Reed
    (1992) to a level where a computer-assisted linguistic
    and phylogenetic reconstruction can be carried out
    - Present initial results and challenges for the future

  3. Plan of the Talk
    1. The Tukanoan language family
    2. Issues in the classification of Tukanoan
    3. A computer-assisted study of Tukanoan
    4. State of the art
    5. Outlook

  4. The Tukanoan Language Family
    - A “mid-size” linguistic family of South America
    - About 29 languages (8 extinct)
    - Located in Northwest Amazonia
    - Strong contact relations with Arawakan and other neighboring families

  5. The Tukanoan Language Family
    - Geographical split into Western (WT) and Eastern (ET) languages, already documented in colonial maps from the 17th century
    - The lower number of WT languages is possibly due to earlier and more devastating impacts of colonization

  6. Issues in the Classification of Tukanoan
    Beuchat and Rivet (1911)
    ● Defined the Tukanoan family as a group of languages
    unrelated to any other linguistic family.
    Mason (1950)
    ● Established the major division between WT and ET
    branches, but did not present any criteria other than
    geography for his classification.

  7. Issues in the Classification of Tukanoan
    Waltz and Wheeler (1972)
    ● Kubeo classified in a third branch: Middle Tukanoan
    ● Lexical similarity criteria
    Malone (1986, ms.) and Barnes (1999)
    ● Both Kubeo and Tanimuka-Retuarã classified as
    Middle Tukanoan
    ● Classification based on shared sound innovations

  8. Issues in the Classification of Tukanoan
    [Figures: classifications by Waltz and Wheeler (1972) and by Malone (1986) and Barnes (1999)]

  9. Issues in the Classification of Tukanoan
    Chacon (2014)
    ● WT vs. ET division
    ● Kubeo and Tanimuka are ET
    ● Uses the classical comparative method by
    reconstructing the consonant inventory and proposing
    innovations in consonantal change

  10. Issues in the Classification of Tukanoan
    Classification by Chacon (2014), based on shared innovations in sound-change processes.

  11. Issues in the Classification of Tukanoan
    Chacon and List (2015):
    ● Confirm and refine Chacon’s (2014) reconstruction
    ● Correspondence sets with reconstructed sounds from
    Chacon (2014) were converted into a sound change
    transition network, where
    ○ proto-forms were included
    ○ intermediate unattested states were proposed based on
    a qualitative assessment regarding sound change
    tendencies
    ● The sound change networks were analyzed to find a
    phylogenetic tree, using parsimony based on step matrices
    (directed and weighted transition preferences between
    sounds)
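
The relation between a transition network and a step matrix can be sketched as follows. This is a minimal illustration, not the procedure of Chacon and List (2015): the sounds, edges, and weights are invented for the example, and the function name `step_matrix` is hypothetical. Directed, weighted edges are turned into a full step matrix by taking the cheapest directed path between every pair of sounds (Floyd-Warshall); pairs with no directed path keep an infinite cost.

```python
INF = float("inf")

def step_matrix(sounds, edges):
    """Return cost[a][b], the cheapest directed path from sound a to sound b."""
    cost = {a: {b: (0 if a == b else INF) for b in sounds} for a in sounds}
    for (source, target), weight in edges.items():
        cost[source][target] = min(cost[source][target], weight)
    # Floyd-Warshall: the intermediate sound k must be the outermost loop.
    for k in sounds:
        for a in sounds:
            for b in sounds:
                if cost[a][k] + cost[k][b] < cost[a][b]:
                    cost[a][b] = cost[a][k] + cost[k][b]
    return cost

# Hypothetical network: p lenites to h via an intermediate state, or voices to b.
sounds = ["p", "ɸ", "h", "b"]
edges = {("p", "ɸ"): 1, ("ɸ", "h"): 1, ("p", "b"): 2}
matrix = step_matrix(sounds, edges)

for source in sounds:
    print(source, [matrix[source][target] for target in sounds])
```

Because the edges are directed, the resulting matrix is asymmetric: in this toy network `p` can reach `h`, but `h` cannot return to `p`.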

  12. Issues in the Classification of Tukanoan

  13. Issues in the Classification of Tukanoan
    Chacon and List (2015):
    ● The parsimony-based approach can infer trees which fit the
    data best by finding the most parsimonious trees with
    respect to the qualitative assessment of sound change
    transitions and the reconstructed proto-forms
    ● It also infers which sound transitions occurred in the
    internal nodes if a tree is supplied by the user
    ● Given that similar sound changes may occur on different
    branches of a given tree, the method reflects homoplasy
    (and also shows that not all sound changes are equally
    indicative of subgrouping)
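
Scoring a fixed tree against a step matrix can be sketched with the standard dynamic-programming scheme usually called Sankoff's algorithm. This is a minimal sketch under stated assumptions and not necessarily what the authors ran: the tree, the language labels `A` to `D`, and the costs are invented, and the function name `sankoff` is hypothetical. For each node it computes the cheapest total cost of the subtree given each possible sound at that node.

```python
INF = float("inf")

def sankoff(tree, leaf_states, states, cost):
    """Cheapest subtree cost for every possible state at the top node.

    tree: nested tuples for internal nodes, strings for leaves.
    cost[a][b]: cost of sound a (parent) changing to sound b (child).
    """
    if isinstance(tree, str):
        return {s: (0 if s == leaf_states[tree] else INF) for s in states}
    child_scores = [sankoff(child, leaf_states, states, cost) for child in tree]
    scores = {}
    for parent in states:
        total = 0
        for child in child_scores:
            total += min(cost[parent][s] + child[s] for s in states)
        scores[parent] = total
    return scores

# Hypothetical data: the change p > h is allowed, h > p is not.
states = ["p", "h"]
cost = {"p": {"p": 0, "h": 1}, "h": {"p": INF, "h": 0}}
tree = (("A", "B"), ("C", "D"))
leaf_states = {"A": "p", "B": "h", "C": "h", "D": "h"}
root_scores = sankoff(tree, leaf_states, states, cost)
print(root_scores)
```

On this toy tree the only finite reconstruction puts `p` at the root, with the change to `h` happening twice on separate branches, which is the kind of homoplasy the slide refers to.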

  14. Issues in the Classification of Tukanoan
    Revised classification in Chacon and List (2015), based on parsimony and Chacon’s expert assessment

  15. Issues in the Classification of Tukanoan
    Problems of sound-change-based phylogenies:
    - Phylogenies built on parsimony are merely topological;
    branch lengths (dates) cannot be inferred.
    - Sound changes are homoplastic (Ringe 2002).
    - The workflow is not practical for application to other
    language families, since it requires an extreme degree of
    expert knowledge, which risks being circular
    - Thus, it cannot be seen as an independent test

  16. A Computer-Assisted Study of Tukanoan
    Basic ideas
    1. Allow for an integration of cognates, sound
    correspondences, and proto-forms to enable different
    analyses.
    2. Increase transparency by relying on a new workflow.
    3. Ultimately produce a new semi-automated
    reconstruction of a substantial number of lexical items
    in Tukanoan.
    4. Allow for the convenient computation of different
    phylogenies.

  17. A Computer-Assisted Study of Tukanoan: Integration
    1. LIFT the Tukanoan data in Huber and Reed (1992) to allow
    for computer-assisted treatment by:
    a. LINKING concepts to Concepticon, languages to Glottolog, and
    sounds to CLTS,
    b. INSERTING morpheme boundaries,
    c. FINDING cross-semantic partial cognates and sound correspondence
    patterns automatically, and
    d. TURNING the automated analyses into an expert-based, transparent,
    etymological database.
    2. Convert the data to various formats needed for further
    analysis:
    a. Formats for phylogenetic inference using Bayesian frameworks
    applied to cognate sets.
    b. Formats for phylogenetic inference using step matrices in
    parsimony-based analyses (e.g., with PAUP).
    c. Formats for ancestral state reconstruction assuming a fixed tree.
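
One common conversion for cognate-based phylogenetic inference is a binary presence/absence matrix, with one column per cognate set. The sketch below illustrates that idea only: the row layout, the language labels, and the function name `binary_matrix` are assumptions made for the example, and real tools expect their own input formats (for instance NEXUS). A language gets `1` if it has a reflex of the cognate set, `0` if it expresses the concept with a different cognate set, and `?` if the concept is missing for that language.

```python
def binary_matrix(wordlist):
    """wordlist: list of (language, concept, cognate_id) rows."""
    languages = sorted({language for language, _, _ in wordlist})
    cogsets = sorted({(concept, cogid) for _, concept, cogid in wordlist})
    attested = {(language, concept) for language, concept, _ in wordlist}
    present = set(wordlist)
    matrix = {}
    for language in languages:
        row = []
        for concept, cogid in cogsets:
            if (language, concept, cogid) in present:
                row.append("1")
            elif (language, concept) in attested:
                row.append("0")
            else:
                row.append("?")  # no word for this concept in the data
        matrix[language] = "".join(row)
    return cogsets, matrix

# Hypothetical rows; language names and cognate IDs are placeholders.
rows = [
    ("LangA", "hand", 1),
    ("LangB", "hand", 1),
    ("LangC", "hand", 2),
    ("LangA", "water", 3),
    ("LangB", "water", 3),
]
cogsets, matrix = binary_matrix(rows)
for language, row in matrix.items():
    print(language, row)
```

Keeping missing data as `?` rather than `0` avoids treating a gap in the word list as evidence that a cognate set was lost.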

  18. A Computer-Assisted Study of Tukanoan: Transparency
    Linking to Reference Catalogs and Standardization:
    ● Concepticon
    ● Glottolog
    ● CLTS
    Annotation:
    ● partial, cross-semantic cognate annotation with LingPy and
    EDICTOR
    ● using morpheme-glosses to handle language-internal
    cognates

  19. A Computer-Assisted Study of Tukanoan: Reconstruction
    1. Infer correspondence patterns from aligned,
    cross-semantic, partial cognates using the algorithm by
    List (2019).
    2. Annotate the most frequently recurring
    correspondence patterns manually, by assigning each
    pattern a proto-form, based on the judgment of the
    expert.
    3. Iterate over all alignments in the data, and assign all
    compatible proto-forms to each alignment site for each
    partial cognate set.
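
Step 3 can be sketched as a compatibility check between alignment sites and annotated patterns. This is a simplified illustration, not the algorithm of List (2019) or the authors' code: the patterns, sounds, and language labels are invented, and `compatible` and `assign_protoforms` are hypothetical names. A site is treated as compatible with a pattern when the two agree in every language where both have a value; missing reflexes are written as `None`.

```python
def compatible(site, pattern):
    """True if site and pattern agree wherever both are attested."""
    return all(
        pattern[language] == sound
        for language, sound in site.items()
        if sound is not None and pattern.get(language) is not None
    )

def assign_protoforms(alignment, patterns):
    """Return, for each alignment site, all proto-sounds with a compatible pattern.

    alignment: list of sites, each a dict mapping language to sound or None.
    patterns: list of (proto_sound, {language: sound}) pairs set by the expert.
    """
    return [
        sorted({proto for proto, pattern in patterns if compatible(site, pattern)})
        for site in alignment
    ]

# Hypothetical expert-annotated patterns.
patterns = [
    ("*p", {"LangA": "p", "LangB": "p", "LangC": "p"}),
    ("*pʰ", {"LangA": "p", "LangB": "p", "LangC": "h"}),
    ("*a", {"LangA": "a", "LangB": "a", "LangC": "a"}),
]
# A cognate set with no reflex in LangC: the first site stays ambiguous.
alignment = [
    {"LangA": "p", "LangB": "p", "LangC": None},
    {"LangA": "a", "LangB": "a", "LangC": None},
]
result = assign_protoforms(alignment, patterns)
print(result)
```

Sites that receive more than one proto-sound, like the first one here, are the cases that still need expert review.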

  20. A Computer-Assisted Study of Tukanoan: Phylogenies
    1. Instead of the classical “cognate sets”, we start from
    partial cognates, which are assigned regardless of
    meaning (cross-semantic partial cognates, CROSSIDS in
    our tools).
    2. By annotating for each word which of its parts is
    considered salient, we can identify the roots in each
    word transparently, and convert those into classical
    cognate sets, either still cross-semantic, or on a
    per-concept basis.
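
The conversion from partial to full cognate sets can be sketched as follows. The record layout is an assumption made for illustration: each word carries one cross-semantic ID per morpheme (`crossids`) and the index of the morpheme marked as salient (`salient`). The field names, data, and function name `full_cognates` are hypothetical. Words sharing a salient root get the same cross-semantic cognate ID, and the same ID per concept only when they also share the concept.

```python
def full_cognates(words):
    """Derive full cognate IDs from the salient morpheme of each word.

    Returns tuples (language, concept, cross_semantic_id, per_concept_id).
    """
    cross_semantic, per_concept, converted = {}, {}, []
    for word in words:
        root = word["crossids"][word["salient"]]
        # New IDs are numbered in order of first appearance.
        cross_id = cross_semantic.setdefault(root, len(cross_semantic) + 1)
        concept_id = per_concept.setdefault(
            (word["concept"], root), len(per_concept) + 1
        )
        converted.append((word["language"], word["concept"], cross_id, concept_id))
    return converted

# Hypothetical entries: morpheme 1 recurs in "hand" and "finger".
words = [
    {"language": "LangA", "concept": "hand", "crossids": [1, 2], "salient": 0},
    {"language": "LangB", "concept": "hand", "crossids": [1], "salient": 0},
    {"language": "LangB", "concept": "finger", "crossids": [1, 3], "salient": 0},
    {"language": "LangC", "concept": "hand", "crossids": [4, 2], "salient": 0},
]
converted = full_cognates(words)
for entry in converted:
    print(entry)
```

In this toy data, LangC shares the non-salient morpheme 2 with LangA, but because its salient root differs it ends up in a separate full cognate set.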

  21. State of the Art
    What we present here is still work in progress. For this reason,
    we cannot provide complete results and new findings. In our
    “State-of-the-Art”, we will therefore concentrate on our current
    results, namely:
    ● What we have done so far to allow for the integration of
    etymological data and phylogenetic analyses.
    ● What we have done so far to increase the transparency of
    our computational and qualitative analyses.
    ● How far we are with our reconstruction of Proto-Tukano.
    ● How far we are with respect to phylogenetic reconstruction.

  22. State of the Art: Integration
    ● Data by Huber and Reed (1992) was previously digitized by
    M. Cysouw and has now been linked to Concepticon and
    Glottolog, with phonetic transcriptions converted to
    the broad IPA system proposed by the CLTS framework.
    ● Morpheme boundaries were manually added by T. C.
    Chacon, but they will have to be refined in the future, as
    certain boundaries may have been overlooked in the first
    pass.
    ● Partial cognates, cross-semantic cognates, phonetic
    alignments, and preliminary sound correspondence
    patterns have been inferred with the help of LingPy and
    LingRex.

  23. State of the Art: Transparency
    ● Data is rendered in the EDICTOR tool (server-based version),
    which allows all colleagues to access and modify the data
    quickly.
    ● Apart from morpheme boundaries, we have started to add
    morpheme glosses (Hill and List 2017), both to annotate
    language-internal cognates and word families more
    properly, and to allow for a transparent annotation of each
    word’s root, which is needed to turn partial cognate
    judgments into full cognate judgments, usually needed for
    phylogenetic reconstruction studies.

  25. State of the Art: Reconstruction
    ● A first, semi-automatic reconstruction has now been carried
    out.
    ● The data can be inspected through the EDICTOR tool, but
    many uncertainties remain, and all entries will
    have to be checked by our expert (T. C. Chacon).
    ● Once all proto-forms are assembled, the result may be
    worth publishing in its own right, since scholars have so far
    only reconstructed the consonant inventories
    of Tukano, and nobody has yet tested to what degree
    the sound correspondence patterns really hold.

  26. State of the Art: Phylogenies COGID

  27. State of the Art: Phylogenies CROSSID

  28. State of the Art: Phylogenies
    - The position of Kubeo and Tanimuka is still under
    investigation, but the new phylogenies support Chacon
    (2014) and Chacon and List (2015)
    - Issues
      - Higher language death in the Western branch
      - Sampling of languages/dialects
      - Intra- and extra-family contact
      - Clean and properly analyzed data

  29. Outlook
    Future plans:
    ● Enhance the data by refining the current cognates,
    language-internal cognates, and proto-forms.
    ● Expand the data by adding extinct Tukanoan
    languages to allow for a calibration in the phylogenetic
    analyses.
    ● Extend the phylogenetic analyses carried out so far, by
    testing correspondence-pattern based phylogenies and
    improving the Bayesian analyses.

  30. Thanks to:
    ● Luke Maurits for helping with the BEASTling analyses.
    ● Michael Cysouw for digitizing the data by Huber and
    Reed (1992).
    Thank you for listening!
