Towards refined phylogenies of the Tukanoan languages: A computer-assisted approach


Talk (by Thiago Costa Chacon, Tiago Tresoldi, and Johann-Mattis List), held as part of the Workshop "Computer-assisted approaches to historical and typological language comparison", organized as part of the Annual Meeting of the Societas Linguistica Europea (2019-08-21/24, Leipzig, University of Leipzig).


Johann-Mattis List

August 21, 2019

Transcript

  1. Towards refined phylogenies of the Tukanoan languages: A computer-assisted approach

    Thiago Costa Chacon, Tiago Tresoldi, Johann-Mattis List
  2. Objectives
    - Investigate the classification of the Tukanoan language family by means of an intensive comparison of classical, computational, and computer-assisted approaches
    - Show how we are lifting the data by Huber and Reed (1992) to a level where a computer-assisted linguistic and phylogenetic reconstruction can be carried out
    - Present initial results and challenges for the future
  3. Plan of the Talk
    1. The Tukanoan language family
    2. Issues in the classification of Tukanoan
    3. A computer-assisted study of Tukanoan
    4. State of the art
    5. Outlook
  4. The Tukanoan Language Family
    - A “mid-size” linguistic family of South America
    - About 29 languages (8 extinct)
    - Located in Northwest Amazonia
    - Strong contact relations with Arawakan and other neighboring families
  5. The Tukanoan Language Family
    - Geographical split into Western (WT) and Eastern (ET) languages, already documented in colonial maps from the 17th century
    - The lower number of WT languages is possibly due to earlier and more devastating impacts of colonization
  6. Issues in the Classification of Tukanoan
    Beuchat and Rivet (1911)
    • Defined the Tukanoan family as a group of languages unrelated to any other linguistic family.
    Mason (1950)
    • Established the major division between WT and ET branches, but did not present any criteria other than geography for his classification.
  7. Issues in the Classification of Tukanoan
    Waltz and Wheeler (1972)
    • Kubeo classified in a third branch: Middle Tukanoan
    • Lexical similarity criteria
    Malone (1986, ms.) and Barnes (1999)
    • Both Kubeo and Tanimuka-Retuarã classified as Middle Tukanoan
    • Classification based on shared sound innovations
  8. Issues in the Classification of Tukanoan
    [Figure: classifications according to Waltz and Wheeler (1972) and to Malone (1986) and Barnes (1999)]
  9. Issues in the Classification of Tukanoan
    Chacon (2014)
    • WT vs. ET division
    • Kubeo and Tanimuka are ET
    • Uses the classical comparative method by reconstructing the consonant inventory and proposing innovations in consonantal change
  10. Issues in the Classification of Tukanoan
    [Figure: classification by Chacon (2014), based on shared innovations in sound-change processes]
  11. Issues in the Classification of Tukanoan
    Chacon and List (2015):
    • Confirm and refine Chacon’s (2014) reconstruction
    • Correspondence sets with reconstructed sounds from Chacon (2014) were converted into a sound change transition network, in which
      ◦ proto-forms were included
      ◦ intermediate unattested states were proposed based on a qualitative assessment of sound change tendencies
    • The sound change networks were analyzed to find a phylogenetic tree, using parsimony based on step matrices (directed and weighted transition preferences between sounds)
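The step matrices mentioned above encode the cost of changing one sound into another. One plausible way to obtain them from a directed, weighted transition network is to take, for every pair of sounds, the cheapest path through the network, so that intermediate unattested states add to the cost of multi-step changes. The sketch below assumes this shortest-path reading; the function name `step_matrix`, the sounds, and the weights are hypothetical illustrations, not the network or procedure of Chacon and List (2015).

```python
import math
from itertools import product


def step_matrix(states, edges):
    """Derive a step matrix from a directed, weighted transition network.

    states: list of sounds (including hypothetical intermediate states)
    edges: dict mapping (source, target) to the cost of one direct change
    Returns a dict mapping (source, target) to the cheapest total cost;
    impossible changes get math.inf, and staying the same costs 0.
    """
    cost = {(a, b): (0.0 if a == b else math.inf)
            for a, b in product(states, repeat=2)}
    for (a, b), weight in edges.items():
        cost[a, b] = min(cost[a, b], float(weight))
    # Floyd-Warshall: allow paths through any intermediate state k.
    for k in states:
        for i in states:
            for j in states:
                if cost[i, k] + cost[k, j] < cost[i, j]:
                    cost[i, j] = cost[i, k] + cost[k, j]
    return cost


# Hypothetical lenition chain p > ɸ > h > loss ("-"), one step each.
sounds = ["p", "ɸ", "h", "-"]
lenition = {("p", "ɸ"): 1, ("ɸ", "h"): 1, ("h", "-"): 1}
matrix = step_matrix(sounds, lenition)
print(matrix["p", "-"], matrix["-", "p"])  # 3.0 inf
```

Directionality is preserved: the reverse change from loss back to p has infinite cost, which is what makes such a matrix directed rather than symmetric.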
  12. Issues in the Classification of Tukanoan

  13. Issues in the Classification of Tukanoan
    Chacon and List (2015):
    • The parsimony-based approach can infer the trees that fit the data best, by finding the most parsimonious trees with respect to the qualitative assessment of sound change transitions and the reconstructed proto-forms
    • If the user supplies a tree, it also infers which sound transitions occurred at the internal nodes
    • Given that similar sound changes may occur on different branches of a given tree, the method reflects homoplasy (and also shows that not all sound changes are equally indicative for subgrouping)
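To illustrate how a parsimony analysis with a step matrix scores a given tree and infers the states at internal nodes, here is a minimal sketch of weighted (Sankoff) parsimony for a single alignment site on a fixed, rooted tree. The tree, the languages `L1` to `L4`, the observed sounds, and the directed cost scheme are hypothetical; this illustrates the technique and is not a reimplementation of the original analysis.

```python
import math


def sankoff(tree, observed, states, cost):
    """Weighted (Sankoff) parsimony for one site on a fixed, rooted tree.

    tree: nested tuples of leaf names, e.g. (("A", "B"), "C")
    observed: leaf name -> observed state
    cost: (ancestor_state, descendant_state) -> cost of that change
    Returns (score, assignment), where assignment maps every node to a state.
    """
    tables = {}

    def down(node):
        # Bottom-up pass: minimal cost of the subtree for each state at node.
        if isinstance(node, str):
            table = {s: (0.0 if s == observed[node] else math.inf) for s in states}
        else:
            table = {s: 0.0 for s in states}
            for child in node:
                child_table = down(child)
                for s in states:
                    table[s] += min(cost[s, t] + child_table[t] for t in states)
        tables[node] = table
        return table

    root_table = down(tree)
    root_state = min(states, key=lambda s: root_table[s])
    assignment = {}

    def up(node, state):
        # Top-down pass: pick the cheapest child state given the parent state.
        assignment[node] = state
        if not isinstance(node, str):
            for child in node:
                best = min(states, key=lambda t: cost[state, t] + tables[child][t])
                up(child, best)

    up(tree, root_state)
    return root_table[root_state], assignment


# Hypothetical directed scheme: changes only move rightwards in this list.
states = ["p", "ɸ", "h", "-"]
cost = {(a, b): (j - i if j >= i else math.inf)
        for i, a in enumerate(states) for j, b in enumerate(states)}
observed = {"L1": "p", "L2": "ɸ", "L3": "h", "L4": "h"}

tree = (("L1", "L2"), ("L3", "L4"))
score, states_at = sankoff(tree, observed, states, cost)
print(score, states_at[("L3", "L4")])  # 3.0 h
```

Scoring several candidate topologies this way and keeping the cheapest is the basic idea behind the search for the most parsimonious tree: here the alternative grouping ((L1, L3), (L2, L4)) costs 4 steps instead of 3. Real analyses sum such scores over many sites and rely on dedicated software for the tree search.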
  14. Issues in the Classification of Tukanoan
    [Figure: revised classification in Chacon and List (2015), based on parsimony and Chacon’s expert assessment]
  15. Issues in the Classification of Tukanoan
    Problems of sound-change-based phylogenies:
    - Phylogenies built on parsimony are merely topological; branch lengths (dates) cannot be inferred.
    - Sound changes are homoplastic (Ringe 2002).
    - The workflow is not practical for application to other language families, since it requires an extreme degree of expert knowledge, which risks circularity.
    - Thus, it cannot be seen as an independent test.
  16. A Computer-Assisted Study of Tukanoan
    Basic ideas:
    1. Allow for an integration of cognates, sound correspondences, and proto-forms to enable different analyses.
    2. Increase transparency by relying on a new workflow.
    3. Ultimately produce a new semi-automated reconstruction of a substantial number of lexical items in Tukanoan.
    4. Allow for the convenient computation of different phylogenies.
  17. A C.-A. Study of Tukanoan: Integration
    1. LIFT the Tukanoan data in Huber and Reed (1992) to allow for computer-assisted treatment by:
       a. LINKING concepts to Concepticon, languages to Glottolog, and sounds to CLTS,
       b. INSERTING morpheme boundaries,
       c. FINDING cross-semantic partial cognates and sound correspondence patterns automatically, and
       d. TURNING the automated analyses into an expert-based, transparent, etymological database.
    2. Convert the data to various formats needed for further analysis:
       a. Formats for phylogenetic inference using Bayesian frameworks applied to cognate sets.
       b. Formats for phylogenetic inference using step matrices in parsimony-based analyses (e.g., with PAUP).
       c. Formats for ancestral state reconstruction assuming a fixed tree.
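For the Bayesian formats in step 2a, a common choice is to code each cognate set as a binary character (presence or absence per language). The following sketch writes such a matrix as a NEXUS data block; the row layout, the function name `to_nexus`, the hypothetical languages and cognate IDs, and the convention of coding unattested concepts as missing (`?`) are assumptions made for illustration. Language names are assumed not to contain spaces, which NEXUS would otherwise require quoting.

```python
from collections import defaultdict


def to_nexus(rows):
    """Encode (language, concept, cogid) rows as a binary NEXUS matrix.

    Each (concept, cogid) pair becomes one character: 1 if the language has
    a word in that cognate set, 0 if it has words for the concept but none
    in that set, and ? if the concept is not attested for the language.
    """
    languages = sorted({lang for lang, _, _ in rows})
    characters = sorted({(concept, cogid) for _, concept, cogid in rows})
    attested = defaultdict(set)   # language -> concepts with data
    present = set()               # (language, concept, cogid) triples
    for lang, concept, cogid in rows:
        attested[lang].add(concept)
        present.add((lang, concept, cogid))

    lines = ["#NEXUS", "BEGIN DATA;",
             f"  DIMENSIONS NTAX={len(languages)} NCHAR={len(characters)};",
             '  FORMAT DATATYPE=STANDARD SYMBOLS="01" MISSING=?;',
             "  MATRIX"]
    for lang in languages:
        cells = []
        for concept, cogid in characters:
            if concept not in attested[lang]:
                cells.append("?")
            elif (lang, concept, cogid) in present:
                cells.append("1")
            else:
                cells.append("0")
        lines.append(f"    {lang} {''.join(cells)}")
    lines += ["  ;", "END;"]
    return "\n".join(lines)


# Hypothetical data: LangB has no word for "fire".
rows = [("LangA", "water", 1), ("LangB", "water", 1), ("LangC", "water", 2),
        ("LangA", "fire", 3), ("LangC", "fire", 3)]
nexus = to_nexus(rows)
print(nexus)
```

Coding missing concepts as `?` rather than `0` avoids treating gaps in the documentation as evidence for the loss of a cognate.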
  18. A C.-A. Study of Tukanoan: Transparency
    Linking to Reference Catalogs and Standardization:
    • Concepticon
    • Glottolog
    • CLTS
    Annotation:
    • Partial, cross-semantic cognate annotation with LingPy and EDICTOR
    • Using morpheme glosses to handle language-internal cognates
  19. A C.-A. Study of Tukanoan: Reconstruction
    1. Infer correspondence patterns from aligned, cross-semantic, partial cognates using the algorithm by List (2019).
    2. Annotate the most frequently recurring correspondence patterns manually, by assigning each pattern a proto-form, based on the judgment of the expert.
    3. Iterate over all alignments in the data, and assign all compatible proto-forms to each alignment site for each partial cognate set.
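The notion of compatibility behind steps 1 and 3 can be illustrated with a small sketch. Alignment sites are written as tuples with one sound per language, and `Ø` marks a language without a reflex in the cognate set; two sites are treated as compatible if they never disagree where both have data. The grouping function is a deliberately simplified greedy procedure, not the algorithm by List (2019), and all function names, sites, patterns, and proto-forms are hypothetical.

```python
MISSING = "Ø"  # assumption: marks languages without a reflex in a cognate set


def compatible(site, pattern):
    """Compatible if the two never disagree where both have data."""
    return all(a == b or MISSING in (a, b) for a, b in zip(site, pattern))


def merge(a, b):
    """Fill missing slots of a with the values of b."""
    return tuple(y if x == MISSING else x for x, y in zip(a, b))


def group_sites(sites):
    """Greedy simplification of step 1: put each site into the first
    compatible pattern, processing sites with fewer gaps first."""
    patterns = []  # list of [consensus, members]
    for site in sorted(sites, key=lambda s: s.count(MISSING)):
        for entry in patterns:
            if compatible(site, entry[0]):
                entry[0] = merge(entry[0], site)
                entry[1].append(site)
                break
        else:
            patterns.append([site, [site]])
    return patterns


def assign_protoforms(sites, annotated):
    """Step 3: list every proto-form whose annotated pattern fits the site."""
    return [[proto for pattern, proto in annotated.items()
             if compatible(site, pattern)] for site in sites]


# Hypothetical sites for three languages, and expert-annotated patterns.
sites = [("t", "t", "d"), ("t", "Ø", "d"), ("t", "t", "Ø"),
         ("k", "k", "g"), ("t", "t", "t")]
annotated = {("t", "t", "t"): "*t", ("t", "t", "d"): "*d", ("k", "k", "g"): "*k"}

print(len(group_sites(sites)))             # 3
print(assign_protoforms(sites, annotated))
```

The site `t t Ø` is compatible with two annotated patterns, so step 3 assigns it both *t and *d, whereas the greedy grouping simply puts it into the first pattern it meets; keeping all candidates leaves such ambiguities visible to the expert.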
  20. A C.-A. Study of Tukanoan: Phylogenies
    1. Instead of the classical “cognate sets”, we start from partial cognates, which are assigned regardless of meaning (cross-semantic partial cognates, CROSSIDS in our tools).
    2. By annotating for each word which of its parts is considered salient, we can identify the roots in each word transparently, and convert those into classical cognate sets, either still cross-semantic, or on a per-concept basis.
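The conversion from cross-semantic partial cognates to classical cognate sets can be sketched as follows, assuming each word carries one partial cognate ID per morpheme and the index of the morpheme annotated as its root. The function name `root_cognates`, the field names (`crossids`, `root`), and the sample words are hypothetical and do not reflect the column layout of the actual data.

```python
def root_cognates(words, per_concept=True):
    """Turn partial cognates into one cognate ID per word.

    words: list of dicts with keys 'concept', 'crossids' (one ID per
    morpheme) and 'root' (index of the morpheme annotated as salient).
    With per_concept=True, words only share a set if they share both the
    root ID and the concept; otherwise sets remain cross-semantic.
    Returns a list of new cognate IDs, aligned with `words`.
    """
    ids = {}
    result = []
    for word in words:
        root_id = word["crossids"][word["root"]]
        key = (word["concept"], root_id) if per_concept else root_id
        if key not in ids:
            ids[key] = len(ids) + 1  # number new sets in order of appearance
        result.append(ids[key])
    return result


# Hypothetical words: partial cognate 12 recurs across concepts.
words = [
    {"language": "LangA", "concept": "water", "crossids": [12, 40], "root": 0},
    {"language": "LangB", "concept": "water", "crossids": [12], "root": 0},
    {"language": "LangC", "concept": "river", "crossids": [12, 7], "root": 0},
    {"language": "LangC", "concept": "water", "crossids": [3, 12], "root": 1},
]
print(root_cognates(words))                     # [1, 1, 2, 1]
print(root_cognates(words, per_concept=False))  # [1, 1, 1, 1]
```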
  21. State of the Art
    What we present here is still work in progress. For this reason, we cannot provide complete results and new findings. In our “State-of-the-Art”, we will therefore concentrate on our current results, namely:
    • What we have done so far to allow for the integration of etymological data and phylogenetic analyses.
    • What we have done so far to increase the transparency of our computational and qualitative analyses.
    • How far we are with our reconstruction of Proto-Tukano.
    • How far we are with respect to phylogenetic reconstruction.
  22. State of the Art: Integration
    • The data by Huber and Reed (1992) was previously digitized by M. Cysouw and has now been linked to Concepticon and Glottolog, and its phonetic transcriptions have been converted to the Broad IPA system proposed by the CLTS framework.
    • Morpheme boundaries were manually added by T. C. Chacon, but they will have to be refined in the future, as certain boundaries may have been overlooked in the first pass.
    • Partial cognates, cross-semantic cognates, phonetic alignments, and preliminary sound correspondence patterns have been inferred with the help of LingPy and LingRex.
  23. State of the Art: Transparency
    • The data is rendered in the EDICTOR tool (server-based version), which allows all colleagues to access and modify the data quickly.
    • Apart from morpheme boundaries, we have started to add morpheme glosses (Hill and List 2017), both to annotate language-internal cognates and word families more properly, and to allow for a transparent annotation of each word’s root, which is needed to turn partial cognate judgments into the full cognate judgments usually required for phylogenetic reconstruction studies.
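Morpheme glosses make language-internal cognates easy to retrieve: morphemes that share a gloss within one language are treated as the same morpheme. The sketch below assumes words are given as (language, concept, glosses) with one gloss per morpheme; the function name `internal_cognates`, the data, and the grouping rule are hypothetical simplifications of the annotation described by Hill and List (2017).

```python
from collections import defaultdict


def internal_cognates(words):
    """Group morphemes that carry the same gloss within one language.

    words: iterable of (language, concept, glosses), one gloss per morpheme.
    Returns {(language, gloss): [(concept, position), ...]} for every gloss
    that occurs more than once in the same language.
    """
    occurrences = defaultdict(list)
    for language, concept, glosses in words:
        for position, gloss in enumerate(glosses):
            occurrences[language, gloss].append((concept, position))
    return {key: occ for key, occ in occurrences.items() if len(occ) > 1}


# Hypothetical glossed words (glosses invented for illustration).
words = [
    ("LangA", "water", ["water"]),
    ("LangA", "river", ["water", "big"]),
    ("LangA", "sea", ["water", "big", "CLF"]),
    ("LangB", "water", ["water"]),
]
families = internal_cognates(words)
for (language, gloss), members in families.items():
    print(language, gloss, members)
```

Grouping by language and gloss keeps the annotation language-internal; cross-language cognacy is still handled separately through the partial cognate IDs.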
  25. State of the Art: Reconstruction
    • A first, semi-automatic reconstruction has now been carried out.
    • The data can be inspected through the EDICTOR tool, but many uncertainties remain, and all entries will have to be checked by our expert (T. C. Chacon).
    • Once all proto-forms are assembled, the reconstruction may be worth publishing in its own right, since scholars have so far only reconstructed the consonant inventory of Proto-Tukano, and nobody has yet tested to what degree the sound correspondence patterns really hold.
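One straightforward way to test to what degree the correspondence patterns hold is to count, after proto-forms have been assigned, how many alignment sites received exactly one proto-form, several, or none at all. The sketch assumes a list of candidate proto-forms per site as input; the function name `reconstruction_summary` and the sample values are hypothetical.

```python
from collections import Counter


def reconstruction_summary(candidates):
    """Classify alignment sites by the number of compatible proto-forms.

    candidates: list with one list of proto-forms per alignment site.
    Returns a Counter with 'unique' (1), 'ambiguous' (>1) and
    'unexplained' (0) sites.
    """
    labels = Counter()
    for protos in candidates:
        if not protos:
            labels["unexplained"] += 1
        elif len(protos) == 1:
            labels["unique"] += 1
        else:
            labels["ambiguous"] += 1
    return labels


# Hypothetical output of a proto-form assignment step.
candidates = [["*d"], ["*d"], ["*t", "*d"], ["*k"], [], ["*t"]]
summary = reconstruction_summary(candidates)
print(summary["unique"], summary["ambiguous"], summary["unexplained"])  # 4 1 1
```

Unexplained sites point to correspondence patterns the expert has not yet annotated (or to alignment errors), while ambiguous sites mark where the expert still has to decide between competing proto-forms.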
  26. State of the Art: Phylogenies COGID

  27. State of the Art: Phylogenies CROSSID

  28. State of the Art: Phylogenies
    - The position of Kubeo and Tanimuka is still under investigation, but the new phylogenies support Chacon (2014, 2015)
    - Issues
      - Higher rate of language death in the Western branch
      - Sampling of languages/dialects
      - Intra- and extra-family contact
      - Clean and properly analyzed data
  29. Outlook
    Future plans:
    • Enhance the data by refining the current cognates, language-internal cognates, and proto-forms.
    • Expand the data by adding extinct Tukanoan languages to allow for a calibration in the phylogenetic analyses.
    • Extend the phylogenetic analyses carried out so far, by testing correspondence-pattern-based phylogenies and improving the Bayesian analyses.
  30. Thanks to:
    • Luke Maurits for helping with the BEASTling analyses.
    • Michael Cysouw for digitizing the data by Huber and Reed (1992).
    Thank you for listening!