Towards refined phylogenies of the Tukanoan languages: A computer-assisted approach

Thiago Costa Chacon, Tiago Tresoldi, Johann-Mattis List

Objectives
- Investigate the classification of the Tukanoan language family by means of an intensive comparison of classical, computational, and computer-assisted approaches.
- Show how we are lifting the data by Huber and Reed (1992) to a level where computer-assisted linguistic and phylogenetic reconstruction can be carried out.
- Present initial results and challenges for the future.

Plan of the Talk
1. The Tukanoan language family
2. Issues in the classification of Tukanoan
3. A computer-assisted study of Tukanoan
4. State of the art
5. Outlook

The Tukanoan Language Family
- A “mid-size” linguistic family of South America
- About 29 languages (8 extinct)
- Located in Northwest Amazonia
- Strong contact relations with Arawakan and other neighboring families

The Tukanoan Language Family
- Geographical split into Western (WT) and Eastern (ET) languages, already documented in colonial maps from the 17th century
- The lower number of WT languages is possibly due to earlier and more devastating impacts of colonization

Issues in the Classification of Tukanoan
Beuchat and Rivet (1911)
• Defined the Tukanoan family as a group of languages unrelated to any other linguistic family.
Mason (1950)
• Established the major division between the WT and ET branches, but did not present any criteria other than geography for his classification.

Issues in the Classification of Tukanoan
Waltz and Wheeler (1972)
• Kubeo classified in a third branch: Middle Tukanoan
• Lexical similarity criteria
Malone (1986, ms.) and Barnes (1999)
• Both Kubeo and Tanimuka-Retuarã classified as Middle Tukanoan
• Classification based on shared sound innovations

Issues in the Classification of Tukanoan
[Figure: classifications by Waltz and Wheeler (1972) and by Malone (1986) and Barnes (1999)]

Issues in the Classification of Tukanoan
Chacon (2014)
• WT vs. ET division
• Kubeo and Tanimuka are ET
• Uses the classical comparative method by reconstructing the consonant inventory and proposing innovations in consonantal change

Issues in the Classification of Tukanoan
[Figure: classification by Chacon (2014), based on shared innovations in sound-change processes]

Issues in the Classification of Tukanoan
Chacon and List (2015):
• Confirm and refine Chacon’s (2014) reconstruction
• Correspondence sets with reconstructed sounds from Chacon (2014) were converted into a sound change transition network, where
  ◦ proto-forms were included
  ◦ intermediate unattested states were proposed based on a qualitative assessment of sound change tendencies
• The sound change networks were analyzed to find a phylogenetic tree, using parsimony based on step matrices (directed and weighted transition preferences between sounds)
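The idea of parsimony over a step matrix can be illustrated with a minimal sketch of Sankoff-style weighted parsimony. Everything in it is an assumption made for illustration: the three sounds, the transition costs, the four languages A to D, and the two candidate trees are hypothetical and are not taken from Chacon and List (2015).

```python
from math import inf

# Hypothetical step matrix: COST[a][b] is the cost of the change a > b.
# Lenition (p > ɸ > h) is cheap, the reverse direction is expensive,
# so the weights are directed as well as graded.
SOUNDS = ["p", "ɸ", "h"]
COST = {
    "p": {"p": 0, "ɸ": 1, "h": 2},
    "ɸ": {"p": 3, "ɸ": 0, "h": 1},
    "h": {"p": 4, "ɸ": 3, "h": 0},
}

# Hypothetical reflexes of one proto-sound in four languages.
LEAVES = {"A": "p", "B": "ɸ", "C": "ɸ", "D": "h"}

# Two candidate trees as nested tuples; strings are leaves.
TREE_1 = ("A", (("B", "C"), "D"))
TREE_2 = (("A", "B"), ("C", "D"))


def sankoff(tree, leaves):
    """Return {state: minimal cost of this subtree if its root has that state}."""
    if isinstance(tree, str):
        # A leaf is only compatible with its attested sound.
        return {s: (0 if s == leaves[tree] else inf) for s in SOUNDS}
    totals = {s: 0 for s in SOUNDS}
    for child in tree:
        child_costs = sankoff(child, leaves)
        for s in SOUNDS:
            # Cheapest way to get from state s to any state t of the child.
            totals[s] += min(COST[s][t] + child_costs[t] for t in SOUNDS)
    return totals


def parsimony_score(tree, leaves, root_state=None):
    """Score a tree; fix the root to a reconstructed proto-sound if one is given."""
    costs = sankoff(tree, leaves)
    return costs[root_state] if root_state else min(costs.values())


score_1 = parsimony_score(TREE_1, LEAVES, root_state="p")
score_2 = parsimony_score(TREE_2, LEAVES, root_state="p")
print(score_1, score_2)
```

With the proto-sound fixed to *p at the root, the first tree needs fewer weighted steps than the second, so it would be preferred for this single character. A real analysis sums such scores over all correspondence sets and searches over tree topologies.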

Issues in the Classification of Tukanoan

Issues in the Classification of Tukanoan
Chacon and List (2015):
• The parsimony-based approach can infer the trees which fit the data best by finding the most parsimonious trees with respect to the qualitative assessment of sound change transitions and the reconstructed proto-forms
• It also infers which sound transitions occurred at the internal nodes if a tree is supplied by the user
• Given that similar sound changes may occur on different branches of a given tree, the method reflects homoplasy (and also shows that not all sound changes are equally indicative for subgrouping)
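How homoplasy shows up once the internal nodes have been assigned sounds can be sketched as follows. The tree, the node labels, and the sounds are hypothetical, and the sketch assumes that the internal states have already been inferred (for instance by a parsimony procedure); it only counts how often the same transition recurs on separate branches.

```python
from collections import Counter

# Hypothetical tree given as child -> parent, with one sound per node.
# "root", "X" and "Y" are internal nodes; A-D are attested languages.
PARENT = {"X": "root", "Y": "root", "A": "X", "B": "X", "C": "Y", "D": "Y"}
STATE = {"root": "p", "X": "p", "Y": "p", "A": "p", "B": "h", "C": "h", "D": "p"}


def branch_changes(parent, state):
    """Return (source, target, branch) for every branch on which the sound changes."""
    return [
        (state[p], state[c], f"{p}>{c}")
        for c, p in parent.items()
        if state[p] != state[c]
    ]


def homoplastic(changes):
    """Return the transitions that occur independently on more than one branch."""
    counts = Counter((source, target) for source, target, _ in changes)
    return {change: n for change, n in counts.items() if n > 1}


changes = branch_changes(PARENT, STATE)
recurrent = homoplastic(changes)
print(changes)
print(recurrent)
```

Here the change p > h happens once under node X and once under node Y, so it cannot be used as evidence that B and C form a subgroup.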

Issues in the Classification of Tukanoan
[Figure: revised classification in Chacon and List (2015), based on parsimony and Chacon’s expert assessment]

Issues in the Classification of Tukanoan
Problems of sound-change-based phylogenies:
- Phylogenies built on parsimony are merely topological: branch lengths (dates) cannot be inferred.
- Sound changes are homoplastic (Ringe 2002).
- The workflow is not practical for application to other language families, since it requires an extreme degree of expert knowledge, which risks making the argument circular.
- Thus, it cannot be seen as an independent test.

A Computer-Assisted Study of Tukanoan
Basic ideas
1. Allow for an integration of cognates, sound correspondences, and proto-forms to enable different analyses.
2. Increase transparency by relying on a new workflow.
3. Ultimately produce a new semi-automated reconstruction of a substantial number of lexical items in Tukanoan.
4. Allow for the convenient computation of different phylogenies.

A C.-A. Study of Tukanoan: Integration
1. LIFT the Tukanoan data in Huber and Reed (1992) to allow for computer-assisted treatment by:
   a. LINKING concepts to Concepticon, languages to Glottolog, and sounds to CLTS,
   b. INSERTING morpheme boundaries,
   c. FINDING cross-semantic partial cognates and sound correspondence patterns automatically, and
   d. TURNING the automated analyses into an expert-based, transparent, etymological database.
2. Convert the data to various formats needed for further analysis:
   a. Formats for phylogenetic inference using Bayesian frameworks applied to cognate sets.
   b. Formats for phylogenetic inference using step matrices in parsimony-based analyses (e.g., with PAUP).
   c. Formats for ancestral state reconstruction assuming a fixed tree.
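As a rough illustration of the format conversion in step 2a, the sketch below turns cognate judgments into a binary presence/absence matrix and writes it as a NEXUS data block, the kind of input commonly used for Bayesian phylogenetic inference on cognate sets. The language names, concepts, and cognate identifiers are invented, and the function names are hypothetical; this is not the conversion code used in the project.

```python
# Hypothetical cognate judgments: (language, concept) -> cognate set id.
COGNATES = {
    ("LangA", "hand"): 1,
    ("LangB", "hand"): 2,
    ("LangC", "hand"): 1,
    ("LangA", "water"): 3,
    ("LangC", "water"): 3,
}


def binary_matrix(cognates):
    """Code each (concept, cognate set) pair as one binary character per language."""
    languages = sorted({lang for lang, _ in cognates})
    columns = sorted({(concept, cogid) for (_, concept), cogid in cognates.items()})
    rows = {}
    for lang in languages:
        row = []
        for concept, cogid in columns:
            if (lang, concept) not in cognates:
                row.append("?")  # no word recorded for this concept
            else:
                row.append("1" if cognates[(lang, concept)] == cogid else "0")
        rows[lang] = "".join(row)
    return rows


def to_nexus(rows):
    """Render the matrix as a NEXUS DATA block."""
    nchar = len(next(iter(rows.values())))
    width = max(map(len, rows))
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"DIMENSIONS NTAX={len(rows)} NCHAR={nchar};",
        'FORMAT DATATYPE=STANDARD SYMBOLS="01" MISSING=? GAP=-;',
        "MATRIX",
    ]
    lines += [f"{lang.ljust(width)} {row}" for lang, row in rows.items()]
    lines += [";", "END;"]
    return "\n".join(lines)


rows = binary_matrix(COGNATES)
nexus = to_nexus(rows)
print(nexus)
```

Coding a missing word as "?" rather than "0" keeps gaps in the word list from being counted as evidence that a cognate set is absent.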

A C.-A. Study of Tukanoan: Transparency
Linking to Reference Catalogs and Standardization:
• Concepticon
• Glottolog
• CLTS
Annotation:
• partial, cross-semantic cognate annotation with LingPy and EDICTOR
• using morpheme glosses to handle language-internal cognates

A C.-A. Study of Tukanoan: Reconstruction
1. Infer correspondence patterns from aligned, cross-semantic, partial cognates using the algorithm by List (2019).
2. Annotate the most frequently recurring correspondence patterns manually, by assigning each pattern a proto-form, based on the judgment of the expert.
3. Iterate over all alignments in the data, and assign all compatible proto-forms to each alignment site for each partial cognate set.
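Step 3 can be sketched in a few lines. The sketch assumes a simple notion of compatibility: an alignment site fits an annotated pattern if the two never disagree in a language where both have data and they share at least one language with data. The patterns, the proto-sounds, and the four-language setup are hypothetical; the actual pattern inference is done with the algorithm by List (2019) and is not reproduced here.

```python
MISSING = "Ø"  # placeholder for "no reflex attested in this language"

# Hypothetical expert-annotated patterns: proto-sound -> reflexes in four languages.
PATTERNS = {
    "*p": ("p", "h", "h", "p"),
    "*pʰ": ("p", "p", "h", "p"),
    "*t": ("t", "t", "t", "t"),
}


def compatible(site, pattern):
    """True if site and pattern agree wherever both have data, and overlap at least once."""
    shared = [
        i for i, (a, b) in enumerate(zip(site, pattern)) if MISSING not in (a, b)
    ]
    return bool(shared) and all(site[i] == pattern[i] for i in shared)


def assign_protoforms(site, annotated_patterns):
    """Return every proto-sound whose pattern is compatible with the alignment site."""
    return sorted(
        proto
        for proto, pattern in annotated_patterns.items()
        if compatible(site, pattern)
    )


unambiguous = assign_protoforms(("p", "h", MISSING, MISSING), PATTERNS)
ambiguous = assign_protoforms(("p", MISSING, "h", MISSING), PATTERNS)
empty = assign_protoforms((MISSING,) * 4, PATTERNS)
print(unambiguous, ambiguous, empty)
```

Sites with sparse data can be compatible with several proto-sounds at once, which is why all candidates are kept and left for the expert to resolve.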

A C.-A. Study of Tukanoan: Phylogenies
1. Instead of the classical “cognate sets”, we start from partial cognates, which are assigned regardless of meaning (cross-semantic partial cognates, CROSSIDS in our tools).
2. By annotating for each word which of its parts is considered salient, we can identify the root of each word transparently, and convert the partial cognates into classical cognate sets, either still cross-semantic or on a per-concept basis.
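The conversion from partial to full cognate sets can be sketched as below. The sketch assumes that each word carries one cross-semantic partial cognate id and one morpheme gloss per morpheme, and that non-salient morphemes are marked by a gloss starting with an underscore; that marking convention, the example words, and the function names are assumptions made for illustration.

```python
# Hypothetical word list: word id -> (concept, partial cognate ids, morpheme glosses).
WORDS = {
    1: ("hand", [10, 20], ["hand", "_classifier"]),
    2: ("hand", [10], ["hand"]),
    3: ("finger", [10, 30], ["hand", "_small"]),
    4: ("finger", [40], ["finger"]),
}


def salient_ids(crossids, glosses):
    """Keep the partial cognate ids of morphemes not marked as non-salient."""
    return tuple(
        cid for cid, gloss in zip(crossids, glosses) if not gloss.startswith("_")
    )


def full_cognates(words, per_concept=False):
    """Give words with the same salient morphemes the same full cognate id."""
    ids, result = {}, {}
    for word_id, (concept, crossids, glosses) in words.items():
        key = salient_ids(crossids, glosses)
        if per_concept:
            # Only words expressing the same concept may share a cognate set.
            key = (concept, key)
        result[word_id] = ids.setdefault(key, len(ids) + 1)
    return result


cross_semantic = full_cognates(WORDS)
by_concept = full_cognates(WORDS, per_concept=True)
print(cross_semantic)
print(by_concept)
```

In the cross-semantic reading, the words for "hand" and the compound for "finger" end up in one set because they share a root; in the per-concept reading they are kept apart, which is what most phylogenetic coding schemes expect.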

State of the Art
What we present here is still work in progress. For this reason, we cannot provide complete results and new findings. In our “State-of-the-Art”, we will therefore concentrate on our current results, namely:
• What we have done so far to allow for the integration of etymological data and phylogenetic analyses.
• What we have done so far to increase the transparency of our computational and qualitative analyses.
• How far we are with our reconstruction of Proto-Tukano.
• How far we are with respect to phylogenetic reconstruction.

State of the Art: Integration
• The data by Huber and Reed (1992) was previously digitized by M. Cysouw and has now been linked to Concepticon and Glottolog, with phonetic transcriptions converted to the broad IPA system proposed by the CLTS framework.
• Morpheme boundaries were added manually by T. C. Chacon, but they will have to be refined in the future, as certain boundaries may have been overlooked in the first pass.
• Partial cognates, cross-semantic cognates, phonetic alignments, and preliminary sound correspondence patterns have been inferred with the help of LingPy and LingRex.

State of the Art: Transparency
• The data is rendered in the EDICTOR tool (server-based version), which allows all colleagues to access and modify the data quickly.
• Apart from morpheme boundaries, we have started to add morpheme glosses (Hill and List 2017), both to annotate language-internal cognates and word families more properly, and to allow for a transparent annotation of each word’s root. The latter is needed to turn partial cognate judgments into full cognate judgments, which phylogenetic reconstruction studies usually require.

State of the Art: Reconstruction
• A first, semi-automatic reconstruction has now been carried out.
• The data can be inspected through the EDICTOR tool, but many uncertainties remain, and all entries will have to be checked by our expert (T. C. Chacon).
• Once all proto-forms are assembled, the reconstruction may be worth publishing in its own right, since so far scholars have only reconstructed the consonant inventories of Tukano, and nobody has yet tested to what degree the sound correspondence patterns really hold.

State of the Art: Phylogenies COGID

State of the Art: Phylogenies CROSSID

State of the Art: Phylogenies
- The position of Kubeo and Tanimuka is still under investigation, but the new phylogenies support Chacon (2014, 2015)
- Issues
  - Higher rate of language death in the Western branch
  - Sampling of languages/dialects
  - Intra- and extra-family contact
  - Clean and properly analyzed data

Outlook
Future plans:
• Enhance the data by refining the current cognates, language-internal cognates, and proto-forms.
• Expand the data by adding extinct Tukanoan languages to allow for a calibration in the phylogenetic analyses.
• Extend the phylogenetic analyses carried out so far, by testing correspondence-pattern-based phylogenies and improving the Bayesian analyses.

Thanks to:
• Luke Maurits for helping with the BEASTLing analyses.
• Michael Cysouw for digitizing the data by Huber and Reed (1992).
Thank you for listening!
