The Future of the Comparative Method

The Future of the Comparative Method: Towards a Computer-Assisted Framework of Linguistic Reconstruction
Johann-Mattis List, DFG research fellow
Centre des recherches linguistiques sur l’Asie Orientale
Team Adaptation, Integration, Reticulation, Evolution
EHESS and UPMC, Paris
2015-06-22

Background

Definitions

“In linguistics, the comparative method is a technique for studying the development of languages by performing a feature-by-feature comparison of two or more languages with common descent from a shared ancestor, as opposed to the method of internal reconstruction, which analyses the internal development of a single language over time.”
Wikipedia, s.v. "Comparative Method"

“The method of comparing languages to determine whether and how they have developed from a common ancestor. The items compared are lexical and grammatical units, and the aim is to discover correspondences relating sounds in two or more different languages, which are so numerous and so regular, across sets of units with similar meanings, that no other explanation is reasonable.”
Oxford Dictionary of Linguistics (Matthews 1997)

“The comparative [method] is both the earliest and the most important of the methods of reconstruction. Most of the major insights into the prehistory of languages have been gained by the applications of this method, and most reconstructions have been based on it.”
Fox (1995)

“The Comparative Method is the central tool in historical linguistics for historical reconstruction and also classifying languages. A classification done with the Comparative Method is called a genetic classification. The result is that languages are arranged in language family trees. This means that languages are classified according to their genealogical relationships and are interpreted as being in relation of child- or sisterhood to other languages. Such a way of classifying entities is called phylogenetic classification in biology; a classification by genealogical relationships.”
Fleischhauer (2009)

“The method of comparatistics today is generally known under the not very well-chosen term "comparative-historical method". It constitutes a huge complex of abstract and concrete procedures for the investigation of the history of related languages which genetically go back to some uniform tradition of the past.”
Klimov (1990), my translation

“→ comparative linguistics, reconstruction”
Routledge Dictionary of Language and Linguistics (Bussmann 1996)

Columns: Scholar, Proof of Relationship, Study of Language History, External Reconstruction, Linguistic Reconstruction, Language Classification
Anttila (1972) ✓ ✓
Bußmann (2002) ✓
Fleischhauer (2009) ✓
Fox (1995) ✓
Glück (2000) ✓
Harrison (2003) ✓
Hoenigswald (1960) ✓
Jarceva (1990) ✓
Klimov (1990) ✓ ✓
Lehmann (1969) ✓
Makaev (1977) ✓
Matthews (1997) ✓
Rankin (2003) ✓

Working Definition for the Comparative Method
The comparative method is a bunch of techniques that are commonly used by historical linguists in order to reconstruct the history of languages and language families.

Workflows

Workflow by Ross and Durie (1996)
1. Determine on the strength of diagnostic evidence that a set of languages are genetically related, that is, that they constitute a ‘family’.
2. Collect putative cognate sets for the family (both morphological paradigms and lexical items).
3. Work out the sound correspondences from the cognate sets, putting ‘irregular’ cognate sets on one side.
4. Reconstruct the protolanguage of the family as follows:
   a. Reconstruct the protophonology from the sound correspondences worked out in (3), using conventional wisdom regarding the directions of sound changes.
   b. Reconstruct protomorphemes (both morphological paradigms and lexical items) from the cognate sets collected in (2), using the protophonology reconstructed in (4a).
5. Establish innovations (phonological, lexical, semantic, morphological, morphosyntactic) shared by groups of languages within the family relative to the reconstructed protolanguage.
6. Tabulate the innovations established in (5) to arrive at an internal classification of the family, a ‘family tree’.
7. Construct an etymological dictionary, tracing borrowings, semantic change, and so forth, for the lexicon of the family (or of one language of the family).

[Figure: Tentative visualization of the workflow by Ross and Durie (1996: 6f). Boxes: proof of language relationship, cognate set identification, sound correspondence identification, phonological and morphological reconstruction, identification of innovations, reconstruction of phylogenies, publish etymological dictionary.]

[Figure: Simplified version of Ross and Durie’s workflow (List 2014: 58). Steps: proof of relationship, identification of cognates, identification of sound correspondences, reconstruction of proto-forms, internal classification; each later step can lead back to a revision of the earlier ones.]

Problems

Application

[Figure: the workflow boxes (proof of language relationship, cognate set identification, sound correspondence identification, phonological and morphological reconstruction, identification of innovations, reconstruction of phylogenies, publish etymological dictionary), overlaid with the labels TIME CONSUMING... and TEDIOUS...]

Representation

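Frucht, ferner fruchten, befruchten, Befruchtung, fruchtbar, fruchtig
Frucht f. ‘der Fortpflanzung der eigenen Art dienendes Produkt einer Pflanze’, auch ‘ungeborenes Lebewesen’, übertragen ‘Ertrag’, ahd. fruht (9. Jh.), mhd. vruht, asächs. fruht, mnd. mnl. nl. vrucht beruhen auf einer frühen Entlehnung von gleichbed. lat. frūctus, abgeleitet vom Verb lat. fruī (frūctus sum) ‘genießen, Nutzen ziehen’ (verwandt mit brauchen, s. d.). Das Deminutiv Früchtchen hat die spezielle Bedeutung [...]
(English: Frucht f. ‘product of a plant that serves the reproduction of its own kind’, also ‘unborn living being’, figuratively ‘yield’; OHG fruht (9th c.), MHG vruht, Old Saxon fruht, Middle Low German, Middle Dutch, Dutch vrucht rest on an early borrowing of the synonymous Latin frūctus, derived from the Latin verb fruī (frūctus sum) ‘to enjoy, to draw benefit’ (related to brauchen, q.v.). The diminutive Früchtchen has the special meaning [...])
German "Frucht" in Pfeifer (1993, also at http://dwds.de)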


[Figure: etymological network for German "Frucht" with three edge types (inherited from, borrowed from, derived from). Nodes: PIE *bhreu̯Hg̑- “to use”; PIE *bhruHg̑-ié- “to use” (present tense); PGM *ƀrūkan- “to use”; OHG brūhhan “to use”; G brauchen “to use”; G Brauch “custom”; OHG fruht “profit, fruit”; G frugal “modest (food)”; Fr fruit “profit, fruit”; Fr frugal “modest (food)”; Lt fruor, fruī “I enjoy”; Lt frūctus “profit”; Lt frux “fruit, grain”; Lt frugalis “bring profit”. Adapted from an illustration by Hans Geisler (University Düsseldorf).]

[Figure: Entry for PIE *kʷetware in Tower of Babel (http://starling.rinet.ru)]

Insufficiencies of Data Representation
- data in “textual form” (impossible to search it efficiently)
- no standardized phonetic representations
- no standardized glosses for meanings
- no standardized names or abbreviations for language and dialect names
- no standardized representation of sound correspondences
- no standardized assignment of cognate sets and borrowings
- ...

Replication

Gloss | Blust | Pawley | Distance
“day” | *qaco | *qaco | 0
“to spit” | *qanusi | *qanusi | 0
“person” | *taumataq | *tamwata | 3
“to vomit” | *mumutaq | *mumuta | 1
“name” | *ŋajan | *qajan | 1
“snake” | *mwata | *mwata | 0
“man” | *mwa ruqane | *taumwaqane | 5
“four” | *pani | *pat | 2
“one” | *sakai | *tasa | 3
... | ... | ... | ...
Disagreement between experts on Proto-Oceanic (PO) reconstructions (Bouchard-Côté et al. 2014)
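The slide does not say how the Distance column is computed. A minimal sketch, assuming it is a plain Levenshtein edit distance over characters (an assumption, not something the source states), reproduces the values given for the pairs below. The function name `edit_distance` is made up for this illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimal number of character insertions,
    deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


# Reconstructions by Blust and Pawley, taken from the table (asterisks dropped).
pairs = {
    "person": ("taumataq", "tamwata"),
    "to vomit": ("mumutaq", "mumuta"),
    "four": ("pani", "pat"),
    "one": ("sakai", "tasa"),
}

for gloss, (blust, pawley) in pairs.items():
    print(gloss, edit_distance(blust, pawley))
```

A measure like this only counts symbol mismatches; it says nothing about whether two reconstructions differ in a linguistically important way.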

Reproducibility Problems in Historical Linguistics
- Scholars disagree on many points in historical linguistics, be it the number of laryngeals, the position of Baltic and Slavic, or whether a given word was borrowed or not.
- We know well that no two etymological dictionaries for the same language or language family are completely identical.
- Unfortunately, we lack a rigorous way to check the degree to which experts actually agree or disagree in their judgments.
- We also lack evaluation methods that would show the degree to which a given hypothesis (a reconstruction, a family tree, or an etymology) corresponds with our linguistic data.

Towards a Computer-Assisted Comparative Method


Human expert (pictured as a book by Franz Bopp with a very, very long title)
PRO:
- intuition
- background knowledge
- can juggle with multiple types of evidence
CONTRA:
- has to sleep and rest
- does not like to count and do boring work
- can overlook facts when doing boring work

Computer (pictured with Bayes' formula, P(A|B) = P(B|A) P(A) / P(B))
CONTRA:
- no intuition
- no background knowledge
- can't juggle with multiple types of evidence
PRO:
- doesn't need to sleep
- is very good at counting and boring work
- doesn't make errors in boring work

Combining both: COMPUTER-ASSISTED LANGUAGE COMPARISON

Standards

Standards: Concept Labeling

Concept labels for “GREASE” in 22 different concept lists (see List et al. 2015, online at http://concepticon.clld.org). Every label below is linked to the same concept ID: GREASE (CONCEPTICON-ID: 323).

Concept List | # Items | Concept Label
Allen (2007) | 500 | animal oil; 动物油(脂肪)
Gregersen (1976) | 217 | fat-grease*fat-grease
Heggarty (2005) | 150 | fat (grease); grasa
Swadesh (1955) | 100 | fat (grease)
Alpher and Nash (1999) | 151 | fat, grease
Hale (1961) | 100 | fat, grease
O'Grady and Klokeid (1969) | 100 | fat, grease
Blust (2008) | 210 | fat/grease
Matisoff (1978) | 200 | fat/grease
Samarin (1969) | 218 | fat/grease
Dunn et al. (2012) | 207 | fat
Swadesh (1950) | 215 | fat
Zgraggen (1980) | 380 | fat
Jachontov (1991) | 100 | fat n.
Wiktionary (2003) | 207 | fat (noun)
Starostin (1991) | 110 | fat n.; жир
TeilDautrey et al. (2008) | 430 | fat, oil
Swadesh (1952) | 200 | fat (organic substance)
Shiro (1973) | 200 | grease (fat)
Samarin (1969) | 100 | grease; graisse; Fett; grasa
Wang (2006) | 200 | pig oil; 猪油
Haspelmath and Tadmor (2009) | 1460 | the grease or fat
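To illustrate why concept labels need an explicit link to a shared ID, here is a minimal sketch of a naive label matcher. It is not how Concepticon itself links concept lists; the lookup table `CONCEPT_SETS` and the function `link` are hypothetical, and the ID 323 for GREASE is the one shown on the slide.

```python
import re

# Hypothetical lookup: English keyword -> (concept ID, concept set name).
CONCEPT_SETS = {
    "fat": (323, "GREASE"),
    "grease": (323, "GREASE"),
}


def link(label):
    """Return the concept set for the first known keyword in a raw label,
    or None if no keyword is recognized (left for manual review)."""
    for word in re.findall(r"[a-z]+", label.lower()):
        if word in CONCEPT_SETS:
            return CONCEPT_SETS[word]
    return None


for raw in ["fat (grease); grasa", "fat n.", "the grease or fat", "pig oil; 猪油"]:
    print(repr(raw), "->", link(raw))
```

The last label is not resolved by keyword matching, which is the kind of case where a curated, manually checked mapping is needed.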

Standards: Lexical Representation

Lexical entries for “GREASE” (“pork fat”) in 10 Chinese dialect varieties (data taken from Wang and Hamed 2006)

Dialect | Entry | IPA | Segments | Morphemes
Beijing | 大油 | ta⁵¹ iou³⁵ | t a ⁵¹ i o u ³⁵ | t a ⁵¹ + i o u ³⁵
Changsha | 油 | tɕy³³ iəu¹³ | tɕ y ³³ i ə u ¹³ | tɕ y ³³ + i ə u ¹³
Chengdu | 猪油 | tsu⁴⁴iəu³¹ | ts u ⁴⁴ i ə u ³¹ | ts u ⁴⁴ + i ə u ³¹
Fuzhou | 猪油 | ty⁴⁴iu⁵² | t y ⁴⁴ i u ⁵² | t y ⁴⁴ + i u ⁵²
Guangzhou | 猪膏 | tʃy⁵⁵kou⁵³ | tʃ y ⁵⁵ k ou ⁵³ | tʃ y ⁵⁵ + k ou ⁵³
Meixian | 油 | jiu¹² | j i u ¹² | j i u ¹²
Nanchang | 油 | iu⁵⁵ | i u ⁵⁵ | i u ⁵⁵
Taibei | 豬油 | ti⁴⁴ iu¹³ | t i ⁴⁴ i u ¹³ | t i ⁴⁴ + i u ¹³
Wenzhou | 猪油 | tsei⁴⁴ ɦiau³¹ | ts e i ⁴⁴ ɦ i a u ³¹ | ts e i ⁴⁴ + ɦ i a u ³¹
Xiamen | 油 | iu²⁴ | i u ²⁴ | i u ²⁴

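The step from the IPA column to the Segments and Morphemes columns can be sketched as follows. This is a simplified illustration, not the segmentation routine of any particular tool: the affricate inventory `AFFRICATES` and the rule that a morpheme ends at each tone mark are assumptions chosen to fit the Chinese data above, and the function names are made up.

```python
TONES = "⁰¹²³⁴⁵⁶⁷⁸⁹"              # superscript tone digits
AFFRICATES = ("tɕ", "tʃ", "ts")    # assumed multi-character segments


def segment(ipa):
    """Split an IPA string into segments; a run of tone digits is one segment."""
    segments = []
    i = 0
    while i < len(ipa):
        ch = ipa[i]
        if ch.isspace():
            i += 1
            continue
        if ch in TONES:
            j = i
            while j < len(ipa) and ipa[j] in TONES:
                j += 1
            segments.append(ipa[i:j])
            i = j
            continue
        for affricate in AFFRICATES:
            if ipa.startswith(affricate, i):
                segments.append(affricate)
                i += len(affricate)
                break
        else:
            segments.append(ch)
            i += 1
    return segments


def morphemes(segments):
    """Group segments into morphemes, closing a morpheme after each tone."""
    result, current = [], []
    for seg in segments:
        current.append(seg)
        if seg[0] in TONES:
            result.append(current)
            current = []
    if current:
        result.append(current)
    return result


segs = segment("tɕy³³ iəu¹³")                           # Changsha entry
print(" ".join(segs))
print(" + ".join(" ".join(m) for m in morphemes(segs)))
```

The sketch splits every vowel into its own segment, so it would not reproduce rows where a diphthong is kept as a single unit.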

Standards: Representation of Cognate Judgments

Cognate judgments for “grease/fat” across 10 Austronesian languages (data taken from Greenhill et al. 2008, online at http://language.psy.auckland.ac.nz/austronesian/)

Language | Lexical Entry | Cognacy | Alignment
Central Amis | simar | 2 | s i m a r
Thao | lhimash | 2 | lh i m a sh
Hanunóo | tabáʔ | 23 | t a b á ʔ
Nias | tawõ | 23 | t a w õ -
Mailu | mona | 1 | m o n a -
Maloh | -iñak | 1 | - i ñ a k
Tetum | mina | 1 | m i n a -
Banggi | laːna | 24 | l aː n a -
Berawan (Long Terawan) | ləməʔ | 24 | l ə m ə ʔ
Iban | lemak | 24 | l e m a k
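Once cognate judgments and alignments are stored in this explicit form, sound correspondences can be read off column by column. A minimal sketch using the rows above; the function names and the tuple layout are illustrative choices, not part of any standard.

```python
from collections import defaultdict

# (language, cognate ID, alignment) as given in the table above.
ROWS = [
    ("Central Amis", "2", "s i m a r"),
    ("Thao", "2", "lh i m a sh"),
    ("Hanunóo", "23", "t a b á ʔ"),
    ("Nias", "23", "t a w õ -"),
    ("Mailu", "1", "m o n a -"),
    ("Maloh", "1", "- i ñ a k"),
    ("Tetum", "1", "m i n a -"),
    ("Banggi", "24", "l aː n a -"),
    ("Berawan (Long Terawan)", "24", "l ə m ə ʔ"),
    ("Iban", "24", "l e m a k"),
]


def cognate_sets(rows):
    """Group (language, aligned segments) pairs by cognate ID."""
    sets = defaultdict(list)
    for language, cogid, alignment in rows:
        sets[cogid].append((language, alignment.split()))
    return dict(sets)


def correspondences(members):
    """Return the alignment columns of one cognate set.
    All aligned words in a set must have the same length."""
    if len({len(segments) for _, segments in members}) != 1:
        raise ValueError("alignment rows differ in length")
    return list(zip(*(segments for _, segments in members)))


sets = cognate_sets(ROWS)
for column in correspondences(sets["1"]):   # Mailu, Maloh, Tetum
    print(" : ".join(column))
```

The length check doubles as a simple validation step: a cognate set whose rows differ in length cannot be a well-formed alignment.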

Jena Wordlist Standard

The Jena Wordlist Standard is being developed by the NESCent-style working group “GlottoBank: Towards a Global Language Phylogeny” under the direction of Russell Gray.

Define standards for:
- Wordlists
- Cognate Sets
- Alignments

Provide tools for:
- Data Validation
- Data Exchange
- Data Enrichment

Integrate existing standards:
- Glottolog http://glottolog.clld.org
- Phoible http://phoible.clld.org
- Concepticon http://concepticon.clld.org

Provide tools for editing and analysis:
- LingPy http://lingpy.org
- EDICTOR http://tsv.lingpy.org

Use the standard to build new databases:
- LexiBank: Cross-Linguistic Database of Lexical Cognate Sets
- PhonoBank: Cross-Linguistic Database of Regular Sound Change Patterns
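What "data validation" for a standardized wordlist might look like can be sketched with a tab-separated file. The slides do not specify the format of the Jena Wordlist Standard, so the column names in `REQUIRED` and the three checks below are purely hypothetical; the sample rows reuse forms for “grease/fat” from the Austronesian data.

```python
import csv
import io

# Hypothetical column names; the actual standard may differ.
REQUIRED = ("ID", "DOCULECT", "CONCEPT", "IPA", "TOKENS", "COGID")


def validate(tsv_text):
    """Return a list of problems found in a tab-separated wordlist."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        return ["missing columns: " + ", ".join(missing)]
    problems, seen = [], set()
    for row in reader:
        row_id = row["ID"]
        if row_id in seen:
            problems.append(f"row {row_id}: duplicate ID")
        seen.add(row_id)
        # The segmented form must spell out the same string as the IPA form.
        if "".join((row["TOKENS"] or "").split()) != "".join((row["IPA"] or "").split()):
            problems.append(f"row {row_id}: TOKENS do not spell out IPA")
        if not (row["COGID"] or "").isdigit():
            problems.append(f"row {row_id}: COGID is not a number")
    return problems


SAMPLE = "\n".join([
    "\t".join(REQUIRED),
    "1\tTetum\tgrease/fat\tmina\tm i n a\t1",
    "2\tMailu\tgrease/fat\tmona\tm o n\t1",   # segmentation drops the final vowel
])
print(validate(SAMPLE))
```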

Workflows

[Figure: A possible computer-assisted, iterative workflow with automatic and manual components.
Analysis steps: Semantic Tagging, Segmentation, Cognate Detection, Alignment Analysis, Linguistic Reconstruction, Phylogenetic Reconstruction.
Data stages: RAW DATA, WORDLIST DATA, TOKENS AND MORPHEMES, COGNATE SETS, SOUND CORRESPONDENCES, PROTO-FORMS, PHYLOGENIES (each illustrated with HAND [hænd], FOOT [fʊt], EARTH [ɜːrθ], TREE [triː], BARK [bɑːrk]).
Roles: the computer provides automatic analyses; the human expert revises automatic analyses.]
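As an illustration of one automatic step in such a workflow, the sketch below aligns two segmented words with the classic Needleman-Wunsch dynamic-programming scheme, using unit costs for mismatches and gaps. This is a textbook baseline and an assumption of this example; the alignment methods in tools such as LingPy use more refined sound-class-based scoring. The function name `align` is made up.

```python
def align(a, b, gap="-"):
    """Globally align two segment lists with unit mismatch and gap costs.
    Returns (aligned_a, aligned_b, total_cost)."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cost of aligning a[:i] with b[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + (a[i - 1] != b[j - 1]),  # match or mismatch
                cost[i - 1][j] + 1,                           # gap in b
                cost[i][j - 1] + 1,                           # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            out_a.append(a[i - 1]); out_b.append(gap)
            i -= 1
        else:
            out_a.append(gap); out_b.append(b[j - 1])
            j -= 1
    return out_a[::-1], out_b[::-1], cost[n][m]


# Tetum "mina" and Mailu "mona" (grease/fat), already segmented.
row_a, row_b, total = align(list("mina"), list("mona"))
print(" ".join(row_a))
print(" ".join(row_b))
print("cost:", total)
```

In a computer-assisted setting, output like this would be a first proposal that the expert inspects and corrects, not a final analysis.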

Workflows: Tools

LingPy and EDICTOR: Two tools for computer-assisted language comparison.

LingPy (http://lingpy.org): Software Library for Automatic Tasks in Historical Linguistics
- phonetic segmentation
- phonetic alignment
- cognate detection
- ancestral state reconstruction
- borrowing detection
- phylogenetic reconstruction

EDICTOR (http://tsv.lingpy.org): Online Tool for Computer-Assisted Language Comparison
- server- and client-based
- data validation
- phonetic segmentation
- cognate set editor
- alignment editor
- correspondence evaluation

Workflows: Test Cases

Reconstruction of Tukano Languages (with T. Chacon)
- 15 Tukano languages
- 140 concepts
- cognate sets are aligned with proposed reconstructions

Reconstruction of Burmish Languages (with N. Hill)
- 8 Burmish languages
- about 500 concepts
- cognate sets were determined automatically and are currently being reviewed by the expert

Lexical Homology Database of Sino-Tibetan Languages (with L. Sagart and G. Jacques)
- more than 50 Sino-Tibetan languages
- about 240 concepts
- data is currently being assembled


Challenges

Modeling of Morphological Change
- morphological change is not systematic (as opposed to sound change)
- morphological differences in cognate sets distort the alignments

Modeling of Semantic Change
- semantic shift is not systematic but has general tendencies
- we need to incorporate known tendencies in our analyses

Modeling of Irregular Sound Change
- irregular or sporadic sound change is problematic for reconstruction
- we need to find ways to incorporate our uncertainty in our alignments

Concluding Remarks
Many hypotheses have been proposed regarding the deeper phylogeny of Austronesian and many other language families. Unfortunately, the current practice of data presentation makes it difficult to compare and test these hypotheses. If we want to gain new insights into the past of our languages, we need to find ways to integrate both the knowledge which experts have been accumulating over centuries and the new computational tools which help to organize, analyze and integrate this knowledge.

Thanks for Your Attention!
