future prospects
T. Tresoldi, N. E. Schweikhard, M.-S. Wu, Y.-F. Lai, J.-M. List
Max Planck Institute for the Science of Human History, Department of Linguistic and Cultural Evolution, CALC Project
Jul 13, 2018
haven’t changed substantially since the 19th century:
1. conduct intensive language comparison
2. identify regular recurring similarities
3. reconstruct the development of languages and their families
Issues:
- usually done manually, by small groups, over a long time
- crucial tasks, such as cognate identification, are partly based on non-formalized knowledge and intuition
(1830s)
Computational methods: Swadesh, Greenberg, S. Starostin
- lexicostatistics and data normalization vs. glottochronology and mass comparison
- research on non-consensual supra-families, such as Nostratic
Modern methods: Ringe, G. Starostin, “New Zealand School”
- linguistics returning from its synchronic focus
- Bayesian inference in phylogenetics: alleged uninterpretability, models borrowed from biology
- competing research by non-linguists and in non-academic settings (NLP)
not studied by and for itself:
- a window to human history and a bridge to other disciplines
Quantitative (and particularly Bayesian) turn:
- classical methods have reached their limits in some cases
- open access, collaboration (no more “lone wolves”)
New languages, new questions:
- large and diverse language families
- language families like Sino-Tibetan present “almost unsurmountable obstacles” (Antoine Meillet, 1925)
[Figure: workflow steps (data, manual alignment, manual sound correspondences, manual cognate judgments, ...) alongside the tools CLDF, EDICTOR, LingPy, Concepticon, and Cross-Linguistic Colexifications]
Software, data, and tools should complement the traditional approach:
- interdisciplinary approach: adapt rather than transfer
- allow experts to access and understand the results
- computational methods cannot replace experts (assist, not replace)
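The slide contrasts manual alignment with tools such as LingPy. As a rough illustration of what automating the alignment step involves, here is a minimal Needleman-Wunsch global alignment of two segmented words in Python. This is a generic textbook algorithm, not LingPy's implementation, which uses more elaborate, linguistically informed scoring; the +1/-1 segment scores and the gap penalty are arbitrary assumptions.

```python
from typing import List, Tuple

GAP = "-"


def score(a: str, b: str) -> int:
    """Toy segment score (assumption): +1 for identical segments, -1 otherwise."""
    return 1 if a == b else -1


def align(seq_a: List[str], seq_b: List[str], gap: int = -1) -> Tuple[List[str], List[str]]:
    """Globally align two token sequences (Needleman-Wunsch)."""
    n, m = len(seq_a), len(seq_b)
    # dp[i][j] holds the best score for aligning seq_a[:i] with seq_b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(seq_a[i - 1], seq_b[j - 1]),  # match/mismatch
                dp[i - 1][j] + gap,  # gap in seq_b
                dp[i][j - 1] + gap,  # gap in seq_a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(seq_a[i - 1], seq_b[j - 1]):
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append(GAP)
            i -= 1
        else:
            out_a.append(GAP)
            out_b.append(seq_b[j - 1])
            j -= 1
    return out_a[::-1], out_b[::-1]
```

For example, `align("v a r u".split(), "w a l u".split())` returns `(['v', 'a', 'r', 'u'], ['w', 'a', 'l', 'u'])`, the segment-by-segment alignment an expert would propose for these two forms.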
with unified formats for data storage and exchange. Data curation is facilitated by:
- spreadsheet formats
- validation software
- benchmark data
- reference catalogs
- online publications

Example wordlist (concept EIGHT):

Doculect         Glottocode  Concept  Concepticon ID  Form   Tokens    Source
Anuta            anut1237    EIGHT    1705            varu   v a r u   POLLEX
East Futunan     east2447    EIGHT    1705            valu   v a l u   POLLEX
Hawaiian         hawa1245    EIGHT    1705            walu   w a l u   ID: 71458
Kapingamarangi   kapi1249    EIGHT    1705            walu   w a l u   POLLEX
Mele Fila        mele1250    EIGHT    1705            ebaru  β a r u   ID: 52375
Nukuria          nuku1259    EIGHT    1705            varu   v a r u   Davletshin (2015)
...              ...         ...      ...             ...    ...       ...
Rapanui          rapa1244    EIGHT    1705            va’u   v a ʔ u   POLLEX
Rennell Bellona  renn1242    EIGHT    1705            bangu  b a ŋg u  POLLEX
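Such a table is most useful once it is machine-readable. The sketch below holds a subset of its rows (Anuta, East Futunan, Hawaiian, Kapingamarangi, Nukuria) as Python tuples, runs two simple validation checks, and tallies positional sound correspondences between two doculects. The Glottocode pattern (four lowercase letters or digits followed by four digits) and the assumption that token strings of equal length are already aligned are simplifications. They are not the checks performed by any particular validation tool.

```python
import re
from collections import Counter

# A subset of the table rows: (Doculect, Glottocode, Concept, Concepticon ID, Tokens).
ROWS = [
    ("Anuta", "anut1237", "EIGHT", 1705, "v a r u"),
    ("East Futunan", "east2447", "EIGHT", 1705, "v a l u"),
    ("Hawaiian", "hawa1245", "EIGHT", 1705, "w a l u"),
    ("Kapingamarangi", "kapi1249", "EIGHT", 1705, "w a l u"),
    ("Nukuria", "nuku1259", "EIGHT", 1705, "v a r u"),
]

# Simplified Glottocode shape: 4 lowercase letters/digits + 4 digits (assumption).
GLOTTOCODE = re.compile(r"^[a-z0-9]{4}[0-9]{4}$")


def validate(rows):
    """Return a list of problems; an empty list means every row passed these checks."""
    problems = []
    for doculect, glottocode, concept, _concepticon_id, tokens in rows:
        if not GLOTTOCODE.match(glottocode):
            problems.append(f"{doculect}: malformed Glottocode {glottocode!r}")
        if not tokens.split():
            problems.append(f"{doculect}: empty tokens for {concept}")
    return problems


def correspondences(rows, lang_a, lang_b):
    """Count segment pairs at the same position for concepts both doculects share.

    Assumes equal-length token strings are already aligned; other pairs are skipped.
    """
    by_lang = {}
    for doculect, _, concept, _, tokens in rows:
        by_lang.setdefault(doculect, {})[concept] = tokens.split()
    counts = Counter()
    for concept, toks_a in by_lang.get(lang_a, {}).items():
        toks_b = by_lang.get(lang_b, {}).get(concept)
        if toks_b is not None and len(toks_a) == len(toks_b):
            counts.update(zip(toks_a, toks_b))
    return counts
```

Here `correspondences(ROWS, "Anuta", "Hawaiian")` counts the pairs v:w, a:a, r:l, and u:u once each. With a full wordlist, recurring pairs such as r:l are the regular correspondences that comparative linguists look for.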
tedious tasks of comparative linguistics (over 50 publications citing LingPy!).
Where can we move from here?
- Educate and train
- Stochastic methods, decision trees, neural networks
- New questions to explore:
  - likelihood of random resemblance
  - morphology in cognate identification
  - partial colexifications
  - can intuition be weighted?
  - suprasegmental relationships and segments in their setting
  - less-studied languages, including sign languages
- Litmus test: Sino-Tibetan languages
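One standard way to estimate the likelihood of random resemblance is a permutation test. It compares the similarity of two wordlists with the similarity obtained after randomly reassigning the words of one list to other concepts. The sketch below uses a deliberately crude similarity measure (the number of concepts whose words begin with the same segment). The measure, the number of permutations, and the function names are illustrative assumptions.

```python
import random
from typing import Dict, List, Tuple

Wordlist = Dict[str, List[str]]  # concept -> token list


def first_segment_matches(list_a: Wordlist, list_b: Wordlist, concepts: List[str]) -> int:
    """Crude similarity (assumption): concepts whose words share their first segment."""
    return sum(1 for c in concepts if list_a[c][0] == list_b[c][0])


def permutation_test(list_a: Wordlist, list_b: Wordlist,
                     n_perm: int = 1000, seed: int = 0) -> Tuple[int, float]:
    """Return the observed similarity and a permutation p-value."""
    concepts = sorted(set(list_a) & set(list_b))
    observed = first_segment_matches(list_a, list_b, concepts)
    rng = random.Random(seed)
    forms_b = [list_b[c] for c in concepts]
    hits = 0
    for _ in range(n_perm):
        # Reassign list B's words to random concepts, breaking any real link.
        rng.shuffle(forms_b)
        shuffled = dict(zip(concepts, forms_b))
        if first_segment_matches(list_a, shuffled, concepts) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)
```

A small p-value suggests that the observed similarity is unlikely to arise by chance under this measure. It does not by itself distinguish inheritance from borrowing.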
interrelations between form, meaning, and frequency? Are they system-dependent, culture-specific, or universal? How can computer-assisted methods help answer these questions?
Perspectives:
- concept-based (onomasiological) vs. form-based (semasiological)
- cross-linguistic vs. language-specific
- quantitative vs. qualitative → computer-assisted
- question: evolutionary dynamics of lexical change
- concept-based
- language(-family)-specific
Paradigmatic Alternations in Nominal Derivations
- question: causes of paradigmatic alternations
- form-based
- language(-family)-specific
Productivity and Promiscuity in Compounding
- question: language-specific and universal aspects of compoundhood
- concept- and form-based
- cross-linguistic (worldwide) and language-specific
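To make “promiscuity” in compounding concrete, one simple operationalization counts how many distinct compounds a morpheme occurs in. This definition is an assumption and not necessarily the one used in the project. The sketch also assumes that compounds are already segmented with `+`. The English compounds in the usage note are toy data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def promiscuity(words: Iterable[str]) -> Dict[str, int]:
    """Map each morpheme to the number of distinct '+'-segmented words it occurs in."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for word in words:
        for morpheme in word.split("+"):
            seen[morpheme].add(word)  # a set, so repeated words count once
    return {morpheme: len(ws) for morpheme, ws in seen.items()}
```

For `["sun+light", "sun+flower", "moon+light", "light+house"]`, this gives `light` a count of 3, `sun` a count of 2, and the remaining morphemes a count of 1 each.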
degree of the correlation between language and genetic diversity?
[Figure: population groups A, B, and C paired with languages A, B, and C]
WHY? Geography? Population size? Bilingualism?
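One widely used way to measure the correlation between linguistic and genetic diversity is a Mantel test. It correlates two distance matrices over the same groups, then repeatedly relabels the groups in one matrix to see how often a correlation at least as strong arises by chance. The sketch below is a minimal pure-Python version. The inputs are assumed to be symmetric distance matrices with rows in the same group order. It does not model geography, population size, or bilingualism.

```python
import random
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def _upper(mat: Matrix) -> List[float]:
    """Entries above the diagonal, row by row."""
    n = len(mat)
    return [mat[i][j] for i in range(n) for j in range(i + 1, n)]


def _pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(d_ling: Matrix, d_gen: Matrix, n_perm: int = 999, seed: int = 0) -> Tuple[float, float]:
    """Return Pearson's r between the two matrices and a one-sided permutation p-value."""
    n = len(d_ling)
    x = _upper(d_ling)
    r_obs = _pearson(x, _upper(d_gen))
    rng = random.Random(seed)
    idx = list(range(n))
    hits = 0
    for _ in range(n_perm):
        # Relabel the groups: permute rows and columns of the genetic matrix together.
        rng.shuffle(idx)
        permuted = [[d_gen[idx[i]][idx[j]] for j in range(n)] for i in range(n)]
        if _pearson(x, _upper(permuted)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)
```

A high r with a small p-value indicates that groups which are linguistically distant also tend to be genetically distant. The open “WHY?” on the slide remains a question for additional variables.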
languages (Sino-Tibetan) exhibit a series of orientational prefixes.
Traditional approaches:
- fully related to actual topography
- according to river, mountain, sun, etc.
- inconsistent among scholars: wild guesses
- not covering all uses
Are orientational prefixes related to actual geography?
current job is to test how closely orientational prefixes are related to real-world topography:
- selection of 15 familiar places (villages or towns)
- collection of the prefixes used between every two places
- drawing a map based on the prefixes; this map represents the collective memory of the speakers
- comparison of the inferred map with the actual map
We may come to understand:
- the original meaning of the orientational prefixes
- the evolutionary pathways of the orientational prefixes
- how Rgyalrongic (even Sino-Tibetan) ancestors understood and interpreted geography
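The comparison step can be sketched for a single vertical axis. Assume each ordered pair of places (15 places give 210 ordered pairs) is annotated with a prefix reduced to a hypothetical label, `"up"` or `"down"`. Also assume the real elevation of each place is known. One function then measures how often the prefix agrees with the actual elevation difference. A second function derives a one-dimensional “inferred map” by ranking places from how they are addressed. The labels, the scoring, and the function names are illustrative assumptions. Real Rgyalrongic prefixes also encode river, mountain, and sun orientations, which this sketch ignores.

```python
from typing import Dict, List, Tuple

Prefixes = Dict[Tuple[str, str], str]  # (origin, destination) -> "up" | "down" (hypothetical)


def agreement(prefixes: Prefixes, elevation: Dict[str, float]) -> float:
    """Share of ordered pairs whose prefix matches the actual elevation difference.

    Pairs of places at equal elevation are skipped.
    """
    agree = total = 0
    for (origin, dest), prefix in prefixes.items():
        diff = elevation[dest] - elevation[origin]
        if diff == 0:
            continue
        total += 1
        expected = "up" if diff > 0 else "down"
        agree += prefix == expected
    return agree / total if total else float("nan")


def inferred_height_rank(prefixes: Prefixes) -> List[str]:
    """Order places from lowest to highest according to the prefixes alone."""
    score: Dict[str, int] = {}
    for (origin, dest), prefix in prefixes.items():
        delta = 1 if prefix == "up" else -1
        score[dest] = score.get(dest, 0) + delta  # destination reached "upwards" scores higher
        score[origin] = score.get(origin, 0) - delta
    return sorted(score, key=score.get)
```

An agreement close to 1 would support a purely topographic reading of the prefixes. Systematic disagreements would point to other orientation cues or to a drift in meaning.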