Cognate Detection
[Figure: the word for "word" in several languages: 词, parola, λόγος, शब्द, ord, Wort, слово, cuvînt, palabra, mot, adottszó, slovo, verbum, focal, word]
– proof of relationship
– identification of cognates
– identification of sound correspondences
– reconstruction of proto-forms
– internal classification
Cognate List → Alignment → Correspondence List

Cognate list and alignment:
German dünn   d ʏ n     English thin    θ ɪ n
German Ding   d ɪ ŋ     English thing   θ ɪ ŋ
German dumm   d ʊ m     English dumb    d ʌ m
German Dorn   d ɔɐ n    English thorn   θ ɔː n

Correspondence list:
GER   ENG   Freq.
d     θ     3 x
d     d     1 x
n     n     2 x
m     m     1 x
ŋ     ŋ     1 x
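The step from alignment to correspondence list amounts to counting aligned segment pairs. A minimal Python sketch, assuming the four German-English alignments above as input; the vowel skip list is an assumption of this sketch, used so that only consonant correspondences are counted, as in the list on the slide:

```python
from collections import Counter

# Aligned cognate pairs from the slide: (German, English), one segment per slot.
ALIGNMENTS = [
    (["d", "ʏ", "n"], ["θ", "ɪ", "n"]),    # dünn : thin
    (["d", "ɪ", "ŋ"], ["θ", "ɪ", "ŋ"]),    # Ding : thing
    (["d", "ʊ", "m"], ["d", "ʌ", "m"]),    # dumm : dumb
    (["d", "ɔɐ", "n"], ["θ", "ɔː", "n"]),  # Dorn : thorn
]

# Assumption for this sketch: vowel segments are skipped so that only
# consonant correspondences are counted, as in the slide's list.
VOWELS = frozenset({"ʏ", "ɪ", "ʊ", "ʌ", "ɔɐ", "ɔː"})


def count_correspondences(alignments, skip=frozenset()):
    """Count how often each aligned (source, target) segment pair occurs."""
    counts = Counter()
    for source, target in alignments:
        if len(source) != len(target):
            raise ValueError("aligned sequences must have the same length")
        for a, b in zip(source, target):
            if a in skip or b in skip:
                continue  # ignore slots involving a skipped segment
            counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    correspondences = count_correspondences(ALIGNMENTS, skip=VOWELS)
    for (ger, eng), freq in correspondences.most_common():
        print(f"{ger}  {eng}  {freq} x")
```

Run on these four pairs, the sketch counts d : θ three times and n : n twice, matching the correspondence list above.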
as evaluation scores. Examples of individual cognate judgments are rare. Supplementary data
– are often lacking, or
– are not given in a human-readable form.
→ The published results lack transparency.
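One way to make results more transparent is to publish the individual judgments next to the scores. The following Python sketch assumes B-cubed precision, recall, and F-score as the evaluation measure (a common choice for comparing cognate partitions, not necessarily the measure used in the studies criticized here) and additionally lists every word pair whose cognate status differs between a gold standard and a test analysis. The function names and the four-word example data are hypothetical.

```python
def group(labels):
    """Map each cognate-set label to the set of words carrying it."""
    groups = {}
    for word, label in labels.items():
        groups.setdefault(label, set()).add(word)
    return groups


def bcubed(gold, test):
    """Return B-cubed (precision, recall, F-score) of test against gold.

    Both arguments map each word to a cognate-set label; they must cover
    the same words.
    """
    gold_groups, test_groups = group(gold), group(test)
    precision = recall = 0.0
    for word in gold:
        g = gold_groups[gold[word]]
        t = test_groups[test[word]]
        overlap = len(g & t)  # always >= 1, since the word is in both sets
        precision += overlap / len(t)
        recall += overlap / len(g)
    p, r = precision / len(gold), recall / len(gold)
    return p, r, 2 * p * r / (p + r)


def disagreements(gold, test):
    """List word pairs judged cognate in one analysis but not in the other."""
    words = sorted(gold)
    pairs = []
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            in_gold, in_test = gold[a] == gold[b], test[a] == test[b]
            if in_gold != in_test:
                pairs.append((a, b, in_gold, in_test))
    return pairs


if __name__ == "__main__":
    # Hypothetical toy data: cognate-set labels per word.
    gold = {"dünn": 1, "thin": 1, "Ding": 2, "thing": 2}
    test = {"dünn": 1, "thin": 1, "Ding": 1, "thing": 2}
    print("P=%.2f R=%.2f F=%.2f" % bcubed(gold, test))
    for a, b, in_gold, in_test in disagreements(gold, test):
        print(f"{a} / {b}: gold={in_gold}, test={in_test}")
```

For the toy data, precision is 2/3 and recall is 3/4, and three word pairs are judged differently. The pair list is what lets a reader see where the scores come from.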
intuitive and vary greatly. It is difficult to communicate the results to traditional linguists.
→ Many linguists regard automatic cognate detection as
– "impossible per se", or
– as useful as "rolling a dice".
PEAR LaunchPad
What is LingPy?
– Python library for automatic tasks in historical linguistics
– project homepage: http://lingpy.org
– code base: https://github.com/lingpy/lingpy
– supports Python 2 and Python 3
– works on Mac, Linux, and (basically also) Windows
– current release: 2.3
What does LingPy offer?
– tokenization of phonetic sequences
– phonetic alignment analyses (List 2012a)
– automatic cognate detection (Turchin 2010, List 2012b)
– automatic borrowing detection (List et al. 2014)
– basic routines for the evaluation of automatic methods
– plotting routines for interactive visualizations
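The idea behind the consonant-class approach of Turchin (2010) can be sketched in a few lines of Python: words are reduced to sound classes and judged cognate if their first two consonant classes match. The class table below is a reduced, hypothetical subset of Dolgopolsky-style classes, and the function names are illustrative; this is not LingPy's implementation.

```python
# Reduced, hypothetical subset of Dolgopolsky-style sound classes, for
# illustration only; symbols not listed (here: vowels) are ignored.
SOUND_CLASSES = {
    **dict.fromkeys(["p", "b", "f"], "P"),
    **dict.fromkeys(["t", "d", "θ", "ð"], "T"),
    **dict.fromkeys(["s", "z", "ʃ", "ʒ"], "S"),
    **dict.fromkeys(["k", "g", "x"], "K"),
    "m": "M",
    **dict.fromkeys(["n", "ŋ", "ɲ"], "N"),
    **dict.fromkeys(["r", "l"], "R"),
    **dict.fromkeys(["w", "v"], "W"),
    "j": "J",
}


def consonant_classes(tokens, n=2):
    """Return the sound classes of the first n consonants as a string."""
    classes = [SOUND_CLASSES[t] for t in tokens if t in SOUND_CLASSES]
    return "".join(classes[:n])


def turchin_cognate(word_a, word_b):
    """Judge two tokenized words cognate if their first two consonant classes match.

    Assumption of this sketch: words without any listed consonant never match.
    """
    classes_a = consonant_classes(word_a)
    return bool(classes_a) and classes_a == consonant_classes(word_b)


if __name__ == "__main__":
    pairs = [
        (["d", "ʏ", "n"], ["θ", "ɪ", "n"]),  # dünn : thin
        (["d", "ʊ", "m"], ["θ", "ɪ", "n"]),  # dumm : thin
    ]
    for a, b in pairs:
        print(consonant_classes(a), consonant_classes(b), turchin_cognate(a, b))
```

With this table, dünn and thin both reduce to TN and are judged cognate, while dumm (TM) and thin (TN) are not. The coarse classes are what let d and θ match despite the sound change.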
perspective on results of cognate detection analyses. JavaScript and HTML5 offer unique possibilities for interactive data visualization. We are currently developing JavaScript tools that
– visualize phonetic alignments of cognate sets, and
– even allow users to edit the data online.
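As a static starting point for such tools, here is a Python sketch that renders an alignment as an HTML5 table. The contenteditable attribute is standard HTML5 and lets a browser edit cell contents, while saving the edits would still need JavaScript. The function name and CSS class names are hypothetical, and this is not the tooling described on the slide.

```python
from html import escape


def alignment_to_html(alignment, editable=False):
    """Render an alignment {language: [segments]} as an HTML5 table string.

    Gap symbols ("-") get the CSS class "gap", all other cells "segment".
    With editable=True, cells carry contenteditable="true" so that a browser
    lets users change them; saving edits would need extra JavaScript.
    """
    extra = ' contenteditable="true"' if editable else ""
    rows = []
    for language, segments in alignment.items():
        cells = []
        for segment in segments:
            css = "gap" if segment == "-" else "segment"
            cells.append(f'<td class="{css}"{extra}>{escape(segment)}</td>')
        row_cells = "".join(cells)
        rows.append(f"<tr><th>{escape(language)}</th>{row_cells}</tr>")
    return '<table class="alignment">\n' + "\n".join(rows) + "\n</table>"


if __name__ == "__main__":
    moon = {
        "English": ["m", "uː", "n", "-", "-"],
        "Danish": ["m", "ɔː", "n", "-", "ə"],
        "Swedish": ["m", "oː", "n", "-", "e"],
    }
    print(alignment_to_html(moon, editable=True))
```

Keeping gaps and segments as separate CSS classes leaves coloring (for example by sound class) to a stylesheet or script.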
First benchmark databases have been compiled and published:
– Benchmark Database of Phonetic Alignments (BDPA, List & Prokić 2014, http://alignments.lingpy.org)
– Benchmark Database for Cognate Detection (BDCD, presented in List 2014, http://sequencecomparison.github.io)
– Benchmark Database for Linguistic Reconstruction (BDLR, in preparation)
All data are given in phonetic transcription (IPA), tokenized into phonemic units, freely available for download, and directly usable in LingPy.
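Tokenization into phonemic units can be sketched in Python as follows. Every base character starts a new segment, while combining diacritics, a few modifier letters (length, aspiration, and similar), tie bars, and runs of superscript tone digits attach to the preceding segment. The symbol lists are illustrative assumptions, and this is not LingPy's tokenizer.

```python
import unicodedata

# Illustrative, incomplete symbol lists (assumptions of this sketch).
MODIFIERS = set("ːˑʰʷʲˠˤ")       # length, half-length, aspiration, labialization, ...
TONE_DIGITS = set("⁰¹²³⁴⁵")      # superscript tone numbers, e.g. ⁴⁴ or ⁵¹
TIE_BARS = {"\u0361", "\u035c"}  # join two symbols into one segment, as in t͡s


def tokenize(ipa):
    """Split an IPA string into segments, attaching diacritics to their base."""
    tokens = []
    join_next = False
    for char in ipa:
        if char.isspace():
            join_next = False
            continue
        attach = bool(tokens) and (
            join_next
            or char in MODIFIERS
            or unicodedata.combining(char) > 0
            or (char in TONE_DIGITS and tokens[-1][-1] in TONE_DIGITS)
        )
        if attach:
            tokens[-1] += char
        else:
            tokens.append(char)
        join_next = char in TIE_BARS  # the symbol after a tie bar is attached too
    return tokens


if __name__ == "__main__":
    for word in ["θɔːn", "kuoŋ⁴⁴", "t\u0361sa"]:
        print(word, tokenize(word))
```

For example, tokenize("kuoŋ⁴⁴") yields k, u, o, ŋ, ⁴⁴, the segmentation used for Měixiàn in the alignment of words for "moon". Diphthongs such as German ɔɐ are split into two segments by this sketch; treating them as one unit requires an explicit list of multi-character segments.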
lexical databases (amount and quality of data), cognate detection algorithms (accessibility and performance), and ways to present the results (interactive visualizations).
Words for "moon", aligned by morpheme:

            MOON
English     m uː n - -
Danish      m ɔː n - ə
Swedish     m oː n - e

            MOON         SHINE        LIGHT
Fúzhōu      ŋ u o ʔ ⁵    - - - - -    - - - - -
Měixiàn     ŋ i a t ⁵    - - - - -    k u o ŋ ⁴⁴
Guǎngzhōu   j - y t ²    l - œ ŋ ²²   - - - - -
Běijīng     - y ɛ - ⁵¹   l i ɑ ŋ -    - - - - -
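The Chinese rows show why word-level cognacy is too coarse: all four words share the MOON morpheme but differ in the second element. A minimal Python sketch, assuming each word is represented by the set of morpheme-level cognate labels read off the alignment above (the data structure and function name are illustrative):

```python
from itertools import combinations

# Morpheme-level cognate labels read off the Chinese rows of the alignment:
# each word is the set of morpheme cognate sets (MOON, SHINE, LIGHT) it contains.
WORDS = {
    "Fúzhōu": {"MOON"},
    "Měixiàn": {"MOON", "LIGHT"},
    "Guǎngzhōu": {"MOON", "SHINE"},
    "Běijīng": {"MOON", "SHINE"},
}


def cognacy(morphemes_a, morphemes_b):
    """Return 'full', 'partial', or 'none' depending on shared morpheme cognate sets."""
    if not morphemes_a & morphemes_b:
        return "none"
    return "full" if morphemes_a == morphemes_b else "partial"


if __name__ == "__main__":
    for a, b in combinations(WORDS, 2):
        print(f"{a} / {b}: {cognacy(WORDS[a], WORDS[b])}")
```

Only Guǎngzhōu and Běijīng come out as fully cognate; every other pair shares just the MOON morpheme and is partially cognate, information that a single word-level cognate label would lose.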
go beyond cognacy, we need methods for borrowing detection (stratic aspect), partial cognate inference (morphological aspect), and cross-semantic cognate inference (semantic aspect). Following the lead of evolutionary biology, these methods should be combined in historical linguistics under a unified framework of tree reconciliation (Page & Cotton 2002).
the child is constantly growing. Enhancing the applicability, transparency, comparability, and accuracy of cognate detection methods is a goal that can be achieved in the near future. The greatest challenge arises from the complexity of lexical change processes. More realistic approaches that go beyond cognacy should be able to handle variation along the stratic, morphological, and semantic dimensions of lexical change. Evolutionary biology offers frameworks that could be employed to achieve these goals, yet it is not entirely clear whether and how this is possible.