Slide 17
Towards a Cognate Detection Benchmark Database
● Benchmark databases are essential to train, test, and evaluate our algorithms
● So far, few benchmark databases are available that could be used for our
tasks
● With the new data we prepared for this study, and the data that was compiled
to test LingPy (List 2014), there are already 18 different datasets which
○ are in more or less clean phonetic encoding
○ are segmented
○ are tagged for cognacy
● We think these should be published as a benchmark database for cognate
detection (BDCD, or “LexiBench”?)
● In this way, we maintain comparability with other algorithms that might be
proposed, and we ease the creation of alternative approaches
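To illustrate how such a benchmark could keep algorithms comparable, here is a minimal sketch in Python. It is not the format of the proposed database and not LingPy's API. The `Entry` record, its field names, and the pairwise precision/recall measure are all assumptions chosen for illustration, and the four words are toy data. Each entry carries the three properties listed above: a phonetic form, its segmentation, and a gold cognate class.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Entry:
    """One benchmark entry (hypothetical layout, for illustration only)."""
    language: str
    concept: str
    segments: tuple  # segmented phonetic transcription
    cogid: int       # gold-standard cognate class


def cognate_pairs(entries, labels):
    """Index pairs of entries that share a concept and a cognate label."""
    return {
        (i, j)
        for i, j in combinations(range(len(entries)), 2)
        if entries[i].concept == entries[j].concept and labels[i] == labels[j]
    }


def pairwise_scores(entries, predicted):
    """Pairwise precision, recall, and F-score of predicted labels vs. gold."""
    gold = cognate_pairs(entries, [e.cogid for e in entries])
    test = cognate_pairs(entries, predicted)
    hits = len(gold & test)
    precision = hits / len(test) if test else 1.0
    recall = hits / len(gold) if gold else 1.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Toy data: four words for the concept "hand" in two gold cognate classes.
ENTRIES = [
    Entry("German", "hand", ("h", "a", "n", "t"), 1),
    Entry("English", "hand", ("h", "æ", "n", "d"), 1),
    Entry("French", "hand", ("m", "ɛ̃"), 2),
    Entry("Italian", "hand", ("m", "a", "n", "o"), 2),
]

# A hypothetical algorithm that wrongly groups French with the Germanic words.
print(pairwise_scores(ENTRIES, [1, 1, 1, 2]))
```

Because every algorithm would be scored against the same gold cognate classes, the resulting numbers are directly comparable. Any other evaluation measure could be substituted for `pairwise_scores` without changing the data layout.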