Slide 9
Challenges for Humans
“There is a severe imbalance of
being data-rich and theory-poor.”
(William S.-Y. Wang, 1996)
● many datasets on South-East Asian languages have been published
(Sidwell 2015, Wang 2004, Huang 1992, etc.)
● large digitized collections have been made available via the STEDT
project (Matisoff 2011)
● but the majority of these data are unprocessed (not further checked
by linguists) and lack etymologies, cognate judgments, phonetic
transcriptions, or concept annotations