- of linguistic reconstruction in China (starting with Chén Dì 陳第, 1541–1606)
- breakthrough in the early 20th century with Karlgren’s reconstructions and impressive work by Wáng Lí 王力 (1980) and Li Fang-kuei 李方桂 (1971)
- since then, steadily improved concrete reconstructions of Old Chinese phonology
- another breakthrough in the 1980s, when Baxter (1992), Starostin (1989), and Zhèngzhāng Shàngfāng (see Zhèngzhāng 2003) presented reconstructions in which they independently proposed several similar features (notably six vowels)
- the current state of the art is the reconstruction by Baxter & Sagart (2014)
- judgments are often not made publicly available
- difficult to compare different judgments without additional data
- no large-scale comparisons of different types of evidence and how scholars interpret them
- although the research is based on very large amounts of data, those data are usually not accessible in a digital format, which makes it difficult for scholars to evaluate and investigate a given reconstruction system, let alone compare different ones
- rhymes in ancient poems
- structure of ancient characters
- reconstruction of Middle Chinese via fǎnqiè readings, rhyme books, and ancient character descriptions (dúruò etc.)
- Sino-Xenic readings
- ...

Interestingly, classical Chinese scholars have in the past investigated many of these types of evidence with the help of rudimentary network approaches.
science can be modeled as networks:
- social networks: nodes are persons, edges are relations between persons (e.g., friendship on Facebook)
- phylogenetic networks: nodes are languages or dialect varieties, edges represent genetic closeness
- networks of sound change patterns: nodes are sounds, directed edges represent the likelihood of a sound change during language evolution
- ...
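The same node-and-edge model carries over to rhyme data. The following is a minimal sketch, assuming a hypothetical input format in which each stanza is a list of (rhyme word, rhyme label) pairs; the words `w1` to `w6` and the labels are placeholders, not real Shījīng data.

```python
from collections import Counter
from itertools import combinations


def rhyme_network(stanzas):
    """Build an undirected, weighted rhyme network.

    Nodes are rhyme words; an edge links two words that carry the same
    rhyme label within one stanza; the weight counts how often that happens.
    """
    nodes = set()
    edges = Counter()
    for stanza in stanzas:
        groups = {}
        for word, label in stanza:
            if label is None:  # line end that does not rhyme
                continue
            groups.setdefault(label, set()).add(word)
        for words in groups.values():
            nodes.update(words)
            for a, b in combinations(sorted(words), 2):
                edges[frozenset((a, b))] += 1
    return nodes, edges


# Placeholder stanzas (hypothetical): two rhyme groups, one non-rhyming line.
STANZAS = [
    [("w1", "a"), ("w2", "a"), ("w3", None), ("w4", "a")],
    [("w1", "a"), ("w2", "a"), ("w5", "b"), ("w6", "b")],
]

nodes, edges = rhyme_network(STANZAS)
# Density: share of realised edges among all possible node pairs.
density = len(edges) / (len(nodes) * (len(nodes) - 1) / 2)
```

Storing each edge as a `frozenset` keeps the network undirected; for directed networks such as sound change patterns, ordered tuples would be the natural replacement.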
- the network
- check externally proposed groupings against groupings emerging from the network itself
- investigate the general dynamics of the network
- should be transparent (easy to comprehend for experts)
- data should be human- and machine-readable (to allow for error-checking and automatic analysis as well as qualitative analysis)
- data should be shared immediately with publications if the publications rely on the data
- over complicated ones
- tabular data can be edited in Excel but should be shared in the form of CSV files
- if data structures cannot be immediately understood, authors should add metadata to describe them
- data can be easily hosted on scientific repositories like Zenodo (zenodo.org), and curation is simple with the help of services such as GitHub or GitBucket
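As an illustration of the CSV-plus-metadata recommendation, here is a minimal sketch using only the Python standard library. The column names (`stanza_id`, `line`, `rhyme_word`, `rhyme_label`) and the metadata layout are assumptions made for this example; they are not the format of any existing dataset.

```python
import csv
import io

# Hypothetical metadata: one human-readable description per column.
METADATA = {
    "columns": {
        "stanza_id": "identifier of the stanza (poem.stanza)",
        "line": "line number inside the stanza",
        "rhyme_word": "the word in rhyme position",
        "rhyme_label": "rhyme group inside the stanza, empty if not rhyming",
    }
}

# Placeholder rows (hypothetical), not real annotations.
ROWS = [
    {"stanza_id": "1.1", "line": 1, "rhyme_word": "w1", "rhyme_label": "a"},
    {"stanza_id": "1.1", "line": 2, "rhyme_word": "w2", "rhyme_label": "a"},
    {"stanza_id": "1.1", "line": 3, "rhyme_word": "w3", "rhyme_label": ""},
]


def dump_csv(rows, fieldnames):
    """Serialise rows to CSV text that humans and machines can both read."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def load_csv(text):
    """Read CSV text back into a list of dictionaries (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def undocumented_columns(rows, metadata):
    """Return the columns that the metadata does not describe."""
    return sorted(set(rows[0]) - set(metadata["columns"]))


text = dump_csv(ROWS, list(METADATA["columns"]))
loaded = load_csv(text)
```

A check such as `undocumented_columns` can run before every release, so that no column is published without a description.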
- try to digitize rhyme judgments of different authors (like Baxter’s 1992 Shījīng readings, Wáng’s 1980 Shījīng readings, etc.)
- make the data comparable, both for humans and machines
- make the data freely available on the web and on Zenodo

Current state of the art: http://github.com/digling/rhymes.
- application displays Shījīng rhymes in digitized form, with rhyme annotations following Baxter (1992) and rhyme readings following Baxter and Sagart (2014) and Pān (2000, as provided in the Thesaurus Linguae Sericae)
- offers a quick and transparent way to inspect Baxter’s rhyme annotations, as well as a quick way to search through the Shījīng for rhyme patterns and brief glosses
- URL: http://digling.org/shijing
press)
- Ho (2016) claims that the principle of vowel purity was important in Old Chinese rhyming: poets would try to avoid rhyming words with different vowels, while differences in the codas were more often tolerated
- reconstruction systems which contradict this principle may therefore be criticized on external grounds for neglecting vowel purity
- on the other hand, we can compare different reconstructions with respect to the degree of vowel purity they yield for the rhyme data in the Shījīng
Density a ɑ æ e ə o ɔ u ʊ ɯ i
Karlgren (1957) 1830 0.0031 0.0026 x x x x x x x x x x x
Li 李方桂 (1971) 1830 0.0031 0.0026 x x x x
Wáng 王力 (1980) 1830 0.0031 0.0026 x x x x x
Zhèngzhāng 鄭張尚芳 (2003) 1830 0.0031 0.0030 x x x x x x
Starostin (1989) 1358 0.0035 0.0026 x x x x x x
Pān 潘悟雲 (2000) 1830 0.0031 0.0026 x x x x x
Baxter and Sagart (2014) 1431 0.0038 0.0033 x x x x x x
Schuessler (2007) 1224 0.0041 0.0035 x x x x x x
sharing connections in a graph are also similar regarding other characteristics (Newman 2003). In social network analyses it can, for example, be used to test whether observed patterns in a network, like friendship, come along with properties of the individuals, such as language or gender (ibid.). Assortativity can be measured by calculating the assortativity coefficient of a network in which all nodes have a given attribute.
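For a categorical node attribute, the assortativity coefficient can be computed from the mixing matrix e, where e[i, j] is the fraction of edge ends that run from attribute i to attribute j: r = (Σ e[i, i] − Σ a[i]·b[i]) / (1 − Σ a[i]·b[i]). The sketch below is a plain-Python reading of that formula for undirected networks; the function name and the toy data are illustrative assumptions, with the vowel of a given reconstruction standing in as the node attribute.

```python
def attribute_assortativity(edges, attribute):
    """Assortativity coefficient for a categorical node attribute.

    edges: iterable of (u, v) pairs of an undirected network.
    attribute: dict mapping each node to its attribute value.
    Undefined (division by zero) if all nodes share one attribute value.
    """
    edges = list(edges)
    total = 2 * len(edges)  # every undirected edge is counted in both directions
    mixing = {}
    for u, v in edges:
        for pair in ((attribute[u], attribute[v]), (attribute[v], attribute[u])):
            mixing[pair] = mixing.get(pair, 0.0) + 1 / total
    types = {t for pair in mixing for t in pair}
    trace = sum(mixing.get((t, t), 0.0) for t in types)
    # The matrix is symmetric here, so row sums a and column sums b coincide.
    row = {t: sum(mixing.get((t, s), 0.0) for s in types) for t in types}
    expected = sum(value ** 2 for value in row.values())
    return (trace - expected) / (1 - expected)


# Placeholder words with hypothetical reconstructed vowels.
VOWEL = {"w1": "a", "w2": "a", "w3": "e", "w4": "e"}
pure = attribute_assortativity([("w1", "w2"), ("w3", "w4")], VOWEL)
impure = attribute_assortativity([("w1", "w3"), ("w2", "w4")], VOWEL)
```

With these toy edges, `pure` is 1.0 (only words with the same vowel rhyme) and `impure` is -1.0 (only words with different vowels rhyme); real rhyme networks would fall somewhere in between.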
purity was really dominant at the time the Shījīng was created, our results indicate that reconstruction systems with six vowels outperform those with fewer or more vowels. Given that we do not know to what degree vowel purity was important in Old Chinese rhyming, this does not allow us to prove or disprove any of the reconstruction systems. Further research on rhyming practice and pragmatics is needed.
vowel purity paper remains.
- We tested on Baxter’s (1992) rhyme judgments, which could easily have biased the results in favor of the Baxter-Sagart system and of six-vowel systems in general!
- We defended this with the availability of data: when we carried out the initial studies, no digital version of alternative rhyme judgments was available.
- This is different now with the CHIP initiative, which has just managed to digitize Wáng’s (1980) rhyme judgments along with the reconstructions.
difference in rhyme judgments in Baxter (1992) and Wáng (1980).
- A simple measure is to count how many stanzas differ: of 1014 common stanzas [1], 131 differ between Wáng and Baxter (12.9%).
- A far more useful measure is to compare by how much the stanzas differ. If the rhyme judgments are treated as a clustering task, B-Cubed scores are a well-suited measure (Amigó et al. 2009). Applying B-Cubed scores to the rhyme judgments, we find 96.8% similarity between Baxter’s and Wáng’s rhyme judgments.
- A new rhyme browser has now been created which contrasts the rhymes of Wáng (1980) and Baxter (1992); it is available at http://digling.org/shijing/wangli/.

[1] I excluded those with multi-morpheme rhymes.
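Both measures can be sketched in a few lines. The B-Cubed function below follows the standard per-item definition of precision and recall for two clusterings of the same items; how the scores were aggregated over stanzas in the actual study is not stated here, so the input format (one dictionary per scholar, mapping rhyme words to rhyme labels) and the placeholder words are assumptions.

```python
def bcubed(reference, candidate):
    """B-Cubed precision, recall and F-score between two clusterings.

    Both arguments map the same items to cluster labels; the label names
    themselves do not need to match across the two clusterings.
    """
    def cluster_of(clustering):
        groups = {}
        for item, label in clustering.items():
            groups.setdefault(label, set()).add(item)
        return {item: groups[label] for item, label in clustering.items()}

    ref, cand = cluster_of(reference), cluster_of(candidate)
    items = list(reference)
    precision = sum(len(ref[i] & cand[i]) / len(cand[i]) for i in items) / len(items)
    recall = sum(len(ref[i] & cand[i]) / len(ref[i]) for i in items) / len(items)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Share of differing stanzas, using the figures reported above.
share_differing = 131 / 1014  # about 12.9%

# Placeholder judgments (hypothetical) for one stanza by two scholars.
SCHOLAR_A = {"w1": "a", "w2": "a", "w3": "b", "w4": "b"}
SCHOLAR_B = {"w1": "x", "w2": "x", "w3": "x", "w4": "y"}
precision, recall, f_score = bcubed(SCHOLAR_A, SCHOLAR_B)
```

Unlike the stanza count, the B-Cubed score still gives credit when two scholars agree on most rhyme words of a stanza and disagree on only one.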
since been known that fǎnqiè 反切 readings can also be analyzed by exploiting their network characteristics (see, e.g., Gěng Zhènshēng 耿振生 2004 on the fǎnqiè xìliánfǎ 反切系聯法).
- But with modern network approaches, we can handle the data more consistently and transparently.
- By extracting, for example, all fǎnqiè shàngzì 反切上字 from the Guǎngyùn 廣韻, we can create networks of fǎnqiè connections.
- These networks are ideal for teaching Chinese traditional phonology, but also for comparison when scholars have different opinions.
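A minimal sketch of such a network, assuming the fǎnqiè spellings are available as a simple mapping from character to its two spellers. The eight entries below were typed in by hand for illustration and should be checked against a proper digital edition of the Guǎngyùn; the function names are likewise only illustrative.

```python
from collections import defaultdict

# Character -> fǎnqiè spelling (upper speller first). Illustrative entries only.
FANQIE = {
    "東": "德紅", "德": "多則", "多": "得何", "得": "多則",
    "都": "當孤", "當": "都郎",
    "公": "古紅", "古": "公戶",
}


def upper_speller_network(fanqie):
    """Link every character to its upper speller (shàngzì) in an undirected graph."""
    graph = defaultdict(set)
    for character, spelling in fanqie.items():
        upper = spelling[0]
        graph[character].add(upper)
        graph[upper].add(character)
    return graph


def linked_groups(graph):
    """Connected components: characters chained together through their spellers."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(group)
    return groups


groups = linked_groups(upper_speller_network(FANQIE))
```

In this toy sample the characters fall into three linked groups; cases where a traditional analysis merges groups that the network keeps apart are exactly the places where an explicit representation makes a scholar's additional assumptions visible.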
still underexplored, both with respect to traditional scholarship on Chinese historical phonology and with respect to the way they are best handled, as well as with respect to potential differences across Chinese rhyme books or other sources containing fǎnqiè readings from different epochs or authors. However, it seems promising to further exploit and test the approaches, as they may drastically increase the transparency of current approaches.
段玉裁 detected the strong correlation between the phonetic part of xíngshēng 形聲 characters, we know that the Chinese system basically reflects a network structure, since a large part of the characters can be decomposed into subparts which reflect other characters or recur across different characters.
- The historical aspect of these connections is rarely taken into consideration. Not all phonetic units of xíngshēng characters were formed at the same time, and the characters reflect a complex evolution of character formation in different steps.
- Instead of presenting xíngshēng series as lists of characters with a common component, we should create explicit networks, as they show much more transparently where scholars disagree in their analyses, and also which characters are immediately composed of other characters.
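A directed network of this kind can be sketched as a set of (phonetic, derived character) edges. In the example below, `ANALYSIS_A` lists a few commonly cited phonetic relations, while `ANALYSIS_B` is a hypothetical variant invented solely to show how a disagreement surfaces; the sketch further assumes that every character has at most one phonetic and that there are no cycles.

```python
# Directed edges: (phonetic component, character built on it).
ANALYSIS_A = {
    ("工", "江"), ("工", "紅"), ("工", "空"),
    ("可", "何"), ("可", "河"), ("何", "荷"),
}
# Hypothetical second analysis that derives 河 from 何 instead of 可.
ANALYSIS_B = (ANALYSIS_A - {("可", "河")}) | {("何", "河")}


def disagreements(first, second):
    """Edges proposed by only one of the two analyses."""
    return sorted(first - second), sorted(second - first)


def derivation_depth(edges):
    """Number of derivation steps separating each character from a root phonetic."""
    parent = {derived: phonetic for phonetic, derived in edges}

    def depth(character):
        if character not in parent:
            return 0
        return 1 + depth(parent[character])

    characters = {c for edge in edges for c in edge}
    return {c: depth(c) for c in characters}


only_a, only_b = disagreements(ANALYSIS_A, ANALYSIS_B)
depth_a = derivation_depth(ANALYSIS_A)
depth_b = derivation_depth(ANALYSIS_B)
```

The depth values give a first, rough handle on which characters are immediately composed of other characters (depth 1) and which rest on an already derived character (depth 2 or more).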
especially aspects of character formation, with help of directed networks could greatly benefit not only scientific exchange among scholars, who would be encouraged to present their judgments more transparently, but also other aspects of Chinese writing, such as pedagogical aspects of teaching the structure of the writing system to beginners, or information-theoretic aspects.
data we model in networks can be enhanced, also our methods to analyze the networks need to be further improved. As an example, consider dynamic networks, which would analyze and model network changes over time. By improving on these methods, we could, for example, compare fǎnqiè networks across different epochs, as well as rhyme networks from different authors, dialects, and styles. We could further try to induce fundamental hierarchies and relative time frames from xiéshēng networks.
the last years in my research is that despite the great achievements scholars have made in historical linguistics, and especially in Chinese traditional phonology, we still lack clear-cut frameworks that help us to produce our data transparently. Historical linguistics is a data-driven discipline, but scholars tend to ignore this when presenting their incredible insights in a non-transparent form. Networks can help in two ways here: first, they are a transparent way of representing data; and second, they provide added value in those cases where the data become too large for scholars to inspect by eyeballing alone.