Slide 1

Slide 1 text

Using Network Models to Analyze Old Chinese Rhyme Data
Johann-Mattis List (CRLAO, Paris)

Slide 2

Slide 2 text

Rhymes and Networks

Slide 3

Slide 3 text

Rhymes Lose yourself in the music the moment you own it you better never let it go you only get one shot do not miss your chance to blow this opportunity comes once in a lifetime… (Eminem, “Lose yourself”, 2002)

Slide 4

Slide 4 text

Rhymes Lose yourself in the music the moment you own it you better never let it go you only get one shot do not miss your chance to blow this opportunity comes once in a lifetime… (Eminem, “Lose yourself”, 2002)

Slide 5

Slide 5 text

Rhymes Lose yourself in the music [-ɪk] ? [ɔi] the moment you own it [-ɪt] ? [ai] you better never let it go you only get one shot do not miss your chance to blow this opportunity comes once in a lifetime… (Eminem, “Lose yourself”, 2002)

Slide 6

Slide 6 text

Rhymes music [-ɪk] own it [-ɪt] But Germans would rhyme employ and deny! Mai [-ɔi] neu [-ai]

Slide 7

Slide 7 text

Networks

Slide 8

Slide 8 text

Networks

Slide 9

Slide 9 text

Networks

Slide 10

Slide 10 text

Networks

Slide 11

Slide 11 text

From Rhymes to Networks

Slide 12

Slide 12 text

From Rhymes to Networks

Slide 13

Slide 13 text

From Rhymes to Networks

Slide 14

Slide 14 text

From Rhymes to Networks

Slide 15

Slide 15 text

From Rhymes to Networks

Slide 16

Slide 16 text

From Rhymes to Networks

Slide 17

Slide 17 text

Constructing a Shījīng Network

Slide 18

Slide 18 text

Data Preparation
The starting point of the Shījīng network constructed for this talk is the set of rhyme annotations given in the appendix of Baxter (1992). Since the data was not digitally available, I transferred Baxter’s annotations to a digital version of the Shījīng (Project Gutenberg) and checked it against additional digital versions (e.g., http://ctext.org).

Slide 19

Slide 19 text

Data Preparation
The digital version was corrected during this process wherever the comparison with other versions and with Baxter’s data showed that it contained errors. In addition, I had access to a digital collection of most of the Old Chinese reconstructions in the new OCBS system (provided by L. Sagart).

Slide 20

Slide 20 text

Data Organization
Once the data was prepared in this way, the Shījīng was organized into:
● poems (numbered 1, 2, 3, etc.)
● stanzas (numbered 1.1, 1.2, etc.)
● sections (the parts ending in a comma or full stop, in which the rhyme words normally occur; numbered within each stanza as 1, 2, 3, etc.)
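The three levels above could be held in a small data structure such as the following. This is only a sketch under assumptions: the class names, the field names, and the idea of a `rhyme_label` that ties together the sections rhyming with each other are hypothetical and are not taken from the actual dataset. The example words are made-up placeholders, not Shījīng data.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Section:
    """One section of a stanza (hypothetical layout)."""
    number: int                        # position within the stanza: 1, 2, 3, ...
    text: str                          # text of the section
    rhyme_word: Optional[str] = None   # word in rhyme position, if annotated
    rhyme_label: Optional[str] = None  # assumed label shared by rhyming sections


@dataclass
class Stanza:
    poem: int                          # poem number: 1, 2, 3, ...
    number: int                        # stanza number within the poem
    sections: list = field(default_factory=list)

    @property
    def stanza_id(self) -> str:
        # Stanzas are numbered as "poem.stanza", e.g. "1.1".
        return f"{self.poem}.{self.number}"

    def rhyme_groups(self) -> dict:
        # Collect the annotated rhyme words per rhyme label.
        groups = {}
        for section in self.sections:
            if section.rhyme_word is not None and section.rhyme_label is not None:
                groups.setdefault(section.rhyme_label, []).append(section.rhyme_word)
        return groups


# Placeholder example: three of four sections carry a rhyme word.
example_stanza = Stanza(
    poem=1,
    number=1,
    sections=[
        Section(1, "line one", "A", "x"),
        Section(2, "line two", "B", "x"),
        Section(3, "line three"),
        Section(4, "line four", "C", "x"),
    ],
)
```

Keeping unannotated sections in the structure (with `rhyme_word` left empty) makes it possible to add an alternative annotation later without changing the layout.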

Slide 21

Slide 21 text

Data Organization
If a section contained a rhyme word according to Baxter’s annotation, this was noted as such. If I detected further rhyme words or had reasons to disagree with Baxter’s annotation, this was noted in an alternative annotation. For each section, I tried to identify the Old Chinese readings in the OCBS system. This was not possible in all cases: some 400 readings are missing, and I have not yet had time to check them.

Slide 22

Slide 22 text

Data Inspection and Presentation
The Shījīng dataset was converted into an interactive web application that can be used to browse rhyming patterns in the Shījīng.
● http://digling.org/shijing/

Slide 23

Slide 23 text

Network Reconstruction
The network was reconstructed as follows:
● all characters that occur in the Shījīng in a position annotated as “rhyming” by Baxter are represented as nodes
● a link is drawn between two characters whenever they are annotated as rhyming with each other in a given poem
● the number of separate stanzas in which two characters rhyme was counted and assigned as the edge weight
● node weights were derived from the number of times the rhyme words occur in the Shījīng
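The reconstruction steps can be sketched roughly as follows. This is not the code used for the talk: the input layout (a dictionary from stanza identifier to a list of rhyme groups) and the function name are assumptions, node weights are counted here only over rhyme positions, and the characters are made-up placeholders.

```python
from collections import Counter
from itertools import combinations


def build_rhyme_network(stanzas):
    """Build node and edge weights from rhyme groups.

    `stanzas` maps a stanza id to a list of rhyme groups, each group being
    a list of characters annotated as rhyming with each other (assumed layout).
    """
    node_weights = Counter()
    edge_weights = Counter()
    for stanza_id, groups in stanzas.items():
        pairs_in_stanza = set()
        for group in groups:
            # Node weight: every occurrence in rhyme position counts.
            node_weights.update(group)
            # Every pair of distinct characters in a group is linked.
            for a, b in combinations(sorted(set(group)), 2):
                pairs_in_stanza.add((a, b))
        # Edge weight: number of separate stanzas in which the pair rhymes.
        edge_weights.update(pairs_in_stanza)
    return node_weights, edge_weights


# Placeholder data, not taken from the Shījīng.
example_stanzas = {
    "1.1": [["A", "B", "C"]],
    "1.2": [["A", "B"], ["D", "E"]],
}
example_nodes, example_edges = build_rhyme_network(example_stanzas)
```

In this toy example the pair A and B rhymes in two separate stanzas and therefore receives an edge weight of 2, while all other pairs receive a weight of 1.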

Slide 24

Slide 24 text

Network Reconstruction
The data was normalized:
● by counting every pair of identical lines only once, so that repeated phrases do not carry too much weight
The data should be normalized further (but there was no time for this analysis):
● by scaling the weight of each rhyme occurrence by the size of the rhyme group in which it occurs, so that links are not overcounted in poems with very large groups of rhyming words per stanza
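The second normalization was not carried out for the talk, so the following is only one conceivable scheme, stated here as an assumption: in a rhyme group with n distinct words, every pair contributes 1/(n-1) instead of 1, so that each word spreads a total weight of 1 over its partners regardless of how large the group is. The function name and the placeholder words are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def normalized_edge_weights(groups):
    """Sum group-size-normalized link weights over a list of rhyme groups.

    Assumed scheme: a pair in a group of n distinct words adds 1 / (n - 1).
    """
    weights = defaultdict(float)
    for group in groups:
        words = sorted(set(group))
        n = len(words)
        if n < 2:
            continue  # a single word cannot form a rhyme pair
        share = 1.0 / (n - 1)
        for a, b in combinations(words, 2):
            weights[(a, b)] += share
    return dict(weights)


# Placeholder groups: one pair and one large group of four words.
example_weights = normalized_edge_weights([["A", "B"], ["A", "B", "C", "D"]])
```

With this scheme, the link between A and B from the two-word group counts fully, while each of the six links in the four-word group counts only one third.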

Slide 25

Slide 25 text

Analyzing the Shījīng Network

Slide 26

Slide 26 text

A Bird’s Eye View

Slide 27

Slide 27 text

A Bird’s Eye View

Slide 28

Slide 28 text

A Bird’s Eye View

Slide 29

Slide 29 text

A Bird’s Eye View

Slide 30

Slide 30 text

A Bird’s Eye View

Slide 31

Slide 31 text

Searching for Structure Coding Rhymes by Vowel Quality: ■ a ■ e ■ i ■ o ■ u ■ ə

Slide 32

Slide 32 text

Transitions between Rhyme Groups...

Slide 33

Slide 33 text

Transitions between Rhyme Groups...
Jaccard index computed for the number of transitions between the OCBS rhyme groups and inside the same group. Most of the groups are strongly recovered; some, however, occur so infrequently that they rhyme more often with words of other groups than with words of their own group.
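The Jaccard index of two sets is the size of their intersection divided by the size of their union. The slide does not spell out which sets were compared, so the sketch below makes an assumption: for each rhyme group it takes the set of rhyme links that touch at least one word of that group, and compares these link sets between two different groups. The helper names, group labels, and words are hypothetical placeholders.

```python
def jaccard(a, b):
    """Jaccard index of two collections: |a ∩ b| / |a ∪ b|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def edges_touching(edges, group_of, group):
    """Return the rhyme links in which at least one word belongs to `group`."""
    return {
        (x, y) for (x, y) in edges
        if group in (group_of[x], group_of[y])
    }


# Placeholder data: two rhyme groups and three rhyme links.
example_group_of = {"A": "g1", "B": "g1", "C": "g2", "D": "g2"}
example_links = [("A", "B"), ("A", "C"), ("C", "D")]

links_g1 = edges_touching(example_links, example_group_of, "g1")
links_g2 = edges_touching(example_links, example_group_of, "g2")
transition_score = jaccard(links_g1, links_g2)
```

Under this reading, the links shared by two different groups are exactly the transitions between them, so a score close to 0 means the two groups are kept well apart in rhyming.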

Slide 34

Slide 34 text

Searching for Independent Structure: Communities

Slide 35

Slide 35 text

Searching for Independent Structure: Communities

Slide 36

Slide 36 text

Searching for Independent Structure: Communities

Slide 37

Slide 37 text

Searching for Independent Structure: Communities

Slide 38

Slide 38 text

Searching for Independent Structure: Communities

Slide 39

Slide 39 text

Searching for Independent Structure: Communities

Slide 40

Slide 40 text

Searching for Independent Structure: Communities
An Infomap community detection analysis was carried out on the rhyme data. Infomap (Rosvall and Bergstrom 2008) is a fast community detection algorithm that performs very well. It handles weighted nodes and weighted edges, and uses random walks through the network to determine the best partition into communities.
● Results can be inspected at http://digling.org/shijing/infomap.html

Slide 41

Slide 41 text

Searching for Independent Structure: Communities

Slide 42

Slide 42 text

Searching for Independent Structure: Communities
The communities that were identified do not necessarily coincide with the OCBS reconstructions of the rhymes, but they show interesting transitions between rhyme groups as well as splits. Interpreting the data, however, is difficult, since:
● a community identified by Infomap is not necessarily homogeneous, because rhyming is not homogeneous
● a split of words from the same rhyme group into two communities does not imply that they do not rhyme
● we always need to go back to the real data and see what is going on there
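One simple way to inspect such mismatches is to cross-tabulate community membership against rhyme groups. The sketch below assumes two hypothetical dictionaries, one mapping each character to its community and one mapping it to its OCBS rhyme group; all names and values are placeholders and not results from the actual analysis.

```python
from collections import Counter, defaultdict


def community_profiles(community_of, group_of):
    """Count, for every community, how many members belong to each rhyme group."""
    profiles = defaultdict(Counter)
    for char, community in community_of.items():
        profiles[community][group_of.get(char, "unknown")] += 1
    return dict(profiles)


def split_groups(community_of, group_of):
    """Return the rhyme groups whose members fall into more than one community."""
    communities = defaultdict(set)
    for char, community in community_of.items():
        communities[group_of.get(char, "unknown")].add(community)
    return {g: sorted(c) for g, c in communities.items() if len(c) > 1}


# Placeholder assignments for five made-up characters.
example_community_of = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2}
example_group_of = {"A": "g1", "B": "g1", "C": "g2", "D": "g2", "E": "g2"}

example_profiles = community_profiles(example_community_of, example_group_of)
example_splits = split_groups(example_community_of, example_group_of)
```

Here community 1 is mixed (two words of g1, one of g2) and group g2 is split across communities 1 and 2. As the slide stresses, such a table only points to places worth checking in the poems themselves; it does not show that the split words fail to rhyme.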

Slide 43

Slide 43 text

A Closer View at Specific Patterns: The *r-Coda
Jaccard index for shared rimes of characters with Old Chinese coda *r, *j, and *n according to the OCBS.

Slide 44

Slide 44 text

A Closer View at Specific Patterns: *ar, *aj, and *an Coding Rhymes by Coda: ■ an ■ ar ■ aj ■ unclear

Slide 45

Slide 45 text

A Closer View at Specific Patterns: *ar, *aj, and *an Coding Rhymes by Coda: ■ an ■ ar ■ aj ■ unclear

Slide 46

Slide 46 text

A Closer View at Specific Patterns: *ar, *aj, and *an Coding Rhymes by Coda: ■ an ■ ar ■ aj ■ unclear

Slide 47

Slide 47 text

A Closer View at Specific Patterns: *ar, *aj, and *an Coding Rhymes by Coda: ■ an ■ ar ■ aj ■ unclear

Slide 48

Slide 48 text

A Closer View at Specific Patterns: *ar, *aj, and *an Coding Rhymes by Coda: ■ an ■ ar ■ aj ■ unclear

Slide 49

Slide 49 text

A Closer View at Specific Patterns: *ar, *aj, and *an Coding Rhymes by Coda: ■ an ■ ar ■ aj ■ unclear Community 16 (Infomap)

Slide 50

Slide 50 text

A Closer View at Specific Patterns: *ar, *aj, and *an Coding Rhymes by Coda: ■ an ■ ar ■ aj ■ unclear Community 16 (Infomap)

Slide 51

Slide 51 text

A Closer View at Specific Patterns: *ar, *aj, and *an Coding Rhymes by Coda: ■ an ■ ar ■ aj ■ unclear Community 16 (Infomap)

Slide 52

Slide 52 text

Outlook

Slide 53

Slide 53 text

Where are we? Well…
● Rhyme analysis based on network approaches is still strictly experimental.
● We need to improve the data (missing readings, swapped lines in the Shījīng text).
● We need to improve the models (better normalization!).

Slide 54

Slide 54 text

Where are we? But…
● Already at this stage, it turns out to be useful to inspect the automatically identified clusters when in doubt about the reading of a certain character.
● It is generally useful to employ interactive visualization techniques when dealing with large amounts of data (especially when it comes from different sources).
● Tools like the “Shījīng rhyme browser” are especially useful for beginners, but probably also for experts (?).

Slide 55

Slide 55 text

Where could we be? Imagine…
● A world in which we have large collections of rhyme networks for all kinds of poetry, ranging from Shakespeare via Bob Dylan to Eminem.
● We could gather important information on rhyming behaviour, both cross-cultural and culture-specific.
● We could track the emergence of hip hop, or the degradation of rhyme patterns in modern poetry, or even the influence of the “Judas!” call on Bob Dylan’s rhyming practice...

Slide 56

Slide 56 text

Where could we be? Imagine (now seriously)…
● that we could carry out large-scale comparisons of rhyming practice in different stages of Chinese
● that we could transparently present our individual assessments of what we think rhymed in pieces of Old Chinese poetry
● that we could trace the history of Chinese in poetry networks
… doesn’t that sound like it could be interesting?

Slide 57

Slide 57 text

Thanks to Laurent Sagart and William Baxter for helpful discussion, tips, ideas, and data!

Slide 58

Slide 58 text

Thanks to Laurent Sagart and William Baxter for helpful discussion, tips, ideas, and data! Thanks to Bob Dylan, Eminem, Shakespeare, and all the other poets out there!

Slide 59

Slide 59 text

Thanks to You, for Your Attention! Thanks to Laurent Sagart and William Baxter for helpful discussion, tips, ideas, and data! Thanks to Bob Dylan, Eminem, Shakespeare, and all the other poets out there!

Slide 60

Slide 60 text

Thanks to You, for Your Attention! ranks bankes blanks francs tanks convention dimension green gentian comprehension suspension Thanks to Laurent Sagart and William Baxter for helpful discussion, tips, ideas, and data! Thanks to Bob Dylan, Eminem, Shakespeare, and all the other poets out there!