Using network models to analyze Old Chinese rhyme data

Talk held at the workshop "Recent Advances in Old Chinese Historical Phonology" (2015/11/05-06, SOAS, London).

Johann-Mattis List

November 05, 2015

Transcript

  1. Using Network Models to Analyze Old Chinese Rhyme Data

    Johann-Mattis List (CRLAO, Paris)
  2. Rhymes and Networks

  3. Rhymes

    Lose yourself in the music the moment you own it you better never let it go you only get one shot do not miss your chance to blow this opportunity comes once in a lifetime… (Eminem, “Lose yourself”, 2002)
  5. Rhymes

    Lose yourself in the music [-ɪk] ? [ɔi] the moment you own it [-ɪt] ? [ai] you better never let it go you only get one shot do not miss your chance to blow this opportunity comes once in a lifetime… (Eminem, “Lose yourself”, 2002)
  6. Rhymes

    music [-ɪk], own it [-ɪt]. But Germans would rhyme employ and deny! Mai [-ai], neu [-ɔi]
  7. Networks

  8. Networks

  9. Networks

  10. Networks

  11. From Rhymes to Networks

  12. From Rhymes to Networks

  13. From Rhymes to Networks

  14. From Rhymes to Networks

  15. From Rhymes to Networks

  16. From Rhymes to Networks

  17. Constructing a Shījīng Network

  18. Data Preparation

    The starting point of the Shījīng network constructed for this talk is the set of rhyme annotations given in the appendix of Baxter (1992). Since the data was not digitally available, I transferred Baxter’s annotations to a digital version of the Shījīng (Project Gutenberg) and checked it against additional digital versions (e.g., http://ctext.org).
  19. Data Preparation

    The digital version was corrected during this process wherever the comparison with other versions and with Baxter’s data showed that it contained errors. Furthermore, I had a digital collection of most of the Old Chinese reconstructions given in the new OCBS system (provided by L. Sagart).
  20. Data Organization

    Once the data was prepared in this way, the Shījīng was organized into:
    • poems (numbered 1, 2, 3, etc.)
    • stanzas (numbered 1.1, 1.2, etc.)
    • sections (parts ending in a comma or full stop, in which the rhyme words normally occur; numbered within each stanza as 1, 2, 3, etc.)
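The poem, stanza, and section numbering described above could be represented roughly as follows. This is a minimal sketch, not the code behind the talk: the `Section` class and the `split_stanza` helper are hypothetical names, and the example text is the opening stanza of poem 1.

```python
import re
from dataclasses import dataclass


@dataclass
class Section:
    poem: int    # poem number (1, 2, 3, ...)
    stanza: int  # stanza number within the poem
    number: int  # section number within the stanza
    text: str    # characters of the section, punctuation removed

    @property
    def stanza_id(self) -> str:
        # Stanzas are labelled "poem.stanza", e.g. "1.1".
        return f"{self.poem}.{self.stanza}"


def split_stanza(poem: int, stanza: int, text: str) -> list:
    """Split a stanza into sections at commas and full stops.

    Assumption: only Chinese or Western commas and full stops
    mark section boundaries.
    """
    parts = [part.strip() for part in re.split(r"[，。,.]", text)]
    parts = [part for part in parts if part]
    return [Section(poem, stanza, i, part) for i, part in enumerate(parts, start=1)]


sections = split_stanza(1, 1, "關關雎鳩，在河之洲。窈窕淑女，君子好逑。")
for section in sections:
    print(section.stanza_id, section.number, section.text)
```

Rhyme annotations and OCBS readings could then be attached to each section as extra fields.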
  21. Data Organization

    If a section contained a rhyme word according to Baxter’s annotation, this was noted as such. If I detected further rhyme words or had reasons to disagree with Baxter’s annotation, this was noted in an alternative annotation. For each section, I tried to identify the Old Chinese readings of the OCBS system. This was not possible in all cases. Some 400 readings are missing, and I have not yet had time to check them.
  22. Data Inspection and Presentation

    The Shījīng dataset was converted to an interactive web application that can be used to browse rhyming patterns in the Shījīng.
    • http://digling.org/shijing/
  23. Network Reconstruction

    The network was reconstructed as follows:
    • All characters that occur in the Shījīng in a position annotated as “rhyming” in Baxter’s annotation are represented as nodes.
    • A link is drawn between two characters whenever they are annotated as rhyming with each other in a given poem.
    • The number of separate stanzas in which two characters rhyme was counted and assigned as the edge weight.
    • Node weights were derived from the number of times the rhyme words occur in the Shījīng.
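The reconstruction steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the original implementation: the function name `build_rhyme_network` and the input layout (stanza id mapped to a list of rhyme groups) are hypothetical, node weights are taken to be counts of occurrences in rhyme position, and the characters are placeholders rather than real Shījīng data.

```python
from collections import Counter
from itertools import combinations


def build_rhyme_network(stanzas):
    """Build node and edge weights from annotated rhyme groups.

    stanzas maps a stanza id to a list of rhyme groups; each group is
    a list of rhyme words in order of occurrence (assumed layout).
    """
    node_weights = Counter()
    edge_weights = Counter()
    for stanza_id, groups in stanzas.items():
        stanza_edges = set()
        for group in groups:
            # Every occurrence of a rhyme word adds to its node weight.
            node_weights.update(group)
            # Link every pair of distinct words in the same rhyme group.
            for pair in combinations(sorted(set(group)), 2):
                stanza_edges.add(pair)
        # A pair counts once per stanza, however often it recurs there.
        for pair in stanza_edges:
            edge_weights[pair] += 1
    return node_weights, edge_weights


# Placeholder characters, not real data.
toy = {"1.1": [["A", "B", "C"]], "1.2": [["A", "B"], ["D", "E"]]}
nodes, edges = build_rhyme_network(toy)
print(nodes["A"], edges[("A", "B")])
```

Sorting each pair keeps the network undirected, so that A rhyming with B and B rhyming with A land on the same edge.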
  24. Network Reconstruction

    The data was normalized:
    • by counting every pair of identical lines only once, so that repeated phrases do not carry too much weight.
    The data should be normalized further (but there was no time for this analysis):
    • by adjusting the weight of each rhyme occurrence to the size of the rhyme group in which it occurs, so that links are not overcounted in poems with very large groups of rhyming words per stanza.
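The second normalization was not carried out for the talk, so the following is only one conceivable scheme, not the author's method. It assumes that each link in a rhyme group of n distinct words contributes 1/(n-1), so that every word receives a total weight of 1 per group regardless of group size. The function name and the placeholder characters are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def size_normalized_weights(groups):
    """Weight rhyme links inversely to the size of their rhyme group."""
    weights = defaultdict(float)
    for group in groups:
        words = sorted(set(group))
        if len(words) < 2:
            continue  # a single word cannot form a link
        # Each word has len(words) - 1 partners in the group, so its
        # links add up to exactly 1 for this group.
        share = 1.0 / (len(words) - 1)
        for pair in combinations(words, 2):
            weights[pair] += share
    return dict(weights)


# Placeholder characters: one small and one large rhyme group.
weights = size_normalized_weights([["A", "B"], ["A", "B", "C", "D", "E"]])
print(weights[("A", "B")], weights[("C", "D")])
```

Under this scheme, a pair that rhymes in a two-word group counts four times as much as a pair that co-occurs in a five-word group.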
  25. Analyzing the Shījīng Network

  26. A Bird’s Eye View

  27. A Bird’s Eye View

  28. A Bird’s Eye View

  29. A Bird’s Eye View

  30. A Bird’s Eye View

  31. Searching for Structure

    Coding Rhymes by Vowel Quality: ▪ a ▪ e ▪ i ▪ o ▪ u ▪ ə
  32. Transitions between Rhyme Groups...

  33. Transitions between Rhyme Groups...

    Jaccard index computed for the number of transitions between the OCBS rhyme groups and inside the same group. Most of the groups are strongly recovered. Some, however, occur so infrequently that they rhyme more often with other words than with words of their own group.
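The Jaccard index of two sets is the size of their intersection divided by the size of their union. The slides do not spell out which sets were compared, so the sketch below assumes one plausible reading: each rhyme group is represented by the set of characters its members rhyme with. The helper names and the placeholder edges are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def partners(edges, members):
    """Collect every character linked to at least one member of a group."""
    members = set(members)
    found = set()
    for x, y in edges:
        if x in members:
            found.add(y)
        if y in members:
            found.add(x)
    return found


# Placeholder rhyme links and two hypothetical rhyme groups.
links = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
group_one = partners(links, {"A", "B"})
group_two = partners(links, {"D", "E"})
print(jaccard(group_one, group_two))
```

A value near 1 would mean that two groups share almost all of their rhyme partners, and a value near 0 that they rarely rhyme with the same words.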
  34. Searching for Independent Structure: Communities

  35. Searching for Independent Structure: Communities

  36. Searching for Independent Structure: Communities

  37. Searching for Independent Structure: Communities

  38. Searching for Independent Structure: Communities

  39. Searching for Independent Structure: Communities

  40. Searching for Independent Structure: Communities

    An Infomap community detection analysis was carried out on the rhyme data. Infomap (Rosvall and Bergstrom 2008) is a fast community detection algorithm that performs very well. It handles weighted nodes and weighted edges and uses random walks through the network to determine the best partition into communities.
    • Results can be inspected at http://digling.org/shijing/infomap.html
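The following sketch does not implement Infomap. It only illustrates the random-walk ingredient mentioned above on a toy weighted network: a walker moves along edges with probability proportional to edge weight, and on a connected undirected network its long-run visit frequency at a node is proportional to the summed weight of that node's edges. All function names and the placeholder edges are hypothetical.

```python
from collections import defaultdict


def transition_probabilities(edge_weights):
    """Turn undirected edge weights into random-walk step probabilities."""
    neighbours = defaultdict(dict)
    for (a, b), weight in edge_weights.items():
        neighbours[a][b] = weight
        neighbours[b][a] = weight
    return {
        node: {other: w / sum(links.values()) for other, w in links.items()}
        for node, links in neighbours.items()
    }


def visit_frequencies(edge_weights):
    """Long-run visit frequencies: node strength over total strength."""
    strength = defaultdict(float)
    for (a, b), weight in edge_weights.items():
        strength[a] += weight
        strength[b] += weight
    total = sum(strength.values())
    return {node: s / total for node, s in strength.items()}


def step(distribution, transitions):
    """Advance a probability distribution over nodes by one walk step."""
    following = defaultdict(float)
    for node, probability in distribution.items():
        for other, p in transitions[node].items():
            following[other] += probability * p
    return dict(following)


# Placeholder network: A and B rhyme often, B and C only once.
toy_edges = {("A", "B"): 3, ("B", "C"): 1}
frequencies = visit_frequencies(toy_edges)
after_one_step = step(frequencies, transition_probabilities(toy_edges))
print(frequencies, after_one_step)
```

The two printed distributions agree, which is what makes the visit frequencies a stable description of the walk. Infomap builds on such flow statistics to choose the partition that describes the walk most compactly.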
  41. Searching for Independent Structure: Communities

  42. Searching for Independent Structure: Communities

    The communities that were identified do not necessarily coincide with the OCBS reconstructions of the rhymes, but they show interesting transitions between rhyme groups as well as splits. Interpreting the data, however, is difficult, since:
    • a community identified by Infomap is not necessarily homogeneous, because rhyming itself is not homogeneous
    • a split of words from the same rhyme group into two communities does not imply that they do not rhyme
    • we always need to go back to the real data and see what is going on there
  43. A Closer View at Specific Patterns: The *r-Coda

    Jaccard index for shared rhymes of characters with Old Chinese coda *r, *j, and *n according to the OCBS.
  44. A Closer View at Specific Patterns: *ar, *aj, and *an

    Coding Rhymes by Coda: ▪ an ▪ ar ▪ aj ▪ unclear
  49. A Closer View at Specific Patterns: *ar, *aj, and *an

    Coding Rhymes by Coda: ▪ an ▪ ar ▪ aj ▪ unclear
    Community 16 (Infomap)
  52. Outlook

  53. Where are we?

    Well…
    • Rhyme analysis based on network approaches is still strictly experimental.
    • We need to enhance the data (missing readings, swapped lines in the Shījīng text).
    • We need to enhance the models (better normalization!)
  54. Where are we?

    But…
    • Already at this stage, it turns out to be useful to inspect the automatically identified clusters when in doubt about the reading of a certain character.
    • It is generally useful to make use of interactive visualization techniques when dealing with huge amounts of data (especially when they come from different sources).
    • Tools like the “Shījīng rhyme browser” are especially useful for beginners, but probably also for experts (?).
  55. Where could we be?

    Imagine…
    • A world in which we have large collections of rhyme networks on all kinds of poetry, ranging from Shakespeare via Bob Dylan to Eminem.
    • We could gather important information on rhyming behaviour, both cross-cultural and culture-specific.
    • We could track the emergence of hip hop, or the degradation of rhyme patterns in modern poetry, or even the influence of the “Judas!” call on Bob Dylan’s rhyming practice...
  56. Where could we be?

    Imagine (now seriously)…
    • that we could carry out large-scale comparisons of rhyming practice in different stages of Chinese
    • that we could transparently present our individual assessments of what we think rhymed in pieces of Old Chinese poetry
    • that we could trace the history of Chinese in poetry networks
    … doesn’t that sound like it could be interesting?
  57. Thanks to Laurent Sagart and William Baxter for helpful discussion, tips, ideas, and data!

  58. Thanks to Laurent Sagart and William Baxter for helpful discussion, tips, ideas, and data! Thanks to Bob Dylan, Eminem, Shakespeare, and all the other poets out there!

  59. Thanks to You, for Your Attention!

    Thanks to Laurent Sagart and William Baxter for helpful discussion, tips, ideas, and data! Thanks to Bob Dylan, Eminem, Shakespeare, and all the other poets out there!
  60. Thanks to You, for Your Attention!

    Rhymes for “thanks”: ranks, banks, blanks, francs, tanks. Rhymes for “attention”: convention, dimension, green gentian, comprehension, suspension.
    Thanks to Laurent Sagart and William Baxter for helpful discussion, tips, ideas, and data! Thanks to Bob Dylan, Eminem, Shakespeare, and all the other poets out there!