Using network models to analyze Old Chinese rhyme data

Talk held at the workshop "Recent Advances in Old Chinese Historical Phonology" (2015/11/05-06, SOAS, London).

Johann-Mattis List

November 05, 2015

Transcript

  1. Using Network Models to Analyze Old Chinese Rhyme Data
    Johann-Mattis List (CRLAO, Paris)

  2. Rhymes and Networks

  3. Rhymes
    Lose yourself in the music
    the moment you own it
    you better never let it go
    you only get one shot
    do not
    miss your chance to blow
    this opportunity comes once in a lifetime…
    (Eminem, “Lose yourself”, 2002)

  5. Rhymes
    Lose yourself in the music [-ɪk] ? [ɔi]
    the moment you own it [-ɪt] ? [ai]
    you better never let it go
    you only get one shot
    do not
    miss your chance to blow
    this opportunity comes once in a lifetime…
    (Eminem, “Lose yourself”, 2002)

  6. Rhymes
    music [-ɪk]
    own it [-ɪt]
    But Germans would rhyme “employ” and “deny”!
    Mai [-ai]
    neu [-ɔi]

  7. Networks

  8. Networks

  9. Networks

  10. Networks

  11. From Rhymes to Networks

  12. From Rhymes to Networks

  13. From Rhymes to Networks

  14. From Rhymes to Networks

  15. From Rhymes to Networks

  16. From Rhymes to Networks

  17. Constructing a Shījīng Network

  18. Data Preparation
    The starting point for the Shījīng network constructed for this
    talk is the set of rhyme annotations given in the appendix of
    Baxter (1992).
    Since the data was not available in digital form, I transferred
    Baxter’s annotations to a digital version of the Shījīng
    (Project Gutenberg) and checked the text against additional
    digital versions (e.g., http://ctext.org).

  19. Data Preparation
    During this process, the digital version was corrected wherever
    the comparison with other versions and with Baxter’s data
    showed that it contained errors.
    In addition, I had a digital collection of most of the Old
    Chinese reconstructions given in the new OCBS system
    (provided by L. Sagart).
  20. Data Organization
    Once the data had been prepared in this way, the Shījīng was
    organized into:
    ● poems (numbered 1, 2, 3, etc.)
    ● stanzas (numbered 1.1, 1.2, etc.)
    ● sections (parts ending in a comma or full stop, in which
    the rhyme words normally occur; numbered within each
    stanza as 1, 2, 3, etc.)
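
The three-level organization could be represented roughly as in the sketch below. It is only an illustration: the class name `Section`, its field names, and the placeholder strings are hypothetical and not taken from the actual dataset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Section:
    """One section of a stanza; all field names are hypothetical."""
    poem: int                          # poem number: 1, 2, 3, ...
    stanza: int                        # stanza number within the poem
    section: int                       # section number within the stanza
    text: str                          # placeholder for the section's text
    rhyme_word: Optional[str] = None   # annotated rhyme word, if any

    @property
    def stanza_id(self) -> str:
        """Stanza label in the 'poem.stanza' style, e.g. '1.2'."""
        return f"{self.poem}.{self.stanza}"


# Placeholder strings, not actual Shījīng text.
sections = [
    Section(1, 1, 1, "section-a", rhyme_word="A"),
    Section(1, 1, 2, "section-b", rhyme_word="B"),
    Section(1, 2, 1, "section-c"),
]

# Sections that carry an annotated rhyme word.
rhyming = [s for s in sections if s.rhyme_word is not None]
```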

  21. Data Organization
    If a section contained a rhyme word according to Baxter’s
    annotation, this was noted as such. If I detected further rhyme
    words or had reasons to disagree with Baxter’s annotation,
    this was noted in an alternative annotation. For each section,
    I tried to identify the Old Chinese readings of the OCBS
    system. This was not possible in all cases: some 400
    readings are missing, and I have not yet had time to check
    them.

  22. Data Inspection and Presentation
    The Shījīng dataset was converted into an interactive web
    application that can be used to browse rhyming patterns in
    the Shījīng.
    ● http://digling.org/shijing/

  23. Network Reconstruction
    The network was reconstructed as follows:
    ● all characters that occur in the Shījīng in a position annotated
    as “rhyming” according to Baxter’s annotation are represented as
    nodes
    ● a link is drawn between two characters whenever they are annotated
    as rhyming with each other in a given poem
    ● the number of separate stanzas in which two characters rhyme was
    counted and assigned as the edge weight
    ● node weights were derived from the number of times the rhyme words
    occur in the Shījīng
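
The reconstruction steps above can be sketched in Python as follows. This is a minimal illustration, not the code used for the talk: the input format (a mapping from stanza ids to lists of rhyme groups), the function name `build_rhyme_network`, and the placeholder words A to E are all assumptions.

```python
from collections import Counter
from itertools import combinations


def build_rhyme_network(stanzas):
    """Return (node_weights, edge_weights) from annotated stanzas.

    `stanzas` maps a stanza id to a list of rhyme groups; each group lists
    the rhyme words annotated as rhyming with each other (assumed format).
    """
    node_weights = Counter()   # occurrences of each rhyme word
    edge_weights = Counter()   # number of stanzas in which a pair rhymes
    for groups in stanzas.values():
        pairs = set()          # a pair counts at most once per stanza
        for group in groups:
            node_weights.update(group)
            pairs.update(combinations(sorted(set(group)), 2))
        edge_weights.update(pairs)
    return node_weights, edge_weights


# Placeholder rhyme words, not actual Shījīng data.
stanzas = {
    "1.1": [["A", "B", "C"]],
    "1.2": [["A", "B"], ["D", "E"]],
    "2.1": [["A", "A", "C"]],
}
nodes, edges = build_rhyme_network(stanzas)
```

Sorting each pair keeps the edges undirected, so that ("A", "B") and ("B", "A") are counted as the same link.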

  24. Network Reconstruction
    The data was normalized:
    ● by counting every pair of identical lines only once, so that
    repeated phrases do not carry too much weight
    The data should be normalized further (but there was no time for this
    analysis):
    ● by weighting each occurrence of a rhyme according to the size of the
    rhyme group in which it occurs, so that links in poems with very
    large groups of rhyming words per stanza are not overcounted
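
One conceivable way to implement the second, not yet applied normalization is sketched below. The weighting scheme is purely an assumption made for illustration: each pair in a rhyme group of n distinct words contributes 1/(n - 1), so that every word contributes a total weight of 1 per group regardless of group size. The function name `weighted_edges` and the placeholder words are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def weighted_edges(groups):
    """Accumulate edge weights, down-weighting pairs from large rhyme groups."""
    edges = defaultdict(float)
    for group in groups:
        words = sorted(set(group))
        n = len(words)
        if n < 2:
            continue  # a single word forms no rhyme pair
        for a, b in combinations(words, 2):
            # Assumed scheme: each word distributes a weight of 1
            # evenly over its n - 1 partners in the group.
            edges[(a, b)] += 1.0 / (n - 1)
    return dict(edges)


# Placeholder rhyme groups: one small group and one large group.
example = weighted_edges([["A", "B"], ["A", "B", "C", "D"]])
```

With this scheme, the pair A and B receives a full count from the two-word group but only a third of a count from the four-word group.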

  25. Analyzing the Shījīng Network

  26. A Bird’s Eye View

  27. A Bird’s Eye View

  28. A Bird’s Eye View

  29. A Bird’s Eye View

  30. A Bird’s Eye View

  31. Searching for Structure
    Coding Rhymes
    by Vowel Quality:
    ■ a
    ■ e
    ■ i
    ■ o
    ■ u
    ■ ə

  32. Transitions between Rhyme Groups...

  33. Transitions between Rhyme Groups...
    Jaccard index computed for the number of transitions between
    the OCBS rhyme groups and inside the same group.
    Most of the groups are strongly recovered; some, however, occur
    so infrequently that they rhyme more often with other words than
    with words of their own group.

  34. Searching for Independent Structure: Communities

  35. Searching for Independent Structure: Communities

  36. Searching for Independent Structure: Communities

  37. Searching for Independent Structure: Communities

  38. Searching for Independent Structure: Communities

  39. Searching for Independent Structure: Communities

  40. Searching for Independent Structure: Communities
    An Infomap community detection analysis was carried out on
    the rhyme data. Infomap (Rosvall and Bergstrom 2008) is a
    fast community detection algorithm that performs very well.
    It handles weighted nodes and weighted edges, and uses random
    walks through the network to determine the best partition
    into communities.
    ● Results can be inspected at http://digling.org/shijing/infomap.html
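
To give an idea of what "best partition" means here, the sketch below computes the two-level map equation of Rosvall and Bergstrom (2008), the description length that Infomap tries to minimize, for a given partition of an undirected weighted network. It is not an implementation of the Infomap search itself, it ignores node weights, and it is not the setup used for the talk; the function names and the toy network are assumptions.

```python
from collections import defaultdict
from math import log2


def plogp(x):
    """x * log2(x), with the convention 0 * log2(0) = 0."""
    return x * log2(x) if x > 0 else 0.0


def map_equation(edges, module_of):
    """Two-level map equation (in bits) for an undirected weighted network.

    `edges` maps (u, v) pairs to weights; `module_of` maps nodes to modules.
    """
    total = 2.0 * sum(edges.values())   # total node strength
    visit = defaultdict(float)          # random-walk visit rate per node
    exit_rate = defaultdict(float)      # rate of leaving each module
    for (u, v), w in edges.items():
        visit[u] += w / total
        visit[v] += w / total
        if module_of[u] != module_of[v]:
            exit_rate[module_of[u]] += w / total
            exit_rate[module_of[v]] += w / total
    module_visit = defaultdict(float)   # summed visit rate per module
    for node, p in visit.items():
        module_visit[module_of[node]] += p
    q_total = sum(exit_rate.values())
    return (plogp(q_total)
            - 2 * sum(plogp(q) for q in exit_rate.values())
            - sum(plogp(p) for p in visit.values())
            + sum(plogp(exit_rate[m] + p_m)
                  for m, p_m in module_visit.items()))


# Toy network: two triangles joined by a single link.
toy = {("a", "b"): 1, ("a", "c"): 1, ("b", "c"): 1,
       ("d", "e"): 1, ("d", "f"): 1, ("e", "f"): 1,
       ("c", "d"): 1}
one_module = {n: 0 for n in "abcdef"}
two_modules = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

length_one = map_equation(toy, one_module)
length_two = map_equation(toy, two_modules)
```

On the toy network, splitting the two triangles into separate modules gives a shorter description than treating the whole network as one module, which is why a search guided by this quantity would prefer the split.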

  41. Searching for Independent Structure: Communities

  42. Searching for Independent Structure: Communities
    The communities that were identified do not necessarily
    coincide with the OCBS reconstructions of the rhymes, but
    they show interesting transitions between rhyme groups as
    well as splits. Interpreting the data, however, is difficult,
    because:
    ● a community identified by Infomap is not necessarily
    homogeneous, since rhyming is not homogeneous
    ● a split of words with the same rhyme group into two
    communities does not imply that they do not rhyme
    ● we always need to go back to the real data and see what
    is going on there

  43. A Closer View at Specific Patterns: The *r-Coda
    Jaccard Index for
    shared rimes of
    characters with Old
    Chinese coda *r, *j,
    and *n according to
    the OCBS.
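
As a rough illustration of the measure, the sketch below computes the Jaccard index over the sets of rhyme partners of words grouped by coda. This is one possible reading of "shared rimes" and not necessarily the exact computation behind the figure; the function names, the edge list, and the placeholder words (r1, n1, j1, and so on) are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard index |a ∩ b| / |a ∪ b| of two sets (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def partners_by_class(edges, word_class):
    """Collect, for each class (e.g. a coda), the rhyme partners of its words."""
    partners = {}
    for a, b in edges:
        partners.setdefault(word_class[a], set()).add(b)
        partners.setdefault(word_class[b], set()).add(a)
    return partners


# Placeholder rhyme links and coda labels, not actual Shījīng data.
links = [("r1", "n1"), ("r1", "n2"), ("r2", "j1"),
         ("n1", "n2"), ("j1", "j2")]
coda = {"r1": "r", "r2": "r", "n1": "n", "n2": "n", "j1": "j", "j2": "j"}

partners = partners_by_class(links, coda)
r_vs_n = jaccard(partners["r"], partners["n"])
r_vs_j = jaccard(partners["r"], partners["j"])
n_vs_j = jaccard(partners["n"], partners["j"])
```

In this toy example, the words with coda r share more rhyme partners with the n words than with the j words.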

  44. A Closer View at Specific Patterns: *ar, *aj, and *an
    Coding Rhymes
    by Coda:
    ■ an
    ■ ar
    ■ aj
    ■ unclear

  49. A Closer View at Specific Patterns: *ar, *aj, and *an
    Coding Rhymes
    by Coda:
    ■ an
    ■ ar
    ■ aj
    ■ unclear
    Community 16
    (Infomap)

  52. Outlook

  53. Where are we?
    Well…
    ● Rhyme analysis based on network approaches is still
    strictly experimental.
    ● We need to enhance the data (missing readings,
    swapped lines in the Shījīng text).
    ● We need to enhance the models (better
    normalization!)

  54. Where are we?
    But…
    ● Already at this stage, it turns out to be useful to inspect
    the automatically identified clusters when in doubt about
    the reading of a certain character.
    ● It is generally useful to make use of interactive
    visualization techniques when dealing with huge
    amounts of data (especially when it comes from
    different sources).
    ● Tools like the “Shījīng rhyme browser” are especially
    useful for beginners, but probably also for experts (?).

  55. Where could we be?
    Imagine…
    ● A world in which we have large collections of rhyme
    networks on all kinds of poetry, ranging from Shakespeare
    via Bob Dylan to Eminem.
    ● We could gather important information on rhyming
    behaviour, both cross-cultural and culture-specific.
    ● We could track the emergence of hip hop, or the
    degradation of rhyme patterns in modern poetry, or
    even the influence of the “Judas!” call on Bob Dylan’s
    rhyming practice...

  56. Where could we be?
    Imagine (now seriously)…
    ● that we could carry out large-scale comparisons of
    rhyming practice in different stages of Chinese
    ● that we could transparently share our individual
    assessments of what we think rhymed in pieces of Old
    Chinese poetry
    ● that we could trace the history of Chinese in poetry
    networks
    … doesn’t that sound like it could be interesting?

  57. Thanks to Laurent Sagart and William Baxter for helpful
    discussion, tips, ideas, and data!

  58. Thanks to Laurent Sagart and William Baxter for helpful
    discussion, tips, ideas, and data!
    Thanks to Bob Dylan, Eminem, Shakespeare, and all the other
    poets out there!

  59. Thanks to You, for Your Attention!
    Thanks to Laurent Sagart and William Baxter for helpful
    discussion, tips, ideas, and data!
    Thanks to Bob Dylan, Eminem, Shakespeare, and all the other
    poets out there!

  60. Thanks to You, for Your Attention!
    ranks
    banks
    blanks
    francs
    tanks
    convention
    dimension
    green gentian
    comprehension
    suspension
    Thanks to Laurent Sagart and William Baxter for helpful
    discussion, tips, ideas, and data!
    Thanks to Bob Dylan, Eminem, Shakespeare, and all the other
    poets out there!
