
A Big Data Analysis of Yumentingzheng

https://aaron.kr/content/portfolio/word-cloud-big-data-analysis-of-manchu-script-in-weiwenqiju/

Poster presentation at The 50th Fall Comprehensive Conference of the Korea Information and Communication Society (KIICE) (제50회 한국정보통신학회 추계종합학술대회).

The Qing dynasty's Yumentingzheng (御門聽政) is an important document comparable to the Annals of Joseon in Korea. This is a record of conversations between the Qing dynasty emperor and his ministers.

Manchu is one of the few scripts in the world for which there is still no easy way to perform OCR (optical character recognition) and scan documents into a digital format. There is also no complete, readily available Manchu dictionary for word lookup. Therefore, pages containing Manchu script must first be transliterated into a form that computers can analyze.

In this study, text that had already been transcribed using the Möllendorff method was simplified to the Abkai method (both are transliteration schemes for Manchu) in order to perform a big data analysis on the text and extract the most important keywords from one portion of it, the Weiwenqiju (慰問起居).

Aaron Snowberger

October 28, 2021

Transcript

  1. Aaron Daniel Snowberger,
    Choong Ho Lee
    A Big Data Analysis of Yumentingzheng:
    Weiwenqiju as an Example

  2. Introduction
    Yumentingzheng (御門聽政), which records the Qing dynasty emperor's
    discussions with his ministers, is an important document comparable to the
    Annals of Joseon in Korea. This paper describes the methods and steps
    for a big data analysis of Yumentingzheng, which is written in the Manchu
    alphabet. Analyzing documents written in Manchu script raises many
    problems that must be solved in advance, and research on those problems
    has to come first. This paper proposes a method of big data analysis
    using the R language that applies once the text written in Manchu script
    has been transliterated into Latin characters; that transliteration step
    is left to a preliminary study to be conducted in the future. The proposed
    method adopts the Abkai method for the transliteration of Yumentingzheng
    and presents the results of the analysis using the text of
    Weiwenqiju (慰問起居).

  3. Text [1] that had already been transcribed with
    the Möllendorff method was converted to
    the Abkai method, and then a big data
    analysis was performed in R on the
    frequency of words appearing in the text.

    Möllendorff to Abkai conversion

    No.  Möllendorff  Abkai  Manchu
    01   š            x      ᡧ
    02   c            q      ᠴ
    03   ū            v      ᡡ
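
The conversion in the table above can be sketched in a few lines. The authors worked in R; this sketch uses Python purely for illustration and is not their code. It assumes the three substitutions on the slide are the only ones needed (a full Abkai conversion may require more), and the function name `to_abkai` and the sample words are hypothetical.

```python
import unicodedata

# The three Möllendorff -> Abkai substitutions listed on the slide.
# Assumption: no other letters need converting for this text.
MOLLENDORFF_TO_ABKAI = str.maketrans({"š": "x", "c": "q", "ū": "v"})


def to_abkai(text: str) -> str:
    """Convert a Möllendorff transcription to Abkai-style plain ASCII."""
    # Compose characters first so that "s" + combining caron is treated as "š".
    composed = unicodedata.normalize("NFC", text)
    return composed.translate(MOLLENDORFF_TO_ABKAI)


# Illustrative input only, not a quotation from the Weiwenqiju.
print(to_abkai("cooha šun ūren"))  # prints "qooha xun vren"
```

Normalizing before translating is a defensive choice: a transcription typed with combining diacritics would otherwise slip past the single-character mapping.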

  4. Manchu Text Latin Transliteration
    (Panels: Manchu, Möllendorff, Abkai)
    The original Manchu text was transcribed using the Möllendorff method.
    The transcription was edited according to the Abkai method in
    the previous table.
    The resulting edited transcription was used in the big data analysis.

  5. 1st Wordcloud Analysis
    To display a word cloud of meaningful and important words, the
    meaningless words among those shown in large letters must first be
    identified and removed.
    The words ‘be’ and ‘de’ appear most often
    because nouns cannot be extracted individually.

    No.  Word    Korean                          English
    01   be      ~을, ~를, ~로써, ~로 하여금       particle (object marker)
    02   de      ~에, ~에서, ~에로, ~에 대하여     at, to
    03   amban   대신(大臣)                       minister
    04   kemuni  늘, 언제나                       always
    05   aniya   해, 년                           year
    06   yasa    눈(目)                           eye
    07   udu     ~라 할지라도                     even though
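
Counting word frequencies, the step that feeds a word cloud like the one above, might look as follows. This is a Python sketch under stated assumptions rather than the authors' R code: the text is assumed to be Abkai-style ASCII already, `word_frequencies` is a hypothetical helper, and the sample sentence is invented from words in the table.

```python
import re
from collections import Counter


def word_frequencies(text: str) -> Counter:
    """Count how often each word occurs in an Abkai (plain ASCII) transcription."""
    # Assumption: words are runs of ASCII letters; punctuation and digits are ignored.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


# Invented sample, not taken from the Weiwenqiju.
sample = "amban be aniya de yasa be kemuni de amban be de"
freq = word_frequencies(sample)
print(freq.most_common(3))
```

Without a dictionary or morphological analyzer, every token is counted as it stands, which is why particles such as ‘be’ and ‘de’ rise to the top.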

  6. 2nd Wordcloud Analysis
    With the particles ‘be’ and ‘de’ removed, as well
    as other function words such as ‘bi’, ‘si’, ‘udu’, and
    ‘mini’, the most important words become visible.

    No.  Word    Korean          English
    01   yasa    눈(目)           eye
    02   amban   대신(大臣)       minister
    03   kemuni  늘, 언제나       always
    04   aniya   해, 년           year
    05   udu     ~라 할지라도     even though
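
The removal step can be illustrated with a small stop-word filter. Again this is a Python sketch, not the authors' R code; the stop-word set contains only the six words named on the slide, and `content_word_counts` and the token list are hypothetical.

```python
from collections import Counter

# Words the slide names as removed before the second word cloud.
STOPWORDS = {"be", "de", "bi", "si", "udu", "mini"}


def content_word_counts(tokens, stopwords=STOPWORDS):
    """Count only the tokens that are not in the stop-word set."""
    return Counter(token for token in tokens if token not in stopwords)


# Invented token list for illustration.
tokens = "bi yasa be amban de yasa mini aniya".split()
counts = content_word_counts(tokens)
print(counts)
```

Keeping the stop words in a plain set makes it easy to extend the list after inspecting each new word cloud.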

  7. Wordcloud2 Analysis
    The R package wordcloud2 produces a word cloud in which
    hovering over any word shows its frequency.
    (This requires first sorting the frequency table in decreasing order.)

    Word    Count
    yasa    10
    qi      9
    ye      9
    se      9
    ere     8
    ba      7
    amban   7
    aniya   7
    the     7
    kemuni  6
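
The sorting step mentioned on the slide can be sketched as below. This Python snippet does not call wordcloud2 (an R package); it only shows how a frequency table can be put in decreasing order. The counts are the ones listed on the slide, while `frequency_table` and the alphabetical tie-break are assumptions made for the example.

```python
# Counts as listed on the slide, deliberately given out of order.
counts = {
    "amban": 7, "kemuni": 6, "yasa": 10, "ere": 8, "qi": 9,
    "the": 7, "ye": 9, "ba": 7, "se": 9, "aniya": 7,
}


def frequency_table(word_counts):
    """Return (word, count) pairs sorted by decreasing count."""
    # Assumption: ties are broken alphabetically so the result is deterministic.
    return sorted(word_counts.items(), key=lambda pair: (-pair[1], pair[0]))


table = frequency_table(counts)
for word, count in table:
    print(f"{word:<8}{count}")
```

Because ties are broken alphabetically here, words with equal counts may appear in a different order than on the slide.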

  8. Conclusion
    This paper presented a method for big data analysis of
    literature written in Manchu script. Because no
    Manchu dictionary package has been available for
    the R language so far, the word cloud was created
    from the raw frequency of words as they appear in the text.
    The Möllendorff romanization of the Manchu text was
    converted to the Abkai method, which uses no special
    symbols. The effectiveness of this method
    was demonstrated by an experiment
    on the Weiwenqiju portion of the Yumentingzheng.
    CREDITS: This presentation template was created by Slidesgo,
    including icons by Flaticon, and infographics & images by Freepik

  9. References
    [1] Zhuang Jifa, Yumentingzheng, Wenshizhe Press, 2000.
    [2] Manchu alphabet. [Internet]. Available: https://en.wikipedia.org/wiki/Manchu_alphabet
    [3] Diandian Zhang, Yan Liu, Zhuowei Wang, and Depei Wang, "OCR with the Deep
    CNN Model for Ligature Script-Based Languages like Manchu," Hindawi Scientific
    Programming, vol. 2021, Article ID 5520338, https://doi.org/10.1155/2021/5520338
    [4] Jang Yongsik, Kang Higu, Learning to code in R language, Saengneung Press, 2018.
    [5] Manchu Language. [Internet]. Available: https://namu.wiki/w/%EB%A7%8C%EC%A3%BC%EC%96%B4