Extend the use of supplemental variables in GDA by applying machine learning to the free text descriptive response portion and combining it with MCA analysis

419kfj
October 08, 2023

Linking the distribution of individuals in the space revealed by MCA with qualitative survey material is discussed in the book [1] and has been practiced in research [2]. In Japan, the text analysis tool KH Coder [3] has become very widely used in social surveys.
KH Coder itself provides functions for linking this text analysis with the selected (closed-ended) answers. Our first attempt at a mixed research method uses this functionality.
The next step is to add the frequently occurring words (important words) obtained at this stage to the individual coordinates as supplementary variables in the MCA and to analyze them by a GDA method [4].
In this report we go one step further and present an example [5] in which the frequently occurring words (important words) were tagged as positive or negative by machine learning and then analyzed as supplementary variables.
This approach extends the use of supplementary variables in GDA.

References
• [1] Le Roux, Brigitte, & Henry Rouanet. 2010. "Multiple Correspondence Analysis." Quantitative Applications in the Social Sciences 163. Thousand Oaks, Calif.: Sage Publications. "Between quantity and quality, there is geometry." (p. 1)
• [2] Tony Bennett, Mike Savage, Elizabeth Silva, Alan Warde, Modesto Gayo-Cal and David Wright, "Culture, Class, Distinction", 2009, 2010, Routledge.
• [3] https://khcoder.net/en/
• [4] Following [1] and using the R package GDAtools: Robette N. (2023), GDAtools: Geometric Data Analysis in R, version 2.0, https://nicolas-robette.github.io/GDAtools/
• [5] Kazuo Fujimoto and Kazuya Ohata, “Development of a method for analyzing participant satisfaction survey data that combines MCA and Aspect Based Sentiment Analysis.” (in Japanese), NLP2023
https://www.anlp.jp/proceedings/annual_meeting/2023/pdf_dir/Q1-11.pdf
• (in English) https://419kfj.sakura.ne.jp/db/wp-content/uploads/2023/09/nlp2023−article_01−13v1.1_eng.pdf

Transcript

  1. Extend the use of supplemental variables in GDA by applying machine learning to the free text descriptive response portion and combining it with MCA analysis

    ver1.0, CARME2023, 09/28, Room2, 11:00-12:30
    Kazuo Fujimoto [email protected]
    Project Researcher, Institute for Mathematics and Computer Science, Tsuda University
  2. Abstract

    Linking the distribution of individuals in the space revealed by MCA with qualitative survey material is discussed in the book [1] and has been practiced in research [2]. In Japan, the text analysis tool KH Coder [3] has become very widely used in social surveys. KH Coder itself provides functions for linking this text analysis with the selected (closed-ended) answers. Our first attempt at a mixed research method uses this functionality. The next step is to add the frequently occurring words (important words) obtained at this stage to the individual coordinates as supplementary variables in the MCA and to analyze them by a GDA method [4]. In this report we go one step further and present an example [5] in which the frequently occurring words (important words) were tagged as positive or negative by machine learning and then analyzed as supplementary variables. This approach extends the use of supplementary variables in GDA.
    2023/9/28 CARME2023@University of Bonn
  3. References

    • [1] Le Roux, Brigitte, & Henry Rouanet. 2010. "Multiple Correspondence Analysis." Quantitative Applications in the Social Sciences 163. Thousand Oaks, Calif.: Sage Publications. "Between quantity and quality, there is geometry." (p. 1)
    • [2] Tony Bennett, Mike Savage, Elizabeth Silva, Alan Warde, Modesto Gayo-Cal and David Wright, "Culture, Class, Distinction", 2009, 2010, Routledge.
    • [3] https://khcoder.net/en/
    • [4] Following [1] and using the R package GDAtools: Robette N. (2023), GDAtools: Geometric Data Analysis in R, version 2.0, https://nicolas-robette.github.io/GDAtools/
    • [5] Kazuo Fujimoto and Kazuya Ohata, “Development of a method for analyzing participant satisfaction survey data that combines MCA and Aspect Based Sentiment Analysis.” (in Japanese), NLP2023
    • https://www.anlp.jp/proceedings/annual_meeting/2023/pdf_dir/Q1-11.pdf
    • (in English) https://419kfj.sakura.ne.jp/db/wp-content/uploads/2023/09/nlp2023−article_01−13v1.1_eng.pdf
  4. Software related references

    • Higuchi, Koichi 2017 “A Two-Step Approach to Quantitative Content Analysis: KH Coder Tutorial using Anne of Green Gables (Part II)” Ritsumeikan social sciences review 53(1): 137-147. [PDF File] https://khcoder.net/en/
    • Robette N. (2023), GDAtools: Geometric Data Analysis in R, version 2.0, https://nicolas-robette.github.io/GDAtools/
    • RStudio Team (2020). RStudio: Integrated Development for R. RStudio, PBC, Boston, MA. URL http://www.rstudio.com/.
    • R Core Team (2023). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.
    • Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). “Welcome to the tidyverse.” Journal of Open Source Software, 4(43), 1686. doi:10.21105/joss.01686.
  5. Notice and Apology

    • Because of a problem with the presenter's application, permission to reuse the raw data was not granted. The graphs and other information in the following report are therefore taken from the report given to the Association for Natural Language Processing in 2023/03, and no new analysis was conducted.
    • Referenced reports
    • Kazuo Fujimoto and Kazuya Ohata, “Development of a method for analyzing participant satisfaction survey data that combines MCA and Aspect Based Sentiment Analysis.” (in Japanese), NLP2023 (https://www.anlp.jp/proceedings/annual_meeting/2023/pdf_dir/Q1-11.pdf) (English version)
  6. Outline of my presentation

    • Characteristics of the data (congratulatory responses)
    • Challenge: How can we extract improvement measures and issues when most of the responses are "good"?
    • Step 0: Exploratory Data Analysis (EDA), MCA, and basic text mining, carried out separately.
    • Step 1: Focus on free text responses. Linking text mining and MCA.
    • Step 2: Focus on the ambiguity of the most frequently used key words and phrases. Adding tags (positive/negative/none) by machine learning (ABSA: Aspect Based Sentiment Analysis).
    • Step 3: Project the tagged words onto the MCA individual map.
    • Issue: The individuals whose answers contain the important tagged words can be plotted on the overall individual map, but the amount of tagging depends on the dictionary used by the machine learning.
    • Also, the MCA map is very skewed to begin with, so we would like to deepen the analysis by using CSA (class specific analysis).
  7. Schematic overview of this report

    • Tagged extracted words are projected as supplemental variables into the space obtained by MCA.
    • Our trial creates supplemental variables by text mining and machine-learning tagging, plots them in the space of individuals, and in this way develops another mixed research method.
    * Le Roux, Brigitte, & Henry Rouanet. 2010. "Multiple correspondence analysis.", chapter 1 (famous phrase).
    * MCA and mixed research methods
  8. Data Structure

    Table layout: one row per respondent (ID 1, 2, 3, …, m-3, m-2, m-1, m); columns ID, Var1, Var2, …, Varn, followed by the open-ended free-text answer parts.
  9. Step 0: MCA and Text Mining Separately

    Data layout: one row per respondent (ID 1, 2, 3, …, N-3, N-2, N-1, N); columns ID, Var1, Var2, …, Vark, followed by the free-text parts.
    • Closed-ended variables: specific MCA.
    • Free-text parts: text mining with KH Coder. One variable and its categories can be plotted together with words in a co-occurrence network and in a CA plot.
    • The mutual relations are examined with a KWIC concordance.
    • The two analyses are carried out separately.
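As a rough illustration of what a KWIC (keyword in context) concordance does, the sketch below lists each occurrence of a keyword with a few words of context on either side. It is a minimal stand-in and not KH Coder's implementation: the function name `kwic`, the whitespace tokenization, and the sample answers are all assumptions made for this example (Japanese text would first need morphological analysis).

```python
def kwic(texts, keyword, window=3):
    """Return (left context, keyword, right context) for each hit.

    Hypothetical helper: tokens are split on whitespace and matched
    case-insensitively after stripping surrounding punctuation.
    """
    hits = []
    for text in texts:
        tokens = text.split()
        for i, token in enumerate(tokens):
            if token.strip(".,!?\"'").lower() == keyword.lower():
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                hits.append((left, token, right))
    return hits


# Invented free-text answers, for illustration only.
answers = [
    "The exercise time was too short for beginners.",
    "I had enough time to finish the log analysis.",
]

for left, word, right in kwic(answers, "time"):
    print(f"{left:>30} [{word}] {right}")
```

Reading the hits side by side is what makes it visible that the same word ("time" here) can carry a complaint in one answer and praise in another.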
  10. Step 1: MCA and frequent words as supplementary variables

    Data layout: one row per respondent (ID 1, 2, 3, …, N-3, N-2, N-1, N); columns ID, Var1, Var2, …, Vark and the free-text parts, extended with one 0/1 column per frequent word (Word1, Word2, Word3, …, Wordk; example values 1 0 1 0 1 1 0).
    • Specific MCA and SDA.
    • Interpret the words using the KWIC of Step 0.
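The 0/1 word columns of Step 1 can be sketched as follows. This is only an illustration under stated assumptions: the answers are taken to be already tokenized, and the function name `word_indicators`, the tiny vocabulary, and the sample tokens are invented; the actual study obtained its frequent-word list from KH Coder.

```python
def word_indicators(token_lists, vocabulary):
    """One row per respondent, one 0/1 column per word in `vocabulary`."""
    rows = []
    for tokens in token_lists:
        present = set(tokens)
        rows.append([1 if word in present else 0 for word in vocabulary])
    return rows


# Invented, already tokenized answers (one list per respondent).
token_lists = [
    ["time", "exercise", "short"],
    ["content", "explanation"],
    [],  # respondent who left the free-text question blank
]
vocabulary = ["time", "exercise", "content"]  # stand-in for the frequent-word list

matrix = word_indicators(token_lists, vocabulary)
for respondent_id, row in enumerate(matrix, start=1):
    print(respondent_id, row)
```

Each column can then be treated as a two-category supplementary variable (word used / not used) alongside the closed-ended variables.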
  11. Using the KWIC of Step 0

    • We found ambiguous meanings among the frequent words.
    • So we took another approach, as follows:
    • Put a p or n tag on each word; p means “positive” and n means “negative”.
    • This tagging is done with Aspect Based Sentiment Analysis (ABSA).
    • After tagging the words, build a data frame of them as supplementary variables.
    • Overlay them on the space of individuals generated by MCA.
  12. Step 2: MCA and tagged frequent words as supplementary variables

    Data layout: one row per respondent (ID 1, 2, 3, …, N-3, N-2, N-1, N); columns ID, Var1, Var2, …, Vark and the free-text parts, extended with one 0/1 column per tagged word (Word1/p, Word1/n, Word2/p, …, Word/n; example values 1 0 1 0 1 1 0).
    • Specific MCA and SDA.
    • Interpret the words using the KWIC of Step 0.
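The tagged columns of Step 2 (Word/p, Word/n) can be sketched in the same way. The example assumes that the tagging step yields, for each respondent, a list of (word, tag) pairs; that input shape, the function name `tagged_indicators`, and the sample data are assumptions for illustration and do not describe the output format of the ABSA model actually used.

```python
def tagged_indicators(tagged_answers, words, tags=("p", "n")):
    """Build 0/1 columns named 'word/tag' from per-respondent (word, tag) pairs."""
    columns = [f"{word}/{tag}" for word in words for tag in tags]
    rows = []
    for pairs in tagged_answers:
        present = {f"{word}/{tag}" for word, tag in pairs}
        rows.append([1 if column in present else 0 for column in columns])
    return columns, rows


# Invented tagging results: one list of (word, tag) pairs per respondent.
tagged_answers = [
    [("exercise", "p"), ("time", "n")],
    [("time", "p")],
    [("exercise", "p"), ("exercise", "n")],  # same word used both ways
]

columns, rows = tagged_indicators(tagged_answers, ["time", "exercise"])
print(columns)
for row in rows:
    print(row)
```

Splitting each word into separate p and n columns is what lets an ambiguous word appear at two different positions once it is projected as a supplementary variable.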
  13. Characteristics of the data and Challenge

    • Characteristics of the data (congratulatory responses)
    • Responses were selected on a 5-point scale.
    • Mostly 5 or 4 responses. Average is ….
    • The seminar was an information security workshop, and participants were highly motivated.
    • Challenge: How can we extract improvement measures and issues when most of the responses are "good"?
    • If it were enough to conclude from these results that the event was a success, there would be nothing more to say.
    • However, issues that need to be addressed must be identified in order to make the event even better.
  14. Step 0: Exploratory Data Analysis (EDA) and MCA

    • Number of respondents: 2001
    • Confirmation of the relationship between satisfaction and responses.
    • Analysis of the distribution of the data by MCA confirms the tendency of the dissatisfied respondents.
    • Responses that could lead to improvement (free text responses) are not found in the dissatisfied response group.
    • An analysis of the free-text statements of the satisfied respondent group is therefore needed.
  15. Pairs display of "skills: improved" and "understanding"

    • A large portion of “understanding” is accounted for by "skills: improved".
    • Note: "Don't understand" responses and skill improvement are not related.
    • Congratulatory responses
    • That wasn't so bad, was it? (polite responses)
    • Involvement / self-identification confirmation responses
    • As long as you participated, there should be results.
    • There are issues to be clarified here.
    Figure: pairs display of "skills: improved" (very improved, improved, no change, Don't know, NA) and "understanding" (understand well, understand, Don't understand some, Don't understand many, NA).
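A statement such as "a large portion of understanding is accounted for by skills: improved" corresponds to reading shares within a cross-tabulation of the two questions. The sketch below shows that computation on invented answer pairs; the data, the variable names, and the helper `share_within_understanding` are hypothetical and do not reproduce the survey results.

```python
from collections import Counter

# Invented (skill, understanding) answer pairs; not the survey data.
responses = [
    ("improved", "understand"),
    ("very improved", "understand well"),
    ("improved", "understand"),
    ("no change", "Don't understand some"),
    ("improved", "understand well"),
]

pair_counts = Counter(responses)
understanding_totals = Counter(understanding for _, understanding in responses)


def share_within_understanding(skill, understanding):
    """Fraction of an `understanding` category that falls in `skill`."""
    return pair_counts[(skill, understanding)] / understanding_totals[understanding]


for (skill, understanding), n in sorted(pair_counts.items()):
    share = share_within_understanding(skill, understanding)
    print(f"{skill:<14} {understanding:<22} n={n} share={share:.2f}")
```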
  16. Instructor's explanation and others, focusing on understanding

    These three questions are biased toward positive.
    Seen in this way, responses about “instructor explanation”, “support”, and “response” are considered to be uninformative with respect to “understanding”.
    Figure: Understanding, instructor explanation, support, responses; axis ← Positive / Negative →.
  17. Step 1: Focus on free text response. Linking text mining and MCA

    Respondents with extremely low satisfaction did not answer the open-ended (free text) questions either. Their answers therefore cannot be used to explore areas for improvement in the workshops.
  18. Space generation by MCA (speMCA with only NA excl.)

    Figure labels: "Completely disagree."; clustering of response patterns.
  19. Number of responses and response rate to open-ended free text questions (Q15-2, Q20, Q22)

    • Answered all three questions: 223 (14.8%)
    • Reasons for "understand" responses (Q15-2): 742+348+87+223=1400 (70.0%)
    • Course environment (Q20): 73+348+223+9=653 (32.6%)
    • Other overall impressions (Q22): 24+87+223+9=343 (17.1%)
    Figure labels: Reasons for "understand", overall impressions, Course environment.
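The totals and percentages on this slide can be checked with a few lines of arithmetic. The sketch below assumes that each addend is the number of respondents who answered exactly that combination of questions (for example, 348 answered Q15-2 and Q20 but not Q22); this reading is inferred from which numbers recur in the three sums and is not stated on the slide. The per-question rates use the 2001 respondents reported for Step 0. The 14.8% figure is reproduced only if the denominator is the number of respondents who answered at least one of the three questions (1506 under this reading), which is likewise an inference.

```python
# Counts read off the slide; the combination labels are an assumption.
regions = {
    frozenset({"Q15-2"}): 742,
    frozenset({"Q15-2", "Q20"}): 348,
    frozenset({"Q15-2", "Q22"}): 87,
    frozenset({"Q20"}): 73,
    frozenset({"Q20", "Q22"}): 9,
    frozenset({"Q22"}): 24,
    frozenset({"Q15-2", "Q20", "Q22"}): 223,
}
N_RESPONDENTS = 2001  # number of respondents given in the Step 0 slide


def total_for(question):
    """Number of respondents who answered `question`, alone or in combination."""
    return sum(n for questions, n in regions.items() if question in questions)


totals = {q: total_for(q) for q in ("Q15-2", "Q20", "Q22")}
rates = {q: round(100 * n / N_RESPONDENTS, 1) for q, n in totals.items()}

answered_any = sum(regions.values())
answered_all = regions[frozenset({"Q15-2", "Q20", "Q22"})]
all_three_share = round(100 * answered_all / answered_any, 1)

print(totals)
print(rates)
print(answered_any, all_three_share)
```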
  20. Step 2: Focus on ambiguity of frequently used key words and phrases.

    • Tag these words (positive/negative/none) by machine learning (ABSA).
    • Words with both p/n occurrences: time, exercise, content, knowledge, terminology, explanation, training, lecture
    • Negative Word Top 5: ('time', 84), ('exercise', 72), ('content', 52), ('knowledge', 44), ('term', 30)
    • Positive Word Top 5: ('exercise', 120), ('content', 94), ('explanation', 78), ('training', 49), ('lecture', 37)
    • The table on the next page shows the list of "extracted words" without the p/n tags. The frequent words detected by the aspect-based sentiment analysis are marked in it.
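The two top-5 lists can be compared directly. The sketch below uses only the counts shown on this slide and computes which words appear in both lists, which appear in either, and the negative share for the words whose positive and negative counts are both given. One observation, offered as a reading rather than a fact from the source: the eight words in the union coincide with the slide's list of words with both p/n occurrences if 'term' is read as 'terminology'.

```python
# Counts copied from the slide.
negative_top5 = [("time", 84), ("exercise", 72), ("content", 52),
                 ("knowledge", 44), ("term", 30)]
positive_top5 = [("exercise", 120), ("content", 94), ("explanation", 78),
                 ("training", 49), ("lecture", 37)]

neg = dict(negative_top5)
pos = dict(positive_top5)

in_both_top5 = sorted(set(neg) & set(pos))
in_either_top5 = sorted(set(neg) | set(pos))

# Negative share, only for words whose two counts are both on the slide.
negative_share = {
    word: round(100 * neg[word] / (neg[word] + pos[word]), 1)
    for word in in_both_top5
}

print(in_both_top5)
print(in_either_top5)
print(negative_share)
```

For the two words with both counts available, roughly a third of the tagged mentions are negative, which is the kind of ambiguity that an untagged frequency list hides.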
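  21. Words with a high number of occurrences with ambiguous usage

    • Time • Exercise • Contents • Knowledge • Explanation • Training • Specific terms • Lecture

    Extracted terms and their frequencies (top 40; English glosses added):
    Rank  Term (English gloss)  Frequency
    1  理解 (understand)  583
    2  時間 (time)  516
    3  思う (think)  488
    4  インシデント (incident)  387
    5  演習 (exercise)  363
    6  内容 (content)  351
    7  対応 (response, handling)  316
    8  知識 (knowledge)  307
    9  感じる (feel)  254
    10  ログ (log)  208
    11  事前 (in advance)  183
    12  実際 (actual)  182
    13  説明 (explanation)  175
    14  学習 (learning)  171
    15  部分 (part)  168
    16  多い (many)  160
    17  もう少し (a little more)  154
    18  受講 (taking the course)  153
    19  セキュリティ (security)  151
    20  報告 (report)  149
    21  流れ (flow)  148
    22  発生 (occurrence)  143
    23  ありがとう (thank you)  141
    24  研修 (training)  135
    25  非常 (very)  134
    26  勉強 (study)  134
    27  業務 (work duties)  130
    28  行う (carry out)  127
    29  解析 (analysis)  124
    30  情報 (information)  124
    31  具体 (concrete)  122
    32  用語 (term)  119
    33  難しい (difficult)  107
    34  グループ (group)  98
    35  参加 (participation)  98
    36  分かる (understand, follow)  98
    37  自分 (oneself)  95
    38  良い (good)  94
    39  講義 (lecture)  93
    40  必要 (necessary)  93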
  22. Response patterns for each question

    Figure panels: Skill improved, Understanding, Explanation of lecturer, Adequate Speed ?, Supports, Responses to Questions.
  23. Step 3: Project the tagged words onto the MCA individual map.

    "Explanation" and "content" are characterized by negative expressions (successfully separated).
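One common way to place a supplementary category on the map of individuals is at the mean point (barycenter) of the individuals who have that category. The sketch below illustrates this idea only: the coordinates, the indicator column, and the function name `mean_point` are invented, and the code does not reproduce the output or scaling of GDAtools or any other package.

```python
def mean_point(coords, indicator):
    """Mean point (barycenter) of the individuals whose indicator is 1.

    Returns None when no individual has the category.
    """
    selected = [point for point, flag in zip(coords, indicator) if flag == 1]
    if not selected:
        return None
    n = len(selected)
    return tuple(sum(axis) / n for axis in zip(*selected))


# Invented individual coordinates on the first two MCA axes.
coords = [(0.5, 0.25), (-0.5, 0.75), (1.5, -0.25), (-1.0, -0.5)]
# Invented 0/1 column for a tagged word such as "explanation/n".
explanation_n = [1, 0, 1, 0]

print(mean_point(coords, explanation_n))
```

Comparing the mean points of a word's p and n columns gives a simple numeric counterpart to the visual separation described on this slide.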
  24. Interim Summary and Future Issues

    • As indicated above, the results suggest that entering the words obtained by text mining of the free-text responses as supplemental variables in MCA makes it possible to analyze the free-text portion and the categorical variables in combination.
    • They also suggest that text mining can be used not only to extract words but also, by tagging them with machine learning, to enable more detailed analysis.
    • The key issue to be addressed is whether workshop participants can be encouraged to respond to free-text questions.
    • Since the distribution of congratulatory responses is highly skewed, we would like to deepen the analysis by using CSA and other methods.
  25. Summary by charts

    Figure (flow chart of the analysis); recoverable labels:
    • Questionnaire: ID, Var1, Var2, …, Varn; open-ended free text answer parts
    • MCA; KH Coder / text mining; KWIC concordance; [Frequency List] of words; SDA with supplementary variables
    • Co-occurrence network map; CA map; MCA map
    • Step 0 (analysis separately), Step 1, Step 2; ABSA / machine learning; tagged GDA/SDA
    • Text analyzed by KWIC, referring to the maps
  26. Acknowledgments

    • This paper would not have been possible without the machine learning (ABSA) runs carried out by Kazuya Ohata, NICT's 2022 RA; thank you again for the co-authored paper and the poster session presentation at NLP2023, the Natural Language Processing conference held in March 2023.
    • The presenter's research on multiple correspondence analysis is also supported by Grant-in-Aid for Scientific Research (KAKENHI) 20K02162, "Research on Categorical Data Analysis Methods Focusing on Geometric Arrangement of Data". We would like to express our gratitude for the support. https://kaken.nii.ac.jp/ja/grant/KAKENHI-PROJECT-20K02162/