Extend the use of supplemental variables in GDA by applying machine learning to the free text descriptive response portion and combining it with MCA analysis

ver1.0 • CARME2023, 09/28, Room2, 11:00-12:30
Kazuo Fujimoto, Project Researcher, Institute for Mathematics and Computer Science, Tsuda University

Very short self-introduction: After CARME… After CARME2015, this translated book was
published. After CARME2019!

So after CARME2023… • Not decided …
2023/9/28 CARME2023@University of Bonn

Abstract
Linking the distribution of individuals within the space revealed by MCA with qualitative surveys is mentioned in the book [1] and has been practiced in research [2]. In Japan, the text analysis tool KH Coder [3] has become remarkably popular and is used in many social surveys. KH Coder's own functions make it possible to link text analysis with the selected (closed-ended) answers, and our first attempt at a mixed research method uses this functionality. The next step is to add the frequently occurring words (important words) obtained at this stage to the individual coordinates as supplementary variables in the MCA and to analyze them with a GDA method [4]. As a further step, this report presents an example [5] in which the frequently occurring words (important words) were tagged as positive/negative by machine learning and analyzed as supplementary variables. This approach extends the use of supplementary variables in GDA.

References
• [1] Le Roux, Brigitte, & Henry Rouanet. 2010. "Multiple correspondence analysis." Quantitative Applications in the Social Sciences 163. Thousand Oaks, Calif.: Sage Publications. "Between quantity and quality, there is geometry." (p. 1)
• [2] Tony Bennett, Mike Savage, Elizabeth Silva, Alan Warde, Modesto Gayo-Cal and David Wright. "Culture, Class, Distinction". 2009, 2010. Routledge.
• [3] KH Coder, https://khcoder.net/en/
• [4] Following [1] and using the GDAtools package for R: Robette N. (2023), GDAtools: Geometric Data Analysis in R, version 2.0, https://nicolas-robette.github.io/GDAtools/
• [5] Kazuo Fujimoto and Kazuya Ohata, "Development of a method for analyzing participant satisfaction survey data that combines MCA and Aspect Based Sentiment Analysis." (in Japanese), NLP2023. https://www.anlp.jp/proceedings/annual_meeting/2023/pdf_dir/Q1-11.pdf
• English version of [5]: https://419kfj.sakura.ne.jp/db/wp-content/uploads/2023/09/nlp2023−article_01−13v1.1_eng.pdf

Software related references
• Higuchi, Koichi 2017 "A Two-Step Approach to Quantitative Content Analysis: KH Coder Tutorial using Anne of Green Gables (Part II)" Ritsumeikan Social Sciences Review 53(1): 137-147. https://khcoder.net/en/
• Robette N. (2023), GDAtools: Geometric Data Analysis in R, version 2.0, https://nicolas-robette.github.io/GDAtools/
• RStudio Team (2020). RStudio: Integrated Development for R. RStudio, PBC, Boston, MA. http://www.rstudio.com/
• R Core Team (2023). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
• Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). "Welcome to the tidyverse." Journal of Open Source Software, 4(43), 1686. doi:10.21105/joss.01686.

Notice and Apology
• Because of a problem with the reporter's application, permission to reuse the raw data was not granted. The graphs and other information in the following report are therefore taken from the report presented to the Association for Natural Language Processing in March 2023, and no new analysis was conducted.
• Referenced report: Kazuo Fujimoto and Kazuya Ohata, "Development of a method for analyzing participant satisfaction survey data that combines MCA and Aspect Based Sentiment Analysis." (in Japanese), NLP2023 (https://www.anlp.jp/proceedings/annual_meeting/2023/pdf_dir/Q1-11.pdf) (an English version is also available)

Outline of my presentation

Outline of my presentation
• Characteristics of the data (congratulatory responses)
• Challenge: How can we extract improvement measures and issues when most of the responses are "good"?
• Step 0: Exploratory Data Analysis (EDA), MCA, and basic text mining, each performed separately.
• Step 1: Focus on free-text responses. Linking text mining and MCA.
• Step 2: Focus on the ambiguity of the most frequently used key words and phrases. Adding tags (positive/negative/none) by machine learning (ABSA: Aspect-Based Sentiment Analysis).
• Step 3: Project the tagged words onto the MCA individual map.
• Issue: The individuals associated with the important tagged words can be plotted on the whole individual map, but the amount of tagging depends on the dictionary used by the machine learning.
• Also, the MCA map is very biased to begin with, so we would like to deepen the analysis by utilizing class specific analysis (CSA).

Schematic overview of this report
• Projecting tagged extracted words as supplementary variables into the space resulting from MCA.
• Our trial is an attempt to create supplementary variables by text mining and machine-learning tagging, to plot them in the individual space, and thereby to develop another mixed research method.
• Le Roux, Brigitte, & Henry Rouanet. 2010. "Multiple correspondence analysis.", chapter 1: famous phrases.
• MCA and mixed research methods

Data Structure
• Rows: individuals with ID 1, 2, 3, …, m-3, m-2, m-1, m
• Columns: ID, Var1, Var2, …, Varn, followed by the open-ended free-text answer parts

Step 0: MCA and text mining separately
• Data: ID (1, 2, 3, …, N-3, N-2, N-1, N), Var1, Var2, …, Vark, and the free-text parts
• Closed-ended variables: specific MCA
• Free-text parts: text mining with KH Coder. One variable and its categories can be plotted with words in a co-occurrence network and in a CA plot.
• The mutual relations are examined with the KWIC concordance.
• The two analyses are run separately.
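The KWIC (keyword in context) concordance used in Step 0 can be illustrated with a minimal sketch. This is not KH Coder's implementation: the function name `kwic`, the whitespace tokenization, and the English example answers are all assumptions made for illustration (the survey answers are in Japanese and would need a morphological analyzer).

```python
def kwic(texts, keyword, width=3):
    """Return (left context, keyword, right context) for every hit.

    Hypothetical helper: tokens are split on whitespace and
    trailing punctuation is ignored when matching the keyword.
    """
    hits = []
    for text in texts:
        tokens = [t.strip(".,!?") for t in text.lower().split()]
        for i, tok in enumerate(tokens):
            if tok == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((left, tok, right))
    return hits


# Invented example answers, not taken from the survey data.
answers = [
    "The exercise time was too short for me.",
    "I had enough time to finish the log analysis.",
    "The explanation was clear.",
]

for left, kw, right in kwic(answers, "time"):
    print(f"{left:>20} [{kw}] {right}")
```

Reading the left and right contexts side by side is what reveals that the same frequent word (here "time") can be used both positively and negatively.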

Step 1: MCA and frequent words as supplementary variables
• Data: ID (1, 2, 3, …, N-3, N-2, N-1, N), Var1, Var2, …, Vark, and the free-text parts
• Added columns: Word1, Word2, Word3, …, Wordk, coded 1/0 for each individual (for example 1 0 1 0 1 1 0)
• Specific MCA and SDA
• Interpret the words using the KWIC concordance of Step 0
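The 1/0 word columns of Step 1 can be sketched as follows. This is a minimal illustration under stated assumptions: the answers are already tokenized (in the report this is done by KH Coder), and the function name `word_indicators` and the toy data are hypothetical.

```python
def word_indicators(tokenized_answers, words):
    """Build one 0/1 row per respondent: 1 if the word occurs in the answer.

    Hypothetical helper; a respondent with no free-text answer gets all zeros.
    """
    rows = []
    for tokens in tokenized_answers:
        present = set(tokens)
        rows.append([1 if w in present else 0 for w in words])
    return rows


# Invented toy data: three respondents, the second left the question blank.
frequent_words = ["time", "exercise", "content"]
tokenized = [
    ["exercise", "time", "short"],
    [],
    ["content", "content", "difficult"],
]

matrix = word_indicators(tokenized, frequent_words)
for respondent_id, row in enumerate(matrix, start=1):
    print(respondent_id, dict(zip(frequent_words, row)))
```

Each column can then be treated as a two-category supplementary variable (word present / absent) alongside the active questionnaire variables.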

Using the KWIC concordance of Step 0
• We found ambiguous meanings within the frequent words.
• So we took another approach, as follows:
• Put a p or n tag on each word, where p means "positive" and n means "negative".
• We perform this tagging with Aspect-Based Sentiment Analysis (ABSA).
• After tagging the words, build a data frame of them as supplementary variables.
• Overlay them on the individual space generated by MCA.

Step 2: MCA and tagged frequent words as supplementary variables
• Data: ID (1, 2, 3, …, N-3, N-2, N-1, N), Var1, Var2, …, Vark, and the free-text parts
• Added columns: Word1/p, Word1/n, Word2/p, …, Word/n, coded 1/0 for each individual (for example 1 0 1 0 1 1 0)
• Specific MCA and SDA
• Interpret the words using the KWIC concordance of Step 0
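The tagged columns (Word/p, Word/n) can be sketched in the same way. The sketch assumes that the ABSA step has already produced (word, tag) pairs for each respondent; the function name `tagged_indicators` and the toy pairs are hypothetical and do not reproduce the actual ABSA output.

```python
def tagged_indicators(tagged_answers, words, tags=("p", "n")):
    """Build 0/1 columns named 'word/p' and 'word/n' from (word, tag) pairs.

    Hypothetical helper: tagged_answers holds, per respondent, the list of
    (word, tag) pairs assumed to come from an ABSA tagger.
    """
    columns = [f"{w}/{t}" for w in words for t in tags]
    rows = []
    for pairs in tagged_answers:
        present = {f"{w}/{t}" for w, t in pairs}
        rows.append([1 if c in present else 0 for c in columns])
    return columns, rows


# Invented toy output of the tagging step for three respondents.
tagged = [
    [("time", "n"), ("exercise", "p")],
    [("exercise", "n")],
    [],
]

columns, rows = tagged_indicators(tagged, ["time", "exercise"])
print(columns)
for row in rows:
    print(row)
```

Splitting each word into a /p and a /n column is what lets positive and negative uses of the same word land at different places on the individual map.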

Step 0 and Step 1

Characteristics of the data and Challenge
• Characteristics of the data (congratulatory responses)
• Responses were given on a five-point scale.
• Most responses are 5 or 4. Average is ….
• The seminar was an information security workshop, and participants were highly motivated.
• Challenge: How can we extract improvement measures and issues when most of the responses are "good"?
• If it were sufficient to conclude from these results that the event was a success, there would be nothing more to say.
• However, it is necessary to identify the issues that need to be addressed in order to make the event even better.

Step 0: Exploratory Data Analysis (EDA) and MCA
• Number of respondents: 2001
• Confirmation of the relationship between satisfaction and the other responses.
• Analysis of the distribution of the data by MCA confirms the tendencies of the dissatisfied respondents.
• Responses that could lead to improvement (free-text responses) are not found in the dissatisfied group.
• An analysis of the free-text statements of the satisfied respondent group is therefore needed.

Pairs display of "skills: improved" and "understanding"
• A large portion of "understanding" is accounted for by "skills: improved".
• Note: "Don't understand" and skill improvement are not related.
• Congratulatory responses
• "That wasn't so bad, was it?" (polite responses)
• Involvement and self-identification confirmation responses
• "As long as you participated, there should be results."
• There are issues to be clarified here.
• Categories of "skills: improved": very improved, improved, no change, don't know, NA
• Categories of "understanding": understand well, understand, don't understand some, don't understand many, NA

These three questions are biased toward positive: instructor's explanation and others, focusing on understanding
• Seen in this way, responses about "instructor explanation", "support", and "response" are considered to be uninformative with respect to "understanding".
• Figure panels: understanding, instructor explanation, support, responses (axis: ← positive / negative →)

Step 1: Focus on free-text responses. Linking text mining
and MCA • Respondents with extremely low satisfaction did not answer the open-ended (free-text) questions either. Therefore, their responses cannot be used to explore areas for improvement in the workshops.

Space generation by MCA (speMCA, excluding only NA)
• Figure labels: "Completely disagree"; clustering of response patterns

Number of responses and response rate to open-ended free-text questions (Q15-2, Q20, Q22)
• Answered all three questions: 223 (14.8%)
• Reasons for "understand" responses (Q15-2): 742+348+87+223=1400 (70.0%)
• Course environment (Q20): 73+348+223+9=653 (32.6%)
• Other overall impressions (Q22): 24+87+223+9=343 (17.1%)
• Figure: Venn diagram of the three questions (reasons for "understand", course environment, overall impressions)
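The sums and rates above can be checked with a short calculation. The assignment of each count to a region of the Venn diagram is inferred from which counts appear in which sum, and the denominators are assumptions: the per-question rates match the 2001 respondents reported in Step 0, while 14.8% matches 223 divided by the 1506 respondents who answered at least one of the three questions.

```python
# Region counts of the three-question Venn diagram.
# The region assignment is inferred from the sums on the slide (assumption).
regions = {
    ("Q15-2",): 742,
    ("Q15-2", "Q20"): 348,
    ("Q15-2", "Q22"): 87,
    ("Q15-2", "Q20", "Q22"): 223,
    ("Q20",): 73,
    ("Q20", "Q22"): 9,
    ("Q22",): 24,
}
N_RESPONDENTS = 2001  # number of respondents reported in Step 0


def answered(question):
    """Total number of respondents who answered the given question."""
    return sum(n for qs, n in regions.items() if question in qs)


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


answered_any = sum(regions.values())
all_three = regions[("Q15-2", "Q20", "Q22")]

for q in ("Q15-2", "Q20", "Q22"):
    print(q, answered(q), pct(answered(q), N_RESPONDENTS))
print("at least one question:", answered_any)
print("all three / at least one:", pct(all_three, answered_any))
print("all three / all respondents:", pct(all_three, N_RESPONDENTS))
```

Under these assumptions, the share of respondents who answered all three questions is 14.8% of those who wrote anything, but 11.1% of all 2001 respondents.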

Step 2

Step 2: Focus on the ambiguity of frequently used key words and phrases
• Tag these words (positive/negative/none) by machine learning (ABSA).
• Words with both p and n occurrences: time, exercise, content, knowledge, terminology, explanation, training, lecture
• Negative words, top 5: ('time', 84), ('exercise', 72), ('content', 52), ('knowledge', 44), ('term', 30)
• Positive words, top 5: ('exercise', 120), ('content', 94), ('explanation', 78), ('training', 49), ('lecture', 37)
• The table on the next page shows the list of extracted words without the p/n tags. The frequent words detected by the aspect-based sentiment analysis are marked in it.
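As a small worked example of this ambiguity, the two top-5 lists can be compared directly. Only the counts shown on the slide are used, so a word missing from one list (for example "time" among the positives) may still have occurrences below that list's top 5; the variable names are hypothetical.

```python
# Top-5 counts exactly as listed on the slide.
negative_top5 = {"time": 84, "exercise": 72, "content": 52,
                 "knowledge": 44, "term": 30}
positive_top5 = {"exercise": 120, "content": 94, "explanation": 78,
                 "training": 49, "lecture": 37}

# Words that appear in both top-5 lists.
in_both = sorted(set(negative_top5) & set(positive_top5))

# Share of negative tags among the tagged occurrences of each such word.
negative_share = {
    w: negative_top5[w] / (negative_top5[w] + positive_top5[w])
    for w in in_both
}

for w in in_both:
    print(f"{w}: n={negative_top5[w]}, p={positive_top5[w]}, "
          f"negative share={negative_share[w]:.3f}")
```

For "exercise" and "content", more than a third of the tagged occurrences are negative, which is why an untagged word count alone cannot be read as praise or as criticism.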

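Words with a high number of occurrences with ambiguous usage
• Time • Exercise • Contents • Knowledge • Explanation • Training • Specific terms • Lecture
Extracted terms and their frequencies (top 40, with English glosses):
Rank  Term  Frequency
1  理解 (understand)  583
2  時間 (time)  516
3  思う (think)  488
4  インシデント (incident)  387
5  演習 (exercise)  363
6  内容 (content)  351
7  対応 (response, handling)  316
8  知識 (knowledge)  307
9  感じる (feel)  254
10  ログ (log)  208
11  事前 (in advance)  183
12  実際 (actual)  182
13  説明 (explanation)  175
14  学習 (learning)  171
15  部分 (part)  168
16  多い (many)  160
17  もう少し (a little more)  154
18  受講 (attend a course)  153
19  セキュリティ (security)  151
20  報告 (report)  149
21  流れ (flow)  148
22  発生 (occurrence)  143
23  ありがとう (thank you)  141
24  研修 (training)  135
25  非常 (very)  134
26  勉強 (study)  134
27  業務 (work duties)  130
28  行う (carry out)  127
29  解析 (analysis)  124
30  情報 (information)  124
31  具体 (concrete)  122
32  用語 (term)  119
33  難しい (difficult)  107
34  グループ (group)  98
35  参加 (participate)  98
36  分かる (understand, see)  98
37  自分 (oneself)  95
38  良い (good)  94
39  講義 (lecture)  93
40  必要 (necessary)  93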

Response patterns for each question
• Figure panels: Skill improved, Understanding, Explanation of lecturer, Adequate speed?, Support, Responses to questions

Step 3: Project the tagged words onto the MCA individual
map. • "Explanation" and "content" are characterized by negative expressions (the positive and negative uses were successfully separated).
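One common way to place a supplementary category such as "explanation/n" on the individual map is at the mean point (barycenter) of the individuals who carry it. The sketch below illustrates only that idea: the coordinates and indicator values are invented, `mean_point` is a hypothetical helper, and a package such as GDAtools may report supplementary coordinates on a different scale.

```python
def mean_point(coords, indicator):
    """Mean point of the individuals whose indicator equals 1.

    coords: one (axis 1, axis 2) tuple per individual from the MCA.
    indicator: 0/1 values of one tagged-word column, in the same order.
    Returns None when no individual carries the tag.
    """
    selected = [c for c, flag in zip(coords, indicator) if flag == 1]
    if not selected:
        return None
    n_dims = len(selected[0])
    return tuple(sum(c[d] for c in selected) / len(selected)
                 for d in range(n_dims))


# Invented individual coordinates on the first two MCA axes.
coords = [(-0.4, 0.1), (0.2, -0.3), (1.0, 0.8), (0.6, 0.2)]
# Invented 0/1 column: who used "explanation" with a negative tag.
explanation_n = [0, 0, 1, 1]

print(mean_point(coords, explanation_n))
```

Comparing the mean points of the /p and /n columns of the same word shows whether its positive and negative uses come from different regions of the map.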

Interim Summary and Future Issues

Interim Summary and Future Issues
• As indicated above, the results suggest that entering words extracted from the free-text responses by text mining as supplementary variables in MCA makes it possible to analyze the free-text portion together with the categorical variables.
• They also suggest that text mining can be used not only to extract words but, combined with machine-learning tagging, to enable a more detailed analysis.
• The key issue to be addressed is whether workshop participants can be encouraged to answer the free-text questions.
• Since the distribution of the congratulatory responses is highly skewed, we would like to deepen the analysis by using CSA and other methods.

Summary by charts
• Questionnaire data: ID, Var1, Var2, …, Varn, plus the open-ended free-text answer parts
• Step 0: separate analyses: MCA (MCA map) and KH Coder text mining (frequency list of words, co-occurrence network map, CA map, KWIC concordance)
• Step 1: SDA with supplementary variables
• Step 2: ABSA/machine learning, tagged GDA/SDA
• The text is analyzed with KWIC, referring to the maps.
• [Figures: co-occurrence network map, CA map, MCA map]

Acknowledgments

Acknowledgments
• This paper would not have been possible without the machine learning (ABSA) runs carried out by Kazuya Ohata, NICT's 2022 RA. Thank you again for the co-authored paper and the poster session presentation at the Natural Language Processing conference NLP2023 in March 2023.
• The reporter's research on multiple correspondence analysis is also supported by Grant-in-Aid for Scientific Research (KAKENHI) 20K02162, "Research on Categorical Data Analysis Methods Focusing on Geometric Arrangement of Data". We would like to express our gratitude for the support. https://kaken.nii.ac.jp/ja/grant/KAKENHI-PROJECT-20K02162/

Thank you for your attention. Questions and suggestions are welcome.

MEMO
