Slide 1

Slide 1 text

Methods and Examples of Correspondence Analysis
July 17, 2024 @ Chiba University
Institute for Mathematics and Computer Science, Tsuda University
Project Manager: Kazuo Fujimoto
[email protected]

Slide 2

Slide 2 text

Acknowledgement
• This work was supported by JSPS KAKENHI Grant Numbers JP20K02162 and JP23K22184.
2024/07/17 Response Analysis Methodology and Examples 2

Slide 3

Slide 3 text

Revision log
• ver0.96 2024/07/17 05:30: "Tsuda College" changed to "Tsuda University"; Acknowledgement page added.
• ver0.95 2024/07/16 22:30: Second CA example translated into English.
• ver0.9 2024/07/16: Edited from the presentation reported at the first meeting of the "Beyond Governance-Type Ethics" workshop (Ver1.0, 2024/06/29).

Slide 4

Slide 4 text

Related files
• Download "CA_workshop_qmd_files.zip" from
  http://133.167.73.14/~kazuo/CA_workshop/CA_workshop_qmd_files.zip
  and extract it into a project directory.
• It contains two .qmd files and one .rda file.

Slide 5

Slide 5 text

Profile of Lecturer
• Born in Tokyo, 1955
• 1978/3 Graduated from Sophia University, Faculty of Science and Engineering, Department of Electrical and Electronic Engineering
• After graduation, worked for a company as an engineer and as marketing staff until 2002/3.
• Studied sociology at Tokyo Metropolitan University Graduate School from 1990 to 1992.
• 2002/4-2020/3 Worked at Sakushin Gakuin University.
  • Sociology
  • Communication theory
  • IT Security Issues/Information Systems Theory
  • Social Research Practice
• 2020/3 Retired; Professor Emeritus.
• After retiring from Sakushin Gakuin University:
  • From April 2021, Project Researcher at the Institute for Mathematics and Computer Science, Tsuda University.
  • From September 2021, Invited Expert at the National Institute of Information and Communications Technology (NICT).

Slide 6

Slide 6 text

Research Themes
• Categorical Data Analysis in Social Research
• Translations related to Correspondence Analysis
  • 2015, "Introduction to Correspondence Analysis," Ohmsha, Inc.
  • 2020, "Theory and Practice of Correspondence Analysis," Ohmsha, Inc.
• Current Research Topic
  • "A Study of Categorical Data Analysis Focusing on the Geometrical Structure of Data"
  • KAKEN DB https://bit.ly/3sOg8JI

Slide 7

Slide 7 text

Today's Program
• From 09:30
  • Keywords for understanding Correspondence Analysis (CA)
  • Principle: mathematical tools for dimension reduction
  • Case studies
    • CA case study
    • MCA case studies
    • MCA applications
      • Structured Data Analysis
• From 11:00
  • CA/MCA in R / Practice
• From 12:00
  • Q&A and Appendix

Slide 8

Slide 8 text

What is Categorical data? How do you process categorical data?

Slide 9

Slide 9 text

What do you do when you face categorical data?
• Classification of research data
  • Quantitative variables
    • Numerical
  • Categorical/qualitative variables
    • Nominal variables
    • Ordinal variables
• The measurement "scale" is a very important attribute of data.
• Ref: "measurement scale" https://www.britannica.com/topic/measurement-scale

Slide 10

Slide 10 text

Ordinal variables: Scaling
[Figure: answers to Question A and Question B, each encoded as 5, 4, 3, 2, 1]
Is answer 4 to Question A the same "strength" as answer 4 to Question B?

Slide 11

Slide 11 text

Your action?
• To store data in a computer, categories are often coded numerically.
  • 1, 2, 3, 4, 5
• Do you calculate the mean or variance, even though the number is just a name?
• In Excel, if a value "looks like" a number, you can calculate with it as a number, even if it is a nominal variable.
• With ordinal variables, such processing seems even more reasonable.
• But when answers are coded on a five-point scale as 5, 4, 3, 2, 1, is 4 double 2? Is there an equal difference of 1 between adjacent answers? To begin with, is the scale from 5 to 1 linear?
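To make the question on this slide concrete, the base-R sketch below scores the same made-up answers from two hypothetical groups in two ways. Both scorings keep the order 1 < 2 < 3 < 4 < 5 and differ only in spacing (the "stretched" scoring is an arbitrary assumption for illustration), yet they rank the two group means in opposite order.

```r
# Hypothetical five-point answers from two groups (5 = strongly agree ... 1 = strongly disagree)
x <- c(4, 4, 4, 4)
y <- c(5, 5, 1, 1)

# Integer (Likert-style) scoring: each code is used directly as its score
integer_score <- c(`1` = 1, `2` = 2, `3` = 3, `4` = 4, `5` = 5)
# Hypothetical alternative scoring: same order, but the top category is stretched
stretched_score <- c(`1` = 1, `2` = 2, `3` = 3, `4` = 4, `5` = 10)

# Mean of the answers after replacing each code by its score
mean_under <- function(answers, scores) mean(scores[as.character(answers)])

mean_under(x, integer_score)    # 4
mean_under(y, integer_score)    # 3:   x looks "higher" than y
mean_under(x, stretched_score)  # 4
mean_under(y, stretched_score)  # 5.5: now y looks "higher" than x
```

Nothing in the ordinal data itself tells us which spacing is "correct"; that is the gap the quantification approach of CA/MCA addresses.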

Slide 12

Slide 12 text

Integer Scale or Likert Scale
• To apply statistical methods such as the mean and variance, categorical answers must be converted to numeric values.
• Integer scaling (Likert scaling) is certainly one such method.
• But is it adequate for investigating the data (and how respondents answered)?
• CA and MCA provide a quantification method that preserves the structure of the data.

Slide 13

Slide 13 text

What is CA/MCA?
Quantification that focuses on response patterns.

Slide 14

Slide 14 text

Relationship between CA and MCA
• Correspondence Analysis 対応分析 / 대응 분석
  • "CA" for short.
  • Some call it "Korepon" (in Japanese).
• Multiple Correspondence Analysis 多重対応分析 / 다중 대응 분석
  • "MCA" for short.
  • In contrast, CA is sometimes called simple CA.

Slide 15

Slide 15 text

Data format
[Figure: for CA, a cross table of Variable 1 × Variable 2; for MCA, a table of individuals × multiple variables (answers)]

Slide 16

Slide 16 text

Both formats are the same: two-dimensional tables.
• Expand each "variable" into columns of its categories (indicator matrix).
• The "individuals × variables" table is converted into an "individuals × categories" table.
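A minimal base-R sketch of this conversion, using a small made-up survey. The variable names `Q1`, `Q2` and the helper `indicator_for()` are hypothetical, chosen only for illustration; the workshop files may build the table differently.

```r
# Hypothetical survey: 4 individuals x 2 categorical variables
survey <- data.frame(
  Q1 = factor(c("agree", "disagree", "agree", "neutral")),
  Q2 = factor(c("yes", "no", "no", "yes"))
)

# Hypothetical helper: expand one factor into 0/1 columns, one per category
indicator_for <- function(f, prefix) {
  m <- sapply(levels(f), function(lev) as.integer(f == lev))
  colnames(m) <- paste(prefix, levels(f), sep = "_")
  m
}

# Bind the expanded blocks side by side: individuals x categories
Z <- do.call(cbind, Map(indicator_for, survey, names(survey)))
Z
rowSums(Z)  # each individual selects exactly one category per variable
```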

Slide 17

Slide 17 text

CA and MCA share the same mechanism
• MCA is performed by applying CA to the indicator matrix, which is converted from the individuals × variables matrix.
• The "row space" and "column space" of CA correspond to the "individual space" and "variable (category) space" of MCA.
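The statement that MCA is CA of the indicator matrix can be checked with a short base-R sketch on invented data. It builds the indicator matrix, computes the CA standardized residuals, and compares the total inertia with J/Q − 1 (J categories in total, Q variables), a known property of the indicator matrix when no category is empty.

```r
# Hypothetical survey: 6 individuals x Q = 2 categorical variables (J = 3 + 2 = 5 categories)
survey <- data.frame(
  Q1 = factor(c("A", "B", "C", "A", "B", "C")),
  Q2 = factor(c("yes", "no", "no", "yes", "yes", "no"))
)

# Indicator matrix Z (individuals x categories): 1 where the category was chosen
Z <- do.call(cbind, lapply(survey, function(f) outer(as.character(f), levels(f), "==") + 0))

# CA of Z: standardized residuals (P - r c^T) / sqrt(r c^T), element by element
P  <- Z / sum(Z)
r  <- rowSums(P)            # every individual has the same mass 1/n
cm <- colSums(P)            # category masses
S  <- (P - outer(r, cm)) / sqrt(outer(r, cm))

total_inertia <- sum(S^2)   # equals the sum of all eigenvalues of the CA of Z
J <- ncol(Z)
Q <- ncol(survey)
c(total_inertia, J / Q - 1) # both 1.5 here
```

Because this total depends only on J and Q, not on the answers, the raw inertia percentages of a classical MCA tend to look small.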

Slide 18

Slide 18 text

Key words for CA/MCA
• Data: the object of analysis
  • Cross table
    • CA: two variables; MCA: multiple variables
    • In fact, MCA also analyzes a two-"variable" table:
      • rows: individuals × columns: response categories (expanded from the response variables)
• Related methods
  • Quantification of categorical data
    • Chikio Hayashi, Quantification Method Type III
    • Shizuhiko Nishisato, "Necessity of Quantification," Kwansei Gakuin University Booklet

Slide 19

Slide 19 text

Sociology and MCA
• A famous application of MCA
• The sociology of relationships as opposed to the sociology of variables
  • From the preface to the German edition of Le métier de sociologue
  • The "sociology of variables" refers to regression analysis.
• Bourdieu stated: "I use Correspondence Analysis very much, because I think that it is essentially a relational procedure whose philosophy fully expresses what in my view constitutes social reality. It is a procedure that 'thinks' in relations, as I try to do with the concept of field." [2]
[2] Preface to the German edition of Le métier de sociologue, 1991; Lebaron, Frédéric, 2009: 13.
Lebaron, Frédéric, 2009, "How Bourdieu 'Quantified' Bourdieu: The Geometric Modelling of Data," in Karen Robson and Chris Sanders (eds.), Quantifying Theory: Pierre Bourdieu, Springer.

Slide 20

Slide 20 text

Summary before explaining the CA case study
• Format of input data
  • CA and MCA
• Results generated by CA/MCA
  • Two spaces
  • Coordinate axes
  • Coordinates
    • Two kinds of coordinates
      • principal coordinates
      • standard coordinates
• Interpreting the points from a geometric point of view
  • The origin of the graph is the overall average.
  • Similar points are placed close together.
  • Different points are placed far apart.
  • What you are looking at are the row/column profiles.

Slide 21

Slide 21 text

Input data format
• CA takes a cross table as input.
  • Two variables: a two-dimensional table
• MCA takes a table of rows: individuals × columns: variables, such as survey data.
  • However, in processing, each variable column is expanded into columns for its categories (choices), and CA is applied to the resulting indicator matrix, which has 1 in each selected cell. (This is one of the classical forms of MCA.)

Slide 22

Slide 22 text

Both formats are the same: two-dimensional tables.
• Expand each "variable" into columns of its categories (indicator matrix).
• The "individuals × variables" table is converted into an "individuals × categories" table.

Slide 23

Slide 23 text

CA/MCA results
• CA/MCA compares row and column profiles.
  • Profiles: row proportions and column proportions
  • A row profile shows how each row is distributed across the columns.
• Distances (chi-square distances) between the row/column profile vectors are calculated.
  • The origin is the overall average point (the expected value).
  • Similar profiles are located close together, and different profiles far apart.
• Mathematically, the matrix of residuals of the input matrix from its expected values is decomposed by singular value decomposition (SVD).
  • The result is a matrix for the row coordinates, a diagonal matrix related to the variance of the coordinate axes, and a matrix for the column coordinates.
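The following base-R sketch computes row profiles and chi-square distances for a small hypothetical 3 × 3 table (the counts and labels `r1`…`c3` are invented). It also checks a standard identity: the mass-weighted sum of squared distances from the average profile (the origin) equals the total inertia, which is the chi-square statistic divided by the grand total n.

```r
# Hypothetical 3 x 3 contingency table (counts)
N <- matrix(c(20, 10,  5,
              10, 20, 10,
               5, 10, 30),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("r1", "r2", "r3"), c("c1", "c2", "c3")))

n <- sum(N)
row_mass <- rowSums(N) / n         # weight of each row
col_mass <- colSums(N) / n         # average row profile (the origin)
R <- prop.table(N, 1)              # row profiles: each row sums to 1

# Chi-square distance: squared differences weighted by 1 / column mass
chisq_dist <- function(a, b, mass) sqrt(sum((a - b)^2 / mass))

chisq_dist(R["r1", ], R["r2", ], col_mass)   # between two row profiles
chisq_dist(R["r1", ], R["r3", ], col_mass)   # r1 and r3 differ more

# Distance of every row profile from the average profile (the origin)
d0 <- apply(R, 1, chisq_dist, b = col_mass, mass = col_mass)

# Mass-weighted sum of squared distances = total inertia = chi-square / n
total_inertia <- sum(row_mass * d0^2)
all.equal(total_inertia, unname(chisq.test(N)$statistic) / n)   # TRUE
```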

Slide 24

Slide 24 text

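SVD of the residual matrix
• SVD yields three matrices, $U$, $D_\alpha$ and $V$: $S = U D_\alpha V^\top$
  • $U$ relates to the row coordinates.
  • $V$ relates to the column coordinates.
  • $\alpha$ are the singular values (square roots of the eigenvalues).
• Standardization of the residuals: $S = D_r^{-1/2}(P - E)D_c^{-1/2}$, where $E = r c^\top$ is the expected matrix built from the row margins $r$ and column margins $c$, $P - E$ is the matrix of residuals, and $D_r^{-1/2}$, $D_c^{-1/2}$ are diagonal matrices whose entries are the inverse square roots of the row and column margins.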
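A minimal base-R sketch of this decomposition for a hypothetical 3 × 3 table: it standardizes the residuals, runs `svd()`, and confirms that U, D_α and V reproduce S. For an I × J table there are at most min(I, J) − 1 non-zero singular values, so the third one here is zero up to rounding.

```r
# Hypothetical 3 x 3 contingency table (counts)
N <- matrix(c(20, 10,  5,
              10, 20, 10,
               5, 10, 30),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("r1", "r2", "r3"), c("c1", "c2", "c3")))

P  <- N / sum(N)                  # correspondence matrix
r  <- rowSums(P)                  # row margins (masses)
cm <- colSums(P)                  # column margins (masses)
E  <- outer(r, cm)                # expected matrix under independence

# Standardization: S = Dr^(-1/2) (P - E) Dc^(-1/2)
S <- diag(1 / sqrt(r)) %*% (P - E) %*% diag(1 / sqrt(cm))

sv     <- svd(S)                  # S = U D_alpha V^T
alpha  <- sv$d                    # singular values, largest first
lambda <- alpha^2                 # eigenvalues (principal inertias)

# The three matrices reproduce S (up to floating-point error)
S_back <- sv$u %*% diag(alpha) %*% t(sv$v)
max(abs(S - S_back))

lambda   # the third value is ~0: a 3 x 3 table has at most 2 dimensions
```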

Slide 25

Slide 25 text

Core calculation of CA/MCA
• $P = M/n$: the input table $M$ divided by its grand total $n$
• Residual matrix: $S = D_r^{-1/2}(P - r c^\top)D_c^{-1/2}$, where $D_r^{-1/2}$ and $D_c^{-1/2}$ are diagonal matrices whose entries are the inverse square roots of the row margins $r$ and the column margins $c$
• SVD: $S = U D_\alpha V^\top$
• Row standard coordinates: $\Phi = D_r^{-1/2} U$; column standard coordinates: $\Gamma = D_c^{-1/2} V$
• Row principal coordinates: $F = \Phi D_\alpha$; column principal coordinates: $G = \Gamma D_\alpha$
• Dimension reduction is performed by keeping only the largest singular values $\alpha$.
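The formulas on this slide translate almost line by line into base R. The sketch below uses a hypothetical 3 × 3 table and keeps k = 2 dimensions; `F_row` and `G_col` are used instead of `F` and `G` because `F` is a built-in shorthand for `FALSE` in R. The workshop itself uses packages such as FactoMineR; this is only an illustration of the matrix algebra.

```r
# Hypothetical 3 x 3 contingency table M (counts)
N <- matrix(c(20, 10,  5,
              10, 20, 10,
               5, 10, 30),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("r1", "r2", "r3"), c("c1", "c2", "c3")))

P  <- N / sum(N)                           # P = M / n
r  <- rowSums(P)                           # row margins
cm <- colSums(P)                           # column margins
S  <- diag(1 / sqrt(r)) %*% (P - outer(r, cm)) %*% diag(1 / sqrt(cm))

sv <- svd(S)
k  <- 2                                    # number of dimensions kept
U  <- sv$u[, 1:k, drop = FALSE]
V  <- sv$v[, 1:k, drop = FALSE]
D_alpha <- diag(sv$d[1:k], nrow = k)       # nrow = k avoids diag()'s scalar special case

Phi   <- diag(1 / sqrt(r))  %*% U          # row standard coordinates
Gamma <- diag(1 / sqrt(cm)) %*% V          # column standard coordinates
F_row <- Phi %*% D_alpha                   # row principal coordinates F
G_col <- Gamma %*% D_alpha                 # column principal coordinates G

dimnames(F_row) <- list(rownames(N), paste0("Dim", 1:k))
dimnames(G_col) <- list(colnames(N), paste0("Dim", 1:k))
F_row
G_col
```

The signs of the singular vectors are arbitrary, so another program (for example FactoMineR's `CA()`) may show the same map mirrored along an axis.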

Slide 26

Slide 26 text

CA input table and two profiles
[Figure: the input table (row1…rowm × col1…coln, with row sums and column sums) and its two profiles. Row profile R: each row divided by its row sum, so every row sums to 1, with the average row profile in the last line. Column profile C: one row per column category (col1…coln), each divided by its column sum so that it sums to 1, with the average column profile in the last line.]
• From the point of view of the rows: row analysis
• From the point of view of the columns: column analysis
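In code, the two profiles are just row and column proportions. This base-R sketch (hypothetical counts) also confirms what the "average profile" lines of the figure express: the mass-weighted average of the row profiles equals the column margin, and the average of the column profiles equals the row margin.

```r
# Hypothetical 3 x 3 contingency table (counts)
N <- matrix(c(20, 10,  5,
              10, 20, 10,
               5, 10, 30),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("r1", "r2", "r3"), c("c1", "c2", "c3")))

row_prof <- prop.table(N, 1)   # row profiles: each row sums to 1
col_prof <- prop.table(N, 2)   # column profiles: each column sums to 1

row_mass <- rowSums(N) / sum(N)
col_mass <- colSums(N) / sum(N)

# Average row profile: mass-weighted mean of the row profiles
avg_row_profile <- colSums(row_prof * row_mass)
# Average column profile: mass-weighted mean of the column profiles
avg_col_profile <- rowSums(sweep(col_prof, 2, col_mass, "*"))

rbind(avg_row_profile, col_mass)   # identical rows
rbind(avg_col_profile, row_mass)   # identical rows
```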

Slide 27

Slide 27 text

CA generates a row space and a column space
[Figure: an m × n input table (row1…rowm × col1…coln, with row and column sums) generating a row space and a column space]
• Generating the spaces means generating axes, Dim1, Dim2, … (at most min(m, n) − 1 of them).
  • Each dimension has an inertia (variance), obtained by decomposing the total inertia (variance) of the input table.
  • The inertia of each dimension is the same in the row space and the column space.
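A base-R sketch for a hypothetical 3 × 3 table shows the last point numerically: the inertia of each axis computed from the row points (weighted by row masses) equals the inertia computed from the column points (weighted by column masses), and both equal the eigenvalue λ of that axis.

```r
# Hypothetical 3 x 3 contingency table (counts)
N <- matrix(c(20, 10,  5,
              10, 20, 10,
               5, 10, 30),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("r1", "r2", "r3"), c("c1", "c2", "c3")))

P  <- N / sum(N)
r  <- rowSums(P)
cm <- colSums(P)
S  <- diag(1 / sqrt(r)) %*% (P - outer(r, cm)) %*% diag(1 / sqrt(cm))
sv <- svd(S)
k  <- 2                         # a 3 x 3 table has at most min(3, 3) - 1 = 2 dimensions
D_alpha <- diag(sv$d[1:k], nrow = k)

F_row <- diag(1 / sqrt(r))  %*% sv$u[, 1:k] %*% D_alpha   # row principal coordinates
G_col <- diag(1 / sqrt(cm)) %*% sv$v[, 1:k] %*% D_alpha   # column principal coordinates

# Inertia of each axis, computed separately in the row space and in the column space
row_inertia <- colSums(r  * F_row^2)
col_inertia <- colSums(cm * G_col^2)

rbind(row = row_inertia, column = col_inertia, lambda = sv$d[1:k]^2)  # three equal rows
sum(row_inertia)                # total inertia of the table (chi-square / n)
```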

Slide 28

Slide 28 text

CA's three results
[Figure: SVD of the residual matrix of the input table (row1…rowm × col1…coln, with row and column sums) yields three results: row coordinates F (row1, row2, row3, … on Dim1, …), column coordinates G (col1, col2, col3, … on Dim1, …), and eigenvalues λ (one per dimension)]

Slide 29

Slide 29 text

Row and column relationships
• Row coordinates F (after CA) and row profiles R (before CA)
• Column coordinates G (after CA) and column profiles C (before CA)
are linked through the inertia (variance, eigenvalue, or squared singular value) of the dimensions generated by CA (the transition formula).
• This relationship makes it possible to use "supplementary variables" and "supplementary categories", which do not contribute to creating the space but still obtain coordinates in it.
• This is a basic feature of Structured Data Analysis (SDA).

Slide 30

Slide 30 text

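Transition Formula
[Figure: example with areas of Oslo (center, northern part) as rows and crime types (robbery, fraud, destruction) as columns: row coordinates F and column coordinates G on Dim1 and Dim2, and eigenvalues λ for Dim1 and Dim2]
• $F = R\,G\,D_\lambda^{-1/2}$
• The column profiles are handled in the same way by transposing the table, so the same form holds for the columns: $G = C\,F\,D_\lambda^{-1/2}$.
(R: row profiles; C: column profiles, one row per column category; $D_\lambda$: diagonal matrix of the eigenvalues $\lambda = \alpha^2$, so $D_\lambda^{-1/2} = D_\alpha^{-1}$.)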
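A base-R check of both transition formulas on a hypothetical 3 × 3 table (not the Oslo data from the slide). `R` holds the row profiles and `C_prof` the column profiles as rows, matching the layout of the column profile table; D_λ^{-1/2} is built from λ = α².

```r
# Hypothetical 3 x 3 contingency table (counts)
N <- matrix(c(20, 10,  5,
              10, 20, 10,
               5, 10, 30),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("r1", "r2", "r3"), c("c1", "c2", "c3")))

P  <- N / sum(N)
r  <- rowSums(P)
cm <- colSums(P)
S  <- diag(1 / sqrt(r)) %*% (P - outer(r, cm)) %*% diag(1 / sqrt(cm))
sv <- svd(S)
k  <- 2
lambda <- sv$d[1:k]^2
D_lambda_inv_sqrt <- diag(1 / sqrt(lambda), nrow = k)

F_row <- diag(1 / sqrt(r))  %*% sv$u[, 1:k] %*% diag(sv$d[1:k], nrow = k)
G_col <- diag(1 / sqrt(cm)) %*% sv$v[, 1:k] %*% diag(sv$d[1:k], nrow = k)

R      <- prop.table(N, 1)        # row profiles (rows sum to 1)
C_prof <- t(prop.table(N, 2))     # column profiles as rows (rows sum to 1)

# Transition formulas: each set of coordinates from the other set and the profiles
F_from_G <- R %*% G_col %*% D_lambda_inv_sqrt
G_from_F <- C_prof %*% F_row %*% D_lambda_inv_sqrt

all.equal(F_from_G, F_row, check.attributes = FALSE)   # TRUE
all.equal(G_from_F, G_col, check.attributes = FALSE)   # TRUE
```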

Slide 31

Slide 31 text

Projecting supplementary variables into the generated spaces
[Figure: the input table (rows row1…rown, active columns col1…coln, with column sums) plus additional supplementary columns]
• Active variables: generate the row/column spaces.
  • Objective ("dependent") variables
• Project the supplementary variables into those spaces.
  • Explanatory ("independent") variables
• The new axes are "new variables".
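Projecting a supplementary column uses the column transition formula, G = C F D_λ^{-1/2}, with the supplementary column's own profile. In this base-R sketch the active table and the supplementary counts `sup_col` are invented, and `project_column()` is a hypothetical helper; as a sanity check, projecting an active column as if it were supplementary reproduces its coordinates.

```r
# Hypothetical active table: columns c1..c3 build the space
N <- matrix(c(20, 10,  5,
              10, 20, 10,
               5, 10, 30),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("r1", "r2", "r3"), c("c1", "c2", "c3")))
# Hypothetical supplementary column: counts that are NOT used to build the space
sup_col <- c(r1 = 12, r2 = 8, r3 = 4)

P  <- N / sum(N)
r  <- rowSums(P)
cm <- colSums(P)
S  <- diag(1 / sqrt(r)) %*% (P - outer(r, cm)) %*% diag(1 / sqrt(cm))
sv <- svd(S)
k  <- 2
D_alpha <- diag(sv$d[1:k], nrow = k)
F_row <- diag(1 / sqrt(r))  %*% sv$u[, 1:k] %*% D_alpha   # active row coordinates
G_col <- diag(1 / sqrt(cm)) %*% sv$v[, 1:k] %*% D_alpha   # active column coordinates
D_lambda_inv_sqrt <- diag(1 / sv$d[1:k], nrow = k)         # lambda^(-1/2) = 1 / alpha

# Column transition formula applied to any column of counts over the same rows
project_column <- function(counts) {
  as.vector((counts / sum(counts)) %*% F_row %*% D_lambda_inv_sqrt)
}

project_column(sup_col)        # supplementary column on Dim1, Dim2

# Check: an active column projected "as if supplementary" lands on its own coordinates
all.equal(project_column(N[, "c1"]), G_col[1, ])   # TRUE
```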

Slide 32

Slide 32 text

Structured Data Analysis
• Structured modeling
  • Define active variables and supplementary variables.
• Active variables
  • Contribute to generating the row/column spaces.
• Supplementary variables
  • Do not contribute to the spaces, but bring important information when plotted in the row/column spaces.

Slide 33

Slide 33 text

Case study and practice

Slide 34

Slide 34 text

Contents
• First, I will show you the examples.
• Second, we will write R Markdown scripts and run them!
  • Warming up
  • Example 1: Housetasks
  • Example 2: Gender role consciousness
  • Example 3: Gender role consciousness (2), SDA

Slide 35

Slide 35 text

Common steps for each example (1)
• Step 1
  • Confirm the data structure.
  • Basic statistical analysis
    • Frequency table
    • Chi-squared test
• Step 2
  • Use the CA/MCA function.
    • For SDA, structured modeling is necessary.
  • Check the fundamental results.
    • Check the inertia (variance).
    • Check the column/variable space and the row/individual space.

Slide 36

Slide 36 text

Common steps for each example (2)
• Step 3: Interpret the axes.
  • Check the contribution of each axis.
  • Name the axes.
  • Interpret the column/variable positions and the row/individual positions.
• Step 4 (if needed)
  • Project the supplementary points.
  • Interpret the space (point positions and axis directions).

Slide 37

Slide 37 text

Warming up
• Start RStudio.
• Prepare the project directory.
• Copy the related files into the project directory.
• Create a new file named start.Rmd.
• Knit it!
• Confirm that an HTML file is generated and opened in your web browser.

Slide 38

Slide 38 text

Example files
• I will show you three examples by executing R in RStudio using .qmd/.rmd files.
• If we have enough time, please execute them step by step with me.
• If there is not enough time, please watch what I do.
• You can reproduce them step by step later with the example files.

Slide 39

Slide 39 text

CA case study: "Housetasks" 家事分担 가사 분담

Slide 40

Slide 40 text

Step 1 Data check

Slide 41

Slide 41 text

Chi-square test

Slide 42

Slide 42 text

row analysis and col analysis

Slide 43

Slide 43 text

Step 2 CA

Slide 44

Slide 44 text

symmetric map

Slide 45

Slide 45 text

Caution on the distance between points
• Distances between row points, and between column points, are defined.
• But the distance between a row point and a column point is not defined.
  • This is very important when you interpret a symmetric map.
• ★ Asymmetric map
  • If one set of points is plotted in standard coordinates, those points form the "box" (the vertices) for the other set: each point of the other set lies at the average of the vertices, weighted by its profile.
  • Direction is the important information.
  • After grasping the directions, the symmetric map is useful for interpreting the structure.

Slide 46

Slide 46 text

asymmetric map

Slide 47

Slide 47 text

Interpreting the symmetric map with the help of the asymmetric map
• To study who mainly does each household task, use the blue points as the "box" for the red points.
• The first quadrant (direction) is mainly the husband; the second quadrant (direction) is mainly the wife.
• Dim1: the right side is the husband, the left side is the wife; the middle position is "Alternating".
• Dim2: the upper part is the wife or husband "alone"; the lower part is "together".

Slide 48

Slide 48 text

MCA case study: gender role consciousness 性別役割意識 성역할 의식

Slide 49

Slide 49 text

MCA Case Studies
• Case study: Q16abc in SSM2005
• "There are opinions A) to C) about the roles of men and women. What do you think? For each of them, please choose the answer closest to your opinion."
  Response codes: 1 = Strongly agree, 2 = Somewhat agree, 3 = Somewhat disagree, 4 = Strongly disagree, 99 = Don't know
  A. Men should work outside the home and women should protect the home.
  B. Boys and girls should be raised differently.
  C. Women are better suited than men for housework and childcare.

Slide 50

Slide 50 text

Original questionnaire

Slide 51

Slide 51 text

Step 1 Data set
The answers to Q16abc (coded 1, 2, 3, 4, 9) are recoded to A, B, C, D, DKNA.

Slide 52

Slide 52 text

First MCA
• Look at the low-frequency DK-NA (don't know / no answer) categories over here: they pull the axes toward themselves.
• These categories should therefore be treated as "junk" categories.

Slide 53

Slide 53 text

For each question, the answer choices are connected by a line (the variables are ordinal).

Slide 54

Slide 54 text

MCA case study (2): SDA (structured data analysis) 構造化データ解析 구조화 데이터 분석

Slide 55

Slide 55 text

MCA Applications: Structured Data Analysis
• Project supplementary variables into the space generated by MCA.
  • Interpret the reference space (objective variables) with supplementary variables (explanatory variables).
  • You could also run an MCA with all the variables together.
    • If so, how would you interpret the axes generated?
• Interpreting the axes for gender role attitudes (Q16ABC)
  • If you add `age` and `gender` to the space as supplementary variables, can you interpret the axes?
• The generated coordinate axes are new variables.
  • Multidimensional variables
  • The effect of the supplementary variables (categories) on them is analyzed.
    • Analysis of variance
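One common way to carry out the "analysis of variance" step is the correlation ratio η²: the share of an axis's variance explained by the categories of a supplementary variable. The base-R sketch below uses invented Dim1 coordinates for six individuals and a hypothetical `gender` variable; with real data, the coordinates would come from the MCA results.

```r
# Hypothetical Dim1 coordinates of six individuals from an MCA (they average to zero)
dim1 <- c(-0.9, -0.5, -0.2, 0.1, 0.6, 0.9)
# Hypothetical supplementary variable: not used to build the space
gender <- factor(c("F", "F", "F", "M", "M", "M"))

# Mean point of each supplementary category on Dim1
category_means <- tapply(dim1, gender, mean)

# Correlation ratio eta^2 = between-category sum of squares / total sum of squares
grand_mean <- mean(dim1)
ss_total   <- sum((dim1 - grand_mean)^2)
ss_between <- sum(as.vector(table(gender)) * (category_means - grand_mean)^2)
eta2       <- ss_between / ss_total

category_means
eta2
```

η² is the same quantity as the R² of a one-way ANOVA of the axis coordinates on the supplementary variable.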

Slide 56

Slide 56 text

Plot "Age 10" as a supplementary variable
Gender and age as an interaction plot

Slide 57

Slide 57 text

CA case study: Occupation and Leisure Spending 職種と余暇の過ごし方 직업과 여가생활

Slide 58

Slide 58 text

CA Case Study
• Case study: "Occupation and Leisure Spending"
• Data preparation
• Basic analysis of the data
  • Simple aggregation
  • Cross table
  • Chi-square test
    • Chi-square value and p-value
  • Grasping the row and column analysis with a mosaic plot
• Relationships among variable categories by CA
  • Variable-to-variable relationships

Slide 59

Slide 59 text

Data
• Input cross table
• Table in Chapter 1 of "An Introduction to Correspondence Analysis"

Slide 60

Slide 60 text

Since it is a cross table: the chi-square test
• The p-value is almost zero, so the null hypothesis (occupation and leisure spending are unrelated) is rejected.
• However, the test does not tell us what kind of relationship exists.
• We need to do row and column analysis. (Do this before the chi-square test!)
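A base-R sketch of this step with a hypothetical cross table (invented counts, not the table from the book). `chisq.test()` gives the chi-square value and p-value; its standardized residuals (`stdres`) give a first hint of which cells depart from independence, which is where row/column analysis and CA take over.

```r
# Hypothetical occupation x leisure cross table (invented counts)
tab <- matrix(c(30, 10,  5,
                10, 25, 10,
                 5, 10, 35),
              nrow = 3, byrow = TRUE,
              dimnames = list(occupation = c("A", "B", "C"),
                              leisure    = c("X", "Y", "Z")))

ct <- chisq.test(tab)
ct$statistic      # chi-square value
ct$p.value        # far below 0.05: reject "occupation and leisure are unrelated"

# The test alone does not say HOW rows and columns are related; the standardized
# residuals show which cells are above (+) or below (-) what independence predicts
round(ct$stdres, 2)
```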

Slide 61

Slide 61 text

row analysis

Slide 62

Slide 62 text

column analysis

Slide 63

Slide 63 text

Symmetric map

Slide 64

Slide 64 text

asymmetric map

Slide 65

Slide 65 text

Row and column contributions
• The total variance (inertia) of the data is broken down into axes.
• Each axis is composed of the contributions of the individual points.
• The contribution ratio shows how much each point contributes to the inertia of an axis.
• On Axis 1, "Church services" contributes 35% among the rows, and "Retirees" contributes 71% among the columns.
• This is the basis for interpreting and naming the axes.
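Contributions follow directly from the principal coordinates: for a row point, mass × squared coordinate ÷ eigenvalue of the axis (and likewise for columns). This base-R sketch uses a hypothetical 3 × 3 table rather than the occupation and leisure data; each axis's contributions sum to 100%.

```r
# Hypothetical 3 x 3 contingency table (counts)
N <- matrix(c(20, 10,  5,
              10, 20, 10,
               5, 10, 30),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("r1", "r2", "r3"), c("c1", "c2", "c3")))

P  <- N / sum(N)
r  <- rowSums(P)
cm <- colSums(P)
S  <- diag(1 / sqrt(r)) %*% (P - outer(r, cm)) %*% diag(1 / sqrt(cm))
sv <- svd(S)
k  <- 2
lambda <- sv$d[1:k]^2
F_row <- diag(1 / sqrt(r))  %*% sv$u[, 1:k] %*% diag(sv$d[1:k], nrow = k)
G_col <- diag(1 / sqrt(cm)) %*% sv$v[, 1:k] %*% diag(sv$d[1:k], nrow = k)

# Contribution of a point to an axis: mass * squared principal coordinate / eigenvalue
row_contrib <- sweep(r  * F_row^2, 2, lambda, "/")
col_contrib <- sweep(cm * G_col^2, 2, lambda, "/")
dimnames(row_contrib) <- list(rownames(N), paste0("Dim", 1:k))
dimnames(col_contrib) <- list(colnames(N), paste0("Dim", 1:k))

round(100 * row_contrib, 1)   # percentages; each column sums to 100
round(100 * col_contrib, 1)
```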

Slide 66

Slide 66 text

Interpretation of axes
• Dim1
  • Axis corresponding to age
  • Right: older; left: younger
• Dim2
  • Behavioral pattern
  • Top: active; bottom: quiet

Slide 67

Slide 67 text

For further study of CA/MCA

Slide 68

Slide 68 text

CJK display problem
• ggplot and vcd::mosaic (both use grid graphics) need the showtext package to display CJK (Chinese, Japanese, Korean) characters.
• Install showtext from CRAN and load it at the top of the script.
• In each graph chunk, you have to set the chunk option. Be careful: dot versus dash!
  • ```{r fig.showtext=TRUE} or
  • ```{r} followed by the line #| fig-showtext: TRUE

Slide 69

Slide 69 text

Related tools for performing CA/MCA in R
• Main functions for CA/MCA
  • FactoMineR
  • GDAtools
• Helper functions for CA/MCA
  • factoextra
  • explor
  • Factoshiny
• Important packages for EDA
  • vcd/vcdExtra

Slide 70

Slide 70 text

CA/MCA helper packages
• factoextra
  • Graphical display of the results of FactoMineR's CA, MCA, PCA, and other functions
  • http://www.sthda.com/english/wiki/factoextra-r-package-easy-multivariate-data-analyses-and-elegant-visualization
• Factoshiny
  • Shiny-based tools for FactoMineR results
• explor (not "explore")
  • A Shiny tool to dynamically display the results of various CA/MCA functions
  • https://juba.github.io/explor/
[Figure: interpreting results]

Slide 71

Slide 71 text

vcd (Visualizing Categorical Data)
• mosaic() is included in this package; it is different from graphics::mosaicplot().
• Mosaic plot
  • https://cran.r-project.org/web/packages/vcdExtra/vignettes/mosaics.html
• Tutorial
  • Working with categorical data with R and the vcd and vcdExtra packages
  • https://www.datavis.ca/courses/VCD/vcd-tutorial.pdf
• Textbook for categorical data analysis
  • http://ddar.datavis.ca/

Slide 72

Slide 72 text

CARME network
• Correspondence Analysis and Related Methods network
  • http://www.carme-n.org/
• YouTube
  • https://www.youtube.com/@CARMEnetwork

Slide 73

Slide 73 text

Thank you for your attention
[email protected]
https://419kfj.sakura.ne.jp/db/

Slide 74

Slide 74 text

Appendix