Methods and Examples of Correspondence Analysis

Slides for the workshop of the same title, held at Chiba University on July 17, 2024. ver0.96

419kfj

July 17, 2024

Transcript

  1. Methods and Examples of Correspondence Analysis

    July 17, 2024 @Chiba University • Institute for Mathematics and Computer Science, Tsuda University • Project Manager: Kazuo Fujimoto
  2. Acknowledgement • This work was supported by JSPS KAKENHI Grant

    Number JP20K02162, JP23K22184. 2024/07/17 Response Analysis Methodology and Examples 2
  3. Revision log

    • ver0.96 2024/07/17 05:30 Changed "Tsuda College" to "Tsuda University"; added the Acknowledgement page. • ver0.95 2024/07/16 22:30 Translated the second CA example into English. • ver0.9 2024/07/16 Edited from the presentation given at the first meeting of the "Beyond Governance-Type Ethics" workshop (Ver1.0, 2024/06/29).
  4. Related files

    • Download "CA_workshop_qmd_files.zip" from http://133.167.73.14/~kazuo/CA_workshop/CA_workshop_qmd_files.zip and extract it in a project directory. • It contains 2 qmd files and 1 rda file.
  5. Profile of Lecturer

    • Born in Tokyo, 1955 • 1978/3 Graduated from Sophia University, Faculty of Science and Engineering, Department of Electrical and Electronic Engineering • Worked for a company as an engineer and marketing staff member until 2002/3. • Studied sociology at Tokyo Metropolitan University Graduate School from 1990 to 1992. • 2002/4-2020/3 Worked at Sakushin Gakuin University. • Sociology • Communication theory • IT Security Issues/Information Systems Theory • Social Research Practice • 2020/3 Retired. Professor Emeritus. • After retirement from Sakushin Gakuin University: • Since April 2021, Project Researcher at the Institute for Mathematics and Computer Science, Tsuda University. • Since September 2021, Invited Expert at the National Institute of Information and Communications Technology (NICT).
  6. Research Themes

    • Categorical Data Analysis in Social Research • Correspondence Analysis Related Translations • 2015, "Introduction to Correspondence Analysis," Ohmsha, Inc. • 2020, "Theory and Practice of Correspondence Analysis," Ohmsha, Inc. • Current Research Topics • "A Study of Categorical Data Analysis Focusing on the Geometrical Structure of Data." • KAKEN DB https://bit.ly/3sOg8JI
  7. Today's Program

    • 09:30 〜 • Keywords to understand Correspondence Analysis (CA) • Principle: mathematical tool for dimension reduction • Case Studies • CA Case Study • MCA Case Studies • MCA Applications • Structured Data Analysis • 11:00 〜 • CA/MCA in R: Practice • 12:00 〜 • Q&A and Appendix
  8. What is Categorical data? How do you process categorical data?

  9. What do you do when you face categorical data?

    • Classification of research data • Quantitative variables • numerical • Categorical/qualitative variables • Nominal variables • Ordinal variables • "Scaling" (the level of measurement) is a very important attribute of data. • ref: "measurement scale" https://www.britannica.com/topic/measurement-scale
  10. Ordinal variables: Scaling

    [Figure: two five-point questions, Question A and Question B, each with answers encoded as 5 4 3 2 1] Does answer 4 to Question A have the same "strength" as answer 4 to Question B?
  11. Your action?

    • To store data in a computer, responses are often coded numerically: 1, 2, 3, 4, 5. • Do you calculate the mean or variance even though the number is only a name? • In Excel, anything that looks like a number can be calculated as a number, even if it is a nominal variable. • With ordinal variables, such processing seems even more reasonable. • But when responses are coded on a five-point scale (5, 4, 3, 2, 1), is 4 double 2? Is there an equal difference of 1 between adjacent responses? To begin with, is 5 to 1 linear?
  12. Integer Scale or Likert Scale

    • To apply statistical methods such as the mean and variance, categorical responses must be converted to numeric values. • Integer scaling or Likert scaling is certainly one such method. • But is it adequate for investigating the data (and response behavior)? • CA and MCA provide a quantification method that preserves the data structure.
  13. What is CA/MCA ? Quantify by focusing on response patterns.

  14. Relationship between CA and MCA

    • Correspondence Analysis 対応分析/대응 분석 • "CA" for short. • Some call it "Korepon" (in Japanese). • Multiple Correspondence Analysis 多重対応分析/다중 대응 분석 • "MCA" for short. • In contrast, CA is sometimes called Simple CA.
  15. Data format

    [Figure: data format for CA (cross table of Variable 1 by Variable 2) and for MCA (individuals by multiple variables, with each cell holding an answer)]
  16. Both formats are the same: two-dimensional tables.

    Expand each "variable" column into one column per category of that variable (indicator matrix). The "individuals x variables" table is converted to an "individuals x categories" table.
  17. CA and MCA share the same mechanism

    • MCA is performed by applying CA to the indicator matrix, which is converted from the individuals x variables matrix. • In CA you will see a "row space" and a "column space"; they correspond to the "individual space" and the "variable space" of MCA.
  18. Key words for CA/MCA

    • Data: object of analysis • Cross table • CA: 2 variables, MCA: multivariate • In fact, the MCA input is also a table of 2 "variables": • row: individual x column: response category (expanded from the response variables). • Related methods • Quantification of categorical data • Chikio Hayashi, Quantification Method Type III • Shizuhiko Nishisato, "Necessity of Quantification," Kwansei Gakuin University Booklet
  19. Sociology and MCA

    • Famous application of MCA • The sociology of relationships, as opposed to the sociology of variables • Adapted from the preface to the German edition of "Le métier de sociologue". • "Sociology of variables" refers to regression analysis. • Bourdieu stated: "I use Correspondence Analysis very much, because I think that it is essentially a relational procedure whose philosophy fully expresses what in my view constitutes social reality. It is a procedure that 'thinks' in relations, as I try to do with the concept of field". (Preface to the German edition of Le métier de sociologue, 1991; quoted in Lebaron 2009: 13) • Lebaron, Frédéric, 2009, "How Bourdieu 'Quantified' Bourdieu: The Geometric Modelling of Data", in Karen Robson and Chris Sanders (eds.), Quantifying Theory: Pierre Bourdieu, Springer.
  20. Summary before the CA case study

    • Format of input data • CA and MCA • Results generated by CA/MCA • Two spaces • coordinate axes • coordinates • Two kinds of coordinates • principal coordinates • standard coordinates • Interpretation of the points from a geometric point of view • The origin of the graph is the overall average. • Similar points are placed close together. • Different points are placed far from each other. • What you are looking at is the row/column profiles.
  21. Input data format

    • CA takes a cross table as input. • Two variables, two-dimensional table • MCA takes a table of rows: individuals, columns: variables, such as survey data. • As a process, however, the variable columns are expanded into variable category (choice) columns, and CA is performed on the resulting indicator matrix, which has 1 in each selected cell. (This is one of the classical forms of MCA.)
  22. Both formats are the same: two-dimensional tables.

    Expand each "variable" column into one column per category of that variable (indicator matrix). The "individuals x variables" table is converted to an "individuals x categories" table.
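The expansion into an indicator matrix can be sketched in a few lines. The workshop itself uses R; this is a standard-library Python illustration only, and the survey rows, the variable names `Q_A`/`Q_B`, and the helper `indicator_matrix` are all hypothetical.

```python
# Hypothetical survey: 4 individuals answer 2 questions (categories are labels, not numbers).
responses = [
    {"Q_A": "agree", "Q_B": "disagree"},
    {"Q_A": "disagree", "Q_B": "disagree"},
    {"Q_A": "agree", "Q_B": "agree"},
    {"Q_A": "dont_know", "Q_B": "agree"},
]


def indicator_matrix(rows):
    """Expand an individuals x variables table into individuals x categories (0/1)."""
    variables = list(rows[0].keys())
    # One column per (variable, category) pair, categories in sorted order.
    columns = [
        (var, cat)
        for var in variables
        for cat in sorted({row[var] for row in rows})
    ]
    matrix = [
        [1 if row[var] == cat else 0 for (var, cat) in columns]
        for row in rows
    ]
    return columns, matrix


columns, Z = indicator_matrix(responses)
for (var, cat), col in zip(columns, zip(*Z)):
    print(f"{var}:{cat:<10}", list(col))
```

Every row of `Z` sums to the number of variables, because each individual selects exactly one category per variable.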
  23. CA/MCA results

    • CA/MCA compares row and column profiles. • Profiles: row proportions and column proportions • A row profile describes how that row is distributed over the columns. • The distances between row/column profile vectors are calculated as chi-square distances. • The origin is the overall average point (expected value). • Similar profiles are located close together and different profiles far apart. • Mathematically, the matrix of residuals of the input matrix from its expected values is decomposed by singular value decomposition (SVD). • This yields a matrix for the row coordinates, a diagonal matrix for the variance of the coordinate axes, and a matrix for the column coordinates.
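As a rough illustration of profiles and the chi-square distance, the following standard-library Python sketch uses a made-up 3 x 3 table (not the workshop data); the function name `chi2_distance` is an assumption for this example.

```python
import math

# Hypothetical 3 x 3 cross table of counts (not taken from the workshop files).
table = [
    [20, 5, 2],
    [6, 15, 6],
    [2, 5, 20],
]

n = sum(sum(row) for row in table)
row_sums = [sum(row) for row in table]
col_sums = [sum(col) for col in zip(*table)]

# Row profiles: each row divided by its own total, so each profile sums to 1.
row_profiles = [[x / rs for x in row] for row, rs in zip(table, row_sums)]

# Average row profile (the column masses); this is the origin of the map.
avg_profile = [cs / n for cs in col_sums]


def chi2_distance(p, q, masses):
    """Chi-square distance between two profiles: column j is weighted by 1 / mass_j."""
    return math.sqrt(sum((a - b) ** 2 / m for a, b, m in zip(p, q, masses)))


d01 = chi2_distance(row_profiles[0], row_profiles[1], avg_profile)
d02 = chi2_distance(row_profiles[0], row_profiles[2], avg_profile)
print(f"distance row1-row2: {d01:.3f}")
print(f"distance row1-row3: {d02:.3f}")
```

Rows 1 and 3 have nearly opposite profiles, so their distance is larger than the distance between rows 1 and 2.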
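  24. SVD of the residual matrix

    • The SVD yields three matrices: U, D_α, V • $S = U D_\alpha V^{\mathsf T}$ • U relates to the row coordinates • V relates to the column coordinates • α is a singular value (square root of an eigenvalue) • Standardization: $S = D_r^{-1/2} (P - r c^{\mathsf T}) D_c^{-1/2}$, where $r c^{\mathsf T}$ is the expected matrix, $P - r c^{\mathsf T}$ holds the residuals, $D_r^{-1/2}$ is the diagonal matrix of inverse square roots of the row margins, and $D_c^{-1/2}$ is the diagonal matrix of inverse square roots of the column margins.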
  25. Core calculation of CA/MCA

    • $P = M/n$ (M: input table, n: grand total) • P → S (residual matrix), standardized with $D_r^{-1/2}$ and $D_c^{-1/2}$, the diagonal matrices of inverse square roots of the row and column margins • SVD: $S = U D_\alpha V^{\mathsf T}$ • Row standard coordinates: $\Phi = D_r^{-1/2} U$ • Column standard coordinates: $\Gamma = D_c^{-1/2} V$ • Row principal coordinates: $F = \Phi D_\alpha$ • Column principal coordinates: $G = \Gamma D_\alpha$ • Dimension reduction is performed by selecting which α (singular values) to keep.
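The chain P → S → SVD → coordinates can be followed numerically. The sketch below is a standard-library Python illustration, not the workshop's R code: it assumes the usual standardized residual matrix S = D_r^{-1/2}(P - r c^T)D_c^{-1/2}, uses a made-up 3 x 3 table, and replaces a full SVD with power iteration, so it recovers only the first dimension. The helper name `leading_singular_triplet` is hypothetical.

```python
import math

# Hypothetical 3 x 3 cross table M of counts.
M = [
    [20, 5, 2],
    [6, 15, 6],
    [2, 5, 20],
]

n = sum(map(sum, M))
P = [[x / n for x in row] for row in M]      # correspondence matrix P = M / n
r = [sum(row) for row in P]                  # row masses (row margins of P)
c = [sum(col) for col in zip(*P)]            # column masses (column margins of P)

# Standardized residuals: S = Dr^{-1/2} (P - r c^T) Dc^{-1/2}
S = [
    [(P[i][j] - r[i] * c[j]) / math.sqrt(r[i] * c[j]) for j in range(len(c))]
    for i in range(len(r))
]


def leading_singular_triplet(S, iterations=500):
    """Power iteration on S^T S: returns (u, alpha, v) for the largest singular value."""
    rows, cols = len(S), len(S[0])
    v = [1.0] + [0.0] * (cols - 1)           # deterministic start vector
    for _ in range(iterations):
        u = [sum(S[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # S v
        w = [sum(S[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # S^T (S v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    u = [sum(S[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    alpha = math.sqrt(sum(x * x for x in u))
    u = [x / alpha for x in u]
    return u, alpha, v


u, alpha, v = leading_singular_triplet(S)

phi = [u[i] / math.sqrt(r[i]) for i in range(len(r))]      # row standard coordinates
gamma = [v[j] / math.sqrt(c[j]) for j in range(len(c))]    # column standard coordinates
F = [x * alpha for x in phi]                               # row principal coordinates
G = [x * alpha for x in gamma]                             # column principal coordinates

total_inertia = sum(x * x for row in S for x in row)
print(f"inertia of Dim1: {alpha ** 2:.4f} of total {total_inertia:.4f}")
print("row principal coordinates on Dim1:", [round(x, 4) for x in F])
print("column principal coordinates on Dim1:", [round(x, 4) for x in G])
```

The sign of the axis depends on the start vector and carries no meaning. In practice a full SVD routine returns all dimensions at once; power iteration is used here only to keep the sketch dependency-free.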
  26. CA input table and two profiles

    [Figure: the m x n input table with row sums and column sums, shown next to the row profile matrix R (each row sums to 1, with the average row profile) and the column profile matrix C (each column sums to 1, with the average column profile)] • Viewed from the rows: row analysis • Viewed from the columns: column analysis
  27. CA generates a row space and a column space

    [Figure: m x n input table with row sums and column sums, mapped to a row space and a column space] • Generating a space means generating axes: dim1 … dimn. • Each dimension has an inertia (variance), decomposed from the total variance of the input table. • The inertias of the dimensions are the same in the row space and the column space.
  28. CA's three results

    [Figure: input table with row sums and column sums, and the three result tables: row coordinates F (rows x dimensions), column coordinates G (columns x dimensions), eigenvalues λ (one per dimension)] • Applying SVD to the residual matrix yields three matrices.
  29. Row and column relationships

    • Row coordinates F (after CA) and row profiles R (before CA), and column coordinates G (after CA) and column profiles C (before CA), are linked to each other through the inertia (variance, eigenvalue, or squared singular value) of the axes generated by CA (transition formula). • This relationship is what makes additional (supplementary) variables and categories possible: they do not contribute to the creation of the space but still acquire coordinates. • This is the basic feature of Structured Data Analysis (SDA).
  30. Transition Formula

    [Figure: example cross table (rows: Oslo center, northern part; columns: robbery, fraud, destruction) with its row coordinates F, column coordinates G, and eigenvalues λ on Dim1 and Dim2] • $F = R G D_\lambda^{-1/2}$ • The column coordinates are obtained in the same form from the transposed table: $G = C F D_\lambda^{-1/2}$.
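Why the transition formula holds can be worked out from the SVD. The derivation below is a sketch that assumes the standard definitions $S = D_r^{-1/2}(P - r c^{\mathsf T})D_c^{-1/2} = U D_\alpha V^{\mathsf T}$, row profiles $R = D_r^{-1}P$, $\Gamma = D_c^{-1/2}V$, $G = \Gamma D_\alpha$, and $\lambda = \alpha^2$, restricted to dimensions with $\alpha > 0$.

```latex
\begin{aligned}
F &= \Phi D_\alpha = D_r^{-1/2} U D_\alpha = D_r^{-1/2} S V
   && \text{since } S V = U D_\alpha \\
  &= D_r^{-1} (P - r c^{\mathsf T}) D_c^{-1/2} V
   = D_r^{-1} P\, \Gamma - \mathbf{1}\, c^{\mathsf T} \Gamma
   && \text{since } D_r^{-1} r = \mathbf{1} \\
  &= R\, \Gamma
   && \text{since } c^{\mathsf T} \Gamma = 0 \\
  &= R\, G D_\alpha^{-1} = R\, G D_\lambda^{-1/2}
   && \text{since } G = \Gamma D_\alpha,\ \lambda = \alpha^2
\end{aligned}
```

The step $c^{\mathsf T}\Gamma = 0$ follows because $S D_c^{1/2}\mathbf{1} = D_r^{-1/2}(r - r) = 0$, so $D_c^{1/2}\mathbf{1}$ is orthogonal to every right singular vector with a positive singular value. Transposing the table gives $G = C F D_\lambda^{-1/2}$ in the same way.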
  31. Projecting the supplementary variables into the generated spaces

    [Figure: input table with active columns col1 … coln and supplementary columns] • Active variables generate the row/column spaces. • Supplementary variables are projected into them. • Active variables play the role of the objective (dependent) "variable"; supplementary variables play the role of the explanatory (independent) "variables". • The new axes are the "new variables".
  32. Structured Data Analysis

    • Structured modeling • Define active variables and supplementary variables. • Active variables • contribute to generating the row/column spaces. • Supplementary variables • do not contribute to the spaces but bring important information when plotted in the row/column spaces.
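How a supplementary column acquires a coordinate without changing the axes can be sketched with the transition formula: its principal coordinate is the average of the row standard coordinates, weighted by the supplementary column's own profile. This is a standard-library Python illustration; the coordinate values, the counts, and the helper `project_supplementary_column` are all hypothetical.

```python
# Row standard coordinates on Dim1 from an earlier CA (illustrative values, assumed).
phi_dim1 = {"row1": 1.2247, "row2": 0.0, "row3": -1.2247}

# Counts of a supplementary column over the same rows (hypothetical).
sup_counts = {"row1": 12, "row2": 9, "row3": 3}


def project_supplementary_column(counts, row_standard):
    """Principal coordinate of a supplementary column: the barycenter of the
    row standard coordinates, weighted by the column's profile over the rows."""
    total = sum(counts.values())
    return sum(counts[key] / total * row_standard[key] for key in counts)


g_sup = project_supplementary_column(sup_counts, phi_dim1)
print(f"supplementary column on Dim1: {g_sup:.4f}")
```

Because the result is a weighted average, the supplementary point always falls between the extreme row standard coordinates, and the active table, its masses, and its axes are left untouched.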
  33. Contents

    • First I will show you the examples. • Second, write R Markdown scripts and run them! • Warming up • Example 1: Housetasks • Example 2: Gender role consciousness • Example 3: Gender role consciousness (2), SDA
  34. Common steps of each example (1)

    • Step 1 • Confirm the data structure • Basic statistical analysis • Frequency table • Chi-squared test • Step 2 • Use the CA/MCA function • For SDA, structured modeling is necessary. • Check the fundamental results • Check the inertia (variance) • Check the column/variable space and the row/individual space
  35. Common steps of each example (2)

    • Step 3: interpret the axes • Check the contributions to each axis. • Name the axis. • Interpret the column/variable positions and the row/individual positions. • Step 4 (if needed) • Project the supplementary points. • Interpret the space (point positions and axis directions).
  36. Warming up

    • Start RStudio. • Prepare the project directory. • Copy the related files to the project directory. • Create a new file named start.Rmd. • Knit it! • Confirm that an HTML file is generated and opened in your web browser.
  37. Example files

    • I will show you 3 examples by executing R in RStudio using .qmd/.rmd files. • If we have enough time, please execute them step by step with me. • If there is not enough time, please watch what I do. • You can reproduce each step later with the example files.
  38. Caution on the distance between points

    • Distances between row points and distances between column points are defined. • But the distance between a row point and a column point is not defined. • This is very important when you interpret a symmetric map. • ★Asymmetric map • If one variable uses standard coordinates, its points form the frame (box) against which the points of the other variable are read. • Direction is the important information. • Once you have grasped the directions, the symmetric map is useful for interpreting the structure.
  39. Interpret the symmetric map over the asymmetric map

    If you study who is the main actor for each house task, treat the blue points as the frame (box) for the red points. The first quadrant (direction) is mainly the husband; the second quadrant (direction) is mainly the wife. Dim1: right is the husband, left is the wife, and the middle position is alternating. Dim2: the upper part is wife or husband "alone", the lower part is "together".
  40. MCA Case Studies

    • Case Study: Q16abc at SSM2005 • "There are opinions A) to C) about the roles of men and women. What do you think? For each of them, please choose the answer closest to your opinion." • Answer codes: 1 = strongly agree, 2 = somewhat agree, 3 = somewhat disagree, 4 = strongly disagree, 99 = don't know • A: Men should work outside the home and women should protect the home. • B: Boys and girls should be raised differently. • C: Women are better suited than men for housework and childcare.
  41. First MCA

    What I'd like you to see is the low-frequency DK/NA category over here. The map has been pulled by it. This category is to be treated as junk.
  42. For each question, the answer choices are connected by a

    LINE (ordinal variable).
  43. MCA case study (2): SDA, structured data analysis 構造化データ解析/구조화 데이터 분석
  44. MCA Applications: Structured Data Analysis

    • Project supplementary variables into the space generated by MCA. • Interpret the reference space (objective variables) with additional variables (explanatory variables). • You could also run an MCA with all variables pulled together. • If so, how would you interpret the generated axes? • Interpretation of the axes of gender role attitudes (Q16ABC) • If you add `age` and `gender` to the construction of the axes, is it still possible to interpret them? • The generated coordinate axes are the new variables. • Multi-dimensional variables • The effect of the additional variables (categories) on them is then analyzed. • Analysis of variance
  45. Plot "Age 10" as an additional variable

    Gender and age as an interaction plot
  46. CA Case Study

    • Case Study: "Occupation and Leisure Spending" • Data preparation • Basic analysis of data • simple tabulation • cross table • chi-square test • chi-square value and p-value • Grasping the row and column analysis with a mosaic plot • Relationships among variable categories by CA • Variable-to-variable relationships
  47. data • Cross Table of Inputs • An Introduction to

    Correspondence Analysis. Table in Chapter 1 of the
  48. Since it is a cross table, run the chi-square test

    • The p-value is almost zero, so the null hypothesis (job type and leisure time are unrelated) is rejected. • However, we still do not know what kind of relationship exists. • We need to do row and column analysis. (Do this before the chi-square test!)
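The link between the chi-square test and CA can be made concrete: the total inertia that CA decomposes into axes equals the chi-square statistic divided by the grand total n. The following standard-library Python sketch uses a made-up table, not the occupation and leisure data, and stops at the statistic because the standard library has no chi-square distribution function for the p-value.

```python
# Hypothetical 3 x 3 cross table of counts (not the case-study data).
table = [
    [20, 5, 2],
    [6, 15, 6],
    [2, 5, 20],
]

n = sum(sum(row) for row in table)
row_sums = [sum(row) for row in table]
col_sums = [sum(col) for col in zip(*table)]

# Expected counts under independence: (row total * column total) / n.
expected = [[rs * cs / n for cs in col_sums] for rs in row_sums]

chi2 = sum(
    (obs - exp) ** 2 / exp
    for obs_row, exp_row in zip(table, expected)
    for obs, exp in zip(obs_row, exp_row)
)
dof = (len(row_sums) - 1) * (len(col_sums) - 1)
total_inertia = chi2 / n

# The same quantity from the geometric side: the mass-weighted sum of squared
# chi-square distances between each row profile and the average profile.
avg_profile = [cs / n for cs in col_sums]
inertia_from_profiles = sum(
    (rs / n) * sum((x / rs - a) ** 2 / a for x, a in zip(row, avg_profile))
    for row, rs in zip(table, row_sums)
)

print(f"chi-square = {chi2:.3f} with {dof} degrees of freedom")
print(f"total inertia = chi-square / n = {total_inertia:.4f}")
print(f"total inertia from profiles   = {inertia_from_profiles:.4f}")
```

The test only says that the rows and columns are related; the decomposition of this same inertia into axes is what shows how they are related.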
  49. Row and column contributions

    The variance of the data as a whole is broken down into axes. Each axis is composed of the points. The contribution shows how much each point contributes to the variance of each axis. On Axis 1, "Church services" contributes 35% among the rows, and "Retirees" contributes 71% among the columns. This is the basis for interpreting and naming the axes.
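A contribution can be computed as mass times squared principal coordinate, divided by the inertia of the axis. The standard-library Python sketch below uses invented masses and coordinates for four points labeled A to D; they are not the values behind the 35% and 71% figures.

```python
# Hypothetical masses and Dim1 principal coordinates of four row points.
# The mass-weighted mean of the coordinates is zero, as it is in CA.
masses = {"A": 0.40, "B": 0.30, "C": 0.20, "D": 0.10}
coords = {"A": 0.10, "B": -0.20, "C": -0.35, "D": 0.90}

# Inertia (eigenvalue) of the axis: mass-weighted sum of squared coordinates.
axis_inertia = sum(masses[k] * coords[k] ** 2 for k in masses)

# Contribution of each point to the axis; the contributions sum to 1.
contrib = {k: masses[k] * coords[k] ** 2 / axis_inertia for k in masses}

for key, value in sorted(contrib.items(), key=lambda kv: -kv[1]):
    print(f"{key}: {value:.1%}")
```

In this made-up example point D has the smallest mass but lies far from the origin, so it dominates the axis; such points are the first candidates when naming an axis.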
  50. Interpretation of axes

    • Dim1 • Axis corresponding to age • Right: older, left: younger • Dim2 • Behavioral pattern • Top: active • Bottom: quiet
  51. CJK display problem

    • ggplot and vcd::mosaic (both are grid graphics) need the showtext package to display CJK (Chinese, Japanese, Korean) characters. • You can install showtext from CRAN and load it at the top of the script. • You also have to set the chunk option in each graph chunk. Be careful about the separator: a dot in the chunk header, a dash in the `#|` form. • ```{r fig.showtext=TRUE} or • ```{r} #| fig-showtext: TRUE
  52. Related tools for performing CA/MCA in R

    • Main packages for CA/MCA • FactoMineR • GDAtools • Helper packages for CA/MCA • factoextra • explor • Factoshiny • Important package for EDA • vcd/vcdExtra
  53. CA/MCA helper packages

    • factoextra • Graphical display of the results of FactoMineR functions (CA, MCA, PCA, and others). • http://www.sthda.com/english/wiki/factoextra-r-package-easy-multivariate-data-analyses-and-elegant-visualization • Factoshiny • Shiny tools for FactoMineR results • explor (not "explore") • Tool to dynamically display the results of various CA/MCA functions (Shiny) • https://juba.github.io/explor/
  54. vcd (Visualizing Categorical Data)

    • mosaic is included in this package. This mosaic is different from graphics::mosaicplot. • Mosaic plot • https://cran.r-project.org/web/packages/vcdExtra/vignettes/mosaics.html • Tutorial • Working with categorical data with R and the vcd and vcdExtra packages • https://www.datavis.ca/courses/VCD/vcd-tutorial.pdf • Textbook for categorical data analysis • http://ddar.datavis.ca/
  55. CARME.network

    • Correspondence Analysis and Related Methods network • http://www.carme-n.org/ • YouTube • https://www.youtube.com/@CARMEnetwork