A missing values tour with principal component methods

Julie Josse
October 31, 2015

How to perform principal component methods despite missing values, and how these methods can help to handle missing values...

Transcript

  1. Introduction Point estimates Confidence Areas MCA/MFA SI for mixed var. Multiple imputation Practice Appendix
    Missing values and principal component methods
    Julie Josse
    Stanford Stat 300, July 2015
    1 / 92

  2. Outline
    1 Introduction
    2 Point estimates of the PCA axes and components
    3 Uncertainty
    4 MCA/MFA
    5 Single imputation for mixed variables
    6 Multiple imputation
    7 Practice
    8 Appendix

  3. Missing values
    “The best thing to do with missing values is not to have any” (Gertrude Mary Cox)
    Missing values are ubiquitous:
    • no answer in a questionnaire
    • data that are lost or destroyed
    • machines that fail
    • damaged plants
    • ...
    Still an issue in the "big data" era

  4. Some references
    • Schafer (1997); Little & Rubin (1987, 2002) (Joseph L. Schafer, Roderick Little, Donald Rubin)
    • Suggested reading: chapter 25 of Gelman & Hill (2006) (Andrew Gelman, Jennifer L. Hill)

  7. The problem of missing values
    A very simple approach: deletion (the default of the lm function in R)
    Dealing with missing values depends on:
    • the pattern of missing values
    • the mechanism leading to missing values
      • MCAR (missing completely at random): the probability of being missing does not depend on any values
      • MAR (missing at random): the probability may depend on the values of other variables
      • MNAR (missing not at random): the probability depends on the missing value itself
      (Example: income and age)
    ⇒ Inspect and visualize the missing data

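The three mechanisms can be made concrete with a small simulation. The following Python sketch (standard library only) is an illustration under assumed settings, not an example from the slides: standardized age and income with correlation 0.5, only income missing, made-up missingness probabilities, and a hypothetical helper `complete_case_mean`. It compares the mean income on the full data with the mean obtained after deleting the incomplete rows.

```python
import random
import statistics

def complete_case_mean(mechanism, n=20000, seed=1):
    """Simulate (age, income); return (full-data mean, complete-case mean) of income.

    Hypothetical setup: standardized age and income with correlation 0.5;
    only income can be missing, with a probability set by `mechanism`.
    """
    rng = random.Random(seed)
    all_incomes, kept = [], []
    for _ in range(n):
        age = rng.gauss(0, 1)
        income = 0.5 * age + 0.75 ** 0.5 * rng.gauss(0, 1)
        if mechanism == "MCAR":    # depends on no value at all
            p_missing = 0.3
        elif mechanism == "MAR":   # depends only on the observed age
            p_missing = 0.8 if age > 0 else 0.1
        elif mechanism == "MNAR":  # depends on the unobserved income itself
            p_missing = 0.8 if income > 0 else 0.1
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        all_incomes.append(income)
        if rng.random() >= p_missing:  # deletion: keep complete cases only
            kept.append(income)
    return statistics.fmean(all_incomes), statistics.fmean(kept)

for mech in ("MCAR", "MAR", "MNAR"):
    full, cc = complete_case_mean(mech)
    print(f"{mech}: full data {full:+.3f}, after deletion {cc:+.3f}")
```

Under MCAR the mean after deletion stays close to the full-data mean; under the MAR and MNAR settings chosen here it is pulled downwards, because rows with a high age (respectively a high income) are deleted more often.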
  12. Single imputation methods
    [Figure: three scatter plots of Y against X: mean imputation, regression imputation, stochastic regression imputation]
                                        µy     σy     ρ      coverage of the 95% CI for µy
    (true value)                        0      1      0.6    95%
    Mean imputation                     0.01   0.5    0.30   39.4%
    Regression imputation               0.01   0.72   0.78   61.6%
    Stochastic regression imputation    0.01   0.99   0.59   70.8%
    ⇒ Standard errors of the parameters ($\hat{\sigma}_{\hat{\mu}_y}$) calculated from the imputed data set are underestimated

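The qualitative pattern of the table can be reproduced with a short Python sketch (standard library only). Its settings are assumptions, not the ones behind the slide: 10,000 standard bivariate normal pairs with ρ = 0.6 and half of the Y values missing completely at random, so the numbers will differ from the table. The helpers `describe`, `ols` and `compare_single_imputations` are hypothetical.

```python
import random
import statistics

def describe(xs, ys):
    """Return (mean of y, standard deviation of y, correlation between x and y)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return my, (syy / (len(ys) - 1)) ** 0.5, sxy / (sxx * syy) ** 0.5

def ols(xs, ys):
    """Least-squares fit y = a + b x; returns (a, b, residual standard deviation)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return a, b, (rss / (len(xs) - 2)) ** 0.5

def compare_single_imputations(n=10000, rho=0.6, p_missing=0.5, seed=42):
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [rho * x + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for x in xs]
    miss = [rng.random() < p_missing for _ in range(n)]  # assumption: Y is MCAR
    obs_x = [x for x, m in zip(xs, miss) if not m]
    obs_y = [y for y, m in zip(ys, miss) if not m]
    a, b, s = ols(obs_x, obs_y)  # regression of Y on X fitted on complete cases
    y_bar = statistics.fmean(obs_y)
    fills = {
        "mean": lambda x: y_bar,
        "regression": lambda x: a + b * x,
        "stochastic regression": lambda x: a + b * x + rng.gauss(0, s),
    }
    results = {}
    for name, fill in fills.items():
        completed = [fill(x) if m else y for x, y, m in zip(xs, ys, miss)]
        results[name] = describe(xs, completed)
    return results

for name, (mu, sigma, r) in compare_single_imputations().items():
    print(f"{name:>21}: mean {mu:+.2f}  sd {sigma:.2f}  rho {r:.2f}")
```

Mean imputation shrinks the standard deviation of Y and the correlation; regression imputation inflates the correlation, since every imputed point lies on the fitted line; adding a residual draw restores both. Each completed data set is nevertheless analysed as if it were fully observed, which is why standard errors computed from it are too small.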
  13. Recommended methods
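    ⇒ Multiple imputation (Rubin, 1987)
    • Generate M plausible values for each missing value:
      $(\hat{F}\hat{u}^\top)_{ij} \;\longrightarrow\; (\hat{F}\hat{u}^\top)^{1}_{ij} + \varepsilon^{1}_{ij},\; (\hat{F}\hat{u}^\top)^{2}_{ij} + \varepsilon^{2}_{ij},\; (\hat{F}\hat{u}^\top)^{3}_{ij} + \varepsilon^{3}_{ij},\; \ldots,\; (\hat{F}\hat{u}^\top)^{M}_{ij} + \varepsilon^{M}_{ij}$
    • Perform the analysis on each imputed data set: $\hat{\theta}_m$, $\widehat{\mathrm{Var}}(\hat{\theta}_m)$
    • Combine the results:
      $\hat{\theta} = \frac{1}{M}\sum_{m=1}^{M} \hat{\theta}_m$
      $T = \frac{1}{M}\sum_{m=1}^{M} \widehat{\mathrm{Var}}(\hat{\theta}_m) + \left(1 + \frac{1}{M}\right)\frac{1}{M-1}\sum_{m=1}^{M}\left(\hat{\theta}_m - \hat{\theta}\right)^2$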

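The combination step translates directly into code. This Python sketch implements the two pooling formulas above (often called Rubin's rules); the function name `pool_rubin` and the toy inputs are hypothetical.

```python
import statistics

def pool_rubin(estimates, variances):
    """Combine M analyses of imputed data sets.

    estimates: the M point estimates theta_m
    variances: their M estimated variances Var(theta_m)
    Returns (pooled estimate theta, total variance T).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and one variance per estimate")
    theta = statistics.fmean(estimates)                           # (1/M) sum theta_m
    within = statistics.fmean(variances)                          # (1/M) sum Var(theta_m)
    between = sum((t - theta) ** 2 for t in estimates) / (m - 1)  # spread across imputations
    total = within + (1 + 1 / m) * between
    return theta, total

# pooled estimate 1.0, total variance about 0.153
print(pool_rubin([1.0, 1.2, 0.8], [0.1, 0.1, 0.1]))
```

The second term of T is what single imputation ignores: it grows with the disagreement between the M imputed data sets.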
  14. Recommended methods
    ⇒ Multiple imputation (Rubin, 1987)
    ⇒ Maximum likelihood: EM algorithm (Dempster et al., 1977) to obtain point estimates, plus other algorithms for their variability
    One specific algorithm for each statistical method
    ⇒ Common aim: estimate the parameters and their variability (taking into account the variability due to missing values)

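As a small, self-contained instance of the maximum-likelihood route, here is a Python sketch of the EM algorithm for a bivariate normal sample in which only Y has missing values, missing at random given X. The model, the simulated data and the name `em_bivariate_normal` are assumptions chosen for illustration, not the setting of the slides. For this pattern the maximum-likelihood estimate of the mean of Y also has a closed form (the complete-case mean of Y corrected through the complete-case regression of Y on X), which serves as a check.

```python
import random

def em_bivariate_normal(xs, ys, tol=1e-10, max_iter=1000):
    """EM for a bivariate normal sample where only y has missing values (None).

    Returns the maximum-likelihood estimates (mu_x, mu_y, s_xx, s_xy, s_yy).
    """
    n = len(xs)
    obs_y = [y for y in ys if y is not None]
    mu_x = sum(xs) / n
    s_xx = sum((x - mu_x) ** 2 for x in xs) / n   # x is complete: fixed at its MLE
    mu_y = sum(obs_y) / len(obs_y)                  # start: complete-case mean
    s_yy = sum((y - mu_y) ** 2 for y in obs_y) / len(obs_y)
    s_xy = 0.0                                      # start: no covariance
    for _ in range(max_iter):
        beta = s_xy / s_xx
        resid_var = s_yy - beta * s_xy              # Var(y | x)
        sum_y = sum_yy = sum_xy = 0.0
        for x, y in zip(xs, ys):
            if y is None:                           # E-step: conditional expectations
                m = mu_y + beta * (x - mu_x)
                sum_y += m
                sum_yy += m * m + resid_var
                sum_xy += x * m
            else:
                sum_y += y
                sum_yy += y * y
                sum_xy += x * y
        new_mu_y = sum_y / n                        # M-step: complete-data MLE formulas
        new_s_xy = sum_xy / n - mu_x * new_mu_y
        new_s_yy = sum_yy / n - new_mu_y ** 2
        step = abs(new_mu_y - mu_y) + abs(new_s_xy - s_xy) + abs(new_s_yy - s_yy)
        mu_y, s_xy, s_yy = new_mu_y, new_s_xy, new_s_yy
        if step < tol:
            break
    return mu_x, mu_y, s_xx, s_xy, s_yy

# Hypothetical data: y is missing more often when x > 0 (MAR given x)
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(2000)]
ys = [0.6 * x + 0.8 * rng.gauss(0, 1) for x in xs]
ys = [None if rng.random() < (0.7 if x > 0 else 0.2) else y for x, y in zip(xs, ys)]

n_obs = sum(y is not None for y in ys)
print("complete-case mean of y:", sum(y for y in ys if y is not None) / n_obs)
print("EM estimate of mu_y:", em_bivariate_normal(xs, ys)[1])
```

EM only returns point estimates; their variability requires another procedure, as the slide notes.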
  17. PCA reconstruction
    Data X and their projections X̂ onto the first principal axis:
          X               X̂
       x1     x2       x1     x2
     -2.00  -2.74    -2.16  -2.58
     -1.56  -0.77    -0.96  -1.35
     -1.11  -1.59    -1.15  -1.55
     -0.67  -1.13    -0.70  -1.09
     -0.22  -1.22    -0.53  -0.92
      0.22  -0.52     0.04  -0.34
      0.67   1.46     1.24   0.89
      1.11   0.63     1.05   0.69
      1.56   1.10     1.50   1.15
      2.00   1.00     1.67   1.33
    [Figure: the data points in the (x1, x2) plane and their projections onto the first principal axis]

  18. PCA reconstruction
    $\hat{X} = F V^\top$

  19. PCA minimizes the reconstruction error
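    ⇒ Minimize the distance between the observations and their projections
    ⇒ Approximate X by a low-rank matrix (rank S < p):
      $\min \| X_{n\times p} - \hat{X}_{n\times p} \|^2$
    SVD: $\hat{X}^{\mathrm{PCA}} = U_{n\times S}\,\Lambda^{1/2}_{S\times S}\,V^\top_{p\times S} = F_{n\times S}\,V^\top_{p\times S}$
    $F = U\Lambda^{1/2}$: PC scores
    $V$: principal axes (loadings)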

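A minimal Python sketch of the rank-S reconstruction, using the standard library only. Instead of a full SVD it extracts the principal axes by power iteration on the cross-product matrix of the centred data, with deflation between dimensions; the helper names are hypothetical. With S = 1 on the ten points of the PCA reconstruction example, the fitted values should match the projected values listed on that slide up to rounding, and with S = p the data are reproduced exactly.

```python
def mat_vec(a, v):
    """Product of a square matrix (list of rows) with a vector."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def top_eigenvector(c, n_iter=500):
    """Leading unit eigenvector of a symmetric PSD matrix, by power iteration."""
    v = [1.0 + j for j in range(len(c))]   # arbitrary, non-symmetric start
    for _ in range(n_iter):
        w = mat_vec(c, v)
        norm = sum(t * t for t in w) ** 0.5
        if norm == 0.0:                     # no variance left: contribute nothing
            return [0.0] * len(c)
        v = [t / norm for t in w]
    return v

def pca_reconstruct(x, s):
    """Rank-s PCA fit of the rows of x: column means + F V^T, F being the scores."""
    n, p = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(p)]
    xc = [[row[j] - means[j] for j in range(p)] for row in x]
    # cross-product matrix Xc^T Xc: its eigenvectors are the principal axes V
    c = [[sum(r[j] * r[k] for r in xc) for k in range(p)] for j in range(p)]
    fit = [list(means) for _ in range(n)]
    for _ in range(s):
        v = top_eigenvector(c)
        lam = sum(a * b for a, b in zip(v, mat_vec(c, v)))  # eigenvalue (Rayleigh quotient)
        for i, r in enumerate(xc):
            score = sum(r[j] * v[j] for j in range(p))      # F_is = (Xc V)_is
            for j in range(p):
                fit[i][j] += score * v[j]
        # deflation: remove the dimension just found before extracting the next one
        c = [[c[j][k] - lam * v[j] * v[k] for k in range(p)] for j in range(p)]
    return fit

data = [[-2.00, -2.74], [-1.56, -0.77], [-1.11, -1.59], [-0.67, -1.13], [-0.22, -1.22],
        [0.22, -0.52], [0.67, 1.46], [1.11, 0.63], [1.56, 1.10], [2.00, 1.00]]
for row in pca_reconstruct(data, 1):
    print(f"{row[0]:6.2f} {row[1]:6.2f}")
```

The data on the slide are printed with two decimals, so the recomputed fit can differ from the slide's values in the second decimal.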
  20. Missing values in PCA
    ⇒ PCA: least squares
      $\| X_{n\times p} - U_{n\times S}\,\Lambda^{1/2}_{S\times S}\,V^\top_{p\times S} \|^2$
    ⇒ PCA with missing values: weighted least squares
      $\| W_{n\times p} \ast (X_{n\times p} - U_{n\times S}\,\Lambda^{1/2}_{S\times S}\,V^\top_{p\times S}) \|^2$
      with $w_{ij} = 0$ if $x_{ij}$ is missing, $w_{ij} = 1$ otherwise
    Many algorithms: weighted alternating least squares (Gabriel & Zamir, 1979); iterative PCA (Kiers, 1997)

  21. Weighted least squares
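    ⇒ Rank 1: $\sum_{i=1}^{n}\sum_{j=1}^{p} (x_{ij} - F_{i1}V_{j1})^2$
      2 simple regressions: $V_{j1} = \frac{\sum_i x_{ij}F_{i1}}{\sum_i F_{i1}^2}$ and $F_{i1} = \frac{\sum_j x_{ij}V_{j1}}{\sum_j V_{j1}^2}$
      Power method. Deflation: $(F_2, V_2)$ are obtained from $\hat{\varepsilon}_1 = X - F_1V_1^\top$
      NIPALS (Non-linear Iterative PArtial Least Squares; Wold, 1966; Christofferson, 1969):
      $V_{j1} = \frac{\sum_i w_{ij}x_{ij}F_{i1}}{\sum_i w_{ij}F_{i1}^2}$ ; $F_{i1} = \frac{\sum_j w_{ij}x_{ij}V_{j1}}{\sum_j w_{ij}V_{j1}^2}$
    ⇒ Subspace S > 1:
      2 multiple regressions: $V = X^\top F(F^\top F)^{-1}$; $F = XV(V^\top V)^{-1}$
      2 multiple weighted regressions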

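The weighted alternating regressions can be sketched in a few lines of Python. This illustrates the rank-1 updates with 0/1 weights under stated assumptions: `None` marks a missing cell, the columns are not centred, and every row and column has at least one observed value; the name `weighted_rank1` is hypothetical. On an exactly rank-1 matrix with two cells removed, the fitted products $F_{i1}V_{j1}$ recover the removed values.

```python
def weighted_rank1(x, n_iter=1000, tol=1e-12):
    """Rank-1 fit x_ij ~ F_i * V_j by alternating weighted least squares.

    x is a list of rows; None marks a missing cell (weight w_ij = 0).
    Columns are not centred here. Returns (F, V) with V scaled to unit norm.
    """
    n, p = len(x), len(x[0])
    w = [[0.0 if xij is None else 1.0 for xij in row] for row in x]
    x0 = [[0.0 if xij is None else xij for xij in row] for row in x]  # ignored where w = 0
    f = [1.0] * n
    v = [0.0] * p
    for _ in range(n_iter):
        # weighted simple regression of each column j on F
        v = [sum(w[i][j] * x0[i][j] * f[i] for i in range(n))
             / sum(w[i][j] * f[i] ** 2 for i in range(n)) for j in range(p)]
        norm = sum(t * t for t in v) ** 0.5
        v = [t / norm for t in v]          # remove the scale indeterminacy of F V^T
        # weighted simple regression of each row i on V
        new_f = [sum(w[i][j] * x0[i][j] * v[j] for j in range(p))
                 / sum(w[i][j] * v[j] ** 2 for j in range(p)) for i in range(n)]
        change = max(abs(a - b) for a, b in zip(new_f, f))
        f = new_f
        if change < tol:
            break
    return f, v

# Exactly rank-1 matrix (F = [1, 2, -1, 0.5], V = [2, -1, 3]) with two cells removed
x = [[2.0, -1.0, None],
     [4.0, -2.0, 6.0],
     [None, 1.0, -3.0],
     [1.0, -0.5, 1.5]]
f, v = weighted_rank1(x)
print(f[0] * v[2], f[2] * v[0])  # fitted values for the two missing cells
```

For S > 1 the same step would be repeated on the residuals of each dimension (deflation), or replaced by the weighted multiple regressions.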
  26. Iterative PCA
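    Toy example with one missing value (x2 of the fourth individual):
      X (observed)    X^0 (init.)     X̂^1 (PCA fit)    X^1 (imputed)
      x1     x2       x1     x2       x1      x2       x1     x2
     -2.0  -2.01     -2.0  -2.01     -1.98  -2.04     -2.0  -2.01
     -1.5  -1.48     -1.5  -1.48     -1.44  -1.56     -1.5  -1.48
      0.0  -0.01      0.0  -0.01      0.15  -0.18      0.0  -0.01
      1.5    NA       1.5   0.00      1.00   0.57      1.5   0.57
      2.0   1.98      2.0   1.98      2.27   1.67      2.0   1.98
    • Initialization ℓ = 0: $X^0$ (mean imputation)
    • PCA on the completed data set → $(U^\ell, \Lambda^\ell, V^\ell)$
    • Missing values imputed with the model matrix $\hat{X}^\ell = U^\ell (\Lambda^\ell)^{1/2} V^{\ell\top}$
    • The new imputed dataset is $X^\ell = W \ast X + (1 - W) \ast \hat{X}^\ell$
    [Figure: scatter plot of the five points in the (x1, x2) plane]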

  28. Iterative PCA
    Second iteration:
      X (observed)    X^1             X̂^2 (PCA fit)    X^2 (imputed)
      x1     x2       x1     x2       x1      x2       x1     x2
     -2.0  -2.01     -2.0  -2.01     -2.00  -2.01     -2.0  -2.01
     -1.5  -1.48     -1.5  -1.48     -1.47  -1.52     -1.5  -1.48
      0.0  -0.01      0.0  -0.01      0.09  -0.11      0.0  -0.01
      1.5    NA       1.5   0.57      1.20   0.90      1.5   0.90
      2.0   1.98      2.0   1.98      2.18   1.78      2.0   1.98
    [Figure: scatter plot of the five points in the (x1, x2) plane]

  29. Iterative PCA
    Steps are repeated until convergence

  30. Iterative PCA
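      X (observed)    X (after convergence)
      x1     x2       x1     x2
     -2.0  -2.01     -2.0  -2.01
     -1.5  -1.48     -1.5  -1.48
      0.0  -0.01      0.0  -0.01
      1.5    NA       1.5   1.46
      2.0   1.98      2.0   1.98
    [Figure: scatter plot of the five points in the (x1, x2) plane]
    PCA on the completed data set → $(U^\ell, \Lambda^\ell, V^\ell)$
    Missing values imputed with the model matrix $\hat{X}^\ell = U^\ell (\Lambda^\ell)^{1/2} V^{\ell\top}$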

  34. Iterative PCA
    1 Initialization ℓ = 0: $X^0$ (mean imputation)
    2 Step ℓ:
      (a) PCA on the completed data set $X^{\ell-1}$ → $(U^\ell, \Lambda^\ell, V^\ell)$; S dimensions are kept
      (b) missing values imputed with $\hat{X}^\ell = U^\ell (\Lambda^\ell)^{1/2} V^{\ell\top}$;
          the new imputed dataset is $X^\ell = W \ast X + (1 - W) \ast \hat{X}^\ell$
      (c) means (and standard deviations) are updated
    3 Steps of estimation and imputation are repeated until convergence
    ⇒ EM algorithm of the fixed effect model (Caussinus, 1986):
      $x_{ij} = \sum_{s=1}^{S} \sqrt{\lambda_s}\, U_{is} V_{js} + \varepsilon_{ij}, \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)$
    ⇒ Imputation (matrix completion framework, e.g. the Netflix problem)
    ⇒ Reduction of the variability (imputation by $U\Lambda^{1/2}V^\top$)
    15 / 92
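The iterative PCA algorithm above can be sketched in a few lines of NumPy. This is a minimal illustration, not the author's implementation: it assumes missing cells are coded as `NaN`, updates only the means (no scaling), and uses a plain SVD of the centred table, so that `(P[:, :S] * d[:S]) @ Qt[:S]` plays the role of $U\Lambda^{1/2}V'$ under the usual scaling conventions. The function name `iterative_pca` and the stopping rule are illustrative choices.

```python
import numpy as np


def iterative_pca(X, S, n_iter=500, tol=1e-9):
    """Sketch of iterative PCA (EM-type) imputation; NaN marks a missing cell."""
    X = np.asarray(X, dtype=float)
    W = ~np.isnan(X)                            # True where x_ij is observed
    # Step 0: mean imputation with the column means of the observed values
    Xc = np.where(W, X, np.nanmean(X, axis=0))
    for _ in range(n_iter):
        mu = Xc.mean(axis=0)                    # (c) update the means
        P, d, Qt = np.linalg.svd(Xc - mu, full_matrices=False)
        # (a)-(b) rank-S model matrix, i.e. U Lambda^{1/2} V' plus the means
        X_hat = mu + (P[:, :S] * d[:S]) @ Qt[:S]
        X_new = np.where(W, X, X_hat)           # X = W*X + (1-W)*X_hat
        converged = np.sum((X_new - Xc) ** 2) < tol
        Xc = X_new
        if converged:
            break
    return Xc
```

On exactly low-rank data the imputed cells should move from the column means towards the true values, which the checks below test on a small simulated table.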

  35. Overfitting
    $X_{41 \times 6} = F_{41 \times 2} V'_{2 \times 6} + \mathcal{N}(0, 0.5)$
    [Figure: PCA on the complete data, individuals map, Dim 1 (55.09%), Dim 2 (27.91%); points labelled SEBRLE, CLAY, KARPOV, …]
    16 / 92

  36. Overfitting
    $X_{41 \times 6} = F_{41 \times 2} V'_{2 \times 6} + \mathcal{N}(0, 0.5)$ ⇒ 50% of NA
    [Figure, left: PCA on the complete data, Dim 1 (55.09%), Dim 2 (27.91%)]
    [Figure, right: iterative PCA, Dim 1 (63.97%), Dim 2 (31.9%)]
    16 / 92

  37. Overfitting
    $X_{41 \times 6} = F_{41 \times 2} V'_{2 \times 6} + \mathcal{N}(0, 0.5)$ ⇒ 50% of NA
    [Figure, left: PCA on the complete data, Dim 1 (55.09%), Dim 2 (27.91%)]
    [Figure, right: iterative PCA, Dim 1 (63.97%), Dim 2 (31.9%)]
    ⇒ fitting error is low: $\|W * (X - \hat{X})\|^2 = 0.48$
    ⇒ prediction error is high: $\|(1 - W) * (X - \hat{X})\|^2 = 5.58$
    16 / 92
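The two errors on the slide differ only in which cells they sum over. A small sketch, assuming the true complete matrix is known (as in this simulation) and that `W` is a boolean mask equal to `True` on observed cells; the helper name is hypothetical.

```python
import numpy as np


def fit_and_prediction_errors(X_true, X_hat, W):
    """Fitting error on observed cells, prediction error on missing cells."""
    W = np.asarray(W, dtype=float)              # 1 = observed, 0 = missing
    diff = np.asarray(X_true, float) - np.asarray(X_hat, float)
    fitting = np.sum((W * diff) ** 2)           # ||W * (X - X_hat)||^2
    prediction = np.sum(((1 - W) * diff) ** 2)  # ||(1 - W) * (X - X_hat)||^2
    return fitting, prediction
```

A low fitting error combined with a high prediction error is the overfitting signature shown above: the model reproduces the observed cells well but predicts the hidden ones poorly.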

  38. Overfitting
    Overfitting occurs when:
    • the number of parameters is large relative to the number of observed values (many dimensions S, many missing values)
    • the data are very noisy
    ⇒ the method trusts the relationships between variables too much
    Remarks:
    • missing values: a special case of a small data set
    • iterative PCA is a prediction method
    Solution:
    ⇒ Shrinkage methods
    17 / 92

  39. Regularized iterative PCA (Josse et al., 2009)
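    ⇒ Initialization - estimation step - imputation step
    The imputation step
    $\hat{x}^{\mathrm{PCA}}_{ij} = \sum_{s=1}^{S} \sqrt{\lambda_s}\, U_{is} V_{js}$
    is replaced by a "shrunk" imputation step (Efron & Morris, 1972):
    $\hat{x}^{\mathrm{rPCA}}_{ij} = \sum_{s=1}^{S} \left(\frac{\lambda_s - \hat{\sigma}^2}{\lambda_s}\right) \sqrt{\lambda_s}\, U_{is} V_{js} = \sum_{s=1}^{S} \left(\sqrt{\lambda_s} - \frac{\hat{\sigma}^2}{\sqrt{\lambda_s}}\right) U_{is} V_{js}$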
    18 / 92
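The two forms of the shrunk imputation step are equal because $\frac{\lambda_s - \hat\sigma^2}{\lambda_s}\sqrt{\lambda_s} = \sqrt{\lambda_s} - \frac{\hat\sigma^2}{\sqrt{\lambda_s}}$. A quick numerical check, assuming `U`, `lam` and `V` stand for the PCA outputs and `sigma2` for $\hat\sigma^2$ (all names are illustrative):

```python
import numpy as np


def pca_fit(U, lam, V):
    # x_ij = sum_s sqrt(lambda_s) U_is V_js
    return (U * np.sqrt(lam)) @ V.T


def shrunk_fit(U, lam, V, sigma2):
    # First form: ((lambda_s - sigma2) / lambda_s) * sqrt(lambda_s) * U_is V_js
    form1 = (U * ((lam - sigma2) / lam * np.sqrt(lam))) @ V.T
    # Second form: (sqrt(lambda_s) - sigma2 / sqrt(lambda_s)) * U_is V_js
    form2 = (U * (np.sqrt(lam) - sigma2 / np.sqrt(lam))) @ V.T
    return form1, form2
```

Each dimension is shrunk more strongly when its eigenvalue $\lambda_s$ is close to the noise level $\hat\sigma^2$, and with $\hat\sigma^2 = 0$ the shrunk step reduces to the ordinary PCA reconstruction.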

  40. Regularized iterative PCA (Josse et al., 2009)
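    ⇒ Initialization - estimation step - imputation step
    The imputation step
    $\hat{x}^{\mathrm{PCA}}_{ij} = \sum_{s=1}^{S} \sqrt{\lambda_s}\, U_{is} V_{js}$
    is replaced by a "shrunk" imputation step (Efron & Morris, 1972):
    $\hat{x}^{\mathrm{rPCA}}_{ij} = \sum_{s=1}^{S} \left(\frac{\lambda_s - \hat{\sigma}^2}{\lambda_s}\right) \sqrt{\lambda_s}\, U_{is} V_{js} = \sum_{s=1}^{S} \left(\sqrt{\lambda_s} - \frac{\hat{\sigma}^2}{\sqrt{\lambda_s}}\right) U_{is} V_{js}$
    $\hat{\sigma}^2 = \frac{\mathrm{RSS}}{\mathrm{df}} = \frac{n \sum_{s=S+1}^{q} \lambda_s}{np - p - nS - pS + S^2 + S}$, with $X_{n \times p}$, $U_{n \times S}$, $V_{p \times S}$ (df: degrees of freedom)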
    18 / 92
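A sketch of the $\hat\sigma^2$ estimate for a complete matrix, assuming $\lambda_s = d_s^2/n$ where $d_s$ are the singular values of the column-centred data, so that $n\sum_{s>S}\lambda_s$ is the residual sum of squares of the rank-S fit. The sum runs over all remaining singular values returned by `numpy.linalg.svd`; the function name is hypothetical.

```python
import numpy as np


def sigma2_hat(X, S):
    """Residual variance RSS / df of a rank-S PCA of a complete n x p matrix."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)                     # centring costs p parameters
    d = np.linalg.svd(Xc, compute_uv=False)     # singular values d_s
    # With lambda_s = d_s^2 / n, n * sum_{s>S} lambda_s = sum_{s>S} d_s^2 = RSS
    rss = np.sum(d[S:] ** 2)
    df = n * p - p - n * S - p * S + S ** 2 + S
    return rss / df
```

The denominator equals $(n - 1 - S)(p - S)$, the number of residual degrees of freedom once the means and the S dimensions have been estimated.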

  41. Regularized iterative PCA (Josse et al., 2009)
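    ⇒ Initialization - estimation step - imputation step
    The imputation step
    $\hat{x}^{\mathrm{PCA}}_{ij} = \sum_{s=1}^{S} \sqrt{\lambda_s}\, U_{is} V_{js}$
    is replaced by a "shrunk" imputation step (Efron & Morris, 1972):
    $\hat{x}^{\mathrm{rPCA}}_{ij} = \sum_{s=1}^{S} \left(\frac{\lambda_s - \hat{\sigma}^2}{\lambda_s}\right) \sqrt{\lambda_s}\, U_{is} V_{js} = \sum_{s=1}^{S} \left(\sqrt{\lambda_s} - \frac{\hat{\sigma}^2}{\sqrt{\lambda_s}}\right) U_{is} V_{js}$
    $\hat{\sigma}^2 = \frac{\mathrm{RSS}}{\mathrm{df}} = \frac{n \sum_{s=S+1}^{q} \lambda_s}{np - p - nS - pS + S^2 + S}$, with $X_{n \times p}$, $U_{n \times S}$, $V_{p \times S}$ (df: degrees of freedom)
    Between hard and soft thresholding (Mazumder, Hastie & Tibshirani, 2010):
    σ² small → regularized PCA ≈ PCA
    σ² large → mean imputation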
    18 / 92
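Putting the pieces together, a minimal sketch of regularized iterative PCA under the same assumptions as the plain iterative version (NaN for missing cells, centring only, $\lambda_s = d_s^2/n$ from a plain SVD). Clipping negative shrinkage factors at zero is our own safeguard, not something stated on the slide. The optional `sigma2` argument (hypothetical, like the function name) fixes $\hat\sigma^2$ so that the two limits can be checked: `sigma2=0` gives plain iterative PCA and a very large value gives mean imputation.

```python
import numpy as np


def regularized_iterative_pca(X, S, n_iter=1000, tol=1e-9, sigma2=None):
    """Sketch of regularized iterative PCA; NaN marks a missing cell.

    If sigma2 is None it is re-estimated at each step as RSS / df,
    otherwise the given value is used.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    W = ~np.isnan(X)
    Xc = np.where(W, X, np.nanmean(X, axis=0))      # mean imputation
    df = n * p - p - n * S - p * S + S ** 2 + S
    for _ in range(n_iter):
        mu = Xc.mean(axis=0)
        P, d, Qt = np.linalg.svd(Xc - mu, full_matrices=False)
        lam = d ** 2 / n                               # eigenvalues lambda_s
        s2 = np.sum(d[S:] ** 2) / df if sigma2 is None else sigma2
        # shrinkage factor (lambda_s - sigma2) / lambda_s, clipped at 0
        shrink = np.clip((lam[:S] - s2) / lam[:S], 0.0, None)
        X_hat = mu + (P[:, :S] * (d[:S] * shrink)) @ Qt[:S]
        X_new = np.where(W, X, X_hat)                  # keep observed cells
        converged = np.sum((X_new - Xc) ** 2) < tol
        Xc = X_new
        if converged:
            break
    return Xc
```

The sketch assumes the completed table has at least S non-zero eigenvalues; otherwise the division by `lam[:S]` is undefined.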

  42. Regularized iterative PCA
    $X_{41 \times 6} = F_{41 \times 2} V'_{2 \times 6} + \mathcal{N}(0, 0.5)$ ⇒ 50% of NA
    [Figure, left: PCA on the complete data, Dim 1 (55.09%), Dim 2 (27.91%)]
    [Figure, right: regularized PCA, Dim 1 (64.27%), Dim 2 (30.72%)]
    ⇒ fitting error: $\|W * (X - \hat{X})\|^2 = 0.52$ (EM: 0.48)
    ⇒ prediction error: $\|(1 - W) * (X - \hat{X})\|^2 = 0.67$ (EM: 5.58)
    19 / 92

  43. Properties
    ⇒ Quality of estimation of the parameters:
    Simulation: $X = FV' + \varepsilon$
    RV coefficient between the configurations obtained from the complete and the incomplete data
    • performance decreases as the percentage of missing values and the noise level increase
    • in difficult settings, regularized PCA reduces to mean imputation
    • the choice of the number of dimensions is less crucial
    ⇒ Quality of imputation:
    • good when the structure is strong (the imputation uses both the similarities between individuals and the relationships between variables)
    • competitive with random forests
    20 / 92
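The RV coefficient used to compare configurations can be computed from the cross-product matrices of the centred configurations, $RV(X, Y) = \mathrm{tr}(XX'YY') / \sqrt{\mathrm{tr}((XX')^2)\,\mathrm{tr}((YY')^2)}$ (Escoufier's definition). A sketch with an illustrative function name; both configurations must describe the same rows.

```python
import numpy as np


def rv_coefficient(X, Y):
    """RV coefficient between two configurations of the same n rows."""
    X = np.asarray(X, float)
    Y = np.asarray(Y, float)
    X = X - X.mean(axis=0)              # centre each configuration
    Y = Y - Y.mean(axis=0)
    Sx = X @ X.T                        # n x n cross-product matrices
    Sy = Y @ Y.T
    # for symmetric matrices, sum(Sx * Sy) = trace(Sx @ Sy)
    return np.sum(Sx * Sy) / np.sqrt(np.sum(Sx * Sx) * np.sum(Sy * Sy))
```

Because it only depends on $XX'$ and $YY'$, the RV coefficient is unaffected by rotations and global rescaling of either configuration, which suits PCA maps whose axes are defined up to such transformations.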

  44. A real dataset
    O3 T9 T12 T15 Ne9 Ne12 Ne15 Vx9 Vx12 Vx15 O3v
    0601 NA 15.6 18.5 18.4 4 4 8 NA -1.7101 -0.6946 84
    0602 82 17 18.4 17.7 5 5 7 NA NA NA 87
    0603 92 NA 17.6 19.5 2 5 4 2.9544 1.8794 0.5209 82
    0604 114 16.2 NA NA 1 1 0 NA NA NA 92
    0605 94 17.4 20.5 NA 8 8 7 -0.5 NA -4.3301 114
    0606 80 17.7 NA 18.3 NA NA NA -5.6382 -5 -6 94
    0607 NA 16.8 15.6 14.9 7 8 8 -4.3301 -1.8794 -3.7588 80
    0610 79 14.9 17.5 18.9 5 5 4 0 -1.0419 -1.3892 NA
    0611 101 NA 19.6 21.4 2 4 4 -0.766 NA -2.2981 79
    0612 NA 18.3 21.9 22.9 5 6 8 1.2856 -2.2981 -3.9392 101
    0613 101 17.3 19.3 20.2 NA NA NA -1.5 -1.5 -0.8682 NA
    ⋮
    0919 NA 14.8 16.3 15.9 7 7 7 -4.3301 -6.0622 -5.1962 42
    0920 71 15.5 18 17.4 7 7 6 -3.9392 -3.0642 0 NA
    0921 96 NA NA NA 3 3 3 NA NA NA 71
    0922 98 NA NA NA 2 2 2 4 5 4.3301 96
    0923 92 14.7 17.6 18.2 1 4 6 5.1962 5.1423 3.5 98
    0924 NA 13.3 17.7 17.7 NA NA NA -0.9397 -0.766 -0.5 92
    0925 84 13.3 17.7 17.8 3 5 6 0 -1 -1.2856 NA
    0927 NA 16.2 20.8 22.1 6 5 5 -0.6946 -2 -1.3681 71
    0928 99 16.9 23 22.6 NA 4 7 1.5 0.8682 0.8682 NA
    0929 NA 16.9 19.8 22.1 6 5 3 -4 -3.7588 -4 99
    0930 70 15.7 18.6 20.7 NA NA NA 0 -1.0419 -4 NA
    21 / 92

  45. PCA on the incomplete data
    [Figure, left: Individuals factor map (PCA), Dim 1 (57.47%), Dim 2 (21.34%); categories East, North, West, South shown]
    [Figure, right: Variables factor map (PCA), Dim 1 (55.85%), Dim 2 (21.73%); variables T9, T12, T15, Ne9, Ne12, Ne15, Vx9, Vx12, Vx15, maxO3v, maxO3]
    22 / 92

  46. Outline
    1 Introduction
    2 Point estimates of the PCA axes and components
    3 Uncertainty
    4 MCA/MFA
    5 Single imputation for mixed variables
    6 Multiple imputation
    7 Practice
    8 Appendix
    23 / 92

  47. Uncertainty in the incomplete case
    ⇒ A new source of variability to take into account
    • less data: more uncertainty
    • iterative PCA is a single imputation: applying the residual bootstrap to the completed data underestimates the variability
    ⇒ Multiple imputation
    1. Generate B imputed data sets
    2. Perform the analysis on each imputed data set
    3. Combine: variance = within-imputation variance + between-imputation variance
    24 / 92
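For the combining step, the usual pooling rules (Rubin, 1987) give the total variance as the within-imputation variance plus $(1 + 1/B)$ times the between-imputation variance; the $(1 + 1/B)$ correction is the standard form of the rule and is not spelled out on the slide. A sketch for a scalar estimate, with a hypothetical function name:

```python
import numpy as np


def combine_rubin(estimates, variances):
    """Pool B point estimates and their variances (one per imputed data set)."""
    q = np.asarray(estimates, float)
    u = np.asarray(variances, float)
    B = q.size                              # number of imputed data sets (B >= 2)
    q_bar = q.mean()                        # pooled point estimate
    within = u.mean()                       # average within-imputation variance
    between = q.var(ddof=1)                 # between-imputation variance
    total = within + (1 + 1 / B) * between
    return q_bar, total
```

For example, estimates 1, 2, 3 with within-imputation variance 0.5 each give a pooled estimate of 2 and a total variance of $0.5 + \frac{4}{3} \times 1$.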

  48. Uncertainty in the incomplete case
    ⇒ A new source of variability to take into account
    • less data: more uncertainty
    • iterative PCA is a single imputation: applying the residual bootstrap to the completed data underestimates the variability
    ⇒ Multiple imputation
    1. Generate B imputed data sets: for b = 1, ..., B, the missing values $x^b_{ij}$ are drawn from the predictive distribution $\mathcal{N}\big((FV')_{ij}, \hat{\sigma}^2\big)$
       ⇒ "improper" imputation
    2. Perform the analysis on each imputed data set
    3. Combine: variance = within-imputation variance + between-imputation variance
    24 / 92
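Improper multiple imputation keeps the fitted matrix fixed and only adds noise. A sketch, assuming `X_fit` holds $(FV')_{ij}$ (for instance the output of regularized iterative PCA) and `sigma2` holds $\hat\sigma^2$; the function and argument names are illustrative.

```python
import numpy as np


def improper_multiple_imputation(X, X_fit, sigma2, B, seed=None):
    """Return B completed copies of X (NaN = missing).

    Only the noise is drawn; the parameters behind X_fit are held fixed,
    hence "improper".
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, float)
    X_fit = np.asarray(X_fit, float)
    miss = np.isnan(X)
    copies = []
    for _ in range(B):
        Xb = X.copy()
        # x^b_ij ~ N((FV')_ij, sigma2) for every missing cell
        Xb[miss] = X_fit[miss] + rng.normal(0.0, np.sqrt(sigma2), size=miss.sum())
        copies.append(Xb)
    return copies
```

Because the parameters never vary, the spread across the B tables reflects only the residual noise, which is why this scheme tends to understate the uncertainty.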

  49. “Proper” multiple imputation
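    1. Variability of the parameters: obtain B plausible sets of parameters $(F, V)^1, \ldots, (F, V)^B$ ⇒ bootstrap or Bayesian approach
    2. Noise: for b = 1, ..., B, the missing values $x^b_{ij}$ are imputed by drawing from the predictive distribution $\mathcal{N}\big((FV')^b_{ij}, \hat{\sigma}^2\big)$
    [Diagram: $(\hat{F}\hat{U}')_{ik}$ → $(\hat{F}\hat{U}')^1_{ik} + \varepsilon^1_{ik}$, $(\hat{F}\hat{U}')^2_{ik} + \varepsilon^2_{ik}$, $(\hat{F}\hat{U}')^3_{ik} + \varepsilon^3_{ik}$, …, $(\hat{F}\hat{U}')^B_{ik} + \varepsilon^B_{ik}$]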
    25 / 92
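A simplified sketch of the “proper” scheme with a residual bootstrap. To keep it short, it re-estimates $(F, V)^b$ by a plain rank-S SVD of each bootstrapped complete matrix, rather than by re-running the imputation algorithm with the original missing-data pattern; treat it as an illustration of the two sources of variability, not as the author's procedure. `X_completed` is assumed to be a table completed by (regularized) iterative PCA, and all names are hypothetical.

```python
import numpy as np


def rank_s_fit(Xc, S):
    """Rank-S PCA reconstruction (means added back) of a complete matrix."""
    mu = Xc.mean(axis=0)
    P, d, Qt = np.linalg.svd(Xc - mu, full_matrices=False)
    return mu + (P[:, :S] * d[:S]) @ Qt[:S]


def proper_multiple_imputation(X, X_completed, S, sigma2, B, seed=None):
    rng = np.random.default_rng(seed)
    X = np.asarray(X, float)
    miss = np.isnan(X)
    X_fit = rank_s_fit(X_completed, S)
    resid = (X_completed - X_fit)[~miss]          # residuals on observed cells
    copies = []
    for _ in range(B):
        # 1. parameter variability: residual bootstrap, then refit (F, V)^b
        X_star = X_fit + rng.choice(resid, size=X.shape, replace=True)
        X_fit_b = rank_s_fit(X_star, S)
        # 2. noise: draw from N((FV')^b_ij, sigma2) for the missing cells
        Xb = X.copy()
        Xb[miss] = X_fit_b[miss] + rng.normal(0.0, np.sqrt(sigma2), size=miss.sum())
        copies.append(Xb)
    return copies
```

Even with `sigma2=0` the imputed values differ across the B tables, because the fitted matrix itself changes from one bootstrap replicate to the next.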

  50. Supplementary projection
    ⇒ Position of the individuals (and variables) obtained with the other predictions
    [Diagram labels: Regularized iterative PCA ⇒ reference configuration; PCA; Supplementary projection]
    26 / 92
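Supplementary projection amounts to centring each imputed table with the reference means and projecting its rows onto the reference axes. A sketch assuming an unscaled PCA (a standardized PCA would also divide by the reference standard deviations); the function names are hypothetical.

```python
import numpy as np


def reference_axes(X_ref, S):
    """Means and first S axes (p x S) of the reference configuration."""
    X_ref = np.asarray(X_ref, float)
    mu = X_ref.mean(axis=0)
    _, _, Qt = np.linalg.svd(X_ref - mu, full_matrices=False)
    return mu, Qt[:S].T


def project_supplementary(X_b, mu, axes):
    # coordinates of the rows of an imputed table on the reference axes
    return (np.asarray(X_b, float) - mu) @ axes
```

Projecting every imputed table onto the same axes puts all B configurations in a common frame, so the scatter of each individual's B positions can be read directly on the reference map.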

  53. Multiple imputation in practice
    [Figure, left: Supplementary projection of individuals 1 to 112, Dim 1 (57.20%), Dim 2 (20.27%)]
    [Figure, right: Variable representation, Dim 1 (57.20%), Dim 2 (20.27%); variables maxO3, T9, T12, T15, Ne9, Ne12, Ne15, Vx9, Vx12, Vx15, maxO3v]
    27 / 92

  54. Between-imputation variability
    ⇒ Influence of the different predictions on the parameters (PCA on each imputed table)
    [Diagram label: PCA]
    28 / 92

  55. Between-imputation variability
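    ⇒ Influence of the different predictions on the parameters (PCA on each imputed table)
    [Diagram: PCA applied to each imputed table gives $(\tilde{F}\tilde{U}')^1, (\tilde{F}\tilde{U}')^2, (\tilde{F}\tilde{U}')^3, \ldots, (\tilde{F}\tilde{U}')^B$]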
    28 / 92

  56. Between-imputation variability
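    ⇒ Influence of the different predictions on the parameters (PCA on each imputed table)
    [Diagram: PCA applied to each imputed table gives $(\tilde{F}\tilde{U}')^1, (\tilde{F}\tilde{U}')^2, (\tilde{F}\tilde{U}')^3, \ldots, (\tilde{F}\tilde{U}')^B$, which are then aligned by Procrustean rotation]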
    28 / 92
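The Procrustean rotation can be computed with the classical orthogonal Procrustes solution: if $Y'Y_{\text{ref}} = A\Sigma B'$ is an SVD, the orthogonal matrix minimizing $\|YR - Y_{\text{ref}}\|^2$ is $R = AB'$. A sketch, assuming both configurations have the same rows and the same number of dimensions; the function name is illustrative.

```python
import numpy as np


def procrustes_rotate(Y, Y_ref):
    """Rotate configuration Y (n x S) to best match Y_ref (orthogonal Procrustes)."""
    M = Y.T @ Y_ref                     # S x S cross-product matrix
    A, _, Bt = np.linalg.svd(M)
    R = A @ Bt                          # optimal orthogonal S x S matrix
    return Y @ R
```

Aligning each $(\tilde{F}\tilde{U}')^b$ on a common reference in this way removes the arbitrary rotations and sign flips between separate PCAs, so the remaining differences reflect between-imputation variability.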

  58. Between-imputation variability
    [Figure: Multiple imputation using Procrustes, individuals 1 to 12, Dim 1 (71.33%), Dim 2 (17.17%)]
    29 / 92

  59. Outline
    1 Introduction
    2 Point estimates of the PCA axes and components
    3 Uncertainty
    4 MCA/MFA
    5 Single imputation for mixed variables
    6 Multiple imputation
    7 Practice
    8 Appendix
    30 / 92

  60. MCA for categorical data
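    MCA can be seen as the PCA of the triplet (data, metric, row masses)
    $\left(I X D_\Sigma^{-1},\ \frac{1}{IJ} D_\Sigma,\ \frac{1}{I} \mathbb{I}_I\right)$
    with X the indicator matrix and $D_\Sigma$ the diagonal matrix of the column margins of X:
    • $X = (x_{ik})$ is $I \times K$ (I individuals, J categorical variables, K categories in total); each row sums to J and the grand total is IJ
    • the column margins are $I_1, \ldots, I_k, \ldots, I_K$ and $D_\Sigma = \mathrm{diag}(I_1, \ldots, I_k, \ldots, I_K)$
    Example of an incomplete indicator matrix (NA where an answer is missing):
    1 0 0 1 0 0 1 ... 0 1
    1 0 0 1 0 1 0 ... NA NA
    NA NA NA 0 1 0 0 ... 0 1
    1 0 0 1 0 0 1 ... 0 1
    0 0 1 NA NA 0 ... 0 1
    1 0 0 1 0 0 1 ... 0 1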
    31 / 92
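A sketch of how an incomplete indicator matrix and the MCA triplet could be built. It assumes categorical data stored as lists with `None` for a missing answer, codes a missing answer as NaN in every column of that variable, and computes the triplet only for a complete indicator matrix; all names are illustrative.

```python
import numpy as np


def indicator_matrix(data):
    """Indicator (disjunctive) coding of categorical columns; None = missing."""
    n = len(data)
    n_vars = len(data[0])
    blocks, labels = [], []
    for j in range(n_vars):
        col = [row[j] for row in data]
        cats = sorted({c for c in col if c is not None})
        Z = np.full((n, len(cats)), np.nan)      # NaN row block = missing answer
        for i, c in enumerate(col):
            if c is not None:
                Z[i] = [1.0 if c == k else 0.0 for k in cats]
        blocks.append(Z)
        labels += [f"V{j + 1}_{k}" for k in cats]
    return np.hstack(blocks), labels


def mca_triplet(X):
    """(data, column metric, row masses) of MCA for a complete indicator matrix."""
    I = X.shape[0]
    J = X.sum(axis=1)[0]               # each row has exactly J ones
    col_margins = X.sum(axis=0)        # I_k, the diagonal of D_Sigma
    data = I * X / col_margins         # I X D_Sigma^{-1}
    metric = np.diag(col_margins) / (I * J)
    row_masses = np.eye(I) / I
    return data, metric, row_masses
```

Note that the metric $\frac{1}{IJ}D_\Sigma$ always has trace 1, since the column margins add up to the grand total IJ.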

  61. Regularized iterative MCA (Josse et al., 2012)
    • Initialization: imputation of the indicator matrix (with the category proportions)
    • Iterate until convergence:
      1. Estimation of F, V: MCA on the completed indicator matrix
      2. Imputation of the missing values with the model matrix
      3. Column margins are updated

    Incomplete data:
             V1   V2   V3   …   V14
    ind 1    a    NA   g    …   u
    ind 2    NA   f    g    …   u
    ind 3    a    e    h    …   v
    ind 4    a    e    h    …   v
    ind 5    b    f    h    …   u
    ind 6    c    f    h    …   u
    ind 7    c    f    NA   …   v
    …
    ind 1232 c    f    h    …   v

    Imputed indicator matrix:
             V1_a  V1_b  V1_c  V2_e  V2_f  V3_g  V3_h  …
    ind 1    1     0     0     0.71  0.29  1     0     …
    ind 2    0.12  0.29  0.59  0     1     1     0     …
    ind 3    1     0     0     1     0     0     1     …
    ind 4    1     0     0     1     0     0     1     …
    ind 5    0     1     0     0     1     0     1     …
    ind 6    0     0     1     0     1     0     1     …
    ind 7    0     0     1     0     1     0.37  0.63  …
    …
    ind 1232 0     0     1     0     1     0     1     …

    ⇒ Imputed values can be seen as degrees of membership
    ⇒ Missing values mask an underlying value
    32 / 92
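The initialization step can be sketched as replacing each missing block by the observed proportions of the categories, which already sum to one within a variable and can be read as degrees of membership. This covers only the initialization, not the MCA iterations, and assumes every category column has at least one observed value; the function name is hypothetical.

```python
import numpy as np


def init_indicator(Z):
    """Replace NaN cells of an indicator matrix by the observed category proportions."""
    Z = np.asarray(Z, float)
    props = np.nanmean(Z, axis=0)      # proportion of each category among answers
    return np.where(np.isnan(Z), props, Z)
```

Since a missing answer leaves the whole block of its variable as NaN, each imputed block receives the full set of proportions of that variable and still sums to one, so every row keeps a total of J.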

  62. A real example
    • 1232 respondents, 14 questions, 35 categories; 9% of the values are missing, affecting 42% of the respondents
    [Figure, two panels titled "Missing single", Dim 1 (11.74%), Dim 2 (8.618%): left, categories map (Q1_1 to Q14_3, plus one NA category per question, Q1.NA to Q14.NA); right, subjects map]
    33 / 92

  63. A real example
    • 1232 respondents, 14 questions, 35 categories; 9% of the values are missing, affecting 42% of the respondents
    [Figure, two panels titled "Missing single", Dim 1 (11.74%), Dim 2 (8.618%): left, categories map (Q1_1 to Q14_3, plus one NA category per question, Q1.NA to Q14.NA); right, subjects map]
    q
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    qq
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    qq
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    qq
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    −1.0 −0.5 0.0 0.5 1.0 1.5
    −0.5 0.0 0.5 1.0 1.5
    Regularized iterative MCA: categories
    Dim 1 (14.58%)
    Dim 2 (11.21%)
    Q1.1
    Q1.2
    Q1.3
    Q2.1
    Q2.2
    Q2.3
    Q3.1
    Q3.2
    Q3.3
    Q4.1
    Q4.2
    Q5.1
    Q5.2
    Q6.1
    Q6.2
    Q7.1
    Q7.2
    Q8.1
    Q8.2
    Q9.1
    Q9.2
    Q9.3
    Q10.1
    Q10.2
    Q11.1
    Q11.2
    Q12.1
    Q12.2
    Q12.3
    Q13.1
    Q13.2
    Q13.3
    Q14.1
    Q14.2
    Q14.3
    q
    −1.0 −0.5 0.0 0.5 1.0 1.5
    −1.0 −0.5 0.0 0.5 1.0 1.5
    Regularized iterative MCA: subjects
    Dim 1 (14.58%)
    Dim 2 (11.21%)
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    qq
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    qq
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    qq
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    qq
    q
    q
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    qq
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q
    q q
    q
    q

  64. Multi-blocks data set
    • Biology: 10 samples without expression data
    • Sensory analysis: each judge cannot evaluate more than a
    certain number of products (saturation)
    Products planned to be missing for each judge: balanced incomplete block (BIB) experimental design
    ⇒ Missing rows per subtable
    ⇒ Regularized iterative MFA (Husson & Josse, 2013)
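MFA balances the groups of variables before running a global PCA, and the regularized iterative MFA relies on that weighting at each iteration. A minimal numpy sketch of the weighting step alone, assuming complete numeric blocks (in the iterative algorithm it would be applied to the currently completed data); the function name `mfa_weighting` is hypothetical:

```python
import numpy as np

def mfa_weighting(blocks):
    """Standardize each block, then divide it by its first singular value.

    Dividing by the first singular value is, up to a constant shared by all
    blocks, the MFA weight 1/sqrt(lambda_1) of the block's separate PCA.
    After this step every block has largest singular value 1, so no single
    group of variables dominates the first axis of the global analysis.
    """
    weighted = []
    for B in blocks:
        Z = (B - B.mean(axis=0)) / B.std(axis=0)     # centre and scale columns
        d1 = np.linalg.svd(Z, compute_uv=False)[0]    # first singular value
        weighted.append(Z / d1)
    return np.hstack(weighted)

rng = np.random.default_rng(0)
blocks = [rng.normal(size=(20, 3)), rng.normal(size=(20, 5))]
X = mfa_weighting(blocks)
print(X.shape)  # (20, 8)
```

A global PCA (SVD) of `X` then gives the MFA axes; with missing rows per subtable, an iterative version would alternate this weighting, a (regularized) low-rank fit and the imputation of the missing cells.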

  65. Journal impact factors
    journalmetrics.com provides 15 years of metrics for 27000 journals.
    Here: 443 journals (Computer Science, Statistics, Probability and
    Mathematics), 45 metrics (some may be NA): 15 years × 3 types
    of measures:
    • IPP - Impact Per Publication: like the ISI impact factor, but
    computed over 3 (rather than 2) years.
    • SNIP - Source Normalized Impact per Paper: tries to weight
    by the number of citations per subject field to adjust for
    different citation cultures.
    • SJR - SCImago Journal Rank: tries to capture the average
    prestige per publication.

  66. MFA with missing values
    Figure: "Journals" individual factor map (Dim 1 74.03%, Dim 2 8.29%), with labelled statistics, probability and computer-science journals (e.g. Annals of Statistics, Biometrika, Journal of Machine Learning Research, IEEE Transactions on Pattern Analysis and Machine Intelligence)

  67. MFA with missing values
    Figure: two correlation circles (Dim 1 74.03%, Dim 2 8.29%), one for the IPP variables (IPP_1999 to IPP_2013) and one for the SNIP variables (SNIP_1999 to SNIP_2013)

  69. MFA with missing values
    Figure: individual factor map (Dim 1 74.03%, Dim 2 8.29%) showing the trajectory of IEEE/ACM Transactions on Networking from year_1999 to year_2013

  70. After performing principal component methods despite missing
    entries (obtaining the graphical outputs, the principal components
    and the axes), we use these methods as tools for single and multiple
    imputation and compare them to state-of-the-art methods.
    PC methods are powerful for imputation, since they use the similarities
    between rows and the relationships between columns, and require a small
    number of parameters (dimensionality reduction).
    With single imputation, the aim is to complete a dataset as well as
    possible (prediction). With multiple imputation, the aim is to
    apply other statistical methods afterwards and to estimate parameters
    and their variability while taking into account the uncertainty due to
    the missing values.

  71. Outline
    1 Introduction
    2 Point estimates of the PCA axes and components
    3 Uncertainty
    4 MCA/MFA
    5 Single imputation for mixed variables
    6 Multiple imputation
    7 Practice
    8 Appendix

  72. Principal component method for mixed data (complete)
    Factorial Analysis of Mixed Data (Escofier, 1979), PCAMIX (Kiers, 1991)
    Figure: the continuous variables are centred and scaled; each categorical
    variable is coded by its indicator matrix (I_1, I_2, ..., I_k individuals per
    category), whose columns are divided by $\sqrt{I_k/I}$ and centred. The result
    is a matrix which balances the influence of each variable.
    A PCA is performed on the weighted matrix: $\mathrm{SVD}\left(X, D_\Sigma^{-1}, \frac{1}{I}\mathbb{I}_I\right)$, with X the
    matrix of the continuous variables and the indicator matrix, and $D_\Sigma$ the diagonal
    matrix with the variances of the continuous variables and the weights $I_k/I$ of the categories.
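A minimal numpy sketch of this coding, assuming complete data with categories coded as integers 0..K_q-1 that all occur (the function name `famd_coding` is hypothetical). It also checks the balancing property: each continuous variable contributes 1 to the total inertia and each categorical variable with K_q categories contributes K_q - 1.

```python
import numpy as np

def famd_coding(X_cont, cat_codes):
    """Return the FAMD-weighted matrix for complete data.

    X_cont: (I, K_cont) continuous variables.
    cat_codes: list of integer arrays of length I, one per categorical variable,
               with categories coded 0..K_q-1 and every category observed.
    """
    blocks = [(X_cont - X_cont.mean(axis=0)) / X_cont.std(axis=0)]  # centring & scaling
    for codes in cat_codes:
        D = np.eye(codes.max() + 1)[codes]      # indicator matrix, shape (I, K_q)
        p = D.mean(axis=0)                      # weights I_k / I
        blocks.append((D - p) / np.sqrt(p))     # divide by sqrt(I_k / I) and centre
    return np.hstack(blocks)

rng = np.random.default_rng(1)
X_cont = rng.normal(size=(30, 2))
cats = [np.arange(30) % 3, np.arange(30) % 2]   # 3 and 2 categories
Z = famd_coding(X_cont, cats)
inertia = Z.var(axis=0).sum()                   # total inertia
# PCA with uniform row weights 1/I: eigenvalues = squared singular values of Z / sqrt(I)
eigenvalues = np.linalg.svd(Z / np.sqrt(len(Z)), compute_uv=False) ** 2
print(Z.shape, round(inertia, 6))               # (30, 7) 5.0
```

Here the total inertia is 2 (continuous) + 2 + 1 (categorical), which is why no variable can dominate the analysis merely through its number of categories or its scale.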

  73. Properties of the method
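    • The distance between individuals is:
    $$d^2(i,l) = \sum_{k=1}^{K_{cont}} \frac{1}{\sigma_k^2}(x_{ik} - x_{lk})^2 + \sum_{q=1}^{Q_{cat}} \sum_{k=1}^{K_q} \frac{I}{I_{kq}}(x_{ikq} - x_{lkq})^2$$
    where $x_{ikq} = 1$ if individual i takes category k of variable q (0 otherwise) and $I_{kq}$ is the number of individuals taking that category.
    • The principal component $F_s$ maximises:
    $$\sum_{k=1}^{K_{cont}} r^2(F_s, v_k) + \sum_{q=1}^{Q_{cat}} \eta^2(F_s, v_q)$$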
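Both properties can be checked numerically. A minimal numpy sketch on simulated data (two continuous variables, one categorical variable with 3 categories; all values are made up for illustration): it compares the Euclidean distance on the weighted matrix with the formula above, and checks that the first eigenvalue equals the maximised criterion, i.e. the sum of squared correlations plus the correlation ratio of the first component.

```python
import numpy as np

rng = np.random.default_rng(2)
I = 40
x = rng.normal(size=(I, 2)) * [1.0, 3.0]       # two continuous variables
codes = rng.integers(0, 3, size=I)              # one categorical variable
codes[:3] = [0, 1, 2]                           # make sure every category occurs

# FAMD coding: standardized continuous columns, centred indicators / sqrt(I_k / I)
Zc = (x - x.mean(axis=0)) / x.std(axis=0)
D = np.eye(3)[codes]
p = D.mean(axis=0)
Z = np.hstack([Zc, (D - p) / np.sqrt(p)])

# 1) Squared distance between individuals 0 and 1, computed two ways.
d2_matrix = np.sum((Z[0] - Z[1]) ** 2)
d2_formula = (np.sum((x[0] - x[1]) ** 2 / x.var(axis=0))
              + np.sum((D[0] - D[1]) ** 2 * I / D.sum(axis=0)))

# 2) First principal component and the criterion sum r^2 + eta^2.
U, d, Vt = np.linalg.svd(Z / np.sqrt(I), full_matrices=False)
lam1 = d[0] ** 2                                 # first eigenvalue
F1 = Z @ Vt[0]                                   # first principal component (centred)
r2 = [np.corrcoef(F1, x[:, k])[0, 1] ** 2 for k in range(2)]
group_means = np.array([F1[codes == k].mean() for k in range(3)])
eta2 = np.sum(p * group_means ** 2) / F1.var()   # between-category / total variance
print(np.isclose(d2_matrix, d2_formula), np.isclose(lam1, sum(r2) + eta2))  # True True
```

The second check reflects the fact that, for the first component, the eigenvalue is exactly the value of the criterion it maximises.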

  74. Iterative FAMD algorithm
    1 Initialization: imputation by the mean (continuous variables) and by the proportion (dummy variables)
    2 Iterate until convergence:
    (a) estimation: FAMD on the completed data ⇒ U, Λ, V
    (b) imputation of the missing values with the fitted matrix $\hat{X} = U_S \Lambda_S^{1/2} V_S^\top$
    (c) means, standard deviations and column margins are updated
    Incomplete data:
    age  weight  size  alcohol   sex  snore  tobacco
    NA   100     190   NA        M    yes    no
    70   96      186   1-2 gl/d  M    NA     <=1
    NA   104     194   No        W    no     NA
    62   68      165   1-2 gl/d  M    no     <=1
    Coded data (continuous variables and indicator matrix):
    NA   100  190  NA   NA   NA   1  0  0    1    1    0    0
    70   96   186  0    1    0    1  0  NA   NA   0    1    0
    NA   104  194  1    0    0    0  1  1    0    NA   NA   NA
    62   68   165  0    1    0    1  0  1    0    0    1    0
    Imputed coded data (fuzzy indicator matrix):
    51   100  190  0.2  0.7  0.1  1  0  0    1    1    0    0
    70   96   186  0    1    0    1  0  0.8  0.2  0    1    0
    48   104  194  1    0    0    0  1  1    0    0.1  0.8  0.1
    62   68   165  0    1    0    1  0  1    0    0    1    0
    Imputed data:
    age  weight  size  alcohol   sex  snore  tobacco
    51   100     190   1-2 gl/d  M    yes    no
    70   96      186   1-2 gl/d  M    no     <=1
    48   104     194   No        W    no     <=1
    62   68      165   1-2 gl/d  M    no     <=1
    imputeAFDM
    ⇒ Imputed values can be seen as degrees of membership
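A minimal numpy sketch of the algorithm above, assuming a plain (non-regularized) low-rank fit with S dimensions and a missing categorical value encoded as a row of NaN over all dummies of its variable; the function name `iterative_famd` is hypothetical and this is not the missMDA implementation. The fitted values are computed from a plain SVD of the weighted matrix, which gives the same fitted matrix as $U_S \Lambda_S^{1/2} V_S^\top$ with row weights 1/I.

```python
import numpy as np

def iterative_famd(Xc, D, S=2, max_iter=500, tol=1e-10):
    """Sketch of iterative FAMD imputation (no regularization).

    Xc: (I, Kc) continuous variables, np.nan marks a missing value.
    D:  (I, Kd) indicator matrix of the categorical variables; a missing
        categorical value is a row of np.nan over all dummies of that variable.
    Returns the completed continuous block and the fuzzy indicator block.
    """
    kc = Xc.shape[1]
    X = np.hstack([Xc, D]).astype(float)
    miss = np.isnan(X)
    # 1) initialization: column means, i.e. proportions for the dummy columns
    X = np.where(miss, np.nanmean(X, axis=0), X)
    for _ in range(max_iter):
        # (c) current means, standard deviations and column margins
        means = X.mean(axis=0)
        scale = np.concatenate([X[:, :kc].std(axis=0), np.sqrt(means[kc:])])
        Z = (X - means) / scale
        # (a) FAMD = PCA of the weighted matrix Z; keep S dimensions
        U, d, Vt = np.linalg.svd(Z, full_matrices=False)
        Z_fit = (U[:, :S] * d[:S]) @ Vt[:S]
        # (b) replace only the missing cells by the back-transformed fitted values
        X_new = np.where(miss, Z_fit * scale + means, X)
        change = np.sum((X_new - X) ** 2)
        X = X_new
        if change < tol:
            break
    return X[:, :kc], X[:, kc:]

rng = np.random.default_rng(3)
I = 60
t = rng.normal(size=I)                            # latent factor
Xc = np.column_stack([t + 0.1 * rng.normal(size=I), 2 * t + 0.1 * rng.normal(size=I)])
D = np.eye(2)[(t > 0).astype(int)]                # binary variable driven by t
Xc_na, D_na = Xc.copy(), D.copy()
Xc_na[0, 0] = np.nan                              # one missing continuous value
D_na[1, :] = np.nan                               # one missing categorical value
Xc_imp, D_imp = iterative_famd(Xc_na, D_na, S=1)
print(D_imp[1])                                   # fuzzy values for the missing category
```

Because the dummies of a variable sum to 1 in every row of the completed matrix, the fitted dummies of an imputed row also sum to 1, which is what lets them be read as degrees of membership; taking the largest one gives a single imputed category.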

  75. Iterative Random Forests imputation
    1 Initial imputation: mean imputation (continuous variables), random category (categorical variables)
    Sort the variables by increasing amount of missing values
    2 Fit a RF of $X_j^{obs}$ on the variables $X_{-j}^{obs}$, then predict $X_j^{miss}$
    3 Cycle through the variables
    4 Repeat steps 2 and 3 until convergence
    • number of trees: 100
    • number of variables randomly selected at each node: $\sqrt{p}$
    • number of iterations: 4-5
    Implemented in the R package missForest (Daniel J. Stekhoven, Peter
    Bühlmann, 2011)
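The loop structure can be sketched in numpy for continuous variables. To keep the example dependency-free, ordinary least squares stands in for the random forest (a swapped-in learner: plug any regressor into the hypothetical `fit_predict` argument), and the stopping rule is simplified to "imputations no longer change"; missForest's own stopping criterion differs.

```python
import numpy as np

def ols_fit_predict(X_train, y_train, X_new):
    """Stand-in learner: least-squares regression with intercept (not a random forest)."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_new)), X_new]) @ coef

def iterative_imputation(X, fit_predict=ols_fit_predict, max_iter=5):
    X = X.astype(float)
    miss = np.isnan(X)
    X = np.where(miss, np.nanmean(X, axis=0), X)          # 1) initial mean imputation
    order = np.argsort(miss.sum(axis=0))                  # increasing number of missing values
    for _ in range(max_iter):                             # 4) repeat until (simplified) convergence
        X_old = X.copy()
        for j in order:                                   # 3) cycle through the variables
            if not miss[:, j].any():
                continue
            others = np.delete(np.arange(X.shape[1]), j)
            obs = ~miss[:, j]
            # 2) fit X_j^obs on X_-j, then predict X_j^miss
            X[~obs, j] = fit_predict(X[obs][:, others], X[obs, j], X[~obs][:, others])
        if np.allclose(X, X_old):
            break
    return X

rng = np.random.default_rng(4)
a, b = rng.normal(size=(2, 100))
X = np.column_stack([a, b, 2 * a - b])                    # exact linear relationship
X_na = X.copy()
X_na[:10, 2] = np.nan
X_imp = iterative_imputation(X_na)
print(np.abs(X_imp[:10, 2] - X[:10, 2]).max())            # close to 0
```

With an exact linear relationship the linear stand-in recovers the missing values almost exactly; with non-linear relationships or interactions a forest-based `fit_predict` is what makes the approach competitive.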

  76. Simulation study
    Several data sets, varying:
    • the relationships between variables
    • the number of categories
    • the percentage of missing values (10%, 20%, 30%)
    Criteria:
    • for continuous data: RMSE
    • for categorical data: proportion of falsely classified entries
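The two criteria can be written in a few lines of numpy. This sketch assumes the errors are computed only over the cells that were masked (the boolean `miss` array); the toy values are made up.

```python
import numpy as np

def rmse(X_true, X_imp, miss):
    """Root mean squared error over the missing (imputed) entries only."""
    return np.sqrt(np.mean((X_true[miss] - X_imp[miss]) ** 2))

def pfc(C_true, C_imp, miss):
    """Proportion of falsely classified entries among the missing categorical cells."""
    return np.mean(C_true[miss] != C_imp[miss])

X_true = np.array([[1.0, 2.0], [3.0, 4.0]])
X_imp = np.array([[1.0, 3.0], [2.0, 4.0]])
miss = np.array([[False, True], [True, False]])
print(rmse(X_true, X_imp, miss))          # 1.0

C_true = np.array([["a", "b"], ["c", "a"]])
C_imp = np.array([["a", "c"], ["c", "a"]])
miss_c = np.array([[True, True], [False, True]])
print(pfc(C_true, C_imp, miss_c))         # 1 of the 3 missing cells is wrong
```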

  77. Comparison on real data sets
    Imputations obtained with random forest & FAMD algorithm

  79. Summary
    Imputations with PC methods are good:
    • for strong linear relationships
    • for categorical variables
    • especially for rare categories (weights of MCA)
    ⇒ How to choose the number of components S? Cross-validation (GCV)
    Imputations with RF are good:
    • for strong non-linear relationships between continuous
    variables (cutting continuous variables into categories)
    • when there are interactions (creating interactions)
    ⇒ No tuning parameters?
    Remark: categorical data improve the imputation of continuous data,
    and continuous data improve the imputation of categorical data.
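For choosing S, a generalized cross-validation (GCV) criterion avoids refitting the PCA many times. The sketch below assumes one commonly used form for an n × p complete matrix, GCV(S) = n p RSS(S) / (n p - p - S(n + p - S - 1))², where RSS(S) is the residual sum of squares of the rank-S fit of the centred data; with missing values the residuals would be taken over the observed cells. The function name `gcv_pca` and the simulated data are hypothetical.

```python
import numpy as np

def gcv_pca(X, max_S):
    """GCV(S) for S = 0..max_S on a complete matrix (assumed form, see above)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = []
    for S in range(max_S + 1):
        fitted = (U[:, :S] * d[:S]) @ Vt[:S]           # rank-S reconstruction
        rss = np.sum((Xc - fitted) ** 2)
        df_res = n * p - p - S * (n + p - S - 1)        # residual degrees of freedom
        scores.append(n * p * rss / df_res ** 2)
    return np.array(scores)

rng = np.random.default_rng(5)
signal = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 8)) * 3   # rank-2 structure
X = signal + 0.3 * rng.normal(size=(50, 8))
gcv = gcv_pca(X, 5)
S_best = int(np.argmin(gcv))
print(np.round(gcv, 3), S_best)
```

The denominator penalises the parameters spent on each extra dimension, so adding components that only fit noise stops lowering the criterion.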
    45 / 92

  81.
    Multiple imputation for continuous data: bivariate case
    ⇒ Proper multiple imputation with $y_i = x_i\beta + \varepsilon_i$
    1 Variability of the parameters, M plausible values: $(\hat\beta)_1, \ldots, (\hat\beta)_M$
    ⇒ Bootstrap
    ⇒ Posterior distribution: Data Augmentation (Tanner & Wong, 1987)
    2 Noise: for m = 1, ..., M, the missing values $y_i^m$ are imputed by
    drawing from the predictive distribution $\mathcal{N}\left(x_i(\hat\beta)_m, (\hat\sigma^2)_m\right)$
    Coverage of the 95% confidence interval for $\mu_y$:
    Improper   Proper
    0.818      0.935
    47 / 92
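A minimal base R sketch of the proper scheme, assuming `x` is fully observed and only `y` has missing values. The parameter variability comes from a bootstrap of the complete cases (the Data Augmentation alternative is not shown), and `pool_mean` combines the M estimates of $\mu_y$ with Rubin's rules. All function names are hypothetical.

```r
# Proper multiple imputation of y given x (x complete, y with NAs)
mi_bivariate <- function(x, y, M = 20) {
  obs <- !is.na(y)
  lapply(seq_len(M), function(m) {
    # 1. Parameter variability: refit the regression on a bootstrap sample
    idx <- sample(which(obs), replace = TRUE)
    fit <- lm(y ~ x, data = data.frame(x = x[idx], y = y[idx]))
    beta_m <- coef(fit)
    sigma_m <- summary(fit)$sigma
    # 2. Noise: draw from the predictive distribution N(x beta_m, sigma_m^2)
    y_m <- y
    y_m[!obs] <- beta_m[[1]] + beta_m[[2]] * x[!obs] +
      rnorm(sum(!obs), sd = sigma_m)
    y_m
  })
}

# Pool the M estimates of E[y] with Rubin's rules
pool_mean <- function(imputed) {
  est <- sapply(imputed, mean)
  w <- sapply(imputed, function(v) var(v) / length(v))  # within variances
  M <- length(imputed)
  total <- mean(w) + (1 + 1 / M) * var(est)              # within + between
  c(estimate = mean(est), se = sqrt(total))
}

set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)               # true E[y] = 1
y[x > 0.5 & runif(n) < 0.5] <- NA       # missingness depends on x only (MAR)
imp <- mi_bivariate(x, y, M = 20)
pool_mean(imp)
```

Dropping the bootstrap and reusing the single fit for every m gives the improper version, whose intervals are too narrow because the uncertainty on $\beta$ and $\sigma^2$ is ignored.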

  82.
    Joint modeling
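    ⇒ Hypothesis: $x_{i.} \sim \mathcal{N}(\mu, \Sigma)$
    Expectation Maximization Bootstrap algorithm:
    1 Bootstrap the rows: $X^1, \ldots, X^M$
    EM algorithm on each replicate: $(\hat\mu^1, \hat\Sigma^1), \ldots, (\hat\mu^M, \hat\Sigma^M)$
    2 Imputation: $x_{ij}^m$ drawn from $\mathcal{N}(\hat\mu^m, \hat\Sigma^m)$, conditionally on the observed values of row i
    Easy to parallelize. Implemented in Amelia (website)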
    Amelia Earhart
    James Honaker Gary King Matt Blackwell
    48 / 92
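Step 2 needs the conditional distribution of the missing entries of a row given its observed entries. A base R sketch of that draw, assuming $(\hat\mu^m, \hat\Sigma^m)$ have already been estimated (for instance by EM on the m-th bootstrap sample); it illustrates the Gaussian conditioning only, it is not Amelia's code, and the function names are made up.

```r
# Draw the missing entries of one row x (NA = missing) from their conditional
# distribution given the observed entries, under x ~ N(mu, Sigma).
draw_conditional <- function(x, mu, Sigma) {
  mis <- is.na(x)
  if (!any(mis)) return(x)
  obs <- !mis
  if (!any(obs)) {
    # nothing observed: draw the whole row from N(mu, Sigma)
    return(mu + drop(crossprod(chol(Sigma), rnorm(length(mu)))))
  }
  S_oo <- Sigma[obs, obs, drop = FALSE]
  S_mo <- Sigma[mis, obs, drop = FALSE]
  S_mm <- Sigma[mis, mis, drop = FALSE]
  K <- S_mo %*% solve(S_oo)                 # regression of x_mis on x_obs
  cond_mean <- mu[mis] + drop(K %*% (x[obs] - mu[obs]))
  cond_var <- S_mm - K %*% t(S_mo)          # Schur complement
  x[mis] <- cond_mean + drop(crossprod(chol(cond_var), rnorm(sum(mis))))
  x
}

# Step 2 for a whole data matrix, given one estimate (mu_m, Sigma_m)
impute_normal <- function(X, mu, Sigma) {
  t(apply(X, 1, draw_conditional, mu = mu, Sigma = Sigma))
}

mu <- c(0, 0)
Sigma <- matrix(c(1, 0.9, 0.9, 1), 2)
X <- rbind(c(1, NA), c(NA, -1), c(0.5, 0.2))
impute_normal(X, mu, Sigma)
```

With a correlation of 0.9, a row observed at 2 on the first variable gets its second entry drawn from $\mathcal{N}(1.8, 0.19)$ rather than from the marginal $\mathcal{N}(0, 1)$.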

  84.
    (Fully) Conditional modeling
    ⇒ Hypothesis: one model per variable
    1 Initial imputation: mean imputation
    2 For each variable j
    2.1 $(\beta_{-j}, \sigma_{-j})$ drawn from a bootstrap or a posterior distribution
    2.2 Imputation by stochastic regression: $x_{ij}$ drawn from $\mathcal{N}(X_{-j}\beta_{-j}, \sigma_{-j})$
    3 Cycle through the variables
    4 Repeat steps 2 and 3 M times
    ⇒ The imputation is refined iteratively.
    ⇒ With continuous variables and one linear regression per variable, this corresponds to a joint $\mathcal{N}(\mu, \Sigma)$ model
    Implemented in mice (website)
    “There is no clear-cut method for determining
    whether the MICE algorithm has converged” (Stef van Buuren)
    49 / 92
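The steps above can be sketched in base R for continuous variables, with one linear regression per variable and a bootstrap for step 2.1. This is a simplified illustration of the FCS principle, not the mice implementation: it has no convergence diagnostics, uses a fixed number of cycles `n_iter`, and the function names are hypothetical.

```r
fcs_impute <- function(X, n_iter = 10) {
  X <- as.matrix(X)
  miss <- is.na(X)
  # 1. Initial imputation by the column means
  for (j in seq_len(ncol(X))) X[miss[, j], j] <- mean(X[, j], na.rm = TRUE)
  for (it in seq_len(n_iter)) {               # 3. cycle through the variables
    for (j in which(colSums(miss) > 0)) {     # 2. one model per variable
      obs <- !miss[, j]
      # 2.1 parameter uncertainty: bootstrap the rows where x_j is observed
      idx <- sample(which(obs), replace = TRUE)
      fit <- lm.fit(cbind(1, X[idx, -j, drop = FALSE]), X[idx, j])
      beta <- fit$coefficients
      sigma <- sqrt(sum(fit$residuals^2) / fit$df.residual)
      # 2.2 stochastic regression: draw from N(X_{-j} beta, sigma^2)
      pred <- cbind(1, X[!obs, -j, drop = FALSE]) %*% beta
      X[!obs, j] <- pred + rnorm(sum(!obs), sd = sigma)
    }
  }
  X
}

# 4. M imputed data sets obtained from M independent runs
fcs_mi <- function(X, M = 5, n_iter = 10) {
  replicate(M, fcs_impute(X, n_iter), simplify = FALSE)
}

set.seed(3)
n <- 100
R <- matrix(c(1, .7, .5, .7, 1, .6, .5, .6, 1), 3)
Z <- matrix(rnorm(n * 3), n) %*% chol(R)
Xna <- Z
Xna[sample(length(Xna), 30)] <- NA
imps <- fcs_mi(Xna, M = 3, n_iter = 5)
```

Each variable only needs its own regression, which is what makes FCS flexible: swapping `lm.fit` for a logistic model on a binary column would not change the loop.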

  86.
    Joint / Conditional modeling
    ⇒ In both approaches, imputed values are seen as drawn from a joint distribution
    (even if, with conditional modeling, that joint distribution may not exist)
    ⇒ Conditional modeling takes the lead?
    • Flexible: one model per variable. Easy to deal with interactions
    and variables of different types (binary, ordinal, categorical...)
    • Many statistical models are conditional models!
    • Tailored to your data
    • Appears to work quite well in practice
    ⇒ Drawbacks: one model per variable... tedious...
    ⇒ What to do with high correlation or when n < p?
    • JM: shrink the covariance, Σ + kI (selection of k?)
    • CM: ridge regression or predictor selection for each variable ⇒ a lot
    of tuning... not so easy...
    50 / 92
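Why the JM fix helps when n < p: the sample covariance matrix is then singular and cannot be inverted, while adding $kI$ shifts every eigenvalue up by k. A base R illustration with an arbitrary k = 0.1 (choosing k is precisely the open question); the last lines show the CM counterpart, a ridge regression of one variable on the others, written without intercept for brevity.

```r
set.seed(4)
n <- 10; p <- 20                        # fewer rows than variables
X <- matrix(rnorm(n * p), n, p)
S <- cov(X)                             # p x p, rank at most n - 1 < p: singular
k <- 0.1                                # shrinkage amount, to be chosen in practice
S_k <- S + k * diag(p)                  # every eigenvalue is shifted up by k
R <- chol(S_k)                          # Cholesky succeeds: S + kI is positive definite

# CM analogue: ridge regression of variable 1 on the 19 others
y <- X[, 1]
Z <- X[, -1]
beta_ridge <- solve(crossprod(Z) + k * diag(ncol(Z)), crossprod(Z, y))
```

The same penalty that makes $\Sigma + kI$ invertible makes $Z^\top Z + kI$ invertible, which is why both fixes hinge on one tuning parameter per model.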

  87.
    Multiple imputation with Bootstrap/Bayesian PCA
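    $x_{ij} = \tilde{x}_{ij} + \varepsilon_{ij} = \sum_{s=1}^{S}\sqrt{\lambda_s}\,u_{is}v_{js} + \varepsilon_{ij}, \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)$
    1 Variability of the parameters, M plausible values: $(\hat{x}_{ij})^1, \ldots, (\hat{x}_{ij})^M$
    Bootstrap + iterative PCA
    2 Noise: for m = 1, ..., M, missing values $x_{ij}^m$ drawn from $\mathcal{N}\left((\hat{x}_{ij})^m, \hat\sigma^2\right)$
    Implemented in missMDA (website)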
    François Husson
    51 / 92
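A minimal base R sketch of the two steps, under stated simplifications: the iterative PCA only centers the variables and is not regularized, step 1 uses a residual bootstrap (one way of generating the M plausible fits), and $\hat\sigma^2$ is simply the mean squared residual on the observed cells. missMDA's `MIPCA` differs in these details; the function names below are hypothetical.

```r
# Iterative PCA: impute with column means, then alternate a rank-ncp SVD fit
# and the replacement of the missing cells by the fitted values.
iterative_pca <- function(X, ncp, maxit = 1000, tol = 1e-8) {
  miss <- is.na(X)
  Xc <- X
  Xc[miss] <- colMeans(X, na.rm = TRUE)[col(X)[miss]]
  for (it in seq_len(maxit)) {
    mu <- colMeans(Xc)
    s <- svd(sweep(Xc, 2, mu), nu = ncp, nv = ncp)
    fit <- s$u %*% (s$d[seq_len(ncp)] * t(s$v))    # sum_s d_s u_s v_s'
    fit <- sweep(fit, 2, mu, "+")
    delta <- sum((Xc[miss] - fit[miss])^2)
    Xc[miss] <- fit[miss]
    if (delta < tol) break
  }
  list(completeObs = Xc, fitted = fit)
}

mi_pca_sketch <- function(X, ncp, M = 20) {
  miss <- is.na(X)
  base <- iterative_pca(X, ncp)
  res <- (X - base$fitted)[!miss]       # residuals on the observed cells
  sigma2 <- mean(res^2)                 # crude noise variance estimate
  lapply(seq_len(M), function(m) {
    # 1. Parameter variability: residual bootstrap, same missing pattern
    Xb <- base$fitted
    Xb[!miss] <- Xb[!miss] + res[sample.int(length(res), replace = TRUE)]
    Xb[miss] <- NA
    fit_m <- iterative_pca(Xb, ncp)$fitted
    # 2. Noise: x_ij^m drawn from N(fitted_ij^m, sigma2)
    Xm <- X
    Xm[miss] <- fit_m[miss] + rnorm(sum(miss), sd = sqrt(sigma2))
    Xm
  })
}

set.seed(5)
n <- 50; p <- 6
X_true <- matrix(rnorm(n * 2), n) %*% matrix(rnorm(2 * p), 2) +
  matrix(rnorm(n * p, sd = 0.1), n)      # rank-2 signal plus noise
X <- X_true
miss <- matrix(runif(n * p) < 0.1, n)
X[miss] <- NA
imps <- mi_pca_sketch(X, ncp = 2, M = 5)
```

Only S components are estimated, which is why the approach keeps working when n < p, as long as S is small.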

  88.
    Simulations
    • 1000 simulations
    • data sets drawn from $\mathcal{N}_p(\mu, \Sigma)$ with
    a two-block structure, varying n (30 or 200), p (6 or 60) and ρ (0.3
    or 0.9)
    [Figure: example of a two-block correlation matrix (entries 0 and 0.8)]
    • 10% or 30% of missing values using an MCAR mechanism
    • multiple imputation using M = 20 imputed data sets
    • Quantities of interest: θ1 = E[Y], θ2 = β1, θ3 = ρ
    • Criteria
    • bias
    • CI width, coverage
    52 / 92
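A base R sketch of the simulation design, assuming µ = 0 and two blocks of equal size, together with the three criteria computed over simulation replicates for one parameter θ (the function names are illustrative):

```r
# Correlation matrix with two blocks: rho within a block, 0 between blocks
two_block_sigma <- function(p, rho) {
  half <- p %/% 2
  block <- function(k) { B <- matrix(rho, k, k); diag(B) <- 1; B }
  S <- matrix(0, p, p)
  S[1:half, 1:half] <- block(half)
  S[(half + 1):p, (half + 1):p] <- block(p - half)
  S
}

simulate_mcar <- function(n, p, rho, prop_missing) {
  S <- two_block_sigma(p, rho)
  X <- matrix(rnorm(n * p), n) %*% chol(S)   # rows ~ N_p(0, S)
  X[runif(n * p) < prop_missing] <- NA        # MCAR: independent of the values
  X
}

# Criteria over replicates: estimates and CI bounds for one parameter theta
mi_criteria <- function(est, lower, upper, theta) {
  c(bias = mean(est) - theta,
    width = mean(upper - lower),
    coverage = mean(lower <= theta & theta <= upper))
}

set.seed(6)
X <- simulate_mcar(n = 200, p = 6, rho = 0.9, prop_missing = 0.3)
mean(is.na(X))
```

A method with valid inference should give a coverage close to the nominal 0.95; a narrow width is only a virtue when the coverage is right.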

  89.
    Results for the expectation
    Parameters (n, p, ρ, % of missing values) | Confidence interval width: Amelia, MICE, BayesMIPCA | Coverage: Amelia, MICE, BayesMIPCA
    1 30 6 0.3 0.1 0.803 0.805 0.781 0.955 0.953 0.950
    2 30 6 0.3 0.3 1.010 0.898 0.971 0.949
    3 30 6 0.9 0.1 0.763 0.759 0.756 0.952 0.95 0.949
    4 30 6 0.9 0.3 0.818 0.783 0.965 0.953
    5 30 60 0.3 0.1 0.775 0.955
    6 30 60 0.3 0.3 0.864 0.952
    7 30 60 0.9 0.1 0.742 0.953
    8 30 60 0.9 0.3 0.759 0.954
    9 200 6 0.3 0.1 0.291 0.294 0.292 0.947 0.947 0.946
    10 200 6 0.3 0.3 0.328 0.334 0.325 0.954 0.959 0.952
    11 200 6 0.9 0.1 0.281 0.281 0.281 0.953 0.95 0.952
    12 200 6 0.9 0.3 0.288 0.289 0.288 0.948 0.951 0.951
    13 200 60 0.3 0.1 0.304 0.289 0.957 0.945
    14 200 60 0.3 0.3 0.384 0.313 0.981 0.958
    15 200 60 0.9 0.1 0.282 0.279 0.951 0.948
    16 200 60 0.9 0.3 0.296 0.283 0.958 0.952
    53 / 92

  90.
    Joint, conditional and PCA
    ⇒ Good estimates of the parameters and of their variance from
    incomplete data (coverage close to 0.95)
    The variability due to missing values is well taken into account
    Amelia & mice have difficulties with large correlations or n < p
    missMDA does not, but requires a tuning parameter: the number of dimensions
    Amelia & missMDA are based on linear relationships
    mice is more flexible (one model per variable)
    MI based on PCA works in a wide range of configurations: n < p, n > p,
    strong or weak relationships, low or high percentage of missing values
    54 / 92

  92.
    Remarks
    ⇒ MI theory: well established for regression parameters. What about other quantities?
    ⇒ The imputation model should be as complex as the analysis model
    (e.g. interactions)
    ⇒ Some practical issues:
    • Imputations not in agreement (X and X²): passive imputation
    • Imputations out of range? (predictive mean matching, pmm)
    • Logical bounds (> 0) ⇒ truncation?
    55 / 92
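Predictive mean matching keeps imputations within the observed range because every imputed value is copied from an observed donor. A simplified base R sketch with a single predictor and k = 5 donors; unlike the `pmm` method of mice, it does not draw the regression coefficients, so on its own it is an improper imputation. The function name is hypothetical.

```r
pmm_impute <- function(x, y, k = 5) {
  obs <- !is.na(y)
  k <- min(k, sum(obs))
  fit <- lm(y ~ x, subset = obs)
  pred <- drop(cbind(1, x) %*% coef(fit))   # predicted means for every case
  y_obs <- y[obs]
  pred_obs <- pred[obs]
  y_imp <- y
  for (i in which(!obs)) {
    # donors: the k observed cases whose predicted means are closest to case i
    donors <- order(abs(pred_obs - pred[i]))[seq_len(k)]
    y_imp[i] <- y_obs[donors[sample.int(k, 1)]]   # copy one donor's value
  }
  y_imp
}

set.seed(7)
x <- runif(100)
y <- exp(1 + x + rnorm(100, sd = 0.3))   # strictly positive variable
y[sample(100, 20)] <- NA
yi <- pmm_impute(x, y)
range(yi[is.na(y)])
```

Because the imputed values are observed values, a positive variable stays positive without any truncation step.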

  93.
    MI for categorical variables
    • Loglinear model: R package cat (J.L. Schafer)
    • Fully conditional specification: R package mice (S. van Buuren)
    • Imputation with a Gaussian distribution
    • Latent class models: mixture models in which each individual belongs
    to a latent class within which the variables are independent (D.
    Vidotto, M. C. Kaptein and J. K. Vermunt, 2014)
    Non-parametric version: Dirichlet process mixture of products
    of multinomial distributions model, DPMPM (Y. Si and J.P.
    Reiter, 2014)
    56 / 92

  94.
    Multiple imputation for categorical data using MCA
    A set of M parameter estimates:
    $\left(U_{I\times S}, \Lambda^{1/2}_{S\times S}, V_{J\times S}\right)_1, \ldots, \left(U_{I\times S}, \Lambda^{1/2}_{S\times S}, V_{J\times S}\right)_M$
    obtained using a non-parametric bootstrap approach:
    1 Generate M bootstrap replicates
    2 Estimate the parameters on each incomplete replicate
    3 Add uncertainty to the prediction
    57 / 92

  98.
    Multiple imputation with MCA
    1 Variability of the MCA parameters $\left(U_{I\times S}, \Lambda^{1/2}_{S\times S}, V_{J\times S}\right)$
    using a non-parametric bootstrap:
    → define M weightings $(R_m)_{1\le m\le M}$ for the individuals
    2 Estimate the MCA parameters using the SVD of $\left(X, \frac{1}{K}(D_\Sigma)^{-1}, R_m\right)$
    [Figure: the M fitted indicator matrices $\hat{X}^1, \hat{X}^2, \ldots, \hat{X}^M$; observed cells keep their 0/1 coding, missing cells receive fuzzy values such as (0.81, 0.19) and (0.25, 0.75) in $\hat{X}^1$, (0.60, 0.40) and (0.26, 0.74) in $\hat{X}^2$]
    [Figure: assigning each missing cell its majority category gives the same categories (A, B, ...) in every imputed table]
    Majority ⇒ lack of variability
    58 / 92
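The M weightings of step 1 can be obtained by drawing the I individuals with replacement and recording how often each one is drawn. A base R sketch (hypothetical function name), where column m holds $R_m$; an individual with weight 0 does not contribute to the m-th fit.

```r
# M bootstrap weightings of I individuals: the weight of individual i in
# replicate m is (number of times i is drawn) / I
bootstrap_weights <- function(I, M) {
  counts <- rmultinom(M, size = I, prob = rep(1 / I, I))  # I x M draw counts
  counts / I                                              # column m = R_m
}

set.seed(12)
R <- bootstrap_weights(I = 10, M = 3)
colSums(R)   # each weighting sums to 1 (up to rounding)
```

Expressing the bootstrap as weights keeps the data table unchanged across replicates; only the row weights used in the SVD change with m.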

  99.
    Multiple imputation with MCA
    1 Variability of the MCA parameters $\left(U_{I\times S}, \Lambda^{1/2}_{S\times S}, V_{J\times S}\right)$
    using a non-parametric bootstrap:
    → define M weightings $(R_m)_{1\le m\le M}$ for the individuals
    2 Estimate the MCA parameters using the SVD of $\left(X, \frac{1}{K}(D_\Sigma)^{-1}, R_m\right)$
    [Figure: the M fitted indicator matrices $\hat{X}^1, \hat{X}^2, \ldots, \hat{X}^M$; missing cells receive fuzzy values such as (0.81, 0.19) in $\hat{X}^1$ and (0.60, 0.40) in $\hat{X}^2$]
    3 Draw the categories from the values of $(\hat{X}^m)_{1\le m\le M}$
    [Figure: drawing categories according to these fuzzy values gives categories (A or B) that differ across the imputed tables]
    58 / 92
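The contrast between the majority rule and step 3 can be shown on fuzzy values like those displayed for $\hat X^1$ (0.81/0.19 and 0.25/0.75 for two missing cells of a two-category variable). A base R sketch with hypothetical function names; clipping negative values and renormalizing each row is an assumption made so the fitted values can be used as probabilities.

```r
# Rows = missing cells of one categorical variable, columns = its categories
probs <- rbind(c(A = 0.81, B = 0.19), c(A = 0.25, B = 0.75))

# Majority rule: always the most likely category, identical in every table
impute_majority <- function(P) colnames(P)[max.col(P, ties.method = "first")]

# Step 3: draw a category with probabilities given by the fuzzy values
impute_draw <- function(P) {
  P <- pmax(P, 0)                 # fitted values may fall slightly below 0
  P <- P / rowSums(P)             # renormalize each row to sum to 1
  apply(P, 1, function(p) sample(colnames(P), 1, prob = p))
}

impute_majority(probs)
set.seed(8)
replicate(5, impute_draw(probs))  # categories vary across imputed tables
```

Under the majority rule the first cell is "A" in every imputed table; with draws it is "B" in roughly one table out of five, which is the between-imputation variability MI needs.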

  100.
    Simulations
    • Quantities of interest: θ = parameters of a logistic regression model
    • 200 simulations from real data sets:
    • the real data set is considered as the population
    • one sample is drawn from the data set
    • 20% of missing values are generated
    • multiple imputation using M = 5 imputed data sets
    • Criteria
    • bias
    • CI width, coverage
    59 / 92

  101.
    Results - Inference
    [Figure: coverage (0.80 to 1.00) of the confidence intervals for each method on three data sets: Titanic (MIMCA 5, Loglinear, Latent class, FCS-log, FCS-rf), Galetas (MIMCA 2, Loglinear, Latent class, FCS-log, FCS-rf) and Income (MIMCA 5, Latent class, FCS-log, FCS-rf)]
                              Titanic   Galetas   Income
    Number of variables             4         4       14
    Number of categories          ≤ 4      ≤ 11      ≤ 9
    60 / 92

  102.
    Results - Time
                              Titanic    Galetas     Income
    MIMCA                       2.750      8.972     58.729
    Loglinear                   0.740      4.597         NA
    Latent class model         10.854     17.414    143.652
    FCS logistic                4.781     38.016    881.188
    FCS forests               265.771    112.987   6329.514
    Table: Time in seconds

                              Titanic    Galetas     Income
    Number of individuals        2201       1192       6876
    Number of variables             4          4         14
    61 / 92

  103.
    Conclusion
    Multiple imputation methods for continuous and categorical data
    using dimensionality reduction methods
    Properties:
    • require a small number of parameters
    • capture the relationships between variables
    • capture the similarities between individuals
    From a practical point of view:
    • can be applied to data sets of various dimensions
    • provide correct inferences for analysis models based on
    relationships between pairs of variables
    • require choosing the number of dimensions S
    Perspective:
    • mixed data
    62 / 92

  104.
    Mixed variables
    ⇒ Joint modeling:
    • General location model (Schafer, 1997) ⇒ problems when there are many
    categories
    • Transform the categorical variables into dummy variables and
    treat them as continuous variables (Amelia)
    • Latent class models (Vermunt); nonparametric Bayesian
    models (work in progress, Dunson, Reiter, Duke University)
    ⇒ Conditional modeling: linear, logistic, multinomial logit models
    (mice), random forests
    63 / 92

  105.
    To conclude
    Take home message:
    • “The idea of imputation is both seductive and dangerous. It is seductive
    because it can lull the user into the pleasurable state of believing that the data
    are complete after all, and it is dangerous because it lumps together situations
    where the problem is sufficiently minor that it can be legitimately handled in
    this way and situations where standard estimators applied to the real and
    imputed data have substantial biases.” (Dempster and Rubin, 1983)
    • Advanced methods are available to estimate parameters and
    their variance (taking into account the variability due to
    missing values)
    • Multiple imputation is an appealing method... but how
    can it be applied to big data?
    • Still an active area of research
    64 / 92

  106.
    Resources
    ⇒ Software:
    • van Buuren webpage:
    http://www.stefvanbuuren.nl/mi/Software.html
    • CRAN Task View: Official Statistics & Survey Methodology
    ⇒ Recent books:
    • S. van Buuren (2012). Flexible Imputation of Missing Data. Chapman & Hall/CRC.
    • J.R. Carpenter & M.G. Kenward (2013). Multiple Imputation and its Application. Wiley.
    • G. Molenberghs, G. Fitzmaurice, M.G. Kenward, A. Tsiatis & G. Verbeke (2014).
    Handbook of Missing Data Methodology. Chapman & Hall/CRC.
    ⇒ R.J.A. Little & D.B. Rubin (2002). Statistical Analysis with Missing Data. Wiley.
    ⇒ J.L. Schafer (1997). Analysis of Incomplete Multivariate Data. Chapman & Hall.
    ⇒ J.L. Schafer & J.W. Graham (2002). Missing Data: Our View of the State of the
    Art. Psychological Methods, 7, 147-177.
    ⇒ B. Efron (1994). Missing Data, Imputation, and the Bootstrap. Journal of the
    American Statistical Association, 89(426), 463-475.
    65 / 92

  107.
    Contributors on the topic of multiple imputation
    • J. Honaker - G. King - M. Blackwell (Harvard): Amelia
    • S. van Buuren (Utrecht): mice
    • F. Husson - J. Josse (Rennes): missMDA
    • A. Gelman - J. Hill - Y. Su (Columbia): mi
    • J. Reiter (Duke): NPBayesImpute Non-Parametric Bayesian
    Multiple Imputation for Categorical Data
    • J. Bartlett - J. Carpenter - M. Kenward (UCL): smcfcs
    Substantive model compatible FCS multiple imputation
    • H. Goldstein (Bristol): realcom for multilevel data
    • J.K. Vermunt (Tilburg): poLCA latent class models
    • Shaun Seaman (Medical Research Council Biostatistics Unit,
    UK), Roderick Little (Michigan)...
    • Donald B Rubin (Harvard)
    66 / 92

  108.
    Conference on missing data and matrix completion
    http://missdata2015.agrocampus-ouest.fr/
    67 / 92

  110.
    A real dataset
    O3 T9 T12 T15 Ne9 Ne12 Ne15 Vx9 Vx12 Vx15 O3v
    0601 NA 15.6 18.5 18.4 4 4 8 NA -1.7101 -0.6946 84
    0602 82 17 18.4 17.7 5 5 7 NA NA NA 87
    0603 92 NA 17.6 19.5 2 5 4 2.9544 1.8794 0.5209 82
    0604 114 16.2 NA NA 1 1 0 NA NA NA 92
    0605 94 17.4 20.5 NA 8 8 7 -0.5 NA -4.3301 114
    0606 80 17.7 NA 18.3 NA NA NA -5.6382 -5 -6 94
    0607 NA 16.8 15.6 14.9 7 8 8 -4.3301 -1.8794 -3.7588 80
    0610 79 14.9 17.5 18.9 5 5 4 0 -1.0419 -1.3892 NA
    0611 101 NA 19.6 21.4 2 4 4 -0.766 NA -2.2981 79
    0612 NA 18.3 21.9 22.9 5 6 8 1.2856 -2.2981 -3.9392 101
    0613 101 17.3 19.3 20.2 NA NA NA -1.5 -1.5 -0.8682 NA
    ...
    0919 NA 14.8 16.3 15.9 7 7 7 -4.3301 -6.0622 -5.1962 42
    0920 71 15.5 18 17.4 7 7 6 -3.9392 -3.0642 0 NA
    0921 96 NA NA NA 3 3 3 NA NA NA 71
    0922 98 NA NA NA 2 2 2 4 5 4.3301 96
    0923 92 14.7 17.6 18.2 1 4 6 5.1962 5.1423 3.5 98
    0924 NA 13.3 17.7 17.7 NA NA NA -0.9397 -0.766 -0.5 92
    0925 84 13.3 17.7 17.8 3 5 6 0 -1 -1.2856 NA
    0927 NA 16.2 20.8 22.1 6 5 5 -0.6946 -2 -1.3681 71
    0928 99 16.9 23 22.6 NA 4 7 1.5 0.8682 0.8682 NA
    0929 NA 16.9 19.8 22.1 6 5 3 -4 -3.7588 -4 99
    0930 70 15.7 18.6 20.7 NA NA NA 0 -1.0419 -4 NA
    69 / 92

  111.
    Count missing values
    > library(VIM)
    > aggr(don,only.miss=TRUE,sortVar=TRUE)

     Variables sorted by number of missings:
     Variable      Count
         Ne12 0.37500000
           T9 0.33035714
          T15 0.33035714
          Ne9 0.30357143
          T12 0.29464286
         Ne15 0.28571429
         Vx15 0.18750000
          Vx9 0.16071429
        maxO3 0.14285714
       maxO3v 0.10714286
         Vx12 0.08928571

    > res<-summary(aggr(don,prop=TRUE,combined=TRUE))$combinations
    > res[rev(order(res[,2])),]
     Combinations             Count    Percent
     0:0:0:0:0:0:0:0:0:0:0       13 11.6071429
     0:1:1:1:0:0:0:0:0:0:0        7  6.2500000
     0:0:0:0:0:1:0:0:0:0:0        5  4.4642857
     0:1:0:0:0:0:0:0:0:0:0        4  3.5714286
     0:1:0:0:1:1:1:0:0:0:0        3  2.6785714
     0:0:1:0:0:0:0:0:0:0:0        3  2.6785714
     0:0:0:1:0:0:0:0:0:0:0        3  2.6785714
     0:0:0:0:1:1:1:0:0:0:0        3  2.6785714
     0:0:0:0:0:1:0:0:0:0:1        3  2.6785714
     0:1:1:1:1:0:0:0:0:0:0        2  1.7857143
     0:0:0:0:1:0:0:0:0:1:0        2  1.7857143
     0:0:0:0:0:0:1:1:0:0:0        2  1.7857143
     0:0:0:0:0:0:1:0:0:0:0        2  1.7857143
     ...
    70 / 92
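The same two summaries can be obtained without VIM. A base R sketch, assuming `don` is a data frame (the function name is hypothetical); the pattern strings follow the `0:1:...` coding of the `aggr` output, with 1 for a missing entry.

```r
missing_summary <- function(don) {
  M <- is.na(don)                                    # logical matrix
  prop_by_var <- sort(colMeans(M), decreasing = TRUE)
  patterns <- apply(M * 1L, 1, paste, collapse = ":")  # e.g. "0:1:0"
  combos <- sort(table(patterns), decreasing = TRUE)
  list(prop_by_var = prop_by_var,
       combinations = data.frame(Combination = names(combos),
                                 Count = as.vector(combos),
                                 Percent = 100 * as.vector(combos) / nrow(don)))
}

don_toy <- data.frame(a = c(1, NA, 3, NA), b = c(NA, 2, 3, 4), c = c(1, 2, 3, 4))
missing_summary(don_toy)
```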

  112.
    Pattern visualization
    [Figure: aggr plot; left, proportion of missings per variable (0 to 0.35), sorted Ne12, T9, T15, Ne9, T12, Ne15, Vx15, Vx9, maxO3, maxO3v, Vx12; right, combinations of missing values]
    > aggr(don,only.miss=TRUE,sortVar=TRUE)
    71 / 92

  113.
    Visualization
    [Figure: left, matrixplot of the variables maxO3 to maxO3v against the observation index (0 to 100), sorted by T9; right, marginplot of maxO3 (40 to 160) against T9 (12 to 24), with margin counts of missing values 16, 37 and 4]
    > matrixplot(don,sortby=2)
    > marginplot(don[,c("T9","maxO3")])
    72 / 92

  114.
    Visualization with Multiple Correspondence Analysis
    ⇒ Create the missingness matrix
    > mis.ind <- matrix("o",nrow=nrow(don),ncol=ncol(don))
    > mis.ind[is.na(don)] <- "m"
    > dimnames(mis.ind) <- dimnames(don)
    > mis.ind
    maxO3 T9 T12 T15 Ne9 Ne12 Ne15 Vx9 Vx12 Vx15 maxO3v
    20010601 "o" "o" "o" "m" "o" "o" "o" "o" "o" "o" "o"
    20010602 "o" "m" "m" "m" "o" "o" "o" "o" "o" "o" "o"
    20010603 "o" "o" "o" "o" "o" "m" "m" "o" "m" "o" "o"
    20010604 "o" "o" "o" "m" "o" "o" "o" "m" "o" "o" "o"
    20010605 "o" "m" "o" "o" "m" "m" "m" "o" "o" "o" "o"
    20010606 "o" "o" "o" "o" "o" "m" "o" "o" "o" "o" "o"
    20010607 "o" "o" "o" "o" "o" "o" "m" "o" "o" "o" "o"
    20010610 "o" "o" "o" "o" "o" "o" "m" "o" "o" "o" "o"
    73 / 92

  115.
    Visualization with Multiple Correspondence Analysis
    [Figure: MCA graph of the categories, Dim 1 (19.07%), Dim 2 (17.71%), showing the missing (_m) and observed (_o) category of each variable from maxO3 to maxO3v]
    > library(FactoMineR)
    > resMCA <- MCA(mis.ind)
    > plot(resMCA,invis="ind",title="MCA graph of the categories")
    74 / 92

  116.
    Imputation with PCA
    ⇒ Step 1: Estimation of the number of dimensions
    > library(missMDA)
    > nb <- estim_ncpPCA(don,method.cv="Kfold")
    > nb$ncp #2
    > plot(0:5,nb$criterion,xlab="nb dim", ylab="MSEP")
    [Figure: MSEP (about 4000 to 7000) against the number of dimensions (0 to 5)]
    75 / 92
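The idea behind `estim_ncpPCA(..., method.cv = "Kfold")` can be sketched in base R: hide one fold of observed cells, impute them with an iterative PCA using S dimensions, and average the squared prediction errors. This sketch uses an unregularized iterative PCA and hypothetical names, so its MSEP values will not match those of missMDA.

```r
# Iterative PCA returning the rank-ncp fit (ncp = 0: column means only)
iterative_pca_fit <- function(X, ncp, maxit = 500, tol = 1e-8) {
  miss <- is.na(X)
  Xc <- X
  Xc[miss] <- colMeans(X, na.rm = TRUE)[col(X)[miss]]
  for (it in seq_len(maxit)) {
    mu <- colMeans(Xc)
    fit <- matrix(mu, nrow(X), ncol(X), byrow = TRUE)
    if (ncp > 0) {
      s <- svd(sweep(Xc, 2, mu), nu = ncp, nv = ncp)
      fit <- fit + s$u %*% (s$d[seq_len(ncp)] * t(s$v))
    }
    delta <- sum((Xc[miss] - fit[miss])^2)
    Xc[miss] <- fit[miss]
    if (delta < tol) break
  }
  fit
}

# K-fold cross-validation over the observed cells, for S = 0, ..., ncp_max
cv_ncp <- function(X, ncp_max = 5, K = 5) {
  obs_cells <- which(!is.na(X))
  fold <- sample(rep(seq_len(K), length.out = length(obs_cells)))
  sapply(0:ncp_max, function(S) {
    err <- 0
    for (k in seq_len(K)) {
      cells <- obs_cells[fold == k]
      X_cv <- X
      X_cv[cells] <- NA                        # hide one fold of observed cells
      fit <- iterative_pca_fit(X_cv, S)
      err <- err + sum((X[cells] - fit[cells])^2)
    }
    err / length(obs_cells)                    # mean squared error of prediction
  })
}

set.seed(10)
n <- 50; p <- 6
X <- matrix(rnorm(n * 2), n) %*% matrix(rnorm(2 * p), 2) +
  matrix(rnorm(n * p, sd = 0.3), n)            # two underlying dimensions
X[runif(n * p) < 0.1] <- NA
msep <- cv_ncp(X, ncp_max = 4, K = 5)
which.min(msep) - 1                            # selected number of dimensions
```

Too few dimensions miss structure, too many fit the noise of the remaining cells; the MSEP curve balances the two.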

  117.
    Imputation with PCA
    ⇒ Step 2: Imputation of the missing values
    > res.comp <- imputePCA(don,ncp=2)
    > res.comp$completeObs[1:3,]
    maxO3 T9 T12 T15 Ne9 Ne12 Ne15 Vx9 Vx12 Vx15 maxO3v
    0601 87 15.60 18.50 20.47 4 4.00 8.00 0.69 -1.71 -0.69 84
    0602 82 18.51 20.88 21.81 5 5.00 7.00 -4.33 -4.00 -3.00 87
    0603 92 15.30 17.60 19.50 2 3.98 3.81 2.95 1.97 0.52 82
    76 / 92

  118.
    PCA representation
    ⇒ Step 3: PCA on the completed data set
    [Figure: left, individuals factor map (PCA), Dim 1 (57.47%), Dim 2 (21.34%), with the categories East, North, West, South; right, variables factor map (PCA), Dim 1 (55.85%), Dim 2 (21.73%), with T9, T12, T15, Ne9, Ne12, Ne15, Vx9, Vx12, Vx15, maxO3v and maxO3]
    > imp <- cbind.data.frame(res.comp$completeObs,WindDirection)
    > res.pca <- PCA(imp,quanti.sup=1,quali.sup=12)
    > plot(res.pca, hab=12, lab="quali"); plot(res.pca, choix="var")
    > res.pca$ind$coord #scores (principal components)
    77 / 92

  119.
    Multiple imputation in practice
    ⇒ Step 1: Generate M imputed data sets
    > library(Amelia)
    > res.amelia <- amelia(don,m=100) ## in combination with zelig
    > library(mice)
    > res.mice <- mice(don,m=100,defaultMethod="norm.boot")
    > library(missMDA)
    > res.MIPCA <- MIPCA(don,ncp=2,nboot=100)
    > res.MIPCA$res.MI
    78 / 92

  120.
    Multiple imputation in practice
    ⇒ Step 2: visualization
    [Figure: left, observed and imputed values of T12 (fraction missing: 0.295), relative density of the mean imputations and of the observed values; right, observed versus imputed values of maxO3, colored by fraction missing (0-.2, .2-.4, .4-.6, .6-.8, .8-1)]
    > library(Amelia)
    > compare.density(res.amelia, var="T12")
    > overimpute(res.amelia, var="maxO3")
    (see also the function stripplot in mice)
    79 / 92

  121.
    Multiple imputation in practice
    ⇒ Step 2: visualization
    > res.MIPCA <- MIPCA(don,ncp=2)
    > plot(res.MIPCA,choice= "ind.supp"); plot(res.MIPCA,choice= "var")
    [Figure: left, supplementary projection of the imputed individuals (1 to 112) onto the PCA map, Dim 1 (57.20%), Dim 2 (20.27%); right, variable representation, Dim 1 (57.20%), Dim 2 (20.27%), for maxO3, T9, T12, T15, Ne9, Ne12, Ne15, Vx9, Vx12, Vx15, maxO3v]
    80 / 92

  122.
    Multiple imputation in practice
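    ⇒ Step 3: regression on each imputed data set and pooling of the results (Rubin's rules)
    $\hat\beta = \frac{1}{M}\sum_{m=1}^{M}\hat\beta_m$
    $T = \frac{1}{M}\sum_{m=1}^{M}\widehat{\mathrm{Var}}\left(\hat\beta_m\right) + \left(1 + \frac{1}{M}\right)\frac{1}{M-1}\sum_{m=1}^{M}\left(\hat\beta_m - \hat\beta\right)^2$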
    > library(mice)
    > imp.mice <- mice(don,m=100,defaultMethod="norm")
    > lm.mice.out <- with(imp.mice, lm(maxO3 ~ T9+T12+T15+Ne9+Ne12+
    Ne15+Vx9+Vx12+Vx15+maxO3v))
    > pool.mice <- pool(lm.mice.out)
    > summary(pool.mice)
    est se t df Pr(>|t|) lo 95 hi 95 nmis fmi lambda
    (Intercept) 19.31 16.30 1.18 50.48 0.24 -13.43 52.05 NA 0.46 0.44
    T9 -0.88 2.25 -0.39 26.43 0.70 -5.50 3.75 37 0.71 0.69
    T12 3.29 2.38 1.38 27.54 0.18 -1.59 8.18 33 0.70 0.68
    ....
    Vx15 0.23 1.33 0.17 39.00 0.87 -2.47 2.93 21 0.57 0.55
    maxO3v 0.36 0.10 3.65 46.03 0.00 0.16 0.56 12 0.50 0.48
    81 / 92
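The pooling formulas of Step 3 (Rubin's rules) can be sketched in a few lines of base R. This is a minimal illustration, not the code that `pool()` runs internally: the helper name `rubin_pool` is hypothetical, it handles a single coefficient, and the four estimates and variances in the example are invented.

```r
# Pool M estimates of one coefficient with Rubin's rules (hypothetical helper).
# est: the M estimates beta_m; var_within: their M estimated variances Var(beta_m).
rubin_pool <- function(est, var_within) {
  M <- length(est)
  beta_bar <- mean(est)                    # pooled point estimate
  W <- mean(var_within)                    # average within-imputation variance
  B <- sum((est - beta_bar)^2) / (M - 1)   # between-imputation variance
  total <- W + (1 + 1 / M) * B             # total variance T
  list(estimate = beta_bar, total_var = total, se = sqrt(total),
       within = W, between = B)
}

# Toy example with M = 4 imputed-data fits (invented numbers)
res <- rubin_pool(est = c(1.0, 1.2, 0.8, 1.0),
                  var_within = c(0.04, 0.05, 0.03, 0.04))
res$estimate
res$total_var
```

If every imputed data set gave the same estimate, B would be 0 and T would reduce to the average within-imputation variance. The spread between imputations is what carries the extra uncertainty due to the missing values.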

  123. Mixed imputation in practice
    > library(missMDA)
    > imputeFAMD(mydata,ncp=2)
    > library(missForest)
    > missForest(mydata)
    > library(mice)
    > mice(mydata)
    > mice(mydata, method = "rf") ## mice with random forests for every variable
    82 / 92

  124. An ecological data set
    Glopnet data: 2494 species described by 6 quantitative variables
    • LMA (leaf mass per area)
    • LL (leaf lifespan)
    • Amass (photosynthetic assimilation)
    • Nmass (leaf nitrogen)
    • Pmass (leaf phosphorus)
    • Rmass (dark respiration rate)
    and 1 categorical variable: the biome
    Wright IJ, et al. (2004). The worldwide leaf economics spectrum.
    Nature, 428:821.
    www.nature.com/nature/journal/v428/n6985/extref/nature02403-s2.xls
    83 / 92

  125. An ecological data set
    > sum(is.na(don))/(nrow(don)*ncol(don)) # 53% of missing values
    [1] 0.5338145
    > dim(na.omit(don)) ## Delete species with missing values
    [1] 72 6 ## only 72 remaining species!
    > library(VIM)
    > aggr(don,numbers=TRUE,sortVar=TRUE)
    [Figure (VIM aggr): left panel "Proportion of missings" per variable (axis 0.0 to 0.8), variables in the order Rmass, LL, Pmass, Amass, Nmass, LMA; right panel "Combinations" of missing/observed patterns, with pattern proportions ranging from 0.2326 down to 0.0004]
    84 / 92
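The two panels of the `aggr` plot summarise the share of missing cells per variable and how often each missing/observed combination occurs. A base-R sketch of the same summaries is given below. It uses a small hypothetical data frame that reuses three of the Glopnet column names; the values are invented for illustration.

```r
# Hypothetical toy data frame with NAs (column names borrowed from the slide)
don <- data.frame(
  LMA   = c(1.2, NA, 2.1, 1.8, NA),
  Nmass = c(NA, 0.9, 1.1, NA, 1.3),
  Rmass = c(NA, NA, 0.4, NA, NA)
)

# Overall proportion of missing cells
overall <- mean(is.na(don))

# Proportion of missing values per variable, most incomplete first
per_var <- sort(colMeans(is.na(don)), decreasing = TRUE)

# Frequency of each missing/observed combination ("m" = missing, "o" = observed)
patterns <- apply(is.na(don), 1,
                  function(r) paste(ifelse(r, "m", "o"), collapse = ""))
pattern_freq <- sort(table(patterns) / nrow(don), decreasing = TRUE)

# Number of rows kept by na.omit(): only fully observed rows survive
complete_rows <- sum(complete.cases(don))
```

On the real table, `overall` should match the 0.534 printed above and `complete_rows` the 72 species kept by `na.omit`.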

  126. An ecological data set
    [Figure: MCA graph of the categories; Dim 1 (33.67%), Dim 2 (21.07%); categories LL_m, LL_o, LMA_m, LMA_o, Nmass_m, Nmass_o, Pmass_m, Pmass_o, Amass_m, Amass_o, Rmass_m, Rmass_o (suffix _m = missing, _o = observed)]
    > mis.ind <- matrix("o",nrow=nrow(don),ncol=ncol(don))
    > mis.ind[is.na(don)] <- "m"
    > dimnames(mis.ind) <- dimnames(don)
    > library(FactoMineR)
    > resMCA <- MCA(mis.ind)
    > plot(resMCA,invis="ind",title="MCA graph of the categories")
    85 / 92
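A much cruder base-R complement to this MCA is to count how often two variables are missing together. This is a different technique, not what FactoMineR computes: the cross-product of the 0/1 missingness indicators gives joint missing counts, and dividing each row by its diagonal gives the share of rows missing on one variable that are also missing on another. High shares point to the kind of association the MCA displays as `_m` categories lying close together. The toy data are hypothetical.

```r
# Hypothetical toy data frame with NAs
don <- data.frame(
  LL    = c(NA, NA, 3.0, NA, 1.5, 2.2),
  LMA   = c(1.2, 0.8, 2.1, 1.8, NA, 1.1),
  Rmass = c(NA, NA, 0.4, NA, 0.7, NA)
)

miss <- is.na(don) * 1          # 0/1 indicator matrix (1 = missing)
co_missing <- crossprod(miss)   # [j, k] = number of rows where j and k are both missing

# Share of the rows missing on j (row) that are also missing on k (column).
# A variable with no missing value would give NaN here (division by 0).
cond_share <- sweep(co_missing, 1, diag(co_missing), "/")
```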

  127. An ecological data set
    What about mean imputation?
    [Figure: Individuals factor map (PCA) of the mean-imputed data; Dim 1 (44.79%), Dim 2 (23.50%); legend: alpine, boreal, desert, grass/m, temp_for, temp_rf, trop_for, trop_rf, tundra, wland]
    86 / 92
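Mean imputation places every imputed value at the centre of its variable. This shrinks variances and weakens the relationships between variables, which is one plausible reason a PCA of the mean-imputed table can be misleading. A toy example in base R (hypothetical data) shows both effects:

```r
# Hypothetical toy data: y is almost linear in x and has three missing values
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 3.9, NA, 8.2, NA, 11.8, NA, 16.1)

# Mean imputation: replace each NA by the mean of the observed values
y_mean <- ifelse(is.na(y), mean(y, na.rm = TRUE), y)

var_obs <- var(y, na.rm = TRUE)              # variance of the observed values
var_imp <- var(y_mean)                       # variance after mean imputation
cor_obs <- cor(x, y, use = "complete.obs")   # correlation on complete pairs
cor_imp <- cor(x, y_mean)                    # correlation after mean imputation
```

The imputed cells add nothing to the sum of squares but increase the sample size, so here `var_imp` equals `var_obs * 4 / 7`. The imputed points also sit off the x-y trend, which pulls `cor_imp` well below `cor_obs`.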

  128. An ecological data set
    [Figure, left: Individuals factor map (PCA) after PCA imputation; Dim 1 (91.18%), Dim 2 (4.97%); legend: alpine, boreal, desert, grass/m, temp_for, temp_rf, trop_for, trop_rf, tundra, wland]
    [Figure, middle: Individuals factor map (PCA); Dim 1 (91.18%), Dim 2 (4.97%); biome categories labelled]
    [Figure, right: Variables factor map (PCA); Dim 1 (91.18%), Dim 2 (4.97%); variables LL, LMA, Nmass, Pmass, Amass, Rmass]
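    > library(missMDA)     # estim_ncpPCA, imputePCA, MIPCA
    > library(FactoMineR)  # PCA() and its plot method
    > nb <- estim_ncpPCA(don, method.cv = "Kfold", nbsim = 100)  # cross-validated choice of ncp (nb$ncp)
    > res.comp <- imputePCA(don, ncp = 2)  # single imputation with 2 dimensions
    > imp <- cbind.data.frame(res.comp$completeObs, tab.init[, 1:4])  # tab.init: original table, not defined on this slide
    > res.pca <- PCA(imp, quanti.sup = 1, quali.sup = 12)
    > plot(res.pca, habillage = 12, label = "quali"); plot(res.pca, choix = "var")
    > res.pca$ind$coord  # scores (principal components)
    > res.MIPCA <- MIPCA(don, ncp = 2)  # multiple imputation with PCA
    > plot(res.MIPCA, choice = "ind.supp"); plot(res.MIPCA, choice = "var")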
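    As a rough picture of what the imputation step does, here is a minimal base-R sketch of imputation by iterative PCA in its plain, unregularized form: missing cells start at the column means, then a rank-ncp PCA fit and the replacement of the missing cells by their fitted values alternate until the imputed values stop changing. The helper name `iterative_pca_impute` and its tolerance are assumptions for illustration only; `imputePCA` uses a regularized version by default and may scale the variables, so its output will differ.

```r
# Sketch of imputation by iterative PCA (unregularized EM-PCA scheme).
# iterative_pca_impute is a hypothetical helper, not missMDA's imputePCA.
iterative_pca_impute <- function(X, ncp = 2, maxit = 1000, tol = 1e-9) {
  X <- as.matrix(X)
  miss <- is.na(X)
  Xhat <- X
  # initial step: mean imputation, column by column
  Xhat[miss] <- colMeans(X, na.rm = TRUE)[col(X)[miss]]
  for (it in seq_len(maxit)) {
    mu <- colMeans(Xhat)
    # PCA of the centred completed table via a truncated SVD
    s <- svd(sweep(Xhat, 2, mu), nu = ncp, nv = ncp)
    # rank-ncp reconstruction U D V^T, then add the column means back
    fit <- s$u %*% (s$d[seq_len(ncp)] * t(s$v))
    fit <- sweep(fit, 2, mu, "+")
    delta <- sum((fit[miss] - Xhat[miss])^2)
    # only the missing cells are updated; observed values never change
    Xhat[miss] <- fit[miss]
    if (delta < tol) break
  }
  list(completeObs = Xhat, iterations = it)
}
```

The observed entries are kept fixed throughout, so the algorithm only "fills in" the table; choosing `ncp` is exactly the role of the cross-validation step above.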
    87 / 92

  129.
    Outline
    1 Introduction
    2 Point estimates of the PCA axes and components
    3 Uncertainty
    4 MCA/MFA
    5 Single imputation for mixed variables
    6 Multiple imputation
    7 Practice
    8 Appendix
    88 / 92

  130.
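    Expectation-Maximization (Dempster et al., 1977)
    The estimation process must be modified (not always easy!)
    Rationale: obtain the ML estimates from the observed values (maximize $L_{obs}$)
    by maximizing $L_{comp}$, the likelihood of the complete data $X = (X_{obs}, X_{miss})$.
    Augmenting the data simplifies the problem.
    E step (conditional expectation):
    $Q(\theta, \theta^{\ell}) = \int \ln f(X \mid \theta)\, f(X_{miss} \mid X_{obs}, \theta^{\ell})\, dX_{miss}$
    M step (maximization):
    $\theta^{\ell+1} = \arg\max_{\theta} Q(\theta, \theta^{\ell})$
    Result: when $\theta^{\ell+1}$ maximizes $Q(\theta, \theta^{\ell})$, then $L(X_{obs}, \theta^{\ell+1}) \ge L(X_{obs}, \theta^{\ell})$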
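    For the Gaussian case $x_{i.} \sim \mathcal{N}(\mu, \Sigma)$, the E step amounts to replacing each missing block of a row by its conditional mean given the observed entries of that row and accumulating the conditional covariance; the M step then recomputes the mean and covariance from these expected sufficient statistics. The base-R sketch below assumes this model; `em_mvn`, its starting values and its stopping rule are illustrative choices, not the algorithm implemented in the norm package.

```r
# Minimal EM sketch for the mean and covariance of a multivariate normal
# with NA entries. em_mvn is a hypothetical helper, not from norm.
em_mvn <- function(X, maxit = 500, tol = 1e-8) {
  X <- as.matrix(X)
  n <- nrow(X)
  p <- ncol(X)
  mu <- colMeans(X, na.rm = TRUE)                          # starting values
  Sigma <- diag(apply(X, 2, var, na.rm = TRUE), nrow = p)
  for (it in seq_len(maxit)) {
    Xhat <- X                 # E step: conditional means of the missing entries
    C <- matrix(0, p, p)      # E step: summed conditional covariances
    for (i in seq_len(n)) {
      m <- is.na(X[i, ])
      if (!any(m)) next
      o <- !m
      if (!any(o)) {          # fully missing row: use the current marginal moments
        Xhat[i, ] <- mu
        C <- C + Sigma
        next
      }
      # regression of the missing block on the observed block
      B <- Sigma[m, o, drop = FALSE] %*% solve(Sigma[o, o, drop = FALSE])
      Xhat[i, m] <- mu[m] + B %*% (X[i, o] - mu[o])
      C[m, m] <- C[m, m] + Sigma[m, m, drop = FALSE] - B %*% Sigma[o, m, drop = FALSE]
    }
    # M step: complete-data ML estimates from the expected sufficient statistics
    mu_new <- colMeans(Xhat)
    D <- sweep(Xhat, 2, mu_new)
    Sigma_new <- (crossprod(D) + C) / n
    delta <- max(abs(mu_new - mu), abs(Sigma_new - Sigma))
    mu <- mu_new
    Sigma <- Sigma_new
    if (delta < tol) break
  }
  list(mu = mu, Sigma = Sigma, iterations = it)
}
```

Without missing values the loop stops after two iterations at the sample mean and the ML (divide-by-n) covariance, which is a convenient sanity check.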
    89 / 92

  133.
    Maximum likelihood approach
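    Hypothesis: $x_{i.} \sim \mathcal{N}(\mu, \Sigma)$
    ⇒ Point estimates with EM:
    > library(norm)
    > pre <- prelim.norm(as.matrix(don))  # sort rows by missingness pattern
    > thetahat <- em.norm(pre)            # EM estimate as a packed parameter vector
    > getparam.norm(pre, thetahat)        # unpack into mu and sigma
    ⇒ Variances:
    • Supplemented EM (Meng and Rubin, 1991)
    • Bootstrap approach:
    • Bootstrap rows: $X_1, \dots, X_B$
    • EM algorithm: $(\hat{\mu}_1, \hat{\Sigma}_1), \dots, (\hat{\mu}_B, \hat{\Sigma}_B)$
    Issue: a specific method has to be developed for each statistical method.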
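    A sketch of the bootstrap approach, assuming the Gaussian model of this slide: rows are resampled with replacement, EM is rerun on each bootstrap table, and the spread of the B estimates of $\mu$ gives standard errors. To run on its own, the block carries a compact base-R EM (`em_mvn`, hypothetical); `em.norm` could play the same role. It assumes every variable keeps at least two observed values in each bootstrap sample.

```r
# Bootstrap sketch for the variability of EM estimates under x_i ~ N(mu, Sigma).
# em_mvn and boot_em are hypothetical helpers, not functions of the norm package.
em_mvn <- function(X, maxit = 500, tol = 1e-8) {
  n <- nrow(X)
  p <- ncol(X)
  mu <- colMeans(X, na.rm = TRUE)
  Sigma <- diag(apply(X, 2, var, na.rm = TRUE), nrow = p)
  for (it in seq_len(maxit)) {
    Xhat <- X
    C <- matrix(0, p, p)
    for (i in which(rowSums(is.na(X)) > 0)) {     # E step on incomplete rows only
      m <- is.na(X[i, ])
      o <- !m
      if (!any(o)) { Xhat[i, ] <- mu; C <- C + Sigma; next }
      B <- Sigma[m, o, drop = FALSE] %*% solve(Sigma[o, o, drop = FALSE])
      Xhat[i, m] <- mu[m] + B %*% (X[i, o] - mu[o])
      C[m, m] <- C[m, m] + Sigma[m, m, drop = FALSE] - B %*% Sigma[o, m, drop = FALSE]
    }
    mu_new <- colMeans(Xhat)                       # M step
    Sigma_new <- (crossprod(sweep(Xhat, 2, mu_new)) + C) / n
    done <- max(abs(mu_new - mu), abs(Sigma_new - Sigma)) < tol
    mu <- mu_new
    Sigma <- Sigma_new
    if (done) break
  }
  list(mu = mu, Sigma = Sigma)
}

# Resample rows B times, rerun EM on each bootstrap table and return the
# bootstrap estimates of mu (one row per replicate).
boot_em <- function(X, B = 200) {
  X <- as.matrix(X)
  t(replicate(B, {
    Xb <- X[sample(nrow(X), replace = TRUE), , drop = FALSE]
    em_mvn(Xb)$mu
  }))
}
```

For instance, `apply(boot_em(don, B = 200), 2, sd)` would give a bootstrap standard error for each mean; the stored $\hat{\Sigma}_b$ could be collected the same way.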
    90 / 92

  134.
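    MI using the loglinear model
    • Hypothesis: $X = (x_{ijk})_{i,j,k}$, $X \mid \theta \sim \mathcal{M}(n, \theta)$ where:
    $\log(\theta_{ijk}) = \lambda_0 + \lambda^{A}_{i} + \lambda^{B}_{j} + \lambda^{C}_{k} + \lambda^{AB}_{ij} + \lambda^{AC}_{ik} + \lambda^{BC}_{jk} + \lambda^{ABC}_{ijk}$
    1 Variability of the parameters
    • prior on θ: $\theta \mid \theta \in \Theta \sim \mathcal{D}(\alpha)$
    • posterior: $\theta \mid x, \theta \in \Theta \sim \mathcal{D}(\alpha')$
    • Data Augmentation (M.A. Tanner, W.H. Wong, 1987)
    2 Imputation according to the loglinear model using the set of M parameters
    • Implemented: R package cat (J.L. Schafer)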
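    Because the model on this slide keeps every interaction up to $\lambda^{ABC}$, it is saturated: $\Theta$ is the whole simplex and the Dirichlet prior is conjugate, with $\alpha' = \alpha$ plus the cell counts. Data augmentation then alternates an I step (impute the missing values given θ) and a P step (draw θ from $\mathcal{D}(\alpha')$). The base-R sketch below keeps M spaced-out completed tables as the M imputations; the helper names, the flat prior α = 1, and the burn-in and thinning values are assumptions, not the cat package's interface. Non-saturated loglinear models need a different P step.

```r
# Data augmentation sketch for multiple imputation of categorical data under the
# saturated multinomial model with a Dirichlet(alpha) prior on the cell probabilities.
# rdirichlet1 and da_impute_cat are hypothetical helpers, not functions of cat.
rdirichlet1 <- function(a) {
  g <- rgamma(length(a), shape = a)
  g / sum(g)
}

da_impute_cat <- function(X, n_levels, M = 5, burn = 200, thin = 50, alpha = 1) {
  # X: integer matrix, column k coded 1..n_levels[k], NA = missing
  cells <- as.matrix(expand.grid(lapply(n_levels, seq_len)))  # all K cells
  K <- nrow(cells)
  mult <- c(1, cumprod(n_levels)[-length(n_levels)])          # cell index weights
  incomplete <- which(rowSums(is.na(X)) > 0)
  theta <- rep(1 / K, K)
  Xc <- X
  imputations <- list()
  for (it in seq_len(burn + M * thin)) {
    # I step: draw each incomplete row among the cells compatible with its observed values
    for (i in incomplete) {
      o <- !is.na(X[i, ])
      ok <- if (any(o)) colSums(t(cells[, o, drop = FALSE]) != X[i, o]) == 0 else rep(TRUE, K)
      Xc[i, ] <- cells[sample.int(K, 1, prob = theta * ok), ]
    }
    # P step: draw theta from its Dirichlet posterior D(alpha + completed cell counts)
    idx <- 1 + as.vector((Xc - 1) %*% mult)
    theta <- rdirichlet1(alpha + tabulate(idx, nbins = K))
    # after burn-in, keep one completed table every `thin` iterations
    if (it > burn && (it - burn) %% thin == 0) {
      imputations[[length(imputations) + 1]] <- Xc
    }
  }
  imputations
}
```

Thinning the chain is one simple way to make the M completed tables close to independent draws; each table is then analysed separately and the results combined.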
    91 / 92

  135.
    MI using a DPMPM model (Si and Reiter, 2013)
    • Hypothesis: $P(X = (x_1, \dots, x_K); \theta) = \sum_{\ell=1}^{L} \theta_\ell \prod_{k=1}^{K} \theta^{(\ell)}_{x_k}$
    1 Variability of the parameters:
    • a hierarchical prior on θ:
    $\alpha \sim \mathcal{G}(.25, .25)$, $\zeta_\ell \sim \mathcal{B}(1, \alpha)$, $\theta_\ell = \zeta_\ell \prod_{g<\ell} (1 - \zeta_g)$ for $\ell$ in $1, \dots, \infty$
    • posterior on θ: intractable
    → Gibbs sampler and Data Augmentation
    2 Imputation according to the mixture model using the set of M parameters
    • Implemented: R package mi (Gelman et al.)
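    The hierarchical prior can be simulated directly, which helps to see what the stick-breaking weights $\theta_\ell$ look like. The sketch below truncates the mixture at L classes with $\zeta_L = 1$ so that the weights sum to one (a common truncation of the stick-breaking prior, assumed here), reads $\mathcal{G}(.25, .25)$ as shape and rate, and draws each class's category probabilities from a uniform Dirichlet. It only evaluates the model probability $P(X = x; \theta)$; the Gibbs sampler for the posterior is not shown, and all function names are hypothetical.

```r
# Sketch of the DPMPM prior (truncated at L classes) and of P(X = x; theta).
# stick_breaking, draw_dpmpm_prior and dpmpm_prob are hypothetical helpers.
stick_breaking <- function(L, alpha) {
  zeta <- c(rbeta(L - 1, 1, alpha), 1)  # zeta_L = 1: last stick takes the remainder
  zeta * cumprod(c(1, 1 - zeta[-L]))    # theta_l = zeta_l * prod_{g<l} (1 - zeta_g)
}

draw_dpmpm_prior <- function(L, n_levels) {
  alpha <- rgamma(1, shape = 0.25, rate = 0.25)
  # phi[[k]][l, ]: category probabilities of variable k within class l
  phi <- lapply(n_levels, function(d) {
    g <- matrix(rgamma(L * d, shape = 1), L, d)
    g / rowSums(g)
  })
  list(alpha = alpha, weights = stick_breaking(L, alpha), phi = phi)
}

dpmpm_prob <- function(x, prior) {
  # sum over classes of weight * product over variables of P(x_k | class)
  lik <- prior$weights
  for (k in seq_along(x)) lik <- lik * prior$phi[[k]][, x[k]]
  sum(lik)
}
```

Small values of α concentrate the weight on the first few classes, which is how the prior lets the data choose the effective number of latent classes.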
    92 / 92