Multi-blocks methods

julie josse
October 30, 2015

Transcript

  1. Data Common Structure Groups Study Partial Analyses To go further Example
    Multiple Factor Analysis
    Julie Josse
    Applied Mathematics Department, Agrocampus Ouest
    Stat 300
    Stanford, July 2015

  2. Multiple Factor Analysis
    1 Data - Issues
    2 Common Structure
    3 Groups Study
    4 Partial Analyses
    5 Example

  3. Multi-blocks data set
    Groups of variables (MFA): the groups of variables are quantitative and/or qualitative
    Objectives:
    - study the link between the sets of variables
    - balance the influence of each group of variables
    - give the classical graphs but also specific graphs: groups of variables, partial representation
    Examples:
    • Genomics: DNA, protein
    • Sensory analysis: products - sensorial, physico-chemical
    • Comparison of coding (quantitative / qualitative)
    • Survey: individuals - questionnaire themes (student health: addictive consumption, psychological condition, sleep, id)
    • Economy: countries - economic indicators for each year
    • Biology: samples - omics data (brain tumors: CGH, transcriptome; mouse: transcriptome, hepatic fatty acid measurements)
    ⇒ Generalized Canonical Correlation, Procrustes, Statis, etc.
    ⇒ MFA (Escofier & Pagès, 1998)
    ⇒ Continuous / categorical / contingency sets of variables

  4. Example: gliomas brain tumors
    The data
    Gliomas: brain tumors, WHO classification
    - astrocytoma (A): 5
    - oligodendroglioma (O): 8
    - oligo-astrocytoma (OA): 6
    - glioblastoma (GBM): 24
    43 tumor samples (Bredel et al., 2005)
    • Transcriptional modification (RNA), microarrays: 489 variables
    • Damage to DNA (CGH arrays): 113 variables
    [Figure: '-omics' data table with the I tumors in rows and two groups of variables in columns (J1 and J2 variables)]
    The data, the expectations

  5. Objectives
    • Study the similarities between individuals with respect to all
    the variables
    • Study the linear relationships between variables
    ⇒ taking into account the structure on the data (balance the
    influence of each group)
    • Find the common structure with respect to all the groups -
    highlight the specificities of each group
    • Compare the typologies obtained from each group of variables
    (separate analyses)

  6. Principal component methods
    The core of principal component methods is PCA on particular
    matrices
    "Doing a data analysis, in good mathematics, is simply searching
    eigenvectors, all the science of it (the art) is just to find the right
    matrix to diagonalize"
    Benzécri
    MFA is a particular weighted PCA!

  7. Balancing the groups of variables
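    MFA is a weighted PCA:
    • compute the first eigenvalue $\lambda_1^j$ of each group of variables
    • perform a global PCA on the weighted data table:
    $$\left[ \frac{X_1}{\sqrt{\lambda_1^1}} ;\ \frac{X_2}{\sqrt{\lambda_1^2}} ;\ \dots ;\ \frac{X_J}{\sqrt{\lambda_1^J}} \right]$$
    ⇒ Same idea as in PCA when variables are standardized: variables are weighted to compute distances between individuals i and i′
    [Figure: two individuals i and i′ described by one group of 8 highly correlated variables and one group of 2 variables]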

  8. Balancing the groups of variables
          Transcriptome   Genome
    λ1    162             12
    λ2    35              10
    λ3    21              5
    This weighting ensures that:
    • Same weight for all the variables of one group: the structure of
    the group is preserved
    • For each group the variance of the main dimension of
    variability (first eigenvalue) is equal to 1
    • No group can generate by itself the first global dimension
    • A multidimensional group will contribute to the construction
    of more dimensions than a one-dimensional group
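
The weighting described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the FactoMineR implementation: the helper `mfa_weighted_pca` and the simulated groups `X1` and `X2` are hypothetical, all variables are continuous and standardized, and eigenvalues are scaled by 1/I.

```r
# Hypothetical helper (not a FactoMineR function): MFA of continuous groups
# computed as a PCA of the weighted, concatenated table.
mfa_weighted_pca <- function(groups) {
  # groups: list of numeric matrices with the same individuals in rows
  scaled <- lapply(groups, function(X) scale(X, center = TRUE, scale = TRUE))
  # First eigenvalue of the separate PCA of each group (1/I scaling assumed)
  lambda1 <- sapply(scaled, function(X) svd(X, nu = 0, nv = 0)$d[1]^2 / nrow(X))
  # Divide every variable of a group by sqrt(first eigenvalue of the group)
  weighted <- mapply(function(X, l) X / sqrt(l), scaled, lambda1,
                     SIMPLIFY = FALSE)
  global <- do.call(cbind, weighted)
  s <- svd(global)  # global PCA through the SVD of the weighted table
  list(lambda1 = lambda1,
       weighted = weighted,
       eig = s$d^2 / nrow(global),
       coord = s$u %*% diag(s$d, nrow = length(s$d)))
}

set.seed(1)
X1 <- matrix(rnorm(20 * 8), 20, 8)  # simulated group of 8 variables
X2 <- matrix(rnorm(20 * 2), 20, 2)  # simulated group of 2 variables
res <- mfa_weighted_pca(list(X1, X2))
res$eig[1]  # first global eigenvalue, between 1 and the number of groups
```

After the weighting, the first eigenvalue of every group equals 1, so the first global eigenvalue lies between 1 and J, which is why no single group can generate the first global dimension by itself.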

  9. Individuals and variables representations
    [Figure: Individual factor map, Dim 1 (20.99 %) vs Dim 2 (13.51 %): the 43 tumors labelled by sample name, with the WHO categories A, GBM, O and OA]
    [Figure: Correlation circle, Dim 1 (20.99 %) vs Dim 2 (13.51 %): the variables, labelled by gene name]
    ⇒ What can be added to interpret?

  10. Individuals and variables representations
    [Figure: Individual factor map, Dim 1 (20.99 %) vs Dim 2 (13.51 %): the 43 tumors labelled by sample name, with the WHO categories A, GBM, O and OA]
    Figure 4: Multi-way glioma data set: Characteristics of oligodendrogliomas are linked to modifica
    the genomic status of genes located on 1p and 19q positions.
    ⇒ Do the means of the tumors coordinates per stage on each
    dimension significantly differ from each other?

  11. Groups study
    ⇒ Synthetic comparison of the groups
    ⇒ Are the relative positions of individuals globally similar from one
    group to another? Are the partial clouds similar?
    ⇒ Do the groups bring the same information?

  12. Principal component in MFA
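    MFA = weighted PCA ⇒ the first principal component of MFA maximizes
    $$\sum_{j=1}^{J} \sum_{k \in K_j} \mathrm{cov}^2\left( \frac{x_{.k}}{\sqrt{\lambda_1^j}}, F_1 \right) = \sum_{j=1}^{J} L_g(F_1, K_j)$$
    $$L_g(F_1, K_j) = \left\langle \frac{W_{K_j}}{\lambda_1^j}, F_1 F_1^\top \right\rangle = \frac{1}{\lambda_1^j}\, \mathrm{trace}\left( W_{K_j} F_1 F_1^\top \right)$$
    ⇒ F1 is the variable most related to the groups in the Lg sense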

  13. Representation of the groups
    Group j has the coordinates $(L_g(F_1, K_j), L_g(F_2, K_j))$
    [Figure: Groups representation, Dim 1 (20.99 %) vs Dim 2 (13.51 %), both axes from 0 to 1: the groups CGH, expr and WHO]
    • Two groups are all the closer as they induce the same structure
    • The 1st dimension is common to all the groups
    • The 2nd dimension is mainly due to CGH
    $$0 \le L_g(F_1, K_j) = \frac{1}{\lambda_1^j} \underbrace{\sum_{k \in K_j} \mathrm{cov}^2(x_{.k}, F_1)}_{\le \lambda_1^j} \le 1$$
    ⇒ Could you predict the results of the PCA for each group?

  14. The RV coefficient
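    $X_j$ ($I \times K_j$) and $X_m$ ($I \times K_m$) are not directly comparable
    $W_j = X_j X_j^\top$ ($I \times I$) and $W_m = X_m X_m^\top$ ($I \times I$) can be compared
    Inner product matrices = relative position of the individuals
    Covariance between two groups:
    $$\langle W_j, W_m \rangle = \sum_{k \in K_j} \sum_{l \in K_m} \mathrm{cov}^2(x_{.k}, x_{.l})$$
    Correlation between two groups (Escoufier, 1973):
    $$RV(K_j, K_m) = \frac{\langle W_j, W_m \rangle}{\| W_j \| \, \| W_m \|}, \qquad 0 \le RV \le 1$$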
    RV = 0: variables of Kj are uncorrelated with variables of Km
    RV = 1: the two clouds of points are homothetic
    ⇒ Extension of the notion of correlation matrix
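
The RV coefficient defined above can be computed directly from the inner product matrices. The following base R sketch is only an illustration: `rv_coefficient` and the simulated tables are hypothetical names, and the variables are centered but not rescaled.

```r
# Hypothetical helper: RV coefficient between two groups of variables
# measured on the same individuals (rows).
rv_coefficient <- function(X, Y) {
  X <- scale(X, center = TRUE, scale = FALSE)
  Y <- scale(Y, center = TRUE, scale = FALSE)
  Wx <- tcrossprod(X)  # I x I inner product matrix X X'
  Wy <- tcrossprod(Y)
  # <Wx, Wy> = trace(Wx Wy) = sum of the elementwise products
  sum(Wx * Wy) / sqrt(sum(Wx * Wx) * sum(Wy * Wy))
}

set.seed(2)
X <- matrix(rnorm(30 * 4), 30, 4)
Y <- matrix(rnorm(30 * 3), 30, 3)
rv_xy <- rv_coefficient(X, Y)        # a value between 0 and 1
rv_xx <- rv_coefficient(X, 2.5 * X)  # homothetic clouds: RV equals 1
```

Working on the I × I matrices rather than on the variables is what makes two groups with different numbers of columns comparable.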

  15. Similarity between two groups
    Measure of similarity between groups Kj and Km:
    $$L_g(K_j, K_m) = \sum_{k \in K_j} \sum_{l \in K_m} \mathrm{cov}^2\left( \frac{x_{.k}}{\sqrt{\lambda_1^j}}, \frac{x_{.l}}{\sqrt{\lambda_1^m}} \right)$$
    Ramsay (1984): "Matrices may be similar or dissimilar in a great many ways"
    Canonical correlation (Hotelling, 1936), Mantel (1967), Procrustes (Gower, 1971), dCov (Székely et al., 2007), kernel-based HSIC (Gretton et al., 2005), etc.

  16. Numeric indicators
    > res.mfa$group$Lg
           CGH  expr   WHO
    CGH   2.51  0.60  0.46
    expr  0.60  1.10  0.36
    WHO   0.46  0.36  0.50
    > res.mfa$group$RV
           CGH  expr   WHO
    CGH   1.00  0.36  0.41
    expr  0.36  1.00  0.48
    WHO   0.41  0.48  1.00
    $$L_g(K_j, K_j) = \frac{\sum_{k=1}^{K_j} (\lambda_k^j)^2}{(\lambda_1^j)^2} = 1 + \sum_{k=2}^{K_j} \frac{(\lambda_k^j)^2}{(\lambda_1^j)^2}$$
    • CGH gives a richer description (larger Lg)
    • RV: a standardized Lg
    • CGH and expr are only weakly linked (RV = 0.36)
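
To make the link between Lg and RV concrete, here is a hedged base R sketch. The helper `lg_measure` and the simulated table `X` are hypothetical; the matrix `lg_slide` simply retypes the Lg values printed above, and RV is recovered by standardizing it as Lg(Kj, Km) / sqrt(Lg(Kj, Kj) Lg(Km, Km)).

```r
# Hypothetical helper: Lg measure between two groups, each group being
# weighted by the first eigenvalue of its inner product matrix (1/I scaling).
lg_measure <- function(X, Y) {
  X <- scale(X, center = TRUE, scale = FALSE)
  Y <- scale(Y, center = TRUE, scale = FALSE)
  n <- nrow(X)
  Wx <- tcrossprod(X) / n
  Wy <- tcrossprod(Y) / n
  lx <- eigen(Wx, symmetric = TRUE, only.values = TRUE)$values[1]
  ly <- eigen(Wy, symmetric = TRUE, only.values = TRUE)$values[1]
  sum((Wx / lx) * (Wy / ly))
}

set.seed(6)
X <- matrix(rnorm(25 * 5), 25, 5)
lg_xx <- lg_measure(X, X)  # at least 1, larger for a multidimensional group

# Lg values printed on the slide, and RV obtained as a standardized Lg
groups <- c("CGH", "expr", "WHO")
lg_slide <- matrix(c(2.51, 0.60, 0.46,
                     0.60, 1.10, 0.36,
                     0.46, 0.36, 0.50),
                   nrow = 3, byrow = TRUE, dimnames = list(groups, groups))
rv_from_lg <- lg_slide / sqrt(outer(diag(lg_slide), diag(lg_slide)))
```

Because the Lg values are rounded to two decimals, `rv_from_lg` agrees with the printed RV matrix only up to that rounding.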
    Contribution of each group to each component of the MFA
    > res.mfa$group$contrib
          Dim.1  Dim.2  Dim.3
    CGH    45.8   93.3   78.1
    expr   54.2    6.7   21.9
    • Similar contribution of the 2 groups to the first dimension
    • Second dimension almost entirely due to CGH (93.3 %)

  17. Partial analyses
    ⇒ Comparison of the groups through the individuals
    ⇒ Comparison of the typologies provided by each group in a
    common space
    ⇒ Are there individuals very particular with respect to one group?
    ⇒ Comparison of the separate PCA

  18. Projection of partial points
    [Figure: the data table is split into groups G1, G2 and G3; for each group, the table in which the columns of the other groups are set to 0 is projected onto the MFA individuals configuration (projection of group 1, group 2, group 3), giving the partial points i1, i2, i3 of individual i, whose mean point is i]
    $\mathbb{R}^K = \bigoplus_j \mathbb{R}^{K_j}$

  19. Partial points
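    [Figure: individuals described by two groups of variables, opinion (what you think) and attitude (what you do); partial points of individual i on the (F1, F2) map, labelled "behavioral conflict"]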

  20. Partial points
    [Figure: (F1, F2) maps of the tutorial participants with two partial points each: what you expected for the tutorial, and what you have learned during the tutorial]

  21. Partial points
    [Figure: (F1, F2) maps of the tutorial participants with two partial points each (what you expected for the tutorial, what you have learned during the tutorial), with the labels "Disappointed learner" and "Happy learner"]

  22. Representation of the partial points
    [Figure: Individual factor map, Dim 1 (20.99 %) vs Dim 2 (13.51 %): the tumors with their CGH and expr partial points, and the WHO categories A, GBM, O and OA]
    [Figure: Individual factor map, Dim 1 (20.99 %) vs Dim 2 (13.51 %): the WHO categories A, GBM, O and OA with their CGH and expr partial points]
    • an individual is at the barycentre of its partial points
    • an individual is all the more "homogeneous" as its superposed representations are close

  23. Identify particular individuals
    [Figure: Individual factor map, Dim 1 (20.99 %) vs Dim 2 (13.51 %): the 43 tumors labelled by sample name, the WHO categories A, GBM, O and OA, and CGH and expr partial points]

  24. Numeric indicators
    $$\sum_{i=1}^{I} \sum_{j=1}^{J} (F_{i^j q})^2 = \sum_{i=1}^{I} \sum_{j=1}^{J} (F_{iq})^2 + \sum_{i=1}^{I} \sum_{j=1}^{J} (F_{i^j q} - F_{iq})^2$$
    Total inertia = Between indiv. inertia + Within indiv. inertia
    > res.mfa$inertia.ratio
    Dim.1  Dim.2  Dim.3  Dim.4  Dim.5
     0.84   0.56   0.44   0.59   0.43
    • For the first dimension, the partial points of each individual have close coordinates (between / total inertia ratio of 0.84, close to 1)
    • The within inertia can be decomposed by individual: res.mfa$ind$within.inertia
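
The decomposition above can be checked numerically on one dimension. This base R sketch is an illustration only: `inertia_ratio` is a hypothetical helper, the partial coordinates are simulated, and each individual is taken to be the mean of its partial points.

```r
# Hypothetical helper: between / total inertia ratio on one dimension.
# `partial` is an I x J matrix; row i holds the coordinates of the J
# partial points of individual i on that dimension.
inertia_ratio <- function(partial) {
  J <- ncol(partial)
  mean_point <- rowMeans(partial)          # individual = barycentre
  total <- sum(partial^2)
  between <- J * sum(mean_point^2)
  within <- sum((partial - mean_point)^2)  # recycled column by column
  stopifnot(isTRUE(all.equal(total, between + within)))
  between / total
}

set.seed(3)
common <- rnorm(43)  # simulated structure shared by the two groups
partial <- cbind(CGH = common + rnorm(43, sd = 0.3),
                 expr = common + rnorm(43, sd = 0.3))
ratio <- inertia_ratio(partial)
```

A ratio close to 1 means that the partial points of each individual nearly coincide on that dimension; identical partial points give exactly 1.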

  25. Representation of the partial components
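    Do the separate analyses give dimensions similar to those of MFA?
    [Figure: a separate PCA of each group gives a table of principal components with the I individuals in rows and Q components in columns]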

  26. Representation of the partial components
    [Figure: Partial axes, Dim 1 (20.99 %) vs Dim 2 (13.51 %), both axes from −1 to 1: the first three dimensions of the separate analysis of each group (Dim1.CGH to Dim3.CGH, Dim1.expr to Dim3.expr, Dim1.WHO to Dim3.WHO)]
    • The first dimension of each group is well projected
    • CGH has the same dimensions as MFA

  27. Use of biological knowledge
    Genes can be grouped by gene ontology (GO) biological process
    • GO:0006928 (cell motility): ANXA1, CALD1, EGFR, ENPP2, FN1, FPRL2, LSP1, MSN, PDPN, PLAUR, PRSS3, SAA2, SPINT2, TNFRSF12A, VEGF, WASF1, YARS
    • GO:0009966 (regulation of signal transduction): CASP1, EDG2, F2R, HCLS1, HMOX1, IGFBP3, IQSEC1, LYN, MALT1, TCF7L1, TNFAIP3, TRIO, VEGF, YWHAG, YWHAH
    • GO:0052276 (chromosome organisation and biogenesis): CBX6, NUSAP1, PCOLN3, PTTG1, SUV39H1, TCF7L1, TSPYL1

  28. Use of biological knowledge
    • Biological processes are considered as supplementary groups of variables
    [Figure: modular approach; the '-omics' data table (I tumors in rows, two groups of J1 and J2 variables) is completed by the modules M1, M2, M3, ...]
    ⇒ Integration of the modules as groups of supplementary variables

  29. Use of biological knowledge
    [Figure: Groups representation, Dim 1 (20.99 %) vs Dim 2 (13.51 %), both axes from 0 to 1: the groups CGH, expr and WHO, and one point per biological process (supplementary groups)]
    Many biological processes induce the same structure on the individuals as MFA

  30. To go further
    • Mixed data: MFA with 1 group = 1 variable
      - continuous variables: PCA is recovered
      - categorical variables: MCA is recovered
      - mixed variables: FAMD
    • MFA used for methodological purposes:
      - comparison of codings (continuous or categorical)
      - comparison of preprocessings (standardized PCA and unstandardized PCA)
      - comparison of results from different analyses
    • Hierarchical Multiple Factor Analysis: takes into account a hierarchy on the variables, which are grouped and subgrouped (as in questionnaires structured in topics and subtopics)

  31. Clustering: MFA as a preprocessing
    [Figure: two individuals i and i′ described by the groups X1 and X2]
    MFA balances the influence of the groups when computing distances between individuals
    $$d^2(i, i') = \sum_{j=1}^{J} \frac{1}{\lambda_1^j} \sum_{k=1}^{K_j} (x_{ik} - x_{i'k})^2$$
    AHC or k-means on the first principal components (F.1, ..., F.Q) obtained from MFA makes it possible to
    • take into account the group structure in the clustering
    • make the clustering more robust by deleting the last dimensions
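
As a rough sketch of this preprocessing idea, the base R code below runs a Ward hierarchical clustering on the first Q component scores only. Everything here is an assumption made for illustration: the helper `cluster_on_components`, the simulated scores in `coord` (standing in for MFA coordinates), and the choice of Ward's criterion with Q = 5 and k = 3.

```r
# Hypothetical helper: Ward hierarchical clustering on the first Q
# principal components, followed by a cut of the tree into k clusters.
cluster_on_components <- function(coord, Q = 5, k = 3) {
  scores <- coord[, seq_len(min(Q, ncol(coord))), drop = FALSE]
  tree <- hclust(dist(scores), method = "ward.D2")
  list(tree = tree, cluster = cutree(tree, k = k))
}

set.seed(4)
# Simulated component scores: 3 groups of 10 individuals, 8 components,
# the last 6 components being low-variance noise
centers <- rbind(c(-6, 0), c(6, 0), c(0, 10))
coord <- cbind(centers[rep(1:3, each = 10), ] + rnorm(60, sd = 0.5),
               matrix(rnorm(30 * 6, sd = 0.2), 30, 6))
res <- cluster_on_components(coord, Q = 5, k = 3)
table(res$cluster)
```

Dropping the components beyond Q removes dimensions that mostly carry noise before the distances between individuals are computed.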

  32. Clustering
    AHC on the first 5 principal components from MFA
    [Figure: Hierarchical clustering; cluster dendrogram of the 43 tumors (height from 0 to 1) with a bar plot of the inertia gains]
    Individuals are sorted according to their coordinate on F.1

  33. Partition from the tree
    An empirical number of clusters is suggested
    [Figure: Hierarchical clustering; cluster dendrogram of the 43 tumors (height from 0 to 1) with a bar plot of the inertia gains]

  34. Partition on the principal component map
    [Figure: Factor map, Dim 1 (20.99%) vs Dim 2 (13.51%): the 43 tumors labelled by sample name and marked by cluster (cluster 1, cluster 2, cluster 3)]
    Continuous view (principal components) and discontinuous view (clusters)

  35. Cluster description by variables
    $$\text{v.test} = \frac{\bar{x}_\ell - \bar{x}}{\sqrt{\dfrac{s^2}{I_\ell} \left( \dfrac{I - I_\ell}{I - 1} \right)}}$$
    H0: random sampling of $I_\ell$ values from I
    with $\bar{x}_\ell$ the mean of variable x in cluster $\ell$, $\bar{x}$ (s) the mean (standard deviation) of variable x in the data set, and $I_\ell$ the cardinality of cluster $\ell$
    $desc.var$quanti$`1`
               v.test  Mean in category  Overall mean  sd in category  Overall sd  p.value
    TMEM49      4.488            -0.430        -1.424           0.722       1.277    0.000
    TNFRSF12A   4.433            -0.794        -1.838           0.789       1.357    0.000
    LGALS3      4.369            -0.222        -1.216           0.861       1.312    0.000
    S100A11     4.300            -0.737        -1.500           0.525       1.024    0.000
    BGN         4.273             2.105         1.106           0.697       1.348    0.000
    IFI30       4.264             0.987         0.026           0.979       1.300    0.000
    ....
    C9orf48    -4.411            -0.686        -0.037           0.540       0.848    0.000
    PSD3       -4.594            -1.684        -1.024           0.419       0.829    0.000
    AA398420   -4.635             0.324         1.134           0.635       1.007    0.000
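
The v.test formula can be checked against the first row of the table with a few lines of base R. This is a sketch under assumptions: `v_test` is a hypothetical helper, the cluster size of 19 is not printed on this slide (it is inferred from the Mod/Cla value of 94.74 % = 18/19 reported for cluster 1 later in the deck), and the two-sided normal p-value is an assumed convention.

```r
# Hypothetical helper implementing the v.test formula above.
v_test <- function(mean_cluster, mean_overall, sd_overall, n, n_cluster) {
  se <- sqrt(sd_overall^2 / n_cluster * (n - n_cluster) / (n - 1))
  (mean_cluster - mean_overall) / se
}

# TMEM49 row of the table: 43 tumors overall, 19 assumed in cluster 1
v <- v_test(mean_cluster = -0.430, mean_overall = -1.424,
            sd_overall = 1.277, n = 43, n_cluster = 19)
p <- 2 * pnorm(-abs(v))  # two-sided p-value, normal approximation (assumed)
round(v, 3)
```

With these inputs the statistic comes out close to the 4.488 printed for TMEM49, which supports the reading of the formula given above.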

  36. Cluster description by observations
    • paragons: the observations closest to the centroid of their cluster
    $$\min_{i \in \ell} d(x_{i.}, C_\ell) \quad \text{with } C_\ell \text{ the centroid of cluster } \ell$$
    • specific observations: the observations furthest from the centroids of the other clusters (the observations are sorted by decreasing distance to the closest of the other centroids)
    $$\max_{i \in \ell} \min_{\ell' \neq \ell} d(x_{i.}, C_{\ell'})$$
    desc.ind$para
    cluster: 1
        GBM11     GBM28      GBM5     GBM25     GBM31
    0.6649847 0.7001998 0.7973604 0.8869271 0.9674042
    ---------------------------------------------------------------
    desc.ind$dist
    cluster: 1
       GBM30      GS2    GBM21    GBM22    GBM27
    3.227968 3.096048 3.031256 2.904327 2.778950
    ---------------------------------------------------------------
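
The two criteria above can be sketched in base R as follows. The helper `describe_cluster`, its argument names and the toy data are hypothetical, and Euclidean distances to the cluster centroids are assumed; this is not the code that produced the output shown on the slide.

```r
# Hypothetical helper: paragons and most specific individuals of a cluster.
describe_cluster <- function(X, cluster, target, n_show = 5) {
  centers <- rowsum(X, cluster) / as.vector(table(cluster))
  # d[i, g]: Euclidean distance from individual i to the centroid of cluster g
  d <- sapply(seq_len(nrow(centers)), function(g) {
    sqrt(rowSums(sweep(X, 2, centers[g, ])^2))
  })
  own <- match(as.character(target), rownames(centers))
  inside <- cluster == target
  to_own <- d[inside, own]
  to_nearest_other <- apply(d[inside, -own, drop = FALSE], 1, min)
  names(to_own) <- names(to_nearest_other) <- rownames(X)[inside]
  list(para = head(sort(to_own), n_show),
       dist = head(sort(to_nearest_other, decreasing = TRUE), n_show))
}

# Toy data: two clusters of three individuals on a line
X <- cbind(x = c(0, 1, 2, 10, 11, 12), y = 0)
rownames(X) <- letters[1:6]
cluster <- c(1, 1, 1, 2, 2, 2)
res <- describe_cluster(X, cluster, target = 1)
res$para  # "b" comes first: it sits on the centroid of cluster 1
res$dist  # "a" comes first: it is the furthest from the other centroid
```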

  37. Cluster description
    • by the principal components (observation coordinates): same description as for continuous variables
    $desc.axes$quanti$`1`
           v.test  Mean in category  Overall mean  sd in category  Overall sd  p.value
    Dim.2   2.919             0.511             0           0.465       1.010    0.004
    Dim.1  -4.458            -0.974             0           0.560       1.259    0.000
    • by categorical variables: chi-square and hypergeometric tests
    $test.chi2
               p.value  df
    type  8.433474e-06   6
    ⇒ Active and supplementary elements are used
    ⇒ Only significant results are presented

  38. Cluster description
    $`1`
              Cla/Mod   Mod/Cla    Global       p.value     v.test
    type=GBM     75.0  94.73684  55.81395  3.300966e-06   4.651145
    type=OA       0.0   0.00000  13.95349  2.207775e-02  -2.289028
    type=O        0.0   0.00000  18.60465  5.071916e-03  -2.802430
    $`2`
              Cla/Mod   Mod/Cla    Global       p.value     v.test
    type=A       60.0  37.50000  11.62791  3.982140e-02   2.055597
    $`3`
              Cla/Mod   Mod/Cla    Global       p.value     v.test
    type=O      100.0  50.00000  18.60465  8.875341e-05   3.919444
    type=GBM     12.5  18.75000  55.81395  2.319983e-04  -3.681354
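    How the three percentages of the type=GBM row of cluster 1 are built can be
    sketched as follows. The counts are an assumption: they are inferred from the
    printed percentages (24/43, 18/19, 18/24) and are not given on the slide. The
    hypergeometric tail is shown one-sided; the exact convention behind the
    printed p.value is not reproduced here.

    ```r
    # Counts inferred from the printed percentages (assumption)
    n      <- 43   # observations in the data set
    n_mod  <- 24   # observations with type = GBM
    n_cla  <- 19   # observations in cluster 1
    n_both <- 18   # observations with type = GBM inside cluster 1

    cla_mod <- 100 * n_both / n_mod   # share of the category found in the cluster
    mod_cla <- 100 * n_both / n_cla   # share of the cluster taken by the category
    global  <- 100 * n_mod / n        # share of the category in the data set

    # One-sided hypergeometric tail: probability of at least n_both GBM
    # among n_cla observations drawn without replacement from the n
    p_upper <- phyper(n_both - 1, n_mod, n - n_mod, n_cla, lower.tail = FALSE)
    ```

    A category is over-represented in a cluster when Mod/Cla is well above Global,
    which is what a positive v.test reports.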

  39. Complementarity between hierarchical clustering and partitioning
    • Partitioning after AHC: the k-means algorithm is initialized
    from the centroids of the partition obtained from the tree
      • consolidates the partition
      • loses the hierarchy
    • AHC with many individuals is time-consuming
    ⇒ partitioning before AHC
      • compute k-means with approximately 100 clusters
      • run AHC on the weighted centroids obtained from the k-means
    ⇒ the top of the tree is approximately the same
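    Partitioning before AHC can be sketched in base R as below. This is a minimal
    sketch on simulated data, assuming Ward's criterion and cluster sizes as
    weights (passed through the members argument of hclust); the object names
    are hypothetical.

    ```r
    set.seed(42)
    # Toy data (assumption): 2000 individuals, 4 variables, two shifted groups
    X <- matrix(rnorm(2000 * 4), ncol = 4)
    X[1:1000, 1] <- X[1:1000, 1] + 6

    # Step 1: coarse partition with about 100 clusters
    km <- kmeans(X, centers = 100, iter.max = 50)

    # Step 2: Ward AHC on the centroids, each weighted by its cluster size
    tree <- hclust(dist(km$centers), method = "ward.D2", members = km$size)

    # Cut the top of the tree and map the result back to the individuals
    centroid_cluster <- cutree(tree, k = 2)
    individual_cluster <- centroid_cluster[km$cluster]
    ```

    Only the 100 centroids enter the hierarchical step, so the distance matrix
    stays small whatever the number of individuals.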

  40. Other methods: ade4
    Two-table analysis: available methods
    [Diagram: two tables, individuals × variables and individuals × variables]
    Function name  Analysis name
    between        Between-class analysis
    within         Within-class analysis
    discrimin      Discriminant analysis
    coinertia      Coinertia analysis
    cca            Canonical correspondence analysis
    pcaiv          PCA on Instrumental Variables
    pcaivortho     Orthogonal PCAIV
    procuste       Procrustes analysis
    niche          Niche (OMI) analysis
    Stéphane Dray (Univ. Lyon 1), CARME 2011, Rennes, 22 / 31

  41. Other methods: ade4
    Other functions: K-table
    [Diagram: K tables, individuals × variables]
    Function name  Analysis name
    sepan          K-table separate analyses
    pta            Partial triadic analysis
    foucart        Foucart analysis
    statis         STATIS analysis
    mfa            Multiple factor analysis
    mcoa           Multiple coinertia analysis
    statico        2 K-table analysis
    Stéphane Dray (Univ. Lyon 1), CARME 2011, Rennes, 26 / 31

  42. Other methods
    Predict one block from the others:
    • Multi-block PLS regression
    • Multi-block PCA on instrumental variables, ...

  43. RV Tests
    Is there any (linear) relationship between the 2 sets? $H_0: \rho_V = 0$
    Asymptotic tests: distributions normal, elliptical - rank (Robert et
    al., 1985; Cléroux, 1995; Cléroux & Ducharme, 1989): $n\,RV \sim \sum_i \lambda_i Z_i^2$
    ⇒ sensitive to departures from the assumed distribution and to n

  44. RV Tests
    Permutation tests:
    permute the rows of one matrix - compute the RV for the n! permutations
    p-value: proportion of the permuted values greater than the observed one
    ⇒ computationally costly (an “old-fashioned” argument?)

  45. RV Tests
    Approximation of the permutation distribution
    • sampling from the permutations - package ade4 (RV.rtest)
    • moment matching: Pearson family, Edgeworth expansion
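    Sampling from the permutations can be sketched in base R as follows. This is
    a minimal illustration, not the code of RV.rtest; the function names rv_coef
    and rv_perm_test, the number of permutations and the simulated tables are
    assumptions.

    ```r
    # RV coefficient between two tables observed on the same rows
    rv_coef <- function(X, Y) {
      X <- scale(X, center = TRUE, scale = FALSE)
      Y <- scale(Y, center = TRUE, scale = FALSE)
      Wx <- tcrossprod(X)   # X X'
      Wy <- tcrossprod(Y)   # Y Y'
      # tr(Wx Wy) equals the sum of the elementwise product for symmetric matrices
      sum(Wx * Wy) / sqrt(sum(Wx * Wx) * sum(Wy * Wy))
    }

    # Monte Carlo permutation test: shuffle the rows of Y, recompute the RV
    rv_perm_test <- function(X, Y, n_perm = 999) {
      obs <- rv_coef(X, Y)
      perm <- replicate(n_perm, rv_coef(X, Y[sample(nrow(Y)), , drop = FALSE]))
      # the +1 counts the observed arrangement among the permutations
      list(rv = obs, p.value = (sum(perm >= obs) + 1) / (n_perm + 1))
    }

    set.seed(123)
    n <- 30
    X <- matrix(rnorm(n * 3), n)
    Y_linked <- X %*% matrix(rnorm(9), 3) + matrix(rnorm(n * 3, sd = 0.5), n)
    Y_indep <- matrix(rnorm(n * 4), n)
    res_linked <- rv_perm_test(X, Y_linked)
    res_indep <- rv_perm_test(X, Y_indep)
    ```

    With 999 sampled permutations the smallest attainable p-value is 1/1000,
    which is usually enough to decide at the 5 % level.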

  46. Moments matching
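    The first three moments under H0 (Kazi-Aoual et al., 1995)
    $$E_{H_0}(RV) = \frac{\sqrt{\beta_x \times \beta_y}}{n - 1}, \qquad
    \beta_x = \frac{(\mathrm{tr}(X'X))^2}{\mathrm{tr}((X'X)^2)} = \frac{(\sum_i \lambda_i)^2}{\sum_i \lambda_i^2}$$
    with $\lambda_i$ the eigenvalues of $X'X$
    βx is a measure of complexity: 1 ≤ βx ≤ p
    Under H0 the RV is large when n is small and each group has many orthogonal variables
    ⇒ Normal approximation:
    $$RV_{std} = \frac{RV - E_{H_0}(RV)}{\sqrt{V_{H_0}(RV)}}$$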
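    The complexity measure and the expectation under H0 can be computed in a few
    lines of base R. This sketch uses E_H0(RV) = sqrt(βx βy) / (n − 1), the form
    that reduces to 1 / (n − 1) for two single variables and stays below 1; the
    function names beta_coef and expected_rv_h0 are hypothetical.

    ```r
    # beta = (sum of eigenvalues)^2 / (sum of squared eigenvalues) of X'X
    beta_coef <- function(X) {
      X <- scale(as.matrix(X), center = TRUE, scale = FALSE)
      lambda <- eigen(crossprod(X), symmetric = TRUE, only.values = TRUE)$values
      sum(lambda)^2 / sum(lambda^2)
    }

    # Expectation of the RV coefficient under H0 (rows of one table permuted)
    expected_rv_h0 <- function(X, Y) {
      sqrt(beta_coef(X) * beta_coef(Y)) / (nrow(X) - 1)
    }

    set.seed(7)
    y <- matrix(rnorm(5), ncol = 1)        # one variable: beta = 1
    H <- contr.helmert(5)                  # 4 centred, mutually orthogonal columns
    H <- sweep(H, 2, sqrt(colSums(H^2)), "/")   # same norm for all: beta = 4
    beta_y <- beta_coef(y)
    beta_h <- beta_coef(H)
    e_rv <- expected_rv_h0(H, y)
    ```

    With n = 5 and four orthogonal variables of equal norm, βx reaches its
    maximum n − 1 = 4, which illustrates why the RV is large under H0 for small
    n and many orthogonal variables.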

  47. Moments matching
    Problem: the exact distribution of the RVstd is often skewed
    [Figure: histogram of the standardized RV (density scale) with the Normal, Gamma and Edgeworth approximations overlaid]
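    ⇒ Pearson type III density $f(x)$ (skewness $\gamma$):
    $$f(x) = \frac{(2/\gamma)^{4/\gamma^2}}{\Gamma(4/\gamma^2)} \left(\frac{2 + \gamma x}{\gamma}\right)^{(4 - \gamma^2)/\gamma^2} e^{-2(2 + \gamma x)/\gamma^2}$$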
    ⇒ package FactoMineR (coeffRV) (Josse et al., 2008)
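    The Pearson type III density is a gamma density shifted to mean 0 and scaled
    to variance 1, which the following base R sketch checks numerically. The
    function name pearson3_density and the skewness value 0.8 are assumptions;
    the sketch is written for 0 < skew < 2 and is not the code of coeffRV.

    ```r
    # Standardized Pearson type III density with skewness `skew` (0 < skew < 2)
    pearson3_density <- function(x, skew) {
      k <- 4 / skew^2                       # shape of the underlying gamma
      z <- pmax((2 + skew * x) / skew, 0)   # shifted variable, 0 outside the support
      (2 / skew)^k / gamma(k) * z^(k - 1) * exp(-2 * (2 + skew * x) / skew^2)
    }

    skew <- 0.8
    x <- seq(-2, 4, by = 0.5)
    f_formula <- pearson3_density(x, skew)
    # Same density through the gamma distribution: shape 4/skew^2, rate 2/skew,
    # shifted by its mean 2/skew
    f_gamma <- dgamma(x + 2 / skew, shape = 4 / skew^2, rate = 2 / skew)
    total_mass <- integrate(pearson3_density, lower = -2 / skew, upper = Inf,
                            skew = skew)$value
    ```

    An upper-tail p-value for an observed standardized RV then follows from
    pgamma(..., lower.tail = FALSE) with the same shape, rate and shift.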

  48. Back to the wine example!
    • 10 white wines from Val de Loire (5 Vouvray - 5 Sauvignon)
    • 27 continuous variables: sensory descriptors
                        O.fruity  O.passion  O.citrus  …  Sweetness  Acidity  Bitterness  Astringency  Aroma.intensity  Aroma.persistency  Visual.intensity  Odor.preferene  Overall.preference  Label
    S Michaud                4.3        2.4       5.7  …        3.5      5.9         4.1          1.4              7.1                6.7               5.0             6.0                 5.0  Sauvignon
    S Renaudie               4.4        3.1       5.3  …        3.3      6.8         3.8          2.3              7.2                6.6               3.4             5.4                 5.5  Sauvignon
    S Trotignon              5.1        4.0       5.3  …        3.0      6.1         4.1          2.4              6.1                6.1               3.0             5.0                 5.5  Sauvignon
    S Buisse Domaine         4.3        2.4       3.6  …        3.9      5.6         2.5          3.0              4.9                5.1               4.1             5.3                 4.6  Sauvignon
    S Buisse Cristal         5.6        3.1       3.5  …        3.4      6.6         5.0          3.1              6.1                5.1               3.6             6.1                 5.0  Sauvignon
    V Aub Silex              3.9        0.7       3.3  …        7.9      4.4         3.0          2.4              5.9                5.6               4.0             5.0                 5.5  Vouvray
    V Aub Marigny            2.1        0.7       1.0  …        3.5      6.4         5.0          4.0              6.3                6.7               6.0             5.1                 4.1  Vouvray
    V Font Domaine           5.1        0.5       2.5  …        3.0      5.7         4.0          2.5              6.7                6.3               6.4             4.4                 5.1  Vouvray
    V Font Brûlés            5.1        0.8       3.8  …        3.9      5.4         4.0          3.1              7.0                6.1               7.4             4.4                 6.4  Vouvray
    V Font Coteaux           4.1        0.9       2.7  …        3.8      5.1         4.3          4.3              7.3                6.6               6.3             6.0                 5.7  Vouvray

  49. Back to the wine example!
    • 3 panels (oenologists, naive consumers, our students!)
    • 60 preference scores: taste evaluation 1 - 10
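    [Diagram: the data table has 10 rows (wine 1, wine 2, …, wine 10) and five groups of columns: Label (1, categorical), Expert (27), Consumer (15), Student (15) and Preference (60), the last four made of continuous variables]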
    • How are the products described by the panels?
    • Do the panels describe the products in the same way? Does one
    panel give a specific description?

  50. Practice with R
    1 Define groups of active and supplementary variables
    2 Decide whether or not to scale the variables
    3 Perform MFA
    4 Choose the number of dimensions to interpret
    5 Simultaneously interpret the individuals and variables graphs
    6 Study the groups of variables
    7 Study the partial representations
    8 Use indicators to enrich the interpretation

  51. Practice with R
    library(FactoMineR)
    # Read the four tables (one row per wine); the ".../" paths are elided on the slide
    Expert <- read.table("http://factominer.free.fr/docs/Expert_wine.csv",
                         header=TRUE, sep=";", row.names=1)
    Consu <- read.table(".../Consumer_wine.csv", header=TRUE, sep=";", row.names=1)
    Stud <- read.table(".../Student_wine.csv", header=TRUE, sep=";", row.names=1)
    Pref <- read.table(".../Preference_wine.csv", header=TRUE, sep=";", row.names=1)
    palette(c("black","red","blue","orange","darkgreen","maroon","darkviolet"))
    # Label + 27 expert variables, 15 consumer, 15 student, 60 preference scores
    complet <- cbind.data.frame(Expert[,1:28], Consu[,2:16], Stud[,2:16], Pref)
    # type "n": categorical group, "s": continuous variables scaled to unit variance;
    # groups 1 (Label) and 5 (Preference) are supplementary
    res.mfa <- MFA(complet, group=c(1,27,15,15,60), type=c("n",rep("s",4)),
                   num.group.sup=c(1,5), graph=FALSE,
                   name.group=c("Label","Expert","Consumer","Student","Preference"))
    plot(res.mfa, choix="group", palette=palette())    # groups of variables
    plot(res.mfa, choix="var", invisible="quanti.sup", habillage="group", palette=palette())
    plot(res.mfa, choix="ind", partial="all", habillage="group", palette=palette())
    plot(res.mfa, choix="axes", habillage="group", palette=palette())   # partial axes
    dimdesc(res.mfa)    # describe the dimensions by the variables
    write.infile(res.mfa, file="my_wine_results.csv")   # export the result list

  52. Representation of the individuals
    [Figure: individuals map on Dim 1 (42.52 %) and Dim 2 (24.42 %) showing the ten wines and the two label categories, Sauvignon and Vouvray]
    • The two labels are well separated
    • The Vouvray wines are sensorially more varied
    • Several groups of wines, ...

  53. Representation of the active variables
    [Figure: correlation circle of the active variables on Dim 1 (42.52 %) and Dim 2 (24.42 %), by group: Expert (27 variables), Consumer (15 variables, suffix _C) and Student (15 variables, suffix _S)]

  54. Representation of the active variables
    [Figure: same correlation circle on Dim 1 (42.52 %) and Dim 2 (24.42 %), restricted to O.passion, Sweetness and Acidity for the Expert, Consumer (_C) and Student (_S) groups]

  55. Representation of the groups
    [Figure: groups representation on Dim 1 (42.52 %) and Dim 2 (24.42 %), both axes from 0 to 1: Expert, Consumer, Student, Preference and Label]
    • The closer two groups are, the more similar the structures they induce
    • The 1st dimension is common to all the panels
    • The 2nd dimension is mainly due to the experts
    • Preference is linked to the sensory description

  56. Representation of the partial points
    [Figure: individuals map on Dim 1 (42.52 %) and Dim 2 (24.42 %) with, for each of the ten wines and for the Sauvignon and Vouvray categories, the partial points of the Expert, Consumer and Student groups]

  57. Representation of the partial dimensions
    [Figure: partial dimensions projected on Dim 1 (42.52 %) and Dim 2 (24.42 %) of the MFA: Dim1 and Dim2 of the Expert, Consumer, Student and Preference groups, and Dim1 of Label]
    • The first two dimensions of each group are well projected
    • The Consumer group has the same dimensions as the MFA

  58. Representation of supplementary continuous variables
    [Figure: correlation circle of the supplementary continuous variables on Dim 1 (42.52 %) and Dim 2 (24.42 %)]
    ⇒ Preferences do not participate in the construction of the
    dimensions
    ⇒ Preferences are linked to the sensory description

  59. Representation of supplementary continuous variables
    [Figure: two panels on Dim 1 (42.52 %) and Dim 2 (24.42 %): the individuals map (the ten wines with the Sauvignon and Vouvray categories) and the correlation circle restricted to O.passion, Sweetness and Acidity for the Expert, Consumer (_C) and Student (_S) groups]

  60. Representation of supplementary continuous variables
    ⇒ Main information: the favourite is Vouvray Aubuisières Silex

  61. Helps to interpret
    • Contribution of each group of variables to each component of
    the MFA
    > res.mfa$group$contrib
              Dim.1  Dim.2  Dim.3
    Expert     30.5   46.0   33.7
    Consumer   33.2   23.1   31.2
    Student    36.3   30.9   35.1
      • Similar contribution of the 3 groups to the first dimension
      • Second dimension mainly due to the experts
    • Correlation between the global cloud and each partial cloud
    > res.mfa$group$correlation
              Dim.1  Dim.2  Dim.3
    Expert     0.95   0.95   0.96
    Consumer   0.95   0.83   0.87
    Student    0.99   0.99   0.84
      • First components are highly linked to the 3 groups: the 3 clouds
      of points are nearly homothetic

  62. Similarity measures between groups
    > res.mfa$group$Lg
                Expert  Consumer  Student  Preference  Label   MFA
    Expert        1.45      0.94     1.17        1.01   0.89  1.33
    Consumer      0.94      1.25     1.04        1.11   0.28  1.21
    Student       1.17      1.04     1.29        1.03   0.62  1.31
    Preference    1.01      1.11     1.03        1.47   0.37  1.18
    Label         0.89      0.28     0.62        0.37   1.00  0.67
    MFA           1.33      1.21     1.31        1.18   0.67  1.44
    > res.mfa$group$RV
                Expert  Consumer  Student  Preference  Label   MFA
    Expert        1.00      0.70     0.85        0.69   0.74  0.92
    Consumer      0.70      1.00     0.82        0.82   0.25  0.90
    Student       0.85      0.82     1.00        0.75   0.55  0.96
    Preference    0.69      0.82     0.75        1.00   0.31  0.81
    Label         0.74      0.25     0.55        0.31   1.00  0.56
    MFA           0.92      0.90     0.96        0.81   0.56  1.00
    • Expert gives a richer description (Lg greater)
    • Groups Student and Expert are linked (RV = 0.85)
    • Group Student is the closest to the overall (RV = 0.96)
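    The two tables are tied by the usual normalization
    RV(j, m) = Lg(j, m) / sqrt(Lg(j, j) × Lg(m, m)), so the RV can be seen as an Lg
    rescaled to lie between 0 and 1. A minimal check in base R, re-typing the
    printed Lg values (rounded to two decimals, so the match is only up to
    rounding):

    ```r
    groups <- c("Expert", "Consumer", "Student", "Preference", "Label", "MFA")
    # Lg coefficients copied from the printed table (two decimals only)
    Lg <- matrix(c(1.45, 0.94, 1.17, 1.01, 0.89, 1.33,
                   0.94, 1.25, 1.04, 1.11, 0.28, 1.21,
                   1.17, 1.04, 1.29, 1.03, 0.62, 1.31,
                   1.01, 1.11, 1.03, 1.47, 0.37, 1.18,
                   0.89, 0.28, 0.62, 0.37, 1.00, 0.67,
                   1.33, 1.21, 1.31, 1.18, 0.67, 1.44),
                 nrow = 6, byrow = TRUE, dimnames = list(groups, groups))

    # Divide each Lg(j, m) by the geometric mean of the two diagonal terms
    RV <- Lg / sqrt(outer(diag(Lg), diag(Lg)))
    ```

    The recomputed coefficients agree with the printed RV table to within about
    0.01, the difference coming from the rounding of the Lg values.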

  63. Partition from the tree
    An empirical number of clusters is suggested: $\min_q \frac{W_q - W_{q+1}}{W_{q-1} - W_q}$
    [Figure: hierarchical clustering dendrogram of the ten wines with the bar plot of inertia gains; leaf order: V Aub Silex, S Trotignon, S Renaudie, S Michaud, S Buisse Domaine, S Buisse Cristal, V Font Brûlés, V Font Domaine, V Aub Marigny, V Font Coteaux]
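    The criterion can be sketched in base R as below, taking W_q to be the
    within-cluster inertia of the partition into q clusters cut from a Ward tree.
    The data set (USArrests), the search range 2 to 10 and the helper name
    within_inertia are assumptions made for the illustration.

    ```r
    X <- scale(USArrests)                         # toy data (assumption)
    tree <- hclust(dist(X), method = "ward.D2")

    # Within-cluster inertia W_q of the partition into q clusters
    within_inertia <- function(q) {
      cl <- cutree(tree, k = q)
      sum(sapply(split(as.data.frame(X), cl),
                 function(g) sum(scale(g, scale = FALSE)^2)))
    }

    q_max <- 10
    W <- sapply(1:(q_max + 1), within_inertia)    # W[q] is W_q

    q <- 2:q_max
    ratio <- (W[q] - W[q + 1]) / (W[q - 1] - W[q])
    q_best <- q[which.min(ratio)]                 # suggested number of clusters
    ```

    A small ratio means that moving from q to q + 1 clusters gains little inertia
    compared with the previous split, hence the cut at q.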

  64. Partition on the principal component map
    [Figure: two panels showing the ten wines on Dim 1 (42.52%) and Dim 2 (24.42%), with the partition into clusters 1 to 5]
    Continuous view (principal components) and discontinuous view
    (clusters)