Alfred Hero

S³ Seminar
January 30, 2015

Alfred Hero

(University of Michigan, Ann Arbor, MI, USA)

https://s3-seminar.github.io/seminars/alfred-hero

Title — Correlation mining in high dimension with limited samples

Abstract — Correlation mining arises in many areas of engineering, social sciences, and natural sciences. Correlation mining discovers columns of a random matrix that are highly correlated with other columns of the matrix and can be used to construct a dependency network over columns. However, when the number n of samples is finite and the number p of columns increases, such exploration becomes futile due to a phase transition phenomenon: spurious discoveries will eventually dominate. In this presentation I will present theory for predicting these phase transitions, along with Poisson limit theorems that can be used to determine the finite-sample behavior of correlation structure. The theory has applications in areas including gene expression analysis, network security, remote sensing, and portfolio selection.

Biography — Alfred O. Hero III received the B.S. (summa cum laude) from Boston University (1980) and the Ph.D. from Princeton University (1984), both in Electrical Engineering. Since 1984 he has been with the University of Michigan, Ann Arbor, where he is the R. Jamison and Betty Williams Professor of Engineering. His primary appointment is in the Department of Electrical Engineering and Computer Science and he also has appointments, by courtesy, in the Department of Biomedical Engineering and the Department of Statistics. From 2008 to 2013 he held the Digiteo Chaire d'Excellence at the Ecole Superieure d'Electricite, Gif-sur-Yvette, France. He is a Fellow of the Institute of Electrical and Electronics Engineers (IEEE) and several of his research articles have received best paper awards. Alfred Hero was awarded the University of Michigan Distinguished Faculty Achievement Award (2011). He received the IEEE Signal Processing Society Meritorious Service Award (1998), the IEEE Third Millennium Medal (2000), and the IEEE Signal Processing Society Technical Achievement Award (2014). Alfred Hero was President of the IEEE Signal Processing Society (2006-2008) and was on the Board of Directors of the IEEE (2009-2011), where he served as Director of Division IX (Signals and Applications). Alfred Hero's recent research interests are in statistical signal processing, machine learning and the analysis of high dimensional spatio-temporal data. Of particular interest are applications to networks, including social networks, multi-modal sensing and tracking, database indexing and retrieval, imaging, and genomic signal processing.

Transcript

  1. Outline Motivation Correlation mining principles Network analysis SPARC predictor design Conclusions
    Correlation mining in high dimension with
    limited samples
    Alfred Hero
    University of Michigan - Ann Arbor
    Jan. 30, 2015

  2. Outline
    1 Motivation
    2 Correlation mining principles
    3 Application: network analysis
    4 Application: SPARC∗ predictor design
    5 Conclusions

  3. Acknowledgements
    • Bala Rajaratnam, Stanford Statistics
    • Hamed Firouzi, UM EECS (doctoral student)
    • Rob Brown, UCLA Bioinformatics (doctoral student)
    • Yongsheng Huang, Merck Labs (Former UM-PIBS student)
    • Geoffrey Ginsburg, Amy Zaas, Chris Woods: Duke Medicine
    Sponsors
    • AFOSR Complex Networks Program
    • NSF Theoretical Foundations Program
    • ARO Social Informatics Program
    • ARO MURI Value of Information Program
    • NIH P01 program NIBIB - Meyer PI
    • DARPA Predicting Health and Disease Program - Ginsburg PI

  4. Outline
    1 Motivation
    2 Correlation mining principles
    3 Application: network analysis
    4 Application: SPARC∗ predictor design
    5 Conclusions
    ∗SPARC=Screening, Prediction, and Regression via Correlation (SPARC)

  5. Why mine for high sample correlation vs sample mean?
    Mining for treatment effects: p = 12023 biomarkers, n = 130 samples/treatment
    Figure panels: HMBOX1 vs NRLP2; JARID1D vs SNX19
    Blue: treatment 1 (Sx). Green: treatment 2 (Asx).
    Solid: women. Hollow: men. Size: hours elapsed since inoculation.

  6. Correlation mining pipeline

  7. Network discovery from correlation
    • O/I correlation: $p = 1.5 \times 10^{9}$ vertices, $6 \times 10^{9} \le 10^{-8}\binom{p}{2}$ edges, $n = 365$ samples
    • Gene correlation: $p = 23{,}000$ vertices, $1.5 \times 10^{5} \le 10^{-3}\binom{p}{2}$ edges, $n = 270$ samples
    • Mutual correlation: $p = 7000$ vertices, $7 \times 10^{5} \le 10^{-2}\binom{p}{2}$ edges, $n = 6$ samples

  8. Network discovery from correlation
    O/I correlation gene correlation mutual correlation
    • “Big data” aspects
    • Large number of unknowns (hubs, edges, subgraphs)
    • Small number of samples for inference on unknowns
    • Crucial need to manage uncertainty (false positives)

  9. Sample correlation: p = 2 variables, n = 50 samples
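    Sample correlation:
    $$\mathrm{corr}_{X,Y} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}} \in [-1, 1]$$
    Figure panels: positive correlation = 1; negative correlation = −1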
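The sample correlation formula on this slide can be computed directly. The following is a minimal sketch in plain Python; the function name `sample_correlation` and the toy data are illustrative assumptions, not part of the talk.

```python
import math

def sample_correlation(x, y):
    """Pearson sample correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of products of deviations from the sample means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator: square root of the product of the two sums of squares.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1.0, 2.0, 3.0, 4.0]                              # toy data (assumed)
print(sample_correlation(x, [2.0, 4.0, 6.0, 8.0]))    # increasing line
print(sample_correlation(x, [8.0, 6.0, 4.0, 2.0]))    # decreasing line
```

The two calls reproduce the extreme cases named on the slide: a perfectly increasing linear relation gives a value at the upper end of [−1, 1] and a perfectly decreasing one gives a value at the lower end. A constant sequence makes the denominator zero, which this sketch does not handle.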

  10. Sample correlation for two sequences: p = 2, n = 50
    Q: Are the two time sequences Xi and Yj correlated, e.g.
    |corrXY | > 0.5?

  11. Sample correlation for two sequences: p = 2, n = 50
    Q: Are the two time sequences Xi and Yj correlated?
    A: No. Computed over range i = 1, . . . 50: corrXY = −0.0809

  12. Sample correlation for two sequences: p = 2, n < 15
    Q: Are the two time sequences Xi and Yj correlated?
    A: Yes. corrXY > 0.5 over range i = 3, . . . 12 and corrXY < −0.5
    over range i = 29, . . . , 42.

  13. Correlating a set of p = 20 sequences
    Q: Are any pairs of sequences correlated? Are there patterns of
    correlation?

  14. Sample correlation R w/ correlation thresholding (0.5)
    Correlation matrix Thresholded matrix
    Apparent patterns emerge after thresholding each pairwise
    correlation at ±0.5. (12 cross-correlations).

  15. Associated sample correlation graph
    Graph has an edge between node (variable) i and j if ij-th entry of
    thresholded correlation is non-zero.
    Sequences are actually uncorrelated Gaussian.
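The effect described here, apparent structure among sequences that are in fact uncorrelated, is easy to reproduce. The sketch below draws p independent Gaussian sequences, thresholds every pairwise sample correlation at 0.5, and lists the resulting edges. The sequence length `n=15`, the seed, and the function names are assumptions chosen for illustration; the slides do not state the length used in the figure.

```python
import math
import random

def sample_correlation(x, y):
    """Pearson sample correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spurious_edges(p=20, n=15, threshold=0.5, seed=0):
    """Edges (i, j), i < j, whose |sample correlation| exceeds the threshold
    for p mutually independent standard Gaussian sequences of length n."""
    rng = random.Random(seed)
    seqs = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
    edges = []
    for i in range(p):
        for j in range(i + 1, p):
            if abs(sample_correlation(seqs[i], seqs[j])) > threshold:
                edges.append((i, j))
    return edges

edges = spurious_edges()
print(len(edges), "of", 20 * 19 // 2, "pairs pass the threshold")
```

Every edge reported by this sketch is a false discovery by construction, since the sequences are generated independently. Increasing `n` while holding `p` fixed makes the edge list shrink toward empty.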

  16. Misreporting of correlations is a real problem
    Source: Young and Karr, Significance, Sept. 2011

  17. The problem of false discoveries: phase transitions
    • Number of discoveries exhibits a phase transition phenomenon
    • This phenomenon gets worse as p/n increases.
    • Example: false discoveries of high correlation for uncorrelated
    Gaussian variables

  19. Outline
    1 Motivation
    2 Correlation mining principles
    3 Application: network analysis
    4 Application: SPARC∗ predictor design
    5 Conclusions
    ∗SPARC=Screening, Prediction, and Regression via Correlation (SPARC)

  20. Principled design of correlation mining algorithms
    Design objective: estimate or detect patterns of correlation in high
    dimensional sample-poor environments with low error rates
    Fundamental design question
    What are the fundamental properties of a network of p
    interacting variables that can be accurately estimated
    from a small number n of measurements?
    Regimes
    • n/p → ∞: sample rich regime (CLT, LLNs)
    • n/p → c: sample critical regime (Semi-circle,
    Marchenko-Pastur)
    • n/p → 0: sample starved regime (Chen-Stein)
    It is important to design the procedure for the regime one is in

  21. Fundamental sampling regimes
    • Classical asymptotics: n → ∞, p fixed (’small data’)
    • Mixed asymptotics: n → ∞, p → ∞ (’Medium sized data’)
    • Purely high dimensional: n fixed, p → ∞ (’Big data’)

  23. Why is correlation important?
    • Network modeling: learning/simulating descriptive models
    • Empirical prediction: forecast a response variable Y
    • Classification: estimate type of correlation from samples
    • Anomaly detection: localize unusual activity in a sample
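    Each application requires estimate of cov matrix $\Sigma_X$ or its inverse
    Prediction: Linear minimum MSE predictor of q variables Y from X
    $$\hat{Y} = \Sigma_{YX} \Sigma_X^{-1} X$$
    Covariance matrix related to inter-dependency structure.
    Classification: QDA test $H_0: \Sigma_X = \Sigma_0$ vs $H_1: \Sigma_X = \Sigma_1$
    $$X^T (\Sigma_0^{-1} - \Sigma_1^{-1}) X \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \eta$$
    Anomaly detection: Mahalanobis test $H_0: \Sigma_X = \Sigma_0$ vs $H_1: \Sigma_X \neq \Sigma_0$
    $$\frac{X^T \Sigma_0^{-1} X}{X^T X} \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \eta$$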
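To make the two test statistics concrete, here is a small sketch that evaluates them for a single observation. It assumes diagonal covariance matrices so that the inverses reduce to reciprocals of the variances; that restriction, the function names, the toy numbers, and the threshold `eta` are all assumptions made for illustration and are not taken from the talk.

```python
def qda_statistic(x, var0, var1):
    """X^T (Sigma0^{-1} - Sigma1^{-1}) X for diagonal Sigma0 and Sigma1,
    given as lists of per-variable variances (an assumed simplification)."""
    return sum(xi * xi * (1.0 / v0 - 1.0 / v1)
               for xi, v0, v1 in zip(x, var0, var1))

def mahalanobis_ratio(x, var0):
    """(X^T Sigma0^{-1} X) / (X^T X) for diagonal Sigma0."""
    num = sum(xi * xi / v0 for xi, v0 in zip(x, var0))
    den = sum(xi * xi for xi in x)
    return num / den

x = [1.0, 2.0]     # one observation of two variables (toy data)
eta = 0.0          # hypothetical decision threshold

stat = qda_statistic(x, var0=[1.0, 4.0], var1=[2.0, 2.0])
print("decide H1" if stat > eta else "decide H0")
print(mahalanobis_ratio(x, var0=[1.0, 4.0]))
```

With full (non-diagonal) covariances the same statistics require a matrix inverse, which is exactly where the need for a reliable estimate of the covariance matrix or its inverse enters.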

  24. Estimation, selection, testing, screening
    • Regularized l2 or lF covariance estimation
      • Banded covariance model: Bickel-Levina (2008)
      • Sparse eigendecomposition model: Johnstone-Lu (2007)
      • Stein shrinkage estimator: Ledoit-Wolf (2005), Chen-Wiesel-Eldar-H (2010)
    • Gaussian graphical model selection
      • l1 regularized GGM: Meinshausen-Bühlmann (2006), Wiesel-Eldar-H (2010)
      • Sparse Kronecker GGM (Matrix Normal): Allen-Tibshirani (2010), Tsiligkaridis-Zhou-H (2012)
    • Independence testing
      • Sphericity test for multivariate Gaussian: Wilks (1935)
      • Maximal correlation test: Moran (1980), Eagleson (1983), Jiang (2004), Zhou (2007), Cai and Jiang (2011)
    • Correlation screening (H, Rajaratnam 2011, 2012)
      • Find variables having high correlation wrt other variables
      • Find hubs of degree ≥ k ≡ test maximal k-NN.

  25. Sample complexity regimes for different tasks
    Hero and Rajaratnam, submitted 2015
    • Sample complexity regime specified by # available samples
    • Some of these regimes require knowledge of sparsity factor
    • From L to R, regimes require progressively larger sample size

  26. Sample complexity regimes for different tasks
    Hero and Rajaratnam, submitted 2015
    • There are niche regimes for reliable screening, detection, . . . ,
    performance estimation
    • Smallest amount of data needed to screen for high correlations
    • Largest amount of data needed to quantify uncertainty

  27. Implication: adapt inference task to sample size
    Dichotomous sampling regimes have motivated (Firouzi-H-R 2014):
    • Progressive correlation mining
    ⇒ match the mining task to the available sample size.
    • Multistage correlation mining for budget limited applications
    ⇒ Screen small exploratory sample prior to big collection

  28. Screening edges and hubs (H-Rajaratnam 2011, 2012)
    After applying threshold ρ obtain a graph G having edges E
    · · ·
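    • Number of hub nodes in G: $N_{\delta,\rho} = \sum_{i=1}^{p} I(d_i \ge \delta)$
    $$I(d_i \ge \delta) = \begin{cases} 1, & \mathrm{card}\{j : j \neq i,\ |C_{ij}| \ge \rho\} \ge \delta \\ 0, & \text{o.w.} \end{cases}$$
    C is either sample correlation matrix
    $$R = \mathrm{diag}(S_n)^{-1/2} \, S_n \, \mathrm{diag}(S_n)^{-1/2}$$
    or sample partial correlation matrix
    $$\hat{\Omega} = \mathrm{diag}(S_n^{\dagger})^{-1/2} \, S_n^{\dagger} \, \mathrm{diag}(S_n^{\dagger})^{-1/2}$$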
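The hub count defined above can be sketched in a few lines. The function below takes any symmetric matrix C (sample correlation or sample partial correlation) as a list of lists and counts vertices whose degree in the thresholded graph is at least δ. The function name `count_hubs` and the 4 × 4 example matrix are illustrative assumptions.

```python
def count_hubs(C, rho, delta):
    """Number of vertices with at least `delta` neighbours, where i and j
    are neighbours when j != i and |C[i][j]| >= rho."""
    p = len(C)
    hubs = 0
    for i in range(p):
        # Vertex degree d_i in the graph obtained by thresholding at rho.
        degree = sum(1 for j in range(p) if j != i and abs(C[i][j]) >= rho)
        if degree >= delta:
            hubs += 1
    return hubs

# Toy correlation matrix (assumed values, symmetric with unit diagonal).
C = [[1.0, 0.9, 0.8, 0.1],
     [0.9, 1.0, 0.2, 0.0],
     [0.8, 0.2, 1.0, 0.3],
     [0.1, 0.0, 0.3, 1.0]]

print(count_hubs(C, rho=0.7, delta=1))   # vertices with at least one edge
print(count_hubs(C, rho=0.7, delta=2))   # vertices with at least two edges
```

In this toy matrix vertex 0 is connected to vertices 1 and 2 at threshold 0.7, so three vertices have degree at least 1 and only vertex 0 has degree at least 2.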

  29. Asymptotics for fixed sample size n, p → ∞, and ρ → 1
    Asymptotics of hub screening (Rajaratnam and H 2011, 2012):
    Assume that rows of n × p matrix X are i.i.d. circular complex
    random variables with bounded elliptically contoured density and
    block sparse covariance.
    Theorem
    Let p and $\rho = \rho_p$ satisfy $\lim_{p \to \infty} p^{1/\delta} (p - 1) (1 - \rho_p^2)^{(n-2)/2} = e_{n,\delta}$.
    Then
    $$P(N_{\delta,\rho} > 0) \to \begin{cases} 1 - \exp(-\lambda_{\delta,\rho,n}/2), & \delta = 1 \\ 1 - \exp(-\lambda_{\delta,\rho,n}), & \delta > 1 \end{cases}$$
    $$\lambda_{\delta,\rho,n} = p \binom{p-1}{\delta} (P_0(\rho, n))^{\delta}$$
    $$P_0(\rho, n) = 2B((n-2)/2, 1/2) \int_{\rho}^{1} (1 - u^2)^{\frac{n-4}{2}} \, du$$
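The limit for δ = 1 can be evaluated numerically. The sketch below makes one interpretive assumption that should be checked against the papers: the beta-function factor is applied as a normalizer (the integral is divided by B((n−2)/2, 1/2)), which is the reading under which P0 is the two-sided tail probability of the sample correlation of two uncorrelated Gaussian variables. The function names, the midpoint-rule integrator, and the restriction to n ≥ 4 are choices made for this illustration.

```python
import math

def p0(rho, n, steps=20000):
    """P(|r| >= rho) for the sample correlation r of two uncorrelated
    Gaussian variables observed n >= 4 times (null density of r is
    proportional to (1 - r^2)^((n-4)/2) on [-1, 1])."""
    if n < 4:
        raise ValueError("this simple integrator assumes n >= 4")
    expo = (n - 4) / 2.0
    # log of the beta function B((n-2)/2, 1/2), via log-gamma.
    log_beta = (math.lgamma((n - 2) / 2.0) + math.lgamma(0.5)
                - math.lgamma((n - 1) / 2.0))
    # Midpoint rule for the integral of (1 - u^2)^expo over [rho, 1].
    h = (1.0 - rho) / steps
    integral = h * sum((1.0 - (rho + (k + 0.5) * h) ** 2) ** expo
                       for k in range(steps))
    return 2.0 * integral / math.exp(log_beta)

def prob_any_false_edge(p, rho, n):
    """Poisson-limit approximation for delta = 1: 1 - exp(-lambda / 2),
    with lambda = p * (p - 1) * P0(rho, n)."""
    lam = p * (p - 1) * p0(rho, n)
    return 1.0 - math.exp(-lam / 2.0)

for p in (10, 100, 10000):
    print(p, prob_any_false_edge(p, rho=0.5, n=50))
```

Holding n and ρ fixed while p grows drives the approximate probability of at least one spurious edge from near 0 toward 1, which is the phase transition the talk describes.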

  30. False positive rate as function of ρ (δ = 1)
    n     550    500    450    150    100    50     10     8      6
    ρc    0.188  0.197  0.207  0.344  0.413  0.559  0.961  0.988  0.9997
    Critical threshold (δ = 1): $\rho_c \approx \max\{\rho : dE[N_{\delta,\rho}]/d\rho = -1\}$
    $$\rho_c = 1 - c_n (p - 1)^{-2/(n-4)}$$

  32. False positive rate as function of ρ and n (δ = 1)
    Figure panels: p = 10 (δ = 1); p = 10000
    Critical threshold for any δ > 0:
    $$\rho_c = 1 - c_{\delta,n} (p - 1)^{-2\delta/(\delta(n-2)-2)}$$

  33. Critical threshold ρc as function of n (H-Rajaratnam 2012)

  35. Outline
    1 Motivation
    2 Correlation mining principles
    3 Application: network analysis
    4 Application: SPARC∗ predictor design
    5 Conclusions
    ∗SPARC=Screening, Prediction, and Regression via Correlation (SPARC)

  36. Respiratory virus challenge study: experimental design
    Zaas et al, Cell Host & Microbe, 2009
    Chen et al, IEEE Trans. Biomedical Engineering, 2010
    Chen et al BMC Bioinformatics, 2011
    Puig et al IEEE Trans. Signal Processing, 2011
    Huang et al, PLoS Genetics, 2011
    Woods et al, PLoS One, 2012
    Bazot et al, BMC Bioinformatics, 2013
    Zaas et al, Science Translational Medicine, 2014

  37. Data collected from seven challenge studies
    Challenge   Virus   Year   Location            Duration (hrs)   # Subjects
    DEE1        RSV     2008   Retroscreen         166              20
    DEE2        H3N2    2009   Retroscreen         166              17
    DEE3        H1N1    2009   Retroscreen         166              24
    DEE4        H1N1    2010   Retroscreen         166              19
    DEE5        H3N2    2011   Retroscreen         680              21
    HRV UVA     HRV     2008   Univ. of Virginia   120              20
    HRV Duke    HRV     2010   Duke Univ.          136              30

  38. Details on H3N2 DEE2 challenge study
    • 17 subjects inoculated and sampled over 7 days
    • 373 samples collected
    • 21 Affymetrix gene chips assayed for each subject
    • p = 12023 genes recorded for each sample
    • 10 symptoms scored from {0, 1, 2, 3} for each sample
    [Huang et al, PLoS Genetics, 2011]

  39. Critical threshold ρc for H3N2 DEE2
    Samples fall into 3 categories
    • Pre-inoculation samples
      • Number of Pre-inoc. samples: n = 34
      • Critical threshold: ρc = 0.70
      • $10^{-6}$ FWER threshold: ρ = 0.92
    • Post-inoculation symptomatic samples
      • Number of Post-inoc. Sx samples: n = 170
      • Critical threshold: ρc = 0.36
      • $10^{-6}$ FWER threshold: ρ = 0.55
    • Post-inoculation asymptomatic samples
      • Number of Post-inoc. Asx samples: n = 152
      • Critical threshold: ρc = 0.37
      • $10^{-6}$ FWER threshold: ρ = 0.57

  40. Correlation-mining the pre-inoc. samples
    • Screen correlation at FWER $10^{-6}$: 1658 genes, 8718 edges
    • Screen partial correlation at FWER $10^{-6}$: 39 genes, 111 edges

  41. P-value waterfall analysis (Pre-inoc. parcor)

  42. Outline
    1 Motivation
    2 Correlation mining principles
    3 Application: network analysis
    4 Application: SPARC∗ predictor design
    5 Conclusions
    ∗SPARC=Screening, Prediction, and Regression via Correlation (SPARC)

  43. Correlation mining for predictor design: bipartite graph
    Q: What genes are predictive of certain symptom combinations?
    Firouzi, Rajaratnam and H, “Two-stage variable selection for molecular prediction of disease,” CAMSAP 2013

  44. Cost considerations
    Experiments are costly (>$250K per challenge study)
    Figure: Pricing per slide for Agilent Custom Microarrays G2309F,
    G2513F, G4503A, G4502A (Feb 2013). Source: BMC RNA Profiling Core

  45. Single stage learning of a predictor
    Q: What genes are predictive of certain symptom combinations?

  46. Two-stage learning of a predictor (SPARC/SIS)

  47. Related work
    Falls in the general framework of adaptive support set recovery
    Some related work
    • Compressive sensing approaches
      • Distilled sensing (DS) (Haupt, Castro and Nowak 2010)
      • Sequentially designed compressive sensing (Haupt, Baraniuk, Castro and Nowak 2011)
    • Sparse multivariate regression approaches
      • Lasso recovery (Wainwright 2006, Zhao and Yu 2007)
      • Group lasso recovery (Obozinski, Wainwright, and Jordan 2008)
    • Sure independence screening (SIS) (Fan and Lv, 2008)
      • Screens cross-correlation $S_{yx}$ for good predictor variables
    • SPARC approach
      • Screens predictor coefficients $S_{yx} S_x^{\dagger}$
      • Only requires n = log t full-size samples in SPARC stage 1
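As an illustration of the screening idea in this list, the sketch below implements the simpler SIS-style rule: rank predictor variables by the magnitude of their sample cross-correlation with the response and keep the top k. It is not the SPARC procedure, which screens the coefficients $S_{yx} S_x^{\dagger}$ and so needs a pseudo-inverse of the sample covariance. The function names, the toy model, and the sizes n = 500, p = 10 are assumptions for illustration.

```python
import math
import random

def sample_correlation(x, y):
    """Pearson sample correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def screen_by_correlation(X_cols, y, k):
    """Indices of the k columns with the largest |sample correlation| with y."""
    scores = [abs(sample_correlation(col, y)) for col in X_cols]
    ranked = sorted(range(len(X_cols)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

# Toy noiseless model (assumed): y depends only on variables 2 and 7.
rng = random.Random(0)
n, p = 500, 10
X_cols = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
y = [2.0 * X_cols[2][i] - 2.0 * X_cols[7][i] for i in range(n)]

print(screen_by_correlation(X_cols, y, k=2))
```

With independent predictors and this many samples the two active variables stand out clearly; the interesting regime in the talk is the opposite one, where n is on the order of log t and p is in the thousands.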

  48. SPARC recovery of support of active variables
    Theorem (Firouzi, H, R, 2013, 2014)
    Assume that the response Y satisfies the following noiseless
    ground truth model:
    $$Y = a_{i_1} X_{i_1} + a_{i_2} X_{i_2} + \cdots + a_{i_k} X_{i_k}$$
    If $n \ge \Theta(\log p)$ then, with probability at least $1 - 1/p$, PCS
    recovers support of active variables $\pi_0$.
    • Analogous to condition for LASSO support recovery (Obozinski, Wainwright, Jordan 2008).
    • The constant in $\Theta(\log p)$ is increasing in dynamic range coefficient
    $$\frac{|\pi_0|^{-1} \sum_{l \in \pi_0} |a_l|}{\min_{j \in \pi_0} |a_j|} \in [1, \infty)$$
    • Worst case: high dynamic range in active regression coefficients.
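The dynamic range coefficient is the mean magnitude of the active coefficients divided by the smallest magnitude. A minimal sketch follows; the function name `dynamic_range` and the example coefficient vectors are assumptions for illustration.

```python
def dynamic_range(active_coeffs):
    """Mean of |a_l| over the active set divided by the minimum |a_j|."""
    mags = [abs(a) for a in active_coeffs]
    if not mags or min(mags) == 0.0:
        raise ValueError("active coefficients must be non-empty and non-zero")
    return (sum(mags) / len(mags)) / min(mags)

print(dynamic_range([1.0, -1.0, 1.0]))   # equal magnitudes: the minimum value
print(dynamic_range([0.1, 10.0]))        # one tiny active coefficient
```

Equal-magnitude coefficients attain the lower end of the range [1, ∞), while one very small active coefficient among large ones inflates the ratio, which is the worst case named in the last bullet.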

  49. Optimal pre-screening allocation under budget µ
    Assume that: cost(acquisition of 1 sample of 1 variable) = 1. Define
    • Total budget for two-stage experiment: µ.
    • Number of selected variables k. Total number of samples t.
    To meet the budget, t, n, k, p must satisfy:
    $$np + (t - n)k \le \mu$$
    Theorem
    MSE optimal pre-screening allocation rule for two-stage predictor
    $$n = \begin{cases} O(\log t), & c(p - k)\log t + kt \le \mu \\ 0, & \text{o.w.} \end{cases}$$
    When budget is tight skip stage 1 (n = 0).
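The allocation rule follows from the budget constraint: np + (t − n)k = n(p − k) + kt, so substituting n = c log t gives the condition c(p − k) log t + kt ≤ µ. The sketch below applies that rule. The constant c is not specified on the slide, so `c=1.0`, the rounding with `ceil`, the function name, and the example numbers are all assumptions.

```python
import math

def stage1_samples(mu, p, k, t, c=1.0):
    """Number n of full-size (all p variables) stage-1 samples.

    Uses n = ceil(c * log t) when the two-stage cost fits the budget mu,
    and 0 (skip stage 1) otherwise. The constant c is an assumed value.
    """
    n = math.ceil(c * math.log(t))
    # Budget constraint: n*p + (t - n)*k <= mu,
    # which is the same as n*(p - k) + k*t <= mu.
    if 0 < n <= t and n * p + (t - n) * k <= mu:
        return n
    return 0

# Hypothetical experiment: p = 10000 variables, keep k = 20, t = 100 samples.
print(stage1_samples(mu=60000, p=10000, k=20, t=100))   # budget suffices
print(stage1_samples(mu=50000, p=10000, k=20, t=100))   # budget too tight
```

In the example, ceil(log 100) = 5 full-size samples cost 5 × 10000 plus 95 × 20 = 51900 units, so a budget of 60000 allows the screening stage and a budget of 50000 does not.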

  50. Simulation comparing SPARC to LASSO/SIS
    Figure: Avg mis-selection for two-stage predictor under AR(1) model.
    n = 25 log t samples are used for the first stage and all t samples are used
    in the second stage. p = 10,000.

  51. Comparison of prediction accuracy and computation
    Figure: Prediction accuracy (L) and avg. CPU time (Matlab) (R) for
    AR(1) model. SPARC compared to SIS and active set implementation of
    LASSO. p = 10, 000.

  52. Prediction of Symptoms of H3N2 Based on Gene Expression Levels
    Figure: Prediction accuracy for symptom prediction in H3N2 DEE2

  53. Top 20 predictive biomarkers selected

  54. Variability comparisons between PCS and lasso
    Figure panels: PCS genes for subj 25; lasso genes for subj 25

  55. Outline
    1 Motivation
    2 Correlation mining principles
    3 Application: network analysis
    4 Application: SPARC∗ predictor design
    5 Conclusions
    ∗SPARC=Screening, Prediction, and Regression via Correlation (SPARC)

  57. Conclusions
    Correlation mining requires care when n ≪ p
    • “Classical” low dimensional (“CLT”) setting inadequate.
    • “Ultra-high” dimensional setting inadequate when n fixed.
    • “Purely high” dimensional (“big data”) setting well suited
    • Universal phase transition thresholds can be predicted
    • Phase transitions useful for properly sample-sizing experiments
    Correlation mining topics not covered here
    • Individualized predictors: reference-aided classifiers (Liu et al
    2013)
    • Structured covariance: Kronecker, Toeplitz, low rank+sparse,
    etc (Tsiligkaridis and H 2013), (Greenewald and H 2014)
    • Non-linear correlation mining (Todros and H, 2011, 2012)
    • Spectral correlation mining: stationary time series (Firouzi
    and H, 2014)