Matthew McKay

S³ Seminar
February 03, 2015

(Hong Kong University of Science and Technology)

https://s3-seminar.github.io/seminars/matthew-mckay/

Title — Signal Processing meets Immunology: Towards a Hepatitis C Vaccine via High-Dimensional Covariance Estimation

Abstract — Chronic Hepatitis C Virus (HCV) infection is one of the leading causes of liver failure and liver cancer, affecting around 3% of the world’s population. Current treatment for HCV is expensive, frequently fails, and is accompanied by extensive side effects. Thus, there is an urgent need for an effective HCV vaccine. The major obstacle to designing an HCV vaccine is the virus’s extreme variability, which helps it evade immune surveillance. This talk will discuss a new approach to vaccine design for HCV based on finding “multi-dimensionally conserved residues”. Effectively, the approach is based on a statistical study of the diverse publicly available HCV sequences, using methods common in statistical signal processing, primarily robust covariance estimation. Our analysis reveals parts of the virus that may be most susceptible to immune pressure, despite the high mutability of the virus. These studies are backed up with clinical evidence and serve as a basis for new vaccine designs that we propose. The talk is directed towards an electrical engineering or statistical signal processing audience, and assumes no prior knowledge of biology or immunology.

Biography — Matthew McKay received his Ph.D. from the University of Sydney, Australia, prior to joining the Hong Kong University of Science and Technology (HKUST), where he is currently the Hari Harilela Associate Professor of Electronic and Computer Engineering. He is currently on leave at MIT as a Visiting Scientist in the Institute for Medical Engineering and Science (IMES). Matthew’s research interests include communications, signal processing, and associated applications. Most recently, he has developed a keen interest in the interdisciplinary areas of computational immunology and financial engineering. He and his coauthors have received best paper awards at IEEE ICASSP 2006, IEEE VTC 2006, ACM IWCMC 2010, IEEE Globecom 2010, and IEEE ICC 2011. He also received a 2010 Young Author Best Paper Award from the IEEE Signal Processing Society, the 2011 Stephen O. Rice Prize in the Field of Communication Theory from the IEEE Communications Society, and the 2011 Young Investigator Research Excellence Award from the School of Engineering at HKUST. In 2013, he was the recipient of the Asia-Pacific Best Young Researcher Award from the IEEE Communications Society.

Transcript

  1. Matthew McKay
    ECE Department
    Hong Kong University of Science and Technology
    Centrale-Supelec
    February 3, 2015
    Signal Processing meets Immunology: Towards a Hepatitis C Vaccine via High-Dimensional Correlation Estimation

  2. Other Team Members
    I-Ming Hsing: Professor, CBME; Head and Professor, BME
    Raymond H. Y. Louie: Visiting Assistant Professor, ECE
    Ahmed Abdul Quadeer: PhD student, ECE
    Arup K. Chakraborty: Robert T. Haslam Professor of Chemical Engineering, Professor of Chemistry, Physics, and Biological Engineering
    Karthik Shekhar: Post-doc, Broad Institute

  3. Outline
    - Immunology Background
    - Vaccine Design – Challenges, Conventional Strategy, and Proposed Idea
    - Correlation Matrix Estimation using RMT
    - Vaccine Design – Details and Validation
    - Conclusions

  4. Virus
    - Invading microbial organism that replicates inside living cells
    - Causes infectious diseases, for example:
      - Human Immunodeficiency Virus (HIV), which leads to AIDS
      - Hepatitis (Hepatitis A, B, and C viruses)
      - Influenza (H1N1, H3N2, H7N9)

  5. Hepatitis C virus (HCV)
    - HCV causes an infectious disease that mainly affects the liver
    - More than 170 million people affected globally
    - Treatment available → Pegylated interferon and ribavirin
      - Expensive
      - Prolonged
      - Extensive side effects
      - Frequently fails
    - No vaccine available!
    Vexing problem: the virus’s extreme mutability

  6. Virus consists of proteins
    [Figure: HCV viral genome]

  7. Proteins consist of sequence of amino acids
    No.  Amino acid     Letter
    1    Alanine        A
    2    Arginine       R
    3    Asparagine     N
    4    Aspartic acid  D
    5    Cysteine       C
    6    Glutamic acid  E
    7    Glutamine      Q
    8    Glycine        G
    9    Histidine      H
    10   Isoleucine     I
    11   Leucine        L
    12   Lysine         K
    13   Methionine     M
    14   Phenylalanine  F
    15   Proline        P
    16   Serine         S
    17   Threonine      T
    18   Tryptophan     W
    19   Tyrosine       Y
    20   Valine         V

  8. Protein properties
    Different proteins have different amino acid sequences and lengths:
    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
    Protein 1 V Y A T T S A S A G L R Q K K
    1 2 3 4 5 6 7 8
    Protein 2 M Q S A A K L R
    The same protein has similar length and amino acid sequence:
    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
    Protein 1 V Y A T T S A S A G L R Q K K
    Protein 1 V A S K T K R S K G L R R K K
    → Same function but different effectiveness

  9. Multiple sequence alignment (MSA)
    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 …
    Sequence 1 V Y A T T S A S A G L R Q V K …
    Sequence 2 V Y S T T K R S K G L R Q K K …
    Sequence 3 V Y S T T S R S K G L R Q K K …
    : : : : : : : : : : : : : : : : …
    Sequence n V Y A T T S R S A G L R Q K K …
    [Figure label: Peptide]
    All observed viral sequences are considered fit

  10. Peptide - MHC
    Pathogen-specific adaptive immune system
    [Figure labels: host cell, virus, infected cell, peptide-MHC, T cell (TCR), B cell (BCR), antibodies]

  11. Peptide - MHC
    Pathogen-specific adaptive immune system
    - Memory of past infections → Basis for vaccination
    - Goal: Find specific peptides that kill a large number of infected cells
    [Figure labels: host cell, virus, infected cell, peptide-MHC, T cell (TCR), CTL, B cell (BCR), antibodies]

  12. Peptide - MHC
    A single mutation in a peptide can abrogate T cell recognition
    - Epitope with no mutation → Recognition and activation
    - Epitope with one mutation → Cannot recognize
    [Figure labels: infected cell, peptide-MHC, T cell (TCR)]

  13. Outline
    - Immunology Background
    - Vaccine Design – Challenges, Conventional Strategy, and Proposed Idea
    - Correlation Matrix Estimation using RMT
    - Vaccine Design – Details and Validation
    - Conclusions

  14. Vaccine Design Challenges
    1. Which type of immune response should the vaccine induce?
    2. Which proteins to target?
    3. Which peptides of the protein to target?

  15. 1. B cell or T cell vaccine?
    - B cell (antibody) based vaccine that targets the external proteins?
    - T cell based vaccine that targets the internal proteins?
    - Experimental and clinical studies reveal that HCV controllers use a broadly directed T cell response to clear the virus
    A T cell based immune response is important in the case of HCV

  16. 2. Which proteins to target?
    - Why NS3?
      - The immune systems of HCV controllers target peptides of NS3
      - Comparatively large number of sequences
    [Figure labels: helicase/protease function, membrane binding function, polymerase function]

  17. 3. Which peptides of the protein to target?
    - Major challenge
    - Difficult to address experimentally
    Use statistical and computational methods to help find a solution, based on the large amount of sequence data now available

  18. Human Genome Project
    - Modern advances in biotechnology are revolutionizing the field of biomedical research
    - Landmark: Human Genome Project
      - Time period: 1990–2003
      - Cost: 3 billion US dollars
    - Advances in genomics paved the way for advanced study in medicine to develop treatments for cancer and other diseases

  19. Increase in data

  20. Open databases
    - Lots of databases!
    - Explosive growth in submissions!
    ... and many more (e.g., UniProt, ProDom, VectorBase, …)

  21. Large number of sequences for many infectious diseases!

  22. 3. Which peptides of the protein to target?
    - Large number of sequences (observations): 2800+ for NS3
    - Large number of amino acids in the protein (variables): 631 in NS3
    Most difficult challenge, to be addressed using high-dimensional correlation matrix estimation

  23. Conventional vaccine design strategy
    A toy example:
    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 …
    Sequence 1 V Y A T T S A S A G L R Q V K …
    Sequence 2 V Y S T T K R S K G L R Q K K …
    Sequence 3 V Y S T T S R S K G L R Q K K …
    : : : : : : : : : : : : : : : : …
    Sequence n V Y A T T S R S A G L R Q K K …
    Consensus sequence V Y A T T S R S A G L R Q K K …
    - No mutation at all → 100% conserved
    - Conventional approach: design a vaccine that elicits a T cell response targeting highly conserved peptides
    - Basis of a recently proposed HCV vaccine, IC-41
    Problem: the high mutability of the virus may result in escape mutations
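
The conservation idea in the toy example can be sketched in a few lines of Python. This is only an illustration: the four sequences are the rows shown on the slide (the real alignment has many more), and the function name `consensus_and_conservation` is made up for this sketch. Ties are resolved in favour of the residue seen first, which is an arbitrary choice.

```python
from collections import Counter

# First 15 positions of the four rows shown in the toy alignment.
msa = [
    "VYATTSASAGLRQVK",
    "VYSTTKRSKGLRQKK",
    "VYSTTSRSKGLRQKK",
    "VYATTSRSAGLRQKK",
]

def consensus_and_conservation(sequences):
    """Return the consensus sequence and, for each site, the fraction of
    sequences that carry the consensus residue at that site."""
    consensus, conservation = [], []
    for column in zip(*sequences):
        # most_common(1) returns the most frequent residue; on a tie it
        # returns the residue encountered first in the column.
        residue, count = Counter(column).most_common(1)[0]
        consensus.append(residue)
        conservation.append(count / len(column))
    return "".join(consensus), conservation

consensus, conservation = consensus_and_conservation(msa)

# 1-based positions where no sequence deviates from the consensus.
fully_conserved = [i + 1 for i, c in enumerate(conservation) if c == 1.0]
print(consensus)
print(fully_conserved)
```

A conventional design would restrict attention to peptides built from the positions in `fully_conserved`.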

  24. Proposed vaccine design approach
    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 …
    Sequence 1 V Y A T T S A S A G L R Q V K …
    Sequence 2 V Y S T T K R S K G L R Q K K …
    Sequence 3 V Y S T T S R S K G L R Q K K …
    : : : : : : : : : : : : : : : : …
    Sequence n V Y A T T S R S A G L R Q K K …
    Consensus
    Sequence
    V Y A T T S R S A G L R Q K K …
    Positively correlated pairs of locations → Beneficial mutations

  25. Proposed vaccine design approach
    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 …
    Sequence 1 V Y A T T S A S A G L R Q V K …
    Sequence 2 V Y S T T K R S K G L R Q K K …
    Sequence 3 V Y S T T S R S K G L R Q K K …
    : : : : : : : : : : : : : : : : …
    Sequence n V Y A T T S R S A G L R Q K K …
    Consensus
    Sequence
    V Y A T T S R S A G L R Q K K …
    - Positively correlated pairs of locations → Beneficial mutations
    - Negatively correlated pairs of locations → Harmful mutations
    Target the negatively correlated pairs of locations along with the 100% conserved ones, and avoid the positively correlated pairs of locations
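
To make "correlated pairs of locations" concrete, here is a minimal sketch of one way to compute pairwise correlations from an alignment. The binary encoding (1 if a residue differs from the consensus, 0 otherwise) is an assumption made for this example; the slides do not state how the alignment is encoded. The function names are likewise hypothetical.

```python
import numpy as np

# Toy alignment rows and consensus as shown on the slide.
msa = [
    "VYATTSASAGLRQVK",
    "VYSTTKRSKGLRQKK",
    "VYSTTSRSKGLRQKK",
    "VYATTSRSAGLRQKK",
]
consensus = "VYATTSRSAGLRQKK"

def mutation_matrix(sequences, consensus):
    """Binary matrix X with X[k, i] = 1 if sequence k differs from the
    consensus at site i (assumed encoding, see the lead-in)."""
    return np.array(
        [[float(r != c) for r, c in zip(seq, consensus)] for seq in sequences]
    )

def sample_correlation(X):
    """Pearson correlation between sites. Fully conserved sites have zero
    variance, so their rows and columns are left at zero."""
    Xc = X - X.mean(axis=0)
    std = Xc.std(axis=0)
    variable = std > 0
    Z = np.zeros_like(Xc)
    Z[:, variable] = Xc[:, variable] / std[variable]
    return Z.T @ Z / X.shape[0], variable

X = mutation_matrix(msa, consensus)
C, variable = sample_correlation(X)

# Sites 3 and 9 (indices 2 and 8) mutate together in this toy alignment,
# while sites 3 and 7 (indices 2 and 6) never mutate in the same sequence.
print(C[2, 8], C[2, 6])
```

With only four sequences these numbers are dominated by sampling noise, which is the motivation for the cleaning step discussed later in the talk.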

  26. Outline
    - Immunology Background
    - Vaccine Design – Challenges, Conventional Strategy, and Proposed Idea
    - Correlation Matrix Estimation using RMT
    - Vaccine Design – Details and Validation
    - Conclusions

  27. Technical problem …
    - Large number of sequences (observations): 2800+ for NS3
    - Large number of amino acids in the protein (variables): 631 in NS3
    Challenge: accurate high-dimensional correlation estimation

  28. Correlation matrix estimation
    - Examples
      - Portfolio management and risk assessment
      - Array processing
      - Designing wireless communication receivers
    - Number of observations ≈ number of variables
    - The sample correlation matrix is known to perform poorly in this regime [Johnstone, 2001]

  29. Basis - RMT application in finance
    - Random Matrix Theory (RMT) for noise-cleaning in finance
    - RMT is also instrumental in modern communication system design, such as WiFi and cellular phones
    - HIV work by Arup Chakraborty (MIT) [PNAS, 2011]
      - Finding HIV sectors (groups of amino acids)
      - Designing a vaccine to attack such sectors
      - Vaccine trials in progress
    [Photo captions: Bouchaud, Stanley, Arup K. Chakraborty]

  30. In the news…

  31. Method
    1. Obtain the Multiple Sequence Alignment (MSA)
    2. Construct the sample correlation matrix from the MSA
    3. Clean the correlation matrix using RMT
    4. Design an immunogen targeting the highly conserved and negatively correlated pairs of sites
    - Advantages:
      - The results can potentially yield significant improvements over IC-41
      - Such vaccine strategies can be explored with computational methods
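
The cleaning step can be illustrated with a common RMT recipe: keep only the eigenmodes whose eigenvalue exceeds the upper edge of the Marchenko-Pastur bulk, (1 + sqrt(p/n))^2 for p variables and n observations. This is a sketch under that assumption; the slides do not spell out the exact cleaning rule used in the talk, and the function names are hypothetical.

```python
import numpy as np

def mp_upper_edge(n_samples, n_vars):
    """Upper edge of the Marchenko-Pastur bulk for the correlation matrix
    of n_vars uncorrelated variables observed n_samples times."""
    return (1.0 + np.sqrt(n_vars / n_samples)) ** 2

def clean_correlation(C, n_samples):
    """Rebuild C from the eigenmodes whose eigenvalue exceeds the MP edge."""
    eigvals, eigvecs = np.linalg.eigh(C)  # eigenvalues in ascending order
    keep = eigvals > mp_upper_edge(n_samples, C.shape[0])
    kept_vecs = eigvecs[:, keep]
    # Sum of lambda_k * v_k v_k^T over the retained modes.
    C_clean = (kept_vecs * eigvals[keep]) @ kept_vecs.T
    return C_clean, int(keep.sum())

# Hand-built 5x5 example: sites 0-2 are mutually correlated (0.8),
# sites 3-4 are uncorrelated with everything.
C = np.eye(5)
C[:3, :3] = 0.8
np.fill_diagonal(C, 1.0)
C_clean, n_kept = clean_correlation(C, n_samples=1000)

# Threshold for the dimensions quoted in the talk (2800 sequences, 631 sites).
ns3_edge = mp_upper_edge(2800, 631)
print(n_kept, round(float(ns3_edge), 3))
```

Note that the diagonal of `C_clean` is no longer 1 after clipping; whether to restore it is a design choice this sketch leaves open.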

  32. Sample correlation matrix
    [Figure: sample correlation matrix]

  33. Cleaned correlation matrix
    [Figure: cleaned correlation matrix; annotations: statistical noise, phylogenetic noise]

  34. Alternative covariance matrix estimation methods
    - Regularized (shrinkage) methods [Ledoit et al., 2004; Ledoit et al., 2012]
    - Sparse covariance matrix estimation [Bickel et al., 2008; Cai et al., 2012]
    - Sparse PCA [Johnstone et al., 2009; Paul et al., 2012; Ma, 2013; Vu, 2013; Liu et al., 2014]
    - Robust estimation [Maronna, 1976; Couillet et al., 2013; Zheng et al., 2014]

  35. Outline
    - Immunology Background
    - Vaccine Design – Challenges, Conventional Strategy, and Proposed Idea
    - Correlation Matrix Estimation using RMT
    - Vaccine Design – Details and Validation
    - Conclusions

  36. Important factors in the proposed vaccine design
    1. Metric L - calculated based on correlations
    2. Population coverage
    [Figure labels: host cell, MHC, peptide, T cell]

  37. 1. Metric L - calculated based on correlations
    - PCP = Percentage of 100% conserved pairs
    - PNCP = Percentage of negatively correlated pairs
    - PPCP = Percentage of positively correlated pairs
    - PUCP = Percentage of uncorrelated pairs
    Vaccine design objective: maximize L = PCP + PNCP − PPCP − PUCP
    [Figure labels: Peptide 1, Peptide 1 with single mutation; Peptide 2, Peptide 2 with single mutation]
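
A rough sketch of how L could be evaluated for a set of sites is given below. Several details are assumptions made for illustration, since the slide only gives the formula: a pair counts as "100% conserved" when both of its sites are fully conserved, the remaining pairs are classified by the sign of the cleaned correlation with a hypothetical `threshold`, and percentages are taken over all pairs of sites in the set.

```python
from itertools import combinations

import numpy as np

def l_metric(sites, conservation, C_clean, threshold=0.0):
    """Return L = PCP + PNCP - PPCP - PUCP (in percent) for a set of sites,
    together with the four percentages."""
    counts = {"conserved": 0, "negative": 0, "positive": 0, "uncorrelated": 0}
    pairs = list(combinations(sites, 2))
    for i, j in pairs:
        if conservation[i] == 1.0 and conservation[j] == 1.0:
            counts["conserved"] += 1
        elif C_clean[i, j] > threshold:
            counts["positive"] += 1
        elif C_clean[i, j] < -threshold:
            counts["negative"] += 1
        else:
            counts["uncorrelated"] += 1
    pct = {name: 100.0 * n / len(pairs) for name, n in counts.items()}
    L = pct["conserved"] + pct["negative"] - pct["positive"] - pct["uncorrelated"]
    return L, pct

# Made-up four-site example: sites 0 and 1 are fully conserved,
# sites 2 and 3 are variable and negatively correlated.
conservation = [1.0, 1.0, 0.9, 0.8]
C_clean = np.zeros((4, 4))
C_clean[2, 3] = C_clean[3, 2] = -0.3

L, pct = l_metric([0, 1, 2, 3], conservation, C_clean)
print(round(L, 2), pct)
```

Because the four percentages sum to 100, L under these assumptions always lies between −100 and 100.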

  38. 2. Population Coverage
    - Different people have different types of MHC molecules
    - Different MHC molecules may present different peptides
    - Thus different people may present different peptides
    Differences in MHC molecules lead to the presentation of different peptides across populations
    [Figure labels: cell, MHC molecules, Person 1 to Person 5]

  39. 2. Population Coverage
    - Challenge: designing a vaccine that covers a large proportion of the population
    - Information required:
      - Detailed statistics of the distribution of MHCs in a given population
      - Data on NS3 peptides presented by particular MHCs (IEDB database)
    [Figure: the same sequence, positions 1 to 26 (V Y A T T S A S A G L R Q K K R E D K M V L K F G S …), shown for Person 1 to Person 4]

  40. Statistics of haplotypes in US Caucasian population [Maiers et al., 2007]

  41. Proposed T cell vaccine design
    - A list of 32 peptides recognized by T cells in individuals in a large proportion of the US Caucasian population was compiled
    - We consider a 5-peptide vaccine design for this population as an example
    APITAYAQQTRGLLGCIITSLTGRDKNQVEGEVQIVSTAAQTFLATCINGVCWTVYHGAGTRTIASPKGPVIQMYTNVDQDLV
    GWPAPQGARSLTPCTCGSSDLYLVTRHADVIPVRRRGDSRGSLLSPRPISYLKGSSGGPLLCPAGHAVGIFRAAVCTRGVAKAV
    DFIPVENLETTMRSPVFTDNSSPPAVPQSFQVAHLHAPTGSGKSTKVPAAYAAQGYKVLVLNPSVAATLGFGAYMSKAHGI
    DPNIRTGVRTITTGSPITYSTYGKFLADGGCSGGAYDIIICDECHSTDATSILGIGTVLDQAETAGARLVVLATATPPGSVTVPHP
    NIEEVALSTTGEIPFYGKAIPLEVIKGGRHLIFCHSKKKCDELAAKLVALGINAVAYYRGLDVSVIPTSGDVVVVATDALMT
    GFTGDFDSVIDCNTCVTQTVDFSLDPTFTIETTTLPQDAVSRTQRRGRTGRGKPGIYRFVAPGERPSGMFDSSVLCECYDAGCA
    WYELTPAETTVRLRAYMNTPGLPVCQDHLEFWEGVFTGLTHIDAHFLSQTKQSGENLPYLVAYQATVCARAQAPPPSW
    DQMWKCLIRLKPTLHGPTPLLYRLGAVQNEVTLTHPITKYIMTCMSADLEVVT

  42. Proposed T cell vaccine design
    - Obtain the 10 combinations with maximum L (effectiveness of the combination at killing viruses)
    - Order them with respect to Dcov (double coverage)
    Combination Peptide 1 Peptide 2 Peptide 3 Peptide 4 Peptide 5 L Dcov
    1 1251-1259 1292-1300 1436-1444 1585-1594 1585-1595 63.58 0.50
    2 1123-1131 1169-1177 1251-1259 1292-1300 1436-1444 61.62 0.44
    3 1123-1131 1175-1183 1251-1259 1292-1300 1436-1444 65.45 0.37
    4 1123-1131 1175-1183 1251-1259 1359-1367 1436-1444 61.62 0.37
    5 1169-1177 1175-1183 1251-1259 1292-1300 1436-1444 64.46 0.34
    6 1123-1131 1251-1259 1292-1300 1359-1367 1436-1444 65.45 0.30
    7 1251-1259 1292-1300 1436-1444 1540-1550 1541-1550 61.31 0.18
    8 1169-1177 1251-1259 1292-1300 1359-1367 1436-1444 61.62 0.14
    9 1175-1183 1251-1259 1292-1300 1359-1367 1436-1444 65.45 0.07
    10 1123-1131 1175-1183 1251-1259 1292-1300 1359-1367 61.62 0.07
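
The two-stage selection on this slide (keep the combinations with the highest L, then order them by Dcov) can be sketched as follows. The scoring callables `l_score` and `d_cov` are placeholders supplied by the caller; the additive toy scores below are invented purely to make the example runnable and are not how L or Dcov are computed in the talk.

```python
import heapq
from itertools import combinations
from math import comb

def top_combinations(peptides, size, l_score, d_cov, keep=10):
    """Keep the `keep` combinations with the highest L, then order those
    by double coverage, highest first."""
    best = heapq.nlargest(keep, combinations(peptides, size), key=l_score)
    return sorted(best, key=d_cov, reverse=True)

# Size of the search space: 5 peptides out of 32, and (for the IC41
# comparison, ranked out of 496) 2 peptides out of 32.
n_five = comb(32, 5)
n_two = comb(32, 2)

# Toy stand-ins for the real scores (hypothetical values).
toy_l = {"p1": 5.0, "p2": 4.0, "p3": 3.0, "p4": 1.0}
toy_cov = {"p1": 0.1, "p2": 0.4, "p3": 0.3, "p4": 0.9}

ranked = top_combinations(
    sorted(toy_l),
    size=2,
    l_score=lambda combo: sum(toy_l[p] for p in combo),
    d_cov=lambda combo: min(toy_cov[p] for p in combo),
    keep=3,
)
print(n_five, n_two, ranked[0])
```

Exhaustive enumeration is feasible here because the search space is only a few hundred thousand combinations.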

  43. Analysis of NS3 peptides of IC41
    - Plus point → No positively correlated pairs of sites!
    - Rank in 2-peptide vaccine design → 71/496
    [Figure: two bar charts over combinations of 2 NS3 peptides (1, IC41, 2, 3, 4, 5). Panels: double coverage; mean conservation across all genotypes. L-score labels shown on the chart: 67.03, 38.34, 75.44, 72.55, 80.39, 86.93]

  44. Validation
    - Experiments
    - Existing clinical and experimental data
      - Cannot directly validate proposed peptides
    - Validation strategy:
      1. Identify a group/sector of potentially vulnerable sites (negatively correlated) that are collectively coupled
      2. Validate this sector by comparing with structural and clinical data
      3. Check whether our vaccine targets the sites in this sector

  45. 1. Identify sectors of potentially vulnerable sites
    - Use a clustering algorithm based on the eigenvectors of C_cleaned
    - Finance → Economic sectors
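
The slide does not say which clustering algorithm is used, so the following is only one simple possibility, shown to make the idea tangible: assign each site to the leading eigenvector of the cleaned matrix on which it has the largest absolute loading, and leave weakly loaded sites unassigned. The function name and the `cutoff` parameter are hypothetical.

```python
import numpy as np

def sectors_from_eigenvectors(C_clean, n_sectors, cutoff=0.1):
    """Label each site with the index (0 = largest eigenvalue) of the
    leading eigenvector on which it loads most strongly; sites whose
    largest absolute loading is below `cutoff` get the label -1."""
    eigvals, eigvecs = np.linalg.eigh(C_clean)
    order = np.argsort(eigvals)[::-1][:n_sectors]  # largest eigenvalues first
    loadings = np.abs(eigvecs[:, order])
    labels = loadings.argmax(axis=1)
    labels[loadings.max(axis=1) < cutoff] = -1
    return labels

# Hand-built example: sites 0-2 form one coupled group, sites 3-4 another,
# and site 5 is coupled to nothing.
C_clean = np.zeros((6, 6))
C_clean[:3, :3] = 0.9
C_clean[3:5, 3:5] = 0.5

labels = sectors_from_eigenvectors(C_clean, n_sectors=2)
print(labels.tolist())
```

In the finance analogy on the slide, the groups recovered this way would correspond to economic sectors of co-moving stocks.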

  46. Three sectors of co-evolving sites in NS3
    Sector 1 consists of the most immunologically vulnerable sites
    [Figure: 3-D scatter plot of eigenvectors; bar charts per sector (1, 2, 3) of mean conservation, % positive correlations, % negative correlations, and neg/pos correlations]

  47. 2. Structural significance of sector 1
    Sector 1 sites are dominant in the critical interface of the NS3 crystal structure (p-value < 0.01)
    [Figure: NS3 crystal structure; red = sector 1 sites]

  48. 2. Significance of sector 1 based on previously published experimental and clinical results
    The majority of peptides targeted by “HCV Controllers” consist predominantly of sector 1 sites (p-value < 0.05).
    [Figure: bar chart of % sector 1 sites per epitope, for allele-independent epitopes (1 to 3) and allele-restricted epitopes (1 to 13); a ">30%" level is marked]

  49. 3. Sector 1 sites in proposed vaccine design
    Combination Peptide1 Peptide2 Peptide3 Peptide4 Peptide5 L Dcov
    1 1251-1259 1292-1300 1436-1444 1585-1594 1585-1595 63.58 0.50
    2 1123-1131 1169-1177 1251-1259 1292-1300 1436-1444 61.62 0.44
    3 1123-1131 1175-1183 1251-1259 1292-1300 1436-1444 65.45 0.37
    4 1123-1131 1175-1183 1251-1259 1359-1367 1436-1444 61.62 0.37
    5 1169-1177 1175-1183 1251-1259 1292-1300 1436-1444 64.46 0.34
    6 1123-1131 1251-1259 1292-1300 1359-1367 1436-1444 65.45 0.30
    7 1251-1259 1292-1300 1436-1444 1540-1550 1541-1550 61.31 0.18
    8 1169-1177 1251-1259 1292-1300 1359-1367 1436-1444 61.62 0.14
    9 1175-1183 1251-1259 1292-1300 1359-1367 1436-1444 65.45 0.07
    10 1123-1131 1175-1183 1251-1259 1292-1300 1359-1367 61.62 0.07
    A large proportion (~60%) of sites in the proposed vaccine design belong to sector 1 (p-value < 0.01)

  50. Conclusions
    - The majority of the sites in the proposed design belong to sector 1, which appears to be significant based on experimental and clinical data available in the literature
    - Numerical validation of the currently proposed vaccine design, IC-41
    - Proposal of new vaccine design strategies which can:
      - Potentially improve upon IC-41 by inducing an immune response against more vulnerable parts of the HCV genome
      - Cover a large portion of the population (currently, for the US)
    - Similar analysis for the NS4B and NS5B proteins also reveals potential sites for vaccine design
    Next step: Experimental trials!

  51. Conclusions
    - There is much similarity between high-dimensional statistical problems in immunology and those in signal processing
    - Many methods common in SP find direct application (though currently not well explored):
      - Maximum entropy modeling
      - Sampling methods (e.g., MCMC)
      - Sparsity
      - Subspace estimation
      - Robust estimation
      - Machine learning
      - …

  52. Related Publications
    - A. A. Quadeer, R. H. Y. Louie, K. Shekhar, A. K. Chakraborty, I. Hsing, and M. R. McKay, “Discovering statistical vulnerabilities in highly mutable viruses: a random matrix approach,” in Proc. of the IEEE Workshop on Statistical Signal Processing (SSP), Gold Coast, Australia, July 2014.
    - A. A. Quadeer, R. H. Y. Louie, K. Shekhar, A. K. Chakraborty, I. Hsing, and M. R. McKay, “Statistical linkage of substitutions in patient-derived sequences of genotype 1a hepatitis C virus non-structural protein 3 exposes targets for immunogen design,” Journal of Virology, 88 (13), pp. 7628-7644, July 2014.

  53. Join us in Brisbane
    19 – 24 April 2015
    www.icassp2015.org