
Matthew McKay

S³ Seminar
February 03, 2015

Matthew McKay (Hong Kong University of Science and Technology)

https://s3-seminar.github.io/seminars/matthew-mckay/

Title — Signal Processing meets Immunology: Towards a Hepatitis C Vaccine via High-Dimensional Covariance Estimation

Abstract — Chronic Hepatitis C Virus (HCV) infection is one of the leading causes of liver failure and liver cancer, affecting around 3% of the world’s population. Current treatment for HCV is expensive, frequently fails, and has severe side effects. Thus, there is an urgent need for an effective HCV vaccine. The main obstacle to designing an HCV vaccine is the virus’s extreme variability, which helps it evade immune surveillance. This talk will discuss a new approach to HCV vaccine design based on finding “multi-dimensionally conserved residues”. The approach rests on a statistical study of the diverse, publicly available HCV sequences, using methods common in statistical signal processing, primarily robust covariance estimation. Our analysis reveals parts of the virus that may be most susceptible to immune pressure, despite the virus’s high mutability. These studies are backed up by clinical evidence and serve as a basis for the new vaccine designs that we propose. The talk is directed towards an electrical engineering or statistical signal processing audience, and assumes no prior knowledge of biology or immunology.

Biography — Matthew McKay received his Ph.D. from the University of Sydney, Australia, before joining the Hong Kong University of Science and Technology (HKUST), where he is currently the Hari Harilela Associate Professor of Electronic and Computer Engineering. He is currently on leave at MIT as a Visiting Scientist in the Institute for Medical Engineering and Science (IMES). Matthew’s research interests include communications, signal processing, and their applications. Most recently, he has developed a keen interest in the interdisciplinary areas of computational immunology and financial engineering. He and his coauthors have received best paper awards at IEEE ICASSP 2006, IEEE VTC 2006, ACM IWCMC 2010, IEEE Globecom 2010, and IEEE ICC 2011. He also received a 2010 Young Author Best Paper Award from the IEEE Signal Processing Society, the 2011 Stephen O. Rice Prize in the Field of Communication Theory from the IEEE Communications Society, and the 2011 Young Investigator Research Excellence Award from the School of Engineering at HKUST. In 2013, he was the recipient of the Asia-Pacific Best Young Researcher Award from the IEEE Communications Society.


Transcript

  1. Matthew McKay, ECE Department, Hong Kong University of Science and Technology
    Centrale-Supelec, February 3, 2015
    Signal Processing meets Immunology: Towards a Hepatitis C Vaccine via High-Dimensional Correlation Estimation
  2. Other Team Members
    - I-Ming Hsing, Professor, CBME; Head and Professor, BME
    - Raymond H. Y. Louie, Visiting Assistant Professor, ECE
    - Ahmed Abdul Quadeer, PhD student, ECE
    - Arup K. Chakraborty, Robert T. Haslam Professor of Chemical Engineering; Professor of Chemistry, Physics, and Biological Engineering
    - Karthik Shekhar, Post-doc, Broad Institute
  3. Outline
    - Immunology Background
    - Vaccine Design – Challenges, Conventional Strategy, and Proposed Idea
    - Correlation Matrix Estimation using RMT
    - Vaccine Design – Details and Validation
    - Conclusions
  4. Virus
    - Invading microbial organism that replicates inside living cells
    - Causes infectious diseases such as:
      - Human Immunodeficiency Virus (HIV), which leads to AIDS
      - Hepatitis (hepatitis A, B, and C viruses)
      - Influenza (H1N1, H3N2, H7N9)
  5. Hepatitis C virus (HCV)
    - HCV causes an infectious disease that mainly affects the liver
    - More than 170 million people are affected globally
    - Treatment available: pegylated interferon and ribavirin
      - Expensive
      - Prolonged
      - Extensive side effects
      - Frequently fails
    - No vaccine available!
    Vexing problem: the virus’s extreme mutability
  6. Virus consists of proteins
    [Figure: HCV viral genome]

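  7. Proteins consist of a sequence of amino acids

    | No. | Amino acid | Letter |
    |---|---|---|
    | 1 | Alanine | A |
    | 2 | Arginine | R |
    | 3 | Asparagine | N |
    | 4 | Aspartic acid | D |
    | 5 | Cysteine | C |
    | 6 | Glutamic acid | E |
    | 7 | Glutamine | Q |
    | 8 | Glycine | G |
    | 9 | Histidine | H |
    | 10 | Isoleucine | I |
    | 11 | Leucine | L |
    | 12 | Lysine | K |
    | 13 | Methionine | M |
    | 14 | Phenylalanine | F |
    | 15 | Proline | P |
    | 16 | Serine | S |
    | 17 | Threonine | T |
    | 18 | Tryptophan | W |
    | 19 | Tyrosine | Y |
    | 20 | Valine | V |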
  8. Protein properties
    - The same protein has similar length and amino acid sequence: same function, but different effectiveness
        Protein 1: VYATTSASAGLRQKK
        Protein 1: VASKTKRSKGLRRKK
    - Different proteins have different amino acid sequences and lengths
        Protein 1: VYATTSASAGLRQKK (15 residues)
        Protein 2: MQSAAKLR (8 residues)
  9. Multiple sequence alignment (MSA)
    Positions 1–15 of each aligned sequence:
    Sequence 1: VYATTSASAGLRQVK…
    Sequence 2: VYSTTKRSKGLRQKK…
    Sequence 3: VYSTTSRSKGLRQKK…
    ⋮
    Sequence n: VYATTSRSAGLRQKK…
    [Figure label: peptide (a window of consecutive positions)]
    All observed viral sequences are considered fit.
  10. Pathogen-specific adaptive immune system
    [Figure labels: virus, host cell, infected cell, peptide-MHC, B cell, BCR, antibodies, T cell, TCR]
  11. Pathogen-specific adaptive immune system
    - Memory of past infections
    - Basis for vaccination
    - Goal: find specific peptides whose recognition by T cells leads to killing of a large number of infected cells
    [Figure labels: virus, host cell, infected cell, peptide-MHC, B cell, BCR, antibodies, T cell, TCR, CTL]
  12. Peptide-MHC recognition by T cells
    - Epitope with no mutation: the T cell (via its TCR) recognizes the infected cell and is activated
    - Epitope with one mutation: the T cell cannot recognize it
    A single mutation in a peptide can abrogate T cell recognition.
  13. Outline
    - Immunology Background
    - Vaccine Design – Challenges, Conventional Strategy, and Proposed Idea
    - Correlation Matrix Estimation using RMT
    - Vaccine Design – Details and Validation
    - Conclusions
  14. Vaccine Design Challenges
    1. Which type of immune response should the vaccine induce?
    2. Which proteins to target?
    3. Which peptides of the protein to target?
  15. 1. B cell or T cell vaccine?
    - A B cell (antibody) based vaccine that targets the external proteins?
    - A T cell based vaccine that targets the internal proteins?
    - Experimental and clinical studies reveal that HCV controllers use a broadly directed T cell response to clear the virus
    A T cell based immune response is important in the case of HCV.
  16. 2. Which proteins to target?
    - Why NS3?
      - The immune systems of HCV controllers target peptides of NS3
      - Comparatively large number of sequences
    [Figure: HCV proteins annotated by function: helicase/protease, membrane binding, polymerase]
  17. 3. Which peptides of the protein to target?
    - Major challenge
    - Difficult to address experimentally
    Use statistical and computational methods to help find a solution, based on the large amount of sequence data now available.
  18. Human Genome Project
    - Modern advances in biotechnology are revolutionizing the field of biomedical research
    - Landmark: the Human Genome Project
      - Time period: 1990–2003
      - Cost: 3 billion US dollars
    - Advances in genomics paved the way for advanced medical research, such as developing treatments for cancer and other diseases
  19. Increase in data

  20. Open databases
    - Lots of databases! (and many more, e.g., UniProt, ProDom, VectorBase, …)
    - Explosive growth in submissions!
  21. Large number of sequences for many infectious diseases!

  22. 3. Which peptides of the protein to target?
    - Large number of sequences (observations): 2800+ in NS3
    - Large number of amino acids in the protein (variables): 631 in NS3
    This is the most difficult challenge; we address it using high-dimensional correlation matrix estimation.
  23. A toy example: conventional vaccine design strategy
    Sequence 1: VYATTSASAGLRQVK…
    Sequence 2: VYSTTKRSKGLRQKK…
    Sequence 3: VYSTTSRSKGLRQKK…
    ⋮
    Sequence n: VYATTSRSAGLRQKK…
    Consensus:  VYATTSRSAGLRQKK…
    - No mutation at all: 100% conserved
    - Conventional approach: design a vaccine that elicits a T cell response targeting highly conserved peptides
    - Basis of a recently proposed HCV vaccine, IC-41
    Problem: the high mutability of the virus may result in escape mutations.
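The conventional strategy rests on two per-site statistics of the alignment: the consensus residue and the fraction of sequences carrying it. A minimal Python sketch (NumPy assumed) follows. It treats the four rows shown on the slide as the entire alignment, which is an assumption, since the slide elides the rest. The helper names are hypothetical. The sketch also builds a binary mutation matrix (1 = differs from consensus), a common encoding for the correlation analysis of the next slides; that encoding is our assumption and is not stated on the slides.

```python
import numpy as np

# Toy alignment: positions 1-15 of the four sequences shown on the slide.
# Treating these four rows as the whole alignment is an assumption; the real
# NS3 alignment has 2800+ sequences and 631 positions.
msa = [
    "VYATTSASAGLRQVK",
    "VYSTTKRSKGLRQKK",
    "VYSTTSRSKGLRQKK",
    "VYATTSRSAGLRQKK",
]


def consensus_and_conservation(seqs):
    """Consensus residue per site and the fraction of sequences carrying it."""
    arr = np.array([list(s) for s in seqs])  # shape (n_sequences, n_sites)
    consensus, conservation = [], []
    for column in arr.T:
        residues, counts = np.unique(column, return_counts=True)
        best = np.argmax(counts)  # ties go to the alphabetically first residue
        consensus.append(str(residues[best]))
        conservation.append(counts[best] / len(column))
    return "".join(consensus), np.array(conservation)


def binary_mutation_matrix(seqs, consensus):
    """1 where a sequence differs from the consensus residue, 0 where it matches."""
    arr = np.array([list(s) for s in seqs])
    return (arr != np.array(list(consensus))).astype(int)


cons, cons_frac = consensus_and_conservation(msa)
fully_conserved = (np.flatnonzero(cons_frac == 1.0) + 1).tolist()  # 1-based sites
X = binary_mutation_matrix(msa, cons)

print("consensus:", cons)
print("100% conserved sites:", fully_conserved)
```

On these four rows, the computed consensus coincides with the consensus sequence shown on the slide. Ties (positions 3 and 9) are broken alphabetically here, which is an arbitrary choice.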
  24. Proposed vaccine design approach
    [Same toy alignment and consensus sequence as slide 23]
    - Positively correlated pairs of locations → beneficial mutations (for the virus)
  25. Proposed vaccine design approach
    [Same toy alignment and consensus sequence as slide 23]
    - Positively correlated pairs of locations → beneficial mutations (for the virus)
    - Negatively correlated pairs of locations → harmful mutations (for the virus)
    Target the negatively correlated pairs of locations along with the 100% conserved ones, and avoid the positively correlated pairs of locations.
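To make "positively" and "negatively correlated pairs of locations" concrete, the sketch below correlates the binary mutation indicators of every pair of sites (1 = residue differs from the consensus) and sorts the pairs by sign. It uses the raw sample correlation; the talk works with a cleaned matrix. The function name and the toy data are hypothetical. Fully conserved sites have zero variance, so their correlations are undefined; here they are set to 0, which is our choice.

```python
import numpy as np


def classify_site_pairs(X, tol=1e-12):
    """Pearson correlations between the binary mutation indicators of all sites,
    plus the lists of positively and negatively correlated site pairs.

    X: (n_sequences, n_sites) array, 1 = residue differs from the consensus.
    Fully conserved sites have zero variance, so their correlations are set to 0.
    """
    n_sites = X.shape[1]
    variable = np.flatnonzero(X.std(axis=0) > 0)
    C = np.zeros((n_sites, n_sites))
    if variable.size > 0:
        C[np.ix_(variable, variable)] = np.corrcoef(X[:, variable], rowvar=False)
    positive, negative = [], []
    for i in range(n_sites):
        for j in range(i + 1, n_sites):
            if C[i, j] > tol:
                positive.append((i, j))
            elif C[i, j] < -tol:
                negative.append((i, j))
    return C, positive, negative


# Toy data (hypothetical): sites 0 and 1 always mutate together, site 2 mutates
# only in sequences where sites 0 and 1 do not, and site 3 never mutates.
X = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
])
C, positive, negative = classify_site_pairs(X)
print("positive pairs:", positive)
print("negative pairs:", negative)
```

In this toy data, the pair (0, 1) co-mutates, which is the pattern the slide reads as a beneficial (compensatory) double mutation. Sites 0 and 2 never mutate together, the pattern read as a harmful combination.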
  26. Outline
    - Immunology Background
    - Vaccine Design – Challenges, Conventional Strategy, and Proposed Idea
    - Correlation Matrix Estimation using RMT
    - Vaccine Design – Details and Validation
    - Conclusions
  27. Technical problem …
    - Large number of sequences (observations): 2800+ in NS3
    - Large number of amino acids in the protein (variables): 631 in NS3
    Challenge: accurate high-dimensional correlation estimation
  28. Correlation matrix estimation
    - Examples:
      - Portfolio management and risk assessment
      - Array processing
      - Designing wireless communication receivers
    - Number of observations ≈ number of variables
    - In this regime, the sample correlation matrix is known to perform poorly [Johnstone, 2001]
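Why the sample correlation matrix struggles when observations and variables are comparable can be seen from pure noise. The following sketch uses the standard Marchenko–Pastur result from random matrix theory, which is our addition and not shown on the slide. With n observations and p variables (q = p/n ≤ 1), the eigenvalues of a noise-only sample correlation matrix fill roughly [(1 − √q)², (1 + √q)²] instead of all equalling 1. The sizes are the NS3 figures quoted on the slides, with 2800 used for "2800+"; the data are simulated Gaussian noise, not sequences.

```python
import numpy as np


def marchenko_pastur_edges(n_obs, n_vars):
    """Edges [lambda_minus, lambda_plus] of the Marchenko-Pastur law for the
    eigenvalues of a pure-noise sample correlation matrix (q = n_vars/n_obs <= 1)."""
    q = n_vars / n_obs
    return (1 - np.sqrt(q)) ** 2, (1 + np.sqrt(q)) ** 2


n_obs, n_vars = 2800, 631  # NS3 sizes quoted on the slides (2800 used for "2800+")
lam_minus, lam_plus = marchenko_pastur_edges(n_obs, n_vars)

# Pure noise: the true correlation matrix is the identity (every eigenvalue is 1),
# yet the sample eigenvalues spread over roughly [lam_minus, lam_plus].
rng = np.random.default_rng(0)
noise = rng.standard_normal((n_obs, n_vars))
eigs = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))

print(f"Marchenko-Pastur edges: [{lam_minus:.3f}, {lam_plus:.3f}]")
print(f"Sample eigenvalue range: [{eigs.min():.3f}, {eigs.max():.3f}]")
```

Any eigenvalue of the real data inside this band is therefore hard to distinguish from sampling noise. This motivates the RMT-based cleaning that follows.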
  29. Basis: RMT application in finance
    - Random matrix theory (RMT) for noise cleaning in finance
    - RMT is also instrumental in modern communication system design, such as WiFi and cellular phones
    - HIV work by Arup Chakraborty (MIT) [PNAS, 2011]
      - Finding HIV sectors (groups of amino acids)
      - Designing a vaccine to attack such sectors
      - Vaccine trials in progress
    [Photos: Bouchaud, Stanley, Arup K. Chakraborty]
  30. In the news…

  31. Method
    Advantages:
    - The results can potentially yield significant improvements over IC-41
    - Such vaccine strategies can be explored with computational methods
    Steps:
    1. Obtain the multiple sequence alignment (MSA)
    2. Construct the sample correlation matrix from the MSA
    3. Clean the correlation matrix using RMT
    4. Design an immunogen targeting the highly conserved sites and the negatively correlated pairs of sites
  32. Sample correlation matrix

  33. Cleaned correlation matrix
    [Figure: removal of statistical noise and phylogenetic noise]
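A common way to implement the cleaning step follows the RMT noise-cleaning idea from finance cited on slide 29. Eigenmodes below the Marchenko–Pastur upper edge are treated as statistical noise, and the single largest mode is discarded as a collective (here, phylogenetic) effect. The sketch below assumes exactly that rule. The function name `rmt_clean` is hypothetical, and the authors' precise procedure may differ.

```python
import numpy as np


def rmt_clean(X):
    """Sketch of RMT-based cleaning of a sample correlation matrix.

    X: (n_obs, n_vars) data matrix, e.g. binary mutation indicators with
    fully conserved (zero-variance) sites removed beforehand.
    Keeps only eigenmodes whose eigenvalues exceed the Marchenko-Pastur upper
    edge (treated as signal), then drops the single largest mode, which is
    assumed here to capture phylogenetic (collective) effects.
    """
    n_obs, n_vars = X.shape
    C = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(C)  # ascending order
    lam_plus = (1 + np.sqrt(n_vars / n_obs)) ** 2
    signal = np.flatnonzero(eigvals > lam_plus)
    signal = signal[signal != np.argmax(eigvals)]  # drop the top ("phylogenetic") mode
    V = eigvecs[:, signal]
    C_clean = V @ np.diag(eigvals[signal]) @ V.T
    return C_clean, eigvals[signal]
```

As a usage sketch, data generated with one common factor on all 50 variables plus separate factors on variables 0–9 and 10–19 leaves two retained modes once the common mode is removed. In `C_clean`, variables within the same group then come out positively correlated, and variables in different groups come out negatively correlated.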

  34. Alternative covariance matrix estimation methods
    - Regularized (shrinkage) methods [Ledoit et al., 2004; Ledoit et al., 2012]
    - Sparse covariance matrix estimation [Bickel et al., 2008; Cai et al., 2012]
    - Sparse PCA [Johnstone et al., 2009; Paul et al., 2012; Ma, 2013; Vu, 2013; Liu et al., 2014]
    - Robust estimation [Maronna, 1976; Couillet et al., 2013; Zheng et al., 2014]
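The first family on this slide, shrinkage, can be illustrated in a few lines. The sketch below shrinks the sample correlation matrix toward the identity with a fixed, hand-picked intensity `alpha`. That is a simplification: Ledoit–Wolf-type estimators choose the intensity from the data. In scikit-learn, for example, `sklearn.covariance.LedoitWolf` estimates it for the covariance matrix, shrinking toward a scaled identity. The function name and the value of `alpha` are illustrative assumptions.

```python
import numpy as np


def linear_shrinkage(X, alpha):
    """Shrink the sample correlation matrix toward the identity:
    C_alpha = (1 - alpha) * C_sample + alpha * I, with 0 <= alpha <= 1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    C = np.corrcoef(X, rowvar=False)
    return (1 - alpha) * C + alpha * np.eye(C.shape[0])


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 100))  # pure noise, n_obs = 2 * n_vars
C_sample = np.corrcoef(X, rowvar=False)
C_shrunk = linear_shrinkage(X, alpha=0.5)

# Each eigenvalue lambda is mapped to (1 - alpha) * lambda + alpha, pulling the
# noisy spectrum toward 1 (the true value for this noise-only example).
ev_sample = np.linalg.eigvalsh(C_sample)
ev_shrunk = np.linalg.eigvalsh(C_shrunk)
print(f"sample eigenvalues: [{ev_sample.min():.2f}, {ev_sample.max():.2f}]")
print(f"shrunk eigenvalues: [{ev_shrunk.min():.2f}, {ev_shrunk.max():.2f}]")
```

Unlike RMT cleaning, which discards noise modes outright, linear shrinkage keeps every eigenvector and only compresses the spread of the eigenvalues.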
  35. Outline
    - Immunology Background
    - Vaccine Design – Challenges, Conventional Strategy, and Proposed Idea
    - Correlation Matrix Estimation using RMT
    - Vaccine Design – Details and Validation
    - Conclusions
  36. Important factors in the proposed vaccine design
    1. Metric L, calculated from correlations
    2. Population coverage
    [Figure labels: MHC, peptide, host cell, T cell]
  37. 1. Metric L, calculated from correlations
    - PCP = percentage of 100% conserved pairs
    - PNCP = percentage of negatively correlated pairs
    - PPCP = percentage of positively correlated pairs
    - PUCP = percentage of uncorrelated pairs
    Vaccine design objective: maximize L = PCP + PNCP - PPCP - PUCP
    [Figure: peptide 1 and peptide 2, each shown with and without a single mutation]
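The objective L can be computed directly once every pair of sites covered by a candidate set of peptides is classified. The slide does not say how pairs that mix a conserved and a variable site are counted. The sketch below assumes that a pair counts as conserved if at least one of its sites is 100% conserved, and otherwise classifies it by the sign of its cleaned correlation. Both the rule and the function name `metric_L` are assumptions.

```python
from itertools import combinations


def metric_L(sites, C_clean, fully_conserved, tol=1e-12):
    """L = PCP + PNCP - PPCP - PUCP, with each term a percentage of all site pairs.

    sites: site indices covered by the candidate peptides.
    C_clean: cleaned correlation matrix indexed as C_clean[i][j].
    fully_conserved: set of 100%-conserved site indices.
    Assumptions (not fixed by the slide): a pair counts as conserved if at least
    one of its sites is 100% conserved; otherwise it is classified by the sign of
    its cleaned correlation, with |C| <= tol treated as uncorrelated.
    """
    pairs = list(combinations(sorted(set(sites)), 2))
    if not pairs:
        raise ValueError("need at least two distinct sites")
    counts = {"conserved": 0, "negative": 0, "positive": 0, "uncorrelated": 0}
    for i, j in pairs:
        if i in fully_conserved or j in fully_conserved:
            counts["conserved"] += 1
        elif C_clean[i][j] < -tol:
            counts["negative"] += 1
        elif C_clean[i][j] > tol:
            counts["positive"] += 1
        else:
            counts["uncorrelated"] += 1
    pct = {name: 100.0 * n / len(pairs) for name, n in counts.items()}
    return pct["conserved"] + pct["negative"] - pct["positive"] - pct["uncorrelated"]


# Hypothetical 4-site example: site 0 is fully conserved, sites 1 and 2 are
# negatively correlated, sites 1 and 3 positively, sites 2 and 3 uncorrelated.
C_toy = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, -0.3, 0.2],
    [0.0, -0.3, 1.0, 0.0],
    [0.0, 0.2, 0.0, 1.0],
]
print(metric_L([0, 1, 2, 3], C_toy, fully_conserved={0}))
```

In this example, 3 of the 6 pairs are conserved, and one pair each is negative, positive, and uncorrelated. That gives L = 50 + 16.7 − 16.7 − 16.7 ≈ 33.3.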
  38. 2. Population coverage
    - Different people have different types of MHC molecules
    - Different MHC molecules may present different peptides
    - Thus, different people may present different peptides
    Differences in MHC molecules lead to presentation of different peptides across populations.
    [Figure: MHC molecules on the cells of persons 1–5]
  39. 2. Population coverage
    [Figure: the same NS3 sequence (VYATTSASAGLRQKKREDKMVLKFGS…, positions 1–26) shown for persons 1–4]
    - Challenge: designing a vaccine that covers a large proportion of the population
    - Information required:
      - Detailed statistics on the distribution of MHCs in a given population
      - Data on the NS3 peptides presented by particular MHCs (IEDB database)
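Given the two pieces of information listed above, coverage reduces to bookkeeping over MHC profiles. The sketch below rests on simplifying assumptions of our own. The population is a list of MHC (HLA) allele profiles with frequencies summing to 1. A profile is covered by a peptide if any of its alleles presents that peptide. "Double coverage" (Dcov, used on later slides) is taken to mean presenting at least two of the vaccine peptides, an interpretation the slides do not spell out. The allele and peptide names are placeholders, not IEDB data.

```python
def population_coverage(peptide_alleles, profiles):
    """Fraction of a population covered by a set of peptides.

    peptide_alleles: dict peptide -> set of HLA alleles assumed to present it
                     (in practice compiled from a database such as IEDB).
    profiles: list of (set_of_alleles, frequency) pairs describing the
              population; frequencies are assumed to sum to 1.
    Returns (coverage, double_coverage): the total frequency of profiles that
    present at least one, and at least two, of the peptides.
    """
    coverage = double_coverage = 0.0
    for alleles, freq in profiles:
        n_presented = sum(1 for presenters in peptide_alleles.values()
                          if presenters & alleles)
        if n_presented >= 1:
            coverage += freq
        if n_presented >= 2:
            double_coverage += freq
    return coverage, double_coverage


# Hypothetical placeholder data (not real alleles or frequencies).
peptide_alleles = {
    "pep1": {"allele_1", "allele_2"},
    "pep2": {"allele_2"},
    "pep3": {"allele_3"},
}
profiles = [
    ({"allele_1"}, 0.2),
    ({"allele_2", "allele_4"}, 0.3),
    ({"allele_2", "allele_3"}, 0.1),
    ({"allele_5"}, 0.4),
]
print(population_coverage(peptide_alleles, profiles))
```

Real people carry two haplotypes and several HLA loci. A faithful calculation would therefore work from genotype or haplotype-pair frequencies, such as the US Caucasian statistics on the next slide, rather than from this flat list.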
  40. Statistics of haplotypes in the US Caucasian population [Maiers et al., 2007]
  41. Proposed T cell vaccine design
    - A list of 32 peptides recognized by T cells in individuals from a large proportion of the US Caucasian population was compiled
    - As an example, we consider a 5-peptide vaccine design for this population
    [Slide shows the NS3 amino acid sequence:]
    APITAYAQQTRGLLGCIITSLTGRDKNQVEGEVQIVSTAAQTFLATCINGVCWTVYHGAGTRTIASPKGPVIQMYTNVDQDLV
    GWPAPQGARSLTPCTCGSSDLYLVTRHADVIPVRRRGDSRGSLLSPRPISYLKGSSGGPLLCPAGHAVGIFRAAVCTRGVAKAV
    DFIPVENLETTMRSPVFTDNSSPPAVPQSFQVAHLHAPTGSGKSTKVPAAYAAQGYKVLVLNPSVAATLGFGAYMSKAHGI
    DPNIRTGVRTITTGSPITYSTYGKFLADGGCSGGAYDIIICDECHSTDATSILGIGTVLDQAETAGARLVVLATATPPGSVTVPHP
    NIEEVALSTTGEIPFYGKAIPLEVIKGGRHLIFCHSKKKCDELAAKLVALGINAVAYYRGLDVSVIPTSGDVVVVATDALMT
    GFTGDFDSVIDCNTCVTQTVDFSLDPTFTIETTTLPQDAVSRTQRRGRTGRGKPGIYRFVAPGERPSGMFDSSVLCECYDAGCA
    WYELTPAETTVRLRAYMNTPGLPVCQDHLEFWEGVFTGLTHIDAHFLSQTKQSGENLPYLVAYQATVCARAQAPPPSW
    DQMWKCLIRLKPTLHGPTPLLYRLGAVQNEVTLTHPITKYIMTCMSADLEVVT
  42. Proposed T cell vaccine design
    - Obtain the 10 combinations with maximum L (effectiveness of the combination in killing viruses)
    - Order them by Dcov (double coverage)

    | Combination | Peptide 1 | Peptide 2 | Peptide 3 | Peptide 4 | Peptide 5 | L | Dcov |
    |---|---|---|---|---|---|---|---|
    | 1 | 1251-1259 | 1292-1300 | 1436-1444 | 1585-1594 | 1585-1595 | 63.58 | 0.50 |
    | 2 | 1123-1131 | 1169-1177 | 1251-1259 | 1292-1300 | 1436-1444 | 61.62 | 0.44 |
    | 3 | 1123-1131 | 1175-1183 | 1251-1259 | 1292-1300 | 1436-1444 | 65.45 | 0.37 |
    | 4 | 1123-1131 | 1175-1183 | 1251-1259 | 1359-1367 | 1436-1444 | 61.62 | 0.37 |
    | 5 | 1169-1177 | 1175-1183 | 1251-1259 | 1292-1300 | 1436-1444 | 64.46 | 0.34 |
    | 6 | 1123-1131 | 1251-1259 | 1292-1300 | 1359-1367 | 1436-1444 | 65.45 | 0.30 |
    | 7 | 1251-1259 | 1292-1300 | 1436-1444 | 1540-1550 | 1541-1550 | 61.31 | 0.18 |
    | 8 | 1169-1177 | 1251-1259 | 1292-1300 | 1359-1367 | 1436-1444 | 61.62 | 0.14 |
    | 9 | 1175-1183 | 1251-1259 | 1292-1300 | 1359-1367 | 1436-1444 | 65.45 | 0.07 |
    | 10 | 1123-1131 | 1175-1183 | 1251-1259 | 1292-1300 | 1359-1367 | 61.62 | 0.07 |
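The two-stage selection on this slide (top 10 by L, then reorder by Dcov) is a small search over combinations. In the sketch below, the scoring functions are passed in as callables. In practice they would wrap an L computation and a coverage computation, and the names here are hypothetical. With 32 candidate peptides there are C(32, 5) = 201,376 five-peptide combinations, few enough for exhaustive enumeration. `heapq.nlargest` keeps only the best `top` without sorting them all.

```python
import heapq
from itertools import combinations


def rank_combinations(peptides, score_L, score_dcov, k=5, top=10):
    """Keep the `top` k-peptide combinations with the largest L, then order
    them by double coverage (largest first)."""
    best = heapq.nlargest(top, combinations(peptides, k), key=score_L)
    return sorted(best, key=score_dcov, reverse=True)


# Toy usage with hypothetical scores: 5 "peptides" numbered 0-4, pairs (k=2),
# L = sum of squared indices, Dcov = minus the smallest index.
ranked = rank_combinations(
    range(5),
    score_L=lambda c: sum(x * x for x in c),
    score_dcov=lambda c: -min(c),
    k=2,
    top=3,
)
print(ranked)
```

Filtering on L first and only then ordering by Dcov mirrors the slide. It treats L as the primary objective and coverage as a tie-breaker among strong candidates, rather than combining the two into a single score.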
  43. Analysis of NS3 peptides of IC41
    - Plus point: no positively correlated pairs of sites!
    - Rank in a 2-peptide vaccine design: 71/496
    [Figure: IC41 compared with combinations 1–5 of 2 NS3 peptides; panels: double coverage, mean conservation across all genotypes (%), and L-score]
  44. Validation
    - Experiments
    - Existing clinical and experimental data
      - Cannot directly validate the proposed peptides
    Validation strategy:
    1. Identify a group (sector) of potentially vulnerable (negatively correlated) sites that are collectively coupled
    2. Validate this sector by comparing it with structural and clinical data
    3. Check whether our vaccine targets the sites in this sector
  45. 1. Identify sectors of potentially vulnerable sites
    - Use a clustering algorithm based on the eigenvectors of C_cleaned
    - Analogy in finance: economic sectors
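The slide names only "a clustering algorithm based on the eigenvectors of C_cleaned". As a stand-in, the sketch below uses one simple rule: each site joins the leading eigenvector in which its component is largest in magnitude, and sites with only small components stay unassigned. The rule, the threshold, and the function name are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np


def assign_sectors(C_clean, n_modes=3, threshold=0.1):
    """Assign sites to sectors using the leading eigenvectors of C_clean.

    Each site goes to the mode (0..n_modes-1) in which the magnitude of its
    eigenvector component is largest; sites whose largest magnitude is below
    `threshold` are left unassigned (-1). Both the rule and the threshold are
    illustrative assumptions.
    """
    eigvals, eigvecs = np.linalg.eigh(C_clean)
    order = np.argsort(eigvals)[::-1][:n_modes]  # leading modes first
    V = np.abs(eigvecs[:, order])                # shape (n_sites, n_modes)
    sectors = np.argmax(V, axis=1)
    sectors[V.max(axis=1) < threshold] = -1
    return sectors


# Hypothetical cleaned matrix with two planted groups: sites 0-2 and sites 3-5;
# site 6 is unrelated to either.
u = np.zeros(7)
u[:3] = 1 / np.sqrt(3)
w = np.zeros(7)
w[3:6] = 1 / np.sqrt(3)
C_toy = 2.0 * np.outer(u, u) + 1.0 * np.outer(w, w)
print(assign_sectors(C_toy, n_modes=2))
```

With three modes, as on the next slide, the same rule would yield up to three sectors. Plotting the three eigenvector components of each site gives the 3-D scatter shown there.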
  46. Three sectors of co-evolving sites in NS3
    [Figure: 3-D scatter plot of eigenvector components; bar charts by sector (1–3) of mean conservation, % positive correlations, % negative correlations, and the ratio of negative to positive correlations]
    Sector 1 consists of the most immunologically vulnerable sites.
  47. 2. Structural significance of sector 1
    Sector 1 sites are dominant in the critical interface of the NS3 crystal structure (p-value < 0.01).
    [Figure: NS3 crystal structure; red: sector 1 sites]
  48. 2. Significance of sector 1 based on previously published experimental and clinical results
    The majority of peptides targeted by “HCV controllers” consist predominantly of sector 1 sites (p-value < 0.05).
    [Figure: % sector 1 sites in allele-independent epitopes (1–3) and allele-restricted epitopes (1–13), with a >30% marker]
  49. 3. Sector 1 sites in the proposed vaccine design
    [Table: the same 10 five-peptide combinations, with L and Dcov, as on slide 42]
    A large proportion (~60%) of the sites in the proposed vaccine design belong to sector 1 (p-value < 0.01).
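The slides report p-values for sector enrichment without naming the test. One standard choice for "is this overlap larger than chance" is a one-sided hypergeometric test. The sketch below assumes that test and implements it with the standard library only. The function name is hypothetical, and the example counts are toy numbers, not the study's.

```python
from math import comb


def hypergeom_enrichment_p(total_sites, sector_sites, chosen_sites, chosen_in_sector):
    """One-sided p-value P(X >= chosen_in_sector), where X counts sector sites
    among `chosen_sites` drawn without replacement from `total_sites` sites,
    `sector_sites` of which belong to the sector."""
    denom = comb(total_sites, chosen_sites)
    upper = min(sector_sites, chosen_sites)
    tail = sum(
        comb(sector_sites, k) * comb(total_sites - sector_sites, chosen_sites - k)
        for k in range(chosen_in_sector, upper + 1)
    )
    return tail / denom


# Toy numbers: 10 sites, 5 in the sector; all 5 chosen sites fall in the sector.
print(hypergeom_enrichment_p(10, 5, 5, 5))
```

For the toy case, the only way to draw all five sector sites is 1 of C(10, 5) = 252 equally likely draws, so p = 1/252. With the real counts (sites in the design, sector 1 size, NS3 length), the same function would test the ~60% figure on this slide.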
  50. Conclusions
    - The majority of the sites in the proposed design belong to sector 1, which appears significant according to the experimental and clinical data available in the literature
    - Numerical validation of the currently proposed vaccine design, IC-41
    - Proposal of new vaccine design strategies that can:
      - Potentially improve upon IC-41 by inducing an immune response against more vulnerable parts of the HCV genome
      - Cover a large portion of the population (currently, for the US)
    - A similar analysis of the NS4B and NS5B proteins also reveals potential sites for vaccine design
    Next step: experimental trials!
  51. Conclusions
    - There is much similarity between high-dimensional statistical problems in immunology and those in signal processing
    - Many methods common in SP find direct application (though currently not well explored):
      - Maximum entropy modeling
      - Sampling methods (e.g., MCMC)
      - Sparsity
      - Subspace estimation
      - Robust estimation
      - Machine learning
      - …
  52. Related Publications
    - A. A. Quadeer, R. H. Y. Louie, K. Shekhar, A. K. Chakraborty, I. Hsing, and M. R. McKay, “Discovering statistical vulnerabilities in highly mutable viruses: a random matrix approach,” in Proc. of the IEEE Workshop on Statistical Signal Processing (SSP), Gold Coast, Australia, July 2014.
    - A. A. Quadeer, R. H. Y. Louie, K. Shekhar, A. K. Chakraborty, I. Hsing, and M. R. McKay, “Statistical linkage of substitutions in patient-derived sequences of genotype 1a hepatitis C virus non-structural protein 3 exposes targets for immunogen design,” Journal of Virology, 88 (13), pp. 7628-7644, July 2014.
  53. Join us in Brisbane 19 – 24 April 2015 www.icassp2015.org