Human Pangenome Graph Analysis for the MHC

Jason Chin
November 08, 2023

Presented at ASHI 2023: a detailed analysis of the genome architecture of the MHC class II region.

Transcript

  1. Confidential & Proprietary. Do Not Distribute. Human Pangenome Graph Analysis for the MHC

    ASHI, Oct. 18, 2023, San Antonio. Jason Chin, VP Genomics Technology and Algorithm
  2. MHC Haplotype Project 2000-2006

    Traditional approaches to determining haplotypes from limited genotype/SNP information have inherent ambiguities and errors. Full “haplotype assembly contigs” (haplotigs, containing all bases of the MHC from a single chromosome) can resolve these ambiguities, but they were hard and expensive to produce.
  3. Continued Progress of the Genomics Technology

    2014: the first long-read human genome assembly (not haplotype-resolved). 2020: the first complete haplotype-resolved MHC assembly published from a diploid sample using only shotgun sequencing (“A diploid assembly-based benchmark for variants in the major histocompatibility complex”, Chin et al., 2020, Nature Communications). 2023: 47 phased human genome assemblies published by the Human Pangenome Reference Consortium (“A draft human pangenome reference”, Liao et al., Nature 2023).
  4. Human Pangenome Reference Consortium

    Using similar haplotype read-separation techniques, 47 haplotype-resolved genome assemblies were generated and published.
  5. Genome Assembly for the MHC Regions

    The first full haplotype-resolved MHC assembly by shotgun sequencing: “A diploid assembly-based benchmark for variants in the major histocompatibility complex”, Chin et al., 2020, Nature Communications. Long reads separated by haplotype are assembled separately to obtain fully phased “haplotigs” through the 5 Mb MHC region for sample HG002.
  6. How Can We “See” and Analyze the Pangenome to Understand the Complexity of the MHC

    How do we visualize complex regions like the MHC with the pangenome? We developed the Pangenome Research Toolkit (PGR-TK) for analyzing pangenome data.
  7. In a Pangenome, We Need to Represent Many Haplotypes at Once

    Figure labels: Haplotype I, Haplotype II, Haplotype III. Different haplotypes might have shared and distinct blocks. However, we cannot tell which parts are shared just by looking at each individual sequence.
  8. One Solution: Capture All Information in a Pangenome Graph

    Represent all sequences as paths in a graph. Each box (a node in the graph) represents a block of similar DNA sequences from different haplotypes. A haplotype sequence is a path in the graph, and different haplotypes have different paths. Each possible path can be a possible haplotype. (There are many different flavors of pangenome graphs; here we focus on the Minimizer Anchored Pangenome Graph, or MAP-Graph, from PGR-TK.)
  9. Minimizer Anchored Pangenome Graph (MAP-Graph) for the MHC Class II Region

    Chr6:32,163,513-32,992,088. Figure labels: GRCh38 and HG19; new telomere-to-telomere reference: CHM13; Pangenome; DRB1, DQA1, DQB1. If a genome does not share a similar path with the reference genomes, it is almost impossible to map the sequence reads and perform the analysis.
  10. chr6_cox_hap2_hg19_3602951_4403309_0, chr6_dbb_hap3_hg19_3481454_4240123_0, chr6_mcf_hap5_hg19_3512099_4282867_0, chr6_qbl_hap6_hg19_3393290_4184339_0, chr6_ssto_hap7_hg19_3480367_4439545_0

    Note: These cell lines are HLA “homozygous”.
  11. HG002: Both Haplotypes from Sequence Data

    HG002 Haplotype 1: haplotype-resolved assembly. HG002 Haplotype 2: haplotype-resolved assembly.
  12. Shared Sequence Blocks and the MAP-Graph

    The graph is useful, but it may not be intuitive for most biomedical researchers. We developed a linearized view to make complex variation easier to interpret. Figure labels: MAP-Graph; Haplotype I, Haplotype II, Haplotype III; the shared blocks along each haplotype. Shared blocks: each block represents a set of closely related, similar sequences.
  13. MAP-Graph vs. Shared Blocks View

    94 HPRC haplotypes (+ HG38 + CHM13)
  14. Calling HLA Types for the MHC Class II Region

    Direct HLA and KIR type calling on the pangenome with cutting-edge bioinformatics tools: “miniprot” and “Immuannot” (developed by Ying Zhou and Heng Li), https://github.com/YingZhou001/Immuannot

    DRA*01-DRB3*01-DRB1*13-DQA1*01-DQB1*06-DQA2*01-DQB2*01-DOB*01-TAP2*02-TAP1*01-DMB*01-DMA*01 + HG01243#2#JAHEOX010000097.1_26625704_27447351_1

    DRA*01-DRB4*01-DRB1*04-DQA1*03-DQB1*03-DQA2*01-DQB2*01-DOB*01-TAP2*02-TAP1*06-DMB*01-DMA*01 + chr6_ssto_hap7_hg19_3480367_4439545_0

    Gene-level haplotypes are analyzed along with the sequence-level haplotypes. Figure labels: DRA*01, DRB4*01, DRB1*04, DQA1*03, DRB3*01, DRB1*13. https://github.com/cschin/dash13-KIR-MHC-assemblies/
  15. Highlight of Several Distinct Clusters

    DRB3*01/02/03-DRB1*13-DQA1*01-DQB1*05/06-DQA2*01. Tree built with just the DRB1 exon sequence.
  16. Highlight of Several Distinct Clusters

    DRB5*01-DRB1*15-DQA1*01-DQB1*06-DQA2*01-DQB2*01. Tree built with just the DRB1 exon sequence. Two observations:

    • The DRB1 CDS is highly associated with the overall sequence-level haplotype structure.
    • There are still non-coding structural differences even when the gene-level haplotypes are the same.

    All haplotype clusters in HTML
  17. Gene Level Haplotype Flow

    Population-level haplotype structure from DRA to DQA. Figure labels: haplotigs without DRB3/4/5; DRB1*08; DRB1*10.
  18. Population Level Haplotype Flow

    Population-level haplotype structure from DRA to DQA, with colors added for different populations (EAS, AMR, AFR, SAS) and for two new allele haplotype combinations. Two samples carry an unusual combination, HLA-DRB1*13-HLA-DQA1*01-HLA-DQB1*05: HG03516, Esan from Nigeria (ESN), and NA18906, Yoruba in Ibadan, Nigeria (YRI).
  19. Diplotype Analysis

    Axes: PCA 1, PCA 2. Each line connects the two haplotypes of a single sample. Figure labels: DRA-DRB3-DRB1-DQA1, DRA-DRB5-DRB1-DQA1, DRA-DRB1-DQA1, DRA-DRB4-DRB1-DQA1.
  20. Population Level Haplotype Flow

    Figure labels: AMR, AFR; CenA, CenB, Tel A, Tel B.
  21. Distinct KIR Haplotype Group

    KIR3DL3:00/01/04-KIR2DL3:00-KIR2DP1:00-KIR2DL1:00-KIR3DP1:00/01-KIR2DL4:00-KIR3DL1:00-KIR2DS4:00-KIR3DL2:00

    KIR3DL3:00-KIR2DS2:00-KIR2DL2:00-KIR3DP1:00-KIR2DL4:00-KIR3DS1:01-KIR2DL5A:00-KIR2DS5:00-KIR2DS1:00-KIR3DL2:01

    (Same KIR2DL3:00 called, but different non-coding structures)
  22. Additional KIR Assemblies Possible with Capture Protocol and Targeted Sequencing

    48 additional target-sampled sequences for the KIR regions (ongoing analysis). Loops are due to CNV/duplication.
  23. Outlook and Applications

    • The Human Pangenome Reference and PGR-TK aid in analyzing complex loci like MHC Class II and KIR with novel visualization techniques.
    • Advanced sequencing technologies now produce reliable contig sequences for each haplotype of the MHC and KIR regions.
    • Computational tools enhance understanding of key regions like MHC, KIR, and IGH, capturing previously unseen haplotype variations.
    • Merging these resources with technology can boost typing accuracy in clinical settings, beyond genomics research alone.
  24. Acknowledgements

    NMDP, Be the Match: Martin Maiers, Michael Wright. Anthony Nolan: Steven Marsh, James Robinson. DaSH 13 Hackathon: Haipeng Liu (Lab Corp), Nicholas Pollock (U of Colorado). Ying Zhou and Heng Li (Dana-Farber Cancer Institute and Harvard Medical School). Matthew W. Anderson. HPRC, https://humanpangenome.org. Foundation of Biological Data Sciences: Asif Khalak. NIST: Justin Zook, Justin Wagner. UC Berkeley: Peter Sudmant. Baylor College of Medicine: Fritz Sedlazeck, Sairam Behera. GeneDX: Gustavo Stolovitzky, Kristy McWalter.
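
The idea from the pangenome graph and shared blocks slides (haplotypes as paths through a graph of sequence blocks, then a linearized view of shared blocks) can be illustrated with a toy sketch. It assumes each haplotype has already been reduced to an ordered list of block labels. The labels `A` to `E`, the haplotype contents, and all function names are hypothetical, and this is not the PGR-TK API or the MAP-Graph construction itself, which derives blocks from minimizer anchors in the sequences.

```python
from collections import defaultdict

# Hypothetical toy input: each haplotype is an ordered list of block labels.
haplotypes = {
    "Haplotype I": ["A", "B", "C", "E"],
    "Haplotype II": ["A", "B", "D", "E"],
    "Haplotype III": ["A", "C", "E"],
}


def build_graph(paths):
    """Return {block: set of following blocks} collected over all haplotype paths."""
    edges = defaultdict(set)
    for blocks in paths.values():
        for src, dst in zip(blocks, blocks[1:]):
            edges[src].add(dst)
    return dict(edges)


def shared_blocks(paths, min_haplotypes=2):
    """Return the sorted blocks that occur in at least `min_haplotypes` haplotypes."""
    members = defaultdict(set)
    for name, blocks in paths.items():
        for block in blocks:
            members[block].add(name)  # a repeated block counts once per haplotype
    return sorted(b for b, names in members.items() if len(names) >= min_haplotypes)


def is_path(graph, blocks):
    """Check that every consecutive block pair is an edge in the graph."""
    return all(dst in graph.get(src, set()) for src, dst in zip(blocks, blocks[1:]))


def linear_view(paths, shared):
    """Render each haplotype as one row, wrapping shared blocks in brackets."""
    return {
        name: " ".join(f"[{b}]" if b in shared else b for b in blocks)
        for name, blocks in paths.items()
    }


graph = build_graph(haplotypes)
for name, row in linear_view(haplotypes, set(shared_blocks(haplotypes))).items():
    print(f"{name}: {row}")
```

In this toy input, block `D` occurs only in Haplotype II, so it is the only block left unbracketed in the linear rows. Raising `min_haplotypes` to 3 restricts the shared set to blocks present in every haplotype.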