Human Pangenome Graph Analysis for the MHC

Confidential & Proprietary. Do Not Distribute.
Human Pangenome Graph Analysis for the MHC
ASHI, Oct. 18, 2023, San Antonio
Jason Chin, VP Genomics Technology and Algorithm

2 MHC Haplotype Project
2000-2006
Traditional approaches to determining haplotypes from limited genotype/SNP information have inherent ambiguities and errors. Full “haplotype assembly contigs” (haplotigs, containing all bases in the MHC from a single chromosome) can resolve these ambiguities, but they were hard and expensive to produce.

3 Continued Progress of The Genomics Technology
2014: The first long-read human genome assembly (not haplotype resolved).
2020: First complete haplotype-resolved MHC assembly published from a diploid sample with only shotgun sequencing. "A diploid assembly-based benchmark for variants in the major histocompatibility complex", Chin et al., 2020, Nature Communications.
2023: 47 phased human genome assemblies published by the Human Pangenome Reference Consortium. "A draft human pangenome reference", Liao et al., 2023, Nature.

4 Human Pangenome Reference Consortium
Using similar haplotype read-separation techniques, 47 haplotype-resolved genome assemblies were generated and published.

5 Genome Assembly for The MHC Regions
The first full haplotype-resolved MHC assembly by shotgun sequencing: "A diploid assembly-based benchmark for variants in the major histocompatibility complex", Chin et al., 2020, Nature Communications.
Long-read sequences separated by haplotype are assembled separately to get fully phased “haplotigs” through the 5 Mb MHC region for sample HG002.

6 How Can We “See” and Analyze The Pangenome to Understand the Complexity of the MHC
How do we visualize complex regions like the MHC with the pangenome?
We developed the Pangenome Research Toolkit (PGR-TK) for analyzing pangenome data.

7 In Pangenome, We Need to Represent Many Haplotypes at Once
[Figure labels: Haplotype I, Haplotype II, Haplotype III]
Different haplotypes might have shared and distinct blocks. However, we cannot tell which parts are shared just by looking at each individual sequence.

8 One Solution: Capture all information in a pangenome graph: represent all sequences as paths in a graph.
Each possible path can be a possible haplotype.
Each box (a node in the graph) represents a block of similar DNA sequences from different haplotypes.
A haplotype sequence is a path in the graph.
Different haplotypes have different paths.
(There are many different flavors of pangenome graphs; here we focus on the Minimizer Anchored Pangenome Graph from PGR-TK, MAP-Graph.)
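The "haplotypes as paths" idea can be sketched in a few lines of Python. This is a minimal illustration, not PGR-TK code: the block IDs (`A` to `E`), the haplotype names, and the function `build_path_graph` are all hypothetical, and the sketch assumes each haplotype has already been reduced to an ordered list of block IDs.

```python
from collections import defaultdict

def build_path_graph(haplotype_paths):
    """Build a directed graph from haplotypes given as ordered lists of block IDs.

    Returns (edges, members): edges maps a block to the set of blocks that
    follow it in at least one haplotype; members maps a block to the set of
    haplotypes whose path goes through it.
    """
    edges = defaultdict(set)
    members = defaultdict(set)
    for name, path in haplotype_paths.items():
        for block in path:
            members[block].add(name)
        # Consecutive blocks in a path become a directed edge.
        for src, dst in zip(path, path[1:]):
            edges[src].add(dst)
    return edges, members

# Toy data: hypothetical block labels, not real MHC sequence blocks.
paths = {
    "hap_I": ["A", "B", "C", "E"],
    "hap_II": ["A", "D", "C", "E"],
    "hap_III": ["A", "B", "E"],
}
edges, members = build_path_graph(paths)

# Blocks that every haplotype passes through.
shared_by_all = sorted(b for b, haps in members.items() if len(haps) == len(paths))
print(shared_by_all)
```

A node with more than one successor in `edges` marks a point where haplotypes diverge, which is the kind of branching the graph view makes visible.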

9 Minimizer Anchored Pangenome Graph (MAP-Graph) for the MHC Class II Region
Chr6:32,163,513-32,992,088
[Figure labels: GRCh38 and HG19; New Telomere-to-Telomere Reference: CHM13; Pangenome; DRB1, DQA1, DQB1]
If a genome does not share a similar path with the reference genomes, it is almost impossible to map the sequence reads and perform the analysis.
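For readers unfamiliar with the term "minimizer", here is a textbook-style sketch of how minimizers can give two sequences a common set of anchor points. It is an assumption-laden illustration only: the slides do not describe PGR-TK's actual minimizer scheme or parameters, practical tools usually rank k-mers by a hash rather than alphabetically, and the names `minimizers` and `shared_anchors` and the small `k`/`w` values are hypothetical.

```python
def minimizers(seq, k, w):
    """Return sorted (position, k-mer) pairs for a sequence.

    For every window of w consecutive k-mers, the alphabetically smallest
    k-mer is selected (the leftmost one on ties). Neighboring windows often
    select the same k-mer, so the result is a sparse sample of the sequence.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        offset = min(range(w), key=lambda j: window[j])
        picked.add((start + offset, window[offset]))
    return sorted(picked)

def shared_anchors(seq_a, seq_b, k=4, w=3):
    """Return the minimizer k-mers that two sequences have in common."""
    kmers_a = {kmer for _, kmer in minimizers(seq_a, k, w)}
    kmers_b = {kmer for _, kmer in minimizers(seq_b, k, w)}
    return kmers_a & kmers_b

# Toy sequences, not real MHC data.
print(minimizers("ACGTACGT", k=3, w=2))
print(shared_anchors("ACGTACGTTT", "ACGTACGTTT"))
```

Because the same rule is applied to every sequence, stretches that are identical in two haplotypes tend to select the same minimizers, which is what makes them usable as shared anchors.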

10
chr6_cox_hap2_hg19_3602951_4403309_0
chr6_dbb_hap3_hg19_3481454_4240123_0
chr6_mcf_hap5_hg19_3512099_4282867_0
chr6_qbl_hap6_hg19_3393290_4184339_0
chr6_ssto_hap7_hg19_3480367_4439545_0
Note: These cell lines are HLA “homozygous”.

11 HG002 Both haplotypes from sequence data
HG002 Haplotype 1: Haplotype resolved assembly
HG002 Haplotype 2: Haplotype resolved assembly

13 Shared Sequence Blocks And The MAP graph
The graph is useful, but it may not be intuitive for most biomedical researchers. We developed a linearized view to make complex variation easier to interpret.
[Figure labels: MAP Graph; Shared Blocks; Haplotype I, Haplotype II, Haplotype III]
The shared blocks are shown along each haplotype. Each block represents a set of closely related sequences.
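A rough text-only analogue of the linearized shared-blocks view is sketched below. It assumes each haplotype is already an ordered list of block IDs; the block labels, the `min_support` threshold, and the function `shared_block_view` are hypothetical and are not taken from PGR-TK.

```python
from collections import Counter

def shared_block_view(haplotype_paths, min_support=2):
    """Return one text row per haplotype, in path order.

    Blocks found in at least min_support haplotypes keep their label;
    blocks seen in fewer haplotypes are drawn as '.' (private sequence).
    """
    support = Counter()
    for path in haplotype_paths.values():
        support.update(set(path))  # count each block once per haplotype
    rows = {}
    for name, path in haplotype_paths.items():
        rows[name] = " ".join(b if support[b] >= min_support else "." for b in path)
    return rows

# Toy data: hypothetical block labels.
toy_paths = {
    "Haplotype I": ["A", "B", "C", "E"],
    "Haplotype II": ["A", "D", "C", "E"],
    "Haplotype III": ["A", "B", "E"],
}
view = shared_block_view(toy_paths)
for name, row in view.items():
    print(f"{name:14s} {row}")
```

Laying the rows out one above another keeps each haplotype readable left to right while still showing which blocks recur across haplotypes, which is the trade-off the linearized view makes against the full graph.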

14 MAP-Graph vs. Shared Blocks View
94 HPRC haplotypes (+ HG38 + CHM13)

15 Calling HLA Types for MHC Class II Region
Direct HLA and KIR type calling on the pangenome with cutting-edge bioinformatics tools: “miniprot” and “Immuannot” (developed by Ying Zhou and Heng Li)
https://github.com/YingZhou001/Immuannot
DRA*01-DRB3*01-DRB1*13-DQA1*01-DQB1*06-DQA2*01-DQB2*01-DOB*01-TAP2*02-TAP1*01-DMB*01-DMA*01 + HG01243#2#JAHEOX010000097.1_26625704_27447351_1
DRA*01-DRB4*01-DRB1*04-DQA1*03-DQB1*03-DQA2*01-DQB2*01-DOB*01-TAP2*02-TAP1*06-DMB*01-DMA*01 + chr6_ssto_hap7_hg19_3480367_4439545_0
Gene-level haplotypes are analyzed along with the sequence-level haplotypes.
[Figure labels: DRA*01, DRB4*01, DRB1*04, DQA1*03, DRB3*01, DRB1*13]
https://github.com/cschin/dash13-KIR-MHC-assemblies/
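The gene-level haplotype labels shown on this slide follow a simple `GENE*FIELD-GENE*FIELD-...` pattern, so they can be split into per-gene calls with a small helper. This is a sketch for working with such labels, not part of Immuannot or miniprot; `parse_gene_haplotype` and `drb_paralog` are hypothetical names, and the handling of an optional `HLA-` prefix is an assumption based on how labels are written elsewhere in this deck.

```python
def parse_gene_haplotype(label):
    """Split 'GENE*FIELD-GENE*FIELD-...' into a list of (gene, field) tuples.

    An optional 'HLA-' prefix on each gene is dropped first so that the
    remaining hyphens only separate genes.
    """
    calls = []
    for token in label.replace("HLA-", "").split("-"):
        gene, _, field = token.strip().partition("*")
        calls.append((gene, field))
    return calls

def drb_paralog(calls):
    """Return which of DRB3/DRB4/DRB5 is present in the calls, or None."""
    for gene, _ in calls:
        if gene in ("DRB3", "DRB4", "DRB5"):
            return gene
    return None

# Labels copied from the slide text.
hap_a = parse_gene_haplotype("DRA*01-DRB3*01-DRB1*13-DQA1*01-DQB1*06")
hap_b = parse_gene_haplotype("DRA*01-DRB4*01-DRB1*04-DQA1*03-DQB1*03")
print(hap_a, drb_paralog(hap_a))
print(hap_b, drb_paralog(hap_b))
```

Once labels are in this form, haplotypes can be grouped or compared gene by gene instead of as opaque strings.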

16 Highlight of Several Distinct Clusters
DRB3*01/02/03-DRB1*13-DQA1*01-DQB1*05/06-DQA2*01
Tree built with just DRB1 exon sequence

17 Highlight of Several Distinct Clusters
DRB5*01-DRB1*15-DQA1*01-DQB1*06-DQA2*01-DQB2*01
Tree built with just DRB1 exon sequence
Two observations:
• The DRB1 CDS are highly associated with the overall sequence-level haplotype structures.
• There are still non-coding structural differences even when the gene-level haplotypes are the same.
All haplotype clusters in HTML

18 Gene Level Haplotype Flow
Population-level haplotype structure from DRA to DQA
[Figure labels: Haplotigs without DRB3/4/5; DRB1*08; DRB1*10]
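One way to tabulate a "haplotype flow" like this is to count, across haplotypes, how often one gene call is followed by the next. The sketch below shows only that counting step; the function `haplotype_flow` is hypothetical, the allele labels are taken from the slides, and the number of haplotypes carrying each label is made up for illustration.

```python
from collections import Counter

def haplotype_flow(haplotypes):
    """Count how many haplotypes go from each gene call to the next one.

    Each haplotype is an ordered list of gene-level calls, e.g. from DRA
    to DQA1. The result maps (call, next_call) pairs to counts, which can
    be used as link widths in a flow (Sankey-style) diagram.
    """
    flow = Counter()
    for calls in haplotypes:
        for src, dst in zip(calls, calls[1:]):
            flow[(src, dst)] += 1
    return flow

# Toy input: labels appear in the slides, the counts are NOT real data.
toy_haplotypes = [
    ["DRA*01", "DRB3*01", "DRB1*13", "DQA1*01"],
    ["DRA*01", "DRB3*01", "DRB1*13", "DQA1*01"],
    ["DRA*01", "DRB4*01", "DRB1*04", "DQA1*03"],
]
flow = haplotype_flow(toy_haplotypes)
for (src, dst), n in sorted(flow.items()):
    print(f"{src} -> {dst}: {n}")
```

Haplotigs without DRB3/4/5 would simply go from a DRA call straight to a DRB1 call in this representation.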

19 Population Level Haplotype Flow
Population-level haplotype structure from DRA to DQA (colors added for different populations and two new allele haplotype combinations)
[Figure legend: EAS, AMR, AFR, SAS]
Two samples with an unusual combination, HLA-DRB1*13-HLA-DQA1*01-HLA-DQB1*05:
HG03516: Esan from Nigeria (ESN)
NA18906: Yoruba in Ibadan, Nigeria (YRI)

20 Diplotype Analysis
[Figure: axes PCA 1 and PCA 2; groups DRA-DRB3-DRB1-DQA1, DRA-DRB5-DRB1-DQA1, DRA-DRB1-DQA1, DRA-DRB4-DRB1-DQA1]
Each line connects the two haplotypes in a single sample.

21 Gene Level Haplotype Flow for KIR

22 Gene Level Haplotype Flow for KIR
[Figure labels: CenB, CenA, Tel A, Tel B]

23 Population Level Haplotype Flow
[Figure labels: AMR, AFR; CenB, CenA, Tel A, Tel B]

24 Distinct KIR Haplotype Group
KIR3DL3:00/01/04-KIR2DL3:00-KIR2DP1:00-KIR2DL1:00-KIR3DP1:00/01-KIR2DL4:00-KIR3DL1:00-KIR2DS4:00-KIR3DL2:00
KIR3DL3:00-KIR2DS2:00-KIR2DL2:00-KIR3DP1:00-KIR2DL4:00-KIR3DS1:01-KIR2DL5A:00-KIR2DS5:00-KIR2DS1:00-KIR3DL2:01
(Same KIR2DL3:00 called, but different non-coding structures)

25 Additional KIR Assemblies Possible with Capture Protocol and Targeted Sequencing
48 additional target-sampled sequences for the KIR regions (ongoing analysis)
Loops due to CNV / duplication

26 Outlook and Applications
• The Human Pangenome Reference and PGR-Toolkit aid in analyzing complex loci like MHC Class II and KIR with novel visualization techniques.
• Advanced sequencing technologies now produce reliable contig sequences for each haplotype of the MHC and KIR regions.
• Computational tools enhance understanding of key regions like MHC, KIR, and IGH, capturing previously unseen haplotype variations.
• Merging these resources with technology can boost typing accuracy in clinical settings, beyond genomics research alone.

27 Acknowledgements
NMDP, Be the Match: Martin Maiers, Michael Wright
Anthony Nolan: Steven Marsh, James Robinson
DaSH 13 Hackathon: Haipeng Liu (Lab Corp), Nicholas Pollock (U of Colorado)
Ying Zhou and Heng Li (Dana-Farber Cancer Institute and Harvard Medical School)
Matthew W. Anderson
HPRC, https://humanpangenome.org
Foundation of Biological Data Sciences: Asif Khalak
NIST: Justin Zook, Justin Wagner
UC Berkeley: Peter Sudmant
Baylor College of Medicine: Fritz Sedlazeck, Sairam Behera
GeneDX: Gustavo Stolovitzky, Kristy McWalter
