2000-2006: Traditional approaches to determining haplotypes from limited genotype/SNP information have inherent ambiguities and errors. Full “haplotype assembly contigs” (haplotigs containing all bases in the MHC from a single chromosome) can resolve these ambiguities, but they were hard and expensive to produce.
The Genomics Technology
2014: The first long-read human genome assembly (not haplotype-resolved)
2020: The first complete haplotype-resolved MHC assembly published from a diploid sample using only shotgun sequencing. A diploid assembly-based benchmark for variants in the major histocompatibility complex, Chin et al., 2020, Nature Communications
2023: 47 phased human genome assemblies published by the Human Pangenome Reference Consortium. A draft human pangenome reference, Liao et al., Nature, 2023
The MHC Regions
The first fully haplotype-resolved MHC assembly from shotgun sequencing: A diploid assembly-based benchmark for variants in the major histocompatibility complex, Chin et al., 2020, Nature Communications
Long reads are separated by haplotype, and each set is assembled separately to obtain fully phased “haplotigs” spanning the ~5 Mb MHC region for sample HG002.
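The slide does not say exactly how the long reads were separated by haplotype. One common approach is to assign each read to the haplotype whose alleles it matches at phased heterozygous SNP sites (trio-based k-mer binning is another). The Python sketch below assumes reads are already aligned and reduced to the bases they carry at those sites; `PHASED_SITES`, `assign_haplotype`, and `bin_reads` are hypothetical names, and the toy data are made up.

```python
from collections import Counter

# Hypothetical phased heterozygous sites (toy data, not from HG002):
# position -> (allele on haplotype 1, allele on haplotype 2)
PHASED_SITES = {
    1000: ("A", "G"),
    2500: ("C", "T"),
    4200: ("G", "A"),
}


def assign_haplotype(read_alleles, phased_sites, min_margin=1):
    """Assign one read to 'hap1', 'hap2', or 'unassigned'.

    read_alleles maps position -> base observed in the read (assumes the read
    was already aligned). The read goes to the haplotype whose alleles it
    matches more often, if that lead is at least `min_margin` sites.
    """
    votes = Counter()
    for pos, base in read_alleles.items():
        if pos not in phased_sites:
            continue  # not an informative (phased heterozygous) site
        hap1_allele, hap2_allele = phased_sites[pos]
        if base == hap1_allele:
            votes["hap1"] += 1
        elif base == hap2_allele:
            votes["hap2"] += 1
    lead = votes["hap1"] - votes["hap2"]
    if lead >= min_margin:
        return "hap1"
    if -lead >= min_margin:
        return "hap2"
    return "unassigned"


def bin_reads(reads, phased_sites):
    """Group read IDs by assigned haplotype; each bin is assembled separately."""
    bins = {"hap1": [], "hap2": [], "unassigned": []}
    for read_id, alleles in reads.items():
        bins[assign_haplotype(alleles, phased_sites)].append(read_id)
    return bins


reads = {
    "read1": {1000: "A", 2500: "C"},  # matches haplotype 1 at both sites
    "read2": {2500: "T", 4200: "A"},  # matches haplotype 2 at both sites
    "read3": {1000: "A", 4200: "A"},  # one vote each: left unassigned
}
print(bin_reads(reads, PHASED_SITES))
# {'hap1': ['read1'], 'hap2': ['read2'], 'unassigned': ['read3']}
```

In this sketch, reads with no informative sites, or with tied votes, stay unassigned rather than being forced into either haplotype.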
“See” and Analyze the Pangenome to Understand the Complexity of the MHC
How do we visualize complex regions like the MHC with the pangenome? We developed the Pangenome Research Toolkit (PGR-TK) for analyzing pangenome data.
Need to Represent Many Haplotypes at Once
[Figure: Haplotype I, Haplotype II, Haplotype III]
Different haplotypes may have shared and distinct blocks. However, we cannot tell which parts are shared just by looking at each individual sequence.
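To make the point concrete: comparing the k-mer content of several haplotypes at once exposes which pieces are shared and which are private, something no single sequence reveals. This Python sketch uses exact k-mers on made-up toy strings; it illustrates the idea only and is not how PGR-TK defines blocks.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_and_private(haplotypes, k):
    """Return (k-mers present in every haplotype, {name: k-mers unique to it})."""
    sets = {name: kmers(seq, k) for name, seq in haplotypes.items()}
    shared = set.intersection(*sets.values())
    private = {}
    for name, own in sets.items():
        others = set().union(*(s for n, s in sets.items() if n != name))
        private[name] = own - others
    return shared, private


# Toy haplotypes (hypothetical sequences)
haplotypes = {
    "Haplotype I": "ACGTACGGTT",
    "Haplotype II": "ACGTTTGGTT",
    "Haplotype III": "ACGTACGGAA",
}
shared, private = shared_and_private(haplotypes, k=4)
print(sorted(shared))                    # ['ACGT']
print(sorted(private["Haplotype III"]))  # ['CGGA', 'GGAA']
```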
all information in a pangenome graph: represent all sequences as paths in a graph. Each possible path can be a possible haplotype.
• Each box (a node in the graph) represents a block of similar DNA sequences from different haplotypes.
• A haplotype sequence is a path in the graph.
• Different haplotypes have different paths.
(There are many flavors of pangenome graphs; here we focus on the Minimizer Anchored Pangenome Graph from PGR-TK, the MAP-Graph.)
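A toy version of a minimizer-anchored graph helps show how haplotypes become paths. This Python sketch is loosely inspired by the MAP-Graph idea and is not PGR-TK's implementation: it picks (w, k) minimizers by plain lexicographic order (real tools typically use a hash), treats each pair of consecutive minimizers as a node, links nodes that are adjacent along a haplotype, and records each haplotype as its path of nodes. All function names are hypothetical.

```python
from collections import Counter


def minimizers(seq, k, w):
    """(position, k-mer) minimizers of seq, in order.

    For every window of w consecutive k-mers, keep the smallest k-mer
    (lexicographic order; ties go to the leftmost position).
    """
    result = []
    last_pos = -1
    for start in range(len(seq) - k - w + 2):
        kmer, pos = min((seq[i:i + k], i) for i in range(start, start + w))
        if pos != last_pos:  # the same minimizer can win several windows
            result.append((pos, kmer))
            last_pos = pos
    return result


def build_graph(haplotypes, k=3, w=2):
    """Return (paths, edges, support) for a toy minimizer-anchored graph.

    Node = pair of consecutive minimizer k-mers; edge = two nodes adjacent
    along some haplotype; support = number of haplotypes using each node.
    """
    paths, edges, support = {}, set(), Counter()
    for name, seq in haplotypes.items():
        anchors = [kmer for _, kmer in minimizers(seq, k, w)]
        path = list(zip(anchors, anchors[1:]))
        paths[name] = path
        edges.update(zip(path, path[1:]))
        support.update(set(path))
    return paths, edges, support


paths, edges, support = build_graph({"hap1": "ACGTACGT", "hap2": "ACGTTCGT"})
print(paths["hap1"])  # [('ACG', 'CGT'), ('CGT', 'GTA'), ('GTA', 'ACG')]
```

The node ('ACG', 'CGT') is used by both toy haplotypes; after it, their paths diverge.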
Minimizer Anchored Pangenome Graph (MAP-Graph) for the MHC Class II Region, chr6:32,163,513-32,992,088
[Figure: GRCh38 and hg19; new telomere-to-telomere reference CHM13; pangenome; DRB1, DQA1, DQB1]
If a genome does not share a similar path with the reference genomes, it is almost impossible to map its sequence reads and perform analysis.
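One crude, alignment-free way to see this problem is k-mer containment: the fraction of a sample haplotype's k-mers that also occur in a reference. Windows with low containment are where reads from that haplotype would be hard to place. This Python sketch is our illustration with hypothetical toy sequences and parameters, not the analysis behind the figure.

```python
def kmer_set(seq, k):
    """All length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment(query, reference_kmers, k):
    """Fraction of the query's k-mers found in the reference k-mer set."""
    q = kmer_set(query, k)
    return len(q & reference_kmers) / len(q) if q else 0.0


def scan_windows(query, reference, k, window, step):
    """Containment of each query window in the reference: [(start, fraction)]."""
    ref_kmers = kmer_set(reference, k)
    return [
        (start, containment(query[start:start + window], ref_kmers, k))
        for start in range(0, len(query) - window + 1, step)
    ]


reference = "ACGTACGTTTGCA"          # toy reference
sample = "ACGTACGT" + "GGGGCCCC"     # second half is absent from the reference
print(scan_windows(sample, reference, k=3, window=8, step=8))
# [(0, 1.0), (8, 0.0)]
```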
And the MAP Graph
The graph is useful, but it may not be intuitive for most biomedical researchers. We developed a linearized view to make complex variation easier to interpret.
[Figure: MAP graph for Haplotypes I, II, and III, with the shared blocks laid out along each haplotype]
Shared blocks: each block represents a set of closely related sequences.
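The linearized view can be sketched as a simple transformation: given each haplotype as a path of block IDs, give every block a stable label and annotate it with how many haplotypes contain it, so shared blocks line up by label. In this Python sketch the block IDs and the `B1, B2, ...` labeling scheme are hypothetical; PGR-TK's own layout and coloring may differ.

```python
from collections import Counter


def linearize(paths):
    """Return {haplotype: [(block_label, n_haplotypes_with_block), ...]}."""
    support = Counter()
    for path in paths.values():
        support.update(set(path))  # count each block once per haplotype
    labels = {}
    for path in paths.values():
        for block in path:
            # First time a block is seen, it gets the next label B1, B2, ...
            labels.setdefault(block, f"B{len(labels) + 1}")
    return {
        name: [(labels[block], support[block]) for block in path]
        for name, path in paths.items()
    }


paths = {  # hypothetical block paths
    "Haplotype I": ["a", "b", "c", "d"],
    "Haplotype II": ["a", "x", "c", "d"],
    "Haplotype III": ["a", "b", "d"],
}
for name, blocks in linearize(paths).items():
    print(name, " ".join(f"{label}({n})" for label, n in blocks))
# Haplotype I B1(3) B2(2) B3(2) B4(3)
# Haplotype II B1(3) B5(1) B3(2) B4(3)
# Haplotype III B1(3) B2(2) B4(3)
```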
for MHC Class II Region
Direct HLA and KIR type calling on the pangenome with cutting-edge bioinformatics tools: “miniprot” and “Immuannot” (developed by Ying Zhou and Heng Li), https://github.com/YingZhou001/Immuannot
Example gene-level haplotypes with their source sequences:
DRA*01-DRB3*01-DRB1*13-DQA1*01-DQB1*06-DQA2*01-DQB2*01-DOB*01-TAP2*02-TAP1*01-DMB*01-DMA*01 + HG01243#2#JAHEOX010000097.1_26625704_27447351_1
DRA*01-DRB4*01-DRB1*04-DQA1*03-DQB1*03-DQA2*01-DQB2*01-DOB*01-TAP2*02-TAP1*06-DMB*01-DMA*01 + chr6_ssto_hap7_hg19_3480367_4439545_0
Gene-level haplotypes are analyzed alongside the sequence-level haplotypes.
[Figure labels: DRA*01, DRB4*01, DRB1*04, DQA1*03, DRB3*01, DRB1*13]
https://github.com/cschin/dash13-KIR-MHC-assemblies/
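The gene-level haplotype strings above have a regular shape (`GENE*group` tokens joined by `-`, then a sign and a sequence ID), so comparing two haplotypes gene by gene takes only a few lines. This Python sketch assumes that format; reading the `+` as contig orientation is our assumption, and the function names are hypothetical, not part of Immuannot.

```python
def parse_gene_haplotype(line):
    """Parse 'GENE*NN-GENE*NN-... + sequence_id' into (alleles, sign, seq_id).

    alleles maps gene name -> first-field allele group, e.g. 'DRB1' -> '13'.
    The middle token ('+') is kept as-is; treating it as orientation is an
    assumption.
    """
    haplotype, sign, seq_id = line.split()
    alleles = {}
    for token in haplotype.split("-"):
        gene, group = token.split("*")
        alleles[gene] = group
    return alleles, sign, seq_id


def allele_differences(a, b):
    """Genes whose allele group differs (None = gene absent on that haplotype)."""
    return {
        gene: (a.get(gene), b.get(gene))
        for gene in sorted(set(a) | set(b))
        if a.get(gene) != b.get(gene)
    }


hap_a, _, id_a = parse_gene_haplotype(
    "DRA*01-DRB3*01-DRB1*13-DQA1*01-DQB1*06-DQA2*01-DQB2*01-DOB*01-TAP2*02-"
    "TAP1*01-DMB*01-DMA*01 + HG01243#2#JAHEOX010000097.1_26625704_27447351_1"
)
hap_b, _, id_b = parse_gene_haplotype(
    "DRA*01-DRB4*01-DRB1*04-DQA1*03-DQB1*03-DQA2*01-DQB2*01-DOB*01-TAP2*02-"
    "TAP1*06-DMB*01-DMA*01 + chr6_ssto_hap7_hg19_3480367_4439545_0"
)
for gene, (x, y) in allele_differences(hap_a, hap_b).items():
    print(gene, x, y)
```

For these two haplotypes the differences fall in DQA1, DQB1, DRB1, DRB3/DRB4 (each present on only one haplotype), and TAP1.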
Distinct Clusters
[Figure: cluster for DRB5*01-DRB1*15-DQA1*01-DQB1*06-DQA2*01-DQB2*01; tree built with DRB1 exon sequence only]
Two observations:
• The DRB1 coding sequences (CDS) are strongly associated with the overall sequence-level haplotype structure.
• Structural differences in non-coding sequence remain even when the gene-level haplotypes are the same.
All haplotype clusters are available in HTML.
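The slide does not say which tree-building method was used. As a minimal stand-in, the Python sketch below clusters pre-aligned sequences with UPGMA (average linkage) on p-distances, the fraction of mismatched sites, and returns only the tree topology. The toy sequences are hypothetical, and real DRB1 exon data would first need a multiple alignment.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of mismatched sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs):
    """UPGMA (average-linkage) clustering; returns the tree as nested tuples."""
    # cluster key -> (subtree, leaf names in that cluster)
    clusters = {name: (name, [name]) for name in seqs}

    def average_distance(key1, key2):
        leaves1, leaves2 = clusters[key1][1], clusters[key2][1]
        total = sum(p_distance(seqs[x], seqs[y]) for x in leaves1 for y in leaves2)
        return total / (len(leaves1) * len(leaves2))

    merge_count = 0
    while len(clusters) > 1:
        # Merge the closest pair of clusters (the first pair wins ties).
        key1, key2 = min(combinations(clusters, 2), key=lambda pair: average_distance(*pair))
        subtree = (clusters[key1][0], clusters[key2][0])
        leaves = clusters[key1][1] + clusters[key2][1]
        del clusters[key1], clusters[key2]
        clusters[f"_merged{merge_count}"] = (subtree, leaves)
        merge_count += 1
    return next(iter(clusters.values()))[0]


seqs = {  # hypothetical aligned exon fragments
    "h1": "AAAAAAAAAA",
    "h2": "AAAAAAAAAT",
    "h3": "TTTTTAAAAA",
    "h4": "TTTTTAAAAT",
}
print(upgma(seqs))  # (('h1', 'h2'), ('h3', 'h4'))
```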
Flow
Population-level haplotype structure from DRA to DQA, colored by population (EAS, AMR, AFR, SAS), with two new allele haplotype combinations highlighted.
Two samples carry an unusual combination, HLA-DRB1*13-HLA-DQA1*01-HLA-DQB1*05: HG03516 (Esan from Nigeria, ESN) and NA18906 (Yoruba in Ibadan, Nigeria, YRI).
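A flow plot like this starts from a simple tally: for each haplotype, join its allele groups at the genes of interest into one combination string, then count combinations per population. The Python sketch below also flags combinations seen only once in the cohort, which is our stand-in for “unusual”; the slide may define it differently (for example, against known haplotype frequencies). Haplotype IDs and allele data here are hypothetical.

```python
from collections import Counter


def combination(alleles, genes):
    """Join allele groups in gene order, e.g. 'DRB1*13-DQA1*01-DQB1*05'."""
    return "-".join(f"{gene}*{alleles.get(gene, '?')}" for gene in genes)


def tally_by_population(records, genes):
    """Count (population, combination) pairs; records are per haplotype."""
    return Counter((pop, combination(alleles, genes)) for _, pop, alleles in records)


def rare_combinations(records, genes, max_count=1):
    """Haplotypes whose combination occurs at most max_count times overall."""
    overall = Counter(combination(alleles, genes) for _, _, alleles in records)
    return {
        hap_id: combination(alleles, genes)
        for hap_id, _, alleles in records
        if overall[combination(alleles, genes)] <= max_count
    }


GENES = ["DRB1", "DQA1", "DQB1"]
records = [  # (haplotype ID, population, allele groups); toy data
    ("hapA#1", "AFR", {"DRB1": "13", "DQA1": "01", "DQB1": "05"}),
    ("hapB#1", "EAS", {"DRB1": "15", "DQA1": "01", "DQB1": "06"}),
    ("hapC#2", "SAS", {"DRB1": "15", "DQA1": "01", "DQB1": "06"}),
]
print(tally_by_population(records, GENES))
print(rare_combinations(records, GENES))  # {'hapA#1': 'DRB1*13-DQA1*01-DQB1*05'}
```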
Possible with Capture Protocol and Targeted Sequencing
48 additional samples with targeted sequences for the KIR region (ongoing analysis)
[Figure: loops due to CNV/duplication]
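In a block graph, a copy-number change or duplication shows up as a haplotype path that revisits the same block, which closes a loop. This Python sketch, using hypothetical block IDs, reports repeated blocks along a path and checks the path's edges for a directed cycle with an iterative depth-first search; it is an illustration, not the analysis used for the KIR data.

```python
from collections import Counter, defaultdict


def repeated_blocks(path):
    """Blocks visited more than once along one haplotype path, with visit counts."""
    return {block: n for block, n in Counter(path).items() if n > 1}


def path_edges(path):
    """Directed edges between consecutive blocks of a path (as a list)."""
    return list(zip(path, path[1:]))


def has_cycle(edges):
    """True if the directed graph given by a list of (u, v) edges has a cycle."""
    graph = defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
    nodes = {n for edge in edges for n in edge}
    state = dict.fromkeys(nodes, 0)  # 0 = unvisited, 1 = on DFS stack, 2 = finished
    for start in nodes:
        if state[start]:
            continue
        state[start] = 1
        stack = [(start, iter(graph[start]))]
        while stack:
            node, children = stack[-1]
            for child in children:
                if state[child] == 1:
                    return True  # back edge: child is still on the stack
                if state[child] == 0:
                    state[child] = 1
                    stack.append((child, iter(graph[child])))
                    break
            else:
                state[node] = 2  # all children explored
                stack.pop()
    return False


duplicated = ["a", "b", "c", "b", "c", "d"]  # hypothetical path with a tandem copy of b-c
print(repeated_blocks(duplicated))        # {'b': 2, 'c': 2}
print(has_cycle(path_edges(duplicated)))  # True
```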
• The Human Pangenome Reference and PGR-TK aid in analyzing complex loci such as MHC Class II and KIR with novel visualization techniques.
• Advanced sequencing technologies now produce reliable contig sequences for each haplotype of the MHC and KIR regions.
• Computational tools improve our understanding of key regions such as MHC, KIR, and IGH, capturing previously unseen haplotype variation.
• Combining these resources with new technology can improve typing accuracy in clinical settings, beyond genomics research alone.
the Match: Martin Maiers, Michael Wright
Anthony Nolan: Steven Marsh, James Robinson
DaSH 13 Hackathon: Haipeng Liu (Lab Corp), Nicholas Pollock (U of Colorado)
Ying Zhou and Heng Li (Dana-Farber Cancer Institute and Harvard Medical School)
Matthew W. Anderson
HPRC, https://humanpangenome.org
Foundation of Biological Data Sciences: Asif Khalak
NIST: Justin Zook, Justin Wagner
UC Berkeley: Peter Sudmant
Baylor College of Medicine: Fritz Sedlazeck, Sairam Behera
GeneDx: Gustavo Stolovitzky, Kristy McWalter