Pan Genome Research Tool Kit (PGR-TK) for T2T F2F meeting

Jason Chin
August 09, 2022

A summary of how the Pan Genome Research Tool Kit can help with human pangenome analysis.

Transcript

  1. Pan Genomics Research Tool Kit (PGR-TK)
     • A non-"AI" approach, though :p
     • Promote T2T and pangenome resources for diagnostics / testing applications in my organization (GeneDx + Sema4)
     • Code: https://github.com/Sema4-Research/pgr-tk
     • Documentation: https://sema4-research.github.io/pgr-tk/
     • Examples: https://github.com/sema4-Research/pgr-tk-notebooks
     • Preprint: Multiscale Analysis of Pangenome Enables Improved Representation of Genomic Diversity For Repetitive And Clinical Relevant Genes, bioRxiv (https://www.biorxiv.org/content/10.1101/2022.08.05.502980v)
  2. Minimizer Anchored Pangenomics Graph
     • Graph vertex: a set of sequences with shared minimizer anchors at both ends
     [Figure: sequences with minimizer anchors (labeled a to i); vertices formed by anchor pairs (a:b, a:c, b:c, c:d, c:h, d:e, e:f, f:g, h:e, h:i, i:f); and the induced minimizer anchored pangenomics graph]
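The vertex definition on slide 2 can be illustrated with a small Python sketch. It is not PGR-TK's implementation: the hash (`zlib.crc32`), the forward-strand-only minimizers, and the names `minimizers` and `build_map_graph` are assumptions made for illustration. Each vertex is keyed by an ordered pair of anchor k-mers (the "a:b" labels in the figure) and collects every sequence segment that runs between those two anchors; edges join vertices that follow each other along a sequence.

```python
import random
import zlib
from collections import defaultdict


def kmer_hash(kmer):
    # Deterministic stand-in hash (assumption; not the hash PGR-TK uses).
    return zlib.crc32(kmer.encode())


def minimizers(seq, k, w):
    """Sorted start positions of (w, k) minimizers, forward strand only."""
    hashes = [kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(hashes) - w + 1):
        # min() returns the leftmost position when hashes tie
        picked.add(min(range(start, start + w), key=lambda i: hashes[i]))
    return sorted(picked)


def build_map_graph(seqs, k, w):
    """Hypothetical minimizer-anchored graph builder.

    Vertex: ordered anchor pair (a, b) -> list of (name, start, end) segments
    that run from anchor a to the next anchor b in some sequence.
    Edge: (u, v) when vertex v directly follows vertex u along a sequence.
    """
    vertices = defaultdict(list)
    edges = set()
    for name, seq in seqs.items():
        anchors = [(seq[p:p + k], p) for p in minimizers(seq, k, w)]
        prev = None
        for (a, pa), (b, pb) in zip(anchors, anchors[1:]):
            vertex = (a, b)
            vertices[vertex].append((name, pa, pb + k))
            if prev is not None:
                edges.add((prev, vertex))
            prev = vertex
    return dict(vertices), edges


# Example: two haplotypes that differ by a single substitution at position 300.
rng = random.Random(0)
hap1 = "".join(rng.choice("ACGT") for _ in range(600))
alt = "C" if hap1[300] == "A" else "A"
hap2 = hap1[:300] + alt + hap1[301:]
vertices, edges = build_map_graph({"hap1": hap1, "hap2": hap2}, k=15, w=10)
shared = [v for v, segs in vertices.items() if len({n for n, _, _ in segs}) == 2]
print(len(vertices), "vertices,", len(shared), "shared by both haplotypes")
```

Because minimizers depend only on local k-mers, anchor pairs far from the substitution are identical in both haplotypes and collapse into shared vertices, while anchors near position 300 can differ and form haplotype-specific vertices (a bubble).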
  3. Pangenomics Graph at Multiple Scales
     [Figure: KCNE1 and AMY1A graphs; three 12.5 kbp panels with (w, k, r) = (128, 32, 8), (128, 32, 6), (128, 32, 4), and panels with (w, k, r) = (48, 56, 12), (48, 56, 4)]
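Slide 3 shows the same loci at several (w, k, r) settings. Reading r as a reduction factor (a larger r keeps fewer anchors and so yields a coarser graph), a two-level scheme can be sketched as below. This is an assumption for illustration: the second level here simply takes window minimizers over every r consecutive first-level minimizers, which may differ from how PGR-TK actually applies r, and `anchors_at_scale` is a hypothetical name.

```python
import random
import zlib


def kmer_hash(kmer):
    # Deterministic stand-in hash (assumption; not PGR-TK's hash).
    return zlib.crc32(kmer.encode())


def window_minima(values, w):
    """Indices of the leftmost minimum in every window of w consecutive values."""
    picked = set()
    for start in range(len(values) - w + 1):
        picked.add(min(range(start, start + w), key=lambda i: values[i]))
    return sorted(picked)


def anchors_at_scale(seq, w, k, r):
    """Hypothetical multi-scale anchors: (w, k) minimizers, then keep the
    minimum of every r consecutive minimizers (r = 1 keeps all of them)."""
    hashes = [kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    level1 = window_minima(hashes, w)            # positions in seq
    level1_hashes = [hashes[p] for p in level1]
    level2 = window_minima(level1_hashes, r)     # indices into level1
    return [level1[i] for i in level2]


# Example on a random 12.5 kbp sequence with the w = 128, k = 32 settings.
rng = random.Random(7)
seq = "".join(rng.choice("ACGT") for _ in range(12_500))
for r in (1, 4, 6, 8):
    pos = anchors_at_scale(seq, w=128, k=32, r=r)
    spacing = (pos[-1] - pos[0]) / max(len(pos) - 1, 1)
    print(f"r={r}: {len(pos)} anchors, mean spacing ~{spacing:.0f} bp")
```

Under this reading, raising r thins the anchors so each vertex spans a longer segment, trading resolution for a simpler graph.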
  4. Break The Sequences Into Repeat Units
     • Identify high-multiplicity principal bundles
     • Pick the "most repetitive but non-trivial principal bundle", and take the locations where a sequence starts to overlap with the bundle as the starting points of the repeat elements
     • A more sophisticated HMM could perhaps be deployed in the future to delineate the repeats
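The repeat-splitting step on slide 4 can be sketched on a toy input. Assumptions not taken from the deck: each sequence has already been mapped to a path of (position, bundle_id) vertices, "non-trivial" means a bundle has at least `min_len` vertices, and all helper names are hypothetical. Multiplicity is counted as the number of separate times the path enters a bundle.

```python
from collections import Counter
from itertools import groupby


def bundle_runs(path):
    """Collapse a path of (position, bundle_id) vertices into one
    (start_position, bundle_id) entry per maximal run in the same bundle."""
    runs = []
    for bundle_id, group in groupby(path, key=lambda v: v[1]):
        runs.append((next(group)[0], bundle_id))
    return runs


def pick_repeat_bundle(runs, bundle_len, min_len):
    """Bundle entered most often, ignoring 'trivial' bundles shorter than
    min_len vertices (an assumed criterion). Returns None if none qualify."""
    counts = Counter(b for _, b in runs if bundle_len[b] >= min_len)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def split_repeat_units(path, seq_len, bundle_len, min_len=3):
    """Repeat units as [start, end) intervals: each unit starts where the
    sequence begins to overlap the chosen bundle and ends at the next start."""
    runs = bundle_runs(path)
    target = pick_repeat_bundle(runs, bundle_len, min_len)
    if target is None:
        return None, []
    starts = [pos for pos, b in runs if b == target]
    ends = starts[1:] + [seq_len]
    return target, list(zip(starts, ends))


# Toy path: bundle 1 (4 vertices) recurs three times; bundle 2 also recurs
# three times but has only 1 vertex, so it is treated as trivial.
path = [(0, 0), (100, 1), (200, 1), (300, 2), (400, 1), (500, 1),
        (600, 2), (700, 1), (800, 1), (900, 2)]
bundle_len = {0: 5, 1: 4, 2: 1}
target, units = split_repeat_units(path, seq_len=1000, bundle_len=bundle_len)
print(target, units)  # 1 [(100, 400), (400, 700), (700, 1000)]
```

Real data would also need strand handling and a better rule for the last unit, which here simply runs to the end of the sequence.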
  5. PCA Analysis of The Repeat Units (TSPY)
     • 400+ repeat units
     • Each repeat unit is encoded as a binary vector by projecting it onto the principal bundles
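The encoding on slide 5 can be sketched with NumPy. The choice of dimensions is an assumption: here each column is one principal-bundle vertex, set to 1 when a repeat unit passes through it, and PCA is computed from an SVD of the column-centred matrix. The toy units below are made up and only stand in for the 400+ TSPY repeat units; `binary_matrix` and `pca` are hypothetical helper names.

```python
import numpy as np


def binary_matrix(units, vertex_index):
    """One row per repeat unit; column j is 1 if the unit contains vertex j."""
    X = np.zeros((len(units), len(vertex_index)))
    for row, unit in enumerate(units):
        for v in unit:
            X[row, vertex_index[v]] = 1.0
    return X


def pca(X, n_components=2):
    """PCA via SVD of the column-centred matrix.
    Returns (scores, explained_variance_ratio) for the top components."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    var = S ** 2
    return scores, var[:n_components] / var.sum()


# Made-up repeat units, each given as the set of bundle vertices it covers:
# two families that share vertex "v0" but otherwise take different routes.
units = [{"v0", "v1", "v2"}] * 3 + [{"v0", "v3", "v4"}] * 3
vertex_index = {v: j for j, v in enumerate(sorted(set().union(*units)))}
X = binary_matrix(units, vertex_index)
scores, ratio = pca(X)
print(np.round(scores[:, 0], 3), np.round(ratio, 3))
```

In this toy case the first component alone separates the two families; on real repeat units, groups that route through different parts of the bundles would be expected to separate along the leading components.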
  6. We need better tools, together, to "cook" the pangenome!!
     Acknowledgements: Sairam Behera, Asif Khalak, Fritz Sedlazeck, Justin Wagner, and Justin M. Zook for great collaborations, and T2T / HPRC for data generation & release