Slide 1

Slide 1 text

Base Modification Detection Overview
FIND MEANING IN COMPLEXITY
© Copyright 2013 by Pacific Biosciences of California, Inc. All rights reserved.

Slide 2

Slide 2 text

Overview of Release Base Modification Detection
Agenda
• Experimental design recommendations
• Example project
• Bacterial epigenome characterization
• Kinetics in SMRT® Sequencing
• Analysis recommendations
• Where to find additional information

Slide 3

Slide 3 text

Kinetics in SMRT® Sequencing
• SMRT Sequencing uses kinetic information for each nucleotide addition to call bases
• The same information can also be used to distinguish modified and native bases
• Results of SMRT Sequencing can be compared to an in silico kinetic reference for dynamics without modifications to infer the presence of bases different from A, C, G or T
[Figure: interpulse duration (IPD)]

Slide 4

Slide 4 text

Detection of DNA Base Modifications Using Kinetics
Example: N6-methyladenine
[Figure: two template strands, one containing mA]
Flusberg et al. (2010) Nature Methods 7: 461-465

Slide 5

Slide 5 text

Detection of DNA Base Modifications by SMRT® Sequencing
• Calculation of IPD ratios across the reference gives information about base modification at every position.
• SMRT Portal can recognize and annotate multi-site modified-base signatures
[Figure: signatures for 5-mC, 4-mC and 6-mA]
Flusberg et al. (2010) Nature Methods 7: 461-465
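The IPD-ratio idea on this slide can be illustrated with a small sketch. It is not SMRT Portal's implementation: the function name `ipd_ratios`, the input layout (a list of observed IPDs per reference position plus one control mean per position) and the example numbers are assumptions made for illustration.

```python
from statistics import mean


def ipd_ratios(native_ipds, control_mean_ipds):
    """Return mean native IPD / control mean IPD for each reference position.

    native_ipds: list of lists, observed IPDs (seconds) at each position.
    control_mean_ipds: list of floats, expected mean IPD at each position
        (for example from an in silico model or an amplified control).
    Positions without usable data yield None.
    """
    if len(native_ipds) != len(control_mean_ipds):
        raise ValueError("inputs must cover the same positions")
    ratios = []
    for observed, expected in zip(native_ipds, control_mean_ipds):
        if not observed or expected <= 0:
            ratios.append(None)  # no coverage, or control value unusable
        else:
            ratios.append(mean(observed) / expected)
    return ratios


# Hypothetical data: position 1 is slowed about five-fold, position 2 has no reads.
native = [[0.5, 0.5], [2.0, 3.0], []]
control = [0.5, 0.5, 0.5]
ratios = ipd_ratios(native, control)
```

A ratio near 1 suggests kinetics matching the unmodified expectation, while a clearly elevated ratio marks a candidate modified position.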

Slide 6

Slide 6 text

Signatures of Different DNA Base Modifications
[Figure: kinetic signatures grouped as Prokaryotic, Eukaryotic and DNA Damage; marker label: Detectable by other Sequencing Methods]

Slide 7

Slide 7 text

E. coli Outbreak in Germany (June 2011)
Country    HUS Cases   HUS Deaths   “EHEC” Cases   “EHEC” Deaths
Germany    814         27           2773           12
Total      856         28           2841           12

Slide 8

Slide 8 text

Horizontal Genetic Exchange Allowed for the Emergence of the Highly Virulent, Shiga Toxin-Producing Strain
Genomes of O104:H4 enteroaggregative E. coli strains: 55989, C227-11, C734-09, C35-10, C682-09, C760-09, C754-09, C777-90
Strains compared to the common reference strain: TY2482
Rasko et al. Origins of the E. coli Strain Causing an Outbreak of Hemolytic-Uremic Syndrome in Germany, New Engl J Med 2011; 365:709-717

Slide 9

Slide 9 text

Genome-Wide Detection of Methyladenine in Outbreak Strain
Genome-Wide Detection of Kinetic Variation
The log likelihood ratio (LLR) indicates how strongly the data support a kinetic signal in the native sample that differs from that in the amplified control.
Fang et al. Genome-wide mapping of methylated adenine residues in pathogenic Escherichia coli using single-molecule real-time sequencing, Nature Biotechnology (2012)
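The slide does not give the formula behind the LLR, so the sketch below swaps in a plain Welch t-statistic to show the general shape of a native-versus-amplified comparison at one position. The function name `welch_t` and the IPD values are hypothetical, and this is not the scoring used in the cited study or in SMRT Analysis.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(native_ipds, control_ipds):
    """Welch t-statistic comparing native IPDs with amplified-control IPDs.

    Large positive values mean the native sample is slower than the control
    at this position. Each sample needs at least two observations.
    """
    n1, n2 = len(native_ipds), len(control_ipds)
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least two IPD observations per sample")
    standard_error = sqrt(variance(native_ipds) / n1 + variance(control_ipds) / n2)
    if standard_error == 0:
        raise ValueError("no variation in either sample; statistic undefined")
    return (mean(native_ipds) - mean(control_ipds)) / standard_error


# Hypothetical IPDs (seconds) at one reference position.
native = [2.1, 1.9, 2.3, 2.0]
control = [0.5, 0.6, 0.4, 0.5]
score = welch_t(native, control)
```

Scanning such a statistic across every position and strand gives a genome-wide picture of where native kinetics depart from the control.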

Slide 10

Slide 10 text

Identified a Large Panel of Methylases in Outbreak Strain
• Predicted methyltransferase
• Predicted methyltransferase
• A methyltransferase and restriction enzyme (both unique in C227-11)
• A Dam-like methyltransferase (unique in C227-11)
• Predicted methyltransferase
• A Dam-like methyltransferase
• DAM Methyltransferase and restriction enzyme
• Predicted methyltransferase

Slide 11

Slide 11 text

CTGCAG Motif is Unique to Outbreak Strain
[Figure: methyltransferase and motif (weblogo) by strain. Strains: C227 outbreak, 55989, 782-09, 17-2, 734-09, 760-09, 35-10, 042, 1010. Row labels in slide order: M.EcoGI, Non-specific, M.EcoGII, Non-specific, M.EcoGV, M.EcoGVIII, M.EcoGVI, M.EcoGIV, M.EcoGIII, M.EcoGVII, M.EcoGIX, Non-specific, M.EcoGDam]

Slide 12

Slide 12 text

CTGCAG Methylation Affects Gene Expression
[Figure: up-regulated and down-regulated genes]

Slide 13

Slide 13 text

Bacterial Methylation Experimental Design

Slide 14

Slide 14 text

Detecting DNA Base Modifications: Experimental Design and Analysis
• High-level guidance on experimental design to study bacterial methylomes
• Workflow recommendations
• Details on Analysis Procedures
www.pacb.com/basemod

Slide 15

Slide 15 text

Experimental Design – Choosing an Analysis Method
Project Goals:
• Characterization of methylome
• De novo assembly
• Resequencing
• Comparing differences in base modification between two native DNA samples
• Identifying highly modified motifs throughout the genome vs. interrogating specific regions of the genome with high confidence
Coverage Needs:
• Coverage needs vary based on magnitude of the kinetic signal
• Magnitude of kinetic signal varies by type of modification
• Recommendation: Target 100x Coverage
– Coverage across genome will follow normal distribution
– Minimum of 25x coverage per strand
– Matches with recommendations for de novo assembly
Modification Type      Minimum Coverage per Strand
4-mC                   25x
6-mA                   25x
5-mC                   250x
TET-converted 5-mC     25x
Workflow: Experimental Design → Isolate DNA → Template Preparation → Sequencing → Analysis
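The coverage table above lends itself to a quick planning check. The following is a sketch only: the thresholds come from the slide, while the function names and the assumption that total coverage splits evenly between the two strands are illustrative.

```python
# Minimum per-strand coverage by modification type, as listed on the slide.
MIN_COVERAGE_PER_STRAND = {
    "4-mC": 25,
    "6-mA": 25,
    "5-mC": 250,
    "TET-converted 5-mC": 25,
}


def per_strand_from_total(total_coverage):
    """Estimate per-strand coverage from total coverage.

    Assumption (not from the slide): reads split evenly between strands.
    """
    return total_coverage / 2


def modifications_covered(per_strand_coverage):
    """Return the modification types whose minimum coverage is met."""
    return [
        modification
        for modification, minimum in MIN_COVERAGE_PER_STRAND.items()
        if per_strand_coverage >= minimum
    ]


# At the recommended 100x total (about 50x per strand under the assumption
# above), every type except unconverted 5-mC meets its minimum.
covered = modifications_covered(per_strand_from_total(100))
```

Because the slide notes that coverage varies across the genome, a real plan would check the low-coverage tail and not just the mean.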

Slide 16

Slide 16 text

The Choice of Control for Determining IPD Ratios Will Also Influence Experiment Planning
Control Options
In silico
– Definition: Computational model for predicting the mean IPD per given sequence context at the position of inquiry
– Advantages: Eliminates need for data from an amplified control, reducing sequence data required
Amplified
– Definition: Control created by separately sequencing an amplified version of the sample of interest
– Advantages: May produce lower background (statistical noise) than in-silico control when sequencing to very high coverage
Native
– Definition: Comparison of two native DNA samples to identify differences in modifications present
– Advantages: Focuses on regions of differential modification
When to Use (in silico and amplified controls, in slide order)
• Default analysis mode
• All polymerases supported
• C2 chemistry only
• For identification
• XL chemistry
• Confirming results from in-silico control
When to Use (native control)
• Comparative studies to find differential modification

Slide 17

Slide 17 text

Template Preparation Recommendations
• Follow normal extraction recommendations
• No amplification – must use native DNA!
– Except if using amplified control
• DNA-damage repair
– Does not impact methylated bases (4-mC, 6-mA, 5-mC)
– Avoid for other types of modification (e.g., 8-oxo-G, dU)
• Follow large-insert-library template-preparation recommendations for de novo assembly
– Compatible with size-selection methods
• Optional Tet1-conversion protocol on SampleNet
• For resequencing, can use shorter-insert libraries
– Proper mapping critical; insert size must exceed the repeat length
– Multi-molecule analysis sufficient to perform motif analysis (exception: rare DNA-damage events)

Slide 18

Slide 18 text

Sequencing Recommendations
Long-Insert Libraries (De novo)
– Instrument: PacBio® RS II
– DNA Polymerase/Binding Kit: DNA/Polymerase Binding Kit P4
– DNA Sequencing Kit: DNA Sequencing Kit 2.0 (C2)
– Loading: MagBead loading; follow protocol for insert size
– Stage Start: Stage Start = yes
– Movie Time: 1 x 120 minute movie
Short-Insert Libraries (Resequencing)
– Instrument: PacBio® RS II
– DNA Polymerase/Binding Kit: DNA/Polymerase Binding Kit P4
– DNA Sequencing Kit: DNA Sequencing Kit 2.0 (C2)
– Loading: MagBead loading
– Stage Start: Stage Start = no
– Movie Time: 1 x 45 minute movie or longer

Slide 19

Slide 19 text

Methylome Analysis Recommendations
Generate Reference
• Recommend 100x coverage for de novo Hierarchical Assembly
• Upload Reference into SMRT® Portal
• If using an amplified control
– Align the WGA sequence to the reference sequence so that it may be used as an amplified control
• For P4-optimized in silico reference, use SMRT Portal v2.0.1
Identification of putative modification sites
• SMRT Portal v2.0.1 using RS_Modification_and_Motif_Analysis.1
• If using an amplified control, refer to the amplified control job
• View results in SMRT View
Analysis steps: Generate Reference → Identification of putative sites with SMRT Portal v2.0.1 → Automated Motif Identification → Visualize with SMRT View

Slide 20

Slide 20 text

Microbial Methylome Experimental Design Takeaways
Workflow stages: Sample Prep → Run Design → Sequencing on the PacBio® RS and primary analysis → Secondary Analysis → Tertiary Analysis
• Limit DNA damage during sample extraction
• Standard SMRTbell™ library preparation workflow
• No amplification!
• Target insert size based on assembly needs
• Tet1 conversion possible for 5-mC (see SampleNet)
• Use in silico control
• Use PacBio® RS II to maximize throughput
• Target 100X Coverage
• P4, XL and C2 binding kit all supported with v2.0.1; P4 Polymerase recommended
• Sequencing Kit 2.0 (C2); NOT XL
• >10 kb libraries: 1x120 movies (MagBead, with Stage Start)
• <3 kb libraries: 1x45 movies (MagBead, no Stage Start)
• SMRT Analysis v2.0.1 required for optimized in silico control for P4 enzyme
• Use HGAP assembly method including Quiver to generate best assembly as reference
• SMRT Portal: RS_Modification_and_Motif_Analysis.1 protocol
• SMRT View to visualize potential modification events
• Refer to DevNet for additional information

Slide 21

Slide 21 text

Example Project: Meiothermus ruber DSM 1279

Slide 22

Slide 22 text

Microbial Assembly - Meiothermus ruber
10 kb SMRTbell™ library, 3 SMRT® Cells (C2-C2 Chemistry, PacBio® RS)
[Workflow figure. Steps: Pre-assembly, Celera Assembler, Minimus2, Quiver. Data labels: long seed reads (>5 kb), pre-assembled long reads, 5 contigs, 1 contig, 1 contig]
• Single-contig assembly
• 99.99965% concordance with reference
• 99.3% genes predicted
Collaboration with A. Clum, A. Copeland (Joint Genome Institute)

Slide 23

Slide 23 text

Methylome Analysis - Meiothermus ruber
10 kb SMRTbell™ library, 3 SMRT® Cells (C2-C2 Chemistry, PacBio® RS)
[Workflow figure. Assembly steps: Pre-assembly, Celera Assembler, Minimus2, Quiver. Data labels: long seed reads (>5 kb), pre-assembled long reads, 5 contigs, 1 contig, 1 contig. Followed by: Modification and Motif Detection Workflow; Additional Optional Analysis with R-kinetics Package; Methylome]
Collaboration with A. Clum, A. Copeland (Joint Genome Institute)

Slide 24

Slide 24 text

Kinetic Variation across the Genome of M. ruber

Slide 25

Slide 25 text

SMRT® View Displays Kinetic Information to Examine Base Modifications

Slide 26

Slide 26 text

Kinetics Distributions by Base
• Distributions show clear evidence for 6-mA and 4-mC

Slide 27

Slide 27 text

Detected Methyltransferase Specificities
• Detected six m6A and two m4C methyltransferase activities
• 5’-TTAm6A-3’ and 5’-Am6ATT-3’ are not reverse complementary
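The reverse-complement remark above can be checked directly. This is a small sketch using the standard IUPAC nucleotide codes; the helper name `reverse_complement` is an illustrative choice, and the motifs are written without their methylation marks.

```python
# Complement table for IUPAC nucleotide codes (upper case only).
_COMPLEMENT = str.maketrans(
    "ACGTRYSWKMBDHVN",
    "TGCAYRSWMKVHDBN",
)


def reverse_complement(motif):
    """Return the reverse complement of a DNA motif written 5' to 3'."""
    return motif.upper().translate(_COMPLEMENT)[::-1]


# Each motif is its own reverse complement, so the two motifs cannot be
# the two strands of a single recognition site.
ttaa_rc = reverse_complement("TTAA")
aatt_rc = reverse_complement("AATT")
```

The same helper shows that 5’-RGATCY-3’ is palindromic, which is why the motif reads identically on both strands.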

Slide 28

Slide 28 text

Plots Showing the Increased IPD Ratio at Modified Positions at Selected Motifs
[Figure panels: CAGACG, RGATCY]

Slide 29

Slide 29 text

PacBio® Detection vs. REBASE Prediction
5’-CCA(N6)TGCC-3’
3’-GGT(N6)ACGG-5’
5’-GGGAGC-3’ or 5’-CAGAYG-3’?

Slide 30

Slide 30 text

In-Depth Methylome Analysis Can Uncover Interesting Biology
• Significant number of GATC sites have very low Qmod values and good coverage
• Anti-correlation between 4-mC and 6-mA in 5’-RGATCY-3’ contexts (R = A/G; Y = C/T)
• Evidence for partial methylation patterns
• Multi-base kinetic footprint for 4-mC is not perfectly recognized, though scores do reflect methylation

Slide 31

Slide 31 text

Being in the Context of 5’-RGATCY-3’ Affects the Methylation Score of A in 5’-Gm6ATC-3’
• 5’-GATC-3’ motif can be categorized into two types: 5’-RGATCY-3’ and non-RGATCY
– Kinetic score of A in 5’-RGATCY-3’ is low (mean = 1.78, median = 0.00)
– Kinetic score of A in non-RGATCY is high, with distribution centered around 310
• What is the mechanism for A to remain unmethylated in the presence of 4-mC in 5’-RGATCY-3’ context?
• Is there a biological correlation to this observation?
[Figure panels: non-RGATCY, RGATCY]
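Splitting GATC sites into the two context classes described above could be done along these lines. This is a minimal sketch, not the analysis used for the slide: the function name `classify_gatc_sites`, the 0-based coordinates and the "incomplete" label for sites at the sequence edge are assumptions.

```python
import re


def classify_gatc_sites(sequence):
    """Classify each GATC site on the given strand by its flanking bases.

    Returns a list of (position_of_A, label) tuples with 0-based positions.
    label is "RGATCY" when the site is preceded by A/G and followed by C/T,
    "non-RGATCY" when both flanks exist but do not match, and "incomplete"
    when the site touches the end of the sequence.
    """
    sequence = sequence.upper()
    sites = []
    for match in re.finditer("GATC", sequence):
        start, end = match.start(), match.end()
        if start == 0 or end == len(sequence):
            label = "incomplete"  # a flanking base is missing
        elif sequence[start - 1] in "AG" and sequence[end] in "CT":
            label = "RGATCY"
        else:
            label = "non-RGATCY"
        sites.append((start + 1, label))  # start + 1 is the A within GATC
    return sites


# Hypothetical sequence with one site of each class.
sites = classify_gatc_sites("AGATCTTTCGATCA")
```

Per-site kinetic scores could then be grouped by label to reproduce the kind of comparison summarized on the slide.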

Slide 32

Slide 32 text

Available Resources

Slide 33

Slide 33 text

Summary of Key Points
• Modifications are involved in many biochemical processes
• SMRT® Sequencing provides a path to distinguishing dozens of different modifications
• SMRT® Analysis simplifies detection of putative modification events using IPD-ratio analysis and a built-in motif-discovery algorithm
• Guidance on experimental design and analysis recommendations available

Slide 34

Slide 34 text

Base-Modification Information and Tools - Available Now
• White paper
– Overview of application area and relevant background
• Webinars and Videos
• Technical note
– Bacterial-methylome analysis
• Code and “How to” on DevNet
– Tech note describing how to analyze IPDs
– Motif analysis using R or GUI
– Bacterial Assembly and Epigenetic Analysis Training Web Video: http://www.pacificbiosciences.com/Tutorials/Bacterial_Assembly_Epigenetic_Analysis_HGAP/story_html5.html
• Publications
• Base Modification Data
– http://pacb.com/bmd
http://pacb.com/basemod

Slide 35

Slide 35 text

Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, and SMRTbell are trademarks of Pacific Biosciences in the United States and/or other countries. All other trademarks are the sole property of their respective owners.