Analyzing, Annotating, and Linking RNA Data to Create Knowledge and Facilitate Research

BGSU RNA
April 21, 2015

This talk was given by Professor Neocles Leontis at the European Bioinformatics Institute on April 21, 2015.

Abstract:
Linking and integrating diverse RNA data resources is crucial for deepening our understanding of RNA sequence, structure, function, and evolution, while also facilitating knowledge discovery for practical applications, such as RNA design for nanotechnology and nanomedicine. Atomic-resolution 3D data of highly structured RNA molecules, like ribosomal RNAs, riboswitches, and ribozymes, are rapidly accumulating. These data, appropriately integrated, are relevant for understanding many other RNA molecules, since the potential for 3D structure can be as important as the sequence, even for molecules such as mRNAs, where the sequence-encoded information is central to function but structured regions can modulate the translation of that information in important ways. I will use 16S rRNA as an example to illustrate the characteristics of structured RNA molecules and ways of visualizing, analyzing, and annotating RNA structures at different levels of organization, from pairwise interactions to recurrent local motifs (hairpin, internal, and junction loops) to domains and folds. I will review the challenges and successes of visualizing structural annotations on 2D diagrams and of connecting sequence alignments with structures, and the need for automated tools to assist in comparative analysis. A major issue is integrated access to the different forms of RNA data by diverse users in ways that facilitate knowledge discovery. A key challenge is that the use of these data is currently limited by users' familiarity with the specialized software tools needed to access, interpret, and assess the quality of the data. Ways are needed to overcome these barriers and provide actionable information to diverse users regardless of their scientific backgrounds.

Transcript

  1. Analyzing, Annotating, and Linking RNA Data to Create Knowledge and Facilitate Research

    European Bioinformatics Institute, April 21, 2015
    Neocles Leontis, Bowling Green State University
    BGSU RNA Bioinformatics and Nanotechnology Group, rna.bgsu.edu
  2. rna.bgsu.edu

    PDB/NDB: Helen Berman, John Westbrook, Saheli Ghosh, Buvna Narayanan
    Alumni (recent): Irina Novikova, Jesse Stombaugh, Kiril Afonin (UNC), Ryan Rahrig (ONU), Amal Abu Almakarem, Lorena Parlea (NCI)
    Lab members: Neocles Leontis, Craig Zirbel, Blake Sweeney, James Roll, Poorna Roy, Maryam Hosseini, Jamie Cannone, Emil Khisamutdinov, Anton Petrov (EBI), Steve Dinda, Ali Mokdad, Megan Pirrung
  3. Acknowledgments

    Collaborators:
    • Craig Zirbel, BGSU Mathematics and Statistics: software and database development (FR3D, JAR3D, R3D Align, RNA 3D Hub)
    • Eric Westhof, IBMC, Strasbourg, France: RNA 3D structure analysis
    • Helen Berman and John Westbrook, Rutgers University: PDB/NDB
    • Biao Ding, Plant Molecular Genetics, Ohio State University: viroid structure/function
    • Jiri & Judit Sponer, Masaryk University, Brno, Czech Republic: QM and MD simulation
    • Robin Gutell & Jamie Cannone, Univ. of Texas Austin
  4. Goals: Build, Maintain, and Deploy Reliable Tools and Resources for RNA 3D Annotation, Query, Analysis, and Prediction

    Collaborators: Craig Zirbel (BGSU); Helen Berman, John Westbrook, Saheli Ghosh, and Buvna Narayanan (Rutgers University, PDB/NDB)
    Funding: NIH
  5. Automated Pipeline:

    1. Classify and annotate new RNA-containing structures from PDB (weekly)
    2. Update Non-Redundant set of 3D structures (weekly)
    3. Extract 3D motifs from new structures
    4. Cluster 3D motifs and update RNA 3D Motif Atlas (monthly)
    5. Update probabilistic models for 3D motif prediction (monthly)
  6. Software and Web Services

    • Non-redundant Sets
    • RNA 3D Structure Annotations
    • 3D Motif Atlas
    • Find RNA 3D ("FR3D" and "WebFR3D"): 3D motif search
    • R3DAlign: align structures
    • JAR3D: motif prediction
    • R3D-2-MSA: sequence alignments
    • Base Pair and Base Triple Databases
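Linking sequence alignments to structures (the job listed for R3D-2-MSA) comes down to mapping alignment columns onto residue numbers in a 3D structure. The slide does not describe how R3D-2-MSA does this; the sketch below only illustrates that bookkeeping, assuming a gapped alignment row that uses '-' or '.' for gaps and a list of the structure's residue numbers for the ungapped nucleotides. The helper name `column_to_residue` is hypothetical.

```python
from typing import Dict, List

GAP_CHARS = "-."  # assumed gap symbols in the alignment row


def column_to_residue(aligned_row: str, residue_numbers: List[str]) -> Dict[int, str]:
    """Map 0-based alignment columns to 3D-structure residue numbers.

    Hypothetical helper: residue_numbers lists the structure's residue
    numbers (strings, so PDB insertion codes can be kept) for each
    ungapped nucleotide of aligned_row, in 5'->3' order.
    """
    columns = [i for i, ch in enumerate(aligned_row) if ch not in GAP_CHARS]
    if len(columns) != len(residue_numbers):
        raise ValueError("residue count does not match the ungapped sequence length")
    return dict(zip(columns, residue_numbers))


if __name__ == "__main__":
    # Toy alignment row with two gap columns (illustrative data only)
    print(column_to_residue("GC-A.U", ["10", "11", "12", "13"]))
    # {0: '10', 1: '11', 3: '12', 5: '13'}
```

Inverting the returned dictionary gives the reverse lookup, from a residue in the structure to its alignment column.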
  7. Structured internal loops: Sarcin-ricin, Kink-turn, C-loop, Triple-sheared, Sarcin-like, Tandem-sheared

    Over 100 other structured motifs …
    Figures produced using VARNA (Darty, Denise & Ponty, 2009)
  8. RNA 3D Motif Atlas

    • RNA 3D motifs are extracted from the current Non-Redundant (NR) list of 3D structures
    • Clustered by geometric similarity
    • All instances of the same motif have the same non-Watson-Crick base pairing pattern
    • A motif can have instances with different sequences and numbers of nucleotides
    • Instances and motifs are assigned unique and stable IDs
    • Versioning and archiving system
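The slide says motif instances are clustered by geometric similarity but does not give the procedure. As a hypothetical stand-in (not necessarily what the Motif Atlas uses), the sketch below groups instances by single linkage: two instances share a group if a chain of pairwise geometric discrepancies, each at or below a cutoff, connects them. It uses a small union-find structure; the discrepancy values and cutoff in the example are made up.

```python
from typing import Dict, List, Sequence


def cluster_by_discrepancy(names: Sequence[str],
                           disc: Sequence[Sequence[float]],
                           cutoff: float) -> List[List[str]]:
    """Single-linkage grouping (hypothetical stand-in for the Atlas clustering).

    Instances i and j end up in the same group if they are connected by a
    chain of pairwise discrepancies that are all <= cutoff.
    """
    parent = list(range(len(names)))

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path as we go
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if disc[i][j] <= cutoff:
                parent[find(i)] = find(j)  # merge the two groups

    groups: Dict[int, List[str]] = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())


if __name__ == "__main__":
    names = ["a", "b", "c", "d"]
    disc = [  # made-up symmetric discrepancy matrix
        [0.0, 0.2, 0.6, 1.0],
        [0.2, 0.0, 0.3, 0.8],
        [0.6, 0.3, 0.0, 0.9],
        [1.0, 0.8, 0.9, 0.0],
    ]
    print(cluster_by_discrepancy(names, disc, cutoff=0.5))  # [['a', 'b', 'c'], ['d']]
```

Note the chaining effect: "a" and "c" land together through "b" even though their own discrepancy (0.6) exceeds the cutoff. A production pipeline might use a stricter criterion to avoid this.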
  9. FR3D Software Package: http://rna.bgsu.edu/FR3D/

    ✤ Annotates 3D structures
    ✤ Supports geometric, symbolic, or mixed 3D motif search
    ✤ Aligns instances
    ✤ Lists all interactions to compare motif instances …
  10. FR3D Output for C-loops

    Columns: PDB file; geometric discrepancy from the query (first row); motif nucleotides 1 to 6; pairwise interactions between the numbered nucleotides; structural alignment.

    File  Discr. Nt1    Nt2    Nt3    Nt4    Nt5    Nt6    1-2   1-6   2-5   3-4   3-6   4-5   5-6   Alignment
    2AW4  0.000  U2680  C2681  C2683  U2684  A2725  A2727  s35   cWW   tWH   s35   cWS   cWW   s35   UCA-CU....AA-A
    1s72  0.127  C2717  C2718  C2720  U2721  A2761  G2763  s35   cWW   tWH   s35   cWS   cWW   s35   CCA-CU....AC-G
    1kog  0.136  C96    C97    C99    U100   A74    G76    s35   cWW   tWH   s35   cWS   cWW   s35   CCA-CU....AU-G
    2j01  0.229  G1319  C1320  A1322  U1323  A1331  C1333  s35   cWW   tWH   s35   ncWS  ncWW  s35   GCA-AU....AG-C
    2AW4  0.232  C1319  C1320  A1322  C1323  G1331  G1333  s35   cWW   tWH   s35   ncWS  cWW   s35   CCA-AC....GG-G
    2AW4  0.244  G864   C865   C867   U868   A909   C912   s35   cWW   tWH   s35   ncWS  cWW   s35   GCA-CU....AAAC
    1s72  0.256  G1425  C1426  C1428  U1429  A1437  C1439  s35   cWW   tWH   s35   cWS   cWW   s35   GCA-CU....AG-C
    2j01  0.278  G864   C865   C867   U868   A909   C912   s35   cWW   tWH   s35   ncWS  cWW   s35   GCA-CU....AAAC
    1j5e  0.380  G371   C372   A374   U375   A389   C390   s35   cWW   tWH   s35   cWS   cWW   s35   GCA-AU....A--C
    1s72  0.402  G958   C959   C962   C963   A1005  C1008  s35   cWW   tWH   s35   cWS   cWW   s35   GCGACC....AAAC
    2AVY  0.415  A371   C372   A374   U375   A389   U390   s35   cWW   ntWH  s35   cWS   cWW   s35   ACA-AU....A--U

    (Figure: motif nucleotides 1 to 6 and their pairwise interactions)
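The interaction codes in the FR3D output follow a compact pattern that software can decode. A minimal sketch, assuming (as in FR3D's notation) that a leading `n` marks a "near" interaction that almost meets the geometric criteria, that basepair codes are orientation (`c`/`t`) plus two edges (`W`, `H`, `S`), and that codes like `s35` denote base stacking. The class and function names are hypothetical, not FR3D's API.

```python
import re
from typing import NamedTuple, Optional, Tuple

EDGES = {"W": "Watson-Crick", "H": "Hoogsteen", "S": "Sugar"}

# Assumed code patterns: optional "n" (near), then basepair or stacking code
BASEPAIR = re.compile(r"n?[ct][WHS][WHS]")
STACK = re.compile(r"n?s(33|35|53|55)")


class Interaction(NamedTuple):
    kind: str                                # "basepair" or "stack"
    near: bool                               # True for "n"-prefixed annotations
    orientation: Optional[str] = None        # "cis" or "trans" (basepairs only)
    edges: Optional[Tuple[str, str]] = None  # (edge of first nt, edge of second nt)


def parse_code(code: str) -> Interaction:
    """Decode one FR3D-style interaction code such as 'ncWS' or 's35'."""
    near = code.startswith("n")
    body = code[1:] if near else code
    if BASEPAIR.fullmatch(code):
        orientation = "cis" if body[0] == "c" else "trans"
        return Interaction("basepair", near, orientation, (EDGES[body[1]], EDGES[body[2]]))
    if STACK.fullmatch(code):
        return Interaction("stack", near)
    raise ValueError(f"unrecognized interaction code: {code!r}")


if __name__ == "__main__":
    row = ["s35", "cWW", "tWH", "s35", "ncWS", "ncWW", "s35"]  # 2j01 row of the table
    parsed = [parse_code(c) for c in row]
    print(sum(p.near for p in parsed))  # 2
```

Counting near interactions per row is one way to see why instances further down the list have larger discrepancies from the query.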
  11. Questions

    • Why is RNA 3D structure important in ncRNA?
    • What can we learn from structured RNAs like the ribosome that is transferable to new RNAs?
    • Modular 3D motifs: How do we find, compare, classify, and predict RNA 3D motifs?
    • What recurrent interactions occur in RNA 3D structure?
  12. Challenge: To represent complex RNA structures

    • To make them (1) easily comprehended by humans and (2) readable and computable by computer software
    • To automatically integrate diverse data to understand
    (Figure: 50S subunit, Ban et al. 2000)
  13. Example: 5S rRNA. What do we see in 2D structures?

    → Watson-Crick helices and "loops" ("Loop A" to "Loop E")
    • Hairpin loops: C and D
    • Internal loops: B and E
    • Junction loop: A
    • Helices 1 to 5
    (Figure: 5S rRNA secondary structure)
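The loop vocabulary on this slide (hairpin, internal, and junction loops between Watson-Crick helices) can be read off a secondary structure mechanically. A minimal sketch, assuming the structure is given in dot-bracket notation (the slide does not specify a format) and counting bulges as internal loops; the function names are illustrative. A loop closed by a base pair is a hairpin if it encloses no other pair, an internal loop if it encloses exactly one pair that is not simply stacked on it, and a junction if it encloses two or more.

```python
from collections import Counter
from typing import List


def pair_table(db: str) -> List[int]:
    """Partner index for each position of a dot-bracket string (-1 = unpaired)."""
    stack: List[int] = []
    pt = [-1] * len(db)
    for i, ch in enumerate(db):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            pt[i], pt[j] = j, i
    if stack:
        raise ValueError("unmatched '('")
    return pt


def classify_loops(db: str) -> Counter:
    """Count helices and the loop closed by each base pair (exterior loop ignored)."""
    pt = pair_table(db)
    counts: Counter = Counter()
    for i, j in enumerate(pt):
        if j <= i:  # unpaired, or pair already visited from its 5' end
            continue
        # A helix starts at (i, j) unless (i-1, j+1) is also a pair
        if i == 0 or pt[i - 1] != j + 1:
            counts["helix"] += 1
        # Collect the inner pairs (branches) directly enclosed by (i, j)
        branches: List[int] = []
        k = i + 1
        while k < j:
            if pt[k] > k:
                branches.append(k)
                k = pt[k] + 1
            else:
                k += 1
        if not branches:
            counts["hairpin"] += 1
        elif len(branches) == 1:
            inner = branches[0]
            if not (inner == i + 1 and pt[inner] == j - 1):
                counts["internal"] += 1  # bulges are counted here too
        else:
            counts["junction"] += 1
    return counts


if __name__ == "__main__":
    print(classify_loops("((.((...)).))"))             # one internal loop
    print(classify_loops("((..((...))..((...))..))"))  # one three-way junction
```

As slides 14 and 15 stress, such 2D "loops" may be fully paired in 3D, so counts like these describe the secondary structure only.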
  14. 3D x-ray structure of 5S loop E shows: the loop is not a "loop"; all bases are paired!
  15. 5S loop E 3D structure: "loops" are not loops; all bases are paired!

    "Loop E" solved (Correll et al. 1997)
  16. 5S loop E: predicted to be recurrent (Leontis & Westhof, 1998)

    Predicted chloroplast loop E (1997): confirmed by NMR (Vallurupalli & Moore 2003)
    Loop E solved (Correll et al. 1997): all bases paired!
  17. Modular 3D Motifs: RNA Structural Building Blocks

    • Correspond to 2D "loop" motifs: hairpin (external), internal, and junction loops
    • Modular
    • Recurrent
    • Play specific roles
    • Goals: predict from sequence and 2D structure, and make better alignments
  18. Questions about 3D Motifs

    • How many different types are there? How many hairpin loops? How many internal loops? How many junction loops?
    • How to group and classify them? Structurally? Functionally?
    • How to recognize them in sequences?
    • What are the important stabilizing interactions?
    • How many DIFFERENT sequences form the SAME motif?
    • What is the effect of a given mutation on stability, structure, dynamics?
  19. How we define "RNA motif" depends on the level of analysis

    • Sequence-level definitions
    • Secondary-structure-level definitions
    (Figure labels: sequence letters G G C A U G A A G N R A; "Consensus Description"; "5-loop", "4-loop", "3x5-loop", "3x3-loop"; "Different From:")
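A sequence-level definition is often a consensus written in IUPAC codes. GNRA (N = any base, R = A or G), whose letters appear on this slide, is a well-known tetraloop consensus. The sketch below, with hypothetical function names and an assumed RNA alphabet (U rather than T), turns such a consensus into a regular expression and reports every matching position. A sequence match alone does not establish that the motif actually forms, which is the point of distinguishing levels of analysis.

```python
import re
from typing import List

# IUPAC nucleotide codes (RNA alphabet) and the bases each one allows
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}


def consensus_regex(consensus: str) -> "re.Pattern[str]":
    """Turn an IUPAC consensus such as 'GNRA' into a compiled regular expression."""
    return re.compile("".join(f"[{IUPAC[c]}]" for c in consensus.upper()))


def find_consensus(consensus: str, seq: str) -> List[int]:
    """0-based start positions of every (possibly overlapping) match in seq."""
    pattern = consensus_regex(consensus)
    n = len(consensus)
    return [i for i in range(len(seq) - n + 1) if pattern.fullmatch(seq, i, i + n)]


if __name__ == "__main__":
    print(find_consensus("GNRA", "CCGAAAGGUGCGAGG"))  # [2, 9]
```

Scanning windows with `fullmatch` (rather than `finditer`) keeps overlapping matches, which matters for short, degenerate consensus strings.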
  20. Recurrent Kink-turn, C-loop, & SR Motifs in 23S rRNA

    • Kink-turn motifs: Kt-7, Kt-15, Kt-38, Kt-42, Kt-46, Kt-58, composite Kt-78
    • C-loops: C-38, C-52, C-96
    • Sarcin motifs
  21. Architectural roles

    • C-loops: change the twist of the helix
    • Kink-turns: flexible bend in the helix
    • Platforms: 3° RNA binding sites (e.g., GAAA loop receptor)
    • Junctions: branch points and hinges
  22. Interactions in E. coli 16S rRNA (2AW7)

    Type of interaction          # of nucleotides (out of 1534)
    WC basepairs                 944
    Non-WC basepairs             708
    Base stacking                1476
    Base-phosphate               222
    No interaction ("bulged")    7
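Expressing these counts as fractions of the 1534 nucleotides makes the slide's point concrete: almost every nucleotide stacks, roughly 46% take part in a non-WC basepair, and only 7 form no interaction at all. Because the counts sum to 3357, more than the 1534 nucleotides, a single nucleotide evidently can be counted in several categories. A short Python sketch of the arithmetic, using only the values from the table:

```python
from typing import Dict

TOTAL_NT = 1534  # nucleotides in E. coli 16S rRNA (2AW7), from the slide

# Number of nucleotides taking part in each interaction type (slide values)
COUNTS = {
    "WC basepairs": 944,
    "Non-WC basepairs": 708,
    "Base stacking": 1476,
    "Base-phosphate": 222,
    "No interaction (bulged)": 7,
}


def percentages(counts: Dict[str, int], total: int) -> Dict[str, float]:
    """Percentage of all nucleotides in each category, rounded to 0.1%."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


if __name__ == "__main__":
    for name, pct in percentages(COUNTS, TOTAL_NT).items():
        print(f"{name:<25} {pct:5.1f}%")
    # The counts add up to more than TOTAL_NT, so the categories overlap
    print(sum(COUNTS.values()))  # 3357
```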
  23. Long non-coding RNAs (lncRNAs)

    • >200 nts (typically 1,000 to 10,000 nts in length)
    • Often polyadenylated
    • Transcribed by RNA Pol II and spliced
    • Generally do NOT code for proteins (some do)
    • >15,000 lncRNAs in humans
    • Most are expressed at low levels
  24. lncRNA functions

    Play critical regulatory roles in:
    – embryonic stem cell pluripotency
    – brain function
    – subcellular compartmentalization
    – chromatin remodeling
    Play key roles in:
    – intracellular signaling
    – extracellular signaling
    – stress response
    Links to diseases, including cancer
  25. Questions about lncRNAs:

    1. Are lncRNAs highly structured or disordered?
    2. Do they contain globular sub-domains?
       OR
    3. Are they organized linearly in chains of stem-loops?
    4. Do lncRNAs exist in ribonucleoprotein complexes, or as isolated RNAs that transiently interact with proteins?
    5. Do these molecules contain a compact core, or are they more extended?
  26. mRNAs only represent 20% of all transcripts (Sanbonmatsu 2012)

    Models for lncRNA structures:
    • Globular model
    • Extended stem-loop model
    • Minimally structured model
  27. Chemical Probing of the Steroid Receptor RNA Activator lncRNA (SRA) (Novikova et al. 2012)

    SRA contains: 25 helical segments, 16 terminal loops, 15 internal loops, and 5 junction regions
    Work of Dr. Irina Novikova, former BGSU student
  28. http://rna.bgsu.edu/jar3d

    JAR3D (Java-based Alignment of RNA using 3D structure) uses consensus interactions, basepair isostericity, and edit distance to score matches between sequences and motif groups.
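The slide lists edit distance as one of JAR3D's scoring criteria without giving its exact definition. As a hedged illustration, the sketch below computes standard Levenshtein distance and reports the closest instance in a motif group. The '*' strand separator and the example sequences (loosely adapted from the C-loop rows of slide 10) are assumptions for illustration, not JAR3D's input format, and the function names are hypothetical.

```python
from typing import Iterable, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def closest_instance(query: str, instances: Iterable[str]) -> Tuple[int, str]:
    """Smallest edit distance from query to any instance, with that instance."""
    return min((edit_distance(query, s), s) for s in instances)


if __name__ == "__main__":
    # Hypothetical strand-joined motif sequences ('*' separates the two strands)
    group = ["CCACU*ACG", "GCACU*AGC", "GCAAU*AGC"]
    print(closest_instance("GCACU*AAC", group))  # (1, 'GCACU*AGC')
```

Keeping only two rows of the dynamic-programming table makes memory linear in the length of the second sequence.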
  29. Building an SCFG (Stochastic Context-Free Grammar) model for an internal loop

    3D structure(s) → consensus interactions → model tree
    Consensus interactions: GU cWW, AG tHS, UA tWH, GG tSH, UG cWW

    IsoDiscrepancy Index (IDI) of each base combination with respect to UA tWH (rows: base on the Watson-Crick edge; columns: base on the Hoogsteen edge; blank = no value):

            A      C      G      U
        A   4.09          3.61
        C   2.63   2.75   3.12
        G                 2.57   4.23
        U   0.00          2.51   2.58

    Score = f(IDI), then normalize.

    Normalized substitution scores (probability scores for the basepair corresponding to UA tWH), same row/column layout:

            A       C       G       U
        A   0.0476  0.0123  0.0580  0.0123
        C   0.0903  0.0854  0.0718  0.0123
        G   0.0123  0.0123  0.0930  0.0451
        U   0.2467  0.0123  0.0956  0.0925
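The slide leaves the function f in Score = f(IDI) unspecified. The sketch below is a reverse-engineered guess, not JAR3D's documented formula: it assumes f(IDI) = 1 / (1 + IDI²/4) and a constant 0.05 for base combinations with no IDI value, then normalizes so the 16 scores sum to 1. With these assumptions the normalized values come within about 0.0002 of the slide's table (about 0.2466 for UA versus 0.2467), which at least shows the shape of the computation: the most isosteric combination receives the highest probability.

```python
from typing import Dict, Optional, Tuple

Pair = Tuple[str, str]  # (base on Watson-Crick edge, base on Hoogsteen edge)

# IDI values relative to UA tWH, as listed on the slide; None = no value given
IDI: Dict[Pair, Optional[float]] = {
    ("A", "A"): 4.09, ("A", "C"): None, ("A", "G"): 3.61, ("A", "U"): None,
    ("C", "A"): 2.63, ("C", "C"): 2.75, ("C", "G"): 3.12, ("C", "U"): None,
    ("G", "A"): None, ("G", "C"): None, ("G", "G"): 2.57, ("G", "U"): 4.23,
    ("U", "A"): 0.00, ("U", "C"): None, ("U", "G"): 2.51, ("U", "U"): 2.58,
}


def score(idi: Optional[float], missing: float = 0.05) -> float:
    """HYPOTHETICAL f(IDI): decreasing in IDI, small constant when undefined."""
    if idi is None:
        return missing
    return 1.0 / (1.0 + idi ** 2 / 4.0)


def normalized_scores(table: Dict[Pair, Optional[float]]) -> Dict[Pair, float]:
    """Apply f to every base combination, then rescale so the scores sum to 1."""
    raw = {pair: score(idi) for pair, idi in table.items()}
    total = sum(raw.values())
    return {pair: value / total for pair, value in raw.items()}


if __name__ == "__main__":
    probs = normalized_scores(IDI)
    for wc in "ACGU":
        print(wc, " ".join(f"{probs[(wc, h)]:.4f}" for h in "ACGU"))
```

Any monotonically decreasing f gives the same ranking; the particular form only controls how sharply non-isosteric substitutions are penalized.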
  30. To analyze RNA 3D motifs, we need to understand RNA interactions, especially basepairs
  31. Bases interact edge-to-edge to form basepairs

    How many ways can bases interact to form pairs?
    1. How many edges are there?
    2. How many ways can edges come together?
  32. Experimental evidence for thinking of bases as triangles: a base can make up to 3 basepairs

    Example: G68/A101/G64/A52 in T. thermophilus 16S rRNA (1j53.pdb)
  33. For each pair of edges, bases can come together in two orientations:

    • Cis (sugars on the same side)
    • Trans (sugars on opposite sides)
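  34. Putting it together: counting the base pairing types gives 12 (Leontis & Westhof, RNA, 2001)

    Edge 1 {Watson-Crick, Hoogsteen, Sugar} × Edge 2 {Watson-Crick, Hoogsteen, Sugar} × Orientation {Cis, Trans}
    Because the order of the two edges does not matter, the 3 × 3 edge combinations reduce to 6 distinct edge pairs (WW, WH, WS, HH, HS, SS), and 6 edge pairs × 2 orientations = 12 base pairing types.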
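The count of 12 follows from three edges per base (the triangle picture of slide 32) and two orientations. The short enumeration below writes each type in the cWW/tWH style used throughout these slides. Names such as tHS and tSH (both on slide 29) denote the same family with the two nucleotides listed in opposite order, so edge pairs are taken unordered here. The function name is illustrative.

```python
from itertools import combinations_with_replacement
from typing import List

EDGES = ["W", "H", "S"]    # Watson-Crick, Hoogsteen, Sugar edge
ORIENTATIONS = ["c", "t"]  # cis, trans


def basepair_families() -> List[str]:
    """Enumerate basepair families as orientation + unordered edge pair."""
    return [o + e1 + e2
            for o in ORIENTATIONS
            for e1, e2 in combinations_with_replacement(EDGES, 2)]


if __name__ == "__main__":
    families = basepair_families()
    print(len(families))  # 12
    print(families)
```

Using `itertools.product` instead would give 18 ordered combinations, double-counting the 6 mixed-edge families.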
  35. Start with Nt1 and add Nt2 to form pairs (cis orientation)

    Cis-Watson-Crick/Watson-Crick (cWW). Symbol: (shown on slide)
    (Figure: edges WC, H, and S of Nt1 and Nt2)
  36. Start with Nt1 and add Nt2 to form pairs (trans orientation)

    Trans-Watson-Crick/Watson-Crick (tWW). Symbol: (shown on slide)
    (Figure: edges WC, H, and S of Nt1 and Nt2)
  37. Start with Nt1 and add Nt2 to form pairs

    Cis-Watson-Crick/Hoogsteen (cWH). Symbol: (shown on slide)
    (Figure: edges WC, H, and S of Nt1 and Nt2)
  38. Start with Nt1 and add Nt2 to form pairs

    Trans-Watson-Crick/Hoogsteen (tWH). Symbol: (shown on slide)
    (Figure: edges WC, H, and S of Nt1 and Nt2)
  39. 33% of basepairs are non-Watson-Crick (in structured RNAs)

    Leontis & Westhof, RNA (2001); Stombaugh et al., Nucl. Acids Res. (2009)