PR2 - 18S rRNA sequence database

A short presentation about the present and future of the PR2 database. For more see https://github.com/vaulot/pr2_database.

Daniel Vaulot

May 23, 2018


  1. PR2: An update and perspectives...
    D. Vaulot, L. Guillou and F. Not, DIPO team, UMR7144, Station Biologique de Roscoff, France
  2. History
    • 1997
      • Excel file created by D. Vaulot during L. Guillou's thesis
    • 2000-2003
      • Access/ARB database maintained by D. Vaulot during PICODIV
    • 2006-2012
      • KeyDNAtools developed by L. Guillou
    • 2013 - mid-2016
      • PR2 creation
      • Coordination: L. Guillou
      • Database maintained by R. Christen
      • Web site active until mid-2016: ssu-rrna.org
    • Sept 2016 ...
      • Database maintained by D. Vaulot
      • Raw data put on Figshare
      • Database included in FROGS
    • April 2017 ...
      • Database transferred to MySQL on SCROL server
      • Development of R scripts to clean and update the database
    • November 2017
      • Repository on GitHub (versioning)
  3. PR2

  4. MySQL - Database structure
    • Taxonomy is in a separate table
    • Sequences are linked to taxonomy by the species field
  5. Taxonomy table
    • Follows PR2 convention (_X, _XX etc.)
    • Contains 47 000 lines
    • Each name is unique (i.e. does not appear in different columns or different lines)
    • Any daughter taxon has a unique mother taxon
    • Still needs some cleaning
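
The two consistency rules on this slide (a name never appears at two different levels, and a daughter taxon has a single mother taxon) lend themselves to an automated check. The sketch below is one possible way to test them, not the R scripts used for PR2; the rank list and the example rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical subset of ranks, ordered from highest to lowest.
RANKS = ["division", "class", "genus", "species"]


def check_taxonomy(rows):
    """Return (names used at several ranks, names with several parents)."""
    ranks_of = defaultdict(set)    # name -> ranks at which it occurs
    parents_of = defaultdict(set)  # name -> distinct parent names seen
    for row in rows:
        for i, rank in enumerate(RANKS):
            name = row[rank]
            ranks_of[name].add(rank)
            if i > 0:
                parents_of[name].add(row[RANKS[i - 1]])
    redundant = sorted(n for n, r in ranks_of.items() if len(r) > 1)
    multi_parent = sorted(n for n, p in parents_of.items() if len(p) > 1)
    return redundant, multi_parent


# Made-up rows with two deliberate problems:
# "Div_A" is used as both a division and a class,
# and "Gen_a" sits under two different classes.
example_rows = [
    {"division": "Div_A", "class": "Cls_A", "genus": "Gen_a", "species": "Gen_a_sp1"},
    {"division": "Div_A", "class": "Cls_B", "genus": "Gen_a", "species": "Gen_a_sp2"},
    {"division": "Div_B", "class": "Div_A", "genus": "Gen_c", "species": "Gen_c_sp1"},
]

redundant, multi_parent = check_taxonomy(example_rows)
print(redundant)     # ['Div_A']
print(multi_parent)  # ['Gen_a']
```
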
  6. pr2 main
    • Sequences are linked to taxonomy by species name, which is unique
    • Chimeras are removed when PR2 is exported
    • More cleaning is needed; we will be able to supply chimera lists in the future
  7. pr2 metadata
    Metadata contain
    • GenBank annotations (gb_ fields)
    • some gb_ fields have been manually edited (e.g. gb_strain and gb_clone)
    • manually curated annotations (e.g. sample_ fields)
    • fields computed from gb_ fields, such as longitude and latitude
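
As an illustration of a computed field, the sketch below derives signed decimal latitude and longitude from a GenBank-style `lat_lon` string such as "48.73 N 3.98 W". It assumes the source annotation follows that "value N|S value E|W" layout; the function name is hypothetical and this is not the code used to build PR2.

```python
import re

# Pattern for strings like "48.73 N 3.98 W" (assumed input layout).
_LAT_LON = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*([NS])\s+(\d+(?:\.\d+)?)\s*([EW])\s*$"
)


def parse_lat_lon(text):
    """Return (latitude, longitude) as signed floats, or None if unparsable."""
    match = _LAT_LON.match(text or "")
    if match is None:
        return None
    lat, north_south, lon, east_west = match.groups()
    # South and West are stored as negative values.
    latitude = float(lat) * (1 if north_south == "N" else -1)
    longitude = float(lon) * (1 if east_west == "E" else -1)
    return latitude, longitude


print(parse_lat_lon("48.73 N 3.98 W"))  # (48.73, -3.98)
print(parse_lat_lon("not recorded"))    # None
```

Returning None for anything that does not match keeps malformed annotations out of the computed columns instead of guessing a position.
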
  8. Database updates

    Division        Class                      Who                     Date   Status
    Rhizaria        Collodaria                 T. Biard                2015   Done
    Chlorophyta     -                          M. Tragin               2015   Done
    Haptophyta      -                          B. Edvardsen            2015   Done
    Stramenopiles   Bolidophyceae              D. Vaulot               2017   Done
    Stramenopiles   Pelagophyceae              D. Vaulot               2017   Done
    Alveolata       Ciliates                   W. T. Chen, C. Bachy    -      Done
    Alveolata       Dinoflagellates            S. Mordret, D. Sarno    -      Done
    Radiolaria      Nassellaria, Spumellaria   Miguel Mendez           -      In progress

    • Taxonomy structure cleaned
      • no redundancy (same taxon name at 2 different levels)
      • no taxon name with 2 different parents
    • Metadata from GenBank incorporated (e.g. environmental / cultures etc.)
  9. Figshare - flat files
    • https://doi.org/10.6084/m9.figshare.5913181
    • Formats provided
      • Metabarcode assignment: Mothur / Qiime / Dada2 / Usearch
      • BLAST
      • Metadata
    • Note: PhytoRef (plastid 16S rRNA) has also been transferred to Figshare: https://doi.org/10.6084/m9.figshare.4689826
    An easy way to make data widely available
  10. Short term plans
    R scripts
    • Clean up base: short sequences, taxonomy
    • Update taxonomy: Green algae, Ciliates
    • Import new GenBank sequences with annotation
    • Incorporate new metadata (mixotrophs)
    • Analyze primers (Geisen)
    Web site
    • GitHub
    • Provide new formats
      • SQLite database
      • R data set (.rds)
    • Develop tutorial and example scripts to access PR2
    • Develop small shiny apps
  11. Medium term plans
    • Link to EukRef
    • Base maintenance
      • Detect chimeras
      • Define reference sequences
      • Provide alignments for specific groups
    • Incorporate
      • 16S plastid
    • Functionalities for web site
      • BLAST
      • Visualisation of metadata (position etc.)
      • Download Chimeras
    • KeyDNA tools ?
    Keep informed on Research Gate: https://www.researchgate.net/project/Protist-Ribosomal-Reference-database-PR2