Slide 1

Slide 1 text

PR2: An update and perspectives
D. Vaulot, L. Guillou and F. Not, DIPO team, UMR7144, Station Biologique de Roscoff, France

Slide 2

Slide 2 text

History
• 1997
  • Excel file created by D. Vaulot during L. Guillou's thesis
• 2000-2003
  • Access/ARB database maintained by D. Vaulot during PICODIV
• 2006-2012
  • KeyDNAtools developed by L. Guillou
• 2013 to mid-2016
  • PR2 creation
  • Coordination: L. Guillou
  • Database maintained by R. Christen
  • Web site active until mid-2016: ssu-rrna.org
• Sept 2016 ...
  • Database maintained by D. Vaulot
  • Raw data put on Figshare
  • Database included in FROGS
• April 2017 ...
  • Database transferred to MySQL on SCROL server
  • Development of R scripts to clean and update the database
• November 2017
  • Repository on GitHub (versioning)

Slide 3

Slide 3 text

PICODIV (2000-2003) Oslo 2003 Bremerhaven 2002 Bremerhaven 2002 Roscoff 2000

Slide 4

Slide 4 text

PICODIV (2000-2003) Access database ARB database Pre-PR2

Slide 6

Slide 6 text

PR2

Slide 7

Slide 7 text

MySQL - Database structure
• Taxonomy is in a separate table
• Sequences are linked to taxonomy by the species field
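The two-table layout described above can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module rather than MySQL; the table names, column names, accessions and rows are all hypothetical and are not taken from the actual PR2 schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a taxonomy table and a sequence table.

    All table names, column names and rows are invented for this example.
    """
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE taxonomy (
            species   TEXT PRIMARY KEY,   -- unique species name
            division  TEXT,
            tax_class TEXT
        );
        CREATE TABLE sequences (
            accession TEXT PRIMARY KEY,
            species   TEXT NOT NULL REFERENCES taxonomy(species),
            sequence  TEXT
        );
        """
    )
    con.executemany(
        "INSERT INTO taxonomy VALUES (?, ?, ?)",
        [
            ("Bolidomonas_pacifica", "Stramenopiles", "Bolidophyceae"),
            ("Pelagomonas_calceolata", "Stramenopiles", "Pelagophyceae"),
        ],
    )
    con.executemany(
        "INSERT INTO sequences VALUES (?, ?, ?)",
        [
            ("DEMO0001", "Bolidomonas_pacifica", "ACGT"),
            ("DEMO0002", "Pelagomonas_calceolata", "TTGA"),
        ],
    )
    return con


def sequences_with_taxonomy(con):
    """Join each sequence to its taxonomy row through the species field."""
    query = """
        SELECT s.accession, t.division, t.tax_class, t.species
        FROM sequences AS s
        JOIN taxonomy AS t ON t.species = s.species
        ORDER BY s.accession
    """
    return con.execute(query).fetchall()
```

Keeping the taxonomy in its own table means that a change in classification is made once, in the taxonomy row, and every sequence of that species picks it up through the join.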

Slide 8

Slide 8 text

Taxonomy table
• Follows PR2 convention (_X, _XX, etc.)
• Contains 47 000 lines
• Each name is unique (i.e. does not appear in different columns or different lines)
• Any daughter taxon has a unique mother taxon
• Still needs some cleaning
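The naming convention and the two uniqueness rules above can be illustrated with a small sketch. It assumes that an empty rank is filled with the name of the rank above plus _X (and one more X for each further empty level), and that a lineage is a plain list of names from highest to lowest rank. The function names and the demo lineages are hypothetical; these are not the R scripts used for PR2.

```python
from collections import defaultdict


def fill_missing_ranks(lineage):
    """Fill empty ranks with the parent name plus _X, _XX, ...

    Assumes the first (highest) rank is never empty.
    """
    filled = []
    for name in lineage:
        if name:
            filled.append(name)
            continue
        if not filled:
            raise ValueError("the highest rank must be named")
        parent = filled[-1]
        base, sep, suffix = parent.rpartition("_")
        if sep and suffix and set(suffix) == {"X"}:
            # Parent is already a placeholder: add one more X.
            filled.append(parent + "X")
        else:
            filled.append(parent + "_X")
    return filled


def find_violations(lineages):
    """Return (names used at several ranks, names with several parents)."""
    ranks_of = defaultdict(set)    # name -> rank positions where it occurs
    parents_of = defaultdict(set)  # name -> distinct parent names
    for lineage in lineages:
        for i, name in enumerate(lineage):
            ranks_of[name].add(i)
            if i > 0:
                parents_of[name].add(lineage[i - 1])
    multi_rank = sorted(n for n, r in ranks_of.items() if len(r) > 1)
    multi_parent = sorted(n for n, p in parents_of.items() if len(p) > 1)
    return multi_rank, multi_parent
```

For example, `fill_missing_ranks(["Stramenopiles", "", "", "Bolidomonas_pacifica"])` gives `["Stramenopiles", "Stramenopiles_X", "Stramenopiles_XX", "Bolidomonas_pacifica"]`, and a clean table is one for which `find_violations` returns two empty lists.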

Slide 9

Slide 9 text

pr2 main
• Sequences are linked to taxonomy by species name, which is unique
• Chimeras are removed when PR2 is exported
• More cleaning is needed; chimera lists will be supplied in the future
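Removing chimeras at export time, as described above, could look like the sketch below. It assumes each record is a dictionary with a boolean "chimera" flag; that flag, the other field names and the FASTA header layout are hypothetical and only serve the example.

```python
def export_fasta(records):
    """Return FASTA text for every record that is not flagged as a chimera."""
    lines = []
    for rec in records:
        if rec.get("chimera"):  # hypothetical flag: skip flagged sequences
            continue
        lines.append(f">{rec['accession']} {rec['species']}")
        lines.append(rec["sequence"])
    # End with a newline only when at least one record was written.
    return "\n".join(lines) + ("\n" if lines else "")
```

Filtering at export keeps the flagged sequences in the database, so they remain available for a later chimera list.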

Slide 10

Slide 10 text

pr2 metadata
Metadata contain
• GenBank annotations (gb_ fields)
• some gb_ fields that have been manually edited (e.g. gb_strain and gb_clone)
• manually curated annotations (e.g. sample_ fields)
• fields computed from gb_ fields, such as longitude and latitude
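As an illustration of a computed field, the sketch below derives signed decimal latitude and longitude from a GenBank-style lat_lon string such as "48.72 N 3.98 W". It assumes the source field uses that decimal-degree layout; the function name is hypothetical and the real PR2 computation may differ.

```python
import re

# Decimal degrees followed by a hemisphere letter, latitude first.
_LAT_LON = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*([NS])\s+(\d+(?:\.\d+)?)\s*([EW])\s*$"
)


def parse_lat_lon(text):
    """Return (latitude, longitude) as signed floats, or None if unparsable."""
    match = _LAT_LON.match(text or "")
    if match is None:
        return None
    lat = float(match.group(1))
    lon = float(match.group(3))
    if lat > 90 or lon > 180:
        return None  # out of range: treat as unusable rather than guess
    if match.group(2) == "S":
        lat = -lat
    if match.group(4) == "W":
        lon = -lon
    return lat, lon
```

Returning None for anything that does not match keeps free-text entries from being silently turned into coordinates.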

Slide 11

Slide 11 text

Database managed with R

Slide 12

Slide 12 text

Database updates

Division       Class                     Who                     Date  Status
Rhizaria       Collodaria                T. Biard                2015  Done
Chlorophyta    -                         M. Tragin               2015  Done
Haptophyta     -                         B. Edvardsen            2015  Done
Stramenopiles  Bolidophyceae             D. Vaulot               2017  Done
Stramenopiles  Pelagophyceae             D. Vaulot               2017  Done
Alveolata      Ciliates                  W. T. Chen, C. Bachy    -     Done
Alveolata      Dinoflagellates           S. Mordret, D. Sarno    -     Done
Radiolaria     Nassellaria, Spumellaria  Miguel Mendez           -     In progress

• Taxonomy structure cleaned
  • no redundancy (same taxonomic name at 2 different levels)
  • no taxonomic name with 2 different parents
• Metadata from GenBank incorporated (e.g. environmental / cultures, etc.)

Slide 13

Slide 13 text

Figshare - flat files
• https://doi.org/10.6084/m9.figshare.5913181
• Formats provided
  • Metabarcode assignment: Mothur / Qiime / Dada2 / Usearch
  • BLAST
  • Metadata
• Note: PhytoRef (Plastid 16S rRNA) has also been transferred to Figshare: https://doi.org/10.6084/m9.figshare.4689826
An easy way to make data widely available
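To give an idea of what the assignment formats listed above involve, the sketch below writes one record in two layouts. It assumes the conventions generally documented for these tools: a FASTA header made of semicolon-separated ranks with a trailing semicolon for Dada2, and a tab-separated identifier plus lineage line for a Mothur taxonomy file. The function names and the demo lineage are hypothetical, and the files on Figshare may differ in rank count and detail.

```python
def dada2_fasta_record(lineage, sequence):
    """FASTA record whose header is the lineage, ranks separated by ';'."""
    return ">" + ";".join(lineage) + ";\n" + sequence + "\n"


def mothur_taxonomy_line(seq_id, lineage):
    """One line of a taxonomy file: identifier, tab, lineage ending in ';'."""
    return seq_id + "\t" + ";".join(lineage) + ";\n"


# Illustrative lineage only, not an actual database entry.
demo_lineage = ["Eukaryota", "Stramenopiles", "Bolidophyceae"]
demo_fasta = dada2_fasta_record(demo_lineage, "ACGT")
demo_tax = mothur_taxonomy_line("DEMO0001", demo_lineage)
```

Both layouts carry the same lineage; only the place where it is stored (sequence header or separate taxonomy file) changes between tools.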

Slide 14

Slide 14 text

https://github.com/vaulot/pr2_database
• Flat files
• Versioning
• Wiki

Slide 15

Slide 15 text

175,000 sequences

Slide 16

Slide 16 text

Short term plans
R scripts
• Clean up base: short sequences, taxonomy
• Update taxonomy: Green algae, Ciliates
• Import new GenBank sequences with annotation
• Incorporate new metadata (mixotrophs)
• Analyze primers (Geisen)
Web site
• GitHub
• Provide new formats
  • SQLite database
  • R data set (.rds)
• Develop tutorial and example scripts to access PR2
• Develop small Shiny apps

Slide 17

Slide 17 text

Medium term plans
• Link to EukRef
• Base maintenance
  • Detect chimeras
  • Define reference sequences
  • Provide alignments for specific groups
• Incorporate
  • 16S plastid
• Functionalities for web site
  • BLAST
  • Visualisation of metadata (position, etc.)
  • Download chimeras
  • KeyDNA tools ?
Keep informed on ResearchGate: https://www.researchgate.net/project/Protist-Ribosomal-Reference-database-PR2