D. Vaulot, L. Guillou and F. Not, DIPO team, UMR7144, Station Biologique de Roscoff, France
PR2
An update and perspectives...
Slide 2
History
• 1997
• Excel file created by D. Vaulot during L. Guillou's thesis
• 2000-2003
• Access/ARB database maintained by D. Vaulot during PICODIV
• 2006-2012
• KeyDNAtools developed by L. Guillou
• 2013 - mid-2016
• PR2 creation
• Coordination: L. Guillou
• Database maintained by R. Christen
• Web site active until mid-2016 : ssu-rrna.org
• Sept 2016 ...
• Database maintained by D. Vaulot
• Raw data put on Figshare
• Database included in FROGS
• April 2017...
• Database transferred to MySQL on SCROL server
• Development of R scripts to clean and update the database
• November 2017
• Repository on GitHub (versioning)
Slide 3
PICODIV (2000-2003)
Oslo 2003
Bremerhaven 2002
Roscoff 2000
MySQL - Database structure
• Taxonomy is in a separate table
• Sequences linked to taxonomy by species field
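The two-table layout described above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module rather than MySQL so that it runs anywhere. The table names, column names and DEMO accessions are assumptions made for illustration; they are not the actual PR2 schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table layout."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE taxonomy (
            species  TEXT PRIMARY KEY,   -- unique species name
            division TEXT,
            class    TEXT
        );
        CREATE TABLE pr2_main (
            accession TEXT PRIMARY KEY,  -- made-up demo identifiers
            species   TEXT REFERENCES taxonomy(species),
            sequence  TEXT
        );
    """)
    con.executemany("INSERT INTO taxonomy VALUES (?, ?, ?)", [
        ("Bolidomonas_pacifica", "Stramenopiles", "Bolidophyceae"),
        ("Pelagomonas_calceolata", "Stramenopiles", "Pelagophyceae"),
    ])
    con.executemany("INSERT INTO pr2_main VALUES (?, ?, ?)", [
        ("DEMO0001", "Bolidomonas_pacifica", "ACGTACGT"),
        ("DEMO0002", "Pelagomonas_calceolata", "TTGACCGA"),
        ("DEMO0003", "Bolidomonas_pacifica", "ACGTTCGT"),
    ])
    return con


def sequences_with_taxonomy(con):
    """Attach the taxonomy to each sequence through the species field."""
    rows = con.execute("""
        SELECT m.accession, t.division, t.class, m.species
        FROM pr2_main AS m
        JOIN taxonomy AS t ON t.species = m.species
        ORDER BY m.accession
    """)
    return rows.fetchall()
```

Because the taxonomy lives in one place, renaming or reclassifying a group only touches the taxonomy table; the sequences pick up the change through the join.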
Slide 8
Taxonomy table
• Follows PR2 convention (_X, _XX, etc.)
• Contains 47 000 lines
• Each name is unique (i.e. it does not appear in different columns or on different lines).
• Any daughter taxon has a unique mother taxon.
• Still needs some cleaning
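The two rules above (a name appears at one level only, and a daughter taxon has one mother taxon) can be checked mechanically. The following is a minimal sketch, assuming the taxonomy is available as one dictionary per line with one key per rank. The rank list is a hypothetical subset and the function name is invented for illustration.

```python
from collections import defaultdict

# Hypothetical subset of ranks, ordered from highest to lowest.
RANKS = ("division", "class", "species")


def check_taxonomy(rows):
    """Return (names found at several ranks, names with several parents)."""
    ranks_of = defaultdict(set)    # name -> ranks at which it appears
    parents_of = defaultdict(set)  # name -> names found directly above it
    for row in rows:
        parent = None
        for rank in RANKS:
            name = row[rank]
            ranks_of[name].add(rank)
            if parent is not None:
                parents_of[name].add(parent)
            parent = name
    multi_rank = sorted(n for n, r in ranks_of.items() if len(r) > 1)
    multi_parent = sorted(n for n, p in parents_of.items() if len(p) > 1)
    return multi_rank, multi_parent
```

Both returned lists are empty for a clean table; any name they contain points to a line that still needs cleaning.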
Slide 9
pr2 main
• Sequences are linked to taxonomy by species name, which is unique
• Chimeras are removed when PR2 is exported
• More cleaning is needed; we will be able to supply chimera lists in the future.
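Removing chimeras at export time, rather than deleting them from the database, could look like the sketch below. The `chimera` flag, the record keys and the header layout are assumptions made for illustration; the slides do not say how chimeras are marked in PR2.

```python
def export_fasta(records):
    """Return FASTA text for all records that are not flagged as chimeras."""
    lines = []
    for rec in records:
        if rec.get("chimera"):  # hypothetical flag; a missing flag means "keep"
            continue
        lines.append(f">{rec['accession']}|{rec['species']}")
        lines.append(rec["sequence"])
    return "\n".join(lines) + "\n" if lines else ""
```

Keeping flagged sequences in the database and filtering on export also makes it straightforward to publish the list of chimeras later.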
Slide 10
pr2 metadata
Metadata contain:
• GenBank annotations (gb_ fields)
• some gb_ fields have been manually edited (e.g. gb_strain and gb_clone)
• manually curated annotations (e.g. sample_ fields)
• fields computed from gb_ fields, such as longitude and latitude
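As an illustration of a computed field, the sketch below converts a GenBank-style lat_lon string such as "48.73 N 3.98 W" into signed decimal latitude and longitude. It assumes the input follows that "value N|S value E|W" layout; the slides do not describe how PR2 actually derives these fields, and the function name is invented.

```python
import re

# Decimal degrees followed by a hemisphere letter, twice.
_LAT_LON = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*([NS])\s+(\d+(?:\.\d+)?)\s*([EW])\s*$"
)


def parse_lat_lon(text):
    """Return (latitude, longitude) as signed floats, or None if unparsable."""
    m = _LAT_LON.match(text or "")
    if m is None:
        return None
    lat = float(m.group(1)) * (1 if m.group(2) == "N" else -1)
    lon = float(m.group(3)) * (1 if m.group(4) == "E" else -1)
    if abs(lat) > 90 or abs(lon) > 180:
        return None  # out-of-range values are treated as invalid
    return lat, lon
```

Returning None for anything that does not parse keeps free-text annotations from being turned silently into wrong coordinates.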
Slide 11
Managed under R
Slide 12
Database updates
Division        Class                      Who                     Date   Status
Rhizaria        Collodaria                 T. Biard                2015   Done
Chlorophyta     -                          M. Tragin               2015   Done
Haptophyta      -                          B. Edvardsen            2015   Done
Stramenopiles   Bolidophyceae              D. Vaulot               2017   Done
Stramenopiles   Pelagophyceae              D. Vaulot               2017   Done
Alveolata       Ciliates                   W. T. Chen, C. Bachy    -      Done
Alveolata       Dinoflagellates            S. Mordret, D. Sarno    -      Done
Radiolaria      Nassellaria, Spumellaria   Miguel Mendez           -      In progress
• Taxonomy structure cleaned
• no redundancy (i.e. the same taxon name at 2 different levels)
• no taxon name with 2 different parents
• Metadata from GenBank incorporated (e.g. environmental samples, cultures, etc.)
Slide 13
Figshare - flat files
• https://doi.org/10.6084/m9.figshare.5913181
• Format provided
• Metabarcode assignment : Mothur / Qiime / Dada2 / Usearch
• BLAST
• Metadata
• Note: PhytoRef (Plastid 16S rRNA) has also been transferred to Figshare: https://doi.org/10.6084/m9.figshare.4689826
An easy way to make data widely available
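To give an idea of what the metabarcode assignment formats listed above involve, here is a minimal sketch that writes one record in two common layouts: a FASTA record whose header is the semicolon-separated lineage (the layout used by DADA2 training sets) and a tab-separated taxonomy line (the layout used by mothur). The function names and the example lineage depth are assumptions; the exact files distributed on Figshare may differ and should be checked against each tool's documentation.

```python
def dada2_record(lineage, sequence):
    """FASTA record with the lineage as header, each rank ending in ';'."""
    return ">" + ";".join(lineage) + ";\n" + sequence + "\n"


def mothur_taxonomy_line(accession, lineage):
    """One line of a taxonomy file: identifier, tab, lineage ending in ';'."""
    return accession + "\t" + ";".join(lineage) + ";\n"
```

For example, `dada2_record(["Stramenopiles", "Bolidophyceae"], "ACGT")` gives a two-line record whose header is `>Stramenopiles;Bolidophyceae;`.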
Slide 14
https://github.com/vaulot/pr2_database
• Flat files
• Versioning
• Wiki
Slide 15
175,000 sequences
Slide 16
Short term plans
R scripts
• Clean up base: short sequences, taxonomy
• Update taxonomy: Green algae, Ciliates
• Import new GenBank sequences with annotation
• Incorporate new metadata (mixotrophs)
• Analyze primers (Geisen)
Web site
• GitHub
• Provide new formats
• SQLite database
• R data set (.rds)
• Develop tutorial and example scripts to access PR2
• Develop small shiny apps
Slide 17
Medium term plans
• Link to EukRef
• Base maintenance
• Detect chimeras
• Define reference sequences
• Provide alignments for specific groups
• Incorporate
• 16S plastid
• Functionalities for web site
• BLAST
• Visualisation of metadata (position etc...)
• Download Chimeras
• KeyDNA tools ?
Keep informed on ResearchGate: https://www.researchgate.net/project/Protist-Ribosomal-Reference-database-PR2