Slide 1

Slide 1 text

The Protist Ribosomal Reference database ecosystem
Daniel Vaulot and the PR2 team
AFEM Clermont - 2022-06-23

Slide 2

Slide 2 text

Outline
• PR2
• PR2 primers
• metaPR2

Slide 3

Slide 3 text

Team
Stefan Geisen, Fred Mahé, David Bass
pr2-primers, pr2-database, metapr2

Slide 4

Slide 4 text

Metabarcoding

Slide 5

Slide 5 text

Metabarcoding
pr2-primers, pr2-database, metapr2

Slide 6

Slide 6 text

The PR2 database

Slide 7

Slide 7 text

History

Slide 8

Slide 8 text

More than 700 papers citing PR2
https://pr2-database.org/papers/papers-citing-pr2

Slide 9

Slide 9 text

Key features (pr2-database.org)
● Version 4.14.0 released in May 2021
● Web site: https://pr2-database.org/
● Unified taxonomy (8 ranks from kingdom to species)
● 197 602 sequences
  □ nuclear 18S rRNA
  □ plastid 16S rRNA (PhytoRef)
  □ bacteria and archaea 16S rRNA
● Quality control (e.g. > 500 bp, N < 20, no "NN")
● Metadata (e.g. coordinates, environment)

Guillou et al. 2013. The Protist Ribosomal Reference database (PR2): a catalog of unicellular eukaryote Small Sub-Unit rRNA sequences with curated taxonomy. Nucleic Acids Res. 41:D597–604.
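The quality-control thresholds quoted on this slide can be illustrated with a short filter. This is only a sketch of one plausible reading of the rule (length above 500 bp, fewer than 20 ambiguous "N" bases, no two consecutive "N"); the function name `passes_qc` and its parameters are hypothetical and are not taken from the PR2 scripts, which are written in R.

```python
def passes_qc(sequence: str, min_length: int = 500, max_n: int = 20) -> bool:
    """Return True if a sequence satisfies the three example thresholds.

    Assumed reading of the slide (not the actual PR2 implementation):
      - length strictly greater than min_length bases
      - strictly fewer than max_n ambiguous 'N' bases
      - no run of two or more consecutive 'N'
    """
    seq = sequence.upper()
    long_enough = len(seq) > min_length
    few_ambiguities = seq.count("N") < max_n
    no_double_n = "NN" not in seq
    return long_enough and few_ambiguities and no_double_n


if __name__ == "__main__":
    # Toy records, invented for illustration only.
    records = {
        "long_clean": "ACGT" * 200,        # 800 bp, no N
        "too_short": "ACGT" * 100,         # 400 bp
        "double_n": "ACGT" * 200 + "NN",   # contains "NN"
    }
    for name, seq in records.items():
        print(name, passes_qc(seq))
```

Keeping the thresholds as parameters makes it easy to test how many sequences a stricter or looser rule would discard.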

Slide 10

Slide 10 text

Management
● MySQL database
● R scripts for:
  □ importing
  □ exporting
  □ validating
● Data provided as
  □ text files (for dada2, mothur)
  □ fasta (phylogeny)
  □ R package

Slide 11

Slide 11 text

R package

Slide 12

Slide 12 text

Annotation - Contributions
https://pr2-database.org/documentation/pr2-taxonomic-groups/

Slide 13

Slide 13 text

Annotation - EukRef
https://pr2-database.org/eukref/about/

Slide 14

Slide 14 text

What's next for pr2-database
• Functional annotation
  • Trophic mode (photo, hetero, mixo…)
  • Size group (pico, nano, micro…)
• Taxonomic annotation
  • Dinoflagellates, Ciliates
• EukRef pipeline
• Full operon
• Interactive web interface

Slide 15

Slide 15 text

The PR2 primer database

Slide 16

Slide 16 text

18S rRNA primers
● Wide diversity of primers and primer sets
● No primer database for protists
● What is the taxonomic specificity of each primer?

Vaulot, D., Mahé, F., Bass, D., Geisen, S., 2022. pr2-primers: an 18S rRNA primer database for protists. Molecular Ecology Resources 22, 168–179. https://doi.org/10.1111/1755-0998.13465

Slide 17

Slide 17 text

pr2-primers database
https://app.pr2-primers.org

Slide 18

Slide 18 text

In silico analysis against PR2 (precomputed)
● Number of mismatches
● Position of mismatches
● Amplicon size
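The three quantities computed in silico (number of mismatches, their positions, and amplicon size) can be sketched as follows. This is a simplified illustration and not the method used by pr2-primers: it scans every ungapped placement of a primer along a target, honours IUPAC degenerate bases in the primer, and keeps the placement with the fewest mismatches. All function names and the toy sequences are hypothetical.

```python
from typing import List, Optional, Tuple

# IUPAC codes: each primer symbol maps to the target bases it can pair with.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def normalise(seq: str) -> str:
    """Upper-case a sequence and write U as T."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    return normalise(seq).translate(COMPLEMENT)[::-1]


def best_match(primer: str, target: str) -> Optional[Tuple[int, List[int]]]:
    """Return (start, mismatch_positions) for the ungapped placement with the
    fewest mismatches, or None if the primer is longer than the target.
    Ambiguous bases in the target are counted as mismatches."""
    primer, target = normalise(primer), normalise(target)
    best = None
    for start in range(len(target) - len(primer) + 1):
        positions = [i for i, base in enumerate(primer)
                     if target[start + i] not in IUPAC[base]]
        if best is None or len(positions) < len(best[1]):
            best = (start, positions)
    return best


def amplicon_size(fwd: str, rev: str, target: str) -> Optional[int]:
    """Amplicon length, primers included, from the best forward site to the
    best reverse site (the reverse primer is searched as its reverse complement)."""
    f = best_match(fwd, target)
    r = best_match(reverse_complement(rev), target)
    if f is None or r is None or r[0] < f[0]:
        return None
    return r[0] + len(normalise(rev)) - f[0]


if __name__ == "__main__":
    # Toy target and primers, invented for illustration only.
    target = "AAAA" + "GATTACA" + "C" * 10 + "TTGCAGG" + "AAAA"
    print(best_match("GATGACA", target))
    print(amplicon_size("GATTACA", "CCTGCAA", target))
```

A real analysis would also need to cap the number of accepted mismatches and handle insertions and deletions, which this sketch ignores.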

Slide 19

Slide 19 text

Test your own

Slide 20

Slide 20 text

What's next for pr2-primers
• Update PR2 from 4.12.0 to 4.14.0
• Add more primers
• ITS/28S primers

Slide 21

Slide 21 text

The metaPR2 barcode database

Slide 22

Slide 22 text

Motivation
● Many metabarcoding studies published in the last decade
● Data hard to compare:
  □ Different primers
  □ Different processing
  □ Different similarity levels
● Processed data usually not available
● Metadata not available
● Few global datasets used (Tara, Malaspina)
● These datasets cover only temperate and tropical marine waters

Slide 23

Slide 23 text

Strategy
● Scan papers and build database
● Start from raw data (fastq) available from GenBank SRA
● Use dada2 pipeline producing ASVs
  □ Different datasets are comparable
● Annotate taxonomy with PR2
● Integrate metadata
  □ Latitude and longitude
  □ Depth
  □ Substrate (water, ice, soil)
● Data stored in MySQL database
● Develop web interface using R shiny

Vaulot, D., Sim, C.W.H., Ong, D., Teo, B., Biwer, C., Jamy, M., Lopes dos Santos, A., 2022. metaPR2: a database of eukaryotic 18S rRNA metabarcodes with an emphasis on protists. In press in Molecular Ecology Resources. Deposited to bioRxiv https://doi.org/10.1101/2022.02.04.479133

Slide 24

Slide 24 text

metapr2 version 1.0
● Datasets: 41
  □ Tara Oceans (reprocessed with dada2)
  □ Malaspina
  □ Ocean Sampling Day - 2014 & 2015
  □ Arctic datasets
  □ Deep Sea
  □ Lakes, Rivers, Soils
● Samples: 4,150
● ASVs: 90,000

[Figure: world map of sampling locations, coloured by project (OSD, Tara, Malaspina, Other)]

Slide 25

Slide 25 text

Samples and ASVs

[Figure: bar charts of the number of samples by gene region (V4, V9), nucleic acid (DNA, RNA), ecosystem (oceanic, coastal, freshwater lakes, freshwater rivers, terrestrial), substrate (water, ice, epibiota, sediment, soil), size fraction (pico, pico-nano, nano, nano-micro, micro, meso, total) and depth level (under ice, surface, euphotic, mesopelagic, bathypelagic, composite, bottom); treemaps of reads and cASVs by supergroup and division, protists only]

Slide 26

Slide 26 text

Web interface
● Built with R shiny
  □ Available also as R package
● Panels
  □ Datasets
  □ Treemaps
  □ Maps
  □ Barplots
  □ Diversity
  □ Query
  □ Download

Slide 27

Slide 27 text

Web interface
Alpha and beta diversity, Maps, Query

Slide 28

Slide 28 text

Environment

[Figure: comparison of five environments (oceanic, coastal, freshwater lakes, freshwater rivers, terrestrial). A: % of reads by supergroup (Alveolata, Amoebozoa, Archaeplastida, Hacrobia, Opisthokonta, Rhizaria, Stramenopiles). B: % of reads by ecological function (parasites, phagotrophs, dinoflagellates, phototrophs). C: Shannon index. D: % identity with PR2 sequences.]
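For readers unfamiliar with the Shannon index shown in panel C, the sketch below computes it from a vector of read counts as H = -Σ p_i ln(p_i), where p_i is the proportion of reads in ASV i. The natural logarithm is an assumption, since the slide does not state the base used, and the function name `shannon_index` is hypothetical.

```python
import math
from typing import Iterable


def shannon_index(counts: Iterable[float]) -> float:
    """Shannon diversity H = -sum(p_i * ln(p_i)) over non-zero counts.

    Assumes natural logarithms; returns 0.0 for an empty or all-zero sample.
    """
    values = [c for c in counts if c > 0]
    total = sum(values)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in values)


if __name__ == "__main__":
    # Toy read counts per ASV, invented for illustration only.
    even_sample = [25, 25, 25, 25]
    uneven_sample = [97, 1, 1, 1]
    print(shannon_index(even_sample))
    print(shannon_index(uneven_sample))
```

A sample whose reads are spread evenly across ASVs scores higher than one dominated by a single ASV, which is what makes the index useful for comparing environments.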

Slide 29

Slide 29 text

Transitions marine - terrestrial

Jamy, M., Biwer, C., Vaulot, D., Obiol, A., Jing, H., Peura, S., Massana, R., Burki, F., 2022. Global patterns and rates of habitat transitions across the eukaryotic tree of life. Nature Ecology and Evolution, in press. https://doi.org/10.1101/2021.11.01.466765

Slide 30

Slide 30 text

Biogeography

Yau et al., 2020. Mantoniella beaufortii and Mantoniella baffinensis sp. nov. (Mamiellales, Mamiellophyceae), two new green algal species from the high Arctic. Journal of Phycology 56, 37–51. https://doi.org/10.1111/jpy.12932

Slide 31

Slide 31 text

What's next for metapr2
• Datasets
  □ Version 2.0 (Sep 2022): 18 new datasets
  □ More to come (requests can be made)
• Data
  □ Functional annotation
  □ Clustering
  □ BLAST similarity
• Web
  □ Heatmaps

Slide 32

Slide 32 text

Thanks to
• PR2 collaborators: Laure, Javier, Stefan, David, Adriana, Fabien…
• Authors of metabarcode papers…
• Biomarks (EU)
• EukRef (Moore Foundation)
• CNRS
• Sorbonne Université
• Nanyang Technological University