The Protist Reference Database ecosystem

Daniel Vaulot
August 20, 2022

2022-06-23, AFEM meeting, Clermont-Ferrand, France


Transcript

  1. Team

    Stefan Geisen, Fred Mahé, David Bass
    Projects: pr2-primers, pr2-database, metapr2
  2. pr2-database.org: Key features

    • Version 4.14.0, released in May 2021
    • Web site: https://pr2-database.org/
    • Unified taxonomy (8 ranks, from kingdom to species)
    • 197,602 sequences
      □ nuclear 18S rRNA
      □ plastid 16S rRNA (PhytoRef)
      □ bacteria and archaea 16S rRNA
    • Quality control (e.g. > 500 bp, N < 20, no "NN")
    • Metadata (e.g. coordinates, environment)

    Guillou, L., et al., 2013. The Protist Ribosomal Reference database (PR2): a catalog of unicellular eukaryote Small Sub-Unit rRNA sequences with curated taxonomy. Nucleic Acids Research 41, D597–D604.
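
The quality-control criteria on this slide can be read as a simple sequence filter. The Python sketch below is only an illustration, not the PR2 curation code (which relies on R scripts and MySQL); it assumes that "N < 20" means fewer than 20 ambiguous N bases per sequence and that "no NN" forbids two consecutive Ns. The name `passes_qc` is hypothetical.

```python
# Assumed thresholds, read from the slide: length > 500 bp,
# fewer than 20 ambiguous 'N' bases, and no run of two consecutive Ns.
MIN_LENGTH = 500
MAX_N = 20


def passes_qc(seq: str) -> bool:
    """Return True if a sequence meets the assumed PR2 quality-control thresholds."""
    s = seq.upper()
    return len(s) > MIN_LENGTH and s.count("N") < MAX_N and "NN" not in s


if __name__ == "__main__":
    for label, seq in [
        ("ok", "ACGT" * 130),             # 520 bp, no ambiguity -> True
        ("too short", "ACGT" * 100),      # 400 bp -> False
        ("NN run", "ACGT" * 130 + "NN"),  # two consecutive Ns -> False
    ]:
        print(label, passes_qc(seq))
```

Keeping all three thresholds in one boolean expression makes it easy to apply the filter to every record of a FASTA file before import.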
  3. Management

    • MySQL database
    • R scripts for
      □ importing
      □ exporting
      □ validating
    • Data provided as
      □ text files (for dada2, mothur)
      □ fasta (phylogeny)
      □ R package
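
One check that a "validating" step could plausibly perform on an 8-rank taxonomy is sketched below in Python. It is not the PR2 R code: the rank names, the function `validate_taxonomy` and the example paths are illustrative assumptions. The sketch requires exactly 8 non-empty ranks per path and flags any taxon placed under two different parents.

```python
from typing import Dict, List

# Assumed names for the 8 ranks (kingdom to species).
RANKS = ["kingdom", "supergroup", "division", "class",
         "order", "family", "genus", "species"]


def validate_taxonomy(paths: List[str]) -> List[str]:
    """Check semicolon-separated taxonomy paths; return error messages (empty if OK)."""
    errors: List[str] = []
    parent_of: Dict[str, str] = {}  # taxon name -> parent seen first
    for path in paths:
        names = path.strip().rstrip(";").split(";")
        if len(names) != len(RANKS):
            errors.append(f"{path}: expected {len(RANKS)} ranks, found {len(names)}")
            continue
        if any(not name for name in names):
            errors.append(f"{path}: empty rank")
            continue
        # Each taxon must always sit under the same parent at the rank above.
        for parent, child in zip(names, names[1:]):
            first_parent = parent_of.setdefault(child, parent)
            if first_parent != parent:
                errors.append(f"{path}: {child} placed under both {first_parent} and {parent}")
    return errors


if __name__ == "__main__":
    base = "Eukaryota;Archaeplastida;Chlorophyta;Mamiellophyceae;Mamiellales"
    paths = [
        base + ";Mamiellaceae;Mantoniella;Mantoniella_beaufortii",
        base + ";Mamiellaceae;Mantoniella;Mantoniella_baffinensis",
        base + ";Bathycoccaceae;Mantoniella;Mantoniella_squamata",  # deliberately inconsistent
        "Eukaryota;Archaeplastida",                                 # too few ranks
    ]
    for message in validate_taxonomy(paths):
        print(message)
```

Recording the first parent seen for each name is enough to catch inconsistent placements in a single pass over the file.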
  4. What's next for pr2-database

    • Functional annotation
      □ Trophic mode (photo, hetero, mixo…)
      □ Size group (pico, nano, micro…)
    • Taxonomic annotation
      □ Dinoflagellates, Ciliates
      □ EukRef pipeline
    • Full operon
    • Interactive web interface
  5. 18S rRNA primers

    • Wide diversity of primers and primer sets
    • No database for protists
    • Taxonomic specificity of primers?

    Vaulot, D., Mahé, F., Bass, D., Geisen, S., 2022. pr2-primers: an 18S rRNA primer database for protists. Molecular Ecology Resources 22, 168–179. https://doi.org/10.1111/1755-0998.13465
  6. In silico analysis against PR2

    • Number of mismatches
    • Mismatch position
    • Amplicon size
    • Precomputed
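
The three quantities computed in silico (number of mismatches, mismatch position, amplicon size) can be illustrated with a small Python sketch. It is not the pr2-primers pipeline: the function names, the toy primers and the brute-force "best window" search are assumptions. The sketch matches degenerate primers using IUPAC codes, searches for the reverse primer's reverse complement, and counts the amplicon from the forward primer start to the reverse primer end.

```python
from typing import List, Optional, Tuple

# IUPAC nucleotide codes -> bases they match.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement table including degenerate codes (R<->Y, K<->M, B<->V, D<->H).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq: str) -> str:
    """Reverse complement of a (possibly degenerate) DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mismatch_positions(primer: str, site: str) -> List[int]:
    """1-based positions, from the primer 5' end, where the site base is not allowed."""
    return [
        i + 1
        for i, (p, s) in enumerate(zip(primer.upper(), site.upper()))
        if s not in IUPAC.get(p, "")
    ]


def best_site(primer: str, seq: str) -> Tuple[int, int]:
    """Start index and mismatch count of the best-matching window (-1 if none)."""
    best_start, best_count = -1, len(primer) + 1
    for start in range(len(seq) - len(primer) + 1):
        count = len(mismatch_positions(primer, seq[start:start + len(primer)]))
        if count < best_count:
            best_start, best_count = start, count
    return best_start, best_count


def analyse_pair(fwd: str, rev: str, seq: str) -> Optional[dict]:
    """Mismatches of a primer pair against one sequence, plus the amplicon size."""
    seq = seq.upper()
    f_start, _ = best_site(fwd, seq)
    # The reverse primer binds the opposite strand: search for its reverse complement.
    r_start, _ = best_site(revcomp(rev), seq)
    if f_start < 0 or r_start < 0:
        return None  # sequence shorter than one of the primers
    f_mm = mismatch_positions(fwd, seq[f_start:f_start + len(fwd)])
    # Put the reverse site back on the primer's strand so positions count from its 5' end.
    r_mm = mismatch_positions(rev, revcomp(seq[r_start:r_start + len(rev)]))
    r_end = r_start + len(rev)
    # Amplicon (primers included) only if the reverse site lies downstream of the forward one.
    size = r_end - f_start if r_start >= f_start + len(fwd) else None
    return {"fwd_mismatches": f_mm, "rev_mismatches": r_mm, "amplicon_size": size}


if __name__ == "__main__":
    fwd, rev = "ACGTR", "AAACC"  # toy primers, not real 18S rRNA primers
    template = "GGGG" + "ACCTA" + "C" * 10 + revcomp(rev) + "GGGG"
    print(analyse_pair(fwd, rev, template))
    # {'fwd_mismatches': [3], 'rev_mismatches': [], 'amplicon_size': 20}
```

Running such a function once per primer pair and per reference sequence, and storing the results, is one way to obtain precomputed tables.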
  7. What's next for pr2-primers

    • Update PR2 from 4.12.0 to 4.14.0
    • Add more primers
    • ITS/28S primers
  8. Motivation

    • Many metabarcoding studies in the last decade
    • Data hard to compare:
      □ Different primers
      □ Different processing
      □ Different similarity levels
    • Processed data usually not available
    • Metadata not available
    • Few global datasets used (Tara, Malaspina)
    • These datasets cover only temperate and tropical marine waters
  9. Strategy

    • Scan papers and build a database
    • Start from raw data (fastq) available from GenBank SRA
    • Use the dada2 pipeline to produce ASVs
      □ Different datasets are comparable
    • Annotate taxonomy with PR2
    • Integrate metadata
      □ Latitude and longitude
      □ Depth
      □ Substrate (water, ice, soil)
    • Store data in a MySQL database
    • Develop a web interface using R Shiny

    Vaulot, D., Sim, C.W.H., Ong, D., Teo, B., Biwer, C., Jamy, M., Lopes dos Santos, A., 2022. metaPR2: a database of eukaryotic 18S rRNA metabarcodes with an emphasis on protists. Molecular Ecology Resources, in press. Preprint on bioRxiv: https://doi.org/10.1101/2022.02.04.479133
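
ASVs make datasets comparable because an ASV is an exact sequence, so the same sequence found in two studies can be pooled directly. The Python sketch below illustrates that idea under stated assumptions: each dataset is reduced to a single `{sequence: reads}` table, and the identifier `asv_code` (first 10 hex characters of a SHA-1 hash) is hypothetical, not necessarily how metaPR2 labels its ASVs. Pooling by exact sequence only makes sense when datasets target the same gene region (e.g. V4 with V4) and primers were trimmed consistently.

```python
import hashlib
from collections import defaultdict
from typing import Dict

AsvTable = Dict[str, int]  # ASV sequence -> number of reads


def asv_code(sequence: str, length: int = 10) -> str:
    """Hypothetical identifier: first characters of the SHA-1 hash of the sequence."""
    return hashlib.sha1(sequence.upper().encode("ascii")).hexdigest()[:length]


def merge_datasets(tables: Dict[str, AsvTable]) -> Dict[str, Dict[str, int]]:
    """Pool ASV tables from several datasets into {asv_code: {dataset: reads}}."""
    merged: Dict[str, Dict[str, int]] = defaultdict(dict)
    for dataset, table in tables.items():
        for sequence, reads in table.items():
            code = asv_code(sequence)
            merged[code][dataset] = merged[code].get(dataset, 0) + reads
    return dict(merged)


if __name__ == "__main__":
    tables = {
        "dataset_A": {"ACGTACGTAA": 10, "TTGGCCAATT": 3},
        "dataset_B": {"acgtacgtaa": 7},  # same ASV, different letter case
    }
    for code, counts in merge_datasets(tables).items():
        print(code, counts)
```

A short hash keeps identifiers compact, at the price of a small collision risk that a real database would need to check.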
  10. metapr2 version 1.0

    [Map: sample locations, colored by project (OSD, Tara, Malaspina, other)]

    • Datasets: 41
      □ Tara Oceans (reprocessed with dada2)
      □ Malaspina
      □ Ocean Sampling Day (2014 and 2015)
      □ Arctic datasets
      □ Deep sea
      □ Lakes, rivers, soils
    • Samples: 4,150
    • ASVs: 90,000
  11. Samples and ASVs

    [Figure: number of samples by gene region (V4, V9), nucleic acid (DNA, RNA), ecosystem (oceanic, coastal, freshwater lakes, freshwater rivers, terrestrial), substrate (water, ice, soil, sediment, epibiota), size fraction (pico, pico-nano, nano, nano-micro, micro, meso, total) and depth level (surface, euphotic, mesopelagic, bathypelagic, bottom, under ice, composite); reads and ASVs per division within each supergroup, protists only.]
  12. Web interface

    • Built with R Shiny
      □ Also available as an R package
    • Panels
      □ Datasets
      □ Treemaps
      □ Maps
      □ Barplots
      □ Diversity
      □ Query
      □ Download
  13. Environment

    [Figure, by environment (oceanic, coastal, freshwater lakes, freshwater rivers, terrestrial): (A) % of reads per supergroup (Alveolata, Amoebozoa, Archaeplastida, Hacrobia, Opisthokonta, Rhizaria, Stramenopiles); (B) % of reads per ecological function (parasites, phagotrophs, dinoflagellates, phototrophs); (C) Shannon index; (D) % identity with PR2 sequences.]
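
The figure compares the Shannon index across environments. As a reminder of what that index measures, here is a minimal Python sketch of the standard formula H' = -Σ p_i ln(p_i), where p_i is the share of a sample's reads belonging to ASV i. Using reads as the abundance measure and the natural logarithm are assumptions; the deck does not say how the values were computed.

```python
import math
from typing import Iterable


def shannon_index(read_counts: Iterable[int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln(p_i)), p_i = share of reads of ASV i."""
    counts = [c for c in read_counts if c > 0]  # absent ASVs contribute nothing
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts)


if __name__ == "__main__":
    print(shannon_index([25, 25, 25, 25]))  # four equally abundant ASVs: ln(4), about 1.386
    print(shannon_index([97, 1, 1, 1]))     # one dominant ASV: much lower diversity
```

The index grows with both the number of ASVs and the evenness of their read counts, which is why a sample dominated by a single ASV scores low.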
  14. Marine-terrestrial transitions

    Jamy, M., Biwer, C., Vaulot, D., Obiol, A., Jing, H., Peura, S., Massana, R., Burki, F., 2022. Global patterns and rates of habitat transitions across the eukaryotic tree of life. Nature Ecology and Evolution, in press. Preprint on bioRxiv: https://doi.org/10.1101/2021.11.01.466765
  15. Biogeography

    Yau, S., et al., 2020. Mantoniella beaufortii and Mantoniella baffinensis sp. nov. (Mamiellales, Mamiellophyceae), two new green algal species from the high arctic. Journal of Phycology 56, 37–51. https://doi.org/10.1111/jpy.12932
  16. What's next for metapr2

    • Datasets
      □ Version 2.0 (September 2022): 18 new datasets
      □ More to come (requests welcome)
    • Data
      □ Functional annotation
      □ Clustering
      □ BLAST similarity
    • Web
      □ Heatmaps
  17. Thanks to

    • PR2 collaborators: Laure, Javier, Stefan, David, Adriana, Fabien…
    • Authors of metabarcode papers…
    • Biomarks (EU)
    • EukRef (Moore Foundation)
    • CNRS
    • Sorbonne Université
    • Nanyang Technological University