The Protist Reference Database ecosystem

Daniel Vaulot
August 20, 2022

2022-06-23 - AFEM meeting Clermont-Ferrand, France

Transcript

  1. The Protist Ribosomal Reference database ecosystem
    Daniel Vaulot and the PR2 team
    AFEM Clermont - 2022-06-23

  2. Outline
    • PR2
    • PR2 primers
    • metaPR2

  3. Team
    Stefan Geisen, Fred Mahé, David Bass
    pr2-primers
    pr2-database
    metapr2

  4. Metabarcoding

  5. Metabarcoding
    pr2-primers, pr2-database, metapr2

  6. The PR2 database

  7. History

  8. More than 700 papers citing PR2
    https://pr2-database.org/papers/papers-citing-pr2

  9. Key features
    ● Version 4.14.0 released in May 2021
    ● Web site: https://pr2-database.org/
    ● Unified taxonomy (8 ranks from kingdom to species)
    ● 197 602 sequences
    □ nuclear 18S rRNA
    □ plastid 16S rRNA (PhytoRef)
    □ bacteria and archaea 16S rRNA
    ● Quality control (e.g. length > 500 bp, fewer than 20 ambiguous bases (N), no "NN" run)
    ● Metadata (e.g. coordinates, environment)
    Guillou et al. 2013. The Protist Ribosomal Reference database (PR2): a catalog of unicellular eukaryote Small Sub-Unit rRNA sequences with curated taxonomy. Nucleic Acids Res. 41:D597–604.
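
The quality-control rule of thumb on this slide can be illustrated with a short filter. This is only a sketch in Python, not the PR2 team's code (the slides say the database is managed with R scripts, which are not shown). The function name `passes_qc` and its parameters are hypothetical, and the thresholds are the examples quoted on the slide.

```python
def passes_qc(seq, min_length=500, max_n=20):
    """Return True if a sequence passes the three example checks.

    Hypothetical helper: thresholds mirror the slide's examples
    (> 500 bp, N < 20, no "NN"), not a documented PR2 API.
    """
    s = seq.upper()
    if len(s) <= min_length:   # must be longer than 500 bp
        return False
    if s.count("N") >= max_n:  # must contain fewer than 20 ambiguous bases
        return False
    if "NN" in s:              # no two consecutive ambiguous bases
        return False
    return True


if __name__ == "__main__":
    toy = "ACGT" * 200  # 800 bp toy sequence, not real data
    print(passes_qc(toy), passes_qc(toy + "NN"))
```

The three checks are independent, so each can be tightened or relaxed without touching the others.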

  10. Management
    ● MySQL database
    ● R scripts for:
    □ importing
    □ exporting
    □ validating
    ● Data provided as:
    □ text files (for dada2, mothur)
    □ fasta (phylogeny)
    □ R package

  11. R package

  12. Annotation - Contributions
    https://pr2-database.org/documentation/pr2-taxonomic-groups/

  13. Annotation - EukRef
    https://pr2-database.org/eukref/about/

  14. What's next for pr2-database
    • Functional annotation
    • Trophic mode (photo, hetero, mixo…)
    • Size group (pico, nano, micro…)
    • Taxonomic annotation
    • Dinoflagellates, Ciliates
    • EukRef pipeline
    • Full operon
    • Interactive web interface

  15. The PR2 primer database

  16. 18S rRNA primers
    ● Wide diversity of primers and sets
    ● No database for protists
    ● Taxonomic specificity of primers?
    Vaulot, D., Mahé, F., Bass, D., Geisen, S., 2022. pr2-primers: an 18S rRNA primer database for protists. Molecular Ecology Resources 22, 168–179.
    https://doi.org/10.1111/1755-0998.13465

  17. pr2-primers database
    https://app.pr2-primers.org

  18. In silico analysis against PR2 (precomputed)
    ● Number of mismatches
    ● Position of mismatches
    ● Amplicon size
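
The three quantities listed on this slide (number of mismatches, their positions, amplicon size) can be sketched for a single sequence as follows. This is a minimal illustration in Python under stated assumptions, not the pr2-primers implementation. The function names are hypothetical, the sequences are toy strings, gaps are not handled, and an ambiguous base in the target counts as a mismatch.

```python
# IUPAC degenerate codes: each primer base maps to the target bases it accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse-complement a DNA sequence, including IUPAC codes."""
    return seq.upper().replace("U", "T").translate(COMPLEMENT)[::-1]


def best_match(primer, target):
    """Slide the primer along the target without gaps.

    Returns (start, mismatch_positions) for the first window with the
    fewest mismatches (positions are 1-based from the 5' end of the
    primer), or None if the target is shorter than the primer.
    """
    primer = primer.upper().replace("U", "T")
    target = target.upper().replace("U", "T")
    best = None
    for start in range(len(target) - len(primer) + 1):
        window = target[start:start + len(primer)]
        mismatches = [i + 1 for i, (p, t) in enumerate(zip(primer, window))
                      if t not in IUPAC[p]]
        if best is None or len(mismatches) < len(best[1]):
            best = (start, mismatches)
    return best


def amplicon_size(fwd, rev, target, max_mismatches=2):
    """Amplicon length (primers included), or None if a primer fails to bind."""
    f = best_match(fwd, target)
    r = best_match(reverse_complement(rev), target)
    if f is None or r is None:
        return None
    if len(f[1]) > max_mismatches or len(r[1]) > max_mismatches:
        return None
    size = r[0] + len(rev) - f[0]
    return size if size > 0 else None


if __name__ == "__main__":
    toy_target = "GGG" + "ACGTAC" + "T" * 10 + "GATTCA" + "CCC"
    print(best_match("ACGTCC", toy_target))
    print(amplicon_size("ACGTAC", "TGAATC", toy_target))
```

Precomputing these values once per primer and reference sequence, as the slide indicates, avoids repeating the sliding-window scan for every query.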

  19. Test your own

  20. What's next for pr2-primers
    • Update PR2 from 4.12.0 to 4.14.0
    • Add more primers
    • ITS/28S primers

  21. The metaPR2 barcode database

  22. Motivation
    ● In the last decade, many metabarcoding studies
    ● Data hard to compare:
    □ Different primers
    □ Different processing
    □ Different similarity levels
    ● Processed data usually not available
    ● Metadata not available
    ● Only a few global datasets are used (Tara, Malaspina)
    ● These datasets cover only temperate and tropical marine waters

  23. Strategy
    ● Scan papers and build database
    ● Start from raw data (fastq) available from GenBank SRA
    ● Use dada2 pipeline producing ASVs
    □ Different datasets are comparable
    ● Annotate taxonomy with PR2
    ● Integrate metadata
    □ Latitude and longitude
    □ Depth
    □ Substrate (water, ice, soil)
    ● Data stored in MySQL database
    ● Develop web interface using R shiny
    Vaulot, D., Sim, C.W.H., Ong, D., Teo, B., Biwer, C., Jamy, M., Lopes dos Santos, A., 2022. metaPR2: a database of eukaryotic 18S rRNA metabarcodes with an emphasis on protists. In press in Molecular Ecology Resources. Deposited to bioRxiv: https://doi.org/10.1101/2022.02.04.479133

  24. metapr2 version 1.0
    [Figure: world map with points by project (OSD, Tara, Malaspina, Other)]
    ● Datasets: 41
    □ Tara Oceans (reprocessed with dada2)
    □ Malaspina
    □ Ocean Sampling Day - 2014 & 2015
    □ Arctic datasets
    □ Deep Sea
    □ Lakes, Rivers, Soils
    ● Samples: 4,150
    ● ASVs: 90,000

  25. Samples and ASVs
    [Figure: bar charts of the number of samples by gene region (V4, V9), DNA/RNA, ecosystem (oceanic, coastal, freshwater lakes, freshwater rivers, terrestrial), substrate (water, ice, epibiota, sediment, soil), fraction name (pico, pico-nano, nano, nano-micro, micro, meso, total) and depth level (under ice, surface, euphotic, mesopelagic, bathypelagic, composite, bottom)]
    [Figure: treemaps of reads and cASVs by supergroup and division, protists only]

  26. Web interface
    ● Built with R shiny
    □ Available also as R package
    ● Panels
    □ Datasets
    □ Treemaps
    □ Maps
    □ Barplots
    □ Diversity
    □ Query
    □ Download

  27. Web interface
    Alpha and beta diversity
    Maps
    Query

  28. Environment
    [Figure: four panels comparing oceanic, coastal, freshwater lakes, freshwater rivers and terrestrial samples]
    A: % of reads by supergroup (Alveolata, Amoebozoa, Archaeplastida, Hacrobia, Opisthokonta, Rhizaria, Stramenopiles)
    B: % of reads by ecological function (parasites, phagotrophs, dinoflagellates, phototrophs)
    C: Shannon index
    D: % identity with PR2 sequences
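
The Shannon index plotted per environment on this slide is the standard diversity measure H = -Σ p_i ln(p_i), where p_i is the relative abundance of taxon i in a sample. A minimal Python sketch is given below. The function name is hypothetical, and the natural logarithm is an assumption, since the slide does not state the base used.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) from raw read counts.

    Assumes natural logarithm; taxa with zero reads contribute nothing.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


if __name__ == "__main__":
    # Toy read counts for four ASVs in one sample (not real data).
    print(shannon_index([10, 10, 10, 10]))
```

For a sample with S equally abundant taxa the index equals ln(S), which gives a quick sanity check on any implementation.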

  29. Transitions marine - terrestrial
    Jamy, M., Biwer, C., Vaulot, D., Obiol, A., Jing, H., Peura, S., Massana, R., Burki, F., 2022. Global patterns and rates of habitat transitions across the eukaryotic tree of life. Nature Ecology and Evolution, in press. https://doi.org/10.1101/2021.11.01.466765

  30. Biogeography
    Yau et al., 2020. Mantoniella beaufortii and Mantoniella baffinensis sp. nov. (Mamiellales, Mamiellophyceae), two new green algal species from the high arctic. Journal of Phycology 56, 37–51. https://doi.org/10.1111/jpy.12932

  31. What's next for metapr2
    • Datasets
    □ Version 2.0 (Sep 2022): 18 new datasets
    □ More to come (requests can be made)
    • Data
    □ Functional annotation
    □ Clustering
    □ BLAST similarity
    • Web
    □ Heatmaps

  32. Thanks to
    • PR2 collaborators: Laure, Javier, Stefan, David, Adriana, Fabien…
    • Authors of metabarcode papers…
    • Biomarks (EU)
    • EukRef (Moore Foundation)
    • CNRS
    • Sorbonne Université
    • Nanyang Technological University
