Slide 1

Slide 1 text

Virus Hunting in French Guiana
Nacho Caballero

Slide 2

Slide 2 text

French Guiana

Slide 3

Slide 3 text

Rodents Bats

Slide 4

Slide 4 text

Rodents Bats Leishmania

Slide 5

Slide 5 text

Capture

Slide 6

Slide 6 text

Capture Isolate viral particles

Slide 7

Slide 7 text

Capture Isolate viral particles Extract RNA

Slide 8

Slide 8 text

Capture Isolate viral particles Extract RNA Sequence

Slide 9

Slide 9 text

[Figure: estimated read coverage (x-axis) vs. % reads with coverage smaller than x (y-axis); series: Rodents]

Slide 10

Slide 10 text

[Figure: estimated read coverage (x-axis) vs. % reads with coverage smaller than x (y-axis); series: Rodents]

Slide 11

Slide 11 text

[Figure: estimated read coverage (x-axis) vs. % reads with coverage smaller than x (y-axis); series: Rodents, Bats]
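The curve on this slide is a cumulative distribution: for each coverage value x, the percentage of reads whose estimated coverage is smaller than x. A minimal sketch of that calculation follows, assuming the per-read coverage estimates are already available as a list of numbers. The function name `percent_reads_below` and the example values are hypothetical and are not data from the talk.

```python
def percent_reads_below(coverages, x):
    """Percentage of reads whose estimated coverage is strictly smaller than x."""
    if not coverages:
        raise ValueError("need at least one coverage estimate")
    below = sum(1 for c in coverages if c < x)
    return 100.0 * below / len(coverages)


# Hypothetical per-read coverage estimates (illustration only, not the talk's data).
example_coverages = [1, 2, 2, 5, 10, 20, 50, 100]

# One point of the cumulative curve per threshold x.
for x in (2, 10, 100):
    print(x, percent_reads_below(example_coverages, x))
```

Evaluating the function over a range of x values gives one curve per dataset, which is how the rodent and bat samples can be compared on the same axes.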

Slide 12

Slide 12 text

How can we estimate the coverage without a reference genome?
Read

Slide 13

Slide 13 text

How can we estimate the coverage without a reference genome?
Read

Slide 14

Slide 14 text

How can we estimate the coverage without a reference genome?
Read
K-mers

Slide 15

Slide 15 text

How can we estimate the coverage without a reference genome?

Slide 16

Slide 16 text

How can we estimate the coverage without a reference genome?
1 1 1 1 1 1 1

Slide 17

Slide 17 text

7 8 10 8 11 3 6

Slide 18

Slide 18 text

7 8 10 8 11 3 6
Median k-mer count ≈ Read coverage
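The idea on this slide can be sketched in a few lines: count every k-mer across all reads, then take the median count of the k-mers inside one read as that read's coverage estimate. This is a minimal sketch, assuming exact counting with an in-memory `Counter`. The helper names `kmers`, `count_kmers`, and `estimate_coverage` are hypothetical, and the toy reads are illustration only.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return every overlapping k-mer in seq, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_kmers(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def estimate_coverage(read, counts, k):
    """Estimate a read's coverage as the median count of its k-mers."""
    return median(counts[km] for km in kmers(read, k))


# The seven k-mer counts shown on the slide.
slide_counts = [7, 8, 10, 8, 11, 3, 6]
print(median(slide_counts))  # 8

# Toy dataset: the same read sequenced three times.
toy_reads = ["ACGTAC"] * 3
toy_counts = count_kmers(toy_reads, k=3)
print(estimate_coverage("ACGTAC", toy_counts, k=3))  # 3
```

The median is used rather than the mean because a single low count, such as the 3 on the slide, barely moves it.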

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

k-mers make it possible to align without a reference

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

Problem: each sequencing error introduces k erroneous k-mers

Slide 23

Slide 23 text

Problem: each sequencing error introduces k erroneous k-mers
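A single substituted base sits inside up to k overlapping k-mers, so it can create up to k k-mers that do not exist in the true sequence. The sketch below illustrates this with a made-up read and one substitution. The sequence, the error position, and the helper `erroneous_kmers` are hypothetical. The count is exactly k only when the error lies at least k - 1 bases from both ends of the read and none of the new k-mers happens to occur elsewhere in the true sequence.

```python
def kmers(seq, k):
    """Return every overlapping k-mer in seq, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def erroneous_kmers(true_read, observed_read, k):
    """k-mers of the observed read that are absent from the true read."""
    true_set = set(kmers(true_read, k))
    return [km for km in kmers(observed_read, k) if km not in true_set]


k = 5
true_read = "ACGTTGCAAGGCTTACG"  # made-up sequence, illustration only

# Simulate one sequencing error: substitute the base at index 8 (A -> T).
observed_read = true_read[:8] + "T" + true_read[9:]

bad = erroneous_kmers(true_read, observed_read, k)
print(bad)       # ['TGCAT', 'GCATG', 'CATGG', 'ATGGC', 'TGGCT']
print(len(bad))  # 5, i.e. k
```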

Slide 24

Slide 24 text

7 8 10 8 11 3 6
Over a threshold, additional reads are redundant

Slide 25

Slide 25 text

5 5 5 5 5 3 5
Solution: digital normalization reduces redundancy and errors
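One way to sketch digital normalization is as a single streaming pass: a read is kept only if the median count of its k-mers, among the reads kept so far, is still below a cutoff. This is a simplified illustration and not necessarily the pipeline used in the talk. It assumes exact counting with a `Counter`, where a real tool would more likely use a memory-efficient approximate counter. The function name `digital_normalization`, the cutoff of 5 (chosen to mirror the capped counts on the slide), and the toy reads are all hypothetical.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return every overlapping k-mer in seq, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalization(reads, k, cutoff):
    """Keep a read only while its median k-mer count is below the cutoff."""
    counts = Counter()  # counts only k-mers from reads kept so far
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read is shorter than k, nothing to count
        if median(counts[km] for km in read_kmers) < cutoff:
            counts.update(read_kmers)
            kept.append(read)
    return kept


# Toy input: one region sequenced 10 times, another only twice.
toy_reads = ["ACGTAC"] * 10 + ["TTGGCCAA"] * 2
kept = digital_normalization(toy_reads, k=3, cutoff=5)
print(len(kept))  # 7: five copies of the deep region plus both shallow reads
```

Because only redundant reads are discarded, low-coverage regions keep all their reads while high-coverage regions are capped near the cutoff. Discarding redundant reads also discards the erroneous k-mers those reads carry.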

Slide 26

Slide 26 text

Assembly

Slide 27

Slide 27 text

Assembly SPAdes

Slide 28

Slide 28 text

Assembly Alignment

Slide 29

Slide 29 text

Assembly Alignment BLAST

Slide 30

Slide 30 text

Assembly Taxonomy Alignment

Slide 31

Slide 31 text

Assembly Taxonomy Alignment NCBI

Slide 32

Slide 32 text

Problem: 67% of contigs in rodent dataset (serum) align to human sequences

Slide 33

Slide 33 text

Problem: 67% of contigs in rodent dataset (serum) align to human sequences
Night-heron coronavirus HKU19 (1 Kb)
Simian hemorrhagic fever virus (300 bp)
Equine arteritis virus (3.7 Kb)
Possum nidovirus
Rodent hepacivirus
Chipmunk parvovirus
Theiler's disease-associated virus
Reticuloendotheliosis virus
Mosquito VEM Anellovirus SDBVL A
Porcine reproductive and respiratory syndrome virus
Dragonfly-associated circular virus 1
Gemycircularvirus 3
Rodent pegivirus
Cyclovirus PK5510
Hypericum japonicum associated circular DNA virus

Slide 34

Slide 34 text

Problem: 92% of contigs in bat dataset (droppings) don’t align to anything in NCBI
Pig stool associated circular ssDNA virus (1 Kb)
Avian gyrovirus 2
Torque teno sus virus 1a
Mosquito VEM virus SDBVL G
Turdivirus 3

Slide 35

Slide 35 text

Problem: 95% of contigs in rodent dataset 2 (serum, spleen) align to mouse sequences (2)
Lymphocytic choriomeningitis virus (7 Kb)
Hepatitis C virus
Amphotropic murine leukemia virus
Murid herpesvirus 1
Mosquito VEM Anellovirus SDBVL A
Rat retrovirus SC1
Mason-Pfizer monkey virus (retrovirus)
Eidolon helvum parvovirus 2
Periplaneta fuliginosa densovirus (also a parvovirus)
Moloney murine sarcoma virus
Sclerotinia sclerotiorum hypovirulence associated DNA virus 1

Slide 36

Slide 36 text

7 out of 10 samples contained more than 1 Kb of Leishmania RNA virus (94% identity, 5 Kb genome)

Slide 37

Slide 37 text

Lessons

Slide 38

Slide 38 text

Lessons
Assume that 50% of your samples are going to fail

Slide 39

Slide 39 text

Lessons
Assume that 50% of your samples are going to fail
Design a small experiment, then iterate

Slide 40

Slide 40 text

Lessons
Assume that 50% of your samples are going to fail
Design a small experiment, then iterate
Come up with excuses to learn