Hunting for viruses in French Guiana
Nacho Caballero
April 29, 2014
Science
Lab meeting presentation about my work doing viral metagenomic analysis in French Guiana
Transcript
Virus Hunting in French Guiana. Nacho Caballero
French Guiana
Rodents, bats, Leishmania
Capture → Isolate viral particles → Extract RNA → Sequence
[Figure: estimated read coverage (% of reads with coverage smaller than x), shown for rodents and bats]
How can we estimate the coverage without a reference genome?
[Figure: a read split into k-mers, with per-k-mer counts 1 1 1 1 1 1 1]
k-mer counts along the read: 7 8 10 8 11 3 6
Median k-mer count (here 8) ≈ read coverage
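A minimal Python sketch of the idea on this slide, not the code used in the talk: count every k-mer across all reads, then take the median count of the k-mers in one read as that read's estimated coverage. The function names, the toy reads and k = 4 are assumptions made for illustration, and reverse complements are ignored for simplicity.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return every overlapping k-mer in seq, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_kmers(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def estimated_coverage(read, counts, k):
    """Median k-mer count of a read, used as a proxy for its coverage."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        raise ValueError("read is shorter than k")
    return median(counts[km] for km in read_kmers)


# Toy data (hypothetical): one sequence sampled 5 times, another 2 times.
reads = ["ACGTAC"] * 5 + ["TTGCAA"] * 2
counts = count_kmers(reads, k=4)

high = estimated_coverage("ACGTAC", counts, k=4)
low = estimated_coverage("TTGCAA", counts, k=4)
print(high, low)
```

The median is used rather than the mean so that a few unusually low or high k-mer counts within a read do not dominate the estimate.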
k-mers make it possible to align without a reference
Problem: each sequencing error introduces up to k erroneous k-mers
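To see why a single error is so costly, the sketch below compares the k-mers of a correct read with those of the same read carrying one substitution. The read sequence, the error positions and k = 4 are made-up values for illustration. A substitution in the middle of the read changes every k-mer that overlaps it (k of them), while one at the very end changes only a single k-mer.

```python
def kmer_set(seq, k):
    """Return the set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def erroneous_kmers(true_read, observed_read, k):
    """k-mers present in the observed read but absent from the true read."""
    return kmer_set(observed_read, k) - kmer_set(true_read, k)


k = 4
true_read = "ACGTTGCATGGC"       # hypothetical error-free read
middle_error = "ACGTTACATGGC"    # G -> A at position 5 (0-based)
end_error = "ACGTTGCATGGA"       # C -> A at the last position

bad_middle = erroneous_kmers(true_read, middle_error, k)
bad_end = erroneous_kmers(true_read, end_error, k)
print(sorted(bad_middle))
print(sorted(bad_end))
```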
k-mer counts 7 8 10 8 11 3 6: over a threshold, additional reads are redundant
k-mer counts 5 5 5 5 5 3 5: solution, digital normalization reduces redundancy and errors
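A toy sketch of digital normalization, assuming the usual streaming formulation: a read is kept only if the median count of its k-mers, among the reads kept so far, is still below a cutoff. The cutoff of 5 mirrors the numbers on the slide; the function names, toy reads and k = 4 are assumptions, and this is not the tool used in the talk.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return every overlapping k-mer in seq, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalization(reads, k, cutoff):
    """Keep a read only while its median k-mer count is below cutoff."""
    counts = Counter()  # k-mer counts over the reads kept so far
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k: nothing to measure
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)
        # otherwise the read is redundant and is dropped
    return kept


# Toy data (hypothetical): one sequence seen 10 times, another 3 times.
reads = ["ACGTAC"] * 10 + ["TTGCAA"] * 3
kept = digital_normalization(reads, k=4, cutoff=5)
print(len(reads), "reads in,", len(kept), "reads kept")
```

In this toy run the heavily sampled sequence is capped at the cutoff while the rarely sampled one is kept in full, which is the behavior the slide's before and after counts illustrate.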
Assembly: SPAdes
Alignment: BLAST
Taxonomy: NCBI
Problem: 67% of contigs in rodent dataset (serum) align to human sequences
Viruses listed on the slide: Night-heron coronavirus HKU19 (1 Kb); Simian hemorrhagic fever virus (300 bp); Equine arteritis virus (3.7 Kb); Possum nidovirus; Rodent hepacivirus; Chipmunk parvovirus; Theiler's disease-associated virus; Reticuloendotheliosis virus; Mosquito VEM Anellovirus SDBVL A; Porcine reproductive and respiratory syndrome virus; Dragonfly-associated circular virus 1; Gemycircularvirus 3; Rodent pegivirus; Cyclovirus PK5510; Hypericum japonicum associated circular DNA virus
Problem: 92% of contigs in bat dataset (droppings) don't align to anything in NCBI
Viruses listed on the slide: Pig stool associated circular ssDNA virus (1 Kb); Avian gyrovirus 2; Torque teno sus virus 1a; Mosquito VEM virus SDBVL G; Turdivirus 3
Problem: 95% of contigs in rodent dataset 2 (serum, spleen) align to mouse sequences (2)
Viruses listed on the slide: Lymphocytic choriomeningitis virus (7 Kb); Hepatitis C virus; Amphotropic murine leukemia virus; Murid herpesvirus 1; Mosquito VEM Anellovirus SDBVL A; Rat retrovirus SC1; Mason-Pfizer monkey virus (retrovirus); Eidolon helvum parvovirus 2; Periplaneta fuliginosa densovirus (also a parvovirus); Moloney murine sarcoma virus; Sclerotinia sclerotiorum hypovirulence associated DNA virus 1
7 out of 10 samples contained more than 1 Kb of Leishmania RNA virus (94% identity, 5 Kb genome)
Lessons
Assume that 50% of your samples are going to fail
Design a small experiment, then iterate
Come up with excuses to learn