Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Hunting for viruses in French Guiana
Search
Nacho Caballero
April 29, 2014
Science
0
60
Hunting for viruses in French Guiana
Lab meeting presentation about my work doing viral metagenomic analysis in French Guiana
Nacho Caballero
April 29, 2014
Tweet
Share
More Decks by Nacho Caballero
See All by Nacho Caballero
Bridging data analysis and interactive visualization
nachocab
0
45
Other Decks in Science
See All in Science
機械学習 - SVM
trycycle
PRO
1
910
風の力で振れ幅が大きくなる振り子!? 〜タコマナローズ橋はなぜ落ちたのか〜
syotasasaki593876
1
110
データベース05: SQL(2/3) 結合質問
trycycle
PRO
0
820
「美は世界を救う」を心理学で実証したい~クラファンを通じた新しい研究方法
jimpe_hitsuwari
1
170
ランサムウェア対策にも考慮したVMware、Hyper-V、Azure、AWS間のリアルタイムレプリケーション「Zerto」を徹底解説
climbteam
0
150
LayerXにおける業務の完全自動運転化に向けたAI技術活用事例 / layerx-ai-jsai2025
shimacos
2
17k
機械学習 - 授業概要
trycycle
PRO
0
260
研究って何だっけ / What is Research?
ks91
PRO
1
140
KH Coderチュートリアル(スライド版)
koichih
1
49k
Cross-Media Technologies, Information Science and Human-Information Interaction
signer
PRO
3
31k
蔵本モデルが解き明かす同期と相転移の秘密 〜拍手のリズムはなぜ揃うのか?〜
syotasasaki593876
1
110
白金鉱業Meetup_Vol.20 効果検証ことはじめ / Introduction to Impact Evaluation
brainpadpr
1
1.2k
Featured
See All Featured
The Art of Delivering Value - GDevCon NA Keynote
reverentgeek
16
1.7k
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
10
620
Dealing with People You Can't Stand - Big Design 2015
cassininazir
367
27k
Build your cross-platform service in a week with App Engine
jlugia
233
18k
Documentation Writing (for coders)
carmenintech
75
5.1k
A designer walks into a library…
pauljervisheath
209
24k
Making the Leap to Tech Lead
cromwellryan
135
9.6k
How to Think Like a Performance Engineer
csswizardry
27
2.1k
The Cost Of JavaScript in 2023
addyosmani
55
9.1k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
285
14k
Bootstrapping a Software Product
garrettdimon
PRO
307
110k
Improving Core Web Vitals using Speculation Rules API
sergeychernyshev
21
1.2k
Transcript
French Guiana Virus Hunting in Nacho Caballero
French Guiana
Rodents Bats
Rodents Bats Leishmania
Capture
Capture Isolate viral particles
Capture Isolate viral particles Extract RNA
Capture Isolate viral particles Extract RNA Sequence
Estimated read coverage % reads with coverage smaller than x
Rodents
Estimated read coverage % reads with coverage smaller than x
Rodents
Estimated read coverage % reads with coverage smaller than x
Rodents Bats
Read How can we estimate the coverage without a reference
genome?
Read How can we estimate the coverage without a reference
genome?
K-mers Read How can we estimate the coverage without a
reference genome?
How can we estimate the coverage without a reference genome?
1 1 1 1 1 1 1 How can we
estimate the coverage without a reference genome?
7 8 10 8 11 3 6
7 8 10 8 11 3 6 Median k-mer count
≈ Read coverage
None
k-mers make it possible to align without a reference
None
Problem: each sequencing error introduces k erroneous k-mers
Problem: each sequencing error introduces k erroneous k-mers
7 8 10 8 11 3 6 Over a threshold,
additional reads are redundant
5 5 5 5 5 3 5 Solution: digital normalization
reduces redundancy and errors
Assembly
Assembly SPADes
Assembly Alignment
Assembly Alignment BLAST
Assembly Taxonomy Alignment
Assembly Taxonomy Alignment NCBI
Problem: 67% of contigs in rodent dataset (serum) align to
human sequences
Problem: 67% of contigs in rodent dataset (serum) align to
human sequences Night-heron coronavirus HKU19 (1 Kb) Simian hemorrhagic fever virus (300 bp) Equine arteritis virus (3.7 Kb) Possum nidovirus Rodent hepacivirus Chipmunk parvovirus Theiler's disease-associated virus Reticuloendotheliosis virus Mosquito VEM Anellovirus SDBVL A Porcine reproductive and respiratory syndrome virus Dragonfly-associated circular virus 1 Gemycircularvirus 3 Rodent pegivirus Cyclovirus PK5510 Hypericum japonicum associated circular DNA virus
Pig stool associated circular ssDNA virus (1Kb) Avian gyrovirus 2
Torque teno sus virus 1a Mosquito VEM virus SDBVL G Turdivirus 3 Problem: 92% of contigs in bat dataset (droppings) don’t align to anything in NCBI
Lymphocytic choriomeningitis virus (7kb) Hepatitis C virus Amphotropic murine leukemia
virus Murid herpesvirus 1 Mosquito VEM Anellovirus SDBVL A Rat retrovirus SC1 Mason-Pfizer monkey virus (retrovirus) Eidolon helvum parvovirus 2 Periplaneta fuliginosa densovirus (also a parvovirus) Moloney murine sarcoma virus Sclerotinia sclerotiorum hypovirulence associated DNA virus 1 Problem: 95% of contigs in rodent dataset 2 (serum, spleen) align to mouse sequences (2)
7 out of 10 samples contained more than 1Kb of
Leishmania RNA virus (94% ident) 5 Kb genome
Lessons
Assume that 50% of your samples are going to fail
Lessons
Assume that 50% of your samples are going to fail
Lessons Design a small experiment, then iterate
Assume that 50% of your samples are going to fail
Lessons Design a small experiment, then iterate Come up with excuses to learn