Hunting for viruses in French Guiana
Nacho Caballero
April 29, 2014
Science
Lab meeting presentation about my work doing viral metagenomic analysis in French Guiana
Transcript
Virus Hunting in French Guiana, Nacho Caballero
French Guiana
Rodents, Bats, Leishmania
Capture, isolate viral particles, extract RNA, sequence
[Plot: estimated read coverage (% of reads with coverage smaller than x), for rodents and bats]
How can we estimate the coverage without a reference genome?
Read, split into k-mers
k-mer counts: 1 1 1 1 1 1 1
k-mer counts: 7 8 10 8 11 3 6
Median k-mer count ≈ read coverage
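The idea that the median k-mer count of a read approximates its coverage can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the talk: the reads, the value of k, and the function names are all hypothetical, and it ignores reverse complements and uses an exact in-memory counter rather than a memory-efficient counting structure.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return every overlapping substring of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_kmers(reads, k):
    """Count how many times each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def estimated_coverage(read, counts, k):
    """Median count of the read's k-mers, used as a proxy for its coverage."""
    return median(counts[kmer] for kmer in kmers(read, k))


# Toy data (hypothetical): one 12-base read sequenced three times,
# plus one unrelated read sequenced once.
reads = ["ACGTTGCATGCA"] * 3 + ["TTTTGGGGCCCC"]
counts = count_kmers(reads, k=4)

coverage_common = estimated_coverage("ACGTTGCATGCA", counts, k=4)
coverage_rare = estimated_coverage("TTTTGGGGCCCC", counts, k=4)
print(coverage_common, coverage_rare)
```

The median is used rather than the mean so that a few unusually high or low k-mer counts within a read (repeats, or k-mers created by an error) do not dominate the estimate.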
k-mers make it possible to align without a reference
Problem: each sequencing error introduces k erroneous k-mers
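A small sketch can make the error problem concrete. Assuming a single substitution in the interior of a read, every k-mer that overlaps the changed base is altered, which gives up to k new k-mers (fewer when the error sits near the end of the read). The read sequence, positions, and helper names below are hypothetical.

```python
def kmer_set(seq, k):
    """Return the set of all overlapping k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def substitute(seq, pos, base):
    """Return seq with the base at index pos replaced by base."""
    return seq[:pos] + base + seq[pos + 1:]


k = 5
true_read = "ACGTACGGTCATTGCAAGCT"  # hypothetical 20-base read

# One substitution in the middle of the read (A -> G at index 10).
middle_error = substitute(true_read, 10, "G")
novel_middle = kmer_set(middle_error, k) - kmer_set(true_read, k)

# One substitution at the first base (A -> T at index 0).
edge_error = substitute(true_read, 0, "T")
novel_edge = kmer_set(edge_error, k) - kmer_set(true_read, k)

print(len(novel_middle), len(novel_edge))
```

In this toy read the interior error creates k new k-mers, while the error at the first base creates only one, because only one k-mer overlaps that position.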
Over a threshold, additional reads are redundant (k-mer counts: 7 8 10 8 11 3 6)
Solution: digital normalization reduces redundancy and errors (k-mer counts: 5 5 5 5 5 3 5)
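The streaming idea behind digital normalization can be sketched as follows: keep a read only if its estimated coverage (median k-mer count so far) is still below a cutoff, and count the k-mers of kept reads only. This is a simplified illustration under stated assumptions: the cutoff of 5, k=4, and the toy reads are hypothetical, and an exact `Counter` stands in for the memory-efficient counting structures that real tools typically use.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return every overlapping substring of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalization(reads, k, cutoff):
    """Keep reads whose median k-mer count is below cutoff when they arrive."""
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read is shorter than k, nothing to estimate
        if median(counts[kmer] for kmer in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)  # only kept reads add to the counts
    return kept


# Toy data (hypothetical): one read seen 10 times, another seen twice.
reads = ["ACGTTGCATGCA"] * 10 + ["TTTTGGGGCCCC"] * 2
kept = digital_normalization(reads, k=4, cutoff=5)
print(len(reads), "reads in,", len(kept), "reads kept")
```

With these inputs the abundant read is capped at the cutoff while the rare read is kept in full, which mirrors the flattening of counts shown on the slide. The sketch only demonstrates the redundancy reduction; it does not model sequencing errors.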
Assembly (SPAdes), Alignment (BLAST), Taxonomy (NCBI)
Problem: 67% of contigs in rodent dataset (serum) align to human sequences
Night-heron coronavirus HKU19 (1 kb); Simian hemorrhagic fever virus (300 bp); Equine arteritis virus (3.7 kb); Possum nidovirus; Rodent hepacivirus; Chipmunk parvovirus; Theiler's disease-associated virus; Reticuloendotheliosis virus; Mosquito VEM Anellovirus SDBVL A; Porcine reproductive and respiratory syndrome virus; Dragonfly-associated circular virus 1; Gemycircularvirus 3; Rodent pegivirus; Cyclovirus PK5510; Hypericum japonicum associated circular DNA virus
Problem: 92% of contigs in bat dataset (droppings) don't align to anything in NCBI
Pig stool associated circular ssDNA virus (1 kb); Avian gyrovirus 2; Torque teno sus virus 1a; Mosquito VEM virus SDBVL G; Turdivirus 3
Problem: 95% of contigs in rodent dataset 2 (serum, spleen) align to mouse sequences (2)
Lymphocytic choriomeningitis virus (7 kb); Hepatitis C virus; Amphotropic murine leukemia virus; Murid herpesvirus 1; Mosquito VEM Anellovirus SDBVL A; Rat retrovirus SC1; Mason-Pfizer monkey virus (retrovirus); Eidolon helvum parvovirus 2; Periplaneta fuliginosa densovirus (also a parvovirus); Moloney murine sarcoma virus; Sclerotinia sclerotiorum hypovirulence associated DNA virus 1
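Percentages such as the share of contigs with no alignment can be computed from BLAST results with a short script. The sketch below assumes BLAST was run with tabular output (`-outfmt 6`), where the first tab-separated column is the query (contig) ID; the talk does not say which output format was used, and the contig IDs and hit lines here are made up for illustration.

```python
def fraction_without_hits(contig_ids, blast_tabular_lines):
    """Fraction of contigs that never appear as a query in tabular BLAST output."""
    contigs_with_hits = set()
    for line in blast_tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        contigs_with_hits.add(line.split("\t")[0])  # column 1 is the query ID

    all_contigs = set(contig_ids)
    if not all_contigs:
        return 0.0
    return len(all_contigs - contigs_with_hits) / len(all_contigs)


# Hypothetical data: four contigs, only one of which has a hit.
contig_ids = ["contig_1", "contig_2", "contig_3", "contig_4"]
blast_lines = [
    "contig_2\tsubject_A\t94.0\t1000\t60\t0\t1\t1000\t1\t1000\t0.0\t1500",
]
unaligned_fraction = fraction_without_hits(contig_ids, blast_lines)
print(f"{unaligned_fraction:.0%} of contigs have no hit")
```

The same set of query IDs could be joined to a taxonomy lookup to compute the share of contigs assigned to a given host, but that step depends on how the taxonomy was retrieved and is left out here.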
7 out of 10 samples contained more than 1 kb of Leishmania RNA virus (94% identity; 5 kb genome)
Lessons
Assume that 50% of your samples are going to fail
Design a small experiment, then iterate
Come up with excuses to learn