Hunting for viruses in French Guiana
Nacho Caballero
April 29, 2014
Science
Lab meeting presentation about my work doing viral metagenomic analysis in French Guiana
Transcript
Virus Hunting in French Guiana (Nacho Caballero)
French Guiana
Rodents, Bats, Leishmania
Capture → Isolate viral particles → Extract RNA → Sequence
[Figure: Estimated read coverage (% of reads with coverage smaller than x), shown for rodents and bats]
How can we estimate the coverage without a reference genome?
[Diagram: a read split into k-mers, each labeled with a count: 1 1 1 1 1 1 1]
k-mer counts: 7 8 10 8 11 3 6. Median k-mer count ≈ read coverage
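The idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used in the talk: the helper names (`kmers`, `count_kmers`, `estimated_coverage`) and the toy reads are assumptions, and reverse complements are ignored for simplicity.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return every overlapping substring of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_kmers(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def estimated_coverage(read, counts, k):
    """Median k-mer count of a read, used as a proxy for its coverage.

    Assumes the read is at least k bases long.
    """
    return median(counts[km] for km in kmers(read, k))


# The per-k-mer counts shown on the slide.
slide_counts = [7, 8, 10, 8, 11, 3, 6]
slide_median = median(slide_counts)  # 8

# Toy reads (hypothetical data, for illustration only).
toy_reads = ["ACGTA", "ACGTA", "ACGTA", "CGTAC"]
toy_counts = count_kmers(toy_reads, k=3)
toy_coverage = estimated_coverage("ACGTA", toy_counts, k=3)  # 4
```

The median is used rather than the mean so that a single low-count k-mer (the 3 in the slide's example) does not drag the estimate down.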
k-mers make it possible to align without a reference
Problem: each sequencing error introduces k erroneous k-mers
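A small sketch makes the problem concrete. The read below is made-up data and `erroneous_kmers` is a hypothetical helper; the point is only that a single substitution in the middle of a read corrupts every k-mer that overlaps it, which is k of them, while an error near the end of a read corrupts fewer.

```python
def kmers(seq, k):
    """Return every overlapping substring of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def erroneous_kmers(true_read, observed_read, k):
    """k-mers of the observed read that do not occur in the true read."""
    true_set = set(kmers(true_read, k))
    return [km for km in kmers(observed_read, k) if km not in true_set]


k = 5
true_read = "ACGTTGCAAGCTTAGGCATC"  # hypothetical error-free read

# One substitution in the middle of the read (position 10, C -> G).
mid_error = true_read[:10] + "G" + true_read[11:]
mid_bad = erroneous_kmers(true_read, mid_error, k)

# One substitution at the very first base (A -> T).
edge_error = "T" + true_read[1:]
edge_bad = erroneous_kmers(true_read, edge_error, k)

print(len(mid_bad))   # 5, i.e. k erroneous k-mers
print(len(edge_bad))  # 1, only one k-mer overlaps the first base
```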
k-mer counts: 7 8 10 8 11 3 6. Over a threshold, additional reads are redundant
Solution: digital normalization reduces redundancy and errors (k-mer counts: 5 5 5 5 5 3 5)
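One common way to describe digital normalization is a single streaming pass: keep a read only if its median k-mer count, among the reads kept so far, is still below a cutoff. The sketch below follows that description under stated assumptions. It uses an exact in-memory `Counter` where real tools typically use a probabilistic counting structure, the function name and toy reads are hypothetical, and the cutoff of 5 is chosen only to mirror the slide's example counts.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return every overlapping substring of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalization(reads, k, cutoff):
    """Keep a read only while its median k-mer count is below cutoff."""
    counts = Counter()  # k-mer counts over the reads kept so far
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read is shorter than k, nothing to count
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)
        # otherwise the read is redundant and is dropped
    return kept


# Toy data: one sequence seen 20 times, another seen only twice.
abundant = "ACGTTGCAAG"
rare = "TTGGCCAATT"
reads = [abundant] * 20 + [rare] * 2

normalized = digital_normalization(reads, k=4, cutoff=5)
print(normalized.count(abundant))  # 5
print(normalized.count(rare))      # 2
```

In this toy run the abundant sequence is cut down to the cutoff while the rare one is kept in full, which is the redundancy reduction the slide describes.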
Assembly (SPAdes) → Alignment (BLAST) → Taxonomy (NCBI)
Problem: 67% of contigs in rodent dataset (serum) align to human sequences
Viruses listed: Night-heron coronavirus HKU19 (1 Kb); Simian hemorrhagic fever virus (300 bp); Equine arteritis virus (3.7 Kb); Possum nidovirus; Rodent hepacivirus; Chipmunk parvovirus; Theiler's disease-associated virus; Reticuloendotheliosis virus; Mosquito VEM Anellovirus SDBVL A; Porcine reproductive and respiratory syndrome virus; Dragonfly-associated circular virus 1; Gemycircularvirus 3; Rodent pegivirus; Cyclovirus PK5510; Hypericum japonicum associated circular DNA virus
Problem: 92% of contigs in bat dataset (droppings) don't align to anything in NCBI
Viruses listed: Pig stool associated circular ssDNA virus (1 Kb); Avian gyrovirus 2; Torque teno sus virus 1a; Mosquito VEM virus SDBVL G; Turdivirus 3
Problem: 95% of contigs in rodent dataset 2 (serum, spleen) align to mouse sequences (2)
Viruses listed: Lymphocytic choriomeningitis virus (7 Kb); Hepatitis C virus; Amphotropic murine leukemia virus; Murid herpesvirus 1; Mosquito VEM Anellovirus SDBVL A; Rat retrovirus SC1; Mason-Pfizer monkey virus (retrovirus); Eidolon helvum parvovirus 2; Periplaneta fuliginosa densovirus (also a parvovirus); Moloney murine sarcoma virus; Sclerotinia sclerotiorum hypovirulence associated DNA virus 1
7 out of 10 samples contained more than 1 Kb of Leishmania RNA virus (94% identity, 5 Kb genome)
Lessons
Assume that 50% of your samples are going to fail
Design a small experiment, then iterate
Come up with excuses to learn