Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Hunting for viruses in French Guiana
Search
Nacho Caballero
April 29, 2014
Science
0
56
Hunting for viruses in French Guiana
Lab meeting presentation about my work doing viral metagenomic analysis in French Guiana
Nacho Caballero
April 29, 2014
Tweet
Share
More Decks by Nacho Caballero
See All by Nacho Caballero
Bridging data analysis and interactive visualization
nachocab
0
42
Other Decks in Science
See All in Science
点群ライブラリPDALをGoogleColabにて実行する方法の紹介
kentaitakura
1
250
Collective Predictive Coding Hypothesis and Beyond (@Japanese Association for Philosophy of Science, 26th October 2024)
tanichu
0
120
創薬における機械学習技術について
kanojikajino
16
5.2k
統計学入門講座 第2回スライド
techmathproject
0
120
モンテカルロDCF法による事業価値の算出(モンテカルロ法とベイズモデリング) / Business Valuation Using Monte Carlo DCF Method (Monte Carlo Simulation and Bayesian Modeling)
ikuma_w
0
140
白金鉱業Meetup Vol.16_【初学者向け発表】 数理最適化のはじめの一歩 〜身近な問題で学ぶ最適化の面白さ〜
brainpadpr
11
2.2k
データベース01: データベースを使わない世界
trycycle
PRO
1
620
Explanatory material
yuki1986
0
260
多次元展開法を用いた 多値バイクラスタリング モデルの提案
kosugitti
0
320
地表面抽出の方法であるSMRFについて紹介
kentaitakura
1
690
AI(人工知能)の過去・現在・未来 —AIは人間を超えるのか—
tagtag
0
120
機械学習 - SVM
trycycle
PRO
1
760
Featured
See All Featured
RailsConf 2023
tenderlove
30
1.1k
Building Applications with DynamoDB
mza
95
6.4k
Building a Modern Day E-commerce SEO Strategy
aleyda
40
7.3k
Fireside Chat
paigeccino
37
3.5k
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
1
74
The Art of Programming - Codeland 2020
erikaheidi
54
13k
Designing Experiences People Love
moore
142
24k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
331
21k
Building Adaptive Systems
keathley
41
2.6k
StorybookのUI Testing Handbookを読んだ
zakiyama
30
5.8k
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
52
2.8k
Fashionably flexible responsive web design (full day workshop)
malarkey
407
66k
Transcript
French Guiana Virus Hunting in Nacho Caballero
French Guiana
Rodents Bats
Rodents Bats Leishmania
Capture
Capture Isolate viral particles
Capture Isolate viral particles Extract RNA
Capture Isolate viral particles Extract RNA Sequence
Estimated read coverage % reads with coverage smaller than x
Rodents
Estimated read coverage % reads with coverage smaller than x
Rodents
Estimated read coverage % reads with coverage smaller than x
Rodents Bats
Read How can we estimate the coverage without a reference
genome?
Read How can we estimate the coverage without a reference
genome?
K-mers Read How can we estimate the coverage without a
reference genome?
How can we estimate the coverage without a reference genome?
1 1 1 1 1 1 1 How can we
estimate the coverage without a reference genome?
7 8 10 8 11 3 6
7 8 10 8 11 3 6 Median k-mer count
≈ Read coverage
None
k-mers make it possible to align without a reference
None
Problem: each sequencing error introduces k erroneous k-mers
Problem: each sequencing error introduces k erroneous k-mers
7 8 10 8 11 3 6 Over a threshold,
additional reads are redundant
5 5 5 5 5 3 5 Solution: digital normalization
reduces redundancy and errors
Assembly
Assembly SPADes
Assembly Alignment
Assembly Alignment BLAST
Assembly Taxonomy Alignment
Assembly Taxonomy Alignment NCBI
Problem: 67% of contigs in rodent dataset (serum) align to
human sequences
Problem: 67% of contigs in rodent dataset (serum) align to
human sequences Night-heron coronavirus HKU19 (1 Kb) Simian hemorrhagic fever virus (300 bp) Equine arteritis virus (3.7 Kb) Possum nidovirus Rodent hepacivirus Chipmunk parvovirus Theiler's disease-associated virus Reticuloendotheliosis virus Mosquito VEM Anellovirus SDBVL A Porcine reproductive and respiratory syndrome virus Dragonfly-associated circular virus 1 Gemycircularvirus 3 Rodent pegivirus Cyclovirus PK5510 Hypericum japonicum associated circular DNA virus
Pig stool associated circular ssDNA virus (1Kb) Avian gyrovirus 2
Torque teno sus virus 1a Mosquito VEM virus SDBVL G Turdivirus 3 Problem: 92% of contigs in bat dataset (droppings) don’t align to anything in NCBI
Lymphocytic choriomeningitis virus (7kb) Hepatitis C virus Amphotropic murine leukemia
virus Murid herpesvirus 1 Mosquito VEM Anellovirus SDBVL A Rat retrovirus SC1 Mason-Pfizer monkey virus (retrovirus) Eidolon helvum parvovirus 2 Periplaneta fuliginosa densovirus (also a parvovirus) Moloney murine sarcoma virus Sclerotinia sclerotiorum hypovirulence associated DNA virus 1 Problem: 95% of contigs in rodent dataset 2 (serum, spleen) align to mouse sequences (2)
7 out of 10 samples contained more than 1Kb of
Leishmania RNA virus (94% ident) 5 Kb genome
Lessons
Assume that 50% of your samples are going to fail
Lessons
Assume that 50% of your samples are going to fail
Lessons Design a small experiment, then iterate
Assume that 50% of your samples are going to fail
Lessons Design a small experiment, then iterate Come up with excuses to learn