Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
NextGen Sequencing data intro.
Search
David Rio Deiros
March 26, 2012
Research
3
230
NextGen Sequencing data intro.
Brief introduction to nextgen sequencing data.
David Rio Deiros
March 26, 2012
Tweet
Share
Other Decks in Research
See All in Research
MetaEarth: A Generative Foundation Model for Global-Scale Remote Sensing Image Generation
satai
4
170
AIスパコン「さくらONE」のLLM学習ベンチマークによる性能評価 / SAKURAONE LLM Training Benchmarking
yuukit
0
170
ストレス計測方法の確立に向けたマルチモーダルデータの活用
yurikomium
0
1.3k
Galileo: Learning Global & Local Features of Many Remote Sensing Modalities
satai
3
210
数理最適化に基づく制御
mickey_kubo
6
720
VectorLLM: Human-like Extraction of Structured Building Contours via Multimodal LLMs
satai
4
170
業界横断 副業・兼業者の実態調査
fkske
0
230
Vision and LanguageからのEmbodied AIとAI for Science
yushiku
PRO
1
520
AWSで実現した大規模日本語VLM学習用データセット "MOMIJI" 構築パイプライン/buiding-momiji
studio_graph
2
360
Generative Models 2025
takahashihiroshi
24
13k
2021年度-基盤研究B-研究計画調書
trycycle
PRO
0
270
近似動的計画入門
mickey_kubo
4
1k
Featured
See All Featured
Java REST API Framework Comparison - PWX 2021
mraible
33
8.8k
Side Projects
sachag
455
43k
A Tale of Four Properties
chriscoyier
160
23k
The Art of Delivering Value - GDevCon NA Keynote
reverentgeek
15
1.6k
Visualization
eitanlees
147
16k
Building a Scalable Design System with Sketch
lauravandoore
462
33k
10 Git Anti Patterns You Should be Aware of
lemiorhan
PRO
656
61k
Scaling GitHub
holman
463
140k
Agile that works and the tools we love
rasmusluckow
330
21k
Bootstrapping a Software Product
garrettdimon
PRO
307
110k
Thoughts on Productivity
jonyablonski
69
4.8k
The Invisible Side of Design
smashingmag
301
51k
Transcript
Next-Gen Sequencing Data
@shiondev
@drio
None
Σ Bases DNA == “The Genome”
Σ Bases DNA == “The Genome” 3Gbp
…ACAGTTTTCAAGAGCCGGTTTTACTAGGATTATTACTG…
…ACAGTTTTCAAGAGCCGGTTTTACTAGGATTATTACTG… G
…ACAGTTTTCAAGAGCCGGTTTTACTAGGATTATTACTG… G
…ACAGTTTTCAAGAGCCGGTTTTACTAGGATTATTACTG… G T
…ACAGTTTTCAAGAGCCGGTTTTACTAGGATTATTACTG… G T -‐
…ACAGTTTTCAAGAGCCGGTTTTACTAGGATTATTACTG… G T -‐ A
…ACAGTTTTCAAGAGCCGGTTTTACTAGGATTATTACTG… G T -‐ A
SEQUENCING
None
# of Bases per day per machine
200kbp 2000
1Mbp 2003
200 Mbp 2005
3Gbp 2009
60Gbp 2012
what can we do with NGS data?
Re-sequencing
Re-sequencing Looking for changes in a Genome
Re-sequencing Looking for changes in a Genome (Given that we
have a HIGH quality reference)
Re-sequencing Looking for changes in a Genome (Given that we
have a HIGH quality reference) Consequences?
Reliably finding those changes is not easy
(1%-3%) of your bases may be errors.
30x
What’s the typical workflow in a re-sequencing project ?
Library preparation
Library preparation Sequencing
Library preparation Sequencing Analysis I (images -> reads)
.fastq ... >HWI-ST821_0129:5:1101:1927:2089#GATCAG/1 TGGACAACGGCCAGGTTAATGATGGGCAGGTAGAAGATGATCACT +HWI-ST821_0129:5:1101:1927:2089#GATCAG/1 ___ccccccYc[eff`]X`a^ef][RHP^_cXIYSXcXcfSWXcd ...
Library preparation Sequencing Analysis I (images -> reads) Analysis II
(alignments)
None
Library preparation Sequencing Analysis I (images -> reads) Analysis II
(alignments) Analysis III (Variant calling)
.vcf
Library preparation Sequencing Analysis I (images -> reads) Analysis II
(alignments) Analysis III (Variant calling) Annotation
Library preparation Sequencing Analysis I (images -> reads) Analysis II
(alignments) Analysis III (Variant calling) Annotation Science starts here …
None
Library preparation Sequencing Analysis I (images -> reads) Analysis II
(alignments) Analysis III (SNP calling) Annotation 5/15 Tb 150Gb 80G 16 8G 1G 1 400Mb 1
1 Genome (3-6 days) ~ 230Gb
1 Genome
Let’s do it again for N genomes
Let’s do it again for N genomes
None
None
None
None
None
None
None
personalize medicine
personalize medicine Tailor physician decisions and practices to individual patients
Let’s do it again for N genomes
None
None
Thanks!