Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
NextGen Sequencing data intro.
Search
David Rio Deiros
March 26, 2012
Research
3
230
NextGen Sequencing data intro.
Brief introduction to nextgen sequencing data.
David Rio Deiros
March 26, 2012
Tweet
Share
Other Decks in Research
See All in Research
Vision and LanguageからのEmbodied AIとAI for Science
yushiku
PRO
1
400
NLP Colloquium
junokim
1
180
定性データ、どう活かす? 〜定性データのための分析基盤、はじめました〜 / How to utilize qualitative data? ~We have launched an analysis platform for qualitative data~
kaminashi
7
1.1k
問いを起点に、社会と共鳴する知を育む場へ
matsumoto_r
PRO
0
500
生成的推薦の人気バイアスの分析:暗記の観点から / JSAI2025
upura
0
230
在庫管理のための機械学習と最適化の融合
mickey_kubo
3
1.1k
数理最適化と機械学習の融合
mickey_kubo
15
9k
最適決定木を用いた処方的価格最適化
mickey_kubo
4
1.8k
RHO-1: Not All Tokens Are What You Need
sansan_randd
1
160
SSII2025 [TS1] 光学・物理原理に基づく深層画像生成
ssii
PRO
4
4k
20250624_熊本経済同友会6月例会講演
trafficbrain
1
520
データxデジタルマップで拓く ミラノ発・地域共創最前線
mapconcierge4agu
0
200
Featured
See All Featured
Git: the NoSQL Database
bkeepers
PRO
431
65k
The Cost Of JavaScript in 2023
addyosmani
51
8.7k
Writing Fast Ruby
sferik
628
62k
YesSQL, Process and Tooling at Scale
rocio
173
14k
Sharpening the Axe: The Primacy of Toolmaking
bcantrill
44
2.4k
Creating an realtime collaboration tool: Agile Flush - .NET Oxford
marcduiker
30
2.2k
The Language of Interfaces
destraynor
158
25k
How To Stay Up To Date on Web Technology
chriscoyier
790
250k
Unsuck your backbone
ammeep
671
58k
Visualization
eitanlees
146
16k
Measuring & Analyzing Core Web Vitals
bluesmoon
7
530
Easily Structure & Communicate Ideas using Wireframe
afnizarnur
194
16k
Transcript
Next-Gen Sequencing Data
@shiondev
@drio
None
Σ Bases DNA == “The Genome”
Σ Bases DNA == “The Genome” 3Gbp
…ACAGTTTTCAAGAGCCGGTTTTACTAGGATTATTACTG…
…ACAGTTTTCAAGAGCCGGTTTTACTAGGATTATTACTG… G
…ACAGTTTTCAAGAGCCGGTTTTACTAGGATTATTACTG… G
…ACAGTTTTCAAGAGCCGGTTTTACTAGGATTATTACTG… G T
…ACAGTTTTCAAGAGCCGGTTTTACTAGGATTATTACTG… G T -‐
…ACAGTTTTCAAGAGCCGGTTTTACTAGGATTATTACTG… G T -‐ A
…ACAGTTTTCAAGAGCCGGTTTTACTAGGATTATTACTG… G T -‐ A
SEQUENCING
None
# of Bases per day per machine
200kbp 2000
1Mbp 2003
200 Mbp 2005
3Gbp 2009
60Gbp 2012
what can we do with NGS data?
Re-sequencing
Re-sequencing Looking for changes in a Genome
Re-sequencing Looking for changes in a Genome (Given that we
have a HIGH quality reference)
Re-sequencing Looking for changes in a Genome (Given that we
have a HIGH quality reference) Consequences?
Reliably finding those changes is not easy
(1%-3%) of your bases may be errors.
30x
What’s the typical workflow in a re-sequencing project ?
Library preparation
Library preparation Sequencing
Library preparation Sequencing Analysis I (images -> reads)
.fastq ... >HWI-ST821_0129:5:1101:1927:2089#GATCAG/1 TGGACAACGGCCAGGTTAATGATGGGCAGGTAGAAGATGATCACT +HWI-ST821_0129:5:1101:1927:2089#GATCAG/1 ___ccccccYc[eff`]X`a^ef][RHP^_cXIYSXcXcfSWXcd ...
Library preparation Sequencing Analysis I (images -> reads) Analysis II
(alignments)
None
Library preparation Sequencing Analysis I (images -> reads) Analysis II
(alignments) Analysis III (Variant calling)
.vcf
Library preparation Sequencing Analysis I (images -> reads) Analysis II
(alignments) Analysis III (Variant calling) Annotation
Library preparation Sequencing Analysis I (images -> reads) Analysis II
(alignments) Analysis III (Variant calling) Annotation Science starts here …
None
Library preparation Sequencing Analysis I (images -> reads) Analysis II
(alignments) Analysis III (SNP calling) Annotation 5/15 Tb 150Gb 80G 16 8G 1G 1 400Mb 1
1 Genome (3-6 days) ~ 230Gb
1 Genome
Let’s do it again for N genomes
Let’s do it again for N genomes
None
None
None
None
None
None
None
personalize medicine
personalize medicine Tailor physician decisions and practices to individual patients
Let’s do it again for N genomes
None
None
Thanks!