Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
DBG-Assemble A Genome
Search
Buttonwood
August 28, 2013
Education
0
22
DBG-Assemble A Genome
Assemble a genome
Buttonwood
August 28, 2013
Tweet
Share
Other Decks in Education
See All in Education
1,000年後の未来人に届くWebサイトを作りたい!
oguemon
2
1.1k
4 занятие. Разбор бизнес-моделей и метод красной нити #ideaNN 9.02.2024.
karlov
0
210
LTをすべき100の理由
eltociear
0
180
HyRead2324
cbtlibrary
0
110
Monaca Educationを活用したプログラミング授業実践
asial_edu
0
150
Baa Baa Black Sheep
haiinya
0
110
キャリアと組織の成長塾#1 アスリートからエンジニアの道へ
takashi_toyosaki
1
240
執筆テーマの決め方
sapi_kawahara
1
150
Monaca Educationを活用した課題解決型の探究学習の実践
asial_edu
0
150
Introduction - Lecture 1 - Information Visualisation (4019538FNR)
signer
PRO
0
3.5k
スタートアップCTOが個人開発で収益化・年13本記事発信・5件登壇を平行するための時間管理
texmeijin
4
780
D&I推進レポート〜テクノロジー分野のジェンダーギャップとその取り組みについて〜
codeforeveryone
1
410
Featured
See All Featured
Designing Experiences People Love
moore
135
23k
RailsConf 2023
tenderlove
0
510
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
219
21k
WebSockets: Embracing the real-time Web
robhawkes
59
6.9k
JazzCon 2018 Closing Keynote - Leadership for the Reluctant Leader
reverentgeek
178
11k
Build The Right Thing And Hit Your Dates
maggiecrowley
23
1.9k
Become a Pro
speakerdeck
PRO
8
4.4k
Building Applications with DynamoDB
mza
88
5.6k
How GitHub (no longer) Works
holman
301
140k
How to Ace a Technical Interview
jacobian
272
22k
Making the Leap to Tech Lead
cromwellryan
122
8.4k
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
352
28k
Transcript
De Bruijn Graph
[email protected]
http://buttonwood.github.io
Origin of de Bruijn graphs Graph Theory: Hamilton path VS
Euler path In 1946, the Dutch mathematician Nicolaas de Bruijn The ‘superstring problem’: find a shortest circular ‘superstring’ that contains all possible ‘substrings’ of length k (k-mers) over a given alphabet.{0,1} -> {A,T,C,G}
De Bruijn Graph of a Small Sequence
Let’s go!From simple examples...
Double-Stranded Nature of Genome ATGGAAGTCGCTTCCAT TACCTTCAGCCAAGGTA 5’ 5’ 3’ 3’
Impact of Changing k-mer Size Big? Small? Avoid even k?
avoid even k, because with even k, many k-mers become reverse comcomplements of their own sequences.
DBG OF A GENOME
Sequencing errors ---> Tips A B C D • Clip
the short tips that had lengths < 2 Kmers • Or less number of reads through it.
Sequencing errors ---> CrossLink ATGGAAGTCGCG ATGGAAG TGGAAGT GGAAGTC GAAGTCG AAGTCGC
AGTCGCG GAGGAAGACCTT GAGGAAG AGGAAGA GGAAGAC GAAGACC AAGACCT AGACCTT GAGGAAGTCC AGGAAGT ATGGAAGTCG seq1 seq1-read1 seq2-read1 seq2 • Remove low-coverage nodes. Low-coverage connection
Sequencing errors --->Bubbles ATGGAAGTCGCG... ATGGAAG TGGAAGT GGAAGTC GAAGTCG AAGTCGC AGTCGCG
TGGAAGA GGAAGAC GAAGACG AAGACGC AGACGCG ATGGAAGACG... ATGGAAGTCG... seq1 seq1-read1 seq1-read2 • Remove low-coverage paths. • Same as SNP Bubbles. Low-coverage paths Bubbles
Repetitive Regions Simple Repeats Tandem Repeats
TINY OR LONG REPEAT ATTTAAATTAGCGATATTAGCATCTCTT .... AATTA ATTAG TAGCG AGCGA
GCGAT CGATA GATAT ATATT TATTA TAGCA AGCAT GCATC CATCT ATCTC TCTCT ... TTAGC c a d b e .... AATTAGC ATTAGCG TAGCGAT AGCGATA GCGATAT CGATATT GATATTA ATATTAG TATTAGC TTAGCGA ATTAGCA TTAGCAT TAGCATC .... You see what? Bigger k-mer(long overlap) cross the repeat.
Haplotype Differences
SNPs--->Bubbles ATGGAAGTCGCG... ATGGAAG TGGAAGT GGAAGTC GAAGTCG AAGTCGC AGTCGCG TGGAAGA GGAAGAC
GAAGACG AAGACGC AGACGCG ATGGAAGACG... ATGGAAGTCG... hap1 hap1-read1 hap2-read1 • Equal coverage paths. Equal-coverage paths Bubbles ATGGAAGACGCG... hap2 ? Adjacent SNPs ATGGTAGTCGCG... ATGGAAGACGCG... hap1 hap2
Indels--->Bubbles ATGGAAGTCGCGTCGA... ATGGAAG TGGAAGT ... CGCGTCG GCGTCGA TGGAAGG ... GGCCTCG
ATGGAAG---- GCGTC... ATGGAAGTCG... hap1 hap1-read1 hap2-read1 • Equal coverage paths. • Long road map. Equal-coverage paths Bubbles ATGGAAG-----GCGTCGA... hap2 ? Adjacent SNPs with Indels ATGGTAGTCGCAAGCC... ... ATGGAAGACGC---GCG... hap1 hap2
ASSIGNMENT 1 Let Kmer=4 Let Kmer=5 Let Kmer=7 ATTA TTAG
TAGG AGGA ATTAGGATCATGATCCTCTGTGGATAAGATCTTTTTATTTAAAGATCTCTTTATTAGATCTCTT … ATTA DBG of Genome; DBG of Reads; L = 15 TRY!
ASSIGNMENT 2 Simulated Hap1 about 1M: Count K-mer freq; Then
sequence 40X; Then add repeat/Error rate; Then SNPs(two haplotype);