Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
DBG-Assemble A Genome
Search
Buttonwood
August 28, 2013
Education
0
22
DBG-Assemble A Genome
Assemble a genome
Buttonwood
August 28, 2013
Tweet
Share
Other Decks in Education
See All in Education
Tutorial: Foundations of Blind Source Separation and Its Advances in Spatial Self-Supervised Learning
yoshipon
1
150
OpenSourceSummitJapanを運営してみた話
kujiraitakahiro
0
790
”育てる”から”育つ”仕組みへ!スクラムによる新入社員教育
arapon
0
140
日本の情報系社会人院生のリアル -JAIST 修士編-
yurikomium
1
120
マネジメント「される側」 こそ覚悟を決めろ
nao_randd
9
5.5k
(キラキラ)人事教育担当のつらみ~教育担当として知っておくポイント~
masakiokuda
0
130
【品女100周年企画】Pitch Deck
shinagawajoshigakuin_100th
0
5.9k
みんなのコードD&I推進レポート2025 テクノロジー分野のジェンダーギャップとその取り組みについて
codeforeveryone
0
210
シリコンバレーでスタートアップを共同創業したファウンディングエンジニアとしての学び
tomoima525
1
1.2k
生成AIとの上手な付き合い方【公開版】/ How to Get Along Well with Generative AI (Public Version)
handlename
0
630
自己紹介 / who-am-i
yasulab
PRO
3
5.4k
Pydantic(AI)とJSONの詳細解説
mickey_kubo
0
190
Featured
See All Featured
Fashionably flexible responsive web design (full day workshop)
malarkey
407
66k
How STYLIGHT went responsive
nonsquared
100
5.8k
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
PRO
188
55k
Fantastic passwords and where to find them - at NoRuKo
philnash
52
3.4k
Building a Scalable Design System with Sketch
lauravandoore
462
33k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
285
13k
Building a Modern Day E-commerce SEO Strategy
aleyda
43
7.6k
Six Lessons from altMBA
skipperchong
28
4k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
139
34k
The Art of Delivering Value - GDevCon NA Keynote
reverentgeek
15
1.6k
Into the Great Unknown - MozCon
thekraken
40
2k
How to Ace a Technical Interview
jacobian
279
23k
Transcript
De Bruijn Graph
[email protected]
http://buttonwood.github.io
Origin of de Bruijn graphs Graph Theory: Hamilton path VS
Euler path In 1946, the Dutch mathematician Nicolaas de Bruijn The ‘superstring problem’: find a shortest circular ‘superstring’ that contains all possible ‘substrings’ of length k (k-mers) over a given alphabet.{0,1} -> {A,T,C,G}
De Bruijn Graph of a Small Sequence
Let’s go!From simple examples...
Double-Stranded Nature of Genome ATGGAAGTCGCTTCCAT TACCTTCAGCCAAGGTA 5’ 5’ 3’ 3’
Impact of Changing k-mer Size Big? Small? Avoid even k?
avoid even k, because with even k, many k-mers become reverse comcomplements of their own sequences.
DBG OF A GENOME
Sequencing errors ---> Tips A B C D • Clip
the short tips that had lengths < 2 Kmers • Or less number of reads through it.
Sequencing errors ---> CrossLink ATGGAAGTCGCG ATGGAAG TGGAAGT GGAAGTC GAAGTCG AAGTCGC
AGTCGCG GAGGAAGACCTT GAGGAAG AGGAAGA GGAAGAC GAAGACC AAGACCT AGACCTT GAGGAAGTCC AGGAAGT ATGGAAGTCG seq1 seq1-read1 seq2-read1 seq2 • Remove low-coverage nodes. Low-coverage connection
Sequencing errors --->Bubbles ATGGAAGTCGCG... ATGGAAG TGGAAGT GGAAGTC GAAGTCG AAGTCGC AGTCGCG
TGGAAGA GGAAGAC GAAGACG AAGACGC AGACGCG ATGGAAGACG... ATGGAAGTCG... seq1 seq1-read1 seq1-read2 • Remove low-coverage paths. • Same as SNP Bubbles. Low-coverage paths Bubbles
Repetitive Regions Simple Repeats Tandem Repeats
TINY OR LONG REPEAT ATTTAAATTAGCGATATTAGCATCTCTT .... AATTA ATTAG TAGCG AGCGA
GCGAT CGATA GATAT ATATT TATTA TAGCA AGCAT GCATC CATCT ATCTC TCTCT ... TTAGC c a d b e .... AATTAGC ATTAGCG TAGCGAT AGCGATA GCGATAT CGATATT GATATTA ATATTAG TATTAGC TTAGCGA ATTAGCA TTAGCAT TAGCATC .... You see what? Bigger k-mer(long overlap) cross the repeat.
Haplotype Differences
SNPs--->Bubbles ATGGAAGTCGCG... ATGGAAG TGGAAGT GGAAGTC GAAGTCG AAGTCGC AGTCGCG TGGAAGA GGAAGAC
GAAGACG AAGACGC AGACGCG ATGGAAGACG... ATGGAAGTCG... hap1 hap1-read1 hap2-read1 • Equal coverage paths. Equal-coverage paths Bubbles ATGGAAGACGCG... hap2 ? Adjacent SNPs ATGGTAGTCGCG... ATGGAAGACGCG... hap1 hap2
Indels--->Bubbles ATGGAAGTCGCGTCGA... ATGGAAG TGGAAGT ... CGCGTCG GCGTCGA TGGAAGG ... GGCCTCG
ATGGAAG---- GCGTC... ATGGAAGTCG... hap1 hap1-read1 hap2-read1 • Equal coverage paths. • Long road map. Equal-coverage paths Bubbles ATGGAAG-----GCGTCGA... hap2 ? Adjacent SNPs with Indels ATGGTAGTCGCAAGCC... ... ATGGAAGACGC---GCG... hap1 hap2
ASSIGNMENT 1 Let Kmer=4 Let Kmer=5 Let Kmer=7 ATTA TTAG
TAGG AGGA ATTAGGATCATGATCCTCTGTGGATAAGATCTTTTTATTTAAAGATCTCTTTATTAGATCTCTT … ATTA DBG of Genome; DBG of Reads; L = 15 TRY!
ASSIGNMENT 2 Simulated Hap1 about 1M: Count K-mer freq; Then
sequence 40X; Then add repeat/Error rate; Then SNPs(two haplotype);