DBG-Assemble A Genome
Buttonwood
August 28, 2013
Assemble a genome
Transcript
De Bruijn Graph
[email protected]
http://buttonwood.github.io
Origin of de Bruijn graphs
• Graph theory: Hamilton path vs. Euler path.
• In 1946, the Dutch mathematician Nicolaas de Bruijn worked on the ‘superstring problem’: find a shortest circular ‘superstring’ that contains all possible ‘substrings’ of length k (k-mers) over a given alphabet.
• Alphabet: {0,1} -> {A,T,C,G}
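As a minimal sketch of the superstring problem (not taken from the slides), a circular de Bruijn sequence can be read off an Euler cycle in the graph whose nodes are (k-1)-mers and whose edges are k-mers. The function names and the choice of Hierholzer's algorithm are illustrative assumptions.

```python
from itertools import product


def de_bruijn_sequence(alphabet, k):
    """Return a circular string of length len(alphabet)**k that contains
    every k-mer over `alphabet` exactly once (reading around the circle)."""
    if k == 1:
        return "".join(alphabet)
    # Nodes are (k-1)-mers; each node has one unused outgoing edge per symbol.
    unused = {"".join(p): list(alphabet) for p in product(alphabet, repeat=k - 1)}
    stack = [alphabet[0] * (k - 1)]
    cycle = []
    # Hierholzer's algorithm: follow unused edges, backtrack when stuck.
    while stack:
        node = stack[-1]
        if unused[node]:
            stack.append(node[1:] + unused[node].pop())
        else:
            cycle.append(stack.pop())
    cycle.reverse()
    # Every edge of the Euler cycle contributes the last symbol of its target.
    return "".join(node[-1] for node in cycle[1:])


def circular_kmers(seq, k):
    """All k-mers of `seq` read as a circle."""
    wrapped = seq + seq[:k - 1]
    return [wrapped[i:i + k] for i in range(len(seq))]


binary = de_bruijn_sequence("01", 3)
dna = de_bruijn_sequence("ACGT", 2)
print(binary, len(binary))
print(dna, len(dna))
```

The Euler-path formulation visits every edge (k-mer) once, which is why it scales better than searching for a Hamilton path over k-mer nodes.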
De Bruijn Graph of a Small Sequence
Let’s go! From simple examples...
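A small sequence can be turned into a de Bruijn graph in a few lines. This sketch assumes the convention that appears later in the deck, where nodes are k-mers and an edge joins two k-mers that overlap by k-1 bases; the function names are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """All k-mers of `seq`, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def de_bruijn_graph(sequences, k):
    """Nodes are k-mers; an edge a -> b means b directly follows a in some
    sequence, so a and b overlap by k-1 bases."""
    graph = defaultdict(set)
    for seq in sequences:
        path = kmers(seq, k)
        for node in path:
            graph[node]  # register the node even if it has no successor
        for a, b in zip(path, path[1:]):
            graph[a].add(b)
    return graph


graph = de_bruijn_graph(["ATGGAAGTCG"], 7)
for node, successors in graph.items():
    print(node, "->", sorted(successors))
```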
Double-Stranded Nature of Genome
5’ ATGGAAGTCGCTTCCAT 3’
3’ TACCTTCAGCGAAGGTA 5’
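Because a read can come from either strand, assemblers commonly treat a k-mer and its reverse complement as the same node. The following is a minimal sketch of that idea; the helper names and the "lexicographically smaller" rule for the canonical form are assumptions, not something the slides specify.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Base-by-base complement, same orientation."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    """The opposite strand, read 5' to 3'."""
    return complement(seq)[::-1]


def canonical(kmer):
    """One representative for a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


top = "ATGGAAGTCGCTTCCAT"
print(complement(top))          # bottom strand, written 3' to 5'
print(reverse_complement(top))  # bottom strand, written 5' to 3'
print(canonical("TTCCAT"), canonical("ATGGAA"))
```

In this example the first and last 6-mers of the top strand, ATGGAA and TTCCAT, are reverse complements of each other, so they collapse to one canonical k-mer.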
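Impact of Changing k-mer Size
• Big? Small? Avoid even k?
• Avoid even k: with even k, some k-mers are their own reverse complements (e.g. ACGT), so the two strands of such a k-mer cannot be told apart. With odd k this cannot happen, because the middle base would have to be its own complement.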
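The claim about even k can be checked by brute force. This sketch enumerates every k-mer for a few small k and counts those equal to their own reverse complement; the function name is hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def count_self_reverse_complement(k):
    """Number of k-mers over ACGT that equal their own reverse complement."""
    total = 0
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        if kmer == kmer.translate(COMPLEMENT)[::-1]:
            total += 1
    return total


for k in (3, 4, 5, 6):
    print(k, count_self_reverse_complement(k))
```

For even k the first half of such a k-mer fixes the second half, giving 4^(k/2) of them; for odd k there are none.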
DBG OF A GENOME
Sequencing errors ---> Tips
[Figure: nodes A, B, C, D]
• Clip short tips (length < 2 k-mers).
• Or clip tips with fewer reads passing through them.
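Tip clipping as described in the bullets above could be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: nodes are k-mers, only dead-end tips (nodes with no successor) are handled, and `max_len` is a hypothetical threshold counted in nodes. The reads are invented for the example.

```python
from collections import defaultdict


def build_graph(reads, k):
    """Return (successors, predecessors) maps with k-mers as nodes."""
    succ, pred = defaultdict(set), defaultdict(set)
    for read in reads:
        path = [read[i:i + k] for i in range(len(read) - k + 1)]
        for a, b in zip(path, path[1:]):
            succ[a].add(b)
            pred[b].add(a)
    return succ, pred


def find_tips(succ, pred, max_len):
    """Collect nodes on dead-end branches shorter than `max_len` nodes."""
    tips = set()
    dead_ends = (set(succ) | set(pred)) - {n for n in succ if succ[n]}
    for end in dead_ends:
        path = [end]
        while len(path) < max_len:
            parents = pred.get(path[-1], set())
            if len(parents) != 1:
                break  # start of the graph or a merge: not a simple tip
            parent = next(iter(parents))
            if len(succ[parent]) > 1:
                tips.update(path)  # branch point reached: the path is a tip
                break
            path.append(parent)
    return tips


# Three clean reads plus one read whose last base is a sequencing error.
reads = ["ATGGAAGTCGCGTTACCA"] * 3 + ["GGAAGTCC"]
succ, pred = build_graph(reads, k=5)
tips = find_tips(succ, pred, max_len=3)
print(sorted(tips))
```

The true end of the sequence is also a dead end, but walking back from it does not reach a branch point within `max_len` nodes, so it is kept.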
Sequencing errors ---> CrossLink
seq1: ATGGAAGTCGCG
  k-mers: ATGGAAG TGGAAGT GGAAGTC GAAGTCG AAGTCGC AGTCGCG
seq2: GAGGAAGACCTT
  k-mers: GAGGAAG AGGAAGA GGAAGAC GAAGACC AAGACCT AGACCTT
seq1-read1: ATGGAAGTCG
seq2-read1: GAGGAAGTCC
  low-coverage connection: AGGAAGT
• Remove low-coverage nodes.
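Removing low-coverage nodes comes down to counting k-mers across all reads and dropping the rare ones. The sketch below reuses the sequences from this slide; the read multiplicities and the `min_cov` threshold are made up for illustration.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count how many times each k-mer occurs across all reads."""
    return Counter(read[i:i + k] for read in reads for i in range(len(read) - k + 1))


def weak_kmers(counts, min_cov):
    """k-mers seen fewer than `min_cov` times (candidate error nodes)."""
    return {kmer for kmer, n in counts.items() if n < min_cov}


# Three error-free copies of each sequence plus one erroneous read.
reads = ["ATGGAAGTCGCG"] * 3 + ["GAGGAAGACCTT"] * 3 + ["GAGGAAGTCC"]
counts = kmer_counts(reads, k=7)
weak = weak_kmers(counts, min_cov=2)
print(sorted(weak))
```

Dropping AGGAAGT removes the false link from the seq2 path into the seq1 path.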
Sequencing errors ---> Bubbles
seq1: ATGGAAGTCGCG...
  k-mers: ATGGAAG TGGAAGT GGAAGTC GAAGTCG AAGTCGC AGTCGCG
seq1-read1: ATGGAAGTCG...
seq1-read2: ATGGAAGACG...
  low-coverage path: TGGAAGA GGAAGAC GAAGACG AAGACGC AGACGCG
• Remove low-coverage paths.
• Same structure as SNP bubbles.
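To make the bubble concrete, this sketch extracts the two branches as the k-mers that are not shared between the two sequences and compares their mean coverage. The read counts (ten clean reads, one erroneous read) are invented, and the helper names are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def bubble_branches(seq_a, seq_b, k):
    """k-mers unique to each sequence, in path order: the two bubble arms."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    shared = set(a) & set(b)
    return [x for x in a if x not in shared], [x for x in b if x not in shared]


def mean_coverage(path, counts):
    return sum(counts[kmer] for kmer in path) / len(path)


reads = ["ATGGAAGTCGCG"] * 10 + ["ATGGAAGACGCG"]
counts = Counter(kmer for read in reads for kmer in kmers(read, 7))
main, alt = bubble_branches("ATGGAAGTCGCG", "ATGGAAGACGCG", 7)
print(main, mean_coverage(main, counts))
print(alt, mean_coverage(alt, counts))
```

An arm with much lower coverage than its sibling is the likely error path; arms with similar coverage point to a real variant instead.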
Repetitive Regions
• Simple Repeats
• Tandem Repeats
TINY OR LONG REPEAT
Sequence: ATTTAAATTAGCGATATTAGCATCTCTT
k = 5: .... AATTA ATTAG TTAGC TAGCG AGCGA GCGAT CGATA GATAT ATATT TATTA (ATTAG TTAGC again) TAGCA AGCAT GCATC CATCT ATCTC TCTCT ...
  [Figure: edges labelled a, b, c, d, e around the shared nodes ATTAG and TTAGC]
k = 7: .... AATTAGC ATTAGCG TTAGCGA TAGCGAT AGCGATA GCGATAT CGATATT GATATTA ATATTAG TATTAGC ATTAGCA TTAGCAT TAGCATC ....
You see what? A bigger k-mer (longer overlap) crosses the repeat.
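The effect of k on this repeat can be checked directly. The sketch below lists the nodes with more than one successor for k = 5 and k = 7, using the sequence from the slide; the function name is hypothetical and nodes are assumed to be k-mers.

```python
from collections import defaultdict


def branching_nodes(seq, k):
    """k-mers that are followed by more than one distinct k-mer in `seq`."""
    succ = defaultdict(set)
    path = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    for a, b in zip(path, path[1:]):
        succ[a].add(b)
    return sorted(node for node, nxt in succ.items() if len(nxt) > 1)


seq = "ATTTAAATTAGCGATATTAGCATCTCTT"
print(5, branching_nodes(seq, 5))
print(7, branching_nodes(seq, 7))
```

The repeated unit ATTAGC is 6 bases long, so any k above 6 gives every k-mer a unique context and the graph becomes a single path.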
Haplotype Differences
SNPs ---> Bubbles
hap1: ATGGAAGTCGCG...
  k-mers: ATGGAAG TGGAAGT GGAAGTC GAAGTCG AAGTCGC AGTCGCG
hap2: ATGGAAGACGCG...
  k-mers: TGGAAGA GGAAGAC GAAGACG AAGACGC AGACGCG
hap1-read1: ATGGAAGTCG...
hap2-read1: ATGGAAGACG...
• Equal-coverage paths.
? Adjacent SNPs
hap1: ATGGTAGTCGCG...
hap2: ATGGAAGACGCG...
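The "Adjacent SNPs" question can be explored with a toy check. Assuming the two haplotypes are aligned and of equal length (no indels), this sketch counts runs of consecutive k-mer positions where they differ; each run corresponds to one bubble. The function name is hypothetical.

```python
def divergent_runs(hap1, hap2, k):
    """Count runs of consecutive k-mer positions where the two aligned,
    equal-length haplotypes differ. Each run is one bubble."""
    if len(hap1) != len(hap2):
        raise ValueError("haplotypes must be aligned and of equal length")
    runs, in_run = 0, False
    for i in range(len(hap1) - k + 1):
        differs = hap1[i:i + k] != hap2[i:i + k]
        if differs and not in_run:
            runs += 1
        in_run = differs
    return runs


hap1 = "ATGGTAGTCGCG"
hap2 = "ATGGAAGACGCG"
for k in (2, 3, 7):
    print(k, divergent_runs(hap1, hap2, k))
```

Here the two SNPs are three bases apart, so they only separate into two bubbles when k is small enough for a shared k-mer to fit between them; for larger k they merge into one longer bubble.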
Indels ---> Bubbles
hap1: ATGGAAGTCGCGTCGA...
  k-mers: ATGGAAG TGGAAGT ... CGCGTCG GCGTCGA
hap2: ATGGAAG-----GCGTCGA...
  k-mers: TGGAAGG ... GGCGTCG
hap1-read1: ATGGAAGTCG...
hap2-read1: ATGGAAG---- GCGTC...
• Equal-coverage paths.
• Long road map.
? Adjacent SNPs with Indels
hap1: ATGGTAGTCGCAAGCC...
...
hap2: ATGGAAGACGC---GCG...
ASSIGNMENT 1
Sequence: ATTAGGATCATGATCCTCTGTGGATAAGATCTTTTTATTTAAAGATCTCTTTATTAGATCTCTT
• Let Kmer=4: ATTA TTAG TAGG AGGA … ATTA
• Let Kmer=5
• Let Kmer=7
• DBG of Genome; DBG of Reads; L = 15
TRY!
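A possible starting point for the exercise, offered as a sketch rather than a solution: it cuts the sequence shown on the slide into every possible read of length L = 15 and compares the k-mer node sets of the genome and of the reads. Taking one read at every offset is an assumption; real reads would be sampled and could miss k-mers.

```python
def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def all_reads(genome, read_len):
    """One error-free read starting at every offset of the genome."""
    return [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]


genome = "ATTAGGATCATGATCCTCTGTGGATAAGATCTTTTTATTTAAAGATCTCTTTATTAGATCTCTT"
reads = all_reads(genome, 15)

for k in (4, 5, 7):
    genome_nodes = set(kmers(genome, k))
    read_nodes = {kmer for read in reads for kmer in kmers(read, k)}
    print(k, len(genome_nodes), genome_nodes == read_nodes)
```

From here the graphs themselves can be drawn by linking consecutive k-mers, and the three values of k compared for branching.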
ASSIGNMENT 2
• Simulate Hap1 of about 1M; count K-mer freq;
• Then sequence at 40X;
• Then add repeats / an error rate;
• Then add SNPs (two haplotypes).
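The steps of this assignment could be prototyped along these lines. Everything beyond the slide's "1M" and "40X" is an assumption made for illustration: the read length, error rate, SNP rate, k, uniform random sampling, substitution-only errors, and all function names. The genome is kept small here so the example runs quickly.

```python
import random
from collections import Counter


def random_genome(length, rng):
    return "".join(rng.choice("ACGT") for _ in range(length))


def mutate(seq, rate, rng):
    """Substitute each base with probability `rate` (used for errors and SNPs)."""
    out = []
    for base in seq:
        if rng.random() < rate:
            base = rng.choice([b for b in "ACGT" if b != base])
        out.append(base)
    return "".join(out)


def simulate_reads(genome, read_len, coverage, error_rate, rng):
    """Sample reads at uniform random positions and add substitution errors."""
    n_reads = coverage * len(genome) // read_len
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        reads.append(mutate(genome[start:start + read_len], error_rate, rng))
    return reads


def kmer_spectrum(reads, k):
    """Histogram: k-mer multiplicity -> number of distinct k-mers with it."""
    counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
    return Counter(counts.values())


rng = random.Random(0)
hap1 = random_genome(10_000, rng)      # scale up towards 1M for the assignment
hap2 = mutate(hap1, 0.001, rng)        # second haplotype with random SNPs
reads = simulate_reads(hap1, read_len=100, coverage=40, error_rate=0.01, rng=rng)
spectrum = kmer_spectrum(reads, k=17)
for multiplicity in sorted(spectrum)[:5]:
    print(multiplicity, spectrum[multiplicity])
```

Plotting the spectrum with and without errors, and with reads drawn from both haplotypes, shows how errors pile up at low multiplicity and how heterozygous k-mers sit at roughly half the main coverage peak.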