Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
DBG-Assemble A Genome
Search
Buttonwood
August 28, 2013
Education
0
22
DBG-Assemble A Genome
Assemble a genome
Buttonwood
August 28, 2013
Tweet
Share
Other Decks in Education
See All in Education
Sponsor the Conference | VizChitra 2025
vizchitra
0
540
CHARMS-HP-Banner
weltraumreisende
0
170
子どものためのプログラミング道場『CoderDojo』〜法人提携例〜 / Partnership with CoderDojo Japan
coderdojojapan
4
16k
Tangible, Embedded and Embodied Interaction - Lecture 7 - Next Generation User Interfaces (4018166FNR)
signer
PRO
0
1.7k
新卒研修に仕掛ける 学びのサイクル / Implementing Learning Cycles in New Graduate Training
takashi_toyosaki
1
150
Virtual and Augmented Reality - Lecture 8 - Next Generation User Interfaces (4018166FNR)
signer
PRO
0
1.7k
Linuxのよく使うコマンドを解説
mickey_kubo
1
130
Human-AI Interaction - Lecture 11 - Next Generation User Interfaces (4018166FNR)
signer
PRO
0
450
RELC_2025_KYI
otamayuzak
0
110
SARA Annual Report 2024-25
sara2023
1
180
推しのコミュニティはなんぼあってもいい / Let's join a lot of communities.
kaga
2
1.7k
計算情報学研究室 (数理情報学第7研究室)紹介スライド (2025)
tomonatu8
0
510
Featured
See All Featured
Side Projects
sachag
455
42k
Adopting Sorbet at Scale
ufuk
77
9.4k
Automating Front-end Workflow
addyosmani
1370
200k
Scaling GitHub
holman
459
140k
Bootstrapping a Software Product
garrettdimon
PRO
307
110k
What's in a price? How to price your products and services
michaelherold
246
12k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
10
930
VelocityConf: Rendering Performance Case Studies
addyosmani
330
24k
Imperfection Machines: The Place of Print at Facebook
scottboms
267
13k
Typedesign – Prime Four
hannesfritz
42
2.7k
Site-Speed That Sticks
csswizardry
10
660
We Have a Design System, Now What?
morganepeng
53
7.7k
Transcript
De Bruijn Graph
[email protected]
http://buttonwood.github.io
Origin of de Bruijn graphs Graph Theory: Hamilton path VS
Euler path In 1946, the Dutch mathematician Nicolaas de Bruijn The ‘superstring problem’: find a shortest circular ‘superstring’ that contains all possible ‘substrings’ of length k (k-mers) over a given alphabet.{0,1} -> {A,T,C,G}
De Bruijn Graph of a Small Sequence
Let’s go!From simple examples...
Double-Stranded Nature of Genome ATGGAAGTCGCTTCCAT TACCTTCAGCCAAGGTA 5’ 5’ 3’ 3’
Impact of Changing k-mer Size Big? Small? Avoid even k?
avoid even k, because with even k, many k-mers become reverse comcomplements of their own sequences.
DBG OF A GENOME
Sequencing errors ---> Tips A B C D • Clip
the short tips that had lengths < 2 Kmers • Or less number of reads through it.
Sequencing errors ---> CrossLink ATGGAAGTCGCG ATGGAAG TGGAAGT GGAAGTC GAAGTCG AAGTCGC
AGTCGCG GAGGAAGACCTT GAGGAAG AGGAAGA GGAAGAC GAAGACC AAGACCT AGACCTT GAGGAAGTCC AGGAAGT ATGGAAGTCG seq1 seq1-read1 seq2-read1 seq2 • Remove low-coverage nodes. Low-coverage connection
Sequencing errors --->Bubbles ATGGAAGTCGCG... ATGGAAG TGGAAGT GGAAGTC GAAGTCG AAGTCGC AGTCGCG
TGGAAGA GGAAGAC GAAGACG AAGACGC AGACGCG ATGGAAGACG... ATGGAAGTCG... seq1 seq1-read1 seq1-read2 • Remove low-coverage paths. • Same as SNP Bubbles. Low-coverage paths Bubbles
Repetitive Regions Simple Repeats Tandem Repeats
TINY OR LONG REPEAT ATTTAAATTAGCGATATTAGCATCTCTT .... AATTA ATTAG TAGCG AGCGA
GCGAT CGATA GATAT ATATT TATTA TAGCA AGCAT GCATC CATCT ATCTC TCTCT ... TTAGC c a d b e .... AATTAGC ATTAGCG TAGCGAT AGCGATA GCGATAT CGATATT GATATTA ATATTAG TATTAGC TTAGCGA ATTAGCA TTAGCAT TAGCATC .... You see what? Bigger k-mer(long overlap) cross the repeat.
Haplotype Differences
SNPs--->Bubbles ATGGAAGTCGCG... ATGGAAG TGGAAGT GGAAGTC GAAGTCG AAGTCGC AGTCGCG TGGAAGA GGAAGAC
GAAGACG AAGACGC AGACGCG ATGGAAGACG... ATGGAAGTCG... hap1 hap1-read1 hap2-read1 • Equal coverage paths. Equal-coverage paths Bubbles ATGGAAGACGCG... hap2 ? Adjacent SNPs ATGGTAGTCGCG... ATGGAAGACGCG... hap1 hap2
Indels--->Bubbles ATGGAAGTCGCGTCGA... ATGGAAG TGGAAGT ... CGCGTCG GCGTCGA TGGAAGG ... GGCCTCG
ATGGAAG---- GCGTC... ATGGAAGTCG... hap1 hap1-read1 hap2-read1 • Equal coverage paths. • Long road map. Equal-coverage paths Bubbles ATGGAAG-----GCGTCGA... hap2 ? Adjacent SNPs with Indels ATGGTAGTCGCAAGCC... ... ATGGAAGACGC---GCG... hap1 hap2
ASSIGNMENT 1 Let Kmer=4 Let Kmer=5 Let Kmer=7 ATTA TTAG
TAGG AGGA ATTAGGATCATGATCCTCTGTGGATAAGATCTTTTTATTTAAAGATCTCTTTATTAGATCTCTT … ATTA DBG of Genome; DBG of Reads; L = 15 TRY!
ASSIGNMENT 2 Simulated Hap1 about 1M: Count K-mer freq; Then
sequence 40X; Then add repeat/Error rate; Then SNPs(two haplotype);