DBG-Assemble A Genome
Buttonwood
August 28, 2013
Education
Assemble a genome
Transcript
De Bruijn Graph
[email protected]
http://buttonwood.github.io
Origin of de Bruijn graphs
Graph theory: Hamiltonian path vs. Eulerian path.
In 1946, the Dutch mathematician Nicolaas de Bruijn studied the ‘superstring problem’: find a shortest circular ‘superstring’ that contains all possible ‘substrings’ of length k (k-mers) over a given alphabet.
For genomes, the alphabet changes: {0,1} -> {A,T,C,G}
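To make the superstring problem concrete, here is a minimal Python sketch. It assumes the usual Eulerian formulation: nodes are (k-1)-mers, every k-mer is an edge, and an Eulerian circuit spells the circular superstring. The function name `de_bruijn_sequence` and the use of Hierholzer's algorithm are choices made for this illustration, not something taken from the slides.

```python
from itertools import product

def de_bruijn_sequence(alphabet, k):
    """Return a shortest circular string containing every k-mer exactly once."""
    if k == 1:
        return "".join(alphabet)
    # Nodes are (k-1)-mers; each node has one outgoing edge per symbol.
    unused = {"".join(p): list(alphabet) for p in product(alphabet, repeat=k - 1)}
    start = alphabet[0] * (k - 1)
    stack, circuit = [start], []
    # Hierholzer's algorithm: follow unused edges, backtrack when stuck.
    while stack:
        node = stack[-1]
        if unused[node]:
            symbol = unused[node].pop()
            stack.append((node + symbol)[1:])
        else:
            circuit.append(stack.pop())
    circuit.reverse()
    # Every step of the circuit contributes the last symbol of the node reached.
    return "".join(node[-1] for node in circuit[1:])

def circular_kmers(seq, k):
    """All k-mers of seq when it is read as a circle."""
    doubled = seq + seq[:k - 1]
    return [doubled[i:i + k] for i in range(len(seq))]

if __name__ == "__main__":
    print(de_bruijn_sequence("01", 3))
    print(len(de_bruijn_sequence("ACGT", 3)))
```

The Eulerian view is what makes this cheap: visiting every edge once is a linear-time walk, whereas visiting every k-mer node once would be a Hamiltonian path problem.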
De Bruijn Graph of a Small Sequence
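As a rough sketch of what such a graph looks like in code, the following Python builds a de Bruijn graph in which each k-mer is a node and consecutive k-mers (overlapping by k-1 bases) are linked. The names `kmers` and `build_dbg`, and the choice to store the graph as a dictionary of successor sets, are assumptions for illustration only.

```python
def kmers(seq, k):
    """Return the k-mers of seq in left-to-right order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def build_dbg(seq, k):
    """Map every k-mer (node) to the set of k-mers that directly follow it."""
    graph = {}
    nodes = kmers(seq, k)
    for node in nodes:
        graph.setdefault(node, set())
    for left, right in zip(nodes, nodes[1:]):
        graph[left].add(right)  # left and right overlap by k-1 bases
    return graph

if __name__ == "__main__":
    # A short sequence with no repeats gives a simple path.
    for node, successors in build_dbg("ATGGAAGTCGCG", 7).items():
        print(node, "->", sorted(successors))
```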
Let’s go!From simple examples...
Double-Stranded Nature of Genome
5’ ATGGAAGTCGCTTCCAT 3’
3’ TACCTTCAGCGAAGGTA 5’
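Because a read can come from either strand, assemblers commonly treat a k-mer and its reverse complement as the same node. A minimal Python sketch of that idea follows; the helper names `reverse_complement` and `canonical` are illustrative, and picking the lexicographically smaller form as the representative is one common convention rather than something the slides specify.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Complement every base, then reverse, to read the opposite strand 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def canonical(kmer):
    """Pick one representative for a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))

if __name__ == "__main__":
    print(reverse_complement("ATGGAAGTCGCTTCCAT"))
    print(canonical("CTTCCAT"))
```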
Impact of Changing k-mer Size
Big? Small? Avoid even k?
Avoid even k, because with even k many k-mers are their own reverse complements. With odd k this cannot happen, since the middle base would have to be its own complement.
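The claim about even k can be checked by brute force. The Python sketch below enumerates every k-mer and keeps those equal to their own reverse complement; the function name `self_complementary_kmers` is made up for this example.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def self_complementary_kmers(k):
    """List every k-mer that is identical to its own reverse complement."""
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return [kmer for kmer in all_kmers if kmer == reverse_complement(kmer)]

if __name__ == "__main__":
    for k in (2, 3, 4, 5):
        print(k, len(self_complementary_kmers(k)))
```

For even k the first half of the k-mer determines the second half, so there are 4^(k/2) such k-mers; for odd k there are none.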
DBG OF A GENOME
Sequencing errors ---> Tips
A B C D
• Clip the short tips with length < 2 Kmers.
• Or clip tips that have fewer reads passing through them.
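The length-based rule can be sketched as follows. This is a simplified illustration, not the procedure of any particular assembler: the graph is assumed to be a dictionary mapping each node to its set of successors, tip length is counted in nodes, only dead ends with no outgoing edge are considered, and the node names and the `max_tip_len` parameter are hypothetical.

```python
def clip_tips(graph, max_tip_len):
    """Return a copy of graph without dead-end chains shorter than max_tip_len nodes."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    succ = {n: set(graph.get(n, ())) for n in nodes}
    pred = {n: set() for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            pred[t].add(n)

    to_remove = set()
    for end in (n for n in nodes if not succ[n]):
        chain, node, hangs_off_branch = [end], end, False
        # Walk backwards while the chain is unbranched.
        while len(pred[node]) == 1:
            parent = next(iter(pred[node]))
            if len(succ[parent]) > 1:
                hangs_off_branch = True  # the chain leaves a branching node
                break
            chain.append(parent)
            node = parent
        if hangs_off_branch and len(chain) < max_tip_len:
            to_remove.update(chain)

    return {n: {t for t in targets if t not in to_remove}
            for n, targets in succ.items() if n not in to_remove}

if __name__ == "__main__":
    # Main path A -> B -> C -> D with a one-node tip X hanging off B.
    graph = {"A": {"B"}, "B": {"C", "X"}, "C": {"D"}, "D": set(), "X": set()}
    print(clip_tips(graph, max_tip_len=2))
```

A coverage-based variant would compare the read support of the competing branches instead of their lengths.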
Sequencing errors ---> CrossLink
seq1: ATGGAAGTCGCG
seq1 k-mers: ATGGAAG TGGAAGT GGAAGTC GAAGTCG AAGTCGC AGTCGCG
seq2: GAGGAAGACCTT
seq2 k-mers: GAGGAAG AGGAAGA GGAAGAC GAAGACC AAGACCT AGACCTT
seq1-read1: ATGGAAGTCG
seq2-read1: GAGGAAGTCC (sequencing error; its k-mer AGGAAGT forms a low-coverage connection from seq2 to seq1)
• Remove low-coverage nodes.
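Removing low-coverage nodes amounts to counting how often each k-mer occurs in the reads and discarding the rare ones. The Python sketch below uses the two sequences and the erroneous read from this slide; the read multiplicities (three error-free copies of each sequence) and the `min_count` threshold are invented for illustration.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def solid_kmers(counts, min_count):
    """Keep only k-mers seen at least min_count times."""
    return {kmer for kmer, n in counts.items() if n >= min_count}

if __name__ == "__main__":
    # Hypothetical coverage: 3 clean reads per sequence plus one read with an error.
    reads = ["ATGGAAGTCGCG"] * 3 + ["GAGGAAGACCTT"] * 3 + ["GAGGAAGTCC"]
    counts = count_kmers(reads, 7)
    print(counts["AGGAAGT"], counts["GGAAGTC"])
    print(sorted(set(counts) - solid_kmers(counts, 2)))
```

The k-mer AGGAAGT exists only in the erroneous read, so dropping it removes the false link between the two sequences.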
Sequencing errors ---> Bubbles
seq1: ATGGAAGTCGCG...
Reads of seq1 (seq1-read1, seq1-read2): ATGGAAGTCG... and ATGGAAGACG... (the second carries a sequencing error)
Correct path: ATGGAAG TGGAAGT GGAAGTC GAAGTCG AAGTCGC AGTCGCG
Error path (low coverage): TGGAAGA GGAAGAC GAAGACG AAGACGC AGACGCG
• Remove low-coverage paths.
• Same shape as SNP bubbles.
Repetitive Regions
Simple Repeats
Tandem Repeats
TINY OR LONG REPEAT
ATTTAAATTAGCGATATTAGCATCTCTT
k = 5: .... AATTA ATTAG TAGCG AGCGA GCGAT CGATA GATAT ATATT TATTA TAGCA AGCAT GCATC CATCT ATCTC TCTCT ...
Shared node: TTAGC (path labels in the figure: c a d b e)
k = 7: .... AATTAGC ATTAGCG TTAGCGA TAGCGAT AGCGATA GCGATAT CGATATT GATATTA ATATTAG TATTAGC ATTAGCA TTAGCAT TAGCATC ....
See what happens? A bigger k-mer (longer overlap) spans the repeat.
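The effect of k on this repeat can be checked directly. The Python sketch below lists the k-mers that have more than one distinct successor in the example sequence; the function name `branching_nodes` is chosen for this illustration.

```python
def branching_nodes(seq, k):
    """Return the k-mers of seq that are followed by more than one distinct k-mer."""
    successors = {}
    for i in range(len(seq) - k):
        successors.setdefault(seq[i:i + k], set()).add(seq[i + 1:i + 1 + k])
    return sorted(node for node, nxt in successors.items() if len(nxt) > 1)

if __name__ == "__main__":
    seq = "ATTTAAATTAGCGATATTAGCATCTCTT"
    for k in (5, 7):
        print(k, branching_nodes(seq, k))
```

With k = 5 the repeated k-mer TTAGC is followed by both TAGCG and TAGCA, so the path forks. With k = 7 every k-mer reaches past the 6-base repeat ATTAGC and the graph is a single unbranched path.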
Haplotype Differences
SNPs ---> Bubbles
hap1: ATGGAAGTCGCG...
hap2: ATGGAAGACGCG...
hap1-read1: ATGGAAGTCG...
hap2-read1: ATGGAAGACG...
hap1 path: ATGGAAG TGGAAGT GGAAGTC GAAGTCG AAGTCGC AGTCGCG
hap2 path: TGGAAGA GGAAGAC GAAGACG AAGACGC AGACGCG
• Equal-coverage paths.
Adjacent SNPs?
hap1: ATGGTAGTCGCG...
hap2: ATGGAAGACGCG...
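The two branches of a SNP bubble are simply the k-mers private to each haplotype. The Python sketch below computes them; the haplotype strings extend the slide's examples by a few bases (an assumption, so that both branches can rejoin), and the function names are hypothetical.

```python
def kmer_set(seq, k):
    """Set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def bubble_branches(hap1, hap2, k):
    """Return the k-mers found only in hap1 and only in hap2."""
    a, b = kmer_set(hap1, k), kmer_set(hap2, k)
    return a - b, b - a

if __name__ == "__main__":
    # Illustrative haplotypes differing at a single base (T vs A).
    hap1 = "ATGGAAGTCGCGTCGA"
    hap2 = "ATGGAAGACGCGTCGA"
    only1, only2 = bubble_branches(hap1, hap2, 7)
    print(sorted(only1))
    print(sorted(only2))
```

An isolated SNP with at least k-1 unique bases on each side gives exactly k private k-mers per haplotype, which is why each branch of the bubble is k nodes long. Unlike an error bubble, both branches should have similar coverage.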
Indels ---> Bubbles
hap1: ATGGAAGTCGCGTCGA...
hap2: ATGGAAG-----GCGTCGA...
hap1-read1: ATGGAAGTCG...
hap2-read1: ATGGAAG---- GCGTC...
hap1 path: ATGGAAG TGGAAGT ... CGCGTCG GCGTCGA
hap2 path: TGGAAGG ... GGCGTCG
• Equal-coverage paths.
• Long road map.
Adjacent SNPs with Indels?
hap1: ATGGTAGTCGCAAGCC...
hap2: ATGGAAGACGC---GCG...
ASSIGNMENT 1
Genome: ATTAGGATCATGATCCTCTGTGGATAAGATCTTTTTATTTAAAGATCTCTTTATTAGATCTCTT …
Let Kmer=4: ATTA TTAG TAGG AGGA …
Let Kmer=5
Let Kmer=7
DBG of Genome; DBG of Reads; L = 15
TRY!
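A possible starting point for the exercise, as a hedged Python sketch: it uses only the part of the genome visible on the slide (the sequence continues past the "…"), and it assumes error-free reads of length L = 15 taken at every position. `GENOME_PREFIX`, `all_reads` and `read_kmers` are names invented here.

```python
# Only the visible part of the slide's genome; the real sequence is longer.
GENOME_PREFIX = "ATTAGGATCATGATCCTCTGTGGATAAGATCTTTTTATTTAAAGATCTCTTTATTAGATCTCTT"

def kmers(seq, k):
    """Return the k-mers of seq in left-to-right order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def all_reads(genome, read_len):
    """Idealised sequencing: one error-free read starting at every position."""
    return [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]

def read_kmers(reads, k):
    """Set of k-mers seen in any read."""
    return {kmer for read in reads for kmer in kmers(read, k)}

if __name__ == "__main__":
    reads = all_reads(GENOME_PREFIX, 15)
    for k in (4, 5, 7):
        genome_nodes = set(kmers(GENOME_PREFIX, k))
        print(k, len(genome_nodes), genome_nodes == read_kmers(reads, k))
```

With perfect, fully tiled reads the graph of the reads has the same nodes as the graph of the genome; the interesting part of the exercise is how the branching changes between k = 4, 5 and 7.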
ASSIGNMENT 2
Simulated Hap1, about 1M:
• Count k-mer frequency;
• Then sequence at 40X;
• Then add repeats / error rate;
• Then SNPs (two haplotypes);
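One way to begin the simulation, sketched in Python under loud assumptions: a small uniform random genome stands in for the 1M haplotype, reads are 100 bases long, errors are independent substitutions, and reads come from one strand only. None of these settings are given in the slides, and all function and parameter names are hypothetical.

```python
import random
from collections import Counter

def random_genome(length, rng):
    """Uniform random sequence (no repeats or biological structure)."""
    return "".join(rng.choice("ACGT") for _ in range(length))

def simulate_reads(genome, read_len, coverage, error_rate, rng):
    """Sample reads at random positions until the target coverage is reached."""
    n_reads = coverage * len(genome) // read_len
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        bases = list(genome[start:start + read_len])
        for i, base in enumerate(bases):
            if rng.random() < error_rate:
                # Substitution error: replace with a different base.
                bases[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append("".join(bases))
    return reads

def kmer_spectrum(reads, k):
    """Histogram: how many distinct k-mers occur once, twice, and so on."""
    counts = Counter(read[i:i + k] for read in reads
                     for i in range(len(read) - k + 1))
    return Counter(counts.values())

if __name__ == "__main__":
    rng = random.Random(0)
    genome = random_genome(2000, rng)  # scaled down from 1M for a quick run
    reads = simulate_reads(genome, read_len=100, coverage=40, error_rate=0.01, rng=rng)
    spectrum = kmer_spectrum(reads, 17)
    for freq in sorted(spectrum)[:5]:
        print(freq, spectrum[freq])
```

Plotting the spectrum usually shows a spike of k-mers seen once or twice, caused by errors, and a broader peak near the k-mer coverage; SNPs between two haplotypes would add a second peak at about half that depth.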