Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
DBG-Assemble A Genome
Search
Sponsored
·
Your Podcast. Everywhere. Effortlessly.
Share. Educate. Inspire. Entertain. You do you. We'll handle the rest.
→
Buttonwood
August 28, 2013
Education
32
0
Share
DBG-Assemble A Genome
Assemble a genome
Buttonwood
August 28, 2013
Other Decks in Education
See All in Education
吉祥寺.pmは1つじゃない — 複数イベント並走運営の12年 —
magnolia
0
1.3k
From Days to Minutes: How We Taught an AI to Onboard 50+ Tenants on our AI Features
mfcabrera
0
160
Gesture-based Interaction - Lecture 6 - Next Generation User Interfaces (4018166FNR)
signer
PRO
1
2.3k
2026年度春学期 統計学 講義の進め方と成績評価について (2026. 4. 9)
akiraasano
PRO
0
180
SL AMIGOS 教育格差と私たちの取り組み - スリランカの支援学校への支援プロジェクト:リシンドゥ リオ 氏 (別府溝部学園短期大学 ビジネス観光コース 留学生):2720 Japan O.K. ロータリーEクラブ2026年4月6日卓話
2720japanoke
0
610
アラムコSTEAMチャレンジ 実践報告書
codeforeveryone
0
120
Data Processing and Visualisation Frameworks - Lecture 6 - Information Visualisation (4019538FNR)
signer
PRO
1
3.1k
2026年度春学期 統計学 第3回 クロス集計と感度・特異度,データの可視化 (2026. 4. 23)
akiraasano
PRO
0
130
教育現場から見た Ruby on Rails
yasslab
PRO
0
170
Catecismo 26 #1 - Aula inaugural
cm_manaus
0
160
Populism, Post-Liberalism & Climate Change
vyadav
0
180
Protecting Patrons with Digital Vendors
dsalo
0
180
Featured
See All Featured
<Decoding/> the Language of Devs - We Love SEO 2024
nikkihalliwell
1
230
Build your cross-platform service in a week with App Engine
jlugia
234
18k
brightonSEO & MeasureFest 2025 - Christian Goodrich - Winning strategies for Black Friday CRO & PPC
cargoodrich
3
720
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
12
1.7k
AI Search: Where Are We & What Can We Do About It?
aleyda
0
7.5k
How to Build an AI Search Optimization Roadmap - Criteria and Steps to Take #SEOIRL
aleyda
1
2.1k
Navigating Algorithm Shifts & AI Overviews - #SMXNext
aleyda
1
1.3k
A Guide to Academic Writing Using Generative AI - A Workshop
ks91
PRO
1
320
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
31
2.8k
Lightning talk: Run Django tests with GitHub Actions
sabderemane
0
190
BBQ
matthewcrist
89
10k
Digital Ethics as a Driver of Design Innovation
axbom
PRO
1
300
Transcript
De Bruijn Graph
[email protected]
http://buttonwood.github.io
Origin of de Bruijn graphs Graph Theory: Hamilton path VS
Euler path In 1946, the Dutch mathematician Nicolaas de Bruijn The ‘superstring problem’: find a shortest circular ‘superstring’ that contains all possible ‘substrings’ of length k (k-mers) over a given alphabet.{0,1} -> {A,T,C,G}
De Bruijn Graph of a Small Sequence
Let’s go!From simple examples...
Double-Stranded Nature of Genome ATGGAAGTCGCTTCCAT TACCTTCAGCCAAGGTA 5’ 5’ 3’ 3’
Impact of Changing k-mer Size Big? Small? Avoid even k?
avoid even k, because with even k, many k-mers become reverse comcomplements of their own sequences.
DBG OF A GENOME
Sequencing errors ---> Tips A B C D • Clip
the short tips that had lengths < 2 Kmers • Or less number of reads through it.
Sequencing errors ---> CrossLink ATGGAAGTCGCG ATGGAAG TGGAAGT GGAAGTC GAAGTCG AAGTCGC
AGTCGCG GAGGAAGACCTT GAGGAAG AGGAAGA GGAAGAC GAAGACC AAGACCT AGACCTT GAGGAAGTCC AGGAAGT ATGGAAGTCG seq1 seq1-read1 seq2-read1 seq2 • Remove low-coverage nodes. Low-coverage connection
Sequencing errors --->Bubbles ATGGAAGTCGCG... ATGGAAG TGGAAGT GGAAGTC GAAGTCG AAGTCGC AGTCGCG
TGGAAGA GGAAGAC GAAGACG AAGACGC AGACGCG ATGGAAGACG... ATGGAAGTCG... seq1 seq1-read1 seq1-read2 • Remove low-coverage paths. • Same as SNP Bubbles. Low-coverage paths Bubbles
Repetitive Regions Simple Repeats Tandem Repeats
TINY OR LONG REPEAT ATTTAAATTAGCGATATTAGCATCTCTT .... AATTA ATTAG TAGCG AGCGA
GCGAT CGATA GATAT ATATT TATTA TAGCA AGCAT GCATC CATCT ATCTC TCTCT ... TTAGC c a d b e .... AATTAGC ATTAGCG TAGCGAT AGCGATA GCGATAT CGATATT GATATTA ATATTAG TATTAGC TTAGCGA ATTAGCA TTAGCAT TAGCATC .... You see what? Bigger k-mer(long overlap) cross the repeat.
Haplotype Differences
SNPs--->Bubbles ATGGAAGTCGCG... ATGGAAG TGGAAGT GGAAGTC GAAGTCG AAGTCGC AGTCGCG TGGAAGA GGAAGAC
GAAGACG AAGACGC AGACGCG ATGGAAGACG... ATGGAAGTCG... hap1 hap1-read1 hap2-read1 • Equal coverage paths. Equal-coverage paths Bubbles ATGGAAGACGCG... hap2 ? Adjacent SNPs ATGGTAGTCGCG... ATGGAAGACGCG... hap1 hap2
Indels--->Bubbles ATGGAAGTCGCGTCGA... ATGGAAG TGGAAGT ... CGCGTCG GCGTCGA TGGAAGG ... GGCCTCG
ATGGAAG---- GCGTC... ATGGAAGTCG... hap1 hap1-read1 hap2-read1 • Equal coverage paths. • Long road map. Equal-coverage paths Bubbles ATGGAAG-----GCGTCGA... hap2 ? Adjacent SNPs with Indels ATGGTAGTCGCAAGCC... ... ATGGAAGACGC---GCG... hap1 hap2
ASSIGNMENT 1 Let Kmer=4 Let Kmer=5 Let Kmer=7 ATTA TTAG
TAGG AGGA ATTAGGATCATGATCCTCTGTGGATAAGATCTTTTTATTTAAAGATCTCTTTATTAGATCTCTT … ATTA DBG of Genome; DBG of Reads; L = 15 TRY!
ASSIGNMENT 2 Simulated Hap1 about 1M: Count K-mer freq; Then
sequence 40X; Then add repeat/Error rate; Then SNPs(two haplotype);