Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
DBG-Assemble A Genome
Search
Sponsored
·
Your Podcast. Everywhere. Effortlessly.
Share. Educate. Inspire. Entertain. You do you. We'll handle the rest.
→
Buttonwood
August 28, 2013
Education
0
26
DBG-Assemble A Genome
Assemble a genome
Buttonwood
August 28, 2013
Tweet
Share
Other Decks in Education
See All in Education
Introduction - Lecture 1 - Next Generation User Interfaces (4018166FNR)
signer
PRO
2
4.5k
Adobe Express
matleenalaakso
2
8.2k
地区危機管理委員会 出前セミナー「ロータリーにおける危機管理」:膳所 和彦 氏(国際ロータリー第2720地区 パストガバナー・日田ロータリークラブ・医療法人恒心会ぜぜ医院 理事長):2720 Japan O.K. ロータリーEクラブ2026年2月16日卓話
2720japanoke
1
590
Lenguajes de Programacion (Ingresantes UNI 2026)
robintux
0
120
インシデント対応
akira345
0
270
ブランチ操作 / 02-a-branch
kaityo256
PRO
0
190
Chapitre_2_-_Partie_3.pdf
bernhardsvt
0
200
OSINT入門-CTF for GIRLS_SECCON14電脳会議
nomizone
1
1.1k
次期バージョン 14.5.1 Early Access Program が始まりました
harunakano
1
120
栃木県警サイバーセキュリティ研修会2026
nomizone
0
340
タイムマシンのつくりかた
nomizone
3
1.1k
RGBでも蛍光を!? / RayTracingCamp11
kugimasa
2
460
Featured
See All Featured
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
254
22k
Dominate Local Search Results - an insider guide to GBP, reviews, and Local SEO
greggifford
PRO
0
100
Ten Tips & Tricks for a 🌱 transition
stuffmc
0
85
We Have a Design System, Now What?
morganepeng
55
8k
svc-hook: hooking system calls on ARM64 by binary rewriting
retrage
2
170
How to Get Subject Matter Experts Bought In and Actively Contributing to SEO & PR Initiatives.
livdayseo
0
83
Sharpening the Axe: The Primacy of Toolmaking
bcantrill
46
2.7k
DBのスキルで生き残る技術 - AI時代におけるテーブル設計の勘所
soudai
PRO
62
51k
The Invisible Side of Design
smashingmag
302
51k
Lightning talk: Run Django tests with GitHub Actions
sabderemane
0
140
SERP Conf. Vienna - Web Accessibility: Optimizing for Inclusivity and SEO
sarafernandez
1
1.3k
Impact Scores and Hybrid Strategies: The future of link building
tamaranovitovic
0
230
Transcript
De Bruijn Graph
[email protected]
http://buttonwood.github.io
Origin of de Bruijn graphs Graph Theory: Hamilton path VS
Euler path In 1946, the Dutch mathematician Nicolaas de Bruijn The ‘superstring problem’: find a shortest circular ‘superstring’ that contains all possible ‘substrings’ of length k (k-mers) over a given alphabet.{0,1} -> {A,T,C,G}
De Bruijn Graph of a Small Sequence
Let’s go!From simple examples...
Double-Stranded Nature of Genome ATGGAAGTCGCTTCCAT TACCTTCAGCCAAGGTA 5’ 5’ 3’ 3’
Impact of Changing k-mer Size Big? Small? Avoid even k?
avoid even k, because with even k, many k-mers become reverse comcomplements of their own sequences.
DBG OF A GENOME
Sequencing errors ---> Tips A B C D • Clip
the short tips that had lengths < 2 Kmers • Or less number of reads through it.
Sequencing errors ---> CrossLink ATGGAAGTCGCG ATGGAAG TGGAAGT GGAAGTC GAAGTCG AAGTCGC
AGTCGCG GAGGAAGACCTT GAGGAAG AGGAAGA GGAAGAC GAAGACC AAGACCT AGACCTT GAGGAAGTCC AGGAAGT ATGGAAGTCG seq1 seq1-read1 seq2-read1 seq2 • Remove low-coverage nodes. Low-coverage connection
Sequencing errors --->Bubbles ATGGAAGTCGCG... ATGGAAG TGGAAGT GGAAGTC GAAGTCG AAGTCGC AGTCGCG
TGGAAGA GGAAGAC GAAGACG AAGACGC AGACGCG ATGGAAGACG... ATGGAAGTCG... seq1 seq1-read1 seq1-read2 • Remove low-coverage paths. • Same as SNP Bubbles. Low-coverage paths Bubbles
Repetitive Regions Simple Repeats Tandem Repeats
TINY OR LONG REPEAT ATTTAAATTAGCGATATTAGCATCTCTT .... AATTA ATTAG TAGCG AGCGA
GCGAT CGATA GATAT ATATT TATTA TAGCA AGCAT GCATC CATCT ATCTC TCTCT ... TTAGC c a d b e .... AATTAGC ATTAGCG TAGCGAT AGCGATA GCGATAT CGATATT GATATTA ATATTAG TATTAGC TTAGCGA ATTAGCA TTAGCAT TAGCATC .... You see what? Bigger k-mer(long overlap) cross the repeat.
Haplotype Differences
SNPs--->Bubbles ATGGAAGTCGCG... ATGGAAG TGGAAGT GGAAGTC GAAGTCG AAGTCGC AGTCGCG TGGAAGA GGAAGAC
GAAGACG AAGACGC AGACGCG ATGGAAGACG... ATGGAAGTCG... hap1 hap1-read1 hap2-read1 • Equal coverage paths. Equal-coverage paths Bubbles ATGGAAGACGCG... hap2 ? Adjacent SNPs ATGGTAGTCGCG... ATGGAAGACGCG... hap1 hap2
Indels--->Bubbles ATGGAAGTCGCGTCGA... ATGGAAG TGGAAGT ... CGCGTCG GCGTCGA TGGAAGG ... GGCCTCG
ATGGAAG---- GCGTC... ATGGAAGTCG... hap1 hap1-read1 hap2-read1 • Equal coverage paths. • Long road map. Equal-coverage paths Bubbles ATGGAAG-----GCGTCGA... hap2 ? Adjacent SNPs with Indels ATGGTAGTCGCAAGCC... ... ATGGAAGACGC---GCG... hap1 hap2
ASSIGNMENT 1 Let Kmer=4 Let Kmer=5 Let Kmer=7 ATTA TTAG
TAGG AGGA ATTAGGATCATGATCCTCTGTGGATAAGATCTTTTTATTTAAAGATCTCTTTATTAGATCTCTT … ATTA DBG of Genome; DBG of Reads; L = 15 TRY!
ASSIGNMENT 2 Simulated Hap1 about 1M: Count K-mer freq; Then
sequence 40X; Then add repeat/Error rate; Then SNPs(two haplotype);