HiTSeq '16 slides
verve
July 08, 2016
Slides from my HiTSeq '16 talk on Rail-RNA (http://rail.bio).
Transcript
Scalable analysis of RNA-seq splicing and coverage
@AbhiNellore at HiTSeq '16
Langmead & Leek Labs, Johns Hopkins University
http://rail.bio
Alignment
Sometimes, a read correctly aligns to the reference genome end to end.
chr1: …ATACATCAGACTAGACCGTACCACTCATAGACCTAGACCAGATACAG…
read: CAGACTAGACCGTACCACTCATAGACCTAGACCAGATAC
Spliced alignment
Other times, exon-exon junctions are overlapped. Rail-RNA divides the read into readlets…
read:
ATACATCAGACTAGACCGTACCACACAGCATGACAGTCATTCGACGTACT
readlets:
ATACATCAGACTAGACCGTACCACA
ATCAGACTAGACCGTACCACACAGC
GACTAGACCGTACCACACAGCATGA
AGACCGTACCACACAGCATGACAGT
CGTACCACACAGCATGACAGTCATT
CCACACAGCATGACAGTCATTCGAC
ACAGCATGACAGTCATTCGACGTAC
CAGCATGACAGTCATTCGACGTACT
ATACATCAGACTAGA
ATACATCAGACTAGAC
ATACATCAGACTAGACCG
ATACATCAGACTAGACCGT
ATACATCAGACTAGACCGTAC
ATACATCAGACTAGACCGTACAGC
AGCATGACAGTCATTCGACGTACT
ATGACAGTCATTCGACGTACT
GACAGTCATTCGACGTACT
ACAGTCATTCGACGTACT
AGTCATTCGACGTACT
GTCATTCGACGTACT
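The full-length readlets on this slide look like fixed-size windows slid along the read. Below is a minimal sketch of that segmentation. It assumes a window of 25 bases and a step of 4, both read off the example above rather than confirmed as Rail-RNA defaults. The helper name `sliding_readlets` is hypothetical, and the shorter readlets anchored at each end of the read are left out.

```python
def sliding_readlets(read, length=25, stride=4):
    """Return (offset, readlet) pairs: fixed-length windows over the read.

    length and stride are assumptions taken from the slide's example.
    """
    if len(read) <= length:
        return [(0, read)]
    last = len(read) - length
    offsets = list(range(0, last + 1, stride))
    if offsets[-1] != last:
        offsets.append(last)  # make sure the final window reaches the read's end
    return [(i, read[i:i + length]) for i in offsets]


read = "ATACATCAGACTAGACCGTACCACACAGCATGACAGTCATTCGACGTACT"
readlets = sliding_readlets(read)
for offset, readlet in readlets:
    # Indent each readlet by its offset so it lines up under the read.
    print(" " * offset + readlet)
```

With these assumed parameters the sketch reproduces the eight 25-base readlets listed on the slide.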
Spliced alignment
…and align readlets to the genome to infer introns. Realignment may be necessary.
chr1 (contains the intron):
…ATACATCAGACTAGACCGTACCACAGTAGTTCATGACCCTCAGCAGCATGACAGTCATTCGACGTACTCGTATCGATACAGTACAGTAGCC…
read 1:
ATACATCAGACTAGACCGTACCACACAGCATGACAGTCATTCGACGTACT
read 2 (needs realignment to find junction):
CACAGCATGACAGTCATTCGACGTACTCGTATCGATACAGTACAGTAGCC
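To illustrate how readlet positions can point to an intron, here is a rough sketch using read 1 and the chr1 fragment from this slide. Exact substring search (`str.find`) stands in for a real aligner such as Bowtie, so this is only an illustration of the idea, not Rail-RNA's method. The 25-base readlet length is an assumption.

```python
READLET_LEN = 25  # assumed readlet length

genome = ("ATACATCAGACTAGACCGTACCACAGTAGTTCATGACCCTCAGCAGCATGACAGTCATTCG"
          "ACGTACTCGTATCGATACAGTACAGTAGCC")
read = "ATACATCAGACTAGACCGTACCACACAGCATGACAGTCATTCGACGTACT"

first = read[:READLET_LEN]    # leftmost readlet
last = read[-READLET_LEN:]    # rightmost readlet

# Exact matching is a stand-in for aligning readlets with a real aligner.
g_first = genome.find(first)
g_last = genome.find(last)

# In an unspliced read the two readlets would sit this far apart on the genome.
expected_gap = len(read) - READLET_LEN
# Any extra distance on the genome suggests an intron between them.
intron_length = (g_last - g_first) - expected_gap

# The two readlets abut in this 50-base read, so the inferred intron
# starts right after the first readlet's alignment.
intron_start = g_first + READLET_LEN
intron = genome[intron_start:intron_start + intron_length]
print(intron_length, intron)
```

In this example the inferred intron begins with GT and ends with AG. In general the two readlets would not abut, and the exact boundaries would need further refinement.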
Why Rail-RNA
• Works on many samples, many cores
• Easy to deploy in different computing environments
• Borrows strength across samples
• Writes many compact, queryable outputs
Many samples, many cores
Scaling
Use MapReduce. Example:
• Divide computer cluster into workers controlled by a master
• Divide problem up into sequence of aggregation and computation steps
Rail-RNA workflow (diagram; boxes listed in extraction order)
Computation steps:
• Filter junctions
• Detect junctions
• Preprocess reads
• Align reads with Bowtie 2 / segment into readlets
• Align readlets with Bowtie 1
• Finalize junction combos with Bowtie 2
• Enumerate intron configurations
• Retrieve and index isofrags
• Realign reads with Bowtie 2
• Collect & compare alignments
• Write BAMs
• Compile coverage vectors / write bigWigs
• Write junctions & indels
Aggregation and distribution steps:
• Distribute Bowtie 2 index of isofrags across cluster
• Aggregate reads by nucleotide sequence
• Aggregate readlets by nucleotide sequence
• Aggregate readlets by read sequence
Legend: data flow, redundancy reduction, intermediate step, output step
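One of the aggregation steps in the workflow above groups reads by nucleotide sequence. A toy, single-process sketch of that map/aggregate pattern follows. The function names and the tiny reads are hypothetical. The idea that each distinct sequence then needs aligning only once is my reading of the "redundancy reduction" label, not something the slide spells out.

```python
from collections import defaultdict


def map_reads(sample, reads):
    """Map step: emit (key, value) pairs keyed by nucleotide sequence."""
    for seq in reads:
        yield seq, sample


def aggregate_by_sequence(pairs):
    """Aggregation step: collect all samples that share the same sequence."""
    groups = defaultdict(list)
    for seq, sample in pairs:
        groups[seq].append(sample)
    return dict(groups)


# Hypothetical toy input: two samples with some identical reads.
samples = {
    "sample1": ["ACGT", "TTGA", "ACGT"],
    "sample2": ["ACGT", "GGCC"],
}

pairs = [kv for name, reads in samples.items() for kv in map_reads(name, reads)]
unique = aggregate_by_sequence(pairs)
print(len(pairs), "reads ->", len(unique), "distinct sequences")
```

On a real cluster the map and aggregation steps would run on separate workers, with the framework routing equal keys to the same worker.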
Easy to deploy
http://rail.bio
Same outputs, different environments, reproducible

Cloud w/ AWS EMR:
    rail-rna go elastic --manifest URLsOf500Samples.txt --assembly hg38 --output s3://your-bucket/output_folder --core-instance-count 20 --core-instance-type c3.2xlarge

Local cluster w/ SGE:
    rail-rna go parallel --manifest URLsOf500Samples.txt -x /path/to/hg38_bowtie_basename --output /path/to/output_folder
Ran Rail-RNA on 49,849 RNA-seq runs from the Sequence Read Archive (over 150 terabases of reads)
• Rapid: 2 weeks to results
• Repeatable: http://github.com/nellore/runs for commands
• Inexpensive: ~$1.40/sample
Borrows strength across samples
Borrowing strength
Realignment after collecting and filtering a list of junctions across samples.
chr1 (contains the intron):
…ATACATCAGACTAGACCGTACCACAGTAGTTCATGACCCTCAGCAGCATGACAGTCATTCGACGTACTCGTATCGATACAGTACAGTAGCC…
read 1:
ATACATCAGACTAGACCGTACCACACAGCATGACAGTCATTCGACGTACT
read 2 (found to overlap junction on realignment):
CATAGCATGACAGTCATTCGACGTACTCGTATCGATACAGTACAGTAGCC
Samples shown: sample 1, sample 2
81,066,376 junctions across 49,849 SRA samples vs. 540,746 annotated junctions
Why discrepancy?
• On a single sample, every aligner finds some good junctions and some duds.
• But there is much more overlap between goods than between duds across many samples.
• So as you add samples…
(Figures: junctions split into goods vs. duds)
Junction filter
Keep a junction if and only if it's initially detected in:
(1) 5% of samples OR
(2) at least 5 reads in any one sample
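The filter rule above can be sketched as follows. This assumes "5% of samples" means at least 5% of all samples, and it uses a hypothetical input layout: a dict mapping each junction to its per-sample read counts. The function name, parameter names, and example junctions are illustrative only.

```python
def filter_junctions(read_counts, num_samples, min_fraction=0.05, min_reads=5):
    """Keep junctions seen in enough samples OR with enough reads in one sample.

    read_counts: {junction: {sample: read_count}} (hypothetical layout).
    """
    kept = set()
    for junction, per_sample in read_counts.items():
        samples_detected = sum(1 for n in per_sample.values() if n > 0)
        widespread = samples_detected >= min_fraction * num_samples  # criterion (1)
        deep = max(per_sample.values(), default=0) >= min_reads      # criterion (2)
        if widespread or deep:
            kept.add(junction)
    return kept


num_samples = 100
read_counts = {
    ("chr1", 1000, 2000): {f"s{i}": 1 for i in range(6)},  # 6% of samples, 1 read each
    ("chr1", 3000, 4000): {"s0": 7},                       # 1 sample, 7 reads
    ("chr1", 5000, 6000): {"s0": 2, "s1": 2},              # 2% of samples, 2 reads each
}
kept = filter_junctions(read_counts, num_samples)
print(sorted(kept))
```

Here the first junction passes on criterion (1), the second on criterion (2), and the third is dropped.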
Rail-RNA: accuracy
Exon-exon junction accuracy metrics (mean ± stdev) across 20 GEUVADIS-based simulations

                      Precision      Recall         F-score
Rail single           .984 ± .000    .880 ± .004    .929 ± .002
Rail all no filter    .846 ± .002    .957 ± .001    .898 ± .001
Rail all filter       .976 ± .000    .939 ± .003    .957 ± .002
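As a quick consistency check on the numbers above, the sketch below recomputes F-scores from the mean precision and recall. It assumes the F-score is the usual balanced F1, the harmonic mean of precision and recall. The slide reports means over simulations, so an F1 computed from mean precision and recall need not equal the mean F-score exactly.

```python
def f_score(precision, recall):
    """Balanced F1: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Mean precision and recall as reported on the slide.
rows = {
    "Rail single": (0.984, 0.880),
    "Rail all no filter": (0.846, 0.957),
    "Rail all filter": (0.976, 0.939),
}

for name, (p, r) in rows.items():
    print(f"{name}: F = {f_score(p, r):.3f}")
```

To three decimals these come out at .929, .898 and .957, in line with the reported means.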
Writes compact outputs
Compact outputs
• junction × sample table
  • 17 GB compressed for 50k SRA samples
  • v1 spans 21.5k samples: available at http://intropolis.rail.bio
  • v2 w/ 50k coming
• coverage bigWigs
  • 10x smaller than BAM
Annotation-agnostic pipeline
• Rail-RNA: http://rail.bio
• derfinder (Leo Collado-Torres, Alyssa Frazee): biocLite("derfinder")
• sidesteps assembly & annotation limitations
• resolves isoform-level features
http://docs.rail.bio
https://github.com/nellore/rail tested!
Rail-RNA: Scalable analysis of RNA-seq splicing and coverage
http://rail.bio
Ben Langmead
Jeff Leek
Leo Collado-Torres
Andrew Jaffe
José Alquicira Hernández
Summer intern:
Jamie Morton
Chris Wilks
Jacob Pritt