Slide 3
Bioinformatics and Biostatistics Core, NTU Center of Genomic Medicine
• Reads are not collapsed, for disk usage and BLAST function
• Results are summarized by my own script
• Quality control will be added in the near future
• NGS QC Toolkit (senior labmate 建樂) may be used
• Automation & parallelization
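The automation and parallelization point could be sketched roughly as below, assuming each dataset can be processed independently of the others. `process_dataset` and `run_all` are hypothetical names, and the worker is only a placeholder for the per-dataset steps in the flowchart.

```python
from concurrent.futures import ThreadPoolExecutor


def process_dataset(accession):
    """Hypothetical placeholder for the per-dataset steps (dump, clip, convert, BLAST)."""
    # A real worker would call the external tools here, e.g. via subprocess.run.
    return f"{accession}: done"


def run_all(accessions, worker=process_dataset, max_workers=4):
    """Apply `worker` to every accession in parallel, returning results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, accessions))


if __name__ == "__main__":
    for line in run_all(["SRRxxx", "SRRyyy"]):
        print(line)
```

Threads are probably sufficient here because the heavy work would run in external processes; `max_workers=4` is an arbitrary assumption.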
NCBI SRA-Toolkit
dump to FASTA/FASTQ format
Original datasets
on GEO
SRRxxx.sra, SRRyyy.sra, ...
FASTQ format with quality scores
and sequencing details
SRRxxx.fastq, SRRyyy.fastq, ...
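As a minimal sketch of the SRA-to-FASTQ step, the helper below only builds the `fastq-dump` argument list for one file. The function names and the `fastq` output directory are assumptions made for illustration; the slide does not show the actual command used.

```python
from pathlib import Path


def fastq_dump_cmd(sra_file, outdir="fastq"):
    """Build the argument list for converting one .sra file to FASTQ."""
    # --outdir is a fastq-dump option; the directory name itself is an assumption.
    return ["fastq-dump", "--outdir", str(outdir), str(sra_file)]


def expected_fastq(sra_file, outdir="fastq"):
    """Path where the output is expected for single-end data (SRRxxx.sra -> SRRxxx.fastq)."""
    return Path(outdir) / (Path(sra_file).stem + ".fastq")


if __name__ == "__main__":
    print(" ".join(fastq_dump_cmd("SRRxxx.sra")))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where the SRA Toolkit is installed.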
FastX Toolkit
clip off 3' adapter
Only clipped sequences left
SRRxxx_clipped.fastq,
SRRyyy_clipped.fastq,
...
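To illustrate what "only clipped sequences left" means, here is a simplified pure-Python sketch of 3' adapter clipping. It is not how FastX Toolkit works internally: it assumes an exact adapter match, and `clip_adapter` and the adapter string used in the example are hypothetical.

```python
def clip_adapter(seq, qual, adapter):
    """Return (seq, qual) trimmed at the first exact adapter match, or None if discarded."""
    pos = seq.find(adapter)
    if pos <= 0:
        # Not found (-1), or the read starts with the adapter and nothing would remain (0).
        return None
    return seq[:pos], qual[:pos]


if __name__ == "__main__":
    # Example adapter chosen for illustration only.
    print(clip_adapter("ACGTACGTTGGAATTC", "I" * 16, "TGGAATTC"))
```

Reads without the adapter are dropped, which mirrors keeping only clipped sequences (the `-c` option of `fastx_clipper`).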
Quality Control
discard low-score reads
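Since this step is still planned, the following is only one possible sketch of a low-score filter. It assumes Phred+33 encoded qualities and keeps a read when a minimum percentage of its bases reach a minimum score; the function name and both thresholds are assumptions, not values from the slide.

```python
def passes_quality(qual, min_score=20, min_percent=80, offset=33):
    """True if at least `min_percent` of bases have a Phred score >= `min_score`."""
    if not qual:
        return False
    good = sum(1 for ch in qual if ord(ch) - offset >= min_score)
    return 100 * good >= min_percent * len(qual)


if __name__ == "__main__":
    print(passes_quality("IIII"))  # every base has Phred score 40
    print(passes_quality("!!!!"))  # every base has Phred score 0
```

This follows the same idea as the `-q` and `-p` options of `fastq_quality_filter` in FastX Toolkit, though a dedicated QC tool may apply different criteria.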
Fasta Converter
file format conversion
Simpler file format: fasta
SRRxxx_clipped.fasta,
SRRyyy_clipped.fasta,
...
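The format conversion itself is simple enough to sketch in pure Python. The version below assumes standard 4-line FASTQ records (header, sequence, `+`, quality) and drops the quality line; `fastq_to_fasta` is a hypothetical name, not the converter used in the pipeline.

```python
def fastq_to_fasta(fastq_lines):
    """Yield FASTA lines from an iterable of FASTQ lines (4 lines per record)."""
    it = iter(fastq_lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"Malformed FASTQ header: {header!r}")
        seq = next(it, "").rstrip("\n")
        next(it, None)  # '+' separator line, ignored
        next(it, None)  # quality line, dropped in FASTA
        yield ">" + header[1:]
        yield seq


if __name__ == "__main__":
    record = ["@read1\n", "ACGT\n", "+\n", "IIII\n"]
    print("\n".join(fastq_to_fasta(record)))
```

Because it works on any iterable of lines, an open file handle can be passed directly, so large files need not be loaded into memory.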
BLAST+
make BLAST database
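Building one nucleotide database per clipped FASTA file might look like the sketch below. `makeblastdb_cmd` is a hypothetical helper and the database names are assumptions; only the `makeblastdb` options `-in`, `-dbtype` and `-out` are taken from BLAST+ itself.

```python
def makeblastdb_cmd(fasta_file, db_name):
    """Build the argument list that creates a nucleotide BLAST database from a FASTA file."""
    return ["makeblastdb", "-in", str(fasta_file), "-dbtype", "nucl", "-out", str(db_name)]


if __name__ == "__main__":
    print(" ".join(makeblastdb_cmd("SRRxxx_clipped.fasta", "SRRxxx")))
```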
Original datasets
on GEO
SRRxxx.sra, SRRyyy.sra, ...
BLAST+
blastn query for
every candidate
on every dataset
novel miRNA
candidates
candidate01.fa,
candidate02.fa,
...
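Querying every candidate against every dataset is a Cartesian product, which can be sketched as follows. `blastn_jobs` and the `blast_out` directory are hypothetical; the output naming (`candidate01_xxx.csv`) follows the file names on the slide, and `-outfmt 10` asks `blastn` for comma-separated output.

```python
from itertools import product
from pathlib import Path


def blastn_jobs(candidates, databases, outdir="blast_out"):
    """Yield (cmd, out_csv) for every candidate/dataset pair."""
    for query, db in product(candidates, databases):
        out_csv = Path(outdir) / f"{Path(query).stem}_{Path(db).name}.csv"
        cmd = ["blastn", "-query", str(query), "-db", str(db),
               "-outfmt", "10", "-out", str(out_csv)]
        yield cmd, out_csv


if __name__ == "__main__":
    for cmd, _ in blastn_jobs(["candidate01.fa", "candidate02.fa"], ["xxx", "yyy"]):
        print(" ".join(cmd))
```

For short queries such as miRNA candidates, adding `-task blastn-short` may be appropriate; whether the original pipeline does so is not stated.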
Handmade Script
summarize all queries
Detailed BLAST results
for every query
candidate01_xxx.csv, candidate02_xxx.csv, …
candidate01_yyy.csv, candidate02_yyy.csv, …
…, …, …
Summarized read count
for all candidates
candidate01-10_xxx-zzz.csv
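The summarizing script is not shown, so the following is only a guess at its core logic. It assumes the per-query files use the default `blastn -outfmt 10` column order (subject ID in the second column) and that "read count" means the number of distinct reads hit; all function names are hypothetical.

```python
import csv
import io
from collections import defaultdict


def count_reads(csv_text):
    """Count distinct subject reads (column 2, sseqid) in one blastn -outfmt 10 result."""
    reads = {row[1] for row in csv.reader(io.StringIO(csv_text)) if len(row) > 1}
    return len(reads)


def summarize(results):
    """Build {candidate: {dataset: read_count}} from {(candidate, dataset): csv_text}."""
    table = defaultdict(dict)
    for (candidate, dataset), text in results.items():
        table[candidate][dataset] = count_reads(text)
    return dict(table)


def to_csv(table, datasets):
    """Render the summary table as CSV text, one row per candidate."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["candidate", *datasets])
    for candidate in sorted(table):
        writer.writerow([candidate, *(table[candidate].get(d, 0) for d in datasets)])
    return buf.getvalue()
```

Counting distinct subject IDs rather than rows avoids counting a read twice when it produces more than one alignment. The resulting CSV text can be written to a file and opened in Excel.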
Excel
table output
Script
Automation