• No read collapsing (relevant to disk usage and BLAST function)
• Results are summarized with my own script
• Quality control will be added in the near future
  • NGS QC Toolkit (senior labmate 建樂) may be used
• Automation & Parallelization

Pipeline
• NCBI SRA-Toolkit: dump to fasta/q format
  • Original datasets on GEO: SRRxxx.sra, SRRyyy.sra, ...
  • Fastq format with QC and sequencing details: SRRxxx.fastq, SRRyyy.fastq, ...
• FastX Toolkit: clip off 3' adapter
  • Only clipped sequences left: SRRxxx_clipped.fastq, SRRyyy_clipped.fastq, ...
• Quality Control: discard low-score reads
• Fasta Converter: file format conversion
  • Simpler file format (fasta): SRRxxx_clipped.fasta, SRRyyy_clipped.fasta, ...
• BLAST+: make BLAST database
  • Original datasets on GEO: SRRxxx.sra, SRRyyy.sra, ...
• BLAST+ blastn: query every candidate against every dataset
  • Novel miRNA candidates: candidate01.fa, candidate02.fa, ...
• Handmade script: summarize all queries
  • BLAST detail results for every query:
    candidate01_xxx.csv, candidate02_xxx.csv, …
    candidate01_yyy.csv, candidate02_yyy.csv, …
    …, …, …
  • Summarized read count for all candidates: candidate01-10_xxx-zzz.csv
  • Excel table output
• Script Automation
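The "handmade script" step that turns the per-query BLAST results into one read-count table could look roughly like the sketch below. It is not the script used here, only an illustration under stated assumptions: result files follow the `<candidate>_<dataset>.csv` naming shown above, each file is BLAST tabular CSV output whose second column is the subject (read) ID, and, because reads are not collapsed, each distinct read ID counts as one read. The function names `count_hits`, `summarize`, and `write_summary` are hypothetical.

```python
import csv
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming scheme from the slide: <candidate>_<dataset>.csv
NAME_PATTERN = re.compile(r"^(?P<candidate>[^_]+)_(?P<dataset>.+)\.csv$")


def count_hits(csv_path):
    """Count distinct subject reads (column 2) in one BLAST CSV result file."""
    reads = set()
    with open(csv_path, newline="") as handle:
        for row in csv.reader(handle):
            # Skip blank or comment lines; a read hit more than once counts once.
            if len(row) >= 2 and not row[0].startswith("#"):
                reads.add(row[1])
    return len(reads)


def summarize(result_dir):
    """Build {candidate: {dataset: read_count}} from all result files in a directory."""
    table = defaultdict(dict)
    for path in sorted(Path(result_dir).glob("*.csv")):
        match = NAME_PATTERN.match(path.name)
        if match:
            table[match["candidate"]][match["dataset"]] = count_hits(path)
    return table


def write_summary(table, out_path):
    """Write one row per candidate and one column per dataset (opens in Excel)."""
    datasets = sorted({d for counts in table.values() for d in counts})
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["candidate"] + datasets)
        for candidate in sorted(table):
            counts = table[candidate]
            writer.writerow([candidate] + [counts.get(d, 0) for d in datasets])
```

Typical use would be `write_summary(summarize("blast_results"), "summary.csv")`, where both paths are placeholders. The summary should be written outside the results directory, since a name like `candidate01-10_xxx-zzz.csv` would itself match the assumed pattern on a later run.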