Work Log 3/22

Bioinformatics and Biostatistics Core, NTU Center of Genomic Medicine
Adapter, RNA-Seq, Tuxedo protocol
Work Log 03/22

•  Demultiplexing (Illumina CASAVA 1.8): *.bcl → paired-end (zipped) FASTQ, one R1.fastq / R2.fastq pair per sample
•  Project Mr.T: Sample A, Sample B
•  Project Mrs.A: Sample Y, Sample Z
•  Quality Check (FastQC v0.10.1): HTML (*.html), Figs (*.png), Report (*.txt)
•  QC & Trimming (cutadapt, seqtk, …): (cleaned) R1.fastq / R2.fastq, free from adapters, PCR primers, …, contamination

Illustration of different constructs and the reads produced.
•  I = inserts
•  R = single-end reads
•  R1, R2 = paired-end reads
•  LR = read length
•  LI = insert length
A)  LI ≥ LR
B)  LI < LR
C)  LI ≥ 2LR
D)  LR < LI < 2LR
E)  LI < LR
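The five cases above can be sketched as a small classifier. This is only an illustration: it assumes that cases A and B describe single-end reads and C to E describe paired-end reads, which the slide text does not state explicitly. The function name `classify_construct` and its parameters are hypothetical.

```python
from typing import Optional


def classify_construct(insert_len: int, read_len: int, paired: bool) -> Optional[str]:
    """Return the case letter (A-E) from the list above, or None if no case applies.

    Assumption: A/B are the single-end cases, C/D/E the paired-end cases.
    """
    if not paired:
        # Single-end: A when the insert is at least as long as the read, B otherwise.
        return "A" if insert_len >= read_len else "B"
    if insert_len >= 2 * read_len:
        return "C"
    if read_len < insert_len < 2 * read_len:
        return "D"
    if insert_len < read_len:
        return "E"
    # insert_len == read_len is not listed among the paired-end cases.
    return None


if __name__ == "__main__":
    for li, lr, paired in [(300, 100, False), (60, 100, False),
                           (300, 100, True), (150, 100, True), (60, 100, True)]:
        print(li, lr, paired, classify_construct(li, lr, paired))
```

Cases B and E are the ones where the insert is shorter than the read, which is where adapter trimming matters most.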

File organization by Illumina demultiplexing
Grouped under YYMMDD_<machine name>_XXXX_FCID/
•  Project_<Prj name>/
•  Sample_<Smpl name>/
•  <Smpl name>_<Index>_<Lane No>_R1_001.fastq.gz
•  <Smpl name>_<Index>_<Lane No>_R2_001.fastq.gz
•  SampleSheet.csv
Example:
•  Project_A/
•  Sample_control/
•  Sample_cond1/
•  Sample_cond2/
•  cond2_AATTCC_L005_R1_001.fastq.gz
•  cond2_AATTCC_L005_R2_001.fastq.gz
•  SampleSheet.csv
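As a minimal sketch, the file naming pattern above can be parsed with a regular expression. The helper name `parse_fastq_name` and the group names are hypothetical, and the pattern assumes a single index made of A/C/G/T/N, a three-digit lane number, and a three-digit trailing chunk number, as in the `cond2_AATTCC_L005_R1_001.fastq.gz` example.

```python
import re
from typing import Dict, Optional

# <Smpl name>_<Index>_<Lane No>_R1_001.fastq.gz (assumed single-index layout)
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_(?P<index>[ACGTN]+)_L(?P<lane>\d{3})"
    r"_R(?P<read>[12])_(?P<chunk>\d{3})\.fastq\.gz$"
)


def parse_fastq_name(filename: str) -> Optional[Dict[str, str]]:
    """Split a demultiplexed FASTQ file name into its parts, or return None."""
    match = FASTQ_NAME.match(filename)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_fastq_name("cond2_AATTCC_L005_R1_001.fastq.gz"))
    print(parse_fastq_name("SampleSheet.csv"))
```

Matching R1 and R2 files on everything except the `read` field is one way to pair up the two ends of each sample.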

RNA-Seq Tuxedo protocol = TopHat + Cufflinks + CummeRbund

•  Input: (cleaned) R1.fastq, (cleaned) R2.fastq
•  Genome Alignment: TopHat v2.0.8
•  Transcript Assembly: Cufflinks v2.0.2 → FPKM
•  Read counting: HTSeq v0.5.4p1 → read counts by gene

Genome Alignment: TopHat v2.0.8
•  Input: (cleaned) R1.fastq, (cleaned) R2.fastq
•  Reference: iGenome DB
•  Whole Genome Sequence, ex. hg19.fa
•  Bowtie2 prebuilt FW index genome.*
•  Annotation gene.gtf
•  A read / fragment (single / paired-end) is first aligned to the genome by Bowtie v2.1.0
•  mapped: chr15: 314,159 - 320,000 on the sequence of chr15
•  not mapped: try spliced alignment (known / novel splicing) on the sequence of chr15, exon1 at chr15: 271,828 - 28,000 and exon2 at chr15: 317,000 - 34,000
•  Output: accepted_hits.bam, unmapped.bam, deletions.bed, insertions.bed, junctions.bed
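The alignment step above might be launched along these lines. This sketch only assembles the command line and does not run it; it relies on the TopHat options `-o` (output directory), `-p` (threads) and `-G` (GTF annotation). All paths and the helper name `tophat_command` are hypothetical placeholders, not the settings used for the projects in this log.

```python
import shlex
from typing import List, Optional


def tophat_command(index_prefix: str, reads1: str, reads2: Optional[str] = None,
                   gtf: Optional[str] = None, out_dir: str = "tophat_out",
                   threads: int = 1) -> List[str]:
    """Build an argument list of the form: tophat [options] <index> <reads1> [reads2]."""
    cmd = ["tophat", "-o", out_dir, "-p", str(threads)]
    if gtf is not None:
        cmd += ["-G", gtf]      # supply known gene annotation
    cmd.append(index_prefix)    # Bowtie2 index prefix, e.g. ".../genome"
    cmd.append(reads1)
    if reads2 is not None:
        cmd.append(reads2)      # second mate for paired-end data
    return cmd


if __name__ == "__main__":
    # Hypothetical paths for illustration only.
    cmd = tophat_command("Bowtie2Index/genome", "cleaned_R1.fastq", "cleaned_R2.fastq",
                         gtf="gene.gtf", out_dir="tophat_cond2", threads=8)
    print(" ".join(shlex.quote(part) for part in cmd))
```

Keeping the command as a list makes it straightforward to hand to `subprocess.run` later without shell quoting issues.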

Running time for TopHat

Project  Sample  Taxonomy  Ref      # of reads (millions)  TopHat running time
Lin      A-D     mouse     mm10     36.9                   4h 39m
         A-W     mouse     mm10     33.3                   2h 32m
         D14G    chicken   galGal4  49.0                   3h 1m
         StageX  chicken   galGal4  35.5                   2h 17m
Chou     No94    human     hg19     61.1                   6h 13m
         No95    human     hg19     66.8                   7h 5m
         No97    human     hg19     68.1                   7h 12m
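To compare the runs above on a common scale, the running times can be converted to hours and divided into the read counts. This is a rough sketch using the sample names, read counts and times from the table; the helper names are hypothetical, and throughput depends on hardware and options that the log does not record.

```python
import re


def parse_duration_hours(text: str) -> float:
    """Convert a string such as '4h 39m' into hours."""
    match = re.fullmatch(r"\s*(\d+)h\s*(\d+)m\s*", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    return int(match.group(1)) + int(match.group(2)) / 60


def million_reads_per_hour(million_reads: float, running_time: str) -> float:
    """Throughput in millions of reads per hour."""
    return million_reads / parse_duration_hours(running_time)


# (sample, millions of reads, TopHat running time) copied from the table
RUNS = [
    ("A-D", 36.9, "4h 39m"), ("A-W", 33.3, "2h 32m"),
    ("D14G", 49.0, "3h 1m"), ("StageX", 35.5, "2h 17m"),
    ("No94", 61.1, "6h 13m"), ("No95", 66.8, "7h 5m"), ("No97", 68.1, "7h 12m"),
]

if __name__ == "__main__":
    for sample, reads, time in RUNS:
        print(f"{sample}: {million_reads_per_hour(reads, time):.1f} M reads/hour")
```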

w/o gene annotation reference

•  if a gene is alternatively spliced
•  algorithm based on graph theory (b)
•  isoforms (alt. spliced transcripts) (c)
•  reads map to different sets of exons in the same region
•  a read may map to only a portion of an exon
•  expression rate by FPKM
•  FPKM = fragments per kilobase of transcript per million mapped reads (for paired-end data, a fragment is a read pair)
•  some exons can be shared
•  expression of each isoform (transcript) is not straightforward (d)
•  statistical inference (e)
•  gene expression (FPKM) = sum of the expression of all its isoforms
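The FPKM definition and the gene-level sum can be written out as a short calculation. This is a sketch of the arithmetic only: it assumes the number of fragments assigned to each isoform is already known, whereas in practice that assignment is the statistical inference step mentioned above. The function names and the example numbers are made up for illustration.

```python
from typing import Iterable, Tuple


def fpkm(fragments: float, transcript_length_bp: float, total_mapped_fragments: float) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    kilobases = transcript_length_bp / 1_000
    millions_mapped = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions_mapped)


def gene_fpkm(isoforms: Iterable[Tuple[float, float]], total_mapped_fragments: float) -> float:
    """Gene-level FPKM as the sum over (fragments, length_bp) of its isoforms."""
    return sum(fpkm(frags, length, total_mapped_fragments) for frags, length in isoforms)


if __name__ == "__main__":
    total = 20_000_000  # hypothetical number of mapped fragments in the library
    # Two hypothetical isoforms: (assigned fragments, transcript length in bp)
    isoforms = [(500, 2_000), (100, 1_000)]
    for frags, length in isoforms:
        print(fpkm(frags, length, total))
    print(gene_fpkm(isoforms, total))
```

Dividing by transcript length and by library size is what makes values comparable across transcripts of different lengths and across samples of different sequencing depth.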

Terminology - GTF
•  GTF = Gene Transfer Format
•  Gene ID, Transcript ID, feature (exon, intron, …), position
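To make these fields concrete, here is a minimal sketch that parses one GTF line. It assumes the usual nine tab-separated columns with `key "value";` pairs in the last column; the example record (gene and transcript names, source, coordinates) is invented for illustration, and `GtfRecord` / `parse_gtf_line` are hypothetical names.

```python
import re
from typing import Dict, NamedTuple


class GtfRecord(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    score: str
    strand: str
    frame: str
    attributes: Dict[str, str]


# Matches attribute pairs such as: gene_id "GENE1";
_ATTRIBUTE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_gtf_line(line: str) -> GtfRecord:
    """Parse a single non-comment GTF line into a GtfRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    attributes = dict(_ATTRIBUTE.findall(fields[8]))
    return GtfRecord(fields[0], fields[1], fields[2], int(fields[3]), int(fields[4]),
                     fields[5], fields[6], fields[7], attributes)


if __name__ == "__main__":
    # Invented example record, not taken from a real annotation file.
    line = 'chr15\texample\texon\t314159\t320000\t.\t+\t.\tgene_id "GENE1"; transcript_id "GENE1.1";'
    record = parse_gtf_line(line)
    print(record.feature, record.seqname, record.start, record.end)
    print(record.attributes["gene_id"], record.attributes["transcript_id"])
```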


Liang Bo Wang
