Slide 1

Slide 1 text

Bioinformatics and Biostatistics Core, NTU Center of Genomic Medicine
Work Log 3/1
Liang Bo Wang
N vs TN Diff. Exp.
NGS Workflow
Version Control - Git

Slide 2

Slide 2 text

Previous Work

Slide 3

Slide 3 text

Datasets A, B, C Summary
Dataset ID | Type | # Samples | Platform | Description
GSE39162 (A) | Breast | 15 (paired T, TN, N; 5 of each) | GA, GAII | de-novo miRNA
GSE33858 (B) | Lung | 32 (paired T, NT; 16 of each) | GAIIx |
GSE29173 (C) | Breast | 245 (unpaired; 16 Normal) | GAIIx | barcoded

Slide 4

Slide 4 text

Question Two Weeks Ago: Compare type N vs TN of dataset A

Slide 5

Slide 5 text

N vs TN sample type in dataset A
dataset A | N mean | TN mean | p-value
chr7_8791 | 1.427 | 1.264 | 0.661
chr11_13342 | 0.179 | 0.769 | 0.156
chr22_20736 | 1.123 | 0.149 | 0.266
chr13_14817 | 0.395 | 0.297 | 0.651
chr17_17828 | 0.398 | 0.697 | 0.310
chr20_19494 | 0.392 | 0.387 | 0.973
chr20_19450 | 0.690 | 0.368 | 0.293
chr18_18769 | 0.000 | 0.343 | 0.374
chr11_13709 | 0.137 | 0.418 | 0.419
chr2_2356 | 0.216 | 0.113 | 0.509
chr4_5692 | 0.216 | 0.231 | 0.914
chr11_12760 | 0.091 | 0.213 | 0.362
chr3_3910 | 0.474 | 0.325 | 0.642
chr1_692 | 0.000 | 0.000 | NA
chr10_12452 | 0.178 | 0.072 | 0.409
chr6_7548 | 0.343 | 0.589 | 0.362
chr1_944 | 0.000 | 0.113 | 0.374
chr6_8151 | 0.343 | 0.435 | 0.657
chr11_13239 | 0.043 | 0.000 | 0.374
chr7_8991 | 0.127 | 0.113 | 0.925
chr7_8849 | 0.042 | 0.649 | 0.169
chr22_20809 | 0.220 | 0.221 | 0.991
chr2_3487 | 0.645 | 0.069 | 0.185
chr7_8673 | 0.212 | 0.328 | 0.403
chr14_15459 | 0.212 | 0.149 | 0.733
chr20_19463 | 0.175 | 0.080 | 0.514
chr6_8250 | 3.482 | 3.797 | 0.838
chr19_19992 | 0.046 | 0.537 | 0.023
chr1_689 | 0.219 | 0.414 | 0.397
chr17_17785 | 13.621 | 5.885 | 0.176
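The slides do not say which test produced these p-values. As a hedged illustration only, the sketch below computes a two-sided exact permutation test on the difference of group means, a swapped-in technique that may not match the original analysis. It also treats N and TN as unpaired groups, even though dataset A's samples are paired. The function name `permutation_p_value` and the example values are hypothetical. Returning `None` when every value is identical loosely mirrors the NA entries such as chr1_692, where both means are 0.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference of means.

    Hypothetical helper; treats the two groups as unpaired.
    Returns None when all values are identical (nothing to test).
    """
    pooled = list(group_a) + list(group_b)
    if len(set(pooled)) == 1:
        return None
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    # Enumerate every way to relabel n_a of the pooled values as group A.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # A small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical RPM values for 5 N and 5 TN samples (not real data).
n_rpm = [0.1, 0.2, 0.3, 0.4, 0.5]
tn_rpm = [0.6, 0.7, 0.8, 0.9, 1.0]
print(permutation_p_value(n_rpm, tn_rpm))  # 2 of 252 relabelings are as extreme
```

With only 5 samples per group there are just 252 relabelings, so the smallest attainable two-sided p-value is 2/252, which is one concrete way to see why n = 5 limits this comparison.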

Slide 6

Slide 6 text

N vs TN of dataset A (continued)
[Figure: Expression of different sample types in dataset A (p = 0.05); series: N, TN; x-axis: the 30 candidate IDs; y-axis: Reads Per Million (0 to 30)]
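The figures report expression in Reads Per Million (RPM). A minimal sketch of the usual definition, RPM = read count / library size × 10^6, assuming the library size is the sum of all counts in the sample (the slides do not state which total was used, e.g. total or mapped reads). The function name and the counts are hypothetical.

```python
def reads_per_million(counts):
    """Convert one sample's raw read counts (ID -> count) to RPM.

    Assumes the library size is the sum of all counts in the sample.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    # Multiply before dividing so integer counts convert exactly when possible.
    return {key: value * 1_000_000 / total for key, value in counts.items()}


# Hypothetical raw counts for one sample (not real data).
sample = {"chr6_8250": 350, "chr17_17785": 150, "other": 499_500}
rpm = reads_per_million(sample)
print(rpm["chr6_8250"])  # 700.0
```

Because RPM only rescales by library size, it does not remove platform or batch effects between datasets.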

Slide 7

Slide 7 text

Compare 'N' samples across all datasets

Slide 8

Slide 8 text

Comparison of all N type samples
[Figure: Expression of all type 'N' samples; panels: N of A, TN of A, NT of B, Normal of C; x-axis: the 30 candidate IDs; y-axis: Reads Per Million (0 to 50,000)]

Slide 9

Slide 9 text

Notes
• Expression of chr6_8250 and chr17_17785:
  • A, C >> B in both Tumor (more than 2 orders of magnitude) and Normal (0 in B) samples
[Figure: Expression of all type 'N' samples, y-axis zoomed to 0 to 10; panels: N of A, TN of A, NT of B, Normal of C; x-axis: the 30 candidate IDs; y-axis: Reads Per Million]

Slide 10

Slide 10 text

chr6_8250, located in an intron of TULP4 (TUSP)

Slide 11

Slide 11 text

chr17_17785, located in an intron of ACCN1 (ASIC2)

Slide 12

Slide 12 text

N vs T
ID | A N-mean | A T-mean | A p-value | B N-mean | B T-mean | B p-value | C N-mean | C T-mean | C p-value
chr7_8791 | 1.264 | 0.963 | 0.584 | 0.205 | 0.077 | 0.217 | 0.197 | 0.238 | 0.798
chr11_13342 | 0.769 | 2.183 | 0.436 | 0.089 | 0.177 | 0.005 | 0.217 | 0.541 | 0.275
chr22_20736 | 0.149 | 0.267 | 0.608 | 0.244 | 0.162 | 0.118 | 0.000 | 0.249 | 0.012
chr13_14817 | 0.297 | 0.295 | 0.992 | 0.004 | 0.003 | 0.866 | 0.000 | 0.009 | 0.318
chr17_17828 | 0.697 | 0.249 | 0.154 | 0.004 | 0.000 | 0.097 | 0.071 | 0.331 | 0.021
chr20_19494 | 0.387 | 0.402 | 0.925 | 0.043 | 0.032 | 0.410 | 1.101 | 0.078 | 0.217
chr20_19450 | 0.368 | 0.000 | 0.198 | 0.041 | 0.018 | 0.042 | 0.217 | 0.106 | 0.626
chr18_18769 | 0.343 | 1.120 | 0.522 | 0.000 | 0.000 | NA | 0.000 | 0.335 | 0.008
chr11_13709 | 0.418 | 0.077 | 0.316 | 0.001 | 0.000 | 0.325 | 0.163 | 0.127 | 0.858
chr2_2356 | 0.113 | 0.000 | 0.374 | 0.509 | 0.382 | 0.245 | 0.417 | 0.396 | 0.940
chr4_5692 | 0.231 | 0.142 | 0.621 | 0.105 | 0.080 | 0.228 | 3.470 | 0.433 | 0.222
chr11_12760 | 0.213 | 0.494 | 0.414 | 0.007 | 0.037 | 0.034 | 0.000 | 0.053 | 0.318
chr3_3910 | 0.325 | 0.057 | 0.346 | 0.000 | 0.000 | NA | 107.354 | 0.580 | 0.047
chr1_692 | 0.000 | 0.603 | 0.143 | 0.000 | 0.005 | 0.056 | 0.071 | 0.232 | 0.105
chr10_12452 | 0.072 | 0.000 | 0.178 | 0.012 | 0.017 | 0.347 | 0.000 | 0.050 | 0.069
chr6_7548 | 0.589 | 0.369 | 0.510 | 0.075 | 0.233 | 0.023 | 0.000 | 0.342 | 0.008
chr1_944 | 0.113 | 0.000 | 0.374 | 0.001 | 0.002 | 0.515 | 0.000 | 0.054 | 0.272
chr6_8151 | 0.435 | 0.184 | 0.253 | 0.075 | 0.233 | 0.023 | 0.000 | 0.342 | 0.008
chr11_13239 | 0.000 | 0.000 | NA | 0.160 | 0.157 | 0.964 | 0.359 | 0.071 | 0.187
chr7_8991 | 0.113 | 0.628 | 0.012 | 0.039 | 0.037 | 0.840 | 0.000 | 0.140 | 0.054
chr7_8849 | 0.649 | 0.793 | 0.723 | 0.027 | 0.041 | 0.472 | 0.365 | 0.445 | 0.786
chr22_20809 | 0.221 | 0.184 | 0.791 | 4.147 | 3.200 | 0.701 | 238.809 | 311.001 | 0.121
chr2_3487 | 0.069 | 0.325 | 0.305 | 0.001 | 0.018 | 0.359 | 8.719 | 10.700 | 0.536
chr7_8673 | 0.328 | 0.242 | 0.666 | 0.000 | 0.004 | 0.197 | 0.000 | 0.418 | 0.124
chr14_15459 | 0.149 | 0.312 | 0.450 | 0.180 | 0.236 | 0.332 | 0.222 | 0.966 | 0.013
chr20_19463 | 0.080 | 0.312 | 0.296 | 0.012 | 0.023 | 0.342 | 0.000 | 0.113 | 0.090
chr6_8250 | 3.797 | 4.475 | 0.757 | 0.000 | 0.060 | 0.268 | 13917.165 | 4797.623 | 0.047
chr19_19992 | 0.537 | 0.128 | 0.044 | 0.011 | 62.053 | 0.324 | 7313.514 | 11421.140 | 0.018
chr1_689 | 0.414 | 0.281 | 0.644 | 0.001 | 0.000 | 0.325 | 0.000 | 0.029 | 0.180
chr17_17785 | 5.885 | 8.258 | 0.368 | 0.000 | 0.355 | 0.302 | 233.105 | 306.265 | 0.117
Legend (the original color highlighting did not survive export): one color marks IDs differentially expressed in Breast vs Lung; another marks IDs that did not pass the negative test.
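To read the table above, it helps to list, for each ID, which datasets reach p < 0.05 for N vs T. A minimal sketch using a few rows copied from the table; the 0.05 cutoff follows the figure titles, while the dictionary layout and the function name `significant_datasets` are hypothetical. No multiple-testing correction is applied, and none is mentioned in the slides.

```python
# N vs T p-values for datasets A, B, C, copied from the table above
# (None stands for NA).
P_VALUES = {
    "chr6_8250": {"A": 0.757, "B": 0.268, "C": 0.047},
    "chr17_17785": {"A": 0.368, "B": 0.302, "C": 0.117},
    "chr19_19992": {"A": 0.044, "B": 0.324, "C": 0.018},
    "chr18_18769": {"A": 0.522, "B": None, "C": 0.008},
}


def significant_datasets(p_by_dataset, alpha=0.05):
    """Return the datasets whose p-value is below alpha, skipping NA."""
    return [name for name, p in sorted(p_by_dataset.items())
            if p is not None and p < alpha]


for candidate_id, p_values in P_VALUES.items():
    print(candidate_id, significant_datasets(p_values))
```

This reproduces the summary used in the Discussion: chr6_8250 is significant only in C, and chr17_17785 in none of the datasets.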

Slide 13

Slide 13 text

Discussion
For N vs TN:
• The sample size (5 per group) is too small for such a comparison
• Expression of N and TN differs, but not significantly
Overall:
• Variation within the same sample type is too large to compare, and expression levels are too low; the results are not convincing
• Datasets of better quality are required

ID | Breast vs Lung | N vs T per dataset
chr6_8250 | significant | not in A, B; significant in C
chr17_17785 | significant | not significant

Slide 14

Slide 14 text

NGS Workflow @lab

Slide 15

Slide 15 text

Every step in the workflow works independently
• Every step => a script (ADD some VEGETABLE)
• The script runs a specific, simple task (THE MEAT!!)
• A report (log) is generated (THE BREAD)
  • plus tables/figures if needed
[Diagram: input data; self-written scripts: passing arguments; main tool (e.g. BLAST, DESeq); self-written scripts: organizing results; report in text, HTML, tables, figures]
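A minimal sketch of one such "burger" step, assuming Python's standard `subprocess` module as the glue: the wrapper passes arguments to the main tool (the meat) and writes a plain-text log (the bread). The function name `run_step`, the log layout, and the demo command are hypothetical; in practice the command would be a real command-line tool such as a BLAST or Bowtie invocation, whose arguments are not shown here.

```python
import subprocess
import sys
import time
from pathlib import Path


def run_step(name, command, log_path):
    """Run one workflow step and write a plain-text report (log).

    `command` is a list of program arguments (tool name plus options).
    Returns the tool's exit code, which is also recorded in the log.
    """
    start = time.perf_counter()
    # The "vegetable": pass arguments to the main tool and capture its output.
    result = subprocess.run(command, capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    # The "bread": a report that later steps (or people) can read.
    report = [
        f"step: {name}",
        f"command: {' '.join(command)}",
        f"exit code: {result.returncode}",
        f"elapsed: {elapsed:.2f} s",
        "--- stdout ---",
        result.stdout,
        "--- stderr ---",
        result.stderr,
    ]
    Path(log_path).write_text("\n".join(report))
    return result.returncode


if __name__ == "__main__":
    # Demo "main tool": a tiny Python command standing in for a real tool.
    code = run_step("demo", [sys.executable, "-c", "print('hello')"], "demo.log")
    print("exit code:", code)
```

Keeping each step behind the same small wrapper is what lets steps run independently: a later step only needs the previous step's output files and can check its log for a zero exit code.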

Slide 16

Slide 16 text

This is what a workflow looks like
At first glance it looks ugly, but seen from the "vegetable" layers (the glue scripts), it is simply a 5-layer burger (5 steps).
This is what we did last semester, and it is also how Galaxy does it.

Slide 17

Slide 17 text

Step | Tools
SRA to FASTA/Q | NCBI SRA-Toolkit
Adapter Trimming | cutadapt / FASTX_Clipper
Quality Control | NGS QC Toolkit / ...
FASTQ to FASTA | (Many) / biopython / ...
Alignment to genome | Bowtie
Alignment to (…) | Bowtie / BWA / BLAST
Differential Expression | HTSeq / DESeq / ...
Showing statistics interactively | HTML -> d3.js
Showing important statistics | R / (any plot software)
Outputs: CSV file, Figures
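The FASTQ to FASTA step is simple enough to sketch without any of the listed tools. A minimal pure-Python version, assuming plain (uncompressed) FASTQ with exactly four lines per record and no sequence line wrapping; the function name `fastq_to_fasta` is hypothetical, and Biopython's `SeqIO.convert` is a library alternative.

```python
import io


def fastq_to_fasta(fastq, fasta):
    """Copy each FASTQ record's header and sequence to FASTA, dropping qualities.

    Assumes 4-line records: @header, sequence, +[header], quality.
    Returns the number of records written.
    """
    count = 0
    while True:
        header = fastq.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate blank lines between or after records
        sequence = fastq.readline().rstrip("\n")
        plus = fastq.readline()
        fastq.readline()  # quality line, not needed in FASTA
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near: {header!r}")
        fasta.write(">" + header[1:].rstrip("\n") + "\n" + sequence + "\n")
        count += 1
    return count


# Usage with in-memory text; real files work the same via open(path).
reads = io.StringIO("@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nTTGCA\n+\nIIIII\n")
out = io.StringIO()
print(fastq_to_fasta(reads, out))  # 2
print(out.getvalue())
```

Reading line by line keeps memory use constant, which matters for full NGS runs; the strict four-line assumption is what keeps the parser this short.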

Slide 18

Slide 18 text

Version Control - Git

Slide 19

Slide 19 text


Slide 20

Slide 20 text

Backup Slides

Slide 21

Slide 21 text

Breast vs Lung of T type sample
T type sample | breast mean | lung mean | p-value | Negative test
chr7_8791 | 0.250 | 0.077 | 0.047 |
chr11_13342 | 0.569 | 0.177 | 0.047 |
chr22_20736 | 0.250 | 0.162 | 0.379 |
chr13_14817 | 0.014 | 0.003 | 0.275 |
chr17_17828 | 0.329 | 0.000 | 0.000 |
chr20_19494 | 0.084 | 0.032 | 0.328 |
chr20_19450 | 0.104 | 0.018 | 0.129 |
chr18_18769 | 0.348 | 0.000 | 0.005 |
chr11_13709 | 0.126 | 0.000 | 0.275 |
chr2_2356 | 0.390 | 0.382 | 0.955 |
chr4_5692 | 0.428 | 0.080 | 0.130 |
chr11_12760 | 0.060 | 0.037 | 0.659 |
chr3_3910 | 0.571 | 0.000 | 0.016 | not passed
chr1_692 | 0.238 | 0.005 | 0.000 |
chr10_12452 | 0.049 | 0.017 | 0.246 |
chr6_7548 | 0.343 | 0.233 | 0.436 |
chr1_944 | 0.053 | 0.002 | 0.287 |
chr6_8151 | 0.340 | 0.233 | 0.450 |
chr11_13239 | 0.070 | 0.157 | 0.182 |
chr7_8991 | 0.149 | 0.037 | 0.120 |
chr7_8849 | 0.451 | 0.041 | 0.004 |
chr22_20809 | 305.666 | 3.200 | 0.000 | not passed
chr2_3487 | 10.522 | 0.018 | 0.000 | not passed
chr7_8673 | 0.415 | 0.004 | 0.124 |
chr14_15459 | 0.954 | 0.236 | 0.000 |
chr20_19463 | 0.117 | 0.023 | 0.158 |
chr6_8250 | 4715.337 | 0.060 | 0.000 | not passed
chr19_19992 | 11225.071 | 62.053 | 0.000 | not passed
chr1_689 | 0.033 | 0.000 | 0.122 |
chr17_17785 | 301.149 | 0.355 | 0.000 | not passed

Slide 22

Slide 22 text

Candidates with significant differential expression
ID | location | gene | gene function
chr7_8791 | intron | ZYX | zyxin
chr11_13342 | exon | BTBD10 | sugar metabolism
chr17_17828 | intron | CASC3 | cancer susceptibility candidate
chr18_18769 | intergenic | - |
chr3_3910 | intron | NR2C2 | zinc finger
chr1_692 | intron | CREB3L4 | cAMP related
chr7_8849 | 3'UTR | RBM33 | RNA binding motif
chr22_20809 | intron | AP1B1P1 | pseudogene
chr2_3487 | intergenic | - |
chr14_15459 | intron | IFI27 | interferon
chr6_8250 | intron | TULP4 | tubby like protein
chr19_19992 | intron | SPTBN4 | beta-spectrin
chr17_17785 | intron | ACCN1 | DEG/ENaC, neurotransmission, Multiple Sclerosis
Red: passed the negative test (the color highlighting did not survive export)

Slide 23

Slide 23 text

N vs TN sample type in dataset A
[Figure: Expression of different sample types in dataset A (p = 0.05), y-axis zoomed to 0 to 2.5; series: N, TN; x-axis: the 30 candidate IDs; y-axis: Reads Per Million]