Work Log 03/01

Liang Bo Wang
February 28, 2013

Transcript

  1. Bioinformatics and Biostatistics Core, NTU Center of Genomic Medicine

    Work Log 3/1, Liang Bo Wang: N vs TN Diff. Exp.; NGS Workflow; Version Control – git
  2. Datasets A, B, C Summary

    | Dataset ID | Type | # Sample | Platform | Description |
    | --- | --- | --- | --- | --- |
    | GSE39162 (A) | Breast | 15 (paired T, TN, N); # of T, TN, N = 5 | GA, GAII | de-novo miRNA |
    | GSE33858 (B) | Lung | 32 (paired T, NT); # of T, NT = 16 | GAIIx | |
    | GSE29173 (C) | Breast | 245 (unpaired); # of Normal = 16 | GAIIx | barcoded |
  3. Question Two Weeks Ago

    Compare sample types N vs TN in dataset A
  4. N vs TN sample type in dataset A

    | ID (dataset A) | N mean | TN mean | p-value |
    | --- | --- | --- | --- |
    | chr7_8791 | 1.427 | 1.264 | 0.661 |
    | chr11_13342 | 0.179 | 0.769 | 0.156 |
    | chr22_20736 | 1.123 | 0.149 | 0.266 |
    | chr13_14817 | 0.395 | 0.297 | 0.651 |
    | chr17_17828 | 0.398 | 0.697 | 0.310 |
    | chr20_19494 | 0.392 | 0.387 | 0.973 |
    | chr20_19450 | 0.690 | 0.368 | 0.293 |
    | chr18_18769 | 0.000 | 0.343 | 0.374 |
    | chr11_13709 | 0.137 | 0.418 | 0.419 |
    | chr2_2356 | 0.216 | 0.113 | 0.509 |
    | chr4_5692 | 0.216 | 0.231 | 0.914 |
    | chr11_12760 | 0.091 | 0.213 | 0.362 |
    | chr3_3910 | 0.474 | 0.325 | 0.642 |
    | chr1_692 | 0.000 | 0.000 | NA |
    | chr10_12452 | 0.178 | 0.072 | 0.409 |
    | chr6_7548 | 0.343 | 0.589 | 0.362 |
    | chr1_944 | 0.000 | 0.113 | 0.374 |
    | chr6_8151 | 0.343 | 0.435 | 0.657 |
    | chr11_13239 | 0.043 | 0.000 | 0.374 |
    | chr7_8991 | 0.127 | 0.113 | 0.925 |
    | chr7_8849 | 0.042 | 0.649 | 0.169 |
    | chr22_20809 | 0.220 | 0.221 | 0.991 |
    | chr2_3487 | 0.645 | 0.069 | 0.185 |
    | chr7_8673 | 0.212 | 0.328 | 0.403 |
    | chr14_15459 | 0.212 | 0.149 | 0.733 |
    | chr20_19463 | 0.175 | 0.080 | 0.514 |
    | chr6_8250 | 3.482 | 3.797 | 0.838 |
    | chr19_19992 | 0.046 | 0.537 | 0.023 |
    | chr1_689 | 0.219 | 0.414 | 0.397 |
    | chr17_17785 | 13.621 | 5.885 | 0.176 |
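The slides do not say which test produced these p-values. The following is a minimal sketch, assuming a Welch two-sample t-test on five N and five TN values per candidate. One hint supports that guess: rows where one group mean is 0.000 repeatedly show p = 0.374, which is what a t statistic of 1 with 4 degrees of freedom gives. The per-sample values and the helper names `welch_t` and `two_sided_p` are hypothetical, and only the standard library is used.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test, or None if undefined."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    if va + vb == 0:
        return None  # no variation in either group: reported as NA
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson integration of the density over [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1 - 2 * central)


# Hypothetical per-sample values: N is all zero, TN has one non-zero sample
# (means 0.000 and 0.113, as in the chr1_944 row).
n_values = [0.0, 0.0, 0.0, 0.0, 0.0]
tn_values = [0.565, 0.0, 0.0, 0.0, 0.0]

t, df = welch_t(n_values, tn_values)
print(round(two_sided_p(t, df), 3))

# Two all-zero groups have no defined test statistic (the NA rows).
print(welch_t([0.0] * 5, [0.0] * 5))
```

A paired t-test on five pairs would give the same 4 degrees of freedom in this case, so the sketch cannot tell the two apart.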
  5. N vs TN of dataset A (continued)

    [Figure: Expression of different sample types in dataset A (p = 0.05). Panels: N, TN. Axis: Reads Per Million, 0 to 30. Plotted for the 30 candidates chr7_8791 through chr17_17785.]
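Reads per million (RPM) is the unit on the expression plots. The slides do not show how it was computed. The sketch below assumes the common definition, in which each raw read count is scaled by the library's total mapped reads. The counts, the library size, and the function name `reads_per_million` are hypothetical.

```python
def reads_per_million(counts, total_mapped_reads):
    """Scale raw read counts to reads per million mapped reads."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return {
        candidate: count * 1_000_000 / total_mapped_reads
        for candidate, count in counts.items()
    }


# Hypothetical raw counts for two candidates in one sample.
raw_counts = {"chr6_8250": 12, "chr17_17785": 45}
rpm = reads_per_million(raw_counts, total_mapped_reads=3_000_000)
print(rpm)
```

Normalizing this way makes samples with different sequencing depths comparable, which matters when datasets come from different platforms.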
  6. Comparison of all N type samples

    [Figure: Expression of all type 'N' samples. Panels: N of A, TN of A, NT of B, Normal of C. Axis: Reads Per Million, 0 to 50000. Plotted for the 30 candidates chr7_8791 through chr17_17785.]
  7. Noted

    •  Expression of chr6_8250 and chr17_17785: A, C >> B in both Tumor (> 2 orders of magnitude) and Normal (0 in B) types

    [Figure: Expression of all type 'N' samples, zoomed in. Panels: N of A, TN of A, NT of B, Normal of C. Axis: Reads Per Million, 0 to 10. Plotted for the 30 candidates chr7_8791 through chr17_17785.]
  8. N vs T

    | ID | A N-mean | A T-mean | A p-value | B N-mean | B T-mean | B p-value | C N-mean | C T-mean | C p-value |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | chr7_8791 | 1.264 | 0.963 | 0.584 | 0.205 | 0.077 | 0.217 | 0.197 | 0.238 | 0.798 |
    | chr11_13342 | 0.769 | 2.183 | 0.436 | 0.089 | 0.177 | 0.005 | 0.217 | 0.541 | 0.275 |
    | chr22_20736 | 0.149 | 0.267 | 0.608 | 0.244 | 0.162 | 0.118 | 0.000 | 0.249 | 0.012 |
    | chr13_14817 | 0.297 | 0.295 | 0.992 | 0.004 | 0.003 | 0.866 | 0.000 | 0.009 | 0.318 |
    | chr17_17828 | 0.697 | 0.249 | 0.154 | 0.004 | 0.000 | 0.097 | 0.071 | 0.331 | 0.021 |
    | chr20_19494 | 0.387 | 0.402 | 0.925 | 0.043 | 0.032 | 0.410 | 1.101 | 0.078 | 0.217 |
    | chr20_19450 | 0.368 | 0.000 | 0.198 | 0.041 | 0.018 | 0.042 | 0.217 | 0.106 | 0.626 |
    | chr18_18769 | 0.343 | 1.120 | 0.522 | 0.000 | 0.000 | NA | 0.000 | 0.335 | 0.008 |
    | chr11_13709 | 0.418 | 0.077 | 0.316 | 0.001 | 0.000 | 0.325 | 0.163 | 0.127 | 0.858 |
    | chr2_2356 | 0.113 | 0.000 | 0.374 | 0.509 | 0.382 | 0.245 | 0.417 | 0.396 | 0.940 |
    | chr4_5692 | 0.231 | 0.142 | 0.621 | 0.105 | 0.080 | 0.228 | 3.470 | 0.433 | 0.222 |
    | chr11_12760 | 0.213 | 0.494 | 0.414 | 0.007 | 0.037 | 0.034 | 0.000 | 0.053 | 0.318 |
    | chr3_3910 | 0.325 | 0.057 | 0.346 | 0.000 | 0.000 | NA | 107.354 | 0.580 | 0.047 |
    | chr1_692 | 0.000 | 0.603 | 0.143 | 0.000 | 0.005 | 0.056 | 0.071 | 0.232 | 0.105 |
    | chr10_12452 | 0.072 | 0.000 | 0.178 | 0.012 | 0.017 | 0.347 | 0.000 | 0.050 | 0.069 |
    | chr6_7548 | 0.589 | 0.369 | 0.510 | 0.075 | 0.233 | 0.023 | 0.000 | 0.342 | 0.008 |
    | chr1_944 | 0.113 | 0.000 | 0.374 | 0.001 | 0.002 | 0.515 | 0.000 | 0.054 | 0.272 |
    | chr6_8151 | 0.435 | 0.184 | 0.253 | 0.075 | 0.233 | 0.023 | 0.000 | 0.342 | 0.008 |
    | chr11_13239 | 0.000 | 0.000 | NA | 0.160 | 0.157 | 0.964 | 0.359 | 0.071 | 0.187 |
    | chr7_8991 | 0.113 | 0.628 | 0.012 | 0.039 | 0.037 | 0.840 | 0.000 | 0.140 | 0.054 |
    | chr7_8849 | 0.649 | 0.793 | 0.723 | 0.027 | 0.041 | 0.472 | 0.365 | 0.445 | 0.786 |
    | chr22_20809 | 0.221 | 0.184 | 0.791 | 4.147 | 3.200 | 0.701 | 238.809 | 311.001 | 0.121 |
    | chr2_3487 | 0.069 | 0.325 | 0.305 | 0.001 | 0.018 | 0.359 | 8.719 | 10.700 | 0.536 |
    | chr7_8673 | 0.328 | 0.242 | 0.666 | 0.000 | 0.004 | 0.197 | 0.000 | 0.418 | 0.124 |
    | chr14_15459 | 0.149 | 0.312 | 0.450 | 0.180 | 0.236 | 0.332 | 0.222 | 0.966 | 0.013 |
    | chr20_19463 | 0.080 | 0.312 | 0.296 | 0.012 | 0.023 | 0.342 | 0.000 | 0.113 | 0.090 |
    | chr6_8250 | 3.797 | 4.475 | 0.757 | 0.000 | 0.060 | 0.268 | 13917.165 | 4797.623 | 0.047 |
    | chr19_19992 | 0.537 | 0.128 | 0.044 | 0.011 | 62.053 | 0.324 | 7313.514 | 11421.140 | 0.018 |
    | chr1_689 | 0.414 | 0.281 | 0.644 | 0.001 | 0.000 | 0.325 | 0.000 | 0.029 | 0.180 |
    | chr17_17785 | 5.885 | 8.258 | 0.368 | 0.000 | 0.355 | 0.302 | 233.105 | 306.265 | 0.117 |

    Legend (row highlighting on the slide): chr_<ID> = BvsL DE; chr_<ID> = not passed neg test
  9. Discussion

    For N vs TN,
    •  The sample size (n = 5) is too small for such a comparison
    •  Expression in N and TN differs, but the difference is not significant
    Overall,
    •  Variation within the same sample type is too large for a comparison, and expression is too low, so the results are not convincing
    •  Datasets of better quality are required

    | ID | Breast vs Lung | N vs T per dataset |
    | --- | --- | --- |
    | chr6_8250 | significant | not significant in A and B; significant in C |
    | chr17_17785 | significant | not significant |
  10. Every step in the workflow works independently

    •  Every step => a script (ADD some VEGETABLE)
    •  Each script runs a specific, simple task (THE MEAT!!)
    •  A report (log) will be generated (THE BREAD)
    •  Tables/figures too, if needed

    [Diagram of one step: input data; self-written scripts passing arguments; main tool (e.g. BLAST, DESeq); self-written scripts organizing results; report in text, HTML, tables, figures]
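The "one step, one script, one report" idea above can be sketched as a small wrapper that passes arguments to a main tool and writes a plain-text log. This is only an illustration under assumptions: the function name `run_step`, the log layout, and the stand-in command (a Python one-liner in place of a real tool such as BLAST) are all hypothetical.

```python
import subprocess
import sys
import tempfile
from datetime import datetime
from pathlib import Path


def run_step(name, command, log_dir):
    """Run one workflow step as an external command and write a text report."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    started = datetime.now()
    # The wrapper only passes arguments; the main tool does the real work.
    result = subprocess.run(command, capture_output=True, text=True)
    log_path = log_dir / f"{name}.log"
    log_path.write_text(
        f"step: {name}\n"
        f"command: {' '.join(command)}\n"
        f"started: {started.isoformat(timespec='seconds')}\n"
        f"exit code: {result.returncode}\n"
        f"--- stdout ---\n{result.stdout}"
        f"--- stderr ---\n{result.stderr}"
    )
    return result.returncode, log_path


# Stand-in for a real tool invocation (hypothetical command).
with tempfile.TemporaryDirectory() as tmp:
    code, log = run_step(
        "adapter_trimming",
        [sys.executable, "-c", "print('trimmed 10 reads')"],
        tmp,
    )
    print(code)
    print(log.read_text())
```

Because each step leaves its own log, a failed run can be diagnosed or resumed from that step without rerunning the ones before it.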
  11. This is what a workflow looks like

    It looks ugly at first glance, but from the vegetable's point of view it is just a five-layer burger (five steps). This is what I was working on last semester, and it is also Galaxy's approach.
  12. Workflow steps (tool options)

    •  SRA to FASTA/Q: NCBI SRA-Toolkit
    •  Adapter Trimming: cutadapt / FASTX_Clipper
    •  Quality Control: NGS QC Toolkit / ...
    •  FASTQ to FASTA: (Many) / biopython / ...
    •  Alignment to genome: Bowtie
    •  Alignment to (…): Bowtie / BWA / BLAST
    •  Differential Expression: HTSeq / DESeq / ...
    •  Showing statistics interactively: HTML -> d3.js
    •  Showing important statistics: R / (any plot software)
    •  Outputs: CSV file, Figures
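As an example of how small a single step can be, here is a minimal sketch of the FASTQ to FASTA conversion using only the standard library. It assumes unwrapped four-line FASTQ records; the function name `fastq_to_fasta` and the two reads are hypothetical. With Biopython, one of the options named on the slide, `Bio.SeqIO.convert` performs the same conversion.

```python
import io


def fastq_to_fasta(fastq_handle, fasta_handle):
    """Convert four-line FASTQ records to FASTA; return the number written."""
    written = 0
    while True:
        header = fastq_handle.readline()
        if header == "":
            break  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        sequence = fastq_handle.readline().strip()
        separator = fastq_handle.readline()
        quality = fastq_handle.readline().strip()
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near: {header.strip()!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch: {header.strip()!r}")
        # FASTA keeps the read name and sequence and drops the quality string.
        fasta_handle.write(f">{header[1:].strip()}\n{sequence}\n")
        written += 1
    return written


# Two hypothetical reads held in memory instead of files.
fastq = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nTTGCA\n+\nIIIII\n")
fasta = io.StringIO()
count = fastq_to_fasta(fastq, fasta)
print(count)
print(fasta.getvalue(), end="")
```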
  13. Breast vs Lung of T type sample

    | ID (T type sample) | breast-mean | lung-mean | p-value | Negative test |
    | --- | --- | --- | --- | --- |
    | chr7_8791 | 0.250 | 0.077 | 0.047 | |
    | chr11_13342 | 0.569 | 0.177 | 0.047 | |
    | chr22_20736 | 0.250 | 0.162 | 0.379 | |
    | chr13_14817 | 0.014 | 0.003 | 0.275 | |
    | chr17_17828 | 0.329 | 0.000 | 0.000 | |
    | chr20_19494 | 0.084 | 0.032 | 0.328 | |
    | chr20_19450 | 0.104 | 0.018 | 0.129 | |
    | chr18_18769 | 0.348 | 0.000 | 0.005 | |
    | chr11_13709 | 0.126 | 0.000 | 0.275 | |
    | chr2_2356 | 0.390 | 0.382 | 0.955 | |
    | chr4_5692 | 0.428 | 0.080 | 0.130 | |
    | chr11_12760 | 0.060 | 0.037 | 0.659 | |
    | chr3_3910 | 0.571 | 0.000 | 0.016 | not passed |
    | chr1_692 | 0.238 | 0.005 | 0.000 | |
    | chr10_12452 | 0.049 | 0.017 | 0.246 | |
    | chr6_7548 | 0.343 | 0.233 | 0.436 | |
    | chr1_944 | 0.053 | 0.002 | 0.287 | |
    | chr6_8151 | 0.340 | 0.233 | 0.450 | |
    | chr11_13239 | 0.070 | 0.157 | 0.182 | |
    | chr7_8991 | 0.149 | 0.037 | 0.120 | |
    | chr7_8849 | 0.451 | 0.041 | 0.004 | |
    | chr22_20809 | 305.666 | 3.200 | 0.000 | not passed |
    | chr2_3487 | 10.522 | 0.018 | 0.000 | not passed |
    | chr7_8673 | 0.415 | 0.004 | 0.124 | |
    | chr14_15459 | 0.954 | 0.236 | 0.000 | |
    | chr20_19463 | 0.117 | 0.023 | 0.158 | |
    | chr6_8250 | 4715.337 | 0.060 | 0.000 | not passed |
    | chr19_19992 | 11225.071 | 62.053 | 0.000 | not passed |
    | chr1_689 | 0.033 | 0.000 | 0.122 | |
    | chr17_17785 | 301.149 | 0.355 | 0.000 | not passed |
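The 13 candidates carried forward to the next slide are exactly the rows above with p < 0.05. A minimal sketch of that selection, assuming 0.05 is the cutoff (the plots are labeled "p = 0.05"), uses five rows copied from the table. The function name `significant_ids` is hypothetical.

```python
# (ID, breast mean, lung mean, p-value): five rows copied from the table above.
rows = [
    ("chr7_8791", 0.250, 0.077, 0.047),
    ("chr22_20736", 0.250, 0.162, 0.379),
    ("chr17_17828", 0.329, 0.000, 0.000),
    ("chr2_2356", 0.390, 0.382, 0.955),
    ("chr3_3910", 0.571, 0.000, 0.016),
]


def significant_ids(table, alpha=0.05):
    """Return the IDs whose p-value is below alpha, skipping missing (NA) values."""
    return [
        candidate
        for candidate, _breast, _lung, p_value in table
        if p_value is not None and p_value < alpha
    ]


print(significant_ids(rows))
# ['chr7_8791', 'chr17_17828', 'chr3_3910']
```

A p-value shown as 0.000 is rounded to three decimals, so it stands for a value below 0.0005 rather than exactly zero.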
  14. Candidates with significant differential expression

    | ID | location | gene | gene function |
    | --- | --- | --- | --- |
    | chr7_8791 | intron | ZYX | zyxin |
    | chr11_13342 | exon | BTBD10 | sugar metabolism |
    | chr17_17828 | intron | CASC3 | cancer susceptibility candidate |
    | chr18_18769 | intergenic | - | |
    | chr3_3910 | intron | NR2C2 | zinc finger |
    | chr1_692 | intron | CREB3L4 | cAMP related |
    | chr7_8849 | 3'UTR | RBM33 | RNA binding motif |
    | chr22_20809 | intron | AP1B1P1 | pseudogene |
    | chr2_3487 | intergenic | - | |
    | chr14_15459 | intron | IFI27 | interferon |
    | chr6_8250 | intron | TULP4 | tubby like protein |
    | chr19_19992 | intron | SPTBN4 | beta-spectrin |
    | chr17_17785 | intron | ACCN1 | DEG/ENaC, neurotransmission, Multiple Sclerosis |

    Legend: red = passed negative test
  15. N vs TN sample type in dataset A

    [Figure: Expression of different sample types in dataset A (p = 0.05), zoomed in. Panels: N, TN. Axis: Reads Per Million, 0.0 to 2.5. Plotted for the 30 candidates chr7_8791 through chr17_17785.]