Work Log 07/12

Liang Bo Wang
July 12, 2013

Transcript

  1. Work Log 07/12

    •  HiPipe by NCGM
    •  Novel miRNA Validation
    •  Prof. Motion’s CCRT miRNA
    •  Head Into R/Bioconductor

    2013.07 Bioinformatics and Biostatistics Core, NTU Center of Genomic Medicine, National Taiwan University. Slides by Liang Bo Wang.
    Brand-new slide style ~ pixiv id: 36875076
  2. HiPipe by NCGM

    Intro, Usage, Performance
    http://hipipe.ncgm.sinica.edu.tw/
  3. HiPipe

    •  View/Control a task by its ID
        –  e.g., eaeabdab-0bb5-42ed-9ee9-dd3d4a58a5e2
    •  Accepts fastq.gz output by CASAVA
        –  can be paired
    •  Total size of a task is limited to 3 TB
    •  I can run SNP detection on a sample now
        –  first 10 paired reads of Prof. Chou, human, No94
        –  using GATK to detect SNPs
        –  total process time: approx. 3.5 hr
        –  no SNP found
  6. Results

    High Performance Pipelines for NGS Data Analysis
    Your task ID: eaeabdab-0bb5-42ed-9ee9-dd3d4a58a5e2

    Tools used in this analysis
    •  Sequence alignment: bwa-0.6.2-mt
    •  Variant detection: GenomeAnalysisTKLite-2.3-9
    •  Others: picard-tools-1.74, samtools-0.1.18, IGVTools-2.1.24

    References used in this analysis
    •  Known indel: dbsnp137
    •  Reference sequence: human_g1k_v37

    View No94_ACAGTG Results (reference sequence: human_g1k_v37)
    •  Step 1: Launch IGV in Java
        –  with 750 MB, 1.2 GB, 2 GB, or 10 GB RAM
        –  First-time IGV users, please visit the IGV site for details
    •  Step 2: Load data in IGV
        –  View variants
        –  View alignment result
        –  NOTE: Before clicking these buttons, make sure genome "Human (1kg, b37+decoy)" is selected in the dropdown list at the top left corner of the IGV window. If that option is not available, click "more..." in that list and select "Human (1kg, b37+decoy)" to add it to the list.

    Download No94_ACAGTG Results
    •  Variants (12 KB, VCF format)
    •  Alignment result (8.0 KB, BAM format)
    •  Both (20 KB)

    Copyright © 2013 National Center for Genome Medicine, Academia Sinica, Taiwan
  7. Use a larger sample

    •  Select first 5,000,000 reads
        –  originally No94 has 30,563,931 paired reads
        –  after compression, total file size is ~800 MB
        –  task ID: 20b45f9a-65ff-476c-8076-11c1b7bbed05
        –  start at 7/8 16:30
        –  end at 7/8 18:30, taking 2 hrs
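Selecting the first N reads from a gzipped FASTQ file can be sketched in Python as below. This is only an illustration, not the command actually used for the HiPipe upload: the function name `head_fastq_gz` and the file names in the usage comment are hypothetical, and the code assumes the common layout of exactly four lines per read.

```python
import gzip
from itertools import islice


def head_fastq_gz(src_path, dst_path, n_reads):
    """Copy the first n_reads records of a gzipped FASTQ file.

    Assumes each record is exactly 4 lines (header, sequence, '+', quality).
    Returns the number of complete records written.
    """
    n_lines = 0
    with gzip.open(src_path, "rt") as src, gzip.open(dst_path, "wt") as dst:
        # islice stops after 4 * n_reads lines, or earlier if the file is shorter.
        for line in islice(src, 4 * n_reads):
            dst.write(line)
            n_lines += 1
    return n_lines // 4


# Hypothetical usage for a paired-end sample: take the same number of reads
# from both mate files so that the pairs stay in sync.
# head_fastq_gz("No94_R1.fastq.gz", "No94_R1.head.fastq.gz", 5_000_000)
# head_fastq_gz("No94_R2.fastq.gz", "No94_R2.head.fastq.gz", 5_000_000)
```

Streaming line by line keeps memory use flat, which matters when the full sample has tens of millions of reads.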
  9. Novel miRNA Validation

    What’s our story?
  10. What We Have Done

    •  identify miRNA candidates using the breast dataset
        –  containing both normal and various cancer subtypes
    •  in silico verification by mapping candidates to the breast and lung datasets
    •  We will focus on the breast dataset
  11. What We Will Do Next

    •  Predict target genes and pathways of these miRNA candidates
        –  we hope to find breast-related target genes and pathways
    •  This prediction will be done with the algorithm of 建樂, a senior lab member
        –  most prediction algorithms are for known miRNAs only
  12. Prof. Motion’s CCRT miRNA

    •  update miRBase from 19 to 20
    •  special issues – multiple alignment
  13. Dataset Summary: Total Read Count

  14. Dataset Summary: Percentage of Reads Mapped to Known miRNAs
  15. What miRBase Changes From 19 to 20

    •  For Human miRNA
        –  # mature form: 2245 to 2801
        –  # precursor form: 1600 to 1872
  16. Special Issues – Multiple Alignment

    •  a mature form miRNA has multiple alignment positions on its precursor form
    •  Solution: go to miRBase to check the reported position

    mature         precursor      # alignment on precursor
    hsa-miR-3142   hsa-mir-3142   2
    hsa-miR-3673   hsa-mir-3673   3
    hsa-miR-4487   hsa-mir-4487   2
  17. Special Issues – Multiple Alignment

    >hsa-mir-3142
    total read count
    hsa-miR-3142 read count   0
    hsa-miR-3142 read count   0
    remaining read count      0
    exp         ffffffMMMMMMMMMMMMMMMMMMMMMMffffffffffffffffffffMMMMMMMMMMMMMMMMMMMMMMffffffffffff
    pri_seq     uucagaaaggccuuucugaaccuucagaaaggcugcugaaucuucagaaaggccuuucugaaccuucagaaaggcugcugaa
    pri_struct  .((((...((((((((((((..((((((((((((.((((....))))...))))))))))))..)))))))))))).)))).  #MM

    >hsa-mir-4487
    total read count
    hsa-miR-4487 read count   0
    hsa-miR-4487 read count   0
    remaining read count      0
    exp         ffffffffffffffMMMMMMMMMMMMMMMMMMMffffffffffffffffffffMMMMMMMMMMMMMMMMMMMf
    pri_seq     acuguccuucagccagagcuggcugaagggcagaagggaacuguccuucagccagagcuggcugaagggcaga
    pri_struct  .(((((((((((((((..(((((((((((((((.......)))))))))))))))..))))))))))))))).  #MM
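The multiple-alignment issue can be reproduced with a plain substring search. The sketch below is illustrative only and is not the pipeline's own code: `find_all` is a hypothetical helper, the precursor is the `pri_seq` of hsa-mir-3142 shown above, and the mature sequence is assumed to be the stretch marked `M` in the `exp` line.

```python
def find_all(precursor, mature):
    """Return the 1-based start of every exact (possibly overlapping) match."""
    positions = []
    start = precursor.find(mature)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to 1-based position
        start = precursor.find(mature, start + 1)
    return positions


# pri_seq of hsa-mir-3142 as printed on the slide.
PRI_SEQ = (
    "uucagaaaggccuuucugaaccuucagaaaggcugcugaa"
    "ucuucagaaaggccuuucugaaccuucagaaaggcugcugaa"
)
# Assumption: the mature form is the M-marked span of the exp line.
MATURE = "aaggccuuucugaaccuucaga"

hits = find_all(PRI_SEQ, MATURE)
print(hits)  # two start positions, consistent with "2" in the table above
```

Any precursor with more than one hit needs the manual check against the position reported in miRBase.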
  18. From Stem loop (cont’d)

    •  mir-3142
    •  reported miR-3142 and alternative miR-3142 location
    •  These kinds of miRNAs usually have low expression and are not expressed in our data set.

    u    aaa            cc            --g    a
     ucag   ggccuuucugaa  uucagaaaggcu   cuga u
     ||||   ||||||||||||  ||||||||||||   ||||
     aguc   ucggaaagacuu  aagucuuuccgg   gacu c
    a    --g            cc            aaa    u
  19. Head Into R/Bioconductor

    Get human (hg19) 3’UTR info and their sequences
  20. Install Required Packages

  21. Fetch hg19 All 3’UTR Information

    gene ID conversion

  22. Result
  23. Retrieve Corresponding Sequences

    •  Installation time ~1 hr (downloads the required reference)
    •  Run time ~5 mins

    Full code on https://gist.github.com/ccwang002/5978498

  24. Result
  25. Open in Excel

    •  Takes a while because the file is too big (65,247 columns)