Slide 1

Slide 1 text

Work Log 07/12
HiPipe by NCGM
Novel miRNA Validation
Prof. Motion’s CCRT miRNA
Head Into R/Bioconductor
2013.07 Bioinformatics and Biostatistics Core, NTU Center of Genomic Medicine, National Taiwan University
Slides by Liang Bo Wang
Brand-new slide style ~ pixiv id: 36875076

Slide 2

Slide 2 text

HiPipe by NCGM
Intro
Usage
Performance
http://hipipe.ncgm.sinica.edu.tw/

Slide 3

Slide 3 text

HiPipe
•  View/control a task by its ID
   –  e.g., eaeabdab-0bb5-42ed-9ee9-dd3d4a58a5e2
•  Accepts fastq.gz output by CASAVA
   –  can be paired
•  Total size of a task is limited to 3 TB
•  I can now run SNP detection on a sample
   –  first 10 paired reads of Prof. Chou, human, No94
   –  using GATK to detect SNPs
   –  total process time: approx. 3.5 hr
   –  no SNP found

Slide 4

Slide 4 text


Slide 5

Slide 5 text


Slide 6

Slide 6 text

Results
High Performance Pipelines for NGS Data Analysis
Your task ID: eaeabdab-0bb5-42ed-9ee9-dd3d4a58a5e2
Tools used in this analysis
•  Sequence alignment: bwa-0.6.2-mt
•  Variant detection: GenomeAnalysisTKLite-2.3-9
•  Others: picard-tools-1.74, samtools-0.1.18, IGVTools-2.1.24
References used in this analysis
•  Known indel: dbsnp137
•  Reference sequence: human_g1k_v37
View No94_ACAGTG Results
•  Reference sequence: human_g1k_v37
•  Step 1: Launch IGV in Java (with 750MB, 1.2GB, 2GB, or 10GB RAM)
   –  First-time IGV users: please visit the IGV site for details
•  Step 2: Load data in IGV (View variants, View alignment result)
   –  NOTE: Before clicking the buttons, make sure genome "Human (1kg, b37+decoy)" is selected in the dropdown list in the top left corner of the IGV window. If that option is not available, click "more..." in that list and select "Human (1kg, b37+decoy)" to add it to the list.
Download No94_ACAGTG Results
•  Variants (12KB, VCF format)
•  Alignment result (8.0KB, BAM format)
•  Both (20KB)
Copyright © 2013 National Center for Genome Medicine, Academia Sinica, Taiwan
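The "no SNP found" outcome can be checked against the downloaded VCF by counting its data lines. A minimal Python sketch, assuming a plain-text VCF in which header lines start with `#`; the function name and the sample text are illustrative and are not part of HiPipe:

```python
import io

def count_vcf_records(handle):
    """Count data lines in a VCF stream, skipping header and blank lines."""
    count = 0
    for line in handle:
        if line.strip() and not line.startswith("#"):
            count += 1
    return count

# Hypothetical header-only VCF text, standing in for a run that called no variants.
header_only = io.StringIO(
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
)
n_records = count_vcf_records(header_only)
```

For a real download, the same function can be given an open file handle instead of the in-memory text.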

Slide 7

Slide 7 text

Use a larger sample
•  Select the first 5,000,000 reads
   –  originally No94 has 30,563,931 paired reads
   –  after compression, the total file size is ~800 MB
   –  task ID: 20b45f9a-65ff-476c-8076-11c1b7bbed05
   –  started 7/8 16:30
   –  ended 7/8 18:30 (2 hrs)
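Taking the first 5,000,000 of 30,563,931 reads keeps roughly 16% of the sample. How the subset was actually produced is not stated on the slide; one way to do it is sketched below in Python, assuming standard four-line FASTQ records (no wrapped sequence lines). The function name and file paths are hypothetical:

```python
import gzip
from itertools import islice

def head_fastq_gz(src_path, dst_path, n_reads):
    """Copy the first n_reads records (4 lines each) of a .fastq.gz file.

    Streams line by line, so memory use stays flat.
    Returns the number of complete records written.
    """
    n_lines = 0
    with gzip.open(src_path, "rt") as src, gzip.open(dst_path, "wt") as dst:
        for line in islice(src, 4 * n_reads):
            dst.write(line)
            n_lines += 1
    return n_lines // 4
```

For paired data, calling it with the same `n_reads` on both the R1 and R2 files keeps mates in the same order, provided the two inputs were already in sync.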

Slide 8

Slide 8 text


Slide 9

Slide 9 text

Novel miRNA Validation
What’s our story?

Slide 10

Slide 10 text

What We Have Done
•  Identified miRNA candidates using the breast dataset
   –  containing both normal and various cancer subtypes
•  In silico verification by mapping candidates to the breast and lung datasets
•  We will focus on the breast dataset

Slide 11

Slide 11 text

What We Will Do Next
•  Predict target genes and pathways of these miRNA candidates
   –  we hope to find breast-related target genes and pathways
•  This prediction will be done with the algorithm of 建樂 (a senior labmate)
   –  most prediction algorithms are for known miRNAs only

Slide 12

Slide 12 text

Prof. Motion’s CCRT miRNA
Update miRBase from 19 to 20
Special issues – multiple alignment

Slide 13

Slide 13 text

Dataset Summary
Total Read Count

Slide 14

Slide 14 text

Dataset Summary
Percentage of Reads Mapped to Known miRNAs

Slide 15

Slide 15 text

What Changed in miRBase From 19 to 20
•  For human miRNA
   –  # mature forms: 2245 to 2801
   –  # precursor forms: 1600 to 1872

Slide 16

Slide 16 text

Special Issues – Multiple Alignment
•  A mature miRNA can align to multiple positions on its precursor
•  Solution: check the reported position in miRBase

mature         precursor      # alignments on precursor
hsa-miR-3142   hsa-mir-3142   2
hsa-miR-3673   hsa-mir-3673   3
hsa-miR-4487   hsa-mir-4487   2

Slide 17

Slide 17 text

Special Issues – Multiple Alignment
>hsa-mir-3142
total read count
hsa-miR-3142 read count   0
hsa-miR-3142 read count   0
remaining read count      0
exp         ffffffMMMMMMMMMMMMMMMMMMMMMMffffffffffffffffffffMMMMMMMMMMMMMMMMMMMMMMffffffffffff
pri_seq     uucagaaaggccuuucugaaccuucagaaaggcugcugaaucuucagaaaggccuuucugaaccuucagaaaggcugcugaa
pri_struct  .((((...((((((((((((..((((((((((((.((((....))))...))))))))))))..)))))))))))).)))).    #MM
>hsa-mir-4487
total read count
hsa-miR-4487 read count   0
hsa-miR-4487 read count   0
remaining read count      0
exp         ffffffffffffffMMMMMMMMMMMMMMMMMMMffffffffffffffffffffMMMMMMMMMMMMMMMMMMMf
pri_seq     acuguccuucagccagagcuggcugaagggcagaagggaacuguccuucagccagagcuggcugaagggcaga
pri_struct  .(((((((((((((((..(((((((((((((((.......)))))))))))))))..))))))))))))))).    #MM
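The ambiguity for hsa-mir-3142 can be reproduced by searching the precursor for every occurrence of the mature sequence. A minimal Python sketch: the precursor is the `pri_seq` printed above, and the mature sequence is read off the first run of `M` in the `exp` line, which assumes `M` marks mature positions. The function name is illustrative:

```python
def find_all(precursor, mature):
    """Return every 0-based start at which mature occurs in precursor (overlaps allowed)."""
    starts = []
    pos = precursor.find(mature)
    while pos != -1:
        starts.append(pos)
        pos = precursor.find(mature, pos + 1)
    return starts

# hsa-mir-3142 precursor, copied from the pri_seq line.
PRI_SEQ = (
    "uucagaaaggccuuucugaaccuucagaaaggcugcugaa"
    "ucuucagaaaggccuuucugaaccuucagaaaggcugcugaa"
)
# Assumption: the mature form is the span under the first run of M's
# in the exp line (positions 7-28, 1-based).
MATURE = PRI_SEQ[6:28]

starts = find_all(PRI_SEQ, MATURE)
```

Here `starts` holds two positions, matching the two alignments listed for hsa-miR-3142. Sequence alone cannot pick between them, which is why the reported miRBase position is used.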

Slide 18

Slide 18 text

From Stem loop (cont’d)
•  mir-3142
•  Reported miR-3142 location and alternative miR-3142 location
•  These kinds of miRNAs usually have low expression and are not expressed in our dataset.

u    aaa            cc            --g    a
 ucag   ggccuuucugaa  uucagaaaggcu   cuga u
 ||||   ||||||||||||  ||||||||||||   ||||
 aguc   ucggaaagacuu  aagucuuuccgg   gacu c
a    --g            cc            aaa    u

Slide 19

Slide 19 text

Head Into R/Bioconductor
Get human (hg19) 3’UTR info and sequences

Slide 20

Slide 20 text

Install Required Packages

Slide 21

Slide 21 text

Fetch All hg19 3’UTR Information
•  gene ID conversion

Slide 22

Slide 22 text

Result

Slide 23

Slide 23 text

Retrieve Corresponding Sequences
•  Installation time ~ 1 hr (downloads the required reference)
•  Run time ~ 5 mins
Full code on https://gist.github.com/ccwang002/5978498
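The actual retrieval is done in R/Bioconductor (see the gist). The coordinate logic behind "corresponding sequences" can be illustrated independently. The Python sketch below is not the author's code: it assumes 1-based inclusive exon ranges and uses a toy chromosome string rather than hg19, and all names are hypothetical:

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def utr_sequence(chrom_seq, exons, strand):
    """Join the exon ranges of one 3'UTR and orient the result to the transcript strand.

    exons: list of (start, end) pairs, 1-based and inclusive, on the forward strand.
    """
    pieces = [chrom_seq[start - 1:end] for start, end in sorted(exons)]
    seq = "".join(pieces)
    return reverse_complement(seq) if strand == "-" else seq

# Toy chromosome and coordinates, purely illustrative (not hg19 data).
toy_chrom = "AAACCCGGGTTT"
plus_utr = utr_sequence(toy_chrom, [(4, 6), (10, 12)], "+")
minus_utr = utr_sequence(toy_chrom, [(4, 6), (10, 12)], "-")
```

Minus-strand UTRs are reverse-complemented after joining so that every sequence reads 5' to 3' along its transcript.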

Slide 24

Slide 24 text

Result

Slide 25

Slide 25 text

Open in Excel
•  Takes a while because the file is large (65,247 rows)