Work Log 06/13

Liang Bo Wang
June 12, 2014

Transcript

  1. Bioinformatics and Biostatistics Core, NTU Center of Genomic Medicine, National Taiwan University

    Work Log 06/12
    Speaker: Liang Bo Wang
    2014.06 Slides by Liang Bo Wang
  2. BioCloud

    •  Architecture overview
    •  Development/project progress
    •  Possible paper/poster submissions
  3. Architecture overview

    Technical detail
    •  A VM for each user
    •  or a Hadoop cluster … we don't care which
    •  Communicate through a defined API

    Web Frontend
    •  Report generator (our part)
    •  User/analysis management (FXN)
  4. Workflow (viewed by function)

    Specifically, this is the part we are working on
  5. Objectives for report generator

    •  From the viewpoint of either an NGS service provider or a web developer, the report generator should
      –  Generate a static, local, portable analysis report for the service user
      –  Show a summary report on the web after a submitted job finishes
    •  Therefore our generator first takes local file input and produces a local report
    •  The report is then hosted on the web (basically)
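
The "local file in, local report out" objective can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tab-separated input layout, the function names `render_report` and `build_report`, and the page template are all hypothetical and are not taken from the actual generator.

```python
import html
from pathlib import Path
from string import Template

# Hypothetical page template; the real generator's layout is not shown in the slides.
PAGE = Template("""<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>$title</title></head>
<body>
<h1>$title</h1>
<table>
$rows
</table>
</body>
</html>
""")


def render_report(title, records):
    """Render (name, value) pairs as one self-contained HTML page."""
    rows = "\n".join(
        "<tr><td>{}</td><td>{}</td></tr>".format(
            html.escape(str(name)), html.escape(str(value))
        )
        for name, value in records
    )
    return PAGE.substitute(title=html.escape(title), rows=rows)


def build_report(input_path, output_path, title="Analysis report"):
    """Read a tab-separated local result file and write a static HTML report."""
    records = []
    for line in Path(input_path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, value = line.partition("\t")
        records.append((name, value))
    Path(output_path).write_text(render_report(title, records), encoding="utf-8")
    return output_path
```

Because the output is a single HTML file with no server-side dependency, the same file can be handed to the service user or hosted on the web unchanged.
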
  6. Manual for report generator

    •  A manual for result interpretation
    •  Use Sphinx for manual generation
      –  Converts plain text (reStructuredText, rst) into HTML pages
      –  Easier to maintain than Word
    •  How, by whom, and when will all the contents be filled in?

    Link to detailed manual page
  7. How Sphinx works

    HTML pages are generated by Sphinx and docutils from RST files
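
As a rough illustration of the Sphinx setup, a project needs little more than a `conf.py` next to its RST sources. The sketch below is a minimal configuration under assumptions: the project title and author strings are placeholders, and the theme choice is arbitrary. With this file and an `index.rst` in the same directory, `sphinx-build -b html <sourcedir> <outdir>` produces the HTML pages.

```python
# conf.py: minimal Sphinx configuration (all values are illustrative placeholders)

project = "Report Generator Manual"               # hypothetical manual title
author = "Bioinformatics and Biostatistics Core"  # hypothetical author string

master_doc = "index"     # root document, i.e. index.rst
source_suffix = ".rst"   # sources are reStructuredText files
extensions = []          # no Sphinx extensions are needed for a plain manual
html_theme = "alabaster" # a theme bundled with recent Sphinx releases
```
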
  8. Development progress

    •  Result page status for two pipelines
      –  Tuxedo: Cufflinks and Cuffdiff remain
      –  VarScan: almost done, changing the library to jsGrid
    •  STAR and GATK are still in progress
    •  Rewriting the generator to reuse the same result subpage, such as FastQC, TopHat or BWA
    •  Writing the parser for real result data (generated last week)
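
A result parser for a reusable subpage might look like the sketch below, which reads the `summary.txt` file that FastQC writes (one tab-separated line per module: status, module name, input file). The function names are hypothetical and the example text is made up for illustration; this is not the parser described on the slide.

```python
def parse_fastqc_summary(text):
    """Map each FastQC module name to its status (PASS, WARN or FAIL)."""
    statuses = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        status, module = fields[0], fields[1]
        statuses[module] = status
    return statuses


def flagged_modules(statuses):
    """Return the modules that did not pass, in input order."""
    return [module for module, status in statuses.items() if status != "PASS"]


# Made-up example in the summary.txt layout.
example = (
    "PASS\tBasic Statistics\tsample.fastq\n"
    "WARN\tPer base sequence quality\tsample.fastq\n"
    "FAIL\tOverrepresented sequences\tsample.fastq\n"
)

statuses = parse_fastqc_summary(example)
print(flagged_modules(statuses))
# prints ['Per base sequence quality', 'Overrepresented sequences']
```

Keeping the parser separate from the page rendering is what lets the same FastQC subpage be reused by every pipeline that runs FastQC.
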
  9. Project progress

    •  Co-IP contract modification
      –  Received review (draft) from NTU consultant
      –  Expect to get advice on contract modification today
    •  Midterm report (due Jun. 27)
      –  Received template from Dr. Dai
      –  Covers the NGS pipelines in use
      –  Reuse the content in the manual for the result page
      –  Most people here are expected to be involved
  10. •  DNA-Seq pipeline documentation and script request from FXN
        –  Granted?
      •  First poster at APCMBE (亞太醫工年會, the Asia-Pacific biomedical engineering annual conference) (due Jun. 22)
        –  Subject: NGS data reading and QC processing
        –  Python package Nextbiopy
        –  With an example use case
        –  ARI co-author?
      •  Finished a survey of further poster submission opportunities
  11. Submission: GIW / ISCB Asia 2014

    •  Dec 15-17, Tokyo (http://www.jsbi.org/giw2014)
      –  ISCB = International Society for Computational Biology
      –  GIW = Genome Informatics
    •  Accepted proceedings papers appear in journals such as Bioinformatics, BMC Genomics, JBCB and so on
    •  Deadlines
      –  Jul 7: paper/oral
      –  Aug 25: poster submission
  13. Submission: AYRCOB

    •  Jan 19-20, 2015, Hsinchu (http://2015.ayrcob.org/)
      –  AYRCOB = Asian Young Researchers Conference on Computational and Omics Biology
    •  Jul 31 submission deadline (not sure whether for posters or papers)
    •  Not sure about the date of the acceptance announcement
    •  Too late?
  14. Future Plan

    •  Make sure the midterm report meets the deadline
      –  Fill in the content collaboratively
    •  Continue development of the report generator / result parser
    •  Write the abstract for the APCMBE poster
    •  Set up the structure of the report manual
  15. MRSA Research

    •  Possible goal
    •  Available large datasets in lab
    •  ICGC related cancer project intro
  16. Goal

    •  Identify a diagnosis that humans or current technology can do well but that is difficult to scale
      –  Ex. Pathology analysis of biopsies
      –  Ex. Somatic mutations confirmed to lead to cancer, based on SNP microarrays
    •  Boost the prediction rate or speed up the prediction process by
      –  Distributed computation
      –  Multiple instance learning over multiple sources of data
  17. Multiple Instance Learning

    •  A family of algorithms in which Microsoft has held a leading position for years, used in video pattern recognition and linguistic analysis
    •  Requires data on the same observation from multiple sources
      –  Multiple sources of data (SNP, CNV, RNA-Seq, ChIP-Seq data)
      –  Large data size for model training (this will be a complex model anyway)
    •  Asked whether the lab has such data sets (>100 samples)
      –  Reply: the NGS data has a small sample size, but not sure about the microarray data
      –  Better if accompanied by clinical data (concern about privacy issues)
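
The data requirement above (the same observation present in every source, with more than 100 samples) can be checked before any modelling starts. The sketch below is not a multiple instance learning algorithm; it only finds the samples shared by all sources. The function names and the toy sample IDs are hypothetical.

```python
def samples_in_all_sources(sources):
    """Return the sorted sample IDs that appear in every data source.

    sources maps a source name (e.g. "SNP") to a dict of sample ID -> features.
    """
    id_sets = [set(table) for table in sources.values()]
    if not id_sets:
        return []
    return sorted(set.intersection(*id_sets))


def has_enough_samples(sources, minimum=100):
    """True when strictly more than `minimum` samples are shared by all sources."""
    return len(samples_in_all_sources(sources)) > minimum


# Toy data: only S2 and S3 were measured by all three sources.
toy_sources = {
    "SNP": {"S1": [0, 1], "S2": [1, 1], "S3": [0, 0]},
    "CNV": {"S2": [2.0], "S3": [1.5]},
    "RNA-Seq": {"S2": [10.2], "S3": [8.7], "S4": [3.1]},
}

print(samples_in_all_sources(toy_sources))
# prints ['S2', 'S3']
```
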
  18. Workaround for large datasets

    •  Search for large public datasets, but finding multi-source data in public is hard
    •  Take a look at projects like TCGA
      –  Data policy has changed
      –  Level I/II data requires an application for data access
    •  Anyway, after surveying such datasets, here is a summary of cancer genomics projects
  21. The following content is directly extracted from slides made by bioinformatics.ca, which are shared under the CC BY 2.5 license
  22. ICGC BAM/FASTQ TCGA BAM/FASTQ ICGC Open Data (includes

  23. Module 1: Cancer Genomic Databases (bioinformatics.ca)

    ICGC Map – November 2013
  24. ICGC datasets to date

    [Figure: ICGC Data Portal Cumulative Donor Count for Member Projects. x-axis: Dec 2011 to Sept 2013, Release 7 to Release 14; y-axis: Number of Donors, 1000 to 10,000. Credit: Hardeep Nahal]
  25. ICGC dataset version 14

    •  Cancer types: 41
    •  Donors: 8,532 (18,056 specimens)
    •  Simple somatic mutations: 1,995,134
    •  Copy number mutations: 18,526,593
    •  Structural rearrangements: 18,614
    •  Genes affected* by simple somatic mutations: 22,074
    •  Genes affected* by non-synonymous coding mutations: 19,150
    •  Genes affected* by copy number mutations: 20,341
    •  Genes affected* by structural rearrangements: 1,884
    •  *out of 22,259 protein coding genes annotated in Ensembl Human release 69
    •  Open tier and controlled data currently available
  26. End of the extraction
  29. ICGC Data Portal: Lung Cancer - KR

    Summary
    •  Code: LUSC-KR
    •  Name: Lung Cancer - KR
    •  Primary Site: Lung
    •  Tumour Type: Lung cancer
    •  Tumour Subtype: Squamous cell carcinoma
    •  Countries: South Korea
    •  Total number of donors: 111
    •  Experimental Analyses: WXS, 111 samples from 111 donors
    •  Raw data is available at the European Genome-phenome Archive; an approved data access request is required

    Available Data Types
    •  Clinical Data: 111 donors
    •  Simple Somatic Mutations (SSM): 111 donors
    •  Copy Number Somatic Mutations (CNSM): --
    •  Structural Somatic Mutations (StSM): --
    •  Simple Germline Variants (SGV): --
    •  Array-based DNA Methylation (METH-A): --
    •  Sequence-based DNA Methylation (METH-S): --
    •  Array-based Gene Expression (EXP-A): --
    •  Sequence-based Gene Expression (EXP-S): --
    •  Protein Expression (PEXP): --
    •  Sequence-based miRNA Expression (miRNA): --
    •  Exon junction (JCN): --
  30. Most Frequently Mutated Genes (LUSC-KR)

    Showing 10 of 27,347 genes. Each entry lists symbol, name, location, type, donors affected across all projects, and number of mutations.
    •  TTN, titin, chr2:179390716-179695529, protein_coding, 2,122 / 6,590 (32.20%), 189 mutations
    •  TTN-AS1, TTN antisense RNA 1, chr2:179385910-179639402, antisense, 2,029 / 6,590 (30.79%), 178 mutations
    •  TP53, tumor protein p53, chr17:7565097-7590856, protein_coding, 2,020 / 6,590 (30.65%), 59 mutations
    •  SNHG14, small nucleolar RNA host gene 14 (non-protein coding), chr15:25223730-25664609, processed_transcript, 778 / 6,590 (11.81%), 92 mutations

    Donors affected in LUSC-KR, as listed: 61 / 111 (54.95%), 60 / 111 (54.05%), 57 / 111 (51.35%), 53 / 111 (47.75%)

    [Chart: % of Donors Affected, 0 to 75, for TTN, TTN-AS1, TP53, SNHG14, RYR2, USH2A, MUC16, ZFHX4, MT-CO1, CSMD3]
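
The "donors affected" figures are a count of affected donors over the total number of donors, with the percentage in parentheses. A short sketch of that arithmetic follows; the function name `donors_affected` is a hypothetical helper, not part of the ICGC portal.

```python
def donors_affected(affected, total):
    """Format a donor count as 'affected / total (percent%)'."""
    percent = 100.0 * affected / total
    return "{:,} / {:,} ({:.2f}%)".format(affected, total, percent)


print(donors_affected(61, 111))      # prints 61 / 111 (54.95%)
print(donors_affected(2122, 6590))   # prints 2,122 / 6,590 (32.20%)
```
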
  31. Most Frequent Mutations (LUSC-KR)

    Showing 10 of 60,288 mutations. Each entry lists ID, DNA change, type, consequences, and donors affected across all projects.
    •  MU5219, chr3:g.178936091G>A, single base substitution, Missense: PIK3CA E545K; Upstream: PIK3CA, 144 / 6,590 (2.19%)
    •  MU24637, chr17:g.7577120C>T, single base substitution, Missense: TP53 R141H, R273H; NC Exon: TP53; Upstream: TP53; Downstream: TP53; Intron: TP53, 72 / 6,590 (1.09%)
    •  MU5286, chr17:g.7577121G>A, single base substitution, Missense: TP53 R273C, R141C; NC Exon: TP53, 65 / 6,590 (0.99%)

    Donors affected in LUSC-KR, as listed: 144 / 111 (129.73%), 72 / 111 (64.86%), 65 / 111 (58.56%)

    [Chart: Donors affected, 0 to 200, for MU5219, MU24637, MU5286, MU55099, MU64353, MU69856, MU17943, MU67642, MU66992, MU64201]
  33. Misc.

    •  VM server tool update
    •  Recap of last week's paper reading
  34. Misc.

    •  Debian Jessie VM image added
    •  RDP connection (Windows Remote Desktop) is now possible
    •  FreeNX -> X2Go
      –  FreeNX is outdated
      –  X2Go is based on the NoMachine NX3 protocol (2 concurrent connection limit?)
      –  Some connection latency and failures encountered
      –  Still resolving problems
  35. Misc. (cont’d)

    •  Recap of last week's paper
      –  Depending on the event outcome, gene features can be more useful
    •  CCRT miRNA reanalysis
      –  Find differentially expressed miRNAs under different conditions
      –  Still discussing methods
  36. Thank You!

    Q&A Time