Work Log 08/16

Liang Bo Wang
August 15, 2013

Transcript

  1. Work Log 08/16
    •  NGS Cloud Platform Survey
    •  Lung Cancer miRNA Dataset
    •  More on CCRT Analysis
    •  Tutorial Plan
    2013.08 Bioinformatics and Biostatistics Core, NTU Center of Genomic Medicine, National Taiwan University. Slides by Liang Bo Wang
  2. Available NGS Cloud Platforms
    •  GenomeSpace, Broad Inst. http://www.genomespace.org/
    •  DNAnexus, Google https://www.dnanexus.com/
    •  Galaxy, UCSC http://genome.ucsc.edu/
    Survey still in progress …
  3. Lung Cancer Dataset
  4. Collect Public sRNA-Seq Dataset
    •  InSilicoDB (https://insilicodb.org/)
      –  curated datasets
      –  manage and upload one’s own samples
      –  edit samples’ clinical information
      –  share datasets
      –  also hosts public data
  5. Collected Dataset
    •  The InSilicoDB search interface is not designed for broad searches without specific keywords
    •  But it is good for managing one’s own data

    ID:           GSE37764
    Title:        A high dimensional deep sequencing study of non-small cell lung adenocarcinoma in never-smoker Korean females [Seq]
    Sample Size:  24 (6x2N2T)
    Platform:     GAIIx
  6. Collect Public sRNA-Seq Dataset
    •  GEOmetadb
      –  also available for NGS data
      –  filter results by various custom fields
      –  previous results can be re-used
    [Screenshot of the GEOmetadb home page, "GEOmetadb: GEO Microarray Search Tool"]
    What is GEOmetadb? GEOmetadb is an attempt to make access to the metadata associated with the NCBI Gene Expression Omnibus (GEO) samples, platforms, and datasets much more feasible for common biologists and bioinformaticians/statisticians.
    GEOmetadb Paper: Bioinformatics 2008 24(23):2798-2800; doi:10.1093/bioinformatics/btn520.
    Distributions: BioConductor package GEOmetadb; SQLite3 database GEOmetadb.sqlite.gz (176.9 MB, August 10 2013 15:16:31); Matlab GEOtools; FileMaker distribution.
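The kind of filtered search described on this slide can be sketched with Python's built-in `sqlite3` module, assuming the GEOmetadb SQLite file has been downloaded and unpacked. The table and column names used here (`gse`, `gse_gpl`, `gse.title`, `gse_gpl.gpl`) are assumptions about the GEOmetadb schema and should be checked against the real database. The in-memory demo rows are illustrative only; in particular, pairing GSE37764 with GPL10999 is a guess based on the "GAIIx" label on the earlier slide.

```python
import sqlite3


def find_series(conn, gpl_ids, keyword):
    """Return (gse, title, gpl) rows for series on the given platforms
    whose title contains the keyword (SQLite LIKE, case-insensitive for ASCII)."""
    placeholders = ",".join("?" for _ in gpl_ids)
    sql = (
        "SELECT DISTINCT gse.gse, gse.title, gse_gpl.gpl "
        "FROM gse JOIN gse_gpl ON gse.gse = gse_gpl.gse "
        f"WHERE gse_gpl.gpl IN ({placeholders}) AND gse.title LIKE ? "
        "ORDER BY gse.gse"
    )
    return conn.execute(sql, [*gpl_ids, f"%{keyword}%"]).fetchall()


def demo_connection():
    """Build a tiny in-memory stand-in for GEOmetadb.sqlite (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE gse (gse TEXT, title TEXT);
        CREATE TABLE gse_gpl (gse TEXT, gpl TEXT);
        INSERT INTO gse VALUES ('GSE37764',
            'A high dimensional deep sequencing study of non-small cell lung adenocarcinoma in never-smoker Korean females [Seq]');
        INSERT INTO gse VALUES ('GSE_DEMO', 'Hypothetical unrelated series');
        -- Platform assignments below are illustrative assumptions.
        INSERT INTO gse_gpl VALUES ('GSE37764', 'GPL10999');
        INSERT INTO gse_gpl VALUES ('GSE_DEMO', 'GPL10999');
        """
    )
    return conn


if __name__ == "__main__":
    # With the real file, use: conn = sqlite3.connect("GEOmetadb.sqlite")
    conn = demo_connection()
    for gse, title, gpl in find_series(conn, ["GPL10999", "GPL11154"], "lung"):
        print(gse, gpl, title)
```

Keeping the platform list and keyword as parameters makes it easy to rerun the same search with a different filter, which is the "previous results can be re-used" idea in script form.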
  7. GPL Platform to Query
    GEO Accession  Title                         Organism                    GSE Count  GSM Count
    GPL9115        Illumina Genome Analyzer II   Homo sapiens                375        3680
    GPL10999       Illumina Genome Analyzer IIx  Homo sapiens                258        3286
    GPL16791       Illumina HiSeq 2500           Homo sapiens                3          11
    GPL11154       Illumina HiSeq 2000           Homo sapiens                268        3390
    GPL15433       Illumina HiSeq 1000           Homo sapiens                6          10
    GPL15456       Illumina HiScanSQ             Homo sapiens                4          50
    GPL15520       Illumina MiSeq                Homo sapiens                4          9
    GPL10329       Illumina Genome Analyzer      Homo sapiens; Mus musculus  1          2
    GPL16061       Illumina Genome Analyzer IIx  Homo sapiens; Mus musculus  2          10
    GPL17232       Illumina Genome Analyzer Iix  Homo sapiens                1          6
  8. More on CCRT Analysis
  9. We want
    •  Detailed numbers of
      –  how many reads are mapped to miRBase 20?
      –  how many reads are not mapped to miRBase but still map to the genome reference (hg19)?
      –  how many reads are unmapped?
    •  We dropped the temp files of the previous run
      –  requires a re-run of the analysis
      –  verify that the result remains the same
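The three questions above amount to assigning each read to exactly one category. A minimal sketch of that bookkeeping follows, assuming the read IDs that hit miRBase and the read IDs that hit the genome are already available as sets. The function name, the category labels, and the rule that a miRBase hit takes priority are all illustrative choices, not the pipeline actually used for this analysis.

```python
from collections import Counter


def classify_reads(read_ids, mirbase_hits, genome_hits):
    """Count reads per category; a miRBase hit takes priority over a genome hit."""
    counts = Counter({"miRBase": 0, "genome_only": 0, "unmapped": 0})
    for read_id in read_ids:
        if read_id in mirbase_hits:
            counts["miRBase"] += 1
        elif read_id in genome_hits:
            counts["genome_only"] += 1
        else:
            counts["unmapped"] += 1
    return counts


if __name__ == "__main__":
    # Toy read IDs, purely hypothetical.
    reads = ["r1", "r2", "r3", "r4", "r5"]
    counts = classify_reads(reads, mirbase_hits={"r1"}, genome_hits={"r1", "r2", "r3"})
    print(dict(counts))
```

Because every read lands in exactly one category, the three counts always sum to the number of input reads, which gives a quick sanity check when comparing a re-run against the previous result.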
  10. [No transcribed text]
  11.
    New ID       R1          R2          R3          R4
    Sample Name  878T        Eco-Ca-246  838T        818T
    miRBase 20   1,079,369   177,113     484,026     868,084
    Genome       3,515,167   1,499,610   2,283,319   3,634,012
    Unmapped     11,497,325  12,756,920  9,553,949   11,720,879
    Total reads  15,012,492  14,256,530  11,837,268  15,354,891

    New ID       N1          N2          N3          N4
    Sample Name  870T        884T        Eco-Ca-373  65T
    miRBase 20   100,060     1,971,123   309,174     332,712
    Genome       757,564     5,183,532   1,659,204   2,911,938
    Unmapped     7,807,853   15,623,583  17,202,518  11,709,561
    Total reads  8,565,417   20,807,115  18,861,722  14,621,499
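To make the counts above easier to compare across samples, they can be turned into percentages of total reads. The sketch below copies the numbers from the table. One observation from the arithmetic: in every column, Genome + Unmapped equals Total reads, so the miRBase 20 row is apparently not an extra term in the total. How the miRBase 20 count relates to the Genome count is not stated on the slide, so the code only reports each row as a share of the total and does not assume more than that.

```python
# Read counts copied from the table: (miRBase 20, Genome, Unmapped, Total reads)
COUNTS = {
    "R1": (1_079_369, 3_515_167, 11_497_325, 15_012_492),
    "R2": (177_113, 1_499_610, 12_756_920, 14_256_530),
    "R3": (484_026, 2_283_319, 9_553_949, 11_837_268),
    "R4": (868_084, 3_634_012, 11_720_879, 15_354_891),
    "N1": (100_060, 757_564, 7_807_853, 8_565_417),
    "N2": (1_971_123, 5_183_532, 15_623_583, 20_807_115),
    "N3": (309_174, 1_659_204, 17_202_518, 18_861_722),
    "N4": (332_712, 2_911_938, 11_709_561, 14_621_499),
}


def summarize(mirbase, genome, unmapped, total):
    """Express each row as a percentage of total reads and check the column sum."""
    return {
        "mirbase_pct": 100 * mirbase / total,
        "genome_pct": 100 * genome / total,
        "unmapped_pct": 100 * unmapped / total,
        "genome_plus_unmapped_is_total": genome + unmapped == total,
    }


if __name__ == "__main__":
    for sample, row in COUNTS.items():
        s = summarize(*row)
        print(
            f"{sample}: miRBase {s['mirbase_pct']:.1f}%  "
            f"genome {s['genome_pct']:.1f}%  unmapped {s['unmapped_pct']:.1f}%"
        )
```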
  12. Tutorial Plan
  13. Tutorial Plan
    Covers
    •  Python 3 syntax
    •  Python Standard Library
    •  Useful Python packages: IPython, Pandas, Numpy
    If involved in the next project
    •  Markdown, RST documentation
    •  Version Control – Git
  14. $ Start NOW
    initialize environment ... done
    setup fundamental tools ... done
    initialize first mission ... [y/N]?
  15. Server OS Upgrade
  16. It’s time for CentOS 6
    •  Complicated software dependencies
    •  Seriously, the main reason to upgrade the OS is the grandpa-era gcc and glibc versions
    •  They do not provide some essential features
    $ sudo upgrade OS
  17. •  which means we need to use older versions of most software if we stick to CentOS 5.x
    •  It is possible to install newer versions of these libraries, but doing so is not easy, and the dependency tree becomes tangled and hard to maintain.
    $ gcc --version
    gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-54)
    $ ldd --version
    (GNU libc) 2.5
    Copyright (C) 2006 Free Software Foundation, Inc.
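A check like the one above can be scripted so that each server reports whether its toolchain is new enough. The sketch below parses the version strings shown on this slide and compares them with minimum versions. The minimums in `REQUIRED` (gcc 4.4, glibc 2.12) are assumptions chosen for illustration, roughly what a CentOS 6 install is expected to provide; substitute whatever the target software actually needs.

```python
import re

# Hypothetical minimum versions, for illustration only.
REQUIRED = {"gcc": (4, 4), "glibc": (2, 12)}


def parse_version(text):
    """Return the first dotted version number found in text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_new_enough(found, required):
    """Tuple comparison: (4, 1, 2) < (4, 4) and (2, 5) < (2, 12)."""
    return found >= required


if __name__ == "__main__":
    # Version strings as shown on the slide.
    observed = {
        "gcc": "gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-54)",
        "glibc": "(GNU libc) 2.5",
    }
    for name, line in observed.items():
        found = parse_version(line)
        status = "ok" if is_new_enough(found, REQUIRED[name]) else "too old"
        print(name, ".".join(map(str, found)), status)
```

Comparing versions as integer tuples rather than strings matters here: as strings, "2.5" would sort after "2.12".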
  18. Upgrading to CentOS 6.4
    •  Start with an old, less-used machine
      –  maybe 172.16.0.15x
    •  If possible, also upgrade 171 and 173
      –  now is a good time, while the senior master's students are graduating