Work Log 09/13

Liang Bo Wang
September 12, 2013

Transcript

  1. Bioinformatics and Biostatistics Core, NTU Center of Genomic Medicine, National Taiwan University
    Work Log 09/13: Cloud Service Plan, NextBiopy Progress, Azure Cloud Experience
    2013.09, Slides by Liang Bo Wang
  2. Cloud Service Plan: Tools Survey, Work Diagram, Features
  3. Overview Diagram
    (Diagram labels: upload samples; request data / return data; submit computing-intensive job / cluster & job status; ask data to process / provide data)
    Front-end provides:
    • an interface to run & design pipelines
    • monitoring of system status
    • viewing & manipulation of analysis results
  4. Tool with Hadoop
    (Diagram labels: submit computing-intensive job / cluster & job status; ask data to process / provide data; Trial-and-Error Analysis; Heavy Computation)
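
The job flow in these diagrams can be sketched in-process. The front-end submits a computing-intensive job, and the cluster reports job status back. This is a minimal sketch that uses Python's standard `queue` and `threading` modules as a stand-in for the message-queue layer. The slides list ZeroMQ for that layer but note the implementation is undecided, and a Hadoop cluster is the other option. `JobBroker`, `submit`, `status`, `result`, and `wait_all` are hypothetical names, not a planned API.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Job:
    job_id: int
    task: Callable[[], Any]
    status: str = "queued"  # queued -> running -> done / failed
    result: Any = None


class JobBroker:
    """Hypothetical in-process stand-in for the front-end <-> cluster link."""

    def __init__(self, n_workers: int = 2) -> None:
        self._queue: "queue.Queue[Job]" = queue.Queue()
        self._jobs: Dict[int, Job] = {}
        self._lock = threading.Lock()
        self._next_id = 0
        # Worker threads play the role of cluster nodes pulling jobs.
        for _ in range(n_workers):
            threading.Thread(target=self._work, daemon=True).start()

    def submit(self, task: Callable[[], Any]) -> int:
        """Front-end side: enqueue a job and return its id."""
        with self._lock:
            job = Job(self._next_id, task)
            self._jobs[job.job_id] = job
            self._next_id += 1
        self._queue.put(job)
        return job.job_id

    def status(self, job_id: int) -> str:
        """Front-end side: poll the status shown in the monitoring view."""
        with self._lock:
            return self._jobs[job_id].status

    def result(self, job_id: int) -> Any:
        with self._lock:
            return self._jobs[job_id].result

    def wait_all(self) -> None:
        """Block until every submitted job has finished or failed."""
        self._queue.join()

    def _work(self) -> None:
        while True:
            job = self._queue.get()
            with self._lock:
                job.status = "running"
            try:
                value = job.task()
                with self._lock:
                    job.result, job.status = value, "done"
            except Exception:
                with self._lock:
                    job.status = "failed"
            finally:
                # Marked done only after the status is final, so wait_all() is safe.
                self._queue.task_done()


if __name__ == "__main__":
    broker = JobBroker()
    ids = [broker.submit(lambda n=n: sum(range(n))) for n in (10, 1000)]
    broker.wait_all()
    print([(broker.status(i), broker.result(i)) for i in ids])
```

In a real deployment the queue would cross machines, and the status dictionary would live in the platform's database rather than in memory.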
  5. Overview Diagram
    (Diagram labels: upload samples; request data / return data; Genome Browser; Play with Result)
  6. Features
    • Graphically interactive
    • Flexible architecture (loosely coupled)
    • Extensible
  7. Features – Graphically Interactive
    • Manage samples online
      – experiment metadata: condition, tissue type, …
      – grouping: folders, labels
    • Genome browser
      – interact with results
        • results shown along with the reference genome
        • jump to the corresponding region when a record is clicked
    • Perform analysis
      – manipulate results: table filtering, search, …
      – visualization
        • static
        • interactive: HTML5, SVG, D3.js
      – export results
        • to Excel
        • download through a link
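
The "manipulate results" and "export results" items might look like the following sketch. It filters a result table by a numeric threshold, runs a case-insensitive search across all columns, and writes CSV text that Excel can open. It assumes results are rows of strings keyed by column name. The function names and the toy gene rows are made up for illustration. A real front-end might use pandas or write a native .xlsx file instead.

```python
import csv
import io
from typing import Dict, List

Row = Dict[str, str]


def filter_rows(rows: List[Row], column: str, min_value: float) -> List[Row]:
    """Keep rows whose numeric `column` is at least `min_value`."""
    return [row for row in rows if float(row[column]) >= min_value]


def search_rows(rows: List[Row], term: str) -> List[Row]:
    """Case-insensitive substring search over every column."""
    needle = term.lower()
    return [row for row in rows if any(needle in value.lower() for value in row.values())]


def export_csv(rows: List[Row], fieldnames: List[str]) -> str:
    """Serialize rows as CSV text (Excel opens .csv files directly)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Toy rows for illustration only.
    results = [
        {"gene": "TP53", "chrom": "chr17", "score": "0.92"},
        {"gene": "BRCA1", "chrom": "chr17", "score": "0.40"},
        {"gene": "EGFR", "chrom": "chr7", "score": "0.75"},
    ]
    high = filter_rows(results, "score", 0.7)
    print([row["gene"] for row in high])                          # ['TP53', 'EGFR']
    print([row["gene"] for row in search_rows(results, "brca")])  # ['BRCA1']
    print(export_csv(high, ["gene", "chrom", "score"]))
```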
  8. Features – Flexibility
    • Database backend
      – MySQL, PostgreSQL, MongoDB, Redis
    • Storage
      – Local server
      – Cloud: Amazon S3, Microsoft Azure, Google Cloud
    • Computing cluster
      – optional (not all labs have one)
      – implementation not yet decided (candidate message queue: ZeroMQ)
      – Foxconn custom Hadoop cluster
      – Amazon Elastic MapReduce
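
The loosely coupled design on this slide amounts to programming against an interface and choosing the backend by configuration. Below is a minimal sketch of that idea for storage, built on a hypothetical `Storage` interface with `put` and `get`. Only a local-disk backend and an in-memory backend are shown. Cloud backends (Amazon S3, Azure, Google Cloud) would be further subclasses wrapping their official SDKs, which are not reproduced here.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Dict, Type


class Storage(ABC):
    """Hypothetical storage interface the rest of the platform codes against."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class LocalStorage(Storage):
    """Stores each key as a file under a root directory on the local server."""

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key: str) -> str:
        # No path sanitizing here; a real backend should reject keys like "../x".
        return os.path.join(self.root, key)

    def put(self, key: str, data: bytes) -> None:
        path = self._path(key)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as handle:
            handle.write(data)

    def get(self, key: str) -> bytes:
        with open(self._path(key), "rb") as handle:
            return handle.read()


class MemoryStorage(Storage):
    """In-memory backend, handy for tests."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


BACKENDS: Dict[str, Type[Storage]] = {"local": LocalStorage, "memory": MemoryStorage}


def make_storage(kind: str, **options: str) -> Storage:
    """Pick a backend by name, e.g. from a config file."""
    return BACKENDS[kind](**options)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for store in (make_storage("local", root=tmp), make_storage("memory")):
            store.put("sample1/reads.fq", b"@r1\nACGT\n+\nIIII\n")
            print(type(store).__name__, store.get("sample1/reads.fq")[:3])
```

The same pattern would apply to the database and computing-cluster bindings: callers only see the interface, so a lab without a cluster simply configures a different backend.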
  9. Flexibility
  10. Features – Extensible
    • Provide an API
      – for all bindings, e.g., DB, storage, …
      – for communication with the main platform
        • so users can set up their own pipelines
    • Provide an SDK (Software Development Kit)
      – for combining users’ own tools
      – for users’ own clouds
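
One way to let users set up their own pipelines and combine their own tools is a registry that third-party code can add steps to. The sketch below uses a hypothetical decorator-based registry (`register_step`, `run_pipeline`) in which each step takes and returns a list of sequences. It illustrates the extension point only and is not a planned platform or NextBiopy API.

```python
from typing import Callable, Dict, List

Step = Callable[[List[str]], List[str]]
_REGISTRY: Dict[str, Step] = {}


def register_step(name: str) -> Callable[[Step], Step]:
    """Decorator that makes a user-supplied function available as a pipeline step."""

    def decorator(func: Step) -> Step:
        if name in _REGISTRY:
            raise ValueError(f"step {name!r} already registered")
        _REGISTRY[name] = func
        return func

    return decorator


def run_pipeline(step_names: List[str], data: List[str]) -> List[str]:
    """Run the named steps in order, feeding each output into the next step."""
    for name in step_names:
        if name not in _REGISTRY:
            raise KeyError(f"unknown step {name!r}")
        data = _REGISTRY[name](data)
    return data


# Example "user tools" plugged in through the registry.
@register_step("uppercase")
def uppercase(seqs: List[str]) -> List[str]:
    return [s.upper() for s in seqs]


@register_step("drop_short")
def drop_short(seqs: List[str]) -> List[str]:
    return [s for s in seqs if len(s) >= 4]


if __name__ == "__main__":
    print(run_pipeline(["uppercase", "drop_short"], ["acgt", "ac", "ggcca"]))  # ['ACGT', 'GGCCA']
```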
  11. Genome Plots
    R Bioconductor
    • ggbio
    • Gviz
    • GenomeGraphs
    Others
    • GenomeTools (C, with Python and Lua bindings) http://genometools.org/
    Reference:
    • http://www.biostars.org/p/378/
    • http://www.biostars.org/p/18117/
  12. Genome Browser
    • ChromoZOOM https://github.com/rothlab/chromozoom
    • Scribl (HTML5) http://chmille4.github.io/Scribl/
  13. Integrative Platform
    • GenomeSpace, Broad Inst. http://www.genomespace.org/
    • DNAnexus, Google https://www.dnanexus.com/
    • Galaxy, UCSC http://genome.ucsc.edu/
    • bcbio-nextgen https://github.com/chapmanb/bcbio-nextgen
  14. DNAnexus Experience: Price, User Interface, Workflow
  15. DNAnexus Pricing
  16. NextBiopy Progress: Contribution, Development, Service Integration
  17. NextBiopy Progress
  18. Issue Page
  19. Online Documentation
    • Hosted on Read the Docs
    • Auto-generated from the code
      – hooked to every commit to the repo
      – regenerated whenever the code changes
    • Shipped with every version
    • Puts "documentation in code" into practice
    • Also includes a developer’s guide
    (Screenshot: "NextBiopy: your next bio Python library", NextBiopy 0.0.1-3-g1868dc8 documentation, last updated September 11, 2013, created using Sphinx 1.1.3.)
    Introduction: NextBiopy is a Python package providing basic, fast, and flexible data structures to store file formats widely used in biology. It aims to support the following file formats:
      – FASTA/Q
      – BAM/SAM (using PySAM)
      – VCF (using PyVCF)
    Underneath, it extends numpy and pandas, so it should be easy to import your sequence data into further data analysis.
    Contents: Installation (Quick Install, Dependencies), Quick Start, Tutorial, Package API (nextbiopy.core Module), Developers’ Guide (How to Contribute?), FAQ
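
FASTA/Q is the first format on NextBiopy's list. Below is a minimal pure-Python sketch of reading four-line FASTQ records, with a doctest in the docstring in the spirit of "documentation in code". It is not NextBiopy's actual API, which according to the slide builds on numpy and pandas. `FastqRecord`, `read_fastq`, and `phred_scores` are hypothetical names. The sketch assumes unwrapped records and Phred+33 quality encoding.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from an unwrapped (4 lines per record) FASTQ stream.

    >>> records = list(read_fastq(io.StringIO("@r1\\nACGT\\n+\\nII#I\\n")))
    >>> records[0].name, records[0].sequence
    ('r1', 'ACGT')
    """
    while True:
        header = handle.readline()
        if not header:  # end of stream
            return
        header = header.rstrip("\r\n")
        if not header:  # tolerate blank lines between records
            continue
        sequence = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a quality string, assuming Phred+33 unless told otherwise."""
    return [ord(char) - offset for char in quality]


if __name__ == "__main__":
    import doctest

    doctest.testmod()  # the docstring example doubles as a test
    text = "@r1\nACGT\n+\nII#I\n@r2\nGG\n+\n!I\n"
    for record in read_fastq(io.StringIO(text)):
        print(record.name, record.sequence, phred_scores(record.quality))
```

Sphinx's autodoc can render such docstrings on Read the Docs, and the same examples can run under the automated tests.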
  22. Auto Testing
  23. Auto Test Coverage
  25. Upcoming Event
  26. (Screenshot: nextbiopy/nextbiopy)
  27. Microsoft Azure Cloud Experience: Introduction, Summary
  28. Registration
    • Used NTU’s tax ID (not sure why it is needed)
  33. Starting an instance takes ~5 min
  34. Then set up the cloud virtual machine as usual
  35. Easy to get online; no routing configuration needed
  37. Tedious to set up all instances
    • Create a system image
    • Using scripts (Azure SDK), one can control many instances at the same time
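
Controlling many instances from one script mostly means fanning the same call out in parallel and collecting a result for each instance. The sketch below uses a hypothetical `start_instance(name)` placeholder in the spot where a real script would call the Azure SDK; the actual SDK calls are not shown. It runs the calls on the standard `concurrent.futures` thread pool and records failures per instance instead of aborting the whole batch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def start_instance(name: str) -> str:
    """Hypothetical placeholder for a cloud SDK call that boots one VM."""
    return "started"


def run_on_all(action: Callable[[str], str], names: List[str], max_workers: int = 8) -> Dict[str, str]:
    """Apply `action` to every instance name concurrently; per-instance errors are recorded, not raised."""

    def safe(name: str) -> str:
        try:
            return action(name)
        except Exception as exc:  # keep going if one instance fails
            return f"failed: {exc}"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with names.
        return dict(zip(names, pool.map(safe, names)))


if __name__ == "__main__":
    workers = [f"worker-{i:02d}" for i in range(20)]
    status = run_on_all(start_instance, workers)
    print(len(status), status["worker-00"])  # 20 started
```

Threads suit this job because each SDK call spends most of its time waiting on the network; since one boot takes about 5 minutes, parallelism matters.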
  38. How’s the Price?
    • Looks expensive at first sight
  39. How’s the Price? (Cont’d)
    • Actually it is cheap
      – You won’t run an instance 24/7
        • unless it is a web server
        • Azure has a 6-month/12-month pre-paid discount (20% off)
      – Use it when intensive computing is required
        • Example: L instance (NT$7.5/hr) × 20 instances × 5 hr = NT$750
      – Remember to turn instances off
        • The account is linked to your credit card, so don’t be careless
    • Worth a try!
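
The pricing example is a single multiplication, so a small helper makes it easy to redo for other cluster sizes. For the slide's numbers it gives 7.5 × 20 × 5 = 750. Like the slide, it assumes a flat hourly rate per instance. The 20% pre-paid discount is applied as a plain multiplier on that rate, which is a simplifying assumption rather than how prepaid plans were actually billed. `cluster_cost` is a hypothetical name.

```python
def cluster_cost(rate_per_hour: float, instances: int, hours: float, discount: float = 0.0) -> float:
    """Total cost of running `instances` machines for `hours` each.

    `discount` is a fraction (0.20 for 20% off), applied uniformly as an assumption.
    """
    if rate_per_hour < 0 or instances < 0 or hours < 0:
        raise ValueError("rate, instances and hours must be non-negative")
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return rate_per_hour * instances * hours * (1.0 - discount)


if __name__ == "__main__":
    print(cluster_cost(7.5, 20, 5))                 # 750.0 (NT$, L instances)
    print(cluster_cost(7.5, 20, 5, discount=0.20))  # 600.0 with the 20% pre-paid discount
    print(cluster_cost(7.5, 1, 24 * 30))            # 5400.0: one L instance left on for 30 days
```

The last line shows why turning instances off matters: forgetting a single instance for a month costs several times the whole 20-instance batch job.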
  40. Summary
    • CentOS 6.3 fails and needs some hacks
      – Switched to Ubuntu 13.04 temporarily
      – Solution
        • Build the VM locally on Windows Server (.VHD)
        • Upload the VHD image file to the server
    • Seems cheap, and I have some quota to play around with
      – $6,300 for the 1-month trial
        • though Google gives me $63,000 for its trial
      – Try Google Cloud Platform later
  41. FYI – Data Visualization
    • http://www.visualisingdata.com/index.php/2013/09/essential-resources-programming-languages-toolkits-and-libraries/