Slide 1

Slide 1 text

Bioinformatics and Biostatistics Core, NTU Center of Genomic Medicine, National Taiwan University
Work Log 09/13
•  Cloud Service Plan
•  NextBiopy Progress
•  Azure Cloud Experience
2013.09  Slides by Liang Bo Wang

Slide 2

Slide 2 text

Cloud Service Plan
•  Tools Survey
•  Work Diagram
•  Features

Slide 3

Slide 3 text

Overview Diagram
[Diagram: arrows labeled "upload samples", "request data", "return data", "cluster & job status", "submit computing-intensive job", "ask data to process", "provide data"]
Front-end provides:
•  an interface to run & design pipelines
•  monitoring of system status
•  viewing & manipulating analysis results

Slide 4

Slide 4 text

Tool with Hadoop
[Diagram: arrows labeled "cluster & job status", "submit computing-intensive job", "ask data to process", "provide data"]
•  Trial-and-error analysis
•  Heavy computation
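The submit/poll exchange between the tool and the cluster can be sketched in Python. This is a minimal in-process stand-in, assuming a thread pool plays the role of the Hadoop cluster; `ClusterStub`, `submit`, `status`, and the toy `count_gc` job are hypothetical names for illustration, not a Hadoop API.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


class ClusterStub:
    """Hypothetical stand-in for a compute cluster such as Hadoop.

    The front-end submits a job and later polls its status, mirroring the
    "submit computing intensive job" and "cluster & job status" arrows.
    A thread pool plays the cluster's role here.
    """

    def __init__(self, workers=4):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs = {}                  # job id -> Future
        self._ids = itertools.count(1)   # simple increasing job ids

    def submit(self, func, *args):
        """Queue a job and return its id without waiting for it."""
        job_id = next(self._ids)
        self._jobs[job_id] = self._pool.submit(func, *args)
        return job_id

    def status(self, job_id):
        """Return 'running' (queued or executing), 'done', or 'failed'."""
        future = self._jobs[job_id]
        if not future.done():
            return "running"
        return "failed" if future.exception() is not None else "done"

    def result(self, job_id):
        """Block until the job finishes and return its output."""
        return self._jobs[job_id].result()

    def shutdown(self):
        """Wait for all jobs to finish, then release the workers."""
        self._pool.shutdown(wait=True)


def count_gc(seq):
    """Toy 'heavy' job: count G and C bases."""
    return sum(base in "GC" for base in seq)


cluster = ClusterStub(workers=2)
job = cluster.submit(count_gc, "ACGTGGCC")
print(cluster.result(job), cluster.status(job))  # 6 done
cluster.shutdown()
```

Returning a job id immediately keeps the front-end responsive; a real deployment would swap the thread pool for the cluster's own job interface.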

Slide 5

Slide 5 text

Overview Diagram
[Diagram: arrows labeled "upload samples", "request data", "return data"]
•  Genome Browser
•  Play with results

Slide 6

Slide 6 text

Features
•  Graphically interactive
•  Flexible architecture (loosely coupled)
•  Extensible

Slide 7

Slide 7 text

Features – Graphically Interactive
•  Manage samples online
  –  experiment metadata: condition, tissue type, …
  –  grouping: folders, labels
•  Genome browser
  –  interact with results
    •  results shown alongside the reference genome
    •  clicking a record jumps to its region
•  Perform analysis
  –  manipulate results: table filtering, search, …
  –  visualization
    •  static
    •  interactive: HTML5, SVG, D3.js
  –  export results
    •  to Excel
    •  download through a link
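Table filtering, search, jumping to a clicked record's region, and export to Excel can be sketched with the standard library alone. This assumes results are a list of dicts with 1-based `chrom`/`start`/`end` fields and that CSV is an acceptable hand-off to Excel (Excel opens CSV directly; native .xlsx would need a third-party package). All helper names and the toy records are hypothetical.

```python
import csv
import io


def filter_records(records, **criteria):
    """Table filtering: keep records whose fields equal every criterion."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


def search_records(records, field, text):
    """Case-insensitive substring search on one column."""
    text = text.lower()
    return [r for r in records if text in str(r.get(field, "")).lower()]


def region_of(record, pad=0):
    """Locus string a browser could jump to when a record is clicked."""
    start = max(1, record["start"] - pad)  # coordinates assumed 1-based
    return f'{record["chrom"]}:{start}-{record["end"] + pad}'


def export_csv(records, fieldnames):
    """Serialize records as CSV text, which Excel can open directly."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Toy result table (made-up values for illustration).
results = [
    {"name": "g1", "chrom": "chr1", "start": 100, "end": 200},
    {"name": "g2", "chrom": "chr1", "start": 5000, "end": 5600},
    {"name": "g3", "chrom": "chr2", "start": 50, "end": 90},
]

chr1_hits = filter_records(results, chrom="chr1")
print(region_of(chr1_hits[0], pad=50))  # chr1:50-250
print(export_csv(chr1_hits, ["name", "chrom", "start", "end"]))
```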

Slide 8

Slide 8 text

Features – Flexibility
•  Database backend
  –  MySQL, PostgreSQL, MongoDB, Redis
•  Storage
  –  local server
  –  cloud: Amazon S3, Microsoft Azure, Google Cloud
•  Computing cluster
  –  optional (not every lab has one)
  –  implementation not yet decided (possibly a message queue such as ZeroMQ)
  –  Foxconn custom Hadoop cluster
  –  Amazon Elastic MapReduce
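One way to keep storage swappable between a local server and the cloud is a small abstract interface with one subclass per backend, chosen by configuration. The sketch below assumes a minimal `put`/`get` interface; `Storage`, `LocalStorage`, `MemoryStorage`, and `make_storage` are hypothetical names, and cloud backends are only indicated, not implemented.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class Storage(ABC):
    """Hypothetical storage interface: the platform only talks to these calls."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None:
        """Store `data` under `key`, overwriting any previous value."""

    @abstractmethod
    def get(self, key: str) -> bytes:
        """Return the bytes stored under `key`."""


class LocalStorage(Storage):
    """Local-server backend: each key maps to a file under `root`."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, key, data):
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key):
        return (self.root / key).read_bytes()


class MemoryStorage(Storage):
    """In-memory backend (handy in tests). A cloud backend for S3, Azure, or
    Google Cloud would be one more subclass wrapping that vendor's SDK."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = bytes(data)

    def get(self, key):
        return self._blobs[key]


# Backend chosen by configuration; callers never import a vendor SDK directly.
BACKENDS = {"local": LocalStorage, "memory": MemoryStorage}


def make_storage(kind, **options):
    return BACKENDS[kind](**options)


with tempfile.TemporaryDirectory() as tmp:
    store = make_storage("local", root=tmp)
    store.put("sample1/reads.fq", b"@r1\nACGT\n+\nIIII\n")
    print(store.get("sample1/reads.fq").decode().splitlines()[1])  # ACGT
```

The same pattern would apply to the database backend and the optional computing cluster: labs without a cluster simply configure a different implementation.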

Slide 9

Slide 9 text

Flexibility

Slide 10

Slide 10 text

Features – Extensible
•  Provide APIs
  –  for all bindings, e.g., DB, storage, …
  –  for communication with the main platform
    •  so users can set up their own pipelines
•  Provide an SDK (Software Development Kit)
  –  for combining users’ own tools
  –  for running on their own cloud
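A plugin API that lets users combine their own tools into pipelines could be as small as a registry plus a runner. This is a sketch under the assumption that each tool takes one input and returns one output; `TOOLS`, `register_tool`, `run_pipeline`, and the two example tools are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical registry: tool name -> callable taking and returning data.
TOOLS: Dict[str, Callable] = {}


def register_tool(name):
    """Decorator through which users plug their own tools into the platform."""
    def wrap(func):
        if name in TOOLS:
            raise ValueError(f"tool {name!r} is already registered")
        TOOLS[name] = func
        return func
    return wrap


def run_pipeline(steps: List[str], data):
    """Run registered tools in order, feeding each output into the next step."""
    for name in steps:
        data = TOOLS[name](data)
    return data


@register_tool("uppercase")
def uppercase(seq):
    return seq.upper()


@register_tool("reverse_complement")
def reverse_complement(seq):
    # Reverse the sequence, then swap each base for its complement.
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))


print(run_pipeline(["uppercase", "reverse_complement"], "aacg"))  # CGTT
```

Because a pipeline is just a list of registered names, it can be stored, shared, or designed from the front-end without touching the tools' code.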

Slide 11

Slide 11 text

Genome Plots
R Bioconductor
•  ggbio
•  Gviz
•  GenomeGraphs
Others
•  GenomeTools (C, with Python and Lua bindings) http://genometools.org/
Reference:
•  http://www.biostars.org/p/378/
•  http://www.biostars.org/p/18117/

Slide 12

Slide 12 text

Genome Browser
•  ChromoZoom https://github.com/rothlab/chromozoom
•  Scribl (HTML5) http://chmille4.github.io/Scribl/

Slide 13

Slide 13 text

Integrative Platforms
•  GenomeSpace, Broad Institute http://www.genomespace.org/
•  DNAnexus, Google https://www.dnanexus.com/
•  Galaxy http://galaxyproject.org/ (works with the UCSC Genome Browser, http://genome.ucsc.edu/)
•  bcbio-nextgen https://github.com/chapmanb/bcbio-nextgen

Slide 14

Slide 14 text

DNAnexus Experience
•  Price
•  User Interface
•  Workflow

Slide 15

Slide 15 text

DNAnexus Pricing

Slide 16

Slide 16 text

NextBiopy Progress
•  Contribution
•  Development service integration

Slide 17

Slide 17 text

NextBiopy Progress

Slide 18

Slide 18 text

Issue Page

Slide 19

Slide 19 text

Online Documentation
•  Hosted on Read the Docs
•  Auto-generated from the code
  –  hooked to every commit to the repo
  –  regenerated as the code changes
•  Shipped with every version
•  Puts the “documentation in code” practice into effect
•  Also includes a developers’ guide
[Screenshot: NextBiopy 0.0.1-3-g1868dc8 documentation, “NextBiopy: your next bio Python library”, last updated September 11, 2013, built with Sphinx 1.1.3]
Introduction: NextBiopy is a Python package providing basic, fast, and flexible data structures to store file formats widely used in biology. It aims to support the following file formats: FASTA/Q, BAM/SAM (using PySAM), and VCF (using PyVCF). Underneath, it extends numpy and pandas, so it should be easy to import your sequence data into further data analysis.
Contents: Installation (Quick Install, Dependencies), Quick Start, Tutorial, Package API (nextbiopy.core Module), Developers’ Guide (How to Contribute?), FAQ
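“Documentation in code” means usage lives in docstrings: Sphinx’s autodoc extension can render them into the online docs, and Python’s standard `doctest` module can run the embedded examples. The sketch below uses a hypothetical `read_fasta` helper for illustration; it is not part of the NextBiopy API.

```python
import io


def read_fasta(handle):
    """Parse FASTA records from a text handle into (name, sequence) pairs.

    Hypothetical helper for illustration, not part of the NextBiopy API.
    Because the usage example lives in this docstring, the same text can be
    rendered by Sphinx autodoc and checked by doctest.

    >>> read_fasta(io.StringIO(">r1 demo\\nACGT\\nGG\\n>r2\\nTT\\n"))
    [('r1 demo', 'ACGTGG'), ('r2', 'TT')]
    """
    records, name, chunks = [], None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:  # close the previous record
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)  # multi-line sequences are concatenated
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


if __name__ == "__main__":
    import doctest
    doctest.testmod()  # runs the docstring example above
```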

Slide 20

Slide 20 text


Slide 21

Slide 21 text


Slide 22

Slide 22 text

Auto Testing
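Automated testing on each commit relies on a test suite that a CI service can run unattended. A minimal `unittest` sketch follows; `gc_content` is a hypothetical function under test, and a command such as `python -m unittest` is one common way a CI job would discover and run it.

```python
import unittest


def gc_content(seq):
    """Fraction of G/C bases in a sequence (hypothetical function under test)."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


class TestGCContent(unittest.TestCase):
    def test_mixed(self):
        self.assertAlmostEqual(gc_content("ACGT"), 0.5)

    def test_lowercase(self):
        # Input case should not matter.
        self.assertAlmostEqual(gc_content("ggcc"), 1.0)

    def test_empty_rejected(self):
        with self.assertRaises(ValueError):
            gc_content("")
```

A coverage tool run alongside the same suite reports which lines these tests exercise.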

Slide 23

Slide 23 text

Auto Test Coverage

Slide 24

Slide 24 text


Slide 25

Slide 25 text

Upcoming Events

Slide 26

Slide 26 text

[Screenshot: nextbiopy/nextbiopy repository]

Slide 27

Slide 27 text

Microsoft Azure Cloud Experience
•  Introduction
•  Summary

Slide 28

Slide 28 text

Registration
•  Used NTU’s tax ID (not sure why it is needed)

Slide 29

Slide 29 text


Slide 30

Slide 30 text


Slide 31

Slide 31 text


Slide 32

Slide 32 text


Slide 33

Slide 33 text

Starting an instance takes ~5 min

Slide 34

Slide 34 text

Then set up this cloud virtual machine as usual

Slide 35

Slide 35 text

Easy to bring online, no routing configuration needed

Slide 36

Slide 36 text


Slide 37

Slide 37 text

Setting up every instance by hand is tedious
•  Create a system image
•  Using scripts (Azure SDK), one can control many instances at the same time
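Since each instance takes minutes to boot, a script that starts all instances from one system image in parallel waits roughly once rather than once per machine. The sketch below does not use the Azure SDK: `start_instance` is a hypothetical stub standing in for whatever SDK call boots a VM, and the image name is made up; it assumes the provider accepts concurrent start requests.

```python
from concurrent.futures import ThreadPoolExecutor


def start_instance(name, image):
    """Hypothetical stub: a real script would call the cloud SDK here to boot
    a VM named `name` from the system image `image`."""
    return f"{name}:{image}:running"


def start_cluster(prefix, count, image, max_parallel=10):
    """Boot `count` identical instances from one image, several at a time."""
    names = [f"{prefix}-{i:02d}" for i in range(count)]
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # map() keeps results in the same order as `names`.
        return list(pool.map(lambda n: start_instance(n, image), names))


for line in start_cluster("worker", 3, "ngs-image"):
    print(line)
```

A matching `stop_cluster` built the same way makes it harder to forget instances that are still running.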

Slide 38

Slide 38 text

How’s the Price?
•  Looks expensive at first sight

Slide 39

Slide 39 text

How’s the Price? (Cont’d)
•  Actually it is cheap
  –  You won’t run an instance 24/7
    •  unless you are a web server
    •  Azure offers 6-month/12-month prepaid discounts (20% off)
  –  Use it only when you need intensive computing
    •  Example
    •  L (NT$7.5/hr) × 20 instances × 5 hr = NT$750 ≈ NT$800
  –  Remember to turn instances off
    •  The account is linked to your credit card, so don’t be careless
•  Worth a try!
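The example works out to 7.5 × 20 × 5 = NT$750, so NT$800 is a rounded-up estimate. A tiny calculator makes the “don’t run it 24/7” point concrete; it assumes the NT$7.5/hr L-instance rate quoted above and, as an assumption, that the 20% prepaid discount applies directly to the hourly rate.

```python
def batch_cost(rate_per_hour, instances, hours, discount=0.0):
    """Cost of a short compute burst, optionally with a prepaid discount
    (e.g. discount=0.20 for 20% off)."""
    return rate_per_hour * instances * hours * (1 - discount)


def always_on_cost(rate_per_hour, instances, days=30):
    """Cost of leaving the same instances running 24/7 for `days` days."""
    return rate_per_hour * instances * 24 * days


# NT$7.5/hr L instance, 20 instances, 5 hours.
print(batch_cost(7.5, 20, 5))       # 750.0
print(always_on_cost(7.5, 20))      # 108000.0
print(round(batch_cost(7.5, 20, 5, discount=0.20), 2))  # 600.0
```

The same 20 instances left on for a month would cost over 100 times more than the 5-hour burst, which is why turning them off matters.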

Slide 40

Slide 40 text

Summary
•  CentOS 6.3 fails; needs some hacks
  –  switched to Ubuntu 13.04 temporarily
  –  solution
    •  build the VM locally on Windows Server (.VHD)
    •  upload the VHD image file to the server
•  Seems cheap, and I have some quota to play around with
  –  $6,300 for the 1-month trial
    •  though Google gives me $63,000 for its trial
  –  try Google Cloud Platform later

Slide 41

Slide 41 text

FYI – Data Visualization
•  http://www.visualisingdata.com/index.php/2013/09/essential-resources-programming-languages-toolkits-and-libraries/

Slide 42

Slide 42 text
