WQ-MAKER-PAG2017

This presentation was given at the Next-Generation Sequencing and Annotation workshop at PAG 2017.

Upendra Kumar Devisetty

January 14, 2017

Transcript

  1. Overview of CyVerse

    Vision: Transforming science through data-driven discovery
    Mission: Design, develop, deploy, and expand a national cyberinfrastructure for life science research, and train scientists
    Funding: National Science Foundation
    Usage: More than 38K users, PBs of data, and hundreds of publications, courses, and discoveries
    http://www.cyverse.org/
  2. Atmosphere Overview: Cloud Computing

    Get it Done, Reproducibility, Productivity
    •  Work in on-demand Linux environments
    •  Collaborate with students and colleagues on the same instance
    •  Overcome usability challenges of cloud platforms
    •  Multicore, high-memory images to run multithreaded applications
    •  Move your analyses from your laptop to the cloud
    •  Make data, workflows, and analyses available in a public image
    •  Access previous software versions and images
  3. Jetstream Background

    Funded by the National Science Foundation, Award #ACI-1445604. http://jetstream-cloud.org/
    •  Jetstream funded as NSF’s first production cloud facility
    •  Part of the NSF eXtreme Digital (XD) program and supported by XSEDE
        –  Small and under-resourced colleges and universities
        –  Provide on-demand interactive computing and analysis
        –  Increase effort efficiency - perceived and real ease of use
    Reference: Fischer, Jeremy; Tuecke, Steven; Foster, Ian; Stewart, Craig A. Jetstream: A Distributed Cloud Infrastructure for Under-resourced Higher Education Communities
  4. Why do we need WQ-MAKER?

    •  MAKER is a flexible and scalable genome annotation pipeline
        –  De novo genome annotation
        –  Updating existing genome annotation
        –  Combining evidence with genome
    •  Limitations of MAKER
        –  Installation of MAKER is challenging and complex
        –  MAKER runs are not time efficient
    •  WQ-MAKER is a modified MAKER annotation pipeline capable of being run on distributed computing resources using Work Queue
    •  WQ-MAKER is configured to run on Jetstream
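The idea of splitting an annotation job across workers can be illustrated with a toy manager-worker sketch. It uses only Python's standard library (`concurrent.futures`), not the cctools Work Queue API and not WQ-MAKER's own code. The `annotate_contig` function and the contig names are hypothetical stand-ins for running MAKER on one sequence.

```python
from concurrent.futures import ThreadPoolExecutor

def annotate_contig(name):
    # Hypothetical stand-in for annotating one contig with MAKER.
    return f"{name}.gff"

def run_manager(contigs, n_workers):
    # The manager hands each contig to whichever worker is free,
    # then collects the results in the original input order.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(annotate_contig, contigs))

contigs = [f"contig_{i}" for i in range(1, 6)]
outputs = run_manager(contigs, n_workers=3)
print(outputs)
```

Because each contig is an independent task, adding workers shortens the wall-clock time until the number of workers exceeds the number of pending tasks.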
  5. Components of WQ-MAKER on Jetstream

    [Diagram: the WQ-MAKER image and five Workers]
    WQ-MAKER image: Augustus, SNAP, Exonerate, BLAST, RepeatMasker, cctools, icommands, MAKER, Ansible
    Reference: Andrew Thrasher, Zachary Musgrave, Brian Kachmarck, Douglas Thain, and Scott Emrich. Scaling up genome annotation using MAKER and work queue. International Journal of Bioinformatics Research and Applications 2014 10:4-5, 447-460
  6. Benchmarking of WQ-MAKER on test data*

    [Figure: benchmark panels for Blast CPU = 1, MPI = N; Blast CPU = 6, MPI = N; Blast CPU = 1, MPI = Y]
    *First 300 kb of 12 chromosomes of rice, m1.medium (6 CPUs, 16 GB memory, 60 GB disk)
  7. Genomes tested so far

    •  Sporobolus species A
    •  Sporobolus species B
    •  Brassica rapa
    •  Kochia scoparia
    •  Oryza sativa
    •  Calypte anna
    •  Sclerotinia homoeocarpa
    •  Zea mays
  8. WQ-MAKER run times on Jetstream*

    Genome | Group | Number of sequences | Time taken to finish | Number of workers | MPI | Cores
    Sporobolus species A | Plant | 11789 contigs | 144 hours | 22 | Y | 6
    Sporobolus species B | Plant | 6615 contigs | 108 hours | 21-35 | Y | 6
    Sclerotinia homoeocarpa isolate 10 | Fungi | 231 contigs | 6 hours | 10 | N | 1
    Sclerotinia homoeocarpa isolate 11 | Fungi | 257 contigs | 6 hours | 10 | N | 1
    Calypte_anna | Hummingbird | 265 super scaffolds | 8 hours | 10 | N | 1
    Brassica rapa | Plant | 10 + 44,000 scaffolds | 4 hours | 10 | N | 1
    Kochia | Plant | 19,671 scaffolds | 72 hours | 21 | N | 1

    *WQ-MAKER achieves a speed-up of 45x using 50 workers on a 180 MB Caenorhabditis japonica test case on Amazon AWS
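The speed-up quoted in the footnote can be turned into a parallel efficiency with one line of arithmetic. This is a small sketch assuming the usual definition, efficiency = speed-up / number of workers; the function name is illustrative and not part of WQ-MAKER.

```python
def parallel_efficiency(speedup, workers):
    # Fraction of ideal linear scaling actually achieved.
    if workers <= 0:
        raise ValueError("workers must be positive")
    return speedup / workers

# Figures from the footnote: 45x speed-up with 50 workers.
eff = parallel_efficiency(45, 50)
print(f"{eff:.0%}")  # prints "90%"
```

An efficiency close to 1 means adding workers is paying off almost linearly; lower values suggest overhead or idle workers.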
  9. How to get started?

    •  XSEDE account (https://www.xsede.org/user-portal)
        –  Jetstream cloud resource allocation through CyVerse (quick start) (http://tinyurl.com/JSallocation)
        –  XSEDE resource allocation (production runs) (https://www.xsede.org/allocations)
    •  Sign in to Jetstream using XSEDE credentials (https://use.jetstream-cloud.org)
    •  Register to use MAKER (http://yandell.topaz.genetics.utah.edu/cgi-bin/maker_license.cgi)
  10. How to get started?

    Instance size | CPUs | Mem (GB)
    m1.tiny | 1 | 2
    m1.small | 2 | 4
    m1.medium | 6 | 16
    m1.large | 10 | 30
    m1.large.paramtest | 24 | 60
    m1.xlarge | 24 | 60
    m1.xxlarge | 44 | 120

    https://ask.cyverse.org [email protected] RT
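The instance-size table on this slide can be used to pick the smallest instance that satisfies a given CPU and memory requirement. The sketch below copies the CPU and memory figures from the slide; the `smallest_fit` helper is hypothetical, and `m1.large.paramtest` is left out on the assumption that it is a test flavor with the same specifications as `m1.xlarge`.

```python
# (name, CPUs, memory in GB), copied from the slide's table, smallest first.
INSTANCE_SIZES = [
    ("m1.tiny", 1, 2),
    ("m1.small", 2, 4),
    ("m1.medium", 6, 16),
    ("m1.large", 10, 30),
    ("m1.xlarge", 24, 60),
    ("m1.xxlarge", 44, 120),
]

def smallest_fit(cpus, mem_gb):
    # Return the first (smallest) size meeting both requirements, or None.
    for name, n_cpu, n_mem in INSTANCE_SIZES:
        if n_cpu >= cpus and n_mem >= mem_gb:
            return name
    return None

print(smallest_fit(6, 16))   # prints "m1.medium"
print(smallest_fit(8, 20))   # prints "m1.large"
```

A request that exceeds the largest size, for example 64 CPUs, returns `None`, which signals that the work needs to be split across several instances.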
  11. In the works

    •  Bulk launch option to deploy multiple instances/workers
    •  Ansible to automate Jetstream VM/instance management
    •  R Shiny App integration to estimate the amount of computational resources needed
    •  JBrowse to visualize the genome annotations
    •  Thorough testing of WQ-MAKER with the MPI option
    •  Manuscript for WQ-MAKER on Jetstream
  12. ACKNOWLEDGEMENTS

    •  University of Notre Dame
        –  Nicholas Hazekamp and Doug Thain
    •  CyVerse/University of Arizona
        –  Nirav Merchant
        –  Shabari Subramanyam
        –  Eric Lyons
        –  Blake Joyce
    •  CyVerse/CSHL
        –  Kapeel Chogule
    •  University of Utah
        –  Yandell Lab
    •  Jetstream development team at CyVerse/University of Arizona
    •  CyVerse executive team
    •  NSF, XSEDE

    CyVerse at PAG 2017: Exhibit Booth #502, http://tinyurl.com/CyVersePAG2017