Galaxy Update 2019

Presented by @jxtx and @nekrut at the Galaxy Community Conference 2019 in Freiburg

James Taylor

July 03, 2019

Transcript

  1. None
  2. Ten years of the Galaxy Community

  3. None
  4. None
  5. 2010: First Galaxy Developer Community Conference

  6. None
  7. Feature highlights

  8. Highlights: Galaxy Help!

  9. Improving Galaxy’s User Experience (from the 2018 Community Update)

  10. User Experience Highlights

    - New default style
    - Favorite tools
    - Dataset picker
    - Colorful #nametags
    - Grouptags for structured experimental designs
    - Widespread modernization, vueification, and tweaks
  11. Workflow User Experience Highlights

    - Connection feedback: “Why can’t I connect?”
    - Access previous versions
    - Expression steps in workflows
    - Reusable input parameters
  12. Galaxy Architecture for Protected / Secure Data

  13. Supporting a diverse research landscape with Galaxy (from the 2018 Community Update)

  14. AnVIL: Genomic Analysis, Visualization, Informatics Lab

    Taylor, Nekrutenko, Goecks + Broad Institute and many more
    - Cloud environment for analysis and interpretation of genomic and genome-adjacent data
    - Allow users to analyze protected data while maintaining security and compliance
    - Integrate multiple analysis and visualization environments

    A Galaxy for Cancer Genomics Research
    Goecks, Taylor, Blankenberg, Nekrutenko
    - Cloud environment for analysis and interpretation of cancer genomes and related datasets
    - Integrate with dozens of other tools funded under ITCR that #usegalaxy
    - Integrate with existing resources like the Cancer Research Data Commons and NCI cancer clouds

    Both require bringing Galaxy into certified secure environments (FISMA compliant), maintaining infrastructure-level isolation between individual users to ensure data remains protected.
  15. AnVIL: Inverting the model of genomic data sharing

    Traditional: Bring data to the researcher
    - Copying/moving data is costly
    - Harder to enforce security
    - Redundant infrastructure
    - Siloed compute

    Goal: Bring researcher to the data
    - Reduced redundancy and costs
    - Active threat detection and auditing
    - Greater accessibility
    - Elastic, shared compute
  16. AnVIL / Terra: analysis workspaces and batch workflows

    AnVIL / Gen3: Data models, indexing, querying
    AnVIL / Dockstore: sharing containerized tools and workflows
    AnVIL / Analysis Environments: Jupyter Notebooks, RStudio, Galaxy, ...
  17. AnVIL / Terra: analysis workspaces and batch workflows

    AnVIL / Gen3: Data models, indexing, querying
    AnVIL / Analysis Environments: Jupyter Notebooks, RStudio, Galaxy, ...

    - FISMA Moderate
    - 2 ATOs
    - Pursuing FedRAMP
    - All data use and analysis in a FISMA Moderate environment
    - Implemented on [platform logo]
    - Primary data storage costs covered by AnVIL; user private data and compute billed directly through Google
  18. Initially, relying on the platform to deploy a Galaxy instance for each user

    - More overhead than the current user experience
    - Can we provide secure Galaxy instances with (close to) the current Galaxy user experience?
  19. Galaxy Multiplexer: Isolated Galaxy instances with a single interface

    Diagram labels:
    - Security Boundary
    - Shared DB (no protected data)
    - Unprivileged Galaxy Instance
  20. Galaxy Multiplexer: Isolated Galaxy instances with a single interface

    Diagram labels:
    - Security Boundary
    - Shared DB (no protected data)
    - Anonymous User; Unprivileged Galaxy Instance
  21. Galaxy Multiplexer: Isolated Galaxy instances with a single interface

    Diagram labels:
    - Security Boundary
    - Shared DB (no protected data)
    - Anonymous User; Unprivileged Galaxy Instance
    - User 1; User 1 Isolated Resources: User Data and DB, User 1 Galaxy Instance, User Compute Containers
  22. Galaxy Multiplexer: Isolated Galaxy instances with a single interface

    Diagram labels:
    - Security Boundary
    - Shared DB (no protected data)
    - Anonymous User; Unprivileged Galaxy Instance
    - User 1; User 1 Isolated Resources: User Data and DB, User 1 Galaxy Instance, User Compute Containers
    - User 2; User 2 Isolated Resources: User Data and DB, User 2 Galaxy Instance, User Compute Containers
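
The multiplexer slides describe a single entry point that sends anonymous traffic to one unprivileged instance and each authenticated user to their own isolated instance. A minimal sketch of that routing idea follows. The class names, the `route` method, and the instance naming scheme are all hypothetical, chosen only for illustration; the slides do not show how the multiplexer is actually implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class GalaxyInstance:
    """Hypothetical stand-in for one running Galaxy instance."""
    name: str
    may_access_protected_data: bool


@dataclass
class Multiplexer:
    """Hypothetical single interface in front of many isolated instances."""
    # One shared, unprivileged instance serves anonymous users.
    unprivileged: GalaxyInstance = field(
        default_factory=lambda: GalaxyInstance("unprivileged", False)
    )
    # One isolated instance per authenticated user, created on first use.
    per_user: Dict[str, GalaxyInstance] = field(default_factory=dict)

    def route(self, user: Optional[str]) -> GalaxyInstance:
        """Return the instance that should serve this request."""
        if user is None:
            return self.unprivileged
        if user not in self.per_user:
            self.per_user[user] = GalaxyInstance(f"galaxy-{user}", True)
        return self.per_user[user]


mux = Multiplexer()
anonymous = mux.route(None)
user1 = mux.route("user1")
user2 = mux.route("user2")
```

In this sketch two users never share an instance, and repeated requests from the same user land on the same instance, which is the isolation property the diagram labels suggest.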
  23. To achieve this, need to make it easy to manage a Galaxy composed of many different containers, across different security boundaries, ...
  24. Galaxy + Kubernetes (@EnisAfgan @nuwan_ag @pabloOmics @almahmoud @ic4f)

    Diagram labels:
    - Bootstrap via CloudLaunch: run VM, IP, CloudLaunch-plugin, galaxy/cloudman-boot
    - cloudman-boot → Rancher, K8S, Helm, CloudMan chart
    - Components: CloudBridge, CloudLaunch, CloudMan, HelmsMan
    - Layers: Multi-cloud, Infrastructure, Coordination, Applications
    - Clouds: AWS, Azure, GCE, OpenStack
    - Galaxy Chart: Remote object store(s), Local cache, Authn / authz, Containerized jobs
  25. Distributing Galaxy Execution or... “How little can a Galaxy actually touch the data?”
  26. Current Remote Execution Data Flow (@jmchilton @natefoo)

    Diagram labels: Cloud, Galaxy, Data Storage Volume, NFS, Pulsar, Compute; axis: Time; legend: control message, data movement
    New job: inputs: dataset 1, dataset 2; outputs: dataset 3; tool: HISAT2
    Steps shown: create job; execute job; get datasets 1, 2; compute; job complete
  27. Future Remote Execution Data Flow (@jmchilton @natefoo)

    Diagram labels: Cloud, Galaxy, Data Storage Volume, NFS, Pulsar, Compute; axis: Time; legend: control message, data movement
    New job: inputs: dataset 1, dataset 2; outputs: dataset 3; tool: HISAT2
    Steps shown: create job; execute job; get datasets 1, 2; compute; job complete
  28. Future k8s Remote Execution Data Flow (@jmchilton @natefoo)

    Diagram labels: Kubernetes Job Pod, Galaxy, Data Storage Volume, NFS, BioContainer, Executor Container; axis: Time; legend: control message, data movement
    New job: inputs: dataset 1, dataset 2; outputs: dataset 3; tool: HISAT2
    Steps shown: create job; execute job; get datasets 1, 2; compute; job complete
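
The three data-flow slides contrast how often datasets pass through Galaxy itself. The sketch below models one possible reading of the "current" and "future" diagrams as lists of hops and counts the data movements that touch Galaxy. The exact hops are an assumption reconstructed from the slide labels (the diagrams did not survive in the transcript), and `Hop`, `current_flow`, `future_flow`, and `data_hops_touching` are hypothetical names, not part of Galaxy or Pulsar.

```python
from typing import List, NamedTuple


class Hop(NamedTuple):
    src: str
    dst: str
    kind: str      # "control" (message only) or "data" (dataset bytes move)
    payload: str


def current_flow() -> List[Hop]:
    """Assumed current flow: Galaxy relays every dataset to and from Pulsar."""
    return [
        Hop("galaxy", "pulsar", "control", "execute job"),
        Hop("storage", "galaxy", "data", "datasets 1, 2"),
        Hop("galaxy", "pulsar", "data", "datasets 1, 2"),
        Hop("pulsar", "galaxy", "data", "dataset 3"),
        Hop("galaxy", "storage", "data", "dataset 3"),
        Hop("pulsar", "galaxy", "control", "job complete"),
    ]


def future_flow() -> List[Hop]:
    """Assumed future flow: the executor reads and writes storage directly."""
    return [
        Hop("galaxy", "pulsar", "control", "execute job"),
        Hop("storage", "pulsar", "data", "datasets 1, 2"),
        Hop("pulsar", "storage", "data", "dataset 3"),
        Hop("pulsar", "galaxy", "control", "job complete"),
    ]


def data_hops_touching(flow: List[Hop], node: str) -> List[Hop]:
    """Return the data movements that start or end at the given node."""
    return [h for h in flow if h.kind == "data" and node in (h.src, h.dst)]


current_count = len(data_hops_touching(current_flow(), "galaxy"))
future_count = len(data_hops_touching(future_flow(), "galaxy"))
```

Under these assumptions Galaxy only exchanges control messages in the future flow, which is one way to read the question "How little can a Galaxy actually touch the data?"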
  29. Challenges ...

  30. - Galaxy is not Netflix
      - Communicating new features to all audiences
      - Fighting misconceptions and heartburn
      - Attracting reluctant users
  31. 2020: Coordinating the Global Galaxy Community

  32. From 2018 Community Update

  33. Achieving usegalaxy.✱ coherence

    • Common reference and index data
      ◦ These are already distributed by ❤CVMFS❤, but organized in an ad hoc manner due to the history of Galaxy
      ◦ Currently building an automated approach where metadata defining the complete set of reference and index data will live in GitHub, builds will be automated based on GitHub state, and successful builds deployed through ❤CVMFS❤ for replication to all sites
        - Intergalactic Data Commission: https://github.com/usegalaxy-eu/idc
    • Common tools
      ◦ A common set of tools and a common tool menu organization is currently being defined. Tools and tool configuration will also be replicated through ❤CVMFS❤
      ◦ This will ensure both that users will have the same user experience across different usegalaxy.✱ instances, and that workflows can be moved between instances and still execute correctly and reproducibly
      ◦ Local custom tools will still be supported but clearly identified
    • See gxadmin, common tools on ❤CVMFS❤ + build + installation, and other coordination efforts developing
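
The automated approach described on this slide (metadata declares the complete set of reference and index data, and builds follow from repository state) can be pictured as a diff between what is declared and what is already deployed. The sketch below is only an illustration under that assumption: `plan_builds`, the metadata layout, and the genome and index names in the example are hypothetical and are not taken from the Intergalactic Data Commission repository.

```python
from typing import Dict, List, Set, Tuple

Build = Tuple[str, str]  # (genome, index type)


def plan_builds(declared: Dict[str, List[str]], deployed: Set[Build]) -> List[Build]:
    """Return the (genome, index) builds that are declared but not yet deployed."""
    wanted = {
        (genome, index)
        for genome, indexes in declared.items()
        for index in indexes
    }
    # Sorting keeps the build order deterministic from run to run.
    return sorted(wanted - deployed)


# Hypothetical declared state, as it might be read from repository metadata.
declared_state = {"hg38": ["bwa", "hisat2"], "mm10": ["bwa"]}
# Hypothetical record of builds already published for replication.
deployed_state = {("hg38", "bwa")}

pending = plan_builds(declared_state, deployed_state)
```

Deriving the work list from declared state means every site that replicates the published result sees the same set of reference data, which matches the coherence goal stated on the slide.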
  34. None
  35. None
  36. - Unified vision
      - Joined funding efforts
      - Optimization of development and training efforts
      - Improving coordination
  37. Acknowledgements

  38. Acknowledgements: Contributors

    - Core Code: contributors to galaxyproject/galaxy:
      - ~315 (~39 new since last year)
    - Tools: contributors to galaxyproject/tools-iuc:
      - ~195 (~38 new since last year)
      - ...and the ever vigilant Intergalactic Utilities Commission for handling these contributions and maintaining the quality of essential Galaxy tools
      - ...and everyone else who has contributed a tool to the ToolShed
    - Training: contributors to galaxyproject/training-material
      - ~114 (~34 new since last year)
      - ...and everyone who has conducted or attended Galaxy Training
    - Everyone who has contributed to Galaxy in other ways:
      - users, supporters, …
    - Funding: NSF and NIH (to our team), and all of the funders of the Global Galaxy Community
  39. Acknowledgements

    And, everyone who has attended any of the TEN GALAXY CONFERENCES!
  40. 231

  41. 231 Attendees, making the biggest GCC ever!

  42. (fin)

  43. You’ve gone too far!

  44. (seriously stop)

  45. Galaxy Community Update 2019 @jxtx @nekrut #usegalaxy

  46. Colors We use (nearly) the “Paired” colormap