Galaxy Update 2019

Presented by @jxtx and @nekrut at the Galaxy Community Conference 2019 in Freiburg

James Taylor

July 03, 2019

Transcript

  1. User Experience Highlights

    - New default style
    - Favorite tools
    - Dataset picker
    - Colorful #nametags
    - Grouptags for structured experimental designs
    - Widespread modernization, vueification, and tweaks
  2. Workflow User Experience Highlights

    - Connection feedback
    - Access previous versions
    - Why can’t I connect?
    - Expression steps in workflows
    - Reusable input parameters
  3. Genomic Analysis, Visualization, Informatics Lab (AnVIL)

    Taylor, Nekrutenko, Goecks + Broad Institute and many more
    - Cloud environment for analysis and interpretation of genomic and genome-adjacent data
    - Allow users to analyze protected data while maintaining security and compliance
    - Integrate multiple analysis and visualization environments

    A Galaxy for Cancer Genomics Research
    Goecks, Taylor, Blankenberg, Nekrutenko
    - Cloud environment for analysis and interpretation of cancer genomes and related datasets
    - Integrate with dozens of other tools funded under ITCR that #usegalaxy
    - Integrate with existing resources like the Cancer Research Data Commons and NCI cancer clouds

    Both require bringing Galaxy into a certified secure environment (FISMA compliant), maintaining infrastructure-level isolation between individual users to ensure data remains protected
  4. AnVIL: Inverting the model of genomic data sharing

    Traditional: Bring data to the researcher
    - Copying/moving data is costly
    - Harder to enforce security
    - Redundant infrastructure
    - Siloed compute

    Goal: Bring researcher to the data
    - Reduced redundancy and costs
    - Active threat detection and auditing
    - Greater accessibility
    - Elastic, shared compute
  5. AnVIL / Terra: analysis workspaces and batch workflows

    AnVIL / Gen3: Data models, indexing, querying
    AnVIL / Dockstore: sharing containerized tools and workflows
    AnVIL / Analysis Environments: Jupyter Notebooks, RStudio, Galaxy, ...
  6. AnVIL / Terra: analysis workspaces and batch workflows

    AnVIL / Gen3: Data models, indexing, querying
    AnVIL / Analysis Environments: Jupyter Notebooks, RStudio, Galaxy, ...
    - FISMA Moderate
    - 2 ATOs
    - Pursuing FedRAMP
    - All data use and analysis in a FISMA moderate environment
    - Implemented on
    - Primary data storage costs covered by AnVIL, user private data and compute billed directly through Google
  7. Initially, relying on the platform to deploy a Galaxy instance for each user

    - More overhead than the current user experience
    - Can we provide secure Galaxy instances with (close to) the current Galaxy user experience?
  8. Isolated Galaxy instances with a single interface

    Diagram labels: Security Boundary; Shared DB (No protected Data); Unprivileged Galaxy Instance; Galaxy Multiplexer
  9. Isolated Galaxy instances with a single interface

    Diagram labels: Security Boundary; Shared DB (No protected Data); Anonymous User; Unprivileged Galaxy Instance; Galaxy Multiplexer
  10. Isolated Galaxy instances with a single interface

    Diagram labels: Security Boundary; User 1 Isolated Resources (User Data and DB, User 1 Galaxy Instance, User Compute Containers); Shared DB (No protected Data); Anonymous User; Unprivileged Galaxy Instance; User 1; Galaxy Multiplexer
  11. Isolated Galaxy instances with a single interface

    Diagram labels: Security Boundary; User 1 Isolated Resources (User Data and DB, User 1 Galaxy Instance, User Compute Containers); Shared DB (No protected Data); User 2 Isolated Resources (User Data and DB, User 2 Galaxy Instance, User Compute Containers); Anonymous User; Unprivileged Galaxy Instance; User 1; User 2; Galaxy Multiplexer
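
The multiplexer slides above describe one entry point in front of an unprivileged instance and one isolated instance per user. As a rough illustration of that routing idea only, here is a minimal Python sketch. The class name `Multiplexer`, the instance names, and the on-demand provisioning are all assumptions made for the example; the slides do not say how the actual multiplexer is implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical name for the shared instance that holds no protected data.
UNPRIVILEGED = "unprivileged-galaxy"


@dataclass
class Multiplexer:
    """Toy router: one public interface, one isolated instance per user."""

    # Maps an authenticated user name to that user's isolated instance.
    instances: Dict[str, str] = field(default_factory=dict)

    def route(self, user: Optional[str]) -> str:
        # Anonymous requests stay on the unprivileged instance.
        if user is None:
            return UNPRIVILEGED
        # Assumption: an isolated instance is created on first use.
        if user not in self.instances:
            self.instances[user] = f"galaxy-{user}"
        return self.instances[user]


mux = Multiplexer()
print(mux.route(None))
print(mux.route("user1"))
print(mux.route("user2"))
```

In this sketch the only shared state is the routing table itself, which mirrors the slides' point that the shared database holds no protected data.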
  12. To achieve this, need to make it easy to manage a Galaxy composed of many different containers, across different security boundaries, ...
  13. Galaxy + Kubernetes

    Diagram labels:
    - Bootstrap via CloudLaunch: >_ run, VM, IP
    - CloudBridge: AWS, Azure, GCE, OpenStack
    - CloudLaunch-plugin: galaxy/cloudman-boot
    - cloudman-boot → Rancher, K8S, Helm, CloudMan chart
    - CloudBridge, CloudLaunch, CloudMan, HelmsMan
    - Multi-cloud, Infrastructure, Coordination, Applications
    - VM ... ... ... ...
    - Galaxy Chart: Remote object store(s), Local cache, Authn / authz, Containerized jobs
    - @EnisAfgan @nuwan_ag @pabloOmics @almahmoud @ic4f
  14. Current Remote Execution Data Flow

    Diagram (sequence over time; legend: control message, data movement)
    - Components: Cloud, Galaxy, Data Storage Volume, NFS, Pulsar, Compute (three nodes)
    - New job: inputs: dataset 1, dataset 2; outputs: dataset 3; tool: HISAT2
    - Labels in transcript order: create job; execute job; get datasets 1, 2; get datasets 1, 2; execute job; job complete; compute
    - @jmchilton @natefoo
  15. Future Remote Execution Data Flow

    Diagram (sequence over time; legend: control message, data movement)
    - Components: Cloud, Galaxy, Data Storage Volume, NFS, Pulsar, Compute (three nodes)
    - New job: inputs: dataset 1, dataset 2; outputs: dataset 3; tool: HISAT2
    - Labels in transcript order: create job; execute job; get datasets 1, 2; execute job; job complete; compute
    - @jmchilton @natefoo
  16. Future k8s Remote Execution Data Flow

    Diagram (sequence over time; legend: control message, data movement)
    - Components: Galaxy, Data Storage Volume, NFS, Kubernetes Job Pod, BioContainer, Executor Container
    - New job: inputs: dataset 1, dataset 2; outputs: dataset 3; tool: HISAT2
    - Labels in transcript order: create job; execute job; get datasets 1, 2; execute job; job complete; compute
    - @jmchilton @natefoo
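
The three data-flow slides above all use the same example job: two input datasets, one output dataset, tool HISAT2. The sketch below walks that job through a generic stage-in, compute, stage-out cycle and records the control messages named on the slides. It is a simplified illustration under stated assumptions, not Pulsar's or Galaxy's API: `Job`, `run_remote`, the in-memory `storage` dict, and the stand-in `tool_fn` are all hypothetical, and the sketch assumes a single output.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Job:
    tool: str
    inputs: List[int]   # dataset ids to stage in
    outputs: List[int]  # dataset ids to write back


def run_remote(job: Job, storage: Dict[int, str],
               tool_fn: Callable[[List[str]], str]) -> List[str]:
    """Run one job against a dict standing in for the data storage volume."""
    events = ["create job", "execute job"]
    # Stage in: fetch every input dataset from storage.
    staged = [storage[i] for i in job.inputs]
    events.append("get datasets " + ", ".join(str(i) for i in job.inputs))
    # Compute: tool_fn is a placeholder, not the real HISAT2.
    events.append("compute")
    result = tool_fn(staged)
    # Stage out: write the single output back to storage.
    storage[job.outputs[0]] = result
    events.append("job complete")
    return events


storage = {1: "reads", 2: "index"}
job = Job(tool="HISAT2", inputs=[1, 2], outputs=[3])
for event in run_remote(job, storage, lambda datasets: "+".join(datasets)):
    print(event)
```

The slides contrast how many times datasets are moved between components; in a sketch like this, each extra hop would show up as an additional "get datasets" event.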
  17. - Galaxy is not Netflix
      - Communicating new features to all audiences
      - Fighting misconceptions and heartburn
      - Attracting reluctant users
  18. Achieving usegalaxy.✱ coherence

    • Common reference and index data
      ◦ These are already distributed by ❤CVMFS❤, but organized in an ad hoc manner due to the history of Galaxy
      ◦ Currently building an automated approach where metadata defining the complete set of reference and index data will live in GitHub, builds will be automated based on GitHub state, and successful builds deployed through ❤CVMFS❤ for replication to all sites
        - Intergalactic Data Commission: https://github.com/usegalaxy-eu/idc
    • Common tools
      ◦ A common set of tools and a common tool menu organization are currently being defined. Tools and tool configuration will also be replicated through ❤CVMFS❤
      ◦ This will ensure both that users have the same experience across different usegalaxy.✱ instances, and that workflows can be moved between instances and still execute correctly and reproducibly
      ◦ Local custom tools will still be supported but clearly identified
    • See gxadmin, common tools on ❤CVMFS❤ + build + installation, and other coordination efforts developing
  19. - Unified vision
      - Joined funding efforts
      - Optimization of development and training efforts
      - Improving coordination
  20. Acknowledgements: Contributors

    - Core Code: contributors to galaxyproject/galaxy:
      - ~315 (~39 new since last year)
    - Tools: contributors to galaxyproject/tools-iuc:
      - ~195 (~38 new since last year)
      - ...and the ever vigilant Intergalactic Utilities Commission for handling these contributions and maintaining the quality of essential Galaxy tools
      - ...and everyone else who has contributed a tool to the ToolShed
    - Training: contributors to galaxyproject/training-material
      - ~114 (~34 new since last year)
      - ...and everyone who has conducted or attended Galaxy Training
    - Everyone who has contributed to Galaxy in other ways:
      - users, supporters, …
    - Funding: NSF and NIH (to our team), and all of the funders of the Global Galaxy Community
  21. 231