Galaxy Update 2019

Presented by @jxtx and @nekrut at the Galaxy Community Conference 2019 in Freiburg

James Taylor

July 03, 2019

Transcript

  1. Ten years of the Galaxy Community

  2. 2010: First Galaxy Developer Community Conference

  3. Feature highlights

  4. Highlights: Galaxy Help!

  5. Improving Galaxy’s User Experience (from the 2018 Community Update)

  6. User Experience Highlights
    - New default style
    - Favorite tools
    - Dataset picker
    - Colorful #nametags
    - Grouptags for structured experimental designs
    - Widespread modernization, vueification, and tweaks

  7. Workflow User Experience Highlights
    - Connection feedback
    - Access previous versions
    - Why can’t I connect?
    - Expression steps in workflows
    - Reusable input parameters

  8. Galaxy Architecture for Protected / Secure Data

  9. Supporting a diverse research landscape with Galaxy (from the 2018 Community Update)

  10. AnVIL: Genomic Analysis, Visualization, Informatics Lab
    Taylor, Nekrutenko, Goecks + Broad Institute and many more
    - Cloud environment for analysis and interpretation of genomic and genome-adjacent data
    - Allow users to analyze protected data while maintaining security and compliance
    - Integrate multiple analysis and visualization environments
    A Galaxy for Cancer Genomics Research
    Goecks, Taylor, Blankenberg, Nekrutenko
    - Cloud environment for analysis and interpretation of cancer genomes and related datasets
    - Integrate with dozens of other tools funded under ITCR that #usegalaxy
    - Integrate with existing resources like the Cancer Research Data Commons and NCI cancer clouds
    Both require bringing Galaxy into a certified secure environment (FISMA compliant), maintaining infrastructure-level isolation between individual users to ensure data remains protected

  11. AnVIL: Inverting the model of genomic data sharing
    Traditional: Bring data to the researcher
    - Copying/moving data is costly
    - Harder to enforce security
    - Redundant infrastructure
    - Siloed compute
    Goal: Bring researcher to the data
    - Reduced redundancy and costs
    - Active threat detection and auditing
    - Greater accessibility
    - Elastic, shared compute

  12. AnVIL / Terra: analysis workspaces and batch workflows
    AnVIL / Gen3: Data models, indexing, querying
    AnVIL / Dockstore: sharing containerized tools and workflows
    AnVIL / Analysis Environments: Jupyter Notebooks, RStudio, Galaxy, ...

  13. AnVIL / Terra: analysis workspaces and batch workflows
    AnVIL / Gen3: Data models, indexing, querying
    AnVIL / Analysis Environments: Jupyter Notebooks, RStudio, Galaxy, ...
    FISMA Moderate
    2 ATOs
    Pursuing FedRAMP
    All data use and analysis in a FISMA moderate environment
    Implemented on Google Cloud
    Primary data storage costs covered by AnVIL, user private data and compute billed directly through Google

  14. Initially, relying on the platform to deploy a Galaxy instance for each user
    - More overhead than the current user experience
    - Can we provide secure Galaxy instances with (close to) the current Galaxy user experience?

  15. Isolated Galaxy instances with a single interface
    Diagram labels: Security Boundary; Shared DB (no protected data); Unprivileged Galaxy Instance; Galaxy Multiplexer

  16. Isolated Galaxy instances with a single interface
    Diagram labels: Security Boundary; Shared DB (no protected data); Anonymous User; Unprivileged Galaxy Instance; Galaxy Multiplexer

  17. Isolated Galaxy instances with a single interface
    Diagram labels: Security Boundary; Shared DB (no protected data); Anonymous User; Unprivileged Galaxy Instance; User 1; Galaxy Multiplexer
    - User 1 Isolated Resources: User Data and DB; User 1 Galaxy Instance; User Compute Containers

  18. Isolated Galaxy instances with a single interface
    Diagram labels: Security Boundary; Shared DB (no protected data); Anonymous User; Unprivileged Galaxy Instance; User 1; User 2; Galaxy Multiplexer
    - User 1 Isolated Resources: User Data and DB; User 1 Galaxy Instance; User Compute Containers
    - User 2 Isolated Resources: User Data and DB; User 2 Galaxy Instance; User Compute Containers
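
    The multiplexer idea in the diagrams above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Galaxy's implementation: the class name, the backend naming scheme, and the choice to send unknown users to the unprivileged instance are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical name for the shared, unprivileged Galaxy instance.
UNPRIVILEGED_BACKEND = "galaxy-unprivileged"


@dataclass
class GalaxyMultiplexer:
    """Toy router: one isolated backend per known user, one shared fallback."""

    user_backends: Dict[str, str] = field(default_factory=dict)

    def provision(self, user_id: str) -> str:
        """Record an isolated backend for a user (naming scheme is hypothetical)."""
        backend = f"galaxy-{user_id}"
        self.user_backends[user_id] = backend
        return backend

    def route(self, user_id: Optional[str]) -> str:
        """Return the backend that should serve this request."""
        if user_id is None:
            # Anonymous users only ever reach the unprivileged instance.
            return UNPRIVILEGED_BACKEND
        # Assumption: users without isolated resources also get the fallback.
        return self.user_backends.get(user_id, UNPRIVILEGED_BACKEND)


mux = GalaxyMultiplexer()
mux.provision("user1")
mux.provision("user2")
```

    Here `mux.route("user1")` and `mux.route("user2")` resolve to different backends, while `mux.route(None)` resolves to the unprivileged instance, which mirrors the single-interface, isolated-resources layout in the diagram.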

  19. To achieve this, we need to make it easy to manage a Galaxy composed of many different containers, across different security boundaries, ...

  20. Galaxy + Kubernetes
    Diagram labels:
    - Bootstrap via CloudLaunch: run, VM IP
    - CloudBridge: AWS, Azure, GCE, OpenStack
    - CloudLaunch-plugin: galaxy/cloudman-boot
    - cloudman-boot → Rancher, K8S, Helm, CloudMan chart
    - CloudBridge, CloudLaunch, CloudMan, HelmsMan: Multi-cloud, Infrastructure, Coordination, Applications
    - Galaxy Chart: remote object store(s), local cache, authn / authz, containerized jobs
    @EnisAfgan @nuwan_ag @pabloOmics @almahmoud @ic4f

  21. Distributing Galaxy Execution
    or...
    “How little can a Galaxy actually touch the data?”

  22. Current Remote Execution Data Flow
    Sequence diagram over time. Labels: Cloud, Galaxy, Data Storage Volume, NFS, Pulsar, Compute
    new job: inputs: dataset 1, dataset 2; outputs: dataset 3; tool: HISAT2
    Steps shown: create job, execute job, get datasets 1, 2, get datasets 1, 2, execute job, compute, job complete
    Legend: control message, data movement
    @jmchilton @natefoo

  23. Future Remote Execution Data Flow
    Sequence diagram over time. Labels: Cloud, Galaxy, Data Storage Volume, NFS, Pulsar, Compute
    new job: inputs: dataset 1, dataset 2; outputs: dataset 3; tool: HISAT2
    Steps shown: create job, execute job, get datasets 1, 2, execute job, compute, job complete
    Legend: control message, data movement
    @jmchilton @natefoo

  24. Future k8s Remote Execution Data Flow
    Sequence diagram over time. Labels: Kubernetes, Job Pod, Galaxy, Data Storage Volume, NFS, BioContainer, Executor Container
    new job: inputs: dataset 1, dataset 2; outputs: dataset 3; tool: HISAT2
    Steps shown: create job, execute job, get datasets 1, 2, execute job, compute, job complete
    Legend: control message, data movement
    @jmchilton @natefoo
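
    The distinction the diagrams draw between control messages and data movement can be illustrated with the "new job" description itself. The sketch below is a hypothetical Python model, not Galaxy's or Pulsar's actual message format: the class and field names are assumptions, and the message carries dataset identifiers only, never dataset contents.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class JobMessage:
    """Hypothetical control message: references datasets by name only."""

    tool: str
    inputs: List[str]
    outputs: List[str]

    def to_json(self) -> str:
        """Serialize the job description; no dataset bytes are included."""
        return json.dumps(asdict(self), sort_keys=True)


# The job shown on the slides: two inputs, one output, run with HISAT2.
job = JobMessage(
    tool="HISAT2",
    inputs=["dataset 1", "dataset 2"],
    outputs=["dataset 3"],
)
message = job.to_json()
```

    Because the message stays this small whatever the size of the datasets, whichever component ends up resolving the references against storage is the only one that has to touch the data.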

  25. Challenges ...

  26. - Galaxy is not Netflix
    - Communicating new features to all audiences
    - Fighting misconceptions and heartburn
    - Attracting reluctant users

  27. 2020: Coordinating the Global Galaxy Community

  28. From the 2018 Community Update

  29. Achieving usegalaxy.✱ coherence
    ● Common reference and index data
    ○ These are already distributed by ❤CVMFS❤, but organized in an ad hoc manner due to the history of Galaxy
    ○ Currently building an automated approach where metadata defining the complete set of reference and index data will live in GitHub, builds will be automated based on GitHub state, and successful builds deployed through ❤CVMFS❤ for replication to all sites
    - Intergalactic Data Commission: https://github.com/usegalaxy-eu/idc
    ● Common tools
    ○ A common set of tools and a common tool menu organization are currently being defined. Tools and tool configuration will also be replicated through ❤CVMFS❤
    ○ This will ensure both that users will have the same user experience across different usegalaxy.✱ instances, and that workflows can be moved between instances and still execute correctly and reproducibly
    ○ Local custom tools will still be supported but clearly identified
    ● See gxadmin, common tools on ❤CVMFS❤ + build + installation, and other coordination efforts developing
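
    The "builds automated based on GitHub state" idea for reference data can be pictured as reconciling a declared set against a deployed set. The Python sketch below is purely illustrative and is not the Intergalactic Data Commission's code: the function names, the (genome, index type) entry shape, and the example values are all hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical entry shape: (genome identifier, index type).
Entry = Tuple[str, str]


def pending_builds(declared: Set[Entry], deployed: Set[Entry]) -> List[Entry]:
    """Entries declared in metadata that have not been built and deployed yet."""
    return sorted(declared - deployed)


def undeclared_entries(declared: Set[Entry], deployed: Set[Entry]) -> List[Entry]:
    """Entries that are deployed but no longer declared in metadata."""
    return sorted(deployed - declared)


# Illustrative values only; these are not real reference data listings.
declared = {("genomeA", "index1"), ("genomeB", "index1")}
deployed = {("genomeA", "index1"), ("genomeC", "index1")}

to_build = pending_builds(declared, deployed)
to_review = undeclared_entries(declared, deployed)
```

    With these example sets, `to_build` holds the one declared entry that still needs a build and `to_review` holds the one deployed entry with no matching declaration, which is the kind of ad hoc leftover a metadata-driven approach makes visible.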

  30. - Unified vision
    - Joint funding efforts
    - Optimization of development and training efforts
    - Improving coordination

  31. Acknowledgements

  32. Acknowledgements: Contributors
    - Core Code: contributors to galaxyproject/galaxy: ~315 (~39 new since last year)
    - Tools: contributors to galaxyproject/tools-iuc: ~195 (~38 new since last year)
      - ...and the ever vigilant Intergalactic Utilities Commission for handling these contributions and maintaining the quality of essential Galaxy tools
      - ...and everyone else who has contributed a tool to the ToolShed
    - Training: contributors to galaxyproject/training-material: ~114 (~34 new since last year)
      - ...and everyone who has conducted or attended Galaxy Training
    - Everyone who has contributed to Galaxy in other ways: users, supporters, …
    - Funding: NSF and NIH (to our team), and all of the funders of the Global Galaxy Community

  33. Acknowledgements
    And, everyone who has attended any of the TEN GALAXY CONFERENCES!

  34. 231 attendees, making the biggest GCC ever!

  35. You’ve gone too far!

  36. (seriously stop)

  37. Galaxy Community Update 2019
    @jxtx @nekrut #usegalaxy

  38. Colors
    We use (nearly) the “Paired” colormap
