Galaxy Update 2019

Presented by @jxtx and @nekrut at the Galaxy Community Conference 2019 in Freiburg

James Taylor

July 03, 2019

Transcript

  2. Ten years of the Galaxy Community

  5. 2010: First Galaxy Developer Community Conference

  7. Feature highlights

  8. Highlights: Galaxy Help!

  9. Improving Galaxy’s User Experience
    From 2018 Community Update

  10. User Experience Highlights
    - New default style
    - Favorite tools
    - Dataset picker
    - Colorful #nametags
    - Grouptags for structured experimental designs
    - Widespread modernization, vueification, and tweaks

  11. Workflow User Experience Highlights
    - Connection feedback
    - Access previous versions
    - Why can’t I connect?
    - Expression steps in workflows
    - Reusable input parameters

  12. Galaxy Architecture for Protected / Secure Data

  13. Supporting a diverse research landscape with Galaxy
    From 2018 Community Update

  14. AnVIL: Genomic Analysis, Visualization, Informatics Lab
    Taylor, Nekrutenko, Goecks + Broad Institute and many more
    - Cloud environment for analysis and interpretation of genomic and genome-adjacent data
    - Allow users to analyze protected data while maintaining security and compliance
    - Integrate multiple analysis and visualization environments
    A Galaxy for Cancer Genomics Research
    Goecks, Taylor, Blankenberg, Nekrutenko
    - Cloud environment for analysis and interpretation of cancer genomes and related datasets
    - Integrate with dozens of other tools funded under ITCR that #usegalaxy
    - Integrate with existing resources like the Cancer Research Data Commons and NCI cancer clouds
    Both require bringing Galaxy into certified secure environments (FISMA compliant), maintaining infrastructure-level isolation between individual users to ensure data remains protected

  15. AnVIL: Inverting the model of genomic data sharing
    Traditional: Bring data to the researcher
    - Copying/moving data is costly
    - Harder to enforce security
    - Redundant infrastructure
    - Siloed compute
    Goal: Bring researcher to the data
    - Reduced redundancy and costs
    - Active threat detection and auditing
    - Greater accessibility
    - Elastic, shared compute

  16. AnVIL / Terra: analysis workspaces and batch workflows
    AnVIL / Gen3: Data models, indexing, querying
    AnVIL / Dockstore: sharing containerized tools and workflows
    AnVIL / Analysis Environments: Jupyter Notebooks, RStudio, Galaxy, ...

  17. AnVIL / Terra: analysis workspaces and batch workflows
    AnVIL / Gen3: Data models, indexing, querying
    AnVIL / Analysis Environments: Jupyter Notebooks, RStudio, Galaxy, ...
    FISMA Moderate
    2 ATOs
    Pursuing FedRAMP
    All data use and analysis in a FISMA Moderate environment
    Implemented on
    Primary data storage costs covered by AnVIL; user private data and compute billed directly through Google
  18. Initially, relying on the platform to deploy a Galaxy instance for each user
    More overhead than the current user experience
    Can we provide secure Galaxy instances with (close to) the current Galaxy user experience?

  19. Isolated Galaxy instances with a single interface
    [Diagram labels: Security Boundary; Shared DB (no protected data); Unprivileged Galaxy Instance; Galaxy Multiplexer]

  20. Isolated Galaxy instances with a single interface
    [Diagram labels: Security Boundary; Shared DB (no protected data); Anonymous User; Unprivileged Galaxy Instance; Galaxy Multiplexer]

  21. Isolated Galaxy instances with a single interface
    [Diagram labels: Security Boundary; User 1 Isolated Resources (User Data and DB, User 1 Galaxy Instance, User Compute Containers); Shared DB (no protected data); Anonymous User; Unprivileged Galaxy Instance; User 1; Galaxy Multiplexer]

  22. Isolated Galaxy instances with a single interface
    [Diagram labels: Security Boundary; User 1 Isolated Resources (User Data and DB, User 1 Galaxy Instance, User Compute Containers); User 2 Isolated Resources (User Data and DB, User 2 Galaxy Instance, User Compute Containers); Shared DB (no protected data); Anonymous User; Unprivileged Galaxy Instance; User 1; User 2; Galaxy Multiplexer]
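
The routing idea behind "isolated Galaxy instances with a single interface" can be sketched in a few lines. This is only an illustration under assumptions: the class name `Multiplexer`, the `route` method, and the instance names are hypothetical and do not come from the slides or from Galaxy's code. It assumes anonymous users are sent to the shared unprivileged instance and each signed-in user is sent to a dedicated instance.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical name for the shared instance that holds no protected data.
SHARED_INSTANCE = "unprivileged-galaxy"


@dataclass
class Multiplexer:
    """Toy router: one entry point, one backend instance per user."""

    # Maps a user id to the name of that user's isolated instance.
    instances: Dict[str, str] = field(default_factory=dict)

    def route(self, user_id: Optional[str]) -> str:
        # Anonymous requests only ever reach the shared, unprivileged instance.
        if user_id is None:
            return SHARED_INSTANCE
        # A signed-in user is assigned a dedicated instance on first use.
        if user_id not in self.instances:
            self.instances[user_id] = f"galaxy-{user_id}"
        return self.instances[user_id]


mux = Multiplexer()
print(mux.route(None))
print(mux.route("user1"))
print(mux.route("user2"))
```

In this sketch the mapping is held in memory; a real deployment would also need to provision and tear down the per-user resources, which is outside what the slides specify.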

  23. To achieve this, need to make it easy to manage a Galaxy composed of many different containers, across different security boundaries, ...

  24. Galaxy + Kubernetes
    [Diagram labels: Bootstrap via CloudLaunch (run, VM IP); CloudBridge (AWS, Azure, GCE, OpenStack); CloudLaunch-plugin; galaxy/cloudman-boot; cloudman-boot → Rancher, K8S, Helm; CloudMan chart; Galaxy chart; VM; remote object store(s); local cache; authn / authz; containerized jobs]
    Layers: CloudBridge (multi-cloud), CloudLaunch (infrastructure), CloudMan (coordination), HelmsMan (applications)
    @EnisAfgan @nuwan_ag @pabloOmics @almahmoud @ic4f

  25. Distributing Galaxy Execution
    or... “How little can a Galaxy actually touch the data?”

  26. Current Remote Execution Data Flow
    new job:
      inputs:
        - dataset 1
        - dataset 2
      outputs:
        - dataset 3
      tool: HISAT2
    [Diagram labels: Cloud; Galaxy; Data Storage; Volume; NFS; Pulsar; Compute; steps over time: create job, execute job, get datasets 1, 2, compute, job complete; legend: control message, data movement]
    @jmchilton @natefoo

  27. Future Remote Execution Data Flow
    new job:
      inputs:
        - dataset 1
        - dataset 2
      outputs:
        - dataset 3
      tool: HISAT2
    [Diagram labels: Cloud; Galaxy; Data Storage; Volume; NFS; Pulsar; Compute; steps over time: create job, execute job, get datasets 1, 2, compute, job complete; legend: control message, data movement]
    @jmchilton @natefoo

  28. Future k8s Remote Execution Data Flow
    new job:
      inputs:
        - dataset 1
        - dataset 2
      outputs:
        - dataset 3
      tool: HISAT2
    [Diagram labels: Kubernetes; Job Pod; BioContainer; Executor Container; Galaxy; Data Storage; Volume; NFS; steps over time: create job, execute job, get datasets 1, 2, compute, job complete; legend: control message, data movement]
    @jmchilton @natefoo
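
One reading of the question on slide 25 ("How little can a Galaxy actually touch the data?") is that the control message carries only dataset references, while the bytes move between storage and the compute side. The sketch below illustrates that reading and nothing more: the diagram itself is not captured in the transcript, and `JobMessage`, `Storage`, and `execute_on_compute` are hypothetical names, not Galaxy or Pulsar APIs. The "tool run" is a stand-in that joins the input bytes.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class JobMessage:
    """Control message: names of datasets only, never their contents."""

    tool: str
    inputs: List[str]
    outputs: List[str]


class Storage:
    """Stand-in for the shared data storage volume."""

    def __init__(self, datasets: Dict[str, bytes]) -> None:
        self._datasets = dict(datasets)

    def get(self, name: str) -> bytes:
        return self._datasets[name]

    def put(self, name: str, data: bytes) -> None:
        self._datasets[name] = data


def execute_on_compute(job: JobMessage, storage: Storage) -> str:
    # The compute side resolves the references against storage itself.
    staged = [storage.get(name) for name in job.inputs]
    # Placeholder for actually running the tool on the staged inputs.
    result = b"+".join(staged)
    # Outputs are written straight back to storage.
    for name in job.outputs:
        storage.put(name, result)
    # Only a status string travels back as a control message.
    return "job complete"


storage = Storage({"dataset 1": b"reads", "dataset 2": b"index"})
job = JobMessage(tool="HISAT2", inputs=["dataset 1", "dataset 2"], outputs=["dataset 3"])
status = execute_on_compute(job, storage)
print(status, storage.get("dataset 3"))
```

The point of the shape is that the caller that builds `JobMessage` never holds dataset contents; whether the real design works exactly this way is an assumption.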

  29. Challenges ...

  30. - Galaxy is not Netflix
    - Communicating new features to all audiences
    - Fighting misconceptions and heartburn
    - Attracting reluctant users

  31. 2020: Coordinating the Global Galaxy Community

  32. From 2018 Community Update

  33. Achieving usegalaxy.✱ coherence
    ● Common reference and index data
    ○ These are already distributed by ❤CVMFS❤, but organized in an ad hoc manner due to the history of Galaxy
    ○ Currently building an automated approach where metadata defining the complete set of reference and index data will live in GitHub, builds will be automated based on GitHub state, and successful builds deployed through ❤CVMFS❤ for replication to all sites
    - Intergalactic Data Commission: https://github.com/usegalaxy-eu/idc
    ● Common tools
    ○ A common set of tools and a common tool menu organization is currently being defined. Tools and tool configuration will also be replicated through ❤CVMFS❤
    ○ This will ensure both that users will have the same user experience across different usegalaxy.✱ instances, and that workflows can be moved between instances and still execute correctly and reproducibly
    ○ Local custom tools will still be supported but clearly identified
    ● See gxadmin, common tools on ❤CVMFS❤ + build + installation, and other coordination efforts developing
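
The "builds automated based on GitHub state" idea amounts to comparing what the metadata declares with what is already deployed and building the difference. A minimal sketch of that comparison follows. The function name `plan_builds`, the dictionary layout, and the example genome and indexer names are all assumptions made for illustration; they are not the Intergalactic Data Commission's actual format or tooling.

```python
from typing import Dict, List, Set, Tuple


def plan_builds(
    declared: Dict[str, Set[str]],
    deployed: Dict[str, Set[str]],
) -> List[Tuple[str, str]]:
    """Return (genome, index) pairs that are declared but not yet deployed."""
    missing: List[Tuple[str, str]] = []
    for genome in sorted(declared):
        already = deployed.get(genome, set())
        # Sort so the resulting build plan is deterministic.
        for index in sorted(declared[genome] - already):
            missing.append((genome, index))
    return missing


# Hypothetical declared state (what the metadata repository would say).
declared = {"hg38": {"bwa", "hisat2"}, "mm10": {"bwa"}}
# Hypothetical deployed state (what is already published for replication).
deployed = {"hg38": {"bwa"}}

print(plan_builds(declared, deployed))
# prints [('hg38', 'hisat2'), ('mm10', 'bwa')]
```

Running the same comparison after each successful build would leave an empty plan once the deployed state matches the declared one.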

  36. - Unified vision
    - Joint funding efforts
    - Optimization of development and training efforts
    - Improving coordination

  37. Acknowledgements

  38. Acknowledgements: Contributors
    - Core Code: contributors to galaxyproject/galaxy:
      - ~315 (~39 new since last year)
    - Tools: contributors to galaxyproject/tools-iuc:
      - ~195 (~38 new since last year)
      - ...and the ever vigilant Intergalactic Utilities Commission for handling these contributions and maintaining the quality of essential Galaxy tools
      - ...and everyone else who has contributed a tool to the ToolShed
    - Training: contributors to galaxyproject/training-material
      - ~114 (~34 new since last year)
      - ...and everyone who has conducted or attended Galaxy Training
    - Everyone who has contributed to Galaxy in other ways:
      - users, supporters, …
    - Funding: NSF and NIH (to our team), and all of the funders of the Global Galaxy Community

  39. Acknowledgements
    And, everyone who has attended any of the TEN GALAXY CONFERENCES!

  40. 231

  41. 231
    Attendees, making the biggest GCC ever!

  42. (fin)

  43. You’ve gone too far!

  44. (seriously stop)

  45. Galaxy Community Update 2019
    @jxtx @nekrut #usegalaxy

  46. Colors
    We use (nearly) the “Paired” colormap
