
Full scientific software environments

Phil
June 27, 2016


A talk at the JASMIN conference about the software stack deployments at the Met Office. The presentation was designed to lead into an open floor discussion on software deployment across many NERC (Natural Environment Research Council, UK) collaborators.


Transcript

  1. Scientific software environments within the Met Office
     A work in progress to simplify scientific software deployment and provision multiple (full) application stacks
     Philip Elson - 28/06/2016
  2. Why doesn’t a single software environment cut it?
     • Software deployment is slow for shared application instances (HPC, desktop, compute clusters)
     • The risk of impacting at least one user is high
     • No ability to roll back on a per-application basis
     • Users want to move at different paces
     • The operational environment doesn’t need to contain many R&D tools
  3. Managing multiple environments - conda
     • As a developer, conda provides many isolated software environments for simplified testing and dependency tracking. Use it over venv/virtualenv because it can also manage non-Python packages (C[++], Fortran, R, IDL, Lua, data, etc.).
     • conda cannot manage centrally deployed environments (though an environment on a networked disk can be used by many users)
     • Additionally, many users don’t need or want to manage their own environment
     The aim: a solution combining the power of conda with the maintainability of a git repo and the flexibility to choose the deployment mechanism (e.g. shared disk, RPMs, tarball, conda itself).
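A minimal sketch of what such an isolated environment description might look like, using conda's standard `environment.yml` format (the environment name and the particular packages here are illustrative assumptions, not the Met Office's actual stack):

```yaml
# Hypothetical environment spec, resolvable with:
#   conda env create -f environment.yml
# The interpreter itself is just another package, and non-Python
# packages (R, C libraries, etc.) sit alongside Python ones.
name: analysis-env
channels:
  - conda-forge
dependencies:
  - python=2.7
  - numpy
  - r-base        # non-Python: the R language
  - libnetcdf     # non-Python: a C library
```

Each such environment is fully isolated, so one team can upgrade a dependency without touching anyone else's stack.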
  4. What have we done so far?
     • Created conda-forge to build a community of conda packagers (>120 contributors in 6 months)
     • Contributed hundreds of build “recipes” to conda-forge for collaborative ownership and maintenance
     • Developed conda-build-all to simplify conda-building many packages against an appropriate matrix of versions (Python, NumPy, R, etc.)
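For context, a conda-forge build “recipe” is a small declarative description consumed by conda-build. The sketch below is hypothetical — the package name, URL, and checksum are placeholders, not a real recipe:

```yaml
# Hypothetical meta.yaml recipe sketch for conda-build.
package:
  name: example-lib                # placeholder name
  version: "1.2.0"

source:
  url: https://example.invalid/example-lib-1.2.0.tar.gz
  sha256: 0000000000000000000000000000000000000000000000000000000000000000  # placeholder

build:
  number: 0

requirements:
  build:
    - python
    - numpy x.x    # "x.x" pins to each entry in the build matrix
  run:
    - python
    - numpy x.x

test:
  imports:
    - example_lib

about:
  license: BSD-3-Clause
```

Tools like conda-build-all then build a recipe once per entry in the version matrix (e.g. each supported Python/NumPy combination), rather than requiring a manual build per combination.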
  5. What have we done so far (contd.)?
     • Developed conda-gitenv to manage conda environments as a git repo (diff, tag, revert, etc.)
     • Developed conda-rpms to convert a conda-gitenv environment into RPMs. Working, but in need of further development/docs.
     • Deployed production environments to shared disk on the Cray HPC, administered through (automatic) continuous deployment and conda-gitenv
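The core idea — an environment specification tracked in git — can be sketched with plain git commands. This is an illustration of the workflow, not the actual conda-gitenv CLI; the repo layout, file name `env.spec`, and tag names are assumptions:

```shell
# Sketch: treat a pinned environment spec as a git-tracked file,
# so every deployed environment is diffable, taggable and revertable.
git init env-repo
cd env-repo
git config user.email "packager@example.invalid"   # placeholder identity
git config user.name "Example Packager"

# env.spec: a hypothetical fully-pinned package listing.
printf 'python=2.7.11\nnumpy=1.11.0\n' > env.spec
git add env.spec
git commit -m "initial environment"
git tag env-2016.06        # a tagged, reproducible environment

# A later change is just another commit + tag.
printf 'python=2.7.11\nnumpy=1.11.1\n' > env.spec
git commit -am "patch numpy"
git tag env-2016.07

# Exactly what changed between two deployed environments:
git diff env-2016.06 env-2016.07 -- env.spec
```

Rolling back an environment then reduces to checking out (or reverting to) an earlier tag, and continuous deployment can simply track the tip of the repo.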
  6. Collateral benefit
     • Isolating builds helps produce better binaries with greater isolation of dependencies, avoiding the need for huge migration projects when rolling out new OS updates. The exact same binaries work on any Linux with a suitable glibc (RHEL >= 5, Ubuntu, Fedora, SUSE, etc.). [Note: bespoke binaries have been built against highly optimised compilers on the Cray HPC.]
     • Users can create their own environments for non-centrally managed machines (e.g. bring your own device, development teams)
     • Continuous deployment is a possibility (for untagged environments)
     • Sharing of build “recipes” with a wide community (medical, finance, environmental sciences, etc.)
     • Full stack description controlled with git (diffable, revertable, taggable/referenceable)
     • The same process could be used to deploy to Windows and OS X
  7. What’s left?
     • conda-rpms to be made production ready, and RPMs to be deployed across the Met Office estate
     • Defining and documenting the governance model for cross-team environment maintenance (i.e. pull requests)
     • Ongoing maintenance of the conda-forge machinery
     • Ongoing maintenance of recipes (new versions, improved optimisations, corrected metadata)
     • Setting up collaborations to share knowledge, tools, recipes and environment specifications – ultimately to enable a consistent and modern software stack for use in scientific analysis.