SciTokens: Capability-Based Secure Access to Remote Scientific Data

Presented at PEARC18: https://www.pearc18.pearc.org/
Pre-print: https://arxiv.org/abs/1807.04728
Full Paper: https://doi.org/10.1145/3219104.3219135
Project Home: https://scitokens.org/

Abstract:
The management of security credentials (e.g., passwords, secret keys) for computational science workflows is a burden for scientists and information security officers. Problems with credentials (e.g., expiration, privilege mismatch) cause workflows to fail to fetch needed input data or store valuable scientific results, distracting scientists from their research by requiring them to diagnose the problems, re-run their computations, and wait longer for their results. In this paper, we introduce SciTokens, open source software to help scientists manage their security credentials more reliably and securely. We describe the SciTokens system architecture, implementation design, and initial experimental deployment results to address use cases from the Laser Interferometer Gravitational-Wave Observatory (LIGO) Scientific Collaboration and the Large Synoptic Survey Telescope (LSST) projects. We also present our integration with widely used software that supports distributed scientific computing, including HTCondor, CVMFS, and XRootD. SciTokens uses IETF-standard OAuth tokens for capability-based secure access to remote scientific data. The access tokens convey the specific authorizations needed by the workflows, rather than general-purpose authentication impersonation credentials, to address the risks of scientific workflows running on distributed infrastructure including NSF resources (e.g., LIGO Data Grid, Open Science Grid, XSEDE) and public clouds (e.g., Amazon Web Services, Google Cloud, Microsoft Azure). By improving the interoperability and security of scientific workflows, SciTokens 1) enables use of distributed computing for scientific domains that require greater data protection and 2) enables use of more widely distributed computing resources by reducing the risk of credential abuse on remote systems.

Jim Basney

July 25, 2018

Transcript

  1. SciTokens: Capability-Based Secure Access to Remote Scientific Data

    Jim Basney <jbasney@ncsa.Illinois.edu>
    https://www.scitokens.org/
    This material is based upon work supported by the National Science Foundation under Grant No. 1738962. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
  2. SciTokens Project

    The SciTokens project:
    • Introduces a capabilities-based authorization infrastructure for distributed scientific computing,
    • Provides a reference platform, combining CILogon, HTCondor, CVMFS, and XRootD, and
    • Implements specific use cases to help our science stakeholders (LIGO and LSST) better achieve their scientific aims.
  3. SciTokens uses standards

    • RFC 6749: OAuth 2.0 Authorization Framework (token request, consent, refresh)
    • RFC 7519: JSON Web Token (JWT) (self-describing tokens, distributed validation)
    • RFC 8414: OAuth 2.0 Authorization Server Metadata (token signing keys, policies, endpoint URLs)
    • OAuth 2.0 Token Exchange (IETF OAuth WG I-D) (token delegation, drop privileges)
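As a rough illustration of the metadata discovery mentioned for RFC 8414, the Python sketch below builds the metadata URL for an issuer and reads the signing-key location from a metadata document. It assumes the default well-known suffix defined in RFC 8414 (a deployment may use a different registered suffix); the issuer URL and the sample metadata values are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Default well-known suffix defined by RFC 8414; deployments may use another.
WELL_KNOWN = "/.well-known/oauth-authorization-server"


def metadata_url(issuer: str) -> str:
    """Insert the well-known suffix between the issuer's host and its path."""
    parts = urlsplit(issuer)
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme, parts.netloc, WELL_KNOWN + path, "", ""))


def signing_keys_url(metadata: dict) -> str:
    """Return the JWK Set location ("jwks_uri") advertised in the metadata."""
    return metadata["jwks_uri"]


# Hypothetical issuer and metadata document, for illustration only.
issuer = "https://tokens.example.org/ligo"
sample_metadata = {
    "issuer": issuer,
    "token_endpoint": issuer + "/token",
    "jwks_uri": issuer + "/keys",
}

print(metadata_url(issuer))
print(signing_keys_url(sample_metadata))
```

A validator would fetch the first URL over HTTPS, then fetch the keys from the second; the network calls are left out so the sketch stays self-contained.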
  4. Example Token, Decoded

    • The decoded token contains multiple scopes - basically filesystem authorizations.
    • The audience narrows who the token is intended for.
    • The issuer identifies who created the token; value used to locate the public keys needed to validate signature.
    • The subject is an opaque identifier for the resource owner. In this case, it also happens to be the identity.
    • The expiration is a Unix timestamp when the token expires. A typical lifetime is 10 minutes.
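The claims described on this slide can be inspected with a few lines of Python. The sketch below is a minimal illustration using only the standard library: it decodes the payload of a compact JWT and checks the expiration and audience. It does not verify the signature, which a real validator must do with the issuer's public keys. The sample claim values (issuer, audience, subject, timestamps) are hypothetical, and the scope strings assume an action:path form.

```python
import base64
import json
import time


def decode_payload(token: str) -> dict:
    """Return the claims of a compact JWT WITHOUT verifying its signature."""
    _header_b64, payload_b64, _signature_b64 = token.split(".")
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))


def claims_usable(claims: dict, expected_audience: str, now=None) -> bool:
    """Check that the token has not expired and names the expected audience."""
    now = time.time() if now is None else now
    if claims["exp"] <= now:
        return False
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else (aud or [])
    return expected_audience in audiences


def _b64(obj: dict) -> str:
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Hypothetical claims; "exp" is 600 seconds (10 minutes) after "iat".
sample_claims = {
    "iss": "https://tokens.example.org/ligo",
    "aud": "https://data.example.org",
    "sub": "dbrown",
    "scope": "read:/frames write:/users/dbrown/pycbc-32931",
    "iat": 1532500000,
    "exp": 1532500600,
}
# The signature segment is a placeholder; this sketch never checks it.
sample_token = ".".join(
    [_b64({"alg": "RS256", "typ": "JWT"}), _b64(sample_claims), "signature-omitted"]
)

claims = decode_payload(sample_token)
print(claims["scope"].split())
print(claims_usable(claims, "https://data.example.org", now=1532500300))
```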
  5. CILogon and SciTokens

    • CILogon: Federated Identity Management, OpenID Connect, ID Tokens
    • SciTokens: Federated Authorization, OAuth 2.0, Access Tokens
    [Diagram labels: InCommon IdP, CILogon, SciTokens, Resource; User ID, Name, Email; User Info, VO Info, Groups, Access Rights]
  6. SciTokens System Architecture

    [Diagram with three stages: Job Submission, Job Execution, Data Access. Components: condor_submit, condor_schedd, condor_credd, condor_shadow, condor_startd, condor_starter, User’s job, Token Server, Data Server (CVMFS / XRootD), User, Policy DB, Identity Provider. Legend: R = refresh tokens, A = access tokens.]
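The architecture diagram distinguishes refresh tokens from access tokens. As a hedged sketch of how a component holding a refresh token could ask a token server for a fresh access token, the Python below builds the form-encoded request body defined in RFC 6749, section 6. The function name, the token value, and the scope string are hypothetical; client authentication and the HTTPS POST itself are omitted.

```python
from urllib.parse import parse_qs, urlencode


def refresh_request_body(refresh_token: str, scopes=None) -> str:
    """Build the form body for an OAuth 2.0 refresh request (RFC 6749, sec. 6)."""
    params = {"grant_type": "refresh_token", "refresh_token": refresh_token}
    if scopes:
        # Requesting a subset of the original scopes yields a narrower token.
        params["scope"] = " ".join(scopes)
    return urlencode(params)


# Hypothetical values, for illustration only.
body = refresh_request_body("R-example", ["read:/frames"])
print(body)
print(parse_qs(body))
```

The body would be sent as application/x-www-form-urlencoded to the token endpoint, and the JSON response would carry the new access token.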
  7. User Experience

    Terminal:
    user@chtc$ condor_submit workflow.jdl
    Visit https://chtc.example.edu/authorize to authorize your jobs.
    user@chtc$

    Authorization page:
    Your HTCondor jobs require the following permissions:
    • Read from /frames on LIGO Frame Server
    • Write to /users/dbrown/pycbc-32931 on LIGO Data Server
    [Allow] [Deny]
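The two permissions shown to the user can be thought of as path-scoped capabilities. The Python sketch below is one possible way to render such scopes as consent text and to check a requested path against them. It assumes scopes written as action:path (for example read:/frames) and a simple path-prefix rule; both function names are hypothetical, and this is not presented as the project's own implementation.

```python
import posixpath

ACTION_LABELS = {"read": "Read from", "write": "Write to"}


def describe_scope(scope: str, server: str) -> str:
    """Turn a scope such as "read:/frames" into a line of consent text."""
    action, _, path = scope.partition(":")
    return f"{ACTION_LABELS[action]} {path} on {server}"


def scope_allows(scopes, action: str, path: str) -> bool:
    """Return True if any scope grants `action` on `path` or a parent of it."""
    path = posixpath.normpath(path)  # collapse "." and ".." before comparing
    for entry in scopes:
        verb, _, prefix = entry.partition(":")
        if verb != action or not prefix.startswith("/"):
            continue
        base = prefix.rstrip("/")
        # Match the directory itself or anything beneath it, not "/framesX".
        if path == base or path.startswith(base + "/"):
            return True
    return False


granted = ["read:/frames", "write:/users/dbrown/pycbc-32931"]
print(describe_scope(granted[0], "LIGO Frame Server"))
print(scope_allows(granted, "write", "/users/dbrown/pycbc-32931/out.hdf"))
print(scope_allows(granted, "write", "/users/dbrown/other"))
```

Normalizing the path first matters: without it, a request for /frames/../users/dbrown would appear to fall under the /frames scope.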
  8. Early results on OSG

    • End-to-end token-based auth{z,n} workflow for the OSG VO submit service
    • Includes patches to XRootD to validate tokens presented via HTTPS and to write files out with the correct Unix user permissions
    • Details:
      • Instead of using OAuth2 to generate the token, we keep a signing key on the submit host.
      • Only one token needed.
      • Submit host and storage server owned by OSG.
  9. Give SciTokens a try!

    • https://demo.scitokens.org/ - token generator
    • https://github.com/scitokens/ - open source software
      • Java and Python implementations
      • SciTokens-aware token server
      • CVMFS, Nginx, and XRootD plugins
      • Docker image for XRootD setup
    • https://scitokens.org/ - docs, email lists