
SciTokens: Capability-Based Secure Access to Remote Scientific Data

Presented at PEARC18: https://www.pearc18.pearc.org/
Pre-print: https://arxiv.org/abs/1807.04728
Full Paper: https://doi.org/10.1145/3219104.3219135
Project Home: https://scitokens.org/

Abstract:
The management of security credentials (e.g., passwords, secret keys) for computational science workflows is a burden for scientists and information security officers. Problems with credentials (e.g., expiration, privilege mismatch) cause workflows to fail to fetch needed input data or store valuable scientific results, distracting scientists from their research by requiring them to diagnose the problems, re-run their computations, and wait longer for their results. In this paper, we introduce SciTokens, open source software to help scientists manage their security credentials more reliably and securely. We describe the SciTokens system architecture, implementation design, and initial experimental deployment results to address use cases from the Laser Interferometer Gravitational-Wave Observatory (LIGO) Scientific Collaboration and the Large Synoptic Survey Telescope (LSST) projects. We also present our integration with widely used software that supports distributed scientific computing, including HTCondor, CVMFS, and XRootD. SciTokens uses IETF-standard OAuth tokens for capability-based secure access to remote scientific data. The access tokens convey the specific authorizations needed by the workflows, rather than general-purpose authentication impersonation credentials, to address the risks of scientific workflows running on distributed infrastructure including NSF resources (e.g., LIGO Data Grid, Open Science Grid, XSEDE) and public clouds (e.g., Amazon Web Services, Google Cloud, Microsoft Azure). By improving the interoperability and security of scientific workflows, SciTokens 1) enables use of distributed computing for scientific domains that require greater data protection and 2) enables use of more widely distributed computing resources by reducing the risk of credential abuse on remote systems.

Jim Basney

July 25, 2018

Transcript

  1. SciTokens: Capability-Based Secure
    Access to Remote Scientific Data
    Jim Basney
    https://www.scitokens.org/
    This material is based upon work supported by the National Science Foundation under Grant
    No. 1738962. Any opinions, findings, and conclusions or recommendations expressed in this material
    are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.


  2. SciTokens Project
    • The SciTokens project:
    • Introduces a capabilities-based authorization infrastructure for distributed scientific computing,
    • Provides a reference platform, combining CILogon, HTCondor, CVMFS, and XRootD, and
    • Implements specific use cases to help our science stakeholders (LIGO and LSST) better achieve their scientific aims.


  3. SciTokens uses standards
    • RFC 6749: OAuth 2.0 Authorization Framework
    • token request, consent, refresh
    • RFC 7519: JSON Web Token (JWT)
    • self-describing tokens, distributed validation
    • RFC 8414: OAuth 2.0 Authorization Server Metadata
    • token signing keys, policies, endpoint URLs
    • OAuth 2.0 Token Exchange (IETF OAuth WG I-D)
    • token delegation, drop privileges
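    As a rough illustration of "drop privileges" via token exchange, the sketch below builds the form-encoded body a client might POST to a token endpoint to trade a broad access token for one with a narrower scope. The parameter names and URN values follow the OAuth 2.0 Token Exchange specification (later published as RFC 8693); the token value, the scope string, and the function name are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode, parse_qs

# URN constants defined by the OAuth 2.0 Token Exchange specification.
TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"


def build_token_exchange_body(subject_token: str, narrowed_scope: str) -> str:
    """Return the form-encoded body of a token exchange request.

    subject_token is the token the client already holds; narrowed_scope is
    the reduced set of permissions requested for the new token.
    """
    params = {
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "subject_token": subject_token,
        "subject_token_type": ACCESS_TOKEN_TYPE,
        "scope": narrowed_scope,
    }
    return urlencode(params)


# Hypothetical values, for illustration only.
body = build_token_exchange_body("example-access-token", "read:/frames")
print(parse_qs(body)["scope"])
```

    Whether the server grants the narrower token is a policy decision on the server side; the client only states what it is asking for.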


  4. Example Token, Decoded
    • The decoded token contains multiple scopes, which are essentially filesystem authorizations.
    • The audience narrows who the token is intended for.
    • The issuer identifies who created the token; its value is used to locate the public keys needed to validate the signature.
    • The subject is an opaque identifier for the resource owner. In this case, it also happens to be the identity.
    • The expiration is a Unix timestamp at which the token expires. A typical lifetime is 10 minutes.
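    The token shown on the original slide is not preserved in this transcript. As a stand-in, the sketch below builds a hypothetical token with the claims described above and decodes its payload using only the Python standard library. All claim values (issuer, audience, subject, scope strings) are illustrative assumptions, and the signature is a dummy: the code does not validate signatures, which a real resource server must do with the issuer's public keys before trusting any claim.

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Base64url-decode, restoring the padding that JWTs strip."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)


def decode_claims(token: str) -> dict:
    """Return the payload claims of a JWT WITHOUT verifying its signature."""
    _header_b64, payload_b64, _signature_b64 = token.split(".")
    return json.loads(b64url_decode(payload_b64))


def is_usable(claims: dict, expected_audience: str, now: int) -> bool:
    """Check the audience and expiration claims against the caller's context."""
    audiences = claims["aud"]
    if isinstance(audiences, str):  # "aud" may be a string or a list
        audiences = [audiences]
    return expected_audience in audiences and now < claims["exp"]


# Hypothetical claims; none of these values come from the slides.
issued_at = 1532500000
example_claims = {
    "iss": "https://demo.scitokens.org",
    "aud": "https://data.example.org",
    "sub": "dbrown",
    "scope": "read:/frames write:/users/dbrown/pycbc-32931",
    "exp": issued_at + 600,  # 10-minute lifetime
}
header = {"alg": "RS256", "typ": "JWT"}
example_token = ".".join([
    b64url_encode(json.dumps(header).encode("utf-8")),
    b64url_encode(json.dumps(example_claims).encode("utf-8")),
    b64url_encode(b"not-a-real-signature"),
])

claims = decode_claims(example_token)
print(claims["scope"].split())
```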


  5. CILogon and SciTokens
    CILogon
    • Federated Identity Management
    • OpenID Connect
    • ID Tokens
    SciTokens
    • Federated Authorization
    • OAuth 2.0
    • Access Tokens
    [Diagram components: InCommon IdP, CILogon, SciTokens, Resource.
    Diagram labels: User ID, Name, Email; User Info, VO Info, Groups; Access Rights.]


  6. SciTokens System Architecture
    [Architecture diagram.
    Phases: Job Submission, Job Execution, Data Access.
    Components: User, Identity Provider, Token Server, Policy DB, condor_submit, condor_schedd, condor_credd, condor_shadow, condor_startd, condor_starter, User’s job, Data Server (CVMFS / XRootD).
    Legend: R = refresh tokens, A = access tokens.]


  7. User Experience
    user@chtc$ condor_submit workflow.jdl
    Visit https://chtc.example.edu/authorize to authorize your jobs.
    user@chtc$
    Your HTCondor jobs require the following permissions:
    • Read from /frames on LIGO Frame Server
    • Write to /users/dbrown/pycbc-32931 on LIGO Data Server
    Allow Deny
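    The two permissions on the consent page map naturally onto path-based scopes. The sketch below shows one way a data server might check a requested operation against such scopes. The `operation:/path` scope syntax and the prefix-matching rule are assumptions made for illustration, and the function names are hypothetical; this is not the SciTokens library API.

```python
import posixpath


def parse_scopes(scope_string: str) -> list:
    """Split a space-separated scope string into (operation, path) pairs."""
    scopes = []
    for item in scope_string.split():
        operation, separator, path = item.partition(":")
        if not separator:
            continue  # ignore scopes that carry no path
        scopes.append((operation, posixpath.normpath(path)))
    return scopes


def is_authorized(scope_string: str, operation: str, path: str) -> bool:
    """Return True if some scope grants `operation` on `path` or a parent directory."""
    # Normalize first so "/frames/../users" cannot slip past a "/frames" scope.
    requested = posixpath.normpath(path)
    for granted_operation, granted_path in parse_scopes(scope_string):
        if granted_operation != operation:
            continue
        if granted_path == "/":
            return True
        # Match the directory itself or anything below it, but not
        # siblings that merely share a name prefix (e.g. "/frames-private").
        if requested == granted_path or requested.startswith(granted_path + "/"):
            return True
    return False


# Scope string mirroring the two permissions on the consent page.
scope = "read:/frames write:/users/dbrown/pycbc-32931"
print(is_authorized(scope, "read", "/frames/H1/data.gwf"))
```

    Normalizing the requested path before comparison keeps `..` segments from escaping the granted directory, which is the main pitfall of naive prefix checks.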


  8. Early results on OSG
    • End-to-end token-based auth{z,n} workflow for the OSG VO submit service
    • Includes patches to XRootD to validate tokens presented via HTTPS and to write files out with the correct Unix user permissions
    • Details:
    • Instead of using OAuth2 to generate the token, we keep a signing key on the submit host.
    • Only one token is needed.
    • The submit host and storage server are both owned by OSG.


  9. Give SciTokens a try!
    • https://demo.scitokens.org/ - token generator
    • https://github.com/scitokens/ - open source software
    • Java and Python implementations
    • SciTokens-aware token server
    • CVMFS, Nginx, and XRootD plugins
    • Docker image for XRootD setup
    • https://scitokens.org/ - docs, email lists
