R/Pharma RStudio Connect Admin Workshop

kellobri
November 01, 2021

RStudio Connect Admin Training Workshop (Virtual) for R/Pharma 2021

Transcript

  1. RStudio Connect
    Admin Workshop

  2. Workshop Agenda
    ● Part 1: Get Set Up for Success
    ● Part 2: The Admin Experience
    ● Part 3: Preview of things to come

  3. What is the purpose of RStudio Connect?
    1 Publishing
    Deploy R or Python content via a variety of methods: push-button, CLI, git-backed, or Server API.
    2 Execution
    Publish static content, or source code-backed items. Set up documents to run on a schedule, control resource allocation to interactive applications and APIs.
    3 Management
    Control the metadata associated with a content item. Publish new versions in place, roll forward/backward. Access scheduled report history. Add organizational tags. Track the usage metrics.
    4 Distribution
    Add viewers and collaborators. Set a vanity URL. Schedule a custom email to be sent on success criteria.

  4. Content Basics
    All the work you do in R & Python: “Data Products”
    ● Applications
    ○ Shiny
    ○ Dash, Streamlit, Bokeh
    ● Documents
    ○ R Markdown
    ○ Jupyter Notebooks
    ○ Static content: sites, plots, graphs
    ● Pins
    ● Web APIs (RSC Standard & Enterprise)
    ○ Plumber
    ○ Flask, FastAPI
    ○ Tableau Analytic Extensions: plumbertableau, fastAPItableau
    ● Models

  5. User Roles
    ● Administrators
    ○ Have all privileges, but must explicitly grant themselves content access
    ○ Actions are audited
    ○ Special access to the Admin tab and certain content settings
    ● Publishers
    ○ Can upload new content items
    ● Viewers
    ○ Can see content items
    Content Privileges
    ● Content Collaborators
    ○ Can publish new versions
    ○ Manage settings
    ○ Download source bundles (code)
    ● Content Viewers
    ○ Can only see and interact with the content itself

  6. RStudio Connect Overview
    ● Demo of Publishing Mechanisms
    ○ Push-button (RStudio IDE, Jupyter Notebooks)
    ○ Git-backed (manifest generation)
    ○ Programmatic (Azure DevOps example)
    ● Demo of Application Permissions Management
    ○ Users
    ○ Groups
    ● Demo of Admin Dashboard Functionality
    ○ Metrics
    ○ Process listing (updated)
    ○ Tags
    ○ Audit Logs
    ○ Unpublished Content
    ○ Scheduled Content Calendar

  7. Configuration Basics

  8. Supported Linux Distributions
    ● RHEL/CentOS 7 & 8*
    ● Ubuntu 18.04 LTS & 20.04 LTS
    ● SLES 12 SP5
    ● SLES 15 SP2 / openSUSE 15.2
    *Distributions such as Rocky Linux and AlmaLinux can be used as long as they stay 1:1
    binary compatible with RHEL 8. CentOS Stream is not supported by RStudio.

  9. Reference Architectures: Single Server

  10. Reference Architectures: Cluster

  11. RStudio Connect & Docker
    https://github.com/rstudio/rstudio-docker-products
    RStudio products are designed to live on long-running Linux servers. RStudio products are entirely compatible with treating a container like the underlying Linux server to better encapsulate dependencies and diminish server statefulness.
    In this model, each RStudio product is placed in its own long-running container and treated as a standalone instance of the product. Multiple containers can be load-balanced and treated as a cluster. These containers can be managed by a Kubernetes cluster, should you wish.
    There are some specific considerations for running RStudio products in containers, which are detailed in this article.

  12. Types of Evaluations
    1. Not very useful to Admins: RStudio Hosted Evaluation
    https://www.rstudio.com/products/connect/evaluation/
    2. Useful but you have to DIY: 45-day Evaluation Key
    https://www.rstudio.com/products/connect/download-commercial/

  13. DEMO
    Configuration from Scratch

  14. Authentication Decision Making
    Authentication Provider Type | User Self-Registration | Authorization via Groups | Current-User Execution
    Password (built-in) | Yes (can also be disabled) | Groups must be managed locally in Connect | No
    PAM | No | Groups must be managed locally in Connect | Yes (per-app basis)
    LDAP/AD | No | Groups can come from the Provider or be local to Connect | No
    SAML | No | Groups can come from the Provider or be local to Connect | No
    OIDC - Google | No | Groups must be managed locally in Connect | No
    OIDC - Others | No | Groups can come from the Provider or be local to Connect | No
    Proxied through an external service | No | Groups can come from the Provider or be local to Connect | No

  15. The Key to Configuration: Publisher Relations
    The most successful RStudio Connect Installations require
    open dialog between admins and publishers.
    ● Do you know who your Publishers are?
    ● Do your Publishers know who you are?
    ● Do you know what types of content they will be publishing?
    ● Do you know what versions of R and Python they'll need?
    ● Do you know how they plan to connect to data sources?
    This is also your chance to make any Dev/SecOps policies and expectations known.

  16. More Publisher Discussion Topics
    Connect Access Restrictions
    ● Will you place limits on allowed viewership? (all, logged_in, acl)
    ○ Applications.MostPermissiveAccessType
    ○ Applications.AdminMostPermissiveAccessType
    Vanity URL Management
    ● Will Publishers be allowed to set vanity URLs for content?
    ○ Authorization.PublishersCanManageVanities
    Content Organization: Tags
    ● Work with your Publishers to set up a Tag Schema for content organization
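
To make the viewership ceiling discussed above concrete, here is a minimal Python sketch. It assumes the three access types are ordered from most restrictive to most permissive as acl, logged_in, all. The names `ACCESS_LEVELS` and `is_access_allowed` are hypothetical and only illustrate the idea; the actual enforcement happens inside RStudio Connect based on Applications.MostPermissiveAccessType.

```python
# Assumed ordering, from most restrictive to most permissive.
ACCESS_LEVELS = ["acl", "logged_in", "all"]


def is_access_allowed(requested: str, most_permissive: str) -> bool:
    """Return True if `requested` does not exceed the configured ceiling.

    Hypothetical helper: it mirrors the intent of a "most permissive
    access type" setting, not Connect's real implementation.
    """
    return ACCESS_LEVELS.index(requested) <= ACCESS_LEVELS.index(most_permissive)


# With a ceiling of "logged_in", anonymous ("all") access is refused.
print(is_access_allowed("all", "logged_in"))
print(is_access_allowed("acl", "logged_in"))
```

A check like this can be useful when talking with publishers about which sharing options will be available to them before any content is deployed.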

  17. Resource Management & Budgeting

  19. Scaling Applications
    ● R is single threaded
    ● Load balance between processes
    ● Application owners can set Minimum and Maximum processes
    ● Use Scheduler.MinProcessesLimit to cap resources if this becomes a problem
    ● Scheduler.MaxProcessesLimit is also available

  20. Application Timeouts
    ● The maximum amount of time to wait for an app to start
    Scheduler.InitTimeout = 60s
    ● The minimum time to keep a worker process alive after it goes idle
    Scheduler.IdleTimeout = 5s
    After the last user disconnects from a process, RStudio Connect waits 5s before
    that process is reaped.
    You might want to increase Scheduler.IdleTimeout if you have a process that
    is resource-intensive to start up.
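
The idle timeout rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `should_reap` is a hypothetical helper, and the real reaping logic is internal to RStudio Connect. The 5 second default is taken from the slide.

```python
from datetime import datetime, timedelta


def should_reap(last_disconnect: datetime,
                now: datetime,
                idle_timeout: timedelta = timedelta(seconds=5),
                active_connections: int = 0) -> bool:
    """Decide whether an idle worker process is eligible to be stopped.

    Hypothetical model: a process with no connected users becomes
    eligible once it has been idle for at least `idle_timeout`.
    """
    if active_connections > 0:
        return False  # someone is still using the process
    return now - last_disconnect >= idle_timeout


t0 = datetime(2021, 11, 1, 12, 0, 0)
print(should_reap(t0, t0 + timedelta(seconds=3)))   # still within the idle window
print(should_reap(t0, t0 + timedelta(seconds=10)))  # idle longer than the timeout
```

Raising the timeout keeps an expensive-to-start process alive longer between visitors, at the cost of holding its memory for that time.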

  21. Report Concurrency
    ● Applications.ScheduleConcurrency (default: 2)
    ● Maximum number of scheduled reports to run in parallel
    ● Setting this to zero will disable scheduled execution
    This lets you control (throttle) scheduled content execution
    ● If all your publishers schedule reports to run at midnight, Connect
    will iterate through them as quickly as possible.
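
To get a feel for the effect of the concurrency limit, the following Python sketch simulates a batch of reports that all become due at the same moment. It assumes a simple first-in, first-out queue, which is an assumption for illustration and not a statement about how Connect orders its work. `simulate_schedule` is a hypothetical helper.

```python
import heapq


def simulate_schedule(durations, concurrency=2):
    """Return the finish time of each report, in submission order.

    All reports are due at time 0 and at most `concurrency` run at once.
    A concurrency of zero models disabled scheduled execution, so nothing
    finishes and an empty list is returned.
    """
    if concurrency <= 0:
        return []
    running = []        # min-heap of finish times for reports in flight
    finish_times = []
    for duration in durations:
        if len(running) < concurrency:
            start = 0   # a slot is free immediately
        else:
            start = heapq.heappop(running)  # wait for the earliest slot to free up
        end = start + duration
        heapq.heappush(running, end)
        finish_times.append(end)
    return finish_times


# Four 10-minute reports scheduled for midnight, two allowed in parallel.
print(simulate_schedule([10, 10, 10, 10], concurrency=2))
```

Under these assumptions the last report finishes after 20 minutes rather than 10, which is the throttling trade-off the setting gives you.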

  22. Disk Usage Resource Management
    Things Connect stores on disk:
    ● Content bundles (uploaded compressed bundles from users)
    ● Unzipped bundles for running applications
    ● Package cache
    ○ One copy of each version of each package specific to the R (or Python)
    version
    ● Metrics (RAM and CPU usage)
    ● R/Python process information/logs

  23. Content Bundle Retention
    Throttle the number of bundles retained for each content item
    ● Applications.BundleRetentionLimit (default 0, which retains everything)
    If you experience problems with large bundles:
    ● Ask publishers not to package large sets of data in the content bundle and
    provision data on the server separately

  24. Process Information Retention
    ● Maximum number of jobs preserved on disk for any one application:
    ○ Jobs.MaxCompleted (default: 1000)
    ● Maximum age of a completed job retained on disk:
    ○ Jobs.OldestCompleted (default: 30d)
    On-disk job metadata is removed if either the MaxCompleted or
    OldestCompleted restrictions are violated.
    Adjust this retention window based on your auditing requirements.
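
The "either restriction" rule above can be sketched as follows. This is a simplified model, not Connect's implementation: `jobs_to_remove` is a hypothetical helper, and it only looks at completion timestamps for a single application.

```python
from datetime import datetime, timedelta


def jobs_to_remove(completed_at, now,
                   max_completed=1000,
                   oldest_completed=timedelta(days=30)):
    """Return the completion times of jobs that fall outside retention.

    A job is removed if it is beyond the newest `max_completed` jobs
    OR if it finished longer ago than `oldest_completed`.
    """
    newest_first = sorted(completed_at, reverse=True)
    removed = []
    for rank, finished in enumerate(newest_first):
        too_many = rank >= max_completed
        too_old = now - finished > oldest_completed
        if too_many or too_old:
            removed.append(finished)
    return removed


now = datetime(2021, 11, 1)
jobs = [now - timedelta(days=d) for d in (1, 10, 40)]
# Only the 40-day-old job exceeds the 30-day window.
print(jobs_to_remove(jobs, now))
```

Running the same function with a candidate `max_completed` or `oldest_completed` value against your own job history is one way to estimate what a configuration change would discard.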

  25. How will your publishers deploy to Connect?
    Three ways to publish content to RStudio Connect:

  26. Code Promotion

  27. Publishing methods for Code Promotion

  28. DEMO
    Git-backed Code Promotion

  32. Importance of an Environment Management Strategy
    Environment management takes work. Here are some cases where the reward is
    worth the effort:
    ● When you are working on a long-term project, and need to safely upgrade
    packages.
    ● In cases where you and your team need to collaborate on the same project, using a
    common source of truth.
    ● If you need to validate and control the packages you’re using.
    ● When you are ready to deploy a data product to production, such as a Shiny app, R
    Markdown document, or plumber API.

  33. Private Packages
    Many organizations find value in hosting their own package repository.
    Hosting an internal repository allows organizations to:
    ● Share and version their internal packages
    ● Access and govern packages from external sources
    ● Audit package use

  34. Validated Environment Management
    Recommended Exercises:
    ❏ Review the curated resources and recommendations for
    Using R for Validated Work
    ❏ Can you recreate your environment?
    ❏ Can you trust the things in your environment?
    ❏ Learn about the Validated Environment Strategy
    ❏ Learn about Internal Package Repositories

  35. Reproducibility & Environment Strategy Maps
    To select a strategy, you need to answer two questions:
    ● Who is responsible for managing the environment?
    ● How open is the environment?

  36. Custom Branding
    ● Replace the RStudio logo and favicon with your own.
    ● Direct logged-in users to a landing page of your choice when they first enter
    RStudio Connect.
    ● Generate custom content landing pages with R code using connectwidgets.
    ● Customize what anonymous and logged-out users see when they visit your server.
    ● Control email settings such as sender display name, “from” address, sender
    address headers, and subject prefix.
    ● Hide the Documentation tab from viewers.

  37. Branding Configuration Settings

  38. Email Customization Settings

  39. Custom Landing Pages
    Create a custom landing page that all anonymous or logged-out users will see.
    Workbook Exercise:
    Use the Server.LandingDir configuration setting to specify the path to a directory that contains index.html and all assets (CSS, images, JavaScript, etc.)

  40. Other Types of Custom Landing Pages
    Landing Pages for Logged-in Users
    ● Server.RootRedirect (Default: the Server.DashboardPath value) The URL logged-in users will be redirected to when visiting the public URL used to access the server.
    ● Server.DashboardPath (Default: "/connect") The URL path name to be used where RStudio Connect's dashboard is hosted.
    One option for creating a custom landing page is to make a content showcase with the connectwidgets R package.

  41. Unsupported Customizations (November 2021)
    ● RStudio Connect dashboard color palette
    ● Hiding Tags from viewers
    ● Removal of footer text that says “Powered by RStudio Connect”
    ● Removal of RStudio copyright information

  42. Special Considerations for Consultancies (External Users)
    ● Branding and Landing Page Customization
    ● Managing multiple clients
    ○ User Isolation: Authorization.ViewersCanOnlySeeThemselves, Server.HideEmailAddresses
    ○ Viewer Restrictions: Server.ViewerKiosk. When enabled, users with the viewer role will not be allowed to submit permission requests for content access or to request elevated role privileges.
    ● Multiple authentication providers
    ○ Federated authentication: RStudio Connect will authenticate against an external identity provider
    (usually via SAML), and the provider will federate identity management to all the different
    authentication providers.

  43. Federated Identity Management

  44. Golden Rules of RStudio Connect Configuration
    ❏ Check your configuration file: Is Server.Address set?
    ❏ Verify your email server configuration: Send a test email
    ❏ Maintain an open dialog with your publisher users
    ❏ Before you start publishing content:
    ❏ Make an informed decision about your authentication provider
    ❏ Make an informed decision about your package repository
    ❏ Life is better with Package Manager or an Internal Repository

  45. Admin Experience

  46. License Management
    ● RStudio Connect uses the license-manager to determine if a valid
    license is available:
    sudo /opt/rstudio-connect/bin/license-manager status
    ● The Connect dashboard will display a notification to admins and
    publishers when the license is within 15 days of expiration.
    ● You can disable this with Licensing.ExpirationUIWarning

  47. User Management
    ● Adding Users
    ○ Accounts can be either created / pre-provisioned or auto-registered. Details and
    capabilities differ by authentication provider.
    ○ Example: Server API driven user provisioning
    ● Locking Users
    ○ Forbids login and publishing
    ○ Removes user from your license count
    ○ Example: Server API documentation
    ● Removing Users
    ○ Last resort option
    ○ Could require content ownership migration
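
As a sketch of what Server API driven user locking might look like, the Python below only builds the HTTP request and does not send it. It assumes the lock endpoint path `/__api__/v1/users/{guid}/lock` with a JSON body of `{"locked": true}` and the `Authorization: Key ...` header; confirm both against the Server API reference for your Connect version. The helper name, server URL, API key, and GUID are all hypothetical placeholders.

```python
import json
from urllib.request import Request


def lock_user_request(server_url: str, api_key: str, user_guid: str,
                      locked: bool = True) -> Request:
    """Build (but do not send) a request that locks or unlocks a user.

    Endpoint path and body shape are assumptions to be checked against
    the Server API documentation for your server.
    """
    body = json.dumps({"locked": locked}).encode("utf-8")
    return Request(
        url=f"{server_url.rstrip('/')}/__api__/v1/users/{user_guid}/lock",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Key {api_key}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values for illustration only.
req = lock_user_request("https://connect.example.com", "API_KEY_HERE", "user-guid-123")
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; keeping request construction separate makes the call easy to review before it touches a production server.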

  48. Group Management
    ● Local Groups
    ○ Manage through the UI: “People” tab
    ○ Manage with the RStudio Connect Server API
    ○ Disable local group support with: Authorization.UserGroups (existing groups
    must be removed)
    ● Remote Groups
    ○ Management is the responsibility of the external authentication provider
    ○ Group memberships are locally synchronized through successful login events
    Note! Having a mix of Local and Remote groups on your server is not recommended.
    Migrate completely from one mode to the other when making a change.

  49. RStudio Connect API Keys
    ● Programmatically access content on RStudio Connect and use
    the Server API
    ● API Keys are associated with users, not content
    Resources:
    ● Server API documentation
    ● Server API Cookbook

  50. Setting up Programmatic Deployments
    DEMO: Azure DevOps Pipelines for content deployments
    Additional Resources:
    ● Publishing Methods Explained
    ● Publishing to RStudio Connect with Github Actions

  51. Making Announcements
    RStudio Connect provides several methods for posting custom HTML
    messages to the User Interface:
    ● Server.PublicWarning - Visible on the unauthenticated landing
    page
    ● Server.LoggedInWarning - Visible above recent content when
    logged in
    ○ Useful for things like scheduling maintenance windows

  52. End of Support for Python 2 (January 2022)
    Starting January 2022, RStudio Connect will no longer support Python 2.
    Factors that have gone into our decision include the following:
    ● Python 3 is now widely adopted and is the actively-developed version of the
    Python language.
    ● In January 2021, the pip 21.0 release officially dropped support for Python 2.
    ● A large number of projects pledged to drop support for Python 2 in 2020 including
    TensorFlow, scikit-learn, Apache Spark, pandas, XGBoost, NumPy, Bokeh,
    Matplotlib, IPython, and Jupyter notebook.

  53. Exercise: Use the RStudio Connect Server
    API to audit the versions of R/Python in use
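
One possible starting point for this exercise, written in Python. It assumes the content listing endpoint `GET /__api__/v1/content` returns a JSON array whose items carry `r_version` and `py_version` fields, and that the API key belongs to an administrator so all content is visible. The function names and the sample data are hypothetical; `fetch_content` is defined but not called here, so the example runs without a server.

```python
import json
from collections import Counter
from urllib.request import Request, urlopen


def fetch_content(server_url: str, api_key: str) -> list:
    """Fetch the content listing from a Connect server (needs network access)."""
    req = Request(
        f"{server_url.rstrip('/')}/__api__/v1/content",
        headers={"Authorization": f"Key {api_key}"},
    )
    with urlopen(req) as resp:
        return json.load(resp)


def tally_versions(items: list):
    """Count content items per R version and per Python version.

    Items without a given runtime (null or missing field) are skipped.
    """
    r_counts = Counter(i["r_version"] for i in items if i.get("r_version"))
    py_counts = Counter(i["py_version"] for i in items if i.get("py_version"))
    return r_counts, py_counts


# Made-up sample standing in for a real API response.
sample = [
    {"name": "sales-app", "r_version": "4.1.1", "py_version": None},
    {"name": "weekly-report", "r_version": "4.1.1", "py_version": "3.8.10"},
    {"name": "legacy-api", "r_version": "3.6.3", "py_version": None},
    {"name": "old-notebook", "r_version": None, "py_version": "2.7.18"},
]
r_counts, py_counts = tally_versions(sample)
print(dict(r_counts))
print(dict(py_counts))
```

With real data, replace `sample` with `fetch_content(server_url, api_key)`. A tally like this makes it easy to spot content still tied to a runtime you plan to retire, such as Python 2.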

  54. Other Server API Project Ideas
    ● Build a report to examine access control list details for each content item on your
    RStudio Connect server Example
    ● Audit all the unpublished (orphaned) content items on your RStudio Connect
    server Example
    ● Audit all the vanity URLs currently in use on your RStudio Connect server Example
    ● Audit all the tags currently in use on the server, and list all the tagged content items
    Example

  55. Content Usage Data & Tracking
    Example
    Shiny Applications:
    ● Records information about each visit and the length of that visit
    Other Content:
    ● Records information about each visit: user, timestamp, content rendering info
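
Because Shiny usage records include the length of each visit, they can be summed per application. The Python sketch below assumes each record has `content_guid`, `started`, and `ended` fields with ISO 8601 timestamps, which is how the Server API's Shiny usage data is understood here; verify the field names against the Server API documentation. The helper names and sample records are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def total_session_seconds(records: list) -> dict:
    """Sum visit durations (in seconds) per content item.

    Records without an end time are treated as still-open sessions
    and are skipped.
    """
    totals = defaultdict(float)
    for rec in records:
        if not rec.get("ended"):
            continue
        duration = parse_ts(rec["ended"]) - parse_ts(rec["started"])
        totals[rec["content_guid"]] += duration.total_seconds()
    return dict(totals)


# Made-up sample records for illustration.
sample_usage = [
    {"content_guid": "app-1", "started": "2021-11-01T12:00:00Z", "ended": "2021-11-01T12:05:00Z"},
    {"content_guid": "app-1", "started": "2021-11-01T13:00:00Z", "ended": "2021-11-01T13:00:30Z"},
    {"content_guid": "app-2", "started": "2021-11-01T14:00:00Z", "ended": None},
]
print(total_session_seconds(sample_usage))
```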

  56. Managing RStudio Connect Upgrades
    ● RStudio Connect versions are supported for 18 months
    ● We recommend upgrading at least once a year.
    ● Most upgrades should require less than five minutes unless
    breaking changes have occurred in the interim and require
    configuration adjustments.
    ● Consult the release notes before undergoing an upgrade.

  57. Performing an Upgrade
    Download and run the installation script
    The installation script works across all supported Linux distributions,
    validates the GPG key of the downloaded package, and includes
    support for offline use.
    Example:
    curl -Lo rsc-installer.sh https://cdn.rstudio.com/connect/installer/installer-v1.9.5.sh
    sudo -E bash ./rsc-installer.sh 2021.10.0

  58. RStudio Product Support
    Submit a Support Ticket: https://support.rstudio.com/hc/en-us/requests/new
    Generate a server diagnostic report:
    If you are on RStudio Connect version 1.7.2 or later, run the following command on the
    server and send us the output:
    sudo /opt/rstudio-connect/scripts/run-diagnostics.sh /path/to/output/dir

  59. RStudio Connect
    Roadmap

  60. RStudio Connect Investments
    Vision: Data scientists own the publication, execution, management, and distribution of their work in a safe and sophisticated manner, fully sanctioned by their IT admins.
    Strategic Goals: Increase the types of content available to share, improve content discovery and management, and facilitate production deployments.
    October 2021
    ● BI Integration: Extend Tableau dashboards with R, Shiny and Python
    Early 2022
    ● Administrators can enable remote content execution on a Kubernetes back-end while maintaining easy self-serve publishing.
    Short Term Future
    ● Publishers are able to drive viewer engagement on their work
    ● Publishers can manage process automation workflows
    ● Feature parity for Python users
    ● Extend Cloud Native capabilities to ease integrations
    ● Improvements to Docker-friendly installation

  61. Invitation to the Beta Program for Off-Host Execution
    ● Begins in December, runs until the GA launch in early 2022
    ● Beta will not have feature parity with RStudio Connect local execution
    Sign-up form
    Requirements:
    ● A Kubernetes cluster where you have full cluster-admin privileges
    ● A PostgreSQL database that meets Connect’s requirements
    ● An NFS server that meets Connect’s shared storage requirements
    ● Willingness to provide feedback on the installation/configuration process
    ● Publishers who are willing to provide feedback
