Powering Open Data Hub with Ray (Erik Erlandson, Red Hat, AI Center of Excellence)

Ray is quickly gaining momentum as a distributed computing platform that combines a powerful parallel compute model with a cloud native serverless-style scaling model. Open Data Hub (ODH) is a flexible and customizable federation of open source data science tools that is a great fit for taking advantage of Ray compute clusters.

In this talk, Erik will explain how to integrate Ray with Open Data Hub, by configuring ODH profiles that deploy on-demand Ray clusters for Jupyter notebooks. He’ll demonstrate Ray in action as a compute resource for ODH, and explore the potential use cases opened up by self-service notebooks backed by Ray. Along the way he’ll also discuss the logistics of adapting Ray to OpenShift’s security features.

Attendees will learn how Ray integrates with Open Data Hub’s architecture, and how they can power ODH with Ray to solve distributed computing problems in the popular Jupyter environment.
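
As a rough illustration of the notebook workflow described above, the following Python sketch fans a Monte Carlo estimate of pi out over Ray tasks. It is not taken from the talk: the function names, the sample workload, and the `address` parameter are assumptions chosen for illustration, and how a notebook actually finds its Ray cluster depends on how the ODH profile is set up.

```python
import random


def count_hits(n_samples, seed):
    """Count random points in the unit square that fall inside the quarter circle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(hit_counts, n_samples_per_task):
    """Combine per-task hit counts into a single estimate of pi."""
    total = n_samples_per_task * len(hit_counts)
    return 4.0 * sum(hit_counts) / total


def estimate_pi_on_ray(n_tasks=4, n_samples_per_task=100_000, address=None):
    """Run count_hits as parallel Ray tasks and combine the results.

    Assumption: the `ray` package is installed in the notebook image.
    address=None starts a local Ray instance; a notebook attached to an
    existing cluster would pass that cluster's address instead.
    """
    import ray  # imported lazily so the helpers above work without Ray

    ray.init(address=address, ignore_reinit_error=True)
    remote_count = ray.remote(count_hits)  # wrap the plain function as a Ray task
    futures = [remote_count.remote(n_samples_per_task, seed) for seed in range(n_tasks)]
    return estimate_pi(ray.get(futures), n_samples_per_task)
```

Calling `estimate_pi_on_ray()` from a notebook cell would schedule one task per seed and return the combined estimate. Keeping the per-task work in a plain function means the same code can be checked locally before it is sent to a cluster.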

Anyscale

July 21, 2021

Transcript

  1. Native Ray Libraries • Tune: Scalable Hyperparameter Tuning • RLlib: Scalable Reinforcement Learning • RaySGD: Distributed Training Wrappers • Ray Serve: Scalable and Programmable Serving
  2. Ray Community Integrations • XGBoost • Dask • Horovod • sklearn • Spacy • huggingface https://docs.ray.io/en/master/ray-libraries.html
  4. Data Science with ODH Set goals Gather and prepare data Develop ML model Deploy ML models in app dev process Implement Apps & Inference ML models Monitoring & Management
  6. Data Science with ODH Set goals Gather and prepare data Develop ML model Deploy ML models in app dev process Implement Apps & Inference ML models Monitoring & Management App developer IT operations Data engineer Business leadership Data scientists ML Engineer
  7. Data Science with ODH Set goals Gather and prepare data Develop ML model Deploy ML models in app dev process Implement Apps & Inference ML models Monitoring & Management App developer IT operations Data engineer Business leadership Data scientists ML Engineer Seldon Jupyter Ceph Spark TensorFlow Kafka SuperSet Argo/Airflow/Tekton Hue Prometheus/Grafana Argo/Airflow/Tekton Ceph Kafka Seldon Middleware Model to Microservice
  8. Dog-Fooding ODH at Red Hat • Application Logs: Applications in the product release pipeline store their runtime logs in our system. These groups are also engaged for anomaly detection. • Cluster Metrics: Operational metrics from OpenShift clusters. AIOps is engaged here. • Customer Support Data: Storage of customer data like SOSReports, customer feedback, etc.
  9. Analogy: Spark on ODH ODH JupyterHub Launcher Spark SingleUser Profile Spark Cluster Service Template Jupyter Environment
  10. Analogy: Spark on ODH Spark cluster ODH JupyterHub Launcher Spark SingleUser Profile Spark Cluster Service Template Jupyter Environment
  12. Analogy: Spark on ODH Spark cluster Spark SingleUser Profile Spark Cluster Service Template ConfigMap ConfigMap
  13. Ray on ODH? Ray cluster ODH JupyterHub Launcher Ray SingleUser Profile Ray Cluster Service Template Jupyter Environment
  14. Demo: Ray on ODH! Ray cluster ODH JupyterHub Launcher Ray SingleUser Profile Ray Cluster Service Template Jupyter Environment
  15. Ray on ODH at the Mass Open Cloud • Led by Boston University, the MOC is a collaborative effort among BU, Harvard, UMass Amherst, MIT, and Northeastern University, as well as the Massachusetts Green High-Performance Computing Center (MGHPCC) and Oak Ridge National Laboratory (ORNL). It is supported by a broad alliance of industry partners, including Red Hat.
  16. Ray on MOC • Maximum 5 workers + 1 head • 1 CPU, 1 GB memory • Pre-installed:
  17. Collaboration: IBM • Ray with Code Engine • Ray on IBM OpenShift Clusters • Scikit-Learn pipelines on Ray • Ray Use Cases ◦ Machine Learning Model Explorations ◦ Earth Science
  18. IBM Research at Ray Summit • Raghu Ganti: Scaling and Unifying SciKit Learn and Spark Pipelines using Ray • Linsong Chu: Serverless Earth Science Data Labeling using Unsupervised Deep Learning with Ray
  19. Roadmap • Community Ray Operator in Catalog • Maintain Ray Images via Project Thoth • Community Use Cases With Jupyter • Formal Integration With KF and ODH • KF Pipeline Nodes Backed by Ray
  20. Call To Action • Play with Ray on Jupyter up on MOC • File issues and PRs with op-1st • Report Back! [email protected] https://www.operate-first.cloud/users/moc-ray-demo/README.md https://odh.operate-first.cloud/