NVIDIA AI Enterprise for Red Hat OpenShift

Slide 2

NVIDIA AI Enterprise with OpenShift

NVIDIA AI Enterprise: An end-to-end, cloud-native suite of AI software that is certified, licensed, and supported by NVIDIA. It is certified to run on Red Hat OpenShift and on NVIDIA-Certified Systems.

Red Hat OpenShift: Red Hat's enterprise-ready container platform. It provides full-stack automated operations and self-service provisioning for developers, and runs across many environments, including the cloud and on-premises.

ref. https://resources.nvidia.com/en-us-nvidia-ai-enterprise/nvaie-red-hat-overview

Slide 3

Why Red Hat OpenShift for AI
- Self-Service Access to Infrastructure Resources: a consistent, self-service cloud experience for using containerized AI tools and infrastructure resources.
- Securely Automate MLOps Pipelines: OpenShift GitOps and OpenShift Pipelines extend automation across the training and inference phases of machine learning pipelines.
- Kubernetes-Powered Application Development: GPU-optimized container images make it easy to develop, deploy, and scale ML/DL models.

Slide 4

Self-Service Access to Infrastructure Resources

Slide 5

Self-Service Access to Infrastructure Resources
Fully automated management of the infrastructure resources that cloud-native AI deployment depends on

NVIDIA AI Enterprise Software Suite:
- NVIDIA GPU Operator: automates management of the NVIDIA software components needed to provision GPUs, such as the NVIDIA driver and the Kubernetes device plugin.
- NVIDIA Network Operator: enables RDMA and GPUDirect RDMA workloads on OpenShift and automates management of networking-related NVIDIA components.

Slide 6

Handling GPU resources
Self-Service Access to Infrastructure Resources

GPU resources under virtualization (keeping components compatible becomes complex):
NVIDIA-Certified Systems > NVIDIA GPU > hypervisor with NVIDIA Virtualization Software > vGPUs > guest VMs, each running its own NVIDIA Driver and applications

GPU resources on OpenShift (GPU resource management is fully automated):
NVIDIA-Certified Systems > NVIDIA GPU > GPU Operator, which manages the NVIDIA Driver, NVIDIA Container Runtime, NVIDIA Kubernetes Device Plugin, and NVIDIA GPU Monitoring for containerized applications

Slide 7

NVIDIA GPU Operator
Self-Service Access to Infrastructure Resources
Runs on NVIDIA-Certified Systems.
- NVIDIA Driver: works with the NVIDIA Container Runtime to deliver the NVIDIA driver through a container (*see details).
- NVIDIA Kubernetes Device Plugin: tracks the state of the GPUs on each node of the Kubernetes cluster and allocates GPUs to containers when they start.
- NVIDIA GPU Monitoring: uses the DCGM (Data Center GPU Manager) exporter to monitor NVIDIA GPU devices (health monitoring, policies, group management, and so on).
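Once the device plugin is running, a workload does not talk to the driver directly; it simply requests the `nvidia.com/gpu` extended resource and the scheduler and device plugin do the rest. The sketch below builds such a Pod manifest as a Python dict and prints it as JSON (which `oc apply -f` accepts). The function name, Pod name, and image reference are hypothetical placeholders, not part of the GPU Operator.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a minimal Pod manifest that requests NVIDIA GPUs.

    The NVIDIA Kubernetes Device Plugin advertises each node's GPUs as the
    extended resource "nvidia.com/gpu"; the scheduler places the Pod on a
    node with enough free GPUs, and the GPUs are assigned when the
    container starts.
    """
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Extended resources are requested via limits; the request
                    # defaults to the limit, and GPUs are never overcommitted.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical image reference: substitute a CUDA-enabled image you can pull.
    pod = gpu_pod_manifest("gpu-smoke-test", "registry.example.com/cuda-app:latest")
    print(json.dumps(pod, indent=2))
```

Requesting GPUs only through `limits` keeps the manifest short; for extended resources Kubernetes treats the limit as the request.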

Slide 8

Operator Framework
Self-Service Access to Infrastructure Resources
- One-click install: the NVIDIA GPU Operator can be installed in a few clicks from OperatorHub in the OpenShift web console.
- GPU resources are visualized in the OpenShift web console.
ref. https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/openshift/install-gpu-ocp.html#installing-the-nvidia-gpu-operator-by-using-the-web-console
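The OperatorHub click-through creates Operator Lifecycle Manager (OLM) objects behind the scenes. As a rough CLI equivalent, the sketch below emits a Namespace, an OperatorGroup, and a Subscription for the certified GPU Operator package. The namespace and object names follow common NVIDIA documentation conventions but should be treated as assumptions, and the channel is a placeholder: look up the current one, for example with `oc get packagemanifest gpu-operator-certified -n openshift-marketplace`.

```python
import json


def gpu_operator_install_manifests(channel: str,
                                   namespace: str = "nvidia-gpu-operator") -> list:
    """Return OLM objects asking OpenShift to install the certified NVIDIA
    GPU Operator (a CLI counterpart of the OperatorHub install)."""
    return [
        {"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": namespace}},
        {
            # Scopes the Operator to its own namespace.
            "apiVersion": "operators.coreos.com/v1",
            "kind": "OperatorGroup",
            "metadata": {"name": "nvidia-gpu-operator-group", "namespace": namespace},
            "spec": {"targetNamespaces": [namespace]},
        },
        {
            # Subscribes to the package published in the certified catalog.
            "apiVersion": "operators.coreos.com/v1alpha1",
            "kind": "Subscription",
            "metadata": {"name": "gpu-operator-certified", "namespace": namespace},
            "spec": {
                "name": "gpu-operator-certified",
                "channel": channel,
                "source": "certified-operators",
                "sourceNamespace": "openshift-marketplace",
                "installPlanApproval": "Automatic",
            },
        },
    ]


if __name__ == "__main__":
    # Placeholder channel name: replace with the package's current channel.
    doc = {"apiVersion": "v1", "kind": "List",
           "items": gpu_operator_install_manifests("stable")}
    print(json.dumps(doc, indent=2))
```

Installing the Operator does not by itself deploy the driver stack; NVIDIA's OpenShift guide then has you create a `ClusterPolicy` instance.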

Slide 9

Securely Automate MLOps Pipelines

Slide 10

Red Hat services that support MLOps
Securely Automate MLOps Pipelines
- OpenShift Pipelines: a framework for building Kubernetes-native CI/CD pipelines
- Red Hat AMQ Streams: a highly scalable, Apache Kafka-based distributed messaging platform
- Red Hat OpenShift Data Science: a set of AI/ML tools with automated, self-service operations
- Red Hat Quay: a distributed container image registry
- OpenShift GitOps: a Kubernetes-native, GitOps-based continuous delivery (CD) tool
- Red Hat OpenShift Data Foundation: software-defined storage (SDS) built for containers

Slide 11

The MLOps pipeline
Securely Automate MLOps Pipelines
Whatever machine learning model you build, the cycle is the same: prepare and collect data, develop (train) the model, then deploy the model and monitor it for warning signs (inference).
- Training phase: Step 1 | Model Development; Step 2 | ML model training pipelines
- Inference phase (Serving): Step 3 | ML models serving pipelines; Step 4 | Monitoring / Validation
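The four steps form a loop rather than a one-way pipeline. The sketch below models that loop as a tiny state machine; the `Step` enum, the `drift_detected` flag, and the choice to return to Step 2 (retraining) rather than Step 1 are illustrative assumptions, not part of any Red Hat product API.

```python
from enum import Enum


class Step(Enum):
    MODEL_DEVELOPMENT = 1  # training phase
    TRAINING_PIPELINE = 2  # training phase
    SERVING_PIPELINE = 3   # inference (serving) phase
    MONITORING = 4         # inference (serving) phase


def phase(step: Step) -> str:
    """Map a step to the phase it belongs to."""
    return "Training" if step.value <= 2 else "Serving"


def next_step(current: Step, drift_detected: bool = False) -> Step:
    """Advance the cycle by one step.

    Monitoring loops back to the training pipeline (retrain, then redeploy)
    when drift is detected; otherwise the deployed model stays monitored.
    """
    if current is Step.MONITORING:
        return Step.TRAINING_PIPELINE if drift_detected else Step.MONITORING
    return Step(current.value + 1)
```

For example, `next_step(Step.MONITORING, drift_detected=True)` returns `Step.TRAINING_PIPELINE`, closing the loop.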

Slide 12

Training phase (Model Training)
Securely Automate MLOps Pipelines
Flow: Data Store > Model Development > ML Model > Model Store > Model Image > Test > Image Registry

Step 1 | Model Development
Build machine learning models with Jupyter notebooks on Red Hat OpenShift.

Step 2 | ML model training pipelines [OpenShift Pipelines]
Event-driven continuous integration packages the machine learning model as a container image.
▶ Saving: save the deployment-ready model to the Model Store
▶ Converting: convert the model into a container image
▶ Testing: test the model image to confirm it works
▶ Storing: push the verified container image to the image registry
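OpenShift Pipelines is based on Tekton, so the four Step 2 stages map naturally onto Tekton `Pipeline` tasks chained with `runAfter`. The sketch below generates such a Pipeline as JSON. The pipeline and Task names are hypothetical (each referenced Task would have to exist in the namespace), and the event-driven trigger (for example a Tekton Triggers `EventListener`) is left out.

```python
import json

# (pipeline task name, referenced Tekton Task): hypothetical names, one per stage.
STAGES = [
    ("save-model", "save-model-to-store"),   # Saving
    ("convert-model", "build-model-image"),  # Converting
    ("test-model", "test-model-image"),      # Testing
    ("store-image", "push-model-image"),     # Storing
]


def training_pipeline(name: str = "ml-model-training") -> dict:
    """Build a Tekton Pipeline that runs the stages strictly in order."""
    tasks = []
    previous = None
    for task_name, task_ref in STAGES:
        task = {"name": task_name, "taskRef": {"name": task_ref}}
        if previous is not None:
            # Without runAfter, Tekton would run the tasks in parallel.
            task["runAfter"] = [previous]
        tasks.append(task)
        previous = task_name
    return {
        "apiVersion": "tekton.dev/v1beta1",
        "kind": "Pipeline",
        "metadata": {"name": name},
        "spec": {"tasks": tasks},
    }


if __name__ == "__main__":
    print(json.dumps(training_pipeline(), indent=2))
```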

Slide 13

Inference phase (Model Serving)
Securely Automate MLOps Pipelines
Flow: Configuration Repository (Manifests) > Trigger > ML Service > Intelligent App; Deploy; Monitor drift

Step 3 | ML models serving pipelines [OpenShift GitOps]
Watches the manifests and deploys machine learning models safely.
▶ Configuring: manage configuration through a Git repository
▶ Monitoring: watch the configuration manifests for changes
▶ Triggering: update the model on the ML service
▶ Deploying: roll out to the edge, the data center, the cloud, and so on

Step 4 | Monitoring validation
Use Prometheus, Grafana, or third-party tools to monitor the inference performance of the trained model, and retrain or redeploy as needed.
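OpenShift GitOps is based on Argo CD, where the "watch the manifests, then deploy" behavior is expressed as an `Application` resource. The sketch below builds one with automated sync, so a change merged into the configuration repository triggers a rollout. The repository URL, path, application name, and target namespace are hypothetical; `openshift-gitops` is the namespace of the default Argo CD instance that OpenShift GitOps installs.

```python
import json


def model_serving_application(repo_url: str, path: str, namespace: str,
                              revision: str = "main") -> dict:
    """Build an Argo CD Application that keeps the serving manifests at
    repo_url/path (on branch `revision`) in sync with the cluster."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": "ml-model-serving", "namespace": "openshift-gitops"},
        "spec": {
            "project": "default",
            # Configuring: the Git repository is the source of truth.
            "source": {"repoURL": repo_url, "path": path, "targetRevision": revision},
            "destination": {"server": "https://kubernetes.default.svc",
                            "namespace": namespace},
            # Monitoring/Triggering: sync automatically when the manifests change,
            # prune removed objects, and revert manual drift in the cluster.
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


if __name__ == "__main__":
    app = model_serving_application(
        "https://git.example.com/ml/serving-config.git",  # hypothetical repo
        "overlays/prod",
        "ml-serving",
    )
    print(json.dumps(app, indent=2))
```

Note that `selfHeal` corrects drift in cluster configuration, not model drift; detecting model drift remains the job of the Step 4 monitoring tools.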

Slide 14

Kubernetes-Powered Application Development

Slide 15

OpenShift install types
Kubernetes-Powered Application Development
Each stack is listed from the physical machine up (physical machine > OS/hypervisor > virtual machine > container orchestrator > containers):
1. Virtualization: NVIDIA-Certified Systems (NVIDIA GPU) > hypervisor (VMware) > virtual machine running RHEL / RHCOS > OpenShift > NVIDIA AI Enterprise containers (TensorFlow, PyTorch)
2. Bare-Metal: NVIDIA-Certified Systems (NVIDIA GPU) > RHEL / RHCOS > OpenShift > NVIDIA AI Enterprise containers (TensorFlow, PyTorch); no virtual machine (N/A)
For comparison, RHEL without OpenShift: NVIDIA-Certified Systems (NVIDIA GPU) > RHEL > container runtime (Podman) > NVIDIA AI Enterprise containers (TensorFlow, PyTorch); no virtual machine (N/A)

Slide 16

Container support for AI development
Kubernetes-Powered Application Development
Stack: NVIDIA-Certified Systems (NVIDIA GPU) > RHEL / RHCOS > OpenShift > NVIDIA AI Enterprise (TensorFlow, PyTorch)

AI and data science frameworks (frameworks dedicated to AI development), supported as part of NVIDIA AI Enterprise:
- TensorFlow
- PyTorch
- NVIDIA TAO Toolkit
- NVIDIA Triton Inference Server
- NVIDIA TensorRT
- NVIDIA RAPIDS

Application runtimes for service development (e.g. Python, Java), supported as part of Red Hat OpenShift through Application Streams:
- PHP
- Python
- Perl
- Node.js
- Ruby
- OpenJDK
- Quarkus
- MySQL / MariaDB
- etc.

Slide 17

Operating systems supported by NVIDIA AI Enterprise 2.0
Kubernetes-Powered Application Development
ref. https://docs.nvidia.com/ai-enterprise/latest/product-support-matrix/index.html

Install type: hypervisor or bare-metal OS (guest OS support)
- Virtualization: VMware vSphere Hypervisor (ESXi) Enterprise Plus Edition 7.0 Update 2 or 3 (guest OS: Ubuntu 20.04 LTS, Red Hat Enterprise Linux 8.4, Red Hat OpenShift 4.9)
- Virtualization: VMware vSphere 6.7
- Bare-Metal: Ubuntu 20.04 LTS
- Bare-Metal: Red Hat Enterprise Linux 8.4
- Bare-Metal: Red Hat OpenShift 4.9 with Red Hat Enterprise Linux CoreOS (RHCOS)

Whether the install type is virtualization or bare metal, the support requirement is that containers run on Ubuntu (20.04 LTS) or RHEL (8.4 or RHCOS).
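For automated pre-flight checks, the legible rows of this slide can be captured as data. The sketch below is a hypothetical helper, not an NVIDIA tool: it transcribes only the combinations listed here for NVIDIA AI Enterprise 2.0, ignores the hypervisor version (the guest OS list for vSphere 6.7 is not recoverable from the slide), and will go stale as the official support matrix changes.

```python
# Illustrative transcription of the NVIDIA AI Enterprise 2.0 rows on this
# slide; always confirm against the official product support matrix.
VIRTUALIZATION_GUEST_OS = {
    "Ubuntu 20.04 LTS",
    "Red Hat Enterprise Linux 8.4",
    "Red Hat OpenShift 4.9",
}
BARE_METAL_OS = {
    "Ubuntu 20.04 LTS",
    "Red Hat Enterprise Linux 8.4",
    "Red Hat OpenShift 4.9",  # with Red Hat Enterprise Linux CoreOS (RHCOS)
}


def is_supported(install_type: str, os_name: str) -> bool:
    """Return True if os_name is listed for install_type on the slide.

    For "Virtualization", os_name is the guest OS running in the VM;
    for "Bare-Metal", it is the OS installed directly on the server.
    """
    if install_type == "Virtualization":
        return os_name in VIRTUALIZATION_GUEST_OS
    if install_type == "Bare-Metal":
        return os_name in BARE_METAL_OS
    return False
```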

Slide 18

Total support backed by RHEL
Kubernetes-Powered Application Development
Stack: Container Orchestration (OpenShift) > Operating System (RHEL / RHCOS) > Container Image (UBI as the container base image) > Applications
- Universal Base Image (UBI): UBI, deployed as container images, is supported according to the RHEL (Red Hat Enterprise Linux) lifecycle. When you use Red Hat's container runtime environments, UBI is fully supported.
- OpenShift: includes support for the host OS, RHEL/RHCOS.
- RHEL: supports running as a container runtime host.
ref. https://access.redhat.com/articles/2726611

Slide 19

Conclusion
NVIDIA AI Enterprise with OpenShift

Slide 20

An AI-ready platform from NVIDIA and Red Hat
NVIDIA AI Enterprise on NVIDIA-Certified Systems
- Self-Service Access to Infrastructure Resources
- Securely Automate MLOps Pipelines
- Kubernetes-Powered Application Development

Slide 21

Thank you

linkedin.com/company/red-hat
youtube.com/user/RedHatVideos
facebook.com/redhatinc
twitter.com/RedHat

Red Hat is the world’s leading provider of enterprise open source software solutions. Award-winning support, training, and consulting services make Red Hat a trusted adviser to the Fortune 500.

Slide 22

https://www.redhat.com/en/partners/nvidia