2022-04-29 Introduction to Ray @ 機械学習の社会実装勉強会 (Machine Learning Social Implementation Study Group)

Introduction to Ray: scaling Python simply
Naka Masato

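Self-introduction
Name: Naka Masato (那珂将人)
Background:
• Developed recommendation engines as an algorithm engineer
• Built and maintained infrastructure platforms
GitHub: https://github.com/nakamasato
Twitter: https://twitter.com/gymnstcs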

What is Ray
An open-source project developed at UC Berkeley RISE Lab.
ray.io describes it as "a general-purpose and universal distributed compute framework" on which you can flexibly run any compute-intensive Python workload:
1. distributed training
2. hyperparameter tuning
3. deep reinforcement learning
4. production model serving
Developers can easily scale everything from deep learning to model serving.
https://www.ray.io/

Components
Ray is made up of several packages:
1. Core: the core
2. Tune: Scalable hyperparameter tuning
3. RLlib: Reinforcement learning
4. Train: Distributed deep learning (PyTorch, TensorFlow, Horovod)
5. Datasets: Distributed data loading and compute
6. Serve: Scalable and programmable serving
7. Workflows: Fast, durable application flows

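Concept
1. Tasks: asynchronous functions executed on separate Python workers
2. Actors: classes that extend the Task concept; essentially stateful workers
3. Objects: values created by Tasks and Actors (stored in the object store on each node of the cluster)
4. Placement Groups: reserve groups of resources across multiple nodes (e.g. Gang Scheduling)
5. Environment Dependencies: Tasks run on multiple machines, so dependent packages, environment variables, and so on must be made available in the execution environment (either (1) through cluster configuration or (2) through Runtime environment configuration)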
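The "stateful worker" idea behind Actors can be sketched as follows. This is a minimal illustration and not taken from the slides: `Counter` and `count_with_actor` are hypothetical names, and `ray` is imported inside the function only so that the plain class stays usable where Ray is not installed.

```python
class Counter:
    """Plain Python class; it becomes a stateful worker once wrapped by Ray."""

    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
        return self.value


def count_with_actor(times):
    """Run `times` increments on a single actor and return the final count."""
    import ray  # assumption: Ray is installed in this environment

    ray.init(ignore_reinit_error=True)
    # ray.remote(Counter) is equivalent to decorating the class with @ray.remote.
    counter = ray.remote(Counter).remote()
    # Every call goes to the same actor process, so its state accumulates.
    futures = [counter.increment.remote() for _ in range(times)]
    return ray.get(futures)[-1]
```

Unlike a Task, which starts from scratch on every call, the actor keeps `value` between calls, so `count_with_actor(3)` is expected to end with a count of 3.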

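Ray Example
1. ray.init(): initializes the Ray cluster
2. @ray.remote: decorator that turns a function into a task (remote function)
3. func.remote(): invokes the Task and returns a future
4. ray.get(future): retrieves the result
Once ray.init() has connected to a cluster, tasks can be executed across multiple machines.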
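The four steps above can be put together in a short sketch. The names `square` and `square_all` are hypothetical, and `ray` is imported inside the function so that the plain function remains usable without Ray; the usual style is to put `@ray.remote` directly on the function.

```python
def square(x):
    """Ordinary Python function that will be turned into a Ray task."""
    return x * x


def square_all(values):
    """Square every value as a separate Ray task and collect the results."""
    import ray  # assumption: Ray is installed in this environment

    # 1. Start (or connect to) Ray.
    ray.init(ignore_reinit_error=True)
    # 2. Same effect as decorating square with @ray.remote.
    remote_square = ray.remote(square)
    # 3. .remote() returns immediately with object references (the "futures").
    futures = [remote_square.remote(v) for v in values]
    # 4. ray.get blocks until all results are ready, preserving input order.
    return ray.get(futures)
```

Calling `square_all([1, 2, 3])` should give the same values as calling `square` in a loop; the difference is that each call may run on a different worker.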

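Ray.init - connecting to a cluster
ray.init(): connects to an existing cluster, or creates a cluster and connects to it
1. init(): the local case
   a. Starts Redis, raylet, plasma store, plasma manager, and some workers, then connects to them
2. init(address="auto") or init(address="ray://123.45.67.89:10001") → connects to an existing cluster
Tasks (remote functions) can then be processed in a distributed way.
[Diagram: tasks defined with @ray.remote run on the Ray cluster; results are collected with ray.get(futures)]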
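The two connection modes can be wrapped in a small helper. This is only a sketch: `client_address` and `connect` are hypothetical names, and port 10001 is taken from the address shown on the slide.

```python
def client_address(host, port=10001):
    """Build a Ray Client address such as ray://123.45.67.89:10001."""
    return f"ray://{host}:{port}"


def connect(host=None):
    """Connect locally when no host is given, otherwise to a remote head node."""
    import ray  # assumption: Ray is installed in this environment

    if host is None:
        # No address: Ray runs locally (the exact lookup order for an
        # already running cluster depends on the Ray version).
        return ray.init()
    # ray:// addresses go through Ray Client on the head node.
    return ray.init(address=client_address(host))
```

Keeping the address construction separate makes it easy to switch between local development and a remote cluster without touching the task code.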

Ray Cluster
Cluster:
1. Head node
2. Worker node
Launch a cluster:
1. The cluster launcher: ray up config.yml
2. The kubernetes operator: helm -n ray install example-cluster --create-namespace ./ray
Supported Cloud:
1. AWS
2. Azure
3. GCP
4. Aliyun

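Creating a Ray Cluster ~ AWS
Prerequisites:
1. aws configure (only the default profile seems to be supported?)
2. IAM permissions: permissions to create IAM and EC2 resources are required (not stated explicitly in the docs?)
3. A VPC and Subnet must exist beforehand
Steps:
1. Create config.yaml
   a. The yaml shown on the right of the slide did not work in ap-northeast-1
2. ray up -y config.yaml
   a. Just under 3 minutes with the minimal config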

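Creating a Ray Cluster - AWS
There are more pitfalls than expected:
1. Stumbling on the prerequisites (AWS profile, IAM permissions, VPC)
2. The example config.yml does not work out of the box (ap-northeast-1)
   a. Error because no Subnet exists
   b. Choosing an AMI image
   c. Ray cluster creation fails with `pip not found`, `docker not found`
3. AWS resources remain even after the Ray cluster is deleted (key pair, IAM, security group…)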

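Ray Cluster - AWS - how to submit a Job
Rather than connecting directly from the local machine to the Head of the Ray Cluster, submit a Job through the SDK or CLI.
[Diagram: two routes to the Ray cluster]
1. Direct connection (local): ray.init("ray://10.0.0.1:10001"), run with python example.py
2. CLI: ray submit config.yml example.py
Details: https://docs.ray.io/en/master/cluster/job-submission.html#job-submission-architecture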
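Submitting through the SDK might look like the sketch below. It assumes a recent Ray release in which `JobSubmissionClient` is importable from `ray.job_submission` (older releases exposed it under a different module path); `job_request` and `submit_script` are hypothetical helper names.

```python
def job_request(script, working_dir="./"):
    """Build the keyword arguments for submitting `python <script>` as a Ray Job."""
    return {
        "entrypoint": f"python {script}",
        # working_dir is uploaded to the cluster so the script is available there.
        "runtime_env": {"working_dir": working_dir},
    }


def submit_script(dashboard_url, script):
    """Submit the script through the Jobs SDK and return the job id."""
    # Imported here so that job_request() stays usable without Ray installed.
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient(dashboard_url)
    return client.submit_job(**job_request(script))
```

A call such as `submit_script("http://127.0.0.1:8265", "example.py")` assumes the head node's dashboard is reachable at that address, for example through a port-forward.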

Ray Cluster on Kubernetes
1. Can be installed with Helm
   a. helm -n ray install example-cluster --create-namespace ./ray
2. What gets installed
   a. ray-operator: the component that manages rayclusters
   b. raycluster (custom resource) -> 3 pods (1 head + 2 workers)
   c. service: the endpoint for accessing the head
3. Submitting a Ray Job
   a. Port-forward the Dashboard service to the local machine and submit with the CLI
      i. kubectl -n ray port-forward service/example-cluster-ray-head 8265:8265
      ii. ray job submit --runtime-env-json=... -- python script.py
   b. Port-forward port 10001 of the Ray Head and run locally with ray.init("local") (questionable from a security standpoint)
   c. From a Pod in the Kubernetes cluster (e.g. created by a Job), call ray.init("head-service") and pass the Head information through environment variables
https://github.com/ray-project/ray/tree/master/doc/kubernetes/example_scripts
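The CLI submission in step 3.a needs the runtime environment as a JSON string, which is easy to get wrong when quoting by hand. The sketch below builds the command line in Python; `ray_job_submit_command` is a hypothetical helper, and the default address assumes the port-forward from step 3.a.i is active.

```python
import json
import shlex


def ray_job_submit_command(script, runtime_env, address="http://127.0.0.1:8265"):
    """Return a shell-safe `ray job submit` command line for the given script."""
    args = [
        "ray", "job", "submit",
        "--address", address,
        # The runtime environment is passed as a single JSON-encoded argument.
        "--runtime-env-json", json.dumps(runtime_env),
        # Everything after "--" is the entrypoint executed on the cluster.
        "--", "python", script,
    ]
    return shlex.join(args)


print(ray_job_submit_command("script.py", {"working_dir": "./"}))
# ray job submit --address http://127.0.0.1:8265 --runtime-env-json '{"working_dir": "./"}' -- python script.py
```

The same argument list can be handed to `subprocess.run` directly, in which case no shell quoting is needed at all.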

Summary
Today:
1. Basic usage of Ray
   a. Concepts + cluster creation
ToDo:
1. Ray's various features (Data, Train, Tune, Serve, RLlib, Workflows)
2. Pros and cons of using Ray for Distributed Training with PyTorch and TensorFlow
3. Comparison with Kubeflow Training Operator
