
Kubernetes-based GPU as a Service Platform by using Open Source Software [GTC 2020]


In application development, it has become quite common to increase agility in order to make a bigger impact on the business.
On the other hand, hasn't the platform your applications run on been put off until later?
Unless we provide an optimal execution environment for applications, and by extension a better environment for developers, it is clear that the hardware cannot be used to its full potential.
To achieve this, the platform itself must also evolve continuously.
So how should we go about evolving the platform?
At CyberAgent, we actively adopt OSS that matches current trends, which increases development agility and enables continuous evolution.
In this session, we present a case study in which we build and operate a flexibly and continuously evolving platform on NVIDIA's DGX A100, using Kubernetes and other OSS.

Daisuke Takahashi

October 09, 2020

Transcript

  1. Who are we? • Lee Yeongjae (AI Category Owner): joined CyberAgent in

    2016. Contributes to improving in-house products as a solution architect and to platform development (e.g., our OpenStack and container services); also developing an AI platform as the AI category owner. • Masaya Aoyama (K8s-aaS Product Owner): implemented GKE-like Kubernetes-as-a-Service on our private cloud as product owner and supported the "Developer Experts" for Kubernetes projects at CyberAgent. Co-chair of the largest cloud native conference in Japan. • Shuichiro Makigaki (ML/Backend Engineer): joined CyberAgent in 2016. Mainly works on in-house system development as a backend engineer and architect; also works on platform development (OpenStack and container services) and is developing an AI platform. • Daisuke Takahashi (Infrastructure Engineer): mainly responsible for development of the private OpenStack platform and Kubernetes-as-a-Service, as well as effective utilization of various accelerator devices. Building the underlying physical infrastructure for the GPUaaS/AI platform.
  2. Agenda 1. Overview of CyberAgent, Inc. 2. Why we decided

    to use an on-premises environment 3. Kubernetes-based GPU-as-a-Service Platform 4. AI Platform 5. Physical layer around GPU 6. Conclusion
  3. “To create the 21st century’s leading company” Media A variety

    of media services enjoyed by countless people ➔ AbemaTV ➔ AWA ➔ WinTicket Advertisement Offering comprehensive advertising solutions from agency business to ad technologies ➔ Dynalyst ➔ CA Wise ➔ AIR TRACK Game Developing 50+ smartphone games (including eight major titles on various platforms) ➔ GRANBLUE FANTASY ➔ PRINCESS CONNECT! Re:Dive ➔ Shadowverse 3 Main Segments ※「ABEMA」:© Abema TV, Inc. ※※「GRANBLUE FANTASY」、 「PRINCESS CONNECT! Re:Dive」: © Cygames, Inc.
  4. Agenda 1. Overview of CyberAgent, Inc. 2. Why we decided

    to use an on-premises environment 3. Kubernetes-based GPU-as-a-Service Platform 4. AI Platform 5. Physical layer around GPU 6. Conclusion
  5. Why an AI solution for advertising? • To reduce the time

    needed to create effective ads and the domain knowledge of the customer's business that it requires • To discover new, highly effective ad creatives • To predict the performance of ad creatives and prioritize them by ranking • To help analyze and improve the effectiveness of ad creatives • To identify and avoid ads that cause negative reactions ※「GRANBLUE FANTASY」: © Cygames, Inc. (Slide illustration: a creative scored “97 points” with a similar ad detected.)
  6. Why GPUs? We must perform high processing volumes at high

    speed. GPU power can contribute to our business. • There is a huge number of combinations of advertisements and media. • Computational complexity increases as more demographic information (e.g., region, age, and gender) is considered. • A fast learning cycle is required because advertisements change rapidly in response to changing consumer interests. • The advertising system handles real-time bidding; thus, increased inference latency critically affects our business, and the latency requirements are severe.
  7. Why on-premises? Functionalities • To build a flexible software stack

    • To link with existing services Costs • Cloud fees remain high • Total on-premises costs will be lower in the long term
  8. Why on-premises? (Chart: monthly cost ($) of GPU-only usage on the cloud, for part of the business segment.)
  9. Agenda 1. Overview of CyberAgent, Inc. 2. Why we decided

    to use an on-premises environment 3. Kubernetes-based GPU-as-a-Service Platform 4. AI Platform 5. Physical layer around GPU 6. Conclusion
  10. Provide GPU instances for users • Multiple instances • Multiple

    GPUs per instance • Isolate GPUs between processes • Provide shared volumes for each task GPUaaS architecture overview and minimal requirements (Diagram: computing resource pool, storage pool.) Container icons: https://icons8.jp/icons/set/video-card
  11. Container-based vs VM-based vs metal-based • Pros for container-based ◦

    Easy packaging of the runtime environment into images [cf. VM, metal] ◦ Low overhead and short launch time [cf. VM] ◦ Environment isolation for multi-tenancy [cf. metal] • Cons for container-based ◦ Weaker runtime isolation [cf. VM] ◦ Short lifecycle [cf. VM, metal]
  12. Kubernetes Aggregates computing resources and orchestrates containers, volumes, etc. =

    aggregating GPUs and assigning them to processes together with volumes (Diagram: computing resource pool, storage pool.) • Storage systems ◦ Block ◦ Shared filesystem ◦ Others
  13. Isolation for multi-tenancy Kubernetes namespaces can be isolated for multi-tenancy

    NOTE: the container runtime (Docker/runC) cannot be completely isolated (Diagram: User A namespace / User B namespace.)
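A minimal sketch of this isolation (hypothetical names, not the production manifests): one namespace per tenant, optionally with a ResourceQuota capping GPU consumption:

      apiVersion: v1
      kind: Namespace
      metadata:
        name: user-a                      # hypothetical tenant namespace
      ---
      apiVersion: v1
      kind: ResourceQuota
      metadata:
        name: gpu-quota
        namespace: user-a
      spec:
        hard:
          requests.nvidia.com/gpu: "4"    # cap the GPUs this tenant can request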
  14. User authentication/authorization on Kubernetes • Authentication ◦ Service account for

    Kubernetes ◦ OIDC integration ◦ Cloud provider user/service account integration • Authorization ◦ Role-based access control (RBAC) ▪ CRUD on specific resources only
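For illustration, a namespace-scoped Role and RoleBinding along these lines restrict an OIDC-authenticated user to CRUD on specific resources in their own namespace (the names and resource list here are assumptions):

      apiVersion: rbac.authorization.k8s.io/v1
      kind: Role
      metadata:
        name: gpu-user                    # hypothetical
        namespace: user-a
      rules:
      - apiGroups: ["", "apps"]
        resources: ["pods", "pods/log", "pods/exec", "statefulsets", "persistentvolumeclaims"]
        verbs: ["get", "list", "watch", "create", "update", "delete"]
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: RoleBinding
      metadata:
        name: gpu-user-binding
        namespace: user-a
      subjects:
      - kind: User
        name: user-a@example.com          # identity from the OIDC integration
        apiGroup: rbac.authorization.k8s.io
      roleRef:
        kind: Role
        name: gpu-user
        apiGroup: rbac.authorization.k8s.io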
  15. Accessing GPU instances (containers) 1. Access via Jupyter notebook from

    web browser 2. SSH-like access via the Kubernetes client tool $ kubectl exec -it PODNAME-0 -- bash PODNAME-0 #
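For browser access to a notebook server running inside a pod, plain kubectl port forwarding also works (a generic technique; the platform's actual ingress path may differ):

      $ kubectl port-forward PODNAME-0 8888:8888   # forward the notebook port
      # then open http://localhost:8888 in a web browser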
  16. Why Kubernetes? For "Cloud Native" • Resiliency • Easily managed

    • Observability • Fast updates • Others https://github.com/cncf/toc/blob/master/DEFINITION.md Methods: A. Reconciliation by Kubernetes B. Ecosystem C. Extending and customizing ⇒ Continue to improve the platform with OSS for business success Cloud Native means:
  17. A: Reconciliation loop • Automatic recovery (convergence) to the desired state

    driven by many controllers ◦ Re-launch containers (processes) quickly ◦ Replace configs and credentials with the latest versions ◦ Reassign load balancer members (Diagram: the ReplicaSet Controller watches the desired ReplicaSet and converges the actual ReplicaSet to replicas = 3.) Desired ReplicaSet:

      kind: ReplicaSet
      spec:
        replicas: 3
        template:
          spec:
            containers:
            - image: nginx:1.16
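The loop is easy to observe with standard kubectl commands (the pod name below is hypothetical): deleting one replica makes the controller immediately create a replacement to restore the desired count.

      $ kubectl get pods                  # three replicas running
      $ kubectl delete pod nginx-abc12    # delete one replica (hypothetical pod name)
      $ kubectl get pods                  # a replacement pod is already being created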
  18. B: Automate with Kubernetes ecosystem • Prometheus/Grafana ◦ Monitor GPU

    and server metrics • cert-manager ◦ Create and renew certificates with ACME • external-dns ◦ Associate IP addresses with hostnames • oauth2-proxy + NGINX ingress ◦ OAuth2 authentication for the web UI • Others ◦ Autoscaling, templating settings, etc.
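As a sketch of how these components combine (the hostname, issuer, and service names are assumptions, not the production setup), a single Ingress can drive both cert-manager and external-dns through annotations:

      apiVersion: networking.k8s.io/v1
      kind: Ingress
      metadata:
        name: notebook                                        # hypothetical
        annotations:
          cert-manager.io/cluster-issuer: letsencrypt         # cert-manager issues a TLS cert via ACME
          external-dns.alpha.kubernetes.io/hostname: notebook.example.com  # external-dns registers the DNS record
      spec:
        tls:
        - hosts: ["notebook.example.com"]
          secretName: notebook-tls
        rules:
        - host: notebook.example.com
          http:
            paths:
            - path: /
              pathType: Prefix
              backend:
                service:
                  name: notebook
                  port:
                    number: 80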
  19. C: Extending and customizing with Kubernetes 1. Implement custom controller

    with the reconciliation model, e.g., S3 image caching for volumes 2. Mutate container settings by webhook, e.g., automatically inject credentials 3. Any status can be accessed via the Kubernetes API, e.g., collect usage status for billing 4. Store metadata in Kubernetes using a ConfigMap or Secret, e.g., a user's container image references for the web UI (see the sketch below)
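Pattern 4, for instance, needs nothing beyond a plain ConfigMap per user (names are illustrative), which the web UI then reads back through the Kubernetes API:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: user-a-images               # hypothetical
        namespace: user-a
      data:
        images: |                         # container image references shown in the web UI
          registry.example.com/user-a/train:v3
          registry.example.com/user-a/infer:v1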
  20. Why Kubernetes? For "Cloud Native" • Resiliency • Easily managed

    • Observability • Fast updates • Others https://github.com/cncf/toc/blob/master/DEFINITION.md Methods: A. Reconciliation by Kubernetes B. Ecosystem C. Extending and customizing ⇒ Continue to improve the platform with OSS for business success Cloud Native means:
  21. NVIDIA and OSS https://events.linuxfoundation.org/kubecon-cloudnativecon-europe/program/schedule/ • Kubernetes GPU device plugin •

    OSS monitoring stack (both presented at KubeCon EU 2020) https://github.com/NVIDIA/k8s-device-plugin
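With the device plugin deployed on each node, containers request GPUs through the extended resource it advertises; this is essentially the canonical example from the plugin's README (the image tag may differ by CUDA version):

      apiVersion: v1
      kind: Pod
      metadata:
        name: gpu-test
      spec:
        restartPolicy: OnFailure
        containers:
        - name: cuda
          image: nvidia/cuda:11.0-base
          command: ["nvidia-smi"]          # print the visible GPUs and exit
          resources:
            limits:
              nvidia.com/gpu: 1            # resource name advertised by the NVIDIA device plugin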
  22. Agenda 1. Overview of CyberAgent, Inc. 2. Why we decided

    to use an on-premises environment 3. Kubernetes-based GPU-as-a-Service Platform 4. AI Platform 5. Physical layer around GPU 6. Conclusion
  23. User feedback on our GPUaaS (multiple answers allowed): "Please identify your dissatisfaction with GPUaaS." • "Development speed drops, so I want to complete all processing on GCP or AWS." • "Machine learning is not as easy to perform as on the GCP AI Platform." • "I don't use it because it's difficult to migrate from the public cloud."
  24. Why an on-premises AI platform? The public cloud already provides many machine learning platforms, so why build our own? • Use computational resources in the right place: we should select what we use case by case. • Create a cutting-edge environment for innovative products: it is important to be best friends with the environment.
  25. Example: AI platform training in Google Cloud A service to

    train models with different customization options. Supports different machine types, distributed training, hyperparameter tuning, and GPU/TPU acceleration. Four simple steps: 1. Package the training code 2. Prepare a job definition in YAML (with hyperparameter tuning if required) 3. Save the code & YAML to Google Cloud Storage 4. Submit with gcloud ai-platform jobs submit training (sketched below) https://cloud.google.com/ai-platform
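Step 4 looks roughly like this (the job name, bucket, and file names are placeholders; the flags are standard gcloud CLI options):

      $ gcloud ai-platform jobs submit training my_job \
          --region us-central1 \
          --module-name trainer.task \
          --package-path ./trainer \
          --job-dir gs://my-bucket/output \
          --config hptuning_config.yaml    # hyperparameter tuning spec, if required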
  26. Idea: GCP AI Platform-compatible on-prem. AI Platform Ease of use

    is the justification: many users, a good I/O interface, continuous improvement, easy to introduce, etc. Same configuration and codes • Introducing Kubeflow is reasonable • Treat a GCP AI Platform Job as a Kubeflow (Katib) resource • Abstract TFJob/PyTorchJob/K8SJob, etc. Same commands • Implement compatible commands as kubectl plugins Remove barriers between on-premises & cloud
  27. What is Kubeflow? A Swiss Army knife for machine learning on Kubernetes

    https://www.kubeflow.org/docs/started/kubeflow-overview/ • On-prem. deployment • Resource usage control by Kubernetes • Hyperparameter tuning by Katib
  28. What is Katib (in Kubeflow)? • Hyperparameter tuning: optimize hyperparameters •

    Neural architecture search: optimize the neural network structure • Multiple ML framework support: TensorFlow, PyTorch, etc.
  29. Katib resources (Diagram: Experiment → Suggestion → Trials → TFJob/PyTorchJob/Job → Pod with worker

    and metrics containers; a metrics collector feeds the Katib DB.) • Experiment: the execution unit of hyperparameter tuning; contains all settings (e.g., algorithms) • Suggestion: contains a hyperparameter pair generated according to the algorithm specified in the Experiment • Trial: coordinates each hyperparameter set from the Suggestions
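A minimal Experiment sketch, assuming the Katib v1alpha3 API that was current in 2020 (names and values are illustrative; field layouts differ across Katib versions):

      apiVersion: kubeflow.org/v1alpha3
      kind: Experiment
      metadata:
        name: random-example               # hypothetical
      spec:
        objective:
          type: maximize
          objectiveMetricName: accuracy    # reported by the metrics collector
        algorithm:
          algorithmName: random            # drives what each Suggestion proposes
        parallelTrialCount: 3
        maxTrialCount: 12
        parameters:
        - name: lr
          parameterType: double
          feasibleSpace:
            min: "0.01"
            max: "0.05"
        # trialTemplate: the TFJob/PyTorchJob/Job template each Trial runs (omitted here)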
  30. Same configuration/codes/commands: each on-prem. kubectl plugin command mirrors its cloud counterpart,

    and both consume the same GCP Job definition.

      On-prem. resource                           Cloud resource
      kubectl ai-platform jobs submit training    gcloud ai-platform jobs submit training
      kubectl ai-platform jobs list|get           gcloud ai-platform jobs list|get
      kubectl ai-platform jobs describe           gcloud ai-platform jobs describe
      kubectl ai-platform jobs stream-logs        gcloud ai-platform jobs stream-logs
      kubectl ai-platform jobs cancel             gcloud ai-platform jobs cancel
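The mirroring is cheap to implement because of kubectl's plugin mechanism: any executable named kubectl-ai_platform on the PATH is invoked as `kubectl ai-platform`, so the on-premises CLI can copy the gcloud command surface verbatim. A hypothetical invocation (flags assumed to mirror gcloud's):

      $ kubectl ai-platform jobs submit training my_job \
          --config hptuning_config.yaml    # the same GCP-style job definition as in the cloud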
  31. Abstract TFJob/K8SJob, etc. with a Katib Experiment. Treat a GCP AIP Job

    as a Katib Experiment: parse the GCP-style Job definition on the client side and convert it to a Katib Experiment. Operation is transparent to end users: creating/deleting a Job = creating/deleting an Experiment (internally) = creating/deleting a TFJob/PyTorchJob (internally). (Diagram: the same Katib resource chain as slide 29.) kubectl plugin implementation
  32. Run a job w/o hyperparameter tuning. If there is no hyperparameter tuning section

    in the Job definition, substitute one by restricting the feasible space to a single dummy value:

      parameters:
      - name: dummy
        parameterType: discrete
        feasibleSpace:
          list:
          - "0.02"

    (Diagram: the same Katib resource chain as slide 29.) kubectl plugin implementation (cont.)
  33. Serving should run in the right place. Private cloud pros:

    close to the data source; suitable for private tests; CPUs on virtual machines and NVIDIA T4 are available. Public cloud pros: flexibility and availability via a global platform; CPU+GPU and TPU are available. Serving can often run with fewer resources than training.
  34. Agenda 1. Overview of CyberAgent, Inc. 2. Why we decided

    to use an on-premises environment 3. Kubernetes-based GPU-as-a-Service Platform 4. AI Platform 5. Physical layer around GPU 6. Conclusion
  35. Workstations in MDF room (2019) • Clustering unused GeForce GTX

    1080 Tis with Kubernetes for researchers ◦ Demand was much higher than expected, with many requests from developers for a similar service
  36. Issues w/ workstation cluster 1. Facility ◦ Poor power and

    cooling capabilities of the MDF room for high-power devices ▪ e.g., an annual power outage ◦ High-latency connection to our datacenter network (site-to-site VPN) ▪ Not suited for inference-serving applications 2. Workstation ◦ Lack of BMC/IPMI (remote management) on our machines ▪ We would like to maintain them remotely due to the COVID-19 pandemic 3. GPU ◦ Limited memory capacity of GeForce cards ▪ Insufficient for some workloads
  37. Infrastructure considerations (2020) 1. Location ◦ Our datacenter in Tokyo

    ▪ Sufficient power, cooling, and network capabilities 2. Hardware ◦ Rack-mount servers (with IPMI) ▪ Convenient maintenance ◦ NVIDIA data center GPUs ▪ Sufficient GPU memory We began looking for GPU-accelerated servers at the end of April
  38. NVIDIA A100/DGX A100 • Ampere architecture ◦ Notable performance improvements

    compared to “Volta” ▪ Up to 20x faster with sparsity • 3rd-gen NVLink/2nd-gen NVSwitch ◦ Seamlessly scalable up to 16 GPUs ◦ 2x the GPU-to-GPU bandwidth of their predecessors • Announcement/release timing (May 14) ◦ Announced while we were drawing up our list of candidate GPU servers ▪ Which included DGX-1 and DGX-2
  39. MIG: Multi-instance GPU MIG mode in the NVIDIA Ampere architecture

    can run seven jobs in parallel on an A100 GPU (NVIDIA Blog) • Multi-tenancy ◦ For DGX A100, its 8 GPUs can be sliced into up to 56 GPU instances ◦ Administrators can assign a right-sized GPU instance to each job (see the sketch below) • Guaranteed QoS ◦ Every GPU instance has isolated memory and cores
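Slicing is driven by nvidia-smi. A sketch of creating two small instances on one A100 (profile ID 19 corresponds to the 1g.5gb slice on the 40 GB A100; profile IDs vary by GPU model):

      $ sudo nvidia-smi -mig 1              # enable MIG mode (add -i <gpu> to target one GPU; may require a reset)
      $ sudo nvidia-smi mig -cgi 19,19 -C   # create two 1g.5gb GPU instances plus their compute instances
      $ nvidia-smi mig -lgi                 # list the resulting GPU instances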
  40. DGX A100 • 1 node (for now) ◦ Scale-out if

    required • Almost ready ☑ Setup (OS, Kubernetes, etc.) ☑ Benchmark ☐ Evaluate MIG support of Kubernetes device plugin
  41. Hardware around DGX A100 • Compute: NVIDIA DGX A100 •

    Network: Mellanox SN2010 (100GbE/25GbE) • Storage: NetApp AFF A800
  42. Agenda 1. Overview of CyberAgent, Inc. 2. Why we decided

    to use an on-premises environment 3. Kubernetes-based GPU-as-a-Service Platform 4. AI Platform 5. Physical layer around GPU 6. Conclusion
  43. Conclusion: Purpose Why do we need GPUs? We must perform

    high processing volumes at high speed, and GPU power can contribute to our business. Advantages of our on-premises resources: Functionalities • To build a flexible software stack • To link with existing services; Costs • Cloud fees remain high • Total on-premises costs will be lower in the long term
  44. Conclusion: our solutions • Improve the platform with an OSS stack • Automate operations with

    Kubernetes (Stack diagram: DGX A100 and AFF A800 → GPUaaS (Kubernetes) → AI Platform.) • An AI Platform compatible with GCP • High-performance GPUs and storage. Actively using OSS to keep improving the platform increases the agility of application development, which has a significant impact on the business.
  45. ToDos GPUaaS • Automatic slicing of GPU instances (MIG) On-premises

    AI Platform • Serving implementation • Pipeline implementation A100 GPU / DGX A100 • Add more DGX A100 systems as our business grows • Explore more new possibilities of MIG and Kubernetes • Integrate the A100 with other GPUs (e.g., T4) for cost efficiency