

LLMariner - Transform your Kubernetes Cluster Into a GenAI platform

This presentation gives an overview of LLMariner.

Kenji Kaneda

October 11, 2024


Transcript

  1. LLMariner: Provide a unified AI/ML platform with efficient GPU and K8s management
     [Diagram: LLMariner layers LLM (inference, fine-tuning, RAG), a Workbench (Jupyter Notebook), and non-LLM training on top of public/private clouds with heterogeneous GPUs (gen G1/arch A1 and gen G2/arch A2).]
  2. Example Use Cases
     • Develop LLM applications with an OpenAI-compatible API (see the sketch below)
       ◦ Leverage the existing ecosystem to build applications such as code auto-completion and chat bots
     • Fine-tune models while keeping data safe and secure in your on-premise datacenter
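     As a sketch of what that compatibility enables, the snippet below points the official OpenAI Python client at an LLMariner deployment. The base URL, API key, and model name are illustrative assumptions, not values from the deck.

        from openai import OpenAI

        # Point the standard OpenAI client at the LLMariner API endpoint.
        # Endpoint, key, and model name are placeholders (assumptions).
        client = OpenAI(
            base_url="http://llmariner.example.com/v1",
            api_key="<your-llmariner-api-key>",
        )

        completion = client.chat.completions.create(
            model="meta-llama/Llama-3.1-8B-Instruct",  # any model the cluster serves
            messages=[{"role": "user", "content": "Summarize what LLMariner does."}],
        )
        print(completion.choices[0].message.content)

     Because only the base URL changes, existing OpenAI-based tooling and applications can be repointed at the cluster without code changes.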
  3. Key Features
     For the AI/ML team:
     • LLM inference
     • LLM fine-tuning
     • RAG
     • Jupyter Notebook
     • General-purpose training
     For the infrastructure team:
     • Flexible deployment model
     • Efficient GPU management
     • Security / access control
     • GPU visibility/showback (*)
     • Highly reliable GPU management (*)
     (*) under development
  4. High Level Architecture
     [Diagram: a control-plane K8s cluster runs the LLMariner Control Plane for AI/ML and exposes the API endpoint; multiple worker GPU K8s clusters each run the LLMariner Agent for AI/ML.]
  5. Features for the AI/ML Team and the Infra Team
     APIs for the AI/ML team:
     • OpenAI-compatible API (chat completion, embedding, RAG, fine-tuning, …)
     • Workbench with Jupyter Notebooks
     [Diagram of the platform components: an inference engine with runtime mgmt (e.g., autoscaling, routing) over vLLM, Nvidia Triton, and Ollama; fine-tuning jobs, general-purpose training jobs, and Jupyter Notebooks, with training scheduled through Kueue; model mgmt covering open models, closed models owned by your org, and fine-tuned models; storage mgmt for files and vector DBs; user mgmt, API authn/authz (Dex), API key mgmt, orgs & projects mgmt, API usage audits, and secure session mgmt; cluster mgmt, cluster federation, and GPU workloads mgmt across worker K8s clusters.]
  6. LLM Inference Serving
     • Compatible with the OpenAI API
       ◦ Can leverage the existing ecosystem and applications
     • Advanced capabilities surpassing standard inference runtimes such as vLLM:
       ◦ Optimized request serving and GPU management
       ◦ Multiple inference runtime support
       ◦ Multiple model support
       ◦ Built-in RAG integration
  7. Multiple Model and Runtime Support
     • Multiple model support (queryable through the API; see the sketch below):
       ◦ Open models from Hugging Face
       ◦ Private models in customers' environments
       ◦ Fine-tuned models generated with LLMariner
     • Multiple inference runtime support:
       ◦ vLLM
       ◦ Ollama
       ◦ Nvidia Triton Inference Server and Hugging Face TGI (tagged upcoming and experimental)
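     As a quick check of which models a given deployment actually serves, the models endpoint can be queried with the same client as before. This assumes LLMariner exposes /v1/models as part of its OpenAI compatibility; the endpoint and key are again placeholders.

        from openai import OpenAI

        client = OpenAI(
            base_url="http://llmariner.example.com/v1",  # hypothetical endpoint
            api_key="<your-llmariner-api-key>",
        )

        # List every model the cluster currently serves: open, private,
        # and fine-tuned models all appear through the same endpoint.
        for model in client.models.list():
            print(model.id)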
  8. Optimized Inference Serving
     • Efficiently utilize GPUs to achieve high throughput and low latency
     • Key technologies:
       ◦ Autoscaling
       ◦ Model-aware request load balancing & routing
       ◦ Multi-model management & caching
       ◦ Multi-cluster/cloud federation
     [Diagram: the LLMariner Inference Manager Engine autoscales runtimes across clusters; Cluster X runs vLLM instances serving Llama 3.1 and Gemma 2, while Cluster Y runs vLLM serving Llama 3.1 and Ollama serving Deepseek Coder.]
  9. Built-in RAG Integration
     • Use the OpenAI-compatible API to manage vector stores and files (see the sketch below)
       ◦ Milvus serves as the underlying vector DB
     • The inference engine retrieves relevant data when processing requests
     [Diagram: files are uploaded and embedded into the vector store, and the LLMariner inference engine retrieves data from it at request time.]
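     A minimal sketch of that flow with the OpenAI Python SDK follows. It assumes LLMariner accepts the SDK's file and vector-store calls (the beta namespace in 2024-era SDK versions); the endpoint, key, and file name are placeholders.

        from openai import OpenAI

        client = OpenAI(
            base_url="http://llmariner.example.com/v1",  # hypothetical endpoint
            api_key="<your-llmariner-api-key>",
        )

        # Upload a document; the platform creates embeddings for it
        # in the underlying Milvus vector DB.
        file = client.files.create(
            file=open("product-manual.pdf", "rb"),
            purpose="assistants",
        )

        # Create a vector store and attach the uploaded file, making it
        # available for retrieval at inference time.
        store = client.beta.vector_stores.create(name="product-docs")
        client.beta.vector_stores.files.create(
            vector_store_id=store.id,
            file_id=file.id,
        )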
  10. Beyond LLM Inference
     • Provide LLM fine-tuning, general-purpose training, and Jupyter Notebook management
     • Empower AI/ML teams to harness the full power of GPUs in a secure, self-contained environment
     [Diagram: a Supervised Fine-tuning Trainer running in a GPU K8s cluster.]
  11. A Fine-tuning Example
     • Submit a fine-tuning job using the OpenAI Python library (see the sketch below)
       ◦ The fine-tuning job runs in an underlying Kubernetes cluster
     • Enforce quotas through integration with the open source Kueue
     [Diagram: submitted fine-tuning jobs run on GPUs in a K8s cluster, with quota enforcement by Kueue.]
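     The submission path can be sketched with the standard fine-tuning calls of the OpenAI Python SDK; the endpoint, key, base model, and training file name are illustrative assumptions.

        from openai import OpenAI

        client = OpenAI(
            base_url="http://llmariner.example.com/v1",  # hypothetical endpoint
            api_key="<your-llmariner-api-key>",
        )

        # Upload the training data (a JSONL file of chat-formatted examples).
        training_file = client.files.create(
            file=open("train.jsonl", "rb"),
            purpose="fine-tune",
        )

        # Submit the job. Per the slide, LLMariner runs it as a job in the
        # underlying Kubernetes cluster, subject to Kueue quota enforcement.
        job = client.fine_tuning.jobs.create(
            model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder base model
            training_file=training_file.id,
        )
        print(f"submitted fine-tuning job {job.id}")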
  12. Enterprise-Ready Access Control
     • Control API scope with "organizations" and "projects"
       ◦ A user in Project X can access fine-tuned models generated by other users in Project X
       ◦ A user in Project Y cannot access the fine-tuned models in Project X
     • Can be integrated with a customer's identity management platform (e.g., SAML, OIDC)
     [Diagram: Users 1 and 2 in Project X create and read a fine-tuned model; User 3 in Project Y cannot access it.]
  13. Supported Deployment Models
     • Single public cloud
     • Single private cloud
     • Air-gapped environment
     • Appliance
     • Hybrid cloud (public & private)
     • Multi-cloud federation
     [Diagram: the LLMariner Control Plane runs in one public or private cloud's K8s cluster while LLMariner Agents run in other clouds.]
     ※ No need to open incoming ports in worker clusters; only outgoing port 443 is required.