
Uncharted AI: 대AI시대 (The Great Age of AI)

Lablup Inc.

November 27, 2024


Transcript

  1.  The smallest reference NVIDIA SuperPod (1 SU) configuration requires:

    – 12 switches
    – >500 cables
    – 400 Gbps InfiniBand NDR

    Dominant traffic patterns:
    – Cloud: North-South
    – Kubernetes: hybrid (NS + EW)
    – HPC & AI: East-West
    – End-to-end MLOps: hybrid (NS + EW)

    Software stack: NVIDIA NCCL
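
    Why AI traffic is East-West: collectives such as all-reduce exchange data between peer nodes rather than with a central frontend. Below is a pure-Python sketch of the ring all-reduce pattern that NCCL implements over NVLink/InfiniBand; it is illustrative only, since real NCCL pipelines chunks and overlaps transfers on the fabric.

```python
# Pure-Python simulation of ring all-reduce (sum). Illustrative only:
# real NCCL runs the same pattern as pipelined sends over NVLink/InfiniBand.

def ring_allreduce(ranks):
    """Sum equal-length vectors held by each rank; every rank ends with the total.

    ranks: list of equal-length lists, one per participant; modified in place.
    The vector length must be divisible by the number of ranks.
    """
    n = len(ranks)
    chunk = len(ranks[0]) // n

    def span(c):
        return range(c * chunk, (c + 1) * chunk)

    # Phase 1: reduce-scatter. At step s, each rank r accumulates chunk
    # (r - 1 - s) mod n from its left neighbour. Every transfer is
    # neighbour-to-neighbour: pure East-West traffic, no central endpoint.
    for s in range(n - 1):
        for r in range(n):
            left = (r - 1) % n
            for i in span((r - 1 - s) % n):
                ranks[r][i] += ranks[left][i]

    # After phase 1, rank r holds the fully reduced chunk (r + 1) mod n.
    # Phase 2: all-gather. The finished chunks circulate around the ring.
    for s in range(n - 1):
        for r in range(n):
            left = (r - 1) % n
            for i in span((r - s) % n):
                ranks[r][i] = ranks[left][i]
```

    Each rank transmits 2·(n−1)/n of its data per reduction regardless of cluster size, which is why the fabric (switch count, cabling, NDR bandwidth) dominates SuperPod design.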
  2.  Lablup at a glance:

    – 1,500+ GPUs on a single site, 100+ and growing!
    – 13,000 enterprise GPUs managed
    – 2M environment downloads
    – 100+ enterprise accounts across 3 continents
  3.  Open-source maintenance and contributions, spanning messaging, infrastructure/OS, and ML/AI: raftify, aiodocker, aiomonitor, aiotools, callosum, Backend.AI, DPDK, FreeBSD, OpenStack, NBA, tensorflow, numpy, vllm, openblas, python, bitsandbytes, zeromq, pyzmq, aiohttp, googletest
  4.  Backend.AI CLI Installer (https://www.backend.ai/ko/download)

    – TUI interface for easy installation
    – Automates package-based installation
    – Supports auto-install meta-configuration
    – Enables offline installation
    – Allows anyone to effortlessly set up a Backend.AI cluster
  6.  bndev (https://bnd.ai/bndev)

    – Simplifies Backend.AI development
    – Easily sets up and manages complex Backend.AI dev environments
    – Enables seamless customization and hacking of Backend.AI
    – Empowers everyone to own and modify their AI infrastructure
  7.  Backend.AI 24.09 Core highlights:

    – ION: model recipes for Backend.AI inference
    – NIM as a first-class inference runtime
    – Intel Gaudi: Gaudi 2/3 integration
    – Agent Selector Plugin: fine-grained job batching
    – OIDC: unified authentication
    – Cilium integration: eBPF
    – Model Store
    – Grace Hopper: GH200 / GB200 support
    – Priority Scheduler
    – GPUDirect Storage speedup
    – Legacy + cloud hybrid clustering
  8.  Next-gen Sokovan: a multi-engine architecture that treats Kubernetes as a first-class citizen. The Backend.AI Manager drives two backends: the existing Backend.AI Agent with its native Docker (moby) backend and accelerator plugins, and a new agent (K8s backend v2, under development) that talks to a standalone Kubernetes cluster (API server plus kubelets and vendor-provided device operators on each computing node), presenting each node to the manager as a proxied compute node with proxied accelerator devices.

    – Job queue managed by Backend.AI Sokovan
    – User/project resource quotas
    – Full accelerator support
    – Full GPU virtualization support for CUDA (proprietary enterprise version only)
    – Full storage folder (vfolder) support on the Docker backend; limited vfolder support on the K8s backend
    – Vendor-provided accelerator support; each node treated as an individual virtual node
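
    A minimal sketch of the multi-engine idea, assuming a hypothetical backend interface; the class and method names below are illustrative, not Backend.AI's actual API.

```python
# Hypothetical sketch of a multi-engine agent: one manager-facing contract,
# with Docker and Kubernetes as interchangeable backends. Names are
# illustrative, not Backend.AI's real API.
from abc import ABC, abstractmethod

class ComputeBackend(ABC):
    """What the manager/scheduler (Sokovan) needs from any engine."""

    @abstractmethod
    def create_kernel(self, image: str, resources: dict) -> str:
        """Launch a compute session; return a backend-specific kernel ID."""

    @abstractmethod
    def destroy_kernel(self, kernel_id: str) -> None:
        """Tear the session down."""

class DockerBackend(ComputeBackend):
    def create_kernel(self, image, resources):
        # Would call the Docker (moby) API and native accelerator plugins.
        return f"docker:{image}"

    def destroy_kernel(self, kernel_id):
        pass

class KubernetesBackend(ComputeBackend):
    def create_kernel(self, image, resources):
        # Would create a Pod via the K8s API server; vendor device operators
        # expose accelerators, and each node appears to the manager as a
        # proxied compute node with proxied accelerator devices.
        return f"k8s:{image}"

    def destroy_kernel(self, kernel_id):
        pass
```

    Treating the container engine as a plugin point is what would let the same job queue, quotas, and vfolder model span both engines.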
  9.  FastTrack 2 (https://cloud-mlops.backend.ai)

    – Project-based pipeline management
    – Various templates: foundation model training, fine-tuning, automatic deployment
    – Custom task nodes with partners
  10. finetun.ing (https://www.finetun.ing)

    – 'Fine-tuning via prompt': no data required
    – Uses NemoTron-4-340B
    – Starts from Llama3.1 and Gemma2
    – Built on FastTrack, runs on Backend.AI Cloud
    – The waitlist is now open!
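
    The "no data required" flow presumably has the large teacher model synthesize training pairs from the user's prompt before a standard fine-tune. A hedged sketch follows; `teacher` and `synthesize_dataset` are stand-in names, since the slide only identifies NemoTron-4-340B as the teacher and Llama3.1/Gemma2 as the student models.

```python
# Illustrative sketch of "fine-tuning via prompt": a teacher model
# synthesizes input/output pairs from a task description, so the user
# brings no dataset. `teacher` is a stand-in callable mapping a prompt
# string to a completion string.

def synthesize_dataset(teacher, task_prompt, n_examples):
    """Ask the teacher model to invent input/output pairs for the task."""
    dataset = []
    for i in range(n_examples):
        question = teacher(f"Write example input #{i} for this task: {task_prompt}")
        answer = teacher(f"Produce the ideal output for: {question}")
        dataset.append({"input": question, "output": answer})
    # The pairs would then drive a standard fine-tune of Llama3.1 or
    # Gemma2 on FastTrack; that part is omitted here.
    return dataset
```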
  11. Backend.AI Cluster Designer: GUI-based cluster design tool

    – Tailors the cluster configuration to the desired scale and performance
    – Automatically calculates effective performance, required hardware, and estimated costs
    – Ideal for validating the optimal architecture before actual deployment
    – Design your own cluster at our demo booth!
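
    The kind of arithmetic such a tool performs can be sketched in a few lines. The function and its efficiency factor are illustrative assumptions, not Cluster Designer's actual model.

```python
# Hypothetical back-of-envelope version of what a cluster sizing tool
# computes. Every constant passed in is an illustrative assumption,
# not a vendor figure.
import math

def size_cluster(target_pflops, peak_tflops_per_gpu, efficiency,
                 gpus_per_node, cost_per_node_usd):
    """Return (gpu_count, node_count, estimated_cost_usd) for a target
    *effective* throughput, discounting peak FLOPS by a scaling-efficiency
    factor (communication, stragglers, kernel overheads)."""
    effective = peak_tflops_per_gpu * efficiency        # TFLOPS per GPU
    gpus = math.ceil(target_pflops * 1000 / effective)  # 1 PFLOPS = 1000 TFLOPS
    nodes = math.ceil(gpus / gpus_per_node)
    return gpus, nodes, nodes * cost_per_node_usd
```

    For example, a 1 effective-PFLOPS target on hypothetical 100-TFLOPS GPUs at 50% scaling efficiency and 8 GPUs per $100k node yields 20 GPUs across 3 nodes for $300k.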
  12. Backend.AI Helmsman Agent: interactive cluster management interface

    – Enables complex cluster operations through terminal chatting
    – Uses Gemma-based fine-tuned models to accurately understand user intentions
    – Supports building conversational fine-tuning pipelines on-premises
    – Integrates packages such as TorchTune, LangGraph, and LangChain
  13. CPU architectures:

    – x86-64: high performance, widely used in PCs and HPC
    – Armv9: energy-efficient, dominates mobile devices
    – RISC-V: open-source, highly customizable
  14. The accelerator landscape:

    – NVIDIA: Volta V100; Turing Titan RTX / RTX 8000; Ampere A10 / A40 / A100; Hopper H100 / H100 NVL / H200; Grace Hopper GH200 / GB200; Blackwell; Jetson TX / Xavier / AGX Orin
    – AMD: Instinct MI250 / MI300; RDNA RDNA2
    – Intel: Gaudi 2/3
    – Google: TPU v3/4/5; Coral EdgeTPU
    – Amazon: Inferentia / Trainium v2
    – Rebellion: ATOM / ATOM+
    – FuriosaAI: Warboy / RNGD
    – Groq: GroqCard
    – GraphCore: IPU / BOW
    – Host CPU architectures: x86-64 (high performance, PC & HPC), Armv9 (energy-efficient, mobile), RISC-V (open-source, highly customizable)
  15. PALI: Performant AI Launcher for Inference

    – ION: model recipes for Backend.AI inference
    – NIM as a first-class inference runtime
    – Hugging Face models loadable via model URL
    – Model Store with open models and partner models
    – Lablup GPU Virtualizer
    – Backend.AI Model Player
  16. PALI Square (PALI2): dedicated hardware infrastructure appliance for PALI

    – Easily scalable by connecting multiple PALI-equipped appliances
    – Architecture optimized for AI workloads: delivers high performance and low latency
    – Nodes per architecture/chip, with a Model Store: Intel Gaudi 2/3, Grace Hopper GH200 / GB200, x86-64 with A6000 / L40
  17. PALI2: Scalable AI H/W Infrastructure, upcoming releases:

    – PALI based on the NVIDIA GH200 reference platform (Korea): pre-orders in October, sales from Q4
    – Instant.AI by Kyocera Mirai Envision (Japan): launching on October 1st, 2024 (https://www.kcme.jp/product/instant-ai-server/)
    – PALI^2 appliances for US and European markets: expected as early as Q4 this year
  19. PALANG: PALI for LANGuage models

    – Comprehensive language model platform incorporating PALI, FastTrack 2, Talkativot, and Helmsman
    – Simplifies large-scale language model deployment and operation, with ready-to-use inference and fine-tuning settings
    – Talkativot: chatbot UI for language models, enabling easy creation of customized chatbots
    – Helmsman: conversational Backend.AI management UX
    – Serves LLM model weights on H100 / B200, x86-64 based nodes, with a per-architecture/per-chip Model Store
  20. "G" (one more thing): a Gemma2-based language model family

    – Easily customizable through finetun.ing
    – Versatile applications: backend model for Helmsman, enterprise-level agents
  21. Recap (CORE 24.09 / 25.03: from cell to society):

    – ION: open model recipes for AI inference
    – BNDEV: DevStack manager
    – FastTrack 2: MLOps
    – CLI Installer: interactive terminal UI
    – Site Designer
    – Helmsman: conversational Backend.AI management UX
    – PALI: Performant AI Launcher for Inference
    – PALI2: PALI appliance
    – PALANG: language-model-oriented AI inference platform
    – GARNET: Gemma2-based SLM
    – Next-gen Sokovan: also with Kubernetes
    – finetun.ing: model tuning by discussion, without data
    – WebUI 3 Neo: rewritten UI/UX