
Uncharted AI: 대AI시대 (The Great Age of AI)

Lablup Inc.

November 27, 2024


Transcript

  1.  The smallest reference NVIDIA SuperPod (1 SU) configuration requires:

    – 12 switches
    – >500 cables
    – 400 Gbps InfiniBand NDR

    Dominant traffic patterns:
    – Cloud: North-South
    – Kubernetes: hybrid (NS + EW)
    – HPC & AI: East-West
    – End-to-end MLOps: hybrid (NS + EW)

    Software stack: NVIDIA NCCL
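
    Why AI traffic is East-West: collectives such as all-reduce exchange data between peer nodes rather than with a central frontend. Below is a pure-Python sketch of the ring all-reduce pattern that NCCL implements over NVLink/InfiniBand; it is illustrative only, since real NCCL pipelines chunks and overlaps transfers on the fabric.

```python
# Pure-Python simulation of ring all-reduce (sum). Illustrative only:
# real NCCL runs the same pattern as pipelined sends over NVLink/InfiniBand.

def ring_allreduce(ranks):
    """Sum equal-length vectors held by each rank; every rank ends with the total.

    ranks: list of equal-length lists, one per participant; modified in place.
    The vector length must be divisible by the number of ranks.
    """
    n = len(ranks)
    chunk = len(ranks[0]) // n

    def span(c):
        return range(c * chunk, (c + 1) * chunk)

    # Phase 1: reduce-scatter. At step s, each rank r accumulates chunk
    # (r - 1 - s) mod n from its left neighbour. Every transfer is
    # neighbour-to-neighbour: pure East-West traffic, no central endpoint.
    for s in range(n - 1):
        for r in range(n):
            left = (r - 1) % n
            for i in span((r - 1 - s) % n):
                ranks[r][i] += ranks[left][i]

    # After phase 1, rank r holds the fully reduced chunk (r + 1) mod n.
    # Phase 2: all-gather. The finished chunks circulate around the ring.
    for s in range(n - 1):
        for r in range(n):
            left = (r - 1) % n
            for i in span((r - s) % n):
                ranks[r][i] = ranks[left][i]
```

    Each rank transmits 2·(n−1)/n of its data per reduction regardless of cluster size, which is why the fabric (switch count, cabling, NDR bandwidth) dominates SuperPod design.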
  2.  Lablup at a glance:

    – 1,500+ GPUs on a single site, 100+ and growing!
    – 13,000 enterprise GPUs managed
    – 2M environment downloads
    – 100+ enterprise accounts across 3 continents
  3.  Open-source maintenance and contributions, spanning messaging, infrastructure/OS, and ML/AI: raftify, aiodocker, aiomonitor, aiotools, callosum, Backend.AI, DPDK, FreeBSD, OpenStack, NBA, tensorflow, numpy, vllm, openblas, python, bitsandbytes, zeromq, pyzmq, aiohttp, googletest
  4.  Backend.AI CLI Installer (https://www.backend.ai/ko/download)

    – TUI interface for easy installation
    – Automates package-based installation
    – Supports auto-install meta-configuration
    – Enables offline installation
    – Allows anyone to effortlessly set up a Backend.AI cluster
  6.  bndev (https://bnd.ai/bndev)

    – Simplifies Backend.AI development
    – Easily sets up and manages complex Backend.AI dev environments
    – Enables seamless customization and hacking of Backend.AI
    – Empowers everyone to own and modify their AI infrastructure
  7.  Backend.AI 24.09 Core highlights:

    – ION: model recipes for Backend.AI inference
    – NIM as a first-class inference runtime
    – Intel Gaudi: Gaudi 2/3 integration
    – Agent Selector Plugin: fine-grained job batching
    – OIDC: unified authentication
    – Cilium integration: eBPF
    – Model Store
    – Grace Hopper: GH200 / GB200 support
    – Priority Scheduler
    – GPUDirect Storage speedup
    – Legacy + cloud hybrid clustering
  8.  Next-gen Sokovan: a multi-engine architecture that treats Kubernetes as a first-class citizen. The Backend.AI Manager drives two backends: the existing Backend.AI Agent with its native Docker (moby) backend and accelerator plugins, and a new agent (K8s backend v2, under development) that talks to a standalone Kubernetes cluster (API server plus kubelets and vendor-provided device operators on each computing node), presenting each node to the manager as a proxied compute node with proxied accelerator devices.

    – Job queue managed by Backend.AI Sokovan
    – User/project resource quotas
    – Full accelerator support
    – Full GPU virtualization support for CUDA (proprietary enterprise version only)
    – Full storage folder (vfolder) support on the Docker backend; limited vfolder support on the K8s backend
    – Vendor-provided accelerator support; each node treated as an individual virtual node
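
    A minimal sketch of the multi-engine idea, assuming a hypothetical backend interface; the class and method names below are illustrative, not Backend.AI's actual API.

```python
# Hypothetical sketch of a multi-engine agent: one manager-facing contract,
# with Docker and Kubernetes as interchangeable backends. Names are
# illustrative, not Backend.AI's real API.
from abc import ABC, abstractmethod

class ComputeBackend(ABC):
    """What the manager/scheduler (Sokovan) needs from any engine."""

    @abstractmethod
    def create_kernel(self, image: str, resources: dict) -> str:
        """Launch a compute session; return a backend-specific kernel ID."""

    @abstractmethod
    def destroy_kernel(self, kernel_id: str) -> None:
        """Tear the session down."""

class DockerBackend(ComputeBackend):
    def create_kernel(self, image, resources):
        # Would call the Docker (moby) API and native accelerator plugins.
        return f"docker:{image}"

    def destroy_kernel(self, kernel_id):
        pass

class KubernetesBackend(ComputeBackend):
    def create_kernel(self, image, resources):
        # Would create a Pod via the K8s API server; vendor device operators
        # expose accelerators, and each node appears to the manager as a
        # proxied compute node with proxied accelerator devices.
        return f"k8s:{image}"

    def destroy_kernel(self, kernel_id):
        pass
```

    Treating the container engine as a plugin point is what would let the same job queue, quotas, and vfolder model span both engines.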
  9.  FastTrack 2 (https://cloud-mlops.backend.ai)

    – Project-based pipeline management
    – Various templates: foundation model training, fine-tuning, automatic deployment
    – Custom task nodes with partners
  10. finetun.ing (https://www.finetun.ing)

    – 'Fine-tuning via prompt': no data required
    – Uses NemoTron-4-340B
    – Starts from Llama3.1 and Gemma2
    – Built on FastTrack, runs on Backend.AI Cloud
    – The waitlist is now open!
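
    The "no data required" flow presumably has the large teacher model synthesize training pairs from the user's prompt before a standard fine-tune. A hedged sketch follows; `teacher` and `synthesize_dataset` are stand-in names, since the slide only identifies NemoTron-4-340B as the teacher and Llama3.1/Gemma2 as the student models.

```python
# Illustrative sketch of "fine-tuning via prompt": a teacher model
# synthesizes input/output pairs from a task description, so the user
# brings no dataset. `teacher` is a stand-in callable mapping a prompt
# string to a completion string.

def synthesize_dataset(teacher, task_prompt, n_examples):
    """Ask the teacher model to invent input/output pairs for the task."""
    dataset = []
    for i in range(n_examples):
        question = teacher(f"Write example input #{i} for this task: {task_prompt}")
        answer = teacher(f"Produce the ideal output for: {question}")
        dataset.append({"input": question, "output": answer})
    # The pairs would then drive a standard fine-tune of Llama3.1 or
    # Gemma2 on FastTrack; that part is omitted here.
    return dataset
```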
  11. Backend.AI Cluster Designer: GUI-based cluster design tool

    – Tailors the cluster configuration to the desired scale and performance
    – Automatically calculates effective performance, required hardware, and estimated costs
    – Ideal for validating the optimal architecture before actual deployment
    – Design your own cluster at our demo booth!
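
    The kind of arithmetic such a tool performs can be sketched in a few lines. The function and its efficiency factor are illustrative assumptions, not Cluster Designer's actual model.

```python
# Hypothetical back-of-envelope version of what a cluster sizing tool
# computes. Every constant passed in is an illustrative assumption,
# not a vendor figure.
import math

def size_cluster(target_pflops, peak_tflops_per_gpu, efficiency,
                 gpus_per_node, cost_per_node_usd):
    """Return (gpu_count, node_count, estimated_cost_usd) for a target
    *effective* throughput, discounting peak FLOPS by a scaling-efficiency
    factor (communication, stragglers, kernel overheads)."""
    effective = peak_tflops_per_gpu * efficiency        # TFLOPS per GPU
    gpus = math.ceil(target_pflops * 1000 / effective)  # 1 PFLOPS = 1000 TFLOPS
    nodes = math.ceil(gpus / gpus_per_node)
    return gpus, nodes, nodes * cost_per_node_usd
```

    For example, a 1 effective-PFLOPS target on hypothetical 100-TFLOPS GPUs at 50% scaling efficiency and 8 GPUs per $100k node yields 20 GPUs across 3 nodes for $300k.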
  12. Backend.AI Helmsman Agent: interactive cluster management interface

    – Enables complex cluster operations through terminal chatting
    – Uses Gemma-based fine-tuned models to accurately understand user intentions
    – Supports building conversational fine-tuning pipelines on-premises
    – Integrates packages such as TorchTune, LangGraph, and LangChain
  13. CPU architectures:

    – x86-64: high performance, widely used in PCs and HPC
    – Armv9: energy-efficient, dominates mobile devices
    – RISC-V: open-source, highly customizable
  14. The accelerator landscape:

    – NVIDIA: Volta V100; Turing Titan RTX / RTX 8000; Ampere A10 / A40 / A100; Hopper H100 / H100 NVL / H200; Grace Hopper GH200 / GB200; Blackwell; Jetson TX / Xavier / AGX Orin
    – AMD: Instinct MI250 / MI300; RDNA RDNA2
    – Intel: Gaudi 2/3
    – Google: TPU v3/4/5; Coral EdgeTPU
    – Amazon: Inferentia / Trainium v2
    – Rebellion: ATOM / ATOM+
    – FuriosaAI: Warboy / RNGD
    – Groq: GroqCard
    – GraphCore: IPU / BOW
    – Host CPU architectures: x86-64 (high performance, PC & HPC), Armv9 (energy-efficient, mobile), RISC-V (open-source, highly customizable)
  15. PALI: Performant AI Launcher for Inference

    – ION: model recipes for Backend.AI inference
    – NIM as a first-class inference runtime
    – Hugging Face models loadable via model URL
    – Model Store with open models and partner models
    – Lablup GPU Virtualizer
    – Backend.AI Model Player
  16. PALI Square (PALI2): dedicated hardware infrastructure appliance for PALI

    – Easily scalable by connecting multiple PALI-equipped appliances
    – Architecture optimized for AI workloads: delivers high performance and low latency
    – Nodes per architecture/chip, with a Model Store: Intel Gaudi 2/3, Grace Hopper GH200 / GB200, x86-64 with A6000 / L40
  17. PALI2: Scalable AI H/W Infrastructure, upcoming releases:

    – PALI based on the NVIDIA GH200 reference platform (Korea): pre-orders in October, sales from Q4
    – Instant.AI by Kyocera Mirai Envision (Japan): launching on October 1st, 2024 (https://www.kcme.jp/product/instant-ai-server/)
    – PALI^2 appliances for US and European markets: expected as early as Q4 this year
  19. PALANG: PALI for LANGuage models

    – Comprehensive language model platform incorporating PALI, FastTrack 2, Talkativot, and Helmsman
    – Simplifies large-scale language model deployment and operation, with ready-to-use inference and fine-tuning settings
    – Talkativot: chatbot UI for language models, enabling easy creation of customized chatbots
    – Helmsman: conversational Backend.AI management UX
    – Serves LLM model weights on H100 / B200, x86-64 based nodes, with a per-architecture/per-chip Model Store
  20. "G" (one more thing): a Gemma2-based language model family

    – Easily customizable through finetun.ing
    – Versatile applications: backend model for Helmsman, enterprise-level agents
  21. Recap (CORE 24.09 / 25.03: from cell to society):

    – ION: open model recipes for AI inference
    – BNDEV: DevStack manager
    – FastTrack 2: MLOps
    – CLI Installer: interactive terminal UI
    – Site Designer
    – Helmsman: conversational Backend.AI management UX
    – PALI: Performant AI Launcher for Inference
    – PALI2: PALI appliance
    – PALANG: language-model-oriented AI inference platform
    – GARNET: Gemma2-based SLM
    – Next-gen Sokovan: also with Kubernetes
    – finetun.ing: model tuning by discussion, without data
    – WebUI 3 Neo: rewritten UI/UX