
Model Inference Optimization and Compression for Sustainable AI

Lablup Inc.
November 02, 2025

Track 2_1730_Lablup Conf 2025_김태호

Transcript

  1. Speaker Introduction

    Senior Research Scientist, KAIST Institute. Research Intern, Université de Montréal (P.I. Yoshua Bengio). Research Intern, CUHK (P.I. Xiaogang Wang). CTO / Co-Founder, Nota. CEO / Co-Founder, Nota. B.S. Bio and Brain Engineering, KAIST. M.S. Electrical Engineering, KAIST. Ph.D. (ABD) Electrical Engineering, KAIST.

    [Timeline markers on the slide: '08, '13, '15, '15, '17, '15, '14, '20.]
  2. Growing Divide Between Hardware and Software

    Software-driven optimization is essential to close the gap. AI model size grows 410x every 2 years, while hardware capacity grows 2x every 2 years. Bridge the gap with NPU and SW.

    [Figure: number of parameters (in billions) by year, 2015 to 2023. Models: Transformer, GPT-1, BERT, GPT-2, Megatron LM, Microsoft T-NLG, GPT-3, GShard, Switch Transformer, Megatron-Turing. Accelerators: P100 (12GB), TPUv2 (16GB), V100 (32GB), TPUv3 (32GB), A100 (40GB), A100-80 (80GB), H100 (80GB). Source: AI and Memory Wall (2024).]

    © 2025. Nota Inc.
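The two growth rates on the slide above compound quickly. As a rough worked example, assuming both rates stay constant and compound once per two-year period (an idealization that the slide does not state), the ratio of model size to hardware capacity grows by 410 / 2 = 205x per period. The function name `gap_after` is illustrative only.

```python
# Growth factors per 2-year period, taken from the slide.
MODEL_GROWTH_PER_PERIOD = 410.0
HW_GROWTH_PER_PERIOD = 2.0
PERIOD_YEARS = 2.0


def gap_after(years: float) -> float:
    """Return how much faster model size has grown than hardware capacity
    after `years`, assuming both rates compound steadily (an assumption)."""
    periods = years / PERIOD_YEARS
    return (MODEL_GROWTH_PER_PERIOD / HW_GROWTH_PER_PERIOD) ** periods


if __name__ == "__main__":
    for years in (2, 4, 6):
        print(f"after {years} years: gap grows about {gap_after(years):,.0f}x")
```

Under these assumptions the gap is 205x after two years and 205 squared after four, which is the motivation the talk gives for software-side optimization.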
  3. Hardware for Training and Inference

    Only one kind of AI chip is used for training, whereas a very large variety of AI chips is used for inference.

    [Figure labels: Training, Inference.]
  4. Can we deploy and optimize any models on any devices?

    No! What are the problems?

    [Figure labels: Inference, Models.]
  5. Things Don't Always Go As Planned

    [Figure labels: AI Model, Target Device.]
  6. Problems: What's happening on the journey?

    • Diverse model conversion options
    • Out of memory
    • Operation not supported
    • Latency is too high
    • Various types of SDKs

    [Figure labels: AI Model, Target Hardware A, Target Hardware B, Target Hardware C.]
  7. Company Overview Snapshot

    Brief history (timeline years on the slide: 2015, 2023, 2024, 2025):
    • Founded (spun out from KAIST, labeled "Korean MIT" on the slide)
    • Released NetsPresso® SDK
    • AI Optimization SDK for Samsung NPU
    • Supports 35+ AI devices

    Highlights (grouped on the slide as Strategic and Financial):
    • Series C ($43M in total)
    • In progress for IPO to KOSDAQ (Q4 2025)
    • CB Insights Global AI 100
    • A long-term contract with a top-tier mobile AP vendor

    Location / Team:
    • Headquarters: Daejeon, S. Korea
    • R&D Center: Seoul, S. Korea
    • US Subsidiary: Sunnyvale, US
    • EU Subsidiary: Berlin, Germany
    • Team composition: R&D 69%, BD, Ops
  8. Nota AI in the News

    2025 AI 100 by CB Insights
    • Recognized for its cutting-edge AI technology and transformative impact.
    • Being named by CB Insights as one of the top 100 most promising private AI companies in 2025 is a significant achievement and a strong indicator of its innovation and potential in the accelerated computing and hardware space, particularly within AI infrastructure at the edge.
  9. NetsPresso®: One-Stop-Shop for AI Development

    NetsPresso bridges the gap between diverse hardware and AI models.

    [Figure labels: Industry (SDV, Mobile, Surveillance, Home IoT, Mobility); Task/Model/Application (Computer Vision, Speech/Audio, LLMs, GenAI); AI SW Stack; AI HWs (IP, SoC).]
  10. Nota AI Business

    Platform: NetsPresso® accelerates the edge AI development process.

    Solution: example accident alert output shown on the slide:
    • Type: Traffic accident
    • Percentage: 80%
    • Emergency Level: 4 (Serious)
    • Possibility of second accident: 20%
    • Expected level of traffic congestion: 5 (Very serious)
    • Immediate action: Slow down, move over, and give right of way to emergency vehicles.
    • Variable message sign: "Accident ahead, slow down and be cautious."
    • Summary: A traffic accident occurred on the road, causing traffic congestion and the need for immediate attention to clear the incident and ensure road safety.

    [Table labels: Accident Alert, Type, Message …]
  11. AI model optimization solution supporting the AI lifecycle process

    Lifecycle stages: Data Collection, Training, Optimization, Deployment, Testing.

    NetsPresso is an AI model optimization solution that considers hardware specifications. It provides a modular pipeline that supports the entire AI lifecycle, from model development and device optimization to testing.

    • Automation
    • Modular Python package: seamless integration with existing pipelines
    • Intuitive GUI: fast and easy AI model deployment
  12. AI Model Optimization Solution Considering Hardware Specifications

    • Workflow 1, Model Development: Model zoo, Trainer, Compressor
    • Workflow 2, Device Optimization: Quantizer, IR Converter, Graph Optimization
    • Workflow 3, Testing on Real Devices: Compile, Benchmarker, Device farm

    [Other labels on the slide: Dataset, Deployment Module.]
  13. Workflow 1: Fast and Efficient AI Model Development

    Model Development stages: Dataset, Model zoo, Trainer, Compressor. Output: compressed model.

    • Expanded compatible model range: quickly test the latest SOTA models.
    • Automated model training: find optimal hyperparameters and reduce training time.
    • Model compression: reduce computational load and shrink model size using pruning and filter decomposition.
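Slide 13 names pruning as one of the compression techniques. The sketch below illustrates one common variant, structured filter pruning by L1 norm, in plain Python. It is not NetsPresso's Compressor API or algorithm: the function names, the `keep_ratio` parameter, and the toy weights are all hypothetical.

```python
from typing import List

Filter = List[List[float]]  # one small 2-D kernel, for illustration only


def l1_norm(kernel: Filter) -> float:
    """Sum of absolute weights in one filter."""
    return sum(abs(w) for row in kernel for w in row)


def prune_filters(filters: List[Filter], keep_ratio: float) -> List[int]:
    """Return indices of the filters to keep, in their original order.

    Filters with the smallest L1 norm are dropped first. `keep_ratio` is a
    hypothetical knob; at least one filter is always kept.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, round(len(filters) * keep_ratio))
    ranked = sorted(range(len(filters)),
                    key=lambda i: l1_norm(filters[i]),
                    reverse=True)
    return sorted(ranked[:n_keep])


if __name__ == "__main__":
    layer = [
        [[0.9, -0.8], [0.7, 0.6]],    # strong filter
        [[0.01, 0.0], [-0.02, 0.0]],  # near-zero filter
        [[0.5, 0.4], [-0.3, 0.2]],    # medium filter
        [[0.0, 0.03], [0.0, -0.01]],  # near-zero filter
    ]
    print("kept filter indices:", prune_filters(layer, keep_ratio=0.5))
```

In practice a pruned model is usually fine-tuned afterwards to recover accuracy; that step is omitted here.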
  14. Support for Various AI Models and Automated Training

    List of models compatible with NetsPresso Trainer:
    01. DarkNet-based: CSPDarkNet
    02. Efficient CNNs: MobileNetV3, MobileNetV4, MixNet
    03. Transformer-based: ViT (Vision Transformer), MixTransformer, EfficientFormer
    04. Hybrid Models: MobileViT

    * Some models are optimized for NetsPresso Trainer. For a full list, refer to the documentation.

    NetsPresso Trainer (GUI): automated model training and visualization of training experiments.
  15. Comparison of Model Training and Compression Results

    AI acceleration benefits using NetsPresso (* the smaller the value, the faster the inference speed):

    Image Classification (DeiT): 371.7 to 70.7 (-80.98%)
    ▪ Measured on QCS6490
    ▪ Input resolution: 224 x 224
    ▪ Latency reduced by 80.98%
    ▪ Accepted at ECCV 2024: https://arxiv.org/abs/2404.11630

    Semantic Segmentation (SegFormer-b0): 101.6 to 18.7 (-81.59%)
    ▪ Measured on Snapdragon Gen3
    ▪ Input resolution: 512 x 512
    ▪ Latency reduced by 81.59%
    ▪ All operations run on the NPU
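The percentages on slide 15 follow from the before and after latencies by the usual formula, reduction = (before - after) / before. The short check below recomputes them and also expresses each result as a speedup factor. The helper names are illustrative, and the values are taken from the slide without units.

```python
def latency_reduction_pct(before: float, after: float) -> float:
    """Percentage by which latency dropped, relative to the original value."""
    return (before - after) / before * 100.0


def speedup(before: float, after: float) -> float:
    """How many times faster the optimized model is."""
    return before / after


# (before, after) latencies as reported on the slide.
RESULTS = {
    "Image Classification (DeiT)": (371.7, 70.7),
    "Semantic Segmentation (SegFormer-b0)": (101.6, 18.7),
}

if __name__ == "__main__":
    for task, (before, after) in RESULTS.items():
        print(f"{task}: -{latency_reduction_pct(before, after):.2f}%, "
              f"{speedup(before, after):.1f}x faster")
```

A reduction of about 81% corresponds to a speedup a little above 5x, which is consistent with the 5.4x figure quoted for the 101.6 ms to 18.7 ms case later in the deck.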
  16. Workflow 2: Device Optimization for Hardware-Specific AI

    Device Optimization stages: Quantize, IR Convert, Graph Optimize. Output: optimized model.

    • Model operator HW optimization: modify model operators that are incompatible with the hardware to ensure smooth AI model execution.
    • Improved computational speed: compress models without performance degradation to enhance computational speed.
    • Model compression: reduce model size, memory usage, and power consumption.
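The operator step on slide 16 (and the "Operation not supported" problem on slide 6) can be pictured as a graph rewrite: find operators the target does not support and replace each with an equivalent sequence that it does. The sketch below is a toy illustration, not NetsPresso's implementation. The supported-operator set and the substitution table are hypothetical; the one substitution used, HardSwish(x) = x * HardSigmoid(x), is a standard identity.

```python
from typing import Dict, List, Set


def find_unsupported(model_ops: List[str], supported: Set[str]) -> List[str]:
    """Return unsupported operator types, de-duplicated, in first-seen order."""
    seen: Set[str] = set()
    missing: List[str] = []
    for op in model_ops:
        if op not in supported and op not in seen:
            seen.add(op)
            missing.append(op)
    return missing


def rewrite_ops(model_ops: List[str], supported: Set[str],
                substitutions: Dict[str, List[str]]) -> List[str]:
    """Replace each unsupported operator with a known equivalent sequence."""
    rewritten: List[str] = []
    for op in model_ops:
        if op in supported:
            rewritten.append(op)
        elif op in substitutions:
            rewritten.extend(substitutions[op])
        else:
            raise ValueError(f"no substitution known for operator {op!r}")
    return rewritten


if __name__ == "__main__":
    # Hypothetical device capability list and model operator sequence.
    device_ops = {"Conv", "Relu", "HardSigmoid", "Mul", "Add"}
    model = ["Conv", "HardSwish", "Conv", "Relu"]
    table = {"HardSwish": ["HardSigmoid", "Mul"]}

    print("unsupported:", find_unsupported(model, device_ops))
    print("rewritten:  ", rewrite_ops(model, device_ops, table))
```

A real converter works on a graph with tensors and attributes rather than a flat list of names, so this only shows the decision logic.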
  17. NetsPresso Optimizer (GUI)

    Scheduled for release: end of May 2025.

    • Layer-wise profiling on target devices.
    • Supports Full Int8 and Mixed Precision Quantization.
    • Implements Automatic Mixed Precision Quantization, considering the trade-off between accuracy loss and inference speed.

    Performance measurement on real devices: identifies bottleneck layers through visualization.
    Applies quantization with an optimal balance: achieves an optimal balance through layer visualization.
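To make the accuracy versus speed trade-off on slide 17 concrete, here is a minimal greedy sketch of mixed-precision selection. It is not the algorithm NetsPresso Optimizer uses, which the slides do not describe. It assumes that per-layer profiling has already produced, for each layer, the latency saved and the accuracy lost when that layer alone runs in int8, and that accuracy losses add up across layers (a simplification). All names and numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LayerProfile:
    name: str
    latency_gain_ms: float  # latency saved if this layer runs in int8 (hypothetical)
    accuracy_drop: float    # accuracy lost if this layer runs in int8 (hypothetical)


def choose_precisions(layers: List[LayerProfile],
                      max_accuracy_drop: float) -> Dict[str, str]:
    """Greedily quantize the layers with the best latency gain per unit of
    accuracy loss until the accuracy budget is used up."""
    plan = {layer.name: "fp16" for layer in layers}

    def benefit(layer: LayerProfile) -> float:
        return layer.latency_gain_ms / max(layer.accuracy_drop, 1e-9)

    spent = 0.0
    for layer in sorted(layers, key=benefit, reverse=True):
        if layer.latency_gain_ms <= 0:
            continue  # quantizing this layer would not make it faster
        if spent + layer.accuracy_drop <= max_accuracy_drop:
            plan[layer.name] = "int8"
            spent += layer.accuracy_drop
    return plan


if __name__ == "__main__":
    profiles = [
        LayerProfile("conv1", latency_gain_ms=2.0, accuracy_drop=0.10),
        LayerProfile("attention", latency_gain_ms=1.0, accuracy_drop=0.80),
        LayerProfile("head", latency_gain_ms=0.5, accuracy_drop=0.05),
    ]
    print(choose_precisions(profiles, max_accuracy_drop=0.2))
```

With this toy profile the sensitive attention layer stays in higher precision while the cheap-to-quantize layers move to int8, which is the kind of balance the slide describes.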
  18. Workflow 3: Testing on Real Devices

    Testing on Real Devices stages: Compile, Benchmark, Device Farm. Output: validation and insight.

    • Cross-HW performance analysis: measure and compare performance on different hardware.
    • Real-world testing: execute AI models on real devices in a Device Farm.
    • Test automation: quickly and easily verify model compatibility and performance on target devices.
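Measuring and comparing performance, as slide 18 describes, usually comes down to a small timing harness with warm-up runs and a robust summary statistic. The sketch below is a generic local harness using only the Python standard library. It does not call any device farm or NetsPresso API, and the two workloads standing in for "devices" are placeholders.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(run_once: Callable[[], object],
              warmup: int = 3, iterations: int = 20) -> Dict[str, float]:
    """Time `run_once` and report latency statistics in milliseconds.

    Warm-up runs are executed first and excluded from the statistics.
    """
    for _ in range(warmup):
        run_once()
    samples_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "median_ms": statistics.median(samples_ms),
        "min_ms": min(samples_ms),
        "max_ms": max(samples_ms),
    }


if __name__ == "__main__":
    # Placeholder workloads standing in for inference on two targets.
    targets = {
        "target_a": lambda: sum(i * i for i in range(20_000)),
        "target_b": lambda: sum(i * i for i in range(5_000)),
    }
    for name, workload in targets.items():
        stats = benchmark(workload)
        print(f"{name}: median {stats['median_ms']:.3f} ms")
```

The median is reported rather than the mean because occasional slow runs would otherwise skew the comparison.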
  19. Seamless Deployment from AI Model Conversion to Testing

    NetsPresso LaunchX (GUI): https://launchx.netspresso.ai/

    Steps shown: automatic conversion, then measuring model performance on a real device.
  20. Nota Optimization Success Stories

    Column labels: Company, Without Nota AI, With Nota AI.

    • Generative AI: not possible without optimization; 4.47 ms after optimization
    • Image generation: 1885.8 ms to 738.5 ms (2.4x speedup)
    • Transformer model: 101.6 ms to 18.7 ms (5.4x speedup)
    • Transformer model: 3.93 ms to 0.73 ms (5.4x speedup)
    • ViT model: not possible without optimization; possible after optimization
    • Transformer model: 2.09 ms to 0.40 ms (5.2x speedup)

    * Measured on Qualcomm Snapdragon 8 Gen 3, Arm Ethos-U55, Renesas RZ/V2L, and NXP i.MX 93 respectively.