End-to-End Machine Learning with Amazon SageMaker

Machine learning development with Amazon SageMaker
- Machine Learning Workflow
- Amazon SageMaker Studio Notebooks
- Amazon SageMaker Training Job
- Amazon SageMaker Hyperparameter Optimization
- Amazon SageMaker Endpoints
- Amazon SageMaker Pipelines
- Amazon SageMaker Autopilot

Sungmin Kim

April 19, 2022
Transcript

  1. © 2021 Amazon Web Services, Inc. or its affiliates. All

    rights reserved | Sungmin Kim, AWS Solutions Architect End-to-End Machine Learning with Amazon SageMaker
  2. In this Talk • What is Machine Learning? • Machine

    Learning Workflow • Build → Train → Deploy • Build fast and collaborate • Amazon SageMaker Studio Notebooks • Training and tune models • Amazon SageMaker Training Job • Amazon SageMaker Hyperparameter Optimization • Deploy and manage models • Amazon SageMaker Endpoints • Amazon SageMaker Pipelines • Automatic ML Model Generation • Amazon SageMaker Autopilot • Machine Learning in the cloud
  3. Option 1- Build A Rule Engine Age Gender Purchase Date

    Items 30 M 3/1/2017 Toy 40 M 1/3/2017 Books …. …… ….. ….. Input Output Age Gender Purchase Date Items 30 M 3/1/2017 Toy …. …… ….. ….. Rule 1: 15 <age< 30 Rule 2: Bought Toy=Y, Last Purchase<30 days Rule 3: Gender = ‘M’, Bought Toy =‘Y’ Rule 4: …….. Rule 5: …….. Human Programmer
  4. Option 2 - Learn The Business Rules From Data Learning

    Algorithm Model Output Historical Purchase Data (Training Data) Prediction Age Gender Items 35 F 39 M Toy Input - New Unseen Data Age Gender Purchase Date Items 30 M 3/1/2017 Toy 40 M 1/3/2017 Books …. …… ….. …..
  5. We Call This Approach Machine Learning Learning Algorithm Model Output

    Historical Purchase Data (Training Data) Prediction Age Gender Items 35 F 39 M Toy Input - New Unseen Data Age Gender Purchase Date Items 30 M 3/1/2017 Toy 40 M 1/3/2017 Books …. …… ….. ….. Rule 1: 15 <age< 30 Rule 2: Bought Toy=Y, Last Purchase<30 days Rule 3: Gender = ‘M’, Bought Toy =‘Y’ Rule 4: …….. Rule 5: …….. Human Programmer
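The contrast between Option 1 and Option 2 can be sketched in a few lines. This is only an illustration: the purchase records are made up, and the "learning algorithm" is a deliberately simple threshold search, not anything SageMaker runs.

```python
# Hand-written rule (Option 1) versus a rule learned from data (Option 2).
# All records below are hypothetical and exist only for illustration.

def rule_engine(age: int) -> bool:
    """Rule 1 from the slide, written by a human programmer: 15 < age < 30."""
    return 15 < age < 30


def learn_age_threshold(ages, bought_toy):
    """Pick the age cut-off that classifies the training data best.

    The resulting 'model' is one number: predict "buys a toy" when
    age < threshold.
    """
    best_threshold, best_correct = None, -1
    for threshold in sorted(set(ages)):
        correct = sum((age < threshold) == label
                      for age, label in zip(ages, bought_toy))
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


# Hypothetical historical purchase data (training data).
ages = [22, 25, 28, 31, 35, 39, 40, 45]
bought_toy = [True, True, True, True, False, False, False, False]

threshold = learn_age_threshold(ages, bought_toy)


def model(age: int) -> bool:
    """Prediction for new, unseen data using the learned threshold."""
    return age < threshold
```

With this data the learned cut-off differs from the hand-written one, so the two approaches disagree on a 30-year-old customer: the rule engine rejects them, while the learned model follows what the data shows.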
  6. Typical Machine Learning Process Collect, prepare and label training data

    Choose and optimize ML algorithm Train and tune ML models Set up and manage environments for training Deploy models in production Scale and manage the production environment 1 2 3
  7. Set up and track experiment Machine Learning is iterative Choose

    model Debug, compare, and evaluate experiments Monitor quality, detect drift, and retrain Share, review, and collaborate
  8. Common machine learning development Laptop Upside: • Flexible. Personal. Easy

    to get started. Downside: • Extremely difficult to scale • Nearly impossible to run in production • Need virtual environments in order to experiment
  9. Common machine learning development Servers Upside: • Familiar. May seem

    less expensive upfront. Downside: • Availability is incredibly challenging to maintain • Stuck in either over- or under- utilization • Experimentation is risky and expensive • New ideas have to wait for months to start • Good luck going global!
  10. Amazon SageMaker Label data Aggregate & prepare data Store &

    share features Auto ML Spark/R Detect bias Visualize in notebooks Pick algorithm Train models Tune parameters Debug & profile Deploy in production Manage & monitor CI/CD Human review Ground Truth Data Wrangler Feature store Autopilot Processing Clarify Studio Notebooks Built-in or Bring-your-own Experiments Spot Training Distributed Training Automatic Model Tuning Debugger Model Hosting Multi-model Endpoints Model Monitor Pipelines Augmented AI AMAZON SAGEMAKER EDGE MANAGER SAGEMAKER STUDIO IDE AMAZON SAGEMAKER JUMPSTART VISION SPEECH TEXT SEARCH CHATBOTS PERSONALIZATION FORECASTING FRAUD CONTACT CENTERS Deep Learning AMIs & Containers GPUs & CPUs Elastic Inference Trainium Inferentia FPGA AI SERVICES ML SERVICES FRAMEWORKS & INFRASTRUCTURE DeepGraphLibrary Amazon Rekognition Amazon Polly Amazon Transcribe +Medical Amazon Lex Amazon Personalize Amazon Forecast Amazon Comprehend +Medical Amazon Textract Amazon Kendra Amazon CodeGuru Amazon Fraud Detector Amazon Translate INDUSTRIAL AI CODE AND DEVOPS Amazon DevOps Guru Voice ID For Amazon Connect Contact Lens Amazon Monitron AWS Panorama + Appliance Amazon Lookout for Vision Amazon Lookout for Equipment Amazon HealthLake HEALTHCARE AI Amazon Lookout for Metrics ANOMALY DETECTION Amazon Transcribe for Medical Amazon Comprehend for Medical A broad range of AI tools for every developer
  12. End-to-End Machine Learning Platform Zero setup Flexible Model Training Pay

    by the second $ Amazon SageMaker Easily build, train, and deploy machine learning models as a service Fully managed service
  13. Set up and track experiment Choose model Debug, compare, and

    evaluate experiments Monitor quality, detect drift, and retrain Share, review, and collaborate Build fast and collaborate
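  14. Amazon SageMaker Studio Collaboration at scale Share scalable notebooks without tracking

    code dependencies Easy experiment management Organize, track, and compare thousands of model experiments Automatic model generation Generate models automatically from data without writing code Higher quality ML models Automatic error debugging and real-time error alerts Model monitoring to maintain high quality Increased productivity Build fully automated machine learning workflows The first fully integrated development environment (IDE) for developing and deploying machine learning models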
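  15. A new development environment where developers start an ML notebook in seconds

    and share it with a single click Amazon SageMaker Notebooks Access the development environment directly with employee credentials Administrators can easily control permissions and access Highly secure, fully managed service Easy collaboration Share by URL with a single click Easy access through single sign-on (SSO) Serverless environment with no compute resources to provision No separate setup or startup required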
  16. • Jupyter notebooks • Support for Jupyter Lab • Multiple

    built-in kernels • Install external libraries and kernels • Integrate with Git • Sample notebooks • VPC Integration for integrated security
  17. Set up and track experiment Choose model Debug, compare, and

    evaluate experiments Monitor quality, detect drift, and retrain Share, review, and collaborate Train and tune models
  18. Amazon SageMaker Training Docker Container EC2 Instance S3 Bucket Elastic

    Container Registry Download Algorithm Image 2 Write trained model to S3 4 Sends your data 3 EC2 Instance EC2 Instance model.fit() 1
  19. Amazon SageMaker Training How does training happen XGBoost validation (optional) test (optional)

    ECR S3 ML Instance ml.m4.xlarge xgboost linear-learner PCA DeepAR BlazingText Image classification … Object Detection Images S3 SageMaker Notebook SageMaker Training Job train Model
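The pieces on the two training slides (an algorithm image in ECR, a `train` channel in S3, an ML instance, and a model written back to S3) map onto the request body of the `CreateTrainingJob` API. The sketch below only builds that request as a dictionary. The helper name, job name, bucket, role ARN, and image URI are placeholders invented for this example, and the volume size and runtime limit are arbitrary.

```python
def build_training_job_request(job_name, image_uri, role_arn,
                               train_s3_uri, output_s3_uri,
                               instance_type="ml.m4.xlarge"):
    """Assemble a CreateTrainingJob request (hypothetical helper)."""
    return {
        "TrainingJobName": job_name,
        # Step 2 on the slide: the algorithm image is pulled from ECR.
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        # Step 3: training data is read from S3 through the "train" channel.
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        # Step 4: the trained model artifact is written back to S3.
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": 1,
            "VolumeSizeInGB": 10,      # arbitrary example value
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},  # arbitrary
    }


request = build_training_job_request(
    job_name="xgboost-demo",                                  # placeholder
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/xgboost:latest",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    train_s3_uri="s3://example-bucket/train/",
    output_s3_uri="s3://example-bucket/output/",
)
# With AWS credentials configured, the request could be submitted with:
#   boto3.client("sagemaker").create_training_job(**request)
```

The SageMaker Python SDK's `Estimator.fit()` (the `model.fit()` call on the slide) wraps the same API, so this dictionary is roughly what the SDK sends on your behalf.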
  20. SageMaker training supports Spot Instances EC2 Instance Spot Pricing •

    Specify a maximum wait time • SageMaker will default to giving you the lowest possible cost • Store model checkpoints in Amazon S3 in case your job is interrupted for BYOM • Many built-in algorithms automatically revert to a training job • We have examples • Save up to 90%!
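As a rough sketch of the Spot settings listed above: the SageMaker Python SDK's `Estimator` accepts `use_spot_instances`, `max_run`, `max_wait`, and `checkpoint_s3_uri`. The helper below only assembles those keyword arguments and checks that the wait time is not shorter than the run time. The helper name and the bucket are assumptions made for this example.

```python
def spot_training_settings(max_run_seconds, max_wait_seconds, checkpoint_s3_uri):
    """Build Estimator keyword arguments for managed Spot training.

    max_wait covers both the time spent waiting for Spot capacity and the
    training time itself, so it must be at least max_run.
    """
    if max_wait_seconds < max_run_seconds:
        raise ValueError("max_wait must be at least max_run")
    return {
        "use_spot_instances": True,
        "max_run": max_run_seconds,
        "max_wait": max_wait_seconds,
        # Checkpoints in S3 let an interrupted job resume instead of restarting.
        "checkpoint_s3_uri": checkpoint_s3_uri,
    }


settings = spot_training_settings(
    max_run_seconds=3600,
    max_wait_seconds=7200,
    checkpoint_s3_uri="s3://example-bucket/checkpoints/",   # placeholder
)
# These would be passed along with the usual arguments, for example:
#   sagemaker.estimator.Estimator(image_uri=..., role=..., **settings)
```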
  21. Train with a built-in algorithm xgboost linear-learner PCA DeepAR BlazingText

    Image classification … Object Detection Built-in Algorithm Images Elastic Container Registry
  22. Training code • Matrix factorization • Regression • Principal component

    analysis • K-means clustering • Gradient boosted trees • And more! 17 Built-in algorithms Bring your own script (Amazon SageMaker managed container) Bring your own algorithm (you build the Docker container) Subscribe to Algorithms and Model Packages on AWS Marketplace Many ways to train a model on SageMaker Algorithm Options
  23. Neural Networks Number of layers Hidden layer width Learning rate

    Embedding dimensions Dropout … Decision Trees Tree depth Max leaf nodes Gamma Eta Lambda Alpha … “Hyperparameters” (algorithm parameters that significantly affect model quality) Amazon SageMaker Automatic Model Tuning Hyperparameter Tuning
  24. Automatic Model Tuning Training Job 1 Training Job 2 Training

    Job N Best Model Selector Best Model • Define Metrics • Hyperparameter ranges/scaling • Stop tuning job early • Use warm start • Bayesian ~OR~ Random Search Amazon SageMaker Automatic Model Tuning Hyperparameter Tuning
  25. Amazon SageMaker Automatic Model Tuning What if I need all

    my jobs tuned at the same time? Bayesian Search Random Search
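To make the random search option concrete, here is a minimal sketch in plain Python. It is not SageMaker's implementation: the objective is a toy function standing in for a validation metric, and the ranges for `eta` and `max_depth` are invented for the example.

```python
import random


def random_search(objective, ranges, n_jobs, seed=0):
    """Sample n_jobs hyperparameter sets independently and keep the best."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_jobs):
        params = {
            "eta": rng.uniform(*ranges["eta"]),              # continuous range
            "max_depth": rng.randint(*ranges["max_depth"]),  # integer range
        }
        score = objective(params)   # on SageMaker: one training job
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    """Stand-in for a validation metric; peaks at eta=0.1, max_depth=6."""
    return (-((params["eta"] - 0.1) ** 2)
            - 0.01 * (params["max_depth"] - 6) ** 2)


RANGES = {"eta": (0.01, 0.3), "max_depth": (2, 10)}
best_params, best_score = random_search(toy_objective, RANGES, n_jobs=20)
```

Each sample is drawn without looking at earlier results, which is why random search can run all of its jobs at the same time. Bayesian search picks each new candidate based on the scores of earlier jobs, so it typically needs fewer jobs but cannot be fully parallelized.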
  26. Set up and track experiment Choose model Debug, compare, and

    evaluate experiments Monitor quality, detect drift, and retrain Share, review, and collaborate Deploy models
  27. Amazon SageMaker Deployment Hosting Services Inference Image Training Image Training

    Data Model artifacts Endpoint Amazon SageMaker Amazon S3 Amazon ECR Model artifacts Inference Image Model artifacts Inference Image 1 2 3
  28. Amazon SageMaker Deployment SageMaker Endpoints (Private API) Auto Scaling group

    Availability Zone 1 Availability Zone 2 Availability Zone 3 Elastic Load Balancing Model Endpoint Client Deployment / Hosting Amazon SageMaker ML Compute Instances Input Data (Request) Prediction (Response)
  29. Amazon SageMaker Deployment SageMaker Endpoints (Public API) Auto Scaling group

    Availability Zone 1 Availability Zone 2 Availability Zone 3 Elastic Load Balancing Model Endpoint Amazon API Gateway Client Deployment / Hosting Amazon SageMaker ML Compute Instances Input Data (Request) Prediction (Response)
  30. Amazon SageMaker Deployment Updating Endpoints Blue-green deployments mean no scheduled

    downtime Deploy one or more models behind the same endpoint
  31. Amazon SageMaker Deployment A/B Testing A/B Testing Secure Endpoint Inference

    Code Helper Code Model Artifacts Inference code Images Client Application Inference request Inference result • 1-10 Production Variants (Model Versions) • All models must have the same I/O schema • Endpoint Modification w/o service disruption Model-1 Inference Code Helper Code Model Artifacts Inference code Images Model-2 { … 'InitialVariantWeight': 2 } {ProductionVariants} { … 'InitialVariantWeight': 1 }
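The variant weights on this slide translate into traffic shares by dividing each weight by the sum of all weights. A small worked example follows. Which model carries weight 2 is an assumption here, since the flattened slide text does not make the pairing clear.

```python
def traffic_shares(variant_weights):
    """Fraction of requests each production variant receives."""
    total = sum(variant_weights.values())
    return {name: weight / total for name, weight in variant_weights.items()}


# Assumed pairing: Model-1 has InitialVariantWeight 2, Model-2 has 1.
shares = traffic_shares({"Model-1": 2, "Model-2": 1})
for name, share in shares.items():
    print(f"{name}: {share:.1%} of inference requests")
```

With weights 2 and 1, one variant receives about two thirds of the requests and the other about one third.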
  32. Amazon SageMaker Deployment Multi-Model Endpoints • Scalable/Cost Effective for large

    number of models • Works best when models are of similar size and latency • Automatic memory handling Multi-Model Endpoints Secure Endpoint Model Artifacts Client Application Inference request Inference result Model-1 Inference Code Helper Code Container Model Artifacts Model-2 Inference Code Helper Code Container Invoke Endpoint: TargetModel = Model-1 Prefix = SalesForecast/ Prefix = SalesForecast/
  33. Multi-model endpoints Significant savings for large-scale deployments EP-1 Model 1

    EP-2 Model 2 EP-10 Model 10 … EP Model 1 Model 2 … Model 10 Sample scenario: ml.c5.xlarge, $0.238/hour, 2 instances running 24/7 10 separate endpoints $3,430/month 1 multi-model endpoint $343/month
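The figures in the sample scenario can be reproduced with simple arithmetic. The 30-day month is an assumption. The slide does not state it, but it matches the rounded totals shown.

```python
HOURLY_PRICE = 0.238          # ml.c5.xlarge price from the slide, USD per hour
INSTANCES_PER_ENDPOINT = 2    # from the slide
HOURS_PER_MONTH = 24 * 30     # assumption: 30-day month, running 24/7


def monthly_cost(endpoints):
    """Monthly cost in USD for a given number of endpoints."""
    return endpoints * INSTANCES_PER_ENDPOINT * HOURLY_PRICE * HOURS_PER_MONTH


separate = monthly_cost(10)   # ten single-model endpoints
shared = monthly_cost(1)      # one multi-model endpoint
print(f"10 endpoints: ${separate:,.2f}  1 endpoint: ${shared:,.2f}")
```

The saving comes entirely from running two instances instead of twenty, so it holds only if the ten models fit on, and can be served by, the shared instances.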
  34. Multi-model endpoints nevada.tar.gz Mode: MultiModel Artifact location: s3://bucket/your-endpoint-models predict predict(‘nevada.tar.gz’,

    features) s3://bucket/your-endpoint-models/ new_york.tar.gz florida.tar.gz texas.tar.gz load new_york.tar.gz texas.tar.gz florida.tar.gz nevada.tar.gz Amazon SageMaker Multi-model endpoint Amazon S3 model storage
  35. Set up and track experiment Choose model Debug, compare, and

    evaluate experiments Monitor quality, detect drift, and retrain Share, review, and collaborate Manage Workflow for ML Lifecycle
  36. Challenges with creating a complete workflow for the ML lifecycle

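    1 2 Taking a model from concept to production involves many steps • Create standard code packages for each stage of the ML lifecycle • Connect them in a structure called a workflow • Manage dependencies between steps • Run the workflow as an orchestrated sequence Building, training, and deploying models is an iterative process 3 Track artifacts for each step of the workflow 5 Automate and scale the entire workflow as part of MLOps 4 Deploy and manage the right model version across thousands of models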
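Managing dependencies between steps and running them as an orchestrated sequence can be sketched with the standard library's `graphlib` (Python 3.9+). This illustrates the idea only and is not how SageMaker Pipelines is implemented. The step names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps that must finish before it can start.
steps = {
    "preprocess": set(),
    "train": {"preprocess"},
    "evaluate": {"train"},
    "register": {"evaluate"},
}

# static_order() yields the steps in an order that respects every dependency.
order = list(TopologicalSorter(steps).static_order())

for step in order:
    print(f"running step: {step}")   # a real workflow would launch a job here
```

Declaring dependencies instead of hard-coding the order means a new step only needs to name its prerequisites, and a cycle in the graph is reported as an error rather than silently looping.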
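  37. Amazon SageMaker Pipelines Build fully automated machine learning workflows

    at scale Author and manage ML workflows Create detailed workflows with an easy-to-use Python SDK and manage them visually Track model lineage for governance and audit Track code, datasets, and versions for each stage of the ML lifecycle Replay and rerun workflows Rerun all steps on a custom schedule to keep models up to date Visually compare, select, and deploy models Deploy and manage models through the visual interface in SageMaker Studio Centralized ML model management with the Registry Use the model registry to select the best model for production deployment Fully managed MLOps with built-in CI/CD support Build fully automated machine learning workflows using CI/CD practices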
  38. CI/CD Pipeline example (1) 2. Git Commit & Push 3.

    Automatic Pipelining 1. Edit code & Git Add
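  39. CI/CD Pipeline example (2) Approve or reject production deployment Easily compare

    performance across model versions in the UI, and deploy a model with one click by changing its status Compare metrics across model versions 3 1 4 2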
  40. Set up and track experiment Choose model Debug, compare, and

    evaluate experiments Monitor quality, detect drift, and retrain Share, review, and collaborate Machine Learning Workflow
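  41. Amazon SageMaker Studio The first fully integrated development environment (IDE)

    for developing and deploying machine learning models Build models and collaborate SageMaker Notebooks SageMaker Pipelines Build fully automated machine learning workflows Train and validate models SageMaker Training Job One-click deployment, model monitoring, and sustained high quality SageMaker Endpoints Optimize models and tune multiple algorithms SageMaker HPO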
  42. Set up and track experiment Choose model Debug, compare, and

    evaluate experiments Monitor quality, detect drift, and retrain Share, review, and collaborate If You Still Feel Machine Learning Difficult…
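  43. Automatic model generation Generate ML models automatically through automatic model tuning

    Recommendations and optimization Get a leaderboard and keep improving models Amazon SageMaker Autopilot A service that creates and manages models automatically, built on model control and visibility to overcome the shortcomings of conventional AutoML Visibility and data control Notebook source code for each model Quick to get started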
  44. Generate the Codes and Notebooks for you Amazon SageMaker Autopilot

    Data Exploration Amazon SageMaker Autopilot Candidate Definition Notebook
  45. Classification • Linear Learner • XGBoost • KNN Working with

    Text • BlazingText • Supervised • Unsupervised* Recommendation • Factorization Machines Forecasting • DeepAR Topic Modeling • LDA • NTM Built-in Algorithms provided by Amazon SageMaker Sequence Translation • Seq2Seq* Clustering • KMeans Feature Reduction • PCA • Object2Vec Anomaly Detection • Random Cut Forests • IP Insights Computer Vision • Image Classification • Object Detection • Semantic Segmentation Regression • Linear Learner • XGBoost • KNN https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html
  46. PREPARE SageMaker Ground Truth Label training data for machine learning

    SageMaker Data Wrangler Aggregate and prepare data for machine learning SageMaker Processing Built-in Python, BYO R/Spark SageMaker Feature Store Store, update, retrieve, and share features SageMaker Clarify Detect bias and understand model predictions BUILD SageMaker Studio Notebooks Jupyter notebooks with elastic compute and sharing Built-in and Bring your-own Algorithms Dozens of optimized algorithms or bring your own Local Mode Test and prototype on your local machine SageMaker Autopilot Automatically create machine learning models with full visibility SageMaker JumpStart Pre-built solutions for common use cases TRAIN & TUNE Managed Training Distributed infrastructure management SageMaker Experiments Capture, organize, and compare every step Automatic Model Tuning Hyperparameter optimization Distributed Training Libraries Training for large datasets and models SageMaker Debugger Debug and profile training runs Managed Spot Training Reduce training cost by 90% DEPLOY & MANAGE Managed Deployment Fully managed, ultra low latency, high throughput Kubernetes & Kubeflow Integration Simplify Kubernetes-based machine learning Multi-Model Endpoints Reduce cost by hosting multiple models per instance SageMaker Model Monitor Maintain accuracy of deployed models SageMaker Edge Manager Manage and monitor models on edge devices SageMaker Pipelines Workflow orchestration and automation Amazon SageMaker SageMaker Studio Integrated development environment (IDE) for ML Amazon SageMaker overview ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
  47. Build Train Deploy ML infrastructure Operations Security & Compliance Machine

    Learning in the cloud SageMaker offers up to 96% lower TCO and 10x more developer productivity
  48. Capability Amazon SageMaker Compared to self-managed Amazon EC2 Compared to

    self-managed Kubernetes (EKS) Provision & manage instances Fully managed Self-managed Managed by AWS Manage security & compliance Built-in Self-managed Self-managed Infrastructure performance optimization Scales automatically Self-managed Self-managed Infrastructure management for high-availability Optimizes automatically Self-managed Self-managed Source of cost-savings
  49. Getting started with • SageMaker Immersion Day Workshop ✯✯✯ •

    SageMaker Examples (100+) ✯✯✯ • SageMaker Workshop (Korean) • Amazon SageMaker Overview (2020-03-25) • [Video] Amazon SageMaker Overview (2020-03-25) • [Video] Amazon SageMaker demo (2020-03-25) ✯✯✯ • AI/ML Resources - videos, presentation materials, and more