
Amazon SageMaker Model Deployment Strategies

High-Performance & Cost-Effective Model Deployment Strategies with Amazon SageMaker

- SageMaker deployment strategy
- Workflow of deploying models in SageMaker
- Inference load testing
- Inference A/B testing
- Model monitoring

Sungmin Kim

July 27, 2022

Transcript

  1. High-Performance & Cost-Effective Model Deployment Strategies with Amazon SageMaker. Sungmin Kim, Solutions Architect, AWS.
  2. What we will cover today: SageMaker deployment strategy; workflow of deploying models in SageMaker; inference load testing; inference A/B testing; model monitoring.
  3. Amazon SageMaker: purpose-built tools so you can be 10x more productive (Amazon SageMaker Studio notebooks).
    - Access ML data: connect to many data sources such as Amazon S3, Apache Spark, Amazon Redshift, CSV files, and more
    - Prepare data: transform data, browse data sources, explore metadata and schemas, and write queries in popular languages
    - Build ML models: optimized with 150+ popular open-source models and frameworks such as TensorFlow and PyTorch
    - Train and tune ML models: correct performance problems in real time
    - Deploy and monitor results: create, automate, and manage end-to-end ML workflows to improve model quality
  4. Why you need an ML inference endpoint: separation of concerns. The ML app and the model are decoupled and communicate only through the ML app's interface.
  5. Why optimize model deployment: predictions drive complexity and cost in production. Of total ML spend, roughly 90% goes to prediction and only 10% to training.
  6. Deploying models in SageMaker: easy deployment of ML models, online and offline scoring, fully managed infrastructure.

    # Online scoring: real-time endpoint
    predictor = model.deploy(
        initial_instance_count=1,
        instance_type='ml.c5.4xlarge')
    prediction = predictor.predict(x_test)

    # Offline scoring: batch transform
    transformer = model.transformer(
        instance_count=1,
        instance_type='ml.m5.xlarge')
    transformer.transform(
        test_data_s3,
        content_type='text/csv')
  7. Benefits when deploying models in SageMaker: • Decouple application code from ML models • Call models from anywhere • Full lifecycle support • No surprises (if it works locally, it will work on AWS) • Train anywhere • Self-service deployments
  8. Deploy Models for Inference in SageMaker
  9. SageMaker inference options:
    - Real-time inference: low latency, ultra-high throughput, multi-model endpoints, A/B testing
    - Asynchronous inference: near real-time, large payloads (1 GB), long timeouts (15 min)
    - Serverless inference: the first purpose-built serverless ML inference in the cloud; fully managed; pay only for what you use, billed in milliseconds
    - Batch transform: process large datasets; job-based system
  10. SageMaker Deployment: Real-time Inference. Creates a long-running microservice; instant responses for payloads up to 6 MB; accessible from an external application; autoscaling.
  11. SageMaker Deployment: Real-time Inference, in code:

    real_time_endpoint = model.deploy(
        initial_instance_count=1,
        instance_type="ml.c5.xlarge", ...)
    real_time_endpoint.predict(payload)
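Slide 11 lists autoscaling as a feature, but the deck never shows the wiring. A minimal sketch using boto3's Application Auto Scaling API, which is how SageMaker endpoint scaling is configured; the endpoint name real-time-endpoint and variant name AllTraffic are placeholders:

    import boto3

    # Application Auto Scaling manages SageMaker endpoint scaling.
    aas = boto3.client("application-autoscaling")

    # 'real-time-endpoint' and 'AllTraffic' are placeholder names.
    resource_id = "endpoint/real-time-endpoint/variant/AllTraffic"

    # Register the variant's instance count as a scalable target (1-4 instances).
    aas.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        MinCapacity=1,
        MaxCapacity=4)

    # Target-tracking policy: keep ~70 invocations per instance per minute.
    aas.put_scaling_policy(
        PolicyName="invocations-per-instance",
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": 70.0,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"}})

Target tracking against SageMakerVariantInvocationsPerInstance is a common starting point; step-scaling policies are also available.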
  12. Current customer challenges with ML inference:
    - Increases TCO: teams spend a lot of time provisioning and managing servers
    - Challenging to provision capacity: data scientists struggle to select optimal instance types and manage autoscaling policies
    - End up over-provisioning capacity: utilization is low and costs are high regardless of the number of requests
    - Workloads are intermittent: some ML workloads have less predictable usage patterns and long periods of inactivity
  13. SageMaker Deployment: Serverless Inference. Ideal for unpredictable prediction traffic; for workloads that can tolerate cold starts; autoscaling (down to 0 instances).
  14. SageMaker Deployment: Serverless Inference, in code:

    from sagemaker.serverless import ServerlessInferenceConfig

    serverless_config = ServerlessInferenceConfig(
        memory_size_in_mb=4096,
        max_concurrency=10)
    serverless_predictor = model.deploy(
        serverless_inference_config=serverless_config)
    serverless_predictor.predict(data)
  15. Other challenges with ML inference:
    - Need to control costs: customers need an environment that can scale automatically, including down to zero
    - Model sizes can be large: deep learning models are complex and large and may take several minutes to finish processing, while existing real-time inference requests time out after 60 seconds
    - Some workloads can tolerate some latency: spinning up batch clusters takes too long, and customers need near "real-time" inference
    - Inference payloads can be large: customers need to process large payloads (hundreds of MB or GB)
  16. SageMaker Deployment: Asynchronous Inference. Ideal for large payloads up to 1 GB; longer processing timeouts up to 15 min; autoscaling (down to 0 instances); suitable for CV/NLP use cases.
  17. SageMaker Deployment: Asynchronous Inference, in code:

    from sagemaker.async_inference import AsyncInferenceConfig

    async_config = AsyncInferenceConfig(
        output_path="s3://{s3_bucket}/{bucket_prefix}/output",
        max_concurrent_invocations_per_instance=10,
        notification_config={
            "SuccessTopic": sns_success_topic_arn,
            "ErrorTopic": sns_error_topic_arn})
    async_predictor = model.deploy(async_inference_config=async_config)
    async_predictor.predict_async(input_path=input_s3_path)
  18. SageMaker Deployment: Batch Inference (SageMaker Batch Transform). Fully managed mini-batching for large data; pay only for what you use; suitable for the periodic arrival of large data.
  19. SageMaker Deployment: Batch Inference, in code:

    transformer = model.transformer(
        instance_count=1,
        instance_type="ml.m5.xlarge",
        output_path="s3://{s3_bucket}/{bucket_prefix}/output")
    transformer.transform(
        input_data_s3,
        content_type="text/csv")
  20. SageMaker Model Deployment Options:
    Real-Time Inference (real-time):
    • Low latency • Multi-model / multi-container endpoints • A/B testing • Blue/green deployment guardrails • CPU/GPU support • Payload size < 6 MB • Request timeout: 60 sec
    Example use cases: ad serving, personalized recommendations, fraud detection
    Serverless Inference (real-time):
    • Automatic scaling; no need to select or manage servers • For workloads that can tolerate cold starts • Intermittent or unpredictable traffic • Payload size < 4 MB • Request timeout: 60 sec • Limits on maximum concurrent invocations per endpoint • CPU-only support
    Example use cases: test workloads, extracting and analyzing data from documents, form processing
    Asynchronous Inference (micro-batch):
    • Near real-time • Large payloads (< 1 GB) • Long timeouts (15 min)
    Example use cases: computer vision, object detection
    Batch Transform (batch):
    • Process large datasets (1) (max mini-batch size: 100 MB) • Higher throughput • Job-based system
    Example use cases: data pre-processing, churn prediction, predictive maintenance
    (1) Each instance has a 30 GB EBS volume. Maximum dataset size depends on the number and type of instances in the batch transform job. G4dn instances come with their own local SSD storage.
  21. The four inference options in code:

    # 1. Real-time Inference
    real_time_endpoint = model.deploy(
        initial_instance_count=1,
        instance_type="ml.c5.xlarge", ...)
    real_time_endpoint.predict(payload)

    # 2. Serverless Inference
    from sagemaker.serverless import ServerlessInferenceConfig
    serverless_config = ServerlessInferenceConfig(
        memory_size_in_mb=4096,
        max_concurrency=10)
    serverless_predictor = model.deploy(
        serverless_inference_config=serverless_config)
    serverless_predictor.predict(data)

    # 3. Async Inference
    from sagemaker.async_inference import AsyncInferenceConfig
    async_config = AsyncInferenceConfig(
        output_path="s3://{s3_bucket}/{bucket_prefix}/output",
        max_concurrent_invocations_per_instance=10,
        notification_config={
            "SuccessTopic": sns_success_topic_arn,
            "ErrorTopic": sns_error_topic_arn})
    async_predictor = model.deploy(async_inference_config=async_config)
    async_predictor.predict_async(input_path=input_s3_path)

    # 4. Batch Inference
    transformer = model.transformer(
        instance_count=1,
        instance_type="ml.m5.xlarge",
        output_path="s3://{s3_bucket}/{bucket_prefix}/output")
    transformer.transform(
        input_data_s3,
        content_type="text/csv")
  22. Cost-effective Model Deployment
  23. Cost considerations: hosting individual endpoints means one endpoint per model (e.g., EndpointName='endpoint-05').
  24. SageMaker Deployment: Multi-Model Endpoint, a cost-saving opportunity. Host multiple models in one container; invoke a target model directly; improves resource utilization; models are loaded dynamically from Amazon S3 (e.g., TargetModel='model-007.tar.gz').
  25. (Same as slide 24, invoking a different model: TargetModel='model-013.tar.gz'.)
  26. Multi-Model Endpoint cost-saving example (ml.c5.xlarge at $0.238/hr, 2 instances running 24/7):
    - 10 separate endpoints (EP-1 hosting Model 1 ... EP-10 hosting Model 10): $3,430/mo
    - 1 multi-model endpoint hosting Models 1-10: $343/mo
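As a cross-check, the figures follow from the hourly rate: $0.238/hr × 2 instances × ~720 hr/mo ≈ $343/mo for one always-on endpoint, so ten such endpoints cost ≈ $3,430/mo, while a single multi-model endpoint hosting all ten models stays at ≈ $343/mo.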
  27. SageMaker Deployment: Multi-Model Endpoint, in code:

    container = {
        'Image': mme_supported_image,
        'ModelDataUrl': 's3://my-bucket/folder-of-tar-gz',
        'Mode': 'MultiModel'}
    sm.create_model(Containers=[container], ...)
    sm.create_endpoint_config(...)
    sm.create_endpoint(...)

    smrt.invoke_endpoint(
        EndpointName=endpoint_name,
        TargetModel='model-007.tar.gz',
        Body=body, ...)
  28. SageMaker Deployment: Multi-Container Endpoint, a cost-saving opportunity. Host up to 15 distinct containers; direct or serial invocation; no cold starts, in contrast to a multi-model endpoint (e.g., TargetContainerHostname='Container-05').
  29. (Repeat of slide 28, diagram view.)
  30. Multi-Container Endpoint: Inference Pipelines. • Reuse the data transformers developed for training models • Low latency: all containers run on the same underlying EC2 instance. A sketch of building one follows.
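The SageMaker Python SDK expresses an inference pipeline as a PipelineModel whose containers execute in sequence on the same instances. A minimal sketch, assuming a scikit-learn preprocessing model and a Linear Learner model have already been built; sklearn_model, linear_model, and the names are hypothetical:

    from sagemaker import get_execution_role
    from sagemaker.pipeline import PipelineModel

    # Containers run in order on the same instance: the output of each
    # model is passed as the input to the next.
    pipeline_model = PipelineModel(
        name="inference-pipeline",              # hypothetical name
        role=get_execution_role(),
        models=[sklearn_model, linear_model])   # assumed pre-built Model objects

    predictor = pipeline_model.deploy(
        initial_instance_count=1,
        instance_type="ml.c5.xlarge",
        endpoint_name="inference-pipeline-ep")  # hypothetical name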
  31. SageMaker Deployment: Multi-Container Endpoint, in code:

    container1 = {
        'Image': container,
        'ContainerHostname': 'firstContainer'}
    ...
    sm.create_model(
        InferenceExecutionConfig={'Mode': 'Direct'},
        Containers=[container1, container2, ...], ...)
    sm.create_endpoint_config(...)
    sm.create_endpoint(...)

    smrt.invoke_endpoint(
        EndpointName=endpoint_name,
        TargetContainerHostname='firstContainer',
        Body=body, ...)
  32. Multi-Model vs. Multi-Container: a multi-model endpoint selects by TargetModel='model-013.tar.gz'; a multi-container endpoint selects by TargetContainerHostname='Container-05'.
  33. Multi-Model vs. Multi-Container, code comparison:

    # Multi-model endpoint
    container = {
        'Image': mme_supported_image,
        'ModelDataUrl': 's3://my-bucket/folder-of-tar-gz',
        'Mode': 'MultiModel'}
    sm.create_model(Containers=[container], ...)
    sm.create_endpoint_config(...)
    sm.create_endpoint(...)
    smrt.invoke_endpoint(
        EndpointName=endpoint_name,
        TargetModel='model-007.tar.gz',
        Body=body, ...)

    # Multi-container endpoint
    container1 = {
        'Image': container,
        'ContainerHostname': 'firstContainer'}
    ...
    sm.create_model(
        InferenceExecutionConfig={'Mode': 'Direct'},
        Containers=[container1, container2, ...], ...)
    sm.create_endpoint_config(...)
    sm.create_endpoint(...)
    smrt.invoke_endpoint(
        EndpointName=endpoint_name,
        TargetContainerHostname='firstContainer',
        Body=body, ...)
  34. Inference Load Testing
  35. SageMaker ML instance options, balancing cost and performance:
    - CPU instances (C5): low throughput, low cost, most flexible
    - GPU instances (P3, G4): high throughput, low-latency access to CUDA
    - Custom chip (Inf1): high throughput, high performance, and the lowest cost in the cloud
  36. Load testing: know your endpoints. Send artificial requests to the Amazon SageMaker endpoint (Elastic Load Balancing in front of an auto-scaling group of ML instances spread across Availability Zones) and watch the metrics in Amazon CloudWatch. A simple sketch follows.
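Before reaching for a full load-testing tool, a rough test is just many concurrent invocations while CloudWatch tracks Invocations and ModelLatency. A minimal client-side sketch; the endpoint name and CSV payload are placeholders, and dedicated tools (or the Inference Recommender below) do this far more thoroughly:

    import time
    import boto3
    from concurrent.futures import ThreadPoolExecutor

    smrt = boto3.client("sagemaker-runtime")

    def invoke_once(_):
        # Measure client-side latency of one invocation.
        start = time.time()
        smrt.invoke_endpoint(
            EndpointName="my-endpoint",   # placeholder endpoint name
            ContentType="text/csv",
            Body="1.0,2.0,3.0")           # placeholder payload
        return time.time() - start

    # Fire 200 requests from 20 concurrent workers.
    with ThreadPoolExecutor(max_workers=20) as pool:
        latencies = list(pool.map(invoke_once, range(200)))

    latencies.sort()
    print(f"p50={latencies[len(latencies)//2]*1000:.1f} ms, "
          f"p99={latencies[int(len(latencies)*0.99)]*1000:.1f} ms")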
  37. Optimizing inference takes skills, time, and effort:
    - 70+ ML instance types: selecting the right instance type based on the resource requirements of the ML model and data payloads
    - Model tuning: selecting the right instance size, container parameters, and autoscaling properties to maximize performance
    - Systems for ML: using ML frameworks with converters, compilers, and kernel libraries specific to different instance types and hardware vendors
    - Manual benchmarking: performance and load testing to validate that latency and throughput requirements are met and costs are within budget
  38. SageMaker Inference Recommender: designed for MLOps engineers and data scientists to reduce the time to get models into production.
    - Instance recommendations: instance type recommendations for initial deployments
    - Load tests: run extensive load tests against production requirements (throughput, latency)
    - Endpoint recommendations: get endpoint configuration settings that meet your production requirements
  39. Get started with Inference Recommender: provide (1) the container image, (2) model artifacts and a sample payload, and (3) model metadata via the Model Registry; get initial instance recommendations; specify performance requirements and instance types for a custom load test; view and compare performance and cost across different endpoint configurations; then deploy your model. A sketch follows.
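A hedged sketch of starting a default recommendation job with boto3, assuming the model has already been registered as a model package version; the job name, role ARN, and model package ARN are placeholders:

    import boto3

    sm = boto3.client("sagemaker")

    # A 'Default' job runs the built-in instance recommendation load tests;
    # 'Advanced' jobs accept custom traffic patterns and stopping conditions.
    sm.create_inference_recommendations_job(
        JobName="my-recommender-job",                  # placeholder
        JobType="Default",
        RoleArn="arn:aws:iam::123456789012:role/me",   # placeholder
        InputConfig={
            "ModelPackageVersionArn":
                "arn:aws:sagemaker:us-east-1:123456789012:model-package/my-model/1"})

    # Poll for the ranked instance-type recommendations.
    result = sm.describe_inference_recommendations_job(JobName="my-recommender-job")
    print(result["Status"])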
  40. How to choose your deployment strategy, as a decision tree:
    - Live predictions? No (daily, hourly, weekly) -> Batch Transform
    - Yes: payload > 4 MB or > 60 sec? Yes -> SageMaker async inference
    - No: can tolerate cold starts? Yes -> SageMaker Serverless Inference
    - No: multiple models/containers? No -> SageMaker endpoint; yes, single ML framework -> SageMaker multi-model endpoint; no, multiple containers -> SageMaker multi-container endpoint
    - In all cases: load-test to right-size, and use auto-scaling for fluctuating traffic
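The same branches can be written down as a few lines of code. A small sketch that mirrors the tree above; the function and argument names are illustrative:

    def choose_deployment(live, payload_mb, timeout_s, tolerates_cold_start,
                          multiple_models, single_framework):
        """Mirror of the slide's decision tree."""
        if not live:                              # daily/hourly/weekly jobs
            return "Batch Transform"
        if payload_mb > 4 or timeout_s > 60:      # big payload or slow model
            return "SageMaker async inference"
        if tolerates_cold_start:
            return "SageMaker Serverless Inference"
        if not multiple_models:
            return "SageMaker endpoint"           # plain real-time endpoint
        if single_framework:
            return "SageMaker multi-model endpoint"
        return "SageMaker multi-container endpoint"

    # e.g. a fraud-detection API: live traffic, small payload, no cold starts
    print(choose_deployment(True, 1, 0.2, False, False, True))
    # -> "SageMaker endpoint"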
  41. Bonus: Model Monitoring and A/B Testing
  42. SageMaker Model Monitor, for optimizing model accuracy: detects model quality drift and data drift; SageMaker Clarify adds feature importance drift and data bias detection. A sketch of the setup follows.
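A minimal sketch of enabling data capture and scheduling drift checks with the SageMaker Python SDK's model_monitor module, assuming a model about to be deployed and a baseline training CSV in S3; bucket paths and the schedule name are placeholders:

    from sagemaker.model_monitor import (
        DataCaptureConfig, DefaultModelMonitor, CronExpressionGenerator)
    from sagemaker.model_monitor.dataset_format import DatasetFormat

    # 1. Capture a sample of live request/response traffic to S3.
    capture_config = DataCaptureConfig(
        enable_capture=True,
        sampling_percentage=100,
        destination_s3_uri="s3://my-bucket/data-capture")    # placeholder
    predictor = model.deploy(
        initial_instance_count=1,
        instance_type="ml.c5.xlarge",
        data_capture_config=capture_config)

    # 2. Baseline the training data: statistics + suggested constraints.
    monitor = DefaultModelMonitor(
        role=role, instance_count=1, instance_type="ml.m5.xlarge")
    monitor.suggest_baseline(
        baseline_dataset="s3://my-bucket/train/train.csv",   # placeholder
        dataset_format=DatasetFormat.csv(header=True),
        output_s3_uri="s3://my-bucket/baseline")

    # 3. Compare captured traffic against the baseline every hour.
    monitor.create_monitoring_schedule(
        monitor_schedule_name="drift-schedule",              # placeholder
        endpoint_input=predictor.endpoint_name,
        statistics=monitor.baseline_statistics(),
        constraints=monitor.suggested_constraints(),
        schedule_cron_expression=CronExpressionGenerator.hourly())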
  43. Endpoint A/B testing using production variants: Elastic Load Balancing splits traffic across variants according to their weights.

    sm.update_endpoint_weights_and_capacities(
        EndpointName=endpoint_name,
        DesiredWeightsAndCapacities=[
            {"DesiredWeight": 0.1, "VariantName": "new-model"},
            {"DesiredWeight": 0.9, "VariantName": "existing-model"}])
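The weights above presuppose an endpoint created with both variants. A minimal boto3 sketch of that setup; config, endpoint, and model names are placeholders:

    import boto3

    sm = boto3.client("sagemaker")

    # Two variants behind one endpoint; traffic is routed 90/10 by weight.
    sm.create_endpoint_config(
        EndpointConfigName="ab-test-config",          # placeholder
        ProductionVariants=[
            {"VariantName": "existing-model",
             "ModelName": "model-v1",                 # placeholder model
             "InstanceType": "ml.c5.xlarge",
             "InitialInstanceCount": 1,
             "InitialVariantWeight": 0.9},
            {"VariantName": "new-model",
             "ModelName": "model-v2",                 # placeholder model
             "InstanceType": "ml.c5.xlarge",
             "InitialInstanceCount": 1,
             "InitialVariantWeight": 0.1}])
    sm.create_endpoint(
        EndpointName="ab-test-endpoint",              # placeholder
        EndpointConfigName="ab-test-config")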
  44. Workflow of Deploying Models in SageMaker
  45. From SageMaker Notebooks: training. (diagram)
  46. Workflow of deploying models in SageMaker: (1) creating a model, (2) defining the endpoint configuration, (3) creating an endpoint, (4) invoking an endpoint. The sketch below maps these steps onto API calls.
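The four steps map one-to-one onto boto3 calls. A minimal sketch; the image URI, S3 path, role ARN, and resource names are placeholders:

    import boto3

    sm = boto3.client("sagemaker")
    smrt = boto3.client("sagemaker-runtime")

    # 1. Create a model: container image + model artifacts + IAM role.
    sm.create_model(
        ModelName="my-model",                                              # placeholder
        PrimaryContainer={
            "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/algo",  # placeholder
            "ModelDataUrl": "s3://my-bucket/model.tar.gz"},                # placeholder
        ExecutionRoleArn="arn:aws:iam::123456789012:role/me")              # placeholder

    # 2. Define the endpoint configuration: instances per variant.
    sm.create_endpoint_config(
        EndpointConfigName="my-config",
        ProductionVariants=[{
            "VariantName": "AllTraffic",
            "ModelName": "my-model",
            "InstanceType": "ml.c5.xlarge",
            "InitialInstanceCount": 1}])

    # 3. Create the endpoint (provisions the instances).
    sm.create_endpoint(
        EndpointName="my-endpoint",
        EndpointConfigName="my-config")

    # 4. Invoke the endpoint once describe_endpoint reports InService.
    response = smrt.invoke_endpoint(
        EndpointName="my-endpoint",
        ContentType="text/csv",
        Body="1.0,2.0,3.0")
    print(response["Body"].read())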
  47. Revisit: SageMaker Inference Options. (The same four code snippets shown on slide 21: real-time, serverless, async, and batch inference.)
  48. Deploy Model: How it works. Step 1, Create Model, packages your model for deployment. Inference container image: the path to the SageMaker-compatible inference image stored in ECR or a private Docker registry.
  49. (Create Model, continued.) Model artifact: the S3 path to the trained model artifacts. Required for SageMaker built-in algorithms.
  50. (Create Model, continued.) IAM role: the role SageMaker assumes to access the model artifacts and the Docker image for deployment.
  51. (Create Model, continued.) Advanced configurations: options depend on the chosen deployment option; examples include VPC configuration and multi-container & multi-model deployments.
  52. Step 2, Configure & Deploy Model: deploy the model using the option that best meets the needs of your use case (real-time inference, batch transform, asynchronous inference, or serverless inference).
  53. Create Endpoint with the SageMaker Python SDK: bring your own inference script, using SageMaker framework containers.

    from sagemaker import get_execution_role
    from sagemaker.pytorch import PyTorchModel

    # Create Model: framework inference container image + model artifacts
    model = PyTorchModel(
        model_data=zipped_model_path,
        role=get_execution_role(),
        framework_version='1.5',
        entry_point='inference.py',
        py_version='py3',
        predictor_cls=ImagePredictor)

    # Deploy: creates the endpoint
    predictor = model.deploy(
        instance_type='ml.t3.medium',
        initial_instance_count=1)

    # Predict: runs the prediction
    predictor.predict(payload)

    The entry point inference.py implements: 1. model_fn() -> model load; 2. input_fn() -> input processing; 3. predict_fn() -> predictions; 4. output_fn() -> output processing.
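A minimal sketch of what that inference.py contract looks like for the PyTorch container; the Net class (your model architecture) and the JSON payload format are hypothetical:

    # inference.py -- handler contract for the SageMaker PyTorch container
    import json
    import os
    import torch
    from my_model_def import Net   # hypothetical: your model architecture

    def model_fn(model_dir):
        # 1. Load the model from the unpacked model.tar.gz.
        model = Net()
        state = torch.load(os.path.join(model_dir, "model.pth"), map_location="cpu")
        model.load_state_dict(state)
        return model.eval()

    def input_fn(request_body, content_type):
        # 2. Deserialize the request payload into a tensor.
        assert content_type == "application/json"
        return torch.tensor(json.loads(request_body))

    def predict_fn(input_data, model):
        # 3. Run the prediction.
        with torch.no_grad():
            return model(input_data)

    def output_fn(prediction, accept):
        # 4. Serialize the prediction for the response.
        return json.dumps(prediction.tolist())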
  54. Inference with SageMaker Endpoint
  55. Inference Endpoint Update
  56. Updating an endpoint: UpdateEndpoint switches the endpoint from Endpoint Configuration 1 (Model 1) to Endpoint Configuration 2 (Model 2). Each model is a Docker image (ECR) plus model artifacts (S3); each endpoint configuration specifies instance type, instance count, variant, and so on. The model artifact layout:

    model.tar.gz
    ├── code
    │   ├── inference.py
    │   └── requirements.txt
    └── model.pth
  57. Updating an endpoint using the AWS CLI (real-time inference): create a new model, create a new endpoint config, then update the same endpoint.

    # New model
    aws sagemaker create-model \
        --model-name model2 \
        --primary-container '{"Image": "123.dkr.ecr.amazonaws.com/algo", "ModelDataUrl": "s3://bkt/model2.tar.gz"}' \
        --execution-role-arn arn:aws:iam::123:role/me

    # New endpoint config
    aws sagemaker create-endpoint-config \
        --endpoint-config-name model2-config \
        --production-variants '{"InitialInstanceCount": 2, "InstanceType": "ml.m4.xlarge", "InitialVariantWeight": 1, "ModelName": "model2", "VariantName": "AllTraffic"}'

    # Same endpoint
    aws sagemaker update-endpoint \
        --endpoint-name my-endpoint \
        --endpoint-config-name model2-config
  58. Benefits when deploying models in SageMaker (recap): separation of concerns through the ML app interface, and a cost-saving opportunity in production, where spend is roughly 90% prediction vs. 10% training.
  59. SageMaker Model Deployment Options:
    - Real-Time Inference: latency low (sub-second); frequency continuous; data size payload < 6 MB; use case: fraud detection
    - Serverless Inference: latency low, sub-second (tolerates cold starts); frequency unpredictable; data size payload < 4 MB; use case: form processing
    - Asynchronous Inference: near real-time, long processing times (< 15 min); near-real-time users; large payloads (< 1 GB); use case: image analysis
    - Batch Transform: indefinite timeout; event-based or scheduled; processes large datasets; use case: churn prediction
  60. How to choose your Deployment Strategy. (The same decision tree as slide 40.)
  61. Resources and Notebooks:
    • Amazon SageMaker Workshop: https://sagemaker-immersionday.workshop.aws/
    • Amazon SageMaker Examples: https://sagemaker-examples.readthedocs.io/en/latest/
    • Amazon SageMaker Python notebook examples: https://github.com/aws/amazon-sagemaker-examples
    • Amazon SageMaker Python SDK documentation: https://sagemaker.readthedocs.io/en/stable/
    • Amazon SageMaker Developer Guide: https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html
  62. Resources and Notebooks (https://github.com/aws/amazon-sagemaker-examples):
    • Real-time inference: /advanced_functionality/pytorch_deploy_pretrained_bert_model/pytorch_deploy_pretrained_bert_model.ipynb
    • Serverless inference: /serverless-inference/Serverless-Inference-Walkthrough.ipynb
    • Async inference: /async-inference/Async-Inference-Walkthrough-SageMaker-Python-SDK.ipynb
    • Batch transform: /sagemaker_batch_transform/pytorch_mnist_batch_transform/pytorch-mnist-batch-transform_outputs.ipynb
  63. Resources and Notebooks (https://github.com/aws/amazon-sagemaker-examples):
    • Multi-model: /advanced_functionality/multi_model_xgboost_home_value/xgboost_multi_model_endpoint_home_value.ipynb
    • Multi-container, direct invocation: /advanced_functionality/multi-container-endpoint/direct-invocation/multi-container-direct-invocation.ipynb
    • Multi-container, inference pipeline: /sagemaker-python-sdk/scikit_learn_inference_pipeline/Inference Pipeline with Scikit-learn and Linear Learner.ipynb
    • Inference Recommender: /sagemaker-inference-recommender/inference-recommender.ipynb
  64. Thank you! © 2022, Amazon Web Services, Inc. or its affiliates. All rights reserved.