Introduction to KServe

© 2022, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark.

Keita Watanabe
Sr. Solutions Architect, AI/ML Frameworks, AWS

Agenda
• KServe Overview
• KServe Components
• Inference Service
• Predictor
• AutoScaling with Knative Pod Autoscaler (KPA)
• ML inference with KServe Examples

KServe
Source: https://kserve.github.io/website/master/

KServe Features
• Scale to and from Zero
• Request based Autoscaling
• Batching
• Request/Response logging
• Traffic management
• Security with AuthN/AuthZ
• Distributed Tracing
• Out-of-the-box metrics

KServe Control Plane
• Responsible for reconciling the InferenceService custom resources.
• Creates the Knative serverless deployment for the predictor and transformer, which enables autoscaling based on the incoming request workload, including scaling down to zero when no traffic is received.

Predictor
Source: https://kserve.github.io/website/master/
• Queue Proxy: measures and limits concurrency to the user’s application.
• Model Server: deploys, manages, and serves machine learning models.
• Storage Initializer: retrieves and prepares machine learning models from storage backends such as Amazon S3.
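The Queue Proxy's role of measuring and limiting concurrency can be illustrated with a toy model. The following is a minimal sketch in plain Python, not Knative's implementation; the class name `ConcurrencyLimiter` and its methods are hypothetical and exist only for this illustration.

```python
class ConcurrencyLimiter:
    """Toy model of a proxy that counts in-flight requests and caps them."""

    def __init__(self, limit: int) -> None:
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.limit = limit
        self.in_flight = 0  # the "measure" half: current concurrency

    def try_acquire(self) -> bool:
        """Admit a request if capacity remains; otherwise refuse it."""
        if self.in_flight >= self.limit:
            return False
        self.in_flight += 1
        return True

    def release(self) -> None:
        """Mark one admitted request as finished."""
        if self.in_flight == 0:
            raise RuntimeError("release() called with no request in flight")
        self.in_flight -= 1


# Hypothetical limit of 2 concurrent requests; the third is refused.
limiter = ConcurrencyLimiter(limit=2)
admitted = [limiter.try_acquire() for _ in range(3)]
print(admitted, limiter.in_flight)
```

A production proxy would likely queue excess requests rather than refuse them; refusal is used here only to keep the sketch short.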

Transformer
Source: https://kserve.github.io/website/master/
• Queue Proxy: measures and limits concurrency to the user’s application.
• Model Server: preprocesses input data and postprocesses output predictions, so that custom logic or data transformations can be integrated with the deployed machine learning models.
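The preprocess, predict, postprocess flow of a transformer can be sketched in a few lines. This is an illustrative sketch in plain Python, not the KServe SDK; the function names, the `records` input shape, and the `fake_predictor` stand-in are all assumptions made for the example.

```python
from typing import Any, Callable, Dict

Json = Dict[str, Any]


def preprocess(request: Json) -> Json:
    """Turn named fields into the feature rows the predictor expects."""
    rows = [[r["length"], r["width"]] for r in request["records"]]
    return {"instances": rows}


def fake_predictor(payload: Json) -> Json:
    """Stand-in for the predictor: class 1 if length > width, else 0."""
    return {"predictions": [int(a > b) for a, b in payload["instances"]]}


def postprocess(response: Json) -> Json:
    """Map numeric class ids to human-readable labels."""
    labels = ["wide", "long"]
    return {"labels": [labels[p] for p in response["predictions"]]}


def transform_and_predict(request: Json, predict: Callable[[Json], Json]) -> Json:
    """Chain the three stages the way a transformer sits in front of a predictor."""
    return postprocess(predict(preprocess(request)))


result = transform_and_predict(
    {"records": [{"length": 5.0, "width": 2.0}, {"length": 1.0, "width": 3.0}]},
    fake_predictor,
)
print(result)
```

In a real deployment the predict step would be a network call from the transformer to the predictor rather than a local function.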

KServe Control Plane
Diagram: Inference Service → KServe Controller (Reconcile)
• Serverless: Knative Service → Knative Revision → Deployment
• Raw Deployment: Deployment
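The two reconcile paths in the diagram can be sketched as a lookup from deployment mode to the chain of resources. This is a simplified sketch, not the controller's logic: the function name `resources_for` is hypothetical, the annotation key `serving.kserve.io/deploymentMode` and the `Serverless` default are assumptions to verify against your KServe installation, and only the resources named on the slide are listed.

```python
from typing import Any, Dict, List

# Assumed annotation key for choosing the deployment mode.
DEPLOYMENT_MODE_ANNOTATION = "serving.kserve.io/deploymentMode"


def resources_for(isvc: Dict[str, Any]) -> List[str]:
    """Return the resource chain from the slide for an InferenceService."""
    annotations = isvc.get("metadata", {}).get("annotations", {})
    # Assumption: Serverless is the mode used when no annotation is set.
    mode = annotations.get(DEPLOYMENT_MODE_ANNOTATION, "Serverless")
    if mode == "Serverless":
        return ["Knative Service", "Knative Revision", "Deployment"]
    if mode == "RawDeployment":
        return ["Deployment"]
    raise ValueError(f"unknown deployment mode: {mode}")


serverless_isvc = {"metadata": {"name": "example"}}
raw_isvc = {
    "metadata": {
        "name": "example",
        "annotations": {DEPLOYMENT_MODE_ANNOTATION: "RawDeployment"},
    }
}
print(resources_for(serverless_isvc))
print(resources_for(raw_isvc))
```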

Knative Components
Diagram: Inference Service → KServe Controller (Reconcile) → Serverless (Knative Service → Knative Revision → Deployment) or Raw Deployment (Deployment)
Source: https://knative.dev/docs/serving/

Knative Serving
Diagram: Inference Service → KServe Controller (Reconcile) → Serverless (Knative Service → Knative Revision → Deployment) or Raw Deployment (Deployment), with the Knative components labeled "Knative"
Source: https://knative.dev/docs/serving/

Primary Knative Serving Resources
• Service: automatically manages the whole lifecycle of your workload.
• Route: maps a network endpoint to one or more revisions.
• Configuration: maintains the desired state for your deployment.
• Revision: a point-in-time snapshot of the code and configuration for each modification made to the workload.
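A Route mapping one endpoint to several revisions can be pictured as a percentage-based traffic split. The following is a minimal sketch in plain Python, not Knative code; the `TrafficTarget` class, the helper functions, and the revision names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TrafficTarget:
    revision_name: str
    percent: int


def validate_route(targets: List[TrafficTarget]) -> None:
    """A route's traffic percentages are expected to add up to 100."""
    total = sum(t.percent for t in targets)
    if total != 100:
        raise ValueError(f"traffic percentages sum to {total}, expected 100")


def pick_revision(targets: List[TrafficTarget], point: float) -> str:
    """Pick the revision whose cumulative percentage range contains point (0 <= point < 100)."""
    cumulative = 0
    for target in targets:
        cumulative += target.percent
        if point < cumulative:
            return target.revision_name
    raise ValueError("point must be in the range [0, 100)")


# Hypothetical canary split: 90% to the old revision, 10% to the new one.
route = [TrafficTarget("iris-00001", 90), TrafficTarget("iris-00002", 10)]
validate_route(route)
print(pick_revision(route, 42.0), pick_revision(route, 95.0))
```

Feeding `pick_revision` a uniformly random number in [0, 100) would approximate the configured split over many requests.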

Revision Autoscaling with Knative Pod Autoscaler (KPA)
Diagram components: Route, Activator, Pods, Deployment, Autoscaler
Diagram edges: Inactive route, Active route, Pull metrics, Push metrics, Scales, Creates/deletes
Sources: https://knative.dev/docs/serving/istio-authorization/ and https://developer.aliyun.com/article/710828

Scaling up and down (steady state)
Source: https://github.com/knative/serving/blob/main/docs/scaling/SYSTEM.md
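In steady state, concurrency-based autoscaling comes down to dividing the observed in-flight requests by a per-pod target. The sketch below shows only that core arithmetic; the function name `desired_pods` and the sample numbers are assumptions, and the real autoscaler also averages metrics over time windows and applies scale bounds, which are omitted here.

```python
import math


def desired_pods(total_in_flight: float, target_per_pod: float) -> int:
    """Pods needed so that each pod handles at most target_per_pod requests."""
    if target_per_pod <= 0:
        raise ValueError("target_per_pod must be positive")
    if total_in_flight < 0:
        raise ValueError("total_in_flight cannot be negative")
    return math.ceil(total_in_flight / target_per_pod)


# Hypothetical per-pod target of 10 concurrent requests.
for load in (0, 10, 25, 50):
    print(load, desired_pods(load, 10))
```

With no in-flight requests the result is zero pods, which is the arithmetic behind scaling to zero.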

Scaling to zero
Source: https://github.com/knative/serving/blob/main/docs/scaling/SYSTEM.md

Scaling from Zero
Source: https://github.com/knative/serving/blob/main/docs/scaling/SYSTEM.md
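Scaling from zero can be pictured as the Activator holding requests while no pods are ready, then handing them over once pods exist. The following toy simulation is a sketch of that idea only, loosely based on the "inactive route" and "active route" labels in the KPA diagram; the `Revision` class and its methods are hypothetical and greatly simplify the real system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Revision:
    ready_pods: int = 0
    buffered: List[str] = field(default_factory=list)  # held by the activator
    served: List[str] = field(default_factory=list)    # handled by pods

    def route(self, request_id: str) -> str:
        """Send a request via the activator (no pods) or straight to pods."""
        if self.ready_pods == 0:
            self.buffered.append(request_id)  # inactive route
            return "activator"
        self.served.append(request_id)  # active route
        return "pods"

    def scale_to(self, pods: int) -> None:
        """Apply an autoscaler decision and flush buffered requests if possible."""
        self.ready_pods = pods
        if pods > 0:
            self.served.extend(self.buffered)
            self.buffered.clear()


revision = Revision()
first_hop = revision.route("req-1")   # arrives while scaled to zero
revision.scale_to(1)                  # autoscaler brings up one pod
second_hop = revision.route("req-2")  # arrives after scale-up
print(first_hop, second_hop, revision.served)
```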

Autoscale Sample
• Ramp up traffic to maintain 10 in-flight requests.
Source: https://github.com/dewitt/knative-docs/tree/master/serving/samples/autoscale-go

Scaling pod from zero
Source: https://github.com/dewitt/knative-docs/tree/master/serving/samples/autoscale-go

Difference between KPA and HPA

Knative Pod Autoscaler (KPA)
• Part of the Knative Serving core and enabled by default once Knative Serving is installed.
• Supports scale to zero functionality.
• Does not support CPU-based autoscaling.

Horizontal Pod Autoscaler (HPA)
• Not part of the Knative Serving core, and must be enabled after Knative Serving installation.
• Does not support scale to zero functionality.
• Supports CPU-based autoscaling.

Source: https://kserve.github.io/website/0.8/modelserving/v1beta1/torchserve/#autoscaling
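The choice between KPA and HPA is typically expressed through annotations on the workload. The sketch below builds such an annotation set and rejects combinations that contradict the comparison above. The helper name `autoscaling_annotations` is hypothetical, and the annotation keys, class names, and metric names are assumptions based on Knative's autoscaling annotations; check them against the documentation for your installed version.

```python
from typing import Dict

# Assumed metric support per autoscaler class.
SUPPORTED_METRICS = {
    "kpa.autoscaling.knative.dev": {"concurrency", "rps"},
    "hpa.autoscaling.knative.dev": {"cpu", "memory"},
}


def autoscaling_annotations(autoscaler_class: str, metric: str, target: int) -> Dict[str, str]:
    """Build autoscaling annotations, refusing unsupported class/metric pairs."""
    if autoscaler_class not in SUPPORTED_METRICS:
        raise ValueError(f"unknown autoscaler class: {autoscaler_class}")
    if metric not in SUPPORTED_METRICS[autoscaler_class]:
        raise ValueError(f"{autoscaler_class} does not support metric {metric!r}")
    return {
        "autoscaling.knative.dev/class": autoscaler_class,
        "autoscaling.knative.dev/metric": metric,
        "autoscaling.knative.dev/target": str(target),
    }


kpa = autoscaling_annotations("kpa.autoscaling.knative.dev", "concurrency", 10)
hpa = autoscaling_annotations("hpa.autoscaling.knative.dev", "cpu", 80)
print(kpa)
print(hpa)
```

Requesting the `cpu` metric with the KPA class raises a `ValueError`, mirroring the "does not support CPU-based autoscaling" point.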

We have covered Knative Serving part…
Diagram: Inference Service → KServe Controller (Reconcile) → Serverless (Knative Service → Knative Revision → Deployment) or Raw Deployment (Deployment), with the Knative components labeled "Knative"
Source: https://knative.dev/docs/serving/

Up next: Inference Service
Diagram: Inference Service → KServe Controller (Reconcile) → Serverless (Knative Service → Knative Revision → Deployment) or Raw Deployment (Deployment)
Question: Do we have to deal with the complexity in Knative?
Answer: No! All we need is Inference Service.

First InferenceService
• Apply
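The manifest shown on the original slide did not survive extraction. As an illustration of what a first InferenceService might look like, the sketch below builds one as a Python dictionary and prints it as JSON, which `kubectl apply -f` also accepts. The field layout follows the KServe v1beta1 getting-started example as best understood; the name `sklearn-iris`, the namespace `kserve-test`, and the storage URI are assumptions, not values taken from the slides.

```python
import json
from typing import Any, Dict


def sklearn_inference_service(name: str, namespace: str, storage_uri: str) -> Dict[str, Any]:
    """Build a minimal InferenceService manifest with a scikit-learn predictor."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "predictor": {
                "model": {
                    "modelFormat": {"name": "sklearn"},
                    # The storage initializer would fetch the model from here.
                    "storageUri": storage_uri,
                }
            }
        },
    }


# All three arguments are assumed example values.
manifest = sklearn_inference_service(
    name="sklearn-iris",
    namespace="kserve-test",
    storage_uri="gs://kfserving-examples/models/sklearn/1.0/model",
)
print(json.dumps(manifest, indent=2))
```

Only the predictor is specified; the Knative resources discussed earlier would be created by the controller rather than written by hand.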

First Inference Service: load test
Under the hood
Source: https://kserve.github.io/website/master/get_started/first_isvc/#5-perform-inference
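A single inference request of the kind a load test would repeat can be sketched with the standard library. The example below only constructs the request and does not send it. The helper `build_predict_request` is hypothetical, and the ingress address, service hostname, model name, and input rows are assumed example values; the `/v1/models/<name>:predict` path and the `instances` body follow the V1 inference protocol as described in the KServe getting-started guide.

```python
import json
import urllib.request
from typing import List


def build_predict_request(
    ingress_url: str,
    service_hostname: str,
    model_name: str,
    instances: List[List[float]],
) -> urllib.request.Request:
    """Build (but do not send) a V1-protocol predict request."""
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url=f"{ingress_url}/v1/models/{model_name}:predict",
        data=body,
        # The Host header lets the ingress gateway route to the right service.
        headers={"Host": service_hostname, "Content-Type": "application/json"},
        method="POST",
    )


# Assumed example values; replace with the ones from your cluster.
request = build_predict_request(
    ingress_url="http://localhost:8080",
    service_hostname="sklearn-iris.kserve-test.example.com",
    model_name="sklearn-iris",
    instances=[[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]],
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; a load test repeats such calls at a controlled rate or concurrency.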
