
Introduction to KServe

This deck covers the basics of KServe, including:
* KServe overview
* Explanation of the major KServe components
* Under the hood of autoscaling: Knative Pod Autoscaler (KPA)

Keita Watanabe

May 05, 2023

Transcript

  1. Introduction to KServe

    Keita Watanabe, Sr. Solutions Architect, AI/ML Frameworks, AWS
    © 2022, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark.
  2. Agenda

    • KServe Overview
    • KServe Components
    • Inference Service
    • Predictor
    • Autoscaling with Knative Pod Autoscaler (KPA)
    • ML inference with KServe: examples
  3. KServe

    https://kserve.github.io/website/master/
  4. KServe Features

    • Scale to and from zero
    • Request-based autoscaling
    • Batching
    • Request/response logging
    • Traffic management
    • Security with AuthN/AuthZ
    • Distributed tracing
    • Out-of-the-box metrics
  5. KServe Control Plane

    • Responsible for reconciling the InferenceService custom resources.
    • Creates the Knative serverless deployments for the predictor and transformer, enabling autoscaling based on the incoming request workload, including scaling down to zero when no traffic is received.
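To make "reconciling" concrete, here is a toy Python sketch of a reconcile loop: derive the desired objects from an InferenceService, compare them with what exists, then create or delete until the two match. The `InferenceServiceSpec` and `Cluster` classes and the `<name>-<component>` naming scheme are hypothetical; the real KServe controller is written in Go against the Kubernetes API and does much more.

```python
from dataclasses import dataclass, field

# Hypothetical, heavily simplified controller state; not KServe's API.
@dataclass
class InferenceServiceSpec:
    name: str
    components: tuple  # e.g. ("predictor", "transformer")

@dataclass
class Cluster:
    # Maps a (hypothetical) Knative Service name to the component it serves.
    knative_services: dict = field(default_factory=dict)

def reconcile(isvc: InferenceServiceSpec, cluster: Cluster) -> list:
    """Drive the cluster toward the desired state; return the actions taken."""
    actions = []
    desired = {f"{isvc.name}-{c}": c for c in isvc.components}
    # Create every Knative Service that should exist but does not yet.
    for svc_name, component in desired.items():
        if svc_name not in cluster.knative_services:
            cluster.knative_services[svc_name] = component
            actions.append(f"create {svc_name}")
    # Delete Knative Services owned by this InferenceService that are no longer desired.
    prefix = f"{isvc.name}-"
    for svc_name in list(cluster.knative_services):
        if svc_name.startswith(prefix) and svc_name not in desired:
            del cluster.knative_services[svc_name]
            actions.append(f"delete {svc_name}")
    return actions
```

Calling `reconcile` a second time with an unchanged spec does nothing. That idempotence is what lets a controller be re-triggered safely on every change event.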
  7. Predictor

    https://kserve.github.io/website/master/
  8. Predictor

    https://kserve.github.io/website/master/
    • Queue Proxy: measures and limits concurrency to the user's application.
    • Model Server: deploys, manages, and serves machine learning models.
    • Storage Initializer: retrieves and prepares machine learning models from storage backends such as Amazon S3.
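Queue Proxy sits in front of the model server. The asyncio sketch below only illustrates the idea of a hard concurrency limit with in-flight tracking: requests beyond `limit` wait, and the in-flight count is the kind of signal an autoscaler reads. `ConcurrencyLimiter` is a hypothetical name; in Knative this hard limit corresponds to the `containerConcurrency` setting, and the real queue-proxy is a Go sidecar.

```python
import asyncio

# Toy stand-in for queue-proxy's concurrency limiting (hypothetical class).
class ConcurrencyLimiter:
    def __init__(self, limit: int) -> None:
        self._sem = asyncio.Semaphore(limit)
        self.in_flight = 0
        self.peak_in_flight = 0  # highest concurrency observed, an autoscaling-style metric

    async def handle(self, request_id: int) -> str:
        # At most `limit` requests pass this point at once; the rest wait here.
        async with self._sem:
            self.in_flight += 1
            self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
            try:
                await asyncio.sleep(0.01)  # pretend to run model inference
                return f"response-{request_id}"
            finally:
                self.in_flight -= 1

async def main() -> tuple:
    limiter = ConcurrencyLimiter(limit=2)
    results = await asyncio.gather(*(limiter.handle(i) for i in range(5)))
    return list(results), limiter.peak_in_flight

if __name__ == "__main__":
    print(asyncio.run(main()))
```

Five requests arrive together, but the model server never sees more than two at a time.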
  9. Transformer

    https://kserve.github.io/website/master/
  10. Transformer

    https://kserve.github.io/website/master/
    • Queue Proxy: measures and limits concurrency to the user's application.
    • Model Server: preprocesses input data and postprocesses output predictions, which lets you attach custom logic or data transformations to the deployed machine learning model.
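A transformer wraps the predictor with a preprocessing step and a postprocessing step. The plain-Python sketch below shows that flow; the function names, the record format, the iris label list, and the toy predictor are illustrative assumptions. It is not KServe's SDK, which exposes its own `Model` class with preprocess/postprocess hooks whose exact signatures vary by version.

```python
from typing import Any, Callable

LABELS = ["setosa", "versicolor", "virginica"]  # assumed class names for the demo
FEATURE_ORDER = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

def preprocess(payload: dict) -> list:
    # Turn raw records (dicts of named features) into ordered feature rows for the model.
    return [[float(rec[name]) for name in FEATURE_ORDER] for rec in payload["records"]]

def postprocess(class_ids: list) -> dict:
    # Map raw class indices back to human-readable labels.
    return {"predictions": [LABELS[i] for i in class_ids]}

def transform_and_predict(payload: dict, predictor: Callable[[list], list]) -> dict:
    # Transformer flow: preprocess -> call predictor -> postprocess.
    return postprocess(predictor(preprocess(payload)))

def fake_predictor(rows: list) -> list:
    # Stand-in for the predictor service: a toy rule on petal length, not a trained model.
    return [0 if r[2] < 2.5 else (1 if r[2] < 5.0 else 2) for r in rows]
```

Keeping pre/postprocessing in a separate component lets the predictor stay a stock model server while the custom logic scales independently.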
  11. KServe Control Plane

    [Diagram: the KServe Controller reconciles an Inference Service either into a Knative Service → Knative Revision → Deployment (Serverless mode) or directly into a Deployment (Raw Deployment mode).]
  13. Knative Components

    [Diagram: the KServe Controller reconciles an Inference Service either into a Knative Service → Knative Revision → Deployment (Serverless mode) or directly into a Deployment (Raw Deployment mode).]
    https://knative.dev/docs/serving/
  14. Knative Serving

    [Diagram: the KServe Controller reconciles an Inference Service either into a Knative Service → Knative Revision → Deployment (Serverless mode) or directly into a Deployment (Raw Deployment mode); the Knative part is labeled "Knative".]
    https://knative.dev/docs/serving/
  15. Primary Knative Serving Resources

    • Service: the Knative Service resource automatically manages the whole lifecycle of your workload.
    • Route: maps a network endpoint to one or more revisions.
    • Configuration: maintains the desired state for your deployment.
    • Revision: a point-in-time snapshot of the code and configuration for each modification made to the workload.
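Because a Route can map one endpoint to several Revisions, it can split traffic between them by percentage (the basis for canary rollouts). The sketch below picks a Revision per request from a hypothetical `traffic` map; the revision names and percentages are made up, and in a real cluster the networking layer enforces the split, not application code.

```python
import random

def pick_revision(traffic: dict, rng: random.Random) -> str:
    """Choose a Revision for one request according to percentage weights (toy model)."""
    if sum(traffic.values()) != 100:
        raise ValueError("traffic percentages must sum to 100")
    roll = rng.randrange(100)  # uniform integer in 0..99
    cumulative = 0
    for revision, percent in traffic.items():
        cumulative += percent
        if roll < cumulative:
            return revision
    raise AssertionError("unreachable when percentages sum to 100")
```

With `{"my-model-00001": 90, "my-model-00002": 10}`, roughly nine in ten requests land on the older revision while the new one is being validated.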
  16. Autoscaling with Knative Pod Autoscaler (KPA)

    [Diagram: for a Revision, the Route sends requests either through the Activator (inactive route) or directly to the Pods (active route). The Autoscaler pulls metrics from the Pods, receives metrics pushed by the Activator, and scales the Deployment, which creates and deletes Pods.]
    https://knative.dev/docs/serving/istio-authorization/
    https://developer.aliyun.com/article/710828
  17. Scaling up and down (steady state)

    https://github.com/knative/serving/blob/main/docs/scaling/SYSTEM.md
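In steady state, the KPA divides observed concurrency by a per-pod target to get a pod count, and it watches two windows: a long "stable" window for normal decisions and a short "panic" window that reacts to bursts. The sketch below encodes that rule. Parameter names are hypothetical and the defaults are assumptions modeled on Knative's documentation, so check your cluster's `config-autoscaler` ConfigMap; rate limits, min/max scale, and Knative's sticky panic period are left out.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class KpaConfig:
    target_per_pod: float = 100.0    # assumed default: desired concurrent requests per pod
    target_utilization: float = 0.7  # assumed default: fraction of the target actually aimed for
    panic_threshold: float = 2.0     # assumed default: panic if panic-window demand >= 2x ready pods

def desired_pods(stable_avg: float, panic_avg: float, ready_pods: int,
                 cfg: KpaConfig = KpaConfig()) -> tuple:
    """Return (desired pod count, panicking?).

    stable_avg / panic_avg: average total in-flight requests over the stable
    window (60 s by default) and the panic window (6 s by default).
    """
    effective_target = cfg.target_per_pod * cfg.target_utilization
    stable_want = math.ceil(stable_avg / effective_target)
    panic_want = math.ceil(panic_avg / effective_target)

    # A short, sharp spike relative to current capacity triggers panic mode.
    panicking = ready_pods > 0 and panic_want / ready_pods >= cfg.panic_threshold
    if panicking:
        # Scale on the short window and never scale down while panicking.
        # (Real Knative stays in panic mode for a full stable window; this sketch does not.)
        return max(panic_want, ready_pods), True
    return stable_want, False
```

For example, with the effective target of 70, an average of 140 in-flight requests over both windows yields 2 pods; if the panic window suddenly averages 700 while 2 pods are ready, the sketch enters panic mode and asks for 10.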
  18. Scaling to zero

    https://github.com/knative/serving/blob/main/docs/scaling/SYSTEM.md
  19. Scaling from zero

    https://github.com/knative/serving/blob/main/docs/scaling/SYSTEM.md
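The two zero-related paths can be pictured as a small state machine: with no traffic, the revision drops to zero pods and the Activator sits in the request path; a request arriving at zero pods is held by the Activator, which signals the Autoscaler and forwards the request once a pod is ready. The class and method names below are hypothetical, and the 30 s grace period is an assumption modeled on Knative's default `scale-to-zero-grace-period`. Real Knative combines a stable window of zero traffic with that grace period; the sketch collapses them into one timer.

```python
from collections import deque

class ToyRevision:
    def __init__(self, grace_period_s: float = 30.0) -> None:
        self.pods = 0
        self.grace_period_s = grace_period_s
        self.last_request_at = 0.0
        self.scale_up_requested = False
        self.buffered = deque()  # requests held by the "activator" while no pod is ready
        self.served = []

    def on_request(self, now: float, request: str) -> None:
        self.last_request_at = now
        if self.pods == 0:
            # Inactive route: buffer the request and ask the autoscaler for a pod.
            self.buffered.append(request)
            self.scale_up_requested = True
        else:
            # Active route: traffic goes straight to the pods.
            self.served.append(request)

    def on_pod_ready(self) -> None:
        # Scale from zero completed: flush buffered requests to the new pod.
        self.pods = max(self.pods, 1)
        self.scale_up_requested = False
        while self.buffered:
            self.served.append(self.buffered.popleft())

    def tick(self, now: float) -> None:
        # Scale to zero once no request has arrived for the whole grace period.
        if self.pods > 0 and now - self.last_request_at >= self.grace_period_s:
            self.pods = 0
```

The first request after idling pays the cold-start cost (it waits in the buffer), which is the usual trade-off of scale to zero.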
  20. Autoscale Sample

    https://github.com/dewitt/knative-docs/tree/master/serving/samples/autoscale-go
    Ramp up traffic to maintain 10 in-flight requests.
  21. Scaling pods from zero

    https://github.com/dewitt/knative-docs/tree/master/serving/samples/autoscale-go
  22. Difference between KPA and HPA

    Knative Pod Autoscaler (KPA)
    • Part of the Knative Serving core and enabled by default once Knative Serving is installed.
    • Supports scale-to-zero functionality.
    • Does not support CPU-based autoscaling.

    Horizontal Pod Autoscaler (HPA)
    • Not part of the Knative Serving core; must be enabled after Knative Serving is installed.
    • Does not support scale-to-zero functionality.
    • Supports CPU-based autoscaling.

    https://kserve.github.io/website/0.8/modelserving/v1beta1/torchserve/#autoscaling
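In Knative, the autoscaler class and metric are chosen per workload through annotations. The sketch below builds the two variants as a Python dict; the `autoscaling.knative.dev/*` keys are Knative annotations, while the helper name and target values are illustrative assumptions. That KServe passes these annotations from the InferenceService's `metadata.annotations` through to the Knative Revision is also an assumption to verify for your version, and the HPA class only works once the HPA extension is installed.

```python
def autoscaler_annotations(use_hpa: bool, target: int) -> dict:
    """Return Knative autoscaling annotations for KPA or HPA (hypothetical helper)."""
    if use_hpa:
        # HPA: CPU-based scaling, no scale to zero; target is a CPU utilization percentage.
        return {
            "autoscaling.knative.dev/class": "hpa.autoscaling.knative.dev",
            "autoscaling.knative.dev/metric": "cpu",
            "autoscaling.knative.dev/target": str(target),
        }
    # KPA (the default class): request-based scaling with scale to zero;
    # target is the desired number of concurrent requests per pod.
    return {
        "autoscaling.knative.dev/class": "kpa.autoscaling.knative.dev",
        "autoscaling.knative.dev/metric": "concurrency",
        "autoscaling.knative.dev/target": str(target),
    }

if __name__ == "__main__":
    print(autoscaler_annotations(use_hpa=False, target=10))
    print(autoscaler_annotations(use_hpa=True, target=80))
```

Annotation values must be strings in Kubernetes metadata, hence the `str(target)` conversion.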
  23. We have covered the Knative Serving part…

    [Diagram: the KServe Controller reconciles an Inference Service either into a Knative Service → Knative Revision → Deployment (Serverless mode) or directly into a Deployment (Raw Deployment mode).]
    https://knative.dev/docs/serving/
  24. Up next: InferenceService

    [Diagram: the KServe Controller reconciles an Inference Service either into a Knative Service → Knative Revision → Deployment (Serverless mode) or directly into a Deployment (Raw Deployment mode).]
    Question: Do we have to deal with the complexity of Knative ourselves?
    Answer: No! All we need is an InferenceService.
  25. First InferenceService

    Apply
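The manifest applied on this slide is not reproduced in the transcript. As a stand-in, here is the `sklearn-iris` InferenceService from KServe's "First InferenceService" guide, expressed as a Python dict and printed as JSON (which `kubectl apply -f -` accepts). The field values are recalled from that guide and the namespace is an assumption; check them against the KServe version you run.

```python
import json

def first_isvc(name: str = "sklearn-iris", namespace: str = "kserve-test") -> dict:
    """Build a minimal InferenceService manifest with only a predictor."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "predictor": {
                "model": {
                    "modelFormat": {"name": "sklearn"},
                    # Model location from the KServe getting-started guide (assumed).
                    "storageUri": "gs://kfserving-examples/models/sklearn/1.0/model",
                }
            }
        },
    }

if __name__ == "__main__":
    # e.g. `python first_isvc.py | kubectl apply -f -`
    print(json.dumps(first_isvc(), indent=2))
```

This is the whole user-facing surface: the KServe controller turns these few fields into the Knative Service, Revision, and Deployment covered earlier.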
  26. First InferenceService
  27. First InferenceService: load test

    Under the hood
    https://kserve.github.io/website/master/get_started/first_isvc/#5-perform-inference
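The linked guide sends a JSON body with an `instances` list to the model's v1 predict endpoint, routing through the cluster ingress by `Host` header. The sketch below builds that kind of request with only the standard library and adds a small thread-pool load generator. The ingress address, the hostname, and the helper names are assumptions for illustration.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def build_predict_request(ingress: str, service_hostname: str, model: str,
                          instances: list) -> urllib.request.Request:
    # v1 protocol: POST /v1/models/<name>:predict with {"instances": [...]}.
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{ingress}/v1/models/{model}:predict",
        data=body,
        headers={"Host": service_hostname, "Content-Type": "application/json"},
        method="POST",
    )

def send(req: urllib.request.Request, timeout: float = 10.0) -> dict:
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())

def tiny_load_test(req_factory, total: int, concurrency: int) -> list:
    # Keep roughly `concurrency` requests in flight so the KPA sees sustained load.
    # A fresh Request per call avoids sharing one object across threads.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(lambda _: send(req_factory()), range(total)))
```

Passing a factory such as `lambda: build_predict_request(ingress, host, "sklearn-iris", rows)` to `tiny_load_test` with `concurrency=10` keeps about ten requests in flight, enough to watch new predictor pods appear with `kubectl get pods -w`.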
  28. Thank you!