Cloud Run for Kubernetes users

Ahmet Alp Balkan

March 12, 2021

Transcript

  1. @ahmetb Cloud Run for Kubernetes users

  2. about me • once loved/advocated for Kubernetes • now I

    think it’s overkill for many. you may know me from: • kubectx / kubens • Krew (kubectl plugin manager) • Kubernetes NetworkPolicy recipes
  3. cloud.run Serverless containers on Google Cloud’s managed infrastructure

  4. App Engine, Heroku, .. Kubernetes Cloud Run

  5. DEMO

  6. If you know Knative, Cloud Run is a hosted Knative

    Serving API [architecture diagram: the Knative API (via UI / CLI / YAML) is served by Cloud Run on Google infra (Borg), or by Knative open source on Kubernetes / GKE (GKE Standard, GKE Autopilot, Anthos, GKE-on-prem)]
  7. $ gcloud run deploy \ --image=gcr.io/my-image:latest \ --cpu=2 \ --memory=512Mi

    \ --allow-unauthenticated \ --set-env-vars FOO=BAR \ --max-instances=100 \ --concurrency=5 \ […] imperative deployments
  8. Very similar to Kubernetes Deployment+Service objects • Knative Service CRD

    ◦ K8s Service+Deployment merged. • same PodSpec as Kubernetes $ gcloud run services replace <MANIFEST> declarative deployments
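    A minimal sketch of that declarative round-trip (service and file names here are placeholders):
      # export the live service as a Knative Service manifest
      $ gcloud run services describe myservice --format=export > service.yaml
      # edit service.yaml, then apply it back
      $ gcloud run services replace service.yaml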
  9. • designed for stateless applications that are serving HTTP-based protocols

    • great for: ◦ microservices ◦ APIs ◦ frontends • not for: ◦ databases ◦ custom binary protocols ◦ game servers ◦ run-to-completion jobs what does it run?
  10. container contract • Linux executables (x86-64 ABI) • Deploy OCI

    container images No Windows or ARM support.
  11. CPU allocated only during requests (no background threads) • very

    low/no cpu otherwise • ok for garbage collection etc • not enough to push out metrics/traces [timeline diagram: CPU OFF → CPU ON → CPU OFF]
  12. container lifecycle • new containers are started on ◦ first-time

    deployment ◦ scale-up ◦ failed health checks • we wait until the container is listening on $PORT ◦ now ready to receive traffic • CPU is allocated only during requests • unused container gets terminated with SIGTERM
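    A minimal sketch of the $PORT contract, assuming a placeholder server binary:
      # container entrypoint: bind whatever port Cloud Run injects via $PORT
      # (./my-server is a placeholder for your application binary)
      exec ./my-server --port="${PORT:-8080}"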
  13. networking: protocols works on Cloud Run: • HTTP/1 • HTTP/2

    (h2c) • WebSockets • gRPC no arbitrary/binary protocols
  14. networking: TLS • Cloud Run provisions TLS certificates ◦ both

    for *.run.app & custom domains • forced HTTP→HTTPS redirects • your app should serve plain, unencrypted responses ◦ encryption is added later by the infra
  15. • built-in L7 load balancing for a service • routes

    traffic between container instances • no unintentional sticky sessions (per-request) serving: load balancing
  16. • can split traffic between revisions (90% → v1, 10%

    → v2) • used for canary deployments, progressive rollouts • can have a domain for each revision: ◦ v1---myservice.[…].run.app ◦ commit-aeab13f---myservice.[…].run.app ◦ latest---myservice.[…].run.app serving: traffic splitting/canary
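    A sketch of such a split with gcloud (service and revision names are placeholders):
      # keep 90% of traffic on the old revision, send 10% to the new one
      $ gcloud run services update-traffic myservice \
          --to-revisions=myservice-00001-abc=90,myservice-00002-def=10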
  17. break the glass & create Google Cloud L7 HTTPS LB.

    ◦ enable CDN ◦ BYO TLS certs ◦ multi-region load balancing/failover ◦ IAP (Identity-Aware Proxy a.k.a. BeyondCorp) serving: advanced load balancing
  18. serving: concurrency limits • typically functions/lambda: 1 req/container • Cloud

    Run: supports concurrent requests (up to 250) why? • helps you limit your load • informs autoscaling decisions • overlapping requests are not double-charged
  19. • based on in-flight HTTP requests (no CPU/memory targets like

    Kubernetes HPA) • concurrency=5 + inflight_requests=15 → 3 containers • you can limit max container instances (prevent unbounded spending) • no guarantee how long idle containers will be around autoscaling
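    Both knobs are set on the service; a sketch with placeholder values:
      # cap in-flight requests per container and bound the number of instances
      $ gcloud run services update myservice --concurrency=5 --max-instances=100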
  20. serving: request/response limits • max request timeout: 60 minutes •

    no maximum request/response size (as much as you can recv or send) • requests and responses are not buffered (bi-directional streaming)
  21. pricing pay only "during" requests (for CPU/memory) • overlapping requests

    are not double-charged • networking egress costs + request count costs exist • generous free tier (cloud.google.com/free) [timeline diagram: FREE → CHARGED → FREE]
  22. • cold starts exist! (inactive services scaling up 0→1, new

    container instance waking up to handle traffic) • minimize cold starts by using "minimum instances". ◦ these are kept "warm" ◦ costs ~10% of the active (serving) time cold start
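    A sketch of enabling warm instances (service name and count are placeholders):
      # keep one container instance warm to avoid most cold starts
      $ gcloud run services update myservice --min-instances=1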
  23. • gVisor sandbox ◦ Linux syscall emulation in userspace ◦

    used internally at Google to run untrusted workloads on Borg ◦ low-level Linux APIs might not work: ▪ iptables, some ioctls, … • might change soon kernel
  24. • files written to local disk: ◦ are not persisted

    ◦ count towards the available memory of the container instance • (currently) no support for mounting external volumes ◦ such as Cloud Filestore (NAS) or Cloud Storage filesystem/volumes
  25. • no current support • upcoming support for Google Cloud

    Secret Manager ◦ it will look like Kubernetes secret volume/volumeMounts secrets/configmaps
  26. • individual container instances are not exposed to you (to

    keep things simple) • auto-healing properties ◦ faulty containers (20 consecutive 5xx) are replaced ◦ crash-looping containers are replaced container instance fleet
  27. 1. Know the full URL (xxx.run.app) 2. Get a JWT

    token 3. Add as “Authorization” header to outgoing request Kubernetes or K8s+Istio experience shines here (for now). service-to-service calls
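    A minimal sketch of that flow from inside a container on Google infrastructure, with a placeholder target URL:
      # 1. full URL of the target service (placeholder)
      $ URL=https://myservice-xxxxx-uc.a.run.app
      # 2. get a JWT (identity token) for that audience from the metadata server
      $ TOKEN=$(curl -s -H "Metadata-Flavor: Google" \
          "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/identity?audience=${URL}")
      # 3. add it as the Authorization header on the outgoing request
      $ curl -H "Authorization: Bearer ${TOKEN}" "${URL}"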
  28. • Sidecars / multiple containers per Pod ◦ Knative supports

    this • Init containers • Health probes ◦ Cloud Run has primitive built-in startup health check ◦ What do periodic probes mean when you don't have CPU allocated? • mTLS/cert auth is still not possible Unsupported (for now) from Kubernetes
  29. • Cloud Run: ◦ solid highly-scalable serving layer ◦ scales

    real-time, based on in-flight requests ◦ designed with spiky workloads in mind ◦ request buffering during autoscaling • Kubernetes: ◦ we all typically over-provision for unpredicted load ◦ CPU/memory metrics are a side-effect of load, and often collected w/ a delay ◦ without a buffering meat shield, excessive load will crash Pods. Kubernetes+Cloud Run=?
  30. Kubernetes → Cloud Run: 1. use GCP workload identity 2.

    get a JWT token from metadata service + add it to outbound request 3. set IAM permission on Cloud Run 4. Cloud Run serving layer automatically checks the JWT + the caller's IAM permissions. Cloud Run → Kubernetes: 1. connect to the same VPC 2. provision an internal LB (or use VPC private IPs) 3. get a JWT token from metadata service + add it to outbound request 4. verify the JWT in your app coming from Cloud Run + DIY ACLs. Hybrid?
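    For step 3 of the Kubernetes → Cloud Run direction, the IAM grant might look like this (service account and service names are placeholders):
      # let the GKE workload's Google service account invoke the Cloud Run service
      $ gcloud run services add-iam-policy-binding myservice \
          --member="serviceAccount:my-gke-sa@my-project.iam.gserviceaccount.com" \
          --role="roles/run.invoker"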
  31. thanks Documentation: https://cloud.run FAQ: github.com/ahmetb/cloud-run-faq twitter.com/ahmetb github.com/ahmetb

  32. Appendix Migrating a Kubernetes deployment to Knative

  33. apiVersion: apps/v1 kind: Deployment metadata: name: hello-web spec: replicas: 1

    selector: matchLabels: app: hello tier: web template: metadata: labels: app: hello tier: web spec: containers: - name: main image: gcr.io/google-samples/hello-app:1.0 resources: limits: cpu: 100m memory: 256Mi Kubernetes Deployment Kubernetes Service apiVersion: v1 kind: Service metadata: name: hello-web spec: type: LoadBalancer selector: app: hello tier: web ports: - port: 80 targetPort: 8080
  34. apiVersion: apps/v1 kind: Deployment metadata: name: hello-web spec: replicas: 1

    selector: matchLabels: app: hello tier: web template: metadata: labels: app: hello tier: web spec: containers: - name: main image: gcr.io/google-samples/hello-app:1.0 resources: limits: cpu: 100m memory: 256Mi Kubernetes Deployment Kubernetes Service apiVersion: v1 kind: Service metadata: name: hello-web spec: type: LoadBalancer selector: app: hello tier: web ports: - port: 80 targetPort: 8080 no need, Knative will give us a $PORT
  35. apiVersion: apps/v1 kind: Deployment metadata: name: hello-web spec: replicas: 1

    selector: matchLabels: app: hello tier: web template: metadata: labels: app: hello tier: web spec: containers: - name: main image: gcr.io/google-samples/hello-app:1.0 resources: limits: cpu: 100m memory: 256Mi Kubernetes Deployment Kubernetes Service apiVersion: v1 kind: Service metadata: name: hello-web spec: type: LoadBalancer selector: app: hello tier: web ports: - port: 80 targetPort: 8080 no need for all these labels and selectors
  36. apiVersion: apps/v1 kind: Deployment metadata: name: hello-web spec: replicas: 1

    selector: matchLabels: app: hello tier: web template: metadata: labels: app: hello tier: web spec: containers: - name: main image: gcr.io/google-samples/hello-app:1.0 resources: limits: cpu: 100m memory: 256Mi Kubernetes Deployment Kubernetes Service apiVersion: v1 kind: Service metadata: name: hello-web spec: type: LoadBalancer selector: app: hello tier: web ports: - port: 80 targetPort: 8080 Knative autoscales
  37. apiVersion: apps/v1 kind: Deployment metadata: name: hello-web spec: replicas: 1

    selector: matchLabels: app: hello tier: web template: metadata: labels: app: hello tier: web spec: containers: - name: main image: gcr.io/google-samples/hello-app:1.0 resources: limits: cpu: 100m memory: 256Mi Kubernetes Deployment Kubernetes Service apiVersion: v1 kind: Service metadata: name: hello-web spec: type: LoadBalancer selector: app: hello tier: web ports: - port: 80 targetPort: 8080 Knative creates both internal and external endpoints by default
  38. apiVersion: apps/v1 kind: Deployment metadata: name: hello-web spec: replicas: 1

    selector: matchLabels: app: hello tier: web template: metadata: labels: app: hello tier: web spec: containers: - name: main image: gcr.io/google-samples/hello-app:1.0 resources: limits: cpu: 100m memory: 256Mi Kubernetes Deployment Kubernetes Service apiVersion: v1 kind: Service metadata: name: hello-web spec: type: LoadBalancer selector: app: hello tier: web ports: - port: 80 targetPort: 8080 No need for a container name if you have only one
  39. apiVersion: v1 kind: Service metadata: name: hello-web spec: type: LoadBalancer

    selector: app: hello tier: web ports: - port: 80 targetPort: 8080 apiVersion: apps/v1 kind: Deployment metadata: name: hello-web spec: replicas: 1 selector: matchLabels: app: hello tier: web template: metadata: labels: app: hello tier: web spec: containers: - name: main image: gcr.io/google-samples/hello-app:1.0 resources: limits: cpu: 100m memory: 256Mi Kubernetes Deployment Kubernetes Service
  40. apiVersion: apps/v1 kind: Deployment metadata: name: hello-web spec: template: spec:

    containers: - image: gcr.io/google-samples/hello-app:1.0 resources: limits: cpu: 100m memory: 256Mi
  41. apiVersion: apps/v1 → serving.knative.dev/v1 kind: Deployment → Service metadata: name: hello-web spec:

    template: spec: containers: - image: gcr.io/google-samples/hello-app:1.0 resources: limits: cpu: 100m memory: 256Mi
  42. apiVersion: serving.knative.dev/v1 kind: Service metadata: name: hello-web spec: template: spec:

    containers: - image: gcr.io/google-samples/hello-app:1.0 resources: limits: cpu: 100m memory: 256Mi