
Cloud Run for Kubernetes users

Ahmet Alp Balkan

March 12, 2021

Transcript

  1. about me • once loved/advocated for Kubernetes • now I think it’s overkill for many • you may know me from: ◦ kubectx / kubens ◦ Krew (kubectl plugin manager) ◦ Kubernetes NetworkPolicy recipes
  2. If you know Knative, Cloud Run is a hosted Knative Serving API (architecture diagram: the Knative API — driven via UI, CLI, or YAML — backed either by Cloud Run on Google infra (Borg), or by open-source Knative on Kubernetes / GKE: GKE Standard, GKE Autopilot, Anthos, GKE-on-prem)
  3. $ gcloud run deploy \
       --image=gcr.io/my-image:latest \
       --cpu=2 \
       --memory=512Mi \
       --allow-unauthenticated \
       --set-env-vars FOO=BAR \
       --max-instances=100 \
       --concurrency=5 \
       [...]
     imperative deployments
  4. Very similar to Kubernetes Deployment+Service objects • Knative Service CRD

    ◦ K8s Service+Deployment merged. • same PodSpec as Kubernetes $ gcloud run services replace <MANIFEST> declarative deployments
  5. • designed for stateless applications that are serving HTTP-based protocols

    • great for: ◦ microservices ◦ APIs ◦ frontends • not for: ◦ databases ◦ custom binary protocols ◦ game servers ◦ run-to-completion jobs what does it run?
  6. container contract • Linux executables (x86-64 ABI) • Deploy OCI

    container images No Windows or ARM support.
  7. CPU allocated only during requests (no background threads) • very low/no cpu otherwise • ok for garbage collection etc • not enough to push out metrics/traces (diagram: CPU OFF → CPU ON during a request → CPU OFF)
  8. container lifecycle • new containers are started on ◦ first-time

    deployment ◦ scale-up ◦ failed health checks • we wait until the container is listening on $PORT ◦ now ready to receive traffic • CPU is allocated only during requests • unused container gets terminated with SIGTERM
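     A minimal sketch of that lifecycle expectation — an entrypoint that listens on whatever $PORT the platform injects (python3's built-in server stands in here for a real app):

        # honor the injected $PORT; fall back to 8080 for local runs
        exec python3 -m http.server "${PORT:-8080}"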
  9. networking: protocols — works on Cloud Run: • HTTP/1 • HTTP/2 (h2c) • WebSockets • gRPC • no arbitrary/binary protocols
  10. networking: TLS • Cloud Run provisions TLS certificates ◦ both

    for *.run.app & custom domain • forced HTTP→HTTPS redirects • your app should prepare unencrypted responses ◦ encryption added later by the infra
  11. • built-in L7 load balancing for a service • routes

    traffic between container instances • no unintentional sticky sessions (per-request) serving: load balancing
  12. • can split traffic between revisions (90% → v1, 10% → v2) • used for canary deployments, progressive rollouts • can have a domain for each revision (see the gcloud sketch below): ◦ v1---myservice.[...].run.app ◦ commit-aeab13f---myservice.[...].run.app ◦ latest---myservice.[...].run.app serving: traffic splitting/canary
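     A sketch of the above with gcloud (service and revision names are illustrative):

        # send 90% of traffic to one revision and 10% to another
        gcloud run services update-traffic myservice \
          --to-revisions=myservice-00001-abc=90,myservice-00002-def=10

        # give a revision its own tagged URL (e.g. v1---myservice.[...].run.app)
        gcloud run services update-traffic myservice \
          --update-tags=v1=myservice-00001-abc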
  13. break the glass & create Google Cloud L7 HTTPS LB.

    ◦ enable CDN ◦ BYO TLS certs ◦ multi-region load balancing/failover ◦ IAP (Identity-Aware Proxy a.k.a. BeyondCorp) serving: advanced load balancing
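     Roughly, "break the glass" means fronting the service with a serverless NEG behind a global HTTPS load balancer; a sketch with illustrative names/region:

        # serverless NEG pointing at the Cloud Run service
        gcloud compute network-endpoint-groups create myservice-neg \
          --region=us-central1 \
          --network-endpoint-type=serverless \
          --cloud-run-service=myservice

        # attach it to a backend service of a global external HTTPS LB
        gcloud compute backend-services create myservice-backend --global
        gcloud compute backend-services add-backend myservice-backend --global \
          --network-endpoint-group=myservice-neg \
          --network-endpoint-group-region=us-central1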
  14. serving: concurrency limits • typically functions/lambda: 1 req/container • Cloud Run: supports concurrent requests (up to 250) why? • helps you limit your load • informs autoscaling decisions • overlapping requests are not double-charged
  15. • based on in-flight HTTP requests (no CPU/memory targets like

    Kubernetes HPA) • concurrency=5 + inflight_requests=15 → 3 containers • you can limit max container instances (prevent unbounded spending) • no guarantee how long idle containers will be around autoscaling
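     For example, capping the instance count after deployment (service name illustrative):

        gcloud run services update myservice --max-instances=100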
  16. serving: request/response limits • max request timeout: 60 minutes • no maximum request/response size (as much as you can recv or send) • requests and responses are not buffered (bi-directional streaming)
  17. pricing • pay only "during" requests (for CPU/memory) • overlapping requests are not double-charged • networking egress costs + request count costs exist • generous free tier (cloud.google.com/free) (diagram: FREE → CHARGED during a request → FREE)
  18. • cold starts exist! (inactive services scaling up 0→1, new container instance waking up to handle traffic) • minimize cold starts by using "minimum instances" (see the sketch below) ◦ these are kept "warm" ◦ costs ~10% of the active (serving) time cold start
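     A sketch of keeping warm instances around (service name illustrative; on older gcloud versions --min-instances may only be available on the beta track):

        gcloud run services update myservice --min-instances=1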
  19. • gVisor sandbox ◦ Linux syscall emulation in userspace ◦

    used internally at Google to run untrusted workloads on Borg ◦ low-level Linux APIs might not work: ▪ iptables, some ioctls, … • might change soon kernel
  20. • files written to local disk: ◦ are not persisted ◦ count towards the available memory of the container instance • (currently) no support for mounting external volumes ◦ such as Cloud Filestore (NAS) or Cloud Storage filesystem/volumes
  21. • no current support • upcoming support for Google Cloud

    Secret Manager ◦ it will look like Kubernetes secret volume/volumeMounts secrets/configmaps
  22. • individual container instances are not exposed to you (to

    keep things simple) • auto-healing properties ◦ faulty containers (20 consecutive 5xx) are replaced ◦ crash-looping containers are replaced container instance fleet
  23. 1. Know the full URL (xxx.run.app) 2. Get a JWT token 3. Add it as an “Authorization” header to the outgoing request (sketch below) Kubernetes or K8s+Istio experience shines here (for now). service-to-service calls
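     A sketch of those three steps from inside a Cloud Run container, using the metadata server (target URL illustrative):

        # 1. the full URL of the receiving service
        AUDIENCE="https://myservice-xxxxx-uc.a.run.app"

        # 2. mint an identity token (JWT) for that audience
        TOKEN=$(curl -s -H "Metadata-Flavor: Google" \
          "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/identity?audience=${AUDIENCE}")

        # 3. attach it to the outgoing request
        curl -H "Authorization: Bearer ${TOKEN}" "${AUDIENCE}/"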
  24. • Sidecars / multiple containers per Pod ◦ Knative supports

    this • Init containers • Health probes ◦ Cloud Run has primitive built-in startup health check ◦ What do periodic probes mean when you don't have CPU allocated? • mTLS/cert auth is still not possible Unsupported (for now) from Kubernetes
  25. • Cloud Run: ◦ solid highly-scalable serving layer ◦ scales

    real-time, based on in-flight requests ◦ designed with spiky workloads in mind ◦ request buffering during autoscaling • Kubernetes: ◦ we all typically over-provision for unpredicted load ◦ CPU/memory metrics are a side-effect of load, and often collected w/ a delay ◦ without a buffering meat shield, excessive load will crash Pods. Kubernetes+Cloud Run=?
  26. Kubernetes → Cloud Run: 1. use GCP workload identity 2. get a JWT token from metadata service + add it to outbound request 3. set IAM permission on Cloud Run (sketch below) 4. Cloud Run serving layer automatically checks the JWT + its IAM permissions. Cloud Run → Kubernetes: 1. connect to the same VPC 2. provision an internal LB (or use VPC private IPs) 3. get a JWT token from metadata service + add it to outbound request 4. verify the JWT in your app coming from Cloud Run + DIY ACLs. Hybrid?
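     For step 3 of the Kubernetes → Cloud Run direction, the IAM permission is a single binding; a sketch with illustrative names:

        # allow the caller's Google service account (bound via workload identity)
        # to invoke the Cloud Run service
        gcloud run services add-iam-policy-binding myservice \
          --member="serviceAccount:caller-gsa@my-project.iam.gserviceaccount.com" \
          --role="roles/run.invoker"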
  27. Kubernetes Deployment:

        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: hello-web
        spec:
          replicas: 1
          selector:
            matchLabels:
              app: hello
              tier: web
          template:
            metadata:
              labels:
                app: hello
                tier: web
            spec:
              containers:
              - name: main
                image: gcr.io/google-samples/hello-app:1.0
                resources:
                  limits:
                    cpu: 100m
                    memory: 256Mi

      Kubernetes Service:

        apiVersion: v1
        kind: Service
        metadata:
          name: hello-web
        spec:
          type: LoadBalancer
          selector:
            app: hello
            tier: web
          ports:
          - port: 80
            targetPort: 8080
  28. (same Deployment + Service manifest as slide 27) — callout: no need, Knative will give us a $PORT
  29. (same manifest) — callout: no need for all these labels and selectors
  30. (same manifest) — callout: Knative autoscales
  31. (same manifest) — callout: Knative creates both internal and external endpoints by default
  32. (same manifest) — callout: no need for a container name if you have only one
  33. (the same Service + Deployment manifests shown once more, before trimming them down)
  34. apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: hello-web
      spec:
        template:
          spec:
            containers:
            - image: gcr.io/google-samples/hello-app:1.0
              resources:
                limits:
                  cpu: 100m
                  memory: 256Mi
  35. apiVersion: apps/v1 → serving.knative.dev/v1
      kind: Deployment → Service
      metadata:
        name: hello-web
      spec:
        template:
          spec:
            containers:
            - image: gcr.io/google-samples/hello-app:1.0
              resources:
                limits:
                  cpu: 100m
                  memory: 256Mi
  36. apiVersion: serving.knative.dev/v1
      kind: Service
      metadata:
        name: hello-web
      spec:
        template:
          spec:
            containers:
            - image: gcr.io/google-samples/hello-app:1.0
              resources:
                limits:
                  cpu: 100m
                  memory: 256Mi
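     That manifest can then be deployed declaratively, as on slide 4 (region flag illustrative):

        gcloud run services replace hello-web.yaml --region=us-central1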