Slide 1

@ahmetb Cloud Run for Kubernetes users

Slide 2

about me
● once loved/advocated for Kubernetes
● now I think it's overkill for many

you may know me from:
● kubectx / kubens
● Krew (kubectl plugin manager)
● Kubernetes NetworkPolicy recipes

Slide 3

cloud.run Serverless containers on Google Cloud’s managed infrastructure

Slide 4

App Engine, Heroku, .. Kubernetes Cloud Run

Slide 5

DEMO

Slide 6

If you know Knative, Cloud Run is a hosted Knative Serving API

[diagram]
● Knative API (UI / CLI / YAML), implemented by:
  ○ Cloud Run → Google infra (Borg)
  ○ Cloud Run for Anthos → Knative open source on Kubernetes / GKE (GKE Standard, GKE Autopilot, GKE-on-prem)

Slide 7

imperative deployments

$ gcloud run deploy \
    --image=gcr.io/my-image:latest \
    --cpu=2 \
    --memory=512Mi \
    --allow-unauthenticated \
    --set-env-vars FOO=BAR \
    --max-instances=100 \
    --concurrency=5 \
    [...]

Slide 8

declarative deployments

Very similar to Kubernetes Deployment+Service objects
● Knative Service CRD
  ○ K8s Service+Deployment merged
● same PodSpec as Kubernetes

$ gcloud run services replace

Slide 9

what does it run?
● designed for stateless applications serving HTTP-based protocols
● great for:
  ○ microservices
  ○ APIs
  ○ frontends
● not for:
  ○ databases
  ○ custom binary protocols
  ○ game servers
  ○ run-to-completion jobs

Slide 10

container contract
● Linux executables (x86-64 ABI)
● deploy OCI container images

No Windows or ARM support.

Slide 11

CPU allocated only during requests (no background threads)
● very low/no CPU otherwise
● OK for garbage collection etc.
● not enough to push out metrics/traces
[timeline: CPU OFF → CPU ON → CPU OFF]

Slide 12

container lifecycle
● new containers are started on:
  ○ first-time deployment
  ○ scale-up
  ○ failed health checks
● we wait until the container is listening on $PORT
  ○ it is then ready to receive traffic
● CPU is allocated only during requests
● unused containers get terminated with SIGTERM
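The lifecycle above can be sketched as a minimal container entrypoint: listen on the port given in $PORT so the platform considers the instance ready, and shut down cleanly on SIGTERM. Only the $PORT and SIGTERM behavior come from the slide; the handler body and the 8080 default are illustrative.

```python
import os
import signal
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep request logging quiet

def serve():
    # Cloud Run tells the container which port to listen on via $PORT.
    port = int(os.environ.get("PORT", "8080"))
    server = ThreadingHTTPServer(("", port), Handler)

    # Idle instances are terminated with SIGTERM: stop accepting work
    # and return from serve_forever() instead of being killed abruptly.
    def on_sigterm(signum, frame):
        threading.Thread(target=server.shutdown).start()

    if threading.current_thread() is threading.main_thread():
        # signal handlers can only be installed from the main thread
        signal.signal(signal.SIGTERM, on_sigterm)

    server.serve_forever()  # returns after shutdown()
    server.server_close()
```

Calling serve() from the image's entrypoint script starts the request loop; the container becomes routable once the socket is open.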

Slide 13

networking: protocols

works on Cloud Run:
● HTTP/1
● HTTP/2 (h2c)
● WebSockets
● gRPC

no arbitrary/binary protocols

Slide 14

networking: TLS
● Cloud Run provisions TLS certificates
  ○ both for *.run.app & custom domains
● forced HTTP→HTTPS redirects
● your app should serve plaintext (unencrypted) responses
  ○ encryption is added later by the infra

Slide 15

serving: load balancing
● built-in L7 load balancing for a service
● routes traffic between container instances
● no unintentional sticky sessions (routing is per-request)

Slide 16

serving: traffic splitting/canary
● can split traffic between revisions (90% → v1, 10% → v2)
● used for canary deployments, progressive rollouts
● can have a domain for each revision:
  ○ v1---myservice.[...].run.app
  ○ commit-aeab13f---myservice.[...].run.app
  ○ latest---myservice.[...].run.app
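The 90/10 split can be modeled as per-request weighted random routing. This is an illustrative model, not Cloud Run's actual implementation; the revision names and weights come from the slide's example.

```python
import random

def pick_revision(splits, rand=random.random):
    """Choose a revision according to traffic percentages,
    e.g. {"v1": 90, "v2": 10}. `rand` is injectable for testing."""
    total = sum(splits.values())
    point = rand() * total
    cumulative = 0
    for revision, weight in splits.items():
        cumulative += weight
        if point < cumulative:
            return revision
    return revision  # guard against the floating-point edge at 1.0
```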

Slide 17

serving: advanced load balancing

break the glass & create a Google Cloud L7 HTTPS LB:
○ enable CDN
○ BYO TLS certs
○ multi-region load balancing/failover
○ IAP (Identity-Aware Proxy, a.k.a. BeyondCorp)

Slide 18

serving: concurrency limits
● functions/lambda typically: 1 request per container
● Cloud Run: supports concurrent requests (up to 250)

why?
● helps you limit your load
● informs autoscaling decisions
● overlapping requests are not double-charged

Slide 19

autoscaling
● based on in-flight HTTP requests (no CPU/memory targets like the Kubernetes HPA)
● concurrency=5 + inflight_requests=15 → 3 containers
● you can limit max container instances (prevents unbounded spending)
● no guarantee how long idle containers will be around
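The slide's arithmetic (concurrency=5, 15 in-flight requests → 3 containers) is a ceiling division, capped by the max-instances limit. A simplified model of the decision, not the real autoscaler:

```python
import math
from typing import Optional

def desired_instances(inflight_requests: int, concurrency: int,
                      max_instances: Optional[int] = None) -> int:
    """Enough instances that none exceeds its concurrency limit."""
    need = math.ceil(inflight_requests / concurrency)
    if max_instances is not None:
        # cap the instance count to prevent unbounded spending
        need = min(need, max_instances)
    return need
```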

Slide 20

serving: request/response limits
● max request timeout: 60 minutes
● no maximum request/response size (as much as you can recv or send)
● requests and responses are not buffered (bi-directional streaming)

Slide 21

pricing

pay only "during" requests (for CPU/memory)
● overlapping requests are not double-charged

networking egress costs + request count costs exist.
generous free tier (cloud.google.com/free)
[timeline: FREE → CHARGED → FREE]
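"Overlapping requests are not double-charged" means billable time is roughly the union of the request intervals, not their sum. A toy model of that idea (not the actual billing algorithm):

```python
def billable_seconds(requests):
    """requests: list of (start, end) times in seconds.
    Overlapping intervals are merged, so concurrent requests are
    charged once for the shared wall-clock time."""
    billable = 0.0
    current_start = current_end = None
    for start, end in sorted(requests):
        if current_end is None or start > current_end:
            if current_end is not None:
                billable += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        billable += current_end - current_start
    return billable
```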

Slide 22

cold start
● cold starts exist! (inactive services scaling up 0→1, a new container instance waking up to handle traffic)
● minimize cold starts by using "minimum instances"
  ○ these are kept "warm"
  ○ cost ~10% of the active (serving) time
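A back-of-envelope model of the "~10%" claim: a warm-but-idle minimum instance bills at a fraction of the active serving rate. Both the discount value and the function's shape are assumptions for illustration, not a pricing formula from the slide beyond the ~10% figure.

```python
def instance_cost(active_seconds: float, idle_seconds: float,
                  rate_per_second: float,
                  idle_discount: float = 0.10) -> float:
    """Cost of one instance: full rate while serving requests,
    a reduced rate (assumed ~10%) while kept warm and idle."""
    return (active_seconds * rate_per_second
            + idle_seconds * rate_per_second * idle_discount)
```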

Slide 23

kernel
● gVisor sandbox
  ○ Linux syscall emulation in userspace
  ○ used internally at Google to run untrusted workloads on Borg
  ○ low-level Linux APIs might not work:
    ■ iptables, some ioctls, …
● might change soon

Slide 24

filesystem/volumes
● files written to local disk:
  ○ are not persisted
  ○ count towards the available memory of the container instance
● (currently) no support for mounting external volumes
  ○ such as Cloud Filestore (NAS) or Cloud Storage

Slide 25

secrets/configmaps
● no current support
● upcoming support for Google Cloud Secret Manager
  ○ it will look like a Kubernetes secret volume/volumeMounts

Slide 26

container instance fleet
● individual container instances are not exposed to you (to keep things simple)
● auto-healing properties:
  ○ faulty containers (20 consecutive 5xx responses) are replaced
  ○ crash-looping containers are replaced

Slide 27

service-to-service calls
1. know the full URL (xxx.run.app)
2. get a JWT token
3. add it as an "Authorization" header to the outgoing request

Kubernetes or K8s+Istio experience shines here (for now).
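Steps 2 and 3 can be sketched against the GCE metadata server's identity endpoint, which returns a Google-signed ID token for a given audience. The helper names are mine; the endpoint itself is real but only resolvable from inside Google Cloud (GCE/GKE/Cloud Run), so this only builds the requests.

```python
from urllib.parse import quote
from urllib.request import Request

METADATA_IDENTITY_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/identity"
)

def identity_token_request(audience: str) -> Request:
    """Build the metadata-server request that returns a Google-signed
    ID token (JWT) for the given audience (the target service's URL)."""
    url = f"{METADATA_IDENTITY_URL}?audience={quote(audience, safe='')}"
    return Request(url, headers={"Metadata-Flavor": "Google"})

def authorized_request(target_url: str, token: str) -> Request:
    """Attach the token as a Bearer Authorization header."""
    return Request(target_url, headers={"Authorization": f"Bearer {token}"})
```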

Slide 28

Unsupported (for now) from Kubernetes
● sidecars / multiple containers per Pod
  ○ Knative supports this
● init containers
● health probes
  ○ Cloud Run has a primitive built-in startup health check
  ○ what do periodic probes mean when you don't have CPU allocated?
● mTLS/cert auth is still not possible

Slide 29

Kubernetes + Cloud Run = ?
● Cloud Run:
  ○ solid, highly-scalable serving layer
  ○ scales in real time, based on in-flight requests
  ○ designed with spiky workloads in mind
  ○ buffers requests during autoscaling
● Kubernetes:
  ○ we all typically over-provision for unpredicted load
  ○ CPU/memory metrics are a side-effect of load, and often collected with a delay
  ○ without a buffering "meat shield", excessive load will crash Pods

Slide 30

Hybrid?

Kubernetes → Cloud Run:
1. use GCP workload identity
2. get a JWT token from the metadata service + add it to the outbound request
3. set IAM permissions on Cloud Run
4. Cloud Run's serving layer automatically checks the JWT + its IAM permissions

Cloud Run → Kubernetes:
1. connect to the same VPC
2. provision an internal LB (or use VPC private IPs)
3. get a JWT token from the metadata service + add it to the outbound request
4. in your app, verify the JWT coming from Cloud Run + DIY ACLs
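For the "verify the JWT in your app" step, here is only the parsing half, to show the shape of the token. Real verification must check the signature against Google's public certificates (e.g. with the google-auth library) before any claim is trusted; the allowed() ACL helper and its use of the `email` claim are illustrative.

```python
import base64
import json

def jwt_claims(token: str) -> dict:
    """Parse the claims (middle) segment of a JWT.
    WARNING: parsing is not verification; trust no claim until the
    signature is checked against Google's public certificates."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

def allowed(token: str, allowed_emails: set) -> bool:
    """DIY ACL sketch: admit only specific caller service accounts."""
    return jwt_claims(token).get("email") in allowed_emails
```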

Slide 31

thanks

Documentation: https://cloud.run
FAQ: github.com/ahmetb/cloud-run-faq
twitter.com/ahmetb
github.com/ahmetb

Slide 32

Appendix: migrating a Kubernetes Deployment to Knative

Slide 33

Kubernetes Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-web
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello
      tier: web
  template:
    metadata:
      labels:
        app: hello
        tier: web
    spec:
      containers:
      - name: main
        image: gcr.io/google-samples/hello-app:1.0
        resources:
          limits:
            cpu: 100m
            memory: 256Mi

Kubernetes Service:

apiVersion: v1
kind: Service
metadata:
  name: hello-web
spec:
  type: LoadBalancer
  selector:
    app: hello
    tier: web
  ports:
  - port: 80
    targetPort: 8080

Slide 34

[same Kubernetes Deployment + Service YAML as slide 33]
annotation: no need, Knative will give us a $PORT

Slide 35

[same Kubernetes Deployment + Service YAML as slide 33]
annotation: no need for all these labels and selectors

Slide 36

[same Kubernetes Deployment + Service YAML as slide 33]
annotation: Knative autoscales

Slide 37

[same Kubernetes Deployment + Service YAML as slide 33]
annotation: Knative creates both internal and external endpoints by default

Slide 38

[same Kubernetes Deployment + Service YAML as slide 33]
annotation: no need for a container name if you have only one

Slide 39

[same Kubernetes Deployment + Service YAML as slide 33]

Slide 40

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-web
spec:
  template:
    spec:
      containers:
      - image: gcr.io/google-samples/hello-app:1.0
        resources:
          limits:
            cpu: 100m
            memory: 256Mi

Slide 41

apiVersion: apps/v1 → serving.knative.dev/v1
kind: Deployment → Service
metadata:
  name: hello-web
spec:
  template:
    spec:
      containers:
      - image: gcr.io/google-samples/hello-app:1.0
        resources:
          limits:
            cpu: 100m
            memory: 256Mi

Slide 42

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello-web
spec:
  template:
    spec:
      containers:
      - image: gcr.io/google-samples/hello-app:1.0
        resources:
          limits:
            cpu: 100m
            memory: 256Mi