Slide 1

@ahmetb Cloud Run for Kubernetes users

Slide 2

about me
● once loved/advocated for Kubernetes
● now I think it's overkill for many

you may know me from:
● kubectx / kubens
● Krew (kubectl plugin manager)
● Kubernetes NetworkPolicy recipes

Slide 3

cloud.run Serverless containers on Google Cloud’s managed infrastructure

Slide 4

App Engine, Heroku, .. Kubernetes Cloud Run

Slide 5

DEMO

Slide 6

If you know Knative, Cloud Run is a hosted Knative Serving API

[diagram]
● Knative API (UI / CLI / YAML), implemented by:
  ○ Cloud Run → Google infra (Borg)
  ○ Cloud Run for Anthos → Knative open source on Kubernetes / GKE (GKE Standard, GKE Autopilot, GKE-on-prem)

Slide 7

imperative deployments

$ gcloud run deploy \
    --image=gcr.io/my-image:latest \
    --cpu=2 \
    --memory=512Mi \
    --allow-unauthenticated \
    --set-env-vars FOO=BAR \
    --max-instances=100 \
    --concurrency=5 \
    [...]

Slide 8

declarative deployments

Very similar to Kubernetes Deployment+Service objects
● Knative Service CRD
  ○ K8s Service+Deployment merged
● same PodSpec as Kubernetes

$ gcloud run services replace

Slide 9

what does it run?
● designed for stateless applications serving HTTP-based protocols
● great for:
  ○ microservices
  ○ APIs
  ○ frontends
● not for:
  ○ databases
  ○ custom binary protocols
  ○ game servers
  ○ run-to-completion jobs

Slide 10

container contract
● Linux executables (x86-64 ABI)
● deploy OCI container images

No Windows or ARM support.

Slide 11

CPU allocated only during requests (no background threads)
● very low/no CPU otherwise
● OK for garbage collection etc.
● not enough to push out metrics/traces
[timeline: CPU OFF → CPU ON → CPU OFF]

Slide 12

container lifecycle
● new containers are started on:
  ○ first-time deployment
  ○ scale-up
  ○ failed health checks
● we wait until the container is listening on $PORT
  ○ it is then ready to receive traffic
● CPU is allocated only during requests
● unused containers get terminated with SIGTERM
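The lifecycle above can be sketched as a minimal container entrypoint: listen on the port given in $PORT so the platform considers the instance ready, and shut down cleanly on SIGTERM. Only the $PORT and SIGTERM behavior come from the slide; the handler body and the 8080 default are illustrative.

```python
import os
import signal
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep request logging quiet

def serve():
    # Cloud Run tells the container which port to listen on via $PORT.
    port = int(os.environ.get("PORT", "8080"))
    server = ThreadingHTTPServer(("", port), Handler)

    # Idle instances are terminated with SIGTERM: stop accepting work
    # and return from serve_forever() instead of being killed abruptly.
    def on_sigterm(signum, frame):
        threading.Thread(target=server.shutdown).start()

    if threading.current_thread() is threading.main_thread():
        # signal handlers can only be installed from the main thread
        signal.signal(signal.SIGTERM, on_sigterm)

    server.serve_forever()  # returns after shutdown()
    server.server_close()
```

Calling serve() from the image's entrypoint script starts the request loop; the container becomes routable once the socket is open.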

Slide 13

networking: protocols

works on Cloud Run:
● HTTP/1
● HTTP/2 (h2c)
● WebSockets
● gRPC

no arbitrary/binary protocols

Slide 14

networking: TLS
● Cloud Run provisions TLS certificates
  ○ both for *.run.app & custom domains
● forced HTTP→HTTPS redirects
● your app should serve plaintext (unencrypted) responses
  ○ encryption is added later by the infra

Slide 15

serving: load balancing
● built-in L7 load balancing for a service
● routes traffic between container instances
● no unintentional sticky sessions (routing is per-request)

Slide 16

serving: traffic splitting/canary
● can split traffic between revisions (90% → v1, 10% → v2)
● used for canary deployments, progressive rollouts
● can have a domain for each revision:
  ○ v1---myservice.[...].run.app
  ○ commit-aeab13f---myservice.[...].run.app
  ○ latest---myservice.[...].run.app
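The 90/10 split can be modeled as per-request weighted random routing. This is an illustrative model, not Cloud Run's actual implementation; the revision names and weights come from the slide's example.

```python
import random

def pick_revision(splits, rand=random.random):
    """Choose a revision according to traffic percentages,
    e.g. {"v1": 90, "v2": 10}. `rand` is injectable for testing."""
    total = sum(splits.values())
    point = rand() * total
    cumulative = 0
    for revision, weight in splits.items():
        cumulative += weight
        if point < cumulative:
            return revision
    return revision  # guard against the floating-point edge at 1.0
```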

Slide 17

serving: advanced load balancing

break the glass & create a Google Cloud L7 HTTPS LB:
○ enable CDN
○ BYO TLS certs
○ multi-region load balancing/failover
○ IAP (Identity-Aware Proxy, a.k.a. BeyondCorp)

Slide 18

serving: concurrency limits
● functions/lambda typically: 1 request per container
● Cloud Run: supports concurrent requests (up to 250)

why?
● helps you limit your load
● informs autoscaling decisions
● overlapping requests are not double-charged

Slide 19

autoscaling
● based on in-flight HTTP requests (no CPU/memory targets like the Kubernetes HPA)
● concurrency=5 + inflight_requests=15 → 3 containers
● you can limit max container instances (prevents unbounded spending)
● no guarantee how long idle containers will be around
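The slide's arithmetic (concurrency=5, 15 in-flight requests → 3 containers) is a ceiling division, capped by the max-instances limit. A simplified model of the decision, not the real autoscaler:

```python
import math
from typing import Optional

def desired_instances(inflight_requests: int, concurrency: int,
                      max_instances: Optional[int] = None) -> int:
    """Enough instances that none exceeds its concurrency limit."""
    need = math.ceil(inflight_requests / concurrency)
    if max_instances is not None:
        # cap the instance count to prevent unbounded spending
        need = min(need, max_instances)
    return need
```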

Slide 20

serving: request/response limits
● max request timeout: 60 minutes
● no maximum request/response size (as much as you can recv or send)
● requests and responses are not buffered (bi-directional streaming)

Slide 21

pricing

pay only "during" requests (for CPU/memory)
● overlapping requests are not double-charged

networking egress costs + request count costs exist.
generous free tier (cloud.google.com/free)
[timeline: FREE → CHARGED → FREE]
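"Overlapping requests are not double-charged" means billable time is roughly the union of the request intervals, not their sum. A toy model of that idea (not the actual billing algorithm):

```python
def billable_seconds(requests):
    """requests: list of (start, end) times in seconds.
    Overlapping intervals are merged, so concurrent requests are
    charged once for the shared wall-clock time."""
    billable = 0.0
    current_start = current_end = None
    for start, end in sorted(requests):
        if current_end is None or start > current_end:
            if current_end is not None:
                billable += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        billable += current_end - current_start
    return billable
```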

Slide 22

cold start
● cold starts exist! (inactive services scaling up 0→1, a new container instance waking up to handle traffic)
● minimize cold starts by using "minimum instances"
  ○ these are kept "warm"
  ○ cost ~10% of the active (serving) time
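A back-of-envelope model of the "~10%" claim: a warm-but-idle minimum instance bills at a fraction of the active serving rate. Both the discount value and the function's shape are assumptions for illustration, not a pricing formula from the slide beyond the ~10% figure.

```python
def instance_cost(active_seconds: float, idle_seconds: float,
                  rate_per_second: float,
                  idle_discount: float = 0.10) -> float:
    """Cost of one instance: full rate while serving requests,
    a reduced rate (assumed ~10%) while kept warm and idle."""
    return (active_seconds * rate_per_second
            + idle_seconds * rate_per_second * idle_discount)
```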

Slide 23

kernel
● gVisor sandbox
  ○ Linux syscall emulation in userspace
  ○ used internally at Google to run untrusted workloads on Borg
  ○ low-level Linux APIs might not work:
    ■ iptables, some ioctls, …
● might change soon

Slide 24

filesystem/volumes
● files written to local disk:
  ○ are not persisted
  ○ count towards the available memory of the container instance
● (currently) no support for mounting external volumes
  ○ such as Cloud Filestore (NAS) or Cloud Storage

Slide 25

secrets/configmaps
● no current support
● upcoming support for Google Cloud Secret Manager
  ○ it will look like a Kubernetes secret volume/volumeMounts

Slide 26

container instance fleet
● individual container instances are not exposed to you (to keep things simple)
● auto-healing properties:
  ○ faulty containers (20 consecutive 5xx responses) are replaced
  ○ crash-looping containers are replaced

Slide 27

service-to-service calls
1. know the full URL (xxx.run.app)
2. get a JWT token
3. add it as an "Authorization" header to the outgoing request

Kubernetes or K8s+Istio experience shines here (for now).
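Steps 2 and 3 can be sketched against the GCE metadata server's identity endpoint, which returns a Google-signed ID token for a given audience. The helper names are mine; the endpoint itself is real but only resolvable from inside Google Cloud (GCE/GKE/Cloud Run), so this only builds the requests.

```python
from urllib.parse import quote
from urllib.request import Request

METADATA_IDENTITY_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/identity"
)

def identity_token_request(audience: str) -> Request:
    """Build the metadata-server request that returns a Google-signed
    ID token (JWT) for the given audience (the target service's URL)."""
    url = f"{METADATA_IDENTITY_URL}?audience={quote(audience, safe='')}"
    return Request(url, headers={"Metadata-Flavor": "Google"})

def authorized_request(target_url: str, token: str) -> Request:
    """Attach the token as a Bearer Authorization header."""
    return Request(target_url, headers={"Authorization": f"Bearer {token}"})
```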

Slide 28

Unsupported (for now) from Kubernetes
● sidecars / multiple containers per Pod
  ○ Knative supports this
● init containers
● health probes
  ○ Cloud Run has a primitive built-in startup health check
  ○ what do periodic probes mean when you don't have CPU allocated?
● mTLS/cert auth is still not possible

Slide 29

Kubernetes + Cloud Run = ?
● Cloud Run:
  ○ solid, highly-scalable serving layer
  ○ scales in real time, based on in-flight requests
  ○ designed with spiky workloads in mind
  ○ buffers requests during autoscaling
● Kubernetes:
  ○ we all typically over-provision for unpredicted load
  ○ CPU/memory metrics are a side-effect of load, and often collected with a delay
  ○ without a buffering "meat shield", excessive load will crash Pods

Slide 30

Hybrid?

Kubernetes → Cloud Run:
1. use GCP workload identity
2. get a JWT token from the metadata service + add it to the outbound request
3. set IAM permissions on Cloud Run
4. Cloud Run's serving layer automatically checks the JWT + its IAM permissions

Cloud Run → Kubernetes:
1. connect to the same VPC
2. provision an internal LB (or use VPC private IPs)
3. get a JWT token from the metadata service + add it to the outbound request
4. in your app, verify the JWT coming from Cloud Run + DIY ACLs
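For the "verify the JWT in your app" step, here is only the parsing half, to show the shape of the token. Real verification must check the signature against Google's public certificates (e.g. with the google-auth library) before any claim is trusted; the allowed() ACL helper and its use of the `email` claim are illustrative.

```python
import base64
import json

def jwt_claims(token: str) -> dict:
    """Parse the claims (middle) segment of a JWT.
    WARNING: parsing is not verification; trust no claim until the
    signature is checked against Google's public certificates."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

def allowed(token: str, allowed_emails: set) -> bool:
    """DIY ACL sketch: admit only specific caller service accounts."""
    return jwt_claims(token).get("email") in allowed_emails
```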

Slide 31

thanks

Documentation: https://cloud.run
FAQ: github.com/ahmetb/cloud-run-faq
twitter.com/ahmetb
github.com/ahmetb

Slide 32

Appendix: migrating a Kubernetes Deployment to Knative

Slide 33

Kubernetes Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-web
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello
      tier: web
  template:
    metadata:
      labels:
        app: hello
        tier: web
    spec:
      containers:
      - name: main
        image: gcr.io/google-samples/hello-app:1.0
        resources:
          limits:
            cpu: 100m
            memory: 256Mi

Kubernetes Service:

apiVersion: v1
kind: Service
metadata:
  name: hello-web
spec:
  type: LoadBalancer
  selector:
    app: hello
    tier: web
  ports:
  - port: 80
    targetPort: 8080

Slide 34

[same Kubernetes Deployment + Service YAML as slide 33]
annotation: no need, Knative will give us a $PORT

Slide 35

[same Kubernetes Deployment + Service YAML as slide 33]
annotation: no need for all these labels and selectors

Slide 36

[same Kubernetes Deployment + Service YAML as slide 33]
annotation: Knative autoscales

Slide 37

[same Kubernetes Deployment + Service YAML as slide 33]
annotation: Knative creates both internal and external endpoints by default

Slide 38

[same Kubernetes Deployment + Service YAML as slide 33]
annotation: no need for a container name if you have only one

Slide 39

[same Kubernetes Deployment + Service YAML as slide 33]

Slide 40

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-web
spec:
  template:
    spec:
      containers:
      - image: gcr.io/google-samples/hello-app:1.0
        resources:
          limits:
            cpu: 100m
            memory: 256Mi

Slide 41

apiVersion: apps/v1 → serving.knative.dev/v1
kind: Deployment → Service
metadata:
  name: hello-web
spec:
  template:
    spec:
      containers:
      - image: gcr.io/google-samples/hello-app:1.0
        resources:
          limits:
            cpu: 100m
            memory: 256Mi

Slide 42

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello-web
spec:
  template:
    spec:
      containers:
      - image: gcr.io/google-samples/hello-app:1.0
        resources:
          limits:
            cpu: 100m
            memory: 256Mi