
Cloud Run for Kubernetes users


Ahmet Alp Balkan

March 12, 2021

Transcript

  1. @ahmetb
    Cloud Run for
    Kubernetes users

  2. about me
    ● once loved/advocated for Kubernetes
    ● now I think it’s overkill for many
    you may know me from:
    ● kubectx / kubens
    ● Krew (kubectl plugin manager)
    ● Kubernetes NetworkPolicy recipes

  3. cloud.run
    Serverless containers on
    Google Cloud’s
    managed infrastructure

  4. [diagram: a spectrum from App Engine / Heroku / … to Kubernetes,
    with Cloud Run sitting in between]

  5. If you know Knative,
    Cloud Run is a hosted
    Knative Serving API
    [diagram: the Knative API (driven via UI / CLI / YAML) is served by
    Cloud Run on Google infra (Borg), by Knative open source on
    Kubernetes / GKE (Standard, Autopilot), and on Anthos / GKE on-prem]

  6. $ gcloud run deploy \
      --image=gcr.io/my-image:latest \
      --cpu=2 \
      --memory=512Mi \
      --allow-unauthenticated \
      --set-env-vars FOO=BAR \
      --max-instances=100 \
      --concurrency=5 \
      [...]
    imperative deployments

  7. Very similar to Kubernetes
    Deployment+Service objects
    ● Knative Service CRD
    ○ K8s Service+Deployment merged.
    ● same PodSpec as Kubernetes
    $ gcloud run services replace
    declarative deployments

  8. ● designed for stateless applications
    that serve HTTP-based protocols
    ● great for:
    ○ microservices
    ○ APIs
    ○ frontends
    ● not for:
    ○ databases
    ○ custom binary protocols
    ○ game servers
    ○ run-to-completion jobs
    what does it run?

  9. container contract
    ● Linux executables (x86-64 ABI)
    ● Deploy OCI container images
    No Windows or ARM support.

  10. CPU
    allocated only during requests (no background threads)
    ● very low/no CPU otherwise
    ● OK for garbage collection etc.
    ● not enough to push out metrics/traces
    [timeline: CPU off → CPU on during a request → CPU off]

  11. container lifecycle
    ● new containers are started on
    ○ first-time deployment
    ○ scale-up
    ○ failed health checks
    ● Cloud Run waits until the container is listening on $PORT
    ○ now ready to receive traffic
    ● CPU is allocated only during requests
    ● unused container gets terminated with SIGTERM
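    The lifecycle above can be sketched as a minimal Python app (a hypothetical example, not from the deck): listen on the port Cloud Run injects via $PORT, and handle the SIGTERM sent before an unused instance is reclaimed.

```python
import os
import signal
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer

def get_port(default: int = 8080) -> int:
    # Cloud Run tells the container which port to listen on via $PORT.
    return int(os.environ.get("PORT", default))

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"hello\n")

def main():
    server = HTTPServer(("0.0.0.0", get_port()), Handler)
    # Cloud Run sends SIGTERM before reclaiming an unused instance;
    # exit cleanly instead of being killed mid-request.
    signal.signal(signal.SIGTERM, lambda *_: sys.exit(0))
    server.serve_forever()  # considered "ready" once the socket is listening

if __name__ == "__main__":
    main()
```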

  12. networking: protocols
    works on Cloud Run:
    ● HTTP/1
    ● HTTP/2 (h2c)
    ● WebSockets
    ● gRPC
    no arbitrary/binary protocols

  13. networking: TLS
    ● Cloud Run provisions TLS certificates
    ○ both for *.run.app & custom domain
    ● forced HTTP→HTTPS redirects
    ● your app should serve unencrypted (plain HTTP) responses
    ○ encryption is added later by the infra

  14. ● built-in L7 load balancing for a service
    ● routes traffic between container instances
    ● no unintentional sticky sessions (per-request)
    serving: load balancing

  15. ● can split traffic between revisions
    (90% → v1, 10% → v2)
    ● used for canary deployments, progressive rollouts
    ● can have a domain for each revision:
    ○ v1---myservice.[...].run.app
    ○ commit-aeab13f---myservice.[...].run.app
    ○ latest---myservice.[...].run.app
    serving: traffic splitting/canary

  16. break the glass & create Google Cloud L7 HTTPS LB.
    ○ enable CDN
    ○ BYO TLS certs
    ○ multi-region load balancing/failover
    ○ IAP (Identity-Aware Proxy a.k.a. BeyondCorp)
    serving: advanced load balancing

  17. serving: concurrency limits
    ● typically functions/lambda: 1 req/container
    ● Cloud Run: supports concurrent requests (up to 250)
    why?
    ● helps you limit your load
    ● informs autoscaling decisions
    ● overlapping requests are not double-charged

  18. ● based on in-flight HTTP requests
    (no CPU/memory targets like Kubernetes HPA)
    ● concurrency=5 + inflight_requests=15 → 3 containers
    ● you can limit max container instances
    (prevent unbounded spending)
    ● no guarantee how long idle containers will be around
    autoscaling
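    The slide's example can be sketched as simple arithmetic (a simplified model; the real autoscaler considers more signals than this):

```python
import math

def desired_instances(inflight_requests, concurrency, max_instances=None):
    """Simplified request-based autoscaling: enough container instances
    that no instance exceeds its concurrency limit."""
    n = math.ceil(inflight_requests / concurrency)
    if max_instances is not None:
        # cap instances to prevent unbounded spending
        n = min(n, max_instances)
    return n

# concurrency=5 with 15 in-flight requests -> 3 containers, as on the slide
```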

  19. serving: request/response limits
    ● max request timeout: 60 minutes
    ● no maximum request/response size
    (as much as you can recv or send)
    ● requests or responses are not buffered
    (bi-directional streaming)

  20. pricing
    pay only "during" requests (for CPU/memory)
    ● overlapping requests are not double-charged
    networking egress costs + request count costs exist.
    generous free tier (cloud.google.com/free)
    [timeline: free → charged during a request → free]
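    "Overlapping requests are not double-charged" means billable time is the union of request intervals on an instance. A toy illustration of that model (not actual billing code):

```python
def billable_time(requests):
    """Union of (start, end) request intervals: two overlapping requests
    on one instance are charged as one continuous span, not twice."""
    total, cur_start, cur_end = 0, None, None
    for start, end in sorted(requests):
        if cur_end is None or start > cur_end:   # gap: close previous span
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:                                    # overlap: extend current span
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total

# two 10s requests overlapping for 5s are charged as 15s, not 20s
```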

  21. ● cold starts exist!
    (inactive services scaling up 0→1,
    a new container instance waking up to handle traffic)
    ● minimize cold starts by using "minimum instances".
    ○ these are kept "warm"
    ○ costs ~10% of the active (serving) time
    cold start

  22. ● gVisor sandbox
    ○ Linux syscall emulation in userspace
    ○ used internally at Google to run untrusted workloads on Borg
    ○ low-level Linux APIs might not work:
    ■ iptables, some ioctls, …
    ● might change soon
    kernel

  23. ● files written to local disk:
    ○ are not persisted
    ○ count toward the available memory of the container instance
    ● (currently) no support for mounting external volumes
    ○ such as Cloud Filestore (NAS) or Cloud Storage
    filesystem/volumes

  24. ● no current support
    ● upcoming support for Google Cloud Secret Manager
    ○ it will look like Kubernetes secret volume/volumeMounts
    secrets/configmaps

  25. ● individual container instances are
    not exposed to you (to keep things simple)
    ● auto-healing properties
    ○ faulty containers (20 consecutive 5xx) are replaced
    ○ crash-looping containers are replaced
    container instance fleet

  26. 1. Know the full URL (xxx.run.app)
    2. Get a JWT token
    3. Add as “Authorization” header to outgoing request
    Kubernetes or K8s+Istio experience shines here (for now).
    service-to-service calls
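    The three steps above, sketched in Python against the metadata server's identity-token endpoint (the target service URL is whatever xxx.run.app you are calling; this is an illustrative sketch, not an official client):

```python
import urllib.request

METADATA_URL = ("http://metadata.google.internal/computeMetadata/v1/"
                "instance/service-accounts/default/identity?audience={aud}")

def fetch_id_token(audience: str) -> str:
    # 2. the metadata server mints a Google-signed JWT for the given audience
    req = urllib.request.Request(METADATA_URL.format(aud=audience),
                                 headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()

def auth_header(token: str) -> dict:
    # 3. add the token as an "Authorization" header on the outgoing request
    return {"Authorization": f"Bearer {token}"}

def call_service(url: str) -> bytes:
    # 1. you must know the full URL (xxx.run.app) of the target service
    req = urllib.request.Request(url, headers=auth_header(fetch_id_token(url)))
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```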

  27. ● Sidecars / multiple containers per Pod
    ○ Knative supports this
    ● Init containers
    ● Health probes
    ○ Cloud Run has primitive built-in startup health check
    ○ What do periodic probes mean when you don't have CPU allocated?
    ● mTLS/cert auth is still not possible
    Kubernetes features unsupported (for now)

  28. ● Cloud Run:
    ○ solid highly-scalable serving layer
    ○ scales in real time, based on in-flight requests
    ○ designed with spiky workloads in mind
    ○ request buffering during autoscaling
    ● Kubernetes:
    ○ we all typically over-provision for unpredicted load
    ○ CPU/memory metrics are a side-effect
    of load, and often collected w/ a delay
    ○ without a buffering meat shield,
    excessive load will crash Pods.
    Kubernetes+Cloud Run=?

  29. Kubernetes → Cloud Run:
    1. use GCP workload identity
    2. get a JWT token from the metadata service + add it to the outbound request
    3. set IAM permission on Cloud Run
    4. the Cloud Run serving layer automatically checks the JWT + the caller's IAM permissions.
    Cloud Run → Kubernetes:
    1. connect to the same VPC
    2. provision an internal LB (or use VPC private IPs)
    3. get a JWT token from the metadata service + add it to the outbound request
    4. verify the JWT coming from Cloud Run in your app + DIY ACLs.
    Hybrid?

  30. thanks
    Documentation: https://cloud.run
    FAQ: github.com/ahmetb/cloud-run-faq
    twitter.com/ahmetb
    github.com/ahmetb

  31. Appendix
    Migrating a Kubernetes
    deployment to Knative

  32. Kubernetes Deployment:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: hello-web
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: hello
          tier: web
      template:
        metadata:
          labels:
            app: hello
            tier: web
        spec:
          containers:
          - name: main
            image: gcr.io/google-samples/hello-app:1.0
            resources:
              limits:
                cpu: 100m
                memory: 256Mi

    Kubernetes Service:
    apiVersion: v1
    kind: Service
    metadata:
      name: hello-web
    spec:
      type: LoadBalancer
      selector:
        app: hello
        tier: web
      ports:
      - port: 80
        targetPort: 8080

  33. (same Deployment + Service as slide 32, with an annotation:)
    no need, Knative will give us a $PORT

  34. (same Deployment + Service as slide 32, with an annotation:)
    no need for all these labels and selectors

  35. (same Deployment + Service as slide 32, with an annotation:)
    Knative autoscales

  36. (same Deployment + Service as slide 32, with an annotation:)
    Knative creates both internal and external endpoints by default

  37. (same Deployment + Service as slide 32, with an annotation:)
    No need for a container name if you have only one

  38. (same Deployment + Service as slide 32)

  39. apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: hello-web
    spec:
      template:
        spec:
          containers:
          - image: gcr.io/google-samples/hello-app:1.0
            resources:
              limits:
                cpu: 100m
                memory: 256Mi

  40. apiVersion: apps/v1 → serving.knative.dev/v1
    kind: Deployment → Service
    metadata:
      name: hello-web
    spec:
      template:
        spec:
          containers:
          - image: gcr.io/google-samples/hello-app:1.0
            resources:
              limits:
                cpu: 100m
                memory: 256Mi

  41. apiVersion: serving.knative.dev/v1
    kind: Service
    metadata:
      name: hello-web
    spec:
      template:
        spec:
          containers:
          - image: gcr.io/google-samples/hello-app:1.0
            resources:
              limits:
                cpu: 100m
                memory: 256Mi
