• great for:
  ◦ microservices
  ◦ APIs
  ◦ frontends
• not for:
  ◦ databases
  ◦ custom binary protocols
  ◦ game servers
  ◦ run-to-completion jobs
what does it run?
deployment
  ◦ scale-up
  ◦ failed health checks
• we wait until the container is listening on $PORT
  ◦ now ready to receive traffic
• CPU is allocated only during requests
• unused container gets terminated with SIGTERM
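A minimal sketch of a container entrypoint that follows the lifecycle above: it listens on the port given in $PORT and exits cleanly on SIGTERM. The handler class, the helper name `port_from_env`, and the 8080 fallback for local runs are assumptions for illustration, not something the slides prescribe.

```python
import os
import signal
from http.server import BaseHTTPRequestHandler, HTTPServer


def port_from_env(environ=os.environ, default=8080):
    """Return the port from $PORT, or a local default (assumed: 8080)."""
    return int(environ.get("PORT", default))


class Hello(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with a short body."""

    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve():
    """Listen on $PORT until the platform sends SIGTERM."""
    server = HTTPServer(("0.0.0.0", port_from_env()), Hello)

    def on_sigterm(signum, frame):
        # Raising SystemExit unwinds serve_forever() so cleanup can run.
        raise SystemExit(0)

    signal.signal(signal.SIGTERM, on_sigterm)
    try:
        server.serve_forever()
    finally:
        server.server_close()
```

Calling `serve()` starts the server. The container counts as ready once the socket is open, so slow initialization before `HTTPServer(...)` delays readiness.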
→ v2)
• used for canary deployments, progressive rollouts
• can have a domain for each revision:
  ◦ v1---myservice.[...].run.app
  ◦ commit-aeab13f---myservice.[...].run.app
  ◦ latest---myservice.[...].run.app
serving: traffic splitting/canary
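To make the idea of percentage-based traffic splitting concrete, here is a small sketch of weighted routing between revisions. It illustrates the concept only and is not how the Cloud Run serving layer is implemented; the function name `pick_revision` and the 90/10 split are assumptions.

```python
import random


def pick_revision(traffic, rng=random):
    """Pick a revision name according to integer percentage weights.

    `traffic` maps revision name -> percent and must sum to 100.
    """
    revisions = list(traffic)
    weights = [traffic[name] for name in revisions]
    if sum(weights) != 100:
        raise ValueError("traffic percentages must sum to 100")
    return rng.choices(revisions, weights=weights, k=1)[0]


# Hypothetical canary: 10% of requests go to the new revision.
canary_split = {"v1": 90, "v2": 10}
```

A progressive rollout then amounts to repeatedly shifting percentage from `v1` to `v2` while watching error rates.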
Run: supports concurrent requests (up to 250)
why?
• helps you limit your load
• informs autoscaling decisions
• overlapping requests are not double-charged
Kubernetes HPA)
• concurrency=5 + inflight_requests=15 → 3 containers
• you can limit max container instances (prevent unbounded spending)
• no guarantee how long idle containers will be around
autoscaling
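The arithmetic behind the example above (concurrency=5, 15 in-flight requests, 3 containers) can be sketched as a ceiling division with an optional cap. This is a simplified model under the assumption that only in-flight requests matter; the real autoscaler takes more signals into account, and the function name is hypothetical.

```python
import math


def desired_instances(inflight_requests, concurrency, max_instances=None):
    """Estimate how many container instances are needed.

    Each instance handles up to `concurrency` requests at once, so the
    estimate is the in-flight count divided by concurrency, rounded up.
    `max_instances` models the optional cap on spending.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    needed = math.ceil(inflight_requests / concurrency)
    if max_instances is not None:
        needed = min(needed, max_instances)
    return needed
```

With a cap in place, for example `desired_instances(15, 5, max_instances=2)`, the estimate stops at the cap and the excess requests have to wait.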
container instance waking up to handle traffic)
• minimize cold starts by using "minimum instances"
  ◦ these are kept "warm"
  ◦ costs ~10% of the active (serving) time
cold start
used internally at Google to run untrusted workloads on Borg
  ◦ low-level Linux APIs might not work:
    ▪ iptables, some ioctls, …
• might change soon
kernel
  ◦ counts towards available memory of container instance
• (currently) no support for mounting external volumes
  ◦ such as Cloud Filestore (NAS) or Cloud Storage
filesystem/volumes
this
• Init containers
• Health probes
  ◦ Cloud Run has a primitive built-in startup health check
  ◦ What do periodic probes mean when you don't have CPU allocated?
• mTLS/cert auth is still not possible
Unsupported (for now) from Kubernetes
real-time, based on in-flight requests
  ◦ designed with spiky workloads in mind
  ◦ request buffering during autoscaling
• Kubernetes:
  ◦ we all typically over-provision for unpredicted load
  ◦ CPU/memory metrics are a side-effect of load, and often collected w/ a delay
  ◦ without a buffering meat shield, excessive load will crash Pods
Kubernetes+Cloud Run=?
get a JWT token from metadata service + add it to outbound request
3. set IAM permission on Cloud Run
4. Cloud Run serving layer automatically checks the JWT + its IAM permissions

Cloud Run → Kubernetes:
1. connect to the same VPC
2. provision an internal LB (or use VPC private IPs)
3. get a JWT token from metadata service + add it to outbound request
4. verify the JWT (coming from Cloud Run) in your app + DIY ACLs
Hybrid?
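The "get a JWT token from metadata service + add it to outbound request" step can be sketched with the standard library alone. The sketch assumes the Google Cloud metadata server's identity endpoint and that the audience is the URL of the receiving service; the function names and any service URL you pass in are hypothetical, and `fetch_identity_token` only works from inside a Google Cloud workload.

```python
import urllib.parse
import urllib.request

# Identity-token endpoint of the metadata server (reachable only from
# inside Google Cloud workloads).
METADATA_IDENTITY_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/identity"
)


def identity_token_request(audience):
    """Build the metadata-server request for a JWT scoped to `audience`."""
    query = urllib.parse.urlencode({"audience": audience})
    return urllib.request.Request(
        f"{METADATA_IDENTITY_URL}?{query}",
        headers={"Metadata-Flavor": "Google"},
    )


def fetch_identity_token(audience, timeout=5):
    """Fetch the JWT as text; the response body is the token itself."""
    request = identity_token_request(audience)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def authorized_request(url, token):
    """Attach the JWT to an outbound request as a bearer token."""
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
```

On the Cloud Run side the serving layer validates this header for you; on the Kubernetes side your app has to verify the token's signature and audience itself before applying its own ACLs.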
  selector:
    matchLabels:
      app: hello
      tier: web
  template:
    metadata:
      labels:
        app: hello
        tier: web
    spec:
      containers:
      - name: main
        image: gcr.io/google-samples/hello-app:1.0
        resources:
          limits:
            cpu: 100m
            memory: 256Mi
# Kubernetes Deployment (above)
# Kubernetes Service (below)
apiVersion: v1
kind: Service
metadata:
  name: hello-web
spec:
  type: LoadBalancer
  selector:
    app: hello
    tier: web
  ports:
  - port: 80
    targetPort: 8080
• no need, Knative will give us a $PORT
• no need for all these labels and selectors
• no need for a container name if you have only one