
Cloud Run for Kubernetes users


Ahmet Alp Balkan

March 12, 2021

Transcript

  1. @ahmetb
    Cloud Run for
    Kubernetes users

  2. about me
    ● once loved/advocated for Kubernetes
    ● now I think it’s overkill for many
    you may know me from:
    ● kubectx / kubens
    ● Krew (kubectl plugin manager)
    ● Kubernetes NetworkPolicy recipes

  3. cloud.run
    Serverless containers on
    Google Cloud’s
    managed infrastructure

  4. [diagram: a spectrum from App Engine / Heroku / … to Kubernetes,
    with Cloud Run sitting in between]

  5. If you know Knative,
    Cloud Run is a hosted
    Knative Serving API
    [diagram: the Knative API (driven via UI / CLI / YAML) is served by
    Cloud Run on Google infra (Borg), by Knative open source on
    Kubernetes / GKE (Standard, Autopilot), and on Anthos / GKE on-prem]

  6. $ gcloud run deploy \
      --image=gcr.io/my-image:latest \
      --cpu=2 \
      --memory=512Mi \
      --allow-unauthenticated \
      --set-env-vars FOO=BAR \
      --max-instances=100 \
      --concurrency=5 \
      [...]
    imperative deployments

  7. Very similar to Kubernetes
    Deployment+Service objects
    ● Knative Service CRD
    ○ K8s Service+Deployment merged.
    ● same PodSpec as Kubernetes
    $ gcloud run services replace
    declarative deployments

  8. ● designed for stateless applications
    that serve HTTP-based protocols
    ● great for:
    ○ microservices
    ○ APIs
    ○ frontends
    ● not for:
    ○ databases
    ○ custom binary protocols
    ○ game servers
    ○ run-to-completion jobs
    what does it run?

  9. container contract
    ● Linux executables (x86-64 ABI)
    ● Deploy OCI container images
    No Windows or ARM support.

  10. CPU
    allocated only during requests (no background threads)
    ● very low/no CPU otherwise
    ● OK for garbage collection etc.
    ● not enough to push out metrics/traces
    [timeline: CPU off → CPU on during a request → CPU off]

  11. container lifecycle
    ● new containers are started on
    ○ first-time deployment
    ○ scale-up
    ○ failed health checks
    ● Cloud Run waits until the container is listening on $PORT
    ○ now ready to receive traffic
    ● CPU is allocated only during requests
    ● unused container gets terminated with SIGTERM
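    The lifecycle above can be sketched as a minimal Python app (a hypothetical example, not from the deck): listen on the port Cloud Run injects via $PORT, and handle the SIGTERM sent before an unused instance is reclaimed.

```python
import os
import signal
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer

def get_port(default: int = 8080) -> int:
    # Cloud Run tells the container which port to listen on via $PORT.
    return int(os.environ.get("PORT", default))

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"hello\n")

def main():
    server = HTTPServer(("0.0.0.0", get_port()), Handler)
    # Cloud Run sends SIGTERM before reclaiming an unused instance;
    # exit cleanly instead of being killed mid-request.
    signal.signal(signal.SIGTERM, lambda *_: sys.exit(0))
    server.serve_forever()  # considered "ready" once the socket is listening

if __name__ == "__main__":
    main()
```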

  12. networking: protocols
    works on Cloud Run:
    ● HTTP/1
    ● HTTP/2 (h2c)
    ● WebSockets
    ● gRPC
    no arbitrary/binary protocols

  13. networking: TLS
    ● Cloud Run provisions TLS certificates
    ○ both for *.run.app & custom domain
    ● forced HTTP→HTTPS redirects
    ● your app should serve unencrypted (plain HTTP) responses
    ○ encryption is added later by the infra

  14. ● built-in L7 load balancing for a service
    ● routes traffic between container instances
    ● no unintentional sticky sessions (per-request)
    serving: load balancing

  15. ● can split traffic between revisions
    (90% → v1, 10% → v2)
    ● used for canary deployments, progressive rollouts
    ● can have a domain for each revision:
    ○ v1---myservice.[...].run.app
    ○ commit-aeab13f---myservice.[...].run.app
    ○ latest---myservice.[...].run.app
    serving: traffic splitting/canary

  16. break the glass & create Google Cloud L7 HTTPS LB.
    ○ enable CDN
    ○ BYO TLS certs
    ○ multi-region load balancing/failover
    ○ IAP (Identity-Aware Proxy a.k.a. BeyondCorp)
    serving: advanced load balancing

  17. serving: concurrency limits
    ● typically functions/lambda: 1 req/container
    ● Cloud Run: supports concurrent requests (up to 250)
    why?
    ● helps you limit your load
    ● informs autoscaling decisions
    ● overlapping requests are not double-charged

  18. ● based on in-flight HTTP requests
    (no CPU/memory targets like Kubernetes HPA)
    ● concurrency=5 + inflight_requests=15 → 3 containers
    ● you can limit max container instances
    (prevent unbounded spending)
    ● no guarantee how long idle containers will be around
    autoscaling
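    The slide's example can be sketched as simple arithmetic (a simplified model; the real autoscaler considers more signals than this):

```python
import math

def desired_instances(inflight_requests, concurrency, max_instances=None):
    """Simplified request-based autoscaling: enough container instances
    that no instance exceeds its concurrency limit."""
    n = math.ceil(inflight_requests / concurrency)
    if max_instances is not None:
        # cap instances to prevent unbounded spending
        n = min(n, max_instances)
    return n

# concurrency=5 with 15 in-flight requests -> 3 containers, as on the slide
```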

  19. serving: request/response limits
    ● max request timeout: 60 minutes
    ● no maximum request/response size
    (as much as you can recv or send)
    ● requests or responses are not buffered
    (bi-directional streaming)

  20. pricing
    pay only "during" requests (for CPU/memory)
    ● overlapping requests are not double-charged
    networking egress costs + request count costs exist.
    generous free tier (cloud.google.com/free)
    [timeline: free → charged during a request → free]
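    "Overlapping requests are not double-charged" means billable time is the union of request intervals on an instance. A toy illustration of that model (not actual billing code):

```python
def billable_time(requests):
    """Union of (start, end) request intervals: two overlapping requests
    on one instance are charged as one continuous span, not twice."""
    total, cur_start, cur_end = 0, None, None
    for start, end in sorted(requests):
        if cur_end is None or start > cur_end:   # gap: close previous span
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:                                    # overlap: extend current span
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total

# two 10s requests overlapping for 5s are charged as 15s, not 20s
```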

  21. ● cold starts exist!
    (inactive services scaling up 0→1,
    a new container instance waking up to handle traffic)
    ● minimize cold starts by using "minimum instances".
    ○ these are kept "warm"
    ○ costs ~10% of the active (serving) time
    cold start

  22. ● gVisor sandbox
    ○ Linux syscall emulation in userspace
    ○ used internally at Google to run untrusted workloads on Borg
    ○ low-level Linux APIs might not work:
    ■ iptables, some ioctls, …
    ● might change soon
    kernel

  23. ● files written to local disk:
    ○ are not persisted
    ○ count toward the available memory of the container instance
    ● (currently) no support for mounting external volumes
    ○ such as Cloud Filestore (NAS) or Cloud Storage
    filesystem/volumes

  24. ● no current support
    ● upcoming support for Google Cloud Secret Manager
    ○ it will look like Kubernetes secret volume/volumeMounts
    secrets/configmaps

  25. ● individual container instances are
    not exposed to you (to keep things simple)
    ● auto-healing properties
    ○ faulty containers (20 consecutive 5xx) are replaced
    ○ crash-looping containers are replaced
    container instance fleet

  26. 1. Know the full URL (xxx.run.app)
    2. Get a JWT token
    3. Add as “Authorization” header to outgoing request
    Kubernetes or K8s+Istio experience shines here (for now).
    service-to-service calls
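    The three steps above, sketched in Python against the metadata server's identity-token endpoint (the target service URL is whatever xxx.run.app you are calling; this is an illustrative sketch, not an official client):

```python
import urllib.request

METADATA_URL = ("http://metadata.google.internal/computeMetadata/v1/"
                "instance/service-accounts/default/identity?audience={aud}")

def fetch_id_token(audience: str) -> str:
    # 2. the metadata server mints a Google-signed JWT for the given audience
    req = urllib.request.Request(METADATA_URL.format(aud=audience),
                                 headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()

def auth_header(token: str) -> dict:
    # 3. add the token as an "Authorization" header on the outgoing request
    return {"Authorization": f"Bearer {token}"}

def call_service(url: str) -> bytes:
    # 1. you must know the full URL (xxx.run.app) of the target service
    req = urllib.request.Request(url, headers=auth_header(fetch_id_token(url)))
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```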

  27. ● Sidecars / multiple containers per Pod
    ○ Knative supports this
    ● Init containers
    ● Health probes
    ○ Cloud Run has primitive built-in startup health check
    ○ What do periodic probes mean when you don't have CPU allocated?
    ● mTLS/cert auth is still not possible
    Kubernetes features unsupported (for now)

  28. ● Cloud Run:
    ○ solid highly-scalable serving layer
    ○ scales in real time, based on in-flight requests
    ○ designed with spiky workloads in mind
    ○ request buffering during autoscaling
    ● Kubernetes:
    ○ we all typically over-provision for unpredicted load
    ○ CPU/memory metrics are a side-effect
    of load, and often collected w/ a delay
    ○ without a buffering meat shield,
    excessive load will crash Pods.
    Kubernetes+Cloud Run=?

  29. Kubernetes → Cloud Run:
    1. use GCP workload identity
    2. get a JWT token from the metadata service + add it to the outbound request
    3. set IAM permission on Cloud Run
    4. the Cloud Run serving layer automatically checks the JWT + the caller's IAM permissions.
    Cloud Run → Kubernetes:
    1. connect to the same VPC
    2. provision an internal LB (or use VPC private IPs)
    3. get a JWT token from the metadata service + add it to the outbound request
    4. verify the JWT coming from Cloud Run in your app + DIY ACLs.
    Hybrid?

  30. thanks
    Documentation: https://cloud.run
    FAQ: github.com/ahmetb/cloud-run-faq
    twitter.com/ahmetb
    github.com/ahmetb

  31. Appendix
    Migrating a Kubernetes
    deployment to Knative

  32. Kubernetes Deployment:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: hello-web
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: hello
          tier: web
      template:
        metadata:
          labels:
            app: hello
            tier: web
        spec:
          containers:
          - name: main
            image: gcr.io/google-samples/hello-app:1.0
            resources:
              limits:
                cpu: 100m
                memory: 256Mi

    Kubernetes Service:
    apiVersion: v1
    kind: Service
    metadata:
      name: hello-web
    spec:
      type: LoadBalancer
      selector:
        app: hello
        tier: web
      ports:
      - port: 80
        targetPort: 8080

  33. (same Deployment + Service as slide 32, with an annotation:)
    no need, Knative will give us a $PORT

  34. (same Deployment + Service as slide 32, with an annotation:)
    no need for all these labels and selectors

  35. (same Deployment + Service as slide 32, with an annotation:)
    Knative autoscales

  36. (same Deployment + Service as slide 32, with an annotation:)
    Knative creates both internal and external endpoints by default

  37. (same Deployment + Service as slide 32, with an annotation:)
    No need for a container name if you have only one

  38. (same Deployment + Service as slide 32)

  39. apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: hello-web
    spec:
      template:
        spec:
          containers:
          - image: gcr.io/google-samples/hello-app:1.0
            resources:
              limits:
                cpu: 100m
                memory: 256Mi

  40. apiVersion: apps/v1 → serving.knative.dev/v1
    kind: Deployment → Service
    metadata:
      name: hello-web
    spec:
      template:
        spec:
          containers:
          - image: gcr.io/google-samples/hello-app:1.0
            resources:
              limits:
                cpu: 100m
                memory: 256Mi

  41. apiVersion: serving.knative.dev/v1
    kind: Service
    metadata:
      name: hello-web
    spec:
      template:
        spec:
          containers:
          - image: gcr.io/google-samples/hello-app:1.0
            resources:
              limits:
                cpu: 100m
                memory: 256Mi
