
Cloud Run for Kubernetes users


Ahmet Alp Balkan

March 12, 2021

Transcript

  1. @ahmetb
    Cloud Run for
    Kubernetes users


  2. about me
    ● once loved/advocated for Kubernetes
    ● now I think it’s overkill for many
    you may know me from:
    ● kubectx / kubens
    ● Krew (kubectl plugin manager)
    ● Kubernetes NetworkPolicy recipes


  3. cloud.run
    Serverless containers on
    Google Cloud’s
    managed infrastructure


  4. App Engine,
    Heroku, ..
    Kubernetes
    Cloud Run


  5. DEMO


  6. If you know Knative,
    Cloud Run is a hosted
    Knative Serving API
    (diagram: the Knative Serving API, driven via UI, CLI, or YAML, is served either by Cloud Run on Google's own infrastructure (Borg), or by open-source Knative on Kubernetes/GKE: GKE Standard, GKE Autopilot, or Anthos / GKE on-prem)


  7. $ gcloud run deploy \
    --image=gcr.io/my-image:latest \
    --cpu=2 \
    --memory=512Mi \
    --allow-unauthenticated \
    --set-env-vars FOO=BAR \
    --max-instances=100 \
    --concurrency=5 \
    [...]
    imperative deployments


  8. Very similar to Kubernetes
    Deployment+Service objects
    ● Knative Service CRD
    ○ K8s Service+Deployment merged.
    ● same PodSpec as Kubernetes
    $ gcloud run services replace
    declarative deployments
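    (illustrative sketch of the declarative flow, not from the slides; "myservice" is a hypothetical name)
    # export the live service as a Knative Service manifest, edit it, apply it back
    $ gcloud run services describe myservice --format=export > service.yaml
    $ gcloud run services replace service.yaml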


  9. ● designed for stateless applications
    that are serving HTTP-based protocols
    ● great for:
    ○ microservices
    ○ APIs
    ○ frontends
    ● not for:
    ○ databases
    ○ custom binary protocols
    ○ game servers
    ○ run-to-completion jobs
    what does it run?


  10. container contract
    ● Linux executables (x86-64 ABI)
    ● Deploy OCI container images
    No Windows or ARM support.


  11. CPU
    allocated only during requests (no background threads)
    ● very low / no CPU otherwise
    ● OK for garbage collection etc.
    ● not enough to push out metrics/traces
    (timeline: CPU OFF → CPU ON during a request → CPU OFF)


  12. container lifecycle
    ● new containers are started on
    ○ first-time deployment
    ○ scale-up
    ○ failed health checks
    ● we wait until the container is listening on $PORT
    ○ now ready to receive traffic
    ● CPU is allocated only during requests
    ● unused container gets terminated with SIGTERM
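    (illustrative sketch of the contract, not from the slides: the process just has to listen on $PORT; Python's built-in HTTP server stands in for a real app here)
    # Cloud Run injects $PORT (8080 by default); start a server listening on it
    exec python3 -m http.server "${PORT:-8080}"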


  13. networking: protocols
    works on Cloud Run:
    ● HTTP/1
    ● HTTP/2 (h2c)
    ● WebSockets
    ● gRPC
    no arbitrary/binary protocols


  14. networking: TLS
    ● Cloud Run provisions TLS certificates
    ○ both for *.run.app & custom domain
    ● forced HTTP→HTTPS redirects
    ● your app should serve plain (unencrypted) HTTP responses
    ○ TLS is added later by the infrastructure


  15. ● built-in L7 load balancing for a service
    ● routes traffic between container instances
    ● no unintentional sticky sessions (per-request)
    serving: load balancing


  16. ● can split traffic between revisions
    (90% → v1, 10% → v2)
    ● used for canary deployments, progressive rollouts
    ● can have a domain for each revision:
    ○ v1---myservice.[...].run.app
    ○ commit-aeab13f---myservice.[...].run.app
    ○ latest---myservice.[...].run.app
    serving: traffic splitting/canary
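    (illustrative sketch, not from the slides; service and revision names are hypothetical)
    # shift 90% of traffic to one revision and 10% to another
    $ gcloud run services update-traffic myservice \
        --to-revisions=myservice-00001-abc=90,myservice-00002-def=10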


  17. break the glass & create a Google Cloud L7 HTTPS LB.
    ○ enable CDN
    ○ BYO TLS certs
    ○ multi-region load balancing/failover
    ○ IAP (Identity-Aware Proxy a.k.a. BeyondCorp)
    serving: advanced load balancing
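    (illustrative sketch, not from the slides; names and region are hypothetical, and the remaining LB pieces (backend service, URL map, proxy, forwarding rule) are elided)
    # expose the Cloud Run service through a global L7 HTTPS LB via a serverless NEG
    $ gcloud compute network-endpoint-groups create myservice-neg \
        --region=us-central1 \
        --network-endpoint-type=serverless \
        --cloud-run-service=myservice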


  18. serving: concurrency limits
    ● typically functions/lambda: 1 req/container
    ● Cloud Run: supports concurrent requests (up to 250)
    why?
    ● helps you limit your load
    ● informs autoscaling decisions
    ● overlapping requests are not double-charged


  19. ● based on in-flight HTTP requests
    (no CPU/memory targets like Kubernetes HPA)
    ● concurrency=5 + inflight_requests=15 → 3 containers
    ● you can limit max container instances
    (prevent unbounded spending)
    ● no guarantee how long idle containers will be around
    autoscaling
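    (illustrative sketch, not from the slides; "myservice" is hypothetical)
    # with concurrency=5, 15 in-flight requests need ceil(15/5) = 3 instances;
    # cap the instance count to bound spending
    $ gcloud run services update myservice --concurrency=5 --max-instances=100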


  20. serving: request/response limits
    ● max request timeout: 60 minutes
    ● no maximum request/response size
    (as much as you can recv or send)
    ● requests or responses are not buffered
    (bi-directional streaming)


  21. pricing
    pay only "during" requests (for CPU/memory)
    ● overlapping requests are not double-charged
    networking egress costs + request count costs exist.
    generous free tier (cloud.google.com/free)
    (timeline: FREE → CHARGED during a request → FREE)


  22. ● cold starts exist!
    (inactive services scaling up 0→1,
    new container instance waking up to handle traffic)
    ● minimize cold starts by using "minimum instances".
    ○ these are kept "warm"
    ○ costs ~10% of the active (serving) time
    cold start
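    (illustrative sketch, not from the slides; "myservice" is hypothetical)
    # keep one instance warm to avoid 0→1 cold starts
    $ gcloud run services update myservice --min-instances=1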


  23. ● gVisor sandbox
    ○ Linux syscall emulation in userspace
    ○ used internally at Google to run untrusted workloads on Borg
    ○ low-level Linux APIs might not work:
    ■ iptables, some ioctls, …
    ● might change soon
    kernel


  24. ● files written to local disk:
    ○ are not persisted
    ○ count towards the container instance’s available memory
    ● (currently) no support for mounting external volumes
    ○ such as Cloud Filestore (NAS) or Cloud Storage
    filesystem/volumes


  25. ● no current support
    ● upcoming support for Google Cloud Secret Manager
    ○ it will look like Kubernetes secret volume/volumeMounts
    secrets/configmaps


  26. ● individual container instances are
    not exposed to you (to keep things simple)
    ● auto-healing properties
    ○ faulty containers (20 consecutive 5xx) are replaced
    ○ crash-looping containers are replaced
    container instance fleet


  27. 1. Know the full URL (xxx.run.app)
    2. Get a JWT token
    3. Add as “Authorization” header to outgoing request
    Kubernetes or K8s+Istio experience shines here (for now).
    service-to-service calls
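    (illustrative sketch of the three steps, not from the slides; the URL is hypothetical, and the snippet assumes it runs on GCP compute with access to the metadata server)
    AUDIENCE="https://myservice-xxxxxxxx-uc.a.run.app"
    # fetch an identity token (JWT) minted for the target service
    TOKEN=$(curl -s -H "Metadata-Flavor: Google" \
      "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/identity?audience=${AUDIENCE}")
    # call the service with the token in the Authorization header
    curl -H "Authorization: Bearer ${TOKEN}" "${AUDIENCE}"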


  28. ● Sidecars / multiple containers per Pod
    ○ Knative supports this
    ● Init containers
    ● Health probes
    ○ Cloud Run has a primitive built-in startup health check
    ○ What do periodic probes mean when you don't have CPU allocated?
    ● mTLS/cert auth is still not possible
    Unsupported (for now) compared to Kubernetes


  29. ● Cloud Run:
    ○ solid highly-scalable serving layer
    ○ scales real-time, based on in-flight requests
    ○ designed with spiky workloads in mind
    ○ request buffering during autoscaling
    ● Kubernetes:
    ○ we all typically over-provision for unpredicted load
    ○ CPU/memory metrics are a side-effect
    of load, and often collected w/ a delay
    ○ without a buffering meat shield,
    excessive load will crash Pods.
    Kubernetes+Cloud Run=?


  30. Kubernetes → Cloud Run:
    1. use GCP workload identity
    2. get a JWT token from the metadata service + add it to the outbound request
    3. set IAM permission on Cloud Run
    4. the Cloud Run serving layer automatically verifies the JWT and its IAM permissions.
    Cloud Run → Kubernetes:
    1. connect to the same VPC
    2. provision an internal LB (or use VPC private IPs)
    3. get a JWT token from the metadata service + add it to the outbound request
    4. verify the JWT coming from Cloud Run in your app + DIY ACLs.
    Hybrid?
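    (illustrative sketch of step 3 in the Kubernetes → Cloud Run direction, not from the slides; service account and service names are hypothetical)
    # allow the GKE workload's service account to invoke the Cloud Run service
    $ gcloud run services add-iam-policy-binding myservice \
        --member="serviceAccount:my-gke-sa@my-project.iam.gserviceaccount.com" \
        --role="roles/run.invoker"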


  31. thanks
    Documentation: https://cloud.run
    FAQ: github.com/ahmetb/cloud-run-faq
    twitter.com/ahmetb
    github.com/ahmetb


  32. Appendix
    Migrating a Kubernetes
    deployment to Knative


  33. Kubernetes Deployment:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: hello-web
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: hello
          tier: web
      template:
        metadata:
          labels:
            app: hello
            tier: web
        spec:
          containers:
          - name: main
            image: gcr.io/google-samples/hello-app:1.0
            resources:
              limits:
                cpu: 100m
                memory: 256Mi

    Kubernetes Service:
    apiVersion: v1
    kind: Service
    metadata:
      name: hello-web
    spec:
      type: LoadBalancer
      selector:
        app: hello
        tier: web
      ports:
      - port: 80
        targetPort: 8080


  34. (same Deployment + Service YAML as slide 33, with a callout)
    no need, Knative will give us a $PORT

  35. (same YAML)
    no need for all these labels and selectors

  36. (same YAML)
    Knative autoscales

  37. (same YAML)
    Knative creates both internal and external endpoints by default

  38. (same YAML)
    No need for a container name if you have only one

  39. (same Deployment + Service YAML, shown once more before trimming)


  40. apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: hello-web
    spec:
      template:
        spec:
          containers:
          - image: gcr.io/google-samples/hello-app:1.0
            resources:
              limits:
                cpu: 100m
                memory: 256Mi


  41. apiVersion: apps/v1 → serving.knative.dev/v1
    kind: Deployment → Service
    metadata:
      name: hello-web
    spec:
      template:
        spec:
          containers:
          - image: gcr.io/google-samples/hello-app:1.0
            resources:
              limits:
                cpu: 100m
                memory: 256Mi


  42. apiVersion: serving.knative.dev/v1
    kind: Service
    metadata:
      name: hello-web
    spec:
      template:
        spec:
          containers:
          - image: gcr.io/google-samples/hello-app:1.0
            resources:
              limits:
                cpu: 100m
                memory: 256Mi
