Robust Containers
DockerCon Keynote
Eric Brewer
June 10, 2014

Transcript

  1. Robust Containers
    Eric Brewer
    VP, Infrastructure
    DockerCon Keynote, June 10, 2014
    @eric_brewer #dockercon


  2. Why we Love Containers
    1) Application-centric, not machine-centric view
    It is easier, more natural, and more productive
    2) Essentially the way Google works internally:
    Signed static bundles + Linux containers (resolve dependencies up front)
    Over 2B containers launched per week
    (even our VMs run inside containers)
    We evolved here over the last decade…
    but Docker made them exciting and much easier to use (thanks!)


  3. First Problem: Unpredictable Interference
    Containers interfere with each other
    • Unimportant things break important things
    • We want fair use among equally important things
    Solution: resource & performance isolation
    Series of open-source solutions:
    2005: cpusets + “fake” NUMA to partition cores, memory
    2006: cgroups for general task hierarchies
    2009: bandwidth fair use, QoS levels
    2010: memcg for better memory accounting, enforcement
    Status: isolation works well in practice (if you use these tools)
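
The kernel mechanisms listed on this slide are driven through the cgroup filesystem. As a rough illustration (not Google's internal tooling), here is a minimal Go sketch that applies a memcg memory limit to a process, assuming a cgroup v1 hierarchy mounted at /sys/fs/cgroup/memory and root privileges:

```go
// Minimal sketch: enforce a memory limit on a task by writing directly to the
// cgroup v1 memory controller (memcg). Paths assume the common
// /sys/fs/cgroup/memory mount point; running this requires root.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

func limitMemory(group string, limitBytes int64, pid int) error {
	dir := filepath.Join("/sys/fs/cgroup/memory", group)
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	// The kernel enforces the limit directly; no user-space control loop needed.
	if err := os.WriteFile(filepath.Join(dir, "memory.limit_in_bytes"),
		[]byte(strconv.FormatInt(limitBytes, 10)), 0o644); err != nil {
		return err
	}
	// Move the task into the cgroup so the limit applies to it.
	return os.WriteFile(filepath.Join(dir, "tasks"), []byte(strconv.Itoa(pid)), 0o644)
}

func main() {
	if err := limitMemory("demo", 256<<20, os.Getpid()); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```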


  4. Second Problem: Low Utilization
    Tier 1: Live services (e.g. search engine)
    • Provision for peak load (2-10x higher than average)
    • High priority, always get resources when needed
    Tier 2: Batch jobs (e.g. MapReduce)
    • Run in the leftovers, never displace Tier 1
    • Lots of capacity — rarely at peak load
    If you partition resources, utilization goes down…
    Solution: controlled use of slack resources (free $$)
    Status: Our OSS container solutions support this well
    Note: Google does not overcommit customer VMs — you get the whole VM all the time
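
The "controlled use of slack resources" point comes down to simple arithmetic: batch work may use whatever the live tier has reserved but is not using right now, and must give it back on demand. A hypothetical sketch of that budget calculation (types and numbers are illustrative, not Google's scheduler):

```go
// Hypothetical sketch of slack reuse: Tier 2 (batch) may only use capacity that
// Tier 1 (live serving) is not using at this moment, and can be preempted when
// Tier 1 needs it back. All names and numbers are illustrative.
package main

import "fmt"

type Node struct {
	CapacityCPU float64 // total cores on the node
	Tier1Limit  float64 // cores reserved for live services (provisioned for peak)
	Tier1Usage  float64 // cores live services are using right now
}

// BatchBudget returns how many cores batch jobs may borrow at this moment.
func BatchBudget(n Node) float64 {
	slack := n.CapacityCPU - n.Tier1Usage
	if slack < 0 {
		slack = 0
	}
	return slack
}

func main() {
	n := Node{CapacityCPU: 32, Tier1Limit: 24, Tier1Usage: 6} // live tier far below peak
	partitioned := n.CapacityCPU - n.Tier1Limit               // batch share under a hard partition
	fmt.Printf("hard partition: %.0f cores for batch; slack reuse: %.0f cores\n",
		partitioned, BatchBudget(n))
}
```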


  5. Third Problem: Hard to Enforce Isolation
    Bad way: control loop (see LPC 2011)
    • Read stats, verify allocation, tune knobs, repeat
    • Slow response time, fragile
    Right way:
    • Direct enforcement in the kernel
    • Many patches to make this happen… (e.g. memcg)
    Status: enforcement now mostly in the kernel
    • Caches, memory bandwidth can still cause interference
    • Challenges getting these changes accepted upstream
    • Meta control loop: detect interference and migrate tasks (see CPI2)


  6. “Let Me Contain That For You” LMCTFY = “L-M-C-T-fee”
    You want this, but didn’t know it
    • Declarative allocation, prioritization of resources
    • Enforces resource isolation, with multiple hierarchies
    • Many resources: CPU, memory, bandwidth, latency, disk I/O, …
    • Enables better utilization
    • Stable API, as kernel mechanisms continue to evolve
    • Released as OSS in 2013 (see LPC 2013)
    OSS containers based on Docker are a core foundation for the future
    • Many contributors over the decade: SGI, LXC, RedHat, Parallels, Docker, …
    • We hope to move LMCTFY functionality into Docker’s libcontainer
    • Released for Docker Hackathon: cAdvisor for container stats & alerts (written in Go)
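
To make "declarative allocation" concrete: instead of tuning knobs imperatively, the caller hands the container manager a complete resource spec and the manager enforces it through the kernel mechanisms above. The sketch below is a hypothetical illustration of such a spec in Go; it is not lmctfy's actual API (see the LPC 2013 talk and the lmctfy project for the real interface):

```go
// Hypothetical illustration of a declarative container spec: priorities and
// limits are stated up front and a container manager enforces them via cgroups.
// The types and fields here are illustrative, not lmctfy's real API.
package main

import "fmt"

type ResourceSpec struct {
	CPUMilli    int   // requested CPU in milli-cores
	CPUPriority int   // relative share when the machine is contended
	MemoryBytes int64 // hard memory limit, enforced by memcg
	DiskWeight  int   // relative disk I/O weight
}

type Container struct {
	Name   string
	Parent string // hierarchies matter: e.g. "/live/frontend" vs "/batch/mapreduce"
	Spec   ResourceSpec
}

func create(c Container) error {
	// A real manager would translate the spec into cgroup writes
	// (cpu.shares, memory.limit_in_bytes, blkio.weight, ...).
	fmt.Printf("create %s under %s: %+v\n", c.Name, c.Parent, c.Spec)
	return nil
}

func main() {
	_ = create(Container{
		Name:   "search-frontend",
		Parent: "/live",
		Spec:   ResourceSpec{CPUMilli: 2000, CPUPriority: 10, MemoryBytes: 4 << 30, DiskWeight: 500},
	})
}
```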


  7. Pods (or how we really use containers)
    We actually use groups of nested containers = pods
    • Use LMCTFY for nesting, isolation & utilization
    • Many things implemented as helpers:
    • Logging and log rotation
    • Content management system + webserver
    Pod attributes:
    • Containers deployed together (in a parent container!)
    • Shared local volumes
    • Individual IP address (even if multiple pods per VM)
    • Ensures clean port allocation
    OK, we don’t always use a single IP per pod, but we should have…
    Without this, you need to track/distribute port allocations, since they must be late bound...
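
"Parameterized containers" means the image is identical for every replica and per-instance arguments arrive through environment variables at start time. A minimal sketch of the pattern (the variable names are made up for illustration):

```go
// Minimal sketch of a parameterized container: the same image serves every
// replica; instance-specific settings come from environment variables set by
// whoever launches the pod. The variable names below are illustrative.
package main

import (
	"fmt"
	"os"
)

func getenv(key, fallback string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return fallback
}

func main() {
	shard := getenv("SHARD_ID", "0")
	master := getenv("REDIS_MASTER_ADDR", "localhost:6379")
	fmt.Printf("serving shard %s, using redis master at %s\n", shard, master)
}
```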


  8. Kubernetes “koo ber NAY tace” — Greek for “helmsman”
    New OSS release: orchestrating replicated pods across multiple nodes
    Craig McLuckie, Brendan Burns to cover at 2pm today
    Master:
    • Manages worker pods dynamically
    • Uses etcd to track desired configuration
    Workers:
    • Replicated Docker image
    • Parameterized: arguments passed in via environment variables
    • Shared view of load-balanced services
    [Architecture diagram: the master (API Server, Replica Controller, etcd) manages k workers, each running a Kubelet, Service Proxy, and Docker]
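
Each worker runs a service proxy, so a pod can reach a replicated service on a fixed local port while the proxy picks the actual backend. A minimal sketch of that idea in Go (the port and backend addresses are illustrative):

```go
// Minimal sketch of a per-node service proxy: pods connect to a fixed local
// port, and each connection is forwarded to one member of the load-balanced
// replica set. The port and backend addresses below are illustrative.
package main

import (
	"io"
	"log"
	"math/rand"
	"net"
)

func main() {
	backends := []string{"10.0.1.5:6379", "10.0.2.7:6379"} // current members of the set
	ln, err := net.Listen("tcp", "127.0.0.1:10001")        // the port is the service "name"
	if err != nil {
		log.Fatal(err)
	}
	for {
		client, err := ln.Accept()
		if err != nil {
			log.Print(err)
			continue
		}
		go func(c net.Conn) {
			defer c.Close()
			backend, err := net.Dial("tcp", backends[rand.Intn(len(backends))])
			if err != nil {
				log.Print(err)
				return
			}
			defer backend.Close()
			go io.Copy(backend, c) // client -> chosen replica
			io.Copy(c, backend)    // chosen replica -> client
		}(client)
	}
}
```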


  9. Concept 1: Labels and Services
    Service = load-balanced replica set
    • Pod labels ⇒ the services they implement
    • Pods access services via localhost:
    • (Local) proxy sends traffic to member of set
    • Ports are the service “names”
    Service Definition (JSON):
    {
      "id": "redisslave",
      "port": 10001,
      "labels": {
        "name": "redisslave"
      }
    }
    Partial pod definition (JSON):
    "labels": {
      "name": "redisslave"
    }
    Pods have labels
    Many overlapping sets of labels:
    stage: production   name: redis
    zone: west          version: 2.6
    Replica set = a group of pods with the same labels
    The set is defined by a query (not a static list)
    (because entropy happens)
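
Since a replica set is "defined by a query", membership is computed by matching a selector against pod labels rather than kept as a static list. A minimal sketch of that matching, assuming a selector is just a set of required key/value pairs:

```go
// Minimal sketch of label selection: a set (service or replica set) is defined
// by a query over labels, so membership is recomputed as pods come and go.
package main

import "fmt"

type Pod struct {
	Name   string
	Labels map[string]string
}

// matches reports whether every key/value pair in the selector is present on
// the pod; pods may carry extra, overlapping labels.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

func selectPods(selector map[string]string, pods []Pod) []Pod {
	var out []Pod
	for _, p := range pods {
		if matches(selector, p.Labels) {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	pods := []Pod{
		{"redisslave-1", map[string]string{"name": "redisslave", "stage": "production", "zone": "west"}},
		{"redisslave-2", map[string]string{"name": "redisslave", "stage": "test"}},
		{"frontend-1", map[string]string{"name": "frontend", "stage": "production"}},
	}
	for _, p := range selectPods(map[string]string{"name": "redisslave", "stage": "production"}, pods) {
		fmt.Println(p.Name)
	}
}
```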


  10. Concept 2: The Reconciler Model
    Key idea: Declare the desired state
    Loop { // the reconciler loop, run by master
    • Query the actual state of the system
    • Compare with desired state
    • Implement corrections (if any) // reconcile reality with desired state
    }
    Having an explicit desired state is a good idea!
    Otherwise you can’t tell whether the desire changed or the actual state changed
    In Kubernetes:
    desiredState:
      replicas: 2
    If we lose a replica for some reason, add one
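
A minimal sketch of that loop for the replica count, using an in-memory stand-in for cluster state (illustrative names, not the Kubernetes implementation):

```go
// Minimal sketch of the reconciler loop: repeatedly compare the declared
// desired state with the observed state and correct any difference.
// The in-memory "cluster" below is an illustrative stand-in, not Kubernetes code.
package main

import (
	"fmt"
	"time"
)

type cluster struct {
	desiredReplicas int
	running         []string
}

func (c *cluster) reconcile() {
	want := c.desiredReplicas // declared desired state (e.g. replicas: 2)
	for i := len(c.running); i < want; i++ {
		name := fmt.Sprintf("pod-%d", i) // lost a replica (or desire grew): add one
		c.running = append(c.running, name)
		fmt.Println("started", name)
	}
	for len(c.running) > want { // desire shrank: remove the surplus
		last := c.running[len(c.running)-1]
		c.running = c.running[:len(c.running)-1]
		fmt.Println("stopped", last)
	}
}

func main() {
	c := &cluster{desiredReplicas: 2, running: []string{"pod-0"}} // one replica is missing
	for i := 0; i < 3; i++ {                                      // the loop run by the master
		c.reconcile() // query actual state, compare, correct
		time.Sleep(time.Second)
	}
}
```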


  11. Robust Containers
    Docker (used well) ⇒ clean, repeatable images
    Single-Node (pods):
    • Allocate ports per pod (conflict free!)
    • Attach data-only containers to the pod (as volumes)
    (clean sharing of data)
    • “Parameterized containers” using environment variables
    Multi-Node:
    • Labels for time-varying overlapping sets
    • Services are load-balanced groups of replicated pods
    • The Reconciler Model recovers from changes (expected or not)
    (actually used at worker level and master level)


  12. Containers are the Path to “Cloud Native”
    Pods as a building block
    • Clean port namespace
    • Shared volumes
    • Isolation, prioritization, tools for utilization
    • Auto restart (don’t run supervisord k times)
    • Liveness probes, stats for load balancing
    • sshd in environment (not in your container)
    Application-level cloud events per container or pod
    • Start, stop, restart
    • Notification of migration, resizing, new shards, ...
    • Resource alerts, OOM management
    Services and labels
    Reconciliation
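
Of these building blocks, liveness probes are easy to show concretely: the platform polls an endpoint and restarts the container when it stops answering. A minimal Go sketch, assuming the common /healthz convention:

```go
// Minimal sketch of a container liveness probe target: the platform (or a pod
// helper) polls this HTTP endpoint and restarts the container when it fails.
// The /healthz path and port are assumptions, not a required interface.
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		// Return 200 only if the app can really do work (check dependencies here).
		fmt.Fprintln(w, "ok")
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```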


  13. Summary
    We are standardizing around the Docker container image
    • Pushing for usable, scalable, open containers
    • Isolation, nesting, utilization, enforcement
    • Moving to Go to simplify integration, and because we like it!
    Thanks to Docker…
    for making containers lightweight, easy to use, and exciting!
    We look forward to creating a great robust space together
    News today:
    • Kubernetes: see Craig & Brendan at 2pm today
    • Docker on GAE: see Ekaterina Volkova at 2:50pm today
    • cAdvisor: stats & alerts for containers
