The Journey to #GIFEE

Rob
March 22, 2016

Walking through the 3 changes required to run a Google-like infrastructure stack on CoreOS + Kubernetes.

Transcript

  1. Rob Szumski
    @robszumski
    The Journey to #GIFEE

  2. MISSION
    Secure the Internet

  3. 3
    Application packaging
    Linux at scale
    Clustering

  4. #GIFEE
    Borg/Omega
    ChromeOS
    Chubby

  7. Linux at Scale
    1

  8. Patches to the OS and kernel are hard
    LARGE SCALE
    Rolling update tools
    Diverse hardware
    SMALL SCALE
    Safer to leave it alone
    No one owns security

  9. Auto-updating browsers fixed security
    We got HTML5 at the same time

  10. Atomic operating system updates

  12. PXE/diskless
    Quick reboots
    Easy to boot, install, and manage
    Secure by default
    Cross-cloud

  13. Application Packaging
    2

  14. Abstract away app from the OS
    OS App

  15. Protect apps from each other
    Isolated network namespace
    Isolated file system namespace
    Mixed versions of dependencies
    e.g. Python 3.4 & Python 2.7

  16. Base software managed by CoreOS
    systemd, kernel, OpenSSH

  17. Easily move apps between machines
    Easy scale out
    Recover from failure
    Painless OS software update

  18. Perfect touch-point for security
    Sign artifact from CI
    Scan containers at rest
    Audit trail

  19. A security-minded, standards-based container engine

  20. Specification for “application containers”

  21. Universal Container Format
    Packaged, downloaded, executed

  22. apt-get for containers
    Local mirrors
    Distributed namespace (DNS)
    Serve over HTTPS, no complex software
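
The "apt-get for containers" idea relies on DNS-based naming. An image name such as `example.com/worker` points at a domain that serves `ac-discovery` meta tags over HTTPS. Each tag pairs a name prefix with a URL template.

The Go sketch covers only the template step. It picks the longest matching prefix and fills in `{name}` and label placeholders such as `{version}`, `{os}`, `{arch}` and `{ext}`. Fetching and parsing the HTML is left out. The `example.com` entry, the `discoveryEntry` type and the `render` helper are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// discoveryEntry models one ac-discovery meta tag: a name prefix and a
// URL template.
type discoveryEntry struct {
	Prefix   string
	Template string
}

// render chooses the entry with the longest prefix matching name and fills
// in {name} and any label placeholders. It reports false when nothing
// matches or a placeholder is left unfilled.
func render(entries []discoveryEntry, name string, labels map[string]string) (string, bool) {
	best := -1
	for i, e := range entries {
		if strings.HasPrefix(name, e.Prefix) &&
			(best < 0 || len(e.Prefix) > len(entries[best].Prefix)) {
			best = i
		}
	}
	if best < 0 {
		return "", false
	}
	out := strings.ReplaceAll(entries[best].Template, "{name}", name)
	for k, v := range labels {
		out = strings.ReplaceAll(out, "{"+k+"}", v)
	}
	if strings.Contains(out, "{") {
		return "", false
	}
	return out, true
}

func main() {
	// Hypothetical tag served by example.com.
	entries := []discoveryEntry{
		{"example.com", "https://dl.example.com/{name}-{version}-{os}-{arch}.{ext}"},
	}
	labels := map[string]string{"version": "v1.0.0", "os": "linux", "arch": "amd64"}
	// One URL for the image, one for its detached signature.
	for _, ext := range []string{"aci", "aci.asc"} {
		labels["ext"] = ext
		if url, ok := render(entries, "example.com/worker", labels); ok {
			fmt.Println(url)
		}
	}
}
```

Because the result is an ordinary HTTPS URL, a local mirror only needs to serve static files at the rendered paths.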

  23. Jetpack (FreeBSD/Go)
    Kurma (Linux/Go)
    rkt (Linux/Go)
    Independent GitHub organization
    Contributions from Cloud Foundry, Mesosphere, Google, Red Hat and many others

  24. Composable
    Designed for init systems
    Standard Unix process
    Separate build tool

  25. Composable
    No central daemon
    Not a “platform”

  26. systemd
    app
    systemd
    app
    docker run redis
    docker engine daemon

  27. rkt run
    $ sudo rkt run coreos.com/etcd:v2.0.0
    $ sudo rkt run coreos.com/etcd:v2.0.0 \
      --cpu=750m --memory=128M
    $ sudo rkt run --net=host coreos.com/etcd:v2.0.0
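
The `--cpu=750m --memory=128M` flags above use suffixed quantities. Assuming they follow the Kubernetes-style convention, the suffixes mean:

- `m`: thousandths of a unit
- `k`, `M`, `G`: powers of 1000
- `Ki`, `Mi`, `Gi`: powers of 1024

Under that convention, 750m is three quarters of a CPU and 128M is 128,000,000 bytes. The minimal Go parser below makes that arithmetic explicit for integer quantities. `parseMilli` is a hypothetical helper, not rkt's code, and it does not handle decimals or overflow.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Suffix multipliers expressed in thousandths of a unit, so "750m" and
// "128M" share one integer scale. Binary suffixes are listed first.
var suffixes = []struct {
	suffix string
	milli  int64
}{
	{"Ki", 1024 * 1000},
	{"Mi", 1024 * 1024 * 1000},
	{"Gi", 1024 * 1024 * 1024 * 1000},
	{"k", 1000 * 1000},
	{"M", 1000 * 1000 * 1000},
	{"G", 1000 * 1000 * 1000 * 1000},
	{"m", 1},
}

// parseMilli converts an integer quantity such as "750m", "128M" or "2"
// into thousandths of a unit.
func parseMilli(q string) (int64, error) {
	mult, num := int64(1000), q // no suffix means whole units
	for _, s := range suffixes {
		if strings.HasSuffix(q, s.suffix) {
			mult, num = s.milli, strings.TrimSuffix(q, s.suffix)
			break
		}
	}
	n, err := strconv.ParseInt(num, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("bad quantity %q: %w", q, err)
	}
	return n * mult, nil
}

func main() {
	cpu, err := parseMilli("750m")
	if err != nil {
		panic(err)
	}
	mem, err := parseMilli("128M")
	if err != nil {
		panic(err)
	}
	fmt.Printf("cpu: %d millicores (%.2f cores)\n", cpu, float64(cpu)/1000)
	fmt.Printf("memory: %d bytes\n", mem/1000)
}
```

Note that `128M` and `128Mi` differ by about 5%, which matters when a memory limit is tight.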

  28. Pods Built-in
    Deployed together
    Share local network
    Share volumes

  29. Pods Built-in
    rktnetes

  30. Tunable Isolation
    Match your workload
    3 isolation levels
    Make your own stage1

  31. stage0
    stage1
    stage2

  32. stage0: The rkt binary
    ● Fetch ACI, verify
    ● Set up pod filesystem
    ● Unpack stage1 and stage2 ACIs
    stage1
    stage2

  33. stage0
    stage1: Set up execution env
    ● Create cgroups, namespaces, & mounts
    ● Read pod manifest
    ● Start systemd-nspawn
    stage2

  34. stage0
    stage1
    stage2: Your application!

  35. Benefit from standard packaging, signing and distribution at all isolation levels.
    Privileged, e.g. Kubelet
    Container/cgroup, e.g. webapp
    Virtual machine, e.g. untrusted code

  36. rkt run a pod
    $ sudo rkt run \
      example.com/worker -- --loglevel verbose --- \
      example.com/syncer -- --interval 30s

  37. Unique rkt features
    Sensible, best practice security
    Ease of use for Ops

  38. Garbage Collection
    Run as a cron job, customizable grace period
    $ sudo rkt gc
    Moving pod "81627cc6" to garbage
    Moving pod "cd642877" to garbage
    Moving pod "d65abad6" to garbage
    Pod "81627cc6" not removed: still within grace period (30m0s)
    Pod "cd642877" not removed: still within grace period (30m0s)
    Pod "d65abad6" not removed: still within grace period (30m0s)

  39. Tools for trust
    Easily control what runs on your server
    $ sudo rkt trust --prefix=storage.coreos.com
    $ sudo rkt trust --prefix=coreos.com/etcd
    $ sudo rkt trust --root ~/aci-pubkeys.gpg

  40. Tools for trust
    Easily control what runs on your server
    $ find /etc/rkt/trustedkeys/
    /etc/rkt/trustedkeys/
    /etc/rkt/trustedkeys/prefix.d
    /etc/rkt/trustedkeys/prefix.d/coreos.com
    /etc/rkt/trustedkeys/prefix.d/coreos.com/etcd
    /etc/rkt/trustedkeys/prefix.d/coreos.com/etcd/8b86de38890ddb7291867b025210bd8888182190
    /etc/rkt/trustedkeys/root.d
    /etc/rkt/trustedkeys/root.d/d8685c1eff3b2276e5da37fd65eea12767432ac4

  41. Fetch ACI as unprivileged user
    Don’t have to download as root
    $ rkt fetch quay.io/coreos/alpine-sh
    ...
    $ sudo rkt run quay.io/coreos/alpine-sh

  42. Run Docker containers with rkt
    Use a more secure runtime without changing images
    $ sudo rkt run --insecure-options=image --interactive \
      docker://busybox -- /bin/sh

  43. Scale out workloads
    Everyone’s goal is #GIFEE
    Enables automation
    Cloud = Distributed Systems

  44. When do you need cluster coordination?
    Leader election
    Cluster-wide semaphores
    Service discovery
    Dynamic configuration

  45. Hard Computer Science Problem
    ?

  46. Hard Computer Science Problem
    Chubby

  47. A distributed, reliable key-value store for the most critical data of a distributed system.

  48. Why build etcd?
    No existing “cloud native” solutions
    High availability from the beginning
    Dynamic reconfiguration

  49. Simple key/value
    “Distributed etc”
    Feels like a file system
    e.g. directories

  50. Set a value
    $ etcdctl set /foo bar
    bar
    $ etcdctl get /foo
    bar
    $ etcdctl ls /config
    /config/verbosity
    /config/ratelimit

  51. Simple interface
    Easily write clients
    Use curl if you want
    Already maintain TLS infra.
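
Because etcd speaks plain HTTP, a client can be a few lines of standard-library code. The Go sketch builds the request equivalent to `etcdctl set /foo bar` against etcd's v2 keys API: a `PUT` to `/v2/keys/<key>` with a form-encoded `value`.

Assumptions: `http://127.0.0.1:2379` is a local member, `setRequest` is a hypothetical helper, and keys do not need URL escaping.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// setRequest builds the v2 keys API call equivalent to `etcdctl set key value`.
func setRequest(endpoint, key, value string) (*http.Request, error) {
	form := url.Values{"value": {value}}
	req, err := http.NewRequest(http.MethodPut, endpoint+"/v2/keys"+key,
		strings.NewReader(form.Encode()))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	return req, nil
}

func main() {
	req, err := setRequest("http://127.0.0.1:2379", "/foo", "bar")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL) // PUT http://127.0.0.1:2379/v2/keys/foo
	// Against a running member, send it with:
	// resp, err := http.DefaultClient.Do(req)
}
```

An `https://` endpoint works the same way, so the TLS setup you already maintain can protect the connection.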

  52. Watch a value
    Service discovery
    Reconfiguration
    Locking
    Cluster scheduler

  53. Cluster-wide reboot lock - “locksmith”
    Distributed init system - “fleet”
    Leader election - “fleet”

  54. locksmith
    $ locksmithctl status
    Available: 1
    Max: 1
    $ sudo locksmithctl reboot
    $ locksmithctl status
    Available: 0
    Max: 1
    MACHINE ID
    7f9ccde3cff9441f8b506785
    $ sudo locksmithctl reboot
    Error locking: semaphore is at 0

  55. Industry Adoption
    500+ projects on GitHub

  56. 3
    Application packaging
    Linux at scale
    Clustering

  57. Minimal, secure Linux OS
    Containers for app packaging
    Self-updating cluster
    Distributed systems tools

  58. Sounds good, but...
    Is anyone successful with CoreOS in prod?

  59. Publicly traded options exchange

  60. Containers on CoreOS are powering ISE's high-throughput, low-latency financial exchange
    Running in production
    Bare metal & AWS
    Billions of transactions a day
    150 million req/sec

  61. Chart: time for patching the OS and for new machine deployment

  62. Invisible Infrastructure

  63. We really look at that [CoreOS] number growing
    significantly over this next year. We did some of these
    benchmarks to see if our production trading systems could
    leverage this type of infrastructure, and it was highly
    successful for us, and we look forward to using it more in
    our other environments.
    On the Linux side, everything in AWS is CoreOS. On the
    physical side, 20% is CoreOS, and growing.


    Robert Cornish, CTO
    Paul Morgan, Systems Architect

  64. Kubernetes is our recommended orchestration platform

  65. Guides & Tools
    coreos.com/kubernetes
    kube-aws
    Cloud-configs

  66. Upstream
    rktnetes
    Auth/OIDC
    Node self-signed TLS

  67. Scaling
    15x scheduler performance
    30k pods on 1k nodes
    SIG-scale

  68. Off-the-shelf #GIFEE

  69. Enhances Kubernetes
    Included tools
    24/7 Support
    Enhanced security

  70. Quay Enterprise

  71. Tectonic Console

  72. Distributed Trusted Computing
    Only possible with #GIFEE

  73. Trusted Computing
    It’s in your pocket right now

  74. Cluster: Kubernetes
    Containers: rkt
    OS: CoreOS Linux
    Hardware: Firmware & TPM

  76. Customer key embedded in the firmware
    Cluster: Kubernetes
    Containers: rkt
    OS: CoreOS Linux
    Hardware: Firmware & TPM

  77. Verify integrity of the OS release
    Customer key embedded in the firmware
    Cluster: Kubernetes
    Containers: rkt
    OS: CoreOS Linux
    Hardware: Firmware & TPM

  78. Verify integrity of the OS release
    Customer key embedded in the firmware
    Verify configuration state
    Verify images with trusted keys
    Cluster: Kubernetes
    Containers: rkt
    OS: CoreOS Linux
    Hardware: Firmware & TPM

  79. Verify integrity of the OS release
    Customer key embedded in the firmware
    Verify configuration state
    Verify images with trusted keys
    Only attested machines are allowed to join
    Cluster: Kubernetes
    Containers: rkt
    OS: CoreOS Linux
    Hardware: Firmware & TPM

  80. Verify integrity of the OS release
    Customer key embedded in the firmware
    Verify configuration state
    Verify images with trusted keys
    Only attested machines are allowed to join
    Tamper-proof audit log (TPM)
    Cluster: Kubernetes
    Containers: rkt
    OS: CoreOS Linux
    Hardware: Firmware & TPM

  81. Identify Attacks
    Visibility into new classes of attacks
    Firmware, OS, images, rootkits

  82. Inverting DRM
    Your company is in control

  83. You hold the keys
    Only software your company allows will run
    You are in control of the hardware

  84. New Level of Security
    Run in third-party or hostile data centers with zero trust
    Prevent invisible attacks
    Verifiable audit log for when things go wrong
    Putting you in control
    Your company is in cryptographic control of your environment

  85. The Journey to #GIFEE

  86. coreos.com/fest - @coreosfest
    May 9 & 10, 2016 - Berlin, Germany

  87. Thank You
    Rob Szumski
    Product Design Lead, CoreOS
    @robszumski
