Container Scheduling Without the Hype: Why Bother?

Tyler Langlois

June 06, 2018

Transcript

  1. Container Scheduling Without the Hype: Why Bother?
    DevOpsDays Boise 2018
    Tyler Langlois
    Software Engineer, Elastic


  2. $ whois tylerjl
    ● Infrastructure/software/devops-y things @ Elastic
    ● Lots of recent work on dynamic/containerized environments
    Come talk to me about Arm SBCs (or if you want to yell about the
    Elastic Puppet modules)
     ____________________
    < angry at computers >
     --------------------
            \   ^__^
             \  (oo)\_______
                (__)\       )\/\
                    ||----w |
                    ||     ||


  3. Who is This For?
    ● Why care about container schedulers?
    ● What can they offer operations and development?
    ● Real-world achievements made possible by these solutions (or, new
      ideas for current practitioners)


  4. Where We’ve Been
    [Diagram: an app whose outputs have no obvious destination - logs to
    /var/log/? metrics over tcp:localhost:??? to statsd? prometheus?
    graphite? ...]
    Dependent on:
    ● Libraries
    ● Packages
    ● Runtime
    ● Distro
    ● etc.


  5. Where We Can Go
    [Same diagram and dependency list as the previous slide]
    Let’s talk about:
    ● Runtime
    ● Monitoring
    ● Persistence
    ● Services


  6. Runtime (Traditional)
    ● Without containers
    ○ Don’t even - dependencies live apart from the code, and it gets messy
    ● With just containers
    ○ Where are you running them? Cloud instances?
    ○ How are you scheduling and running them?


  7. Runtime
    ---
    image: org/app:1.0
    env:
      FOO: bar
    count: 3

    FROM python:3
    COPY app.py app.py
    CMD python app.py

    ● Nodes are cattle
    ● Contract w/consumers is clear:
    ○ Build instructions (the Dockerfile)
    ○ Runtime instructions (the manifest)


  8. Runtime
    [Same manifest and Dockerfile as the previous slide]
    ● Deployments are always the same bits - repeatability
    ● Updates are hands-off for both dev and ops - rolling container
      upgrades
    ● Application changes ship asynchronously from backend changes (the
      container build instructions)
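
    As a concrete sketch of the pseudo-manifest above - assuming Kubernetes
    as the scheduler, though the deck's manifest is scheduler-agnostic - the
    same image/env/count contract might look like:

    # A minimal sketch, assuming Kubernetes; the image and env values come
    # from the slide's pseudo-manifest.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: app
    spec:
      replicas: 3                  # "count: 3"
      selector:
        matchLabels:
          app: app
      template:
        metadata:
          labels:
            app: app
        spec:
          containers:
            - name: app
              image: org/app:1.0   # the same immutable bits, every deploy
              env:
                - name: FOO
                  value: bar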


  9. Monitoring (Traditional)
    Logs
    ● Format? Path?
    ● Opt-in
    ● Accessibility?
    Metrics
    ● System metrics != app metrics
    ● Scrape from the app?
    Alerts
    ● Metrics are good; deployment statistics as well?


  10. Monitoring
    [Diagram: containers’ stdout and stderr captured by the platform]


  11. Monitoring
    [Same stdout/stderr diagram as the previous slide]
    ● Zero-config for generic logs/metrics out of the box
    ● Easily build custom tools atop this data for out-of-the-box alerting
      as well
    ● Logs/metrics become self-service with appropriate visualization
      solutions
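
    Because the runtime captures stdout/stderr for every container, log
    collection needs no per-app configuration. A minimal sketch, again
    assuming Kubernetes (the pod name and image here are illustrative):

    # This pod only writes to stdout; the platform captures the stream,
    # so `kubectl logs hello` works with zero logging configuration.
    apiVersion: v1
    kind: Pod
    metadata:
      name: hello
    spec:
      containers:
        - name: hello
          image: busybox
          command: ["sh", "-c", "while true; do echo tick; sleep 5; done"]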


  12. Persistence (Traditional)
    ● Shared mass storage (ceph, gluster) in traditional setups
    ● Dynamically attached storage in the case of cloud environments (EBS)
    ● Works, but:
    ○ What ties them together, provisions them, migrates them, backs them
      up?
    [Diagram: a “big ol’ data” store with a question mark]


  13. Persistence
    ---
    volume:
      size: 50G

    ● Like runtime definitions, the underlying impl. isn’t a concern
    ● Carve off a hunk of storage as needed
    ● Scheduling is happening all the time; storage follows
    [Diagram: the “big ol’ data” store]


  14. Persistence
    [Same volume definition and diagram as the previous slide]
    ● Nobody cares where or what the persistence base is - we just have
      space now
    ● Infra can develop tools to enhance storage for everyone (automated
      backups, snapshotting, etc.)
    ● Backend-agnostic - GCP, AWS, Azure, etc.
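
    In Kubernetes terms (one scheduler among those named at the end of the
    deck), carving off that hunk of storage is a PersistentVolumeClaim. A
    minimal sketch, assuming a cluster with a default StorageClass (the
    claim name is illustrative):

    # The app asks for space; which backend satisfies the claim (EBS,
    # GCE PD, Ceph, ...) is the cluster's concern, not the app's.
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: big-ol-data
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 50Gi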


  15. Services (Traditional)
    ● Both internal and external:
    ○ Spin up an app, add it to a pool of servers
    ○ Health checks sometimes
    ○ Typically, the “expose this” process is very loosely coupled with
      “provision this”


  16. Services
    ● Tie service endpoints to groups of containers and let the
      router/proxy handle it for you
    [Diagram: a service fronting a group of pods]


  17. Services
    ● Load balancers become a by-product of naturally selecting endpoints
      from a pool of healthy endpoints
    [Same pods diagram as the previous slide]
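
    A sketch of that tie, again assuming Kubernetes (the labels and ports
    are illustrative): a Service selects pods by label, and the proxy layer
    balances across whichever endpoints are currently healthy.

    # Routes to every ready pod labeled app: myapp; the endpoint pool
    # updates itself as pods come and go.
    apiVersion: v1
    kind: Service
    metadata:
      name: app
    spec:
      selector:
        app: myapp
      ports:
        - port: 80          # stable service port
          targetPort: 8080  # the container's listening port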


  18. Services (+MORE)
    Traefik/Envoy/Fabio are solving neat problems:
    ● Automatic Let’s Encrypt TLS
    ● Automatic Host/app-name routing
    ● Networking ACLs
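
    For example, with Traefik deployed as the cluster's ingress controller
    and ACME/Let's Encrypt enabled in Traefik's own configuration (the
    hostname and backend below are illustrative), hostname routing is an
    annotation away:

    # Kubernetes Ingress (the extensions/v1beta1 API current in 2018)
    # claimed by Traefik; TLS certificates come from Traefik's ACME setup.
    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      name: app
      annotations:
        kubernetes.io/ingress.class: traefik
    spec:
      rules:
        - host: app.example.com
          http:
            paths:
              - backend:
                  serviceName: app
                  servicePort: 80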


  19. Better Processes
    Runtime / Monitoring / Persistence / Services
    ● Contracts are clear - no one needs to learn another team’s tools if
      they don’t want to
    ● Improvements and iteration are completely unblocked on either side
    ● Infra tooling becomes immediately useful for everyone on the platform


  20. Thank you!
    github.com/tylerjl
    irc/twitter: leothrix
    tjll.net
    Additional Information:
    ● Google for:
    ○ Kubernetes
    ○ Nomad
    ○ Mesos
    ○ Traefik
    ○ Envoy
    Let’s talk about monitoring/metrics at the Elastic booth
