Systems Management Concepts and Futures

Some views into the current state of things, why some things are the way they are, and where some things may be going.

Michael DeHaan

May 01, 2015

Transcript

  1. SYSTEMS MANAGEMENT CONCEPTS AND
    FUTURES
    michaeldehaan.net / @laserllama

  2. ABOUT ME
    • Run a subteam of DataStax OpsCenter (w00t!)
    • disclaimer: these opinions are my own
    • Previous systems management things:
    • IBM - storage management
    • Red Hat - wrote Cobbler, co-wrote Func, others
    • Puppet Labs - short stint helping with Product Management, but learned a lot
    • rPath - reproducible immutable systems before its time, and also way too complicated
    • Ansible - side-project started 3 years ago, now 120k downloads/month on PyPI (reality = x4?)
    • Ansible, Inc - CTO. Ran all of Engineering/Strategy/Architecture, Ansible Tower, & OSS project

  3. THE OPS WORLD IS CHANGING.
    CONCEPTS AND THOUGHTS.

  4. THE ROLE OF THE HUMAN
    IS
    DECREASING.

  5. EVOLUTION OF
    BETTER OPS
    manual effort
    in-house scripting
    automation tools
    basic manual virt
    basic private / public cloud
    effective use of IaaS
    immutable systems
    metal
    PaaS
    self-managing clusters?
    SKY NET
    Robots who can build/rack HW

  6. ASIDE: WHAT’S UP WITH
    “DEVOPS”
    • This word means too many things.
    • Originally, much of the interest was about automated tooling, regardless of the meaning of the
    phrase. This was “Infrastructure As Code”. Sort of the “Software
    Craftsmanship” or “Test Engineering” of the Sysadmin. “I don’t just type in and click stuff”.
    • Then it became about communication/culture (conferences started rejecting tooling talks)
    • Some interesting parts are actually about Japanese Auto Manufacturing: pipelines,
    Continuous Integration, push-button (sometimes Continuous) Deployment.
    • There are often lots of cloud and monitoring bits.
    • Doesn’t matter. Let’s talk about “Ops”.

  7. BASIC CONCEPTS

  8. AUTOMATION SYSTEMS
    • Configuration Management - services, packages, files
    • Application Deployment - your software
    • Orchestration - controlling the above over a network
    • Cloud Automation / Provisioning Systems
    • Image Build Systems / Continuous Integration / Deployment
    • Monitoring/Trending - Critical, Super Interesting, And We’re Not
    Talking About It Much Today

  9. CONFIGURATION
    MANAGEMENT
    • CFEngine (DSL) - the first major structured system; rarely chosen
    today, though still present in some large shops, and mired in complexity
    • Puppet (DSL) - first usable system, IMHO. Gained popularity during an
    incompatibility between CFEngine 2 and 3.
    • Chef (Ruby) - founded by Puppet users unhappy with ordering and other
    kinks in Puppet, who also wanted to write directly in Ruby.
    • Various others - Pallet (Clojure!), Salt (impure YAML), bcfg2 (XML)
    • Ansible (YAML) - focused on multi-node management and converging
    application deployment cases, over SSH versus custom protocol/agent

  10. BACKGROUND: GPS
    ANALOGY
    • Imperative: to get to North Carolina, drive 2850 miles East (assuming you are
    in CA)
    • Declarative: be in North Carolina, just do it. If you’re there, do nothing.
    • Idempotence: most misused word ever, but F(x) = F(F(x)). (See the sketch after this slide.)
    • Enables repeated re-application to minimize “drift”
    • Drift is a phantom fear if you only edit things through your management tool.
    It’s real if you don’t.
    • Centralized sources of truth. Manage everything from ONE place.
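
    A minimal sketch of the declarative/idempotent idea in Ansible's YAML (the path is made up for
    illustration): the task states the desired end state rather than steps, so F(F(x)) = F(x) and a
    second run reports no change.

      - hosts: all
        tasks:
          # declare the end state, not the steps to get there
          - file: path=/etc/myapp state=directory mode=0755
            # first run:  changed (the directory gets created)
            # second run: not changed (already in the desired state)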

  11. ASIDE: SERVICE/PACKAGE/FILE
    • The key part of Config Management is managing 3 key resource types (sketched after this list):
    • Service - make this service be running or stopped or
    disabled. Possibly automatically restart when certain files/
    packages change.
    • Files - templates, copies, attributes, SELinux, etc.
    • Packages - install this package (usually yum/apt) and make
    sure it’s at the latest version or a specific version or maybe
    just installed.
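
    The three resource types as Ansible tasks, sketched with made-up package and path names; the
    notify/handler pair shows the "restart when a file changes" behavior mentioned above.

      - hosts: webservers
        tasks:
          - name: package - install nginx (no-op if already present)
            apt: name=nginx state=present
          - name: file - template the config; notifies a restart only when it changes
            template: src=nginx.conf.j2 dest=/etc/nginx/nginx.conf
            notify: restart nginx
          - name: service - keep nginx running and enabled at boot
            service: name=nginx state=started enabled=yes
        handlers:
          - name: restart nginx
            service: name=nginx state=restarted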

  12. APPLICATION DEPLOYMENT
    SYSTEMS
    • taking your application in source control to your
    machines
    • possibly migrating databases
    • (in-house software rarely gets packaged right)
    • examples: Capistrano, Fabric, Ansible (all SSH); see the sketch below
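
    A hedged sketch of the same idea in Ansible form (the repo URL, paths, and service name are
    hypothetical): pull a tagged release from source control, migrate the database, restart the app.

      - hosts: appservers
        tasks:
          - name: check out the release from source control
            git: repo=https://example.com/myapp.git dest=/opt/myapp version=release-1.2
          - name: run database migrations (only one node needs to do this)
            command: /opt/myapp/manage.py migrate
            run_once: true
          - name: restart the application
            service: name=myapp state=restarted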

  13. ORCHESTRATION
    • Second Most Overused Term In This Space?
    • Examples: Ansible (SSH), Func (SSL), mCollective (message based),
    OpsWorks
    • Could Mean:
    • Ordered application of config management in tiers
    • Multi-node
    • Rolling updates w/ load balancers (sketched after this list)
    • Anything
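
    One concrete reading of "rolling updates w/ load balancers", sketched as an Ansible play; the
    lb-disable/lb-enable commands and the lb01 host are placeholders for whatever your load balancer
    actually exposes.

      - hosts: webservers
        serial: 5                  # update 5 nodes at a time (rolling)
        pre_tasks:
          - name: pull this node out of the load balancer (placeholder command)
            command: /usr/local/bin/lb-disable {{ inventory_hostname }}
            delegate_to: lb01.example.com
        tasks:
          - name: deploy the new application version
            yum: name=myapp state=latest
          - name: restart the app
            service: name=myapp state=restarted
        post_tasks:
          - name: put the node back in the pool (placeholder command)
            command: /usr/local/bin/lb-enable {{ inventory_hostname }}
            delegate_to: lb01.example.com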

  14. CLOUD CONTROL
    • Controlling Cloud Resources / Topology
    • Examples (a sketch follows the list):
    • Ansible (not quite a cohesive model)
    • CloudFormation (Amazon specific)
    • Terraform
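
    A rough sketch of provisioning from Ansible (the AMI id, key pair, and security group are
    placeholders, and AWS credentials are assumed to be in the environment); CloudFormation and
    Terraform express the same intent as a declarative template instead.

      - hosts: localhost
        connection: local
        tasks:
          - name: launch three web instances in EC2
            ec2:
              image: ami-XXXXXXXX        # placeholder AMI id
              instance_type: t2.micro
              count: 3
              key_name: mykey            # placeholder key pair
              group: web-sg              # placeholder security group
              region: us-east-1
              wait: yes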

  15. MORE ABOUT ANSIBLE
    JUST BECAUSE I KNOW
    ABOUT THIS ONE :)
    HTTP://DOCS.ANSIBLE.COM

  16. WHY ANSIBLE
    • Computers are no longer about single nodes, but nodes working in
    concert
    • Sometimes a disconnect between Config Tools and Deployment Tools
    (Large # of Puppet/Chef users also using Fabric/Capistrano)
    • Avoiding head-desking over common problems with other tools
    (for me).
    • Avoiding historical agent-fun (NTP, SSL, certs/CAs, did the agent
    crash?, how do I upgrade?, CPU/RAM drain)

  17. PUSH VS PULL: MYTHS
    • Commonly believed that push doesn’t scale. Not true. 10k nodes is
    possible from 1 node with major caveats, but you must limit the total number
    of tasks (ansible push runs ansible locally). Also: do you want to possibly
    break 10k nodes at once? Not usually. Updates should roll. (Talking to
    several hundred at once is totally reasonable; it will auto-loop.)
    • With push OR pull, anything doing 10k nodes can set your network on
    fire. Package mirror? First to fall. Don’t wget things from someone’s personal
    web space on every node - a personal favorite misuse of automation (DDoS!)
    • Pull can actually create a thundering herd; historically, Puppet catalog
    compilation was very CPU-bound

  18. WHY NOT PULL:
    ALSO ALSO WIK
    • Push based systems can “do this now” on all
    nodes a bit more easily, without a separate system
    to tell the pull to “pull now”.
    • Quick to choreograph steps between tiers -
    maximum speed versus 30m+30m+30m worst
    case for a 3-tier op (web+db+other, etc).

  19. THINGS I WOULD HAVE
    DONE DIFFERENTLY
    • Puppet has a strong type/provider model; I skipped that to save time early
    on. Not critical, but it would have been nice, and it’s now difficult to retrofit (~300
    modules in core)
    • Far fewer modules in core; they take a lot of time to support (but they are also
    really good for adoption!)
    • I would have focused more on modular architecture (for maintenance/
    testing, not speed) earlier on, versus chasing large-scale contribution rates, which were
    really too much. The ‘v2’ effort now underway will take care of most of this,
    optionally enabling nodes to deploy at their own pace versus in lockstep.

  20. ANSIBLE ARCHITECTURE
    • no server required, no database, no ui
    • no agents; just log in over SSH (with ControlPersist)
    • deploy ‘modules’ as units of work, which are
    declarative/idempotent, etc.; each module emits JSON (example below)
    • language is just YAML - easier for machines to
    read/write, and good enough for people
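
    Roughly what a single module run emits (a hedged sketch; real modules return more fields):
    plain JSON on stdout, with at least a "changed" flag that the declarative model and
    reporting hang off of.

      {"changed": true, "path": "/etc/myapp", "state": "directory", "mode": "0755"}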

  21. ALL THE
    TRENDS

  22. CLOUD USAGE
    • Most people (90%?) use AWS as “VMware in the cloud”
    • Real power is using more of it (accepting lock-in):
    • ELBs
    • S3
    • other services: machine learning!
    • immutable systems

  23. IMMUTABLE SYSTEMS
    • What is really meant by “treat computers like
    cattle, not like pets” is they are disposable.
    Instances do not have names.
    • Horizontal scaling should be implicit in
    architectural choices

  24. IMMUTABLE SYSTEMS
    • much faster!
    • what gets tested can be exactly what is deployed
    • avoid failure/surprises during install/upgrades/autoscale in:
    • package updated/missing on mirror
    • network outage on mirror
    • miscellaneous failure on wget

  25. IMMUTABLE SYSTEMS(2)
    • Handle persistent data with RDS or volumes,
    or avoid it entirely
    • How do nodes find each other? Service
    discovery or load balanced pools:
    • etcd, ZooKeeper, Consul, others, ELBs/Zuul

  26. CONTINUOUS INTEGRATION
    • Jenkins
    • always run your unit and integration tests
    • tests are required for Continuous Integration
    • successful tests result in new image builds if
    you are going the image-based route, which you’ll use
    later in deployment (recommended); see the sketch below
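
    If the image-based route is taken, the build step can be a small play like this sketch (the
    instance id and naming are placeholders, and AWS credentials are assumed to be in the
    environment): bake an AMI from the test-approved instance and hand its id to the deploy stage.

      - hosts: localhost
        connection: local
        tasks:
          - name: bake an image from the test-approved instance (placeholder id)
            ec2_ami:
              instance_id: i-XXXXXXXX
              name: myapp-build-20150501   # placeholder naming scheme
              region: us-east-1
              wait: yes
            register: baked
          - name: show the new image id for the deploy stage to pick up
            debug: var=baked.image_id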

  27. CONTINUOUS DEPLOYMENT
    • One of the Nirvana paths not everybody can get to
    • Getting to “frequent deployment” is good.
    • Automated rollout from a button, or automatically from Jenkins,
    upgrading all nodes in cloud/system
    • Relies on orchestration tooling - and either images (better) or
    running config automation. Often use load balancers to take
    upgrading nodes offline or swap old instances out for newer ones.

  28. MONITORING/GRAPHING
    • Lots of options. Critical for any good DevOps pipeline.
    • Not just about reporting failure. Detect trends before they
    become problems (slow queries, resource issues, space issues, etc).
    • Hosted monitoring is growing in popularity because the monitoring
    system stays available in the event of a crash of your infrastructure
    • Log file analysis is also growing in popularity, with ELK and others rising up because of
    the high cost of proprietary options (ex: Splunk).

  29. PAAS
    • I think this feels kind of dead, but I travel in the wrong circles so
    I could be wrong. I want this to be very much alive.
    • The premise: I just want my code to run in the cloud, give me
    however many instances I need, and don’t make me see them.
    Classic automation then supports bringing up the PaaS
    and stops. “Just let me be a developer”.
    • Hard for existing apps, may be great for green field (but
    sometimes expensive).

  30. CONTAINERS (“DOCKER”)
    • Makes immutable systems more accessible to non-
    cloud-image based crowd. Personally I think it’s most
    interesting for blue/green upgrades.
    • Sometimes confusing as some are running additional
    “cloud” software on another cloud.
    • The best/most reliable/future management software is not
    entirely certain yet (Mesos/Fleet/OpenStack/Kubernetes/other).

  31. SELF-MANAGING
    • Can management be made turnkey so
    applications scale themselves?
    • Apps just know AWS API and can add their own
    capacity for worker nodes.
    • Not generic PaaS, but domain specific.

  32. FINAL THOUGHTS
    • Ansible attempts to be a reset in making some earlier automation tool concepts more accessible, and mostly succeeds at this.
    The YAML language is not great, but it’s quick. Fewer moving parts is a huge win and makes tooling accessible to audiences that
    struggled with previous efforts, which is why it’s so widely deployed. Still, it’s a stepping stone towards immutable - but with
    various shops at various points on that journey. Many have enough other things to deal with that they aren’t ready for that now.
    • Still, IT systems are evolving more towards immutable systems (image-based) and PaaS-enabling systems over time,
    particularly in leading-edge shops or new ventures. Progress is good! Various things are still getting refined. Containers will help,
    but right now they also add a degree of complexity. Just building AMIs if you’re on Amazon is a great start. Immutable systems
    mean you can skip learning automation languages, which is nice (ex: Dockerfiles), but you likely still need automation to deploy
    your container management system itself.
    • The infrastructure and nuts and bolts behind the apps, cloud, and network will matter less to more people over time.
    Increasingly, intent can be coded, rather than form and common building blocks.
    • True IaaS applications are significantly different, and should be written differently. Flexibility, lock-in, and cost are traded for
    better reliability, scalability, and ease of management. Write apps for the business instead of reinventing the same wheels everyone
    has to invent.
    • Many more people are writing code in Ops land. Will “10 years in AWS Services” be the new “10 years in J2EE”
    for Ops professionals?

  33. QUESTIONS?
