The Testing – Monitoring Continuum (DevOps Sydney, 2015)

Rob Howard
January 15, 2015

"Testing" and "Monitoring" are often seen as completely separate things. Let's try mixing it up.

(Lightning talk.)

Transcript

  1. The
    Testing↔Monitoring
    Continuum

  2. Question Time

  3. Ops
    Who here uses Nagios?
    Monit+MMonit? ... Serverspec?

  4. Dev
    Who writes unit tests? Integration
    tests, e.g. using browser-driving
    full-stack tools like Selenium,
    Capybara, etc.?

  5. What is
    Monitoring?

  6. You're developing a product; an
    app or something.
    Let's say we have a bunch of
    machines running that app.

  7. Load Balancers

  8. Load Balancers
    App Servers

  9. Load Balancers
    App Servers
    Data Store

  10. We're monitoring. What do we
    do? Well, first we should probably
    make sure that the servers are
    actually up. Easy!

  11. Well, what about more specific
    things?
    Is PostgreSQL running on the
    database server? Can we see its PID?

  12. Is Postgres accepting
    connections?

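A rough way to sketch the "is it accepting connections?" check with only the Python standard library is a plain TCP connect. This is an illustration, not something shown in the talk: the function name `port_accepts_connections` is made up, and a successful TCP handshake only tells you that something is listening on the port. It says nothing about credentials, extensions, or tables.

```python
import socket


def port_accepts_connections(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For a database box you might call `port_accepts_connections("db.example.internal", 5432)`, where the host name is hypothetical and 5432 is PostgreSQL's default port.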

  13. Is it accepting connections with
    the right username + password for
    the app? Maybe we stuffed up a
    config rollout.

  14. Okay, but does it have the PG
    extensions the app needs, e.g. for
    UUID generation?

  15. Is the app's database named
    correctly?

  16. Can the app see the tables it
    needs in the database?

  17. Can it write to those tables?
    Maybe we screwed up the
    permissions.

  18. THIS IS GETTING A BIT MUCH.

  19. Do we have to do this for every
    service or node that we're
    running?
    Where do we stop?

  20. Run the App.
    Well, maybe the best way of doing
    this is running the app itself.
    We could write a bash+curl script
    that just tests logging in.

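The slide suggests bash+curl; the same idea can be sketched in Python with just the standard library. Everything specific here is an assumption for illustration: the `/login` path, the `username`/`password` form field names, and the rule that any 2xx response means the login worked. Your app will differ.

```python
import urllib.error
import urllib.parse
import urllib.request


def login_works(base_url, username, password, timeout=5.0):
    """Return True if POSTing credentials to the (assumed) /login path succeeds."""
    # Field names are hypothetical; match them to your app's login form.
    form = urllib.parse.urlencode({"username": username, "password": password})
    request = urllib.request.Request(
        base_url.rstrip("/") + "/login",
        data=form.encode("ascii"),
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # 4xx/5xx responses raise HTTPError (a URLError); network
        # failures and timeouts surface as URLError or OSError.
        return False
```

A monitoring job could call `login_works("https://app.example.com", "monitor", "...")` on a schedule and alert on `False`; the URL and account name are placeholders.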

  21. Run the App's Tests.
    But is that testing everything the app needs
    to use? Maybe it'll break on the next click.
    Why not go the whole hog? Our app has an
    integration test suite (or should have). We
    spent a lot of money on it!

  22. Story Time
    Let's say we have a multi-tenant,
    hosted, Software-as-a-Service app
    that users buy instances/accounts
    for. VM Hosting, Chat, whatever.

  23. Local Dev. Env
    We'd have unit tests that you run
    on your local box.

  24. Local Dev. Env
    But also those big browser-driven
    tests as well. The test runner is still
    local, against a local copy of your
    app.

  25. Local Dev. Env
    Production
    Staging
    We have staging and production
    environments too.

  26. Local Dev. Env
    Production
    Staging
    Why don't we:
    * Spin up a new account on staging.
    * Run the integration tests against that
    new account.
    * Throw away the account afterwards.

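The three steps above could be wired together roughly like this. It's a sketch under assumptions: `create_account`, `run_tests`, and `delete_account` are hypothetical callables standing in for whatever your app's provisioning API and test runner look like; none of them come from the talk.

```python
def run_suite_on_fresh_account(create_account, run_tests, delete_account):
    """Create a throwaway account, run the suite against it, then clean up.

    Returns whatever run_tests returns (for example True when the suite passes).
    """
    account = create_account()
    try:
        return run_tests(account)
    finally:
        # Runs even if the suite raises, so accounts don't pile up on staging.
        delete_account(account)
```

The `try`/`finally` is the main design choice: a crashing test run should still throw the account away.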

  27. Local Dev. Env
    Production
    Staging
    It could be a custom app kicking off
    these test runs, but it could easily be
    Jenkins.

  28. Local Dev. Env
    Production
    Staging
    Do the same for production!
    Have these tests run over and
    over again. Chew up some of your
    production capacity, but have
    greater surety that your app works
    when placed into the staging and
    production environments you've
    configured and rolled out.

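"Over and over again" could be as simple as a loop with a pause and an alert hook. This is a minimal sketch, assuming a hypothetical `run_once` callable that returns True when the suite passes and an `alert` callable that notifies someone; in practice a Jenkins job on a timer would do the same thing. The five-minute default interval is arbitrary.

```python
import time


def run_repeatedly(run_once, alert, interval_seconds=300, max_runs=None,
                   sleep=time.sleep):
    """Call run_once on a fixed interval and alert on failures or crashes.

    Loops forever when max_runs is None. Returns the number of runs made.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            passed = run_once()
        except Exception as error:
            # A crashing test run is treated as a failure, not a reason to stop.
            alert("test run crashed: %r" % (error,))
        else:
            if not passed:
                alert("test run failed")
        runs += 1
        sleep(interval_seconds)
    return runs
```

Passing `sleep` in as a parameter is only there so the loop can be exercised without actually waiting.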

  29. Local Dev. Env
    Production
    Staging
    We're testing the
    app+infrastructure interface.
    We're testing that, say, the file
    upload feature on your chat app
    actually works with the
    infrastructure it's relying on.

  30. Local Dev. Env
    Production
    Staging
    It's not super-easy or perfect.
    Testing interactions with external
    systems (particularly payment
    ones) is hard; you might just have
    to turn off those parts of your
    tests and instrument error
    detection instead.

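One way "turning off parts of your test" might look, sketched with Python's standard `unittest` module. The `TARGET_ENV` environment variable, the `skip_in` helper, and the test names are all assumptions made up for this example; the talk doesn't prescribe a mechanism.

```python
import os
import unittest


def skip_in(*environments):
    """Build a decorator that skips a test when TARGET_ENV is one of `environments`."""
    # TARGET_ENV is an assumed variable name, set by whatever kicks off the run.
    current = os.environ.get("TARGET_ENV", "local")
    return unittest.skipIf(
        current in environments,
        "not exercised against " + current,
    )


class ChatAppIntegrationTests(unittest.TestCase):
    def test_file_upload(self):
        # Placeholder for a real browser-driven test.
        pass

    @skip_in("production")
    def test_card_payment(self):
        # Placeholder: would talk to the payment provider, so it is switched
        # off in production and left to error-rate instrumentation instead.
        pass
```

Note that the environment is read when the test class is defined, so `TARGET_ENV` has to be set before the test module is loaded.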

  31. Local Dev. Env
    Production
    Staging
    And finally, to be clear, this isn't
    replacing your environment tests
    (e.g. available disk/RAM/CPU) or
    error-rate instrumentation; it's
    to remove the need for a ton of
    individual fine-grained service
    checks that are better covered
    by hitting the app with your
    existing test suite.

  32. Testing Monitoring
    Back to the title.
    Instead of Testing and Monitoring
    as separate, discrete things, I'd
    argue that…

  33. Testing + Monitoring
    … Testing is a part of Good
    Monitoring.

  34. Fin.

    Rob Howard

    @damncabbage

    https://speakerdeck.com/damncabbage/
    Thanks! One final thing…

  35. I work at OrionVM and we're hiring; we're building
    cloud hosting (physical) infrastructure, and we're
    after an Ops person (networking+routing, physical
    server wrangling, configuration management) and a
    Ruby/JS dev (UI) to help out.
