Web Application Monitoring with Cucumber

Jesse Newland

October 02, 2011

Transcript

  1. Feature: Ruby on Rails Application Monitoring with Cucumber
    In order to ensure continuous application availability
    A developer should be able to assert the behavior of production apps
    From the outside in
    Without using antiquated monitoring tools
    To protect revenue

  2. VP of Research & Development
    railsmachine.com
    [email protected]
    @jnewland
    github.com/jnewland
    About me:
    I get to hack on Ruby tools to manage large Rails deployments all day long. Not a bad job,
    eh?

  3. Before we get into monitoring or cucumber, let’s talk about testing.
    In my career as a dev, my testing habits have evolved over time, largely inspired by available
    tools.
    I’m sure some of you have shared a similar journey - let’s take a quick look back.

  4. No more clicking around
    Save in your editor / refresh in your browser / lather / rinse / repeat.
    Occasional human-performed quality assurance
    Broken by design

  5. I then made the jump to unit testing using Ruby’s Test::Unit - specifically the Model and
    Controller tests that Rails generated.
    This was nice, but it was often devalued by stakeholders due to poor communication of the
    business value of this work on my part.

  6. R
    Enter RSpec and the BDD movement.
    RSpec helped me, and I’m sure a lot of others, associate the business value with writing
    tests / specs.
    Stakeholder-digestible code if you’re really good, stakeholder-digestible output if you’re
    doing things right.

  7. CUCUMBER
    Basically, BDD nirvana. Stakeholder-*writable* if you’re crazy.

  8. Cucumber lets software development teams describe how software should behave in plain text.
    The text is written in a business-readable domain-specific language and serves as
    documentation, automated tests and development-aid - all rolled into one format.
    For those of you that aren’t familiar with Cuke

  9. TATFT!
    The most important part of the evolution of these tools is that they make it easy and -
    legitimately - fun to test first and test all of the time as you’re developing your application.

  10. Production
    Monitoring
    But what about production? We’re testing all the time in development, while we’re developing
    the app that’s going to create revenue. But in production...

  11. Revenue
    Preservation
    ...there’s actually revenue being earned. Why not test with the same ferocity in production?

  12. Current
    Monitoring
    Landscape
    Quiz:
    * Raise your hand if you are at least partially responsible for the continuous operation of a
    business-critical production Rails app
    * If you have ZERO monitoring of the site’s uptime - meaning your customers or boss would
    be the one to tell you that the homepage was down - put your hand down
    * If your monitoring solution runs on your server itself - monit or god, for example - put
    your hand down
    * If your external monitoring solution only hits one URL on the site, put your hand down
    Some sites are monitored very closely, but I’ve found that in most cases, the monitoring of
    many production apps is rather slim.
    I generally evaluate monitoring solutions on two axes:

  13. What’s being monitored - which URLs, metrics, system statistics, etc. are being watched

  14. How closely are you looking at it?
    We’ll call this one the crazy monkey test - How frequently these URLs / metrics are being
    queried, what values are acceptable, etc.

  15. It seems that in many situations, the home page of an application is the only thing checked
    closely

  16. The crazy monkey has laser focus

  17. but if the crazy monkey is that focused

  18. Bad things can happen when he’s not looking.
    For example, in Rails apps, I see this happen all the time with...

  19. Search is a part of many applications that I’ve seen go unmonitored. I’m not singling out
    sphinx here - this is just a sweet picture - the same thing happens to Solr, etc

  20. Search can fail while the rest of a site works fine, for many reasons:
    * the search daemon may go down
    * the indices may be corrupt
    * or things may fail in a more interesting kind of way...

  21. 0 results for “beer”
    Wherein no results are returned when they obviously should be.
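
A production check for this failure mode could assert that a query with known matches returns at least one result. The following is a minimal Ruby sketch, not something shown in the talk: the `result_count` and `search_healthy?` helpers are hypothetical, and the sketch assumes the search page reports its count in text such as `42 results for "beer"`.

```ruby
# Hypothetical helper: pull the result count out of a search page body.
# Assumes the page contains text like '42 results for "beer"'.
def result_count(body)
  match = body.match(/(\d+)\s+results? for/i)
  match ? match[1].to_i : nil
end

# True only when the page reports at least one result.
# A page with no recognizable count (an error page, say) is treated as unhealthy.
def search_healthy?(body)
  count = result_count(body)
  !count.nil? && count > 0
end

broken  = '0 results for "beer"'
working = '42 results for "beer"'

puts search_healthy?(broken)   # false
puts search_healthy?(working)  # true
```

The point of the sketch is that the check looks at what the search returned, not merely whether the page responded with a 200.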

  22. TATFT?
    So are we really testing all the time?

  23. TATFT*
    *except in production
    It doesn’t seem so

  24. _why?
    But why? Why are we testing so....ferociously in development, but so weakly in production?

  25. Old
    Broken
    Tools
    I’m largely convinced it’s because the tools presented to us in the monitoring space are
    old and broken.

  26. How many of you recognize this? Oh, Nagios.

  27. It’s the industry standard tool for infra monitoring. I haven’t met a single person who has
    used Nagios and been an honest fan. The most widely despised part of Nagios

  28. is the noise. Unless masterfully configured, Nagios is a noisy beast. This leads to “boy who
    cried wolf” scenarios, wherein real alerts are dismissed as noise and discarded.

  29. EVIL
    Because of the noise, the piece of crap interface, the esoteric configuration language, and
    years and years of being woken up by false positives, I’m going to paint this all in black and
    white and just call Nagios evil.

  30. Pingdom’s a relatively new tool that’s gained a good bit of traction. It’s a hosted monitoring
    service that can test HTTP and many other types of services from a network of computers
    around the world.


  31. Nagios and pingdom pass the crazy monkey intense focus test


  32. but in their default configuration generally only monitor a snapshot of what’s necessary.

  33. A recent entry into the space that doesn’t get a quick EVIL stamp from me is Watchmouse

  34. Twitter uses watchmouse to provide a public API status page, hitting many different API
    endpoints and watching for outages and service problems


  35. Twitter’s use of Watchmouse passes the “what are you looking at” test

  36. Business
    Value
    Disconnect
    However, one thing that all of these tools are missing is a clear link between the business
    value of the things they’re checking and the alerts they’re sending out

  37. Hey, I know something that does that well!

  38. Cucumber lets software development teams describe how software should behave in plain text.
    The text is written in a business-readable domain-specific language and serves as
    documentation, automated tests and development-aid - all rolled into one format.
    Cucumber’s served well for me in my experience in bringing stakeholders and developers
    together.

  39. Cucumber lets software development teams describe how software should behave in plain text.
    The text is written in a business-readable domain-specific language and serves as
    documentation, automated tests and development-aid - all rolled into one format.
    But with a couple quick edits

  40. Cucumber also lets operations teams describe how infrastructure should behave in plain text.
    The text is written in a business-readable domain-specific language and serves as
    documentation, monitoring and deployment-aid - all rolled into one format.
    We have a tool that can help us bring together developers, operations, *and* stakeholders

  41. #devops
    Some of you following the twitterz may have noticed some people in the ops and
    development space talking about the ‘devops movement’

  42. devs
    ops
    working together
    While calling this a movement is pretty wild - a hashtag does not a movement make - the
    ideas surrounding this ‘movement’ are things that I believe in personally, and things we’re
    working on every day at Rails Machine - blurring the line between development and ops, and
    the line between the infrastructure and the application.

  43. Cucumber also lets #devops teams describe how applications should behave in plain text.
    The text is written in a business-readable domain-specific language and serves as
    documentation, monitoring and deployment-aid - all rolled into one format.
    Using cucumber in production embodies everything that is devops, and can blur those lines
    even more

  44. kumbaya
    And thus result in a big happy #devops family

  45. Example
    Production
    Cucumber
    Features

  46. Benchmarking

  47. Feature: slashdot.org
      To keep the geek masses satisfied
      Slashdot must be responsive
      Scenario: Cached pages are super quick
        Given I am benchmarking
        When I go to http://slashdot.org/
        Then the elapsed time should be less than 500 milliseconds
        When I follow "Login"
        Then the elapsed time should be less than 500 milliseconds
        When I follow "Contact"
        Then the elapsed time should be less than 500 milliseconds
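
The "elapsed time should be less than 500 milliseconds" steps need something that times the previous action and compares it to a limit. Here is a minimal Ruby sketch of that idea using the standard library's `Benchmark.realtime`. The `ElapsedTimer` class is hypothetical and is not how cucumber-nagios implements its benchmarking steps; the `sleep` call stands in for fetching a page.

```ruby
require 'benchmark'

# Hypothetical helper backing steps such as
# "Then the elapsed time should be less than 500 milliseconds".
class ElapsedTimer
  attr_reader :elapsed_ms

  # Run the block (for example an HTTP request) and record how long it took.
  def measure
    seconds = Benchmark.realtime { yield }
    @elapsed_ms = seconds * 1000.0
  end

  # True when the last measured action finished within limit_ms.
  # Nothing measured yet counts as a failure rather than a pass.
  def under?(limit_ms)
    !@elapsed_ms.nil? && @elapsed_ms < limit_ms
  end
end

timer = ElapsedTimer.new
timer.measure { sleep 0.05 }   # stand-in for fetching a page
puts timer.under?(500)
```

In a real step definition, the `When I go to ...` step would wrap the request in `measure` and the `Then` step would call `under?` with the number captured from the step text.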

  48. Email Deliverability

  49. Feature: Signup Emails
      In order to prevent bots from taking over the site
      A new user should receive a verification email upon signup
      Scenario: New User signup
        Given I visit "http://example.com"
        And I follow "Signup!"
        When I signup with a random email address and password
        And I press "Go"
        # an unfortunate reality
        And I wait 10 seconds
        Then I should have one email in my inbox
        And the email subject should match "^Welcome"
        And the email body should match "http:\/\/example.com\/v\/\w+"
    https://github.com/technicalpickles/mailinator-spec
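
The two patterns in the scenario above are ordinary regular expressions, so the matching step boils down to something like the following Ruby sketch. The sample email and the `email_matches?` helper are made up for illustration; fetching the message from a throwaway inbox is left out, and nothing here is taken from mailinator-spec itself.

```ruby
# The two patterns come from the feature above; the sample email is invented.
SUBJECT_PATTERN = Regexp.new('^Welcome')
BODY_PATTERN    = Regexp.new('http:\/\/example.com\/v\/\w+')

# Hypothetical stand-in for a message fetched from a throwaway inbox.
email = {
  subject: 'Welcome to Example!',
  body:    "Thanks for signing up.\nVerify here: http://example.com/v/abc123\n"
}

# True when both the subject and the body match their patterns.
def email_matches?(email, subject_pattern, body_pattern)
  !!(email[:subject] =~ subject_pattern) && !!(email[:body] =~ body_pattern)
end

puts email_matches?(email, SUBJECT_PATTERN, BODY_PATTERN)  # true
```

Note that the unescaped dots in `example.com` match any character, so the body pattern is slightly looser than it looks; writing `example\.com` would tighten it.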

  50. Existing
    Metrics

  51. Feature: Response Time
      As an impatient user
      Our web server should be in tip-top shape
      So our app can be super fast
      Background:
        Given my Scout account name is 'railsmachine'
        And my Scout email and password are '[email protected]' and 'sekret'
      Scenario: Passenger Queue
        When I get the metrics from the 'Passenger' plugin on 'example.com'
        Then the 'passenger_queue_depth' should be 0
      Scenario: CPU usage is low
        When I get the metrics from the 'Server Overview' plugin on 'example.com'
        Then 'cpu_last_minute' should be less than 1
    http://github.com/jnewland/cucumber-scout/
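
Steps like "'cpu_last_minute' should be less than 1" work by matching the plain-text phrase with a pattern and comparing the captured number against a metric value. The Ruby sketch below shows that mechanism in isolation. The `STEP_PATTERN` regex, the `check_step` helper and the hard-coded metric values are all hypothetical; cucumber-scout fetches real values from Scout and may word its step definitions differently.

```ruby
# Hypothetical metric values; in the talk these come from Scout plugins.
METRICS = { 'passenger_queue_depth' => 0, 'cpu_last_minute' => 0.42 }

# Matches phrases like "'cpu_last_minute' should be less than 1"
# and "the 'passenger_queue_depth' should be 0".
STEP_PATTERN = /\A(?:the )?'(\w+)' should be (less than )?(\d+(?:\.\d+)?)\z/

# Parse the step text, look up the metric, and apply the comparison.
def check_step(step, metrics)
  match = STEP_PATTERN.match(step)
  raise ArgumentError, "unrecognized step: #{step}" unless match
  name, less_than, expected = match[1], match[2], match[3].to_f
  actual = metrics.fetch(name)
  less_than ? actual < expected : actual == expected
end

puts check_step("the 'passenger_queue_depth' should be 0", METRICS)  # true
puts check_step("'cpu_last_minute' should be less than 1", METRICS)  # true
```

This is the same idea a Cucumber step definition expresses with a regex and a block: the business-readable sentence is the interface, and the comparison lives in a few lines of Ruby behind it.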

  52. Feature: Response Time
      As an impatient user
      Our app should be super fast
      Background:
        Given my NewRelic license key is 'omgwtfbbq'
      Scenario: Average Response time
        Given that my application is being monitored by New Relic
        Then my application's 'response time' should be less than 500 milliseconds
      Scenario: Apdex
        Given that my application is being monitored by New Relic
        Then my application's 'apdex' should be 1
    http://github.com/jnewland/cucumber-newrelic

  53. Feature: Cucumber wiki discoverability
      In order to learn more about Cucumber
      As an uninformed developer
      I should be able to easily find the GitHub wiki
      Scenario: Searching for Cucumber on Google
        When I go to http://www.google.com/
        And I fill in "q" with "cucumber"
        And I press "Google Search"
        Then I should see "BDD that talks to domain experts first and code second"

  54. Feature: example.org ssh logins
      As a user of example.org
      I need to login remotely
      Scenario: Login with a key
        Given I have the following public keys:
          | keyfile                    |
          | /home/jnewland/.ssh/id_dsa |
        Then I can ssh to the following hosts with these credentials:
          | hostname         | username |
          | example.org      | jnewland |
          | mail.example.org | jnewland |
      Scenario: Checking /etc/passwd
        When I ssh to "example.org" with the following credentials:
          | username | password | keyfile                    |
          | jnewland |          | /home/jnewland/.ssh/id_dsa |
        And I run "cat /etc/passwd"
        Then I should see "jnewland" in the output
        And I should not see "that_dude_we_just_fired" in the output
    http://github.com/auxesis/cucumber-nagios

  55. Infrastructure

  56. Feature: RAID
      To ensure optimal server operation
      And guarantee data is stored redundantly
      The RAID array should be in a good state
      Scenario: RAID Array status
        When I check the raid array status
        Then controller "1" should have a status of "optimal"
        And controller "2" should have a status of "optimal"
        And controller "1" should have "1" logical device with a status of "optimal"
        And controller "1" should have "4" drives in "online" state
        And controller "2" should have "1" logical device with a status of "optimal"
        And controller "2" should have "4" drives in "online" state
    http://github.com/auxesis/cucumber-nagios

  57. Feature: rubygems.org
      As a member of the Ruby community
      I should be able to easily install Ruby gems
      Scenario: DNS
        When I lookup "rubygems.org"
        Then the name should resolve an IP
    http://github.com/auxesis/cucumber-nagios

  58. Possibilities

  59. Credit
    Card
    Transactions

  60. Exception
    Rate

  61. Running
    features
    in production

  62. Sorta Quick Setup (generator coming soon!)
    $ gem install cucumber-json cucumber-newrelic \
        cucumber-scout cucumber-nagios
    $ cd RAILS_ROOT
    $ mkdir -p production_features/step_definitions
    $ mkdir -p production_features/support
    $ vi config/cucumber.yml
      production: production_features -f Cucumber::Formatter::JSON --out tmp/cuke.json
    $ vi production_features/support/env.rb
      require 'cucumber/nagios/steps'
      require 'cucumber/newrelic'
      require 'cucumber/scout'
      # etc
    $ # hack on features
    $ cucumber -p production # doesn’t load the Rails env, just the defined steps
    $ # profit!
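
The production profile writes its results to tmp/cuke.json, which is what lets another tool turn failures into alerts. The Ruby sketch below shows one way such a report could be summarized. It assumes a report shaped like the output of Cucumber's built-in JSON formatter (features containing `elements`, each with `steps` carrying a `result.status`); the cucumber-json gem from the slide may emit a different structure, and the `failed_steps` helper and sample report are hypothetical.

```ruby
require 'json'

# Sample report shaped like Cucumber's built-in JSON formatter output
# (an assumption; the cucumber-json gem from the slide may differ).
report = <<~JSON
  [
    {
      "name": "Response Time",
      "elements": [
        {
          "name": "Apdex",
          "steps": [
            { "name": "my application is being monitored", "result": { "status": "passed" } },
            { "name": "my application's 'apdex' should be 1", "result": { "status": "failed" } }
          ]
        }
      ]
    }
  ]
JSON

# Collect "Feature / Scenario / step" labels for every step that did not pass.
def failed_steps(json_text)
  JSON.parse(json_text).flat_map do |feature|
    feature.fetch('elements', []).flat_map do |scenario|
      scenario.fetch('steps', [])
              .reject { |step| step.dig('result', 'status') == 'passed' }
              .map { |step| "#{feature['name']} / #{scenario['name']} / #{step['name']}" }
    end
  end
end

failures = failed_steps(report)
puts failures.empty? ? 'all passed' : "#{failures.size} failing step(s):"
puts failures
```

Because each failure carries the feature and scenario names, an alert built from this output states the business behavior that broke rather than just a host and a port.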

  63. Cucumber Scout Plugin

  64. Cucumber Scout Plugin
    http://github.com/jnewland/scout-plugins/raw/cucumber_ci/cucumber_ci/cucumber_ci.rb

  65. Cucumber Scout Plugin

  66. Cucumber Scout Plugin
    ZOMG!

  67. Cucumber Scout Plugin
    Sent via email

  68. Power to monitor anything and everything

  69. Clearly Defined Business Value

  70. Know before your customers do

  71. Thanks!
    Any questions?

  72. Get in touch:
    Jesse Newland
    [email protected]
    @jnewland
    github.com/jnewland

  73. Flickr Creative Commons Photos
    http://flic.kr/p/4EjsDJ
    http://flic.kr/p/c1UTf
    http://flic.kr/p/5BGJMu
    http://flic.kr/p/5iTuua
    http://flic.kr/p/77oAy2
    http://flic.kr/p/CdYo8
    http://flic.kr/p/k2LCm
    http://flic.kr/p/71vxY6
    http://flic.kr/p/5aZYkP
    http://flic.kr/p/79ikH2
    http://flic.kr/p/6rhqad
    http://flic.kr/p/4MdrW8
    http://flic.kr/p/5WuXzM
    http://flic.kr/p/3jzrJ
    http://flic.kr/p/5B4TaF
    http://flic.kr/p/4FAf2R
    http://flic.kr/p/39poLP
    http://flic.kr/p/6nq52E
    http://flic.kr/p/yQuMG
    http://flic.kr/p/jZ5Ae
    http://flic.kr/p/4yhzz
    http://flic.kr/p/nxAqt
    http://flic.kr/p/4VWY5F
    http://flic.kr/p/EKbuF
    http://flic.kr/p/5xitHh
    http://flic.kr/p/4uE9Wz
    http://flic.kr/p/65KZaJ
    http://flic.kr/p/7JKj5H
    http://flic.kr/p/79HLb5
    http://flic.kr/p/xiYny
    http://flic.kr/p/68vjKV
    http://flic.kr/p/rvc1
    http://flic.kr/p/6y7EfX
    http://flic.kr/p/2Mxkhp
    http://flic.kr/p/5t7h5
    http://flic.kr/p/29qc7
    http://flic.kr/p/JBaj
    http://flic.kr/p/smfab
    http://flic.kr/p/4t5Qf9
    http://flic.kr/p/M8kdv
    http://flic.kr/p/z3eWm
    http://flic.kr/p/4XAQs7
