
Web Application Monitoring with Cucumber


Jesse Newland

October 02, 2011


Transcript

  1. Feature: Ruby on Rails Application Monitoring with Cucumber
    In order to ensure continuous application availability
    A developer should be able to assert the behavior of production apps
    From the outside in
    Without using antiquated monitoring tools
    To protect revenue


  2. VP of Research & Development
    railsmachine.com
    [email protected]
    @jnewland
    github.com/jnewland
    About me:
    I get to hack on Ruby tools to manage large Rails deployments all day long. Not a bad job,
    eh?


  3. Before we get into monitoring or cucumber, let’s talk about testing.
    In my career as a dev, my testing habits have evolved over time, largely inspired by available
    tools.
    I’m sure some of you have shared a similar journey - let’s take a quick look back.


  4. No more clicking around
    Save in your editor / refresh in your browser / lather / rinse / repeat.
    Occasional human-performed quality assurance.
    Broken by design.


  5. I then made the jump to unit testing using Ruby’s Test::Unit - specifically the Model and
    Controller tests that Rails generated.
    This was nice, but it was often devalued by stakeholders because I communicated the
    business value of this work poorly.


  6. R
    Enter RSpec and the BDD movement.
    RSpec helped me, and I’m sure a lot of others, associate the business value with writing
    tests / specs.
    Stakeholder-digestible code if you’re really good, stakeholder-digestible output if you’re
    doing things right.


  7. CUCUMBER
    Basically, BDD nirvana. Stakeholder-*writable* if you’re crazy.


  8. Cucumber lets software
    development teams describe
    how software should behave in
    plain text. The text is written in
    a business-readable domain-
    specific language and serves
    as documentation, automated
    tests and development-aid - all
    rolled into one format.
    For those of you who aren’t familiar with Cuke.


  9. TATFT!
    The most important part of the evolution of these tools is that they make it easy and -
    legitimately - fun to test first and test all of the time as you’re developing your application.


  10. Production Monitoring
    But what about production? We’re testing all the time in development, while we’re developing
    the app that’s going to create revenue. But in production...


  11. Revenue Preservation
    ...there’s actually revenue being earned. Why not test with the same ferocity in production?


  12. Current Monitoring Landscape
    Quiz:
    * Raise your hand if you are at least partially responsible for the continuous operation of a
    business-critical production Rails app
    * If you have ZERO monitoring of the site’s uptime - meaning your customers or boss would
    be the ones to tell you that the homepage was down - put your hand down
    * If your monitoring solution runs on your server itself - monit or god, for example - put
    your hand down
    * If your external monitoring solution only hits one URL on the site, put your hand down
    Some sites are monitored very closely, but I’ve found that in most cases, the monitoring of
    production apps is rather slim.
    I generally evaluate monitoring solutions on two axes:


  13. What’s being monitored - what URLs, metrics, system statistics, etc are being watched


  14. How closely are you looking at it?
    We’ll call this one the crazy monkey test: how frequently these URLs / metrics are being
    queried, what values are acceptable, etc.


  15. It seems that in many situations, the home page of an application is the only thing checked
    closely


  16. The crazy monkey has laser focus


  17. but if the crazy monkey is that focused


  18. (no text on slide)

  19. Bad things can happen when he’s not looking.
    For example, in Rails apps, I see this happen all the time with...


  20. Search is a part of many applications that I’ve seen go unmonitored. I’m not singling out
    Sphinx here - this is just a sweet picture - the same thing happens to Solr, etc.


  21. Search can fail while the rest of a site works fine, for many reasons:
    * the search daemon may go down
    * the indices may be corrupt
    * or things may fail in a more interesting kind of way...


  22. 0 results for “beer”
    Wherein no results are returned when they obviously should be.
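
A check for this failure mode could look something like the sketch below. It is only an illustration: the helper names are made up, and it assumes the results page prints a heading shaped like the one on this slide (`N results for "term"`). A real step definition would fetch the page first and pass its text in.

```ruby
# Hypothetical helper: pull the count out of a 'N results for "term"' heading.
# Returns nil when the heading is missing (for example, the search page errored).
def result_count(page_text, term)
  match = page_text.match(/(\d+) results? for "#{Regexp.escape(term)}"/)
  match && match[1].to_i
end

# A search check passes only when a known-good term returns at least one hit.
def search_healthy?(page_text, term)
  count = result_count(page_text, term)
  !count.nil? && count > 0
end

puts search_healthy?('0 results for "beer"', 'beer')
puts search_healthy?('12 results for "beer"', 'beer')
```

The point of picking a term like "beer" is that it should always match something, so zero hits means search is broken even though the page itself rendered.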


  23. TATFT?
    So are we really testing all the time?


  24. TATFT*
    *except in production
    It doesn’t seem so


  25. _why?
    But why? Why are we testing so... ferociously in development, but so weakly in production?


  26. Old Broken Tools
    I’m convinced it’s mostly because the tools presented to us in the monitoring space are
    old and broken.


  27. How many of you recognize this? Oh, Nagios.


  28. It’s the industry-standard tool for infra monitoring. I haven’t met a single person who’s used
    Nagios and been an honest fan. The most widely despised part of Nagios


  29. is the noise. Unless expertly configured, Nagios is a noisy beast. This leads to “boy who
    cried wolf” scenarios, wherein real alerts are written off as noise and discarded.


  30. EVIL
    Because of the noise, the piece-of-crap interface, the esoteric configuration language, and
    years and years of being woken up by false positives, I’m going to paint this all in black and
    white and just call Nagios evil.


  31. Pingdom’s a relatively new tool that’s gained a good bit of traction. It’s a hosted monitoring
    service that can test HTTP and many other types of services from a network of computers
    around the world.



  32. Nagios and Pingdom pass the crazy monkey intense-focus test



  33. but in their default configuration generally only monitor a snapshot of what’s necessary.


  34. A recent entry into the space that doesn’t get a quick EVIL stamp from me is Watchmouse


  35. Twitter uses Watchmouse to provide a public API status page, hitting many different API
    endpoints and watching for outages and service problems.



  36. Twitter’s use of Watchmouse passes the “what are you looking at” test


  37. Business Value Disconnect
    However, one thing that all of these tools are missing is a clear link between the business
    value of the things they’re checking and the alerts they’re sending out.


  38. Hey, I know something that does that well!


  39. Cucumber lets software
    development teams describe
    how software should behave in
    plain text. The text is written in
    a business-readable domain-
    specific language and serves
    as documentation, automated
    tests and development-aid - all
    rolled into one format.
    In my experience, Cucumber has served me well in bringing stakeholders and developers
    together.


  40. Cucumber lets software
    development teams describe
    how software should behave in
    plain text. The text is written in
    a business-readable domain-
    specific language and serves
    as documentation, automated
    tests and development-aid - all
    rolled into one format.
    But with a couple of quick edits...


  41. Cucumber also lets operations
    teams describe how
    infrastructure should behave in
    plain text. The text is written in a
    business-readable domain-
    specific language and serves
    as documentation, monitoring
    and deployment-aid - all rolled
    into one format.
    We have a tool that can help us bring together developers, operations, *and* stakeholders


  42. #devops
    Some of you following the twitterz may have noticed some people in the ops and
    development space talking about the ‘devops movement’


  43. devs
    ops
    working together
    While calling this a movement is pretty wild - a hashtag does not a movement make - the
    ideas surrounding this ‘movement’ are things that I believe in personally, and things we’re
    working on every day at Rails Machine - blurring the line between development and ops, and
    the line between the infrastructure and the application.


  44. Cucumber also lets #devops
    teams describe how
    applications should behave in
    plain text. The text is written in a
    business-readable domain-
    specific language and serves
    as documentation, monitoring
    and deployment-aid - all rolled
    into one format.
    Using cucumber in production embodies everything that is devops, and can blur those lines
    even more


  45. kumbaya
    And thus result in a big happy #devops family


  46. Example Production Cucumber Features


  47. Benchmarking


  48. Feature: slashdot.com
      To keep the geek masses satisfied
      Slashdot must be responsive
      Scenario: Cached pages are super quick
        Given I am benchmarking
        When I go to http://slashdot.org/
        Then the elapsed time should be less than 500 milliseconds
        When I follow "Login"
        Then the elapsed time should be less than 500 milliseconds
        When I follow "Contact"
        Then the elapsed time should be less than 500 milliseconds
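
To give a feel for what sits behind a step like "the elapsed time should be less than 500 milliseconds", here is a rough sketch in plain Ruby using the standard library's `Benchmark.realtime`. The helper names are hypothetical and are not taken from cucumber-nagios; the `sleep` stands in for the actual page request.

```ruby
require 'benchmark'

# Hypothetical helper: run the block and return elapsed wall-clock milliseconds.
def elapsed_ms
  (Benchmark.realtime { yield } * 1000).round
end

# Hypothetical assertion mirroring
# "Then the elapsed time should be less than N milliseconds".
def assert_faster_than!(elapsed, limit_ms)
  return true if elapsed < limit_ms
  raise "took #{elapsed}ms, expected less than #{limit_ms}ms"
end

# Stand-in for a page fetch; a real check would request the URL inside the block.
elapsed = elapsed_ms { sleep 0.01 }
assert_faster_than!(elapsed, 500)
puts "ok in #{elapsed}ms"
```

Raising on a slow response is what turns a measurement into a failing step, which is all the monitoring side needs to see.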


  49. Email Deliverability


  50. Feature: Signup Emails
      In order to prevent bots from taking over the site
      A new user should receive a verification email upon signup
      Scenario: New User signup
        Given I visit "http://example.com"
        And I follow "Signup!"
        When I signup with a random email address and password
        And I press "Go"
        And I wait 10 seconds # an unfortunate reality
        Then I should have one email in my inbox
        And the email subject should match "^Welcome"
        And the email body should match "http:\/\/example.com\/v\/\w+"
    https://github.com/technicalpickles/mailinator-spec
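
The two "should match" steps boil down to regular expression checks against the fetched message. A minimal sketch of that part, assuming the email has already been pulled from the inbox into a simple struct (the `Email` struct and `verification_email?` are hypothetical names, not the mailinator-spec API):

```ruby
# Hypothetical message structure; a real step would fetch this from the test inbox.
Email = Struct.new(:subject, :body)

SUBJECT_PATTERN = /^Welcome/
# Same pattern as the feature, with the dot in the hostname escaped.
LINK_PATTERN = %r{http://example\.com/v/\w+}

# True when both the subject and the verification link look right.
def verification_email?(email)
  !!(email.subject =~ SUBJECT_PATTERN && email.body =~ LINK_PATTERN)
end

inbox = [Email.new('Welcome to Example', 'Confirm here: http://example.com/v/abc123')]
puts inbox.length == 1 && verification_email?(inbox.first)
```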


  51. Existing Metrics


  52. Feature: Response Time
      As an impatient user
      Our web server should be in tip-top shape
      So our app can be super fast
      Background:
        Given my Scout account name is 'railsmachine'
        And my Scout email and password are '[email protected]' and 'sekret'
      Scenario: Passenger Queue
        When I get the metrics from the 'Passenger' plugin on 'example.com'
        Then the 'passenger_queue_depth' should be 0
      Scenario: CPU usage is low
        When I get the metrics from the 'Server Overview' plugin on 'example.com'
        Then 'cpu_last_minute' should be less than 1
    http://github.com/jnewland/cucumber-scout/


  53. Feature: Response Time
      As an impatient user
      Our app should be super fast
      Background:
        Given my NewRelic license key is 'omgwtfbbq'
      Scenario: Average Response time
        Given that my application is being monitored by New Relic
        Then my application's 'response time' should be less than 500 milliseconds
      Scenario: Apdex
        Given that my application is being monitored by New Relic
        Then my application's 'apdex' should be 1
    http://github.com/jnewland/cucumber-newrelic


  54. SEO


  55. Feature: Cucumber wiki discoverability
      In order to learn more about Cucumber
      As an uninformed developer
      I should be able to easily find the GitHub wiki
      Scenario: Searching for Cucumber on Google
        When I go to http://www.google.com/
        And I fill in "q" with "cucumber"
        And I press "Google Search"
        Then I should see "BDD that talks to domain experts first and code second"


  56. Security


  57. Feature: example.org ssh logins
      As a user of example.org
      I need to login remotely
      Scenario: Login with a key
        Given I have the following public keys:
          | keyfile                    |
          | /home/jnewland/.ssh/id_dsa |
        Then I can ssh to the following hosts with these credentials:
          | hostname         | username |
          | example.org      | jnewland |
          | mail.example.org | jnewland |
      Scenario: Checking /etc/passwd
        When I ssh to "example.org" with the following credentials:
          | username | password | keyfile                    |
          | jnewland |          | /home/jnewland/.ssh/id_dsa |
        And I run "cat /etc/passwd"
        Then I should see "jnewland" in the output
        And I should not see "that_dude_we_just_fired" in the output
    http://github.com/auxesis/cucumber-nagios


  58. Infrastructure


  59. Feature: RAID
      To ensure optimal server operation
      And guarantee data is stored redundantly
      The RAID array should be in a good state
      Scenario: RAID Array status
        When I check the raid array status
        Then controller "1" should have a status of "optimal"
        And controller "2" should have a status of "optimal"
        And controller "1" should have "1" logical device with a status of "optimal"
        And controller "1" should have "4" drives in "online" state
        And controller "2" should have "1" logical device with a status of "optimal"
        And controller "2" should have "4" drives in "online" state
    http://github.com/auxesis/cucumber-nagios


  60. DNS


  61. Feature: rubygems.org
      As a member of the Ruby community
      I should be able to easily install Ruby gems
      Scenario: DNS
        When I lookup "rubygems.org"
        Then the name should resolve an IP
    http://github.com/auxesis/cucumber-nagios
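
For a sense of how little code a DNS check needs, here is a sketch built on Ruby's standard `resolv` library. This is not the cucumber-nagios implementation; `resolves?` and `FakeResolver` are hypothetical names, and the resolver is injectable only so the example runs without touching the network.

```ruby
require 'resolv'

# Hypothetical helper: true when the name resolves to at least one address.
# Defaults to the stdlib Resolv module, which performs a real lookup.
def resolves?(name, resolver: Resolv)
  !resolver.getaddresses(name).empty?
end

# Stand-in resolver so the example does not depend on live DNS.
class FakeResolver
  def initialize(table)
    @table = table
  end

  def getaddresses(name)
    @table.fetch(name, [])
  end
end

fake = FakeResolver.new({ 'rubygems.org' => ['192.0.2.10'] })
puts resolves?('rubygems.org', resolver: fake)
puts resolves?('nope.invalid', resolver: fake)
```

In production you would call `resolves?('rubygems.org')` and let the default resolver do the lookup.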


  62. Possibilities


  63. Credit Card Transactions


  64. SSL


  65. Exception Rate


  66. Running features in production


  67. Sorta Quick Setup
    Generator coming soon!
    $ gem install cucumber-json cucumber-newrelic \
        cucumber-scout cucumber-nagios
    $ cd RAILS_ROOT
    $ mkdir -p production_features/step_definitions
    $ mkdir -p production_features/support
    $ vi config/cucumber.yml
      production: production_features -f Cucumber::Formatter::JSON --out tmp/cuke.json
    $ vi production_features/support/env.rb
      require 'cucumber/nagios/steps'
      require 'cucumber/newrelic'
      require 'cucumber/scout'
      # etc
    $ # hack on features
    $ cucumber -p production # doesn’t load the Rails env, just the defined steps
    $ # profit!
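
Once a run has written `tmp/cuke.json`, something has to read it and decide whether to alert. The sketch below shows one way that could work. It assumes the report is laid out like Cucumber's built-in JSON formatter (features, then `elements`, then `steps`, each with a `result.status`); the cucumber-json gem used here may differ, so treat the key names as assumptions. `failed_steps` is a hypothetical helper.

```ruby
require 'json'

# Hypothetical helper: list every step whose status is anything other than
# "passed" (failed, skipped, pending, undefined, or missing a result).
def failed_steps(report_json)
  JSON.parse(report_json).flat_map do |feature|
    (feature['elements'] || []).flat_map do |scenario|
      (scenario['steps'] || [])
        .reject { |step| step.dig('result', 'status') == 'passed' }
        .map { |step| "#{feature['name']} / #{scenario['name']} / #{step['name']}" }
    end
  end
end

# Inline sample so the example runs on its own; a real check would use
# failed_steps(File.read('tmp/cuke.json')).
sample = <<~REPORT
  [{"name": "Response Time",
    "elements": [{"name": "Apdex",
                  "steps": [{"name": "apdex should be 1",
                             "result": {"status": "failed"}}]}]}]
REPORT

puts failed_steps(sample)
```

Each line names the feature, scenario and step, so the alert carries the business context that the feature file was written to capture.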


  68. (no text on slide)

  69. Cucumber Scout Plugin


  70. Cucumber Scout Plugin
    http://github.com/jnewland/scout-plugins/raw/cucumber_ci/cucumber_ci/cucumber_ci.rb


  71. Cucumber Scout Plugin


  72. Cucumber Scout Plugin
    ZOMG!


  73. Cucumber Scout Plugin
    Sent via email


  74. Power to monitor anything and everything


  75. Clearly Defined Business Value


  76. IN RUBY


  77. 24/7/365


  78. Know before your customers do


  79. Thanks!
    Any questions?


  80. Get in touch:
    Jesse Newland
    [email protected]
    @jnewland
    github.com/jnewland


  81. Flickr Creative Commons Photos
    http://flic.kr/p/4EjsDJ
    http://flic.kr/p/c1UTf
    http://flic.kr/p/5BGJMu
    http://flic.kr/p/5iTuua
    http://flic.kr/p/77oAy2
    http://flic.kr/p/CdYo8
    http://flic.kr/p/k2LCm
    http://flic.kr/p/71vxY6
    http://flic.kr/p/5aZYkP
    http://flic.kr/p/79ikH2
    http://flic.kr/p/6rhqad
    http://flic.kr/p/4MdrW8
    http://flic.kr/p/5WuXzM
    http://flic.kr/p/3jzrJ
    http://flic.kr/p/5B4TaF
    http://flic.kr/p/4FAf2R
    http://flic.kr/p/39poLP
    http://flic.kr/p/6nq52E
    http://flic.kr/p/yQuMG
    http://flic.kr/p/jZ5Ae
    http://flic.kr/p/4yhzz
    http://flic.kr/p/nxAqt
    http://flic.kr/p/4VWY5F
    http://flic.kr/p/EKbuF
    http://flic.kr/p/5xitHh
    http://flic.kr/p/4uE9Wz
    http://flic.kr/p/65KZaJ
    http://flic.kr/p/7JKj5H
    http://flic.kr/p/79HLb5
    http://flic.kr/p/xiYny
    http://flic.kr/p/68vjKV
    http://flic.kr/p/rvc1
    http://flic.kr/p/6y7EfX
    http://flic.kr/p/2Mxkhp
    http://flic.kr/p/5t7h5
    http://flic.kr/p/29qc7
    http://flic.kr/p/JBaj
    http://flic.kr/p/smfab
    http://flic.kr/p/4t5Qf9
    http://flic.kr/p/M8kdv
    http://flic.kr/p/z3eWm
    http://flic.kr/p/4XAQs7
