Slide 1

Slide 1 text

Feature: Ruby on Rails Application Monitoring with Cucumber
  In order to ensure continuous application availability
  A developer should be able to assert the behavior of production apps
  From the outside in
  Without using antiquated monitoring tools
  To protect revenue

Slide 2

Slide 2 text

VP of Research & Development railsmachine.com jesse@railsmachine.com @jnewland github.com/jnewland About me: I get to hack on Ruby tools to manage large Rails deployments all day long. Not a bad job, eh?

Slide 3

Slide 3 text

Before we get into monitoring or cucumber, let’s talk about testing. In my career as a dev, my testing habits have evolved over time, largely inspired by available tools. I’m sure some of you have shared a similar journey - let’s take a quick look back.

Slide 4

Slide 4 text

No more clicking around. Save in your editor / refresh in your browser / lather / rinse / repeat. Occasional human-performed quality assurance. Broken by design.

Slide 5

Slide 5 text

I then made the jump to unit testing using Ruby’s Test::Unit - specifically the Model and Controller tests that Rails generated. This was nice, but stakeholders often devalued it because I did a poor job of communicating the business value of this work.

Slide 6

Slide 6 text

R Enter RSpec and the BDD movement. RSpec helped me, and I’m sure a lot of others, associate business value with writing tests / specs. Stakeholder-digestible code if you’re really good, stakeholder-digestible output if you’re doing things right.

Slide 7

Slide 7 text

C U C U M B E R Basically, BDD nirvana. Stakeholder-*writable* if you’re crazy.

Slide 8

Slide 8 text

Cucumber lets software development teams describe how software should behave in plain text. The text is written in a business-readable domain-specific language and serves as documentation, automated tests and development-aid - all rolled into one format. For those of you that aren’t familiar with Cuke

Slide 9

Slide 9 text

TATFT! The most important part of the evolution of these tools is that they make it easy and - legitimately - fun to test first and test all of the time as you’re developing your application.

Slide 10

Slide 10 text

Production Monitoring But what about production? We’re testing all the time in development, while we’re building the app that’s going to create revenue. But in production...

Slide 11

Slide 11 text

Revenue Preservation ...there’s actually revenue being earned. Why not test with the same ferocity in production?

Slide 12

Slide 12 text

Current Monitoring Landscape

Quiz:
* Raise your hand if you are at least partially responsible for the continuous operation of a business-critical production Rails app
* If you have ZERO monitoring of the site’s uptime - meaning your customers or boss would be the one to tell you that the homepage was down - put your hand down
* If your monitoring solution runs on your server itself - monit or god, for example - put your hand down
* If your external monitoring solution only hits one URL on the site, put your hand down

Some sites are monitored very closely, but I’ve found that, in most cases, the monitoring of production apps is rather slim. I generally evaluate monitoring solutions on two axes:

Slide 13

Slide 13 text

What’s being monitored - which URLs, metrics, system statistics, etc. are being watched

Slide 14

Slide 14 text

How closely are you looking at it? We’ll call this one the crazy monkey test - how frequently these URLs / metrics are being queried, what values are acceptable, etc.

Slide 15

Slide 15 text

It seems that in many situations, the home page of an application is the only thing checked closely

Slide 16

Slide 16 text

The crazy monkey has laser focus

Slide 17

Slide 17 text

but if the crazy monkey is that focused

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

Bad things can happen when he’s not looking. For example, in Rails apps, I see this happen all the time with...

Slide 20

Slide 20 text

Search is a part of many applications that I’ve seen go unmonitored. I’m not singling out Sphinx here - this is just a sweet picture - the same thing happens to Solr, etc.

Slide 21

Slide 21 text

Search can fail while the rest of a site works fine, for many reasons:
* the search daemon may go down
* the indices may be corrupt
* or things may fail in a more interesting kind of way...

Slide 22

Slide 22 text

0 results for “beer” Wherein no results are returned when they obviously should be.

Slide 23

Slide 23 text

TATFT? So are we really testing all the time?

Slide 24

Slide 24 text

TATFT* *except in production It doesn’t seem so

Slide 25

Slide 25 text

_why? But why? Why are we testing so... ferociously in development, but so weakly in production?

Slide 26

Slide 26 text

Old Broken Tools I’m largely convinced it’s because the tools that are presented to us for use in the monitoring space are largely old and broken.

Slide 27

Slide 27 text

How many of you recognize this? Oh, Nagios.

Slide 28

Slide 28 text

It’s the industry standard tool for infra monitoring. I haven’t met a single person who has used Nagios and been an honest fan. The most widely despised part of Nagios

Slide 29

Slide 29 text

is the noise. Unless masterfully configured, Nagios is a noisy beast. This leads to “boy who cried wolf” scenarios, wherein real alerts are written off as noise and discarded.

Slide 30

Slide 30 text

EVIL Because of the noise, the piece-of-crap interface, the esoteric configuration language, and years and years of waking me up for false positives, I’m going to paint this all in black and white and just call Nagios evil.

Slide 31

Slide 31 text

Pingdom’s a relatively new tool that’s gained a good bit of traction. It’s a hosted monitoring service that can test HTTP and many other types of services from a network of computers around the world.

Slide 32

Slide 32 text

✔ Nagios and Pingdom pass the crazy monkey intense focus test

Slide 33

Slide 33 text

✘ but in their default configuration generally only monitor a snapshot of what’s necessary.

Slide 34

Slide 34 text

A recent entry into the space that doesn’t get a quick EVIL stamp from me is Watchmouse

Slide 35

Slide 35 text

Twitter uses Watchmouse to provide a public API status page, hitting many different API endpoints and watching for outages and service problems.

Slide 36

Slide 36 text

✔ Twitter’s use of Watchmouse passes the “what are you looking at” test

Slide 37

Slide 37 text

Business Value Disconnect However, one thing that all of these tools are missing is a clear link between the business value of the things they’re checking and the alerts they’re sending out

Slide 38

Slide 38 text

Hey, I know something that does that well!

Slide 39

Slide 39 text

Cucumber lets software development teams describe how software should behave in plain text. The text is written in a business-readable domain-specific language and serves as documentation, automated tests and development-aid - all rolled into one format. Cucumber’s served me well in bringing stakeholders and developers together.

Slide 40

Slide 40 text

Cucumber lets software development teams describe how software should behave in plain text. The text is written in a business-readable domain-specific language and serves as documentation, automated tests and development-aid - all rolled into one format. But with a couple quick edits

Slide 41

Slide 41 text

Cucumber also lets operations teams describe how infrastructure should behave in plain text. The text is written in a business-readable domain-specific language and serves as documentation, monitoring and deployment-aid - all rolled into one format. We have a tool that can help us bring together developers, operations, *and* stakeholders

Slide 42

Slide 42 text

#devops Some of you following the twitterz may have noticed some people in the ops and development space talking about the ‘devops movement’

Slide 43

Slide 43 text

devs ops working together While calling this a movement is pretty wild - a hashtag does not a movement make - the ideas surrounding this ‘movement’ are things that I believe in personally, and things we’re working on every day at Rails Machine - blurring the line between development and ops, and the line between the infrastructure and the application.

Slide 44

Slide 44 text

Cucumber also lets #devops teams describe how applications should behave in plain text. The text is written in a business-readable domain-specific language and serves as documentation, monitoring and deployment-aid - all rolled into one format. Using cucumber in production embodies everything that is devops, and can blur those lines even more

Slide 45

Slide 45 text

kumbaya And thus result in a big happy #devops family

Slide 46

Slide 46 text

Example Production Cucumber Features

Slide 47

Slide 47 text

Benchmarking

Slide 48

Slide 48 text

Feature: slashdot.com
  To keep the geek masses satisfied
  Slashdot must be responsive

  Scenario: Cached pages are super quick
    Given I am benchmarking
    When I go to http://slashdot.org/
    Then the elapsed time should be less than 500 milliseconds
    When I follow "Login"
    Then the elapsed time should be less than 500 milliseconds
    When I follow "Contact"
    Then the elapsed time should be less than 500 milliseconds
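For a feel of what sits behind the "elapsed time" steps above, here is a minimal sketch in plain Ruby. It is not the actual step implementation from any of the gems mentioned in this talk; the helper names `elapsed_ms` and `assert_faster_than` are made up for illustration, and a `sleep` stands in for the real page fetch.

```ruby
require 'benchmark'

# Hypothetical helper: runs the block and returns the elapsed
# wall-clock time in milliseconds. In a real step definition the
# block would perform the HTTP request.
def elapsed_ms
  Benchmark.realtime { yield } * 1000.0
end

# Hypothetical assertion mirroring
# "Then the elapsed time should be less than N milliseconds".
def assert_faster_than(limit_ms, elapsed)
  raise "took #{elapsed.round}ms, limit is #{limit_ms}ms" unless elapsed < limit_ms
  true
end

# Stand-in for a page fetch: sleep simulates a ~50ms response.
ms = elapsed_ms { sleep 0.05 }
assert_faster_than(500, ms)
```

Raising on a slow response is what lets Cucumber mark the step as failed, which in turn is what your alerting would key off.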

Slide 49

Slide 49 text

Email Deliverability

Slide 50

Slide 50 text

Feature: Signup Emails
  In order to prevent bots from taking over the site
  A new user should receive a verification email upon signup

  Scenario: New User signup
    Given I visit "http://example.com"
    And I follow "Signup!"
    When I signup with a random email address and password
    And I press "Go"
    And I wait 10 seconds # an unfortunate reality
    Then I should have one email in my inbox
    And the email subject should match "^Welcome"
    And the email body should match "http:\/\/example.com\/v\/\w+"

https://github.com/technicalpickles/mailinator-spec
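The two "should match" steps above boil down to regular expression checks on the received message. A rough sketch of that check in plain Ruby follows; `verification_email?` is a hypothetical helper, not part of mailinator-spec, and the dot in the hostname is escaped here, which is slightly stricter than the pattern on the slide.

```ruby
# Patterns taken from the scenario above. The link pattern escapes
# the dot in "example.com" (an assumption; the slide leaves it bare).
SUBJECT_PATTERN = /^Welcome/
LINK_PATTERN    = %r{http://example\.com/v/\w+}

# Hypothetical helper: true only when both the subject and the body
# look like a signup verification email.
def verification_email?(subject, body)
  !!(subject =~ SUBJECT_PATTERN && body =~ LINK_PATTERN)
end

verification_email?("Welcome to Example!",
                    "Please verify: http://example.com/v/abc123")
```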

Slide 51

Slide 51 text

Existing Metrics

Slide 52

Slide 52 text

Feature: Response Time
  As an impatient user
  Our web server should be in tip-top shape
  So our app can be super fast

  Background:
    Given my Scout account name is 'railsmachine'
    And my Scout email and password are 'jesse@railsmachine.com' and 'sekret'

  Scenario: Passenger Queue
    When I get the metrics from the 'Passenger' plugin on 'example.com'
    Then the 'passenger_queue_depth' should be 0

  Scenario: CPU usage is low
    When I get the metrics from the 'Server Overview' plugin on 'example.com'
    Then 'cpu_last_minute' should be less than 1

http://github.com/jnewland/cucumber-scout/

Slide 53

Slide 53 text

Feature: Response Time
  As an impatient user
  Our app should be super fast

  Background:
    Given my NewRelic license key is 'omgwtfbbq'

  Scenario: Average Response time
    Given that my application is being monitored by New Relic
    Then my application's 'response time' should be less than 500 milliseconds

  Scenario: Apdex
    Given that my application is being monitored by New Relic
    Then my application's 'apdex' should be 1

http://github.com/jnewland/cucumber-newrelic

Slide 54

Slide 54 text

SEO

Slide 55

Slide 55 text

Feature: Cucumber wiki discoverability
  In order to learn more about Cucumber
  As an uninformed developer
  I should be able to easily find the GitHub wiki

  Scenario: Searching for Cucumber on Google
    When I go to http://www.google.com/
    And I fill in "q" with "cucumber"
    And I press "Google Search"
    Then I should see "BDD that talks to domain experts first and code second"

Slide 56

Slide 56 text

Security

Slide 57

Slide 57 text

Feature: example.org ssh logins
  As a user of example.org
  I need to login remotely

  Scenario: Login with a key
    Given I have the following public keys:
      | keyfile                    |
      | /home/jnewland/.ssh/id_dsa |
    Then I can ssh to the following hosts with these credentials:
      | hostname         | username |
      | example.org      | jnewland |
      | mail.example.org | jnewland |

  Scenario: Checking /etc/passwd
    When I ssh to "example.org" with the following credentials:
      | username | password | keyfile                    |
      | jnewland |          | /home/jnewland/.ssh/id_dsa |
    And I run "cat /etc/passwd"
    Then I should see "jnewland" in the output
    And I should not see "that_dude_we_just_fired" in the output

http://github.com/auxesis/cucumber-nagios
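The "should see ... in the output" steps above read like a plain substring match on the command output, which would also be satisfied by an account like "jnewland2". If you want something stricter, one option is to parse the username field out of each passwd line. This is only a sketch under that assumption; `passwd_users` is a hypothetical helper and the sample file contents are invented.

```ruby
# Hypothetical helper: extracts the username (first colon-separated
# field) from each non-blank, non-comment line of passwd-style text.
def passwd_users(passwd_text)
  passwd_text.each_line
             .map(&:strip)
             .reject { |line| line.empty? || line.start_with?('#') }
             .map { |line| line.split(':', 2).first }
end

# Invented sample standing in for the output of `cat /etc/passwd`.
SAMPLE_PASSWD = <<~PASSWD
  root:x:0:0:root:/root:/bin/bash
  jnewland:x:1000:1000::/home/jnewland:/bin/bash
PASSWD

users = passwd_users(SAMPLE_PASSWD)
raise "jnewland is missing" unless users.include?("jnewland")
raise "fired user still has an account" if users.include?("that_dude_we_just_fired")
```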

Slide 58

Slide 58 text

Infrastructure

Slide 59

Slide 59 text

Feature: RAID
  To ensure optimal server operation
  And guarantee data is stored redundantly
  The RAID array should be in a good state

  Scenario: RAID Array status
    When I check the raid array status
    Then controller "1" should have a status of "optimal"
    And controller "2" should have a status of "optimal"
    And controller "1" should have "1" logical device with a status of "optimal"
    And controller "1" should have "4" drives in "online" state
    And controller "2" should have "1" logical device with a status of "optimal"
    And controller "2" should have "4" drives in "online" state

http://github.com/auxesis/cucumber-nagios

Slide 60

Slide 60 text

DNS

Slide 61

Slide 61 text

Feature: rubygems.org
  As a member of the Ruby community
  I should be able to easily install Ruby gems

  Scenario: DNS
    When I lookup "rubygems.org"
    Then the name should resolve an IP

http://github.com/auxesis/cucumber-nagios
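A DNS check like the one above needs very little code. The sketch below uses Ruby's standard `Resolv` library and is not the cucumber-nagios implementation; `resolves?` is a hypothetical helper, and the resolver is passed in as a parameter so the check can be exercised without touching the network.

```ruby
require 'resolv'

# Hypothetical helper: true when the name resolves to at least one
# address. Any object responding to getaddresses(name) can be passed
# as the resolver; the standard library's Resolv is the default.
def resolves?(name, resolver = Resolv)
  !resolver.getaddresses(name).empty?
end

# Usage against real DNS (needs network access, so left commented out):
#   resolves?("rubygems.org")
```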

Slide 62

Slide 62 text

Possibilities

Slide 63

Slide 63 text

Credit Card Transactions

Slide 64

Slide 64 text

SSL

Slide 65

Slide 65 text

Exception Rate

Slide 66

Slide 66 text

Running features in production

Slide 67

Slide 67 text

$ gem install cucumber-json cucumber-newrelic \
    cucumber-scout cucumber-nagios
$ cd RAILS_ROOT
$ mkdir -p production_features/step_definitions
$ mkdir -p production_features/support
$ vi config/cucumber.yml
production: production_features -f Cucumber::Formatter::JSON --out tmp/cuke.json
$ vi production_features/support/env.rb
require 'cucumber/nagios/steps'
require 'cucumber/newrelic'
require 'cucumber/scout'
# etc
$ # hack on features
$ cucumber -p production # doesn’t load the Rails env, just the defined steps
$ # profit!

Sorta Quick Setup
Generator coming soon!
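Once `cucumber -p production` runs by hand, the next step is running it on a schedule and reacting to failures. The sketch below leans on the fact that cucumber exits with a non-zero status when a scenario fails; `features_passing?` is a hypothetical wrapper, and what you do on failure (email, page, etc.) is up to you.

```ruby
# Hypothetical wrapper: runs the given command, discards its output,
# and returns true only if it exited with status 0. A command that
# cannot be started at all also counts as a failure.
def features_passing?(*command)
  system(*command, out: File::NULL, err: File::NULL) == true
end

# Usage from cron or a rake task (assumes the setup above is in place):
#   unless features_passing?("cucumber", "-p", "production")
#     # notify somebody
#   end
```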

Slide 68

Slide 68 text

No content

Slide 69

Slide 69 text

Cucumber Scout Plugin

Slide 70

Slide 70 text

http://github.com/jnewland/scout-plugins/raw/cucumber_ci/cucumber_ci/cucumber_ci.rb
Cucumber Scout Plugin

Slide 71

Slide 71 text

Cucumber Scout Plugin

Slide 72

Slide 72 text

Cucumber Scout Plugin ZOMG!

Slide 73

Slide 73 text

Cucumber Scout Plugin Sent via email

Slide 74

Slide 74 text

Power to monitor anything and everything

Slide 75

Slide 75 text

Clearly Defined Business Value

Slide 76

Slide 76 text

I N R U B Y

Slide 77

Slide 77 text

24/7/365

Slide 78

Slide 78 text

Know before your customers do

Slide 79

Slide 79 text

Thanks! Any questions?

Slide 80

Slide 80 text

Get in touch: Jesse Newland jesse@railsmachine.com @jnewland github.com/jnewland

Slide 81

Slide 81 text

Flickr Creative Commons Photos http://flic.kr/p/4EjsDJ http://flic.kr/p/c1UTf http://flic.kr/p/5BGJMu http://flic.kr/p/5iTuua http://flic.kr/p/77oAy2 http://flic.kr/p/CdYo8 http://flic.kr/p/k2LCm http://flic.kr/p/71vxY6 http://flic.kr/p/5aZYkP http://flic.kr/p/79ikH2 http://flic.kr/p/6rhqad http://flic.kr/p/4MdrW8 http://flic.kr/p/5WuXzM http://flic.kr/p/3jzrJ http://flic.kr/p/5B4TaF http://flic.kr/p/4FAf2R http://flic.kr/p/39poLP http://flic.kr/p/6nq52E http://flic.kr/p/yQuMG http://flic.kr/p/jZ5Ae http://flic.kr/p/4yhzz http://flic.kr/p/nxAqt http://flic.kr/p/4VWY5F http://flic.kr/p/EKbuF http://flic.kr/p/5xitHh http://flic.kr/p/4uE9Wz http://flic.kr/p/65KZaJ http://flic.kr/p/7JKj5H http://flic.kr/p/79HLb5 http://flic.kr/p/xiYny http://flic.kr/p/68vjKV http://flic.kr/p/rvc1 http://flic.kr/p/6y7EfX http://flic.kr/p/2Mxkhp http://flic.kr/p/5t7h5 http://flic.kr/p/29qc7 http://flic.kr/p/JBaj http://flic.kr/p/smfab http://flic.kr/p/4t5Qf9 http://flic.kr/p/M8kdv http://flic.kr/p/z3eWm http://flic.kr/p/4XAQs7