
Deterministic Solutions to Intermittent Failures

Tim Mertens
November 16, 2017


Presented at RubyConf 2017 in New Orleans.
https://confreaks.tv/events/rubyconf2017

Transcript

  1. Tim Mertens

    HELLO, My Name Is · github.com/tmertens · @rockfx01
  2. Tests Are Software Too

    • Test code does exactly what you tell it to do
    • “Flaky” implies an unsolvable problem
    • “Non-deterministic” behavior can be accounted for
    • Any failure can be resolved once you know the root cause
  3. Continuous Integration (“CI” for short)

    A process or system by which new code is continuously validated against an existing test suite.
  4. Parallelized Builds

    Builds which spread the work of executing tests across 2 or more workers (e.g. containers, nodes).
  5. Common Reproducible Failures

    • Stale Branches
    • Business Dates and Times
    • Mocked Time vs. System Time
    • Missing Preconditions
    • Real Bugs
  6. Test Group

    A subset of tests from the test suite which run on a specific node in a parallelized build.
  7. RSpec Test Group

    # No Group - Runs All Tests
    $ rspec

    # Specific Files
    $ rspec ./spec/some_spec.rb ./spec/other_spec.rb

    # Metadata Tags
    $ rspec . --tag focus
  8. RSpec Test Seed

    $ rspec
    …
    Finished in 1 minute 17.7 seconds
    99 examples, 0 failures, 4 pending

    Randomized with seed 13391   # <= Test Seed
  9. Re-running Test Group With Seed

    $ rspec --seed 12345 --fail-fast
    # OR
    $ rspec ./spec/some_spec.rb \
        ./spec/other_spec.rb \
        --seed 12345 --fail-fast
  10. Test Bisect

    Repeatedly dividing a set of tests in half until you find the minimal set of tests which cause another test to fail.
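The halving idea described above can be sketched in plain Ruby. This is an illustration of the algorithm, not RSpec's actual `--bisect` implementation; the test names and the `run_passes?` stand-in are hypothetical.

```ruby
# Hypothetical stand-in for a real test run: "victim" fails whenever
# "polluter" appears earlier in the run order.
def run_passes?(tests)
  polluted = false
  tests.each do |t|
    polluted = true if t == "polluter"
    return false if t == "victim" && polluted
  end
  true
end

# Repeatedly halve the candidate set to find the minimal set of
# earlier tests that still makes `victim` fail.
def bisect(candidates, victim)
  return candidates if candidates.size <= 1
  half   = candidates.size / 2
  first  = candidates[0...half]
  second = candidates[half..-1]
  # If one half alone still reproduces the failure, discard the other.
  return bisect(first, victim)  unless run_passes?(first + [victim])
  return bisect(second, victim) unless run_passes?(second + [victim])
  candidates # the failure needs tests from both halves (rare, but possible)
end

suite = %w[a b polluter c d]
puts bisect(suite, "victim").inspect # => ["polluter"]
```

Real `rspec --bisect` does the same kind of narrowing, but by repeatedly shelling out to actual test runs with the recorded seed.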
  11. Bisecting Test Group with Seed

    $ rspec --seed 12345 --bisect
    # OR
    $ rspec ./spec/some_spec.rb \
        ./spec/other_spec.rb \
        --seed 12345 --bisect
  12. Test Pollution

    When the side effects of one or more tests in a test group cause one or more other tests to fail.
  13. Data Pollution

    • Data is persisted across test examples or test suite executions
      ◦ Database Records
      ◦ Caches (e.g. Redis)
  14. Defensive Testing

    • Tests should clean up after themselves, but…
    • Don’t expect pristine starting conditions
  15. Defensive Testing

    • Don’t expect tables to be empty

    # Don’t:
    expect(User.count).to eq 1

    # Do:
    expect { foo.bar }.to change { User.count }.by(1)
  16. Defensive Testing

    • Don’t expect global scopes to only return test records

    # Don’t:
    expect(User.active).to match_array [user1, user2]

    # Do:
    expect(User.active).to include(user1, user2)
    expect(User.active).not_to include(user3)
  17. Class/Singleton Caching

    # Don’t:
    described_class.add("foo") # mutates the singleton
    expect(described_class.contains?("foo")).to be true

    # Do:
    subject = described_class.new
    subject.add("foo")
    expect(subject.contains?("foo")).to be true
  18. Mutated Constants

    • Don’t overwrite constants

    # Don’t:
    before { SOME_CONST = "my test value" }

    # Do:
    stub_const("MyClass", "my test value")
    allow(MyClass).to receive(:foo).and_return("foo")
    fake_class = class_double(MyClass, foo: "foo")
    stub_const("MyClass", fake_class)
  19. Mutated Constants

    # Don't:
    before do
      MyClass.define_method(:foo) { "foo" }
    end

    # Do:
    instance = described_class.new
    allow(instance).to receive(:foo).and_return("foo")
  20. Mutated (Test) Constants

    describe Foo do
      # Don’t:
      BAR = "some_value"
      it { expect(Foo.bar).to eq BAR }

      # Do:
      let(:bar) { "some_value" }
      it { expect(Foo.bar).to eq bar }
    end
  21. Real Bugs!

    • Always make sure you understand the reason for the test failure and that your production code is not at fault
  22. Running Tests in a Loop

    describe MyClass do
      100.times do
        describe "#some_method" do
          it "does something" do
            # ...
          end
        end
      end
    end
  23. Non-Deterministic Failure

    Failures that occur at seemingly random frequencies due to non-deterministic behavior of the code under test.
  24. Unordered Queries

    • Don’t assume queries return results in a specific order
    • PostgreSQL returns results in non-deterministic order if the query is not explicitly sorted

    # Don’t:
    expect(results).to eq [record_1, record_2]

    # Do:
    expect(results).to contain_exactly record_1, record_2
    expect(results).to match_array [record_1, record_2]
  25. Frozen Time

    • Creating records in frozen time
      ◦ All records have the same created_at time
      ◦ Queries ordered by created_at will return results in non-deterministic order
    • Prefer Timecop#travel over Timecop#freeze
    • Only freeze time when precise time is needed
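The created_at tie can be sketched in plain Ruby, without Timecop or a database; the `Record` struct stands in for persisted rows and is illustrative only.

```ruby
require "time"

Record = Struct.new(:name, :created_at)

# Under frozen time, every record gets the identical timestamp.
frozen_now = Time.utc(2017, 11, 16, 12, 0, 0)
frozen = [Record.new("b", frozen_now), Record.new("a", frozen_now)]

# Sorting by created_at alone cannot distinguish tied records; like an
# ORDER BY created_at query over equal timestamps, the relative order
# of "a" and "b" is arbitrary.
puts frozen.sort_by(&:created_at).map(&:name).inspect

# "Traveling" instead of freezing lets the clock keep moving, so
# timestamps stay distinct and the ordering is fully determined.
traveled = [
  Record.new("b", frozen_now),
  Record.new("a", frozen_now + 1), # one second later
]
puts traveled.sort_by(&:created_at).map(&:name).inspect # => ["b", "a"]
```

The same reasoning explains why such specs pass most of the time locally and fail intermittently in CI: the tie is broken differently on different runs.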
  26. Randomized Test Data

    • Faker or other data generation or sampling methods return unexpected or unsupported data
      ◦ Non-alpha names (“D’Angelo”, “Doe-Smith”, “Mc Donald”)
      ◦ Invalid phone numbers, zip codes, unsupported states, etc.
    • Output relevant randomized data in the test error message to make troubleshooting easier
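A small plain-Ruby sketch of the last point, with an illustrative `alpha_only?` validation rather than Faker: when the input is random, put the sampled value in the failure output so a red build names the value that broke it. (In RSpec, the equivalent is passing a custom failure message as the second argument to `to`.)

```ruby
# Sample names like the "surprising" ones a generator can produce.
NAMES = ["Jane", "D'Angelo", "Doe-Smith", "Mc Donald"].freeze

# Hypothetical validation under test: letters only.
def alpha_only?(name)
  name.match?(/\A[A-Za-z]+\z/)
end

name = NAMES.sample

# Without the message, a failure just says "expected true, got false";
# with it, the offending sample is right there in the output.
unless alpha_only?(name)
  puts "expected alpha-only name, got: #{name.inspect}"
end
```

The same principle applies to any `Array#sample`, `rand`, or Faker-driven input: log the seed or the generated value, or the failure is unreproducible by construction.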
  27. Date and Time

    • Tests only fail on weekends/holidays?
    • Tests only fail at certain times of day?
    • Use Timecop to travel to the date/time at which the tests ran in CI
    • Avant timecop-rspec gem: https://github.com/avantoss/timecop-rspec
  28. UTC vs Local Date/Time

    • `Date.today` uses the system time zone
    • `Date.current` uses the application time zone
  29. UTC vs Local Date/Time

    ENV["TZ"] = "UTC"
    Time.zone = "America/Chicago"

    early_morning_utc = Time.utc(2017, 11, 10, 2)
    Timecop.travel(early_morning_utc) do
      # This will fail:
      expect(Date.current).to eq Date.today
    end
  30. SQL Date Comparisons

    • Database queries comparing Dates to Times
      ◦ Never pass Time objects to SQL queries against Date columns

    MyModel.where('start_date <= ?', Time.now).to_sql
    #=> SELECT "my_models".*
        FROM "my_models"
        WHERE (start_date <= '2017-11-03 06:29:45')
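The fix implied above is to hand the query a Date, not a Time. A plain-Ruby sketch of the type mismatch (no database; `start_date` stands in for a DATE column value):

```ruby
require "date"

start_date = Date.new(2017, 11, 3)        # like a DATE column value
now_time   = Time.utc(2017, 11, 3, 6, 29) # a Time part-way through the day

# A date compared against a timestamp is effectively compared against
# midnight, so results can flip around day boundaries and time zones.
# Casting to a Date first makes the comparison date-to-date:
puts start_date <= now_time.to_date # true
```

In the query from the slide, that means `MyModel.where('start_date <= ?', Date.current)` (or `Time.now.to_date`), so the generated SQL compares a date literal against the date column.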
  31. Timeouts and Asynchronous JavaScript

    • CI performance is often worse than your local machine
    • Page load performance can vary widely based on application configuration and test ordering
    • Increase timeouts for CI as needed
    • Don’t use browser tests for performance testing
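One common way to raise timeouts only in CI, assuming Capybara and a `CI` environment variable set by your CI provider (both assumptions, not from the slides):

```ruby
# spec/support/capybara.rb -- a sketch, not the talk's configuration.
require "capybara/rspec"

# Capybara waits up to this long for asynchronous content to appear
# before a matcher like have_content gives up.
Capybara.default_max_wait_time = ENV["CI"] ? 10 : 4
```

Keeping the local value small preserves fast feedback; only the flaky, slower environment gets the longer wait.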
  32. Timeouts and Asynchronous JavaScript

    • Wait for pages to finish loading before interacting with them
      ◦ SitePrism load_validations: https://github.com/natritmeyer/site_prism#load-validations
  33. Environmental Differences

    • Compare CI configuration and setup to local
      ◦ Environment Variables
      ◦ Test setup or execution inconsistencies
    • Database
      ◦ Seeds
      ◦ Migrations missing from schema or structure files
  34. Strategies for Unreproducible Failures

    • SSH into CI and try to reproduce
    • Use common sense: what are the probable causes of the failure?
    • Check gem GitHub repos for related issues or changes
    • Learn to use pry and byebug
    • Incrementally narrow the scope of the defect
  35. Strategies for Unreproducible Failures

    • Know your test support code inside and out
    • Look at failure trends over time
    • Add logging
  36. Takeaways

    • Keep your builds green to avoid sadness
    • Tests are code too
    • Set realistic goals
    • Celebrate success!
  37. Get In Touch

    Tim Mertens
    GitHub: tmertens
    Twitter: @rockfx01
    https://github.com/tmertens/intermittent_test_failures