
Deterministic Solutions to Intermittent Failures

Tim Mertens
November 16, 2017


Presented at RubyConf 2017 in New Orleans.
https://confreaks.tv/events/rubyconf2017



Transcript

  1. HELLO, My Name Is
     Tim Mertens
     github.com/tmertens @rockfx01
  2. Tests Are Software Too • Test code does exactly what

    you tell it to do • “Flaky” implies an unsolvable problem • “Non-Deterministic” behavior can be accounted for • Any failure can be resolved once you know the root cause
  3. Continuous Integration They call me “CI” for short A process

    or system by which new code is continuously validated against an existing test suite.
  4. Parallelized Builds Builds which spread the work of executing tests

    across 2 or more workers (e.g. containers, nodes)
  5. Common Reproducible Failures • Stale Branches • Business Dates and

    Times • Mocked Time vs System Time • Missing Preconditions • Real Bugs
  6. Test Group A subset of tests from the test suite

    which run on a specific node in a parallelized build.
  7. RSpec Test Group
     $ rspec                                           # No Group - Runs All Tests
     $ rspec ./spec/some_spec.rb ./spec/other_spec.rb  # Specific Files
     $ rspec . --tag focus                             # Metadata Tags
  8. RSpec Test Seed
     $ rspec
     …
     Finished in 1 minute 17.7 seconds
     99 examples, 0 failures, 4 pending
     Randomized with seed 13391  # Test Seed
  9. Re-running Test Group With Seed
     $ rspec --seed 12345 --fail-fast
     # OR
     $ rspec ./spec/some_spec.rb \
       ./spec/other_spec.rb \
       --seed 12345 --fail-fast
  10. Test Bisect Repeatedly dividing a set of tests in half

    until you find the minimal set of tests which cause another test to fail.
  11. Bisecting Test Group with Seed
     $ rspec --seed 12345 --bisect
     # OR
     $ rspec ./spec/some_spec.rb \
       ./spec/other_spec.rb \
       --seed 12345 --bisect
  12. Test Pollution When the side effects of one or more

    tests in a test group cause one or more other tests to fail.
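A minimal plain-Ruby sketch of that definition (the `Registry` class here is hypothetical, not from the talk): class-level state mutated by one test silently changes the starting conditions of another, so the second test's outcome depends on execution order.

```ruby
# Hypothetical example: class-level state shared across all "tests".
class Registry
  ITEMS = []

  def self.add(item)
    ITEMS << item
  end

  def self.count
    ITEMS.size
  end
end

# "Test A" runs first and leaves state behind:
Registry.add("foo")

# "Test B" assumed an empty registry. It now fails, but only when the
# random ordering happened to run Test A before it.
test_b_passes = Registry.count.zero?
```

This is exactly the order dependence that `--seed` plus `--bisect` (slides 9 and 11) help you pin down.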
  13. Data Pollution • Data is persisted across test examples or

    test suite executions ◦ Database Records ◦ Caches (e.g. Redis)
  14. Defensive Testing • Tests should clean up after themselves, but…

    • Don’t expect pristine starting conditions
  15. Defensive Testing • Don’t expect tables to be empty
 # Don’t:
 expect(User.count).to eq 1

 # Do:
 expect { foo.bar }.to change { User.count }.by(1)
  16. Defensive Testing • Don’t expect global scopes to only return test records
 # Don’t:
 expect(User.active).to match_array [user1, user2]

 # Do:
 expect(User.active).to include(user1, user2)
 expect(User.active).not_to include(user3)
  17. Class/Singleton Caching
 # Don’t:
 described_class.add("foo") # mutates the singleton
 expect(described_class.contains?("foo")).to be true

 # Do:
 subject = described_class.new
 subject.add("foo")
 expect(subject.contains?("foo")).to be true
  18. Mutated Constants • Don’t overwrite constants
 # Don’t:
 before { SOME_CONST = "my test value" }

 # Do:
 stub_const("SOME_CONST", "my test value")
 allow(MyClass).to receive(:foo).and_return("foo")
 fake_class = class_double(MyClass, foo: "foo")
 stub_const("MyClass", fake_class)
  19. Mutated Constants
 # Don't:
 before do
   MyClass.define_method(:foo) { "foo" }
 end

 # Do:
 instance = described_class.new
 allow(instance).to receive(:foo).and_return("foo")
  20. Mutated (Test) Constants
 describe Foo do
   # Don’t:
   BAR = "some_value"
   it { expect(Foo.bar).to eq BAR }

   # Do:
   let(:bar) { "some_value" }
   it { expect(Foo.bar).to eq bar }
 end
  21. Real Bugs! • Always ensure you understand the reason for

    the test failure and ensure your production code is not at fault
  22. Running Tests in a Loop
 describe MyClass do
   100.times do
     describe "#some_method" do
       it "does something" do
         # ...
       end
     end
   end
 end
  23. Non-Deterministic Failure Failures that occur at seemingly random frequencies due to non-deterministic behavior of the code under test.
  24. Unordered Queries • Don’t assume queries return results in a specific order • Unordered queries in PostgreSQL ◦ PostgreSQL returns results in non-deterministic order if the query is not explicitly sorted
 # Don’t:
 expect(results).to eq [record_1, record_2]

 # Do:
 expect(results).to contain_exactly record_1, record_2
 expect(results).to match_array [record_1, record_2]
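The same idea in plain Ruby, outside RSpec: when row order is not guaranteed, compare contents rather than positions.

```ruby
# Simulate an unordered query: same rows, arbitrary order on each run.
expected = %w[record_1 record_2 record_3]
actual   = expected.shuffle

# Positional equality (eq) only passes when the shuffle happens to match.
# Sorting both sides first compares the contents deterministically, which
# is what contain_exactly / match_array do under the hood.
order_insensitive_ok = (actual.sort == expected.sort)
```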
  25. Frozen Time • Creating records in frozen time ◦ All records have the same created_at time ◦ Queries ordered by created_at will return results in non-deterministic order • Prefer Timecop.travel over Timecop.freeze • Only freeze time when precise time is needed
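Why frozen time breaks `created_at` ordering, sketched with plain hashes (no Timecop or database needed, and the deterministic fix also applies to SQL: order by a unique tiebreaker):

```ruby
# Under frozen time, every "record" shares one created_at value.
frozen = Time.utc(2017, 11, 10, 2)
records = [
  { id: 1, created_at: frozen },
  { id: 2, created_at: frozen },
  { id: 3, created_at: frozen },
]

# Sorting by created_at alone cannot distinguish the ties, so the
# relative order of equal keys is not guaranteed run to run.
by_time = records.sort_by { |r| r[:created_at] }

# Deterministic fix: break ties with a unique column such as id.
by_time_then_id = records.sort_by { |r| [r[:created_at], r[:id]] }
```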
  26. Randomized Test Data • Faker or other data generation or

    sampling methods return unexpected or unsupported data ◦ Non-alpha names (“D’Angelo”, “Doe-Smith”, “Mc Donald”) ◦ Invalid phone numbers, zip codes, unsupported states, etc. • Output relevant randomized data in the test error message to make troubleshooting easier
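One way to surface the sampled value in the failure output, sketched with a hypothetical `valid_name?` validator (assumed here to accept only plain alphabetic names, which is exactly the kind of hidden assumption random data exposes):

```ruby
# A few of the name shapes random generators can produce.
SAMPLE_NAMES = ["Jane", "D'Angelo", "Doe-Smith", "Mc Donald"].freeze

# Hypothetical validator that rejects apostrophes, hyphens, and spaces.
def valid_name?(name)
  name.match?(/\A[A-Za-z]+\z/)
end

name = SAMPLE_NAMES.sample
# Embed the sampled value in the assertion message, so an intermittent
# failure can be reproduced without guessing what was generated.
message = "expected #{name.inspect} to be a valid name"
```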
  27. Date and Time • Tests only fail on weekends/holidays? • Tests only fail at certain times of day? • Use Timecop to travel to the date/time when the tests ran in CI Avant timecop-rspec gem:
 https://github.com/avantoss/timecop-rspec
  28. UTC vs Local Date/Time • `Date.today` uses system time zone

    • `Date.current` uses application time zone
  29. UTC vs Local Date/Time
 ENV["TZ"] = "UTC"
 Time.zone = "America/Chicago"

 early_morning_utc = Time.utc(2017,11,10,2)
 Timecop.travel(early_morning_utc) do
   # This will fail:
   expect(Date.current).to eq Date.today
 end
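The same date mismatch reproduced with only the standard library (a fixed -06:00 offset stands in for America/Chicago here; real code should use the zone database so DST is handled):

```ruby
require "date"

early_morning_utc = Time.utc(2017, 11, 10, 2)

# System-zone date vs application-zone date, computed explicitly:
utc_date   = early_morning_utc.to_date                     # 2017-11-10
local_date = early_morning_utc.getlocal("-06:00").to_date  # 2017-11-09

# Two hours past midnight UTC is still "yesterday" in Chicago,
# which is why Date.current and Date.today can disagree.
dates_match = (utc_date == local_date)
```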
  30. SQL Date Comparisons • Database queries comparing Dates to Times ◦ Never pass Time objects to SQL queries against Date columns
 MyModel.where('start_date <= ?', Time.now).to_sql
 #=> SELECT "my_models".* FROM "my_models"
     WHERE (start_date <= '2017-11-03 06:29:45')
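A sketch of the fix: convert the Time to a Date before it reaches the query, so a DATE column is compared date-to-date (shown with plain string formatting rather than a real query builder):

```ruby
require "date"

now = Time.utc(2017, 11, 3, 6, 29, 45)

# Don't: a Time interpolates as a full timestamp literal.
time_literal = now.strftime("%Y-%m-%d %H:%M:%S")  # "2017-11-03 06:29:45"

# Do: convert to a Date first, e.g. where('start_date <= ?', Time.now.to_date)
date_literal = now.to_date.iso8601                # "2017-11-03"
```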
  31. Timeouts and Asynchronous Javascript • CI performance is often worse

    than your local machine • Page load performance can vary widely based on application configuration and test ordering • Increase timeouts for CI as needed • Don’t use browser tests for performance testing
  32. Timeouts and Asynchronous Javascript • Wait for pages to finish

    loading before interacting with them ◦ SitePrism load_validations:
 https://github.com/natritmeyer/site_prism#load-validations
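If you are not using SitePrism, the same "wait before interacting" idea can be hand-rolled as a small polling helper (the `wait_until` method below is hypothetical, not a SitePrism or Capybara API):

```ruby
# Poll a condition until it holds or a deadline passes.
def wait_until(timeout: 2, interval: 0.05)
  deadline = Time.now + timeout
  loop do
    return true if yield
    raise "condition not met within #{timeout}s" if Time.now > deadline
    sleep interval
  end
end

# Usage: block until some asynchronous work (here, a thread standing in
# for an async page load) has finished, instead of sleeping a fixed time.
loaded = false
Thread.new { sleep 0.1; loaded = true }
wait_until { loaded }
```

Fixed sleeps either waste time or flake under CI load; polling with a generous timeout does neither.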
  33. Environmental Differences • Compare CI configuration and setup to local

    ◦ Environment Variables ◦ Test setup or execution inconsistencies • Database ◦ Seeds ◦ Migrations missing from schema or structure files
  34. Strategies for Unreproducible Failures • SSH into CI and try

    to reproduce • Use common sense ◦ What are the probable causes of the failure? • Check gem GitHub repos for related issues or changes • Learn to use pry, byebug • Incrementally narrow the scope of the defect
  35. Strategies for Unreproducible Failures • Know your test support code

    in and out • Look at failure trends over time • Add logging
  36. Takeaways • Keep your builds green to avoid sadness •

    Tests are code too • Set realistic goals • Celebrate success!
  37. Get In Touch Tim Mertens Github { tmertens } Twitter

    { @rockfx01 } 
 https://github.com/tmertens/intermittent_test_failures