Fixing Flaky Tests Like a Detective

Every test suite has them: a few tests that usually pass but sometimes mysteriously fail when run on the same code. Since they can’t be reliably replicated, they can be tough to fix. The good news is there’s a set of usual suspects that cause them: test order, async code, time, sorting and randomness. While walking through examples of each type, I’ll show you methods for identifying a culprit that range from capturing screenshots to traveling through time. You’ll leave with the skills to fix any flaky test fast, and with strategies for monitoring and improving your test suite's reliability overall.

Sonja Peterson

May 01, 2019

Transcript

  1. 13.

    @SonjaBPeterson

    1. Gather information
    2. Identify which type of flaky test it might be
    3. Come up with a theory of how it’s happening
    4. Fix it!
  3. 17.

    Types of Evidence

    • Error messages and output from each failure
    • Time of day failures occurred
    • How often the test is failing
    • Which tests were run before it, and in what order
  4. 18.

    How to collect evidence

    • Set up every master build failure to be reported to a bug tracker, with a link to the build
    • Make sure failures of the same test can be grouped together
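
    A minimal Ruby sketch of putting this evidence to work, assuming each failed master build can be exported as a record with a test identifier, a timestamp, a build link, and the error message. The `Failure` struct, the spec paths, and the `ci.example.com` URLs are hypothetical. Grouping by test keeps all the evidence for one flake together, and listing the UTC hours of each failure makes a time-of-day pattern easy to spot.

    ```ruby
    # Hypothetical record of one failed master build.
    Failure = Struct.new(:test_id, :failed_at, :build_url, :message)

    # Groups failures by test and summarizes how often each failed and at which UTC hours.
    def summarize_failures(failures)
      failures.group_by(&:test_id).transform_values do |group|
        {
          count: group.size,
          hours_utc: group.map { |f| f.failed_at.utc.hour }.uniq.sort,
          builds: group.map(&:build_url)
        }
      end
    end

    FAILURES = [
      Failure.new("spec/models/task_spec.rb:12", Time.utc(2019, 4, 1, 23, 30),
                  "https://ci.example.com/builds/101", "expected due date to match"),
      Failure.new("spec/system/posts_spec.rb:8", Time.utc(2019, 4, 2, 15, 2),
                  "https://ci.example.com/builds/117", "expected to find text"),
      Failure.new("spec/models/task_spec.rb:12", Time.utc(2019, 4, 3, 0, 45),
                  "https://ci.example.com/builds/123", "expected due date to match")
    ]

    summarize_failures(FAILURES).each do |test_id, info|
      puts "#{test_id}: #{info[:count]} failure(s), UTC hours #{info[:hours_utc].inspect}"
    end
    ```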
  5. 22.

    A test in which some code runs asynchronously, resulting in events in the test happening in varying orders
  6. 24.

    Async code in system/feature tests

    • At least 3 different threads involved:
      • Main thread executing your test
      • Another thread running your Rails server
      • A thread in a separate process running the browser
  7. 26.

    Diagram: three timelines running side by side
    • Your test code: click_on “Submit Post”, checks DB
    • Browser: click triggers Ajax, updates UI
    • Test Rails server: creates blog post in DB
  8. 28.

    Diagram: the same three timelines, with a wait added to the test
    • Your test code: click_on “Submit Post”, have_content waits, checks DB
    • Browser: click triggers Ajax, updates UI
    • Test Rails server: creates post in DB
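
    The race in these diagrams can be reproduced without a browser. This is a stdlib-only toy model, not Rails or Capybara code: a background thread plays the Rails server, a mutex-guarded array plays the database, and `click_submit_post` and `wait_until` are hypothetical helpers. Checking the "database" right after the click usually fails because the server thread hasn't finished yet; waiting for a specific condition first, which is the role `have_content` plays in the real test, removes the race.

    ```ruby
    # Stand-in for the test database, shared between threads.
    POSTS = []
    POSTS_LOCK = Mutex.new

    # "Clicking Submit Post" returns at once; the "server" creates the post a
    # little later on another thread, as a real Ajax request would.
    def click_submit_post(title, server_delay: 0.05)
      Thread.new do
        sleep server_delay
        POSTS_LOCK.synchronize { POSTS << title }
      end
    end

    def post_exists?(title)
      POSTS_LOCK.synchronize { POSTS.include?(title) }
    end

    # Retries the block until it is truthy or the timeout expires. Similar in
    # spirit to Capybara's waiting matchers, but not Capybara's implementation.
    def wait_until(timeout: 2, interval: 0.01)
      deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
      loop do
        return true if yield
        return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
        sleep interval
      end
    end

    server = click_submit_post("My first post")
    checked_immediately = post_exists?("My first post")  # usually false: server not done yet
    found_after_waiting = wait_until { post_exists?("My first post") }
    server.join

    puts "checked immediately: #{checked_immediately}"
    puts "found after waiting: #{found_after_waiting}"
    ```

    Unlike a fixed `sleep`, `wait_until` returns as soon as the condition holds and only uses the full timeout when it never does.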
  9. 29.

    Another example: async code

    visit books_path
    click_on "Sort"
    expect_alphabetical_order           # waits for books to appear in sorted order
    click_on "Sort"
    expect_reversed_alphabetical_order  # waits for books to appear in reverse order
  10. 30.

    Another example: async code

    visit books_path
    click_on "Sort"
    expect_alphabetical_order           # not actually waiting: books already sorted
    click_on "Sort"                     # hits before first reload
    expect_reversed_alphabetical_order
  11. 31.

    Another example: async code

    click_on "Sort"
    expect(page).to have_content("Sort: ASC")
    expect_alphabetical_order
    click_on "Sort"
    expect(page).to have_content("Sort: DESC")
    expect_reversed_alphabetical_order
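
    A toy model of why `expect_alphabetical_order` was "not actually waiting": a waiting assertion only synchronizes the test if its condition stays false until the action completes. Here the books start out sorted, so waiting on sort order returns before the reload, while waiting on a label that only appears after the reload really waits. `FakeBooksPage` and `wait_until` are hypothetical stand-ins rather than Capybara APIs, and the initial `"Sort: none"` label is an assumption.

    ```ruby
    # Toy books page (hypothetical): each click on "Sort" reloads after a short
    # delay, toggling the order and a "Sort: ASC"/"Sort: DESC" label.
    class FakeBooksPage
      def initialize(books)
        @lock = Mutex.new
        @books = books
        @label = "Sort: none"   # assumed initial label
      end

      def click_sort(reload_delay: 0.05)
        Thread.new do
          sleep reload_delay
          @lock.synchronize do
            ascending = @label != "Sort: ASC"
            @books = ascending ? @books.sort : @books.sort.reverse
            @label = ascending ? "Sort: ASC" : "Sort: DESC"
          end
        end
      end

      def books
        @lock.synchronize { @books.dup }
      end

      def label
        @lock.synchronize { @label }
      end
    end

    # Retries the block until it is truthy or the timeout expires (hypothetical helper).
    def wait_until(timeout: 2, interval: 0.01)
      deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
      loop do
        return true if yield
        return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
        sleep interval
      end
    end

    page = FakeBooksPage.new(%w[Dune Emma Ulysses])   # already alphabetical
    reload = page.click_sort

    # Like expect_alphabetical_order here: already true, so it returns before the reload.
    wait_until { page.books == page.books.sort }
    puts "label after waiting on sort order: #{page.label}"   # usually still "Sort: none"

    # Like have_content("Sort: ASC"): false until the reload, so it really waits.
    wait_until { page.label == "Sort: ASC" }
    puts "label after waiting on the label:  #{page.label}"
    reload.join
    ```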
  12. 32.

    Identifying async code flakes

    • Is it a system/feature test?
    • Does it trigger any events without explicitly waiting for the results?
    • Use Capybara’s page.save_screenshot or the capybara-screenshot gem to see what the page looked like when the test failed
  13. 33.

    Preventing async flakes

    • Make sure your test is waiting for each action to finish
    • Don’t use sleep; wait for something specific
    • Understand which Capybara methods wait and which don’t
    • Check that each assertion is working as expected
  14. 37.

    Potential areas of shared state

    • The database
    • Global or class variables
    • The browser (in tests that run in one)
  15. 38.

    Database cleaning

    • Each test should start with a "clean" DB
    • Transactions are fastest & are the default in Rails
    • In the past, transactions couldn’t be used with Capybara tests because the test code & the test server didn’t share a DB connection
  16. 39.

    Database cleaning

    • Rails 5 system tests fixed this by allowing shared access to DB connections in tests
    • Running in transactions may have subtle differences from normal behavior
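
    A conceptual model, not ActiveRecord code, of why transactional cleaning used to clash with Capybara tests: rows written inside an open transaction are visible to the connection that wrote them but not to other connections, and a rollback discards them. `FakeDatabase` and `FakeConnection` are hypothetical; the shared-connection case corresponds to what the slide describes Rails 5 system tests doing.

    ```ruby
    # Committed rows live in the database; uncommitted rows live on a connection.
    class FakeDatabase
      attr_reader :committed

      def initialize
        @committed = []
      end

      def connection
        FakeConnection.new(self)
      end
    end

    class FakeConnection
      def initialize(db)
        @db = db
        @pending = nil   # non-nil while a transaction is open
      end

      def begin_transaction
        @pending = []
      end

      def insert(row)
        @pending ? @pending << row : @db.committed << row
      end

      # A connection sees committed rows plus its own uncommitted ones.
      def rows
        @db.committed + (@pending || [])
      end

      def rollback
        @pending = nil
      end
    end

    db = FakeDatabase.new
    test_conn = db.connection

    # Transactional cleaning: the test's writes happen inside a transaction...
    test_conn.begin_transaction
    test_conn.insert("post 1")

    # ...so a "server" thread with its own connection cannot see them.
    separate = Thread.new { db.connection.rows }.value   # => []

    # Sharing the test's connection with the server thread fixes that.
    shared = Thread.new { test_conn.rows }.value         # => ["post 1"]

    # Rolling back leaves the next test with a clean database.
    test_conn.rollback
    puts "separate: #{separate.inspect}, shared: #{shared.inspect}, after rollback: #{db.connection.rows.inspect}"
    ```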
  17. 40.

    Database cleaning

    • The database_cleaner gem can clean with truncation or deletion:
      • Slower than transactions, but more realistic
      • Make sure database cleanup runs after Capybara’s cleanup
  18. 41.

    Know your database cleaner

    • Spend the time to understand how your database cleaning works, when it runs, and any potential gotchas
  19. 45.

    Other sources of order dependency

    • The browser
      • Capybara should take care of it
    • Global/class variables
      • Especially watch out for hashes, since changing a value within one won’t trigger a warning
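
    A plain-Ruby illustration of the hash gotcha: reassigning a constant makes Ruby print an "already initialized constant" warning, but changing a value inside a hash held by a constant (or a global or class variable) is silent. The settings hash and the two "test" methods are hypothetical stand-ins for real tests. Freezing the hash, shown at the end, is one possible safeguard rather than something from the slides.

    ```ruby
    DEFAULT_SETTINGS = { currency: "USD" }   # hypothetical app-wide settings

    # Stand-ins for two tests that share the hash.
    def test_checkout_in_euros
      DEFAULT_SETTINGS[:currency] = "EUR"    # silent mutation, never restored
      DEFAULT_SETTINGS[:currency] == "EUR"
    end

    def test_default_currency
      DEFAULT_SETTINGS[:currency] == "USD"
    end

    # The second test's result depends on which test ran first.
    puts "default currency test alone: #{test_default_currency}"    # prints true
    test_checkout_in_euros
    puts "after the euro test ran: #{test_default_currency}"        # prints false

    # A frozen hash turns the silent leak into a loud error.
    FROZEN_SETTINGS = { currency: "USD" }.freeze
    begin
      FROZEN_SETTINGS[:currency] = "EUR"
    rescue FrozenError => e
      puts "frozen hash refused mutation: #{e.class}"
    end
    ```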
  20. 46.

    Identifying order dependency

    • Try replicating the failure with the same set of tests in the same order
    • Cross-reference each failed occurrence and see if the same tests ran before the failure
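
    Cross-referencing can be sketched as a set intersection: the tests that ran before the flaky test in every failing run are the prime suspects for leaking state into it. The run orders and spec names are hypothetical CI history, and `common_predecessors` is an illustrative helper. Replaying the suspects in the same order before the flaky test is then a way to confirm the theory.

    ```ruby
    require "set"

    # Hypothetical CI history: the order tests ran in during each build where
    # checkout_spec (the flaky test) failed.
    FAILING_RUNS = [
      %w[user_spec currency_spec invoice_spec checkout_spec],
      %w[currency_spec report_spec checkout_spec user_spec],
      %w[report_spec currency_spec user_spec checkout_spec]
    ]

    # Returns the tests that ran before `target` in every failing run.
    def common_predecessors(runs, target)
      before_target = runs.map { |order| order.take_while { |name| name != target }.to_set }
      before_target.reduce(:&) || Set.new
    end

    suspects = common_predecessors(FAILING_RUNS, "checkout_spec")
    puts "suspects: #{suspects.to_a.sort.join(', ')}"   # prints "suspects: currency_spec"
    ```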
  21. 47.

    Preventing order dependency

    • Configure your test suite to run in random order
    • Understand your test setup and teardown process, and work to close any gaps where shared state could leak through
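
    In RSpec, random ordering is typically turned on with `config.order = :random`, and a failing order can be replayed by running again with the same `--seed`. This sketch only illustrates the principle that a seed pins down the order; `Array#shuffle` with a seeded `Random` is not RSpec's ordering algorithm, and the test names are hypothetical.

    ```ruby
    # Hypothetical spec files; any list of test identifiers works the same way.
    TESTS = %w[user_spec currency_spec invoice_spec checkout_spec report_spec]

    # Same seed, same order: a seeded Random makes the shuffle reproducible.
    def run_order(tests, seed)
      tests.shuffle(random: Random.new(seed))
    end

    seed = Random.new_seed % 100_000   # a runner would print this at the start of a run
    order = run_order(TESTS, seed)
    puts "seed #{seed}: #{order.join(' ')}"
    puts "replayed with the same seed: #{run_order(TESTS, seed) == order}"
    ```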
  23. 51.

    Example: Time

    it "should set a default due date" do
      task = Task.create
      expected_due_date = (Date.today + 1).end_of_day
      expect(task.due_date).to eq expected_due_date
    end

    Starts failing after 7pm every night!
  24. 52.

    Example: Time

    it "should set a default due date" do
      task = Task.create
      expected_due_date = (Date.today + 1).end_of_day
      expect(task.due_date).to eq expected_due_date
    end

    • Date.today = system time = UTC
    • Date.tomorrow = time in zone = EST
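
    The mismatch can be reproduced with the standard library alone. This sketch uses fixed UTC offsets in place of real time zones (so it ignores daylight saving time) and assumes, as the slide implies, that the app's default due date is based on the zone's `Date.tomorrow` while the test's expectation is based on the system's `Date.today`. `Date.tomorrow` comes from Rails, so neither method is called here. At 7pm EST it is already midnight UTC, which is exactly when the two dates diverge.

    ```ruby
    require "date"

    UTC_OFFSET = "+00:00"   # system time zone on the CI machine (assumed)
    EST_OFFSET = "-05:00"   # app's configured zone, EST without daylight saving (assumed)

    # The calendar date of an instant as seen from a given UTC offset.
    def date_in(offset, instant)
      instant.getlocal(offset).to_date
    end

    # Mirrors the failing test: the expectation uses the system date (Date.today + 1),
    # while the default due date is assumed to use the zone date (Date.tomorrow).
    def due_dates_match?(instant)
      date_in(UTC_OFFSET, instant) + 1 == date_in(EST_OFFSET, instant) + 1
    end

    [18, 19, 23].each do |hour|
      instant = Time.new(2019, 1, 1, hour, 30, 0, EST_OFFSET)
      puts "#{hour}:30 EST -> dates match? #{due_dates_match?(instant)}"
    end
    ```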
  25. 53.

    Example: Time

    it "should set a default due date" do
      task = Task.create
      expected_due_date = (Date.current + 1).end_of_day
      expect(task.due_date).to eq expected_due_date
    end
  26. 54.

    Example: Time

    it "should set a default due date" do
      Timecop.freeze(Time.zone.local(2019, 1, 1, 10, 5, 0)) do
        task = Task.create
        expected_due_date = Time.zone.local(2019, 1, 2, 23, 59, 59)
        expect(task.due_date).to eq expected_due_date
      end
    end
  27. 55.

    Identifying time-based flakes

    • Are there any references to date or time in the test or the code under test?
    • Has every observed failure happened before or after a certain hour of day?
    • See if you can reliably replicate the failure using Timecop
  28. 56.

    Preventing time flakes

    • If the current time could affect the test, freeze it to a specific value with Timecop, or test it with a passed-in, static value
    • Set up your test suite to wrap every test in Timecop.travel with a random time of day, to surface after-hours failures faster
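
    Timecop is a gem, so this stdlib sketch takes the slide's other option: passing in a static value. `default_due_date` is a hypothetical version of the app code that receives "now" as an argument, and the fixed EST offset is an assumption. The loop imitates the random-time-of-day idea with a seeded `Random`, which keeps any failing hour reproducible; it is an analog of wrapping tests in `Timecop.travel`, not that gem's API.

    ```ruby
    require "date"

    EST_OFFSET = "-05:00"   # hypothetical app time zone

    # Hypothetical app code that receives "now" instead of reading the clock.
    def default_due_date(now)
      tomorrow = now.to_date + 1
      Time.new(tomorrow.year, tomorrow.month, tomorrow.day, 23, 59, 59, now.utc_offset)
    end

    # Static value, matching the Timecop.freeze example on the slides.
    created_at = Time.new(2019, 1, 1, 10, 5, 0, EST_OFFSET)
    puts default_due_date(created_at) == Time.new(2019, 1, 2, 23, 59, 59, EST_OFFSET)

    # Random times of day from a seeded Random, so a failing hour can be replayed.
    rng = Random.new(2019)
    5.times do
      now = Time.new(2019, 1, 1, rng.rand(24), rng.rand(60), 0, EST_OFFSET)
      ok = default_due_date(now) == Time.new(2019, 1, 2, 23, 59, 59, EST_OFFSET)
      puts "#{now.strftime('%H:%M')} ok=#{ok}"
    end
    ```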
  29. 61.

    Identifying unordered collections

    • Look for any assertions about the order of a collection, the contents of an array, or the first or last item in one
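
    A sketch of the fix for assertions on unordered collections. A query without an `ORDER BY` may return rows in any order, simulated here with a shuffle over hypothetical tag names. Comparing contents rather than positions (sorting both sides, or comparing counts with `Array#tally`, Ruby 2.7+) makes the check order-independent. In RSpec, the `match_array` and `contain_exactly` matchers serve the same purpose, and adding an explicit order to the query is the alternative when order actually matters.

    ```ruby
    EXPECTED_TAGS = %w[ruby rails testing]   # hypothetical data

    # Stands in for a query without ORDER BY: rows may come back in any order.
    def fetch_tag_names(random)
      EXPECTED_TAGS.shuffle(random: random)
    end

    names = fetch_tag_names(Random.new)

    # Order-dependent: passes only when the rows happen to come back in this order.
    order_dependent_ok = names == EXPECTED_TAGS

    # Order-independent: compare contents, not positions.
    sorted_ok = names.sort == EXPECTED_TAGS.sort
    tally_ok = names.tally == EXPECTED_TAGS.tally   # also counts duplicates

    puts "order-dependent: #{order_dependent_ok}, sorted: #{sorted_ok}, tally: #{tally_ok}"
    ```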
  31. 68.

    Identifying randomness-based flakes

    • Look for use of a random number generator; often this is used in factories/fixtures
    • See if you can reliably replicate the failure with the same --seed option
    • In RSpec, make sure you’ve set Kernel.srand(config.seed) in spec_helper.rb
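
    A minimal illustration of why `Kernel.srand(config.seed)` matters: once `Kernel#rand` is seeded, the "random" values a factory draws repeat exactly for the same seed, so a failure seen with one `--seed` can be replayed. `random_title_length` is a hypothetical factory attribute; with a hypothetical 255-character limit on titles, about 15% of its draws (lengths 256 to 300) would be invalid, which is the kind of intermittent failure this slide describes.

    ```ruby
    # Hypothetical factory attribute that draws a random title length.
    def random_title_length
      rand(1..300)
    end

    # Seeds Kernel#rand the way Kernel.srand(config.seed) would, then draws values.
    def generate_lengths(seed, count = 5)
      Kernel.srand(seed)
      Array.new(count) { random_title_length }
    end

    first_run = generate_lengths(1234)
    replay = generate_lengths(1234)
    puts "first run: #{first_run.inspect}"
    puts "replay:    #{replay.inspect}"
    puts "identical: #{first_run == replay}"
    puts "over a 255-character limit: #{first_run.count { |n| n > 255 }}"
    ```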
  32. 69.

    Preventing randomness-based flakes

    • Remove randomness from your tests & instead explicitly test boundaries & edge cases
    • Avoid using gems like Faker to generate data in tests
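
    Rather than generating test data randomly, the slide suggests testing boundaries explicitly on every run. This sketch assumes a hypothetical rule that titles must be 1 to 255 characters long; `valid_title?` stands in for a model validation.

    ```ruby
    MAX_TITLE_LENGTH = 255   # hypothetical limit

    def valid_title?(title)
      !title.empty? && title.length <= MAX_TITLE_LENGTH
    end

    # Each edge case is checked deterministically, every run.
    CASES = {
      "" => false,                             # empty
      "a" => true,                             # shortest valid
      "a" * MAX_TITLE_LENGTH => true,          # longest valid
      "a" * (MAX_TITLE_LENGTH + 1) => false    # one past the limit
    }

    CASES.each do |title, expected|
      puts "length #{title.length}: expected #{expected}, got #{valid_title?(title)}"
    end
    ```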
  33. 71.

    Strategy tips

    • Run through each category & look for identifying signs
    • Don’t use trial & error to find a fix; form a strong theory first
    • Do try to find a way to reliably replicate failures to prove your theory
  34. 72.

    If you’re stuck

    • Consider adding code that will give you more information next time it fails
    • Try pairing with another developer
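
    One way to make the next failure more informative, sketched as a hypothetical `check` helper that attaches the relevant state to the failure message. In RSpec, a similar effect is available by passing a custom message as the second argument to `to`, for example `expect(posts.size).to eq(1), "posts: #{posts.inspect}"`. The context keys below are illustrative.

    ```ruby
    # Hypothetical error class and helper: fail with the relevant state attached.
    class FlakyAssertionError < StandardError; end

    def check(condition, message, context = {})
      return true if condition

      details = context.map { |key, value| "  #{key}: #{value.inspect}" }.join("\n")
      raise FlakyAssertionError, "#{message}\n#{details}"
    end

    begin
      posts = []   # hypothetical state observed when the test ran
      check(posts.size == 1, "expected exactly one post",
            post_count: posts.size, checked_at: Time.now.utc, seed: 1234)
    rescue FlakyAssertionError => e
      puts e.message
    end
    ```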
  35. 77.

    • System & feature tests: more realistic, slower, more likely to flake
    • Unit tests: simpler, faster, less likely to flake
  36. 78.

    Fixing flaky tests as a team

    • Make fixing flaky tests a high priority, since they affect everyone’s velocity
    • Make sure someone is assigned to each active flake & responsibility is spread across the whole team
    • Set a target for your master branch pass rate and track it week over week
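
    A minimal sketch of tracking a master-branch pass rate week over week, assuming build results can be exported as date and pass/fail pairs. The build history and the 90% target are hypothetical; ISO weeks (`Date#cweek`) are one reasonable way to bucket builds.

    ```ruby
    require "date"

    # Hypothetical master-branch build history: [date, passed?]
    BUILDS = [
      [Date.new(2019, 4, 1), true],   [Date.new(2019, 4, 2), false],
      [Date.new(2019, 4, 3), true],   [Date.new(2019, 4, 4), true],
      [Date.new(2019, 4, 8), true],   [Date.new(2019, 4, 9), true],
      [Date.new(2019, 4, 10), false], [Date.new(2019, 4, 11), true],
      [Date.new(2019, 4, 12), true]
    ]
    TARGET_PASS_RATE = 0.9   # hypothetical team target

    # Groups builds by ISO year and week and computes the share that passed.
    def weekly_pass_rates(builds)
      builds.group_by { |date, _| [date.cwyear, date.cweek] }
            .transform_values { |week| week.count { |_, passed| passed }.fdiv(week.size) }
            .sort.to_h
    end

    weekly_pass_rates(BUILDS).each do |(year, week), rate|
      status = rate >= TARGET_PASS_RATE ? "meets" : "below"
      puts format("%d-W%02d: %.0f%% (%s target)", year, week, rate * 100, status)
    end
    ```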
  37. 79.

    Flaky tests give you an opportunity to gain a deeper understanding of your tools & your code.