
Fixing Flaky Tests Like a Detective

Every test suite has them: a few tests that usually pass but sometimes mysteriously fail when run on the same code. Since they can’t be reliably replicated, they can be tough to fix. The good news is there’s a set of usual suspects that cause them: test order, async code, time, sorting and randomness. While walking through examples of each type, I’ll show you methods for identifying a culprit that range from capturing screenshots to traveling through time. You’ll leave with the skills to fix any flaky test fast, and with strategies for monitoring and improving your test suite's reliability overall.

Sonja Peterson

May 01, 2019


Transcript

  1. 1) Gather information 2) Identify the type of flaky test it might be 3) Come up with a theory of how it's happening 4) Fix it!
  2. Types of Evidence • Error messages and output from each failure • Time of day failures occurred • How often the test is failing • Which tests were run before, in what order
  3. How to collect evidence • Set up any master build failure to report to a bug tracker with a link to the build • Make sure failures of the same test can be grouped together
  4. A test in which some code runs asynchronously, resulting in events in the test happening in varying orders
  5. Async code in system/feature tests • At least 3 different threads involved: • Main thread executing your test • Another thread running your Rails server • A thread in a separate process running the browser
  6. [Diagram] Your test code: click_on "Submit Post", then checks the DB. The browser: the click triggers an Ajax request, which updates the UI. The test Rails server: creates the blog post in the DB.
  7. [Diagram] The same flow, with have_content added between the click and the DB check: have_content waits for the UI to update before the test checks the DB (sketched below).
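    To make the race in slides 6-7 concrete, here is a minimal sketch of that feature test. It assumes RSpec, Capybara, and a JS-capable driver; the Post model, new_post_path, the form field, and the "Post created" message are illustrative assumptions, not part of the original deck.

      require "rails_helper"

      RSpec.feature "Submitting a post", js: true do
        scenario "creates the post" do
          visit new_post_path
          fill_in "Title", with: "Flaky tests"
          click_on "Submit Post"

          # Flaky: checks the DB immediately, racing the browser's Ajax request
          # expect(Post.count).to eq 1

          # Reliable: have_content retries until the UI updates (or the wait
          # time runs out), so the Ajax request has completed before the DB check.
          expect(page).to have_content("Post created")
          expect(Post.count).to eq 1
        end
      end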
  8. Another example: async code
      visit books_path
      click_on "Sort"
      expect_alphabetical_order          # waits for books to appear in sorted order
      click_on "Sort"
      expect_reversed_alphabetical_order # waits for books to appear in reverse order
  9. Another example: async code (same test, annotated with what actually happens)
      visit books_path
      click_on "Sort"
      expect_alphabetical_order          # not actually waiting: it hits before the first reload, when the books are already sorted
      click_on "Sort"
      expect_reversed_alphabetical_order
  10. Another example: async code (fixed: wait for the sort indicator before each order assertion)
      click_on "Sort"
      expect(page).to have_content("Sort: ASC")
      expect_alphabetical_order
      click_on "Sort"
      expect(page).to have_content("Sort: DESC")
      expect_reversed_alphabetical_order
  11. Identifying async code flakes • Is it a system/feature test? • Does it trigger any events without explicitly waiting for the results? • Use Capybara's page.save_screenshot or the capybara-screenshot gem (sketch below)
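    A sketch of one way to capture that evidence automatically, saving a screenshot whenever a feature spec fails; the hook placement and the tmp/ path are assumptions (the capybara-screenshot gem does roughly this for you).

      RSpec.configure do |config|
        config.after(:each, type: :feature) do |example|
          if example.exception
            # Capybara's built-in screenshot helper; the path is a placeholder
            page.save_screenshot("tmp/screenshots/failure-#{Time.now.to_i}.png")
          end
        end
      end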
  12. Preventing async flakes • Make sure your test is waiting for each action to finish • Don't use sleep - wait for something specific (see the sketch below) • Understand which Capybara methods wait and which don't • Check that each assertion is working as expected
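    A sketch of the "wait for something specific" point, using a hypothetical Item model and markup.

      # Brittle: an arbitrary pause that is sometimes too short and always too slow
      click_on "Delete"
      sleep 2
      expect(Item.count).to eq 0

      # Better: wait for an observable result of the action. have_content and
      # have_no_css retry until they match or Capybara.default_max_wait_time
      # expires, so the DB check only runs after the UI has caught up.
      click_on "Delete"
      expect(page).to have_content("Item deleted")
      expect(page).to have_no_css("li.item")
      expect(Item.count).to eq 0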
  13. Potential areas of shared state • The database • Global or class variables • The browser (in tests that run in one)
  14. Database cleaning • Each test should start with a "clean" DB • Transactions are fastest & are the default in Rails • In the past, couldn't be used with Capybara tests because the test code & test server didn't share a DB connection
  15. Database cleaning • Rails 5 system tests fixed this by allowing shared access to DB connections in tests • Running in transactions may have subtle differences from normal behavior
  16. Database cleaning • The database_cleaner gem can clean with truncation or deletion: slower than transactional, but realistic • Make sure database cleanup is running after Capybara's cleanup (see the sketch below)
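    A sketch of that setup for RSpec with the database_cleaner gem; the per-test-type strategy choice and the file location are assumptions.

      # e.g. spec/support/database_cleaner.rb
      RSpec.configure do |config|
        config.use_transactional_fixtures = false

        config.before(:suite) { DatabaseCleaner.clean_with(:truncation) }

        config.before(:each) do |example|
          # Transactions for plain specs, truncation for JS-driven feature specs
          DatabaseCleaner.strategy = example.metadata[:js] ? :truncation : :transaction
          DatabaseCleaner.start
        end

        # append_after runs after Capybara's own cleanup, so the app server has
        # finished handling requests before the database is wiped
        config.append_after(:each) { DatabaseCleaner.clean }
      end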
  17. Know your database cleaner • Spend the time to understand how your database cleaning works, when it runs, and any potential gotchas
  18. Other sources of order dependency • The browser (Capybara should take care of it) • Global/class variables: especially watch out for any hashes, since changing a value within them won't trigger a warning (example below)
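    A small example of the hash pitfall from that last bullet; Settings is a hypothetical class.

      class Settings
        # Reassigning DEFAULTS would at least print an "already initialized
        # constant" warning; mutating a value inside it is completely silent.
        DEFAULTS = { per_page: 25 }

        def self.per_page
          DEFAULTS[:per_page]
        end
      end

      # One test tweaks the shared hash for its own scenario...
      Settings::DEFAULTS[:per_page] = 100

      # ...and every test that happens to run after it now sees 100 instead of 25.
      # Freezing the hash (DEFAULTS = { per_page: 25 }.freeze) turns this silent
      # leak into an immediate FrozenError.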
  19. Identifying order dependency • Try replicating the failure with the same set of tests in the same order • Cross-reference each failed occurrence and see if the same tests ran before the failure
  20. Preventing order dependency • Configure your test suite to run in random order (sketch below) • Understand your test setup and teardown process, and work to close any gaps where shared state can leak through
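    For RSpec, the random-order configuration is a couple of lines in spec_helper.rb; RSpec prints the seed so a suspicious ordering can be replayed later.

      RSpec.configure do |config|
        # Run examples in a random order; replay a given ordering with:
        #   rspec --seed <seed>
        config.order = :random
      end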
  21. Example: Time
      it "should set a default due date" do
        task = Task.create
        expected_due_date = (Date.today + 1).end_of_day
        expect(task.due_date).to eq expected_due_date
      end
      Starts failing after 7pm every night!
  22. Example: Time (same test, annotated): Date.today = system time = UTC, while Date.tomorrow = time in zone = EST
  23. Example: Time (fixed by using Date.current, which respects the configured time zone)
      it "should set a default due date" do
        task = Task.create
        expected_due_date = (Date.current + 1).end_of_day
        expect(task.due_date).to eq expected_due_date
      end
  24. Example: Time (alternatively, freeze time with Timecop so the test is fully deterministic)
      it "should set a default due date" do
        Timecop.freeze(Time.zone.local(2019, 1, 1, 10, 5, 0)) do
          task = Task.create
          expected_due_date = Time.zone.local(2019, 1, 2, 23, 59, 59)
          expect(task.due_date).to eq expected_due_date
        end
      end
  25. Identifying time-based flakes • Are there any references to date or time in the test or the code under test? • Has every observed failure happened before or after a certain hour of day? • See if you can reliably replicate failure using Timecop
  26. Preventing time flakes • If the current time could affect the test, freeze it to a specific value with Timecop, or test it with a passed-in, static value • Set up your test suite to wrap every test in Timecop.travel with a random time of day to surface after-hours failures faster (sketch below)
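    A sketch of that second idea using an around hook and Timecop; randomizing the hour on every run is an assumption about how you'd apply it.

      RSpec.configure do |config|
        config.around(:each) do |example|
          # Travel to a random time of day so time-of-day bugs surface in CI
          # runs at any hour, not just after business hours. Log the chosen
          # time on failure so the run can be reproduced.
          random_time = Time.zone.now.change(hour: rand(0..23), min: rand(0..59))
          Timecop.travel(random_time) { example.run }
        end
      end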
  27. Identifying unordered collections • Look for any assertions about the order of a collection, the contents of an array, or the first or last item in one (example below)
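    A typical example of the problem and the usual fixes, with a hypothetical User model.

      users = User.where(active: true) # no ORDER BY, so row order isn't guaranteed

      # Flaky: depends on whatever order the database happens to return rows in
      expect(users.map(&:name)).to eq ["Ada", "Grace"]
      expect(users.first.name).to eq "Ada"

      # Stable: assert on the contents regardless of order...
      expect(users.map(&:name)).to match_array ["Ada", "Grace"]

      # ...or make the order explicit and keep the ordered assertion
      expect(User.where(active: true).order(:name).map(&:name)).to eq ["Ada", "Grace"]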
  28. Identifying randomness-based flakes • Look for use of a random number generator - often this is used in factories/fixtures • Try to see if you can reliably replicate the failure with the same --seed option • In RSpec, make sure you've set Kernel.srand(config.seed) in spec_helper.rb (see the sketch below)
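    The srand line from that last bullet, in context: it seeds Ruby's random number generator from RSpec's seed so a failing run can be reproduced exactly with --seed.

      RSpec.configure do |config|
        # With this in place, `rspec --seed 1234` reproduces not only the test
        # order but also any rand / sample / Faker values used in the tests.
        Kernel.srand config.seed
      end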
  29. Preventing randomness-based flakes • Remove randomness from your tests & instead explicitly test boundaries & edge cases • Avoid using gems like Faker to generate data in tests
  30. Strategy tips • Run through each category & look for identifying signs • Don't use trial & error to find a fix - form a strong theory first • Do try to find a way to reliably replicate failures to prove your theory
  31. If you're stuck • Consider adding code that will give you more information next time it fails • Try pairing with another developer
  32. [Diagram] System & feature tests: more realistic, slower, more likely to flake. Unit tests: simpler, faster, less likely to flake.
  33. Fixing flaky tests as a team • Make fixing flaky tests a high priority, since they affect everyone's velocity • Make sure someone is assigned to each active flake & responsibility is spread across the whole team • Set a target for your master branch pass rate and track it week over week
  34. Flaky tests give you an opportunity to gain a deeper understanding of your tools & your code.