Fixing Flaky Tests Like a Detective

Hi! I’m Sonja, Senior Software Engineer at Devoted Health
@SonjaBPeterson on Twitter

Storytime

How hard could it be?

You can (almost) never use trial and error to find a fix for a flaky test

Even just a few flaky tests slow down your entire team

Every flake erodes a little more trust in the test suite

A method for fixing flaky tests:

1. Gather information
2. Identify the type of flaky test it might be
3. Come up with a theory of how it’s happening
4. Fix it!

1. Gather evidence
2. Identify suspects
3. Come up with a theory of means and motive
4. Solve it!

Gathering evidence

Types of evidence
• Error messages and output from each failure
• Time of day failures occurred
• How often the test is failing
• Which tests were run before, in what order

How to collect evidence
• Set up every failure on the master build to be reported to a bug tracker, with a link to the build
• Make sure failures of the same test can be grouped together
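As a rough illustration of grouping failures of the same test, the sketch below tallies failure records by test name and collects the hours at which each one failed. The record fields (`test`, `hour`, `build_url`) and all sample data are hypothetical; a real setup would pull these from your CI system or bug tracker.

```ruby
# Hypothetical failure records, e.g. exported from CI. All field names and
# values here are made up for illustration.
failures = [
  { test: "tasks_spec.rb:12", hour: 19, build_url: "https://ci.example.com/builds/101" },
  { test: "tasks_spec.rb:12", hour: 22, build_url: "https://ci.example.com/builds/107" },
  { test: "books_spec.rb:40", hour: 9,  build_url: "https://ci.example.com/builds/103" },
  { test: "tasks_spec.rb:12", hour: 20, build_url: "https://ci.example.com/builds/112" }
]

# Group failures of the same test together, then summarize each group.
summary = failures.group_by { |f| f[:test] }.map do |test, records|
  {
    test: test,
    count: records.size,
    hours: records.map { |r| r[:hour] }.sort,
    builds: records.map { |r| r[:build_url] }
  }
end

# Most frequent failures first.
summary.sort_by { |row| -row[:count] }.each do |row|
  puts "#{row[:test]}: #{row[:count]} failures at hours #{row[:hours].inspect}"
end
```

In this made-up data, every failure of `tasks_spec.rb:12` happened in the evening, which is the kind of pattern that would point toward a time-based flake.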

The Usual Suspects

• Async code
• Order dependency
• Time
• Unordered collections
• Randomness

Async code

A test in which some code runs asynchronously, resulting in events in the test happening in varying orders

Basically, all of your system/feature tests using Capybara

Async code in system/feature tests
• At least 3 different threads involved:
  • Main thread executing your test
  • Another thread running your Rails server
  • A thread in a separate process running the browser

Example: async code

    click_on "Submit Post"
    expect(Post.count).to eq 1

[Diagram: three lanes. Your test code: click_on "Submit Post", then checks DB. Browser: click triggers Ajax, then updates UI. Test Rails server: creates blog post in DB.]

Example: async code

    click_on "Submit Post"
    expect(page).to have_content("Post Created")
    expect(Post.count).to eq 1

[Diagram: three lanes. Your test code: click_on "Submit Post", have_content waits, then checks DB. Browser: click triggers Ajax, then updates UI. Test Rails server: creates post in DB.]

Another example: async code

    visit books_path
    click_on "Sort"
    expect_alphabetical_order           # waits for books to appear in sorted order
    click_on "Sort"
    expect_reversed_alphabetical_order  # waits for books to appear in reverse order

Another example: async code

    visit books_path
    click_on "Sort"
    expect_alphabetical_order
    click_on "Sort"
    expect_reversed_alphabetical_order

Slide annotations:
• not actually waiting
• hits before first reload
• books already sorted

Another example: async code

    click_on "Sort"
    expect(page).to have_content("Sort: ASC")
    expect_alphabetical_order
    click_on "Sort"
    expect(page).to have_content("Sort: DESC")
    expect_reversed_alphabetical_order

Identifying async code flakes
• Is it a system/feature test?
• Does it trigger any events without explicitly waiting for the results?
• Use Capybara’s page.save_screenshot or the capybara-screenshot gem

Preventing async flakes
• Make sure your test is waiting for each action to finish.
• Don’t use sleep - wait for something specific
• Understand which Capybara methods wait and which don’t
• Check that each assertion is working as expected
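To make "wait for something specific" concrete outside of Capybara, here is a minimal plain-Ruby sketch. The `wait_until` helper, the `posts` array, and the background thread are all hypothetical stand-ins (for a waiting matcher, the posts table, and the Rails server thread); in real system tests, Capybara's waiting matchers play this role.

```ruby
# Hypothetical helper: poll a block until it returns a truthy value,
# or raise once the timeout has passed.
def wait_until(timeout: 2.0, interval: 0.01)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
      raise "condition not met within #{timeout}s"
    end
    sleep interval
  end
end

posts = []        # stands in for the posts table
lock = Mutex.new

# Stands in for the Rails server handling the request on another thread.
server = Thread.new do
  sleep 0.05
  lock.synchronize { posts << "My first post" }
end

# Checking posts.size right here would race with the server thread.
# Waiting for the specific result removes the race.
wait_until { lock.synchronize { posts.size == 1 } }
server.join

puts posts.size
```

The short `sleep interval` inside the helper only paces the polling. What the test waits on is still the specific condition, and it fails loudly if that condition never becomes true.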

Order dependency

A test that can pass or fail depending on which tests ran before it

Usually caused by some sort of state "leaking" between tests

Potential areas of shared state
• The database
• Global or class variables
• The browser (in tests that run in one)

Database cleaning
• Each test should start with a "clean" DB
• Transactions are fastest & are the default in Rails
• In the past, couldn’t be used with Capybara tests because the test code & test server didn’t share a DB connection

Database cleaning
• Rails 5 system tests fixed this by allowing shared access to DB connections in tests
• Running in transactions may have subtle differences from normal behavior

Database cleaning
• The database_cleaner gem can clean with truncation or deletion:
  • slower than transactional, but realistic
  • make sure database cleanup is running after Capybara’s cleanup

Know your database cleaner
• Spend the time to understand how your database cleaning works, when it runs, and any potential gotchas

Example: order dependency

    DatabaseCleaner.strategy = :truncation

Example: order dependency

    DatabaseCleaner.strategy = :truncation, {:except => %w[book_genres]}

Example: order dependency

    book_genre = BookGenre.find_by(name: "Mystery")
    book_genre.update!(status: "deleted")

Other sources of order dependency
• The browser
  • Capybara should take care of it
• Global/class variables
  • Especially watch out for any hashes, since changing a value within them won’t trigger a warning
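The hash warning above can be illustrated with a small plain-Ruby sketch. Everything here is hypothetical: the `$app_settings` global, the `page_count` method, and the two lambdas standing in for tests. The snapshot-and-restore helper is one possible fix, not the only one.

```ruby
# Hypothetical global settings hash shared by the whole test process.
$app_settings = { page_size: 25 }

def page_count(total_items)
  (total_items.to_f / $app_settings[:page_size]).ceil
end

# "Test A" changes a value inside the hash. Ruby warns when a constant is
# reassigned, but mutating a hash in place produces no warning at all.
test_a = -> { $app_settings[:page_size] = 10; page_count(100) }

# "Test B" quietly relies on the default page size.
test_b = -> { page_count(100) }

b_alone = test_b.call        # 4 pages with the default of 25
test_a.call
b_after_a = test_b.call      # 10 pages: test A's change leaked into test B

# One possible fix: snapshot the hash before a test and restore it afterwards.
def with_clean_settings
  snapshot = $app_settings.dup
  yield
ensure
  $app_settings.replace(snapshot)
end

$app_settings.replace({ page_size: 25 })
with_clean_settings { test_a.call }
b_isolated = test_b.call     # back to 4 pages

puts [b_alone, b_after_a, b_isolated].inspect
```

Test B passes or fails depending only on whether test A ran first, which is exactly the signature of an order-dependent flake.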

Identifying order dependency
• Try replicating the failure with the same set of tests in the same order
• Cross-reference each failed occurrence and see if the same tests ran before the failure

Preventing order dependency
• Configure your test suite to run in random order
• Understand your test setup and teardown process, and work to close any gaps where shared state could leak through
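Random ordering only helps with diagnosis if a failing order can be replayed. The plain-Ruby sketch below shows the underlying idea with made-up test names: shuffling with an explicit seed gives an order that is random but reproducible. In RSpec this is typically done with `config.order = :random`, and a failing order is replayed by passing the reported seed to `--seed`.

```ruby
# Hypothetical test names; in a real suite the test runner does the shuffling.
tests = %w[creates_post deletes_genre lists_genres sorts_books]

# Shuffle with an explicit seed so the order is random but reproducible.
def run_order(tests, seed)
  tests.shuffle(random: Random.new(seed))
end

seed = 1234
first_run  = run_order(tests, seed)
second_run = run_order(tests, seed)

puts "seed #{seed}: #{first_run.join(', ')}"
puts "same order on rerun: #{first_run == second_run}"
```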

Time

A test that can pass or fail depending on the time of day when it is run

Example: Time

    def set_default_due_date
      if due_date.nil?
        self.due_date = Date.tomorrow.end_of_day
      end
    end

Example: Time

    it "should set a default due date" do
      task = Task.create
      expected_due_date = (Date.today + 1).end_of_day
      expect(task.due_date).to eq expected_due_date
    end

Starts failing after 7pm every night!

Example: Time

    it "should set a default due date" do
      task = Task.create
      expected_due_date = (Date.today + 1).end_of_day
      expect(task.due_date).to eq expected_due_date
    end

Date.today = system time = UTC
Date.tomorrow = time in zone = EST
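The UTC versus EST mismatch can be reproduced without Rails. This is a plain-Ruby sketch, not the ActiveSupport implementation: it assumes the app's zone is a fixed UTC-5 offset (ignoring daylight saving time), and the `dates_at` helper is hypothetical.

```ruby
require "date"

# Returns the calendar date as seen in the app's zone (assumed here to be a
# fixed UTC-5 offset) and as seen by a system clock running in UTC.
def dates_at(eastern_time)
  utc = eastern_time.getutc
  {
    zone: Date.new(eastern_time.year, eastern_time.month, eastern_time.day),
    system_utc: Date.new(utc.year, utc.month, utc.day)
  }
end

morning = dates_at(Time.new(2019, 1, 1, 10, 5, 0, "-05:00"))
evening = dates_at(Time.new(2019, 1, 1, 19, 30, 0, "-05:00"))

puts morning[:zone] == morning[:system_utc]   # same calendar day
puts evening[:zone] == evening[:system_utc]   # UTC has already rolled over
```

Once it is 7pm at UTC-5, the UTC date is a day ahead, so "today plus one" on the system clock and "tomorrow" in the zone no longer name the same day.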

Example: Time

    it "should set a default due date" do
      task = Task.create
      expected_due_date = (Date.current + 1).end_of_day
      expect(task.due_date).to eq expected_due_date
    end

Example: Time

    it "should set a default due date" do
      Timecop.freeze(Time.zone.local(2019, 1, 1, 10, 5, 0)) do
        task = Task.create
        expected_due_date = Time.zone.local(2019, 1, 2, 23, 59, 59)
        expect(task.due_date).to eq expected_due_date
      end
    end

Identifying time-based flakes
• Are there any references to date or time in the test or the code under test?
• Has every observed failure happened before or after a certain hour of day?
• See if you can reliably replicate failure using Timecop

Preventing time flakes
• If the current time could affect the test, freeze it to a specific value with Timecop, or test it with a passed-in, static value
• Set up your test suite to wrap every test in Timecop.travel with a random time of day to surface after-hours failures faster
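The "passed-in, static value" option can be sketched in plain Ruby. The `default_due_date` method below is a hypothetical stand-in for the due date rule, not the talk's model code: it takes the current date as an argument, so a test supplies a fixed date instead of reading the clock.

```ruby
require "date"

# Hypothetical plain-Ruby version of the default due date rule: end of the
# day after `today`. The caller supplies `today`, so tests control it.
def default_due_date(today)
  tomorrow = today + 1
  Time.new(tomorrow.year, tomorrow.month, tomorrow.day, 23, 59, 59)
end

due = default_due_date(Date.new(2019, 1, 1))
puts due.strftime("%Y-%m-%d %H:%M:%S")
```

Because nothing in the method reads the system clock, the result is the same at any hour of the day, and boundaries such as the end of a month can be tested directly.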

Unordered collections

A test that can pass or fail depending on the order of a set of items

Example: Unordered Collections

    active_posts = Post.where(state: active)
    expect(active_posts).to eq([post1, post2])

Example: Unordered Collections

    active_posts = Post.where(state: active).order(:id)
    expected_posts = [post1, post2].sort_by(&:id)
    expect(active_posts).to eq(expected_posts)

Identifying unordered collections
• Look for any assertions about the order of a collection, the contents of an array, or the first or last item in one

Preventing unordered collection flakes
• Use match_array (RSpec) when you don’t care about order, or add an explicit sort
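The difference between an order-sensitive and an order-insensitive comparison can be shown in plain Ruby. The rows below are made-up data standing in for a query result with no explicit ordering; sorting both sides before comparing is the hand-rolled equivalent of what `match_array` checks.

```ruby
# Hypothetical rows, returned in whatever order the database chose.
rows = [
  { id: 2, state: "active" },
  { id: 1, state: "active" },
  { id: 3, state: "draft" }
]

active_ids   = rows.select { |row| row[:state] == "active" }.map { |row| row[:id] }
expected_ids = [1, 2]

order_sensitive   = active_ids == expected_ids            # depends on row order
order_insensitive = active_ids.sort == expected_ids.sort  # only membership matters

puts order_sensitive
puts order_insensitive
```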

Randomness

A test that can pass or fail depending on the output of a random number generator

Example: Randomness

    factory :event do
      start_date { Date.current + rand(5) }
    end

Example: Randomness

    factory :event do
      start_date { Date.current + rand(5) }
      end_date { Date.current + rand(10) }
    end

Example: Randomness

    factory :event do
      start_date { Date.current + 5 }
      end_date { Date.current + 10 }
    end

Identifying randomness-based flakes
• Look for use of a random number generator - often this is used in factories/fixtures
• Try to see if you can reliably replicate failure with the same --seed option
• In RSpec, make sure you’ve set Kernel.srand(config.seed) in spec_helper.rb
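Seeding matters because `rand` draws from Ruby's global generator, and `Kernel.srand` is what makes those draws repeatable. A minimal sketch, with an arbitrary seed value and a hypothetical `random_offsets` helper:

```ruby
# Draw a few values from Ruby's global random number generator after
# seeding it. The seed value itself is arbitrary.
def random_offsets(seed)
  Kernel.srand(seed)
  Array.new(3) { rand(10) }
end

first  = random_offsets(4242)
second = random_offsets(4242)

puts first == second
```

With the same seed, the same sequence comes back, so a failure that depends on particular random values can be replayed rather than waited for.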

Preventing randomness-based flakes
• Remove randomness from your tests & instead explicitly test boundaries & edge cases
• Avoid using gems like Faker to generate data in tests
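With `rand(5)` for the start date and `rand(10)` for the end date, some runs will produce an event that ends before it starts. The sketch below tests those boundaries explicitly instead. The `valid_event?` rule is a hypothetical assumption for illustration; the source does not say what the code under test actually validates.

```ruby
require "date"

# Hypothetical rule under test: an event may not end before it starts.
def valid_event?(start_date, end_date)
  start_date <= end_date
end

today = Date.new(2019, 1, 1)

# Explicit boundary cases instead of rand-generated dates.
cases = [
  ["ends after it starts",  today,     today + 1, true],
  ["starts and ends today", today,     today,     true],
  ["ends before it starts", today + 1, today,     false]
]

results = cases.map do |label, start_date, end_date, expected|
  [label, valid_event?(start_date, end_date) == expected]
end

results.each { |label, ok| puts "#{label}: #{ok ? 'pass' : 'FAIL'}" }
```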

Forming a theory & solving it

Strategy tips
• Run through each category & look for identifying signs
• Don’t use trial & error to find a fix - form a strong theory first
• Do try to find a way to reliably replicate failures to prove your theory

If you’re stuck
• Consider adding code that will give you more information next time it fails
• Try pairing with another developer
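One way to add code that gives more information on the next failure is to attach extra context to the error message. The `with_diagnostics` helper below is a hypothetical plain-Ruby sketch. It rescues `StandardError`, so it would need adjusting for test frameworks whose assertion failures use a different exception class.

```ruby
# Hypothetical helper: run a check and, if it raises, re-raise with extra
# context attached so the next failure is easier to investigate.
def with_diagnostics(context)
  yield
rescue StandardError => e
  details = context.map { |name, probe| "#{name}=#{probe.call.inspect}" }.join(", ")
  raise e.class, "#{e.message} [#{details}]", e.backtrace
end

posts = ["Draft post"]   # made-up state for the example

begin
  with_diagnostics(post_count: -> { posts.size }, posts: -> { posts }) do
    raise "expected 2 posts" unless posts.size == 2
  end
rescue RuntimeError => e
  message = e.message
end

puts message
```

The probes are lambdas so they are only evaluated when something actually fails, and they capture the state at the moment of failure rather than at setup time.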

Can I just delete it??

Fixing flaky tests is an unavoidable part of writing tests

Think about test coverage holistically

When writing tests, we constantly make trade-offs between realism & maintainability

System & feature tests: more realistic, slower, more likely to flake
Unit tests: simpler, faster, less likely to flake

Fixing flaky tests as a team
• Make fixing flaky tests a high priority since they affect everyone’s velocity
• Make sure someone is assigned to each active flake & responsibility is spread across the whole team
• Set a target for your master branch pass rate and track it week over week

Flaky tests give you an opportunity to gain a deeper understanding of your tools & your code.

Thanks for coming! @SonjaBPeterson
