The Secret Ingredient: How to Understand and Resolve Just about Any Flaky Test

Flaky tests are an inscrutable bane. Hard to understand. Annoying. And, so frustrating! My personal nemesis is Daylight Saving Time. I can’t tell you how many times I’ve tripped over it. Let’s just say I was well into the “shame on me” part of that relationship, until I discovered the secret ingredient that nearly all flaky tests have in common. Turns out they only seem inscrutable. It really is possible to understand and resolve just about any flaky test.

Alan Ridlehoover

November 15, 2023

Transcript

  1. It’s after 4 o’clock. The release was due an hour ago.

    You’ve got less than an hour to leave, or you’re going to be late for that thing…

    You can feel the clock ticking…

  2. Slack clicks to life…

    You check your messages…

    < HEAVY SIGH >

  3. The build just failed.

    A-gain.

    You look at the build.

    You look at the clock.

    < SHAKING HEAD > You don’t have time for flakiness…

  4. So, you re-run the build.

    A-gain.

    Two builds. Five different failing specs. None of them have anything to do with your commit.

    All you can think about is how you can’t be late to another thing…

  5. If only you knew the secret ingredient that all flaky tests have in common…

    You might be on your way to that thing right now…

  6. Hello! My name is Alan Ridlehoover. I’m an Engineering Manager at Cisco Meraki — the largest Rails shop you’ve never heard of.

    And, though I’m not a baker, I do know a thing or two about flakiness. In fact… sometimes…

  7. It’s all I can think about!

    Seriously! Since I started automating tests over 20 years ago, I’ve written my fair share of flaky specs. Daylight Saving Time is my personal nemesis. I can’t tell you how many times I’ve tripped on that. Let’s just say I’m well into the “shame on me” part of that relationship. Or, I was…

    But, I’m getting ahead of myself. Let’s start with a definition. What is a flaky test?

  8. A flaky spec is one that changes state without modification to either the test itself or the code being tested.

  9. So, if you write a spec A…

  10. And a method #foo…

  11. that makes the spec pass…

    Then you can expect that as long as…

  12. and the method remain unchanged…

  13. The spec should continue to pass.

  14. It’s when the spec…

  15. And the method stay the same…

  16. but the result changes…

  17. That’s when you know you have a flaky spec.

    But, how does this happen?

  18. Well, it happens because of the secret ingredient that all flaky tests have in common…

    But, what is it? What is that secret ingredient?

  19. It’s an assumption.

    All flaky tests make invalid assumptions about their environment.

    They assume their environment will be in a particular state when they begin. But that assumption is rendered incorrect by some change in the environment between or during test runs.

  20. Ok. But, what causes that change to the environment?

    Well, there are three recipes:
    * Non-determinism
    * Order dependence, and
    * Race conditions

    Let’s take a look at each of these, along with some examples in code…

    Starting with…

  21. …non-determinism.

    So, what is non-determinism? For that matter, what is determinism?

  22. Well, a deterministic algorithm is one that, given the same inputs, always produces the same output.

    For example...

  23. If I take these parameters,

  24. and pass them to a method called add,

  25. it should always return 2,

  26. no matter how many

  27. times you call it.

  28. But, what if there were a method #foo that

  29. always returned true

  30. until it didn’t.

    That’s the definition of non-determinism: an algorithm that, given the same inputs, does not always produce the same output.

    But, how could this be?

  31. Well, it might sound obvious, but utilizing non-deterministic features of the environment leads to non-deterministic code, including…

    * Random number generators - clearly - these are intended to be, well, random
    * The system clock - we don’t always think of this, but it’s always changing
    * Network connections - that might be up one minute and down the next
    * Floating point precision - it’s not guaranteed

    These are just a few examples; I’m sure this list is not exhaustive.

    But, what if our code relies on these things? How can we possibly write deterministic tests?

  32. The trick is to remove the non-determinism from the test by stubbing it, or to account for it by using advanced matchers so that the spec produces consistent results from one run to the next. To do that…

    * You can stub the random number generator to return a specific number
    * You can mock (or “freeze”) time
    * You can stub network responses
    * And, for floats, you can leverage some of RSpec’s more advanced matchers, like be_within and be_between.

    And, please! < ANIMATE > Don’t forget to document the undocumented use case with a spec!
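
    To make those ideas concrete, here is a minimal, hedged sketch of what each technique can look like in RSpec. The names and values are hypothetical illustrations of the bullets above, not the code from the slides:

        require "net/http"
        require "securerandom"

        RSpec.describe "taming non-determinism" do
          it "pins down randomness, time, the network, and float precision" do
            # stub the random number generator to return a specific value
            allow(SecureRandom).to receive(:hex).and_return("abc123")

            # freeze time by stubbing the clock (Timecop or travel_to work, too)
            allow(Time).to receive(:now).and_return(Time.new(2023, 11, 13, 9, 0, 0))

            # stub network responses so the spec never touches the wire
            allow(Net::HTTP).to receive(:get).and_return('{"status":"ok"}')

            # account for floating point imprecision with advanced matchers
            expect(0.1 + 0.2).to be_within(0.0001).of(0.3)
            expect(Time.now.hour).to be_between(0, 23)
          end
        end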

  33. Ok. So, while that build is running, let’s see if we can fix some of those flaky specs that are making you late for that thing…

    First, a bit of context…

    The code we’re about to look at is entirely made up. Well, I guess, technically, all code is made up. But, what I mean is that this code was made up fresh, just for this talk. By me. With TDD. Not ChatGPT.

    It’s not production code. But, it was inspired by real code that I’ve personally worked on in production applications.

    It’s a bit of a hodge podge. It’s a class called RubyConf that provides some utility methods that might be useful for running the conference.

  34. Here’s a bit of code to determine whether or not the conference is currently in progress.

  35. Notice that the method uses the system clock to determine the current date and time.
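
    A method like the one being described might look roughly like this sketch (the dates and constant names are assumptions, not the actual slide code):

        require "date"

        class RubyConf
          START_DATE = Date.new(2023, 11, 13)
          END_DATE   = Date.new(2023, 11, 15)

          def in_progress?
            # Date.today reads the system clock -- the non-deterministic ingredient
            (START_DATE..END_DATE).cover?(Date.today)
          end
        end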

    Simple enough. Let’s look at the specs…

  36. Oh… There’s only one spec. Hmmm… O…K… Well, let’s take a look at it.

  37. It says that the #in_progress? method returns false before the conference begins. Ok. That makes sense. But, it does seem like the author forgot at least two other cases: during the conference and after the conference.

    But you know what? This is a common problem I see with date-based specs. The author of the spec is living in the now. They aren’t thinking about the future. In fact, this is exactly what happens to me with Daylight Saving Time. I forget about it and never write a spec that proves the software still works after the clock changes.

    I bet this spec ran fine before the conference. But, it’s failing now that the conference is actually in progress.

    Let’s play with the system clock to see if I’m right…

  38. Ok. As predicted, it passes if I set the clock to October 21, 2023 - well before the conference. So, we know this is a flaky test because it was passing and now it’s failing despite there being no modifications to the code or tests.

  39. And, if I set the clock to the first day of the conference? It fails.

    This spec is flaky. It depends on the system clock being in a particular state.

    Ok, how do we fix this?

    Remember, whenever we’re facing non-deterministic flakiness, we want to mock the non-determinism to make it deterministic. In this case, that’s the system clock…

  40. So, here’s the code and the spec as they were…

  41. And, here it is with the fixed spec.

    Notice that the only difference here is that the new spec is mocking (or freezing) time. The code didn’t change at all. It’s fine. Only the spec changed.

    It sets time to a specific date so that the spec will never run outside that context.

    It does this for the duration of the block, then it returns the system to its normal state.
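
    In plain RSpec (no gems), freezing time can be as simple as stubbing the clock for the example; whether the slide used a stub like this or a tool like Timecop, the idea is the same. The names and dates follow the hypothetical sketch above:

        RSpec.describe RubyConf do
          describe "#in_progress?" do
            it "returns true during the conference" do
              # the stub only lives for this example, then the clock goes back to normal
              allow(Date).to receive(:today).and_return(Date.new(2023, 11, 13))

              expect(RubyConf.new.in_progress?).to be true
            end
          end
        end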

    Ok. Let’s see if that fixed it…

  42. Great! So, now the specs pass on November 13th, the first day of the conference.

    And, I even added specs for the missing use cases by freezing time and setting the dates appropriately.

    Like this…

  43. These specs freeze time the same way the other spec freezes time, just with different dates and expected results.

  44. Ok. Next let’s look at fixing a test that fails when the network goes down…

  45. It’s not uncommon for code to need to call external services across a network. In this session_description method,

  46. we’re calling an API to fetch the conference schedule.

  47. Then we’re parsing it, finding the session that matches the title parameter, and returning its description.
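
    A rough sketch of what a method like that might look like (the URL and JSON field names are made up for illustration):

        require "json"
        require "net/http"

        class RubyConf
          SCHEDULE_URL = "https://example.com/schedule.json" # hypothetical endpoint

          def session_description(title)
            response = Net::HTTP.get(URI(SCHEDULE_URL))                       # call the API
            schedule = JSON.parse(response)                                   # parse the JSON
            session  = schedule["sessions"].find { |s| s["title"] == title }  # find the session
            session && session["description"]                                 # return its description
          end
        end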

  48. And, here’s the spec.

  49. With WiFi enabled, this spec passes.

  50. Note that the call to the network adds over a second to the runtime. And, that’s when it succeeds. Most GET requests default to a 60-second timeout when waiting for the other service to respond.

    Plus…

  51. With WiFi turned off, the spec fails.

  52. Fortunately, it fails quickly because the network failure is on my end.

    Now, these are particularly nasty tests to debug because the loss in connectivity is neither logged, nor persistent. So, by the time you’re debugging the failure, it may not be possible to reproduce. Pay attention to HTTP calls, or any other type of call that crosses a network (e.g. GRPC). And, try running your specs with WiFi turned off to see if you can catch any failures.

  53. Alright. Let’s fix this spec…

    Here it is a bit smaller than before. Same code. Different font size, because the fix is a bit large…

  54. Notice that again, the code itself is not changing. The problem is with the spec.

    Most of the changes to the spec are setting up data to stub a response from the API.

    Here’s where we’re actually creating the stub.

    This allows the spec to validate that we’re parsing the results correctly. That’s the code we care about. We don’t actually care whether the external service is up and running when we’re running our test suite. It shouldn’t matter.
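
    A minimal sketch of that kind of stub, assuming the method calls Net::HTTP.get and the hypothetical JSON shape from the sketch above:

        it "returns the description of the matching session" do
          schedule_json = {
            "sessions" => [
              { "title" => "The Secret Ingredient", "description" => "Fixing flaky tests" }
            ]
          }.to_json

          # stub the network call so the spec never leaves the machine
          allow(Net::HTTP).to receive(:get).and_return(schedule_json)

          expect(RubyConf.new.session_description("The Secret Ingredient"))
            .to eq("Fixing flaky tests")
        end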

  55. Now, sometimes I get a bit of pushback when I offer this advice. Folks ask, “What if the API changes? How will we know, if we’re mocking the response?”

    My answer to that is that each spec should have one and only one reason to exist. This spec’s reason to exist is to verify the code we wrote works the way we expect. It is a unit test. We want these to have as few dependencies as possible so they’ll run fast.

    You may ALSO require a spec to validate that an API’s schema has not changed. That’s a different reason for a spec to exist. So, that’s not this spec. In fact, that’s not even a unit test. It’s an integration test. And, it’s one that’s designed to fail in order to catch changes to the API. So, we probably don’t want to run it with our unit tests, which are designed to pass. Maybe the integration tests should be run separately, on a schedule, rather than intermingled with our unit tests on every build.

    Alright. Let’s run the specs.

  56. Alright! Running the specs with the WiFi turned off proves that the stubbed response prevented the spec from flaking.

  57. It’s also important to point out the difference in time between the live version and the stubbed version. The live version took 1.3 seconds to execute. This version took less than 1/100th of a second. Those decisions really add up as your test suite grows. They can become a real problem when you hit one hundred thousand specs like we did a few months ago.

  58. Ok. It’s now 4:15. Those specs took us about 10 minutes to resolve. That wasn’t so bad. But, you can still feel the clock ticking. Are you going to be able to make it to the thing on time?

  59. Next, let’s take a look at order dependence, starting with a definition…

  60. Order dependent specs are specs that pass in isolation, but fail when run with other specs in a specific order.

    So, for example,

  61. If Test A and B both pass when run in alphabetical order.

  62. But, Test A fails when run after Test B.

  63. That makes Test A flaky, and Test B leaky.

    But, what does that mean? Leaky?

  64. Well, remember, these specs are making an invalid assumption about their environment. Their environment includes all of the shared state they have access to. It works like this…

  65. Let’s pretend this blue square is the starting point for the shared environment. Spec A runs first, so it gets the blue square environment and passes.

  66. Spec A does not modify the environment, so spec B runs in the same context as spec A. It also passes.

  67. But imagine, if spec B runs first… It gets the starting environment… The blue square.

  68. And, it changes the environment to a pink hexagon

  69. causing spec A to fail.

  70. So, what’s happening is that state from spec B is leaking into the environment, causing spec A to flake.

    For this reason, this class of specs are often referred to as “leaky.”

  71. So, isn’t the leaky spec the real problem here?

    Not really.

    Both specs are to blame.

    Only one is breaking your build.

    Fix the broken spec first.

    Often you’ll find that fixing the broken spec will point to a broader solution.

    But, how do you fix order dependent flakiness? Well, first, let’s take a look at what causes these kinds of failures…

  72. Order dependent failures are caused by mutable state that is shared across specs.

    This could be in the form of:
    * Broadly scoped variables, like global or class variables
    * Databases
    * Key/value stores
    * Caches
    * Or, even the DOM, if you’re writing JavaScript tests.

    Alright, that’s what causes order dependency, but…

  73. How do you reproduce these things?

  74. First, eliminate non-determinism by running the failing spec repeatedly, in isolation. If it fails, that’s non-determinism.

  75. If not, then run ALL the specs that ran together with the failing spec. One of them is leaking state into the failing spec.

  76. If running the specs in the default order doesn’t reproduce the failure, randomize the order in which the specs are run using the rspec order random option. Keep running it until you find a seed that consistently causes the failure.
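
    On the command line, that looks something like this (the seed value here is just an example):

        rspec --order random           # let RSpec pick a seed
        rspec --order random:63603     # re-run with the seed that reproduced the failure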

  77. Next, locate the leaky spec or specs. I say specs, plural, because sometimes it takes several specs running in a specific order to produce the failure. You can use rspec bisect to find the leaky specs for you.

    I’ll show you how in a minute…

  78. But, first, how can we fix order dependent failures?

    You can remove the shared state, make it immutable, or isolate it…

    * Don’t use broadly scoped variables
    * Mock the shared data store (which you can do easily with a layer of abstraction, like the repository pattern).
    * Use database transactions, or
    * Reset the shared state between specs
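
    As a sketch of that last option, assuming a shared store like the Cache singleton coming up in the next example (with a clear method), state can be reset in a hook:

        RSpec.configure do |config|
          config.before(:each) do
            Cache.instance.clear   # start every example from a known, empty state
          end
        end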

  79. Alright. Let’s see if we can fix another one of those flaky specs that are keeping you from that thing…

  80. Here’s a simple getter and setter to store a favorite session.

  81. Notice that both getter and setter leverage an object called Cache.

    And, they are calling a class method named `instance`. What is that? Let’s take a look.

  82. The Cache class is a simple, in-memory, key/value store backed by a hash.

  83. The `instance` method effectively turns this class into a singleton so that every reference to the Cache.instance method is getting the same instance.
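
    A minimal sketch of what a class like that might look like (the actual slide code may differ):

        class Cache
          def self.instance
            @instance ||= new    # memoize a single, shared instance
          end

          def [](key)
            store[key]
          end

          def []=(key, value)
            store[key] = value
          end

          def clear
            store.clear
          end

          private

          def store
            @store ||= {}        # the in-memory hash backing the cache
          end
        end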

  84. Here’s the specs for the `favorite_session` getter and setter…

    These specs pass when run in this order: getter first, then setter. But, they’ll fail in the opposite order because

  85. we’re storing the title of the session in the cache.

  86. So, the getter will return the value in the cache, not “Matz’s Keynote”.

    To prove that, let’s run rspec with the order random option to see if we can get it to fail…

  87. So, here you can see…

    I ran the specs with the order random option. And,

  88. RSpec chose the random seed 12322. And, the getter ran before the setter, so it passed.

    Let’s try that again…

  89. The getter ran before the setter, so it passed.

    Let’s try that again…

  90. Ok.

    I’m still running rspec with the order random option.

  91. This time rspec chose the seed 63603. And…

  92. Lo and behold! The setter spec ran first…

  93. Causing the getter spec to fail.

    So, how do you go about fixing this?

  94. Well, we know that one of the specs that ran before the getter spec must have polluted the environment. In this case, we’re pretty sure the setter spec is the culprit because of the memoization.

    But, what if you didn’t know which of the specs was to blame for modifying the environment? That’s where rspec bisect comes in. Let’s take a look…

  95. Here, I’m running rspec bisect with the same order clause and seed that produced the failure. This is important, because bisect won’t work unless it can reproduce the failure.
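
    Something like this, reusing the failing seed (the seed value is just an example):

        rspec --order random:63603 --bisect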

  96. The first thing bisect does is to find the failing spec…

  97. Next, it analyzes whether or not the failure appears to be order dependent.

    In this case, it does.

  98. So, it performs a binary search, looking for the spec or specs that need to be run first in order for the failure to happen.

    Note that this can take a very long time if the set of candidate leaky specs is large.

  99. Then, finally, it reports the minimal command required to reproduce the problem.

    Run that command and you’ll see exactly which order the specs ran in to cause the issue.

  100. So, here we go…

    I’ve run the command that RSpec bisect gave us.

  101. And, sure enough, the setter spec is the culprit.

    So, how do we fix it?

  102. Here we are back at the beginning. The font is smaller because the solution here is bigger.

    One way we could approach this would be to call Cache dot clear in between specs. But, because our specs are currently sharing state, that would likely lead to a race condition on the build server where we’re probably running the specs in parallel. So, the solution I prefer is actually dependency injection. That’s a simple technique where I just pass the Cache object into the RubyConf object when it is created. So each spec can have its own cache.

    Here’s what that looks like in the code…

  103. First, here’s the new initializer. Notice that the cache parameter defaults to Cache.instance. So, if we don’t pass anything, the code will just use the singleton, which is what we want.

    By doing this we’ve now created a seam in the software that allows the specs to use their own cache objects. This prevents state from leaking between the specs, without modifying the behavior of the production code.
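
    In code, that seam might look something like this sketch (the getter and setter bodies follow the hypothetical Cache above):

        class RubyConf
          def initialize(cache: Cache.instance)
            @cache = cache   # defaults to the singleton, so production behavior is unchanged
          end

          def favorite_session
            @cache[:favorite_session]
          end

          def favorite_session=(title)
            @cache[:favorite_session] = title
          end
        end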

    To finish up, we need to modify the specs…

  104. Like this…

  105. Here, we’re creating a new instance of the Cache class and passing it to the RubyConf object when we create it.

    That’s it. That’s all there is to dependency injection.

    And, because each spec has its own cache, they no longer share state.
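
    A sketch of what that can look like in the spec (the names are hypothetical stand-ins for the slide code):

        it "returns the favorite session" do
          conf = RubyConf.new(cache: Cache.new)   # each example gets its own cache
          conf.favorite_session = "Matz's Keynote"

          expect(conf.favorite_session).to eq("Matz's Keynote")
        end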

    Let’s run the specs again…

  106. I’m running the specs again — with the same randomized seed that caused them to fail in the first place.

    Now, they pass, even though

  107. The setter ran before the getter

    Voila!

  108. Ok! We’re making good progress, and it’s only 4:30! You might actually make that thing, after all! Just a couple more broken specs.

  109. Finally, let’s look at race conditions.

  110. What is a race condition?

    A race condition is what can happen when parallel processes compete over a scarce, shared resource.

    Let’s look at how that happens with a file…

  111. Let’s start by looking at two specs running in sequence.

    In this example, Spec 1…

  112. first writes to the file,

  113. then reads from it, checks the result,

  114. and passes.

    Once Spec 1 finishes, Spec 2 runs…

  115. first it writes to the same file,

  116. then reads from it, checks the result,

  117. and passes.

    But…

  118. When you run these same specs in parallel…

  119. Spec 1 writes to the file

  120. Then, spec 2 writes to the file

  121. Then, spec 1 reads from the file, checks the result,

  122. And, fails — because there were two rows, not one

  123. Then spec 2 writes to the file again

  124. Then spec 2 reads from the file, checks the result,

  125. And fails — because there were three rows, not two

  126. So, both specs in this case, are susceptible to parallel flakiness due to a race condition.

    But, since this is asynchronous code, it’s entirely possible that the specs could pass.

  127. This is why race conditions are notoriously hard to reproduce. So, how can you go about debugging them if you can’t reproduce them? Well, you want to take a methodical approach.

  128. The first thing to do is to eliminate non-determinism. Run the failing spec repeatedly in isolation. If it fails, that’s non-determinism.

  129. If not, try to eliminate order dependence. Run the failing spec and all the specs that ran with it repeatedly in different orders. If you can repro the failure, that’s order dependence.

  130. If that doesn’t work, then run the specs repeatedly in parallel with the Parallel RSpec gem. I specifically mention that gem because it seems better suited for running the specs locally than Parallel Test or other options like Knapsack, which seem targeted at Rails apps running on CI. It’s best to debug this locally if at all possible.

  131. If you still can’t reproduce it, you can try randomizing the order in which the specs run in parallel.

  132. Once you’ve reproduced it, or even if you can’t, what should you look for to fix?

    The main cause of race conditions on build servers is asynchronous code competing over scarce, shared resources. Those resources might include:

    * File, or
    * Socket IO
    * Thread pools
    * Connection pools, or even
    * Low memory

    Once you have a suspect, how do you fix it?

  133. Well…

    * For IO-based issues, you can substitute StringIO for other kinds of IO in your spec. I’ll share an example of this in a moment.
    * You can test that the correct messages are being sent between collaborating objects, rather than testing the return value of a method.
    * You can write thread-safe code.
    * You can test threaded code synchronously - by extracting the guts of the thread into a plain old Ruby object (or PORO) and testing that, or
    * You can switch to fibers instead of threads

    Fibers are cool, because you can test them synchronously, which is awesome. They significantly reduce the chances of a race condition, because they’re in control of when they yield to other code. So, atomic operations can complete without interruption. (There’s a small fiber sketch right after this list.)

    * Finally, you can always add more resources to your test environment. Though, that’s a bit of an arms race. You’ll most likely end up coming back and increasing them again, and again, and again.
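
    Here is the small fiber sketch mentioned above. It is a generic illustration rather than slide code; it just shows that a fiber only gives up control where it explicitly yields, which is what makes it easy to drive step by step from a spec:

        counter = Fiber.new do
          n = 0
          loop { Fiber.yield(n += 1) }   # hand back a value, then pause until resumed
        end

        counter.resume # => 1
        counter.resume # => 2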

  134. Ok. Let’s take a look at the last two flaky specs that are keeping you from that thing…

  135. So, this feature of the app manages a list of reservations. There are two methods: one to reserve a seat, the other to get a list of the attendees.

    As you can see, we’re just…

  136. writing the names to a file and

  137. reading from that file.

    Let’s take a look at the specs…

  138. The first spec ensures that writing “Mickey Mouse” to the file grows the number of attendees by 1.

  139. The second spec ensures that when writing multiple lines, Donald and Goofy, the attendee count goes up accordingly.

    These are fine specs. Let’s run them…

  140. Hey! They pass!

    When run in sequence. In fact, they will even pass in the opposite order. But…

  141. If I break out the `parallel_rspec` gem and run them in parallel, they both fail.

  142. In this case, the second spec actually finished first. It failed because the attendee count grew by 3, not 2.

  143. And, the first spec, which finished second thanks to parallelism, failed because the count grew by 2, not 1.

    We’ve already seen how that can happen, but let’s walk through it again…

  144. Before we get into how the specs failed, let’s look at how this RSpec code works. It’s a little bit complicated.

  145. Here’s the expectation in the second spec. It defines two blocks of code.

  146. The first block is passed to the expect method. And,

  147. The second is passed to the change method.

    The way RSpec handles this is to execute the change block (to get the initial value),

  148. then the expect block,

  149. and then the change block again (to get the final value). Finally, RSpec subtracts the initial value from the final value to get the delta, which needs to match the “by” clause, which in this case is 2.
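
    Put together, the expectation being described looks something like this (the object and method names are hypothetical stand-ins for the ones on the slide):

        expect {
          ruby_conf.reserve("Donald Duck")                 # the expect block
          ruby_conf.reserve("Goofy")
        }.to change { ruby_conf.attendees.count }.by(2)    # the change block, run before and after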

  150. Ok. Here we are back at the beginning…

    This time, let’s track the order of operations on this timeline.

  151. First, the second spec reads the file to grab the attendee count, which should be 0.

  152. Next, the first write happens in the second spec…

  153. Next, the first spec reads the file. This time it gets 1.

  154. Next, the other writes occur. No telling what order. You’d need to look at the file. But, they both happen…

  155. And, finally, both reads happen again. Here, we know that the second spec finished first, because its output appeared first. But, it doesn’t really matter.

  156. Ok. So, now that we know how it failed, let’s go back to the beginning and show you the solution.

    Turns out, the Ruby core team thought of this. They knew that testing asynchronous File IO would be a challenge. So, they included a class called StringIO to simulate other kinds of IO in specs. StringIO is a string, but with the interface of a file.

  157. So, what we want to do is to…

    Allow File to receive open and yield a StringIO object.

  158. So, now, when the code calls File.open, the actual object that it will receive is a StringIO object.

    One caveat: Because this string behaves like a file, it has a cursor.

  159. So, after writing to the string, we need to rewind it before we can read it.

    That wasn’t necessary before introducing StringIO, because the file was closed when it fell out of scope. Here, the StringIO object was declared in the spec, so it never falls out of scope and never gets closed. So, we need to rewind before we can read…
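
    A minimal sketch of the pattern, assuming a reserve method that opens a file in a block (the class and method names echo the earlier sketches):

        require "stringio"   # StringIO ships with Ruby's standard library

        it "records a reservation" do
          string_io = StringIO.new

          # every File.open in the code under test now yields this in-memory string
          allow(File).to receive(:open).and_yield(string_io)

          RubyConf.new.reserve("Mickey Mouse")

          string_io.rewind   # the "file" has a cursor, so rewind before reading
          expect(string_io.read.lines.count).to eq(1)
        end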

    Alright! The proof is in the pudding. Did we fix the race condition?

  160. So, here we are, 40 minutes in, and we’ve found and resolved ALL of the flaky specs that were keeping you from that thing!

    Time to wrap things up so we can get to that thing in the lunchroom!

  161. ‘Cause, I don’t know about you, but this talk always makes me hungry…

  162. Ok. Here’s a cheat sheet for the entire talk…

  163. Non-deterministic flakiness reproduces in isolation. Look for interactions with non-deterministic elements of the environment. To fix this kind of flakiness, mock the non-determinism to make it deterministic. Don’t forget about Timecop when working with date- or time-related specs. There are also tools like WebMock and VCR for handling specs that require network connections. I didn’t show them here because I prefer to just use RSpec like I did in this presentation. But, lots of folks find those tools very useful.

  164. Order dependent flakiness only reproduces with other specs, run in a certain order. It will not reproduce in isolation. Look for state that is shared across tests. To fix order dependency, remove shared state, make it immutable, or isolate it. RSpec order random can help you reproduce the failures. And, RSpec bisect can help you locate the leaky specs.

  165. And, race conditions only reproduce with other specs when run in parallel, not in isolation. Look for asynchronous code, or exhaustible shared resources. To fix race conditions, isolate things from one another (like we did with StringIO), or use fibers instead of threads. Seriously, they’re amazing! Finally, you can use Parallel RSpec to repro the failures locally instead of on your build server.

  166. And keep in mind that the secret ingredient in every flaky spec is an invalid assumption about the environment in which it is running.

    Sometimes, just remembering that fact will help you identify and resolve the flakiness.

    Ask yourself, how can I ensure that the environment for this test is what it expects?

  167. Oh, and one more thing… I have a bit of a hot take…

    Debugging this stuff is hard enough. But, it gets one hundred times harder if your specs are too DRY. So, avoid the use of these features of RSpec. They seem harmless — useful even — when you’re writing the specs. But, ultimately they make debugging way too hard.

    So, try to avoid…
    * Shared specs
    * Shared contexts
    * Nested contexts, and
    * Let statements

    Your specs should be super communicative. They are, after all, the executable documentation for your code. If you have to scroll all over the place or open a ton of files to write the specs, you can be guaranteed that you’ll be doing the same thing later, when you’re trying to understand and debug a failure.

  168. Don’t get me wrong. I love RSpec. But, it’s best to leave your tests WET. And, I’m not alone in this belief.

    The fine folks at thoughtbot have written about it.

    And, in fact, I honestly think that DRY might be the worst programming advice ever.

    I told you it was a hot take. If you disagree, come find me so I can change your mind.

  169. Again, my name is Alan Ridlehoover. I do know a thing or two about flakiness. But, it took me 20+ years to get here. Hopefully, this talk has short circuited that for you…

  170. As I mentioned at the beginning of the talk, I work for Cisco Meraki. So, I also know a thing or two about connectivity! Here’s how to connect with me. And, that last item is where you can find the source code for this talk. There’s even a bonus flaky spec regarding a raffle winner that I didn’t have time to cover today.

    Cisco Meraki is probably the largest Rails shop you’ve never heard of. And, we’re growing. We are currently hiring for a limited number of roles. Come chat with us to find out what it’s like to work at Meraki.

  171. Finally, a little shameless self promotion…

    My friend, Fito von Zastrow, and I love Ruby so much, we occasionally release something into the wild in the hopes that folks will find it useful. You can find links to our stuff at first try dot software, including Rubyist, the opinionated VS Code color theme I used in this talk. We’d love for you to check it out.

  172. Thank you so much for coming!

    If you have questions, come chat with me offstage. Or find me at lunch.
