
The Secret Ingredient: How to Understand and Resolve Just about Any Flaky Test


Flaky tests are an inscrutable bane. Hard to understand. Annoying. And, so frustrating! My personal nemesis is Daylight Saving Time. I can’t tell you how many times I’ve tripped over it. Let’s just say I was well into the “shame on me” part of that relationship, until I discovered the secret ingredient that nearly all flaky tests have in common. Turns out they only seem inscrutable. It really is possible to understand and resolve just about any flaky test.

Alan Ridlehoover

November 15, 2023


Transcript


  2. It’s after 4 o’clock. The release was due an hour ago.

    You’ve got less than an hour to leave, or you’re going to be late for that thing…

    You can feel the clock ticking…


  3. Slack clicks to life…

    You check your messages…

    < HEAVY SIGH >


  4. The build just failed.

    A-gain.

    You look at the build.

    You look at the clock.

    < SHAKING HEAD > You don’t have time for flakiness…


  5. So, you re-run the build.

    A-gain.

    Two builds. Five different failing specs. None of them have anything to do with your commit.

    All you can think about is how you can’t be late to another thing…


  6. If only you knew the secret ingredient that all flaky tests have in common…

    You might be on your way to that thing right now…


  7. Hello! My name is Alan Ridlehoover. I’m an Engineering Manager at Cisco Meraki — the largest Rails shop you’ve never heard of.

    And, though I’m not a baker, I do know a thing or two about flakiness. In fact… sometimes…


  8. It’s all I can think about!

    Seriously! Since I started automating tests over 20 years ago, I’ve written my fair share of flaky specs. Daylight Saving Time is my personal nemesis. I can’t tell you how many times I’ve tripped on that. Let’s just say I’m well into the “shame on me” part of that relationship. Or, I was…

    But, I’m getting ahead of myself. Let’s start with a definition. What is a flaky test?


  9. A flaky spec is one whose result changes without modification to either the test itself or the code being tested.


  10. So, if you write a spec A…


  11. And a method #foo…


  12. that makes the spec pass…

    Then you can expect that as long as…


  13. The spec…


  14. and the method remain unchanged…


  15. The spec should continue to pass.


  16. It’s when the spec…


  17. And the method stay the same…


  18. but the result changes…


  19. That’s when you know you have a flaky spec.

    But, how does this happen?


  20. Well, it happens because of the secret ingredient that all flaky tests have in common…

    But, what is it? What is that secret ingredient?


  21. It’s an assumption.

    All flaky tests make invalid assumptions about their environment.

    They assume their environment will be in a particular state when they begin. But that assumption is rendered incorrect by some change in the environment between or during test runs.


  22. Ok. But, what causes that change to the environment?

    Well, there are three recipes:
    * Non-determinism
    * Order dependence, and
    * Race conditions

    Let’s take a look at each of these, along with some examples in code…

    Starting with…


  23. …non-determinism.

    So, what is non-determinism? For that matter, what is determinism?


  24. Well, a deterministic algorithm is one that, given the same inputs, always produces the same output.

    For example...


  25. If I take these parameters,


  26. and pass them to a method called add,


  27. it should always return 2,


  28. no matter how many


  29. times you call it.
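
    In code form, that looks something like this (the parameters are assumed here to be 1 and 1, based on the slide saying the method should always return 2):

        def add(a, b)
          a + b
        end

        add(1, 1) # => 2, every time, no matter how many times you call it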


  30. But, what if there were a method #foo that


  31. always returned true


  32. until it didn’t.

    That’s the definition of non-determinism: an algorithm that, given the same inputs, does not always produce the same output.

    But, how could this be?


  33. Well, it might sound obvious, but utilizing non-deterministic features of the environment leads to non-deterministic code, including…

    * Random number generators - clearly - these are intended to be, well, random
    * The system clock - we don’t always think of this, but it’s always changing
    * Network connections - that might be up one minute and down the next
    * Floating point precision - it’s not guaranteed

    These are just a few examples. I’m sure this list is not exhaustive.

    But, what if our code relies on these things? How can we possibly write deterministic tests?


  34. The trick is to remove the non-determinism from the test by stubbing it, or to account for it by using advanced matchers so that the spec produces consistent results from one run to the next. To do that…

    * You can stub the random number generator to return a specific number
    * You can mock (or “freeze”) time
    * You can stub network responses
    * And, for floats, you can leverage some of RSpec’s more advanced matchers, like be_within and be_between.

    And, please! < ANIMATE > Don’t forget to document the undocumented use case with a spec!
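
    To make these techniques concrete, here’s a minimal RSpec sketch. The Countdown class and its methods are invented for illustration; only the stubbing patterns matter:

        require "rspec"

        # A tiny class with two non-deterministic dependencies: rand and the clock.
        class Countdown
          def roll
            rand(1..6)
          end

          def seconds_until(deadline)
            deadline - Time.now
          end
        end

        RSpec.describe Countdown do
          it "stubs the random number generator to return a specific number" do
            countdown = Countdown.new
            allow(countdown).to receive(:rand).and_return(6)
            expect(countdown.roll).to eq(6)
          end

          it "freezes time so the clock cannot drift mid-example" do
            now = Time.new(2023, 11, 13, 9, 0, 0)
            allow(Time).to receive(:now).and_return(now)
            expect(Countdown.new.seconds_until(now + 60)).to eq(60.0)
          end

          it "uses be_within to absorb floating point imprecision" do
            expect(0.1 + 0.2).to be_within(0.0001).of(0.3)
          end
        end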


  35. Ok. So, while that build is running, let’s see if we can fix some of those flaky specs that are making you late for that thing…

    First, a bit of context…

    The code we’re about to look at is entirely made up. Well, I guess, technically, all code is made up. But, what I mean is that this code was made up fresh, just for this talk. By me. With TDD. Not ChatGPT.

    It’s not production code. But, it was inspired by real code that I’ve personally worked on in production applications.

    It’s a bit of a hodgepodge. It’s a class called RubyConf that provides some utility methods that might be useful for running the conference.


  36. Here’s a bit of code to determine whether or not the conference is currently in progress.


  37. Notice that the method uses the system clock to determine the current date and time.

    Simple enough. Let’s look at the specs…


  38. Oh… There’s only one spec. Hmmm… O…K… Well, let’s take a look at it.


  39. It says that the #in_progress? method returns false before the conference begins. Ok. That makes sense. But, it does seem like the author forgot at least two other cases: during the conference and after the conference.

    But you know what? This is a common problem I see with date-based specs. The author of the spec is living in the now. They aren’t thinking about the future. In fact, this is exactly what happens to me with Daylight Saving Time. I forget about it and never write a spec that proves the software still works after the clock changes.

    I bet this spec ran fine before the conference. But, it’s failing now that the conference is actually in progress.

    Let’s play with the system clock to see if I’m right…


  40. Ok. As predicted, it passes if I set the clock to October 21, 2023 - well before the conference. So, we know this is a flaky test because it was passing and now it’s failing despite there being no modifications to the code or tests.


  41. And, if I set the clock to the first day of the conference? It fails.

    This spec is flaky. It depends on the system clock being in a particular state.

    Ok, how do we fix this?

    Remember, whenever we’re facing non-deterministic flakiness, we want to mock the non-determinism to make it deterministic. In this case, that’s the system clock…


  42. So, here’s the code and the spec as they were…


  43. And, here it is with the fixed spec.

    Notice that the only difference here is that the new spec is mocking (or freezing) time. The code didn’t change at all. It’s fine. Only the spec changed.

    It sets time to a specific date so that the spec will never run outside that context.

    It does this for the duration of the block, then it returns the system to its normal state.

    Ok. Let’s see if that fixed it…


  44. Great! So, now the specs pass on November 13th, the first day of the conference.

    And, I even added specs for the missing use cases by freezing time and setting the dates appropriately.

    Like this…


  45. These specs freeze time the same way the other spec freezes time, just with different dates and expected results.
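
    Since the slide code isn’t captured in this transcript, here’s a rough sketch of what those three specs might look like, using Timecop (mentioned in the cheat sheet later). The RubyConf implementation and the exact dates are assumptions:

        require "rspec"
        require "date"
        require "timecop" # gem install timecop

        # A stand-in for the talk's class; the real code isn't in the transcript.
        class RubyConf
          STARTS_ON = Date.new(2023, 11, 13)
          ENDS_ON   = Date.new(2023, 11, 15)

          def in_progress?
            (STARTS_ON..ENDS_ON).cover?(Date.today)
          end
        end

        RSpec.describe RubyConf do
          it "returns false before the conference begins" do
            Timecop.freeze(Date.new(2023, 10, 21)) do
              expect(RubyConf.new.in_progress?).to be(false)
            end
          end

          it "returns true while the conference is in progress" do
            Timecop.freeze(Date.new(2023, 11, 14)) do
              expect(RubyConf.new.in_progress?).to be(true)
            end
          end

          it "returns false after the conference ends" do
            Timecop.freeze(Date.new(2023, 11, 20)) do
              expect(RubyConf.new.in_progress?).to be(false)
            end
          end
        end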


  46. Ok. Next let’s look at fixing a test that fails when the network goes down…


  47. It’s not uncommon for code to need to call external services across a network. In this session_description method,


  48. we’re calling an API to fetch the conference schedule.


  49. Then we’re parsing it, finding the session that matches the title parameter, and returning its description.


  50. And, here’s the spec.


  51. With WiFi enabled, this spec passes.


  52. Note that the call to the network adds over a second to the runtime. And, that’s when it succeeds. Most GET requests default to a 60-second timeout when waiting for the other service to respond.

    Plus…


  53. With WiFi turned off, the spec fails.


  54. Fortunately, it fails quickly because the network failure is on my end.

    Now, these are particularly nasty tests to debug because the loss in connectivity is neither logged nor persistent. So, by the time you’re debugging the failure, it may not be possible to reproduce. Pay attention to HTTP calls, or any other type of call that crosses a network (e.g., gRPC). And, try running your specs with WiFi turned off to see if you can catch any failures.


  55. Alright. Let’s fix this spec…

    Here it is a bit smaller than before. Same code. Different font size, because the fix is a bit large…


  56. Notice that again, the code itself is not changing. The problem is with the spec.

    Most of the changes to the spec are setting up data to stub a response from the API.

    Here’s where we’re actually creating the stub.

    This allows the spec to validate that we’re parsing the results correctly. That’s the code we care about. We don’t actually care whether the external service is up and running when we’re running our test suite. It shouldn’t matter.
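
    Here’s a sketch of that shape. The endpoint, the JSON schema, and the class internals are invented for illustration; the point is stubbing Net::HTTP so the spec exercises only the parsing:

        require "rspec"
        require "json"
        require "net/http"

        # A stand-in for the talk's session_description code.
        class RubyConf
          SCHEDULE_URI = URI("https://example.com/schedule.json")

          def session_description(title)
            json     = Net::HTTP.get(SCHEDULE_URI) # the network call
            sessions = JSON.parse(json).fetch("sessions")
            sessions.find { |s| s["title"] == title }&.fetch("description")
          end
        end

        RSpec.describe RubyConf do
          it "returns the description of the session matching the title" do
            schedule = {
              "sessions" => [
                { "title" => "Flaky Tests", "description" => "The secret ingredient" }
              ]
            }.to_json

            # Stub the request so only our parsing code is under test.
            allow(Net::HTTP).to receive(:get).and_return(schedule)

            expect(RubyConf.new.session_description("Flaky Tests"))
              .to eq("The secret ingredient")
          end
        end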


  57. Now, sometimes I get a bit of pushback when I offer this advice. Folks ask, “What if the API changes? How will we know, if we’re mocking the response?”

    My answer to that is that each spec should have one and only one reason to exist. This spec’s reason to exist is to verify the code we wrote works the way we expect. It is a unit test. We want these to have as few dependencies as possible so they’ll run fast.

    You may ALSO require a spec to validate that an API’s schema has not changed. That’s a different reason for a spec to exist. So, that’s not this spec. In fact, that’s not even a unit test. It’s an integration test. And, it’s one that’s designed to fail in order to catch changes to the API. So, we probably don’t want to run it with our unit tests, which are designed to pass. Maybe the integration tests should be run separately, on a schedule, rather than intermingled with our unit tests on every build.

    Alright. Let’s run the specs.


  58. Alright! Running the specs with the WiFi turned off proves that the stubbed response prevented the spec from flaking.


  59. It’s also important to point out the difference in time between the live version and the stubbed version. The live version took 1.3 seconds to execute. This version took less than 1/100th of a second. Those decisions really add up as your test suite grows. They can become a real problem when you hit one hundred thousand specs like we did a few months ago.


  60. Ok. It’s now 4:15. Those specs took us about 10 minutes to resolve. That wasn’t so bad. But, you can still feel the clock ticking. Are you going to be able to make it to the thing on time?


  61. Next, let’s take a look at order dependence, starting with a definition…


  62. Order-dependent specs are specs that pass in isolation, but fail when run with other specs in a specific order.

    So, for example,


  63. If Test A and B both pass when run in alphabetical order.


  64. But, Test A fails when run after Test B.


  65. That makes Test A flaky, and Test B leaky.

    But, what does that mean? Leaky?


  66. Well, remember, these specs are making an invalid assumption about their environment. Their environment includes all of the shared state they have access to. It works like this…


  67. Let’s pretend this blue square is the starting point for the shared environment. Spec A runs first, so it gets the blue square environment and passes.


  68. Spec A does not modify the environment, so spec B runs in the same context as spec A. It also passes.


  69. But imagine, if spec B runs first… It gets the starting environment… The blue square.


  70. And, it changes the environment to a pink hexagon


  71. causing spec A to fail.


  72. So, what’s happening is that state from spec B is leaking into the environment, causing spec A to flake.

    For this reason, this class of specs is often referred to as “leaky.”


  73. So, isn’t the leaky spec the real problem here?

    Not really.

    Both specs are to blame.

    Only one is breaking your build.

    Fix the broken spec first.

    Often you’ll find that fixing the broken spec will point to a broader solution.

    But, how do you fix order-dependent flakiness? Well, first, let’s take a look at what causes these kinds of failures…


  74. Order-dependent failures are caused by mutable state that is shared across specs.

    This could be in the form of:
    * Broadly scoped variables, like global or class variables
    * Databases
    * Key/value stores
    * Caches
    * Or, even the DOM, if you’re writing JavaScript tests.

    Alright, that’s what causes order dependency, but…


  75. How do you reproduce these things?


  76. First, eliminate non-determinism by running the failing spec repeatedly, in isolation. If it fails, that’s non-determinism.


  77. If not, then run ALL the specs that ran together with the failing spec. One of them is leaking state into the failing spec.


  78. If running the specs in the default order doesn’t reproduce the failure, randomize the order in which the specs are run using the rspec order random option. Keep running it until you find a seed that consistently causes the failure.


  79. Next, locate the leaky spec or specs. I say specs, plural, because sometimes it takes several specs running in a specific order to produce the failure. You can use rspec bisect to find the leaky specs for you.

    I’ll show you how in a minute…
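
    For reference, this whole workflow uses flags that ship with RSpec (the seed below is just an example):

        rspec --order random          # shuffle; RSpec prints the seed it used
        rspec --seed 63603            # replay the exact order for that seed
        rspec --seed 63603 --bisect   # binary-search for a minimal reproduction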


  80. But, first, how can we fix order dependent failures?

    You can remove the shared state, make it immutable, or isolate it…

    * Don’t use broadly scoped variables
    * Mock the shared data store (which you can do easily with a layer of abstraction, like the repository pattern).
    * Use database transactions, or
    * Reset the shared state between specs


  81. Alright. Let’s see if we can fix another one of those flaky specs that are keeping you from that thing…


  82. Here’s a simple getter and setter to store a favorite session.
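
    Roughly like this, as a sketch (the method bodies and cache key are assumptions; the transcript confirms only that both methods go through Cache.instance):

        class RubyConf
          def favorite_session
            Cache.instance.read(:favorite_session)
          end

          def favorite_session=(title)
            Cache.instance.write(:favorite_session, title)
          end
        end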


  83. Notice that both getter and setter leverage an object called Cache.

    And, they are calling a class method named `instance`. What is that? Let’s take a look.


  84. The Cache class is a simple, in-memory, key/value store backed by a hash.


  85. The `instance` method effectively turns this class into a singleton so that every reference to the Cache.instance method is getting the same instance.
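
    In sketch form, with read and write as assumed method names (the transcript tells us only that it’s a hash-backed key/value store with a memoized instance):

        class Cache
          # Memoizing the instance at the class level makes this a de facto
          # singleton: every caller gets the same object, and the same hash.
          def self.instance
            @instance ||= new
          end

          def initialize
            @store = {}
          end

          def read(key)
            @store[key]
          end

          def write(key, value)
            @store[key] = value
          end
        end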


  86. Here are the specs for the `favorite_session` getter and setter…

    These specs pass when run in this order: getter first, then setter. But, they’ll fail in the opposite order because


  87. we’re storing the title of the session in the cache.


  88. So, the getter will return the value in the cache, not “Matz’s Keynote”.

    To prove that, let’s run rspec with the order random option to see if we can get it to fail…


  89. So, here you can see…

    I ran the specs with the order random option. And,


  90. RSpec chose the random seed 12322. And…


  91. The getter ran before the setter, so it passed.

    Let’s try that again…


  92. Ok.

    I’m still running rspec with the order random option.


  93. This time rspec chose the seed 63603. And…


  94. Lo and behold! The setter spec ran first…


  95. Causing the getter spec to fail.

    So, how do you go about fixing this?


  96. Well, we know that one of the specs that ran before the getter spec must have polluted the environment. In this case, we’re pretty sure the setter spec is the culprit, because both specs share the memoized singleton cache.

    But, what if you didn’t know which of the specs was to blame for modifying the environment? That’s where rspec bisect comes in. Let’s take a look…


  97. Here, I’m running rspec bisect with the same order clause and seed that produced the failure. This is important, because bisect won’t work unless it can reproduce the failure.


  98. The first thing bisect does is to find the failing spec…


  99. Next, it analyzes whether or not the failure appears to be order dependent.

    In this case, it does.


  100. So, it performs a binary search, looking for the spec or specs that need to be run first in order for the failure to happen.

    Note that this can take a very long time if the set of candidate leaky specs is large.


  101. Then, finally, it reports the minimal command required to reproduce the problem.

    Run that command and you’ll see exactly which order the specs ran in to cause the issue.


  102. So, here we go…

    I’ve run the command that RSpec bisect gave us.


  103. And, sure enough, the setter spec is the culprit.

    So, how do we fix it?


  104. Here we are back at the beginning. The font is smaller because the solution here is bigger.

    One way we could approach this would be to call Cache.clear in between specs. But, because our specs are currently sharing state, that would likely lead to a race condition on the build server, where we’re probably running the specs in parallel. So, the solution I prefer is actually dependency injection. That’s a simple technique where I just pass the Cache object into the RubyConf object when it is created. So each spec can have its own cache.

    Here’s what that looks like in the code…


  105. First, here’s the new initializer. Notice that the cache parameter defaults to Cache.instance. So, if we don’t pass anything, the code will just use the singleton, which is what we want.
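
    Continuing the earlier sketch, the seam might look like this:

        class RubyConf
          # The cache is injected, but defaults to the singleton, so existing
          # production callers don't have to change at all.
          def initialize(cache: Cache.instance)
            @cache = cache
          end

          def favorite_session
            @cache.read(:favorite_session)
          end

          def favorite_session=(title)
            @cache.write(:favorite_session, title)
          end
        end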

    By doing this we’ve now created a seam in the software that allows the specs to use their own cache objects. This prevents state from leaking between the specs, without modifying the behavior of the production code.

    To finish up, we need to modify the specs…


  106. Like this…


  107. Here, we’re creating a new instance of the Cache class and passing it to the RubyConf object when we create it.
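
    Sketched out, with the expectation values assumed:

        RSpec.describe RubyConf do
          describe "#favorite_session" do
            it "returns the session stored by the setter" do
              conf = RubyConf.new(cache: Cache.new) # a private cache per spec
              conf.favorite_session = "Matz's Keynote"
              expect(conf.favorite_session).to eq("Matz's Keynote")
            end
          end
        end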

    That’s it. That’s all there is to dependency injection.

    And, because each spec has its own cache, they no longer share state.

    Let’s run the specs again…


  108. I’m running the specs again — with the same randomized seed that caused them to fail in the first place.

    Now, they pass, even though


  109. The setter ran before the getter

    Voila!


  110. Ok! We’re making good progress, and it’s only 4:30! You might actually make that thing, after all! Just a couple more broken specs.


  111. Finally, let’s look at race conditions.


  112. What is a race condition?

    A race condition is what can happen when parallel processes compete over a scarce, shared resource.

    Let’s look at how that happens with a file…


  113. Let’s start by looking at two specs running in sequence.

    In this example, Spec 1…


  114. first writes to the file,


  115. then reads from it, checks the result,


  116. and passes.

    Once Spec 1 finishes, Spec 2 runs…


  117. first it writes to the same file,


  118. twice,


  119. then reads from it, checks the result,


  120. and passes.

    But…


  121. When you run these same specs in parallel…


  122. Spec 1 writes to the file


  123. Then, spec 2 writes to the file


  124. Then, spec 1 reads from the file, checks the result,


  125. And, fails — because there were two rows, not one


  126. Then spec 2 writes to the file again


  127. Then spec 2 reads from the file, checks the result,


  128. And fails — because there were three rows, not two


  129. So, both specs in this case are susceptible to parallel flakiness due to a race condition.

    But, since this is asynchronous code, it’s entirely possible that the specs could pass.


  130. This is why race conditions are notoriously hard to reproduce. So, how can you go about debugging them if you can’t reproduce them? Well, you want to take a methodical approach.


  131. The first thing to do is to eliminate non-determinism. Run the failing spec repeatedly in isolation. If it fails, that’s non-determinism.


  132. If not, try to eliminate order dependence. Run the failing spec and all the specs that ran with it repeatedly in different orders. If you can repro the failure, that’s order dependence.


  133. If that doesn’t work, then run the specs repeatedly in parallel with the Parallel RSpec gem. I specifically mention that gem because it seems better suited for running the specs locally than Parallel Test or other options like Knapsack, which seem targeted at Rails apps running on CI. It’s best to debug this locally if at all possible.


  134. If you still can’t reproduce it, you can try randomizing the order in which the specs run in parallel.


  135. Once you’ve reproduced it (or even if you can’t), what should you look for in order to fix it?

    The main cause of race conditions on build servers is asynchronous code competing over scarce, shared resources. Those resources might include:

    * File or socket IO
    * Thread pools
    * Connection pools, or even
    * Low memory

    Once you have a suspect, how do you fix it?


  136. Well…

    * For IO-based issues, you can substitute StringIO for other kinds of IO in your spec. I’ll share an example of this in a moment.
    * You can test that the correct messages are being sent between collaborating objects, rather than testing the return value of a method.
    * You can write thread-safe code.
    * You can test threaded code synchronously - by extracting the guts of the thread into a plain old Ruby object (or PORO) and testing that, or
    * You can switch to fibers instead of threads

    Fibers are cool, because you can test them synchronously, which is awesome. They significantly reduce the chances of a race condition, because they control when they yield, so atomic operations can complete without interruption.

    * Finally, you can always add more resources to your test environment. Though, that’s a bit of an arms race. You’ll most likely end up coming back and increasing them again, and again, and again.


  137. Ok. Let’s take a look at the last two flaky specs that are keeping you from that thing…


  138. So, this feature of the app manages a list of reservations. There are two methods: one to reserve a seat, the other to get a list of the attendees.

    As you can see, we’re just…


  139. writing the names to a file and


  140. reading from that file.

    Let’s take a look at the specs…


  141. Ok. So…


  142. The first spec ensures that writing “Mickey Mouse” to the file grows the number of attendees by 1.


  143. The second spec ensures that when writing multiple lines, Donald and Goofy, the attendee count goes up accordingly.

    These are fine specs. Let’s run them…


  144. Hey! They pass!

    When run in sequence. In fact, they will even pass in the opposite order. But…


  145. If I break out the `parallel_rspec` gem and run them in parallel, they both fail.


  146. In this case, the second spec actually finished first. It failed because the attendee count grew by 3, not 2.


  147. And, the first spec, which finished second thanks to parallelism, failed because the count grew by 2, not 1.

    We’ve already seen how that can happen, but let’s walk through it again…


  148. Before we get into how the specs failed, let’s look at how this RSpec code works. It’s a little bit complicated.


  149. Here’s the expectation in the second spec. It defines two blocks of code.


  150. The first block is passed to the expect method. And,


  151. The second is passed to the change method.

    The way RSpec handles this is to execute the change block (to get the initial value),


  152. then the expect block,


  153. and then the change block again (to get the final value). Finally, RSpec subtracts the initial value from the final value to get the delta, which needs to match the “by” clause, which in this case is 2.
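
    Here’s a self-contained illustration of that mechanism (not the talk’s code):

        require "rspec"

        RSpec.describe "the change matcher" do
          it "runs the change block before and after the expect block" do
            rows = []
            expect { 2.times { rows << "attendee" } } # the expect block
              .to change { rows.count }.by(2)         # the change block
          end
        end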


  154. Ok. Here we are back at the beginning…

    This time, let’s track the order of operations on this timeline.


  155. First, the second spec reads the file to grab the attendee count, which should be 0.


  156. Next, the first write happens in the second spec…


  157. Next, the first spec reads the file. This time it gets 1.


  158. Next, the other writes occur. No telling what order. You’d need to look at the file. But, they both happen…


  159. And, finally, both reads happen again. Here, we know that the second spec finished first, because its output appeared first. But, it doesn’t really matter.


  160. Ok. So, now that we know how it failed, let’s go back to the beginning and show you the solution.

    Turns out, the Ruby core team thought of this. They knew that testing asynchronous File IO would be a challenge. So, they included a class called StringIO to simulate other kinds of IO in specs. StringIO is a string, but with the interface of a file.


  161. So, what we want to do is to…

    Allow File to receive open and yield a StringIO object.


  162. So, now, when the code calls File.open, the actual object that it will receive is a StringIO object.

    One caveat: Because this string behaves like a file, it has a cursor.


  163. So, after writing to the string, we need to rewind it before we can read it.

    That wasn’t necessary before introducing the StringIO, because the file was closed after each write and reopened for each read, which reset the cursor. Here, the StringIO object stays open the whole time, because it was declared in the spec. So, we need to rewind before we can read…
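
    Pulling those steps together, the fix looks roughly like this (the Reservations class is a stand-in for the talk’s reserve/attendees code):

        require "rspec"
        require "stringio"

        # A stand-in: append names to a file, read them back out.
        class Reservations
          PATH = "attendees.txt"

          def reserve(name)
            File.open(PATH, "a") { |f| f.puts(name) }
          end

          def attendees
            File.open(PATH, "r") { |f| f.readlines.map(&:chomp) }
          end
        end

        RSpec.describe Reservations do
          it "records a reservation without touching the real file system" do
            io = StringIO.new
            allow(File).to receive(:open).and_yield(io)

            Reservations.new.reserve("Mickey Mouse")

            io.rewind # the cursor sits at the end of the string after writing
            expect(io.readlines.map(&:chomp)).to eq(["Mickey Mouse"])
          end
        end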

    Alright! The proof is in the pudding. Did we fix the race condition?


  164. Yes!


  165. So, here we are, 40 minutes in, and we’ve found and resolved ALL of the flaky specs that were keeping you from that thing!

    Time to wrap things up so we can get to that thing in the lunchroom!


  166. ‘Cause, I don’t know about you, but this talk always makes me hungry…


  167. Ok. Here’s a cheat sheet for the entire talk…


  168. Non-deterministic flakiness reproduces in isolation. Look for interactions with non-deterministic elements of the environment. To fix this kind of flakiness, mock the non-determinism to make it deterministic. Don’t forget about Timecop when working with date- or time-related specs. There are also tools like WebMock and VCR for handling specs that require network connections. I didn’t show them here because I prefer to just use RSpec, like I did in this presentation. But, lots of folks find those tools very useful.


  169. Order-dependent flakiness only reproduces with other specs run in a certain order. It will not reproduce in isolation. Look for state that is shared across tests. To fix order dependency, remove shared state, make it immutable, or isolate it. RSpec order random can help you reproduce the failures. And, RSpec bisect can help you locate the leaky specs.


  170. And, race conditions only reproduce with other specs when run in parallel, not in isolation. Look for asynchronous code, or exhaustible shared resources. To fix race conditions, isolate things from one another (like we did with StringIO), or use fibers instead of threads. Seriously, they’re amazing! Finally, you can use Parallel RSpec to repro the failures locally instead of on your build server.


  171. And keep in mind that the secret ingredient in every flaky spec is an invalid assumption about the environment in which it is running.

    Sometimes, just remembering that fact will help you identify and resolve the flakiness.

    Ask yourself, how can I ensure that the environment for this test is what it expects?


  172. Oh, and one more thing… I have a bit of a hot take…

    Debugging this stuff is hard enough. But, it gets one hundred times harder if your specs are too DRY. So, avoid the use of these features of RSpec. They seem harmless — useful even — when you’re writing the specs. But, ultimately they make debugging way too hard.

    So, try to avoid…
    * Shared specs
    * Shared contexts
    * Nested contexts, and
    * Let statements

    Your specs should be super communicative. They are, after all, the executable documentation for your code. If you have to scroll all over the place or open a ton of files to write the specs, you can be guaranteed that you’ll be doing the same when you’re trying to understand and debug them when they fail.


  173. Don’t get me wrong. I love RSpec. But, it’s best to leave your tests WET. And, I’m not alone in this belief.

    The fine folks at thoughtbot have written about it.

    And, in fact, I honestly think that DRY might be the worst programming advice ever.

    I told you it was a hot take. If you disagree, come find me so I can change your mind.


  174. Again, my name is Alan Ridlehoover. I do know a thing or two about flakiness. But, it took me 20+ years to get here. Hopefully, this talk has short-circuited that for you…


  175. As I mentioned at the beginning of the talk, I work for Cisco Meraki. So, I also know a thing or two about connectivity! Here’s how to connect with me. And, that last item is where you can find the source code for this talk. There’s even a bonus flaky spec regarding a raffle winner that I didn’t have time to cover today.

    Cisco Meraki is probably the largest Rails shop you’ve never heard of. And, we’re growing. We are currently hiring for a limited number of roles. Come chat with us to find out what it’s like to work at Meraki.


  176. Finally, a little shameless self promotion…

    My friend, Fito von Zastrow, and I love Ruby so much, we occasionally release something into the wild in the hopes that folks will find it useful. You can find links to our stuff at first try dot software, including Rubyist, the opinionated VS Code color theme I used in this talk. We’d love for you to check it out.


  177. Thank you so much for coming!

    If you have questions, come chat with me offstage. Or find me at lunch.
