Slide 1

Slide 1 text

No content

Slide 2

Slide 2 text

It’s after 4 o’clock. The release was due an hour ago. You’ve got less than an hour to leave, or you’re going to be late for that thing… You can feel the clock ticking…

Slide 3

Slide 3 text

Slack clicks to life… You check your messages… < HEAVY SIGH >

Slide 4

Slide 4 text

The build just failed. A-gain. You look at the build. You look at the clock. < SHAKING HEAD > You don’t have time for flakiness…

Slide 5

Slide 5 text

So, you re-run the build. A-gain. Two builds. Five different failing specs. None of them have anything to do with your commit. All you can think about is how you can’t be late to another thing…

Slide 6

Slide 6 text

If only you knew the secret ingredient that all flaky tests have in common… You might be on your way to that thing right now…

Slide 7

Slide 7 text

Hello! My name is Alan Ridlehoover. I’m an Engineering Manager at Cisco Meraki — the largest Rails shop you’ve never heard of. And, though I’m not a baker, I do know a thing or two about flakiness. In fact… sometimes…

Slide 8

Slide 8 text

It’s all I can think about! Seriously! Since I started automating tests over 20 years ago, I’ve written my fair share of flaky specs. Daylight Saving Time is my personal nemesis. I can’t tell you how many times I’ve tripped on that. Let’s just say I’m well into the “shame on me” part of that relationship. Or, I was… But, I’m getting ahead of myself. Let’s start with a definition. What is a flaky test?

Slide 9

Slide 9 text

A flaky spec is one that changes state (from passing to failing, or vice versa) without modification to either the test itself or the code being tested.

Slide 10

Slide 10 text

So, if you write a spec A…

Slide 11

Slide 11 text

And a method #foo…

Slide 12

Slide 12 text

that makes the spec pass… Then you can expect that as long as…

Slide 13

Slide 13 text

The spec…

Slide 14

Slide 14 text

and the method remain unchanged…

Slide 15

Slide 15 text

The spec should continue to pass.

Slide 16

Slide 16 text

It’s when the spec…

Slide 17

Slide 17 text

And the method stay the same…

Slide 18

Slide 18 text

but the result changes…

Slide 19

Slide 19 text

That’s when you know you have a flaky spec. But, how does this happen?

Slide 20

Slide 20 text

Well, it happens because of the secret ingredient that all flaky tests have in common… But, what is it? What is that secret ingredient?

Slide 21

Slide 21 text

It’s an assumption. All flaky tests make invalid assumptions about their environment. They assume their environment will be in a particular state when they begin. But that assumption is rendered incorrect by some change in the environment between or during test runs.

Slide 22

Slide 22 text

Ok. But, what causes that change to the environment? Well, there are three recipes:
* Non-determinism
* Order dependence, and
* Race conditions
Let’s take a look at each of these, along with some examples in code… Starting with…

Slide 23

Slide 23 text

…non-determinism. So, what is non-determinism? For that matter, what is determinism?

Slide 24

Slide 24 text

Well, a deterministic algorithm is one that, given the same inputs, always produces the same output. For example...

Slide 25

Slide 25 text

If I take these parameters,

Slide 26

Slide 26 text

and pass them to a method called add,

Slide 27

Slide 27 text

it should always return 2,

Slide 28

Slide 28 text

no matter how many

Slide 29

Slide 29 text

times you call it.

Slide 30

Slide 30 text

But, what if there were a method #foo that

Slide 31

Slide 31 text

always returned true

Slide 32

Slide 32 text

until it didn’t. That’s the definition of non-determinism: an algorithm that, given the same inputs, does not always produce the same output. But, how could this be?

Slide 33

Slide 33 text

Well, it might sound obvious, but utilizing non-deterministic features of the environment leads to non-deterministic code, including…
* Random number generators - clearly - these are intended to be, well, random
* The system clock - we don’t always think of this, but it’s always changing
* Network connections - that might be up one minute and down the next
* Floating point precision - it’s not guaranteed
These are just a few examples; I’m sure this list is not exhaustive. But, what if our code relies on these things? How can we possibly write deterministic tests?

Slide 34

Slide 34 text

The trick is to remove the non-determinism from the test by stubbing it, or to account for it by using advanced matchers so that the spec produces consistent results from one run to the next. To do that…
* You can stub the random number generator to return a specific number
* You can mock (or “freeze”) time
* You can stub network responses
* And, for floats, you can leverage some of RSpec’s more advanced matchers, like be_within and be_between.
And, please! < ANIMATE > Don’t forget to document the undocumented use case with a spec!
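To make that concrete, here is a minimal, self-contained sketch of those techniques in RSpec. The Dice class, the dates, and the values are made up for illustration; they are not from the talk’s codebase.

```ruby
# Sketch only: Dice, the dates, and the values are illustrative.
require "rspec"

class Dice
  def roll
    rand(1..6)
  end
end

RSpec.describe "taming non-determinism" do
  it "stubs the random number generator" do
    dice = Dice.new
    allow(dice).to receive(:rand).and_return(6) # intercepts the implicit call to Kernel#rand
    expect(dice.roll).to eq(6)
  end

  it "freezes time by stubbing the clock" do
    frozen = Time.new(2023, 11, 13, 9, 0, 0)
    allow(Time).to receive(:now).and_return(frozen)
    expect(Time.now).to eq(frozen)
  end

  it "compares floats with be_within" do
    expect(0.1 + 0.2).to be_within(0.0001).of(0.3)
  end
end
```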

Slide 35

Slide 35 text

Ok. So, while that build is running, let’s see if we can fix some of those flaky specs that are making you late for that thing… First, a bit of context… The code we’re about to look at is entirely made up. Well, I guess, technically, all code is made up. But, what I mean is that this code was made up fresh, just for this talk. By me. With TDD. Not ChatGPT. It’s not production code. But, it was inspired by real code that I’ve personally worked on in production applications. It’s a bit of a hodge podge. It’s a class called RubyConf that provides some utility methods that might be useful for running the conference.

Slide 36

Slide 36 text

Here’s a bit of code to determine whether or not the conference is currently in progress.
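The slide’s code isn’t reproduced in these notes, but a sketch of what it might look like (the class name comes from the talk; the dates are assumed):

```ruby
# Sketch: the conference dates are assumed for illustration.
require "date"

class RubyConf
  START_DATE = Date.new(2023, 11, 13)
  END_DATE   = Date.new(2023, 11, 15)

  def in_progress?
    (START_DATE..END_DATE).cover?(Date.today) # reads the system clock
  end
end
```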

Slide 37

Slide 37 text

Notice that the method uses the system clock to determine the current date and time. Simple enough. Let’s look at the specs…

Slide 38

Slide 38 text

Oh… There’s only one spec. Hmmm… O…K… Well, let’s take a look at it.

Slide 39

Slide 39 text

It says that the #in_progress? method returns false before the conference begins. Ok. That makes sense. But, it does seem like the author forgot at least two other cases: during the conference and after the conference. But you know what? This is a common problem I see with date-based specs. The author of the spec is living in the now. They aren’t thinking about the future. In fact, this is exactly what happens to me with Daylight Saving Time. I forget about it and never write a spec that proves the software still works after the clock changes. I bet this spec ran fine before the conference. But, it’s failing now that the conference is actually in progress. Let’s play with the system clock to see if I’m right…

Slide 40

Slide 40 text

Ok. As predicted, it passes if I set the clock to October 21, 2023 - well before the conference. So, we know this is a flaky test because it was passing and now it’s failing despite there being no modifications to the code or tests.

Slide 41

Slide 41 text

And, if I set the clock to the first day of the conference? It fails. This spec is flaky. It depends on the system clock being in a particular state. Ok, how do we fix this? Remember, whenever we’re facing non-deterministic flakiness, we want to mock the non-determinism to make it deterministic. In this case, that’s the system clock…

Slide 42

Slide 42 text

So, here’s the code and the spec as they were…

Slide 43

Slide 43 text

And, here it is with the fixed spec. Notice that the only difference here is that the new spec is mocking (or freezing) time. The code didn’t change at all. It’s fine. Only the spec changed. It sets time to a specific date so that the spec will never run outside that context. It does this for the duration of the block, then it returns the system to its normal state. Ok. Let’s see if that fixed it…
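A sketch of that fixed spec, building on the RubyConf sketch above and assuming the clock is pinned by stubbing Date.today (the talk may use a different mechanism, such as Timecop):

```ruby
# Sketch: stubbing Date.today pins the clock for the duration of this example only.
RSpec.describe RubyConf, "#in_progress?" do
  it "returns false before the conference begins" do
    allow(Date).to receive(:today).and_return(Date.new(2023, 10, 21))
    expect(RubyConf.new.in_progress?).to be(false)
  end
end
```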

Slide 44

Slide 44 text

Great! So, now the specs pass on November 13th, the first day of the conference. And, I even added specs for the missing use cases by freezing time and setting the dates appropriately. Like this…

Slide 45

Slide 45 text

These specs freeze time the same way the other spec freezes time, just with different dates and expected results.
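A sketch of those added cases, again with the clock frozen to assumed dates:

```ruby
# Sketch: dates assumed; November 13-15 is treated as the conference window.
RSpec.describe RubyConf, "#in_progress?" do
  it "returns true during the conference" do
    allow(Date).to receive(:today).and_return(Date.new(2023, 11, 14))
    expect(RubyConf.new.in_progress?).to be(true)
  end

  it "returns false after the conference ends" do
    allow(Date).to receive(:today).and_return(Date.new(2023, 11, 20))
    expect(RubyConf.new.in_progress?).to be(false)
  end
end
```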

Slide 46

Slide 46 text

Ok. Next let’s look at fixing a test that fails when the network goes down…

Slide 47

Slide 47 text

It’s not uncommon for code to need to call external services across a network. In this session_description method,

Slide 48

Slide 48 text

we’re calling an API to fetch the conference schedule.

Slide 49

Slide 49 text

Then we’re parsing it, finding the session that matches the title parameter, and returning its description.
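A sketch of what that method might look like. The URL, the JSON shape, and the choice of Net::HTTP are assumptions, not the talk’s actual code:

```ruby
# Sketch: the endpoint and JSON shape are made up for illustration.
require "net/http"
require "json"
require "uri"

class RubyConf
  SCHEDULE_URL = "https://example.com/rubyconf/schedule.json" # hypothetical endpoint

  def session_description(title)
    body     = Net::HTTP.get(URI(SCHEDULE_URL))               # crosses the network
    sessions = JSON.parse(body)
    session  = sessions.find { |s| s["title"] == title }
    session && session["description"]
  end
end
```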

Slide 50

Slide 50 text

And, here’s the spec.

Slide 51

Slide 51 text

With WiFi enabled, this spec passes.

Slide 52

Slide 52 text

Note that the call to the network adds over a second to the runtime. And, that’s when it succeeds. Most GET requests default to a 60-second timeout when waiting for the other service to respond. Plus…

Slide 53

Slide 53 text

With WiFi turned off, the spec fails.

Slide 54

Slide 54 text

Fortunately, it fails quickly because the network failure is on my end. Now, these are particularly nasty tests to debug because the loss in connectivity is neither logged, nor persistent. So, by the time you’re debugging the failure, it may not be possible to reproduce. Pay attention to HTTP calls, or any other type of call that crosses a network (e.g., gRPC). And, try running your specs with WiFi turned off to see if you can catch any failures.

Slide 55

Slide 55 text

Alright. Let’s fix this spec… Here it is a bit smaller than before. Same code. Different font size, because the fix is a bit large…

Slide 56

Slide 56 text

Notice that again, the code itself is not changing. The problem is with the spec. Most of the changes to the spec are setting up data to stub a response from the API. Here’s where we’re actually creating the stub. This allows the spec to validate that we’re parsing the results correctly. That’s the code we care about. We don’t actually care whether the external service is up and running when we’re running our test suite. It shouldn’t matter.
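A sketch of that stub, assuming the Net::HTTP-based method sketched earlier; the session data here is invented:

```ruby
# Sketch: stubbing the HTTP call so the spec never touches the network.
RSpec.describe RubyConf, "#session_description" do
  it "returns the description of the matching session" do
    schedule = [
      { "title" => "Opening Keynote", "description" => "Matz kicks things off." }
    ].to_json

    allow(Net::HTTP).to receive(:get).and_return(schedule) # no real request is made

    description = RubyConf.new.session_description("Opening Keynote")
    expect(description).to eq("Matz kicks things off.")
  end
end
```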

Slide 57

Slide 57 text

Now, sometimes I get a bit of pushback when I offer this advice. Folks ask, “What if the API changes? How will we know, if we’re mocking the response?” My answer to that is that each spec should have one and only one reason to exist. This spec’s reason to exist is to verify the code we wrote works the way we expect. It is a unit test. We want these to have as few dependencies as possible so they’ll run fast. You may ALSO require a spec to validate that an API’s schema has not changed. That’s a different reason for a spec to exist. So, that’s not this spec. In fact, that’s not even a unit test. It’s an integration test. And, it’s one that’s designed to fail in order to catch changes to the API. So, we probably don’t want to run it with our unit tests, which are designed to pass. Maybe the integration tests should be run separately, on a schedule, rather than intermingled with our unit tests on every build. Alright. Let’s run the specs.

Slide 58

Slide 58 text

Alright! Running the specs with the WiFi turned off proves that the stubbed response prevented the spec from flaking.

Slide 59

Slide 59 text

It’s also important to point out the difference in time between the live version and the stubbed version. The live version took 1.3 seconds to execute. This version took less than 1/100th of a second. Those decisions really add up as your test suite grows. They can become a real problem when you hit one hundred thousand specs like we did a few months ago.

Slide 60

Slide 60 text

Ok. It’s now 4:15. Those specs took us about 10 minutes to resolve. That wasn’t so bad. But, you can still feel the clock ticking. Are you going to be able to make it to the thing on time?

Slide 61

Slide 61 text

Next, let’s take a look at order dependence, starting with a definition…

Slide 62

Slide 62 text

Order dependent specs are specs that pass in isolation, but fail when run with other specs in a specific order. So, for example,

Slide 63

Slide 63 text

If Test A and B both pass when run in alphabetical order.

Slide 64

Slide 64 text

But, Test A fails when run after Test B.

Slide 65

Slide 65 text

That makes Test A flaky, and Test B leaky. But, what does that mean? Leaky?

Slide 66

Slide 66 text

Well, remember, these specs are making an invalid assumption about their environment. Their environment includes all of the shared state they have access to. It works like this…

Slide 67

Slide 67 text

Let’s pretend this blue square is the starting point for the shared environment. Spec A runs first, so it gets the blue square environment and passes.

Slide 68

Slide 68 text

Spec A does not modify the environment, so spec B runs in the same context as spec A. It also passes.

Slide 69

Slide 69 text

But imagine, if spec B runs first… It gets the starting environment… The blue square.

Slide 70

Slide 70 text

And, it changes the environment to a pink hexagon

Slide 71

Slide 71 text

causing spec A to fail.

Slide 72

Slide 72 text

So, what’s happening is that state from spec B is leaking into the environment, causing spec A to flake. For this reason, this class of specs are often referred to as “leaky.”

Slide 73

Slide 73 text

So, isn’t the leaky spec the real problem here? Not really. Both specs are to blame. Only one is breaking your build. Fix the broken spec first. Often you’ll find that fixing the broken spec will point to a broader solution. But, how do you fix order-dependent flakiness? Well, first, let’s take a look at what causes these kinds of failures…

Slide 74

Slide 74 text

Order dependent failures are caused by mutable state that is shared across specs. This could be in the form of:
* Broadly scoped variables, like global or class variables
* Databases
* Key/value stores
* Caches
* Or, even the DOM, if you’re writing JavaScript tests.
Alright, that’s what causes order dependency, but…

Slide 75

Slide 75 text

How do you reproduce these things?

Slide 76

Slide 76 text

First, eliminate non-determinism by running the failing spec repeatedly, in isolation. If it fails, that’s non-determinism.

Slide 77

Slide 77 text

If not, then run ALL the specs that ran together with the failing spec. One of them is leaking state into the failing spec.

Slide 78

Slide 78 text

If running the specs in the default order doesn’t reproduce the failure, randomize the order in which the specs are run using the rspec order random option. Keep running it until you find a seed that consistently causes the failure.

Slide 79

Slide 79 text

Next, locate the leaky spec or specs. I say specs, plural, because sometimes it takes several specs running in a specific order to produce the failure. You can use rspec bisect to find the leaky specs for you. I’ll show you how in a minute…

Slide 80

Slide 80 text

But, first, how can we fix order dependent failures? You can remove the shared state, make it immutable, or isolate it…
* Don’t use broadly scoped variables
* Mock the shared data store (which you can do easily with a layer of abstraction, like the repository pattern)
* Use database transactions, or
* Reset the shared state between specs

Slide 81

Slide 81 text

Alright. Let’s see if we can fix another one of those flaky specs that are keeping you from that thing…

Slide 82

Slide 82 text

Here’s a simple getter and setter to store a favorite session.
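The slide’s code isn’t captured in these notes, but a sketch of that getter and setter might look like this (the “Matz’s Keynote” default is an assumption, inferred from the spec described a few slides later):

```ruby
# Sketch: the default value is assumed, not from the talk's actual code.
class RubyConf
  def favorite_session
    Cache.instance.read(:favorite_session) || "Matz's Keynote"
  end

  def favorite_session=(title)
    Cache.instance.write(:favorite_session, title)
  end
end
```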

Slide 83

Slide 83 text

Notice that both getter and setter leverage an object called Cache. And, they are calling a class method named `instance`. What is that? Let’s take a look.

Slide 84

Slide 84 text

The Cache class is a simple, in-memory, key/value store backed by a hash.

Slide 85

Slide 85 text

The `instance` method effectively turns this class into a singleton, so that every call to Cache.instance returns the same instance.
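A sketch of that class, assuming simple read/write accessors over a plain Hash:

```ruby
# Sketch: a hash-backed, in-memory key/value store with a memoized singleton accessor.
class Cache
  def self.instance
    @instance ||= new # every caller gets the same object
  end

  def initialize
    @store = {}
  end

  def read(key)
    @store[key]
  end

  def write(key, value)
    @store[key] = value
  end
end
```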

Slide 86

Slide 86 text

Here are the specs for the `favorite_session` getter and setter… These specs pass when run in this order: getter first, then setter. But, they’ll fail in the opposite order because

Slide 87

Slide 87 text

we’re storing the title of the session in the cache.

Slide 88

Slide 88 text

So, the getter will return the value in the cache, not “Matz’s Keynote”. To prove that, let’s run rspec with the order random option to see if we can get it to fail…
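Here’s roughly what those two specs might look like (a sketch, with assumed wording):

```ruby
# Sketch: with a shared Cache singleton, these pass getter-first and fail setter-first.
RSpec.describe RubyConf do
  describe "#favorite_session" do
    it "defaults to Matz's Keynote" do
      expect(RubyConf.new.favorite_session).to eq("Matz's Keynote")
    end
  end

  describe "#favorite_session=" do
    it "stores the favorite session" do
      conf = RubyConf.new
      conf.favorite_session = "Fixing Flaky Tests"
      expect(conf.favorite_session).to eq("Fixing Flaky Tests")
    end
  end
end
```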

Slide 89

Slide 89 text

So, here you can see… I ran the specs with the order random option. And,

Slide 90

Slide 90 text

RSpec chose the random seed 12322. And, the getter ran before the setter, so it passed. Let’s try that again…

Slide 91

Slide 91 text

The getter ran before the setter, so it passed. Let’s try that again…

Slide 92

Slide 92 text

Ok. I’m still running rspec with the order random option.

Slide 93

Slide 93 text

This time rspec chose the seed 63603. And…

Slide 94

Slide 94 text

Lo and behold! The setter spec ran first…

Slide 95

Slide 95 text

Causing the getter spec to fail. So, how do you go about fixing this?

Slide 96

Slide 96 text

Well, we know that one of the specs that ran before the getter spec must have polluted the environment. In this case, we’re pretty sure the setter spec is the culprit because of the memoization. But, what if you didn’t know which of the specs was to blame for modifying the environment? That’s where rspec bisect comes in. Let’s take a look…

Slide 97

Slide 97 text

Here, I’m running rspec bisect with the same order clause and seed that produced the failure. This is important, because bisect won’t work unless it can reproduce the failure.

Slide 98

Slide 98 text

The first thing bisect does is to find the failing spec…

Slide 99

Slide 99 text

Next, it analyzes whether or not the failure appears to be order dependent. In this case, it does.

Slide 100

Slide 100 text

So, it performs a binary search, looking for the spec or specs that need to be run first in order for the failure to happen. Note that this can take a very long time if the set of candidate leaky specs is large.

Slide 101

Slide 101 text

Then, finally, it reports the minimal command required to reproduce the problem. Run that command and you’ll see exactly which order the specs ran in to cause the issue.

Slide 102

Slide 102 text

So, here we go… I’ve run the command that RSpec bisect gave us.

Slide 103

Slide 103 text

And, sure enough, the setter spec is the culprit. So, how do we fix it?

Slide 104

Slide 104 text

Here we are back at the beginning. The font is smaller because the solution here is bigger. One way we could approach this would be to call Cache.clear in between specs. But, because our specs are currently sharing state, that would likely lead to a race condition on the build server where we’re probably running the specs in parallel. So, the solution I prefer is actually dependency injection. That’s a simple technique where I just pass the Cache object into the RubyConf object when it is created. So each spec can have its own cache. Here’s what that looks like in the code…
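A sketch of the injected version (keeping the assumed default from the earlier sketch):

```ruby
# Sketch: the cache is injected, but production callers still get the singleton by default.
class RubyConf
  def initialize(cache: Cache.instance)
    @cache = cache
  end

  def favorite_session
    @cache.read(:favorite_session) || "Matz's Keynote"
  end

  def favorite_session=(title)
    @cache.write(:favorite_session, title)
  end
end
```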

Slide 105

Slide 105 text

First, here’s the new initializer. Notice that the cache parameter defaults to Cache.instance. So, if we don’t pass anything, the code will just use the singleton, which is what we want. By doing this we’ve now created a seam in the software that allows the specs to use their own cache objects. This prevents state from leaking between the specs, without modifying the behavior of the production code. To finish up, we need to modify the specs…

Slide 106

Slide 106 text

Like this…
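Something along these lines (a sketch, not the slide’s exact code):

```ruby
# Sketch: each example gets its own Cache, so nothing leaks between specs.
RSpec.describe RubyConf do
  describe "#favorite_session" do
    it "defaults to Matz's Keynote" do
      conf = RubyConf.new(cache: Cache.new)
      expect(conf.favorite_session).to eq("Matz's Keynote")
    end
  end

  describe "#favorite_session=" do
    it "stores the favorite session" do
      conf = RubyConf.new(cache: Cache.new)
      conf.favorite_session = "Fixing Flaky Tests"
      expect(conf.favorite_session).to eq("Fixing Flaky Tests")
    end
  end
end
```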

Slide 107

Slide 107 text

Here, we’re creating a new instance of the Cache class and passing it to the RubyConf object when we create it. That’s it. That’s all there is to dependency injection. And, because each spec has its own cache, they no longer share state. Let’s run the specs again…

Slide 108

Slide 108 text

I’m running the specs again — with the same randomized seed that caused them to fail in the first place. Now, they pass, even though

Slide 109

Slide 109 text

The setter ran before the getter. Voila!

Slide 110

Slide 110 text

Ok! We’re making good progress, and it’s only 4:30! You might actually make that thing, after all! Just a couple more broken specs.

Slide 111

Slide 111 text

Finally, let’s look at race conditions.

Slide 112

Slide 112 text

What is a race condition? A race condition is what can happen when parallel processes compete over a scarce, shared resource. Let’s look at how that happens with a file…

Slide 113

Slide 113 text

Let’s start by looking at two specs running in sequence. In this example, Spec 1…

Slide 114

Slide 114 text

first writes to the file,

Slide 115

Slide 115 text

then reads from it, checks the result,

Slide 116

Slide 116 text

and passes. Once Spec 1 finishes, Spec 2 runs…

Slide 117

Slide 117 text

first it writes to the same file,

Slide 118

Slide 118 text

twice,

Slide 119

Slide 119 text

then reads from it, checks the result,

Slide 120

Slide 120 text

and passes. But…

Slide 121

Slide 121 text

When you run these same specs in parallel…

Slide 122

Slide 122 text

Spec 1 writes to the file

Slide 123

Slide 123 text

Then, spec 2 writes to the file

Slide 124

Slide 124 text

Then, spec 1 reads from the file, checks the result,

Slide 125

Slide 125 text

And, fails — because there were two rows, not one

Slide 126

Slide 126 text

Then spec 2 writes to the file again

Slide 127

Slide 127 text

Then spec 2 reads from the file, checks the result,

Slide 128

Slide 128 text

And fails — because there were three rows, not two

Slide 129

Slide 129 text

So, both specs in this case are susceptible to parallel flakiness due to a race condition. But, since this is asynchronous code, it’s entirely possible that the specs could pass.

Slide 130

Slide 130 text

This is why race conditions are notoriously hard to reproduce. So, how can you go about debugging them if you can’t reproduce them? Well, you want to take a methodical approach.

Slide 131

Slide 131 text

The first thing to do is to eliminate non-determinism. Run the failing spec repeatedly in isolation. If it fails, that’s non-determinism.

Slide 132

Slide 132 text

If not, try to eliminate order dependence. Run the failing spec and all the specs that ran with it repeatedly in different orders. If you can repro the failure, that’s order dependence.

Slide 133

Slide 133 text

If that doesn’t work, then run the specs repeatedly in parallel with the Parallel RSpec gem. I specifically mention that gem because it seems better suited for running the specs locally than Parallel Test or other options like Knapsack, which seem targeted at Rails apps running on CI. It’s best to debug this locally if at all possible.

Slide 134

Slide 134 text

If you still can’t reproduce it, you can try randomizing the order in which the specs run in parallel.

Slide 135

Slide 135 text

Once you’ve reproduced it, or even if you can’t, what should you look for to fix? The main cause of race conditions on build servers is asynchronous code competing over scarce, shared resources. Those resources might include:
* File, or
* Socket IO
* Thread pools
* Connection pools, or even
* Low memory
Once you have a suspect, how do you fix it?

Slide 136

Slide 136 text

Well…
* For IO-based issues, you can substitute StringIO for other kinds of IO in your spec. I’ll share an example of this in a moment.
* You can test that the correct messages are being sent between collaborating objects, rather than testing the return value of a method.
* You can write thread-safe code.
* You can test threaded code synchronously - by extracting the guts of the thread into a plain old Ruby object (or PORO) and testing that, or
* You can switch to fibers instead of threads. Fibers are cool, because you can test them synchronously, which is awesome. They significantly reduce the chances of a race condition, because they’re in control of when they relinquish the CPU back to the OS. So, atomic operations can complete without interruption.
* Finally, you can always add more resources to your test environment. Though, that’s a bit of an arms race. You’ll most likely end up coming back and increasing them again, and again, and again.

Slide 137

Slide 137 text

Ok. Let’s take a look at the last two flaky specs that are keeping you from that thing…

Slide 138

Slide 138 text

So, this feature of the app manages a list of reservations. There are two methods: one to reserve a seat, the other to get a list of the attendees. As you can see, we’re just…

Slide 139

Slide 139 text

writing the names to a file and

Slide 140

Slide 140 text

reading from that file. Let’s take a look at the specs…
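For reference, a sketch of what those two methods might look like (the file name and method names are assumed):

```ruby
# Sketch: names are assumed; the point is that both methods touch the same shared file.
class RubyConf
  RESERVATIONS_FILE = "reservations.txt"

  def reserve(name)
    File.open(RESERVATIONS_FILE, "a") { |file| file.puts(name) }
  end

  def attendees
    File.open(RESERVATIONS_FILE, "r") { |file| file.readlines.map(&:chomp) }
  end
end
```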

Slide 141

Slide 141 text

Ok. So…

Slide 142

Slide 142 text

The first spec ensures that writing “Mickey Mouse” to the file grows the number of attendees by 1.

Slide 143

Slide 143 text

The second spec ensures that when writing multiple lines, Donald and Goofy, the attendee count goes up accordingly. These are fine specs. Let’s run them…
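For reference, a sketch (wording assumed) of those two specs:

```ruby
# Sketch: both specs read and write the same shared file.
RSpec.describe RubyConf do
  it "reserves a seat" do
    conf = RubyConf.new
    expect { conf.reserve("Mickey Mouse") }
      .to change { conf.attendees.count }.by(1)
  end

  it "reserves multiple seats" do
    conf = RubyConf.new
    expect {
      conf.reserve("Donald Duck")
      conf.reserve("Goofy")
    }.to change { conf.attendees.count }.by(2)
  end
end
```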

Slide 144

Slide 144 text

Hey! They pass! When run in sequence. In fact, they will even pass in the opposite order. But…

Slide 145

Slide 145 text

If I break out the `parallel_rspec` gem and run them in parallel, they both fail.

Slide 146

Slide 146 text

In this case, the second spec actually finished first. It failed because the attendee count grew by 3, not 2.

Slide 147

Slide 147 text

And, the first spec, which finished second thanks to parallelism, failed because the count grew by 2, not 1. We’ve already seen how that can happen, but let’s walk through it again…

Slide 148

Slide 148 text

Before we get into how the specs failed, let’s look at how this RSpec code works. It’s a little bit complicated.

Slide 149

Slide 149 text

Here’s the expectation in the second spec. It defines two blocks of code.

Slide 150

Slide 150 text

The first block is passed to the expect method. And,

Slide 151

Slide 151 text

The second is passed to the change method. The way RSpec handles this is to execute the change block (to get the initial value),

Slide 152

Slide 152 text

then the expect block,

Slide 153

Slide 153 text

and then the change block again (to get the final value). Finally, RSpec subtracts the initial value from the final value to get the delta, which needs to match the “by” clause, which in this case is 2.

Slide 154

Slide 154 text

Ok. Here we are back at the beginning… This time, let’s track the order of operations on this timeline.

Slide 155

Slide 155 text

First the second spec reads the file to grab the attendee count, which should be 0.

Slide 156

Slide 156 text

Next, the first write happens in the second spec…

Slide 157

Slide 157 text

Next, the first spec reads the file. This time it gets 1.

Slide 158

Slide 158 text

Next, the other writes occur. No telling what order. You’d need to look at the file. But, they both happen…

Slide 159

Slide 159 text

And, finally, both reads happen again. Here, we know that the second spec finished first, because its output appeared first. But, it doesn’t really matter.

Slide 160

Slide 160 text

Ok. So, now that we know how it failed, let’s go back to the beginning and show you the solution. Turns out, the Ruby core team thought of this. They knew that testing asynchronous File IO would be a challenge. So, they included a class called StringIO to simulate other kinds of IO in specs. StringIO is a string, but with the interface of a file.

Slide 161

Slide 161 text

So, what we want to do is to… Allow File to receive open and yield a StringIO object.

Slide 162

Slide 162 text

So, now, when the code calls File.open, the actual object that it will receive is a StringIO object. One caveat: Because this string behaves like a file, it has a cursor.

Slide 163

Slide 163 text

So, after writing to the string, we need to rewind it before we can read it. That wasn’t necessary before we introduced the StringIO, because the file was closed when it fell out of scope. Here, the StringIO object is never closed, because it hasn’t fallen out of scope; it was declared in the spec. So, we need to rewind before we can read… Alright! The proof is in the pudding. Did we fix the race condition?

Slide 164

Slide 164 text

Yes!
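For reference, the fixed spec might look something like this sketch (wording assumed, building on the reservation sketch above):

```ruby
# Sketch: File.open is stubbed to yield a StringIO, so no real file (and no race) is involved.
require "stringio"

RSpec.describe RubyConf do
  it "reserves a seat" do
    file = StringIO.new
    allow(File).to receive(:open).and_yield(file)

    conf = RubyConf.new
    expect { conf.reserve("Mickey Mouse") }.to change {
      file.rewind                 # move the cursor back to the start before reading
      conf.attendees.count
    }.by(1)
  end
end
```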

Slide 165

Slide 165 text

So, here we are, 40 minutes in, and we’ve found and resolved ALL of the flaky specs that were keeping you from that thing! Time to wrap things up so we can get to that thing in the lunchroom!

Slide 166

Slide 166 text

‘Cause, I don’t know about you, but this talk always makes me hungry…

Slide 167

Slide 167 text

Ok. Here’s a cheat sheet for the entire talk…

Slide 168

Slide 168 text

Non-deterministic flakiness reproduces in isolation. Look for interactions with non-deterministic elements of the environment. To fix this kind of flakiness, mock the non-determinism to make it deterministic. Don’t forget about Timecop when working with date- or time-related specs. There are also tools like WebMock and VCR for handling specs that require network connections. I didn’t show them here because I prefer to just use RSpec like I did in this presentation. But, lots of folks find those tools very useful.

Slide 169

Slide 169 text

Order dependent flakiness only reproduces with other specs, run in a certain order. It will not reproduce in isolation. Look for state that is shared across tests. To fix order dependency, remove shared state, make it immutable, or isolate it. RSpec order random can help you reproduce the failures. And, RSpec bisect can help you locate the leaky specs.

Slide 170

Slide 170 text

And, race conditions only reproduce with other specs when run in parallel, not in isolation. Look for asynchronous code, or exhaustible shared resources. To fix race conditions, isolate things from one another (like we did with StringIO), or use fibers instead of threads. Seriously, they’re amazing! Finally, you can use Parallel RSpec to repro the failures locally instead of on your build server.

Slide 171

Slide 171 text

And keep in mind that the secret ingredient in every flaky spec is an invalid assumption about the environment in which it is running. Sometimes, just remembering that fact will help you identify and resolve the flakiness. Ask yourself, how can I ensure that the environment for this test is what it expects?

Slide 172

Slide 172 text

Oh, and one more thing… I have a bit of a hot take… Debugging this stuff is hard enough. But, it gets one hundred times harder if your specs are too DRY. So, avoid the use of these features of RSpec. They seem harmless — useful even — when you’re writing the specs. But, ultimately they make debugging way too hard. So, try to avoid…
* Shared specs
* Shared contexts
* Nested contexts, and
* Let statements
Your specs should be super communicative. They are, after all, the executable documentation for your code. If you have to scroll all over the place or open a ton of files to write the specs, you can be guaranteed that you’ll be doing the same when you’re trying to understand and debug them when they fail.

Slide 173

Slide 173 text

Don’t get me wrong. I love RSpec. But, it’s best to leave your tests WET. And, I’m not alone in this belief. The fine folks at thoughtbot have written about it. And, in fact, I honestly think that DRY might be the worst programming advice ever. I told you it was a hot take. If you disagree, come find me so I can change your mind.

Slide 174

Slide 174 text

Again, my name is Alan Ridlehoover. I do know a thing or two about flakiness. But, it took me 20+ years to get here. Hopefully, this talk has short-circuited that for you…

Slide 175

Slide 175 text

As I mentioned at the beginning of the talk, I work for Cisco Meraki. So, I also know a thing or two about connectivity! Here’s how to connect with me. And, that last item is where you can find the source code for this talk. There’s even a bonus flaky spec regarding a raffle winner that I didn’t have time to cover today. Cisco Meraki is probably the largest Rails shop you’ve never heard of. And, we’re growing. We are currently hiring for a limited number of roles. Come chat with us to find out what it’s like to work at Meraki.

Slide 176

Slide 176 text

Finally, a little shameless self promotion… My friend, Fito von Zastrow, and I love Ruby so much, we occasionally release something into the wild in the hopes that folks will find it useful. You can find links to our stuff at first try dot software, including Rubyist, the opinionated VS Code color theme I used in this talk. We’d love for you to check it out.

Slide 177

Slide 177 text

Thank you so much for coming! If you have questions, come chat with me offstage. Or find me at lunch.