Avoiding Déjà Vu: Building Resilient APIs with Idempotency

by Paul Conroy

Slide 1

Slide 1 text

Avoiding Déjà Vu: Building Resilient APIs with Idempotency Paul Conroy / @conroyp 5th June, 2025

Slide 2

Slide 2 text

Paul Conroy 👴 🌍 🇮🇪 From Dublin, Ireland Started playing with the web 30+ years ago (Notepad, FrontPage & GeoCities!) CTO at Square1 conroyp.com / @conroyp

Slide 3

Slide 3 text

https://en.wikipedia.org/wiki/Déjà_vu

Slide 4

Slide 4 text

https://en.wikipedia.org/wiki/Déjà_vu

Slide 5

Slide 5 text

How does Déjà Vu affect our API servers?

Slide 6

Slide 6 text

How does Déjà Vu affect our API servers? 💸 Double charges

Slide 7

Slide 7 text

How does Déjà Vu affect our API servers? 💸 Double charges 📦 Duplicate orders

Slide 8

Slide 8 text

How does Déjà Vu affect our API servers? 💸 Double charges 📦 Duplicate orders 🐛 Data integrity issues

Slide 9

Slide 9 text

How does Déjà Vu affect our API servers? 💸 Double charges 📦 Duplicate orders 🐛 Data integrity issues 🤬 Angry customers

Slide 10

Slide 10 text

How does Déjà Vu affect our API servers? 💸 Double charges 📦 Duplicate orders 🐛 Data integrity issues 🤬 Angry customers 🗣 Higher support costs

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

What is idempotency?

Slide 13

Slide 13 text

What is idempotency? How do I pronounce

Slide 14

Slide 14 text

eye-dum-po-ten-see

Slide 15

Slide 15 text

https://en.wikipedia.org/wiki/Idempotence “Idem” - same “Potence” - power

Slide 16

Slide 16 text

An operation is idempotent if applying it multiple times has the same effect as applying it once.
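A minimal sketch of the definition, not taken from the slides: the `order` dictionary and both helper functions are hypothetical and exist only to contrast an idempotent operation with a non-idempotent one.

```python
def set_status(order: dict, status: str) -> dict:
    """Idempotent: repeating the call leaves the order in the same state."""
    order["status"] = status
    return order


def add_item(order: dict, item: str) -> dict:
    """Not idempotent: every call appends another copy of the item."""
    order["items"].append(item)
    return order


order = {"status": "new", "items": []}

# Applying the same status twice has the same effect as applying it once.
set_status(order, "paid")
set_status(order, "paid")

# Adding the same item twice produces a duplicate.
add_item(order, "burger")
add_item(order, "burger")

print(order)
```

Repeating `set_status` is harmless, while repeating `add_item` is the duplicate-order problem the talk opens with.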

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

Idempotency in our APIs

Slide 21

Slide 21 text

🧑💻

Slide 22

Slide 22 text

🧑💻

Slide 23

Slide 23 text

🧑💻

Slide 24

Slide 24 text

🧑💻

Slide 25

Slide 25 text

🧑💻

Slide 26

Slide 26 text

🧑💻 🍔🍟

Slide 27

Slide 27 text

🧑💻 🍔🍟 💶

Slide 28

Slide 28 text

🧑💻

Slide 29

Slide 29 text

🧑💻

Slide 30

Slide 30 text

🧑💻

Slide 31

Slide 31 text

🧑💻

Slide 32

Slide 32 text

🤷

Slide 33

Slide 33 text

🧑💻

Slide 34

Slide 34 text

🧑💻

Slide 35

Slide 35 text

🤷

Slide 36

Slide 36 text

🤷

Slide 37

Slide 37 text

Reasons for retries ● Impatient users ● Mobile apps retrying failed connections ● Load balancer failovers ● CDN fallbacks

Slide 38

Slide 38 text

Consequences of missing idempotency 💸 Double charges 📦 Duplicate orders 🐛 Data integrity issues 🤬 Angry customers 🗣 Higher support costs

Slide 39

Slide 39 text

Who uses it?

Slide 40

Slide 40 text

Who uses it?

Slide 41

Slide 41 text

Who uses it?

Slide 42

Slide 42 text

Who uses it?

Slide 43

Slide 43 text

How can we make sure the same request gets processed only once?

Slide 44

Slide 44 text

HTTP Verbs Naturally idempotent: ● GET ● HEAD ● OPTIONS ● PUT ● DELETE

Slide 45

Slide 45 text

HTTP Verbs Naturally idempotent: ● GET ● HEAD ● OPTIONS ● PUT ● DELETE Non-idempotent verbs: ● POST ● PATCH

Slide 46

Slide 46 text

HTTP Verbs Naturally idempotent: ● GET ● HEAD ● OPTIONS ● PUT ● DELETE Non-idempotent verbs: ● POST ● PATCH

Slide 47

Slide 47 text

HTTP Verbs Naturally idempotent: ● GET ● HEAD ● OPTIONS ● PUT ● DELETE Non-idempotent verbs: ● POST ● PATCH

Slide 48

Slide 48 text

HTTP Verbs Naturally idempotent: ● GET ● HEAD ● OPTIONS ● PUT ● DELETE Non-idempotent verbs: ● POST ● PATCH Observable side-effects

Slide 49

Slide 49 text

HTTP Verbs Naturally idempotent: ● GET ● HEAD ● OPTIONS ● PUT ● DELETE Non-idempotent verbs: ● POST ● PATCH Observable side-effects
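One way to act on the verb lists is to demand an idempotency key only for the verbs that are not idempotent by definition. The sketch below assumes exactly the verbs named on the slide; the function name `needs_idempotency_key` is hypothetical.

```python
# Verbs the slide lists as naturally idempotent.
IDEMPOTENT_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "PUT", "DELETE"})


def needs_idempotency_key(method: str) -> bool:
    """Return True for verbs that are not idempotent on their own."""
    return method.upper() not in IDEMPOTENT_METHODS


for verb in ("GET", "PUT", "DELETE", "POST", "PATCH"):
    print(verb, needs_idempotency_key(verb))
```

This only reflects the protocol-level definition. An endpoint whose PUT or DELETE handler triggers extra side effects may still deserve a key.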

Slide 50

Slide 50 text

🧑💻1

Slide 51

Slide 51 text

🧑💻1 2 🧑💻

Slide 52

Slide 52 text

🧑💻1 2 🧑💻

Slide 53

Slide 53 text

🧑💻1 2 🧑💻 🍔🍟

Slide 54

Slide 54 text

🧑💻1 2 🧑💻 🍔🍟

Slide 55

Slide 55 text

🧑💻1 2 🧑💻 🍔🍟 💸💸

Slide 56

Slide 56 text

🧑💻1 2 🧑💻 🍔🍟 💸💸 🍔🍟

Slide 57

Slide 57 text

🧑💻1 2 🧑💻 🍔🍟 💸💸 🍔🍟

Slide 58

Slide 58 text

🧑💻1

Slide 59

Slide 59 text

🧑💻1 1 🧑💻

Slide 60

Slide 60 text

🧑💻1 1 🧑💻

Slide 61

Slide 61 text

🧑💻1 1 🧑💻

Slide 62

Slide 62 text

🧑💻1 1 🧑💻

Slide 63

Slide 63 text

🧑💻1 1 🧑💻

Slide 64

Slide 64 text

🧑💻1 1 🧑💻 🍔🍟

Slide 65

Slide 65 text

🧑💻1 1 🧑💻 🍔🍟

Slide 66

Slide 66 text

🧑💻1 1 🧑💻 🍔🍟 💶

Slide 67

Slide 67 text

How does it work?

Slide 68

Slide 68 text

🧑💻 Have we seen the request before?

Slide 69

Slide 69 text

🧑💻 Have we seen the request before? Generate response No

Slide 70

Slide 70 text

🧑💻 Have we seen the request before? Generate response Save response to cache No

Slide 71

Slide 71 text

🧑💻 Have we seen the request before? Generate response Save response to cache Return response No

Slide 72

Slide 72 text

🧑💻 Have we seen the request before? Retrieve response from cache Generate response Save response to cache Return response Yes No

Slide 73

Slide 73 text

Have we seen the request before?
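The flow built up over the preceding slides can be sketched in a few lines. This is an illustration only: the in-memory dictionary stands in for a real shared cache, and `handle` and `place_order` are hypothetical names.

```python
from typing import Callable, Dict

Response = dict

# Stand-in for a shared cache such as Redis or Memcached (assumption).
_cache: Dict[str, Response] = {}


def handle(request_id: str, generate: Callable[[], Response]) -> Response:
    """Return the cached response if we have seen the request, else build it."""
    if request_id in _cache:           # Have we seen the request before? Yes
        return _cache[request_id]      # Retrieve response from cache
    response = generate()              # No: generate response
    _cache[request_id] = response      # Save response to cache
    return response                    # Return response


orders_created = []


def place_order() -> Response:
    orders_created.append("burger and fries")
    return {"status": 201, "body": "order accepted"}


first = handle("request-1", place_order)
retry = handle("request-1", place_order)
print(first == retry, len(orders_created))
```

The open question, which the next slides address, is where `request_id` comes from.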

Slide 74

Slide 74 text

How do we know if we’ve seen a request before? Use a unique hash per request, storing it as a cache key. Where do we get the hash from?

Slide 75

Slide 75 text

Request Body Build up hash using all parameters contained in the request order

Slide 76

Slide 76 text

Request Body Build up hash using all parameters contained in the request order 🧑💻 🍔🍟

Slide 77

Slide 77 text

Request Body Build up hash using all parameters contained in the request order 🧑💻 🧑💻 🍔🍟 🍔🍟

Slide 78

Slide 78 text

Request Body Build up hash using all parameters contained in the request order 🧑💻 🧑💻 🍔🍟 🍔🍟 🏢

Slide 79

Slide 79 text

IP Address Include user’s public IP address in hash 🧑💻 🍔🍟 104.17.3.109

Slide 80

Slide 80 text

IP Address Include user’s public IP address in hash 🧑💻 🧑💻 🍔🍟 🍔🍟 🏢 104.17.3.109 104.17.3.109

Slide 81

Slide 81 text

User ID Logged-in user’s ID 🧑💻 🧑💻 🍔🍟 🍔🍟 🏢 user_id: 2552 user_id: 8303

Slide 82

Slide 82 text

User ID Logged-in user’s ID 🧑💻 🧑💻 🍔🍟 🍔🍟 🏢 user_id: 2552 user_id: 8303

Slide 83

Slide 83 text

User ID Logged-in user’s ID 🧑💻 🧑💻 🍔🍟 🍔🍟 🏢 user_id: 2552 user_id: 8303 user_id: null user_id: null

Slide 84

Slide 84 text

How do we know if we’ve seen a request before? Use a unique hash per request, storing it as a cache key. Where do we get the hash from?

Slide 85

Slide 85 text

How do we know if we’ve seen a request before? Use a unique hash per request, storing it as a cache key. Where do we get the hash from? ● Request body - but duplicate orders!

Slide 86

Slide 86 text

How do we know if we’ve seen a request before? Use a unique hash per request, storing it as a cache key. Where do we get the hash from? ● Request body - but duplicate orders! ● IP address - but shared public IPs!
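To make the two objections concrete, here is a rough sketch of a server-side fingerprint built from the request body and the caller's public IP. The function name and the payload shape are assumptions; the IP address is the one shown on the earlier slide.

```python
import hashlib
import json


def request_fingerprint(body: dict, ip: str) -> str:
    """Hash the request body together with the caller's public IP address."""
    payload = json.dumps(body, sort_keys=True) + "|" + ip
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Two different colleagues order the same lunch from the same office network.
office_ip = "104.17.3.109"
alice = request_fingerprint({"items": ["burger", "fries"]}, office_ip)
bob = request_fingerprint({"items": ["burger", "fries"]}, office_ip)

# The server cannot tell these two genuine orders apart.
print(alice == bob)
```

The second order would be treated as a replay of the first, so one customer goes hungry.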

Slide 87

Slide 87 text

Slide 88

Slide 88 text

Slide 89

Slide 89 text

🧑💻1 1 🧑💻 🍔🍟 💶

Slide 90

Slide 90 text

🧑💻1 1 🧑💻 🍔🍟 💶 Idempotency-Key: 1 Idempotency-Key: 1

Slide 91

Slide 91 text

🧑💻Idempotency-Key: 1 ● Make the client pass a key with each idempotent request ● Use this as the basis for our server-side cache ● Typically use UUIDs to avoid collisions ● Client SDKs help with key generation

Slide 92

Slide 92 text

https://github.com/stripe/stripe-php/blob/master/lib/HttpClient/CurlClient.php#L271

Slide 93

Slide 93 text

https://github.com/stripe/stripe-php/blob/master/lib/HttpClient/CurlClient.php#L271
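Client-side key generation can look roughly like the sketch below. It is not the Stripe SDK's code. `send_with_retries` and the `send` callable are hypothetical, and the only point being made is that the key is created once per logical operation and reused on every retry.

```python
import uuid
from typing import Callable


def new_idempotency_headers() -> dict:
    """Build headers carrying a fresh UUID as the idempotency key."""
    return {"Idempotency-Key": str(uuid.uuid4())}


def send_with_retries(send: Callable[[dict, dict], str], body: dict,
                      attempts: int = 3) -> str:
    """Retry a request on connection errors, reusing the same key each time."""
    headers = new_idempotency_headers()  # generated once, not per attempt
    last_error: Exception = ConnectionError("no attempts made")
    for _ in range(attempts):
        try:
            return send(body, headers)
        except ConnectionError as exc:
            last_error = exc
    raise last_error
```

Generating a new key inside the loop would defeat the purpose, because the server would see each retry as a brand new request.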

Slide 94

Slide 94 text

🧑💻 Have we seen the request before? Retrieve response from cache Generate response Save response to cache Return response Yes No Idempotency-Key: 1

Slide 95

Slide 95 text

How does it work?

Slide 96

Slide 96 text

How does it really work?

Slide 97

Slide 97 text

Application 🧑💻

Slide 98

Slide 98 text

Application Middleware 🧑💻

Slide 99

Slide 99 text

Application Middleware 🧑💻

Slide 100

Slide 100 text

No content

Slide 101

Slide 101 text

No content

Slide 102

Slide 102 text

No content

Slide 103

Slide 103 text

No content

Slide 104

Slide 104 text

No content

Slide 105

Slide 105 text

No content

Slide 106

Slide 106 text

Make app generate response

Slide 107

Slide 107 text

Make app generate response Put whole structure into cache - headers and all

Slide 108

Slide 108 text

Make app generate response Put whole structure into cache - headers and all Cache duration! Let’s come back to this…

Slide 109

Slide 109 text

No content

Slide 110

Slide 110 text

Pull generated response from cache

Slide 111

Slide 111 text

Pull generated response from cache Has client repeated the key on a different URL?

Slide 112

Slide 112 text

Pull generated response from cache Has client repeated the key on a different URL? Additional header to show it’s a replayed request
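A hedged sketch of the replay path described on this slide. The header name `Idempotency-Replayed`, the `KeyReuseError` exception, and the in-memory `_store` are all assumptions made for illustration; a real package may use different names.

```python
from typing import Callable, Dict

REPLAY_HEADER = "Idempotency-Replayed"  # hypothetical header name


class KeyReuseError(Exception):
    """Raised when a key is repeated against a different URL."""


# Maps idempotency key -> the path it was used on and the stored response.
_store: Dict[str, dict] = {}


def idempotent(key: str, path: str, generate: Callable[[], dict]) -> dict:
    cached = _store.get(key)
    if cached is None:
        response = generate()
        # Keep the whole structure: status, headers and body.
        _store[key] = {"path": path, "response": response}
        return response

    # Has the client repeated the key on a different URL?
    if cached["path"] != path:
        raise KeyReuseError(f"key {key!r} was first used on {cached['path']}")

    # Copy the stored response and flag it as a replay.
    original = cached["response"]
    return {**original, "headers": {**original["headers"], REPLAY_HEADER: "true"}}
```

The copy matters: the stored response is left untouched, so only replays carry the extra header.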

Slide 113

Slide 113 text

No content

Slide 114

Slide 114 text

Internal use?

Slide 115

Slide 115 text

How long should we cache for? Choose a TTL based on retry behaviour, business impact, and storage constraints ⏳ Short TTL (Mins - Hours) ⏱ Medium TTL (Hours to Days) 📅 Long TTL (Days to Weeks) 🔒 Infinite TTL (Persistent Storage)

Slide 116

Slide 116 text

Scenario: Mobile app order processing (Food delivery application) User Behaviour: Users place orders on their phones while commuting or walking. Retry Pattern: The mobile app automatically retries failed requests up to 3 times within a 5-minute window. Key Selection Strategy: UUIDs generally sufficient. ⏳ Short TTL (Mins - Hours) ● Most connectivity issues resolve within minutes ● Retries typically happen almost immediately or within a few minutes ● After this window, a new order attempt is likely genuinely new

Slide 117

Slide 117 text

Scenario: Batch payment processing (B2B SaaS platform) User Behaviour: Businesses run scheduled payment batch jobs that process hundreds of transactions. Retry Pattern: Failed batches are commonly retried within the same business day or the next morning. Key Selection Strategy: UUIDs still generally sufficient - user/session identifiers may be helpful for multi-day caches. ⏱ Medium TTL (Hours to Days) ● Business hours and operational patterns dictate retry windows ● System failures may take several hours to resolve ● Next-day retries are common in business workflows

Slide 118

Slide 118 text

Scenario: Subscription Management System (B2B SaaS platform handling subscription payments) User Behaviour: Users change subscription tiers, add premium features, etc. Retry Pattern: Support teams may need to reprocess failed changes days later after verification. Key Selection Strategy: Consider additional business context added to the key. 📅 Long TTL (Days to Weeks) ● Customer service tickets often take days to resolve ● Subscription changes have billing cycle implications ● Retry attempts may happen after delays with customer communication

Slide 119

Slide 119 text

Scenario: Regulatory Compliance Reporting (Financial reporting system submitting legally-required transaction reports) User Behaviour: System submits mandatory reports to government agencies. Retry Pattern: Failed submissions must be retried indefinitely until successful, but must never be duplicated. Key Selection Strategy: Additional metadata about requester. 🔒 Infinite TTL (Persistent Storage) ● Regulatory requirements prohibit both missed and duplicate reports ● Legal penalties for non-compliance are severe ● The reporting requirement never expires
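The four tiers could be expressed as configuration along these lines. The concrete durations are illustrative assumptions, not recommendations from the talk, and `TtlStore` is a hypothetical in-memory stand-in for a real cache or database.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Illustrative values only; pick durations from your own retry patterns.
TTL_SECONDS: Dict[str, Optional[int]] = {
    "short": 15 * 60,            # mobile order retries
    "medium": 24 * 60 * 60,      # next-day batch retries
    "long": 14 * 24 * 60 * 60,   # support-driven reprocessing
    "infinite": None,            # never expires, use persistent storage
}


class TtlStore:
    """Tiny key/value store where each entry may carry an expiry time."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: Dict[str, Tuple[object, Optional[float]]] = {}

    def put(self, key: str, value: object, ttl: Optional[int]) -> None:
        expires = None if ttl is None else self._clock() + ttl
        self._items[key] = (value, expires)

    def get(self, key: str) -> Optional[object]:
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires = entry
        if expires is not None and self._clock() >= expires:
            del self._items[key]  # expired: a repeat is now a new request
            return None
        return value
```

Once an entry expires, a request with the same key is processed again, which is why the TTL has to outlast the longest realistic retry.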

Slide 120

Slide 120 text

What about errors? Should errors be cached? Should we allow retries? https://docs.stripe.com/api/idempotent_requests

Slide 121

Slide 121 text

What about errors? Should errors be cached? Should we allow retries? Stripe “[..] works by saving the resulting status code and body of the first request made for any given idempotency key, regardless of whether it succeeds or fails. Subsequent requests with the same key return the same result, including 500 errors.” https://docs.stripe.com/api/idempotent_requests

Slide 122

Slide 122 text

Race conditions 🧑💻 ⏳

Slide 123

Slide 123 text

Race conditions 🧑💻 ⏳

Slide 124

Slide 124 text

Race conditions 🧑💻 ⏳

Slide 125

Slide 125 text

Cache locking Using a lock for the process allows us to ensure only one process handles the request. ● Try to get a cache lock ● If we can, process the request then release the lock ● If we can’t, the same request is being processed by another process. Wait until it’s done ● Allow for timeouts
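The steps above can be sketched with an in-process lock per idempotency key. This is an assumption-laden illustration: a deployment with several servers would need a lock held in the shared cache instead of `threading.Lock`, and `handle_once` is a hypothetical name.

```python
import threading
from typing import Callable, Dict

_locks: Dict[str, threading.Lock] = {}
_locks_guard = threading.Lock()
_responses: Dict[str, dict] = {}


def _lock_for(key: str) -> threading.Lock:
    """Return the lock for a key, creating it on first use."""
    with _locks_guard:
        return _locks.setdefault(key, threading.Lock())


def handle_once(key: str, generate: Callable[[], dict],
                timeout: float = 5.0) -> dict:
    lock = _lock_for(key)
    # Try to get the lock; if another worker holds it, wait up to `timeout`.
    if not lock.acquire(timeout=timeout):
        raise TimeoutError(f"gave up waiting for key {key!r}")
    try:
        # Whoever gets here first does the work; later callers find the result.
        if key not in _responses:
            _responses[key] = generate()
        return _responses[key]
    finally:
        lock.release()  # always release, even if generate() raises
```

The `finally` block covers the "make sure to release the lock" point, and the `timeout` argument covers "allow for timeouts".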

Slide 126

Slide 126 text

Cache locking

Slide 127

Slide 127 text

Cache locking Can’t get a lock? Wait for other process to finish

Slide 128

Slide 128 text

Cache locking Can’t get a lock? Wait for other process to finish Make sure to release the lock when we’re done!

Slide 129

Slide 129 text

No content

Slide 130

Slide 130 text

Another process finished handling the request?

Slide 131

Slide 131 text

Another process finished handling the request? Sleep for a moment and check again

Slide 132

Slide 132 text

Another process finished handling the request? Sleep for a moment and check again Something has gone wrong, waiting too long
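The waiting side might be written as a simple polling loop. The function name, the default timings, and the use of a plain mapping as the cache are assumptions for this sketch.

```python
import time
from typing import Mapping


def wait_for_response(cache: Mapping[str, dict], key: str,
                      timeout: float = 2.0, interval: float = 0.05) -> dict:
    """Poll the cache until another process stores the response for `key`."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Has the other process finished handling the request?
        if key in cache:
            return cache[key]
        # Not yet: sleep for a moment and check again.
        time.sleep(interval)
    # Something has gone wrong; we have been waiting too long.
    raise TimeoutError(f"no response stored for key {key!r} within {timeout}s")
```

How long to hold a request before giving up is a tuning decision that depends on how slow the underlying operation can legitimately be.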

Slide 133

Slide 133 text

https://github.com/square1-io/laravel-idempotency

Slide 134

Slide 134 text

Laravel Implementation

Slide 135

Slide 135 text

No content

Slide 136

Slide 136 text

Race conditions - how long to “hold” a request

Slide 137

Slide 137 text

Race conditions - how long to “hold” a request Custom user id resolver

Slide 138

Slide 138 text

Race conditions - how long to “hold” a request Custom user id resolver Replay or throw exception?

Slide 139

Slide 139 text

https://apisyouwonthate.com/blog/idemptoency-keys/ Error on idempotent retry after success

Slide 140

Slide 140 text

Slide 141

Slide 141 text

Implementing Idempotency ● Decide on the endpoints which need it ● Select appropriate key cache TTLs ● Document idempotent operations in your API ● Allow your users to trust an operation is idempotent

Slide 142

Slide 142 text

No content

Slide 143

Slide 143 text

Registered As Canonical Username User ID Bigbird bigbird 123

Slide 144

Slide 144 text

Registered As Canonical Username User ID Bigbird bigbird 123

Slide 145

Slide 145 text

Registered As Canonical Username User ID Bigbird bigbird 123 BIGBIRD bigbird

Slide 146

Slide 146 text

Registered As Canonical Username User ID Bigbird bigbird 123 BIGBIRD bigbird

Slide 147

Slide 147 text

Bigbird BIGBIRD BigBird BiGBiRD getCanonicalUsername(user) bigbird

Slide 148

Slide 148 text

Bigbird BIGBIRD BigBird BiGBiRD getCanonicalUsername(user) bigbird

Slide 149

Slide 149 text

Registered As Canonical Username User ID Bigbird bigbird 123

Slide 150

Slide 150 text

Registered As Canonical Username User ID Bigbird bigbird 123 ᴮᴵᴳᴮᴵᴿᴰ

Slide 151

Slide 151 text

Registered As Canonical Username User ID Bigbird bigbird 123 ᴮᴵᴳᴮᴵᴿᴰ u'\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30'

Slide 152

Slide 152 text

Registered As Canonical Username User ID Bigbird bigbird 123 ᴮᴵᴳᴮᴵᴿᴰ BIGBIRD u'\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30'

Slide 153

Slide 153 text

Registered As Canonical Username User ID Bigbird bigbird 123 ᴮᴵᴳᴮᴵᴿᴰ BIGBIRD u'\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30' 456

Slide 154

Slide 154 text

No content

Slide 155

Slide 155 text

No content

Slide 156

Slide 156 text

/reset?user=BIGBIRD&...

Slide 157

Slide 157 text

/reset?user=BIGBIRD&...

Slide 158

Slide 158 text

/reset?user=BIGBIRD&...

Slide 159

Slide 159 text

/reset?user=BIGBIRD&...

Slide 160

Slide 160 text

What went wrong? ● getCanonicalUsername relied on nodeprep.prepare ● Input string not being valid Unicode 3.2 meant nodeprep.prepare was no longer idempotent!  ● Fixed by double-checking username, and subsequently a library update
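The failure mode can be reproduced with a toy function. To be clear, this is not `nodeprep.prepare` and not Spotify's code: `naive_canonical` is a deliberately flawed stand-in that lower-cases before applying NFKC normalisation, and `safe_canonical` is one hypothetical way to double-check the result.

```python
import unicodedata


def naive_canonical(username: str) -> str:
    """Flawed on purpose: lower-case first, then normalise compatibility forms."""
    return unicodedata.normalize("NFKC", username.lower())


def safe_canonical(username: str) -> str:
    """Reject any username whose canonical form changes on a second pass."""
    once = naive_canonical(username)
    if naive_canonical(once) != once:
        raise ValueError("username does not canonicalise to a stable value")
    return once


tricky = "\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30"  # the modifier-letter name

first = naive_canonical(tricky)   # modifier letters have no lower-case form,
                                  # so NFKC turns them into plain capitals
second = naive_canonical(first)   # a second pass lower-cases those capitals

print(first, second, first == second)
```

Because the first and second passes disagree, the function is not idempotent, and any code path that canonicalises a different number of times ends up looking at a different account.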

Slide 161

Slide 161 text

An operation is idempotent if applying it multiple times has the same effect as applying it once.

Slide 162

Slide 162 text

Takeaways ● Relying on server-side hashes alone to identify repeat requests is risky ● Make the client do the work! ● Cache TTL appropriate to your use case ● Don’t worry about HTTP verbs that are already idempotent - consider DELETE! ● Replay behaviour - document your choice for your users ● Extra header helps with debugging

Slide 163

Slide 163 text

Slide 164

Slide 164 text

Further reading ● Making retries safe with idempotent APIs (AWS) https://aws.amazon.com/builders-library/making-retries-safe-with-idempotent-APIs/ ● Designing robust and predictable APIs with idempotency (Stripe) https://stripe.com/blog/idempotency ● Creative usernames and Spotify account hijacking (Spotify) https://engineering.atspotify.com/2013/06/creative-usernames/ ● Idempotency - what is it, and how can it help our Laravel APIs? https://www.conroyp.com/articles/what-is-idempotency-add-to-laravel-apis ● Laravel Idempotency Package https://github.com/square1-io/laravel-idempotency