Avoiding Déjà Vu: Building Resilient APIs with Idempotency

Paul Conroy / @conroyp
5th June, 2025

Paul Conroy 👴 🌍 🇮🇪
• From Dublin, Ireland
• Started playing with the web 30+ years ago (Notepad, Frontpage & Geocities!)
• CTO at Square1
• conroyp.com / @conroyp

https://en.wikipedia.org/wiki/Déjà_vu

How does Déjà Vu affect our API servers?
• 💸 Double charges
• 📦 Duplicate orders
• 🐛 Data integrity issues
• 🤬 Angry customers
• 🗣 Higher support costs

What is idempotency?

How do I pronounce it? eye-dum-po-ten-see

https://en.wikipedia.org/wiki/Idempotence
• “Idem” - same
• “Potence” - power

An operation is idempotent if applying it multiple times has
the same effect as applying it once.
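
To make the definition concrete, here is a minimal sketch in Python. The cart structure and both function names are hypothetical, chosen only for illustration: setting a quantity is idempotent, while adding to a quantity is not.

```python
def set_quantity(cart: dict, item: str, quantity: int) -> dict:
    """Idempotent: repeating the call leaves the cart in the same state."""
    cart[item] = quantity
    return cart


def add_quantity(cart: dict, item: str, quantity: int) -> dict:
    """Not idempotent: every call changes the cart again."""
    cart[item] = cart.get(item, 0) + quantity
    return cart


cart_a = {}
set_quantity(cart_a, "burger", 1)
set_quantity(cart_a, "burger", 1)  # a retry: cart_a is unchanged

cart_b = {}
add_quantity(cart_b, "burger", 1)
add_quantity(cart_b, "burger", 1)  # a retry: cart_b now holds two burgers
```

A retried `set_quantity` call is harmless; a retried `add_quantity` call is the duplicate order.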

Idempotency in our APIs

🧑💻 🍔🍟 💶

Reasons for retries
• Impatient users
• Mobile apps retrying failed connections
• Load balancer failovers
• CDN fallbacks

Consequences of missing idempotency
• 🤬 Angry customers
• 🗣 Higher support costs

Who uses it?

How can we make sure the same request gets processed
only once?

HTTP Verbs
Naturally idempotent:
• GET
• HEAD
• OPTIONS
• PUT
• DELETE
Non-idempotent verbs:
• POST
• PATCH
Observable side-effects
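
The difference between the two groups shows up in server state rather than in the response code. The sketch below uses a hypothetical in-memory `orders` store and made-up handler names; it is not a real framework API. Repeating PUT or DELETE leaves the store in the same state (even though the second DELETE answers differently), while repeating POST creates a second resource.

```python
import itertools

orders = {}                      # hypothetical in-memory resource store
_next_id = itertools.count(1)    # IDs handed out by POST


def put_order(order_id, body):
    """PUT replaces the resource at a known ID, so a repeat changes nothing further."""
    orders[order_id] = body
    return 200


def delete_order(order_id):
    """DELETE leaves the same end state on every call; only the status code differs."""
    if order_id in orders:
        del orders[order_id]
        return 204
    return 404


def post_order(body):
    """POST creates a new resource on every call."""
    order_id = next(_next_id)
    orders[order_id] = body
    return 201, order_id


meal = {"items": ["burger", "fries"]}

put_order(100, meal)
put_order(100, meal)                  # retry: still exactly one order stored
count_after_put = len(orders)

post_order(meal)
post_order(meal)                      # retry: a duplicate order is created
count_after_post = len(orders)

first_delete = delete_order(100)
second_delete = delete_order(100)     # retry: different status, same state
count_after_delete = len(orders)
```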

🧑💻1 2 🧑💻 🍔🍟 💸💸 🍔🍟

🧑💻1 1 🧑💻 🍔🍟 💶

How does it work?

🧑💻 Have we seen the request before?
• Yes: Retrieve response from cache, then return response
• No: Generate response, save response to cache, then return response
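
The flow above can be sketched in a few lines of Python. This is an illustration under assumptions, not the implementation from the talk: the cache is a plain dictionary, and `handle`, `place_order` and the request key are hypothetical names.

```python
response_cache = {}   # stands in for a real cache such as Redis
charges = []          # records every time the customer is charged


def handle(request_key, generate_response):
    """Return the cached response if we have seen this request before."""
    if request_key in response_cache:        # Yes: retrieve response from cache
        return response_cache[request_key]
    response = generate_response()           # No: generate response
    response_cache[request_key] = response   # save response to cache
    return response                          # return response


def place_order():
    charges.append(12.50)
    return {"status": 201, "body": {"order": "burger and fries"}}


first = handle("request-1", place_order)
retry = handle("request-1", place_order)     # replayed, no second charge
```

How `request_key` is chosen is the hard part, and is what the following slides deal with.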

Have we seen the request before?

How do we know if we’ve seen a request before?
Use a unique hash per request, storing it as a cache key. Where do we get the hash from?

Request Body
Build up hash using all parameters contained in the request order
🧑💻 🧑💻 🍔🍟 🍔🍟 🏢
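
A minimal sketch of a body-only hash, assuming a JSON request body. The `body_hash` helper and the order fields are hypothetical. It shows the weakness of this approach: two different customers ordering the same meal from the same restaurant produce the same hash, so the second, genuine order would look like a repeat.

```python
import hashlib
import json


def body_hash(body: dict) -> str:
    """Hash the request body in a canonical form (sorted keys, no extra whitespace)."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two separate customers placing the same order.
first_customer_order = {"items": ["burger", "fries"], "restaurant_id": 42}
second_customer_order = {"restaurant_id": 42, "items": ["burger", "fries"]}

collision = body_hash(first_customer_order) == body_hash(second_customer_order)
```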

IP Address
Include user’s public IP address in hash
🧑💻 🧑💻 🍔🍟 🍔🍟 🏢 104.17.3.109 104.17.3.109

User ID
Logged-in user’s ID
🧑💻 🧑💻 🍔🍟 🍔🍟 🏢
• Logged-in users: user_id: 2552, user_id: 8303
• Guest checkouts: user_id: null, user_id: null

Where do we get the hash from?
• Request body - but duplicate orders!
• IP address - but shared public IPs!
• User ID - but guest checkouts!
Make the client do some work!

🧑💻1 1 🧑💻 🍔🍟 💶 Idempotency-Key: 1 Idempotency-Key: 1

🧑💻 Idempotency-Key: 1
• Make the client pass a key with each idempotent request
• Use this as the basis for our server-side cache
• Typically use UUIDs to avoid collisions
• Client SDKs help with key generation
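
On the client side, the important detail is that the key is generated once per logical operation and reused for every retry of it. The sketch below only builds the requests and never sends them; the URL and the `build_order_request` helper are assumptions for illustration.

```python
import json
import urllib.request
import uuid

API_URL = "https://api.example.com/orders"   # hypothetical endpoint


def build_order_request(body: dict, idempotency_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the idempotency key."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Idempotency-Key": idempotency_key,
        },
        method="POST",
    )


order = {"items": ["burger", "fries"]}

# One key per logical operation, generated before the first attempt...
key = str(uuid.uuid4())
first_attempt = build_order_request(order, key)
# ...and reused unchanged when the client retries.
retry_attempt = build_order_request(order, key)

# A genuinely new order gets a fresh key.
new_order_key = str(uuid.uuid4())
```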

https://github.com/stripe/stripe-php/blob/master/lib/HttpClient/CurlClient.php#L271

🧑💻 Idempotency-Key: 1
Have we seen the request before?
• Yes: Retrieve response from cache, then return response
• No: Generate response, save response to cache, then return response

How does it really work?

Application Middleware 🧑💻

• Make app generate response
• Put whole structure into cache - headers and all
• Cache duration! Let’s come back to this…
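
A rough sketch of "put whole structure into cache", assuming a simple in-process cache with a time-to-live. `TTLCache` and `snapshot` are hypothetical names, and a production system would more likely use a shared store. The injected clock exists only so that expiry can be shown without waiting.

```python
import time


class TTLCache:
    """A tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, clock=time.monotonic):
        self._entries = {}
        self._clock = clock

    def put(self, key, value, ttl_seconds):
        self._entries[key] = (self._clock() + ttl_seconds, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]   # expired: forget the key
            return None
        return value


def snapshot(status, headers, body):
    """Capture the whole response, headers and all, so a replay looks identical."""
    return {"status": status, "headers": dict(headers), "body": body}


now = [0.0]                                   # fake clock for the demonstration
cache = TTLCache(clock=lambda: now[0])

saved = snapshot(201, {"Content-Type": "application/json"}, '{"order_id": 7}')
cache.put("key-1", saved, ttl_seconds=60)

now[0] = 59.0
before_expiry = cache.get("key-1")            # still cached
now[0] = 61.0
after_expiry = cache.get("key-1")             # expired, treated as a new request
```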

• Pull generated response from cache
• Has client repeated the key on a different URL?
• Additional header to show it’s a replayed request
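
The replay path could look something like the following sketch. Everything here is an assumption for illustration: the `remember` and `replay_or_none` helpers, the error type, and the header name `Idempotent-Replayed`, which a real implementation may spell differently.

```python
REPLAY_HEADER = "Idempotent-Replayed"   # assumed header name, for illustration only

stored = {}                             # idempotency key -> fingerprint + response


class KeyReusedError(Exception):
    """The client sent a known key with a different method or URL."""


def remember(key, method, path, response):
    """Save the response together with the method and path it was generated for."""
    stored[key] = {"fingerprint": (method, path), "response": response}


def replay_or_none(key, method, path):
    """Return the saved response, marked as a replay, or None for an unseen key."""
    entry = stored.get(key)
    if entry is None:
        return None
    if entry["fingerprint"] != (method, path):
        raise KeyReusedError(f"key {key!r} was first used on {entry['fingerprint']}")
    replay = dict(entry["response"])
    # Copy the headers so the stored response itself is left untouched.
    replay["headers"] = {**replay["headers"], REPLAY_HEADER: "true"}
    return replay


remember("key-1", "POST", "/orders", {"status": 201, "headers": {}, "body": "{}"})

replayed = replay_or_none("key-1", "POST", "/orders")
unseen = replay_or_none("key-2", "POST", "/orders")

try:
    replay_or_none("key-1", "POST", "/refunds")
    mismatch_detected = False
except KeyReusedError:
    mismatch_detected = True
```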

Internal use?

How long should we cache for?
Choose a TTL based on retry behaviour, business impact, and storage constraints
• ⏳ Short TTL (Mins - Hours)
• ⏱ Medium TTL (Hours to Days)
• 📅 Long TTL (Days to Weeks)
• 🔒 Infinite TTL (Persistent Storage)

⏳ Short TTL (Mins - Hours)
Scenario: Mobile app order processing (Food delivery application)
User Behaviour: Users place orders on their phones while commuting or walking.
Retry Pattern: The mobile app automatically retries failed requests up to 3 times within a 5-minute window.
Key Selection Strategy: UUIDs generally sufficient.
• Most connectivity issues resolve within minutes
• Retries typically happen almost immediately or within a few minutes
• After this window, a new order attempt is likely genuinely new

⏱ Medium TTL (Hours to Days)
Scenario: Batch payment processing (B2B SaaS platform)
User Behaviour: Businesses run scheduled payment batch jobs that process hundreds of transactions.
Retry Pattern: Failed batches are commonly retried within the same business day or the next morning.
Key Selection Strategy: UUIDs still generally sufficient - user/session identifiers may be helpful for multi-day caches.
• Business hours and operational patterns dictate retry windows
• System failures may take several hours to resolve
• Next-day retries are common in business workflows

📅 Long TTL (Days to Weeks)
Scenario: Subscription Management System (B2B SaaS platform handling subscription payments)
User Behaviour: Users change subscription tiers, add premium features, etc.
Retry Pattern: Support teams may need to reprocess failed changes days later after verification.
Key Selection Strategy: Consider additional business context added to the key.
• Customer service tickets often take days to resolve
• Subscription changes have billing cycle implications
• Retry attempts may happen after delays with customer communication

🔒 Infinite TTL (Persistent Storage)
Scenario: Regulatory Compliance Reporting (Financial reporting system submitting legally-required transaction reports)
User Behaviour: System submits mandatory reports to government agencies.
Retry Pattern: Failed submissions must be retried indefinitely until successful, but must never be duplicated.
Key Selection Strategy: Additional metadata about requester.
• Regulatory requirements prohibit both missed and duplicate reports
• Legal penalties for non-compliance are severe
• The reporting requirement never expires
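
One way to capture these four tiers is a small per-scenario configuration table. The scenario names and the exact durations below are illustrative assumptions chosen to fall inside each tier; they are not values recommended by the talk.

```python
from datetime import timedelta
from typing import Optional

# Illustrative values only: each one sits inside the tier described above.
IDEMPOTENCY_TTLS = {
    "mobile_order": timedelta(minutes=30),        # short: minutes to hours
    "batch_payment": timedelta(days=2),           # medium: hours to days
    "subscription_change": timedelta(weeks=2),    # long: days to weeks
    "compliance_report": None,                    # infinite: persistent storage
}


def ttl_seconds(scenario: str) -> Optional[int]:
    """Return the cache TTL in seconds, or None when the key must never expire."""
    ttl = IDEMPOTENCY_TTLS[scenario]
    return None if ttl is None else int(ttl.total_seconds())
```

A `None` TTL signals that the key belongs in durable storage rather than in an expiring cache.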

What about errors?
Should errors be cached? Should we allow retries?
Stripe: “[..] works by saving the resulting status code and body of the first request made for any given idempotency key, regardless of whether it succeeds or fails. Subsequent requests with the same key return the same result, including 500 errors.”
https://docs.stripe.com/api/idempotent_requests
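
The two options can be compared with a small sketch. The `handle` function and its `cache_server_errors` flag are hypothetical; the flag set to `True` mimics the behaviour described in the Stripe quote, while `False` lets a retry with the same key run the operation again after a 5xx response.

```python
def handle(key, operation, cache, cache_server_errors):
    """Process a request, optionally remembering 5xx responses as well."""
    if key in cache:
        return cache[key]
    response = operation()
    if response["status"] < 500 or cache_server_errors:
        cache[key] = response
    return response


def flaky_operation():
    """Build an operation that fails on its first call and succeeds on its second."""
    outcomes = iter([{"status": 500}, {"status": 201}])
    return lambda: next(outcomes)


# Policy 1: errors are cached, so the retry replays the 500.
cached_errors = {}
op = flaky_operation()
first_a = handle("key-1", op, cached_errors, cache_server_errors=True)
retry_a = handle("key-1", op, cached_errors, cache_server_errors=True)

# Policy 2: server errors are not cached, so the retry is processed again.
uncached_errors = {}
op = flaky_operation()
first_b = handle("key-1", op, uncached_errors, cache_server_errors=False)
retry_b = handle("key-1", op, uncached_errors, cache_server_errors=False)
```

Whichever policy is chosen, it is worth documenting, since clients need to know whether retrying with the same key can ever succeed after a failure.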

Race conditions 🧑💻 ⏳

Cache locking
Using a lock for the process allows us to ensure only one process handles the request.
• Try to get a cache lock
• If we can, process the request then release the lock
• If we can’t, the same request is being processed by another process. Wait until it’s done
• Allow for timeouts

Cache locking
• Can’t get a lock? Wait for other process to finish
• Make sure to release the lock when we’re done!

• Another process finished handling the request?
• Sleep for a moment and check again
• Something has gone wrong, waiting too long
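
The locking steps described above can be sketched with threads standing in for separate server processes. This is an assumption-laden illustration: a per-key `threading.Lock` replaces a shared cache lock, and `handle`, `lock_for`, `RequestTimeout` and the timing values are all hypothetical.

```python
import threading
import time

responses = {}                     # idempotency key -> saved response
locks = {}                         # idempotency key -> lock
locks_guard = threading.Lock()     # protects the locks dictionary itself


class RequestTimeout(Exception):
    """Raised when we have waited too long for another worker to finish."""


def lock_for(key):
    with locks_guard:
        return locks.setdefault(key, threading.Lock())


def handle(key, operation, max_wait=2.0, poll_interval=0.01):
    lock = lock_for(key)
    if lock.acquire(blocking=False):          # try to get the lock
        try:
            if key not in responses:          # nobody has finished this request yet
                responses[key] = operation()
            return responses[key]
        finally:
            lock.release()                    # always release when we're done
    # Can't get the lock: another worker is handling the same request.
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        if key in responses:                  # the other worker has finished
            return responses[key]
        time.sleep(poll_interval)             # sleep for a moment and check again
    raise RequestTimeout(f"gave up waiting for key {key!r}")


calls = []


def charge_card():
    calls.append("charged")
    time.sleep(0.05)                          # simulate slow payment processing
    return {"status": 201}


results = []
workers = [
    threading.Thread(target=lambda: results.append(handle("key-1", charge_card)))
    for _ in range(2)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
```

Both workers end up with the same response, and the card is charged once.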

https://github.com/square1-io/laravel-idempotency

Laravel Implementation

• Race conditions - how long to “hold” a request
• Custom user id resolver
• Replay or throw exception?

Error on idempotent retry after success
https://apisyouwonthate.com/blog/idemptoency-keys/

Implementing Idempotency
• Decide on the endpoints which need it
• Select appropriate key cache TTLs
• Document idempotent operations in your API
• Allow your users to trust an operation is idempotent

Registered As | Canonical Username | User ID
Bigbird | bigbird | 123
BIGBIRD | bigbird |

Bigbird, BIGBIRD, BigBird, BiGBiRD → getCanonicalUsername(user) → bigbird

Registered As | Canonical Username | User ID
Bigbird | bigbird | 123
ᴮᴵᴳᴮᴵᴿᴰ (u'\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30') | BIGBIRD | 456

/reset?user=BIGBIRD&...

What went wrong?
• getCanonicalUsername relied on nodeprep.prepare
• The input string was not valid Unicode 3.2, which meant nodeprep.prepare was no longer idempotent!
• Fixed by double-checking the username, and subsequently by a library update
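
The shape of this bug can be reproduced with a stand-in. The `naive_canonical` function below is not `nodeprep.prepare`; it is a deliberately flawed substitute (lowercase first, then NFKC-normalise) that happens to show the same symptom on the same input. The `is_stable` guard is a hypothetical version of the "double-check" fix: apply the canonicalisation twice and refuse the username if the result changes.

```python
import unicodedata


def naive_canonical(username: str) -> str:
    """Flawed stand-in: lowercasing BEFORE normalising is not idempotent."""
    return unicodedata.normalize("NFKC", username.lower())


def is_stable(canonicalise, username: str) -> bool:
    """Double-check: canonicalising a second time must not change the result."""
    once = canonicalise(username)
    return canonicalise(once) == once


tricky = "\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30"   # ᴮᴵᴳᴮᴵᴿᴰ

first_pass = naive_canonical(tricky)         # the superscript letters become "BIGBIRD"
second_pass = naive_canonical(first_pass)    # a second pass lowercases to "bigbird"

tricky_allowed = is_stable(naive_canonical, tricky)       # should be rejected
plain_allowed = is_stable(naive_canonical, "Bigbird")     # should be accepted
```

The modifier letters have no lowercase mapping, so the first pass only normalises them into ordinary capitals. The second pass then lowercases those capitals, landing on another user's canonical name.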

An operation is idempotent if applying it multiple times has
the same effect as applying it once.

Takeaways
• Relying on server side hashes alone to identify repeat requests is risky
• Make the client do the work!
• Cache TTL appropriate to your use case
• Don’t worry about HTTP verbs that are already idempotent - consider DELETE!
• Replay behaviour - document your choice for your users
• Extra header helps with debugging
Idempotency stops your API backend from reliving the same request, over and over again.

Further reading
• Making retries safe with idempotent APIs (AWS) https://aws.amazon.com/builders-library/making-retries-safe-with-idempotent-APIs/
• Designing robust and predictable APIs with idempotency (Stripe) https://stripe.com/blog/idempotency
• Creative usernames and Spotify account hijacking (Spotify) https://engineering.atspotify.com/2013/06/creative-usernames/
• Idempotency - what is it, and how can it help our Laravel APIs? https://www.conroyp.com/articles/what-is-idempotency-add-to-laravel-apis
• Laravel Idempotency Package https://github.com/square1-io/laravel-idempotency

Thank you!
Conroyp.com / @conroyp
