
Back on your Feet

When writing resilient Elixir applications, one of our major concerns is state: where do we store it, what happens to it when a process crashes, and how do we efficiently recreate it?
In this talk, we'll look at an example application and apply different techniques that can be used to protect and recover state, each one with specific implications and tradeoffs.
All of the techniques shown in this talk can be applied to everyday development.

Claudio Ortolina

September 07, 2017


Transcript

  1. HOW IT WORKS
     ➤ Given my location (defined by lat/lng), get a list of relevant gigs
     ➤ For each one of them, get the artists involved
     ➤ For each artist, get their latest release
  2. DATA MODEL

     defmodule Gig.Location do
       defstruct coords: {0, 0},
                 metro_area: nil,
                 event_ids: []
     end

     defmodule Gig.Event do
       defstruct id: nil,
                 name: nil,
                 artists: [],
                 venue: nil,
                 starts_at: :not_available
     end
  3. DATA MODEL

     defmodule Gig.Artist do
       defstruct id: nil,
                 mbid: nil,
                 name: nil
     end

     defmodule Gig.Release do
       defstruct id: nil,
                 title: nil,
                 type: "Album",
                 release_date: :not_available
     end
  4. PAIN POINTS
     ➤ Cannot query the APIs in real time (too expensive: N+1 API calls)
     ➤ Both APIs are rate-limited
     ➤ Need to cache data
     ➤ Results update over time without us knowing anything about it (making polling necessary)
  5. ONE LOCATION, ONE PROCESS
     ➤ For each location, start a new process
     ➤ We use a Registry to track them
     ➤ Each process fetches and refreshes its own data
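The per-location process can be sketched as a GenServer registered through a Registry and keyed by its coordinates. This is a minimal sketch under stated assumptions: the module names `Gig.LocationServer` and `Gig.Registry`, the refresh interval, and the state shape are illustrative, not from the talk.

```elixir
defmodule Gig.LocationServer do
  use GenServer

  # Start one process per location, registered under its coordinates
  # via a Registry (assumed to be started elsewhere as Gig.Registry).
  def start_link(coords) do
    name = {:via, Registry, {Gig.Registry, coords}}
    GenServer.start_link(__MODULE__, coords, name: name)
  end

  def init(coords) do
    # Kick off the first data refresh as soon as the process starts.
    send(self(), :refresh)
    {:ok, %{coords: coords, event_ids: []}}
  end

  def handle_info(:refresh, state) do
    # Fetch gigs for this location here, then schedule the next refresh.
    Process.send_after(self(), :refresh, :timer.minutes(30))
    {:noreply, state}
  end
end
```

In a real application the Registry would be started in the supervision tree with `{Registry, keys: :unique, name: Gig.Registry}`, and lookups go through `Registry.lookup/2`.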
  6. PROS
     ➤ Basic isolation (an isolated process crash doesn't affect others)
     ➤ Scales predictably (memory usage)
     ➤ Easy expiry (the process can self-terminate)
  7. CONS
     ➤ A repeated failure of a single process can take down the application tree
     ➤ Events are duplicated among processes
  8. EXTRACT DATA STORAGE
     ➤ Events and releases moved to shared ETS tables
     ➤ Process keeps location data + list of event ids
     ➤ Requires periodic cleanup of storage (in case of crashes, data may get stale)
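A shared ETS table for events could look like the sketch below. The table name and the `Gig.EventStore` helper module are assumptions for illustration; the talk only states that events and releases move to shared ETS tables.

```elixir
defmodule Gig.EventStore do
  @table :gig_events

  # Public named table with concurrent reads and writes.
  # Note: an ETS table dies with its owner, so in a real app the
  # table should be created by a long-lived supervised process.
  def init do
    :ets.new(@table, [
      :set,
      :public,
      :named_table,
      read_concurrency: true,
      write_concurrency: true
    ])
  end

  # Store an event keyed by its id.
  def put(event), do: :ets.insert(@table, {event.id, event})

  # Look up an event directly, without going through any process.
  def get(id) do
    case :ets.lookup(@table, id) do
      [{^id, event}] -> {:ok, event}
      [] -> :not_found
    end
  end
end
```

Because the table is `:public`, any process can read and write it concurrently, which is what makes the later step of serving all reads straight from ETS possible.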
  9. PROS
     ➤ More efficient memory usage
     ➤ Concurrent reads and writes
     ➤ Data survives everything except a node crash
  10. MORE EXTRACTIONS
     ➤ Move locations to ETS
     ➤ Don't go through the process for any reads
     ➤ The process is only responsible for refresh and expire
  11. PROS
     ➤ Fast concurrent lookup for everything
     ➤ Survives refresh crashes (worst-case scenario is stale data)
  12. GOING DISTRIBUTED
     ➤ Discrete pieces of data linked by references (event ids, MusicBrainz ids)
     ➤ If any reference points to nonexistent data, we can trigger a refresh and expose the inconsistent state to the API consumer, so that the user has the right expectations
     ➤ For sharding under normal distribution, we can replace ETS with shards (or an equivalent library)
     ➤ Another option is a classic, external datastore (which allows horizontal scaling)
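The "dangling reference triggers a refresh" idea can be sketched as follows. Everything here is illustrative: the `Gig.Resolver` module, the plain-map store, and the `refresh_fun` callback are assumptions standing in for whatever storage and refresh mechanism the application uses.

```elixir
defmodule Gig.Resolver do
  # Resolve a reference against a store. If the data is missing,
  # trigger a refresh and return a value that lets the API consumer
  # know the result is temporarily incomplete.
  def resolve(store, id, refresh_fun) do
    case Map.fetch(store, id) do
      {:ok, data} ->
        {:ok, data}

      :error ->
        # Expose the inconsistency instead of hiding it, and kick off
        # a refresh so the data converges on subsequent requests.
        refresh_fun.(id)
        {:pending, id}
    end
  end
end
```

Returning `{:pending, id}` rather than raising keeps the API honest about its eventually-consistent nature, which is the expectation the slide wants to set for the user.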
  13. RATE LIMIT
     ➤ Interaction with external APIs can be modelled with a queue which fetches while respecting a rate limit
     ➤ The API client should also use a rate limiter to avoid being blocked
     ➤ Load testing with the rate limit in place is key
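A queue that fetches while respecting a rate limit can be sketched as a GenServer that dequeues at most one job per tick. The module name, the cast-based API, and the fixed tick interval are assumptions; a production version would also need backoff and error handling.

```elixir
defmodule Gig.RateLimitedQueue do
  use GenServer

  def start_link(interval_ms) do
    GenServer.start_link(__MODULE__, interval_ms)
  end

  # Enqueue a zero-arity function representing one API request.
  def enqueue(pid, fun), do: GenServer.cast(pid, {:enqueue, fun})

  def init(interval_ms) do
    send(self(), :tick)
    {:ok, %{queue: :queue.new(), interval: interval_ms}}
  end

  def handle_cast({:enqueue, fun}, state) do
    {:noreply, %{state | queue: :queue.in(fun, state.queue)}}
  end

  def handle_info(:tick, state) do
    # Run at most one queued request per tick, enforcing the rate limit.
    queue =
      case :queue.out(state.queue) do
        {{:value, fun}, rest} ->
          fun.()
          rest

        {:empty, rest} ->
          rest
      end

    Process.send_after(self(), :tick, state.interval)
    {:noreply, %{state | queue: queue}}
  end
end
```

Serialising requests through a single process like this also gives a natural place to plug in load testing: flood the queue and observe that outbound calls never exceed one per interval.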
  14. ASSUME DATA INCONSISTENCY
     Sooner or later something will crash. Focus on writing code that recovers as efficiently as possible.
  15. KEEP A WIDE ARSENAL OF TOOLS
     Queues, rate limiters and backoffs are only a few examples. Resilient design requires planning.