
Beyond validates_presence_of: ensuring eventual consistency in distributed systems

You've added background jobs. You have calls to external services that perform actions asynchronously. Your data is no longer always in one perfect state; it's in one of tens or hundreds of acceptable states.

How can you confidently ensure that your data is valid without validations?

In this talk, I’ll walk through some data consistency issues you may see once your app starts using background jobs and external services. You’ll learn some patterns for handling failure so your data never gets out of sync, and we’ll talk about strategies for detecting when something is wrong.

Amy Unger

April 27, 2017

Transcript

  1. What I hope you’ll learn
     ▪ Causes: Why would I let my data go wrong?
     ▪ Prevention: How can I prevent data issues before I persist my data?
     ▪ Detection: If things do go wrong, how can I make sure I know?
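  2. class Product
       # validates_presence_of :billing_record
       # (The billing record is now created later, by a background job.)
     end

     # Controller action
     def create
       # Persist first so the job has an id to look up.
       @product = Product.create!(params[:product])
       Job::BillingCreator.enqueue(@product.id)
       respond_with @product.to_json
     end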
  3. class Job::BillingCreator
       def perform(id)
         product = Product.find(id)
         data = {
           product: product.uuid,
           user: product.user.uuid
         }
         BillingServiceClient.new.post(data)
       end
     end
  4. Are you distinguishing your network failures?
     ▪ Cannot connect.
     ▪ Service partially completes work; never responds.
     ▪ Service completes work; network cuts out at response.
     ▪ Service completes work; sees client-side timeout and rolls back.
  5. Retry
     ▪ Idempotency
     ▪ Locks
     ▪ Creates OR Deletes
     ▪ Exponential Backoff & Circuit breakers
  6. Roll Back
     ▪ Great if only your code has seen the action.
     ▪ What about external systems?
  7. Add Timestamps
     ▪ One timestamp per critical service call.
     ▪ Set this column in the same transaction as the service call.
  8. -- Things we have sold that have been cancelled
     -- that we are still billing for.
     SELECT *
     FROM billing_records, products
     WHERE billing_records.deleted_at IS NULL
       AND products.deleted_at IS NOT NULL
       AND billing_records.products_id = products.id
       AND products.deleted_at < NOW() - interval '15 minutes';
  9. SQL with Timestamps
     ▪ SQL
     ▪ Automatic runs
     ▪ Drag-and-drop folders
     ▪ Alerting by default
     ▪ Documented remediation plans
  10. Challenges
      ▪ Non-SQL stores
        ▫ Pull, transform & cache
        ▫ Write code, not SQL
      ▪ No timestamps
        ▫ Analysis events table
        ▫ SQL to determine coalescing
  11. Buying a Heroku Add-on
      Someone wants to buy some Redis!
      Are they authenticated?
      Is that product available?
  12. Event Stream challenges
      ▪ What if you emit the wrong events?
      ▪ What if you continue emitting events without doing the work?
      ▪ What if the stream consumer code is wrong?
  13. What I hope you’ll learn
      ▪ Causes: Why would I let my data go wrong?
      ▪ Prevention: How can I prevent data issues before I persist my data?
      ▪ Detection: If things do go wrong, how can I make sure I know?
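
Slide 4's point is that, from the client's side, only the first failure is unambiguous. The other three all surface as a timeout or a dropped connection, whether the service did none, some, or all of the work. A minimal Ruby sketch of that distinction, assuming the billing client from slide 3 is built on Ruby's standard `Net::HTTP`; `classify_failure` and its return symbols are hypothetical names, not part of any library:

```ruby
require "net/http"

# Hypothetical helper: sort a Net::HTTP failure by what the client can know.
#   :not_started     - the request never reached the service, so no work was done
#   :unknown_outcome - the service may have done some, all, or none of the work
def classify_failure(error)
  case error
  when Errno::ECONNREFUSED, Net::OpenTimeout, SocketError
    # "Cannot connect": retrying cannot create a duplicate.
    :not_started
  else
    # Read timeouts, resets, and EOFs look identical to the client whether the
    # service did nothing, did part of the work, or did all of it.
    :unknown_outcome
  end
end
```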
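
Two of slide 5's tools, retries with exponential backoff and idempotency, can be sketched together in Ruby. This assumes the remote service accepts a client-generated idempotency key and stores its result under that key; `FakeBillingService`, `post_with_retries`, and the `idempotency_key:` parameter are hypothetical. Locks and circuit breakers are not shown.

```ruby
require "securerandom"
require "timeout"

# Hypothetical in-memory stand-in for the billing service. It remembers
# idempotency keys, so a retried request does not create a second record.
class FakeBillingService
  attr_reader :records, :calls

  def initialize(failures_before_success: 0)
    @records = {}
    @calls = 0
    @failures_left = failures_before_success
  end

  def post(data, idempotency_key:)
    @calls += 1
    # The work happens first; then the response may be "lost" on the way back.
    @records[idempotency_key] ||= data
    if @failures_left > 0
      @failures_left -= 1
      raise Timeout::Error, "response lost"
    end
    @records[idempotency_key]
  end
end

# Retries with exponential backoff and full jitter. The key is generated once,
# outside the retry loop, so every attempt refers to the same piece of work.
def post_with_retries(client, data, max_attempts: 5, base_delay: 0.1,
                      max_delay: 5.0, sleeper: ->(s) { sleep(s) })
  key = SecureRandom.uuid
  attempt = 0
  begin
    attempt += 1
    client.post(data, idempotency_key: key)
  rescue Timeout::Error
    raise if attempt >= max_attempts
    delay = [base_delay * (2**(attempt - 1)), max_delay].min
    sleeper.call(rand * delay) # random wait in [0, delay)
    retry
  end
end
```

Because the key survives across attempts, a retry after an ambiguous failure from slide 4 asks the service to finish the same work rather than start new work. The random delay keeps many workers from retrying in lockstep.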
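
Slide 6's caveat is that a database rollback only undoes what your own database saw. One common answer is a compensating action. A Ruby sketch, assuming the external service exposes a delete for anything it creates; `FakeExternalService` and `provision` are hypothetical:

```ruby
# Hypothetical external service that supports create and delete.
class FakeExternalService
  attr_reader :resources

  def initialize
    @resources = {}
  end

  def create(id)
    @resources[id] = :active
  end

  def delete(id)
    @resources.delete(id)
  end
end

# A local rollback cannot undo a call another system has already seen,
# so if the local step fails we explicitly undo the external create.
def provision(product_id, service, local_step)
  service.create(product_id)
  begin
    local_step.call(product_id)
  rescue StandardError
    service.delete(product_id) # compensating action; this call can fail too
    raise
  end
end
```

The compensating `delete` can itself fail, which is why the detection techniques on the later slides are still needed.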
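
A Ruby sketch of slide 7, using a hypothetical in-memory table in place of a real database; the `billing_created_at` column name is an assumption. The timestamp is written inside the same transaction as the service call, so if the call raises, the write rolls back and the row keeps a `nil` timestamp that a detection query can find later:

```ruby
# Hypothetical in-memory table with a minimal transaction: on any error the
# rows are restored to their pre-transaction snapshot.
class FakeTable
  def initialize
    @rows = {}
  end

  def [](id)
    @rows[id]
  end

  def []=(id, row)
    @rows[id] = row
  end

  def transaction
    snapshot = @rows.transform_values(&:dup)
    yield
  rescue StandardError
    @rows = snapshot
    raise
  end
end

# One timestamp for one critical service call, set in the same transaction.
def create_billing_record(products, id, client)
  products.transaction do
    products[id] = products[id].merge({ billing_created_at: Time.now })
    client.call(id) # if this raises, the timestamp above is rolled back
  end
end
```

One trade-off: a real database transaction stays open for as long as the service call takes.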
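
Slide 9 lists what made the SQL checks useful in practice. A minimal Ruby sketch of three of those properties (drop-in check folders, alerting by default, and a remediation plan attached to every alert), assuming each check is a `.sql` file with an optional sibling `.md` file holding its plan. `load_checks`, `run_checks`, and that file convention are hypothetical; automatic runs would come from calling `run_checks` on a schedule, which is not shown:

```ruby
Check = Struct.new(:name, :sql, :remediation)

# Hypothetical convention: every *.sql file dropped into the folder is a check,
# and a sibling *.md file with the same name holds its remediation plan.
def load_checks(dir)
  Dir.glob(File.join(dir, "*.sql")).sort.map do |path|
    plan_path = path.sub(/\.sql\z/, ".md")
    plan = File.exist?(plan_path) ? File.read(plan_path).strip : "No remediation plan documented."
    Check.new(File.basename(path, ".sql"), File.read(path), plan)
  end
end

# Runs every check. Alerting is the default: any returned row triggers an
# alert carrying the remediation plan. `execute` and `alert` are hypothetical
# stand-ins for a database connection and a pager or chat integration.
def run_checks(checks, execute:, alert:)
  checks.each do |check|
    rows = execute.call(check.sql)
    next if rows.empty?
    alert.call("#{check.name}: #{rows.size} row(s) found. Remediation: #{check.remediation}")
  end
end
```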
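
For slide 10's non-SQL stores, the "cancelled but still billing" check from slide 8 can be written in code instead. A Ruby sketch, assuming hypothetical `fetch_products` and `fetch_billing_records` callables that pull (and may cache) records from each store as hashes with `:id`, `:product_id`, and `:deleted_at` keys:

```ruby
GRACE_PERIOD = 15 * 60 # seconds; mirrors the 15-minute window in the SQL check

# Returns billing records that are still active for products cancelled more
# than GRACE_PERIOD ago. The fetchers are hypothetical; the join and filter
# mirror the SQL on slide 8.
def cancelled_but_billed(fetch_products:, fetch_billing_records:, now: Time.now)
  products = fetch_products.call.to_h { |p| [p[:id], p] }
  fetch_billing_records.call.select do |record|
    product = products[record[:product_id]]
    next false if record[:deleted_at] || product.nil? || product[:deleted_at].nil?
    product[:deleted_at] < now - GRACE_PERIOD
  end
end
```

Like the SQL version, it ignores products cancelled within the last 15 minutes, so cancellations whose billing cleanup may still be in progress are not reported.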
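
Slide 11's questions can be answered synchronously, before any background work starts. A hypothetical Ruby sketch that checks both preconditions and records each outcome as an event, giving a stream consumer (slide 12) something to verify; `buy_add_on`, `Event`, and the catalog hash are illustrative names, not Heroku's API:

```ruby
Event = Struct.new(:type, :order_id, :payload)

# Hypothetical purchase flow: each precondition is checked synchronously,
# and every outcome is emitted as an event for later verification.
def buy_add_on(order_id:, user:, plan:, catalog:, stream:)
  unless user[:authenticated]
    stream << Event.new(:purchase_rejected, order_id, { reason: :unauthenticated })
    return :rejected
  end
  unless catalog.fetch(plan, false)
    stream << Event.new(:purchase_rejected, order_id, { reason: :unavailable })
    return :rejected
  end
  stream << Event.new(:purchase_accepted, order_id, { plan: plan })
  :accepted
end
```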
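
Slide 12's second risk, emitting events without doing the work, can be caught by comparing the stream against an independent view of what actually exists. A Ruby sketch with hypothetical event shapes and a hypothetical `reconcile` helper:

```ruby
require "set"

# Cross-checks what the stream claims against the resources that actually
# exist. Reports both directions: events without work, and work without events.
def reconcile(events, provisioned_ids)
  claimed = events.select { |e| e[:type] == :add_on_provisioned }
                  .map { |e| e[:order_id] }
                  .to_set
  actual = provisioned_ids.to_set
  {
    claimed_but_missing: (claimed - actual).to_a.sort,
    present_but_unclaimed: (actual - claimed).to_a.sort
  }
end
```

This does not address the slide's third risk: if the reconciliation code is itself wrong, it will report the wrong things, so it needs the same review and testing as the consumer.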