Beyond validates_presence_of: ensuring eventual consistency in distributed systems

You've added background jobs. You have calls to external services that perform actions asynchronously. Your data is no longer always in one perfect state; it's in one of tens or hundreds of acceptable states.

How can you confidently ensure that your data is valid without validations?

In this talk, I’ll introduce some data consistency issues you may see in your app when you begin introducing background jobs and external services. You’ll learn some patterns for handling failure so your data never gets out of sync, and we’ll talk about strategies to detect when something is wrong.

Amy Unger

April 27, 2017

Transcript

  1. Beyond :validates_presence_of

  2. Hello! I am amy @cdwort

  3. None
  4. None
  5. What I hope you’ll learn
    ▪ Causes: Why would I let my data go wrong?
    ▪ Prevention: How can I prevent data issues before I persist my data?
    ▪ Detection: If things do go wrong, how can I make sure I know?
  6. 1. Causes How your data goes wrong

  7. “And why would you expect your data to be correct?”
  8. Tools for data correctness
    ▪ Database constraints and indexes
    ▪ ORM code
  9. Database constraints and indexes

  10. ALTER TABLE product ADD COLUMN billing_record_id int NOT NULL;

  11. ORM code

  12. class Product
        validates_presence_of :billing_record
      end

  13. class Product
        validates :billing_record, presence: true
      end

  14. JOBS!

  15. class Product
        # validates_presence_of :billing_record
      end

      def create
        # create (not new) so the record is saved and has an id for the job
        @product = Product.create(params[:product])
        Job::BillingCreator.enqueue(@product.id)
        respond_with @product.to_json
      end
  16. SERVICES!

  17. class Job::BillingCreator
        def perform(id)
          product = Product.find(id)
          data = {
            product: product.uuid,
            user: product.user.uuid
          }
          BillingServiceClient.new.post(data)
        end
      end
  18. TCP!

  19. Are you distinguishing your network failures?
    ▪ Cannot connect.
    ▪ Service partially completes work; never responds.
    ▪ Service completes work; network cuts out at response.
    ▪ Service completes work; sees client side timeout and rolls back.
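
One way to start distinguishing these cases in client code is to sort exceptions by whether the request could have reached the service at all. The sketch below is an assumption, not something shown in the talk: the `classify_failure` name and the two exception lists are hypothetical, and the right lists depend on the HTTP client in use. Note that from the client side, the last three cases on the slide all look the same, which is why they share one bucket.

```ruby
require "net/http"

# Hypothetical grouping: errors raised before any request bytes were sent.
# The service cannot have done any work, so a retry is safe.
NOT_SENT = [
  Errno::ECONNREFUSED,
  Errno::EHOSTUNREACH,
  SocketError,
  Net::OpenTimeout
].freeze

# Hypothetical grouping: the request may have been received and acted on.
# The client cannot tell whether the work happened, partially happened,
# or was rolled back.
OUTCOME_UNKNOWN = [
  Net::ReadTimeout,
  Errno::ECONNRESET,
  EOFError
].freeze

def classify_failure(error)
  case error
  when *NOT_SENT then :not_sent
  when *OUTCOME_UNKNOWN then :outcome_unknown
  else :unexpected
  end
end
```

A caller could retry `:not_sent` failures freely, while `:outcome_unknown` failures need either an idempotent retry or a later consistency check.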
  20. “What if my data cannot always be correct?”

  21. 2. Prevention How to keep your data sane

  22. Retry
    ▪ Idempotency
    ▪ Locks
    ▪ Creates OR Deletes
    ▪ Exponential Backoff & Circuit breakers
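
A minimal sketch of retrying with exponential backoff and a single idempotency key, under stated assumptions: `with_retries`, `TransientError`, and the `sleeper` parameter are hypothetical names introduced here, and the service is assumed to deduplicate requests that carry the same key. Locks and circuit breakers are left out to keep the example short.

```ruby
require "securerandom"

# Hypothetical error class for failures that are worth retrying.
class TransientError < StandardError; end

# Yields an idempotency key and the attempt number to the block.
# The same key is reused on every attempt so the receiving service
# can recognize a repeated request instead of doing the work twice.
def with_retries(max_attempts: 5, base_delay: 0.5, sleeper: ->(s) { sleep(s) })
  idempotency_key = SecureRandom.uuid
  attempt = 0
  begin
    attempt += 1
    yield idempotency_key, attempt
  rescue TransientError
    raise if attempt >= max_attempts
    # Delay doubles after each failure: base, 2x base, 4x base, ...
    sleeper.call(base_delay * (2**(attempt - 1)))
    retry
  end
end
```

Passing the sleep function in as a parameter is a design choice for testability: callers can substitute a lambda that records delays instead of actually waiting.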
  23. Roll Back
    ▪ Great if only your code has seen the action.
    ▪ What about external systems?
  24. Roll Forward ▪ Job enqueued? ▪ Service called?

  25. “Okay, cool, what does this look like?”

  26. Transactions
    ▪ What strategy?
    ▪ Consider: Add your job queue to your database
  27. Add Timestamps
    ▪ One timestamp per critical service call.
    ▪ Set this column in the same transaction as the service call.
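
The timestamp idea can be sketched in plain Ruby. Everything here is a stand-in: `Product` is an in-memory struct rather than a database row, `transaction` is a hypothetical helper that imitates a database transaction by restoring the record if the block raises, and `billing_started_at` is an assumed column name. In a Rails app the ORM's own transaction block would play this role.

```ruby
# Hypothetical in-memory stand-in for a database row.
Product = Struct.new(:id, :billing_started_at)

# Stand-in for a database transaction: if the block raises,
# restore the record's fields to their earlier values and re-raise.
def transaction(record)
  snapshot = record.to_a
  yield
rescue StandardError
  snapshot.each_with_index { |value, index| record[index] = value }
  raise
end

# Sets the timestamp and calls the service inside one transaction,
# so the timestamp survives only if the service call succeeded.
def start_billing(product, client, now: Time.now)
  transaction(product) do
    product.billing_started_at = now
    client.call(product.id) # raises on failure, which rolls back the timestamp
  end
end
```

A null `billing_started_at` then means "the billing call has not been confirmed", which is the kind of fact a later consistency query can look for. The rollback only undoes local state; it cannot undo work the external service already did.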
  28. Code Organization
    ▪ Write your failure code in the same place as your success code.
  29. EmployeeCreator API Job

  30. EmployeeCreator API Job Herokai Creator Engineer Creator

  31. EmployeeCreator API Job Herokai Creator Engineer Creator

  32. 3. Detection Knowing when things have gone wrong

  33. SQL with Timestamps

  34. ALTER TABLE product ADD COLUMN billing_record_id int NOT NULL;

  35. -- Things we have sold that have been cancelled that we are still billing for.
      SELECT *
      FROM billing_records, products
      WHERE billing_records.deleted_at IS NULL
        AND products.deleted_at IS NOT NULL
        AND billing_records.products_id = products.id
        AND products.deleted_at < NOW() - interval '15 minutes';
  36. SQL with Timestamps
    ▪ SQL
    ▪ Automatic runs
    ▪ Drag-and-drop folders
    ▪ Alerting by default
    ▪ Documented remediation plans
  37. Challenges ▪ Non-SQL stores

  38. API API API API

  39. API API API API

  40. API API API API ETL

  41. API API API API Auditing Code

  42. Challenges
    ▪ Non-SQL stores
      ▫ Pull, transform & cache
      ▫ Write code, not SQL
    ▪ No timestamps
      ▫ Analysis events table
      ▫ SQL to determine coalescing
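
When the data lives behind APIs rather than in one SQL database, the "still billing for cancelled products" check can be written as auditing code instead of a query. This is a rough sketch under assumptions: `ProductRecord`, `BillingRecord`, and `still_billed_after_cancel` are hypothetical names, the records are assumed to have already been pulled from each service and cached, and the 15-minute grace period mirrors the interval in the SQL slide.

```ruby
# Hypothetical shapes for records pulled from two separate services.
ProductRecord = Struct.new(:id, :deleted_at)
BillingRecord = Struct.new(:product_id, :deleted_at)

# Allow this long (in seconds) for a cancellation to reach billing.
GRACE_PERIOD = 15 * 60

# Returns billing records that are still active even though their
# product was cancelled more than GRACE_PERIOD ago.
def still_billed_after_cancel(products, billing_records, now: Time.now)
  cancelled_ids = products
    .select { |p| p.deleted_at && p.deleted_at < now - GRACE_PERIOD }
    .map(&:id)

  billing_records.select do |b|
    b.deleted_at.nil? && cancelled_ids.include?(b.product_id)
  end
end
```

A scheduled job could run a check like this and alert when the result is non-empty, matching the "automatic runs" and "alerting by default" goals from the SQL approach.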
  43. Event Stream

  44. Buying a Heroku Add-on
    Someone wants to buy some Redis!
    Are they authenticated?
    Is that product available?
  45. ... Redis cluster provisioned Billing started User Response generated

  46. Event Stream benefits ▪ Single format ▪ Black box testing

  47. Event Stream challenges
    ▪ What if you emit the wrong events?
    ▪ What if you continue emitting events without doing the work?
    ▪ What if the stream consumer code is wrong?
  48. HIGH RISK

  49. What I hope you’ll learn
    ▪ Causes: Why would I let my data go wrong?
    ▪ Prevention: How can I prevent data issues before I persist my data?
    ▪ Detection: If things do go wrong, how can I make sure I know?
  50. Thanks!! amy @cdwort

  51. Questions?

  52. 1.00 - 2.00 pm Heroku booth

  53. Thanks!! amy @cdwort