Slide 1

Slide 1 text

Distributed Systems: Your Only Guarantee Is Inconsistency

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

Our end-of-month pipeline
● Generate the user's invoice
● Charge them
● Email them
● Place account holds on delinquent users
● Generate reports for internal finance teams
● Perform other relevant actions

Slide 4

Slide 4 text

Architecture goals
Where we are...
Where we want to be...

Slide 5

Slide 5 text

There's a lot to do

class MonthClose
  def perform
    generate_invoice_items              # expensive
    amount = generate_invoice           # expensive
    success = charge_balance(amount)    # external dependencies
    email_user(amount)                  # external dependencies
    handle_failed_charge unless success # complicated and messy
  end
end

Slide 6

Slide 6 text

What if it fails halfway through?
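The question can be made concrete with a toy model. Everything in it is hypothetical (FakeGateway, FlakyMailer, and a two-step MonthClose standing in for the method above): the email step raises after the charge has already gone through, and a naive retry of the whole method charges the user a second time.

```ruby
# Hypothetical stand-ins for the external dependencies in MonthClose.
class FakeGateway
  attr_reader :charges

  def initialize
    @charges = []
  end

  def charge(user, amount)
    @charges << [user, amount]
    true
  end
end

class FlakyMailer
  def initialize(failures)
    @failures = failures # number of sends that fail before one succeeds
  end

  def deliver(user, amount)
    if @failures.positive?
      @failures -= 1
      raise IOError, "SMTP timeout"
    end
    "emailed #{user} about $#{amount}"
  end
end

# Two of the slide's steps in one method. If step 2 raises, step 1 has
# already happened and nothing undoes it.
class MonthClose
  def initialize(gateway, mailer)
    @gateway = gateway
    @mailer = mailer
  end

  def perform(user, amount)
    @gateway.charge(user, amount)
    @mailer.deliver(user, amount)
  end
end

gateway = FakeGateway.new
job = MonthClose.new(gateway, FlakyMailer.new(1))

begin
  job.perform("anthony", 10)
rescue IOError
  job.perform("anthony", 10) # naive retry of the whole method
end

p gateway.charges # two charges for one invoice
```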

Slide 7

Slide 7 text

Background workers

Slide 8

Slide 8 text

Background workers
● Persistent jobs (SQL or Redis)
● Prioritized job queues
● Immediate, recurring, or delayed scheduling
● Expect failure: automatic retries
● Batch jobs with success/failure callbacks
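To make two of those features concrete, prioritized queues and automatic retries with a cap on attempts, here is a toy in-memory queue. It is a hypothetical sketch (ToyQueue and its methods are invented for illustration), not how Sidekiq, Delayed Job, or Resque work internally; real systems persist jobs in SQL or Redis and wait between retries.

```ruby
# Toy in-memory job queue: prioritized queues plus automatic retries.
# Hypothetical illustration only; real workers persist jobs and back off.
class ToyQueue
  Job = Struct.new(:name, :callable, :attempts)

  attr_reader :dead

  def initialize(priorities, max_attempts: 3)
    # Hashes keep insertion order, so earlier queues have higher priority.
    @queues = priorities.map { |name| [name, []] }.to_h
    @max_attempts = max_attempts
    @dead = []
  end

  def push(queue, name, &block)
    @queues.fetch(queue) << Job.new(name, block, 0)
  end

  # Next job from the highest-priority non-empty queue, or nil.
  def pop
    @queues.each do |queue, jobs|
      return [queue, jobs.shift] unless jobs.empty?
    end
    nil
  end

  def work_off
    log = []
    while (entry = pop)
      queue, job = entry
      job.attempts += 1
      begin
        job.callable.call
        log << "#{job.name} ok (attempt #{job.attempts})"
      rescue StandardError => e
        if job.attempts < @max_attempts
          @queues[queue] << job # retry: put it back on its own queue
          log << "#{job.name} failed: #{e.message}, will retry"
        else
          @dead << job.name # give up and park it
          log << "#{job.name} gave up after #{job.attempts} attempts"
        end
      end
    end
    log
  end
end

q = ToyQueue.new(%i[high default])
calls = 0
q.push(:default, "report") {}
q.push(:high, "charge") do
  calls += 1
  raise "gateway timeout" if calls == 1 # fails once, then succeeds
end
q.push(:default, "always_fails") { raise "boom" }

log = q.work_off
puts log
```

The high-priority charge is retried and finishes before anything on the default queue runs; the job that always fails is abandoned after three attempts instead of looping forever.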

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

Background workers
● Sidekiq
● Delayed Job
● Resque

Slide 11

Slide 11 text

# app/workers/expensive_job_worker.rb
class ExpensiveJobWorker
  include Sidekiq::Worker
  sidekiq_options(queue: :high)

  # Thin wrapper: the worker only hands its arguments to plain Ruby code
  def perform(args)
    ExpensiveJob.new(args).expensive_method
  end
end

# app/lib/expensive_job.rb
class ExpensiveJob
  def initialize(args)
    @args = args
  end

  def expensive_method
    # the slow work goes here
  end
end

Slide 12

Slide 12 text

# Run it in the background
ExpensiveJobWorker.perform_async(args)

Slide 13

Slide 13 text

# Run it in the background… in 10 minutes
ExpensiveJobWorker.perform_in(10.minutes, args)

Slide 14

Slide 14 text

# Run it in the background every day
# whenever gem => https://github.com/javan/whenever
every :day do
  runner "ExpensiveJobWorker.perform_async(args)"
end
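The three scheduling styles above (immediate, delayed, recurring) can be simulated without Redis or cron. In this sketch, ToyScheduler and the ToyWorker mixin are hypothetical stand-ins that only borrow the names perform_async and perform_in. Time is an integer number of seconds advanced by hand, and the recurring job is modeled as one enqueue per simulated day, roughly what a daily cron entry produces.

```ruby
# Hypothetical stand-in for a job backend: jobs live in memory as
# [run_at, class, args], and "now" is an integer clock advanced by hand.
class ToyScheduler
  attr_reader :now, :ran

  def initialize
    @now = 0
    @jobs = []
    @ran = []
  end

  def enqueue(run_at, klass, args)
    @jobs << [run_at, klass, args]
  end

  # Move the clock forward and run every job that is due, oldest first.
  def advance_to(time)
    @now = time
    due, @jobs = @jobs.partition { |run_at, _, _| run_at <= time }
    due.sort_by(&:first).each do |run_at, klass, args|
      klass.new.perform(*args)
      @ran << [run_at, klass.name, args]
    end
  end
end

SCHEDULER = ToyScheduler.new

# Borrows Sidekiq's method names for readability; not Sidekiq's implementation.
module ToyWorker
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def perform_async(*args)
      SCHEDULER.enqueue(SCHEDULER.now, self, args)
    end

    def perform_in(seconds, *args)
      SCHEDULER.enqueue(SCHEDULER.now + seconds, self, args)
    end
  end
end

class ExpensiveJobWorker
  include ToyWorker

  def perform(label)
    label # the expensive work would happen here
  end
end

DAY = 24 * 60 * 60

ExpensiveJobWorker.perform_async("now")                 # immediate
ExpensiveJobWorker.perform_in(10 * 60, "in 10 minutes") # delayed
# Recurring: a daily cron entry amounts to one enqueue per day.
(1..2).each { |d| SCHEDULER.enqueue(d * DAY, ExpensiveJobWorker, ["day #{d}"]) }

SCHEDULER.advance_to(0)
SCHEDULER.advance_to(2 * DAY)
SCHEDULER.ran.each { |run_at, name, args| puts "t=#{run_at}s #{name}(#{args.first})" }
```

Note that the 10-minute job only runs once the clock has passed its time, which here is much later: in general a delayed job means "not before", not "exactly at".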

Slide 15

Slide 15 text

We can do better

class MonthCloseWorker
  include Sidekiq::Worker

  def perform
    generate_invoice_items
    amount = generate_invoice
    success = charge_balance(amount)
    email_user(amount)
    handle_failed_charge unless success
  end
end

Slide 16

Slide 16 text

Applying it to our use case

class MonthCloseWorker
  include Sidekiq::Worker

  def perform
    generate_invoice_items
    amount = generate_invoice
    PaymentWorker.perform_async(amount)
    EmailWorker.perform_async(amount)
  end
end

class PaymentWorker
  include Sidekiq::Worker

  def perform(amount)
    success = charge_user(amount)
    HandleFailedChargeWorker.perform_async unless success
  end
end

class HandleFailedChargeWorker
  include Sidekiq::Worker

  def perform
    handle_failed_charge
  end
end

class EmailWorker
  include Sidekiq::Worker

  def perform(amount)
    email_user(amount)
  end
end
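A sketch of what the fan-out changes, using a hypothetical in-memory queue (QUEUE, EVENTS, and the Enqueueable module are invented for illustration). MonthCloseWorker now returns as soon as it has enqueued the other jobs, and since nothing orders PaymentWorker relative to EmailWorker, the simulation picks the unlucky order in which the receipt goes out before the charge.

```ruby
QUEUE = []  # jobs waiting to be picked up: [class, args]
EVENTS = [] # what actually happened, in order

# Hypothetical stand-in for a job library: enqueue only, run later.
module Enqueueable
  def perform_async(*args)
    QUEUE << [self, args]
  end
end

class PaymentWorker
  extend Enqueueable

  def perform(amount)
    EVENTS << "charged $#{amount}"
  end
end

class EmailWorker
  extend Enqueueable

  def perform(amount)
    EVENTS << "emailed receipt for $#{amount}"
  end
end

class MonthCloseWorker
  def perform
    amount = 10 # stand-in for generate_invoice
    PaymentWorker.perform_async(amount)
    EmailWorker.perform_async(amount)
    EVENTS << "month close done" # returns before anything is charged
  end
end

MonthCloseWorker.new.perform
# Several workers pull from the queue concurrently and nothing orders these
# two jobs, so simulate the case where the email job happens to run first.
QUEUE.reverse_each { |klass, args| klass.new.perform(*args) }
puts EVENTS
```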

Slide 17

Slide 17 text

So much better
Before
● ~30 minutes per user (on average)
● 1-2 days for the entire month close process
After
● <10 minutes per user
● <8 hours for the entire month close process

Slide 18

Slide 18 text

Whoops! We just introduced all kinds of bugs

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

Average time between steps
Before: 10 µs
After: 5 min? 10 min?

Slide 21

Slide 21 text

Our mental model is an ideal world
Our models are created from user stories or an ideal workflow
They don't necessarily represent reality

Slide 22

Slide 22 text

Ideal workflows
Invoice is generated and then payment is attempted

Slide 23

Slide 23 text

Ideal workflows
Payment fails and then the user is suspended

Slide 24

Slide 24 text

Ideal workflows
Payment succeeds and then the user is emailed a receipt

Slide 25

Slide 25 text

Notice the "and then"s?

Slide 26

Slide 26 text

Reality likes "but"s

Slide 27

Slide 27 text

Real-world workflows
Invoice is generated, but the user applied a credit before the payment could be made

Slide 28

Slide 28 text

Real-world workflows
Payment is attempted, but the user removed their credit card before we realized we couldn’t charge them

Slide 29

Slide 29 text

Real-world workflows
Payment is attempted, but the user has already paid manually

Slide 30

Slide 30 text

Learning #1
When you pass information, you are working under the assumption that it represents the state of the world at that time
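Learning #1 in miniature. Job arguments are typically serialized when the job is enqueued (Sidekiq, for example, stores them as JSON), so the job carries a copy of a value as it was at that moment. The Account struct and the $4.00 credit below are hypothetical.

```ruby
require "json"

Account = Struct.new(:balance_cents)
account = Account.new(1_000) # $10.00 owed when the invoice is generated

# Enqueue time: the argument is serialized, so the job holds a copy of the
# value as it was at this moment, not a live reference to the account.
payload = JSON.generate({ "amount_cents" => account.balance_cents })

# Minutes later, before a worker picks the job up, the user applies a credit.
account.balance_cents -= 400

# Execution time: the job still sees the old amount.
amount_cents = JSON.parse(payload).fetch("amount_cents")
puts "job would charge #{amount_cents}, user owes #{account.balance_cents}"
```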

Slide 31

Slide 31 text

Learning #2
Changing methods from synchronous to asynchronous is an implicit change in behavior
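A minimal illustration of Learning #2, assuming a hypothetical Ledger class: the same call site reads `paid` right after asking for a charge, and gets a different answer depending on whether the charge runs inline or is deferred to a queue.

```ruby
# Hypothetical ledger: one method charges inline, the other defers to a queue.
class Ledger
  attr_reader :paid

  def initialize
    @paid = false
    @pending_jobs = []
  end

  def charge_now
    @paid = true
  end

  def charge_later
    @pending_jobs << -> { @paid = true } # runs whenever a worker gets to it
  end

  def run_pending
    @pending_jobs.shift.call until @pending_jobs.empty?
  end
end

sync = Ledger.new
sync.charge_now
sync_ok = sync.paid               # effect is visible as soon as the call returns

async = Ledger.new
async.charge_later
async_ok_immediately = async.paid # same question, different answer
async.run_pending
async_ok_eventually = async.paid  # true, but only after the queue is drained

p [sync_ok, async_ok_immediately, async_ok_eventually]
```

Nothing in the caller's code signals the change: any code that reads state right after the call now races the worker.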

Slide 32

Slide 32 text

What can we do?

Slide 33

Slide 33 text

“Well, we need payments to run immediately after an invoice is generated, so we'll mark it highest priority”

Slide 34

Slide 34 text

“Well, we need payments to run immediately after an invoice is generated, so we'll mark it highest priority”
NO!

Slide 35

Slide 35 text

Don’t engage in a priority arms race

Slide 36

Slide 36 text

queue_priority:
  - this_one_first_do_not_move_down
  - super_critical
  - critical
  - highest
  - higher
  - high
  - default
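One reason the arms race backfires, shown with a toy model of strict queue ordering (strict_pick and this reduced set of queue names are hypothetical): while higher queues keep receiving work, lower ones never get a turn, and adding yet another tier only moves the starvation around.

```ruby
# Toy model of strict priority: always serve the first non-empty queue.
def strict_pick(queues, order)
  name = order.find { |q| !queues[q].empty? }
  name && queues[name].shift
end

order = %i[this_one_first_do_not_move_down critical default]
queues = order.map { |name| [name, []] }.to_h
queues[:default] << "monthly report"

ran = []
6.times do |tick|
  # Every new job is "urgent", so it lands in one of the top queues.
  queues[tick.even? ? :this_one_first_do_not_move_down : :critical] << "job #{tick}"
  ran << strict_pick(queues, order)
end

p ran
p queues[:default] # the report is still waiting
```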

Slide 37

Slide 37 text

So… what can we do?

Slide 38

Slide 38 text

Assume the world changes

Slide 39

Slide 39 text

class PaymentWorker
  def perform(amount)
    current_balance = user.balance
    if current_balance != amount
      # charge user? throw error? do nothing?
    else
      charge_user(amount)
    end
  end
end
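One possible answer to the open question in the comment, sketched under explicit assumptions. The job receives an account id rather than trusting only the amount, re-reads the account when it runs, and applies a made-up policy: skip if nothing is owed, skip if the card is gone, otherwise charge the smaller of the invoiced and current amounts. ACCOUNTS, CHARGES, and the Account struct are hypothetical stand-ins for a database and a payment gateway; the three changes mirror the real-world workflows above.

```ruby
# Hypothetical in-memory stand-ins for the accounts table and the gateway.
Account = Struct.new(:id, :balance_cents, :card_on_file)
ACCOUNTS = {}
CHARGES = []

class PaymentWorker
  # Takes an id and looks up the state it acts on when it actually runs.
  def perform(account_id, invoiced_cents)
    account = ACCOUNTS.fetch(account_id)
    current = account.balance_cents             # re-read; it may have changed
    return :nothing_owed if current <= 0        # paid or credited in full
    return :no_card unless account.card_on_file # card removed meanwhile

    # Assumed policy: never charge more than was invoiced or is owed now.
    amount = [current, invoiced_cents].min
    CHARGES << [account_id, amount]
    account.balance_cents -= amount
    :charged
  end
end

(1..3).each { |id| ACCOUNTS[id] = Account.new(id, 1_000, true) }

# Between enqueue and execution, the world moves on:
ACCOUNTS[1].balance_cents -= 400 # $4.00 credit applied
ACCOUNTS[2].balance_cents = 0    # paid manually
ACCOUNTS[3].card_on_file = false # card removed

results = [1, 2, 3].map { |id| PaymentWorker.new.perform(id, 1_000) }
p results
p CHARGES
```

Whether a mismatch should charge, skip, or raise is a business decision; the sketch only shows that the decision is made from the state at execution time.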

Slide 40

Slide 40 text

Bonus

Slide 41

Slide 41 text

No content

Slide 42

Slide 42 text

Freeze your world in time

Slide 43

Slide 43 text

From: [email protected]
Subject: Your August 2017 Invoice

Hi Anthony,

Thanks for being a loyal customer! As of 2017-08-07 19:31:09 PST, your balance is $10.00.

Thanks,
DigitalOcean

Slide 44

Slide 44 text

From: [email protected]
Subject: Your August 2017 Invoice

Hi Anthony,

Thanks for being a loyal customer! As of 2017-08-07 19:31:09 PST, your balance is $10.00. As of 2017-08-08 03:31:09 CET, your balance is Ft2565.

Thanks,
DigitalOcean
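A sketch of freezing the world: capture the balance once, together with the moment it was true, and render the email from that snapshot rather than from live data. BalanceSnapshot and the live hash are hypothetical, and the timestamp is illustrative; keeping it in UTC here is an assumption, with conversion to the user's zone left to display time.

```ruby
# Everything the email needs, captured once, with the moment it was true.
BalanceSnapshot = Struct.new(:as_of, :balance_cents, :currency) do
  def to_s
    format("As of %s, your balance is %s%d.%02d.",
           as_of.strftime("%Y-%m-%d %H:%M:%S UTC"),
           currency, balance_cents / 100, balance_cents % 100)
  end
end

live = { balance_cents: 1_000 } # stand-in for the live account row

# At invoice time: take the snapshot and freeze it.
snapshot = BalanceSnapshot.new(Time.utc(2017, 8, 8, 3, 31, 9),
                               live[:balance_cents], "$").freeze

live[:balance_cents] = 0 # the user pays before the email job runs

# Later, in the email job: render from the snapshot, not from `live`.
email_body = "Hi Anthony,\n\n#{snapshot}\n"
puts email_body
```

Because the email states when the number was true, a later change to the live balance no longer makes it wrong, only old.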

Slide 45

Slide 45 text

Embrace the inconsistency

Slide 46

Slide 46 text

Thanks
[email protected]
@azacharax