
ACIDic Jobs: A Layman's Guide to Job Bliss (Speaker Notes)

Stephen
November 07, 2021


Background jobs have become an essential component of any Ruby infrastructure, and, as the Sidekiq Best Practices remind us, it is essential that jobs be "idempotent and transactional." But how do we make our jobs idempotent and transactional? In this talk, we will explore various techniques to make our jobs robust and ACIDic.


Transcript

  1. Stephen Margheim — RubyConf 2021 ACIDic Jobs A Layman's Guide

    to Job Bliss It is an honor to be here today and to be able to talk with you all—those of you here in person, those on the live-stream, and those who might be watching in the future—about how we might be able to achieve job bliss
  2. Stephen Margheim @fractaledmind My name is Stephen. You can find

    me on Twitter @fractaledmind, tho fair warning I am more of a voyeur than an author there. I have somehow found myself these days working across 3 different projects • I am the head of engineering at test IO, which offers crowd-driven QA testing as a service to other companies • I am also a consulting engineer for RCRDSHP, a web3 company in the music NFT space • I am also building Smokestack QA on the side, which is an application that helps to integrate manual QA into teams' GitHub Pull Requests But, enough about my jobs, let's get to the topic at-hand — jobs
  3. Jobs are essential in every company, with every app, jobs

    are essential. Because jobs are what your app *does*, expressed as a distinct unit. And they are powerful: jobs can be called from anywhere, run sync or async, and have retry mechanisms built in.
  4. Job == Verb Job != ActiveJob object First, when I

    say "job" I do not mean an instance of ActiveJob. jobs are verbs, in the same way models are nouns
  5. Job == State Mutation Job != inspection or retrieval of

    information More specifically, jobs represent those verbs that change the state of your system
  6. Job => Side Effects Job !=> return value and jobs

    are state mutations that produce side effects. We typically do not care about their return value: since jobs are most often run async, the return value is never returned to the caller. And since jobs are most often state mutations, and state is most often stored in some persistent data store, jobs must produce the side-effect of storing new state
  7. A job is: • a Ruby class object, • representing a state mutation

    action, • that takes as input a representation of initial state, • and produces side-effects representing a next state. Thus, we can say that a job is: ... This definition of "job" encompasses a wide variety of patterns that have emerged in the Ruby ecosystem
  8. ActionDoer.call *arguments # service object ActionJob.perform_now *arguments # active job

    ActionWorker.new.perform *arguments # sidekiq worker ActionOperation.run *arguments # operation class For the rest of the talk, I am going to use the language of "jobs" and the interface of ActiveJob, but the principles we will be exploring apply to all of these various ways of expressing a state mutation as a Ruby object. So, how do we build jobs *well*?
  9. Jobs must be idempotent & transactional Mike Perham, in the

    Sidekiq docs, reminds us of this key truth: ... These are the two key characteristics of a well-built job.
  10. Transaction (diagram: Operation 1, Operation 2, Operation 3 grouped inside a single transaction) A transaction is

    a collection of operations, typically with ACIDic guarantees
  11. • Atomicity: everything succeeds or everything fails • Consistency: the

    data always ends up in a valid state, as defined by your schema • Isolation: concurrent transactions won't conflict with each other • Durability: once committed always committed, even with system failures The ACIDic Guarantees The ACIDic guarantees are the foundational characteristics needed for correct and precise state mutations ... > In 1983, Andreas Reuter and Theo Härder coined the acronym ACID as shorthand for atomicity, consistency, isolation, and durability. They were building on earlier work by Jim Gray who’d proposed atomicity, consistency, and durability, but had initially left out the I. ACID is one of those inventions of the 80s that’s not only just still in use in the form of major database systems like Postgres, Oracle, and MSSQL, but which has never been displaced by a better idea.
  12. Jobs · Databases Because SQL databases give us ACIDic transactions

    **for free** We can lean on the power and resilience of SQL databases to help make our jobs ACIDic. > "I want to convince you that ACID databases are one of the most important tools in existence for ensuring maintainability and data correctness in big production systems"
  13. Idempotency f(f(x)) == f(x) Job.perform & Job.perform == Job.perform As

    for idempotency, for something to be "idempotent" it needs to be safely repeatable
  14. • Functional Idempotency: the function always returns the same result,

    even if called multiple times f(f(x)) == f(x) • Practical Idempotency: the side-effect(s) will happen once and only once, no matter how many times the job is performed Job.perform == Job.perform & Job.perform The Idempotent Guarantee idempotency is often defined in terms of pure functions, which while mathematically interesting, is not particularly helpful to us, since return values aren't typically meaningful to jobs. For jobs, we can use a more practical definition focused on side-effects. ...
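
    As a tiny illustration of the practical definition (the `order` and `account` records here are hypothetical, not from the talk):

    # Practically idempotent: no matter how many times this runs,
    # the order ends up in exactly the same state.
    order.update!(status: "processed")

    # Not idempotent: every run shifts the balance again,
    # so a retry would apply the change a second time.
    account.update!(balance: account.balance - 10_00)
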
  15. Jobs · Retries Most job backends give us automatic retries

    **for free** This means that if we make our jobs idempotent, we can lean on the power of the retry mechanism to ensure eventual correctness.
  16. class JobRun < ActiveRecord::Base # ... end Now, the core

    idea I want to explore with you all today is how this class can be used and leveraged to provide flexible ways to create jobs of various degrees of complexity with these characteristics.
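
    Before going further, it may help to picture what such a `JobRun` table could look like. This is only a sketch; the columns, types, and indexes are assumptions inferred from the examples on the following slides:

    class CreateJobRuns < ActiveRecord::Migration[6.1]
      def change
        create_table :job_runs do |t|
          t.string   :job_class, null: false
          t.string   :job_id                 # ActiveJob's job/provider ID
          t.jsonb    :job_args, default: []  # serialized job arguments
          t.string   :recovery_point         # used later for step-wise workflows
          t.datetime :completed_at
          t.timestamps
        end

        # Pick the uniqueness that matches your strategy (see Level 1 below):
        add_index :job_runs, [:job_class, :job_id], unique: true
        # add_index :job_runs, [:job_class, :job_args], unique: true
      end
    end
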
  17. Nathan Griffith provides an excellent overview of how to write

    transactional and idempotent jobs in his RailsConf talk from earlier this year; I highly recommend you go and check it out, tho I will briefly summarize his key points. His examples are save methods for synthetic models, but the principles are the same, and based on our definition, his save methods are jobs in-the-making
  18. • Use a transaction to guarantee atomic execution • Use

    locks to prevent concurrent data access • Use idempotency and retries to ensure eventual completion • Ensure enqueuing other jobs is co-transactional • Split complex operations into steps ACIDic Job Principles — Nathan Griffith Nathan homes in on these 5 core characteristics of resilient jobs: ... I wholeheartedly agree with this assessment. What I want to explore today is how we can build more generalized tools and patterns to help us more easily ensure our jobs conform to these principles. So, let's consider Nathan's list and work our way through it, exploring how to build a toolset for writing truly ACIDic jobs.
  19. ACIDic Jobs Level 0 — Transactional Jobs So, let's start

    with the first two. How can we build transactional jobs generically and flexibly?
  20. def perform(from_account, to_account, amount) run = JobRun.find_or_create_by!( job_class: self.class, job_id:

    job_id, job_args: [from_account, to_account, amount]) run.with_lock do from_account.lock! to_account.lock! from_account.update!(balance: from_account.balance - amount) to_account.update!(balance: to_account.balance + amount) end end By using a database record representing this particular job run, we have a mechanism for a database transaction that can be used in any job. Plus, we can also mitigate concurrency issues by locking the database row for this particular job run.
  21. — Mike Perham “Just remember that Sidekiq will execute your

    job at least once, not exactly once.” This is important because even Sidekiq, the titan in the Ruby job backend world, makes only an at-least-once guarantee for doing work.
  22. • Use a database record to make job runs transactional

    • Use a database lock to mitigate concurrency issues ACIDic Jobs Level 0 Recap The first step to making resilient and robust jobs, jobs that are transactional, is to be sure to always wrap the operation in a database transaction and to use database locks to mitigate concurrency issues. A generic `JobRun` class allows us to wrap any and all jobs in such locked database transactions, and thus provides a solid foundation for our jobs.
  23. ACIDic Jobs Level 1 — Idempotent Jobs Next, let's consider

    how we can make our transactional jobs idempotent.
  24. Idempotency & Uniqueness To guarantee idempotency, we must be able

    to define and identify the job uniquely For a job to be idempotent, it requires being able to uniquely identify the unit of work being done
  25. john_account.balance # initial state # => 100_00 TransferBalanceJob.perform_later(john_account, jane_account, 10_00)

    TransferBalanceJob.perform_later(john_account, jane_account, 10_00) john_account.balance # resulting state # => 80_00 or 90_00 ? Let's consider as an example running a balance transfer job twice with the same arguments. The core question for building idempotency into our job is what is the correct resulting state? How do we tell when the system is executing a job multiple times invalidly versus when the system is executing a job multiple times validly?
  26. • each job run uses a generic unique entity representing

    this job run • each job run uses a generic unique entity representing this job execution (based on args) Forms of Job Uniqueness There are basically 2 different ways to make your job have a sense of uniqueness. Firstly, you can treat each run of the job as a unique entity, or you could treat each execution of the job as a unique entity. What do I mean? Well, ...
  27. def perform(from_account, to_account, amount) run = JobRun.find_or_create_by!(job_class: self.class, job_id: job_id)

    run.with_lock do return if run.completed? from_account.update!(balance: from_account.balance - amount) to_account.update!(balance: to_account.balance + amount) run.update!(completed_at: Time.current) end end Unique Job by Job Run For our first example, let's imagine our `JobRun` model has a uniqueness constraint on the union of `job_class` and `job_id` and keeps track of when job runs are completed. With just a bit of boilerplate, we have a job that once enqueued will only ever execute the operation once, no matter if Sidekiq picks that job off the queue multiple times.
  28. john_account.balance # initial state # => 100_00 TransferBalanceJob.perform_later(john_account, jane_account, 10_00)

    TransferBalanceJob.perform_later(john_account, jane_account, 10_00) john_account.balance # resulting state # => 80_00 ? This approach treats each separate enqueuing as a valid enqueuing, even with duplicate arguments. When using this strategy for job uniqueness we trust enqueuing but are wary of dequeuing. We could imagine the opposite tho, and this is the second strategy.
  29. def perform(from_account, to_account, amount) run = JobRun.find_or_create_by!(job_class: self.class, job_args: [from_account,

    to_account, amount]) run.with_lock do return if run.completed? from_account.update!(balance: from_account.balance - amount) to_account.update!(balance: to_account.balance + amount) run.update!(completed_at: Time.current) end end Unique Job by Execution Args We could imagine a slightly different `JobRun` model that cares about the unique union of job_class and job_args. In this situation, it wouldn't matter how many times a job is enqueued or dequeued, it will only execute the operation once. This strategy is essentially the idempotent job version of memoization.
  30. john_account.balance # initial state # => 100_00 TransferBalanceJob.perform_later(john_account, jane_account, 10_00)

    TransferBalanceJob.perform_later(john_account, jane_account, 10_00) john_account.balance # resulting state # => 90_00 ? So, with this "memoization" strategy, we would treat enqueuing with duplicate arguments as an invalid enqueuing. In other words, we don't fully trust the enqueuing code. This strategy is more cautious than the first, but that doesn't necessarily make it better. There are certainly units of work that rightfully should do the same thing multiple times. In our balance transfer example, it is perfectly reasonable to think that John might give Jane $10 once and then rightfully give her another $10 later.
  31. Unique Job flexibly and generically class TransferBalanceJob < ApplicationJob prepend

    UniqueByJobRun uniquely_identified_by_job_id # uniquely_identified_by_job_args def perform(from_account, to_account, amount) from_account.lock! to_account.lock! from_account.update!(balance: from_account.balance - amount) to_account.update!(balance: to_account.balance + amount) end end But, with a bit of work, we could build a job concern that allows each job to declare how it should be uniquely identified, all while still relying simply on our `JobRun` model. Such an approach would allow us to make our `TransferBalanceJob` behave in whichever way was correct for our system ...
  32. john_account.balance # initial state # => 100_00 TransferBalanceJob.perform_later(john_account, jane_account, 10_00)

    TransferBalanceJob.perform_later(john_account, jane_account, 10_00) john_account.balance # resulting state # => 80_00 ... whether that is allowing the system to enqueue multiple runs of the same job with the same arguments
  33. Unique Job flexibly and generically class TransferBalanceJob < ApplicationJob prepend

    UniqueByJobRun # uniquely_identified_by_job_id uniquely_identified_by_job_args def perform(from_account, to_account, amount) from_account.lock! to_account.lock! from_account.update!(balance: from_account.balance - amount) to_account.update!(balance: to_account.balance + amount) end end Or, if the job was uniquely identified by its job args, ...
  34. john_account.balance # initial state # => 100_00 TransferBalanceJob.perform_later(john_account, jane_account, 10_00)

    TransferBalanceJob.perform_later(john_account, jane_account, 10_00) john_account.balance # resulting state # => 90_00 ... constraining the system to only execute the operation once, even if enqueued multiple times.
  35. • Use a database record to make job runs idempotent

    • custom or generic • by job ID or by job args ACIDic Jobs Level 1 Recap Thus, we could use our `JobRun` class as the foundation for building both transactionality and idempotency into our jobs. But, thus far, we have only considered jobs whose operations are only database writes. Often, our jobs need to do more.
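
    To make that recap concrete, here is one way the `UniqueByJobRun` concern prepended in the examples above might be sketched. The module name and the two class macros come from the slides; the implementation details (and the `completed?` helper on `JobRun`) are my assumptions:

    module UniqueByJobRun
      def self.prepended(base)
        base.extend(ClassMethods)
      end

      module ClassMethods
        attr_accessor :unique_by

        def uniquely_identified_by_job_id
          self.unique_by = :job_id
        end

        def uniquely_identified_by_job_args
          self.unique_by = :job_args
        end
      end

      # Wraps the job's own perform in a locked, once-only JobRun record.
      def perform(*args)
        identity =
          if self.class.unique_by == :job_args
            { job_args: serialize["arguments"] }
          else
            { job_id: job_id }
          end
        run = JobRun.find_or_create_by!(identity.merge(job_class: self.class.name))

        run.with_lock do
          return if run.completed?

          super
          run.update!(completed_at: Time.current)
        end
      end
    end
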
  36. ACIDic Jobs Level 2 — Enqueuing other Jobs One of

    the most common additional tasks required of jobs is to enqueue other jobs
  37. uniquely_identified_by_job_id def perform(from_account, to_account, amount) from_account.lock! to_account.lock! from_account.update!(balance: from_account.balance -

    amount) to_account.update!(balance: to_account.balance + amount) TransferMailer.with(account: from_account).outbound.deliver_later TransferMailer.with(account: to_account).inbound.deliver_later end We can imagine that after transferring balance, we need to send out confirmation emails to both parties. As it stands, this code is susceptible to problems.
  38. Failure Condition 1 (timeline diagram: the web process runs from.save!, to.save!, and

    TransferMailer.deliver_later inside a transaction; the background worker dequeues and starts the mailer job before the transaction commits, and the job fails) The first possible problem is that Sidekiq is simply too fast and enqueues, dequeues, and executes the job before the original job's database transaction commits. Because our outer job is ACIDic, this second job won't be able to see the state of the database created by the transaction until that transaction commits. While annoying, this problem at least naturally resolves, as the retry mechanism will try this second job again, and at some point that transaction will commit and the job will eventually succeed.
  39. Failure Condition 2 (timeline diagram: as before, the web process runs from.save!, to.save!, and

    TransferMailer.deliver_later, but this time the transaction rolls back while the already-dequeued mailer job starts and fails) The second problem is related, in that it is due to enqueuing a job within a transaction, but is also more pernicious. Imagine that instead of simply taking a while for the transaction to commit, the transaction actually rolls back. In this case the state mutations will be discarded completely, and the job inserted into the queue will never succeed no matter how many times it's retried.
  40. Solution Option 1 • A database-backed job queue • delayed_job

    • que • good_job But... no Sidekiq and increased db load One solution to these problems is to simply use a database-backed queue for all of your jobs, which makes all job enqueuing co-transactional. This is Nathan's suggestion. However, this means no Sidekiq, and for me that is a non-starter, as Sidekiq offers more than just a job backend.
  41. The second option is a pattern I first learned of

    from Brandur Leach in a blog post that is truly excellent and well worth your time. In fact, his entire blog is excellent and worth your time.
  42. Solution Option 2 • A transactionally-staged job queue So... more

    Sidekiq and minimal increased db load Brandur lays out the core value proposition clearly: make job enqueuing co-transactional by "staging" jobs in the database before enqueuing them in your background queue.
  43. uniquely_identified_by_job_id def perform(from_account, to_account, amount) from_account.lock! to_account.lock! from_account.update!(balance: from_account.balance -

    amount) to_account.update!(balance: to_account.balance + amount) TransferMailer.with(account: from_account).outbound.deliver_acidic TransferMailer.with(account: to_account).inbound.deliver_acidic end What if we could make it as easy to stage a job as it is to enqueue it? I was surprised at how little code we need to achieve this. For our example, we can extend `ActionMailer::MessageDelivery` to add a method to stage the job in a database record
  44. def deliver_acidic(options = {}) job = delivery_job_class attributes = {

    adapter: "activejob", job_name: job.name } job_args = if job <= ActionMailer::Parameterized::MailDeliveryJob [@mailer_class.name, @action, "deliver_now", {params: @params, args: @args}] else [@mailer_class.name, @action, "deliver_now", @params, *@args] end attributes[:job_args] = job.new(job_args).serialize StagedJob.create!(attributes) end Our custom delivery method does a bit of work to handle the different kinds of ActionMailer deliveries, but at its heart all it essentially does is create a database record, which will respect the transactional boundary.
  45. class StagedJob < ActiveRecord::Base after_create_commit :enqueue_job def enqueue_job case adapter

    when "activejob" ActiveJob::Base.deserialize(job_args).enqueue when "sidekiq" Sidekiq::Client.push("class" => job_name, "args" => job_args) end end end And that database record can then enqueue the job via an ActiveRecord callback Nathan and Brandur both actually imagine this pattern as requiring an independent process to de-stage staged jobs and enqueue them. But, a little bit of ActiveRecord magic allows us to have our cake and eat it too.
  46. • Use transactionally-staged jobs to keep job enqueuing co-transactional with

    standard database operations • while keeping Sidekiq, and • not requiring an independent de-staging process ACIDic Jobs Level 2 Recap So, by using transactionally-staged jobs, we can keep the ACIDic guarantees provided by our database transaction, keep using Sidekiq, and not need an independent de-staging process.
  47. def perform(order) order.lock! order.process_and_fulfill! ShopifyAPI::Fulfillment.create!({ amount: order.amount, customer: order.purchaser, })

    OrderMailer.with(order: order).fulfilled.deliver_acidic end A standard example would be fulfilling an order in a Shopify store. You receive the webhook for the order, ...
  48. def perform(order) order.lock! order.process_and_fulfill! ShopifyAPI::Fulfillment.create!({ amount: order.amount, customer: order.purchaser, })

    OrderMailer.with(order: order).fulfilled.deliver_acidic end you process that order and do whatever all you need to do in your database, ...
  49. def perform(order) order.lock! order.process_and_fulfill! ShopifyAPI::Fulfillment.create!({ amount: order.amount, customer: order.purchaser, })

    OrderMailer.with(order: order).fulfilled.deliver_acidic end And only after that transaction successfully commits do we start with step 2, telling Shopify that the order has been fulfilled ...
  50. def perform(order) order.lock! order.process_and_fulfill! ShopifyAPI::Fulfillment.create!({ amount: order.amount, customer: order.purchaser, })

    OrderMailer.with(order: order).fulfilled.deliver_acidic end And only after we successfully create the Shopify fulfillment do we want to send the email notification
  51. def perform(order) order.lock! order.process_and_fulfill! ShopifyAPI::Fulfillment.create!({ amount: order.amount, customer: order.purchaser, })

    OrderMailer.with(order: order).fulfilled.deliver_acidic end But what happens if we can't, for some reason (like our email service provider is experiencing downtime), send out the notification email?
  52. def perform(order) order.lock! order.process_and_fulfill! ShopifyAPI::Fulfillment.create!({ amount: order.amount, customer: order.purchaser, })

    OrderMailer.with(order: order).fulfilled.deliver_acidic end ? When the job retries, will we fulfill this order a second time, as if the user had purchased the same product twice?
  53. Another Brandur Leach blog post can help us navigate out

    of this tricky situation, showing us how to break our complex job workflow into transactional steps
  54. Workflow step 1 step 2 step 3 In our example

    workflow, we have 3 steps that are serially-dependent
  55. Workflow — Run 1 step 1 step 2 step 3

    If, on the first run, the first step succeeds but the second step fails
  56. Workflow — Run 2 step 1 step 2 step 3

    On the second run, the first step will be skipped altogether, and the retry will jump straight to the second step.
  57. Job-wise vs Step-wise Idempotency This is what it means to

    move from job-wise to step-wise idempotency ...
  58. Job-wise vs Step-wise vs Penny-wise Idempotency ... and I won't even

    get into what all it would take to make our jobs "pennywise idempotent" ...
  59. uniquely_identified_by_job_args def perform(order) @job.with_lock do order.process_and_fulfill! @job.update!(recovery_point: :fulfill_order) end if

    @job.recovery_point == :start # ... end But, we can make the workflow at least step-wise idempotent by again leveraging the power of a database record representing the job execution. If we add a `recovery_point` column to the record, we can track which steps in the workflow have successfully been completed. Presuming the job record is created with the value initially set to `:start`, we then simply update the column at the end of the step. We then guard the execution of that step with the value of this column.
  60. uniquely_identified_by_job_args def perform(order) # ... @job.with_lock do ShopifyAPI::Fulfillment.create!({ ... })

    @job.update!(recovery_point: :send_email) end if @job.recovery_point == :fulfill_order # ... end Then, we guard the second step operation with the recovery point name of the second step, and again update the job record with the name of the next recovery point at the end of the step.
  61. uniquely_identified_by_job_args def perform(order) # ... @job.with_lock do OrderMailer.with(order: order).fulfilled.deliver_acidic @job.update!(recovery_point:

    :finished) end if @job.recovery_point == :send_email end In the final step, we keep the same guarding logic but update the recovery point to `:finished` upon step completion.
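
    Putting those three fragments together, the whole hand-rolled step-wise perform might read something like this (a sketch; it assumes the concern has already found-or-created `@job` with its recovery_point initially set to :start, and that the stored value compares cleanly against these symbols):

    uniquely_identified_by_job_args

    def perform(order)
      # Step 1: only runs if we have never gotten past :start
      @job.with_lock do
        order.process_and_fulfill!
        @job.update!(recovery_point: :fulfill_order)
      end if @job.recovery_point == :start

      # Step 2: only runs once step 1 has committed
      @job.with_lock do
        ShopifyAPI::Fulfillment.create!(amount: order.amount, customer: order.purchaser)
        @job.update!(recovery_point: :send_email)
      end if @job.recovery_point == :fulfill_order

      # Step 3: only runs once steps 1 and 2 have committed
      @job.with_lock do
        OrderMailer.with(order: order).fulfilled.deliver_acidic
        @job.update!(recovery_point: :finished)
      end if @job.recovery_point == :send_email
    end
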
  62. include WithAcidity def perform(order) with_acidity do step :process_order step :fulfill_order

    step :send_emails end end def process_order; # ... end def fulfill_order; # ... end def send_emails; # ... end Imagine we could have 1 Workflow Job that provides a clear overview of all of the steps and how they flow, but is also step-wise idempotent. Moreover, we wouldn't have to repeat the boilerplate of the job lock transaction and the recovery-key updates.
  63. module WithAcidity def perform_step(current_step_method, next_step_method) return unless @job.recovery_point == current_step_method

    @job.with_lock do method(current_step_method).call @job.update!(recovery_point: next_step_method) end end end We could imagine a relatively simple heart to another concern that provides this DSL, which simply calls the step method and updates the job record recovery point within a transaction. Once again, our `JobRun` class and our locked database transactions form the foundation of an increasingly powerful and flexible set of tools for building resilient and robust jobs.
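
    And here is a sketch of how the `with_acidity` / `step` DSL might drive that `perform_step` method. Only `perform_step` appears on the slide (reproduced here with a to_s comparison, since recovery_point is a string column in this sketch); the rest is my assumption about how the pieces could fit together:

    module WithAcidity
      def with_acidity
        @__steps = []
        yield                      # collect the declared steps
        @__steps << :finished      # sentinel recovery point

        @job = JobRun.find_or_create_by!(
          job_class: self.class.name,
          job_args: serialize["arguments"]
        )
        @job.update!(recovery_point: @__steps.first) if @job.recovery_point.blank?

        # Walk the steps in order; perform_step skips any step that was
        # already completed on a previous run.
        @__steps.each_cons(2) do |current_step, next_step|
          perform_step(current_step, next_step)
        end
      end

      def step(method_name)
        @__steps << method_name
      end

      def perform_step(current_step_method, next_step_method)
        return unless @job.recovery_point.to_s == current_step_method.to_s

        @job.with_lock do
          method(current_step_method).call
          @job.update!(recovery_point: next_step_method)
        end
      end
    end
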
  64. • Use a recovery key to keep track of which

    steps in a workflow job have been successfully completed • make each step ACIDic, and • keep the entire workflow job ACIDic ACIDic Jobs Level 3 Recap So, by using this recovery key field on our `JobRun` records, we can move the ACIDic guarantees provided by our database transaction down into each individual step of the workflow, and thus allow the entire workflow job to remain sufficiently transactional and step-wise idempotent
  65. ACIDic Jobs Level 4 — Step Batches The problem with

    the step-wise idempotent workflow is that all work is done sequentially, in the same worker, in the same queue. What if we have work that can be done in parallel, or work that is better done on a different queue?
  66. def perform(order) with_acidity do step :process_order step :fulfill_order, awaits: [ShopifyFulfillJob]

    step :send_emails end end For example, instead of blocking our workflow queue with the external API call to Shopify, what if we could simply call a job that runs on a separate queue, but still not move on to step 3 until that job succeeds?
  67. — Mike Perham “Batches are Sidekiq Pro's [tool to] create

    a set of jobs to execute in parallel and then execute a callback when all the jobs are finished.” Sidekiq Pro offers the amazing Batches feature, which provides—among other things—a callback for when a specified collection of jobs are all successfully finished. This provides a mechanism for adding a wonderful new layer of power to our jobs
  68. Parallel Executing + Workflow Blocking It allows us to define

    steps that have parallel-executing jobs, but the step still blocks the workflow from moving on to the next step
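
    To give a flavor of how a step declared with `awaits:` might be wired up, here is a rough sketch using Sidekiq Pro's Batch API. None of this comes from the slides: the method and the StepCallback class are hypothetical names, the awaited jobs are assumed to be plain Sidekiq workers, and acidic_job's real implementation may well differ:

    def perform_awaited_step(current_step_method, next_step_method, awaits:)
      batch = Sidekiq::Batch.new
      batch.description = "#{self.class.name}##{current_step_method}"
      # When every awaited job succeeds, advance the recovery point and
      # re-enqueue this workflow job so it resumes at the next step.
      batch.on(:success, StepCallback, "job_run_id" => @job.id, "next_step" => next_step_method)
      batch.jobs do
        # What arguments the awaited workers need is app-specific;
        # the JobRun id is just a placeholder here.
        awaits.each { |worker_class| worker_class.perform_async(@job.id) }
      end
    end

    class StepCallback
      def on_success(_status, options)
        run = JobRun.find(options["job_run_id"])
        run.update!(recovery_point: options["next_step"])
        # Re-enqueue the workflow job; it will skip straight to the new step.
        # (Properly deserializing the stored arguments is glossed over here.)
        run.job_class.constantize.perform_later(*run.job_args)
      end
    end
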
  69. (diagram: Workflow Job A hands step 2 off to Shopify Fulfill Job A on a separate

    queue; Workflow Job B starts in parallel; Workflow Job A resumes only once the fulfill job succeeds) Thus, we could have a 3-step workflow that actually executes the second step on a separate queue, allowing another workflow job to start in parallel, while still ensuring that step 3 doesn't start until that separate job succeeds.
  70. • Use Sidekiq Batches to allow parallel, separately queued jobs

    to be used within a multi-step workflow • keep steps serially dependent • while allowing for parallelization ACIDic Jobs Level 4 Recap So, by leveraging the power of Sidekiq Batches, we can take our ACIDic jobs to the next level and enable steps to compose parallel-executing jobs that can run on separate queues, while the workflow as a whole still retains its step-wise idempotency and serial dependence.
  71. I am working to provide all of these various techniques

    and tools for building out increasingly complex jobs, all while maintaining transactionality and idempotency, in a new gem that I call `acidic_job`. It is still in a pre-1.0 state, but it is already being used successfully in production across my various work projects. I have no doubt that the community, that you all, could help me bring this to 1.0 and provide the Ruby ecosystem with a powerful and flexible toolset for building resilient, robust, ACIDic jobs. END