cheat, and trick you throughout this presentation. I will omit certain implementation details for the sake of clarity and then add them back in later. If you have a question on one slide, chances are it will be answered a couple of slides later; otherwise, ask me at the end.
a framework I developed last summer to deal with the enormous complexity of scheduling and sending emails. This talk is really the story of how I developed Telegram, and why I made certain design decisions. Because I can’t open source the code, I’m trying to give you enough to possibly inspire a similar framework. What are the problems?
be using the train wreck imagery wrong here. It’s not entirely bad code, it’s just that the people who were conducting it had left the company. Additionally, it was tightly coupled, and you couldn’t change it without changing lots of other mail-related code. What was another problem?
here. As a group we decided that a pipeline-structured system would be the best way to handle this. And then I thought, “wait, discrete components talking to each other in a pipeline?” Where have we seen this before?
crazy, just hold off judgement for now. So, quick recap: we’ve identified our problems and we’ve come up with basically two systems to deal with them. Let’s see how they work.
scheduler.call(user_ids) This is the general structure of the scheduler and slowly we’ll add on to this to demonstrate how powerful it can be. Now this doesn’t really do anything, so let’s start adding some basic functionality.
# ... end end scheduler.call(user_ids) Built into Telegram is this idea of a filter chain. Filtering is one of the basic needs of any emailing pipeline.
use GenericFilter end end scheduler.call(user_ids) We’ve put GenericFilter here and we’ll explain this in the next slide. Basically any Ruby object can act as a filter if it can respond to one of three methods.
first method is called #dataset. We use the Sequel library, which is why it’s called dataset here. Basically Filterchain is building up a chain for its initial query to the users table. For ActiveRecord this would be scopes, not datasets.
second method is called #ignore. Basically, before the query is built, we need to know which user ids to suppress right away. Ideally all your filters should be ones that implement #ignore, as that saves the cost of loading user objects just to perform filtering.
end The last method is #filter. Here we’re passing in an array of user objects which we fetched from the database. Blacklist should probably be implemented in a different way, but that’s outside the scope of this redesign. I guess it’s a measure of pragmatism vs perfection???
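The three filter methods can be sketched in plain Ruby. This is a minimal sketch, not Telegram's code: `User`, the three filter classes, and `SketchFilterChain` are hypothetical names, and a plain Array stands in for the Sequel dataset so the example runs without a database.

```ruby
# Hypothetical stand-ins for illustration only; not Telegram's real classes.
User = Struct.new(:id, :email, :deleted)

# Responds to #ignore: returns user ids to drop before any user is loaded.
class SuppressedIdsFilter
  def initialize(suppressed_ids)
    @suppressed_ids = suppressed_ids
  end

  def ignore
    @suppressed_ids
  end
end

# Responds to #dataset: narrows the query itself.
# (Assumption: an Array plays the role of the Sequel dataset here.)
class NotDeletedFilter
  def dataset(users)
    users.reject(&:deleted)
  end
end

# Responds to #filter: receives loaded user objects, returns the ones to keep.
class BlacklistedDomainFilter
  def initialize(domains)
    @domains = domains
  end

  def filter(users)
    users.reject { |user| @domains.include?(user.email.split("@").last) }
  end
end

class SketchFilterChain
  def initialize(table, filters)
    @table = table
    @filters = filters
  end

  def call(user_ids)
    # 1. Cheapest step: drop ignored ids before touching the "database".
    ignored = @filters.select { |f| f.respond_to?(:ignore) }.flat_map(&:ignore)
    ids = user_ids - ignored

    # 2. Build the query, letting each #dataset filter narrow it.
    scope = @table.select { |user| ids.include?(user.id) }
    @filters.select { |f| f.respond_to?(:dataset) }.each { |f| scope = f.dataset(scope) }

    # 3. Run the remaining filters against the loaded user objects.
    kept = @filters.select { |f| f.respond_to?(:filter) }
                   .inject(scope) { |users, f| f.filter(users) }
    kept.map(&:id)
  end
end

TABLE = [
  User.new(1, "a@example.com", false),
  User.new(2, "b@example.com", true),
  User.new(3, "c@blocked.test", false),
  User.new(4, "d@example.com", false)
]

chain = SketchFilterChain.new(TABLE, [
  SuppressedIdsFilter.new([4]),
  NotDeletedFilter.new,
  BlacklistedDomainFilter.new(["blocked.test"])
])
remaining = chain.call([1, 2, 3, 4])
```

The ordering in `call` mirrors the cost argument from the notes: ids removed by #ignore never reach the query, and only users that survive the query are loaded for #filter.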
use DeletedFilter use SuppresionFilter use BlacklistFilter end end scheduler.call(user_ids) So now we have our filters in place. Just as a comparison, at Rearden we have 9 filters in this step. So now we’ve finally slimmed down our list of users who should receive an email.
use DeletedFilter use SuppresionFilter use BlacklistFilter end use Rearden::SecretSauce end scheduler.call(user_ids) So the next components can be whatever you want. At Rearden this is where we start injecting some users from outside the original list, but it can be whatever you want.
# Do something important @app.call(user_ids) end end Remember, just like Rack, the middleware can do whatever it wants as long as it passes user ids down and emails up. Now, I keep mentioning emails. What exactly do I mean by that?
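A rough sketch of how such a Rack-style stack could be wired, under stated assumptions: `SketchScheduler`, `InjectUsers`, and `CountEmails` are hypothetical names invented for this example, and the placeholder strings stand in for whatever the bottom of the stack really returns.

```ruby
# Hypothetical builder, loosely modelled on Rack::Builder; not Telegram's API.
class SketchScheduler
  def initialize(&block)
    @middlewares = []
    @app = nil
    instance_eval(&block)
  end

  def use(klass, *args)
    @middlewares << [klass, args]
  end

  def run(app)
    @app = app
  end

  def call(user_ids)
    # Wrap the endpoint from the inside out so the first `use` is outermost.
    stack = @middlewares.reverse.inject(@app) do |inner, (klass, args)|
      klass.new(inner, *args)
    end
    stack.call(user_ids)
  end
end

LOG = []

# Changes the user ids on the way down.
class InjectUsers
  def initialize(app, extra_ids)
    @app = app
    @extra_ids = extra_ids
  end

  def call(user_ids)
    @app.call(user_ids + @extra_ids)
  end
end

# Leaves the ids alone and inspects the result on the way back up.
class CountEmails
  def initialize(app)
    @app = app
  end

  def call(user_ids)
    emails = @app.call(user_ids)
    LOG << "#{emails.size} emails produced"
    emails
  end
end

scheduler = SketchScheduler.new do
  use InjectUsers, [99]
  use CountEmails
  run ->(user_ids) { user_ids.map { |id| "email-for-#{id}" } }
end

emails = scheduler.call([1, 2])
```

Each middleware only needs to honour the contract from the notes: accept user ids, call the next layer, and return what comes back.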
use DeletedFilter use SuppresionFilter use BlacklistFilter end use Rearden::SecretSauce run Rearden::Targeting end scheduler.call(user_ids) We’ll get to that now, in what’s called the Targeting step. Like any Rack app, the magic happens at the bottom. The Targeting step can do whatever it wants as long as it takes in the array of user ids from the previous step and outputs a “special structure”. I don’t have a name for it, so I’m just going to refer to it with air quotes until you see what they’re used for.
it’s an array of these “special structures”. The first element of each structure is a string denoting the type, the next N - 1 elements can be any kind of specific data as long as the last element holds an array of user ids. I’ll use an example from our code. Here the 1 stands for the id of a specific deal.
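To make the shape concrete, here is a small sketch of taking such structures apart. The type names "morning" and "digest" and the user ids are made up for illustration; only the layout (type first, user ids last, anything in between) comes from the description above.

```ruby
# Hypothetical values; only the layout follows the description in the talk.
structures = [
  ["morning", 1, [10, 11, 12]], # type, deal id, user ids
  ["digest", [13]]              # type, no extra data, user ids
]

parsed = structures.map do |structure|
  # Splat in the middle: first element is the type, last is the user ids.
  type, *data, user_ids = structure
  { type: type, data: data, user_ids: user_ids }
end
```

Because the user ids are always last, code that only cares about recipients never needs to know how many data elements a given type carries.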
use DeletedFilter use SuppresionFilter use BlacklistFilter end use Batching use Rearden::SecretSauce run Rearden::Targeting end scheduler.call(user_ids) So remember I said that users had to be the last element? It’s important because I withheld a vital piece of information earlier. There still is one piece we need to add, batching.
Now why the hell is this important? Each element in this array is a discrete block of emails. Of course having a batch size of two seems silly, 500 is a more acceptable number. The reason why we batch is because we want to isolate failures in the mailer to a single batch. 500 is big enough that we get nice throughput but not big enough that we care if a batch dies for any reason.
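A batching middleware along these lines could be sketched as follows. This is an assumption about how the step might work, not the real implementation: `SketchBatching` is a hypothetical name, and it relies only on the rule that user ids are the last element of each structure.

```ruby
# Hypothetical middleware; splits each structure into fixed-size batches.
class SketchBatching
  def initialize(app, batch_size = 500)
    @app = app
    @batch_size = batch_size
  end

  def call(user_ids)
    @app.call(user_ids).flat_map do |structure|
      # Everything before the last element is kept as-is for every batch.
      *head, ids = structure
      ids.each_slice(@batch_size).map { |slice| [*head, slice] }
    end
  end
end

# Stand-in for the targeting step: one structure covering every user.
targeting = ->(user_ids) { [["morning", 1, user_ids]] }

batched = SketchBatching.new(targeting, 2).call([1, 2, 3, 4, 5])
```

With a batch size of two, five users become three structures, each of which can fail or succeed on its own.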
use DeletedFilter use SuppresionFilter use BlacklistFilter end use Batching use Rearden::SecretSauce run Rearden::Targeting end scheduler.call(user_ids) Let’s go back to our code. You’ll notice scheduler.call really isn’t doing anything. Sure it’s pushing down our users and getting our “special structures” back. But scheduler doesn’t know what to do with those structures. I’m going to add a place for them to go.
use DeletedFilter use SuppresionFilter use BlacklistFilter end use Batching use Rearden::SecretSauce run Rearden::Targeting end scheduler.call(user_ids) So now I’ve added GenericMailer. This is the gap between Telegram’s two systems. The scheduler is sending the structures to GenericMailer. This is the scheduler.
mailer’s primary function is to answer the question, “what does the user actually receive?” To backtrack a bit, our Scheduler is sending our “special structures” to the mailer, but how is the mailer receiving them?
user_ids) # ... end end The “special structures” are just representations of the method call. In fact our scheduler could turn out any of these structures and GenericMailer would work as long as it had a corresponding method to be called.
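Read that way, dispatching a structure is just a dynamic method call. A minimal sketch, assuming a hypothetical `SketchMailer` with a `morning` method and a hypothetical `dispatch` helper:

```ruby
# Hypothetical mailer that records calls instead of sending anything.
class SketchMailer
  attr_reader :calls

  def initialize
    @calls = []
  end

  def morning(deal_id, user_ids)
    @calls << [:morning, deal_id, user_ids]
  end
end

# The first element names the method; the rest are its arguments.
def dispatch(mailer, structure)
  type, *args = structure
  mailer.public_send(type, *args)
end

mailer = SketchMailer.new
dispatch(mailer, ["morning", 1, [10, 11]])
```

Using `public_send` rather than `send` keeps a structure from reaching private methods on the mailer.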
= Deal[deal_id] set Morning, :locals => {:deal => deal} User.where(:id => user_ids).each do |user| add_recipient(:to => user.email) end deliver end end So here’s a full example of a mailer. I’ve cleaned it up from our codebase, but the structure is all there. Let’s walk through this. Ignoring the “envelopes” for a second, we’re just fetching the deal, then fetching the users in batches, and then calling deliver. So it looks like the only non-apparent methods are set and add_recipient. For now let’s assume set and add_recipient are modifying an internal data structure.
def formatted_time Time.now.strftime("%B %d, %Y") end def deal_image_url @deal.image_url end def formatted_price format_currency(@deal.price_in_dollars) end def formatted_value format_currency(@deal.value_in_dollars) end end An envelope is where the logic for an email resides. If anyone is familiar with Mustache contexts, this should be straightforward. Telegram uses the same idea: these methods are called and form a hash with data that each email can access.
= Deal[deal_id] set Morning, :locals => {:deal => deal} User.where(:id => user_ids).each do |user| add_recipient(:to => user.email) end deliver end end #add_recipient is very similar, except the hash it creates is specific to a user and not specific to the batch. Now I brought up Mustache and contexts earlier. I also mentioned that these two methods modify an internal data structure.
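One way an envelope's methods could be turned into a hash is sketched below. This is a guess at the mechanism, not Telegram's code: `SketchEnvelope`, `MorningSketch`, `to_context`, and the `Deal` struct are all hypothetical, and locals are assumed to become instance variables.

```ruby
# Hypothetical stand-in for the deal model.
Deal = Struct.new(:image_url, :price_in_dollars)

class SketchEnvelope
  # Assumption: each local becomes an instance variable on the envelope.
  def initialize(locals = {})
    locals.each { |name, value| instance_variable_set("@#{name}", value) }
  end

  # Calls every public method defined directly on the subclass and
  # collects the results into a hash keyed by method name.
  def to_context
    self.class.public_instance_methods(false).each_with_object({}) do |name, context|
      context[name] = public_send(name)
    end
  end
end

class MorningSketch < SketchEnvelope
  def deal_image_url
    @deal.image_url
  end

  def formatted_price
    format("$%.2f", @deal.price_in_dollars)
  end
end

deal = Deal.new("https://example.com/deal.png", 10)
context = MorningSketch.new(deal: deal).to_context
```

The resulting hash is the kind of context a Mustache-style template can read from, with one key per envelope method.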
{:name => "Bob", :to => "[email protected]"}, {:name => "Pooja", :to => "[email protected]"} ] ] This is what it’s building. All of our work has been to get to this point. This is an intermediate representation of a batch of emails. The first element is the HTML and text bodies. The second element is data unique to the batch, and the third and last element is an array of data that is unique to each email. Questions? This is always where I lose people. So now you can see where I’m using Mustache for templating and why I’m building up “contexts”. set is working on the second element and add_recipient is working on the third.
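The three-element representation can be sketched end to end like this. Several things here are assumptions: the first element is modelled as a hash with `:html` and `:text` keys, the template text and deal title are invented, and a tiny `gsub`-based substitution stands in for real Mustache rendering so the example has no gem dependencies.

```ruby
# Hypothetical batch in the three-element shape described above.
messages = [
  { html: "<p>Hi {{name}}, today: {{deal_title}}</p>",
    text: "Hi {{name}}, today: {{deal_title}}" },   # bodies
  { deal_title: "Half-price coffee" },               # data shared by the batch
  [
    { name: "Bob", to: "bob@example.com" },          # data unique to each email
    { name: "Pooja", to: "pooja@example.com" }
  ]
]

# Stand-in for Mustache: replaces {{key}} with the matching context value.
def render(template, context)
  template.gsub(/\{\{(\w+)\}\}/) { context.fetch($1.to_sym, "").to_s }
end

def render_batch(messages)
  bodies, batch_context, recipients = messages
  recipients.map do |recipient|
    # Per-recipient data is layered on top of the batch-wide data.
    context = batch_context.merge(recipient)
    { to: recipient[:to],
      html: render(bodies[:html], context),
      text: render(bodies[:text], context) }
  end
end

rendered = render_batch(messages)
```

The bodies and the batch context are stored once per batch, and only the small per-recipient hashes grow with the number of users.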
= Renderer.new(messages) emails = renderer.call emails.each do |email| ActionMailer::Base.wrap_delivery_behavior(email) email.deliver end end end end end This is then sent to the “Renderer”, which is not important. But at the end of the process you get a fully rendered email. This is verbatim what our sender does when deliver is called in the mailer. We’re still using ActionMailer to deliver the mail, why not? We still have to send out emails that don’t go through this system, which is why we keep ActionMailer around.