

Sam Aarons
October 06, 2012

Telegram or: How I Learned to Stop Worrying and Love Mass Mail

This is a talk I gave at MagicRuby 2012.


Transcript

  1. Telegram Or: How I Learned to Stop Worrying & Love Mass Mail
  2. Sam Aarons | github.com/saarons | Rearden Commerce

    About Me: Junior at Columbia. Currently an intern/consultant working at Rearden Commerce, based out of New York.
  3. I will betray, cheat & trick you

    I will betray, cheat, and trick you throughout this presentation. I will omit certain implementation details for the sake of clarity and then add them back in later. If you have a question on one slide, chances are it will be answered a couple of slides later; otherwise, ask me at the end.
  4. The Problem

    First of all, what is Telegram? Telegram is a framework I developed last summer to deal with the enormous complexity of scheduling and sending emails. This talk is really the story of how I developed Telegram and why I made certain design decisions. Because I can’t open source the code, I’m trying to give you enough to possibly inspire a similar framework. What are the problems?
  5. Old Code

    The old code was a problem. I might be using the train wreck imagery wrong here. It’s not entirely bad code; it’s just that the people who were conducting it had left the company. Additionally, it was tightly coupled, and you couldn’t change it without changing lots of other mail-related code. What was another problem?
  6. Speed

    Speed was a problem. We send over 1,000,000 emails a day, and we need any new system to be more performant.
  7. Choice

    Choice was a problem. We have different requirements (HR, YP, Chase) and each partner can completely customize their email: style, markup, even the subject line. Managing this was tough.
  8. The Solution

    So now let’s talk about the solutions. We took all the problems we just talked about and spent a few hours throwing them all up on the whiteboard.
  9. You can start to see the genesis of the idea here. As a group we decided that a pipeline-structured system would be the best way to handle this. And then I thought, “Wait, discrete components talking to each other in a pipeline? Where have we seen this before?”
  10. So this is what forms the essence of Telegram. Instead of HTTP requests and responses, we have users going down and emails coming up.
  11. Users | Emails

    Again, I’m lying to you right now. This may not make sense yet; we’ll dive deeper later.
  12. The Solution | Scheduler | ?????????

    So back to the solution. We call this Rack-like system the scheduler. But it’s only half of the solution. What’s the other half?
  13. Code Management | Customization | Performance

    The scheduler takes care of the first and third problems. So how do we solve customization? You basically write your own ActionMailer.
  14. The Solution | Scheduler | Mailer

    So you might start thinking I’m crazy; just hold off judgement for now. Quick recap: we’ve identified our problems and we’ve come up with basically two systems to deal with them. Let’s see how they work.
  15. The Scheduler

    Let’s discuss the scheduler. I could spend some time telling you how it works, but I think it’s much easier to show you in code and walk you through it.
  16. user_ids = [1,2,3]
      scheduler = Scheduler.new do
        # ...
      end
      scheduler.call(user_ids)

    This is the general structure of the scheduler, and we’ll slowly add on to it to demonstrate how powerful it can be. Right now it doesn’t really do anything, so let’s start adding some basic functionality.
  17. user_ids = [1,2,3]
      scheduler = Scheduler.new do
        use FilterChain do
          # ...
        end
      end
      scheduler.call(user_ids)

    Built into Telegram is this idea of a filter chain. Filtering is one of the basic needs of any emailing pipeline.
  18. user_ids = [1,2,3]
      scheduler = Scheduler.new do
        use FilterChain do
          use GenericFilter
        end
      end
      scheduler.call(user_ids)

    We’ve put GenericFilter here, and we’ll explain it in the next slide. Basically any Ruby object can act as a filter if it responds to one of three methods.
  19. class GenericFilter
        def dataset(chain)
          chain.where(:is_deleted => false)
        end
      end

    The first method is called #dataset. We use the Sequel library, which is why it’s called dataset here. Basically FilterChain is building up a chain for its initial query to the users table. For ActiveRecord this would be scopes, not datasets.
  20. class GenericFilter
        def ignore(user_ids)
          user_ids - SuppressionList.today
        end
      end

    The second method is called #ignore. Before the query is built, we need to know which user ids to suppress right away. Ideally all your filters should implement #ignore, as it avoids the cost of loading user objects just to perform filtering.
  21. class GenericFilter
        def filter(users)
          users.reject do |u|
            Blacklist.include?(u.email)
          end
        end
      end

    The last method is #filter. Here we’re passing in an array of user objects which we fetched from the database. Blacklist should probably be implemented in a different way, but that’s outside the scope of this redesign. I guess it’s a matter of pragmatism vs. perfection.
  22. user_ids = [1,2,3]
      scheduler = Scheduler.new do
        use FilterChain do
          use DeletedFilter
          use SuppresionFilter
          use BlacklistFilter
        end
      end
      scheduler.call(user_ids)

    So now we have our filters in place. Just as a comparison, at Rearden we have 9 filters in this step. We’ve finally slimmed down our list of users who should receive an email.
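
A minimal sketch of how a filter chain might apply the three filter methods in the order the slides describe (#ignore on bare ids, then #dataset on the query, then #filter on loaded users). Telegram's code is not public, so everything here is an assumption made for illustration: `SketchFilterChain`, `MemoryDataset`, the three `Sketch...Filter` classes, and the sample data are hypothetical, and `MemoryDataset` is an in-memory stand-in for a query object, not Sequel's API.

```ruby
User = Struct.new(:id, :email, :is_deleted)

# Hypothetical in-memory stand-in for a chainable dataset (not Sequel).
class MemoryDataset
  def initialize(rows)
    @rows = rows
  end

  # Keep rows whose attributes match every condition; array values mean "in".
  def where(conditions)
    kept = @rows.select do |row|
      conditions.all? { |key, value| Array(value).include?(row[key]) }
    end
    MemoryDataset.new(kept)
  end

  def all
    @rows
  end
end

# Hypothetical filters, each implementing one of the three methods.
class SketchSuppressionFilter
  def initialize(suppressed_ids)
    @suppressed_ids = suppressed_ids
  end

  def ignore(user_ids)
    user_ids - @suppressed_ids
  end
end

class SketchDeletedFilter
  def dataset(chain)
    chain.where(:is_deleted => false)
  end
end

class SketchBlacklistFilter
  def initialize(emails)
    @emails = emails
  end

  def filter(users)
    users.reject { |u| @emails.include?(u.email) }
  end
end

# Hypothetical chain: runs the cheapest stage first, passes surviving ids down.
class SketchFilterChain
  def initialize(app, table, filters)
    @app = app
    @table = table
    @filters = filters
  end

  def call(user_ids)
    ids   = apply(:ignore, user_ids)                  # no query yet
    query = apply(:dataset, @table.where(:id => ids)) # narrows the query
    users = apply(:filter, query.all)                 # needs loaded objects
    @app.call(users.map(&:id))
  end

  private

  # Fold a value through every filter that responds to the given method.
  def apply(method_name, initial)
    @filters.select { |f| f.respond_to?(method_name) }
            .inject(initial) { |acc, f| f.public_send(method_name, acc) }
  end
end

table = MemoryDataset.new([
  User.new(1, "a@example.com", false),
  User.new(2, "b@example.com", true),  # deleted
  User.new(3, "c@example.com", false), # suppressed below
  User.new(4, "d@example.com", false), # blacklisted below
  User.new(5, "e@example.com", false)
])
filters = [
  SketchSuppressionFilter.new([3]),
  SketchDeletedFilter.new,
  SketchBlacklistFilter.new(["d@example.com"])
]
chain = SketchFilterChain.new(lambda { |ids| ids }, table, filters)
remaining = chain.call([1, 2, 3, 4, 5])
```

The sketch passes filter instances in directly instead of using the `use` DSL from the slides, purely to keep it short.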
  23. user_ids = [1,2,3]
      scheduler = Scheduler.new do
        use FilterChain do
          use DeletedFilter
          use SuppresionFilter
          use BlacklistFilter
        end
        use Rearden::SecretSauce
      end
      scheduler.call(user_ids)

    The next components can be whatever you want. At Rearden this is where we start injecting some users from outside the original list.
  24. class Rearden::SecretSauce
        def initialize(app)
          @app = app
        end

        def call(user_ids)
          # Do something important
          @app.call(user_ids)
        end
      end

    Remember, just like Rack, the middleware can do whatever it wants as long as it passes user ids down and emails up. Now, I keep mentioning emails; what exactly do I mean by that?
  25. user_ids = [1,2,3]
      scheduler = Scheduler.new do
        use FilterChain do
          use DeletedFilter
          use SuppresionFilter
          use BlacklistFilter
        end
        use Rearden::SecretSauce
        run Rearden::Targeting
      end
      scheduler.call(user_ids)

    We’ll get to that now, in what’s called the Targeting step. Like any Rack app, the magic happens at the bottom. The Targeting step can do whatever it wants as long as it takes in the array of user ids from the previous step and outputs a “special structure”. I don’t have a name for it, so I’m just going to refer to it with air quotes until you see what they’re used for.
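
A minimal sketch of how a Rack-like builder with `use` and `run` could be wired up, assuming the same conventions Rack uses (middleware is constructed with the next app and exposes `call`). `SketchScheduler`, `EvenOnly`, and the `targeting` lambda are hypothetical names for illustration only; this is not Telegram's implementation.

```ruby
# Hypothetical stand-in for the scheduler described in the talk.
class SketchScheduler
  def initialize(&block)
    @middleware = [] # [klass, args, block] triples, in declaration order
    @endpoint = nil
    instance_eval(&block) if block
  end

  # Register a middleware class; it is instantiated later with the next app.
  def use(klass, *args, &block)
    @middleware << [klass, args, block]
  end

  # Register the bottom of the stack (the Targeting step in the talk).
  def run(endpoint)
    @endpoint = endpoint
  end

  # Build the chain from the bottom up, then push user ids down through it.
  def call(user_ids)
    app = @middleware.reverse.inject(@endpoint) do |inner, (klass, args, block)|
      klass.new(inner, *args, &block)
    end
    app.call(user_ids)
  end
end

# Middleware: drops odd ids on the way down, returns what comes back up untouched.
class EvenOnly
  def initialize(app)
    @app = app
  end

  def call(user_ids)
    @app.call(user_ids.select(&:even?))
  end
end

# Endpoint: turns the remaining user ids into one "special structure".
targeting = lambda { |user_ids| [["morning", 1, user_ids]] }

scheduler = SketchScheduler.new do
  use EvenOnly
  run targeting
end

structures = scheduler.call([1, 2, 3, 4])
```

Anything that responds to `call` can sit at the bottom here; the lambda keeps the example short.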
  26. [ ["morning", 1, [1,2,3]] ]

    Practically the same thing, except it’s an array of these “special structures”. The first element of each structure is a string denoting the type; the elements after it can be any kind of specific data, as long as the last element holds an array of user ids. I’ll use an example from our code. Here the 1 stands for the id of a specific deal.
  27. user_ids = [1,2,3]
      scheduler = Scheduler.new do
        use FilterChain do
          use DeletedFilter
          use SuppresionFilter
          use BlacklistFilter
        end
        use Batching
        use Rearden::SecretSauce
        run Rearden::Targeting
      end
      scheduler.call(user_ids)

    So remember I said that users had to be the last element? It’s important because I withheld a vital piece of information earlier. There is still one piece we need to add: batching.
  28. [ ["morning", 1, [1,2,3]] ]

    Let’s say we have a batching size of 2. Batching would transform this...
  29. [ ["morning", 1, [1,2]], ["morning", 1, [3]] ]

    ...into this. Now why the hell is this important? Each element in this array is a discrete block of emails. Of course a batch size of two seems silly; 500 is a more acceptable number. We batch because we want to isolate failures in the mailer to a single batch. 500 is big enough that we get nice throughput, but not so big that we care if a batch dies for any reason.
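
The transformation above can be sketched as a middleware that splits each structure's trailing array of user ids on the way back up. This is a sketch under assumptions: `SketchBatching` and its `batch_size` argument are hypothetical, and the `targeting` lambda stands in for the real Targeting step.

```ruby
# Hypothetical batching middleware: splits the last element of each
# structure (the user ids) into slices of at most batch_size.
class SketchBatching
  def initialize(app, batch_size = 500)
    @app = app
    @batch_size = batch_size
  end

  def call(user_ids)
    structures = @app.call(user_ids)
    structures.flat_map do |structure|
      *head, ids = structure # head is the type plus any extra data
      ids.each_slice(@batch_size).map { |slice| head + [slice] }
    end
  end
end

targeting = lambda { |user_ids| [["morning", 1, user_ids]] }
batched = SketchBatching.new(targeting, 2).call([1, 2, 3])
```

Because only the last element is split, this is why the user ids have to come last in each structure.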
  30. user_ids = [1,2,3]
      scheduler = Scheduler.new do
        use FilterChain do
          use DeletedFilter
          use SuppresionFilter
          use BlacklistFilter
        end
        use Batching
        use Rearden::SecretSauce
        run Rearden::Targeting
      end
      scheduler.call(user_ids)

    Let’s go back to our code. You’ll notice scheduler.call really isn’t doing anything. Sure, it’s pushing down our users and getting our “special structures” back, but the scheduler doesn’t know what to do with those structures. I’m going to add a place for them to go.
  31. user_ids = [1,2,3]
      scheduler = Scheduler.new(GenericMailer) do
        use FilterChain do
          use DeletedFilter
          use SuppresionFilter
          use BlacklistFilter
        end
        use Batching
        use Rearden::SecretSauce
        run Rearden::Targeting
      end
      scheduler.call(user_ids)

    So now I’ve added GenericMailer. This is the gap between Telegram’s two systems. The scheduler is sending the structures to GenericMailer. This is the scheduler.
  32. Who should I send email to?

    So to recap, the Scheduler answers two important questions.
  33. The Mailer

    So now we’ve gotten to the mailer. The mailer’s primary function is to answer the question, “What does the user actually receive?” To backtrack a bit, our Scheduler is sending our “special structures” to the mailer, but how is the mailer receiving them?
  34. class GenericMailer < Telegram::Base
        def morning(deal_id, user_ids)
          # ...
        end
      end

    Now look at the mailer. Right off the bat, does anything seem familiar?
  35. # ["morning", 1, [1,2,3]]
      class GenericMailer < Telegram::Base
        def morning(deal_id, user_ids)
          # ...
        end
      end

    The “special structures” are just representations of the method call. In fact, our scheduler could turn out any of these structures and GenericMailer would work as long as it had a corresponding method to be called.
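
One way to read a structure as a method call is to split off the first element as the method name and splat the rest as arguments. The sketch below assumes that reading; `SketchMailer`, its `log`, and the `dispatch` helper are hypothetical and only illustrate the idea.

```ruby
# Hypothetical mailer that records what it was asked to do.
class SketchMailer
  attr_reader :log

  def initialize
    @log = []
  end

  def morning(deal_id, user_ids)
    @log << "deal #{deal_id} -> users #{user_ids.inspect}"
  end
end

# Treat ["morning", 1, [1,2,3]] as mailer.morning(1, [1,2,3]).
def dispatch(mailer, structure)
  name, *args = structure
  mailer.public_send(name, *args)
end

mailer = SketchMailer.new
dispatch(mailer, ["morning", 1, [1, 2, 3]])
```

Using `public_send` rather than `send` means a structure cannot reach private methods on the mailer.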
  36. require "envelopes/morning"
      class GenericMailer < Telegram::Base
        def morning(deal_id, user_ids)
          deal = Deal[deal_id]
          set Morning, :locals => {:deal => deal}
          User.where(:id => user_ids).each do |user|
            add_recipient(:to => user.email)
          end
          deliver
        end
      end

    So here’s a full example of a mailer. I’ve cleaned it up from our codebase, but the structure is all there. Let’s walk through this, ignoring the “envelopes” for a second. We’re just fetching the deal, then fetching the users in batches, and then calling deliver. So it looks like the only non-apparent methods are set and add_recipient. For now let’s assume set and add_recipient are modifying an internal data structure.
  37. class Morning < Telegram::Envelope
        def subject
          @deal.email_title.presence || @deal.title
        end

        def formatted_time
          Time.now.strftime("%B %d, %Y")
        end

        def deal_image_url
          @deal.image_url
        end

        def formatted_price
          format_currency(@deal.price_in_dollars)
        end

        def formatted_value
          format_currency(@deal.value_in_dollars)
        end
      end

    An envelope is where the logic for an email resides. If you are familiar with Mustache contexts, this should be straightforward. Telegram uses the same idea: these methods are called and form a hash of data that each email can access.
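
A minimal sketch of how an envelope's methods could be collected into a context hash, assuming locals become instance variables and every public method defined on the subclass contributes one key. `SketchEnvelope`, `to_context`, `MorningSketch`, and the `Deal` struct are hypothetical; the talk does not show how Telegram::Envelope does this.

```ruby
# Hypothetical base class: turns locals into instance variables and
# exposes the subclass's public methods as a hash.
class SketchEnvelope
  def initialize(locals = {})
    locals.each { |name, value| instance_variable_set("@#{name}", value) }
  end

  # Call each public method defined directly on the subclass and
  # collect the results under the method's name.
  def to_context
    self.class.public_instance_methods(false).each_with_object({}) do |name, context|
      context[name] = public_send(name)
    end
  end
end

Deal = Struct.new(:title, :price_in_dollars)

class MorningSketch < SketchEnvelope
  def subject
    @deal.title
  end

  def formatted_price
    format("$%.2f", @deal.price_in_dollars)
  end
end

deal = Deal.new("Half-price coffee", 4.5)
context = MorningSketch.new(:deal => deal).to_context
```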
  38. require "envelopes/morning"
      class GenericMailer < Telegram::Base
        def morning(deal_id, user_ids)
          deal = Deal[deal_id]
          set Morning, :locals => {:deal => deal}
          User.where(:id => user_ids).each do |user|
            add_recipient(:to => user.email)
          end
          deliver
        end
      end

    #add_recipient is very similar, except the hash it creates is specific to a user and not to the batch. Now, I brought up Mustache and contexts earlier. I also mentioned that these two methods modify an internal data structure.
  39. [
        ["<p>Hello {{name}}</p>", "Hello {{name}}"],
        {:subject => "Hello Email"},
        [
          {:name => "Bob", :to => "[email protected]"},
          {:name => "Pooja", :to => "[email protected]"}
        ]
      ]

    This is what it’s building. All of our work has been to get to this point. This is an intermediate representation of a batch of emails. The first element is the HTML and text bodies. The second element is data unique to the batch, and the third and last element is an array of data that is unique to each email. Questions? This is always where I lose people. So now you can see where I’m using Mustache for templating and why I’m building up “contexts”. set works on the second element and add_recipient works on the third.
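
To make the three-part structure concrete, here is a sketch that expands one batch into per-recipient emails. It assumes the batch data and each recipient's data are merged into a single context, and it swaps in a plain `gsub` for `{{name}}` tags rather than the Mustache library. `render_template`, `render_batch`, and the example.com addresses are hypothetical.

```ruby
# Replace {{key}} tags with values from the context (simple stand-in for Mustache).
def render_template(template, context)
  template.gsub(/\{\{(\w+)\}\}/) { context.fetch($1.to_sym).to_s }
end

# Expand [bodies, batch_data, recipients] into one hash per email.
def render_batch(batch)
  (html, text), batch_data, recipients = batch
  recipients.map do |recipient|
    context = batch_data.merge(recipient) # per-email data wins on conflicts
    {
      :to      => context[:to],
      :subject => render_template(context[:subject], context),
      :html    => render_template(html, context),
      :text    => render_template(text, context)
    }
  end
end

batch = [
  ["<p>Hello {{name}}</p>", "Hello {{name}}"],
  {:subject => "Hello Email"},
  [
    {:name => "Bob",   :to => "bob@example.com"},   # placeholder address
    {:name => "Pooja", :to => "pooja@example.com"}  # placeholder address
  ]
]
emails = render_batch(batch)
```

Keeping the bodies and batch data out of the per-recipient hashes means a batch of 500 carries the template once instead of 500 times.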
  40. module Telegram
        class Sender
          class << self
            def deliver(messages)
              renderer = Renderer.new(messages)
              emails = renderer.call
              emails.each do |email|
                ActionMailer::Base.wrap_delivery_behavior(email)
                email.deliver
              end
            end
          end
        end
      end

    This is then sent to the “Renderer”, which is not important here. At the end of the process you get a fully rendered email. This is verbatim what our sender does when deliver is called in the mailer. We’re still using ActionMailer to deliver the mail; why not? We still have to send out emails that don’t go through this system, which is why we keep ActionMailer around.