
Zero-downtime payment platforms

Revised version of a talk first given at RailsConf 2013 (https://speakerdeck.com/sikachu/zero-downtime-payment-platforms)

Presented at RailsPacific 2014 on September 27, 2014.

Video is available at https://www.youtube.com/watch?v=N8sYlKheRrk

Prem Sichanugrist

September 27, 2014

Transcript

  1. • Mobile payments (Android, iOS, WP7) company from Boston
     • Show QR code on phone to cashier to create an order
     • Order #create to Rails 4.1 app
     • Eventually hits credit/debit card via payment gateway.
  2. Our Stack
     • Heroku* cedar
     • Postgres DB, two followers (one on west coast)
     * Heroku is on AWS.
  3. Risk

         class Risk
           def initialize(order)
             @amount = order.balance.to_f
           end

           def low?
             @amount < 100.0
           end
         end
  4. Timeout & Accept
     • Wrap a charge in a timeout
     • If it times out, evaluate risk
     • If low risk, save it and return success
     • Cron task to retry timed-out orders
  5. Timeout

         # app/models/customer_charger.rb
         def charge
           Timeout.timeout(TIMEOUT_IN_SECONDS) do
             charge_card_via_gateway
           end
         rescue Timeout::Error
           assess_risk_of_saving_order_without_charging_card
         end
  6.     def assess_risk_of_saving_order_without_charging_card
           if Risk.new(@order).low?
             true
           else
             @card.errors.add :base, 'card failed!'
             false
           end
         end
  7.     def assess_risk_of_saving_order_without_charging_card
           if Risk.new(@order).low?
             @order.gateway_id = "gateway-down-#{SecureRandom.hex(32)}"
             true
           else
             @card.errors.add :base, 'card failed!'
             false
           end
         end
  8.     # app/models/order.rb
         def self.reconcilable
           where("gateway_id LIKE 'gateway-down%'")
         end

         Order.reconcilable.find_each do |order|
           order.reconcile
         end
  9.     def reconcile
           # search gateway for similar-looking charge
           if gateway_id = SimilarOrderFinder.new(self).find
             # found one! update this order and don't re-charge
             update_attribute :gateway_id, gateway_id
           else
             charge
             save
           end
         end

         Order.reconcilable.find_each do |order|
           order.reconcile
         end
  10. Cons
      • Not really: it worked well for quite a while.
      • Very rarely SimilarOrderFinder might mistakenly find the wrong order.
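
The slides do not show how SimilarOrderFinder searches the gateway. As a rough sketch only, one way to look for a "similar-looking" charge is to match on customer, amount, and a time window. Everything below is an assumption made for illustration: the `SimilarChargeFinder` name, the `Charge` and `PendingOrder` structs, the two-minute window, and the idea that the gateway's recent charges are available as a plain array.

```ruby
# Hypothetical sketch: names and matching rules are not from the talk.
Charge = Struct.new(:id, :customer_id, :amount_cents, :created_at)
PendingOrder = Struct.new(:customer_id, :amount_cents, :created_at)

class SimilarChargeFinder
  WINDOW_IN_SECONDS = 120 # assumed tolerance between order time and charge time

  def initialize(order, gateway_charges)
    @order = order
    @gateway_charges = gateway_charges
  end

  # Returns the gateway id of the closest matching charge, or nil.
  def find
    candidates = @gateway_charges.select do |charge|
      charge.customer_id == @order.customer_id &&
        charge.amount_cents == @order.amount_cents &&
        distance(charge) <= WINDOW_IN_SECONDS
    end
    closest = candidates.min_by { |charge| distance(charge) }
    closest && closest.id
  end

  private

  # Seconds between the order and the charge, ignoring direction.
  def distance(charge)
    (charge.created_at - @order.created_at).abs
  end
end

now = Time.at(1_350_000_000)
charges = [
  Charge.new("ch_1", 7, 450, now + 3),   # same customer, same amount, close in time
  Charge.new("ch_2", 7, 450, now + 900), # same customer, but far outside the window
  Charge.new("ch_3", 8, 450, now + 1)    # different customer
]
order = PendingOrder.new(7, 450, now)
puts SimilarChargeFinder.new(order, charges).find.inspect
```

A matcher like this also shows where the rare wrong match can come from: two identical purchases by the same customer inside the window are indistinguishable to it.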
  11. [Chart: Number of failed orders, by date from 10/19/12 to 10/25/12; y-axis 0 to 2500]
  12. Same risk as before
      • If an order is accepted that can’t be charged, we’re still on the hook.
      • Our support team follows up with customers to keep lost $$ as low as possible.
  13. Chocolate:
      • Single POST endpoint to save an Order into the database.
      • Pulls out interesting things (amount, customer to charge, etc.).
  14. If order looks real...
      • Calculate risk:
      • If low, saves everything: params, headers, etc. to DB.
      • Returns a response that looks identical to a production response.
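
A minimal sketch of the capture step described for chocolate, written as plain Ruby rather than a Rails controller so it runs on its own. The in-memory `CAPTURED_ORDERS` array, the `looks_real?` check, the status codes, and the response body are all assumptions for illustration; only the 100.0 risk threshold and the "save params and headers" idea come from the slides.

```ruby
require "json"
require "securerandom"

# Hypothetical in-memory stand-in for the failover app's database table.
CAPTURED_ORDERS = []

LOW_RISK_LIMIT = 100.0 # same threshold as the Risk class shown earlier

# Assumed shape of a real order: a positive amount and a customer to charge.
def looks_real?(params)
  params["amount"].to_f > 0 && !params["customer_id"].nil?
end

# Takes the raw params and headers of an order POST and returns
# [status, body]. The response format is invented for this sketch.
def capture_order(params, headers)
  return [422, JSON.generate("error" => "invalid order")] unless looks_real?(params)

  amount = params["amount"].to_f
  return [402, JSON.generate("error" => "card failed!")] unless amount < LOW_RISK_LIMIT

  # Keep the whole request so it can be replayed later.
  record = { id: SecureRandom.uuid, amount: amount,
             customer_id: params["customer_id"],
             params: params, headers: headers }
  CAPTURED_ORDERS << record
  [201, JSON.generate("id" => record[:id], "status" => "accepted")]
end

status, body = capture_order({ "amount" => "12.50", "customer_id" => 7 },
                             { "User-Agent" => "ios-app" })
puts status
puts body
```

Storing the untouched params and headers next to the extracted fields is what would make a later replay possible without guessing what the client originally sent.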
  15. When we’re back up:
      • Order model on chocolate has a replay method.
      • Manual process run by support team to track results (and follow up if necessary).
  16. De-duping
      • Could be a case where an order is in chocolate and in production.
      • Don’t want to double-charge the customer.
      • Need to de-dupe.
  17. De-duping
      • Akamai injects a unique request ID for every order we create.
      • Store this on each order in production and on replays in chocolate.
      • Chocolate sends this as part of a replay.
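
The replay and de-duping steps can be sketched together, assuming production remembers every request ID it has already processed and treats a repeat as a duplicate. `ProductionOrders`, `CapturedOrder`, and the `:created` / `:duplicate` results are hypothetical names for this sketch, not the talk's code.

```ruby
require "set"

# Hypothetical stand-in for the production order endpoint.
class ProductionOrders
  attr_reader :charges

  def initialize
    @seen_request_ids = Set.new
    @charges = []
  end

  # Charges once per request id; a repeat is reported as a duplicate.
  def create(request_id, amount)
    # Set#add? returns nil when the id is already present.
    return :duplicate unless @seen_request_ids.add?(request_id)

    @charges << amount
    :created
  end
end

# Hypothetical captured order on the failover app.
CapturedOrder = Struct.new(:request_id, :amount, :replay_result) do
  # Sends the original request id along, so production can de-dupe.
  def replay(production)
    self.replay_result = production.create(request_id, amount)
  end
end

production = ProductionOrders.new
production.create("req-1", 4.50) # this request succeeded in production after all

captured = [CapturedOrder.new("req-1", 4.50), CapturedOrder.new("req-2", 12.00)]
captured.each { |order| order.replay(production) }
captured.each { |order| puts "#{order.request_id}: #{order.replay_result}" }
```

Because the ID is generated once at the edge and travels with both the original request and the replay, neither side has to guess whether two orders are "the same".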
  18. Triggering
      • Akamai has a rule that if a POST to our order #create endpoint takes > 15 seconds, retry the exact same request on chocolate.
      • Sometimes production will actually succeed, but not a problem: chocolate de-dupes.
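
The failover rule itself lives in Akamai configuration, which the slides do not show. Purely to illustrate the behavior, here is a small Ruby model of it: wait a limited time for the primary, and if it has not answered, send the same request to the failover while the primary keeps running. The `route` helper, the lambdas, and the scaled-down time limit are all assumptions for this sketch.

```ruby
# Hypothetical model of the edge rule; the real rule is Akamai configuration.
def route(request, primary:, failover:, limit:)
  worker = Thread.new { primary.call(request) }
  # Thread#join returns nil when the limit passes before the thread finishes.
  return worker.value if worker.join(limit)

  # The primary is not cancelled, so it may still succeed in the background.
  failover.call(request)
end

fast = ->(req) { "production handled #{req}" }
slow = ->(req) { sleep 0.2; "production handled #{req}" }
chocolate = ->(req) { "chocolate handled #{req}" }

# The limit is scaled down from 15 seconds so the example finishes quickly.
puts route("order-1", primary: fast, failover: chocolate, limit: 0.05)
puts route("order-2", primary: slow, failover: chocolate, limit: 0.05)
```

The second call shows why de-duping matters: the slow primary is still working on "order-2" when the failover accepts it, so both sides can end up holding the same order.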
  19. Pros of using something like Akamai
      • Allows you to auto-replay to separate endpoints.
      • If done correctly, your site will never appear to be down.
  20. [Chart: Failovers per day, 2/1/13 to 4/2/13; y-axis 0 to 600]
  21. Dynos get backed up
      • Every day, a handful of orders still end up failing over to chocolate.
  22. Solutions
      • Make all endpoints fast to free up dynos quickly.
      • Keep tuning unicorn and failover timeouts.
      • No guaranteed way to solve this.