API Optimization Tale: Monitor, Fix and Deploy (on Friday). RailsConf 2021

mrzasa
April 12, 2021

Slides for a talk from RailsConf 2021: https://railsconf.com/program/sessions#session-1078

Abstract

I saw a green build on a Friday afternoon. I knew I needed to push it to production before the weekend. My gut told me it was a trap. I had already stayed late to revert a broken deploy. I knew the risk.

In the middle of a service extraction project, we decided to migrate from REST to GraphQL and optimize API usage. My deploy was a part of this radical change.

Why was I deploying so late? How did we measure the migration effects? And why was I testing on production? I'll tell you a tale of small steps, monitoring, and old tricks in a new setting. Hope, despair, and broken production included.

Transcript

  1. API OPTIMIZATION TALE: MONITOR, FIX AND DEPLOY (ON FRIDAY)

    MACIEK RZĄSA, TOPTAL, @MJRZASA
    Photo by Tim Mossholder on Unsplash
  2. BACKEND ENGINEER @ TOPTAL

    at work: Ruby & Postgres & Elasticsearch, service extraction
    after work: Rzeszów Ruby User Group (rrug.pl), Rzeszów University of Technology
    software that matters, agile text processing, distributed systems
  3. MONITOR: wait for it

    FIX: OPTIMIZE: wait for it
    DEPLOY: CI checks, easy & reliable rollback, safe env with a fallback, feature flags
  4. EXTRACTION

    FROM

        class Product < ApplicationRecord
          has_many :billing_records
        end

        class BillingRecord < ApplicationRecord
          belongs_to :product
        end

    TO

        class Product < ApplicationRecord
          def billing_records
            @billing_records ||= ::Billing::QueryService
              .billing_records_for_products(self)
          end
        end

        class BillingRecord
          def product
            @product ||= Product.find(product_id)
          end
        end
  5. EXTRACTION: MONITORING

    standard errors (Rollbar/Sentry)
    performance (NewRelic)
    custom request instrumentation (Kibana): method name, arguments, stacktrace, response time (elapsed), error

        {
          "payload": {
            "method": "records_for_product",
            "arguments": "[[\"gid://platform/Product/12345\"]]",
            "stacktrace": "[ \"app/models/product.rb:123\", \"app/services/sell_product.rb:43\" ]",
            "elapsed": 1.128494586795568,
            "error": null
          }
        }
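The custom request instrumentation on the monitoring slide could be sketched roughly as below. This is a minimal illustration, not the talk's actual code: the `BillingInstrumentation` module, the `instrument` method and the `sink` parameter are hypothetical names, and a real setup would ship the JSON line to a log pipeline that feeds Kibana.

```ruby
require "json"

# Hypothetical helper (not from the talk): wraps one call to the billing
# service and records a payload shaped like the one shown on the slide.
module BillingInstrumentation
  # `sink` is an assumption: anything that responds to `<<`,
  # for example an Array, an IO or a logger.
  def self.instrument(method_name, arguments, sink:)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    error = nil
    yield
  rescue StandardError => e
    error = e
    raise # instrumentation must not swallow the original error
  ensure
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    payload = {
      method: method_name.to_s,
      arguments: arguments.to_json,
      # First few caller frames; a real app would keep only its own frames.
      stacktrace: Array(caller(1, 5)).to_json,
      elapsed: elapsed,
      error: error&.message
    }
    sink << JSON.generate(payload: payload)
  end
end

log_lines = []
BillingInstrumentation.instrument(
  :records_for_product, [["gid://platform/Product/12345"]], sink: log_lines
) { :fake_billing_response }
```

Logging in `ensure` means failed requests are measured too, so slow failures and slow successes show up in the same place.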
  6. MONITOR: standard stack (Rollbar/NewRelic), custom request instrumentation

    FIX: OPTIMIZE: wait for it
    DEPLOY: CI checks, easy & reliable rollback, safe env with a fallback, feature flags
  7. FLOOD OF REQUESTS

    PROBLEM: SINGLE VIEW/JOB INITIATES MANY BILLING REQUESTS
    HOW MANY? THOUSANDS!
  8. FLOOD OF REQUESTS

    INITIAL

        def perform(*)
          products = Product.eligible
          products.find_each do |product|
            # one billing request per call
            DoBusinessLogic.call(product)
          end
        end

        class DoBusinessLogic
          def self.call(product)
            product.billing_records.each {}
          end
        end

        class Product < ApplicationRecord
          def billing_records
            @billing_records ||= ::Billing::QueryService
              .billing_records_for_products(self)
          end
        end

    OPTIMIZED

        def perform(*)
          products = Product.eligible
          products.find_in_batches do |batch|
            # one billing request per batch
            cache_billing_records(batch).each do |p|
              # no billing requests
              DoBusinessLogic.call(p)
            end
          end
        end

        def cache_billing_records(products)
          indexed_records = ::Billing::QueryService
            .billing_records_for_products(*products)
            .group_by(&:product_gid)
          products.each do |product|
            product.cache_billing_records!(
              indexed_records[product.gid].to_a
            )
          end
        end
  9. FLOOD OF REQUESTS? PRELOAD FOR A BATCH AND (MEM)CACHE

    ◀ Brad Fitzpatrick, Memcached. Image source: wikipedia.org
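The effect of preloading for a batch can be shown without Rails. The sketch below is an assumption-laden stand-in: `FakeBillingClient` replaces the real billing service and simply counts requests, the `Product` class is a plain Ruby object rather than an ActiveRecord model, and `preload_billing_records` plays the role of the slide's `cache_billing_records`.

```ruby
BillingRecord = Struct.new(:gid, :product_gid)

# Hypothetical in-memory stand-in for the billing service; counts requests.
class FakeBillingClient
  attr_reader :request_count

  def initialize(records)
    @records = records
    @request_count = 0
  end

  def billing_records_for_products(*products)
    @request_count += 1
    gids = products.map(&:gid)
    @records.select { |record| gids.include?(record.product_gid) }
  end
end

class Product
  attr_reader :gid

  def initialize(gid, client)
    @gid = gid
    @client = client
  end

  # Lazy path: one billing request per product unless already cached.
  def billing_records
    @billing_records ||= @client.billing_records_for_products(self)
  end

  def cache_billing_records!(records)
    @billing_records = records
  end
end

# One request for the whole batch, then fill each product's cache.
def preload_billing_records(products, client)
  indexed = client.billing_records_for_products(*products)
                  .group_by(&:product_gid)
  products.each { |product| product.cache_billing_records!(indexed[product.gid].to_a) }
end

client = FakeBillingClient.new(
  [BillingRecord.new("r1", "p1"), BillingRecord.new("r2", "p1"), BillingRecord.new("r3", "p3")]
)
products = %w[p1 p2 p3 p4 p5].map { |gid| Product.new(gid, client) }

products.each_slice(2) do |batch|
  preload_billing_records(batch, client)
  batch.each { |product| product.billing_records.size } # served from the cache
end

puts client.request_count
```

Products without records get an empty array cached, so they do not fall back to the lazy per-product request later.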
  10. FLOOD OF DB QUERIES

    PROBLEM: EVERY ELEMENT OF A FETCHED COLLECTION (BILLING RECORD) NEEDED PLATFORM DATA (PRODUCT)
  11. FLOOD OF DB QUERIES

    INITIAL

        def business_logic
          billing_records = ::Billing::QueryService
            .billing_records_for_products(*products)
          billing_records.each do |r|
            # one query to products table per call
            BusinessLogic.call(r, r.product)
          end
        end

        class BillingRecord
          attr_writer :product

          def product
            @product ||= Product.find(product_id)
          end
        end

    OPTIMIZED

        def business_logic
          # one query to products table here
          billing_records = ::Billing::QueryService
            .billing_records_for_products(*products)
          billing_records.each do |r|
            BusinessLogic.call(r, r.product)
          end
        end

        # Billing::QueryService
        def billing_records_for_products(*products)
          products_by_gid = products.index_by(&:gid)
          billing_records = fetch_billing_records(
            product_gids: products_by_gid.keys
          )
          billing_records.each do |billing_record|
            gid = billing_record.product_gid
            product = products_by_gid[gid]
            billing_record.product = product
          end
        end
  12. FLOOD OF DB QUERIES? HASH JOINS TO THE RESCUE!

    ◀ Ulf Michael Widenius, MySQL. Image source: wikipedia.org
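An app-level hash join boils down to building a lookup table on one side and probing it from the other. The following is a minimal plain-Ruby sketch under stated assumptions: the `Product` and `BillingRecord` structs and the `attach_products` helper are hypothetical, and `to_h` is used in place of ActiveSupport's `index_by` so the example runs without Rails.

```ruby
Product = Struct.new(:gid, :name)
BillingRecord = Struct.new(:gid, :product_gid, :product)

# Build phase: index products by gid (one pass over products).
# Probe phase: look each record's product up in the hash (one pass over
# records), instead of issuing one Product.find per record.
def attach_products(billing_records, products)
  products_by_gid = products.to_h { |product| [product.gid, product] }
  billing_records.each do |record|
    record.product = products_by_gid[record.product_gid]
  end
end

products = [Product.new("p1", "Alpha"), Product.new("p2", "Beta")]
records = [
  BillingRecord.new("r1", "p2"),
  BillingRecord.new("r2", "p1"),
  BillingRecord.new("r3", "p2")
]

attach_products(records, products)
puts records.map { |record| record.product.name }.inspect
```

The products are already in memory in the caller, so the join costs no extra queries at all; only the hash lookup remains.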
  13. DATA FLOOD

    PROBLEM: GENERIC QUERIES FETCHING ALL THE DATA THAT MIGHT BE NEEDED
  14. DATA FLOOD

    REST

        # REST response
        {
          "gid": "gid://...",
          "clientGid": "gid://...",
          "productGid": "gid://...",
          "availability": true,
          "pending": false,
          "frequency": "weekly",
          "startDate": "2020-08-21",
          "endDate": "2020-10-28"
          # ...
          # 36 fields total
          # loading 3-4 associations
        }

        def billing_records_for_products(*products)
          fetch_billing_records(
            filter: {products: products}
          ).select(&:accessible?)
        end

    GraphQL

        query($filter: RecordFilter!) {
          cycles(filter: $filter) {
            nodes {
              gid
              productGid
              pending
              frequency
            }
          }
        }

        def billing_records_for_products(*products)
          fetch_billing_records(
            filter: {
              products: products,
              accessible: true
            }
          )
        end
  15. WHAT COULD POSSIBLY GO WRONG?

    ROOT CAUSE

        # REST client
        get('/records', **params.slice(:product_gids))

        # DB query in billing
        def billing_records(product_gids: nil, gids: nil, client_gid: nil)
          scope = ::BillingRecord
          scope = scope.where(product_gid: product_gids) if product_gids
          scope = scope.where(gid: gids) if gids
          scope = scope.where(client_gid: client_gid) if client_gid
          scope.all
        end

    FIX

        def billing_records(product_gids: nil, gids: nil, client_gid: nil)
          return [] if [product_gids, gids, client_gid].all?(&:blank?)
          # ...
        end
  16. DATA FLOOD? QUERY CUSTOMIZATION & UNDERFETCHING

    FILTERING ON THE SERVER SIDE
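Server-side filtering has one trap, shown on the "what could possibly go wrong" slide: a request that arrives with no filters must not fall through to "return everything". A plain-Ruby sketch of that guard follows. It assumes an in-memory list instead of a database, and the `BillingRecordsQuery` class is a hypothetical stand-in for the billing-side query.

```ruby
BillingRecord = Struct.new(:gid, :product_gid, :client_gid)

# Hypothetical in-memory version of the billing-side query.
class BillingRecordsQuery
  def initialize(records)
    @records = records
  end

  def call(product_gids: nil, gids: nil, client_gid: nil)
    filters = { product_gid: product_gids, gid: gids, client_gid: client_gid }
              .transform_values { |value| Array(value) }
              .reject { |_attribute, allowed| allowed.empty? }

    # The guard: no usable filter means no results, never the whole table.
    return [] if filters.empty?

    @records.select do |record|
      filters.all? { |attribute, allowed| allowed.include?(record[attribute]) }
    end
  end
end

query = BillingRecordsQuery.new(
  [
    BillingRecord.new("r1", "p1", "c1"),
    BillingRecord.new("r2", "p2", "c1"),
    BillingRecord.new("r3", "p2", "c2")
  ]
)

puts query.call.size                                        # no filters at all
puts query.call(product_gids: ["p2"], client_gid: "c1").map(&:gid).inspect
```

Treating an empty array the same as a missing filter matters here, because a client that slices its params can easily send `product_gids: []`.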
  17. FREQUENTLY NEEDED DATA

    PROBLEM: SINGLE FIELD WAS FREQUENTLY USED (~1K HITS PER DAY)
  18. FREQUENTLY NEEDED DATA

    PLAN
    add field to kafka
    build a read model
    backfill the data
    start using the read model
    remove billing query

    SOLUTION
    find that date in local DB
    verify if it's really the same date
    use it and remove billing query

        # 1k billing hits per day
        ::Billing::QueryService
          .first_successful_record_created_at(client)
          &.in_time_zone&.to_date

        # one local DB query
        client
          .products.successful
          .minimum(:start_date)
  19. DATA NEEDED FREQUENTLY? USE THE DOMAIN, LUKE!

    ◀ Image source: starwars.fandom.com
  20. 429 TOO MANY REQUESTS

    PROBLEM: SPIKE OF REQUESTS EVERY SUNDAY EVENING
  21. 429 TOO MANY REQUESTS

    PROBLEM

        # scheduling at talent's 5 PM on Sunday
        eligible_products.each do |product|
          WeeklyReminder.schedule(
            product, day: :sunday, time: '17:00'
          )
        end

    WEEK 1. SOLUTION: PRELOADING

        eligible_products.find_in_batches do |batch|
          with_billing_records_preloaded(batch) do
            batch.each do |product|
              WeeklyReminder.schedule(
                product, day: :sunday, time: '17:00'
              )
            end
          end
        end

    WEEK 2. PROPER SOLUTION: JITTER

        # class WeeklyReminder
        def scheduling_time(*)
          super + (SecureRandom.rand * 120 - 60).seconds
        end

    WEEK 3. FINAL PROPER SOLUTION

        # class AnotherWeeklyReminder
        def scheduling_time(*)
          super + (SecureRandom.rand * 120 - 60).seconds
        end

    WEEK 4. REALLY FINAL PROPER SOLUTION: RATE LIMITING

        Sidekiq::Limiter.window(
          'weekly-reminder',
          RATE_LIMIT_COUNT,
          RATE_LIMIT_INTERVAL,
          wait_timeout: 2
        )
  22. 429 TOO MANY REQUESTS?

    I DON'T ALWAYS TEST ON PRODUCTION BUT WHEN I DO, I RUN TESTS ON FRIDAY
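The jitter step from the weekly plan spreads jobs that would otherwise all fire at exactly the same second. A minimal sketch in plain Ruby follows. It is an illustration only: `with_jitter` and `JITTER_WINDOW` are hypothetical names, and it swaps the slide's `SecureRandom` for a seedable `Random` so the spread can be reproduced, which the slide's code does not do.

```ruby
JITTER_WINDOW = 120 # seconds; mirrors the +/- 60 s spread on the slide

# Shifts `base_time` by a random offset in [-60, 60) seconds.
# `rng` is an assumption: any object whose `rand` returns a float in [0, 1).
def with_jitter(base_time, rng: Random.new)
  base_time + (rng.rand * JITTER_WINDOW - JITTER_WINDOW / 2)
end

sunday_5pm = Time.utc(2021, 4, 11, 17, 0, 0)
rng = Random.new(42)

scheduled = Array.new(1_000) { with_jitter(sunday_5pm, rng: rng) }
offsets = scheduled.map { |time| time - sunday_5pm }

puts offsets.min.round(1)
puts offsets.max.round(1)
```

Jitter only flattens the spike within its window. If the total volume still exceeds what the downstream service accepts, a rate limiter is needed as well, which is where the talk ends up in week 4.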
  23. NIHIL NOVI SUB SOLE

    MONITOR: standard stack (Rollbar/NewRelic), custom request instrumentation
    FIX: OPTIMIZE
    preloading to avoid N+1
    app-level hash joins
    server-side filtering
    using local data
    underfetching
    spreading the load
    DEPLOY: CI checks, easy & reliable rollback, safe env with a fallback, feature flags
  24. NIHIL NOVI SUB SOLE

    MONITOR: standard stack (Rollbar/NewRelic), custom request instrumentation
    FIX: OPTIMIZE
    preloading to avoid N+1: every ORM
    app-level hash joins: even MySQL has hash-join
    server-side filtering: find_all{} vs where()
    using local data: The Best Request Is No Request
    underfetching: SELECT * vs SELECT a, b
    spreading the load: highscalability.com post about YouTube, 2012
    DEPLOY: CI checks, easy & reliable rollback, safe env with a fallback, feature flags
  25. FAIL OFTEN SO YOU CAN SUCCEED SOONER

    Tom Kelley
    Photo: snikologiannis/Flickr; http://ow.ly/CHwhd